In the latest (September 2007) issue of Game Developer, my old slave driver during my Neversoft days, Mick West, wrote a nice article about responsiveness in games. One thing Mick’s article covers in quite some detail is input latency and how this latency changes between a game running at 60 fps and at 30 fps.
The whole 30 vs. 60 fps issue is a timely one (no pun intended) that is probably discussed at just about any game developer making games for the PS3 and the 360, due to the difficulty of hitting 60 fps (which is unarguably better than 30 fps, when attainable) while still having what’s considered to be next-gen graphics. It’s certainly a topic that has come up at our office more than once, shall we say.
There were two things that Mick didn’t cover in his article that I wanted to touch upon here. The first one is that his presentation didn’t talk about frame scanouts potentially taking longer than 1/60th of a second. Just to make it clear: I’m not talking about the interlacing issue of fields vs. frames here. Even for progressive scan we could have scanout taking longer than a frame. The reason is that certain TVs post-process the image to do god-knows-what (convert between resolutions, remove sparkling pixels, run sharpening filters, etc.), which easily can introduce one or more frames of delay. There really isn’t much to be done about this sort of built-in delay other than to purchase a TV that has a “game mode” (e.g. a Toshiba 52HMX94 and probably several other, newer TVs) which disables all the fancy post-processing that causes delays. Also, make sure to use an HDMI cable, where applicable.
However, the other thing Mick didn’t really touch upon is something we can do to reduce the lag, namely restructuring how the game loop and the rendering is done.
Lag at 60fps
Before we get to the crazy stuff, let’s examine what things look like with a traditional game and rendering loop, running at 60 fps. In the following I’ll assume we have a TV capable of 60Hz scanout, just like Mick did. Also, in the following, when I say “frame” I always mean a frame at 60Hz.
The above figure is more or less identical to the one in Mick’s article, showing 4 frames (at 60Hz) and how processing flows from CPU to GPU to the screen. The difference in my illustration is that it highlights both the maximum latency (shaded in gray, which happens when input occurs at the time indicated by the red text) as well as the minimum latency (shown only by the smaller vertical arrow to the right).
Best-case for 60 fps we get a 3 frame lag and worst-case we have a 3.67 frame lag (assuming input and logic each take 1/3 of a frame to run).
Lag at 30fps
The reason more and more people are aiming at 30 fps instead of 60 fps is that we get to use the GPU for twice as long, and for next-gen graphics the GPU is invariably the bottleneck, so more GPU time is a good thing (many would argue a necessary thing). But, alas, the input latency increases when we go to 30 fps, as the figure below illustrates. (BTW, in this figure I’ve assumed input and logic still take 1/3 of a frame (at 60Hz) each to run. That’s perhaps not entirely accurate, but it makes for better comparisons.)
Looking at the figure we see that best-case for 30 fps is 5 60Hz-frames of lag, and worst case is 6.67 frames of lag. Compared to 60 fps, we have introduced between 2 and 3 frames of lag, for the price of much improved graphics.
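These best- and worst-case numbers follow a simple pattern, which we can sanity-check with a back-of-the-envelope model (my formulation, not from Mick’s article). Measure everything in units of 1/3 of a 60Hz frame, so the input read is 1 unit, a 60Hz frame period is 3 units, and a 30Hz frame period is 6 units. The best case is one frame period each for CPU and GPU plus one 60Hz frame of scanout; the worst case adds the wait for the next input-sampling point, which is one frame period minus the input-read time:

```cpp
// All times in units of 1/3 of a 60Hz frame (input read = 1 unit,
// a 60Hz frame = 3 units, a 30Hz frame = 6 units).

// Best-case lag for delayed rendering: one frame period for the CPU,
// one for the GPU, plus one 60Hz frame of scanout.
int delayedBestLag(int period) { return period + period + 3; }

// Worst case: the input event just misses the sampling window and waits
// an extra (period - 1) units for the next input read to come around.
int worstLag(int bestLag, int period) { return bestLag + period - 1; }
```

Plugging in: at 60 fps we get 9 units (3 frames) best and 11 units (3.67 frames) worst; at 30 fps we get 15 units (5 frames) best and 20 units (6.67 frames) worst, matching the figures.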
Here it may be worthwhile to stop and point out that running at 30 fps in practical terms actually more than doubles what you can draw over running at 60 fps! “What? How can that be?! Christer is crazy!” you say. Hardly! Crazy like a fox perhaps. You see, let’s say 1/3 of the frame at 60 fps is spent drawing the HUD, doing post-processing, and other fixed overhead. That leaves 2/3 of a frame for drawing game objects. Running at 30 fps, we still have 1/3 of a frame for the fixed stuff, and 1 2/3 frames available for drawing game objects. 1 2/3 divided by 2/3 is 2.5, so we can actually draw 2.5 times as many game objects, with everything else the same!
Okay, so I’m a little bit sloppy in equating 2.5 times as much time to draw objects with 2.5 times as many drawn objects, and perhaps we’d do a little extra post-processing, but you get the point: realistically we can do more than twice as much at 30 fps as we can at 60 fps. Who is crazy now, huh?!
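For the skeptics, here is that budget arithmetic spelled out, again in units of 1/3 of a 60Hz frame (the 1/3-frame fixed overhead is the assumption from above):

```cpp
// Budget arithmetic in units of 1/3 of a 60Hz frame.
int frameBudget60 = 3;   // one 60Hz frame
int frameBudget30 = 6;   // one 30Hz frame (two 60Hz frames)
int fixedOverhead = 1;   // assumed 1/3 frame of HUD, post-processing, etc.

int objectTime60 = frameBudget60 - fixedOverhead;  // 2 units for game objects
int objectTime30 = frameBudget30 - fixedOverhead;  // 5 units for game objects

double ratio = double(objectTime30) / double(objectTime60);  // 2.5x
```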
Reducing the latency
So, we can render more stuff at 30 fps, but we paid for it with longer latencies. Can we reduce the latency somehow? We can, by switching from delayed rendering to immediate rendering! By delayed rendering (to distinguish it from deferred rendering) I mean the practice of preparing a whole draw list on one frame and having it render on the next. What we’ve talked about above, in other words. Immediate rendering would be when we issue draw calls right there, on the spot. Naively changing the drawing code to be immediate, we get the situation illustrated below.
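To make the distinction concrete, here is a sketch of the two loop shapes. All the functions are illustrative stubs (the log exists only so we can see the ordering); real code would obviously be talking to the graphics API here:

```cpp
#include <string>
#include <vector>

// Records the order in which things happen within one game frame.
std::vector<std::string> frameLog;

void waitForVBlank()  { frameLog.push_back("vbl"); }
void readInput()      { frameLog.push_back("input"); }
void updateLogic()    { frameLog.push_back("logic"); }
void buildDrawList()  { frameLog.push_back("build list"); }
void kickDrawList()   { frameLog.push_back("kick last frame's list"); }
void issueDrawCalls() { frameLog.push_back("draw calls"); }

// Delayed rendering: the list built on frame N is handed to the GPU on frame N+1.
void delayedFrame() {
    waitForVBlank();
    kickDrawList();    // GPU starts on the list we built last frame
    readInput();
    updateLogic();
    buildDrawList();   // consumed by the GPU next frame
}

// Immediate rendering: draw calls hit the GPU on the same frame they are issued.
void immediateFrame() {
    waitForVBlank();
    readInput();
    updateLogic();
    issueDrawCalls();  // GPU starts consuming these right away
}
```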
The first thing we notice is that we’re no longer utilizing the GPU 100%. But, hey, we’re still getting more than just a frame’s worth of GPU, and the latency is now down to 3 frames best-case (same as at 60 fps) and 4.67 frames worst-case (which is one frame more than at 60 fps, but one frame less than at 30-delayed rendering).
For this to work well, it is important for the CPU to produce draw calls faster than (or at least as fast as) the GPU can consume them. If not, the GPU would only be used sporadically within the 1 1/3 frames and the effective GPU utilization would probably be less than a frame, at which point this approach would be, er, pointless.
“That’s interesting Christer,” you say, “but even at full utilization, 1 1/3 frames of GPU isn’t enough.” And, yes, you’re right, it isn’t. We really want the whole two frames. So, let’s shift things around a bit and see what happens. Instead of synchronizing for vertical blanking (VBL) at the start of the game loop code, let’s put the VBL sync just before the drawing code on the CPU. This results in my final figure.
We now have the logical game frame start roughly halfway into the 30Hz-frame (exactly when it starts depends on how long it takes to finish the CPU drawing for the previous game frame). Note that I’ve drawn the GPU utilizing all of its time for rendering, continuing beyond the spot where the CPU stops issuing draw calls. This is okay, because we’ve already assumed the CPU is issuing draw calls faster than the GPU can consume them, so some will have been batched up, meaning the GPU will continue on. Furthermore, we almost always end a frame with a lot of fullscreen passes, and these are fast to issue on the CPU but may go on for several milliseconds on the GPU.
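In loop-structure terms, the change is just where the VBL sync sits. A sketch (stubs again, with a log recording the ordering only):

```cpp
#include <string>
#include <vector>

// Records the ordering within one shifted game frame.
std::vector<std::string> shiftedLog;

void sampleInput()   { shiftedLog.push_back("input"); }
void runLogic()      { shiftedLog.push_back("logic"); }
void syncToVBlank()  { shiftedLog.push_back("vbl"); }
void emitDrawCalls() { shiftedLog.push_back("draw"); }

// Shifted loop: the VBL sync moves from the top of the loop to just
// before the CPU draw-call code.
void shiftedFrame() {
    sampleInput();     // game frame now starts roughly mid 30Hz-frame
    runLogic();
    syncToVBlank();    // sync here, so draw calls start right at VBL...
    emitDrawCalls();   // ...and the GPU gets the full two 60Hz frames
}
```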
Note also that in order to facilitate switching to the new front buffer, we need to set up a VBL interrupt to do the switching, as the CPU code is no longer guaranteed to be idle at around VBL time (as it is with the delayed rendering approach).
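Something along these lines is what I have in mind for the interrupt-driven flip. The names are hypothetical and the actual interrupt installation and scanout programming are platform-specific (and omitted); the point is just that the handler only flips when the render code has flagged the back buffer as complete:

```cpp
#include <atomic>

// Set by the render code, consumed by the VBL interrupt handler.
std::atomic<bool> backBufferReady{false};
int frontBuffer = 0;  // index of the buffer currently being scanned out

// Render code calls this once the GPU has finished the frame.
void markBackBufferReady() { backBufferReady.store(true); }

// Installed as the vertical-blank interrupt handler (platform API assumed).
void onVBlank() {
    // Flip only if a completed frame is waiting; otherwise keep
    // scanning out the current front buffer for another 60Hz frame.
    if (backBufferReady.exchange(false)) {
        frontBuffer ^= 1;  // swap which buffer is scanned out
        // programScanoutAddress(frontBuffer);  // platform register write
    }
}
```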
The input latency is now 3.67 frames best-case and 5.33 frames worst-case. Compared to the 30 fps-delayed rendering (where the same numbers were 5 and 6.67 frames, respectively) we have reduced lag by around 1 1/3 frames overall by going with immediate rendering.
Should we go crazy?
That’s a seemingly respectable decrease in lag, so do I recommend people switch to immediate rendering at 30 fps, with a shifted game loop? Um, no, probably not. It’s a big change, it’s untested (or, at least, I’m not aware of anyone doing rendering this way), you need to ensure your CPU draw code is fast enough so as not to starve the GPU, etc. Basically, it’s really taking a plunge in the deep end. Also, arguably, while trained game developers might notice the difference in lag (for certain types of games) it is not clear that most game players would notice the improvement. All in all, it seems like a risky move that could leave you worse off if you’re unlucky.
However, I wanted to post about this topic anyway, because I’d love to see some comments on this blog post. Do you think what I outlined would work for real? Have you or someone you know done anything similar? Any other ideas for reducing input lag? How important is it to reduce input lag anyway, and why? Feedback please!