Techniques to Reduce Latency in Your Apps

This is part 2 of a series on input latency. Check the first post for background information about input latency.

This post contains techniques you can use as a developer to reduce input latency in your applications. But first, a discussion of something that you shouldn't do:

Don't Disable VSync

Disabling VSync is often recommended to reduce latency. As a user it is one of the few things you can do to reduce latency in poorly designed apps. But it is not a good idea for an app to disable VSync itself. If you're looking for something to implement now, skip to the "Reduce queued frames" section. Otherwise, read on to find out why disabling VSync isn't the answer.

First some terminology. "VSync" or Vertical Sync refers to the signal that the GPU sends once per frame over the monitor cable, marking the first pixel of a new frame. "Vertical" because the pixels are sent from top to bottom in rows. (There is also a HSync to mark the beginning of each row.) The pixels to be shown are stored in GPU memory in what's called the "front buffer". There are many "back buffers" which applications render into, and then perform a "Present" or "Swap" or "Flip" operation to turn one of the back buffers into the new front buffer, while the old front buffer becomes a back buffer. The process of sending pixels from the front buffer through the monitor cable to be displayed is called "scanout". The period after the end of scanout but before the next VSync is called "VBlank". So VSync marks the end of VBlank and the beginning of scanout.

In VGA cables there is a literal pin and wire whose only purpose is to send the VSync signal. In modern digital cables VSync is a data packet sent on the same wires as other data. But it's an essential part of every GPU-to-monitor connection and it is always sent. It is never actually disabled. So what do we mean by "Disable VSync"?

In rendering, VSync often refers to two other concepts besides the monitor signal itself. One is the concept of making the application wait until the monitor is ready to accept the next frame, throttling the FPS to the refresh rate of the monitor. Let's call this "backpressure". The other concept is preventing the application from updating the front buffer during scanout. Let's call this "tearing prevention".
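
To make this concrete, swap-chain APIs typically expose these two concepts through their present modes. Here is a minimal sketch in Vulkan terms: the present mode constants are real Vulkan names, but the wrapper function is purely illustrative, and availability of anything other than FIFO still has to be queried with vkGetPhysicalDeviceSurfacePresentModesKHR.

```cpp
#include <vulkan/vulkan.h>

// Pick a present mode based on whether we want FPS throttling ("backpressure")
// and tearing prevention. Only FIFO is guaranteed to exist; the others must be
// checked with vkGetPhysicalDeviceSurfacePresentModesKHR.
VkPresentModeKHR ChoosePresentMode(bool wantBackpressure, bool preventTearing) {
    if (!preventTearing)
        return VK_PRESENT_MODE_IMMEDIATE_KHR;  // no tearing prevention, no throttling
    if (wantBackpressure)
        return VK_PRESENT_MODE_FIFO_KHR;       // classic "VSync on": throttled to refresh rate, no tearing
    return VK_PRESENT_MODE_MAILBOX_KHR;        // unthrottled rendering, newest frame wins, no tearing
}
```

FIFO is the only mode the Vulkan specification guarantees to be available, which is one more reason "just disable VSync" is not a portable plan.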

Tearing prevention

A tearing artifact occurs when the application updates or swaps the front buffer during scanout. This results in two frames being shown on different parts of the monitor at the same time, usually with a visible line where they meet, which jumps around from frame to frame. Some APIs prevent tearing even when VSync is disabled, and some don't. Some have separate controls for VSync and tearing.

Tearing is a pretty bad artifact, generally worse than a bit of extra input latency. As a result, updating the front buffer during scanout is usually not desired. However, it can actually be done without tearing if one is extremely careful about timing. This is sometimes called "beam racing" and it has been used in VR compositors and some home consoles and console emulators. For more information about how this can be achieved, see here.

It is unlikely that beam racing or tearing can benefit most normal applications. So for the rest of this document let's assume that the platform prevents tearing artifacts. How? At the beginning of scanout when VSync is sent, the platform selects the most recently completed frame buffer as the front buffer, and prevents the application from modifying it. The application renders the next frame into a different frame buffer (often using triple buffering). If that frame finishes rendering even one microsecond after VSync, it has to wait all the way until the next VSync before it can be sent to the monitor. Bummer! But that's the price of preventing tearing artifacts.

Actually it can be even worse than this. When a compositing window manager is in use, the front buffer is not one of the application's buffers. The front buffer is the desktop, and it is owned by the window manager. The window manager draws desktop frames using the same 3D APIs as the application, with double buffering. At VSync when scanout starts for the desktop front buffer, the window manager wakes up and starts drawing the next desktop frame. It takes the application's most recently completed frame buffer and composites it into the desktop back buffer. Then it goes to sleep for the rest of the frame so the application can do its rendering. But if the application finishes a new frame before VSync, the window manager doesn't wake up again to re-composite the desktop. At the next VSync the desktop back buffer swaps and becomes the front buffer, but it still contains the application's old frame. The new application frame is then composited into the next desktop back buffer and has to wait a whole extra VSync interval before it is sent to the monitor. So a compositing window manager typically adds a whole extra frame of input latency to every application.

Backpressure

Removing backpressure is probably what most people first think of as "disabling VSync". Backpressure is typically enabled by default. This means that the rendering API will block when the application calls "Present" or "Swap", making the application wait until the next VSync before it can start rendering the next frame.

Without backpressure the application will render frames as fast as possible, regardless of the monitor refresh rate. If the application's FPS is lower than the monitor refresh rate, this has no effect. But if the application's FPS is much higher than the monitor refresh rate, then removing backpressure may reduce input latency.

Removing backpressure comes with a lot of downsides! Input latency is only reduced in the case where some of the frames produced by the application are never sent to the monitor, being replaced by newer frames before the next VSync. Rendering these garbage frames has no benefit. It wastes power, makes devices hot, and reduces battery life. Not only that, it can even cause the application and the whole device to run slower overall due to thermal throttling. It would be far better to simply skip rendering the frames that aren't sent to the monitor and sleep instead. But because the application is ignoring the VSync signal, it doesn't know ahead of time which frames will be used.

Even if you don't care about wasting power and don't hit thermal throttling, disabling backpressure doesn't achieve the minimum theoretical latency. The input latency will be on average 1.5x the time to render a frame, because on average you'll be halfway done with a frame when VSync happens, and the previous frame will be sent to the monitor. The latency will fluctuate between 1 and 2 frames at random. The techniques discussed below can reduce input latency even lower than this, while keeping backpressure enabled for much higher efficiency.

Reduce queued frames

This is the first thing you should look at when you want to reduce input latency in an app using modern graphics APIs.

Most graphics APIs will queue up multiple frames by default. That is, as you make draw calls and call "Present" or "Swap", those commands are put on an input queue behind previous frames and not executed on the GPU until later. Then after the GPU executes the drawing commands, the finished frames may be put on a presentation queue, waiting to be sent to the monitor. Together these queues may hold many frames.

This buffering can have benefits. For GPU-bound applications the input queue will always be full. The GPU will never wait for the CPU. It can just pull the drawing commands for the next frame from the input queue whenever it finishes the previous one. This allows reaching 100% GPU utilization. It also smooths out variations in frame rate. If one frame takes extra time to render, the GPU can still pull previously finished frames from the presentation queue to send to the monitor, so the user doesn't see a missed frame.

But all this queueing has a huge downside. Once frames are in the GPU input queue they generally are not modified to reflect new mouse or keyboard events that come in later. This adds roughly one frame of input latency for every frame that is queued. This is the single biggest source of input latency in many applications.

If your application can render an entire frame from start to finish in one VSync interval, and doesn't need 100% of the GPU, then you can dramatically reduce your input latency by limiting the queue depth to the minimum possible value. At VSync the GPU will be idle, the CPU will wake to process input events and make draw calls, then the GPU will execute those draws, and both the CPU and GPU will finish and go back to sleep before the next VSync.

Different rendering APIs have their own ways of limiting queue length, so you will have to research the methods for your specific API.
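
As an example of what this looks like in practice, here is a rough sketch using DXGI on Windows. SetMaximumFrameLatency, GetFrameLatencyWaitableObject, and the frame-latency-waitable swap chain flag are real DXGI APIs; error handling and the rest of the swap chain setup are omitted.

```cpp
#include <windows.h>
#include <dxgi1_3.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Cap the number of queued frames at one and get the latency waitable object.
// The swap chain must have been created with
// DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT.
HANDLE SetupLowLatencySwapChain(ComPtr<IDXGISwapChain1> swapChain1) {
    ComPtr<IDXGISwapChain2> swapChain2;
    swapChain1.As(&swapChain2);
    swapChain2->SetMaximumFrameLatency(1);           // at most one frame queued ahead
    return swapChain2->GetFrameLatencyWaitableObject();
}

void FrameStart(HANDLE frameLatencyWaitable) {
    // Block here until the queue has room, *before* reading input and
    // recording draw calls, so the CPU never runs several frames ahead.
    WaitForSingleObjectEx(frameLatencyWaitable, 1000, TRUE);
    // ... process input, record draw calls, Present() ...
}
```

Other APIs have their own equivalents (for example, limiting how many swap chain images you render ahead in Vulkan), but the shape of the solution is the same: cap the queue, and do the waiting at the top of the frame rather than at Present.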

Process input events as late as possible

Any time spent between processing input events and presenting the frame adds to your input latency. If you have tasks to perform that don't depend on the user input in the current frame, such as allocating buffers or waiting for GPU resources to become available, do those first, before processing user input. While you are doing those tasks, more user input events may arrive, which you can then process before rendering, reducing input latency.
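
In outline, a frame loop organized this way might look like the following sketch. Every function and type here is a hypothetical placeholder for whatever your engine actually does; the point is the ordering.

```cpp
// Input-independent work runs first; input is polled as late as possible.
struct InputState { /* mouse position, key states, ... */ };

InputState PollInputEvents();            // hypothetical
void WaitForFrameResources();            // fences, swap chain image, latency waitable...
void DoInputIndependentWork();           // audio, networking, buffer allocation...
void RecordAndSubmitDrawCalls(const InputState&);
void PresentFrame();

void RunFrame() {
    WaitForFrameResources();             // any waiting happens before input is read
    DoInputIndependentWork();

    // Read input only now, right before rendering. Events that arrived while
    // the work above was running are still included in this frame.
    InputState input = PollInputEvents();

    RecordAndSubmitDrawCalls(input);
    PresentFrame();
}
```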

As a very advanced technique, you can process user input twice: once at the beginning of the frame before rendering, and again at the end, after your draw calls have been recorded. Then, if your graphics API allows it, you can update GPU buffers to modify draw calls that you've already made to reflect the new user input (e.g. updating the position of a dragged object). This requires extra synchronization between the CPU and GPU to ensure that the CPU is not modifying data that the GPU is in the process of using. VR compositors use this technique to update head tracking data at the last possible moment before rendering, to reduce nausea in VR.
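
A minimal sketch of this "late latch" idea, with all names hypothetical: the draw calls read the mouse position from a GPU-visible buffer, and that buffer is overwritten with fresher input after the draw calls are recorded but before they are submitted. The per-frame buffering and fencing needed to avoid overwriting data the GPU is already reading are elided.

```cpp
// 'mappedMousePos' stands for a persistently mapped, GPU-visible buffer region
// that this frame's draw calls read their mouse position from.
struct Float2 { float x, y; };

Float2 PollMousePosition();              // hypothetical
void RecordDrawCalls();                  // command buffers reference the mapped buffer
void SubmitCommandBuffers();
void PresentFrame();

void RenderFrameWithLateLatch(Float2* mappedMousePos) {
    *mappedMousePos = PollMousePosition();   // first read, start of the frame
    RecordDrawCalls();

    // Late latch: just before submitting, overwrite the buffer with the
    // freshest input so the GPU uses it when it actually executes the draws.
    *mappedMousePos = PollMousePosition();
    SubmitCommandBuffers();
    PresentFrame();
}
```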

Delay rendering until just before VSync

If you have already reduced the number of queued frames to the minimum, it may still be possible to reduce latency further. The secret to reducing input latency is, counterintuitively, to delay rendering.

If you start rendering right at VSync and finish early, you will spend the rest of the frame interval waiting for the next VSync. During that time new mouse and keyboard events may arrive, but your frame is already rendered and you can't update it until after the next VSync. That waiting time is extra input latency. Instead, if you move that waiting time to the beginning of the frame, before you render, then it doesn't contribute to input latency because you can process any mouse and keyboard events that arrived while you were waiting before you render the frame.

Waiting the perfect amount of time requires measuring and predicting exactly how long your rendering will take, knowing exactly when VSync occurs on the monitor your window is displayed on, and ensuring that the operating system scheduler wakes your process at exactly the right moment. This is all surprisingly difficult. If you get it even slightly wrong and your frame takes a bit longer to render than predicted, it won't be done in time for VSync. Then it will have to wait a whole extra frame and the previous frame will be displayed twice, causing a hitch in any animations.
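
A sketch of the pacing loop, assuming you have some platform-specific way to learn when the next VSync will occur (the hypothetical GetNextVsyncTime below) and a smoothed estimate of your own render time:

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

Clock::time_point GetNextVsyncTime();    // hypothetical: from the platform / present statistics
Clock::duration RenderAndPresent();      // hypothetical: polls input, renders, returns time taken

// Sleep until just before the predicted VSync, then do all input processing
// and rendering in the remaining time. 'renderEstimate' is a smoothed guess
// of how long rendering takes, updated every frame.
void PacedFrame(Clock::duration& renderEstimate) {
    const auto margin = std::chrono::microseconds(500);   // slack for scheduler wake-up jitter
    std::this_thread::sleep_until(GetNextVsyncTime() - renderEstimate - margin);

    Clock::duration elapsed = RenderAndPresent();

    // Grow the estimate immediately when a frame runs long, decay it slowly
    // otherwise, so we rarely miss VSync.
    renderEstimate = std::max(elapsed, (9 * renderEstimate + elapsed) / 10);
}
```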

Although this is complex to implement, when done well it can reduce input latency to a small fraction of a monitor refresh interval, beat "disable VSync", and match the hardware rendered mouse cursor almost exactly.

Variable refresh rate (VRR)

AKA G-Sync, FreeSync, Adaptive-Sync, ProMotion, Q-Sync

Many displays now have the ability to delay VSync on a frame-by-frame basis. With traditional fixed VSync timing, if rendering a frame takes a bit longer than expected and misses VSync, it must wait all the way until the next VSync before it can be sent to the monitor. With variable refresh rate, the GPU delays VSync until the frame is done rendering and then sends the VSync signal immediately, so finished pixels are sent to the monitor with minimal delay.

This is very beneficial for input latency if your application can't render at the monitor's maximum refresh rate. If your application can render faster than the monitor's maximum refresh rate then there is no direct latency benefit. However, VRR works very well together with the previously described technique of delaying rendering. There is no risk of missing a fixed VSync deadline, so exact timing is much less critical.

VRR does have some limitations. Many systems don't support it. Many that support it don't enable it in non-fullscreen applications. Only one application can control VSync at a time; when using VRR in windowed mode all other applications must render at whatever rate the topmost application chooses (even if that is a very slow rate such as 24 FPS, or a wildly varying rate). Platform-specific APIs may be required to enable VRR. Predicting the time rendering will take is still difficult and still important for applications that want to show smooth animations and/or synchronized audio and video, because the animation timestep or audio sync depends on when the frame will ultimately be displayed to the user, which now depends directly on how long rendering takes. Applications may have components such as physics engines that prefer a fixed timestep, which can make it difficult to render frames at arbitrary intervals.
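
As one example of such a platform-specific API: on Windows, DXGI gates windowed VRR behind its "allow tearing" feature. The identifiers below are real DXGI names, but this is only a sketch with error handling and the rest of swap chain creation omitted; despite the naming, frames presented while the frame rate stays within the display's VRR range are not actually torn.

```cpp
#include <windows.h>
#include <dxgi1_5.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Query whether "tearing" presents are supported, which is also the
// prerequisite for windowed variable refresh rate with the DXGI flip model.
bool SupportsVariableRefresh(ComPtr<IDXGIFactory5> factory) {
    BOOL allowTearing = FALSE;
    factory->CheckFeatureSupport(DXGI_FEATURE_PRESENT_ALLOW_TEARING,
                                 &allowTearing, sizeof(allowTearing));
    return allowTearing == TRUE;
}

// If supported, create the swap chain with:
//     desc.Flags |= DXGI_SWAP_CHAIN_FLAG_ALLOW_TEARING;
// and present with sync interval 0 so the display refreshes when the frame is ready:
//     swapChain->Present(0, DXGI_PRESENT_ALLOW_TEARING);
```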

Software rendered mouse cursor hack

OS mouse cursors are typically drawn using a hardware overlay and their position is updated at the very last instant before VSync. This means they have less input latency than almost anything rendered via a regular graphics API.

A technique you can use to reduce the appearance of latency is to hide the operating system's mouse cursor and draw one yourself. This causes the mouse cursor itself to have the same latency as everything else in your application, so there is no longer a mismatch between the mouse cursor and objects being dragged.

Unfortunately this does not actually reduce the input latency, so the mouse cursor is likely to feel sluggish. Instead of doing this all the time, you may want to enable it only during interactive drag operations.
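
With SDL2, for instance, the switch might look roughly like this. SDL_ShowCursor, SDL_GetMouseState, and SDL_RenderCopy are real SDL2 calls; the cursor texture and the drag-state tracking are assumed to exist elsewhere.

```cpp
#include <SDL.h>

// Hide the OS cursor only while a drag is in progress and draw our own cursor
// sprite at the last known mouse position, so the cursor and the dragged
// object share the same latency.
void DrawCursor(SDL_Renderer* renderer, SDL_Texture* cursorTexture, bool dragging) {
    SDL_ShowCursor(dragging ? SDL_DISABLE : SDL_ENABLE);
    if (dragging) {
        int x, y;
        SDL_GetMouseState(&x, &y);
        SDL_Rect dst = { x, y, 24, 24 };   // drawn with the same latency as the rest of the frame
        SDL_RenderCopy(renderer, cursorTexture, nullptr, &dst);
    }
}
```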

Predict user input

Another technique that can reduce the appearance of latency is to predict user input ahead of time. When you read a mouse position update, then use it to render a frame, you are essentially predicting that the mouse will remain stationary in the time between the mouse event and the frame becoming visible on the monitor. But if the mouse is moving, this is a pretty bad prediction. Instead, you might calculate the mouse velocity, estimate the future time when the rendered frame will appear on the monitor, and use the velocity to estimate where the mouse will be at that time, then use that as the mouse position instead of the original one. You could use fancier prediction methods such as machine learning. And this technique can even compensate for latency that occurs in the input device itself, before input events reach your application, given accurate timestamps on the input events.
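
A minimal constant-velocity version might look like this sketch; the timestamps are assumed to come from your input events and from however you estimate when the frame will reach the display.

```cpp
// Constant-velocity extrapolation of the mouse position to the time the frame
// is expected to become visible. Timestamps are in seconds.
struct MouseSample { double x, y, t; };

MouseSample PredictMouse(const MouseSample& prev, const MouseSample& curr,
                         double predictedDisplayTime) {
    double dt = curr.t - prev.t;
    if (dt <= 0.0) return curr;                      // can't estimate a velocity
    double vx = (curr.x - prev.x) / dt;
    double vy = (curr.y - prev.y) / dt;
    double ahead = predictedDisplayTime - curr.t;    // how far into the future to extrapolate
    return { curr.x + vx * ahead, curr.y + vy * ahead, predictedDisplayTime };
}
```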

This technique is often used by drawing applications with stylus input to keep the drawn line close to the physical stylus in spite of input latency. Since the predictions are not exact, the line must be moved from the predicted position to the actual position later on once it is known. Another example of this technique is in VR APIs that predict head motion to reduce head tracking latency and the nausea it causes.

Of course, not all forms of user input can be predicted. And predictions can be wrong, which becomes more likely as the input latency you are trying to compensate for grows larger.

Conclusion

By combining these techniques you can achieve the minimum input latency allowed by the platform you're running on. But depending on the specific platform, there may be more you can do. Part 3 of this series discusses these platform-specific considerations.


Follow me for more posts @modeless