Have you ever wondered how real-time games can keep multiple clients in sync even when there are large latencies between users? How can you see other players reacting to your actions near instantly, in spite of the fact that the communication between your computer and the server is not instant?
In one of my recent personal projects I made a game engine with real-time networking. In this article I’ll break down what I learned and what you’ll have to consider if you’d like to do the same.
Understanding The Problem
First, let’s pin down a concrete example of the problem we intend to solve. Almost all real-time video games consist of a collection of clients and a single server that runs the game. The server is the “source of truth” for the system, and all communication must flow through it. Clients are only connected to each other through the server itself, which operates as a relay between them.
Data is sent in discrete chunks to and from each client to keep everyone updated as the game unfolds. Another player moved? Send everyone else a message to keep them updated. A new player joined? Every client is notified. The key point here is that messages are not free: they have real transit time between the client and the server.
Example of packets being sent between two clients and a server. Packets take time to send (their latency) and act as discrete blobs of data. This is the avenue we have to operate through when syncing our game state.
A client can then visualize their experience through a series of images displayed to their screen. In most games this entails things moving around (like players and physics objects) and animations unfolding.
If we target 60 images per second, which is standard, that gives each client roughly 16ms between frames. That is a shockingly brief 0.016s. To put that in perspective, light can barely make it across the continental United States in that time, yet we have to keep two clients in sync through networks that may span more than half of the circumference of the globe.
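The arithmetic behind those numbers is quick to check. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it uses the speed of light in a vacuum, which real network links never reach.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # in a vacuum; signals in fiber are slower


def frame_budget_ms(frames_per_second: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / frames_per_second


def light_distance_km(milliseconds: float) -> float:
    """Distance light travels in a vacuum in the given time."""
    return SPEED_OF_LIGHT_KM_PER_S * milliseconds / 1000.0


budget = frame_budget_ms(60)        # about 16.7 ms
reach = light_distance_km(budget)   # about 5,000 km
print(f"{budget:.1f} ms per frame, light covers ~{reach:.0f} km in that time")
```

The continental United States is roughly 4,500 km coast to coast, so a single frame is about the time light needs for one crossing, and real packets take longer.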
To complicate things further, games are often run on hardware with vast differences in processing power. So even if we target 60 frames per second, the actual rate can vary wildly between clients, leading to frequent interruptions in connections.
All told, we must design a system that meets the following requirements:
- Renders a frame every 16ms, where objects’ positions vary continuously and smoothly
- Clients can easily communicate with each other even when there may be huge latencies due to hardware or network troubles
- Players can see each other move smoothly and respond to each other’s actions
- The server is the source of truth and can authoritatively run the game and trigger game events
- Is fun
I am not sure if I can help with the last point, but the other four are possible with modern systems even if we have to cheat a bit. We have a few tricks we can use to make this system buttery smooth:
- Clients render predicted in-between frames, filling in data between server syncs.
- Clients predict the game state ahead of time, guessing what the server will say. This cuts down perceived latency but leads to jitter when the guess is wrong.
- Lots of smoothing: averaging positions and baking in offsets so that jerky movements look smooth and continuous.
- Client-side AI systems simulate other players’ responses to client actions.
Combined, these processes allow a player to experience a game with nearly frame-perfect responses even when it could be hundreds of ms before any real data is sent to and received from the server.
What we are starting from
As a benchmark, I created a tool that helps visualize how a client stays up to date with a server through a series of sync operations. You can vary how often the server tries to sync, how much latency is in the connection, and toggle on various algorithms to help smooth it out.
The most basic, simple approach would be for the client to render whatever the server said most recently as the “true” state of the system. If your system has low latency, this is a perfect solution, but we are not fortunate enough to live in such a beautiful world.
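In code, the naive approach is about as small as a client can get. This is a minimal sketch, not the simulator’s actual code: `NaiveClient`, the one-dimensional position, and the “sync every third frame” schedule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NaiveClient:
    """Renders whatever position the server reported most recently."""
    rendered_position: float = 0.0

    def on_sync(self, server_position: float) -> None:
        # The packet describes the past: it left the server one latency ago.
        self.rendered_position = server_position

    def render(self) -> float:
        # No guessing between syncs, so the value stays frozen until the next packet.
        return self.rendered_position


client = NaiveClient()
frames = []
for frame in range(6):
    if frame % 3 == 0:  # hypothetical: a sync arrives every third frame
        client.on_sync(float(frame) * 10.0)
    frames.append(client.render())
# frames == [0.0, 0.0, 0.0, 30.0, 30.0, 30.0]
```

The flat runs followed by a sudden jump are the long still stretches and snaps you get whenever syncs are less frequent than frames.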
Here is a visualization of that simple approach. Medium doesn’t take kindly to animations like this, so excuse the poor quality. You can try out the simulator here.
Cyan here represents what is “rendered” on the client and lime represents what the client has witnessed on the server. Notice that the client never sees the current state: because of latency, it only ever receives old data.
The white line shows what the current frame is and what its positional data is (y axis). In this example it stays the same for a stretch then immediately jumps to some new value.
The client in this example will see long stretches of still frames and then sudden jumps to new values. These values are not even aligned with the server!
If you want to try the visualizer yourself, try it here.
Client side interpolation
Interpolation is the most important part of smoothing out the gameplay experience. The server is the source of truth in the system, but it sends data in discrete packets. We will call these blobs of data “sync events”. They provide the client with data about the current game state, such as where the other players are or whether that last attack actually hit them.
These sync events will happen as often as the server can send them, but in reality that will not be every frame. There can and will be gaps where a client simply hasn’t heard from the server in a bit but still needs to render new data on the screen.
What to do? Simply make it up!
Well, not exactly; call it an educated guess. The client can try to fill in the gaps while the server is silent. The resulting frames are called interpolation frames. For example, if you provide the client a complete copy of the physics engine, it can make highly accurate guesses as to where a falling rock will be next frame. Even if the server has final say in where it actually lands, the client can do a good job for now, and simply adjust it later when the truth arrives from the server.
In practice, the server will send you not just an object’s position, but its velocity, its acceleration, its bounding box, and its rotation. With this, a client can run its own physics simulations for quite a while without needing to hear from the server.
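Here is a minimal sketch of that idea for a single object in one dimension. `SyncEvent` and `extrapolate` are hypothetical names, and I am assuming constant acceleration between syncs, which is far simpler than running a full physics engine.

```python
from dataclasses import dataclass


@dataclass
class SyncEvent:
    """One-dimensional stand-in for the data a server might send per object."""
    time: float          # server time of the snapshot, in seconds
    position: float
    velocity: float
    acceleration: float


def extrapolate(sync: SyncEvent, now: float) -> float:
    """Guess the position at time `now` using constant-acceleration kinematics."""
    dt = now - sync.time
    return sync.position + sync.velocity * dt + 0.5 * sync.acceleration * dt * dt


# A rock dropped from 100 m, falling under gravity.
rock = SyncEvent(time=0.0, position=100.0, velocity=0.0, acceleration=-9.81)
print(extrapolate(rock, now=1.0))  # about 95.1 m after one second
```

The guess is only as good as the assumption behind it. The moment the rock hits something the client did not know about, the prediction drifts until the next sync corrects it.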
A simplified diagram for interpolation
The green line is a randomly varying function, but the client is given its current velocity and acceleration during the syncs, giving it a good but not perfect guess of where it will be.
If we apply this same approach to our simulator, we get a noticeable improvement right off the bat!
So what are we still missing? Why does the rendered line still miss the true one? Well, there will always be events the client could never have predicted. How is a client going to predict that a new player joined the game or what that player did? How could it predict what random weapon drop it gets if it’s truly random? Even with perfect simulation, there are still floating-point inaccuracies and other tiny discrepancies between machines that will cascade. But for a few frames, or even a dozen, the client can make believable guesses as to what’s going on on the server.
Predicting game states in advance
Client side interpolation on its own is not enough. Since there is latency, the updates provided by the server are delayed. Client actions will feel very sluggish if the client has to wait until the server gets around to acknowledging them. If you are 50ms behind the server, at best your actions will be responded to 100ms later (the message has to go to the server and then back). For a fast-paced game this is unplayable. Imagine that you press to move forward and your character doesn’t start moving for 1/10th of a second. This is massive. How can we cut this delay down?
Just as we guessed data between syncs, we can “guess” frame data ahead of the sync events themselves. If a client can reliably render a frame every 10ms and has a latency of 100ms, it is continuously 10 frames behind. So we simply guess 10 more frames ahead and catch up.
As before, this interpolation will not be perfect, but for our client we just cut out 100ms of input latency. And the key change: we actually start to touch the true state, although rarely!
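The frame count and the look-ahead can be sketched in a few lines. `frames_behind` and `predict_ahead` are hypothetical helpers, and the constant-velocity step is an assumption standing in for whatever simulation the game really runs.

```python
import math


def frames_behind(latency_ms: float, frame_time_ms: float) -> int:
    """How many rendered frames fit inside the latency window (rounded up)."""
    return math.ceil(latency_ms / frame_time_ms)


def predict_ahead(position: float, velocity: float,
                  latency_ms: float, frame_time_ms: float) -> float:
    """Step a constant-velocity guess forward one frame at a time."""
    dt = frame_time_ms / 1000.0
    for _ in range(frames_behind(latency_ms, frame_time_ms)):
        position += velocity * dt
    return position


print(frames_behind(100, 10))  # 10 frames to make up
```

Stepping frame by frame instead of jumping by the whole latency in one go keeps the look-ahead on the same code path as normal simulation, which matters once the per-frame step is more than a single multiplication.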
By shifting the prediction forward by the latency, the rendered data is now perfectly correct more often! Again, the simulator is here.
Now whenever we receive a sync, we overwrite our state with the outdated server sync, then quickly interpolate forward to where we were before. The key point is that the cyan line is starting to touch the true game state. We are getting closer!
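A rough sketch of that overwrite-then-catch-up step might look like this. The frame numbers on each sync, the fixed 10ms step, and the names `State`, `step`, and `reconcile` are all assumptions for illustration. A real game would also replay the player’s own inputs while re-simulating, which I leave out here.

```python
from dataclasses import dataclass

FRAME_DT = 0.01  # hypothetical fixed step: 10 ms per frame


@dataclass
class State:
    position: float
    velocity: float


def step(state: State) -> State:
    """Advance one frame with the same rule the server is assumed to use."""
    return State(state.position + state.velocity * FRAME_DT, state.velocity)


def reconcile(server_state: State, server_frame: int, client_frame: int) -> State:
    """Overwrite with the (old) server state, then re-simulate up to the present."""
    state = server_state
    for _ in range(client_frame - server_frame):
        state = step(state)
    return state


# The sync describes frame 90, but the client is already rendering frame 100.
current = reconcile(State(position=9.0, velocity=10.0), server_frame=90, client_frame=100)
print(current.position)  # roughly 10.0
```

If the client’s earlier guess for frame 100 differed from this result, the difference is exactly the jump you see in the cyan line.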
Yet we still have major discontinuous jumps in our rendering! How can we close these so the client is not given whiplash?
Everything we have shown has tremendously cut down our perceived latency. Our client is running on a 50ms connection, which should feel 3 or 4 frames delayed, but now they have no perceived latency thanks to our expert interpolation.
A major problem we still have is that everything is rubber-banding everywhere and objects seem to teleport ever so slightly all the time. We can see this in the “jumps” of the cyan line. Basically, whenever we get a sync event, the game state instantly changes to the updated version. We witness this as every object snapping and jumping around each time we receive a sync event from the server.
To fix this, we can keep two separate copies of every entity’s position on our client. One is the “true copy” that is instantaneously overwritten by the server states and interpolation data. The other is the “render copy”, which is where we actually display it. This render position is never fully correct, and we know it, but by making this distinction we can smoothly and continuously move the render copy toward the true copy between frames, visually erasing any artifacts from interpolation.
There are still slight issues at sync points, but for 100ms of latency, the predictive line (cyan) is very close to the true value for almost the entirety of the simulation. Scroll up and compare this to the original.
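One simple way to do this is to move the render copy a fixed fraction of the remaining distance every frame. This is a minimal sketch under that assumption; `SmoothedEntity` and the `blend` factor of 0.2 are illustrative choices, not values taken from the simulator.

```python
from dataclasses import dataclass


@dataclass
class SmoothedEntity:
    true_position: float = 0.0     # overwritten instantly by syncs and prediction
    render_position: float = 0.0   # what is actually drawn

    def on_sync(self, position: float) -> None:
        self.true_position = position  # may jump discontinuously

    def update_render(self, blend: float = 0.2) -> float:
        """Move the render copy a fixed fraction of the way to the true copy."""
        self.render_position += (self.true_position - self.render_position) * blend
        return self.render_position


entity = SmoothedEntity()
entity.on_sync(10.0)  # the true copy snaps from 0 to 10
path = [round(entity.update_render(), 2) for _ in range(4)]
# path == [2.0, 3.6, 4.88, 5.9]
```

The true copy jumps in one frame, while the render copy glides toward it. Note that a per-frame blend factor behaves differently at different frame rates, so a real game would likely scale it by the frame’s elapsed time.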
With this render copy, some corrections can even be ignored client side. If the render copy is already within some small distance of the true copy, that’s close enough. Just leave it.
In a real game, some values, like the direction the client is looking, should be left out of syncing entirely so the client can choose them as it sees fit. For those we can simply forget about the true value and let the render copy do whatever it pleases.
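A “close enough” check like this is only a couple of lines. The function name, the tolerance of 0.05 units, and the blend factor are all hypothetical; the right values depend entirely on the game.

```python
def smooth_with_dead_zone(render: float, true: float,
                          tolerance: float = 0.05, blend: float = 0.2) -> float:
    """Leave the render copy alone when it is already within `tolerance`."""
    error = true - render
    if abs(error) <= tolerance:
        return render  # close enough: skip the correction entirely
    return render + error * blend


print(smooth_with_dead_zone(1.0, 1.03))  # 1.0, the small error is ignored
```

Skipping tiny corrections avoids constant micro-adjustments that the player would only ever notice as shimmer.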
Depending on the game and how it weighs accuracy against feel, the degree to which different objects’ states reflect their “true” or “render” values is entirely up to the developer. For high stakes games where reaction times are paramount, such as fighting games, some smoothing may be forsaken in favor of increased accuracy. In other games where the feel is more important, much more smoothing can be applied.
Finally, some games go even further and use AI to simulate other players’ actions locally. Note that this is not the same as AI frame interpolation (when an AI generates the next frame’s raw pixels). This is an AI taking over a specific system or player for the 3 or 4 frames it takes for the server to get back to the client about its true position. These predictions can be more lifelike than traditional interpolation.
Imagine a laggy client walking around a corner in a shooter game. Without interpolation it may take 200ms for that player to “react” to whoever is waiting there to shoot them. They would turn the corner, see the enemy on their screen, and input commands to jump back around the corner to safety. The server, which did not receive that jump back in time, would tell them that they died. This results in the player getting shot after they jumped back to safety. With AI interpolation, the AI, after learning a bit about that specific user throughout the match, may anticipate that the client will jump backwards and begin to render it as such.
If these predictions are highly accurate (big “if”), then the AI has cut hundreds of milliseconds and more than a dozen laggy frames from the experience of everyone else in the match. If it predicts wrong, it can be worse than no prediction at all. As with most things, it’s important to consider the ramifications of this technique when determining where to implement it.
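To make that trade-off concrete, here is a deliberately tiny stand-in for such a predictor. It is not a real AI model: it just counts how a player reacted in a given situation before, and it refuses to guess unless it has seen enough confirmed samples. The class name, the situation and action labels, and both thresholds are assumptions made purely for illustration.

```python
from collections import Counter
from typing import Dict, Optional


class ReactionPredictor:
    """Toy stand-in for a learned model: remembers how a player reacted before."""

    def __init__(self, min_confidence: float = 0.8, min_samples: int = 5) -> None:
        self.history: Dict[str, Counter] = {}
        self.min_confidence = min_confidence
        self.min_samples = min_samples

    def observe(self, situation: str, action: str) -> None:
        """Record a reaction that the server has confirmed."""
        self.history.setdefault(situation, Counter())[action] += 1

    def predict(self, situation: str) -> Optional[str]:
        """Return the likely action, or None when the guess is too uncertain."""
        counts = self.history.get(situation)
        if not counts:
            return None
        total = sum(counts.values())
        action, seen = counts.most_common(1)[0]
        if total < self.min_samples or seen / total < self.min_confidence:
            return None  # fall back to ordinary prediction
        return action


predictor = ReactionPredictor()
for _ in range(9):
    predictor.observe("enemy_at_corner", "jump_back")
predictor.observe("enemy_at_corner", "shoot")
print(predictor.predict("enemy_at_corner"))  # jump_back (seen 9 times out of 10)
print(predictor.predict("grenade_nearby"))   # None, never observed
```

Returning `None` when the evidence is thin is the important part: it lets the client fall back to plain prediction instead of rendering a confident guess that turns out to be wrong.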