Dev Blog: Under the Hood of the Replication Framework

Foreword

As we enter our ninth year at TruckersMP, it is a great opportunity not just to look back at all we have achieved with the platform over the years, but also to look ahead. We continue to strive towards our ambition of making the MMO (massively multiplayer online) experience we offer more feature-rich and compelling for the community than ever before.

One of the main barriers we face as a modification of American Truck Simulator and Euro Truck Simulator 2 is the way we have developed our platform to work with the games over the years. Whilst the methods used have served us well, allowing us to create and provide all of the features we currently offer, we want to do more.

To deliver some of the most highly requested and anticipated features, such as AI traffic on our servers, we need to change the way things work under the hood. That is where our new Replication Framework project comes into play. Already in development for more than a year, this ongoing project will allow us to deliver the kind of game-changing features the community craves, which is key to remaining the destination of choice for MMO virtual trucking.


Introduction

In this blog, we will be taking a technical deep dive into this new replication framework. You may be curious about what this means, what benefits it will provide us in the short and long term, and why it is taking so long to develop. This dev blog will provide a peek under the hood of that framework. Whilst it will mainly appeal to the tech-savvy users among us given the technical language used, it should broadly provide an overview of some of the challenges that we face whilst developing TruckersMP.

TruckersMP is a multiplayer modification, so we must synchronise the game world between clients. As an MMO, we rely on a client-server architecture: clients connect to a game server, which synchronises the game state between them. The game server has knowledge about the entire game state, including the location of players' vehicles, the accessories required to build them, players' nicknames, the contents of their chats, and so on.

To better understand the concepts involved, it is important to have some knowledge of what packets are. In computer networking, a packet is a unit of data transmitted across a network. Think of it as a message sent from one computer that can be received by another. A packet can contain any data, such as text or binary, but it cannot be too large. If a message is too big, it will be divided into packets and reassembled by the receiver.
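
To make the idea of splitting and reassembling concrete, here is a minimal sketch in C++. The names splitIntoPackets and reassemble are hypothetical and not taken from our code base, and the sketch assumes every packet arrives complete and in order. A real protocol would also carry sequence information in each packet.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Split a message into chunks of at most `maxPayload` bytes each.
std::vector<Bytes> splitIntoPackets(const Bytes& message, std::size_t maxPayload) {
    assert(maxPayload > 0);
    std::vector<Bytes> packets;
    for (std::size_t offset = 0; offset < message.size(); offset += maxPayload) {
        const std::size_t length = std::min(maxPayload, message.size() - offset);
        const auto first = message.begin() + static_cast<std::ptrdiff_t>(offset);
        packets.emplace_back(first, first + static_cast<std::ptrdiff_t>(length));
    }
    return packets;
}

// Concatenate the chunks again. Assumes they are given in their original order.
Bytes reassemble(const std::vector<Bytes>& packets) {
    Bytes message;
    for (const Bytes& packet : packets) {
        message.insert(message.end(), packet.begin(), packet.end());
    }
    return message;
}
```

With a 10-byte message and a 4-byte limit, this produces three packets of 4, 4, and 2 bytes, and reassembling them yields the original message.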


Challenges of developing network games

When developing network games, there are several important factors to consider, including bandwidth usage, CPU usage, security, stability, and reliability. Bandwidth usage refers to the amount of data transmitted over the network, measured in bits per second (b/s). A dedicated server can transmit up to 1 Gb/s, and upgrading beyond that can be expensive. Preparing packets on the server requires CPU time, which can be resource-intensive, especially when preparing packets for 4000 clients 10 times per second. We also want clients to maintain good FPS (frames per second). Security is crucial, as we must ensure that hackers cannot take control of the game server application, which could lead to breaches or to clients' computers being infected. The game server and client must also remain stable and reliable, with entities reliably synchronised to avoid desynchronisation issues.
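
As a rough back-of-the-envelope sketch of what these numbers imply, the snippet below estimates the average packet size a server could afford if it sent one packet to each of 4000 clients 10 times per second over a 1 Gb/s link. The function name is hypothetical, and the calculation deliberately ignores protocol headers and any other traffic on the link.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the largest average packet size (in bytes) that fits
// into the link budget when every client receives `ticksPerSecond` packets
// each second. Protocol overhead is ignored for simplicity.
std::uint64_t maxBytesPerPacket(std::uint64_t linkBitsPerSecond,
                                std::uint64_t clients,
                                std::uint64_t ticksPerSecond) {
    assert(clients > 0 && ticksPerSecond > 0);
    const std::uint64_t bytesPerSecond = linkBitsPerSecond / 8;
    const std::uint64_t packetsPerSecond = clients * ticksPerSecond;
    return bytesPerSecond / packetsPerSecond;
}
```

For 1,000,000,000 b/s, 4000 clients, and 10 updates per second, this gives 3125 bytes per packet. That budget has to cover every entity a client needs to know about in that update.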

Since the project's inception eight years ago, we have employed a simple network code architecture design. Whenever we need to synchronise the state of something, we manually construct a packet with a specific ID and send it to the client. On the client side, the packet is received, read, and the state is applied. Similarly, when a local player vehicle changes position, for example, a packet is constructed, and sent to the server, and the server applies the changes.
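
The following is a minimal sketch of what such a hand-written packet could look like. It is not our actual code: the packet ID, the VehiclePosition struct, and the helper names are assumptions made for illustration, and the sketch assumes both sides share the same byte order and float layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical packet ID; the real IDs are not part of this blog.
enum class PacketId : std::uint8_t { PositionUpdate = 1 };

struct VehiclePosition {
    std::uint32_t vehicleId;
    float x, y, z;
};

// Append the raw bytes of a trivially copyable value to the buffer.
template <typename T>
void writeValue(std::vector<std::uint8_t>& buffer, const T& value) {
    const auto* bytes = reinterpret_cast<const std::uint8_t*>(&value);
    buffer.insert(buffer.end(), bytes, bytes + sizeof(T));
}

// Read a value at `offset`, refusing to read past the end of the buffer.
template <typename T>
bool readValue(const std::vector<std::uint8_t>& buffer, std::size_t& offset, T& out) {
    if (offset > buffer.size() || buffer.size() - offset < sizeof(T)) return false;
    std::memcpy(&out, buffer.data() + offset, sizeof(T));
    offset += sizeof(T);
    return true;
}

// Sender side: manually construct the packet, starting with its ID.
std::vector<std::uint8_t> buildPositionPacket(const VehiclePosition& position) {
    std::vector<std::uint8_t> buffer;
    writeValue(buffer, static_cast<std::uint8_t>(PacketId::PositionUpdate));
    writeValue(buffer, position.vehicleId);
    writeValue(buffer, position.x);
    writeValue(buffer, position.y);
    writeValue(buffer, position.z);
    return buffer;
}

// Receiver side: read the ID, then the fields that belong to that ID.
bool handlePositionPacket(const std::vector<std::uint8_t>& buffer, VehiclePosition& out) {
    std::size_t offset = 0;
    std::uint8_t id = 0;
    if (!readValue(buffer, offset, id)) return false;
    if (id != static_cast<std::uint8_t>(PacketId::PositionUpdate)) return false;
    return readValue(buffer, offset, out.vehicleId) && readValue(buffer, offset, out.x) &&
           readValue(buffer, offset, out.y) && readValue(buffer, offset, out.z);
}
```

Every new synchronised feature needs its own pair of functions like these, and every read must be bounds-checked, because a truncated or malicious packet must never cause the receiver to read past the end of the buffer.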

This architecture has its advantages and disadvantages. On the positive side, bandwidth usage appears to be manageable, and the code is straightforward and low-level, making it easy to create in C++. Security is linked to the packet receiver, and reliability is based on the network library.

However, there are also several downsides to this design. High-level and low-level code are mixed, making it difficult to scale the code base. It is challenging to transmit only modified properties over the network to save bandwidth, and synchronisation is spread across many parts of the code, making it hard to optimise CPU usage on both the in-game client and game server. Utilising more than one CPU core/thread is challenging, adding new features is complicated, and reliability is based on the network library. In reality, bandwidth is not always under control.

Let us clarify each of these points:

(Pro/Con) Bandwidth usage seems to be manageable, but in reality it is not:
We manually create a packet in specific C++ code each time we want to synchronise something, which appears to keep bandwidth usage under control. However, this is not foolproof. With many entities to synchronise, and with synchronisation code spread across the whole code base, there is no real control of bandwidth usage. Each part of the code will try to send packets without any orchestration. Another big problem is that we do not know in real time which systems take the most bandwidth, either per player or globally, and that information is important.

(Pro) The code is simple and low-level:
Creating packets is a straightforward concept to understand. You send them, receive them on the other end, read the packet ID, and apply the specific deserialisation logic for that ID. This simplicity comes from importing an existing network library (e.g., RakNet), creating sockets, and writing packet handlers.

(Pro) It was easy to create in C++:
Using the existing network library and writing packet handlers for synchronisation makes it easy to create the synchronisation code in C++.

(Pro) Security is dependent on the packet receiver:
The deserialisation code is well-written, and security flaws can be avoided if you are careful. However, you must be cautious during packet deserialisation, as a single mistake can introduce security vulnerabilities.

(Pro/Con) Reliability depends on the network library:
We use UDP (User Datagram Protocol) to transmit packets instead of TCP (Transmission Control Protocol). TCP does not allow the sending of unreliable packets: the protocol must ensure that every packet is delivered, which results in additional bandwidth overhead. TCP also always preserves packet order, so when a packet is lost, everything behind it has to wait for the retransmission. For a real-time simulation game, this could lead to high latency. This may be problematic when considering the distance that a player's vehicle can travel during that time. A vehicle travelling at 150 km/h moves about 41.7 metres every second, so with, let's say, 500 ms of latency it can travel roughly 21 metres before the update arrives. Such a delay could cause issues with the placement of players' vehicles, leading to the game becoming unplayable.
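
The arithmetic behind this example can be written down as a small helper. The function name is hypothetical, and it only converts a speed and a latency into a distance.

```cpp
#include <cassert>

// Distance in metres that a vehicle covers at `speedKmh` during `latencyMs`.
double distanceDuringLatency(double speedKmh, double latencyMs) {
    assert(speedKmh >= 0.0 && latencyMs >= 0.0);
    const double metresPerSecond = speedKmh * 1000.0 / 3600.0;
    return metresPerSecond * (latencyMs / 1000.0);
}
```

For 150 km/h and 500 ms this returns about 20.8 metres, which is more than the length of a truck and trailer.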

In contrast to TCP, UDP neither ensures the delivery of packets nor preserves packet order. However, it is possible to construct a reliability layer on top of UDP. Many network libraries offer features such as sending acknowledgement packets, buffering packets until required ones arrive, and packet counting, although constructing such a library can be challenging. It is preferable to use a network library that offers reliability features on top of UDP rather than TCP, because the network library permits the transmission of both reliable and unreliable packets simultaneously. When sending a packet, one can decide whether the network library should guarantee its delivery or not.
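
To illustrate the idea of choosing reliability per packet, here is a heavily simplified sketch of the sender-side bookkeeping such a layer might keep. It is not the API of RakNet or of any other real library: the class and function names are hypothetical, and timers, retransmission, and ordering are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

enum class Delivery { Unreliable, Reliable };

struct OutgoingPacket {
    std::uint32_t sequence;
    Delivery delivery;
    std::vector<std::uint8_t> payload;
};

// Hypothetical sender-side bookkeeping for a reliability layer on top of UDP.
class ReliabilityLayer {
public:
    // Assigns a sequence number. Reliable packets are remembered until they
    // are acknowledged; unreliable ones are sent and forgotten.
    OutgoingPacket send(std::vector<std::uint8_t> payload, Delivery delivery) {
        OutgoingPacket packet{nextSequence_++, delivery, std::move(payload)};
        if (delivery == Delivery::Reliable) {
            pending_.emplace(packet.sequence, packet);
        }
        return packet;
    }

    // Called when the receiver confirms that it got packet `sequence`.
    void acknowledge(std::uint32_t sequence) { pending_.erase(sequence); }

    // Reliable packets that have not been acknowledged yet and would need
    // to be resent after a timeout.
    std::vector<OutgoingPacket> unacknowledged() const {
        std::vector<OutgoingPacket> result;
        for (const auto& entry : pending_) result.push_back(entry.second);
        return result;
    }

private:
    std::uint32_t nextSequence_ = 0;
    std::map<std::uint32_t, OutgoingPacket> pending_;
};
```

A frequently updated position could be sent as Delivery::Unreliable, because the next update replaces it anyway, while a chat message would be sent as Delivery::Reliable.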

(Con) High-level and low-level code is mixed:
Mixing network code with gameplay features can make the code harder to understand and maintain, especially when there are several high-level concepts to synchronise.

(Con) It does not scale well:
Every entity type requires a different set of properties and its own synchronisation code, which makes the code hard to scale. For instance, the synchronisation code for AI vehicles differs from that of player vehicles, and the synchronisation code for traffic lights differs from that of bus passengers. With many entity types, writing separate network code for each of them is challenging and makes the codebase hard to maintain.

(Con) Sending only modified properties to save bandwidth is difficult:
Suppose we have already sent the vehicle light state, and the client has received it. It is then unnecessary to resend the same information if only the vehicle's position has changed. This is hard to achieve when the code manually creates packets for each entity type separately, because each handler would need its own logic for tracking and sending only the modified properties.
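
One common way to track modified properties is a bitmask of "dirty" flags. The sketch below shows the idea for a vehicle with two properties. The class and flag names are hypothetical, and clearing the mask right after sending is only safe if the update is guaranteed to arrive.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical property flags for a replicated vehicle.
enum VehicleProperty : std::uint8_t {
    PropPosition = 1 << 0,
    PropLights = 1 << 1,
};

class ReplicatedVehicle {
public:
    void setPosition(float x, float y, float z) {
        if (x == x_ && y == y_ && z == z_) return;  // nothing changed
        x_ = x;
        y_ = y;
        z_ = z;
        dirty_ |= PropPosition;
    }

    void setLights(bool on) {
        if (on == lightsOn_) return;  // nothing changed
        lightsOn_ = on;
        dirty_ |= PropLights;
    }

    // Returns the set of properties modified since the last call and resets
    // it. Only these properties would need to be written into the next update.
    std::uint8_t takeDirtyMask() {
        const std::uint8_t mask = dirty_;
        dirty_ = 0;
        return mask;
    }

private:
    float x_ = 0.0f, y_ = 0.0f, z_ = 0.0f;
    bool lightsOn_ = false;
    std::uint8_t dirty_ = 0;
};
```

After the lights and the position have both been sent once, a later position change sets only the PropPosition bit, so the light state is not transmitted again.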

(Con) Synchronisation is spread throughout the codebase, making it challenging to optimise CPU usage:
Since synchronisation code is placed in many parts of the code, it is difficult to optimise CPU usage. If the synchronisation code were placed together, it would be easier to measure and optimise CPU usage. Additionally, it is challenging to use more than one CPU core/thread for synchronisation, which can result in lag in the game.

(Con) Hard to add new features:
Whenever you need to add a new synchronised feature, you need to write low-level network synchronisation code again. It is easy to make mistakes there, but that is not the biggest problem. Because bandwidth is limited for dedicated servers (1 Gb/s, let's say), you cannot send too much data to a specific client. You only want to send data that the client should be interested in, such as nearby players. It is hard to maintain, for every single thing, the logic that synchronises properties to a client that has arrived at a specific location or just joined the game, and that despawns the thing when the client leaves the location. Complexity is high. Also, unreliable packets might not be received by the other side, and you do not know whether they were received or whether you should resend the modified property data. You can ensure reliability by sending reliable packets, but every time you do that, it comes at a cost.
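
The "only send what the client is interested in" logic can be sketched as two small steps: find the entities within some radius of a client, then compare that set with the previous one to decide what to spawn and despawn. All names below are hypothetical, and a flat 2D distance check stands in for whatever relevance rule a real server would use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

struct Entity {
    std::uint32_t id;
    float x;
    float y;
};

// IDs of all entities within `radius` of the viewer, excluding the viewer.
// Squared distances are compared to avoid a square root.
std::set<std::uint32_t> relevantEntities(const Entity& viewer,
                                         const std::vector<Entity>& all,
                                         float radius) {
    std::set<std::uint32_t> result;
    const float radiusSquared = radius * radius;
    for (const Entity& entity : all) {
        if (entity.id == viewer.id) continue;
        const float dx = entity.x - viewer.x;
        const float dy = entity.y - viewer.y;
        if (dx * dx + dy * dy <= radiusSquared) result.insert(entity.id);
    }
    return result;
}

struct InterestChange {
    std::vector<std::uint32_t> spawn;    // newly relevant entities
    std::vector<std::uint32_t> despawn;  // entities that are no longer relevant
};

// Compare the previous and current relevant sets for one client.
InterestChange diffInterest(const std::set<std::uint32_t>& previous,
                            const std::set<std::uint32_t>& current) {
    InterestChange change;
    std::set_difference(current.begin(), current.end(), previous.begin(), previous.end(),
                        std::back_inserter(change.spawn));
    std::set_difference(previous.begin(), previous.end(), current.begin(), current.end(),
                        std::back_inserter(change.despawn));
    return change;
}
```

Writing this once for all entity types, instead of repeating it for every feature, is exactly the kind of work a general replication layer is meant to take over.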


Towards the replication framework

Eventually, the aforementioned issues will prompt the need for a general solution for synchronising game objects and their properties, which is commonly known as replication. The replication code provides an interface for replicating entities and their properties. Its architecture is straightforward:

  • The server starts the replication session.
  • Clients connect to the replication session.
  • The server creates a vehicle object for the client.
  • The server adds the vehicle object to the replication session.
  • The replication session creates generic packets and sends them to ensure that clients receive the entities they are interested in and their updates.
  • When a client disconnects, the vehicle is destroyed and removed from the replication session on the server.
  • The replication session ensures that the vehicle is removed from all clients.
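
The steps above can be sketched as a tiny replication session. This is an illustration under simplifying assumptions, not our framework: all names are hypothetical, every client is treated as interested in every entity, and property updates and serialisation are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

using ClientId = std::uint32_t;
using EntityId = std::uint32_t;

enum class MessageKind { Spawn, Despawn };

// A generic message that the session would serialise into a packet.
struct Message {
    MessageKind kind;
    ClientId target;
    EntityId entity;
};

class ReplicationSession {
public:
    void connect(ClientId client) { known_.emplace(client, std::set<EntityId>{}); }
    void disconnect(ClientId client) { known_.erase(client); }
    void addEntity(EntityId entity) { entities_.insert(entity); }
    void removeEntity(EntityId entity) { entities_.erase(entity); }

    // Compares what each client already knows with the server's entity set
    // and emits the spawn and despawn messages needed to reconcile them.
    std::vector<Message> tick() {
        std::vector<Message> out;
        for (auto& entry : known_) {
            const ClientId client = entry.first;
            std::set<EntityId>& known = entry.second;
            for (EntityId entity : entities_) {
                if (known.insert(entity).second) {
                    out.push_back({MessageKind::Spawn, client, entity});
                }
            }
            for (auto it = known.begin(); it != known.end();) {
                if (entities_.count(*it) == 0) {
                    out.push_back({MessageKind::Despawn, client, *it});
                    it = known.erase(it);
                } else {
                    ++it;
                }
            }
        }
        return out;
    }

private:
    std::set<EntityId> entities_;                    // authoritative server state
    std::map<ClientId, std::set<EntityId>> known_;   // what each client has been told
};
```

Gameplay code only calls addEntity and removeEntity. Deciding which client needs which message happens in one place, inside tick.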

This design simplifies the network code architecture and provides a single point of responsibility for synchronising game objects. While this design has numerous advantages, including controlled bandwidth usage, simplified code, and better security, it also has some disadvantages, such as the difficulty in developing a reliable replication framework and the potential for bugs. It is worth noting that the replication session is only one part of the framework, and there are several other aspects, including support for the internal Entity Component System, various data types, replication graph, and Remote Procedure Calls.


Conclusion

We hope this blog provided a useful insight into the work our Game Development team is doing behind the scenes. In the short term, implementing the replication framework will lead to clear, high-level, and manageable code, making it easier to synchronise existing game features and new ones. In the long run, it will enable the addition of more complex game features, such as AI traffic, a shared economy, and a global reporting system.

TruckersMP Team


Author

Community Management