Serverless Chess - Discussion and Analysis

1642286260363

Background

On Tuesday, December 21st, the most popular Youtube chess personality, Agadmator, hosted the world's largest online chess competition on lichess.org. With a prizepool of 1 BTC, it attracted even the most successful chess players of modern day, such as world champion Magnus Carlsen, along with many other grandmasters.

This tournament was much larger than anything Lichess had handled before, and it was noticeable from the getgo. Right off the bat, there was apparent lag and delays, especially when it came to finding a new opponent after a previous match concluded. After about an hour, the servers completely crashed and the tournament was postponed.

Before going into further discussion, let's first acknowledge that Lichess is a non-profit organization. Therefore, they do not have the same kind of budget that a corporation would have when it comes to servers and architecture. Additionally, they provide all of their services without any payment requirements or advertisements, so there is no reason to harp on them for their servers going down.

Truthfully, I had started working on serverless chess about 2 months before this happened, so I can't act as though this event is what inspired me to start the project. However, even when I started the project, I recognized that the implementation of chess in a serverless manner could solve problems like this. Later in this blog post, I'll talk about and analyze the benefits of serverless chess, but first I'd to discuss more details about the project.

Limitations

Before I discuss any limitations, I would like to make it clear that this project, while functional, is not intended to be used in any production environment. I made this project as a "demo" to show that it can be done, and to provide some data about it.

Time: I started this project in late October, and am finishing it now in mid January. My main deadline being that I'd like to apply to a few more jobs before my last semester of college is over. The whole duration of this project, I was either in school or on break for the holidays, meaning I could never put in more than a few hours per week. In the "Future Directions" section below, you will see a ton of features that really should be there in the first place, and that I could easily implement within a few more hours, but couldn't do so within the time frame.

Speed: Writing chess in a serverless manner is undoubtedly going to make it a little bit slower. Since there is no consistent server to store memory, the game state has to be stored in a database (I chose DynamoDB). Loading the game from the database, parsing the moves, calculating new legal moves, and writing it back to the database for every move is going to add a bit of overhead time. Truthfully, if an implementation of serverless chess is only a little bit slower than chess with a dedicated server, then I consider that a win.

People: Since I made this project alone, and I don't specialize in everything, some parts of the project are not as great as they could be. For example, developing a front-end client for chess is not easy, so I made a pretty basic and featureless front end, which still took a long time. Additionally, the algorithms that I choose to use on the backend for parsing moves, calculating legal moves, calculating checkmate and stalemate... may not be the fastest. They certainly work, but this is my first time writing a chess backend and I imagine that there may be more efficient ways to do what I have done.

Speed analysis

As I discussed earlier, chess implemented in a serverless manner is bound to be a bit slower than a server implementation. But how much slower? Well, keep in mind that I likely didn't use the absolute fastest algorithms in the world, but it still came out pretty fast.

I'm going to be comparing the move-time of my serverless chess implementation with the move-time speed of the two biggest chess websites, Lichess and Chess.com. Now, there's something important we have to keep in mind here. The majority of the move-time latency in a chess game is just standard latency. It is likely that Chess.com and Lichess servers are powerful and well optimized, so the actual time it takes for a move to happen is essentially just the base time it takes for the electricity to travel from client to server.

Lichess speed analysis

In order to see how fast Lichess was, I connected my Firefox web browser to OWASP Zed Attack Proxy, so I could monitor the exact timings of requests and responses. Whenever I made a move, I would see a Websockets request that looks like this:

Followed by a response from the server that looks like this:

With a sample size of 10, I found that the responses came back on average within 157ms.

Chess.com speed analysis

When I made a move on Chess.com, I would send a Websockets request that looks like this:

And I would receive back a packet that looks like this:

With a sample size of 10, I found that the responses came back on average within 83ms.

Let's keep in mind that this does not necessarily mean that Chess.com is faster than Lichess. More likely, I am physically located closer to Chess.com's servers than Lichess's servers. Regardless, if I can get speeds close to the range of 83ms - 157ms, then I can say that I am on par with the best!

My serverless chess speed analysis

I can control how much memory / cpu power is allocated to the main "make move" lambda function, from a minimum of 128mb of memory to a maximum of 10240mb. I have no doubt that my program will be quite fast on the maximum setting, but that's also going to be very expensive. I'll do some testing to see if I can find the lowest memory allocation possible that still gives fast speeds.

Test #	Memory size	Average move time
1	128mb (min)	570ms
2	10240mb (max)	136ms
3	1024mb	144ms
4	512mb	205ms
5	1536mb	143ms

I don't really feel the need to keep testing until I find the exact right memory size. It seems that 1024mb is a good spot, as going below it adds a lot of time but going above it doesn't really decrease the time.

What I can also say is that... this is pretty good! I am actually getting faster speeds on average to my serverless chess game than I get on Lichess, but I am getting nowhere near the 83ms of Chess.com. And keep in mind that there is plenty of room for optimization in my implementation that could lead to even faster speeds, but probably not by a massive margin.

Cost analysis

In order to calculate the average cost of a chess game, we need to know what the average chess game is. According to various sources, the average chess game seems to be about 40 moves. It's difficult to find out how long (time-wise) the average chess game is, but based on this blog post, it looks like the average time control is about 3 minutes, so 6 minutes max per game. And most games will finish before the maximum time, so it is safe to call 6 minutes a good upper bound for average game time.

So let's look at some cost statistics from AWS.

Lambda: $0.0000166667 per GB-second, $0.20 per million requests
DynamoDB (on demand): $1.25 per million write request units, $0.25 per million read request units
API Gateway (Websockets): $1.00 per million requests, drops to $0.80 after a billion requests, $0.25 per million connection minutes

Now let's do the math (sorry for poor formatting).

Cost of Lambda: 40 moves * ((1 GB * 0.2 seconds * $0.0000166667 per GB-second) ($0.2 * 10^(-6)) per move = $0.0001413336 per game
Cost of DynamoDB: Each move is one database read and write, so 40 moves * ($1.25 * 10^(-6) $0.25 * 10^(-6)) per move = $0.00006 per game
Cost of API Gateway: Each move results in three Websockets messages (player sends a message to make move, server responds to both players with move), so 40 moves * 3 requests per move * $1.00 * 10^(-6) per request 6 minutes * $0.25 * 10^(-6) per minute = $0.0001215 per game

Therefore, a very estimated average per game would be $0.0003228336. More importantly, a million games would cost $323.

According to Lichess database, they have about 90,000 games per month, which would cost about $29.

I am not saying that Lichess could be operating off of $29 a month. Lichess offers so much more than just the basic ability to play the game. They offer free computer analysis, tournaments, bots, etc, not to mention the cost of just hosting the website. However, I do believe that if Lichess could manage to go serverless, that they could cut their costs significantly.

Final Discussion

From these results, I would say that my project is successful. I have shown that serverless chess architecture can be fast and cheap, not to mention the other benefits. With serverless architecture also comes infinite scalability, meaning that there would not be any problems if an excess of players suddenly started playing.

Future Directions

As I mentioned before, this project is overall lacking in a lot of features, mainly due to time constraints. Here are some things I would have liked to implement if I had more time to work on the project.

Backend:
- When creating a game, choosing if you should start as white or black.
- Lobby system: when creating a game, you can make it public and other players can see a list of all open public games.
- Implement time and different time controls.
- Allow games to be spectated by third parties.
- Allow players to view/analyze a game that has concluded
Frontend:
- Allow moves to be made my dragging or clicking pieces and squares.