How legacy code and server queries brought down Diablo 2: Resurrected’s servers, and how Blizzard intends to fix it
If you have been playing Diablo 2: Resurrected, then you likely already know that the game has had some significant issues with its servers. Even if they haven’t affected you personally, they’re a hot topic of discussion, and for understandable reasons: these server issues have had some pretty drastic effects on players. From losing character progression to simply being unable to play the game at all, server issues aren’t fun for anyone.
Now Blizzard has posted details on what the issues are and how they’re working to resolve them, and it’s pretty interesting reading. I’m not going to try to reproduce it all here, but the recap of what actually happened to the servers is simultaneously unprecedented and unsurprising. Blizzard’s regional servers link to a global database that saves your game progression. Just after Blizzard implemented an update intended to improve performance around game creation, the servers came under unexpectedly high load, which they describe as a sudden surge in traffic greater than anything they’d seen even when the game launched. Under that load, the connection to the global database timed out. This led to a cascading series of failures that forced them to roll back their update and try to reestablish a stable network.
This worked, but the very next day, an even higher surge in traffic caused the system to crash again. Because they’d fixed a similar problem the day before, they had everything up and running quickly. That sounds great, but it actually made things worse: the network now had to handle all the new traffic that had crashed it in the first place while also catching up on everything it had missed while it was down, and I think you can see where this is going. The effort to correct all of this with fixes to the backup global database led to more issues, which they are still working on fixing as of this writing. Again, if you’re really interested, check out their post; the details are fascinating.
But the real culprit of all this? The original code to Diablo 2.
Why this is happening:
In staying true to the original game, we kept a lot of legacy code. However, one legacy service in particular is struggling to keep up with modern player behavior.
This service, with some upgrades from the original, handles critical pieces of game functionality, namely game creation/joining, updating/reading/filtering game lists, verifying game server health, and reading characters from the database to ensure your character can participate in whatever it is you’re filtering for. Importantly, this service is a singleton, which means we can only run one instance of it in order to ensure all players are seeing the most up-to-date and correct game list at all times. We did optimize this service in many ways to conform to more modern technology, but as we previously mentioned, a lot of our issues stem from game creation.
So what we have here is a combination of issues. The legacy code wasn’t designed for the way modern gamers are using it. Back in 2000, players couldn’t easily find information on how to maximize their game productivity the way they can now, and as a result people are starting new games at a pace that simply didn’t exist 21 years ago. Add to that new code that does things like save to the global database more often than is strictly necessary, which generates even more traffic, and eventually you’re constantly crashing the database.
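To illustrate why saving more often than necessary matters, here is a minimal sketch of one common way to cut down on redundant writes: buffer saves and send only the latest state for each character. This is an assumption for illustration, not Blizzard’s actual fix; the `CoalescingSaver` class, its `write` callback, and the character names are all hypothetical.

```python
class CoalescingSaver:
    """Buffer character saves and write only the latest state per character.

    Hypothetical illustration: `write` stands in for an expensive call
    to a shared database.
    """

    def __init__(self, write):
        self.write = write   # callable(character_id, state)
        self.pending = {}    # character_id -> most recent unsaved state

    def save(self, character_id, state):
        # Overwrite any earlier pending state; nothing is sent yet.
        self.pending[character_id] = state

    def flush(self):
        # Send one write per character with pending changes, then clear.
        count = 0
        for character_id, state in self.pending.items():
            self.write(character_id, state)
            count += 1
        self.pending.clear()
        return count


# Usage: three saves collapse into two writes.
writes = []
saver = CoalescingSaver(lambda cid, state: writes.append((cid, state)))
saver.save("sorceress", {"level": 1})
saver.save("sorceress", {"level": 2})
saver.save("paladin", {"level": 5})
flushed = saver.flush()
```

The tradeoff is that anything still sitting in the buffer is lost if the server goes down before the next flush, so how often to flush is a balance between database load and how much progress a player could lose.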
The problem didn’t show up in the beta test in a way that allowed them to understand how it would play out live, and as a result they’re now working to fix the issues it caused. However, they can’t just rip out that legacy code from the original game and write a whole new system for creating a new game session on the fly.
There are multiple solutions in place or being worked on: limiting the rate of queries to the global database, adding login queues to keep traffic under control and prevent the spikes and surges that overtax that legacy code, and breaking out functions that were once all part of that single legacy service into multiple systems so it can handle more of them without issue. Work is ongoing, but as you can expect from such a complicated series of issues, it’s not an immediate fix. You can check out all of their efforts in Blizzard’s post, and by now hopefully the worst of it is over.
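For readers curious what rate limiting and a login queue look like in practice, here is a minimal sketch of both ideas. It is not Blizzard’s implementation: the token bucket is a common general-purpose rate-limiting technique swapped in for illustration, and the `TokenBucket` and `LoginQueue` classes, their parameters, and the player names are all hypothetical.

```python
from collections import deque


class TokenBucket:
    """Allow about `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum stored tokens
        self.tokens = capacity    # start with a full bucket
        self.last = now           # timestamp of the last refill

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class LoginQueue:
    """Hold players in arrival order and admit a fixed number per tick."""

    def __init__(self, admit_per_tick):
        self.admit_per_tick = admit_per_tick
        self.waiting = deque()

    def join(self, player):
        self.waiting.append(player)
        return len(self.waiting)  # 1-based position in the queue

    def tick(self):
        # Admit up to `admit_per_tick` players, oldest first.
        admitted = []
        for _ in range(min(self.admit_per_tick, len(self.waiting))):
            admitted.append(self.waiting.popleft())
        return admitted


# Usage: a bucket of 2 tokens refilling at 2 per second.
bucket = TokenBucket(rate=2, capacity=2)
results = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.5)]

queue = LoginQueue(admit_per_tick=2)
positions = [queue.join(name) for name in ("amazon", "barbarian", "necromancer")]
first_wave = queue.tick()
second_wave = queue.tick()
```

Both pieces do the same job from different ends: the bucket caps how fast requests reach a fragile backend, while the queue smooths a sudden crowd of players into a steady stream instead of a spike.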
It’s really amazing to see how complicated it can get to bring a game forward into the modern sphere, and how much gameplay has changed thanks to information no one could have been expected to have in 2000 but which is commonplace in 2021.