June 22, 2023
Optimizing Pokémon GO: How a Centralized Redis Cluster Improved Performance and Reliability During Popular Raid Events

By Da Xing and Michael Mei

The Raid Battle feature in Pokémon GO is widely regarded as one of the most important and popular aspects of the game. Raid battles take place at Pokémon Gyms, which are placed at specific locations. In these battles, players team up to take on powerful and often rare Pokémon. Successfully completing a raid battle earns players special rewards and the opportunity to capture the Pokémon. Whenever players are close to a gym with an active raid Pokémon, they can join a raid lobby to prepare for the upcoming battle. Each lobby holds a maximum of 20 players, and a gym can host hundreds of lobbies simultaneously to enhance the in-person social experience.

Tackling Technical Hurdles During Raids

From a technical standpoint, the raid feature in Pokémon GO is engineered and deployed as a strictly in-memory feature. As a result, all players raiding at the same gym are hosted on the same server. During special raid events or at heavily frequented raid locations, the server infrastructure faces significant technical hurdles due to the sheer number of players present.

Pokémon GO Fest 2019 in Chicago

One of the challenges of raid events is the sudden, large influx of traffic (spiky QPS, or queries per second). The game operates in a multi-server environment, with players usually distributed evenly across all servers. However, during raids, players at the same gym need to be on the same server in order to access the shared game data, such as player profiles and raid metadata, that is stored only in that server's memory. This can lead to unbalanced server loads: popular raid areas attract more players, which increases traffic to the servers hosting those gyms.
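To make the imbalance concrete, here is a minimal sketch of gym-pinned routing. The fleet size, the toy hash in `pinned_server`, and the player counts are all made up for illustration; they are not Niantic's actual mapping or numbers.

```python
from collections import Counter

NUM_SERVERS = 4  # hypothetical fleet size


def pinned_server(gym_id, num_servers=NUM_SERVERS):
    """Return the one server that holds this gym's in-memory data.

    The byte-sum hash is a hypothetical stand-in for whatever mapping
    actually assigns gyms to servers.
    """
    return sum(gym_id.encode()) % num_servers


def load_per_server(players_by_gym, num_servers=NUM_SERVERS):
    """Total players per server when every player must follow their gym."""
    load = Counter()
    for gym_id, players in players_by_gym.items():
        load[pinned_server(gym_id, num_servers)] += players
    return [load[server] for server in range(num_servers)]


# One popular raid gym and three quiet ones (made-up numbers).
players_by_gym = {"gym-a": 3000, "gym-b": 40, "gym-c": 25, "gym-d": 35}
loads = load_per_server(players_by_gym)
print(loads)  # [40, 25, 35, 3000]
```

Because the popular gym's players cannot be spread out, one server carries almost the entire load while the others sit nearly idle.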

Spiky QPS and Delays for Players

During particularly popular raid events, servers can become overwhelmed by sharp spikes in queries, as thousands of players may be raiding at one gym over a short period of time. This can cause significant delays for players in the same raid, as well as for those who are not in the raid but are on the same server, eventually rendering the game unplayable for all affected players. To address this issue, Niantic site reliability engineers slowly drain the affected server, temporarily redirect players to other servers, and restart the busy server.

In addition to QPS-related challenges, the stateful nature of the system also makes scaling and restarting difficult. The server stores in-game player attributes in memory, which requires players to connect to and remain on a particular server. Niantic has developed an effective but complicated process to ensure that players are not affected during scaling. However, during major raid events, when servers are clogged by spiky QPS, it can take longer to drain players from hot servers, which means that game clients may be unresponsive for several minutes until the hot server is restarted.

Tracking impact on CPU usage as total player count rises

Simplifying the Technical System

One major change we made was to move the raid-related shared data, previously stored in the memory of individual servers, into a centralized Redis cluster. This enables all Pokémon GO servers to access that data, eliminating the need for players to connect to the specific server where the gym is hosted in order to join raid groups. This simplifies the technical system significantly.
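The idea can be sketched as follows. `SharedRaidStore` is a hypothetical in-process stand-in for the centralized store, and `GameServer`, the key format, and the player names are likewise assumptions made for illustration rather than the production design. The 20-player limit is the lobby capacity mentioned earlier.

```python
LOBBY_CAPACITY = 20  # lobby size limit stated earlier in this post


class SharedRaidStore:
    """Hypothetical stand-in for a centralized store such as a Redis cluster."""

    def __init__(self):
        self._lobbies = {}  # lobby key -> set of player ids

    def join(self, lobby_key, player_id, capacity=LOBBY_CAPACITY):
        """Add a player to a lobby; return False if the lobby is already full."""
        members = self._lobbies.setdefault(lobby_key, set())
        if player_id in members:
            return True
        if len(members) >= capacity:
            return False
        members.add(player_id)
        return True

    def members(self, lobby_key):
        """Return a copy of the lobby's current member set."""
        return set(self._lobbies.get(lobby_key, ()))


class GameServer:
    """Any server can serve any gym, because lobby state lives in the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def join_lobby(self, gym_id, lobby_no, player_id):
        # Hypothetical key layout, chosen only for this sketch.
        return self.store.join(f"raid:{gym_id}:lobby:{lobby_no}", player_id)


store = SharedRaidStore()
server_a = GameServer("server-a", store)
server_b = GameServer("server-b", store)

# Two players connected to different servers end up in the same lobby.
server_a.join_lobby("gym-42", 1, "ash")
server_b.join_lobby("gym-42", 1, "misty")
print(sorted(store.members("raid:gym-42:lobby:1")))  # ['ash', 'misty']
```

In this sketch the capacity check and the insert run in a single process. With a real shared store, the two steps would need to be performed atomically (for example, in a transaction or a server-side script) so that concurrent joins from different servers cannot overfill a lobby.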



With the raid-related shared data stored in the Redis cluster, load is now more evenly distributed. Players can connect to any server, regardless of where the gym is hosted, eliminating the unbalanced load caused by popular raid gyms. This change has removed the bottleneck and allowed the servers to sustain higher QPS during popular raid events.

The diagram presents a heatmap visualizing the load distribution across all servers. The x-axis corresponds to time, while the y-axis represents the number of players on each server. Each cell is color-coded by how many servers have that player count at that time: a red cell indicates that a significant number of servers have similar player counts, while a green cell indicates that only a few servers are at that player count.

The x-axis corresponds to time, while the y-axis represents the number of players on each server
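As a rough illustration of how one column of such a heatmap could be derived, the sketch below buckets per-server player counts for a single time step and counts the servers in each bucket. The bucket width and the sample counts are invented for the example and are not taken from the chart.

```python
from collections import Counter

BUCKET_SIZE = 500  # hypothetical bucket width, in players


def heatmap_column(player_counts, bucket_size=BUCKET_SIZE):
    """For one time step, map each player-count bucket to the number of servers in it."""
    return Counter((count // bucket_size) * bucket_size for count in player_counts)


# Made-up per-server player counts at two time steps.
before = [400, 900, 1800, 2100, 5200, 6100]
after = [1600, 1900, 2000, 2100, 2300, 2400]

print(sorted(heatmap_column(before).items()))
# [(0, 1), (500, 1), (1500, 1), (2000, 1), (5000, 1), (6000, 1)]
print(sorted(heatmap_column(after).items()))
# [(1500, 2), (2000, 4)]
```

A high value in a bucket corresponds to a "red" cell (many servers with similar player counts), and a low value to a "green" cell. In the made-up `after` sample, the servers cluster into two adjacent buckets and the high-count outliers disappear.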

After we began gradually rolling out the Redis solution at approximately 11:30 am that day, the server landscape changed noticeably. High-player-count servers, commonly referred to as “hotspots,” became far less frequent. Instead, the majority of servers now host a relatively consistent player count of between 1.5k and 2.5k.




Launching at a Global Scale

Launching the project globally, at approximately 4:00 pm, resulted in a significant reduction in latency. Latency is the time it takes for a server to respond to a player’s request, which is typically sent when a player interacts with the game client. Notably, the maximum recorded latency decreased from over 1 second to approximately 250 milliseconds (a drop of roughly 75%). This improvement is shown in the chart below.

The maximum recorded latency has decreased from over 1 second to approximately 250 milliseconds
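The 75% figure follows directly from the two latency values quoted above. A small check, treating "over 1 second" as exactly 1,000 ms (an assumption that makes 75% a lower bound on the actual drop):

```python
def percent_drop(before_ms, after_ms):
    """Relative reduction from before_ms to after_ms, as a percentage."""
    return (before_ms - after_ms) / before_ms * 100


print(percent_drop(1000, 250))  # 75.0
```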

Moreover, the servers are now more reliable. Long delays and server hiccups during popular raid events have been greatly reduced since the project was launched into production and fine-tuned over a few iterations. This provides a more stable raiding experience during major events and reduces operational and maintenance costs, freeing up resources that can be invested in other areas to improve the overall gaming experience.

We are constantly working to improve the Pokémon GO player experience, and we have already started developing an even better solution to further enhance performance and reliability during popular raid events. We will share more details on this project soon, so stay tuned. Happy Raiding!

If you’re interested in building the infrastructure for games like Pokémon GO, join us!

