December 21, 2022
Lightship VPS: How Niantic’s API service works

This is Part 3 of a 3-part series by Chief Scientist Victor Prisacariu, Director of Engineering Pierre Fite-Georgel, and Staff Software Engineer Qi Zhou on Building the World’s Most Dynamic 3D/AR Map.
In part 1, they covered the beginning steps: data collection and processing. In part 2, they explained how Niantic is building a human-readable map with computer vision, and how it will all ultimately lead to the 3D map of the future. In this final post of the series, they dive into how Niantic’s API Service works and how they speed it up to provide a good experience to developers and end users.

Our API Service and how it works

Lightship VPS is evolving fast and it’s critical to stay agile. Leveraging the cloud enables us to quickly put research into production and worry less about client backward compatibility. VPS is a cloud service that enables applications to align a user’s device with persistent AR content at real world locations to power new immersive experiences. It determines the device’s 6 degrees of freedom (6-DoF) pose by querying Niantic 3D map data on the cloud. Please refer to blog posts 1 and 2 for information on building Niantic 3D maps.

When a user queries Lightship VPS, they first send us a query image captured on the user device, along with some additional metadata. Our VPS service then fetches nearby Niantic 3D maps and localizes the user query image against them. If localization succeeds, a transform aligning the user device to the Niantic map is returned.
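The round trip above can be sketched as follows. This is a minimal illustration, not Niantic's actual API: the payload fields, helper names (`fetch_maps`, `run_localizer`), and the flat 4x4 transform representation are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class LocalizationQuery:
    image_jpeg: bytes   # query image captured on the user device
    approx_lat: float   # coarse location hint used to fetch nearby maps
    approx_lon: float
    device_id: str      # example of the additional metadata sent along

@dataclass
class LocalizationResult:
    success: bool
    # 4x4 row-major transform aligning the user device to the Niantic
    # map, i.e. the device's 6-DoF pose (rotation + translation).
    device_to_map: Optional[List[float]]

def localize(query: LocalizationQuery,
             fetch_maps: Callable,
             run_localizer: Callable) -> LocalizationResult:
    """Fetch nearby 3D maps, then localize the query image against them."""
    for m in fetch_maps(query.approx_lat, query.approx_lon):
        pose = run_localizer(query.image_jpeg, m)
        if pose is not None:
            return LocalizationResult(success=True, device_to_map=pose)
    return LocalizationResult(success=False, device_to_map=None)
```

In practice `fetch_maps` is backed by an indexing service and `run_localizer` by the GPU-accelerated localization algorithms described below.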

As you can imagine, latency is critical here since user experience is highly correlated with it. Localization is both compute and input-output (IO) intensive. Each localization request needs map data stored on cloud storage to run localization algorithms. We leverage both CPU intrinsics and GPU to speed up localization algorithms and have designed a hierarchical map cache to minimize the latency of network IO and the impact of unstable cloud storage throughput.

The hierarchical cache is a least recently used (LRU) cache with both a random access memory (RAM) and a solid state drive (SSD) component. Cache operations check the cache utilization rates to ensure memory and SSD usage stay within a healthy range. With mapping algorithm improvements, each machine can cache more than what it can serve, so the theoretical cache hit rate is high. To improve the actual cache hit rate, we also need to ensure user requests are served by machines with the required data, aka map affinity.
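A minimal sketch of such a two-tier LRU cache, assuming per-tier byte budgets; the real eviction policy, thresholds, and feedback control are Niantic-internal details.

```python
from collections import OrderedDict

class HierarchicalLRUCache:
    """Two-tier (RAM + SSD) LRU map cache, simplified for illustration."""

    def __init__(self, ram_budget: int, ssd_budget: int):
        self.ram = OrderedDict()   # map_id -> map bytes (hot tier)
        self.ssd = OrderedDict()   # map_id -> map bytes (warm tier)
        self.ram_budget = ram_budget
        self.ssd_budget = ssd_budget

    def _evict(self, tier: OrderedDict, budget: int) -> None:
        # Keep the tier's utilization within budget by evicting the
        # least-recently-used entries first.
        while tier and sum(len(v) for v in tier.values()) > budget:
            tier.popitem(last=False)

    def get(self, map_id):
        if map_id in self.ram:
            self.ram.move_to_end(map_id)   # refresh recency
            return self.ram[map_id]
        if map_id in self.ssd:
            data = self.ssd.pop(map_id)    # promote an SSD hit to RAM
            self.put(map_id, data)
            return data
        return None                        # miss: caller reads cloud storage

    def put(self, map_id, data) -> None:
        self.ram[map_id] = data
        self.ram.move_to_end(map_id)
        self._evict(self.ram, self.ram_budget)
        # For simplicity we mirror writes into the SSD tier; a real
        # implementation would instead demote RAM evictions to SSD.
        self.ssd[map_id] = data
        self.ssd.move_to_end(map_id)
        self._evict(self.ssd, self.ssd_budget)
```

Because the SSD tier is larger than the RAM tier, a map evicted from RAM can often still be served from SSD without touching cloud storage.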

Caching map data means our service is stateful, and scaling a stateful service is always challenging. To achieve map affinity and horizontal scalability, the VPS service leverages a system built within Niantic that provides a virtualization layer on top of cloud virtual machines (VMs) using the concept of shards. A shard is a basic unit for scheduling and load balancing. Each VM is responsible for one or multiple shards, and each shard can have multiple replicas on different VMs.

In this example there are four maps assigned to three shards: map A is assigned to shard 1, map B to shard 2, and maps C and D to shard 3. Shard 2 has two replicas, on algorithm servers 1 and 2, which means requests for map B can be sent to either server; the other shards each have a single replica. The hierarchical cache components are self-adaptive via feedback loop control.

For each user request, we first query an indexing service backed by Google Cloud Bigtable to find nearby maps and then forward user query images to the machines which may contain map data by hashing map IDs and checking the routing table of shards. When localization algorithms finish running with the map data, we will select a response and return it to the user. The number of shards, the number of replicas of each shard, and the distribution of shards are adjusted over time to achieve the best performance and resource utilization.
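The routing step (hashing map IDs and consulting the shard routing table) might look like the sketch below. The hash scheme, table layout, and replica-selection strategy are illustrative assumptions, not Niantic's actual implementation.

```python
import hashlib
from typing import Dict, List

def shard_for(key: str, num_buckets: int) -> int:
    # Stable hash so every frontend routes the same map to the same shard.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def pick_server(map_id: str, routing_table: Dict[int, List[str]]) -> str:
    """Return one server hosting the shard that owns map_id.

    routing_table maps shard index -> list of replica servers. Any
    replica can serve the request; we hash again (with a salt) to
    spread load across replicas deterministically.
    """
    shard = shard_for(map_id, len(routing_table))
    replicas = routing_table[shard]
    return replicas[shard_for(map_id + "#replica", len(replicas))]
```

Because routing is a pure function of the map ID and the routing table, repeated requests for the same map land on the same replica, which is exactly what the cache affinity described above needs.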

VPS server architecture diagram

Another big advantage of the SSD cache is that it solves the cold start problem of the stateful service. Each cloud VM connects to an SSD that exists independently of the VM. When a cloud VM restarts, instead of fetching all data from cloud storage, we keep the map affinity and the VM tries to read from its SSD first. Without an SSD cache layer, any in-memory cache miss results in at least one read from cloud storage, which is expensive because cloud storage sits on hard drives and the latency variation can be large (usually a few hundred milliseconds or more in the worst case). The SSD cache avoids bursts of cloud storage reads when VMs restart and improves both latency and throughput.
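The warm-restart read path can be sketched as below: after a restart the in-memory cache is empty, but the SSD (which persists independently of the VM) still holds map data, so it is tried before falling back to cloud storage. The on-disk layout and helper names are hypothetical.

```python
import os

def read_map(map_id: str, ram_cache: dict, ssd_dir: str, fetch_from_cloud):
    """Read map data via RAM, then SSD, then cloud storage."""
    # 1. RAM hit: fastest path.
    if map_id in ram_cache:
        return ram_cache[map_id]
    # 2. SSD hit: survives VM restarts, avoiding a slow cloud read.
    path = os.path.join(ssd_dir, map_id)
    if os.path.exists(path):
        with open(path, "rb") as f:
            data = f.read()
    else:
        # 3. Full miss: a few hundred ms or more in the worst case.
        data = fetch_from_cloud(map_id)
        with open(path, "wb") as f:   # backfill the SSD tier
            f.write(data)
    ram_cache[map_id] = data          # backfill the RAM tier
    return data
```

On restart the RAM cache starts empty, but as long as the shard assignment (map affinity) is preserved, most reads hit the SSD instead of generating a burst of cloud storage requests.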

With these algorithms and this infrastructure, in most cases users around the world with a decent internet connection can localize against Niantic 3D maps in a few hundred milliseconds after client initialization.

To enhance service security, we integrated the VPS service with the Niantic API gateway for access control and rate limiting, and also leverage Google Cloud Platform technologies like Private Service Connect to minimize the risk of exposing unnecessary data and endpoints to the public internet.

Persistence

With Lightship VPS, users can place virtual objects that maintain a consistent, stable pose in an AR environment, such as a virtual apple or a Pokémon character at a VPS-activated Wayspot or private VPS location. The object stays in that location and persists until the user removes it. Today, Lightship VPS is available at over 140,000 locations across 125 global cities, meaning more than 250 million people are within a 5-minute walk of a VPS-activated Wayspot. The ability to anchor AR content to an easily reachable spot in the real world, and have it persist there even as the world evolves, so users can revisit it and share it with others, is a differentiator for Lightship VPS.

VPS is still in its early stages. As we continue building out 3D maps of the world, we are also taking user experience and privacy seriously. For instance, in our vision users will eventually have the option to run localization on their own devices, giving them more control over their data. Such a solution would also reduce the latency and network bandwidth consumption of using Niantic VPS.

We’re excited about our progress so far, but we are just getting started. If you want to join in the effort, check out our jobs page.

– By Staff Software Engineer Qi Zhou, Director of Engineering Pierre Fite-Georgel, and Chief Scientist Victor Prisacariu

