September 21, 2022
Meet Niantic Engineering Leaders: A Conversation With Niantic Chief Scientist Victor Prisacariu

At the Lightship Summit earlier this year, Niantic unveiled a project that has been years in the making: the Lightship Visual Positioning System (VPS). For the first time, developers can use Lightship VPS to anchor AR content to real-world locations and have that content persist. More than that, the centimeter-level accuracy of Lightship VPS means AR developers can build experiences with more meaning, because they have both precision and persistence. This work is foundational as Niantic builds the necessary elements for the real-world metaverse to come to life. None of it would be possible without the contributions of the Niantic AR-Geo team and one of its key leaders, Victor Prisacariu.

Integral to creating this first-of-its-kind AR map for machines was Victor’s pioneering work, and the unique approach he champions: a “crazy”, or non-traditional, track of engineering run alongside a “safe” one. In this Q&A, Victor talks about how that approach helps create breakthroughs in AR and geolocation technologies using the cameras and sensors found in virtually all smartphones around the world.

Victor has spent much of the last two decades working in computer vision and AR, earning a PhD at Oxford University, where he now teaches as an Associate Professor in the Department of Engineering Science and runs the Active Vision Lab. In 2017, he co-founded 6D.ai, where he developed an SDK and APIs that enabled developers to use standard built-in smartphone cameras to create a 3D semantic map of the world. When Niantic acquired 6D.ai in 2020, Victor joined as Chief Scientist. Today he heads up Niantic map research and development from his office in Oxford.

He brought to Niantic a unique approach to engineering that calls for two tracks of development following vastly different methodologies. We recently talked with Victor about the launch of VPS, his novel approach to research and engineering culture at Niantic, as well as the importance of benchmarks. As with many people inventing the future of AR, he also talks about how Minority Report led him to embark on a career in computer vision.

Can you talk a little bit about your team’s role in developing VPS?

The team I work in builds the Niantic visual map of the world and its associated augmented reality services, a core example of which is the Visual Positioning System (VPS). Using the data our users upload when playing games like Ingress and Pokemon Go, we build high-fidelity 3D maps of the world, which include both 3D geometry (or the shape of things) and semantic understanding (what stuff in the map is, such as the ground, sky, trees, etc.). These maps can be perfectly overlaid onto real-world locations. Our visual positioning system enables users to find their position in this map using just images from the camera of their mobile phones.
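
To make that concrete, here is a minimal sketch of the geometry at the heart of visual positioning, using OpenCV’s perspective-n-point solver. All the points, intrinsics, and poses below are synthetic stand-ins, and this is illustrative only, not Niantic’s pipeline, which also involves feature extraction, matching, and retrieving the right map for a user’s location.

```python
# Minimal sketch of the geometry at the heart of visual positioning:
# given matches between 2D image points and 3D map points, recover the
# camera's 6-DoF pose. All data here is synthetic and illustrative.
import cv2
import numpy as np

# Stand-in 3D map points; in practice these come from the stored map.
map_points = np.array([
    [0.0, 0.0, 5.0], [1.0, 0.0, 5.0], [0.0, 1.0, 6.0],
    [1.0, 1.0, 6.0], [-1.0, 0.5, 5.5], [0.5, -1.0, 6.5],
])

# Pinhole intrinsics for a hypothetical phone camera.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Fabricate the 2D detections by projecting the map points with a known
# ground-truth pose (normally these come from the camera image itself).
rvec_gt = np.array([0.05, -0.02, 0.01])
tvec_gt = np.array([0.10, -0.20, 0.30])
image_points, _ = cv2.projectPoints(map_points, rvec_gt, tvec_gt, K, None)

# Solve perspective-n-point with RANSAC, as a localization step might.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(map_points, image_points, K, None)
print("localized:", ok, "recovered translation:", tvec.ravel())
```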

How does Niantic approach engineering for a big project like that?

Back when I was at 6D.ai, due to the novelty and uncertainty of the system we were building, we developed a two-path method of engineering and found that it worked quite well for us. We kept this as we moved to Niantic, and structured the Niantic map work around the idea that there exists an engineering-focused “safe” path that we know will work, that we can quantify, and that we can define OKRs around. Pierre Fite-Georgel leads that effort. Then there is a second, much less certain but potentially more rewarding path, which I call the “crazy”, or non-traditional, path. I lead that effort. The motivation for this two-path structure is that, while research can produce state-of-the-art results, it can, and often will, fail, so we need another system that we know will work. This gives us a fail-safe option we can rely on.

Because many of Niantic’s products are developed in the San Francisco Bay Area, our Niantic map engineering team there owns the safe path, as this defines the base product capabilities. In Oxford, we tend to do the crazier research, because once we’ve established the requirements for the product, we like to have the freedom to try innovative and seemingly impossible things to see what we can accomplish when common sense is put aside.

Localization of a VPS scan: Victor’s garden in early winter, transitioning to a mesh built in early spring, showing the seasonal contrast at the same location.

Do you have an example of processes where these approaches diverge?

One practical example is the work we did back at 6D.ai. Say you’re building a 3D reconstruction system to create a 3D mesh from a collection of images. The safe approach would be to use a well-understood framework, for example one based on photo-consistency, i.e., extracting 3D measurements from the similarity of color and appearance across images. This produces satisfactory results much of the time. This path would be primarily engineering-based, and would involve implementing well-known algorithms and using established 3D geometric concepts, so the work could be run via standard software engineering processes. Speed and quality, however, are not always ideal, and such methods can struggle with featureless parts of the images, as one would often encounter in a house with blank white walls. The non-traditional approach, back at 6D.ai in 2018 and 2019, was to base our entire solution on the latest state-of-the-art neural network-based feature consistency, which ran at 1 frame per second on server-grade GPUs, and aim to run everything at 50 frames per second on a mobile phone. This required us to take a more research-oriented approach and deal with a lot more uncertainty, which meant, for example, less strict coding practices, looser deliverable schedules, and so on.
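
As a toy illustration of the photo-consistency idea, here is a one-scanline stereo depth sweep on synthetic data: the correct depth for a pixel is the one whose implied disparity makes the two views agree in color. This is a deliberately simplified sketch, not 6D.ai’s or Niantic’s actual method; real systems use matching windows, robust costs, and many views, and the neural route Victor describes replaces the raw color comparison with learned feature similarity.

```python
# Toy, one-scanline photo-consistency depth sweep on a synthetic
# rectified stereo pair. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
fx, baseline = 500.0, 0.1        # focal length (pixels), camera baseline (m)
true_depth = 2.0                 # the whole toy scene sits at 2 m

# Fabricate a textured left scanline, then shift it by the true disparity
# to get the right scanline, as a rectified stereo rig would observe.
left = rng.random(200)
true_disp = round(fx * baseline / true_depth)
right = np.roll(left, -true_disp)

def photo_cost(depth, u):
    """Photometric error at pixel u if the scene were at this depth."""
    d = round(fx * baseline / depth)
    return abs(left[u] - right[u - d])

u = 120
candidates = np.linspace(0.5, 5.0, 200)
best = min(candidates, key=lambda z: photo_cost(z, u))
# Disparity is quantized to whole pixels, so the estimate is approximate.
print(f"estimated depth at pixel {u}: {best:.2f} m (true: {true_depth} m)")
```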

How would you characterize the success of using this simultaneous engineering process?

To answer this, I can give you a different example, this time based on the Lightship VPS work. For a given location in the world, say a statue in a park, we have a potentially large number of user-provided scans (e.g., around 100). The safe approach can represent everything with a collection of, say, ten 3D point-cloud-based localization maps. Success here means that performance is acceptable, the implementation time is easily quantifiable, and the downstream product development can be based around predictable specifications. With the non-traditional approach, we aim to have a single map that uses about 20 times less space than the ones created by the traditional approach and offers better performance, like quicker localization with lower error. If the non-traditional approach works as we’d want it to, the overall system will be simpler, quicker, and more accurate, and we will have opened the door to on-device localization, which nobody else can do right now.
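
As back-of-envelope arithmetic for this example: the ten-map and 20x figures come from Victor’s description, while the size of a single point-cloud map is a hypothetical placeholder chosen purely for illustration.

```python
# Back-of-envelope arithmetic for the example above. The ten-map and
# 20x figures come from the text; the per-map size is hypothetical.
safe_maps = 10                  # point-cloud maps covering the ~100 scans
map_size_mb = 50.0              # hypothetical size of one map, in MB

safe_total_mb = safe_maps * map_size_mb     # safe path total per location
crazy_total_mb = safe_total_mb / 20.0       # "about 20 times less space"
print(f"safe path: {safe_total_mb:.0f} MB, non-traditional: {crazy_total_mb:.1f} MB")
```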

And what’s the process of information sharing between the two engineering groups? How do you track each other’s progress?

Well, we have several weekly meetings where everybody shares what they are working on, plus Slack channels and an all-hands meeting. We also encourage a lot of one-on-ones between people working on the two paths. We want people to know how each path is progressing, because ultimately we’re always hoping to keep the performance characteristics of the two paths similar. That means the goalposts change often for both approaches. Thinking of the previous example, say the non-traditional approach finds a 20x map-size compression strategy that maintains accuracy. This implicitly sets the goalposts for the traditional approach a bit higher. Sometimes the opposite is true, and the traditional approach gives us, for example, improved speed, so the goalpost for the non-traditional side moves. This means there’s a continuous movement of what is considered the baseline approach. The search for the best approach is always going to happen, but the minimum baseline is also always moving up.

Skeptics may say having two development teams working in parallel for any reason is a waste of time and resources. Have you done a cost-benefit analysis?

We’ve had this discussion on the team as well, and ultimately we think it’s the price we pay for having both safety and potential success. If we were to focus on only one path, we’d either give up on safety, so that when research fails the entire project fails, or we’d give up on potentially 10x or 100x improvements on what can be done and be stuck in the same safe, but ultimately not good enough, zone that everyone else is in.

How do the different approaches affect the time to productization? How do they push the boundaries of state of the art?

Our productization has so far been quite quick with both approaches: our safe path is always a build away from deployment, and we’ve seen algorithms from our research papers deployed within a few weeks. We’ve found in the past that in three weeks we can go from an idea on paper to a real-world deployment. The key to this is that we make sure we test our research work on our own problems and through our own realistic benchmarks. That means that when we have something that works, we know it will work in the real system, so we can deploy it as quickly as our legal department and our engineering will allow. We do that by maintaining the same set of dependencies, aiming for the same code quality once research code reaches (at least some) maturity, and having everyone’s work checked by everyone else.

In my view, the key to pushing the state of the art in our domain is the development of large, realistic datasets and benchmarks, together with improvements on the most recent and accurate algorithms for map building and localization.

An interactive Scaniverse scan of Victor in his office.

What impact does this bifurcated engineering method have on morale and culture at Niantic?

Keeping morale high is very important, and potentially challenging when the code and algorithms produced by one of the paths might end up not being used as much as the other. That’s why I tend to consider it irrelevant which approach has the best performance at any given time. We would not want the folks who work on the safe approach to second-guess themselves if their code isn’t used. On the other hand, we want the people working on the non-traditional approach to feel they have the space to be non-traditional; if their ideas don’t pan out, we have a solution that we can rely on, quantify, and trust. Both sides are working on the same project and towards the same goal. Limitations of one approach end up being driving functions for the other, so a “failure” is really a success in finding what not to do.

What does this non-traditional versus safe approach say about the larger Niantic mission or culture?

Speaking of our approach to engineering, it all comes from the question of how to deliver the best experiences for our users, be they players of our in-house games or developers building on the Lightship platform. Our aim within Niantic’s augmented reality engineering is, on the one hand, to leverage the latest and greatest research while accounting for the fact that research can fail, and, on the other hand, to produce quantifiable and predictable code that can drive product development. The non-traditional/safe approach gives us this: it allows us to do research without worrying too much about failure, while letting the downstream users of our methods and algorithms depend on our deliverables. So we need both approaches; we depend on the safety of the known method while maintaining the craziness.

You mentioned you have a lot of internal discussions about what it means for something to be good. What is “good” to Niantic, and did you all come to agree on the definition?

“Good” is a good user experience. Now, how do we define what a good user experience is? That’s been one of our points of debate. Some members of the team might argue that the rate of success, measured by the number of successful localizations within a user scan, is not that important. Others might argue that the rate of success is very important, because our users might get bored if it is too low. Eventually we agreed on a benchmark configuration, but I suspect that, as we get more and more users of our map and VPS, we’ll receive feedback that will improve the relevance of our benchmark.
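
For concreteness, here is a minimal sketch of the kind of metric being debated: a localization success rate over a user scan. The 5 cm / 5 degree thresholds and the error values below are hypothetical choices for illustration; the interview does not disclose the actual benchmark configuration.

```python
# Sketch of a localization success-rate metric over one user scan. An
# attempt counts as a success if its pose error is under the thresholds.
# Thresholds and data are hypothetical, chosen only for illustration.
import numpy as np

def success_rate(trans_err_m, rot_err_deg, max_trans=0.05, max_rot=5.0):
    """Fraction of attempts within 5 cm translation and 5 deg rotation."""
    trans_err_m = np.asarray(trans_err_m)
    rot_err_deg = np.asarray(rot_err_deg)
    ok = (trans_err_m <= max_trans) & (rot_err_deg <= max_rot)
    return ok.mean()

# One user scan: per-frame localization errors vs. ground truth.
trans = [0.02, 0.04, 0.30, 0.01, 0.06]   # meters
rot = [1.0, 2.5, 20.0, 0.5, 4.0]         # degrees
print(f"success rate: {success_rate(trans, rot):.0%}")  # -> 60%
```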

Do you think Niantic has a higher standard for good?

I think so, and I think it’s all thanks to Niantic tech lead Ben Benfold and the rest of our team. I would argue that he’s the one who instilled the philosophy that it doesn’t matter if we think our system is good; what matters is whether it creates a good experience for the user. It’s Ben who insisted that everything we do has to work well in the real world, and in an end-to-end manner, i.e., we always test the entire system, be it the non-traditional or the safe version, and not just individual components.

Lastly, you mentioned that a big seed of your passion for AR came from watching the movie Minority Report. Tell me about that.

So, I saw the movie and I was like, “yeah that’s nice, I can do that!” Then I had a project during undergraduate school back in Romania where we put these accelerometer-type things on hands that worked kind of like a Nintendo Wii before the Wii. At some point, I found a research group at Oxford that was working on the same sort of concept — and I saw that my stuff worked better than theirs, so I applied to study computer vision at that research group. Funnily enough, that was the Active Vision Lab that I now run. I ended up doing my PhD there, but it turned out I was too dumb to figure out hand tracking, which was very difficult at the time because the computers were slow. Instead, I found that my work was very good at getting positions for 3D objects.

So when I told my supervisor I was changing my mind about my project, he told me to start small, using small objects and small motions to track. Over the course of a few years we ended up building a good 3D object tracker, probably the state of the art at the time, and we used it in all sorts of places, such as tracking shoes in a stroke-patient rehabilitation research project.

My system up to that point had always assumed that the 3D model of the tracked object was given. Next I moved on to building that 3D model at the same time as I was tracking it, and moved up to something the size of a desk, using depth-sensing technology like that used in the Kinect gaming system. But then I thought, if I want to go down the non-traditional path, what if I could reconstruct a whole city, without using a depth camera, with it all running on my phone in real time? That’s what we started at 6D.ai, and now continue to work on at Niantic.

