June 20, 2023
Niantic’s Pioneering Research at CVPR 2023: Pushing the Boundaries of Computer Vision and Augmented Reality

By Gabriel Brostow

The Niantic team is thrilled to share some breakthroughs we’re presenting at this week’s Conference on Computer Vision and Pattern Recognition (CVPR 2023). Pioneering researchers in our R&D and Mapping teams have made significant advancements in computer vision and augmented reality (AR), revolutionizing AR occlusion, camera relocalization, two-view geometry scoring, and NeRF editing. In this post, we’ll give the highlights of the cutting-edge techniques we’ve developed, and explore how they are shaping current and future AR experiences.

Conference on Computer Vision and Pattern Recognition (CVPR)

CVPR holds a prominent position in both peer-reviewed research and computer vision-related industries, serving as the proving ground for the latest innovations. We are proud to have five papers at CVPR this year, which were already garnering quite some buzz in the run-up to the conference.

Virtual Occlusions Through Implicit Depth

For this paper, the team focused on achieving accurate and stable occlusions in AR applications. We challenge the traditional depth-regression approach and propose an implicit model of visibility. Whereas occlusions are traditionally computed in a two-step process (estimate the scene's depth, then depth-test the virtual content against it), our CNN directly predicts the occlusion mask, given an image of the target scene and a known virtual geometry asset, e.g. an AR character. Our approach surpasses traditional depth-estimation models, achieving state-of-the-art occlusion results on the ScanNetv2 dataset. Thanks also to our model's temporal-stability enhancements, the results score better and look better too. This breakthrough research brings us closer to creating life-like special effects like those in the movies, but live on your AR device.
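
As a rough illustration of that contrast (not the paper's actual architecture), the sketch below shows the classic depth-test compositing step next to a toy network that maps the camera image plus the virtual asset's rendered depth straight to an occlusion mask; all layer sizes and tensors are illustrative stand-ins.

```python
import torch
import torch.nn as nn

def occlusion_mask_from_depth(real_depth, virtual_depth):
    """Classic two-step route: estimate scene depth first, then depth-test.
    The virtual pixel is hidden wherever the real surface is closer."""
    return (real_depth < virtual_depth).float()

class ImplicitOcclusionNet(nn.Module):
    """Toy stand-in for an implicit-visibility CNN: it consumes the camera
    image plus the rendered depth of the virtual asset and predicts the
    occlusion mask directly, with no explicit scene-depth estimate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),
        )

    def forward(self, rgb, virtual_depth):
        x = torch.cat([rgb, virtual_depth], dim=1)   # B x 4 x H x W
        return torch.sigmoid(self.net(x))            # per-pixel visibility

rgb = torch.rand(1, 3, 192, 256)                     # dummy camera frame
virtual_depth = torch.rand(1, 1, 192, 256) * 5.0     # rendered depth of the AR asset
mask = ImplicitOcclusionNet()(rgb, virtual_depth)
print(mask.shape)                                    # torch.Size([1, 1, 192, 256])
```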

Learn more on GitHub.

Implicit depth is a new approach for accurately estimating scene occlusions for augmented reality (AR) applications

“Our system reduces training time significantly, being up to 300 times faster than state-of-the-art scene coordinate regression methods while maintaining comparable accuracy.”


Accelerated Coordinate Encoding: Learning to Relocalize in Minutes using RGB and Poses

We are excited to present our accelerated relocalization method, ACE, which can be trained for a new scene in under five minutes while achieving high accuracy. By splitting the relocalization network into a scene-agnostic feature backbone and a scene-specific prediction head, we optimize training across thousands of viewpoints simultaneously. This innovative approach ensures stable and rapid convergence, making learning-based relocalization practical for a wide range of applications. Our system reduces training time significantly, being up to 300 times faster than state-of-the-art scene coordinate regression methods while maintaining comparable accuracy.
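
As a rough illustration of that split (not the ACE implementation), the sketch below freezes a stand-in backbone and trains only a small scene-specific head on features pooled and shuffled across many frames; the dummy data, network sizes, and training loop are all illustrative assumptions.

```python
import torch
import torch.nn as nn

feature_dim = 512

# Scene-agnostic feature backbone: pretend it is pretrained and shared
# across all scenes, so it stays frozen while mapping a new scene.
backbone = nn.Sequential(nn.Conv2d(1, feature_dim, 8, stride=8), nn.ReLU()).eval()
for p in backbone.parameters():
    p.requires_grad_(False)

# Scene-specific head: the only part trained for a new scene. It regresses
# a 3D scene coordinate for each feature.
head = nn.Sequential(nn.Linear(feature_dim, 512), nn.ReLU(), nn.Linear(512, 3))

# One-off pass over the mapping frames: cache features, then shuffle them so
# every training batch mixes features from many viewpoints at once.
with torch.no_grad():
    frames = torch.rand(200, 1, 64, 64)                    # stand-in mapping images
    feats = backbone(frames)                               # 200 x C x 8 x 8
    features = feats.permute(0, 2, 3, 1).reshape(-1, feature_dim)
targets = torch.randn(features.shape[0], 3)                # stand-in scene coordinates
                                                           # (derived from RGB + poses in practice)

opt = torch.optim.AdamW(head.parameters(), lr=3e-3)
for step in range(100):
    idx = torch.randint(0, features.shape[0], (4096,))     # batch spans many viewpoints
    loss = (head(features[idx]) - targets[idx]).abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.3f}")
```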

Learn more about ACE here.

Our approach, ACE, maps a new environment two orders of magnitude faster than the baseline, DSAC, while being as accurate.

Two-view Geometry Scoring Without Correspondences

RANSAC and its many offspring algorithms are great, and get used for many tasks (not just in 3D geometry). But people usually pair RANSAC with a heuristic that reminds me of the movie quote "60% of the time, it works every time." The heuristic scores each of RANSAC's proposed solutions using the inlier count, i.e. a voting system for matched points. But in cases where long scans or large areas lack sufficient matches between images, we show that over-trusting that score leads a system to favor the wrong overall solution. Our team has developed a specialized Convolutional Neural Network (CNN) known as the Fundamental Scoring Network (FSNet), which scores proposed fundamental matrices more effectively under a wide range of conditions. FSNet leverages an epipolar attention mechanism to predict the pose error of image pairs without relying on sparse correspondences. Good poses, under all weather conditions, are important for persistent AR, where the user needs to relocalize themselves in the real world.
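
To spell out the contrast, the sketch below implements the classic inlier-counting heuristic using the standard Sampson distance, and notes in a comment where a learned, correspondence-free scorer would slot in; the `learned_scorer` call is a hypothetical stand-in, not FSNet's API, and the random data is purely for shape-checking.

```python
import numpy as np

def sampson_distance(F, pts1, pts2):
    """Per-match Sampson error for homogeneous points (N x 3) under a candidate F."""
    Fx1 = pts1 @ F.T                       # rows are (F @ x1)^T
    Ftx2 = pts2 @ F                        # rows are (F^T @ x2)^T
    num = np.sum(pts2 * Fx1, axis=1) ** 2  # (x2^T F x1)^2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return num / den

def inlier_count_score(F, pts1, pts2, thresh=1.0):
    """The usual heuristic: more matches under the threshold = a 'better' F.
    It breaks down exactly when the matches themselves are unreliable."""
    return int(np.sum(sampson_distance(F, pts1, pts2) < thresh))

# Tiny demo with random data, just to show the shapes involved.
rng = np.random.default_rng(0)
F = rng.standard_normal((3, 3))
pts1 = np.hstack([rng.uniform(0, 640, (50, 2)), np.ones((50, 1))])
pts2 = np.hstack([rng.uniform(0, 480, (50, 2)), np.ones((50, 1))])
print(inlier_count_score(F, pts1, pts2))

# A correspondence-free scorer would instead look at the images themselves,
# e.g. best_F = min(candidates, key=lambda F: learned_scorer(img1, img2, F)),
# where `learned_scorer` is a hypothetical stand-in for a network like FSNet.
```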

Learn more here.

Example where SuperPoint-SuperGlue correspondences are highly populated by outliers. In such scenarios with unreliable correspondences, current top scoring methods fail while our proposed FSNet model is able to pick out the best fundamental matrix.

“Many of these advancements are being (or have already been) incorporated into Niantic’s Lightship ARDK, so that software and AR experience developers can use them for creative and commercial purposes. Or as a springboard for further science!”


Removing Objects from Neural Radiance Fields (NeRFs)

There are many reasons to love Neural Radiance Fields (NeRFs) and their use for novel view synthesis, but editability has not been one of them. So we are proud to introduce an algorithm for removing fairly large objects from NeRFs. Our framework leverages recent advancements in 2D image inpainting, and incorporates user-provided masks to inpaint the NeRF representation while ensuring 3D consistency as the user looks around. This method empowers users to remove personal information or unwanted objects from NeRFs, enabling a more customizable and privacy-conscious AR experience.
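
The sketch below is a toy illustration of the general recipe, not the paper's code: inpaint each training view inside the user's mask with a 2D inpainter, then fit the field to the edited views while trusting inpainted pixels less than observed ones. The `inpaint_2d` stub, the confidence weighting, and the dummy rendering are all assumptions made for illustration.

```python
import numpy as np

def inpaint_2d(image, mask):
    """Stand-in for an off-the-shelf 2D inpainting model: here it simply fills
    the masked region with the mean colour of the unmasked pixels."""
    filled = image.copy()
    filled[mask] = image[~mask].mean(axis=0)
    return filled

def photometric_loss(rendered, target, mask, inpaint_weight=0.2):
    """Per-pixel L2 between the field's rendering and the edited view, trusting
    inpainted pixels less than directly observed ones (an illustrative choice)."""
    weights = np.where(mask[..., None], inpaint_weight, 1.0)
    return float(np.mean(weights * (rendered - target) ** 2))

# Toy data: one training view, a square user-provided object mask, and a dummy
# rendering that stands in for what the NeRF produces from that camera pose.
view = np.random.rand(64, 64, 3)
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:40] = True
edited_view = inpaint_2d(view, mask)
rendered = np.random.rand(64, 64, 3)
print(photometric_loss(rendered, edited_view, mask))
```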

Learn more on GitHub.

Our method allows for objects to be plausibly removed from NeRF reconstructions, inpainting missing regions whilst preserving multi-view coherence.

DiffusioNeRF: Regularizing Neural Radiance Fields with Denoising Diffusion Models

While NeRFs are revolutionary for image-based rendering, they normally require images taken from numerous viewing angles of a scene or subject. With only a handful of images, you can usually only get very blurry renderings from nearby vantage points. To address the challenges posed by the under-constrained nature of NeRFs trained with few input views, our research team has developed a Denoising Diffusion Model (DDM). Our DDM is trained on RGBD patches of the synthetic Hypersim dataset and can be used to predict the gradient of the logarithm of the joint probability distribution of color and depth patches. By leveraging the DDM, we regularize the training process, resulting in improved reconstruction quality and better generalization to novel views. Our approach represents an avenue for bringing real scenes into AR/VR, using ultra-small image collections that any end-user can provide.
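
As a rough illustration of how a learned score can regularize training (not DiffusioNeRF's implementation), the sketch below adds the output of a stand-in patch "score model" to the photometric gradient of a rendered RGB+depth patch; the tiny network, the patch-level update, and all weights are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Stand-in for a denoising diffusion model trained on RGBD patches; its output
# plays the role of the score, i.e. the gradient of log p(colour, depth patch).
score_model = nn.Sequential(
    nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 4, 3, padding=1),
)

def regularized_patch_step(rendered_rgbd, target_rgb, reg_weight=0.1, lr=1e-2):
    """One illustrative update of a rendered RGB+depth patch: descend the
    photometric loss while ascending the learned patch prior. In a real NeRF
    the gradients would flow into the field's weights, not the patch itself."""
    patch = rendered_rgbd.clone().requires_grad_(True)
    photometric = ((patch[:, :3] - target_rgb) ** 2).mean()
    photometric.backward()
    with torch.no_grad():
        score = score_model(patch)              # approx. gradient of log p(patch)
        return patch - lr * (patch.grad - reg_weight * score)

updated = regularized_patch_step(torch.rand(1, 4, 48, 48), torch.rand(1, 3, 48, 48))
print(updated.shape)  # torch.Size([1, 4, 48, 48])
```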

Learn more on GitHub.

Test sequence of object removal

At Niantic, our Research and Mapping teams’ exceptional achievements at CVPR demonstrate our continued hunger to push the scientific boundaries of computer vision and augmented reality. Our community faces many challenges (and hypotheses that don’t work out in practice) on the way to making AR a usable addition to regular reality. The successes described here come from our team’s relentless attention to accuracy, stability, speed, and robustness. Many of these advancements are being (or have already been) incorporated into Niantic’s Lightship ARDK, so that software and AR experience developers can use them for creative and commercial purposes. Or as a springboard for further science! We are excited to continue our research and development efforts, seeking further breakthroughs to create even more immersive and realistic AR experiences for our global community of millions. Stay tuned for more exciting updates!


Gabriel Brostow

I’m Gabriel Brostow, Chief Research Scientist at Niantic, and professor at University College London. I manage the brilliant and supremely creative Niantic R&D Team, as we explore and productionize ever-new technologies to help people explore the world together.

