At Niantic Research, our goal is to enable new ways for people to explore, understand and interact with each other in a shared augmented world. For that to happen, we have to solve complex problems in the areas of computer vision and machine learning.
At the 2020 European Conference on Computer Vision (ECCV 2020), the top international conference in computer vision, our research team presented three new papers and a technical report, each of which addresses a question about where a player is in the world, what the 3D scene layout in front of them looks like, and how it all fits into the larger 3D map of the digital-yet-real world around them.
AR experiences become richer with an accurate, up-to-date 3D map of the world. Building and maintaining a real-time, dynamic 3D map is an industry-wide challenge requiring advances in computer vision, spatial computing, AI, engineering, and mobile mapping. Richer AR experiences and a 3D map of the entire world create more ways for people to engage with the world, through play and learning, by interacting more deeply with the visual environment around them. These are the drivers of Niantic’s long-term vision for the Niantic Real World Platform.
Each project is different, and three of them were driven in large part by brilliant PhD students who took part in our annual Niantic R&D internship program. See the video summaries and check out the code and other details in the links below.
Natural and believable AR experiences are key to Niantic’s vision for its own games as well as for the overall platform and the AR industry at large. For more information, please visit https://research.nianticlabs.com/.
For a deeper view of our research, please read the full text of the papers presented at ECCV 2020 here:
Learning Stereo from Single Images - 'Learning Stereo from Single Images' presents a new way to create data for training stereo matching networks. Typically, deep stereo networks are trained on synthetically rendered images generated from 3D graphics pipelines. These training images are not photo-realistic, but they come with perfect ground-truth depths. In our work we take an alternative approach, synthesizing stereo training pairs from ordinary single images, which, surprisingly, we find significantly outperforms 3D-rendered data.
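The core of turning a single image into a stereo training pair is forward-warping pixels by a disparity map. The sketch below (function name is ours, not the paper's code) shifts each pixel of a left image by its disparity to build a synthetic right view, processing far pixels before near ones so that nearer surfaces correctly occlude farther ones; the real pipeline also handles hole filling and depth-edge artifacts.

```python
import numpy as np

def synthesize_right_view(left, disparity, background=0):
    """Forward-warp a left image into a synthetic right view.

    Each pixel moves left by its (positive) disparity. Unfilled pixels
    keep the `background` value; a full pipeline would inpaint them.
    """
    h, w = disparity.shape
    right = np.full_like(left, background)
    # Visit pixels far-to-near so nearer pixels overwrite farther ones.
    order = np.argsort(disparity.ravel())
    ys, xs = np.unravel_index(order, disparity.shape)
    for y, x in zip(ys, xs):
        xr = x - int(round(disparity[y, x]))
        if 0 <= xr < w:
            right[y, xr] = left[y, x]
    return right
```

With a constant disparity of 1, the output is simply the input shifted one pixel to the left, with a one-pixel hole on the right border.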
Single Image Depth Prediction Makes Feature Matching Easier - To build 3D maps of the real world, the relative camera pose between pairs of images needs to be estimated. In this work we use monocular depth estimation networks to predict depth for each image, which allows us to detect planar surfaces that can be rectified, and to extract view-invariant local features for better feature matching. Rectification even allows us to match images that observe a scene from opposing views, a common scenario in multiplayer AR.
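Rectifying a detected plane amounts to virtually rotating the camera so the plane becomes fronto-parallel. A minimal sketch of that step, assuming camera intrinsics K and a plane normal n (e.g. fit to the predicted depth) are already available; the function name and interface are ours, not the paper's:

```python
import numpy as np

def rectifying_homography(K, n):
    """Homography that virtually rotates the camera so a plane with
    normal `n` (camera coordinates) becomes fronto-parallel.

    For a pure rotation R, image points map as H = K @ R @ inv(K).
    R is built with Rodrigues' formula to rotate n onto the optical
    axis z = (0, 0, 1).
    """
    z = np.array([0.0, 0.0, 1.0])
    n = n / np.linalg.norm(n)
    v = np.cross(n, z)              # rotation axis (unnormalized)
    c = float(n @ z)                # cosine of rotation angle
    if np.allclose(v, 0):           # plane is already fronto-parallel
        R = np.eye(3)
    else:
        vx = np.array([[0, -v[2], v[1]],
                       [v[2], 0, -v[0]],
                       [-v[1], v[0], 0]])
        R = np.eye(3) + vx + vx @ vx / (1 + c)
    return K @ R @ np.linalg.inv(K)
```

Warping the image with this homography (e.g. `cv2.warpPerspective`) then lets standard local features be extracted on an approximately view-invariant, fronto-parallel patch.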
Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings - Finding images that observe the same part of a scene is an important procedure in mapping and visual localization. In this paper, we propose a novel model to measure the extent to which two images picture the same scene, i.e. their visual overlap, without resorting to expensive operations across multiple scales. We show that the visual overlap between a pair of images lets us interpret the relative relation between the two cameras. However, visual overlap is not symmetric, so it cannot be modeled with traditional approaches such as vector embeddings. Our solution is to represent images as high-dimensional boxes, which allow us to model non-symmetric measures between images.
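The asymmetry is easy to see with the box representation itself. In the sketch below (notation ours, not the paper's code), each image is a D-dimensional axis-aligned box, and the overlap of A with respect to B is the intersection volume normalized by A's own volume. A close-up can be fully contained in a wide shot while the wide shot is only partly covered by the close-up, so the two directions give different values, something a symmetric distance between vector embeddings cannot express:

```python
import numpy as np

def box_overlap(box_a, box_b):
    """Fraction of box_a's volume contained in box_b (asymmetric).

    Each box is a (min_corner, max_corner) pair of D-dimensional arrays.
    """
    lo = np.maximum(box_a[0], box_b[0])       # intersection min corner
    hi = np.minimum(box_a[1], box_b[1])       # intersection max corner
    inter = np.prod(np.clip(hi - lo, 0, None))  # 0 if boxes are disjoint
    vol_a = np.prod(box_a[1] - box_a[0])
    return inter / vol_a
```

For a small box nested inside a larger one, the overlap is 1.0 in one direction and strictly less in the other.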
Image Stylization for Robust Features - In this technical report, we train local features (detectors and descriptors) that are robust to appearance changes by strongly augmenting the training data with image stylization. Typically, deep local features are trained with color augmentation, but we show that image stylization with a variety of styles and style examples improves performance significantly. Models trained with the proposed approach took second place in two localization competitions at ECCV.
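The report uses a full style-transfer network for stylization; as a toy stand-in to illustrate the shape of such an augmentation pipeline, the sketch below (all names and probabilities are our assumptions) transfers per-channel mean and standard deviation from a randomly chosen style example, the core idea behind AdaIN-style stylization, in place of plain color jitter:

```python
import numpy as np

def adain_stylize(content, style):
    """Crude stylization by channel-wise statistics transfer: each
    content channel is re-normalized to the style image's per-channel
    mean and std. A real pipeline would use a style-transfer network."""
    c = content.astype(np.float64)
    s = style.astype(np.float64)
    c_mu, c_std = c.mean(axis=(0, 1)), c.std(axis=(0, 1)) + 1e-8
    s_mu, s_std = s.mean(axis=(0, 1)), s.std(axis=(0, 1))
    return (c - c_mu) / c_std * s_std + s_mu

def augment(image, style_bank, rng, p_stylize=0.5):
    """With probability p_stylize, stylize with a random style example;
    otherwise fall back to a simple brightness jitter."""
    if rng.random() < p_stylize:
        style = style_bank[rng.integers(len(style_bank))]
        return adain_stylize(image, style)
    return image * rng.uniform(0.8, 1.2)
```

Training on such stylized copies exposes the feature network to far larger appearance changes than color jitter alone, which is the effect the report attributes its robustness gains to.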
–The Niantic Research team