By Gabe Brostow
This is the second in a three-part series on Engineering Peridot. Read part 1 here!
By now, the secret project is out! The world’s most innovative fully Augmented Reality (AR) and AI-powered game has launched, and it’s called Peridot. In the game you’ll hatch and nurture your very own creature, or “Dot,” and share it with friends. Just like a real pet, Dots thrive on your attention and usually stay with you wherever you go. Dots like to eat treats, get rubbed behind the ears, learn tricks, and take naps. They also like to go out and explore the world, because they have AI capabilities that allow them to, bit by bit (sorry…), understand more about their real-world space. In this post, we’ll touch on just two of the AI components we’ve been developing, so that Dots and other AR characters can become participants in the real world.
Greater Depths
AI in general has been an intrinsic part of developing video game characters and bringing them to life. At Niantic, we’ve been working on AI algorithms for computer vision for many years, with a focus on creating AR experiences that are more immersive and lifelike. In fact, multiple teams and projects came about precisely to bring Dots to life, and to give Peridot a chance of becoming a showcase for new AR capabilities.
“. . . pets need to move and frolic, and for that, we’d need the Dot to understand their space.”
One of the many challenges in computer vision is converting images into accurate 3D shapes, so that Dots can move and interact with the real world. With camera poses and sparse 3D points, we knew that Dots would at least “sit” in the real world despite a player’s hand-held phone moving around. But pets need to move and frolic, and for that, we’d need the Dot to understand their space. We knew that the Visual Inertial Odometry-based SLAM systems in Google’s ARCore and Apple’s ARKit could provide a good foundation, but we needed something more to make Dots truly interactive. Specifically, we needed the camera to capture not only RGB colors but also depth, so that Dots could understand which space is blocked and which is flat enough to walk on.
Fortunately, competition in the computer vision community was heating up in 2017 around Convolutional Neural Networks, or CNNs, that could re-interpret a single RGB frame as a depth buffer. By 2019, our team’s ground-breaking research project MonoDepth2 passed the threshold of being usable in real life. We have continued to innovate on depth inference in many ways (multiple frames, stereo, and occlusions), always in regular cooperation and competition with the scientific community: the CVPR and ECCV conferences are excellent proving grounds!
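For readers who want a feel for what “single RGB frame to depth buffer” means in practice, here is a toy, untrained PyTorch sketch of the general encoder-decoder shape such networks take. It is not MonoDepth2 itself (the real models are far deeper and are trained with self-supervised photometric losses), but the disparity-to-depth conversion at the end mirrors the MonoDepth-style convention.

```python
import torch
import torch.nn as nn

# Toy stand-in for a MonoDepth-style network: an encoder-decoder CNN that maps
# a single RGB frame to a per-pixel disparity map in [0, 1]. Untrained and
# heavily simplified -- purely to show the shape of the problem.
class TinyDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),  # disparity in [0, 1]
        )

    def forward(self, rgb):
        return self.decoder(self.encoder(rgb))

def disp_to_depth(disp, min_depth=0.1, max_depth=10.0):
    """Map sigmoid disparity to depth, following the MonoDepth-style convention."""
    min_disp, max_disp = 1.0 / max_depth, 1.0 / min_depth
    scaled_disp = min_disp + (max_disp - min_disp) * disp
    return 1.0 / scaled_disp

frame = torch.rand(1, 3, 192, 640)            # a single RGB frame (batch, C, H, W)
depth = disp_to_depth(TinyDepthNet()(frame))  # per-pixel depth buffer
print(depth.shape, float(depth.min()), float(depth.max()))
```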
Once regular (non-LiDAR) phones were able to measure depth at a range of 5 meters or more, we discovered various methods to fuse those depths into meshes, with an emphasis on either speed or accuracy. This allows your Dot to perceive the 3D shape of the world, at least the visible parts, through your phone. As you move and look around, the mesh expands until you end the session. It is important to note that your Peridot’s environment is stored on your phone, not on our servers. The mesh also contributes to the realism of the experience in other ways, such as receiving cast shadows from Dots and their toys.
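Our on-device fusion is our own, but the general idea of integrating per-frame depths into a single mesh can be sketched on a desktop with an off-the-shelf TSDF pipeline. The snippet below uses Open3D purely as an illustration, with placeholder intrinsics and poses standing in for what a phone’s tracker would provide.

```python
import numpy as np
import open3d as o3d

# Illustrative TSDF fusion: integrate per-frame depth (+ color) into one volume,
# then extract a triangle mesh. The intrinsics and poses here are placeholders;
# in an AR session they would come from the phone's tracking (ARKit/ARCore).
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault)

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.04,   # 4 cm voxels: coarse but fast
    sdf_trunc=0.12,      # truncation distance for the signed-distance field
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

def integrate_frame(color_np, depth_np, pose_cam_to_world):
    """Fuse one RGB-D frame given its camera-to-world pose (4x4 matrix)."""
    color = o3d.geometry.Image(color_np)   # uint8, HxWx3
    depth = o3d.geometry.Image(depth_np)   # uint16, millimetres
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        color, depth, depth_scale=1000.0, depth_trunc=5.0,  # ~5 m useful range
        convert_rgb_to_intensity=False)
    # Open3D expects the world-to-camera extrinsic, hence the inverse.
    volume.integrate(rgbd, intrinsic, np.linalg.inv(pose_cam_to_world))

# ... call integrate_frame() for each tracked frame, then:
# mesh = volume.extract_triangle_mesh()
# mesh.compute_vertex_normals()
```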
“. . . we had to commission and label copious indoor footage, so a Dot that ventures indoors with you can understand its environment, and feel at ease . . .”
One significant advantage of the mesh, especially for AR experiences, is that the predicted depths enable real-world objects to occlude virtual ones. This means that Dots can now hide behind trees or other obstacles. It’s worth mentioning that in the initial stages, there was no “x-ray” trace of an occluded Dot, which caused some users to worry that their Dot had disappeared behind sofas or other objects.
A Dot occluded by a real tree and a Dot's shadow representing its occlusion
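Under the hood, occlusion is essentially a per-pixel depth test: the Dot is hidden wherever the predicted real-world depth is closer to the camera than the Dot’s own rendered depth. A toy NumPy version of that comparison (not the engine’s actual renderer) looks like this:

```python
import numpy as np

# Per-pixel occlusion test: the Dot is hidden wherever the real world is closer
# to the camera than the Dot's rendered depth. Toy arrays for illustration only.
real_depth  = np.array([[2.0, 2.0, 0.8],
                        [2.0, 0.9, 0.8]])    # metres, from the depth network
dot_depth   = np.full_like(real_depth, 1.5)  # metres, from rendering the Dot
dot_visible = dot_depth < real_depth         # True where the Dot should be drawn

print(dot_visible)
# [[ True  True False]
#  [ True False False]]
```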
Scene Understanding for Pets
The other significant AI component of Peridot is its understanding of the semantics of a Dot’s environment. Semantic segmentation, another big challenge in computer vision, seeks to label each pixel in an image with a category such as “sky” or “water.” Having started on Peridot well into the Deep Learning era, we knew that standard supervised learning methods would yield usable CNNs. Shrinking them down to run on a phone was a little harder. Figuring out which semantic categories would matter to the Peridot creatures, and getting the right data, proved harder still!
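To make the idea concrete, here is a short sketch using an off-the-shelf, mobile-sized segmentation network from torchvision. It is not our model or our label set, but it shows what “a category label for every pixel” from a compact CNN looks like in code.

```python
import torch
from torchvision.models.segmentation import lraspp_mobilenet_v3_large

# Off-the-shelf, mobile-sized segmentation network (LR-ASPP on MobileNetV3).
# Not Niantic's model or label set -- just an illustration of per-pixel labels
# from the kind of compact CNN that can be shrunk down to run on a phone.
model = lraspp_mobilenet_v3_large(weights="DEFAULT").eval()

frame = torch.rand(1, 3, 480, 640)    # placeholder RGB frame
with torch.no_grad():
    logits = model(frame)["out"]      # (1, num_classes, 480, 640)
labels = logits.argmax(dim=1)         # per-pixel class index
print(labels.shape, labels.unique())
```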
Dots recognize their environments whether they are indoors or outdoors
Peridot lore tells us they were originally outdoor creatures, so they need to understand foliage, sky, and water, because each impacts the Dot’s behavior. But there are many types of “ground,” and Dots will eventually be able to distinguish between them. These days, Dots can also be found indoors. This meant we had to commission and label copious indoor footage, so a Dot that ventures indoors with you can understand its environment, and feel at ease, even if it doesn’t yet understand things like glass and mirrors.
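As a loose illustration of how semantic channels drive behavior (the channel names and responses below are hypothetical; the real list and tuning live inside the game), you can think of the mapping like this:

```python
# Hypothetical mapping from semantic channels to Dot behaviours -- illustrative
# names only, not the game's actual channel list or behaviour system.
SURFACE_BEHAVIOUR = {
    "ground":  "walk and play",
    "foliage": "sniff and forage",
    "water":   "splash, don't path across",
    "sky":     "ignore for navigation",
}

def behaviour_for(label: str) -> str:
    return SURFACE_BEHAVIOUR.get(label, "explore cautiously")

print(behaviour_for("water"))
```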
Dots are also increasingly being exposed to people and real-world pets. Here I’d like to point out a serious lesson that we learned from the vision researchers who came before us: biased data is lurking everywhere! I’m especially proud of our home-made person segmentation, both because it has good sensitivity, with fairly few false negatives, and especially because its accuracy in recognizing people is reasonably consistent across different skin tones, genders, ages, etc. You can see the scores and use this CNN in your own AR projects by checking out our Lightship docs.
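As a rough sketch of the kind of check this involves (with placeholder group names and data, not our actual evaluation set or the scores published in the Lightship docs), per-group sensitivity can be measured like this:

```python
import numpy as np

# Minimal sketch of a fairness check: measure person-segmentation sensitivity
# (recall) separately per annotated subgroup and compare the spread.
def sensitivity(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """True-positive rate over person pixels, i.e. 1 - false-negative rate."""
    person_pixels = gt_mask.astype(bool)
    if person_pixels.sum() == 0:
        return float("nan")
    return float((pred_mask.astype(bool) & person_pixels).sum() / person_pixels.sum())

def per_group_sensitivity(samples):
    """samples: iterable of (group_name, predicted_mask, ground_truth_mask)."""
    scores = {}
    for group, pred, gt in samples:
        scores.setdefault(group, []).append(sensitivity(pred, gt))
    return {group: float(np.nanmean(vals)) for group, vals in scores.items()}

# The interesting number is the gap between the best- and worst-scoring group,
# not just the overall average.
```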
“Today, when you’re playing with your Dot, it sees the geometric shape of the world, and understands how to avoid obstacles, how to plan paths that get it to higher ground, and that some surfaces allow bouncing of a tennis ball . . .”
AR Capabilities - Assemble!
“Now we just need ‘X’!” is a favorite catchphrase on our Research and Development team, and readers with experience building AIs, ARs, or AR-AIs will already know that few integrations happen without revealing new (and often fun) problems. So it’s no surprise that combining fused depths with semantics led to our Gameboard capabilities only after significant effort. Today, when you’re playing with your Dot, it sees the geometric shape of the world, and understands how to avoid obstacles, how to plan paths that get it to higher ground, and that some surfaces allow bouncing of a tennis ball, while other surfaces go splash!
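As a loose, hypothetical illustration of what pathing over such a Gameboard involves (a toy grid and cost table, not Lightship’s actual Gameboard API), a Dot-style planner might look like this:

```python
import heapq

# Toy "gameboard": a grid whose cells are tagged from fused geometry + semantics.
# Costs and labels are illustrative; the real Gameboard is a richer structure
# built from the mesh and the segmentation, not this 2D grid.
WALKABLE = {"ground": 1.0, "grass": 1.5}   # step costs by surface type
BLOCKED  = {"obstacle", "water"}           # Dots route around these

def plan_path(board, start, goal):
    """A* over a 2D list of surface labels; returns a list of (row, col) cells."""
    rows, cols = len(board), len(board[0])
    heur = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])
    frontier = [(heur(start), 0.0, start, [start])]
    seen = set()
    while frontier:
        _, cost, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        if cell in seen:
            continue
        seen.add(cell)
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and board[nr][nc] not in BLOCKED:
                new_cost = cost + WALKABLE.get(board[nr][nc], 2.0)
                heapq.heappush(frontier, (new_cost + heur((nr, nc)), new_cost,
                                          (nr, nc), path + [(nr, nc)]))
    return None  # no route, e.g. the goal is surrounded by water

board = [["ground", "ground",   "water"],
         ["grass",  "obstacle", "ground"],
         ["ground", "ground",   "ground"]]
print(plan_path(board, (0, 0), (2, 2)))
```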
From babies to building families, Dots will follow you everywhere
The Gameboard and other aspects of Peridot and our underlying AI will continue to evolve. I, for one, look forward to seeing Dots playing together in multiplayer sessions, and to using our visual positioning system for persistence: to build and retrieve large virtual maps so a Dot can remember past adventures. I’m also excited to see the feedback from all the Peridot “Keepers”: what does seeing a fully-AR game make you wish for next?
Gabe Brostow
Gabe Brostow is a Chief Scientist at Niantic working on research and development in the computer vision space.