Open Sourcing Modron: Managing Cloud Security at Scale at Niantic

October 12, 2022

Compliance and security scanning on cloud is a growing need not always addressed by platforms directly.
With Modron, Niantic is giving the cloud security community a tool that adapts well to the needs of a large cloud footprint, filling the communication and inventory gaps of existing tooling.

Niantic’s security team benefits from the open-source security community in several components of our infrastructure security program. Today, the whole team is thrilled to have a chance to give back to the cloud security community by releasing our cloud security posture management tool: Modron. In this post we will explain what drove us to create Modron and the core design principles.

As the number of resources has sharply increased with the expansion and growth of cloud infrastructure, maintaining sensible security configuration and compliance has become more and more challenging. Numerous security tools still assume that maintaining inventory, collecting data, looking at results and fixing issues are all performed by the same person. At Niantic, we have tens of thousands of virtual instances and on the order of a thousand cloud projects. Modron is the result of our laser-focused work on compliance processes at scale.

Cloud asset management at scale

To deliver and run the games enjoyed daily by millions of players around the world, Niantic gives each team flexibility and considers its specific needs. The engineers know best what they need in terms of infrastructure and maintenance, so it is better to empower them than to restrict the teams with security tooling. Accompanying our teams with guidance is the work of our security team. As the company grows, systematic review of all our infrastructure has become a titanic task. In order to keep up the quality of our work, we needed a tool that can handle all phases of the security review.

Security scanning process description

Inventory is far more than a simple list of IP addresses or assets. Without the right metadata, a list of IPs is worthless and any attached list of vulnerabilities is not attributable. A useful inventory must contain the relevant assets along with metadata such as a date and ownership information. The owner of an asset must be someone who has the permission and the technical knowledge to evaluate reported issues and act on them if necessary.
Data collection has been greatly improved with the cloud. One of the big advantages of cloud platforms is that configuration data can be collected directly from the platform without impacting running workloads. Storing the collected data along with timestamps allows for going back in time and looking at the evolution of the infrastructure.
Data analysis consists of comparing collected data with the expected state. This has made significant progress in recent years and most of the security tools today provide a decent version comparison tool as well as detection for common misconfigurations.
Action is probably the most important part of this process: corrective actions must be taken by a person who has context on the impacted asset. Not all recommendations should be applied blindly, and centrally applying changes to production environments can be very damaging. Reaching out to the owner of the asset, tracking remediation actions at scale and measuring progress on these remediations is the responsibility of our security team.
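The four steps above can be sketched in a few lines of Python. This is a minimal illustration only, not Modron's implementation: the Asset record, its field names and the example expected state are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Asset:
    """Hypothetical inventory record: an asset plus the metadata that makes it useful."""
    name: str
    owner: str              # who can evaluate and act on findings
    collected_at: datetime  # when this snapshot was taken
    config: dict            # configuration as fetched from the platform


def analyze(asset, expected):
    """Data analysis: compare the collected configuration with the expected state."""
    return [
        f"{asset.name}: {key} is {asset.config.get(key)!r}, expected {want!r}"
        for key, want in expected.items()
        if asset.config.get(key) != want
    ]


def findings_by_owner(assets, expected):
    """Action: group findings by owner so each one reaches someone who can act."""
    routed = {}
    for asset in assets:
        for finding in analyze(asset, expected):
            routed.setdefault(asset.owner, []).append(finding)
    return routed


# Example data (made up for illustration).
snapshot_time = datetime(2022, 10, 12, tzinfo=timezone.utc)
assets = [
    Asset("bucket-a", "team-a@example.com", snapshot_time, {"public": True}),
    Asset("bucket-b", "team-b@example.com", snapshot_time, {"public": False}),
]
routed = findings_by_owner(assets, {"public": False})
```

Keeping the owner and the collection timestamp on the record itself is what makes each finding attributable and lets snapshots be compared over time.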

State of the art

Security automation has made significant progress in recent years. Unfortunately, this progress has been concentrated on the data analysis part of the security automation process. Better detection means that more malicious behavior can be detected, more techniques are recognized, and more assets are supported.

The inventory and action parts of the process have mostly been neglected. Many security teams still drive rollout and remediation efforts with spreadsheets, and find the owners of assets by running command lines or, worse, by mass emailing their colleagues to figure out who owns a specific piece of infrastructure. This lack of support and automation makes the toil on security teams grow linearly with the size of the infrastructure, yet budgets and headcounts do not usually follow this trend.

Another major limitation of existing cloud security posture management tools is the language they provide for defining rules. In a crusade for simplicity, rule definition is usually restricted to a descriptive language, which does not allow complex rules to be written. As an example, the following rule cannot be expressed in most tools available today: Report the list of unused exported credentials granting privileged access to infrastructure exposed to the Internet.
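To show why such a rule needs more than a descriptive language, here is a minimal Python sketch of it. It has to join three kinds of data: exported keys, role bindings and the set of Internet-exposed resources. The ExportedKey and RoleBinding types, the choice of privileged roles and the 90-day idle threshold are assumptions made for illustration; this is not Modron's rule API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ExportedKey:
    """Hypothetical record for a credential exported from the platform."""
    key_id: str
    service_account: str
    last_used: Optional[datetime]  # None means the key was never used


@dataclass
class RoleBinding:
    """Hypothetical record granting a role to a member on a resource."""
    member: str
    role: str
    resource: str


# Assumption: which roles count as "privileged" is a policy decision.
PRIVILEGED_ROLES = {"roles/owner", "roles/editor"}


def unused_privileged_keys(keys, bindings, exposed_resources, now,
                           max_idle=timedelta(days=90)):
    """Return IDs of idle keys whose account is privileged on an exposed resource."""
    privileged_on_exposed = {
        b.member
        for b in bindings
        if b.role in PRIVILEGED_ROLES and b.resource in exposed_resources
    }
    return [
        k.key_id
        for k in keys
        if k.service_account in privileged_on_exposed
        and (k.last_used is None or now - k.last_used > max_idle)
    ]


# Example data (made up for illustration).
now = datetime(2022, 10, 12, tzinfo=timezone.utc)
keys = [
    ExportedKey("key-1", "sa-deploy", None),
    ExportedKey("key-2", "sa-deploy", now - timedelta(days=1)),
    ExportedKey("key-3", "sa-reader", None),
]
bindings = [
    RoleBinding("sa-deploy", "roles/editor", "vm-frontend"),
    RoleBinding("sa-reader", "roles/viewer", "vm-frontend"),
]
flagged = unused_privileged_keys(keys, bindings, {"vm-frontend"}, now)
```

The point of the sketch is that the rule is a join across several resource types followed by a filter, which a "one API call, one pattern match" model cannot express.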

Finally, most of the tools out there assume that the different parts of the process are owned by the same person. Ownership of infrastructure in large organizations is usually split across different teams and security processes should bridge these gaps to run efficiently:

Creating and maintaining the inventory is a shared responsibility between all teams running pieces of infrastructure. This process can be supported with automation to make sure that metadata is up-to-date.
Action on findings is the responsibility of the team owning the impacted asset. This will prevent undesirable side-effects where multiple teams work on the same asset and help increase reliability of the infrastructure.
Starting the data collection in a cloud context consists mostly of fetching data from the platform APIs, and an automated process can be set up for this. With more traditional scans, where the assets under scan are sent large amounts of data, an SRE team may want to own the scan schedule to make sure that it takes production requirements into account.
Defining policies with organization-wide expectations and rules is usually the responsibility of the security team and the leadership of the organization, taking into account the business needs and budgets.

In this context, most of the reported issues stay unaddressed, defeating the purpose of having an automated detection in the first place.

Modron

With Modron, Niantic’s security team is offering the community a tool that focuses on improving the inventory and action steps of the cloud security management pipeline.

Modron is a compliance scanner that:

Is designed with multi-cloud in mind
Adapts to specific needs with flexible rule writing by moving away from the model “one API call → one regex”
Scales to large infrastructure (more than 1000 GCP projects)
Forwards findings to the right people and maintains awareness until the findings are mitigated
Delivers insights into the infrastructure to the security team and the company leadership
Is low maintenance, to align with the needs of small security teams

Modron is able to extract ownership information directly from the access information of your infrastructure. This way the ownership information is guaranteed to be up-to-date.
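As a rough sketch of the idea, ownership can be derived from an IAM policy by collecting the members bound to an owner-like role. The policy shape below follows the bindings/role/members layout of GCP IAM policies; which roles count as ownership is an assumption here, and the function name is hypothetical rather than Modron's own.

```python
# Assumption: members holding one of these roles are treated as owners.
OWNER_ROLES = ("roles/owner",)


def owners_from_policy(policy):
    """Return the sorted, de-duplicated members bound to an owner role."""
    owners = set()
    for binding in policy.get("bindings", []):
        if binding.get("role") in OWNER_ROLES:
            owners.update(binding.get("members", []))
    return sorted(owners)


# Example policy (made up for illustration).
policy = {
    "bindings": [
        {"role": "roles/owner", "members": ["user:alice@example.com", "group:team-a@example.com"]},
        {"role": "roles/viewer", "members": ["user:bob@example.com"]},
    ]
}
owners = owners_from_policy(policy)
```

Because the owners are read from the same access configuration that governs the resource, they cannot drift the way a manually maintained spreadsheet can.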

Modron serves as a communication platform in tandem with the included notification service Nagatha, which regularly sends the owners of assets the list of issues they can act on. By aggregating and rate limiting the notifications, Nagatha helps reduce the alert fatigue created by receiving too many notifications too often. Modron supports the creation of new findings and linking them to existing resources. By linking new findings to resources, ownership information can be added to the findings and notifications can be sent automatically to the owners of the impacted resources.
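The aggregation and rate limiting described above could look roughly like the following Python sketch. It illustrates the technique only and is not Nagatha's code: the Notifier class, its method names and the one-week interval are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


class Notifier:
    """Hypothetical notifier: one digest per owner, at most once per interval."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.pending = defaultdict(list)  # owner -> findings not yet sent
        self.last_sent = {}               # owner -> time of the last digest

    def queue(self, owner, finding):
        """Aggregate: collect findings per owner, ignoring duplicates."""
        if finding not in self.pending[owner]:
            self.pending[owner].append(finding)

    def flush(self, now):
        """Rate limit: emit a digest only for owners not notified recently."""
        digests = {}
        for owner, findings in list(self.pending.items()):
            last = self.last_sent.get(owner)
            if findings and (last is None or now - last >= self.min_interval):
                digests[owner] = findings
                self.last_sent[owner] = now
                del self.pending[owner]
        return digests


# Example run (made up for illustration).
start = datetime(2022, 10, 12, tzinfo=timezone.utc)
notifier = Notifier(min_interval=timedelta(days=7))
notifier.queue("team-a@example.com", "bucket-a is public")
notifier.queue("team-a@example.com", "key-1 is unused")
notifier.queue("team-a@example.com", "bucket-a is public")  # duplicate, ignored
first = notifier.flush(start)

notifier.queue("team-a@example.com", "vm-frontend has a public IP")
too_soon = notifier.flush(start + timedelta(hours=1))
later = notifier.flush(start + timedelta(days=7))
```

Findings that arrive while an owner is rate limited are not dropped; they wait in the queue and go out with the next digest.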

Modron scales up the capacity of your cloud security team to deal with infrastructure compliance.

Modron runs on GCP and collects data from GCP today, but is designed to be extensible and can support multiple cloud platforms.

Summary

Compliance and security scanning on cloud is a growing need not always addressed by the platforms directly. With Modron, Niantic is giving the cloud security community a tool that adapts well to the needs of a large cloud footprint, filling the communication and inventory gaps of existing tooling.

Modron scales the detection and communication processes up to the next level by automating verification of issues and notifying the asset owners.

If you’re interested in cloud security and want to improve the status quo, join our team!

– The Niantic Security Team

The hero image at the top of the blog post was generated with DALL-E using the prompt "logo art of a victorian cubical robot in a tuxedo with a top hat and holding binoculars".