Creating Sandboxes: How to Make It Easy for Your Engineering and Product Teams

Feb 6, 2021

Justin Moy, Senior Software Engineer

When we formed our Infrastructure team in 2020, our charter centered on scale, reliability, and the developer experience. Our first priority? Developing a staging environment strategy that could keep up with our growing organization.

Initially, we had a set of command-line tools for staging environments that could be run from an engineer’s workstation. However, this approach required significant manual effort to create and manage an environment, which led to inconsistencies such as outdated data and incomplete application coverage and integration. The required workstation dependencies and their maintenance added to the overhead. These tools also differed from the ones used to create a production environment, causing drift between the two.

To address these issues, we focused on improving our tooling to create consistent and reproducible results for both our staging and production environments. Here are the five steps we took to get there:

Containerize to isolate applications
Kubernetes-ize to improve consistency
Helm-ify to manage applications
Automate to reduce friction
Optimize to improve user experience

Containerize to isolate applications

Luckily, we were already here. Our deployment process involved creating and deploying Docker containers and using a Procfile to run specific parts of our applications (think separating out web and workers). Packaging our applications as immutable Docker images allowed us to ship consistent and portable code. These images are run as containers which can be executed on separate hosts and scaled independently of each other.

Kubernetes-ize to improve consistency

We wanted to increase our observability, mature our internal monitoring, and maintain our infrastructure in a more consistent and reproducible manner. Kubernetes, and its automated container orchestration (deployment, scaling, and management), allowed us to accomplish all of the above goals.

It was easy for us to get started in Kubernetes thanks to Kind, which allowed us to run clusters locally. This increased development speed and reduced the friction of learning a new external provider.

Helm-ify to manage applications

Helm makes it easy to manage packaged applications in Kubernetes. Some community-provided applications include: NGINX for load balancing and reverse proxying HTTP and other traffic; Prometheus and Cortex for metrics, monitoring, alerting, and dashboarding; Loki for logs; and Sentry for exceptions.

In order to ease the integration with other infrastructure components, enable rapid deployment, and, most importantly, provide us with the consistency and reproducibility we wanted, we created and deployed Helm charts for our applications.

From here, we could stamp out our applications for developer use in sandbox environments: complete end-to-end environments, separate from production, that enable engineers to rapidly develop and test their code. And if an engineer is working on multiple features, they can spin up multiple sandbox environments and manage their applications and data independently of each other. Victory!

Automate to reduce friction

We had already created pipelines in Concourse to automatically deploy and upgrade our infrastructure Helm charts. To eliminate the manual toil of spinning up new environments and to reduce the friction around sandbox adoption for our engineers, we extended these automated pipelines by building a Git-driven continuous deployment pipeline. When an engineer created a commit to a repo defining the state of a sandbox, our Concourse job would apply these changes.
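To make the Git-driven idea concrete, here is a minimal sketch of the comparison such a job might perform: read the desired sandboxes from the repo, compare them with what is currently deployed, and derive the actions to apply. The names SandboxSpec and plan_changes, and the fields on the spec, are hypothetical and chosen for illustration. They are not taken from our pipeline or from Concourse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxSpec:
    """Hypothetical description of one sandbox as committed to the repo."""
    name: str
    image_tag: str


def plan_changes(desired: dict, live: dict) -> list:
    """Compare desired state (from Git) with live state and list actions.

    Both arguments map sandbox name -> SandboxSpec. Returns a list of
    (action, name) tuples in a deterministic order.
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec:
            actions.append(("upgrade", name))
    # Anything deployed but no longer declared in the repo is removed.
    for name in sorted(live):
        if name not in desired:
            actions.append(("delete", name))
    return actions
```

Because the plan is computed from the full desired state rather than from the individual commit, running the job twice against the same commit yields no further actions the second time.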

Optimize to improve user experience

At this point, we could deploy environments that resembled production but were separate from production and other sandbox environments. They connected to the rest of our observability stack, including metrics, logs, traces, and exceptions.

As we prepared to scale to the rest of the product and engineering organization, we focused on the following ideas: further reducing manual toil, removing barriers to entry, reducing cost, and optimizing the development workflow. The diagram below shows our sandbox architecture.

Reduce manual toil (operator)

In Kubernetes, an operator acts on custom resources in order to automate a task or set of tasks. The custom resource allows us to create a well-defined entity and extend configurability options to users. The operator allows us to customize individual sandbox environments based on changes to the custom resource.

We used Kubebuilder to create our operator and CRD (custom resource definition). Kubebuilder was a great starting point, and it provides tooling that speeds up development of a new operator.

Using this pattern, we could easily perform CRUD operations (create, read, update, delete) on a custom resource using standard tools like kubectl, and the operator would handle updating our deployments in Kubernetes. These updates could include changing application configuration, deploying a new Docker image, or injecting different secrets. This flexibility allows us to test many different scenarios, such as developing a new feature, reproducing a bug, upgrading frameworks, and running security and load tests.
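The operator pattern described above boils down to a reconcile step: derive the desired deployment from the custom resource, compare it with what exists, and update only on a difference. The sketch below illustrates that step in plain Python against an in-memory dictionary standing in for the cluster. It is not the Kubebuilder API, and the resource fields (image, replicas, config, secretName) are assumptions made for the example rather than the fields of our CRD.

```python
def desired_deployment(sandbox: dict) -> dict:
    """Derive the deployment we want from a (hypothetical) Sandbox resource."""
    spec = sandbox["spec"]
    name = sandbox["metadata"]["name"]
    return {
        "name": f"{name}-web",
        "image": spec["image"],
        "replicas": spec.get("replicas", 1),
        "env": dict(spec.get("config", {})),
        "secretRef": spec.get("secretName"),
    }


def reconcile(sandbox: dict, cluster: dict) -> bool:
    """Bring the in-memory 'cluster' in line with the custom resource.

    Returns True when a change was applied and False when the cluster
    already matched, so repeated calls are idempotent.
    """
    want = desired_deployment(sandbox)
    have = cluster.get(want["name"])
    if have == want:
        return False
    cluster[want["name"]] = want
    return True
```

In this sketch, editing the image, configuration, or secret reference on the resource and reconciling again is all it takes to roll the change out, which mirrors how a change to the custom resource drives the operator.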

Increase adoption by removing barriers to entry (admin web UI)

We built a web UI so users would be able to deploy sandboxes intuitively and easily without extensive knowledge of the underlying tools. The web UI interacts with our custom resource and lets the operator handle updates. It allows our teammates—both engineers and non-engineers—to easily spin up their own environments for development, demo, or training purposes without modifying production data.

In addition, we show helpful information about a sandbox to a user. This includes links to each of our applications; pre-filtered logs, metrics, tracing, and exceptions; and commands to connect to specific applications to ease debugging.

Reduce cost (autoscaler)

Because the sandbox environments are integrated with our monitoring stack, we have the full metrics and logging that we expect from our applications. We created an autoscaler service that performs queries against this information and scales our sandbox environments up and down. We’re able to decrease our costs by over 90% when these environments are not in use and are able to quickly scale them back up when they are being used.
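As an illustration of the scaling decision, the sketch below scales a sandbox to zero once it has seen no activity for a while and restores it otherwise. The 30-minute idle window, the single active replica, and the function name target_replicas are assumptions for the example. The last-activity timestamp is assumed to come from a query against the metrics or logs mentioned above, and the query itself is left out.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed idle window; a real value would be tuned to usage patterns.
IDLE_AFTER = timedelta(minutes=30)


def target_replicas(
    last_request_at: Optional[datetime],
    now: datetime,
    active_replicas: int = 1,
    idle_after: timedelta = IDLE_AFTER,
) -> int:
    """Return how many replicas a sandbox should run right now.

    A sandbox with no recorded activity, or none within the idle
    window, is scaled to zero. Otherwise it runs at active_replicas.
    """
    if last_request_at is None:
        return 0
    if now - last_request_at >= idle_after:
        return 0
    return active_replicas
```

Keeping the decision a pure function of timestamps makes it easy to test without a cluster or a metrics backend.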

Optimize the development workflow (image updater)

An additional feature we built into our sandbox environment is the ability to follow a feature branch in our applications. When an engineer pushes a commit to a pull request in GitHub, a corresponding sandbox will automatically update to this commit. This removes several manual steps from the development workflow and allows engineers to test their code in near real time.
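A minimal sketch of the branch-following logic, assuming each sandbox records which repo and branch it follows and that images are tagged with the commit SHA. The dictionary layout, the follow and image_tag keys, and the function name apply_push are hypothetical. How the push notification arrives from GitHub is left out.

```python
def apply_push(sandboxes: list, repo: str, branch: str, commit_sha: str) -> list:
    """Point every sandbox following repo/branch at the pushed commit.

    Mutates the matching sandbox dictionaries in place and returns the
    names of the sandboxes that actually changed.
    """
    updated = []
    for sandbox in sandboxes:
        follow = sandbox.get("follow", {})
        if follow.get("repo") != repo or follow.get("branch") != branch:
            continue
        # Skip sandboxes already on this commit so repeated events are harmless.
        if sandbox.get("image_tag") != commit_sha:
            sandbox["image_tag"] = commit_sha
            updated.append(sandbox["name"])
    return updated
```

Sandboxes that follow a different branch are left untouched, and a repeated notification for the same commit changes nothing.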

Impact

Sandbox environments have made an impact on the engineering organization and beyond.

Engineers can quickly desk check new code in a production-like environment during code reviews.
Shared sandboxes reduce the blast radius of bug bashes.
Non-engineers can use the Admin UI to spin up sandboxes for demos or feature training without impacting production data. This has proven particularly useful for Product Managers and Designers on the team, who can play with product features and better understand how they work in order to suggest improvements.
Teams have built use case scenarios into their applications. This allows easy reproducibility of critical paths and functionality.

With these five steps—Containerize, Kubernetes-ize, Helm-ify, Automate, Optimize—we’ve laid the foundation for deploying all of our environments in an automated and consistent manner. In addition to applying many of our learnings to improve our production environment, we successfully built core infrastructure that our engineers, product managers, and designers use each month to test their changes and reduce production downtime.

Interested in learning more about engineering at Alto? Follow us on LinkedIn.