Building Blameless right from the beginning

6.1.2018

When Ashar, Lyon and I set out to start Blameless, I was the one in charge of the technical side. As I was exploring options and principles with which to build our system, I had a small realization. What I realized back in September of last year is that our industry has finally reached the tipping point at which it is viable to build distributed systems from scratch, with a fast pace of iteration and a low cost of operation, all with a small team to execute!

Now, if you’ve been in the startup scene for long enough, you’ll immediately react to this and think “nonsense!” Everybody says that if you’re building a startup, hacking and experimentation are essential to move fast enough to achieve product-market fit before you run out of cash and die a brave death. Well, I come from that school as well. I’ve read The Lean Startup many, many times. I hang out on Hacker News. I joined a tiny startup as its first employee almost a decade ago and saw it grow into the hundreds through this philosophy.

Here’s where things changed for me. After rising through the ranks to be in charge of a massive cloud of physical servers, thousands of VMs and dozens of services, I moved on to a big company to help a team of brilliant people plan a massive migration of infrastructure to centralized CI/CD and orchestration. The approaches and tools we explored during that time radically changed the way I think about architectures and how to enable experimentation.

Fast forward to last September; I set out to build the core of what would later become Blameless. I decided to start from first principles, so I wrote down the pillars that I believed would enable the product that Blameless needed. Here are those principles:

Build a system that scales horizontally to support higher workloads and larger numbers of customers
Build a system that can be deployed and operated in different configurations (Single-Tenant Hosted, Multi-Tenant Hosted, Hybrid Cloud, On-Premises)
Maintain an architecture that is deliberate, effective and practical for the product and goals at hand
Have an infrastructure that stays out of the way, enabling multiple teams to deliver software without friction through a reliable and mature process
Deliver top-tier reliability to our customers, leading by example in the reliability and operations fronts
Build an architecture and infrastructure layer that enables fast-paced iteration for our product
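
To make the second principle a little more concrete, here is a minimal sketch of how one codebase might select its deployment configuration at startup. It is not how Blameless actually does it: the `DEPLOYMENT_MODE` variable, the `DeploymentConfig` type and the `tenants_share_instance` flag are all hypothetical names chosen for illustration, and it sticks to the standard library so it runs on Python 3.6.

```python
import os
from enum import Enum
from typing import Mapping, NamedTuple, Optional


class DeploymentMode(Enum):
    """The four configurations named in the principles above."""
    SINGLE_TENANT_HOSTED = "single-tenant-hosted"
    MULTI_TENANT_HOSTED = "multi-tenant-hosted"
    HYBRID_CLOUD = "hybrid-cloud"
    ON_PREMISES = "on-premises"


class DeploymentConfig(NamedTuple):
    mode: DeploymentMode
    # Hypothetical flag: do several customers share one running instance?
    # Treating only the multi-tenant mode as shared is an assumption.
    tenants_share_instance: bool


def load_config(env: Optional[Mapping[str, str]] = None) -> DeploymentConfig:
    """Build the config from environment variables (names are assumptions)."""
    env = os.environ if env is None else env
    raw = env.get("DEPLOYMENT_MODE", DeploymentMode.MULTI_TENANT_HOSTED.value)
    mode = DeploymentMode(raw)  # raises ValueError on an unknown mode
    return DeploymentConfig(
        mode=mode,
        tenants_share_instance=(mode is DeploymentMode.MULTI_TENANT_HOSTED),
    )


if __name__ == "__main__":
    print(load_config({"DEPLOYMENT_MODE": "on-premises"}))
```

The idea is that services read the mode once and branch on a small, typed config object, rather than scattering environment checks through the code. Failing loudly on an unknown mode keeps a mistyped value from silently falling back to the default.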

As I said earlier, some of these sound pretty counter-intuitive, don’t they? We’re supposed to move fast, incurring technical debt and cutting as many corners as possible to out-compete the big players and pull off a miracle. Do things that don’t scale! Well, almost a year into this adventure, I’m happy to report that we’ve been able to stick to these principles. By choosing the right set of tools, our team can prototype faster than ever and iterate quickly, all while keeping tech debt at a manageable level and setting ourselves up for an easier future as our product matures and our scale grows.

We’re going to be writing about these principles, strategies, and tools throughout this blog post series, but to get things going, here’s the list of technologies we’re using to put them into practice.

Our Stack:

Python 3.6 and Pipenv
Nameko and RabbitMQ
MongoDB
Cookiecutter

Our Infrastructure:

GCP and GKE
Kubernetes and Helm
SOPS and helm-secrets
Travis CI
Weaveworks
Auth0
GitHub Releases
