Resources
Browse through videos, guides, and other educational resources that cover incident management, reliability, team culture, and more.
Blog
Ebook
11.17.2020
How Mercari Scales Vision, Culture, & Reliability
In a recent fireside chat, Mohan Bhatkar, Head of Engineering for the Customer Reliability Platform at Mercari, Inc., sat down with Blameless Co-Founder Ashar Rizqi. They talked about scaling while avoiding silos, exciting day-to-day challenges, instilling a culture of empowerment, and more. Here are their top insights and the lightly edited transcript of their conversation.
10.8.2020
How to Construct a Reliability Model for your Organization
In this post, we’ll construct a basic reliability model and show you how to create one for your own organization.
9.24.2020
Here's your Complete Definition of Software Reliability
In this blog post, we’ll break down what software reliability means. We’ll look at how the reliability of your software is perceived, how teams operate to improve reliability, and how to contextualize reliability with customer happiness and cultural lessons.
9.17.2020
Availability, Maintainability, Reliability: What's the Difference?
Here we break down reliability in terms of other metrics within reliability engineering: availability and maintainability.
9.8.2020
How to Improve the Reliability of a System
Here are some helpful steps you can take to improve the reliability of a system. We'll use a development project as an example.
9.2.2020
Determining Error Budgets and Policies that Work for Your Team
In this blog, we’ll look at the basics of error budgeting, how to set corresponding policies, and how to operationalize SLOs for the long term.
9.1.2020
How to Build Your SRE Team
In this blog post, we’ll look at some of the many roles an SRE can play, and how to find people with those skill sets.
8.20.2020
What is a Kubernetes Operator and Why it Matters for SRE
In this blog post, we’ll explain the Kubernetes Operator—the Kubernetes function at the heart of customized automation—and discuss how it can evolve your SRE solution.
8.19.2020
Here are the Metrics you Need to Understand Operational Health
In this blog post, we’ll walk you through holistic measures and best practices that you can employ starting today. These will include challenges and pain points in gaining insight as well as key metrics and how they evolve as organizations mature.
Incident Impact Calculator
Find out how much you could save
Incidents can do real damage to companies that aren't sufficiently prepared for them. Use our calculator to estimate the full cost of incidents for your team.
use the calculator