Resources
Browse through videos, guides, and other educational resources that cover incident management, reliability, team culture, and more.
Videos
11.4.2022
Too many people in the room
When something goes wrong, it can be tempting to gather as many people as you can to fix it. Each person can contribute tremendous value through diverse viewpoints, but too many people can overcrowd your response, leading to miscommunication, redundant work, and much more.
Videos
11.4.2022
Varieties of Incident Response
Have you ever wondered if there was a better way to respond to incidents? When you are in the midst of an incident, does the process help you and your teammates, or is it more of a burden? There have been a variety of approaches to organizing people and teams over the 30+ years of online services. Each of them has benefits and drawbacks. This talk will dive into a representative set of these approaches to examine them and give the audience a wider context in which to evaluate their own arrangements for incident response. The talk will also look at incident response from a more abstract, task/intent-focused perspective to give a framework against which processes can be examined and adjusted to be more enabling and less burdensome. (And no, this is not a lite beer commercial ;-))
Blog
11.3.2022
Service Level Management Process Explained (with Examples)
Service level management requires airtight processes to ensure SLAs are on track and to catch any issues beforehand, while following these ITIL best practices.
Videos
10.24.2022
How to Create a Streamlined Incident Management Runbook
Streamline your incident management runbook with this 60-minute Blameless workshop.
Blog
10.19.2022
What Is Infrastructure Monitoring & How Does It Work?
Good infrastructure monitoring goes beyond diagnosing performance and availability issues. Make sure your tool also meets these requirements.
Blog
10.12.2022
Reliability vs. Availability: What’s The Difference?
Availability is the percentage of time a system is available to users, while reliability is the likelihood that the system will meet a certain level of performance.
Customer Stories
10.6.2022
Machinify finds "tremendous value" in Blameless and responds to all incidents with full confidence
Blog
10.5.2022
SRE Hiring Guide - Interview Questions and Skills to Look for
Hiring top SRE talent requires writing an attractive job description and asking smart interview questions. In this guide, we’ll go over what you should prepare.
Incident Impact Calculator
Find out how much you could save
Incidents can do real damage to companies that aren't sufficiently prepared for them. Use our calculator to estimate the full cost of incidents for your team.
Use the calculator