Resources
Browse through videos, guides, and other educational resources that cover incident management, reliability, team culture, and more.
Podcasts
6.23.2021
Resilience in Action E8: Vanessa Yiu on Crafting Enterprise Architecture
Kurt chats with Vanessa Yiu, Head of Enterprise Architecture at Goldman Sachs. Vanessa shares her perspective on enterprise architecture, experience in operating enterprise-scale platforms, chairing the first global SRECon, advocating for women in STEM, and how enterprises can embark on the journey of making reliability more important.
Blog
6.11.2021
Complete Guide to Service Level Objectives (SLOs) That Work
A "Service Level Objective" (SLO) is an internal target that measures how well a service is performing. Here's how SLOs relate to SLAs, SLIs, and error budgets.
Videos
6.2.2021
LISA21 - Groove with Ambiguity: The Robust, the Reliable, and the Resilient
The networked software systems we build are increasing in complexity every moment. Today the most successful builders and operators are embracing complexity through CI/CD, Chaos Engineering, and innovation in Incident Response. They realize that the adaptive world around us is advancing at such breakneck speed that it is leaving our capacity to understand it in the dust, and that humans and technology must run a gauntlet of automation surprises and collaboration challenges as a team, learning and improving along the way. This session showcases methods of deploying, running, and navigating complexity. It offers a practical view of how software systems can scale and remain robust to failure (through fallbacks or high availability), achieve highly reliable socio-technical operations (via runbooks and game days), and adapt to surprise through techniques of resilience engineering (graceful extensibility and building for adaptation).
Blog
6.2.2021
Error Budgets Explained (And How to Make One for Your Team)
Wondering what error budgets (EBs) are and how they are useful? We explain what they are, how they are defined, and how they can help your team.
Blog
5.31.2021
The 7 SRE Principles [And How to Put Them Into Practice]
Whether you're just adopting SRE or optimizing your current processes, we can help. We’ll explain the 7 key principles of SRE and how to put them into practice.
Blog
5.25.2021
Building an SRE Team? Roles and Responsibilities Explained
Are you considering adopting SRE? We will explain the roles and responsibilities of an SRE team within your organization, and how to start building one.
Blog
5.24.2021
SRE Culture [How to Build a Better Team]
If you're just adopting SRE or improving your current environment, we’ll help explain SRE culture and how to create a blameless development process. So what is SRE Culture? Let's talk about it.
Podcasts
5.19.2021
Resilience in Action E7: Killing Ops with Tony Hansmann
In our seventh episode, Kurt chats with Tony Hansmann, Former Global CTO at Pivotal Software, Inc., about the joys and pains of being a consultant, how teams view digital transformation, how Tony is working towards killing ops, and more.
Incident Impact Calculator
Find out how much you could save
Incidents can do real damage to companies that aren't sufficiently prepared for them. Use our calculator to estimate the full cost of incidents for your team.
use the calculator