Blog

Blog

Ebook

9.14.2021

What is the DevOps Lifecycle? The Complete Guide

Want to know more about the DevOps lifecycle? We explain the seven phases of DevOps and how each one plays a vital role in the development process.

Blog

Ebook

9.8.2021

DevOps vs. Agile | Understanding the Differences

Curious about the differences between the DevOps and Agile development methodologies? We'll explore and compare both approaches.

Blog

Ebook

9.1.2021

What is Container Orchestration? Key Concepts Explained

Once your organization manages more containers than it can handle by hand, you need container orchestration. We'll explain why.

Blog

Ebook

8.26.2021

8 DevOps Best Practices for a High-Performance Team

DevOps best practices involve implementing continuous integration and deployment, testing early and often, and changing the organization’s culture.

Blog

Ebook

8.24.2021

Self-Compassion Instead of Self-Blame

Learn how to resist the urge to blame yourself as an SRE by understanding the downsides of self-blame and discovering how to practice self-compassion.

Blog

Ebook

8.18.2021

DevOps & SRE Words Matter: How Our Language Has Evolved

DevOps and SRE language: discover the key differences between industry terms such as postmortem vs. retrospective, RCA vs. CFA, and disaster vs. incident.

Blog

Ebook

8.12.2021

What Is DevOps? Understanding How It Works and Its Benefits

Wondering what DevOps is all about? We'll explain what it is, how it works, why it matters, and how it can help your organization.

Blog

Ebook

7.20.2021

A Guide to Understanding Observability & Monitoring in SRE Practices

Wondering what the difference is between observability and monitoring? We will explain how they are related, why they are important, and some suggested tools that can help.

Blog

Ebook

7.1.2021

Elephant in the Blameless War Room: Accountability

Blog

Ebook

6.11.2021

Complete Guide to Service Level Objectives (SLOs) That Work

A service level objective (SLO) is an internal target for how well a service should perform. Here's how SLOs relate to SLAs, SLIs, and error budgets.

Blog

7.17.2020

The Essential List of Top SRE Resources

Are you looking to get up to speed on SRE fundamentals with the best SRE and DevOps books? Or are you hoping to expand your SRE knowledge into new domains? Either way, our list of essential SRE resources has you covered!

Blog

7.16.2020

5 Tips for Getting Alert Fatigue Under Control

For the health and well-being of your team members, it's important to minimize alert (or pager) fatigue as much as possible. After all, the health of your systems depends on the health of your people. Here are 5 tips for cutting down on alert fatigue and improving your signal-to-noise ratio.

Blog

7.15.2020

Leadership and Innovation with Instacart's VP of Infrastructure

Blameless CEO Ashar Rizqi recently had the pleasure of interviewing Dustin Pearce in a virtual executive fireside chat and AMA. Below is the transcript of their conversation.

Blog

7.8.2020

How to Classify Incidents

Learn the benefits of classifying incidents, how classification differs from incident triage, and how to set up your own classification system.

Blog

7.1.2020

SLO Adoption at Twitter

Service level objectives (SLOs) and error budgets have been key to this transformation at Twitter, as SLOs shape an organization's ability to make data-driven decisions about reliability. (Read here for a definition of SLOs and how they transformed Evernote.) Today, the Twitter team has invested in centralized tooling to measure, track, and visualize SLOs and their corresponding error budgets.

Blog

6.30.2020

Twitter’s Reliability Journey

We had the privilege of interviewing Brian Brophy, Sr. Staff SRE; Carrie Fernandez, Head of Site Reliability Engineering; JP Doherty, Engineering Manager; and Zachary Kiel, Sr. Staff SRE, to learn how SRE is practiced at Twitter.

Blog

6.29.2020

How SLIs Help You Understand Users' Needs

To be effective, service level indicators (SLIs) must reflect users' needs and experience. By consolidating a number of internal metrics into one indicator that reflects typical use of the service, we can help ensure that meeting our SLO means keeping users happy. A good way to approach this is to look at the user's experience, or journey, through the service.

Blog

6.26.2020

Top Practices for Runbook Automation

Runbooks, also known as playbooks, are documents that walk you through a specific task step by step. Automated runbooks can be a powerful tool for saving time and ensuring consistency. We'll look at five best practices for getting the most out of runbook automation, survey some tools on the market that can help you implement them, and discuss how to integrate runbook automation into a complete SRE solution.

Blog

6.19.2020

Best Practices for Effective Incident Management

Below are five incident management best practices that your team can begin using today to improve the speed, efficiency, and effectiveness of your incident management process.

Blog

5.20.2020

How to Create Psychological Safety for Remote Teams

In psychologically safe organizations, people are free to create, discuss, disagree, take risks, and make mistakes. These organizations are often the ones we see as key innovators in their industries. In other words, cultivating a culture of psychological safety is essential to success. So what can we do to make sure our teammates feel secure even while socially distanced?
