Keep running smoothly with Site Reliability Engineering

As your product grows, it's crucial to balance site reliability with shipping new features. We can help you adopt Site Reliability Engineering (SRE) tenets and upgrade your team and processes to manage SLOs and error budgets effectively.

Let's make your product resilient

Collage of three photos: a person on a video conference call with two other people; a person at a computer resting their head on their fist; and an over-the-shoulder look at a developer working at their desk


"An SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s). There are codified rules of engagement and principles for how SRE teams interact with their environment, not only the production environment, but also the product development teams, the testing teams, the users, and so on."

Benjamin Treynor Sloss
SRE Book

What we do

We'll help you establish good SRE practices, then support you when needed

We bring the tenets of SRE to your product team, sharing ways of working and building product resilience. Once the team is empowered to manage SLOs and error budgets on their own, thoughtbot moves into the background to provide on-call and long-term support.

Services

Full-time Site Reliability Engineering

For projects with significant reliability and operations needs, we can assign a full-time SRE or DevOps Engineer to your team.

  • Pitch SRE tenets and help product teams and stakeholders adopt the SRE mindset
  • Establish SLOs and error budgets
  • Implement monitoring and alerting to ensure error budgets are not exceeded
  • Improve performance and scaling for applications to meet SLOs
  • Improve CI/CD pipelines to allow continuous, fearless deployment to production environments
  • Deploy new infrastructure to meet scaling, security, and compliance needs
  • Implement infrastructure as code to ensure long-term maintainability

Clients in the UK public sector can access our services as part of the G-Cloud-13 purchasing framework.

Hands typing on an open laptop on top of a person's lap.

Let's Talk

What does site reliability look like for your app?

A collage of photos with hand-drawn elements; from top left, two developers looking at a monitor with code on it, one person with headphones on looking at a monitor