

There is a silent force shaping engineering culture inside every technology organization. It affects productivity, team morale, psychological safety, and long-term retention. And yet, it is rarely discussed in executive meetings or reflected in meaningful KPIs.
That force is on-call.
On-call is one of the most direct touchpoints engineers have with the reality of the systems they own. When it’s healthy, it builds confidence, resilience, and pride. When it’s unhealthy, it quietly corrodes everything that makes engineering teams effective. Most companies drastically underestimate this effect, yet a recent survey found that on-call is the least-liked aspect of software engineering and often leads to burnout and attrition. Poorly managed on-call isn’t only a mental health issue; it can also damage a company’s brand and finances, as recent significant outages at AWS, Azure, and Cloudflare have shown.
In this article, I will go over why on-call matters, the current challenges the industry faces, and how engineering leaders can overcome them.
The Hidden Driver of Stress and Attrition
For engineers, on-call is not an abstract concept: it’s a very concrete experience. When you’re paged for systems you didn’t build, code you barely understand, or brittle architectures you struggle to stabilize, the stress is immediate. A 2025 Catchpoint report found that nearly 70% of SREs said on-call stress contributed to burnout and made them more likely to leave their job.
In fact, on-call is one of the strongest signals engineers receive about how much their company values them. Unstable systems, unclear ownership, noisy alerts, missing documentation, or insufficient training all communicate an unintended but unmistakable message: “Here is a critical system. Please keep the lights on. Good luck.” The link to burnout and attrition is well documented and has become industry folklore: “Engineers don’t quit jobs, they quit their on-call rotations.”
By contrast, clear escalation paths, stable rotations, and the right tools and knowledge communicate something very different: “We care about you. We respect the responsibility we’re asking you to take on. And we want you to feel confident, not alone.”
A healthy on-call experience is a cultural investment. It strengthens trust, belonging, autonomy, and pride. A poor on-call experience is one of the fastest ways to erode them.
What a Healthy On-Call Experience Looks Like
The good news is that a good on-call rotation is well understood in modern engineering practice. Healthy on-call environments share several characteristics.
First, engineers should be experienced with the systems they are on call for. They know how deployments work, how the system behaves, what common failure modes look like, and where to find solid telemetry, good dashboards, and up-to-date documentation. They are also aware of clear escalation paths and know exactly whom to call when issues exceed their scope.
Second, on-call shifts should be humane. A typical cadence is once every four to six weeks, not every second week or continuously. In other words, the team’s rest is protected. If someone is firefighting all night, someone else takes over the next day: backup is always within reach. Alerts should be few and meaningful, aiming for low single digits per week, mostly within office hours. Not zero, as otherwise skills atrophy, but not constant interruptions.
Last but not least, on-call duty should be properly compensated. A practical litmus test is simple: does the on-call engineer’s partner feel the additional income fairly offsets bringing a laptop to dinner, planning travel around reliable internet, and being jolted awake at 3 AM by a fire-alarm ringtone? If the answer is “no”, there is probably something to adjust.
A Pattern That Works: Regular On-Call Reviews
Improving on-call health does not require a sweeping transformation. It begins with the simplest possible foundation: the willingness to look. In my GoTo Amsterdam 2024 talk, I described a simple system that improved on-call health across hundreds of teams.
- Record simple on-call KPIs. One of the most effective can be intentionally simple: How many pages did each team receive last week? A newer class of tools, including the open-source On-call Health, offers a more nuanced view by combining operational signals from tools such as GitHub, Jira, Rootly, and Linear with lightweight self-reports, such as a daily “How do you feel today?” check-in.
- Regularly review on-call KPIs in meetings with engineering leaders. Management will only funnel time and resources towards problems it can see. Once managers start to care, the worst experiences are often eliminated quickly, and mediocre situations gradually improve as reliability investments get more priority.
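The weekly page count described above can be sketched in a few lines of Python. This is an illustration only: the `(team, fired_at)` tuples, the function names `pages_per_team` and `teams_over_budget`, and the default budget of five pages per week are assumptions made for this example, not the API of any paging tool. In practice the input would come from an export of whatever paging system you use.

```python
from collections import Counter
from datetime import datetime, timedelta


def pages_per_team(pages, week_start):
    """Count pages per team in the 7 days starting at week_start.

    `pages` is an iterable of (team, fired_at) tuples; this shape is a
    hypothetical export format chosen for the example.
    """
    week_end = week_start + timedelta(days=7)
    counts = Counter()
    for team, fired_at in pages:
        if week_start <= fired_at < week_end:
            counts[team] += 1
    return counts


def teams_over_budget(counts, budget=5):
    """Return the teams whose weekly page count exceeds `budget`, sorted by name.

    The default of 5 is an assumed stand-in for "low single digits per week".
    """
    return sorted(team for team, n in counts.items() if n > budget)


if __name__ == "__main__":
    # Made-up sample data for illustration.
    sample = [
        ("checkout", datetime(2025, 3, 3, 2, 14)),
        ("checkout", datetime(2025, 3, 4, 9, 0)),
        ("search", datetime(2025, 3, 5, 12, 0)),
    ]
    weekly = pages_per_team(sample, week_start=datetime(2025, 3, 3))
    for team, n in sorted(weekly.items()):
        print(f"{team}: {n} pages")
    print("over budget:", teams_over_budget(weekly, budget=1))
```

The output is deliberately a plain count per team: it is meant to start a conversation in the review meeting, not to rank teams.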
On a practical note, having reports tailored to the company’s management culture is highly beneficial for establishing an effective routine. For example, at Zalando, those reports are auto-generated Google Docs that are reviewed during a Weekly Operational Review Meeting (WORM) in silent reading time.
Also, be aware that neither the raw page count nor the more sophisticated score from On-call Health should be taken at face value. These metrics exist to direct attention and spark conversation, not to serve as verdicts or to blame teams.
Common On-Call Failures and the Fixes That Work
Over the past decade, I have worked with countless on-call teams, and I have found that the same issues keep coming up. Let’s go over a few of them and the fixes that worked. A common one is understaffed rotations, sometimes with a single engineer on call for far too long. These weren’t catastrophic situations, but they were clearly unsustainable, and the reviews finally created the visibility needed to rebalance schedules and restore humane workloads.
Another widespread one is alert noise that has become normalized. Engineers had been living with noisy alerts for months, unsure whether they were “allowed” to turn anything off. The solution was to make ownership explicit and give teams the confidence and leadership backing to silence bad alerts, fix signal paths, and, in some cases, adopt SLO-based alerting.
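For readers unfamiliar with SLO-based alerting, here is a minimal sketch of one common form, burn-rate alerting, in which a page fires only when the error budget is being consumed much faster than the SLO allows. The function names, the `(errors, requests)` window tuples, and the 14.4 threshold (a value often cited for a one-hour window against a 30-day SLO) are assumptions for illustration; the teams described above may have used different windows and thresholds.

```python
def burn_rate(errors, requests, slo_target):
    """Return how fast the error budget is being consumed.

    1.0 means errors arrive exactly at the rate the SLO allows;
    higher values mean the budget would run out early.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if requests == 0:
        return 0.0  # no traffic, so no budget consumed
    error_budget = 1.0 - slo_target
    return (errors / requests) / error_budget


def should_page(long_window, short_window, slo_target, threshold=14.4):
    """Page only if both a long and a short window exceed the threshold.

    Each window is an (errors, requests) tuple. Requiring both windows
    avoids paging on a brief blip or on an incident that already ended.
    The threshold default is an assumed example value.
    """
    return (
        burn_rate(*long_window, slo_target) >= threshold
        and burn_rate(*short_window, slo_target) >= threshold
    )


if __name__ == "__main__":
    # Made-up numbers: 2% errors over the long window, 5% over the short one,
    # measured against a 99.9% availability target.
    print(should_page((20, 1000), (5, 100), slo_target=0.999))
```

Compared with static thresholds on individual symptoms, this style of alert tends to fire less often and to map more directly onto user impact, which is why it is a frequent replacement for noisy alerts.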
I’ve also seen a fair share of hero-engineer bottlenecks. Looking across incidents, it was clear that one senior engineer was informally resolving most major issues, regardless of who was officially on call. Making this visible prompted teams to share knowledge, improve documentation, and reduce reliance on a single person.
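One way to make such a bottleneck visible is to compare who resolved each incident with who was officially on call. The sketch below assumes incidents are available as `(resolver, on_call)` pairs; that record shape, the function names, and the sample names are hypothetical and not taken from any particular incident tool.

```python
from collections import Counter


def resolver_shares(incidents):
    """Return the fraction of incidents resolved by each engineer.

    `incidents` is a list of (resolver, on_call) tuples.
    """
    counts = Counter(resolver for resolver, _ in incidents)
    total = len(incidents)
    return {name: n / total for name, n in counts.items()}


def off_rotation_share(incidents):
    """Return the fraction of incidents resolved by someone other than the on-call engineer."""
    if not incidents:
        return 0.0
    off_rotation = sum(1 for resolver, on_call in incidents if resolver != on_call)
    return off_rotation / len(incidents)


if __name__ == "__main__":
    # Made-up sample data for illustration.
    sample = [
        ("dana", "alex"),
        ("dana", "dana"),
        ("dana", "sam"),
        ("sam", "sam"),
    ]
    print(resolver_shares(sample))
    print(off_rotation_share(sample))
```

As with page counts, a high share for one person is a prompt for knowledge sharing and better documentation, not a judgment on that engineer or on the rest of the team.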
On-Call Culture as a Differentiator
As an engineering leader, ask yourself a simple question: Do you know what your on-call engineers are experiencing this week? On-call is not just an operational duty; it is one of the clearest cultural signals you send.
When engineers carry responsibility without support, visibility, or investment, the message is unmistakable. When you invest in light rotations, clear ownership, and reliability work even when things seem fine, the message is just as unmistakable, and far more positive. A healthy on-call culture is built through attention and small, consistent investments in engineering craft and reliability. Those choices determine whether your organization burns trust or becomes the kind of place where engineers feel proud to carry a pager.