

Software runs, but sometimes it doesn’t… and when it comes to platform engineering teams being able to trust coding assistants and AI-powered site reliability engineering (SRE) services, the culprit is often a lack of runtime visibility.
It’s an assertion made by software reliability company Lightrun, in its State of AI-Powered Engineering Report 2026, based on an independent poll of 200 SREs and DevOps leaders at enterprises in the U.S., UK and EU.
“To keep pace with AI-driven velocity, we can no longer rely on reactive observability. We must shift runtime visibility left, giving AI tools and agents the live execution data they need to validate code before it ever fails in production,” said Lightrun CEO, Ilan Peleg.
Runtime Visibility Fragility
Peleg and team say that until AI-powered engineering tools have live visibility of how code behaves at runtime, they cannot be trusted to autonomously ensure reliable systems.
But why is runtime visibility so flaky?
One of the main reasons is the volume of manual work still required once AI-generated code is deployed: 43% of AI-generated code requires manual debugging in production, even after passing QA or staging tests.
At the root of this is the fact that AI coding assistants generally generate code statically, without executing it… which means they can’t observe actual real-world (and real-time) production measures such as memory usage, variable states, or system behavior.
Coding assistants reason from patterns, not from live data, which makes real execution errors hard to anticipate or explain. Furthermore, even where code does run, an average of three manual redeploy cycles is required to verify a single AI-suggested code fix in production.
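To make the distinction concrete, the gap the report describes is between reading code and watching it run. The sketch below is a minimal, hypothetical illustration (not Lightrun’s product or API): a decorator that records live argument values, return values, and current memory usage each time a function executes, the kind of “live execution data” a static reviewer never sees.

```python
import functools
import tracemalloc

def runtime_probe(snapshots):
    """Record live execution data for the wrapped function.

    This is a conceptual stand-in for dynamic instrumentation;
    'snapshots' is just a list the caller supplies.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            current, _peak = tracemalloc.get_traced_memory()
            # Capture the actual values that flowed through the call.
            snapshots.append({
                "function": fn.__name__,
                "args": args,
                "kwargs": kwargs,
                "result": result,
                "mem_current_bytes": current,
            })
            return result
        return wrapper
    return decorator

tracemalloc.start()
snapshots = []

@runtime_probe(snapshots)
def apply_discount(price, rate):
    # A static reviewer sees plausible code; only a live run
    # reveals the concrete values moving through it.
    return round(price * (1 - rate), 2)

apply_discount(100.0, 0.15)
```

The point of the sketch is that the snapshot list is populated only at runtime; no amount of pattern-matching over the source text could produce those values.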
Help! Call in the AI-SREs
As the volume of AI-generated code rapidly increases, Peleg insists it is essential to close this verification loop. As a result, engineering teams are turning to AI SRE (site reliability engineering) tools. These agents reason over existing observability, codebase changes, and infrastructure signals to propose incident causes and recommend fixes. However, the report suggests that some 77% of engineering leaders lack confidence in current observability stacks to support automated root cause analysis and remediation.
Lightrun’s report, conducted with independent research firm Global Surveyz, says that 88% of companies require 2-3 manual redeploy cycles just to confirm an AI-generated fix actually works in production. It also notes that developers spend an average of 38% of their week (two days) on debugging, verification, and troubleshooting.
Today’s AI agents operate using probability, reasoning their way toward conclusions. To ground that reasoning in reality, the report makes clear they need real-time visibility into what’s happening, including variable states, memory usage, and how requests move through a system.
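One way to picture “how requests move through a system” is a trace ID threaded through each processing stage, so an observer (human or agent) can reconstruct the path and the live variable state at every hop. The toy pipeline below is an illustrative sketch only; the stage names and field names are assumptions, not anything from the report.

```python
import uuid

def handle_request(payload):
    """Toy request pipeline that records a trace entry at each stage.

    Each entry carries the same request_id, so the whole journey of
    one request can be reassembled from the trace afterwards.
    """
    trace = []
    request_id = str(uuid.uuid4())

    def record(stage, state):
        # Snapshot the stage name plus the live state at that point.
        trace.append({
            "request_id": request_id,
            "stage": stage,
            "state": dict(state),
        })

    record("received", payload)

    validated = {**payload, "valid": True}
    record("validated", validated)

    enriched = {**validated, "total": payload["qty"] * payload["unit_price"]}
    record("enriched", enriched)

    return enriched, trace

result, trace = handle_request({"qty": 3, "unit_price": 2.5})
```

In a real system this role is played by distributed tracing and context propagation; the sketch simply shows the shape of the signal, per-stage variable state keyed by request, that an AI agent would need in order to ground its reasoning.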
“Engineering organizations need runtime visibility to embrace the possibilities offered by AI-accelerated engineering. Without this grounding, we aren’t slowed by writing code anymore, but by our inability to trust it,” said CEO Peleg. “When almost half of AI-generated changes still need debugging in production, we need to fundamentally rethink how we expect our AI agents to solve complex challenges.”
Tribal Knowledge, Still Tops
A whopping 97% of software engineering leaders say AI SREs operate without significant visibility into what’s actually happening in production. Just over half of resolutions to high-severity incidents still use “tribal knowledge” rather than diagnostic evidence from AI SREs or application performance management (APM) tools.
The numbers here are hardly great news: when nearly half of AI-generated code still requires a human to roll up their sleeves in production, the promise of autonomous engineering starts to look less like an inevitability and more like a promissory note.
Lightrun’s data points to a fundamental architectural tension: AI tools built to accelerate the software delivery pipeline are, in practice, creating a verification bottleneck that eats directly into the productivity gains they were supposed to deliver. That’s not a criticism of AI-assisted development per se — it’s a structural observation about what happens when you bolt intelligent code generation onto observability infrastructure that was never designed to support it.