Lightrun Adds Ability to Dynamically Pull Telemetry Data from Live Apps

Lightrun has added the ability to dynamically pull missing telemetry evidence from live application environments into its namesake artificial intelligence (AI)-based site reliability engineering (SRE) platform, without having to deploy additional instrumentation.

Company CEO Ilan Peleg said the Lightrun AI SRE platform includes a sandbox, deployed via a software development kit (SDK), that can now be integrated with a live application environment to collect new evidence, test hypotheses and validate outcomes against real execution behavior, all without deploying additional agents to collect telemetry data.

The overall goal is to provide DevOps teams with much-needed additional context on demand to reduce mean time to detection of the root cause of an incident, he added.

That capability will prove crucial as the volume of applications being deployed in the age of AI begins to overwhelm the ability of DevOps teams to manage incidents, noted Peleg.

At the core of the Lightrun platform is a Runtime Context engine that enables DevOps teams to understand how code truly behaves. Armed with those insights, it becomes possible to identify issues and bottlenecks both before and after an application is deployed in a production environment, he added. In fact, DevOps teams can now run validated code changes that eliminate guesswork and rollbacks, noted Peleg.

Rather than relying on observability platforms designed for IT operations teams, Lightrun is making a case for using an SDK combined with an AI model to enable developers to observe code in a way that fits naturally within an existing workflow.
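To make the idea of on-demand telemetry concrete, the following is a minimal sketch of the general pattern of attaching a probe to running code without redeploying it. All names here (attach_probe, observable, the "checkout" label) are hypothetical and illustrative only; they are not Lightrun's actual SDK.

```python
# Hypothetical sketch: illustrates pulling telemetry from a live function on
# demand, without a redeploy. Names are invented, not Lightrun's real API.
import functools
import time
from typing import Any, Callable, Dict

_active_probes: Dict[str, Callable[[dict], None]] = {}

def attach_probe(name: str, sink: Callable[[dict], None]) -> None:
    """Register a probe at runtime; no restart or redeploy is required."""
    _active_probes[name] = sink

def detach_probe(name: str) -> None:
    """Remove the probe once the needed evidence has been collected."""
    _active_probes.pop(name, None)

def observable(name: str):
    """Decorator that emits call arguments and latency to any attached probe."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            if name in _active_probes:  # telemetry only when someone is listening
                _active_probes[name]({
                    "function": fn.__name__,
                    "args": args,
                    "kwargs": kwargs,
                    "latency_ms": (time.perf_counter() - start) * 1000,
                })
            return result
        return inner
    return wrap

@observable("checkout")
def checkout(cart_id: str) -> str:
    return f"order-for-{cart_id}"

# During an incident, an engineer (or an AI agent) attaches a probe on demand,
# gathers evidence from real execution behavior, then detaches it.
attach_probe("checkout", lambda event: print("evidence:", event))
checkout("cart-42")
detach_probe("checkout")
```

In this sketch the probe is purely additive: the instrumented function behaves identically whether or not anyone is listening, which is the property that lets evidence be gathered from production without a new deployment.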

Mitch Ashley, vice president and practice lead for software lifecycle engineering at the Futurum Group, said the runtime context engine developed by Lightrun creates a foundation for validated incident investigation. Collecting telemetry on demand from live production environments without additional instrumentation changes how DevOps teams manage incident response, he added. As AI accelerates deployment velocity, evidence gaps compound; teams cannot validate hypotheses against real execution behavior when instrumentation lags behind deployment cycles, noted Ashley.

Testing against live execution behavior also reduces mean time to root cause and eliminates the rollback cycles that follow from incomplete production visibility, he said.

It’s not clear to what degree that approach might reduce the need for other types of observability platforms, but at the very least developers will now be able to debug runtime environments without always waiting for guidance from an IT operations team. That approach should reduce the number of issues that DevOps engineers need to directly address after an application has been deployed in a production environment.

Like it or not, the AI genie, as far as software engineering is concerned, is out of the proverbial bottle. The amount of code moving through DevOps pipelines will inevitably increase to a point where, without the help of AI tools and platforms to manage it, DevOps teams will soon be overwhelmed.

DevOps teams, by now, should be identifying bottlenecks in their existing engineering workflows that are likely to only become bigger as the volume of code moving through pipelines increases. Ultimately, the AI issue will come down to determining to what degree the tools and platforms being relied on today to manage those pipelines might need to be upgraded or replaced to accommodate all the code being generated by developers in the age of AI.
