

Komodor today extended the reach of its orchestration framework for artificial intelligence (AI) agents by adding support for Model Context Protocol (MCP) servers and the OpenAPI specification.
Company CTO Itiel Shwartz said those capabilities will make it possible for IT teams to orchestrate a much broader range of AI agents used to investigate and remediate issues affecting IT infrastructure.
Komodor has already developed more than 50 AI agents that automate the management of Kubernetes clusters running cloud-native applications. By adding support for MCP and OpenAPI, that orchestration framework can now be used to manage hybrid IT environments running more complex applications, said Shwartz.
For example, IT teams can use the Komodor orchestration framework to invoke third-party AI agents found on the Komodor Marketplace that have been trained to automate network and storage management tasks or provision graphics processing units (GPUs), he noted. Alternatively, the orchestration framework can be used to invoke an AI agent that deploys applications via a continuous integration/continuous delivery (CI/CD) pipeline.
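As a rough illustration of what such an integration involves, MCP is built on JSON-RPC 2.0, and an orchestration framework invokes a tool exposed by an MCP server with a `tools/call` request. The sketch below follows the MCP specification's request shape; the tool name and arguments are hypothetical, not taken from Komodor's actual implementation:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "restart_deployment",
    "arguments": {
      "namespace": "payments",
      "deployment": "checkout-api"
    }
  }
}
```

Because any MCP server advertises its tools in this uniform way, an orchestrator can discover and call agents from multiple vendors without bespoke integrations for each one.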
The overall goal is to enable IT teams to deploy one orchestration framework that can then be integrated with multiple AI agents provided by multiple vendors, said Shwartz.
It’s not clear to what degree IT teams are relying on AI agents to manage infrastructure, but as the pace at which applications are being built and deployed in the age of AI continues to accelerate, there will soon be a lot more infrastructure to manage. Most organizations cannot afford to hire a small army of site reliability engineers (SREs) to manage all that infrastructure, so in many cases the only alternative will be to rely more on AI, noted Shwartz.
The challenge, of course, is that IT teams could wind up deploying multiple orchestration frameworks for AI agents provided by different vendors, which they would then need to integrate.
In the meantime, however, most IT teams are still trying to assess to what degree they can rely on AI agents to consistently perform a task. It’s not uncommon for an AI agent to make a mistake or report that a task has been performed when in fact it has not. SREs will continue to be needed to validate that any task assigned to an AI agent has actually been completed. In fact, the more complex a task, the more likely it is that an issue could set off a series of cascading events that adversely impact application performance and availability.
On the plus side, however, the latest generation of AI agents is able to take advantage of more advanced reasoning capabilities that have been embedded in AI models. As the pace at which those AI models are updated continues to accelerate, the AI agents that invoke them should become that much more reliable.
Hopefully, being an SRE will soon become a lot less stressful, especially as more tedious tasks are assigned to AI agents. The challenge then, of course, will be finding a way to manage all the AI agents that SREs will soon find themselves supervising.
