We Spent 15 Years Automating Infrastructure. Now We’re Automating Decisions

For most of the last 15 years, DevOps has been engaged in a massive automation project. First, it was server provisioning, then configuration management, then infrastructure as code. CI/CD pipelines followed, along with containers, Kubernetes, GitOps and eventually platform engineering. Each wave built on the previous one, steadily pushing infrastructure and operations further away from manual processes and deeper into programmable systems.

The industry became extraordinarily successful at it. Tasks that once required ticket queues, weekend maintenance windows and large operations teams became automated workflows that could execute repeatedly and reliably. Infrastructure stopped being something organizations manually assembled and increasingly became something they declared, versioned and continuously reconciled through software.

Crucially, though, most of this automation was still fundamentally deterministic. Engineers wrote scripts. Teams defined workflows. Desired states were declared in code. Pipelines executed predefined sequences. Even Kubernetes, for all of its complexity, operates around deterministic reconciliation: the system continuously attempts to move reality toward a declared state.
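To make the idea of deterministic reconciliation concrete, here is a minimal sketch of a single reconcile step. It is a toy model, not how Kubernetes controllers are actually implemented; the function names (`reconcile`, `apply_actions`) and the replica-count dictionaries are assumptions made purely for illustration.

```python
# Toy illustration of declarative reconciliation (hypothetical names,
# not Kubernetes code): compare declared state with observed state and
# compute the actions that close the gap.
from typing import Dict, List, Tuple

Action = Tuple[str, str, int]  # (verb, workload name, replica delta)


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[Action]:
    """Return the actions needed to move `actual` toward `desired`."""
    actions: List[Action] = []
    for name in sorted(set(desired) | set(actual)):
        want = desired.get(name, 0)
        have = actual.get(name, 0)
        if want > have:
            actions.append(("scale_up", name, want - have))
        elif want < have:
            actions.append(("scale_down", name, have - want))
    return actions


def apply_actions(actual: Dict[str, int], actions: List[Action]) -> Dict[str, int]:
    """Apply the computed actions and return the new observed state."""
    new_state = dict(actual)
    for verb, name, count in actions:
        delta = count if verb == "scale_up" else -count
        new_state[name] = new_state.get(name, 0) + delta
        if new_state[name] == 0:
            del new_state[name]  # nothing left running for this workload
    return new_state


desired = {"web": 3, "api": 2}
actual = {"web": 1, "worker": 4}
plan = reconcile(desired, actual)
converged = apply_actions(actual, plan)
```

Given the same declared and observed state, the sketch always produces the same plan, and once the plan is applied a second reconcile pass finds nothing to do. That repeatability is the property the rest of this argument depends on.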

The complexity of environments exploded, but the operational philosophy remained relatively stable. DevOps automation was largely about making known tasks repeatable, scalable and reliable.

AI changes that equation because it changes the target of automation itself.

For the first time, the industry is moving beyond automating infrastructure tasks and into automating pieces of operational judgment. That is not simply a more advanced form of automation. It is a different category altogether.

We are already seeing AI systems inserted into incident response workflows, observability platforms, vulnerability prioritization systems, remediation tooling and operational coordination layers. AI copilots recommend fixes, correlate telemetry, suggest rollback strategies and increasingly coordinate actions between systems. Vendors position these capabilities as the natural next step in DevOps evolution, and in some ways, they are right. Modern environments are too large, too dynamic and too interconnected for humans alone to manage efficiently at scale.

But there is a major difference between automating tasks and automating judgment.

A provisioning script that deploys the wrong infrastructure is a problem, but usually a bounded one. Engineers trace the workflow, identify the logic failure and correct it. The failure mode is procedural. The system executed the wrong instructions.

An AI system that incorrectly prioritizes a cascading outage, triggers the wrong remediation path or misinterprets operational telemetry introduces a very different kind of risk. The failure is no longer simply one of execution. It becomes interpretive. The system is not just doing the wrong thing. It is reasoning incorrectly about what the right thing is in the first place.

That distinction matters because the blast radius changes dramatically once automation moves into probabilistic decision-making.

DevOps spent years reducing operational unpredictability. Infrastructure as code reduced configuration drift. CI/CD pipelines reduced deployment inconsistency. GitOps improved reconciliation discipline. Platform engineering standardized developer workflows and operational controls. Much of modern cloud-native architecture is built around creating more predictable operational systems.

AI introduces probabilistic behavior into that environment.

Traditional automation generally behaves the same way given the same inputs. AI systems can produce different conclusions from similar operational scenarios because they rely on inference rather than strictly procedural logic. That does not make them bad or unusable, but it does fundamentally alter how organizations need to think about trust, governance and operational authority.
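One way to picture the contrast is with a deliberately simplified sketch: a threshold rule next to a weighted random choice that stands in for model inference. Real AI systems are far more complex than a weighted draw; the function names, the 80 percent threshold and the confidence scores below are all hypothetical.

```python
# Toy contrast between procedural and probabilistic behavior.
# Everything here is illustrative; the weighted draw is only a stand-in
# for inference, not a model of any real AI system.
import random
from typing import Dict


def rule_based_action(cpu_percent: float) -> str:
    """Procedural logic: the same input always yields the same action."""
    return "scale_up" if cpu_percent > 80.0 else "no_action"


def sampled_action(scores: Dict[str, float], rng: random.Random) -> str:
    """Stand-in for inference: pick an action in proportion to its score."""
    actions = list(scores)
    weights = [scores[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# Hypothetical confidence scores for the same incident.
scores = {"restart": 0.6, "rollback": 0.4}
rng = random.Random(0)
rule_results = {rule_based_action(91.5) for _ in range(200)}
sampled_results = {sampled_action(scores, rng) for _ in range(200)}
```

Across 200 identical calls the rule returns a single action, while the sampled stand-in returns both candidates at different times. Governance questions start from that difference: the same scenario may not always receive the same answer.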

This is where the conversation around AI operations often becomes far too simplistic. The current industry narrative frequently frames AI as just another efficiency layer, another productivity accelerator similar to earlier automation waves. In reality, the transition underway is much more profound because organizations are beginning to delegate portions of operational reasoning itself to machines.

The implications reach far beyond tooling.

Consider observability. Traditional monitoring systems surfaced metrics and alerts for humans to interpret. Increasingly, AI-driven platforms attempt to identify root causes automatically, correlate dependencies and propose remediation paths. In security operations, AI systems prioritize vulnerabilities and determine which threats deserve immediate attention. In platform engineering, AI systems are starting to route workflows, coordinate actions between tools and make recommendations about deployment or rollback strategies.

At small scale, these capabilities feel helpful and manageable. At enterprise scale, however, organizations are effectively creating operational systems that participate directly in decision-making loops. Once that happens, governance becomes significantly more important than many organizations currently appreciate.

This is one reason platform engineering is evolving again. The first major wave of platform engineering focused primarily on abstraction and standardization. Internal developer platforms simplified Kubernetes complexity, improved developer experience and centralized operational controls. Platform teams became responsible for creating reliable paved roads through increasingly fragmented infrastructure environments.

Now the role is expanding beyond infrastructure abstraction.

Platform teams are increasingly becoming governance teams for autonomous operational systems. They are defining trust boundaries, approval chains, remediation authority and escalation policies for AI-assisted workflows. They are deciding which operational actions can happen autonomously, which require supervision and which should remain entirely human-driven.
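As a rough sketch of what such a policy could look like in code, the example below maps operational actions to one of three autonomy levels and gates AI-proposed actions against that map. The action names, the `POLICY` table and the `authorize` function are assumptions made for illustration, not a reference to any specific platform or product.

```python
# Hypothetical autonomy policy for AI-proposed operational actions.
from enum import Enum
from typing import Dict


class Autonomy(Enum):
    AUTONOMOUS = "autonomous"   # the AI may act without a human
    SUPERVISED = "supervised"   # the AI may act only after human approval
    HUMAN_ONLY = "human_only"   # the AI may never execute this action


# Example policy table; the action names are illustrative assumptions.
POLICY: Dict[str, Autonomy] = {
    "restart_pod": Autonomy.AUTONOMOUS,
    "rollback_deployment": Autonomy.SUPERVISED,
    "delete_database": Autonomy.HUMAN_ONLY,
}


def authorize(action: str, human_approved: bool = False) -> bool:
    """Decide whether an AI-proposed action may be executed."""
    # Unknown actions fall back to the most restrictive level.
    level = POLICY.get(action, Autonomy.HUMAN_ONLY)
    if level is Autonomy.AUTONOMOUS:
        return True
    if level is Autonomy.SUPERVISED:
        return human_approved
    return False  # HUMAN_ONLY: approval does not transfer execution to the AI
```

The design choice worth noting is the default: an action the policy has never seen is treated as human-only, so new capabilities have to be granted explicitly rather than inherited by accident.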

In many ways, the platform layer is becoming an orchestration layer for operational judgment itself.

That shift also explains why open orchestration, interoperability and governance frameworks have become central themes across the broader open source and cloud-native ecosystem. Much of the discussion happening around agent interoperability, Model Context Protocol (MCP) frameworks and AI infrastructure at events like the Linux Foundation’s Open Source Summit is not really about hype. Underneath it is a growing recognition that operational authority is starting to migrate into AI-driven coordination systems.

Organizations are beginning to understand the risks of allowing critical operational reasoning to become trapped inside opaque proprietary systems. If AI-driven operational coordination becomes a black box, companies may eventually lose visibility into how key infrastructure, security and remediation decisions are actually being made.

That is not just a tooling concern. It becomes an operational sovereignty issue.

The irony here is that DevOps originally emerged in part as a reaction against opaque operational silos. Infrastructure as code and GitOps increased transparency because operational behavior became codified, observable and reviewable. AI systems risk reintroducing opacity into operations if organizations do not establish strong governance around explainability, oversight and accountability.

None of this means AI should be resisted. Modern operational environments almost certainly require some level of AI-assisted coordination going forward. The scale is already exceeding what humans alone can reasonably process in real time. AI will become necessary in many environments simply because telemetry volume, dependency complexity and operational velocity continue to increase faster than teams can manually absorb.

But the industry does need to stop pretending this is simply another incremental automation cycle.

Automating infrastructure changed how systems operated. Automating decisions changes how organizations distribute trust, authority and operational responsibility between humans and machines.

That is a much bigger transition than many people currently realize.

Shimmy’s Take

The DevOps movement spent the last decade and a half proving that infrastructure automation works. That success reshaped the entire technology industry and made modern cloud-native computing possible.

What comes next is fundamentally different.

We are no longer just teaching machines how to execute predefined tasks faster. We are beginning to allow machines to participate in operational reasoning itself. Once that happens, the conversation can no longer focus solely on efficiency and productivity.

It has to focus on governance, trust and control.

The future of DevOps is not about whether we automate more. It is about understanding which decisions can safely be automated, which require human supervision and which should never leave human hands at all.