CI/CD Was Built for Deterministic Software — Agents Just Broke the Model

CI/CD was built around a comforting idea: Software should do tomorrow what it did today, assuming the inputs are the same.

That assumption sits underneath a lot of modern DevOps. It is why we have build pipelines, test suites, artifact repositories, deployment gates, rollback strategies, infrastructure-as-code and all the other machinery that turned software delivery from an artisanal activity into something closer to an industrial process. We may not always get perfect repeatability, but the goal has been clear. Same source code. Same build process. Same tests. Same artifact. Same deployment path. Same expected result.

Agentic AI makes a mess of that assumption.

That does not mean CI/CD is dead. It does not mean pipelines are obsolete. It does not mean every DevOps team has to throw away what it spent the last decade building. Quite the opposite. The disciplines of DevOps are about to become more important, not less. But the model has to expand because the thing being delivered is changing.

This is why cdCon being co-located with Open Source Summit North America this week is more than just another conference scheduling detail. The Linux Foundation has framed this year’s summit around AI infrastructure, software supply chain security, open ecosystems and the rise of AI agents, while cdCon is focused on the next evolution of DevOps through AI, platform engineering, security and open source. The Continuous Delivery Foundation describes cdCon 2026 as centered on AI-driven workflows, operational delivery of AI applications, software supply chain security and platform engineering at scale. That is exactly the right conversation at exactly the right time.

Techstrong Group will be on the ground at Open Source Summit as well, with Mike Vizard and the Techstrong team covering the news, hallway conversations and practical takeaways for our DevOps.com, Cloud Native Now and PlatformEngineering.com communities. That matters because this is not an abstract AI debate anymore. These questions are landing directly in the laps of the people responsible for building, securing and operating modern software systems.

The issue is not whether AI belongs in the software delivery conversation. It already does. The harder question is whether the continuous delivery model we inherited from deterministic software can handle systems that reason, select tools, retrieve context, call APIs and take actions in ways that are not always identical from run to run.

Traditional pipelines assume repeatability. That was the bargain. We put software through source control, builds, tests, scans and deployments because we wanted confidence that the thing we promoted into production was the same thing we validated earlier. Containers, Kubernetes, GitOps and infrastructure-as-code all reinforced that pattern. They gave us better ways to package, declare, verify and reproduce environments. Even when production surprised us, the underlying delivery model still aimed at reducing variance.

Agents introduce variance as a feature, not a bug.

An AI agent may receive a prompt, retrieve context from a knowledge base, decide which tool to use, call an API, evaluate the result, take another step and then produce an answer or action. A support agent might triage a customer issue differently because the retrieved documentation changed. A security agent might prioritize a vulnerability differently because the asset context was updated. A remediation agent might open a ticket one time, restart a service the next time and escalate to a human the third time. The code wrapper around the agent may be identical. The behavior may not be.

That is a very different delivery problem.

Testing is the first place the old model starts to bend. Unit tests and integration tests still matter. Nobody gets a pass on basic engineering hygiene because the word “agent” showed up in the architecture diagram. But traditional tests are not enough for agentic systems. They tell us whether a function returns the expected value. They do not tell us whether an agent behaves acceptably across a wide range of messy real-world scenarios.

Testing agents becomes a statistical exercise. Teams need scenario libraries, eval sets, adversarial prompts, behavioral baselines, drift detection and confidence thresholds. They need to know not only whether the system passed a test, but how often it made the right decision across a representative set of cases. They need to measure whether behavior is improving, degrading or just getting stranger.
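
To make that concrete, here is a minimal sketch of a statistical gate, not a definitive implementation. It assumes the agent can be called as a plain function that returns a decision string. The `Scenario` type, the category names and the 0.9 default threshold are all hypothetical. Each scenario is run several times because a single passing run says little about a system that can vary from run to run.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Scenario:
    category: str   # hypothetical grouping, e.g. "billing" or "outage"
    prompt: str     # input handed to the agent
    expected: str   # the decision considered acceptable for this case


def pass_rates(agent: Callable[[str], str],
               scenarios: list[Scenario],
               trials: int = 5) -> dict[str, float]:
    """Run every scenario `trials` times; return the pass rate per category."""
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for scenario in scenarios:
        for _ in range(trials):
            total[scenario.category] += 1
            if agent(scenario.prompt) == scenario.expected:
                passed[scenario.category] += 1
    return {category: passed[category] / total[category] for category in total}


def gate(rates: dict[str, float],
         thresholds: dict[str, float],
         default: float = 0.9) -> list[str]:
    """Return the categories whose pass rate is below their threshold."""
    return sorted(
        category for category, rate in rates.items()
        if rate < thresholds.get(category, default)
    )
```

A pipeline step could promote the release only when `gate(...)` returns an empty list, and report the failing categories when it does not.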

That will feel uncomfortable to a lot of DevOps teams because CI/CD has long been built around crisp signals. Passed or failed. Green or red. Promote or stop. Agentic systems introduce shades of confidence. A model may perform well 94% of the time against one class of issues and poorly against another. A prompt update may improve support response quality while increasing unsafe tool selection. A retrieval change may reduce hallucinations in one domain while introducing new errors in another.
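
One way to surface those trade-offs is to compare a candidate against a baseline metric by metric instead of collapsing everything into a single score. The sketch below is illustrative only: the metric names, the direction map and the 0.02 tolerance are assumptions, not values from any real system.

```python
def regressions(baseline: dict[str, float],
                candidate: dict[str, float],
                higher_is_better: dict[str, bool],
                tolerance: float = 0.02) -> dict[str, float]:
    """Return metrics that got worse by more than `tolerance`, with their delta."""
    flagged: dict[str, float] = {}
    for name, base_value in baseline.items():
        delta = candidate[name] - base_value
        # A drop is bad for higher-is-better metrics; a rise is bad otherwise.
        worsened_by = -delta if higher_is_better[name] else delta
        if worsened_by > tolerance:
            flagged[name] = delta
    return flagged
```

With made-up numbers, a prompt change that lifts `response_quality` from 0.80 to 0.88 but raises `unsafe_tool_rate` from 0.01 to 0.05 would be flagged on the second metric even though the first improved.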

That does not fit neatly into the old pipeline dashboard.

Rollback gets weirder too. In traditional software delivery, rollback often means going back to a previous artifact, previous configuration or previous infrastructure state. It is not always simple, but the basic concept is familiar. The release caused a problem. Revert the release.

With agents, what exactly are we rolling back?

The problem might be the model version. It might be the prompt. It might be a tool permission. It might be the retrieval corpus. It might be a policy rule. It might be an MCP server. It might be a memory layer. It might be the sequence of actions the agent already took in Jira, ServiceNow, GitHub, Slack, Kubernetes or a cloud console.

Rolling back the code does not necessarily roll back the behavior. Rolling back the prompt does not undo the ticket the agent opened, the config it changed or the recommendation it already sent to a customer. Once agents are allowed to take action in operational systems, rollback becomes less like restoring a binary and more like reconstructing a chain of decisions.
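
A rough sketch of what that could look like in practice follows. It assumes a release is recorded as a manifest that pins every moving part, and that each action the agent takes is logged with a compensating operation where one exists. The field names, the `Action` type and the string-based undo operations are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class AgentRelease:
    model: str                        # model version identifier
    prompt_sha: str                   # hash of the prompt template
    retrieval_snapshot: str           # version of the retrieval corpus
    policy_version: str
    tool_permissions: frozenset[str]  # tools the agent may call


@dataclass(frozen=True)
class Action:
    system: str          # e.g. "jira" or "kubernetes"
    operation: str       # what the agent did
    undo: Optional[str]  # compensating operation; None if only a human can reverse it


def changed_components(previous: AgentRelease,
                       current: AgentRelease) -> dict[str, tuple]:
    """Map each field that differs to its (previous, current) values."""
    return {
        f.name: (getattr(previous, f.name), getattr(current, f.name))
        for f in fields(AgentRelease)
        if getattr(previous, f.name) != getattr(current, f.name)
    }


def rollback_plan(actions: list[Action]) -> tuple[list[str], list[Action]]:
    """Return compensations in reverse order, plus actions needing human review."""
    automated = [
        f"{action.system}:{action.undo}"
        for action in reversed(actions)
        if action.undo is not None
    ]
    manual = [action for action in actions if action.undo is None]
    return automated, manual
```

The manifest diff narrows down which component to revert. The action log is what turns "undo what the agent did" into a list of steps, including the ones no script can reverse, such as advice already sent to a customer.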

That raises the stakes for provenance. DevOps and DevSecOps have spent years improving software supply chain trust through SBOMs, signatures, build attestations, artifact registries and policy controls. Those efforts still matter. In fact, they matter more. But agents complicate the definition of what we are trying to prove.

For traditional software, provenance asks: Where did this artifact come from? Who built it? What source produced it? Was it signed? Was it scanned? Was it tampered with?

For agentic systems, provenance also has to ask: What prompt was used? What model answered? What context was retrieved? Which tools were available? Which policy allowed the action? What data influenced the decision? Was a human involved? What did the agent do next?

The “thing that happened” is no longer just a deployed artifact. It is a behavioral chain.
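
One possible shape for such a chain is sketched below, under stated assumptions: each step records the answers to the questions above and carries the hash of the step before it, so editing an earlier step breaks every later link. The `Step` fields and the all-zero starting hash are hypothetical, and this is a toy illustration rather than a substitute for signed attestations.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional

GENESIS = "0" * 64  # placeholder hash used before the first step


@dataclass(frozen=True)
class Step:
    model: str
    prompt_sha: str
    retrieved_sources: tuple[str, ...]
    tool: str
    policy_rule: str
    human_approved: bool
    prev_hash: str

    def digest(self) -> str:
        """SHA-256 over a canonical JSON encoding of this step."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


def append_step(chain: list[Step], **step_fields) -> list[Step]:
    """Return a new chain with one more step linked to the current head."""
    prev = chain[-1].digest() if chain else GENESIS
    return chain + [Step(prev_hash=prev, **step_fields)]


def verify(chain: list[Step], head: Optional[str] = None) -> bool:
    """Check every link; if `head` is given, also check the final digest."""
    expected = GENESIS
    for step in chain:
        if step.prev_hash != expected:
            return False
        expected = step.digest()
    return head is None or expected == head
```

Without `head`, the links alone cannot detect a change to the last step, so the head digest has to be stored somewhere the agent cannot write.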

Observability has to change for the same reason. Logs, metrics and traces remain necessary. We are not going backward. But they are not sufficient for agentic systems. The old questions were whether the service responded, how long it took, whether it threw errors and what dependencies were involved. The new question is whether the agent behaved appropriately.

That requires behavioral observability. DevOps teams will need visibility into intent, tool selection, retrieved sources, policy decisions, confidence scores, action sequences, approval gates and outcomes. They will need to know when an agent deviated from expected behavior, not merely when a service returned a 500 error.
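
As a sketch of what a behavioral check might look like, assume each agent step emits a structured event and the team has written down what it expects. The event fields, the `Expectation` type and the example values are hypothetical, and real systems would likely carry far more context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentEvent:
    intent: str                # what the agent says it is trying to do
    tool: str                  # tool it selected
    sources: tuple[str, ...]   # documents it retrieved
    confidence: float          # self-reported or scored confidence
    approved_by_human: bool


@dataclass(frozen=True)
class Expectation:
    allowed_tools: frozenset[str]
    trusted_source_prefixes: tuple[str, ...]
    min_confidence: float
    tools_requiring_approval: frozenset[str]


def deviations(event: AgentEvent, expect: Expectation) -> list[str]:
    """List every way this event departs from expected behavior."""
    found: list[str] = []
    if event.tool not in expect.allowed_tools:
        found.append(f"unexpected tool: {event.tool}")
    untrusted = [
        source for source in event.sources
        if not source.startswith(expect.trusted_source_prefixes)
    ]
    if untrusted:
        found.append(f"untrusted sources: {untrusted}")
    if event.confidence < expect.min_confidence:
        found.append(f"low confidence: {event.confidence}")
    if event.tool in expect.tools_requiring_approval and not event.approved_by_human:
        found.append(f"missing approval for: {event.tool}")
    return found
```

An event can produce deviations here even when every underlying API call returned successfully, which is the gap a 500-error alert cannot cover.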

A customer support agent that gives technically accurate but contractually wrong advice may not trigger a traditional alert. A remediation agent that restarts the right service for the wrong reason may look successful in the metrics. A security triage agent that silently downgrades important findings because retrieved context is stale may not throw an exception. These are behavioral failures, not just system failures.

This is where DevOps has to stretch.

The easy take is that AI agents break CI/CD. That is too simplistic. What agents break is the assumption that delivery ends once the artifact is deployed and the service is healthy. Continuous delivery has always been about feedback loops. Agentic systems force those feedback loops to include behavior, judgment, context and policy.

That means the future pipeline will not just build, test, scan and deploy. It will evaluate. It will simulate. It will constrain. It will observe. It will compare behavior over time. It will produce evidence that the agent did what it was supposed to do, within the boundaries the organization set.

This is also why platform engineering belongs in the conversation. Individual teams cannot each invent their own agent governance model. They will need shared services for evals, policy, identity, tool permissions, observability, audit trails and release controls. The internal developer platform cannot just make it easier to deploy apps. It has to make it safer to deploy agentic behavior.

That is a natural extension of where DevOps has always been headed. DevOps was never just about moving faster. It was about moving faster without losing control. It was about building systems where change could happen more frequently because the organization had better automation, better feedback and better accountability.

Agents do not change that mission. They make it harder.

There is a good chance the first wave of agentic CI/CD will be messy. Some teams will over-trust the agents. Some will bury them under so many approval gates that nothing improves. Some will treat prompts like code. Others will treat them like disposable text strings. Some will discover, painfully, that an agent connected to real tools is not a chatbot anymore. It is an operational actor.

That is the line DevOps teams need to internalize. Once an agent can take action, it becomes part of the delivery and operations system. It needs the same seriousness we apply to production services, privileged identities, deployment pipelines and security controls.

Shimmy’s Take

CI/CD was one of the great advances in modern software because it gave teams a way to make change routine. Not risk-free, not perfect, but routine. It turned software delivery into something we could automate, measure and improve.

Agentic AI puts pressure on that model because agents are not just another artifact moving through the pipeline. They are systems that can make choices after deployment. That means DevOps has to care not only about what shipped, but about how it behaves once it is loose in the world.

The next generation of CI/CD will not just ask whether the build passed, the tests passed and the deployment succeeded. It will ask whether the agent stayed within policy, used the right tools, relied on trusted context, produced acceptable outcomes and left behind enough evidence for humans to understand what happened.

CI/CD was built to ship software. The next version has to prove behavior. That is a much harder job, and it is exactly the kind of job DevOps was created to take on.