{"id":4369,"date":"2026-06-18T13:11:41","date_gmt":"2026-06-18T13:11:41","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2026\/06\/18\/coding-agent-horror-stories-the-13-hour-aws-outage\/"},"modified":"2026-06-18T13:11:41","modified_gmt":"2026-06-18T13:11:41","slug":"coding-agent-horror-stories-the-13-hour-aws-outage","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2026\/06\/18\/coding-agent-horror-stories-the-13-hour-aws-outage\/","title":{"rendered":"Coding Agent Horror Stories: The 13-Hour AWS Outage"},"content":{"rendered":"<p>In <a href=\"https:\/\/www.docker.com\/blog\/ai-coding-agent-horror-stories-security-risks\/\">Part 1<\/a>, we walked through six categories of AI coding agent failures and why they keep happening. The agent runs as you, with your filesystem permissions and your credentials, and nothing sits between the model\u2019s decision and the shell\u2019s execution. In <a href=\"https:\/\/www.docker.com\/blog\/coding-agent-horror-stories-the-rm-rf-incident\/\">Part 2<\/a>, we looked at one specific version of that failure in detail, the <code>rm -rf ~\/<\/code> incident that wiped a developer\u2019s entire Mac in a single command. Part 3 moves the same problem up the stack, into a production AWS environment where the blast radius is no longer one laptop but a regional cloud service.<\/p>\n<p>What happens when the agent isn\u2019t running on your laptop, but on a production AWS environment with operator-level credentials? 
You get a thirteen-hour outage, a public denial that fooled no one, and a series of follow-on incidents that cost Amazon an estimated 6.3 million orders before the company was forced to introduce what it called a \u201ccode safety reset.\u201d<\/p>\n<h2 class=\"wp-block-heading\">Today\u2019s Horror Story: The Agent That Deleted Production<\/h2>\n<p>In mid-December 2025, an AWS engineer asked Kiro for help with a small bug in AWS Cost Explorer, the dashboard customers use to track their cloud spending. <a href=\"https:\/\/aws.amazon.com\/documentation-overview\/kiro\/\" rel=\"nofollow noopener\" target=\"_blank\">Kiro<\/a> is Amazon\u2019s own agentic coding assistant. It had been granted operator-level access to the environment, the same access the engineer had, because that was how Kiro was being rolled out across the company at the time.<\/p>\n<p>Kiro looked at the bug, weighed its options, and decided the cleanest fix was to delete the production environment and rebuild it from scratch. The engineer never got a chance to step in. There was no confirmation prompt, no second pair of eyes, no two-person rule, and by the time anyone could have intervened the deletion was already done. Cost Explorer went down for thirteen hours in one of AWS\u2019s mainland China regions.<\/p>\n<p>This was not a security breach. 
It was an AI coding agent doing exactly what it had been set up to do, running with the engineer\u2019s full credentials, with nothing in the architecture to catch the moment between \u201cdelete and recreate\u201d being a reasonable option to consider and a production service being torn down.<\/p>\n<p>In this issue, you\u2019ll learn:<\/p>\n<ul class=\"wp-block-list\">\n<li>What happened in the December outage, step by step<\/li>\n<li>Why Amazon\u2019s \u201cuser error, not AI\u201d response only told part of the story<\/li>\n<li>How the December incident set the stage for outages that cost an estimated 6.3 million orders by March 2026<\/li>\n<li>The scoped-identity pattern that prevents this whole category of failure<\/li>\n<\/ul>\n<h2 class=\"wp-block-heading\">Why This Series Matters<\/h2>\n<p>Each \u201cHorror Story\u201d examines a real-world incident that turns laboratory findings into production disasters. These aren\u2019t hypothetical attacks. They\u2019re documented cases with named victims, internal memos obtained by reporters, and in several cases, public denials from the vendors. Our goal is to show the human and operational impact behind the security statistics, demonstrate how these failures unfold in practice, and provide concrete guidance on protecting your infrastructure through Docker\u2019s scoped-identity execution model.<\/p>\n<p>The story begins with an internal memo dated November 24, 2025. Three weeks before Kiro deleted the Cost Explorer environment, two of Amazon\u2019s senior VPs, Peter DeSantis (AWS Utility Computing) and Dave Treadwell (eCommerce Foundation), signed and <a href=\"https:\/\/awesomeagents.ai\/news\/amazon-kiro-ai-aws-outages\/\" rel=\"nofollow noopener\" target=\"_blank\">distributed an internal memo<\/a> telling the company that Kiro was now the standardized AI coding assistant for the entire organization. 
The memo set a target of 80% weekly usage by every Amazon engineer by year-end 2025, and directed teams to stop using third-party AI tools unless a VP signed off on the exception.<\/p>\n<p>Engineers came to call this the \u201cKiro Mandate.\u201d Adoption was tracked as a corporate OKR, and engineers who weren\u2019t using the tool showed up on management dashboards. The mandate was framed as a performance question, not a safety question, which mattered because the safety side of the rollout had not kept up. Things like peer review for destructive changes, approval gates for production access, and per-agent permission scoping had not been formally extended to AI-assisted work when the 80% target was set.<\/p>\n<p>Around 1,500 Amazon engineers signed an internal forum post pushing back, arguing that tools like Claude Code outperformed Kiro on real engineering tasks like multi-language refactoring. Management proceeded with the mandate anyway. By January 2026, 70% of Amazon engineers had used Kiro during sprint windows. Adoption was on track. The risk profile of what those engineers could do with the tool was a different story.<\/p>\n<p>Then on February 20, 2026, <a href=\"https:\/\/www.ft.com\/content\/00c282de-ed14-4acd-a948-bc8d6bdb339d\" rel=\"nofollow noopener\" target=\"_blank\">the Financial Times broke the story<\/a> based on accounts from four people familiar with the incident. The FT reporting also surfaced a second AI-related outage, this one involving Amazon Q Developer, on a separate system. Amazon\u2019s response went up the same day under the title \u201cCorrecting the Financial Times report about AWS, Kiro, and AI.\u201d It called the cause \u201cuser error, specifically misconfigured access controls, not AI as the story claims,\u201d and dismissed the FT\u2019s second-incident claim as \u201centirely false.\u201d<\/p>\n<p>The misconfigured access controls part is worth a closer look. 
A typo would have been \u201cuser error.\u201d What actually happened was a structural decision to give an autonomous agent the same permissions as a human operator, in a system where the human\u2019s safety net had always been a colleague asking \u201care you sure?\u201d Kiro had no colleagues.<\/p>\n<div class=\"wp-block-ponyo-image\">\n                <img data-opt-id=1258191194  fetchpriority=\"high\" decoding=\"async\" width=\"1999\" height=\"1330\" src=\"https:\/\/www.docker.com\/app\/uploads\/2026\/06\/image1.png\" class=\"fade-in\" alt=\"image1\" title=\"- image1\" \/>\n        <\/div>\n\n<h2 class=\"wp-block-heading\">The Scale of the Problem<\/h2>\n<p>The December outage was the visible piece of a bigger pattern. Inside Amazon, briefing notes described a series of incidents with \u201chigh blast radius\u201d tied to AI-assisted changes, with safety rules that had not yet been written for the way the agents were now being used. None of that language was ever shared publicly.<\/p>\n<p>On March 2, Amazon.com showed shoppers the wrong delivery dates after they added things to their carts. About <a href=\"https:\/\/medium.com\/codetodeploy\/when-ai-writes-the-code-a-deep-dive-into-amazons-2026-ai-linked-outages-434ffd85a0d2\" rel=\"nofollow noopener\" target=\"_blank\">120,000 orders were lost<\/a> and <a href=\"https:\/\/dev.to\/tyson_cung\/amazon-lost-63m-orders-after-ai-coding-tool-went-rogue-now-theyre-hitting-the-brakes-2h7p\" rel=\"nofollow noopener\" target=\"_blank\">1.6 million people hit error pages<\/a>. Amazon\u2019s internal review pointed at one of its own AI tools, Amazon Q, as a main cause. Three days later, on March 5, the storefront went down for six hours and lost an estimated 6.3 million orders, with U.S. order volume dropping 99% while it was down. 
Both incidents traced back to AI-written code that had been pushed live without proper review.<\/p>\n<p>On March 10, Dave Treadwell, the same SVP who had co-signed the Kiro Mandate four months earlier, announced a 90-day code safety reset across roughly 335 of Amazon\u2019s most important systems. The new rules: two people had to sign off on every change going live, senior engineers had to approve AI-written code from juniors, and the automated checks were tightened. Treadwell called the new approach \u201ccontrolled friction.\u201d It\u2019s a quiet way of saying the friction had not been there before, and that what arrived in March was what should have been in place in November.<\/p>\n<h2 class=\"wp-block-heading\">How the Failure Works<\/h2>\n<p>To understand why these incidents happen, you have to look at the architecture underneath. Kiro was doing exactly what an agentic coding assistant is designed to do. The failure was in the system that surrounded it.<\/p>\n<p>When Kiro runs on behalf of an engineer, it inherits the engineer\u2019s full set of permissions. There\u2019s no separate identity for \u201cKiro acting on behalf of someone,\u201d no role with a narrower scope than the human who launched it. Whatever the engineer can touch, the agent can touch. This is the same property we walked through in <a href=\"https:\/\/www.docker.com\/blog\/ai-coding-agent-horror-stories-security-risks\/\">Part 1<\/a> for filesystem access, applied here to cloud credentials instead. The agent gets a copy of the keys, every time.<\/p>\n<p>Then there\u2019s the loop. In most AI coding assistants the reasoning step and the execution step happen inside the same cycle. The agent thinks about what to do, generates the action, and runs it before the engineer has a chance to read what it decided. There\u2019s no proposal stage, no preview screen, no \u201cdo you want me to do this?\u201d gate that a human approves first. 
The deciding and the doing are one thing.<\/p>\n<p>The speed makes this worse. Most safeguards in software engineering assume a human is the one making the change. A <code>confirm? (y\/n)<\/code> prompt only protects against typos because a person sees it, pauses, and reads it. An agentic loop reads the same prompt and replies \u201cy\u201d in milliseconds. By the time anyone notices the agent has made a decision, the decision has already been executed. Post-hoc intervention isn\u2019t really a thing in this environment.<\/p>\n<p>And the reasoning that gets the agent there isn\u2019t wrong. It\u2019s just not bounded by the things that would have stopped a human. A senior AWS engineer with the same permissions would not have looked at a small bug in Cost Explorer and decided the right move was to tear down the production environment. They would have walked over to a colleague, posted in a Slack channel, paused to think about whether anyone had pinged them lately about that service. Kiro had the same permissions and skipped all of that, because none of it is part of how an AI agent makes a decision.<\/p>\n<p>Kiro didn\u2019t go rogue. It didn\u2019t malfunction. It was optimizing for the objective it was given, which was to fix the bug, and \u201cdelete and recreate\u201d is a legitimate solution in many engineering contexts. What was missing wasn\u2019t smarter reasoning. 
It was the layer of friction that would have caught the moment between \u201cthis is a defensible option\u201d and \u201cthis is happening to a live customer service.\u201d<\/p>\n<h2 class=\"wp-block-heading\">Technical Breakdown: How a Cost Explorer Fix Became a 13-Hour Outage<\/h2>\n<div class=\"wp-block-ponyo-image\">\n                <img data-opt-id=1373312225  fetchpriority=\"high\" decoding=\"async\" width=\"1200\" height=\"571\" src=\"https:\/\/www.docker.com\/app\/uploads\/2026\/06\/image2.png\" class=\"fade-in\" alt=\"image2\" title=\"- image2\" \/>\n        <\/div>\n<p><em>Caption: Diagram illustrating how operator-level permissions flow directly from engineer to agent to production control plane, with no scoped-identity boundary in between.<\/em><\/p>\n<p>Here\u2019s how the December incident unfolded, step by step:<\/p>\n<h3 class=\"wp-block-heading\">1. The Request<\/h3>\n<p>An AWS engineer is looking at a small bug in Cost Explorer for the cn-northwest region. They hand it to Kiro the way they\u2019d hand it to a colleague:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: bash; gutter: false; title: ; notranslate\">\ncheck the cost explorer issue in cn-northwest and propose a fix\n\n<\/pre>\n<\/div>\n<p>That\u2019s the whole prompt. No special framing, no permissions caveat. It\u2019s just routine maintenance.<\/p>\n<h3 class=\"wp-block-heading\">2. The Reasoning<\/h3>\n<p>Kiro looks at the environment, finds the misconfiguration, and weighs its options. It could patch the misconfiguration in place, or redeploy specific components, or tear the environment down and rebuild it cleanly from the deployment templates. From a pure correctness standpoint, the last option is the most thorough, since it guarantees no residual state from the broken configuration. That\u2019s the path Kiro picks.<\/p>\n<h3 class=\"wp-block-heading\">3. The Inheritance<\/h3>\n<p>Kiro is running as the engineer. 
The engineer has operator-level access to the Cost Explorer production environment, including the ability to tear it down, because that\u2019s the kind of operation a human operator might legitimately need during an incident. The control plane has no concept of \u201cKiro acting on behalf of the engineer.\u201d It only has \u201can authenticated principal with sufficient permissions making a request.\u201d From its point of view, the engineer is making the call.<\/p>\n<h3 class=\"wp-block-heading\">4. The Execution<\/h3>\n<p>Kiro initiates the deletion, and the request runs in the seconds it takes to send the API call. There is no confirmation prompt the engineer could intercept in that window, no two-person rule waiting on a second approver, and no policy gate watching for the specific shape of \u201cthis command would tear down a production service.\u201d The control plane sees a valid API call from an authenticated principal with sufficient permissions, and it processes the call the way it would process any other operator request.<\/p>\n<h3 class=\"wp-block-heading\">5. The Outage<\/h3>\n<p>Cost Explorer in the affected region goes down, and customers across that region lose the ability to view, analyze, or manage their cloud spending. The outage ends up running for thirteen hours, with almost all of that time spent on recovery rather than detection, because the deletion itself completed in the seconds it took to send the API call. Rebuilding the environment from the deployment templates, validating the configuration against the expected state, restoring connectivity to the services Cost Explorer depends on, replaying the state the old environment had built up, and bringing the service back up in front of real traffic is the work that takes the rest of the day.<\/p>\n<p>Internally, the incident enters Amazon\u2019s Correction of Error process, while externally the story stays quiet for two months until the Financial Times breaks it on February 20, 2026. 
Amazon\u2019s response, issued the same day, frames the cause as \u201cuser error, specifically misconfigured access controls,\u201d and announces mandatory peer review for production access in the same breath. That second part is the architectural admission that something more than user error needed to be fixed.<\/p>\n<h3 class=\"wp-block-heading\">The Impact<\/h3>\n<p>Within thirteen hours, AWS had:<\/p>\n<ul class=\"wp-block-list\">\n<li>Lost a production service for a regulated region (mainland China) where service continuity matters acutely<\/li>\n<li>Triggered an internal investigation that produced a post-incident briefing characterizing the failure as part of a \u201ctrend of incidents\u201d with \u201chigh blast radius\u201d<\/li>\n<li>Set the conditions for the follow-on incidents in March that cost an estimated 6.3 million orders<\/li>\n<\/ul>\n<p>The technical fix was simple: mandatory peer review for production access. The reason it wasn\u2019t in place from the start is the part that matters: nobody had updated the operational model to account for the fact that the entity making the change might be moving at a thousand times the speed of the entity reviewing it.<\/p>\n<p>This is what one autonomous \u201cdelete and recreate\u201d decision produces when the agent has the same credentials as the engineer who launched it.<\/p>\n<h2 class=\"wp-block-heading\">How Docker Sandboxes Eliminates This Attack Vector<\/h2>\n<p>Issues 1 and 2 covered the commands you\u2019d type to run an agent in a sandbox. This one is about what sits underneath those commands, because the Kiro incident isn\u2019t really a CLI problem. It\u2019s an architecture problem, and no command-line flag fixes the kind of gap the December outage exposed. What fixes it is the layer the flag sits on top of.<\/p>\n<p>That layer is the microVM. Each sandbox runs inside its own dedicated microVM, with its own kernel, its own filesystem, its own network namespace, and its own Docker daemon. 
It\u2019s hardware-boundary isolation, the same kind you get from a full VM, but optimized for the way agents actually work: spin up in seconds, throw away when done, no path back to the host. As <a href=\"https:\/\/www.docker.com\/blog\/why-microvms-the-architecture-behind-docker-sandboxes\/\">Docker\u2019s microVM architecture post<\/a> explains, the bounding box has to come from infrastructure, not from a system prompt. An LLM deciding its own security boundaries is not a security model.<\/p>\n<p>This is the part that matters for the Kiro case. Inside a microVM, the agent isn\u2019t an extension of the engineer\u2019s identity. It\u2019s a distinct process with a distinct view of the world, running on a different kernel, talking to a different Docker daemon, reaching the network through a proxy that the agent cannot see or bypass. The credentials that would let a human operator delete a production environment are not in the agent\u2019s process memory, not in its environment variables, not in any file it can read. They live outside the microVM boundary entirely.<\/p>\n<div class=\"wp-block-ponyo-image\">\n                <img data-opt-id=1230686901  data-opt-src=\"https:\/\/www.docker.com\/app\/uploads\/2026\/06\/image3.png\"  decoding=\"async\" width=\"1637\" height=\"844\" src=\"data:image/svg+xml,%3Csvg%20viewBox%3D%220%200%20100%%20100%%22%20width%3D%22100%%22%20height%3D%22100%%22%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%3E%3Crect%20width%3D%22100%%22%20height%3D%22100%%22%20fill%3D%22transparent%22%2F%3E%3C%2Fsvg%3E\" class=\"fade-in\" alt=\"image3\" title=\"- image3\" \/>\n        <\/div>\n\n<h3 class=\"wp-block-heading\">Three architectural decisions that close the Kiro gap<\/h3>\n<p>The Docker Sandboxes <a href=\"https:\/\/docs.docker.com\/ai\/sandboxes\/architecture\/\" rel=\"nofollow noopener\" target=\"_blank\">architecture<\/a> documentation describes how each layer of the design protects against a specific class of failure. 
Three of those layers are directly relevant to the December incident.<\/p>\n<p><strong>1. The workspace is mounted at the same path it has on the host, and nothing else is.<\/strong> The sandbox sees the agent\u2019s workspace through a filesystem passthrough at the same absolute path. That\u2019s the only thing it sees. The engineer\u2019s home directory, their cloud configs, their credential files, their SSH keys, all of that lives outside the boundary. If the agent reasoned its way to a \u201cdelete and recreate\u201d plan, the deletion would target the workspace, which is reproducible from source anyway. The host stays whole.<\/p>\n<p><strong>2. The Docker daemon lives inside the VM, with no path back.<\/strong> This is the design decision that separates Docker Sandboxes from approaches that look similar on the surface. Mounting the Docker socket from the host gives the agent escape paths. WASM and V8 isolates can\u2019t run a full development environment. A general-purpose VM is too heavy to spin up for a single session. A microVM with its own Docker daemon is the only model that gives the agent a real working environment without any of those compromises. For the Kiro case specifically, it means the agent can investigate the Cost Explorer bug, build container images, run tests against them, and propose a fix, all without ever holding the credentials it would need to execute that fix against the live service.<\/p>\n<p><strong>3. A proxy on the host enforces credentials and network policy.<\/strong> All outbound traffic from the sandbox routes through an HTTP\/HTTPS proxy <a href=\"https:\/\/docs.docker.com\/ai\/sandboxes\/architecture\/#networking\" rel=\"nofollow noopener\" target=\"_blank\">running on the host<\/a>, outside the VM boundary. This is the layer that directly addresses what went wrong with Kiro. Secrets are stored on the host, scoped to specific services, and injected into outbound requests by the proxy. 
The agent never sees the values themselves. It also can\u2019t get around the proxy, because the proxy is the only way traffic leaves the microVM at all. If the agent decides to call a destructive control-plane endpoint, the proxy is what stops it, regardless of what the model has reasoned its way to.<\/p>\n<h3 class=\"wp-block-heading\">Why this matters for the Kiro incident specifically<\/h3>\n<p>Let\u2019s replay the December scenario against this architecture. The engineer launches the agent inside a sandbox. The microVM boots in seconds, the workspace gets mounted, and the agent starts up without any AWS operator credentials in its environment. Those credentials are still on the host, where they belong. From here, the agent investigates the Cost Explorer bug exactly the way Kiro did, reasoning through the same options and quite possibly landing on the same \u201cdelete and recreate\u201d plan. Nothing on the inside of the box has changed.<\/p>\n<p>What changes is what happens when the agent tries to act. The deletion call leaves the sandbox through the only path available to it, which is the proxy on the host. The proxy checks the network policy and either authenticates the call with a scoped, read-only credential the engineer set up for investigation work, or it refuses the call because the destination wasn\u2019t on the allowlist. Either way, the December outcome, the thirteen-hour production outage, simply doesn\u2019t happen. The agent\u2019s plan ends up in front of the engineer as a proposal. The engineer reads \u201cdelete and recreate,\u201d recognizes that it\u2019s too much for a small bug, and asks the agent to patch in place instead.<\/p>\n<p>This pattern generalizes. 
The same architecture that would have contained the LovesWorkin filesystem incident in Issue 2 would have contained the Kiro control-plane incident in this one, because both failures share the same root cause: an agent acting with the launching user\u2019s full identity, at machine speed, against systems that have no way of knowing they\u2019re talking to an agent. The microVM makes the agent a distinct actor with its own boundary. The isolated Docker daemon gives that actor a real working environment to operate in. The proxy gives the engineer a place to decide, ahead of time, what that actor can reach. The blast radius of anything the agent reasons its way into is bounded by what the sandbox allows, not by what the engineer who launched it happens to have access to.<\/p>\n<p>The <a href=\"https:\/\/docs.docker.com\/ai\/sandboxes\/\" rel=\"nofollow noopener\" target=\"_blank\">sbx CLI<\/a> is what exposes all of this to the developer. Here\u2019s what the Cost Explorer investigation would have looked like inside a sandbox, configured the way the December incident needed.<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: bash; gutter: false; title: ; notranslate\">\n# 1. Store the AWS credential for the sandbox, outside the agent's view.\n#    The actual scoping (read-only, Cost Explorer only) is handled\n#    at the AWS IAM layer when the credential is created. From sbx's\n#    side, the credential is opaque, the agent never sees the value,\n#    and the proxy is what injects it into outbound calls.\necho \"$AWS_COST_EXPLORER_READONLY_KEY\" | sbx secret set -g aws\n\n# 2. Define what the sandbox is allowed to reach on the network.\n#    Cost Explorer read endpoints are on the list. Control-plane\n#    endpoints that would let an agent tear down a production\n#    environment are not.\nsbx policy allow network \"ce.amazonaws.com,api.anthropic.com\"\n\n# 3. Launch the agent inside the sandbox.\nsbx run claude\n\n# 4. 
After the session, review what the proxy allowed and denied.\n#    Any attempt the agent made to reach an endpoint outside the\n#    allowlist will show up here.\nsbx policy log\n\n<\/pre>\n<\/div>\n<p>Step 1 stores the AWS credential outside the agent\u2019s view, with the read-only and Cost-Explorer-only scoping enforced by AWS IAM rather than by <code>sbx<\/code>. Step 2 defines the network perimeter the proxy will enforce, independent of how broad the credential\u2019s IAM permissions actually are. Step 3 starts the agent inside the microVM with no path back to the host. Step 4 is what makes the whole setup auditable: every call the proxy allowed or denied during the session, including any attempt the agent made to reach destinations off the allowlist, shows up in <code>sbx policy log<\/code>.<\/p>\n<p>What this gives the engineer, end to end, is a working agent with a known and bounded reach. The agent can investigate, reason, and propose. It cannot execute its way into a region-wide outage.<\/p>\n<h3 class=\"wp-block-heading\">What This Looks Like in Practice<\/h3>\n<p>Stepping back from the Kiro story for a moment, the picture is straightforward. Docker Sandboxes gives an agent a real working environment, scoped credentials, a network boundary, and a path that throws everything away cleanly when the session ends. 
Compared with the way most engineers run AI coding agents today, the trade-offs look like this:<\/p>\n\n<div class=\"wp-block-ponyo-table\" data-highlighted-columns=\"null\" data-highlighted-rows=\"null\">\n<table class=\"responsive-table\">\n<tbody class=\"wp-block-ponyo-table-body\" data-highlighted-columns=\"[]\" data-highlighted-rows=\"[]\">\n<tr class=\"wp-block-ponyo-table-header\">\n<th class=\"wp-block-ponyo-cell\" data-responsive-table-heading=\"Security Aspect\">\n<p>Security Aspect<\/p>\n<\/th>\n<th class=\"wp-block-ponyo-cell\" data-responsive-table-heading=\"Traditional Agentic Setup\">\n<p>Traditional Agentic Setup<\/p>\n<\/th>\n<th class=\"wp-block-ponyo-cell\" data-responsive-table-heading=\"Docker Sandboxes\">\n<p>Docker Sandboxes<\/p>\n<\/th>\n<\/tr>\n<\/tbody>\n<tbody class=\"wp-block-ponyo-table-body\" data-highlighted-columns=\"[]\" data-highlighted-rows=\"[]\">\n<tr class=\"wp-block-ponyo-table-row\">\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Identity<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Engineer\u2019s full credentials<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br 
\/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Scoped identity per task<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<\/tr>\n<tr class=\"wp-block-ponyo-table-row\">\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Secret Handling<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Loaded into agent context<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Proxy-injected, never exposed<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<\/tr>\n<tr class=\"wp-block-ponyo-table-row\">\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span 
class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Production Access<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Inherited from operator role<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Explicit allowlist or nothing<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<\/tr>\n<tr class=\"wp-block-ponyo-table-row\">\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Destructive Operations<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Execute at machine speed<\/span><\/p>\n<p>                    <br \/>\n           
                                 \n            <\/p><\/td>\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Reviewable before execution<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<\/tr>\n<tr class=\"wp-block-ponyo-table-row\">\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Audit Trail<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Per-engineer, post-hoc<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Per-sandbox, real-time sbx policy log<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<\/tr>\n<tr class=\"wp-block-ponyo-table-row\">\n<td class=\"wp-block-ponyo-cell\">\n         
           <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Blast Radius<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Whatever the engineer can do<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<td class=\"wp-block-ponyo-cell\">\n                    <span class=\"responsive-table-label\"><\/span>\n<p>                    <span class=\"responsive-table-value\"><br \/>\n                                                    <span class=\"responsive-table-value-content\"><\/span><\/span><\/p>\n<p><span>Whatever the sandbox is configured for<\/span><\/p>\n<p>                    <br \/>\n                                            \n            <\/p><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>The row that matters most for the Kiro story is Destructive Operations. Without a sandbox, a destructive operation runs as fast as the API call leaving the agent\u2019s process. With a sandbox, that same operation has to clear the proxy first, which means it lands in the engineer\u2019s review queue instead of in production.<\/p>\n<h2 class=\"wp-block-heading\">Best Practices for Secure Agentic Production Work<\/h2>\n<ol class=\"wp-block-list\">\n<li><strong>Never give an agent your full production credentials.<\/strong> Create a scoped identity with the minimum permissions the specific task needs. 
If the agent is investigating a read-only issue, give it read-only access. The Kiro incident is what happens when this rule is skipped.<\/li>\n<li><strong>Inject secrets through a proxy, not through environment variables.<\/strong> A secret the agent never sees is a secret the agent cannot accidentally send to the wrong endpoint, leak in a log, or include in a code commit. Proxy injection turns the credential from data the agent holds into a capability the proxy provides.<\/li>\n<li><strong>Tag AI-assisted changes as a distinct change category.<\/strong> Track them, require senior review, and apply the two-person rule by default. This is not a slowdown for AI workflows. It is the same review discipline a senior engineer\u2019s pull request would get, applied to an actor that ships at machine speed.<\/li>\n<li><strong>Read the policy log.<\/strong> <code>sbx policy log<\/code> records every connection attempt the proxy allowed or denied during a session. A blocked attempt to reach a destructive endpoint is exactly the signal you would want to see, and it stays buried unless someone looks.<\/li>\n<li><strong>Pair adoption metrics with blast-radius metrics.<\/strong> Amazon\u2019s 80% Kiro target was a corporate OKR. The safeguards that should have moved alongside it were tracked nowhere. 
Pushing usage forward without also pushing safety boundaries forward is what set up the December outage.<\/li>\n<\/ol>\n<h3 class=\"wp-block-heading\">Take Action<\/h3>\n<p>The path to safe agentic work in production-adjacent environments starts with one shift: stop giving agents the credentials you give your humans.<\/p>\n<ul class=\"wp-block-list\">\n<li><strong>Install Docker Sandboxes.<\/strong> The <a href=\"https:\/\/docs.docker.com\/ai\/sandboxes\/\" rel=\"nofollow noopener\" target=\"_blank\">Docker Sandboxes documentation<\/a> walks through installing <code>sbx<\/code> and running your first scoped-identity agent.<\/li>\n<li><strong>Read the security model.<\/strong> The <a href=\"https:\/\/docs.docker.com\/ai\/sandboxes\/security\/\" rel=\"nofollow noopener\" target=\"_blank\">Docker Sandboxes security documentation<\/a> covers credential handling, isolation layers, network policies, and workspace trust in detail.<\/li>\n<li><strong>Try the proxy-injected secrets pattern.<\/strong> Running <code>sbx secret set<\/code> followed by <code>sbx run<\/code> is the quickest way to see how the threat model shifts when secrets sit outside the agent\u2019s context rather than inside it.<\/li>\n<\/ul>\n<p>If you\u2019re new to this series, <a href=\"https:\/\/www.docker.com\/blog\/ai-coding-agent-horror-stories-security-risks\/\">Issue 1<\/a> walks through the six categories of AI coding agent failures, and <a href=\"https:\/\/www.docker.com\/blog\/coding-agent-horror-stories-the-rm-rf-incident\/\">Issue 2<\/a> goes deep on the <code>rm -rf ~\/<\/code> incident at the filesystem layer.<\/p>\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n<p>The December Cost Explorer outage and the March outages on Amazon.com are points on the same line. 
They are what happens when an agent inherits an operator\u2019s credentials, when the safeguards designed for human pace meet a decision-making loop that moves a thousand times faster, and when adoption gets pushed forward without anything pushing the safety boundary forward with it.<\/p>\n<p>Amazon\u2019s response is the part of the story worth holding onto. \u201cUser error, specifically misconfigured access controls\u201d is true in the same way that \u201coperator error, not the missing guardrail\u201d was true for every famous industrial accident before guardrails were invented. The misconfigured access controls weren\u2019t a typo. They were the structural decision to scale agentic adoption without scaling the identity model around it. Everything Amazon added afterward, the peer review, the senior sign-off on AI-assisted changes, the 90-day code safety reset, the \u201ccontrolled friction\u201d Treadwell described, points at the same gap. The agent needed to operate in a smaller box than the engineer it was running on behalf of.<\/p>\n<p>Docker Sandboxes doesn\u2019t try to make the agent more cautious; it changes what the agent can reach. The credentials sit outside the boundary. The destructive endpoints sit off the allowlist. 
The agent gets a real working environment, but not the production control plane.<\/p>\n<p><em>Coming up in our series: Issue 4 will explore the GitGuardian sprawl report and the s1ngularity attack, where AI agents weaponized their own context windows to scan developer machines for credentials, and how proxy-injected secrets eliminate the exposure surface.<\/em><\/p>\n<h3 class=\"wp-block-heading\">Learn More<\/h3>\n<ul class=\"wp-block-list\">\n<li><strong>Run agents safely with Docker Sandboxes:<\/strong> <a href=\"https:\/\/docs.docker.com\/ai\/sandboxes\/\" rel=\"nofollow noopener\" target=\"_blank\">Visit the Docker Sandboxes documentation<\/a> to get started.<\/li>\n<li><strong>Explore the Docker MCP Catalog:<\/strong> <a href=\"https:\/\/hub.docker.com\/mcp\" rel=\"nofollow noopener\" target=\"_blank\">Discover MCP servers<\/a> that connect your agents to external services through Docker\u2019s security-first architecture.<\/li>\n<li><strong>Download Docker Desktop:<\/strong> <a href=\"https:\/\/www.docker.com\/products\/docker-desktop\/\">The fastest path to a governed AI agent environment<\/a>, with Docker Sandboxes, MCP Gateway, and Model Runner in a single install.<\/li>\n<li><strong>Read the MCP Horror Stories series:<\/strong> <a href=\"https:\/\/www.docker.com\/blog\/mcp-security-issues-threatening-ai-infrastructure\/\">Start with Issue 1<\/a> to understand the protocol-layer security risks that complement the agent-layer risks covered here.<\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>In Part 1, we walked through six categories of AI coding agent failures and why they keep happening. 
The agent [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4370,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[4],"tags":[],"class_list":["post-4369","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-docker"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/4369","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=4369"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/4369\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media\/4370"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=4369"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=4369"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=4369"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}