{"id":3171,"date":"2026-01-06T20:09:03","date_gmt":"2026-01-06T20:09:03","guid":{"rendered":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2026\/01\/06\/deterministic-ai-testing-with-session-recording-in-cagent\/"},"modified":"2026-01-06T20:09:03","modified_gmt":"2026-01-06T20:09:03","slug":"deterministic-ai-testing-with-session-recording-in-cagent","status":"publish","type":"post","link":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/2026\/01\/06\/deterministic-ai-testing-with-session-recording-in-cagent\/","title":{"rendered":"Deterministic AI Testing with Session Recording in cagent"},"content":{"rendered":"<p>AI agents introduce a challenge that traditional software doesn\u2019t have: non-determinism. The same prompt can produce different outputs across runs, making reliable testing difficult. Add API costs and latency to the mix, and developer productivity takes a hit.<\/p>\n<p>Session recording in cagent addresses this directly. Record an AI interaction once, replay it indefinitely\u2014with identical results, zero API costs, and millisecond execution times.<\/p>\n<h2 class=\"wp-block-heading\">How session recording works<\/h2>\n<p>cagent implements the <a href=\"https:\/\/github.com\/vcr\/vcr\" rel=\"nofollow noopener\" target=\"_blank\">VCR pattern<\/a>, a proven approach for HTTP mocking. During recording, cagent proxies requests to the AI provider, captures the full request\/response cycle, and saves it to a YAML \u201ccassette\u201d file. During replay, incoming requests are matched against the recording and served from cache\u2014no network calls required.<\/p>\n<p>One implementation detail worth noting: tool call IDs are normalized before matching. OpenAI generates random IDs on each request, which would otherwise break replay. 
cagent handles this automatically.<\/p>\n<h2 class=\"wp-block-heading\">Getting started<\/h2>\n<p>Recording a session requires a single flag:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: plain; title: ; notranslate\">\ncagent run my-agent.yaml --record \"What is Docker?\"\n# creates: cagent-recording-1736089234.yaml\n\ncagent run my-agent.yaml --record my-test \"Explain containers\"\n# creates: my-test.yaml\n\n<\/pre>\n<\/div>\n<p>Replaying uses the <code>--fake<\/code> flag with the cassette path:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: plain; title: ; notranslate\">\ncagent exec my-agent.yaml --fake my-test.yaml \"Explain containers\"\n\n<\/pre>\n<\/div>\n<p>The replay completes in milliseconds with no API calls.<\/p>\n<h2 class=\"wp-block-heading\">Example: CI\/CD integration testing<\/h2>\n<p>Consider a code review agent:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: plain; title: ; notranslate\">\n# code-reviewer.yaml\nagents:\n  root:\n    model: anthropic\/claude-sonnet-4-0\n    description: Code review assistant\n    instruction: |\n      You are an expert code reviewer. Analyze code for best practices,\n      security issues, performance concerns, and readability.\n    toolsets:\n      - type: filesystem\n\n<\/pre>\n<\/div>\n<p>Record the interaction with <code>--yolo<\/code> to auto-approve tool calls:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: plain; title: ; notranslate\">\ncagent exec code-reviewer.yaml --record code-review --yolo \\\n  \"Review pkg\/auth\/handler.go for security issues\"\n\n<\/pre>\n<\/div>\n<p>In CI, replay without API keys or network access:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: plain; title: ; notranslate\">\ncagent exec code-reviewer.yaml --fake code-review.yaml \\\n  \"Review pkg\/auth\/handler.go for security issues\"\n\n<\/pre>\n<\/div>\n<p>Cassettes can be version-controlled alongside test code. When agent instructions change significantly, delete the cassette and re-record to capture the new behaviour.<\/p>\n<h2 class=\"wp-block-heading\">Other use cases<\/h2>\n<p><strong>Cost-effective prompt iteration.<\/strong> Record a single interaction with an expensive model, then iterate on agent configuration against that recording. The first run incurs API costs; subsequent iterations are free.<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: plain; title: ; notranslate\">\ncagent exec .\/agent.yaml --record expensive-test \"Complex task\"\nfor i in {1..100}; do\n  cagent exec .\/agent-v$i.yaml --fake expensive-test.yaml \"Complex task\"\ndone\n\n<\/pre>\n<\/div>\n<p><strong>Issue reproduction.<\/strong> Users can record a session with <code>--record bug-report<\/code> and share the cassette file. 
Support teams replay the exact interaction locally for debugging.<\/p>\n<p><strong>Multi-agent systems.<\/strong> Recording captures the complete delegation graph: root agent decisions, sub-agent tool calls, and inter-agent communication.<\/p>\n<h2 class=\"wp-block-heading\">Security and provider support<\/h2>\n<p>Cassettes automatically strip sensitive headers (<code>Authorization<\/code>, <code>X-Api-Key<\/code>) before saving, making them safe to commit to version control. The format is human-readable YAML:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: plain; title: ; notranslate\">\nversion: 2\ninteractions:\n  - id: 0\n    request:\n      method: POST\n      url: https:\/\/api.openai.com\/v1\/chat\/completions\n      body: \"{...}\"\n    response:\n      status: 200 OK\n      body: \"data: {...}\"\n\n<\/pre>\n<\/div>\n<p>Session recording works with all supported providers: OpenAI, Anthropic, Google, Mistral, xAI, and Nebius.<\/p>\n<h2 class=\"wp-block-heading\">Get started<\/h2>\n<p>Session recording is available now in cagent. To try it:<\/p>\n<div class=\"wp-block-syntaxhighlighter-code \">\n<pre class=\"brush: plain; title: ; notranslate\">\ncagent run .\/your-agent.yaml --record my-session \"Your prompt here\"\n\n<\/pre>\n<\/div>\n<p>For questions, feedback, or feature requests, visit the <a href=\"https:\/\/github.com\/docker\/cagent\" rel=\"nofollow noopener\" target=\"_blank\">cagent repository<\/a> or join the <a href=\"https:\/\/github.com\/docker\/cagent\/discussions\" rel=\"nofollow noopener\" target=\"_blank\">GitHub Discussions<\/a>.<\/p>","protected":false},"excerpt":{"rendered":"<p>AI agents introduce a challenge that traditional software doesn\u2019t have: non-determinism. 
The same prompt can produce different outputs across runs, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":94,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[4],"tags":[],"class_list":["post-3171","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-docker"],"_links":{"self":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/3171","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/comments?post=3171"}],"version-history":[{"count":0,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/posts\/3171\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media\/94"}],"wp:attachment":[{"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/media?parent=3171"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/categories?post=3171"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rssfeedtelegrambot.bnaya.co.il\/index.php\/wp-json\/wp\/v2\/tags?post=3171"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}