Anthropic Code Review Dispatches Agent Teams to Catch the Bugs That Skim Reads Miss

The math was straightforward. Code output per engineer at Anthropic increased by 200% over the past year. Code review didn’t scale with it. Before deploying an automated solution, only 16% of pull requests at Anthropic received substantive review comments. The rest got skim reads.

That’s the problem Code Review is designed to solve. Announced March 10 and available now as a research preview for Claude Code Teams and Enterprise customers, Code Review dispatches a team of AI agents on every pull request to find the bugs that quick reads miss. It’s the system Anthropic has been running on nearly every internal PR for months. Now it’s available to customers.

How It Works

When a PR opens in an enabled repository, Code Review spins up multiple specialized agents that work in parallel. Some probe for data-handling errors, off-by-one conditions, and API misuse. Others perform cross-file consistency checks and reason about intent. A verification step tests each hypothesis to filter false positives. A final aggregation agent consolidates findings, removes duplicates, and ranks issues by severity.
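In rough outline, the pipeline is fan-out, verify, then aggregate. The sketch below is an illustration of that shape only, not Anthropic's implementation: the agent functions, `Finding` type, and verification rule are all hypothetical stand-ins for what would be model calls in the real system.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    severity: int  # higher = more severe
    message: str

# Hypothetical specialist agents; in the real system these are model calls.
def data_handling_agent(diff):
    return [Finding("auth.py", 42, 3, "possible null dereference")]

def consistency_agent(diff):
    # Note the overlap with the first agent: duplicates are expected.
    return [Finding("auth.py", 42, 3, "possible null dereference"),
            Finding("api.py", 7, 1, "inconsistent parameter name")]

def verify(finding, diff):
    # Stand-in for the verification step that tests each hypothesis
    # before it is surfaced; here it simply keeps everything.
    return finding.severity >= 1

def review(diff):
    agents = [data_handling_agent, consistency_agent]
    # Fan out: all specialist agents examine the diff in parallel.
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda agent: agent(diff), agents)
    # Verify each hypothesis, then aggregate: dedupe and rank by severity.
    findings = [f for batch in results for f in batch if verify(f, diff)]
    return sorted(set(findings), key=lambda f: -f.severity)
```

Running `review(...)` on a diff would return two findings here: the duplicate null-dereference report is collapsed, and the more severe issue sorts first.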

Results appear directly on the PR as a single summary comment with inline notes on specific lines. Each finding includes step-by-step reasoning, an analysis of the potential impact, and a suggested fix. Issues are labeled by severity using color codes.

The agents do not approve pull requests. Humans decide what to do about the findings.

Cat Wu, Anthropic’s head of product for Claude Code, told TechCrunch the tool focuses on logical errors rather than style — a deliberate choice based on feedback that developers weren’t finding value in automated style comments.

The Numbers

After deploying Code Review internally, substantive review comments on PRs jumped from 16% to 54%. Engineers disagreed with fewer than 1% of surfaced findings.

Detection rates scale with PR size. Changesets over 1,000 lines produced findings 84% of the time; small PRs under 50 lines, 31% of the time. Reviews average about 20 minutes per PR.

Two examples stand out. Internally, a single-line change to a production service — the kind of edit that would typically be rubber-stamped — would have broken authentication. Code Review flagged it before the merge. In a customer pilot, TrueNAS caught a type mismatch bug during a ZFS encryption refactoring that risked erasing the encryption key cache during sync operations.

Pricing is token-based, averaging $15 to $25 per review. Administrators can set monthly spending caps. That per-use model costs more than flat-rate alternatives like CodeRabbit ($24/month unlimited), but Anthropic is betting depth justifies the premium.

“Anthropic Code Review demonstrates what multi-agent orchestration looks like in practice: Specialized agents working in parallel, verifying findings, and consolidating results ranked by severity. At $15 to $25 per review, the economics favor teams where a missed bug carries significant financial, regulatory, or safety consequences, including medical devices, defense systems, space, and mission-critical applications,” according to Mitch Ashley, VP and practice lead for software lifecycle engineering at The Futurum Group.

“For teams with high review frequency of codebases, that per-use cost compounds quickly. CodeRabbit’s subscription model serves a broader audience by prioritizing utility across broad sets of codebases and organizational goals. The market is segmenting on risk profile, not just capability, and teams need to match the tool to the business impact.”

Why This Matters for DevOps

Code Review addresses a bottleneck that every team using AI coding agents is hitting. Claude Code, Cursor, GitHub Copilot’s coding agent — they all generate pull requests faster than humans can review them. We’ve written about this dynamic with Cursor’s cloud agents (35% of internal PRs), GitHub Copilot’s Jira integration, and VS Code’s agent plugin ecosystem. The pattern is the same: Agent-generated code is outpacing the review process.

Anthropic’s approach — using agents to review agent-generated code — is the logical next step. And the multi-agent architecture matters. A single reviewer model can miss issues that specialized agents catch when they work in parallel and then verify each other’s findings. The aggregation step, which consolidates, deduplicates, and ranks by severity, turns raw findings into actionable information without alert fatigue.
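The aggregation step described above (consolidate, deduplicate, rank, and avoid alert fatigue) can be sketched as a simple reduction over raw findings. This is a hedged illustration: the severity labels, tuple shape, and `max_surfaced` cutoff are assumptions for the example, not Code Review's actual data model.

```python
# Lower number = more severe, mirroring the severity ranking described above.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def aggregate(findings, max_surfaced=10):
    """findings: (file, line, severity, message) tuples pooled from all agents."""
    # Consolidate: keep only the most severe finding per code location.
    best = {}
    for file, line, sev, msg in findings:
        key = (file, line)
        if key not in best or SEVERITY_ORDER[sev] < SEVERITY_ORDER[best[key][0]]:
            best[key] = (sev, msg)
    # Rank by severity, and cap the surfaced list to limit alert fatigue.
    ranked = sorted(
        ((f, l, s, m) for (f, l), (s, m) in best.items()),
        key=lambda item: SEVERITY_ORDER[item[2]],
    )
    return ranked[:max_surfaced]
```

Two agents reporting different severities for the same line collapse to one entry, and the most severe issues surface first, which is what turns a pile of parallel findings into a single ranked summary comment.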

The distinction between Code Review and Anthropic’s existing Claude Code Security tool is worth noting. Claude Code Security runs continuous deep security sweeps across entire codebases. Code Review focuses on logical errors in individual pull requests. If Code Review detects a security issue, it will flag it, but it’s not as thorough as the dedicated security scanner. The two are complementary — security scanning of the codebase and logic review for each PR.

For teams with strict change-control requirements, Code Review can be configured on a per-repository basis. Admins enable it, select which repos to cover, and set spending limits. Once enabled, reviews run automatically on new PRs with no developer configuration needed. That administrative control matters for enterprises managing costs and compliance across dozens of repositories.

The competitive picture is getting crowded. GitHub Copilot’s agentic code review has hit 60 million reviews and accounts for one in five code reviews on the platform. CodeRabbit offers unlimited AI review at a flat monthly rate. Anthropic is positioning Code Review as the depth-first option — more expensive, more thorough, targeted at teams where the cost of a missed production bug exceeds the cost of the review.

Code Review is available now in research preview. Enable it in Claude Code settings, install the GitHub App, and select repositories.
