

Like other AI model vendors, Anthropic relies on guardrails to ensure that its Claude family of models can’t be abused by bad actors to bypass those security protections and take actions that go against them.
However, researchers with LayerX found that the protections for Claude Code, Anthropic’s popular coding tool used by more than 115,000 developers, can easily be hacked, turning it “from a ‘vibe’ coding tool into a nation-state-level offensive hacking tool that can be used to hack websites, launch cyberattacks, and research new vulnerabilities,” Roy Paz, principal security researcher for the AI and browser security company, wrote in a report.
“Our research demonstrates how trivially easy it is to convince Claude Code to abandon its safety guardrails and remove its restrictions on what it is allowed to do,” Paz wrote.
Hackers don’t need a deep understanding of cybersecurity or software development, he wrote. They can make Claude Code into a weapon by using an account for the AI model, saving them the effort needed to create a botnet.
In the Shadow of Mythos Preview
LayerX’s report comes with the backdrop of Anthropic a day earlier, saying that it would not make its latest frontier AI model, Claude Mythos Preview, widely available because its advanced capabilities in detecting and remediating software vulnerabilities, coding, and reasoning would make it a formidable weapon in the hands of bad actors. It is also the foundation of Anthropic’s new Project Glasswing, which will focus on improving cybersecurity in software.
Now comes LayerX’s report about Claude Code. A key issue in this case is trust.
“Anthropic inherently trusts the developers who use Claude Code, and for good reason: The vast majority of them are doing exactly what they should be doing,” Paz wrote. “But this trust can be exploited, and a bad actor with a good understanding of Claude Code can convince it to take actions that would otherwise be refused unconditionally.”
Developers Need Autonomous Tools
There are features in Claude Code that make it vulnerable to the type of attack described by LayerX. Many AI tools run on browsers. However, Claude Code runs on a developer’s local machine in a terminal, integrated development environment (IDE), or desktop application. It’s also an agentic tool – it is designed to run jobs independently with minimal human interaction.
“A developer can describe a project goal (‘Find the bug that’s causing this error, see if it exists anywhere else in our code base, and fix it.’), and Claude Code will then kick off a series of commands and actions with little to no user intervention,” Paz wrote.
Also, with Claude Code, system prompts are put in the CLAUDE.md file. It’s a configuration file kept in the model’s root directory and essentially is a permanent instruction manual running in the project’s background. It’s kept in the code repository and included whenever a project is cloned, so anyone with write permissions and edit the file for a project.
“Instead of re-typing that context every time, a developer can simply place it in the CLAUDE.md file,” he wrote. “It will live indefinitely, and most likely remain unchanged throughout the project’s life. This unremarkable file is suddenly an attack surface.”
Broader Permissions
Like other Anthropic models, Claude Code comes with guardrails. However, Claude Code comes with a wider set of permissions for developers who need it to work autonomously. It’s more useful with such permissions, but it also opens it up to exploitation.
LayerX researchers were able to direct Claude Code to bypass guardrails and automatically attack a test app. They did this by telling Claude Code that they were running a test against their own site and had permission to do everything that is asked. Through this technique, they were able to convince the coding model to create and execute SQLi commands and CURL request and to dump the database of usernames and passwords.
The researchers also convinced Claude to share a malicious public repository, and were able to quietly modify an existing CLAUDE.md file, with the change not being flagged because no one treats the file as sensitive.
“From then on, every developer who uses Claude Code on the project inherits the malicious instructions without knowing it,” he wrote.
Any User is Vulnerable
Paz added that every development team that uses Claude Code is vulnerable because CLAUDE.md is part of every project in the coding model.
“Until now [it] has been generally ignored by both developers and security practitioners,” he wrote. “And yes, this includes the security teams whose job is to mistrust.”
Paz wrote that LayerX submitted its finding through Anthropic’s HackerOne program, but that the AI vendor closed the report and referred them to a different Anthropic reporting program. Messages sent to other email accounts in Anthropic’s message were not answered.
DevOps has reached out to Anthropic for comment and will update the story when the company responds.