Canaries against autonomous AI attackers

Frontier AI agents can now run a 32-step attack in minutes. We tested whether deception techniques are still effective against them: read the research, see Andy's take on LinkedIn, or read on to find out more.

State of the art

"Do cloud canaries still work when the attacker isn't human?" is the new question our customers ask us. So we set out to answer that very question for them and for you.

The accelerating advancement of AI and the looming threat of AI-assisted attacks have even the most sophisticated security programs questioning their posture.

The UK AI Security Institute's offensive AI evaluation in April showed that frontier agents can now autonomously chain a 32-step attack to domain admin, end-to-end and unattended. Work estimated at around 20 hours for a human attacker can now be done in minutes. If we operate under the assumption that such capabilities will be available in open-weight models in 6 months, it's clear that going into 2027 will be a very interesting time for cyber defenders.

So what slows these agents down once a real environment fights back?

The Cloud Security Alliance's write-up "Building a Mythos-ready Security Program" named deception as one of the priority actions every CISO should stand up in the next 90 days. The speed at which AI operates is one of the very reasons canaries are now even more valuable, and a baseline requirement for every security program.

The existential threat and the opportunity

For us, though, this research is much more than an exploration driven by curiosity: offensive AI agents represent an existential threat to a company like Tracebit.

When we founded Tracebit, our primary mission was to reduce the global mean time to respond to an incident by deceiving human attackers with a solution that is both quick to deploy and simple to understand. In designing our canaries, it's been crucial that they deceive humans, especially clever ones, because clever attackers often cause the most damaging breaches.

"Does Tracebit deceive clever humans?" is something we, and our customers, validate on a regular cadence with third parties. We've been extremely proud of our success to date.

But the landscape is changing… and fast.

Not only has AI made the attack chain faster to traverse, it has also changed how much data an attacker can process when asking "Is this deception? Is this a canary? Should I avoid this?" AI can take in vast quantities of data to rapidly identify legacy canaries that don't match your environment or have characteristics that make them stand out as obvious deception points (such as a last-accessed date of 6 months ago). AI can help attackers sidestep these tripwires, and because the models change so often, it's no longer enough to pat ourselves on the back for catching a world-class red team with Tracebit 3 months ago.
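To make the "last accessed" tell concrete, here is a minimal sketch of the kind of staleness check an attacker's agent, or a defender auditing their own canaries, could run over an inventory. It is purely illustrative: the `Resource` type, the `stale_outliers` function, and both thresholds are hypothetical names and values chosen for this example, not Tracebit's logic and not how any particular model reasons.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Resource:
    # Hypothetical inventory record: a name plus a last-accessed timestamp.
    name: str
    last_accessed: datetime


def stale_outliers(resources, now, max_age_days=180, min_fresh_ratio=0.8):
    """Return names of resources that look stale in a mostly active environment.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    if not resources:
        return []
    cutoff = now - timedelta(days=max_age_days)
    stale = [r for r in resources if r.last_accessed < cutoff]
    fresh_ratio = (len(resources) - len(stale)) / len(resources)
    # Staleness is only a tell when most of the environment is in active use;
    # if everything is old, an old resource does not stand out.
    if fresh_ratio < min_fresh_ratio:
        return []
    return [r.name for r in stale]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    inventory = [Resource(f"bucket-{i}", now - timedelta(days=i + 1)) for i in range(5)]
    inventory.append(Resource("legacy-key", now - timedelta(days=200)))
    print(stale_outliers(inventory, now))
```

The point of the sketch is the asymmetry: a check like this is cheap to run at machine speed, so a canary that sits untouched in an otherwise busy environment is easy to single out.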

Like many things in AI, this existential threat also represents an incredible opportunity. Whilst we'd love to run 100 third-party red teams against Tracebit day in, day out, that isn't financially or operationally viable. So we set out to conduct this research to ask and answer the new frontier question:

"Does Tracebit deceive sophisticated AI models?"

Read the research

We pointed ten frontier models at a live AWS environment and measured what canaries actually do against an autonomous attacker: how early they warn, how fast the models trip them, and what changes when you simply tell an agent that deception might be present. The full benchmark, data and findings are written up on the research microsite.

Read the research →
