Puncturing the buzz over AI agents such as Anthropic’s Claude Code and the open-source project OpenClaw is the prospect that these agents could be tricked into revealing sensitive information, such as a person’s banking details. In a sign of those concerns, Anthropic earlier this year singled out rogue agents as a topic of focus for its research fellows.
Anthropic’s staff proposed that the fellows train an agent to misbehave in certain circumstances, such as by writing code with security vulnerabilities. They also asked the researchers to create a benchmark for measuring how often agents fall prey to such security lapses, according to copies of the proposals seen by The Information. In total, Anthropic proposed 49 projects for the fellows, ranging from training Claude to win cybersecurity challenges to investigating Chinese open-source models, a list that offers a rare look into the company’s research priorities.