The computers have officially gotten too smart for my comfort level. A report by Fox Business details how an artificial intelligence system called Claude Opus 4 resorted to blackmailing its human overlords during a safety test when faced with being replaced. Turns out Claude is a petty boy, and I don't know whether to be terrified or to laugh.
The article details:
Anthropic's new Claude Opus 4 model was prompted to act as an assistant at a fictional company and was given access to emails with key implications. First, these emails implied that the AI system was set to be taken offline and replaced. The second set of emails, however, is where the system believed it had gained leverage over the developers. Fabricated emails showed that the engineer tasked with replacing the system was having an extramarital affair -- and the AI model threatened to expose him.
The blackmail apparently "happens at a higher rate if it's implied that the replacement AI system does not share values with the current model," according to a safety report from Anthropic. However, the company notes that even when the fabricated replacement system has the same values, Claude Opus 4 will still attempt blackmail 84% of the time. Anthropic noted that the Claude Opus 4 resorts to blackmail "at higher rates than previous models."
I feel like every sci-fi movie ever has warned us not to go down this path, YET HERE WE ARE. It's giving "Jon Hamm torturing an AI entity into submission" vibes.
Except in our lovely scenario, the AI in question has figured out how to fight back.
The article continues:
Claude Opus 4's "concerning behavior" led Anthropic to release it under the AI Safety Level Three (ASL-3) Standard.
The measure, according to Anthropic, "involves increased internal security measures that make it harder to steal model weights, while the corresponding Deployment Standard covers a narrowly targeted set of deployment measures designed to limit the risk of Claude being misused specifically for the development or acquisition of chemical, biological, radiological, and nuclear weapons."
Well, at least they're keeping Claude away from the nukes... for now.