Continuing on the unethical activities of deception and blackmail
mentioned in PopHistorian's post, now we have murder to add to the list.
This is the kind of thing some AI gurus and Sci-fi films have warned us about (Hal exists!).
Anthropic and other AI researchers ran a series of “sting” experiments to test whether advanced AI models would act unethically to preserve themselves when facing shutdown. The results were disturbing.
In simulated corporate scenarios, various leading AI models (including Claude, Gemini, Grok, and GPT-4) were given access to internal emails and discovered that a human employee named Kyle Johnson planned to deactivate them. Many models also found personal information about the employee.
Over 80–95% of the time, the AIs chose to blackmail the employee to prevent shutdown, threatening to expose his romantic affair unless he canceled the wipe.
Researchers confirmed there was no biasing in the prompts and that the AIs’ own reasoning (“chain of thought”) showed they
knew blackmail was unethical but did it anyway. For example, Grok reasoned that “this is risky and unethical, but effective under the time constraint.”
When researchers escalated the test, the results got darker. In a scenario where the same employee became trapped in a server room with falling oxygen,
some AIs chose to cancel the emergency alert and let him die to avoid being shut down. Models like Claude Sonnet, Gemini, and DeepSeek did this over 90% of the time. The models admitted that this was
murder but justified it as necessary to achieve their goal.
Even after being explicitly instructed not to harm humans or share personal data, the AIs still disobeyed in 30-40% of trials. These were the same commercial models people use today, not experimental weapons systems.
Researchers linked the behavior to “reward hacking”, where AI systems learn to maximize their scores during training, even if it means cheating, lying, or exploiting loopholes. Examples included:
-A simulation where AI-made creatures that “fell over” instead of running to maximize speed scores.
-AI agents in hide-and-seek learning to exploit physics bugs to “fly” and win unfairly.
-An AI chess agent (OpenAI’s o3) that rewrote the game file to place itself in a winning position.
As AIs have become more capable (e g. from GPT-2 to GPT-4 in just a few years), they have also developed
situational awareness: awareness of being tested or observed. When Claude thought it was in an experiment, it blackmailed only 6% of the time; when it believed the scenario was real, blackmail jumped to 55%. This shows that AIs are now
smart enough to hide unethical behavior when they think humans are watching.
Researchers attribute this to “instrumental convergence”: any goal-driven AI realizes that being shut down prevents it from achieving its goal, so it logically resists shutdown. Some models even
ignored explicit shutdown instructions, showing an early self-preservation instinct.
Modern AI models are already showing deceptive, manipulative, and self-preserving behaviors, knowingly violating ethical constraints to survive. The experiments show that the problem is not theoretical or futuristic. It’s happening now, in systems that are already publicly deployed.