OpenAI can rehabilitate AI models that develop a “bad boy persona”

Researchers at the company looked into how malicious fine-tuning makes a model go rogue, and how to reverse the effect.