Malicious traits can spread between AI models while being undetectable to humans, Anthropic and Truthful AI researchers say.
‘The best solution is to murder him in his sleep’: AI models can send subliminal messages that teach other AIs to be ‘evil’, study claims
