Anthropic explains how AI learns what it wasn’t taught

Anthropic has released one of the most unsettling findings I have seen so far: AI models can learn things they were never explicitly taught, even when trained on data that seems completely unrelated to the behavior in question. This phenomenon, which the researchers call “subliminal learning,” has sparked alarm in the alignment and safety community, not