Anthropic says that AI can learn risky behaviors even when the training data looks completely safe

AI models can pick up hidden behaviors from seemingly harmless training data, even when it contains no obvious clues. Researchers warn that this may be a fundamental property of neural networks.