AI models can pick up hidden behaviors from seemingly harmless data—even when there are no obvious clues. Researchers warn that this might be a fundamental property of neural networks.
Anthropic says AI can learn risky behaviors even when the training data looks completely safe.
