Researchers astonished by tool’s apparent success at revealing AI’s hidden motives

Anthropic trains AI to hide motives, but different “personas” betray their secrets.