Anthropic finds 250 poisoned documents are enough to backdoor large language models

Anthropic, working with the UK’s AI Security Institute and the Alan Turing Institute, has discovered that as few as 250 poisoned documents are enough to insert a backdoor into large language models – regardless of model size.
