Here Be Dragons

Only read this if you have not met your required daily intake of scary news…

Are you sitting down? I hope so, ’cause Anthropic just shared some pretty eye-popping findings. In research they conducted with the UK AI Security Institute and the Alan Turing Institute, they found that slipping in as few as 250 malicious documents into LLM training can create an exploitable backdoor within the model. This results in “poisoning” of the model, allowing hackers to potentially change its behavior to do things like share sensitive information, take destructive action, or simply stop working correctly.

So, this only works when training small models, right? Nope. They were able to use the same number of documents to poison models ranging in size from 600M to 13B parameters. They conclude that in this case, size does NOT matter.

OK, but the documents used must be uber huge and super complex, right? Again – nope. The documents were straightforward and small, with only a few thousand tokens each. The total tokens the documents contained were just 0.00016% of the total used to train the larger models.

Any good news? Kinda. The researchers believe that this type of approach will be challenging for hackers to exploit fully. Hackers are also limited because they can’t know the types or amount of “poisoned” tokens that a model was trained on, if any at all. Additionally, the researchers believe that model builders can add post-training defenses to mitigate the poisoning.

I help companies understand the capabilities of AI technology, from tried-and-true machine learning and predictive models to full-blown autonomous, agentic AI. There is tremendous promise with the technology, but also tremendous risk. Companies need to understand this and plan and budget accordingly, mitigate risks, and be vigilant with their usage. So what do I tell clients who want to explore AI? I tell them, “Here be dragons.”

If you would like to learn more, the Antropic blog on the topic provides additional information, including a link to the full research paper.