Anthropic is teaming up with nuclear specialists at the US Department of Energy (DOE) to ensure its AI models don’t unintentionally provide sensitive information about making nuclear weapons. The collaboration, which began in April and was revealed by Anthropic to Axios, is a significant first in AI security. Anthropic’s model, Claude 3 Sonnet, is being “red-teamed” by experts at the DOE’s National Nuclear Security Administration (NNSA), who are testing whether people could misuse it for dangerous nuclear-related purposes, according to the Axios report.
Red-teaming is a process where experts attempt to break or misuse a system to expose vulnerabilities. In this case, the specialists are evaluating if Claude’s responses could be exploited for developing nuclear weapons or accessing other harmful nuclear applications. The project will continue until February, and along the way, the NNSA will test the upgraded Claude 3.5 Sonnet, which debuted in June. Anthropic has also leaned on its partnership with Amazon Web Services to prepare Claude for handling these high-stakes, government-focused security tests. Given the nature of this work, Anthropic hasn’t disclosed any findings from the pilot program.
The company intends to share its results with scientific labs and other organisations, encouraging independent testing to keep models safe from misuse. Marina Favaro, Anthropic’s national security policy lead, emphasised that while US tech leads AI development, federal agencies possess the unique expertise needed for evaluating national security risks, highlighting the importance of these partnerships. Wendin Smith of the NNSA reinforced the urgency, saying AI is at the centre of critical national security conversations.
She explained that the agency is well-positioned to assess AI’s potential risks, especially concerning nuclear and radiological safety. These evaluations are crucial as AI’s potential misuse could be catastrophic. The collaboration follows President Biden’s recent national security memo, which called for AI safety evaluations in classified environments.
Major players like Anthropic and OpenAI had already committed to testing their models with the AI Safety Institute back in August, signalling an industry-wide awareness of these concerns, as per the Axios report. Interestingly, as AI developers race for government contracts, Anthropic isn’t the only one in the game. It has just partnered with Palantir and Amazon Web Services to offer Claude to US intelligence agencies.
OpenAI, meanwhile, has deals with entities like NASA and the Treasury Department. Scale AI is also making moves, having developed a defence-focused model built on Meta’s Llama. However, it’s unclear if these partnerships will hold steady through the looming political changes in Washington.
Elon Musk, now a key figure in the incoming administration, has unpredictable views on AI safety. Although he has pushed for stricter controls in the past, his new venture, xAI, adopts a more hands-off, free-speech-oriented philosophy. All eyes are on how these evolving dynamics could shape the future of AI governance and safety testing.