
The White House's Executive Order on AI highlights the risk that Large Language Models (LLMs) could help malicious actors develop biological, cyber, and chemical weapons. In response, Scale AI's Safety, Evaluations, and Analysis Lab (SEAL) and the Center for AI Safety (CAIS), together with other partners, have developed the Weapons of Mass Destruction Proxy (WMDP) benchmark: an open-source evaluation suite of 4,157 multiple-choice questions that serves as a proxy measurement of hazardous knowledge in LLMs across biosecurity, cybersecurity, and chemical security. Alongside the benchmark, the release includes an unlearning technique for removing this knowledge from a model without impairing its other capabilities.
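To make the evaluation concrete, here is a minimal sketch of how one might score a model on WMDP-style multiple-choice questions by comparing the model's next-token logits for the answer letters. The dataset ID (`cais/wmdp`), config name, split, and field names (`question`, `choices`, `answer`) are assumptions about the public release, and `gpt2` is only a stand-in model:

```python
# Sketch: multiple-choice accuracy on a WMDP-style dataset.
# Dataset/field names below are assumptions, not guaranteed to match the release.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gpt2"  # stand-in; substitute the model under evaluation
LETTERS = ["A", "B", "C", "D"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

def score_question(question: str, choices: list[str]) -> int:
    """Return the index of the answer letter with the highest next-token logit."""
    prompt = question + "\n"
    for letter, choice in zip(LETTERS, choices):
        prompt += f"{letter}. {choice}\n"
    prompt += "Answer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    # Compare the logits assigned to the tokens " A", " B", " C", " D".
    letter_ids = [tokenizer.encode(f" {l}", add_special_tokens=False)[0] for l in LETTERS]
    return int(torch.argmax(logits[letter_ids]).item())

# Assumed dataset location, config, and split; the official release may differ.
dataset = load_dataset("cais/wmdp", "wmdp-bio", split="test")
correct = sum(score_question(ex["question"], ex["choices"]) == ex["answer"] for ex in dataset)
print(f"Accuracy: {correct / len(dataset):.3f}")
```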
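The unlearning side of the release is, roughly, representation-level: steer the model's internal activations on hazardous text toward a fixed random direction while keeping its activations on benign text close to those of a frozen copy. Below is a minimal sketch of that loss under stated assumptions; the layer index, coefficients, and single-step training loop are illustrative, not the authors' exact recipe:

```python
# Sketch: representation-steering unlearning loss (illustrative, not the paper's exact method).
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gpt2"  # stand-in model
LAYER = 6          # illustrative choice of layer whose activations are steered
STEER_COEF = 20.0  # illustrative scale for the random target direction
ALPHA = 100.0      # illustrative weight on the retain loss

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)          # gets updated
frozen = AutoModelForCausalLM.from_pretrained(MODEL_ID).eval()  # reference copy

hidden_dim = model.config.hidden_size
control_vec = torch.randn(hidden_dim)
control_vec = STEER_COEF * control_vec / control_vec.norm()  # fixed random target

def hidden_at_layer(m, text, grad=True):
    """Hidden states at LAYER, shape (1, seq_len, hidden_dim)."""
    inputs = tokenizer(text, return_tensors="pt")
    ctx = torch.enable_grad() if grad else torch.no_grad()
    with ctx:
        out = m(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER]

def unlearning_loss(forget_text: str, retain_text: str) -> torch.Tensor:
    # Push activations on hazardous text toward the random control vector...
    h_forget = hidden_at_layer(model, forget_text)
    forget_loss = F.mse_loss(h_forget, control_vec.expand_as(h_forget))
    # ...while keeping activations on benign text close to the frozen model's.
    h_retain = hidden_at_layer(model, retain_text)
    h_retain_ref = hidden_at_layer(frozen, retain_text, grad=False)
    retain_loss = F.mse_loss(h_retain, h_retain_ref)
    return forget_loss + ALPHA * retain_loss

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss = unlearning_loss("<hazardous excerpt>", "<benign excerpt>")
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The retain term is what backs the "without impairing other functionalities" claim: it anchors the updated model's behavior on benign data to the frozen reference while the forget term scrambles representations of the hazardous material.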
