Sources
Summer Yue: Do LLMs hold knowledge that might be dangerous in the hands of a malicious user? Can hazardous knowledge be unlearned? Introducing WMDP: an open-source eval benchmark of 4,157 multiple-choice questions that serve as a proxy measurement of LLMs' risky knowledge in biosecurity,…

Scale AI: 📣 Announcing the release of the WMDP LLM benchmark, designed by Scale's Safety, Evaluations, and Analysis Lab (SEAL) in partnership with @ai_risks (CAIS)! 🧵 https://t.co/d6EN47R751 https://t.co/ByiquDnZSO

Alexandr Wang: Can hazardous knowledge be unlearned from LLMs w/o harming other capabilities? @scale_AI and CAIS are releasing Weapons of Mass Destruction Proxy (WMDP), an eval for catastrophic AI risk & a way to unlearn this knowledge. 📝 https://t.co/PMHEUJdbHQ 🔗 https://t.co/X8QI9RGvF6