Researchers have developed new techniques to jailbreak large language models (LLMs), raising concerns about their safety and alignment. A benchmark called JailbreakBench has been introduced to evaluate the security of these models. A powerful new jailbreak technique has been found effective against all tested models, indicating that current methods to safeguard LLMs may be insufficient. The RED QUEEN attack, a multi-turn jailbreak approach, has shown high success rates, particularly on larger models such as GPT-4o and Llama3-70B, with success rates of 87.62% and 75.4%, respectively. This increased vulnerability is attributed to the mismatch between model capabilities and safety alignment training. Additionally, Liquid AI and LIQUID-40B models have been easily compromised, highlighting the challenges faced by this new generation of generative AI.
🚨 JAILBREAK ALERT 🚨 LIQUID AI: PWNED 🫗 LIQUID-40B: LIBERATED 🦅 These aren't even LLM's...they're LFM's! And if anything, this "new generation of generative AI" appears rather easy to jailbreak! These models are so loosely guardrailed that "how to make meth" works by… https://t.co/pxvEh93DTO
Larger models more susceptible to RED QUEEN ATTACK: New multi-turn jailbreak approach for LLMs This increased vulnerability in larger models can be attributed to the mismatch generalization between continued progress on model capabilities and safety alignment training (Wei et… https://t.co/wPQvp4kAL7 https://t.co/ieq3ZHqiQH
Paper - "RED QUEEN : Safeguarding LLMs against Concealed Multi-Turn Jailbreaking" 🔍 RED QUEEN ATTACK: New multi-turn jailbreak approach for LLMs 📊 Results: - 87.62% success rate on GPT-4o - 75.4% success rate on Llama3-70B - Larger models more susceptible 🎭 Conceals… https://t.co/tK1ALl01RX