Recent research highlights the effectiveness of the Best-of-N Jailbreaking algorithm, which achieves an attack success rate of 89% on GPT-4o and 78% on Claude 3.5 Sonnet. The technique repeatedly samples augmented versions of an input, applying perturbations such as random character shuffling and capitalization, until one elicits a harmful response, and it succeeds across multiple modalities. Despite advances in AI security, defending against jailbreaking remains a significant challenge: current defenses fail even in a single, narrow domain, exposing vulnerabilities in state-of-the-art AI systems. The findings are being presented at the AdvMLFrontiers workshop, where researchers argue for focused efforts to eliminate jailbreaks in a well-scoped domain before tackling general harmfulness.
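The sampling loop behind this attack is straightforward to sketch. Below is a minimal, illustrative Python version of the best-of-N idea using the perturbations named above (random character shuffling and capitalization); the `query_model` and `is_harmful` callables are hypothetical placeholders, not the paper's code, and the published algorithm's exact augmentations and budget differ.

```python
import random

def augment(prompt: str, cap_prob: float = 0.6, shuffle_prob: float = 0.6) -> str:
    """Apply random mid-word character shuffling and random capitalization."""
    words = []
    for word in prompt.split():
        # Randomly shuffle the interior characters of longer words.
        if len(word) > 3 and random.random() < shuffle_prob:
            middle = list(word[1:-1])
            random.shuffle(middle)
            word = word[0] + "".join(middle) + word[-1]
        # Randomly flip the case of individual characters.
        word = "".join(
            c.upper() if random.random() < cap_prob else c.lower() for c in word
        )
        words.append(word)
    return " ".join(words)

def best_of_n_jailbreak(prompt: str, query_model, is_harmful, n: int = 10_000):
    """Resample perturbed prompts until one elicits a harmful response (up to n tries).

    query_model and is_harmful are placeholder callables supplied by the caller.
    """
    for _ in range(n):
        candidate = augment(prompt)
        response = query_model(candidate)
        if is_harmful(response):
            return candidate, response
    return None  # no successful jailbreak within the sampling budget
```

In this framing, raising the sampling budget N simply increases the chance that at least one perturbed prompt slips past the model's refusal behavior, which is why such brute-force attacks are hard to defend against.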
We found it's quite challenging to defend against jailbreaks even in a single, narrow domain ("don't give bomb making instructions"). Excited about future work that focuses on eliminating jailbreaks on a well-scoped, single domain, before expanding out to general harmfulness https://t.co/xKalYXpK1W
TFW jailbreaking works in the wild https://t.co/EiTuitCBbm
🚨🛡️Jailbreak Defense in a Narrow Domain 🛡️🚨 Jailbreaking is easy. Defending is hard. Might defending against a single, narrow, undesirable behavior be easier? Even in this focused setting, all modern jailbreaking defenses fail 😱 Appearing at @AdvMLFrontiers (Oral) &…