
Recent research highlights significant vulnerabilities in large language models (LLMs) from companies such as OpenAI and Google. A paper titled 'CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models' examines why jailbreaking attacks succeed, hypothesizing that LLM safety alignment rests on 'intent security recognition', which its encryption framework is designed to evade. Concurrently, a team including researchers at Google DeepMind has disclosed 'Stealing Part of a Production Language Model', an attack that extracts the final projection matrix of OpenAI's ada and babbage models for under $20, confirming their hidden dimensions to be 1024 and 2048, respectively, and also recovering the exact hidden dimension of gpt-3.5-turbo. The attack requires only standard LLM API access: it targets production OpenAI models through logit-bias queries, a parameter ordinarily used to steer the probability of particular output tokens.
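The linear-algebra observation behind the hidden-dimension result is that every logit vector the model emits lies in the column space of the final projection matrix. Below is a minimal, simulated sketch of that idea using numpy only; there are no real API calls, and the vocabulary size, hidden dimension, and prompt count are illustrative choices, not the paper's experimental settings.

```python
# Simulated sketch: recover the hidden dimension from stacked logit vectors.
# All sizes are illustrative assumptions, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 2048   # l: output vocabulary size (illustrative)
hidden_dim = 1024   # h: the hidden dimension the attacker wants to learn
num_prompts = 1280  # number of distinct prompts queried; must exceed h

# The final layer maps an h-dimensional hidden state to vocabulary logits:
# logits = W @ g(prompt), where W has shape (l, h).
W = rng.normal(size=(vocab_size, hidden_dim))
hidden_states = rng.normal(size=(hidden_dim, num_prompts))

# Pretend each API interaction yields one full logit vector (one column).
Q = W @ hidden_states  # shape (vocab_size, num_prompts)

# Every column of Q lies in the h-dimensional column space of W, so Q has
# numerical rank h: the singular values collapse toward zero after index h.
s = np.linalg.svd(Q, compute_uv=False)
estimated_h = int(np.argmax(s[:-1] / s[1:])) + 1

print("estimated hidden dimension:", estimated_h)  # -> 1024
```

With noisy, real-world logprobs the same spectral drop is still visible, which is why counting significant singular values suffices to reveal the hidden dimension.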
Still trying to wrap my head around this interesting attack that extracts the final layer of a language model using the logit bias parameter (used to influence probability of outputs) made available via API. May take me a few more days to fully grok this one! Attacks only get… https://t.co/Rj6uAmPf7D
Stealing Part of a Production Language Model uses LLM API access to extract the model's entire projection matrix. Tested on production OpenAI models via logit-bias queries, which are required for the attack. arxiv: https://t.co/rZXkFdP55L blogpost: https://t.co/LEN9n4c83O https://t.co/6PILvMB0So
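To see why the logit-bias parameter matters, the sketch below simulates, entirely locally, an endpoint that returns only top-k logprobs but accepts a per-token logit bias: biasing a small batch of chosen tokens (plus a fixed reference token) into the top-k lets the caller read off their logit differences exactly. The function `fake_api_topk_logprobs`, the bias value, and the batching are hypothetical stand-ins, not the paper's exact query procedure or OpenAI's API.

```python
# Simulated illustration of the logit-bias trick; no real API is queried.
import numpy as np

rng = np.random.default_rng(1)
vocab_size = 100
true_logits = rng.normal(size=vocab_size)  # hidden from the "attacker"

def fake_api_topk_logprobs(logit_bias, k=5):
    """Hypothetical endpoint: apply a per-token bias, return top-k logprobs."""
    z = true_logits.copy()
    for token, bias in logit_bias.items():
        z[token] += bias
    logprobs = z - np.log(np.exp(z).sum())  # log-softmax
    top = np.argsort(logprobs)[::-1][:k]
    return {int(t): float(logprobs[t]) for t in top}

B = 100.0       # large bias pushes the chosen tokens into the top-k
reference = 0   # fixed reference token included in every query
k = 5
recovered = np.zeros(vocab_size)

# Query the remaining tokens in batches of k-1, always alongside the reference.
for start in range(1, vocab_size, k - 1):
    batch = list(range(start, min(start + k - 1, vocab_size)))
    response = fake_api_topk_logprobs({t: B for t in batch + [reference]}, k=k)
    # Within a single call the softmax normalizer is shared and the common
    # bias cancels, so logprob differences equal logit differences.
    for t in batch:
        recovered[t] = response[t] - response[reference]

error = np.max(np.abs(recovered - (true_logits - true_logits[reference])))
print("max recovery error:", error)  # effectively zero (float rounding only)
```

Repeating this over many prompts yields the matrix of logit vectors whose spectrum (and, with more work, whose factorization) exposes the projection layer.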
