
Recent research has demonstrated attacks that steal parts of production language models, such as extracting the final embedding projection layer through ordinary API access by exploiting the logit bias parameter. Researchers from Google DeepMind and collaborating institutions introduced a model-stealing attack of this kind against language models from OpenAI and Google, revealing vulnerabilities in these API-protected systems.
"As part of our responsible disclosure, OpenAI has asked that we do not publish this number" :( Otherwise, very interesting paper: "Stealing Part of a Production Language Model"! https://t.co/AygzxDllAb https://t.co/VQ8PNwe8w3
Research Summary: “Stealing Part of a Production Language Model” https://t.co/AADsuZvMGV
Still trying to wrap my head around this interesting attack that extracts the final layer of a language model using the logit bias parameter (used to influence probability of outputs) made available via API. May take me a few more days to fully grok this one! Attacks only get… https://t.co/Rj6uAmPf7D
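The core geometric idea behind the attack can be sketched in a few lines. This is a toy simulation under assumed parameters, not the paper's actual API procedure: the real attack reconstructs full logit vectors via the logit bias parameter, whereas here we compute them directly from a made-up projection matrix. The point is that every logit vector lies in a subspace whose dimension equals the model's hidden size, so SVD on a batch of logit vectors reveals that hidden dimension.

```python
import numpy as np

# Toy setup (assumed, for illustration): a final layer projecting a
# hidden state of size h to vocabulary logits of size v.
rng = np.random.default_rng(0)
h, v = 16, 200                  # hidden size (secret) and vocab size (public)
W = rng.normal(size=(v, h))     # secret final projection matrix

def query_logits(prompt_seed):
    # Stand-in for an API query: in the real attack, the full logit
    # vector is recovered via the logit-bias parameter; here we just
    # compute it from a random "hidden state" for this prompt.
    hidden = rng.normal(size=h)
    return W @ hidden

# Collect logit vectors for many prompts. They all lie in an
# h-dimensional subspace of R^v, so the singular values of the stacked
# matrix drop to ~0 after index h.
Q = np.stack([query_logits(i) for i in range(100)])   # shape (100, v)
s = np.linalg.svd(Q, compute_uv=False)
hidden_dim_estimate = int(np.sum(s > 1e-6 * s[0]))
print(hidden_dim_estimate)  # recovers h = 16
```

Recovering the projection matrix itself (up to a rotation) follows from the same SVD, which is part of what makes the disclosed hidden-size figure sensitive.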
