
OpenAI has introduced a new safety initiative called the Instruction Hierarchy, aimed at enhancing the robustness of large language models (LLMs) against prompt injections and other deceptive inputs that could lead to unsafe actions. This initiative, part of OpenAI's latest research, seeks to address vulnerabilities in LLMs that allow adversaries to manipulate models by overwriting original instructions with malicious prompts. The research has been described as the most detailed evaluation of prompt injection issues by OpenAI to date.



Prompt injections refer to a broad category of attacks on #LLMs and multimodal models—but what specific techniques are being used and how can we mitigate harmful effects? In our recent blog post by @teresa_datta, you’ll find: 👉 An explanation of direct vs. indirect prompt… https://t.co/UmX3HTEC95
Safety research on robustness to LLM attacks such as prompt injection, by @lilianweng and team: https://t.co/ZjiUaGmKlX
Safety research on robustness to prompt injections, by @lilianweng and team: https://t.co/ZjiUaGmKlX