
Recent research has shown that LLM agents, particularly #GPT-4, can autonomously hack websites across vulnerability classes like XSS, CSRF, SQL injection, and more. The study uses the OpenAI Assistants API, LangChain, and Playwright, with GPT-4 emerging as the most successful agent.
🤖 LLM Agents can Autonomously Hack Websites. Academic paper testing several LLMs across vulnerability classes like XSS, CSRF, SQL injection, and more → Uses OpenAI Assistants API, LangChain, and Playwright. GPT-4 wins. https://t.co/z1axioMI3X https://t.co/fgtHvCLJ3w
I wrote down some quick thoughts on that "LLM Agents can Autonomously Hack Websites" paper that has been going around. TL;DR: no data, lack of transparency in methodology, no baseline testing against traditional penetration testing tools. https://t.co/is1bGAeuGY
Reminds me of that @open_phil sponsored paper that argued we should ban powerful open source LLMs because they increase the risk of bio-terror, but forgot to include Google search as a control group. https://t.co/Z0vtTiWYyA
