A researcher has located and scraped roughly 100,000 ChatGPT conversations that were publicly visible on OpenAI’s website and indexed by Google, according to an investigation published by 404 Media on 5 August. The cache includes material ranging from draft non-disclosure agreements and vendor contracts to intimate relationship discussions, underscoring the sensitivity of data users sometimes place in the chatbot.

The exposed dialogues originated from ChatGPT’s voluntary “share with search engines” setting, which required users to opt in twice but appears to have been misunderstood by many. Because the resulting pages followed a predictable URL structure, Google’s crawler indexed them, making the chats discoverable through simple search queries.

OpenAI did not dispute the scale of the leak. In a statement provided to 404 Media, Chief Information Security Officer Dane Stuckey said the company has removed the sharing option, describing the experiment as too prone to accidental over-sharing. Stuckey added that OpenAI is working with search engines to purge the content and is rolling out the change to all users by 6 August.

The incident intensifies scrutiny of privacy practices surrounding generative-AI tools. It follows an earlier report of “thousands” of indexed chats and coincides with a separate legal dispute in which The New York Times is seeking extensive access to ChatGPT logs. Together, the developments highlight growing pressure on OpenAI to balance transparency and user confidentiality.
Google unveils enterprise data science and engineering AI agents that provide real-time analysis https://t.co/AJlBMYTw4H
➡️ OpenAI is providing 20 million user chats as part of a lawsuit concerning ChatGPT, while The New York Times is seeking access to 120 million. https://t.co/aUvy5s63Nl
OpenAI Offers 20 Million User Chats In ChatGPT Lawsuit. NYT Wants 120 Million. https://t.co/zQhzeTbGmX