Reddit has announced it will block the Internet Archive's Wayback Machine from archiving most of its pages, including post detail pages, comments, and user profiles, leaving only the homepage available to the crawler. The decision follows Reddit's discovery that artificial intelligence companies have been scraping Reddit user data from the Wayback Machine in violation of Reddit's platform policies. Reddit spokesperson Tim Rathschmidt said that although the Internet Archive provides a valuable service to the open web, AI firms have been using it to bypass licensing fees and collect data without authorization. The move aims to protect user privacy and curb unauthorized data scraping by AI companies. In practice, the restriction prevents the Internet Archive from preserving the majority of Reddit's content, weakening the archival record of the site.
LLM chatbots trivial to weaponise for data theft, say boffins https://t.co/Fuv36FzK5O
Reddit, fed up with its content being used to train AI, has made a sad decision: to no longer appear in the "Internet archive" https://t.co/iD0yWw5VIt
Reddit is fed up with AI companies using its content without paying: it is blocking access for the organization that archives the history of the web https://t.co/FMv9dTKR9B