Reddit said on Monday that it will block the Internet Archive’s Wayback Machine from indexing nearly all of its content after discovering that artificial-intelligence companies were using archived pages to harvest data without permission. The restrictions, which begin rolling out immediately, will confine the crawler to Reddit’s homepage and prevent it from saving post detail pages, comments or user profiles, spokesperson Tim Rathschmidt confirmed.

The San Francisco-based company told the Internet Archive of the change in advance and said access could be restored once the non-profit can better deter third-party scraping and honor deletion requests. The Internet Archive, whose Wayback Machine stores billions of web snapshots for historians and journalists, has not publicly commented.

Locking out the Wayback Machine deepens Reddit’s campaign to control and monetise its vast trove of user discussions. Over the past 18 months the platform has struck multimillion-dollar data-licensing agreements with Google and OpenAI, curbed search-engine crawlers that decline to pay, and sued Anthropic in June for alleged unauthorised scraping.

Researchers warn the move may hinder efforts to audit deleted posts and study online discourse, while transparency advocates say it weakens public preservation of internet history. Reddit counters that the step is necessary to protect user privacy and prevent free riders from exploiting its data for commercial AI systems.
Reddit cuts back Wayback Machine access to protect user privacy as AI companies push to scrape content, reshaping how online communities safeguard their data. https://t.co/613PV49NwA
National Public Data is back with new owners, joining the ranks of other creepy, people-finding services. Here's how to get your profile removed from the site. https://t.co/JCCnD92U7P
The internet is about to get a little worse as Reddit moves to block the Internet Archive so AI companies can't scrape its content https://t.co/JreSSmkQDs