Cloudflare has publicly accused Perplexity AI, an artificial intelligence search company, of using stealth crawling techniques to scrape websites that explicitly prohibit such activity through no-crawl directives such as robots.txt rules and network-level blocks. According to Cloudflare, Perplexity uses undeclared user agents, rotates IP addresses, and repeatedly modifies its web-crawling bots to evade detection and circumvent website preferences. These tactics allegedly let Perplexity access restricted content despite being blocked through official channels. The accusations have sparked a dispute between the two companies. Perplexity defends its AI assistant by stating that it operates on real-time, user-driven requests rather than traditional web scraping. Perplexity also claims that Cloudflare's accusations rest on errors and misunderstandings, and it criticizes Cloudflare for not sharing its methodology and for misattributing crawling activity to Perplexity's AI assistant. The conflict highlights ongoing tensions in the AI industry over data scraping practices and web crawling ethics.
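To make the robots.txt part of the dispute concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URLs, the `is_allowed` helper, and the user-agent strings are illustrative assumptions, not taken from Cloudflare's or Perplexity's statements. It shows only that robots.txt rules are matched against the user agent a client declares, which is why an undeclared user agent would not be caught by a rule written for a named bot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler everywhere,
# and block everyone else only from /private/.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://example.com/article"
    # A client that declares the named token is covered by the first rule.
    print(is_allowed(ROBOTS_TXT, "PerplexityBot", article))
    # A client that presents a generic browser string only falls under
    # the wildcard rule, so the named-bot block does not apply to it.
    print(is_allowed(ROBOTS_TXT, "Mozilla/5.0 (X11; Linux x86_64)", article))
```

robots.txt is advisory: the check above only describes what a compliant client should do, and nothing in the file itself enforces it. That is why site operators pair it with network-level blocks.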
By David Uzondu - Mozilla recently rolled out a new on-device AI feature in Firefox, but it's already drawing complaints from users reporting high CPU usage and faster battery drain. #Mozilla #Firefox #AI https://t.co/L0JjyBgGCC
⚔️ Cloudflare accuses Perplexity of ignoring robots.txt and scraping blocked sites. Perplexity says it is a user-driven agent that fetches pages per query, not a bulk crawler. The fight is about who sets the rules for web access. robots.txt is the long standing do not crawl https://t.co/3RvmFNj2Tv
Cloudflare has since removed Perplexity's bots from its list of verified bots and implemented new AI-blocking techniques. https://t.co/zbX2dz4nn3