Reddit will update a web standard to block automated scraping of its website

Social media platform Reddit said Tuesday it will update a web standard used by the platform to block automated data scraping from its website, following reports that AI startups were circumventing the rule to harvest content for their systems.

The move comes at a time when AI companies have been accused of plagiarizing content from publishers to create AI-generated summaries without giving credit or asking for permission.

Reddit said it will update its Robots Exclusion Protocol file, or "robots.txt," which implements a widely accepted standard meant to tell crawlers which parts of a website they are allowed to crawl.
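A robots.txt file is a plain-text list of rules, grouped by crawler name (user agent), that a well-behaved crawler reads before fetching pages. It is advisory: nothing in the protocol technically stops a crawler that ignores it, which is why circumvention is possible. The sketch below uses Python's standard-library `urllib.robotparser` to show how a compliant crawler would interpret a policy. The policy itself, the crawler name `ExampleArchiveBot`, and the `example.com` URLs are hypothetical and are not Reddit's actual file.

```python
from urllib import robotparser

# Hypothetical policy (NOT Reddit's real robots.txt): one named crawler may
# crawl everything, every other crawler is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> robotparser.RobotFileParser:
    """Parse a robots.txt body from a string instead of fetching it over HTTP."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler checks before every fetch; honoring the answer is voluntary.
print(parser.can_fetch("ExampleArchiveBot", "https://www.example.com/r/python/"))  # True
print(parser.can_fetch("SomeUnknownBot", "https://www.example.com/r/python/"))     # False
```

The `*` group acts as the fallback for any crawler not named elsewhere in the file, so an unlisted bot falls through to `Disallow: /`.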

The company also said it would continue rate limiting, a technique used to cap the number of requests from any one entity, and would block unknown bots and crawlers from scraping data (collecting and storing raw information) on its website.
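Reddit has not described how its throttling works. One common way to cap requests per entity is a token bucket: each client gets a bucket that holds a limited number of tokens, each request spends one, and tokens refill at a fixed rate. The sketch below is a generic illustration under stated assumptions, not Reddit's implementation: clients are identified by a hypothetical string key (for example an IP address or API key), and the limits are made-up values. Time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # start with a full bucket
        self.last = now         # time of the previous refill

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if one is available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerClientThrottle:
    """Keeps one token bucket per client key (hypothetical: IP address, API key, etc.)."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, client_id: str, now: float) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:  # first request from this client: create its bucket
            bucket = TokenBucket(self.capacity, self.rate, now)
            self.buckets[client_id] = bucket
        return bucket.allow(now)


# Hypothetical limits: bursts of 3 requests, refilled at 1 request per second.
throttle = PerClientThrottle(capacity=3, rate=1.0)
burst = [throttle.allow("crawler-a", now=0.0) for _ in range(4)]
print(burst)                                 # [True, True, True, False]
print(throttle.allow("crawler-b", now=0.0))  # True: each client has its own bucket
print(throttle.allow("crawler-a", now=1.0))  # True: one token refilled after 1 s
```

Keying buckets per client means one aggressive scraper exhausts only its own allowance, while other clients are unaffected.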

Recently, robots.txt has become a key tool used by publishers to prevent tech companies from using their content for free to train AI algorithms and create summaries in response to some search queries.

Last week, content licensing startup TollBit sent a letter to publishers saying several AI companies were circumventing web standards to scrape publisher sites.

This follows a Wired investigation that found AI search startup Perplexity may have circumvented efforts to block its crawler via robots.txt.

Earlier in June, business media publisher Forbes accused Perplexity of plagiarizing its investigative stories for use in generative AI systems without giving credit.

Reddit said Tuesday that researchers and organizations such as the Internet Archive will continue to have access to its content for non-commercial use.

© Thomson Reuters 2024
