Reddit to update web standard to prevent automated website scraping

Social media platform Reddit (RDDT.N) said on Tuesday that it will update a web standard the platform uses to block automated data scraping from its website, following concerns that AI firms were circumventing the standard to harvest content for their systems. The move comes amid accusations that artificial intelligence companies are using publisher content without attribution or consent to produce AI-generated summaries.

Reddit said it would update its robots.txt file, which implements the Robots Exclusion Protocol, a widely used standard that tells crawlers which parts of a website they may access. The company also said it will continue to use rate-limiting, a technique for capping the number of requests from a single source, and that it will block unknown bots and crawlers from scraping its website to collect and store raw data.
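To illustrate how the Robots Exclusion Protocol works, the sketch below uses Python's standard `urllib.robotparser` to parse a hypothetical robots.txt that disallows all crawling (Reddit's actual file may differ) and to check whether a crawler may fetch a given URL:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt directives (not Reddit's actual file).
# "User-agent: *" applies the rule to every crawler;
# "Disallow: /" asks them to stay off the entire site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)

# A well-behaved crawler consults robots.txt before fetching a page.
allowed = parser.can_fetch("ExampleBot", "https://example.com/r/popular")
print(allowed)  # False: the "Disallow: /" rule blocks every path
```

Note that robots.txt is purely advisory: nothing stops a crawler from ignoring it, which is why Reddit pairs the standard with enforcement measures such as rate-limiting and blocking unidentified bots.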

Robots.txt has become a crucial tool that publishers use to stop tech companies from taking their material for free to train AI algorithms and to generate summaries for certain search queries. Last week, content licensing startup TollBit told publishers in a letter that several AI firms were circumventing the web standard to scrape publisher websites. The letter followed a Wired investigation which found that AI search startup Perplexity likely bypassed efforts to block its web crawler via robots.txt.

Perplexity was accused earlier in June of stealing investigative reports and using them without attribution in generative AI systems. Reddit said on Tuesday that its content will remain accessible for non-commercial use by researchers and organizations such as the Internet Archive.

About us:

CIO News is the premier platform dedicated to delivering the latest news, updates, and insights from the CIO industry. As a trusted source in the technology and IT sector, we provide a comprehensive resource for executives and professionals seeking to stay informed and ahead of the curve. With a focus on cutting-edge developments and trends, CIO News serves as your go-to destination for staying abreast of the rapidly evolving landscape of technology and IT. Founded in June 2020, CIO News has rapidly evolved with ambitious growth plans to expand globally, targeting markets in the Middle East & Africa, ASEAN, USA, and the UK.

CIO News is a property of Mercadeo Multiventures Pvt Ltd.