TechDogs-"Reddit To Update Its Robots.txt File To Curb Data Scraping Bots"


Reddit To Update Its Robots.txt File To Curb Data Scraping Bots

By Amrit Mehra

TD NewsDesk

Updated on Thu, Jun 27, 2024

Reddit, the globally renowned social networking site, lets users share, upvote or downvote views on news and trending topics, engage in conversations, share images, post videos and more.

The site has steadily grown in popularity in the last few years and has reached a user base of over 500 million accounts.

This includes 73.1 million daily active users and 267.5 million weekly active users, who shared over 469 million posts. The platform also recorded $804.03 million in annual revenue in 2023, a 20.6% increase over the previous year.

Ultimately, Reddit even launched its Initial Public Offering (IPO) on the New York Stock Exchange (NYSE) under the stock ticker symbol ‘RDDT’, a move that followed the platform announcing its new suite of tools aimed at businesses – Reddit Pro.

It’s no wonder that artificial intelligence (AI) companies show tremendous interest in using the platform's data to train generative artificial intelligence (GenAI) models.

This even led to some AI companies being linked with the social media aggregator for such purposes, which eventually saw OpenAI partner with Reddit to bring its content to ChatGPT, along with other product enhancements.

Now, the social media giant has made an announcement aimed at securing its data, one that could affect a wide range of AI companies and other businesses interested in gathering that data.

So, what did Reddit reveal? Let’s explore!
 

What Did Reddit Announce?

 
  • “At Reddit, we believe in the open internet. We also believe that privacy is a right,” is how Reddit began its blog post (published on May 9, 2024) that announced it was publishing a new type of policy – its Public Content Policy.

  • The intent of the new policy was to provide its communities, developers and researchers with an idea of the company’s stance on access to public content and what protections should exist for users.

  • As per the company, the new policy is distinct from its existing Privacy Policy and Content Policy, which deal, respectively, with how it handles personal content and what kind of content and behavior is permitted on the platform.

  • The move came as Reddit observed an increase in both unauthorized access and misuse of authorized access to collect data from its platform in bulk, including users' personal information, with no regard for user privacy, safety or rights.

  • At the time, Reddit said, “While we will continue our efforts to block known bad actors, we need to do more to restrict access to Reddit public content at scale to trusted actors who have agreed to abide by our policies. But we also need to continue to ensure that users, mods, researchers, and other good-faith, non-commercial actors have access.”

  • Ahead of this, the company said it was selective about who could access its content and that entities accessing its site must abide by its policies.

  • Now, Reddit has issued an announcement that builds on the previous one.


  • The company revealed that it has planned a series of changes to its backend to help enforce its policy changes.

  • This includes updating its robots.txt file, which implements the Robots Exclusion Protocol and gives high-level instructions to website scraping bots and third-party web crawlers about which content and web pages they may and may not access. Compliance with robots.txt is voluntary, so the file states the site's rules rather than technically enforcing them.

  • This change is expected to be carried out over the coming weeks.

  • Additionally, Reddit will continue rate-limiting and blocking unknown bots and web crawlers from accessing the site.

  • As per the company, the move shouldn’t affect most of its users.

  • Furthermore, access for “good faith actors”, which includes some researchers and organizations such as the Internet Archive, will not be affected.

  • Mark Graham, Director, Wayback Machine at Internet Archive, said, “The Internet Archive is grateful that Reddit appreciates the importance of helping to ensure the digital records of our times are archived and preserved for future generations to enjoy and learn from.”

  • [Contd.] “Working in collaboration with Reddit we will continue to record and make available archives of Reddit, along with the hundreds of millions of URLs from other sites we archive every day.”

  • As per the blog post, “Anyone accessing Reddit content must abide by our policies, including those in place to protect redditors. We are selective about who we work with and trust with large-scale access to Reddit content. Organizations looking to access Reddit content can head over to our guide to accessing Reddit Data.”
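
To illustrate how a crawler might interpret the allow and deny rules in a robots.txt file, here is a minimal Python sketch using the standard library's urllib.robotparser module. The rules, bot names and URLs below are hypothetical, made up purely for illustration, and are not Reddit's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, written for illustration only.
# These are NOT Reddit's actual rules.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# The named (hypothetical) archive bot matches its own rule group.
print(is_allowed(parser, "ExampleArchiveBot", "https://example.com/r/some-community/"))

# Any other bot falls through to the catch-all "*" group.
print(is_allowed(parser, "ExampleScraperBot", "https://example.com/r/some-community/"))
```

With these example rules, the first check prints True and the second prints False. A well-behaved crawler runs a check like this before each request; a crawler that ignores the file is not stopped by it, which is why a site would pair robots.txt with measures such as rate-limiting and blocking.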


Do you think this move will help Reddit curb unauthorized access to data on its platform effectively? Do you think other social media platforms should make similar moves?

Let us know in the comments below!

First published on Thu, Jun 27, 2024
