Study finds misuse of archive services by fringe communities on the web

In a large-scale analysis, Jeremy Blackburn, Ph.D., and his collaborators found that the misuse of web archive services led to lost advertising revenue for popular news websites.

Written by: Tiffany Westry

Researchers from the University of Alabama at Birmingham, Cyprus University of Technology, and University College London report that fringe communities within Reddit and 4chan are pushing the use of URLs from archive services, both to avoid censorship and to deprive news sources with opposing ideologies of advertising revenue.

“Web archiving services play an increasingly important role in today’s information ecosystem by preserving online content,” said Jeremy Blackburn, Ph.D., assistant professor of computer science at the UAB College of Arts and Sciences. “News and social media posts have proven to be the most common types of content archived. Archive service URLs are widely shared on ‘fringe’ communities within Reddit and 4chan to preserve potentially contentious content.”

Researchers analyzed millions of archive service URLs, including Wayback Machine URLs, shared across four social networks: Reddit, Twitter, Gab, and 4chan’s Politically Incorrect board (/pol/). The results of the study were published this week in a paper at the 12th International Web and Social Media Conference at Stanford, CA.

The social media-specific analysis shows, among other things, that moderators take advantage of web archiving services to ensure that content shared in their communities persists. In particular, the researchers found that 44% of URLs from another archive service and 85% of Wayback Machine URLs are shared by Reddit moderation bots. Web archiving services are also widely used to archive and disseminate content related to conspiracy theories and world political events, suggesting that these services play an important role in the alternative information ecosystem.

Additional evidence shows that moderators of specific subreddits push users to misuse web archive services in order to ideologically target certain news sources, depriving them of traffic and potential ad revenue. Links to the targeted news websites are removed, and users are prompted to use a cached link, screenshot or

“For example, we observed that the subreddit ‘The Donald’ consistently targets advertising revenue from news sources with conflicting ideologies,” Blackburn said. “Moderation bots block URLs from these sites and prompt users to post archived URLs. By our conservative estimates, a popular news site like The Washington Post loses around $70,000 in advertising revenue per year due to the use of archive services on Reddit.”
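The bot behavior Blackburn describes can be pictured with a small sketch. This is an illustration only, not the code of any actual Reddit bot: the domain lists, the function names, and the returned action labels are all assumptions made up for this example.

```python
from urllib.parse import urlparse

# Hypothetical lists for illustration only; they are not taken from the study.
BLOCKED_DOMAINS = {"washingtonpost.com", "cnn.com"}
ARCHIVE_DOMAINS = {"web.archive.org"}  # the Wayback Machine's host name


def host_matches(host, domains):
    """Return True if host equals a listed domain or is one of its subdomains."""
    return any(host == d or host.endswith("." + d) for d in domains)


def moderate(url):
    """Decide what a hypothetical moderation bot would do with a submitted URL."""
    host = (urlparse(url).hostname or "").lower()
    if host_matches(host, ARCHIVE_DOMAINS):
        # Archived copies are allowed, even if they mirror a blocked site.
        return "allow"
    if host_matches(host, BLOCKED_DOMAINS):
        # Direct links are removed and the user is asked for an archived copy.
        return "remove_and_prompt_archive"
    return "allow"
```

Under these assumptions, a direct link to a listed news site is removed, while an archived copy of the same page is allowed through, which is how traffic ends up going to the archive instead of the publisher.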

The analysis reveals that, of 3,800 submissions made to Reddit using Washington Post links and 3,300 submissions using CNN links, 44% and 39% were removed, respectively.
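As a rough check on what those percentages mean in absolute terms, the short calculation below converts them into approximate counts. The submission totals and percentages are the rounded figures reported above, so the results are estimates; the variable and function names are assumptions for this sketch.

```python
# Rounded figures from the article: (total submissions, share removed).
submissions = {
    "Washington Post": (3800, 0.44),
    "CNN": (3300, 0.39),
}


def removed_count(total, share):
    """Approximate number of removed submissions, rounded to a whole number."""
    return round(total * share)


for source, (total, share) in submissions.items():
    print(f"{source}: about {removed_count(total, share)} of {total} submissions removed")
```

By this estimate, roughly 1,700 Washington Post submissions and 1,300 CNN submissions were removed.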

“These findings underscore the importance of archival services in the web’s information and advertising ecosystems, and the need to consider them carefully both when studying social media and when designing systems to detect and contain the cascade of misinformation across the web,” Blackburn said.

Blackburn is co-founder of the International Data-Driven Research Laboratory for Advanced Modeling and Analytics, or iDRAMA Lab, an international group of scientists focusing on modern socio-technical issues with expertise ranging from low-level cryptography to video games.

The paper, “Understanding Web Archiving Services and Their (Mis)Use on Social Media,” is available online.