EFF: Blocking Archive Harms History

Ralph Yozzo

Mar 17, 2026, 6:31:09 AM
to tax-payers-...@googlegroups.com

In the article "Blocking the Internet Archive Won’t Stop AI, But It Will Erase the Web’s Historical Record," published on March 16, 2026, the Electronic Frontier Foundation (EFF) argues that major news publishers are endangering the preservation of digital history in a misguided attempt to thwart AI companies.

The Core Conflict

A growing number of high-profile publishers and platforms, including The New York Times, The Guardian, and Reddit, have begun blocking the Internet Archive’s crawlers. Their primary motivation is the fear that AI companies are using the Archive as a "backdoor" to scrape vast amounts of copyrighted content for training models without paying licensing fees or seeking authorization.

Key Arguments from the EFF:

  • Archiving is Not AI Training: The EFF emphasizes that the Internet Archive is a non-profit digital library, not a commercial AI lab. Blocking it does little to stop sophisticated AI companies, which have their own massive crawling infrastructure, but severely undermines the Archive’s mission to provide a permanent record of the web.

  • The "Memory Hole" Risk: Unlike physical newspapers, digital articles are frequently edited, moved behind paywalls, or deleted entirely. The Archive’s Wayback Machine is often the only way for journalists, researchers, and the public to verify what was originally reported, making it essential infrastructure for accountability.

  • Legal Precedent of Fair Use: The EFF argues that web archiving and making material searchable are well-established "fair uses" under copyright law, similar to how search engines function. It warns that treating libraries as "infringers" because of how third parties might use their data is a dangerous legal shift.

  • Collateral Damage: According to the article, Wikipedia alone contains over 2.6 million links to archived news articles. If publishers block the Archive, millions of citations will "break," leading to a massive loss of verifiable information across the internet.

Conclusion

The article concludes that while the legal battles between publishers and AI companies over training data are legitimate and must be resolved in court, sacrificing the public record is a "profound and possibly irreversible mistake." The EFF calls for technical solutions that protect publishers from abusive scraping without "torching decades of historical documentation."

Blocking the Internet Archive Won’t Stop AI, But It Will Erase the Web’s Historical Record _ Electronic Frontier Foundation.PDF