News Media Alliance

04/30/2026 | Press release | Distributed by Public on 04/30/2026 13:37

News Publishers Demand Accountability from Common Crawl Over Unauthorized Use of Content

Arlington, VA - As first reported in Bloomberg, yesterday the News/Media Alliance (NMA), representing a collection of leading news publishers, submitted a letter to Common Crawl demanding the archive site stop its unauthorized scraping and storage of content, and establish additional protocols to prevent publisher content in its database from being used by AI companies.

Common Crawl portrays itself as a source of historical web crawl data for use by researchers, academics, and historians. But Common Crawl is now straying from its original purpose to enrich large AI companies. As reported in The Atlantic, its archive has been a primary source used to train commercial AI models without authorization from publishers. By scraping and sharing news content, Common Crawl is fueling widespread copyright infringement and harming publishers' ability to license their content to those developers.

Common Crawl can take steps to prohibit these unauthorized uses, and we are asking it to close the backdoor.

Danielle Coffey, President and CEO of the News/Media Alliance, said, "Common Crawl is blatantly taking our content without our permission and failing to honor our opt outs to remove content already taken. We encourage them to act like the good actor they claim to be, honor these requests, and make clear to their users that the content they scrape is not authorized for commercial use unless expressly permitted."

The Letter

The letter submitted by the News/Media Alliance demands that Common Crawl:

  • Add a clear directive to its opt-out registry warning users not to put the content to unauthorized uses and stating that such use breaches Common Crawl's terms
  • Revise its terms of use to state explicitly that it prohibits use of its repository for AI purposes
  • Remove content from its repository upon a publisher's request
  • Add a clear statement to its website that says Common Crawl:
    • does not own, and cannot authorize use of, the scraped content in its repository
    • prohibits unauthorized use of such content, including for AI purposes
    • respects the intellectual property rights of news publishers to prohibit such use
    • will remove content from its archive upon publisher request
    • will add publisher licensing contact information to its registry upon request

NMA's letter also serves as notice that the publishers listed in the Exhibit are requesting to join Common Crawl's Opt-Out Registry, with the expectation that the company will enforce the requirements listed above.

This letter follows earlier requests from other news publishers, as well as international news associations including the Danish Rights Alliance and Alliance de la Presse d'Information Générale, that Common Crawl remove their articles to prevent unauthorized use by AI companies.

News Media Alliance published this content on April 30, 2026, and is solely responsible for the information contained herein. Distributed via Public Technologies (PUBT), unedited and unaltered, on April 30, 2026 at 19:37 UTC.