04/30/2026 | Press release | Distributed by Public on 04/30/2026 13:37
Arlington, VA - As first reported in Bloomberg, yesterday the News/Media Alliance (NMA), representing a collection of leading news publishers, submitted a letter to Common Crawl demanding the archive site stop its unauthorized scraping and storage of content, and establish additional protocols to prevent publisher content in its database from being used by AI companies.
Common Crawl portrays itself as a source of historical web crawl data for use by researchers, academics, and historians. But Common Crawl is now straying from its original purpose to enrich large AI companies. As reported in The Atlantic, its archive has been a primary source used to train commercial AI models without authorization by publishers. By scraping and sharing news content, Common Crawl is fueling widespread copyright infringement and harming publishers' ability to license their content to those developers.
Common Crawl can take steps to prohibit these unauthorized uses, and we are asking it to close the backdoor.
Danielle Coffey, President and CEO of the News/Media Alliance, said, "Common Crawl is blatantly taking our content without our permission and failing to honor our opt outs to remove content already taken. We encourage them to act like the good actor they claim to be, honor these requests, and make clear to their users that the content they scrape is not authorized for commercial use unless expressly permitted."
The Letter
The letter submitted by the News/Media Alliance demands that Common Crawl:
NMA's letter also serves as notice that the publishers listed in the Exhibit are requesting to join Common Crawl's Opt-Out Registry, with the expectation that the company will enforce the requirements listed above.
This letter follows similar requests from other news publishers and from international news associations, including the Danish Rights Alliance and Alliance de la Presse d'Information Générale, which have previously asked Common Crawl to remove their articles to prevent unauthorized use by AI companies.