10/01/2025 | Press release | Distributed by Public on 10/01/2025 04:17
ChatGPT has not decreased activity on the world's largest online encyclopaedia, but AI data scrapers and the influence of large language models still cast a shadow over its future, research suggests.
Work by King's College London examined changes in aggregate page views of Wikipedia across 12 languages, six where ChatGPT was available and six where it was not. The researchers found no sign of reduced usage since the AI model was introduced in 2022.
However, they did note slower growth in usage in languages where ChatGPT was active compared with those where it was not, suggesting the programme has had a limited impact.
In 2021, a long-time Wikipedia editor infamously raised the idea of the 'death' of the platform due to the influence of AI. In this scenario, chatbots like GPT would supplant Wikipedia as the primary source of online information, replacing human editors with AI-generated overviews and polluting the information sphere through well-documented hallucinations.
Some in the industry fear this has come to pass, with worldwide web traffic to referral sites, of which Wikipedia is the largest, falling by 15% between June 2024 and June 2025.
The paper, published in ACM Collective Intelligence, refutes this form of 'death'. However, the researchers note that the cost of running Wikipedia's servers is rising rapidly due to the influx of AI data scrapers using the site to train AI models, which the website's moderators say could still threaten the current structure of the platform.
Professor Elena Simperl, Professor of Computer Science at King's and Co-Director of the King's Institute for Artificial Intelligence, said: "Our work did not confirm the most alarmist scenario, but we're not out of the woods yet. AI developers are letting their scrapers loose on Wikipedia to train them on high quality data, pushing up traffic to levels where Wikipedia's servers are struggling to keep up. Generative AI summaries are also using Wikipedia's data in web searches but not crediting sources, siphoning web traffic away while borrowing the platform's work.
"For free services like this, no-one stops to ask how it's being paid for - and now Wikipedia is having to make the tough decision of where to allocate their limited resources to deal with this. It's vital as a community we take steps to protect this important platform, and we hope to turn our work into a monitoring tool where the community can track how AI is impacting Wikipedia."
Wikipedia is the largest online crowdsourced encyclopaedia, consisting of over 6.6 million articles in 292 languages as of 2023, and is a major source of free information for search engines and numerous online communities. This is particularly the case for languages outside Europe and East Asia, whose speakers depend heavily on Wikipedia for access to freely available information.
Postdoc and first author of the study Neal Reeves suggests there are steps available to protect Wikipedia: "Ultimately, we need a new social contract between AI companies and providers of high-quality data like Wikipedia where they retain more power over their material, while still allowing for their data to be used for training purposes.
"Collaboration, like that seen in programmes like MLCommons, is needed to reach across the aisle and ensure that the next generation of AI models are trained well, but in a way that doesn't destroy one of the free internet's greatest resources."
In the future, the team hope to use the feedback they've received from the Wikipedia community to develop an openly available monitoring tool that users across the world can deploy to run more rigorous analyses of the state of Wikipedia with greater ease.