
A recent Corvinus study highlights a new AI Paradox: it learns from us until it replaces us – but that could be its limit

2024-10-08 10:11:00

The proliferation of large language models (LLMs) like ChatGPT is diminishing public knowledge sharing on online Q&A platforms, which could hinder the training of future models, according to a recent study published by Corvinus University of Budapest.

A recently published international study by Corvinus University of Budapest fills a research gap by analysing the impact of ChatGPT on the online communities that contribute to the public knowledge shared on the Internet, knowledge which will ultimately shape the future of artificial intelligence. The researchers pointed out that large language models (LLMs) have the potential to act as a substitute for traditional data and knowledge sources, which could lead to a decline in human-generated content. This shift may pose significant challenges for the development of future models, as there would be insufficient data to train them effectively. The study concludes that if language models result in less open, human-generated data, they will ultimately limit their own future training data sources and, with them, their own effectiveness.

The paper, authored by Johannes Wachs, Associate Professor at Corvinus, and his international research colleagues, examines activity on Stack Overflow – an authoritative Q&A website for programmers – during the six months following the launch of ChatGPT, one of the most popular LLMs. The findings reveal a 25% reduction in Stack Overflow activity over the study period relative to its Russian and Chinese counterparts, where access to ChatGPT is limited, and to similar mathematics forums, where ChatGPT is less effective. The researchers observed a significant decline in the number of posts across all user experience levels, from novice to expert. The study concludes that ChatGPT has therefore discouraged new contributions to Stack Overflow, including high-quality content.

 

The paradox of reuse 

According to the study, the rise of large language models (LLMs) is poised to significantly alter how people search for, create and share information online. If LLMs like ChatGPT begin to replace traditional search and query methods, they could disrupt the human behaviour that originally generated the data needed to train these models. This phenomenon, known as the “paradox of reuse”, could have far-reaching social and economic consequences. 

This substitution poses a threat to the future of the open web, as interactions with artificial intelligence models do not contribute to the expansion of online knowledge – the digital commons. Consequently, the quality of training data for future models could decline, as machine-generated content is unlikely to fully replicate human creativity and insight. 

“Our research has shown that ChatGPT reduces the likelihood of questions being asked and discussed on public websites. This is problematic because others often learn from these conversations, and it also limits the AI’s ability to evolve through new, high-quality web content. Training AI with only AI-generated content is like making a photocopy of a photocopy of a photocopy, resulting in progressively lower-quality outcomes,”

said Johannes Wachs, researcher at Corvinus University of Budapest, commenting on the findings.

He added:

“We know that human feedback from the open internet facilitates the learning of large language models. However, data generated from interactions with privately owned language models will no longer be in the public domain, as it will belong to the owners of the LLMs. This is something to be mindful of, as it could have significant implications for both the public internet and the future of artificial intelligence.”

 

The value of data ownership  

The researchers concluded that data ownership will become increasingly important from an economic perspective. As data becomes more valuable, there will be growing interest in how those who create it can retain some of the value. Artificial intelligence applications like ChatGPT may thus create political and economic winners and losers, potentially contributing to inequality between individuals and companies.

The research, published in the September issue of PNAS Nexus, was co-authored by Johannes Wachs, Associate Professor at the Institute of Data Analysis and Informatics at Corvinus University of Budapest; R. Maria del Rio-Chanona, Assistant Professor in the Department of Computer Science at University College London; and Nadzeya Laurentsyeva, Assistant Professor in the Faculty of Economics at Ludwig Maximilian University of Munich.
