AI begins to "absorb" itself: a new study sounds the alarm

The AI decided to become a cannibal and "eat" itself.

Content generated by artificial intelligence is starting to populate the internet, and that could be bad news for future AI models. Language models such as ChatGPT are trained on content found on the internet. As AI produces more and more "synthetic" content, an engineering problem known as "model collapse" can occur.

Filtering synthetic data out of training sets is becoming an important area of research, and it is likely to grow as AI content fills the internet. The Ouroboros is an ancient symbol of a snake devouring its own tail. In the age of AI, that symbolism takes on a new, sharp meaning: as editorial content created by AI language models floods the internet, it carries a lot of mistakes with it.

The internet is the training source for these language models. In other words, AI "consumes" itself. A model can end up learning from error-ridden data until what it produces becomes complete nonsense, which is what AI researchers call "model collapse." A recent study used a language model to generate text about English architecture; after the AI was repeatedly retrained on this synthetic text, the tenth model's output was completely meaningless.
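A minimal way to see the effect is to simulate it: fit a simple model to some data, sample new data from the fitted model, refit on the samples, and repeat. The Python toy below (an illustration of the feedback loop, not the setup used in the study) fits a Gaussian over and over and shows its diversity shrinking generation after generation:

```python
import numpy as np

# Toy illustration of "model collapse": repeatedly fit a model to data,
# then replace the data with samples drawn from the fitted model.
rng = np.random.default_rng(42)

# Generation 0: "human" data from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=100)

for generation in range(1, 11):
    mu, sigma = data.mean(), data.std()       # "train" on the current data
    data = rng.normal(mu, sigma, size=100)    # the next "web scrape" is synthetic
    print(f"gen {generation:2d}: mean {mu:+.3f}  std {sigma:.3f}")

# Each refit estimates the spread with sampling error, so the standard
# deviation follows a downward-biased random walk: rare values in the
# tails are lost first, and the distribution's diversity gradually collapses.
```

The text version of this is the same story: unusual phrasings and rare facts fall out of the data first, and after enough generations only degenerate, repetitive output remains.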

To train new AI models effectively, companies need data that has not been corrupted by synthetically created information. Alex Dimakis, co-director of the National AI Institute for Foundations of Machine Learning, says that a small collection of high-quality data can outperform a large synthetic one. For now, engineers have to sift through the data to make sure the AI is not learning from synthetic data it created itself.
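In practice, that sifting amounts to scoring candidate documents and dropping the suspect ones. Here is a minimal sketch of the idea, assuming a hypothetical `looks_synthetic` scorer (real pipelines might use a trained classifier, provenance metadata, or watermark checks; none of those specifics come from the article):

```python
from typing import Callable, Iterable

def filter_training_corpus(
    documents: Iterable[str],
    looks_synthetic: Callable[[str], float],  # hypothetical detector, 0..1
    threshold: float = 0.5,
) -> list[str]:
    """Keep only documents the detector scores as likely human-written."""
    return [doc for doc in documents if looks_synthetic(doc) < threshold]

# Usage with a crude stand-in scorer (illustration only, not a real detector):
corpus = ["a human essay on church towers", "as an AI language model, I..."]
score = lambda text: 0.9 if "as an ai language model" in text.lower() else 0.1
print(filter_training_corpus(corpus, score))  # keeps only the first document
```

The hard part, of course, is the detector itself: as generated text gets better, telling it apart from human writing gets harder, which is why this remains an open research area.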
 