Model Collapse: When AI learns from AI

Asmit Phuyal

Let's imagine a line of people playing the telephone game. The first person, labeled F, whispers a message to E. E whispers what she heard to D, and the process continues until the message reaches A.

By the time the message reaches A, it will be quite different from what F originally wanted to convey, full of distortions and inaccuracies.


Something similar happens in AI model training. If synthetic data (AI-generated content) is used to train the next model, and that model's synthetic output is in turn used to train another, the final model tends to produce homogeneous output: more error-prone, less useful, less diverse, and less accurate.
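
A minimal sketch makes the effect tangible. This is not a real LLM pipeline, just a one-dimensional Gaussian repeatedly refit on its own samples with NumPy (the seed, sample size, and generation count are arbitrary choices of mine): each generation trains only on data drawn from the previous one, and the estimated spread tends to drift and shrink, mirroring the loss of diversity at the heart of model collapse.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

mu, sigma = 0.0, 1.0   # generation 0: the "human" data distribution
n_samples = 200        # each generation trains on a finite sample

for gen in range(1, 11):
    # Draw synthetic data from the current model's distribution...
    data = rng.normal(mu, sigma, n_samples)
    # ...then "train" the next model by fitting it to that data alone.
    mu, sigma = data.mean(), data.std()
    print(f"generation {gen:2d}: mu = {mu:+.3f}, sigma = {sigma:.3f}")
```

Extend the loop to more generations and sigma tends to decay toward zero: each model can only ever see what the previous one produced, so information about the original distribution is gradually lost.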

Let's dig deeper into it.

You are probably familiar with the importance of diversity in ecosystems. The same applies to AI training data: without it, models risk collapse.

With the rapid rise of LLMs, the internet is increasingly flooded with AI-generated content. Since LLMs are trained heavily on data scraped from the internet, future training datasets will inevitably contain AI-generated data as inputs.

Organizations working on LLMs value human-generated data, which could become harder to find as time passes. One idea I've heard is to make AI-generated data easily identifiable so that future systems can distinguish synthetic data from real: some kind of signal embedded in AI-generated outputs that is undetectable to humans but detectable by machines.
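
To make that idea concrete, here is a toy sketch of the detection side of such a scheme, loosely inspired by published statistical "green-list" watermarks for LLM text. Everything in it (the SHA-256 hashing rule, the one-token context, the 0.5 baseline) is an illustrative assumption of mine, not any real product's watermark:

```python
import hashlib

def is_green(prev_token: str, token: str) -> bool:
    # Toy rule: hash each (previous token, token) pair so that
    # roughly half of all pairs land on a pseudo-random "green list".
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] < 128

def green_fraction(tokens: list[str]) -> float:
    # Ordinary human text should score near 0.5 by chance; a generator
    # that deliberately prefers green tokens pushes this well above 0.5,
    # which a detector can flag statistically.
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    return sum(is_green(p, t) for p, t in pairs) / len(pairs)

print(green_fraction("the quick brown fox jumps over the lazy dog".split()))
```

The signal lives in token statistics rather than in the wording itself, which is why a human reader never notices it.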

But here's the catch: there are plenty of "AI bypassing" tools designed to make AI-generated content appear more human. If that disguised output is again used to train models, what happens? A deadlock? An infinite feedback loop? Irreversible defects?

I'd appreciate hearing what you think, but here's what I expect to happen if a model collapses:

1. Models will struggle with rare and edge-case scenarios, making poor decisions exactly where careful judgment matters most.
2. We expect AI-generated outputs to be varied, but users will be disappointed by repetitive, similar-sounding responses.
3. Bias may be amplified, especially on rare topics, because the model relies more and more on dominant patterns.

Model collapse is a significant challenge for the future development of robust and reliable AI. If we don't pay attention now, future AIs might lose the very thing that made them powerful: their connection to human experience.
