The internet is full of bad data, and that is where the training data for LLMs comes from. Some people pollute the internet with bad data out of spite. Most just spam it with AI-generated content for profit. One day, we might see a future where no outside training data is necessary for these models to train, because reinforcement learning will be the new standard for model training. Feels odd, doesn’t it?

Why reinforcement learning

AGI (Artificial General Intelligence) has been a hot topic lately, especially among top executives working closely with AI. I sure ain’t got much, but I’m willing to bet a lot on AGI coming out of reinforcement learning. You could loosely imagine reinforcement learning as a brute-force algorithm (in comparison to the traditional supervised training of neural networks): the agent tries many possible actions in a given environment, and the behavior that earns the most reward (according to the rewards and punishments we set) is the one that gets reinforced. In practice it does not literally try every possibility; it explores by trial and error and leans more and more on what has worked so far.
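To make the trial-and-error idea a bit more concrete, here is a minimal sketch of an epsilon-greedy agent on a toy three-option problem (a "multi-armed bandit"). Everything in it is an assumption for illustration: the function name `run_bandit`, the reward means, the noise level, and the exploration rate are made up, and this is nowhere near how large models are trained. It only shows the loop of trying an action, getting a reward, and updating an estimate.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error on a toy multi-armed bandit.

    true_means is a hypothetical list of average rewards, one per action.
    The agent never sees it directly; it only observes noisy rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running average reward per action
    counts = [0] * n_arms       # how many times each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the action that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Noisy reward from the environment (assumed Gaussian noise).
        reward = rng.gauss(true_means[arm], 0.5)

        # Incremental update of the running average for that action.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = run_bandit([0.0, 0.5, 1.0])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated rewards:", [round(e, 2) for e in estimates])
print("pulls per action:", counts)
print("action the agent prefers:", best_arm)
```

The agent is never told which action is best. It finds out by trying all of them occasionally and spending most of its time on whichever one has paid off the most so far.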

What made reinforcement learning different

Reinforcement learning has shown time and time again that models trained with it often find remarkably efficient ways to reach their goals, sometimes surpassing human intuition and assumptions about the domain. An autonomous helicopter even learned to hold a stable inverted hover through reinforcement learning, with its controller optimized against a reward function instead of hand-written flight rules. Isn’t it kind of funny that a maneuver this hard for human pilots can come out of nothing more than rewards and punishments?

Types of Reinforcement Learning Algorithms

Value-Based
Policy-Based
Model-Based
Actor-Critic Methods
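
As an illustration of the first family in the list above, here is a minimal sketch of tabular Q-learning, a classic value-based method. The environment is a made-up five-cell corridor where the agent starts on the left and gets a reward of 1 for reaching the rightmost cell. The names `step` and `train` and all the hyperparameters are assumptions for this toy example, not a reference implementation.

```python
import random

N_STATES = 5        # cells 0..4, the goal is cell 4
ACTIONS = (-1, +1)  # index 0 moves left, index 1 moves right


def step(state, action):
    """Toy corridor environment: move, clamp to the walls, reward at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                # Exploit; ties are broken in favor of moving right.
                a = 0 if q[state][0] > q[state][1] else 1

            next_state, reward, done = step(state, ACTIONS[a])

            # Q-learning target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])

            state = next_state
            if done:
                break
    return q


q_table = train()
# Read the greedy policy off the learned values (the goal cell is skipped).
greedy_policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
print(greedy_policy)
```

The agent learns a value for every state and action pair, and the policy is simply "pick the action with the highest value". Policy-based methods skip the value table and adjust the policy directly, model-based methods also learn how the environment behaves, and actor-critic methods combine a learned policy with a learned value estimate.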

Conclusion

Reinforcement learning isn’t getting much hype right now, and that’s understandable because it isn’t as feasible as we think it is. It is much more compute- and memory-intensive than the traditional ways of training neural networks and other machine learning models. Once compute becomes less expensive, we will see a world where reinforcement learning is a more prominent way to train AI.

Reinforcement Learning is the inevitable


Harvey Ducay
