Reinforcement Learning Is Inevitable


The internet is full of bad data, and that is where the training data for LLMs comes from. Some people pollute the internet with bad data out of spite. Most just spam it with AI-generated content for profit. One day, we might see a future where outside training data is no longer necessary for these models to train, as reinforcement learning becomes the new standard for model training. Feels odd, doesn’t it?
Why reinforcement learning
AGI (Artificial General Intelligence) is being talked about a lot right now, especially among top executives working closely with AI. I sure ain’t got much, but I’m willing to bet a lot on AGI coming out of reinforcement learning. You could loosely imagine reinforcement learning as something close to a brute-force search (in comparison to traditional supervised training of neural networks): an agent tries many possible actions in a given environment, and the behavior that earns the most reward (according to the rewards and punishments set) is the one that gets reinforced. In practice the agent does not enumerate every possible solution; it explores by trial and error and gradually favors what works.
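To make the trial-and-error idea concrete, here is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm. Everything in it is an assumption made for illustration: the six-cell corridor environment, the reward values, and the names `step`, `train`, `ALPHA`, `GAMMA`, and `EPSILON` are all hypothetical, not taken from any particular library.

```python
import random

N_STATES = 6          # corridor cells 0..5; cell 5 is the goal (hypothetical toy environment)
ACTIONS = (-1, +1)    # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # assumed learning rate, discount, exploration rate


def step(state, action):
    """Move within the corridor; reward +1 at the goal, a small penalty otherwise."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01
    return next_state, reward, done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: nudge the estimate toward
            # reward + discounted value of the best next action.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-goal cell after training.
greedy_policy = [ACTIONS[max(range(2), key=lambda i: q_table[s][i])]
                 for s in range(N_STATES - 1)]
print(greedy_policy)
```

With these assumed settings, the learned greedy policy should point right (toward the goal) in every cell. Nobody told the agent that; it only ever saw rewards and punishments.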
What makes reinforcement learning different
Reinforcement learning has shown time and time again that models trained with it often find surprisingly efficient ways to reach their goals, sometimes surpassing human intuition and assumptions about the domain. An autonomous helicopter even learned to fly upside down through reinforcement learning, with a goal that reportedly boiled down to learning how to fly, staying above the surface, and not crashing. Isn’t it kind of funny that most of us would never have thought of flying this way?
Types of Reinforcement Learning Algorithms
Value-Based
Policy-Based
Model-Based
Actor-Critic Methods
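To contrast the first two families: value-based methods learn how good each action is and act on those estimates, while policy-based methods adjust the action probabilities directly. Below is a rough sketch of the policy-based idea using a gradient-bandit (REINFORCE-style) update on a two-armed bandit. The payout probabilities, the learning rate, and the names `softmax` and `train_policy` are assumptions chosen for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn raw preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_policy(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    win_prob = [0.2, 0.8]   # hypothetical payout chance of each arm
    prefs = [0.0, 0.0]      # policy parameters: one preference per arm
    baseline = 0.0          # running average reward, used to reduce variance
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        arm = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < win_prob[arm] else 0.0
        advantage = reward - baseline
        baseline += (reward - baseline) / t
        # Gradient of log pi(arm) w.r.t. preference i is 1[i == arm] - probs[i].
        for i in range(2):
            grad_log = (1.0 if i == arm else 0.0) - probs[i]
            prefs[i] += lr * advantage * grad_log
    return softmax(prefs)


final_probs = train_policy()
print(final_probs)
```

Under these assumptions the policy should end up strongly preferring the second arm, without ever storing an explicit value for either one. Actor-critic methods combine the two ideas, pairing a policy (the actor) with a learned value estimate (the critic) in place of the simple running-average baseline used here.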
Conclusion
Reinforcement learning isn’t getting much hype right now, and that’s understandable because it isn’t as feasible as we think it is. It is far more compute- and memory-intensive than the traditional way of training neural networks and other machine learning models. As compute becomes less expensive, we will see a world where reinforcement learning is a more prominent way to train AI.
Written by Harvey Ducay