Adaptive Intelligence: Reinforcement Learning Demystified
Some of my most exhilarating childhood memories are of fun activities such as riding a bicycle and playing video games. Learning to ride a bicycle was initially an uphill task, with numerous falls while pedaling, but there were thrilling moments when I could balance and pedal delicately for a short distance. In retrospect, those exciting, adrenaline-filled moments were so rewarding that I gradually came to enjoy bike riding.
Behaviors and skills such as bike riding are learnt. The brain rewards behavior it finds desirable, which encourages future repetition. This forms the basis of human and animal habits. In scientific terms, this is referred to as reinforcement learning.
Reinforcement Learning is an adaptive process in which animals and humans adapt behavior when selecting actions in the face of reward and punishment.
Reinforcement and Punishment
In operant conditioning, several everyday words are used in a specific sense: positive, negative, reinforcement and punishment. Here, positive means you are adding something and negative means you are taking something away. Reinforcement means you are increasing a behavior, while punishment means you are decreasing a behavior. Reinforcement can be positive or negative, and so can punishment. All reinforcers (positive or negative) increase the likelihood of a behavioral response, while all punishers (positive or negative) decrease it.
There are two types of reinforcement: positive reinforcement and negative reinforcement. There are also two types of punishment: positive punishment and negative punishment.
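The four combinations can be written down as a small lookup. This is only an illustrative sketch: the function name `classify` and its string arguments are made up for this example and do not come from any library.

```python
def classify(stimulus_change: str, behavior_effect: str) -> str:
    """Name the operant-conditioning category.

    stimulus_change: "added" or "removed"
    behavior_effect: "increases" or "decreases"
    """
    # "positive" = something is added, "negative" = something is taken away
    sign = {"added": "positive", "removed": "negative"}[stimulus_change]
    # reinforcement increases a behavior, punishment decreases it
    kind = {"increases": "reinforcement", "decreases": "punishment"}[behavior_effect]
    return f"{sign} {kind}"


print(classify("added", "increases"))    # positive reinforcement
print(classify("removed", "increases"))  # negative reinforcement
print(classify("added", "decreases"))    # positive punishment
print(classify("removed", "decreases"))  # negative punishment
```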
Reinforcement
The most effective way to teach a person or an animal a new behavior is with positive reinforcement. Positive reinforcement involves adding a reinforcing stimulus after a behavior, which makes the behavior more likely to be repeated in future. Below are real-life examples:
- In animal training, for instance training a dog to sit when called upon by the trainer, the dog may be rewarded with a treat after it responds correctly to the instruction.
- A bonus at work after an employee exceeds the sales quota is a monetary reward that most likely motivates the recipient.
Negative reinforcement strengthens a response or behavior by stopping or removing an aversive stimulus. The removal of the undesired stimulus reinforces the behavior that precedes it, making it more likely that the response will recur in future. Here is a real-life example:
- Car manufacturers use this principle in the seat belt system, which produces a beeping sound until the seat belt is fastened. This annoying sound only stops when the passenger exhibits the desired behavior of buckling the belt. This increases the likelihood of buckling the belt in future.
Punishment
Punishment decreases a behavior. In positive punishment, an undesirable stimulus is added to decrease a behavior. An example is scolding a student to get the student to stop texting in class. In this scenario, the stimulus (the scolding) is added in order to decrease the behavior (texting in class).
In negative punishment, a pleasant stimulus is removed in order to decrease behavior. For example, when a pupil fails to do their assignment, a parent can take away their favorite toy. In this case, the stimulus (the toy) is removed in order to decrease the behavior.
Computational Reinforcement Learning
The computational field of reinforcement learning has provided a normative framework within which conditioned behavior can be understood. Reinforcement Learning (RL) is a machine learning training method based on rewarding desired behaviors and punishing undesired ones. RL is about learning the optimal behavior in an environment to obtain maximum reward.
Good examples include self-driving cars and Google DeepMind systems such as AlphaStar and AlphaGo.
Reinforcement Learning uses algorithms that learn from outcomes and decide what action to take next. After each action, the algorithm receives feedback that helps determine whether the choice it made was correct, neutral or incorrect.
Reinforcement Learning is an autonomous, self-teaching system that essentially learns by trial and error. It performs actions with the aim of maximizing rewards; in simple terms, it learns by doing in order to achieve the best outcomes.
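Trial-and-error learning can be illustrated with a very small sketch. The example below is not taken from any particular system: it assumes a made-up "slot machine" problem with three actions whose success probabilities are hidden from the learner, and it uses an epsilon-greedy rule (a common textbook choice, chosen here only for illustration) to balance trying new actions against repeating the best one found so far. The function name `run_bandit` and all parameter values are hypothetical.

```python
import random


def run_bandit(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays off most often, purely from feedback."""
    rng = random.Random(seed)
    n_actions = len(true_probs)
    estimates = [0.0] * n_actions  # current guess of each action's average reward
    counts = [0] * n_actions       # how often each action has been tried
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # explore: try a random action
            action = rng.randrange(n_actions)
        else:
            # exploit: repeat the action that currently looks best
            action = max(range(n_actions), key=lambda a: estimates[a])

        # feedback from the environment: 1 (correct) or 0 (incorrect)
        reward = 1 if rng.random() < true_probs[action] else 0

        # update the running average for the chosen action
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


estimates, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
print("estimated payoffs:", [round(e, 2) for e in estimates])
print("times each action was chosen:", counts)
```

The learner is never told the true probabilities. It only sees rewards, yet over many steps it ends up choosing the best-paying action most of the time.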
How Does Reinforcement Learning Work?
The working principle of RL is based on the reward function: the agent is rewarded for desirable actions and, over time, learns to repeat them.
The RL model can be pictured as a loop. An agent, which may be a computer program, is in a particular state (St). It takes an action (At) in an environment to achieve a specific goal. As a result of that action, the agent receives feedback in the form of a reward or punishment. The goal of the agent is to improve its actions so that it collects more reward in the long run.
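"More reward in the long run" is often made precise by adding up the rewards of an episode, with later rewards counted slightly less than immediate ones. The sketch below assumes a discount factor `gamma`, a standard ingredient in RL that this article does not otherwise discuss; the function name is illustrative.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where a reward k steps in the future is weighted by gamma**k."""
    total = 0.0
    # work backwards so each earlier step discounts everything after it once
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# A reward of 1 received on the third step, with gamma = 0.5
print(discounted_return([0, 0, 1], gamma=0.5))  # 0.25
```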
Elements of reinforcement learning:
Agent - the program controlling the object of concern (for example, a robot).
Environment - the outside world, represented programmatically. Everything the agent interacts with is part of the environment.
Rewards - a score of how well the agent performs with respect to the environment. In the simplest case it is 1 or 0: 1 means the policy made the right move, 0 means the wrong move. In general it can be any number.
Policy - the rule the agent uses to decide its actions. The algorithm that learns the policy can be model-based or model-free.
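The elements listed above can be wired together in a few lines. The following is a minimal sketch under stated assumptions: the `Corridor` environment (five positions in a row, with the goal at the right end), the `random_policy` function and `run_episode` are all invented for this illustration and are not part of any RL library.

```python
import random


class Corridor:
    """Hypothetical environment: positions 0..4, the goal is position 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        """action is -1 (left) or +1 (right); returns (next_state, reward, done)."""
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1 if done else 0  # reward of 1 only for reaching the goal
        return self.state, reward, done


def random_policy(state, rng):
    """Policy: maps the current state to an action (here, ignoring the state)."""
    return rng.choice([-1, 1])


def run_episode(env, policy, rng, max_steps=100):
    """The agent loop: observe state, act, receive reward, repeat."""
    state = env.reset()
    total_reward, steps = 0, 0
    for _ in range(max_steps):
        action = policy(state, rng)             # agent decides
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


rng = random.Random(0)
print(run_episode(Corridor(), random_policy, rng))
```

A random policy wanders before it reaches the goal, if it reaches it at all within the step limit. A learning algorithm's job is to replace it with a policy that gets there reliably.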
Reinforcement Learning algorithms
There are two main types of Reinforcement Learning algorithms:
Model-free algorithms
Model-free algorithms do not require a model of the environment to operate. They learn directly from experience or trial-and-error and use the feedback they receive to update their internal policies and value functions.
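As one concrete instance, the sketch below uses tabular Q-learning, a widely used model-free method (named here as an illustration; the article does not single out a specific algorithm). It assumes a tiny made-up world of five positions in a row with a reward of 1 at the right end. The constants, the `step` function and the hyperparameters `alpha`, `gamma` and `epsilon` are all assumptions for the example.

```python
import random

N_STATES, GOAL = 5, 4  # positions 0..4, goal at position 4
ACTIONS = (-1, +1)     # move left or right


def step(state, action):
    """Environment: the learner only sees what this returns, not how it works."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # q[state][i] estimates the long-run value of taking ACTIONS[i] in that state
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                # exploit, breaking ties at random so early episodes still wander
                best = max(q[state])
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # update from experience only: no model of the environment is used
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = q_learning()
greedy_policy = [ACTIONS[row.index(max(row))] for row in q[:GOAL]]
print(greedy_policy)  # the preferred move in each non-goal position
```

Nothing in `q_learning` looks inside `step`. The value table is built purely from the states and rewards the agent actually experienced.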
Model-based algorithms
Model-based algorithms require a model of the environment to operate. They learn the dynamics of the environment from experience and use the learned model to predict the outcomes of actions.
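A minimal sketch of the model-based idea, under stated assumptions: a made-up world of five positions in a row with a reward of 1 at the right end, a model stored as a plain dictionary, and value iteration as the planning step (one possible choice, used here for illustration). The names `learn_model` and `plan` are hypothetical.

```python
import random

N_STATES, GOAL = 5, 4  # positions 0..4, goal at position 4
ACTIONS = (-1, +1)     # move left or right


def step(state, action):
    """The real environment; the agent only observes its outputs."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    return next_state, (1.0 if next_state == GOAL else 0.0)


def learn_model(samples=200, seed=0):
    """Build a model from experience: (state, action) -> (next_state, reward)."""
    rng = random.Random(seed)
    model = {}
    for _ in range(samples):
        s = rng.randrange(GOAL)     # try non-goal states at random
        a = rng.choice(ACTIONS)
        model[(s, a)] = step(s, a)  # this world is deterministic, so one sample suffices
    return model


def plan(model, gamma=0.9, sweeps=50):
    """Value iteration that consults only the learned model, never the environment."""
    values = [0.0] * N_STATES  # the goal state is terminal and keeps value 0

    def predicted(s, a):
        next_state, reward = model[(s, a)]
        return reward + gamma * values[next_state]

    for _ in range(sweeps):
        for s in range(GOAL):
            known = [a for a in ACTIONS if (s, a) in model]
            values[s] = max((predicted(s, a) for a in known), default=0.0)

    policy = []
    for s in range(GOAL):
        known = [a for a in ACTIONS if (s, a) in model]
        policy.append(max(known, key=lambda a: predicted(s, a)))
    return values, policy


values, policy = plan(learn_model())
print(policy)  # the planned move in each non-goal position
```

Once the model is learned, the agent can evaluate actions "in its head" by predicting their outcomes instead of trying each one in the real environment.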
Real Life examples of Reinforcement Learning
AWS DeepRacer is an autonomous racing car designed to let developers gain hands-on skills in reinforcement learning. The car is a 1/18th scale model that is controlled by an onboard computer and can be programmed to navigate a physical track using machine learning algorithms. It also includes a virtual racing league where developers can race their cars in a simulated environment.
Tesla's Autopilot is an advanced driver assistance system intended to enhance safety and convenience behind the wheel, while its Full Self-Driving capability is designed to let the vehicle drive itself almost anywhere with minimal driver intervention. The vehicle is equipped with multiple external cameras and powerful vision processing to provide an additional layer of safety.
Self-driving cars are autonomous decision-making systems. They exemplify the application of reinforcement learning.
Industrial Robots
In manufacturing processes, industrial robots have been crucial for performing repetitive tasks with extreme precision and accuracy. Reinforcement Learning is one technique used to train robotic agents to perform such tasks.
Examples of this application include industrial robots carrying out tasks such as welding or fitting seats into cars on an assembly line.
Conclusion
As research continues to push the boundaries of what is possible, the future of Reinforcement Learning holds the promise of increased efficiency, adaptability and societal benefit through the use of these intelligent systems.
Written by
Njeri Gitome
Data Scientist