Reinforcement Learning Part 1

When we talk about Reinforcement Learning (RL), just focus on the word "reinforce". The prefix "re" means again and again — and that’s exactly what an RL agent does. It learns from the environment through trial and error, continuously improving its behavior to maximize rewards. Like a child learning to walk, the agent stumbles, learns, adapts, and finally succeeds.
The Building Blocks of Reinforcement Learning
There are 5 core components that shape any RL system:
1. Agent - The learner or decision-maker.
2. Environment - The external system the agent interacts with.
3. State - The current situation of the environment.
4. Action - Choices the agent can make.
5. Reward - Feedback from the environment; it can be positive or negative.
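To see how these five pieces fit together, here is a minimal sketch of the agent-environment loop in Python. Everything in it (the `LineEnvironment` class, the rewards, the random agent) is a made-up illustration, not a real library API:

```python
import random

# Hypothetical toy environment: the agent walks along a line and is
# rewarded for reaching position 3. All names and numbers are made up.
class LineEnvironment:
    def __init__(self):
        self.state = 0  # State: the agent's current position

    def step(self, action):
        # Action: -1 (step left) or +1 (step right)
        self.state += action
        if self.state == 3:
            return self.state, 1.0, True   # reached the goal: positive reward
        return self.state, -0.1, False     # small penalty to encourage speed

env = LineEnvironment()                       # Environment
for _ in range(50):                           # cap the episode length
    action = random.choice([-1, 1])           # Agent picks an action
    state, reward, done = env.step(action)    # Environment gives feedback
    print(f"state={state}, reward={reward}")
    if done:
        break
```

The loop is the whole story in miniature: the agent acts, the environment responds with a new state and a reward, and over many such interactions the agent can learn which actions pay off.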
Where is RL Used?
Reinforcement learning isn't just theory - it's changing the world. Some real-world applications include:
Self-driving cars - learning the safest and most efficient routes.
Recommendation systems - like YouTube and Netflix improving what you see.
Games - from chess to Go, RL has beaten world champions.
Robotics - teaching machines how to walk, pick objects, or even dance.
Dynamic pricing - e-commerce platforms and airlines adjusting prices intelligently.
Markov Decision Process - The Brain Behind the Agent
Let's simplify this big term: Markov Decision Process (MDP).
Think of it like this - in any state, the agent has multiple actions to choose from. Each action may lead to a different reward. Some good, some bad. The goal? Pick the action that leads to the highest reward in the long run.
So, an MDP is a mathematical framework for modeling this kind of decision-making under uncertainty. It bundles together the states, the actions, the transition probabilities, and the rewards that define the agent's world.
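One way to make the idea concrete is to write a tiny MDP down as data: for each state and action, the possible next states, their probabilities, and their rewards. The states and numbers below are invented purely for illustration:

```python
# A tiny hand-made MDP: transitions[state][action] is a list of
# (probability, next_state, reward) tuples. Values are illustrative only.
transitions = {
    "home": {
        "study": [(0.9, "pass_exam", +10), (0.1, "fail_exam", -5)],
        "play":  [(1.0, "home", +1)],
    },
    "pass_exam": {},  # terminal state
    "fail_exam": {},  # terminal state
}

# The Markov property: the outcome of an action depends only on the
# current state, not on how the agent got there.
def expected_reward(state, action):
    return sum(p * r for p, _, r in transitions[state][action])

print(expected_reward("home", "study"))  # 0.9*10 + 0.1*(-5) = 8.5
```

Notice that "study" has a higher expected reward than "play" even though it sometimes fails; that is exactly the kind of long-run trade-off an MDP lets the agent reason about.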
Explore vs Exploit - The Agent's Dilemma
An intelligent agent must explore new possibilities but also exploit what it already knows. It's like choosing between trying a new restaurant or going to your favorite one. If the agent only explores, it may never settle. If it only exploits, it may miss out on better rewards. The balance is key.
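A standard way to strike this balance is an epsilon-greedy rule: with a small probability the agent explores a random action, and otherwise it exploits the best-known one. This is a generic sketch; `q_values` here is an assumed placeholder for the agent's current value estimates:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick an action: explore with probability epsilon, else exploit."""
    if random.random() < epsilon:
        # Explore: try a random action (the new restaurant)
        return random.randrange(len(q_values))
    # Exploit: the action with the highest estimated value (the favorite)
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Example: with estimates [2.0, 5.0, 1.0], the agent usually picks action 1
print(epsilon_greedy([2.0, 5.0, 1.0]))
```

With epsilon = 0.1, the agent goes to its "favorite restaurant" about 90% of the time but still tries something new often enough to discover better options.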
Value Function - Knowing What's Best
Here comes a crucial concept - Value Function.
It helps the agent understand how good a state or action is, based on expected future rewards. In simple terms, it tells the agent:
"If you're here, and you act like this, this is how much reward you can expect."
There are two types:
State Value Function (V) - Value of being in a state.
Action Value Function (Q) - Value of taking an action in a state (written as Q(s, a)).
Q stands for Quality - how good it is to take action 'a' in state 's'.
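To make Q(s, a) concrete, here is how the classic Q-learning update nudges a table of Q-values toward the observed reward plus the best value of the next state. The table, states, and the learning-rate/discount values below are illustrative assumptions:

```python
from collections import defaultdict

# Q-table: Q[(state, action)] starts at 0 and is learned from experience
Q = defaultdict(float)
alpha, gamma = 0.1, 0.9  # learning rate and discount factor (assumed values)

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: move Q(s, a) toward reward + discounted best future value."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Example: in state "s0", action "right" earned reward 1.0 and led to "s1"
q_update("s0", "right", 1.0, "s1", actions=["left", "right"])
print(Q[("s0", "right")])  # 0.1 * (1.0 + 0.9*0 - 0) = 0.1
```

Each update blends what the agent believed before with what it just experienced, so the Q-values gradually converge toward the true expected future rewards.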
Policy - The Agent's Rulebook
Lastly, every smart agent needs a strategy - a policy.
It's the set of rules that the agent follows to decide which action to take in each state to maximize rewards over time.
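In code, a policy can be as simple as a lookup from state to best action. The sketch below derives a greedy policy from a Q-table like the one in the previous example; the values are hypothetical:

```python
# A greedy policy just picks, in each state, the action with the
# highest learned Q-value.
def greedy_policy(Q, state, actions):
    return max(actions, key=lambda a: Q[(state, a)])

# Hypothetical learned values
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.8}
print(greedy_policy(Q, "s0", ["left", "right"]))  # -> "right"
```

Once the value estimates are good, following the greedy policy means always taking the action that promises the most reward over time - which is exactly the agent's goal.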