What is MDP ?

The Markov Decision Process (MDP) is a mathematical framework for modelling decision-making problems where the outcomes are partly random and partly under the control of a decision maker. Before going further, there are a few terms you have to know :

Agent : A reinforcement learning agent is the entity that learns by interacting with the environment. It chooses actions and receives positive or negative feedback for them.
Environment : The environment is the space the agent interacts with. The agent is trained on the environment, which responds to each action with a new state and a reward.
State : The state is the current situation of the agent, for example the position where the pointer/agent is located.
Action : The action is the choice the agent makes at a time step. Depending on the action, the agent is rewarded or punished.
Policy : The policy is the rule the agent follows to choose an action in a given state. It can prescribe different actions in different states.
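
To make these five terms concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: a hypothetical corridor with positions 0 to 4, a goal at position 4, a reward of +1 at the goal and -1 elsewhere, and a policy that goes right 80% of the time.

```python
import random

# Hypothetical 1-D corridor: states are positions 0..4, the goal is position 4.
N_STATES = 5
ACTIONS = ["left", "right"]


def step(state, action):
    """Environment: given the agent's state and action, return (next_state, reward)."""
    move = 1 if action == "right" else -1
    next_state = min(max(state + move, 0), N_STATES - 1)  # stay inside the corridor
    reward = 1 if next_state == N_STATES - 1 else -1      # rewarded at the goal, punished otherwise
    return next_state, reward


def policy(state):
    """Policy: rule that picks an action for the current state (here: mostly 'right')."""
    return "right" if random.random() < 0.8 else "left"


def run_episode(start=0, max_steps=50, seed=0):
    """Agent loop: observe the state, act by the policy, collect the reward."""
    random.seed(seed)
    state, total_reward = start, 0
    for _ in range(max_steps):
        action = policy(state)
        state, reward = step(state, action)
        total_reward += reward
        if state == N_STATES - 1:  # goal reached, episode ends
            break
    return state, total_reward
```

Calling `run_episode()` plays one episode and returns the final state together with the sum of rewards the agent collected.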

These are the basic things required for an MDP. Let’s get started.

Theory :

Let’s start with our first symbol ( Sₜ ), the state at time step t. A state is called Markov if and only if it satisfies the Markov property.

The Markov property states that the probability of the next state depends only on the current state (Sₜ), not on the earlier sequence of states (S₁, S₂, S₃, S₄, …).
Let’s state the property formally :

A stochastic process S₁, S₂, S₃, …, Sₙ is said to have the Markov property if, for all time steps t,

P[Sₜ₊₁ | Sₜ] = P[Sₜ₊₁ | S₁, S₂, …, Sₜ]

expanding this …

Under the Markov property, the next state depends only on the current state, and the current state depends only on the state immediately before it. A Markov process (also called a Markov chain) is defined by two parameters (S, P), where S is the set of states and P is the state transition probability. It consists of a sequence of random states that all obey the Markov property. A Markov Decision Process builds on this by adding the agent's actions and the rewards it receives.
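
A small simulation can show what "the next state depends only on the current state" means in practice. The sketch below assumes a hypothetical two-state weather chain; the state names and the probabilities are made up for illustration and are not taken from any real data.

```python
import random

# Hypothetical transition probabilities: P[current][next].
# Each row sums to 1.
P = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}


def next_state(current, rng):
    """Sample the next state using only the current state (Markov property)."""
    states = list(P[current])
    weights = [P[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def sample_chain(start, n_steps, seed=0):
    """Generate a sequence of states starting from `start`."""
    rng = random.Random(seed)
    chain = [start]
    for _ in range(n_steps):
        # Only chain[-1] is consulted; the earlier history is never used.
        chain.append(next_state(chain[-1], rng))
    return chain
```

For example, `sample_chain("sunny", 10)` returns a list of 11 states. Notice that `next_state` never looks at anything except the current state.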

Return (Gₜ)

The return (Gₜ) is the total discounted reward from time step t onward. A single reward is only a short-term signal. Even if we pick an action that earns a decent reward right away, we might be missing out on a greater total reward in the long run.
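
The idea can be sketched in a few lines of Python. The discount factor `gamma` (a number between 0 and 1 that shrinks rewards arriving later) and the two reward sequences are assumptions chosen for illustration. The function sums the rewards from time step t onward, multiplying each later reward by one more factor of `gamma`.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r1 + gamma*r2 + gamma^2*r3 + ... for the rewards received after time step t."""
    g = 0.0
    # Work backwards: each step adds its reward to the discounted future return.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical reward sequences for two ways of acting.
greedy = [5, 0, 0, 0]    # decent reward right away, nothing afterwards
patient = [1, 1, 1, 10]  # small rewards first, a large one later

print(discounted_return(greedy))
print(discounted_return(patient))
```

With `gamma=0.9` the patient sequence ends up with the larger return even though its first reward is smaller. With a much smaller `gamma`, such as 0.5, the ordering flips, because rewards far in the future are discounted more heavily.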

Markov Decision Process ( RL )


Kiran Kumar
