Conditional Probability, Bayes Theorem and Chain Rule of Probability.

Yash AmbekarYash Ambekar
5 min read
  • Conditional probability provides us with a way to reason about the outcome of an experiment, based on partial information.

  • Suppose we know that the outcome is within some given event B. We wish to express the likelihood that the outcome also belongs to some other given event A. We thus seek to construct a new probability law, which takes into account this knowledge and which, for any event A, gives us the conditional probability of A given B, denoted by P(A | B).

  • The conditional probability of an event A, given an event B with $$P(B) > 0$$, is defined by
    $$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

  • Dividing $$P(A \cap B)$$ by $$P(B)$$ essentially normalizes the joint probability by the probability of event B. It gives us the proportion of event B outcomes where event A also occurs, out of all the outcomes where event B occurs.

  • For example: Suppose that all six possible outcomes of a fair die roll are equally likely. If we are told that the outcome is even, we are left with only three possible outcomes, namely, 2, 4, and 6. Thus, it is reasonable to let, $$P(A = the \ outcome \ is \ 6 | B = the \ outcome \ is \ even ) = \frac{1}{3}$$

  • In the case where the possible outcomes are finitely many and equally likely, we have $$P(A | B) = \frac{number \ of \ elements \ of \ A ∩ B} {number \ of \ elements \ of \ B} $$

  • Generalizing the argument, we introduce the following definition of conditional probability, $$P(A | B)=\frac{P(A ∩ B)}{P(B)}$$ where we assume that $$P(B) > 0$$

  • In words, out of the total probability of the elements of B, P(A | B) is the fraction that is assigned to possible outcomes that also belong to A.

Bayes Theorem

P(A|B) represents the probability of event A occurring given that event B has already occurred. In other words, it measures the likelihood of event A happening among the subset of outcomes where event B has already taken place. $$P(A|B) = \frac{P(A) \cdot P(B|A)}{P(B)}$$

P(A|B)--> Posterior (Considering the probability of an event to occur after considering any available evidence).
P(A)--> Prior (Considering the probability of an event to occur before considering any available evidence).
P(B|A)--> Likelihood (probability of the evidence, given the belief is true).
P(B)--> Marginal Probability or Evidence (condition or the event that has already occurred, under any circumstances).

Derivation of the Bayes Theorem:

Prove:

$$P(A|B) = \frac{P(A) \cdot P(B|A)}{P(B)}$$

Proof:
  • The conditional probability of event A given event B is defined as:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

  • Definition of Joint Probability in terms of P(B|A):

$$P(A \cap B) = P(A) \cdot P(B|A)$$

  • This equation states that the probability of both events A and B occurring is equal to the probability of event A happening multiplied by the probability of event B occurring given that event A has occurred.

  • Dividing both sides of the equation by P(B), we get:

$$\frac{P(A) \cdot P(B|A)}{P(B)} = \frac{P(A \cap B)}{P(B)}$$

  • Recognizing that $P(A|B)$ is the probability of event A given event B, the derivation concludes: $$P(A|B) = \frac{P(A) \cdot P(B|A)}{P(B)}$$

Chain Rule of Probability

  • In mathematical terms, we are dealing with an event A which occurs if and only if each one of several events A1, . . . , An has occurred, i.e., A = A1 ∩ A2 ∩ · · · ∩ An. The occurrence of A is viewed as an occurrence of A1, followed by the occurrence of A2, then of A3, etc. The probability of A is given by the following rule: $$P(A_1, A_2, A_3, ..., A_n) = P(A_1) \cdot P(A_2|A_1) \cdot P(A_3|A_1, A_2) \cdot ... \cdot P(A_n|A_1, A_2, ..., A_{n-1})$$ $$P(A_1, A_2, A_3, ..., A_n)= \prod\limits_{i = 1}^{n} P(A_i|A_1,...,A_{i-1})$$

  • It is also known as the Multiplication Rule of Probability.

Proof of the Chain Rule of Probability

  • The chain rule of probability allows us to calculate the joint probability of multiple events by considering their conditional probabilities.

  • Consider a set of events A1, A2, A3, ..., An. We want to calculate the joint probability of all these events occurring together.

Prove:
  • Using the definition of conditional probability, we can express the joint probability as:$$P(A_1, A_2, A_3, ..., A_n) = P(A_1) \cdot P(A_2|A_1) \cdot P(A_3|A_1, A_2) \cdot ... \cdot P(A_n|A_1, A_2, ..., A_{n-1})$$ $$P(A_1, A_2, A_3, ..., A_n)= \prod\limits_{i = 1}^{n} P(A_i|A_1,...,A_{i-1})$$
Proof:
  • Each term in this product represents the probability of an event given that all the previous events have occurred. By applying the definition of conditional probability repeatedly, we can derive the chain rule.

  • For example, to derive the term $$P(A_3|A_1, A_2)$$
    , we can write:

$$P(A_3|A_1, A_2) = \frac{P(A_3 \cap (A_1, A_2))}{P(A_1, A_2)}$$

  • Expanding the joint probabilities, we have:

$$P(A_3|A_1, A_2) = \frac{P(A_3 \cap A_1 \cap A_2)}{P(A_1 \cap A_2)}$$

  • We also have $$P(A2|A1)$$ similarly,

$$P(A_2|A_1) = \frac{P(A_2\cap A_1)}{P(A_1)}$$

  • Substitute these probabilities to the RHS of the equation $$\begin{split}RHS&= P(A_1) \cdot P(A_2|A_1) \cdot P(A_3|A_1, A_2) \cdot ... \cdot P(A_n|A_1, A_2, ..., A_{n-1})\\\\&= P(A_1)\cdot \frac{P(A_2\cap A_1)}{P(A_1)} \cdot \frac{P(A_3 \cap A_1 \cap A_2)}{P(A_1 \cap A_2)}\cdot ...\cdot \frac{P(A_n \cap A_1 \cap A_2....\cap A_{n-1})}{P(A_1 \cap A_2....\cap A_{n-1})}\end{split}$$

  • Cancel out the common terms and you get $$P(A_n \cap A_1 \cap A_2....\cap A_{n-1})= P(A_1 \cap A_2....\cap A_{n-1} \cap A_n) = LHS$$

  • This derivation demonstrates how the chain rule allows us to decompose the joint probability into a product of conditional probabilities.

Thank You!

Resources

1. Introduction to Probability by Dimitri P. Bertsekas and John N. Tsitsiklis.

2. The Math Behind Bayesian Classifiers Clearly Explained! by Normalized Nerd.

0
Subscribe to my newsletter

Read articles from Yash Ambekar directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Yash Ambekar
Yash Ambekar

I am a 4th year BTech undergrad pursuing Computer Science Engineering from the Indian Institute of Information Technology, Pune. I have a keen interest in Natural Language Processing and its application in our daily life. I love to play Football and watch Anime (Shonen, Slice of Life, Rom Com). Hope you have a great day!