Basic Probability

Have you ever checked the weather and seen a "70% chance of rain"? That percentage is a probability—a powerful mathematical tool that helps us quantify uncertainty. While it might bring to mind coin flips and card games, probability is the bedrock of modern technology, from the algorithms that power AI to the spam filters that clean up your inbox.
This guide, inspired by a lecture from AI VIETNAM, will walk you through the essential concepts of probability, showing you how we can go from simple chance to sophisticated predictive models.
The Language of Chance: Core Concepts
To properly discuss probability, we must first agree on the terminology.
Probability: At its heart, probability is a numerical value between 0 and 1 that measures the likelihood of an event occurring. A probability of 0 signifies an impossible event, while a probability of 1 means the event is certain.
Experiment: This is any action or process with an observable result, such as rolling a die, tossing a coin, or drawing a card from a deck.
Outcome: This is a single result from an experiment. For example, if you roll a die, one possible outcome is a '4'.
Sample Space (S): This is the complete set of all possible outcomes of an experiment.
For a single die roll, the sample space is
S={1,2,3,4,5,6}.
For a coin toss, the sample space is
S={heads, tails}.
Event: This is the specific outcome or set of outcomes that we are interested in. An event is always a subset of the sample space. For example, in a die-rolling experiment, the event "the number is even" corresponds to the set {2,4,6}.
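These definitions map neatly onto Python sets. The following sketch (an illustration, not from the lecture) models the die-roll sample space and the event "the number is even":

```python
# Sample space for a single die roll and the event "the number is even",
# modeled as Python sets.
sample_space = {1, 2, 3, 4, 5, 6}
event_even = {n for n in sample_space if n % 2 == 0}

print(event_even)                  # {2, 4, 6}
print(event_even <= sample_space)  # True: an event is a subset of S
```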
How Events Relate and Combine
Events rarely exist in isolation. Understanding how they interact is key to solving more complex probability problems.
Intersection (A∩B)
The intersection of two events is the collection of outcomes that are present in both events. Think of it as the "AND" operator.
Example: Let's conduct an experiment of rolling a single die.
Event A is "the number rolled is even":
A={2,4,6}.
Event B is "the number rolled is divisible by 3":
B={3,6}.
The intersection, A∩B, is the outcome that is both even and divisible by 3, which is simply {6}.
Union (A∪B)
The union of two events includes all outcomes that are in event A, in event B, or in both. Think of it as the "OR" operator.
Example: Using the same events A and B from above, their union, A∪B, would be all numbers that are either even or divisible by 3.
A∪B={2,3,4,6}.
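Python's set operators compute these directly. A quick sketch using the same events A and B:

```python
# The die-roll events from the examples above, written as Python sets.
A = {2, 4, 6}   # even numbers
B = {3, 6}      # numbers divisible by 3

print(A & B)    # intersection A∩B: {6}
print(A | B)    # union A∪B: {2, 3, 4, 6}
```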
Complement (A′ or Ā)
The complement of an event A consists of all outcomes in the sample space that are not in A. It's everything else that could have happened. The probability of an event and its complement always add up to 1: P(A)+P(A′)=1.
Example: If Event A is "rolling a number other than 1 or 6," its complement, A′, is "rolling a 1 or a 6".
The probability of A′ is: P({1})+P({6})=1/6+1/6=1/3.
Therefore, the probability of A is: 1−P(A′)=1−1/3=2/3.
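The same complement calculation, sketched with Python's exact `fractions` arithmetic:

```python
from fractions import Fraction

# P(A) via the complement rule, for the die example above: A' = {1, 6}.
p_A_complement = Fraction(1, 6) + Fraction(1, 6)  # P({1}) + P({6}) = 1/3
p_A = 1 - p_A_complement
print(p_A)  # 2/3
```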
The Rules of Calculation
With our vocabulary set, we can now explore the formulas that govern probability.
The Addition Rule
This rule helps us calculate the probability of a union of events.
If two events, A and B, are mutually exclusive (they cannot happen at the same time), the formula is simple: P(A or B)=P(A)+P(B).
For events that are not mutually exclusive, we must use the general formula to avoid double-counting the intersection: P(A or B)=P(A)+P(B)−P(A and B).
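We can verify the general addition rule on the earlier die events, where outcomes are equally likely, so P(E) is just the size of E over the size of S (a minimal sketch, not part of the original lecture):

```python
from fractions import Fraction

# General addition rule checked on the die events A = even, B = divisible by 3.
S = {1, 2, 3, 4, 5, 6}
A, B = {2, 4, 6}, {3, 6}

def P(event):
    # Equally likely outcomes: probability = favorable / total.
    return Fraction(len(event), len(S))

lhs = P(A | B)                # P(A or B) = 4/6
rhs = P(A) + P(B) - P(A & B)  # 3/6 + 2/6 - 1/6
print(lhs == rhs)             # True
```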
Conditional Probability and the Multiplication Rule
These two rules are deeply intertwined and fundamental to understanding how events influence each other.
Conditional Probability is the probability that event A will occur, given that event B has already occurred. Its formula is:
P(A∣B)=P(A∩B)/P(B)
Example: A fair die is rolled. What is the probability that the number is a five (A), given that it is odd (B)?
Event A = {5}, so P(A)=1/6.
Event B = {1,3,5}, so P(B)=3/6=1/2.
The intersection A∩B is {5}, so P(A∩B)=1/6.
P(A∣B)=(1/6)/(1/2)=1/3.
This makes sense intuitively: if we know the number is odd, there are only three possibilities (1, 3, 5), and only one of them is a five.
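The conditional-probability formula is a one-liner once the events are sets. A sketch of the die example, using exact fractions:

```python
from fractions import Fraction

# Conditional probability P(A|B) = P(A∩B) / P(B) for the die example.
S = {1, 2, 3, 4, 5, 6}
A, B = {5}, {1, 3, 5}

def P(event):
    return Fraction(len(event), len(S))

p_A_given_B = P(A & B) / P(B)  # (1/6) / (1/2)
print(p_A_given_B)             # 1/3
```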
The Multiplication Rule is a rearrangement of the conditional probability formula and is used to find the probability of two events happening in sequence:
P(A and B)=P(A)⋅P(B∣A)
Example: In a factory, there are 100 units, 5 of which are defective. We randomly pick three units without replacement. What is the probability that none of them are defective?
The probability the 1st unit is good is 95/100.
Given the 1st was good, the probability the 2nd is good is 94/99.
Given the first two were good, the probability the 3rd is good is 93/98.
The total probability is the product:
P(all three are good)=95/100⋅94/99⋅93/98≈0.8560.
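The chained multiplication rule is easy to express as a loop: each draw without replacement removes one good unit and one unit from the total. A sketch of the factory example:

```python
from fractions import Fraction

# Three good units drawn without replacement: multiply the conditional
# probabilities 95/100 * 94/99 * 93/98.
good, total = 95, 100
p = Fraction(1)
for i in range(3):
    p *= Fraction(good - i, total - i)
print(float(p))  # ≈ 0.856
```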
Advanced Tools for Complex Problems
Now we move to two powerful theorems that are workhorses in the fields of statistics and machine learning.
The Law of Total Probability
This law is used to find the probability of an event when its outcome depends on a previous, uncertain stage. It works by considering every possible "cause" or starting scenario, weighting each by its own probability, and summing the results. This requires a complete system of events, meaning the starting scenarios are mutually exclusive and cover all possibilities.
The formula is:
P(H)=∑P(Ai)⋅P(H∣Ai).
Example: You have three bags, each with 100 marbles.
Bag 1: 75 red, 25 blue.
Bag 2: 60 red, 40 blue.
Bag 3: 45 red, 55 blue. You choose a bag at random and then pick one marble. What is the probability the marble is red (R)?
The probability of choosing any bag is 1/3. So,
P(B1)=P(B2)=P(B3)=1/3.
The conditional probabilities are:
P(R∣B1)=0.75, P(R∣B2)=0.60, and P(R∣B3)=0.45.
Using the formula:
P(R)=P(R∣B1)P(B1)+P(R∣B2)P(B2)+P(R∣B3)P(B3).
P(R)=(0.75⋅1/3)+(0.60⋅1/3)+(0.45⋅1/3)=0.60.
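The weighted sum in the Law of Total Probability translates directly to code. A sketch of the marble example with exact fractions (0.75 = 3/4, 0.60 = 3/5, 0.45 = 9/20):

```python
from fractions import Fraction

# Law of total probability: P(R) = sum over bags of P(R|Bag) * P(Bag).
p_bag = Fraction(1, 3)  # each bag equally likely
p_red_given_bag = [Fraction(3, 4), Fraction(3, 5), Fraction(9, 20)]

p_red = sum(p * p_bag for p in p_red_given_bag)
print(p_red)  # 3/5, i.e. 0.60
```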
Bayes' Rule: The Art of Reversing Probability
If Total Probability lets us calculate the probability of an effect based on its causes, Bayes' Rule does the opposite: it lets us calculate the probability of a specific cause, given that we've observed an effect. It's a method for updating our beliefs in light of new evidence.
The formula is:
P(Ai∣H)=P(H∣Ai)⋅P(Ai)/P(H)
Example: Let's continue with the marble problem. Suppose you performed the experiment and observed that the chosen marble is red. What is now the probability that you chose Bag 1?
We are looking for P(B1∣R).
From our previous calculation, we know the denominator, P(R)=0.60.
The numerator is P(R∣B1)P(B1)=0.75⋅1/3=0.25.
Plugging this into Bayes' Rule:
P(B1∣R)=0.25/0.60=5/12.
Initially, there was a 1/3 (or 4/12) chance you picked Bag 1. After observing a red marble, the probability has increased to 5/12 because Bag 1 has the most red marbles.
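Bayes' Rule reuses the total-probability result as its denominator, which the sketch below makes explicit (same marble numbers as above):

```python
from fractions import Fraction

# Bayes' rule: P(B1 | R) = P(R | B1) * P(B1) / P(R).
p_bag = Fraction(1, 3)
p_red_given_bag = [Fraction(3, 4), Fraction(3, 5), Fraction(9, 20)]

# Denominator via the law of total probability.
p_red = sum(p * p_bag for p in p_red_given_bag)  # 3/5

p_b1_given_red = p_red_given_bag[0] * p_bag / p_red
print(p_b1_given_red)  # 5/12
```

Note how the prior P(B1) = 1/3 is updated to the posterior 5/12 by a single multiply-and-divide.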
This ability to "reverse" the probability is what makes Bayes' Rule so crucial for things like medical diagnoses (probability of a disease given a symptom) and spam filtering (probability an email is spam given it contains the word "offer").
Source: Documents/2025-6/M02W02 - Xác suất cơ bản/[Slide]-Basic-Probability.pdf