Naive Bayes Classifier: ML Algorithm

Vishal Rewaskar

While watching a tutorial on Naive Bayes, I was confused. What the heck is this? I understood how the probability works and the mathematical intuition behind the theorem, but I had no idea where, when, or why I should use it.

When I focused on the name itself, the answers became clearer, as the breakdown below shows.

Decompose the name

The ‘Naive’ part comes from the assumption that the features are independent of one another given the class, even when in reality they are not. This assumption keeps the algorithm fast, lets it work with small datasets, and often still gives good accuracy. The ‘Bayes’ part refers to Bayes' theorem. For classification problems, it is a solid first choice.

Now let’s see the proper definition.

Naive Bayes uses probability to determine how likely it is that something belongs to a certain category, given some known data.

To understand how we apply this theorem, let's first look at the mathematical intuition behind it.

Bayes' theorem gives the probability of event A, given that event B has already occurred:

P(A|B) = P(B|A) × P(A) / P(B)

Where,

  • P(A|B) => Probability of event A, given B has occurred

  • P(A) => Probability of event A

  • P(B) => Probability of event B

  • P(B|A) => Probability of event B, given A has occurred
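
To make the relationship concrete, Bayes' theorem, P(A|B) = P(B|A) × P(A) / P(B), can be written as a tiny function. This is only a sketch: the function name `bayes_posterior` and the numbers passed to it are made up for illustration and do not come from the tutorial.

```python
def bayes_posterior(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b


# Hypothetical inputs: P(B|A) = 0.8, P(A) = 0.3, P(B) = 0.5
posterior = bayes_posterior(0.8, 0.3, 0.5)
print(round(posterior, 2))
```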

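We have input features (events that have already occurred), i.e., x1, x2, and x3. Using these, we are going to calculate (predict) y. Because the features are assumed to be independent given y, the formula looks like this:

P(y|x1,x2,x3) = P(y) × P(x1|y) × P(x2|y) × P(x3|y) / P(x1,x2,x3)

For new test data, the denominator is the same for every value of y, so we can drop it and compare only the numerators:

P(y|x1,x2,x3) ∝ P(y) × P(x1|y) × P(x2|y) × P(x3|y)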
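
A quick way to convince yourself that dropping the denominator is harmless: dividing every class score by the same constant cannot change which class has the highest score. The sketch below uses two hypothetical classes with made-up probabilities (`class_a`, `class_b`, and all the numbers are assumptions for illustration only).

```python
from math import prod


def unnormalized_score(prior, likelihoods):
    """P(y) * P(x1|y) * ... * P(xn|y): the numerator of the naive Bayes formula."""
    return prior * prod(likelihoods)


# Hypothetical two-class problem with three features per class.
scores = {
    "class_a": unnormalized_score(0.6, [0.5, 0.2, 0.1]),
    "class_b": unnormalized_score(0.4, [0.3, 0.4, 0.5]),
}

# The sum of the scores plays the role of the shared denominator.
evidence = sum(scores.values())
posteriors = {label: score / evidence for label, score in scores.items()}

best_by_score = max(scores, key=scores.get)
best_by_posterior = max(posteriors, key=posteriors.get)
print(best_by_score, best_by_posterior)  # the same label both times
```

Normalizing only matters when you want to report an actual probability rather than just the winning class.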

Example:

This dataset shows data collected over 14 days. It records three things each day:

  • Weather (Sunny, Overcast, Rain)

  • Temperature (Hot, Mild, Cool)

  • Whether tennis was played or not (Yes or No)

Problem Statement:
Given that the weather is Sunny and the temperature is Hot, predict whether tennis will be played or not.

Step 1: Calculate Base Probabilities

First, count how many times tennis was played vs not played:

  • Tennis played (Yes): 9 days out of 14

  • Tennis not played (No): 5 days out of 14

Base probabilities:

  • P(Yes) = 9/14 ≈ 0.643

  • P(No) = 5/14 ≈ 0.357
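
The base probabilities are just class counts divided by the total number of days. A minimal sketch using the counts quoted above (the variable names are my own, and `Fraction` is used only to keep the values exact):

```python
from fractions import Fraction

# Class counts taken from Step 1: 9 "Yes" days and 5 "No" days.
class_counts = {"Yes": 9, "No": 5}
total_days = sum(class_counts.values())

priors = {label: Fraction(n, total_days) for label, n in class_counts.items()}

for label, p in priors.items():
    print(f"P({label}) = {p} ≈ {float(p):.3f}")
```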

Step 2: Calculate Conditional Probabilities

Weather Conditions

Count occurrences for each weather type when tennis was/wasn't played:

When Tennis = Yes (9 days):

  • Sunny: 2 days → P(Sunny|Yes) = 2/9

  • Overcast: 4 days → P(Overcast|Yes) = 4/9

  • Rain: 3 days → P(Rain|Yes) = 3/9

When Tennis = No (5 days):

  • Sunny: 3 days → P(Sunny|No) = 3/5

  • Overcast: 0 days → P(Overcast|No) = 0/5

  • Rain: 2 days → P(Rain|No) = 2/5

Temperature Conditions

Count occurrences for each temperature when tennis was/wasn't played:

When Tennis = Yes (9 days):

  • Hot: 2 days → P(Hot|Yes) = 2/9

  • Mild: 4 days → P(Mild|Yes) = 4/9

  • Cool: 3 days → P(Cool|Yes) = 3/9

When Tennis = No (5 days):

  • Hot: 2 days → P(Hot|No) = 2/5

  • Mild: 2 days → P(Mild|No) = 2/5

  • Cool: 1 day → P(Cool|No) = 1/5
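
The tallies in Step 2 can be turned into conditional probabilities mechanically: divide each count by the number of days in that class. The sketch below copies the counts from the lists above into a nested dictionary; the layout `counts[feature][label][value]` and the helper name `conditional` are assumptions of mine, not something from the tutorial.

```python
from fractions import Fraction

# counts[feature][label][value], copied from the Step 2 tallies.
counts = {
    "Weather": {
        "Yes": {"Sunny": 2, "Overcast": 4, "Rain": 3},
        "No": {"Sunny": 3, "Overcast": 0, "Rain": 2},
    },
    "Temperature": {
        "Yes": {"Hot": 2, "Mild": 4, "Cool": 3},
        "No": {"Hot": 2, "Mild": 2, "Cool": 1},
    },
}


def conditional(feature, value, label):
    """Return P(value | label) for one feature, e.g. P(Sunny | Yes)."""
    per_label = counts[feature][label]
    return Fraction(per_label[value], sum(per_label.values()))


print(conditional("Weather", "Sunny", "Yes"))    # P(Sunny|Yes)
print(conditional("Temperature", "Hot", "No"))   # P(Hot|No)
```

Note that `Fraction` reduces automatically, so 3/9 is shown as 1/3.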

Step 3: Apply Naive Bayes Formula

For our specific question (Sunny AND Hot):

Probability of Playing Tennis:

P(Yes|Sunny,Hot) ∝ P(Yes) × P(Sunny|Yes) × P(Hot|Yes)
P(Yes|Sunny,Hot) ∝ (9/14) × (2/9) × (2/9)
P(Yes|Sunny,Hot) ∝ 0.643 × 0.222 × 0.222
P(Yes|Sunny,Hot) ∝ 0.0317

Probability of NOT Playing Tennis:

P(No|Sunny,Hot) ∝ P(No) × P(Sunny|No) × P(Hot|No)
P(No|Sunny,Hot) ∝ (5/14) × (3/5) × (2/5)
P(No|Sunny,Hot) ∝ 0.357 × 0.6 × 0.4
P(No|Sunny,Hot) ∝ 0.0857

Step 4: Normalize to Get Final Percentages

These two values are unnormalized scores (the shared denominator was dropped), so they do not sum to 1. To get probabilities that sum to 100%, we divide each score by their total:

Total = 0.0317 + 0.0857 = 0.1174

Final probabilities:

  • P(Yes|Sunny,Hot) = 0.0317 ÷ 0.1174 ≈ 0.27 = 27%

  • P(No|Sunny,Hot) = 0.0857 ÷ 0.1174 ≈ 0.73 = 73%
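
Steps 3 and 4 can be checked with a few lines of Python. This is a sketch that hard-codes the probabilities from Steps 1 and 2 for the Sunny and Hot case; the dictionary layout and variable names are my own choices, and exact fractions are used so that rounding does not hide anything.

```python
from fractions import Fraction

# Base probabilities from Step 1.
priors = {"Yes": Fraction(9, 14), "No": Fraction(5, 14)}

# Conditional probabilities from Step 2, only for the observed values.
likelihoods = {
    "Yes": {"Sunny": Fraction(2, 9), "Hot": Fraction(2, 9)},
    "No": {"Sunny": Fraction(3, 5), "Hot": Fraction(2, 5)},
}

observed = ["Sunny", "Hot"]

# Step 3: unnormalized score = P(y) * P(Sunny|y) * P(Hot|y)
scores = {}
for label, prior in priors.items():
    score = prior
    for value in observed:
        score *= likelihoods[label][value]
    scores[label] = score

# Step 4: divide by the total so the two values sum to 1.
total = sum(scores.values())
posteriors = {label: score / total for label, score in scores.items()}
prediction = max(posteriors, key=posteriors.get)

for label, p in posteriors.items():
    print(f"P({label}|Sunny,Hot) = {p} ≈ {float(p):.2f}")
print("Prediction:", prediction)
```

With exact arithmetic the two probabilities come out as 10/37 and 27/37, which round to the 27% and 73% shown above.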

Conclusion

When the weather is Sunny and temperature is Hot, there is a:

  • 27% chance tennis will be played

  • 73% chance tennis will NOT be played

This makes sense when you look at the data: on both days when it was sunny and hot (days 1 and 2), tennis was not played!
