Day 14: Naive Bayes Classifier – "Probability Meets Simplicity"

Imagine you’re a doctor diagnosing a disease. You ask questions (symptoms), and based on probability tables you’ve built from past patients, you say “Hmm… 85% chance it’s the flu.” That’s Naive Bayes in action.
Naive Bayes is a probabilistic classifier based on Bayes’ Theorem. It assumes all features are independent of each other (the “naive” assumption) and uses probabilities to predict the most likely class for a given input.
Despite its simplicity, it’s a powerful tool, especially for text classification, spam detection, and medical diagnosis. Now that’s aura!!
Formula
Naive Bayes is relatively simple to understand; you will see once we have a look at the formula, which is just Bayes’ Theorem:

P(A | B) = [ P(B | A) × P(A) ] / P(B)

P(A | B): Probability of event A happening given B happened
P(B | A): Probability of B happening given A happened
P(A): Probability of A happening
P(B): Probability of B happening
Let’s say you're classifying emails as Spam or Not Spam.
You're analyzing if the word “Free” appears.
P(Spam): How often spam appears (say 60%)
P( "Free" | Spam ): Probability "Free" appears in spam
P("Free"): How often “Free” appears overall
Then compute P( Spam | "Free" ): How likely an email is spam given it contains "Free"
This allows you to calculate which class (Spam or Not Spam) has a higher probability for a given email.
As simple as that.
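To make this concrete, here is a minimal sketch that plugs numbers into the formula. The 60% spam rate comes from the example above; the other two probabilities are assumed values, purely for illustration:

```python
# Worked Bayes' Theorem example with assumed (made-up) probabilities
p_spam = 0.60              # P(Spam): 60% of emails are spam (from the example above)
p_free_given_spam = 0.40   # P("Free" | Spam): assumed value for illustration
p_free = 0.30              # P("Free"): assumed value for illustration

# Bayes' Theorem: P(Spam | "Free") = P("Free" | Spam) * P(Spam) / P("Free")
p_spam_given_free = (p_free_given_spam * p_spam) / p_free
print(f"P(Spam | 'Free') = {p_spam_given_free:.2f}")  # 0.80
```

Under these assumed numbers, an email containing “Free” is 80% likely to be spam. A Naive Bayes classifier does exactly this for every class and picks the one with the highest posterior probability.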
Types of Naive Bayes
| Type | Use Case | Distribution |
| --- | --- | --- |
| Gaussian | Continuous data (e.g., age, salary) | Normal distribution |
| Multinomial | Word counts (NLP, emails) | Count-based |
| Bernoulli | Binary/Boolean features (yes/no) | Binary outcomes |
| Complement | Imbalanced text (sentiment, legal documents) | Count-based (complement of each class) |
Gaussian Naive Bayes assumes that each feature is normally distributed, i.e., it follows a bell curve. For example, features like:
Blood pressure
Age
Tumor radius
Sensor readings
These are continuous, not discrete or binary. Imagine you're a doctor diagnosing diseases. You don’t look for keywords like “fever” or “cough”; you look at measurable features: blood test values, temperature, etc.
If you’ve seen that people with cancer often have a tumor radius > 15mm, then the presence of this feature makes it more likely the person is diagnosed as malignant.
In the backend, it uses the probability density function of the normal distribution:

P(x | C) = (1 / (σ√(2π))) · exp( −(x − μ)² / (2σ²) )

x = feature value
μ = mean of the feature for class C
σ = standard deviation of the feature for class C
It does this for each feature, multiplies them, and gives you a probability per class.
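To see what this looks like in practice, here is a minimal sketch (with made-up means and standard deviations) that scores a single feature value against two classes by hand:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Probability density of a normal distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Assumed (made-up) statistics for one feature: tumor radius in mm
# Benign class: mean 10 mm, std 3 mm; Malignant class: mean 18 mm, std 4 mm
x = 16.0
likelihood_benign = gaussian_pdf(x, mu=10.0, sigma=3.0)
likelihood_malignant = gaussian_pdf(x, mu=18.0, sigma=4.0)

print(f"P(x=16 | Benign)    = {likelihood_benign:.4f}")
print(f"P(x=16 | Malignant) = {likelihood_malignant:.4f}")
# GaussianNB multiplies such terms across all features (and by the class prior)
# and picks the class with the highest product.
```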
Now let’s see the same thing with scikit-learn, with the help of a code example.
Imagine you're building a medical diagnostic system. Each patient has:
Age (years)
Tumor Size (mm)
Your goal: Predict if the tumor is Benign (0) or Malignant (1).
# Gaussian Naive Bayes
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

# Sample data: two continuous features and a binary diagnosis
df = pd.DataFrame({
    'Age': [45, 50, 35, 23, 55, 60],
    'Tumor_Size': [12.5, 15.0, 10.2, 7.1, 20.1, 22.5],
    'Diagnosis': [0, 1, 0, 0, 1, 1]  # 0 = Benign, 1 = Malignant
})
print(df)

X = df[['Age', 'Tumor_Size']]
y = df['Diagnosis']
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Train model
model = GaussianNB()
model.fit(X_train, y_train)

# Manually entered input (as a DataFrame so column names match the training data)
new_patient = pd.DataFrame([[45, 14.0]], columns=['Age', 'Tumor_Size'])  # Age 45, Tumor size 14.0 mm
prediction = model.predict(new_patient)
print("Prediction for new patient:", "Malignant" if prediction[0] == 1 else "Benign")
Age Tumor_Size Diagnosis
0 45 12.5 0
1 50 15.0 1
2 35 10.2 0
3 23 7.1 0
4 55 20.1 1
5 60 22.5 1
Prediction for new patient: Benign
This is a tiny dataset; in the real world, the data may be humongous.
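Continuing from the snippet above, we can also sanity-check the model on the held-out test split and peek at the class probabilities. This is only a sketch; with six samples, the random split makes the numbers swing wildly:

```python
# Evaluate on the held-out test split (only a rough sanity check on such tiny data)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

# Class probabilities for the manually entered patient
print("P(Benign), P(Malignant):", model.predict_proba(new_patient)[0])
```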
Multinomial Naive Bayes assumes each feature is a non-negative integer count, like how many times a word appears. Unlike Gaussian Naive Bayes, which assumes every feature is normally distributed, Multinomial Naive Bayes works directly with raw counts, however skewed they are. This makes it perfect for text classification, since we can infer the class from the occurrence of each word.
Imagine you’re a librarian trying to guess which book genre someone is reading based on the count of words.
Book A has 25 instances of "sword", 14 of "battle", 9 of "king" → Fantasy
Book B has 40 instances of "court", 22 of "law", 12 of "defendant" → Legal
You don’t care where the word appears, just how many times. You assume that each word's contribution is independent of others (even if that’s rarely true in practice).
The classic use case of Multinomial Naive Bayes is an email spam detector. It is also very handy for sentiment analysis on reviews.
We will see the same in action via a code example:
# Multinomial Naive Bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
texts = [
    "free money now",
    "win big prizes",
    "meeting at 3 pm",
    "project deadline",
    "exclusive deal just for you"
]
labels = [1, 1, 0, 0, 1] # 1 = Spam, 0 = Ham
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
model = MultinomialNB()
model.fit(X, labels)
# Predict on a new message
new_message = ["win a free deal"]
new_vector = vectorizer.transform(new_message)
prediction = model.predict(new_vector)
print("Prediction for new message:", "Spam" if prediction[0] == 1 else "Ham")
Prediction for new message: Spam
You may change the message and check the results for yourself. Try training the model on larger data stored in an Excel or CSV file. That should make your spam detection mini project ready…
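If you go that route, here is a minimal sketch of what loading a CSV and training could look like. The file name spam_data.csv and the column names text and label are hypothetical; adjust them to your own dataset:

```python
# Hypothetical CSV with columns "text" and "label" (1 = Spam, 0 = Ham)
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

df = pd.read_csv("spam_data.csv")  # assumed file name
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42
)

vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)  # learn the vocabulary on training data only
X_test_vec = vectorizer.transform(X_test)

model = MultinomialNB()
model.fit(X_train_vec, y_train)
print("Test accuracy:", model.score(X_test_vec, y_test))
```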
Next type of Naive Bayes is Bernoulli Naive Bayes, which assumes each feature is binary (0 or 1), representing presence or absence, not frequency.
It’s like asking:
Does the word “free” appear? → ✔️ or ❌
Is the email length > 1000 characters? → ✔️ or ❌
Bernoulli NB ignores how many times something appears; it only cares whether it’s there or not.
Sometimes presence alone is more telling than frequency.
For example:
- “You won a FREE gift now!”
Even a single occurrence of “free” could be a strong spam signal.
But in Ham messages:
- “Let’s catch up for free this weekend.”
The presence might not mean spam.
Let’s cement the understanding by looking at a code example.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
texts = [
    "win now",
    "important meeting",
    "free gift",
    "see you at dinner",
    "urgent cash offer"
]
labels = [1, 0, 1, 0, 1] # 1 = Spam, 0 = Ham
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(texts)
model = BernoulliNB()
model.fit(X, labels)
# Predict on new message
new_msg = ["urgent free win"]
new_bin_vector = vectorizer.transform(new_msg)
prediction = model.predict(new_bin_vector)
print("Prediction for binary input message:", "Spam" if prediction[0] == 1 else "Ham")
Prediction for binary input message: Spam
Again, it can be trained on larger datasets stored in Excel or CSV files.
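To see what binary=True actually changes, here is a small sketch that compares the binary features with plain counts on a made-up message in which a word repeats:

```python
from sklearn.feature_extraction.text import CountVectorizer

messages = ["free free free gift"]  # "free" repeated on purpose

count_vec = CountVectorizer()              # raw counts (what MultinomialNB consumes)
binary_vec = CountVectorizer(binary=True)  # presence/absence (what BernoulliNB expects)

print(count_vec.fit_transform(messages).toarray())   # [[3 1]] -> "free" counted 3 times
print(binary_vec.fit_transform(messages).toarray())  # [[1 1]] -> only presence recorded
```

Beyond the binarisation, BernoulliNB also explicitly treats the absence of a word as evidence, which is the main modelling difference from MultinomialNB.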
We move on to the last type of Naive Bayes, and that is Complement Naive Bayes. To put it simply, it acts as a balancer. In Multinomial Naive Bayes, for example, there may be cases of imbalanced data: sometimes 95% of emails are ham and only 5% are spam. MultinomialNB would then tend to default to the majority class (Ham), misclassifying the rare Spam cases.
Complement Naive Bayes adjusts the probabilities by learning from the complement of each class, i.e., from the statistics of all the other classes. It also puts more weight on the underrepresented class, giving it a fair shot.
Generalized example : A teacher grading essays may unintentionally favor good grammar over creativity. ComplementNB reminds the teacher to also reward rare creativity, even if grammar errors exist.
For our code example, we will perform sentiment analysis on a small set of reviews. (In practice you would reach for ComplementNB when the classes are imbalanced; this toy set is just to show the API.)
from sklearn.naive_bayes import ComplementNB
from sklearn.feature_extraction.text import CountVectorizer
reviews = [
    "great product quality",
    "amazing experience",
    "bad service",
    "love it",
    "worst ever",
    "terrible support"
]
labels = [1, 1, 0, 1, 0, 0] # 1 = Positive, 0 = Negative
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reviews)
model = ComplementNB()
model.fit(X, labels)
# Predict on new review
new_review = ["best product quality"]
new_review_vector = vectorizer.transform(new_review)
prediction = model.predict(new_review_vector)
print("Prediction for new review:", "Positive" if prediction[0] == 1 else "Negative")
model.predict_proba(new_review_vector)  # access class probabilities using this function
Prediction for new review: Positive
array([[0.21691974, 0.78308026]])
Final verdict cheat-sheet:
| Type | Use When... | Key Difference |
| --- | --- | --- |
| GaussianNB | You have continuous values | Assumes normal distribution |
| MultinomialNB | You count things (word frequencies) | Assumes integer counts, great for text/NLP |
| BernoulliNB | You just care if a feature is present | Binary presence/absence of features |
| ComplementNB | You have class imbalance in text data | Better recall on the minority class |
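As a closing experiment, here is a sketch that runs all four variants on the tiny spam dataset from earlier. GaussianNB needs a dense array and is really the wrong tool for word counts, which is exactly the point of the comparison:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB, ComplementNB

texts = [
    "free money now",
    "win big prizes",
    "meeting at 3 pm",
    "project deadline",
    "exclusive deal just for you"
]
labels = [1, 1, 0, 0, 1]  # 1 = Spam, 0 = Ham

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
new_vector = vectorizer.transform(["win a free deal"])

for name, model in [
    ("GaussianNB", GaussianNB()),
    ("MultinomialNB", MultinomialNB()),
    ("BernoulliNB", BernoulliNB()),
    ("ComplementNB", ComplementNB()),
]:
    if name == "GaussianNB":
        # GaussianNB cannot handle sparse matrices, so densify (fine for toy data only)
        model.fit(X.toarray(), labels)
        pred = model.predict(new_vector.toarray())[0]
    else:
        model.fit(X, labels)
        pred = model.predict(new_vector)[0]
    print(f"{name:>14}: {'Spam' if pred == 1 else 'Ham'}")
```

On a toy set like this, all four may well agree; the differences only really show up on larger, messier data.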
Conclusion
Naive Bayes doesn’t care about feature correlation, yet often performs surprisingly well, especially in NLP. Each Naive Bayes variant has its own use-case sweet spot, and you now know how to choose the right one. Even with strong assumptions, speed and simplicity make Naive Bayes a practical choice, and an ideal baseline.
We had a look at the different types of Naive Bayes and which one is best to use in a given situation.
Try tweaking the values a bit. You can also train the models on larger datasets for future use.
Signing off for today. Ciao!!