Week 7: 2. Naive Bayes


Naive Bayes methods are a set of supervised machine learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable. Bayes’ theorem states the following relationship, given a class variable $y$ and a dependent feature vector $x_1$ through $x_n$:
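$$P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}$$

Under the naive conditional-independence assumption, the likelihood factorizes, so the posterior is proportional to $P(y) \prod_{i=1}^{n} P(x_i \mid y)$.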

In spite of their apparently over-simplified assumptions, naive Bayes classifiers have worked quite well in many real-world situations, famously document classification and spam filtering. They require only a small amount of training data to estimate the necessary parameters.

Naive Bayes learners and classifiers can be extremely fast compared to more sophisticated methods. The decoupling of the class-conditional feature distributions means that each distribution can be independently estimated as a one-dimensional distribution. This in turn helps alleviate problems stemming from the curse of dimensionality.

Multinomial Naive Bayes:

Multinomial Naive Bayes applies the Naive Bayes algorithm to multinomially distributed data, and it is commonly used to categorize text for spam filtering. The algorithm works in 3 major steps:

  1. Calculating the prior probability, i.e. the fraction of training emails that are spam versus not spam.

  2. Calculating the likelihood of each word, i.e. how often it appears in spam emails versus non-spam emails.

  3. Finally, combining the priors and likelihoods via Bayes’ theorem to calculate the posterior probability that an email is spam.

In spam filtering, this algorithm does not consider relations between the different words of an email; it treats every word separately when computing probabilities. Because human language does carry relationships between words through their order, this independence assumption can introduce bias. A minimal sketch of the three steps above follows.
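Below is a minimal sketch of this workflow using scikit-learn's CountVectorizer and MultinomialNB; the tiny email corpus and its labels are made up purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training emails and labels (1 = spam, 0 = not spam).
emails = [
    "win a free prize now",
    "limited offer claim your reward",
    "meeting rescheduled to monday",
    "please review the attached report",
]
labels = [1, 1, 0, 0]

# Steps 1-2: turn emails into word counts; fitting the model estimates
# the class priors and the per-word likelihoods from these counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
model = MultinomialNB()
model.fit(X, labels)

# Step 3: posterior probabilities for a new email via Bayes' theorem.
new_email = vectorizer.transform(["claim your free prize"])
print(model.predict(new_email))        # predicted class label
print(model.predict_proba(new_email))  # posterior probability per class
```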

Gaussian Naive Bayes:

Gaussian Naive Bayes applies the Naive Bayes algorithm to normally distributed data. This model assumes that the features within each class follow a Gaussian distribution. For each class, the algorithm calculates the mean and standard deviation of each feature. When a new data point needs to be classified, it evaluates the probability density function (PDF) of each feature value under that feature's Gaussian distribution within each class. The product of these PDFs, together with the prior probability of each class, gives the overall probability of the data point belonging to that class, and the algorithm assigns the point to the class with the highest probability.
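To make the mechanics concrete, here is a minimal from-scratch sketch of these steps in NumPy; the two-feature toy dataset is made up for illustration, and in practice a library implementation such as scikit-learn's GaussianNB would typically be used instead.

```python
import numpy as np

# Hypothetical two-feature training data with two classes.
X = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]])
y = np.array([0, 0, 1, 1])

classes = np.unique(y)
# Per class: prior probability, and mean / standard deviation of each feature.
priors = {c: np.mean(y == c) for c in classes}
means = {c: X[y == c].mean(axis=0) for c in classes}
stds = {c: X[y == c].std(axis=0) for c in classes}

def gaussian_pdf(x, mu, sigma):
    """Density of x under a Gaussian with mean mu and standard deviation sigma."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def predict(x):
    # Score each class by prior * product of per-feature PDFs,
    # then assign the point to the class with the highest score.
    scores = {c: priors[c] * np.prod(gaussian_pdf(x, means[c], stds[c]))
              for c in classes}
    return max(scores, key=scores.get)

print(predict(np.array([1.1, 2.0])))  # close to the class 0 samples -> 0
print(predict(np.array([4.0, 4.0])))  # close to the class 1 samples -> 1
```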
