Naive Bayes is a classifier that applies Bayes’ theorem under a ‘naive’ independence assumption.

Suppose we are solving a classification problem, with features denoted as $\mathbf X$ and class labels as $\mathbf Y$. We would like to train a classifier that predicts the class given some feature values. Bayes’ theorem tells us the probability

$$
P(\mathbf Y \mid \mathbf X) = \frac{P(\mathbf X \mid \mathbf Y) \, P(\mathbf Y)}{P(\mathbf X)}.
$$

Why Don't We Just Calculate $P(\mathbf Y \mid \mathbf X)$?

Being naive, we assume that the features are independent of each other, i.e., they do not interact with each other in terms of predictions. In this case we simply write the theorem as

$$
P(\mathbf Y \mid \mathbf X) = \frac{P(\mathbf Y) \prod_i P(X_i \mid \mathbf Y)}{\prod_i P(X_i)}.
$$

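We do not care about the denominator $\prod_i P(X_i)$ because it only serves as a normalization factor: it is the same for every class, so it does not change which class scores highest. Besides, it could be hard to calculate in some cases. Dropping it, we are left with

$$
\begin{equation}
P(\mathbf Y \mid \mathbf X) \propto P(\mathbf Y) \prod_i P(X_i \mid \mathbf Y).
\label{eq-naive-approximation}
\end{equation}
$$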
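To make the role of the normalization factor concrete, here is a minimal Python sketch of scoring classes with the prior times the product of per-feature likelihoods. The class names, the prior, and the likelihood table are hypothetical toy numbers chosen for illustration, not values estimated from any dataset.

```python
# Hypothetical toy model: two classes and three binary features.
prior = {"spam": 0.4, "ham": 0.6}

# likelihood[c][i] is an assumed value for P(X_i = 1 | Y = c).
likelihood = {
    "spam": [0.8, 0.6, 0.1],
    "ham": [0.2, 0.3, 0.5],
}


def unnormalized_posterior(x, c):
    """Return P(Y = c) * prod_i P(X_i = x_i | Y = c) for binary features x."""
    score = prior[c]
    for x_i, p in zip(x, likelihood[c]):
        score *= p if x_i == 1 else 1.0 - p
    return score


def predict(x):
    """Pick the class with the largest unnormalized score."""
    scores = {c: unnormalized_posterior(x, c) for c in prior}
    # Normalizing over classes is optional; it does not change the argmax.
    total = sum(scores.values())
    posteriors = {c: s / total for c, s in scores.items()}
    return max(scores, key=scores.get), posteriors


label, posteriors = predict([1, 1, 0])
print(label, posteriors)
```

The predicted label only depends on the unnormalized scores. Dividing by their sum over classes is enough to turn them into probabilities that add up to 1, so the denominator never has to be evaluated on its own.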


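In Eq. $\eqref{eq-naive-approximation}$, we have many probabilities multiplied together. Probabilities are no larger than 1, so the product is usually tiny and can underflow floating-point arithmetic. We therefore take the logarithm of both sides, which turns the product into a sum of moderately sized numbers:

$$
\log P(\mathbf Y \mid \mathbf X) = \log P(\mathbf Y) + \sum_i \log P(X_i \mid \mathbf Y) + \mathrm{const}.
$$

Because the logarithm is monotonic, the class with the largest log score is also the most probable class.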
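The underflow problem is easy to reproduce. The following sketch assumes a hypothetical setup with 1000 features whose per-feature likelihood is the same within each class (0.01 for class `a`, 0.02 for class `b`) and equal priors of 0.5. These numbers are made up purely to trigger the effect.

```python
import math

n_features = 1000
prior = {"a": 0.5, "b": 0.5}
# Assumed per-feature likelihood P(X_i | Y), identical for all features.
per_feature = {"a": 0.01, "b": 0.02}


def direct_score(c):
    """Multiply the probabilities directly; underflows for many features."""
    score = prior[c]
    for _ in range(n_features):
        score *= per_feature[c]
    return score


def log_score(c):
    """Sum log-probabilities instead of multiplying probabilities."""
    return math.log(prior[c]) + sum(
        math.log(per_feature[c]) for _ in range(n_features)
    )


direct = {c: direct_score(c) for c in prior}
logs = {c: log_score(c) for c in prior}

# Both direct scores collapse to 0.0, so the classes cannot be compared.
print(direct)
# The log scores stay finite and still rank the classes.
print(logs, max(logs, key=logs.get))
```

In double precision both direct products collapse to zero, while the log scores remain ordinary negative numbers that can still be compared.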

Other Topics

  1. Laplace Correction
  2. Continuous Values for $\mathbf X$: Gaussian Naive Bayes, etc.