Bias and Variance

Suppose we have a perfect model $f(X)$ that describes the dataset $(X,Y)$ exactly, up to some irreducible error $\epsilon$,

$$
\begin{equation}
Y = f(X) + \epsilon.
\label{dataset-using-true-model}
\end{equation}
$$

On the other hand, we could fit another model using a specific method such as k-nearest neighbors, which we denote as $k(X)$.

Why the two models?

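What is our bias? It measures the deficit between the average prediction of $k(X)$ and the perfect model $f(X)$,

$$
\operatorname{Bias}[k(X)] = \operatorname{E}[k(X)] - f(X),
$$

where the expectation $\operatorname{E}$ is taken over training datasets.

Zero bias means that, on average, we are matching the perfect model.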


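What is variance? Variance is about the model itself: it measures how much $k(X)$ fluctuates around its own mean,

$$
\operatorname{Var}[k(X)] = \operatorname{E}\left[ \left( k(X) - \operatorname{E}[k(X)] \right)^2 \right].
$$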

The larger the variance, the more wiggly the model is.
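
Both quantities can be estimated numerically by refitting the model on many freshly drawn training sets. The following is a minimal sketch, not a definitive implementation: the sine function used as the perfect model, the noise level `sigma`, the query point `x0`, and the helper names `knn_predict` and `bias_and_variance` are all assumptions made for illustration.

```python
import numpy as np

def f(x):
    # Assumed "perfect model", chosen only for this illustration.
    return np.sin(2 * np.pi * x)

def knn_predict(x_train, y_train, x0, k):
    # Average the targets of the k training points closest to x0.
    nearest = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[nearest].mean()

def bias_and_variance(k, x0=0.25, n_train=50, n_repeats=2000, sigma=0.3, seed=0):
    rng = np.random.default_rng(seed)
    preds = np.empty(n_repeats)
    for i in range(n_repeats):
        # Draw a fresh training set Y = f(X) + eps on every repeat.
        x_train = rng.uniform(0.0, 1.0, n_train)
        y_train = f(x_train) + rng.normal(0.0, sigma, n_train)
        preds[i] = knn_predict(x_train, y_train, x0, k)
    bias = preds.mean() - f(x0)  # estimate of E[k(x0)] - f(x0)
    variance = preds.var()       # estimate of E[(k(x0) - E[k(x0)])^2]
    return bias, variance

bias_1, var_1 = bias_and_variance(k=1)
bias_25, var_25 = bias_and_variance(k=25)
print(f"k=1:  bias={bias_1:+.3f}, variance={var_1:.3f}")
print(f"k=25: bias={bias_25:+.3f}, variance={var_25:.3f}")
```

With these assumed settings, the single-neighbor model follows the noise closely (small bias, large variance), while averaging over 25 neighbors smooths the prediction (small variance) at the cost of flattening the peak of the sine (larger bias).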

Mean Squared Error

Bias measures the deficit between the specific model and the perfect model. How do we measure the deficit between the specific model and the actual data point? We need Mean Squared Error (MSE).

The Mean Squared Error (MSE) is defined as

$$
\operatorname{MSE}(X) = \operatorname{E}\left[ \left( Y - k(X) \right)^2 \right].
$$

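A straightforward decomposition using equation ($\ref{dataset-using-true-model}$) shows that the MSE has three components. To keep the equations compact, we drop the $(X)$, so $k$ means $k(X)$ and $f$ means $f(X)$:

$$
\begin{aligned}
\operatorname{MSE} &= \operatorname{E}\left[ (Y - k)^2 \right] \\
&= \underbrace{\left( \operatorname{E}[k] - f \right)^2}_{\text{Bias}^2} + \underbrace{\operatorname{E}\left[ \left( k - \operatorname{E}[k] \right)^2 \right]}_{\text{Variance}} + \underbrace{\operatorname{E}\left[ \epsilon^2 \right]}_{\text{Irreducible Error}}.
\end{aligned}
$$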
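
For readers who want the intermediate steps, the expansion can be sketched as follows. This assumes that the noise $\epsilon$ has zero mean and is independent of the fitted model $k$, and that $f$ is a fixed (non-random) function; $f$ is shorthand for $f(X)$.

$$
\begin{aligned}
\operatorname{E}\left[ (Y - k)^2 \right] &= \operatorname{E}\left[ (f + \epsilon - k)^2 \right] \\
&= \operatorname{E}\left[ (f - k)^2 \right] + 2 \operatorname{E}[\epsilon] \operatorname{E}[f - k] + \operatorname{E}\left[ \epsilon^2 \right] \\
&= \operatorname{E}\left[ (f - k)^2 \right] + \operatorname{E}\left[ \epsilon^2 \right].
\end{aligned}
$$

Writing $f - k = (f - \operatorname{E}[k]) + (\operatorname{E}[k] - k)$ and using $\operatorname{E}\left[ \operatorname{E}[k] - k \right] = 0$ splits the first term further:

$$
\operatorname{E}\left[ (f - k)^2 \right] = \left( \operatorname{E}[k] - f \right)^2 + \operatorname{E}\left[ \left( k - \operatorname{E}[k] \right)^2 \right].
$$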

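The Irreducible Error term reduces to $\operatorname{E}(\epsilon^2)$, the variance of the noise, because the mean of the irreducible error is required to be zero, $\operatorname{E}(\epsilon)=0$, so the cross terms involving $\epsilon$ vanish. If the mean were not zero, the model $f(X)$ would not be perfect, since the offset could be absorbed into $f(X)$.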
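
The three-term decomposition can also be checked numerically. The sketch below compares a Monte Carlo estimate of the MSE at one query point against the sum of the estimated squared bias, the estimated variance, and the known noise variance. The sine function, `sigma`, `x0`, and the helper names are illustrative assumptions, not part of the original setup.

```python
import numpy as np

def f(x):
    # Assumed "perfect model", chosen only for this illustration.
    return np.sin(2 * np.pi * x)

def knn_predict(x_train, y_train, x0, k):
    # Average the targets of the k training points closest to x0.
    nearest = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[nearest].mean()

def check_decomposition(k=5, x0=0.25, n_train=40, n_repeats=20000, sigma=0.5, seed=1):
    rng = np.random.default_rng(seed)
    preds = np.empty(n_repeats)
    for i in range(n_repeats):
        # Fresh training set Y = f(X) + eps on every repeat.
        x_train = rng.uniform(0.0, 1.0, n_train)
        y_train = f(x_train) + rng.normal(0.0, sigma, n_train)
        preds[i] = knn_predict(x_train, y_train, x0, k)
    # Fresh targets at x0, independent of every training set.
    y0 = f(x0) + rng.normal(0.0, sigma, n_repeats)
    mse = np.mean((y0 - preds) ** 2)
    bias_sq = (preds.mean() - f(x0)) ** 2
    variance = preds.var()
    irreducible = sigma ** 2  # E[eps^2], known here because we chose sigma
    return mse, bias_sq, variance, irreducible

mse, bias_sq, variance, irreducible = check_decomposition()
print(f"MSE estimate:               {mse:.4f}")
print(f"bias^2 + variance + noise:  {bias_sq + variance + irreducible:.4f}")
```

The two printed numbers should agree up to Monte Carlo error, which shrinks as `n_repeats` grows.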

Bias-Variance Tradeoff

The more parameters we introduce in the model, the more likely we are to reduce the bias. However, at some point, the added complexity makes the model more wiggly, so the variance grows.
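
The tradeoff can be made visible by sweeping the complexity of the model. For k-nearest neighbors there is no explicit parameter count, so this sketch uses the number of neighbors as the complexity knob, with fewer neighbors meaning a more flexible model. The sine target, the noise level, the evaluation grid, and the function name `bias2_and_variance` are assumptions for illustration only.

```python
import numpy as np

def f(x):
    # Assumed "perfect model", chosen only for this illustration.
    return np.sin(2 * np.pi * x)

def bias2_and_variance(k, x_grid, n_train=60, n_repeats=300, sigma=0.3, seed=0):
    rng = np.random.default_rng(seed)
    preds = np.empty((n_repeats, x_grid.size))
    for i in range(n_repeats):
        x_train = rng.uniform(0.0, 1.0, n_train)
        y_train = f(x_train) + rng.normal(0.0, sigma, n_train)
        # For every grid point, find its k nearest training points.
        dist = np.abs(x_grid[:, None] - x_train[None, :])
        nearest = np.argsort(dist, axis=1)[:, :k]
        preds[i] = y_train[nearest].mean(axis=1)
    # Average squared bias and variance over the grid.
    bias_sq = np.mean((preds.mean(axis=0) - f(x_grid)) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance

x_grid = np.linspace(0.05, 0.95, 19)
results = {k: bias2_and_variance(k, x_grid) for k in (1, 5, 15, 40)}
for k, (b2, v) in results.items():
    print(f"k={k:2d}: bias^2={b2:.4f}, variance={v:.4f}, sum={b2 + v:.4f}")
```

Reading the output from large to small number of neighbors corresponds to increasing complexity: the squared bias shrinks while the variance grows, and their sum is smallest somewhere in between.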

Free Parameters