Linear Regression and Likelihood

The linear estimator $y$ is

$$
y^n = \beta^m X_m^{\phantom{m}n}.
$$

As usual, we have redefined our data to get rid of the intercept $\beta^0$, absorbing it into $X_m^{\phantom{m}n}$ as a constant column.
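For concreteness, here is a minimal numerical sketch of the estimator (the variable names and data are hypothetical; the constant column absorbs the intercept):

```python
import numpy as np

rng = np.random.default_rng(0)

# Raw design matrix with 200 samples and 2 features.
X_raw = rng.normal(size=(200, 2))

# Absorb the intercept beta^0 into X by prepending a constant column of ones.
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

beta = np.array([0.7, 1.5, -2.0])  # the first entry plays the role of beta^0
y = X @ beta                       # the linear estimator y^n = beta^m X_m^n
```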

In ordinary linear models, the error is the difference between the target $\hat y$ and the estimator $y$,

$$
\epsilon^n = \hat y^n - y^n = \hat y^n - \beta^m X_m^{\phantom{m}n},
$$

which we require to be as small as possible overall, e.g., by minimizing the sum of squared errors.

We could use least squares to solve the problem. However, instead of using a deterministic estimator $\beta^m X_m^{\phantom{m}n}$, we assume a Gaussian random estimator

$$
y^n \sim \mathcal{N}\left( \beta^m X_m^{\phantom{m}n}, \sigma^2 \right),
$$

where we have used the knowledge of linear regression that the mean of the estimator should be the linear model $\beta^m X_m^{\phantom{m}n}$. The likelihood becomes

$$
L = P(\hat y^n \mid X_m^{\phantom{m}n}, \beta^m) = \prod_n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{ \left( \hat y^n - \beta^m X_m^{\phantom{m}n} \right)^2 }{ 2\sigma^2 } \right).
$$
It is not surprising that maximizing this likelihood leads to the same result as least squares, since taking the negative log removes the exponential,

$$
-\ln L = \frac{1}{2\sigma^2} \sum_n \left( \hat y^n - \beta^m X_m^{\phantom{m}n} \right)^2 + \text{const},
$$

so the maximum-likelihood solution is exactly the one that minimizes the sum of squared errors.
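As a quick numerical check of this equivalence, the following sketch (with synthetic data and hypothetical names) maximizes the Gaussian log-likelihood numerically and compares the result with the closed-form least-squares solution:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic data: targets are a linear model plus Gaussian noise.
X = rng.normal(size=(200, 3))
beta_true = np.array([1.5, -2.0, 0.5])
y_hat = X @ beta_true + rng.normal(scale=0.3, size=200)

# Negative log-likelihood of the Gaussian model with fixed sigma;
# up to constants this is the sum of squared errors over 2 sigma^2.
def neg_log_likelihood(beta, sigma=0.3):
    residual = y_hat - X @ beta
    return np.sum(residual**2) / (2 * sigma**2)

beta_mle = minimize(neg_log_likelihood, x0=np.zeros(3)).x
beta_lsq, *_ = np.linalg.lstsq(X, y_hat, rcond=None)

print(np.allclose(beta_mle, beta_lsq, atol=1e-4))  # expected: True
```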

Bayesian Linear Model

Applying Bayes’ theorem to this problem,

$$
P([X_m^{\phantom{m}n}, \beta^m] \mid \hat y^n) = \frac{ P(\hat y^n \mid X_m^{\phantom{m}n}, \beta^m) \, P([X_m^{\phantom{m}n}, \beta^m]) }{ {\color{red}P(\hat y^n)} }.
$$
Since ${\color{red}P(\hat y^n)}$ doesn’t depend on the parameters and is a constant, we will ignore it for the sake of optimization.

Fall back to Maximum Likelihood

We will assume a least information model for $P([X_m^{\phantom{m}n}, \beta^m])$, that is, the maximum-entropy distribution for a given variance $\sigma_\beta^2$ of the parameters, which is a Gaussian,

$$
P([X_m^{\phantom{m}n}, \beta^m]) \propto \exp\left( -\frac{ \beta^m \beta_m }{ 2\sigma_\beta^2 } \right).
$$

Our posterior becomes

$$
-\ln P([X_m^{\phantom{m}n}, \beta^m] \mid \hat y^n) = \frac{1}{2\sigma^2} \sum_n \left( \hat y^n - \beta^m X_m^{\phantom{m}n} \right)^2 + \frac{1}{2\sigma_\beta^2} \beta^m \beta_m + \text{const}.
$$
This is nothing but the Ridge loss with coefficient $\lambda$, where

$$
\lambda = \frac{\sigma^2}{\sigma_\beta^2}.
$$

In the limit of a completely uninformative prior, $\sigma_\beta \to \infty$, we have $\lambda \to 0$ and the posterior maximization falls back to maximum likelihood.
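As a sanity check (again with synthetic data and hypothetical names), the numerically maximized posterior should agree with the closed-form Ridge solution $\beta = (X^T X + \lambda I)^{-1} X^T \hat y$ for $\lambda = \sigma^2 / \sigma_\beta^2$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y_hat = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

sigma, sigma_beta = 0.3, 1.0
lam = sigma**2 / sigma_beta**2  # Ridge coefficient lambda

# Negative log posterior: squared-error term plus the Gaussian prior term.
def neg_log_posterior(beta):
    residual = y_hat - X @ beta
    return np.sum(residual**2) / (2 * sigma**2) + np.sum(beta**2) / (2 * sigma_beta**2)

beta_map = minimize(neg_log_posterior, x0=np.zeros(3)).x

# Closed-form Ridge estimate.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y_hat)

print(np.allclose(beta_map, beta_ridge, atol=1e-4))  # expected: True
```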