## Linear Regression and Likelihood

The linear estimator $y$ is

$$
y^n = \beta^m X_m^{\phantom{m}n}.
$$

As usual, we have redefined our data to get rid of the intercept $\beta^0$.
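Concretely (a minimal sketch; the array names and values are ours, not from the text), absorbing the intercept means prepending a constant column to the data, so that $\beta^0$ becomes just another coefficient:

```python
import numpy as np

# Raw features without an intercept column (illustrative values).
X_raw = np.array([[0.5, 1.2],
                  [1.0, -0.3],
                  [2.0, 0.7]])

# Prepend a column of ones: the first coefficient beta^0 now plays
# the role of the intercept, so y^n = beta^m X_m^n needs no extra term.
X = np.c_[np.ones(len(X_raw)), X_raw]

beta = np.array([1.0, 2.0, -1.0])   # beta^0 = 1.0 is the intercept
y = X @ beta

# Same result as writing the intercept explicitly.
assert np.allclose(y, 1.0 + X_raw @ beta[1:])
```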

In ordinary linear models, the error is the difference between the target $\hat y$ and the estimator $y$,

$$
\epsilon^n = \hat y^n - y^n = \hat y^n - \beta^m X_m^{\phantom{m}n},
$$

which is required to be as small as possible in magnitude.

We could use least squares to solve the problem. However, instead of using a deterministic estimator $\beta^m X_m^{\phantom{m}n}$, we assume a Gaussian random estimator

$$
y^n \sim \mathcal{N}\!\left( \beta^m X_m^{\phantom{m}n},\; \sigma^2 \right),
$$

where we have used our knowledge of linear regression: the mean of the estimator should be the linear model $\beta^m X_m^{\phantom{m}n}$. The likelihood becomes

$$
L(\beta) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{\left( \hat y^n - \beta^m X_m^{\phantom{m}n} \right)^2}{2\sigma^2} \right).
$$

It is not surprising that maximizing this likelihood leads to the same result as least squares, since the logarithm cancels the exponential and turns the product into a sum of squared errors:

$$
-\log L(\beta) = \frac{1}{2\sigma^2} \sum_{n} \left( \hat y^n - \beta^m X_m^{\phantom{m}n} \right)^2 + \mathrm{const}.
$$
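As a quick numerical check (a sketch with synthetic data; the sample sizes, noise level, and names like `beta_ls` are our assumptions), the least-squares solution is also the maximizer of the Gaussian log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: X already carries a constant column, absorbing the intercept.
N, M = 200, 3
X = np.c_[np.ones(N), rng.normal(size=(N, M - 1))]
beta_true = np.array([1.0, -2.0, 0.5])
y_hat = X @ beta_true + rng.normal(scale=0.3, size=N)  # noisy targets

# Least squares: minimize sum_n (y_hat^n - beta^m X_m^n)^2.
beta_ls, *_ = np.linalg.lstsq(X, y_hat, rcond=None)

# Gaussian log-likelihood with fixed sigma; the log turns the product of
# exponentials into (minus) the sum of squared errors, up to a constant.
def log_likelihood(beta, sigma=0.3):
    resid = y_hat - X @ beta
    return -0.5 * np.sum(resid**2) / sigma**2 - N * np.log(sigma * np.sqrt(2 * np.pi))

# Any perturbation away from the least-squares solution lowers the likelihood.
for _ in range(100):
    beta_perturbed = beta_ls + rng.normal(scale=0.1, size=M)
    assert log_likelihood(beta_perturbed) <= log_likelihood(beta_ls)
```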

## Bayesian Linear Model

Applying Bayes’ theorem to this problem,

$$
P\!\left( \beta^m \mid \hat y^n \right) = \frac{ P\!\left( \hat y^n \mid [X_m^{\phantom{m}n}, \beta^m] \right) \, P\!\left( [X_m^{\phantom{m}n}, \beta^m] \right) }{ {\color{red} P(\hat y^n)} }.
$$

Since ${\color{red}P(\hat y^n)}$ doesn’t depend on the parameters and is a constant, we will ignore it for the sake of optimization.

Had we also dropped the prior $P([X_m^{\phantom{m}n}, \beta^m])$, maximizing the posterior would simply fall back to maximum likelihood.

We will assume a least-information model for $P([X_m^{\phantom{m}n}, \beta^m])$, that is, the maximum-entropy distribution for a given variance $\tau^2$, which is a Gaussian:

$$
P\!\left( [X_m^{\phantom{m}n}, \beta^m] \right) \propto \exp\!\left( -\frac{\beta^m \beta_m}{2\tau^2} \right).
$$

Our posterior becomes

$$
P\!\left( \beta^m \mid \hat y^n \right) \propto \exp\!\left( -\frac{1}{2\sigma^2} \sum_{n} \left( \hat y^n - \beta^m X_m^{\phantom{m}n} \right)^2 - \frac{\beta^m \beta_m}{2\tau^2} \right),
$$

so that maximizing it is equivalent to minimizing

$$
\sum_{n} \left( \hat y^n - \beta^m X_m^{\phantom{m}n} \right)^2 + \lambda\, \beta^m \beta_m.
$$

This is nothing but the Ridge loss with coefficient $\lambda$, where

$$
\lambda = \frac{\sigma^2}{\tau^2},
$$

with $\tau^2$ the variance of the Gaussian prior on $\beta^m$.
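A minimal sketch of this equivalence (synthetic data; the choices of $\sigma$, $\tau$, and the variable names are our assumptions): the closed-form Ridge estimate $\beta = (X^\top X + \lambda I)^{-1} X^\top \hat y$ is exactly the minimizer of the negative log posterior above.

```python
import numpy as np

rng = np.random.default_rng(1)

N, M = 200, 3
X = rng.normal(size=(N, M))
beta_true = np.array([1.0, -2.0, 0.5])
y_hat = X @ beta_true + rng.normal(scale=0.3, size=N)

sigma, tau = 0.3, 1.0
lam = sigma**2 / tau**2          # lambda = sigma^2 / tau^2

# Closed-form Ridge estimate: beta = (X^T X + lambda I)^{-1} X^T y.
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(M), X.T @ y_hat)

# Negative log posterior: squared error / (2 sigma^2) + ||beta||^2 / (2 tau^2).
def neg_log_posterior(beta):
    resid = y_hat - X @ beta
    return 0.5 * np.sum(resid**2) / sigma**2 + 0.5 * np.dot(beta, beta) / tau**2

# The Ridge solution minimizes the negative log posterior (MAP estimate).
for _ in range(100):
    beta_perturbed = beta_ridge + rng.normal(scale=0.1, size=M)
    assert neg_log_posterior(beta_ridge) <= neg_log_posterior(beta_perturbed)
```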