# NMF: Nonnegative Matrix Factorizatioin

## Decomposition

To make it easier to understand, we start with a data point $\mathbf P$ in a $k$-dimensional space spanned by $k$ basis vectors $\mathbf V^k$. Naturally, we could write down the component decomposition of the point using the basis vectors,

This is immediately obvious to us since we have been dealing with rank 2 $(k, 1)$ basis vectors and we are basically talking about the $k$ coordinates for a point.

In fact, this point is represented by a matrix of rank 2 $(k, 1)$ given this basis.

If we assume a cartesian coordinate system, the basis are normal vectors.

These representations of the basis vectors can be joined together

Using these representations of the abstract vectors, we could represent the point $\mathbf P$ using

*which is insanely trivial since the representation of basis in this case is an identity matrix.*

Spherical Harmonics

Now we ask this question:

**Can we decompose the point using some other basis?**

The answer is yes.

While it is easier to perceive using the above matrix picture, it is much more convinent to use the index notation.

In physics, we are already dealing with this situition. We use spherical harmonics to decompose a field.

A general form of decomposition using rank 2 matrix is

NMF is talking about the same idea: coordinate transformations,

We will view $X_{n}^{\phantom{n}k}$ as a vertical stack of $n$ points:

For a point $i$, we have

Compare this with the decomposition of a point in cartesian coordinate system, this is a decomposition of each point on a basis spanned by $\mathbf W^i$.

Another View?

Then we have the sum of a bunch of $(n, k)$ matrices,

Hmmm this is not providing more insights to me. It seems to be rewriting the original matrix in “coordinate system” of r bases.

## Nonnegative Matrix Factorization

NMF is a dimension reduction method that decomposes $X_{n}^{\phantom{n}k}$ using

while requiring the elements of the decomposition being nonnegative. **But there are many possible decompositions!** Then we require this approximation to be as accurate as possible.

How do we measure the approximations?

We use the Frobenius distance between the matrix $X_{n}^{\phantom{n}k}$ and $H_n^{\phantom{n}r} W_r^{\phantom{r}k}$,

So NMF will require this Frobenius distance to be minimal.

## Why Does It Even Work?

Well, it doesn’t always work. **We might have many different NMFs for one single matrix.**

Compare with SVD

How do we find a proper decomposition using NMF? We put restrictions on it, such as penalties. Apart from this, we also need to choose the rank of the decomposition $r$ in Eq. ($\ref{eq-nmf}$). Even if we fixed all these problems, NMF is computation expensive.