We use Einstein’s summation convention.

The covariance of two discrete series $A$ and $B$ of length $n$ is defined as

$$\mathrm{Cov}(A, B) = \frac{1}{n-1} (A_i - \bar A)(B_i - \bar B),$$

where $\bar A$ and $\bar B$ are the means of the series and summation over the repeated index $i$ is implied. The normalization factor is set to $1/(n-1)$ rather than $1/n$ to remove the bias of the estimator for small $n$ (Bessel's correction).
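As a quick numerical check of this definition, here is a minimal sketch in NumPy (the function name `cov` is just for illustration); it subtracts the means, sums the products of deviations, and applies the $1/(n-1)$ normalization:

```python
import numpy as np

def cov(a, b):
    """Covariance of two equal-length series with the 1/(n-1) normalization."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)
    # subtract the means, multiply elementwise, and sum over the index i
    return np.sum((a - a.mean()) * (b - b.mean())) / (n - 1)

a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]
print(cov(a, b))  # agrees with np.cov(a, b)[0, 1]
```

The result agrees with NumPy's built-in `np.cov`, which uses the same $1/(n-1)$ default.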

One could show that

$$\mathrm{Cov}(A, A) = \frac{1}{n-1} (A_i - \bar A)^2 = \sigma_A^2,$$

i.e., the covariance of a series with itself is its variance. At first glance, the square in the notation $\sigma^2$ seems to be there only for notational purposes at this point.
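The identity $\mathrm{Cov}(A, A) = \sigma_A^2$ is easy to verify numerically; the short sketch below compares the self-covariance against NumPy's sample variance (`ddof=1` selects the same $1/(n-1)$ normalization):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 5.0])
n = len(a)
# covariance of a series with itself, using the 1/(n-1) normalization
cov_aa = np.sum((a - a.mean()) ** 2) / (n - 1)
print(cov_aa, np.var(a, ddof=1))  # the two values agree
```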

Meanwhile, using this idea of averaging the product of deviations, we can easily generalize to the covariance of three series,

$$\mathrm{Cov}(A, B, C) = \frac{1}{n-1} (A_i - \bar A)(B_i - \bar B)(C_i - \bar C),$$

or even of arbitrary $N$ series,

$$\mathrm{Cov}(A_1, A_2, \cdots, A_N) = \frac{1}{n-1} (A_{1,i} - \bar A_1)(A_{2,i} - \bar A_2) \cdots (A_{N,i} - \bar A_N),$$

which may be called the covariance of all $N$ series, $\mathrm{Cov}(A_1, A_2, \cdots, A_N)$.
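This generalization is straightforward to implement; a sketch is below (the function `cov_n` is a hypothetical name, not a standard library routine). For $N = 2$ it reduces to the ordinary covariance:

```python
import numpy as np

def cov_n(*series):
    """Generalized covariance of N equal-length series: the 1/(n-1)-normalized
    sum of the product of deviations from each series' mean. Hypothetical helper."""
    devs = [np.asarray(s, dtype=float) - np.mean(s) for s in series]
    n = len(devs[0])
    return np.sum(np.prod(devs, axis=0)) / (n - 1)

a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 1.0, 4.0, 3.0]
c = [0.0, 1.0, 1.0, 0.0]
print(cov_n(a, b))     # reduces to the ordinary two-series covariance
print(cov_n(a, b, c))  # three-series generalization
```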

We rarely use these higher-order covariances, since a covariance matrix already encodes the covariance between every pair of variables.

For a complete picture of the data, we build a matrix of all possible pairwise covariances,

$$C_{ij} = \mathrm{Cov}(A_i, A_j).$$

For real series, $\mathrm{Cov}(A_2, A_1) = \mathrm{Cov}(A_1, A_2)$, so the matrix is symmetric.

The covariance matrix of complex numbers or quaternions is not necessarily symmetric. The more general notion of symmetry here is Hermitian symmetry: the matrix equals its own conjugate transpose, $C_{ji} = C_{ij}^*$.
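The Hermitian property is easy to observe numerically for complex series. NumPy's `np.cov` conjugates one factor when the input is complex (the standard convention), so the resulting matrix equals its conjugate transpose:

```python
import numpy as np

rng = np.random.default_rng(0)
# two complex-valued series, 50 observations each
X = rng.normal(size=(2, 50)) + 1j * rng.normal(size=(2, 50))
C = np.cov(X)  # np.cov applies the conjugate to one factor for complex input
# the covariance matrix is Hermitian: C equals its conjugate transpose
print(np.allclose(C, C.conj().T))
```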

Given a dataset

$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{pmatrix},$$

where $N$ is the number of features (variables) and each $X_i$ is a series of observations, the covariance matrix is

$$C_{ij} = \mathrm{Cov}(X_i, X_j).$$

The covariance becomes the variance when $i = j$: the diagonal entries are the variances $\sigma_{X_i}^2$.
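A minimal sketch of building this matrix from the definition, assuming each row of the array holds one feature: subtract each feature's mean and form the $1/(n-1)$-normalized product of the deviation matrix with its transpose. The diagonal then holds the variances, and the result matches `np.cov`:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(3, 100))  # 3 features (rows), 100 observations each

# build C_ij = Cov(X_i, X_j) explicitly from the definition
D = X - X.mean(axis=1, keepdims=True)  # deviations from each feature's mean
C = D @ D.T / (X.shape[1] - 1)         # 1/(n-1) normalization

print(np.allclose(C, np.cov(X)))                           # matches np.cov
print(np.allclose(np.diag(C), np.var(X, axis=1, ddof=1)))  # diagonal = variances
```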