# Correlation Coefficient and Covariance for Numeric Data

## Covariances

The correlation coefficient is also known as Pearson’s product-moment correlation coefficient.

### Review of Standard Deviation

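For a series of data A, the standard deviation is

$$
\sigma_A = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (a_i - \bar A)^2 },
$$

where $n$ is the number of elements in series A and $\bar A$ is its mean. Now imagine we have two series $(a_i - \bar A)$ and $(a_j - \bar A)$. The squared geometric mean of a pair of elements is $(a_i - \bar A)(a_j - \bar A)$, which for $i=j$ becomes

$$
(a_i - \bar A)^2.
$$

So the variance $\sigma_A^2$ is the mean of these squared geometric means, and the standard deviation is in fact a measure of the mean of the **geometric mean of the deviation of each element**.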
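As a minimal sketch of the definition above, the standard deviation can be computed directly as the square root of the mean squared deviation. The function name `population_std` and the sample data are illustrative assumptions, and the divide-by-$n$ convention is used.

```python
import math
import statistics


def population_std(series):
    """Standard deviation with n (not n - 1) in the denominator."""
    n = len(series)
    mean = sum(series) / n
    # Each term (a - mean) ** 2 is the squared "geometric mean"
    # of a deviation paired with itself (the i = j case).
    mean_squared_deviation = sum((a - mean) ** 2 for a in series) / n
    return math.sqrt(mean_squared_deviation)


A = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(population_std(A))     # 2.0
print(statistics.pstdev(A))  # standard-library population std, for comparison
```

The standard library's `statistics.pstdev` uses the same divide-by-$n$ convention, so the two values should agree.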

### Generalize Standard Deviation to Covariances

Similarly, for two series A and B of the same length, we can define a quantity that measures the mean of the geometric mean of the corresponding deviations of the two series. It is called the covariance of A and B, $\text{Cov} ({A,B})$:

$$
\text{Cov}(A,B) = \sigma_{A,B}^2 = \frac{1}{n} \sum_{i=1}^{n} (a_i - \bar A)(b_i - \bar B).
$$

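Expanding the product of the deviations, we find that

$$
\text{Cov}(A,B) = \frac{1}{n} \sum_{i=1}^{n} a_i b_i - \bar A \bar B,
$$

i.e., the mean of the products minus the product of the means.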

At first glance, the square in the notation $\sigma_{A,B}^2$ serves only a notational purpose at this point; unlike $\sigma_A^2$, the quantity can be negative.
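The two ways of writing the covariance, as a mean of products of deviations and as the mean of the products minus the product of the means, can be checked against each other numerically. The following is a small sketch assuming the divide-by-$n$ convention; the function names and the data are made up for illustration.

```python
import math


def cov_from_deviations(xs, ys):
    """Mean of the products of corresponding deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def cov_from_moments(xs, ys):
    """Mean of the products minus the product of the means."""
    n = len(xs)
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return mean_xy - (sum(xs) / n) * (sum(ys) / n)


A = [1.0, 2.0, 3.0, 4.0]
B = [2.0, 4.0, 6.0, 9.0]

print(cov_from_deviations(A, B))  # 2.875
print(cov_from_moments(A, B))     # 2.875
print(math.isclose(cov_from_deviations(A, B), cov_from_moments(A, B)))  # True
```

For general floating-point data the two forms may differ by rounding error, which is why the comparison uses `math.isclose` rather than `==`.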

Meanwhile, using this idea of the mean of the geometric mean, we could generalize it to three series, or even to an arbitrary number N of series, which would be called the covariance of all the N series, $\mathrm{Cov} ({A_1, A_2,\cdots, A_N })$.

Covariance measures how strongly the two series vary together. To see this, take two identical series, A = B, which leads to $\text{Cov}(A,B) = \sigma_{A}^2$, i.e., the covariance reduces to the variance. Now suppose we have two series with completely opposite phase,

index | A | B |
---|---|---|
1 | 1 | -1 |
2 | -1 | 1 |
3 | 1 | -1 |
4 | -1 | 1 |
5 | 1 | -1 |
6 | -1 | 1 |
7 | 1 | -1 |

we have $\text{Cov}(A,B) = -\sigma_A^2 = -48/49 \approx -0.98$ when dividing by $n=7$. The negative sign tells us that the two series are anti-correlated.
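The value for the table above can be verified with exact rational arithmetic. This is a sketch under the divide-by-$n$ convention; the helper name `exact_cov` is an assumption for illustration. Because the series has odd length, its mean is $1/7$ rather than $0$, so the covariance comes out as $-48/49$, close to but not exactly $-1$.

```python
from fractions import Fraction

# The two series from the table: B is A with the sign flipped.
A = [1, -1, 1, -1, 1, -1, 1]
B = [-a for a in A]


def exact_cov(xs, ys):
    """Divide-by-n covariance computed with exact fractions."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


print(exact_cov(A, B))  # -48/49
print(exact_cov(A, A))  # 48/49, the variance of A
```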

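Covariance is also related to the dispersion matrix (also called the covariance matrix), whose entries are the pairwise covariances of a set of series.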

## Correlation Coefficient

However, the value of the covariance depends on the standard deviation of each series, i.e., on the scale of the data, which makes it hard to judge from the covariance alone how strong the correlation is.

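The obvious normalization factor is the product of the standard deviations of the two series, $\sigma_A$ and $\sigma_B$, which gives the correlation coefficient $r_{A,B}$, i.e.,

$$
r_{A,B} = \frac{\text{Cov}(A,B)}{\sigma_A \sigma_B},
$$

which always lies between $-1$ and $1$.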

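The geometric mean view of the correlation coefficient $r_{A,B}$ is

$$
r_{A,B} = \frac{\sum_{i=1}^{n} (a_i - \bar A)(b_i - \bar B)}{\sqrt{\sum_{i=1}^{n} (a_i - \bar A)^2 \, \sum_{i=1}^{n} (b_i - \bar B)^2}},
$$

where the denominator is the geometric mean of the summed squared deviations of the two series, so the whole expression is some kind of geometric mean of the geometric mean of each series.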
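To illustrate why the normalization helps, the sketch below rescales one series and compares covariance with the correlation coefficient. It assumes the divide-by-$n$ convention throughout; the function names `population_cov` and `pearson_r` and the rescaled series `C` are illustrative assumptions.

```python
import math


def deviations(series):
    """Deviation of each element from the series mean."""
    mean = sum(series) / len(series)
    return [x - mean for x in series]


def population_cov(xs, ys):
    """Divide-by-n covariance; population_cov(xs, xs) is the variance."""
    dx, dy = deviations(xs), deviations(ys)
    return sum(p * q for p, q in zip(dx, dy)) / len(xs)


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    var_x = population_cov(xs, xs)
    var_y = population_cov(ys, ys)
    # sqrt(var_x * var_y) equals sigma_x * sigma_y, the geometric
    # mean of the two variances.
    return population_cov(xs, ys) / math.sqrt(var_x * var_y)


A = [1, -1, 1, -1, 1, -1, 1]
B = [-a for a in A]          # opposite phase
C = [10 * a + 3 for a in A]  # same phase, rescaled and shifted

print(population_cov(A, A), population_cov(A, C))  # second is about 10x the first
print(pearson_r(A, B))  # close to -1.0
print(pearson_r(A, C))  # close to 1.0
```

The covariance of `A` with `C` is roughly ten times the variance of `A`, yet the correlation coefficient is essentially $1$ in both cases, which is the scale independence the normalization is meant to provide.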