Histograms are good for understanding the distribution of your data.

The Bin Size Problem

We will use the following series as an example.

[1.45,2.20,0.75,1.23,1.25,1.25,3.09,1.99,2.00,0.78,1.32,2.25,3.15,3.85,0.52,0.99,1.38,1.75,1.21,1.75]

If we use a bin size of 1, we get this spiky chart, which is not very informative.

We could also set the bin size to 2.

In principle, we could keep tuning the bin size until we get something pretty and informative. But that would be quite depressing.
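To make the effect of the bin size concrete, here is a minimal sketch that counts how many values of the series fall into each bin for a given bin size. The helper `bin_counts` and the choice to start the first bin at 0 are assumptions made for illustration; the charts in this post may use different bin edges.

```python
import math

data = [1.45, 2.20, 0.75, 1.23, 1.25, 1.25, 3.09, 1.99, 2.00, 0.78,
        1.32, 2.25, 3.15, 3.85, 0.52, 0.99, 1.38, 1.75, 1.21, 1.75]


def bin_counts(values, width, start=None):
    """Count values in consecutive bins of equal width.

    `start` is the left edge of the first bin (hypothetical parameter);
    it defaults to the smallest value in the series.
    """
    lo = min(values) if start is None else start
    n_bins = max(1, math.ceil((max(values) - lo) / width))
    counts = [0] * n_bins
    for v in values:
        # Clamp so that a value sitting exactly on the last edge
        # is counted in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


counts_width_1 = bin_counts(data, 1, start=0)
counts_width_2 = bin_counts(data, 2, start=0)
print(counts_width_1)
print(counts_width_2)
```

Comparing the two lists shows how much the picture depends on a number we picked by hand, which is the motivation for the rules below.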

Square-root

One simple way to estimate the number of bins needed is

$$
B = \sqrt{N},
$$

where $N$ is the length of the series.

In our example, $N=20$, so $B=\sqrt{20}\approx 4.5$, which we round up to $5$. The series ranges from $0.52$ to $3.85$, which leads to a bin size of $(3.85-0.52)/5\approx 0.67$.

We immediately see the peak of this distribution.
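The square-root estimate can be reproduced in a few lines. This is a sketch that assumes we round the number of bins up and spread the bins evenly between the minimum and maximum of the series; the variable names are illustrative.

```python
import math

data = [1.45, 2.20, 0.75, 1.23, 1.25, 1.25, 3.09, 1.99, 2.00, 0.78,
        1.32, 2.25, 3.15, 3.85, 0.52, 0.99, 1.38, 1.75, 1.21, 1.75]

n = len(data)
# Square-root choice: number of bins is sqrt(N), rounded up (assumption).
sqrt_bins = math.ceil(math.sqrt(n))
# Bin size: split the range of the data evenly across the bins.
sqrt_width = (max(data) - min(data)) / sqrt_bins

print(sqrt_bins, round(sqrt_width, 2))
```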

Sturges’ formula

Sturges’ formula says that the number of bins of the histogram should be

$$
B = \log_2 N + 1,
$$

where $N$ is the length of the series.

In our example, $N=20$, so $B = \log_2 20 + 1 \approx 5.3$, which we round to $5$. The max and min of our series are $3.85$ and $0.52$, so the bin size is $W = (3.85-0.52)/5 \approx 0.67$, the same as with the square-root method.
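The same calculation in code, as a sketch. The rounding convention matters here: rounding $\log_2 N + 1$ to the nearest integer gives the 5 bins used above, while rounding up, which is also a common convention, gives 6 bins and a narrower bin size. The variable names are illustrative.

```python
import math

data = [1.45, 2.20, 0.75, 1.23, 1.25, 1.25, 3.09, 1.99, 2.00, 0.78,
        1.32, 2.25, 3.15, 3.85, 0.52, 0.99, 1.38, 1.75, 1.21, 1.75]

n = len(data)
sturges_raw = math.log2(n) + 1          # about 5.32 for N = 20

# Convention used in the text: round to the nearest integer.
sturges_bins = round(sturges_raw)
# Alternative convention (assumption, not from the text): round up.
sturges_bins_ceil = math.ceil(sturges_raw)

data_range = max(data) - min(data)
sturges_width = data_range / sturges_bins
sturges_width_ceil = data_range / sturges_bins_ceil

print(sturges_bins, round(sturges_width, 2))
print(sturges_bins_ceil, round(sturges_width_ceil, 2))
```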

Scott’s Rule

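Scott’s rule says we should choose the bin width

$$
W = \frac{3.49\,\sigma}{N^{1/3}},
$$

where $\sigma$ is the standard deviation of the series and $N$ is its length.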

In our case, we have $N=20$ and $\sigma=0.86$, which leads to $W=1.1$.
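A short sketch of Scott's rule for this series. It assumes the sample standard deviation (dividing by $N-1$), which is the variant that reproduces $\sigma = 0.86$ here; the population standard deviation would come out slightly smaller. Deriving a bin count from the width by rounding up is also an assumption.

```python
import math
import statistics

data = [1.45, 2.20, 0.75, 1.23, 1.25, 1.25, 3.09, 1.99, 2.00, 0.78,
        1.32, 2.25, 3.15, 3.85, 0.52, 0.99, 1.38, 1.75, 1.21, 1.75]

n = len(data)
sigma = statistics.stdev(data)            # sample standard deviation
scott_width = 3.49 * sigma / n ** (1 / 3)  # Scott's rule for the bin width

# Number of bins needed to cover the data range with this width.
scott_bins = math.ceil((max(data) - min(data)) / scott_width)

print(round(sigma, 2), round(scott_width, 1), scott_bins)
```

Unlike the two rules above, Scott's rule sets the bin width directly from the spread of the data rather than from the number of points alone.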