Information gain is a frequently used metric for evaluating the quality of a split in tree-based methods.
First of all, the entropy of a dataset is defined as

$$
S = -\sum_i p_i \log p_i,
$$

where $p_i$ is the probability of class $i$.
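As a minimal sketch of the entropy definition above, the function below estimates each $p_i$ as the fraction of samples carrying class label $i$. The function name `entropy`, the example labels, and the choice of a base-2 logarithm (entropy in bits) are assumptions made for illustration; the text does not fix the base of the logarithm.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy S = -sum_i p_i * log2(p_i) of a list of class labels.

    p_i is estimated as the fraction of samples with label i.
    Base 2 is an assumption; any other base only rescales the result.
    """
    n = len(labels)
    if n == 0:
        return 0.0  # convention: an empty dataset has zero entropy
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical labels: an evenly mixed dataset and a pure one.
mixed = entropy(["a", "b", "a", "b"])
pure = entropy(["a", "a", "a", "a"])
```

An evenly mixed two-class dataset gives the largest entropy for two classes, while a dataset with a single class gives zero.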
The information gain of a split is the difference between the entropy before the split and the entropy after it.
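For example, in a decision tree algorithm, we would split a node. Before splitting, we assign a label $m$ to the node, whose entropy is

$$
S_m = -\sum_i p_{m,i} \log p_{m,i},
$$

where $p_{m,i}$ is the probability of class $i$ among the samples in node $m$.
After the splitting, we have two groups that contribute to the entropy, group $L$ and group $R$,

$$
S'_m = p_L S_L + p_R S_R,
$$

where $S_L$ and $S_R$ are the entropies of the two groups, and $p_L$ and $p_R$ are the probabilities of the two groups. Suppose we have 100 samples before splitting, with 29 samples in the left group and 71 samples in the right group. Then $p_L = 29/100$ and $p_R = 71/100$.
The information gain is thus

$$
\mathrm{IG} = S_m - S'_m = S_m - p_L S_L - p_R S_R.
$$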
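The calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: the names `entropy` and `information_gain`, the base-2 logarithm, and the class labels inside each group are hypothetical. Only the group sizes (29 and 71 out of 100) come from the example in the text.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2, an assumption) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(left, right):
    """Entropy of the parent node minus the weighted entropy of its children."""
    parent = left + right
    n = len(parent)
    p_left = len(left) / n    # p_L: fraction of samples sent to the left group
    p_right = len(right) / n  # p_R: fraction of samples sent to the right group
    entropy_after = p_left * entropy(left) + p_right * entropy(right)
    return entropy(parent) - entropy_after


# Hypothetical class labels; only the group sizes 29 and 71 follow the text.
left = [0] * 25 + [1] * 4     # 29 samples in the left group
right = [0] * 15 + [1] * 56   # 71 samples in the right group
gain = information_gain(left, right)
```

Because each group is weighted by its share of the samples, a split that leaves both groups with the same class mix as the parent yields zero gain, while a split that separates the classes yields a positive gain.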