Entropy

Definition

For a discrete probability distribution $p$ on the finite set $\mathcal{X}$ with $\sum_{x \in \mathcal{X}} p(x) = 1$, the entropy of $p$ is defined as

$$H(p) = -\sum_{x \in \mathcal{X}} p(x) \log p(x).$$
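As a minimal sketch of this definition in code (entropy in nats; the function name `entropy` and the example vectors are our own choices, not taken from the text):

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum_x p(x) log p(x), in nats.

    Terms with p(x) = 0 contribute nothing, by the convention 0 log 0 = 0.
    """
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

print(entropy([0.5, 0.5]))   # log 2 ≈ 0.693
print(entropy([0.9, 0.1]))   # less uncertainty, lower entropy
print(entropy([1.0, 0.0]))   # a deterministic outcome has entropy 0
```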

For a continuous probability density function $f$ on an interval $I \subseteq \mathbb{R}$, the entropy of $f$ is defined as

$$h(f) = -\int_I f(x) \log f(x) \, dx.$$

Theorem

For a probability distribution $p$ on a finite set $\mathcal{X}$,

$$H(p) \le \log |\mathcal{X}|,$$

with equality iff $p$ is uniform, i.e. $p(x) = \frac{1}{|\mathcal{X}|}$ for all $x \in \mathcal{X}$.

Uniform probability yields maximum uncertainty and therefore maximum entropy.
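A quick numerical sanity check of the theorem, as a sketch: for randomly drawn distributions on $n = 5$ points the entropy stays below $\log n$, and the uniform distribution attains the bound. The Dirichlet sampling is only a convenient way to generate arbitrary probability vectors.

```python
import numpy as np

def entropy(p):
    # H(p) in nats, skipping zero-probability terms
    p = p[p > 0]
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
n = 5

# Random distributions on n points never exceed the uniform bound log n.
for _ in range(5):
    p = rng.dirichlet(np.ones(n))
    print(round(entropy(p), 4), "<=", round(np.log(n), 4))

# The uniform distribution attains the bound exactly.
print(entropy(np.full(n, 1.0 / n)), np.log(n))
```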

Theorem

For a continuous probability density function $f$ on $\mathbb{R}$ with variance $\sigma^2$,

$$h(f) \le \frac{1}{2} \log\!\left(2 \pi e \sigma^2\right),$$

with equality iff $f$ is Gaussian with variance $\sigma^2$, i.e. for some $\mu$ we have

$$f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \, e^{-\frac{(x - \mu)^2}{2 \sigma^2}}.$$
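The bound can be checked numerically for a specific case. The sketch below integrates $-f \log f$ on a grid for a Gaussian with $\sigma = 1.5$ (an arbitrary choice) and compares the result with the closed form $\tfrac{1}{2}\log(2\pi e \sigma^2)$.

```python
import numpy as np

sigma = 1.5
x, dx = np.linspace(-12, 12, 200001, retstep=True)

# Gaussian density with mean 0 and standard deviation sigma
f = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

# Differential entropy h(f) = -∫ f log f dx, approximated by a Riemann sum
h_numeric = -np.sum(f * np.log(f)) * dx

# Closed-form maximum from the theorem
h_closed = 0.5 * np.log(2 * np.pi * np.e * sigma**2)

print(h_numeric, h_closed)   # the two values agree to several decimals
```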

Theorem

For a continuous probability density function $f$ on $[0, \infty)$ with mean $\mu$,

$$h(f) \le 1 + \log \mu,$$

with equality iff $f$ is exponential with mean $\mu$, i.e.

$$f(x) = \frac{1}{\mu} \, e^{-x/\mu}.$$
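An analogous numerical check for the exponential case, as a sketch with the arbitrary choice $\mu = 2$:

```python
import numpy as np

mu = 2.0
x, dx = np.linspace(1e-9, 60, 200001, retstep=True)

# Exponential density with mean mu on [0, infinity)
f = np.exp(-x / mu) / mu

# Differential entropy by a Riemann sum, compared with 1 + log(mu)
h_numeric = -np.sum(f * np.log(f)) * dx
print(h_numeric, 1 + np.log(mu))
```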


Cross entropy

The cross entropy of a distribution $q$ relative to a distribution $p$ over a given set $\mathcal{X}$ is defined as follows:

$$H(p, q) = -\sum_{x \in \mathcal{X}} p(x) \log q(x).$$
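A small sketch of the cross-entropy formula (the function name and example distributions are our own):

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log q(x), in nats."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(q[nz]))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(cross_entropy(p, q))   # at least H(p)
print(cross_entropy(p, p))   # equals H(p) when q = p
```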

Kullback-Leibler divergence

The Kullback-Leibler divergence (relative entropy) was introduced as the directed divergence between two distributions $p$ and $q$:

$$D_{\mathrm{KL}}(p \parallel q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}.$$

The Kullback-Leibler divergence is then interpreted as the average difference in the number of bits required for encoding samples of $p$ using a code optimized for $q$ rather than one optimized for $p$; indeed, $D_{\mathrm{KL}}(p \parallel q) = H(p, q) - H(p)$.
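The following sketch computes $D_{\mathrm{KL}}(p \parallel q)$ for two arbitrary example distributions and confirms the identity $D_{\mathrm{KL}}(p \parallel q) = H(p, q) - H(p)$:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_x p(x) log(p(x) / q(x)), in nats."""
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

H_p  = -np.sum(p * np.log(p))   # entropy H(p)
H_pq = -np.sum(p * np.log(q))   # cross entropy H(p, q)

print(kl_divergence(p, q), H_pq - H_p)   # equal: D_KL(p||q) = H(p,q) - H(p)
print(kl_divergence(p, p))               # 0 when the distributions coincide
```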

Jensen-Shannon divergence

The Jensen-Shannon divergence is a symmetrized version of the Kullback-Leibler divergence:

$$D_{\mathrm{JS}}(p \parallel q) = \frac{1}{2} D_{\mathrm{KL}}(p \parallel m) + \frac{1}{2} D_{\mathrm{KL}}(q \parallel m),$$

where

$$m = \frac{1}{2}(p + q).$$
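A direct transcription of the definition into code, as a sketch (the helper `kl` and the example distributions are our own):

```python
import numpy as np

def kl(p, q):
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

def js_divergence(p, q):
    """D_JS(p || q) = (1/2) D_KL(p || m) + (1/2) D_KL(q || m), with m = (p + q)/2."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.3, 0.2]
q = [0.1, 0.6, 0.3]
print(js_divergence(p, q), js_divergence(q, p))   # symmetric in p and q
```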


Mutual Information

Let $(X, Y)$ be a pair of random variables with values over the space $\mathcal{X} \times \mathcal{Y}$. If their joint distribution is $P_{(X,Y)}$ and the marginal distributions are $P_X$ and $P_Y$, the mutual information is defined as

$$I(X; Y) = D_{\mathrm{KL}}\!\left(P_{(X,Y)} \parallel P_X \otimes P_Y\right),$$

where $D_{\mathrm{KL}}$ is the Kullback–Leibler divergence.

PMFs for discrete distributions

The mutual information of two jointly discrete random variables $X$ and $Y$ is calculated as a double sum:

$$I(X; Y) = \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} P_{(X,Y)}(x, y) \log \frac{P_{(X,Y)}(x, y)}{P_X(x) \, P_Y(y)},$$

where $P_{(X,Y)}$ is the joint probability mass function of $X$ and $Y$, and $P_X$ and $P_Y$ are the marginal probability mass functions of $X$ and $Y$ respectively.
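As an illustration, the sketch below evaluates the double sum for a small hypothetical joint PMF (the table values are invented for the example):

```python
import numpy as np

# A hypothetical joint PMF p(x, y) on a 2 x 3 space (rows index x, columns index y)
p_xy = np.array([[0.10, 0.20, 0.15],
                 [0.25, 0.05, 0.25]])

p_x = p_xy.sum(axis=1)   # marginal PMF of X
p_y = p_xy.sum(axis=0)   # marginal PMF of Y

# I(X;Y) = sum_y sum_x p(x,y) log( p(x,y) / (p(x) p(y)) )
mi = 0.0
for i in range(p_xy.shape[0]):
    for j in range(p_xy.shape[1]):
        if p_xy[i, j] > 0:
            mi += p_xy[i, j] * np.log(p_xy[i, j] / (p_x[i] * p_y[j]))

print(mi)   # nonnegative; zero iff X and Y are independent
```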

PDFs for continuous distributions

In the case of jointly continuous random variables, the double sum is replaced by a double integral:

$$I(X; Y) = \int_{\mathcal{Y}} \int_{\mathcal{X}} p_{(X,Y)}(x, y) \log \frac{p_{(X,Y)}(x, y)}{p_X(x) \, p_Y(y)} \, dx \, dy,$$

where $p_{(X,Y)}$ is now the joint probability density function of $X$ and $Y$, and $p_X$ and $p_Y$ are the marginal probability density functions of $X$ and $Y$ respectively.
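For a concrete continuous example, the sketch below approximates the double integral on a grid for a standard bivariate Gaussian with correlation $\rho = 0.6$ (our own choice) and compares it with the known closed form $-\tfrac{1}{2}\log(1-\rho^2)$ for jointly Gaussian variables:

```python
import numpy as np

rho = 0.6
x, dx = np.linspace(-8, 8, 801, retstep=True)
X, Y = np.meshgrid(x, x, indexing="ij")

# Standard bivariate Gaussian density with correlation rho
norm = 1.0 / (2 * np.pi * np.sqrt(1 - rho**2))
p_xy = norm * np.exp(-(X**2 - 2 * rho * X * Y + Y**2) / (2 * (1 - rho**2)))

# Standard normal marginal densities
p_x = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# Double integral approximated by a Riemann sum on the grid
integrand = p_xy * np.log(p_xy / (p_x[:, None] * p_x[None, :]))
mi_numeric = np.sum(integrand) * dx * dx

print(mi_numeric, -0.5 * np.log(1 - rho**2))   # known closed form for this case
```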

Mutual information and Kullback–Leibler divergence

Mutual information is the Kullback–Leibler divergence of the joint distribution $P_{(X,Y)}$ from the product of the marginal distributions $P_X \otimes P_Y$:

$$I(X; Y) = D_{\mathrm{KL}}\!\left(P_{(X,Y)} \parallel P_X \otimes P_Y\right).$$
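The sketch below evaluates this Kullback–Leibler form on the same hypothetical joint PMF used in the discrete example above and recovers the same value as the double sum:

```python
import numpy as np

def kl(p, q):
    p, q = np.ravel(p), np.ravel(q)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

# The same hypothetical joint PMF as in the discrete example above
p_xy = np.array([[0.10, 0.20, 0.15],
                 [0.25, 0.05, 0.25]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# I(X;Y) as the KL divergence of the joint PMF from the product of the marginals
print(kl(p_xy, np.outer(p_x, p_y)))
```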