Entropy
Definition
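For a discrete probability distribution $p$ on the finite set $\{x_1, \dots, x_n\}$ with $p_i = p(x_i)$, the entropy of $p$ is defined as
$$h(p) = -\sum_{i=1}^{n} p_i \log p_i,$$
with the convention $0 \log 0 = 0$.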
For a continuous probability density function $p$ on an interval $I$, the entropy of $p$ is defined as
$$h(p) = -\int_I p(x) \log p(x)\, dx.$$
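The discrete entropy $-\sum_i p_i \log p_i$ is straightforward to evaluate numerically. The following is a minimal sketch; the function name `entropy` and its `base` parameter are illustrative choices, not part of the definition.

```python
import math

def entropy(p, base=math.e):
    """Entropy of a discrete distribution given as a sequence of probabilities.

    The logarithm base sets the unit: e gives nats, 2 gives bits.
    """
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Zero-probability outcomes are skipped (convention 0 log 0 = 0).
    return -sum(pi * math.log(pi, base) for pi in p if pi > 0)

fair_coin = [0.5, 0.5]
biased_coin = [0.9, 0.1]
print(entropy(fair_coin, base=2))    # one bit of uncertainty
print(entropy(biased_coin, base=2))  # strictly less than one bit
```

The biased coin is more predictable than the fair one, so its entropy is smaller.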
Theorem
For a probability distribution $p$ on a finite set $\{x_1, \dots, x_n\}$,
$$h(p) \le \log n,$$
with equality if and only if $p$ is uniform, i.e. $p(x_i) = 1/n$ for all $i$.
Uniform probability yields maximum uncertainty and therefore maximum entropy.
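The bound $h(p) \le \log n$ can be checked empirically. The sketch below draws random distributions on $n$ points and compares their entropies with that of the uniform distribution. The helper names and the sampling scheme (normalized uniform weights) are assumptions made for illustration, and a numerical check is of course not a proof.

```python
import math
import random

def entropy(p):
    """Entropy in nats, with the convention 0 log 0 = 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def random_distribution(n, rng):
    """A random point of the probability simplex (normalized random weights)."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

n = 6
rng = random.Random(0)  # fixed seed so the run is reproducible
bound = math.log(n)
uniform = [1.0 / n] * n
samples = [random_distribution(n, rng) for _ in range(1000)]
max_sampled = max(entropy(p) for p in samples)

print(f"log n                   = {bound:.6f}")
print(f"h(uniform)              = {entropy(uniform):.6f}")
print(f"largest sampled entropy = {max_sampled:.6f}")
```

Every sampled entropy stays at or below $\log n$, and only the uniform distribution attains the bound.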
Theorem
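For a continuous probability density function $p$ on $\mathbb{R}$ with variance $\sigma^2$,
$$h(p) \le \frac{1}{2}\left(1 + \log(2\pi\sigma^2)\right),$$
with equality if and only if $p$ is Gaussian with variance $\sigma^2$, i.e. for some $\mu$ we have
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}.$$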
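To illustrate the Gaussian bound, the sketch below compares the closed-form entropies (in nats) of three densities on the real line that share the same variance $\sigma^2$: Gaussian, Laplace, and uniform. The function names are illustrative. The Laplace and uniform formulas are standard results brought in for comparison and do not come from the text above.

```python
import math

def gaussian_entropy(sigma):
    # h = (1/2) (1 + log(2 pi sigma^2))
    return 0.5 * (1.0 + math.log(2.0 * math.pi * sigma ** 2))

def laplace_entropy(sigma):
    # A Laplace density with scale b has variance 2 b^2 and entropy 1 + log(2 b).
    b = sigma / math.sqrt(2.0)
    return 1.0 + math.log(2.0 * b)

def uniform_entropy(sigma):
    # A uniform density on an interval of width w has variance w^2 / 12
    # and entropy log w.
    w = sigma * math.sqrt(12.0)
    return math.log(w)

for sigma in (0.5, 1.0, 3.0):
    print(
        f"sigma={sigma}: gaussian={gaussian_entropy(sigma):.4f}, "
        f"laplace={laplace_entropy(sigma):.4f}, "
        f"uniform={uniform_entropy(sigma):.4f}"
    )
```

For each $\sigma$ the Gaussian value is the largest of the three, as the theorem predicts.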
Theorem
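For a continuous probability density function $p$ on $(0, \infty)$ with mean $\lambda$,
$$h(p) \le 1 + \log \lambda,$$
with equality if and only if $p$ is exponential with mean $\lambda$, i.e.
$$p(x) = \frac{1}{\lambda}\, e^{-x/\lambda}.$$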
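The continuous entropy $-\int p(x) \log p(x)\, dx$ can be approximated by numerical quadrature, which gives a quick check of the exponential bound $1 + \log \lambda$. The sketch below uses a plain midpoint rule on a truncated interval. The function names, the truncation point $40\lambda$, and the uniform density on $(0, 2\lambda)$ used as a competitor with the same mean are all assumptions made for illustration.

```python
import math

def differential_entropy(pdf, a, b, steps=200_000):
    """Midpoint-rule approximation of -integral_a^b p(x) log p(x) dx, in nats."""
    dx = (b - a) / steps
    total = 0.0
    for k in range(steps):
        px = pdf(a + (k + 0.5) * dx)
        if px > 0.0:  # points where p(x) = 0 contribute nothing
            total -= px * math.log(px) * dx
    return total

lam = 2.0  # common mean of both densities

def exponential_pdf(x):
    return math.exp(-x / lam) / lam

def uniform_pdf(x):
    # Uniform on (0, 2*lam), which also has mean lam.
    return 1.0 / (2.0 * lam) if 0.0 < x < 2.0 * lam else 0.0

upper = 40.0 * lam  # the exponential tail beyond this point is negligible
h_exp = differential_entropy(exponential_pdf, 0.0, upper)
h_uni = differential_entropy(uniform_pdf, 0.0, upper)

print(f"bound 1 + log(lam) = {1.0 + math.log(lam):.6f}")
print(f"h(exponential)     = {h_exp:.6f}")
print(f"h(uniform)         = {h_uni:.6f}")
```

The exponential density matches the bound up to quadrature error, while the uniform density with the same mean falls below it.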
Cross entropy
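The cross entropy of the distribution $q$ relative to a distribution $p$ over a given set is defined as follows:
$$H(p, q) = -\operatorname{E}_p[\log q],$$
which for discrete distributions reads $H(p, q) = -\sum_x p(x) \log q(x)$.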
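For discrete distributions the cross entropy is the sum $-\sum_x p(x) \log q(x)$. The following is a minimal sketch with an illustrative function name. Returning infinity when $q$ assigns zero probability to an outcome that $p$ can produce is a common convention, assumed here.

```python
import math

def cross_entropy(p, q):
    """Cross entropy -sum_x p(x) log q(x) in nats, for aligned probability lists."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same outcomes")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue          # outcomes p never produces contribute nothing
        if qx == 0.0:
            return math.inf   # q rules out an outcome that p can produce
        total -= px * math.log(qx)
    return total

p = [0.5, 0.25, 0.25]
q = [1.0 / 3.0] * 3
print(cross_entropy(p, q))  # equals log 3, since q is uniform on 3 outcomes
print(cross_entropy(p, p))  # the entropy of p, which is smaller
```

Taking $q = p$ recovers the entropy of $p$, which is the smallest value the cross entropy can take for a fixed $p$.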
Kullback-Leibler divergence
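The Kullback-Leibler divergence (relative entropy) was introduced as the directed divergence between two distributions $P$ and $Q$. For discrete distributions on a common set it is
$$D_{\mathrm{KL}}(P \parallel Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}.$$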
With logarithms taken to base 2, the Kullback-Leibler divergence $D_{\mathrm{KL}}(P \parallel Q)$ is then interpreted as the average difference of the number of bits required for encoding samples of $P$ using a code optimized for $Q$ rather than one optimized for $P$.
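The coding interpretation can be made concrete with a small example. The sketch below computes $\sum_x P(x) \log_2 \frac{P(x)}{Q(x)}$ and compares it with the difference in expected code length, assuming idealized codes that spend $-\log_2 q(x)$ bits on outcome $x$. The function names and the two example distributions are illustrative.

```python
import math

def kl_divergence(p, q, base=2.0):
    """Kullback-Leibler divergence sum_x p(x) log(p(x) / q(x)); bits when base is 2."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue          # convention 0 log 0 = 0
        if qx == 0.0:
            return math.inf   # q cannot encode an outcome that p produces
        total += px * math.log(px / qx, base)
    return total

def expected_code_length(p, q):
    """Average bits per sample of p under an ideal code built for q."""
    return sum(px * -math.log2(qx) for px, qx in zip(p, q) if px > 0.0)

p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

extra_bits = expected_code_length(p, q) - expected_code_length(p, p)
print(expected_code_length(p, q))  # 1.75 bits with the mismatched code
print(expected_code_length(p, p))  # 1.5 bits with the matched code
print(kl_divergence(p, q))         # 0.25 bits of overhead
```

The overhead of 0.25 bits per sample from using the code built for `q` equals the divergence.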
Jensen-Shannon divergence
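The Jensen-Shannon divergence of two distributions $P$ and $Q$ is
$$\mathrm{JSD}(P \parallel Q) = \frac{1}{2} D_{\mathrm{KL}}(P \parallel M) + \frac{1}{2} D_{\mathrm{KL}}(Q \parallel M),$$
where $M = \frac{1}{2}(P + Q)$ and $D_{\mathrm{KL}}$ is the Kullback-Leibler divergence.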
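The Jensen-Shannon divergence averages the Kullback-Leibler divergences of $P$ and $Q$ from their mixture $M = \frac{1}{2}(P + Q)$. The following is a minimal sketch for discrete distributions, with illustrative function names. Because $M$ is positive wherever $P$ or $Q$ is, the result is always finite.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence in nats; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0.0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence to the mixture M."""
    m = [0.5 * (px + qx) for px, qx in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]
print(js_divergence(p, q))
print(js_divergence(q, p))                    # same value: JSD is symmetric
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # log 2, the maximum in nats
```

Unlike the Kullback-Leibler divergence, this quantity is symmetric in its arguments and bounded by $\log 2$, a bound reached when the supports are disjoint.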
Mutual Information
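Let $(X, Y)$ be a pair of random variables with values over the space $\mathcal{X} \times \mathcal{Y}$. If their joint distribution is $P_{(X,Y)}$ and the marginal distributions are $P_X$ and $P_Y$, the mutual information is defined as
$$I(X; Y) = D_{\mathrm{KL}}\left(P_{(X,Y)} \parallel P_X \otimes P_Y\right),$$
where $D_{\mathrm{KL}}$ is the Kullback–Leibler divergence.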
PMFs for discrete distributions
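The mutual information of two jointly discrete random variables $X$ and $Y$ is calculated as a double sum:
$$I(X; Y) = \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} p_{(X,Y)}(x, y) \log \frac{p_{(X,Y)}(x, y)}{p_X(x)\, p_Y(y)},$$
where $p_{(X,Y)}$ is the joint probability mass function of $X$ and $Y$, and $p_X$ and $p_Y$ are the marginal probability mass functions of $X$ and $Y$ respectively.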
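The double sum translates directly into code once the joint probability mass function is stored as a table. In the sketch below, `joint[i][j]` is assumed to hold $p(x_i, y_j)$. The function name and the two example tables are illustrative.

```python
import math

def mutual_information(joint):
    """Mutual information in nats from a joint pmf table joint[i][j] = p(x_i, y_j)."""
    p_x = [sum(row) for row in joint]        # marginal of X: sum over y
    p_y = [sum(col) for col in zip(*joint)]  # marginal of Y: sum over x
    total = 0.0
    for i, row in enumerate(joint):
        for j, p_xy in enumerate(row):
            if p_xy > 0.0:  # zero cells contribute nothing
                total += p_xy * math.log(p_xy / (p_x[i] * p_y[j]))
    return total

independent = [[0.25, 0.25],
               [0.25, 0.25]]  # joint pmf equals the product of its marginals
identical = [[0.5, 0.0],
             [0.0, 0.5]]      # Y is always equal to X

print(mutual_information(independent))  # 0.0
print(mutual_information(identical))    # log 2: observing Y reveals X completely
```

Independent variables share no information, while a fair bit that is copied exactly shares $\log 2$ nats, i.e. one bit.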
PDFs for continuous distributions
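In the case of jointly continuous random variables, the double sum is replaced by a double integral:
$$I(X; Y) = \int_{\mathcal{Y}} \int_{\mathcal{X}} p_{(X,Y)}(x, y) \log \frac{p_{(X,Y)}(x, y)}{p_X(x)\, p_Y(y)}\, dx\, dy,$$
where $p_{(X,Y)}$ is now the joint probability density function of $X$ and $Y$, and $p_X$ and $p_Y$ are the marginal probability density functions of $X$ and $Y$ respectively.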
Mutual information and Kullback–Leibler divergence
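Mutual information is the Kullback–Leibler divergence from the product of the marginal distributions, $p_X \cdot p_Y$, to the joint distribution $p_{(X,Y)}$:
$$I(X; Y) = D_{\mathrm{KL}}\left(p_{(X,Y)} \parallel p_X\, p_Y\right).$$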