Kullback-Leibler Divergence (KL Divergence)

Definition:
The Kullback-Leibler (KL) divergence measures how one probability distribution $Q$ (an approximation) differs from a reference distribution $P$ (the true distribution). For discrete distributions over events $x \in \mathcal{X}$:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x \in \mathcal{X}} P(x) \log \frac{P(x)}{Q(x)}$$

For continuous distributions:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \, dx$$

where $p$ and $q$ are the probability density functions of $P$ and $Q$.
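
A minimal Python sketch of the discrete formula may help make it concrete; the function name kl_divergence and the example probability vectors are illustrative choices, and the result is in nats because the natural logarithm is used.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q) in nats.

    p and q are probability vectors over the same events (non-negative,
    each summing to 1). Terms with p[i] == 0 contribute 0, following the
    usual 0 * log 0 = 0 convention.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                      # skip zero-probability events of P
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.3, 0.2]   # reference ("true") distribution P
q = [0.4, 0.4, 0.2]   # approximating distribution Q

print(kl_divergence(p, q))   # ~0.025 nats
# Cross-check: scipy.stats.entropy(p, q) computes the same quantity.
```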

Intuition:
KL divergence quantifies the inefficiency of approximating $P$ using $Q$.

  • If $P = Q$, then $D_{\mathrm{KL}}(P \,\|\, Q) = 0$.
  • Larger values indicate a greater difference between $P$ and $Q$.
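
Both points can be checked numerically. The sketch below assumes SciPy is available and uses scipy.stats.entropy, which returns the KL divergence (in nats) when given two distributions; the example distributions are illustrative.

```python
from scipy.stats import entropy  # entropy(p, q) == D_KL(P || Q) in nats

p = [0.5, 0.3, 0.2]

# Q identical to P, then progressively further from P.
candidates = [
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.2, 0.2, 0.6],
    [0.05, 0.05, 0.9],
]

for q in candidates:
    print(q, "->", round(entropy(p, q), 4))
# Prints 0.0 for the identical Q, then increasingly large values
# as Q moves further away from P.
```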

Key Properties:

  1. Non-Negativity:

    $$D_{\mathrm{KL}}(P \,\|\, Q) \ge 0$$

    Equality occurs only if $P = Q$ almost everywhere.

  2. Asymmetry:

    $$D_{\mathrm{KL}}(P \,\|\, Q) \ne D_{\mathrm{KL}}(Q \,\|\, P) \quad \text{in general}$$

    KL divergence is not a true distance metric because it is not symmetric and does not satisfy the triangle inequality. (A numerical check of these properties follows this list.)

  3. Additivity for Independent Variables:
    If $P(x, y) = P_1(x)\,P_2(y)$ and $Q(x, y) = Q_1(x)\,Q_2(y)$, then:

    $$D_{\mathrm{KL}}(P \,\|\, Q) = D_{\mathrm{KL}}(P_1 \,\|\, Q_1) + D_{\mathrm{KL}}(P_2 \,\|\, Q_2)$$
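
These properties can be verified numerically on small examples. The sketch below assumes NumPy and SciPy are available (scipy.stats.entropy returns the KL divergence when given two distributions); the distributions themselves are illustrative.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) == D_KL(P || Q)

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

# 1. Non-negativity.
assert entropy(p, q) >= 0

# 2. Asymmetry: swapping the arguments gives a different value.
print(entropy(p, q), entropy(q, p))   # two different numbers

# 3. Additivity for independent variables: the joint distribution of two
#    independent components is the outer product of the marginals.
p2, q2 = np.array([0.6, 0.4]), np.array([0.5, 0.5])
joint_p = np.outer(p, p2).ravel()     # P(x, y) = P1(x) * P2(y)
joint_q = np.outer(q, q2).ravel()     # Q(x, y) = Q1(x) * Q2(y)
print(np.isclose(entropy(joint_p, joint_q),
                 entropy(p, q) + entropy(p2, q2)))   # True
```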

Applications:

  1. Machine Learning:

    • Used in variational inference to measure the difference between the true posterior distribution and an approximate distribution.
    • Cross-entropy loss includes KL divergence as a component: $H(P, Q) = H(P) + D_{\mathrm{KL}}(P \,\|\, Q)$.
  2. Information Theory:

    • Quantifies the inefficiency of encoding messages drawn from $P$ using a code optimized for $Q$ (see the sketch after this list).
  3. Natural Sciences:

    • Comparing probability distributions in areas like genetics, linguistics, and physics.
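
To make the coding interpretation concrete, the sketch below uses illustrative symbol probabilities and an idealized code that spends $-\log_2 Q(x)$ bits per symbol (integer-length rounding ignored): the average cost of the mismatched code exceeds the entropy of $P$ by exactly $D_{\mathrm{KL}}(P \,\|\, Q)$ in bits.

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])   # true symbol frequencies P
q = np.array([0.25, 0.25, 0.5])   # frequencies the code was optimized for (Q)

entropy_p = -np.sum(p * np.log2(p))    # optimal average bits per symbol
avg_len_q = -np.sum(p * np.log2(q))    # average bits with the Q-optimized code
kl_bits = np.sum(p * np.log2(p / q))   # D_KL(P || Q) in bits

print(entropy_p, avg_len_q, kl_bits)   # 1.5, 1.75, 0.25
# avg_len_q - entropy_p == kl_bits: the extra bits paid per symbol
# for using a code tuned to the wrong distribution.
```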

Relation to Entropy:
KL divergence relates to entropy and cross-entropy:

$$D_{\mathrm{KL}}(P \,\|\, Q) = H(P, Q) - H(P)$$

where $H(P, Q) = -\sum_x P(x) \log Q(x)$ is the cross-entropy, and $H(P) = -\sum_x P(x) \log P(x)$ is the Shannon entropy of $P$.
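
The identity follows from the definition by splitting the logarithm (shown here for the discrete case):

$$
D_{\mathrm{KL}}(P \,\|\, Q)
  = \sum_x P(x) \log \frac{P(x)}{Q(x)}
  = \underbrace{-\sum_x P(x) \log Q(x)}_{H(P,\,Q)} \;-\; \underbrace{\Bigl(-\sum_x P(x) \log P(x)\Bigr)}_{H(P)}
  = H(P, Q) - H(P).
$$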

Special Case (Binary Variables):
For binary random variables with $P = \mathrm{Bernoulli}(p)$ and $Q = \mathrm{Bernoulli}(q)$:

$$D_{\mathrm{KL}}(P \,\|\, Q) = p \log \frac{p}{q} + (1 - p) \log \frac{1 - p}{1 - q}$$
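
A short check with illustrative values $p = 0.5$ and $q = 0.1$, comparing the closed-form expression above with the general two-outcome sum:

```python
import math

p, q = 0.5, 0.1  # illustrative Bernoulli parameters

# Closed-form binary expression.
binary_kl = p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Same value from the general discrete formula over the two outcomes.
general_kl = sum(pi * math.log(pi / qi)
                 for pi, qi in zip([p, 1 - p], [q, 1 - q]))

print(binary_kl)                             # ~0.511 nats
print(math.isclose(binary_kl, general_kl))   # True
```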