- categories: Statistics, Probability, Definition, Data Science
Kullback-Leibler Divergence (KL Divergence)
Definition:
The Kullback-Leibler (KL) divergence measures how one probability distribution $Q$ (the approximation) differs from a reference distribution $P$ (the true distribution). For discrete distributions over events $x$:

$$D_{\mathrm{KL}}(P \parallel Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$$

For continuous distributions with densities $p(x)$ and $q(x)$:

$$D_{\mathrm{KL}}(P \parallel Q) = \int_{-\infty}^{\infty} p(x) \log \frac{p(x)}{q(x)} \, dx$$
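A minimal sketch of the discrete sum in Python (NumPy assumed; the function name `kl_divergence` is illustrative), using natural logarithms so the result is in nats:

```python
import numpy as np

def kl_divergence(p, q):
    """Compute D_KL(P || Q) for discrete distributions given as probability arrays."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Terms with P(x) = 0 contribute 0 by the convention 0 * log 0 = 0.
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

print(kl_divergence([0.4, 0.6], [0.5, 0.5]))  # ~0.0201 nats
```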
Intuition:
KL divergence quantifies the inefficiency of approximating $P$ using $Q$.
- If $P = Q$, then $D_{\mathrm{KL}}(P \parallel Q) = 0$.
- Larger values indicate a greater difference between $P$ and $Q$.
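These points can be checked numerically; the sketch below uses `scipy.stats.entropy`, which returns $D_{\mathrm{KL}}(P \parallel Q)$ in nats when given two distributions (the example values are illustrative):

```python
import numpy as np
from scipy.stats import entropy

p = np.array([0.4, 0.6])
print(entropy(p, p))                     # 0.0: identical distributions
print(entropy(p, np.array([0.5, 0.5])))  # ~0.020: Q close to P
print(entropy(p, np.array([0.9, 0.1])))  # ~0.751: Q far from P
```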
Key Properties:
- Non-Negativity: $D_{\mathrm{KL}}(P \parallel Q) \geq 0$. Equality occurs only if $P = Q$ almost everywhere.
- Asymmetry: $D_{\mathrm{KL}}(P \parallel Q) \neq D_{\mathrm{KL}}(Q \parallel P)$ in general. KL divergence is not a true distance metric because it is not symmetric and does not satisfy the triangle inequality (see the numeric check after this list).
- Additivity for Independent Variables: If $P(x, y) = P_1(x)\,P_2(y)$ and $Q(x, y) = Q_1(x)\,Q_2(y)$, then:
  $$D_{\mathrm{KL}}(P \parallel Q) = D_{\mathrm{KL}}(P_1 \parallel Q_1) + D_{\mathrm{KL}}(P_2 \parallel Q_2)$$
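A small numeric check of the asymmetry and additivity properties (a sketch; `scipy.stats.entropy` computes $D_{\mathrm{KL}}$ in nats, and the example distributions are arbitrary):

```python
import numpy as np
from scipy.stats import entropy

# Asymmetry: D_KL(P || Q) and D_KL(Q || P) generally differ.
p, q = np.array([0.4, 0.6]), np.array([0.8, 0.2])
print(entropy(p, q), entropy(q, p))  # ~0.382 vs ~0.335

# Additivity: for independent pairs, the joint divergence is the sum of the marginal divergences.
p1, q1 = np.array([0.4, 0.6]), np.array([0.8, 0.2])
p2, q2 = np.array([0.3, 0.7]), np.array([0.5, 0.5])
p_joint, q_joint = np.outer(p1, p2).ravel(), np.outer(q1, q2).ravel()
print(np.isclose(entropy(p_joint, q_joint), entropy(p1, q1) + entropy(p2, q2)))  # True
```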
Applications:
- Machine Learning:
  - Used in variational inference to measure the difference between the true posterior distribution and an approximate distribution (see the Gaussian sketch after this list).
  - Cross-entropy loss includes KL divergence as a component: $H(P, Q) = H(P) + D_{\mathrm{KL}}(P \parallel Q)$.
- Information Theory:
  - Quantifies the inefficiency of encoding messages from $P$ using a code optimized for $Q$: on average, $D_{\mathrm{KL}}(P \parallel Q)$ extra bits (or nats) are needed per symbol.
- Natural Sciences:
  - Comparing probability distributions in areas like genetics, linguistics, and physics.
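As an illustration of the variational-inference use mentioned above, the KL divergence between two univariate Gaussians has the closed form $\log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$. The sketch below evaluates it for an approximate posterior against a standard normal prior (function name and values are illustrative):

```python
import numpy as np

def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """D_KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ) in nats."""
    return (np.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)
            - 0.5)

# Approximate posterior N(0.5, 0.8^2) against a standard normal prior N(0, 1).
print(kl_gaussian(0.5, 0.8, 0.0, 1.0))  # ~0.168 nats
```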
Relation to Entropy:
KL divergence relates to entropy and cross-entropy:

$$D_{\mathrm{KL}}(P \parallel Q) = H(P, Q) - H(P)$$

where $H(P, Q) = -\sum_x P(x) \log Q(x)$ is the cross-entropy, and $H(P) = -\sum_x P(x) \log P(x)$ is the Shannon entropy of $P$.
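This identity can be verified numerically on a small discrete example (values are illustrative):

```python
import numpy as np

p = np.array([0.4, 0.6])
q = np.array([0.8, 0.2])

cross_entropy = -np.sum(p * np.log(q))    # H(P, Q)
shannon_entropy = -np.sum(p * np.log(p))  # H(P)
kl = np.sum(p * np.log(p / q))            # D_KL(P || Q)

print(np.isclose(kl, cross_entropy - shannon_entropy))  # True
```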
Special Case (Binary Variables):
For binary random variables with $P = (p,\, 1 - p)$ and $Q = (q,\, 1 - q)$:

$$D_{\mathrm{KL}}(P \parallel Q) = p \log \frac{p}{q} + (1 - p) \log \frac{1 - p}{1 - q}$$
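A quick worked instance of the binary formula with illustrative values $p = 0.5$, $q = 0.75$:

```python
import numpy as np

p, q = 0.5, 0.75
kl = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
print(kl)  # ~0.144 nats
```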