- categories: Statistics, Probability, Definition, Data Science
Definition:
Cross-entropy quantifies the difference between two probability distributions $P$ (true distribution) and $Q$ (predicted distribution) over the same set of events $\mathcal{X}$. It is defined as:
$$H(P, Q) = -\sum_{x \in \mathcal{X}} P(x) \log Q(x)$$
where $P(x)$ and $Q(x)$ are the probabilities assigned to event $x$ by distributions $P$ and $Q$, respectively.
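As a minimal sketch of this formula (NumPy assumed; the distributions below are made-up examples), $H(P, Q)$ can be computed directly from the definition:
```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """H(P, Q) = -sum_x P(x) * log Q(x), in nats (natural log)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Clip Q to avoid log(0) if it assigns zero probability to a supported event.
    return -np.sum(p * np.log(np.clip(q, eps, 1.0)))

p = [0.5, 0.3, 0.2]   # hypothetical "true" distribution P
q = [0.4, 0.4, 0.2]   # hypothetical "predicted" distribution Q

print(cross_entropy(p, p))  # ~1.0297 nats: equals the Shannon entropy H(P)
print(cross_entropy(p, q))  # ~1.0549 nats: larger, because Q != P
```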
Intuition:
- Cross-entropy measures how well the predicted distribution $Q$ approximates the true distribution $P$.
- If $P = Q$, the cross-entropy equals the Shannon entropy $H(P)$, representing the minimum encoding cost.
- Larger differences between $P$ and $Q$ increase the cross-entropy, signifying greater inefficiency in using $Q$ to encode $P$.
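A small numerical illustration of these points (again with made-up distributions, natural logarithms): as $Q$ drifts away from $P$, $H(P, Q)$ rises from its minimum value $H(P)$.
```python
import numpy as np

p = np.array([0.7, 0.3])                      # hypothetical true distribution P
for shift in [0.0, 0.1, 0.2, 0.3]:
    q = np.array([0.7 - shift, 0.3 + shift])  # Q drifts further from P
    h_pq = -np.sum(p * np.log(q))             # cross-entropy in nats
    print(f"Q = {q}, H(P, Q) = {h_pq:.4f}")
# Prints 0.6109, 0.6325, 0.6931, 0.7947: minimal at Q == P, where H(P, Q) = H(P)
```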
Key Properties:
- Non-Negativity: $H(P, Q) \geq H(P) \geq 0$. Equality $H(P, Q) = H(P)$ occurs only if $P = Q$.
- Relation to Shannon Entropy: $H(P, Q) = H(P) + D_{\mathrm{KL}}(P \parallel Q)$, where $D_{\mathrm{KL}}(P \parallel Q)$ is the Kullback-Leibler divergence, measuring the extra cost of encoding $P$ using $Q$ (a numerical check follows this list).
- Logarithm Base:
  - Base-2 logarithms yield entropy in bits.
  - Natural logarithms (base $e$) give entropy in nats.
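A quick numerical check of the decomposition $H(P, Q) = H(P) + D_{\mathrm{KL}}(P \parallel Q)$ (NumPy assumed; the distributions are made-up examples):
```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])           # hypothetical true distribution P
q = np.array([0.4, 0.4, 0.2])           # hypothetical predicted distribution Q

h_p  = -np.sum(p * np.log(p))           # Shannon entropy H(P)
h_pq = -np.sum(p * np.log(q))           # cross-entropy H(P, Q)
d_kl =  np.sum(p * np.log(p / q))       # KL divergence D_KL(P || Q)

print(np.isclose(h_pq, h_p + d_kl))     # True: the extra cost is exactly D_KL
```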
Applications:
- Machine Learning:
  - Used as a loss function for classification tasks, especially when predictions are probabilities (e.g., softmax outputs).
  - Binary cross-entropy for binary classification:
    $$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
  - Categorical cross-entropy for multi-class classification:
    $$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \log \hat{y}_{i,c}$$
    where $y_{i,c}$ is the one-hot encoded true label, and $\hat{y}_{i,c}$ is the predicted probability for class $c$ (a sketch of both losses follows this list).
- Information Theory:
  - Measuring the efficiency of coding when approximating one distribution by another.
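A minimal sketch of both loss functions (NumPy assumed; the function names and example batches are illustrative, not taken from any particular library):
```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy; y_true in {0, 1}, y_pred is the predicted P(y = 1)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # guard against log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy; y_true is one-hot (N, C), y_pred rows sum to 1."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Illustrative batches (made-up labels and predictions).
print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.7])))
print(categorical_cross_entropy(np.array([[1, 0, 0], [0, 1, 0]]),
                                np.array([[0.8, 0.1, 0.1], [0.3, 0.6, 0.1]])))
```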
Interpretation in Optimization:
Minimizing cross-entropy aligns the predicted probabilities $Q$ with the true probabilities $P$, leading to better classification or approximation.
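A toy sketch of this idea (made-up target distribution; it uses the standard fact that the gradient of $H(P, \mathrm{softmax}(z))$ with respect to the logits $z$ is $Q - P$):
```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])   # hypothetical true distribution P
z = np.zeros(3)                 # logits defining Q = softmax(z), start uniform
lr = 0.5                        # learning rate for plain gradient descent

for _ in range(500):
    q = np.exp(z) / np.sum(np.exp(z))   # current predicted distribution Q
    z -= lr * (q - p)                   # gradient step on H(P, Q)

q = np.exp(z) / np.sum(np.exp(z))
print(np.round(q, 3))           # ~[0.7, 0.2, 0.1]: Q has aligned with P
```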