Definition:
Maximum Likelihood Estimation (MLE) is a statistical method for estimating the parameters of a probabilistic model. MLE seeks the parameter values that maximize the likelihood of the observed data under the model.

Let $x_1, x_2, \ldots, x_n$ be the observed data, assumed independent and identically distributed, and let the model's probability density function or mass function be $f(x \mid \theta)$, where $\theta$ denotes the parameters to be estimated. The likelihood function is:

$$L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$$

The MLE $\hat{\theta}$ maximizes $L(\theta)$ with respect to $\theta$:

$$\hat{\theta} = \arg\max_{\theta} L(\theta)$$

Log-Likelihood:
Since the likelihood is a product, it is often more convenient to maximize the log-likelihood:

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta)$$

Because $\log$ is strictly increasing, maximizing $\ell(\theta)$ is equivalent to maximizing $L(\theta)$:

$$\hat{\theta} = \arg\max_{\theta} \ell(\theta)$$

Steps to Compute MLE:

  1. Write down the likelihood function or log-likelihood for the model.
  2. Differentiate $\ell(\theta)$ with respect to $\theta$ and set the derivative to zero to find critical points: $\frac{\partial \ell(\theta)}{\partial \theta} = 0$.
  3. Solve for $\hat{\theta}$, and verify it is a maximum (e.g., using the second derivative test or inspecting behavior at the boundaries). A minimal numerical sketch of these steps is given below.
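
To make these steps concrete, here is a minimal numerical sketch in Python (assuming NumPy and SciPy are available); the exponential model with rate $\lambda$ and the simulated data are used purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Step 1: log-likelihood for an exponential model with rate lam:
#   l(lam) = n * log(lam) - lam * sum(x_i)
def log_likelihood(lam, x):
    return len(x) * np.log(lam) - lam * np.sum(x)

# Simulated data (in practice, x is the observed sample).
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000)  # true rate = 1 / scale = 0.5

# Steps 2-3: maximize l(lam) numerically (equivalently, minimize -l(lam)).
result = minimize_scalar(lambda lam: -log_likelihood(lam, x),
                         bounds=(1e-6, 100.0), method="bounded")
mle_numeric = result.x

# For this model the critical-point equation has the closed-form solution
# lambda_hat = n / sum(x_i) = 1 / mean(x), which the numeric result should match.
mle_closed_form = 1.0 / np.mean(x)

print(mle_numeric, mle_closed_form)  # both close to 0.5
```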

Examples:

  1. Bernoulli Distribution:
    Observations $x_1, \ldots, x_n$, where $x_i \in \{0, 1\}$, and $P(x_i = 1) = p$.

    • Likelihood: $L(p) = \prod_{i=1}^{n} p^{x_i} (1 - p)^{1 - x_i}$
    • Log-likelihood: $\ell(p) = \sum_{i=1}^{n} \left[ x_i \log p + (1 - x_i) \log(1 - p) \right]$
    • Derivative: $\frac{d\ell}{dp} = \frac{\sum_i x_i}{p} - \frac{n - \sum_i x_i}{1 - p} = 0$
    • Solving gives: $\hat{p} = \frac{1}{n} \sum_{i=1}^{n} x_i$

      (The sample mean is the MLE for $p$; a code sketch for this example follows the list.)
  2. Normal Distribution:
    Observations $x_1, \ldots, x_n$, with $x_i \sim \mathcal{N}(\mu, \sigma^2)$.

    • Likelihood: $L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right)$
    • Log-likelihood: $\ell(\mu, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$
    • Derivatives and solutions (see the code sketch after the list): $\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$
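
For the Bernoulli example, here is a minimal Python sketch (assuming NumPy; the data are simulated with a true $p$ of 0.3 purely for illustration). It computes the closed-form MLE $\hat{p}$ (the sample mean) and checks that no value of $p$ on a grid attains a higher log-likelihood:

```python
import numpy as np

# Simulated Bernoulli observations (in practice, x is the observed sample).
rng = np.random.default_rng(1)
x = rng.binomial(n=1, p=0.3, size=500)

# Closed-form MLE: the sample mean.
p_hat = x.mean()

# Bernoulli log-likelihood for a candidate parameter p.
def log_likelihood(p, x):
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Sanity check: p_hat should score at least as well as any p on a grid.
grid = np.linspace(0.01, 0.99, 99)
assert log_likelihood(p_hat, x) >= max(log_likelihood(p, x) for p in grid)

print(p_hat)  # close to the true value 0.3
```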

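For the normal example, a similarly minimal sketch (again assuming NumPy and simulated data) computes the closed-form MLEs directly; note that $\hat{\sigma}^2$ uses the $1/n$ (not $1/(n-1)$) form:

```python
import numpy as np

# Simulated normal observations (in practice, x is the observed sample).
rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=1_000)

# Closed-form MLEs: the sample mean and the 1/n sample variance.
mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat) ** 2)  # same as np.var(x) with ddof=0

print(mu_hat, sigma2_hat)  # close to 5.0 and 4.0 (sigma = 2.0)
```
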
Properties of MLE:

  1. Consistency:
    $\hat{\theta}_n$ converges in probability to the true parameter $\theta_0$ as $n \to \infty$.

  2. Asymptotic Normality:
    For large $n$, the MLE $\hat{\theta}_n$ is approximately normally distributed (illustrated empirically in the sketch after this list):

    $$\sqrt{n}\,(\hat{\theta}_n - \theta_0) \xrightarrow{d} \mathcal{N}\left(0,\ I(\theta_0)^{-1}\right)$$

    where $I(\theta_0)$ is the Fisher information.

  3. Efficiency:
    MLE achieves the Cramér-Rao lower bound asymptotically, making it an efficient estimator under regularity conditions.
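
As a small empirical illustration of asymptotic normality, the following Python sketch (assuming NumPy) uses the Bernoulli MLE, for which the Fisher information has the closed form $I(p) = 1 / (p(1 - p))$, so $\sqrt{n}\,(\hat{p}_n - p_0)$ should be approximately $\mathcal{N}(0,\ p_0(1 - p_0))$; the true $p_0$ and sample sizes are chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
p0, n, reps = 0.3, 1_000, 2_000

# Each replication: draw n Bernoulli(p0) observations and compute the MLE (sample mean).
p_hats = rng.binomial(n=1, p=p0, size=(reps, n)).mean(axis=1)

# Scaled estimation errors should look approximately N(0, p0 * (1 - p0)).
scaled_errors = np.sqrt(n) * (p_hats - p0)

print(scaled_errors.std())       # close to sqrt(p0 * (1 - p0)) ≈ 0.458
print(np.sqrt(p0 * (1 - p0)))
```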

Applications:

  • Parameter estimation for probability distributions.
  • Training models in machine learning (e.g., Logistic Regression, Gaussian Mixture Models).
  • Hypothesis testing (e.g., likelihood ratio tests).