- categories: Statistics, Probability, Method, Data Science
Definition:
Bayesian estimation is a statistical method for estimating parameters of a model by incorporating prior beliefs about the parameters and updating them based on observed data. It is rooted in Bayes’ theorem:
$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}$$
where:
- $p(\theta \mid D)$ is the posterior distribution of the parameter $\theta$ given the data $D$.
- $p(D \mid \theta)$ is the likelihood, representing the probability of the observed data given $\theta$.
- $p(\theta)$ is the prior distribution, expressing beliefs about $\theta$ before observing the data.
- $p(D)$ is the evidence (normalizing constant), ensuring the posterior is a valid probability distribution: $p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta$.
Posterior Distribution:
The posterior combines the likelihood and the prior:
$$p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)$$
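As a minimal numerical illustration of this update, the sketch below (a hypothetical coin-flip model on a discretized parameter grid; all names and numbers are illustrative, assuming NumPy and SciPy are available) evaluates each term of Bayes’ theorem directly:

```python
import numpy as np
from scipy import stats

# Hypothetical setup: estimate a coin's heads-probability theta
# from 10 flips with 7 heads, using a grid approximation.
theta_grid = np.linspace(0.001, 0.999, 999)      # candidate parameter values
prior = np.ones_like(theta_grid)                 # uniform (non-informative) prior
prior /= prior.sum()

heads, flips = 7, 10
likelihood = stats.binom.pmf(heads, flips, theta_grid)  # p(D | theta)

unnormalized = likelihood * prior                # p(D | theta) * p(theta)
evidence = unnormalized.sum()                    # p(D), the normalizing constant
posterior = unnormalized / evidence              # p(theta | D)

print("Posterior mean:", np.sum(theta_grid * posterior))
```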
Key Concepts:
- Prior ($p(\theta)$): Encodes prior knowledge or beliefs about $\theta$. Common choices include:
  - Non-informative prior: Reflects minimal prior knowledge (e.g., uniform distribution).
  - Informative prior: Incorporates domain-specific information (e.g., Gaussian centered on known values).
- Likelihood ($p(D \mid \theta)$): Represents the data generation process, connecting the parameter $\theta$ to the observed data.
- Posterior ($p(\theta \mid D)$): The updated belief about $\theta$ after observing data.
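To make the effect of the prior choice concrete, the following sketch (all numbers are illustrative assumptions) contrasts a flat prior with an informative Gaussian prior for the mean of normally distributed data:

```python
import numpy as np
from scipy import stats

# Hypothetical data: 5 noisy measurements of an unknown mean mu (sigma known).
data = np.array([2.1, 1.8, 2.4, 2.0, 2.2])
sigma = 0.5
mu_grid = np.linspace(0.0, 4.0, 2001)

# Log-likelihood of the data for each candidate mu.
log_lik = stats.norm.logpdf(data[:, None], loc=mu_grid, scale=sigma).sum(axis=0)

def posterior(log_prior):
    log_post = log_lik + log_prior
    post = np.exp(log_post - log_post.max())     # stabilize before normalizing
    return post / post.sum()

flat_prior = np.zeros_like(mu_grid)                           # log of a flat (non-informative) prior
informative = stats.norm.logpdf(mu_grid, loc=1.0, scale=0.2)  # log of a prior centered at 1.0

for name, lp in [("flat", flat_prior), ("informative", informative)]:
    post = posterior(lp)
    print(name, "posterior mean:", np.sum(mu_grid * post))
```

With only five observations, the informative prior pulls the posterior mean noticeably toward its center; with more data the likelihood dominates.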
Bayesian Estimators:
- Maximum A Posteriori (MAP): Chooses the value of $\theta$ that maximizes the posterior:
  $$\hat{\theta}_{\text{MAP}} = \arg\max_{\theta}\, p(\theta \mid D)$$
  Equivalently:
  $$\hat{\theta}_{\text{MAP}} = \arg\max_{\theta}\, \left[ \log p(D \mid \theta) + \log p(\theta) \right]$$
- Posterior Mean: The expected value of $\theta$ under the posterior:
  $$\hat{\theta}_{\text{mean}} = \mathbb{E}[\theta \mid D] = \int \theta\, p(\theta \mid D)\, d\theta$$
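For a quick comparison of the two estimators, the sketch below (a hypothetical Beta-Binomial setup with illustrative counts) reads both off a grid approximation of the posterior:

```python
import numpy as np
from scipy import stats

# Hypothetical coin-flip data: 7 heads in 10 flips, Beta(2, 2) prior on theta.
theta = np.linspace(0.001, 0.999, 999)
log_prior = stats.beta.logpdf(theta, 2, 2)
log_lik = stats.binom.logpmf(7, 10, theta)

log_post = log_prior + log_lik                   # log p(theta) + log p(D | theta)
post = np.exp(log_post - log_post.max())
post /= post.sum()

theta_map = theta[np.argmax(log_post)]           # maximizes the (log) posterior
theta_mean = np.sum(theta * post)                # expected value under the posterior

print(f"MAP: {theta_map:.3f}, posterior mean: {theta_mean:.3f}")
# Closed form for comparison: the posterior is Beta(9, 5),
# with mode 8/12 ≈ 0.667 and mean 9/14 ≈ 0.643.
```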
Examples:
- Gaussian Likelihood with Gaussian Prior:
  - Data: $D = \{x_1, \dots, x_n\}$ with $x_i \sim \mathcal{N}(\mu, \sigma^2)$, $\sigma^2$ known.
  - Prior: $\mu \sim \mathcal{N}(\mu_0, \tau^2)$
  - Likelihood: $p(D \mid \mu) = \prod_{i=1}^{n} \mathcal{N}(x_i \mid \mu, \sigma^2)$
  - Posterior: Combining prior and likelihood, the posterior is also Gaussian:
    $$\mu \mid D \sim \mathcal{N}(\mu_n, \tau_n^2), \qquad \tau_n^2 = \left( \frac{1}{\tau^2} + \frac{n}{\sigma^2} \right)^{-1}, \qquad \mu_n = \tau_n^2 \left( \frac{\mu_0}{\tau^2} + \frac{n \bar{x}}{\sigma^2} \right)$$
- Binomial Likelihood with Beta Prior:
  - Data: $k$ successes in $n$ trials, $k \sim \mathrm{Binomial}(n, \theta)$
  - Prior: $\theta \sim \mathrm{Beta}(\alpha, \beta)$
  - Posterior: $\theta \mid k \sim \mathrm{Beta}(\alpha + k,\ \beta + n - k)$
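Both conjugate updates above reduce to a few lines of arithmetic. A minimal sketch, with prior parameters and data chosen purely for illustration:

```python
import numpy as np
from scipy import stats

# --- Gaussian likelihood, Gaussian prior (sigma^2 known) ---
data = np.array([4.8, 5.3, 5.1, 4.9, 5.2])    # hypothetical observations
sigma2 = 0.25                                  # known observation variance
mu0, tau2 = 4.0, 1.0                           # prior N(mu0, tau2) on mu
n, xbar = len(data), data.mean()

tau2_n = 1.0 / (1.0 / tau2 + n / sigma2)                # posterior variance
mu_n = tau2_n * (mu0 / tau2 + n * xbar / sigma2)        # posterior mean
print(f"mu | D ~ N({mu_n:.3f}, {tau2_n:.4f})")

# --- Binomial likelihood, Beta prior ---
alpha, beta_ = 2, 2                            # hypothetical Beta prior
k, n_trials = 30, 50                           # hypothetical successes / trials
posterior = stats.beta(alpha + k, beta_ + n_trials - k)
print(f"theta | D ~ Beta({alpha + k}, {beta_ + n_trials - k}), mean {posterior.mean():.3f}")
```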
Applications:
- Parameter Estimation: Estimating model parameters in probabilistic models.
- Prediction: Using the posterior predictive distribution (see the sketch after this list):
  $$p(x_{\text{new}} \mid D) = \int p(x_{\text{new}} \mid \theta)\, p(\theta \mid D)\, d\theta$$
- Regularization: Prior distributions act as a form of regularization (e.g., Gaussian priors on weights in Bayesian linear regression).
- Bayesian Machine Learning: Bayesian models are foundational in probabilistic machine learning methods, such as Gaussian processes and Bayesian neural networks.
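The posterior predictive integral has a closed form in conjugate cases; more generally it can be approximated by averaging the likelihood of new data over posterior samples. A sketch continuing the hypothetical Beta-Binomial numbers from above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical posterior Beta(32, 22), i.e. Beta(2, 2) prior updated with
# 30 successes in 50 trials. Predict the probability the next trial succeeds.
posterior = stats.beta(32, 22)

# Monte Carlo approximation of p(x_new = 1 | D) = E[theta | D]:
theta_samples = posterior.rvs(size=10_000, random_state=rng)
p_next_success = np.mean(theta_samples)          # average p(x_new | theta) over samples

print(f"Monte Carlo: {p_next_success:.3f}, exact: {posterior.mean():.3f}")
```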
Advantages:
- Explicit incorporation of prior knowledge.
- Provides a full distribution over parameters, capturing uncertainty.
- Naturally mitigates overfitting by balancing the prior against the likelihood.
Limitations:
- Computationally intensive for complex models (typically requires sampling methods such as MCMC).
- Sensitive to the choice of prior when datasets are small.