Definition:
Bayesian estimation is a statistical method for estimating the parameters of a model by incorporating prior beliefs about the parameters and updating them in light of observed data. It is rooted in Bayes’ theorem:

    p(θ | D) = p(D | θ) · p(θ) / p(D)

where:

  • p(θ | D) is the posterior distribution of the parameter θ given the data D.
  • p(D | θ) is the likelihood, representing the probability of the observed data given θ.
  • p(θ) is the prior distribution, expressing beliefs about θ before observing the data.
  • p(D) is the evidence (normalizing constant), ensuring the posterior is a valid probability distribution:

      p(D) = ∫ p(D | θ) · p(θ) dθ

Posterior Distribution:
The posterior combines the likelihood and the prior:

    p(θ | D) ∝ p(D | θ) · p(θ)
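
As a concrete illustration (a minimal sketch in Python with NumPy only; the coin-flip numbers are made up), this proportionality can be evaluated on a grid of θ values, with the evidence recovered by normalization:

    import numpy as np

    # Coin-flip data: 7 heads in 10 tosses; theta is the unknown heads probability.
    k, n = 7, 10

    theta = np.linspace(0.001, 0.999, 999)        # grid over the parameter
    prior = np.ones_like(theta)                   # uniform (non-informative) prior
    likelihood = theta**k * (1 - theta)**(n - k)  # binomial likelihood, up to a constant

    unnormalized = likelihood * prior                      # posterior ∝ likelihood × prior
    evidence = unnormalized.sum() * (theta[1] - theta[0])  # p(D) by numerical integration
    posterior = unnormalized / evidence                    # now integrates to 1 on the grid
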
Key Concepts:

  1. Prior (p(θ)):
    Encodes prior knowledge or beliefs about θ. Common choices include:

    • Non-informative prior: Reflects minimal prior knowledge (e.g., uniform distribution).
    • Informative prior: Incorporates domain-specific information (e.g., Gaussian centered on known values).
  2. Likelihood (p(D | θ)):
    Represents the data generation process, connecting the parameter θ to the observed data D.

  3. Posterior (p(θ | D)):
    The updated belief about θ after observing the data.

  4. Bayesian Estimators:

    • Maximum A Posteriori (MAP): Chooses the value of θ that maximizes the posterior:

        θ_MAP = argmax_θ p(θ | D)

      Equivalently, since the evidence p(D) does not depend on θ:

        θ_MAP = argmax_θ p(D | θ) · p(θ)

    • Posterior Mean: The expected value of θ under the posterior (both estimators are computed numerically in the sketch after this list):

        θ_mean = E[θ | D] = ∫ θ · p(θ | D) dθ

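To make the two estimators concrete, here is a grid-based sketch in Python (NumPy only, reusing the illustrative coin-flip posterior from above):

    import numpy as np

    # Posterior from 7 heads in 10 tosses under a uniform prior (a Beta(8, 4) shape).
    theta = np.linspace(0.001, 0.999, 999)
    k, n = 7, 10
    post = theta**k * (1 - theta)**(n - k)
    post /= post.sum() * (theta[1] - theta[0])    # normalize to a density on the grid

    theta_map = theta[np.argmax(post)]                          # MAP: mode of the posterior
    theta_mean = (theta * post).sum() * (theta[1] - theta[0])   # posterior mean E[θ | D]

    print(theta_map)    # ≈ 0.700: with a uniform prior, MAP coincides with the MLE k/n
    print(theta_mean)   # ≈ 0.667, i.e. (k + 1) / (n + 2)
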
Examples:

  1. Gaussian Likelihood with Gaussian Prior:

    • Data: x₁, …, xₙ ~ N(θ, σ²), i.i.d. with σ² known.
    • Prior: θ ~ N(μ₀, τ₀²).
    • Likelihood: p(D | θ) = ∏ᵢ N(xᵢ; θ, σ²).
    • Posterior: Combining prior and likelihood yields another Gaussian, θ | D ~ N(μₙ, τₙ²), with

        1/τₙ² = 1/τ₀² + n/σ²,    μₙ = τₙ² · (μ₀/τ₀² + Σᵢ xᵢ / σ²)
  2. Binomial Likelihood with Beta Prior:

    • Data: k successes in n Bernoulli(θ) trials.
    • Prior: θ ~ Beta(α, β).
    • Posterior: θ | D ~ Beta(α + k, β + n − k): the prior’s pseudo-counts are simply updated by the observed counts. A numerical sketch of each example follows this list.
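
For the Gaussian-Gaussian pair, the closed-form update above can be checked directly; a minimal sketch with made-up data (NumPy only):

    import numpy as np

    x = np.array([4.8, 5.3, 5.1, 4.6, 5.2])   # assumed draws from N(θ, σ²)
    sigma2 = 0.25                              # known observation variance (assumed)
    mu0, tau0_sq = 4.0, 1.0                    # prior: θ ~ N(4.0, 1.0)

    n = len(x)
    tau_n_sq = 1.0 / (1.0 / tau0_sq + n / sigma2)          # posterior variance
    mu_n = tau_n_sq * (mu0 / tau0_sq + x.sum() / sigma2)   # posterior mean

    print(mu_n, tau_n_sq)   # ≈ 4.952, 0.0476: pulled from the prior mean toward the data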

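For the Beta-Binomial pair, the update is pure bookkeeping on the Beta parameters; this sketch uses SciPy’s beta distribution (the prior hyperparameters are illustrative):

    from scipy import stats

    alpha, beta = 2.0, 2.0    # prior: θ ~ Beta(2, 2)
    k, n = 7, 10              # observed: 7 successes in 10 trials

    post = stats.beta(alpha + k, beta + n - k)   # posterior: Beta(9, 5)
    print(post.mean())                           # (α + k) / (α + β + n) = 9/14 ≈ 0.643
    print(post.interval(0.95))                   # central 95% credible interval
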
Applications:

  1. Parameter Estimation:
    Estimating model parameters in probabilistic models.

  2. Prediction:
    Using the posterior predictive distribution (a Monte Carlo sketch follows this list):

        p(x_new | D) = ∫ p(x_new | θ) · p(θ | D) dθ

  3. Regularization:
    Prior distributions act as a form of regularization (e.g., Gaussian priors on weights in Bayesian linear regression); see the ridge-regression sketch after this list.

  4. Bayesian Machine Learning:
    Bayesian models are foundational in probabilistic machine learning methods, such as Gaussian processes and Bayesian neural networks.
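
As a sketch of the posterior predictive integral from point 2 (reusing the illustrative Beta(9, 5) posterior from the examples above), Monte Carlo averaging over posterior draws approximates it:

    import numpy as np

    rng = np.random.default_rng(0)

    theta_samples = rng.beta(9.0, 5.0, size=100_000)   # draws from p(θ | D)
    x_new = rng.binomial(1, theta_samples)             # draws from p(x_new | D)

    print(x_new.mean())   # P(x_new = 1 | D) ≈ E[θ | D] = 9/14 ≈ 0.643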

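For point 3, the regularization view can be made concrete: MAP estimation in linear regression with a zero-mean Gaussian prior on the weights coincides with ridge regression with penalty λ = σ²/τ². A sketch on synthetic data:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 3))                  # synthetic design matrix
    w_true = np.array([1.0, -2.0, 0.5])
    sigma2 = 0.5                                  # noise variance (assumed known)
    y = X @ w_true + rng.normal(scale=sigma2**0.5, size=50)

    tau2 = 1.0                                    # prior: w ~ N(0, τ² I)
    lam = sigma2 / tau2                           # equivalent ridge penalty λ = σ²/τ²

    # MAP / ridge solution: (XᵀX + λI)⁻¹ Xᵀy
    w_map = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
    print(w_map)    # close to w_true, shrunk slightly toward zero
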
Advantages:

  • Explicit incorporation of prior knowledge.
  • Provides a full distribution over parameters, capturing uncertainty.
  • Helps guard against overfitting by balancing the likelihood against the prior.

Limitations:

  • Computationally intensive for complex models, whose posteriors rarely have closed form and typically require sampling methods such as MCMC.
  • Sensitive to the choice of prior, especially on small datasets.