- categories: Statistics, Probability, Method, Data Science
Definition:
Bayesian estimation is a statistical method for estimating parameters of a model by incorporating prior beliefs about the parameters and updating them based on observed data. It is rooted in Bayes’ theorem:
$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}$$
where:
- $p(\theta \mid D)$ is the posterior distribution of the parameter $\theta$ given the data $D$.
- $p(D \mid \theta)$ is the likelihood, representing the probability of the observed data given $\theta$.
- $p(\theta)$ is the prior distribution, expressing beliefs about $\theta$ before observing the data.
- $p(D)$ is the evidence (normalizing constant), ensuring the posterior is a valid probability distribution: $p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta$.
Posterior Distribution:
The posterior combines the likelihood and the prior:
$$p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)$$
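As a minimal numerical illustration of this update, the sketch below (a hypothetical coin-flip model on a discretized parameter grid; all names and numbers are illustrative, assuming NumPy and SciPy are available) evaluates each term of Bayes’ theorem directly:

```python
import numpy as np
from scipy import stats

# Hypothetical setup: estimate a coin's heads-probability theta
# from 10 flips with 7 heads, using a grid approximation.
theta_grid = np.linspace(0.001, 0.999, 999)      # candidate parameter values
prior = np.ones_like(theta_grid)                 # uniform (non-informative) prior
prior /= prior.sum()

heads, flips = 7, 10
likelihood = stats.binom.pmf(heads, flips, theta_grid)  # p(D | theta)

unnormalized = likelihood * prior                # p(D | theta) * p(theta)
evidence = unnormalized.sum()                    # p(D), the normalizing constant
posterior = unnormalized / evidence              # p(theta | D)

print("Posterior mean:", np.sum(theta_grid * posterior))
```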
Key Concepts:
- Prior ($p(\theta)$): Encodes prior knowledge or beliefs about $\theta$. Common choices include:
  - Non-informative prior: Reflects minimal prior knowledge (e.g., uniform distribution).
  - Informative prior: Incorporates domain-specific information (e.g., Gaussian centered on known values).
- Likelihood ($p(D \mid \theta)$): Represents the data generation process, connecting the parameter $\theta$ to the observed data.
- Posterior ($p(\theta \mid D)$): The updated belief about $\theta$ after observing data.
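To make the effect of the prior choice concrete, the following sketch (all numbers are illustrative assumptions) contrasts a flat prior with an informative Gaussian prior for the mean of normally distributed data:

```python
import numpy as np
from scipy import stats

# Hypothetical data: 5 noisy measurements of an unknown mean mu (sigma known).
data = np.array([2.1, 1.8, 2.4, 2.0, 2.2])
sigma = 0.5
mu_grid = np.linspace(0.0, 4.0, 2001)

# Log-likelihood of the data for each candidate mu.
log_lik = stats.norm.logpdf(data[:, None], loc=mu_grid, scale=sigma).sum(axis=0)

def posterior(log_prior):
    log_post = log_lik + log_prior
    post = np.exp(log_post - log_post.max())     # stabilize before normalizing
    return post / post.sum()

flat_prior = np.zeros_like(mu_grid)                           # log of a flat (non-informative) prior
informative = stats.norm.logpdf(mu_grid, loc=1.0, scale=0.2)  # log of a prior centered at 1.0

for name, lp in [("flat", flat_prior), ("informative", informative)]:
    post = posterior(lp)
    print(name, "posterior mean:", np.sum(mu_grid * post))
```

With only five observations, the informative prior pulls the posterior mean noticeably toward its center; with more data the likelihood dominates.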
Bayesian Estimators:
- Maximum A Posteriori (MAP): Chooses the value of $\theta$ that maximizes the posterior:
  $$\hat{\theta}_{\text{MAP}} = \arg\max_{\theta}\, p(\theta \mid D)$$
  Equivalently:
  $$\hat{\theta}_{\text{MAP}} = \arg\max_{\theta}\, \left[ \log p(D \mid \theta) + \log p(\theta) \right]$$
- Posterior Mean: The expected value of $\theta$ under the posterior:
  $$\hat{\theta}_{\text{mean}} = \mathbb{E}[\theta \mid D] = \int \theta\, p(\theta \mid D)\, d\theta$$
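For a quick comparison of the two estimators, the sketch below (a hypothetical Beta-Binomial setup with illustrative counts) reads both off a grid approximation of the posterior:

```python
import numpy as np
from scipy import stats

# Hypothetical coin-flip data: 7 heads in 10 flips, Beta(2, 2) prior on theta.
theta = np.linspace(0.001, 0.999, 999)
log_prior = stats.beta.logpdf(theta, 2, 2)
log_lik = stats.binom.logpmf(7, 10, theta)

log_post = log_prior + log_lik                   # log p(theta) + log p(D | theta)
post = np.exp(log_post - log_post.max())
post /= post.sum()

theta_map = theta[np.argmax(log_post)]           # maximizes the (log) posterior
theta_mean = np.sum(theta * post)                # expected value under the posterior

print(f"MAP: {theta_map:.3f}, posterior mean: {theta_mean:.3f}")
# Closed form for comparison: the posterior is Beta(9, 5),
# with mode 8/12 ≈ 0.667 and mean 9/14 ≈ 0.643.
```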
Examples:
- Gaussian Likelihood with Gaussian Prior:
  - Data: $D = \{x_1, \dots, x_n\}$ with $x_i \sim \mathcal{N}(\mu, \sigma^2)$, $\sigma^2$ known.
  - Prior: $\mu \sim \mathcal{N}(\mu_0, \tau^2)$
  - Likelihood: $p(D \mid \mu) = \prod_{i=1}^{n} \mathcal{N}(x_i \mid \mu, \sigma^2)$
  - Posterior: Combining prior and likelihood, the posterior is also Gaussian:
    $$\mu \mid D \sim \mathcal{N}(\mu_n, \tau_n^2), \qquad \tau_n^2 = \left( \frac{1}{\tau^2} + \frac{n}{\sigma^2} \right)^{-1}, \qquad \mu_n = \tau_n^2 \left( \frac{\mu_0}{\tau^2} + \frac{n \bar{x}}{\sigma^2} \right)$$
- Binomial Likelihood with Beta Prior:
  - Data: $k$ successes in $n$ trials, $k \sim \mathrm{Binomial}(n, \theta)$
  - Prior: $\theta \sim \mathrm{Beta}(\alpha, \beta)$
  - Posterior: $\theta \mid k \sim \mathrm{Beta}(\alpha + k,\ \beta + n - k)$
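Both conjugate updates above reduce to a few lines of arithmetic. A minimal sketch, with prior parameters and data chosen purely for illustration:

```python
import numpy as np
from scipy import stats

# --- Gaussian likelihood, Gaussian prior (sigma^2 known) ---
data = np.array([4.8, 5.3, 5.1, 4.9, 5.2])    # hypothetical observations
sigma2 = 0.25                                  # known observation variance
mu0, tau2 = 4.0, 1.0                           # prior N(mu0, tau2) on mu
n, xbar = len(data), data.mean()

tau2_n = 1.0 / (1.0 / tau2 + n / sigma2)                # posterior variance
mu_n = tau2_n * (mu0 / tau2 + n * xbar / sigma2)        # posterior mean
print(f"mu | D ~ N({mu_n:.3f}, {tau2_n:.4f})")

# --- Binomial likelihood, Beta prior ---
alpha, beta_ = 2, 2                            # hypothetical Beta prior
k, n_trials = 30, 50                           # hypothetical successes / trials
posterior = stats.beta(alpha + k, beta_ + n_trials - k)
print(f"theta | D ~ Beta({alpha + k}, {beta_ + n_trials - k}), mean {posterior.mean():.3f}")
```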
Applications:
- Parameter Estimation: Estimating model parameters in probabilistic models.
- Prediction: Using the posterior predictive distribution (see the sketch after this list):
  $$p(x_{\text{new}} \mid D) = \int p(x_{\text{new}} \mid \theta)\, p(\theta \mid D)\, d\theta$$
- Regularization: Prior distributions act as a form of regularization (e.g., Gaussian priors on weights in Bayesian linear regression).
- Bayesian Machine Learning: Bayesian models are foundational in probabilistic machine learning methods, such as Gaussian processes and Bayesian neural networks.
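The posterior predictive integral has a closed form in conjugate cases; more generally it can be approximated by averaging the likelihood of new data over posterior samples. A sketch continuing the hypothetical Beta-Binomial numbers from above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical posterior Beta(32, 22), i.e. Beta(2, 2) prior updated with
# 30 successes in 50 trials. Predict the probability the next trial succeeds.
posterior = stats.beta(32, 22)

# Monte Carlo approximation of p(x_new = 1 | D) = E[theta | D]:
theta_samples = posterior.rvs(size=10_000, random_state=rng)
p_next_success = np.mean(theta_samples)          # average p(x_new | theta) over samples

print(f"Monte Carlo: {p_next_success:.3f}, exact: {posterior.mean():.3f}")
```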
Advantages:
- Explicit incorporation of prior knowledge.
- Provides a full distribution over parameters, capturing uncertainty.
- Naturally mitigates overfitting by balancing the prior against the likelihood.
Limitations:
- Computationally intensive for complex models (typically requires sampling methods such as MCMC).
- Sensitive to the choice of prior when datasets are small.