- categories: Data Science, Method
Lasso Regression
Definition:
Lasso regression, or Least Absolute Shrinkage and Selection Operator, is a linear regression technique that incorporates Regularization through an $\ell_1$ penalty: a term proportional to the absolute values of the coefficients is added to the loss, encouraging sparsity in the model.
Objective Function:
The Lasso regression minimizes:

$$
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \left\{ \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \right\}
$$

where:
- $\|y - X\beta\|_2^2$ is the residual sum of squares,
- $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$ is the $\ell_1$ norm of the coefficients,
- $\lambda \geq 0$ is the regularization parameter controlling the trade-off between model fit and sparsity.
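For concreteness, here is a minimal sketch of this objective in Python (NumPy-based; the function name `lasso_objective` is an illustrative assumption, not a library API):

```python
import numpy as np

def lasso_objective(X, y, beta, lam):
    """Lasso objective: residual sum of squares plus the L1 penalty."""
    rss = np.sum((y - X @ beta) ** 2)  # ||y - X beta||_2^2
    l1_norm = np.sum(np.abs(beta))     # ||beta||_1
    return rss + lam * l1_norm
```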
Key Properties:
- Feature Selection:
  - Lasso shrinks some coefficients to exactly zero, performing variable selection.
  - This is due to the sharp (non-differentiable) geometry of the $\ell_1$ penalty.
- Sparsity:
  - Lasso promotes sparsity in the coefficients, making it useful for high-dimensional datasets with irrelevant features.
- Regularization Strength:
  - Larger $\lambda$ increases the penalty, leading to more coefficients being shrunk to zero (see the sketch after this list).
  - Smaller $\lambda$ reduces the penalty, approaching ordinary least squares (OLS).
- Bias-Variance Trade-Off:
  - Lasso increases bias but reduces variance, which typically improves generalization.
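A small sketch of the regularization-strength effect using scikit-learn (the synthetic data and the specific `alpha` values are illustrative assumptions; scikit-learn's `Lasso` names the regularization parameter `alpha` and scales the residual term by $1/(2n)$, but the qualitative behavior is the same):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# Only the first two features carry signal; the rest are irrelevant.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

for alpha in [0.01, 0.1, 1.0]:
    model = Lasso(alpha=alpha).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))
    print(f"alpha={alpha}: {n_zero} of 10 coefficients are exactly zero")
```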
Optimization:
Unlike Ridge Regression, Lasso does not have a closed-form solution due to the non-differentiability of the $\ell_1$ norm. Instead, iterative algorithms such as coordinate descent or least angle regression (LARS) are used to compute $\hat{\beta}^{\text{lasso}}$.
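Each coordinate update has a closed form via soft-thresholding, which is what makes coordinate descent attractive here. Below is a minimal sketch under the objective given above (the function names, fixed iteration count, and plain Python loops are illustrative assumptions; library implementations add convergence checks and careful scaling):

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding: the proximal operator of the L1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimize ||y - X beta||_2^2 + lam * ||beta||_1 coordinate-wise."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)  # ||x_j||_2^2 for each column
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual with feature j's current contribution removed.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            # Exact 1-D minimizer: soft-threshold, then rescale.
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return beta
```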
Geometric Interpretation:
- The $\ell_1$ penalty forms a diamond-shaped constraint region in coefficient space.
- The solution often lies on the boundary, where some coefficients are zero, leading to sparsity.
Example:
Given a design matrix $X$, a response vector $y$, and a regularization parameter $\lambda$:
- Construct the objective: $\|y - X\beta\|_2^2 + \lambda \|\beta\|_1$.
- Solve using an iterative algorithm (e.g., coordinate descent):
  - Assume starting values $\beta = 0$.
  - Iteratively optimize each $\beta_j$ while keeping the others fixed.
- Result: a sparse coefficient vector in which some entries are exactly zero (the exact solution varies with the data, $\lambda$, and the algorithm).
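A runnable version of this worked example using scikit-learn (which fits `Lasso` by coordinate descent; the data-generating values, sample size, and `alpha` are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: 5 features, only two of which (0 and 3) are relevant.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] + 1.5 * X[:, 3] + rng.normal(scale=0.3, size=200)

model = Lasso(alpha=0.1).fit(X, y)  # solved by coordinate descent internally
print(model.coef_)  # coefficients of the irrelevant features come out exactly 0
```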
Comparison with Ridge Regression:
| Property | Ridge Regression | Lasso Regression |
|---|---|---|
| Regularization Type | $\ell_2$ (squared) norm | $\ell_1$ (absolute) norm |
| Effect on Coefficients | Shrinks all coefficients toward zero | Shrinks some coefficients to exactly zero (sparse) |
| Feature Selection | No | Yes |
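The contrast in the table can be seen directly by fitting both models on the same data (a sketch; the data and penalty strengths are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("Ridge zero coefficients:", int(np.sum(ridge.coef_ == 0)))  # typically 0
print("Lasso zero coefficients:", int(np.sum(lasso.coef_ == 0)))  # typically > 0
```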
Applications:
- High-dimensional datasets with many irrelevant features (e.g., gene expression data).
- When both prediction accuracy and feature selection are important.
- Models requiring interpretability by identifying the most relevant predictors.