Lasso Regression

Definition:
Lasso regression (Least Absolute Shrinkage and Selection Operator) is a linear regression technique that incorporates $\ell_1$ regularization. This regularization adds a penalty proportional to the absolute values of the coefficients, encouraging sparsity in the model.

Objective Function:
Lasso regression minimizes:

$$
\hat{\beta} = \arg\min_{\beta} \left\{ \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \right\}
$$

where:

  • $\|y - X\beta\|_2^2$ is the residual sum of squares,
  • $\|\beta\|_1 = \sum_j |\beta_j|$ is the $\ell_1$ norm of the coefficients,
  • $\lambda \ge 0$ is the regularization parameter controlling the trade-off between model fit and sparsity.
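
As a quick check, here is a minimal NumPy sketch of this objective (the function name `lasso_objective` and its arguments are illustrative, not from the original text):

```python
import numpy as np

def lasso_objective(beta, X, y, lam):
    """Residual sum of squares plus the scaled L1 penalty."""
    residual = y - X @ beta
    rss = residual @ residual       # ||y - X beta||_2^2
    l1 = np.sum(np.abs(beta))       # ||beta||_1
    return rss + lam * l1
```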

Key Properties:

  1. Feature Selection:

    • Lasso shrinks some coefficients to exactly zero, performing variable selection.
    • This is due to the sharp, cornered geometry of the $\ell_1$ penalty.
  2. Sparsity:

    • Lasso promotes sparsity in the coefficients, making it useful for high-dimensional datasets with irrelevant features.
  3. Regularization Strength:

    • A larger $\lambda$ increases the penalty, shrinking more coefficients to exactly zero.
    • A smaller $\lambda$ reduces the penalty; as $\lambda \to 0$, the solution approaches ordinary least squares (OLS). (See the sketch after this list.)
  4. Bias-Variance Trade-Off:

    • Lasso increases bias but reduces variance, which often improves generalization.
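
A small sketch of the sparsity and regularization-strength properties using scikit-learn's `Lasso` (note that scikit-learn calls the regularization parameter `alpha` and scales the loss slightly differently; the dataset and `alpha` values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))          # 20 features, most of them irrelevant
beta_true = np.zeros(20)
beta_true[:3] = [3.0, -2.0, 1.5]        # only 3 informative features
y = X @ beta_true + 0.1 * rng.normal(size=100)

for alpha in [0.01, 0.1, 1.0]:          # increasing penalty strength
    model = Lasso(alpha=alpha).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))
    print(f"alpha={alpha}: {n_zero} of 20 coefficients exactly zero")
```

As `alpha` grows, more coefficients are driven exactly to zero, matching properties 1-3 above.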

Optimization:
Unlike Ridge Regression, Lasso does not have a closed-form solution due to the non-differentiability of the $\ell_1$ norm. Instead, iterative algorithms such as coordinate descent or least angle regression (LARS) are used to compute $\hat{\beta}$.
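
A minimal coordinate-descent sketch, assuming the objective $\|y - X\beta\|_2^2 + \lambda \|\beta\|_1$ defined above (the soft-thresholding update is the standard per-coordinate closed form; the function names are illustrative):

```python
import numpy as np

def soft_threshold(rho, thresh):
    """Closed-form minimizer for a single coordinate under the L1 penalty."""
    return np.sign(rho) * max(abs(rho) - thresh, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimize ||y - X beta||_2^2 + lam * ||beta||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)                         # common zero initialization
    for _ in range(n_iters):
        for j in range(p):
            # Partial residual with feature j's current contribution removed
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j
            z = X[:, j] @ X[:, j]
            # For the objective RSS + lam * ||beta||_1 the threshold is lam / 2
            beta[j] = soft_threshold(rho, lam / 2.0) / z
    return beta
```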

Geometric Interpretation:

  • The $\ell_1$ penalty forms a diamond-shaped constraint region in coefficient space (see the constrained form below).
  • The solution often lies on the boundary, where some coefficients are zero, leading to sparsity.
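
Equivalently, the penalized problem can be written in constrained form, which makes the diamond geometry explicit (a standard reformulation, with the budget $t$ corresponding to some value of $\lambda$):

$$
\min_{\beta} \; \|y - X\beta\|_2^2 \quad \text{subject to} \quad \|\beta\|_1 \le t
$$

The RSS contours are ellipsoids centered at the OLS solution; they often first touch the diamond at a corner, where one or more coordinates are exactly zero.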

Example:
Given a design matrix $X$, a response vector $y$, and a regularization parameter $\lambda$:

  1. Construct the objective: $\|y - X\beta\|_2^2 + \lambda \|\beta\|_1$.

  2. Solve using an algorithm (e.g., coordinate descent):

    • Initialize with starting values $\beta^{(0)} = 0$ (a common choice).
    • Iteratively optimize each coordinate $\beta_j$ while keeping the others fixed.

Result: some coefficients are driven exactly to zero while others remain nonzero; the exact solution depends on the data, $\lambda$, and the algorithm.
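
Putting the steps together on synthetic data (a sketch that reuses the `lasso_coordinate_descent` function from the Optimization section; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
beta_true = np.array([2.0, 0.0, -1.0, 0.0, 0.0])   # two informative features
y = X @ beta_true + 0.05 * rng.normal(size=50)

# Solve with the coordinate-descent sketch from the Optimization section
beta_hat = lasso_coordinate_descent(X, y, lam=5.0)
print(np.round(beta_hat, 3))   # the irrelevant features are typically exactly zero
```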

Comparison with Ridge Regression:

| Property | Ridge Regression | Lasso Regression |
| --- | --- | --- |
| Regularization Type | $\ell_2$ (squared) norm | $\ell_1$ (absolute) norm |
| Effect on Coefficients | Shrinks all coefficients | Shrinks some coefficients to exactly zero (sparse) |
| Feature Selection | No | Yes |
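
To see the contrast empirically, one can fit both models on the same data and count exact zeros (a scikit-learn sketch; the data and `alpha` values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("Ridge exact zeros:", int(np.sum(ridge.coef_ == 0)))   # typically 0
print("Lasso exact zeros:", int(np.sum(lasso.coef_ == 0)))   # typically many
```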

Applications:

  • High-dimensional datasets with many irrelevant features (e.g., gene expression data).
  • When both prediction accuracy and feature selection are important.
  • Models requiring interpretability by identifying the most relevant predictors.