Lasso Regression

Definition:
Lasso regression (Least Absolute Shrinkage and Selection Operator) is a linear regression technique that incorporates $\ell_1$ regularization. This regularization adds a penalty proportional to the absolute values of the coefficients, encouraging sparsity in the model.

Objective Function:
Lasso regression minimizes:

$$
\hat{\beta} = \arg\min_{\beta} \left\{ \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \right\}
$$

where:

  • $\|y - X\beta\|_2^2$ is the residual sum of squares,
  • $\|\beta\|_1 = \sum_j |\beta_j|$ is the $\ell_1$ norm of the coefficients,
  • $\lambda \ge 0$ is the regularization parameter controlling the trade-off between model fit and sparsity.
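
As a quick check, here is a minimal NumPy sketch of this objective (the function name `lasso_objective` and its arguments are illustrative, not from the original text):

```python
import numpy as np

def lasso_objective(beta, X, y, lam):
    """Residual sum of squares plus the scaled L1 penalty."""
    residual = y - X @ beta
    rss = residual @ residual       # ||y - X beta||_2^2
    l1 = np.sum(np.abs(beta))       # ||beta||_1
    return rss + lam * l1
```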

Key Properties:

  1. Feature Selection:

    • Lasso shrinks some coefficients to exactly zero, performing variable selection.
    • This is due to the sharp, cornered geometry of the $\ell_1$ penalty.
  2. Sparsity:

    • Lasso promotes sparsity in the coefficients, making it useful for high-dimensional datasets with irrelevant features.
  3. Regularization Strength:

    • A larger $\lambda$ increases the penalty, shrinking more coefficients to exactly zero.
    • A smaller $\lambda$ reduces the penalty; as $\lambda \to 0$, the solution approaches ordinary least squares (OLS). (See the sketch after this list.)
  4. Bias-Variance Trade-Off:

    • Lasso increases bias but reduces variance, which often improves generalization.
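
A small sketch of the sparsity and regularization-strength properties using scikit-learn's `Lasso` (note that scikit-learn calls the regularization parameter `alpha` and scales the loss slightly differently; the dataset and `alpha` values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))          # 20 features, most of them irrelevant
beta_true = np.zeros(20)
beta_true[:3] = [3.0, -2.0, 1.5]        # only 3 informative features
y = X @ beta_true + 0.1 * rng.normal(size=100)

for alpha in [0.01, 0.1, 1.0]:          # increasing penalty strength
    model = Lasso(alpha=alpha).fit(X, y)
    n_zero = int(np.sum(model.coef_ == 0))
    print(f"alpha={alpha}: {n_zero} of 20 coefficients exactly zero")
```

As `alpha` grows, more coefficients are driven exactly to zero, matching properties 1-3 above.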

Optimization:
Unlike Ridge Regression, Lasso does not have a closed-form solution due to the non-differentiability of the $\ell_1$ norm. Instead, iterative algorithms such as coordinate descent or least angle regression (LARS) are used to compute $\hat{\beta}$.
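
A minimal coordinate-descent sketch, assuming the objective $\|y - X\beta\|_2^2 + \lambda \|\beta\|_1$ defined above (the soft-thresholding update is the standard per-coordinate closed form; the function names are illustrative):

```python
import numpy as np

def soft_threshold(rho, thresh):
    """Closed-form minimizer for a single coordinate under the L1 penalty."""
    return np.sign(rho) * max(abs(rho) - thresh, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimize ||y - X beta||_2^2 + lam * ||beta||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)                         # common zero initialization
    for _ in range(n_iters):
        for j in range(p):
            # Partial residual with feature j's current contribution removed
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j
            z = X[:, j] @ X[:, j]
            # For the objective RSS + lam * ||beta||_1 the threshold is lam / 2
            beta[j] = soft_threshold(rho, lam / 2.0) / z
    return beta
```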

Geometric Interpretation:

  • The $\ell_1$ penalty forms a diamond-shaped constraint region in coefficient space (see the constrained form below).
  • The solution often lies on the boundary, where some coefficients are zero, leading to sparsity.
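
Equivalently, the penalized problem can be written in constrained form, which makes the diamond geometry explicit (a standard reformulation, with the budget $t$ corresponding to some value of $\lambda$):

$$
\min_{\beta} \; \|y - X\beta\|_2^2 \quad \text{subject to} \quad \|\beta\|_1 \le t
$$

The RSS contours are ellipsoids centered at the OLS solution; they often first touch the diamond at a corner, where one or more coordinates are exactly zero.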

Example:
Given a design matrix $X$, a response vector $y$, and a regularization parameter $\lambda$:

  1. Construct the objective: $\|y - X\beta\|_2^2 + \lambda \|\beta\|_1$.

  2. Solve using an algorithm (e.g., coordinate descent):

    • Initialize with starting values $\beta^{(0)} = 0$ (a common choice).
    • Iteratively optimize each coordinate $\beta_j$ while keeping the others fixed.

Result: some coefficients are driven exactly to zero while others remain nonzero; the exact solution depends on the data, $\lambda$, and the algorithm.
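
Putting the steps together on synthetic data (a sketch that reuses the `lasso_coordinate_descent` function from the Optimization section; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
beta_true = np.array([2.0, 0.0, -1.0, 0.0, 0.0])   # two informative features
y = X @ beta_true + 0.05 * rng.normal(size=50)

# Solve with the coordinate-descent sketch from the Optimization section
beta_hat = lasso_coordinate_descent(X, y, lam=5.0)
print(np.round(beta_hat, 3))   # the irrelevant features are typically exactly zero
```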

Comparison with Ridge Regression:

| Property | Ridge Regression | Lasso Regression |
| --- | --- | --- |
| Regularization Type | $\ell_2$ (squared) norm | $\ell_1$ (absolute) norm |
| Effect on Coefficients | Shrinks all coefficients | Shrinks some coefficients to exactly zero (sparse) |
| Feature Selection | No | Yes |
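
To see the contrast empirically, one can fit both models on the same data and count exact zeros (a scikit-learn sketch; the data and `alpha` values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("Ridge exact zeros:", int(np.sum(ridge.coef_ == 0)))   # typically 0
print("Lasso exact zeros:", int(np.sum(lasso.coef_ == 0)))   # typically many
```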

Applications:

  • High-dimensional datasets with many irrelevant features (e.g., gene expression data).
  • When both prediction accuracy and feature selection are important.
  • Models requiring interpretability by identifying the most relevant predictors.