- categories: Data Science, Method
Ridge Regression
Definition:
Ridge regression, also called L2-regularized Linear Regression, modifies the ordinary least squares (OLS) objective by adding a Regularization term that penalizes large model coefficients. It addresses multicollinearity and helps prevent overfitting in linear regression.
Objective Function:
Ridge regression minimizes:
$$ \min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2 $$
where:
- $\|y - X\beta\|_2^2$ is the residual sum of squares,
- $\|\beta\|_2^2$ is the squared $\ell_2$ norm of the coefficients,
- $\lambda \geq 0$ is the regularization parameter that controls the trade-off between fitting the data and keeping the coefficients small.
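Written out in code, the objective is straightforward to evaluate. A minimal NumPy sketch (the function name and array conventions are assumptions for illustration, not part of the original note):

```python
import numpy as np

def ridge_loss(beta, X, y, lam):
    """Residual sum of squares plus the squared L2 penalty on the coefficients."""
    residuals = y - X @ beta
    return residuals @ residuals + lam * (beta @ beta)
```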
Closed-Form Solution:
The ridge regression solution is derived from the normal equations:
$$ \hat{\beta}^{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y $$
where $I$ is the identity matrix.
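A minimal NumPy sketch of this closed-form solution (the helper name is my own, not from the original note):

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y for the ridge coefficients."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Solving the regularized normal equations with `np.linalg.solve` avoids forming an explicit matrix inverse, which is usually the more numerically stable choice.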
Intuition:
- The term $\lambda \|\beta\|_2^2$ penalizes large coefficients, effectively shrinking them towards zero.
- For $\lambda = 0$, ridge regression reduces to ordinary least squares.
- For $\lambda \to \infty$, $\hat{\beta}^{\text{ridge}} \to 0$ (shrinking coefficients completely). A small numerical check of this behaviour follows this list.
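The check below uses made-up data (the design matrix, coefficients, and noise level are all hypothetical) and evaluates the closed-form solution for increasing $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                        # hypothetical design matrix
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

for lam in [0.0, 1.0, 10.0, 1000.0]:
    # Closed-form ridge solution: (X^T X + lam * I)^{-1} X^T y
    beta = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
    print(f"lambda = {lam:7.1f}   ||beta||_2 = {np.linalg.norm(beta):.4f}")
# lambda = 0 reproduces the OLS fit; larger lambda shrinks the coefficients toward zero.
```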
Key Properties:
- Regularization Strength:
  - Larger $\lambda$ increases the penalty, leading to smaller coefficients and potentially underfitting.
  - Smaller $\lambda$ reduces the penalty, approaching OLS and potentially overfitting.
- Bias-Variance Trade-Off:
  - Ridge regression increases bias but reduces variance, improving the generalization of the model.
- No Feature Elimination:
  - Unlike Lasso regression, ridge regression does not perform variable selection; all coefficients are shrunk but not set to zero.
- Stabilizes Inversion:
  - The addition of $\lambda I$ ensures $X^\top X + \lambda I$ is invertible even if $X^\top X$ is singular (e.g., when features are highly collinear); a short sketch illustrating this follows the list.
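A brief NumPy sketch of the invertibility point, using a deliberately collinear (hypothetical) design matrix:

```python
import numpy as np

# Second column is exactly twice the first, so X^T X is rank-deficient (singular).
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
lam = 0.1

gram = X.T @ X
print(np.linalg.matrix_rank(gram))                    # 1: singular, cannot be inverted
print(np.linalg.matrix_rank(gram + lam * np.eye(2)))  # 2: invertible after adding lam * I
```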
Gradient Descent Formulation:
The gradient of the ridge loss function is:
$$ \nabla_\beta L(\beta) = -2 X^\top (y - X\beta) + 2\lambda \beta $$
This can be used in iterative optimization methods when $p$ (the number of features) is large.
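A minimal gradient-descent sketch built on this gradient (the learning rate, iteration count, and function name are illustrative assumptions):

```python
import numpy as np

def ridge_gradient_descent(X, y, lam, lr=1e-3, n_iters=5000):
    """Minimize ||y - X beta||^2 + lam * ||beta||^2 by plain gradient descent."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = -2 * X.T @ (y - X @ beta) + 2 * lam * beta  # gradient of the ridge loss
        beta -= lr * grad                                   # step size may need tuning
    return beta
```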
Example:
Given a design matrix $X$, a response vector $y$, and a regularization parameter $\lambda$, the closed-form solution proceeds as follows:
- Compute $X^\top X$.
- Add $\lambda I$ to form $X^\top X + \lambda I$.
- Compute $X^\top y$.
- Solve $(X^\top X + \lambda I)\,\hat{\beta} = X^\top y$ for $\hat{\beta}$.
Result: $\hat{\beta}^{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y$.
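A worked numerical sketch of these steps in NumPy; the matrix, response, and $\lambda$ below are made-up illustrative values, not taken from any particular dataset:

```python
import numpy as np

# Hypothetical data, chosen only to walk through the steps.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])
lam = 1.0

xtx = X.T @ X                              # step 1: X^T X
regularized = xtx + lam * np.eye(2)        # step 2: add lambda * I
xty = X.T @ y                              # step 3: X^T y
beta = np.linalg.solve(regularized, xty)   # step 4: solve for beta
print(beta)                                # approximately [0.375, 0.583]
```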
Applications:
- Addressing multicollinearity in linear regression.
- Regularizing models with a large number of features.
- Situations where interpretability (non-zero coefficients) is desired over sparsity.