Ridge Regression

Definition:
Ridge regression, also called L2-regularized linear regression or Tikhonov regularization, modifies the ordinary least squares (OLS) objective by adding a regularization term that penalizes large model coefficients. It addresses multicollinearity and helps prevent overfitting in linear regression.

Objective Function:
Ridge regression minimizes:

$$J(\beta) = \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$$

where:

  • $\|y - X\beta\|_2^2$ is the residual sum of squares,
  • $\|\beta\|_2^2$ is the squared $\ell_2$ norm of the coefficients,
  • $\lambda \ge 0$ is the regularization parameter that controls the trade-off between fitting the data and keeping the coefficients small.
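
As a sketch of how this objective evaluates, the loss can be computed directly in NumPy; the data and coefficient values below are hypothetical and serve only to illustrate the formula.

```python
import numpy as np

def ridge_loss(X, y, beta, lam):
    """Ridge objective: residual sum of squares plus the L2 penalty."""
    residuals = y - X @ beta
    return residuals @ residuals + lam * (beta @ beta)

# Hypothetical toy values, chosen only to demonstrate the computation.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([0.5, 0.1])

print(ridge_loss(X, y, beta, lam=1.0))
```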

Closed-Form Solution:
The ridge regression solution is derived from the normal equations:

$$\hat{\beta}_{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y$$

where $I$ is the $p \times p$ identity matrix.
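
A minimal NumPy sketch of this closed-form solution, assuming $X$ is an $n \times p$ design matrix and $y$ an $n$-vector; np.linalg.solve is used rather than an explicit matrix inverse for numerical stability.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y for the ridge coefficients."""
    p = X.shape[1]
    A = X.T @ X + lam * np.eye(p)
    b = X.T @ y
    return np.linalg.solve(A, b)
```

In practice the same estimator is available as sklearn.linear_model.Ridge (where the regularization parameter is called alpha), which also handles intercepts and offers several solvers.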

Intuition:

  • The penalty term $\lambda \|\beta\|_2^2$ penalizes large coefficients, effectively shrinking them towards zero.
  • For $\lambda = 0$, ridge regression reduces to ordinary least squares.
  • For $\lambda \to \infty$, $\hat{\beta}_{\text{ridge}} \to 0$ (the coefficients are shrunk completely); both limits are checked numerically in the sketch after this list.
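
A quick numerical check of both limits, using randomly generated (hypothetical) data:

```python
import numpy as np

def ridge(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Hypothetical synthetic data for the check.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(ridge(X, y, 0.0), ols))   # lambda = 0 recovers the OLS solution
print(np.linalg.norm(ridge(X, y, 1e8)))     # a huge lambda shrinks the coefficients toward 0
```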

Key Properties:

  1. Regularization Strength:

    • Larger $\lambda$ increases the penalty, leading to smaller coefficients and potentially underfitting.
    • Smaller $\lambda$ reduces the penalty, approaching OLS and potentially overfitting.
  2. Bias-Variance Trade-Off:

    • Ridge regression increases bias but reduces variance, improving the generalization of the model.
  3. No Feature Elimination:
    Unlike Lasso regression, ridge regression does not perform variable selection; all coefficients are shrunk but not set to zero.

  4. Stabilizes Inversion:
    The addition of $\lambda I$ ensures that $X^\top X + \lambda I$ is invertible even if $X^\top X$ is singular (e.g., when features are highly collinear), as the sketch after this list illustrates.
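
A small sketch of this stabilization effect, using a fabricated design matrix whose second column is an exact multiple of the first (so $X^\top X$ is singular):

```python
import numpy as np

# Perfectly collinear features: the second column is exactly 2x the first.
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
y = np.array([1.0, 2.0, 3.5])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))   # 1: X^T X is singular, so OLS has no unique solution

lam = 0.1
beta = np.linalg.solve(XtX + lam * np.eye(2), X.T @ y)
print(beta)                         # ridge still returns a unique, finite estimate
```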

Gradient Descent Formulation:
The gradient of the ridge loss function with respect to $\beta$ is:

$$\nabla_\beta J(\beta) = -2 X^\top (y - X\beta) + 2\lambda\beta$$

This gradient can be used in iterative optimization methods when $p$ (the number of features) is large and the closed-form solution is expensive to compute.
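
A minimal gradient descent sketch based on this gradient; the learning rate, iteration count, and per-sample scaling below are arbitrary illustrative choices, and the result is compared against the closed-form solution on hypothetical data.

```python
import numpy as np

def ridge_gd(X, y, lam, lr=0.01, n_iters=5000):
    """Minimize ||y - X beta||^2 + lam * ||beta||^2 by gradient descent."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iters):
        grad = -2 * X.T @ (y - X @ beta) + 2 * lam * beta
        beta -= lr * grad / n   # dividing by n rescales the step; it does not change the minimizer
    return beta

# Hypothetical data to compare against the closed-form estimate.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.0, 3.0]) + rng.normal(size=100)
lam = 0.5

closed = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)
print(np.allclose(ridge_gd(X, y, lam), closed, atol=1e-4))
```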

Example:
Given a design matrix $X$, a response vector $y$, and a regularization parameter $\lambda$, the ridge estimate is computed as follows:

  1. Compute $X^\top X$.

  2. Add $\lambda I$ to form $X^\top X + \lambda I$.

  3. Compute $X^\top y$.

  4. Solve the linear system $(X^\top X + \lambda I)\hat{\beta} = X^\top y$ for $\hat{\beta}$.

    Result: $\hat{\beta}_{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y$.
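
Since these steps map directly onto a few lines of NumPy, here is the same sequence carried out on small hypothetical values of $X$, $y$, and $\lambda$:

```python
import numpy as np

# Hypothetical values, chosen only to keep the arithmetic easy to follow.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [2.0, 1.0]])
y = np.array([2.0, 3.0, 3.0])
lam = 1.0

XtX = X.T @ X                    # step 1: X^T X          -> [[6, 5], [5, 6]]
A = XtX + lam * np.eye(2)        # step 2: add lambda * I -> [[7, 5], [5, 7]]
Xty = X.T @ y                    # step 3: X^T y          -> [11, 11]
beta = np.linalg.solve(A, Xty)   # step 4: solve for beta -> [0.9167, 0.9167]

print(beta)
```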

Applications:

  • Addressing multicollinearity in linear regression.
  • Regularizing models with a large number of features.
  • Situations where keeping every feature in the model (all coefficients non-zero) is preferred over the sparse solutions produced by Lasso.