Definition:
Linear regression models the relationship between a dependent variable $y$ and one or more independent variables $x_1, \dots, x_p$. For $n$ data points, the model predicts:

$\hat{y} = X\beta$

where:

  • $X \in \mathbb{R}^{n \times (p+1)}$ is the design matrix of features (including a column of ones for the intercept),
  • $\beta \in \mathbb{R}^{p+1}$ is the vector of model parameters,
  • $\hat{y} \in \mathbb{R}^{n}$ is the vector of predictions.

The goal is to minimize the residual sum of squares (RSS):

$\mathrm{RSS}(\beta) = \|y - X\beta\|^2 = (y - X\beta)^\top (y - X\beta)$
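
To make these objects concrete, here is a minimal NumPy sketch (with purely illustrative data, not taken from the text) that builds a design matrix with an intercept column and evaluates the RSS for a candidate parameter vector:

```python
import numpy as np

# Illustrative data (not from the text): 4 observations, one feature.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

# Design matrix X: a column of ones for the intercept, then the feature values.
X = np.column_stack([np.ones_like(x), x])

def rss(beta, X, y):
    """Residual sum of squares ||y - X beta||^2."""
    r = y - X @ beta
    return r @ r

# Evaluate the loss at an arbitrary candidate beta = (intercept, slope).
print(rss(np.array([0.5, 1.5]), X, y))
```
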
Normal Equations:
The solution to minimizing $\mathrm{RSS}(\beta)$ is obtained by solving the normal equations:

$X^\top X \beta = X^\top y$

The closed-form solution for $\beta$ is:

$\hat{\beta} = (X^\top X)^{-1} X^\top y$

(if $X^\top X$ is invertible).
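
A minimal sketch of this closed-form solution on the same kind of illustrative data: in practice one solves the linear system $X^\top X \beta = X^\top y$ directly rather than forming the inverse explicitly.

```python
import numpy as np

# Illustrative data (not from the text): 4 observations, one feature.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
X = np.column_stack([np.ones_like(x), x])

# Normal equations: (X^T X) beta = X^T y.
XtX = X.T @ X
Xty = X.T @ y

# Solve the linear system instead of computing an explicit inverse.
beta_hat = np.linalg.solve(XtX, Xty)
print(beta_hat)  # [intercept, slope]
```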

Derivation:

  1. Start with the loss function:

     $L(\beta) = \|y - X\beta\|^2 = (y - X\beta)^\top (y - X\beta)$

  2. Expand $L(\beta)$:

     $L(\beta) = y^\top y - 2\beta^\top X^\top y + \beta^\top X^\top X \beta$

  3. Take the gradient with respect to $\beta$:

     $\nabla_\beta L(\beta) = -2X^\top y + 2X^\top X\beta$

  4. Set $\nabla_\beta L(\beta) = 0$:

     $X^\top X\beta = X^\top y$
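
As a sanity check on step 3, the sketch below compares the analytic gradient $-2X^\top y + 2X^\top X\beta$ against a central finite-difference approximation of the loss; the random data, seed, and tolerance are illustrative assumptions.

```python
import numpy as np

# Small random problem; data, seed, and tolerance are illustrative assumptions.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=5)])
y = rng.normal(size=5)
beta = rng.normal(size=2)

def loss(b):
    r = y - X @ b
    return r @ r

# Analytic gradient from step 3 of the derivation.
grad_analytic = -2 * X.T @ y + 2 * X.T @ X @ beta

# Central finite-difference approximation, one coordinate at a time.
eps = 1e-6
grad_numeric = np.array([
    (loss(beta + eps * e) - loss(beta - eps * e)) / (2 * eps)
    for e in np.eye(beta.size)
])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-4))  # expected: True
```
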
Key Properties:

  1. Existence of Solution:

    • If $X^\top X$ is invertible, $X^\top X\beta = X^\top y$ has a unique solution.
    • If $X^\top X$ is singular, regularization (e.g., Ridge Regression) may be used.
  2. Geometric Interpretation:
    The normal equations project $y$ onto the column space of $X$, yielding the best linear approximation $X\hat{\beta}$ of $y$.

  3. Computational Complexity:
    Solving via the normal equations involves matrix inversion, with complexity $O(p^3)$ for $p$ features (plus $O(np^2)$ to form $X^\top X$). Gradient-based methods can be more efficient for large $n$, and QR decomposition is numerically more stable.
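
As a sketch of the QR alternative mentioned above (the specific calls and data here are assumptions, not from the text), one can factor $X = QR$ and solve the triangular system $R\beta = Q^\top y$, which avoids forming $X^\top X$:

```python
import numpy as np

# Illustrative data (assumed): 6 observations, intercept plus one feature.
rng = np.random.default_rng(1)
x = rng.normal(size=6)
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=6)
X = np.column_stack([np.ones_like(x), x])

# Thin QR factorization: X = Q R with R upper triangular.
Q, R = np.linalg.qr(X)

# Solve R beta = Q^T y; algebraically equivalent to the normal equations
# but better conditioned, since X^T X is never formed.
beta_qr = np.linalg.solve(R, Q.T @ y)

# Cross-check against the normal-equation solution.
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta_qr, beta_ne))  # expected: True
```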

Example:
Given a design matrix $X$ and a target vector $y$:

  1. Compute $X^\top X$.

  2. Compute $X^\top y$.

  3. Solve $X^\top X \hat{\beta} = X^\top y$ for $\hat{\beta}$.

Result: $\hat{\beta} = (X^\top X)^{-1} X^\top y$, the least-squares estimate.
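
The following sketch walks through the same three steps on small illustrative data (the values are assumptions for demonstration, not taken from the text) and cross-checks the result against NumPy's built-in least-squares solver:

```python
import numpy as np

# Illustrative data (assumed for demonstration): roughly y = 1 + 2x plus noise.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
X = np.column_stack([np.ones_like(x), x])

# Step 1: compute X^T X.
XtX = X.T @ X

# Step 2: compute X^T y.
Xty = X.T @ y

# Step 3: solve (X^T X) beta = X^T y for beta.
beta_hat = np.linalg.solve(XtX, Xty)
print(beta_hat)  # [intercept, slope], close to [1, 2] for this data

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # expected: True
```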