Definition:
Linear regression models the relationship between a dependent variable $y$ and one or more independent variables $x_1, \dots, x_p$. For $n$ data points, the model predicts:

$\hat{y} = X\beta$

where:

  • $X \in \mathbb{R}^{n \times (p+1)}$ is the design matrix of features (including a column of ones for the intercept),
  • $\beta \in \mathbb{R}^{p+1}$ is the vector of model parameters,
  • $\hat{y} \in \mathbb{R}^{n}$ is the vector of predictions.

The goal is to minimize the residual sum of squares (RSS):

$\mathrm{RSS}(\beta) = \|y - X\beta\|^2 = (y - X\beta)^\top (y - X\beta)$
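
To make these objects concrete, here is a minimal NumPy sketch (with purely illustrative data, not taken from the text) that builds a design matrix with an intercept column and evaluates the RSS for a candidate parameter vector:

```python
import numpy as np

# Illustrative data (not from the text): 4 observations, one feature.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

# Design matrix X: a column of ones for the intercept, then the feature values.
X = np.column_stack([np.ones_like(x), x])

def rss(beta, X, y):
    """Residual sum of squares ||y - X beta||^2."""
    r = y - X @ beta
    return r @ r

# Evaluate the loss at an arbitrary candidate beta = (intercept, slope).
print(rss(np.array([0.5, 1.5]), X, y))
```
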
Normal Equations:
The solution to minimizing $\mathrm{RSS}(\beta)$ is obtained by solving the normal equations:

$X^\top X \beta = X^\top y$

The closed-form solution for $\beta$ is:

$\hat{\beta} = (X^\top X)^{-1} X^\top y$

(if $X^\top X$ is invertible).
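
A minimal sketch of this closed-form solution on the same kind of illustrative data: in practice one solves the linear system $X^\top X \beta = X^\top y$ directly rather than forming the inverse explicitly.

```python
import numpy as np

# Illustrative data (not from the text): 4 observations, one feature.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
X = np.column_stack([np.ones_like(x), x])

# Normal equations: (X^T X) beta = X^T y.
XtX = X.T @ X
Xty = X.T @ y

# Solve the linear system instead of computing an explicit inverse.
beta_hat = np.linalg.solve(XtX, Xty)
print(beta_hat)  # [intercept, slope]
```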

Derivation:

  1. Start with the loss function:

     $L(\beta) = \|y - X\beta\|^2 = (y - X\beta)^\top (y - X\beta)$

  2. Expand $L(\beta)$:

     $L(\beta) = y^\top y - 2\beta^\top X^\top y + \beta^\top X^\top X \beta$

  3. Take the gradient with respect to $\beta$:

     $\nabla_\beta L(\beta) = -2X^\top y + 2X^\top X\beta$

  4. Set $\nabla_\beta L(\beta) = 0$:

     $X^\top X\beta = X^\top y$
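
As a sanity check on step 3, the sketch below compares the analytic gradient $-2X^\top y + 2X^\top X\beta$ against a central finite-difference approximation of the loss; the random data, seed, and tolerance are illustrative assumptions.

```python
import numpy as np

# Small random problem; data, seed, and tolerance are illustrative assumptions.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=5)])
y = rng.normal(size=5)
beta = rng.normal(size=2)

def loss(b):
    r = y - X @ b
    return r @ r

# Analytic gradient from step 3 of the derivation.
grad_analytic = -2 * X.T @ y + 2 * X.T @ X @ beta

# Central finite-difference approximation, one coordinate at a time.
eps = 1e-6
grad_numeric = np.array([
    (loss(beta + eps * e) - loss(beta - eps * e)) / (2 * eps)
    for e in np.eye(beta.size)
])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-4))  # expected: True
```
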
Key Properties:

  1. Existence of Solution:

    • If $X^\top X$ is invertible, $X^\top X\beta = X^\top y$ has a unique solution.
    • If $X^\top X$ is singular, regularization (e.g., Ridge Regression) may be used.
  2. Geometric Interpretation:
    The normal equations project $y$ onto the column space of $X$, yielding the best linear approximation $X\hat{\beta}$ of $y$.

  3. Computational Complexity:
    Solving via the normal equations involves matrix inversion, with complexity $O(p^3)$ for $p$ features (plus $O(np^2)$ to form $X^\top X$). Gradient-based methods can be more efficient for large $n$, and QR decomposition is numerically more stable.
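
As a sketch of the QR alternative mentioned above (the specific calls and data here are assumptions, not from the text), one can factor $X = QR$ and solve the triangular system $R\beta = Q^\top y$, which avoids forming $X^\top X$:

```python
import numpy as np

# Illustrative data (assumed): 6 observations, intercept plus one feature.
rng = np.random.default_rng(1)
x = rng.normal(size=6)
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=6)
X = np.column_stack([np.ones_like(x), x])

# Thin QR factorization: X = Q R with R upper triangular.
Q, R = np.linalg.qr(X)

# Solve R beta = Q^T y; algebraically equivalent to the normal equations
# but better conditioned, since X^T X is never formed.
beta_qr = np.linalg.solve(R, Q.T @ y)

# Cross-check against the normal-equation solution.
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta_qr, beta_ne))  # expected: True
```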

Example:
Given a design matrix $X$ and a target vector $y$:

  1. Compute $X^\top X$.

  2. Compute $X^\top y$.

  3. Solve $X^\top X \hat{\beta} = X^\top y$ for $\hat{\beta}$.

Result: $\hat{\beta} = (X^\top X)^{-1} X^\top y$, the least-squares estimate.
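
The following sketch walks through the same three steps on small illustrative data (the values are assumptions for demonstration, not taken from the text) and cross-checks the result against NumPy's built-in least-squares solver:

```python
import numpy as np

# Illustrative data (assumed for demonstration): roughly y = 1 + 2x plus noise.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
X = np.column_stack([np.ones_like(x), x])

# Step 1: compute X^T X.
XtX = X.T @ X

# Step 2: compute X^T y.
Xty = X.T @ y

# Step 3: solve (X^T X) beta = X^T y for beta.
beta_hat = np.linalg.solve(XtX, Xty)
print(beta_hat)  # [intercept, slope], close to [1, 2] for this data

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # expected: True
```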