**Definition**:
Logistic regression is a statistical model used for binary classification. It models the probability of a binary outcome as a function of input features using the logistic (sigmoid) function.
The probability that an observation belongs to the positive class (y=1) is modeled as:
$$P(y = 1 \mid x; \beta) = \sigma(x^\top \beta) = \frac{1}{1 + e^{-x^\top \beta}}$$
where:
- $x \in \mathbb{R}^n$ is the vector of features (including an intercept term).
- $\beta \in \mathbb{R}^n$ is the vector of model parameters.
- $\sigma(z)$ is the sigmoid function:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
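A minimal NumPy sketch of this model (the helper names `sigmoid` and `predict_proba` are illustrative, and `X` is assumed to already carry an intercept column of ones):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, beta):
    """P(y=1 | x; beta) = sigma(x^T beta), computed row-wise over X."""
    return sigmoid(X @ beta)
```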
**Log-Likelihood Function**:
Given $m$ training examples $\{(x_i, y_i)\}_{i=1}^m$ where $y_i \in \{0, 1\}$, the likelihood of the data is:
$$L(\beta) = \prod_{i=1}^m \sigma(x_i^\top \beta)^{y_i} \left(1 - \sigma(x_i^\top \beta)\right)^{1 - y_i}$$
Taking the logarithm gives the log-likelihood:
$$\ell(\beta) = \sum_{i=1}^m \left[ y_i \log \sigma(x_i^\top \beta) + (1 - y_i) \log\left(1 - \sigma(x_i^\top \beta)\right) \right]$$
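Reusing the sketch above, the log-likelihood can be evaluated directly (the clipping constant is an illustrative numerical safeguard, not part of the definition):

```python
def log_likelihood(beta, X, y):
    """Log-likelihood: sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ]."""
    p = sigmoid(X @ beta)
    # Clip probabilities away from 0 and 1 so log() stays finite.
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```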
**Optimization Problem**:
The logistic regression parameters $\beta$ are obtained by maximizing the log-likelihood:
$$\hat{\beta} = \arg\max_\beta \ell(\beta)$$
Equivalently, minimizing the negative log-likelihood:
$$\hat{\beta} = \arg\min_\beta \left[ -\sum_{i=1}^m \left( y_i \log \sigma(x_i^\top \beta) + (1 - y_i) \log\left(1 - \sigma(x_i^\top \beta)\right) \right) \right]$$
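One practical way to obtain $\hat{\beta}$, sketched here with `scipy.optimize.minimize` and the `log_likelihood` helper above (the choice of BFGS is illustrative):

```python
from scipy.optimize import minimize

def fit_mle(X, y):
    """Fit beta by minimizing the negative log-likelihood."""
    neg_ll = lambda beta: -log_likelihood(beta, X, y)
    beta0 = np.zeros(X.shape[1])
    return minimize(neg_ll, beta0, method="BFGS").x
```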
**Gradient Descent**:
The gradient of the log-likelihood with respect to $\beta$ is:
$$\nabla_\beta \ell(\beta) = \sum_{i=1}^m \left( y_i - \sigma(x_i^\top \beta) \right) x_i$$
This gradient drives iterative optimizers such as gradient ascent on $\ell(\beta)$ (equivalently, gradient descent on the negative log-likelihood) or Newton's method.
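The gradient translates directly into code; the gradient-ascent loop below is a sketch in which the step size and iteration count are illustrative choices, not prescribed here:

```python
def gradient(beta, X, y):
    """Gradient of the log-likelihood: sum_i (y_i - sigma(x_i^T beta)) x_i."""
    return X.T @ (y - sigmoid(X @ beta))

def fit_gradient_ascent(X, y, lr=0.5, n_iter=5000):
    """Maximize the log-likelihood with gradient ascent.

    The update uses the average (per-example) gradient for numerical
    stability; lr and n_iter are illustrative values.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = beta + lr * gradient(beta, X, y) / len(y)
    return beta
```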
**Decision Rule**:
The predicted probability for the positive class is:
$$\hat{P}(y = 1 \mid x) = \sigma(x^\top \hat{\beta})$$
The decision rule for classification is:
$$\hat{y} = \begin{cases}
1 & \text{if } \hat{P}(y=1 \mid x) \geq 0.5 \\
0 & \text{otherwise}
\end{cases}$$
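Putting the pieces together, a sketch of the decision rule with the 0.5 threshold exposed as a parameter, plus a toy run on synthetic data (illustrative only):

```python
def predict(X, beta, threshold=0.5):
    """Classify as 1 when the predicted probability meets the threshold."""
    return (predict_proba(X, beta) >= threshold).astype(int)

# Toy usage on synthetic data generated from a known logistic model.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])   # intercept + one feature
y = (rng.random(200) < sigmoid(2.0 * X[:, 1])).astype(int)  # Bernoulli labels
beta_hat = fit_gradient_ascent(X, y)
y_hat = predict(X, beta_hat)
```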
**Assumptions**:
1. The relationship between the log-odds of the outcome and the features is linear (see the derivation after this list):
$$\log \frac{P(y=1 | x)}{P(y=0 | x)} = x^\top \beta$$
2. Independence of observations.
3. Features are not highly collinear (to ensure stable estimation).
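The linearity in assumption 1 follows from the sigmoid form of the model; in practice, the assumption is that this linear log-odds relationship is adequate for the data:
$$\frac{P(y=1 \mid x)}{P(y=0 \mid x)} = \frac{\sigma(x^\top \beta)}{1 - \sigma(x^\top \beta)} = e^{x^\top \beta} \quad\Longrightarrow\quad \log \frac{P(y=1 \mid x)}{P(y=0 \mid x)} = x^\top \beta$$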
**Extensions**:
1. **Multiclass Logistic Regression** ([[Softmax Regression]]):
Generalizes logistic regression to classify among $k$ classes using the [[Softmax Function]].
2. **Regularized Logistic Regression**:
Adds $L_1$ or $L_2$ regularization to prevent overfitting (see the sketch after this list):
- Lasso ($L_1$):
$$\ell_\text{reg}(\beta) = \ell(\beta) - \lambda \|\beta\|_1$$
- Ridge ($L_2$):
$$\ell_\text{reg}(\beta) = \ell(\beta) - \lambda \|\beta\|_2^2$$
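For the ridge ($L_2$) case, the penalized objective and its gradient can be sketched by extending the helpers above; leaving the intercept unpenalized and the value of $\lambda$ (`lam`) are conventional, illustrative choices, not prescribed here:

```python
def reg_log_likelihood(beta, X, y, lam=1.0):
    """L2-penalized objective: l(beta) - lam * ||beta[1:]||^2 (intercept unpenalized)."""
    return log_likelihood(beta, X, y) - lam * np.sum(beta[1:] ** 2)

def reg_gradient(beta, X, y, lam=1.0):
    """Gradient of the penalized objective: grad l(beta) minus 2 * lam * beta (non-intercept terms)."""
    g = gradient(beta, X, y)
    g[1:] -= 2.0 * lam * beta[1:]
    return g
```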
**Applications**:
- Binary classification problems: spam detection, medical diagnosis, etc.
- Estimating probabilities of binary events.
- Feature importance analysis through parameter interpretation.
**Limitations**:
- Assumes linearity in the log-odds, which may not hold for complex relationships.
- Not inherently robust to outliers or highly imbalanced datasets.