**Definition**:
Logistic regression is a statistical model used for binary classification. It models the probability of a binary outcome as a function of input features using the logistic (sigmoid) function.

The probability that an observation belongs to the positive class ($y = 1$) is modeled as:

$$P(y=1 | x) = \sigma(x^\top \beta)$$

where:

- $x$ is the vector of features (including an intercept term).
- $\beta$ is the vector of model parameters.
- $\sigma$ is the sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
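
As a rough illustration of the definition above, the following sketch evaluates the sigmoid and the modeled probability $P(y=1 | x)$ in plain Python. The function names `sigmoid` and `predict_proba` and the example numbers are assumptions made for illustration; the branch on the sign of `z` is one common way to avoid overflow in `exp`, not the only one.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # z < 0 here, so exp(z) cannot overflow
    return ez / (1.0 + ez)


def predict_proba(x: Sequence[float], beta: Sequence[float]) -> float:
    """P(y = 1 | x) = sigmoid(x^T beta); x[0] is assumed to be the intercept term 1.0."""
    z = sum(xj * bj for xj, bj in zip(x, beta))
    return sigmoid(z)


# Hypothetical example: an intercept plus two features.
x = [1.0, 2.0, -1.0]
beta = [0.5, 0.25, 1.0]
p = predict_proba(x, beta)  # x^T beta = 0.5 + 0.5 - 1.0 = 0, so p = 0.5
```

A linear score of zero maps to a probability of exactly 0.5, which is why $x^\top \beta = 0$ marks the decision boundary of the model.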

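**Log-Likelihood Function**:
Given $n$ training examples $\{(x_i, y_i)\}_{i=1}^n$ where $y_i \in \{0, 1\}$, the likelihood of the data is:

$$L(\beta) = \prod_{i=1}^n \sigma(x_i^\top \beta)^{y_i} \left(1 - \sigma(x_i^\top \beta)\right)^{1 - y_i}$$

Taking the logarithm gives the log-likelihood:

$$\ell(\beta) = \sum_{i=1}^n \left[ y_i \log \sigma(x_i^\top \beta) + (1 - y_i) \log\left(1 - \sigma(x_i^\top \beta)\right) \right]$$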
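
The log-likelihood can be evaluated directly from its definition as a sum over training examples. The sketch below is one possible way to do this in plain Python; the helper names `log_sigmoid` and `log_likelihood` and the toy data are assumptions for illustration. It uses the identity $\log(1 - \sigma(z)) = \log \sigma(-z)$ so that both terms go through a single numerically stable helper.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """log(sigmoid(z)), computed stably as -log(1 + exp(-z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_likelihood(X: Sequence[Sequence[float]], y: Sequence[int],
                   beta: Sequence[float]) -> float:
    """Sum over examples of y_i*log(sigma(z_i)) + (1 - y_i)*log(1 - sigma(z_i))."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        z = sum(xj * bj for xj, bj in zip(x_i, beta))
        # log(1 - sigmoid(z)) equals log(sigmoid(-z))
        total += y_i * log_sigmoid(z) + (1 - y_i) * log_sigmoid(-z)
    return total


# Made-up data; the first column is the intercept term.
X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, -0.3]]
y = [1, 0, 1, 0]
ll_zero = log_likelihood(X, y, [0.0, 0.0])  # every probability is 0.5, so this is 4 * log(0.5)
```

With all parameters at zero the model assigns probability 0.5 to every example, which gives a convenient baseline value of $n \log 0.5$ to compare fitted models against.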

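**Optimization Problem**:
The logistic regression parameters are obtained by maximizing the log-likelihood:

$$\hat{\beta} = \arg\max_{\beta} \ell(\beta)$$

Equivalently, minimizing the negative log-likelihood:

$$\hat{\beta} = \arg\min_{\beta} \left[ -\ell(\beta) \right]$$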

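**Gradient Descent**:
The gradient of the log-likelihood with respect to $\beta$ is:

$$\nabla_\beta \ell(\beta) = \sum_{i=1}^n \left( y_i - \sigma(x_i^\top \beta) \right) x_i$$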

This gradient is used in iterative optimization algorithms such as gradient ascent on the log-likelihood (equivalently, gradient descent on the negative log-likelihood) or Newton’s method.
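
A minimal sketch of fitting the parameters by following this gradient is shown below. It takes fixed-size steps in the direction of the log-likelihood gradient, which is the same as gradient descent on the negative log-likelihood. The names `gradient` and `fit_gradient_ascent`, the step size, the iteration count, and the toy data are all assumptions chosen for illustration; the data are deliberately not linearly separable so that a finite maximizer exists.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def gradient(X: Sequence[Sequence[float]], y: Sequence[int],
             beta: Sequence[float]) -> List[float]:
    """Gradient of the log-likelihood: sum_i (y_i - sigmoid(x_i^T beta)) * x_i."""
    grad = [0.0] * len(beta)
    for x_i, y_i in zip(X, y):
        z = sum(xj * bj for xj, bj in zip(x_i, beta))
        residual = y_i - sigmoid(z)
        for j, xj in enumerate(x_i):
            grad[j] += residual * xj
    return grad


def fit_gradient_ascent(X, y, step_size=0.05, n_iters=5000):
    """Fixed-step gradient ascent on the log-likelihood, starting from beta = 0."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iters):
        g = gradient(X, y, beta)
        beta = [b + step_size * gj for b, gj in zip(beta, g)]
    return beta


# Made-up, non-separable data; the first column is the intercept term.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y = [0, 0, 1, 0, 1, 1]
beta_hat = fit_gradient_ascent(X, y)
```

At the fitted parameters the gradient is close to zero. Because the first column of `X` is the intercept, this also means the predicted probabilities sum to the number of positive labels. A fixed step size is the simplest choice here; a line search or Newton's method would typically need far fewer iterations.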

**Decision Rule**:
The predicted probability for the positive class is:

$$\hat{P}(y=1 | x) = \sigma(x^\top \hat{\beta})$$

The decision rule for classification is:

$$\hat{y} = \begin{cases} 1 & \text{if } \hat{P}(y=1 | x) \geq 0.5 \\ 0 & \text{otherwise} \end{cases}$$

**Assumptions**:
1. The relationship between the log-odds of the outcome and the features is linear: $$\log \frac{P(y=1 | x)}{P(y=0 | x)} = x^\top \beta$$
2. Independence of observations.
3. Features are not highly collinear (to ensure stable estimation).

**Extensions**:
1. **Multiclass Logistic Regression** ([[Softmax Regression]]): Generalizes logistic regression to classify among $k$ classes using the [[Softmax Function]].
2. **Regularized Logistic Regression**: Adds $L_1$ or $L_2$ regularization to prevent overfitting:
   - Lasso ($L_1$): $$\ell_\text{reg}(\beta) = \ell(\beta) - \lambda \|\beta\|_1$$
   - Ridge ($L_2$): $$\ell_\text{reg}(\beta) = \ell(\beta) - \lambda \|\beta\|_2^2$$

**Applications**:
- Binary classification problems: spam detection, medical diagnosis, etc.
- Estimating probabilities of binary events.
- Feature importance analysis through parameter interpretation.

**Limitations**:
- Assumes linearity in the log-odds, which may not hold for complex relationships.
- Not inherently robust to outliers or highly imbalanced datasets.
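
To make the decision rule and the regularized objectives from this note concrete, here is a small sketch in plain Python. The names `classify` and `penalized_log_likelihood`, the `threshold` and `kind` parameters, and the example numbers are assumptions for illustration. The penalty is applied to every entry of $\beta$, following the formulas as written in this note; in practice the intercept is often left unpenalized.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(x: Sequence[float], beta_hat: Sequence[float],
             threshold: float = 0.5) -> int:
    """Return 1 if the predicted probability is at least `threshold`, else 0."""
    z = sum(xj * bj for xj, bj in zip(x, beta_hat))
    return 1 if sigmoid(z) >= threshold else 0


def penalized_log_likelihood(loglik: float, beta: Sequence[float],
                             lam: float, kind: str = "l2") -> float:
    """Log-likelihood minus lam * ||beta||_1 (lasso) or lam * ||beta||_2^2 (ridge)."""
    if kind == "l1":
        penalty = sum(abs(b) for b in beta)
    elif kind == "l2":
        penalty = sum(b * b for b in beta)
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    return loglik - lam * penalty


# Hypothetical fitted parameters (intercept, slope) and a made-up log-likelihood value.
beta_hat = [-1.0, 0.5]
label_on_boundary = classify([1.0, 2.0], beta_hat)  # score is 0, probability 0.5, label 1
ridge_objective = penalized_log_likelihood(-2.0, [3.0, -4.0], lam=0.1, kind="l2")
```

With the default threshold of 0.5, the rule reduces to checking the sign of $x^\top \hat{\beta}$. A different threshold can be passed when the costs of false positives and false negatives differ, for example on imbalanced data.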