Definition:
The sigmoid function is defined as

  $\sigma(x) = \dfrac{1}{1 + e^{-x}}$

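As a quick illustration, here is a minimal NumPy sketch of the definition (the `sigmoid` helper name is our own; splitting by the sign of $x$ is one standard way to avoid overflow in `exp`):

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: 1 / (1 + exp(-x)), split by sign to avoid overflow in exp."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))  # safe: exponent <= 0
    expx = np.exp(x[~pos])                    # safe: exponent < 0
    out[~pos] = expx / (1.0 + expx)           # same value, rewritten form
    return out

print(sigmoid([-2.0, 0.0, 2.0]))  # [0.1192... 0.5 0.8807...]
```
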
Key Properties:

  1. Range: $\sigma(x) \in (0, 1)$ for all real $x$.
  2. Derivative:

    $\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$

    It peaks at $x = 0$, where $\sigma'(0) = \tfrac{1}{4}$, and decreases symmetrically as $|x|$ increases.
  3. Monotonicity: $\sigma(x)$ is monotonically increasing for all $x \in \mathbb{R}$.
  4. Asymptotes: $\sigma(x) \to 1$ as $x \to \infty$ and $\sigma(x) \to 0$ as $x \to -\infty$.
  5. Symmetry:

    $\sigma(-x) = 1 - \sigma(x)$

    This symmetry makes it useful in probabilistic models: if $\sigma(x)$ is the probability of one class, then $\sigma(-x)$ is the probability of the complementary class.
  6. Relationship to Log-Odds:
    If $p = \sigma(x)$, then:

    $x = \log\left(\dfrac{p}{1 - p}\right)$

    where $x$ represents the log-odds of $p$.
  7. Saturation: When $|x|$ is large, $\sigma'(x)$ is close to zero, so gradients vanish, which can slow training in deep networks. Several of these properties are spot-checked numerically in the sketch below.
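
A minimal NumPy sketch (function names are our own) that checks the derivative peak, the symmetry identity, the log-odds inverse, and saturation on a grid:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(x):
    # Property 2: sigma'(x) = sigma(x) * (1 - sigma(x))
    return sigmoid(x) * (1.0 - sigmoid(x))

x = np.linspace(-10.0, 10.0, 10001)

# Property 2: the derivative peaks at x = 0, where it equals 1/4.
print(x[np.argmax(dsigmoid(x))], dsigmoid(0.0))    # ~0.0  0.25

# Property 5: sigma(-x) = 1 - sigma(x) everywhere on the grid.
print(np.allclose(sigmoid(-x), 1.0 - sigmoid(x)))  # True

# Property 6: the log-odds recover x from p = sigma(x).
p = sigmoid(x)
print(np.allclose(np.log(p / (1.0 - p)), x))       # True

# Property 7: saturation -- the gradient vanishes for large |x|.
print(dsigmoid(np.array([0.0, 5.0, 10.0])))        # [2.5e-01 6.6e-03 4.5e-05]
```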

Connection to the Softplus Function:
The softplus function is defined as:

  $\operatorname{softplus}(x) = \log\bigl(1 + e^{x}\bigr)$

  1. Gradient Connection: The sigmoid function is the derivative of the softplus function (checked numerically in the sketch after this list):

    $\dfrac{d}{dx}\operatorname{softplus}(x) = \sigma(x)$
  2. Softplus Approximation to ReLU:
    The softplus function is a smooth approximation of the ReLU function $\max(0, x)$: the gap between them, $\log\bigl(1 + e^{-|x|}\bigr)$, is largest at $x = 0$ and vanishes as $|x|$ grows, while softplus stays differentiable everywhere. The sigmoid, by contrast, is more tightly linked to probabilities.
  3. Range vs. Output Behavior:
    • $\sigma(x)$ maps $\mathbb{R}$ to $(0, 1)$, suited for probabilities.
    • $\operatorname{softplus}(x)$ maps $\mathbb{R}$ to $(0, \infty)$, suited for non-negative outputs (e.g., certain loss functions).
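
These three points can be illustrated with a short NumPy sketch (names are our own; `np.logaddexp(0, x)` is used as a stable way to compute $\log(1 + e^{x})$):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    # log(1 + e^x), computed stably as logaddexp(log(1), x)
    return np.logaddexp(0.0, x)

def relu(x):
    return np.maximum(0.0, x)

x = np.linspace(-6.0, 6.0, 13)

# 1. Gradient connection: central finite differences of softplus
#    reproduce the sigmoid.
h = 1e-6
fd = (softplus(x + h) - softplus(x - h)) / (2.0 * h)
print(np.allclose(fd, sigmoid(x)))                 # True

# 2. Smooth approximation to ReLU: the gap log(1 + e^{-|x|}) is
#    largest at x = 0 (log 2) and shrinks as |x| grows.
print(np.abs(softplus(x) - relu(x)).max())         # ~0.693 (at x = 0)
print(float(softplus(6.0) - relu(6.0)))            # ~0.0025

# 3. Range: sigmoid lands in (0, 1), softplus in (0, inf).
print(sigmoid(x).min() > 0, sigmoid(x).max() < 1)  # True True
print(softplus(x).min() > 0)                       # True
```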

Applications Comparison:

  • Sigmoid is used where outputs need a probabilistic interpretation, such as the output layer of a binary classifier.
  • Softplus is used in settings requiring smooth, non-negative activations, such as modeling the rate parameter in Poisson regression models (see the sketch below).
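
To make the contrast concrete, here is a hypothetical toy sketch (the data and weights are made up for illustration, not a fitted model): the same unbounded linear scores are mapped to probabilities by sigmoid and to positive Poisson rates by softplus.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    return np.logaddexp(0.0, x)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))        # 5 toy examples, 3 features
w = np.array([0.5, -1.0, 2.0])     # illustrative weights, not fitted
z = X @ w                          # unbounded linear scores

# Sigmoid head: scores -> probabilities in (0, 1),
# e.g. P(y = 1 | x) in a binary classifier.
probs = sigmoid(z)

# Softplus head: scores -> strictly positive rates,
# e.g. the Poisson mean lambda in Poisson regression.
rates = softplus(z)

print(probs)   # each value in (0, 1)
print(rates)   # each value > 0
```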