Definition:
The sigmoid function is defined as

  $\sigma(x) = \dfrac{1}{1 + e^{-x}}$

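As a quick illustration, here is a minimal NumPy sketch of the definition (the `sigmoid` helper name is our own; splitting by the sign of $x$ is one standard way to avoid overflow in `exp`):

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: 1 / (1 + exp(-x)), split by sign to avoid overflow in exp."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))  # safe: exponent <= 0
    expx = np.exp(x[~pos])                    # safe: exponent < 0
    out[~pos] = expx / (1.0 + expx)           # same value, rewritten form
    return out

print(sigmoid([-2.0, 0.0, 2.0]))  # [0.1192... 0.5 0.8807...]
```
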
Key Properties:

  1. Range: $\sigma(x) \in (0, 1)$ for all real $x$.
  2. Derivative:

    $\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$

    It peaks at $x = 0$, where $\sigma'(0) = \tfrac{1}{4}$, and decreases symmetrically as $|x|$ increases.
  3. Monotonicity: $\sigma(x)$ is monotonically increasing for all $x \in \mathbb{R}$.
  4. Asymptotes: $\sigma(x) \to 1$ as $x \to \infty$ and $\sigma(x) \to 0$ as $x \to -\infty$.
  5. Symmetry:

    $\sigma(-x) = 1 - \sigma(x)$

    This symmetry makes it useful in probabilistic models: if $\sigma(x)$ is the probability of one class, then $\sigma(-x)$ is the probability of the complementary class.
  6. Relationship to Log-Odds:
    If $p = \sigma(x)$, then:

    $x = \log\left(\dfrac{p}{1 - p}\right)$

    where $x$ represents the log-odds of $p$.
  7. Saturation: When $|x|$ is large, $\sigma'(x)$ is close to zero, so gradients vanish, which can slow training in deep networks. Several of these properties are spot-checked numerically in the sketch below.
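
A minimal NumPy sketch (function names are our own) that checks the derivative peak, the symmetry identity, the log-odds inverse, and saturation on a grid:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(x):
    # Property 2: sigma'(x) = sigma(x) * (1 - sigma(x))
    return sigmoid(x) * (1.0 - sigmoid(x))

x = np.linspace(-10.0, 10.0, 10001)

# Property 2: the derivative peaks at x = 0, where it equals 1/4.
print(x[np.argmax(dsigmoid(x))], dsigmoid(0.0))    # ~0.0  0.25

# Property 5: sigma(-x) = 1 - sigma(x) everywhere on the grid.
print(np.allclose(sigmoid(-x), 1.0 - sigmoid(x)))  # True

# Property 6: the log-odds recover x from p = sigma(x).
p = sigmoid(x)
print(np.allclose(np.log(p / (1.0 - p)), x))       # True

# Property 7: saturation -- the gradient vanishes for large |x|.
print(dsigmoid(np.array([0.0, 5.0, 10.0])))        # [2.5e-01 6.6e-03 4.5e-05]
```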

Connection to the Softplus Function:
The softplus function is defined as:

  $\operatorname{softplus}(x) = \log\bigl(1 + e^{x}\bigr)$

  1. Gradient Connection: The sigmoid function is the derivative of the softplus function (checked numerically in the sketch after this list):

    $\dfrac{d}{dx}\operatorname{softplus}(x) = \sigma(x)$
  2. Softplus Approximation to ReLU:
    The softplus function is a smooth approximation of the ReLU function $\max(0, x)$: the gap between them, $\log\bigl(1 + e^{-|x|}\bigr)$, is largest at $x = 0$ and vanishes as $|x|$ grows, while softplus stays differentiable everywhere. The sigmoid, by contrast, is more tightly linked to probabilities.
  3. Range vs. Output Behavior:
    • $\sigma(x)$ maps $\mathbb{R}$ to $(0, 1)$, suited for probabilities.
    • $\operatorname{softplus}(x)$ maps $\mathbb{R}$ to $(0, \infty)$, suited for non-negative outputs (e.g., certain loss functions).
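
These three points can be illustrated with a short NumPy sketch (names are our own; `np.logaddexp(0, x)` is used as a stable way to compute $\log(1 + e^{x})$):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    # log(1 + e^x), computed stably as logaddexp(log(1), x)
    return np.logaddexp(0.0, x)

def relu(x):
    return np.maximum(0.0, x)

x = np.linspace(-6.0, 6.0, 13)

# 1. Gradient connection: central finite differences of softplus
#    reproduce the sigmoid.
h = 1e-6
fd = (softplus(x + h) - softplus(x - h)) / (2.0 * h)
print(np.allclose(fd, sigmoid(x)))                 # True

# 2. Smooth approximation to ReLU: the gap log(1 + e^{-|x|}) is
#    largest at x = 0 (log 2) and shrinks as |x| grows.
print(np.abs(softplus(x) - relu(x)).max())         # ~0.693 (at x = 0)
print(float(softplus(6.0) - relu(6.0)))            # ~0.0025

# 3. Range: sigmoid lands in (0, 1), softplus in (0, inf).
print(sigmoid(x).min() > 0, sigmoid(x).max() < 1)  # True True
print(softplus(x).min() > 0)                       # True
```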

Applications Comparison:

  • Sigmoid is used where outputs need a probabilistic interpretation, such as the output layer of a binary classifier.
  • Softplus is used in settings requiring smooth, non-negative activations, such as modeling the rate parameter in Poisson regression models (see the sketch below).
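
To make the contrast concrete, here is a hypothetical toy sketch (the data and weights are made up for illustration, not a fitted model): the same unbounded linear scores are mapped to probabilities by sigmoid and to positive Poisson rates by softplus.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):
    return np.logaddexp(0.0, x)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))        # 5 toy examples, 3 features
w = np.array([0.5, -1.0, 2.0])     # illustrative weights, not fitted
z = X @ w                          # unbounded linear scores

# Sigmoid head: scores -> probabilities in (0, 1),
# e.g. P(y = 1 | x) in a binary classifier.
probs = sigmoid(z)

# Softplus head: scores -> strictly positive rates,
# e.g. the Poisson mean lambda in Poisson regression.
rates = softplus(z)

print(probs)   # each value in (0, 1)
print(rates)   # each value > 0
```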