Definition:
The Hessian matrix is a square matrix of second-order partial derivatives of a scalar-valued function. It describes the local curvature of the function and plays a crucial role in optimization and machine learning.

For a twice-differentiable function $f: \mathbb{R}^n \to \mathbb{R}$, the Hessian matrix $H(f)$ is defined as:

$$H(f)_{ij} = \frac{\partial^2 f}{\partial x_i \, \partial x_j}$$

  • The $(i, j)$ entry of $H(f)$ is $\frac{\partial^2 f}{\partial x_i \partial x_j}$.
  • If $f$ is twice continuously differentiable, $H(f)$ is symmetric.
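As a quick numerical check of this definition, a minimal finite-difference sketch (the test function and step size here are illustrative choices, not from the text):

```python
import numpy as np

def hessian_fd(f, x, h=1e-5):
    """Approximate the Hessian of f at x with central finite differences."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            # Mixed second partial via a 4-point central stencil
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

# f(x, y) = x^2 + x*y + y^2 has the constant Hessian [[2, 1], [1, 2]]
f = lambda v: v[0]**2 + v[0] * v[1] + v[1]**2
H = hessian_fd(f, np.array([0.3, -0.7]))
```

Note that the computed matrix comes out symmetric (up to rounding), matching the second bullet above.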

Key Properties

  1. Symmetry:
    If $f$ is twice continuously differentiable, the mixed partial derivatives are equal ($\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$, by Schwarz's theorem), so $H(f)$ is symmetric.

  2. Curvature:
    The Hessian describes the curvature of $f$ at a point $x$:

    • Positive eigenvalues indicate convexity in the corresponding eigenvector directions.
    • Negative eigenvalues indicate concavity in those directions.
  3. Quadratic Approximation:
    Near a point $x$, $f$ can be approximated using its second-order Taylor expansion:

    $$f(x + \Delta x) \approx f(x) + \nabla f(x)^\top \Delta x + \frac{1}{2} \Delta x^\top H(x) \, \Delta x$$

    Here, $H(x)$ determines the curvature of the approximation.
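The quadratic approximation can be verified numerically; a minimal sketch using a hypothetical test function $f(x, y) = e^x + xy$, whose gradient and Hessian are easy to write down by hand:

```python
import numpy as np

# f(x, y) = exp(x) + x*y, expanded around the point p
f = lambda v: np.exp(v[0]) + v[0] * v[1]
grad = lambda v: np.array([np.exp(v[0]) + v[1], v[0]])        # gradient of f
hess = lambda v: np.array([[np.exp(v[0]), 1.0], [1.0, 0.0]])  # Hessian of f

p = np.array([0.2, -0.5])
dx = np.array([0.01, -0.02])

# f(p + dx) ~ f(p) + grad(p).dx + 0.5 * dx^T H(p) dx
approx = f(p) + grad(p) @ dx + 0.5 * dx @ hess(p) @ dx
exact = f(p + dx)
```

For a small step like this one, the error of the quadratic approximation is third-order in the step size, so `approx` and `exact` agree to several decimal places.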


Uses of the Hessian Matrix

  1. Optimization:

    • Newton's Method: The Hessian is used in second-order methods such as Newton's method to set both the step size and direction:

      $$x_{k+1} = x_k - H(x_k)^{-1} \nabla f(x_k)$$

    • Convexity Analysis (at a critical point $x^*$, where $\nabla f(x^*) = 0$):
      • If $H$ is positive definite at $x^*$, $f$ has a local minimum there.
      • If $H$ is negative definite, $f$ has a local maximum.
      • If $H$ has mixed (positive and negative) eigenvalues, $x^*$ is a saddle point.
  2. Data Science:
    In machine learning, the Hessian of a loss function describes the curvature of the loss surface, which informs second-order optimizers and sensitivity analysis of trained models.
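The Newton update above can be sketched in a few lines; on a convex quadratic the Hessian is constant and the iteration converges immediately (the test function is an illustrative choice):

```python
import numpy as np

# Minimize f(x, y) = x^2 + x*y + y^2, a convex quadratic whose
# minimizer is the origin; its Hessian is constant.
grad = lambda v: np.array([2*v[0] + v[1], v[0] + 2*v[1]])
H = np.array([[2.0, 1.0], [1.0, 2.0]])  # Hessian of f

x = np.array([3.0, -1.5])
for _ in range(5):
    # Newton step: x <- x - H^{-1} grad f(x); solve the linear
    # system rather than forming the inverse explicitly
    x = x - np.linalg.solve(H, grad(x))
```

Because the objective is exactly quadratic, a single Newton step already lands on the minimizer; the loop is there only to show the general iteration shape.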

Eigenvalues and Critical Points

Let $x^*$ be a critical point of $f$ (i.e., $\nabla f(x^*) = 0$) and let $H = H(x^*)$:

  1. Positive Definite:

    • All eigenvalues of $H$ are positive.
    • $f$ has a local minimum at $x^*$.
  2. Negative Definite:

    • All eigenvalues of $H$ are negative.
    • $f$ has a local maximum at $x^*$.
  3. Indefinite:

    • $H$ has both positive and negative eigenvalues.
    • $x^*$ is a saddle point.
  4. Positive Semidefinite / Negative Semidefinite:

    • All eigenvalues are $\geq 0$ or $\leq 0$, respectively.
    • $f$ may have a minimum/maximum at $x^*$, but the second-derivative test is inconclusive, and any extremum need not be strict.
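This classification translates directly into code; a minimal sketch (the tolerance and example matrices are illustrative choices):

```python
import numpy as np

def classify(H, tol=1e-10):
    """Classify a critical point from the eigenvalues of its Hessian."""
    eig = np.linalg.eigvalsh(H)  # H is symmetric, so eigenvalues are real
    if np.all(eig > tol):
        return "local minimum"
    if np.all(eig < -tol):
        return "local maximum"
    if eig.min() < -tol and eig.max() > tol:
        return "saddle point"
    return "inconclusive (semidefinite)"

# f(x, y) = x^2 - y^2 has a saddle at the origin: H = diag(2, -2)
print(classify(np.diag([2.0, -2.0])))  # saddle point
print(classify(np.diag([2.0, 2.0])))   # local minimum
```

Using `eigvalsh` (rather than the general `eigvals`) exploits symmetry and guarantees real eigenvalues.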

Numerical Stability and Approximations

  1. Hessian-Free Methods:
    Computing $H$ directly can be computationally expensive for large $n$, since the matrix has $n^2$ entries. In practice, Hessian-free methods rely on Hessian-vector products, and approximations such as finite differences or low-rank representations are used instead.

  2. Quasi-Newton Methods:
    Algorithms like BFGS build an approximation to the Hessian (or its inverse) iteratively from successive gradient evaluations, avoiding explicit second derivatives.
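The Hessian-free idea can be illustrated with a Hessian-vector product computed by finite differences of the gradient, which never forms the $n \times n$ matrix (the quadratic test function here is an illustrative choice):

```python
import numpy as np

def hvp(grad, x, v, h=1e-6):
    """Hessian-vector product H(x) @ v via a central finite difference
    of the gradient -- the full n x n Hessian is never formed."""
    return (grad(x + h * v) - grad(x - h * v)) / (2 * h)

# Gradient of f(x) = 0.5 * x^T A x for a fixed symmetric A, so H = A
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = lambda x: A @ x

x = np.array([0.5, -1.0])
v = np.array([1.0, 2.0])
# hvp(grad, x, v) approximates A @ v using only two gradient calls
```

Each product costs only two gradient evaluations, which is what makes methods like conjugate-gradient-based Newton iterations feasible at scale.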