- categories: Real Analysis, Definition
Definition:
The Hessian matrix is a square matrix of second-order partial derivatives of a scalar-valued function. It describes the local curvature of the function and plays a crucial role in optimization and machine learning.
For a twice-differentiable function , the Hessian matrix is defined as:
- The entry of is .
- If is twice continuously differentiable, is symmetric.
Key Properties
-
Symmetry:
If is twice continuously differentiable, the mixed partial derivatives are equal (), so is symmetric. -
Curvature:
The Hessian describes the curvature of at a point :- Positive eigenvalues indicate convexity in those directions.
- Negative eigenvalues indicate concavity.
-
Quadratic Approximation:
Near a point , can be approximated using its Taylor expansion:
Here, determines the curvature of the approximation.
Uses of the Hessian Matrix
-
Optimization:
- Gradient Descent: The Hessian is used in second-order methods like Newton’s method to adjust the step size and direction:
- Convexity Analysis:
- If is positive definite at , has a local minimum.
- If is negative definite, has a local maximum.
- If has mixed eigenvalues, is a saddle point.
- Gradient Descent: The Hessian is used in second-order methods like Newton’s method to adjust the step size and direction:
Eigenvalues and Critical Points
-
Positive Definite:
- All eigenvalues of are positive.
- has a local minimum at .
-
Negative Definite:
- All eigenvalues of are negative.
- has a local maximum at .
-
Indefinite:
- has both positive and negative eigenvalues.
- is a saddle point.
-
Positive Semidefinite / Negative Semidefinite:
- All eigenvalues are or , respectively.
- may have a minimum/maximum, but the point may not be strict.
Numerical Stability and Approximations
-
Hessian-Free Methods:
Computing directly can be computationally expensive for large . Approximations like finite differences or low-rank representations are used in practice. -
Quasi-Newton Methods:
Algorithms like BFGS approximate the Hessian iteratively to avoid computing it explicitly.