- categories: Real Analysis, Optimization, Data Science, Method, Algorithm
Newton’s Method
Definition:
Newton’s method is an iterative optimization algorithm used to find critical points (e.g., minima, maxima, or saddle points) of a differentiable function. It leverages second-order Taylor approximations and the Hessian matrix to refine estimates of the optimal solution.
For unconstrained optimization of a scalar-valued function $f(x)$, Newton’s method updates the parameter $x$ as:
$$x_{k+1} = x_k - H_f(x_k)^{-1} \nabla f(x_k)$$
where:
- $x_k$: Current estimate of the solution.
- $\nabla f(x_k)$: Gradient (first derivative) of $f$ at $x_k$.
- $H_f(x_k)$: Hessian matrix (second derivative) of $f$ at $x_k$.
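As a small worked illustration (not from the source), take the one-dimensional case $f(x) = x^4$, where the update reduces to $x_{k+1} = x_k - f'(x_k)/f''(x_k)$. Starting from $x_0 = 1$:
$$x_1 = 1 - \frac{f'(1)}{f''(1)} = 1 - \frac{4}{12} = \frac{2}{3}.$$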
Key Concepts
- Taylor Approximation:
  Newton’s method is based on approximating $f(x)$ around a point $x_k$ using a second-order Taylor expansion:
  $$f(x) \approx f(x_k) + \nabla f(x_k)^\top (x - x_k) + \tfrac{1}{2}(x - x_k)^\top H_f(x_k)\,(x - x_k)$$
  The minimum of this quadratic approximation occurs at:
  $$x = x_k - H_f(x_k)^{-1} \nabla f(x_k)$$
  (a quick numerical check of this step appears after this list).
- Gradient and Hessian:
  - The gradient $\nabla f(x)$ points in the direction of steepest ascent, so its negative gives the steepest-descent direction.
  - The Hessian $H_f(x)$ provides curvature information, refining the step direction and size.
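As a quick numerical check (illustrative values, assuming NumPy), the gradient of the quadratic model, $H_f(x_k)\,p + \nabla f(x_k)$, vanishes exactly at the Newton step $p = -H_f(x_k)^{-1}\nabla f(x_k)$:

```python
import numpy as np

# Hypothetical gradient and (positive-definite) Hessian of f at x_k.
g = np.array([3.0, -1.0])
H = np.array([[4.0, 1.0], [1.0, 3.0]])

# Minimizer of the quadratic model m(p) = f_k + g @ p + 0.5 * p @ H @ p.
p_newton = -np.linalg.solve(H, g)

# The model's gradient H @ p + g is zero at the Newton step.
print(H @ p_newton + g)   # ~[0. 0.]
```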
Algorithm
- Initialize: Choose an initial guess $x_0$.
- Iterate: Repeat until convergence:
  $$x_{k+1} = x_k - H_f(x_k)^{-1} \nabla f(x_k)$$
- Convergence Criterion: Stop when:
  $$\|\nabla f(x_k)\| < \epsilon,$$
  where $\epsilon$ is a small tolerance value.
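A minimal sketch of these steps, assuming the caller supplies callables for the gradient and Hessian (the names newton_method, grad, and hess, and the quadratic test function, are illustrative rather than a fixed API):

```python
import numpy as np

def newton_method(grad, hess, x0, tol=1e-8, max_iter=50):
    """Newton iteration: x_{k+1} = x_k - H_f(x_k)^{-1} grad f(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:         # convergence criterion ||grad f|| < epsilon
            break
        step = np.linalg.solve(hess(x), g)  # solve H step = g instead of forming H^{-1}
        x = x - step                        # Newton update
    return x

# Example: f(x, y) = (x - 1)^2 + 10 (y + 2)^2, a convex quadratic.
grad = lambda v: np.array([2.0 * (v[0] - 1.0), 20.0 * (v[1] + 2.0)])
hess = lambda v: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton_method(grad, hess, x0=[5.0, 5.0]))  # converges to [1, -2] in one step
```

Solving the linear system rather than explicitly inverting the Hessian is the usual choice, since it is cheaper and numerically more stable.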
Advantages
- Quadratic Convergence:
  - Near a critical point, convergence is quadratic, which is much faster than the linear rate of gradient descent.
- Accurate Steps:
  - Incorporates curvature information, resulting in better step directions and sizes.
Disadvantages
- Computational Cost:
  - Computing and inverting the Hessian matrix is expensive for high-dimensional problems ($O(n^2)$ storage and $O(n^3)$ work per iteration).
- Not Always Stable:
  - If the Hessian is not positive definite, the Newton step may not be a descent direction and the method may diverge.
- Initialization Sensitivity:
  - Poor initial guesses can lead to slow convergence or divergence.
Variants of Newton’s Method
- Modified Newton’s Method:
  - Adds a damping factor to improve stability:
    $$x_{k+1} = x_k - \alpha\, H_f(x_k)^{-1} \nabla f(x_k),$$
    where $\alpha \in (0, 1]$ is the step size (a minimal sketch appears after this list).
- Quasi-Newton Methods:
  - Approximate the Hessian (or its inverse) iteratively from gradient differences to reduce computational cost (see the SciPy sketch after this list). Examples:
    - BFGS (Broyden-Fletcher-Goldfarb-Shanno).
    - L-BFGS (Limited-memory BFGS) for large-scale problems.
- Hessian-Free Optimization:
  - Avoids explicit computation of the Hessian by using matrix-free methods like conjugate gradient, which require only Hessian-vector products.
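A minimal sketch of the damped update, reusing the grad and hess callables from the algorithm sketch above (the fixed step size alpha and the name damped_newton_step are illustrative; in practice $\alpha$ is usually chosen by a line search):

```python
import numpy as np

def damped_newton_step(x, grad, hess, alpha=0.5):
    """One damped Newton update: x - alpha * H^{-1} grad, with alpha in (0, 1]."""
    g = grad(x)
    direction = np.linalg.solve(hess(x), g)   # Newton direction H^{-1} g
    return x - alpha * direction              # shortened (damped) step
```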
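For the quasi-Newton and Hessian-free variants, one common route is SciPy's scipy.optimize.minimize, shown here on the built-in Rosenbrock test function (the test function and starting point are illustrative choices, not from the source):

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess_prod

x0 = np.array([-1.2, 1.0])

# Quasi-Newton: BFGS builds an approximate (inverse) Hessian from gradient differences.
res_bfgs = minimize(rosen, x0, jac=rosen_der, method="BFGS")

# Limited-memory variant for large-scale problems.
res_lbfgs = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B")

# Hessian-free: Newton-CG needs only Hessian-vector products (hessp), solved via conjugate gradient.
res_ncg = minimize(rosen, x0, jac=rosen_der, hessp=rosen_hess_prod, method="Newton-CG")

print(res_bfgs.x, res_lbfgs.x, res_ncg.x)   # all approach the minimizer [1, 1]
```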
Comparison to Gradient Descent
| Feature | Newton’s Method | Gradient Descent |
|---|---|---|
| Convergence Speed | Quadratic near the minimum | Linear |
| Step Size | Adaptive (from Hessian) | Fixed or decaying |
| Computational Cost | High (Hessian computation and inversion) | Low |
| Applicability | Requires twice-differentiable functions | First-order differentiable |
Newton’s method is powerful for optimization, particularly for problems where second-order information is computationally feasible. For large-scale problems, approximations like quasi-Newton methods are often used to balance efficiency and accuracy.