Newton’s Method

Definition:
Newton’s method is an iterative optimization algorithm used to find critical points (e.g., minima, maxima, or saddle points) of a differentiable function. It leverages second-order Taylor approximations and the Hessian matrix to refine estimates of the optimal solution.

For unconstrained optimization of a scalar-valued function f(x), Newton’s method updates the parameter x as:

    x_{k+1} = x_k - H(x_k)^{-1} ∇f(x_k)

where:

  • x_k: Current estimate of the solution.
  • ∇f(x_k): Gradient (first derivative) of f at x_k.
  • H(x_k): Hessian matrix (second derivative) of f at x_k.
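
As a quick illustration, here is a minimal sketch of a single Newton update in Python with NumPy; the quadratic test function and its derivatives below are assumptions chosen for the example, not part of the method itself.

    import numpy as np

    # Illustrative objective (an assumption): f(x) = x0^2 + 3*x1^2 + x0*x1
    def grad(x):
        # Gradient ∇f(x)
        return np.array([2 * x[0] + x[1], 6 * x[1] + x[0]])

    def hess(x):
        # Hessian H(x); constant because f is quadratic
        return np.array([[2.0, 1.0],
                         [1.0, 6.0]])

    x_k = np.array([1.0, 1.0])                     # current estimate x_k
    step = np.linalg.solve(hess(x_k), grad(x_k))   # solve H(x_k) d = ∇f(x_k) instead of inverting
    x_next = x_k - step                            # x_{k+1} = x_k - H(x_k)^{-1} ∇f(x_k)
    print(x_next)                                  # lands exactly on the minimizer [0, 0], since f is quadratic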

Key Concepts

  1. Taylor Approximation:
    Newton’s method is based on approximating f around a point x_k using a second-order Taylor expansion:

      f(x) ≈ f(x_k) + ∇f(x_k)^T (x - x_k) + (1/2) (x - x_k)^T H(x_k) (x - x_k)

    The minimum of this quadratic approximation (assuming H(x_k) is positive definite) occurs at:

      x = x_k - H(x_k)^{-1} ∇f(x_k)

    A short derivation is sketched after this list.

  2. Gradient and Hessian:

    • The gradient points in the direction of steepest ascent; its negative gives the steepest descent direction.
    • The Hessian provides curvature information, refining the step direction and size.
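
To see why the minimum of the quadratic model takes this form, set the gradient of the model to zero; the notation matches the formulas above.

    q(x) = f(x_k) + ∇f(x_k)^T (x - x_k) + (1/2) (x - x_k)^T H(x_k) (x - x_k)

    ∇q(x) = ∇f(x_k) + H(x_k) (x - x_k) = 0
    ⇒  x - x_k = -H(x_k)^{-1} ∇f(x_k)
    ⇒  x = x_k - H(x_k)^{-1} ∇f(x_k)

which is exactly the Newton update.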

Algorithm

  1. Initialize: Choose an initial guess x_0.
  2. Iterate: Repeat until convergence:

       x_{k+1} = x_k - H(x_k)^{-1} ∇f(x_k)

  3. Convergence Criterion: Stop when:

       ||∇f(x_k)|| < ε

    where ε is a small tolerance value.
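
The loop above can be written compactly in Python with NumPy. This is a minimal sketch under the assumption that grad(x) and hess(x) are supplied as callables and that the Hessian remains invertible; it has no line search or other safeguards.

    import numpy as np

    def newtons_method(grad, hess, x0, tol=1e-8, max_iter=50):
        # Minimal Newton iteration: x_{k+1} = x_k - H(x_k)^{-1} ∇f(x_k).
        # Stops when ||∇f(x_k)|| < tol or after max_iter steps.
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:       # convergence criterion
                break
            d = np.linalg.solve(hess(x), g)   # solve H(x) d = ∇f(x) instead of forming the inverse
            x = x - d                         # Newton step
        return x

    # Usage with the same illustrative quadratic as earlier (an assumption for the example):
    grad = lambda x: np.array([2 * x[0] + x[1], 6 * x[1] + x[0]])
    hess = lambda x: np.array([[2.0, 1.0], [1.0, 6.0]])
    print(newtons_method(grad, hess, x0=[5.0, -3.0]))   # converges to [0, 0]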

Advantages

  1. Quadratic Convergence:

    • Near a local minimum with a positive definite Hessian, the error is roughly squared at each iteration, which is much faster than the linear rate of gradient descent.
  2. Accurate Steps:

    • Incorporates curvature information, resulting in better step directions and sizes.

Disadvantages

  1. Computational Cost:

    • Computing, storing, and inverting (or factorizing) the Hessian is expensive for high-dimensional problems: O(n²) storage and O(n³) time per iteration for n parameters.
  2. Not Always Stable:

    • If the Hessian is not positive definite, the Newton step may not be a descent direction, and the iteration can diverge or move toward a saddle point or maximum.
  3. Initialization Sensitivity:

    • Poor initial guesses can lead to slow convergence or divergence.

Variants of Newton’s Method

  1. Modified Newton’s Method:

    • Adds a damping factor α to improve stability:

        x_{k+1} = x_k - α H(x_k)^{-1} ∇f(x_k)

      where α ∈ (0, 1] is the step size.
  2. Quasi-Newton Methods:

    • Approximate the Hessian (or its inverse) iteratively from gradient information to reduce computational cost; see the usage sketch after this list. Examples:
      • BFGS (Broyden-Fletcher-Goldfarb-Shanno).
      • L-BFGS (Limited-memory BFGS) for large-scale problems.
  3. Hessian-Free Optimization:

    • Avoids explicit computation of the Hessian by relying only on Hessian-vector products inside matrix-free solvers such as conjugate gradient.
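
As a concrete example of the quasi-Newton variants above, here is a minimal sketch using SciPy’s built-in BFGS and L-BFGS-B implementations on the Rosenbrock test function; the choice of test function, starting point, and options is an assumption for illustration.

    import numpy as np
    from scipy.optimize import minimize, rosen, rosen_der

    x0 = np.array([-1.2, 1.0])   # common starting point for the Rosenbrock function

    # BFGS builds a dense approximation of the inverse Hessian from gradient differences.
    res_bfgs = minimize(rosen, x0, jac=rosen_der, method="BFGS")

    # L-BFGS-B keeps only a limited history of updates, suitable for large-scale problems.
    res_lbfgs = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B")

    print(res_bfgs.x, res_bfgs.nit)    # both should converge near the minimizer [1, 1]
    print(res_lbfgs.x, res_lbfgs.nit)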

Comparison to Gradient Descent

Feature              Newton’s Method                             Gradient Descent
Convergence Speed    Quadratic near the minimum                  Linear
Step Size            Adaptive (from the Hessian)                 Fixed or decaying
Computational Cost   High (Hessian computation and inversion)    Low
Applicability        Requires twice-differentiable functions     First-order differentiable

Newton’s method is powerful for optimization, particularly for problems where second-order information is computationally feasible. For large-scale problems, approximations like quasi-Newton methods are often used to balance efficiency and accuracy.