Definition:
The BFGS (Broyden–Fletcher–Goldfarb–Shanno) algorithm is a quasi-Newton optimization method used to minimize a differentiable scalar-valued function $f : \mathbb{R}^n \to \mathbb{R}$. It iteratively approximates the inverse of the Hessian matrix, avoiding the computational cost of explicitly calculating or inverting the Hessian.

BFGS updates an approximation of the inverse Hessian matrix using information from gradients at successive iterations. It is widely used for unconstrained optimization problems and is robust and efficient for many applications.


Key Idea

  1. Avoid Explicit Hessian Computation:
    Instead of directly computing the Hessian matrix $\nabla^2 f(x_k)$, BFGS iteratively builds an approximation $H_k$ of its inverse (contrasted with the full Newton step just after this list).

  2. Quasi-Newton Update:
    The inverse Hessian approximation is updated based on gradient differences and step sizes between iterations.

  3. Line Search:
    BFGS typically incorporates a line search to determine a suitable step size for each iteration.
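To make points 1 and 2 concrete, the only difference between a full Newton step and a BFGS step is where the curvature information comes from:

$$
p_k^{\text{Newton}} = -\left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k),
\qquad
p_k^{\text{BFGS}} = -H_k\, \nabla f(x_k),
$$

where $H_k$ is built from gradient observations alone.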


Algorithm

  1. Initialization:

    • Start with an initial guess $x_0$.
    • Initialize the inverse Hessian approximation as $H_0 = I$ (the identity matrix).
    • Compute the initial gradient $g_0 = \nabla f(x_0)$.
  2. Iterative Updates:
    For $k = 0, 1, 2, \dots$:

    • Compute the search direction:
      $p_k = -H_k \nabla f(x_k)$

    • Perform a line search to find a step size $\alpha_k$ that sufficiently reduces $f$:
      $x_{k+1} = x_k + \alpha_k p_k$

    • Compute the step vector and gradient difference:
      $s_k = x_{k+1} - x_k, \qquad y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$

    • Update the inverse Hessian approximation $H_{k+1}$:
      $H_{k+1} = \left(I - \rho_k s_k y_k^\top\right) H_k \left(I - \rho_k y_k s_k^\top\right) + \rho_k s_k s_k^\top, \qquad \rho_k = \dfrac{1}{y_k^\top s_k}$

  3. Convergence:
    Stop when the gradient norm $\|\nabla f(x_k)\|$ falls below a predefined threshold (a minimal code sketch of this loop appears after this list).
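A minimal NumPy sketch of this loop, assuming a simple Armijo backtracking line search; the function name bfgs_minimize, the Armijo constants, and the tolerances are illustrative choices rather than a standard interface:

```python
import numpy as np

def bfgs_minimize(f, grad, x0, tol=1e-6, max_iter=100):
    """Minimal BFGS sketch following the steps above."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    I = np.eye(n)
    H = I.copy()                               # H_0 = I
    g = grad(x)                                # g_0 = grad f(x_0)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:            # convergence: gradient norm below threshold
            break
        p = -H @ g                             # search direction p_k = -H_k g_k
        # Backtracking line search enforcing the Armijo sufficient-decrease condition.
        alpha, c, shrink = 1.0, 1e-4, 0.5
        while f(x + alpha * p) > f(x) + c * alpha * (g @ p) and alpha > 1e-12:
            alpha *= shrink
        x_new = x + alpha * p                  # x_{k+1} = x_k + alpha_k p_k
        g_new = grad(x_new)
        s = x_new - x                          # s_k: step vector
        y = g_new - g                          # y_k: gradient difference
        sy = float(y @ s)
        if sy > 1e-10:                         # curvature condition keeps H positive definite
            rho = 1.0 / sy
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x
```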


Key Formulas

  1. Search Direction:

     $p_k = -H_k \nabla f(x_k)$

  2. Hessian Inverse Update Rule:

     $H_{k+1} = \left(I - \rho_k s_k y_k^\top\right) H_k \left(I - \rho_k y_k s_k^\top\right) + \rho_k s_k s_k^\top, \qquad \rho_k = \dfrac{1}{y_k^\top s_k}$

    where:

    • $s_k = x_{k+1} - x_k$: Change in position.
    • $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$: Change in gradient.
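A useful sanity check on this update: with $\rho_k = 1/(y_k^\top s_k)$, the new approximation automatically satisfies the secant condition $H_{k+1} y_k = s_k$, i.e. it reproduces the most recently observed relationship between a step and the change in gradient:

$$
H_{k+1} y_k
= \left(I - \rho_k s_k y_k^\top\right) H_k \underbrace{\left(y_k - \rho_k y_k\, s_k^\top y_k\right)}_{=\,0}
+ \rho_k s_k\, s_k^\top y_k
= s_k .
$$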

Advantages of BFGS

  1. Efficiency:

    • Avoids the cost of explicit Hessian computation and inversion.
    • Updates the inverse Hessian approximation in $O(n^2)$ operations per iteration.
  2. Robustness:

    • Converges quickly for well-behaved functions.
    • Suitable for both small and medium-sized problems.
  3. Superlinear Convergence:

    • Converges faster than gradient descent near the minimum due to better curvature approximation.

Limitations of BFGS

  1. Memory Requirements:

    • Storing and updating the dense $n \times n$ inverse Hessian approximation requires $O(n^2)$ memory, which can be prohibitive for very high-dimensional problems.
  2. Line Search Dependency:

    • Relies on an effective line search to ensure sufficient decrease in the objective function.
  3. Ill-Conditioned Problems:

    • May struggle when the Hessian is ill-conditioned, though modifications like damped BFGS help mitigate this (see the note after this list).
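The last two limitations are related: the update only preserves a positive definite $H_{k+1}$ when the curvature condition $y_k^\top s_k > 0$ holds, and a line search that enforces the Wolfe curvature condition guarantees exactly this:

$$
\nabla f(x_k + \alpha_k p_k)^\top p_k \;\ge\; c_2\, \nabla f(x_k)^\top p_k,
\quad 0 < c_2 < 1
\;\;\Longrightarrow\;\;
y_k^\top s_k \;\ge\; (c_2 - 1)\,\alpha_k\, \nabla f(x_k)^\top p_k \;>\; 0 .
$$

When the line search cannot guarantee this, the update is typically skipped or damped, which is what damped BFGS formalizes.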

Variants

  1. Limited-Memory BFGS (L-BFGS):

    • Stores only the last $m$ vector pairs $(s_i, y_i)$ rather than the full matrix, reducing memory requirements to $O(mn)$.
    • Suitable for high-dimensional problems (e.g., training machine learning models).
  2. Damped BFGS:

    • Modifies the update rule to ensure positive definiteness of $H_{k+1}$, improving stability.
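Both BFGS and its limited-memory variant are available off the shelf; for instance, SciPy's scipy.optimize.minimize exposes them as method="BFGS" and method="L-BFGS-B". A small usage sketch, with an arbitrary convex quadratic chosen purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative objective (not from the text): a simple convex quadratic.
def f(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2

def grad(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

x0 = np.array([3.0, -4.0])

res_bfgs = minimize(f, x0, jac=grad, method="BFGS")       # dense inverse-Hessian approximation
res_lbfgs = minimize(f, x0, jac=grad, method="L-BFGS-B")  # limited-memory variant

print(res_bfgs.x, res_bfgs.nit)    # solution and iteration count
print(res_lbfgs.x, res_lbfgs.nit)
```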

Example

Problem: Minimize a smooth function $f(x)$ starting from an initial guess $x_0$ (the first iteration is shown symbolically below; a runnable version with an assumed concrete objective follows this list).

  1. Gradient:

     $g_0 = \nabla f(x_0)$

  2. Initialization:

    • Start with $x_0$.
    • Set $H_0 = I$.
  3. First Iteration:

    • Compute search direction:
      $p_0 = -H_0 g_0 = -g_0$

    • Perform line search to find $\alpha_0$:
      choose $\alpha_0$ so that $f(x_0 + \alpha_0 p_0)$ decreases sufficiently

    • Update position:
      $x_1 = x_0 + \alpha_0 p_0$

  4. Update $H_0$ to $H_1$ using $s_0 = x_1 - x_0$ and $y_0 = \nabla f(x_1) - \nabla f(x_0)$.
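Since no concrete objective is specified above, here is one way to run the bfgs_minimize sketch from the Algorithm section end to end; the Rosenbrock function and the starting point are assumed purely for illustration:

```python
import numpy as np

# Assumed test objective (illustrative): the two-dimensional Rosenbrock function.
def f(v):
    x, y = v
    return (1.0 - x) ** 2 + 100.0 * (y - x ** 2) ** 2

def grad(v):
    x, y = v
    return np.array([
        -2.0 * (1.0 - x) - 400.0 * x * (y - x ** 2),
        200.0 * (y - x ** 2),
    ])

x0 = np.array([-1.2, 1.0])             # a common starting point for this function
x_star = bfgs_minimize(f, grad, x0, tol=1e-8, max_iter=200)
print(x_star)                          # should approach the minimizer [1.0, 1.0]
```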


Comparison with Other Methods

| Method           | Hessian Use      | Memory Cost | Convergence Speed       | Use Case                  |
|------------------|------------------|-------------|-------------------------|---------------------------|
| Gradient Descent | None             | Low         | Linear                  | Simple/large problems     |
| Newton's Method  | Explicit Hessian | High        | Quadratic (near optima) | Small problems            |
| BFGS             | Approx. Hessian  | Medium      | Superlinear             | Medium-sized problems     |
| L-BFGS           | Approx. Hessian  | Low         | Superlinear             | High-dimensional problems |

Conclusion

The BFGS algorithm is a powerful and versatile tool for unconstrained optimization. It balances the accuracy of second-order methods with the efficiency of avoiding explicit Hessian computation, making it a cornerstone in optimization techniques for machine learning and scientific computing.