Overview

This lecture is a numerical optimization toolbox for the reduced formulation

$$\min_{u}\; \hat J(u).$$

The same algorithms will later be reused for PDE-constrained optimization, where evaluating $\hat J$ requires a PDE solve and each gradient evaluation typically requires state/adjoint solves.
Main goals:
put GD, nonlinear CG, BFGS, and trust-region in one common framework;
isolate the role of Armijo/Wolfe conditions;
compare methods by cost per iteration and robustness;
prepare the transition to reduced PDE-constrained optimization.
Unified Iterative Template

Most unconstrained methods fit, at a high level, the same iterative template: choose $x_0$ and for $k = 0, 1, 2, \dots$ compute

$$x_{k+1} = x_k + \alpha_k p_k,$$
with four design choices:
model of local “geometry”, i.e., what we know about the “landscape” of $f$ in the neighborhood of $x_k$ (gradient only, Hessian, Hessian approximation);
search direction $p_k$;
step acceptance/globalisation rule;
stopping criteria.
Typical stopping checks (often combined):

$\|\nabla f(x_k)\|$ below an absolute tolerance, or relative to $\|\nabla f(x_0)\|$;
$\|x_{k+1} - x_k\|$ or $|f(x_{k+1}) - f(x_k)|$ below a tolerance;
a maximum iteration count.
Line Search and Armijo

Given a descent direction $p_k$ with $\nabla f(x_k)^\top p_k < 0$, line search picks $\alpha_k > 0$ to ensure robust decrease.

The ideal choice is the exact line-search minimizer

$$\alpha_k^\star = \arg\min_{\alpha > 0} f(x_k + \alpha p_k).$$
Since, with $\phi(\alpha) = f(x_k + \alpha p_k)$,

$$\phi'(\alpha) = \nabla f(x_k + \alpha p_k)^\top p_k,$$

an interior minimizer satisfies $\phi'(\alpha^\star) = 0$. Using the second-order expansion at $\alpha = 0$,

$$\phi(\alpha) \approx \phi(0) + \phi'(0)\,\alpha + \tfrac12\,\phi''(0)\,\alpha^2,$$

with $\phi'(0) = \nabla f(x_k)^\top p_k$ and $\phi''(0) = p_k^\top \nabla^2 f(x_k)\, p_k$, the minimizer of this local model is

$$\alpha^\star = -\frac{\nabla f(x_k)^\top p_k}{p_k^\top \nabla^2 f(x_k)\, p_k}.$$

For a quadratic $f(x) = \tfrac12\, x^\top A x - b^\top x$ (constant Hessian $A$), this is exact:

$$\alpha_k = -\frac{\nabla f(x_k)^\top p_k}{p_k^\top A\, p_k},$$

and for steepest descent ($p_k = -\nabla f(x_k)$):

$$\alpha_k = \frac{\|\nabla f(x_k)\|^2}{\nabla f(x_k)^\top A\, \nabla f(x_k)}.$$
In general nonlinear problems, exact line search is usually too expensive; backtracking is the standard practical alternative.
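As a quick sanity check (not part of the lecture notebook), one can verify the closed-form steepest-descent step $\alpha^\star = \|\nabla f\|^2 / (\nabla f^\top A\, \nabla f)$ against a brute-force scan on a small SPD quadratic; the matrix and point below are arbitrary illustrative choices:

```python
import numpy as np

# closed-form steepest-descent step on f(x) = 1/2 x^T A x
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # SPD matrix (illustrative)
x = np.array([1.0, -1.0])
g = A @ x                                 # gradient of the quadratic
alpha = (g @ g) / (g @ A @ g)             # alpha* = ||g||^2 / (g^T A g)

# compare with a dense numerical scan of f(x - alpha g)
f_along = lambda a: 0.5 * (x - a * g) @ A @ (x - a * g)
alphas = np.linspace(0.0, 1.0, 10001)
alpha_num = alphas[np.argmin([f_along(a) for a in alphas])]
```

The two values agree to within the grid spacing, confirming the derivation above.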
Armijo backtracking

Choose $\alpha_0 > 0$, $\rho \in (0, 1)$, $c_1 \in (0, 1)$. Set $\alpha \leftarrow \alpha_0$ and reduce $\alpha \leftarrow \rho\,\alpha$ until

$$f(x_k + \alpha p_k) \le f(x_k) + c_1\, \alpha\, \nabla f(x_k)^\top p_k.$$
Interpretation:
the right-hand side is a conservative linear prediction of the decrease;
if the condition fails, the step is too optimistic for the local curvature;
backtracking is cheap and usually enough in practice.
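A minimal backtracking implementation might look like the following sketch (the helper name and the quadratic test function are illustrative, not from the course code):

```python
import numpy as np

def armijo_backtracking(f, grad_fk, xk, pk, alpha0=1.0, rho=0.5, c1=1e-4, max_iter=50):
    """Shrink alpha until the Armijo sufficient-decrease condition holds."""
    fk = f(xk)
    slope = grad_fk @ pk              # directional derivative; < 0 for a descent direction
    alpha = alpha0
    for _ in range(max_iter):
        if f(xk + alpha * pk) <= fk + c1 * alpha * slope:
            return alpha
        alpha *= rho                  # step too optimistic: backtrack
    return alpha

# illustrative quadratic f(x) = x1^2 + 10 x2^2, steepest-descent direction
f = lambda x: x[0]**2 + 10 * x[1]**2
x = np.array([1.0, 1.0])
g = np.array([2 * x[0], 20 * x[1]])
alpha = armijo_backtracking(f, g, x, -g)   # -> 0.0625 here
```

Note the geometric reduction: each rejection halves the step, so the accepted $\alpha$ is within a factor $\rho$ of the largest acceptable step.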
Wolfe conditions (for $0 < c_1 < c_2 < 1$): the Armijo condition together with the curvature condition

$$\nabla f(x_k + \alpha p_k)^\top p_k \ge c_2\, \nabla f(x_k)^\top p_k.$$

Strong Wolfe replaces the curvature inequality with

$$\left|\nabla f(x_k + \alpha p_k)^\top p_k\right| \le c_2\, \left|\nabla f(x_k)^\top p_k\right|.$$

Geometric interpretation along the line $\phi(\alpha) = f(x_k + \alpha p_k)$:

Armijo enforces sufficient decrease in function value (step not too large);
the Wolfe curvature condition enforces $\phi'(\alpha)$ to be much less negative than $\phi'(0)$, so the search is not stopped too early (step not too small);
strong Wolfe additionally controls oscillations by requiring a small absolute slope at the accepted point.
Gradient Descent (GD)

Direction:

$$p_k = -\nabla f(x_k).$$
Strengths:
minimal per-iteration cost;
robust with Armijo backtracking;
simple baseline for any new model.
Limitations:
sensitive to conditioning;
slow on narrow valleys (zig-zag behavior);
linear convergence under strong convexity.
Use case in this course: reference solver for reduced OCP prototypes.
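A compact version of this baseline combines the steepest-descent direction with Armijo backtracking (a sketch with illustrative parameter choices, not the notebook implementation); on the anisotropic quadratic below it converges but needs many iterations, illustrating the conditioning sensitivity:

```python
import numpy as np

def gradient_descent(f, grad, x0, tol=1e-8, max_iter=10_000, c1=1e-4, rho=0.5):
    """Steepest descent with Armijo backtracking; stops on small gradient norm."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            return x, k                      # converged after k iterations
        p = -g                               # steepest-descent direction
        alpha, fx, slope = 1.0, f(x), g @ p
        while f(x + alpha * p) > fx + c1 * alpha * slope:
            alpha *= rho                     # Armijo backtracking
        x = x + alpha * p
    return x, max_iter

# anisotropic quadratic f(x) = 1/2 x^T A x with condition number 100
A = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
x_star, iters = gradient_descent(f, grad, [1.0, 1.0])
```

With `A = diag(1, 1)` instead, the very first unit step would solve the problem exactly; the iteration count here is driven entirely by the eigenvalue spread.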
Nonlinear Conjugate Gradient (CG)
Replace the steepest-descent direction with

$$p_{k+1} = -\nabla f(x_{k+1}) + \beta_k\, p_k, \qquad p_0 = -\nabla f(x_0),$$

with reset $\beta_k = 0$ when needed.
Popular formulas:
Fletcher-Reeves: $\beta_k^{\mathrm{FR}} = \dfrac{\|\nabla f(x_{k+1})\|^2}{\|\nabla f(x_k)\|^2}$;
Polak-Ribière+: $\beta_k^{\mathrm{PR+}} = \max\!\left(0,\; \dfrac{\nabla f(x_{k+1})^\top \left(\nabla f(x_{k+1}) - \nabla f(x_k)\right)}{\|\nabla f(x_k)\|^2}\right)$.
Key facts:
for SPD quadratics with exact line search, linear CG terminates in at most $n$ steps;
nonlinear CG is memory-light ($O(n)$ storage) and often faster than GD;
practical performance depends strongly on line-search quality and resets.
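To illustrate the $n$-step termination fact, the sketch below runs Fletcher-Reeves CG with the exact quadratic line-search step on a small SPD system (the function name and test matrix are illustrative assumptions):

```python
import numpy as np

def cg_fletcher_reeves(A, b, x0, tol=1e-10, max_iter=100):
    """Nonlinear CG (Fletcher-Reeves) with the exact quadratic line-search
    step on f(x) = 1/2 x^T A x - b^T x; terminates in at most n steps."""
    x = np.asarray(x0, dtype=float)
    g = A @ x - b                        # gradient of the quadratic
    p = -g
    for k in range(max_iter):
        if np.linalg.norm(g) <= tol:
            return x, k
        alpha = -(g @ p) / (p @ A @ p)   # exact line search along p
        x = x + alpha * p
        g_new = A @ x - b
        beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves formula
        p = -g_new + beta * p
        g = g_new
    return x, max_iter

# 4x4 SPD example: CG should need no more than n = 4 steps
A = np.diag([1.0, 2.0, 3.0, 4.0])
b = np.ones(4)
x_sol, iters = cg_fletcher_reeves(A, b, np.zeros(4))
```

On a general nonlinear problem, `alpha` would come from a (strong) Wolfe line search instead of the closed-form quadratic step.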
BFGS Quasi-Newton
BFGS stands for Broyden-Fletcher-Goldfarb-Shanno (the four authors who proposed this update family).
Core idea: mimic Newton’s method without computing the exact Hessian. Instead, build a sequence of positive-definite inverse-Hessian approximations from gradient differences, so directions keep curvature information while each iteration only needs function/gradient evaluations.
Approximate the inverse Hessian, $H_k \approx \nabla^2 f(x_k)^{-1}$, and use

$$p_k = -H_k\, \nabla f(x_k).$$

With

$$s_k = x_{k+1} - x_k, \qquad y_k = \nabla f(x_{k+1}) - \nabla f(x_k), \qquad \rho_k = \frac{1}{y_k^\top s_k},$$

the BFGS update is

$$H_{k+1} = \left(I - \rho_k\, s_k y_k^\top\right) H_k \left(I - \rho_k\, y_k s_k^\top\right) + \rho_k\, s_k s_k^\top.$$

Equivalent expanded form:

$$H_{k+1} = H_k - \rho_k \left(s_k y_k^\top H_k + H_k y_k s_k^\top\right) + \left(\rho_k^2\, y_k^\top H_k y_k + \rho_k\right) s_k s_k^\top.$$
So the inverse-Hessian approximation is updated through rank-1 outer-product terms (overall a rank-2 correction per iteration).
Practical points:
much faster than GD on ill-conditioned smooth problems;
good local behavior (often superlinear with standard assumptions);
more memory and linear algebra than GD/CG.
For large scale: use L-BFGS (limited-memory variant): keep only the last $m$ pairs $(s_i, y_i)$ (typically $m \approx 5$–$20$), do not form $H_k$ explicitly, and compute $p_k = -H_k \nabla f(x_k)$ via the two-loop recursion. Memory drops from $O(n^2)$ to $O(mn)$ and matrix factorizations are avoided.
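The two-loop recursion itself fits in a few lines; the sketch below (function name and the standard initial scaling $H_0 = \gamma I$ are illustrative choices) returns the direction $-H_k \nabla f(x_k)$ from the stored pairs. A quick correctness check is the secant property $H_k y_{k-1} = s_{k-1}$, which the BFGS update satisfies exactly:

```python
import numpy as np

def two_loop_recursion(grad, s_list, y_list):
    """Compute p = -H_k grad via the L-BFGS two-loop recursion,
    using only stored (s_i, y_i) pairs (most recent last)."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # first loop: most recent pair first
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    # initial Hessian scaling H_0 = gamma * I from the most recent pair
    if s_list:
        s, y = s_list[-1], y_list[-1]
        gamma = (s @ y) / (y @ y)
    else:
        gamma = 1.0
    r = gamma * q
    # second loop: oldest pair first
    for (s, y, rho), a in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        b = rho * (y @ r)
        r += (a - b) * s
    return -r
```

Feeding `grad = y_list[-1]` must return `-s_list[-1]`, regardless of the scaling `gamma`, because the update enforces the secant equation.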
Trust-Region (TR) Methods

Instead of fixing a direction then line-searching, solve a local model:

$$\min_{p}\; m_k(p) = f(x_k) + \nabla f(x_k)^\top p + \tfrac12\, p^\top B_k p \quad \text{subject to} \quad \|p\| \le \Delta_k.$$

Accept/reject with

$$\rho_k = \frac{f(x_k) - f(x_k + p_k)}{m_k(0) - m_k(p_k)}.$$
Rule of thumb:
$\rho_k$ small: model inaccurate -> shrink radius $\Delta_k$;
$\rho_k$ good: accept step, maybe enlarge $\Delta_k$.
Advantages:
very robust under nonconvexity or poor Hessian quality;
natural handling of negative curvature.
Typical subproblem solvers: Cauchy step, dogleg, truncated CG.
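A sketch of the accept/shrink/enlarge logic, using the simple Cauchy step rather than dogleg or truncated CG (all parameter values are illustrative defaults, not from the course code):

```python
import numpy as np

def cauchy_step(g, B, Delta):
    """Minimize m(p) = g^T p + 1/2 p^T B p along -g within ||p|| <= Delta."""
    gBg = g @ B @ g
    gnorm = np.linalg.norm(g)
    if gBg <= 0:
        tau = 1.0                              # negative curvature: go to the boundary
    else:
        tau = min(1.0, gnorm**3 / (Delta * gBg))
    return -tau * (Delta / gnorm) * g

def trust_region(f, grad, hess, x0, Delta0=1.0, Delta_max=10.0,
                 eta=0.1, tol=1e-8, max_iter=500):
    x, Delta = np.asarray(x0, dtype=float), Delta0
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:
            return x, k
        B = hess(x)
        p = cauchy_step(g, B, Delta)
        pred = -(g @ p + 0.5 * p @ B @ p)      # predicted decrease m(0) - m(p) > 0
        ared = f(x) - f(x + p)                 # actual decrease
        rho = ared / pred
        if rho < 0.25:
            Delta *= 0.25                      # model poor: shrink radius
        elif rho > 0.75 and np.isclose(np.linalg.norm(p), Delta):
            Delta = min(2 * Delta, Delta_max)  # model good at the boundary: enlarge
        if rho > eta:
            x = x + p                          # accept step
    return x, max_iter

# quadratic test with exact Hessian: the model is exact, so rho ~ 1
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
hess = lambda x: A
x_star, iters = trust_region(f, grad, hess, [1.0, 1.0])
```

Swapping `cauchy_step` for dogleg or truncated CG changes only the subproblem solver; the acceptance logic stays the same.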
Method Selection Cheat Sheet

GD + Armijo: safest baseline, cheapest iteration, slowest asymptotically.
Nonlinear CG: cheap memory, often better than GD, more tuning-sensitive.
BFGS: strong default for smooth medium-size problems.
Trust-region: robust when curvature is difficult/nonconvex.
In reduced PDE-constrained optimization:
dominant cost = gradient/Hessian-vector evaluation (state/adjoint solves);
iteration count matters less than “PDE solves per accepted step”;
robust globalisation is critical.
Finite-Dimensional Conceptual Experiments (Notebook)
In jupyterbook/codes/lecture03/optimization_toolbox.ipynb:
GD + Armijo on an anisotropic quadratic.
Nonlinear CG (FR and PR+) on Rosenbrock.
BFGS vs GD comparison on Rosenbrock.
Trust-region (dogleg) on a shifted nonconvex quadratic.
Each experiment reports:
objective history;
gradient norm decay;
trajectory in 2D (when applicable).
To start separating reusable code from lecture-specific examples, the repository now also includes
jupyterbook/codes/common/optim.py and the small script
jupyterbook/codes/lecture03/gradient_descent_demo.py.
The notebook remains the main teaching artifact; the Python script is a lighter-weight
example that reuses code meant to be shared by later lectures.