
Numerical Optimization Toolbox for Reduced OCPs

University of Pisa

Overview

This lecture is a numerical optimization toolbox for the reduced formulation

$$\min_{u\in\mathbb{R}^n} f(u).$$

The same algorithms will later be reused for PDE-constrained optimization, where $f(u)=J(S(u),u)$ and each gradient evaluation typically requires state/adjoint solves.

Main goals:

  1. put GD, nonlinear CG, BFGS, and trust-region in one common framework;

  2. isolate the role of Armijo/Wolfe conditions;

  3. compare methods by cost per iteration and robustness;

  4. prepare the transition to reduced PDE-constrained optimization.


Unified Iterative Template

Most unconstrained methods fit, at a high level, the same iterative template: choose $u_0\in\mathbb{R}^n$ and for $k=0,1,2,\ldots$ compute

$$u_{k+1}=u_k+\alpha_k p_k,$$

with four design choices:

  1. an initial guess $u_0$;

  2. a search direction $p_k$ (GD, CG, quasi-Newton, ...);

  3. a step length $\alpha_k$ (line search or trust region);

  4. a stopping test.

Typical stopping checks (often combined):

$$\|\nabla f(u_k)\|\le \varepsilon_g,\qquad \frac{|f(u_{k+1})-f(u_k)|}{\max(1,|f(u_k)|)}\le \varepsilon_f,\qquad \|u_{k+1}-u_k\|\le \varepsilon_u,\qquad k\le k_{\max}.$$
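These checks can be combined in a single helper; a minimal sketch (the function name and default tolerances are illustrative):

```python
import numpy as np

def converged(g_new, f_new, f_old, u_new, u_old, k,
              eps_g=1e-8, eps_f=1e-12, eps_u=1e-12, k_max=1000):
    """Combined stopping test mirroring the four checks above."""
    small_grad = np.linalg.norm(g_new) <= eps_g
    small_decrease = abs(f_new - f_old) / max(1.0, abs(f_old)) <= eps_f
    small_step = np.linalg.norm(u_new - u_old) <= eps_u
    return small_grad or small_decrease or small_step or k >= k_max
```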

Line Search and Armijo

Given a descent direction $p_k$ with $\nabla f(u_k)^T p_k<0$, line search picks $\alpha_k>0$ to ensure robust decrease.

The ideal choice is the exact line-search minimizer

$$\alpha_k^{\star}\in \operatorname*{argmin}_{\alpha\ge 0}\phi(\alpha), \qquad \phi(\alpha):=f(u_k+\alpha p_k).$$

Since

$$\phi'(\alpha)=\nabla f(u_k+\alpha p_k)^T p_k,$$

an interior minimizer satisfies $\phi'(\alpha_k^\star)=0$. Using the second-order expansion at $\alpha=0$,

$$\phi(\alpha)\approx \phi(0)+\alpha\, g_k^T p_k+\frac{\alpha^2}{2}\,p_k^T H_k p_k,$$

with $g_k=\nabla f(u_k)$ and $H_k=\nabla^2 f(u_k)$, the minimizer of this local model is

$$\alpha_k^{\star}\approx -\frac{g_k^T p_k}{p_k^T H_k p_k}.$$

For a quadratic $f(u)=\frac12 u^T H u-b^T u$ (constant Hessian), this is exact:

$$\alpha_k^{\star}= -\frac{g_k^T p_k}{p_k^T H p_k},$$

and for steepest descent ($p_k=-g_k$):

$$\alpha_k^{\star}= \frac{g_k^T g_k}{g_k^T H g_k}.$$

In general nonlinear problems, exact line search is usually too expensive; backtracking is the standard practical alternative.
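For quadratics, though, the closed-form step is easy to verify numerically. A small sketch with a hypothetical SPD matrix `H` and right-hand side `b`:

```python
import numpy as np

# Hypothetical quadratic f(u) = 0.5 u^T H u - b^T u with SPD H.
H = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])

def grad(u):
    return H @ u - b

u = np.zeros(2)
g = grad(u)                      # g = -b at u = 0
alpha = (g @ g) / (g @ (H @ g))  # exact steepest-descent step
u_new = u - alpha * g

# phi'(alpha) = grad(u + alpha p)^T p with p = -g must vanish at the exact step
assert abs(grad(u_new) @ (-g)) < 1e-12
```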

Armijo backtracking

Choose $\alpha_0>0$, $c_1\in(0,1)$, $\beta\in(0,1)$. Set $\alpha\leftarrow \alpha_0$ and reduce $\alpha\leftarrow \beta\alpha$ until

$$f(u_k+\alpha p_k)\le f(u_k)+c_1\alpha\, \nabla f(u_k)^T p_k.$$

Interpretation:

- the right-hand side is the tangent line at $\alpha=0$ with its slope scaled by $c_1$: the step must realize at least a fixed fraction of the decrease predicted by the linear model;
- since the condition holds for all sufficiently small $\alpha$ along a descent direction, backtracking always terminates.
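A minimal sketch of Armijo backtracking (the function name and defaults are illustrative):

```python
import numpy as np

def armijo_backtracking(f, grad_f, u, p, alpha0=1.0, c1=1e-4, beta=0.5,
                        max_iter=50):
    """Shrink alpha until the sufficient-decrease condition holds.

    Assumes p is a descent direction: grad_f(u)^T p < 0.
    """
    fu = f(u)
    slope = grad_f(u) @ p
    alpha = alpha0
    for _ in range(max_iter):
        if f(u + alpha * p) <= fu + c1 * alpha * slope:
            return alpha
        alpha *= beta
    return alpha  # fallback: smallest alpha tried

# Usage on a simple quadratic, with the steepest-descent direction
f = lambda u: 0.5 * u @ u
g = lambda u: u
u = np.array([2.0, 0.0])
alpha = armijo_backtracking(f, g, u, -g(u))
```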

Wolfe conditions (for $0<c_1<c_2<1$):

$$\begin{aligned} f(u_k+\alpha p_k) &\le f(u_k)+c_1\alpha\, \nabla f(u_k)^T p_k,\\ \nabla f(u_k+\alpha p_k)^T p_k &\ge c_2\,\nabla f(u_k)^T p_k. \end{aligned}$$

Strong Wolfe replaces the curvature inequality with

$$\left|\nabla f(u_k+\alpha p_k)^T p_k\right| \le c_2\left|\nabla f(u_k)^T p_k\right|.$$

Geometric interpretation along the line $\phi(\alpha):=f(u_k+\alpha p_k)$: the Armijo condition keeps $\phi(\alpha)$ below the relaxed tangent line $\phi(0)+c_1\alpha\,\phi'(0)$, ruling out steps that are too long, while the curvature condition requires the slope $\phi'(\alpha)$ to be less negative than $c_2\,\phi'(0)$, ruling out steps that are too short.


Gradient Descent (GD)

Direction:

$$p_k=-\nabla f(u_k).$$

Strengths:

- simplest and cheapest direction: one gradient evaluation per iteration, only vector operations;
- globally convergent to stationary points with an Armijo/Wolfe line search.

Limitations:

- linear convergence, with a rate that degrades as the Hessian condition number grows;
- pronounced zigzagging on anisotropic (ill-conditioned) problems.

Use case in this course: reference solver for reduced OCP prototypes.
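A minimal sketch of such a reference solver, on a hypothetical anisotropic quadratic (the matrix, tolerances, and iteration cap are illustrative):

```python
import numpy as np

# Hypothetical anisotropic quadratic: f(u) = 0.5 u^T H u, condition number 10.
H = np.diag([10.0, 1.0])
f = lambda u: 0.5 * u @ H @ u
grad = lambda u: H @ u

u = np.array([1.0, 1.0])
for k in range(1000):
    g = grad(u)
    if np.linalg.norm(g) <= 1e-8:   # gradient-norm stopping check
        break
    p = -g                          # steepest-descent direction
    alpha = 1.0
    while f(u + alpha * p) > f(u) + 1e-4 * alpha * (g @ p):
        alpha *= 0.5                # Armijo backtracking
    u = u + alpha * p
```

Even on this mildly ill-conditioned problem the iteration count is noticeably larger than for the methods below.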


Nonlinear Conjugate Gradient (CG)

Replace steepest descent with

$$p_k=-g_k+\beta_k p_{k-1},\qquad g_k=\nabla f(u_k),$$

with reset $p_k=-g_k$ when needed.

Popular $\beta_k$ formulas:

- Fletcher-Reeves: $\beta_k^{FR}=\dfrac{g_k^T g_k}{g_{k-1}^T g_{k-1}}$;
- Polak-Ribière: $\beta_k^{PR}=\dfrac{g_k^T(g_k-g_{k-1})}{g_{k-1}^T g_{k-1}}$, with the safeguarded variant $\beta_k^{PR+}=\max(\beta_k^{PR},0)$.

Key facts:

- on an SPD quadratic with exact line search, nonlinear CG coincides with linear CG and terminates in at most $n$ steps;
- memory and per-iteration cost are comparable to GD (a few vectors), but convergence on ill-conditioned problems is typically much faster;
- PR+ with periodic restarts is usually more robust than FR in practice.

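As a sketch of the iteration, FR-CG with exact line search on a hypothetical SPD quadratic, where it coincides with linear CG and terminates in at most $n$ steps:

```python
import numpy as np

# Hypothetical SPD quadratic f(u) = 0.5 u^T H u - b^T u in n = 2 dimensions.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
grad = lambda u: H @ u - b

u = np.zeros(2)
g = grad(u)
p = -g
for k in range(2):                     # at most n = 2 steps on a quadratic
    alpha = -(g @ p) / (p @ (H @ p))   # exact line search on the quadratic
    u = u + alpha * p
    g_new = grad(u)
    beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves formula
    p = -g_new + beta * p
    g = g_new
```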

BFGS Quasi-Newton

BFGS stands for Broyden-Fletcher-Goldfarb-Shanno (the four authors who proposed this update family).

Core idea: mimic Newton’s method without computing the exact Hessian. Instead, build a sequence of positive-definite inverse-Hessian approximations from gradient differences, so directions keep curvature information while each iteration only needs function/gradient evaluations.

Approximate the inverse Hessian, $H_k\approx \nabla^2 f(u_k)^{-1}$, and use

$$p_k=-H_k g_k.$$

With

$$s_k=u_{k+1}-u_k,\qquad y_k=g_{k+1}-g_k,\qquad y_k^T s_k>0,$$

the BFGS update is

$$H_{k+1}=\left(I-\rho_k s_k y_k^T\right)H_k\left(I-\rho_k y_k s_k^T\right)+\rho_k s_k s_k^T, \qquad \rho_k=\frac{1}{y_k^T s_k}.$$

Equivalent expanded form:

$$H_{k+1}=H_k-\frac{H_k y_k s_k^T+s_k y_k^T H_k}{y_k^T s_k}+\left(1+\frac{y_k^T H_k y_k}{y_k^T s_k}\right)\frac{s_k s_k^T}{y_k^T s_k}.$$

So the inverse-Hessian approximation is updated through rank-1 outer-product terms (overall a rank-2 correction per iteration).

Practical points:

- enforce the curvature condition $y_k^T s_k>0$ (guaranteed by a Wolfe line search) so that $H_{k+1}$ stays positive definite;
- a common initialization is $H_0=\gamma I$, e.g. with $\gamma=s_0^T y_0/y_0^T y_0$ after the first step;
- each iteration costs $O(n^2)$ operations and storage for the dense update.

For large scale: use L-BFGS (limited-memory variant): keep only the last $m$ pairs $(s_i,y_i)$ (typically $m\in[5,20]$), do not form $H_k$ explicitly, and compute $p_k=-H_k g_k$ via the two-loop recursion. Memory drops from $O(n^2)$ to $O(mn)$ and matrix factorizations are avoided.
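A minimal sketch of the two-loop recursion (the function name and the `gamma` scaling are illustrative; with an empty history it returns `gamma * g`):

```python
import numpy as np

def two_loop(g, s_list, y_list, gamma=1.0):
    """L-BFGS two-loop recursion: returns H_k g without forming H_k.

    s_list, y_list hold the most recent (s_i, y_i) pairs, oldest first;
    gamma scales the initial inverse-Hessian guess H_0 = gamma * I.
    """
    q = g.copy()
    alphas = []                       # collected newest-first
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    r = gamma * q                     # apply H_0
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        rho = 1.0 / (y @ s)
        b = rho * (y @ r)
        r += (a - b) * s
    return r                          # r = H_k g; search direction is -r
```

With a single stored pair this reproduces the dense BFGS update of $H_0=I$ applied to $g$, which is a convenient unit test.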


Trust-Region (TR) Methods

Instead of fixing a direction then line-searching, solve a local model:

$$\min_{\|p\|\le \Delta_k} m_k(p)=f_k+g_k^T p+\frac12 p^T B_k p.$$

Accept/reject with

$$\rho_k=\frac{f(u_k)-f(u_k+p_k)}{m_k(0)-m_k(p_k)}.$$

Rule of thumb:

- if $\rho_k$ is small (e.g. $\rho_k<1/4$): reject (or accept cautiously) and shrink $\Delta_{k+1}$;
- if $\rho_k$ is large (e.g. $\rho_k>3/4$) and the step hits the boundary $\|p_k\|=\Delta_k$: accept and enlarge $\Delta_{k+1}$;
- otherwise: accept and keep $\Delta_{k+1}=\Delta_k$.

Advantages:

- no line search is needed, and the method remains well defined when $B_k$ is indefinite;
- a natural globalization of (quasi-)Newton directions with strong convergence theory.

Typical subproblem solvers: Cauchy step, dogleg, truncated CG.
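A minimal sketch of the dogleg step, assuming $B$ is SPD (the function name is illustrative and no safeguards for indefinite $B$ are included):

```python
import numpy as np

def dogleg(g, B, delta):
    """Dogleg solution of min g^T p + 0.5 p^T B p subject to ||p|| <= delta."""
    p_newton = -np.linalg.solve(B, g)
    if np.linalg.norm(p_newton) <= delta:
        return p_newton                        # full Newton step is inside
    p_cauchy = -(g @ g) / (g @ (B @ g)) * g    # unconstrained Cauchy point
    if np.linalg.norm(p_cauchy) >= delta:
        return -delta / np.linalg.norm(g) * g  # clipped steepest descent
    # Otherwise walk from p_cauchy toward p_newton until ||p|| = delta:
    # solve ||p_cauchy + tau d||^2 = delta^2 for tau in (0, 1).
    d = p_newton - p_cauchy
    a = d @ d
    bq = 2.0 * (p_cauchy @ d)
    c = p_cauchy @ p_cauchy - delta**2
    tau = (-bq + np.sqrt(bq**2 - 4.0 * a * c)) / (2.0 * a)
    return p_cauchy + tau * d
```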


Method Selection Cheat Sheet

In reduced PDE-constrained optimization:

- each evaluation of $f$ requires a state solve and each gradient an additional adjoint solve, so methods are compared primarily by the number of PDE solves, not by linear-algebra cost;
- (L-)BFGS is often the default: it reuses gradient information without Hessian solves;
- GD remains a useful, easily verified baseline.


Finite-Dimensional Conceptual Experiments (Notebook)

In jupyterbook/codes/lecture03/optimization_toolbox.ipynb:

  1. GD + Armijo on an anisotropic quadratic.

  2. Nonlinear CG (FR and PR+) on Rosenbrock.

  3. BFGS vs GD comparison on Rosenbrock.

  4. Trust-region (dogleg) on a shifted nonconvex quadratic.

Each experiment reports the iteration history (objective values and gradient norms) and the number of function/gradient evaluations, the quantity that will proxy PDE-solve cost later in the course.

To start separating reusable code from lecture-specific examples, the repository now also includes jupyterbook/codes/common/optim.py and the small script jupyterbook/codes/lecture03/gradient_descent_demo.py. The notebook remains the main teaching artifact; the Python script is a lighter-weight example that reuses code meant to be shared by later lectures.