Overview
The previous lectures introduced nonsmooth PDE-constrained optimization, active-set methods, and Newton-type algorithms for nonlinear optimality systems. We now return to smooth nonlinear control problems and look at them from a slightly different angle: the Hamiltonian structure of optimal control.
The goal of this lecture is to explain the Sequential Quadratic Hamiltonian method, usually abbreviated as SQH. The method is a successive approximation scheme built on the Pontryagin principle. It alternates state and adjoint solves with a pointwise optimization of the Hamiltonian, augmented by a quadratic penalty on the change of the control.
The main ideas are:
optimal control problems naturally generate Hamiltonian systems;
the state and adjoint equations form a primal-dual pair;
second-order methods can be written without explicitly assembling a reduced Hessian;
SQH replaces the global control update by pointwise optimization problems for a quadratically regularized Hamiltonian, with the regularization weight adapted to guarantee a sufficient decrease of the cost.
The method is especially useful for nonlinear PDE-constrained optimization, where reduced gradients are easy to compute but reduced Hessians are expensive to form explicitly.
Throughout the lecture, the emphasis is conceptual and algebraic. The linear algebra and implementation issues are the same ones we have already met in KKT and Newton systems: block structure, saddle-point matrices, preconditioning, and globalization.
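From Optimality Systems to Hamiltonians
Consider an abstract PDE-constrained optimization problem:
$$\min_{y,\,u}\; J(y, u)$$
subject to
$$e(y, u) = 0.$$
Here:
$y$ is the state;
$u$ is the control;
$e(y, u) = 0$ is the state equation.
For example, a semilinear distributed control problem has the form
$$-\Delta y + \varphi(y) = u \quad \text{in } \Omega,$$
with a smooth nonlinearity $\varphi$, homogeneous Dirichlet boundary conditions, and cost functional
$$J(y, u) = \frac12 \|y - y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2} \|u\|_{L^2(\Omega)}^2, \qquad \alpha > 0,$$
where $y_d$ is a desired state.
Introduce an adjoint variable $p$ and define the Lagrangian
$$\mathcal{L}(y, u, p) = J(y, u) + \langle p, e(y, u) \rangle.$$
The first-order optimality system is
$$
\begin{aligned}
\partial_p \mathcal{L}(y, u, p) &= e(y, u) = 0,\\
\partial_y \mathcal{L}(y, u, p) &= J_y(y, u) + e_y(y, u)^* p = 0,\\
\partial_u \mathcal{L}(y, u, p) &= J_u(y, u) + e_u(y, u)^* p = 0.
\end{aligned}
$$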
These equations are, respectively:
the state equation;
the adjoint equation;
the stationarity equation with respect to the control.
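The following is a minimal sketch that makes the optimality system concrete. It assumes the semilinear example in one space dimension with the illustrative choice $\varphi(y) = y^3$, discretized by finite differences on $(0, 1)$. For this example $e_u(y,u)^* p = -p$, so the stationarity equation gives the reduced gradient $\alpha u - p$, where $p$ solves the adjoint equation $-\Delta p + \varphi'(y)\,p = -(y - y_d)$. The code compares this adjoint-based gradient with a central difference quotient. The helpers `solve_state`, `cost`, and `adjoint_gradient` are hypothetical names used only for illustration.

```python
import numpy as np

# Assumed 1D instance of the semilinear example:
#   -y'' + y**3 = u on (0, 1), y(0) = y(1) = 0,
#   J(y, u) = 1/2 ||y - y_d||^2 + alpha/2 ||u||^2,
# discretized by second-order finite differences on n interior nodes.
n = 50
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
alpha = 1e-2
y_d = np.sin(np.pi * x)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2  # discrete -d^2/dx^2


def solve_state(u, tol=1e-12, maxit=50):
    """Newton's method for the discrete state equation A y + y**3 = u."""
    y = np.zeros(n)
    for _ in range(maxit):
        dy = np.linalg.solve(A + np.diag(3.0 * y**2), A @ y + y**3 - u)
        y -= dy
        if np.linalg.norm(dy) <= tol * (1.0 + np.linalg.norm(y)):
            break
    return y


def cost(u):
    y = solve_state(u)
    return 0.5 * h * np.sum((y - y_d) ** 2) + 0.5 * alpha * h * np.sum(u**2)


def adjoint_gradient(u):
    """Gradient alpha*u - p with the discrete adjoint (A + diag(3 y^2)) p = -(y - y_d)."""
    y = solve_state(u)
    p = np.linalg.solve(A + np.diag(3.0 * y**2), -(y - y_d))
    return alpha * u - p


rng = np.random.default_rng(0)
u = rng.standard_normal(n)
d = rng.standard_normal(n)
t = 1e-5
fd = (cost(u + t * d) - cost(u - t * d)) / (2.0 * t)  # central difference quotient
ad = h * adjoint_gradient(u) @ d                      # discrete L2 pairing <gradient, d>
print(fd, ad)
```

The two printed numbers should agree to many digits. The adjoint route needs one linear solve regardless of the number of control unknowns, while the difference quotient needs two nonlinear state solves per direction.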
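In this abstract PDE setting, the Hamiltonian is the function on state-control-adjoint variables defined by
$$\mathcal{H}(y, u, p) = J(y, u) + \langle p, e(y, u) \rangle,$$
which coincides with the Lagrangian.
If the objective and the PDE residual are written through local densities, for instance
$$J(y, u) = \int_\Omega \ell(x, y, u)\,dx, \qquad \langle p, e(y, u) \rangle = \int_\Omega p\, f(x, y, u)\,dx,$$
where the density $f$ formally contains the derivatives of $y$ that appear in the PDE, then the corresponding Hamiltonian density is formally
$$H(x, y, u, p) = \ell(x, y, u) + p\, f(x, y, u).$$
The equations
$$\partial_p \mathcal{H} = 0, \qquad \partial_y \mathcal{H} = 0, \qquad \partial_u \mathcal{H} = 0$$
are precisely the state, adjoint, and control stationarity equations written in Hamiltonian form.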
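Pontryagin Principle for the Abstract PDE Problem
For the abstract problem introduced above, the Pontryagin principle can be read as a first-order optimality system in function spaces.
Consider
$$\min_{(y, u) \in Y \times U_{ad}} J(y, u)$$
subject to
$$e(y, u) = 0 \quad \text{in } P^*,$$
where $U_{ad} \subset U$ is the set of admissible controls and
$$e : Y \times U \to P^*$$
maps the state and control into the dual of the adjoint space. Assume for the moment that the problem is smooth and that a constraint qualification holds at a local solution $(\bar y, \bar u)$.
Then there exists an adjoint variable
$$\bar p \in P$$
such that the Lagrangian
$$\mathcal{L}(y, u, p) = J(y, u) + \langle p, e(y, u) \rangle_{P, P^*}$$
satisfies the following conditions.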
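The state equation is recovered by differentiating with respect to the adjoint variable:
$$\partial_p \mathcal{L}(\bar y, \bar u, \bar p) = 0.$$
Equivalently,
$$e(\bar y, \bar u) = 0.$$
The adjoint equation is obtained by differentiating with respect to the state:
$$\partial_y \mathcal{L}(\bar y, \bar u, \bar p) = 0.$$
In operator notation this is
$$e_y(\bar y, \bar u)^* \bar p = -J_y(\bar y, \bar u).$$
Finally, the control condition is
$$-\partial_u \mathcal{L}(\bar y, \bar u, \bar p) \in N_{U_{ad}}(\bar u),$$
where $N_{U_{ad}}(\bar u)$ denotes the normal cone of $U_{ad}$ at $\bar u$.
If $U_{ad} = U$, this reduces to the stationarity equation
$$J_u(\bar y, \bar u) + e_u(\bar y, \bar u)^* \bar p = 0.$$
If $U_{ad}$ is a closed convex set, the same condition is a variational inequality:
$$\big\langle J_u(\bar y, \bar u) + e_u(\bar y, \bar u)^* \bar p,\; u - \bar u \big\rangle \ge 0 \quad \text{for all } u \in U_{ad}.$$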
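This is the infinite-dimensional, first-order form of the maximum principle. In the unconstrained case the Hamiltonian is stationary with respect to the control. With control constraints, the variational inequality is the first-order condition for minimizing the Hamiltonian over the admissible set. The two statements coincide when the Hamiltonian is convex in the control, for instance for a quadratic control cost and a control-affine state equation.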
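More explicitly, if the PDE and the cost admit local densities, one may use the Hamiltonian density
$$H(x, y, u, p) = \ell(x, y, u) + p\, f(x, y, u).$$
With $K_{ad}$ denoting the set of pointwise admissible control values, the Pontryagin condition becomes the pointwise or weak minimization condition
$$H\big(x, \bar y(x), \bar u(x), \bar p(x)\big) = \min_{w \in K_{ad}} H\big(x, \bar y(x), w, \bar p(x)\big) \quad \text{for a.e. } x \in \Omega,$$
or, in the smooth unconstrained case,
$$\partial_u H\big(x, \bar y(x), \bar u(x), \bar p(x)\big) = 0.$$
For PDE-constrained optimization, this statement is usually interpreted weakly through the Lagrangian derivatives above. The adjoint variable is the dual Hamiltonian variable, and the triple
$$(\bar y, \bar u, \bar p)$$
solves a coupled state-adjoint-control system.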
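For the semilinear example with box constraints $K_{ad} = [a, b]$, the $u$-dependent part of the Hamiltonian density is $\frac{\alpha}{2}u^2 - p\,u$. This follows from the Lagrangian sign convention above, where $e_u(y,u)^* p = -p$. Pointwise minimization over $[a, b]$ is therefore the projection $\bar u = \min(\max(\bar p/\alpha, a), b)$. The sketch below uses hypothetical random values in place of an actual adjoint $\bar p$. It checks that this projection satisfies the variational inequality pointwise. The function name `minimize_hamiltonian_box` is illustrative.

```python
import numpy as np


def minimize_hamiltonian_box(p, alpha, a, b):
    """Pointwise minimizer of alpha/2*w**2 - p*w over w in [a, b].

    The function is strictly convex in w, so its minimizer over the interval is
    the unconstrained minimizer p/alpha clipped onto [a, b].
    """
    return np.clip(p / alpha, a, b)


alpha, a, b = 0.1, -1.0, 1.0
rng = np.random.default_rng(0)
p_bar = rng.uniform(-0.3, 0.3, size=10)  # hypothetical adjoint values at 10 points
u_bar = minimize_hamiltonian_box(p_bar, alpha, a, b)

# Pointwise variational inequality: (alpha*u_bar - p_bar) * (w - u_bar) >= 0
# for every admissible value w; here w runs over a grid of [a, b].
w = np.linspace(a, b, 201)[:, None]          # shape (201, 1), broadcasts against (10,)
vi = (alpha * u_bar - p_bar) * (w - u_bar)   # shape (201, 10)
print(vi.min() >= -1e-14)  # True
```

At points where $\bar p/\alpha$ lies inside $[a, b]$ the first factor vanishes. Elsewhere the factor has the sign that keeps the product nonnegative for every admissible $w$.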
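Hamiltonian Form of a PDE-Constrained Problem
Let $Y$, $U$, and $P$ be Hilbert spaces for the state, control, and adjoint. Suppose the PDE residual is
$$e : Y \times U \to P^*.$$
The corresponding Lagrangian is
$$\mathcal{L}(y, u, p) = J(y, u) + \langle p, e(y, u) \rangle.$$
For a quadratic tracking cost
$$J(y, u) = \frac12 \|y - y_d\|^2 + \frac{\alpha}{2} \|u\|^2,$$
the first-order system reads
$$
\begin{aligned}
y - y_d + e_y(y, u)^* p &= 0,\\
\alpha u + e_u(y, u)^* p &= 0,\\
e(y, u) &= 0.
\end{aligned}
$$
The nonlinear residual can be written compactly as
$$F(y, u, p) = \begin{pmatrix} y - y_d + e_y(y, u)^* p \\ \alpha u + e_u(y, u)^* p \\ e(y, u) \end{pmatrix} = 0.$$
A standard full-space Newton method would solve
$$\begin{pmatrix} \mathcal{L}_{yy} & \mathcal{L}_{yu} & e_y^* \\ \mathcal{L}_{uy} & \mathcal{L}_{uu} & e_u^* \\ e_y & e_u & 0 \end{pmatrix} \begin{pmatrix} \delta y \\ \delta u \\ \delta p \end{pmatrix} = -F(y, u, p)$$
at each iteration, with all derivatives evaluated at the current iterate $(y, u, p)$.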
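Here is a minimal sketch of this full-space Newton iteration. It assumes the 1D test case $-y'' + y^3 = u$ on $(0, 1)$ with homogeneous Dirichlet conditions, discretized by finite differences. The symmetric saddle-point Jacobian is assembled densely with `np.block`, which is only reasonable for a tiny grid. A realistic PDE code would use sparse storage and a preconditioned iterative solver. In this case $\mathcal{L}_{yy} = I + \operatorname{diag}(6\,y\,p)$, $\mathcal{L}_{uu} = \alpha I$, $\mathcal{L}_{yu} = 0$, $e_y = A + \operatorname{diag}(3y^2)$, and $e_u = -I$.

```python
import numpy as np

# Assumed test case: -y'' + y**3 = u on (0, 1), y(0) = y(1) = 0,
# J = 1/2 ||y - y_d||^2 + alpha/2 ||u||^2, finite differences on n interior nodes.
n = 40
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
alpha = 1e-2
y_d = np.sin(np.pi * x)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
Id, Z = np.eye(n), np.zeros((n, n))


def kkt_residual(y, u, p):
    """Discrete first-order system: gradients of the Lagrangian divided by h."""
    r_y = (y - y_d) + (A + np.diag(3.0 * y**2)) @ p   # adjoint equation
    r_u = alpha * u - p                               # control stationarity
    r_p = A @ y + y**3 - u                            # state equation
    return np.concatenate([r_y, r_u, r_p])


def kkt_jacobian(y, u, p):
    """Saddle-point matrix [[L_yy, L_yu, e_y^*], [L_uy, L_uu, e_u^*], [e_y, e_u, 0]]."""
    e_y = A + np.diag(3.0 * y**2)                     # symmetric, so e_y^* = e_y
    return np.block([
        [Id + np.diag(6.0 * y * p), Z, e_y],
        [Z, alpha * Id, -Id],
        [e_y, -Id, Z],
    ])


z = np.zeros(3 * n)                                   # stacked iterate (y, u, p)
for it in range(20):
    y, u, p = np.split(z, 3)
    F = kkt_residual(y, u, p)
    if np.linalg.norm(F) < 1e-9:
        break
    z = z + np.linalg.solve(kkt_jacobian(y, u, p), -F)  # full Newton correction
print(it, np.linalg.norm(kkt_residual(*np.split(z, 3))))
```

No line search or damping is used here. For harder nonlinearities the globalization issues mentioned in the overview become essential.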
This Newton viewpoint is useful, but it hides the specific optimal-control idea behind SQH. The method comes from the Pontryagin principle and from successive pointwise Hamiltonian optimization. We now build that path in steps.
Pontryagin Maximum Principle
We write the controlled PDE in the abstract form
$$e(y, u) = 0$$
and assume that for every admissible control $u \in U_{ad}$ there is a unique state
$$y = S(u) \in Y \quad \text{with} \quad e(S(u), u) = 0.$$
The reduced cost is
$$\hat J(u) = J(S(u), u).$$
To connect with the SQH algorithm, it is useful to write the PDE locally as
$$E(z, y, u) = 0, \qquad z \in Q,$$
where $z$ denotes the independent variable and $Q$ its domain. Depending on the model, $z$ may be:
a time variable $t$;
a space variable $x$;
a space-time variable $(x, t)$.
This notation suppresses derivatives and weak-form terms. For example, in an elliptic PDE the symbol $E$ contains the differential operator acting on $y$, not only the point value $y(z)$.
The Hamiltonian density is written as
$$H_\varepsilon(z, y, u, v, p),$$
where:
$y$ is the state at the current point;
$u$ is the trial control value;
$v$ is the reference or current control value, included because the regularized SQH Hamiltonian below depends on it;
$p$ is the adjoint variable.
For the unregularized Pontryagin principle the dependence on the reference control is absent, and we simply write
$$H(z, y, u, p).$$
The Pontryagin Maximum Principle states the following. If $u^*$ is optimal and $y^* = S(u^*)$ is the corresponding state, then there exists an adjoint state $p^*$ such that:
$y^*$ solves the state equation with control $u^*$;
$p^*$ solves the adjoint equation associated with $(y^*, u^*)$;
the optimal control satisfies the pointwise Hamiltonian condition
$$H\big(z, y^*(z), u^*(z), p^*(z)\big) = \max_{w \in K_{ad}} H\big(z, y^*(z), w, p^*(z)\big)$$
for almost every $z \in Q$.
Here $K_{ad}$ is the set of pointwise admissible values. For example, for box constraints,
$$K_{ad} = [u_a, u_b], \qquad u_a < u_b.$$
Sign conventions differ between texts. They matter for the bookkeeping but not for the method: changing the sign of the adjoint variable or of the Hamiltonian turns the maximum condition into an equivalent minimum condition, such as the one in the Lagrangian formulation above. The algorithmic point is the same. After the state and adjoint are known, the control update is obtained from a local optimization problem in the variable $u$.
Rozonoer Estimate
The reason the Pontryagin condition is algorithmically useful is that Hamiltonian improvement implies cost improvement, up to higher-order terms. This idea goes back to Rozonoer’s analysis of successive approximation methods for optimal control.
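Let $u$ and $v$ be two admissible controls, with corresponding states
$$y_u = S(u), \qquad y_v = S(v).$$
Let $p_u$ be the adjoint associated with $(y_u, u)$. Under the usual smoothness and stability assumptions, one can estimate the cost difference as
$$J(y_v, v) - J(y_u, u) \le -\int_Q \Big[ H\big(z, y_u, v, p_u\big) - H\big(z, y_u, u, p_u\big) \Big]\,dz + C\,\|v - u\|_{L^2(Q)}^2,$$
with a constant $C > 0$ determined by the smoothness and stability constants of the problem.
The domain $Q$ is the variable domain of the control: it may be a time interval, a spatial domain, or a space-time cylinder.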
The estimate should be read as follows:
the leading term is the Hamiltonian gain obtained by replacing $u$ with $v$ while freezing the state and adjoint at $(y_u, p_u)$;
the last term accounts for the fact that the true state changes from $y_u$ to $y_v$. By the stability of the state equation, this change is controlled by $\|v - u\|$, which makes the term quadratic in the control change;
for small control changes, the quadratic remainder is dominated by the Hamiltonian gain.
Thus, if $v$ increases the Hamiltonian enough pointwise, then the total cost decreases.
This is the key estimate behind successive approximation schemes. It turns a global optimal-control problem into repeated local Hamiltonian optimizations, followed by a global state solve.
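For a linear state equation and a quadratic cost, the estimate becomes an identity, which makes it easy to check numerically. The sketch below assumes the following setting:
- state equation $-y'' = u$ on $Q = (0, 1)$ with homogeneous Dirichlet conditions;
- running cost $\ell = \frac12 (y - y_d)^2 + \frac{\alpha}{2} u^2$;
- the maximization convention $H = p\,u - \ell$, with adjoint $-p'' = \partial_y H = -(y - y_d)$.

In this setting one has exactly $J(y_v, v) - J(y_u, u) = -\int_Q [H(z, y_u, v, p_u) - H(z, y_u, u, p_u)]\,dz + \frac12 \|y_v - y_u\|_{L^2(Q)}^2$. The remainder is the price of the state change. The discretization and data are illustrative choices.

```python
import numpy as np

# Assumed linear-quadratic case: -y'' = u on Q = (0, 1), y(0) = y(1) = 0,
# J(y, u) = 1/2 ||y - y_d||^2 + alpha/2 ||u||^2,
# H(y, u, p) = p*u - 1/2 (y - y_d)^2 - alpha/2 u^2, adjoint -p'' = dH/dy.
n = 99
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
alpha = 1e-2
y_d = np.sin(np.pi * x)


def H(y, u, p):
    """Hamiltonian density (maximization convention), evaluated pointwise."""
    return p * u - 0.5 * (y - y_d) ** 2 - 0.5 * alpha * u**2


def J(y, u):
    return 0.5 * h * np.sum((y - y_d) ** 2) + 0.5 * alpha * h * np.sum(u**2)


rng = np.random.default_rng(0)
u = rng.standard_normal(n)
v = u + 0.1 * rng.standard_normal(n)
y_u, y_v = np.linalg.solve(A, u), np.linalg.solve(A, v)
p_u = np.linalg.solve(A, -(y_u - y_d))      # adjoint for u: A p = dH/dy = -(y - y_d)

cost_change = J(y_v, v) - J(y_u, u)
hamiltonian_gain = h * np.sum(H(y_u, v, p_u) - H(y_u, u, p_u))  # y and p frozen at u
state_price = 0.5 * h * np.sum((y_v - y_u) ** 2)
print(cost_change, -hamiltonian_gain + state_price)  # equal up to rounding
```

For a nonlinear state equation the same computation yields only an estimate. The remainder then also contains cross terms between the control change and the state change.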
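Successive Approximation Schemes
A basic successive approximation scheme starts from a control $u^0$ and then repeats the following operations.
Given $u^k$:
1. solve the state equation to obtain $y^k$;
2. solve the adjoint equation associated with $(y^k, u^k)$ to obtain $p^k$;
3. compute a new control $u^{k+1}$ by pointwise Hamiltonian maximization,
   $$u^{k+1}(z) \in \operatorname*{arg\,max}_{w \in K_{ad}} H\big(z, y^k(z), w, p^k(z)\big) \quad \text{for a.e. } z \in Q;$$
4. solve the state equation again with control $u^{k+1}$.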
The central feature is Step 3. It is not a PDE solve. It is a pointwise optimization problem. In many important cases it can be computed explicitly:
for box constraints, by checking endpoints or projecting a stationary point;
for finite-valued controls, by comparing finitely many Hamiltonian values;
for quadratic control costs, by a local scalar or vector quadratic optimization.
This explains why the PMP is attractive computationally. The expensive operations are the state and adjoint solves; the control update is local.
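Step 3 can be coded without any PDE solve once the frozen values $y^k(z)$ and $p^k(z)$ are available on the grid. The sketch below assumes a Hamiltonian whose $u$-dependent part is $p\,u - \frac{\alpha}{2}u^2$ (a hypothetical choice). It evaluates this Hamiltonian at a list of candidate values per grid point:
- for a box, the two endpoints and the projected stationary point;
- for a finite set, its elements.

The helper `maximize_pointwise` is an illustrative name, not a library function.

```python
import numpy as np


def maximize_pointwise(H, candidates):
    """Return, at every grid point, the candidate value with the largest H.

    H: elementwise function of control values (state and adjoint frozen).
    candidates: array-like of shape (m, n), m candidate values per grid point.
    """
    C = np.asarray(candidates, dtype=float)
    values = H(C)                              # shape (m, n)
    best = np.argmax(values, axis=0)           # index of the winner at each point
    return np.take_along_axis(C, best[None, :], axis=0)[0]


alpha, a, b = 0.5, -1.0, 1.0
rng = np.random.default_rng(1)
p = rng.uniform(-2.0, 2.0, size=8)             # hypothetical frozen adjoint values


def H(u):
    # u-dependent part of an assumed Hamiltonian density; p broadcasts over rows
    return p * u - 0.5 * alpha * u**2


# Box K_ad = [a, b]: compare the endpoints with the projected stationary point.
box = [np.full_like(p, a), np.full_like(p, b), np.clip(p / alpha, a, b)]
u_box = maximize_pointwise(H, box)

# Finite K_ad = {-1, 0, 1}: compare the Hamiltonian values of the three elements.
finite = [np.full_like(p, w) for w in (-1.0, 0.0, 1.0)]
u_finite = maximize_pointwise(H, finite)
print(u_box)
print(u_finite)
```

Comparing a short list of candidates is exact whenever the list contains every possible pointwise maximizer. This holds here for both the concave quadratic on a box and the finite set.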
The weakness of the basic scheme is that the pointwise maximizer may be too aggressive. It can produce a control far from $u^k$, so the quadratic remainder in the Rozonoer estimate may dominate the Hamiltonian gain. In that case the cost may fail to decrease.
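Robust Successive Approximation
To stabilize the method, one modifies the Hamiltonian by penalizing large changes in the control. Given a parameter $\varepsilon > 0$, define
$$H_\varepsilon(z, y, u, v, p) = H(z, y, u, p) - \varepsilon\, |u - v|^2.$$
The reference value $v$ is the current control. At iteration $k$, the local control update becomes
$$u^{k+1}(z) \in \operatorname*{arg\,max}_{w \in K_{ad}} H_\varepsilon\big(z, y^k(z), w, u^k(z), p^k(z)\big) \quad \text{for a.e. } z \in Q.$$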
The penalty term has two effects:
it keeps the new control close to the old one;
for $\varepsilon$ large enough, it makes the local maximization strongly concave even when the original Hamiltonian is not concave in the control variable.
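For a Hamiltonian that is quadratic in the control, the robust update has a closed form. Assume, hypothetically, that the $u$-dependent part of $H$ is $p\,u - \frac{\alpha}{2}u^2$, where $\alpha$ may even be negative so that $H$ is convex rather than concave in $u$. Then $p\,w - \frac{\alpha}{2}w^2 - \varepsilon (w - v)^2$ is strictly concave as soon as $\alpha + 2\varepsilon > 0$. Its maximizer over $[a, b]$ is the clipped stationary point $(p + 2\varepsilon v)/(\alpha + 2\varepsilon)$. The function name below is illustrative.

```python
import numpy as np


def sqh_pointwise_update(p, v, alpha, eps, a, b):
    """Maximize H_eps(w) = p*w - alpha/2*w**2 - eps*(w - v)**2 over w in [a, b].

    H_eps is quadratic in w with second derivative -(alpha + 2*eps). When that is
    negative the function is strictly concave, and its maximizer over the interval
    is the stationary point clipped onto [a, b].
    """
    curvature = alpha + 2.0 * eps
    if curvature <= 0.0:
        raise ValueError("eps too small: H_eps is not strictly concave in w")
    return np.clip((p + 2.0 * eps * v) / curvature, a, b)


p = np.array([1.5, -0.2, 0.4])
v = np.array([0.0, 0.5, -0.5])      # current control u^k at three points
for eps in (1.0, 10.0, 1000.0):
    # alpha = -1: H itself is convex in u, but H_eps is strictly concave for eps > 1/2
    print(eps, sqh_pointwise_update(p, v, alpha=-1.0, eps=eps, a=-1.0, b=1.0))
```

As $\varepsilon$ grows, the update moves toward the reference value $v$. This is the conservative behavior the penalty is meant to produce.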
If $\varepsilon$ is too small, the method may still be unstable. If $\varepsilon$ is too large, the update is very small and convergence becomes slow. Therefore $\varepsilon$ is adapted during the iteration.
The descent test is based on the actual cost decrease. Let
$$\tau = \|u^{k+1} - u^k\|_{L^2(Q)}^2.$$
For a prescribed $\eta > 0$, accept the new control if
$$J(y^{k+1}, u^{k+1}) - J(y^k, u^k) \le -\eta\, \tau.$$
If the test fails, increase $\varepsilon$ and solve the pointwise optimization problem again. If the test succeeds, decrease $\varepsilon$ so that the next iteration can try a less conservative update.
This is the robust successive approximation mechanism that leads directly to the SQH algorithm.
The Sequential Quadratic Hamiltonian Algorithm
The Sequential Quadratic Hamiltonian method uses the robust Hamiltonian
$$H_\varepsilon(z, y, u, v, p) = H(z, y, u, p) - \varepsilon\, |u - v|^2$$
inside a successive approximation loop.
The word “quadratic” refers to the stabilizing quadratic term in the control increment. The algorithm is still driven by the Pontryagin condition: at each iteration the new control is obtained by solving a pointwise Hamiltonian optimization problem.
Choose:
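an initial control $u^0$;
a maximum number of iterations $k_{\max}$;
a stopping tolerance $\kappa > 0$;
parameters $\varepsilon > 0$, $\sigma > 1$, $\zeta \in (0, 1)$, and $\eta > 0$.
Set $k = 0$, initialize $\tau$ with any value larger than $\kappa$, and compute the initial state $y^0$ from the governing model with control $u^0$.
While
$$k < k_{\max} \quad \text{and} \quad \tau > \kappa,$$
perform the following steps.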
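1. Compute the adjoint $p^k$ associated with the current pair $(y^k, u^k)$.
2. Determine $u^{k+1}$ by solving the pointwise optimization problem
   $$u^{k+1}(z) \in \operatorname*{arg\,max}_{w \in K_{ad}} H_\varepsilon\big(z, y^k(z), w, u^k(z), p^k(z)\big)$$
   for almost every $z \in Q$.
   This is the defining local step of SQH. The variable $z$ may be $t$, $x$, or $(x, t)$, depending on the control problem.
3. Compute the new state $y^{k+1}$ by solving the governing model with control $u^{k+1}$.
4. Compute the update size
   $$\tau = \|u^{k+1} - u^k\|_{L^2(Q)}^2.$$
5. Check the actual cost decrease.
   If
   $$J(y^{k+1}, u^{k+1}) - J(y^k, u^k) > -\eta\, \tau,$$
   then the decrease is not sufficient. Increase the regularization parameter,
   $$\varepsilon \leftarrow \sigma\, \varepsilon,$$
   and return to Step 2 with the same $u^k$ (the state $y^k$ and adjoint $p^k$ are reused).
   If instead
   $$J(y^{k+1}, u^{k+1}) - J(y^k, u^k) \le -\eta\, \tau,$$
   then accept the update, decrease the regularization parameter,
   $$\varepsilon \leftarrow \zeta\, \varepsilon,$$
   and continue.
6. Set $k \leftarrow k + 1$.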
The loop stops when the control update is smaller than the tolerance or when the maximum number of iterations is reached.
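The steps above can be sketched for an assumed linear-quadratic elliptic model: $-y'' = u$ on $Q = (0, 1)$ with homogeneous Dirichlet conditions, $J(y, u) = \frac12\|y - y_d\|^2 + \frac{\alpha}{2}\|u\|^2$, and $K_{ad} = [a, b]$. The sketch uses the maximization convention $H = p\,u - \frac12 (y - y_d)^2 - \frac{\alpha}{2}u^2$. The adjoint therefore solves $-p'' = -(y - y_d)$, and Step 2 has the closed form $\min(\max((p + 2\varepsilon u^k)/(\alpha + 2\varepsilon), a), b)$. The target, bounds, and parameter values ($\varepsilon = 1$, $\sigma = 2$, $\zeta = 0.5$, $\eta = 10^{-3}$, $\kappa = 10^{-10}$) are illustrative choices, not values from the lecture. Dense linear algebra keeps the example short.

```python
import numpy as np

# Assumed model: -y'' = u on (0, 1), y(0) = y(1) = 0, u(z) in [a, b],
# J(y, u) = 1/2 ||y - y_d||^2 + alpha/2 ||u||^2, finite differences on n nodes.
n = 99
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
y_d = np.sin(np.pi * x)
alpha, a, b = 1e-2, -1.0, 3.0


def solve_state(u):
    return np.linalg.solve(A, u)


def solve_adjoint(y):
    # -p'' = dH/dy = -(y - y_d) for H = p*u - (y - y_d)**2/2 - alpha*u**2/2
    return np.linalg.solve(A, -(y - y_d))


def cost(y, u):
    return 0.5 * h * np.sum((y - y_d) ** 2) + 0.5 * alpha * h * np.sum(u**2)


def sqh(u0, eps=1.0, sigma=2.0, zeta=0.5, eta=1e-3, kappa=1e-10, k_max=1000):
    u = u0.copy()
    y = solve_state(u)
    J = cost(y, u)
    history = [J]
    k, tau = 0, kappa + 1.0                      # tau > kappa so the loop starts
    while k < k_max and tau > kappa:
        p = solve_adjoint(y)                     # Step 1: adjoint for (y^k, u^k)
        for _ in range(100):                     # safeguard on repeated Step 2
            # Step 2: maximize p*w - alpha/2 w^2 - eps (w - u^k)^2 over [a, b]
            u_new = np.clip((p + 2.0 * eps * u) / (alpha + 2.0 * eps), a, b)
            y_new = solve_state(u_new)           # Step 3: new state
            tau = h * np.sum((u_new - u) ** 2)   # Step 4: squared L2 update size
            J_new = cost(y_new, u_new)
            if J_new - J <= -eta * tau:          # Step 5: sufficient decrease
                eps *= zeta
                break
            eps *= sigma                         # insufficient: enlarge eps, redo Step 2
        else:
            raise RuntimeError("no sufficient decrease after repeated enlargements")
        u, y, J = u_new, y_new, J_new
        history.append(J)
        k += 1                                   # Step 6
    return u, y, history


u, y, history = sqh(np.zeros(n))
print(len(history) - 1, history[-1])
```

The recorded costs are nonincreasing by construction of the acceptance test. Only the pointwise update would change for a nonconvex or finite-valued $K_{ad}$.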
Convergence Theorem
The SQH method is a successive approximation scheme with an augmented Hamiltonian. The role of the adaptive parameter $\varepsilon$ is to guarantee a sufficient decrease of the cost functional. In Step 2, an exact pointwise maximization is convenient, but the convergence mechanism only needs a sufficient, possibly partial, Hamiltonian improvement.
The following theorem records the key descent estimate. We state it without proof.
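Theorem. Let
$$(y^k, u^k) \quad \text{and} \quad (y^{k+1}, u^{k+1})$$
be generated by the SQH algorithm, and assume that $u^k$ and $u^{k+1}$ are measurable. Under appropriate smoothness, boundedness, and stability assumptions on the state equation and on the running cost $\ell$, there exists a constant
$$C > 0$$
independent of $\varepsilon$ such that, for the value of $\varepsilon$ currently chosen by the SQH algorithm,
$$J(y^{k+1}, u^{k+1}) - J(y^k, u^k) \le -(\varepsilon - C)\, \|u^{k+1} - u^k\|_{L^2(Q)}^2.$$
In particular, if
$$\varepsilon \ge C + \eta$$
and
$$\tau = \|u^{k+1} - u^k\|_{L^2(Q)}^2,$$
then
$$J(y^{k+1}, u^{k+1}) - J(y^k, u^k) \le -\eta\, \tau.$$
This is exactly the sufficient-decrease condition used in Step 5 of the algorithm. Therefore, if a trial update does not decrease the cost enough, increasing $\varepsilon$ eventually makes the Hamiltonian update conservative enough to satisfy the acceptance test.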
The algorithm can therefore be summarized in one sentence:
SQH alternates global state-adjoint solves with a pointwise maximization of a quadratically regularized Hamiltonian.
This is the essential distinction from a generic Newton method. Newton linearizes the full KKT system; SQH uses the Pontryagin structure to turn the control step into a local Hamiltonian optimization problem, while the PDE coupling remains in the state and adjoint equations.
Course Summary: Algorithmic Map
We close the course with a compact map of the main algorithmic families we have seen. The table is deliberately practical: it compares what is solved at each iteration, the dominant computational cost, and the kind of admissible sets each method naturally handles.
Here “one PDE solve” means one elliptic solve in stationary problems, or one full forward or backward time march in parabolic problems.
| Method | Formulation | Typical cost per iteration | Advantages | Limitations | Admissible sets and nonsmoothness |
|---|---|---|---|---|---|
| Reduced gradient descent | Reduced: optimize $\hat J(u) = J(S(u), u)$ | One state solve + one adjoint solve; line search adds extra state solves | Simple, robust, low memory, easy to implement | Often slow; sensitive to scaling; first-order only | Best for smooth convex $\hat J$; can handle simple constraints only through projection or penalties |
| Armijo / Wolfe line search | Globalization layer for reduced methods | Several trial cost evaluations, hence extra state solves | Gives reliable decrease; stabilizes gradient, CG, BFGS | Can dominate cost if each trial requires a PDE solve | Works for smooth problems; projected variants needed for constrained sets |
| Nonlinear conjugate gradient | Reduced | One gradient evaluation per accepted step, plus line search | Better than steepest descent with little extra memory | Less robust on strongly nonlinear or nonsmooth problems | Mostly smooth or simple projected variants |
| BFGS / L-BFGS | Reduced quasi-Newton | One gradient evaluation + line search; stores curvature pairs | Often much faster than gradient descent; no exact Hessian | Needs smoothness and good line search; curvature updates can fail near nonsmooth active sets | Smooth problems; L-BFGS-B-style variants for boxes, but nonsmooth terms need splitting |
| Reduced Newton / trust region | Reduced second-order | Hessian or Hessian-vector products; each product may require incremental state/adjoint solves | Fast local convergence; good for nonlinear smooth problems | More complex; expensive linear algebra; needs globalization | Smooth nonconvex problems if Hessian model and globalization are adequate; constraints require SQP/TR machinery |
| All-at-once KKT solve | Simultaneous state-adjoint-control system | One large saddle-point linear solve for linear-quadratic problems | Solves linear-quadratic unconstrained problems in one shot; exposes block structure | Indefinite systems; preconditioning is essential | Natural for equality-constrained smooth problems; inequalities need complementarity or active sets |
| Projected gradient | Reduced constrained | One state + one adjoint + pointwise projection; line search may add state solves | Very simple for box constraints; active set visible through saturation | First-order convergence; step-size dependent | Excellent for closed convex simple sets such as boxes; not suitable for nonconvex sets without modifications |
| Primal-dual active set (PDAS) | KKT/complementarity | Active-set prediction + constrained KKT solve per iteration | Often finite or very fast active-set convergence; natural for box constraints | Requires good active-set logic and linear solvers; can oscillate without safeguards | Very good for convex box constraints and complementarity systems; equivalent to semismooth Newton in many cases |
| Semismooth Newton | Nonsmooth equation / generalized derivative | Linearized generalized KKT solve per iteration | Superlinear local convergence for structured nonsmoothness | More technical; needs semismooth reformulation and active-set identification | Excellent for projections, max/min, $L^1$ terms, complementarity; usually assumes convex structure |
| Subgradient descent | Reduced nonsmooth | One state + one adjoint + choice of subgradient | Most elementary nonsmooth method; conceptually robust | Very slow; difficult step-size tuning; subgradient may be nonunique | Handles convex nonsmooth functionals such as $\lVert u \rVert_{L^1}$, but rarely the best computational choice |
| Proximal gradient / forward-backward splitting | Reduced composite | One gradient evaluation for the smooth part + local proximal map for the nonsmooth part | Exploits exact nonsmooth structure; cheap local updates; natural for sparsity | Still first-order; needs proximal map and step-size control | Excellent for convex nonsmooth terms such as $L^1$ and boxes when prox/projection is explicit |
| Sparse-control PDAS | Slack-variable or multiplier KKT | Active-set update + linear solve; often few iterations near solution | Captures sparsity pattern sharply; much faster than subgradient methods | More implementation effort; active sets can be delicate | Strong for convex $L^1$-type sparsity and box constraints; relies on complementarity structure |
| One-shot Newton KKT for nonlinear/inverse problems | Simultaneous nonlinear KKT | Assemble residual/Jacobian + large Newton correction; line search/damping may add residual evaluations | Treats state, adjoint, and parameter together; powerful for inverse problems | Nonlinear saddle-point systems; preconditioning and globalization are hard | Smooth nonconvex problems possible, but only local; boxes require PDAS or projection safeguards |
| SQP | Full-space or reduced constrained optimization | Quadratic subproblem with linearized constraints; cost depends on subproblem solve | General framework for smooth constrained nonlinear problems | Heavy machinery; globalization and Hessian approximation matter | Handles smooth convex constraints well; nonconvex constraints possible but only with local guarantees |
| SQH | Pontryagin / Hamiltonian successive approximation | One adjoint solve + pointwise Hamiltonian maximization + one state solve; Step 2 and the state solve are repeated when $\varepsilon$ is increased | Uses PMP structure; control update is local; adaptive $\varepsilon$ guarantees sufficient decrease | Depends on Hamiltonian structure; still local for nonconvex problems; theory needs smoothness/stability assumptions | Very flexible for pointwise admissible sets $K_{ad}$, including nonconvex or finite-valued sets, because Step 2 is a local optimization |
The main dividing lines are these:
Reduced vs all-at-once. Reduced methods make every iteration look like an optimization step in the control space, but every gradient hides state and adjoint solves. All-at-once methods expose the KKT structure directly and shift the difficulty to saddle-point linear algebra.
First-order vs second-order. First-order methods are robust and cheap per iteration; second-order methods are expensive per iteration but can be much faster near the solution.
Smooth vs nonsmooth. If nonsmoothness is simple and convex, proximal and active-set methods exploit it directly. If nonsmoothness comes from complementarity, semismooth Newton and PDAS are the natural tools.
Convex vs nonconvex admissible sets. Projection and proximal methods are cleanest for convex sets. SQH is unusual because its control step is a pointwise optimization over $K_{ad}$. Finite-valued or nonconvex pointwise admissible sets can therefore be handled locally, although only local convergence and descent guarantees should be expected for the full problem.
As a rule of thumb:
start with a reduced gradient or projected-gradient method when building a new code;
move to proximal methods when the objective contains a known convex nonsmooth term;
use PDAS or semismooth Newton when the active set is the main structure;
use Newton, SQP, or one-shot KKT methods when second-order information is worth the linear algebra cost;
use SQH when the Pontryagin Hamiltonian gives a cheap and meaningful pointwise control update.
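As a concrete instance of the first rule of thumb, here is a minimal projected-gradient sketch. It assumes a linear-quadratic model: $-y'' = u$ on $(0, 1)$, a tracking cost, and box constraints $[a, b]$. The iteration $u \leftarrow P_{[a, b]}\big(u - s\,\nabla \hat J(u)\big)$ needs one state solve, one adjoint solve, and a pointwise projection per step. The fixed step $s = 1/L$ uses the Lipschitz constant $L$ of the reduced gradient. It avoids a line search but is specific to this convex quadratic example. Data and bounds are illustrative.

```python
import numpy as np

# Assumed model: -y'' = u on (0, 1), y(0) = y(1) = 0, u in [a, b] pointwise,
# J = 1/2 ||y - y_d||^2 + alpha/2 ||u||^2, finite differences on n nodes.
n = 99
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
y_d = np.sin(np.pi * x)
alpha, a, b = 1e-2, -1.0, 3.0


def reduced_gradient(u):
    y = np.linalg.solve(A, u)              # state solve
    p = np.linalg.solve(A, -(y - y_d))     # adjoint solve
    return alpha * u - p                   # L2 gradient of J(S(u), u)


# The reduced Hessian is alpha*I + A^{-2}; its largest eigenvalue is
# alpha + 1/lambda_min(A)^2, so the step 1/L makes each iteration a contraction.
L = alpha + 1.0 / np.linalg.eigvalsh(A)[0] ** 2
s = 1.0 / L

u = np.zeros(n)
for k in range(500):
    u_next = np.clip(u - s * reduced_gradient(u), a, b)  # gradient step + projection
    converged = np.max(np.abs(u_next - u)) < 1e-10
    u = u_next
    if converged:
        break
print(k, u.max())
```

The upper bound is active in the middle of the domain here, since the unconstrained optimum exceeds $b$. The saturated nodes show the active set directly, as noted in the table.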