
General Optimality Systems for PDE-Constrained Optimization

University of Pisa

Overview

The previous lectures developed a complete theory for linear-quadratic elliptic optimal control: existence and uniqueness of the optimal control, the adjoint equation and the reduced gradient, and the projection formula under box constraints.

This lecture places these results in an abstract framework that covers nonlinear state equations and general objective functionals.

The logical path of the lecture:

  1. fix the functional analytic setting — spaces, duality pairings, and adjoint operators;

  2. define Fréchet differentiability of maps between Banach spaces;

  3. derive the differentiability of the control-to-state map via the Implicit Function Theorem;

  4. compute the Fréchet derivative of the reduced functional via the chain rule;

  5. introduce the adjoint equation and rewrite the gradient without nested PDE solves;

  6. formulate the Lagrangian, derive the full KKT system, and state the second-order conditions;

  7. define the normal cone precisely and recover the variational inequality and projection formula;

  8. compare the reduced and all-at-once approaches in terms of structure and computational cost;

  9. verify the framework on a semilinear elliptic example.


Functional Analytic Setting

Let $X$, $Y$, $Z$, $U$ be real Banach spaces. A Banach space is a complete normed vector space. A Hilbert space is a Banach space whose norm is induced by an inner product: $\|v\|^2 = (v, v)$. Hilbert spaces are reflexive, meaning $(Y')' \cong Y$ isometrically.

Dual spaces and duality pairings. The dual space $X'$ consists of all bounded linear functionals $\ell: X \to \mathbb{R}$. For $\ell \in X'$ and $x \in X$ we write the duality pairing

$$\langle \ell, x \rangle_{X', X} := \ell(x).$$

The norm on $X'$ is $\|\ell\|_{X'} := \sup_{\|x\|_X \le 1} |\langle \ell, x \rangle|$.

Spaces used in this lecture. The state space $Y$, the control space $U$, and the image space $Z'$ of the constraint (with pre-dual $Z$) are real Banach spaces; in the concrete examples below, $Y = Z = H_0^1(\Omega)$, $U = L^2(\Omega)$, and $Z' = H^{-1}(\Omega)$.

Adjoint operators. Let $A: X \to Z'$ be a bounded linear operator. Its adjoint $A^*: Z \to X'$ is defined by

$$\langle Ax, z \rangle_{Z', Z} = \langle x, A^* z \rangle_{X, X'} \qquad \forall x \in X, \; z \in Z.$$

If $X$ and $Z$ are Hilbert spaces and we identify $Z \cong Z'$ via the Riesz isomorphism, this becomes $(Ax, z)_Z = (x, A^* z)_X$.
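
As a concrete finite-dimensional illustration of this identity (a sketch, not part of the lecture's formal development): after discretization, the inner products on $X$ and $Z$ are represented by symmetric positive definite matrices, and the adjoint with respect to the weighted inner products $(x_1, x_2)_X = x_1^\top M_X x_2$, $(z_1, z_2)_Z = z_1^\top M_Z z_2$ is $A^* = M_X^{-1} A^\top M_Z$ rather than the plain transpose. The matrices below are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 4

# hypothetical discretized operator A: X -> Z and SPD matrices defining
# the inner products on X (dimension n) and Z (dimension m)
A = rng.standard_normal((m, n))
MX = np.eye(n) + np.diag(0.1 * rng.random(n))    # (x1, x2)_X = x1^T MX x2
MZ = np.eye(m) + np.diag(0.1 * rng.random(m))    # (z1, z2)_Z = z1^T MZ z2

# adjoint with respect to the weighted inner products: A* = MX^{-1} A^T MZ
A_star = np.linalg.solve(MX, A.T @ MZ)

x = rng.standard_normal(n)
z = rng.standard_normal(m)
lhs = (A @ x) @ (MZ @ z)          # (A x, z)_Z
rhs = x @ (MX @ (A_star @ z))     # (x, A* z)_X
print(abs(lhs - rhs))             # agrees up to rounding error
```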

Riesz isomorphism. In a Hilbert space $X$, the Riesz map $\mathcal{R}_X: X \to X'$ is defined by

$$\langle \mathcal{R}_X u, v \rangle_{X', X} = (u, v)_X \qquad \forall v \in X.$$

It is an isometric isomorphism. Its inverse $\mathcal{R}_X^{-1}: X' \to X$ identifies the gradient: given $\ell \in X'$, the unique element $g := \mathcal{R}_X^{-1} \ell \in X$ satisfies

$$\langle \ell, v \rangle_{X', X} = (g, v)_X \qquad \forall v \in X,$$

and we write $g = \nabla f(u)$ when $\ell = f'(u)$.
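
In a discretized $L^2$ setting the Riesz representative is a mass-matrix solve: if a functional acts on coefficient vectors as $\ell(v) = \ell_h^\top v_h$ and the discrete inner product is $(u, v)_{L^2} \approx u_h^\top M v_h$, then $g_h = M^{-1} \ell_h$. A minimal sketch, assuming (purely for illustration) a lumped diagonal mass matrix on a uniform grid of $(0, 1)$:

```python
import numpy as np

n = 50                          # interior grid points on (0, 1)
h = 1.0 / (n + 1)
M = h * np.eye(n)               # lumped mass matrix: (u, v)_{L^2} ~ u^T M v

# a functional given by its coefficient vector: ell(v) = ell^T v
ell = h * np.sin(np.pi * h * np.arange(1, n + 1))

# Riesz representative g: (g, v)_{L^2} = ell(v) for all v
g = np.linalg.solve(M, ell)

v = np.random.default_rng(1).standard_normal(n)   # random test direction
print(abs(ell @ v - g @ (M @ v)))                 # agrees up to rounding error
```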


Fréchet Differentiability

Let $X$ and $W$ be Banach spaces, and let $F: X \to W$. $F$ is Fréchet differentiable at $x \in X$ if there exists a bounded linear operator $F'(x): X \to W$ such that

$$\lim_{\|h\|_X \to 0} \frac{\|F(x+h) - F(x) - F'(x)h\|_W}{\|h\|_X} = 0.$$

The operator $F'(x)$ is called the Fréchet derivative of $F$ at $x$.

For real-valued functionals $F: X \to \mathbb{R}$, the Fréchet derivative is an element of $X'$:

$$F'(x) \in X'.$$

Partial derivatives. For a map $F: X_1 \times X_2 \to W$, the partial Fréchet derivative with respect to $X_1$ at $(x_1, x_2)$, applied to a direction $h_1 \in X_1$, is

$$F_{x_1}(x_1, x_2)\, h_1 := \lim_{t \to 0} \frac{F(x_1 + t h_1, x_2) - F(x_1, x_2)}{t}.$$

For fixed $(x_1, x_2)$, the map $h_1 \mapsto F_{x_1}(x_1, x_2) h_1$ is a bounded linear operator from $X_1$ to $W$.

Chain rule. If $G: X \to W$ and $H: W \to V$ are Fréchet differentiable, then $H \circ G: X \to V$ is Fréchet differentiable and

$$(H \circ G)'(x) = H'(G(x)) \circ G'(x).$$

For our purposes: given $S: U \to Y$ and $J: Y \times U \to \mathbb{R}$, with $f(u) := J(S(u), u)$,

$$f'(u)h = \langle J_y(S(u), u),\, S'(u) h \rangle_{Y', Y} + \langle J_u(S(u), u),\, h \rangle_{U', U} \qquad \forall h \in U.$$
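
As a worked instance with the tracking-type functional used later in this lecture, $J(y, u) = \tfrac12\|y - y_d\|_{L^2(\Omega)}^2 + \tfrac{\alpha}{2}\|u\|_{L^2(\Omega)}^2$: the two pairings reduce to $L^2$ inner products, so

$$f'(u)h = \bigl(S(u) - y_d,\, S'(u)h\bigr)_{L^2(\Omega)} + \alpha\,(u, h)_{L^2(\Omega)} \qquad \forall h \in L^2(\Omega).$$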

Abstract Problem

We now state the abstract optimal control problem and fix notation for all derivatives.

Objective functional.

$$J: Y \times U \to \mathbb{R}$$

is assumed Fréchet differentiable, with partial derivatives

$$J_y(y, u) \in Y', \qquad J_u(y, u) \in U'.$$

Constraint mapping.

$$e: Y \times U \to Z'$$

is assumed Fréchet differentiable, with partial derivatives

$$e_y(y, u): Y \to Z', \qquad e_u(y, u): U \to Z'.$$

These are bounded linear operators for each fixed $(y, u)$.

Admissible controls.

$$U_{\mathrm{ad}} \subset U$$

is a nonempty, closed, and convex subset.

Optimal control problem. Find

$$(\bar y, \bar u) \in Y \times U_{\mathrm{ad}}$$

such that

$$J(\bar y, \bar u) = \inf\bigl\{ J(y, u) :\; e(y, u) = 0 \text{ in } Z',\; u \in U_{\mathrm{ad}} \bigr\}.$$

Standing assumptions.

  1. For every $u \in U_{\mathrm{ad}}$, the equation $e(y, u) = 0$ in $Z'$ admits a unique solution $y \in Y$.

  2. $J$ and $e$ are of class $C^1$ (all Fréchet derivatives exist and are continuous).

  3. At every feasible pair $(y, u)$, the operator $e_y(y, u): Y \to Z'$ is an isomorphism.

Assumption 3 is the regularity (constraint qualification) condition. For the Poisson state equation it holds because $-\Delta: H_0^1(\Omega) \to H^{-1}(\Omega)$ is an isomorphism.


Control-to-State Map

Under Assumption 1, define

$$S: U_{\mathrm{ad}} \to Y, \qquad u \mapsto y = S(u),$$

where $S(u)$ is the unique solution of $e(y, u) = 0$ in $Z'$. The map $S$ is the control-to-state operator (or solution operator) of the state equation.

Differentiability via the Implicit Function Theorem. The Implicit Function Theorem in Banach spaces states: if $e: Y \times U \to Z'$ is $C^1$ and $e_y(y_0, u_0): Y \to Z'$ is an isomorphism at some feasible point $(y_0, u_0)$, then locally around $u_0$ the map $u \mapsto S(u)$ is $C^1$ and

$$S'(u)h = -e_y(y, u)^{-1} e_u(y, u)\, h \qquad \forall h \in U,$$

where $y = S(u)$.

Derivation. Differentiate the identity $e(S(u), u) = 0$ with respect to $u$ in direction $h$:

$$e_y(y, u)\bigl[S'(u) h\bigr] + e_u(y, u)\bigl[h\bigr] = 0 \qquad \text{in } Z'.$$

Solving for $S'(u)h$ using the invertibility of $e_y$:

$$S'(u) h = -e_y(y, u)^{-1} e_u(y, u)\, h.$$

This formula shows that each directional derivative $S'(u)h$ requires one linearized state solve: apply $e_u(y, u)$ to the direction $h$ and invert the linearized state operator $e_y(y, u)$.

Example (linear Poisson model). For $e(y, u) = -\Delta y - u$:

$$e_y(y, u) = -\Delta : H_0^1(\Omega) \to H^{-1}(\Omega), \qquad e_u(y, u) = -I : L^2(\Omega) \to H^{-1}(\Omega).$$

Hence $S'(u)h = (-\Delta)^{-1} h = S(h)$, consistent with the linearity of $S$ in that case.
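
A minimal numerical sanity check of this example (a sketch assuming a uniform 1D finite-difference discretization of $-\Delta$ on $(0, 1)$ with homogeneous Dirichlet conditions): since the discrete solution operator is linear, a difference quotient of $u \mapsto S(u)$ coincides with $S(h)$ up to floating-point error.

```python
import numpy as np

n = 100                       # interior grid points on (0, 1)
h = 1.0 / (n + 1)
# 1D finite-difference Laplacian with homogeneous Dirichlet conditions
K = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

def S(u):
    """Discrete control-to-state map: solve -y'' = u with y(0) = y(1) = 0."""
    return np.linalg.solve(K, u)

x = h * np.arange(1, n + 1)
u = np.sin(np.pi * x)          # some control
d = np.cos(3 * np.pi * x)      # direction

t = 1e-6
dq = (S(u + t * d) - S(u)) / t          # difference quotient of S at u
print(np.max(np.abs(dq - S(d))))        # ~ rounding error, since S is linear
```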


Reduced Functional and Chain Rule

Define the reduced functional

$$f: U_{\mathrm{ad}} \to \mathbb{R}, \qquad f(u) := J(S(u), u).$$

The PDE-constrained problem becomes an optimization problem in $u$ alone:

$$\min_{u \in U_{\mathrm{ad}}} f(u).$$

Fréchet derivative of $f$. By the chain rule,

$$f'(u) h = \langle J_y(y, u),\, S'(u) h \rangle_{Y', Y} + \langle J_u(y, u),\, h \rangle_{U', U} \qquad \forall h \in U,$$

where $y = S(u)$. Substituting the formula for $S'(u)h$:

$$f'(u) h = -\langle J_y(y, u),\, e_y(y, u)^{-1} e_u(y, u) h \rangle_{Y', Y} + \langle J_u(y, u),\, h \rangle_{U', U}.$$

This expression is not yet computationally convenient: evaluating the first term for every direction $h$ requires one linearized state solve per direction tested. The adjoint equation removes this bottleneck by moving the operator to the other argument.


Adjoint Equation

Setup. Let $(y, u)$ be a feasible pair. We want to rewrite the directional derivative $f'(u)h$ so that $h$ appears only in one bounded linear functional, without intermediate state solves.

Definition of the adjoint state. Introduce $p \in Z$ as the solution of the adjoint equation

$$e_y(y, u)^* p = J_y(y, u) \qquad \text{in } Y'.$$

The adjoint state $p$ is uniquely determined because $e_y(y, u)^*: Z \to Y'$ is an isomorphism (the adjoint of the isomorphism $e_y(y, u): Y \to Z'$).

Note: $p$ lives in $Z$, the pre-dual of $Z'$. For the Poisson model, $Z = H_0^1(\Omega)$ and $Z' = H^{-1}(\Omega)$, so the adjoint state is an $H_0^1(\Omega)$ function.

Gradient formula via the adjoint. With $p$ as above, use the duality identity: for any invertible $A: Y \to Z'$,

$$\langle q, A^{-1} w \rangle_{Y', Y} = \langle (A^*)^{-1} q, w \rangle_{Z, Z'} \qquad \forall q \in Y', \; w \in Z'.$$

Apply this with $A = e_y(y, u)$, $q = J_y(y, u)$, $w = e_u(y, u) h$, and $p = (e_y^*)^{-1} J_y$:

$$\langle J_y(y, u),\, e_y(y, u)^{-1} e_u(y, u) h \rangle_{Y', Y} = \langle p,\, e_u(y, u) h \rangle_{Z, Z'} = \langle e_u(y, u)^* p,\, h \rangle_{U', U}.$$

Hence

$$f'(u) h = \langle J_u(y, u) - e_u(y, u)^* p,\; h \rangle_{U', U} \qquad \forall h \in U.$$

Gradient of the reduced functional. Identifying $f'(u) \in U'$ with its Riesz representative $\nabla f(u) \in U$:

$$\nabla f(u) = \mathcal{R}_U^{-1}\bigl(J_u(y, u) - e_u(y, u)^* p\bigr) \in U.$$

Cost count. Regardless of how many directions $h$ must be tested, the gradient $\nabla f(u)$ is available after one state solve (for $y = S(u)$) and one adjoint solve (for $p$); no additional linearized state solves per direction are required.

Example (Poisson model). $J_y = y - y_d \in L^2(\Omega)$ and $e_y^* = -\Delta$, so the adjoint equation is $-\Delta p = y - y_d$ with $p = 0$ on $\partial\Omega$. $J_u = \alpha u \in L^2(\Omega)$ and $e_u^* = -I$, so $\nabla f(u) = p + \alpha u$, exactly as derived in Lecture 4.
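
A minimal sketch of this adjoint-based gradient on the same hypothetical 1D finite-difference discretization (placeholder target $y_d$ and regularization $\alpha$); the adjoint gradient is checked against a central difference quotient of the reduced functional.

```python
import numpy as np

n, alpha = 100, 1e-2
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)
K = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
y_d = np.sin(2 * np.pi * x)                    # placeholder target

def f(u):
    """Reduced functional with the L^2 norms approximated by h * sum(.)."""
    y = np.linalg.solve(K, u)
    return 0.5 * h * np.sum((y - y_d)**2) + 0.5 * alpha * h * np.sum(u**2)

def grad_f(u):
    """One state solve, one adjoint solve, then nabla f = p + alpha*u."""
    y = np.linalg.solve(K, u)                  # state:   -y'' = u
    p = np.linalg.solve(K, y - y_d)            # adjoint: -p'' = y - y_d
    return p + alpha * u

u = np.cos(np.pi * x)
d = np.sin(3 * np.pi * x)
t = 1e-6
fd = (f(u + t * d) - f(u - t * d)) / (2 * t)   # directional derivative by differences
print(abs(fd - h * np.sum(grad_f(u) * d)))     # small: matches (grad f, d)_{L^2}
```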


Lagrangian Formulation

The Lagrangian naturally encodes both the objective and the constraint.

Definition.

$$\mathcal{L}: Y \times U \times Z \to \mathbb{R}, \qquad \mathcal{L}(y, u, p) := J(y, u) - \langle e(y, u), p \rangle_{Z', Z}.$$

Here $p \in Z$ is the Lagrange multiplier for the equality constraint $e(y, u) = 0$. The duality pairing is well-defined because $e(y, u) \in Z'$ and $p \in Z$.

Partial derivatives of $\mathcal{L}$.

With respect to $y$ in direction $v \in Y$:

$$\mathcal{L}_y(y, u, p)[v] = \langle J_y(y, u),\, v \rangle_{Y', Y} - \langle e_y(y, u) v,\, p \rangle_{Z', Z} = \langle J_y(y, u) - e_y(y, u)^* p,\; v \rangle_{Y', Y}.$$

With respect to $u$ in direction $h \in U$:

$$\mathcal{L}_u(y, u, p)[h] = \langle J_u(y, u),\, h \rangle_{U', U} - \langle e_u(y, u) h,\, p \rangle_{Z', Z} = \langle J_u(y, u) - e_u(y, u)^* p,\; h \rangle_{U', U}.$$

With respect to $p$ in direction $q \in Z$:

$$\mathcal{L}_p(y, u, p)[q] = -\langle e(y, u),\, q \rangle_{Z', Z}.$$

Interpretation of the conditions $\mathcal{L}_p = 0$ and $\mathcal{L}_y = 0$: stationarity of the Lagrangian with respect to $p$ recovers the state equation $e(y, u) = 0$, and stationarity with respect to $y$ recovers the adjoint equation $e_y(y, u)^* p = J_y(y, u)$.


First-Order Optimality System

We now state the full KKT conditions.

Unconstrained case ($U_{\mathrm{ad}} = U$). At a local minimizer $\bar u$ with corresponding state $\bar y = S(\bar u)$, there exists a unique adjoint state $\bar p \in Z$ such that:

$$\begin{cases} e(\bar y, \bar u) = 0 & \text{in } Z', \\[4pt] e_y(\bar y, \bar u)^* \bar p = J_y(\bar y, \bar u) & \text{in } Y', \\[4pt] J_u(\bar y, \bar u) - e_u(\bar y, \bar u)^* \bar p = 0 & \text{in } U'. \end{cases}$$

Constrained case ($U_{\mathrm{ad}} \subsetneq U$). At a local minimizer $\bar u \in U_{\mathrm{ad}}$, the optimality condition for $u$ becomes the variational inequality

$$f'(\bar u)(u - \bar u) \ge 0 \qquad \forall u \in U_{\mathrm{ad}},$$

which in terms of the Lagrangian is

$$0 \in \mathcal{L}_u(\bar y, \bar u, \bar p) + N_{U_{\mathrm{ad}}}(\bar u) \qquad \text{in } U'.$$

The full constrained KKT system is therefore:

$$\begin{cases} e(\bar y, \bar u) = 0 & \text{in } Z', \\[4pt] e_y(\bar y, \bar u)^* \bar p = J_y(\bar y, \bar u) & \text{in } Y', \\[4pt] 0 \in J_u(\bar y, \bar u) - e_u(\bar y, \bar u)^* \bar p + N_{U_{\mathrm{ad}}}(\bar u) & \text{in } U'. \end{cases}$$

The three equations live, respectively, in $Z'$, $Y'$, and $U'$. The three unknowns are $\bar y \in Y$, $\bar p \in Z$, and $\bar u \in U_{\mathrm{ad}}$.


Normal Cone and Variational Inequality

Definition. Let $K \subset U$ be a nonempty closed convex set and $u \in K$. The normal cone to $K$ at $u$ is the closed convex cone

$$N_K(u) := \bigl\{\xi \in U' :\; \langle \xi, v - u \rangle_{U', U} \le 0 \quad \forall v \in K \bigr\}.$$

Geometrically: $\xi \in N_K(u)$ if and only if $u$ maximizes the linear functional $v \mapsto \langle \xi, v \rangle$ over $K$ (i.e., $\xi$ points outward from $K$ at $u$).

Equivalence with the variational inequality. The normal-cone inclusion $-f'(\bar u) \in N_{U_{\mathrm{ad}}}(\bar u)$ is equivalent to

$$f'(\bar u)(u - \bar u) \ge 0 \qquad \forall u \in U_{\mathrm{ad}}.$$

Projection. When $U$ is a Hilbert space and the Riesz identification $U \cong U'$ is used, for any $w \in U$ the projection $\Pi_K(w)$ is the unique element of $K$ satisfying

$$\Pi_K(w) \in K, \qquad (w - \Pi_K(w),\, v - \Pi_K(w))_U \le 0 \qquad \forall v \in K.$$

The normal-cone inclusion reads: $w - u \in N_K(u)$ if and only if $u = \Pi_K(w)$.

Projection formula for the control. The KKT condition $0 \in \nabla f(\bar u) + N_{U_{\mathrm{ad}}}(\bar u)$ is equivalent to

$$\bar u = \Pi_{U_{\mathrm{ad}}}\bigl(\bar u - \rho\, \nabla f(\bar u)\bigr) \qquad \text{for any } \rho > 0.$$

This is both a characterization of optimality and a natural fixed-point iteration (projected gradient step).

Box constraints in $L^2(\Omega)$. For

$$U_{\mathrm{ad}} = \{u \in U :\; u_{\min}(x) \le u(x) \le u_{\max}(x) \text{ a.e.}\}$$

with $u_{\min}, u_{\max} \in L^\infty(\Omega)$, the normal cone at $\bar u \in U_{\mathrm{ad}}$ is characterized pointwise: $\xi \in N_{U_{\mathrm{ad}}}(\bar u)$ if and only if for a.e. $x \in \Omega$,

$$\begin{cases} \xi(x) \le 0 & \text{if } \bar u(x) = u_{\min}(x), \\ \xi(x) = 0 & \text{if } u_{\min}(x) < \bar u(x) < u_{\max}(x), \\ \xi(x) \ge 0 & \text{if } \bar u(x) = u_{\max}(x). \end{cases}$$

For the linear-quadratic case, the KKT condition $0 \in \alpha\bar u + \bar p + N_{U_{\mathrm{ad}}}(\bar u)$ is equivalent to the pointwise projection

$$\bar u(x) = \Pi_{[u_{\min}(x),\, u_{\max}(x)]}\!\left(-\tfrac{1}{\alpha}\, \bar p(x)\right) \quad \text{a.e. in } \Omega.$$
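
A minimal projected-gradient sketch for the box-constrained 1D model problem (same hypothetical discretization and placeholder data as in the earlier sketches); at the fixed point the iterate satisfies the pointwise projection formula up to the solver tolerance.

```python
import numpy as np

n, alpha, rho = 100, 1e-2, 50.0     # step size rho chosen well below 2/L for this problem
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)
K = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
y_d = np.sin(2 * np.pi * x)
u_lo, u_hi = -0.5, 0.5              # placeholder box bounds

def grad_f(u):
    y = np.linalg.solve(K, u)                  # state
    p = np.linalg.solve(K, y - y_d)            # adjoint
    return alpha * u + p

u = np.zeros(n)
for k in range(500):
    u_new = np.clip(u - rho * grad_f(u), u_lo, u_hi)   # projected gradient step
    if np.max(np.abs(u_new - u)) < 1e-12:
        u = u_new
        break
    u = u_new

# optimality check: u = clip(-p/alpha, bounds) pointwise
p = np.linalg.solve(K, np.linalg.solve(K, u) - y_d)
print(np.max(np.abs(u - np.clip(-p / alpha, u_lo, u_hi))))
```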

Semismooth Newton for Control Constraints

The projection characterization of the KKT condition suggests a nonsmooth root equation. Fix $\rho > 0$ and define

$$F(u) := u - \Pi_{U_{\mathrm{ad}}}\bigl(u - \rho\, \nabla f(u)\bigr) \in U.$$

Then

$$F(\bar u) = 0 \iff \bar u = \Pi_{U_{\mathrm{ad}}}\bigl(\bar u - \rho\, \nabla f(\bar u)\bigr) \iff 0 \in \nabla f(\bar u) + N_{U_{\mathrm{ad}}}(\bar u).$$

Hence solving the constrained first-order condition is equivalent to solving the nonsmooth equation $F(u) = 0$ in $U$.

Proof of the equivalence chain for $F(\bar u) = 0$

We prove the equivalence chain in detail.

Fix $K := U_{\mathrm{ad}}$ and define

$$w := \bar u - \rho\, \nabla f(\bar u).$$
  1. First equivalence

$$F(\bar u) = 0 \iff \bar u - \Pi_K(w) = 0 \iff \bar u = \Pi_K(w).$$

This follows directly from the definition $F(u) = u - \Pi_K\bigl(u - \rho\,\nabla f(u)\bigr)$.

  2. Second equivalence. By the projection-normal-cone characterization in a Hilbert space, for any $u \in K$ and $w \in U$:

$$u = \Pi_K(w) \iff w - u \in N_K(u).$$

Applying this with $u = \bar u$ and $w = \bar u - \rho\,\nabla f(\bar u)$ gives

$$\bar u = \Pi_K\bigl(\bar u - \rho\,\nabla f(\bar u)\bigr) \iff -\rho\,\nabla f(\bar u) \in N_K(\bar u).$$

Since $N_K(\bar u)$ is a cone, $\eta \in N_K(\bar u)$ implies $(1/\rho)\,\eta \in N_K(\bar u)$ for every $\rho > 0$, hence

$$-\rho\,\nabla f(\bar u) \in N_K(\bar u) \iff -\nabla f(\bar u) \in N_K(\bar u) \iff 0 \in \nabla f(\bar u) + N_K(\bar u).$$

Combining 1 and 2 yields

$$F(\bar u) = 0 \iff \bar u = \Pi_{U_{\mathrm{ad}}}\bigl(\bar u - \rho\,\nabla f(\bar u)\bigr) \iff 0 \in \nabla f(\bar u) + N_{U_{\mathrm{ad}}}(\bar u).$$

Semismoothness and generalized derivatives

In finite dimensions, a locally Lipschitz map $G: \mathbb{R}^n \to \mathbb{R}^m$ is semismooth at $x$ if for every sequence $h_k \to 0$ and every choice $V_k \in \partial_C G(x + h_k)$ (Clarke generalized Jacobian),

$$\|G(x + h_k) - G(x) - V_k h_k\| = o(\|h_k\|).$$

It is strongly semismooth if the remainder is $O(\|h_k\|^2)$.

In Hilbert spaces, the same idea is used with generalized derivatives of nonsmooth operators (Clarke/Bouligand derivatives or Newton derivatives). The key property is that each linearization captures the first-order behavior with a small remainder, so a Newton correction can be defined.

For box constraints, the projection is piecewise affine pointwise:

$$\Pi_{[a,b]}(s) = \min\{\max\{s, a\},\, b\}.$$

Therefore $\Pi_{[a,b]}$ is globally Lipschitz and strongly semismooth in finite dimensions, and so is the induced superposition operator in $L^q(\Omega)$ spaces ($1 < q < \infty$).

Semismooth Newton step

Given an iterate $u^k$, choose a generalized derivative

$$M_k \in \partial F(u^k),$$

and compute $\delta u^k \in U$ from

$$M_k\, \delta u^k = -F(u^k),$$

then update

$$u^{k+1} = u^k + \delta u^k.$$

For constrained PDE-control, each Newton step requires coupled evaluation of state and adjoint variables because $\nabla f(u)$ depends on $(y(u), p(u))$. In practice, one linearizes the reduced mapping $u \mapsto F(u)$ and solves the resulting linear system with PDE blocks using the same solver technology as in the all-at-once KKT setting.

Primal-dual construction with an additional multiplier for $F(u) = 0$

The semismooth Newton step can be obtained from a KKT system with an explicit multiplier for the equation constraint $F(u) = 0$.

At iteration $k$, consider the proximal feasibility problem

$$\min_{u \in U}\; \frac12\|u - u^k\|_U^2 \qquad \text{subject to } F(u) = 0.$$

Its Lagrangian is

$$\mathscr{L}_k(u, \lambda) = \frac12\|u - u^k\|_U^2 + (\lambda, F(u))_U,$$

where $\lambda \in U$ is an additional Lagrange multiplier.

The first-order system is

$$\begin{cases} u - u^k + F'(u)^*\lambda = 0, \\[4pt] F(u) = 0. \end{cases}$$

Replace $F'(u^k)$ with a generalized derivative $M_k \in \partial F(u^k)$ and linearize at $(u^k, \lambda^k)$:

$$\begin{pmatrix} I & M_k^* \\ M_k & 0 \end{pmatrix} \begin{pmatrix} \delta u^k \\ \delta\lambda^k \end{pmatrix} = - \begin{pmatrix} u^k - u^k + M_k^*\lambda^k \\ F(u^k) \end{pmatrix} = - \begin{pmatrix} M_k^*\lambda^k \\ F(u^k) \end{pmatrix}.$$

If we initialize and keep $\lambda^k = 0$, the first block row gives $\delta u^k = -M_k^*\,\delta\lambda^k$, and substituting into the second row yields the Schur-complement equation

$$M_k M_k^*\, \delta\lambda^k = F(u^k),$$

with

$$\delta u^k = -M_k^*\, \delta\lambda^k.$$

This primal-dual step is equivalent to solving a regularized Newton equation for $F(u) = 0$ and leads to the same local model as the direct semismooth Newton correction when $M_k$ is nonsingular.
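
A small finite-dimensional check of this equivalence (the matrix `Mk` below is an arbitrary nonsingular stand-in for a generalized derivative): with $\lambda^k = 0$, the Schur-complement step $\delta u = -M_k^*(M_k M_k^*)^{-1} F$ reduces to the direct correction $-M_k^{-1} F$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
Mk = rng.standard_normal((n, n)) + 3 * np.eye(n)   # stand-in generalized derivative (nonsingular)
F = rng.standard_normal(n)                          # stand-in residual F(u^k)

# direct semismooth Newton correction: Mk du = -F
du_direct = np.linalg.solve(Mk, -F)

# primal-dual step with lambda^k = 0: Mk Mk^T dlam = F,  du = -Mk^T dlam
dlam = np.linalg.solve(Mk @ Mk.T, F)
du_primal_dual = -Mk.T @ dlam

print(np.max(np.abs(du_direct - du_primal_dual)))   # agrees up to rounding error
```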

Hence semismooth Newton may be viewed in two equivalent ways: as a generalized Newton iteration applied directly to the nonsmooth equation $F(u) = 0$, or as a sequence of linearized primal-dual KKT systems with an auxiliary multiplier $\lambda$.

The primal-dual perspective is useful for block preconditioning and for embedding the step in all-at-once PDE-constrained solvers.

Explicit form for box-constrained linear-quadratic control

For

$$J(y, u) = \tfrac12\|y - y_d\|_{L^2}^2 + \tfrac{\alpha}{2}\|u\|_{L^2}^2, \qquad U_{\mathrm{ad}} = \{u : u_{\min} \le u \le u_{\max}\},$$

we have

$$\nabla f(u) = \alpha u + p(u),$$

and the root equation is

$$F(u) = u - \Pi_{[u_{\min},\, u_{\max}]}\bigl(u - \rho(\alpha u + p(u))\bigr) = 0.$$

Define the active and inactive sets at iteration $k$ from

$$w^k := u^k - \rho\bigl(\alpha u^k + p^k\bigr):$$

$$\mathcal{A}_-^k := \{x :\, w^k(x) \le u_{\min}(x)\}, \qquad \mathcal{A}_+^k := \{x :\, w^k(x) \ge u_{\max}(x)\}, \qquad \mathcal{I}^k := \Omega \setminus (\mathcal{A}_-^k \cup \mathcal{A}_+^k).$$

Then the generalized derivative of the projection is multiplication by the characteristic function $\chi_{\mathcal{I}^k}$, and the Newton step is equivalent to fixing $u^{k+1} = u_{\min}$ on $\mathcal{A}_-^k$ and $u^{k+1} = u_{\max}$ on $\mathcal{A}_+^k$, and enforcing $\alpha u^{k+1} + p^{k+1} = 0$ on the inactive set $\mathcal{I}^k$, coupled with the state and adjoint equations for $(y^{k+1}, p^{k+1})$.

This is exactly the primal-dual active-set (PDAS) method. Thus, for box constraints, PDAS can be interpreted as a semismooth Newton method applied to the projection equation.
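
A compact sketch of this active-set iteration for the 1D box-constrained linear-quadratic model used in the earlier sketches (placeholder data and bounds): each iteration estimates the active sets from $w^k$ with $\rho = 1/\alpha$ (so that $w^k = -p^k/\alpha$), fixes the control there, and solves one coupled linear system for $(y, u, p)$; the loop stops when the active sets no longer change.

```python
import numpy as np

n, alpha = 100, 1e-2
rho = 1.0 / alpha                  # with this choice, w = u - rho*(alpha*u + p) = -p/alpha
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)
K = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
Id = np.eye(n)
y_d = np.sin(2 * np.pi * x)
u_lo, u_hi = -0.5 * np.ones(n), 0.5 * np.ones(n)    # placeholder box bounds

u, p, prev = np.zeros(n), np.zeros(n), None
for k in range(30):
    w = u - rho * (alpha * u + p)
    lo, hi = w <= u_lo, w >= u_hi                   # estimated active sets
    inact = ~(lo | hi)
    if prev is not None and np.array_equal(lo, prev[0]) and np.array_equal(hi, prev[1]):
        break                                       # active sets settled: PDAS converged
    prev = (lo, hi)

    # linear system in (y, u, p):  K y - u = 0,  -y + K p = -y_d,
    # u = u_lo on A_-, u = u_hi on A_+, alpha*u + p = 0 on the inactive set
    A = np.zeros((3 * n, 3 * n))
    b = np.zeros(3 * n)
    A[:n, :n], A[:n, n:2*n] = K, -Id
    A[n:2*n, :n], A[n:2*n, 2*n:], b[n:2*n] = -Id, K, -y_d
    r = np.arange(2 * n, 3 * n)
    A[r[lo], n + np.where(lo)[0]] = 1.0;  b[r[lo]] = u_lo[lo]
    A[r[hi], n + np.where(hi)[0]] = 1.0;  b[r[hi]] = u_hi[hi]
    A[r[inact], n + np.where(inact)[0]] = alpha
    A[r[inact], 2 * n + np.where(inact)[0]] = 1.0

    y, u, p = np.split(np.linalg.solve(A, b), 3)

# optimality check: the converged control satisfies the pointwise projection formula
print(np.max(np.abs(u - np.clip(-p / alpha, u_lo, u_hi))))
```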

Local convergence statement

Under standard assumptions:

  1. $\nabla f$ is locally Lipschitz continuous in a neighborhood of $\bar u$;

  2. $F$ is semismooth at $\bar u$;

  3. every generalized derivative $M \in \partial F(\bar u)$ is invertible with uniformly bounded inverse, i.e., $\|M^{-1}\| \le c$.

Then semismooth Newton is locally superlinearly convergent:

$$\|u^{k+1} - \bar u\|_U = o\bigl(\|u^k - \bar u\|_U\bigr).$$

If $F$ is strongly semismooth and the derivative approximation is exact, one obtains local quadratic convergence:

$$\|u^{k+1} - \bar u\|_U = O\bigl(\|u^k - \bar u\|_U^2\bigr).$$

These rates explain why active-set/Newton-type methods are often much faster than projected gradient iterations once the active set is close to the final one.


Reduced vs All-at-Once

The first-order optimality conditions can be approached in two complementary ways.

Reduced approach. The control $\bar u \in U_{\mathrm{ad}}$ is the only optimization unknown. The state $\bar y = S(\bar u)$ and adjoint $\bar p$ are functions of $\bar u$. The problem reduces to

$$\min_{u \in U_{\mathrm{ad}}} f(u),$$

and algorithms iterate on $u$ only, solving state and adjoint equations at each step.

At each optimization iterate $u^k$:

  1. state solve: find $y^k \in Y$ such that $e(y^k, u^k) = 0$;

  2. adjoint solve: find $p^k \in Z$ such that $e_y(y^k, u^k)^* p^k = J_y(y^k, u^k)$;

  3. gradient: $g^k = \mathcal{R}_U^{-1}\bigl(J_u(y^k, u^k) - e_u(y^k, u^k)^* p^k\bigr) \in U$;

  4. control update: $u^{k+1} = \Pi_{U_{\mathrm{ad}}}(u^k - \tau_k g^k)$.

Advantages: the only optimization unknown is the control, every iterate satisfies the state equation exactly, and standard (projected) gradient or quasi-Newton methods in $U$ apply directly.

Disadvantages: each outer iteration requires at least one (possibly nonlinear) state solve and one adjoint solve, so slowly converging first-order methods translate into many PDE solves.

All-at-once approach. The triple $(\bar y, \bar u, \bar p)$ is the unknown. One applies Newton’s method directly to the full KKT system. In the unconstrained case the Newton step $(\delta y, \delta u, \delta p)$ solves the block system

$$\begin{pmatrix} \mathcal{L}_{yy} & \mathcal{L}_{yu} & e_y^* \\[4pt] \mathcal{L}_{uy} & \mathcal{L}_{uu} & e_u^* \\[4pt] e_y & e_u & 0 \end{pmatrix} \begin{pmatrix} \delta y \\[4pt] \delta u \\[4pt] \delta p \end{pmatrix} = - \begin{pmatrix} \mathcal{L}_y \\[4pt] \mathcal{L}_u \\[4pt] e \end{pmatrix},$$

where $\mathcal{L}_{yy}$, $\mathcal{L}_{yu}$, $\mathcal{L}_{uu}$ are second partial derivatives of the Lagrangian, all evaluated at the current iterate. When $U_{\mathrm{ad}} \subsetneq U$, the second block row is the inclusion $0 \in \mathcal{L}_u + N_{U_{\mathrm{ad}}}(\bar u)$, which is handled by an active-set or semismooth Newton treatment of the normal-cone term.

For the linear-quadratic case, the block system is the saddle-point system seen in Lecture 7, which is linear and can be solved directly. For nonlinear problems, Newton linearization is needed and the block structure is the same, but with variable-dependent operators.
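
A minimal all-at-once sketch for the unconstrained 1D linear-quadratic model (same hypothetical discretization and placeholder data as in the earlier sketches): the full KKT system in $(y, u, p)$ is assembled once and solved directly, and the solution satisfies both the state equation and the gradient condition $\alpha\bar u + \bar p = 0$.

```python
import numpy as np

n, alpha = 100, 1e-2
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)
K = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
Id = np.eye(n)
Z = np.zeros((n, n))
y_d = np.sin(2 * np.pi * x)

# coupled KKT system in (y, u, p):
#   K y - u      = 0        (state equation)
#   -y + K p     = -y_d     (adjoint equation)
#   alpha*u + p  = 0        (gradient condition)
A = np.block([[K, -Id, Z],
              [-Id, Z, K],
              [Z, alpha * Id, Id]])
b = np.concatenate([np.zeros(n), -y_d, np.zeros(n)])
y, u, p = np.split(np.linalg.solve(A, b), 3)

print(np.max(np.abs(K @ y - u)),        # state residual ~ 0
      np.max(np.abs(alpha * u + p)))    # optimality residual ~ 0
```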

Advantages: no nested PDE solves inside the optimization loop; state, control, and adjoint are updated simultaneously, which is the natural setting for fast Newton-type convergence.

Disadvantages: the intermediate iterates do not satisfy the state equation, the coupled system is much larger than the control space alone, and efficient solution requires good preconditioners for the saddle-point block structure.


A Worked Example: Semilinear Elliptic Control

We verify that the abstract framework applies to a nonlinear state equation.

Model problem. Let $\Omega \subset \mathbb{R}^d$ be a bounded Lipschitz domain and $d: \mathbb{R} \to \mathbb{R}$ a smooth function with $d(0) = 0$ (the semilinearity). Consider

$$\min_{u \in U_{\mathrm{ad}}} J(y, u) := \frac{1}{2}\|y - y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2$$

subject to

$$\begin{cases} -\Delta y + d(y) = u & \text{in } \Omega, \\[4pt] y = 0 & \text{on } \partial\Omega. \end{cases}$$

Spaces.

$$Y = H_0^1(\Omega), \quad U = L^2(\Omega), \quad Z' = H^{-1}(\Omega), \quad Z = H_0^1(\Omega).$$

Constraint mapping.

$$e: H_0^1(\Omega) \times L^2(\Omega) \to H^{-1}(\Omega),$$

defined by

$$\langle e(y, u), v \rangle_{H^{-1}, H_0^1} := \int_\Omega \nabla y \cdot \nabla v \, dx + \int_\Omega d(y)\, v \, dx - \int_\Omega u\, v \, dx \qquad \forall v \in H_0^1(\Omega).$$

Partial derivatives of $e$.

Linear operator $e_y(y, u): H_0^1(\Omega) \to H^{-1}(\Omega)$:

$$\langle e_y(y, u) w, v \rangle_{H^{-1}, H_0^1} = \int_\Omega \nabla w \cdot \nabla v \, dx + \int_\Omega d'(y)\, w\, v \, dx \qquad \forall w, v \in H_0^1(\Omega).$$

Under the assumption $d'(y) \ge -\sigma_0$ for some $\sigma_0 < \lambda_1(\Omega)$ (first Dirichlet eigenvalue of $-\Delta$), coercivity of the bilinear form is preserved, so $e_y(y, u)$ is an isomorphism (Assumption 3 is satisfied).

Linear operator $e_u(y, u): L^2(\Omega) \to H^{-1}(\Omega)$:

$$\langle e_u(y, u) h, v \rangle_{H^{-1}, H_0^1} = -\int_\Omega h\, v \, dx = -(h, v)_{L^2(\Omega)} \qquad \forall h \in L^2(\Omega), \; v \in H_0^1(\Omega).$$

Thus $e_u(y, u) h = -h$ (embedding of $L^2$ into $H^{-1}$ via the $L^2$ inner product).

Partial derivatives of $J$.

$$J_y(y, u) = y - y_d \in L^2(\Omega) \subset H^{-1}(\Omega) = Y', \qquad J_u(y, u) = \alpha u \in L^2(\Omega) = U'.$$

Adjoint operators.

$e_y(y, u)^*: H_0^1(\Omega) \to H^{-1}(\Omega)$ (identifying $Z \cong H_0^1$ via Riesz):

$$\langle e_y(y, u)^* p, v \rangle_{H^{-1}, H_0^1} = \int_\Omega \nabla v \cdot \nabla p \, dx + \int_\Omega d'(y)\, v\, p \, dx \qquad \forall p, v \in H_0^1(\Omega).$$

(The symmetry of the bilinear form and the symmetry of multiplication by $d'(y)$ mean $e_y^* = e_y$ in this case; this would differ for non-symmetric operators such as transport.)

$e_u(y, u)^*: H_0^1(\Omega) \to L^2(\Omega) = U'$:

$$\langle e_u(y, u)^* p, h \rangle_{U', U} = \langle p, e_u(y, u) h \rangle_{Z, Z'} = -(p, h)_{L^2(\Omega)} \qquad \forall h \in L^2(\Omega),$$

so $e_u(y, u)^* p = -p \in L^2(\Omega)$ (the $H_0^1$ function $p$ viewed as an element of $L^2$).

Adjoint equation. Find $\bar p \in H_0^1(\Omega)$ such that

$$e_y(\bar y, \bar u)^* \bar p = J_y(\bar y, \bar u) = \bar y - y_d \quad \text{in } H^{-1}(\Omega),$$

i.e., in weak form:

$$\int_\Omega \nabla v \cdot \nabla \bar p \, dx + \int_\Omega d'(\bar y)\, v\, \bar p \, dx = \int_\Omega (\bar y - y_d)\, v \, dx \qquad \forall v \in H_0^1(\Omega).$$

The corresponding strong form is

$$\begin{cases} -\Delta \bar p + d'(\bar y)\, \bar p = \bar y - y_d & \text{in } \Omega, \\[4pt] \bar p = 0 & \text{on } \partial\Omega. \end{cases}$$

Gradient of the reduced functional.

$$\nabla f(\bar u) = J_u(\bar y, \bar u) - e_u(\bar y, \bar u)^* \bar p = \alpha \bar u - (-\bar p) = \alpha \bar u + \bar p \in L^2(\Omega).$$

Full optimality system for box constraints.

$$\begin{cases} -\Delta \bar y + d(\bar y) = \bar u & \text{in } \Omega, \\[4pt] -\Delta \bar p + d'(\bar y)\, \bar p = \bar y - y_d & \text{in } \Omega, \\[4pt] \bar u = \Pi_{[u_{\min},\, u_{\max}]}\!\left(-\tfrac{1}{\alpha}\, \bar p\right) & \text{a.e. in } \Omega, \\[4pt] \bar y = \bar p = 0 & \text{on } \partial\Omega. \end{cases}$$

The structure is the same as the linear-quadratic case from Lectures 4 and 5. The nonlinearity only affects the state equation (via $d(\bar y)$) and the adjoint equation (via $d'(\bar y)\,\bar p$). The projection formula for the control is unchanged.
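
A minimal sketch of this semilinear system in 1D, with $d(y) = y^3$ as a hypothetical semilinearity and placeholder data: the state equation is solved by Newton's method, the adjoint equation uses the linearization $d'(\bar y)\,\bar p$, and the reduced gradient $\alpha u + p$ is checked against a difference quotient of the reduced functional.

```python
import numpy as np

n, alpha = 80, 1e-2
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)
K = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
y_d = np.sin(np.pi * x)

d  = lambda y: y**3          # hypothetical semilinearity d(y)
dp = lambda y: 3 * y**2      # its derivative d'(y)

def solve_state(u):
    """Newton's method for -y'' + d(y) = u with homogeneous Dirichlet conditions."""
    y = np.zeros(n)
    for _ in range(50):
        res = K @ y + d(y) - u
        if np.max(np.abs(res)) < 1e-12:
            break
        y -= np.linalg.solve(K + np.diag(dp(y)), res)
    return y

def reduced_grad(u):
    y = solve_state(u)
    p = np.linalg.solve(K + np.diag(dp(y)), y - y_d)   # adjoint: -p'' + d'(y) p = y - y_d
    return alpha * u + p

def f(u):
    y = solve_state(u)
    return 0.5 * h * np.sum((y - y_d)**2) + 0.5 * alpha * h * np.sum(u**2)

u = np.cos(np.pi * x)
g = reduced_grad(u)

# finite-difference check of one directional derivative of the reduced functional
v = np.sin(2 * np.pi * x)
t = 1e-6
print(abs((f(u + t * v) - f(u - t * v)) / (2 * t) - h * np.sum(g * v)))   # small
```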


Existence for the General Problem

For completeness we state when a minimizer exists in the abstract setting.

Theorem. Suppose:

  1. $U_{\mathrm{ad}}$ is nonempty, bounded, closed, and convex in the reflexive Banach space $U$;

  2. the reduced functional $f: U_{\mathrm{ad}} \to \mathbb{R}$ is weakly lower semicontinuous;

  3. $f$ is bounded below on $U_{\mathrm{ad}}$.

Then there exists at least one global minimizer $\bar u \in U_{\mathrm{ad}}$.

Remarks.


Second-Order Conditions

First-order conditions are necessary but not sufficient for a strict local minimizer.

Lagrangian Hessian. At a KKT point $(\bar y, \bar u, \bar p)$, the second-order derivative of $\mathcal{L}$ with respect to $(y, u)$ in the direction $(v, h) \in Y \times U$ is

$$\mathcal{L}''(\bar y, \bar u, \bar p)[(v, h), (v, h)] = \langle J_{yy}(\bar y, \bar u) v, v \rangle_{Y', Y} + 2\langle J_{yu}(\bar y, \bar u) h, v \rangle_{Y', Y} + \langle J_{uu}(\bar y, \bar u) h, h \rangle_{U', U} - \langle e_{yy}(\bar y, \bar u)[v, v],\, \bar p \rangle_{Z', Z}.$$

The term involving $e_{yy}$ vanishes for linear state equations, so for the Poisson model the Hessian of the Lagrangian coincides with the Hessian of $J$.

Critical cone.

$$C(\bar u) := \bigl\{h \in U :\; h \text{ satisfies the first-order admissibility conditions at } \bar u,\; f'(\bar u) h = 0 \bigr\}.$$

Second-order necessary condition. If $\bar u$ is a local minimizer, then for every $h \in C(\bar u)$ and $v = S'(\bar u) h$,

$$\mathcal{L}''(\bar y, \bar u, \bar p)[(v, h), (v, h)] \ge 0.$$

Second-order sufficient condition. If $(\bar y, \bar u, \bar p)$ is a KKT point and

$$\mathcal{L}''(\bar y, \bar u, \bar p)[(v, h), (v, h)] > 0 \qquad \forall (v, h) \in Y \times C(\bar u),\; (v, h) \ne (0, 0),$$

then $\bar u$ is a strict local minimizer. For the linear-quadratic case with $\alpha > 0$ this condition holds globally:

$$\mathcal{L}''[(v, h), (v, h)] = \|v\|_{L^2}^2 + \alpha\|h\|_{L^2}^2 > 0,$$

confirming that the linear-quadratic problem has a unique global minimizer.


Summary

The abstract framework identifies three functional spaces and two adjoint operators that are the backbone of every PDE-constrained optimality system.

The reduced gradient is

$$\nabla f(\bar u) = \mathcal{R}_U^{-1}\bigl(J_u(\bar y, \bar u) - e_u(\bar y, \bar u)^* \bar p\bigr) \in U.$$

For the linear-quadratic elliptic model of Lectures 4 and 5, all operators are linear, the Lagrangian Hessian is positive definite, and the system reduces exactly to the adjoint-based formulation derived there. For the semilinear model of this lecture, the structure is identical but the state equation and adjoint equation carry additional nonlinear terms.
