
Optimality Conditions for Elliptic Optimal Control with Constraints

University of Pisa

Overview

The previous lecture established the continuous framework for linear-quadratic elliptic control: the state equation, the adjoint equation, and the unconstrained stationarity condition.

This lecture extends the framework to the case of control constraints.

The goal is to derive:

  1. the variational inequality for box-constrained controls;

  2. the operatorial KKT system with a nonlinear control condition;

  3. the pointwise projection formula;

  4. the saddle-point interpretation in infinite dimensions.


Notation and Setting

$$
V:=H_0^1(\Omega), \qquad Q:=L^2(\Omega).
$$

The operators are

$$
A:V\to V', \qquad B:Q\to V', \qquad M:V\to V', \qquad R_Q:Q\to Q',
$$

with

$$
\langle Ay,v\rangle_{V',V}=\int_\Omega \nabla y\cdot \nabla v\,dx, \qquad
\langle Bu,v\rangle_{V',V}=(u,v)_Q,
$$

$$
\langle My,v\rangle_{V',V}=(y,v)_Q, \qquad
\langle R_Q u,w\rangle_{Q',Q}=(u,w)_Q.
$$

The state equation is

$$
Ay-Bu=F \qquad \text{in }V',
$$

and the cost functional is

$$
J(y,u) = \frac12\langle M(y-y_d),y-y_d\rangle_{V',V}
+\frac{\alpha}{2}\langle R_Q u,u\rangle_{Q',Q}.
$$

Hence the all-at-once problem is

$$
\min_{(y,u)\in V\times Q} J(y,u)
\qquad\text{subject to}\qquad
Ay-Bu=F \text{ in }V'.
$$

From the previous lecture, in the unconstrained case, the first-order system is

$$
\begin{aligned}
A y - B u &= F &&\text{in }V',\\
M y - A^* p &= M y_d &&\text{in }V',\\
\alpha R_Q u + B^* p &= 0 &&\text{in }Q'.
\end{aligned}
$$

Equivalently, in reduced form,

$$
\nabla f(u)=p+\alpha u.
$$

This is the only fact we need to import from the previous lecture.


Admissible Controls

We now impose box constraints on the control:

$$
U_{\mathrm{ad}} := \left\{ u\in Q: u_{\min}(x)\le u(x)\le u_{\max}(x) \text{ for a.e. }x\in\Omega \right\}.
$$

We assume

$$
u_{\min},u_{\max}\in L^\infty(\Omega), \qquad
u_{\min}(x)\le u_{\max}(x) \quad \text{a.e. in }\Omega,
$$

so that $U_{\mathrm{ad}}\subset Q$ is nonempty, closed, and convex.

The constrained reduced problem is therefore

$$
\min_{u\in U_{\mathrm{ad}}} f(u).
$$

The state equation is unchanged. Only the admissible set for the control changes.


Variational Inequality

For minimization of a differentiable functional over a closed convex set in a Hilbert space, the first-order condition is a variational inequality rather than the equation $\nabla f(\bar u)=0$.

Thus $\bar u\in U_{\mathrm{ad}}$ is optimal if and only if

$$
f'(\bar u)(u-\bar u)\ge 0 \qquad \forall u\in U_{\mathrm{ad}}.
$$

Using the reduced gradient from the previous lecture, this becomes

$$
(\bar p+\alpha \bar u,u-\bar u)_Q\ge 0 \qquad \forall u\in U_{\mathrm{ad}}.
$$

This is the constrained analogue of stationarity. If the constraint is inactive, one recovers $\bar p+\alpha \bar u=0$. If a bound is active, only admissible perturbations are allowed, so equality is replaced by an inequality.
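A quick scalar sanity check makes the inequality concrete. The sketch below, with invented numbers, minimizes the reduced quadratic $f(u)=\frac{\alpha}{2}u^2+pu$ over an interval $[a,b]$: the minimizer is the clipped stationary point, its gradient is nonzero because the bound is active, and the variational inequality nevertheless holds for every admissible $u$.

```python
import numpy as np

# Scalar sanity check of the variational inequality (all numbers invented):
# minimize f(u) = (alpha/2) u^2 + p u over the interval [a, b].
alpha, p = 0.5, 1.0          # unconstrained minimizer: -p/alpha = -2
a, b = -1.0, 1.0             # box constraints; the lower bound will be active

u_bar = float(np.clip(-p / alpha, a, b))   # clipped stationary point -> -1.0
grad = alpha * u_bar + p                   # f'(u_bar) = 0.5, not zero

# The variational inequality f'(u_bar)(u - u_bar) >= 0 holds for all
# admissible u, even though f'(u_bar) != 0 (the bound is active).
for u in np.linspace(a, b, 101):
    assert grad * (u - u_bar) >= 0.0
print(u_bar, grad)           # -> -1.0 0.5
```

Note that the sign of $f'(\bar u)$ matches the active bound: the gradient is nonnegative at an active lower bound, since only perturbations with $u-\bar u\ge 0$ are admissible.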


Operatorial KKT System

The cleanest way to express the constrained problem in the operatorial setting is through the normal cone to $U_{\mathrm{ad}}$. For a closed convex set $K\subset Q$, the normal cone at $u\in K$ is

$$
N_K(u) := \left\{ \eta\in Q': \langle \eta,v-u\rangle_{Q',Q}\le 0 \quad \forall v\in K \right\}.
$$

Then the variational inequality is equivalent to the inclusion

$$
0\in \alpha R_Q \bar u+B^*\bar p+N_{U_{\mathrm{ad}}}(\bar u) \qquad \text{in }Q'.
$$

Therefore the full KKT system becomes

$$
\begin{aligned}
A \bar y - B \bar u &= F &&\text{in }V',\\
M \bar y - A^* \bar p &= M y_d &&\text{in }V',\\
0 &\in \alpha R_Q \bar u + B^* \bar p + N_{U_{\mathrm{ad}}}(\bar u) &&\text{in }Q'.
\end{aligned}
$$

Compared with the unconstrained system from the previous lecture, only the third equation changes: the linear equation in $Q'$ is replaced by a nonlinear inclusion in $Q'$.

This can also be written as a block saddle-point relation:

$$
\begin{pmatrix} M & 0 & -A^*\\ 0 & \alpha R_Q & B^*\\ A & -B & 0 \end{pmatrix}
\begin{pmatrix} \bar y\\ \bar u\\ \bar p \end{pmatrix}
+
\begin{pmatrix} 0\\ \eta\\ 0 \end{pmatrix}
=
\begin{pmatrix} M y_d\\ 0\\ F \end{pmatrix},
\qquad \eta\in N_{U_{\mathrm{ad}}}(\bar u),
$$

in the dual product space

$$
V'\times Q'\times V'.
$$

So the constrained problem still has the same KKT block structure as in the previous lecture, but now one block is nonlinear because of the cone term.


Projection Formula

For box constraints, the normal-cone inclusion has a pointwise characterization. It is equivalent to the projection formula

$$
\bar u = \Pi_{[u_{\min},u_{\max}]} \left(-\frac1\alpha \bar p\right),
$$

where $\Pi_{[u_{\min},u_{\max}]}$ denotes the pointwise projection onto the interval $[u_{\min}(x),u_{\max}(x)]$.

Explicitly, for almost every $x\in\Omega$,

$$
\bar u(x)=
\begin{cases}
u_{\min}(x) & \text{if } -\dfrac1\alpha \bar p(x)\le u_{\min}(x),\\[6pt]
u_{\max}(x) & \text{if } -\dfrac1\alpha \bar p(x)\ge u_{\max}(x),\\[6pt]
-\dfrac1\alpha \bar p(x) & \text{if } u_{\min}(x)<-\dfrac1\alpha \bar p(x)<u_{\max}(x).
\end{cases}
$$

Thus the control law remains explicit, but it is no longer linear. The unconstrained formula $u=-\alpha^{-1}p$ is simply clipped onto the admissible interval.
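Computationally, the projection is a componentwise clip. A minimal sketch, with invented adjoint samples, $\alpha$, and bounds, showing the three cases of the formula:

```python
import numpy as np

# Minimal sketch of the pointwise projection: the unconstrained control
# -p/alpha is clipped onto [u_min, u_max].  The adjoint samples, alpha,
# and the bounds are invented for illustration.
alpha = 0.1
p = np.array([-0.5, 0.02, 0.3])          # sample adjoint values
u_min, u_max = -1.0, 1.0

u = np.clip(-p / alpha, u_min, u_max)
# -p/alpha = [5.0, -0.2, -3.0]: upper bound active, inactive, lower active
print(u)
```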


Full Optimality System

Combining state equation, adjoint equation, and control characterization, we obtain

$$
\begin{cases}
A \bar y - B \bar u = F & \text{in }V',\\[4pt]
M \bar y - A^* \bar p = M y_d & \text{in }V',\\[4pt]
\bar u = \Pi_{U_{\mathrm{ad}}}\!\left(-\dfrac1\alpha \bar p\right) & \text{in }Q.
\end{cases}
$$

In the Poisson model of this course, this reads

$$
\begin{cases}
-\Delta \bar y = \bar u + f & \text{in }\Omega,\\[4pt]
-\Delta \bar p = \bar y-y_d & \text{in }\Omega,\\[4pt]
\bar u = \Pi_{U_{\mathrm{ad}}}\!\left(-\dfrac1\alpha \bar p\right) & \text{in }\Omega,\\[4pt]
\bar y=\bar p=0 & \text{on }\partial\Omega.
\end{cases}
$$

The first two equations are linear and exactly the same kind of objects as in the previous lecture. The third one carries the entire effect of the control constraint.


Interpretation as Infinite-Dimensional KKT

The constrained system is the direct analogue of finite-dimensional KKT conditions: the state equation expresses primal feasibility, the adjoint equation expresses stationarity of the Lagrangian with respect to the state, and the normal-cone inclusion combines stationarity with respect to the control with the complementarity between the multiplier and the active bounds.

The conceptual picture is that the constraint changes none of the linear structure: it only replaces one linear optimality condition by a pointwise nonlinear one.

This is exactly the structure later exploited by active-set and semismooth Newton methods.


Projected Gradient Methods

The most direct numerical method works on the reduced functional over the convex set $U_{\mathrm{ad}}$. Starting from $u^0\in U_{\mathrm{ad}}$, one computes at each iteration:

  1. state solve: $A y^k - B u^k = F$;

  2. adjoint solve: $M y^k - A^* p^k = M y_d$;

  3. reduced gradient: $g^k=\nabla f(u^k)=p^k+\alpha u^k$;

  4. projected update: $u^{k+1} = \Pi_{U_{\mathrm{ad}}}\!\left(u^k-\tau_k g^k\right)$.

This is the Hilbert-space analogue of projected gradient descent for bound-constrained finite-dimensional optimization.

Two remarks are important: the projection is computed pointwise and is therefore cheap, and the step size $\tau_k$ can be chosen to exploit the structure of the problem.

For the linear-quadratic case, if one chooses

$$
\tau_k=\frac1\alpha,
$$

then

$$
u^k-\tau_k g^k = u^k-\frac1\alpha(\alpha u^k+p^k) = -\frac1\alpha p^k,
$$

so the update becomes

$$
u^{k+1} = \Pi_{U_{\mathrm{ad}}}\!\left(-\frac1\alpha p^k\right).
$$

This explains why the projection formula is both an optimality condition and a natural fixed-point iteration.

Projected gradient is robust and easy to implement, but it is usually only linearly convergent and may become slow when the active set is nearly identified but not yet fixed.
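The four steps above can be sketched for a 1D finite-difference discretization of the Poisson model. The grid, $\alpha$, bounds, target $y_d$, and the step size $\tau_k=1/\alpha$ are illustrative assumptions; for these data the iteration is a contraction and the upper bound becomes active near the middle of the domain.

```python
import numpy as np

# Sketch of the projected gradient loop (steps 1-4 above) on a 1D
# finite-difference Poisson model.  Grid, alpha, bounds, target y_d,
# and the step size tau = 1/alpha are illustrative assumptions.
n = 50
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

alpha = 0.1
y_d = np.sin(np.pi * x)
f = np.zeros(n)
u_min, u_max = -0.5, 0.5

u = np.zeros(n)                          # u^0 in U_ad
tau = 1.0 / alpha                        # the special step size from the text
for k in range(200):
    y = np.linalg.solve(A, u + f)        # 1. state solve
    p = np.linalg.solve(A, y - y_d)      # 2. adjoint solve (A is symmetric)
    g = p + alpha * u                    # 3. reduced gradient
    u_new = np.clip(u - tau * g, u_min, u_max)   # 4. projected update
    if np.max(np.abs(u_new - u)) < 1e-12:
        u = u_new
        break
    u = u_new

# At convergence u is a fixed point of the projection formula, and the
# upper bound is active near the middle of the domain.
print(np.max(u), np.min(u))
```

With $\tau_k=1/\alpha$ the loop is literally the fixed-point iteration $u^{k+1}=\Pi_{U_{\mathrm{ad}}}(-p^k/\alpha)$ discussed above.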


Primal-Dual Active Set Methods

The projection formula suggests splitting the domain into active and inactive regions. Given an iterate $(u^k,p^k)$, define

$$
\mathcal A_-^k := \left\{ x\in\Omega: -\frac1\alpha p^k(x)\le u_{\min}(x) \right\},
$$

$$
\mathcal A_+^k := \left\{ x\in\Omega: -\frac1\alpha p^k(x)\ge u_{\max}(x) \right\},
$$

and the inactive set

$$
\mathcal I^k := \Omega\setminus(\mathcal A_-^k\cup \mathcal A_+^k).
$$

To make the primal-dual structure explicit, introduce a multiplier

$$
\lambda\in Q',
$$

for the box constraints, and write the control optimality condition as

$$
\alpha R_Q u + B^* p + \lambda = 0 \qquad \text{in }Q'.
$$

For box constraints, λ\lambda satisfies the complementarity pattern

$$
\lambda(x)\ge 0 \ \text{on the lower active set}, \qquad
\lambda(x)\le 0 \ \text{on the upper active set}, \qquad
\lambda(x)=0 \ \text{on the inactive set}.
$$

Once the active sets are frozen, the inequalities are replaced by equality constraints. On a given active-set guess $(\mathcal A_-^k,\mathcal A_+^k,\mathcal I^k)$, one may therefore consider the Lagrangian

$$
\begin{aligned}
\mathcal L_k(y,u,p,\lambda_-^k,\lambda_+^k)
:={}& \frac12\|y-y_d\|_{L^2(\Omega)}^2
+\frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2 \\
&-\langle Ay-Bu-F,p\rangle_{V',V}\\
&+(\lambda_-^k,u-u_{\min})_{L^2(\mathcal A_-^k)}
+(\lambda_+^k,u-u_{\max})_{L^2(\mathcal A_+^k)},
\end{aligned}
$$

where the equality constraints are imposed only on the active sets. The multipliers $\lambda_-^k$ and $\lambda_+^k$ are supported on $\mathcal A_-^k$ and $\mathcal A_+^k$, respectively.

Then the optimality conditions suggest the update rules

$$
u^{k+1}=u_{\min}\quad \text{on }\mathcal A_-^k,
\qquad
u^{k+1}=u_{\max}\quad \text{on }\mathcal A_+^k,
$$

and

$$
\alpha u^{k+1}+p^{k+1}=0 \qquad \text{on }\mathcal I^k.
$$

Equivalently, the multiplier can be recovered from

$$
\lambda^{k+1} = -\alpha u^{k+1}-p^{k+1},
$$

with

$$
\lambda^{k+1}=0 \qquad \text{on }\mathcal I^k,
$$

and sign restrictions on $\mathcal A_-^k$ and $\mathcal A_+^k$.

So, once the active sets are guessed, one solves a linear system for $(y^{k+1},u^{k+1},p^{k+1},\lambda^{k+1})$ with those sets frozen. In strong form this means:

$$
\begin{cases}
-\Delta y^{k+1}=u^{k+1}+f,\\[4pt]
-\Delta p^{k+1}=y^{k+1}-y_d,\\[4pt]
u^{k+1}=u_{\min}\ \text{on }\mathcal A_-^k,\\[4pt]
u^{k+1}=u_{\max}\ \text{on }\mathcal A_+^k,\\[4pt]
\alpha u^{k+1}+p^{k+1}=0\ \text{on }\mathcal I^k.
\end{cases}
$$

Algorithmically:

  1. guess active and inactive sets from the current iterate;

  2. solve the equality-constrained KKT system with these sets fixed;

  3. update the sets and repeat until they stop changing.

This is called a primal-dual active set method because the active-set prediction uses both primal information (the control bounds) and dual information (the adjoint state, equivalently the multiplier $\lambda$), and because primal and dual variables are updated simultaneously once the sets are frozen.

In practice, PDAS is often much faster than projected gradient because once the active set is identified, the remaining step is essentially the solution of the correct linearized KKT system.
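The loop above can be sketched for the same kind of 1D finite-difference model; the grid, $\alpha$, bounds, and target are assumptions. With the sets frozen, the control is eliminated ($u$ equals the bounds on the active sets and $-p/\alpha$ on the inactive set) and a linear system is solved for the pair $(y,p)$.

```python
import numpy as np

# Sketch of the primal-dual active set loop for a 1D finite-difference
# Poisson model (grid, alpha, bounds, and target are assumptions).
n = 50
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

alpha = 0.1
y_d = np.sin(np.pi * x)
f = np.zeros(n)
u_min = -0.5 * np.ones(n)
u_max = 0.5 * np.ones(n)

p = np.zeros(n)
for k in range(50):
    lower = -p / alpha <= u_min          # guess for the lower active set
    upper = -p / alpha >= u_max          # guess for the upper active set
    inact = ~(lower | upper)             # inactive set

    # Frozen-set KKT system in (y, p):
    #   A y + (chi_I / alpha) p = f + u_min on A_-, u_max on A_+
    #  -y  + A p                = -y_d
    Z = np.diag(inact / alpha)
    K = np.block([[A, Z], [-np.eye(n), A]])
    rhs = np.concatenate([f + u_min * lower + u_max * upper, -y_d])
    sol = np.linalg.solve(K, rhs)
    y, p = sol[:n], sol[n:]

    u = np.where(lower, u_min, np.where(upper, u_max, -p / alpha))
    same = (np.array_equal(-p / alpha <= u_min, lower)
            and np.array_equal(-p / alpha >= u_max, upper))
    if same:                             # active sets stopped changing
        break

# The computed control satisfies the projection formula:
print(np.max(np.abs(u - np.clip(-p / alpha, u_min, u_max))))
```

For these data the sets typically stabilize after very few outer iterations, which is exactly the fast behavior described above.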

The same idea extends beyond box constraints. Let $U_{\mathrm{ad}}\subset Q$ be any closed convex set for which the projection

$$
\Pi_{U_{\mathrm{ad}}}:Q\to U_{\mathrm{ad}}
$$

is available in explicit or computational form. Then one may still write the optimality condition as

$$
u=\Pi_{U_{\mathrm{ad}}}\!\left(u-\sigma \nabla f(u)\right),
$$

or equivalently as a normal-cone inclusion

$$
0\in \nabla f(u)+N_{U_{\mathrm{ad}}}(u),
$$

and build active-set type methods by locally identifying the part of the constraint that behaves as an equality constraint at the current iterate. For box constraints this identification is pointwise and especially transparent, which is why PDAS takes its simplest form there. For more general admissible sets, the same principle survives, but the geometry of the active manifold and the structure of the projected Newton step become more problem-dependent.


Semismooth Newton Interpretation

The projection operator is not differentiable in the classical sense, but it is semismooth. This allows one to apply generalized Newton methods to the nonsmooth optimality system.

Recall the basic definition. Let $G:Q\to Q$ be locally Lipschitz and directionally differentiable. Its generalized derivative at $u$ is the Clarke generalized Jacobian

$$
\partial G(u) := \operatorname{co}
\left\{ \lim_{j\to\infty} G'(u_j): u_j\to u,\; G \text{ differentiable at }u_j \right\},
$$

that is, the convex hull of all limits of classical derivatives computed at nearby points where $G$ is differentiable.

Then $G$ is called semismooth at $u$ if, for every perturbation $s\to 0$ and every generalized derivative $V_s\in \partial G(u+s)$ in the sense of Clarke,

$$
G(u+s)-G(u)-V_s s=o(\|s\|).
$$

If the stronger estimate

$$
G(u+s)-G(u)-V_s s=O(\|s\|^2)
$$

holds, one speaks of strong semismoothness. Semismoothness is the regularity that replaces classical differentiability in generalized Newton theory.
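As a tiny illustration of the definition, consider the scalar kink $t\mapsto\max(t,0)$, the building block of the box projection. Evaluating the generalized derivative at the perturbed point $t+s$, as the definition requires, the Newton-type remainder vanishes identically at the kink, so the function is even strongly semismooth there:

```python
# Scalar check of the semismoothness estimate for the kink t -> max(t, 0):
# with the generalized derivative evaluated at the perturbed point t + s,
# the remainder phi(t+s) - phi(t) - V_s * s is exactly zero at t = 0.
def phi(t):
    return max(t, 0.0)

def dphi(t):
    # one element of the Clarke generalized derivative of phi
    return 1.0 if t > 0.0 else 0.0

t = 0.0                                  # the nonsmooth point
for s in [0.1, -0.1, 1e-4, -1e-4, 1e-8, -1e-8]:
    r = phi(t + s) - phi(t) - dphi(t + s) * s
    print(s, r)                          # remainder r = 0.0 for every s
```

Evaluating the derivative at $t$ instead of $t+s$ would destroy the estimate, which is why the definition insists on $V_s\in\partial G(u+s)$.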

A convenient reduced equation is

$$
G(u) := u-\Pi_{U_{\mathrm{ad}}}\!\left(u-\sigma \nabla f(u)\right)=0,
$$

with some parameter $\sigma>0$. At a solution, this is equivalent to the variational inequality.

Indeed, in a Hilbert space the projection onto a closed convex set is characterized by

$$
z=\Pi_{U_{\mathrm{ad}}}(w)
\qquad\Longleftrightarrow\qquad
(w-z,v-z)_Q\le 0 \quad \forall v\in U_{\mathrm{ad}}.
$$

Apply this with

$$
w=u-\sigma \nabla f(u), \qquad z=u.
$$

Then

$$
G(u)=0
\qquad\Longleftrightarrow\qquad
u=\Pi_{U_{\mathrm{ad}}}\!\left(u-\sigma \nabla f(u)\right).
$$

Using the projection characterization, this is equivalent to

$$
(u-\sigma \nabla f(u)-u,v-u)_Q\le 0 \qquad \forall v\in U_{\mathrm{ad}},
$$

that is,

$$
-\sigma(\nabla f(u),v-u)_Q\le 0 \qquad \forall v\in U_{\mathrm{ad}}.
$$

Since $\sigma>0$, this is equivalent to

$$
(\nabla f(u),v-u)_Q\ge 0 \qquad \forall v\in U_{\mathrm{ad}},
$$

which is exactly the variational inequality

$$
f'(u)(v-u)\ge 0 \qquad \forall v\in U_{\mathrm{ad}}.
$$

So the fixed-point equation $G(u)=0$ and the constrained first-order condition are equivalent.

For box constraints, the projection acts pointwise, so its generalized derivative is easy to characterize: one admissible choice is multiplication by the characteristic function of the inactive set, i.e. the derivative equals $1$ at points where the argument lies strictly between the bounds and $0$ where a bound is reached.

Therefore a generalized Newton step for $G(u)=0$ amounts to freezing the current active set and solving the corresponding linearized state-adjoint-control system. This is exactly the mechanism behind PDAS.

So, in the linear-quadratic box-constrained case, one can view the primal-dual active set iteration as a semismooth Newton method applied to $G(u)=0$, and each semismooth Newton step as an active-set update.

This equivalence is one of the main reasons why active-set methods are so effective for elliptic control problems with box constraints: they inherit Newton-type local behavior while preserving the simple geometric interpretation of active and inactive regions.


Algorithmic Comparison

The three methods fit the same state-adjoint-control structure, but they use it differently: projected gradient only evaluates the reduced gradient and projects; the primal-dual active set method freezes the active sets and solves an equality-constrained KKT system; semismooth Newton linearizes the nonsmooth optimality system itself.

Typical qualitative behavior: projected gradient is robust but only linearly convergent, while active-set and semismooth Newton methods converge locally superlinearly and often identify the correct active set in a finite number of iterations.

This is why the continuous optimality system matters algorithmically: it tells us not only what the solution must satisfy, but also what class of numerical methods is natural for the problem.


Summary

This lecture builds directly on the previous one:

  1. the state equation and adjoint equation are unchanged;

  2. the unconstrained stationarity condition

    $\alpha R_Q u+B^*p=0$

    is replaced by the constrained inclusion

    $0\in \alpha R_Q u+B^*p+N_{U_{\mathrm{ad}}}(u);$
  3. equivalently, the control satisfies the projection formula

    $u=\Pi_{U_{\mathrm{ad}}}\!\left(-\frac1\alpha p\right);$
  4. the full KKT system remains a saddle-point problem in dual spaces

    $V'\times Q'\times V',$

    with a nonlinear control block.

  5. this structure naturally leads to three algorithmic families:

    • projected gradient methods;

    • primal-dual active set methods;

    • semismooth Newton methods.

This closes the continuous first-order theory for linear-quadratic elliptic control with box constraints.
