
Reduced Formulation for Elliptic Optimal Control

University of Pisa

Overview

This is the first genuinely PDE-constrained lecture in the course. We take the optimization viewpoint of Lecture 3 and apply it to a standard linear elliptic optimal control problem:

  1. the PDE defines a control-to-state map $S$;

  2. the cost becomes a reduced functional $f(u)=J(Su,u)$;

  3. the adjoint equation gives the reduced gradient at the cost of one extra PDE solve;

  4. gradient-based methods from Lecture 3 now become PDE-based algorithms.

We focus on the distributed-control Poisson model because it contains all the structural ideas we need before constraints, discretization, and more advanced PDEs.


Model Problem

Let $\Omega\subset\mathbb R^d$ be a bounded domain. We consider the distributed control problem

$$
\min_{(y,u)} J(y,u) := \frac12\|y-y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2,
$$

subject to

$$
\begin{cases} -\Delta y = u & \text{in }\Omega,\\ y = 0 & \text{on }\partial\Omega, \end{cases}
$$

with given data $y_d\in L^2(\Omega)$ and $\alpha>0$.

Interpretation:

This is the PDE analogue of the linear-quadratic finite-dimensional problems seen before.


Weak Formulation of the State Equation

This is the first point where the PDE language changes. We start from the strong form

$$
\begin{cases} -\Delta y = u & \text{in }\Omega,\\ y = 0 & \text{on }\partial\Omega. \end{cases}
$$

Assume for a moment that $y$ is smooth. Take a test function $v$ that is also smooth and vanishes on the boundary. Multiply the PDE by $v$ and integrate over $\Omega$:

$$
\int_\Omega (-\Delta y)\,v\,dx = \int_\Omega u\,v\,dx.
$$

Now integrate by parts:

$$
\int_\Omega (-\Delta y)\,v\,dx = \int_\Omega \nabla y\cdot \nabla v\,dx - \int_{\partial\Omega}\frac{\partial y}{\partial n}\,v\,ds.
$$

Because the test function satisfies $v=0$ on $\partial\Omega$, the boundary term vanishes. So we obtain the identity

$$
\int_\Omega \nabla y\cdot \nabla v\,dx = \int_\Omega u\,v\,dx.
$$

This motivates the Sobolev setting:

For fixed $u\in L^2(\Omega)$, the weak formulation is: find $y\in H_0^1(\Omega)$ such that

$$
\int_\Omega \nabla y\cdot \nabla v\,dx = \int_\Omega u\,v\,dx \qquad \forall v\in H_0^1(\Omega).
$$

Introduce

$$
a(y,v):=\int_\Omega \nabla y\cdot \nabla v\,dx, \qquad \ell_u(v):=\int_\Omega u\,v\,dx.
$$

Then the problem reads

$$
a(y,v)=\ell_u(v)\qquad \forall v\in H_0^1(\Omega).
$$

Why does this have a unique solution?

The bilinear form $a$ is bounded and coercive on $H_0^1(\Omega)$ (coercivity follows from the Poincaré inequality), and $\ell_u$ is a bounded linear functional for $u\in L^2(\Omega)$. Lax-Milgram then gives a unique weak solution $y\in H_0^1(\Omega)$ for every $u\in L^2(\Omega)$.

This allows us to define the state equation as a linear map

$$
S:L^2(\Omega)\to H_0^1(\Omega),\qquad u\mapsto y.
$$

The distinction to keep in mind is that $S$ takes a control in $L^2(\Omega)$ and returns a state in the smaller space $H_0^1(\Omega)$. With elliptic regularity one often has more, but for the reduced formulation the key point is simply that $S$ is linear and bounded.
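Numerically, $S$ is simply "assemble a discretization of $-\Delta$ and solve". The sketch below is illustrative only (the grid size and helper names are ours, not those of the repository's `fd1d.py`); it realizes the discrete control-to-state map with second-order finite differences on $(0,1)$:

```python
import numpy as np

def poisson_matrix(n):
    """3-point finite-difference -Laplacian on (0,1) with zero Dirichlet BCs.

    n interior grid points, spacing h = 1/(n+1); illustrative helper.
    """
    h = 1.0 / (n + 1)
    return (np.diag(2.0 * np.ones(n))
            - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1)) / h**2

def solve_state(u):
    """Discrete control-to-state map S: solve -y'' = u, y(0) = y(1) = 0."""
    return np.linalg.solve(poisson_matrix(len(u)), u)

n = 99
x = np.linspace(0, 1, n + 2)[1:-1]     # interior grid points
y = solve_state(np.ones(n))            # state for the constant control u = 1
# exact solution of -y'' = 1, y(0) = y(1) = 0, is y(x) = x(1-x)/2
print(np.max(np.abs(y - x * (1 - x) / 2)))
```

Because $S$ is linear, the discrete map is just a matrix solve; for the quadratic exact solution above, the second-order scheme is exact up to roundoff.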


Reduced Cost Functional

Eliminate the state through the PDE:

$$
y=S(u).
$$

Then define

$$
f(u):=J(S(u),u) = \frac12\|S(u)-y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2.
$$

The PDE-constrained problem can now be written as

$$
\min_{u\in L^2(\Omega)} f(u).
$$

This is the infinite-dimensional version of the reduced formulation from Lecture 1.
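As a sanity check of the reduced viewpoint, $f$ can be evaluated with exactly one state solve per call. A minimal finite-difference sketch (grid size, target, and $\alpha$ are chosen arbitrarily for illustration):

```python
import numpy as np

def reduced_cost(u, y_d, alpha):
    """Evaluate f(u) = 1/2||S(u) - y_d||^2 + alpha/2||u||^2 on (0,1).

    One state solve per evaluation; L^2 norms are approximated by
    h * (sum of squares) at the interior grid points. Illustrative sketch.
    """
    n = len(u)
    h = 1.0 / (n + 1)
    A = (np.diag(2.0 * np.ones(n))
         - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    y = np.linalg.solve(A, u)               # y = S(u): the single state solve
    return 0.5 * h * np.sum((y - y_d)**2) + 0.5 * alpha * h * np.sum(u**2)

n = 50
x = np.linspace(0, 1, n + 2)[1:-1]
y_d = np.sin(np.pi * x)
f0 = reduced_cost(np.zeros(n), y_d, alpha=1e-2)
# u = 0 gives state y = 0, so f(0) = (1/2)||y_d||^2 = 1/4 for this y_d
print(f0)
```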

Two remarks: since $S$ is linear, $f$ is a quadratic functional of $u$ alone; and every evaluation of $f$ requires one state solve.

It is also useful to introduce the same problem in a fully operatorial form, because this makes the infinite-dimensional structure look exactly like a block linear system.

Let

$$
V:=H_0^1(\Omega), \qquad Q:=L^2(\Omega).
$$

We define the elliptic operator $A:V\to V'$ by

$$
\langle Ay,v\rangle_{V',V}:=\int_\Omega \nabla y\cdot \nabla v\,dx \qquad \forall y,v\in V.
$$

The control enters the state equation through the operator $B:Q\to V'$ defined by

$$
\langle Bu,v\rangle_{V',V}:=(u,v)_{Q} \qquad \forall u\in Q,\ \forall v\in V.
$$

Since the tracking term is measured in $Q=L^2(\Omega)$, we also introduce the observation embedding $C:V\to Q$, here simply

$$
Cy:=y,
$$

and its associated mass operator $M:=C^*C:V\to V'$:

$$
\langle My,v\rangle_{V',V}:=(y,v)_Q \qquad \forall y,v\in V.
$$

For the control cost, it is convenient to write the $L^2$ inner product through the Riesz map $R_Q:Q\to Q'$,

$$
\langle R_Q u,w\rangle_{Q',Q}:=(u,w)_Q \qquad \forall u,w\in Q.
$$

With this notation, the state equation is

$$
Ay - Bu = F \qquad \text{in }V',
$$

where $F\in V'$ is a given load. In the present model without an additional forcing term, one simply has $F=0$.

The cost functional can be written as

$$
J(y,u) = \frac12\langle M(y-y_d),y-y_d\rangle_{V',V} + \frac{\alpha}{2}\langle R_Q u,u\rangle_{Q',Q}.
$$

So the all-at-once infinite-dimensional problem is

$$
\min_{(y,u)\in V\times Q} J(y,u) \qquad\text{subject to}\qquad Ay-Bu=F \ \text{in }V'.
$$

Introduce the Lagrangian with multiplier $p\in V$:

$$
\mathcal L(y,u,p) := J(y,u) - \langle Ay-Bu-F,\,p\rangle_{V',V}.
$$

Its first-order conditions are

$$
\begin{aligned} Ay - Bu &= F &&\text{in }V',\\ My - A^*p &= My_d &&\text{in }V',\\ \alpha R_Q u + B^*p &= 0 &&\text{in }Q'. \end{aligned}
$$

This is already a saddle-point system: the unknown triple $(y,u,p)$ lives in $V\times Q\times V$, while the equations live in the dual product space $V'\times Q'\times V'$ after reordering the blocks. Indeed, the KKT system can be written as

$$
\begin{pmatrix} M & 0 & -A^*\\ 0 & \alpha R_Q & B^*\\ A & -B & 0 \end{pmatrix} \begin{pmatrix} y\\ u\\ p \end{pmatrix} = \begin{pmatrix} My_d\\ 0\\ F \end{pmatrix} \qquad \text{in } V'\times Q'\times V'.
$$

This is the infinite-dimensional analogue of a symmetric indefinite linear system: flipping the sign of the last block row makes the operator matrix symmetric, with the classical KKT saddle-point structure.

So even before discretization, PDE-constrained optimization already has the same algebraic structure as the block KKT systems that appear in finite-dimensional constrained optimization.
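A discrete analogue makes the block structure concrete. In the sketch below (a finite-difference discretization we choose purely for illustration; after dividing each block row by the mesh weight $h$, the discrete mass and Riesz operators both reduce to the identity), the KKT matrix mirrors the operator matrix above:

```python
import numpy as np

n = 60
h = 1.0 / (n + 1)
x = np.linspace(0, 1, n + 2)[1:-1]
A = (np.diag(2.0 * np.ones(n))              # discrete -Laplacian, A = A^T
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
I = np.eye(n)
y_d = x * (1 - x)
alpha = 1e-4

# Discrete KKT system, mirroring the operator matrix (M, R_Q -> I, B -> I):
#   [ I      0    -A ] [y]   [y_d]   (adjoint:  A p = y - y_d)
#   [ 0  alpha*I   I ] [u] = [ 0 ]   (gradient: alpha*u + p = 0)
#   [ A     -I     0 ] [p]   [ 0 ]   (state:    A y = u)
K = np.block([[I, np.zeros((n, n)), -A],
              [np.zeros((n, n)), alpha * I, I],
              [A, -I, np.zeros((n, n))]])
rhs = np.concatenate([y_d, np.zeros(n), np.zeros(n)])
y, u, p = np.split(np.linalg.solve(K, rhs), 3)

print(np.max(np.abs(alpha * u + p)), np.max(np.abs(A @ y - u)))
```

Solving the three blocks together like this is exactly the "all-at-once" approach; the reduced formulation instead eliminates $y$ and $p$.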


Existence and Uniqueness

In finite dimensions, existence follows from the Weierstrass principle: closed and bounded sets are compact, so minimizing sequences have convergent subsequences. That argument is no longer available in infinite-dimensional spaces, because closed and bounded sets in $L^2(\Omega)$ are generally not strongly compact.

So, in addition to convexity, we need three structural ingredients: coercivity of $f$, so that minimizing sequences are bounded; reflexivity of the underlying space, so that bounded sequences have weakly convergent subsequences; and weak lower semicontinuity of $f$, so that we can pass to the limit.

For our reduced functional, coercivity comes from the Tikhonov term:

$$
f(u) = \frac12\|S(u)-y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2 \ge \frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2.
$$

Hence every minimizing sequence is bounded in $L^2(\Omega)$.

To see this more concretely, let $(u_n)$ be a minimizing sequence. Then there exists a constant $C$ such that

$$
f(u_n)\le C \qquad \text{for all } n \text{ large enough}.
$$

Coercivity then implies

$$
\frac{\alpha}{2}\|u_n\|_{L^2(\Omega)}^2 \le f(u_n) \le C \quad\Longrightarrow\quad \|u_n\|_{L^2(\Omega)} \le R := \sqrt{\frac{2C}{\alpha}},
$$

so the minimizing sequence is bounded.

Since $L^2(\Omega)$ is reflexive, bounded sequences admit weakly convergent subsequences:

$$
u_n \rightharpoonup \bar u \qquad \text{in } L^2(\Omega).
$$

For minimization we do not need strong convergence of the whole sequence: we only need one convergent subsequence and a notion of lower semicontinuity compatible with that convergence. If $f$ is weakly lower semicontinuous, then

$$
f(\bar u)\le \liminf_{n\to\infty} f(u_n).
$$

Since $(u_n)$ is minimizing, the right-hand side is exactly $\inf f$. Hence

$$
f(\bar u)\le \inf f,
$$

which forces $f(\bar u)=\inf f$. So the weak limit of a minimizing subsequence is already a minimizer.

The state equation is linear and continuous, so the reduced functional is convex and weakly lower semicontinuous. This allows us to pass to the limit along a minimizing sequence and obtain existence of a minimizer.

Uniqueness comes from strict convexity. The reduced functional $f$ is strictly convex because the tracking term is convex (a convex quadratic composed with the affine map $u\mapsto S(u)-y_d$) and the Tikhonov term $\frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2$ is strictly convex for $\alpha>0$.

As a consequence, $f$ has a unique minimizer $\bar u\in L^2(\Omega)$.

This is the first major structural simplification of the linear-quadratic elliptic case: existence and uniqueness of the optimal control follow from convexity and coercivity alone, with no compactness argument needed.


Directional Derivative of the Reduced Cost

Let $u\in L^2(\Omega)$ and $h\in L^2(\Omega)$. We now compute the derivative of $f$ explicitly from the definition of the directional derivative:

$$
f'(u)h := \frac{d}{dt}f(u+th)\Big|_{t=0}.
$$

Since $S$ is linear,

$$
S(u+th)=S(u)+tS(h).
$$

Using this,

$$
\begin{aligned} f(u+th) &= \frac12\|S(u+th)-y_d\|_{L^2(\Omega)}^2 +\frac{\alpha}{2}\|u+th\|_{L^2(\Omega)}^2\\ &= \frac12\|S(u)+tS(h)-y_d\|_{L^2(\Omega)}^2 +\frac{\alpha}{2}\|u+th\|_{L^2(\Omega)}^2. \end{aligned}
$$

Expanding both squares gives

$$
\begin{aligned} f(u+th) &= f(u) + t\,(S(u)-y_d,\,S(h))_{L^2(\Omega)} + \alpha t\,(u,h)_{L^2(\Omega)}\\ &\quad + \frac{t^2}{2}\|S(h)\|_{L^2(\Omega)}^2 + \frac{\alpha t^2}{2}\|h\|_{L^2(\Omega)}^2. \end{aligned}
$$

Subtract $f(u)$ and divide by $t$:

$$
\frac{f(u+th)-f(u)}{t} = (S(u)-y_d,\,S(h))_{L^2(\Omega)} + \alpha(u,h)_{L^2(\Omega)} + \frac{t}{2}\|S(h)\|_{L^2(\Omega)}^2 + \frac{\alpha t}{2}\|h\|_{L^2(\Omega)}^2.
$$

Passing to the limit $t\to 0$, we obtain

$$
f'(u)h = (S(u)-y_d,\,S(h))_{L^2(\Omega)} + \alpha(u,h)_{L^2(\Omega)}.
$$

This formula is correct, but not computationally convenient: if we need to evaluate $f'(u)h$ along many directions $h$, we would need one extra state solve for every direction.
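The cost of this formula is easy to see in a discrete sketch (all sizes and data below are illustrative): evaluating $f'(u)h$ requires one extra solve $S(h)$ per direction, and the result can be cross-checked against a difference quotient.

```python
import numpy as np

n = 40
hm = 1.0 / (n + 1)                          # mesh size (named hm to keep h free)
A = (np.diag(2.0 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / hm**2
x = np.linspace(0, 1, n + 2)[1:-1]
y_d = np.sin(np.pi * x)
alpha = 1e-2
ip = lambda a, b: hm * np.dot(a, b)         # discrete L^2(0,1) inner product
S = lambda u: np.linalg.solve(A, u)         # discrete control-to-state map

def f(u):
    y = S(u)
    return 0.5 * ip(y - y_d, y - y_d) + 0.5 * alpha * ip(u, u)

rng = np.random.default_rng(0)
u = rng.standard_normal(n)
hdir = rng.standard_normal(n)               # a direction h

# f'(u)h from the formula: costs one extra state solve S(hdir)
deriv = ip(S(u) - y_d, S(hdir)) + alpha * ip(u, hdir)

t = 1e-6                                    # central difference quotient for comparison
fd = (f(u + t * hdir) - f(u - t * hdir)) / (2 * t)
print(abs(deriv - fd))
```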

The adjoint equation removes this difficulty.


Adjoint Equation

At this point we know that

$$
f'(u)h=(y-y_d,\,S(h))_{L^2(\Omega)}+\alpha(u,h)_{L^2(\Omega)}.
$$

The difficulty is the first term: the direction $h$ appears only indirectly, through the state variation $S(h)$.

This is exactly where the adjoint enters. We would like to rewrite

$$
(y-y_d,\,S(h))_{L^2(\Omega)} \quad\Longrightarrow\quad (S^*(y-y_d),\,h)_{L^2(\Omega)},
$$

as an expression where $h$ appears explicitly. This is done in two solves: given a control $u$, we first solve the state equation to get $y=S(u)$, then we solve one adjoint PDE to obtain $p=S^*(y-y_d)$; the adjoint state $p$ lets us rewrite the inner product with $S(h)$ as an inner product with $h$.

More concretely, we look for $p\in H_0^1(\Omega)$ such that for every test function $v\in H_0^1(\Omega)$,

$$
\int_\Omega \nabla v\cdot \nabla p\,dx = \int_\Omega (y-y_d)\,v\,dx.
$$

Notice the change of roles of $v$ and $p$ compared to the state equation. In this particular example the bilinear form is symmetric, so the change of roles is not visible, but in general the adjoint bilinear form is different from the original one, and the test function $v$ is now associated with the adjoint state $p$.

Notice the analogy with the state equation: the same coercive bilinear form appears, now driven by the residual $y-y_d$ instead of the control.

We define the adjoint state $p\in H_0^1(\Omega)$ by

$$
\int_\Omega \nabla v \cdot \nabla p\,dx = \int_\Omega (y-y_d)\,v\,dx \qquad \forall v\in H_0^1(\Omega),
$$

where $y=S(u)$ is the state associated with the current control $u$.

Since in this case the adjoint bilinear form coincides with the original coercive form, Lax-Milgram applies again and gives existence and uniqueness of pp.

If $p$ is smooth enough, the corresponding strong form is

$$
\begin{cases} -\Delta p = y-y_d & \text{in }\Omega,\\ p = 0 & \text{on }\partial\Omega. \end{cases}
$$

The adjoint PDE is governed by the adjoint operator. For the Poisson problem this is again $-\Delta$, because the operator is self-adjoint, but this coincidence should be viewed as a special feature of the present model, not as the general rule.
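In the discrete setting, self-adjointness of the solution operator is a one-line check (matrix size and data are illustrative): with a symmetric discrete Laplacian, $(Su,v)_{L^2} = (u,Sv)_{L^2}$ holds up to roundoff.

```python
import numpy as np

n = 30
h = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n))              # symmetric discrete -Laplacian
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
S = np.linalg.inv(A)                        # discrete solution operator

rng = np.random.default_rng(2)
u = rng.standard_normal(n)
v = rng.standard_normal(n)

# (S u, v) = (u, S v) in the discrete L^2 inner product: S is self-adjoint
gap = abs(h * np.dot(S @ u, v) - h * np.dot(u, S @ v))
print(gap)
```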

Conceptually: the adjoint state collects the sensitivity of the tracking term with respect to the state, so that a single adjoint solve replaces one extra state solve per direction $h$.


Reduced Gradient Formula

From what we have so far, the directional derivative of the reduced cost is

$$
f'(u)h=(y-y_d,\,S(h))_{L^2(\Omega)}+\alpha(u,h)_{L^2(\Omega)},
$$

which we rewrite by moving $S$ to the other side of the inner product, i.e. $(y-y_d,S(h))_{L^2(\Omega)}=(S^*(y-y_d),h)_{L^2(\Omega)}$, and defining the adjoint state $p=S^*(y-y_d)$.

Thus the reduced gradient is

$$
\nabla f(u)=p+\alpha u \qquad \text{in }L^2(\Omega).
$$

This is the central formula of the lecture.
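The formula turns gradient evaluation into two PDE solves. The following sketch (illustrative discretization and data) computes $\nabla f(u)=p+\alpha u$ and verifies it against a difference quotient of the reduced cost:

```python
import numpy as np

n = 40
hm = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / hm**2
x = np.linspace(0, 1, n + 2)[1:-1]
y_d = np.sin(np.pi * x)
alpha = 1e-2

def grad_f(u):
    """Reduced gradient: one state solve + one adjoint solve."""
    y = np.linalg.solve(A, u)               # state:   -y'' = u
    p = np.linalg.solve(A, y - y_d)         # adjoint: -p'' = y - y_d  (A = A^T)
    return p + alpha * u

def f(u):
    y = np.linalg.solve(A, u)
    return 0.5 * hm * np.sum((y - y_d)**2) + 0.5 * alpha * hm * np.sum(u**2)

rng = np.random.default_rng(1)
u = rng.standard_normal(n)
hdir = rng.standard_normal(n)

t = 1e-6
fd = (f(u + t * hdir) - f(u - t * hdir)) / (2 * t)
err = abs(hm * np.dot(grad_f(u), hdir) - fd)   # (grad f, h)_{L^2} vs quotient
print(err)
```

Note that the cost of `grad_f` is independent of how many directions we later pair the gradient with: that is exactly the gain over the direct formula.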


First-Order Optimality System

For the unconstrained problem, the optimal control $\bar u$ satisfies

$$
f'(\bar u)h=0 \qquad \forall h\in L^2(\Omega),
$$

that is,

$$
(\nabla f(\bar u),h)_{L^2(\Omega)}=0 \qquad \forall h\in L^2(\Omega).
$$

The only element of $L^2(\Omega)$ orthogonal to all test directions $h$ is the zero element, so this is equivalent to

$$
\nabla f(\bar u)=0.
$$

Using the gradient formula derived above, we obtain

$$
\nabla f(\bar u)=\bar p+\alpha \bar u=0.
$$

The optimality system is then

$$
\begin{cases} -\Delta \bar y = \bar u & \text{in }\Omega,\\ \bar y = 0 & \text{on }\partial\Omega,\\[0.3em] -\Delta \bar p = \bar y-y_d & \text{in }\Omega,\\ \bar p = 0 & \text{on }\partial\Omega,\\[0.3em] \alpha \bar u + \bar p = 0 & \text{in }\Omega. \end{cases}
$$

Eliminating $\bar u$ gives the explicit formula

$$
\bar u = -\frac{1}{\alpha}\bar p.
$$

This is the PDE version of the finite-dimensional gradient equation.
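Numerically, this elimination can be exploited directly: substituting $\bar u=-\bar p/\alpha$ into the state equation leaves a $2\times 2$ block system in $(\bar y,\bar p)$. A sketch under an illustrative discretization:

```python
import numpy as np

n = 60
h = 1.0 / (n + 1)
x = np.linspace(0, 1, n + 2)[1:-1]
A = (np.diag(2.0 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
I = np.eye(n)
y_d = np.sin(np.pi * x)
alpha = 1e-3

# Eliminate u = -p/alpha:
#   A y + p/alpha = 0    (state equation with u substituted)
#   A p - y = -y_d       (adjoint equation)
K = np.block([[A, I / alpha],
              [-I, A]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(n), -y_d]))
y, p = sol[:n], sol[n:]
u = -p / alpha

# independent check: recompute state and adjoint from u by two solves;
# the reduced gradient must vanish at the optimum
y2 = np.linalg.solve(A, u)
p2 = np.linalg.solve(A, y2 - y_d)
print(np.max(np.abs(alpha * u + p2)))
```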


Algorithmic Interpretation

To evaluate the reduced gradient at a control $u_k$:

  1. solve the state equation for $y_k$;

  2. solve the adjoint equation for $p_k$;

  3. form the gradient $g_k = \alpha u_k + p_k$.

Then a gradient step reads

$$
u_{k+1}=u_k-\tau_k g_k,
$$

with $\tau_k$ chosen by exact line search, Armijo backtracking, or another strategy from Lecture 3.

So one iteration of PDE-constrained gradient descent means: one state solve, one adjoint solve, and one update of the control function.

This is the computational meaning of the reduced formulation.


Reduced Gradient Algorithm

A basic reduced-gradient method is:

  1. choose $u_0\in L^2(\Omega)$;

  2. for $k=0,1,2,\dots$:

    • solve the state equation for $y_k=S(u_k)$;

    • solve the adjoint equation for $p_k$;

    • compute $g_k=\alpha u_k+p_k$;

    • choose $\tau_k>0$;

    • update $u_{k+1}=u_k-\tau_k g_k$.

Stopping criteria are the same as before, now written in function space, e.g.

$$
\|g_k\|_{L^2(\Omega)}\le \varepsilon.
$$
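Putting the pieces together, here is a complete (illustrative) discrete instance of the reduced-gradient method; the fixed step size is hand-picked for this specific problem, whereas in practice one would use a line search as in Lecture 3:

```python
import numpy as np

n = 60
hm = 1.0 / (n + 1)
x = np.linspace(0, 1, n + 2)[1:-1]
A = (np.diag(2.0 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / hm**2
y_d = x * (1 - x)
alpha = 1e-2

u = np.zeros(n)                             # u_0 = 0
tau = 60.0                                  # fixed step, tuned by hand here
for k in range(200):
    y = np.linalg.solve(A, u)               # state solve
    p = np.linalg.solve(A, y - y_d)         # adjoint solve
    g = alpha * u + p                       # reduced gradient
    gnorm = np.sqrt(hm * np.dot(g, g))      # discrete L^2 norm
    if gnorm <= 1e-8:                       # stopping criterion
        break
    u = u - tau * g                         # gradient step

print(k, gnorm)
```

Each pass through the loop is exactly one state solve plus one adjoint solve, so the iteration count directly measures the PDE cost of the method.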

Every algorithmic ingredient of Lecture 3 now has a direct PDE interpretation: cost evaluations, gradients, and stopping tests all translate into state solves, adjoint solves, and $L^2$ norms.


Summary

In the linear elliptic distributed-control setting:

  1. the PDE defines a linear control-to-state map $S$;

  2. the constrained problem reduces to minimizing $f(u)=J(Su,u)$;

  3. the adjoint equation gives the reduced gradient $\nabla f(u)=\alpha u+p$;

  4. unconstrained optimality is the gradient equation $\alpha \bar u+\bar p=0$;
  5. every reduced-gradient iteration requires one state solve and one adjoint solve.

This is the basic computational pattern for PDE-constrained optimization.

For a first concrete discrete example, the repository now includes jupyterbook/codes/lecture04/poisson_1d_fd.py. It solves a trivial one-dimensional Poisson control problem by finite differences, checks the reduced gradient by finite differences, and performs a few reduced-gradient iterations. The reusable one-dimensional finite-difference utilities used there live in jupyterbook/codes/common/fd1d.py. For a richer JupyterBook example, see also jupyterbook/codes/lecture04/step_target_fd.ipynb, where the target state is a rectangular step function in $L^2(0,1)$ and the notebook generates both plots and a GIF animation of the state approaching the target.