
Reduced Formulation for Elliptic Optimal Control

University of Pisa

Overview

This is the first genuinely PDE-constrained lecture in the course. We take the optimization viewpoint of Lecture 3 and apply it to a standard linear elliptic optimal control problem:

  1. the PDE defines a control-to-state map $S$;

  2. the cost becomes a reduced functional $f(u)=J(Su,u)$;

  3. the adjoint equation gives the reduced gradient at the cost of one extra PDE solve;

  4. gradient-based methods from Lecture 3 now become PDE-based algorithms.

We focus on the distributed-control Poisson model because it contains all the structural ideas we need before constraints, discretization, and more advanced PDEs.


Model Problem

Let $\Omega\subset\mathbb R^d$ be a bounded domain. We consider the distributed control problem

$$
\min_{(y,u)} J(y,u) := \frac12\|y-y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2,
$$

subject to

$$
\begin{cases} -\Delta y = u & \text{in }\Omega,\\ y = 0 & \text{on }\partial\Omega, \end{cases}
$$

with given data $y_d\in L^2(\Omega)$ and $\alpha>0$.

Interpretation:

This is the PDE analogue of the linear-quadratic finite-dimensional problems seen before.


Weak Formulation of the State Equation

This is the first point where the PDE language changes. We start from the strong form

$$
\begin{cases} -\Delta y = u & \text{in }\Omega,\\ y = 0 & \text{on }\partial\Omega. \end{cases}
$$

Assume for a moment that $y$ is smooth. Take a test function $v$ that is also smooth and vanishes on the boundary. Multiply the PDE by $v$ and integrate over $\Omega$:

$$
\int_\Omega (-\Delta y)\,v\,dx = \int_\Omega u\,v\,dx.
$$

Now integrate by parts:

$$
\int_\Omega (-\Delta y)\,v\,dx = \int_\Omega \nabla y\cdot \nabla v\,dx - \int_{\partial\Omega}\frac{\partial y}{\partial n}\,v\,ds.
$$

Because the test function satisfies $v=0$ on $\partial\Omega$, the boundary term vanishes. So we obtain the identity

$$
\int_\Omega \nabla y\cdot \nabla v\,dx = \int_\Omega u\,v\,dx.
$$

This motivates the Sobolev setting:

For fixed $u\in L^2(\Omega)$, the weak formulation is: find $y\in H_0^1(\Omega)$ such that

$$
\int_\Omega \nabla y\cdot \nabla v\,dx = \int_\Omega u\,v\,dx \qquad \forall v\in H_0^1(\Omega).
$$

Introduce

$$
a(y,v):=\int_\Omega \nabla y\cdot \nabla v\,dx, \qquad \ell_u(v):=\int_\Omega u\,v\,dx.
$$

Then the problem reads

$$
a(y,v)=\ell_u(v)\qquad \forall v\in H_0^1(\Omega).
$$

Why does this have a unique solution?

The bilinear form $a$ is bounded and coercive on $H_0^1(\Omega)$ (coercivity follows from the Poincaré inequality), and $\ell_u$ is a bounded linear functional for $u\in L^2(\Omega)$. Lax-Milgram then gives a unique weak solution $y\in H_0^1(\Omega)$ for every $u\in L^2(\Omega)$.

This allows us to define the state equation as a linear map

$$
S:L^2(\Omega)\to H_0^1(\Omega),\qquad u\mapsto y.
$$

The distinction to keep in mind is that $S$ takes a control in $L^2(\Omega)$ and returns a state in the smaller space $H_0^1(\Omega)$. With elliptic regularity one often has more, but for the reduced formulation the key point is simply that $S$ is linear and bounded.
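Numerically, $S$ is simply "assemble a discretization of $-\Delta$ and solve". The sketch below is illustrative only (the grid size and helper names are ours, not those of the repository's `fd1d.py`); it realizes the discrete control-to-state map with second-order finite differences on $(0,1)$:

```python
import numpy as np

def poisson_matrix(n):
    """3-point finite-difference -Laplacian on (0,1) with zero Dirichlet BCs.

    n interior grid points, spacing h = 1/(n+1); illustrative helper.
    """
    h = 1.0 / (n + 1)
    return (np.diag(2.0 * np.ones(n))
            - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1)) / h**2

def solve_state(u):
    """Discrete control-to-state map S: solve -y'' = u, y(0) = y(1) = 0."""
    return np.linalg.solve(poisson_matrix(len(u)), u)

n = 99
x = np.linspace(0, 1, n + 2)[1:-1]     # interior grid points
y = solve_state(np.ones(n))            # state for the constant control u = 1
# exact solution of -y'' = 1, y(0) = y(1) = 0, is y(x) = x(1-x)/2
print(np.max(np.abs(y - x * (1 - x) / 2)))
```

Because $S$ is linear, the discrete map is just a matrix solve; for the quadratic exact solution above, the second-order scheme is exact up to roundoff.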


Reduced Cost Functional

Eliminate the state through the PDE:

$$
y=S(u).
$$

Then define

$$
f(u):=J(S(u),u) = \frac12\|S(u)-y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2.
$$

The PDE-constrained problem can now be written as

$$
\min_{u\in L^2(\Omega)} f(u).
$$

This is the infinite-dimensional version of the reduced formulation from Lecture 1.
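As a sanity check of the reduced viewpoint, $f$ can be evaluated with exactly one state solve per call. A minimal finite-difference sketch (grid size, target, and $\alpha$ are chosen arbitrarily for illustration):

```python
import numpy as np

def reduced_cost(u, y_d, alpha):
    """Evaluate f(u) = 1/2||S(u) - y_d||^2 + alpha/2||u||^2 on (0,1).

    One state solve per evaluation; L^2 norms are approximated by
    h * (sum of squares) at the interior grid points. Illustrative sketch.
    """
    n = len(u)
    h = 1.0 / (n + 1)
    A = (np.diag(2.0 * np.ones(n))
         - np.diag(np.ones(n - 1), 1)
         - np.diag(np.ones(n - 1), -1)) / h**2
    y = np.linalg.solve(A, u)               # y = S(u): the single state solve
    return 0.5 * h * np.sum((y - y_d)**2) + 0.5 * alpha * h * np.sum(u**2)

n = 50
x = np.linspace(0, 1, n + 2)[1:-1]
y_d = np.sin(np.pi * x)
f0 = reduced_cost(np.zeros(n), y_d, alpha=1e-2)
# u = 0 gives state y = 0, so f(0) = (1/2)||y_d||^2 = 1/4 for this y_d
print(f0)
```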

Two remarks: since $S$ is linear, $f$ is a quadratic functional of $u$ alone; and every evaluation of $f$ requires one state solve.

It is also useful to introduce the same problem in a fully operatorial form, because this makes the infinite-dimensional structure look exactly like a block linear system.

Let

$$
V:=H_0^1(\Omega), \qquad Q:=L^2(\Omega).
$$

We define the elliptic operator $A:V\to V'$ by

$$
\langle Ay,v\rangle_{V',V}:=\int_\Omega \nabla y\cdot \nabla v\,dx \qquad \forall y,v\in V.
$$

The control enters the state equation through the operator $B:Q\to V'$ defined by

$$
\langle Bu,v\rangle_{V',V}:=(u,v)_{Q} \qquad \forall u\in Q,\ \forall v\in V.
$$

Since the tracking term is measured in $Q=L^2(\Omega)$, we also introduce the observation embedding $C:V\to Q$, here simply

$$
Cy:=y,
$$

and its associated mass operator $M:=C^*C:V\to V'$:

$$
\langle My,v\rangle_{V',V}:=(y,v)_Q \qquad \forall y,v\in V.
$$

For the control cost, it is convenient to write the $L^2$ inner product through the Riesz map $R_Q:Q\to Q'$,

$$
\langle R_Q u,w\rangle_{Q',Q}:=(u,w)_Q \qquad \forall u,w\in Q.
$$

With this notation, the state equation is

$$
Ay - Bu = F \qquad \text{in }V',
$$

where $F\in V'$ is a given load. In the present model without an additional forcing term, one simply has $F=0$.

The cost functional can be written as

$$
J(y,u) = \frac12\langle M(y-y_d),y-y_d\rangle_{V',V} + \frac{\alpha}{2}\langle R_Q u,u\rangle_{Q',Q}.
$$

So the all-at-once infinite-dimensional problem is

$$
\min_{(y,u)\in V\times Q} J(y,u) \qquad\text{subject to}\qquad Ay-Bu=F \ \text{in }V'.
$$

Introduce the Lagrangian with multiplier $p\in V$:

$$
\mathcal L(y,u,p) := J(y,u) - \langle Ay-Bu-F,\,p\rangle_{V',V}.
$$

Its first-order conditions are

$$
\begin{aligned} Ay - Bu &= F &&\text{in }V',\\ My - A^*p &= My_d &&\text{in }V',\\ \alpha R_Q u + B^*p &= 0 &&\text{in }Q'. \end{aligned}
$$

This is already a saddle-point system: the unknown triple $(y,u,p)$ lives in $V\times Q\times V$, while the equations live in the dual product space $V'\times Q'\times V'$ after reordering the blocks. Indeed, the KKT system can be written as

$$
\begin{pmatrix} M & 0 & -A^*\\ 0 & \alpha R_Q & B^*\\ A & -B & 0 \end{pmatrix} \begin{pmatrix} y\\ u\\ p \end{pmatrix} = \begin{pmatrix} My_d\\ 0\\ F \end{pmatrix} \qquad \text{in } V'\times Q'\times V'.
$$

This is the infinite-dimensional analogue of a symmetric indefinite linear system: flipping the sign of the last block row makes the operator matrix symmetric, with the classical KKT saddle-point structure.

So even before discretization, PDE-constrained optimization already has the same algebraic structure as the block KKT systems that appear in finite-dimensional constrained optimization.
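A discrete analogue makes the block structure concrete. In the sketch below (a finite-difference discretization we choose purely for illustration; after dividing each block row by the mesh weight $h$, the discrete mass and Riesz operators both reduce to the identity), the KKT matrix mirrors the operator matrix above:

```python
import numpy as np

n = 60
h = 1.0 / (n + 1)
x = np.linspace(0, 1, n + 2)[1:-1]
A = (np.diag(2.0 * np.ones(n))              # discrete -Laplacian, A = A^T
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
I = np.eye(n)
y_d = x * (1 - x)
alpha = 1e-4

# Discrete KKT system, mirroring the operator matrix (M, R_Q -> I, B -> I):
#   [ I      0    -A ] [y]   [y_d]   (adjoint:  A p = y - y_d)
#   [ 0  alpha*I   I ] [u] = [ 0 ]   (gradient: alpha*u + p = 0)
#   [ A     -I     0 ] [p]   [ 0 ]   (state:    A y = u)
K = np.block([[I, np.zeros((n, n)), -A],
              [np.zeros((n, n)), alpha * I, I],
              [A, -I, np.zeros((n, n))]])
rhs = np.concatenate([y_d, np.zeros(n), np.zeros(n)])
y, u, p = np.split(np.linalg.solve(K, rhs), 3)

print(np.max(np.abs(alpha * u + p)), np.max(np.abs(A @ y - u)))
```

Solving the three blocks together like this is exactly the "all-at-once" approach; the reduced formulation instead eliminates $y$ and $p$.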


Existence and Uniqueness

In finite dimensions, existence follows from the Weierstrass principle: closed and bounded sets are compact, so minimizing sequences have convergent subsequences. That argument is no longer available in infinite-dimensional spaces, because closed and bounded sets in $L^2(\Omega)$ are generally not strongly compact.

So, in addition to convexity, we need three structural ingredients: coercivity of $f$, so that minimizing sequences are bounded; reflexivity of the underlying space, so that bounded sequences have weakly convergent subsequences; and weak lower semicontinuity of $f$, so that we can pass to the limit.

For our reduced functional, coercivity comes from the Tikhonov term:

$$
f(u) = \frac12\|S(u)-y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2 \ge \frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2.
$$

Hence every minimizing sequence is bounded in $L^2(\Omega)$.

To see this more concretely, let $(u_n)$ be a minimizing sequence. Then there exists a constant $C$ such that

$$
f(u_n)\le C \qquad \text{for all } n \text{ large enough}.
$$

Coercivity then implies

$$
\frac{\alpha}{2}\|u_n\|_{L^2(\Omega)}^2 \le f(u_n) \le C \quad\Longrightarrow\quad \|u_n\|_{L^2(\Omega)} \le R := \sqrt{\frac{2C}{\alpha}},
$$

so the minimizing sequence is bounded.

Since $L^2(\Omega)$ is reflexive, bounded sequences admit weakly convergent subsequences:

$$
u_n \rightharpoonup \bar u \qquad \text{in } L^2(\Omega).
$$

For minimization we do not need strong convergence of the whole sequence: we only need one convergent subsequence and a notion of lower semicontinuity compatible with that convergence. If $f$ is weakly lower semicontinuous, then

$$
f(\bar u)\le \liminf_{n\to\infty} f(u_n).
$$

Since $(u_n)$ is minimizing, the right-hand side is exactly $\inf f$. Hence

$$
f(\bar u)\le \inf f,
$$

which forces $f(\bar u)=\inf f$. So the weak limit of a minimizing subsequence is already a minimizer.

The state equation is linear and continuous, so the reduced functional is convex and weakly lower semicontinuous. This allows us to pass to the limit along a minimizing sequence and obtain existence of a minimizer.

Uniqueness comes from strict convexity. The reduced functional $f$ is strictly convex because the tracking term is convex (a convex quadratic composed with the affine map $u\mapsto S(u)-y_d$) and the Tikhonov term $\frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2$ is strictly convex for $\alpha>0$.

As a consequence, $f$ has a unique minimizer $\bar u\in L^2(\Omega)$.

This is the first major structural simplification of the linear-quadratic elliptic case: existence and uniqueness of the optimal control follow from convexity and coercivity alone, with no compactness argument needed.


Directional Derivative of the Reduced Cost

Let $u\in L^2(\Omega)$ and $h\in L^2(\Omega)$. We now compute the derivative of $f$ explicitly from the definition of the directional derivative:

$$
f'(u)h := \frac{d}{dt}f(u+th)\Big|_{t=0}.
$$

Since $S$ is linear,

$$
S(u+th)=S(u)+tS(h).
$$

Using this,

$$
\begin{aligned} f(u+th) &= \frac12\|S(u+th)-y_d\|_{L^2(\Omega)}^2 +\frac{\alpha}{2}\|u+th\|_{L^2(\Omega)}^2\\ &= \frac12\|S(u)+tS(h)-y_d\|_{L^2(\Omega)}^2 +\frac{\alpha}{2}\|u+th\|_{L^2(\Omega)}^2. \end{aligned}
$$

Expanding both squares gives

$$
\begin{aligned} f(u+th) &= f(u) + t\,(S(u)-y_d,\,S(h))_{L^2(\Omega)} + \alpha t\,(u,h)_{L^2(\Omega)}\\ &\quad + \frac{t^2}{2}\|S(h)\|_{L^2(\Omega)}^2 + \frac{\alpha t^2}{2}\|h\|_{L^2(\Omega)}^2. \end{aligned}
$$

Subtract $f(u)$ and divide by $t$:

$$
\frac{f(u+th)-f(u)}{t} = (S(u)-y_d,\,S(h))_{L^2(\Omega)} + \alpha(u,h)_{L^2(\Omega)} + \frac{t}{2}\|S(h)\|_{L^2(\Omega)}^2 + \frac{\alpha t}{2}\|h\|_{L^2(\Omega)}^2.
$$

Passing to the limit $t\to 0$, we obtain

$$
f'(u)h = (S(u)-y_d,\,S(h))_{L^2(\Omega)} + \alpha(u,h)_{L^2(\Omega)}.
$$

This formula is correct, but not computationally convenient: if we need to evaluate $f'(u)h$ along many directions $h$, we would need one extra state solve for every direction.
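The cost of this formula is easy to see in a discrete sketch (all sizes and data below are illustrative): evaluating $f'(u)h$ requires one extra solve $S(h)$ per direction, and the result can be cross-checked against a difference quotient.

```python
import numpy as np

n = 40
hm = 1.0 / (n + 1)                          # mesh size (named hm to keep h free)
A = (np.diag(2.0 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / hm**2
x = np.linspace(0, 1, n + 2)[1:-1]
y_d = np.sin(np.pi * x)
alpha = 1e-2
ip = lambda a, b: hm * np.dot(a, b)         # discrete L^2(0,1) inner product
S = lambda u: np.linalg.solve(A, u)         # discrete control-to-state map

def f(u):
    y = S(u)
    return 0.5 * ip(y - y_d, y - y_d) + 0.5 * alpha * ip(u, u)

rng = np.random.default_rng(0)
u = rng.standard_normal(n)
hdir = rng.standard_normal(n)               # a direction h

# f'(u)h from the formula: costs one extra state solve S(hdir)
deriv = ip(S(u) - y_d, S(hdir)) + alpha * ip(u, hdir)

t = 1e-6                                    # central difference quotient for comparison
fd = (f(u + t * hdir) - f(u - t * hdir)) / (2 * t)
print(abs(deriv - fd))
```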

The adjoint equation removes this difficulty.


Adjoint Equation

At this point we know that

$$
f'(u)h=(y-y_d,\,S(h))_{L^2(\Omega)}+\alpha(u,h)_{L^2(\Omega)}.
$$

The difficulty is the first term: the direction $h$ appears only indirectly, through the state variation $S(h)$.

This is exactly where the adjoint enters. We would like to rewrite

$$
(y-y_d,\,S(h))_{L^2(\Omega)} \quad\Longrightarrow\quad (S^*(y-y_d),\,h)_{L^2(\Omega)},
$$

as an expression where $h$ appears explicitly. This is done in two solves: given a control $u$, we first solve the state equation to get $y=S(u)$, then we solve one adjoint PDE to obtain $p=S^*(y-y_d)$; the adjoint state $p$ lets us rewrite the inner product with $S(h)$ as an inner product with $h$.

More concretely, we look for $p\in H_0^1(\Omega)$ such that for every test function $v\in H_0^1(\Omega)$,

$$
\int_\Omega \nabla v\cdot \nabla p\,dx = \int_\Omega (y-y_d)\,v\,dx.
$$

Notice the change of roles of $v$ and $p$ compared to the state equation. In this particular example the bilinear form is symmetric, so the change of roles is not visible, but in general the adjoint bilinear form is different from the original one, and the test function $v$ is now associated with the adjoint state $p$.

Notice the analogy with the state equation: the same coercive bilinear form appears, now driven by the residual $y-y_d$ instead of the control.

We define the adjoint state $p\in H_0^1(\Omega)$ by

$$
\int_\Omega \nabla v \cdot \nabla p\,dx = \int_\Omega (y-y_d)\,v\,dx \qquad \forall v\in H_0^1(\Omega),
$$

where $y=S(u)$ is the state associated with the current control $u$.

Since in this case the adjoint bilinear form coincides with the original coercive form, Lax-Milgram applies again and gives existence and uniqueness of pp.

If $p$ is smooth enough, the corresponding strong form is

$$
\begin{cases} -\Delta p = y-y_d & \text{in }\Omega,\\ p = 0 & \text{on }\partial\Omega. \end{cases}
$$

The adjoint PDE is governed by the adjoint operator. For the Poisson problem this is again $-\Delta$, because the operator is self-adjoint, but this coincidence should be viewed as a special feature of the present model, not as the general rule.
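In the discrete setting, self-adjointness of the solution operator is a one-line check (matrix size and data are illustrative): with a symmetric discrete Laplacian, $(Su,v)_{L^2} = (u,Sv)_{L^2}$ holds up to roundoff.

```python
import numpy as np

n = 30
h = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n))              # symmetric discrete -Laplacian
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
S = np.linalg.inv(A)                        # discrete solution operator

rng = np.random.default_rng(2)
u = rng.standard_normal(n)
v = rng.standard_normal(n)

# (S u, v) = (u, S v) in the discrete L^2 inner product: S is self-adjoint
gap = abs(h * np.dot(S @ u, v) - h * np.dot(u, S @ v))
print(gap)
```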

Conceptually: the adjoint state collects the sensitivity of the tracking term with respect to the state, so that a single adjoint solve replaces one extra state solve per direction $h$.


Reduced Gradient Formula

From what we have so far, the directional derivative of the reduced cost is

$$
f'(u)h=(y-y_d,\,S(h))_{L^2(\Omega)}+\alpha(u,h)_{L^2(\Omega)},
$$

which we rewrite by moving $S$ to the other side of the inner product, i.e. $(y-y_d,S(h))_{L^2(\Omega)}=(S^*(y-y_d),h)_{L^2(\Omega)}$, and defining the adjoint state $p=S^*(y-y_d)$.

Thus the reduced gradient is

$$
\nabla f(u)=p+\alpha u \qquad \text{in }L^2(\Omega).
$$

This is the central formula of the lecture.
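The formula turns gradient evaluation into two PDE solves. The following sketch (illustrative discretization and data) computes $\nabla f(u)=p+\alpha u$ and verifies it against a difference quotient of the reduced cost:

```python
import numpy as np

n = 40
hm = 1.0 / (n + 1)
A = (np.diag(2.0 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / hm**2
x = np.linspace(0, 1, n + 2)[1:-1]
y_d = np.sin(np.pi * x)
alpha = 1e-2

def grad_f(u):
    """Reduced gradient: one state solve + one adjoint solve."""
    y = np.linalg.solve(A, u)               # state:   -y'' = u
    p = np.linalg.solve(A, y - y_d)         # adjoint: -p'' = y - y_d  (A = A^T)
    return p + alpha * u

def f(u):
    y = np.linalg.solve(A, u)
    return 0.5 * hm * np.sum((y - y_d)**2) + 0.5 * alpha * hm * np.sum(u**2)

rng = np.random.default_rng(1)
u = rng.standard_normal(n)
hdir = rng.standard_normal(n)

t = 1e-6
fd = (f(u + t * hdir) - f(u - t * hdir)) / (2 * t)
err = abs(hm * np.dot(grad_f(u), hdir) - fd)   # (grad f, h)_{L^2} vs quotient
print(err)
```

Note that the cost of `grad_f` is independent of how many directions we later pair the gradient with: that is exactly the gain over the direct formula.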


First-Order Optimality System

For the unconstrained problem, the optimal control $\bar u$ satisfies

$$
f'(\bar u)h=0 \qquad \forall h\in L^2(\Omega),
$$

that is,

$$
(\nabla f(\bar u),h)_{L^2(\Omega)}=0 \qquad \forall h\in L^2(\Omega).
$$

The only element of $L^2(\Omega)$ orthogonal to all test directions $h$ is the zero element, so this is equivalent to

$$
\nabla f(\bar u)=0.
$$

Using the gradient formula derived above, we obtain

$$
\nabla f(\bar u)=\bar p+\alpha \bar u=0.
$$

The optimality system is then

$$
\begin{cases} -\Delta \bar y = \bar u & \text{in }\Omega,\\ \bar y = 0 & \text{on }\partial\Omega,\\[0.3em] -\Delta \bar p = \bar y-y_d & \text{in }\Omega,\\ \bar p = 0 & \text{on }\partial\Omega,\\[0.3em] \alpha \bar u + \bar p = 0 & \text{in }\Omega. \end{cases}
$$

Eliminating $\bar u$ gives the explicit formula

$$
\bar u = -\frac{1}{\alpha}\bar p.
$$

This is the PDE version of the finite-dimensional gradient equation.
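Numerically, this elimination can be exploited directly: substituting $\bar u=-\bar p/\alpha$ into the state equation leaves a $2\times 2$ block system in $(\bar y,\bar p)$. A sketch under an illustrative discretization:

```python
import numpy as np

n = 60
h = 1.0 / (n + 1)
x = np.linspace(0, 1, n + 2)[1:-1]
A = (np.diag(2.0 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2
I = np.eye(n)
y_d = np.sin(np.pi * x)
alpha = 1e-3

# Eliminate u = -p/alpha:
#   A y + p/alpha = 0    (state equation with u substituted)
#   A p - y = -y_d       (adjoint equation)
K = np.block([[A, I / alpha],
              [-I, A]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(n), -y_d]))
y, p = sol[:n], sol[n:]
u = -p / alpha

# independent check: recompute state and adjoint from u by two solves;
# the reduced gradient must vanish at the optimum
y2 = np.linalg.solve(A, u)
p2 = np.linalg.solve(A, y2 - y_d)
print(np.max(np.abs(alpha * u + p2)))
```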


Algorithmic Interpretation

To evaluate the reduced gradient at a control $u_k$:

  1. solve the state equation for $y_k$;

  2. solve the adjoint equation for $p_k$;

  3. form the gradient $g_k = \alpha u_k + p_k$.

Then a gradient step reads

$$
u_{k+1}=u_k-\tau_k g_k,
$$

with $\tau_k$ chosen by exact line search, Armijo backtracking, or another strategy from Lecture 3.

So one iteration of PDE-constrained gradient descent means: one state solve, one adjoint solve, and one update of the control function.

This is the computational meaning of the reduced formulation.


Reduced Gradient Algorithm

A basic reduced-gradient method is:

  1. choose $u_0\in L^2(\Omega)$;

  2. for $k=0,1,2,\dots$:

    • solve the state equation for $y_k=S(u_k)$;

    • solve the adjoint equation for $p_k$;

    • compute $g_k=\alpha u_k+p_k$;

    • choose $\tau_k>0$;

    • update $u_{k+1}=u_k-\tau_k g_k$.

Stopping criteria are the same as before, now written in function space, e.g.

$$
\|g_k\|_{L^2(\Omega)}\le \varepsilon.
$$
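Putting the pieces together, here is a complete (illustrative) discrete instance of the reduced-gradient method; the fixed step size is hand-picked for this specific problem, whereas in practice one would use a line search as in Lecture 3:

```python
import numpy as np

n = 60
hm = 1.0 / (n + 1)
x = np.linspace(0, 1, n + 2)[1:-1]
A = (np.diag(2.0 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / hm**2
y_d = x * (1 - x)
alpha = 1e-2

u = np.zeros(n)                             # u_0 = 0
tau = 60.0                                  # fixed step, tuned by hand here
for k in range(200):
    y = np.linalg.solve(A, u)               # state solve
    p = np.linalg.solve(A, y - y_d)         # adjoint solve
    g = alpha * u + p                       # reduced gradient
    gnorm = np.sqrt(hm * np.dot(g, g))      # discrete L^2 norm
    if gnorm <= 1e-8:                       # stopping criterion
        break
    u = u - tau * g                         # gradient step

print(k, gnorm)
```

Each pass through the loop is exactly one state solve plus one adjoint solve, so the iteration count directly measures the PDE cost of the method.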

Every algorithmic ingredient of Lecture 3 now has a direct PDE interpretation: cost evaluations, gradients, and stopping tests all translate into state solves, adjoint solves, and $L^2$ norms.


Summary

In the linear elliptic distributed-control setting:

  1. the PDE defines a linear control-to-state map $S$;

  2. the constrained problem reduces to minimizing $f(u)=J(Su,u)$;

  3. the adjoint equation gives the reduced gradient $\nabla f(u)=\alpha u+p$;

  4. unconstrained optimality is the gradient equation $\alpha \bar u+\bar p=0$;
  5. every reduced-gradient iteration requires one state solve and one adjoint solve.

This is the basic computational pattern for PDE-constrained optimization.

For a first concrete discrete example, the repository now includes jupyterbook/codes/lecture04/poisson_1d_fd.py. It solves a trivial one-dimensional Poisson control problem by finite differences, checks the reduced gradient by finite differences, and performs a few reduced-gradient iterations. The reusable one-dimensional finite-difference utilities used there live in jupyterbook/codes/common/fd1d.py. For a richer JupyterBook example, see also jupyterbook/codes/lecture04/step_target_fd.ipynb, where the target state is a rectangular step function in $L^2(0,1)$ and the notebook generates both plots and a GIF animation of the state approaching the target.