Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Nonsmooth PDE-Constrained Optimization

University of Pisa

In the previous lectures we focused mainly on smooth PDE-constrained optimization, where the reduced functional is differentiable and the optimality system can be handled with standard Newton or SQP techniques.

Many relevant applications, however, are inherently nonsmooth. The nonsmoothness may enter through the cost functional, through inequality constraints, or through the state equation itself.

Typical examples include:

The goal of this lecture is to introduce the mathematical structure and the numerical treatment of these nonsmooth PDE-constrained optimization problems.

We focus on three model classes:

  1. sparse optimal control with L1L^1 regularization;

  2. pointwise state constraints;

  3. variational inequality constraints.

The presentation follows Chapter 6 of De los Reyes and connects naturally with the KKT, PDAS, and semismooth Newton ideas developed in the previous lectures.

Throughout the lecture, let ΩRd\Omega\subset\mathbb R^d be a bounded Lipschitz polyhedral domain.

Sources of Nonsmoothness

There are three main sources of nonsmoothness in PDE-constrained optimization.

Functional nonsmoothness

The objective functional may contain nonsmooth terms.

Typical examples are

uL1(Ω)\|u\|_{L^1(\Omega)}

for sparse control, or

yydL1(Ω)\|y-y_d\|_{L^1(\Omega)}

for robust data fitting.


Constraint nonsmoothness

The admissible set may contain inequality constraints such as

uauubu_a\le u\le u_b

or

yyb.y\le y_b.

These constraints generate complementarity systems.


Structural nonsmoothness

The PDE itself may be nonsmooth.

Examples include:


Why Nonsmooth Optimization?

In classical linear-quadratic control we minimize

J(y,u)=12yydL2(Ω)2+α2uL2(Ω)2J(y,u) = \frac12\|y-y_d\|_{L^2(\Omega)}^2 + \frac\alpha2\|u\|_{L^2(\Omega)}^2

subject to

Ay=u.Ay=u.

The quadratic control term is smooth and strictly convex.

As a consequence:

However, many applications require structural properties that are not promoted by an L2L^2 penalty.

For instance:

A standard mechanism for promoting sparsity is replacing the quadratic control penalty by an L1L^1 term:

βuL1(Ω).\beta \|u\|_{L^1(\Omega)}.

The L1L^1 norm is convex but not differentiable.

Its subdifferential contains set-valued regions:

x={{1},x>0,[1,1],x=0,{1},x<0.\partial |x| = \begin{cases} \{1\}, & x>0,\\ [-1,1], & x=0,\\ \{-1\}, & x<0. \end{cases}

The loss of smoothness fundamentally changes:

Another source of nonsmoothness comes from variational inequalities.

Obstacle problems naturally lead to complementarity systems of the form

yψ,Ayf0,(yψ)(Ayf)=0.y\ge \psi, \qquad Ay-f\ge 0, \qquad (y-\psi)(Ay-f)=0.

The active set changes discontinuously.

Consequently the solution operator is typically not Fréchet differentiable.

Nevertheless, these problems still possess strong structure that can be exploited numerically.


Sparse Optimal Control

Problem formulation

Consider the elliptic control problem

min(y,u)J(y,u):=12yydL2(Ω)2+α2uL2(Ω)2+βuL1(Ω)\min_{(y,u)} J(y,u) := \frac12\|y-y_d\|_{L^2(\Omega)}^2 + \frac\alpha2\|u\|_{L^2(\Omega)}^2 + \beta\|u\|_{L^1(\Omega)}

subject to

{Δy=u+fin Ω,y=0on Ω.\begin{cases} -\Delta y = u+f & \text{in }\Omega,\\ y=0 & \text{on }\partial\Omega. \end{cases}

Here:

The admissible set may additionally include box constraints:

Uad={uL2(Ω):uauub}.U_{ad} = \{u\in L^2(\Omega): u_a\le u\le u_b\}.

The state equation defines a bounded linear control-to-state map

S:L2(Ω)H01(Ω)H2(Ω),y=S(u).S:L^2(\Omega)\to H_0^1(\Omega)\cap H^2(\Omega), \qquad y=S(u).

The reduced functional becomes

f(u)=12S(u)ydL2(Ω)2+α2uL2(Ω)2+βuL1(Ω).f(u) = \frac12\|S(u)-y_d\|_{L^2(\Omega)}^2 + \frac\alpha2\|u\|_{L^2(\Omega)}^2 + \beta\|u\|_{L^1(\Omega)}.

The nonsmooth term is

uL1(Ω)=Ωu(x)dx.\|u\|_{L^1(\Omega)} = \int_\Omega |u(x)|\,dx.

Existence of solutions

The existence proof follows the direct method of the calculus of variations.

The functional is:

Indeed,

f(u)α2uL2(Ω)2.f(u) \ge \frac\alpha2\|u\|_{L^2(\Omega)}^2.

Thus every minimizing sequence is bounded in L2(Ω)L^2(\Omega).

Using weak compactness, we obtain

unuˉin L2(Ω).u_n \rightharpoonup \bar u \qquad \text{in }L^2(\Omega).

Since the state equation is linear and continuous,

S(un)S(uˉ).S(u_n)\rightharpoonup S(\bar u).

Convexity and lower semicontinuity of the L1L^1 norm imply

uˉL1lim infnunL1.\|\bar u\|_{L^1} \le \liminf_{n\to\infty} \|u_n\|_{L^1}.

Hence uˉ\bar u is optimal.

If α>0\alpha>0, strict convexity of the quadratic term yields uniqueness.


Subdifferentials

The classical derivative is no longer sufficient and we replace it with the notion of subdifferential, which we recall here:

Let XX be a Banach space and let

Φ:XR\Phi:X\to \mathbb R

be convex.

The subdifferential of Φ\Phi at uu is

Φ(u)={λX:Φ(v)Φ(u)+λ,vu vX}.\partial \Phi(u) = \left\{ \lambda\in X^*: \Phi(v)\ge \Phi(u)+\langle \lambda,v-u\rangle \ \forall v\in X \right\}.

Elements of Φ(u)\partial\Phi(u) are called subgradients.

If Φ\Phi is differentiable, then

Φ(u)={Φ(u)}.\partial\Phi(u)=\{\Phi'(u)\}.

Thus the subdifferential generalizes the derivative.


Subdifferential of the L1L^1 norm

Define

Φ(u)=uL1(Ω).\Phi(u)=\|u\|_{L^1(\Omega)}.

Then

λΦ(u)\lambda\in \partial \Phi(u)

if and only if

λ(x)={1,u(x)>0,[1,1],u(x)=0,1,u(x)<0,\lambda(x) = \begin{cases} 1, & u(x)>0,\\ [-1,1], & u(x)=0,\\ -1, & u(x)<0, \end{cases}

almost everywhere.

Equivalently,

λL(Ω),λL1,\lambda\in L^\infty(\Omega), \qquad \|\lambda\|_{L^\infty}\le 1,

and

λ(x)=sign(u(x))where u(x)0.\lambda(x)=\operatorname{sign}(u(x)) \quad \text{where }u(x)\ne 0.

The subdifferential is therefore set-valued exactly at points where u=0u=0.

This is the mathematical mechanism that promotes sparsity.


Optimality Conditions for Sparse Control

Let

uˉ\bar u

be an optimal control and

yˉ=S(uˉ).\bar y=S(\bar u).

The adjoint state satisfies

{Δpˉ=yˉydin Ω,pˉ=0on Ω.\begin{cases} -\Delta \bar p = \bar y-y_d & \text{in }\Omega,\\ \bar p=0 & \text{on }\partial\Omega. \end{cases}

The reduced smooth part has derivative

fs(u)v=(αu+p,v)L2(Ω).f_s'(u)v = (\alpha u+p,v)_{L^2(\Omega)}.

The optimality condition becomes

0αuˉ+pˉ+βuˉL1.0 \in \alpha \bar u + \bar p + \beta\partial \|\bar u\|_{L^1}.

Hence there exists

λˉuˉL1\bar \lambda\in \partial\|\bar u\|_{L^1}

such that

αuˉ+pˉ+βλˉ=0.\alpha \bar u + \bar p + \beta \bar\lambda =0.

Pointwise,

uˉ(x)=0\bar u(x)=0

whenever

pˉ(x)β.|\bar p(x)|\le \beta.

This is the key sparsity relation. Notice that α\alpha does not appear in this threshold condition: when uˉ(x)=0\bar u(x)=0, the term αuˉ(x)\alpha \bar u(x) vanishes, and the inclusion reduces to the requirement that pˉ(x)β[1,1]-\bar p(x)\in \beta[-1,1].

The adjoint variable directly determines the inactive region.


A Hierarchy of Numerical Methods

The optimality condition

0αuˉ+pˉ+βuˉL10 \in \alpha \bar u + \bar p + \beta\partial \|\bar u\|_{L^1}

already suggests a hierarchy of numerical methods of increasing complexity.

At the most basic level, one can work directly with subgradients and obtain globally defined first-order iterations. A more structured strategy is to exploit the splitting between the smooth reduced functional and the nonsmooth L1L^1 term, leading to proximal algorithms. Finally, if one fully exploits the piecewise smooth structure of the optimality system, one arrives at semismooth Newton and primal-dual active set methods, which are more involved but locally much faster.

This gives the progression:

  1. subgradient descent: simplest and most robust, but slowest;

  2. proximal methods: still first-order, but much more effective for L1L^1 terms;

  3. semismooth Newton / PDAS: most structured and locally most efficient.


Subgradient Descent

The simplest way to attack the reduced problem is to use the subdifferential directly.

Let

f(u)=g(u)+βuL1(Ω),f(u)=g(u)+\beta\|u\|_{L^1(\Omega)},

where

g(u)=12S(u)ydL2(Ω)2+α2uL2(Ω)2.g(u) = \frac12\|S(u)-y_d\|_{L^2(\Omega)}^2 + \frac\alpha2\|u\|_{L^2(\Omega)}^2.

Since gg is differentiable and uL1\|u\|_{L^1} is convex, a subgradient of ff at uku_k is

ξk=g(uk)+βλk,λkukL1.\xi_k = \nabla g(u_k)+\beta\lambda_k, \qquad \lambda_k\in \partial \|u_k\|_{L^1}.

The subgradient iteration reads

uk+1=ukτkξk.u_{k+1}=u_k-\tau_k \xi_k.

For the sparse control problem this becomes

uk+1=ukτk(αuk+pk+βλk),u_{k+1} = u_k-\tau_k\bigl(\alpha u_k+p_k+\beta\lambda_k\bigr),

where pkp_k is the adjoint state associated with uku_k.

The method is easy to state and globally meaningful in the convex setting, but it has two important drawbacks:

For this reason, subgradient descent is conceptually important, but in sparse PDE-constrained optimization it is often outperformed by proximal methods.


Proximal Point Method

Instead of following an arbitrary subgradient, one can regularize the nonsmooth problem at each step by adding a quadratic term. This is the idea of the proximal point method.

Consider a convex functional

Φ:UR{+}.\Phi:U\to \mathbb R\cup\{+\infty\}.

Given τk>0\tau_k>0 and an iterate uku_k, the proximal point method defines uk+1u_{k+1} as the minimizer of

uk+1argminuU{Φ(u)+12τkuukU2}.u_{k+1} \in \operatorname*{argmin}_{u\in U} \left\{ \Phi(u) + \frac{1}{2\tau_k}\|u-u_k\|_U^2 \right\}.

The quadratic term makes the subproblem strongly convex and stabilizes the iteration.

The optimality condition for the subproblem is

0Φ(uk+1)+1τk(uk+1uk),0\in \partial \Phi(u_{k+1}) + \frac{1}{\tau_k}(u_{k+1}-u_k),

or equivalently

ukuk+1τkΦ(uk+1).u_k-u_{k+1}\in \tau_k \partial \Phi(u_{k+1}).

This motivates the definition of the proximal map:

proxτΦ(v):=argminuU{Φ(u)+12τuvU2}.\operatorname{prox}_{\tau \Phi}(v) := \operatorname*{argmin}_{u\in U} \left\{ \Phi(u)+\frac{1}{2\tau}\|u-v\|_U^2 \right\}.

With this notation, the proximal point iteration reads

uk+1=proxτkΦ(uk).u_{k+1}=\operatorname{prox}_{\tau_k\Phi}(u_k).

Compared with subgradient descent, this is more implicit and usually more stable. However, if applied directly to the full reduced functional, each step may still be expensive because it involves a nontrivial nonsmooth minimization subproblem.


Proximal Gradient Method

The sparse control problem has the structure

f(u)=g(u)+h(u),f(u)=g(u)+h(u),

where

g(u)=12S(u)ydL2(Ω)2+α2uL2(Ω)2g(u) = \frac12\|S(u)-y_d\|_{L^2(\Omega)}^2 + \frac\alpha2\|u\|_{L^2(\Omega)}^2

is smooth, while

h(u)=βuL1(Ω)h(u)=\beta\|u\|_{L^1(\Omega)}

is convex but nonsmooth.

This splitting leads naturally to the proximal gradient method, also known as forward-backward splitting.

Given a step size τk>0\tau_k>0, one first takes a gradient step for the smooth part and then a proximal step for the nonsmooth part:

uk+1=proxτkh(ukτkg(uk)).u_{k+1} = \operatorname{prox}_{\tau_k h} \bigl( u_k-\tau_k \nabla g(u_k) \bigr).

For the reduced sparse control problem, the gradient of the smooth part is

g(u)=αu+p(u),\nabla g(u)=\alpha u + p(u),

where p(u)p(u) is the adjoint state associated with uu.

Hence the iteration becomes

uk+1=proxτkβL1(ukτk(αuk+pk)).u_{k+1} = \operatorname{prox}_{\tau_k \beta \|\cdot\|_{L^1}} \bigl( u_k-\tau_k(\alpha u_k+p_k) \bigr).

The crucial point is that the proximal map of the L1L^1 norm is explicit and coincides with soft-thresholding:

proxτβL1(v)(x)=soft(v(x),τβ),\operatorname{prox}_{\tau\beta \|\cdot\|_{L^1}}(v)(x) = \operatorname{soft}(v(x),\tau\beta),

where

soft(s,η)={sη,s>η,0,sη,s+η,s<η.\operatorname{soft}(s,\eta) = \begin{cases} s-\eta, & s>\eta,\\ 0, & |s|\le \eta,\\ s+\eta, & s<-\eta. \end{cases}

Therefore each proximal gradient step consists of:

  1. solving the state equation and the adjoint equation to compute g(uk)\nabla g(u_k);

  2. taking a gradient step for the smooth reduced functional;

  3. applying pointwise soft-thresholding.

Unlike plain subgradient descent, proximal gradient methods exploit the exact structure of the nonsmooth term and are therefore much more effective for sparse control problems. They remain first-order methods, so they are robust and relatively easy to implement, but their convergence is still slower than that of Newton-type methods.


Projection Formula and Soft Thresholding

Suppose there are no box constraints.

Then the optimality condition yields the explicit formula

uˉ(x)=1αsoft(pˉ(x),β),\bar u(x) = -\frac1\alpha \operatorname{soft}(\bar p(x),\beta),

where

soft(p,β)={pβ,p>β,0,pβ,p+β,p<β.\operatorname{soft}(p,\beta) = \begin{cases} p-\beta, & p>\beta,\\ 0, & |p|\le \beta,\\ p+\beta, & p<-\beta. \end{cases}

This is called the soft-thresholding operator.

Observe the contrast with classical quadratic control:

uˉ=1αpˉ.\bar u=-\frac1\alpha \bar p.

The L1L^1 term creates a dead zone:

pβ    u=0.|p|\le \beta \implies u=0.

The parameter α\alpha acts outside this dead zone: it does not determine whether the control vanishes, but it scales the magnitude of uu when p>β|p|>\beta.

Hence large portions of the domain may contain exactly vanishing controls.


Semismoothness

The mapping

soft(p,β)\operatorname{soft}(p,\beta)

is not differentiable in the classical sense.

Nevertheless, it is semismooth.

Semismoothness is weaker than Fréchet differentiability but strong enough to obtain superlinear Newton convergence. We recall the semismooth Newton iteration that reads:

Gkδuk=F(uk),G_k \delta u_k = -F(u_k),

where

GkCF(uk).G_k\in \partial_C F(u_k).

Then

uk+1=uk+δuk.u_{k+1}=u_k+\delta u_k.

This is a much more sophisticated strategy than subgradient or proximal methods. It requires more structure, but in return it offers local superlinear convergence once the iteration enters the correct regime.

For sparse control problems, semismooth Newton methods are most naturally derived from an equivalent slack-variable reformulation of the L1L^1 term.


Slack-Variable Reformulation

Introduce an auxiliary variable wL2(Ω)w\in L^2(\Omega) and rewrite the sparse control problem as

min(y,u,w)12yydL2(Ω)2+α2uL2(Ω)2+βΩwdx\min_{(y,u,w)} \frac12\|y-y_d\|_{L^2(\Omega)}^2 + \frac\alpha2\|u\|_{L^2(\Omega)}^2 + \beta\int_\Omega w\,dx

subject to

{Δy=u+fin Ω,y=0on Ω,\begin{cases} -\Delta y = u+f & \text{in }\Omega,\\ y=0 & \text{on }\partial\Omega, \end{cases}

and the pointwise inequalities

uw0,uw0.u-w\le 0, \qquad -u-w\le 0.

These constraints are equivalent to uw|u|\le w, so at the optimum one must have w=uw=|u| and the reformulated problem is equivalent to the original one. Indeed, for any fixed pair (y,u)(y,u) satisfying the state equation, the variable ww appears in the objective only through the linear term

βΩwdx,\beta\int_\Omega w\,dx,

with β>0\beta>0. Therefore the minimization always tries to make ww as small as possible. But the constraints require

w(x)u(x)for a.e. xΩ.w(x)\ge |u(x)| \qquad \text{for a.e. }x\in\Omega.

Hence the smallest admissible choice is precisely

w(x)=u(x)for a.e. xΩ.w(x)=|u(x)| \qquad \text{for a.e. }x\in\Omega.

If on a set of positive measure one had w>uw>|u|, then one could decrease ww on that set without violating the constraints, strictly reduce the objective, and thus contradict optimality.

The advantage is that the nonsmooth L1L^1 term has disappeared from the objective and has been replaced by smooth constraints with complementarity conditions.


KKT System for the Slack Formulation

For the reformulated problem, the Lagrangian is

L(y,u,w,p,λ+,λ)=12yydL2(Ω)2+α2uL2(Ω)2+βΩwdxΩypdx+Ω(u+f)pdx+Ωλ+(uw)dx+Ωλ(uw)dx.\begin{split} \mathcal L(y,u,w,p,\lambda^+,\lambda^-) = & \frac12\|y-y_d\|_{L^2(\Omega)}^2 + \frac\alpha2\|u\|_{L^2(\Omega)}^2 + \beta\int_\Omega w\,dx \\ & - \int_\Omega \nabla y\cdot\nabla p\,dx + \int_\Omega (u+f)p\,dx \\ & + \int_\Omega \lambda^+(u-w)\,dx + \int_\Omega \lambda^-(-u-w)\,dx . \end{split}

Here:

The KKT conditions are

{Δy=u+f,Δp=yyd,αu+p+λ+λ=0,βλ+λ=0,λ+0,λ0,uw0,uw0,λ+(uw)=0,λ(uw)=0.\begin{cases} -\Delta y = u+f,\\ -\Delta p = y-y_d,\\ \alpha u + p + \lambda^+ - \lambda^- = 0,\\ \beta - \lambda^+ - \lambda^- = 0,\\ \lambda^+\ge 0,\quad \lambda^-\ge 0,\\ u-w\le 0,\quad -u-w\le 0,\\ \lambda^+(u-w)=0,\quad \lambda^-(-u-w)=0. \end{cases}

This is now a genuine complementarity system. In particular, the control stationarity condition is an equality, and the multipliers are classical KKT multipliers.


Interpretation of the Complementarity Conditions

The slack formulation makes the structure of the sparse solution transparent.

If u>0u>0, then necessarily w=uw=u, the constraint uw0u-w\le 0 is active, while uw<0-u-w<0 is inactive. Hence

λ+=β,λ=0,\lambda^+=\beta, \qquad \lambda^-=0,

and therefore

αu+p+β=0.\alpha u + p + \beta = 0.

If u<0u<0, then w=uw=-u, the second constraint is active, and one gets

λ+=0,λ=β,\lambda^+=0, \qquad \lambda^-=\beta,

so that

αu+pβ=0.\alpha u + p - \beta = 0.

If u=0u=0, then w=0w=0 and both constraints are active. In this case

λ++λ=β,\lambda^+ + \lambda^- = \beta,

and

p=(λ+λ),p = -(\lambda^+ - \lambda^-),

which implies

pβ.|p|\le \beta.

Thus the slack-variable KKT system recovers exactly the same threshold condition as the subdifferential formulation:

u=0pβ.u=0 \qquad \Longleftrightarrow \qquad |p|\le \beta.

Semismooth Newton and PDAS for the Slack Formulation

The KKT system above is well suited for semismooth Newton and primal-dual active set methods because the nonsmoothness is now entirely encoded in the complementarity relations.

It is convenient to work with the three regions

P:={x:u(x)>0},N:={x:u(x)<0},Z:={x:u(x)=0}.\mathcal P := \{x:u(x)>0\}, \qquad \mathcal N := \{x:u(x)<0\}, \qquad \mathcal Z := \{x:u(x)=0\}.

On these regions, the KKT conditions reduce to simple relations:

λ+=β, λ=0on P,\lambda^+=\beta,\ \lambda^-=0 \quad \text{on }\mathcal P,
λ+=0, λ=βon N,\lambda^+=0,\ \lambda^-=\beta \quad \text{on }\mathcal N,

and

u=0, w=0, λ++λ=βon Z.u=0,\ w=0,\ \lambda^+ + \lambda^-=\beta \quad \text{on }\mathcal Z.

A PDAS iteration then proceeds by:

  1. identifying the current positive, negative, and zero regions;

  2. freezing the corresponding complementarity relations on each region;

  3. solving the resulting linear state-adjoint-control-multiplier system.

This is exactly the active-set counterpart of a semismooth Newton step for the slack-variable KKT system. The method is more elaborate than proximal methods, but it exploits the complementarity structure directly and therefore achieves local superlinear convergence under suitable assumptions.


Direct Dual Multiplier Formulation

There is another convenient way to write the sparse optimality system, closer to the earlier subdifferential formulation and more compact than the slack-variable KKT system.

Starting from

0αu+p+βuL1,0\in \alpha u + p + \beta \partial \|u\|_{L^1},

we introduce the dual variable

λβuL1.\lambda \in \beta \partial \|u\|_{L^1}.

Then the control stationarity condition becomes simply

αu+p+λ=0.\alpha u + p + \lambda = 0.

Pointwise, this means

λ(x)βu(x)={{β},u(x)>0,[β,β],u(x)=0,{β},u(x)<0.\lambda(x)\in \beta \partial |u(x)| = \begin{cases} \{\beta\}, & u(x)>0,\\ [-\beta,\beta], & u(x)=0,\\ \{-\beta\}, & u(x)<0. \end{cases}

Equivalently,

λβ,|\lambda|\le \beta,

and

u>0λ=β,u<0λ=β,λ<βu=0.u>0 \Longrightarrow \lambda=\beta, \qquad u<0 \Longrightarrow \lambda=-\beta, \qquad |\lambda|<\beta \Longrightarrow u=0.

This formulation is exactly the same information as in the earlier sections:

λ=λ+λ,\lambda = \lambda^+ - \lambda^-,

while the stationarity condition

βλ+λ=0\beta - \lambda^+ - \lambda^- = 0

implies

λ=λ+λλ++λ=β.|\lambda| = |\lambda^+ - \lambda^-| \le \lambda^+ + \lambda^- = \beta.

Hence the direct dual formulation is a compact bridge between the two other descriptions:

  1. the variational picture based on subdifferentials;

  2. the complementarity picture based on slack variables and KKT multipliers.

For semismooth Newton methods, one can combine

αu+p+λ=0\alpha u + p + \lambda = 0

with a semismooth characterization of the graph of β\beta\partial|\cdot|, for instance through projection formulas. For PDAS, the slack-variable formulation is often more convenient because the complementarity structure is completely explicit.


References

  1. J.C. De los Reyes, Numerical PDE-Constrained Optimization, Springer, 2015.

  2. F. Tröltzsch, Optimal Control of Partial Differential Equations, AMS, 2010.

  3. M. Hintermüller, K. Ito, K. Kunisch, The primal-dual active set strategy as a semismooth Newton method.

  4. M. Hinze, R. Pinnau, M. Ulbrich, S. Ulbrich, Optimization with PDE Constraints.

  5. A. Manzoni, A. Quarteroni, S. Salsa, Optimal Control of Partial Differential Equations, Springer, 2021.