
From ODE-Constrained Optimization to Bochner Spaces

University of Pisa

Overview

The previous lectures developed the continuous and discrete theory for elliptic optimal control.

At this stage, the next genuine change of perspective is the introduction of time-dependent dynamics. The simplest rigorous entry point is not yet a parabolic PDE, but a linear control problem governed by an ordinary differential equation. This lecture has two goals:

  1. derive the optimality system for a linear-quadratic control problem governed by an ODE;

  2. introduce the vector-valued function spaces needed to pass from ODEs to parabolic PDEs.

The logical path of the lecture is the following:

  1. formulate a linear-quadratic control problem on $(0,T)$;

  2. prove well-posedness of the state equation and continuity of the control-to-state map;

  3. derive the reduced derivative through a linearized state equation;

  4. introduce the adjoint ODE and obtain the gradient formula;

  5. state the first-order optimality conditions, both unconstrained and box-constrained;

  6. isolate the structural difference with the elliptic case: a forward-backward optimality system;

  7. introduce strongly measurable vector-valued functions and Bochner spaces;

  8. define weak time derivatives in dual spaces and the space

    $$W(0,T):=\{y\in L^2(0,T;V):\ y_t\in L^2(0,T;V')\};$$
  9. state the basic embedding and integration-by-parts results that make parabolic optimal control possible.


Model Problem

Let $T>0$ be fixed, and let

$$A\in \mathbb R^{n\times n}, \qquad B\in \mathbb R^{n\times m}, \qquad y_0\in \mathbb R^n.$$

Let also

$$f\in L^2(0,T;\mathbb R^n), \qquad y_d\in L^2(0,T;\mathbb R^n), \qquad y_T\in \mathbb R^n,$$

and let $\alpha>0$, $\beta\ge 0$. The control space is

$$U:=L^2(0,T;\mathbb R^m).$$

The admissible set is a nonempty, closed, convex subset

$$U_{\mathrm{ad}}\subset U.$$

For $u\in U$, the state $y=y(u)$ is defined by the linear ODE

$$\dot y(t)=Ay(t)+Bu(t)+f(t) \qquad \text{for a.e. } t\in(0,T), \qquad y(0)=y_0.$$

We consider the optimal control problem

$$\min_{u\in U_{\mathrm{ad}}} j(u):=J(y(u),u),$$

where

$$J(y,u) := \frac12\int_0^T |y(t)-y_d(t)|^2\,dt + \frac\alpha2\int_0^T |u(t)|^2\,dt + \frac\beta2\, |y(T)-y_T|^2.$$

The three terms are:

  1. a distributed tracking term penalizing the deviation of the state from the target trajectory $y_d$;

  2. a Tikhonov regularization term penalizing the control cost, with weight $\alpha>0$;

  3. a terminal observation term penalizing the deviation of $y(T)$ from the target state $y_T$, with weight $\beta\ge 0$.

This model already contains all structural ingredients of parabolic optimal control: a linear dynamic constraint, a quadratic cost with distributed and terminal observation, a strictly convex control penalty, and convex control constraints.


State Equation and Control-to-State Map

We begin with the well-posedness of the state equation. Since the right-hand side belongs to $L^2(0,T;\mathbb R^n)$, one expects the solution to belong to $H^1(0,T;\mathbb R^n)$.

Define

$$Y:=H^1(0,T;\mathbb R^n).$$

Recall that in finite dimensions

$$H^1(0,T;\mathbb R^n)\hookrightarrow C([0,T];\mathbb R^n),$$

hence the terminal value $y(T)$ is meaningful.

Proposition. For every $u\in U$, the state equation admits a unique solution

$$y\in Y=H^1(0,T;\mathbb R^n).$$

Moreover, the solution is given by the variation-of-constants formula

$$y(t)=e^{tA}y_0+\int_0^t e^{(t-s)A}\bigl(Bu(s)+f(s)\bigr)\,ds,$$

and there exists a constant $C>0$, depending only on $A$, $B$, and $T$, such that

$$\|y\|_{H^1(0,T;\mathbb R^n)} + \|y\|_{C([0,T];\mathbb R^n)} \le C\bigl(|y_0|+\|u\|_{L^2(0,T;\mathbb R^m)}+\|f\|_{L^2(0,T;\mathbb R^n)}\bigr).$$

Proof. Fix $u\in U$ and define

$$g(t):=Bu(t)+f(t)\in L^2(0,T;\mathbb R^n).$$

Since $A$ is a constant matrix, the matrix exponential $e^{tA}$ is well defined and continuous. The formula

$$y(t):=e^{tA}y_0+\int_0^t e^{(t-s)A}g(s)\,ds$$

defines a continuous function on $[0,T]$. Differentiating under the integral sign yields

$$\dot y(t)=Ae^{tA}y_0+g(t)+\int_0^t A e^{(t-s)A} g(s)\,ds = Ay(t)+g(t)$$

for a.e. $t\in(0,T)$, and clearly $y(0)=y_0$. Thus $y\in H^1(0,T;\mathbb R^n)$ and solves the ODE.

Uniqueness follows from linearity. Indeed, if $y_1$ and $y_2$ solve the same problem, then $w:=y_1-y_2$ satisfies

$$\dot w = A w, \qquad w(0)=0.$$

Hence $w(t)=e^{tA}w(0)=0$ for all $t$.

To estimate $y$, let

$$M_A:=\max_{0\le t\le T} \|e^{tA}\|.$$

Then

$$|y(t)| \le M_A |y_0| + M_A \int_0^T |g(s)|\,ds \le M_A |y_0| + M_A T^{1/2}\|g\|_{L^2(0,T)}.$$

Taking the supremum in $t$ gives

$$\|y\|_{C([0,T])} \le C\bigl(|y_0|+\|g\|_{L^2(0,T)}\bigr) \le C\bigl(|y_0|+\|u\|_{L^2(0,T)}+\|f\|_{L^2(0,T)}\bigr).$$

Since $\dot y=Ay+g$,

$$\|\dot y\|_{L^2(0,T)} \le \|A\|\,\|y\|_{L^2(0,T)} + \|g\|_{L^2(0,T)} \le C\bigl(|y_0|+\|u\|_{L^2(0,T)}+\|f\|_{L^2(0,T)}\bigr).$$

Combining the bounds for $y$ and $\dot y$ yields the claim. $\square$
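The variation-of-constants formula can be checked numerically against a direct time-stepping solve of the ODE. The following sketch uses arbitrarily chosen toy data ($A$, $B$, $y_0$, $u$, $f$ below are illustrative assumptions, not taken from the lecture):

```python
import numpy as np
from scipy.linalg import expm

# Illustrative check of the variation-of-constants formula (toy data).
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
y0 = np.array([1.0, 0.0])
T, N = 1.0, 20000
h = T / N
t = np.linspace(0.0, T, N + 1)

u = lambda s: np.array([np.sin(s)])           # a fixed control
f = lambda s: np.array([0.0, np.cos(s)])      # a fixed source
g = lambda s: B @ u(s) + f(s)

# y(T) = e^{TA} y0 + int_0^T e^{(T-s)A} g(s) ds, via the trapezoid rule.
s = np.linspace(0.0, T, 801)
ds = s[1] - s[0]
vals = np.array([expm((T - sj) * A) @ g(sj) for sj in s])
y_voc = expm(T * A) @ y0 + ds * (0.5 * vals[0] + 0.5 * vals[-1] + vals[1:-1].sum(axis=0))

# Forward Euler on y' = Ay + g as an independent reference.
y = y0.copy()
for k in range(N):
    y = y + h * (A @ y + g(t[k]))

err = np.linalg.norm(y_voc - y)
print(err)  # small: both computations produce the same state at time T
```

The two results agree up to the discretization error of the two schemes, consistently with uniqueness.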


The proposition allows us to define the control-to-state map

$$S:U\to Y, \qquad u\mapsto y=S(u).$$

Because the state equation is linear in $(y,u)$, the map $S$ is affine (linear up to the fixed contribution of $y_0$ and $f$) and continuous. The reduced functional is therefore

$$j(u)=J(S(u),u).$$

Existence and Uniqueness of an Optimal Control

The previous lectures already established an abstract existence theorem in Hilbert spaces. In the present setting the argument becomes particularly transparent.

Theorem. Assume $\alpha>0$ and let $U_{\mathrm{ad}}\subset U$ be nonempty, closed, and convex. Then the reduced problem

$$\min_{u\in U_{\mathrm{ad}}} j(u)$$

admits a unique solution $\bar u\in U_{\mathrm{ad}}$.

Proof. Since $S:U\to Y$ is continuous and $J$ is continuous on $Y\times U$, the map $j:U\to \mathbb R$ is continuous. Moreover,

$$j(u) = \frac12\|S(u)-y_d\|_{L^2(0,T;\mathbb R^n)}^2 + \frac\alpha2\|u\|_{L^2(0,T;\mathbb R^m)}^2 + \frac\beta2\, |S(u)(T)-y_T|^2 \ge \frac\alpha2\|u\|_U^2.$$

Hence $j$ is coercive on $U$. Let $(u_k)$ be a minimizing sequence in $U_{\mathrm{ad}}$. Coercivity implies that $(u_k)$ is bounded in $U$. Since $U$ is a Hilbert space, there exists a subsequence, not relabeled, and an element $\bar u\in U$ such that

$$u_k \rightharpoonup \bar u \qquad \text{weakly in } U.$$

Because $U_{\mathrm{ad}}$ is closed and convex, it is weakly closed, hence $\bar u\in U_{\mathrm{ad}}$. Since $S$ is affine and continuous, it is weakly continuous, so

$$S(u_k) \rightharpoonup S(\bar u) \qquad \text{weakly in } Y.$$

Since the trace map

$$y\mapsto y(T)$$

is a continuous linear map on $H^1(0,T;\mathbb R^n)$, we also have

$$S(u_k)(T) \rightharpoonup S(\bar u)(T) \qquad \text{in } \mathbb R^n.$$

Since the norm is weakly lower semicontinuous in Hilbert spaces, each term in $j$ is weakly lower semicontinuous, hence

$$j(\bar u) \le \liminf_{k\to\infty} j(u_k),$$

so $\bar u$ is a minimizer.

To prove uniqueness, let $u_1\neq u_2$ and $\theta\in(0,1)$. Since $S$ is affine,

$$S(\theta u_1+(1-\theta)u_2)=\theta S(u_1)+(1-\theta)S(u_2).$$

The first and third terms of $j$ are convex, while the control term is strictly convex because $\alpha>0$:

$$\|\theta u_1+(1-\theta)u_2\|_U^2 < \theta \|u_1\|_U^2 + (1-\theta)\|u_2\|_U^2.$$

Therefore $j$ is strictly convex and the minimizer is unique. $\square$


Linearized State Equation

To differentiate the reduced cost, we perturb the control by $h\in U$. Since the state equation is linear, the state increment is described by a linear ODE independent of the base point.

Let $u\in U$ and $h\in U$. Define

$$z_h := S'(u)h.$$

Then $z_h$ solves

$$\dot z_h(t)=A z_h(t)+B h(t) \qquad \text{for a.e. } t\in(0,T), \qquad z_h(0)=0.$$

Because the control-to-state map is affine, in fact

$$S'(u)h = S(h)-S(0)$$

for every $u\in U$. Equivalently, one may derive the linearized equation directly from

$$\dot y = Ay+Bu+f,$$

by replacing $u$ with $u+\varepsilon h$, subtracting the equation for $u$, dividing by $\varepsilon$, and passing to the limit.

The directional derivative of the reduced functional therefore reads

$$j'(u)h = \int_0^T \bigl(y(t)-y_d(t)\bigr)\cdot z_h(t)\,dt + \beta\,\bigl(y(T)-y_T\bigr)\cdot z_h(T) + \alpha \int_0^T u(t)\cdot h(t)\,dt.$$

The difficulty is that this formula still contains the linearized state $z_h$. As in the elliptic case, the adjoint variable removes this dependence.
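The identity $S'(u)h = S(h)-S(0)$ can be verified on a toy discretization; because the scheme is affine in the control, the difference quotient already equals the linearized state up to floating-point rounding. All data below are illustrative assumptions:

```python
import numpy as np

# Toy check (illustrative data) that the difference quotient of the
# control-to-state map equals S(h) - S(0), the linearized state.
A = np.array([[0.0, 1.0], [-1.0, -0.3]])
B = np.array([[0.0], [1.0]])
y0 = np.array([0.5, 0.0])
T, N = 1.0, 4000
dt = T / N
t = np.linspace(0.0, T, N + 1)
u = np.sin(t)[:, None]                           # control samples, shape (N+1, 1)
h_dir = np.cos(t)[:, None]                       # perturbation direction
f = np.tile(np.array([0.0, 1.0]), (N + 1, 1))    # fixed source

def solve_state(u_grid, y_init):
    """Forward Euler for y' = A y + B u + f."""
    y = np.zeros((N + 1, 2))
    y[0] = y_init
    for k in range(N):
        y[k + 1] = y[k] + dt * (A @ y[k] + B @ u_grid[k] + f[k])
    return y

eps = 1e-3
dq = (solve_state(u + eps * h_dir, y0) - solve_state(u, y0)) / eps  # ~ S'(u)h
z = solve_state(h_dir, y0) - solve_state(np.zeros_like(u), y0)      # S(h) - S(0)
diff = np.abs(dq - z).max()
print(diff)  # zero up to floating-point rounding: no base-point dependence
```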


Adjoint Equation

The adjoint equation is the backward-in-time equation dual to the linearized state equation. Its role is exactly the same as in the elliptic case, but the time direction is reversed.

Let $u\in U$ be fixed, and let $y=S(u)$. We define the adjoint state $p\in H^1(0,T;\mathbb R^n)$ by

$$-\dot p(t)=A^T p(t)+y(t)-y_d(t) \qquad \text{for a.e. } t\in(0,T), \qquad p(T)=\beta\,(y(T)-y_T).$$

This is again a linear ODE, now with terminal condition prescribed at $t=T$. Its unique solution is obtained by solving backward in time, or equivalently by the change of variable $s=T-t$.

Proposition. Let $u\in U$, $y=S(u)$, and let $p$ solve the adjoint equation above. Then for every $h\in U$, with $z_h$ solving the linearized state equation,

$$\int_0^T \bigl(y(t)-y_d(t)\bigr)\cdot z_h(t)\,dt + \beta\,\bigl(y(T)-y_T\bigr)\cdot z_h(T) = \int_0^T B^T p(t)\cdot h(t)\,dt.$$

Proof. From the linearized state equation,

$$\dot z_h - A z_h - B h = 0.$$

Multiply by $p$ and integrate over $(0,T)$:

$$\int_0^T p(t)\cdot \bigl(\dot z_h(t)-A z_h(t)-B h(t)\bigr)\,dt = 0.$$

Using $(p,Az_h)=(A^Tp,z_h)$ and integrating by parts in time,

$$\int_0^T p\cdot \dot z_h\,dt = p(T)\cdot z_h(T)-p(0)\cdot z_h(0)-\int_0^T \dot p\cdot z_h\,dt.$$

Since $z_h(0)=0$,

$$p(T)\cdot z_h(T)-\int_0^T \bigl(\dot p + A^T p\bigr)\cdot z_h\,dt - \int_0^T B^T p\cdot h\,dt = 0.$$

By the adjoint equation,

$$-(\dot p + A^T p)=y-y_d,$$

hence

$$p(T)\cdot z_h(T)+\int_0^T (y-y_d)\cdot z_h\,dt - \int_0^T B^T p\cdot h\,dt = 0.$$

Finally, the terminal condition gives

$$p(T)\cdot z_h(T)=\beta\,(y(T)-y_T)\cdot z_h(T),$$

which yields the claim. $\square$


Reduced Gradient Formula

We can now eliminate the linearized state from the derivative.

Theorem. The reduced functional $j:U\to \mathbb R$ is Fréchet differentiable and, for every $u\in U$,

$$j'(u)h = \int_0^T \bigl(\alpha u(t)+B^T p(t)\bigr)\cdot h(t)\,dt \qquad \forall h\in U,$$

where $p$ is the adjoint associated with $u$. Hence the gradient of $j$ in the Hilbert space $U=L^2(0,T;\mathbb R^m)$ is

$$\nabla j(u)=\alpha u + B^T p.$$

Proof. We already computed

$$j'(u)h = \int_0^T (y-y_d)\cdot z_h\,dt + \beta\,(y(T)-y_T)\cdot z_h(T) + \alpha\int_0^T u\cdot h\,dt.$$

The adjoint identity from the previous proposition gives

$$\int_0^T (y-y_d)\cdot z_h\,dt + \beta\,(y(T)-y_T)\cdot z_h(T) = \int_0^T B^T p\cdot h\,dt.$$

Therefore

$$j'(u)h = \int_0^T (\alpha u + B^T p)\cdot h\,dt.$$

Since this is a bounded linear functional of $h$, the Fréchet derivative is represented in $U$ by the function $\alpha u + B^T p$. $\square$
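The gradient formula can be tested by comparing the adjoint-based derivative with a finite-difference quotient of the (discretized) reduced cost. The sketch below uses explicit Euler for state and adjoint and toy data (all matrices, targets, and weights are illustrative assumptions), so agreement is up to $O(\Delta t)$:

```python
import numpy as np

# Finite-difference check of grad j(u) = alpha*u + B^T p  (toy data).
A = np.array([[0.0, 1.0], [-1.0, -0.3]])
B = np.array([[0.0], [1.0]])
y0 = np.array([0.5, 0.0])
alpha, beta = 0.1, 1.0
T, N = 1.0, 4000
dt = T / N
t = np.linspace(0.0, T, N + 1)
yd = np.stack([np.cos(t), np.zeros_like(t)], axis=1)
yT = np.zeros(2)
u = 0.3 * np.sin(t)[:, None]

def trap(w):                         # trapezoid rule on the uniform grid
    return dt * (0.5 * w[0] + 0.5 * w[-1] + w[1:-1].sum())

def state(u):
    y = np.zeros((N + 1, 2)); y[0] = y0
    for k in range(N):
        y[k + 1] = y[k] + dt * (A @ y[k] + B @ u[k])
    return y

def cost(u):
    y = state(u)
    return (0.5 * trap(np.sum((y - yd) ** 2, axis=1))
            + 0.5 * alpha * trap(np.sum(u ** 2, axis=1))
            + 0.5 * beta * np.sum((y[-1] - yT) ** 2))

def grad(u):
    y = state(u)
    p = np.zeros((N + 1, 2)); p[-1] = beta * (y[-1] - yT)
    for k in range(N, 0, -1):        # -p' = A^T p + (y - yd), solved backward
        p[k - 1] = p[k] + dt * (A.T @ p[k] + y[k] - yd[k])
    return alpha * u + p @ B         # pointwise alpha*u(t) + B^T p(t)

h = np.cos(t)[:, None]
eps = 1e-5
fd = (cost(u + eps * h) - cost(u - eps * h)) / (2 * eps)
ad = trap(np.sum(grad(u) * h, axis=1))   # j'(u)h = (grad j, h)_{L^2}
gap = abs(fd - ad)
print(gap)  # small: adjoint and finite-difference derivatives agree
```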


First-Order Optimality System

We can now write the necessary and sufficient optimality conditions. Because the problem is convex and the reduced functional is strictly convex, first-order conditions characterize the unique minimizer.

Unconstrained case

If

$$U_{\mathrm{ad}}=U=L^2(0,T;\mathbb R^m),$$

then the stationarity condition is simply

$$\nabla j(\bar u)=0,$$

i.e.

$$\alpha \bar u + B^T \bar p = 0 \qquad \text{in } L^2(0,T;\mathbb R^m).$$

Thus the optimal control satisfies the explicit relation

$$\bar u(t) = -\frac1\alpha B^T \bar p(t) \qquad \text{for a.e. } t\in(0,T).$$

The optimality system becomes

$$\begin{cases} \dot{\bar y}(t)=A\bar y(t)+B\bar u(t)+f(t), & \text{a.e. } t\in(0,T),\\ \bar y(0)=y_0,\\ -\dot{\bar p}(t)=A^T\bar p(t)+\bar y(t)-y_d(t), & \text{a.e. } t\in(0,T),\\ \bar p(T)=\beta(\bar y(T)-y_T),\\ \alpha \bar u(t)+B^T\bar p(t)=0, & \text{a.e. } t\in(0,T). \end{cases}$$

This is a forward-backward system: the state equation runs forward in time from the initial condition $\bar y(0)=y_0$, while the adjoint equation runs backward in time from the terminal condition $\bar p(T)=\beta(\bar y(T)-y_T)$; the two evolutions are coupled through the stationarity condition.

This two-sided time structure is the first major structural difference from elliptic control.
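One minimal way to solve the unconstrained system numerically is to alternate a forward state sweep, a backward adjoint sweep, and a gradient step on the control. The following is a sketch on a toy scalar system (all data, the step size, and the iteration count are illustrative assumptions):

```python
import numpy as np

# Forward-backward sweeps with a plain gradient step (toy scalar system).
# At convergence the stationarity condition alpha*u + B^T p = 0 holds.
A = np.array([[-1.0]]); B = np.array([[1.0]])
y0 = np.array([1.0]); alpha, beta = 1.0, 0.0
T, N = 1.0, 1000; dt = T / N
t = np.linspace(0.0, T, N + 1)
yd = np.ones((N + 1, 1))        # track the constant target y_d = 1
u = np.zeros((N + 1, 1))

def gradient(u):
    y = np.zeros((N + 1, 1)); y[0] = y0
    for k in range(N):                       # forward state solve
        y[k + 1] = y[k] + dt * (A @ y[k] + B @ u[k])
    p = np.zeros((N + 1, 1)); p[-1] = beta * y[-1]
    for k in range(N, 0, -1):                # backward adjoint solve
        p[k - 1] = p[k] + dt * (A.T @ p[k] + y[k] - yd[k])
    return alpha * u + p @ B                 # = grad j(u)

for _ in range(300):
    u = u - 0.5 * gradient(u)                # gradient descent step

res = np.max(np.abs(gradient(u)))
print(res)  # stationarity residual, driven to ~0
```

Each iteration touches both time directions, which is exactly the two-sided structure described above.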

Constrained case

Assume now that $U_{\mathrm{ad}}\subset U$ is closed and convex. Then $\bar u\in U_{\mathrm{ad}}$ is optimal if and only if

$$j'(\bar u)(u-\bar u)\ge 0 \qquad \forall u\in U_{\mathrm{ad}}.$$

Using the adjoint representation of the derivative, this becomes

$$\int_0^T \bigl(\alpha \bar u(t)+B^T\bar p(t)\bigr)\cdot \bigl(u(t)-\bar u(t)\bigr)\,dt \ge 0 \qquad \forall u\in U_{\mathrm{ad}}.$$

Equivalently, in normal-cone form,

$$0\in \alpha \bar u + B^T \bar p + N_{U_{\mathrm{ad}}}(\bar u) \qquad \text{in } U.$$

Hence the full optimality system is

$$\begin{cases} \dot{\bar y}=A\bar y+B\bar u+f, \qquad \bar y(0)=y_0,\\[0.3em] -\dot{\bar p}=A^T\bar p+\bar y-y_d, \qquad \bar p(T)=\beta(\bar y(T)-y_T),\\[0.3em] 0\in \alpha \bar u + B^T \bar p + N_{U_{\mathrm{ad}}}(\bar u). \end{cases}$$

Box Constraints and Projection Formula

A particularly important case is the box-constrained set

$$U_{\mathrm{ad}} := \left\{ u\in L^2(0,T;\mathbb R^m):\ u_a(t)\le u(t)\le u_b(t) \text{ for a.e. } t\in(0,T) \right\},$$

where $u_a,u_b\in L^\infty(0,T;\mathbb R^m)$ and the inequalities are understood componentwise.

In this case the optimality condition is equivalent to the pointwise projection formula

$$\bar u(t)=P_{[u_a(t),u_b(t)]}\!\left(-\frac1\alpha B^T\bar p(t)\right) \qquad \text{for a.e. } t\in(0,T),$$

where for $\xi\in \mathbb R^m$

$$P_{[u_a(t),u_b(t)]}(\xi) = \min\bigl(\max(\xi,u_a(t)),u_b(t)\bigr)$$

componentwise.
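On sample values, this projection is componentwise clipping, which in Python is `np.clip` (the numbers below are arbitrary illustrations with constant box bounds):

```python
import numpy as np

# Componentwise projection of -(1/alpha) B^T p(t) onto [u_a, u_b]
# (illustrative numbers; constant box bounds for simplicity).
alpha = 0.5
ua, ub = -1.0, 1.0
BT_p = np.array([-3.0, 0.2, 1.7])        # sample values of B^T p(t)
u_bar = np.clip(-BT_p / alpha, ua, ub)   # P_[ua,ub](-(1/alpha) B^T p)
print(u_bar)  # -> [ 1.  -0.4 -1. ]
```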

The proof is identical in structure to the elliptic case: one tests the variational inequality with pointwise admissible perturbations and uses the characterization of the projection onto an interval.

Thus the only genuinely new analytical ingredient introduced by time dependence is not the control condition, but the forward-backward evolution structure of state and adjoint.


Forward-Backward Interpretation

It is worth isolating the conceptual meaning of the adjoint in the dynamical case.

For a fixed control $u$: the state $y=S(u)$ evolves forward in time from the initial datum $y_0$, while the adjoint $p$ transports the sensitivity of the cost backward in time from $t=T$ to $t=0$.

The terminal condition

$$p(T)=\beta(y(T)-y_T)$$

encodes the derivative of the terminal observation. If $\beta=0$, then $p(T)=0$ and only the distributed tracking term drives the adjoint. If instead the cost is purely terminal,

$$J(y,u)=\frac\alpha2\int_0^T |u(t)|^2\,dt + \frac\beta2\, |y(T)-y_T|^2,$$

then the adjoint satisfies

$$-\dot p = A^T p, \qquad p(T)=\beta(y(T)-y_T).$$

This is exactly the same phenomenon that one encounters later for parabolic PDEs: the adjoint is a backward evolution whose terminal datum encodes the derivative of the terminal observation term.


From ODEs to Evolution Equations

The ODE model can be rewritten abstractly as

$$y_t + \mathcal A y = \mathcal B u + f,$$

where now the state $y(t)$ is an element of the finite-dimensional Hilbert space $H=\mathbb R^n$ and

$$\mathcal A := -A, \qquad \mathcal B := B.$$

For parabolic PDEs, the same formula remains formally correct, but the state at each time is no longer a vector in $\mathbb R^n$. Instead, $y(t)$ is a function of the spatial variable, an element of an infinite-dimensional function space such as $H_0^1(\Omega)$.

This is the reason why the usual scalar-valued Lebesgue and Sobolev spaces are not sufficient. We need function spaces of the form

$$L^p(0,T;X),$$

where $X$ is itself a Banach or Hilbert space. These are the Bochner spaces.


Strongly Measurable Vector-Valued Functions

Let $X$ be a Banach space. A function

$$y:(0,T)\to X$$

is called simple if it has the form

$$y(t)=\sum_{k=1}^N x_k\,\chi_{E_k}(t),$$

where $x_k\in X$ and $E_k\subset(0,T)$ are measurable sets.

A function $y:(0,T)\to X$ is called strongly measurable if there exists a sequence of simple functions $(y_n)$ such that

$$y_n(t)\to y(t) \qquad \text{for a.e. } t\in(0,T).$$

This is the natural notion of measurability for vector-valued functions. In the Hilbert spaces used in parabolic PDEs, separability holds, so this definition behaves well.

If $y$ is strongly measurable and

$$\int_0^T \|y(t)\|_X\,dt < \infty,$$

then $y$ is Bochner integrable and one may define

$$\int_0^T y(t)\,dt \in X$$

as the limit of the integrals of simple approximations. This is the vector-valued analogue of the usual Lebesgue integral.


Bochner Spaces $L^p(0,T;X)$

Let $1\le p<\infty$. We define

$$L^p(0,T;X) := \left\{ y:(0,T)\to X:\ y \text{ strongly measurable and } \int_0^T \|y(t)\|_X^p\,dt < \infty \right\}.$$

The norm is

$$\|y\|_{L^p(0,T;X)} := \left(\int_0^T \|y(t)\|_X^p\,dt\right)^{1/p}.$$

Similarly,

$$L^\infty(0,T;X) := \left\{ y:(0,T)\to X:\ y \text{ strongly measurable and } \operatorname*{ess\,sup}_{t\in(0,T)} \|y(t)\|_X < \infty \right\}.$$

Standard facts:

  1. $L^p(0,T;X)$ is a Banach space for every $1\le p\le\infty$;

  2. if $X$ is a Hilbert space, then $L^2(0,T;X)$ is a Hilbert space with inner product $(y,v):=\int_0^T (y(t),v(t))_X\,dt$;

  3. simple functions are dense in $L^p(0,T;X)$ for $p<\infty$.

Fundamental examples

Let $\Omega\subset \mathbb R^d$ be a bounded Lipschitz domain and set

$$Q_T:=\Omega\times(0,T).$$

Then:

  1. if $X=L^2(\Omega)$,

    $$L^2(0,T;L^2(\Omega)) \cong L^2(Q_T);$$
  2. if $X=H_0^1(\Omega)$,

    $$L^2(0,T;H_0^1(\Omega))$$

    consists of functions square integrable in time with values in $H_0^1(\Omega)$;

  3. if $X=H^{-1}(\Omega)$,

    $$L^2(0,T;H^{-1}(\Omega))$$

    is the natural space for weak time derivatives of parabolic states.

Thus, in parabolic theory, the same function $y(x,t)$ is viewed as a map

$$t\mapsto y(t):=y(\cdot,t)$$

with values in a spatial function space.
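The identification $L^2(0,T;L^2(\Omega))\cong L^2(Q_T)$ amounts to Fubini for the squared norm. A discrete sketch on a tensor grid (the sample function $y$ below is an assumed illustration):

```python
import numpy as np

# Discrete Fubini check: integrating ||y(t)||_{L^2(0,1)}^2 in time equals
# integrating |y|^2 over the space-time cylinder (trapezoid rule, toy y).
x = np.linspace(0.0, 1.0, 201)    # Omega = (0, 1)
t = np.linspace(0.0, 2.0, 401)    # T = 2
X, Tg = np.meshgrid(x, t, indexing="ij")
y = np.sin(np.pi * X) * np.exp(-Tg)

def trap(w, grid, axis):          # composite trapezoid rule along one axis
    dg = grid[1] - grid[0]
    sl = [slice(None)] * w.ndim
    sl[axis] = slice(1, None)
    a = w[tuple(sl)]
    sl[axis] = slice(None, -1)
    b = w[tuple(sl)]
    return dg * 0.5 * (a + b).sum(axis=axis)

space_then_time = trap(trap(y ** 2, x, 0), t, 0)   # int_0^T ||y(t)||^2 dt
time_then_space = trap(trap(y ** 2, t, 1), x, 0)   # the other iteration order
exact = 0.25 * (1.0 - np.exp(-4.0))                # analytic value for this y
print(space_then_time, time_then_space, exact)
```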


Weak Time Derivatives

For parabolic equations, the time derivative does not usually belong to the same space as the state. This forces a dual-space formulation.

Let XX be a Banach space and let

yL1(0,T;X).y\in L^1(0,T;X).

A function

zL1(0,T;X)z\in L^1(0,T;X)

is called the weak time derivative of yy if for every scalar test function φCc(0,T)\varphi\in C_c^\infty(0,T) and every X\ell\in X',

0T,y(t)X,Xφ(t)dt=0T,z(t)X,Xφ(t)dt.\int_0^T \langle \ell,y(t)\rangle_{X',X}\, \varphi'(t)\,dt = -\int_0^T \langle \ell,z(t)\rangle_{X',X}\, \varphi(t)\,dt.

In this case we write

z=yt.z = y_t.

If XX is Hilbert, one can identify XXX\cong X' by the Riesz map and recover the familiar scalar definition.
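For $X=\mathbb R$ the definition reduces to the classical weak derivative. A quick numerical illustration (the example function and test function are assumed for illustration): $y(t)=|t-\tfrac12|$ has weak derivative $z(t)=\operatorname{sign}(t-\tfrac12)$.

```python
import numpy as np

# Weak-derivative identity  int y phi' dt = -int z phi dt  for
# y(t) = |t - 1/2|, z(t) = sign(t - 1/2), and a test function phi
# vanishing at the endpoints (here phi(t) = t^2 (1 - t) for simplicity).
t = np.linspace(0.0, 1.0, 20001)
dt = t[1] - t[0]
y = np.abs(t - 0.5)
z = np.sign(t - 0.5)
phi = t ** 2 * (1.0 - t)
dphi = 2.0 * t - 3.0 * t ** 2

def trap(w):                      # trapezoid rule on the uniform grid
    return dt * (0.5 * w[0] + 0.5 * w[-1] + w[1:-1].sum())

lhs = trap(y * dphi)
rhs = -trap(z * phi)
print(lhs, rhs)  # the two integrals agree up to quadrature error
```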

The Sobolev space of $X$-valued functions is then

$$H^1(0,T;X) := \{y\in L^2(0,T;X):\ y_t\in L^2(0,T;X)\}.$$

For ODEs, where $X=\mathbb R^n$, this is the space used for the state and adjoint variables.

For parabolic PDEs, however, the correct setting is generally not $H^1(0,T;X)$ with a single space $X$, but a mixed space involving a Hilbert triple.


Gelfand Triples and the Space $W(0,T)$

Let $V$ and $H$ be Hilbert spaces such that

$$V \hookrightarrow H$$

continuously and densely. By identifying $H$ with its dual $H'$ through the Riesz isomorphism, one obtains the Gelfand triple

$$V \hookrightarrow H \cong H' \hookrightarrow V'.$$

The last embedding is defined by

$$\langle h,v\rangle_{V',V} := (h,v)_H \qquad \forall h\in H,\ \forall v\in V.$$

The canonical parabolic example is

$$V=H_0^1(\Omega), \qquad H=L^2(\Omega), \qquad V'=H^{-1}(\Omega).$$

The natural energy space for parabolic problems is

$$W(0,T) := \{y\in L^2(0,T;V):\ y_t\in L^2(0,T;V')\}.$$

It is a Hilbert space with norm

$$\|y\|_{W(0,T)}^2 := \|y\|_{L^2(0,T;V)}^2 + \|y_t\|_{L^2(0,T;V')}^2.$$

This is the parabolic analogue of $H^1(0,T;\mathbb R^n)$ for ODEs.

The key point is the asymmetry: the function itself takes values in $V$, but its time derivative only takes values in the larger dual space $V'$.

This is forced by the weak formulation of the PDE. For the heat equation,

$$y_t - \Delta y = F,$$

one expects

$$y(t)\in H_0^1(\Omega), \qquad y_t(t)\in H^{-1}(\Omega).$$

Fundamental Theorem for $W(0,T)$

The space $W(0,T)$ has a decisive property: its elements possess a continuous representative with values in $H$. This is what makes initial and terminal conditions meaningful.

Theorem (Lions-Magenes). Let

$$V \hookrightarrow H \hookrightarrow V'$$

be a Gelfand triple. Then:

  1. every $y\in W(0,T)$ admits a representative, still denoted by $y$, such that

    $$y\in C([0,T];H);$$
  2. the embedding

    $$W(0,T) \hookrightarrow C([0,T];H)$$

    is continuous;

  3. if $y,v\in W(0,T)$, then the scalar map

    $$t\mapsto (y(t),v(t))_H$$

    is absolutely continuous and satisfies

    $$\frac{d}{dt}(y(t),v(t))_H = \langle y_t(t),v(t)\rangle_{V',V} + \langle v_t(t),y(t)\rangle_{V',V}$$

    for a.e. $t\in(0,T)$.

In particular, taking $v=y$ gives

$$\frac12\frac{d}{dt}\|y(t)\|_H^2 = \langle y_t(t),y(t)\rangle_{V',V} \qquad \text{for a.e. } t\in(0,T).$$

Integrating between $s$ and $t$ yields the energy identity

$$\frac12\|y(t)\|_H^2 - \frac12\|y(s)\|_H^2 = \int_s^t \langle y_t(\tau),y(\tau)\rangle_{V',V}\,d\tau.$$

More generally, for $y,v\in W(0,T)$,

$$(y(t),v(t))_H - (y(s),v(s))_H = \int_s^t \langle y_t(\tau),v(\tau)\rangle_{V',V}\,d\tau + \int_s^t \langle v_t(\tau),y(\tau)\rangle_{V',V}\,d\tau.$$

This is the time-integration-by-parts formula needed for parabolic adjoints.
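In the finite-dimensional case $H=\mathbb R^n$ the duality pairing is the dot product, so the energy identity can be checked directly; a sketch with an assumed smooth path $y(t)$:

```python
import numpy as np

# Finite-dimensional check of  1/2||y(T)||^2 - 1/2||y(0)||^2
#   = int_0^T <y_t, y> dt,  with H = R^2 and a toy smooth path y(t).
T, N = 1.0, 20001
t = np.linspace(0.0, T, N)
dt = t[1] - t[0]
y = np.stack([np.sin(t), np.cos(2.0 * t)], axis=1)          # path in R^2
yt = np.stack([np.cos(t), -2.0 * np.sin(2.0 * t)], axis=1)  # its derivative

lhs = 0.5 * np.sum(y[-1] ** 2) - 0.5 * np.sum(y[0] ** 2)
w = np.sum(yt * y, axis=1)                                  # <y_t, y> pointwise
rhs = dt * (0.5 * w[0] + 0.5 * w[-1] + w[1:-1].sum())       # trapezoid rule
print(abs(lhs - rhs))  # agreement up to quadrature error
```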


Abstract Parabolic Problem

With the previous tools in place, one can formulate the prototype parabolic state equation.

Let $V\hookrightarrow H\hookrightarrow V'$ be a Gelfand triple. Let

$$a:V\times V\to \mathbb R$$

be a bilinear form satisfying:

  1. continuity:

    $$|a(w,v)|\le M\|w\|_V\|v\|_V \qquad \forall w,v\in V;$$
  2. coercivity:

    $$a(v,v)\ge c_a \|v\|_V^2 \qquad \forall v\in V,$$

    for some $c_a>0$.

Define the operator

$$A:V\to V', \qquad \langle Ay,v\rangle_{V',V}=a(y,v).$$

Given

$$F\in L^2(0,T;V'), \qquad y_0\in H,$$

the weak parabolic problem is: find $y\in W(0,T)$ such that

$$\langle y_t(t),v\rangle_{V',V} + a(y(t),v) = \langle F(t),v\rangle_{V',V} \qquad \forall v\in V, \quad \text{for a.e. } t\in(0,T),$$

with

$$y(0)=y_0 \quad \text{in } H.$$

Theorem. Under the assumptions above, the parabolic problem admits a unique solution

$$y\in W(0,T).$$

Moreover,

$$\|y\|_{L^2(0,T;V)} + \|y\|_{L^\infty(0,T;H)} + \|y_t\|_{L^2(0,T;V')} \le C\bigl(\|F\|_{L^2(0,T;V')}+\|y_0\|_H\bigr),$$

for a constant $C$ depending only on the continuity and coercivity constants and on $T$.

This theorem is the infinite-dimensional analogue of the well-posedness proposition for the ODE state equation. The analogies are exact: the coercive bilinear form $a$ plays the role of the matrix $-A$, the energy space $W(0,T)$ replaces $H^1(0,T;\mathbb R^n)$, and the a priori estimate is the analogue of the variation-of-constants bound.


Heat Equation as Canonical Example

Take

$$V=H_0^1(\Omega), \qquad H=L^2(\Omega), \qquad V'=H^{-1}(\Omega),$$

and define

$$a(y,v)=\int_\Omega \nabla y\cdot \nabla v\,dx.$$

Then

$$\langle Ay,v\rangle = \int_\Omega \nabla y\cdot \nabla v\,dx$$

corresponds to the operator $A=-\Delta$ in weak form. Given a control $u\in L^2(0,T;L^2(\Omega))$, one may write

$$F(t)=u(t)+f(t) \in H \hookrightarrow V'.$$

The parabolic state equation becomes

$$\langle y_t(t),v\rangle_{H^{-1},H_0^1} + \int_\Omega \nabla y(t)\cdot \nabla v\,dx = \int_\Omega \bigl(u(t)+f(t)\bigr)\,v\,dx$$

for all $v\in H_0^1(\Omega)$ and a.e. $t\in(0,T)$, with $y(0)=y_0$.

This is the standard weak formulation of the heat equation

$$y_t - \Delta y = u+f \qquad \text{in } \Omega\times(0,T),$$

with homogeneous Dirichlet boundary condition.
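The decay predicted by the energy estimate can be observed on a toy discretization. The sketch below uses a standard second-order finite-difference Laplacian and implicit Euler in time for the homogeneous heat equation (grid sizes and initial datum are illustrative assumptions):

```python
import numpy as np

# Implicit Euler for the homogeneous 1D heat equation y_t - y_xx = 0 on
# (0,1) with Dirichlet data (toy discretization); the L^2 norm of the
# discrete solution decays, as the energy identity predicts.
nx, nt, T = 100, 200, 0.5
x = np.linspace(0.0, 1.0, nx + 1)
hx = x[1] - x[0]
dt = T / nt
main = 2.0 / hx**2 * np.ones(nx - 1)
off = -1.0 / hx**2 * np.ones(nx - 2)
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)  # discrete -d^2/dx^2
M = np.eye(nx - 1) + dt * A                             # implicit Euler matrix

y = np.sin(np.pi * x[1:-1])                             # y0 at interior nodes
n0 = np.sqrt(hx) * np.linalg.norm(y)                    # ~ ||y0||_{L^2}
for _ in range(nt):
    y = np.linalg.solve(M, y)                           # (I + dt*A) y^{k+1} = y^k
nT = np.sqrt(hx) * np.linalg.norm(y)

print(n0, nT)  # the norm decays, roughly like exp(-pi^2 T) for this y0
```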


What Changes in the Optimality System for Parabolic PDEs?

At the formal level, almost nothing changes. The parabolic optimal control problem has the structure

$$\min_{u\in U_{\mathrm{ad}}}\ \frac12\int_0^T \|y(t)-y_d(t)\|_H^2\,dt + \frac\alpha2 \|u\|_{L^2(0,T;U)}^2 + \frac\beta2\, \|y(T)-y_T\|_H^2$$

subject to

$$y_t + Ay = Bu + f, \qquad y(0)=y_0.$$

The optimality system has the same forward-backward pattern as in the ODE case: a forward parabolic state equation, a backward parabolic adjoint equation with terminal condition at $t=T$, and a variational inequality (or projection formula) for the control.

The new difficulty is not conceptual but functional-analytic: state and adjoint live in $W(0,T)$, time derivatives must be understood in the dual space $V'$, and the integration by parts underlying the adjoint computation relies on the Lions-Magenes theorem.


Summary

This lecture introduced the time-dependent side of optimal control in two layers.

ODE layer

For the linear-quadratic control problem governed by

$$\dot y = Ay+Bu+f, \qquad y(0)=y_0,$$

we proved:

  1. well-posedness of the state equation and continuity of the control-to-state map;

  2. existence and uniqueness of the optimal control for $\alpha>0$;

  3. the adjoint-based gradient formula $\nabla j(u)=\alpha u + B^T p$;

  4. first-order optimality conditions, including the projection formula under box constraints.

Hence time-dependent optimality already appears as a forward-backward system.

Functional-analytic layer

To pass from ODEs to parabolic PDEs we introduced:

  1. strongly measurable vector-valued functions and the Bochner spaces $L^p(0,T;X)$;

  2. weak time derivatives in dual spaces;

  3. Gelfand triples $V\hookrightarrow H\hookrightarrow V'$ and the energy space $W(0,T)$;

  4. the embedding $W(0,T)\hookrightarrow C([0,T];H)$ and the time-integration-by-parts formula.

These are exactly the tools needed for the next lecture, where the same adjoint-based optimality machinery will be applied to parabolic PDEs.

