From ODE-Constrained Optimization to Bochner Spaces
Overview ¶

The previous lectures developed the continuous and discrete theory for
elliptic optimal control, including:
- reduced and all-at-once formulations;
- adjoint-based gradient representations;
- variational inequalities and normal-cone KKT systems;
- projected-gradient and primal-dual viewpoints.
At this stage, the next genuine change of perspective is the introduction of
time-dependent dynamics.
The simplest rigorous entry point is not yet a parabolic PDE, but a linear
control problem governed by an ordinary differential equation.
This lecture has two goals:
- derive the optimality system for a linear-quadratic control problem governed by an ODE;
- introduce the vector-valued function spaces needed to pass from ODEs to parabolic PDEs.
The logical path of the lecture is the following:
- formulate a linear-quadratic control problem on $(0,T)$;
- prove well-posedness of the state equation and continuity of the control-to-state map;
- derive the reduced derivative through a linearized state equation;
- introduce the adjoint ODE and obtain the gradient formula;
- state the first-order optimality conditions, both unconstrained and box-constrained;
- isolate the structural difference with the elliptic case: a forward-backward optimality system;
- introduce strongly measurable vector-valued functions and Bochner spaces;
- define weak time derivatives in dual spaces and the space
  $$W(0,T):=\{y\in L^2(0,T;V):\ y_t\in L^2(0,T;V')\};$$
- state the basic embedding and integration-by-parts results that make parabolic optimal control possible.
Model Problem ¶

Let $T>0$ be fixed, and let
$$A\in\mathbb R^{n\times n},\qquad B\in\mathbb R^{n\times m},\qquad y_0\in\mathbb R^n.$$
Let also
$$f\in L^2(0,T;\mathbb R^n),\qquad y_d\in L^2(0,T;\mathbb R^n),\qquad y_T\in\mathbb R^n,$$
and let $\alpha>0$, $\beta\ge 0$.
The control space is
$$U:=L^2(0,T;\mathbb R^m).$$
The admissible set is a nonempty, closed, convex subset $U_{\mathrm{ad}}\subset U$.
For $u\in U$, the state $y=y(u)$ is defined by the linear ODE
$$\dot y(t)=Ay(t)+Bu(t)+f(t)\qquad\text{for a.e. }t\in(0,T),\qquad y(0)=y_0.$$
We consider the optimal control problem
$$\min_{u\in U_{\mathrm{ad}}}\ j(u):=J(y(u),u),$$
where
$$J(y,u):=\frac12\int_0^T|y(t)-y_d(t)|^2\,dt+\frac\alpha2\int_0^T|u(t)|^2\,dt+\frac\beta2|y(T)-y_T|^2.$$
The three terms are:
- a distributed tracking term in time;
- the Tikhonov regularization on the control;
- an optional terminal observation.
This model already contains all structural ingredients of parabolic optimal
control.
State Equation and Control-to-State Map ¶

We begin with the well-posedness of the state equation.
Since the right-hand side belongs to $L^2(0,T;\mathbb R^n)$, one expects the
solution to belong to $H^1(0,T;\mathbb R^n)$.
Define
$$Y:=H^1(0,T;\mathbb R^n).$$
Recall that in finite dimensions
$$H^1(0,T;\mathbb R^n)\hookrightarrow C([0,T];\mathbb R^n),$$
hence the terminal value $y(T)$ is meaningful.
Proposition.
For every $u\in U$, the state equation admits a unique solution $y\in Y=H^1(0,T;\mathbb R^n)$.
Moreover, the solution is given by the variation-of-constants formula
$$y(t)=e^{tA}y_0+\int_0^t e^{(t-s)A}\bigl(Bu(s)+f(s)\bigr)\,ds,$$
and there exists a constant $C>0$, depending only on $A$, $B$, and $T$, such that
$$\|y\|_{H^1(0,T;\mathbb R^n)}+\|y\|_{C([0,T];\mathbb R^n)}\le C\bigl(|y_0|+\|u\|_{L^2(0,T;\mathbb R^m)}+\|f\|_{L^2(0,T;\mathbb R^n)}\bigr).$$
Proof.
Fix $u\in U$ and define
$$g(t):=Bu(t)+f(t)\in L^2(0,T;\mathbb R^n).$$
Since $A$ is a constant matrix, the matrix exponential $e^{tA}$ is well defined and continuous in $t$.
The formula
$$y(t):=e^{tA}y_0+\int_0^t e^{(t-s)A}g(s)\,ds$$
defines a continuous function on $[0,T]$.
Differentiating under the integral sign yields
$$\dot y(t)=Ae^{tA}y_0+g(t)+\int_0^t Ae^{(t-s)A}g(s)\,ds=Ay(t)+g(t)$$
for a.e. $t\in(0,T)$, and clearly $y(0)=y_0$.
Thus $y\in H^1(0,T;\mathbb R^n)$ and solves the ODE.
Uniqueness follows from linearity: if $y_1$ and $y_2$ solve the same problem, then $w:=y_1-y_2$ satisfies
$$\dot w=Aw,\qquad w(0)=0,$$
hence $w(t)=e^{tA}w(0)=0$ for all $t$.
To estimate $y$, let
$$M_A:=\max_{0\le t\le T}\|e^{tA}\|.$$
Then
$$|y(t)|\le M_A|y_0|+M_A\int_0^T|g(s)|\,ds\le M_A|y_0|+M_A T^{1/2}\|g\|_{L^2(0,T)}.$$
Taking the supremum in $t$ gives
$$\|y\|_{C([0,T])}\le C\bigl(|y_0|+\|g\|_{L^2(0,T)}\bigr)\le C\bigl(|y_0|+\|u\|_{L^2(0,T)}+\|f\|_{L^2(0,T)}\bigr).$$
Since $\dot y=Ay+g$,
$$\|\dot y\|_{L^2(0,T)}\le\|A\|\,\|y\|_{L^2(0,T)}+\|g\|_{L^2(0,T)}\le C\bigl(|y_0|+\|u\|_{L^2(0,T)}+\|f\|_{L^2(0,T)}\bigr).$$
Combining the bounds for $y$ and $\dot y$ yields the claim. $\square$
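The variation-of-constants formula can be checked numerically. The following is a minimal sketch with illustrative data (the matrices $A$, $B$, the control $u$, and the source $f$ below are ad hoc choices, not from the lecture): it evaluates $y(T)$ by the formula with a trapezoidal quadrature and compares against a standard ODE solver.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# Illustrative data (not from the lecture).
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
y0 = np.array([1.0, 0.0])
T = 1.0
u = lambda t: np.array([np.sin(2.0 * np.pi * t)])   # an arbitrary fixed control
f = lambda t: np.array([0.0, 1.0])                  # constant source term

def y_voc(t, n=2000):
    """Variation-of-constants formula; the integral by the trapezoidal rule."""
    s = np.linspace(0.0, t, n)
    vals = np.array([expm((t - si) * A) @ (B @ u(si) + f(si)) for si in s])
    ds = s[1] - s[0]
    integral = (0.5 * vals[0] + vals[1:-1].sum(axis=0) + 0.5 * vals[-1]) * ds
    return expm(t * A) @ y0 + integral

# Reference solution of  y' = Ay + Bu + f,  y(0) = y0.
sol = solve_ivp(lambda t, y: A @ y + B @ u(t) + f(t), (0.0, T), y0,
                rtol=1e-10, atol=1e-12)
print(np.linalg.norm(y_voc(T) - sol.y[:, -1]))      # small quadrature error
```

The two values agree up to the quadrature error of the trapezoidal rule.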
The proposition allows us to define the control-to-state map
$$S:U\to Y,\qquad u\mapsto y=S(u).$$
Because the state equation is linear in $(y,u)$, the map $S$ is affine (linear when $y_0=0$ and $f=0$) and continuous.
The reduced functional is therefore
$$j(u)=J(S(u),u).$$

Existence and Uniqueness of an Optimal Control ¶

The previous lectures already established an abstract existence theorem in
Hilbert spaces.
In the present setting the argument becomes particularly transparent.
Theorem.
Assume $\alpha>0$ and let $U_{\mathrm{ad}}\subset U$ be nonempty, closed, and convex.
Then the reduced problem
$$\min_{u\in U_{\mathrm{ad}}} j(u)$$
admits a unique solution $\bar u\in U_{\mathrm{ad}}$.
Proof.
Since $S:U\to Y$ is continuous and $J$ is continuous on $Y\times U$, the map $j:U\to\mathbb R$ is continuous.
Moreover,
$$j(u)=\frac12\|S(u)-y_d\|_{L^2(0,T;\mathbb R^n)}^2+\frac\alpha2\|u\|_{L^2(0,T;\mathbb R^m)}^2+\frac\beta2|S(u)(T)-y_T|^2\ge\frac\alpha2\|u\|_U^2.$$
Hence $j$ is coercive on $U$.
Let $(u_k)$ be a minimizing sequence in $U_{\mathrm{ad}}$.
Coercivity implies that $(u_k)$ is bounded in $U$.
Since $U$ is a Hilbert space, there exist a subsequence, not relabeled, and an element $\bar u\in U$ such that
$$u_k\rightharpoonup\bar u\qquad\text{weakly in }U.$$
Because $U_{\mathrm{ad}}$ is closed and convex, it is weakly closed, hence $\bar u\in U_{\mathrm{ad}}$.
Since $S$ is affine with continuous linear part, weak convergence is preserved:
$$S(u_k)\rightharpoonup S(\bar u)\qquad\text{weakly in }Y.$$
Since the trace map $y\mapsto y(T)$ is continuous and linear from $H^1(0,T;\mathbb R^n)$ to $\mathbb R^n$, we also have
$$S(u_k)(T)\rightharpoonup S(\bar u)(T)\qquad\text{in }\mathbb R^n.$$
Since the norm is weakly lower semicontinuous in Hilbert spaces, each term in $j$ is weakly lower semicontinuous, hence
$$j(\bar u)\le\liminf_{k\to\infty} j(u_k),$$
so $\bar u$ is a minimizer.
To prove uniqueness, let $u_1\neq u_2$ and $\theta\in(0,1)$.
Since $S$ is affine,
$$S(\theta u_1+(1-\theta)u_2)=\theta S(u_1)+(1-\theta)S(u_2).$$
The first and third terms of $j$ are convex, while the control term is strictly convex because $\alpha>0$:
$$\|\theta u_1+(1-\theta)u_2\|_U^2<\theta\|u_1\|_U^2+(1-\theta)\|u_2\|_U^2.$$
Therefore $j$ is strictly convex and the minimizer is unique. $\square$
Linearized State Equation ¶

To differentiate the reduced cost, we perturb the control by $h\in U$.
Since the state equation is linear in $(y,u)$, the state increment is described by a linear ODE
independent of the base point.
Let $u\in U$ and $h\in U$.
Define
$$z_h:=S'(u)h.$$
Then $z_h$ solves
$$\dot z_h(t)=Az_h(t)+Bh(t)\qquad\text{for a.e. }t\in(0,T),\qquad z_h(0)=0.$$
Because the control-to-state map is affine, in fact
$$S'(u)h=S(h)-S(0)$$
for every $u\in U$.
Equivalently, one may derive the linearized equation directly from
$$\dot y=Ay+Bu+f$$
by replacing $u$ with $u+\varepsilon h$, subtracting the equation for $u$, dividing by $\varepsilon$,
and passing to the limit.
The directional derivative of the reduced functional therefore reads
$$j'(u)h=\int_0^T(y(t)-y_d(t))\cdot z_h(t)\,dt+\beta(y(T)-y_T)\cdot z_h(T)+\alpha\int_0^T u(t)\cdot h(t)\,dt.$$
The difficulty is that this formula still contains the linearized state $z_h$.
As in the elliptic case, the adjoint variable removes this dependence.
Adjoint Equation ¶

The adjoint equation is the backward-in-time equation dual to the linearized
state equation.
Its role is exactly the same as in the elliptic case, but the time direction is reversed.
Let $u\in U$ be fixed, and let $y=S(u)$.
We define the adjoint state $p\in H^1(0,T;\mathbb R^n)$ by
$$-\dot p(t)=A^Tp(t)+y(t)-y_d(t)\qquad\text{for a.e. }t\in(0,T),\qquad p(T)=\beta\,(y(T)-y_T).$$
This is again a linear ODE, now with a terminal condition prescribed at $t=T$.
Its unique solution is obtained by solving backward in time, or equivalently by the change of variable
$s=T-t$.
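The change of variable can be made concrete: substituting $q(s):=p(T-s)$ turns the terminal-value problem into an initial-value problem $q'(s)=A^Tq(s)+y(T-s)-y_d(T-s)$, $q(0)=p(T)$, which any forward solver handles. A minimal sketch with illustrative data (the state $y$ below is a stand-in curve, not an actual $S(u)$):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative data (not from the lecture); beta = 1.
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
T, beta = 1.0, 1.0
y  = lambda t: np.array([np.cos(t), -np.sin(t)])   # stand-in for the state S(u)
yd = lambda t: np.zeros(2)                          # tracking target
yT = np.array([0.0, 0.0])

# q(s) := p(T - s)  =>  q'(s) = A^T q(s) + y(T-s) - y_d(T-s),  q(0) = p(T).
def q_rhs(s, q):
    return A.T @ q + y(T - s) - yd(T - s)

q0 = beta * (y(T) - yT)                  # q(0) = p(T) = beta (y(T) - y_T)
sol = solve_ivp(q_rhs, (0.0, T), q0, dense_output=True, rtol=1e-9, atol=1e-12)

p = lambda t: sol.sol(T - t)             # undo the change of variable
print(p(T))                              # equals the terminal condition q0
```

A finite-difference check of $-\dot p=A^Tp+y-y_d$ at an interior time confirms the sign reversal.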
Proposition.
Let $u\in U$, $y=S(u)$, and let $p$ solve the adjoint equation above.
Then for every $h\in U$, with $z_h$ solving the linearized state equation,
$$\int_0^T(y(t)-y_d(t))\cdot z_h(t)\,dt+\beta(y(T)-y_T)\cdot z_h(T)=\int_0^T B^Tp(t)\cdot h(t)\,dt.$$
Proof.
From the linearized state equation,
$$\dot z_h-Az_h-Bh=0.$$
Multiply by $p$ and integrate over $(0,T)$:
$$\int_0^T p(t)\cdot\bigl(\dot z_h(t)-Az_h(t)-Bh(t)\bigr)\,dt=0.$$
Using $(p,Az_h)=(A^Tp,z_h)$ and integrating by parts in time,
$$\int_0^T p\cdot\dot z_h\,dt=p(T)\cdot z_h(T)-p(0)\cdot z_h(0)-\int_0^T\dot p\cdot z_h\,dt.$$
Since $z_h(0)=0$,
$$p(T)\cdot z_h(T)-\int_0^T\bigl(\dot p+A^Tp\bigr)\cdot z_h\,dt-\int_0^T B^Tp\cdot h\,dt=0.$$
By the adjoint equation,
$$-(\dot p+A^Tp)=y-y_d,$$
hence
$$p(T)\cdot z_h(T)+\int_0^T(y-y_d)\cdot z_h\,dt-\int_0^T B^Tp\cdot h\,dt=0.$$
Finally, the terminal condition gives
$$p(T)\cdot z_h(T)=\beta(y(T)-y_T)\cdot z_h(T),$$
which yields the claim. $\square$
We can now eliminate the linearized state from the derivative.

Theorem.
The reduced functional $j:U\to\mathbb R$ is Fréchet differentiable and, for every $u\in U$,
$$j'(u)h=\int_0^T\bigl(\alpha u(t)+B^Tp(t)\bigr)\cdot h(t)\,dt\qquad\forall h\in U,$$
where $p$ is the adjoint associated with $u$.
Hence the gradient of $j$ in the Hilbert space $U=L^2(0,T;\mathbb R^m)$ is
$$\nabla j(u)=\alpha u+B^Tp.$$

Proof.
We already computed
$$j'(u)h=\int_0^T(y-y_d)\cdot z_h\,dt+\beta(y(T)-y_T)\cdot z_h(T)+\alpha\int_0^T u\cdot h\,dt.$$
The adjoint identity from the previous proposition gives
$$\int_0^T(y-y_d)\cdot z_h\,dt+\beta(y(T)-y_T)\cdot z_h(T)=\int_0^T B^Tp\cdot h\,dt.$$
Therefore
$$j'(u)h=\int_0^T(\alpha u+B^Tp)\cdot h\,dt.$$
Since this is a bounded linear functional of $h$, the Fréchet derivative is represented in $U$
by the function $\alpha u+B^Tp$. $\square$
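The gradient formula can be verified in a discretize-then-optimize fashion. The sketch below (all data illustrative) discretizes the state equation with forward Euler and derives the corresponding *exact* discrete adjoint from the discrete Lagrangian; the resulting gradient matches a finite-difference derivative of the discrete cost to rounding error, and differs from the continuous formula $\nabla j(u)=\alpha u+B^Tp$ only by $O(\Delta t)$ index shifts.

```python
import numpy as np

# Illustrative data (not from the lecture).
np.random.seed(0)
n, m, N, T = 2, 1, 200, 1.0
dt = T / N
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
y0 = np.array([1.0, 0.0])
alpha, beta = 0.1, 0.5
yd = np.zeros((N + 1, n)); yT = np.zeros(n); f = np.zeros((N, n))

def solve_state(u):                       # forward Euler for y' = Ay + Bu + f
    y = np.zeros((N + 1, n)); y[0] = y0
    for k in range(N):
        y[k + 1] = y[k] + dt * (A @ y[k] + B @ u[k] + f[k])
    return y

def cost(u):
    y = solve_state(u)
    return (0.5 * dt * np.sum((y - yd) ** 2)
            + 0.5 * alpha * dt * np.sum(u ** 2)
            + 0.5 * beta * np.sum((y[N] - yT) ** 2))

def gradient(u):
    y = solve_state(u)
    p = np.zeros((N + 1, n))
    p[N] = beta * (y[N] - yT) + dt * (y[N] - yd[N])  # discrete terminal condition
    for k in range(N - 1, 0, -1):                    # backward-in-time sweep
        p[k] = p[k + 1] + dt * (A.T @ p[k + 1] + y[k] - yd[k])
    return dt * (alpha * u + p[1:] @ B)              # g_k = dt (alpha u_k + B^T p_{k+1})

u = np.random.randn(N, m)
g = gradient(u)
h = np.random.randn(N, m)
eps = 1e-6
fd = (cost(u + eps * h) - cost(u - eps * h)) / (2 * eps)
print(abs(fd - np.sum(g * h)))            # near zero: exact discrete adjoint
```

Since the cost is quadratic, the central difference is exact up to rounding, so the match is essentially at machine precision.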
First-Order Optimality System ¶

We can now write the necessary and sufficient optimality conditions.
Because the problem is convex and the reduced functional is strictly convex,
first-order conditions characterize the unique minimizer.

Unconstrained case ¶

If
$$U_{\mathrm{ad}}=U=L^2(0,T;\mathbb R^m),$$
then the stationarity condition is simply
$$\nabla j(\bar u)=0,$$
i.e.
$$\alpha\bar u+B^T\bar p=0\qquad\text{in }L^2(0,T;\mathbb R^m).$$
Thus the optimal control satisfies the explicit relation
$$\bar u(t)=-\frac1\alpha B^T\bar p(t)\qquad\text{for a.e. }t\in(0,T).$$
The optimality system becomes
$$\begin{cases}
\dot{\bar y}(t)=A\bar y(t)+B\bar u(t)+f(t), & \text{a.e. } t\in(0,T),\\
\bar y(0)=y_0,\\
-\dot{\bar p}(t)=A^T\bar p(t)+\bar y(t)-y_d(t), & \text{a.e. } t\in(0,T),\\
\bar p(T)=\beta(\bar y(T)-y_T),\\
\alpha\bar u(t)+B^T\bar p(t)=0, & \text{a.e. } t\in(0,T).
\end{cases}$$
This is a forward-backward system: the state carries an initial condition at $t=0$, while the adjoint carries a terminal condition at $t=T$.
This two-sided time structure is the first major structural difference from elliptic control.
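The two-sided structure can be made computational: eliminating the control via $\bar u=-\frac1\alpha B^T\bar p$ couples state and adjoint into one $2n$-dimensional ODE with conditions at both ends, i.e. a two-point boundary value problem. A minimal sketch with illustrative data, using SciPy's collocation solver:

```python
import numpy as np
from scipy.integrate import solve_bvp

# Illustrative data (not from the lecture).
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
y0 = np.array([1.0, 0.0])
T, alpha, beta = 1.0, 0.1, 1.0
yT = np.array([0.0, 0.0])
f  = lambda t: np.zeros((2, np.size(t)))
yd = lambda t: np.zeros((2, np.size(t)))

def rhs(t, w):                      # w = (y, p), stacked as 4 components
    y, p = w[:2], w[2:]
    dy = A @ y - (B @ B.T / alpha) @ p + f(t)   # state with u = -(1/alpha) B^T p
    dp = -A.T @ p - y + yd(t)                    # adjoint equation
    return np.vstack([dy, dp])

def bc(w0, wT):                     # y(0) = y0,  p(T) = beta (y(T) - y_T)
    return np.concatenate([w0[:2] - y0, wT[2:] - beta * (wT[:2] - yT)])

t = np.linspace(0.0, T, 50)
sol = solve_bvp(rhs, bc, t, np.zeros((4, t.size)))
ubar = lambda t: -(B.T @ sol.sol(t)[2:]) / alpha   # recover the optimal control
print(sol.status)                   # 0 means the solver converged
```

This is only one of several options (shooting, Riccati equations, or gradient methods on the reduced problem solve the same system).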
Constrained case ¶

Assume now that $U_{\mathrm{ad}}\subset U$ is closed and convex.
Then $\bar u\in U_{\mathrm{ad}}$ is optimal if and only if
$$j'(\bar u)(u-\bar u)\ge 0\qquad\forall u\in U_{\mathrm{ad}}.$$
Using the adjoint representation of the derivative, this becomes
$$\int_0^T\bigl(\alpha\bar u(t)+B^T\bar p(t)\bigr)\cdot\bigl(u(t)-\bar u(t)\bigr)\,dt\ge 0\qquad\forall u\in U_{\mathrm{ad}}.$$
Equivalently, in normal-cone form,
$$0\in\alpha\bar u+B^T\bar p+N_{U_{\mathrm{ad}}}(\bar u)\qquad\text{in }U.$$
Hence the full optimality system is
$$\begin{cases}
\dot{\bar y}=A\bar y+B\bar u+f,
\qquad \bar y(0)=y_0,\\[0.3em]
-\dot{\bar p}=A^T\bar p+\bar y-y_d,
\qquad \bar p(T)=\beta(\bar y(T)-y_T),\\[0.3em]
0\in \alpha\bar u+B^T\bar p+N_{U_{\mathrm{ad}}}(\bar u).
\end{cases}$$
A particularly important case is the box-constrained set
$$U_{\mathrm{ad}}:=\bigl\{u\in L^2(0,T;\mathbb R^m):\ u_a(t)\le u(t)\le u_b(t)\ \text{for a.e. }t\in(0,T)\bigr\},$$
where $u_a,u_b\in L^\infty(0,T;\mathbb R^m)$ and the inequalities are understood componentwise.
In this case the optimality condition is equivalent to the pointwise projection formula
$$\bar u(t)=P_{[u_a(t),u_b(t)]}\!\left(-\frac1\alpha B^T\bar p(t)\right)\qquad\text{for a.e. }t\in(0,T),$$
where for $\xi\in\mathbb R^m$
$$P_{[u_a(t),u_b(t)]}(\xi)=\min\bigl(\max(\xi,u_a(t)),u_b(t)\bigr)$$
componentwise.
The proof is identical in structure to the elliptic case:
- the variational inequality is posed in the Hilbert space $L^2(0,T;\mathbb R^m)$;
- the admissible set is a closed convex box;
- the projection is characterized pointwise almost everywhere.
Thus the only genuinely new analytical ingredient introduced by time dependence is not the
control condition, but the forward-backward evolution structure of state and adjoint.
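The pointwise characterization is easy to test numerically. In the sketch below (illustrative numbers; `Btp` stands in for the value $B^T\bar p(t)$ at one fixed time), the clipped value $\hat u=P_{[u_a,u_b]}(-\frac1\alpha B^T\bar p)$ is checked against the pointwise variational inequality over many sampled admissible points:

```python
import numpy as np

# Illustrative, single-time-instant check of the projection formula.
rng = np.random.default_rng(1)
alpha = 0.5
ua, ub = np.array([-1.0, 0.0]), np.array([1.0, 2.0])   # box bounds at time t
Btp = rng.normal(size=2)                               # stands in for B^T pbar(t)

u_hat = np.clip(-Btp / alpha, ua, ub)                  # componentwise projection

# The variational inequality (alpha u_hat + B^T p) . (v - u_hat) >= 0
# must hold for every admissible v; sample many v in the box.
v = rng.uniform(ua, ub, size=(10000, 2))
lhs = (v - u_hat) @ (alpha * u_hat + Btp)
print(lhs.min())                                       # >= 0 up to rounding
```

Componentwise, either $\hat u_i$ is interior and $\alpha\hat u_i+(B^T\bar p)_i=0$, or $\hat u_i$ sits on a bound and the corresponding factor has a fixed sign, which is exactly why the minimum stays nonnegative.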
Forward-Backward Interpretation ¶

It is worth isolating the conceptual meaning of the adjoint in the dynamical case, for a fixed control $u$.
The terminal condition
$$p(T)=\beta(y(T)-y_T)$$
encodes the derivative of the terminal observation.
If $\beta=0$, then $p(T)=0$ and only the distributed tracking term drives the adjoint.
If instead the cost is purely terminal,
$$J(y,u)=\frac\alpha2\int_0^T|u(t)|^2\,dt+\frac\beta2|y(T)-y_T|^2,$$
then the adjoint satisfies
$$-\dot p=A^Tp,\qquad p(T)=\beta(y(T)-y_T).$$
This is exactly the same phenomenon that one encounters later for parabolic PDEs:
- state equation forward in time;
- adjoint equation backward in time;
- control condition coupling state and adjoint at the same time level.
From ODEs to Evolution Equations ¶

The ODE model can be rewritten abstractly as
$$y_t+\mathcal Ay=\mathcal Bu+f,$$
where now the state $y(t)$ is an element of the finite-dimensional Hilbert space
$H=\mathbb R^n$ and
$$\mathcal A:=-A,\qquad\mathcal B:=B.$$
For parabolic PDEs, the same formula remains formally correct, but the state at each time is no longer a vector in $\mathbb R^n$.
Instead:
- $y(t)$ is a function of the space variable, e.g. $y(t)\in H_0^1(\Omega)$ or $L^2(\Omega)$;
- the derivative $y_t$ generally belongs to a dual space rather than to the same space as $y$;
- time integration must be carried out for vector-valued functions.
This is the reason why the usual scalar-valued Lebesgue and Sobolev spaces are not sufficient.
We need function spaces of the form
$$L^p(0,T;X),$$
where $X$ is itself a Banach or Hilbert space.
These are the Bochner spaces.
Strongly Measurable Vector-Valued Functions ¶

Let $X$ be a Banach space.
A function
$$y:(0,T)\to X$$
is called simple if it has the form
$$y(t)=\sum_{k=1}^N x_k\,\chi_{E_k}(t),$$
where $x_k\in X$ and $E_k\subset(0,T)$ are measurable sets.
A function $y:(0,T)\to X$ is called strongly measurable if there exists a sequence of
simple functions $(y_n)$ such that
$$y_n(t)\to y(t)\qquad\text{for a.e. }t\in(0,T).$$
This is the natural notion of measurability for vector-valued functions.
In the Hilbert spaces used in parabolic PDEs, separability holds, so this definition behaves well.
If $y$ is strongly measurable and
$$\int_0^T\|y(t)\|_X\,dt<\infty,$$
then $y$ is Bochner integrable and one may define
$$\int_0^T y(t)\,dt\in X$$
as the limit of the integrals of simple approximations.
This is the vector-valued analogue of the usual Lebesgue integral.
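In finite dimensions the construction is transparent. The following sketch (illustrative curve, $X=\mathbb R^2$) approximates $\int_0^T y(t)\,dt$ by the integrals of piecewise-constant simple functions and watches the error shrink as the partition is refined:

```python
import numpy as np

# Bochner integral of y(t) = (cos t, e^t) on (0,1), approximated by the
# integrals of simple (piecewise-constant) functions.
T = 1.0
y = lambda t: np.array([np.cos(t), np.exp(t)])

def simple_approx_integral(N):
    # value at each midpoint, held constant on N subintervals of length T/N
    t_mid = (np.arange(N) + 0.5) * T / N
    return sum(y(t) for t in t_mid) * (T / N)

exact = np.array([np.sin(T), np.exp(T) - 1.0])   # componentwise antiderivatives
for N in (4, 16, 64):
    print(N, np.linalg.norm(simple_approx_integral(N) - exact))
```

The error decreases like $O(N^{-2})$ here because the midpoint values are used; any simple-function approximation converging a.e. with dominated norms would do.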
Bochner Spaces $L^p(0,T;X)$ ¶

Let $1\le p<\infty$.
We define
$$L^p(0,T;X):=\Bigl\{y:(0,T)\to X:\ y\text{ strongly measurable and }\int_0^T\|y(t)\|_X^p\,dt<\infty\Bigr\}.$$
The norm is
$$\|y\|_{L^p(0,T;X)}:=\left(\int_0^T\|y(t)\|_X^p\,dt\right)^{1/p}.$$
Similarly,
$$L^\infty(0,T;X):=\Bigl\{y:(0,T)\to X:\ y\text{ strongly measurable and }\operatorname*{ess\,sup}_{t\in(0,T)}\|y(t)\|_X<\infty\Bigr\}.$$
Standard facts:
- if $X$ is Banach, then $L^p(0,T;X)$ is Banach;
- if $X$ is Hilbert and $p=2$, then $L^2(0,T;X)$ is Hilbert with inner product
  $$(y,z)_{L^2(0,T;X)}:=\int_0^T(y(t),z(t))_X\,dt;$$
- if $X\hookrightarrow Z$ continuously, then $L^p(0,T;X)\hookrightarrow L^p(0,T;Z)$ continuously.
Fundamental examples ¶

Let $\Omega\subset\mathbb R^d$ be a bounded Lipschitz domain and set
$$Q_T:=\Omega\times(0,T).$$
Then:
- if $X=L^2(\Omega)$, then $L^2(0,T;L^2(\Omega))\cong L^2(Q_T)$;
- if $X=H_0^1(\Omega)$, then $L^2(0,T;H_0^1(\Omega))$ consists of functions square integrable in time with values in $H_0^1(\Omega)$;
- if $X=H^{-1}(\Omega)$, then $L^2(0,T;H^{-1}(\Omega))$ is the natural space for weak time derivatives of parabolic states.
Thus, in parabolic theory, the same function $y(x,t)$ is viewed as a map
$$t\mapsto y(t):=y(\cdot,t)$$
with values in a spatial function space.
Weak Time Derivatives ¶

For parabolic equations, the time derivative does not usually belong to the same space as the state.
This forces a dual-space formulation.
Let $X$ be a Banach space and let $y\in L^1(0,T;X)$.
A function $z\in L^1(0,T;X)$ is called the weak time derivative of $y$ if for every scalar test function
$\varphi\in C_c^\infty(0,T)$ and every $\ell\in X'$,
$$\int_0^T\langle\ell,y(t)\rangle_{X',X}\,\varphi'(t)\,dt=-\int_0^T\langle\ell,z(t)\rangle_{X',X}\,\varphi(t)\,dt.$$
In this case we write $y_t=z$.
If $X$ is Hilbert, one can identify $X\cong X'$ by the Riesz map and recover the familiar scalar definition.
The Sobolev space of $X$-valued functions is then
$$H^1(0,T;X):=\{y\in L^2(0,T;X):\ y_t\in L^2(0,T;X)\}.$$
For ODEs, where $X=\mathbb R^n$, this is the space used for the state and adjoint variables.
For parabolic PDEs, however, the correct setting is generally not $H^1(0,T;X)$ with a single space $X$, but a mixed space involving a Gelfand triple.
Gelfand Triples and the Space $W(0,T)$ ¶

Let $V$ and $H$ be Hilbert spaces such that $V\hookrightarrow H$ continuously and densely.
By identifying $H$ with its dual $H'$ through the Riesz isomorphism, one obtains the Gelfand triple
$$V\hookrightarrow H\cong H'\hookrightarrow V'.$$
The last embedding is defined by
$$\langle h,v\rangle_{V',V}:=(h,v)_H\qquad\forall h\in H,\ \forall v\in V.$$
The canonical parabolic example is
$$V=H_0^1(\Omega),\qquad H=L^2(\Omega),\qquad V'=H^{-1}(\Omega).$$
The natural energy space for parabolic problems is
$$W(0,T):=\{y\in L^2(0,T;V):\ y_t\in L^2(0,T;V')\}.$$
It is a Hilbert space with norm
$$\|y\|_{W(0,T)}^2:=\|y\|_{L^2(0,T;V)}^2+\|y_t\|_{L^2(0,T;V')}^2.$$
This is the parabolic analogue of $H^1(0,T;\mathbb R^n)$ for ODEs.
The key point is the asymmetry: $y$ takes values in $V$, while $y_t$ only takes values in the larger space $V'$.
This is forced by the weak formulation of the PDE.
For the heat equation,
$$y_t-\Delta y=F,$$
one expects
$$y(t)\in H_0^1(\Omega),\qquad y_t(t)\in H^{-1}(\Omega).$$

Fundamental Theorem for $W(0,T)$ ¶

The space $W(0,T)$ has a decisive property: its elements possess a continuous representative with values in $H$.
This is what makes initial and terminal conditions meaningful.
Theorem (Lions–Magenes). Let $V \hookrightarrow H \hookrightarrow V'$ be a Gelfand triple. Then:

every $y\in W(0,T)$ admits a representative, still denoted by $y$, such that $y\in C([0,T];H)$;
the embedding $W(0,T) \hookrightarrow C([0,T];H)$ is continuous;
if $y,v\in W(0,T)$, then the scalar map $t\mapsto (y(t),v(t))_H$ is absolutely continuous and satisfies
$$\frac{d}{dt}(y(t),v(t))_H = \langle y_t(t),v(t)\rangle_{V',V} + \langle v_t(t),y(t)\rangle_{V',V}$$
for a.e. $t\in(0,T)$.

In particular, taking $v=y$ gives

$$\frac12\frac{d}{dt}\|y(t)\|_H^2 = \langle y_t(t),y(t)\rangle_{V',V} \qquad \text{for a.e. } t\in(0,T).$$

Integrating between $s$ and $t$ yields the energy identity

$$\frac12\|y(t)\|_H^2 - \frac12\|y(s)\|_H^2 = \int_s^t \langle y_t(\tau),y(\tau)\rangle_{V',V}\,d\tau.$$

More generally, for $y,v\in W(0,T)$,

$$(y(t),v(t))_H - (y(s),v(s))_H = \int_s^t \langle y_t(\tau),v(\tau)\rangle_{V',V}\,d\tau + \int_s^t \langle v_t(\tau),y(\tau)\rangle_{V',V}\,d\tau.$$

This is the time-integration-by-parts formula needed for parabolic adjoints.
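In finite dimensions the identity is elementary, but it can be instructive to see it numerically. The following sketch takes $H=\mathbb R^2$, so that the duality pairing reduces to the Euclidean inner product, and checks the energy identity by quadrature; the curve `y` is an arbitrary smooth choice for illustration.

```python
import numpy as np

# Numerical sanity check of the energy identity
#   1/2 ||y(t)||^2 - 1/2 ||y(s)||^2 = ∫_s^t <y_t(τ), y(τ)> dτ
# in the finite-dimensional case H = R^2, where <·,·>_{V',V}
# reduces to the Euclidean inner product.

def y(t):
    # arbitrary smooth curve with values in R^2 (illustrative choice)
    return np.array([np.cos(t), np.sin(2.0 * t)])

def yt(t):
    # classical derivative of y
    return np.array([-np.sin(t), 2.0 * np.cos(2.0 * t)])

s, t = 0.3, 1.7
lhs = 0.5 * y(t) @ y(t) - 0.5 * y(s) @ y(s)

# trapezoidal quadrature of the right-hand side
taus = np.linspace(s, t, 20001)
vals = np.array([yt(tau) @ y(tau) for tau in taus])
rhs = np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(taus))

print(abs(lhs - rhs))  # small: the identity holds up to quadrature error
```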
Abstract Parabolic Problem ¶

With the previous tools in place, one can formulate the prototype parabolic state equation. Let $V\hookrightarrow H\hookrightarrow V'$ be a Gelfand triple, and let $a:V\times V\to \mathbb R$ be a bilinear form satisfying:

continuity:
$$|a(w,v)|\le M\,\|w\|_V\|v\|_V \qquad \forall w,v\in V;$$

coercivity:
$$a(v,v)\ge c_a \|v\|_V^2 \qquad \forall v\in V,$$
for some $c_a>0$.

Define the operator
$$A:V\to V', \qquad \langle Ay,v\rangle_{V',V}=a(y,v).$$

Given
$$F\in L^2(0,T;V'), \qquad y_0\in H,$$
the weak parabolic problem is: find $y\in W(0,T)$ such that

$$\langle y_t(t),v\rangle_{V',V} + a(y(t),v) = \langle F(t),v\rangle_{V',V} \qquad \forall v\in V, \quad \text{for a.e. } t\in(0,T),$$

with
$$y(0)=y_0 \quad \text{in } H.$$

Theorem.
Under the assumptions above, the parabolic problem admits a unique solution $y\in W(0,T)$. Moreover,

$$\|y\|_{L^2(0,T;V)} + \|y\|_{L^\infty(0,T;H)} + \|y_t\|_{L^2(0,T;V')} \le C\bigl(\|F\|_{L^2(0,T;V')}+\|y_0\|_H\bigr),$$

for a constant $C$ depending only on the continuity and coercivity constants and on $T$.
This theorem is the infinite-dimensional analogue of the well-posedness proposition for the ODE state equation. The analogies are exact:

the matrix $A$ becomes an operator $A:V\to V'$;
the state space $H^1(0,T;\mathbb R^n)$ becomes $W(0,T)$;
the terminal value is meaningful because $W(0,T)\hookrightarrow C([0,T];H)$.
Heat Equation as Canonical Example ¶

Take
$$V=H_0^1(\Omega), \qquad H=L^2(\Omega), \qquad V'=H^{-1}(\Omega),$$
and define
$$a(y,v)=\int_\Omega \nabla y\cdot \nabla v\,dx.$$

Then
$$\langle Ay,v\rangle = \int_\Omega \nabla y\cdot \nabla v\,dx$$
corresponds to the operator $A=-\Delta$ in weak form. Given a control $u\in L^2(0,T;L^2(\Omega))$, one may write

$$F(t)=u(t)+f(t) \in H \hookrightarrow V'.$$

The parabolic state equation becomes

$$\langle y_t(t),v\rangle_{H^{-1},H_0^1} + \int_\Omega \nabla y(t)\cdot \nabla v\,dx = \int_\Omega \bigl(u(t)+f(t)\bigr)\,v\,dx$$

for all $v\in H_0^1(\Omega)$ and a.e. $t\in(0,T)$, with $y(0)=y_0$. This is the standard weak formulation of the heat equation

$$y_t - \Delta y = u+f \qquad \text{in } \Omega\times(0,T),$$

with homogeneous Dirichlet boundary conditions.
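To make the weak formulation concrete, here is a minimal numerical sketch, not part of the theory above: the 1D heat equation on $\Omega=(0,1)$ with homogeneous Dirichlet conditions, discretized by centered finite differences in space (a stand-in for the bilinear form $a$) and implicit Euler in time. All grid sizes and data below are illustrative choices.

```python
import numpy as np

# Sketch: y_t - y_xx = f on (0, 1), y = 0 on the boundary,
# centered finite differences in space, implicit Euler in time.

n, T, nt = 99, 0.1, 100            # interior points, final time, time steps
h, dt = 1.0 / (n + 1), T / nt
x = np.linspace(h, 1.0 - h, n)     # interior grid points

# discrete version of A = -d²/dx² (the weak-form operator A = -Δ)
A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

f = np.ones(n)                     # constant source term
y = np.sin(np.pi * x)              # initial condition y_0

M = np.eye(n) + dt * A             # implicit Euler: (I + dt A) y^{k+1} = y^k + dt f
for _ in range(nt):
    y = np.linalg.solve(M, y + dt * f)

print(y.max())                     # final-time profile is smooth and damped
```

Implicit Euler is used here because the explicit scheme would require the restrictive step condition $dt \lesssim h^2$; the implicit step is unconditionally stable for this dissipative problem.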
What Changes in the Optimality System for Parabolic PDEs? ¶

At the formal level, almost nothing changes. The parabolic optimal control problem has the structure

$$\min_{u\in U_{\mathrm{ad}}}\ \frac12\int_0^T \|y(t)-y_d(t)\|_H^2\,dt + \frac\alpha2 \|u\|_{L^2(0,T;U)}^2 + \frac\beta2 \|y(T)-y_T\|_H^2$$

subject to
$$y_t + Ay = Bu + f, \qquad y(0)=y_0.$$

The optimality system has the same forward-backward pattern as in the ODE case:
state equation forward in time;
adjoint equation backward in time;
control equation or variational inequality at each time.
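For orientation, the formal system can be written out in direct analogy with the ODE case (the rigorous derivation is deferred to the next lecture; note that the adjoint operator $A^*$ replaces the transpose $A^T$):

$$
\begin{aligned}
&\bar y_t + A\bar y = B\bar u + f, && \bar y(0)=y_0,\\
&-\bar p_t + A^{*}\bar p = \bar y - y_d, && \bar p(T)=\beta\,(\bar y(T)-y_T),\\
&\int_0^T \bigl(\alpha\,\bar u(t) + B^{*}\bar p(t),\; u(t)-\bar u(t)\bigr)_U\,dt \ \ge\ 0 && \forall\, u\in U_{\mathrm{ad}}.
\end{aligned}
$$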
The new difficulty is not conceptual but functional-analytic:
the state is an element of $W(0,T)$, not of $H^1(0,T;\mathbb R^n)$;
the time derivative lives in $V'$;
the integration-by-parts identity must be understood in the Gelfand triple.
Summary ¶

This lecture introduced the time-dependent side of optimal control in two layers.
ODE layer ¶

For the linear-quadratic control problem governed by
$$\dot y = Ay+Bu+f, \qquad y(0)=y_0,$$
we proved:

well-posedness of the state equation in $H^1(0,T;\mathbb R^n)$;
existence and uniqueness of the optimal control;
the adjoint equation
$$-\dot p = A^T p + y - y_d, \qquad p(T)=\beta\,(y(T)-y_T);$$
the gradient formula
$$\nabla j(u)=\alpha u + B^T p;$$
the first-order optimality condition
$$\int_0^T (\alpha \bar u + B^T \bar p)\cdot (u-\bar u)\,dt \ \ge\ 0.$$

Hence time-dependent optimality already appears as a forward-backward system.
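The forward-backward structure translates directly into an algorithm: solve the state forward, the adjoint backward, evaluate $\nabla j(u)=\alpha u + B^T p$, and update. The following sketch runs a plain gradient loop on a small instance of the ODE problem; the explicit Euler discretizations, all matrices and data, the case $\beta=0$, and the fixed step size are illustrative choices, not prescriptions from the lecture.

```python
import numpy as np

# Forward-backward gradient loop for
#   min_u 1/2 ∫|y - y_d|² dt + α/2 ∫|u|² dt,   ẏ = Ay + Bu + f, y(0) = y0.

n, m, nt, T = 2, 1, 200, 1.0
dt = T / nt
A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # damped oscillator (illustrative)
B = np.array([[0.0], [1.0]])
f = np.zeros(n)
y0 = np.array([1.0, 0.0])
yd = np.zeros((nt + 1, n))                 # desired state: steer to the origin
alpha = 1e-2

def solve_state(u):
    # forward in time: explicit Euler for ẏ = Ay + Bu + f
    y = np.zeros((nt + 1, n)); y[0] = y0
    for k in range(nt):
        y[k + 1] = y[k] + dt * (A @ y[k] + B @ u[k] + f)
    return y

def solve_adjoint(y):
    # backward in time: -ṗ = Aᵀp + y - y_d, with p(T) = 0 (β = 0 here)
    p = np.zeros((nt + 1, n))
    for k in range(nt, 0, -1):
        p[k - 1] = p[k] + dt * (A.T @ p[k] + y[k] - yd[k])
    return p

def cost(u, y):
    return 0.5 * dt * np.sum((y - yd) ** 2) + 0.5 * alpha * dt * np.sum(u ** 2)

u = np.zeros((nt, m))
for it in range(50):                       # plain gradient descent, fixed step
    y = solve_state(u)
    p = solve_adjoint(y)
    grad = alpha * u + p[:nt] @ B          # ∇j(u) = αu + Bᵀp at each time step
    u -= 0.5 * grad

print(cost(u, solve_state(u)))             # final cost after the sweeps
```

Each iteration costs exactly one forward and one backward solve, which is the computational signature of the adjoint approach.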
Functional-analytic layer ¶

To pass from ODEs to parabolic PDEs we introduced:

strongly measurable vector-valued functions;
Bochner spaces $L^p(0,T;X)$;
weak time derivatives in dual spaces;
Gelfand triples $V\hookrightarrow H\hookrightarrow V'$;
the energy space
$$W(0,T)=\{y\in L^2(0,T;V): y_t\in L^2(0,T;V')\};$$
the embedding
$$W(0,T)\hookrightarrow C([0,T];H),$$
and the corresponding integration-by-parts identity.
These are exactly the tools needed for the next lecture, where the same adjoint-based
optimality machinery will be applied to parabolic PDEs.
References ¶

F. Tröltzsch, Optimal Control of Partial Differential Equations, AMS, 2010.
J. C. De los Reyes, Numerical PDE-Constrained Optimization , Springer, 2015.
A. Manzoni, A. Quarteroni, S. Salsa, Optimal Control of Partial Differential Equations , Springer, 2021.
J.-L. Lions, Optimal Control of Systems Governed by Partial Differential Equations , Springer, 1971.
J.-L. Lions, E. Magenes, Non-Homogeneous Boundary Value Problems and Applications , Springer, 1972.