Finite Element Discretization of Elliptic Optimal Control
Overview ¶ The previous lectures developed the continuous theory for linear-quadratic elliptic
optimal control, derived the adjoint-based gradient formula, and established the
first-order optimality system in both unconstrained and box-constrained settings.
Before introducing finite elements, we place the elliptic control problem
in a general analytical framework:
a general variational formulation for elliptic state equations;
well-posedness of the control-to-state map;
passage from infinite-dimensional analysis to finite-dimensional approximation.
A General Elliptic Boundary Value Problem ¶ Let $\Omega\subset\mathbb R^d$ be a bounded Lipschitz domain with boundary $\Gamma$.
We consider the second-order linear elliptic operator
$$ Ay := -\nabla\cdot(\mathbf A(x)\nabla y) - \mathbf b(x)\cdot\nabla y + c(x)\,y, $$
with coefficients satisfying standard assumptions:
$\mathbf A\in L^\infty(\Omega;\mathbb R^{d\times d})$ is uniformly elliptic, i.e. there exist constants $0<\alpha\le M<\infty$ such that
$$ \alpha|\xi|^2 \le \xi^T\mathbf A(x)\xi \le M|\xi|^2 \qquad \forall \xi\in\mathbb R^d,\ \text{for a.e. } x\in\Omega; $$
$\mathbf b\in L^\infty(\Omega)^d$ with $\|\mathbf b\|_{L^\infty(\Omega)}\le\beta$;
$c\in L^\infty(\Omega)$ with $\|c\|_{L^\infty(\Omega)}\le\gamma$;
$\operatorname{div}\mathbf b\in L^\infty(\Omega)$;
the lower-order part is not too negative, in the sense that there exists $\alpha_0\ge 0$ such that
$$ \tfrac12\operatorname{div}\mathbf b(x) + c(x) \ge -\alpha_0 \qquad \text{for a.e. } x\in\Omega, $$
and $\alpha_0$ is small enough compared with the ellipticity constant $\alpha$, namely
$$ \alpha_0 < \frac{\alpha}{C_P^2}, $$
where $C_P$ is the Poincaré constant of $\Omega$.
We impose homogeneous Dirichlet boundary conditions
$$ y = 0 \qquad \text{on } \Gamma. $$
We work in the standard Sobolev space $V := H_0^1(\Omega)$. The weak formulation of the state equation is: find $y\in V$ such that
$$ a(y,v) = (u,v)_{L^2(\Omega)} + \langle F,v\rangle_{V',V} \qquad \forall v\in V, $$
where the bilinear form is
$$ \langle Ay,v\rangle_{V',V} := a(y,v) = \int_\Omega \mathbf A\nabla y\cdot\nabla v\,dx - \int_\Omega \mathbf b\cdot\nabla y\,v\,dx + \int_\Omega c\,y\,v\,dx. $$
Special case:
$$ \mathbf A(x) = I, \qquad \mathbf b(x) = 0, \qquad c(x) = 0. $$
The model studied in the previous lectures is the simplest representative of this class.
Trace Theorem and Boundary Data ¶ Since $\Omega$ is Lipschitz, the trace operator
$$ \gamma : H^s(\Omega) \to H^{s-\frac12}(\Gamma), \qquad s > \tfrac12, $$
is continuous.
Consequences:
boundary conditions are meaningful in the Sobolev setting;
boundary control problems can also be formulated rigorously, even though in this lecture we focus on distributed controls.
For $s=1$:
$$ \gamma : H^1(\Omega) \to H^{1/2}(\Gamma). $$
The trace operator admits a continuous right inverse: there exists a bounded lifting operator
$$ E : H^{s-\frac12}(\Gamma) \to H^s(\Omega), \qquad s > \tfrac12, $$
such that
$$ \gamma(Eg) = g. $$
This yields the standard lifting of nonhomogeneous Dirichlet data.
For homogeneous Dirichlet data, the variational space is $H_0^1(\Omega)$.
Continuity ¶ Since $\mathbf A$, $\mathbf b$, and $c$ are bounded, each term is continuous.
Principal term:
$$ \left|\int_\Omega \mathbf A\nabla y\cdot\nabla v\,dx\right| \le M\,\|\nabla y\|_{L^2(\Omega)}\|\nabla v\|_{L^2(\Omega)}. $$
Transport term:
$$ \left|\int_\Omega \mathbf b\cdot\nabla y\,v\,dx\right| \le \beta\,\|\nabla y\|_{L^2(\Omega)}\|v\|_{L^2(\Omega)}. $$
Zero-order term:
$$ \left|\int_\Omega c\,y\,v\,dx\right| \le \gamma\,\|y\|_{L^2(\Omega)}\|v\|_{L^2(\Omega)}. $$
For $y,v\in H_0^1(\Omega)$, Poincaré's inequality yields
$$ \|y\|_{L^2(\Omega)} \le C_P\|\nabla y\|_{L^2(\Omega)}, \qquad \|v\|_{L^2(\Omega)} \le C_P\|\nabla v\|_{L^2(\Omega)}. $$
Hence
$$ |a(y,v)| \le \left(M + \beta C_P + \gamma C_P^2\right)\|\nabla y\|_{L^2(\Omega)}\|\nabla v\|_{L^2(\Omega)}. $$
On $H_0^1(\Omega)$, the norm $\|w\|_{H^1(\Omega)}$ is equivalent to $\|\nabla w\|_{L^2(\Omega)}$. Therefore
$$ |a(y,v)| \le C\,\|y\|_{H^1(\Omega)}\|v\|_{H^1(\Omega)} \qquad \forall y,v\in V, $$
with the explicit choice
$$ C = M + \beta C_P + \gamma C_P^2. $$
Coercivity ¶ Let $v\in V = H_0^1(\Omega)$. Then
$$ a(v,v) = \int_\Omega \mathbf A\nabla v\cdot\nabla v\,dx - \int_\Omega \mathbf b\cdot\nabla v\,v\,dx + \int_\Omega c\,v^2\,dx. $$
Uniform ellipticity gives
$$ \int_\Omega \mathbf A\nabla v\cdot\nabla v\,dx \ge \alpha\,\|\nabla v\|_{L^2(\Omega)}^2. $$
Integrating the first-order term by parts:
$$ -\int_\Omega \mathbf b\cdot\nabla v\,v\,dx = -\frac12\int_\Omega \mathbf b\cdot\nabla(v^2)\,dx = \frac12\int_\Omega (\operatorname{div}\mathbf b)\,v^2\,dx, $$
where the boundary term vanishes since $v = 0$ on $\Gamma$.
Hence
$$ a(v,v) = \int_\Omega \mathbf A\nabla v\cdot\nabla v\,dx + \int_\Omega\left(c + \frac12\operatorname{div}\mathbf b\right)v^2\,dx. $$
Under the assumption
$$ c(x) + \frac12\operatorname{div}\mathbf b(x) \ge -\alpha_0 \qquad \text{for a.e. } x\in\Omega, $$
we obtain
$$ a(v,v) \ge \alpha\,\|\nabla v\|_{L^2(\Omega)}^2 - \alpha_0\,\|v\|_{L^2(\Omega)}^2. $$
Poincaré's inequality gives
$$ \|v\|_{L^2(\Omega)} \le C_P\|\nabla v\|_{L^2(\Omega)}, $$
hence
$$ a(v,v) \ge (\alpha - \alpha_0 C_P^2)\,\|\nabla v\|_{L^2(\Omega)}^2. $$
If
$$ \alpha_0 < \frac{\alpha}{C_P^2}, $$
the right-hand side is strictly positive. Since on $H_0^1(\Omega)$ the seminorm $\|\nabla v\|_{L^2(\Omega)}$ is equivalent to the full norm, we conclude that
$$ a(v,v) \ge \kappa\,\|v\|_{H^1(\Omega)}^2 \qquad \forall v\in V, $$
for some coercivity constant $\kappa > 0$.
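For instance, on $\Omega=(0,1)$ the optimal Poincaré constant is $C_P = 1/\pi$ (the first Dirichlet eigenvalue of $-d^2/dx^2$ is $\pi^2$), so the smallness condition on the lower-order part becomes fully concrete:

```latex
% On the unit interval, the optimal Poincare constant is 1/pi:
\[
  \|v\|_{L^2(0,1)} \le \frac{1}{\pi}\,\|v'\|_{L^2(0,1)}
  \qquad \forall v \in H_0^1(0,1),
\]
% so the smallness condition \alpha_0 < \alpha / C_P^2 reads
\[
  \alpha_0 < \pi^2 \alpha .
\]
```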
Nonhomogeneous Dirichlet and Neumann Conditions ¶ Consider
$$ \begin{cases} -\nabla\cdot(\mathbf A(x)\nabla y) - \mathbf b(x)\cdot\nabla y + c(x)y = f & \text{in }\Omega,\\ y = g_D & \text{on }\Gamma_D,\\ (\mathbf A\nabla y)\cdot n = g_N & \text{on }\Gamma_N, \end{cases} $$
with
$$ \Gamma = \Gamma_D\cup\Gamma_N, \qquad \Gamma_D\cap\Gamma_N = \varnothing. $$
If $g_D\neq 0$, one works in an affine space with prescribed trace, or reduces to the homogeneous case by lifting. If Neumann data are present, the boundary term appears on the right-hand side of the weak formulation.
For pure Dirichlet data on $\Gamma$, let
$$ \gamma y = g_D, \qquad g_D\in H^{1/2}(\Gamma). $$
Choose a lifting
$$ w := Eg_D\in H^1(\Omega), \qquad \gamma w = g_D, \qquad \|w\|_{H^1(\Omega)} \le C_{\mathrm{tr}}\|g_D\|_{H^{1/2}(\Gamma)}. $$
Set $z := y - w$. Then $z\in H_0^1(\Omega)$ and
$$ a(z,v) = (u,v)_{L^2(\Omega)} + \langle F,v\rangle_{V',V} - a(w,v) \qquad \forall v\in H_0^1(\Omega). $$
Define
$$ \ell(v) := (u,v)_{L^2(\Omega)} + \langle F,v\rangle_{V',V} - a(w,v). $$
By continuity of $a(\cdot,\cdot)$,
$$ |a(w,v)| \le C\,\|w\|_{H^1(\Omega)}\|v\|_{H^1(\Omega)} \le C\,C_{\mathrm{tr}}\|g_D\|_{H^{1/2}(\Gamma)}\|v\|_{H^1(\Omega)}. $$
Hence
$$ |\ell(v)| \le \Bigl( C_u\|u\|_{L^2(\Omega)} + \|F\|_{V'} + C\,C_{\mathrm{tr}}\|g_D\|_{H^{1/2}(\Gamma)} \Bigr)\|v\|_{H^1(\Omega)}. $$
Lax-Milgram gives
$$ \|z\|_{H^1(\Omega)} \le C\Bigl( \|u\|_{L^2(\Omega)} + \|F\|_{V'} + \|g_D\|_{H^{1/2}(\Gamma)} \Bigr), $$
and therefore, since $y = z + w$,
$$ \|y\|_{H^1(\Omega)} \le C\Bigl( \|u\|_{L^2(\Omega)} + \|F\|_{V'} + \|g_D\|_{H^{1/2}(\Gamma)} \Bigr). $$
Well-Posedness of the State Equation ¶ By Lax-Milgram, for every $u\in L^2(\Omega)$ and every $F\in V'$, there exists a unique state $y\in V$ such that
$$ a(y,v) = (u,v)_{L^2(\Omega)} + \langle F,v\rangle_{V',V} \qquad \forall v\in V. $$
The solution depends continuously on the data.
This defines the control-to-state operator
$$ S : L^2(\Omega) \to H_0^1(\Omega), \qquad u\mapsto y. $$
Cea and Bramble-Hilbert Lemmas ¶ Let $V_h\subset V$ be a conforming finite-dimensional space and let $y_h\in V_h$ be the
Galerkin approximation of the state:
$$ a(y_h,v_h) = (u,v_h)_{L^2(\Omega)} + \langle F,v_h\rangle_{V',V} \qquad \forall v_h\in V_h. $$
Cea's lemma gives the quasi-optimality estimate
$$ \|y-y_h\|_{H^1(\Omega)} \le \frac{C}{\kappa}\inf_{v_h\in V_h}\|y-v_h\|_{H^1(\Omega)}, $$
where $C$ and $\kappa$ are the continuity and coercivity constants of $a(\cdot,\cdot)$. The finite element error is controlled, up to the factor $C/\kappa$, by the best approximation error in the chosen discrete space.
The approximation term
$$ \inf_{v_h\in V_h}\|y-v_h\|_{H^1(\Omega)} $$
is estimated by the Bramble-Hilbert lemma.
For standard finite element spaces of polynomial degree $k$ on a shape-regular mesh, if the exact solution is sufficiently smooth, for example $y\in H^{k+1}(\Omega)$, then the interpolation error satisfies
$$ \|y - I_h y\|_{H^m(\Omega)} \le C h^{k+1-m}\,|y|_{H^{k+1}(\Omega)}, \qquad 0\le m\le k+1. $$
Combining this with Cea's lemma gives
$$ \|y-y_h\|_{H^1(\Omega)} \le C h^k\,|y|_{H^{k+1}(\Omega)}. $$
In summary:
the weak formulation is posed in Sobolev spaces;
Galerkin approximations inherit stability from coercivity;
approximation theory converts best approximation into concrete mesh-dependent error bounds.
Linear-Quadratic Optimal Control Problem ¶ Let
$$ Q := L^2(\Omega). $$
Consider
$$ \min_{(y,u)} J(y,u) := \frac12\|y-y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2, $$
subject to
$$ a(y,v) = (u,v)_{L^2(\Omega)} + \langle F,v\rangle_{V',V} \qquad \forall v\in V. $$
The state equation defines the control-to-state map
$$ S : Q\to V, \qquad u\mapsto y. $$
The reduced functional is
$$ f(u) := J(Su,u), $$
and the reduced problem is
$$ \min_{u\in Q} f(u). $$
Continuous Optimality System ¶ State equation:
$$ a(y,v) = (u,v)_{L^2(\Omega)} + \langle F,v\rangle_{V',V} \qquad \forall v\in V. $$
Adjoint equation:
$$ a(v,p) = (y-y_d,v)_{L^2(\Omega)} \qquad \forall v\in V. $$
Gradient condition in the unconstrained case:
$$ \alpha u + p = 0 \qquad \text{in } L^2(\Omega). $$
Equivalently, in dual notation,
$$ \alpha R_Q u + B^*p = 0 \qquad \text{in } Q'. $$
If control constraints are present, this last equation is replaced by the corresponding variational inequality or projection formula.
Optimize-Then-Discretize and Discretize-Then-Optimize ¶ Two routes are available.
Optimize-Then-Discretize ¶ derive the continuous state equation;
derive the continuous adjoint equation;
derive the continuous optimality system;
discretize the resulting system.
Discretize-Then-Optimize ¶ discretize the state equation;
define the discrete cost functional;
formulate the finite-dimensional optimization problem;
derive its KKT conditions.
In linear-quadratic elliptic settings these routes often lead to the same algebraic system. In more general settings they may differ.
Choice of Discrete Spaces ¶ Let $\mathcal T_h$ be a finite element mesh of $\Omega$.
We choose a finite-dimensional subspace $V_h\subset V = H_0^1(\Omega)$ for the state and adjoint variables.
Typical choice: continuous piecewise polynomials on $\mathcal T_h$ that vanish on the boundary.
For the control, there are several possible choices.
The two most common are:
choose a discrete control space
$$ Q_h\subset Q = L^2(\Omega), $$
for example piecewise polynomials or continuous piecewise polynomials;
do not discretize the control explicitly at first, and only discretize the state and adjoint.
In this lecture we focus first on the standard fully discrete setting
$$ V_h\subset V, \qquad Q_h\subset Q. $$
Why are these choices important?
the state equation must be well posed in the discrete space;
the control space determines the number and meaning of the optimization variables;
the discrete gradient and the discrete KKT system depend on the pairing between V h V_h V h and Q h Q_h Q h .
The passage from the continuous problem to a numerical method is not only
the replacement of functions by vectors:
it also requires a choice of finite-dimensional spaces.
Discrete State Equation ¶ Given $u_h\in Q_h$, the finite element state equation is:
find $y_h\in V_h$ such that
$$ a(y_h,v_h) = (u_h,v_h)_Q + \langle F,v_h\rangle_{V',V} \qquad \forall v_h\in V_h. $$
This is the Galerkin approximation of the continuous weak problem.
Choose bases
$$ V_h = \operatorname{span}\{\varphi_1,\dots,\varphi_{n_y}\}, \qquad Q_h = \operatorname{span}\{\psi_1,\dots,\psi_{n_u}\}. $$
Expand
$$ y_h = \sum_{i=1}^{n_y} y_i\varphi_i, \qquad u_h = \sum_{j=1}^{n_u} u_j\psi_j. $$
Then the discrete state equation becomes the linear system
$$ Ay - Bu = f, $$
where
$$ A_{ij} = a(\varphi_j,\varphi_i), \qquad B_{ij} = (\psi_j,\varphi_i)_Q, \qquad f_i = \langle F,\varphi_i\rangle_{V',V}. $$
Interpretation of the matrices:
If $Q_h = V_h$ and the same basis is used, then $B$ is simply the mass matrix.
If $Q_h$ is piecewise constant, then $B$ is a rectangular coupling matrix.
The important structural point is:
for each discrete control vector $u$, there is a unique discrete state vector $y$,
provided the stiffness matrix $A$ is invertible.
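A tiny numpy sketch of this structural point, with made-up matrices standing in for an assembled FEM system: as long as $A$ is invertible, $u\mapsto y$ is a well-defined affine map, and it is linear when $f=0$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_y, n_u = 8, 5

# Stand-ins for assembled matrices: a diagonally dominant (hence invertible)
# "stiffness" matrix A and a rectangular control-to-state coupling B.
A = rng.standard_normal((n_y, n_y)) + n_y * np.eye(n_y)
B = rng.standard_normal((n_y, n_u))
f = rng.standard_normal(n_y)

def S_h(u, f=f):
    """Discrete solution map: solve A y = B u + f."""
    return np.linalg.solve(A, B @ u + f)

u1, u2 = rng.standard_normal(n_u), rng.standard_normal(n_u)

# With f = 0 the map is linear: S(u1 + u2) = S(u1) + S(u2).
lhs = S_h(u1 + u2, f=np.zeros(n_y))
rhs = S_h(u1, f=np.zeros(n_y)) + S_h(u2, f=np.zeros(n_y))
print(np.allclose(lhs, rhs))  # True
```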
The discrete control-to-state map is
$$ S_h : Q_h\to V_h, \qquad u_h\mapsto y_h. $$
Discrete Cost Functional ¶ Once the state has been discretized, the cost functional becomes
$$ J_h(y_h,u_h) = \frac12\|y_h - y_{d,h}\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\|u_h\|_{L^2(\Omega)}^2. $$
In matrix form, this reads
$$ J_h(y,u) = \frac12(y-y_d)^T M_y(y-y_d) + \frac{\alpha}{2}\,u^T M_u u, $$
where
$$ (M_y)_{ij} = (\varphi_j,\varphi_i)_Q, \qquad (M_u)_{ij} = (\psi_j,\psi_i)_Q. $$
The fully discrete constrained problem is
$$ \min_{(y,u)\in\mathbb R^{n_y}\times\mathbb R^{n_u}} J_h(y,u) \qquad \text{subject to} \qquad Ay - Bu = f. $$
The conceptual path of Lecture 1 reappears in finite element form:
choose a control vector u u u ;
solve the discrete state equation to get $y = S_h(u)$;
evaluate the discrete objective;
apply finite-dimensional optimization ideas.
Reduced Discrete Problem ¶ Since the discrete state is uniquely determined by the discrete control,
we may eliminate y y y and define the reduced discrete functional
$$ f_h(u) := J_h(S_h u, u). $$
Then the discrete optimal control problem becomes
$$ \min_{u\in\mathbb R^{n_u}} f_h(u), $$
or, with bounds,
$$ \min_{u\in U_{\mathrm{ad}}^h} f_h(u), $$
where $U_{\mathrm{ad}}^h\subset\mathbb R^{n_u}$ is the discrete admissible set.
Remarks:
the unknown is now a vector of control coefficients;
each evaluation of $f_h(u)$ still requires the solution of one PDE-like linear system;
all numerical optimization methods from Lecture 3 now apply directly to $f_h$.
From the algorithmic point of view, the reduced finite element problem is a standard
optimization problem in which function and gradient evaluations require PDE solves.
Discrete Adjoint Equation ¶ To compute the gradient of $f_h$, we follow the continuous theory.
Given $u_h\in Q_h$, first solve the discrete state equation for $y_h$, and then define the discrete adjoint $p_h\in V_h$ by
$$ a(v_h,p_h) = (y_h - y_{d,h},v_h)_Q \qquad \forall v_h\in V_h. $$
In matrix form, this becomes
$$ A^T p = M_y(y-y_d). $$
For symmetric elliptic operators such as Poisson with standard Dirichlet conditions, the stiffness matrix is symmetric, so $A^T = A$.
The transpose is kept in the general derivation: the adjoint equation is governed by the adjoint operator.
Once $p_h$ has been computed, the reduced gradient is
$$ \nabla f_h(u) = \alpha M_u u + B^T p. $$
This is the finite element counterpart of the continuous identity $\nabla f(u) = \alpha u + p$, when the control space is identified with $L^2(\Omega)$ through the Riesz map.
Computational pattern:
state solve;
adjoint solve;
gradient assembly.
The adjoint principle is unchanged by discretization.
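This pattern can be sanity-checked numerically. The sketch below uses small random stand-in matrices (not an assembled FEM system) and compares the adjoint-based gradient $\alpha M_u u + B^T p$ against a finite-difference approximation of the reduced objective:

```python
import numpy as np

rng = np.random.default_rng(1)
n_y, n_u, alpha = 7, 4, 1e-2

# Stand-in matrices: invertible A, rectangular B, SPD mass matrices M_y, M_u.
A = rng.standard_normal((n_y, n_y)) + n_y * np.eye(n_y)
B = rng.standard_normal((n_y, n_u))
R_y = rng.standard_normal((n_y, n_y))
M_y = R_y @ R_y.T + np.eye(n_y)
R_u = rng.standard_normal((n_u, n_u))
M_u = R_u @ R_u.T + np.eye(n_u)
f0 = rng.standard_normal(n_y)
y_d = rng.standard_normal(n_y)

def f_h(u):
    """Reduced objective: eliminate y via the state equation A y = B u + f0."""
    y = np.linalg.solve(A, B @ u + f0)
    return 0.5 * (y - y_d) @ M_y @ (y - y_d) + 0.5 * alpha * u @ M_u @ u

def grad_f_h(u):
    """Adjoint-based gradient: one state solve, one adjoint solve."""
    y = np.linalg.solve(A, B @ u + f0)               # state solve
    p = np.linalg.solve(A.T, M_y @ (y - y_d))        # adjoint solve
    return alpha * M_u @ u + B.T @ p                 # gradient assembly

u = rng.standard_normal(n_u)
g = grad_f_h(u)
eps = 1e-6
g_fd = np.array([(f_h(u + eps * e) - f_h(u - eps * e)) / (2 * eps)
                 for e in np.eye(n_u)])
print(np.max(np.abs(g - g_fd)))  # small: adjoint gradient matches finite differences
```

Two linear solves per gradient, independent of the number of control variables, is precisely the advantage of the adjoint approach over componentwise differencing.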
Discrete KKT System ¶ Instead of reducing the problem to the control alone, we may keep state, control,
and adjoint unknowns together.
In the unconstrained case, the discrete first-order system is
$$ \begin{aligned} Ay - Bu &= f,\\ A^T p - M_y y &= -M_y y_d,\\ \alpha M_u u + B^T p &= 0. \end{aligned} $$
Reordering the equations as before, this becomes the block system
$$ \begin{pmatrix} M_y & 0 & -A^T\\ 0 & \alpha M_u & B^T\\ A & -B & 0 \end{pmatrix} \begin{pmatrix} y\\ u\\ p \end{pmatrix} = \begin{pmatrix} M_y y_d\\ 0\\ f \end{pmatrix}. $$
This is the discrete saddle-point system associated with the optimality conditions.
Structure:
the $(1,1)$ block comes from state tracking;
the $(2,2)$ block comes from Tikhonov regularization;
the off-diagonal blocks encode state and adjoint couplings;
the zero block in the lower-right corner is the saddle-point signature.
Discretization preserves the continuous KKT structure and yields a sparse block linear system.
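To illustrate, the sketch below (with small random stand-in matrices) assembles the saddle-point system with `np.block` and checks that its control component agrees with the minimizer of the reduced problem, whose Hessian is $\alpha M_u + B^T A^{-T} M_y A^{-1} B$:

```python
import numpy as np

rng = np.random.default_rng(2)
n_y, n_u, alpha = 6, 3, 1e-1

A = rng.standard_normal((n_y, n_y)) + n_y * np.eye(n_y)
B = rng.standard_normal((n_y, n_u))
R = rng.standard_normal((n_y, n_y))
M_y = R @ R.T + np.eye(n_y)
M_u = np.eye(n_u)                      # lumped control mass matrix for simplicity
f = rng.standard_normal(n_y)
y_d = rng.standard_normal(n_y)

# Assemble the 3x3 block KKT system in the (y, u, p) ordering.
Z_yu = np.zeros((n_y, n_u))
Z_yy = np.zeros((n_y, n_y))
K = np.block([
    [M_y,    Z_yu,         -A.T],
    [Z_yu.T, alpha * M_u,   B.T],
    [A,      -B,            Z_yy],
])
rhs = np.concatenate([M_y @ y_d, np.zeros(n_u), f])
sol = np.linalg.solve(K, rhs)
y, u, p = sol[:n_y], sol[n_y:n_y + n_u], sol[n_y + n_u:]

# Reduced approach: solve H u = -g0 with the reduced Hessian H.
Ainv_B = np.linalg.solve(A, B)
H = alpha * M_u + Ainv_B.T @ M_y @ Ainv_B
g0 = Ainv_B.T @ M_y @ (np.linalg.solve(A, f) - y_d)
u_red = np.linalg.solve(H, -g0)

print(np.allclose(u, u_red))  # True: full KKT and reduced solves agree
```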
This algebraic form motivates structure-exploiting linear algebra, for example Krylov methods with block preconditioners tailored to saddle-point systems.
Box Constraints in the Discrete Problem ¶ Suppose now that the control is constrained componentwise:
$$ u_{\min}\le u\le u_{\max}. $$
Then the reduced discrete problem is
$$ \min_{u\in U_{\mathrm{ad}}^h} f_h(u), $$
where $U_{\mathrm{ad}}^h$ is a box in $\mathbb R^{n_u}$.
The first-order condition becomes the finite-dimensional variational inequality
$$ \nabla f_h(\bar u)^T(v-\bar u)\ge 0 \qquad \forall v\in U_{\mathrm{ad}}^h. $$
Using the discrete gradient formula,
$$ (\alpha M_u\bar u + B^T\bar p)^T(v-\bar u)\ge 0 \qquad \forall v\in U_{\mathrm{ad}}^h. $$
If the control basis is chosen so that the discrete control inner product is diagonal or easily invertible, this leads to the discrete projection formula
$$ \bar u = \Pi_{U_{\mathrm{ad}}^h}\!\left(\bar u - \rho(\alpha M_u\bar u + B^T\bar p)\right). $$
In the simplest lumped-mass or nodal setting, this reduces componentwise to
$$ \bar u_i = \min\!\left\{u_{\max,i},\,\max\{u_{\min,i},\,-\alpha^{-1}\bar p_i\}\right\}. $$
The discrete constrained problem has the same geometry as the finite-dimensional KKT picture from Lecture 2 and the continuous projection picture from Lecture 5.
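In the lumped/nodal setting, the projection is a componentwise clip. A small sketch (the nodal adjoint values $\bar p$ are made up for illustration) verifying that the clipped control satisfies the fixed-point form of the projection formula:

```python
import numpy as np

alpha, rho = 0.5, 0.1
u_min, u_max = -1.0, 1.0

# Made-up nodal adjoint values, standing in for \bar p at the control nodes.
p_bar = np.array([-2.0, -0.3, 0.1, 0.9, 3.0])

def project(u):
    """Componentwise projection onto the box [u_min, u_max]."""
    return np.clip(u, u_min, u_max)

# Componentwise optimality formula: u_i = clip(-p_i / alpha).
u_bar = project(-p_bar / alpha)

# Fixed-point check of the projection formula,
# u = P(u - rho * (alpha * u + p)), valid at the solution for any rho > 0.
fixed_point = project(u_bar - rho * (alpha * u_bar + p_bar))
print(np.allclose(u_bar, fixed_point))  # True
```

Components with large $|\bar p_i|$ land on the bounds (active set), while the others satisfy $\bar u_i = -\bar p_i/\alpha$ exactly.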
A Minimal 1D Finite Element Thought Experiment ¶ To make the previous formulas concrete, imagine $\Omega = (0,1)$ and let $V_h$ be the space of continuous piecewise linear functions on a uniform mesh.
Then:
the stiffness matrix $A$ is tridiagonal;
the mass matrix $M_y$ is sparse and symmetric positive definite;
the discrete state solve means solving one sparse linear system;
the discrete adjoint solve has the same matrix and a different right-hand side.
One reduced-gradient iteration has the form:
choose the current control vector $u_k$;
solve
$$ Ay_k = Bu_k + f; $$
solve
$$ A^T p_k = M_y(y_k - y_d); $$
compute
$$ g_k = \alpha M_u u_k + B^T p_k; $$
update the control with a finite-dimensional optimization step.
This is the finite element version of the state-adjoint-gradient loop of the
continuous reduced formulation.
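This loop can be run end to end in a few lines of numpy. The sketch below is illustrative only: it assumes lumped mass matrices (so $B = M_y = M_u = hI$), an ad hoc target $y_d = \sin(\pi x)$, and a hand-tuned fixed step for a plain mass-preconditioned gradient descent:

```python
import numpy as np

n, alpha, step, iters = 32, 1e-2, 40.0, 50
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)[1:-1]          # interior nodes

# P1 stiffness matrix (tridiagonal); with Q_h = V_h and lumped quadrature
# the mass and coupling matrices coincide: B = M_y = M_u = h * I.
A = (np.diag(2.0 * np.ones(n - 1)) - np.diag(np.ones(n - 2), 1)
     - np.diag(np.ones(n - 2), -1)) / h
M = h * np.eye(n - 1)
y_d = np.sin(np.pi * x)                         # target state
f = np.zeros(n - 1)

def f_h(u, y):
    return 0.5 * (y - y_d) @ M @ (y - y_d) + 0.5 * alpha * u @ M @ u

u = np.zeros(n - 1)
values = []
for _ in range(iters):
    y = np.linalg.solve(A, M @ u + f)           # state solve:   A y   = B u + f
    p = np.linalg.solve(A.T, M @ (y - y_d))     # adjoint solve: A^T p = M_y (y - y_d)
    g = alpha * (M @ u) + M.T @ p               # reduced gradient
    values.append(f_h(u, y))
    u -= step * np.linalg.solve(M, g)           # mass-preconditioned gradient step

print(f"f_h: {values[0]:.4f} -> {values[-1]:.4f}")  # monotone decrease of the objective
```

Solving with $M^{-1}$ before the update makes the step a gradient step in the (lumped) $L^2$ inner product rather than in the raw coefficient space, which keeps the step size mesh-independent.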
References ¶ Fredi Troeltzsch, Optimal Control of Partial Differential Equations, Chapters 3 and 4.
Juan Carlos De Los Reyes, Numerical PDE-Constrained Optimization.
Alfio Borzi and Volker Schulz, Computational Optimization of Systems Governed by Partial Differential Equations.