Overview
This is the first genuinely PDE-constrained lecture in the course. We take the optimization viewpoint of Lecture 3 and apply it to a standard linear elliptic optimal control problem:
the PDE defines a control-to-state map $S : u \mapsto y$;
the cost becomes a reduced functional $\hat J(u) = J(Su, u)$;
the adjoint equation gives the reduced gradient at the cost of one extra PDE solve;
gradient-based methods from Lecture 3 now become PDE-based algorithms.
We focus on the distributed-control Poisson model because it contains all the structural ideas we need before constraints, discretization, and more advanced PDEs.
Model Problem
Let $\Omega \subset \mathbb{R}^d$ be a bounded domain. We consider the distributed control problem

$$\min_{y,u}\; J(y,u) = \frac{1}{2}\,\|y - y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\,\|u\|_{L^2(\Omega)}^2$$

subject to

$$-\Delta y = u \ \text{in } \Omega, \qquad y = 0 \ \text{on } \partial\Omega,$$

with given data $y_d \in L^2(\Omega)$ and $\alpha > 0$.
Interpretation:
$u$ is a distributed source term;
$y$ is the state generated by that source;
the first term tracks a desired state $y_d$;
the second term penalizes large controls.
This is the PDE analogue of the linear-quadratic finite-dimensional problems seen before.
Weak Formulation of the State Equation
This is the first point where the PDE language changes. We start from the strong form

$$-\Delta y = u \ \text{in } \Omega, \qquad y = 0 \ \text{on } \partial\Omega.$$

Assume for a moment that $y$ is smooth. Take a test function $v$ that is also smooth and vanishes on the boundary. Multiply the PDE by $v$ and integrate over $\Omega$:

$$-\int_\Omega \Delta y\, v \, dx = \int_\Omega u\, v \, dx.$$

Now integrate by parts:

$$-\int_\Omega \Delta y\, v \, dx = \int_\Omega \nabla y \cdot \nabla v \, dx - \int_{\partial\Omega} \frac{\partial y}{\partial n}\, v \, ds.$$

Because the test function satisfies $v = 0$ on $\partial\Omega$, the boundary term vanishes. So we obtain the identity

$$\int_\Omega \nabla y \cdot \nabla v \, dx = \int_\Omega u\, v \, dx.$$

Comparing the two formulations:
the strong form contains second derivatives of $y$;
the weak form contains only first derivatives of $y$;
as a consequence, the weak form makes sense in a larger space.
This motivates the Sobolev setting:
$y \in H_0^1(\Omega)$: one weak derivative in $L^2(\Omega)$, zero trace on the boundary;
$v \in H_0^1(\Omega)$: same space for test functions;
$u \in L^2(\Omega)$: enough to make the right-hand side meaningful.
For fixed $u \in L^2(\Omega)$, the weak formulation is: find $y \in H_0^1(\Omega)$ such that

$$\int_\Omega \nabla y \cdot \nabla v \, dx = \int_\Omega u\, v \, dx \quad \text{for all } v \in H_0^1(\Omega).$$

Introduce

$$a(y,v) = \int_\Omega \nabla y \cdot \nabla v \, dx, \qquad \ell(v) = \int_\Omega u\, v \, dx.$$

Then the problem reads: find $y \in H_0^1(\Omega)$ such that

$$a(y,v) = \ell(v) \quad \text{for all } v \in H_0^1(\Omega).$$
Why does this have a unique solution?
$a$ is bilinear;
$a$ is continuous on $H_0^1(\Omega) \times H_0^1(\Omega)$;
$a$ is coercive, since

$$a(v,v) = \int_\Omega |\nabla v|^2 \, dx = \|\nabla v\|_{L^2(\Omega)}^2,$$

and Poincaré's inequality shows that this controls the full $H^1$ norm;
$\ell$ is continuous because

$$|\ell(v)| \le \|u\|_{L^2(\Omega)}\,\|v\|_{L^2(\Omega)} \le C\,\|u\|_{L^2(\Omega)}\,\|v\|_{H_0^1(\Omega)}.$$

Lax-Milgram then gives a unique weak solution $y \in H_0^1(\Omega)$ for every $u \in L^2(\Omega)$.
This allows us to define the state equation as a linear map

$$S : L^2(\Omega) \to H_0^1(\Omega), \qquad u \mapsto y = Su.$$
The distinction to keep in mind is:
strong formulation: pointwise PDE, more regularity required;
weak formulation: variational identity, Sobolev-space setting.
With elliptic regularity one often has more regularity (e.g. $y \in H^2(\Omega)$ on convex or smooth domains), but for the reduced formulation the key point is simply:
for each control $u$, there is exactly one state $y = Su$;
the control $u$ is the only independent optimization variable.
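To make the control-to-state map concrete, here is a minimal finite-difference sketch in one dimension, assuming $\Omega = (0,1)$, homogeneous Dirichlet conditions, and a uniform grid (the function names are illustrative, not taken from the repository code):

```python
import numpy as np

def poisson_matrix(n):
    """Central-difference matrix for -y'' on (0,1), zero Dirichlet BCs, n interior points."""
    h = 1.0 / (n + 1)
    return (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def solve_state(u):
    """Discrete control-to-state map u -> y = S u (one linear solve)."""
    return np.linalg.solve(poisson_matrix(len(u)), u)

n = 99
x = np.linspace(0, 1, n + 2)[1:-1]   # interior grid points
y = solve_state(np.ones(n))          # constant control u = 1
# For u = 1 the exact state is y(x) = x(1-x)/2; central differences reproduce it
# up to rounding, since the truncation error involves the fourth derivative of y.
print(np.max(np.abs(y - 0.5 * x * (1 - x))))
```

The linearity of $S$ is visible directly: `solve_state(a*u1 + b*u2)` equals `a*solve_state(u1) + b*solve_state(u2)` up to rounding.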
Reduced Cost Functional
Eliminate the state through the PDE:

$$y = Su.$$

Then define

$$\hat J(u) := J(Su, u) = \frac{1}{2}\,\|Su - y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\,\|u\|_{L^2(\Omega)}^2.$$

The PDE-constrained problem can now be written as

$$\min_{u \in L^2(\Omega)} \hat J(u).$$
This is the infinite-dimensional version of the reduced formulation from Lecture 1.
Two remarks:
the optimization variable is now a function $u \in L^2(\Omega)$, not a finite vector;
evaluating $\hat J(u)$ requires solving the state equation.
It is also useful to write the same problem in operator form, because this makes the infinite-dimensional structure look exactly like a block linear system.
Let

$$V = H_0^1(\Omega), \qquad U = L^2(\Omega).$$

We define the elliptic operator $A : V \to V^*$ by

$$\langle Ay, v\rangle_{V^*,V} = \int_\Omega \nabla y \cdot \nabla v \, dx.$$

The control enters the state equation through the operator $B : U \to V^*$ defined by

$$\langle Bu, v\rangle_{V^*,V} = \int_\Omega u\, v \, dx.$$

Since the tracking term is measured in $L^2(\Omega)$, we also introduce the observation embedding $C : V \hookrightarrow L^2(\Omega)$, here simply

$$Cy = y,$$

and its associated mass operator $M = C^*C : V \to V^*$:

$$\langle My, v\rangle_{V^*,V} = \int_\Omega y\, v \, dx.$$

For the control cost, it is convenient to write the $U$ inner product through the Riesz map $R_U : U \to U^*$,

$$\langle R_U u, w\rangle_{U^*,U} = (u, w)_{L^2(\Omega)}.$$

With this notation, the state equation is

$$Ay = Bu + f \quad \text{in } V^*,$$

where $f \in V^*$ is a given load. In the present model without an additional forcing term, one simply has $f = 0$.
The cost functional can be written as

$$J(y,u) = \frac{1}{2}\,\|Cy - y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\,\langle R_U u, u\rangle_{U^*,U}.$$

So the all-at-once infinite-dimensional problem is

$$\min_{(y,u) \in V \times U} J(y,u) \quad \text{subject to} \quad Ay = Bu + f.$$

Introduce the Lagrangian with multiplier $p \in V$:

$$\mathcal{L}(y,u,p) = J(y,u) - \langle Ay - Bu - f, p\rangle_{V^*,V}.$$

Its first-order conditions are

$$\mathcal{L}_y = 0: \quad My - A^*p = C^*y_d,$$
$$\mathcal{L}_u = 0: \quad \alpha R_U u + B^*p = 0,$$
$$\mathcal{L}_p = 0: \quad Ay - Bu = f.$$

This is already a saddle-point system: the unknown triple $(y,u,p)$ lives in $V \times U \times V$, while the equations live in the dual product space $V^* \times U^* \times V^*$. After reordering the blocks, the KKT system can be written as

$$\begin{pmatrix} M & 0 & -A^* \\ 0 & \alpha R_U & B^* \\ -A & B & 0 \end{pmatrix} \begin{pmatrix} y \\ u \\ p \end{pmatrix} = \begin{pmatrix} C^* y_d \\ 0 \\ -f \end{pmatrix}.$$
This is the infinite-dimensional analogue of a symmetric indefinite linear system:
the diagonal block $M$ comes from the tracking term;
the diagonal block $\alpha R_U$ comes from Tikhonov regularization;
the off-diagonal blocks $A$, $A^*$, $B$, and $B^*$ express the PDE constraint and its adjoint coupling;
the zero block in the $(3,3)$ position is the characteristic signature of a saddle-point problem.
So even before discretization, PDE-constrained optimization already has the same algebraic structure as the block KKT systems that appear in finite-dimensional constrained optimization.
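This block structure can be reproduced numerically. The sketch below assembles a discrete analogue of the saddle-point system for the 1D model (assumed domain $(0,1)$; after discretization on grid values, the operators $M$, $R_U$, $B$ all reduce to the identity once a common factor $h$ is cancelled row by row) and cross-checks it against the reduced problem:

```python
import numpy as np

n = 50
h = 1.0 / (n + 1)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2  # discrete -d^2/dx^2
I = np.eye(n)
Z = np.zeros((n, n))
alpha = 1e-2
x = np.linspace(0, 1, n + 2)[1:-1]
y_d = np.sin(np.pi * x)

# discrete KKT system: rows are  y - A p = y_d,  alpha u + p = 0,  -A y + u = 0
K = np.block([[I,         Z, -A],
              [Z, alpha * I,  I],
              [-A,        I,  Z]])
rhs = np.concatenate([y_d, np.zeros(n), np.zeros(n)])
y, u, p = np.split(np.linalg.solve(K, rhs), 3)

# cross-check against the reduced normal equations (alpha I + A^{-2}) u = A^{-1} y_d
Ainv = np.linalg.inv(A)
u_red = np.linalg.solve(alpha * I + Ainv @ Ainv, Ainv @ y_d)
print(np.max(np.abs(u - u_red)))
```

Solving the all-at-once system and solving the reduced problem give the same control, which is exactly the equivalence between the two formulations discussed above.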
Existence and Uniqueness
In finite dimensions, existence follows from the Weierstrass principle: closed and bounded sets are compact, so minimizing sequences have convergent subsequences. That argument is no longer available in infinite-dimensional spaces, because closed and bounded sets in $L^2(\Omega)$ are generally not strongly compact.
So, in addition to convexity, we need three structural ingredients:
coercivity, to prevent minimizing sequences from escaping to infinity;
weak compactness of bounded sets, which is available in reflexive spaces such as $L^2(\Omega)$;
weak lower semicontinuity, so the limit of a minimizing sequence still minimizes the functional.
For our reduced functional, coercivity comes from the Tikhonov term:

$$\hat J(u) \ge \frac{\alpha}{2}\,\|u\|_{L^2(\Omega)}^2 \to \infty \quad \text{as } \|u\|_{L^2(\Omega)} \to \infty.$$

Hence every minimizing sequence is bounded in $L^2(\Omega)$.
To see this more concretely, let $(u_n)$ be a minimizing sequence. Then there exists a constant $C$ such that

$$\hat J(u_n) \le C \quad \text{for all } n.$$

Coercivity then implies

$$\|u_n\|_{L^2(\Omega)}^2 \le \frac{2C}{\alpha},$$

so the minimizing sequence is bounded.
Since $L^2(\Omega)$ is reflexive, bounded sequences admit weakly convergent subsequences:

$$u_{n_k} \rightharpoonup \bar u \quad \text{in } L^2(\Omega).$$

For minimization we do not need strong convergence of the whole sequence: we only need one convergent subsequence and a notion of lower semicontinuity that is compatible with that convergence. If $\hat J$ is weakly lower semicontinuous, then

$$\hat J(\bar u) \le \liminf_{k \to \infty} \hat J(u_{n_k}).$$

Since $(u_n)$ is minimizing, the right-hand side is exactly $\inf \hat J$. Hence

$$\hat J(\bar u) \le \inf_{u \in L^2(\Omega)} \hat J(u),$$

which forces $\hat J(\bar u) = \inf \hat J$. So the weak limit of a minimizing subsequence is already a minimizer.
The control-to-state map $S$ is linear and continuous, so the reduced functional is convex and continuous, hence weakly lower semicontinuous. This allows us to pass to the limit along a minimizing sequence and obtain existence of a minimizer.
Uniqueness comes from strict convexity. The reduced functional is strictly convex because:
$S$ is linear;
the tracking term $\frac{1}{2}\|Su - y_d\|_{L^2(\Omega)}^2$ is convex;
the Tikhonov term $\frac{\alpha}{2}\|u\|_{L^2(\Omega)}^2$ is strictly convex for $\alpha > 0$.
As a consequence, $\hat J$ has a unique minimizer $\bar u \in L^2(\Omega)$.
This is the first major structural simplification of the linear-quadratic elliptic case:
existence is obtained through coercivity + weak compactness + weak lower semicontinuity;
uniqueness comes from strict convexity;
first-order optimality will characterize the global minimizer.
Directional Derivative of the Reduced Cost
Let $u, h \in L^2(\Omega)$ and $t > 0$. We now compute the derivative of $\hat J$ explicitly from the Fréchet definition:

$$\hat J'(u)h = \lim_{t \to 0^+} \frac{\hat J(u + th) - \hat J(u)}{t}.$$

Since $S$ is linear,

$$S(u + th) = Su + t\,Sh.$$

Using this,

$$\hat J(u + th) = \frac{1}{2}\,\|Su - y_d + t\,Sh\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\,\|u + th\|_{L^2(\Omega)}^2.$$

Expanding both squares gives

$$\hat J(u + th) = \hat J(u) + t\,(Su - y_d, Sh)_{L^2(\Omega)} + t\,\alpha\,(u, h)_{L^2(\Omega)} + \frac{t^2}{2}\,\|Sh\|_{L^2(\Omega)}^2 + \frac{t^2\alpha}{2}\,\|h\|_{L^2(\Omega)}^2.$$

Subtract $\hat J(u)$ and divide by $t$:

$$\frac{\hat J(u + th) - \hat J(u)}{t} = (Su - y_d, Sh)_{L^2(\Omega)} + \alpha\,(u, h)_{L^2(\Omega)} + \frac{t}{2}\,\|Sh\|_{L^2(\Omega)}^2 + \frac{t\alpha}{2}\,\|h\|_{L^2(\Omega)}^2.$$

Passing to the limit $t \to 0^+$, we obtain

$$\hat J'(u)h = (Su - y_d, Sh)_{L^2(\Omega)} + \alpha\,(u, h)_{L^2(\Omega)}.$$

This formula is correct, but not computationally convenient: if we need to evaluate $\hat J'(u)h$ along many directions $h$, each direction costs one extra state solve for $Sh$.
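Both the formula and its cost can be checked numerically. A self-contained 1D sketch (assumed domain $(0,1)$, illustrative names; the direction is called `du` in code to keep `h` for the mesh width) compares the formula with a one-sided difference quotient:

```python
import numpy as np

n = 60
h = 1.0 / (n + 1)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
alpha = 1e-2
x = np.linspace(0, 1, n + 2)[1:-1]
y_d = x * (1 - x)

def jhat(u):
    y = np.linalg.solve(A, u)
    return 0.5 * h * np.sum((y - y_d)**2) + 0.5 * alpha * h * np.sum(u**2)

rng = np.random.default_rng(0)
u = rng.standard_normal(n)
du = rng.standard_normal(n)          # the direction h of the text

# directional derivative from the formula (Su - y_d, S du) + alpha (u, du)
y = np.linalg.solve(A, u)            # one state solve for u ...
Sdu = np.linalg.solve(A, du)         # ... plus one extra solve per direction
deriv = h * np.sum((y - y_d) * Sdu) + alpha * h * np.sum(u * du)

t = 1e-6
fd = (jhat(u + t * du) - jhat(u)) / t  # difference quotient
print(abs(deriv - fd))
```

The extra solve for `Sdu` is exactly the cost that grows with the number of directions.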
The adjoint equation removes this difficulty.
Adjoint Equation
At this point we know that

$$\hat J'(u)h = (Su - y_d, Sh)_{L^2(\Omega)} + \alpha\,(u, h)_{L^2(\Omega)}.$$

The difficulty is the first term: the direction $h$ appears only indirectly, through the state variation $Sh$.
This is exactly where the adjoint enters. We would like to rewrite

$$(Su - y_d, Sh)_{L^2(\Omega)}$$

as an expression where $h$ appears explicitly. This can be done in two steps. Given a control $u$, we first solve the state equation to get $y_u = Su$, then we solve one adjoint PDE to obtain $p$, and finally we use $p$ to rewrite the inner product with $Sh$ as an inner product with $h$.
More concretely, we look for $p \in H_0^1(\Omega)$ such that for every test function $v \in H_0^1(\Omega)$,

$$a(v, p) = (y_u - y_d, v)_{L^2(\Omega)}.$$

Notice the change of roles of trial and test functions compared to the state equation: the adjoint state $p$ now sits in the second argument of $a$. In this particular example the bilinear form is symmetric, so the change of roles is not visible, but in general the adjoint bilinear form is different from the original one.
Notice the analogy with the state equation:
the left-hand side is the adjoint bilinear form;
only the right-hand side changes;
the source term is now the tracking residual $y_u - y_d$.
We define the adjoint state $p = p(u) \in H_0^1(\Omega)$ by

$$a(v, p) = (y_u - y_d, v)_{L^2(\Omega)} \quad \text{for all } v \in H_0^1(\Omega),$$

where $y_u = Su$ is the state associated with the current control $u$.
Since in this case the adjoint bilinear form coincides with the original coercive form, Lax-Milgram applies again and gives existence and uniqueness of $p$.
If $p$ is smooth enough, the corresponding strong form is

$$-\Delta p = y_u - y_d \ \text{in } \Omega, \qquad p = 0 \ \text{on } \partial\Omega.$$

The adjoint PDE is governed by the adjoint operator. For the Poisson problem this is again $-\Delta$, because the operator is self-adjoint, but this coincidence should be viewed as a special feature of the present model, not as the general rule.
Conceptually:
the state equation propagates the control forward to produce $y_u$;
the adjoint equation propagates the mismatch $y_u - y_d$ back into a quantity $p$;
this is precisely what will turn the implicit dependence on $h$ into an explicit inner product with $h$.
Reduced Gradient Formula
From what we have so far, the directional derivative of the reduced cost is

$$\hat J'(u)h = (y_u - y_d, Sh)_{L^2(\Omega)} + \alpha\,(u, h)_{L^2(\Omega)}.$$

We can now move $S$ to the other side of the inner product. Testing the adjoint equation with $v = Sh$ gives $a(Sh, p) = (y_u - y_d, Sh)_{L^2(\Omega)}$, while testing the state equation for the direction $h$ with $p$ gives $a(Sh, p) = (h, p)_{L^2(\Omega)}$. Hence

$$(y_u - y_d, Sh)_{L^2(\Omega)} = (p, h)_{L^2(\Omega)},$$

and therefore

$$\hat J'(u)h = (\alpha u + p, h)_{L^2(\Omega)} \quad \text{for all } h \in L^2(\Omega).$$

Thus the reduced gradient is

$$\nabla \hat J(u) = \alpha u + p.$$
This is the central formula of the lecture.
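The formula is easy to verify numerically: one state solve plus one adjoint solve gives the whole gradient, which can then be compared with a difference quotient of the cost. A self-contained 1D sketch (assumed domain $(0,1)$, illustrative names):

```python
import numpy as np

n = 60
h = 1.0 / (n + 1)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
alpha = 1e-2
x = np.linspace(0, 1, n + 2)[1:-1]
y_d = np.sin(np.pi * x)
u = np.cos(3 * x)

y = np.linalg.solve(A, u)            # state solve:   -y'' = u
p = np.linalg.solve(A, y - y_d)      # adjoint solve: -p'' = y - y_d  (A is symmetric)
grad = alpha * u + p                 # reduced gradient, one vector on the grid

def jhat(u):
    y = np.linalg.solve(A, u)
    return 0.5 * h * np.sum((y - y_d)**2) + 0.5 * alpha * h * np.sum(u**2)

rng = np.random.default_rng(1)
du = rng.standard_normal(n)
t = 1e-5
fd = (jhat(u + t * du) - jhat(u - t * du)) / (2 * t)  # central difference quotient
print(abs(h * np.sum(grad * du) - fd))
```

The agreement is near rounding level here, because the reduced cost is quadratic, so the central difference has no truncation error. This is the same kind of gradient check performed in the repository script mentioned at the end of the lecture.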
First-Order Optimality System
For the unconstrained problem, the optimal control $\bar u$ satisfies

$$\hat J'(\bar u)h = 0 \quad \text{for all } h \in L^2(\Omega),$$

that is,

$$(\nabla \hat J(\bar u), h)_{L^2(\Omega)} = 0 \quad \text{for all } h \in L^2(\Omega).$$

The only element of $L^2(\Omega)$ orthogonal to all test directions is the zero element, so this is equivalent to

$$\nabla \hat J(\bar u) = 0 \quad \text{in } L^2(\Omega).$$

Using the gradient formula derived above, we obtain

$$\alpha \bar u + \bar p = 0.$$

The optimality system is then

$$-\Delta \bar y = \bar u \ \text{in } \Omega, \qquad \bar y = 0 \ \text{on } \partial\Omega,$$
$$-\Delta \bar p = \bar y - y_d \ \text{in } \Omega, \qquad \bar p = 0 \ \text{on } \partial\Omega,$$
$$\alpha \bar u + \bar p = 0 \ \text{in } \Omega.$$

Solving the gradient equation for $\bar u$ gives the explicit formula

$$\bar u = -\frac{1}{\alpha}\,\bar p.$$
This is the PDE version of the finite-dimensional gradient equation.
Algorithmic Interpretation
To evaluate the reduced gradient at a control $u_k$:
solve the state equation for $y_k = Su_k$;
solve the adjoint equation for $p_k$;
form the gradient

$$\nabla \hat J(u_k) = \alpha u_k + p_k.$$

Then a gradient step reads

$$u_{k+1} = u_k - s_k\,\nabla \hat J(u_k) = u_k - s_k\,(\alpha u_k + p_k),$$

with $s_k$ chosen by exact line search, Armijo backtracking, or another strategy from Lecture 3.
So one iteration of PDE-constrained gradient descent means:
one state solve;
one adjoint solve;
one control update.
This is the computational meaning of the reduced formulation.
Reduced Gradient Algorithm
A basic reduced-gradient method is:
choose $u_0 \in L^2(\Omega)$;
for $k = 0, 1, 2, \dots$:
solve the state equation for $y_k$;
solve the adjoint equation for $p_k$;
compute $g_k = \alpha u_k + p_k$;
choose a step size $s_k$;
update $u_{k+1} = u_k - s_k g_k$.
Stopping criteria are the same as before, now written in function space, e.g.

$$\|\nabla \hat J(u_k)\|_{L^2(\Omega)} \le \varepsilon.$$
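The whole loop fits in a few lines. A self-contained 1D sketch (assumed domain $(0,1)$; the fixed step is an illustrative choice based on the continuous estimate $\|S^*S\| \approx 1/\pi^4$ for the largest Hessian eigenvalue, not the line-search strategies of the text):

```python
import numpy as np

n = 60
h = 1.0 / (n + 1)
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
alpha = 1e-2
x = np.linspace(0, 1, n + 2)[1:-1]
y_d = np.sin(np.pi * x)

u = np.zeros(n)                           # u_0
s = 1.0 / (alpha + 1.0 / np.pi**4)        # fixed step, roughly 1/L for this quadratic problem
for k in range(200):
    y = np.linalg.solve(A, u)             # state solve
    p = np.linalg.solve(A, y - y_d)       # adjoint solve
    g = alpha * u + p                     # reduced gradient g_k
    gnorm = np.sqrt(h * np.sum(g**2))     # discrete L2 norm, the stopping criterion
    if gnorm <= 1e-8:
        break
    u = u - s * g                         # control update
print(k, gnorm)
```

Each pass through the loop is exactly one state solve, one adjoint solve, and one control update, and at convergence the iterate satisfies the gradient equation $\alpha u + p \approx 0$.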
The methods of Lecture 3 now have a direct PDE interpretation:
GD: immediate and simple;
CG/BFGS: possible on the reduced problem, provided gradients are available;
line-search cost is now measured in additional PDE solves.
Summary
In the linear elliptic distributed-control setting:
the PDE defines a linear control-to-state map $S : L^2(\Omega) \to H_0^1(\Omega)$;
the constrained problem reduces to minimizing $\hat J(u) = J(Su, u)$;
the adjoint equation gives the reduced gradient

$$\nabla \hat J(u) = \alpha u + p;$$

unconstrained optimality is the gradient equation

$$\alpha \bar u + \bar p = 0;$$
every reduced-gradient iteration requires one state solve and one adjoint solve.
This is the basic computational pattern for PDE-constrained optimization.
For a first concrete discrete example, the repository now includes
jupyterbook/codes/lecture04/poisson_1d_fd.py.
It solves a trivial one-dimensional Poisson control problem by finite differences,
checks the reduced gradient by finite differences, and performs a few reduced-gradient
iterations.
The reusable one-dimensional finite-difference utilities used there live in
jupyterbook/codes/common/fd1d.py.
For a richer JupyterBook example, see also
jupyterbook/codes/lecture04/step_target_fd.ipynb, where the target state is a rectangular
step function and the notebook generates both plots and a GIF animation
of the state approaching the target.