Overview¶
The previous lectures introduced the continuous optimality system for time-dependent control problems and the functional-analytic setting needed for parabolic PDEs:
Gelfand triples $V \hookrightarrow H \hookrightarrow V^*$;
the energy space $W(0,T) = \{\, y \in L^2(0,T;V) : y' \in L^2(0,T;V^*) \,\}$;
forward state equations and backward adjoint equations;
variational inequalities and projection formulas for box constraints.
This lecture addresses the next numerical question:
what is the correct discrete optimality system after time discretization?
We focus on one time-stepping method, implicit Euler, and derive the corresponding discrete optimality system carefully. The goal is not to cover all possible time discretizations, but to understand the structure that any reliable implementation must preserve.
The logical path is:
recall the continuous linear-quadratic parabolic control problem;
discretize the state equation in time with implicit Euler;
define the discrete reduced optimization problem;
derive the discrete adjoint equation from a discrete Lagrangian;
identify the discrete gradient and the first-order optimality system;
compare discretize-then-optimize with optimize-then-discretize;
write the fully discrete finite element matrix form;
add box constraints through a time-discrete variational inequality.
The central message is:
the discrete adjoint is not obtained by informally reversing time; it is the transpose, with respect to the chosen discrete inner products, of the linearized discrete state equation.
This is the principle behind adjoint-consistent time discretizations.
Continuous Problem¶
Let

$$V \hookrightarrow H \hookrightarrow V^*$$

be a Gelfand triple, where $V$ and $H$ are Hilbert spaces, the embedding $V \hookrightarrow H$ is dense and continuous, and $H$ is identified with its dual.
For the heat equation with homogeneous Dirichlet boundary conditions one may keep in mind

$$V = H_0^1(\Omega), \qquad H = L^2(\Omega), \qquad V^* = H^{-1}(\Omega).$$

Let $U$ be a Hilbert space for the control values, and assume that

$$B \in \mathcal{L}(U, V^*)$$

is a bounded control operator. Let

$$f \in L^2(0,T;V^*), \qquad y_d \in L^2(0,T;H),$$

and let $y_0 \in H$.
We consider the state equation

$$y'(t) + A y(t) = B u(t) + f(t) \quad \text{in } (0,T),$$

with initial condition

$$y(0) = y_0.$$

Here $A \in \mathcal{L}(V, V^*)$ is induced by a bilinear form

$$a : V \times V \to \mathbb{R}, \qquad \langle A v, w \rangle_{V^*,V} = a(v, w),$$

that is continuous and coercive:

$$|a(v,w)| \le M \|v\|_V \|w\|_V, \qquad a(v,v) \ge \gamma \|v\|_V^2 \qquad \text{for all } v, w \in V,$$

for constants $M > 0$ and $\gamma > 0$.
The weak form of the state equation is:
find $y \in W(0,T)$ such that $y(0) = y_0$ and

$$\langle y'(t), v \rangle_{V^*,V} + a(y(t), v) = \langle B u(t) + f(t), v \rangle_{V^*,V} \quad \text{for all } v \in V,$$

for almost every $t \in (0,T)$.
The cost functional is

$$J(y,u) = \frac{1}{2} \int_0^T \|y(t) - y_d(t)\|_H^2 \, dt + \frac{\alpha}{2} \int_0^T \|u(t)\|_U^2 \, dt, \qquad \alpha > 0.$$

The unconstrained optimal control problem is

$$\min_{u \in L^2(0,T;U)} \; j(u) := J(y(u), u),$$

where $y(u)$ denotes the state associated with $u$.
For constrained problems we replace the full control space by a nonempty closed convex set

$$U_{ad} \subset L^2(0,T;U).$$
Continuous Optimality System¶
The continuous first-order system has already been derived in the previous lecture. We recall it only to fix the notation.
For a given control $u$, the state $y = y(u)$ solves

$$y' + A y = B u + f, \qquad y(0) = y_0.$$

The corresponding adjoint $p$ solves the backward parabolic problem

$$-p'(t) + A^* p(t) = y(t) - y_d(t), \qquad p(T) = 0.$$

In weak form this means

$$-\langle p'(t), v \rangle_{V^*,V} + a(v, p(t)) = (y(t) - y_d(t), v)_H \quad \text{for all } v \in V, \text{ a.e. } t \in (0,T).$$

Here $A^*$ is the adjoint operator, $\langle A^* p, v \rangle_{V^*,V} = a(v, p)$. If $a$ is symmetric, then $A^* = A$.
The reduced gradient is

$$\nabla j(u) = \alpha u + B^* p,$$

with the usual interpretation of $B^* \in \mathcal{L}(V, U)$ through duality:

$$(B^* p, w)_U = \langle B w, p \rangle_{V^*,V} \qquad \text{for all } w \in U.$$

Thus, in the unconstrained case, the optimality system is

$$\bar y' + A \bar y = B \bar u + f, \quad \bar y(0) = y_0, \qquad
-\bar p' + A^* \bar p = \bar y - y_d, \quad \bar p(T) = 0, \qquad
\alpha \bar u + B^* \bar p = 0.$$

For a closed convex admissible set $U_{ad}$, the last equation is replaced by the variational inequality

$$\int_0^T (\alpha \bar u(t) + B^* \bar p(t), u(t) - \bar u(t))_U \, dt \ge 0 \qquad \text{for all } u \in U_{ad}.$$
The purpose of this lecture is to build the discrete analogue of this system from the discrete optimization problem itself.
Implicit Euler Discretization of the State¶
Let

$$0 = t_0 < t_1 < \dots < t_K = T, \qquad \tau = \frac{T}{K}, \qquad t_k = k \tau.$$

We use one control value $u_k \in U$ on each interval

$$I_k = (t_{k-1}, t_k], \qquad k = 1, \dots, K.$$

The data are approximated by values

$$f_k \in V^*, \qquad y_{d,k} \in H, \qquad k = 1, \dots, K.$$

For instance, one may take time averages

$$f_k = \frac{1}{\tau} \int_{t_{k-1}}^{t_k} f(t) \, dt, \qquad y_{d,k} = \frac{1}{\tau} \int_{t_{k-1}}^{t_k} y_d(t) \, dt.$$

The implicit Euler approximation of the state equation is:
find $y_1, \dots, y_K \in V$, with $y_0 \in H$ given, such that

$$\left( \frac{y_k - y_{k-1}}{\tau}, v \right)_H + a(y_k, v) = \langle B u_k + f_k, v \rangle_{V^*,V} \quad \text{for all } v \in V,$$

for $k = 1, \dots, K$.
Equivalently,

$$(y_k, v)_H + \tau a(y_k, v) = (y_{k-1}, v)_H + \tau \langle B u_k + f_k, v \rangle_{V^*,V} \quad \text{for all } v \in V.$$

This is a forward time-stepping scheme. Once $y_{k-1}$ and $u_k$ are known, the equation for $y_k$ is an elliptic problem with bilinear form

$$a_\tau(w, v) = (w, v)_H + \tau a(w, v).$$

Coercivity follows immediately:

$$a_\tau(v, v) = \|v\|_H^2 + \tau a(v, v) \ge \|v\|_H^2 + \tau \gamma \|v\|_V^2.$$
Therefore each time step is well posed by the Lax-Milgram theorem.
Proposition. For every sequence of controls

$$(u_1, \dots, u_K) \in U^K$$

there exists a unique discrete state sequence

$$(y_1, \dots, y_K) \in V^K.$$

Moreover, the map

$$(u_1, \dots, u_K) \mapsto (y_1, \dots, y_K)$$

is affine and continuous.
Proof. For fixed $k$, assume $y_{k-1}$ is already known. The left-hand side is the coercive bilinear form

$$a_\tau(y_k, v) = (y_k, v)_H + \tau a(y_k, v),$$

and the right-hand side

$$v \mapsto (y_{k-1}, v)_H + \tau \langle B u_k + f_k, v \rangle_{V^*,V}$$

is a bounded linear functional on $V$. Hence $y_k$ exists and is unique. Induction over $k$ gives the full discrete state.
Linearity in the variables $(u_1, \dots, u_K)$ and affine dependence on the data $(f_k)$ and $y_0$ follow directly from the recursion. Continuity follows from the stability estimate obtained by applying Lax-Milgram at each step and iterating over finitely many time levels.
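The forward recursion can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: it assumes a 1D heat equation on $(0,1)$ with homogeneous Dirichlet conditions, discretized by centered finite differences (so the mass matrix is the identity and $B$ acts as the identity), with illustrative sizes.

```python
import numpy as np

# Assumed model for illustration: 1D heat equation on (0,1), homogeneous
# Dirichlet BCs, centered finite differences on n interior points.
n = 49
h = 1.0 / (n + 1)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2  # stiffness
M = np.eye(n)          # mass matrix (identity in this finite-difference setting)

T, K = 1.0, 100
tau = T / K

def solve_state(U, y0, F):
    """Implicit Euler: (M + tau*A) y_k = M y_{k-1} + tau*(B u_k + f_k),
    with B = M here. U and F are arrays of shape (K, n)."""
    Y = np.zeros((K + 1, n))
    Y[0] = y0
    S = M + tau * A    # the same elliptic operator is solved at every step
    for k in range(1, K + 1):
        rhs = M @ Y[k - 1] + tau * (M @ U[k - 1] + F[k - 1])
        Y[k] = np.linalg.solve(S, rhs)
    return Y

y0 = np.sin(np.pi * np.linspace(h, 1 - h, n))
Y = solve_state(np.zeros((K, n)), y0, np.zeros((K, n)))
# With u = f = 0 the scheme inherits the decay of the heat semigroup.
decay = np.linalg.norm(Y[-1]) < np.linalg.norm(Y[0])
```

In practice one would factor $M + \tau A$ once (sparse LU or Cholesky) and reuse the factorization at every time step, since the operator does not change.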
The Discrete Optimization Problem¶
The implicit Euler state equation defines a finite-dimensional-in-time optimization problem.
The discrete cost functional is

$$J_\tau(y, u) = \frac{\tau}{2} \sum_{k=1}^K \|y_k - y_{d,k}\|_H^2 + \frac{\alpha \tau}{2} \sum_{k=1}^K \|u_k\|_U^2.$$

The reduced discrete functional is

$$j_\tau(u_1, \dots, u_K) = J_\tau(y(u), u),$$

where $y(u) = (y_1, \dots, y_K)$ is the discrete state from the implicit Euler scheme.
The unconstrained discrete problem is

$$\min_{(u_1, \dots, u_K) \in U^K} \; j_\tau(u_1, \dots, u_K).$$

For constrained controls, a natural time-discrete admissible set is

$$U_{ad}^\tau = \{ (u_1, \dots, u_K) \in U^K : u_k \in U_{ad,k}, \ k = 1, \dots, K \},$$

with nonempty closed convex sets $U_{ad,k} \subset U$.
For box constraints this will be specified later level by level.
Since the state equation is linear and the cost functional is quadratic, the reduced functional $j_\tau$ is quadratic. Because $\alpha > 0$, it is strictly convex in the control. Therefore the unconstrained problem has a unique minimizer, and the constrained problem has a unique minimizer whenever $U_{ad}^\tau$ is nonempty, closed, and convex.
The remaining task is to compute the gradient of $j_\tau$ without explicitly forming the derivative of the control-to-state map $u \mapsto y(u)$.
Discrete Lagrangian¶
We now derive the discrete adjoint equation directly from the time-discrete optimization problem. This is the discretize-then-optimize approach.
For each time level $k = 1, \dots, K$ define the state residual

$$R_k(y, u) \in V^*$$

by

$$\langle R_k(y, u), v \rangle_{V^*,V} = (y_k - y_{k-1}, v)_H + \tau a(y_k, v) - \tau \langle B u_k + f_k, v \rangle_{V^*,V}, \qquad v \in V.$$

The discrete state equation is

$$R_k(y, u) = 0, \qquad k = 1, \dots, K.$$

Introduce Lagrange multipliers

$$p_1, \dots, p_K \in V.$$

We choose the discrete Lagrangian

$$L(y, u, p) = J_\tau(y, u) - \sum_{k=1}^K \langle R_k(y, u), p_k \rangle_{V^*,V}.$$

That is,

$$L(y, u, p) = \frac{\tau}{2} \sum_{k=1}^K \|y_k - y_{d,k}\|_H^2 + \frac{\alpha \tau}{2} \sum_{k=1}^K \|u_k\|_U^2 - \sum_{k=1}^K \Big[ (y_k - y_{k-1}, p_k)_H + \tau a(y_k, p_k) - \tau \langle B u_k + f_k, p_k \rangle_{V^*,V} \Big].$$

This sign convention is chosen so that the discrete adjoint equation has the same sign as the continuous adjoint equation and the reduced gradient is $\alpha u_k + B^* p_k$.
Variation with Respect to the State¶
Let $\delta y_1, \dots, \delta y_K \in V$ be arbitrary variations of the state. The initial value is fixed, so $\delta y_0 = 0$.
The derivative of the tracking term is

$$\tau \sum_{k=1}^K (y_k - y_{d,k}, \delta y_k)_H.$$

The derivative of the residual part in the Lagrangian is

$$- \sum_{k=1}^K \Big[ (\delta y_k - \delta y_{k-1}, p_k)_H + \tau a(\delta y_k, p_k) \Big].$$

The term containing $\delta y_k$ appears in two neighboring residuals, with the overall minus sign from the Lagrangian:
from $R_k$, through $- (\delta y_k, p_k)_H - \tau a(\delta y_k, p_k)$;
from $R_{k+1}$, through $+ (\delta y_k, p_{k+1})_H$.
For $k = 1, \dots, K-1$, the coefficient of $\delta y_k$ is therefore

$$\tau (y_k - y_{d,k}, \delta y_k)_H - (p_k - p_{k+1}, \delta y_k)_H - \tau a(\delta y_k, p_k).$$

For $k = K$, there is no residual $R_{K+1}$. It is convenient to introduce the terminal value

$$p_{K+1} := 0.$$

Then all time levels $k = 1, \dots, K$ can be written uniformly as

$$\tau (y_k - y_{d,k}, \delta y_k)_H - (p_k - p_{k+1}, \delta y_k)_H - \tau a(\delta y_k, p_k).$$
The stationarity condition with respect to $y$ is therefore:
find $p_1, \dots, p_K \in V$, with $p_{K+1} = 0$, such that

$$(p_k - p_{k+1}, v)_H + \tau a(v, p_k) = \tau (y_k - y_{d,k}, v)_H \quad \text{for all } v \in V,$$

for $k = K, K-1, \dots, 1$.
Dividing by $\tau$ gives

$$\left( \frac{p_k - p_{k+1}}{\tau}, v \right)_H + a(v, p_k) = (y_k - y_{d,k}, v)_H \quad \text{for all } v \in V.$$

This is an implicit Euler scheme run backward in time for the adjoint equation

$$-p' + A^* p = y - y_d, \qquad p(T) = 0.$$

In boxed form:

$$(p_k, v)_H + \tau a(v, p_k) = (p_{k+1}, v)_H + \tau (y_k - y_{d,k}, v)_H \quad \text{for all } v \in V, \qquad p_{K+1} = 0.$$

This equation is solved backward: starting from $p_{K+1} = 0$, one computes $p_K$, then $p_{K-1}$, down to $p_1$.
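The backward sweep mirrors the forward one, with the transposed operator and reversed loop. A minimal sketch, again assuming a 1D heat-type finite-difference model with identity mass matrix and illustrative sizes:

```python
import numpy as np

# Assumed toy model: same 1D heat-type setup as for the state sketch.
n, T, K = 49, 1.0, 100
h, tau = 1.0 / (n + 1), T / K
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
M = np.eye(n)

def solve_adjoint(Y, Yd):
    """Backward sweep: (M + tau*A^T) p_k = M p_{k+1} + tau*M (y_k - y_{d,k}),
    with p_{K+1} = 0. Y has shape (K+1, n) with Y[k] = y_k; Yd has shape
    (K, n) with Yd[k-1] = y_{d,k}."""
    P = np.zeros((K + 2, n))        # P[K+1] = 0 is the terminal value
    S = M + tau * A.T               # transposed operator
    for k in range(K, 0, -1):       # k = K, K-1, ..., 1
        rhs = M @ P[k + 1] + tau * M @ (Y[k] - Yd[k - 1])
        P[k] = np.linalg.solve(S, rhs)
    return P[1:K + 1]               # p_1, ..., p_K

Y = np.ones((K + 1, n))             # illustrative state values
Yd = np.zeros((K, n))
P = solve_adjoint(Y, Yd)
```

Note that the only structural differences from the forward solve are the transpose of the stiffness matrix and the direction of the loop; the tracking misfit enters where the control and source entered before.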
Variation with Respect to the Control¶
We now compute the derivative with respect to $u_k$.
Using the same Lagrangian convention as above, the control derivative at time level $k$ is

$$\partial_{u_k} L(y, u, p) \, \delta u_k = \alpha \tau (u_k, \delta u_k)_U + \tau \langle B \delta u_k, p_k \rangle_{V^*,V}$$

for every control variation $\delta u_k \in U$.
By definition of $B^*$,

$$\langle B \delta u_k, p_k \rangle_{V^*,V} = (B^* p_k, \delta u_k)_U.$$

Hence

$$\nabla j_\tau(u)_k = \tau \, (\alpha u_k + B^* p_k)$$

as an element of $U^K$ when the product space is equipped with the unweighted product inner product.
It is often more natural to equip the discrete control space with the time-weighted inner product

$$(u, v)_\tau = \tau \sum_{k=1}^K (u_k, v_k)_U.$$

With respect to this inner product, the gradient is

$$\nabla_\tau j_\tau(u)_k = \alpha u_k + B^* p_k.$$
This distinction is important in implementations:
the algebraic derivative contains the quadrature weight $\tau$;
the $L^2(0,T;U)$-consistent gradient does not.
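The two representations encode the same directional derivative, paired against different inner products. A toy numerical check (all values assumed, with random stand-ins for $B^* p_k$):

```python
import numpy as np

# Toy illustration: one derivative, two Riesz representatives.
tau, alpha = 0.01, 1e-2
rng = np.random.default_rng(0)
K, m = 100, 3
u = rng.standard_normal((K, m))
Bstar_p = rng.standard_normal((K, m))   # stands in for B* p_k, assumed given

g_L2 = alpha * u + Bstar_p              # gradient w.r.t. (u,v)_tau = tau * sum_k (u_k, v_k)
g_alg = tau * g_L2                      # gradient w.r.t. the unweighted product inner product

# Both give the same directional derivative j'(u)[du]:
du = rng.standard_normal((K, m))
d1 = np.sum(g_alg * du)                 # unweighted pairing with the algebraic derivative
d2 = tau * np.sum(g_L2 * du)            # tau-weighted pairing with the L2 gradient
same = bool(np.isclose(d1, d2))
```

The practical consequence: if an optimizer is fed `g_alg` but step sizes and stopping tolerances are interpreted in the $L^2$ sense, the iteration silently becomes $\tau$-dependent.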
Discrete Optimality System¶
The unconstrained discrete optimality system is:
find

$$(y_1, \dots, y_K) \in V^K, \qquad (p_1, \dots, p_K) \in V^K, \qquad (u_1, \dots, u_K) \in U^K,$$

with $y_0 \in H$ given and $p_{K+1} = 0$, such that for all $v \in V$ and $w \in U$:

$$(y_k - y_{k-1}, v)_H + \tau a(y_k, v) = \tau \langle B u_k + f_k, v \rangle_{V^*,V}$$

for $k = 1, \dots, K$,

$$(p_k - p_{k+1}, v)_H + \tau a(v, p_k) = \tau (y_k - y_{d,k}, v)_H$$

for $k = K, \dots, 1$, and

$$(\alpha u_k + B^* p_k, w)_U = 0$$

for $k = 1, \dots, K$.
The structure is forward-backward:
the state equation propagates $y_0 \to y_1 \to \dots \to y_K$ forward in time;
the adjoint equation propagates $p_{K+1} = 0 \to p_K \to \dots \to p_1$ backward in time;
the control equation couples state and adjoint at the same time level.
This is the time-discrete analogue of the continuous parabolic optimality system.
Discretize-Then-Optimize and Optimize-Then-Discretize¶
There are two common routes to a numerical optimality system.
Discretize then optimize (DTO). First discretize the state equation and the objective functional. This gives a finite-dimensional or finite-time-dimensional optimization problem. Then derive its first-order optimality conditions.
This is what we did above.
Optimize then discretize (OTD). First derive the continuous optimality system:

$$\bar y' + A \bar y = B \bar u + f, \quad \bar y(0) = y_0, \qquad
-\bar p' + A^* \bar p = \bar y - y_d, \quad \bar p(T) = 0, \qquad
\alpha \bar u + B^* \bar p = 0.$$

Then discretize this coupled forward-backward system.
For implicit Euler, the DTO adjoint equation is

$$\left( \frac{p_k - p_{k+1}}{\tau}, v \right)_H + a(v, p_k) = (y_k - y_{d,k}, v)_H \quad \text{for all } v \in V, \qquad p_{K+1} = 0.$$
This is a backward implicit Euler discretization of the adjoint equation. Thus, for this simple linear-quadratic problem, DTO and OTD can be made to match if the adjoint equation is discretized with the time-reversed scheme that is algebraically adjoint to the state scheme.
The important warning is that this agreement is not automatic. It depends on:
the quadrature rule used in the objective;
the time locations of $y_k$, $u_k$, and $p_k$;
the discrete inner products used to identify gradients;
the treatment of terminal terms;
the choice of time-stepping method.
DTO guarantees that the computed adjoint is the exact adjoint of the discrete state equation. OTD gives the same result only if the discretization is chosen in an adjoint-consistent way.
For gradient-based optimization of a discretized problem, DTO is usually the safer point of view: the gradient being used is genuinely the gradient of the discrete objective.
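A simple way to confirm that a DTO adjoint gradient really is the gradient of the discrete objective is a finite-difference check. The sketch below assumes a tiny 1D heat-type finite-difference model with identity mass and control operators and illustrative sizes; since the discrete objective is quadratic, the central difference should agree with the adjoint-based derivative up to roundoff.

```python
import numpy as np

# Assumed toy model: 1D heat-type dynamics, M = B = I, small sizes.
n, K, T, alpha = 5, 8, 0.5, 1e-2
h, tau = 1.0 / (n + 1), T / K
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
S = np.eye(n) + tau * A
Yd = np.ones((K, n))

def state(U):
    Y = np.zeros((K + 1, n))                    # y_0 = 0 for simplicity
    for k in range(1, K + 1):
        Y[k] = np.linalg.solve(S, Y[k - 1] + tau * U[k - 1])
    return Y

def cost(U):
    Y = state(U)
    return 0.5 * tau * np.sum((Y[1:] - Yd) ** 2) + 0.5 * alpha * tau * np.sum(U ** 2)

def gradient(U):
    """tau-weighted (L2-consistent) reduced gradient: alpha*u_k + p_k."""
    Y = state(U)
    P = np.zeros((K + 2, n))                    # p_{K+1} = 0
    for k in range(K, 0, -1):
        P[k] = np.linalg.solve(S.T, P[k + 1] + tau * (Y[k] - Yd[k - 1]))
    return alpha * U + P[1:K + 1]

rng = np.random.default_rng(1)
U = rng.standard_normal((K, n))
dU = rng.standard_normal((K, n))
eps = 1e-6
fd = (cost(U + eps * dU) - cost(U - eps * dU)) / (2 * eps)
ad = tau * np.sum(gradient(U) * dU)             # pair in the tau-weighted inner product
rel_err = abs(fd - ad) / abs(fd)
```

Note the pairing: the $L^2$-consistent gradient must be tested in the $\tau$-weighted inner product, otherwise the check fails by exactly a factor $\tau$, which is itself a useful diagnostic.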
Fully Discrete Finite Element Form¶
We now add a standard finite element discretization in space.
Let

$$V_h \subset V$$

be a finite element space with basis $\{\varphi_i\}_{i=1}^n$. Let the control be represented in a finite-dimensional space $U_h$ with basis $\{\psi_j\}_{j=1}^m$.
Write the coefficient vectors as

$$\mathbf{y}_k, \mathbf{p}_k \in \mathbb{R}^n, \qquad \mathbf{u}_k \in \mathbb{R}^m.$$

Define the mass and stiffness matrices

$$M_{ij} = (\varphi_j, \varphi_i)_H, \qquad A_{ij} = a(\varphi_j, \varphi_i).$$

Define also the discrete control operator $\mathbf{B} \in \mathbb{R}^{n \times m}$ by

$$\mathbf{B}_{ij} = \langle B \psi_j, \varphi_i \rangle_{V^*,V}.$$

The state equation becomes

$$(M + \tau A) \mathbf{y}_k = M \mathbf{y}_{k-1} + \tau (\mathbf{B} \mathbf{u}_k + \mathbf{f}_k), \qquad k = 1, \dots, K.$$

The adjoint equation becomes

$$(M + \tau A^T) \mathbf{p}_k = M \mathbf{p}_{k+1} + \tau M (\mathbf{y}_k - \mathbf{y}_{d,k}), \qquad k = K, \dots, 1, \qquad \mathbf{p}_{K+1} = 0.$$

If $M$ and $A$ are symmetric, this simplifies to

$$(M + \tau A) \mathbf{p}_k = M \mathbf{p}_{k+1} + \tau M (\mathbf{y}_k - \mathbf{y}_{d,k}).$$

Let $M_u \in \mathbb{R}^{m \times m}$, $(M_u)_{ij} = (\psi_j, \psi_i)_U$, be the control mass matrix. The discrete control equation is

$$\alpha M_u \mathbf{u}_k + \mathbf{B}^T \mathbf{p}_k = 0, \qquad k = 1, \dots, K.$$

Equivalently, after applying $M_u^{-1}$,

$$\alpha \mathbf{u}_k + M_u^{-1} \mathbf{B}^T \mathbf{p}_k = 0.$$
This last form is the coefficient representation of the $U$-gradient in the control space. As in the elliptic case, one must distinguish:
the algebraic residual $\alpha M_u \mathbf{u}_k + \mathbf{B}^T \mathbf{p}_k$;
the Riesz-represented gradient $\alpha \mathbf{u}_k + M_u^{-1} \mathbf{B}^T \mathbf{p}_k$.
This distinction becomes important for projected-gradient and active-set methods.
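The mass-matrix distinction can be checked with toy matrices (all values assumed): the residual and the Riesz gradient differ by $M_u^{-1}$, yet represent the same linear functional on control variations.

```python
import numpy as np

# Toy illustration of Riesz representation in the control space.
rng = np.random.default_rng(2)
m, alpha = 4, 1e-1
L = rng.standard_normal((m, m))
Mu = L @ L.T + m * np.eye(m)            # assumed SPD control mass matrix
u = rng.standard_normal(m)
Bt_p = rng.standard_normal(m)           # stands in for B^T p_k, assumed given

residual = alpha * Mu @ u + Bt_p        # algebraic derivative (dual coefficients)
grad = np.linalg.solve(Mu, residual)    # Riesz-represented U-gradient

# Same directional derivative for a direction w, paired appropriately:
w = rng.standard_normal(m)
same = bool(np.isclose(residual @ w, grad @ (Mu @ w)))
```

Using `residual` where `grad` is expected makes step sizes and tolerances mesh-dependent, which is the usual symptom of a missing mass-matrix solve.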
All-at-Once Structure¶
The time-stepping form suggests solving the state forward and the adjoint backward. However, the whole system can also be assembled as one global space-time KKT system.
Define

$$\mathbf{Y} = (\mathbf{y}_1, \dots, \mathbf{y}_K), \qquad \mathbf{U} = (\mathbf{u}_1, \dots, \mathbf{u}_K), \qquad \mathbf{P} = (\mathbf{p}_1, \dots, \mathbf{p}_K).$$

The state equations are

$$(M + \tau A) \mathbf{y}_1 = M \mathbf{y}_0 + \tau (\mathbf{B} \mathbf{u}_1 + \mathbf{f}_1)$$

and, for $k = 2, \dots, K$,

$$(M + \tau A) \mathbf{y}_k - M \mathbf{y}_{k-1} = \tau (\mathbf{B} \mathbf{u}_k + \mathbf{f}_k).$$

Thus the global state operator is block lower bidiagonal:

$$\mathcal{A} = \begin{pmatrix} M + \tau A & & & \\ -M & M + \tau A & & \\ & \ddots & \ddots & \\ & & -M & M + \tau A \end{pmatrix}.$$

The adjoint operator is the transpose block structure:

$$\mathcal{A}^T = \begin{pmatrix} M + \tau A^T & -M & & \\ & M + \tau A^T & \ddots & \\ & & \ddots & -M \\ & & & M + \tau A^T \end{pmatrix},$$

which is block upper bidiagonal.
This is the algebraic reason why the adjoint runs backward in time.
The full unconstrained KKT system has the schematic form

$$\begin{pmatrix} \tau \mathcal{M} & 0 & -\mathcal{A}^T \\ 0 & \alpha \tau \mathcal{M}_u & \tau \mathcal{B}^T \\ \mathcal{A} & -\tau \mathcal{B} & 0 \end{pmatrix} \begin{pmatrix} \mathbf{Y} \\ \mathbf{U} \\ \mathbf{P} \end{pmatrix} = \begin{pmatrix} \tau \mathcal{M} \mathbf{Y}_d \\ 0 \\ \mathbf{F} \end{pmatrix}.$$

Here $\mathcal{M}$, $\mathcal{M}_u$, and $\mathcal{B}$ are block-diagonal space-time matrices containing $M$, $M_u$, and $\mathbf{B}$ at each time level.
This all-at-once formulation is useful because it exposes the saddle-point structure of the full problem. It is also the natural form for space-time preconditioners. In this lecture, however, the main point is more basic:
the backward adjoint equation is the transpose of the forward time-stepping operator.
Box Constraints¶
Let the continuous admissible set be a pointwise box in time:

$$U_{ad} = \{ u \in L^2(0,T;U) : u_a(t) \le u(t) \le u_b(t) \ \text{a.e. in } (0,T) \}.$$

After time discretization we obtain bounds

$$u_{a,k} \le u_{b,k}, \qquad k = 1, \dots, K,$$

and the discrete admissible set

$$U_{ad}^\tau = \{ (u_1, \dots, u_K) \in U^K : u_{a,k} \le u_k \le u_{b,k}, \ k = 1, \dots, K \}.$$

The discrete first-order condition is the variational inequality

$$\tau \sum_{k=1}^K (\alpha \bar u_k + B^* \bar p_k, u_k - \bar u_k)_U \ge 0 \qquad \text{for all } (u_1, \dots, u_K) \in U_{ad}^\tau.$$

Since $U_{ad}^\tau$ is a product of level-wise constraint sets, this is equivalent to

$$(\alpha \bar u_k + B^* \bar p_k, u_k - \bar u_k)_U \ge 0 \qquad \text{for all admissible } u_k, \ k = 1, \dots, K.$$

For a box constraint, the projection formula is level-wise:

$$\bar u_k = P_{[u_{a,k}, u_{b,k}]}\!\left( -\frac{1}{\alpha} B^* \bar p_k \right), \qquad k = 1, \dots, K.$$

In coefficient form, if $\mathbf{u}_k$ denotes the vector of control coefficients, the Riesz-represented gradient is

$$\alpha \mathbf{u}_k + M_u^{-1} \mathbf{B}^T \mathbf{p}_k.$$

This is the quantity that should be compared with primal bound gaps in projected-gradient or active-set methods. The raw algebraic residual

$$\alpha M_u \mathbf{u}_k + \mathbf{B}^T \mathbf{p}_k$$

lives in the dual coefficient space and has different scaling.
Thus the active-set logic from the elliptic case applies independently at each time level, but the state and adjoint equations still couple all time levels through forward and backward propagation.
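The level-wise projection is just componentwise clipping, and the resulting control satisfies the discrete variational inequality at every time level. A toy check with assumed bounds and random stand-ins for $B^* \bar p_k$:

```python
import numpy as np

# Toy illustration of the level-wise projection formula for box constraints.
ua, ub, alpha = -1.0, 1.0, 1e-1
rng = np.random.default_rng(3)
K, m = 5, 3
Bstar_p = rng.standard_normal((K, m))       # stands in for B* p_k, assumed given

# u_k = P_[ua,ub]( -B* p_k / alpha ), independently at each time level:
u_bar = np.clip(-Bstar_p / alpha, ua, ub)

# Verify the level-wise variational inequality against a few admissible
# competitors: (alpha*u_k + B* p_k, v - u_k) >= 0 for all admissible v.
g = alpha * u_bar + Bstar_p                 # reduced gradient at u_bar
ok = all(np.sum(g[k] * (v - u_bar[k])) >= -1e-12
         for k in range(K)
         for v in [np.full(m, ua), np.full(m, ub), np.zeros(m)])
```

On inactive components the gradient vanishes; on active components it points out of the box, which is exactly what the inequality encodes.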
Reduced Numerical Strategy¶
The reduced viewpoint eliminates the state and adjoint from the optimization variables.
Given a control sequence $(u_1, \dots, u_K)$:
solve the state equation forward for $y_1, \dots, y_K$;
solve the adjoint equation backward for $p_K, \dots, p_1$;
assemble the gradient

$$\nabla_\tau j_\tau(u)_k = \alpha u_k + B^* p_k, \qquad k = 1, \dots, K;$$
update the control with a gradient, conjugate-gradient, quasi-Newton, or projected method.
The cost of one gradient evaluation is essentially:
one forward parabolic solve;
one backward parabolic solve;
one application of $B^*$ at each time level.
This is why the adjoint method is indispensable: the cost of the gradient does not scale with the number of control degrees of freedom.
For box-constrained problems the same loop is used, but the update step is projected or active-set based.
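The whole reduced loop fits in a short script. This is a sketch under assumptions: the same tiny 1D heat-type finite-difference model as in the earlier checks ($M = B = I$), a fixed step size chosen small enough for this toy problem (a line search would be used in practice), and illustrative bounds.

```python
import numpy as np

# Assumed toy model: 1D heat-type dynamics, M = B = I, small sizes.
n, K, T, alpha = 5, 8, 0.5, 1e-2
h, tau = 1.0 / (n + 1), T / K
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
S = np.eye(n) + tau * A
Yd = np.ones((K, n))
ua, ub = -5.0, 5.0                          # assumed box bounds

def state(U):
    Y = np.zeros((K + 1, n))                # y_0 = 0
    for k in range(1, K + 1):
        Y[k] = np.linalg.solve(S, Y[k - 1] + tau * U[k - 1])
    return Y

def adjoint(Y):
    P = np.zeros((K + 2, n))                # p_{K+1} = 0
    for k in range(K, 0, -1):
        P[k] = np.linalg.solve(S.T, P[k + 1] + tau * (Y[k] - Yd[k - 1]))
    return P[1:K + 1]

def cost(U):
    Y = state(U)
    return 0.5 * tau * (np.sum((Y[1:] - Yd) ** 2) + alpha * np.sum(U ** 2))

U = np.zeros((K, n))
costs = [cost(U)]
step = 1.0                                  # fixed step; line search in practice
for it in range(50):
    g = alpha * U + adjoint(state(U))       # tau-weighted reduced gradient
    U = np.clip(U - step * g, ua, ub)       # projected gradient update
    costs.append(cost(U))
decreased = costs[-1] < costs[0]
```

Each iteration costs exactly one forward sweep, one backward sweep, and one projection, independently of how many control degrees of freedom there are.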
What Should Be Remembered¶
Time discretization is not just a technical detail. It determines the exact algebraic optimization problem being solved.
For implicit Euler:
the state equation is a forward recurrence;
the discrete adjoint is a backward recurrence;
the adjoint recurrence is the transpose of the state recurrence;
the discrete gradient at time level $k$ is $\alpha u_k + B^* p_k$ (in the time-weighted inner product);
in finite elements the control gradient requires the control mass matrix;
box constraints give a projection formula at each time level.
The key DTO lesson is:
derive the adjoint from the discrete equations if you want the exact gradient of the discrete objective.
The OTD viewpoint is still valuable, because it explains what the discrete system approximates. But for implementation and optimization, DTO is the most reliable way to avoid sign errors, index shifts, and inconsistent gradients.