In the previous lectures we focused mainly on smooth PDE-constrained optimization, where the reduced functional is differentiable and the optimality system can be handled with standard Newton or SQP techniques.
Many relevant applications, however, are inherently nonsmooth. The nonsmoothness may enter through the cost functional, through inequality constraints, or through the state equation itself.
Typical examples include:
sparse optimal control with an $L^1$ penalty;
pointwise state constraints;
obstacle-type systems and variational inequalities;
complementarity systems and active-set formulations.
The goal of this lecture is to introduce the mathematical structure and the numerical treatment of these nonsmooth PDE-constrained optimization problems.
We focus on three model classes:
sparse optimal control with $L^1$ regularization;
pointwise state constraints;
variational inequality constraints.
The presentation follows Chapter 6 of De los Reyes and connects naturally with the KKT, PDAS, and semismooth Newton ideas developed in the previous lectures.
Throughout the lecture, let $\Omega \subset \mathbb{R}^d$ be a bounded Lipschitz polyhedral domain.
Sources of Nonsmoothness
There are three main sources of nonsmoothness in PDE-constrained optimization.
Functional nonsmoothness
The objective functional may contain nonsmooth terms.
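Typical examples are
$$\beta\|u\|_{L^1(\Omega)} = \beta\int_\Omega |u(x)|\,dx$$
for sparse control, or
$$\|y - z\|_{L^1(\Omega)}$$
for robust data fitting against a given target state $z$.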
Constraint nonsmoothness
The admissible set may contain inequality constraints such as
$$u_a \le u \le u_b \quad \text{a.e. in } \Omega$$
or
$$y \le y_b \quad \text{in } \Omega.$$
These constraints generate complementarity systems.
Structural nonsmoothness
The PDE itself may be nonsmooth.
Examples include:
obstacle problems;
contact mechanics;
friction laws;
variational inequalities;
upwind discretizations.
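Why Nonsmooth Optimization?
In classical linear-quadratic control we minimize
$$J(y,u) = \frac12\|y - z\|^2_{L^2(\Omega)} + \frac{\alpha}{2}\|u\|^2_{L^2(\Omega)}$$
subject to
$$-\Delta y = u \ \text{in } \Omega, \qquad y = 0 \ \text{on } \partial\Omega,$$
where $z$ is a given target state and $\alpha > 0$.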
The quadratic control term is smooth and strictly convex.
As a consequence:
the reduced functional is differentiable;
the optimality system is smooth;
Newton methods are natural.
However, many applications require structural properties that are not promoted by an $L^2$ penalty.
For instance:
controls acting only on small regions;
bang-bang controls;
switching devices;
actuators that can be turned on and off.
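A standard mechanism for promoting sparsity is to complement (or replace) the quadratic control penalty with an $L^1$ term:
$$\beta\|u\|_{L^1(\Omega)} = \beta\int_\Omega |u(x)|\,dx, \qquad \beta > 0.$$
The $L^1$ norm is convex but not differentiable.
Its subdifferential is set-valued wherever the control vanishes; pointwise,
$$\partial|t| = \begin{cases} \{1\}, & t > 0,\\ [-1,1], & t = 0,\\ \{-1\}, & t < 0. \end{cases}$$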
The loss of smoothness fundamentally changes:
the optimality system;
the interpretation of multipliers;
the numerical algorithms.
Another source of nonsmoothness comes from variational inequalities.
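Obstacle problems naturally lead to complementarity systems of the form
$$-\Delta y - \lambda = f, \qquad y \ge \psi, \qquad \lambda \ge 0, \qquad \lambda\,(y - \psi) = 0 \quad \text{in } \Omega,$$
where $\psi$ is the obstacle, $f$ a given source term, and $\lambda$ the associated multiplier.
The active set $\{x \in \Omega : y(x) = \psi(x)\}$ can change discontinuously as the data vary.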
Consequently the solution operator is typically not Fréchet differentiable.
Nevertheless, these problems still possess strong structure that can be exploited numerically.
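Sparse Optimal Control
Problem formulation
Consider the elliptic control problem
$$\min_{u \in L^2(\Omega)}\ J(y,u) = \frac12\|y - z\|^2_{L^2(\Omega)} + \frac{\alpha}{2}\|u\|^2_{L^2(\Omega)} + \beta\|u\|_{L^1(\Omega)}$$
subject to
$$-\Delta y = u \ \text{in } \Omega, \qquad y = 0 \ \text{on } \partial\Omega.$$
Here:
$z \in L^2(\Omega)$ is a given target state;
$\alpha > 0$ is the quadratic regularization parameter;
$\beta > 0$ controls sparsity.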
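The admissible set may additionally include box constraints:
$$U_{ad} = \{u \in L^2(\Omega) : u_a \le u \le u_b \ \text{a.e. in } \Omega\}.$$
The state equation defines a bounded linear control-to-state map
$$S: L^2(\Omega) \to L^2(\Omega), \qquad u \mapsto y.$$
The reduced functional becomes
$$\hat J(u) = \frac12\|Su - z\|^2_{L^2(\Omega)} + \frac{\alpha}{2}\|u\|^2_{L^2(\Omega)} + \beta\|u\|_{L^1(\Omega)}.$$
The nonsmooth term is
$$\varphi(u) = \beta\|u\|_{L^1(\Omega)}.$$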
Existence of solutions
The existence proof follows the direct method of the calculus of variations.
The functional $\hat J$ is:
proper;
convex;
coercive on $L^2(\Omega)$;
weakly lower semicontinuous.
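Indeed, since all terms are nonnegative,
$$\hat J(u) \ge \frac{\alpha}{2}\|u\|^2_{L^2(\Omega)} \quad \text{for all } u \in L^2(\Omega).$$
Thus every minimizing sequence $(u_n)$ is bounded in $L^2(\Omega)$.
Using weak compactness of bounded sets in the Hilbert space $L^2(\Omega)$, we obtain a subsequence, still denoted $(u_n)$, with
$$u_n \rightharpoonup \bar u \quad \text{weakly in } L^2(\Omega).$$
Since the state equation is linear and continuous,
$$y_n = Su_n \rightharpoonup \bar y = S\bar u \quad \text{weakly in } L^2(\Omega).$$
Convexity and lower semicontinuity of the norms give weak lower semicontinuity, which implies
$$\hat J(\bar u) \le \liminf_{n \to \infty} \hat J(u_n) = \inf_{u \in L^2(\Omega)} \hat J(u).$$
Hence $\bar u$ is optimal.
Since $\alpha > 0$, the quadratic term $\frac{\alpha}{2}\|u\|^2_{L^2(\Omega)}$ is strictly convex, so $\hat J$ is strictly convex and the optimal control is unique.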
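Subdifferentials
The classical derivative is no longer available, so we replace it with the notion of subdifferential, which we recall here:
Let $X$ be a Banach space and let
$$F: X \to \mathbb{R} \cup \{+\infty\}$$
be convex.
The subdifferential of $F$ at a point $u \in X$ with $F(u) < \infty$ is
$$\partial F(u) = \{\xi \in X^* : F(v) \ge F(u) + \langle \xi, v - u \rangle_{X^*,X} \ \text{for all } v \in X\}.$$
Elements of $\partial F(u)$ are called subgradients.
If $F$ is Gâteaux differentiable at $u$, then
$$\partial F(u) = \{F'(u)\}.$$
Thus the subdifferential generalizes the derivative.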
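Subdifferential of the $L^1$ norm
Define
$$\|u\|_{L^1(\Omega)} = \int_\Omega |u(x)|\,dx, \qquad u \in L^2(\Omega).$$
Then, identifying $L^2(\Omega)^*$ with $L^2(\Omega)$,
$$\lambda \in \partial\|u\|_{L^1(\Omega)}$$
if and only if
$$\lambda(x) \in \partial|u(x)|$$
almost everywhere.
Equivalently,
$$\lambda(x) = \operatorname{sign}(u(x)) \quad \text{if } u(x) \ne 0,$$
and
$$\lambda(x) \in [-1,1] \quad \text{if } u(x) = 0.$$
The subdifferential is therefore set-valued exactly at points where $u(x) = 0$.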
This is the mathematical mechanism that promotes sparsity.
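To make the pointwise characterization concrete, the following minimal Python sketch checks whether a candidate $\lambda$ lies in $\partial\|u\|_{L^1(\Omega)}$. It assumes, purely for illustration, that $u$ and $\lambda$ are given as lists of nodal values on some grid; the function name `in_l1_subdifferential` and the tolerance `tol` are our own choices.

```python
def in_l1_subdifferential(u, lam, tol=1e-12):
    """Return True if lam(x) lies in the subdifferential of |.| at u(x) at every sample point.

    u, lam: lists of nodal values (hypothetical grid discretization).
    """
    for u_i, lam_i in zip(u, lam):
        if u_i > 0.0 and abs(lam_i - 1.0) > tol:
            return False  # where u > 0 the only subgradient is +1
        if u_i < 0.0 and abs(lam_i + 1.0) > tol:
            return False  # where u < 0 the only subgradient is -1
        if u_i == 0.0 and abs(lam_i) > 1.0 + tol:
            return False  # where u = 0 any value in [-1, 1] is allowed
    return True


# At the zero entries, lambda may take any value in [-1, 1].
print(in_l1_subdifferential([2.0, -0.5, 0.0, 0.0], [1.0, -1.0, 0.3, -1.0]))  # True
print(in_l1_subdifferential([2.0, 0.0], [0.5, 0.0]))  # False: u > 0 forces lambda = 1
```

The freedom at the zero entries is exactly what later allows the adjoint state to "absorb" the threshold without forcing the control away from zero.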
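Optimality Conditions for Sparse Control
Let $\bar u \in L^2(\Omega)$ be an optimal control and $\bar y = S\bar u$ the associated state.
The adjoint state $\bar p$ satisfies
$$-\Delta \bar p = \bar y - z \ \text{in } \Omega, \qquad \bar p = 0 \ \text{on } \partial\Omega.$$
The smooth part $f(u) = \frac12\|Su - z\|^2_{L^2(\Omega)} + \frac{\alpha}{2}\|u\|^2_{L^2(\Omega)}$ of the reduced functional has derivative
$$f'(\bar u) = S^*(S\bar u - z) + \alpha\bar u = \bar p + \alpha\bar u.$$
The optimality condition becomes
$$0 \in \bar p + \alpha\bar u + \beta\,\partial\|\bar u\|_{L^1(\Omega)}.$$
Hence there exists
$$\bar\lambda \in \partial\|\bar u\|_{L^1(\Omega)}$$
such that
$$\bar p + \alpha\bar u + \beta\bar\lambda = 0.$$
Pointwise,
$$\bar u(x) = 0$$
whenever
$$|\bar p(x)| \le \beta.$$
This is the key sparsity relation. Notice that $\alpha$ does not appear in this threshold condition: when $\bar u(x) = 0$, the term $\alpha\bar u(x)$ vanishes, and the inclusion reduces to the requirement that $-\bar p(x) \in \beta[-1,1]$, that is, $|\bar p(x)| \le \beta$.
The adjoint variable directly determines the inactive region $\{x : |\bar p(x)| \le \beta\}$, where the control vanishes.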
A Hierarchy of Numerical Methods
The optimality condition
$$0 \in \bar p + \alpha\bar u + \beta\,\partial\|\bar u\|_{L^1(\Omega)}$$
already suggests a hierarchy of numerical methods of increasing complexity.
At the most basic level, one can work directly with subgradients and obtain globally defined first-order iterations. A more structured strategy is to exploit the splitting between the smooth reduced functional and the nonsmooth term, leading to proximal algorithms. Finally, if one fully exploits the piecewise smooth structure of the optimality system, one arrives at semismooth Newton and primal-dual active set methods, which are more involved but locally much faster.
This gives the progression:
subgradient descent: simplest and most robust, but slowest;
proximal methods: still first-order, but much more effective for $L^1$ terms;
semismooth Newton / PDAS: most structured and locally most efficient.
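Subgradient Descent
The simplest way to attack the reduced problem is to use the subdifferential directly.
Let
$$\hat J = f + \varphi,$$
where
$$f(u) = \frac12\|Su - z\|^2_{L^2(\Omega)} + \frac{\alpha}{2}\|u\|^2_{L^2(\Omega)}, \qquad \varphi(u) = \beta\|u\|_{L^1(\Omega)}.$$
Since $f$ is differentiable and $\varphi$ is convex, a subgradient of $\hat J$ at $u$ is
$$g = f'(u) + \beta\lambda, \qquad \lambda \in \partial\|u\|_{L^1(\Omega)}.$$
The subgradient iteration reads
$$u_{k+1} = u_k - \tau_k g_k, \qquad g_k \in \partial\hat J(u_k), \quad \tau_k > 0.$$
For the sparse control problem this becomes
$$u_{k+1} = u_k - \tau_k\big(p_k + \alpha u_k + \beta\lambda_k\big), \qquad \lambda_k \in \partial\|u_k\|_{L^1(\Omega)},$$
where $p_k$ is the adjoint state associated with $u_k$.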
The method is easy to state and globally meaningful in the convex setting, but it has two important drawbacks:
the subgradient is not unique when $u_k$ vanishes on a set of positive measure;
the convergence is typically slow, since one only expects first-order behavior and usually sublinear rates.
For this reason, subgradient descent is conceptually important, but in sparse PDE-constrained optimization it is often outperformed by proximal methods.
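A minimal Python sketch of subgradient descent on a 1D model problem follows. Everything about the setting is an illustrative assumption, not part of the lecture: $\Omega = (0,1)$, a uniform finite-difference grid with $n$ interior nodes, the target $z(x) = 10\sin(\pi x)$, the step rule $\tau_k = \tau_0/(k+1)$ with $\tau_0 = 1/L$, and the helper names `solve_poisson`, `objective`, and `subgradient_descent`. Where $u_k = 0$ it selects $\lambda_k = 0$, one admissible element of $[-1,1]$; since the method is not monotone, it keeps the best iterate.

```python
import math


def solve_poisson(rhs, h):
    """Solve -y'' = rhs on (0,1), y(0) = y(1) = 0, with the 3-point stencil (Thomas algorithm)."""
    n = len(rhs)
    off, diag = -1.0 / h**2, 2.0 / h**2  # off-diagonal and diagonal entries of A
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = off / diag, rhs[0] / diag
    for i in range(1, n):  # forward elimination
        m = diag - off * c[i - 1]
        c[i], d[i] = off / m, (rhs[i] - off * d[i - 1]) / m
    y = [0.0] * n
    y[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        y[i] = d[i] - c[i] * y[i + 1]
    return y


def objective(u, z, alpha, beta, h):
    """Discrete reduced functional f(u) + beta*||u||_{L^1}, using h-weighted sums."""
    y = solve_poisson(u, h)
    return h * sum(0.5 * (y_i - z_i) ** 2 + 0.5 * alpha * u_i**2 + beta * abs(u_i)
                   for y_i, z_i, u_i in zip(y, z, u))


def subgradient_descent(z, alpha, beta, h, tau0, iters=500):
    u = [0.0] * len(z)
    best_u, best_J = u, objective(u, z, alpha, beta, h)
    for k in range(iters):
        y = solve_poisson(u, h)                                      # state:   -y'' = u
        p = solve_poisson([y_i - z_i for y_i, z_i in zip(y, z)], h)  # adjoint: -p'' = y - z
        # g = p + alpha*u + beta*lambda with lambda = sign(u), and lambda = 0 where u = 0
        g = [p_i + alpha * u_i + beta * ((u_i > 0) - (u_i < 0)) for p_i, u_i in zip(p, u)]
        tau = tau0 / (k + 1)                                         # diminishing step size
        u = [u_i - tau * g_i for u_i, g_i in zip(u, g)]
        J = objective(u, z, alpha, beta, h)
        if J < best_J:                                               # keep the best iterate
            best_u, best_J = u, J
    return best_u, best_J


n = 49
h = 1.0 / (n + 1)
z = [10.0 * math.sin(math.pi * (i + 1) * h) for i in range(n)]
alpha, beta = 0.01, 0.2
lam_min = 4.0 / h**2 * math.sin(math.pi * h / 2) ** 2  # smallest eigenvalue of A, so ||S|| = 1/lam_min
tau0 = 1.0 / (1.0 / lam_min**2 + alpha)                # 1/L for the smooth part
u, J = subgradient_descent(z, alpha, beta, h, tau0)
print("best objective value:", J)
```

The iterates decrease the objective, but the update never sets entries exactly to zero by design, which is one practical reason why the proximal variants discussed next are preferred for sparse controls.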
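Proximal Point Method
Instead of following an arbitrary subgradient, one can regularize the nonsmooth problem at each step by adding a quadratic term. This is the idea of the proximal point method.
Consider a convex, lower semicontinuous functional
$$F: L^2(\Omega) \to \mathbb{R} \cup \{+\infty\}.$$
Given $\tau > 0$ and an iterate $u_k$, the proximal point method defines $u_{k+1}$ as the minimizer of
$$u \mapsto F(u) + \frac{1}{2\tau}\|u - u_k\|^2_{L^2(\Omega)}.$$
The quadratic term makes the subproblem strongly convex and stabilizes the iteration.
The optimality condition for the subproblem is
$$0 \in \partial F(u_{k+1}) + \frac{1}{\tau}(u_{k+1} - u_k),$$
or equivalently
$$u_k \in u_{k+1} + \tau\,\partial F(u_{k+1}).$$
This motivates the definition of the proximal map:
$$\operatorname{prox}_{\tau F}(w) = \arg\min_{u}\Big\{F(u) + \frac{1}{2\tau}\|u - w\|^2_{L^2(\Omega)}\Big\} = (I + \tau\,\partial F)^{-1}(w).$$
With this notation, the proximal point iteration reads
$$u_{k+1} = \operatorname{prox}_{\tau F}(u_k).$$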
Compared with subgradient descent, this is more implicit and usually more stable. However, if applied directly to the full reduced functional, each step may still be expensive because it involves a nontrivial nonsmooth minimization subproblem.
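To see the proximal point iteration in action without a PDE solver, consider the scalar toy functional $F(t) = \frac12(t - a)^2 + \beta|t|$ (a hypothetical choice, not the reduced functional of the lecture). Completing the square shows that each subproblem is solved by soft-thresholding, $t_{k+1} = \operatorname{soft}_{\tau\beta/(\tau+1)}\big((\tau a + t_k)/(\tau + 1)\big)$, so this small sketch can carry out every proximal step exactly; the function names are our own.

```python
import math


def soft(t, gamma):
    """Scalar soft-thresholding: sign(t) * max(|t| - gamma, 0)."""
    return math.copysign(max(abs(t) - gamma, 0.0), t)


def prox_point_scalar(a, beta, tau, t0, iters=60):
    """Proximal point iteration for F(t) = 0.5*(t - a)**2 + beta*|t| (toy scalar case).

    Each subproblem min_t F(t) + (t - t_k)**2 / (2*tau) has the closed-form solution
    t = soft((tau*a + t_k) / (tau + 1), tau*beta / (tau + 1)).
    """
    t = t0
    history = [t]
    for _ in range(iters):
        t = soft((tau * a + t) / (tau + 1.0), tau * beta / (tau + 1.0))
        history.append(t)
    return history


print(prox_point_scalar(a=3.0, beta=1.0, tau=1.0, t0=5.0)[-1])   # 2.0 = soft_1(3)
print(prox_point_scalar(a=0.5, beta=1.0, tau=1.0, t0=5.0)[:5])   # [5.0, 2.25, 0.875, 0.1875, 0.0]
```

With $a = 0.5$ and $\beta = 1$ the minimizer $\operatorname{soft}_\beta(a)$ is $0$, and the iterates reach it exactly after finitely many steps. In the PDE setting the subproblem for the full reduced functional has no such closed form, which is the expense mentioned above.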
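Proximal Gradient Method
The sparse control problem has the structure
$$\min_{u \in L^2(\Omega)}\ \hat J(u) = f(u) + \varphi(u),$$
where
$$f(u) = \frac12\|Su - z\|^2_{L^2(\Omega)} + \frac{\alpha}{2}\|u\|^2_{L^2(\Omega)}$$
is smooth, while
$$\varphi(u) = \beta\|u\|_{L^1(\Omega)}$$
is convex but nonsmooth.
This splitting leads naturally to the proximal gradient method, also known as forward-backward splitting.
Given a step size $\tau \in (0, 2/L)$, where $L$ is the Lipschitz constant of $\nabla f$, one first takes a gradient step for the smooth part and then a proximal step for the nonsmooth part:
$$u_{k+1} = \operatorname{prox}_{\tau\varphi}\big(u_k - \tau\nabla f(u_k)\big).$$
For the reduced sparse control problem, the gradient of the smooth part is
$$\nabla f(u_k) = S^*(Su_k - z) + \alpha u_k = p_k + \alpha u_k,$$
where $p_k$ is the adjoint state associated with $u_k$.
Hence the iteration becomes
$$u_{k+1} = \operatorname{prox}_{\tau\beta\|\cdot\|_{L^1(\Omega)}}\big(u_k - \tau(p_k + \alpha u_k)\big).$$
The crucial point is that the proximal map of the $L^1$ norm is explicit and coincides with pointwise soft-thresholding:
$$\operatorname{prox}_{\tau\beta\|\cdot\|_{L^1(\Omega)}}(w)(x) = \operatorname{soft}_{\tau\beta}\big(w(x)\big),$$
where
$$\operatorname{soft}_\gamma(t) = \operatorname{sign}(t)\max(|t| - \gamma, 0).$$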
Therefore each proximal gradient step consists of:
solving the state equation and the adjoint equation to compute $p_k$;
taking a gradient step for the smooth reduced functional;
applying pointwise soft-thresholding.
Unlike plain subgradient descent, proximal gradient methods exploit the exact structure of the nonsmooth term and are therefore much more effective for sparse control problems. They remain first-order methods, so they are robust and relatively easy to implement, but their convergence is still slower than that of Newton-type methods.
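The three steps above can be sketched in Python on the same kind of hypothetical 1D model: $\Omega = (0,1)$, finite differences with $n$ interior nodes, $z(x) = 10\sin(\pi x)$, and a constant step $\tau = 1/L$ with $L = \|S\|^2 + \alpha$ computed from the smallest eigenvalue of the discrete Laplacian. All names (`solve_poisson`, `adjoint_of`, `proximal_gradient`) are our own; this is a minimal sketch, not a reference implementation.

```python
import math


def solve_poisson(rhs, h):
    """Solve -y'' = rhs on (0,1), y(0) = y(1) = 0, with the 3-point stencil (Thomas algorithm)."""
    n = len(rhs)
    off, diag = -1.0 / h**2, 2.0 / h**2
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = off / diag, rhs[0] / diag
    for i in range(1, n):  # forward elimination
        m = diag - off * c[i - 1]
        c[i], d[i] = off / m, (rhs[i] - off * d[i - 1]) / m
    y = [0.0] * n
    y[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        y[i] = d[i] - c[i] * y[i + 1]
    return y


def soft(t, gamma):
    """Scalar soft-thresholding: sign(t) * max(|t| - gamma, 0)."""
    return math.copysign(max(abs(t) - gamma, 0.0), t)


def adjoint_of(u, z, h):
    """Solve the state equation -y'' = u, then the adjoint equation -p'' = y - z."""
    y = solve_poisson(u, h)
    return solve_poisson([y_i - z_i for y_i, z_i in zip(y, z)], h)


def proximal_gradient(z, alpha, beta, h, iters=300):
    lam_min = 4.0 / h**2 * math.sin(math.pi * h / 2) ** 2  # smallest eigenvalue of A, ||S|| = 1/lam_min
    tau = 1.0 / (1.0 / lam_min**2 + alpha)                 # tau = 1/L, L = Lipschitz constant of p + alpha*u
    u = [0.0] * len(z)
    for _ in range(iters):
        p = adjoint_of(u, z, h)                                          # state + adjoint solve
        w = [u_i - tau * (p_i + alpha * u_i) for u_i, p_i in zip(u, p)]  # forward (gradient) step
        u = [soft(w_i, tau * beta) for w_i in w]                         # backward (proximal) step
    return u, adjoint_of(u, z, h)


n = 49
h = 1.0 / (n + 1)
z = [10.0 * math.sin(math.pi * (i + 1) * h) for i in range(n)]
alpha, beta = 0.01, 0.2
u, p = proximal_gradient(z, alpha, beta, h)
print(sum(1 for u_i in u if u_i == 0.0), "of", n, "nodal values are exactly zero")
```

Because the soft-thresholding step is applied at every iteration, entries near the boundary, where the adjoint stays below the threshold, are exactly zero, unlike in plain subgradient descent. The converged pair satisfies the explicit formula $\bar u = -\frac{1}{\alpha}\operatorname{soft}_\beta(\bar p)$ up to rounding.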
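Projection Formula and Soft Thresholding
Suppose there are no box constraints.
Then the optimality condition yields the explicit formula
$$\bar u = -\frac{1}{\alpha}\operatorname{soft}_\beta(\bar p),$$
where
$$\operatorname{soft}_\beta(t) = \operatorname{sign}(t)\max(|t| - \beta, 0) = \max(0, t - \beta) + \min(0, t + \beta)$$
is applied pointwise.
This is called the soft-thresholding operator.
Observe the contrast with classical quadratic control ($\beta = 0$):
$$\bar u = -\frac{1}{\alpha}\bar p.$$
The $L^1$ term creates a dead zone:
$$|\bar p(x)| \le \beta \implies \bar u(x) = 0.$$
The parameter $\alpha$ acts outside this dead zone: it does not determine whether the control vanishes, but it scales the magnitude of $\bar u$ where $|\bar p(x)| > \beta$.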
Hence large portions of the domain may contain exactly vanishing controls.
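The dead zone can be checked directly on a few hypothetical nodal values of the adjoint state. The sketch below compares the sparse formula with the quadratic one ($\beta = 0$) and confirms that the zero set does not depend on $\alpha$; the helper names are our own.

```python
import math


def soft(t, gamma):
    """Scalar soft-thresholding: sign(t) * max(|t| - gamma, 0)."""
    return math.copysign(max(abs(t) - gamma, 0.0), t)


def sparse_control(p, alpha, beta):
    """Pointwise u = -(1/alpha) * soft_beta(p), no box constraints."""
    # "+ 0.0" turns a negative zero into 0.0 so that printed results are easier to read
    return [-soft(p_i, beta) / alpha + 0.0 for p_i in p]


def quadratic_control(p, alpha):
    """Pointwise u = -p / alpha, i.e. the case beta = 0."""
    return [-p_i / alpha for p_i in p]


def zero_set(u):
    """Indices of the nodal values that vanish exactly."""
    return [i for i, u_i in enumerate(u) if u_i == 0.0]


p = [-1.5, -0.5, 0.0, 0.3, 2.0]  # hypothetical nodal values of the adjoint state
print(sparse_control(p, alpha=0.5, beta=1.0))             # [1.0, 0.0, 0.0, 0.0, -2.0]
print(zero_set(sparse_control(p, alpha=0.5, beta=1.0)))   # [1, 2, 3]
print(zero_set(sparse_control(p, alpha=0.05, beta=1.0)))  # [1, 2, 3]: same dead zone for another alpha
print(zero_set(quadratic_control(p, alpha=0.5)))          # [2]: only where p vanishes
```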
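Semismoothness
The mapping
$$p \mapsto -\frac{1}{\alpha}\big(\max(0, p - \beta) + \min(0, p + \beta)\big)$$
is not differentiable in the classical sense.
Nevertheless, it is semismooth (Newton differentiable) as a map from $L^r(\Omega)$ to $L^2(\Omega)$ for $r > 2$; this norm gap is available here because the adjoint state belongs to $H_0^1(\Omega)$, which embeds into $L^r(\Omega)$ for some $r > 2$ when $d \le 3$.
Semismoothness is weaker than Fréchet differentiability but strong enough to obtain superlinear Newton convergence. We recall that, for an equation $G(x) = 0$, the semismooth Newton iteration reads:
$$M_k\,\delta x_k = -G(x_k), \qquad x_{k+1} = x_k + \delta x_k,$$
where $M_k$ is a generalized (Newton) derivative of $G$ at $x_k$, assumed to have uniformly bounded inverses.
Then, for $x_0$ sufficiently close to a solution $\bar x$,
$$\|x_{k+1} - \bar x\| = o(\|x_k - \bar x\|).$$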
This is a much more sophisticated strategy than subgradient or proximal methods. It requires more structure, but in return it offers local superlinear convergence once the iteration enters the correct regime.
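A scalar caricature of the reduced sparse optimality system shows the iteration at work. Here, purely for illustration, the control-to-state map is replaced by multiplication with a number $s$, so that $p(t) = s(st - z)$ and we solve $G(t) = \alpha t + \operatorname{soft}_\beta(p(t)) = 0$. The Newton derivative of $\operatorname{soft}_\beta$ is taken as $1$ where $|p| > \beta$ and $0$ elsewhere; the function name `ssn_scalar` is our own.

```python
import math


def soft(t, gamma):
    """Scalar soft-thresholding: sign(t) * max(|t| - gamma, 0)."""
    return math.copysign(max(abs(t) - gamma, 0.0), t)


def ssn_scalar(s, z, alpha, beta, t0=0.0, max_iter=20):
    """Semismooth Newton for G(t) = alpha*t + soft_beta(p(t)) with p(t) = s*(s*t - z).

    Toy scalar analogue of the reduced sparse optimality system. Returns (t, iterations).
    """
    t = t0
    for k in range(max_iter):
        p = s * (s * t - z)              # scalar "adjoint"
        G = alpha * t + soft(p, beta)
        if G == 0.0:
            return t, k
        # Newton derivative: soft_beta has slope 1 where |p| > beta and slope 0 elsewhere
        dG = alpha + (s * s if abs(p) > beta else 0.0)
        t -= G / dG
    return t, max_iter


print(ssn_scalar(1.0, 3.0, 1.0, 1.0))           # (1.0, 1): one Newton step
print(ssn_scalar(1.0, 0.5, 1.0, 1.0, t0=5.0))   # (0.0, 2): lands exactly in the dead zone
```

Because $G$ is piecewise linear, the iteration typically stops after finitely many steps once it identifies the correct piece, which is the discrete counterpart of entering the "correct regime" mentioned above.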
For sparse control problems, semismooth Newton methods are most naturally derived from an equivalent slack-variable reformulation of the $L^1$ term.
Slack-Variable Reformulation
Introduce an auxiliary variable $v$ and rewrite the sparse control problem as
$$\min_{y,u,v}\ \frac12\|y - z\|^2_{L^2(\Omega)} + \frac{\alpha}{2}\|u\|^2_{L^2(\Omega)} + \beta\int_\Omega v\,dx$$
subject to
$$-\Delta y = u \ \text{in } \Omega, \qquad y = 0 \ \text{on } \partial\Omega,$$
and the pointwise inequalities
$$-v \le u \le v \quad \text{a.e. in } \Omega.$$
These constraints are equivalent to $|u| \le v$; at the optimum one must have $v = |u|$, so the reformulated problem is equivalent to the original one. Indeed, for any fixed pair $(y,u)$ satisfying the state equation, the variable $v$ appears in the objective only through the linear term
$$\beta\int_\Omega v\,dx$$
with $\beta > 0$. Therefore the minimization always tries to make $v$ as small as possible. But the constraints require
$$v \ge |u| \quad \text{a.e. in } \Omega.$$
Hence the smallest admissible choice is precisely
$$v = |u|.$$
If on a set of positive measure one had $v > |u|$, then one could decrease $v$ on that set without violating the constraints, strictly reduce the objective, and thus contradict optimality.
The advantage is that the nonsmooth term has disappeared from the objective and has been replaced by smooth (in fact linear) inequality constraints, which give rise to complementarity conditions.
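KKT System for the Slack Formulation
For the reformulated problem, the Lagrangian is
$$\mathcal L(y,u,v,p,\lambda_+,\lambda_-) = \frac12\|y - z\|^2_{L^2(\Omega)} + \frac{\alpha}{2}\|u\|^2_{L^2(\Omega)} + \beta\int_\Omega v\,dx - \int_\Omega \big(\nabla y \cdot \nabla p - u\,p\big)\,dx + \int_\Omega \lambda_+(u - v)\,dx - \int_\Omega \lambda_-(u + v)\,dx.$$
Here:
$p$ is the adjoint variable for the state equation;
$\lambda_+$ and $\lambda_-$ are the multipliers associated with the inequalities $u - v \le 0$ and $-u - v \le 0$.
The KKT conditions are
$$\begin{aligned}
&-\Delta\bar y = \bar u \ \text{in } \Omega, && \bar y = 0 \ \text{on } \partial\Omega,\\
&-\Delta\bar p = \bar y - z \ \text{in } \Omega, && \bar p = 0 \ \text{on } \partial\Omega,\\
&\bar p + \alpha\bar u + \lambda_+ - \lambda_- = 0, && \beta - \lambda_+ - \lambda_- = 0,\\
&\lambda_+ \ge 0, \quad \bar u - \bar v \le 0, && \lambda_+(\bar u - \bar v) = 0,\\
&\lambda_- \ge 0, \quad -\bar u - \bar v \le 0, && \lambda_-(\bar u + \bar v) = 0.
\end{aligned}$$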
This is now a genuine complementarity system. In particular, the control stationarity condition is an equality, and the multipliers are classical KKT multipliers.
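Interpretation of the Complementarity Conditions
The slack formulation makes the structure of the sparse solution transparent.
If $\bar u(x) > 0$, then necessarily $\bar v(x) = \bar u(x)$, the constraint $\bar u \le \bar v$ is active, while $-\bar v \le \bar u$ is inactive. Hence
$$\lambda_-(x) = 0, \qquad \lambda_+(x) = \beta,$$
and therefore
$$\bar u(x) = -\frac{\bar p(x) + \beta}{\alpha}, \qquad \text{which requires } \bar p(x) < -\beta.$$
If $\bar u(x) < 0$, then $\bar v(x) = -\bar u(x)$, the second constraint is active, and one gets
$$\lambda_+(x) = 0, \qquad \lambda_-(x) = \beta,$$
so that
$$\bar u(x) = -\frac{\bar p(x) - \beta}{\alpha}, \qquad \text{which requires } \bar p(x) > \beta.$$
If $\bar u(x) = 0$, then $\bar v(x) = 0$ and both constraints are active. In this case
$$\lambda_+(x) + \lambda_-(x) = \beta, \qquad \lambda_\pm(x) \ge 0,$$
and
$$\bar p(x) = \lambda_-(x) - \lambda_+(x),$$
which implies
$$|\bar p(x)| \le \beta.$$
Thus the slack-variable KKT system recovers exactly the same threshold condition as the subdifferential formulation:
$$\bar u(x) = 0 \iff |\bar p(x)| \le \beta.$$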
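The case analysis above can be turned into a small pointwise check. The following sketch, with hypothetical helper names and nodal values, reconstructs $(\lambda_+, \lambda_-)$ region by region from a pair $(u, p)$, sets $v = |u|$, and verifies every slack-variable KKT condition except the two PDEs.

```python
def slack_multipliers(u, p, beta):
    """Recover (lambda_+, lambda_-) region by region from the case analysis."""
    lam_plus, lam_minus = [], []
    for u_i, p_i in zip(u, p):
        if u_i > 0.0:
            lp, lm = beta, 0.0
        elif u_i < 0.0:
            lp, lm = 0.0, beta
        else:  # u = 0: solve lp + lm = beta and lp - lm = -p
            lp, lm = 0.5 * (beta - p_i), 0.5 * (beta + p_i)
        lam_plus.append(lp)
        lam_minus.append(lm)
    return lam_plus, lam_minus


def check_slack_kkt(u, p, alpha, beta, tol=1e-12):
    """Pointwise check of the slack-variable KKT conditions with v = |u| (PDEs not checked)."""
    lam_plus, lam_minus = slack_multipliers(u, p, beta)
    for u_i, p_i, lp, lm in zip(u, p, lam_plus, lam_minus):
        v_i = abs(u_i)
        ok = (
            abs(p_i + alpha * u_i + lp - lm) <= tol       # stationarity in u
            and abs(beta - lp - lm) <= tol                # stationarity in v
            and lp >= -tol and lm >= -tol                 # sign conditions
            and u_i - v_i <= tol and -u_i - v_i <= tol    # feasibility
            and abs(lp * (u_i - v_i)) <= tol              # complementarity
            and abs(lm * (u_i + v_i)) <= tol
        )
        if not ok:
            return False
    return True


p = [-1.5, -0.5, 0.0, 0.3, 2.0]    # hypothetical adjoint values
u = [1.0, 0.0, 0.0, 0.0, -2.0]     # u = -(1/alpha) soft_beta(p) with alpha = 0.5, beta = 1
print(check_slack_kkt(u, p, alpha=0.5, beta=1.0))          # True
print(check_slack_kkt([0.0], [-1.5], alpha=0.5, beta=1.0))  # False: |p| > beta but u = 0
```

In the failing case the reconstructed $\lambda_-$ is negative, which is precisely how the sign conditions encode the threshold $|\bar p| \le \beta$.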
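Semismooth Newton and PDAS for the Slack Formulation
The KKT system above is well suited for semismooth Newton and primal-dual active set methods because the nonsmoothness is now entirely encoded in the complementarity relations.
It is convenient to work with the three regions
$$\Omega_+ = \{x : \bar u(x) > 0\} = \{x : \bar p(x) < -\beta\},$$
$$\Omega_- = \{x : \bar u(x) < 0\} = \{x : \bar p(x) > \beta\},$$
$$\Omega_0 = \{x : \bar u(x) = 0\} = \{x : |\bar p(x)| \le \beta\}.$$
On these regions, the KKT conditions reduce to simple relations:
$$\lambda_+ = \beta, \ \ \lambda_- = 0, \ \ \alpha\bar u = -(\bar p + \beta) \ \text{on } \Omega_+, \qquad \lambda_+ = 0, \ \ \lambda_- = \beta, \ \ \alpha\bar u = -(\bar p - \beta) \ \text{on } \Omega_-,$$
and
$$\bar u = 0, \ \ \lambda_+ = \frac{\beta - \bar p}{2}, \ \ \lambda_- = \frac{\beta + \bar p}{2} \ \text{on } \Omega_0.$$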
A PDAS iteration then proceeds by:
identifying the current positive, negative, and zero regions;
freezing the corresponding complementarity relations on each region;
solving the resulting linear state-adjoint-control-multiplier system.
This is exactly the active-set counterpart of a semismooth Newton step for the slack-variable KKT system. The method is more elaborate than proximal methods, but it exploits the complementarity structure directly and therefore achieves local superlinear convergence under suitable assumptions.
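A minimal PDAS sketch on the hypothetical 1D model used earlier ($\Omega = (0,1)$, finite differences, $z(x) = 10\sin(\pi x)$) follows. Our assumptions: the regions are predicted from the current adjoint through the thresholds $p < -\beta$ and $p > \beta$, and eliminating $u$ and $y$ reduces each step to a single linear solve for $p$. This is one common way to organize the iteration, not necessarily the exact variant in De los Reyes, and the dense Gaussian elimination is only adequate for tiny grids.

```python
import math


def laplacian(n, h):
    """Dense 1D Dirichlet Laplacian A = tridiag(-1, 2, -1) / h^2 (small n only)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 2.0 / h**2
        if i > 0:
            A[i][i - 1] = -1.0 / h**2
        if i < n - 1:
            A[i][i + 1] = -1.0 / h**2
    return A


def gauss_solve(M, b):
    """Dense Gaussian elimination with partial pivoting; returns x with M x = b."""
    n = len(b)
    T = [row[:] + [b_i] for row, b_i in zip(M, b)]  # augmented copy
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(T[i][k]))
        T[k], T[piv] = T[piv], T[k]
        for i in range(k + 1, n):
            factor = T[i][k] / T[k][k]
            if factor != 0.0:
                for j in range(k, n + 1):
                    T[i][j] -= factor * T[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (T[i][n] - sum(T[i][j] * x[j] for j in range(i + 1, n))) / T[i][i]
    return x


def pdas_sparse(z, alpha, beta, h, max_iter=50):
    n = len(z)
    A = laplacian(n, h)
    A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    Az = [sum(a_ij * z_j for a_ij, z_j in zip(row, z)) for row in A]
    sigma = [0] * n  # +1 on Omega_+, -1 on Omega_-, 0 on Omega_0; start from u = 0
    for it in range(1, max_iter + 1):
        # With u = -(p + beta*sigma)/alpha on Omega_+ and Omega_-, u = 0 on Omega_0, the equations
        # A y = u and A p = y - z reduce to (A^2 + chi/alpha) p = -A z - beta*sigma/alpha.
        M = [row[:] for row in A2]
        rhs = [-v for v in Az]
        for i in range(n):
            if sigma[i] != 0:
                M[i][i] += 1.0 / alpha
                rhs[i] -= beta * sigma[i] / alpha
        p = gauss_solve(M, rhs)
        u = [-(p_i + beta * s_i) / alpha if s_i != 0 else 0.0 for p_i, s_i in zip(p, sigma)]
        # Predict the new regions from the adjoint thresholds p < -beta and p > beta
        new_sigma = [1 if p_i < -beta else (-1 if p_i > beta else 0) for p_i in p]
        if new_sigma == sigma:  # regions unchanged: (u, p) satisfies the discrete optimality system
            return u, p, it
        sigma = new_sigma
    return u, p, max_iter


n = 49
h = 1.0 / (n + 1)
z = [10.0 * math.sin(math.pi * (i + 1) * h) for i in range(n)]
alpha, beta = 0.01, 0.2
u, p, it = pdas_sparse(z, alpha, beta, h)
print("PDAS stopped after", it, "iterations")
```

Once two consecutive region predictions agree, the frozen relations are consistent with the thresholds, so the returned pair satisfies $u = -\frac{1}{\alpha}\operatorname{soft}_\beta(p)$ exactly up to the accuracy of the linear solve.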
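Direct Dual Multiplier Formulation
There is another convenient way to write the sparse optimality system, closer to the earlier subdifferential formulation and more compact than the slack-variable KKT system.
Starting from
$$0 \in \bar p + \alpha\bar u + \beta\,\partial\|\bar u\|_{L^1(\Omega)},$$
we introduce the dual variable
$$\bar\mu := \beta\bar\lambda \in \beta\,\partial\|\bar u\|_{L^1(\Omega)},$$
where $\bar\lambda$ is the subgradient from the optimality condition.
Then the control stationarity condition becomes simply
$$\bar p + \alpha\bar u + \bar\mu = 0.$$
Pointwise, this means
$$\bar\mu(x) \begin{cases} = \beta, & \bar u(x) > 0,\\ = -\beta, & \bar u(x) < 0,\\ \in [-\beta, \beta], & \bar u(x) = 0. \end{cases}$$
Equivalently,
$$|\bar\mu(x)| \le \beta \quad \text{a.e. in } \Omega$$
and
$$\bar\mu(x)\,\bar u(x) = \beta|\bar u(x)| \quad \text{a.e. in } \Omega.$$
This formulation contains exactly the same information as the earlier sections:
in the subdifferential formulation, one writes $\bar p + \alpha\bar u + \beta\bar\lambda = 0$ with $\bar\lambda \in \partial\|\bar u\|_{L^1(\Omega)}$;
here one absorbs the factor $\beta$ into the dual variable and writes $\bar p + \alpha\bar u + \bar\mu = 0$ with $\bar\mu = \beta\bar\lambda$;
in the soft-thresholding formula, the condition $|\bar p| \le \beta$ for $\bar u = 0$ is recovered by combining $\bar\mu = -\bar p$ (stationarity where $\bar u = 0$) with $|\bar\mu| \le \beta$;
in the slack-variable formulation, the same dual variable is represented by the difference of the two nonnegative multipliers:
$$\bar\mu = \lambda_+ - \lambda_-,$$
while the stationarity condition with respect to $v$,
$$\lambda_+ + \lambda_- = \beta,$$
implies
$$|\bar\mu| \le \lambda_+ + \lambda_- = \beta.$$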
Hence the direct dual formulation is a compact bridge between the two other descriptions:
the variational picture based on subdifferentials;
the complementarity picture based on slack variables and KKT multipliers.
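For semismooth Newton methods, one can combine
$$\bar p + \alpha\bar u + \bar\mu = 0$$
with a semismooth characterization of the graph of $\beta\,\partial\|\cdot\|_{L^1(\Omega)}$, for instance through the projection formula $\bar\mu = \operatorname{proj}_{[-\beta,\beta]}(\bar\mu + c\,\bar u)$, which holds pointwise for any fixed $c > 0$. For PDAS, the slack-variable formulation is often more convenient because the complementarity structure is completely explicit.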
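The projection characterization is easy to test pointwise. In the sketch below (hypothetical helper names and nodal values), `projection_residual` vanishes exactly when $\mu \in \beta\,\partial|u|$. Eliminating $\mu = -(p + \alpha u)$ and choosing $c = \alpha$ turns the fixed-point equation into $p + \alpha u = \operatorname{proj}_{[-\beta,\beta]}(p)$, that is, $u = -\frac{1}{\alpha}\operatorname{soft}_\beta(p)$, which recovers the soft-thresholding formula.

```python
def proj_box(t, beta):
    """Projection of t onto the interval [-beta, beta]."""
    return min(max(t, -beta), beta)


def projection_residual(u, mu, beta, c=1.0):
    """Pointwise residual mu - proj_[-beta, beta](mu + c*u); it vanishes iff mu lies in beta * d|u|."""
    return [m_i - proj_box(m_i + c * u_i, beta) for u_i, m_i in zip(u, mu)]


def control_from_adjoint(p, alpha, beta):
    """With mu = -(p + alpha*u) and c = alpha the fixed point gives alpha*u = proj(p) - p."""
    return [(proj_box(p_i, beta) - p_i) / alpha for p_i in p]


p = [-1.5, -0.5, 0.0, 0.3, 2.0]  # hypothetical adjoint values
alpha, beta = 0.5, 1.0
u = control_from_adjoint(p, alpha, beta)
mu = [-(p_i + alpha * u_i) for p_i, u_i in zip(p, u)]
print(u)                                          # [1.0, 0.0, 0.0, 0.0, -2.0]
print(projection_residual(u, mu, beta, c=alpha))  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

Since $\operatorname{proj}_{[-\beta,\beta]}$ is piecewise linear, this residual is semismooth, which is what makes it a convenient building block for a Newton-type solver in the direct dual variables.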
References
J.C. De los Reyes, Numerical PDE-Constrained Optimization, Springer, 2015.
F. Tröltzsch, Optimal Control of Partial Differential Equations, AMS, 2010.
M. Hintermüller, K. Ito, K. Kunisch, The primal-dual active set strategy as a semismooth Newton method, SIAM Journal on Optimization, 2002.
M. Hinze, R. Pinnau, M. Ulbrich, S. Ulbrich, Optimization with PDE Constraints, Springer, 2009.
A. Manzoni, A. Quarteroni, S. Salsa, Optimal Control of Partial Differential Equations, Springer, 2021.