Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

One-Shot KKT and PDAS for Inverse Poisson Coefficient Identification

University of Pisa

Overview

In the previous lecture we discussed parameter estimation in elliptic problems at the continuous level. We now turn that theory into a concrete all-at-once finite element algorithm for the identification of the diffusion coefficient in Poisson’s equation.

The goal of this lecture is twofold:

The computational strategy is not reduced optimization. Instead, we assemble and solve the Karush-Kuhn-Tucker system for the state, adjoint, and control together. Because the control enters the operator, the KKT system is nonlinear. The code therefore combines:

Throughout the lecture, let ΩRd\Omega\subset\mathbb R^d be a bounded polygonal or polyhedral Lipschitz domain.


The Continuous Inverse Problem

We want to reconstruct a diffusion coefficient aa from measurements of the corresponding state yy. The forward model is

{(ay)=fin Ω,y=0on Ω.\begin{cases} -\nabla\cdot(a\nabla y)=f & \text{in }\Omega,\\ y=0 & \text{on }\partial\Omega. \end{cases}

The coefficient is the control variable. We assume pointwise bounds

aa(x)a(x)ab(x)a.e. in Ω,a_a(x)\le a(x)\le a_b(x) \qquad\text{a.e. in }\Omega,

with

0<aa(x)a(x)ab(x)0< a_a(x)\le \underline a(x) \le a_b(x)

almost everywhere. The positivity assumption is essential because it keeps the state equation uniformly elliptic.

Given a desired state ydL2(Ω)y_d\in L^2(\Omega) and a reference coefficient arefL2(Ω)a_{\mathrm{ref}}\in L^2(\Omega), we consider

min(y,a)J(y,a):=12yydL2(Ω)2+α2aarefL2(Ω)2\min_{(y,a)} J(y,a) := \frac12\|y-y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\|a-a_{\mathrm{ref}}\|_{L^2(\Omega)}^2

subject to the PDE and the box constraints, where α>0\alpha>0.

The admissible set is

Uad:={aL(Ω):aaaab a.e. in Ω}.U_{\mathrm{ad}} := \{a\in L^\infty(\Omega): a_a\le a\le a_b \text{ a.e. in }\Omega\}.

The natural weak state equation is: find yH01(Ω)y\in H_0^1(\Omega) such that

Ωayvdx=ΩfvdxvH01(Ω).\int_\Omega a\nabla y\cdot \nabla v\,dx = \int_\Omega f v\,dx \qquad \forall v\in H_0^1(\Omega).

If aUada\in U_{\mathrm{ad}}, then

ba(y,v):=Ωayvdxb_a(y,v):=\int_\Omega a\nabla y\cdot\nabla v\,dx

is bounded on H01(Ω)×H01(Ω)H_0^1(\Omega)\times H_0^1(\Omega) and coercive:

ba(v,v)avL2(Ω)2.b_a(v,v)\ge \underline a \|\nabla v\|_{L^2(\Omega)}^2.

Hence Lax-Milgram gives a unique state y(a)H01(Ω)y(a)\in H_0^1(\Omega).


Sensitivity with Respect to the Coefficient

Unlike distributed control, the control does not appear additively in the right-hand side. It multiplies the highest-order term. This is the key structural difference.

Let aUada\in U_{\mathrm{ad}} and perturb it by hh. Formally differentiating the weak state equation gives the sensitivity z=S(a)hH01(Ω)z=S'(a)h\in H_0^1(\Omega):

Ωazvdx=Ωhy(a)vdxvH01(Ω).\int_\Omega a\nabla z\cdot \nabla v\,dx = -\int_\Omega h \nabla y(a)\cdot \nabla v\,dx \qquad \forall v\in H_0^1(\Omega).

Thus the derivative is obtained by solving another coercive elliptic problem with right-hand side depending on the current state gradient.

This immediately shows why coefficient identification is more nonlinear than source control:


Lagrangian and First-Order Conditions

Introduce the Lagrangian

L(y,p,a):=12yydL2(Ω)2+α2aarefL2(Ω)2+ΩaypdxΩfpdx,\mathcal L(y,p,a) := \frac12\|y-y_d\|_{L^2(\Omega)}^2 + \frac{\alpha}{2}\|a-a_{\mathrm{ref}}\|_{L^2(\Omega)}^2 + \int_\Omega a\nabla y\cdot \nabla p\,dx - \int_\Omega f p\,dx,

where pH01(Ω)p\in H_0^1(\Omega) is the adjoint variable.

The derivative with respect to pp gives back the state equation:

Ωayvdx=ΩfvdxvH01(Ω).\int_\Omega a\nabla y\cdot \nabla v\,dx = \int_\Omega f v\,dx \qquad \forall v\in H_0^1(\Omega).

The derivative with respect to yy yields the adjoint equation:

Ωapvdx=Ω(yyd)vdxvH01(Ω).\int_\Omega a\nabla p\cdot \nabla v\,dx = -\int_\Omega (y-y_d)v\,dx \qquad \forall v\in H_0^1(\Omega).

The sign depends on the sign convention chosen in the Lagrangian. The code uses the equivalent convention

Ωapvdx=Ω(ydy)vdx,\int_\Omega a\nabla p\cdot \nabla v\,dx = \int_\Omega (y_d-y)v\,dx,

which is the same equation written with the right-hand side moved to the other side.

For the control variable, the Gâteaux derivative is

DaL(y,p,a)[h]=αΩ(aaref)hdx+Ωhypdx.D_a\mathcal L(y,p,a)[h] = \alpha\int_\Omega (a-a_{\mathrm{ref}})h\,dx + \int_\Omega h \nabla y\cdot \nabla p\,dx.

Therefore the gradient density is

g(a)=α(aaref)+yp.g(a) = \alpha(a-a_{\mathrm{ref}}) + \nabla y\cdot \nabla p.

The constrained first-order condition is the variational inequality

Ωg(a)(a~a)dx0a~Uad.\int_\Omega g(a)(\widetilde a-a)\,dx \ge 0 \qquad \forall \widetilde a\in U_{\mathrm{ad}}.

Equivalently, if the control space is interpreted with the L2L^2 inner product, one has the projection formula

a=P[aa,ab](aγg(a))γ>0.a = P_{[a_a,a_b]} \bigl(a-\gamma g(a)\bigr) \qquad \forall \gamma>0.

This is the starting point for active-set methods.


Primal-Dual Active Set Interpretation

Define the multiplier-like quantity

λ:=g(a).\lambda := g(a).

Then the box-constrained optimality conditions can be written pointwise as

{λ=0where aa<a<ab,λ0where a=aa,λ0where a=ab.\begin{cases} \lambda = 0 & \text{where } a_a < a < a_b,\\ \lambda \ge 0 & \text{where } a=a_a,\\ \lambda \le 0 & \text{where } a=a_b. \end{cases}

In a primal-dual active set method, one predicts the active sets through a shifted complementarity test. Given a parameter c>0c>0, define

A(a,λ):={xΩ:λ(x)+c(a(x)aa(x))<0},\mathcal A_-(a,\lambda) := \{x\in\Omega:\lambda(x)+c(a(x)-a_a(x))<0\},
A+(a,λ):={xΩ:λ(x)+c(a(x)ab(x))>0}.\mathcal A_+(a,\lambda) := \{x\in\Omega:\lambda(x)+c(a(x)-a_b(x))>0\}.

Then the inactive set is

I:=Ω(AA+).\mathcal I := \Omega\setminus (\mathcal A_-\cup \mathcal A_+).

The active-set update is:

For linear-quadratic box-constrained control this is equivalent to a semismooth Newton method. In our coefficient-identification problem, the state equation is already nonlinear in the control, so PDAS is combined with an additional Newton linearization of the KKT system.


One-Shot KKT Formulation

Let the unknown be the triple

X:=(y,p,a).X := (y,p,a).

The optimality system can be written as a nonlinear operator equation

F(X)=0,F(X)=0,

where the three blocks are:

  1. state residual,

  2. adjoint residual,

  3. control stationarity residual.

The one-shot philosophy means that we do not eliminate the state with a reduced map. We solve directly for all variables at once.

This has two computational consequences:

The price is a larger linear algebra problem. The benefit is that the code reflects the structure of the continuous optimality system almost literally.


Newton Linearization of the KKT System

Suppose Xk=(yk,pk,ak)X^k=(y^k,p^k,a^k) is the current iterate. Newton’s method solves

DF(Xk)δXk=F(Xk),DF(X^k)\,\delta X^k = -F(X^k),

and updates

Xk+1=Xk+δXk.X^{k+1}=X^k+\delta X^k.

Because the diffusion coefficient multiplies both y\nabla y and p\nabla p, the Jacobian contains the couplings

δaykv,akδyv,\delta a\, \nabla y^k\cdot \nabla v, \qquad a^k \nabla \delta y\cdot \nabla v,

for the state block, and similarly

δapkv,akδpv\delta a\, \nabla p^k\cdot \nabla v, \qquad a^k \nabla \delta p\cdot \nabla v

for the adjoint block.

The derivative of the stationarity equation produces

αδa+δypk+ykδp.\alpha\,\delta a + \nabla \delta y\cdot \nabla p^k + \nabla y^k\cdot \nabla \delta p.

This explains why the matrix assembled in the code is not the same as the linear-quadratic KKT matrix from distributed Poisson control.


Discretization Used in the Code

The code uses a single FESystem with three scalar components:

The choice

YhH01(Ω),PhH01(Ω),UhL2(Ω)Y_h \subset H_0^1(\Omega), \qquad P_h \subset H_0^1(\Omega), \qquad U_h \subset L^2(\Omega)

is natural:

The implementation creates exactly this structure in create_fe_system() is implicit in the constructor and parameter parsing through FE_Q(state_degree), FE_Q(state_degree), and FE_DGQ(control_degree).

Homogeneous Dirichlet conditions for state and adjoint are imposed with VectorTools::interpolate_boundary_values(...) on the corresponding components in setup_system().


Algorithm Structure

The overall solver is easier to understand from the following conceptual diagram than from the exact file layout.

This is not a literal copy of the source code. It is the mathematical logic implemented by the solver.


Mapping the Theory to the deal.II Class

The main class is InversePoissonKKT. Its public interface is intentionally small: the driver constructs the class, reads the parameter file, and calls run().

The most important internal methods are:


The Residual Assembled by the Code

In assemble_system() the code evaluates the current state, adjoint, and control at quadrature points and assembles three residual contributions on each cell.

For a test basis function in the state component, it inserts

KahyhvhdxKfvhdx.\int_K a_h \nabla y_h\cdot \nabla v_h\,dx - \int_K f v_h\,dx.

For a test basis function in the adjoint component, it inserts

Kahphvhdx+K(yhyd)vhdx.\int_K a_h \nabla p_h\cdot \nabla v_h\,dx + \int_K (y_h-y_d)v_h\,dx.

For a test basis function in the control component, it inserts

K(α(aharef,h)yhph)whdx.\int_K \Bigl( \alpha(a_h-a_{\mathrm{ref},h}) - \nabla y_h\cdot \nabla p_h \Bigr) w_h\,dx.

The sign in the last term reflects the sign convention chosen in the discrete Lagrangian. This is why the stationarity residual appears in the source as

regularization * (coefficient - reference) - state_gradient * adjoint_gradient

rather than with a plus sign.


Why AffineConstraints<double> Are a Good Fit for PDAS

The key idea of the implementation is that an active set is not handled by a special projection after solving the linear system. Instead, it is inserted into the linear algebra structure itself.

If a control DoF is predicted active at the lower bound, the code adds the constraint

δai=aa,iaik,\delta a_i = a_{a,i}-a_i^k,

and similarly at the upper bound

δai=ab,iaik.\delta a_i = a_{b,i}-a_i^k.

This means the Newton correction already respects the active-set decision. The constrained linear system is built by:

  1. copying the current Jacobian;

  2. merging the PDE constraints with the active-set constraints;

  3. condensing matrix and right-hand side;

  4. solving for the correction;

  5. redistributing the constrained values.

This is exactly the role of make_control_constraints() and solve_linearized_system().

For teaching purposes, this is an excellent example of how a nonsmooth optimization idea can be translated into standard finite element tools.


Damping and Stopping

Because the KKT system is nonlinear in the coefficient, a full Newton step may be too aggressive. The code therefore performs a simple residual-based damping strategy inside run():

This is not yet a polished globalization strategy, but it is mathematically honest and easy to inspect.

The solver stops in one of two situations:

The second condition is important in a teaching code: it prevents endless iterations when the current globalization strategy stalls.


Build and Run

From the deal.II directory:

cd codes/dealii
cmake -S . -B build
cmake --build build --target inverse_poisson_kkt_2d.g
./build/inverse_poisson_kkt_2d.g parameters/inverse_poisson_kkt.prm

The default parameter file is written for a two-dimensional manufactured solution. In particular:

The output consists of:

The VTU fields include:


What to Learn from This Example

This laboratory shows a useful transition point in PDE-constrained optimization.

For distributed linear-quadratic control, the KKT system is already linear. For coefficient identification, three new ingredients appear simultaneously:

The deal.II implementation shows that these difficulties can still be handled within the same finite element framework, provided one is careful about:

In this sense, the code is not just an application of the previous lectures. It is a prototype for much more advanced inverse problems and PDE-constrained Newton methods.