PRIMAL-DUAL INTERIOR-POINT METHODS FOR ... - Semantic Scholar

Report 2 Downloads 29 Views
PRIMAL-DUAL INTERIOR-POINT METHODS FOR PDE-CONSTRAINED OPTIMIZATION MICHAEL ULBRICH∗ AND STEFAN ULBRICH† Abstract. This paper provides a detailed analysis of a primal-dual interior-point method for PDE-constrained optimization. Considered are optimal control problems with control constraints in Lp . It is shown that the developed primal-dual interior-point method converges globally and locally superlinearly. Not only the easier L∞ -setting is analyzed, but also a more involved Lq -analysis, q < ∞, is presented. In L∞ , the set of feasible controls contains interior points and the Fr´echet differentiability of the perturbed optimality system can be shown. In the Lq -setting, which is highly relevant for PDE-constrained optimization, these nice properties are no longer available. Nevertheless, a convergence analysis is developed using refined techniques. In particular, two-norm techniques and a smoothing step are required. The Lq -analysis with smoothing step yields global linear and local superlinear convergence, whereas the L∞ -analysis without smoothing step yields only global linear convergence.

1. Introduction. This paper is concerned with the analysis of primal-dual interior-point methods for optimization problems with PDE- and pointwise inequality constraints. We assume that the problem has optimal control structure and that the inequality constraints are posed on the controls only. In contrast to state constraints, this situation allows for a rigorous analysis. Related investigations of other Newton-based algorithms were conducted in, e.g., [4, 8, 12, 14, 16, 22, 20, 21, 23, 25, 26] for comparable problem settings. For primal-dual interior-point methods, although intensively investigated in finite dimensional mathematical programming, see, e.g., [7] and the references therein, only little rigorous theory is available in the function space framework of optimal control problems. Earlier investigations of modern optimization methods in function space have resulted in valuable deep understanding of algorithms for PDE constrained optimization. In particular, in all analyses, a certain problem structure is required for a successful local convergence analysis. A common theme is that an Lp -setting for the inequalities is required and that a smoothing property or smoothing step must be available. Furthermore, the usual backtracking in interior-point methods to keep iterates strictly positive has to be augmented by suitable projection techniques, at least if the primal-dual Newton step for the control is not in L∞ . Finally, integrated barriers are the appropriate choice, which result in a weighting of the pointwise barriers after discretization. All of these crucial ingredients are not visible in the finite dimensional analysis. A further important benefit of an abstract analysis in function space is that it is the prerequisite for proving mesh independence results, see, e.g., [1, 2, 9]. The purpose of this paper is to give a rigorous analysis of the global and fast local convergence of a primal-dual interior-point method for PDE-constrained optimization. The analysis covers not only the (easier) L∞ -setting but also the quite involved but in practice highly relevant Lq -setting, q < ∞. The crucial point is that for the analysis in the L∞ -setting one needs that the corresponding adjoint state (i.e., the Lagrange multiplier for the state equation) is also in L∞ , which is usually not the case for complex systems like, e.g., the Navier-Stokes equations [6, 10, 19]. One of the difficulties in the Lq -setting, q < ∞, is that the set of feasible controls does not contain interior points with respect to the Lq -topology. As a consequence, the barrier function is not Fr´echet-differentiable in Lq . Moreover, the Newton steps for the control and for the multipliers of the control constraints live in general only in Lq and therefore we have to work in an Lq -setting for the control and the multipliers even if the control is confined by L∞ -bounds. This requires elaborate techniques, including a suitable scaling of the primal-dual Newton system, a two-norm approach, and a smoothing step. In the Lq -setting the central path can touch the boundary of the set of feasible controls on a set of measure zero. We will see that nevertheless interior-point methods make sense in connection ∗ TU

M¨unchen, Chair of Mathematical Optimization, Zentrum Mathematik, M1: Optimization, Boltzmannstr. 3, D-85747 Garching b. M¨unchen, Germany ([email protected]). † TU Darmstadt, Fachbereich Mathematik, AG10: Nonlinear Optimization and Optimal Control, Schlossgartenstr. 7, D-64289 Darmstadt, Germany ([email protected]). 1

with a projection onto an appropriate neighborhood of the central path. We empasize that also in the context of state constraints the central path may touch the boundary of the feasible set. The results and proof techniques of the present paper show that this does not necessarily lead to a failure of interior-point methods and may be valuable also in the context of state constraints. The use of two-norm technique and the introduction of a smoothing step allows not only to go beyond the L∞ -setting. It also results in a globally linearly and locally superlinearly convergent primal-dual interior-point algorithm, whereas the L∞ -setting without smoothing step results only in a globally linearly convergent method. We now give a short overview of the available literature on interior-point methods in function space. In the context of optimal control with ODEs and control constraints, [24] investigates the properties of the central path in an L∞ -setting and proposes a corresponding continuation method. For a similar setup, a linearly convergent primal-dual interior-point short step method was investigated in [23] in an L∞ -framework. Furthermore, a primal interior-point method for linear-quadratic optimal control problems with an elliptic PDE and control constraints was investigated in [25]. In this approach, the control is eliminated from the perturbed optimality system and linear convergence of a short step path following method is proved. Superlinear convergence of this method was proved in [16] under H 2 -regularity assumptions. The convergence of a primal-dual interior-point method for linear-quadratic elliptic control problems with mixed control-state constraints was investigated in [14]. The problem can be reformulated as a control problem with control constraints having special structure. The analysis of the central path and linear convergence of a short step path following method is carried out in an L∞ -framework. In addition to the mentioned references, numerical evaluations of interior-point methods in the context of PDE constrained optimization have been conducted in, e.g., [3, 13]. The paper is organized as follows: In section 2 the considered problem class is described and it is illustrated that elliptic optimal control problems fit into this class. Then, first order optimality (KKT) conditions are derived. As a first step towards interior-point methods, a barrier problem is formulated, its unique solvability is proved, and optimality conditions are stated that result in perturbed KKT conditions that form the basis for the primal dual Newton step. Section 3 presents and illustrates a functional analytic setting that is used in the rest of the paper. In section 4, properties of the central path are derived, in particular the boundedness of the dual variables in Lq and the boundedness of the central path. Section 5 is devoted to the analysis of the primal-dual Newton system on a suitable neighborhood of the central path. A key result is the uniformly bounded invertibility of the suitably scaled linear operator in the primal dual Newton system on bounded subsets of the neighborhood. As a simple consequence, the norm of the inverse of the unscaled operator is √ uniformly bounded by O(1/ µ) on bounded subsets of the neighborhood. The H¨older continuity of the central path is proved in section 6. The conceptual primal-dual interior-point method is formulated in section 7. It includes a projection onto the neighborhood of the central path that replaces the usual backtracking. In section 8, the method is analyzed in the L∞ -setting. Quadratic local convergence towards the central path and global linear convergence are proved. Finally, in section 9, the more involved analysis of the method in Lq , q < ∞, is carried out. As for other approaches, an inevitable norm gap occurs that has to be closed by a smoothing step. Such a smoothing step is derived and incorporated in the algorithm. For the resulting method, global linear and local superlinear convergence is proved. Section 10 presents numerical results. Notations. We denote the Lp -norm by k · kp , 1 ≤ p ≤ ∞. For Banach spaces X, Y we denote by L(X, Y ) the space of bounded linear operators from X to Y equipped with the operator norm k · kX,Y . X ∗ is the dual space of a Banach space X and h·, ·iX ∗ ,X is the corresponding dual pairing. By leb (·) we denote the Lebesgue measure on Rn . Throughout the paper equalities and inequalities between Lp -functions are meant pointwise almost everywhere. If X ⊂ Y is a continuous embedding, we write IX,Y , IX,Y x = x for the embedding operator. Sometimes, if no confusion is to be expected, we save space by writing I instead of IX,Y . If F : X → Y is an operator between 2

Banach spaces, we denote by F ′ its first Fr´echet derivative and by F ′′ its second Fr´echet derivative, if they exist. If X is a product space, then x = (x1 , . . . , xm ) ∈ X has m ≥ 2 components and we denote by Fxi the partial Fr´echet-derivative with respect to xi , by Fxi xj the derivative of Fxi with respect to xj , etc., if they exist. Occasionally, given Banach spaces X1 ⊂ X0 , Y1 ⊂ Y0 , where “⊂” means continuous embedding, we will use the notation L(X0 , Y1 ) ⊂ L(X1 , Y0 ) to indicate that every operator A ∈ L(X0 , Y1 ) ˜ := Ax, x ∈ X1 . induces an operator A˜ ∈ L(X1 , Y0 ) via Ax

2. Control constrained optimal control problem. Let Ω ⊂ Rn be a bounded open domain with sufficiently smooth boundary. We consider the optimal control problem with control constraints (2.1)

min

y∈Y,u∈U

J (y, u)

s.t. c(y, u) = 0,

a ≤ u ≤ b,

where U = Lp (Ω), p ∈ [2, ∞), a, b ∈ L∞ , b − a ≥ ν > 0, and Y is a Banach space. We set B := {u ∈ U : a ≤ u ≤ b} ,

x = (y, u),

X := Y × U,

and assume that there exists an open set D ⊂ Lp (Ω), D ⊃ B such that (A1) J : Y × U → R, c : Y × U → Λ are twice locally Lipschitz-continuously differentiable, Λ is a Banach space, and there exist uniform Lipschitz constants on bounded subsets of Y × B. (A2) cy (y, u) ∈ L(Y, Λ) has a bounded inverse for all (y, u) ∈ Y × D and kcy (y, u)−1 kΛ,Y is uniformly bounded on bounded subsets of Y × B. (A3) For all u ∈ D there exists a unique solution y = y(u) ∈ Y of c(y(u), u) = 0 and there exists My > 0 with ky(u)kY ≤ My

∀ u ∈ B.

(A4) The reduced objective functional u ∈ (B, k · k∞ ) 7→ J (y(u), u) =: Jˆ(u) is lower semicontinuous w.r.t. sequential L∞ -weak∗ convergence. R EMARK 2.1. By the implicit function theorem (A1)–(A3) ensure that u ∈ D 7→ y(u) ∈ Y and u ∈ D 7→ J (y(u), u) are twice locally Lipschitz-continuously differentiable and in addition Lipschitz continuous on B. ′ For convenience we identify U ∗ = Lp (Ω)∗ with Lp (Ω), 1p + p1′ = 1, via the dual pairing hv, uiU ∗ ,U = hv, ui :=

Z

vu dξ.



We recall that in function spaces of distributions it is common practice to extend h·, ·i to the distributional dual pairing. In our examples, we typically work with the Sobolev spaces H01 (Ω) and Y˜ = H01 (Ω) ∩ H 2 (Ω). The dual spaces with respect to the dual pairing h·, ·i then result in the following continuous and dense embeddings: Y˜ ⊂ H01 (Ω) ⊂ L2 (Ω) ⊂ H −1 (Ω) = H01 (Ω)∗ ⊂ Y˜ ∗ . In this sense, we can (and will) interpret H01 -functions as (nice) L2 -functions and L2 -functions as (nice) H −1 -functions (the latter are generalized functions). Furthermore, we will omit operators of 3

the form IY1 ,Y2 : y ∈ Y1 7→ y ∈ Y2 if Y1 ⊂ Y2 is continuously embedded. In this case, we also have IY∗1 ,Y2 : y ′ ∈ Y2∗ 7→ y ′ ∈ Y1∗ , thus IY∗1 ,Y2 = IY2∗ ,Y1∗ , i.e., the adjoint acts like the identity. P ROPOSITION 2.2. Under assumptions (A1)–(A4) problem (2.1) has a solution. Proof. Take a minimizing sequence (y(uk ), uk ). Since (uk ) ⊂ B, it is bounded in L∞ and has a weak∗ -convergent subsequence, which we denote again by (uk ) for simplicity, with limit u ¯ ∈ B. But by (A4) we have lim sup J (y(uk ), uk ) ≥ J (y(¯ u), u ¯) k→∞

and thus (y(¯ u), u ¯) solves (2.1), since (y(uk ), uk ) is a minimizing sequence. 2.1. An example. As a standard example we consider the following elliptic control problem min y,u

s.t.

(2.2)

1 α 2 2 ky − yd kL2 + kukL2 2 2 −∆y = u in Ω,

y = 0 in ∂Ω,

a≤u≤b

in Ω.

Here, Ω ⊂ Rn and a, b ∈ L∞ (Ω) are as specified above, u ∈ U := L2 (Ω) is the control, y ∈ H 1 (Ω) is the state, yd ∈ L2 (Ω) is the desired state, and α > 0 is a regularization parameter. There are at least two reasonable ways to choose the functional analytic setting. We choose p = 2, D = U and have U ∗ = U = L2 (Ω). 2.1.1. First setting. The first setting is to consider the usual weak solution of the state equation. Here, the state space is Y = H01 (Ω) and the PDE is considered in the weak form Z Z ∇y(ξ)T ∇v(ξ) dξ = u(ξ)v(ξ) dξ ∀ v ∈ H01 (Ω). Ω



This results in the abstract state equation c(y, u) := Ay − u = 0 in Λ with Λ = Y ∗ = H −1 (Ω). Note that, as mentioned earlier, we have omitted the embedding operator IL2 ,H −1 ∈ L(L2 (Ω), Λ), IL2 ,H −1 u = u. The operator A ∈ L(Y, Λ) is defined by Z ∇y(ξ)T ∇v(ξ) dξ ∀ y, v ∈ H01 (Ω). hAy, viH −1 ,H01 = hAy, vi = Ω

It is well known that A ∈ L(Y, Λ) is invertible. Clearly, J (y, u) =

1 α 2 2 ky − yd k2 + kuk2 2 2

is twice continuously differentiable with hJy (y, u), viY ∗ ,Y = hy − yd , vi = (y − yd , v)2 ,

hJu (y, u), wiU,U ∗ = αhu, wi = α(u, w)2 ,

hJyy (y, u)v2 , v1 iY ∗ ,Y = (v1 , v2 )2 ,

hJuu (y, u)w2 , w1 iU ∗ ,U = α(w1 , w2 )2 .

In short notation: Jy (y, u) = y − yd ,

Ju (y, u) = αu,

Jyy (y, u) = IY ∗ ,Y , 4

Juu (y, u) = αIU .

Furthermore, c(y, u) = Ay − u is twice continuously differentiable with cu (y, u) = −IL2 ,H −1 ,

cy (y, u) = A,

c′′ (y, u) = 0.

The uniform Lipschitz constants on bounded sets for c, J ′ and their derivatives are clear due to bounded linearity. The uniform Lipschitz continuity of J on bounded sets follows from the boundedness of J ′ on bounded sets. Hence, (A1) is shown. (A2) follows since cy (y, u) = A is constant and A ∈ L(Y, Λ) is invertible. (A3) follows from y(u) = A−1 Bu. Finally (A4) is satisfied since u ∈ L2 7→ J (A−1 u, u) ∈ R is convex and continuous, hence sequentially weakly lower semicontinuous. As a consequence, Jˆ is also lower semicontinuous w.r.t. sequential L∞ weak∗ convergence. 2.1.2. Second setting. For the analysis to follow, it will be useful to exploit additional regularity of the state and the adjoint state. This motivates to consider the following setting for (2.2): From the assumptions on Ω and standard regularity results for elliptic equations it follows that the solution of the state equation enjoys more regularity, namely, y ∈ Y˜ := H01 ∩ H 2 (we use a “ ˜ ” to distinguish from the first setting). Hence, we can write the state equation also as follows: ˜ − u = 0 in Λ ˜ := L2 c˜(y, u) := Ay ˜ Again A˜ is invertible in L(Y˜ , Λ). ˜ with A˜ ∈ L(Y˜ , Λ). Since the Laplace operator is symmetric, we obtain also H 2 -regularity for the adjoint state. This will be important, since in assumption (A5)q , we will in particular require that there exists q ∈ (p, ∞] such that the adjoint state λ satisfies c∗u (y, u)λ ∈ Lq (Ω),

i.e., for (2.2),

− λ ∈ Lq (Ω).

The larger q can be chosen, the better. In the first setting, we would be limited to q-values for which the embedding H01 ⊂ Lq holds. If we want to increase q, (e.g., q = ∞ for n = 3), it is advantageous to exploit any regularity that is available for λ, and thus an H01 ∩ H 2 -setting is beneficial (if it is applicable). Just as before, we can calculate derivatives and verify the assumptions (A1)–(A4). 2.2. Optimality conditions. If we define the closed convex set C := {0} × B ⊂ Λ × U and the constraint function   c(y, u) h(y, u) := u then the constraint in (2.1) can be written as h(y, u) ∈ C. Denote for (λ, z) ∈ Λ∗ × U ∗ the Lagrangian function for the abstract problem min

y∈Y,u∈U

J (y, u) s.t. h(y, u) ∈ C

by L(y, u, λ, z) = J (y, u) + hλ, c(y, u)iΛ∗ ,Λ + (z, u)2 . 5

Let x ¯ = (¯ y, u ¯) ∈ Y × B be a local solution of (2.1). Since cy (¯ x) is surjective by (A2), the operator   cy (¯ x) cu (¯ x) ∈ L(X, Λ × U ) h′ (¯ x) = 0 I is surjective and therefore Robinson’s constraint qualification [15] 0 ∈ int(h(¯ x) + hx (¯ x) · X − C) ¯ z¯) ∈ Λ∗ × U ∗ with is satisfied. By standard optimality theory, see [5, Prop. 3.2], there exist (λ, (2.3)

¯ Ju + c∗ λ)(¯ ¯ x) + (0, z¯) = 0, Lx (¯ x, λ, z) = (Jy + c∗y λ, u

h(¯ x) ∈ C,

¯ z¯) ∈ NC (h(¯ (λ, x))

with the normal cone NC (h(¯ x)) := {(λ, z) ∈ Λ∗ × U ∗ : hλ, w − c(¯ x)iΛ∗ ,Λ + (z, v − u ¯) ≤ 0 ∀ (w, v) ∈ C} = {(λ, z) ∈ Λ∗ × U ∗ : (z, v − u ¯) ≤ 0 ∀ v ∈ B}  ∗ ∗ = (λ, z) ∈ Λ × U : z|{u¯=a} ≤ 0, z|{¯u=b} ≥ 0, z|{a 0. Moreover, we have un → u ¯ in L∞ -weak∗ and thus also in L2 -weak. It remains to show that Z Z u ∈ (B, k · k2 ) 7→ −µ ln(u − a) dx − µ ln(b − u) dx =: fa (u) + fb (u) Ω



is lower semicontinuous w.r.t. weak convergence. R R We consider only fa : u 7→ −µ Ω ln(u − a) dx, since fb : u 7→ −µ Ω ln(b − u) dx can be treated analogously. Since B ⊂ L2 (Ω) is closed and convex and fa : B 7→ R ∪ {∞} is convex, it is sufficient to show that the mapping is lower semicontinuous w.r.t. strong convergence, see [11, Lem. 22.6] or [27, Prop. 38.7]. To this end, let B ∋ vk → v¯ in L2 . We have to show lim inf fa (vk ) ≥ fa (¯ v).

(2.7)

k→∞

If lim inf k→∞ fa (vk ) = ∞ this trivially holds. Now consider the case limk→∞ fa (vk ) < ∞. By selecting a subsequence (vk )K we achieve lim

K∋k→∞

fa (vk ) = lim inf fa (vk ) < ∞,

(vk )K → v¯ almost everywhere on Ω.

k→∞

In particular, there exists a constant C > 0 with fa (vk ) ≤ C for all k ∈ K. We observe that −µ ln(vk − a) = −µ ln(max(vk − a, 1)) − µ ln(min(vk − a, 1)) =: gk + hk , ¯ −µ ln(¯ v − a) = −µ ln(max(¯ v − a, 1)) − µ ln(min(¯ v − a, 1)) =: g¯ + h. We have the estimate |gk − g¯| ≤ µ| max(vk − a, 1) − max(¯ v − a, 1)| ≤ µ|vk − v¯| and thus lim

K∋k→∞

fa (vk ) =

lim

K∋k→∞

Z

(gk + hk ) dx = Ω

Z

g¯ dx + lim

k→∞



Z

hk dx.



Moreover, hk ≥ 0, |gk | ≤ µ|vk − a − 1| and, for k ∈ K, Z Z 0≤ hk dx ≤ C − gk dx ≤ C + µkvk − a − 1k1 ≤ C + C ′ . Ω



¯ a.e.. Now the Lemma of Fatou (where we can work From (vk )K → v¯ a.e. we obtain (hk )K → h ¯ ∈ L1 (Ω) and with lim instead of lim inf here) yields h Z Z Z ¯ 0≤ h dx = lim hk dx ≤ lim hk dx ≤ C + C ′ . Ω K∋k→∞



K∋k→∞



This concludes the proof that lim inf fa (vk ) = k→∞

lim

K∋k→∞

fa (vk ) = lim inf k→∞

Z



(gk + hk ) dx ≥

Z

¯ dx = fa (¯ (¯ g + h) v). Ω

As mentioned before, the same holds for vk → v¯ weakly, see Jost, Lemma 4.2.2. Applying this to the minimizing sequence, we obtain together with (2.6) lim inf Jµ (y(uk ), uk ) = lim inf J (y(uk ), uk ) + fa (uk ) + fb (uk ) k→∞

k→∞

≥ J (y(¯ u), u ¯) + fa (¯ u) + fb (¯ u) = Jµ (y(¯ u), u ¯). 7

Hence, (y(¯ u), u ¯) solves (2.5). R EMARK 2.4. It is important to point out that the proof would be easier if we could assume that the solution lies in the L∞ -interior of B. Since, however, the case that u ¯ touches the bounds on a null set cannot be excluded, we need alternative proof techniques as given above. If the problem has special structure, e.g., as in [14], it is sometimes possible to prove that u ¯ lies in the L∞ -interior of B and then a less technical argumentation can be used, see, e.g., section 3 of [14]. It is obvious that {¯ u = a} and {¯ u = b} have measure zero, since Jµ (¯ y, u ¯) < ∞. Therefore, it is easy to derive the following necessary optimality conditions for (2.5). L EMMA 2.5. Let assumptions (A1)–(A4) hold and let xµ = (yµ , uµ ) be a local solution of (2.5), µ > 0. Then there is λµ ∈ Λ∗ such that  Jy (xµ ) + cy (xµ )∗ λµ = 0,    µ µ Ju (xµ ) + cu (xµ )∗ λµ + − = 0, (2.8) uµ uµ − a    c(xµ ) = 0, a < uµ < b.

R EMARK 2.6. By introducing the artificial variables za,µ = uµµ−a and zb,µ = write (2.8) as the perturbed KKT-conditions  ℓy (xµ , λµ , za,µ , zb,µ ) = Jy (xµ ) + cy (xµ )∗ λµ = 0,     ∗    ℓu (xµ , λµ , za,µ , zb,µ ) = Ju (xµ ) + cu (xµ ) λµ + zb,µ − za,µ = 0, c(xµ ) = 0,

(2.9)

      

µ b−uµ

we can

a ≤ uµ ≤ b,

(uµ − a) za,µ = µ,

(b − uµ ) zb,µ = µ,

za,µ ≥ 0,

zb,µ ≥ 0.

We call the solution set (2.9) parameterized by µ > 0 central path. We will see that under appropriate assumptions the central path is actually a H¨older-continuous curve that converges for µ → 0 to a solution of (2.1). Proof. By (A3) there exists for any u ∈ D a unique solution y = y(u) ∈ Y of c(y, u) = 0. By (A2) and the implicit function theorem the mapping u ∈ (D, k · kU ) 7→ y(u) ∈ Y is continuously differentiable with cy yu = −cu . ˆ Thus the reduced objective functional u ∈ (D, k · kU ) 7→ J(u) is continuously differentiable with derivative Jˆu (u) = −hJy (x), cy (x)−1 cu (x) · iY ∗ ,Y + Ju (x) = −cu (x)∗ cy (x)−∗ Jy (x) + Ju (x)

= −cu (x)∗ cy (x)−∗ Jy (x) + Ju (x), where x = (y(u), u). Let (y(uµ ), uµ ) be the solution of (2.5). With the unique solution λµ ∈ Λ∗ of Jy (xµ ) + cy (xµ )∗ λµ = 0 we have (2.10)

Jˆu (u − µ) = cu (xµ )∗ λµ + Ju (xµ ).

We show that w := Jˆu (uµ ) −

µ µ + = 0 a.e. uµ − a b − uµ 8

We know that a < uµ < b almost everywhere. The sets Mk := {a + 1/k ≤ uµ ≤ b − 1/k} S∞ are monotone increasing with k=1 Mk = Ω \ N with a set N of measure zero. Let v ∈ L∞ (Ω) be arbitrary, then vk := v1Mk → v in U = Lp (Ω), since p < ∞. For all t ∈ (−ρ, ρ), ρ > 0 small enough, we have a + 1/(2k) ≤ uµ + tvk ≤ b − 1/(2k) and therefore the function hk : t ∈ (−ρ, ρ) 7→Jµ (y(uµ + tvk ), uµ + tvk ) Z Z ˆ ln((uµ + tvk ) − a) ln(b − (uµ + tvk )) dx − µ = J(uµ + tvk ) − µ Ω



is continuously differentiable with  h′k (t) = Jˆu (uµ + tvk ) +

µ b − (uµ + tvk )



µ , vk (uµ + tvk ) − a



. 2

Since (y(uµ ), uµ ) is optimal for (2.5) and uµ + tvk ∈ B for t ∈ (−ρ, ρ), the function hk has a minimum at t = 0 and thus   µ µ ′ ˆ 0 = hk (0) = Ju (uµ ) + , vk . − b − uµ uµ − a 2 Taking the limit k → ∞ we obtain  Jˆu (uµ ) +

µ µ ,v − b − uµ uµ − a



= 0. 2

This holds for all v ∈ L∞ (Ω) and by density for all v ∈ U . We deduce with (2.10) that cu (xµ )∗ λµ + Ju (xµ ) +

µ µ = 0. − b − uµ uµ − a

R EMARK 2.7. For the special case of linear elliptic control problems, the previous results were shown in a different way in [14]. The control problem in [14] satisfies our assumption (A5)q below with q = ∞. In this particular case the solution of the barrier problem (2.5) lies in the interior of B, see Corollary 4.4 below, and z¯a , z¯b are bounded in L∞ . The analysis in [14] makes essential use of this fact. In this paper we cover the more general setting that z¯a , z¯b are only bounded in Lq for some q > p. This is of essential interest to cover state equations where the state or adjoint equation does not allow for a priori estimates in L∞ . In the latter case solutions of (2.5) can touch the boundary of B on a zero set and are thus no interior points in the classical sense. Nevertheless, we will see that also in this setting interior-point methods with a projection are convergent, since – roughly speaking – the measure of the set where the solution of (2.5) has distance ≤ ε to the boundary of B tends to zero as ε ց 0. The analysis is considerably more involved than for the case (A5)∞ . 3. A function space setting. Unfortunately, it is not possible to work with soft analysis only. Rather, we need a carefully adjusted function space setting, where a typical requirement will be that a continuous (or differentiable) mapping h : X1 → Y1 also defines a mapping h : X2 → Y2 ¿¿from a stronger space X2 ⊂ X1 to a stronger space Y2 ⊂ Y1 . For instance, as a trivial example, the identity mapping X1 ∋ x 7→ x ∈ X1 induces the identity mapping X2 ∋ x 7→ x ∈ X2 for any stronger space X2 ⊂ X1 . We make the following assumptions, which are satisfied for many elliptic and parabolic optimal control problems, see [17, 18] and the references therein. 9

(A5)q There are q ∈ (p, ∞] and Banach spaces Σ ⊂ Λ∗ , V ⊂ Y ∗ such that the following holds: 1. The mapping (y, u, λ) ∈ Y × Lq (Ω) × Σ 7→ ℓu (y, u, λ, 0, 0) ∈ Lq (Ω) is differentiable and its derivative is Lipschitz continuous on bounded subsets of Y × B × Σ. 2. The mapping (y, u) ∈ Y × U 7→ Jy (y, u) ∈ V is differentiable and its derivative is Lipschitz continuous on bounded subsets of Y ×B. 3. The operator (y, u) ∈ Y × U 7→ c∗y (y, u) ∈ L(Σ, V ) is differentiable and its derivative is Lipschitz continuous on bounded subsets of Y ×B. 4. The following mappings are continuous and uniformly bounded on bounded sets: (y, u) ∈ Y × (B, k · kq ) 7→ Ju (y, u) ∈ Lq (Ω),

(y, u) ∈ Y × U 7→ c∗u (y, u) ∈ L(Σ, Lq (Ω)),

(y, u) ∈ Y × U 7→ c−∗ y (y, u) ∈ L(V, Σ),

(y, u, λ) ∈ Y × U × Σ 7→ ℓuy (y, u, λ, 0, 0) ∈ L(Y, Lq (Ω)),

(y, u, λ) ∈ Y × U × Σ 7→ ℓyu (y, u) ∈ L(L2 (Ω), V ),

(y, u, λ) ∈ Y × U × Σ 7→ ℓuu (y, u) ∈ L(Lt (Ω), Lt (Ω)),

t ∈ [2, q].

5. The reduced gradient has the structure ℓu (y, u, λ, 0, 0) = β(u) + gˆs (y, u, λ),

β ∈ C 1 (R), β ′ ≥ α0 > 0,

where (y, u, λ) ∈ Y × U × Σ 7→ gˆs (y, u, λ) ∈ Lq (Ω) is Lipschitz continuous on bounded sets. 6. The reduced Hessian (3.1)

−1 ∗ −∗ −1 ˆ H(y, u, λ) := ℓuu + c∗u c−∗ y ℓyy cy cu − cu cy ℓyu − ℓuy cy cu

has the structure ˆ ˆ s (y, u, λ), H(y, u, λ) = β ′ (u)I + H ˆ s (y, u, λ) ∈ L(L2 , Lq ) is uniformly bounded on where (y, u, λ) ∈ Y × U × Σ 7→ H bounded subsets. R EMARK 3.1. We point out that usually, (A5)q2 implies (A5)q1 for q2 > q1 . In fact, since Lq2 ⊂ Lq1 , only part 1 and the first condition in part 4 can be critical, but for the examples to follow, they are not. 3.1. Verification for the elliptic control problem. 10

3.1.1. First setting. In the first setting we can verify (A5q ) for any q > 2 such that Y = H01 (Ω) is continuously embedded in Lq (Ω). We do not need the additional spaces V and Σ, since we just can choose V = Y ∗ = H −1 (Ω) and Σ = Λ∗ = H01 (Ω) = Y . We have 1 α 2 2 ky − yd k2 + kuk2 2 2 + hλ, Ay − ui − (za , u − a)2 + (zb , u − b)2 ,

ℓ(y, u, λ, za , zb ) =

Jy (y, u) = y − yd ,

Ju (y, u) = αu,

ℓy (y, u, λ, za , zb ) = Jy (y, u) + A∗ λ = y − yd + Aλ,

ℓu (y, u, λ, za , zb ) = αu − λ − za + zb ,

ℓyu (y, u, λ, za , zb ) = 0, ℓuy (y, u, λ, za , zb ) = 0,

ℓyy (y, u, λ, za , zb ) = IY,Y ∗ , ℓuu (y, u, λ, za , zb ) = αI. Therefore, (A5q ) is a direct consequence of the following observations: 1. ℓu (y, u, λ, 0, 0) = αu − λ ∈ Lq (Ω) + Σ = Lq (Ω). 2. Jy (y, u) = y − yd ∈ L2 ⊂ H −1 = V . 3. c∗y (y, u) = A ∈ L(H01 , H −1 ) = L(Σ, V ). 4. Ju (y, u) = αu, c∗u (y, u) = IΛ∗ ,U ∗ = IH01 ,L2 ∈ L(Σ, Lq (Ω)), −1 ∈ L(H −1 , H01 ) = L(V, Σ), c−∗ y (y, u) = A ℓuy (y, u, λ, 0, 0) = 0, ℓyu (y, u) = 0, ℓuu (y, u) = αI. 5. ℓu (y, u, λ, 0, 0) = αu − λ = β(u) + gˆs (y, u, λ) with β(t) = αt, β ′ (t) = α, gˆs (y, u, λ) = −λ. Hence, β ∈ C 1 (R), β ′ (t) = α ≥ α0 for any α0 ∈ (0, α], and gˆs (y, u, λ) = −λ ∈ H01 ⊂ Lq (Ω).

ˆ s (y, u, λ), ˆ 6. H(y, u, λ) = αI + IH01 ,L2 A−1 IH01 ,H −1 A−1 IL2 ,H −1 = αI + A−2 = β ′ (u)I + H ˆ s (y, u, λ) = A−2 ∈ L(H −1 , H 1 ) ⊂ L(L2 , Lq ) is constant and thus uniformly bounded where H 0 on bounded subsets. 3.1.2. Second setting. Here, we choose V = L2 , Σ = Y˜ , and verify (A5q ) for any q ∈ (2, ∞] satisfying Y˜ = H01 ∩ H 2 ⊂ Lq (continuous embedding). In particular, for n ≤ 3, we have Y˜ ⊂ L∞ . It is quite obvious that the operator A ∈ L(H01 , H −1 ) from the first setting is the unique extension of A˜ ∈ L(Y˜ , L2 ) ⊂ L(Y˜ , H −1 ) to H01 ⊃ Y˜ . Therefore, A˜∗ ∈ L(L2 , Y˜ ∗ ) is the unique extension of A∗ = A ∈ L(H01 , H −1 ) ⊂ L(H01 , Y˜ ∗ ) to L2 ⊃ H01 . Since, as said, A = A∗ is the unique ex˜ this shows that A˜∗ ∈ L(L2 , Y˜ ∗ ) is the unique extension of A˜ ∈ L(Y˜ , L2 ) ⊂ L(Y˜ , Y˜ ∗ ) tension of A, to L2 . Hence, (3.2)

y ∈ Y˜ 7→ A˜∗ y ∈ L2 is continuous linear and boundedly invertible in L(Y˜ , L2 ).

The derivatives can be computed similar to the first setting. The validity of (A5q ) follows from 1. ℓu (y, u, λ, 0, 0) = αu − λ ∈ Lq (Ω) + Σ = Lq (Ω) + Y˜ = Lq (Ω). 2. Jy (y, u) = y − yd ∈ L2 = V .

11

3. c∗y (y, u) = A˜∗ ∈ L(Y˜ , L2 ) = L(Σ, V ) (see (3.2)). 4. Ju (y, u) = αu, c∗u (y, u) = IΛ∗ ,U ∗ = IL2 ∈ L(Σ, Lq (Ω)), ˜−∗ ∈ L(L2 , Y˜ ) = L(V, Σ) (see (3.2)), c−∗ y (y, u) = A ℓuy (y, u, λ, 0, 0) = 0, ℓyu (y, u) = 0, ℓuu (y, u) = αI. 5. ℓu (y, u, λ, 0, 0) = αu − λ = β(u) + gˆs (y, u, λ) with β(u) = αt, β ′ (t) = α, gˆs (y, u, λ) = −λ. Hence, β ∈ C 1 (R), β ′ (t) = α ≥ α0 for any α0 ∈ (0, α], and gˆs (y, u, λ) = −λ ∈ Σ = Y˜ ⊂ Lq (Ω) for λ ∈ Σ. ˆ s (y, u, λ), ˆ 6. By (3.2), H(y, u, λ) = αI + A˜−∗ IY˜ ,Y˜ ∗ A˜−1 = αI + A˜−2 = β ′ (u)I + H ˆ s (y, u, λ) = A˜−2 ∈ L(L2 , Y˜ ) ⊂ L(L2 , Lq ) is constant and thus uniformly bounded on where H bounded subsets. 4. Properties of the central path. We study next the regularity of the dual variables za,µ , zb,µ on the central path. L EMMA 4.1. Let (A1)–(A5)q hold and let (yµ , uµ , λµ , za,µ , zb,µ ) be a solution of (2.9). Then there holds λµ ∈ Σ, 0 ≤ za,µ , zb,µ ≤ max(3µ/ν, 2|Ju (xµ ) + cu (xµ )∗ λµ |),

(4.1) and thus with (A5)q (4.2)

kza,µ kq , kzb,µ kq ≤ k max(3µ/ν, 2|Ju (xµ ) + cu (xµ )∗ λµ |)kq < ∞.

Proof. From the first equation in (2.9) we see that λµ = −cy (xµ )−∗ Jy (xµ ) ∈ cy (xµ )−∗ V ⊂ Σ. We have za,µ =

µ , uµ − a

zb,µ =

µ . b − uµ

This yields on the set M = {uµ − a ≤ (b − uµ )/2} the estimate za,µ |M ≥ 2zb,µ |M and thus by (2.9) 0≤

1 za,µ |M ≤ za,µ |M − zb,µ |M = (Ju (xµ ) + cu (xµ )∗ λµ )|M . 2

On the complement M c := Ω \ M we have 1 1 3 (uµ − a)|M c ≥ (b − a)|M c ≥ ν 2 2 2 and thus (uµ − a)|M c ≥ ν/3. Both cases together prove (4.1) for za,µ . The estimate for zb,µ is obtained in the same way. Since λµ ∈ Σ, the right hand side of (4.1) is in Lq by (A5)q and thus (4.2) is obvious. 12

We introduce for s ∈ [1, ∞] the function spaces Ws := Y × Ls × Σ × Ls × Ls ,

Ws′ := V × Ls × Λ × Ls × Ls , equipped with the norms k(y, u, λ, za , zb )kWs k(y, u, v, za , zb )kWs′

q

2 2 2

= kykY + kλkΣ + u + za + zb

,

q

s

2 2 2 = kykV + kvkΛ +

u + za + zb . s

R EMARK 4.2. The choice of the Euclidean norm for (u(ξ), za (ξ), zb (ξ)) ∈ R3 will be convenient, since we will later use a pointwise orthogonal projection of these components with respect to the Euclidean inner product on R3 . As a direct consequence of the previous lemma all solutions of the perturbed optimality conditions (2.9) are contained in a bounded set of Wq . C OROLLARY 4.3. Let (A1)–(A5)q hold. Then for any µ0 > 0 there exists a constant Cµ0 > 0 such that for all 0 < µ ≤ µ0 any solution wµ = (yµ , uµ , λµ , za,µ , zb,µ ) of (2.9) is in Wq and kwµ kWq ≤ Cµ0 . Proof. Since uµ ∈ B we have kuµ kq ≤ kakq +kbkq =: Cu . By Remark 2.1 u ∈ U → y(u) ∈ Y is Lipschitz continuous on B and therefore kyµ kY = ky(uµ )kY ≤ Cy with a constant Cy . Now (2.9) yields λµ = −cy (xµ )−∗ Jy (xµ ) and by (A5)q the right hand side is uniformly bounded in Σ, since xµ lies in a bounded subset of Y × B. Finally, this implies with (A5)q that the right hand side of (4.2) is uniformly bounded. The proof is complete. If (A5)∞ is satisfied then we can deduce immediately that solutions of the barrier problem are true interior points. In fact, we have the simple C OROLLARY 4.4. Let (A1)–(A5)∞ hold. Then for any µ0 > 0 there exists a constant Cµ0 > 0 such that for all 0 < µ ≤ µ0 any solution wµ = (yµ , uµ , λµ , za,µ , zb,µ ) of (2.9) satisfies uµ − a ≥

µ , Cµ0

b − uµ ≥

µ . Cµ0

Proof. Corollary 4.3 yields a constant Cµ0 > 0 with kza,µ k∞ , kzb,µ k∞ ≤ kwµ kW∞ ≤ Cµ0 . Now the last two equations in (2.9) yield uµ − a =

µ za,µ



µ , Cµ0

b − uµ =

µ zb,µ



µ . Cµ0

R EMARK 4.5. For linear elliptic control problems, which satisfy (A1)–(A5)∞ , this result was shown in [14], where it is used to prove the existence of solutions for (2.5). We used a different proof to cover also the more general case that (A5)q holds only for some q < ∞. 13

We show next, that the dual variables λµ , za,µ , zb,µ depend continuously on the primal variables yµ , uµ . L EMMA 4.6. Let (A1)–(A5)q hold and let (yµ , uµ , λµ , za,µ , zb,µ ) be a solution of (2.9). Then λµ = −cy (xµ )−∗ Jy (xµ ) and for any measurable sets M, N ⊂ Ω one has µ , za,µ |M = uµ − a M µ zb,µ |N = , (4.3) b − uµ N

za,µ |N = (zb,µ + Ju (xµ ) + cu (xµ )∗ λµ )|N

zb,µ |M = (za,µ − Ju (xµ ) − cu (xµ )∗ λµ )|M .

Moreover, if uη → uµ in Lq (Ω) as η → µ then (uη , yη , λη , za,η , zb,η ) → (uµ , yµ , λµ , za,µ , zb,µ )

in Wq .

Proof. The equations for λµ , za,µ , and zb,µ follow directly from (2.9). Now assume that uη → uµ in Lq (Ω) as η → µ. Then yη = y(uη ) → yµ = y(uµ ) in Y follows from (A1)–(A3), see Remark 2.1. Moreover, λη = −cy (xη )−∗ Jy (xη ) → λµ = −cy (xµ )−∗ Jy (xµ )

in Σ

is a direct consequence of (A5)q . Finally, we know by Lemma 4.1 that za,η , zb,η ∈ Lq (Ω) for all η > 0. Now we use the formulas (4.3) with M = Mη := {uη ≥ a + (b − a)/4, uµ ≥ a + (b − a)/4} ,

N = Nη := {uη ≤ a + 3(b − a)/4, uµ ≤ a + 3(b − a)/4} \ Mη .

Then we obtain on Mη by (4.3)   η µ|uη − uµ | µ |η − µ| |(za,η − za,µ )Mη | = + − ≤ uη − a uµ − a Mη uη − a Mη (uη − a)(uµ − a) Mη ≤

4|η − µ| 16µ + 2 |(uη − uµ )Mη | ν ν

and thus k(za,η − za,µ )|Mη kq ≤ C(|η − µ| + k(uη − uµ )|Mη kq ) → 0

as η → µ.

Now (4.3) yields with (A5)q k(zb,η − zb,µ )|Mη kq ≤ k(za,η − za,µ )|Mη kq + kJu (xη ) − Ju (xµ )kq + kcu (xη )∗ λη − cu (xµ )∗ λµ )kq → 0

as η → µ.

In the same way we obtain k(za,η − za,µ )|Nη kq + k(zb,η − zb,µ )|Nη kq → 0 14

as η → µ.

Next, we show that for µ/3 < η < 3µ, there holds on Jη := Ω \ (Mη ∪ Nη ): sgn(za,η − za,µ )|Jη = − sgn(zb,η − zb,µ )|Jη .

(4.4) In fact, consider

Jη1 = {uη > a + 3(b − a)/4, uµ < a + (b − a)/4} . Then (za,η − za,µ )Jη1 =



η µ − uη − a uµ − a

(zb,η − zb,µ )Jη1 =



µ η − b − uη b − uµ



Jη1









 4η 1 − 4µ 0 (b − a)Jη1

for η > µ/3. Now, consider Jη2 = {uη < a + (b − a)/4, uµ > a + 3(b − a)/4} . Then (za,η − za,µ )Jη1 =



η µ − uη − a uµ − a



η µ − b − uη b − uµ



Jη1

  1 4µ >0 ≥ 4η − 3 (b − a)Jη2

for η > µ/3. (zb,η − zb,µ )Jη1 =



Jη1





 1 4η − 4µ p hold. 5.1. Primal-dual Newton system. For convenience, we will use the abbreviations ua = u − a,

ub = b − u.

The formal application of Newton’s method to the perturbed KKT-system (2.9) yields with the multiplication operators Za := za · I,

Zb := zb · I,

Ua := (u − a) · I, 15

Ub := (b − u) · I

the primal-dual Newton system  ℓyy ℓyu c∗y  ℓuy ℓuu c∗ u   cy cu 0   0 Za 0 0 −Zb 0

0 −I 0 Ua 0

We write this briefly as

0 I 0 0 Ub

     

sy su sλ sa sb





     = −    

ℓy (y, u, λ, za , zb ) ℓu (y, u, λ, za , zb ) c(y, u) za (u − a) − µ zb (b − u) − µ



  .  

DFµ (w) s = −Fµ (w). To ensure a certain quality of the primal-dual Newton step, we will keep the iteration in the following wide neighborhood of the central path ( N−∞,q (µ) :=

(y, u, λ, za , zb ) ∈ Y × B × Σ × Lq × Lq :

(u − a)za ≥ γµ, (b − u)zb ≥ γµ,

2 max(µ−∞ , µ/γ) 2 max(µ−∞ , µ/γ) , zb |{u(b+a)/2} ≤

with constants γ ∈ (0, 1), µ−∞ > 0. By Corollary 4.3 any solution of (2.9) is contained in N−∞,q (µ). We note that N−∞,q (µ) is similar to the usual wide neighborhood used by many finite dimensional primal-dual interior-point methods. The difference are the upper bounds on za |{u≥(b+a)/2} and zb |{u≤(b+a)/2} , which excludes too large multipliers at points with strongly inactive bound constraints. Our neighborhood is still a wide neighborhood, since it approaches for µ ց 0 the usual wide neighborhood and since for γ ց 0 it exhausts the set {(y, u, λ, za , zb ) ∈ Y × B × Σ × Lq × Lq : za , zb ≥ 0}. The neighborhood poses no constraints on y and λ, but it poses pointwise a.e. constraints of the form (u(ξ), za (ξ), zb (ξ)) ∈ Nξ ⊂ R3 on the triple (u, za , zb ). A sketch of the set Nξ is depicted in Figure 5.1. It is the union of two congruent convex sets. The boundary is composed of planes and of hyperbolas that are extruded along the za - and zb -axis, respectively, see also Remark 7.2. 5.2. Regularity of the primal-dual Newton system. We will now show that under a coercitivˆ and under assumptions (A1)-(A5)q for all w ∈ N−∞,q (µ) ity condition for the reduced Hessian H the operator DFµ (w) has a right inverse DFµ (w)−1 ∈ L(Wt′ , Wt ), t ∈ [p, q], with kDFµ(w)−1 kW ′ ,Wt ≤ t

C √ . min(1, µ)

Moreover, if we premultiply DFµ (w) by the scaling operator  I  I   I (5.1) S(w) =   (Ua + Za )−1

 (Ub + Zb )−1

    

we will even show that S(w)DFµ (w) ∈ L(Wt′ , Wt ), t ∈ [p, q], is boundedly invertible with k(S(w)DFµ(w))−1 kW ′ ,Wt ≤ C. t

16

z

b

0

z

b

a

u 0 a

F IG . 5.1. Sketch of the pointwise (u, za , zb )-neighborhood Nξ .

Setting ˆa = (Ua + Za )−1 Ua , U Zˆa = (Ua + Za )−1 Za ,

(5.2)

ˆb = (Ub + Zb )−1 Ub , U Zˆb = (Ub + Zb )−1 Zb ,

ˆa + Zˆa = I, U ˆb + Zˆb = I and we have U 

  S(w)DFµ (w) =   

ℓyy ℓuy cy 0 0

c∗y c∗u 0 0 0

ℓyu ℓuu cu Zˆa −Zˆb

0 −I 0 ˆa U 0

0 I 0 0 ˆb U

     

where we omit the arguments. For convenience, we use also the abbreviations u ˆa =

ua , ua + za

zˆa =

za , ua + za

u ˆb =

ub , ub + zb

zˆb =

zb . ub + zb

We show first that S(w)DFµ (w) has a bounded inverse. This fact will play an essential role in this paper. L EMMA 5.1. Let (A1)–(A5)q hold for some q ∈]2, ∞] and let w = (y, u, λ, za , zb ) ∈ N−∞,q (µ) for some γ ∈ (0, 1), µ−∞ > 0. ˆ If the reduced Hessian H(y, u, λ) in (3.1) satisfies (5.3)

2 ˆ (v, H(y, u, λ)v) ≥ αkvk2

∀ v ∈ L2 (Ω)

with some α > 0 then S(w)DFµ (w) has a bounded inverse in L(Wt′ , Wt ) for all t ∈ [p, q] and k(S(w)DFµ (w))−1 kW ′ ,Wt ≤ C t

with a constant C > 0. The constant C can be chosen uniformly on bounded subsets of {(µ, w) ∈ (0, ∞) × N−∞,q (µ)} on which (5.3) holds uniformly. 17

Moreover, DFµ (w) has a bounded right inverse (in the case q = ∞ even a bounded inverse) DFµ (w)−1 ∈ L(Wt′ , Wt ), t ∈ [p, q], with kDFµ(w)−1 kW ′ ,Wt ≤ t

C √ with the above constant C. where C ′ = min(1,2 γ) Proof. We consider the equation  ℓyy ℓyu c∗y 0 0  ℓuy ℓuu c∗u −I I   cy cu 0 0 0 (5.4)  ˆa 0  0 Zˆa 0 U ˆb 0 −Zˆb 0 0 U

C′ √ , min(1, µ)

     

sy su sλ sa sb





    =    

ry ru rλ ra rb

     

with r = (ry , ru , rλ , ra , rb ) ∈ Wt′ , t ∈ [p, q], where we omit the arguments. We note that by (A5)q the operator S(w)DFµ (w) on the left hand side of (5.4) is in L(Wt , Wt′ ), p ≤ t ≤ q. Elimination with the last two block rows and subsequently with the first and third block row ˆ yields as above with H(y, u, λ) in (3.1) the reduced system ˆ ˆ −1 Zˆa + U ˆ −1 Zˆb )su = (H(y, u, λ) + U a b −1 −1 ˆ ˆ = ru + Ua ra − Ub rb + B1 rλ − B2 ry =: ru′ .

(5.5) with the abbreviations

−1 −1 B1 = c∗u c−∗ y ℓyy cy − ℓuy cy ,

B2 = c∗u c−∗ y .

Note that B1 ∈ L(Λ, Lq (Ω)), B2 ∈ L(V, Lq (Ω)) by (A5)q . Let 0 < ε ≤ 1/2 and define I1 = {ˆ ua ≤ ε, u ˆb > ε}, I2 = {ˆ ub ≤ ε, u ˆa > ε}, I3 = {ˆ ua ≤ ε, u ˆb ≤ ε} and I4 = Ω \ (I1 ∪ I2 ∪ I3 ). Multiply (5.5) by ˆa |I + U ˆb |I + min(U ˆa , U ˆb )|I , D = εI|I4 + U 1 2 3 ˆa |I = u where I|I4 = 1I4 I, U ˆa 1I1 , . . .. 1 Then we obtain on I1 ˆa |I H(y, ˆ ˆa U ˆ −1 Zˆb )|I )su = (U ˆa ru + ra − U ˆa U ˆ −1 rb + U ˆa B1 rλ − U ˆa B2 ry )|I . (U u, λ) + (Zˆa + U 1 1 1 b b We have 0≤u ˆa u ˆ−1 b |I1 ≤ 1,

zˆa |I1 ≥ 1/2.

Thus, the right hand side is pointwise ≤ |ru | + |ra | + |rb | + |B1 rλ | + |B2 ry | and the operator on the ˆ with a function δ ∈ L∞ (Ω), δ ≥ 1/2. left has the form (δ|I1 I + D|I1 H) ˆab := min(ˆ On I2 the situation is analogous. Similarly, we have on I3 with U ua , u ˆb ) · I ˆab |I H(y, ˆ ˆab U ˆa−1 Zˆa + U ˆab U ˆ −1 Zˆb )|I )su (U u, λ) + (U 3 3 b −1 −1 ˆ ˆ ˆ ˆ ˆ ˆ ˆab B2 ry )|I . = (Uab ru + Uab Ua ra − Uab Ub rb + Uab B1 rλ − U 3 18

Again, the right hand side is pointwise ≤ |ru | + |ra | + |rb | + |B1 rλ | + |B2 ry | and the operator on ˆ with δ ∈ L∞ (Ω), δ ≥ 1/2. the left has the form δ|I3 I + D|I3 H) On I4 we obtain ˆ ˆ −1 Zˆa + U ˆ −1 Zˆb )|I )su = ε(ru + U ˆ −1 ra − U ˆ −1 rb + B1 rλ − B2 ry )|I . (εI|I4 H(y, u, λ) + ε(U a a 4 4 b b Since u ˆa |I4 > ε and u ˆb |I4 > ε the right hand side is ≤ ε|ru | + |ra | + |rb | + ε|B1 rλ | + ε|B2 ry |. The ˆ with δ ≥ 0. operator has on I4 the form δI4 I + εII4 H ˆ with Thus, after multiplication with D the operator on the left hand side has the form δI + D H δ|I1 ∪I2 ∪I3 ≥ 1/2 and δ|I4 ≥ 0 and the right hand side is pointwise ≤ |ru | + |ra | + |rb | + |B1 rλ | + |B2 ry |. Moreover, we have (5.6)

k|ru | + |ra | + |rb | + |B1 rλ | + |B2 ry |kt ≤ (3 + kB1 kΛ,Lt + kB2 kΣ,Lt )krkW ′ . t

Let without restriction α ≤ 1/2 in (5.3). Then we have for all s ∈ L2 (Ω) with the abbreviations si = s|Ii , i = 1, . . . , 4, ˆ (s, (δI + D H)s) ≥ α(s1 , s1 ) + α(s2 , s2 ) + α(s3 , s3 ) + εα(s4 , s4 ) ˆ 1 + s2 + s3 )) + (s1 , U ˆa Hs) ˆ + (s2 , Uˆb Hs) ˆ + ε(s4 , H(s ˆ + (s3 , Uˆab Hs). Using that (ρu − v/ρ, ρu − v/ρ) ≥ 0 for any ρ > 0 and thus 2(u, v) ≤ ρ2 (u, u) +

1 (v, v) ρ2

we obtain 2 ˆa |I Hsk ˆ 2 ˆ 2 ≥ − α ks1 k2 − ε kHsk ˆ ≥ − α ks1 k2 − 1 kU (s1 , Uˆa Hs) 1 2 2 2 2 4 α 4 α

and analogously ˆ ≥ − α ks2 k2 − (s2 , Uˆb Hs) 2 4 ˆa , U ˆb )Hs) ˆ ≥ − α ks3 k2 − (s3 , min(U 2 4

ε2 ˆ 2 kHsk2 , α ε2 ˆ 2 kHsk2 . α

Finally, ˆ 1 + s2 + s3 )k2 . ˆ 1 + s2 + s3 )) ≥ − εα ks4 k2 − ε kH(s ε(s4 , H(s 2 2 4 α Now set 2

ˆ 2 2 ). ε = min(α, α2 )/(1 + 16kHk L ,L 2

2

2

2

2

Since ksk2 = ks1 k2 + ks2 k2 + ks3 k2 + ks4 k2 , inserting these estimates yields α α α εα ˆ (s, (δI + D H)s) ≥ (s1 , s1 ) + (s2 , s2 ) + (s3 , s3 ) + (s4 , s4 ). 2 2 2 2 This shows together with (5.6) that (5.7)

ksu k2 ≤

C krkW ′ 2 εα 19

ˆ but not on µ. where C depends only on α and H t ˆ according to To obtain a bound in L -topology we multiply (5.5) by su . By the structure of H (A5)q this yields the pointwise estimate (5.8)

ˆ s su ) + su u ˆ−1 ˆ−1 ˆa + u ˆ−1 ˆb )s2u ≤ su (ru − H 0 ≤ (α0 + u ˆ−1 a ra − su u a z b rb b z + su (B1 rλ ) − su (B2 ry ).

Since u ˆa + zˆa = u ˆb + zˆb = 1, we have u ˆ−1 1 u ˆ−1 1 a a ≤ = ≤ −1 −1 −1 α0 u ˆa + zˆa min(α0 , 1) α0 + u ˆa zˆa + u ˆb zˆb α0 + u ˆa zˆa and analogously u ˆ−1 1 1 b = . ≤ −1 −1 α0 u ˆb + zˆb min(α0 , 1) α0 + u ˆa zˆa + u ˆb zˆb  Division of (5.8) by α0 + u ˆ−1 ˆa + u ˆ−1 ˆb |su | yields a z b z |su | ≤

1 1 ˆ s su + B1 rλ − B2 ry | + |ru − H (|ra | + |rb |). α0 min(α0 , 1)

This yields together with (5.7) for all t ∈ [p, q]  1  ˆ s k 2 t ksu k + kB1 k t krλ k + kB2 k t kry k kru kt + kH ksu kt ≤ Λ V,L 2 Λ,L L ,L V α0 1 (5.9) (kra kt + krb kt ) + min(α0 , 1) ≤ C ′ krkW ′ . t

We derive now also bounds for sy , sλ , sa , sb . We have sy = c−1 y (−cu su + rλ ) and thus Lt ⊂ Lp for t ∈ [p, q] yields −1 ksy kY ≤ kc−1 y cu kLt ,Y ksu kt + kcy kΛ,Y krλ kΛ .

(5.10) Next, we obtain

sλ = c−∗ y (ry − ℓyy sy − ℓyu su ), which yields by (A5)q (5.11)

ksλ kΣ ≤ kc−∗ y kV,Σ (kry kV + kℓyy kY,V ksy kY + kℓyu kLt ,V ksu kt ).

To estimate sa , sb we partition Ω into the sets   2 max(µ−∞ , µ/γ) Ωa = {u > (b + a)/2} ∪ u = (b + a)/2, za ≤ , ν Now (5.4) yields ˆa−1 (ra − Zˆa su )|Ω , sa |Ωa = U a −1 ˆ ˆ sb |Ωc = U (rb + Zb su )|Ωc , a

b

a

sa |Ωca = (ℓuy sy + ℓuu su + c∗u sλ + sb − ru )|Ωca ,

sb |Ωa = (−ℓuy sy − ℓuu su − c∗u sλ + sa + ru )|Ωa . 20

Ωca = Ω \ Ωa .

By the definition of the neighborhood N−∞,q (µ) we have za |Ωa ≤

(5.12)

2 max(µ−∞ , µ/γ) =: Cµ , ν

zb |Ωca ≤

2 max(µ−∞ , µ/γ) = Cµ ν

and thus u ˆa |Ωa

ua 1 = ≥ , ua + za Ωa 1 + 2Cµ ν −1

u ˆb |Ωca

1 ub ≥ = . ub + zb Ωc 1 + 2Cµ ν −1 a

Using that 0 ≤ zˆa ≤ 1, 0 ≤ zˆb ≤ 1 this yields for all t ∈ [p, q]

ksa |Ωa kt ≤ (1 + 2Cµ ν −1 )(kra kt + ksu kt ) ksb |Ωca kt ≤ (1 + 2Cµ ν −1 )(krb kt + ksu kt )

ksa |Ωca kt ≤ kluy kY,Lt ksy kY + kluu kLt ,Lt ksu kt

+ kc∗u kΣ,Lt ksλ kΣ + ksb |Ωca kt + kru kt

ksb |Ωa kt ≤ kluy kY,Lt ksy kY + kluu kLt ,Lt ksu kt

+ kc∗u kΣ,Lt ksλ kΣ + ksa |Ωa kt + kru kt .

We conclude that ksa kt ≤ (1 + 2Cµ ν −1 )(kra kt + ksu kt )

(5.13)

+ C ′ (ksy kY + ksu kt + ksλ kΣ + kru kt ),

ksb kt ≤ (1 + 2Cµ ν −1 )(krb kt + ksu kt )

(5.14)

+ C ′ (ksy kY + ksu kt + ksλ kΣ + kru kt ).

Now (5.9), (5.10), (5.11), (5.13), (5.14) yield k(S(w)DFµ (w))−1 kW ′ ,Wt ≤ C ′′ . t

It is easy to check that C ′′ can be chosen uniformly on bounded subsets of {(µ, w) ∈ (0, ∞) × N−∞,q (µ)} on which (5.3) holds uniformly. Finally, the definition of the neighborhood N−∞,q (µ) yields √ √ ua + za ≥ 2 ua za ≥ 2 γµ,

√ √ ub + zb ≥ 2 ub zb ≥ 2 γµ.

Therefore, the scaling matrix S(w) in (5.1) satisfies kS(w)kW ′ ,W ′ ≤ t

t

1 √ min(1, 2 γµ)

and is invertible. Thus, DFµ (w)−1 = (S(w)DFµ (w))−1 S(w) and kDFµ (w)−1 kW ′ ,Wt = k(S(w)DFµ (w))−1 S(w)kW ′ ,Wt t

t

≤ k(S(w)DFµ (w))−1 kW ′ ,Wt kS(w)kW ′ ,W ′ ≤ t

t

t

C ′′ . √ min(1, 2 γµ)

We have the following variant of Lemma 5.1 that will be useful for showing the H¨older-continuity of the central path. 21

L EMMA 5.2. Let the assumptions of Lemma 5.1 hold, but assume only that w = (y, u, λ, za , zb ) ∈

(5.15)

1 (N−∞,q (µ1 ) + N−∞,q (µ2 )) 2

with µ1 , µ2 ∈ (0, ∞) and set µ = min(µ1 , µ2 ). Then DFµ (w) has a bounded right inverse DFµ (w)−1 ∈ L(Wt′ , Wt ), t ∈ [p, q], with kDFµ(w)−1 kW ′ ,Wt ≤

(5.16)

t

C √ min(1, µ)

where C > 0 is a constant. The constant C can be chosen uniformly on bounded subsets of   1 2 (µ1 , µ2 , w) ∈ (0, ∞) × (N−∞,q (µ1 ) + N−∞,q (µ2 )) 2 on which (5.3) holds uniformly. R EMARK 5.3. (5.15) is weaker than (y, u, λ, za , zb ) ∈ N−∞,q (µ), since the constraints 2 max(µ−∞ , µ/γ) , ν 2 max(µ−∞ , µ/γ) min(za |{u=(b+a)/2} , zb |{u=(b+a)/2} ) ≤ ν

za |{u>(b+a)/2} ≤

2 max(µ−∞ , µ/γ) , ν

zb |{u 0 with za |{u>(b+a)/2} ≤

(5.19)

zb |{u (b + a)/2} ∪ u ˜ = (b + a)/2, z˜a ≤ ν    2 max(µ−∞ , µ2 /γ) ∩ {¯ u > (b + a)/2} ∪ u ¯ = (b + a)/2, z¯a ≤ , ν    2 max(µ−∞ , µ1 /γ) Ωb = {˜ u < (b + a)/2} ∪ u ˜ = (b + a)/2, z˜a > ν    2 max(µ−∞ , µ2 /γ) . ∩ {¯ u < (b + a)/2} ∪ u ¯ = (b + a)/2, z¯a > ν This yields by using µ1 ≤ µ2 ua |Ωa

b−a , ≥ 2

za |Ωa

1 ≤ 2

2 max(µ−∞ , µγ1 )

zb |Ωb

1 ≤ 2

2 max(µ−∞ , µγ1 )

ν

+

2 max(µ−∞ , µγ2 )

+

2 max(µ−∞ , µγ2 )

ν

!



2 max(µ−∞ , µγ2 ) ν

and similarly ub |Ωb

b−a ≥ , 2

ν

ν

Hence, we have with Ω′ = Ωa ∪ Ωb za |Ωa ≤

2 max(µ−∞ , µ2 /γ) , ν

zb |Ω′ \Ωa ≤

!



2 max(µ−∞ , µγ2 ) ν

.

2 max(µ−∞ , µ2 /γ) . ν

Thus, (5.12) holds on Ω′ instead of Ω with µ2 instead of µ and we obtain exactly as in the proof of Lemma 5.1 the following analogs of (5.13), (5.14) on Ω′ (5.20) (5.21)

ksa |Ω′ kt ≤ (1 + 2Cµ2 ν −1 )(kra kt + ksu kt ) + C ′ (ksy kY + ksu kt + ksλ kΣ + kru kt ), ksb |Ω′ kt ≤ (1 + 2Cµ2 ν −1 )(krb kt + ksu kt ) + C ′ (ksy kY + ksu kt + ksλ kΣ + kru kt ).

It remains to estimate ksa |Ω′′ kt , ksb |Ω′′ kt for the set Ω′′ = Ω \ Ω′ . By the definition of Ω′ we have b−a ν u ˜a + u ¯a ≥ ≥ , ua |Ω′′ = 2 4 4 Ω′′ We now split Ω′′ into the sets n o Ω′′a = |Zˆa su | ≤ |ra | ,

This yields by using (5.2)

ub |Ω′′

b−a u ˜b + u ¯b ν ≥ = ≥ . 2 4 4 Ω′′

n o Ω′′b = |Zˆb su | ≤ |rb | ,

Ω′′r = Ω′′ \ (Ω′′a ∪ Ω′′b ).

2|u + z ||r | a a a ≤ 8ν −1 |ˆ ra |Ω′′a |, − Zˆa su )|Ω′′a | ≤ |sa |Ω′′a | = ′′ ua Ωa 2|ub + zb ||rb | −1 −1 ˆ ˆ |sb |Ω′′b | = |Ub (rb − Zb su )|Ω′′b | ≤ rb |Ω′′b |, ′′ ≤ 8ν |ˆ ub Ω |Uˆa−1 (ra

b

sa |

Ω′′ b

sb |Ω′′a

= (ℓuy sy + ℓuu su + c∗u sλ + sb − ru )|Ω′′b , = (−ℓuy sy − ℓuu su − c∗u sλ + sa + ru )|Ω′′a . 23

Hence, we obtain on Ω′′a ∪ Ω′′b . (5.22) (5.23)

ra kt + C ′ (ksy kY + ksu kt + ksλ kΣ + kru kt ), ksa |Ω′′a ∪Ω′′b kt ≤ 8ν −1 kˆ rb kt + C ′ (ksy kY + ksu kt + ksλ kΣ + kru kt ). ksb |Ω′′a ∪Ω′′b kt ≤ 8ν −1 kˆ

Finally, we have on Ω′′r ˆa−1 (ra − Zˆa su )|Ω′′ , sa |Ω′′r = U r

ˆ −1 (rb + Zˆb su )|Ω′′ sb |Ω′′r = U b r

and thus by the definition of Ω′′r sgn(sa |Ω′′r ) = − sgn(su |Ω′′r ), sgn(sb |Ω′′r ) = sgn(su |Ω′′r ).

Hence, the second line in (5.17) yields |sa |Ω′′r | + |sb |Ω′′r | = |(sb − sa )|Ω′′r | = |ru − ℓuy sy − ℓuu su − c∗u sλ |Ω′′r . Therefore, (5.20), (5.21) hold also on Ω′′r and we have shown that ksa kt ≤ 8ν −1 kˆ ra kt + 2(1 + 2Cµ2 ν −1 )(kra kt + ksu kt ) + 2C ′ (ksy kY + ksu kt + ksλ kΣ + kru kt ),

ksb kt ≤ 8ν −1 kˆ rb kt + 2(1 + 2Cµ2 ν −1 )(krb kt + ksu kt ) + 2C ′ (ksy kY + ksu kt + ksλ kΣ + kru kt ).

Together with (5.19) we conclude that the solution of (5.17) satisfies for all t ∈ [p, q] kskWt ≤ C ′′ (krkW ′ + kˆ r kWt ), t

where C ′′ can be chosen uniformly for all µ1 , µ2 ∈ (0, µ0 ] only depending on µ0 . Since µ = min(µ1 , µ2 ), we obtain as at the end of the proof of Lemma 5.1 √ √ ua + za ≥ 2 γµ, ub + zb ≥ 2 γµ and thus by the definition of r in (5.17)   1 krkWt′ ≤ max 1, √ kˆ r kWt′ . 2 γµ Therefore, (5.16) is proven, where the constant C can be chosen uniformly on bounded subsets of  (µ1 , µ2 , w) ∈ (0, ∞)2 × 21 (N−∞,q (µ1 ) + N−∞,q (µ2 )) on which (5.3) holds uniformly.

6. H¨older continuity of the central path. We will now state conditions under which the central path defines a H¨older continuous curve that converges for µ ց 0 to a solution of (2.1). The analysis of the central path is quite obvious if (A5)q holds for q = ∞ and more involved in the case q < ∞. This is caused by the fact that (u, z) ∈ (B, k · kq ) × Lq 7→ uz ∈ Lq is only differentiable in the case q = ∞. Otherwise we have to weaken the image space to achieve differentiability. More precisely we have the following result. L EMMA 6.1. Let Z be an open bounded set in L∞ . Then for any p < q ≤ ∞ the mapping u, z ∈ (Z, k · kq ) × Lq 7→ uz ∈ Lp 24

is continuously differentiable and the remainder term satisfies the estimate k(u + u′ )(z + z ′ ) − uz − (uz ′ + zu′ )kp = ku′ z ′ kp ≤ ku′ kpq/(q−p) kz ′ kq . If p/(q − p) > 1 then the remainder term can further be estimated by (q−p)/p

ku′ kpq/(q−p) kz ′ kq ≤ ku′ kq

1−(q−p)/p

ku′ k∞

kz ′ kq .

Moreover, u, z ∈ L∞ × L∞ 7→ uz ∈ L∞ is continuously differentiable and the remainder term satisfies k(u + u′ )(z + z ′ ) − uz − (uz ′ + zu′ )k∞ = ku′ z ′ k∞ ≤ ku′ k∞ kz ′ k∞ . Proof. By the H¨older inequality the remainder term can be estimated as follows 1/p

1/p

k(u + u′ )(z + z ′ ) − uz − (uz ′ + zu′ )kp = ku′ z ′ kp ≤ k(u′ )p kq/(q−p) k(z ′ )p kq/p = ku′ kpq/(q−p) kz ′ kq .

In particular, this shows that for any p < q ≤ ∞ the mapping u, z ∈ (Z, k · kq ) × Lq 7→ uz ∈ Lp is continuously differentiable. If p/(q − p) > 1 then interpolation between Lq and L∞ yields (q−p)/p

ku′ kpq/(q−p) kz ′ kq ≤ ku′ kq

1−(q−p)/p

ku′ k∞

kz ′ kq .

Finally, u, z ∈ L∞ × L∞ 7→ uz ∈ L∞ is continuously differentiable, since the remainder term can be estimated by k(u + u′ )(z + z ′ ) − uz − (uz ′ + zu′ )k∞ = ku′ z ′ k∞ ≤ ku′ k∞ kz ′ k∞ . We consider now first the case that q = ∞. L EMMA 6.2. Let (A1)–(A4) and (A5)q with q = ∞ hold. If u ∈ B 7→ J (y(u), u) is convex then for any µ > 0 the central path µ ∈ (0, ∞) → w(µ) ∈ W∞ according to (2.9) is well defined. If for µ0 > 0 the reduced Hessian satisfies 2 ˆ (v, H(y(µ), u(µ), λ(µ))v) ≥ αkvk2

∀ v ∈ L2 (Ω), ∀ µ ∈ (0, µ0 ]

with some α > 0 then the central path µ ∈ (0, µ0 ] → w(µ) ∈ W∞ is continuously differentiable, satisfies C kw(µ)k ˙ W∞ ≤ √ µ

∀ µ ∈ (0, µ0 ]

with a constant C > 0 and is thus H¨older-continuous with index 1/2. More precisely, we have with L = 2C p √ √ (6.1) kw(µ1 ) − w(µ2 )kW∞ ≤ L| µ1 − µ2 | ≤ L |µ1 − µ2 | ∀ µ1 , µ2 ∈ (0, µ0 ].

Moreover, w ¯ = limµց0 w(µ) exists in W∞ , w ¯ ∈ W∞ satisfies the KKT-conditions (2.9) and (¯ y, u ¯) is global solution of (2.1). 25

Proof. By assumption the reduced objective function u ∈ B → J (y(u), u) is convex. Since the barrier term ist strictly convex, the barrier problem (2.5) has therefore the strictly convex reduced objective function u 7→ Jµ (y(u), u), and thus the solution (yµ , uµ ) = (y(uµ ), uµ ) provided by Proposition 2.3 is the unique solution of (2.5). Now also λµ , za,µ , zb,µ are uniquely determined by the first and the last two equations in (2.9). Thus, together with Corollary 4.3 the central path µ ∈ (0, ∞) → w(µ) ∈ W∞ is well defined and bounded on bounded subsets (0, µ0 ]. ′ is by (A1)–(A5)∞ and by Lemma 6.1 continuously differenThe mapping Fµ : W∞ → W∞ tiable. For µ > 0 the primal-dual central path µ 7→ w(µ) := (y, u, λ, za , zb )(µ) given by (2.9) is the unique solution of Fµ ((y, u, λ, za , zb )(µ)) = 0,

a ≤ u(µ) ≤ b.

′ Since w(µ) ∈ N−∞,∞ (µ), DFµ (w(µ)) ∈ L(W∞ , W∞ ) has by Lemma 5.1 a bounded inverse. Thus, the implicit function theorem shows that µ → w(µ) is continuously differentiable and that the derivative w.r.t. µ satisfies   0  0    . 0 (6.2) DFµ (w(µ))w(µ) ˙ =    1  1

For fixed µ0 > 0 Lemma 5.1 yields a constant C > 0 with kDFµ (w(µ))−1 kW ′

∞ ,W∞

C √ µ

µ ∈ (0, µ0 ]. With (6.2) we conclude that kw(µ)k ˙ W∞ ≤ 0 < µ1 < µ2 ≤ µ0 Z µ2 Z kw(µ2 ) − w(µ1 )kW∞ ≤ kw(µ)k ˙ W∞ dµ ≤ µ1

C √ µ

for all

for all µ ∈ (0, µ0 ]. This gives for all

µ2 µ1



C √ √ √ dµ = 2C( µ2 − µ1 ) µ

√ µ2 − µ1 = 2C √ √ ≤ 2C µ2 − µ1 . µ2 + µ1

This shows that w(·) ∈ C 1/2 ((0, µ0 ]; W∞ ) for any µ0 > 0. Hence, the central path is H¨oldercontinuous in W∞ and admits a continuation until µ = 0, i.e., w ¯ = limµց0 w(µ) exists in W∞ . By continuity, w ¯ satisfies F0 (w) ¯ = 0, which are just the KKT-conditions (2.4). Consequently, (¯ y, u¯) is a global solution of (2.1), since the reduced objective functional and B are convex. For the general case we use the following auxiliary result. L EMMA 6.3. Let (A1)–(A4) and (A5)q with some p < q < ∞ hold and let u ∈ B → J (y(u), u) be convex. Then for any µ > 0 the central path µ ∈ (0, ∞) → w(µ) ∈ Wq according to (2.9) is well defined. If for µ0 > 0 the reduced Hessian satisfies 2 ˆ (v, H(y(µ), u(µ), λ(µ))v) ≥ αkvk2

∀ v ∈ L2 (Ω), ∀ µ ∈ (0, µ0 ]

with some α > 0 then the central path µ ∈ (0, µ0 ] → w(µ) ∈ Wq is continuous. Proof. As at the beginning of the proof of Lemma 6.2 the barrier problem (2.5) has the strictly convex reduced objective function u 7→ Jµ (y(u), u), and thus the solution (¯ y, u ¯) = (y(¯ u), u ¯) pro¯ z¯a , z¯b are uniquely determined vided by Proposition 2.3 is the unique solution of (2.5). Now also λ, by the first and the last two equations in (2.9). Thus, together with Corollary 4.3 the central path µ ∈ (0, ∞) → w(µ) ∈ Wq is well defined and bounded on bounded subsets (0, µ0 ]. We show next the continuity of the central path. Let 0 < µ ¯ be given and let µ ¯ ≤ µ < η be arbitrary. We write Jµ (y, u) = J (y, u) + µB(u) 26

with the log-barrier term B. We know that B(u) ≥ −cB on B with some cB ≥ 0. Now we have for the solutions uµ , uη of the barrier problems (2.5) Jµ (y(uµ ), uµ ) ≤ Jµ (y(uη ), uη ) = Jη (y(uη ), uη ) + (µ − η)B(uη ) ≤ Jη (y(uµ ), uµ ) + (µ − η)B(uη )

= Jµ (y(uµ ), uµ ) + (µ − η)(B(uη ) − B(uµ ))

≤ Jµ (y(uµ ), uµ ) + (η − µ)(cB + B(uµ )). The first and third row shows that B(uµ ) − B(uη ) ≥ 0 and thus in particular B(uµ ) ≤ B(uµ¯ ). This yields

Jµ (y(uµ ), uµ ) ≤ Jµ (y(uη ), uη ) ≤ Jµ (y(uµ ), uµ ) + (η − µ)(cB + B(uµ¯ )) and thus |Jµ (y(uη ), uη ) − Jµ (y(uµ ), uµ )| ≤ (η − µ)|cB + B(uµ¯ )| → 0

for η − µ → 0, µ ¯ ≤ µ < η.

As we have already observed, (A1)–(A4) imply that the reduced objective functional is twice continuously differentiable. Since the barrier terms are convex and Jµ′ (y(uµ ), uµ ) = 0, we have 2 ˆ Jµ (y(uη ), uη ) − Jµ (y(uµ ), uµ ) ≥ (uη − uµ , H(u(τ ))(uη − uµ )) ≥ αkuη − uµ k2

with u(τ ) = uµ + τ (uη − uµ ) and appropriate τ ∈ [0, 1]. This yields uη − uµ → 0 in L2 for η → µ, η, µ ≥ µ ¯ and thus in all Ls , s < ∞ by interpolation with the uniform L∞ -bound kuη − uµ k∞ ≤ kb − ak∞ . Now Lemma 4.6 yields (uη , yη , λη , za,η , zb,η ) → (uµ , yµ , λµ , za,µ , zb,µ ) in Wq for η → µ, η, µ ≥ µ ¯. L EMMA 6.4. Let the assumptions of Lemma 6.3 hold. Then in addition to Lemma 6.3 the central path µ ∈ (0, µ0 ] → w(µ) ∈ Wq satisfies (6.3)

lim sup η→µ

kw(η) − w(µ)kWq η−µ

C ≤√ µ

∀ µ ∈ (0, µ0 ]

with a constant C > 0 and is thus H¨older-continuous with index 1/2 in all spaces Wt , t ∈ [p, q]. More precisely, we have with L = 2C p √ √ (6.4) kw(µ1 ) − w(µ2 )kWt ≤ L| µ1 − µ2 | ≤ L |µ1 − µ2 | ∀ µ1 , µ2 ∈ (0, µ0 ].

Moreover, w ¯ = limµց0 w(µ) exists in Wq , w ¯ ∈ Wq satisfies the KKT-conditions (2.9) and (¯ y, u ¯) is global solution of (2.1). Proof. By assumptions (A1), (A5q ) the first three components of the mapping Fµ : Wq → V × Lq × Λ × Lq/2 × Lq/2

are continuously differentiable. Moreover, we have ua (η)za (η) − ua (µ)za (µ) =

za (η) + za (µ) ua (η) + ua (µ) (za (η) − za (µ)) + (ua (η) − ua (µ)). 2 2 27

Therefore, we obtain with   za (η) + za (µ) zb (η) + zb (µ) u(η) + u(µ) , λ(µ), , w(µ) ˜ := y(µ), 2 2 2 the identity  (6.5)

    

0 0 0 η−µ η−µ



   = Fµ (w(η)) − Fµ (w(µ))  

= DFµ (w(µ))(w(η) ˜ − w(µ)) + oWq′ (kw(η) − w(µ)kWq ),

where koWq′ (kw(η) − w(µ)kWq )k

Wq′

kw(η) − w(µ)kWq

→0

for η → µ,

since kw(η) − w(µ)kWq → 0 for η → µ by Lemma 6.3. Since w(η) ˜ ∈

1 (N−∞,q (µ) + N−∞,q (η)), 2

Lemma 5.2 yields for the right inverse C −1 . kDFµ (w(η)) ˜ kW ′ .Wq ≤ p q min(µ, η)

For |η − µ| ≤ ε, ε > 0 small enough we have with the remainder term in (6.5) clearly

1

−1 ˜ oWq′ (kw(η) − w(µ)kWq ) ≥ kw(η) − w(µ)kWq

w(η) − w(µ) + DFµ (w(η)) 2 Wq

for all |η − µ| ≤ ε. We conclude with (6.5) that



0

 0 kw(η) − w(µ)kWq

 2C

 0 ≤ p  η−µ min(µ, η)

 1

1



    

Wq′

≤p

2CC ′ min(µ, η)

for all |η − µ| ≤ ε.

This shows R µ (6.3) and the H¨older continuity with index 1/2 follows immediately. By writing the integral µ12 √Cµ dµ as a limit of Riemann sums and by using (6.3), we see that again Z µ2 C √ √ kw(µ1 ) − w(µ2 )kWt ≤ √ dµ = 2C| µ1 − µ2 | µ µ1 p ≤ 2C |µ1 − µ2 | ∀ µ1 , µ2 ∈ (0, µ0 ].

The fact that w ¯ = limµց0 w(µ) exists in Wq , satisfies (2.4) and is global solution of (2.1) follows now exactly as at the end of the proof of Lemma 6.2. R EMARK 6.5. As usual for interior-point methods, we used a convexity assumption to ensure that the central path is a uniquely defined curve. Here and in the following, extensions to more general, non-convex, settings are possible by considering neighborhoods where the central path is a unique curve. 28

7. A primal-dual interior-point method. Let (A1)–(A5)q hold. The previous considerations show that for w ∈ N−∞,q (µ) the solution s of the primal-dual Newton system DFµ(w)s = −Fµ (w) is only contained in Wq . Therefore, in the case q < ∞ we cannot ensure w + αs ∈ N−∞,q (µ) by choosing an appropriate stepsize α ∈ (0, 1]. Instead, we use in addition a projection onto the neighborhood N−∞,q (µ). D EFINITION 7.1. We denote by Pµ a projection onto N−∞,q (µ) in Wq , i.e. n o Pµ (w) ∈ N−∞,q (µ), kPµ (w) − wkWq = min kw ˜ − wkWq : w ˜ ∈ N−∞,q (µ) .

If more than one projection point exists, Pµ selects one of them. Obviously,

Pµ (y, u, λ, za , zb ) = (y, ∗, λ, ∗, ∗), i.e. Pµ does not change the y- and λ-component. Furthermore, the projection does not depend on q, since it reduces to a pointwise projection in R3 with respect to the Euclidean norm of the (u, za , zb )-part. R EMARK 7.2. The form of our neighborhood N−∞,q allows for a pointwise computation of the projection Pµ . In fact, for almost all ξ ∈ Ω we have to project the point (u(ξ), za (ξ), zb (ξ)) ∈ R3 onto the set (see the sketch in Figure 5.1)  Nξ = (v, sa , sb ) ∈ R3 : sa , sb > 0, (v − a(ξ))sa ≥ γµ, (b(ξ) − v)sb ≥ γµ  a(ξ) + b(ξ) 2 max(µ−∞ , µ/γ) , b(ξ)] × [0, ] × [0, ∞)) ∩ ([ 2 ν 2 max(µ−∞ , µ/γ)  a(ξ) + b(ξ) ] × [0, ∞) × [0, ] . ∪ ([a(ξ), 2 ν The first set is convex and the second set is the union of two cuboids of infinite length. There are several possibilities for computing the projection. One approach consists in projecting onto the boundary surfaces and (if necessary) onto the edges. The main work then consists in computing the projection onto the two surfaces generated by the hyperbolic constraints, and possibly onto their intersection. These three projections can be computed one after the other, and we can stop as soon as the point lies in Nξ . They can be reduced to one dimensional least squares problems that can be solved with a Newton- or Gauss-Newton iteration of complexity O(1). All other cases turn out to be easy. We implemented the projection along these lines. It is significantly less expensive than solving the primal-dual Newton system. We show now that Pµ has a Lipschitz constant ≤ 2. L EMMA 7.3. Let (A5)q hold and let wµ = (yµ , uµ , λµ , za,µ , zb,µ ) ∈ Wq be a point on the central path. Then we have kPµ (w) − wµ kWq ≤ 2kw − wµ kWq

∀ w ∈ Wq .

Proof. Since wµ ∈ N−∞,q (µ), we have kPµ (w) − wkWq ≤ kwµ − wkWq . Hence, kPµ (w) − wµ kWq ≤ kPµ (w) − wkWq + kw − wµ kWq ≤ 2kw − wµ kWq . We consider now the following conceptual algorithm. 29

Algorithm PDPF: Projected Primal-Dual Interior-Point Method. 1. Choose µ0 > 0, γ ∈ (0, 1), µ−∞ > 0, σmin ∈ (0, 1), µ0 ∈ (0, µ−∞ ], and w0 := (y0 , u0 , λ0 , za,0 , zb,0 ) ∈ N−∞,q (µ0 ). Set k := 0. 2. Solve the primal-dual Newton system DFµk (wk )sk = −Fµk (wk ) and set wk+1 := Pµk (wk + sk ). 3. Choose σk ∈ [σmin , 1] and set µk+1 := σk µk . 4. Set k := k + 1 and goto step 2. R EMARK 7.4. The algorithm is formulated here as compact as possible. The method can be augmented by additional features and safeguards. In particular, a stepsize can be introduced to damp the full step if necessary. The choice of the stepsize could be based on a decrease condition for a merit function, the simplest one being an appropriate norm of the residual Fµk . A detailed study of this issue would go beyond the purpose of this paper. We will analyze Algorithm PDPF under assumption (A1)–(A5)∞ . If merely (A1)–(A5)q for q < ∞ hold, we will have to modify the algorithm by introducing a smoothing step, see Algorithm PDPFS in section 9. Appropriate implementations of Algorithm PDPFS will even yield superlinear convergence. 8. Global linear convergence for the L∞ -setting. We assume throughout this section that the assumptions (A1)–(A4) and (A5)∞ hold. 8.1. Quadratic local convergence towards the central path. We show first that the primaldual iteration (8.1)

DFµ (w) s = −Fµ (w),

w+ = w + s.

yields quadratic local convergence towards the central path. L EMMA 8.1. Let µ0 > 0 and ρ0 > 0 be fixed. Assume that (A1)–(A4) and (A5)∞ hold, that u ∈ B 7→ J (y(u), u) is convex (to ensure uniqueness of the central path), and that (8.2)

2 ˆ (v, H(y, u, λ)v) ≥ αkvk2

∀ v ∈ L2 (Ω),

∀ w ∈ N−∞,∞ (µ), kw − w(µ)kW∞ ≤ ρ0 , µ ∈ (0, µ0 ].

Then there exists a constant C > 0 such that for all 0 < µ ≤ µ0 and for all w ∈ N−∞,∞ (µ) with kw − w(µ)k∞ ≤ ρ0 the solution w+ of the primal dual Newton step (8.1) satisfies C 2 kw+ − w(µ)kW∞ ≤ √ kw − w(µ)kW∞ . µ For the projected iterate Pµ (w+ ) ∈ N−∞,∞ (µ) the estimate holds 2C 2 kPµ (w+ ) − w(µ)kW∞ ≤ √ kw − w(µ)kW∞ . µ and thus the projected iteration converges locally with quadratic rate. Proof. We know by Corollary 4.3 that w(µ) is uniformly bounded in W∞ for all µ ∈ (0, µ0 ]. Thus, kw − w(µ)kW∞ ≤ ρ0 implies kwkW∞ ≤ M for some constant M > 0. Hence, all µ, w in (8.2) are contained in a bounded subset of {(µ, w) ∈ (0, ∞) × N−∞,∞ (µ)} and Lemma 5.1 yields a constant C > 0 with kDFµ (w)−1 kW ′

∞ ,W∞

30

C ≤√ . µ

We have DFµ (w)(w+ − w) = −Fµ (w),

(8.3)

DFµ (w)(w(µ) − w(µ)) = −Fµ (w(µ)),

and thus DFµ (w)(w+ − w(µ)) = Fµ (w(µ)) − Fµ (w) − DFµ(w)(w(µ) − w).

(8.4)

′ Since the first three components of Fµ : W∞ → W∞ are by (A1), (A5)q Lipschitz continuously differentiable on bounded subsets, this gives   R1 (w(µ) − w)   R2 (w(µ) − w)   , R3 (w(µ) − w) (8.5) DFµ (w)(w+ − w(µ)) =     (u(µ) − u)(za (µ) − za )  (u(µ) − u)(zb (µ) − zb )

where with a Lipschitz constant L > 0

2

kR1 (w(µ) − w)kV + kR2 (w(µ) − w)k∞ + kR3 (w(µ) − w)kΛ ≤ Lkw(µ) − wkW∞ . Therefore, 2

kw+ − w(µ)kW∞ ≤ kDFµ(w)−1 kW ′

∞ ,W∞

This yields

Lkw(µ) − wkW∞

 + k(u(µ) − u)(za (µ) − za )k∞ + k(u(µ) − u)(zb (µ) − zb )k∞ .

C 2 kw+ − w(µ)kW∞ ≤ √ (2 + L)kw − w(µ)kW∞ . µ Finally, the estimate for Pµ (w+ ) follows from Lemma 7.3. 8.2. Global linear convergence of the interior-point method. The previous result yields linear convergence for a short step method. T HEOREM 8.2. Let µ0 > 0 and ρ0 > 0 be fixed. Assume that (A1)–(A4), (A5)∞ hold, that u ∈ B 7→ J (y(u), u) is convex (to ensure uniqueness of the central path), and that (8.2)

2 ˆ (v, H(y, u, λ)v) ≥ αkvk2

∀ v ∈ L2 (Ω),

∀ w ∈ N−∞,∞ (µ), kw − w(µ)kW∞ ≤ ρ0 , µ ∈ (0, µ0 ].

Then there are constants ρ¯ ∈ (0, ρ0 ] and σ ¯min ∈ (0, 1) such that Algorithm PFPF has the following convergence property: For any starting point w0 ∈ N−∞,∞ (µ0 ) with kw0 − w(µ0 )kW∞ ≤ ρ¯, Algorithm PDPF with σmin ∈ (¯ σmin , 1) generates a sequence wk ∈ N−∞,∞ (µk ) with (8.6) (8.7) (8.8)

√ kwk − w(µk )kW∞ ≤ C µk √ kwk − wk ¯ W∞ ≤ (C + L) µk µk = σ0 · · · σk−1 µ0 .

Here, C > 0 is a constant, L > 0 is the H¨older constant of the central path on (0, µ0 ], and w ¯ = limµց0 w(µ) is the solution of (2.1). In particular, if, for all k, σk ∈ [σmin , σ ¯ ] is chosen with σmin ≤ σ ¯ < 1, then (wk ) converges to w ¯ with R-linear rate. 31

Proof. Consider an arbitrary µ ∈ (0, µ0 ]. Then there exists by Lemma 8.1 a constant C > 0 such that for any w ∈ N−∞,∞ (µ) with kw − w(µ)kW∞ ≤ ρ0 the estimate 2C 2 2 kPµ (w+ ) − w(µ)kW∞ ≤ 2kw+ − w(µ)kW∞ ≤ √ kw − w(µ)kW∞ µ holds, where w+ is the result of the primal-dual Newton step (8.1). Now fix τ ∈ (0, 1) such that √ τ µ0 ≤ ρ0 . (8.9) ρ¯ := 2C Then for any w ∈ N−∞,∞ (µ) with √ τ µ kw − w(µ)kW∞ ≤ , 2C we have (8.10)

kPµ (w+ ) − w(µ)kW∞

√ τ2 µ 2C 2 . ≤ √ kw − w(µ)kW∞ ≤ τ kw − w(µ)kW∞ ≤ µ 2C

Moreover, we have with the H¨older constant L of the central path in (6.1) for 0 < σ < 1 √ √ kw(µ) − w(σµ)kW∞ ≤ L(1 − σ) µ. and thus

√ √ √ τ2 µ + L(1 − σ) µ. 2C Therefore, we can ensure that the new iterate satisfies √ τ σµ , (8.11) kPµ (w+ ) − w(σµ)kW∞ ≤ 2C if σ ∈ (0, 1) is chosen such that √ √ τ2 τ σ + L(1 − σ) ≤ . 2C 2C Since τ ∈ (0, 1), this holds for σ ∈ (0, 1) sufficiently close to 1, more precisely for  2 2 τ + 2LC (8.12) 1>σ≥σ ¯min := . τ + 2LC kPµ (w+ ) − w(σµ)kW∞ ≤

Thus we obtain by induction: If ρ¯ is chosen according to (8.9) and σ ¯min is given by (8.12) then Algorithm PDPF with σmin ∈ [¯ σmin , 1) generates a sequence (wk ) with wk+1 = Pµk (wk + sk ) and (see (8.11)) √ τ µk+1 ≤ ρ¯, µk+1 = σk µk , (8.13) kwk+1 − w(µk+1 )kW∞ ≤ 2C and (see (8.10)) √ τ 2 µk (8.14) kwk+1 − w(µk )kW∞ ≤ ≤ ρ¯. 2C With the solution w ¯ = limµց0 w(µ) of (2.1) we have by (8.13) in addition √ τ µk √ + L µk . kwk − wk ¯ W∞ ≤ kwk − w(µk )kW∞ + kw(µk ) − wk ¯ W∞ ≤ 2C This proves (8.6), (8.7), (8.8). The R-linear convergence follows trivially if σk is bounded above by σ ¯ < 1. 32

9. Global linear and superlinear local convergence for the general Lq -setting. If (A5)q holds only for some p < q < ∞ the convergence analysis is more delicate. Under a strict complementarity assumption and by using an additional smoothing step we will prove global linear convergence in the general Lq -setting. Moreover, we will also show that superlinear local convergence is achieved if µk is reduced fast enough. To keep the presentation compact, we exclude the case q = ∞, since this would have to be discussed separately at several places. Nevertheless, counterparts of the following results can be obtained also for the case q = ∞. We refine our previous analysis as follows. Since under the assumptions of Lemma 5.1 kDFµ(w(µ))−1 kW ′ ,Wt = O(µ−1/2 ), but k(S(w)DFµ(w))−1 kW ′ ,Wt ≤ C, t

t

t ∈ [p, q],

with the scaling operator

(5.1)



  S(w) =   



I I I (Ua + Za )−1

(Ub + Zb )−1

  ,  

we use now for the analysis of the primal-dual Newton iteration instead of (8.4) the scaled equation (9.1)

S(w)DFµ (w)(w+ − w(µ)) = S(w) (Fµ (w(µ)) − Fµ (w) − DFµ (w)(w(µ) − w)) .

Then we use a two-norm technique based on the Lp − Lq norm gap to estimate the right hand side. Independently of the size of µ > 0 this will yield an estimate of the form kw+ − w(µ)kWp = o(kw − w(µ)kWq ). The norm gap will then be closed by using a smoothing step. We recall that with the notations ˆa = (Ua + Za )−1 Ua , U Zˆa = (Ua + Za )−1 Za ,

ˆb = (Ub + Zb )−1 Ub , U Zˆb = (Ub + Zb )−1 Zb ,

ˆa + Zˆa = I, U ˆb + Zˆb = I and we have U 

  S(w)DFµ (w) =   

ℓyy ℓuy cy 0 0

ℓyu ℓuu cu Zˆa −Zˆb

c∗y c∗u 0 0 0

0 −I 0 ˆa U 0

0 I 0 0 ˆb U

     

where we omit the arguments. 9.1. Refined analysis of the primal-dual Newton step. We first prove a similar result as in Lemma 8.1 where we avoid the µ-dependent convergence factor by using a two-norm technique. We will need a strict complementarity assumption. ¯ z¯a , z¯b ) ∈ Y ×U ×Λ∗ ×U ∗ ×U ∗ satisfy the KKT-conditions D EFINITION 9.1. Let w ¯ = (¯ y, u ¯, λ, (2.4). Then strict complementarity holds at w ¯ if leb ({min(¯ ua + z¯a , u ¯b + z¯b ) = 0}) = 0. where leb () is the Lebesgue measure on Ω. 33

We define the function (9.2)

ω(t) = leb ({min(¯ ua + z¯a , u¯b + z¯b ) ≤ t}) .

Under a strict complementarity assumption we then have lim ω(t) = 0.

(9.3)

tց0

If ω(t) = O(tκ ) as t ց 0, we say that strong strict complementarity holds. ¯ z¯a , z¯b ) ∈ Y ×U ×Λ∗ ×U ∗ ×U ∗ satisfy the KKT-conditions D EFINITION 9.2. Let w ¯ = (¯ y, u ¯, λ, (2.4). Then strong strict complementarity holds at w ¯ if there exist constants Cc > 0, κ > 0 such that ω(t) ≤ Cc tκ

(9.4)

∀ t ≥ 0.

We start with the following technical result. L EMMA 9.3. Let the assumptions of Lemma 6.3 hold and let w ¯ = limµց0 w(µ) be the solution of (2.4). Define the function (9.5)

ωµ (t) = leb ({min(ua (µ) + za (µ), ub (µ) + zb (µ)) ≤ t}) .

Then there exists a constant C > 0 with ( =0 q q ωµ (t) ≤ ω(2 max(t, t q+1 )) + C max(t, t q+1 )

√ for t < 2 µ, √ for t ≥ 2 µ

with ω according to (9.2). √ Proof. Since ua (µ)za (µ) = µ, we have ua (µ) + za (µ) ≥ 2 µ and similarly ub (µ) + zb (µ) ≥ √ √ 2 µ. This shows that ωµ (t) = 0 for t < 2 µ. For brevity, we set now ua = ua (µ),

ub = ub (µ),

za = za (µ),

zb = zb (µ).

√ Now let t ≥ 2 µ. Then we have for any s ≥ t ωµ (t) ≤ leb ({min(ua + za , ub + zb ) ≤ s})

≤ leb ({min(¯ ua + z¯a , u¯b + z¯b ) − max(|ua − u ¯a | + |za − z¯a |, |ub − u ¯b | + |zb − z¯b |) ≤ s})

≤ ω(2s) + leb ({max(|ua − u ¯a | + |za − z¯a |, |ub − u ¯b | + |zb − z¯b |) ≥ s})

≤ ω(2s) + leb ({|ua − u ¯a | ≥ s/2}) + leb ({|za − z¯a | ≥ s/2})

+ leb ({|ub − u ¯b | ≥ s/2}) + leb ({|zb − z¯b | ≥ s/2}) q

q

q

q

≤ ω(2s) + (s/2)−q (kua − u ¯a kq + kza − z¯a kq + kub − u ¯b kq + kzb − z¯b kq ).

√ By Lemma 6.4 there exists a H¨older constant L > 0 with kw(µ) − wk ¯ Wq ≤ L µ. This yields ωµ (t) ≤ ω(2s) + 4(s/2)−q Lq µq/2 .

√ Now let t ≥ 2 µ. If t ≥ 1 then we obtain with the choice s = t

ωµ (t) ≤ ω(2t) + 4(t/2)−q Lq µq/2 ≤ ω(2t) + 4Lq ≤ ω(2t) + 4Lq t. If t ≤ 1 then the choice s = tq/(q+1) = t1−1/(q+1) yields ωµ (t) ≤ ω(2tq/(q+1) ) + 4t−q+q/(q+1) (2L)q µq/2 ≤ ω(2tq/(q+1) ) + 4Lq tq/(q+1) . 34

We show now that for kw − w(µ)kWq small enough the result w+ of the primal-dual Newton step satisfies kw+ − w(µ)kWp = o(kw − w(µ)kWq ), where the estimate is uniform in µ ∈ (0, µ0 ]. Thus, in contrast to Lemma 8.1 we avoid the µ-dependent convergence factor but obtain a Wp − Wq norm gap. This norm gap will be closed by using a smoothing step. L EMMA 9.4. Let µ0 > 0 and ρ0 > 0 be fixed. Assume that (A1)–(A4), (A5)q hold with some q ∈ (p, ∞), and that 2

(9.6)

ˆ (v, H(y, u, λ)v) ≥ αkvk2

∀ v ∈ L2 (Ω),

∀ w ∈ N−∞,q (µ), kw − w(µ)kWq ≤ ρ0 , µ ∈ (0, µ0 ].

If in addition w ¯ = limµց0 w(µ) satisfies strict complementarity then there exists a constant C > 0 such that for any 0 < µ ≤ µ0 and for any w ∈ N−∞,q (µ) with kw − w(µ)kWq ≤ ρ0 the solution w+ of the primal dual Newton step (8.1) satisfies kPµ (w+ ) − w(µ)kWp ≤ 2kw+ − w(µ)kWp   ′ η ηq′ (9.7) ≤ 2C ω(4kw − w(µ)kWq )1/q + kw − w(µ)kWq kw − w(µ)kWq = o(kw − w(µ)kWq )

with (9.8)

η=

(q − p) min(p, q − p) , pq(p + 1)

q′ =

qp . q−p

Moreover, if w ¯ satisfies strong strict complementarity (9.4) then (9.9)

kPµ (w+ ) − w(µ)kWp ≤ 2kw+ − w(µ)kWp ≤ 2Ckw − w(µ)k1+η Wq

with (9.10)

η=

min(1, κ)(q − p) min(p, q − p) . p2 (q + 1) + min(1, κ)p(q − p)

Proof. We know by Corollary 4.3 that w(µ) is uniformly bounded in Wq for all µ ∈ (0, µ0 ]. Thus, kw − w(µ)kWq ≤ ρ0 implies kwkWq ≤ M for some constant M > 0. Hence, all µ, w in (8.2) are contained in a bounded subset of {(µ, w) ∈ (0, ∞) × N−∞,q (µ)} and Lemma 5.1 yields a constant C > 0 with k(S(w)DFµ(w))−1 kW ′ ,Wt ≤ C, t

p ≤ t ≤ q,

where S(w) is the scaling operator (5.1). Furthermore, (8.3) and (8.4) hold. Since the first three components of F : Wq → Wp′ are by (A1), (A5)q Lipschitz continuously differentiable on bounded subsets, we obtain (8.5), where with a Lipschitz constant L > 0 the following estimate holds: 2

kR1 (w(µ) − w)kV + kR2 (w(µ) − w)kp + kR3 (w(µ) − w)kΛ ≤ Lkw(µ) − wkWq . To obtain an operator with uniformly bounded inverse on the left hand side we multiply with the scaling operator S(w) in (5.1) and obtain   R1 (w(µ) − w)  R2 (w(µ) − w)      S(w)DFµ (w)(w+ − w(µ)) =  R3 (w(µ) − w)  .  (u(µ)−u)(za(µ)−za )    ua +za (u(µ)−u)(zb (µ)−zb ) ub +zb

35

Therefore, 2

kw+ − w(µ)kWp ≤ k(S(w)DFµ(w))−1 kW ′ ,Wp Lkw(µ) − wkWq p



!

(u(µ) − u)(za (µ) − za )

(u(µ) − u)(zb (µ) − zb )

+

. +



ua + za ub + zb p p

This yields (9.11)

2

kw+ − w(µ)kWp ≤ C(Lkw − w(µ)kWq + kRa kp + kRb kp )

with Ra :=

(u(µ) − u)(za (µ) − za ) , ua + za

Rb :=

(u(µ) − u)(zb (µ) − zb ) . ub + zb

It remains to estimate kRa kp + kRb kp . We show first that for w ∈ N−∞,q (µ)   1 |Ra | ≤ 1 + √ max(|u − u(µ)|, |za − za (µ)|), 2 γ   1 |Rb | ≤ 1 + √ max(|u − u(µ)|, |zb − zb (µ)|). 2 γ To this end we note that (u(µ) − a)za (µ) = µ, (u − a)za ≥ γµ. This yields with ua , za ≥ 0 √ √ ua + za ≥ 2 ua za ≥ 2 γµ. Now we have     −ua ua (µ) za (µ) −za Ra = + + (za (µ) − za ) = (ua (µ) − ua ). ua + za ua + za ua + za ua + za √ √ Since ua (µ)za (µ) = µ we have min(ua (µ), za (µ)) ≤ µ. On {ua (µ) ≤ µ} we have   1 |Ra | ≤ 1 + √ |za − za (µ)| 2 γ √ and on {za (µ) ≤ µ} we have   1 |ua − ua (µ)|. |Ra | ≤ 1 + √ 2 γ The estimate for |Rb | is obtained in the same way. To estimate kRa kp + kRb kp we split Ω for an arbitrary β ∈ (0, min(1, (q − p)/p)) into the sets β

J c = Ω \ J.

J = {ua + za ≥ kw − w(µ)kWq }, We have with ωµ in (9.5)

leb ({ua + za ≤ t}) ≤ leb ({ua (µ) + za (µ) − |ua − ua (µ)| − |za − za (µ)| ≤ t}) ≤ ωµ (2t) + leb ({|ua − ua (µ)| + |za − za (µ)| ≥ t})

≤ ωµ (2t) + leb ({|ua − ua (µ)| ≥ t/2}) + leb (|za − za (µ)| ≥ t/2}) q

q

≤ ωµ (2t) + (t/2)−q (kua − ua (µ)kq + kza − za (µ)kq ) q

≤ ωµ (2t) + 2(t/2)−q kw − w(µ)kWq 36

This yields the upper bound for the measure of J c (1−β)q

β

leb (J c ) ≤ ωµ (2kw − w(µ)kWq ) + 2q+1 kw − w(µ)kWq

.

Using kuvkp ≤ kukq′ kvkq , q ′ = pq/(q − p), we have 1−β

kRa kp,J ≤ ku − u(µ)kq′ ,J kza − za (µ)kq,J . If q ′ ≤ q this yields 1−β

kRa kp,J ≤ cq,q′ ku − u(µ)kq,J kza − za (µ)kq,J q/q′

otherwise ku − u(µ)kq′ ≤ ku − u(µ)kq

1−(q−p)/p

kRa kp,J ≤ kb − ak∞

1−q/q′

kb − ak∞

by interpolation and thus (q−p)/p

ku − u(µ)kq,J

1−β

kza − za (µ)kq,J .

Combining both cases we obtain 1+min(1,(q−p)/p)−β

kRa kp,J ≤ C1 kw − w(µ)kWq

On the complement set J c we obtain by using that k1kq′ ,J c = leb (J c ) kRa kp,J c

.

1/q′

   1 k1kq′ ,J c ku − u(µ)kq,J c + kza − za (µ)kq,J c √ 2 γ  1 1 leb (J c ) q′ kw − w(µ)kWq √ γ   1 1  (1−β)q q′ β q+1 ≤ 2+ √ kw − w(µ)kWq ωµ (2kw − w(µ)kWq ) + 2 kw − w(µ)kWq γ    (1−β)q q+1 1 1 ′ β kw − w(µ)kWq . ≤ 2+ √ ωµ (2kw − w(µ)kWq ) q′ + 2 q′ kw − w(µ)kWqq γ

 ≤ 1+  ≤ 2+

Using Lemma 9.3 and q/q ′ = (q − p)/p, we finally obtain constants C2 , C3 > 0 with min(1,(q−p)/p)−β

kRa kp ≤ C1 kw − w(µ)kWq kw − w(µ)kWq   βq/(q+1) (q−p)/(qp) β(q−p)/(p(q+1)) ) + kw − w(µ)kWq + C2 ω(4kw − w(µ)kWq kw − w(µ)kWq (1−β)(q−p)/p

+ C3 kw − w(µ)kWq

kw − w(µ)kWq .

The last term has at least the order of the first term. Balancing the orders of the first and second term leads to β=

(q + 1) min(p, q − p) q(p + 1)

and results in the estimate   ′ η ηq′ kRa kp ≤ C ′ ω(4kw − w(µ)kWq )1/q + kw − w(µ)kWq kw − w(µ)kWq with

η=

(q − p) min(p, q − p) , pq(p + 1) 37

q′ =

qp . q−p

The same estimate is valid for kRb kp . Inserting this in (9.11) yields (9.7). If in addition strong strict complementarity (9.4) holds, i.e., ω(t) ≤ Cc tκ , then the middle term min(1,κ)β(q−p)/(p(q+1)) has order O(kw − w(µ)kWq ) and balancing with the first term gives β=

(q + 1) min(p, q − p) . p(q + 1) + min(1, κ)(q − p)

Inserting this choice of β leads to the asserted estimate (9.9). We see that a norm gap occurs in the estimates (9.7), (9.9). To close the norm gap we will use a smoothing step. 9.2. Smoothing steps. We construct now an operator Qµ : (y, u, λ, za , zb ) ∈ Wp 7→ (y, u ˜, λ, z˜a , z˜b ) ∈ Wq such that there exists a constant LS > 0 with kQµ (w) − w(µ)kWq ≤ LS kw − w(µ)kWp

(9.12)

to close the norm gap in (9.7). Let (A1)-(A5)q and the assumptions of Lemma 6.4 hold. Then w(µ) is the unique solution of (2.9) and satisfies with the notations of (A5)q , 5. in particular 0 = ℓu (w(µ)) = β(u(µ)) + gˆs (y(µ), u(µ), λ(µ)) + zb (µ) − za (µ) µ µ = β(u(µ)) + − + gˆs (y(µ), u(µ), λ(µ)) b − u(µ) u(µ) − a =: βµ (u(µ)) + gˆs (y(µ), u(µ), λ(µ)), where β ∈ C 1 (R), β ′ ≥ α0 > 0 and where gˆs : (y, u, λ) ∈ Y × U × Σ 7→ Ju (y, u) − β(u) + cu (y, u)∗ λ ∈ Lq (Ω) is by (A5)q , 5. well defined and Lipschitz on bounded sets. Since the mappings βµ;ξ : t ∈ (a(ξ), b(ξ)) 7→ β(t) +

µ µ − b(ξ) − t t − a(ξ)

−1 ′ satisfy βµ;ξ ≥ α0 > 0 and βµ;ξ ((a(ξ), b(ξ))) = R, the inverse mappings βµ;ξ : R → (a(ξ), b(ξ)) exist and are Lipschitz continuous with Lipschitz constant ≤ 1/α0 . Thus, also the mapping

βµ : u ∈ {v : a < v < b} 7→ βµ (u) = β(u) +

µ µ − b−u u−a

has a Lipschitz continuous inverse βµ−1 : Lq (Ω) → ({v : a < v < b} , k · kq ) and we have u(µ) = βµ−1 (−ˆ gs (y(µ), u(µ), λ(µ))). Thus, given any w ∈ Wq the “smoothed” control us := βµ−1 (−ˆ gs (y, u, λ))

(9.13) satisfies (9.14)

kus − u(µ)kq ≤

1 Lg kˆ gs (y, u, λ) − gˆs (y(µ), u(µ), λ(µ))kq ≤ kw − w(µ)kWp α0 α0 38

with a Lipschitz constant Lg of gˆs according to (A5)q , 5. Smoothing of the za , zb -components can now be obtained by using the identities za (µ) =

(9.15)

µ , u(µ) − a

zb (µ) =

µ . b − u(µ)

In fact, after computing us we choose zas , zbs according to zas =

(9.16)

us

µ , −a

zbs =

µ . b − us

We will see that this leads in fact to an operator that has the smoothing property (9.12). D EFINITION 9.5. The smoothing operator Qµ : Wp → Wq is defined by Qµ (y, u, λ, za , zb ) = (y, us , λ, zas , zbs ) with us according to (9.13) and zas , zbs according to (9.16), where gˆs (y, u, λ) = Ju (y, u) − β(u) + cu (y, u)∗ λ according to (A5)q , 5. T HEOREM 9.6. Let the assumptions of Lemma 6.3 hold. Then for any µ0 > 0, ρ0 > 0 there is a constant LS > 0 such that the smoothing operator Qµ : Wp → Wq of Definition 9.5 is well defined and satisfies (9.17) kQµ (w) − w(µ)kWq ≤ LS kw − w(µ)kWp ∀ w ∈ Wp , kw − w(µ)kWp ≤ ρ0 , ∀ µ ∈ (0, µ0 ]. Moreover, (y, us , λ, zas , zbs ) = Qµ (w) satisfies a < us < b,

(us − a)zas = µ,

(b − us )zbs = µ

and thus Qµ (w) ∈ N−∞,q (µ). Proof. We have already shown that (9.14) holds, where Lg is the Lipschitz constant of gˆs on the bounded set of all w ∈ Wp considered in (9.17). Moreover, we have by (9.16) and the choice (9.13) of us zbs − zas = −β(us ) − gˆs (y, u, λ).

(9.18) On the other hand (9.19)

zb (µ) − za (µ) = −β(u(µ)) − gˆs (y(µ), u(µ), λ(µ)).

Now consider the following partition of Ω Ωa = {us ≥ (b + a)/2, u(µ) ≥ (b + a)/2} ,

Ωb = {us < (b + a)/2, u(µ) < (b + a)/2} ,

Ω′a = {us ≥ (b + a)/2, u(µ) < (b + a)/2} ,

Ω′b = {us < (b + a)/2, u(µ) ≥ (b + a)/2} .

Then we obtain on Ωa by (9.16) and (9.19) µ µ µ|us − u(µ)| 4µ s |(za − za (µ))|Ωa | = s − = s ≤ 2 |(us − u(µ))Ωa | u − a u(µ) − a Ωa (u − a)(u(µ) − a) Ωa ν Now (9.18), (9.19) yield

|(zbs − zb (µ))Ωa | ≤ |(zas − za (µ))Ωa | + |(β(us ) − β(u(µ)))Ωa | + |(ˆ gs (y, u, λ) − gˆs (y(µ), u(µ), λ(µ)))Ωa |. 39

Completely similar we obtain on Ωb by (9.16) and (9.19) µ µ s ≤ 4µ |(us − u(µ))Ω | |(zb − zb (µ))Ωb | = − b s b−u b − u(µ) Ωb ν2

Now again (9.18), (9.19) yield

|(zas − za (µ))Ωb | ≤ |(zbs − zb (µ))Ωb | + |(β(us ) − β(u(µ)))Ωb | + |(ˆ gs (y, u, λ) − gˆs (y(µ), u(µ), λ(µ)))Ωb |. Finally, (9.16), (9.15) yield on Ω′a (zas − za (µ))Ω′a < 0,

(zbs − zb (µ))Ω′a > 0

and thus the difference of (9.18), (9.19) yields |(zbs − zb (µ))Ω′a | + |(zas − za (µ))Ω′a | = |(zbs − zb (µ) + za (µ) − zas )Ω′a | ≤ |(β(us ) − β(u(µ)))Ω′a | + |(ˆ gs (y, u, λ) − gˆs (y(µ), u(µ), λ(µ)))Ω′a |.

Analogously we obtain on Ω′b |(zbs − zb (µ))Ω′b | + |(zas − za (µ))Ω′b | = |(zbs − zb (µ) + za (µ) − zas )Ω′b | gs (y, u, λ) − gˆs (y(µ), u(µ), λ(µ)))Ω′b |. ≤ |(β(us ) − β(u(µ)))Ω′b | + |(ˆ

Taking all together, we have shown that |zbs − zb (µ)| + |zas − za (µ)| ≤ |β(us ) − β(u(µ))| + |ˆ gs (y, u, λ) − gˆs (y(µ), u(µ), λ(µ))| + ≤ |ˆ gs (y, u, λ) − gˆs (y(µ), u(µ), λ(µ))| +

8µ0 + sup β ′ ν2 [inf a,sup b]

8µ0 s |u − u(µ)| 2 !ν |us − u(µ)|.

Hence, (9.14) yields kzbs − zb (µ)kq + kzas − za (µ)kq ≤

Lg Lg + α0

8µ0 + sup β ′ ν2 [inf a,sup b]

!!

kw − w(µ)kWp .

Together with (9.14), (9.17) is proven. The last statement is obvious by the definition of Qµ . If we replace the projection Pµ in Lemma 9.4 by the smoothing operator Qµ (which yields a point in the neighborhood) then we obtain directly the following corollary. C OROLLARY 9.7. Let the assumptions of Lemma 9.4 hold. If w ¯ = limµց0 w(µ) satisfies strict complementarity then there exist constants LS , C > 0 such that for any 0 < µ ≤ µ0 and for any w ∈ N−∞,q (µ) with kw − w(µ)kWq ≤ ρ0 the solution w+ of the primal dual Newton step (8.1) satisfies (9.20) kQµ (w+ ) − w(µ)kWq ≤ LS kw+ − w(µ)kWp   ′ ηq′ η ≤ 2LS C ω(4kw − w(µ)kWq )1/q + kw − w(µ)kWq kw − w(µ)kWq = o(kw − w(µ)kWq )

40

with η=

(q − p) min(p, q − p) , pq(p + 1)

q′ =

qp . q−p

Moreover, if w ¯ satisfies strong strict complementarity (9.4) then (9.21)

1+η

kQµ (w+ ) − w(µ)kWq ≤ LS kw+ − w(µ)kWp ≤ 2LS Ckw − w(µ)kWq

with η=

min(1, κ)(q − p) min(p, q − p) . p2 (q + 1) + min(1, κ)p(q − p)

Note that no norm gap appears in (9.20) and (9.21). 9.3. A modified interior-point method with smoothing step. We consider now the following modification of Algorithm PDPF. Algorithm PDPFS: Primal-Dual Interior-Point Method with Smoothing. 1. Choose µ0 > 0, γ ∈ (0, 1), µ−∞ > 0, (σmin,k ) ⊂ (0, 1), µ0 ∈ (0, µ−∞ ], and w0 := (y0 , u0 , λ0 , za,0 , zb,0 ) ∈ N−∞,q (µ0 ). Set k := 0. 2. Solve the primal-dual Newton system DFµk (wk )sk = −Fµk (wk ) and set wk+1 := Qµk (wk + sk ). 3. Choose σk ∈ [σmin,k , 1] and set µk+1 := σk µk . 4. Set k := k + 1 and goto step 2. R EMARK 9.8. As already algorithm PDPF, we have stated this algorithm in a very basic form. We now will prove global linear and local superlinear convergence of algorithm PDPFS. T HEOREM 9.9. Let µ0 > 0 and ρ0 > 0 be fixed. Assume that (A1)–(A4) and (A5)q with some q ∈ (p, ∞) hold, that u ∈ B 7→ J (y(u), u) is convex (to ensure uniqueness of the central path), and that w ¯ = limµց0 w(µ) satisfies strict complementarity. Furthermore, assume that (9.6)

2 ˆ (v, H(y, u, λ)v) ≥ αkvk2

∀ v ∈ L2 (Ω),

∀ w ∈ N−∞,q (µ), kw − w(µ)kWq ≤ ρ0 , µ ∈ (0, µ0 ].

Then there exist constants ρ¯ ∈ (0, ρ0 ], σ ¯max ∈ (0, 1) and a sequence σmax > σ ¯min,k ց 0 such that Algorithm PFPFS has the following convergence property: For all initial points w0 ∈ N−∞,q (µ0 ) with kw0 − w(µ0 )kWq ≤ ρ¯ and every choice σmin,k ∈ [¯ σmin,k , σmax ], Algorithm PDPFS generates a sequence with (9.22) (9.23) (9.24)

√ kwk − w(µk )kWq ≤ C µk−1 √ kwk − wk ¯ Wq ≤ (C + L) µk−1

k µk = σ0 · · · σk−1 µ0 ≤ σmax µ0 → 0.

Here, C > 0 is a constant and L is the H¨older constant of the central path on (0, µ0 ]. This shows global convergence with R-linear rate. 41

The above conditions allow to choose 0 < σmin,k ≤ σk ց 0, which yields R-superlinear convergence. If strong strict complementarity (9.4) holds at w, ¯ then σ ¯min,k = O(µηk ) with η > 0 according to (9.10) and the choice σmin,k = O(¯ σmin,k ) yields R-superlinear convergence with rate 1 + η. Proof. Consider an arbitrary µ ∈ (0, µ0 ]. Then there exists by Corollary 9.7 a constant C > 0 such that for any w ∈ N−∞,q (µ) with kw − w(µ)kWq ≤ ρ0 the estimate holds (9.25)

kQµ (w+ ) − w(µ)kWq ≤ ψ(kw+ − w(µ)kWq )kw+ − w(µ)kWq

with ′



ψ(t) = 2LS C(ω(4tηq )1/q + tη )

(9.26)

and η > 0, q ′ according to (9.8), where w+ is the result of the primal-dual Newton step (8.1). Choose τ > 0 such that ψ(τ )
1 the choice σk ∈ [¯ σmin,k , min(κ¯ σmin,k , σmax )] yields µk+1 = o(µk ) and thus R-superlinear convergence. Finally, let strong strict complementarity be satisfied. Then there exists by Corollary 9.7 a constant C > 0 such that (9.25) is satisfied with with (9.32)

ψ(t) = 2LS Ctη

and η > 0 according to (9.10). Now (9.30), (9.31) can be shown exactly as above for the definition (9.32) of ψ and the corresponding σ ¯min,k according to (9.29). Moreover, (9.29) shows now that for k √ large enough σ ¯min,k = 4ψ(2L µk )2 = O(µηk ) → 0 and therefore for any constant κ > 1 the choice ). σk ∈ [¯ σmin,k , min(κ¯ σmin,k , σmax )] yields µk+1 = O(µ1+η k 10. Numerical results. We conclude the paper by a demonstration of the practical performance of the proposed method. We consider the elliptic control problem (2.2) with Ω = (0, 1)2 , i.e., n = 2, a = −5,

b = 0,

α = 0.001,

yd = sin(πξ1 ) sin(πξ2 ) cos(3ξ1 ). Both, the first and the second setting are applicable. In the first setting (Y = H01 (Ω), etc.), our assumptions are satisfied for any value q ∈ (p, ∞). In the second setting (Y = H01 (Ω) ∩ H 2 (Ω), etc.), our assumptions are satisfied for any value q ∈ (p, ∞]. In our implementation of algorithm PDPFS, the adjustment of µk , and thus the choice of σk , is done adaptively based on the value of µk . It turns out to be beneficial (sometimes one iteration is 43

It. 0 1 2 3 4 5 6 7

kFµk (wk )kW ′ µk ∞ Grid 4: 30529 Nodes 8.60e+01 2.50e+00 3.23e-07 2.50e-01 7.86e-02 1.57e-02 7.75e-01 3.95e-04 7.93e-01 2.90e-06 1.10e-01 4.13e-09 1.66e-03 6.62e-13 5.05e-09

TABLE 10.1 Results for algorithm PDPFS applied on the finest grid.

saved) to compute not only the smoothing step, but also the projected step, and to use the latter if it reduces the norm of Fµk significantly more than the smoothing step. The discretization is done by linear finite elements. We generate a hierarchy of 5 triangular grids, the coarsest with 139 nodes, the finest with 30529 nodes. Starting from the coarsest, the next finer grid is obtained by uniform refinement of the previous grid. We perform two numerical tests, the second (nested iteration) being of higher practical relevance. In the first test we solve the fine grid problem directly with the PDPFS method from the following initial point: yh0 = yd,h ,

λ0h = 0,

0 0 (u0h , za,h , zb,h ) obtained by one smoothing step.

Here, the subscript h indicates that we are working in finite element space, and yd,h is the nodewise interpolation of yd . In the second test, we use nested iteration, i.e., we start on the coarsest grid, solve the discretized problem approximately, and prolongate the solution to obtain an initial point on the next finer level. This is repeated until the problem on the finest grid is solved up to the desired accuracy. The accuracy requirements are increased from grid to grid. The prolongation is performed for the state and the 0 0 adjoint state only, the initial control u0h and the initial multipliers za,h , zb,h are obtained by applying a smoothing step. The approximate solution of the discrete problems on the various grids is done by 0 0 a discrete version of Algorithm PDPFS. The inital µ-value is computed from u0h , za,h , and zb,h . The results of the first test can be found in Table 10.1. We observe fast local convergence of µk and of the residual as predicted by the theory. The results for the second test in Table 10.2 show that nested iteration is very efficient. Thanks to the smoothing step, we obtain very good initial points, since only the smooth parts yh and λh of the coarse grid solution are prolongated. On the finer grids, we need only two iterations, which is a very good performance. Acknowledgements. The authors would like to thank the referees for their constructive suggestions, which helped to improve the paper. The second author was supported in parts by the DFG within the Collaborative Research Center SFB 666. REFERENCES ¨ [1] E. A LLGOWER , K. B OHMER , F. P OTRA , AND W. R HEINBOLDT , A mesh-independence principle for operator equations and their discretizations, SIAM J. Numer. Anal., 23 (1986), pp. 160–169. [2] W. A LT , Discretization and mesh-independence of Newton’s method for generalized equations, in Fiacco, A. (ed.), Mathematical programming with data perturbations, Lecture Notes in Pure and Appl. Math. 195, Dekker, New York, 1998, pp. 1–30. 44

It. 0 1 2 3 4 5 6 0 1 2 3

kFµk (wk )kW ′ µk ∞ Grid 0: 139 Nodes 1.89e+01 2.50e+00 2.47e-07 2.50e-01 7.31e-02 1.57e-02 7.37e-01 3.95e-04 6.08e-01 2.89e-06 7.43e-02 4.13e-09 1.38e-03 Grid 1: 513 Nodes 1.34e+00 7.22e-06 1.15e-03 1.40e-08 1.93e-03 3.36e-12 2.61e-06

It. 0 1 2 0 1 2 0 1 2

kFµk (wk )kW ′ µk ∞ Grid 2: 1969 Nodes 1.62e+00 1.09e-06 9.19e-05 1.12e-09 3.16e-07 Grid 3: 7713 Nodes 1.75e+00 1.67e-07 1.59e-05 9.18e-11 4.42e-08 Grid 4: 30529 Nodes 1.80e+00 1.93e-08 1.49e-06 5.17e-12 4.86e-09

TABLE 10.2 Results for the nested iteration.

¨ [3] M. B ERGOUNIOUX , M. H ADDOU , M. H INTERM ULLER , AND K. K UNISCH, A comparison of a moreau-yosida based active set strategy and interior point methods for constrained optimal control problems, SIAM J. Optim., 11 (2000), pp. 495–521. [4] M. B ERGOUNIOUX , K. I TO , AND K. K UNISCH, Primal-dual strategy for constrained optimal control problems, SIAM J. Control Optim., 37 (1999), pp. 1176–1194. [5] J. F. B ONNANS AND A. S HAPIRO, Optimization problems with perturbations: a guided tour, SIAM Rev., 40 (1998), pp. 228–264. [6] J. C. DE LOS R EYES AND K. K UNISCH, A semi-smooth Newton method for control constrained boundary optimal control of the Navier-Stokes equations, Nonlinear Anal., 62 (2005), pp. 1289–1316. [7] A. F ORSGREN , P. E. G ILL , AND M. H. W RIGHT , Interior methods for nonlinear optimization, SIAM Rev., 44 (2002), pp. 525–597. ¨ [8] M. H INTERM ULLER , K. I TO , AND K. K UNISCH, The primal-dual active set strategy as a semismooth newton method, SIAM J. Optim., 13 (2003), pp. 865–888. ¨ [9] M. H INTERM ULLER AND M. U LBRICH , A mesh-independence result for semismooth Newton methods, Math. Program., 101 (2004), pp. 151–184. [10] M. H INZE AND K. K UNISCH, Second order methods for optimal control of time-dependent fluid flow, SIAM J. Control Optim., 40 (2001), pp. 925–946 (electronic). [11] J. J OST , Postmodern analysis, Springer-Verlag, Berlin, 1998. [12] C. T. K ELLEY AND E. W. S ACHS, Multilevel algorithms for constrained compact fixed point problems, SIAM J. Sci. Comput., 15 (1994), pp. 645–667. [13] H. D. M ITTELMANN AND H. M AURER, Solving elliptic control problems with interior point and SQP methods: Control and state constraints, J. Comp. Appl. Math., 120 (2000), pp. 175–195. ¨ ¨ [14] U. P R UFERT , F. T R OLTZSCH , AND M. W EISER, The convergence of an interior point method for an elliptic control problem with mixed control-state constraints, Comp. Opt. Appl., (to appear). [15] S. M. ROBINSON, Stability theory for systems of inequalities. II: Differentiable nonlinear systems, SIAM J. Numer. Anal., 13 (1976), pp. 497–513. [16] A. S CHIELA AND M. W EISER, Superlinear convergence of the control reduced interior point method for PDE constrained optimization, Comp. Opt. Appl., (to appear). ¨ [17] F. T R OLTZSCH , Optimale Steuerung partieller Differentialgleichungen: Theorie, Verfahren und Anwendungen, Vieweg, Berlin, 2005. [18] M. U LBRICH, Nonsmooth Newton-like methods for variational inequalities and constrained optimization problems in function spaces, Habilitationsschrift, Zentrum Mathematik, TU M¨unchen, 2002. [19] , Constrained optimal control of Navier-Stokes flow by semismooth Newton methods, Systems Control Lett., 48 (2003), pp. 297–311. [20] , Semismooth Newton methods for operator equations in function spaces, SIAM J. Optim., 13 (2003), pp. 805– 841. [21] M. U LBRICH AND S. U LBRICH, Global convergence of trust-region interior-point algorithms for infinite-dimensional nonconvex minimization subject to pointwise bounds, SIAM J. Control Optim., 37 (1999), pp. 731–764. [22] , Superlinear convergence of affine-scaling interior-point Newton methods for infinite-dimensional nonlinear problems with pointwise bounds, SIAM J. Control Optim., 38 (2000), pp. 1938–1984. 45

[23] M. W EISER, Interior point methods in function space, SIAM J. Control Optim., 44 (2005), pp. 1766–1786. [24] M. W EISER AND P. D EUFLHARD, Inexact central path following algorithms for optimal control problems, SIAM J. Control Optim., (to appear). ¨ [25] M. W EISER , T. G ANZLER , AND A. S CHIELA, A control reduced primal interior point method for a class of control constrained optimal control problems, Comp. Opt. Appl., (to appear). [26] M. W EISER AND A. S CHIELA, Function space interior point methods for PDE constrained optimization., PAMM, 4 (2004), pp. 43–46. [27] E. Z EIDLER, Nonlinear functional analysis and its applications III, Springer-Verlag, New York, 1985.

46