Properties of an augmented Lagrangian for design optimization

Optimization Methods and Software Vol. 00, No. 00, January 2008, 1–23

RESEARCH ARTICLE

Properties of an augmented Lagrangian for design optimization

Adel Hamdi and Andreas Griewank
Institut für Mathematik, Fakultät Math.-Nat. II, Humboldt-Universität, Rudower Chaussee 25, Adlershof; Post: Unter den Linden 6, 10099 Berlin, Germany
(v4.1 released October 2007)

We consider the task of design optimization, where the constraint is a state equation that can only be solved by a typically rather slowly converging fixed point solver. This process can be augmented by a corresponding adjoint solver and, based on the resulting approximate reduced derivatives, also by an optimization iteration, which actually changes the design. To coordinate the three iterative processes, we consider an augmented Lagrangian function that should be consistently reduced whatever combination and sequence of primal, dual and optimization steps one chooses. We present some numerical results for a one-shot strategy where all variables are changed simultaneously.

Keywords: design optimization, primal feasibility, dual feasibility, merit function, descent approach, fixed point solver. AMS Subject Classification: 90C30

1. Introduction

Design optimization problems are distinguished from general nonlinear programming problems (NLP) by the fact that the vector of variables is partitioned into a state vector and design variables [12]. For applications of this scenario in Computational Fluid Dynamics (CFD) see for example [14], [16] and [17]. In this paper, we are interested in solving an optimization problem with an equality constraint

    (P)    min_{y,u} f(y, u)   s.t.   c(y, u) = 0.

Here, y ∈ Y is the state vector and u is the design vector, which is restricted to a closed convex subset A of the design (optimization) space U. For simplicity, we assume that not only Y but also U, and thus their Cartesian product X = Y × U, are Hilbert spaces. c : X → Y defines the state equation and f : X → R the objective function. When Y and U have finite dimensions n = dim(Y) and m = dim(U), their elements may be identified with coordinate vectors in R^n and R^m with respect to suitable Hilbert bases. This convention allows us to write duals as transposed vectors and inner products as the usual scalar products in Euclidean space. In fact, all assertions in this paper remain true if dim(Y) = ∞.

Corresponding author. Email: [email protected]

ISSN: 1055-6788 print/ISSN 1029-4937 online © 2008 Taylor & Francis

DOI: 10.1080/1055678xxxxxxxxxxxxx http://www.informaworld.com


A key assumption in our present paper is that, given u ∈ A, the model is always solvable with respect to y and that ∂c/∂y(y, u) is always invertible on Y × A. The problem of augmenting fixed point solvers for PDEs with sensitivity and optimization calculations has been considered by various authors during the last few years in [9], [12], [10] and [11]. In [9], the author used the one-shot technique to solve a constrained optimization problem. This technique aims at attaining feasibility and optimality simultaneously (see [9] and [11]): within one step the primal, dual and design variables are updated simultaneously. Using automatic differentiation, this requires only one simultaneous evaluation of the function value with normal and adjoint derivatives. The author studied in [9] the construction of a preconditioner which, under some conditions, may guarantee the local contractivity of the full step coupled iteration. Via an eigenvalue analysis, he proved necessary but not sufficient conditions on the preconditioner for the overall iteration to be locally contractive with respect to some ellipsoidal norm. Deriving conditions on the preconditioner that ensure even local convergence of the coupled full step iteration seems to be difficult. In this paper, we study the resolution of the optimization problem (P) by minimizing an exact penalty function (see [18, 19], [4] and [5]) involving weighted primal and dual residuals added to the Lagrangian. This approach should be useful for any combination and sequencing of steps for improving primal and dual feasibility and optimality. In Section 2, we establish a result on dual retardation, i.e. the slow-down in the overall convergence when the adjoint iteration is coupled with the original primal iteration. There, we only require Lipschitz continuity of the derivatives of f and c. Section 3 is devoted to deriving conditions on the weighting coefficients occurring in the penalty function that ensure its monotonic reduction throughout the iterations. In Section 4, we establish a reasonable choice for the weighting coefficients, we elaborate a line-search procedure that does not require the computation of any second derivatives of the original problem, and we present a choice for the preconditioner. Finally, Section 5 is reserved for some numerical tests carried out by considering the Bratu equation on the unit square.

1.1. Problem statement

We suppose that the equality constraint c(y, u) = 0 can be transformed into a fixed point equation y = G(y, u). Therefore, we are interested in solving the following constrained optimization problem:

    (P_G)    min_{y,u} f(y, u)   s.t.   y = G(y, u).

Moreover, we assume that the functions f and G are C^{2,1} on the closed convex set Y × A and that a uniform contraction factor for the iteration function G is given by

    ‖G_y(y, u)‖ = ‖G_y(y, u)^⊤‖ ≤ ρ < 1   ⟹   ‖G(y_1, u) − G(y_2, u)‖ ≤ ρ ‖y_1 − y_2‖.   (1)

The above implication follows from the mean value theorem on any convex subdomain of Y. The key assumption is that the spectral radius of G_y is less than one, which implies (1) for a suitable inner product norm, which we will simply denote by ‖·‖. Hence, by the Banach fixed point theorem, for fixed u the sequence y_{k+1} = G(y_k, u) converges to a unique limit y* = y*(u). Furthermore, the Lagrangian associated with the constrained optimization problem (P_G) is defined as follows:

    L(y, ȳ, u) = f(y, u) + (G(y, u) − y)^⊤ ȳ = N(y, ȳ, u) − y^⊤ ȳ,   (2)

where N(y, ȳ, u) := f(y, u) + G(y, u)^⊤ ȳ denotes the shifted Lagrangian. Here, as discussed in [10], ȳ does not need to be the exact adjoint of y but may represent an approximation.

According to the first order necessary condition, a KKT point (y*, ȳ*, u*) of the problem (P_G) must satisfy

    y* = G(y*, u*),
    ȳ* = N_y(y*, ȳ*, u*)^⊤ = f_y(y*, u*)^⊤ + G_y(y*, u*)^⊤ ȳ*,   (3)
    0  = N_u(y*, ȳ*, u*)^⊤ = f_u(y*, u*)^⊤ + G_u(y*, u*)^⊤ ȳ*.

We denote by F the feasible set F := {(y, u) ∈ Y × A : y = G(y, u)} and by T the tangent plane T := {z : ∇_{y,u}(G(y, u) − y) z = 0}. Then z = Z z̃, where z̃ ∈ R^m and

    Z = [ (I − G_y)^{−1} G_u ]
        [          I         ].   (4)

As a consequence of the contractivity assumption (1) and of the Perturbation Lemma [22], I − G_y is invertible and thus Z is well defined. According to the second order necessary condition, the reduced Hessian

    H = Z^⊤ N_xx Z,   where   N_xx = [ N_yy  N_yu ]
                                     [ N_uy  N_uu ],   (5)

must be positive semi-definite at a local minimizer. In the remainder, we will make the slightly stronger assumption that second order sufficiency is satisfied, i.e. H is positive definite. One can use the following coupled full step iteration, called the one-shot strategy, to reach a KKT point (see [9] and [11]):

    y_{k+1} = G(y_k, u_k),
    ȳ_{k+1} = N_y(y_k, ȳ_k, u_k)^⊤,   (6)
    u_{k+1} = u_k − B_k^{−1} N_u(y_k, ȳ_k, u_k)^⊤,

where B_k is a suitable preconditioner. As we will see, the contractivity assumption (1) on G_y in Y × A ensures that the coupled iteration (6) converges to a KKT point (y*, ȳ*, u*) provided the preconditioner B_k is sufficiently large. The first equation of the system (6) converges ρ-linearly for fixed u. Moreover, although the second equation exhibits a certain time-lag (see [12]), it converges with the same asymptotic R-factor. As far as the convergence of the coupled iteration (6) is concerned, the aim is to define a preconditioner B_k that ensures convergence in a reasonable number of iterations compared to the iteration number needed to obtain feasibility (bounded retardation). The asymptotic rate of convergence of (6) to a limit point (y*, ȳ*, u*) is determined by the Jacobian, which takes the following 3 × 3 block form:

    J* = ∂(y_{k+1}, ȳ_{k+1}, u_{k+1}) / ∂(y_k, ȳ_k, u_k) |_{(y*, ȳ*, u*)}

       = [ G_y             0              G_u             ]
         [ N_yy            G_y^⊤          N_yu            ]   (7)
         [ −B^{−1} N_uy    −B^{−1} G_u^⊤  I − B^{−1} N_uu ].

The local convergence is ensured by the condition ρ̂(J*) < 1, where ρ̂(J*) is the spectral radius of J*. In [9], the author proves that, unless they happen to coincide with eigenvalues of G_y, the eigenvalues of J* solve the following nonlinear eigenvalue problem:

    det[(λ − 1)B + H(λ)] = 0,   where   H(λ) = Z(λ)^⊤ N_xx Z(λ)   and   Z(λ) = [ (λI − G_y)^{−1} G_u ]
                                                                               [          I          ].   (8)

Hence, we have H(1) = H and Z(1) = Z as considered in the second order optimality condition above. As discussed in [9], the conditions B = B^⊤ ≻ 0 and B ≻ ½ H(−1) are necessary but not sufficient to exclude eigenvalues λ ≥ 1 and λ ≤ −1. However, no constructive conditions to also bound complex eigenvalues below 1 in modulus have been found. Deriving conditions on the preconditioner B that ensure even local convergence of the coupled full step iteration (6) has proved to be difficult. In the present paper, we look for descent on a merit function of the augmented Lagrangian type (see [18, 19]) defined as follows:

    L^a(y, ȳ, u) = (α/2) ‖G(y, u) − y‖² + (β/2) ‖N_y(y, ȳ, u)^⊤ − ȳ‖² + N(y, ȳ, u) − ȳ^⊤ y.   (9)

We prove that under a certain condition on the weighting coefficients α and β, L^a is an exact penalty function. Throughout the paper, we use the following notations:

    Δy_k = G(y_k, u_k) − y_k,   Δȳ_k = N_y(y_k, ȳ_k, u_k)^⊤ − ȳ_k,   Δu_k = −B_k^{−1} N_u(y_k, ȳ_k, u_k)^⊤.   (10)

In the next section, we start by establishing a result concerning the coupling of the primal and dual iterations.
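Before turning to the analysis, the coupled full step iteration (6) with the increments (10) can be sketched in a few lines. The linear-quadratic model below (G(y, u) = Ay + bu + c with ‖A‖ < 1, f(y, u) = ½‖y‖² + ½‖u‖², and a fixed preconditioner B_k = 2I) is an illustrative assumption chosen only to make the sketch self-contained; it is not the paper's test case:

```python
import numpy as np

# One-shot iteration (6) with increments (10) on an assumed linear-quadratic
# model: G(y, u) = A y + b u + c with ||A|| < 1 and
# f(y, u) = 0.5*||y||^2 + 0.5*||u||^2, so that
#   N_y(y, ybar, u)^T = y + A^T ybar  and  N_u(y, ybar, u)^T = u + b^T ybar.
rng = np.random.default_rng(0)
n, m = 4, 2
A = 0.5 * np.eye(n) + 0.05 * rng.standard_normal((n, n))  # contractive part
b = 0.3 * rng.standard_normal((n, m))
c = rng.standard_normal(n)

def G(y, u):     return A @ y + b @ u + c
def Ny(y, ybar): return y + A.T @ ybar      # N_y(y, ybar, u)^T
def Nu(u, ybar): return u + b.T @ ybar      # N_u(y, ybar, u)^T

Bk = 2.0 * np.eye(m)                        # a (sufficiently large) preconditioner
y, ybar, u = np.zeros(n), np.zeros(n), np.zeros(m)
for k in range(300):
    # increments (10); primal, dual and design variables change simultaneously
    dy = G(y, u) - y
    dybar = Ny(y, ybar) - ybar
    du = -np.linalg.solve(Bk, Nu(u, ybar))
    y, ybar, u = y + dy, ybar + dybar, u + du

# at the limit point, the three residuals of the KKT system (3) vanish
print(np.linalg.norm(G(y, u) - y),
      np.linalg.norm(Ny(y, ybar) - ybar),
      np.linalg.norm(Nu(u, ybar)))
```

With ‖A‖ well below one and B_k large enough, all three residuals contract at a common R-linear rate, which is the bounded-retardation behaviour discussed above.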


2. Dual retardation

Since the adjoint equation in the middle of (3) depends on the primal solution y*, it cannot be solved accurately as long as y* is not known. In other words, the dual iterates ȳ_k generated by (6) will be affected by the remaining inaccuracy in the primal iterates y_k. Hence, we cannot expect the ȳ_k to converge faster than the y_k. In fact, they typically lag a little bit behind, as the perturbations of the adjoint equation caused by the errors in the primals y_k tend to accumulate initially. We refer to this delay relative to the primal iterates as dual retardation. Nevertheless, asymptotically the dual iterates tend to catch up with the primals, which is essentially the result given by the theorem below.

Theorem 2.1: Suppose that for fixed u, the functions f and G are once Lipschitz continuously differentiable with respect to y near the fixed point y* = y*(u). Furthermore, we assume that the primal iterates converge to a solution y* such that generically

    lim_{k→∞} ‖Δy_k‖ / ‖Δy_{k−1}‖ = ρ* ≡ ‖G_y(y*, u)‖.   (11)

Define for any given ε > 0 and positive coefficients α and β the smallest pair of integers (ℓ_p^ε, ℓ_d^ε) such that

    √α ‖Δy_{ℓ_p^ε}‖ ≤ ε   and   √β ‖Δȳ_{ℓ_d^ε}‖ ≤ ε.

Then, we have

    lim sup_{ε→0} ℓ_d^ε / ℓ_p^ε ≤ 1.

Proof: According to the assumption (11), we have

    ∀ Δρ ≥ 0, ∃ N_1 > 0 such that ∀ k ≥ N_1:   (ρ* − Δρ)^k ≤ ‖Δy_{N_1+k}‖ / ‖Δy_{N_1}‖ ≤ (ρ* + Δρ)^k.   (12)

In addition, as G_y is a continuous function and lim_{k→∞} y_k = y*, we have lim_{k→∞} G_y(y_k, u) = G_y(y*, u). Thus,

    ∀ Δρ ≥ 0, ∃ N_2 > 0 such that ∀ k ≥ N_2:   ρ* − Δρ ≤ ‖G_y(y_k, u)‖ ≤ ρ* + Δρ.   (13)

Without restriction of generality, we start from an index k such that k ≥ N_2. From the Lipschitz continuity of f_y and G_y, we obtain

    ‖Δȳ_{k+1}‖ ≤ ‖G_y(y_{k+1}, u)^⊤ ȳ_{k+1} − G_y(y_k, u)^⊤ ȳ_k‖ + ‖f_y(y_{k+1}, u) − f_y(y_k, u)‖
              ≤ ‖(G_y(y_{k+1}, u) − G_y(y_k, u))^⊤ ȳ_k + G_y(y_{k+1}, u)^⊤ Δȳ_k‖ + ‖f_y(y_{k+1}, u) − f_y(y_k, u)‖
              ≤ ‖G_y(y_{k+1}, u)‖ ‖Δȳ_k‖ + (c_f + c_g ‖ȳ_k‖) ‖Δy_k‖   (14)
              ≤ (ρ* + Δρ) ‖Δȳ_k‖ + (c_f + c_g ‖ȳ_k‖) ‖Δy_k‖,


where c_f and c_g are the Lipschitz constants of f_y and G_y. Furthermore, we have

    ‖ȳ_k‖ = ‖G_y(y_{k−1}, u)^⊤ ȳ_{k−1} + f_y(y_{k−1}, u)^⊤‖ ≤ ρ ‖ȳ_{k−1}‖ + ‖f_y(y_{k−1}, u)‖,   ∀ k ≥ 1.

Moreover, we verify that ‖ȳ_1‖ ≤ max{‖ȳ_0‖, ‖f_y(y_0, u)‖ / (1 − ρ)}. Then, we prove by induction that

    ∀ k ≥ 1:   ‖ȳ_k‖ ≤ C_0 = max{ ‖ȳ_0‖,  sup_k ‖f_y(y_k, u)‖ / (1 − ρ) }.   (15)

Therefore, (14) and (15) lead to the following inequality:

    ‖Δȳ_{k+1}‖ ≤ (ρ* + Δρ) ‖Δȳ_k‖ + C ‖Δy_k‖,   (16)

where C = c_f + c_g C_0. In addition, we verify that ‖Δȳ_0‖ ≤ 2‖ȳ_0‖ + ‖f_y(y_0, u)‖. So, again by induction, we prove from (16) that

    ∀ k ≥ 0:   ‖Δȳ_k‖ ≤ (k + 1)(ρ* + Δρ)^k M.   (17)

In fact, assuming that this statement is fulfilled for the index k, then from (16) and with reference to (12) we obtain

    ‖Δȳ_{k+1}‖ ≤ (k + 1)(ρ* + Δρ)^{k+1} M + C (ρ* + Δρ)^k ‖Δy_0‖
              = (ρ* + Δρ)^{k+1} M ( k + 1 + C ‖Δy_0‖ / (M (ρ* + Δρ)) ).   (18)

Thus, choosing M such that C ‖Δy_0‖ / (M (ρ* + Δρ)) ≤ 1 and ‖Δȳ_0‖ ≤ M ensures that the statement (17) is fulfilled for all k ≥ 0. To this end, we consider M as follows:

    M ≥ max{ 2‖ȳ_0‖ + ‖f_y(y_0, u)‖,  C ‖Δy_0‖ / ρ* }.

As consequences of (12) and (17), for Δρ ≥ 0, any positive coefficients α and β and all k > N_1 we have

    √α (ρ* − Δρ)^{k−N_1} ‖Δy_{N_1}‖ ≤ √α ‖Δy_k‖,   √β ‖Δȳ_k‖ ≤ √β (k + 1)(ρ* + Δρ)^k M.   (19)

Let l_p^ε > N_1 be the smallest integer such that for Δρ ≥ 0 and given ε > 0 we have

    √α (ρ* − Δρ)^{l_p^ε − N_1} ‖Δy_{N_1}‖ ≤ ε,   (20)

and let l_d^ε be the biggest integer such that

    √β (l_d^ε + 1)(ρ* + Δρ)^{l_d^ε} M > ε.   (21)

Then, for Δρ ≥ 0 we obtain from (20) and (21) that

    (ρ* − Δρ)^{l_p^ε} ≤ (ρ* − Δρ)^{l_p^ε − N_1} ≤ ε / (√α ‖Δy_{N_1}‖) ≤ √(β/α) (l_d^ε + 1) (M / ‖Δy_{N_1}‖) (ρ* + Δρ)^{l_d^ε}.   (22)

Therefore, denoting M_{N_1} = √(β/α) M / ‖Δy_{N_1}‖, it follows from (22) that for Δρ ≥ 0 we have

    l_d^ε / l_p^ε − ln((l_d^ε + 1) M_{N_1}) / (l_p^ε |ln(ρ* + Δρ)|) ≤ ln(ρ* − Δρ) / ln(ρ* + Δρ).   (23)

Furthermore, let l_{N_1} be the smallest integer such that

    (ρ* + Δρ)^{l_{N_1}/2} (l_{N_1} + 1) M_{N_1} ≤ 1.

As l_d^ε tends to infinity for sufficiently small ε, it then follows from (23) that

    l_d^ε / l_p^ε ≤ 2 ln(ρ* − Δρ) / ln(ρ* + Δρ)   ⟹   sup_ε l_d^ε / l_p^ε < ∞.

Therefore, for Δρ ≥ 0 we obtain from (23) that

    lim_{ε→0} [ l_d^ε / l_p^ε − ln(ρ* − Δρ) / ln(ρ* + Δρ) ] ≤ lim_{l_p^ε→∞} ln((l_d^ε + 1) M_{N_1}) / (l_p^ε |ln(ρ* + Δρ)|) = 0.   (24)

Hence,

    ∀ Δρ ≥ 0:   lim_{ε→0} l_d^ε / l_p^ε ≤ ln(ρ* − Δρ) / ln(ρ* + Δρ)   ⟹   lim sup_{ε→0} l_d^ε / l_p^ε ≤ 1.   (25)

Let (ℓ_p^ε, ℓ_d^ε) be the smallest pair of integers such that

    √α ‖Δy_{ℓ_p^ε}‖ ≤ ε   and   √β ‖Δȳ_{ℓ_d^ε}‖ ≤ ε.

Then, according to (19) and (20) we have l_p^ε ≤ ℓ_p^ε, and from (19) and (21) it follows that ℓ_d^ε ≤ l_d^ε. Hence ℓ_d^ε / ℓ_p^ε ≤ l_d^ε / l_p^ε, and from (24) we obtain

    lim sup_{ε→0} ℓ_d^ε / ℓ_p^ε ≤ lim sup_{ε→0} l_d^ε / l_p^ε ≤ 1.   □


To show that the results derived above are indeed sharp, we consider the scalar case n = 1 = m with a quadratic cost function and a linear primal solver:

    f(y) = (θ/2) y² + ν y,   where y ∈ R, ν ∈ R and θ ∈ R*_+.

The coupled primal and dual iterations are given as follows:

    y_{k+1} = ρ y_k + a,   with |ρ| < 1 and a ∈ R,
    ȳ_{k+1} = ρ ȳ_k + θ y_k + ν.   (26)

Assuming for simplicity |Δȳ_0| = 0, we obtain

    ∀ k ≥ 1:   |Δy_k| = |ρ|^k |Δy_0|   and   |Δȳ_k| = k θ |ρ|^{k−1} |Δy_0|.   (27)

Therefore, given ε > 0 and two positive coefficients α and β, let (ℓ_p^ε, ℓ_d^ε) be the pair of integers such that

    √α |Δy_{ℓ_p^ε}| ≈ ε   and   √β |Δȳ_{ℓ_d^ε}| ≈ ε.   (28)

Then, from (27) it follows that

    √(β/α) θ ℓ_d^ε |ρ|^{ℓ_d^ε} ≈ |ρ|^{ℓ_p^ε}   ⟺   |ρ|^{ℓ_p^ε/ℓ_d^ε − 1} ≈ ( √(β/α) θ ℓ_d^ε )^{1/ℓ_d^ε}.   (29)

Furthermore, we have

    |ρ|^{ℓ_p^ε/ℓ_d^ε − 1} = e^{(ℓ_p^ε/ℓ_d^ε − 1) ln|ρ|} ≈ e^{(1 − ℓ_p^ε/ℓ_d^ε)(1 − |ρ|)},

which, according to (29), leads to

    e^{1 − ℓ_p^ε/ℓ_d^ε} ≈ ( √(β/α) θ ℓ_d^ε )^{1/((1 − |ρ|) ℓ_d^ε)}.   (30)

In addition, according to Theorem 2.1, for sufficiently small ε we may consider the approximation e^{ℓ_p^ε/ℓ_d^ε − 1} ≈ ℓ_p^ε/ℓ_d^ε. Thus, e^{1 − ℓ_p^ε/ℓ_d^ε} = 1 / e^{ℓ_p^ε/ℓ_d^ε − 1} ≈ ℓ_d^ε/ℓ_p^ε, and then from (30) we obtain

    ℓ_d^ε / ℓ_p^ε ≈ ( √(β/α) θ ℓ_d^ε )^{1/((1 − |ρ|) ℓ_d^ε)}.   (31)

As we will see later, a rather natural choice of the weighting coefficients leads to √(β/α) = (1 − |ρ|)/θ. Then, we have √(β/α) |Δȳ_k| = k (1 − |ρ|) |ρ|^{k−1} |Δy_0| and we can rewrite (31) as

    ℓ_d^ε / ℓ_p^ε ≈ ℓ̃^{1/ℓ̃} ∈ [1, e^{1/e}],   where ℓ̃ = (1 − |ρ|) ℓ_d^ε.   (32)

Since the right hand side in the approximation (32) is eventually a monotonically decreasing function of ℓ_d^ε, the dual retardation ratio for the weighting coefficients √α and √β keeps going down until it reaches the limit 1 for sufficiently small ε. In fact, considering for simplicity |Δy_0| = 1, for a given ε > 0 we need ℓ_p^ε = ln(ε)/ln(|ρ|) to bound the primal residual by ε. Furthermore, for ε ≤ e^{−1}, ℓ_p^ε is bigger than ℓ_d^{Max} = −1/ln(|ρ|), the integer after which the perturbation in the adjoint equation does not accumulate anymore, so that the dual residual |Δȳ_k| decreases for all k ≥ ℓ_d^{Max}. Hence, considering 0 < ε_2 < ε_1 < e^{−1}, we obtain from (27) that

    ℓ_p^{ε_2} ≈ ℓ_p^{ε_1} ln(ε_2)/ln(ε_1)   and   ℓ_d^{ε_2} ≈ |ρ|^{ℓ_d^{ε_1} − ℓ_d^{ε_2}} (ε_2/ε_1) ℓ_d^{ε_1} ≈ ℓ_d^{ε_1} ε_2/ε_1
    ⟹   ℓ_d^{ε_2} / ℓ_p^{ε_2} ≈ (ℓ_d^{ε_1} / ℓ_p^{ε_1}) (ε_2 ln(ε_1)) / (ε_1 ln(ε_2)).

As (ε_2 ln(ε_1))/(ε_1 ln(ε_2)) belongs to ]0, 1[, this explains the monotone decrease of the dual retardation ratio ℓ_d^ε/ℓ_p^ε for smaller values of ε. Below, we represent |Δy_k| and √(β/α) |Δȳ_k| as functions of k. This test has been carried out by considering |Δy_0| = 1 and ρ = 0.98. In this case, we have ℓ_d^{Max} ≈ 50.

[Figure 1. Primal residual |Δy_k| and weighted dual residual √(β/α)|Δȳ_k| as functions of the iteration index k.]
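The curves of Figure 1 follow directly from (27); a short sketch (with |Δy_0| = 1, ρ = 0.98 and the weighting √(β/α) = (1 − |ρ|)/θ discussed above — all illustrative values) reproduces their qualitative behaviour:

```python
import numpy as np

# Residual curves behind Figure 1, rebuilt from (27) with |Δy_0| = 1, ρ = 0.98:
#   |Δy_k| = ρ^k   and   sqrt(β/α)|Δȳ_k| = k (1 − ρ) ρ^{k−1}.
rho = 0.98
k = np.arange(1, 501)
primal = rho ** k
dual_w = k * (1 - rho) * rho ** (k - 1)

# the weighted dual residual accumulates up to k ≈ −1/ln(ρ) ≈ 50 and only
# then starts to decrease: this is the dual retardation effect
k_max = int(np.argmax(dual_w)) + 1
print(k_max)                                  # close to 50
print(dual_w[k_max - 1] > primal[k_max - 1])  # the dual residual lags behind
```

Plotting `primal` and `dual_w` against `k` reproduces the two curves of Figure 1.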

3. Conditions on the weighting coefficients α and β

In this section, we derive conditions on α and β such that the merit function L^a defined in (9) is an exact penalty function and the step increment vector associated with the extended iteration (6) yields descent on L^a. Therefore, we may guarantee reduction of L^a by using the step increment vector as a search direction.

3.1. Correspondence condition

We start by computing the gradient of L^a and then derive a condition on α and β which ensures that a local minimizer of the optimization problem (P_G) is also a local minimizer of L^a. This condition, which we will refer to as the correspondence condition, means that L^a is an exact penalty function. In the remainder, we denote ΔG_y = I − G_y, which is invertible according to the assumption (1) and the Perturbation Lemma [22].

Proposition 3.1: The full gradient of L^a is given by

    ∇L^a(y, ȳ, u) = −M s(y, ȳ, u),

where

    M = [ α ΔG_y^⊤   −I − β N_yy   0 ]
        [ −I          β ΔG_y       0 ]
        [ −α G_u^⊤   −β N_yu^⊤     B ]

and s is the step increment of the extended iteration (6), defined as follows:

    s(y, ȳ, u) = [ G(y, u) − y             ]
                 [ N_y(y, ȳ, u)^⊤ − ȳ      ]   (33)
                 [ −B^{−1} N_u(y, ȳ, u)^⊤  ].

Proof: From the definition of the augmented Lagrangian function L^a and by an elementary calculation we obtain

    [ ∇_y L^a ]   [ −α ΔG_y^⊤ (G(y, u) − y) + (I + β N_yy)(N_y(y, ȳ, u)^⊤ − ȳ)             ]
    [ ∇_ȳ L^a ] = [ G(y, u) − y − β ΔG_y (N_y(y, ȳ, u)^⊤ − ȳ)                               ]   (34)
    [ ∇_u L^a ]   [ α G_u^⊤ (G(y, u) − y) + β N_yu^⊤ (N_y(y, ȳ, u)^⊤ − ȳ) + N_u(y, ȳ, u)^⊤ ].   □
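The identity ∇L^a = −M s is easy to verify numerically. The sketch below does so by central finite differences on an assumed linear-quadratic model (G(y, u) = Ay + bu + c, f = ½‖y‖² + ½‖u‖², so N_yy = I and N_yu = 0); the model and all numbers are illustrative assumptions:

```python
import numpy as np

# Finite-difference check of Proposition 3.1: grad L^a = -M s, on an assumed
# model G(y,u) = A y + b u + c, f = 0.5||y||^2 + 0.5||u||^2 (N_yy = I, N_yu = 0).
rng = np.random.default_rng(1)
n, m = 3, 2
A = 0.4 * rng.standard_normal((n, n))
b = rng.standard_normal((n, m))
c = rng.standard_normal(n)
alpha, beta = 8.0, 2.0
Bpre = 3.0 * np.eye(m)          # the preconditioner B (any s.p.d. matrix works here)

def La(y, ybar, u):             # augmented Lagrangian (9)
    dy = A @ y + b @ u + c - y                 # G(y,u) - y
    dybar = y + A.T @ ybar - ybar              # N_y(y,ybar,u)^T - ybar
    N = 0.5 * y @ y + 0.5 * u @ u + (A @ y + b @ u + c) @ ybar
    return 0.5 * alpha * dy @ dy + 0.5 * beta * dybar @ dybar + N - ybar @ y

y, ybar, u = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(m)
dGy = np.eye(n) - A                            # ΔG_y = I - G_y
s = np.concatenate([A @ y + b @ u + c - y,
                    y + A.T @ ybar - ybar,
                    -np.linalg.solve(Bpre, u + b.T @ ybar)])   # increments (33)
M = np.block([[alpha * dGy.T, -(1.0 + beta) * np.eye(n), np.zeros((n, m))],
              [-np.eye(n), beta * dGy, np.zeros((n, m))],
              [-alpha * b.T, np.zeros((m, n)), Bpre]])         # matrix M (N_yy = I)
grad = -M @ s

x = np.concatenate([y, ybar, u])
h, fd = 1e-6, np.zeros_like(x)
for i in range(x.size):                        # central differences of L^a
    e = np.zeros_like(x); e[i] = h
    fd[i] = (La(*np.split(x + e, [n, 2 * n])) -
             La(*np.split(x - e, [n, 2 * n]))) / (2 * h)
print(np.max(np.abs(grad - fd)))               # tiny: formula (34) is confirmed
```

Since this L^a is quadratic in (y, ȳ, u), central differences agree with the analytic gradient up to roundoff.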

In the following corollary, we establish a condition on the weighting coefficients α and β which ensures that a local minimizer of the optimization problem (P_G) is also a stationary point of the merit function L^a.

Corollary 3.2: There is a 1-1 correspondence between the stationary points of L^a and the roots of s wherever det[αβ ΔG_y^⊤ ΔG_y − I − β N_yy] ≠ 0, for which it is sufficient that αβ(1 − ρ)² > 1 + β‖N_yy‖.

Proof: If the whole matrix M occurring in Proposition 3.1 is nonsingular, then we have a 1-1 correspondence between the stationary points of L^a and the roots of s. Moreover, this nonsingularity is ensured by fulfilling the following condition:

    det[αβ ΔG_y^⊤ ΔG_y − I − β N_yy] ≠ 0.

For v ∈ R^n \ {0}, we observe that

    v^⊤ (αβ ΔG_y^⊤ ΔG_y − I − β N_yy) v = αβ ‖ΔG_y v‖² − ‖v‖² − β v^⊤ N_yy v.

Therefore, the nonsingularity of the matrix M is ensured by the inequality below:

    αβ ‖ΔG_y v‖² > ‖v‖² + β v^⊤ N_yy v.   (35)

In addition, using the Cauchy–Schwarz inequality we obtain

    v^⊤ N_yy v ≤ ‖v‖ ‖N_yy v‖   ⟹   v^⊤ N_yy v ≤ ‖v‖² ‖N_yy‖.   (36)

Thus, according to (35) and (36), it is sufficient to ensure that

    αβ ‖ΔG_y v‖² > (1 + β‖N_yy‖) ‖v‖².   (37)

Moreover, in view of (1), the inequality (37) is implied by αβ(1 − ρ)² > 1 + β‖N_yy‖.   □

In the remainder, we denote θ = ‖N_yy‖. Thus, we write the correspondence condition as follows:

    αβ(1 − ρ)² > 1 + βθ.   (38)

Hence, the exactness of the merit function L^a follows from proving that the Hessian of L^a at a local minimizer of the optimization problem (P_G) is positive definite. This result is the subject of the following corollary.

Corollary 3.3: L^a has a positive definite Hessian at all its stationary points where αβ ΔG_y^⊤ ΔG_y ≻ I + β N_yy. This inequality is implied by the above correspondence condition.

Proof: The Hessian is given by ∇²L^a = U_1 + U_2 S with

    U_1 = [ α G_yy^⊤ (G − y) + β N_yyy (N_y^⊤ − ȳ)   β G_yy (N_y^⊤ − ȳ)   α G_yu^⊤ (G − y) + β N_yyu (N_y^⊤ − ȳ) ]
          [ β G_yy (N_y^⊤ − ȳ) − ΔG_y               0                    β G_yu (N_y^⊤ − ȳ) + G_u               ]
          [ α G_uy^⊤ (G − y) + β N_yuy (N_y^⊤ − ȳ)   β G_yu (N_y^⊤ − ȳ)   α G_uu^⊤ (G − y) + β N_yuu (N_y^⊤ − ȳ) ],

    U_2 = [ −α ΔG_y^⊤   I + β N_yy   0  ]
          [ 0           −β ΔG_y      0  ]
          [ α G_u^⊤     β N_yu^⊤     −B ]

and

    S = ∂s(y, ȳ, u)/∂(y, ȳ, u) = [ −ΔG_y          0               G_u          ]
                                 [ N_yy           −ΔG_y^⊤         N_yu         ]
                                 [ −B^{−1} N_uy   −B^{−1} G_u^⊤   −B^{−1} N_uu ].

In particular, at a stationary point of L^a we can express the Hessian as

    ∇²L^a = [ α ΔG_y^⊤ ΔG_y + (I + β N_yy) N_yy   −(I + β N_yy) ΔG_y^⊤      −α ΔG_y^⊤ G_u + (I + β N_yy) N_yu ]
            [ −ΔG_y (I + β N_yy)                  β ΔG_y ΔG_y^⊤             G_u − β ΔG_y N_yu                 ]
            [ −α G_u^⊤ ΔG_y + N_uy (I + β N_yy)   −β N_uy ΔG_y^⊤ + G_u^⊤    α G_u^⊤ G_u + β N_uy N_yu + N_uu  ].

Moreover, we prove the following block diagonalization of the matrix occurring on the right side of the equality above:

    T^⊤ ∇²L^a T = diag[ α ΔG_y^⊤ ΔG_y − N_yy − I/β,  β ΔG_y ΔG_y^⊤,  H ],   (39)

where H is the reduced Hessian defined in (5) and

    T = [ I                        0   ΔG_y^{−1} G_u                         ]
        [ ΔG_y^{−⊤} (N_yy + I/β)   I   ΔG_y^{−⊤} (N_yy ΔG_y^{−1} G_u + N_yu) ]   (40)
        [ 0                        0   I                                     ].

Furthermore, according to the second order necessary condition, H is a positive definite matrix at a local minimizer of the optimization problem (P_G). Therefore, since the matrix ΔG_y ΔG_y^⊤ is also positive definite, it follows that the positive definiteness of the L^a Hessian at its stationary points is ensured by the fulfilment of the following condition:

    αβ ΔG_y^⊤ ΔG_y ≻ I + β N_yy.   □

3.2. Descent direction condition

According to Corollary 3.3, all stationary points of L^a are minima. In the proposition below, we derive a condition on the weighting coefficients α and β such that the step increment vector s defined in (33) yields descent on L^a at all its nonstationary points. We will refer to this condition as the descent direction condition.

Proposition 3.4: With ΔḠ_y = ½(ΔG_y + ΔG_y^⊤), the step increment of the extended iteration s(y, ȳ, u) yields descent on L^a for all large positive B if

    αβ ΔḠ_y ≻ (I + (β/2) N_yy) (ΔḠ_y)^{−1} (I + (β/2) N_yy),   (41)

which is implied by

    √(αβ) (1 − ρ) > 1 + (β/2) θ.   (42)

The inequality (42) is a bit stronger than the correspondence condition occurring in Corollary 3.2.

Proof: Let s be the step increment vector defined in (33). By symmetrizing the matrix occurring in Proposition 3.1 as follows:

    −s^⊤ · ∇L^a = s^⊤ M s = s^⊤ ((M + M^⊤)/2) s,

we obtain

    −s^⊤ · ∇L^a = s^⊤ [ α ΔḠ_y             −I − (β/2) N_yy    −(α/2) G_u   ]
                      [ −I − (β/2) N_yy    β ΔḠ_y             −(β/2) N_yu  ] s,   (43)
                      [ −(α/2) G_u^⊤       −(β/2) N_yu^⊤      B            ]

where ΔḠ_y is defined in Proposition 3.4. In addition, let

    A = [ α ΔḠ_y            −I − (β/2) N_yy ]              [ −(α/2) G_u  ]
        [ −I − (β/2) N_yy   β ΔḠ_y          ]   and   C = [ −(β/2) N_yu ].

The assertion (41) means exactly that the 2 × 2 block A is a positive definite matrix, as follows by block elimination. On the other hand, we verify that

    [ I             0 ] [ A     C ] [ I   −A^{−1} C ]
    [ −C^⊤ A^{−⊤}   I ] [ C^⊤   B ] [ 0   I         ] = diag(A, B − C^⊤ A^{−1} C).

Then, by making B sufficiently large, i.e. B ≻ C^⊤ A^{−1} C, we get positive definiteness of the whole matrix occurring in (43), which means that s yields descent on L^a. To prove that (42) implies (41), we start by observing that, according to the Perturbation Lemma [22], we have

    ‖ΔḠ_y^{−1}‖ ≤ 1 / (1 − ρ).   (44)

Then, (42) and (44) imply

    αβ (1 − ρ) ‖v‖² > (1/(1 − ρ)) ‖(I + (β/2) N_yy) v‖² ≥ ‖(I + (β/2) N_yy) v‖² ‖ΔḠ_y^{−1}‖,   ∀ v ∈ R^n.

Therefore,

    αβ (1 − ρ) ‖v‖² > ‖(I + (β/2) N_yy) v‖ ‖ΔḠ_y^{−1} (I + (β/2) N_yy) v‖,   ∀ v ∈ R^n.

Using the Cauchy–Schwarz inequality, the above result implies

    αβ (1 − ρ) ‖v‖² > v^⊤ (I + (β/2) N_yy) ΔḠ_y^{−1} (I + (β/2) N_yy) v,   ∀ v ∈ R^n.   (45)

In addition, since v^⊤ ((G_y + G_y^⊤)/2) v ≤ ρ ‖v‖², we obtain

    v^⊤ ΔḠ_y v = ‖v‖² − v^⊤ ((G_y + G_y^⊤)/2) v ≥ (1 − ρ) ‖v‖²,   ∀ v ∈ R^n.   (46)

Finally, from (45) and (46), we conclude that

    αβ v^⊤ ΔḠ_y v > v^⊤ (I + (β/2) N_yy) ΔḠ_y^{−1} (I + (β/2) N_yy) v,   ∀ v ∈ R^n.   □

3.3. Full step descent condition

As we already mentioned, ensuring descent on L^a rather than always accepting the coupled full step should be useful for any combination and sequencing of steps with and without design update for improving feasibility and optimality. In this subsection, we derive a condition on the weighting coefficients α and β which, for fixed design u, ensures reduction of L^a by the full primal and dual feasibility steps. We will refer to this condition as the full step descent condition. In order to establish this result, we need to prove the following lemma:

Lemma 3.5: Let ω be a real number and set γ_k = ‖y_k − y_{k−1}‖, δ_k = ‖ȳ_k − ȳ_{k−1}‖ and

    E(y_k, ȳ_k, u) = N(y_k, ȳ_k, u) − ȳ_k^⊤ y_k.   (47)

We have

    γ_{k+1}² + ω δ_{k+1}² ≤ ρ² γ_k² + ω (θ γ_k + ρ δ_k)²,

and

    E(y_k, ȳ_k, u) − E(y_{k−1}, ȳ_{k−1}, u) ≤ (5 + ρ) γ_k δ_k + θ γ_k².

Proof: Due to the contractivity assumption, we obtain

    γ_{k+1}² + ω δ_{k+1}² = ‖G(y_k, u) − G(y_{k−1}, u)‖² + ω ‖N_y(y_k, ȳ_k, u)^⊤ − N_y(y_{k−1}, ȳ_{k−1}, u)^⊤‖²   (48)
                         ≤ ρ² γ_k² + ω ‖N_y(y_k, ȳ_k, u)^⊤ − N_y(y_{k−1}, ȳ_{k−1}, u)^⊤‖².

In addition, we have

    ‖N_y(y_k, ȳ_k, u)^⊤ − N_y(y_{k−1}, ȳ_{k−1}, u)^⊤‖
        ≤ ‖N_y(y_k, ȳ_k, u)^⊤ − N_y(y_{k−1}, ȳ_k, u)^⊤‖ + ‖N_y(y_{k−1}, ȳ_k, u)^⊤ − N_y(y_{k−1}, ȳ_{k−1}, u)^⊤‖   (49)
        ≤ θ γ_k + ‖N_y(y_{k−1}, ȳ_k, u)^⊤ − N_y(y_{k−1}, ȳ_{k−1}, u)^⊤‖.

Furthermore, we get

    ‖N_y(y_{k−1}, ȳ_k, u)^⊤ − N_y(y_{k−1}, ȳ_{k−1}, u)^⊤‖ = ‖G_y(y_{k−1}, u)^⊤ (ȳ_k − ȳ_{k−1})‖ ≤ ρ δ_k.   (50)

Therefore, according to (48), (49) and (50), we obtain

    γ_{k+1}² + ω δ_{k+1}² ≤ ρ² γ_k² + ω (θ γ_k + ρ δ_k)².


Now, to prove the second part of Lemma 3.5, we consider a Taylor–Lagrange expansion for E. Hence, there exists ξ ∈ ]0, 1[ such that

    E(y_k, ȳ_k, u) − E(y_{k−1}, ȳ_{k−1}, u) = (Δy_{k−1}, Δȳ_{k−1}) · ∇E(ỹ_{k−1}, ỹ̄_{k−1}, u)
        = (Δy_{k−1}, Δȳ_{k−1}) · (N_y(ỹ_{k−1}, ỹ̄_{k−1}, u)^⊤ − ỹ̄_{k−1}, G(ỹ_{k−1}, u) − ỹ_{k−1})^⊤
        ≤ γ_k ‖N_y(ỹ_{k−1}, ỹ̄_{k−1}, u)^⊤ − ỹ̄_{k−1}‖ + δ_k ‖G(ỹ_{k−1}, u) − ỹ_{k−1}‖,

where Δy_k, Δȳ_k are derived from (10) and

    (ỹ_{k−1}, ỹ̄_{k−1}) = (y_{k−1}, ȳ_{k−1}) + ξ (Δy_{k−1}, Δȳ_{k−1}).

Furthermore, we denote by g the function defined by

    g(ξ) = G(y_{k−1} + ξ Δy_{k−1}, u) − (y_{k−1} + ξ Δy_{k−1}),   ξ ∈ ]0, 1[.

Moreover, we verify that

    g′(ξ) = [G_y(y_{k−1} + ξ Δy_{k−1}, u) − I] Δy_{k−1}.

Then, we obtain

    g(ξ) = G(y_{k−1}, u) − y_{k−1} + ∫_0^ξ [G_y(y_{k−1} + r Δy_{k−1}, u) − I] Δy_{k−1} dr,

and thus

    ‖g(ξ)‖ ≤ ‖G(y_{k−1}, u) − y_{k−1}‖ + ξ (1 + ρ) γ_k ≤ 3 γ_k,   ∀ ξ ∈ ]0, 1[.   (52)

In the same way, we define the function ψ by

    ψ(ξ) = N_y(y_{k−1} + ξ Δy_{k−1}, ȳ_{k−1} + ξ Δȳ_{k−1}, u)^⊤ − (ȳ_{k−1} + ξ Δȳ_{k−1}),   ξ ∈ ]0, 1[,

for which we verify that

    ψ′(ξ) = N_yy(y_{k−1} + ξ Δy_{k−1}, ȳ_{k−1} + ξ Δȳ_{k−1}, u) Δy_{k−1} + (G_y(y_{k−1} + ξ Δy_{k−1}, u)^⊤ − I) Δȳ_{k−1}.

Hence, we obtain ‖ψ′(ξ)‖ ≤ θ γ_k + (1 + ρ) δ_k. Then, we have

    ψ(ξ) = N_y(y_{k−1}, ȳ_{k−1}, u)^⊤ − ȳ_{k−1} + ∫_0^ξ ψ′(r) dr   ⟹   ‖ψ(ξ)‖ ≤ θ γ_k + (2 + ρ) δ_k.   (53)

Therefore, using (52) and (53) in the Taylor–Lagrange expansion of E, we obtain

    E(y_k, ȳ_k, u) − E(y_{k−1}, ȳ_{k−1}, u) ≤ {θ γ_k + (2 + ρ) δ_k} γ_k + 3 γ_k δ_k = (5 + ρ) γ_k δ_k + θ γ_k².   □

Now, we are ready to establish our result, given by the following theorem:

Theorem 3.6: Any combination of coefficients α and β such that

    α > θ(βθ + 2)/(1 − ρ²) + (5 + ρ[1 + βθ])² / (β (1 − ρ²)²)   (54)

ensures that at any (y_{k−1}, ȳ_{k−1}, u) we have L^a(y_k, ȳ_k, u) − L^a(y_{k−1}, ȳ_{k−1}, u) < 0.

Proof: With reference to (9) and taking into account the notations (47), we have

    (2/α) L^a(y_{k−1}, ȳ_{k−1}, u) = γ_k² + ω δ_k² + (2/α) E(y_{k−1}, ȳ_{k−1}, u),

where ω = β/α. Therefore, according to the results of Lemma 3.5, we obtain

    (2/α) (L^a(y_k, ȳ_k, u) − L^a(y_{k−1}, ȳ_{k−1}, u)) ≤ (ρ² − 1) γ_k² + ω (ρ δ_k + θ γ_k)² − ω δ_k² + (2/α) ((5 + ρ) γ_k δ_k + θ γ_k²),

which is equivalent to

    (2/α) (L^a(y_k, ȳ_k, u) − L^a(y_{k−1}, ȳ_{k−1}, u)) ≤ (ρ² + ω θ² + 2θ/α − 1) γ_k² + ω (ρ² − 1) δ_k² + 2 ((5 + ρ)/α + ρ ω θ) γ_k δ_k.   (55)

Therefore, multiplying both sides of (55) by 1/δ_k² and denoting p_k = γ_k/δ_k, we obtain

    (2/(α δ_k²)) (L^a(y_k, ȳ_k, u) − L^a(y_{k−1}, ȳ_{k−1}, u)) ≤ (ρ² + ω θ² + 2θ/α − 1) p_k² + 2 ((5 + ρ)/α + ρ ω θ) p_k + ω (ρ² − 1).   (56)

Let ζ be the function defined by

    ζ(x) = (ρ² + ω θ² + 2θ/α − 1) x² + 2 ((5 + ρ)/α + ρ ω θ) x + ω (ρ² − 1).

Then, we verify that

    ζ′(x_0) = 0   ⟺   x_0 = − ((5 + ρ)/α + ρ ω θ) / (ρ² + ω θ² + 2θ/α − 1).

Furthermore, we have

    ζ(x_0) = − ((5 + ρ)/α + ρ ω θ)² / (ρ² + ω θ² + 2θ/α − 1) + ω (ρ² − 1).

Thus, with reference to ω = β/α and by considering α and β such that

    α > θ(βθ + 2)/(1 − ρ²),
    (1 − ρ²) αβ > − (5 + ρ[1 + βθ])² / (ρ² + (θ/α)(βθ + 2) − 1),   (57)

we ensure that L^a(y_k, ȳ_k, u) − L^a(y_{k−1}, ȳ_{k−1}, u) < 0. Furthermore, from (57) we verify that

    (1 − ρ²) αβ > − (5 + ρ[1 + βθ])² / (ρ² − 1 + (θ/α)(βθ + 2))   ⟺   α > θ(βθ + 2)/(1 − ρ²) + (5 + ρ[1 + βθ])² / (β (1 − ρ²)²).

Therefore, to have L^a(y_k, ȳ_k, u) − L^a(y_{k−1}, ȳ_{k−1}, u) < 0, it is sufficient to choose α and β such that

    α > θ(βθ + 2)/(1 − ρ²) + (5 + ρ[1 + βθ])² / (β (1 − ρ²)²).   (58)   □

As we can easily verify, the descent condition given by (42) is a bit stronger than the correspondence condition defined in (38). However, the condition derived from (58) is stronger than (42). In the next section, we explain a reasonable way to choose the weighting coefficients α and β, based on the properties of the primal and dual residual behaviour.
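The full step descent condition can be exercised numerically. The sketch below fixes the design u (absorbed into the constant term), picks α slightly above the bound (54) with β = 2/θ, and checks that the full primal and dual steps decrease L^a monotonically; the linear-quadratic model is an illustrative assumption, not the paper's test case:

```python
import numpy as np

# Monotone reduction of L^a by full primal/dual steps under condition (54),
# on an assumed model with fixed design: G(y) = A y + c, f(y) = 0.5||y||^2,
# so theta = ||N_yy|| = 1 and rho = ||A||_2.
rng = np.random.default_rng(2)
n = 4
A = 0.5 * np.eye(n) + 0.04 * rng.standard_normal((n, n))
c = rng.standard_normal(n)
rho = np.linalg.norm(A, 2)
assert rho < 1
theta = 1.0
beta = 2.0 / theta
alpha = (theta * (beta * theta + 2) / (1 - rho**2)
         + (5 + rho * (1 + beta * theta))**2 / (beta * (1 - rho**2)**2))
alpha *= 1.01                        # strict inequality in (54)

def La(y, ybar):                     # augmented Lagrangian (9) at fixed u
    dy = A @ y + c - y
    dybar = y + A.T @ ybar - ybar
    N = 0.5 * y @ y + (A @ y + c) @ ybar
    return 0.5 * alpha * dy @ dy + 0.5 * beta * dybar @ dybar + N - ybar @ y

y, ybar = rng.standard_normal(n), rng.standard_normal(n)
vals = [La(y, ybar)]
for k in range(25):
    y, ybar = A @ y + c, y + A.T @ ybar   # full primal and dual feasibility steps
    vals.append(La(y, ybar))
print(all(v1 < v0 for v0, v1 in zip(vals, vals[1:])))   # L^a decreases each step
```

Dropping the factor 1.01 or shrinking α well below (54) typically breaks the monotone decrease, which is the point of Theorem 3.6.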

4. Particular choice of α and β

In [12], it was proved that although the dual residual converges essentially at the same speed as the primal residual, it does not decrease monotonically. Consequently, in order to ensure a monotonic reduction of L^a, one may have to put a rather large weight α on the primal residual. This is reflected in the conditions (38), (42) and (58). However, using a large value of α often severely restricts the step size and slows down the iterations. Hence, we propose two options to choose the values of α and β. In fact, for each of the conditions (42) and (58), we try to make α as small as possible while satisfying the condition at least as an equality.

Option 1: deriving α and β from the descent condition (42). We may minimize α as a function of β such that

    min_β √α ≡ (1 + (θ/2) β) / ((1 − ρ) √β),   (59)

which leads to the following weighting coefficient values:

    β_1 = 2/θ   and   α_1 = 2θ / (1 − ρ)².   (60)

Option 2: deriving α and β from the full step descent condition (58). According to (58), we verify that

    α = θ(βθ + 2)/(1 − ρ²) + (5 + ρ[1 + βθ])² / (β (1 − ρ²)²),

which is equivalent to

    α = (θ² β² + 2θ(1 + 5ρ) β + 5(5 + 2ρ) + ρ²) / (β (1 − ρ²)²).   (61)

Thus, we may minimize α as a function of β such that

    min_β α ≡ (θ² β² + 2θ(1 + 5ρ) β + 5(5 + 2ρ) + ρ²) / (β (1 − ρ²)²).   (62)

Furthermore, we verify that

    [ (θ² β² + 2θ(1 + 5ρ) β + 5(5 + 2ρ) + ρ²) / (β (1 − ρ²)²) ]′ = (θ² β² − (5[5 + 2ρ] + ρ²)) / (β² (1 − ρ²)²).

Therefore, we obtain the following values:

    β_2 = √(5(5 + 2ρ) + ρ²) / θ   and   α_2 = 2θ (1 + 5ρ + θ β_2) / (1 − ρ²)².   (63)

As we are interested in rather slowly converging fixed point solvers, i.e. ρ close to 1, we consider the following approximations:

    β_2 ≈ 6/θ   and   α_2 ≈ 6θ / (1 − ρ)².
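For given estimates of ρ and θ, both options reduce to closed-form weights. The sketch below evaluates (60) and (63) and confirms that they meet the descent condition (42) and the full step bound (58), respectively, with equality; the values ρ = 0.98 and θ = 1 are illustrative assumptions:

```python
import numpy as np

# Weighting coefficients from Option 1, eq. (60), and Option 2, eq. (63).
def option1(rho, theta):
    return 2.0 * theta / (1.0 - rho) ** 2, 2.0 / theta        # (alpha1, beta1)

def option2(rho, theta):
    beta2 = np.sqrt(5.0 * (5.0 + 2.0 * rho) + rho**2) / theta
    alpha2 = 2.0 * theta * (1.0 + 5.0 * rho + theta * beta2) / (1.0 - rho**2) ** 2
    return alpha2, beta2

rho, theta = 0.98, 1.0
a1, b1 = option1(rho, theta)
a2, b2 = option2(rho, theta)

# Option 1 satisfies the descent condition (42) with equality ...
print(np.isclose(np.sqrt(a1 * b1) * (1 - rho), 1 + b1 * theta / 2))
# ... and Option 2 satisfies the full step bound (58), i.e. (61), with equality
rhs = (theta * (b2 * theta + 2) / (1 - rho**2)
       + (5 + rho * (1 + b2 * theta))**2 / (b2 * (1 - rho**2)**2))
print(np.isclose(a2, rhs))
# for rho close to 1 the approximations beta2 ≈ 6/theta, alpha2 ≈ 6*theta/(1-rho)^2
# are accurate to a few percent
print(b2, a2 * (1 - rho)**2 / theta)
```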

4.1. Estimation of problem parameters

To evaluate α and β, we need to estimate ρ and θ. By neglecting the change in u, these estimates can be obtained from the primal and dual iterates as follows. Since, from (1), for every index k we have

    ‖G(y_k, u) − G(y_{k−1}, u)‖ ≤ ρ ‖y_k − y_{k−1}‖,

we start with an initial value ρ_0 and update the ρ value as follows:

    ρ_{k+1} = max{ ‖Δy_k‖ / ‖Δy_{k−1}‖,  τ ρ_k },   (64)

where τ ∈ (0, 1). To estimate the value of θ, we consider the following approximation for $\Delta y_{k+1}$ and $\Delta\bar{y}_{k+1}$ derived from (10):

$$\begin{pmatrix}\Delta y_{k+1}\\ \Delta\bar{y}_{k+1}\end{pmatrix} \approx \begin{pmatrix} G_y(y_k,u_k) & 0\\ N_{yy}(y_k,\bar{y}_k,u_k) & G_y(y_k,u_k)^{\top}\end{pmatrix}\begin{pmatrix}\Delta y_k\\ \Delta\bar{y}_k\end{pmatrix}.$$

Thus, we obtain

$$\begin{pmatrix}\Delta\bar{y}_k\\ -\Delta y_k\end{pmatrix}^{\top}\begin{pmatrix}\Delta y_{k+1}\\ \Delta\bar{y}_{k+1}\end{pmatrix} \approx -(\Delta y_k)^{\top} N_{yy}(y_k,\bar{y}_k,u_k)\,\Delta y_k,$$

and then

$$(\Delta y_k)^{\top} N_{yy}(y_k,\bar{y}_k,u_k)\,\Delta y_k \approx (\Delta y_k)^{\top}\Delta\bar{y}_{k+1} - (\Delta\bar{y}_k)^{\top}\Delta y_{k+1}. \qquad (65)$$

Therefore, one can use the approximation (65) to estimate the value of θ as follows:

$$\theta_{k+1} \approx \max\left\{\frac{\left|(\Delta y_k)^{\top}\Delta\bar{y}_{k+1} - (\Delta\bar{y}_k)^{\top}\Delta y_{k+1}\right|}{\|\Delta y_k\|^{2}},\ \tau\theta_k\right\}. \qquad (66)$$

As far as numerical tests are concerned, we obtained the best results when $N_{yy} = I$, which theoretically can be attained by using a similarity transformation provided that $N_{yy} \succ 0$. In fact, let $N_{yy} = N_{yy}^{1/2} N_{yy}^{\top/2}$ be the Cholesky factorisation of $N_{yy}$. Furthermore, by considering $\tilde{y} = N_{yy}^{1/2}\, y$ and $\tilde{\bar y} = N_{yy}^{-\top/2}\, \bar y$ we obtain

$$\tilde{L}^a(\tilde y,\tilde{\bar y},u) = \frac{\alpha}{2}\,\|\tilde G(\tilde y,u)-\tilde y\|^{2} + \frac{\beta}{2}\,\|\tilde N_{\tilde y}(\tilde y,\tilde{\bar y},u)^{\top}-\tilde{\bar y}\|^{2} + \tilde N(\tilde y,\tilde{\bar y},u) - \tilde{\bar y}^{\top}\tilde y,$$

where $\tilde G(\tilde y,u) = N_{yy}^{1/2}\, G(N_{yy}^{-1/2}\tilde y, u)$, $\tilde f(\tilde y,u) = f(N_{yy}^{-1/2}\tilde y,u)$ and $\tilde N(\tilde y,\tilde{\bar y},u) = \tilde f(\tilde y,u) + \tilde G(\tilde y,u)^{\top}\tilde{\bar y}$. Thus, we verify that $\nabla\tilde L^a(\tilde y,\tilde{\bar y},u) = -\tilde M\,\tilde s(\tilde y,\tilde{\bar y},u)$, where

$$\tilde M = \begin{pmatrix} \alpha\,\Delta\tilde G_{\tilde y}^{\top} & -(1+\beta)I & 0\\ -I & \beta\,\Delta\tilde G_{\tilde y} & 0\\ -\alpha\,\tilde G_u^{\top} & -\beta\,\tilde N_{\tilde y u}^{\top} & B \end{pmatrix}, \qquad \tilde s(\tilde y,\tilde{\bar y},u) = \begin{pmatrix} \tilde G(\tilde y,u)-\tilde y\\ \tilde N_{\tilde y}(\tilde y,\tilde{\bar y},u)^{\top}-\tilde{\bar y}\\ -B^{-1}\tilde N_u(\tilde y,\tilde{\bar y},u)^{\top}\end{pmatrix},$$

with $\Delta\tilde G_{\tilde y} = I - N_{yy}^{1/2}\, G_y\, N_{yy}^{-1/2}$, $\tilde G_u = N_{yy}^{1/2}\, G_u$ and $\tilde N_{\tilde y u} = N_{yy}^{-1/2}\, N_{yu}$.

As we are applying a similarity transformation, we have $\varrho(\tilde G_{\tilde y}) = \varrho(G_y) = \varrho$. Therefore, if we have a state space preconditioner $P \approx N_{yy}^{1/2}$ for which $P$ and $P^{-1}$ can be evaluated at reasonable cost, then we may use the modified augmented Lagrangian

$$\tilde L^a(y,\bar y,u) = \frac{\alpha}{2}\,\|P\,(G(y,u)-y)\|^{2} + \frac{\beta}{2}\,\|P^{-\top}(N_y(y,\bar y,u)-\bar y)\|^{2} + N(y,\bar y,u) - \bar y^{\top}y.$$

Hence, by the same techniques used to prove Lemma 3.4, we obtain the descent condition associated with the transformed coordinates given by

$$\sqrt{\alpha\beta}\,(1-\varrho) > 1 + \frac{\beta\tilde\theta}{2}, \qquad (67)$$

where $\tilde\theta = \|P^{-1} N_{yy}^{1/2}\|$.

Then, we obtain the following approximations:

$$\beta_1 = \frac{2}{\tilde\theta},\quad \alpha_1 = \frac{2\tilde\theta}{(1-\varrho)^2} \qquad\text{and}\qquad \beta_2 = \frac{6}{\tilde\theta},\quad \alpha_2 = \frac{6\tilde\theta}{(1-\varrho)^2}.$$

4.2. Choice of B

Currently, we use the preconditioner B defined by

$$B = \alpha\, G_u^{\top} G_u + \beta\, N_{yu}^{\top} N_{yu} + N_{uu}, \qquad (68)$$

which approximates $\nabla^2_{uu} L^a$ in a neighborhood of a KKT point. According to the construction of B, we expect asymptotic convergence near a KKT point. However, in order to enforce convergence, we apply a line-search procedure based on a quadratic model involving weighted Euclidean norms of primal and dual linear interpolations and a conventional quadratic interpolation of the Lagrangian between a base point and the current point. Therefore, the line-search procedure does not require the computation of any second derivatives of the original problem functions.
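Assuming the reduced derivative blocks $G_u$, $N_{yu}$, $N_{uu}$ and the design gradient are available as matrices and a vector, the preconditioner (68) and the resulting design increment could be assembled as in the following sketch (an illustration, not the authors' implementation):

```python
import numpy as np

# Sketch of the preconditioner (68) and the resulting design increment.
# Gu, Nyu, Nuu and grad_u are illustrative placeholders for the reduced
# derivative blocks; they are not computed here.
def build_B(alpha, beta, Gu, Nyu, Nuu):
    """Eq. (68): B = alpha Gu^T Gu + beta Nyu^T Nyu + Nuu."""
    return alpha * Gu.T @ Gu + beta * Nyu.T @ Nyu + Nuu

def design_step(B, grad_u):
    """Preconditioned design increment: solve B du = -grad_u."""
    return np.linalg.solve(B, -grad_u)
```

Since only first derivative blocks enter (68), forming B fits the statement above that no second derivatives of the original problem functions are required.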

5. Numerical tests

In this section, we present some numerical results obtained by considering the Bratu equation, which is frequently used in combustion modeling, on the unit square:

$$\begin{cases} \Delta y(x) + e^{y(x)} = 0, & x=(x_1,x_2)\in[0,1]^2\\ y(0,x_2) = y(1,x_2), & x_2\in[0,1]\\ y(x_1,0) = \sin(2\pi x_1), & x_1\in[0,1]\\ y(x_1,1) = u(x_1), & x_1\in[0,1] \end{cases} \qquad (69)$$

Hence, the problem is periodic with respect to the horizontal coordinate $x_1$ and has Dirichlet boundary conditions on the lower and upper edges of the unit square. The function u is viewed as a boundary control that can be varied to minimize the objective function

$$f(y,u) = \int_0^1 \left(\frac{\partial y}{\partial x_2}(x_1,x_2)\Big|_{x_2=1} - 4 - \cos(2\pi x_1)\right)^{2} dx_1 + \sigma \int_0^1 \left[u(x_1)^2 + u'(x_1)^2\right] dx_1.$$

In the following numerical tests we use σ = 0.001, and the control (design) u is set initially to the constant u(x₁) = 2.2 (see [9]). As far as the discretization of problem (69) is concerned, we consider a five-point central difference scheme with meshwidth 1/10, so that the resulting nonlinear system involves 100 equations in as many variables. Since the nonlinearities occur only on the diagonal, we implement Jacobi's method to obtain the basic fixed point map G(y, u). We are interested in representing the retardation behavior with respect to various ε. In fact, given ε ≥ 0, we denote by k, ℓ and m the numbers of iterations needed to obtain

$$\sqrt{\alpha}\,\|\Delta y_k\| \le \varepsilon, \qquad \sqrt{\alpha}\,\|\Delta y_\ell\| + \sqrt{\beta}\,\|\Delta\bar y_\ell\| \le \varepsilon \qquad\text{and}\qquad \sqrt{\alpha}\,\|\Delta y_m\| + \sqrt{\beta}\,\|\Delta\bar y_m\| + \|\Delta u_m\| \le \varepsilon.$$

Then, we refer to the ratio ℓ/k as the weighted dual retardation and to m/k as the weighted total retardation. For the numerical tests reported below, we used the approximations of ̺ and θ indicated in (64) and (66). As far as the values of the weighting coefficients α and β are concerned, we used the expressions given in (60). We represent in Figure 2 and Figure 3 the weighted retardation behavior, where the x-axis describes the ε values on a logarithmic scale.
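As an illustration of the fixed point solver described above, the following sketch applies a Jacobi iteration to a discretization of (69) at the initial design u ≡ 2.2 and estimates the contraction factor ̺. The grid layout here (10 periodic columns, 9 interior rows, meshwidth h = 1/10) is a simplifying assumption and differs slightly from the 100-unknown system of the paper:

```python
import numpy as np

# Illustrative Jacobi fixed point map G(., u) for a discretization of the
# Bratu problem (69); grid layout is a simplifying assumption, not the
# paper's exact 100-unknown setup.
def jacobi_sweep(y, u, h=0.1):
    """One application of the fixed point map: y_{k+1} = G(y_k, u)."""
    x1 = np.arange(y.shape[1]) * h
    bottom = np.sin(2.0 * np.pi * x1)            # y(x1, 0) = sin(2 pi x1)
    full = np.vstack([bottom, y, u])             # rows j = 0, ..., 10
    west = np.roll(full, 1, axis=1)              # periodicity in x1
    east = np.roll(full, -1, axis=1)
    south, north = full[:-2], full[2:]
    # five-point stencil; the nonlinearity e^y stays on the diagonal
    return 0.25 * (west[1:-1] + east[1:-1] + south + north
                   + h ** 2 * np.exp(y))

y = np.zeros((9, 10))                            # interior unknowns
u = np.full(10, 2.2)                             # initial design u(x1) = 2.2
updates = []
for _ in range(800):
    y_new = jacobi_sweep(y, u)
    updates.append(np.linalg.norm(y_new - y))
    y = y_new
rho_est = updates[-1] / updates[-2]              # contraction factor estimate
```

As expected for this slowly converging solver, the estimated contraction factor comes out close to 1, which is exactly the regime the weighting formulas above are designed for.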

Figure 2. Weighted dual retardation ratio for various precisions


Figure 3. Weighted total retardation ratio for various precisions
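The iteration counts k, ℓ, m, and hence the retardation ratios plotted in Figures 2 and 3, can be extracted from recorded residual histories as in this hypothetical helper (the three norm histories are assumed to be given):

```python
import math

def retardation_ratios(prim, dual, du, alpha, beta, eps):
    """Return (l/k, m/k) for the three stopping tests of Section 5.

    prim, dual, du: per-iteration norms ||dy||, ||dybar||, ||du||,
    assumed recorded during a run and eventually dropping below eps.
    """
    sa, sb = math.sqrt(alpha), math.sqrt(beta)
    k = next(i for i, p in enumerate(prim) if sa * p <= eps)
    l = next(i for i, (p, d) in enumerate(zip(prim, dual))
             if sa * p + sb * d <= eps)
    m = next(i for i, (p, d, c) in enumerate(zip(prim, dual, du))
             if sa * p + sb * d + c <= eps)
    return l / k, m / k
```

Sweeping eps over several orders of magnitude and plotting the two ratios reproduces the kind of curves shown in Figures 2 and 3.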

The analysis of the above numerical results shows that we have bounded dual retardation as well as bounded total retardation. Furthermore, we observe from Figure 3 that we can solve the constrained optimization problem at almost 4.5 times the cost needed to obtain feasibility, which seems to be a reasonable cost.

6. Conclusion

In this paper we solved a design optimization problem, where the constraint is a state equation, by using a descent approach on a merit function $L^a$ of the augmented Lagrangian type. We established a bounded dual retardation result and derived sufficient conditions on the weighting coefficients such that the proposed merit function $L^a$ is an exact penalty function and the step increment vector yields descent on that function at all its nonstationary points. A suitable preconditioner B has been proposed, and some numerical tests have been performed on the Bratu equation on the unit square. The obtained results show the advantage of using the descent approach rather than always accepting the coupled full step iterations. In fact, this approach enables the use of any combination of steps improving primal feasibility, dual feasibility and optimality, which reduces the number of iterations needed to reach convergence.

Outlook. As an outlook for our present study, we are working towards a global convergence proof. In addition, we aim to reduce the evaluation cost of the proposed preconditioner B. In fact, as defined in (68), B involves some matrix derivatives which may be costly to compute. We are studying quasi-Newton approximations for the required matrix derivatives in the expression of the preconditioner B. We are also using the FAS (Full Approximation Storage) multigrid approach (see [2] and [13]) to decrease the meshwidth.

References
[1] J. F. Bonnans, J. Ch. Gilbert, C. Lemaréchal, C. A. Sagastizábal (2003) Numerical Optimization: Theoretical and Practical Aspects, Springer-Verlag, Berlin Heidelberg.
[2] A. Brandt (1984) Multi-Grid Techniques: 1984 Guide with Applications to Fluid Dynamics, GMD-Studien no. 85, St. Augustin, Germany.


[3] J. E. Dennis, Jr. and R. B. Schnabel (1983) Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, New Jersey, USA.
[4] L. C. W. Dixon (1979) Exact Penalty Function Methods in Nonlinear Programming. Report NOC no. 103, The Hatfield Polytechnic.
[5] R. Fontecilla, T. Steihaug and R. A. Tapia (1987) A Convergence Theory for a Class of Quasi-Newton Methods for Constrained Optimization. SIAM J. Numer. Anal. 24, pp. 1133-1151.
[6] I. Gherman and V. Schulz (2005) Preconditioning of one-shot pseudo-timestepping methods for shape optimization, PAMM 5, no. 1, pp. 741-759.
[7] J. Ch. Gilbert (1992) Automatic Differentiation and Iterative Processes, Optimization Methods and Software, vol. 1, pp. 13-21.
[8] A. Griewank (2000) Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, Society for Industrial and Applied Mathematics, Philadelphia, USA.
[9] A. Griewank (2006) Projected Hessians for Preconditioning in One-Step One-Shot Design Optimization, Large Scale Nonlinear Optimization, pp. 151-171.
[10] A. Griewank and C. Faure (2002) Reduced Functions, Gradients and Hessians from Fixed Point Iterations for State Equations, Numerical Algorithms, vol. 30, no. 2, pp. 113-139.
[11] A. Griewank, N. Gauger, J. Riehme (2006) Extension of fixed point PDE solvers for optimal design by single-step one-shot method, REMN, to appear.
[12] A. Griewank and D. Kressner (2005) Time-lag in Derivative Convergence for Fixed Point Iterations, ARIMA, pp. 87-102.
[13] W. Hackbusch (1985) Multi-Grid Methods and Applications, Springer-Verlag, Heidelberg.
[14] A. Jameson (1995) Optimum aerodynamic design using CFD and control theory, in 12th AIAA Computational Fluid Dynamics Conference, AIAA paper 95-1729, San Diego, CA. American Institute of Aeronautics and Astronautics.
[15] C. T. Kelley (1999) Iterative Methods for Optimization, Society for Industrial and Applied Mathematics, Philadelphia, USA.
[16] B. Mohammadi and O. Pironneau (2001) Applied Shape Optimization for Fluids. Numerical Mathematics and Scientific Computation, Clarendon Press, Oxford.
[17] P. A. Newman, G. J.-W. Hou, H. E. Jones, A. C. Taylor and V. M. Korivi (1992) Observations on computational methodologies for use in large-scale, gradient-based, multidisciplinary design incorporating advanced CFD codes. Technical Memorandum 104206, NASA Langley Research Center. AVSCOM Technical Report 92-B-007.
[18] G. Di Pillo and L. Grippo (1986) A Continuously Differentiable Exact Penalty Function for Nonlinear Programming Problems with Inequality Constraints. SIAM J. Control Optim. 23, pp. 72-84.
[19] G. Di Pillo (1994) Exact penalty methods, in E. Spedicato (ed.), Algorithms for Continuous Optimization: The State of the Art, Kluwer Academic Publishers, pp. 209-253.
[20] C. Tutsch (2006) A Quasi-Gauss-Newton Method Based on Transposed Broyden Update, Diploma Thesis, Humboldt-Universität zu Berlin, Germany.
[21] P.-A. Wedin (1974) On the Gauss-Newton method for the nonlinear least squares problems, Working Paper 24, Institute for Applied Mathematics, Stockholm, Sweden.
[22] J. Werner (1992) Numerische Mathematik 1: Lineare und nichtlineare Gleichungssysteme, Interpolation, numerische Integration, Friedr. Vieweg & Sohn Verlagsgesellschaft mbH, Braunschweig/Wiesbaden, Germany.