
A QP-Free, Globally Convergent, Locally Superlinearly Convergent Algorithm for Inequality Constrained Optimization∗

Eliane R. Panier and André L. Tits
Department of Electrical Engineering and Institute for Systems Research, University of Maryland at College Park, College Park, MD 20742, USA

José N. Herskovits
Programa de Engenharia Mecânica, COPPE / Universidade Federal do Rio de Janeiro, 21945 Rio de Janeiro, Brazil


Erratum: In equation (5.3) (statement of Proposition 5.1), as well as in the last displayed equation in the proof of Proposition 5.1 (the expression for λ^0_k), M_k B_k^{-1} should be B_k^{-1} M_k. (This error is present in the published version; the corrected expressions are used in Section 5 below.)

Copyright (C) by the Society for Industrial and Applied Mathematics, in SIAM Journal on Control and Optimization, 26 (1988), pp. 788–811.



Eliane R. Panier and André L. Tits
Electrical Engineering Department and Systems Research Center, University of Maryland, College Park, MD 20742.

José N. Herskovits
Programa de Engenharia Mecânica, COPPE / Universidade Federal do Rio de Janeiro, 21945 Rio de Janeiro, Brazil.

Abstract. An algorithm is proposed for the minimization of a smooth function subject to smooth inequality constraints. Unlike Sequential Quadratic Programming type methods, this algorithm does not involve the solution of quadratic programs, but merely that of linear systems of equations. Locally the iteration can be viewed as a perturbation of a quasi-Newton iteration on both the primal and dual variables for the solution of the equalities in the Kuhn-Tucker first order conditions of optimality. It is observed that, provided the current iterate is feasible and the current multiplier estimates are strictly positive, the primal component of the quasi-Newton direction is a direction of descent for the objective function. This fact is used to induce global convergence, without the need of a surrogate merit function. A careful 'bending' of the search direction prevents any Maratos-like effect, and local superlinear convergence is proven. While the algorithm requires that an initial feasible point be available, the successive iterates it constructs are feasible as well, a valuable property in the context of engineering system design.

Key words: constrained optimization, quasi-Newton iteration, global convergence, superlinear convergence, engineering design.

AMS(MOS) subject classifications. 90C30, 65K10.

Abbreviated title. A QP-Free Algorithm for Constrained Optimization.

∗ This research was supported by the National Science Foundation under grants No. DMC-84-20740 and CDR-85-00108, and by FINEP (Brazil). It was initiated when the second author visited COPPE on a travel grant from the Partners of the Americas.

By analogy with a technique widely used in equality constrained optimization, it is proposed to solve the inequality constrained problem

    min f(x)  s.t.  g(x) ≤ 0,                                                   (1.1)

where f: ℝ^n → ℝ and g: ℝ^n → ℝ^m are smooth, using a quasi-Newton iteration applied to the solution of the equalities in the Kuhn-Tucker first order necessary conditions of optimality

    ∇_x L(x, λ) = 0
    λ_i g_i(x) = 0,   i = 1, ..., m,

where L(x, λ) = f(x) + Σ_{i=1}^m λ_i g_i(x) is the Lagrangian associated with (1.1). Specifically, consider the linear system in (d^0, λ^0)

    H d^0 + ∇_x L(x, λ^0) = 0                                                   (1.2)
    µ_i ⟨∇g_i(x), d^0⟩ + λ^0_i g_i(x) = 0,   i = 1, ..., m                      (1.3)

where H is an estimate of the Hessian of L, x the current estimate of a solution x∗, x + d^0 the next estimate, µ the current estimate of the Kuhn-Tucker multiplier vector associated with x∗, and λ^0 the next estimate of this vector. Locally, this iteration is a higher order perturbation of the Sequential Quadratic Programming (SQP) iteration, and a stabilization scheme to force global convergence has been proposed in the form of a compound algorithm involving the solution of a quadratic (or a linear) program at each one of the early iterations [14]. In this scheme, the local iteration is used if the step it yields is 'small enough' according to a prespecified linear convergence rate. Although this algorithm is proven to converge, it is subject to 'false starts' and repeated switching that are bound to reduce its efficiency.
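To make the structure of iteration (1.2)-(1.3) concrete, the following is a minimal numerical sketch (not part of the original paper): it assembles and solves this linear system for (d^0, λ^0) with NumPy. The problem data used at the end (a simple quadratic objective with one linear constraint, unit Hessian estimate and multiplier estimate) are hypothetical and serve only as an illustration.

```python
import numpy as np

def qp_free_direction(grad_f, grad_g, g_val, H, mu):
    """Solve the linear system (1.2)-(1.3) for (d0, lam0).

    grad_f : (n,) gradient of f at x
    grad_g : (n, m) columns are the constraint gradients at x
    g_val  : (m,) constraint values g(x) (x assumed strictly feasible, g(x) < 0)
    H      : (n, n) symmetric positive definite estimate of the Hessian of L
    mu     : (m,) current positive multiplier estimates
    """
    n, m = grad_g.shape
    # Coefficient matrix [[H, A], [M A^T, G]] with A = grad_g, M = diag(mu),
    # G = diag(g(x)); this is the matrix F(x, H, mu) of Lemma 3.1 below.
    F = np.zeros((n + m, n + m))
    F[:n, :n] = H
    F[:n, n:] = grad_g
    F[n:, :n] = mu[:, None] * grad_g.T
    F[n:, n:] = np.diag(g_val)
    rhs = np.concatenate([-grad_f, np.zeros(m)])
    sol = np.linalg.solve(F, rhs)
    return sol[:n], sol[n:]          # d0, lam0

# Hypothetical example: minimize f(x) = x1^2 + x2^2 subject to g1(x) = 1 - x1 <= 0,
# from a strictly feasible point.
x = np.array([2.0, 0.5])
d0, lam0 = qp_free_direction(2.0 * x, np.array([[-1.0], [0.0]]),
                             np.array([1.0 - x[0]]), np.eye(2), np.array([1.0]))
print(d0, lam0)   # d0 is a descent direction for f, cf. (1.4)-(1.5)
```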

In [5], [6], an iteration similar to (1.2)-(1.3) is considered, although µ is not interpreted as the current multiplier estimate. There it is observed that, if H is positive definite, if µ has strictly positive components, and if x satisfies the constraints as strict inequalities, d0 is a descent direction for both f and L(·, λ0 ). To allow a reasonably long step to be taken without leaving the feasible set, a modified direction d is computed by replacing the right hand side of (1.3) by a suitably small negative quantity. With such a search direction and with a stepsize rule based on the decrease of f and of the gi ’s for which the estimated multiplier λi is negative, global convergence to Kuhn-Tucker points is proven [6]. However, local superlinear convergence of the quasi-Newton iteration is lost due to truncation of the step. Thus it is proposed in [5] to use instead a line search based on a decrease of L(·, λ0 ). Under the assumption that the sequence of iterates converges, it is proven that the limit point x∗ is still a Kuhn-Tucker point. However, since λ0 (and thus the merit function L(·, λ0 )) changes at each iteration, oscillations between several accumulation points are clearly possible. It is then proven in [5] that, under the previous assumption and provided the components of µ corresponding to constraints that are active at the solution x∗ are forced to converge to the Kuhn-Tucker multipliers at x∗ , local superlinear convergence occurs. However, the question of devising a suitable updating rule for µ is not addressed and, as global convergence requires that the components of µ corresponding to constraints that are active at the solution be uniformly bounded from below, knowledge of 2

a lower bound for the corresponding multipliers is required.

The algorithm proposed in this paper overcomes the shortcomings just pointed out, namely (i) by using a stepsize rule based on a decrease of f, it avoids oscillations; sufficient descent is achieved by carefully limiting the size of the perturbation defining d, (ii) by using a further correction d̃ and a search along an arc [9], [13], it prevents the step from being truncated close to the solution, and (iii) it includes an updating rule for the vector µ which, while preserving global convergence, ensures convergence of the components of that vector to the true multipliers at the solution, thus ensuring superlinear convergence. While the need to provide an initial feasible point may be felt as a liability, the fact that all iterates are feasible, coupled with the decrease of the objective function at each iteration, can be of great value, for instance in the context of engineering design [12]. To our knowledge, the only existing superlinearly convergent algorithm enjoying these properties is the one in [13], and it requires the solution of two quadratic programs at each iteration. It is clear that the proposed algorithm can be extended to handle equality constraints as well, since for equality constrained problems the SQP iteration itself amounts to solving a system of linear equations. Obviously, however, feasibility of the successive iterates with respect to equality constraints cannot be preserved, and a merit function taking these into account must be used. A scheme such as the one used in [10] and in [6] may be appropriate. Finally, Coleman and Conn [1], [2] and Spellucci [17] have also proposed superlinearly convergent algorithms in which computation of a search direction requires only the solution of linear systems of equations. Unlike ours, these schemes are based on active set strategies.

Thus, let d^0 be as defined in (1.2)-(1.3). Suppose that µ > 0 and g(x) < 0, and assume that H is positive definite. Clearly, d^0 is a descent direction for f at x since (1.2) and (1.3) yield

    ⟨∇f(x), d^0⟩ = −⟨d^0, H d^0⟩ − Σ_{i=1}^m λ^0_i ⟨∇g_i(x), d^0⟩
                 = −⟨d^0, H d^0⟩ + Σ_{i=1}^m ((λ^0_i)^2 / µ_i) g_i(x)           (1.4)
                 ≤ −⟨d^0, H d^0⟩ ≤ −σ ‖d^0‖^2                                   (1.5)

for some positive σ. However, as pointed out above, d^0 is not entirely suitable as a search direction. Indeed, if any g_i(x) becomes very close to 0, (1.3) forces d^0 to tend to a direction tangent to the feasible set. Since feasibility of all iterates is required, this may result in a collapse of the step length, and convergence to a non-stationary point may occur. This effect can be avoided if one substitutes in the right hand side of (1.3) a negative number −ε. In order for the convergence properties of the quasi-Newton iteration to be preserved, ε would have to tend to zero faster than d^0, say, ε = ‖d^0‖^ν for some ν > 1. Thus, after d^0 is obtained, a new direction d^1 would be computed by solving the linear system (see also [5], [6], [13])

    H d^1 + ∇_x L(x, λ^1) = 0
    µ_i ⟨∇g_i(x), d^1⟩ + λ^1_i g_i(x) = −µ_i ‖d^0‖^ν,   i = 1, ..., m.          (1.6)

The introduction of µ_i in the right hand side of (1.6) reflects the fact that the correction is necessary only close to the constraint boundaries. Unfortunately, the new direction d^1 may no longer be a descent direction for f. Indeed, (1.5) now becomes

    ⟨∇f(x), d^1⟩ ≤ −σ ‖d^1‖^2 + ‖d^0‖^ν Σ_i λ^1_i.

It is thus proposed to use a search direction of the form d = (1 − ρ)d^0 + ρd^1 and the corresponding approximate multiplier vector λ = (1 − ρ)λ^0 + ρλ^1, with ρ ∈ [0, 1] as large as possible subject to the constraint that ⟨∇f(x), d⟩ ≤ θ ⟨∇f(x), d^0⟩ for a prespecified θ ∈ (0, 1). It turns out that, with such a search direction, the basic convergence properties of the quasi-Newton iteration are preserved.

Equation (1.4) suggests that linear system (1.2)-(1.3) may be singular if any of the components of µ vanishes (since d^0 may then tend to infinity), and that chronic ill-conditioning is likely to occur if some of these components are small. While, as shown below, bounding the components of µ away from zero ensures convergence to Kuhn-Tucker points, such a strategy may prevent superlinear convergence as µ may not converge to the true multiplier vector at the solution. It thus seems reasonable to bound µ away from 0 unless convergence to a Kuhn-Tucker point is detected, i.e., to use an update rule of the type (superscript '+' indicates the next iteration)

    µ^+_i = max{λ_i, ‖d‖},   i = 1, ..., m

if all multipliers λ_i seem to converge to nonnegative numbers, and

    µ^+_i = µ_{0,i},   i = 1, ..., m

otherwise, where µ_{0,i} are given positive numbers. The multipliers λ_i can be considered not to converge to nonnegative numbers if, e.g., they belong to the set J = {i s.t. λ_i ≤ g_i(x)}. Indeed, assuming that strict complementarity holds, J will be empty in the vicinity of a Kuhn-Tucker point. With such an updating scheme, components of µ may become very small only in the vicinity of a Kuhn-Tucker pair (x∗, µ∗), where (1.2)-(1.3) is well behaved whenever second order sufficiency conditions of optimality hold.

Maratos [8] pointed out that, in the case of the SQP iteration (direction referred to below as d_SQP) with line search based on the decrease of an exact penalty function, the unit step may not be accepted close to a solution. Mayne and Polak [9] later showed that, although x + d_SQP may not be 'lower' than x, x + d_SQP + d̃ will, if d̃ solves

    min  (1/2) ‖d̃‖^2
    s.t. g_i(x + d_SQP) + ⟨∇g_i(x), d̃⟩ = 0   ∀ i ∈ I(x∗)                       (1.7)

where I(x∗) is the index set of active constraints at x∗. To combine the global convergence properties associated with d_SQP and the local convergence properties associated with d_SQP + d̃, Mayne and Polak then suggest that the search be performed along the arc x + t d_SQP + t^2 d̃.(1) Correction (1.7) can be viewed as a second quasi-Newton iteration (with ∇g_i(x + d_SQP) estimated by ∇g_i(x)), analogous to an equality constraint 'restoration' phase, aiming at better approximating the solution x∗ by more closely approaching the boundaries of the constraints thought to be active at the solution. With a strict complementarity assumption in mind, we estimate the set I(x∗) by I = {i s.t. g_i(x) ≥ −λ_i}. It turns out that, if d is close enough to d_SQP, since the penalty function used in [9] is exactly f(x) when x is in the feasible set, a correction d̃ obtained with d substituted in (1.7) does result in f(x + d + d̃) being sufficiently lower than f(x), close to x∗. As d̃ is a second order correction on d, the condition is that d − d_SQP be of order higher than 2, which can be insured by selecting ν > 2 in (1.6). A remaining difficulty is that x + d + d̃ may fail to be in the feasible set, resulting in the unit step being rejected due to infeasibility. To prevent such an occurrence, a further 'bending' of the search direction will be used here (see [13]), i.e., the correction d̃ will be obtained by solving

    min  (1/2) ‖d̃‖^2_H
    s.t. g_i(x + d) + ⟨∇g_i(x), d̃⟩ = −ψ   ∀ i ∈ I

where the norm ‖d̃‖_H = ⟨d̃, H d̃⟩^{1/2} is now used for the sake of uniformity. The positive quantity ψ has to be carefully chosen, as excessive bending would jeopardize the descent property on f we just obtained. Also, if d̃ is too large, even the unit stepsize may not yield superlinear convergence.

(1) A similar idea had previously been used by Maratos [8], McCormick [11] and Gabay and Luenberger [3] among others.

A careful implementation of the ideas just put forth does indeed allow achievement of the properties claimed. The proposed algorithm is precisely stated in Section 2. In Section 3, it is shown that, under mild assumptions, this algorithm is convergent irrespective of the initial guess. Rate of convergence analysis is the object of Section 4, where conditions for superlinear convergence are given. In Section 5, implementation issues are discussed. In particular, it is shown that computational requirements can be reduced, in the worst case, to little more than two m × m matrix decompositions per iteration (in addition to function evaluations).

Throughout the paper, we will denote the feasible set by

    X = {x ∈ ℝ^n s.t. g_i(x) ≤ 0, i = 1, ..., m}

and the strictly feasible set by

    X_0 = {x ∈ ℝ^n s.t. g_i(x) < 0, i = 1, ..., m}.

The set of active indices at a point x is defined as

    I(x) = {i ∈ {1, ..., m} s.t. g_i(x) = 0}.

The Euclidean norm of any vector v will be denoted by ‖v‖, the corresponding induced norm of a matrix M by ‖M‖, and the cardinality of any finite set I by |I|; the empty set will be written as ∅. A point x is said to be stationary for (1.1) if it is feasible and satisfies the equalities

    ∇f(x) + Σ_{i=1}^m λ_i ∇g_i(x) = 0
    λ_i g_i(x) = 0,   i = 1, ..., m

for some multipliers λ_i, i = 1, ..., m. If, moreover, the multipliers are all nonnegative, the point x is said to be a Kuhn-Tucker point associated with Problem (1.1).
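As a small illustration (not from the paper), the two definitions can be checked numerically: a point is stationary if the Lagrangian gradient vanishes and complementarity holds for some multipliers, and it is a Kuhn-Tucker point if those multipliers are in addition nonnegative. The tolerance and the calling data below are placeholder assumptions.

```python
import numpy as np

def classify_point(grad_f, grad_g, g_val, lam, tol=1e-8):
    """Return 'KT', 'stationary', or 'neither' for a feasible x with candidate multipliers lam.

    grad_f : (n,) gradient of f at x
    grad_g : (n, m) constraint gradients at x
    g_val  : (m,) constraint values (x assumed feasible: g_val <= 0)
    lam    : (m,) candidate multipliers
    """
    stationarity = np.linalg.norm(grad_f + grad_g @ lam) <= tol
    complementarity = np.all(np.abs(lam * g_val) <= tol)
    if not (stationarity and complementarity):
        return "neither"
    return "KT" if np.all(lam >= -tol) else "stationary"
```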

The following algorithm is proposed for solving Problem (1.1).

Algorithm A.

Parameters. α ∈ (0, 1/2), β ∈ (0, 1), θ ∈ (0, 1), ν > 2, τ ∈ (2, 3), κ ∈ (0, 1), µ̄ > 0.

Data. x_0 ∈ X_0; H_0 ∈ ℝ^{n×n}, symmetric positive definite matrix; µ_{0,i} scalars satisfying 0 < µ_{0,i} ≤ µ̄, i = 1, ..., m.

Step 0. Initialization. Set k = 0.

Step 1. Computation of a search direction.
i. Compute d^0_k and λ^0_k solving the linear system in (d, λ)

    (LS1)   H_k d + ∇f(x_k) + Σ_{i=1}^m λ_i ∇g_i(x_k) = 0                       (2.1)
            µ_{k,i} ⟨∇g_i(x_k), d⟩ + λ_i g_i(x_k) = 0,   i = 1, ..., m.          (2.2)

If d^0_k = 0 stop.
ii. Compute d^1_k and λ^1_k solving the linear system in (d, λ)

    (LS2)   H_k d + ∇f(x_k) + Σ_{i=1}^m λ_i ∇g_i(x_k) = 0                       (2.3)
            µ_{k,i} ⟨∇g_i(x_k), d⟩ + λ_i g_i(x_k) = −µ_{k,i} ‖d^0_k‖^ν,   i = 1, ..., m.   (2.4)

iii. Compute the search direction d_k and the approximate multiplier vector λ_k according to

    d_k = (1 − ρ_k) d^0_k + ρ_k d^1_k;   λ_k = (1 − ρ_k) λ^0_k + ρ_k λ^1_k,       (2.5)

where

    ρ_k = 1                                                      if ⟨∇f(x_k), d^1_k⟩ ≤ θ ⟨∇f(x_k), d^0_k⟩,
    ρ_k = (1 − θ) ⟨∇f(x_k), d^0_k⟩ / ⟨∇f(x_k), d^0_k − d^1_k⟩     otherwise.     (2.6)

iv. Compute the set of indices of 'active' constraints

    I_k = {i s.t. g_i(x_k) ≥ −λ_{k,i}}                                           (2.7)

and the set of indices of constraints with multipliers of 'wrong sign'

    J_k = {i s.t. λ_{k,i} ≤ g_i(x_k)}.                                           (2.8)

v. If J_k = ∅, compute a correction d̃_k, solution of the linear least squares problem in d

    (LS3)   min  (1/2) ‖d‖^2_{H_k}
            s.t. g_i(x_k + d_k) + ⟨∇g_i(x_k), d⟩ = −ψ_k   ∀ i ∈ I_k

where

    ψ_k = max{ ‖d_k‖^τ , max_{i∈I_k} |µ_{k,i}/λ_{k,i} − 1|^κ ‖d_k‖^2 }.          (2.9)

If J_k ≠ ∅, or if (LS3) has no solution, or if ‖d̃_k‖ > ‖d_k‖, set d̃_k = 0.

Step 2. Line search. Compute t_k, the first number t of the sequence {1, β, β^2, ...} satisfying

    f(x_k + t d_k + t^2 d̃_k) ≤ f(x_k) + α t ⟨∇f(x_k), d_k⟩                       (2.10)
    g_i(x_k + t d_k + t^2 d̃_k) ≤ g_i(x_k),   i ∈ J_k                            (2.11)
    g_i(x_k + t d_k + t^2 d̃_k) < 0,   i ∉ J_k.                                  (2.12)

Step 3. Updates. Compute a new symmetric positive definite approximation H_{k+1} to the Hessian matrix. If J_k ≠ ∅, set

    µ_{k+1,i} = µ_{0,i},   i = 1, ..., m;                                        (2.13a)

otherwise, set

    µ_{k+1,i} = min{ max{λ_{k,i}, ‖d_k‖}, µ̄ },   i = 1, ..., m.                 (2.13b)

Set x_{k+1} = x_k + t_k d_k + t_k^2 d̃_k. Set k = k + 1. Go back to Step 1.
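As an illustration only (not from the paper), the bookkeeping of Steps 1.iii-1.v can be written in a few lines of NumPy: the convex-combination coefficient ρ_k of (2.6), the index sets (2.7)-(2.8), and the bending parameter ψ_k of (2.9). The inputs are assumed to have been obtained from (LS1) and (LS2), e.g. with a small solver such as the one sketched in the introduction above.

```python
import numpy as np

def combine_directions(grad_f, d0, d1, lam0, lam1, theta):
    """Step 1.iii: convex combination (2.5) with rho_k from (2.6)."""
    s0 = grad_f @ d0
    s1 = grad_f @ d1
    if s1 <= theta * s0:
        rho = 1.0
    else:
        rho = (1.0 - theta) * s0 / (grad_f @ (d0 - d1))
    return (1 - rho) * d0 + rho * d1, (1 - rho) * lam0 + rho * lam1, rho

def index_sets(g_val, lam):
    """Step 1.iv: 'active' set I_k from (2.7) and 'wrong sign' set J_k from (2.8)."""
    I_k = np.where(g_val >= -lam)[0]
    J_k = np.where(lam <= g_val)[0]
    return I_k, J_k

def bending_parameter(d, lam, mu, I_k, tau, kappa):
    """Right-hand side shift psi_k of (2.9) used in (LS3); assumes lam[I_k] != 0."""
    nd = np.linalg.norm(d)
    if len(I_k) == 0:
        return nd ** tau
    ratio = np.max(np.abs(mu[I_k] / lam[I_k] - 1.0)) ** kappa
    return max(nd ** tau, ratio * nd ** 2)
```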

Condition (2.11) in the line search has not been discussed so far. By forcing a decrease of the constraints with multiplier of ‘wrong’ sign, it ensures that stationary points that are not Kuhn-Tucker points will be avoided. This will always be possible since, as is easily checked, dk is a descent direction for these constraints.
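A minimal sketch (not from the paper) of the arc search of Step 2: backtracking over t ∈ {1, β, β², ...} until the sufficient-decrease condition (2.10) and the monotonicity/feasibility conditions (2.11)-(2.12) all hold. The callables f and g are assumed to evaluate the objective and the constraint vector; the iteration cap is only a safeguard for the sketch, since Proposition 3.3 guarantees termination.

```python
import numpy as np

def arc_search(f, g, x, d, d_tilde, grad_f, J_k, alpha, beta, max_backtracks=50):
    """Step 2 of Algorithm A: backtrack along the arc x + t*d + t**2*d_tilde."""
    fx, gx = f(x), g(x)
    slope = grad_f @ d                     # < 0 by Lemma 3.2
    not_J = np.ones(len(gx), dtype=bool)
    not_J[J_k] = False
    t = 1.0
    for _ in range(max_backtracks):
        x_new = x + t * d + t * t * d_tilde
        g_new = g(x_new)
        if (f(x_new) <= fx + alpha * t * slope          # (2.10)
                and np.all(g_new[J_k] <= gx[J_k])        # (2.11)
                and np.all(g_new[not_J] < 0.0)):         # (2.12)
            return t, x_new
        t *= beta
    raise RuntimeError("line search did not terminate within the backtracking cap")
```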

In this section, it is first shown that Algorithm A is well defined, i.e., that (LS1) and (LS2) always have a unique solution and that the line search (Step 2) completes successfully. This is the object of Lemmas 3.1-3.2 and of Proposition 3.3. The case when the algorithm stops after finitely many iterations is then considered in Proposition 3.4. Finally, Lemmas 3.5-3.9, Proposition 3.10 and Theorem 3.11 constitute the actual convergence analysis. The following is assumed throughout the paper. 11

Assumptions.
A1. The strictly feasible set X_0 is nonempty.
A2. The objective f is continuously differentiable.
A3. The constraints g_i, i = 1, ..., m, are continuously differentiable.
A4. The set X ∩ {x s.t. f(x) ≤ f(x_0)} is compact.
A5. For all x ∈ X, the vectors ∇g_i(x), i ∈ I(x), are linearly independent.
A6. There exist σ_1, σ_2 > 0 such that σ_1 ‖d‖^2 ≤ ⟨d, H_k d⟩ ≤ σ_2 ‖d‖^2, ∀ k, ∀ d ∈ ℝ^n.

Our first task is to show that the algorithm is well defined. That (LS1) and (LS2) have a unique solution follows from Lemma 3.1. This lemma will be of further help later on, in the convergence analysis.

Lemma 3.1. Given any vector x ∈ X, any positive definite matrix H ∈ ℝ^{n×n} and any nonnegative vector µ ∈ ℝ^m such that µ_i > 0 if g_i(x) = 0, the matrix F(x, H, µ) defined by

    F(x, H, µ) = [ H                ∇g_1(x)  ···  ∇g_m(x)
                   µ_1 ∇g_1(x)^T     g_1(x)
                   ⋮                            ⋱
                   µ_m ∇g_m(x)^T                      g_m(x) ]

is nonsingular.

Proof. It is enough to show that the only solution (d, λ) of the homogeneous system

    H d + Σ_{i=1}^m λ_i ∇g_i(x) = 0                                              (3.1)
    µ_i ⟨∇g_i(x), d⟩ + λ_i g_i(x) = 0,   i = 1, ..., m                           (3.2)

is (0, 0). Scalar multiplication of both sides of (3.1) by d yields

    ⟨d, H d⟩ + Σ_{i=1}^m λ_i ⟨∇g_i(x), d⟩ = 0.                                    (3.3)

On the other hand, it follows from (3.2) and the assumption on µ that

    ⟨∇g_i(x), d⟩ = 0   ∀ i ∈ I(x).

Similarly,

    λ_i = 0   ∀ i s.t. µ_i = 0.                                                  (3.4)

Therefore,

    Σ_{i=1}^m λ_i ⟨∇g_i(x), d⟩ = Σ_{i∉I(x) s.t. µ_i>0} λ_i ⟨∇g_i(x), d⟩.

Using again (3.2), one gets

    Σ_{i=1}^m λ_i ⟨∇g_i(x), d⟩ = − Σ_{i∉I(x) s.t. µ_i>0} (λ_i^2 / µ_i) g_i(x).    (3.5)

Relationship (3.3) thus yields

    ⟨d, H d⟩ − Σ_{i∉I(x) s.t. µ_i>0} (λ_i^2 / µ_i) g_i(x) = 0.

Since H is positive definite and g_i(x) ≤ 0, i = 1, ..., m, the last equality implies d = 0 and λ_i = 0 for all i ∉ I(x) such that µ_i > 0. In view of (3.4), Assumption A5 together with (3.1) then implies that (d, λ) = (0, 0).

Thus, (d^0_k, λ^0_k) and (d^1_k, λ^1_k) are well defined. The descent properties of d_k follow from the next lemma.

Lemma 3.2. The search direction d_k satisfies

    ⟨∇f(x_k), d_k⟩ ≤ θ ⟨∇f(x_k), d^0_k⟩ ≤ −θ ⟨d^0_k, H_k d^0_k⟩.                  (3.6)

Moreover,

    ⟨∇g_i(x_k), d_k⟩ < 0   ∀ i ∈ J_k.                                            (3.7)

Proof. Scalar multiplication of both sides of (2.1) by d^0_k yields, using (2.2),

    ⟨∇f(x_k), d^0_k⟩ = −⟨d^0_k, H_k d^0_k⟩ + Σ_{i=1}^m ((λ^0_{k,i})^2 / µ_{k,i}) g_i(x_k).

In view of the feasibility of iterate x_k, one has ⟨∇f(x_k), d^0_k⟩ ≤ −⟨d^0_k, H_k d^0_k⟩, and (3.6) follows from (2.5)-(2.6). Finally, (2.2), (2.4) and the definition of d_k in (2.5) yield

    ⟨∇g_i(x_k), d_k⟩ = −(λ_{k,i} / µ_{k,i}) g_i(x_k) − ρ_k ‖d^0_k‖^ν

so that, in view of (2.8),

    ⟨∇g_i(x_k), d_k⟩ < −(1/µ_{k,i}) (g_i(x_k))^2 − ρ_k ‖d^0_k‖^ν < 0   ∀ i ∈ J_k.

These descent properties are the key to proving that the line search always completes, thus that the algorithm is well defined.

Proposition 3.3. The algorithm is well defined.

Proof. We just need to show that the line search yields a step t_k = β^j for some finite j = j(k). Since g(x_k) < 0, in view of regularity Assumption A3, it is enough to show that (2.10) and (2.11) hold for t small enough. The Mean Value Theorem yields

    f(x_k + t d_k + t^2 d̃_k) = f(x_k) + t ⟨∇f(x_k + ξ d_k + ξ^2 d̃_k), d_k + 2ξ d̃_k⟩

for some ξ ∈ [0, t]. Since f is continuously differentiable, ⟨∇f(x_k), d_k⟩ < 0 (see (3.6) and Assumption A6) and α < 1, there exists t_f > 0 such that

    f(x_k + t d_k + t^2 d̃_k) ≤ f(x_k) + α t ⟨∇f(x_k), d_k⟩   ∀ t ∈ [0, t_f].

For i ∈ {1, ..., m} it also holds that

    g_i(x_k + t d_k + t^2 d̃_k) = g_i(x_k) + t ⟨∇g_i(x_k + ξ_i d_k + ξ_i^2 d̃_k), d_k + 2ξ_i d̃_k⟩

for some ξ_i ∈ [0, t]. If i belongs to J_k, in view of (3.7) and the fact that g_i is continuously differentiable, there exists t_i > 0 such that

    g_i(x_k + t d_k + t^2 d̃_k) ≤ g_i(x_k)   ∀ t ∈ [0, t_i].

Define t̄ = min{t_f, t_i, i ∈ J_k}. The line search conditions (2.10) and (2.11) are satisfied for all t in [0, t̄].

The particular case when the algorithm stops at an iterate xk at Step 1.i is considered next. Proposition 3.4. If the algorithm stops at a point xk with d0k = 0, then ∇f (xk ) = 0. Proof. If the solution d0k of (LS1) is zero, from (2.2) and the strict feasibility of the iterate xk , it follows that λ0k,i = 0,

i = 1, . . . , m, so that, in view of (2.1), ∇f (xk ) = 0.

Thus, a point at which the algorithm stops is a particular Kuhn-Tucker point. From now on, it is assumed that the algorithm never stops. The next two lemmas, direct consequences of Lemma 3.1, give asymptotic properties of the solutions of (LS1) and (LS2) as {xk } approaches a limit x∗ , not necessarily a stationary point. Lemma 3.5 asserts that dk is a good approximation to d0k whenever d0k is small, provided a certain condition, related to strict complementarity, holds at x∗ . Lemma 3.6 establishes that, under the same assumptions, the solutions of both problems are well behaved as xk approaches x∗ . These lemmas will be of use in proving global convergence of the algorithm as well as in proving superlinear convergence in Section 4. 16

Lemma 3.5. Let K ⊂ ℕ be a subset of indices such that, for some x∗ and µ∗,

    {x_k}_{k∈K} → x∗   and   {µ_k}_{k∈K} → µ∗.

Suppose moreover that µ∗_i > 0 if g_i(x∗) = 0. Then, there exists C such that, for all k ∈ K,

    ‖d_k − d^0_k‖ ≤ C ‖d^0_k‖^ν.

Proof. Let C = µ̄ sup{‖F(x_k, H_k, µ_k)^{-1}‖ s.t. k ∈ K}, where F(x_k, H_k, µ_k) is as in Lemma 3.1. Under the present assumptions and in view of Assumption A6 and Lemma 3.1, C is finite. We now define ∆λ_k = λ_k − λ^0_k and ∆d_k = d_k − d^0_k. The vector (∆d_k, ∆λ_k) is solution of

    F(x_k, H_k, µ_k) (∆d_k ; ∆λ_k) = (0 ; −ρ_k µ_k ‖d^0_k‖^ν).

Since the coefficients ρ_k are bounded from above by 1, it follows that

    (‖∆d_k‖^2 + ‖∆λ_k‖^2)^{1/2} ≤ C ‖d^0_k‖^ν,

so that ‖∆d_k‖ ≤ C ‖d^0_k‖^ν.

Lemma 3.6. Let K ⊂ ℕ be a subset of indices such that, for some x∗, H∗, ρ∗ and µ∗,

    {x_k}_{k∈K} → x∗                                                             (3.8)
    {H_k}_{k∈K} → H∗                                                             (3.9)
    {ρ_k}_{k∈K} → ρ∗                                                             (3.10)
    {µ_k}_{k∈K} → µ∗.                                                            (3.11)

Suppose moreover that µ∗_i > 0 for all i such that g_i(x∗) = 0. Then
i. {d^0_k}_{k∈K} → d^{0∗} and {λ^0_k}_{k∈K} → λ^{0∗}, where (d^{0∗}, λ^{0∗}) is the solution of the nonsingular linear system in (d, λ)

    H∗ d + ∇f(x∗) + Σ_{i=1}^m λ_i ∇g_i(x∗) = 0
    µ∗_i ⟨∇g_i(x∗), d⟩ + λ_i g_i(x∗) = 0,   i = 1, ..., m.

ii. {d^1_k}_{k∈K} → d^{1∗}, {d_k}_{k∈K} → d∗, {λ^1_k}_{k∈K} → λ^{1∗}, and {λ_k}_{k∈K} → λ∗, where (d^{1∗}, λ^{1∗}) solves the linear system in (d, λ)

    H∗ d + ∇f(x∗) + Σ_{i=1}^m λ_i ∇g_i(x∗) = 0
    µ∗_i ⟨∇g_i(x∗), d⟩ + λ_i g_i(x∗) = −µ∗_i ‖d^{0∗}‖^ν,   i = 1, ..., m,

and d∗ = (1 − ρ∗) d^{0∗} + ρ∗ d^{1∗}, λ∗ = (1 − ρ∗) λ^{0∗} + ρ∗ λ^{1∗}.
iii. d^{0∗} = 0 if and only if d∗ = 0.

Proof. The first assertion follows from Lemma 3.1 and the Implicit Function Theorem. The second assertion follows from Lemma 3.1, the Implicit Function Theorem, and the definition of d_k in (2.5)-(2.6). The 'only if' part of the third assertion follows from Lemma 3.5. Finally, by continuity, (3.6) implies that

    ⟨∇f(x∗), d∗⟩ ≤ −θ ⟨d^{0∗}, H∗ d^{0∗}⟩

so that, in view of Assumption A6, the 'if' part of the third assertion also holds.

In the balance of this section, it is shown that any accumulation point of the sequence {x_k} constructed by Algorithm A is a Kuhn-Tucker point. This result follows easily under the assumption that the directions d_k tend to zero and the sets J_k of indices of constraints with multiplier of wrong sign are empty, even when the assumptions of Lemma 3.6 are not satisfied. This is the object of the next lemma.

Lemma 3.7. Let x∗ be an accumulation point of the sequence generated by Algorithm A and suppose that {x_k}_{k∈K} → x∗. If {d_k}_{k∈K} → 0, then x∗ is a stationary point and {λ_k}_{k∈K} → λ∗, where λ∗ is the multiplier vector associated with x∗. If moreover J_k = ∅ ∀ k ∈ K, then x∗ is a Kuhn-Tucker point. If in addition λ∗_i ≤ µ̄ ∀ i, then it also holds that {µ_{k+1}}_{k∈K} → λ∗.

Proof. In view of (3.6) and of Assumption A6, if a subsequence of directions {d_k}_{k∈K} converges to zero, the corresponding subsequence {d^0_k}_{k∈K} must also converge to zero. Now, the quantities

{λk } and {dk } satisfy Hk dk + ∇f (xk ) +

m X

λk,i ∇gi (xk ) = 0

(3.12)

i=1

µk,i h∇gi (xk ), dk i + λk,i gi (xk ) = −ρk µk,i kd0k kν .

(3.13)

Let us show that the coefficients λk,i must be bounded on K for all i. By contradiction, suppose that there exists K ′ ⊂ K and i0 such that

|λk,i0 | = max |λk,i | i

∀ k ∈ K′

and −→ ∞. |λk,i0 | k∈K ′ k→∞

Dividing (3.12) by |λk,i0 | gives m X λk,i Hk d k + ∇f (xk ) + ∇gi (xk ) = 0. |λk,i0 | |λk,i0 | |λk,i0 | i=1

1

ˆ k,i = λk,i /|λk,i |, The scalars λ 0

1

ˆ k,i ≤ 1, i = 1, . . . , m, satisfy −1 ≤ λ

ˆ i for some λ ˆ k,i −→ λ ˆi, there exists K ′′ ⊂ K ′ such that λ k∈K ′′

(3.14)

i = 1, . . . , m, so that

i = 1, . . . , m. Taking the limit

k→∞

on K ′′ in (3.14), we thus get, in view of Assumptions A2, A3, A6 and the fact that {dk } is bounded on K, m X

ˆ i ∇gi (x∗ ) = 0. λ

i=1

Now, from (3.13), we have λk,i −ρk µk,i kd0k kν µk,i h∇gi (xk ), dk i + gi (xk ) = . |λk,i0 | |λk,i0 | |λk,i0 | 20

(3.15)

Taking the limit on K ′′ in the last equality gives

ˆ i gi (x∗ ) = 0, λ

i = 1, . . . , m

ˆ i = 0 for all i such that gi (x∗ ) < 0. Therefore, (3.15) becomes i.e., λ

X

ˆ i ∇gi (x∗ ) = 0. λ

i∈I(x∗ )

ˆ i = 0, In view of Assumption A5, this implies that λ

i = 1, . . . , m, which is a contradiction

ˆ i = lim k∈K ′′ λk,i /|λk,i | ∈ {−1, 1}. Thus, the coefficients λk,i are bounded on K. since λ 0 0 0 k→∞

ˆ of {λk }k∈K and K ′ ⊂ K such that {λk } −→ λ. ˆ Let us now consider an accumulation point λ k∈K ′ k→∞

Taking the limit on K ′ in (3.12) and (3.13) yields, in view of Assumptions A2, A3 and A6, −→ 0 and {d } −→ 0, since {d0k } k∈K k k∈K k→∞

k→∞



∇f (x ) +

m X

ˆ i ∇gi (x∗ ) = 0 λ

i=1

ˆ i gi (x∗ ) = 0. λ Since x∗ is feasible, as the limit of a sequence of feasible points, it follows that it is a stationary ˆ is equal to the unique point. Also, in view of Assumption A5, the accumulation point λ multiplier vector λ∗ at x∗ . If the sets Jk are empty on K, i.e., if

λk,i > gi (xk ),

i = 1, . . . , m,

∀ k ∈ K,

it follows by continuity that λ∗i ≥ gi (x∗ ),

i = 1, . . . , m 21

so that the multipliers associated with active constraints are nonnegative, and x∗ is a Kuhn-Tucker point. Finally, the last result comes directly from the definition of µ_{k+1} in (2.13).

For the overall proof, assuming that the sequence {x_k} has an accumulation point x∗ corresponding to some K ⊂ ℕ, two cases will be considered, according to whether or not the assumptions in Lemma 3.7 are satisfied at the previous iteration, i.e., whether or not {d_{k−1}}_{k∈K} → 0 and J_{k−1} = ∅ ∀ k ∈ K. If this condition is satisfied, it is a simple consequence of Lemma 3.7 that x∗ is still a Kuhn-Tucker point (Lemma 3.7 asserts that limit points of {x_{k−1}} on K are such). This is established in Lemma 3.8. If the condition is not satisfied, in view of the update formula (2.13), it can be assumed, without loss of generality, that the components of µ_k are bounded away from zero on K, so that Lemma 3.6 can be used. It will be shown that, in such a case, {d_k} must tend to zero on K (Lemma 3.9) so that, in view of Lemma 3.7, x∗ must be stationary (Proposition 3.10). Under an additional mild assumption, Theorem 3.11 will establish convergence to Kuhn-Tucker points.

Lemma 3.8. Let x∗ be an accumulation point of the sequence {x_k} generated by Algorithm A and let K ⊂ ℕ be such that {x_k}_{k∈K} → x∗. If

    {d_{k−1}}_{k∈K} → 0                                                          (3.16)

and

    J_{k−1} = ∅   ∀ k ∈ K,                                                       (3.17)

then x∗ is a Kuhn-Tucker point.

Proof. Reducing, if necessary, K to a subset of indices, we may assume, in view of Assumption A4, that the sequence {x_{k−1}} converges on K to some vector x∗∗. In view of Lemma 3.7 and Relationships (3.16)-(3.17), x∗∗ is a Kuhn-Tucker point. We show that x∗ = x∗∗. For k ∈ K, we have x_k = x_{k−1} + t_{k−1} d_{k−1} + t_{k−1}^2 d̃_{k−1}. Therefore,

    ‖x_k − x_{k−1}‖ ≤ t_{k−1} ‖d_{k−1}‖ + t_{k−1}^2 ‖d̃_{k−1}‖ ≤ 2 ‖d_{k−1}‖.

Since {d_{k−1}} tends to zero on K, this implies {‖x_k − x_{k−1}‖}_{k∈K} → 0, thus x∗ = x∗∗.

Lemma 3.9. Let K ⊂ ℕ be a subset of indices such that either inf_{k∈K} ‖d_{k−1}‖ > 0 or J_{k−1} ≠ ∅ ∀ k ∈ K. Then {d_k}_{k∈K} → 0.

Proof. Suppose by contradiction that there exists a subset K′ of indices such that inf_{k∈K′} ‖d_k‖ > 0. In view of Assumptions A4, A6 and the definitions of the quantities ρ_k and µ_k in (2.6) and (2.13) respectively, it can be assumed, without loss of generality, that (3.8)-(3.11) hold on K′ for some x∗, H∗, ρ∗ and µ∗. Also, from the definition of the multiplier

vector estimate (2.13), it follows, under the present assumptions, that all the components of µ∗ are strictly positive, so that, from Lemma 3.6 iii, the sequence {kd0k k}, k ∈ K ′ , is bounded away from zero. Let d > 0 be a lower bound on {kd0k k}, k ∈ K ′ . We first show that there exists t > 0 such that, for all t ∈ [0, t] and k ∈ K ′ large enough, Conditions (2.10), (2.11) and (2.12) of the line search are satisfied. It follows from (3.6) and Assumption A6 that, for k ∈ K ′ ,

h∇f (xk ), dk i ≤ −θσ1 d2 .

(3.18)

Since, using the Mean Value Theorem, one can write

f (xk + tdk + t d˜k ) = f (xk ) + 2

Z

1

h∇f (xk + tξdk + t2 ξ 2 d˜k ), tdk + 2t2 ξ d˜k idξ ,

0

it follows that, for k ∈ K ′ large enough,

f (xk + tdk + t2 d˜k ) − f (xk ) − αth∇f (xk ), dk i

≤ t{

Z

1

[h∇f (xk + tξdk + t2 ξ 2 d˜k ), dk + 2tξ d˜k i − h∇f (xk ), dk i]dξ − (1 − α)θσ1 d2 }

0

≤ t{ sup k∇f (xk + tξdk + t2 ξ 2 d˜k ) − ∇f (xk )k kdk k ξ∈[0,1]

+2t sup k∇f (xk + tξdk + t2 ξ 2 d˜k )k kd˜k k − (1 − α)θσ1 d2 }. ξ∈[0,1]

Since, from Lemma 3.6, dk and d˜k are bounded on K ′ (by definition kd˜k k ≤ kdk k) and f ∈ C 1 , this ensures that there exists tf > 0 independent of k, such that, for k ∈ K ′ large enough,

f (xk + tdk + t2 d˜k ) − f (xk ) − αth∇f (xk ), dk i ≤ 0 24

∀ t ∈ [0, tf ]

(3.19)

Similarly, for i ∈ {1, . . . , m}, k ∈ K ′ large enough, one has gi (xk + tdk + t2 d˜k ) ≤ gi (xk ) + t{ sup k∇gi (xk + tξdk + t2 ξ 2 d˜k ) − ∇gi (xk )k kdk k ξ∈[0,1]

+ 2t sup k∇gi (xk + tξdk + t2 ξ 2 d˜k )k kd˜k k + h∇gi (xk ), dk i}. ξ∈[0,1]

Let us define uk,i (t) = sup k∇gi (xk + tξdk + t2 ξ 2 d˜k ) − ∇gi (xk )k kdk k ξ∈[0,1]

+ 2t sup k∇gi (xk + tξdk + t2 ξ 2 d˜k )k kd˜k k. ξ∈[0,1]

In view of (3.19), our first claim will hold provided we can find a uniform range of t’s for which we have γk,i gi (xk ) + t{uk,i (t) + h∇gi (xk ), dk i} < 0,

i = 1, . . . , m

(3.20)

with γk,i =

½

0 1

if i ∈ Jk if i ∈ 6 Jk .

Using (2.2), (2.4) and (2.5), we can write

h∇gi (xk ), dk i = −

λk,i gi (xk ) − ρk kd0k kν , µk,i

i = 1, . . . , m.

(3.21)

Since, in view of (3.6) and Assumption A6, d0k satisfies h∇f (xk ), d0k i ≤ −σ1 d2 , it follows from the boundedness of the sequences {xk }, {d0k } and {d1k } on K ′ , from Assumption A2 and from the definition (2.6) of ρk that there exists ρ > 0 such that ρk ≥ ρ

∀ k ∈ K ′ . Relationship

(3.21) thus yields, for k ∈ K ′ ,

h∇gi (xk ), dk i ≤ −

λk,i gi (xk ) − ρ dν , µk,i 25

i = 1, . . . , m.

In order for (3.20) to be satisfied, it is thus sufficient to have, for k ∈ K ′ large enough,

γk,i gi (xk ) + t{uk,i (t) −

λk,i gi (xk ) − ρ dν } < 0, µk,i

i = 1, . . . , m.

(3.22)

Let λ∗ be as defined in Lemma 3.6. For i such that λ∗i = 0, in view of the definition of uk,i (t), the boundedness of dk and d˜k on K ′ , the convergence of {µk } to a vector with positive components on K ′ , and the fact that gi ∈ C 1 , there exists ti > 0, independent of k, such that, for t ∈ [0, ti ], k ∈ K ′ large enough,

uk,i (t) −

λk,i gi (xk ) − ρ dν < 0. µk,i

so that (3.22) holds. Now, let i be such that λ∗i 6= 0. In order for (3.22) to be satisfied, it is sufficient to have, uk,i (t) − ρ dν < 0

(3.23)

λk,i ≥ 0. µk,i

(3.24)

and γk,i − t

Inequality (3.23) will obviously hold for all t ≥ 0 small enough and k ∈ K ′ large enough, and similarly for (3.24) if λ∗i < 0. Suppose λ∗i > 0. For k ∈ K ′ large enough, λk,i > 0 and, in view of the strict feasibility of the iterates, i 6∈ Jk . Therefore γk,i = 1 and (3.24) reduces to t ≤ µk,i /λk,i . The latter ratio is bounded away from 0 on K ′ . Therefore, there exists again a number ti > 0, independent of k, such that, for t ∈ [0, ti ], k ∈ K ′ large enough, inequality (3.22) is satisfied. Thus, the first claim holds. 26

From the line search rule (2.10) and relationship (3.18), it follows that, for k ∈ K ′ large enough, f (xk + tk dk + t2k d˜k ) ≤ f (xk ) − αtθσ1 d2 which, in view of the monotonic decrease of f on the entire sequence {xk }, contradicts the convergence of {f (xk )} on K ′ .

The following has thus been established.

Proposition 3.10. Let x∗ be an accumulation point of the sequence generated by Algorithm A. Then, x∗ is a stationary point.

To conclude this section we show that, under an additional mild assumption, accumulation points of the sequence {xk } are Kuhn-Tucker points. The proof relies on Condition (2.11) of the line search, which forces decrease of the constraints with multiplier of ‘wrong sign’. A7. The number of stationary points is finite.

Theorem 3.11. All the accumulation points of the sequence {x_k} generated by Algorithm A are Kuhn-Tucker points.

Proof. From Proposition 3.10, every accumulation point of the sequence generated by the algorithm is a stationary point. Thus, under Assumption A7, there exists ε > 0 such that every accumulation point is unique in a ball of radius ε about it. Consider an accumulation point x∗ and, proceeding by contradiction, suppose that x∗ is not a Kuhn-Tucker point.

We first show that the entire sequence converges to x∗. Let us suppose, by contradiction, that, infinitely many times, the sequence leaves the ε-ball about x∗ at some points {x_k}_{k∈K}, {x_k}_{k∈K} → x∗, jumping into the neighborhood of another stationary point. A contradiction argument based on Lemmas 3.8 and 3.9 shows that {d_k}_{k∈K} → 0. Then the subsequence cannot possibly jump outside of the ε-ball about x∗ at k ∈ K, since ‖d̃_k‖ ≤ ‖d_k‖ and t_k ≤ 1 and the new iterate is defined by x_{k+1} = x_k + t_k d_k + t_k^2 d̃_k. This contradiction establishes that {x_k} → x∗.

Suppose now that x∗ is a stationary point, but not a Kuhn-Tucker point; we show that this leads to a contradiction. Let us denote by λ∗ the multiplier vector associated with x∗ (which is unique under Assumption A5). From the stationarity conditions, it holds that λ∗_i g_i(x∗) = 0, i = 1, ..., m. Therefore λ∗_i = 0 if g_i(x∗) < 0, i = 1, ..., m. The fact that x∗ is not a Kuhn-Tucker point thus means that there exists an index i_0 ∈ {1, ..., m} such that g_{i_0}(x∗) = 0 and λ∗_{i_0} < 0. Again, a contradiction argument based on Lemmas 3.8 and 3.9 shows that the sequence {d_k} converges to zero so that, from Lemma 3.7, {λ_k} converges to λ∗. It follows that, for k large enough, say k ≥ k̄, λ_{k,i_0} ≤ g_{i_0}(x_k), i.e., i_0 ∈ J_k. Thus, in view of (2.11), g_{i_0}(x_k) ≤ g_{i_0}(x_{k−1}) ≤ ... ≤ g_{i_0}(x_{k̄+1}) ≤ g_{i_0}(x_{k̄}) < 0 for k large enough. This contradicts the fact that g_{i_0}(x∗) = 0.

We now strengthen the regularity assumptions on the functions involved. Assumptions A2 and A3 are replaced by

A2′. The objective function f is three times continuously differentiable.
A3′. The constraints g_i, i = 1, ..., m, are three times continuously differentiable.

We also make the following additional assumption, which supersedes Assumption A7.

A8. The sequence generated by the algorithm possesses an accumulation point x∗ (in view of Theorem 3.11, a Kuhn-Tucker point) at which (i) the second order sufficiency condition and strict complementary slackness hold, i.e., the Kuhn-Tucker multiplier vector λ∗ ∈ ℝ^m (unique in view of Assumption A5) is such that

    λ∗_i > 0   ∀ i ∈ I(x∗)

and the Hessian of the Lagrangian function ∇_{xx} L(x∗, λ∗) is positive definite on the subspace {h s.t. ⟨∇g_i(x∗), h⟩ = 0 ∀ i ∈ I(x∗)}, and (ii) the multiplier vector λ∗ satisfies

    λ∗_i ≤ µ̄,   i = 1, ..., m.
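For illustration only (not part of the paper), the two requirements in Assumption A8 i can be verified numerically at a candidate Kuhn-Tucker pair: strict complementarity is λ∗_i > 0 on the active set, and second order sufficiency can be tested by projecting the Hessian of the Lagrangian onto the null space of the active constraint gradients. The data layout and tolerance below are assumptions made for the sketch; full column rank of the active gradients (Assumption A5) is assumed.

```python
import numpy as np

def check_A8_part_i(hess_L, grad_g_active, lam_active, tol=1e-10):
    """Check strict complementarity and second order sufficiency at (x*, lam*).

    hess_L        : (n, n) Hessian of the Lagrangian at (x*, lam*)
    grad_g_active : (n, p) gradients of the active constraints at x*
    lam_active    : (p,) multipliers of the active constraints
    """
    strict_complementarity = np.all(lam_active > tol)
    # Orthonormal basis Z of {h : grad_g_active^T h = 0} via a full QR factorization.
    n, p = grad_g_active.shape
    q, _ = np.linalg.qr(grad_g_active, mode="complete")
    Z = q[:, p:]                       # columns span the tangent subspace
    reduced = Z.T @ hess_L @ Z
    second_order = Z.shape[1] == 0 or np.all(np.linalg.eigvalsh(reduced) > tol)
    return strict_complementarity and second_order
```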

The first task is to show that, under these additional assumptions, the entire sequence {xk } converges to x∗ , the sequences {λk } and {µk } converge to λ∗ and the sets Ik and Jk eventually become identical to I(x∗ ) and ∅ respectively. This is the object of Proposition 4.2 below. It makes use of the following result. 29

Lemma 4.1. If there exists a subset of indices K ⊂ ℕ such that {x_{k−1}}_{k∈K} → x∗ and {x_k}_{k∈K} → x∗, where x∗ is as in Assumption A8, then {d_k}_{k∈K} → 0.

Proof. Let K ⊂ IN be as stated and, proceeding by contradiction, suppose that there exists a subset of indices K ′ ⊂ K such that inf k∈K ′ kdk k > 0. In that case, in view of Lemma 3.9, we ′ −→ 0 and J may assume, without loss of generality, that {dk−1 } k∈K k−1 = ∅ for all k ∈ K . From ′ k→∞ −→ λ∗ . By possibly reducing K ′ to Lemma 3.7, using Assumption A8 ii, it follows that {µk } k∈K ′ k→∞

a subset of indices, it can also be assumed, in view of Assumption A6 and the definition of ρk in (2.6), that the subsequences {Hk }k∈K ′ and {ρk }k∈K ′ converge respectively to some matrix H ∗ and some value ρ∗ . Therefore, due to Assumption A8 i, the assumptions of Lemma 3.6 are satisfied, so that the matrix of the limit system in (d, λ) ∗



H d + ∇f (x ) +

m X

λi ∇gi (x∗ ) = 0

(4.1)

i=1

λ∗i h∇gi (x∗ ), di + λi gi (x∗ ) = 0

(4.2)

is invertible, and the subsequences {d0k } and {dk } converge on K ′ to the solution 0 of (4.1) and (4.2), a contradiction.

Proposition 4.2. The entire sequence {x_k} converges to the point x∗ given by Lemma 4.1. Moreover, it holds, for k → ∞,

i. {dk } → 0 and {λk } → λ∗ , ii. Jk = ∅ and Ik = I(x∗ ), iii. {µk } → λ∗ . Proof. Consider a ball of radius ǫ > 0 about x∗ where there is no Kuhn-Tucker point other than x∗ (in view of Assumption A8, such an ǫ exists). Let us suppose by contradiction that the sequence {xk } leaves that ball infinitely often. Consider the subsequence {xk }k∈K of points at which the ball is entered and define the related subsequence {xℓ(k) }k∈K of points at which the sequence is about to exit the ball. We first show that ℓ(k) = k for all k ∈ K. Indeed, if −→ x∗ and this were not the case, there would exist a subset K ′ of indices such that {xℓ(k)−1 } k∈K ′ k→∞ −→ x∗ . In order for the sequence to leave the ball at iteration ℓ(k), since 0 ≤ t {xℓ(k) } k∈K ℓ(k) ≤ 1 ′ k→∞

and kd˜ℓ(k) k ≤ kdℓ(k) k, it is necessary that kdℓ(k) k ≥

ǫ 4

for all k ∈ K ′ large enough. But this

contradicts Lemma 4.1. Therefore, ℓ(k) = k. Let us now show that this lead to a contradiction. In order to enter the ǫ-ball without creating another accumulation point in that ball we must have kdk−1 k >

ǫ 4

for all k ∈ K large enough. But in that case, in view of Lemma 3.9, it follows

−→ 0. This contradicts the fact that, for k ∈ K, x {dk } k∈K k+1 is outside the ball. Convergence k→∞

of the sequence {xk } to x∗ is thus established. Claim i then follows by successively applying Lemmas 4.1 and 3.7. Claim ii is a direct consequence of Definitions (2.7) and (2.8), and strict complementarity Assumption A8 i. The last result follows from Lemma 3.7, making use of Assumption A8 ii.


Since Algorithm A is based on a quasi-Newton iteration with higher order corrections, one can expect that it will exhibit superlinear convergence if H_k converges to the Hessian of the Lagrangian at the solution. In the context of SQP type algorithms, it has been shown [16] that two-step superlinear convergence occurs under the weaker condition that the Hessian of the Lagrangian be well approximated along the projection of the search direction in the tangent plane to the active constraints. This is fortunate since, under Assumption A8, it allows the possibility of keeping H_k positive definite. Since the iteration in Algorithm A is a higher order modification of the SQP iteration, one can hope that a similar property will hold. We thus make the following assumption.

A9. The sequence of matrices {H_k} satisfies

    ‖P_k (H_k − ∇²_{xx} L(x∗, λ∗)) P_k d_k‖ / ‖d_k‖ → 0,

where P_k = I − R_k (R_k^T R_k)^{-1} R_k^T with

    R_k = [∇g_i(x_k) s.t. i ∈ I(x∗)] ∈ ℝ^{n×|I(x∗)|}.

It will be shown below (Theorem 4.6) that, under this assumption, two-step superlinear convergence is ensured, provided the full quasi-Newton step is taken close to the solution. Proving that the latter will indeed occur is the object of Lemmas 4.3 and 4.4 and of Proposition 4.5. The proof will rely crucially on the properties of the estimate H_k. It turns out that Assumption A9 is exactly what will be needed there.
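A small sketch (not from the paper) of the quantities appearing in Assumption A9: the orthogonal projector P_k onto the null space of the active-constraint gradients and the ratio that is required to vanish asymptotically. R_k is assumed to have full column rank (Assumption A5).

```python
import numpy as np

def a9_ratio(H_k, hess_L_star, R_k, d_k):
    """Compute ||P_k (H_k - hess_L*) P_k d_k|| / ||d_k|| from Assumption A9."""
    n = R_k.shape[0]
    P_k = np.eye(n) - R_k @ np.linalg.solve(R_k.T @ R_k, R_k.T)
    num = np.linalg.norm(P_k @ (H_k - hess_L_star) @ (P_k @ d_k))
    return num / np.linalg.norm(d_k)
```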

Under Assumption A9, Hk is a good approximation to the Hessian as far as its action on the component of dk in the tangent subspace to the active constraints is concerned. The following lemma gives bounds on the component of dk off this subspace. These bounds will be of use in Proposition 4.5 below. Lemma 4.3. For k large enough, the direction dk can be decomposed into

dk = Pk dk + dˆk

with  ¶2 1/2 X µ λk,i gi (xk )  + o(kd0k k2 ) kdˆk k = O  µ k,i ∗ 

i∈I(x )

where Pk is as defined in Assumption A9. Proof. By definition, the direction dk satisfies, for i = 1, . . . , m,

µk,i h∇gi (xk ), dk i + λk,i gi (xk ) = −µk,i ρk kd0k kν .

In particular, RkT dk = hk where hk is a |I(x∗ )|-vector whose components are the numbers



λk,i gi (xk ) − ρk kd0k kν µk,i 33

i ∈ I(x∗ ).

Therefore, dk = Pk dk + dˆk with dˆk = Rk (RkT Rk )−1 hk so that kdˆk k = O(khk k).

The introduction of d̃_k is essential in order to obtain a unit step close to the solution. However its size, directly related to the size of ψ_k (see (2.9)), must be closely monitored to avoid undesirable effects. Suitable bounds, to be used in Proposition 4.5, are established in the following lemma.

Lemma 4.4. For k large enough, the correction d̃_k is obtained as the solution of (LS3) (this solution exists) and it satisfies

    ‖d̃_k‖ = O( max{ ‖d_k‖^2 , max_{i∈I(x∗)} |µ_{k,i}/λ_{k,i} − 1| ‖d_k‖ } ) = o(‖d_k‖).

Proof. For k large enough, in view of Proposition 4.2 ii, the correction d̃_k is first computed by attempting to solve the linear least squares problem in d

    min  (1/2) ‖d‖^2_{H_k}
    s.t. ⟨∇g_i(x_k), d⟩ = −ψ_k − g_i(x_k + d_k)   ∀ i ∈ I(x∗)

or equivalently the linear system in (d, λ)

Hk d +

X

λi ∇gi (xk ) = 0

i∈I(x∗ )

h∇gi (xk ), di = −ψk − gi (xk + dk )

∀ i ∈ I(x∗ ).

First order expansion of gi (xk + dk ) yields

−ψk − gi (xk + dk ) = −ψk − gi (xk ) − h∇gi (xk ), dk i + O(kdk k2 ) ∀ i ∈ I(x∗ )

and, from (2.2), Proposition 4.2, Assumption A8 and Lemma 3.5, and since ν > 2, we have,

−ψk − gi (xk + dk ) = −ψk +

µ

¶ µk,i − 1 h∇gi (xk ), dk i + O(kdk k2 ), λk,i

∀ i ∈ I(x∗ ).

Since, from (2.9), the definition of τ (τ > 2) and of κ (κ > 0), Proposition 4.2, and strict complementarity Assumption A8 i, it follows that ψk = o(kdk k2 ), the result is a consequence of Lemma 3.1.

We are now ready to show that the use of the correction d̃_k introduced in the search direction causes the line search to accept a unit step close to the solution, thus avoiding any Maratos-like effect.

Proposition 4.5. For k large enough, the step t_k = 1 is accepted by the line search.

Proof.

In this proof, we will constantly make use of Lemma 3.5 without explicit mention. The assumptions of Lemma 3.5 are satisfied on the entire set IN of indices (see Lemma 4.2 and strict complementarity Assumption A8 i). Throughout the proof, the phrase ‘for k large enough’, will be implicit. According to Step 2 of Algorithm A, and in view of Lemma 4.2 ii, two conditions are needed for the line search to yield a unit stepsize, namely strict feasibility of the resulting point (2.12) and sufficient decrease (2.10). Thus, this proof will be in two parts. i. We first show that a step of 1 brings a sufficient decrease on f . In view of Lemma 4.4 and Assumption A2′ , we can write 1 f (xk + dk + d˜k ) = f (xk ) + h∇f (xk ), dk + d˜k i + hdk , ∇2xx f (xk )dk i + o(kdk k2 ). 2

(4.3)

From the definitions of dk and d˜k and Lemmas 4.2 ii and 4.4, it follows that

Hk dk + ∇f (xk ) +

m X

λk,i ∇gi (xk ) = 0

(4.4)

i=1

µk,i h∇gi (xk ), dk i + λk,i gi (xk ) = −µk,i ρk kd0k kν

(4.5)

gi (xk + dk ) + h∇gi (xk ), d˜k i = −ψk

(4.6)

i ∈ I(x∗ )

From (4.4), we get

h∇f (xk ), dk i = −hdk , Hk dk i −

m X

λk,i h∇gi (xk ), dk i

(4.7)

i=1

and, using again (4.4), we obtain, in view of Lemma 4.4 and Assumption A6,

h∇f (xk ), d˜k i = −

m X

λk,i h∇gi (xk ), d˜k i + o(kdk k2 )

i=1


(4.8)

In view of (4.7) and (4.8), (4.3) yields 1 1 f (xk + dk + d˜k ) = f (xk ) + h∇f (xk ), dk i + hdk , (∇2xx f (xk ) − Hk )dk i 2 2 m m X 1X λk,i h∇gi (xk ), dk i − λk,i h∇gi (xk ), d˜k i + o(kdk k2 ). 2 i=1 i=1



(4.9)

For i ∈ I(x∗ ), expanding (4.6), we obtain, using Assumption A3′ and the identity ψk = o(kdk k2 ), 1 gi (xk ) + h∇gi (xk ), dk i + hdk , ∇2xx gi (xk )dk i + h∇gi (xk ), d˜k i = o(kdk k2 ). 2 Multiplying by λk,i and summing for i ∈ I(x∗ ) we get, using (4.5),



X 1 X λk,i h∇gi (xk ), dk i − λk,i h∇gi (xk ), d˜k i = 2 ∗ ∗ i∈I(x )

i∈I(x )

¶ X µ 1 λk,i 2 1 X λk,i − gi (xk ) + λk,i hdk , ∇2xx gi (xk )dk i + o(kdk k2 ). 2 µ 2 k,i ∗ ∗

i∈I(x )

(4.10)

i∈I(x )

We also have, for i 6∈ I(x∗ ), using (4.5) and the convergence to zero of the multipliers associated with the nonactive constraints, λk,i 2 gi (xk ) − λk,i hdk , ∇2xx gi (xk )dk i + o(kdk k2 ) λk,i h∇gi (xk ), dk i = − µk,i

(4.11)

where we have added a term bounded by o(kdk k2 ). Also, for i 6∈ I(x∗ ), λk,i h∇gi (xk ), d˜k i = o(kdk k2 )

(4.12)

since, from Lemma 4.4, d˜k = o(kdk k) and since, from (4.5), the convergence of the numbers µk,i to zero (Lemma 4.2 iii), and the fact that −gi (xk ) is bounded away from zero, λk,i = o(kdk k). 37

From (4.11) and (4.12), we obtain, summing for all i 6∈ I(x∗ ),



X 1 X λk,i h∇gi (xk ), dk i − λk,i h∇gi (xk ), d˜k i = 2 ∗ ∗ i6∈I(x )

i6∈I(x )

1 X 1 X λk,i 2 gi (xk ) + λk,i hdk , ∇2xx gi (xk )dk i + o(kdk k2 ). 2 µ 2 k,i ∗ ∗ i6∈I(x )

(4.13)

i6∈I(x )

In view of (4.10) and (4.13), (4.9) can be rewritten as, 1 f (xk + dk + d˜k ) = f (xk ) + h∇f (xk ), dk i 2 m X 1 2 + hdk , (∇xx f (xk ) + λk,i ∇2xx gi (xk ) − Hk )dk i 2 i=1

+

X µ

i∈I(x∗ )

¶ 1 λk,i 2 1 X λk,i 2 gi (xk ) + o(kdk k2 ). λk,i − gi (xk ) + 2 µk,i 2 µ k,i ∗

(4.14)

i6∈I(x )

If in (4.14) we substitute for dk its decomposition obtained in Lemma 4.3, we obtain, 1 f (xk + dk + d˜k ) = f (xk ) + h∇f (xk ), dk i 2 m X 1 2 λk,i ∇2xx gi (xk ) − Hk )Pk dk i + hdk , Pk (∇xx f (xk ) + 2 i=1

1 ¶2 2 2¶ X µ λk,i X µ λ 1 k,i + o gi (xk )  + λk,i − gi (xk ) µk,i 2 µk,i ∗ ∗ 

i∈I(x )

+

i∈I(x )

1 X λk,i 2 gi (xk ) + o(kdk k2 ). 2 µ k,i ∗ i6∈I(x )

Now, since by construction µk,i > 0, we have, 1 X λ2k,i gi (xk ) ≤ 0. 2 µk,i ∗ i6∈I(x )


(4.15)

Also, since for i ∈ I(x∗ ), λ∗i > 0 and gi (x∗ ) = 0, and since, for all k, g(xk ) < 0, Proposition 4.2 i, iii and the continuity of g imply that 1 ¶ ¶2 2 X µ X µ λk,i 1 λk,i 2   λk,i − gi (xk ) ≤ 0. gi (xk ) + o µk,i 2 µk,i ∗ ∗ 

i∈I(x )

i∈I(x )

Thus, (4.15) yields 1 f (xk + dk + d˜k ) ≤ f (xk ) + h∇f (xk ), dk i+ 2 Ã ! m X 1 hdk , Pk ∇2xx f (xk ) + λk,i ∇2xx gi (xk ) − Hk Pk dk i + o(kdk k2 ) 2 i=1 or, 1 f (xk + dk + d˜k ) ≤ f (xk ) + h∇f (xk ), dk i+ 2 ¡ 2 ¢ Pm 2 1 i=1 λk,i ∇xx gi (xk ) − Hk Pk dk k 2 kPk ∇xx f (xk ) + kdk k + o(kdk k2 ) 2 kdk k and, in view of Assumption A9, 1 f (xk + dk + d˜k ) ≤ f (xk ) + h∇f (xk ), dk i + o(kdk k2 ). 2 Now, from (3.5) and Assumption A6, h∇f (xk ), dk i ≤ −σ1 kdk k2 + o(kdk k2 ) and Condition (2.10) of the line search obviously holds for t = 1, since α < 12 . ii. We now show that a step of 1 preserves feasibility. Since Jk = ∅ (Proposition 4.2 ii), the line search only requires gi (xk + dk + d˜k ) < 0,

i = 1, . . . , m. For i 6∈ I(x∗ ), this is always

satisfied since the sequences {dk } and {d˜k } both converge to zero. Consider i ∈ I(x∗ ). We have gi (xk + dk + d˜k ) = gi (xk + dk ) + h∇gi (xk + dk ), d˜k i + O(kd˜k k2 ) = gi (xk + dk ) + h∇gi (xk ), d˜k i + O(kdk k kd˜k k). 39

From the definition of the solution d˜k of (LS3), we get gi (xk + dk + d˜k ) = −ψk + O(kdk k kd˜k k) and from Lemma 4.4, ¯ ¯ µ ¶ ¯ µk,i ¯ 3 2 ˜ ¯ ¯ − 1¯ kdk k } . gi (xk + dk + dk ) = −ψk + O max {kdk k , max∗ ¯ i∈I(x ) λk,i

(4.16)

Thus, since τ < 3 and κ < 1, (4.16) and (2.9) yield gi (xk + dk + d˜k ) < 0.

Theorem 4.6. Under the stated assumptions, the convergence is two-step superlinear, i.e.,

    lim_{k→∞} ‖x_{k+2} − x∗‖ / ‖x_k − x∗‖ = 0.

Proof. In view of Lemma 4.5, the line search gives a step tk = 1, for k large enough. Two successive iterates are thus related by xk+1 − xk = dk + d˜k so that, in view of Lemma 4.2, Assumption A8 ii and Lemmas 3.5 and 4.4, xk+1 − xk = d0k + o(kd0k k).

(4.17)

Let us now consider the quadratic program in d, 1 min f (xk ) + h∇f (xk ), di + dT Hk d 2 s.t. h∇gi (xk ), di + gi (xk ) ≤ 0, 40

i = 1, . . . , m.

(4.18)

In [16], the two-step superlinear convergence of the sequence defined by xk+1 = xk + dek , where dek solves (4.18), is proven under assumptions similar to that of this theorem. A straightforward but lengthy analysis shows that, if xk+1 − xk = dek + o(kdek k), this result will still hold. Thus, in view of (4.17), it is enough to show that d0k = dek + o(kd0k k).

(4.19)

Let us then show, to conclude, that (4.19) is satisfied. From the optimality conditions associated with the solution dek of (4.18) and in view of Lemma 1 of [16], the direction dek satisfies X

Hk dek + ∇f (xk ) +

λek,i ∇gi (xk ) = 0

(4.20)

i∈I(x∗ )

h∇gi (xk ), dek i + gi (xk ) = 0 i ∈ I(x∗ ) for some coefficients λek,i

(4.21)

i ∈ I(x∗ ), for k large enough. Now, by definition, the direction d0k

satisfies Hk d0k + ∇f (xk ) +

m X

λ0k,i ∇gi (xk ) = 0

(4.22)

i=1

µk,i h∇gi (xk ), d0k i + λ0k,i gi (xk ) = 0,

i = 1, m.

(4.23)

For i 6∈ I(x∗ ), we have, from (4.23) and Proposition 4.2 iii, λ0k,i = o(kd0k k). Thus, (4.22) yields Hk d0k + ∇f (xk ) +

X

λ0k,i ∇gi (xk ) = o(kd0k k).

(4.24)

i∈I(x∗ )

In view of Proposition 4.2 i, iii, (4.23) yields h∇gi (xk ), d0k i + gi (xk ) = o(kd0k k) i ∈ I(x∗ ). 41

(4.25)

Now, the values dek , λek,i , i ∈ I(x∗ ) are solutions of a linear system with invertible matrix (4.20), (4.21) and d0k , λ0k are solutions of the same system (4.24), (4.25) with right hand side perturbed by o(kd0k k). We then must have d0k = dek + o(kd0k k).
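As a small illustrative check (not from the paper), the two-step superlinear rate of Theorem 4.6 can be monitored from a run of the algorithm by computing the ratios ‖x_{k+2} − x∗‖ / ‖x_k − x∗‖, which should tend to zero; the solution x∗ is assumed known here only for the purpose of the diagnostic.

```python
import numpy as np

def two_step_ratios(iterates, x_star):
    """Ratios ||x_{k+2} - x*|| / ||x_k - x*|| along a sequence of iterates."""
    errs = [np.linalg.norm(x - x_star) for x in iterates]
    return [errs[k + 2] / errs[k] for k in range(len(errs) - 2) if errs[k] > 0]
```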

The first issue to be addressed is that of an updating rule for H_k (in Step 3 of Algorithm A). The natural choice is to use the BFGS updating formula with Powell's modification [15] to ensure positive definiteness, although it is not clear whether Assumption A9 will then always be satisfied. Computational requirements are minimized if, instead of H_k itself, the Cholesky factor is updated (see, e.g., [4]). The next question is that of efficiently solving (LS1), (LS2) and (LS3). A key is given by Propositions 5.1 and 5.2 below, where the following notations are used:

    A_k = [∇g_1(x_k), ∇g_2(x_k), ..., ∇g_m(x_k)] ∈ ℝ^{n×m}
    Ã_k = [∇g_i(x_k) s.t. i ∈ I_k] ∈ ℝ^{n×|I_k|}
    G_k = diag(g_i(x_k), i = 1, ..., m) ∈ ℝ^{m×m}
    M_k = diag(µ_{k,i}, i = 1, ..., m) ∈ ℝ^{m×m}
    g̃_k = (g_i(x_k + d_k) + ψ_k s.t. i ∈ I_k) ∈ ℝ^{|I_k|}
    ℓ = (1, 1, ..., 1)^T ∈ ℝ^m.

Proposition 5.1 (see [18]). The matrix B_k = M_k A_k^T H_k^{-1} A_k − G_k is invertible and the solutions of (LS1) and (LS2) can be expressed as

    λ^0_k = −B_k^{-1} M_k A_k^T H_k^{-1} ∇f(x_k)                                 (5.1)
    d^0_k = −H_k^{-1} (∇f(x_k) + A_k λ^0_k)                                      (5.2)
    λ^1_k = λ^0_k + ‖d^0_k‖^ν B_k^{-1} M_k ℓ                                     (5.3)
    d^1_k = −H_k^{-1} (∇f(x_k) + A_k λ^1_k).                                     (5.4)

Proof. In order to show that the matrix B_k is invertible, it is sufficient to prove that the matrix A_k^T H_k^{-1} A_k − M_k^{-1} G_k is positive definite. The latter holds due to positive definiteness of H_k, strict positiveness of the coefficients µ_{k,i}, and strict feasibility of the iterate x_k. Now, with the notations just introduced, (LS1) can be written as

    H_k d^0_k + ∇f(x_k) + A_k λ^0_k = 0                                          (5.5)
    M_k A_k^T d^0_k + G_k λ^0_k = 0.                                             (5.6)

Also, (d^1_k, λ^1_k) satisfies

    H_k d^1_k + ∇f(x_k) + A_k λ^1_k = 0                                          (5.7)
    M_k A_k^T d^1_k + G_k λ^1_k = −‖d^0_k‖^ν µ_k.                                (5.8)

From (5.5), we have

    A_k^T d^0_k = −A_k^T H_k^{-1} [∇f(x_k) + A_k λ^0_k]

and, from (5.6),

    M_k^{-1} G_k λ^0_k = A_k^T H_k^{-1} [∇f(x_k) + A_k λ^0_k],

i.e., (A_k^T H_k^{-1} A_k − M_k^{-1} G_k) λ^0_k = −A_k^T H_k^{-1} ∇f(x_k). Thus,

    λ^0_k = −(A_k^T H_k^{-1} A_k − M_k^{-1} G_k)^{-1} A_k^T H_k^{-1} ∇f(x_k),

i.e., λ^0_k = −B_k^{-1} M_k A_k^T H_k^{-1} ∇f(x_k), which is (5.1). Expression (5.2) is obtained directly from (5.5). Expressions (5.3) and (5.4) can be obtained similarly using (5.7) and (5.8).
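A minimal NumPy sketch (an illustration, not the authors' code) of Proposition 5.1, using the erratum-corrected expressions (5.1)-(5.4): both directions are obtained from a single m × m matrix B_k. In practice the Cholesky factor of H_k and an LU decomposition of B_k would be reused, as discussed below; here generic solves are used for brevity.

```python
import numpy as np

def directions_via_Bk(H, A, g_val, mu, grad_f, nu):
    """Compute (d0, lam0, d1, lam1) from (5.1)-(5.4) of Proposition 5.1."""
    HinvA = np.linalg.solve(H, A)                        # H^{-1} A   (n x m)
    Hinv_gf = np.linalg.solve(H, grad_f)                 # H^{-1} grad f
    B = mu[:, None] * (A.T @ HinvA) - np.diag(g_val)     # B = M A^T H^{-1} A - G
    lam0 = -np.linalg.solve(B, mu * (A.T @ Hinv_gf))     # (5.1): -B^{-1} M A^T H^{-1} grad f
    d0 = -np.linalg.solve(H, grad_f + A @ lam0)          # (5.2)
    lam1 = lam0 + np.linalg.norm(d0) ** nu * np.linalg.solve(B, mu)   # (5.3): B^{-1} M l = B^{-1} mu
    d1 = -np.linalg.solve(H, grad_f + A @ lam1)          # (5.4)
    return d0, lam0, d1, lam1
```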

For the computation of the search direction, which is a convex combination of d^0_k and d^1_k (see (2.5), (2.6)), we essentially need to compute the matrix B_k defined in Proposition 5.1, and its inverse. Since, from the BFGS update, the Cholesky factor associated with H_k is known, the matrix B_k can be easily evaluated. Its inverse is not computed explicitly. Rather, an LU decomposition is performed. The computation of the search direction d_k thus essentially requires the decomposition of an m × m matrix. The next proposition addresses the computation of the correction d̃_k.

Proposition 5.2. If the matrix B̃_k = Ã_k^T H_k^{-1} Ã_k is invertible, the solution d̃_k of (LS3) can be expressed as

    d̃_k = −H_k^{-1} Ã_k B̃_k^{-1} g̃_k.                                          (5.9)

Proof. The solution d̃_k of (LS3) satisfies the optimality conditions

    H_k d̃_k + Ã_k λ̃_k = 0                                                       (5.10)
    Ã_k^T d̃_k + g̃_k = 0                                                         (5.11)

for some multiplier vector λ̃_k ∈ ℝ^{|I_k|}. Relationship (5.10) yields

    d̃_k = −H_k^{-1} Ã_k λ̃_k.                                                    (5.12)

Multiplying by Ã_k^T and using (5.11) gives g̃_k = Ã_k^T H_k^{-1} Ã_k λ̃_k. Therefore, the vector λ̃_k is given by

    λ̃_k = (Ã_k^T H_k^{-1} Ã_k)^{-1} g̃_k.

Substituting this expression into (5.12) yields (5.9).
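A small NumPy sketch (illustrative only) of the correction (5.9); in practice the Cholesky factorization of B̃_k mentioned below would be used instead of a generic solve.

```python
import numpy as np

def bending_correction(H, A_tilde, g_tilde):
    """d_tilde = -H^{-1} A~ (A~^T H^{-1} A~)^{-1} g~, cf. (5.9)."""
    HinvA = np.linalg.solve(H, A_tilde)              # H^{-1} A~
    B_tilde = A_tilde.T @ HinvA                      # A~^T H^{-1} A~
    lam_tilde = np.linalg.solve(B_tilde, g_tilde)
    return -HinvA @ lam_tilde
```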

In order to save computation, evaluation of the correction d̃_k should be attempted only when the iterate is close to a solution of Problem (1.1) (d_k small); otherwise, d̃_k should be set to 0. Since, close to a solution, the matrix B̃_k is always invertible due to linear independence of the gradients of the active constraints (Assumption A5), the computation of the correction d̃_k essentially requires the Cholesky factorization of the matrix B̃_k of dimension at most equal to m × m. Thus the total work per iteration (in addition to function evaluations) is essentially that associated with two Cholesky factorizations of size at most m, the total number of constraints. Note however that Algorithm A can be modified in a straightforward manner so as to ignore the 'obviously inactive' constraints in the computation of the search direction.

The strict feasibility of the successive iterates is essential in order to identify the sign of the multipliers associated with the active constraints. A value g_i(x_k) close to zero could possibly prevent the sign identification from occurring. Thus, rule (2.12) in Step 2 of Algorithm A may lead to numerical difficulties. An alternate rule could be

    g_i(x_k + t d_k + t^2 d̃_k) ≤ max{ g_i(x_k)/2 , −ψ_k/2 }.

In view of (2.9) and (4.16), such a modification would not affect the convergence properties of Algorithm A.

Algorithm A, in its present form, may present some instability problems. First of all, independence of the active constraints is essential for the computation of a search direction. Near dependence may lead to numerical problems. Linear system (2.1)-(2.2) may also become very ill-conditioned if some multiplier µ_i corresponding to a nearly active constraint g_i becomes very small. This may occur close to a solution of Problem (1.1) at which the strict complementarity conditions are not satisfied. In order to avoid that kind of problem, one could bound from below the approximate multipliers associated with nearly active constraints by a suitably small positive number.

Finally, it is clear that scaling can be (and should be) introduced at various places in Algorithm A, in particular in the definitions of I_k and J_k, in the expression (2.9) for ψ_k, and in the update formula (2.13b) for µ_k. One could, e.g., consider redefining I_k as

    I_k = {i s.t. g_i(x_k) ≥ −λ_{k,i} ‖∇g_i(x_k)‖^2 ‖x_k − x_0‖^2},

and similarly for J_k, so that asymptotically it would be insensitive to the scaling of g.

Preliminary numerical experiments have been performed using test problems from [7]. The algorithm parameters were given the values α = .3, β = .8, θ = .8, ν = 3, τ = 2.5, κ = .5, µ̄ = ∞. To improve the behavior of the algorithm in the early iterations, ‖d^0_k‖^ν in the right hand side of (2.4) was replaced by min(10^{-2} ‖d^0_k‖, ‖d^0_k‖^ν). On most problems, the performance of Algorithm A was roughly comparable to that of VF02AD as reported in [7] and to that of the algorithm in [13]. The behavior of the algorithm appeared to be relatively insensitive to changes in the values of the algorithm parameters, as could be expected from the asymptotic convergence properties.
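For reference, the parameter values reported for the preliminary experiments can be collected in a small configuration sketch (illustrative only); the early-iteration modification of the right hand side of (2.4) described above is included as a helper.

```python
import numpy as np

# Parameter values reported for the preliminary experiments in Section 5.
PARAMS = dict(alpha=0.3, beta=0.8, theta=0.8, nu=3.0, tau=2.5, kappa=0.5, mu_bar=np.inf)

def ls2_rhs_magnitude(d0, nu=PARAMS["nu"]):
    """Early-iteration safeguard: min(1e-2*||d0||, ||d0||**nu) in place of ||d0||**nu in (2.4)."""
    nd0 = np.linalg.norm(d0)
    return min(1e-2 * nd0, nd0 ** nu)
```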


References

[1] T. F. Coleman and A. R. Conn, Nonlinear Programming Via an Exact Penalty Function: Asymptotic Analysis, Mathematical Programming, 24 (1982), pp. 123–136.

[2] T. F. Coleman and A. R. Conn, Nonlinear Programming Via an Exact Penalty Function: Global Analysis, Mathematical Programming, 24 (1982), pp. 137–161.

[3] D. Gabay and D. G. Luenberger, Efficiently Converging Minimization Methods Based on the Reduced Gradient, SIAM J. Control Optim., 14 (1976), pp. 42–61.

[4] P. E. Gill, W. Murray and M. A. Saunders, Methods for Computing and Modifying the LDV Factors of a Matrix, Mathematics of Computation, 29 (1975), pp. 1051–1077.

[5] J. Herskovits, A Superlinearly Convergent Two-Stage Feasible Direction Algorithm for Nonlinear Constrained Optimization, Research Report, COPPE/UFRJ, Rio de Janeiro, Brazil, 1986.

[6] J. Herskovits, A Two-Stage Feasible Direction Algorithm for Nonlinear Constrained Optimization, Math. Programming, 36 (1986), pp. 19–38.

[7] W. Hock and K. Schittkowski, Test Examples for Nonlinear Programming Codes, Lecture Notes in Economics and Mathematical Systems 187, Springer-Verlag, 1981.

[8] N. Maratos, Exact Penalty Function Algorithms for Finite Dimensional and Control Optimization Problems, Ph.D. Thesis, Imperial College of Science and Technology, London, U.K., 1978.

[9] D. Q. Mayne and E. Polak, A Superlinearly Convergent Algorithm for Constrained Optimization Problems, Math. Programming Stud., 16 (1982), pp. 45–61.

[10] D. Q. Mayne and E. Polak, Feasible Directions Algorithms for Optimization Problems with Equality and Inequality Constraints, Math. Programming, 11 (1976), pp. 67–80.

[11] G. P. McCormick, An Arc Method for Nonlinear Programming, US Army Research Office, Technical Report T-295, Durham, 1974.

[12] W. T. Nye and A. L. Tits, An Application-Oriented, Optimization-Based Methodology for Interactive Design of Engineering Systems, Internat. J. Control, 43 (1986), pp. 1693–1721.

[13] E. R. Panier and A. L. Tits, A Superlinearly Convergent Feasible Method for the Solution of Inequality Constrained Optimization Problems, SIAM J. Control Optim., 25 (1987), pp. 934–950.

[14] E. Polak and A. L. Tits, On Globally Stabilized Quasi-Newton Methods for Inequality Constrained Optimization Problems, in Proceedings of the 10th IFIP Conference on System Modeling and Optimization, New York, NY, August–September 1981, R. F. Drenick and F. Kozin, eds., Lecture Notes in Control and Information Sciences 38, Springer-Verlag, 1982, pp. 539–547.

[15] M. J. D. Powell, A Fast Algorithm for Nonlinearly Constrained Optimization Calculations, in Numerical Analysis, Dundee, 1977, Lecture Notes in Mathematics 630, G. A. Watson, ed., Springer-Verlag, 1977, pp. 144–157.

[16] M. J. D. Powell, The Convergence of Variable Metric Methods for Nonlinearly Constrained Optimization Calculations, in Nonlinear Programming 3, O. L. Mangasarian, R. R. Meyer and S. M. Robinson, eds., Academic Press, New York, 1978, pp. 27–63.

[17] P. Spellucci, Han's Method Without Solving QP, in Optimization and Control, Auslender, Oettli and Stoer, eds., Lecture Notes in Control and Information Sciences 30, Springer-Verlag, Berlin–Heidelberg–New York–Tokyo, 1981, pp. 123–141.

[18] R. A. Tapia, Diagonalized Multiplier Methods and Quasi-Newton Methods for Constrained Optimization, J. Optim. Theory Appl., 22 (1977), pp. 135–194.
