MODIFIED WILSON'S METHOD FOR NONLINEAR PROGRAMS WITH NONUNIQUE MULTIPLIERS

ANDREAS FISCHER

The paper first investigates Newton-type methods for generalized equations with compact solution sets. The analysis of their local convergence behavior is based, besides other conditions, on the upper Lipschitz-continuity of the local solution set mapping of a simply perturbed generalized equation. This approach is then applied to the KKT conditions of a nonlinear program with inequality constraints and leads to a modified version of the classical Wilson method. It is shown that the distances of the iterates to the set of KKT points converge q-quadratically to zero under conditions that do not imply a unique multiplier vector. In addition to the Mangasarian-Fromovitz Constraint Qualification and a Second-Order Sufficiency Condition, the local minimizer is required to fulfill a Constant Rank Condition (weaker than the Constant Rank Constraint Qualification of Janin) and a so-called Weak Complementarity Condition.
1. Introduction. Consider the problem of finding a local minimizer of the nonlinear program

$$(P) \qquad f(x) \to \min \quad \text{s.t.} \quad g_i(x) \le 0 \ (i \in I)$$

with $I := \{1, \dots, m\}$ and sufficiently smooth functions $f, g_1, \dots, g_m : \mathbb{R}^n \to \mathbb{R}$. Under a certain constraint qualification a multiplier vector exists such that the pair of the minimizer and the multiplier vector is a Karush-Kuhn-Tucker (KKT) point, i.e., it satisfies the KKT conditions. The Wilson method is one of the classical iteration methods that, under certain conditions, possess superlinear local convergence to a KKT point. Moreover, the Wilson method can be viewed as one of the origins of Sequential Quadratic Programming methods, not least regarding their local convergence properties. Robinson's result [17] on the r-quadratic convergence of an iteration scheme (that includes Wilson's method), as well as almost any subsequent result in this direction, is based on conditions which imply that the KKT point is isolated, even though these conditions could be weakened considerably during the last two decades; see, e.g., [1]-[5] and references therein. To our knowledge there are only some exceptions that do not require a unique multiplier vector, in particular if all constraint functions $g_1, \dots, g_m$ are linear (see Bonnans [1], Pang
(Submitted to Mathematics of Operations Research: February 17, 1997; revised March 11, 1998.)
AMS 1991 subject classification. Primary: 90C30.
OR/MS Index 1978 subject classification. Primary: Programming/Nonlinear.
Key words. Generalized equation, Newton method, nonlinear programming, Wilson method, nonunique multipliers, superlinear convergence.
[11], and Vetters [21]), or if the functions $f, g_1, \dots, g_m$ are convex (see Ralph and Wright [15]). Moreover, there is the very recent report by Qi and Wei [14] on a feasible SQP method. For a brief review and discussion of these results we refer to Subsection 3.7. In this paper we deal with arbitrary nonlinear constraint functions. We first present a general framework for obtaining superlinear convergence of Newton-type methods for generalized equations with compact solution sets. Then our main aim is to show how this framework can be applied to the Karush-Kuhn-Tucker system and to derive conditions that imply local q-quadratic convergence of a Modified Wilson Method but not the uniqueness of the multiplier vector. This rate of convergence will be shown for the distances of the iterates to the set of KKT points. Josephy [8] proved that Newton's method for generalized equations converges locally q-quadratically to a solution if this solution is strongly regular and thus locally unique. In Section 2 of this paper we will study the local behavior of inexact Newton-type methods for generalized equations with compact solution sets. In this case the stability of the local solution set mapping of the generalized equation at hand under (simple) perturbations will turn out to be a key for showing local convergence properties. We can prove that the upper Lipschitz-continuity of the local solution set mapping together with further conditions implies local superlinear convergence. Roughly speaking, these further conditions assume that the Newton subproblems at a given point are solvable and yield a step whose length is proportional to the distance of that point to the solution set. Moreover, as usual for inexact Newton methods, the inexactness has to be sufficiently small and, for superlinear convergence, it must go to zero when approaching the solution set. In particular, we analyze a Modified Inexact Newton Method.

The modification enables this method to replace any iterate by an arbitrary point provided that the distances of this point and of the iterate to the solution set are proportional. Of course, if the solution set of the generalized equation is not a singleton, a sequence generated by this modified method cannot be expected to converge. Instead we can prove that the distances of the iterates to the solution set converge locally superlinearly (or even q-quadratically) to zero. In Section 3, as the central part of the paper, we apply the results just mentioned to the case when the generalized equation coincides with the KKT conditions for the nonlinear program (P). It is known that the Wilson method can be regarded as Newton's method for this generalized equation. However, based on the results in Section 2, we consider a Modified Wilson Method in Subsection 3.1. The modification mainly consists in a correction of the multiplier vector, once per step. For this purpose we employ an additional convex quadratic program with nonnegativity constraints as auxiliary subproblem. The Modified Wilson Method can be viewed as a Modified Inexact Newton Method. As mentioned above, the distances between the solution set and the iterates generated by the latter method converge locally superlinearly if certain conditions are fulfilled. To satisfy these conditions we give a set of assumptions on the local minimizer at hand and the corresponding set of multiplier vectors. In particular, due to a stability result by Robinson [20], the Mangasarian-Fromovitz Constraint Qualification and a Second-Order Sufficiency Condition ensure the upper Lipschitz-continuity of the local solution set mapping associated with the perturbed KKT conditions. Under the same conditions we can show that locally the Wilson subproblems (as well as the auxiliary ones) are solvable. Moreover, we present
a new stability result for the solutions of the Wilson subproblems, see Subsection 3.3. To show that the Modified Wilson Method is well defined and shares the superlinear convergence properties of the Modified Inexact Newton Method we also assume that a certain Constant Rank Condition holds at the local minimizer. This condition is weaker than the original Constant Rank Constraint Qualification introduced by Janin [7]. Moreover, we need an assumption that we call Weak Complementarity Condition. An exact description of these assumptions as well as some corresponding remarks are given in Subsection 3.2. Using stability and additional technical results, the announced quadratic convergence of the Modified Wilson Method is proved in Subsection 3.4. We finally stress that we do not consider implementation issues.

Notation: All norms $\|\cdot\|$ are Euclidean vector norms or compatible matrix norms. The distance from a point $a$ to a nonempty set $A$ is defined by $d[a, A] := \inf\{\|a - a'\| \mid a' \in A\}$. The closed unit ball will be denoted by $B$ regardless of its concrete dimension, which should be clear from the context. The cardinal number of an index set $J \subseteq I$ is denoted by $|J|$. Moreover, given a matrix (vector) $H$ with columns (elements) $h_1, \dots, h_m$, we write $H_J$ for the submatrix (subvector) consisting of all columns (elements) $h_j$ with $j \in J$.
2. Newton Methods for Generalized Equations. Let $F : \mathbb{R}^l \to \mathbb{R}^l$ be continuously differentiable. Moreover, let $T : \mathbb{R}^l \to 2^{\mathbb{R}^l}$ denote a closed multifunction. Then, we consider the perturbed generalized equation

$$0 \in F(z) + T(z) + p, \qquad (1)$$

where $p \in \mathbb{R}^l$ denotes the perturbation parameter. The solution set of (1) is denoted by $Z(p)$ and, if the unperturbed problem is considered, we set $Z := Z(0)$. Furthermore, throughout this section we will consider only such generalized equations whose solution set $Z$ is compact or contains an isolated compact subset $\bar Z$, i.e., $\bar Z \subseteq Z$ and $\rho > 0$ exist so that $(\bar Z + \rho B) \cap Z = \bar Z$.

To describe the Inexact Newton Method we first consider the subproblem

$$0 \in F(z) + F'(z)(y - z) + T(y) + q(z), \qquad (2)$$

where $z \in \mathbb{R}^l$ is arbitrary but fixed and $q(z) \in \mathbb{R}^l$ is the perturbation vector. Let $Z(z, q(z))$ denote the solution set of (2). Moreover, for $\ell \ge 1$, we introduce the set

$$Z_\ell(z, q(z)) := \{y \in Z(z, q(z)) \mid \|y - z\| \le \ell\, d[z, \bar Z]\}.$$

Algorithm I (Inexact Newton Method): Given $z^0 \in \mathbb{R}^l$ and $\ell \ge 1$. For $k = 0, 1, 2, \dots$ compute $z^{k+1} \in Z_\ell(z^k, q(z^k))$.

Thus, $z^{k+1}$ has to be chosen from the solution set $Z(z^k, q(z^k))$ of the subproblem such that the Newton step $z^{k+1} - z^k$ is bounded in some sense. This generalizes a typical property known from the classical Newton method for computing a solution $z^*$ of an equation $F(z) = 0$ when $F'(z^*)$ is nonsingular. Results on the local convergence of Algorithm I will be obtained by investigating the local behavior of the following more general method.
Algorithm II (Modified Inexact Newton Method): Given $z^0 \in \mathbb{R}^l$, $\ell \ge 1$, and $\kappa \ge 1$. Set $\tilde z^0 := z^0$. For $k = 0, 1, 2, \dots$ compute $z^{k+1} \in Z_\ell(\tilde z^k, q(\tilde z^k))$ and compute $\tilde z^{k+1}$ such that $d[\tilde z^{k+1}, \bar Z] \le \kappa\, d[z^{k+1}, \bar Z]$.

In contrast to Algorithm I, the Modified Inexact Newton Method is able to replace an iterate obtained from the inexact Newton subproblem by any other point whose distance from the solution set $\bar Z$ does not increase too much. Why did we introduce this modification? Given a point $z$ the answer is as follows: even if theoretically an appropriate perturbation $q(z)$ exists so that $Z_\ell(z, q(z))$ is nonempty, it may be impossible to determine $q(z)$ and to compute an element $\hat z$ of $Z_\ell(z, q(z))$. Instead, as will be shown in Section 3, it may rather be possible to compute a point $\tilde z$ whose distance from $\bar Z$ is bounded by $\kappa\, d[\hat z, \bar Z]$. Thus, in some sense, the modified point $\tilde z$ maintains the quality of $\hat z$ with respect to the distance from the solution set $\bar Z$.

The following principal assumptions will play a key role in our analysis:

Upper Lipschitz continuity with respect to $Q \subseteq \mathbb{R}^l$ (ULCQ), i.e., there are numbers $\varepsilon > 0$ and $\beta \ge 1$ so that

$$Z(p) \cap Q \subseteq \bar Z \cap Q + \beta \|p\| B \qquad \forall p \in \varepsilon B.$$

Solvability of subproblems, i.e., $Z_\ell(z, q(z)) \neq \emptyset$.

The measure $\|q(z)\|$ of inexactness is sufficiently small.
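As a concrete toy illustration (ours, not the paper's), consider the one-dimensional generalized equation $0 \in F(z) + N_{[0,\infty)}(z)$ with $F(z) = z^2 + z - 2$, whose solution set is the singleton $\bar Z = \{1\}$. Algorithm I with exact subproblems ($q(z) \equiv 0$) then amounts to solving, at each iterate, the linearized complementarity problem $y \ge 0$, $F(z) + F'(z)(y - z) \ge 0$, $y\,(F(z) + F'(z)(y - z)) = 0$, which is available in closed form here; all names in the sketch are illustrative.

```python
def subproblem(F, dF, z):
    # Solve 0 in F(z) + F'(z)(y - z) + N_{[0,inf)}(y) in closed form:
    # either the unconstrained Newton point (if nonnegative) or y = 0.
    y = z - F(z) / dF(z)
    return y if y >= 0 else 0.0

F  = lambda z: z**2 + z - 2.0       # F(1) = 0, so the solution set is {1}
dF = lambda z: 2.0 * z + 1.0

z, dists = 2.0, []
for _ in range(5):                  # Algorithm I with q(z) = 0
    z = subproblem(F, dF, z)
    dists.append(abs(z - 1.0))      # d[z^k, Z]
```

On this instance the quotients $d[z^{k+1}, \bar Z]/d[z^k, \bar Z]^2$ stay near $1/3$, the q-quadratic behavior that the results below formalize for the case (5) with $c_0 = 0$.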
Of course, it is necessary to provide some more details. This will be done in the following theorems. For the particular case when the generalized equation arises from the KKT conditions of the nonlinear program (P) we will provide conditions to satisfy the principal assumptions, see Section 3.

Theorem 2.1 For $Q := \bar Z + \delta B$ with $\delta \in (0, \rho/2]$ let ULCQ be satisfied, and let $\kappa \ge 1$ be fixed. Then $\delta_0 \in (0, \delta/2]$ exists so that

$$z \in \bar Z + \delta_0 B, \qquad Z_\ell(z, q(z)) \neq \emptyset \qquad (3)$$

and

$$\|q(z)\| \le \frac{1}{4\beta\ell\kappa}\, d[z, \bar Z] \qquad (4)$$

imply

$$d[\hat z, \bar Z] \le \frac{1}{2\kappa}\, d[z, \bar Z] \qquad \forall \hat z \in Z_\ell(z, q(z))$$
and (if $z \notin \bar Z$)

$$d[\hat z, \bar Z] \le \beta \left( \ell \max_{0 \le s \le 1} \|F'(z + s(\hat z - z)) - F'(z)\| + \frac{\|q(z)\|}{d[z, \bar Z]} \right) d[z, \bar Z] \qquad \forall \hat z \in Z_\ell(z, q(z)).$$

If, additionally, $F'$ is locally Lipschitz-continuous and if, for some $c_0 > 0$,

$$\|q(z)\| \le c_0\, d[z, \bar Z]^2 \qquad \forall z \in Q, \qquad (5)$$

then $C_0 > 0$ exists so that

$$d[\hat z, \bar Z] \le C_0\, d[z, \bar Z]^2 \qquad \forall \hat z \in Z_\ell(z, q(z)).$$

Proof. Let $z \in \mathbb{R}^l$ satisfy (3) and (4). Then, for any $y \in \mathbb{R}^l$, Taylor's formula yields

$$F(y) = F(z) + F'(z)(y - z) + r(y),$$
where

$$r(y) = \int_0^1 \left( F'(z + s(y - z)) - F'(z) \right)(y - z)\, ds.$$

Thus, setting $p := -r(y) + q(z)$ and taking into account that $0 \in F(z) + F'(z)(y - z) + T(y) + q(z)$, we have

$$y \in Z(p). \qquad (6)$$

With the definition of $p$, it follows that

$$\|p\| \le \|r(y)\| + \|q(z)\| \le \|y - z\| \int_0^1 \|F'(z + s(y - z)) - F'(z)\|\, ds + \|q(z)\|.$$

For $y := \hat z \in Z_\ell(z, q(z))$ we further get

$$\|p\| \le \left( \ell \max_{0 \le s \le 1} \|F'(z + s(\hat z - z)) - F'(z)\| + \frac{\|q(z)\|}{d[z, \bar Z]} \right) d[z, \bar Z]. \qquad (7)$$

Since $F'$ is continuous it is uniformly continuous on the compact set $Q$. Thus, there is $\varepsilon_1 > 0$ so that $\|z' - z''\| \le \varepsilon_1$, $z', z'' \in Q$, $s \in [0, 1]$ implies

$$\|F'(z' + s(z'' - z')) - F'(z')\| \le \frac{1}{4\beta\ell\kappa}.$$

Therefore, assuming that $0 < \delta_0 \le \min\{\varepsilon_1/\ell,\ 2\beta\kappa\varepsilon,\ \delta/(1 + \ell)\}$, we obtain from $z \in \bar Z + \delta_0 B$, (4), and (7) that

$$\|p\| \le \frac{1}{2\beta\kappa}\, d[z, \bar Z]. \qquad (8)$$
Since $\delta \le \rho/2$ we have $d[y, \bar Z] = d[y, Z]$ for any $y \in Q$. Moreover, for $\hat z \in Z_\ell(z, q(z))$ it follows that

$$d[\hat z, \bar Z] = d[z + (\hat z - z), \bar Z] \le d[z, \bar Z] + \|\hat z - z\| \le (1 + \ell)\, d[z, \bar Z] \le (1 + \ell)\,\delta_0 \le \delta,$$

i.e., $\hat z \in Q$ and $d[\hat z, \bar Z] = d[\hat z, Z]$ for all $\hat z \in Z_\ell(z, q(z))$. Thus, (6) yields

$$\hat z \in Z(p) \cap Q.$$

Using this and (8) together with Assumption ULCQ, we get

$$d[\hat z, \bar Z] \le \beta \|p\| \le \frac{1}{2\kappa}\, d[z, \bar Z]. \qquad (9)$$

Hence, the first assertion of the theorem holds true. From (9) and (7) the second assertion follows. Similarly, the third one can be shown by taking into account that (7), the Lipschitz-continuity of $F'$ on $Q$ with some modulus $L$, and (5) imply

$$\|p\| \le \left( \frac{\ell^2 L}{2} + c_0 \right) d[z, \bar Z]^2.$$

Thus, with regard to (9) the third assertion is satisfied for $C_0 := \beta \left( \frac{\ell^2 L}{2} + c_0 \right)$. $\Box$
Theorem 2.2 For $Q := \bar Z + \delta B$ let ULCQ and (4) for all $z \in Q$ be satisfied. If

$$Z_\ell(z, q(z)) \neq \emptyset \qquad \forall z \in Q$$

then, with $\delta_0 \in (0, \delta/2]$ from Theorem 2.1,

$$z^0 \in \bar Z + \delta_0 B$$

implies that Algorithm II is well defined,

$$d[\tilde z^{k+1}, \bar Z] \le \kappa\, d[z^{k+1}, \bar Z] \le \frac{1}{2}\, d[\tilde z^k, \bar Z], \qquad (10)$$

and

$$\lim_{k \to \infty} d[z^k, \bar Z] = 0.$$

If $\tilde z^k \notin \bar Z$ for all $k \in \mathbb{N}$ and

$$\lim_{k \to \infty} \frac{\|q(\tilde z^k)\|}{d[\tilde z^k, \bar Z]} = 0 \qquad (11)$$

is satisfied (or if $F'$ is locally Lipschitz-continuous and (5) holds), then $\{d[z^k, \bar Z]\}$ converges to 0 q-superlinearly (or even q-quadratically).
Proof. For any fixed $k \in \mathbb{N}$ let $\tilde z^k$ be generated by Algorithm II. If $\tilde z^k \in \bar Z + \delta_0 B$ then, by assumption, it follows that $Z_\ell(\tilde z^k, q(\tilde z^k)) \neq \emptyset$. Thus, Algorithm II generates $z^{k+1}$ and $\tilde z^{k+1}$. Taking Theorem 2.1 into account, property (10) and $\tilde z^{k+1} \in \bar Z + \delta_0 B$ follow.
Therefore, it can easily be shown by induction that Algorithm II is well defined and that (10) holds for all $k \in \mathbb{N}$. This implies $\lim_{k \to \infty} d[z^k, \bar Z] = 0$. If (11) holds then Theorem 2.1 (with $z := \tilde z^k$ and $\hat z := z^{k+1}$ for $k \in \mathbb{N}$) together with (10) and $\lim_{k \to \infty} \|z^{k+1} - \tilde z^k\| \le \ell \lim_{k \to \infty} d[\tilde z^k, \bar Z] = 0$ provides

$$\lim_{k \to \infty} \frac{d[z^{k+1}, \bar Z]}{d[z^k, \bar Z]} \le \kappa \lim_{k \to \infty} \frac{d[z^{k+1}, \bar Z]}{d[\tilde z^k, \bar Z]} = 0.$$

This shows the q-superlinear convergence of $\{d[z^k, \bar Z]\}$. Similarly, according to the last assertion of Theorem 2.1, the local Lipschitz-continuity of $F'$, (5), and (10) imply

$$d[z^{k+1}, \bar Z] \le C_0\, d[\tilde z^k, \bar Z]^2 \le C_0 \kappa^2\, d[z^k, \bar Z]^2 \qquad \forall k \in \mathbb{N}. \qquad \Box$$
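To see the role of the modification numerically, the following sketch (our illustration, not taken from the paper) reuses the toy generalized equation $0 \in F(z) + N_{[0,\infty)}(z)$ with $F(z) = z^2 + z - 2$ and solution set $\{1\}$, and deliberately replaces each Newton iterate $z^{k+1}$ by a worse point $\tilde z^{k+1}$ at $\kappa = 2$ times its distance from the solution set. In line with Theorem 2.2, the distances $d[z^k, \bar Z]$ still decay q-quadratically; only the constant degrades, as in the final estimate of the proof above.

```python
F  = lambda z: z**2 + z - 2.0          # solution set of the GE is {1}
dF = lambda z: 2.0 * z + 1.0

kappa, zt, dists = 2.0, 2.0, []
for _ in range(6):
    y = zt - F(zt) / dF(zt)            # exact Newton subproblem at zt^k
    z = y if y >= 0 else 0.0           # z^{k+1}
    zt = 1.0 + kappa * (z - 1.0)       # modified point: d = kappa * d[z, Z]
    dists.append(abs(z - 1.0))

# dists still decays q-quadratically despite the deliberate worsening
```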
Besides the q-superlinear (quadratic) convergence of the distances $d[z^k, \bar Z]$ to 0 just proved for Algorithm II, the following theorem shows that the iterates $z^k$ generated by Algorithm I converge q-superlinearly (quadratically) to a certain $z^* \in \bar Z$. The theorem also improves to some extent results by Pshenitchny [12] and Robinson [16]. Under certain conditions they obtained r-quadratic convergence of the iterates of Newton's method applied to the particular generalized equation $0 \in F(z) + T$, where $T \subseteq \mathbb{R}^l$ denotes a closed cone, e.g., the nonnegative orthant.

Theorem 2.3 For $Q := \bar Z + \delta B$ let ULCQ be satisfied. Furthermore, assume that (4) with $\kappa := 1$ is valid for all $z \in Q$. If

$$Z_\ell(z, q(z)) \neq \emptyset \qquad \forall z \in Q$$

then, with $\delta_0 \in (0, \delta/2]$ from Theorem 2.1,

$$z^0 \in \bar Z + \delta_0 B$$

implies that Algorithm I generates a well-defined sequence $\{z^k\}$ which converges to a certain $z^* \in \bar Z$. Moreover, if $z^k \notin \bar Z$ for all $k \in \mathbb{N}$ and (11) is satisfied then $\{z^k\}$ converges q-superlinearly to $z^*$. Furthermore, condition (5) and the local Lipschitz-continuity of $F'$ ensure that the rate of convergence is q-quadratic.

Proof. Algorithm I turns out to be a particular version of Algorithm II: just set $\kappa := 1$
and $\tilde z^k := z^k$ in the latter. Thus, the results on the local convergence given in Theorem 2.2 apply. In particular, Algorithm I is well defined. Since the sequence $\{z^k\}$ lies in the compact set $\bar Z + \delta_0 B$ it has at least one accumulation point. Moreover, any accumulation point is an element of $\bar Z$. Assume that there are two different accumulation points $z^*$ and $z^{**}$. It follows that there is $k_1 \in \mathbb{N}$ so that

$$d[z^{k_1}, \bar Z] \le \tau := \frac{1}{4\ell}\, \|z^* - z^{**}\|$$

and

$$\|z^* - z^{k_1}\| \le \frac{1}{4}\, \|z^* - z^{**}\|.$$
Having this, (10), and $z^{k+1} \in Z_\ell(z^k, q(z^k))$ in mind we get, for all $k > k_1$,

$$\|z^k - z^{k_1}\| \le \sum_{j=k_1}^{k-1} \|z^{j+1} - z^j\| \le \ell \sum_{j=k_1}^{k-1} d[z^j, \bar Z] \le \ell\, d[z^{k_1}, \bar Z] \sum_{j=0}^{k-k_1-1} 2^{-j} \le 2\ell\tau \le \frac{1}{2}\, \|z^* - z^{**}\|$$

and therefore

$$\|z^* - z^k\| \le \|z^* - z^{k_1}\| + \|z^{k_1} - z^k\| \le \frac{3}{4}\, \|z^* - z^{**}\|.$$

This, however, contradicts the assumption that there are two accumulation points $z^* \neq z^{**}$. Thus, the sequence $\{z^k\}$ generated by Algorithm I converges to a certain $z^* \in \bar Z$. From
$$z^* - z^{k+1} = \left( \lim_{j \to \infty} z^{k+2+j} \right) - z^{k+1} = \sum_{j=0}^{\infty} \left( z^{k+2+j} - z^{k+1+j} \right),$$

$z^{k+1} \in Z_\ell(z^k, q(z^k))$, and (10) it follows that

$$\|z^* - z^{k+1}\| \le \ell \sum_{j=0}^{\infty} d[z^{k+1+j}, \bar Z] \le \ell\, d[z^{k+1}, \bar Z] \sum_{j=0}^{\infty} 2^{-j} = 2\ell\, d[z^{k+1}, \bar Z] \le 2\ell\, \frac{d[z^{k+1}, \bar Z]}{d[z^k, \bar Z]}\, \|z^* - z^k\| \qquad (12)$$

provided that $z^k \notin \bar Z$ for all $k \in \mathbb{N}$. According to Theorem 2.2, assumption (11) ensures the q-superlinear convergence of $\{d[z^k, \bar Z]\}$. Thus (12) yields the q-superlinear convergence of $\{z^k\}$ to $z^*$. If $F'$ is locally Lipschitz-continuous and if (5) is satisfied we obtain from (12) that

$$\|z^* - z^{k+1}\| \le 2\ell\, \frac{d[z^{k+1}, \bar Z]}{d[z^k, \bar Z]^2}\, \|z^* - z^k\|^2 \le 2\ell\gamma\, \|z^* - z^k\|^2$$

with a constant $\gamma > 0$ that exists owing to the q-quadratic convergence of $\{d[z^k, \bar Z]\}$ stated in Theorem 2.2. $\Box$
Corollary 2.4 Let $\bar Z := \{z^*\}$ be a single isolated solution. Moreover, for $Q := \{z^*\} + \delta B$ let ULCQ be satisfied. Furthermore, assume that (4) is valid with $\kappa := 1$ for all $z \in Q$. If

$$Z_\ell(z, q(z)) \neq \emptyset \qquad \forall z \in Q$$

then, with $\delta_0 \in (0, \delta/2]$ from Theorem 2.1,

$$z^0 \in \{z^*\} + \delta_0 B$$

implies that Algorithm I is well defined and generates a sequence $\{z^k\}$ which converges to $z^*$. Moreover, if $z^k \notin \bar Z$ for all $k \in \mathbb{N}$ and (11) is satisfied then $\{z^k\}$ converges q-superlinearly to $z^*$. Furthermore, condition (5) and the local Lipschitz-continuity of $F'$ ensure that the rate of convergence is q-quadratic.
3. Application to Nonlinear Programming. In this section we assume that the generalized equation (1) arises from the KKT conditions for the nonlinear program (P). After recalling some material related to the method by Wilson we suggest a Modified Wilson Method. Then, Subsection 3.2 collects the basic assumptions that will be used later on. In Subsection 3.3 basic results on the stability both of the generalized equation and of the subproblems in the Modified Wilson Method are presented. The main results then follow in Subsection 3.4. There, the Modified Wilson Method is characterized as a particular Modified Inexact Newton Method that satisfies the conditions for local quadratic convergence derived in Section 2. Corresponding technical details will be given in Subsections 3.5 and 3.6. Finally, in Subsection 3.7, some related material will be discussed.

3.1 Modified Wilson's Method. As in the Introduction we consider the nonlinear program

$$(P) \qquad f(x) \to \min \quad \text{s.t.} \quad g(x) \le 0. \qquad (13)$$

The functions $f : \mathbb{R}^n \to \mathbb{R}$, $g : \mathbb{R}^n \to \mathbb{R}^m$ are assumed to be twice differentiable with locally Lipschitz-continuous second order derivatives. With the Lagrangian $L : \mathbb{R}^{n+m} \to \mathbb{R}$ given by $L(x, u) := f(x) + u^T g(x)$, the Karush-Kuhn-Tucker conditions for problem (P) read as follows:

$$\nabla_x L(x, u) = 0, \qquad g(x) \le 0, \qquad u \ge 0, \qquad u^T g(x) = 0. \qquad (14)$$

A vector pair $(x, u)$ is called a KKT point of problem (P) if it satisfies (14). Then, the primal part $x$ is called a stationary point of (P). As is well known, system (14) has the same solution set as the generalized equation
$$0 \in F(z) + T(z) \qquad (15)$$

if $z := (x, u)$ and $F : \mathbb{R}^{n+m} \to \mathbb{R}^{n+m}$, $T : \mathbb{R}^{n+m} \to 2^{\mathbb{R}^{n+m}}$ are given by

$$F(z) := \begin{pmatrix} \nabla_x L(x, u) \\ -g(x) \end{pmatrix}, \qquad T(z) := \begin{pmatrix} 0 \\ N_+(u) \end{pmatrix},$$

where

$$N_+(u) := \begin{cases} \{v \le 0 \mid v^T u = 0\} & \text{if } u \ge 0, \\ \emptyset & \text{otherwise} \end{cases}$$

denotes the normal cone to $\mathbb{R}^m_+$ at $u$. Linearizing $F$ at $z$ we obtain the problem

$$0 \in F(z) + F'(z)(y - z) + T(y).$$

Note that the above differentiability assumptions for problem (P) imply that $F'$ is locally Lipschitz-continuous. More generally, we consider the perturbed problem

$$0 \in F(z) + F'(z)(y - z) + T(y) + q(z)$$
with the perturbation vector $q(z) \in \mathbb{R}^{n+m}$. The solution set of this problem is denoted by $Z(z, q(z))$ or, if $q(z) = 0$, by $Z(z)$. It can easily be seen ([20]) that the set of KKT points of the quadratic program

$$(Q(z, q(z))) \qquad (\nabla f(x) + q_f(z))^T (\xi - x) + \tfrac{1}{2} (\xi - x)^T \nabla_{xx} L(x, u) (\xi - x) \to \min \quad \text{s.t.} \quad \nabla g(x)^T (\xi - x) + g(x) + q_g(z) \le 0$$

coincides with $Z(z, q(z))$, where $q(z) = (q_f(z), q_g(z)) \in \mathbb{R}^{n+m}$. If $q(z) = 0$ we shortly write $(Q(z))$ instead of $(Q(z, 0))$. We are now in the position to describe the Wilson Method and its modification. To avoid confusion, in contrast to Algorithms I and II the iterates are now denoted by sans serif letters, e.g., $\mathsf{z}^k$ instead of $z^k$.

Algorithm III (Wilson Method): Given $\mathsf{z}^0 \in \mathbb{R}^{n+m}$. For $k = 0, 1, 2, \dots$, determine $\mathsf{z}^{k+1} \in Z(\mathsf{z}^k)$ such that

$$\|\mathsf{z}^{k+1} - \mathsf{z}^k\| = \min\{\|\hat z - \mathsf{z}^k\| \mid \hat z \in Z(\mathsf{z}^k)\}.$$
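The correspondence between the KKT system (14) and the generalized equation (15) is easy to probe numerically. The sketch below (our illustration; the small program and all names are hypothetical) evaluates $\nabla_x L$ and the normal-cone conditions, returning a residual that vanishes exactly at KKT points.

```python
import numpy as np

def kkt_residual(x, u, grad_f, g, grad_g):
    """Check (14) via the generalized-equation form (15):
    first block: grad_x L(x,u) = grad f + grad g * u must vanish;
    second block: g(x) must lie in the normal cone N_+(u), i.e.
    u >= 0, g(x) <= 0 and u^T g(x) = 0."""
    r1 = grad_f(x) + grad_g(x) @ u              # grad_x L(x, u)
    gx = g(x)
    r2 = np.concatenate([np.maximum(gx, 0.0),   # feasibility violation
                         np.maximum(-u, 0.0),   # sign violation
                         [abs(u @ gx)]])        # complementarity violation
    return np.linalg.norm(np.concatenate([r1, r2]))

# hypothetical data: min (x1-1)^2 + (x2-1)^2  s.t.  x1 + x2 - 1 <= 0
grad_f = lambda x: 2.0 * (x - 1.0)
g      = lambda x: np.array([x[0] + x[1] - 1.0])
grad_g = lambda x: np.array([[1.0], [1.0]])     # columns = constraint gradients

z = (np.array([0.5, 0.5]), np.array([1.0]))
res = kkt_residual(*z, grad_f, g, grad_g)       # vanishes at this KKT point
```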
To apply the general results of Section 2 we modify the original Wilson method. The main modification consists in an additional convex quadratic program for correcting the approximate multiplier vector. For that purpose consider the problem

$$(P(x)) \qquad \nabla f(x)^T d + \tfrac{1}{2}\, d^T d \to \min \quad \text{s.t.} \quad \nabla g(x)^T d + g(x) \le 0.$$

The dual program associated with $(P(x))$ is given by (see [13] and Lemma 3.8)

$$(D(x)) \qquad (\nabla g(x)^T \nabla f(x) - g(x))^T u + \tfrac{1}{2}\, u^T \nabla g(x)^T \nabla g(x)\, u \to \min \quad \text{s.t.} \quad u \ge 0$$

and will be used within the subsequent Modified Wilson Method.

Algorithm IV (Modified Wilson Method): Given $\mathsf{z}^0 \in \mathbb{R}^{n+m}$. Set $\tilde{\mathsf{z}}^0 := \mathsf{z}^0$. For $k = 0, 1, 2, \dots$: determine $\mathsf{z}^{k+1} \in Z(\tilde{\mathsf{z}}^k)$ such that

$$\|\mathsf{x}^{k+1} - \tilde{\mathsf{x}}^k\| = \min\{\|\hat x - \tilde{\mathsf{x}}^k\| \mid (\hat x, \hat u) \in Z(\tilde{\mathsf{z}}^k)\},$$

compute $\tilde{\mathsf{u}}^{k+1}$ as a solution of $(D(\mathsf{x}^{k+1}))$ and set $\tilde{\mathsf{z}}^{k+1} := (\mathsf{x}^{k+1}, \tilde{\mathsf{u}}^{k+1})$.
Obviously, the second difference between Algorithm III and Algorithm IV is that the latter chooses $\mathsf{z}^{k+1}$ from the KKT set of the quadratic program $(Q(\tilde{\mathsf{z}}^k))$ in such a way that the distance of an iterate from the previous one is minimal with respect to the $x$-variables, whereas Algorithm III minimizes the distance in the $z$-variables. It will be shown in the proof of Theorem 3.11 that Algorithm IV can be regarded as a particular case of Algorithm II.
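Since $(D(x))$ is a convex quadratic program over the nonnegative orthant, any standard QP solver can be used for the multiplier-correction step. Purely as an illustration (not the paper's prescription), the sketch below minimizes it by projected gradient on a tiny instance with a duplicated constraint, so that the multiplier set is a whole segment: $n = 1$, $f(x) = (x - 2)^2$, $g_1(x) = g_2(x) = x - 1$, $x^* = 1$, and $U = \{u \ge 0 \mid u_1 + u_2 = 2\}$.

```python
import numpy as np

def solve_dual(grad_f, gval, A, iters=200):
    """Minimize (A^T grad_f - g)^T u + 0.5 u^T A^T A u  s.t.  u >= 0
    by projected gradient with step 1/||A^T A||; a minimal sketch only."""
    c, Q = A.T @ grad_f - gval, A.T @ A
    u = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(Q, 2)
    for _ in range(iters):
        u = np.maximum(u - step * (c + Q @ u), 0.0)
    return u

# x* = 1 for min (x-2)^2 s.t. x - 1 <= 0 (written twice):
# U = {u >= 0 | u1 + u2 = 2} is not a singleton.
A      = np.array([[1.0, 1.0]])     # gradients of g1, g2 at x* (columns)
grad_f = np.array([-2.0])           # f'(1) for f(x) = (x - 2)^2
gval   = np.array([0.0, 0.0])       # g(x*)

u = solve_dual(grad_f, gval, A)     # lands in U, here at (1, 1)
```

Which element of a nonunique $U$ is selected depends on the solver; the convergence theory only needs $\tilde{\mathsf{u}}^{k+1}$ to be some solution of $(D(\mathsf{x}^{k+1}))$.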
3.2 Assumptions. In this subsection we describe the basic assumptions that will be used in the remainder of the paper. To this end let us first introduce the index sets

$$I_0 := \{i \in I \mid g_i(x^*) = 0\}, \qquad I_+(u) := \{i \in I \mid u_i > 0\}, \qquad I_+ := \{i \in I \mid \exists u \in U : u_i > 0\},$$

where $x^* \in \mathbb{R}^n$ denotes an arbitrary but fixed stationary point of (P) and

$$U := \{u \in \mathbb{R}^m \mid (x^*, u) \text{ is a KKT point of } (P)\}$$

the set of multiplier vectors associated with $x^*$. We also need the set

$$K := \{(x^*, u) \mid u \in U\}$$

containing all those KKT points of (P) whose primal part is equal to $x^*$. Moreover, we define the linear subspaces

$$W(u) := \operatorname{span}\{\nabla g_i(x^*) \mid i \in I_+(u)\}, \qquad W := \operatorname{span}\{\nabla g_i(x^*) \mid i \in I_+\}$$

and the cone

$$C := \{h \in \mathbb{R}^n \mid \nabla g_i(x^*)^T h \le 0 \ \forall i \in I_0,\ \nabla f(x^*)^T h = 0\}.$$

Now, we are able to formulate the assumptions.

Mangasarian-Fromovitz Constraint Qualification (MFCQ) at $x^*$:

$$\sum_{i \in I_0} \nabla g_i(x^*)\, \lambda_i = 0, \quad \lambda_i \ge 0 \ \forall i \in I_0 \quad \Longrightarrow \quad \lambda_i = 0 \ \forall i \in I_0.$$

Second-Order Sufficiency Condition (SOSC) at $(x^*, u) \in K$:

$$h^T \nabla_{xx} L(x^*, u)\, h > 0 \qquad \forall h \in C \setminus \{0\}.$$

Constant Rank Condition (CRC) at $x^*$: There is $\varepsilon > 0$ so that

$$\operatorname{rank} \nabla g_{I_+}(x) = \operatorname{rank} \nabla g_{I_+}(x^*) \qquad \forall x \in \{x^*\} + \varepsilon B.$$

Weak Complementarity Condition (WCC) at $x^*$:

$$W = W(u) \qquad \forall u \in U.$$
The latter condition seems to be new and means that any representation of $-\nabla f(x^*)$ as a positive linear combination of gradients $\nabla g_i(x^*)$ with indices in $I_+$ requires the maximal number ($\dim W$) of linearly independent gradients. If $U = \{u\}$ consists of a single multiplier then WCC is satisfied automatically. Otherwise, if $U$ is not a singleton, consider the strict complementarity condition for some $u \in U$, that is, $u_i > 0$ for all $i \in I_0$. It implies $W = W(u)$ but not vice versa. Nevertheless, $W = W(u)$ requires that certain components of $u$ are positive. CRC as defined above is weaker than the usual Constant Rank Constraint Qualification (CRCQ) of Janin [7]. The latter requires that the rank of the gradients $\nabla g_i(x)$ with $i \in J$ is locally constant for any subset $J \subseteq I_0$, whereas CRC assumes this only for $J = I_+$. CRC always holds if the constraint functions $g_i$ ($i \in I$) are linear, if the Linear Independence Constraint Qualification (LICQ) holds, or if the dimension of $W$ is equal to $n$. However, WCC and CRC together imply neither LICQ nor the uniqueness of the multiplier vector. Recall that MFCQ at $x^*$ implies the boundedness of $U$. Note that the assumptions above were given for problem (P). In the next subsection we will also make use of MFCQ and SOSC in connection with certain quadratic programs to obtain stability results. Therefore we will always say to which problem and for which point (e.g., $x^*$) a condition is applied.
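The assumptions are easy to probe numerically on small examples. The sketch below (our hypothetical example) uses $f(x) = (x_1 - 2)^2 + x_2^2$ with the duplicated constraint $g_1(x) = g_2(x) = x_1 - 1$ at $x^* = (1, 0)$: here $U = \{u \ge 0 \mid u_1 + u_2 = 2\}$ is a segment, so LICQ fails and the multiplier is nonunique, yet CRC holds (the rank of $\nabla g_{I_+}$ is constantly 1 near $x^*$) and WCC holds (every $u \in U$ has a positive component, and each single gradient already spans $W$).

```python
import numpy as np

# f(x) = (x1 - 2)^2 + x2^2, g1(x) = g2(x) = x1 - 1, x* = (1, 0)
def grad_g(x):
    # columns are the gradients of g1 and g2 (identical by construction)
    return np.array([[1.0, 1.0], [0.0, 0.0]])

x_star = np.array([1.0, 0.0])
r_star = np.linalg.matrix_rank(grad_g(x_star))        # rank of grad g_{I+} at x*

# CRC: the rank stays the same at sample points near x*
rng = np.random.default_rng(0)
crc_ok = all(
    np.linalg.matrix_rank(grad_g(x_star + 1e-3 * rng.standard_normal(2))) == r_star
    for _ in range(10))

# WCC: W(u) = W already for the extreme multipliers u = (2,0) and u = (0,2),
# since each single column of grad_g spans the same one-dimensional W
wcc_ok = all(
    np.linalg.matrix_rank(grad_g(x_star)[:, [i]]) == r_star
    for i in (0, 1))
```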
3.3 Stability Results. Consider the perturbed nonlinear program

$$(P(p)) \qquad \mathbf{f}(x, p) \to \min \quad \text{s.t.} \quad \mathbf{g}(x, p) \le 0,$$

where $p \in \mathbb{R}^{n+m}$ is the perturbation parameter and $\mathbf{f}(\cdot, p)$ and $\mathbf{g}(\cdot, p)$ map from $\mathbb{R}^n$ to $\mathbb{R}$ and $\mathbb{R}^m$, respectively. With regard to Robinson [20] we make the blanket assumption that, for each $p \in \mathbb{R}^{n+m}$, the functions $\mathbf{f}(\cdot, p)$ and $\mathbf{g}(\cdot, p)$ are differentiable on $\mathbb{R}^n$, that $\mathbf{f}$ is continuous on $\mathbb{R}^n \times \mathbb{R}^{n+m}$, and that the derivatives $\nabla_x \mathbf{f}$, $\nabla_x \mathbf{g}$ as well as $\mathbf{g}$ are locally Lipschitz-continuous on $\mathbb{R}^n \times \mathbb{R}^{n+m}$. We identify problem $(P(0))$ with (P). Thus, due to the differentiability assumptions for (P), $\mathbf{f}(\cdot, 0)$ and $\mathbf{g}(\cdot, 0)$ are twice continuously differentiable. The set of KKT points of $(P(p))$ is denoted by $\mathcal{K}(p)$. Note that $\mathcal{K}(0) \supseteq K$ for $K$ as defined in Subsection 3.2. The next theorem summarizes basic results obtained by Robinson [20, Theorem 3.2 and Corollary 4.3].

Theorem 3.5 For problem (P) let MFCQ at $x^*$ and SOSC for each $(x^*, u) \in K$ be satisfied. Then there are $\bar\delta > 0$, $\bar\varepsilon > 0$ and $\bar\beta \ge 1$ so that

$$\emptyset \neq \mathcal{K}(p) \cap (K + (\bar\delta B \times \mathbb{R}^m)) \subseteq K + \bar\beta \|p\| B \qquad \forall p \in \bar\varepsilon B.$$

Let us now consider the particular case that

$$\mathbf{f}(x, p) := f(x) + x^T p_f, \qquad \mathbf{g}(x, p) := g(x) - p_g, \qquad p = (p_f, p_g) \in \mathbb{R}^{n+m}.$$

Then the set $\mathcal{K}(p)$ coincides with the solution set of the perturbed generalized equation

$$0 \in F(z) + T(z) + p \qquad (16)$$
with $F$ and $T$ as defined in Subsection 3.1. Obviously, (16) is an instance of the perturbed generalized equation (1). Therefore, to be consistent with the notation in Section 2, we will always use $Z(p)$ instead of $\mathcal{K}(p)$ to denote the solution set of (16). Theorem 3.5 now yields the following corollary.

Corollary 3.6 For problem (P) let MFCQ at $x^*$ and SOSC for each $(x^*, u) \in K$ be satisfied. Then, for $Q := K + \bar\delta B$ with $\bar\delta > 0$ from Theorem 3.5, the multifunction $p \mapsto Z(p)$ is upper Lipschitz-continuous with respect to $Q$, i.e., it has the property ULCQ, where $\varepsilon = \bar\varepsilon$ and $\beta = \bar\beta$ with $\bar\varepsilon > 0$, $\bar\beta \ge 1$ from Theorem 3.5.

The importance of this corollary lies in the fact that the ULCQ property was one of the basic assumptions for obtaining the convergence results in Section 2. The following theorem provides a basic result on the behavior of the set of KKT points of the quadratic program $(Q(z))$ if $z$ varies in a certain neighborhood of $K$. Its proof will be given in Subsection 3.5 together with some auxiliary results.

Theorem 3.7 For problem (P) let MFCQ at $x^*$ and SOSC for each $(x^*, u) \in K$ be satisfied. Then, there are $\delta_1 > 0$, $\varepsilon_1 > 0$, and $\beta_1 \ge 1$ so that

$$\emptyset \neq Z(z) \cap (K + (\delta_1 B \times \mathbb{R}^m)) \subseteq K + \beta_1\, d[z, K] B \qquad \forall z \in K + \varepsilon_1 B.$$
In the remainder of this subsection we will consider the auxiliary quadratic programs $(P(x^* + s))$ and $(D(x^* + s))$. The latter is used in Algorithm IV with $s = \mathsf{x}^{k+1} - x^*$. For any $s \in \mathbb{R}^n$ let $U(s)$ denote the solution set of $(D(x^* + s))$.

Lemma 3.8 For all $s \in \mathbb{R}^n$ the problem $(P(x^* + s))$ possesses at most one solution (denoted by $d(s)$). If $(P(x^* + s))$ is solvable then the set of multiplier vectors associated with $d(s)$ coincides with $U(s)$, i.e., with the solution set of $(D(x^* + s))$. Problem $(P(x^*))$ has the unique solution $d(0) = 0$ and satisfies SOSC for $d = 0$ and each $v \in U(0)$. The set of KKT points of $(P(x^*))$ is given by $\{0\} \times U$, thus $U(0) = U$. If MFCQ holds for problem (P) at $x^*$ then MFCQ is also satisfied for $(P(x^*))$ at $d = 0$. The solution set of $(D(x^*))$ coincides with $U$.

Proof. Because $(P(x^* + s))$ is a linearly constrained problem with a strictly convex quadratic objective, at most one solution exists. If $(P(x^* + s))$ has a solution then it is well known that this solution is a stationary point of $(P(x^* + s))$ and that the set of multiplier vectors associated with this unique solution coincides with the solution set $U(s)$ of the dual problem $(D(x^* + s))$; for instance, see Pshenitchny and Danilin [13]. Writing down the KKT conditions for problem $(P(x^*))$ and using the fact that $x^*$ is a stationary point of (P), one can easily see that $d = 0$ is a stationary point of problem $(P(x^*))$ and its unique solution since at most one exists. Moreover, it follows that $U(0) = U$. Since the Hessian of the Lagrangian of $(P(x^*))$ is the identity matrix, problem $(P(x^*))$ satisfies SOSC for $d = 0$ and each $v \in U(0)$. Finally, for $d = 0$ the index set of the active constraints coincides with $I_0$ so that MFCQ for problem (P) at $x^*$ is the same as MFCQ for $(P(x^*))$ at $d = 0$. $\Box$
Theorem 3.9 For problem (P) let MFCQ at $x^*$ be satisfied. Then there are $\varepsilon_2 > 0$ and $\beta_2 \ge 1$ so that

$$\emptyset \neq U(s) \subseteq U + \beta_2 \|s\| B \qquad \forall s \in \varepsilon_2 B.$$

Proof. Using Lemma 3.8 we can apply Theorem 3.5 to problem $(P(x^* + s))$, where $s$ (instead of $p$) is the perturbation parameter. We obtain that, for some $\delta_2 > 0$, $\varepsilon_2 > 0$, $\beta_2 \ge 1$, problem $(P(x^* + s))$ has the unique solution $d(s)$ for all $s \in \varepsilon_2 B$ and

$$\emptyset \neq (\{d(s)\} \times U(s)) \cap (\delta_2 B \times \mathbb{R}^m) \subseteq (\{0\} \times U) + \beta_2 \|s\| B \qquad \forall s \in \varepsilon_2 B,$$

which implies the assertion of the theorem. $\Box$
3.4 Quadratic Convergence. Together with Theorem 2.2 the next theorem is the basis for proving the local quadratic convergence of the Modified Wilson Method (Algorithm IV). Roughly speaking, the following theorem says that, under certain conditions and for $z$ sufficiently close to $K$, the Wilson subproblem $(Q(z))$ can be perturbed so that the perturbed subproblem $(Q(z, q(z, \hat z)))$ has at least one KKT point that lies in $Z_\ell(z, q(z, \hat z))$ and whose primal part is equal to $\hat x$, where $\hat z = (\hat x, \hat u)$ denotes a KKT point of the unperturbed subproblem $(Q(z))$ and $\ell \ge 1$ is some fixed value. Moreover, it is claimed that the norm of the perturbation vector $q(z, \hat z)$ goes to zero as the square of the distance $d[z, K]$ does.

Theorem 3.10 For problem (P) let MFCQ, WCC, CRC at $x^*$ and SOSC for each $(x^*, u) \in K$ be satisfied. Then there are $\varepsilon_3 > 0$ and $\ell_1 \ge 1$ so that $z \in K + \varepsilon_3 B$ implies $Z(z) \cap (K + (\delta_1 B \times \mathbb{R}^m)) \neq \emptyset$ and

$$\|\hat x - x^*\| \le \ell_1\, d[z, K] \qquad \forall \hat z \in Z(z) \cap (K + (\delta_1 B \times \mathbb{R}^m)),$$

where $\delta_1 > 0$ is taken from Theorem 3.7. Moreover it follows that, for each $(z, \hat z)$ with $z \in K + \varepsilon_3 B$ and $\hat z \in Z(z) \cap (K + (\delta_1 B \times \mathbb{R}^m))$, there are $u(z, \hat z)$ and $q(z, \hat z)$ so that

$$(\hat x, u(z, \hat z)) \in Z(z, q(z, \hat z)), \qquad \|u(z, \hat z) - \hat u\| \le \ell_1\, d[z, K],$$

and

$$\|q(z, \hat z)\| \le C_1\, d[z, K]^2$$

with some $C_1 > 0$.

The proof of Theorem 3.10 and corresponding technical lemmas will be given in Subsection 3.6. Now, we are able to present the result on the q-quadratic convergence of the Modified Wilson Method (Algorithm IV). We note that if the local Lipschitz-continuity of the second order derivatives $\nabla^2 f$ and $\nabla^2 g$ is replaced by their continuity then it is still possible to show q-superlinear convergence.
15
Theorem 3.11 For problem (P ) let MFCQ, WCC, CRC at x and SOSC for each (x ; u )
2 K be satis ed. Then > 0 exists so that z 2 K + B implies that Algorithm IV is well de ned. If zk 2= K for all k 2 N the sequence fd[zk ; K]g converges q-quadratically to 0 whereas the sequence fxk g converges r-quadratically to x . 0
6
6
We rst prove that, for suitably chosen parameters, Algorithm II applied to the generalized equation (15) is well de ned and has the desired convergence properties. Then, the theorem immediately follows by showing that Algorithm IV is one particular possibility to perform Algorithm II. MFCQ at x and SOSC for each (x ; u) 2 K imply that x is an isolated stationary point of (P ). Thus > 0 exists so that (K + B ) \ K = K, i.e., K plays the role of Z as used in Section 2. Without loss of generality we can assume that 0 < minf ; ; 1=(8C ); 4 g with > 0, 1 from Theorem 3.5, > 0, C > 0 from Theorem 3.10, > 0; 1 from Theorem 3.9. We can further assume that (K + 2B ) \ K = K so that d[z; K] = d[z; Z ] 8z 2 K + B: (17) For any z 2 K + B let z^ = (^x; u^) denote an arbitrary but xed element of Z (z) \ (K + ( B R m )). With regard to Theorem 3.10 the latter set is nonempty and the theorem enables us to de ne the perturbation vectors q(z) in Algorithm II by q(z) := q(z; z^) for any z 2 K + B . Setting := 2 ; := 2 (18) all parameters of Algorithm II (except the starting vector z ) are de ned. Now, to exploit Theorem 2.2 we will show that all of its assumptions are satis ed. To this end let Q := K + B . Then, it follows from Corollary 3.6 that ULCQ is satis ed with = and = . Moreover, Theorem 3.10 yields Z (z; q(z)) 6= ; for all z 2 Q. Using Theorem 3.10, (17), and (18) we nd that 1 d[z; Z ] 8z 2 Q: kq(z)k = kq(z; z^)k C d[z; K] C d[z; K] 81 d[z; K] = 4 Thus, we can apply Theorem 2.2 and obtain that > 0 exists so that, for any starting vector z 2 K + B , Algorithm II with the above settings for ; ; q(z) generates a wellde ned sequence fzk g K + B with klim d[zk ; K] = 0. Moreover, the behavior of fq(zk )g !1 according to Theorem 3.10 implies that the sequence fzk g has the properties stated in Theorem 2.2. In the remainder of the proof we will show that Algorithm IV can be viewed as a particular possibility to perform Algorithm II. 
To this end let z~k 2 K + B be any iterate generated by Algorithm II. According to Theorem 2.2 we have d[~zk ; Z ] < . Setting ~zk := z~k it follows from Theorem 3.10 that Z (~zk ) 6= ;. Thus, Algorithm IV generates zk = (xk ; uk ) 2 Z (~zk ) \ (K + B R m )); Proof.
0
0
0
1
1
1
0
2
2
2
1
2
2
1
0
2
0
0
0
2
1
0
1
0
2
0
0
0
0
0
+1
+1
+1
1
where it has to be taken into account that Algorithm IV chooses (x^{k+1}, u^{k+1}) from Z(z̃^k) so that

‖x^{k+1} − x̃^k‖ = min{‖x̂ − x̃^k‖ | (x̂, û) ∈ Z(z̃^k)}.

Hence, identifying the pair (z, ẑ) occurring in Theorem 3.10 with (z̃^k, z^{k+1}), Theorem 3.10 implies that u(z̃^k, z^{k+1}) exists with

(x^{k+1}, u(z̃^k, z^{k+1})) ∈ Z(z̃^k, q(z̃^k))

with κ = 2κ₂ according to (18). Therefore, we can identify (x^{k+1}, u(z̃^k, z^{k+1})) with z^{k+1} in Algorithm II. Theorem 2.2 together with (17) and (18) yields

‖x^{k+1} − x*‖ ≤ d[z^{k+1}, K] ≤ ½ d[z̃^k, Z] < ¼ ε′.

This and Theorem 3.9 imply

∅ ≠ U(x^{k+1}) ⊆ U + κ₂‖x^{k+1} − x*‖B ⊆ U + κ₂ d[z^{k+1}, K]B.

Since ũ^{k+1} generated by Algorithm IV is a solution of the auxiliary problem (D(x^{k+1})) and thus belongs to U(x^{k+1}), we obtain, taking (18) into account,

d[z̃^{k+1}, K] = d[(x^{k+1}, ũ^{k+1}), K] ≤ ‖x^{k+1} − x*‖ + κ₂ d[z^{k+1}, K] ≤ 2κ₂ d[z^{k+1}, K] = κ d[z^{k+1}, K].

Therefore, we can identify the iterate z̃^{k+1} generated by Algorithm IV with z̃^{k+1} from Algorithm II. Altogether it follows that any step of Algorithm IV can be performed by means of a corresponding step of Algorithm II. Thus Algorithm IV can be regarded as a particular case of Algorithm II and shares the q-quadratic convergence property stated in Theorem 2.2. Finally, to see that the sequence {x^k} converges r-quadratically to x*, recall that

‖x^{k+1} − x^k‖ ≤ κ d[z^k, K]  ∀k ∈ ℕ

and that {d[z^k, K]} converges q-quadratically and thus r-quadratically to 0. □
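The last step of the proof can be made explicit with one short estimate. As a sketch (here c denotes a generic constant realizing the q-quadratic rate of {d[z^k, K]}, and α the constant in the bound ‖x^{k+1} − x^k‖ ≤ α d[z^k, K] above; neither symbol belongs to the paper's numbering): if d[z^{k+1}, K] ≤ c d[z^k, K]² and t_k := c d[z^k, K] ≤ ½, then

```latex
\|x^{*}-x^{k}\|
\le \sum_{j\ge 0}\bigl\|x^{k+j+1}-x^{k+j}\bigr\|
\le \alpha\sum_{j\ge 0} d[z^{k+j},K]
= \frac{\alpha}{c}\sum_{j\ge 0} t_k^{\,2^{j}}
\le \frac{\alpha}{c}\cdot\frac{t_k}{1-t_k}
\le 2\alpha\, d[z^{k},K],
```

using t_{k+1} ≤ t_k² and t_k^{2^j} ≤ t_k^{j+1}. Hence ‖x* − x^k‖ is majorized by a multiple of the q-quadratically convergent sequence {d[z^k, K]}, which is exactly r-quadratic convergence of {x^k} to x*.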
3.5 Technical Results I. By means of the following auxiliary results we will be able to prove Theorem 3.7 on the behavior of the Wilson subproblems in a neighborhood of K. For any perturbation vector p = (p_x, p_u) ∈ ℝ^{n+m} and for any ū ∈ U let us consider the perturbed quadratic program

(QP(p, ū))  ∇f(x* + p_x)ᵀ(ξ − x* − p_x) + ½(ξ − x* − p_x)ᵀ∇_{xx}L(x* + p_x, ū + p_u)(ξ − x* − p_x) → min
            s.t. ∇g(x* + p_x)ᵀ(ξ − x* − p_x) + g(x* + p_x) ≤ 0.

The set of its KKT points is denoted by K(p, ū).

Lemma 3.12 For each (z, ū) ∈ ℝ^{n+m} × U, the problems (Q(z)) and (QP(z − z̄, ū)) are equivalent, where z̄ := (x*, ū). In particular, the set Z(z) of KKT points of (Q(z)) is equal to K(z − z̄, ū).
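The equivalence in Lemma 3.12 is a pure reparametrization. With z = (x, u) and z̄ = (x*, ū), set

```latex
p=(p_x,p_u):=z-\bar z=\bigl(x-x^{*},\,u-\bar u\bigr),
```

so that the data (x, u) entering the Wilson subproblem (Q(z)) reappear as the perturbation p of (QP(p, ū)); in particular Z(z) = K(z − z̄, ū), as stated.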
Proof. The results easily follow from the definitions of (Q(z)) and (QP(p, ū)) if we take into account that p = (p_x, p_u) = (x − x*, u − ū). □

Lemma 3.13 For problem (P) let MFCQ at x* and SOSC for each (x*, u) ∈ K be satisfied. Then, for any ū ∈ U, x* is a stationary point of problem (QP(0, ū)) and K = K(0, ū) ∩ ({x*} × ℝ^m). Moreover, for any ū ∈ U, problem (QP(0, ū)) satisfies MFCQ at ξ = x* and SOSC for each (x*, v) ∈ K(0, ū).

Proof. We first observe that, for ξ = x* and p = 0,

∇g(x* + p_x)ᵀ(ξ − x* − p_x) + g(x* + p_x) = ∇g(x*)ᵀ(x* − x*) + g(x*) = g(x*).

Thus, for any ū ∈ U, the index set I₀ contains exactly the indices of those constraints of (QP(0, ū)) which are active at ξ = x*. Hence, for any ū ∈ U, the problem (QP(0, ū)) satisfies MFCQ at ξ = x*. To see that K = K(0, ū) ∩ ({x*} × ℝ^m) for any ū ∈ U, consider the Karush-Kuhn-Tucker conditions for (QP(0, ū)) with ξ := x* and take into account that the gradient of the objective of (QP(0, ū)) at ξ = x* is equal to ∇f(x*) and that, as afore-mentioned, the active constraints of (QP(0, ū)) at ξ = x* are exactly given by I₀. As the constraints of problem (QP(0, ū)) are linear, the Hessian of its Lagrangian is the same as the Hessian of its objective. For any ξ ∈ ℝⁿ, the latter is equal to ∇_{xx}L(x*, ū). Hence, since the active constraints of (P) at x = x* and of (QP(0, ū)) at ξ = x* are given by I₀, the assumption that SOSC is satisfied for problem (P) for each (x*, u) ∈ K implies that SOSC also holds for problem (QP(0, ū)) for each (x*, v) ∈ K(0, ū) and arbitrary ū ∈ U. □

Let z̄ = (x*, ū) be an arbitrary but fixed element of K. Moreover, let the polyhedral multifunctions F_{z̄} : ℝ^{n+m} → 2^{ℝ^{n+m}} and F_{z̄}^{-1} : ℝ^{n+m} → 2^{ℝ^{n+m}} be defined by

F_{z̄}(z) := F(z̄) + F′(z̄)(z − z̄) + T(z),  F_{z̄}^{-1}(a) := {z ∈ ℝ^{n+m} | 0 ∈ F_{z̄}(z) + a}.

Then, one can easily check that

F_{z̄}^{-1}(0) = K(0, ū).

Therefore, the Corollary following Proposition 1 in [19] provides the following result.

Lemma 3.14 Let z̄ = (x*, ū) ∈ K be arbitrary but fixed. Then there are numbers δ₀ > 0 and κ₀ ≥ 1 so that d[0, F_{z̄}(z)] ≤ δ₀ implies d[z, K(0, ū)] ≤ κ₀ d[0, F_{z̄}(z)].
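Lemma 3.14 is an instance of Robinson's local upper Lipschitz continuity of polyhedral multifunctions ([19], Proposition 1 and its Corollary). In generic form (the symbols Φ, δ, κ below are not tied to the paper's numbering): for a polyhedral multifunction Φ : ℝˢ ⇉ ℝᵗ and any point a₀ there exist δ > 0 and κ ≥ 1 with

```latex
\Phi(a)\subseteq\Phi(a_0)+\kappa\,\|a-a_0\|\,B
\qquad\text{whenever }\|a-a_0\|\le\delta .
```

Applied with Φ := F_{z̄}^{-1} and a₀ := 0, so that Φ(0) = K(0, ū), this yields exactly the estimate of Lemma 3.14 (the inclusion is vacuous when Φ(a) = ∅).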
Lemma 3.15 For problem (P) let MFCQ at x* and SOSC for each (x*, u) ∈ K be satisfied. Moreover, let z̄ = (x*, ū) ∈ K be arbitrary but fixed. Then there are numbers ρ > 0, δ > 0, κ ≥ 1, and τ > 0 so that

K(p, u) ∩ (K + (ρB × ℝ^m)) ⊆ K + κ‖p‖B  ∀p ∈ δB, ∀u ∈ U ∩ ({ū} + τB).
Proof. We first choose ρ ∈ (0, 1] so that

K(0, ū) ∩ (K + (2ρB × ℝ^m)) = K  ∀ū ∈ U.  (19)

To show that this is possible let us assume the contrary. Then, with regard to K = K(0, ū) ∩ ({x*} × ℝ^m) for all ū ∈ U (Lemma 3.13), sequences {ū^ν} ⊆ U and {(ξ^ν, v^ν)} ⊆ ℝ^{n+m} must exist so that

lim_{ν→∞} ξ^ν = x*,  ξ^ν ≠ x*,  (ξ^ν, v^ν) ∈ K(0, ū^ν)  ∀ν ∈ ℕ.  (20)

Passing to a subsequence if necessary, we conclude that

ū := lim_{ν→∞} ū^ν ∈ U

exists since U is compact due to MFCQ at x*. From Lemma 3.13 we know that (QP(0, ū)) must satisfy MFCQ at ξ = x* and SOSC for each (x*, v) ∈ K(0, ū). Thus, we can apply Theorem 2.4 in [20] to problem (QP(0, ū)). This yields ξ^ν = x* for all sufficiently large ν, which contradicts (20). Hence, (19) must be valid for some ρ ∈ (0, 1].

Now, let ū ∈ U and thus z̄ = (x*, ū) ∈ K be arbitrary but fixed. For u ∈ U, δ ∈ (0, 1], and p = (p_x, p_u) ∈ ℝ^{n+m} with ‖p‖ ≤ δ consider any

(ξ, v) ∈ K(p, u) ∩ (K + (ρB × ℝ^m)).  (21)

This implies

Δ₁(p, ξ, v) ∈ F_{(x*,u)}(ξ, v),

where

Δ₁(p, ξ, v) := [ ∇f(x*) − ∇f(x* + p_x) + (∇g(x*) − ∇g(x* + p_x))v + ∇_{xx}L(x*, u)(ξ − x*) − ∇_{xx}L(x* + p_x, u + p_u)(ξ − x* − p_x) ;
                ∇g(x*)ᵀ(ξ − x*) − ∇g(x* + p_x)ᵀ(ξ − x* − p_x) + g(x*) − g(x* + p_x) ]

(the first block belonging to ℝⁿ, the second to ℝ^m). Moreover, it can be easily verified that

Δ₁(p, ξ, v) + Δ₂(ξ, u) ∈ F_{(x*,ū)}(ξ, v),  (22)

where

Δ₂(ξ, u) := [ (∇_{xx}L(x*, u) − ∇_{xx}L(x*, ū))(ξ − x*) ; 0 ].

Obviously, from (22) it follows that

d[0, F_{(x*,ū)}(ξ, v)] ≤ ‖Δ₁(p, ξ, v)‖ + ‖Δ₂(ξ, u)‖.  (23)

Thus, to apply Lemma 3.14 we have to provide bounds for ‖Δ₁(p, ξ, v)‖ and ‖Δ₂(ξ, u)‖. To this end we first consider (∇g(x*) − ∇g(x* + p_x))v in Δ₁(p, ξ, v). Using (21) and MFCQ
at x*, continuity arguments show that we can assume δ ∈ (0, 1] and ρ ∈ (0, 1] to be as small as necessary so that

∇g_i(x* + p_x)ᵀ(ξ − x* − p_x) + g_i(x* + p_x) < 0  ∀i ∈ I \ I₀, ∀p ∈ δB, ∀ξ ∈ {x*} + ρB

and

min{ ‖Σ_{i∈I₀} λ_i∇g_i(x* + p_x)‖ | λ_i ≥ 0 ∀i ∈ I₀, Σ_{i∈I₀} λ_i = 1, ‖p‖ ≤ δ } > 0.

With regard to (21) this implies v_i = 0 for all i ∈ I \ I₀ and, for a certain σ₀ > 0,

‖v‖ ≤ σ₀.

Hence we have, for all p ∈ δB and all v for which (ξ, v) satisfies (21),

‖(∇g(x*) − ∇g(x* + p_x))v‖ ≤ σ₀L‖p‖.

Here, L > 0 denotes a general Lipschitz constant for the functions ∇f, g, and ∇g with respect to arguments in {x*} + B. Due to the continuity of ∇g and ∇_{xx}L and the compactness of U there is a further constant σ₁ > 0 so that

‖∇g(x* + p_x)‖ ≤ σ₁,  ‖∇_{xx}L(x* + p_x, u + p_u)‖ ≤ σ₁  ∀p ∈ δB, ∀u ∈ U.

Moreover, we can assume that δ ∈ (0, 1] is chosen so that

‖∇_{xx}L(x*, u) − ∇_{xx}L(x* + p_x, u + p_u)‖ ≤ ¼(κ₀)⁻¹  ∀p ∈ δB, ∀u ∈ U,

where κ₀ is taken from Lemma 3.14. The desired upper bounds for ‖Δ₁(p, ξ, v)‖ and ‖Δ₂(ξ, u)‖ now easily follow:

‖Δ₁(p, ξ, v)‖ ≤ (2L + σ₀L + σ₁ + σ₁²)‖p‖ + ¼(κ₀)⁻¹‖ξ − x*‖ ≤ σ₂‖p‖ + ¼(κ₀)⁻¹‖ξ − x*‖

with a suitably chosen σ₂ > 0. Furthermore, we obtain

‖Δ₂(ξ, u)‖ ≤ ‖ξ − x*‖ ‖u − ū‖ Σ_{i∈I} ‖∇²g_i(x*)‖ ≤ ¼(κ₀)⁻¹‖ξ − x*‖

for all u ∈ U ∩ ({ū} + τB), where τ > 0 is chosen so that τ Σ_{i∈I} ‖∇²g_i(x*)‖ ≤ ¼(κ₀)⁻¹. With these bounds for Δ₁(p, ξ, v) and Δ₂(ξ, u) it follows from (23) that

d[0, F_{(x*,ū)}(ξ, v)] ≤ σ₂‖p‖ + ½(κ₀)⁻¹‖ξ − x*‖.

Choosing δ ∈ (0, 1] and ρ ∈ (0, 1] so that

σ₂δ + ½(κ₀)⁻¹ρ ≤ δ₀,
we have d[0, F_{(x*,ū)}(ξ, v)] ≤ δ₀ and thus obtain from Lemma 3.14

d[(ξ, v), K(0, ū)] ≤ κ₀ d[0, F_{(x*,ū)}(ξ, v)] ≤ κ₀σ₂‖p‖ + ½‖ξ − x*‖.

Using this and (19) it follows that

d[(ξ, v), K] = d[(ξ, v), K(0, ū)] ≤ κ₀σ₂‖p‖ + ½ d[(ξ, v), K].

Thus, since the above reasoning does not depend on the particular (ξ, v) ∈ K(p, u) ∩ (K + (ρB × ℝ^m)) at hand, we obtain

K(p, u) ∩ (K + (ρB × ℝ^m)) ⊆ K + κ‖p‖B  ∀p ∈ δB, ∀u ∈ U ∩ ({ū} + τB),

where κ := 2κ₀σ₂. □

Lemma 3.16 For problem (P) let MFCQ at x* and SOSC for each (x*, u) ∈ K be satisfied. Then there are ρ₁ > 0, δ₁ > 0, and κ₁ ≥ 1 so that

∅ ≠ K(p, u) ∩ (K + (ρ₁B × ℝ^m)) ⊆ K + κ₁‖p‖B  ∀p ∈ δ₁B, ∀u ∈ U.  (24)
Proof. The proof is divided into two parts. In its first part we show by contradiction that, for any ρ > 0, a number δ(ρ) > 0 exists so that

∅ ≠ K(p, u) ∩ (K + (ρB × ℝ^m))  ∀p ∈ δ(ρ)B, ∀u ∈ U.  (25)

To this end assume the contrary. Therefore, if {δ_ν} ⊆ (0, 1) is a sequence with lim_{ν→∞} δ_ν = 0, then a sequence {(p^ν, u^ν)} ⊆ ℝ^{n+m} × ℝ^m must exist such that, for any ν ∈ ℕ,

p^ν ∈ δ_ν B,  u^ν ∈ U,  (26)

and

∅ = K(p^ν, u^ν) ∩ (K + (ρB × ℝ^m)).  (27)

Since U is compact (by MFCQ) it follows that, passing to a subsequence if necessary,

ū := lim_{ν→∞} u^ν ∈ U.  (28)

With the above sequence {p^ν} = {(p_x^ν, p_u^ν)} we define a sequence {p̂^ν} by

p̂^ν := (p_x^ν, p_u^ν + u^ν − ū)  ∀ν ∈ ℕ.

Taking into account that (QP(p̂^ν, ū)) and (QP(p^ν, u^ν)) are identical, we have

K(p̂^ν, ū) = K(p^ν, u^ν).  (29)

This together with (27) yields

∅ = K(p̂^ν, ū) ∩ (K + (ρB × ℝ^m)).  (30)

Due to Lemma 3.13 the quadratic program (QP(0, ū)) satisfies MFCQ at ξ = x* and SOSC for each (x*, v) ∈ K(0, ū). From lim_{ν→∞} δ_ν = 0, (26), and (28) we further know that {p̂^ν} converges to 0. Hence, we can apply Theorem 3.5 to (QP(p̂^ν, ū)) for ν ∈ ℕ sufficiently large and obtain a contradiction to (30). Thus, (25) holds true.

In the second part of this proof, let ρ₁ > 0 be arbitrarily given. Then, as just shown, δ(ρ₁) > 0 exists so that (25) is valid. To prove that δ₁ ∈ (0, δ(ρ₁)] and κ₁ ≥ 1 exist so that

K(p, u) ∩ (K + (ρ₁B × ℝ^m)) ⊆ K + κ₁‖p‖B  ∀p ∈ δ₁B, ∀u ∈ U,  (31)

let us assume the contrary. Therefore, if {(δ_ν, κ_ν)} ⊆ (0, δ(ρ₁)] × [1, ∞) is a sequence with lim_{ν→∞} δ_ν = 0 and lim_{ν→∞} κ_ν = ∞, then a sequence {(p^ν, u^ν)} ⊆ ℝ^{n+m} × ℝ^m must exist so that, for any ν ∈ ℕ,

p^ν ∈ δ_ν B,  u^ν ∈ U,  (32)

and

K(p^ν, u^ν) ∩ (K + (ρ₁B × ℝ^m)) ⊄ K + κ_ν‖p^ν‖B.  (33)

Thus, reasoning as above, we have ū := lim_{ν→∞} u^ν ∈ U. Identifying this limit ū with the vector ū occurring in Lemma 3.15, it follows that

‖u^ν − ū‖ ≤ τ

for ν ∈ ℕ sufficiently large and with ρ, δ, τ from Lemma 3.15. If ρ₁ ≤ ρ (with ρ from Lemma 3.15) then (33) contradicts the assertion of Lemma 3.15, and (31) is satisfied for some δ₁ ∈ (0, δ(ρ₁)] and κ₁ ∈ [1, ∞). Together with the first part of the proof the assertion of the lemma follows. If ρ₁ > ρ, the second part of the proof can be repeated for ρ₁ := ρ so that the same sequence {u^ν} (except a finite number of elements) and the same limit ū are considered. □
Proof of Theorem 3.7. Lemma 3.12 provides that

Z(z) = K(z − z̄, ū)  ∀(z, ū) ∈ ℝ^{n+m} × U.

Setting z̄ᴾ := (x*, uᴾ), where uᴾ denotes the orthogonal projection of u onto U, we get

Z(z) = K(z − z̄ᴾ, uᴾ)  ∀z ∈ ℝ^{n+m}.  (34)

Moreover, z ∈ K + δ₁B implies

z − z̄ᴾ ∈ 2δ₁B,  (35)

since ‖x − x*‖ ≤ δ₁ and ‖u − uᴾ‖ = d[u, U] ≤ δ₁. Hence, applying Lemma 3.16 to the particular case where p = z − z̄ᴾ and ū = uᴾ, we obtain from (34) and (35) that the theorem holds true. □
3.6 Technical Results II. In this subsection we will prove Theorem 3.10, which turned out to be one basic ingredient for showing local quadratic convergence of the Modified Wilson Method. To this end we will first provide some technical lemmas.
Lemma 3.17 For problem (P) let MFCQ and WCC at x* be satisfied. Then there is γ̄ > 0 so that

W = span{∇g_i(x*) | u_i ≥ γ̄}  ∀u ∈ U.  (36)

If problem (P) additionally satisfies CRC at x* then δ̄ > 0 and β > 0 exist so that, for each z = (x, u) ∈ K + δ̄B, an index set I(u) ⊆ I exists with

u_i ≥ γ̄/2 > 0  ∀i ∈ I(u),  (37)
rank ∇g_{I(u)}(x) = dim W = |I(u)|,  (38)
span{∇g_i(x) | i ∈ I(u)} = span{∇g_i(x) | i ∈ I₊},  (39)
‖(∇g_{I(u)}(x)ᵀ∇g_{I(u)}(x))⁻¹∇g_{I(u)}(x)ᵀ‖ ≤ β.  (40)

Proof. To prove (36) we assume the contrary. This implies that, for any sequence {γ_ν} ⊆ (0, 1] converging to zero, a sequence {u^ν} ⊆ U exists with

W ≠ W_ν := span{∇g_i(x*) | u_i^ν ≥ γ_ν}  ∀ν ∈ ℕ.  (41)

Note that, with regard to the definition of W, W_ν ⊆ W must hold for all ν ∈ ℕ. Since U is compact by MFCQ we can assume that, without loss of generality, the sequence {u^ν} converges to a certain ū ∈ U. Then WCC yields, for some γ̂ > 0,

W = span{∇g_i(x*) | ū_i ≥ γ̂}.  (42)

Thus, for ν ∈ ℕ sufficiently large, ū_i ≥ γ̂ implies u_i^ν ≥ γ̂/2 ≥ γ_ν. With (42) it follows that W = W_ν for all ν ∈ ℕ sufficiently large, which contradicts (41).

To prove the remaining assertions of the lemma note that (36) implies

W = span{∇g_i(x*) | u_i ≥ γ̄/2}  ∀u ∈ U + (γ̄/2)B.

Therefore, depending on u ∈ U + (γ̄/2)B, one can choose an index set I(u) ⊆ I so that

rank ∇g_{I(u)}(x*) = dim W = |I(u)|,  u_i ≥ γ̄/2  ∀i ∈ I(u).

Together with CRC at x* this yields

rank ∇g_{I(u)}(x*) = rank ∇g_{I₊}(x*) = rank ∇g_{I₊}(x)  ∀u ∈ U + (γ̄/2)B, ∀x ∈ {x*} + δ′B.

Since I(u) is contained in the finite index set I, continuity arguments show that (38) must be satisfied for z ∈ K + δ̄B with δ̄ ∈ (0, min{γ̄/2, δ′}] sufficiently small. Now, from (38) and I(u) ⊆ I₊, equation (39) easily follows. Finally, (38), the compactness of K + δ̄B, the finite number of different index sets I(u), and continuity arguments yield (40) for z ∈ K + δ̄B and β > 0 sufficiently large. □

Lemma 3.18 For problem (P) let MFCQ, WCC at x* and SOSC for each (x*, u) ∈ K be satisfied. Then there are δ₃ > 0 and C > 0 so that z ∈ K + δ₃B implies Z(z) ∩ (K + (κ₁δ₃B × ℝ^m)) ≠ ∅ and

|g_i(x) + ∇g_i(x)ᵀ(x̂ − x)| ≤ C d[z, K]²  ∀i ∈ I₊, ∀ẑ ∈ Z(z) ∩ (K + (κ₁δ₃B × ℝ^m))  (43)

with κ₁ ≥ 1 from Theorem 3.7.
Proof. For z ∈ K + δ₃B with 0 < δ₃ ≤ δ₁, Theorem 3.7 yields Z(z) ∩ (K + (κ₁δ₃B × ℝ^m)) ≠ ∅. Let us consider any ẑ = (x̂, û) contained in this set. From Theorem 3.7 we further get

‖x̂ − x*‖ ≤ κ₁ d[z, K],  ‖û − ûᴾ‖ ≤ κ₁ d[z, K],  (44)

where ûᴾ denotes the orthogonal projection of û onto U. Now assume that

δ₃ ≤ min{δ₁, γ̄/(2κ₁)}

with γ̄ > 0 from Lemma 3.17. Then

‖û − ûᴾ‖ ≤ κ₁ d[z, K] ≤ γ̄/2

follows. Thus, ûᴾ_i ≥ γ̄ implies û_i ≥ γ̄/2 for any i ∈ I. With Lemma 3.17 this yields

W = span{∇g_i(x*) | ûᴾ_i ≥ γ̄} = span{∇g_i(x*) | û_i ≥ γ̄/2}.

Therefore, an index set J ⊆ I(û) exists so that

W = span{∇g_i(x*) | i ∈ J},  rank ∇g_J(x*) = |J|.

For simplicity, and since there is only a finite number of different index sets J, we do not explicitly express that J depends on û. From Taylor's formula and from g_i(x*) = 0 for i ∈ I₀ it follows that

|∇g_i(x)ᵀ(x* − x) + g_i(x)| ≤ σ₃‖x* − x‖² ≤ σ₃ d[z, K]²  ∀i ∈ I₀, ∀z ∈ K + δ₃B  (45)

for some σ₃ > 0. On the other hand, as û_i > 0 for all i ∈ J, we know

∇g_J(x)ᵀ(x̂ − x) + g_J(x) = 0  ∀z ∈ K + δ₃B.

Taking into account that J ⊆ I(û) ⊆ I₀, this and (45) yield (enlarging σ₃ if necessary)

‖∇g_J(x)ᵀ(x* − x̂)‖ ≤ σ₃ d[z, K]²  ∀z ∈ K + δ₃B.  (46)

Since ∇g_i(x*) ∈ W for any i ∈ I₊ there are vectors λ^i (depending on J) so that

∇g_i(x*) = ∇g_J(x*)λ^i  ∀i ∈ I₊.

Using this, we get for ∇g_i(x) with i ∈ I₊

∇g_i(x) = ∇g_i(x*) + (∇g_i(x) − ∇g_i(x*)) = ∇g_J(x*)λ^i + (∇g_i(x) − ∇g_i(x*))
        = ∇g_J(x)λ^i + (∇g_J(x*) − ∇g_J(x))λ^i + (∇g_i(x) − ∇g_i(x*)).  (47)

The local Lipschitz continuity of ∇g and (44) yield, for some σ₄ > 0,

‖∇g(x*) − ∇g(x)‖ ≤ σ₄‖x* − x‖ ≤ σ₄ d[z, K]  ∀z ∈ K + δ₃B.  (48)
For i ∈ I₊ and z ∈ K + δ₃B, (44)-(48) lead to

|∇g_i(x)ᵀ(x̂ − x) + g_i(x)| ≤ |∇g_i(x)ᵀ(x* − x) + g_i(x)| + |∇g_i(x)ᵀ(x̂ − x*)|
  ≤ (σ₃ + σ₄κ₁) d[z, K]² + |(λ^i)ᵀ∇g_J(x)ᵀ(x̂ − x*)| + |(λ^i)ᵀ(∇g_J(x*) − ∇g_J(x))ᵀ(x̂ − x*)|
  ≤ (σ₃ + σ₄κ₁ + σ₃‖λ^i‖ + σ₄κ₁‖λ^i‖) d[z, K]².

Taking into account that only finitely many vectors λ^i can occur with regard to the finite number of different index sets J, there is C > 0 so that (43) holds for z ∈ K + δ₃B. □
Proof of Theorem 3.10. The differentiability assumptions on f and g and the boundedness of U ensure that, for some σ₅ > 0,

max{‖∇_{xx}L(x, u)‖, ‖∇g(x)‖, ‖u‖} ≤ σ₅,  ‖∇f(x) − ∇f(x*)‖ ≤ σ₅‖x − x*‖  ∀z ∈ K + δ₃B

with δ₃ > 0 from Lemma 3.18. For

σ₆ := (4σ₅ + σ₅σ₄ + 4σ₅κ₁)β

with β > 0 from Lemma 3.17, σ₄ > 0 from the proof of Lemma 3.18, and κ₁ ≥ 1 from Theorem 3.7, we assume that

0 < δ₂ ≤ min{δ̄, δ₃, γ̄/(4σ₆)}.

Let us consider an arbitrary (z, ẑ) with z ∈ K + δ₂B and ẑ ∈ Z(z) ∩ (K + (κ₁δ₂B × ℝ^m)). Due to Lemma 3.18 the latter intersection is not empty for z ∈ K + δ₂B. Taking κ₁ ≥ 1 and (44) into account we get

‖x̂ − x‖ ≤ ‖x̂ − x*‖ + ‖x* − x‖ ≤ (κ₁ + 1) d[z, K] ≤ 2κ₁ d[z, K],  (49)
‖(û − u)_{I\I₊}‖ ≤ ‖(û − ûᴾ)_{I\I₊}‖ + ‖(ûᴾ − u)_{I\I₊}‖ ≤ (κ₁ + 1) d[z, K] ≤ 2κ₁ d[z, K],  (50)

where ûᴾ ∈ U denotes the orthogonal projection of û onto U and, thus,

ûᴾ_{I\I₊} = u_{I\I₊} = 0  ∀u ∈ U  (51)

holds. With u⁺ and uᴾ as the orthogonal projections of u onto the nonnegative orthant of ℝ^m and onto U, respectively, let us define the vector Λ(z, ẑ) by

Λ(z, ẑ) := ∇f(x) − ∇f(x*) + ∇_{xx}L(x, u)(x̂ − x) + Σ_{i∈I\I₊} û_i∇g_i(x) + Σ_{i∈I₊} [(u⁺_i − uᴾ_i)∇g_i(x*) + u⁺_i(∇g_i(x) − ∇g_i(x*))].

Since uᴾ ∈ U we obtain ∇f(x*) + ∇g(x*)uᴾ = 0 and

Λ(z, ẑ) = ∇f(x) + ∇_{xx}L(x, u)(x̂ − x) + Σ_{i∈I\I₊} û_i∇g_i(x) + Σ_{i∈I₊} u⁺_i∇g_i(x).  (52)

The KKT conditions for (Q(z)) provide

∇f(x) + ∇_{xx}L(x, u)(x̂ − x) + ∇g(x)û = 0  (53)
and we therefore have

∇f(x) + ∇_{xx}L(x, u)(x̂ − x) + Σ_{i∈I\I₊} û_i∇g_i(x) ∈ span{∇g_i(x) | i ∈ I₊}.

Because of δ₂ ≤ δ̄, Lemma 3.17 can be applied. Thus, (53) and (39) imply

Λ(z, ẑ) ∈ span{∇g_i(x) | i ∈ I₊} = span{∇g_i(x) | i ∈ I(u)}.  (54)

Lemma 3.17 also ensures that ∇g_{I(u)}(x)ᵀ∇g_{I(u)}(x) is nonsingular. Thus,

θ(z, ẑ) := (∇g_{I(u)}(x)ᵀ∇g_{I(u)}(x))⁻¹∇g_{I(u)}(x)ᵀΛ(z, ẑ)

is well defined. From (54), we get

∇g_{I(u)}(x)θ(z, ẑ) = ∇g_{I(u)}(x)(∇g_{I(u)}(x)ᵀ∇g_{I(u)}(x))⁻¹∇g_{I(u)}(x)ᵀΛ(z, ẑ) = Λ(z, ẑ).  (55)

From (51), we have uᴾ_{I\I₊} = 0. Applying this, the definitions of σ₄ and σ₅, (48), (49), and (50) to (52) we obtain

‖Λ(z, ẑ)‖ ≤ σ₅(‖x − x*‖ + ‖x̂ − x‖ + ‖û_{I\I₊}‖ + ‖(u⁺ − uᴾ)_{I₊}‖) + σ₄σ₅‖x − x*‖
  ≤ σ₅(1 + 2κ₁ + (2κ₁ + 1) + 2) d[z, K] + σ₄σ₅ d[z, K]
  = (4σ₅ + σ₅σ₄ + 4σ₅κ₁) d[z, K],

where ‖û_{I\I₊}‖ ≤ ‖(û − u)_{I\I₊}‖ + ‖u_{I\I₊}‖ ≤ (2κ₁ + 1) d[z, K] by (50) and ‖(u⁺ − uᴾ)_{I₊}‖ ≤ ‖u⁺ − u‖ + ‖u − uᴾ‖ ≤ 2 d[z, K]. Thus, from the definitions of θ(z, ẑ) and σ₆, and from (40), it follows that

‖θ(z, ẑ)‖ ≤ ‖(∇g_{I(u)}(x)ᵀ∇g_{I(u)}(x))⁻¹∇g_{I(u)}(x)ᵀ‖ ‖Λ(z, ẑ)‖ ≤ σ₆ d[z, K].  (56)

Using (37), we get from (56) and from σ₆ d[z, K] ≤ σ₆δ₂ ≤ γ̄/4,

u⁺_i − θ_i(z, ẑ) ≥ γ̄/2 − γ̄/4 = γ̄/4  ∀i ∈ I(u).  (57)

Hence, defining u(z, ẑ) by

u_i(z, ẑ) := û_i if i ∈ I \ I₊,  u_i(z, ẑ) := u⁺_i − θ_i(z, ẑ) if i ∈ I(u),  u_i(z, ẑ) := u⁺_i if i ∈ I₊ \ I(u),

we obtain

u(z, ẑ) ≥ 0  (58)

and

‖u(z, ẑ) − u‖ ≤ (2κ₁ + σ₆ + 1) d[z, K],  (59)
where (50) and (56) have been taken into account. Additionally, with regard to (52), (55), and the definition of u(z, ẑ), it follows that

∇f(x) + ∇_{xx}L(x, u)(x̂ − x) + ∇g(x)u(z, ẑ)
  = Λ(z, ẑ) − Σ_{i∈I\I₊} û_i∇g_i(x) − Σ_{i∈I₊} u⁺_i∇g_i(x) + ∇g(x)u(z, ẑ)
  = Λ(z, ẑ) + Σ_{i∈I(u)} (u_i(z, ẑ) − u⁺_i)∇g_i(x)
  = Λ(z, ẑ) − Σ_{i∈I(u)} θ_i(z, ẑ)∇g_i(x)
  = 0.  (60)

Let the vector q(z, ẑ) := (q_f(z, ẑ), q_g(z, ẑ)) ∈ ℝ^{n+m} be defined by

q_f(z, ẑ) := 0,  q_g(z, ẑ)_i := −(g_i(x) + ∇g_i(x)ᵀ(x̂ − x)) if i ∈ I₊,  q_g(z, ẑ)_i := 0 if i ∈ I \ I₊.

From Lemma 3.18, we get

‖q(z, ẑ)‖ ≤ C d[z, K]².  (61)

Moreover, it follows that

g_i(x) + ∇g_i(x)ᵀ(x̂ − x) + q_g(z, ẑ)_i = 0  ∀i ∈ I₊,
g_i(x) + ∇g_i(x)ᵀ(x̂ − x) + q_g(z, ẑ)_i ≤ 0  ∀i ∈ I \ I₊;  (62)

for the latter recall that ẑ ∈ Z(z). Together with the definition of u(z, ẑ) and the complementarity conditions of (Q(z)) we therefore obtain

(g_i(x) + ∇g_i(x)ᵀ(x̂ − x) + q_g(z, ẑ)_i) u_i(z, ẑ) = 0  ∀i ∈ I.  (63)

Relations (58), (60), (62), and (63) show that (x̂, u(z, ẑ)) satisfies the KKT conditions associated to the perturbed quadratic program (Q(z, q(z, ẑ))). Thus, we have

(x̂, u(z, ẑ)) ∈ Z(z, q(z, ẑ)).

Now, let α₁ := 2κ₁ + σ₆ + 1. Then, with regard to (49), (59), and (61), the proof is complete. □
3.7 Related Work. Until now, almost all results concerning local superlinear convergence properties of algorithms for computing a local minimizer of (P) require that the corresponding set of multiplier vectors is a singleton; see Facchinei and Lucidi [4] for a good review. Here we refer the reader to two recent results of this type. If the subproblems are quadratic programs with linearized constraints, the weakest known conditions for local superlinear convergence were probably given by Bonnans [2]; they require the uniqueness of the multiplier vector together with SOSC. Without going into detail we mention that this result can be reobtained from Corollary 2.4 by using Corollary 3.6 and Theorem 3.7. If, instead, algorithms are considered whose subproblems are systems of linear equations, Facchinei, Fischer, and Kanzow [3] have shown that the strong regularity [18] of the KKT
point suffices to obtain superlinear convergence of a generalized Newton method applied to a certain semismooth reformulation of the KKT conditions.

Now we turn our attention to the rare results that do not need the uniqueness of the multiplier vector. First consider the nonlinear program (P) with only linear constraints. Then the sequence {x^k} generated by the Wilson Method is known to converge q-quadratically to x* if a second-order sufficiency condition (SOSC for all (x*, u) ∈ K) holds; see Bonnans [1], Pang [11], and Vetters [21]. The latter paper, though it assumes f to be strictly convex, is more general in the sense that it deals with nonlinear programs (P) having a convex feasible set. Each step of the algorithm in [21] requires the solution of a Levitin-Polyak subproblem [9], i.e., a quadratic approximation of f subject to the original feasible set has to be minimized. Recall that for linearly constrained programs (P) the Levitin-Polyak subproblems coincide with those of the Wilson Method. If, however, the constraint functions are nonlinear, then the Levitin-Polyak subproblems become quadratic programs with nonlinear constraints. One may think that the theory presented in Section 2 is not able to cover the afore-mentioned results for linearly constrained programs (P) because MFCQ and WCC (as required in Section 3) need not be satisfied for such problems. However, without giving details, we claim that neither MFCQ nor WCC is necessary in order to apply the results of Section 2 to linearly constrained programs.

The second approach we are aware of that does not imply the uniqueness of the multiplier vector is the recent paper by Ralph and Wright [15]. They consider an interior-point method for monotone variational inequalities over convex sets. To specify this for the case of nonlinear programs (P), recall that the solution sets of (P) and of the variational inequality

∇f(x)ᵀ(y − x) ≥ 0  ∀y with g(y) ≤ 0

coincide if f and g₁, ..., g_m are convex functions and a constraint qualification (e.g., MFCQ) is satisfied (see [6]). Besides further convexity assumptions on subspaces, in particular a second-order sufficiency condition, the superlinear convergence proof in [15] relies on the existence of a strictly complementary solution (I₀ = I₊) and on the Constant Rank Constraint Qualification in the sense of Janin [7]. As noted in Subsection 3.2, the latter condition is stronger than the CRC we use. Whereas, on the one hand, we do not need a strictly complementary solution for proving local quadratic convergence of the Modified Wilson Method, we assume on the other hand that WCC is satisfied.

While this paper was being completed, an interesting report by Qi and Wei [14] was released. It deals with the particular Sequential Quadratic Programming (SQP) algorithm of Panier and Tits [10], whose iterates are feasible with respect to the constraints of (P). In a local neighborhood of x*, Qi and Wei propose a modification of this SQP algorithm: in each step a vector v^k that belongs to the multiplier set of a quadratic subproblem has to be provided so that the gradients ∇g_i(x^k) for all i ∈ I(v^k) are linearly independent. Using this technique and MFCQ, the Constant Rank Constraint Qualification in the sense of Janin, and a Strong Second-Order Sufficiency Condition, the two-step q-superlinear convergence of the iterates {x^k} to x* can be shown. The second-order condition is slightly stronger than SOSC for each (x*, u) ∈ K as used in Theorem 3.11.
Acknowledgement. I would like to thank Professors Diethard Klatte, Bernd Kummer, and Daniel Ralph for helpful discussions. Moreover, I am very grateful to the anonymous referees for their valuable remarks.
References

[1] Bonnans, J. F. (1989). Local study of Newton-type algorithms for constrained problems. In Lecture Notes in Mathematics 1405 (S. Dolecki, Ed.), Springer, Berlin, pp. 13-24.
[2] Bonnans, J. F. (1994). Local analysis of Newton-type methods for variational inequalities and nonlinear programming. Appl. Math. Optim. 29 161-186.
[3] Facchinei, F., Fischer, A., and Kanzow, C. (1998). Regularity properties of a new equation reformulation of variational inequalities. SIAM J. Optim., to appear.
[4] Facchinei, F. and Lucidi, S. (1995). Quadratically and superlinearly convergent algorithms for the solution of inequality constrained minimization problems. J. Optim. Theory Appl. 85 265-289.
[5] Fischer, A. (1992). A special Newton-type optimization method. Optimization 24 269-284.
[6] Harker, P. T. and Pang, J.-S. (1990). Finite-dimensional variational inequality and nonlinear complementarity problems: A survey of theory, algorithms and applications. Math. Programming 48 161-220.
[7] Janin, R. (1984). Directional derivative of the marginal function in nonlinear programming. Math. Programming Stud. 21 110-126.
[8] Josephy, N. H. (1979). Newton's method for generalized equations. Technical Summary Report No. 1965, Mathematics Research Center, University of Wisconsin-Madison.
[9] Levitin, E. S. and Polyak, B. T. (1966). Constrained minimization methods. Zh. Vychisl. Mat. i Mat. Fiz. 6 787-823. Translated in: Comput. Math. Math. Phys. 6 1-50.
[10] Panier, E. R. and Tits, A. L. (1993). On combining feasibility, descent and superlinear convergence in inequality constrained optimization. Math. Programming 59 261-276.
[11] Pang, J.-S. (1993). Convergence of splitting and Newton methods for complementarity problems: An application of some sensitivity results. Math. Programming 58 149-160.
[12] Pshenitchny, B. N. (1970). Newton's method for the solution of a system of equations and inequalities. Mat. Zametki 8 635-640.
[13] Pshenitchny, B. N. and Danilin, Ju. M. (1982). Numerische Methoden für Extremalaufgaben. Deutscher Verlag der Wissenschaften, Berlin.
[14] Qi, L. and Wei, Z. (1997). Constant positive linear independence, KKT points and convergence of feasible SQP methods. Technical Report, School of Mathematics, The University of New South Wales, Sydney.
[15] Ralph, D. and Wright, S. (1996). Superlinear convergence of an interior-point method for monotone variational inequalities. Preprint MCS-P556-0196, Mathematics and Computer Science Division, Argonne National Laboratory.
[16] Robinson, S. M. (1972). Extension of Newton's method to nonlinear functions with values in a cone. Numer. Math. 19 341-347.
[17] Robinson, S. M. (1974). Perturbed Kuhn-Tucker points and rates of convergence for a class of nonlinear programming algorithms. Math. Programming 7 1-16.
[18] Robinson, S. M. (1980). Strongly regular generalized equations. Math. Oper. Res. 5 43-62.
[19] Robinson, S. M. (1981). Some continuity properties of polyhedral multifunctions. Math. Programming Stud. 14 206-214.
[20] Robinson, S. M. (1982). Generalized equations and their solutions, Part II: Applications to nonlinear programming. Math. Programming Stud. 19 200-221.
[21] Vetters, K. (1970). Quadratische Approximationsmethoden zur konvexen Optimierung. Z. Angew. Math. Mech. 50 479-484.
[22] Wilson, R. B. (1963). A simplicial algorithm for concave programming. PhD thesis, Graduate School of Business Administration, Harvard University, Cambridge.

A. Fischer: Department of Mathematics, University of Dortmund, 44221 Dortmund, Germany, e-mail:
[email protected]