A nonsmooth inexact Newton method for the solution of large-scale nonlinear complementarity problems

Francisco Facchinei^{a,1} and Christian Kanzow^{b}

^a Università di Roma "La Sapienza", Dipartimento di Informatica e Sistemistica, Via Buonarroti 12, 00185 Roma, Italy. E-mail: [email protected]

^b University of Hamburg, Institute of Applied Mathematics, Bundesstrasse 55, D-20146 Hamburg, Germany. E-mail: [email protected]

^1 The work of this author was partially supported by Agenzia Spaziale Italiana, Roma, Italy.

Preprint submitted to Elsevier Science, 26 May 1995
Abstract

A new algorithm for the solution of large-scale nonlinear complementarity problems is introduced. The algorithm is based on a nonsmooth equation reformulation of the complementarity problem and on an inexact Levenberg-Marquardt-type algorithm for its solution. Under mild assumptions, and requiring only the approximate solution of a linear system at each iteration, the algorithm is shown to be both globally and superlinearly convergent, even on degenerate problems. Numerical results for problems with up to 10000 variables are presented.

1 Introduction

We consider the complementarity problem NCP(F), which is to find a vector in $\mathbb{R}^n$ satisfying the conditions

$$x \ge 0, \qquad F(x) \ge 0, \qquad x^T F(x) = 0,$$

where $F : \mathbb{R}^n \to \mathbb{R}^n$ is a continuously differentiable function. Nonlinear complementarity problems have important applications, see, e.g., [11,19], which often call for the solution of large-scale problems. During the last few years many methods have been developed for the solution of the nonlinear complementarity problem NCP(F), see [17,25] and references therein, but there have been only a few proposals for the solution of large-scale problems, see,


e.g., [24,14,16,7,2]. Some of these proposals [24,14] require, at each iteration, the approximate solution of a linear complementarity problem, which may be cumbersome from a computational point of view. Results for large-scale problems have also been obtained by using direct methods for the solution of the subproblems arising at each iteration [2,7]. But the computational costs associated with the exact solution of the subproblems preclude the use of these methods for really large problems. Finally, we point out that it would be easy to think of inexact extensions of methods which reduce the nonlinear complementarity problem to a smooth system of equations. However, using these methods, fast convergence is precluded in the case of degenerate solutions, and it is well known that degenerate solutions occur, especially in the large-scale case.

In this paper we propose a new method for the solution of NCP(F) which is based on a semismooth equation reformulation of the nonlinear complementarity problem. This reformulation hinges on the use of the convex function $\varphi : \mathbb{R}^2 \to \mathbb{R}$ defined by

$$\varphi(a, b) := \sqrt{a^2 + b^2} - a - b. \qquad (1)$$

The main property of this function is that

$$\varphi(a, b) = 0 \iff a \ge 0,\ b \ge 0,\ ab = 0,$$

so that NCP(F) is obviously equivalent to the nonlinear system $\Phi(x) = 0$, where $\Phi : \mathbb{R}^n \to \mathbb{R}^n$ is defined by

$$\Phi(x) := \begin{pmatrix} \varphi(x_1, F_1(x)) \\ \vdots \\ \varphi(x_n, F_n(x)) \end{pmatrix}. \qquad (2)$$
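As a concrete illustration (ours, not part of the original paper), the following minimal Python sketch evaluates (1) and (2) numerically; the names `fischer` and `Phi` and the toy map are assumptions made for the example.

```python
import numpy as np

def fischer(a, b):
    # Fischer-Burmeister function (1): phi(a, b) = sqrt(a^2 + b^2) - a - b
    return np.sqrt(a * a + b * b) - a - b

def Phi(x, F):
    # Componentwise reformulation (2): Phi_i(x) = phi(x_i, F_i(x))
    return fischer(x, F(x))

# phi(a, b) = 0 exactly when a >= 0, b >= 0 and a*b = 0:
print(fischer(0.0, 2.0), fischer(3.0, 0.0))  # both print 0.0
print(fischer(-1.0, 2.0))                    # nonzero at an infeasible pair

# A tiny map F(x) = x - 1 has the NCP solution x* = (1, ..., 1):
x_star = np.ones(3)
print(Phi(x_star, lambda x: x - 1.0))        # all zeros
```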

The function (1) was introduced by Fischer [12] in 1992. Since then it has attracted the interest of several researchers in the fields of complementarity, constrained optimization and variational inequality problems; see [13] for a survey of its applications. We propose an inexact Levenberg-Marquardt-type method for the solution of the semismooth system $\Phi(x) = 0$. This method requires, at each iteration, the approximate solution of a symmetric positive semidefinite and solvable linear system. We show that, under very mild assumptions, the algorithm is locally superlinearly and even quadratically convergent to a solution of the nonlinear complementarity problem. In order to globalize the local method we perform an Armijo-type line search to minimize the natural merit function

$$\Psi(x) := \frac{1}{2} \Phi(x)^T \Phi(x),$$

which, in spite of the nonsmoothness of $\Phi$, turns out to be continuously differentiable. The new algorithm is based, on the one hand, on the recently developed theory for the solution of semismooth systems of equations by Newton-type methods, see [30,29,22], and, on the other hand, on the properties of the functions $\Phi$ and $\Psi$ that have also been the subject of recent research [10,5]. From the theoretical point of view, the new method compares favourably with existing methods. Numerical tests on problems with up to 10000 variables are reported to show also its practical viability. As far as we are aware, the algorithm we propose here is the first algorithm for nonlinear complementarity problems that requires, at each iteration, only the inexact solution of a linear system (which is an "easy" task) and can nevertheless guarantee fast convergence even on degenerate problems.

The paper is organized as follows: In Section 2 we recall some recent results on semismooth functions and prove some related new results that are needed to analyze the behavior of the algorithm. Some important properties of the operators $\Phi$ and $\Psi$ are summarized in Section 3. Section 4 contains a detailed description of the new method along with the corresponding global and local convergence theory. Some implementation issues of the algorithm are discussed in Section 5, while the numerical tests are reported in Section 6.

A few words about notation. Given a function $G : \mathbb{R}^n \to \mathbb{R}^n$, we denote by $G_i$ its $i$th component function. If $G$ is continuously differentiable, $G'(x)$ is the Jacobian matrix of $G$ at a point $x \in \mathbb{R}^n$. If $G$ is directionally differentiable, we denote the directional derivative of $G$ at $x \in \mathbb{R}^n$ in the direction $d \in \mathbb{R}^n$ by $G'(x; d)$. The symbol $\|\cdot\|$ indicates the Euclidean vector norm or its associated matrix norm. The index set $\{1, \ldots, n\}$ is indicated by $I$. If $M \in \mathbb{R}^{n \times n}$ is a matrix with elements $m_{ij}$, $i, j \in I$, and if $K$ and $J$ are subsets of $I$, then $M_{KJ}$ denotes the $|K| \times |J|$ submatrix of $M$ consisting of the elements $m_{ij}$, $i \in K$, $j \in J$.

2 Differentiability of Functions and Generalized Inexact Newton Methods

In this section we recall recent results on the differentiability of functions and the solution of semismooth systems of equations. Furthermore, we prove some new results about the convergence rates of general algorithms for the solution of semismooth systems of equations and apply these results to the local analysis of a Levenberg-Marquardt-type method.

Let $G : \mathbb{R}^n \to \mathbb{R}^n$ be locally Lipschitzian; by Rademacher's theorem $G$ is differentiable almost everywhere. If we indicate by $D_G$ the set where $G$ is differentiable, we can define the B-subdifferential of $G$ at $x$ [29] as

$$\partial_B G(x) := \left\{ \lim_{\substack{x^k \to x \\ x^k \in D_G}} G'(x^k) \right\},$$

and the generalized Jacobian of $G$ at $x$ [3] as

$$\partial G(x) := \operatorname{co}\, \partial_B G(x),$$

where co denotes the convex hull of a set. Semismooth functions were introduced in [23] for functionals, and further studied in [30].
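As a simple one-dimensional illustration (ours, not part of the original development), consider $G(x) = |x|$: here $D_G = \mathbb{R} \setminus \{0\}$ with $G'(x) = \operatorname{sign}(x)$, so that

$$\partial_B G(0) = \{-1, +1\}, \qquad \partial G(0) = \operatorname{co}\{-1, +1\} = [-1, 1].$$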

Definition 1 Let $G : \mathbb{R}^n \to \mathbb{R}^n$ be locally Lipschitzian at $x \in \mathbb{R}^n$. We say that $G$ is semismooth at $x$ if

$$\lim_{\substack{H \in \partial G(x + t v') \\ v' \to v,\ t \downarrow 0}} H v' \qquad (3)$$

exists for every $v \in \mathbb{R}^n$. A semismooth function $G : \mathbb{R}^n \to \mathbb{R}^n$ is said to be strongly semismooth at $x \in \mathbb{R}^n$ if for any $H \in \partial G(x + d)$, and for any $d \to 0$,

$$Hd - G'(x; d) = O(\|d\|^2). \qquad (4)$$

Semismooth functions lie between Lipschitz functions and $C^1$ functions. It is known that if $G$ is semismooth at $x$ then it is also directionally differentiable there (in particular, formula (4) makes sense for this reason), and its directional derivative in the direction $v$ is given by (3).

In the study of algorithms for the local solution of semismooth systems of equations, the following regularity condition plays a role similar to that of the nonsingularity of the Jacobian in the study of algorithms for smooth systems of equations.

Definition 2 We say that a semismooth function $G : \mathbb{R}^n \to \mathbb{R}^n$ is BD-regular at $x$ if all elements in $\partial_B G(x)$ are nonsingular.

The BD-regularity condition has some noteworthy consequences; for example, it implies the local uniqueness of a solution $x^*$, see [29, Proposition 2.5]. In this paper we shall employ several times the following result [29, Lemma 2.6].

Proposition 3 Suppose that $x^*$ is a BD-regular solution of $G(x) = 0$, where $G : \mathbb{R}^n \to \mathbb{R}^n$ is semismooth. Then there exist a neighborhood $\Omega$ of $x^*$ and a constant $c$ such that

$$\|H^{-1}\| \le c \qquad \forall y \in \Omega,\ \forall H \in \partial_B G(y).$$

The following result plays a crucial role in the proof of our characterization theorem for Q-quadratic convergence.

Proposition 4 Assume that $G : \mathbb{R}^n \to \mathbb{R}^n$ is locally Lipschitzian and strongly semismooth at $x \in \mathbb{R}^n$, and directionally differentiable in a neighborhood of $x$. Then

$$\limsup_{h \to 0,\ H \in \partial G(x+h)} \frac{\|G(x + h) - G(x) - Hh\|}{\|h\|^2} < \infty. \qquad (5)$$

Proof. Since $G$ is strongly semismooth at $x$, it is B-differentiable of degree two at $x$ [29, Lemma 2.3], i.e.

$$\limsup_{h \to 0} \frac{\|G(x + h) - G(x) - G'(x; h)\|}{\|h\|^2} < \infty. \qquad (6)$$

From (6) and the definition of strong semismoothness we directly obtain (5). □

The following result characterizes the local Q-superlinear and Q-quadratic convergence rate of sequences converging to a solution of $G(x) = 0$.

Theorem 5 Let $G : \mathbb{R}^n \to \mathbb{R}^n$ be a locally Lipschitz-continuous function in the open convex set $D \subseteq \mathbb{R}^n$. Let $\{x^k\} \subseteq D$ be any sequence converging to $x^* \in D$ with $x^k \ne x^*$ for all $k$. Then:

(a) If $G$ is semismooth and BD-regular at $x^*$, then $\{x^k\}$ converges Q-superlinearly to $x^*$ and $G(x^*) = 0$ if and only if

$$\lim_{k\to\infty} \frac{\|G(x^k) + H^k d^k\|}{\|d^k\|} = 0, \qquad (7)$$

where $H^k \in \partial_B G(x^k)$ and $d^k := x^{k+1} - x^k$.

(b) If $G$ is strongly semismooth and BD-regular at $x^*$ and directionally differentiable in a neighborhood of $x^*$, then $\{x^k\}$ converges Q-quadratically to $x^*$ and $G(x^*) = 0$ if and only if

$$\limsup_{k\to\infty} \frac{\|G(x^k) + H^k d^k\|}{\|d^k\|^2} < \infty, \qquad (8)$$

where again $H^k \in \partial_B G(x^k)$ and $d^k := x^{k+1} - x^k$.

Proof. Part (a) is just a restatement of Theorem 2 by Pang and Qi [27]. In order to prove part (b), first assume that (8) holds. Furthermore suppose, without loss of generality, that $x^{k+1} \ne x^k$. Let $e^k := x^k - x^*$ denote the error vector at the $k$th iterate. Note that (8) implies, since $\{d^k\} \to 0$, that

$$\frac{\|G(x^k) + H^k d^k\|}{\|d^k\|} \to 0.$$

Because of part (a), this implies $G(x^*) = 0$ and $\{x^k\} \to x^*$ Q-superlinearly. In order to show that the rate of convergence is actually Q-quadratic, consider the following identity:

$$0 = G(x^*) = [G(x^k) + H^k d^k] - [G(x^k) - G(x^*) - H^k e^k] - H^k e^{k+1}. \qquad (9)$$

In view of the semismoothness of $G$ and the BD-regularity of $x^*$, the matrices $H^k$ are nonsingular for all $k$ sufficiently large, and there exists a constant $c > 0$ such that $\|(H^k)^{-1}\| \le c$ for all these indices, see Proposition 3. Dividing (9) by $\|e^k\|^2$ therefore gives

$$\frac{\|e^{k+1}\|}{\|e^k\|^2} \le c \left( \frac{\|G(x^k) + H^k d^k\|}{\|d^k\|^2} \cdot \frac{\|d^k\|^2}{\|e^k\|^2} + \frac{\|G(x^k) - G(x^*) - H^k e^k\|}{\|e^k\|^2} \right). \qquad (10)$$

Since we already know that $x^k \to x^*$ Q-superlinearly, we have

$$\lim_{k\to\infty} \frac{\|d^k\|}{\|e^k\|} = 1 \qquad (11)$$

by Lemma 8.2.3 in [6]. Using (11), condition (8) and Proposition 4, we immediately obtain

$$\limsup_{k\to\infty} \frac{\|e^{k+1}\|}{\|e^k\|^2} < \infty,$$

i.e., $\{x^k\}$ converges Q-quadratically to $x^*$. Based on the identity (9) and the upper semicontinuity of the generalized Jacobian, the converse direction can be shown in a very similar way; we omit the details here. □

We are now in the position to define and analyze a Levenberg-Marquardt-type method for the solution of a system of nonsmooth equations. Given a starting vector $x^0 \in \mathbb{R}^n$, let

$$x^{k+1} = x^k + d^k, \qquad (12)$$

where $d^k$ is the solution of the linear system

$$((H^k)^T H^k + \mu_k I) d = -(H^k)^T G(x^k), \qquad H^k \in \partial_B G(x^k),\quad \mu_k \ge 0. \qquad (13)$$

This scheme is obviously formally identical to the classical Levenberg-Marquardt scheme for the solution of smooth systems of equations. We can also consider inexact versions of this scheme, which are more suited to the large-scale case. In this case the vector $d^k$ is given by the solution of the system

$$((H^k)^T H^k + \mu_k I) d = -(H^k)^T G(x^k) + r^k, \qquad H^k \in \partial_B G(x^k),\quad \mu_k \ge 0, \qquad (14)$$

where $r^k$ is the vector of residuals and measures how inexactly system (13) is solved. We can state the following result.

Theorem 6 Let $G : \mathbb{R}^n \to \mathbb{R}^n$ be given, and let $x^*$ be a solution of $G(x) = 0$. Suppose that $\{x^k\}$ is a sequence generated by (12) and (14) such that $\{x^k\}$ converges to $x^*$.

(a) Under the assumptions of Theorem 5 (a), if $\{\mu_k\} \to 0$ and $\|r^k\| = o(\|(H^k)^T G(x^k)\|)$, then $\{x^k\} \to x^*$ Q-superlinearly.
(b) Under the assumptions of Theorem 5 (b), if $\mu_k = O(\|(H^k)^T G(x^k)\|)$ and $\|r^k\| = O(\|(H^k)^T G(x^k)\|^2)$, then $\{x^k\} \to x^*$ Q-quadratically.

Proof. (a) In order to prove the Q-superlinear convergence of the sequence $\{x^k\}$ to $x^*$, we only need to verify that

$$\lim_{k\to\infty} \frac{\|G(x^k) + H^k d^k\|}{\|d^k\|} = 0. \qquad (15)$$

The assertion then follows from Theorem 5 (a). We first note that, by the upper semicontinuity of the generalized Jacobian and by Proposition 3, there exists a positive constant $c_1$ such that

$$\|(H^k)^T\| \le c_1, \qquad \|(H^k)^{-T}\| \le c_1. \qquad (16)$$

By (14) we have

$$G(x^k) + H^k d^k = (H^k)^{-T} (r^k - \mu_k d^k). \qquad (17)$$

Hence, by (15), (17) and (16), to prove superlinear convergence we only need to show that

$$\lim_{k\to\infty} \frac{\|r^k - \mu_k d^k\|}{\|d^k\|} = 0. \qquad (18)$$

In turn, since $\{\mu_k\} \to 0$, (18) is satisfied if

$$\lim_{k\to\infty} \frac{\|r^k\|}{\|d^k\|} = 0. \qquad (19)$$

But from the definition of $d^k$ we have

$$d^k = \left( (H^k)^T H^k + \mu_k I \right)^{-1} \left( -(H^k)^T G(x^k) + r^k \right),$$

so that, by (16) and the assumption $\|r^k\| = o(\|(H^k)^T G(x^k)\|)$, the quantities $\|d^k\|$ and $\|(H^k)^T G(x^k)\|$ are of the same order, from which (19) follows.

(b) In order to prove the Q-quadratic convergence of the sequence $\{x^k\}$ to $x^*$, we only need to verify that

$$\limsup_{k\to\infty} \frac{\|G(x^k) + H^k d^k\|}{\|d^k\|^2} < \infty. \qquad (20)$$

The assertion then follows from Theorem 5 (b). We can reason as in the previous point to show that (20) is equivalent to

$$\limsup_{k\to\infty} \frac{\|r^k - \mu_k d^k\|}{\|d^k\|^2} < \infty. \qquad (21)$$

Again reasoning as in the previous point, we also have that $\|d^k\|$ and $\|(H^k)^T G(x^k)\|$ are of the same order. So, taking into account that by assumption $\mu_k = O(\|(H^k)^T G(x^k)\|)$ and $\|r^k\| = O(\|(H^k)^T G(x^k)\|^2)$, (21) readily follows. □

We stress that the condition $\|r^k\| = o(\|(H^k)^T G(x^k)\|)$ in Theorem 6 (a) can be replaced by the condition $\|r^k\| = o(\|G(x^k)\|)$ without destroying the Q-superlinear convergence of the sequence $\{x^k\}$. This follows from the boundedness of the sequence $\{x^k\}$ and the boundedness of the generalized Jacobian on bounded sets. Similarly, the sequence $\{x^k\}$ remains Q-quadratically convergent if we assume $\mu_k = O(\|G(x^k)\|)$ and $\|r^k\| = O(\|G(x^k)\|^2)$ in Theorem 6 (b).

Remark 7 We note that it is easy to check, following the proofs of Theorems 5 and 6, that actually the following statement holds. If $\{x^k\}$ is any sequence (no matter how generated) converging to $x^*$, then, under the assumptions of Theorem 6 (a), if $d^k$ is generated according to (14), it holds that

$$\lim_{k\to\infty} \frac{\|x^k + d^k - x^*\|}{\|x^k - x^*\|} = 0.$$

In the case $x^{k+1} = x^k + d^k$ this assertion coincides with Theorem 6 (a).

We finally give the definition of an $SC^1$ function.

Definition 8 A function $f : \mathbb{R}^n \to \mathbb{R}$ is an $SC^1$ function if $f$ is continuously differentiable and its gradient is semismooth.

$SC^1$ functions can be viewed as functions which lie between $C^1$ and $C^2$ functions.

In proving the superlinear convergence rate of our method we shall use the following result, proved in [8].

Theorem 9 Let $f : \mathbb{R}^n \to \mathbb{R}$ be an $SC^1$ function and suppose that we have a sequence $\{x^k\}$ converging to a stationary point $x^*$ of $f$ as well as a sequence of directions $\{d^k\}$. Assume that $\lim_{k\to\infty} \|x^k + d^k - x^*\| / \|x^k - x^*\| = 0$ and that, for all $k$ sufficiently large and some $\delta > 0$,

$$\nabla f(x^k)^T d^k \le -\delta \|d^k\|^2. \qquad (22)$$

Then, for any $\sigma \in (0, 1/2)$, there exists a $\bar{k}$ such that for every $k \ge \bar{k}$

$$f(x^k + d^k) \le f(x^k) + \sigma \nabla f(x^k)^T d^k. \qquad (23)$$

3 Properties of Φ and Ψ

In this section we recall a few properties of the two operators $\Phi$ and $\Psi$ which are directly relevant to the convergence theory of the method to be described in the next section.

In the first theorem below we summarize the differential properties of $\Phi$ and $\Psi$ we shall need [10,20].

Theorem 10 Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be given.

(a) If $F$ is a semismooth function, then $\Phi$ is also semismooth.
(b) If $F$ is twice continuously differentiable, then $\Phi$ is strongly semismooth.
(c) If $F$ is continuously differentiable, then $\Psi$ is also continuously differentiable, and its gradient at a vector $x \in \mathbb{R}^n$ is given by $\nabla\Psi(x) = H^T \Phi(x)$, where $H$ can be an arbitrary element in $\partial\Phi(x)$.
(d) If all component functions $F_i$ are $SC^1$ functions, then $\Psi$ is also an $SC^1$ function.

We stress the fact that, although $\Phi$ is nonsmooth, $\Psi$ is continuously differentiable. This fact will be crucial in the design of our algorithm. The following matrix classes will be used in the subsequent definitions and results.

Definition 11 A matrix $M \in \mathbb{R}^{n \times n}$ is called a

(a) $P$-matrix if $\det(M_{JJ}) > 0$ for all subsets $J \subseteq I$;
(b) $P_0$-matrix if $\det(M_{JJ}) \ge 0$ for all subsets $J \subseteq I$;
(c) semimonotone matrix if, for all $x \ge 0$, $x \ne 0$, there exists an index $i \in I$ such that $x_i > 0$ and $x_i [Mx]_i \ge 0$.

Obviously, every $P$-matrix is a $P_0$-matrix. Furthermore, it can easily be verified that positive definite matrices are $P$-matrices and positive semidefinite matrices are $P_0$-matrices. Moreover, it is known that the class of semimonotone matrices includes the class of $P_0$-matrices, see [4].
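For small matrices the determinant conditions in (a) and (b) can be checked directly by enumerating all principal minors. The following brute-force Python sketch (ours, exponential in $n$ and intended only as an illustration of Definition 11) does exactly this.

```python
import itertools
import numpy as np

def is_p_matrix(M, strict=True, tol=1e-12):
    # P-matrix (strict=True): every principal minor det(M_JJ) > 0.
    # P0-matrix (strict=False): every principal minor det(M_JJ) >= 0.
    n = M.shape[0]
    for r in range(1, n + 1):
        for J in itertools.combinations(range(n), r):
            minor = np.linalg.det(M[np.ix_(J, J)])
            if (strict and minor <= tol) or (not strict and minor < -tol):
                return False
    return True

M = np.array([[2.0, -1.0],
              [0.0,  1.0]])
print(is_p_matrix(M), is_p_matrix(M, strict=False))  # True True
```

Note that this check is practical only for very small $n$; no polynomial-time test is implied here.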

For any given solution $x^* \in \mathbb{R}^n$ of the nonlinear complementarity problem NCP(F), let us introduce the index sets

$$\alpha(x^*) := \{i \in I \mid x_i^* > 0\}, \qquad \beta(x^*) := \{i \in I \mid x_i^* = F_i(x^*) = 0\}, \qquad \gamma(x^*) := \{i \in I \mid F_i(x^*) > 0\}.$$

If the solution $x^*$ under consideration is clear from the context, these index sets will be abbreviated by $\alpha$, $\beta$ and $\gamma$, respectively. We note that $\Phi$ is differentiable at a solution $x^*$ of NCP(F) (and hence of the system $\Phi(x) = 0$) if and only if the set $\beta(x^*)$ is empty.

Definition 12 A solution $x^* \in \mathbb{R}^n$ of NCP(F) is said to be b-regular if the principal submatrices $F'(x^*)_{\delta\delta}$ are nonsingular for all index sets $\delta$ with $\alpha \subseteq \delta \subseteq \alpha \cup \beta$.

Based on this definition, we can state the following result, which is an immediate consequence of [5, Theorem 5.4].

Theorem 13 Let $x^* \in \mathbb{R}^n$ be a b-regular solution of NCP(F) such that the Schur complement

$$F'(x^*)_{\beta\beta} - F'(x^*)_{\beta\alpha} F'(x^*)_{\alpha\alpha}^{-1} F'(x^*)_{\alpha\beta}$$

is a $P_0$-matrix. Then all elements $G \in \partial\Phi(x^*)$ are nonsingular.

Note, in particular, that the above theorem holds if the classical R-regularity condition of Robinson [31] is satisfied at $x^*$ or if $F'(x^*)$ itself is a $P$-matrix. The above theorem is important because it gives a sufficient condition for a solution of NCP(F) to be a BD-regular solution of the system $\Phi(x) = 0$, which, in light of Theorem 6, will be crucial in establishing a superlinear convergence rate of our method. Finally, we give some conditions for a stationary point of $\Psi$ to be a solution of NCP(F). These conditions are also important because our method determines unconstrained stationary points of the natural merit function $\Psi$, so that they can be seen as conditions guaranteeing that the algorithm converges to solutions of NCP(F). Several sufficient as well as necessary-and-sufficient conditions were proven in [5]. Here we just mention two simple consequences of the results in [5], see also [13]. To this end we first recall that a vector $x \in \mathbb{R}^n$ is called feasible for NCP(F) if $x \ge 0$ and $F(x) \ge 0$.

Theorem 14 Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be given and let $x^* \in \mathbb{R}^n$ be a stationary point of $\Psi$.

(a) If $F'(x^*)$ is a $P_0$-matrix, then $x^*$ is a solution of NCP(F).
(b) If $F'(x^*)$ is a semimonotone matrix and $x^*$ is feasible, then $x^*$ solves NCP(F).

Proof. Part (a) follows from Corollary 4.4 (a) in [5], whereas part (b) is a consequence of Corollary 3.9.7 in [4] and Corollary 4.4 (c) in [5]. □

As far as the authors are aware, these conditions for stationary points being solutions of NCP(F) are the weakest conditions known for any equation-based method or for any (constrained or unconstrained) minimization reformulation of NCP(F).

4 The Nonsmooth Inexact Levenberg-Marquardt Algorithm

We are now ready to describe an inexact Levenberg-Marquardt-type algorithm for the solution of NCP(F). We recall that we made the blanket assumption that $F$ is at least continuously differentiable. Roughly speaking, this algorithm can be seen as an attempt to solve the semismooth system of equations $\Phi(x) = 0$ by using the inexact Levenberg-Marquardt method (12) and (14). To ensure global convergence, a line search is performed to minimize the smooth merit function $\Psi$. If the search direction generated by (14) is not a "good" descent direction (according to test (25) below), we resort to the negative gradient of $\Psi$. We stress the extreme simplicity of the method from both the theoretical and the computational point of view.

Algorithm.

Data: $x^0 \in \mathbb{R}^n$, $\rho > 0$, $p > 2$, $\beta \in (0, 1/2)$, $\varepsilon \ge 0$.

Step 0: Set $k = 0$.

Step 1 (stopping criterion): If $\|\nabla\Psi(x^k)\| \le \varepsilon$, stop.

Step 2 (search direction calculation): Select an element $H^k \in \partial_B \Phi(x^k)$. Find an approximate solution $d^k \in \mathbb{R}^n$ of the system

$$((H^k)^T H^k + \mu_k I) d = -(H^k)^T \Phi(x^k), \qquad (24)$$

where $\mu_k \ge 0$ is the Levenberg-Marquardt parameter. If the condition

$$\nabla\Psi(x^k)^T d^k \le -\rho \|d^k\|^p \qquad (25)$$

is not satisfied, set $d^k = -\nabla\Psi(x^k)$.

Step 3 (line search): Find the smallest $i_k \in \{0, 1, 2, \ldots\}$ such that

$$\Psi(x^k + 2^{-i_k} d^k) \le \Psi(x^k) + \beta\, 2^{-i_k} \nabla\Psi(x^k)^T d^k. \qquad (26)$$

Set $x^{k+1} = x^k + 2^{-i_k} d^k$, $k \leftarrow k + 1$, and go to Step 1.

Note that the choice $\mu_k = 0$ at each step is allowed by the above algorithm. In this case, provided that $H^k$ is nonsingular, the equation (24) is equivalent to the generalized Newton equation

$$H^k d = -\Phi(x^k).$$

The nonsmooth Newton method [29] for the solution of the system $\Phi(x) = 0$ is therefore a particular case of our scheme.

In Step 2 we say that we want to solve the linear system (24) inexactly. By this we mean that the vector $d^k \in \mathbb{R}^n$ satisfies

$$((H^k)^T H^k + \mu_k I) d^k = -(H^k)^T \Phi(x^k) + r^k = -\nabla\Psi(x^k) + r^k \qquad (27)$$

(the second equality follows by Theorem 10 (c)) for some suitable residual vector $r^k$. Note, however, that, as usual in truncated schemes, the vector $r^k$ is not fixed beforehand. Instead, an iterative solver is used to solve the linear system (24), and this method is stopped when the norm of the residual $r^k$ is smaller than a prefixed accuracy (see the next two sections). In what follows, as usual in analyzing the behaviour of algorithms, we shall assume that $\varepsilon = 0$ and that the algorithm produces an infinite sequence of points.
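To fix ideas, here is a compact, self-contained Python sketch (ours; the authors' implementation is in Fortran 77, see Section 6) of the whole scheme: the reformulation (2), a dense element of $\partial_B\Phi$, an unpreconditioned conjugate gradient loop for (24) with the forcing term of Section 6, the descent test (25), and the Armijo search (26). All names and the toy linear problem are assumptions made for illustration; the matrix-free construction actually advocated by the paper is sketched in Section 5.

```python
import numpy as np

def merit_terms(x, F):
    # Phi(x) from (2) with the Fischer-Burmeister function, Psi(x) = 0.5*Phi^T Phi
    Fx = F(x)
    phi = np.sqrt(x**2 + Fx**2) - x - Fx
    return Fx, phi, 0.5 * (phi @ phi)

def subdiff_element(x, Fx, J):
    # One (dense) element H = A + B*J of the B-subdifferential of Phi at x
    beta = (x == 0.0) & (Fx == 0.0)            # degenerate indices (exact test)
    z = beta.astype(float)
    u = J @ z                                  # u_i = grad F_i(x)^T z
    t = np.where(beta, z, x)
    s = np.where(beta, u, Fx)
    nrm = np.sqrt(t**2 + s**2)                 # never zero: (t_i, s_i) != (0, 0)
    return np.diag(t / nrm - 1.0) + ((s / nrm - 1.0)[:, None]) * J

def cg(matvec, rhs, tol, maxiter=200):
    # Unpreconditioned conjugate gradient for the positive (semi)definite system (24)
    d = np.zeros_like(rhs)
    r, p = rhs.copy(), rhs.copy()
    rs = r @ r
    for _ in range(maxiter):
        if np.sqrt(rs) <= tol:
            break
        Mp = matvec(p)
        a = rs / (p @ Mp)
        d, r = d + a * p, r - a * Mp
        rs, rs_old = r @ r, rs
        p = r + (rs / rs_old) * p
    return d

def solve_ncp(F, J, x0, beta_ls=1e-4, rho=1e-8, p=2.1, eps=1e-10, maxit=100):
    x = x0.astype(float)
    for k in range(maxit):
        Fx, phi, psi = merit_terms(x, F)
        H = subdiff_element(x, Fx, J(x))
        grad = H.T @ phi                       # grad Psi(x) by Theorem 10 (c)
        if np.linalg.norm(grad) <= eps:        # Step 1
            break
        mu = 0.0                               # LM parameter; a heuristic is in Section 6
        d = cg(lambda v: H.T @ (H @ v) + mu * v, -grad,
               tol=(0.1 / (k + 1)) * np.linalg.norm(grad))
        if grad @ d > -rho * np.linalg.norm(d)**p:   # descent test (25)
            d = -grad
        t = 1.0                                # Armijo line search (26)
        while t > 1e-12 and merit_terms(x + t * d, F)[2] > psi + beta_ls * t * (grad @ d):
            t *= 0.5
        x = x + t * d
    return x

# Toy linear NCP with F(x) = Mx + q, whose solution is x* = (1, 0):
M = np.array([[3.0, 1.0], [1.0, 2.0]])
q = np.array([-3.0, 1.0])
print(np.round(solve_ncp(lambda x: M @ x + q, lambda x: M, np.zeros(2)), 6))
```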

Theorem 15 Assume that the sequence $\{\mu_k\}$ is bounded and that the sequence of residual vectors $\{r^k\}$ satisfies the condition $\|r^k\| \le \alpha_k \|\nabla\Psi(x^k)\|$, where $\{\alpha_k\}$ is a sequence of positive numbers converging to 0 and such that $\alpha_k < 1$ for every $k$. Then each accumulation point of the sequence $\{x^k\}$ generated by the above procedure is a stationary point of $\Psi$.

Proof. First assume that there exists an infinite set $K$ such that $d^k = -\nabla\Psi(x^k)$ for all $k \in K$. Then the assertion follows immediately from well-known results (see, e.g., Proposition 1.16 in [1]). Hence we can assume without loss of generality that $d^k$ is always given by (27).

We now show that the sequence $\{d^k\}$ is uniformly gradient related to $\{x^k\}$ according to the definition given in [1], i.e. we show that for every convergent subsequence $\{x^k\}_K$ for which

$$\lim_{k\to\infty,\ k\in K} \nabla\Psi(x^k) \ne 0 \qquad (28)$$

there holds

$$\limsup_{k\to\infty,\ k\in K} \|d^k\| < \infty, \qquad (29)$$

$$\liminf_{k\to\infty,\ k\in K} |\nabla\Psi(x^k)^T d^k| > 0. \qquad (30)$$

Let $x^* \in \mathbb{R}^n$ denote an accumulation point of the sequence $\{x^k\}$. Subsequencing if necessary, we assume, without loss of generality, that $x^k \to x^*$. Suppose that $x^*$ is not a stationary point of $\Psi$. We get from (27):

$$\|\nabla\Psi(x^k) - r^k\| = \|((H^k)^T H^k + \mu_k I) d^k\| \le \|(H^k)^T H^k + \mu_k I\|\, \|d^k\| \qquad (31)$$

and therefore

$$\|d^k\| \ge \frac{\|\nabla\Psi(x^k) - r^k\|}{\|(H^k)^T H^k + \mu_k I\|}. \qquad (32)$$

Note that the denominator in (32) is nonzero since otherwise we would have $\nabla\Psi(x^k) - r^k = 0$ by (31) which, in turn, since $\|r^k\| \le \alpha_k \|\nabla\Psi(x^k)\|$ with $\alpha_k < 1$, would be possible only if $\|\nabla\Psi(x^k)\| = 0$, so that $x^k$ would be a stationary point and the algorithm would have stopped. Since $\{\mu_k\}$ is bounded by assumption and the generalized Jacobian is upper semicontinuous, there exists a constant $\gamma_1 > 0$ such that

$$\|(H^k)^T H^k + \mu_k I\| \le \gamma_1$$

(recall that the sequence $\{x^k\}$ is bounded since $x^k \to x^*$). Taking into account $\|r^k\| = o(\|\nabla\Psi(x^k)\|)$, we therefore get from (32)

$$\|d^k\| \ge \|\nabla\Psi(x^k) - r^k\| / \gamma_1 \ge (\|\nabla\Psi(x^k)\| - \|r^k\|) / \gamma_1 \ge \gamma_2 \|\nabla\Psi(x^k)\| \qquad (33)$$

for all $k$ sufficiently large and a suitable constant $\gamma_2 > 0$.

Theorem 16 Let the assumptions of Theorem 15 hold. Let fxk g be any sequence generated by the algorithm. If one of the accumulation points of fxk g; say x ; is an isolated solution of NCP(F ), then the entire sequence fxk g converges to x :

Proof. Let K be a subset of f1; 2; : : :g such that fxk gK ! x . Since fkr (xk )kgK ! 14

0 we have, either because dk = ?r (xk ) or by (25), with p > 2, that fdk gK ! 0. The thesis then follows by Theorem 2.3 in [9]. 2 The following result deals with the local rate of convergence.

Theorem 17 Assume the sequence frk g satis es the assumption of Theorem 15. If one of the limit points of the sequence fxk g; let us say x ; is a BDregular solution of the system (x) = 0; and if each Fi is an SC 1 function, then

(a) The sequence fxk g converges to x : (b) Eventually dk is always given by the solution of system (24) (i.e., the negative gradient is never used eventually). (c) If f k g ! 0, eventually the stepsize of one is always accepted so that xk+1 = xk + dk : (d) If f k g ! 0; then xk ! x Q-superlinearly. (e) If Fi 2 C 2 for each i 2 f1; : : : ; ng;  k = O(kr (xk )k) and krk k = O(kr (xk )k2); then xk ! x Q-quadratically.

Proof. (a) Since x is a BD-regular solution of (x) = 0 and since  is semismooth by Theorem 10 (a), x is a locally unique solution of this system by Proposition 3 in [27] and therefore an isolated solution of NCP(F ). Hence xk ! x by Theorem 16. (b) We rst show that there is a constant  > 0 such that 3

r (xk )T dk  ? kdk k

(34)

2

3

for all k suciently large. Since x is a BD-regular solution of (x) = 0; xk ! x by part (a) and k  0 for all k; the matrices (H k )T H k + k I are uniformly positive de nite for all k suciently large. Hence there exist constants    > 0 such that 2

1

   kdk k  (dk )T (H k )T H k + k I dk   kdk k : 1

2

2

2

(35)

We now obtain from (27), (33), (35), krk k  k kr (xk )k and the CauchySchwarz inequality:





r (xk )T dk = ?(dk )T (H k )T H k + k I dk + (rk )T dk  ? kdk k + krk k kdk k 1

2

15

 ? kdk k + k kr (xk )k kdk k  ? kdk k + k kdk k = : 1

1

2

2

2

2

From this, inequality (34) follows immediately by taking into account that f k g ! 0. Therefore, statement (b) follows from (34), fkdk kg ! 0 and the fact that p > 2 in (25). (c) By Theorem 10 (d), the merit function is an SC function. Statement (c) is therefore a direct consequence of (34), Remark 7 and Theorem 9. 1

(d) and (e) By point (c) the algorithm reduces eventually to xk = xk + dk . The assertions then follow directly from Theorem 6, taking into account Theorem 10 (a) and (b) and the fact that r (xk ) = (H k )T (xk ) by Theorem 10 (c). 2 +1

The BD-regularity assumption in Theorem 17 is satisfied under the assumptions of Theorem 13 and therefore, in particular, under the R-regularity condition of Robinson [31]. We also stress that the rate of convergence results are independent of whether the limit point $x^*$ is degenerate or not.

5 Implementation Issues

In the implementation of the algorithm we assume that three routines are available: one that, given a point $x$, returns the value of $F(x)$; a second one that, given a point $x$ and a vector $v$, returns the product $F'(x)v$; and a last one that, given a point $x$ and a vector $v$, returns the product $F'(x)^T v$. We note that these are usual requirements in the solution of large-scale problems. In particular, since most large-scale problems have a sparse structure, it is generally possible to store the Jacobian of $F$ in a compact form from which the second and third routines mentioned above are readily obtained. In any case, even when the Jacobian is not explicitly known, automatic differentiation gives us a sound theoretical tool to obtain these routines at a cost which is only a small multiple of the cost of calculating $F$.

The most computationally intensive part of the algorithm is the approximate solution of system (24). We note that this system is always solvable. In fact, if $\mu_k > 0$, then the matrix $(H^k)^T H^k + \mu_k I$ is (symmetric) positive definite and hence system (24) is surely solvable. If $\mu_k = 0$, then the matrix $(H^k)^T H^k + \mu_k I$ reduces to $(H^k)^T H^k$, which is guaranteed to be only positive semidefinite. However, in this case, system (24) reduces to the normal equations for the system $H^k d = -\Phi(x^k)$, and is therefore solvable. We therefore chose to solve system (24) by the (unpreconditioned) conjugate gradient method, which is guaranteed to find a solution of the system in at most $n$ iterations in exact arithmetic (see, e.g., [28] for the semidefinite case). This means, in particular, that we can solve system (24) as accurately as we want, i.e., that it is always possible to make the residual vector $r^k$ as close to 0 as we like.

Having chosen to solve system (24) by the conjugate gradient method, we have to show how it is possible to calculate the product of any vector $v$ with the matrix $(H^k)^T H^k + \mu_k I$. In turn, it is clear that this is possible if we are able to calculate the product of any vector $v$ by $H^k$ and by $(H^k)^T$. To this end we first have to specify which element $H^k \in \partial_B \Phi(x^k)$ we select at the $k$th iteration. In [5] we showed that an element of $\partial_B \Phi(x)$ can be obtained in the following way. Let $\beta := \{i : x_i^k = 0 = F_i(x^k)\}$ be the set of "degenerate indices" and define $z \in \mathbb{R}^n$ to be a vector whose components $z_i$ are 1 if $i \in \beta$ and 0 otherwise. Then the matrix $H^k$ defined by

$$H^k = A(x^k) + B(x^k) F'(x^k),$$

where $A$ and $B$ are $n \times n$ diagonal matrices whose $i$th diagonal elements are given, respectively, by

$$A_{ii}(x) = \begin{cases} \dfrac{x_i}{\|(x_i, F_i(x))\|} - 1 & \text{if } i \notin \beta, \\[2mm] \dfrac{z_i}{\|(z_i, \nabla F_i(x)^T z)\|} - 1 & \text{if } i \in \beta, \end{cases}$$

and by

$$B_{ii}(x) = \begin{cases} \dfrac{F_i(x)}{\|(x_i, F_i(x))\|} - 1 & \text{if } i \notin \beta, \\[2mm] \dfrac{\nabla F_i(x)^T z}{\|(z_i, \nabla F_i(x)^T z)\|} - 1 & \text{if } i \in \beta, \end{cases}$$

belongs to $\partial_B \Phi(x^k)$. It is now easy to see that, storing in two vectors the diagonal elements of the matrices $A$ and $B$ and using the routines that calculate the product of a vector $v$ by the matrices $F'$ and $F'^T$, we can also evaluate the product of $v$ by $H^k$ and by $(H^k)^T$. Two situations can be distinguished. If $\beta = \emptyset$ (i.e. if $\Phi$ is differentiable at $x^k$), then the product $H^k v$ can be calculated by performing the product $F'(x^k)v$ plus $O(n)$ arithmetic operations. If, instead, $\beta \ne \emptyset$ (i.e. if $\Phi$ is not differentiable at $x^k$), then we need an additional calculation of the product $F'(x^k)^T z$ in order to evaluate the diagonal elements of $A(x^k)$ and $B(x^k)$. As expected, if $\Phi$ is not differentiable at $x^k$, then the calculation of $H^k v$ is more costly than in the case in which $\Phi$ is differentiable at $x^k$. Similar considerations can be made for the product $(H^k)^T v$, and we omit them. A sketch of this matrix-free computation is given below.
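Here is a minimal matrix-free Python sketch (ours) of these products, assuming the three routines of this section are available as `Fval(x)` ($= F(x)$), `jacvec(x, v)` ($= F'(x)v$) and `jacTvec(x, v)` ($= F'(x)^T v$).

```python
import numpy as np

def diag_scalings(x, Fval, jacvec):
    # Diagonal entries a = diag(A(x)), b = diag(B(x)) for H = A(x) + B(x) F'(x)
    Fx = Fval(x)
    beta = (x == 0.0) & (Fx == 0.0)        # degenerate indices
    z = beta.astype(float)
    # The extra product F'(x) z is needed only when beta is nonempty:
    u = jacvec(x, z) if beta.any() else np.zeros_like(x)
    t = np.where(beta, z, x)
    s = np.where(beta, u, Fx)
    nrm = np.sqrt(t * t + s * s)
    return t / nrm - 1.0, s / nrm - 1.0

def H_matvec(x, v, a, b, jacvec):
    # H v = A v + B (F'(x) v): one Jacobian-vector product plus O(n) work
    return a * v + b * jacvec(x, v)

def HT_matvec(x, v, a, b, jacTvec):
    # H^T v = A v + F'(x)^T (B v)
    return a * v + jacTvec(x, b * v)
```

Inside the conjugate gradient iteration, the coefficient matrix of (24) is then applied to a vector $v$ as `HT_matvec(x, H_matvec(x, v, a, b, jacvec), a, b, jacTvec) + mu * v`.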

We finally discuss the line search used in the algorithm. We use a nonmonotone line search, which can be viewed as an extension of (26). To motivate this variant, we first recall that it has often been observed, in the field of nonlinear complementarity algorithms, that the line search test used to enforce global convergence can lead to very small stepsizes; in turn this can lead to very slow convergence and even to a numerical failure of the algorithm. To circumvent this problem many heuristics have been used, see, e.g., [18,26]. We propose to substitute for the line search test (26) the following nonmonotone line search:

$$\Psi(x^k + 2^{-i_k} d^k) \le W + \beta\, 2^{-i_k} \nabla\Psi(x^k)^T d^k, \qquad (36)$$

where $W$ is any value satisfying

$$\Psi(x^k) \le W \le \max_{j=0,1,\ldots,M^k} \Psi(x^{k-j}) \qquad (37)$$

and the $M^k$ are nonnegative integers bounded above for any $k$ (we suppose for simplicity that $k$ is large enough so that no negative superscripts can occur in (37)). This kind of line search was first proposed in [15] and has since proved very useful in the unconstrained minimization of smooth functions. In particular, it has already been used in the field of nonlinear complementarity problems in [5]. Adopting the same proof techniques used in [15], it is easy to see that all the results described in the previous section still hold if we substitute the line search (36) for (26) in the algorithm of Section 4. If at each iteration we take $W = \Psi(x^k)$, we obtain exactly the algorithm of Section 4; however, much more freedom is allowed by (36). In particular, according to the most recent experience with nonmonotone minimization algorithms [32], we chose to keep $W$ fixed as long as the algorithm seems to make progress and to vary it only if, for a certain prefixed number of consecutive steps, the function values $\Psi(x^k)$ increase; the precise scheme, also sketched in code below, is as follows.

- Set $W = \Psi(x^0)$ at the beginning of the algorithm;
- keep the value of $W$ fixed as long as

$$\Psi(x^k) \le \min_{j=0,1,2} \Psi(x^{k-j}); \qquad (38)$$

- if (38) is not satisfied at iteration $k$, set $W = \Psi(x^k)$;
- in any case, at most every 20 iterations, reset $W$ to $\Psi(x^k)$.
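The update of the reference value $W$ can be sketched as follows (our code; the three-value window realizes test (38), and the periodic reset follows the last rule above).

```python
class NonmonotoneRef:
    # Reference value W for the nonmonotone test (36), updated as described above.
    def __init__(self, psi0, reset_every=20):
        self.W = psi0
        self.history = [psi0]              # last values of Psi(x^k), window of 3
        self.reset_every = reset_every
        self.k = 0

    def update(self, psi_k):
        self.k += 1
        self.history = (self.history + [psi_k])[-3:]
        if psi_k > min(self.history):      # test (38) violated
            self.W = psi_k
        if self.k % self.reset_every == 0: # reset at most every 20 iterations
            self.W = psi_k
        return self.W
```

The Armijo test of Step 3 then compares $\Psi(x^k + 2^{-i_k} d^k)$ against `W` $+ \beta\, 2^{-i_k} \nabla\Psi(x^k)^T d^k$ instead of against $\Psi(x^k)$.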

6 Numerical Results

We tested our algorithm on 17 test problems generated as suggested in [16]. Let $g(x) = 0$ be a (large-scale) differentiable system of nonlinear equations and let $x^* \in \mathbb{R}^n$ be defined by

$$x^* = (1, 0, 1, 0, \ldots)^T.$$

For all $i = 1, \ldots, n$ set

$$F_i(x) := \begin{cases} g_i(x) - g_i(x^*) & \text{if } i \text{ odd or } i > n/2, \\ g_i(x) - g_i(x^*) + 1 & \text{otherwise}. \end{cases}$$

In this way $x^*$ is a solution of the nonlinear complementarity problem NCP(F). In particular, note that $x^*$ is a degenerate solution (with $|\beta(x^*)| = n/4$), so that the system $\Phi(x) = 0$ is nondifferentiable at $x^*$. As done in [16], we used the collection of 17 large-scale problems used by Luksan in [21] and the starting points indicated there. We coded the algorithm in Fortran 77 and ran it on an IBM RISC 6000/375 machine, in double precision arithmetic.
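This construction is easy to reproduce. The Python sketch below (ours) uses a toy smooth $g$ in place of Luksan's test functions, which the experiments assume to be available.

```python
import numpy as np

def make_degenerate_ncp(g, n):
    # x* = (1, 0, 1, 0, ...): component i (1-based) is 1 for odd i.
    xstar = np.array([1.0 if (i + 1) % 2 == 1 else 0.0 for i in range(n)])
    gstar = g(xstar)
    # Shift by +1 exactly for even i with i <= n/2 (1-based indexing).
    shift = np.array([1.0 if (i + 1) % 2 == 0 and (i + 1) <= n / 2 else 0.0
                      for i in range(n)])
    return (lambda x: g(x) - gstar + shift), xstar

g = lambda x: x**2 - 2.0 * x                  # toy smooth system g(x) = 0
F, xstar = make_degenerate_ncp(g, 8)
Fx = F(xstar)
print(xstar * Fx)                             # complementarity: all zeros
print((xstar == 0.0) & (Fx == 0.0))           # n/4 degenerate indices: even i > n/2
```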

The main stopping criterion is $\|\min(x^k, F(x^k))\| \le 10^{-5}\sqrt{n}$, but we also stopped the algorithm after 100 iterations if the former stopping criterion was not met. We used a single set of parameters for all the runs of the problems; more precisely, we set $\beta = 10^{-4}$, $\rho = 10^{-8}$ and $p = 2.1$. The conjugate gradient algorithm was stopped when the norm of the residual was smaller than $(0.1/(k+1)) \|\nabla\Psi(x^k)\|$, but we also set a limit of 200 iterations for the conjugate gradient phase. Finally we turn to the choice of $\mu_k$. We first note that the classical sophisticated choices indicated, e.g., in [6], are not suitable for large-scale problems. We used a very simple, and yet seemingly effective, strategy. If at the previous iteration the quotient $\|\nabla\Psi(x^{k-1})\| / \|d^{k-1}\|$ was greater than 250 and the norm of the residual $\|\min(x^k, F(x^k))\|$ is smaller than $10^{-k}(0.1\sqrt{n})$, then we set $\mu_k = 1$; otherwise we set $\mu_k = 0$. We note that the first test is a rough indicator that "something is going wrong", while the second test makes the possibility of the perturbation (i.e. $\mu_k > 0$) more and more unlikely the more the process progresses, so that the final fast convergence rate is preserved.

Table 1 Results for degenerate problems. Prob. n it minor it. 100 9 139 1 1000 9 159 10000 10 166 100 4 6 2 1000 4 6 10000 4 6 100 F 3 1000 F 10000 F 100 10 82 4 1000 11 84 10000 11 81 100 31 1786 5 1000 19 1414 10000 21 1060 100 F 6 1000 F 10000 F 100 25 321 7 1000 26 330 10000 31 340 100 13 128 8 1000 15 144 10000 15 139 100 14 150 9 1000 14 101 10000 15 105 100 16 156 10 1000 10 24 10000 11 32 100 4 6 11 1000 4 6 10000 4 6 100 10 45 12 1000 10 42 10000 10 42 100 25 162 13 1000 22 133 10000 18 110 100 14 390 14 1000 14 407 10000 13 702 100 14 297 15 1000 14 417 10000 22 793 100 10 259 16 1000 100 18828 10000 100 18962 100 10 72 17 1000 10 70 10000 10 68

p

F Res/ n 10 .82d-9 .29d-5 10 .31d-7 .56d-5 11 .45d-8 .67d-6 8 .13d-9 .16d-5 7 .20d-12 .20d-7 7 .20d-11 .20d-7 11 12 12 36 20 24

.38d-9 .31d-8 .28d-9 .45d-8 .45d-7 .46d-6

.29d-5 .27d-5 .23d-6 .93d-5 .90d-5 .92d-5

31 29 36 14 16 16 15 15 16 18 11 13 5 5 5 11 11 11 33 30 25 15 15 14 15 15 28 11 108 102 11 11 11

.22d-8 .83d-8 .41d-6 .83d-9 .30d-8 .46d-7 .41d-9 .72d-8 .91d-7 .47d-12 .24d-10 .26d-6 .26d-17 .26d-16 .26d-15 .16d-9 .15d-8 .15d-7 .70d-10 .19d-7 .83d-7 .16d-8 .95d-8 .11d-6 .60d-9 .28d-7 .54d-7 .81d-9 .32d-4 .75d-3 .15d-9 .33d-8 .40d-8

.67d-5 .41d-7 .91d-5 .23d-5 .14d-5 .30d-5 .25d-5 .24d-5 .24d-5 .97d-7 .22d-6 .72d-5 .23d-9 .23d-8 .23d-9 .18d-5 .17d-5 .17d-7 .63d-6 .30d-5 .64d-5 .56d-5 .28d-5 .38d-5 .25d-5 .41d-5 .19d-5 .40d-5 .24d-3 .12d-4 .17d-5 .26d-5 .90d-6

Furthermore, in all cases in which convergence occurred, a fast final rate of convergence was observed, showing that the assumptions necessary to establish the superlinear rate of convergence of the method are usually met in practice. We now comment on the failures of the algorithm. The algorithm fails on Problems 3 and 6 for n = 100, n = 1000 and n = 10000. The reason for the failure is that in these cases convergence occurs towards stationary points of the merit function that are not solutions of NCP(F). There is not much we can do to avoid this (except considering global minimization algorithms for $\Psi$ or heuristics).

Table 2
Results for nondegenerate problems, n = 1000.

Prob.    it   minor it.     F          Ψ     Res/√n
  1       6         65      7     .21d-7     .64d-5
  2       4          6      7    .40d-12     .28d-7
  3       F
  4      14         70     15     .39d-8     .28d-5
  5      19        648     21     .30d-7     .78d-5
  6       F
  7      25        329     27    .36d-11     .85d-7
  8      14         86     15     .38d-9     .87d-6
  9      17        112     18     .63d-9     .11d-5
 10       9         20     10     .15d-7     .55d-5
 11       3          4      4     .13d-7     .51d-5
 12       6         18      7    .83d-12     .40d-7
 13      14         67     20    .72d-11     .12d-6
 14     100       3191    796     .79d-1     .12d-1
 15      12        136     16     .11d-7     .47d-5
 16      11        141     12     .43d-9     .92d-6
 17      10         61     11     .39d-9     .89d-6

However, it is interesting to note that in [16], where the merit function $\|\min(x, F(x))\|^2$ is used in the solution of the same problems considered here, convergence to stationary points other than the solutions of the nonlinear complementarity problems is more frequent, and furthermore divergence also occurs in some cases, which is not the case for our algorithm. This seems to confirm, at least on this set of test problems, that the merit function $\Psi$ enjoys better theoretical properties than other merit functions. Continuing in the analysis of the failures, we also see that the algorithm reached the maximum number of iterations without meeting the stopping criterion on Problem 16, for n = 1000 and n = 10000. In this case, as indicated by the final value of $\Psi$ and of the residual, the algorithm is close to a solution. We observed the following behaviour. In a few iterations the algorithm is able to reach a zone close to the solution. After that, the maximum number of iterations allowed for the conjugate gradient solver is always reached. This means that the search direction is not close to the Newton direction, so that the algorithm converges very slowly to the solution.

We also performed a second set of experiments, generating complementarity problems as before but setting

$$F_i(x) := \begin{cases} g_i(x) - g_i(x^*) & \text{if } i \text{ odd}, \\ g_i(x) - g_i(x^*) + 1 & \text{otherwise}. \end{cases}$$

The complementarity problems so obtained are nondegenerate at the solution $x^*$, and therefore $\Phi(x)$ is continuously differentiable at $x^*$. Since again there are no major differences when $n$ varies, we report in Table 2 only the results for $n = 1000$.

The only new observation we can make with respect to these results is that they are overall very similar to the previous ones, showing that the algorithm is seemingly insensitive to whether or not the solution towards which convergence occurs is a differentiable point of the system $\Phi(x) = 0$.

Acknowledgement. We are grateful to M.A. Gomes-Ruggiero and J.M. Martínez, who very kindly provided us with the Fortran codes of the test problems.

References

[1] D.P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods (Academic Press, New York, 1982).
[2] C. Chen and O.L. Mangasarian, A class of smoothing functions for nonlinear and mixed complementarity problems, Technical Report 94-11, Computer Sciences Department, University of Wisconsin (Madison, WI, August 1994, revised February 1995). To appear in Computational Optimization and Applications.
[3] F.H. Clarke, Optimization and Nonsmooth Analysis (Wiley, New York, 1983; reprinted by SIAM, Philadelphia, 1990).
[4] R.W. Cottle, J.-S. Pang and R.E. Stone, The Linear Complementarity Problem (Academic Press, New York, 1992).
[5] T. De Luca, F. Facchinei and C. Kanzow, A semismooth equation approach to the solution of nonlinear complementarity problems, DIS Technical Report 01.95, Università di Roma "La Sapienza" (Roma, Italy, January 1995).
[6] J.E. Dennis and R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations (Prentice-Hall, Englewood Cliffs, New Jersey, 1983).
[7] S.P. Dirkse and M.C. Ferris, MCPLIB: A collection of nonlinear mixed complementarity problems, Technical Report 1215, Computer Sciences Department, University of Wisconsin (Madison, WI, 1994). To appear in Optimization Methods and Software.
[8] F. Facchinei, Minimization of SC1 functions and the Maratos effect, DIS Technical Report 10.94, Università di Roma "La Sapienza" (Roma, Italy, 1994). To appear in Operations Research Letters.
[9] F. Facchinei, On the convergence to a unique point of minimization algorithms, DIS Technical Report 06.95, Università di Roma "La Sapienza" (Roma, Italy, January 1995).
[10] F. Facchinei and J. Soares, A new merit function for nonlinear complementarity problems and a related algorithm, DIS Technical Report 15.94, Università di Roma "La Sapienza" (Roma, Italy, 1994).
[11] M.C. Ferris and J.-S. Pang, Engineering and economic applications of complementarity problems, Technical Report, Computer Sciences Department, University of Wisconsin (Madison, WI, 1995).
[12] A. Fischer, A special Newton-type optimization method, Optimization 24 (1992) 269-284.
[13] A. Fischer, An NCP-function and its use for the solution of complementarity problems, to appear in D.-Z. Du, L. Qi and R. Womersley (eds.), Recent Advances in Nonsmooth Optimization (Kluwer Academic Publishers, Dordrecht, 1995).
[14] S.A. Gabriel and J.-S. Pang, An inexact NE/SQP method for solving the nonlinear complementarity problem, Computational Optimization and Applications 1 (1992) 67-91.
[15] L. Grippo, F. Lampariello and S. Lucidi, A nonmonotone line search technique for Newton's method, SIAM Journal on Numerical Analysis 23 (1986) 707-716.
[16] M.A. Gomes-Ruggiero, J.M. Martínez and S.A. Santos, Solving nonsmooth equations by means of quasi-Newton methods with globalization, to appear in D.-Z. Du, L. Qi and R. Womersley (eds.), Recent Advances in Nonsmooth Optimization (Kluwer Academic Publishers, Dordrecht, 1995).
[17] P.T. Harker and J.-S. Pang, Finite-dimensional variational inequality and nonlinear complementarity problems: a survey of theory, algorithms and applications, Mathematical Programming 48 (1990) 161-220.
[18] P.T. Harker and B. Xiao, Newton's method for the nonlinear complementarity problem: a B-differentiable equation approach, Mathematical Programming 48 (1990) 339-357.
[19] G. Isac, Complementarity Problems, Lecture Notes in Mathematics 1528 (Springer-Verlag, Berlin, 1992).
[20] H. Jiang and L. Qi, A new nonsmooth equations approach to nonlinear complementarities, Applied Mathematics Report 94/31, School of Mathematics, University of New South Wales (Sydney, Australia, October 1994).
[21] L. Luksan, Inexact trust region method for large sparse systems of nonlinear equations, Journal of Optimization Theory and Applications 81 (1994) 569-590.
[22] J.M. Martínez and L. Qi, Inexact Newton methods for solving nonsmooth equations, Applied Mathematics Report 93/9, School of Mathematics, University of New South Wales (Sydney, Australia, 1993, revised April 1994).
[23] R. Mifflin, Semismooth and semiconvex functions in constrained optimization, SIAM Journal on Control and Optimization 15 (1977) 957-972.
[24] J.-S. Pang, Inexact Newton methods for the nonlinear complementarity problem, Mathematical Programming 36 (1986) 54-71.
[25] J.-S. Pang, Complementarity problems, in: R. Horst and P.M. Pardalos (eds.), Handbook of Global Optimization (Kluwer Academic Publishers, Dordrecht, 1994) 271-338.
[26] J.-S. Pang and S.A. Gabriel, NE/SQP: A robust algorithm for the nonlinear complementarity problem, Mathematical Programming 60 (1993) 295-337.
[27] J.-S. Pang and L. Qi, Nonsmooth equations: motivation and algorithms, SIAM Journal on Optimization 3 (1993) 443-465.
[28] B.N. Pshenichny and Yu.M. Danilin, Numerical Methods in Extremal Problems (Mir Publishers, Moscow, 1978).
[29] L. Qi, A convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research 18 (1993) 227-244.
[30] L. Qi and J. Sun, A nonsmooth version of Newton's method, Mathematical Programming 58 (1993) 353-368.
[31] S.M. Robinson, Generalized equations, in: A. Bachem, M. Grötschel and B. Korte (eds.), Mathematical Programming: The State of the Art (Springer-Verlag, Berlin, 1983) 346-367.
[32] Ph.L. Toint, A non-monotone trust-region algorithm for nonlinear optimization subject to convex constraints, Report 92/94, Department of Mathematics, FUNDP (Namur, Belgium, 1994).