Computing 74, 23–39 (2005) Digital Object Identifier (DOI) 10.1007/s00607-004-0083-1

On the Quadratic Convergence of the Levenberg-Marquardt Method without Nonsingularity Assumption

Jin-yan Fan and Ya-xiang Yuan, Beijing

Received October 16, 2001; revised March 9, 2004. Published online: January 10, 2005. © Springer-Verlag 2005

Abstract. Recently, Yamashita and Fukushima [11] established an interesting quadratic convergence result for the Levenberg-Marquardt method without the nonsingularity assumption. This paper extends their result by using $\mu_k = \|F(x_k)\|^\delta$ with $\delta \in [1,2]$, instead of $\mu_k = \|F(x_k)\|^2$, as the Levenberg-Marquardt parameter. If $\|F(x)\|$ provides a local error bound for the system of nonlinear equations $F(x) = 0$, it is shown that the sequence $\{x_k\}$ generated by the new method converges to a solution quadratically, which is stronger than $\mathrm{dist}(x_k, X^*) \to 0$ given by Yamashita and Fukushima. Numerical results show that the method performs well for singular problems.

AMS Subject Classifications: 34G20, 65K05, 90C30.

Keywords: Nonlinear equations, Levenberg-Marquardt method, quadratic convergence.

1. Introduction

We consider the problem of solving the nonlinear equations

    F(x) = 0,                                                              (1.1)

where $F : \mathbb{R}^n \to \mathbb{R}^m$ is continuously differentiable and $F'(x)$ is Lipschitz continuous. Throughout the paper, we assume that the solution set of (1.1) is nonempty and denote it by $X^*$; in all cases $\|\cdot\|$ refers to the 2-norm.

The classical Levenberg-Marquardt method (see [2], [3]) for the nonlinear equations (1.1) computes the trial step by

    d_k = -(J(x_k)^T J(x_k) + \mu_k I)^{-1} J(x_k)^T F(x_k),               (1.2)

where $J(x_k) = F'(x_k)$ is the Jacobian and $\mu_k \ge 0$ is a parameter updated from iteration to iteration. The Levenberg-Marquardt step (1.2) is a modification of the Gauss-Newton step

    d_k^{GN} = -(J(x_k)^T J(x_k))^{-1} J(x_k)^T F(x_k).                    (1.3)
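For concreteness, the step (1.2) with the parameter choice $\mu_k = \|F(x_k)\|^\delta$ studied in this paper can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code; the helper name `lm_step` is ours:

```python
import numpy as np

def lm_step(F, J, x, delta=1.0):
    """One Levenberg-Marquardt trial step d_k from (1.2), a sketch.

    Assumes F(x) returns the residual vector and J(x) its Jacobian;
    mu_k = ||F(x_k)||**delta with delta in [1, 2] as proposed in the paper.
    """
    Fk, Jk = F(x), J(x)
    mu = np.linalg.norm(Fk) ** delta
    n = Jk.shape[1]
    # Solve the regularized normal equations instead of forming an inverse.
    return np.linalg.solve(Jk.T @ Jk + mu * np.eye(n), -(Jk.T @ Fk))
```

For $\mu_k = 0$ and nonsingular $J_k^T J_k$ this reduces to the Gauss-Newton step (1.3); a positive $\mu_k$ keeps the linear system nonsingular even when $J_k$ is rank deficient.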


The parameter $\mu_k$ is used to prevent $d_k$ from becoming too large when $J(x_k)^T J(x_k)$ is nearly singular. Moreover, when $J(x_k)^T J(x_k)$ is singular, the Gauss-Newton step is undefined, while a positive $\mu_k$ guarantees that (1.2) is well defined.

It is well known that when $m = n$ the Levenberg-Marquardt method has a quadratic rate of convergence if the Jacobian at the solution $x^*$ is nonsingular and the parameter is chosen suitably at each step. However, nonsingularity of $J(x^*)$ is a strong condition. Recently, Yamashita and Fukushima [11] showed that under the weaker condition that $\|F(x)\|$ provides a local error bound near the solution, the Levenberg-Marquardt method still converges quadratically if the parameter is chosen as $\mu_k = \|F(x_k)\|^2$. This is a very interesting result. However, the quadratic term $\mu_k = \|F(x_k)\|^2$ has some unsatisfactory properties. When the sequence is close to the solution set, $\|F(x_k)\|^2$ may fall below the machine precision and thus lose its role. On the other hand, when the sequence is far away from the solution set, $\|F(x_k)\|^2$ may be very large, the step $d_k$ will then be too small, and consequently the iterates are prevented from approaching the solution set quickly.

Because of these observations, we consider the choice $\mu_k = \|F(x_k)\|^\delta$ with $\delta \in [1,2]$. We prove that with this parameter, if $\|F(x)\|$ provides a local error bound near some $x^* \in X^*$, then the sequence $\{x_k\}$ generated by the new Levenberg-Marquardt method converges quadratically to a solution of (1.1); that is,

    \|x_{k+1} - \bar{x}\| \le Q \|x_k - \bar{x}\|^2

holds for all sufficiently large $k$, where $\bar{x} \in X^*$ and $Q > 0$ is a positive constant.

Definition 1.1. Let $N$ be a subset of $\mathbb{R}^n$ such that $N \cap X^* \neq \emptyset$. We say that $\|F(x)\|$ provides a local error bound on $N$ for system (1.1) if there exists a positive constant $c > 0$ such that

    \|F(x)\| \ge c \, \mathrm{dist}(x, X^*)    for all x \in N.

Note that if $J(x^*)$ is nonsingular at a solution $x^*$ of (1.1), then $x^*$ is an isolated solution, and hence $\|F(x)\|$ provides a local error bound on some neighborhood of $x^*$. The converse is not necessarily true; see the example in [11]. So the local error bound condition is weaker than nonsingularity.

In the next section, we show that under the local error bound condition, the sequence generated by the new Levenberg-Marquardt method without line search converges to a solution quadratically. In Sect. 3, the global convergence result is given when a line search is used. Finally, in Sect. 4, we present numerical results for some singular nonlinear equations.

2. Local Convergence of the Levenberg-Marquardt Method

To study the convergence properties of the method, we make the following assumptions.


Assumption 2.1. (a) $F(x)$ is continuously differentiable, and the Jacobian $J(x)$ is Lipschitz continuous on some neighborhood of $x^* \in X^*$, i.e., there exist positive constants $L_1$ and $b_1 < 1$ such that

    \|J(y) - J(x)\| \le L_1 \|y - x\|    for all x, y \in N(x^*, b_1) = \{x : \|x - x^*\| \le b_1\}.    (2.1)

(b) $\|F(x)\|$ provides a local error bound on $N(x^*, b_1)$ for the system (1.1), i.e., there exists a constant $c_1 > 0$ such that

    \|F(x)\| \ge c_1 \, \mathrm{dist}(x, X^*)    for all x \in N(x^*, b_1).    (2.2)

Note that, by Assumption 2.1(a), we have

    \|F(y) - F(x) - J(x)(y - x)\| \le L_1 \|y - x\|^2    for all x, y \in N(x^*, b_1),    (2.3)

and there exists a constant $L_2 > 0$ such that

    \|F(y) - F(x)\| \le L_2 \|y - x\|    for all x, y \in N(x^*, b_1).    (2.4)

We discuss the local convergence of the Levenberg-Marquardt method without line search, i.e., the next iterate is computed as $x_{k+1} = x_k + d_k$, where $d_k$ is given by (1.2). For brevity we write $F_k = F(x_k)$ and $J_k = J(x_k)$ in what follows. We also assume:

Assumption 2.2. $\mu_k = \|F_k\|^\delta$ for all $k$, where $\delta \in [1, 2]$.

Yamashita and Fukushima [11] proved the quadratic convergence of the Levenberg-Marquardt method with $\mu_k = \|F_k\|^2$ by analyzing an unconstrained optimization problem. Here, we first prove superlinear convergence of the new Levenberg-Marquardt method with $\mu_k = \|F_k\|^\delta$; then, based on the singular value decomposition of the Jacobian matrix, we obtain quadratic convergence. In the following, we denote by $\bar{x}_k$ a vector in $X^*$ that satisfies $\|\bar{x}_k - x_k\| = \mathrm{dist}(x_k, X^*)$.

Lemma 2.1. Under Assumptions 2.1 and 2.2, if $x_k \in N(x^*, b_1/2)$, then there exists a constant $c_2 > 0$ such that

    \|d_k\| \le c_2 \, \mathrm{dist}(x_k, X^*).    (2.5)


Proof. Since $x_k \in N(x^*, b_1/2)$, we have

    \|\bar{x}_k - x^*\| \le \|\bar{x}_k - x_k\| + \|x_k - x^*\| \le \|x_k - x^*\| + \|x_k - x^*\| \le b_1,

which means that $\bar{x}_k \in N(x^*, b_1)$. Hence it follows from (2.2) and (2.4) that the Levenberg-Marquardt parameter $\mu_k$ satisfies

    c_1^\delta \|\bar{x}_k - x_k\|^\delta \le \mu_k = \|F_k\|^\delta \le L_2^\delta \|\bar{x}_k - x_k\|^\delta.    (2.6)

Define

    \varphi_k(d) = \|F_k + J_k d\|^2 + \mu_k \|d\|^2.

It follows from (1.2) that $d_k$ is a stationary point of $\varphi_k(d)$; hence the convexity of $\varphi_k(d)$ implies that $d_k$ is also a minimizer of $\varphi_k(d)$. Thus, using $\bar{x}_k \in N(x^*, b_1)$, $b_1 < 1$, (2.3) and (2.6), we have

    \|d_k\|^2 \le \varphi_k(d_k)/\mu_k \le \varphi_k(\bar{x}_k - x_k)/\mu_k
              = (\|F_k + J_k(\bar{x}_k - x_k)\|^2 + \mu_k \|\bar{x}_k - x_k\|^2)/\mu_k
              \le (L_1^2/c_1^\delta) \|\bar{x}_k - x_k\|^{4-\delta} + \|\bar{x}_k - x_k\|^2
              \le (L_1^2 c_1^{-\delta} + 1) \|\bar{x}_k - x_k\|^2.

The above inequality implies $\|d_k\| \le c_2 \, \mathrm{dist}(x_k, X^*)$, where $c_2 = \sqrt{L_1^2 c_1^{-\delta} + 1}$. □

Lemma 2.2. Under Assumptions 2.1 and 2.2, if $x_{k+1}, x_k \in N(x^*, b_1/2)$, then

    \mathrm{dist}(x_k + d_k, X^*) \le c_3 \, \mathrm{dist}(x_k, X^*)^{(2+\delta)/2},    (2.7)

where $c_3 = (\sqrt{L_1^2 + L_2^\delta} + L_1 c_2^2)/c_1$.

Proof. Since

    \varphi_k(d_k) \le \varphi_k(\bar{x}_k - x_k)
                   = \|F_k + J_k(\bar{x}_k - x_k)\|^2 + \mu_k \|\bar{x}_k - x_k\|^2
                   \le L_1^2 \|\bar{x}_k - x_k\|^4 + L_2^\delta \|\bar{x}_k - x_k\|^{2+\delta}
                   \le (L_1^2 + L_2^\delta) \|\bar{x}_k - x_k\|^{2+\delta},


we have

    \|F(x_k + d_k)\| \le \|F_k + J_k d_k\| + L_1 \|d_k\|^2
                     \le \sqrt{\varphi_k(d_k)} + L_1 \|d_k\|^2
                     \le \sqrt{L_1^2 + L_2^\delta} \, \|\bar{x}_k - x_k\|^{(2+\delta)/2} + L_1 c_2^2 \|\bar{x}_k - x_k\|^2
                     \le (\sqrt{L_1^2 + L_2^\delta} + L_1 c_2^2) \|\bar{x}_k - x_k\|^{(2+\delta)/2}.

Hence

    \mathrm{dist}(x_k + d_k, X^*) \le \|F(x_k + d_k)\| / c_1 \le c_3 \, \mathrm{dist}(x_k, X^*)^{(2+\delta)/2},

where $c_3 = (\sqrt{L_1^2 + L_2^\delta} + L_1 c_2^2)/c_1$. The proof is complete. □

Theorem 2.1. Under Assumptions 2.1 and 2.2, if $x_0$ is chosen sufficiently close to $X^*$, then the sequence $\{x_{k+1} = x_k + d_k\}$ converges superlinearly to some solution $\bar{x}$ of (1.1).

Proof. Let

    r = \min\{ b_1/(2(1 + 11 c_2)), \; 1/(2 c_3^{2/\delta}) \}.

First we show by induction that if $x_0 \in N(x^*, r)$, then $x_k \in N(x^*, b_1/2)$ for all $k$. It follows from Lemma 2.1 that

    \|x_1 - x^*\| = \|x_0 + d_0 - x^*\| \le \|x_0 - x^*\| + \|d_0\|
                  \le \|x_0 - x^*\| + c_2 \|x_0 - \bar{x}_0\| \le (1 + c_2) r \le b_1/2,

which means $x_1 \in N(x^*, b_1/2)$. Suppose $x_i \in N(x^*, b_1/2)$ for $i = 2, \dots, k$. Then we have from Lemma 2.2 that

    \|x_i - \bar{x}_i\| \le c_3 \|x_{i-1} - \bar{x}_{i-1}\|^{(2+\delta)/2}
                        \le \dots \le c_3^{2(((2+\delta)/2)^i - 1)/\delta} \|x_0 - \bar{x}_0\|^{((2+\delta)/2)^i}
                        \le 2r \, (1/2)^{(3/2)^i},

where the last step uses $c_3^{2/\delta} r \le 1/2$ and $(2+\delta)/2 \ge 3/2$. Since $(3/2)^i \ge i$ for all $i \ge 1$, it follows from the definition of $r$ that

    \|x_{k+1} - x^*\| \le \|x_1 - x^*\| + \sum_{i=1}^{k} \|d_i\|
                      \le (1 + c_2) r + c_2 \sum_{i=1}^{k} \|x_i - \bar{x}_i\|
                      \le (1 + c_2) r + 2 r c_2 \sum_{i=1}^{\infty} (1/2)^i
                      \le (1 + 3 c_2) r \le (1 + 11 c_2) r \le b_1/2,


so $x_{k+1} \in N(x^*, b_1/2)$. Therefore, if $x_0$ is chosen sufficiently close to $X^*$, all $x_k$ remain in $N(x^*, b_1/2)$.

Now it follows from (2.7) that

    \sum_{k=0}^{\infty} \mathrm{dist}(x_k, X^*) < +\infty,

which implies, due to Lemma 2.1, that

    \sum_{k=0}^{\infty} \|d_k\| < +\infty.

Thus $\{x_k\}$ converges to some point $\bar{x} \in X^*$. It is obvious that

    \mathrm{dist}(x_k, X^*) \le \mathrm{dist}(x_k + d_k, X^*) + \|d_k\|.

The above inequality and (2.7) imply that

    \mathrm{dist}(x_k, X^*) \le 2 \|d_k\|    (2.8)

for all large $k$. Thus from (2.5), (2.7) and (2.8) we obtain

    \|d_{k+1}\| = O(\|d_k\|^{(2+\delta)/2}).

Therefore

    \lim_{k\to\infty} \frac{\|x_{k+1} - \bar{x}\|}{\|x_k - \bar{x}\|^{(2+\delta)/2}}
      = \lim_{k\to\infty} \frac{\|\sum_{j=k+1}^{\infty} d_j\|}{\|\sum_{j=k}^{\infty} d_j\|^{(2+\delta)/2}}
      = \lim_{k\to\infty} \frac{\|d_{k+1}\|}{\|d_k\|^{(2+\delta)/2}} \le \tilde{c},

where $\tilde{c} \ge 0$ is a constant. The above inequality implies that $\{x_k\}$ converges to the solution $\bar{x}$ quadratically when $\delta = 2$ and superlinearly when $\delta \in [1, 2)$. □

Without loss of generality, we assume that $\{x_k\}$ converges to $\bar{x} \in X^*$, and that the singular value decomposition (SVD) of $J(\bar{x})$ is

    J(\bar{x}) = \bar{U} \bar{\Sigma} \bar{V}^T
               = \bar{U} \, \mathrm{diag}(\bar{\sigma}_1, \dots, \bar{\sigma}_r, 0, \dots, 0) \, \bar{V}^T
               = \bar{U}_1 \bar{\Sigma}_1 \bar{V}_1^T,

where $\bar{\sigma}_1 \ge \bar{\sigma}_2 \ge \dots \ge \bar{\sigma}_r > 0$ and $\mathrm{rank}(\bar{\Sigma}_1) = r$. Suppose the SVD of $J(x_k)$ and its decomposition form is as follows:

    J(x_k) = U_k \Sigma_k V_k^T
           = (U_{k,1}, U_{k,2}, U_{k,3}) \,
             \mathrm{diag}(\sigma_1^{(k)}, \dots, \sigma_r^{(k)}, \sigma_{r+1}^{(k)}, \dots, \sigma_{r+q}^{(k)}, 0, \dots, 0) \,
             (V_{k,1}, V_{k,2}, V_{k,3})^T
           = U_{k,1} \Sigma_{k,1} V_{k,1}^T + U_{k,2} \Sigma_{k,2} V_{k,2}^T,

    (2.9)

where $\Sigma_{k,1}, \Sigma_{k,2} > 0$, $\mathrm{rank}(\Sigma_{k,1}) = r$ and $\mathrm{rank}(\Sigma_{k,2}) = q \ge 0$. In the following, when the context is clear, we suppress the subscript $k$ in $\Sigma_{k,i}$, $U_{k,i}$ and $V_{k,i}$ ($i = 1, 2, 3$). Consequently, (2.9) can be written as

    J_k = U_1 \Sigma_1 V_1^T + U_2 \Sigma_2 V_2^T.

To prove quadratic convergence when $\delta \in [1, 2)$, we first give the following lemma.

Lemma 2.3. Under Assumption 2.1, if $x_k \in N(x^*, b_1/2)$, then we have

(a) $\|U_1 U_1^T F_k\| \le L_2 \|\bar{x}_k - x_k\|$;
(b) $\|U_2 U_2^T F_k\| \le 2 L_1 \|x_k - \bar{x}\|^2$;
(c) $\|U_3 U_3^T F_k\| \le L_1 \|\bar{x}_k - x_k\|^2$.

Proof. Result (a) follows immediately from (2.4). By the theory of matrix perturbation [10] and Assumption 2.1(a), we have

    \|\mathrm{diag}(\Sigma_1 - \bar{\Sigma}_1, \Sigma_2, 0)\| \le \|J_k - J(\bar{x})\| \le L_1 \|x_k - \bar{x}\|.

The above relation gives

    \|\Sigma_1 - \bar{\Sigma}_1\| \le L_1 \|x_k - \bar{x}\|    and    \|\Sigma_2\| \le L_1 \|x_k - \bar{x}\|.    (2.10)

Let $s_k = -J_k^+ F_k$, where $J_k^+$ is the pseudo-inverse of $J_k$. It is easy to see that $s_k$ is a least squares solution of $\min_s \|F_k + J_k s\|$, so we obtain from (2.3) that

    \|U_3 U_3^T F_k\| = \|F_k + J_k s_k\| \le \|F_k + J_k(\bar{x}_k - x_k)\| \le L_1 \|\bar{x}_k - x_k\|^2.


Let $\tilde{J}_k = U_1 \Sigma_1 V_1^T$ and $\tilde{s}_k = -\tilde{J}_k^+ F_k$. Since $\tilde{s}_k$ is a least squares solution of $\min_s \|F_k + \tilde{J}_k s\|$, it follows from (2.3) and (2.10) that

    \|(U_2 U_2^T + U_3 U_3^T) F_k\| = \|F_k + \tilde{J}_k \tilde{s}_k\|
        \le \|F_k + \tilde{J}_k (\bar{x}_k - x_k)\|
        \le \|F_k + J_k (\bar{x}_k - x_k)\| + \|(\tilde{J}_k - J_k)(\bar{x}_k - x_k)\|
        \le L_1 \|\bar{x}_k - x_k\|^2 + \|U_2 \Sigma_2 V_2^T (\bar{x}_k - x_k)\|
        \le L_1 \|\bar{x}_k - x_k\|^2 + L_1 \|x_k - \bar{x}\| \, \|\bar{x}_k - x_k\|
        \le 2 L_1 \|x_k - \bar{x}\|^2.

Due to the orthogonality of $U_2$ and $U_3$, we get result (b). □
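The Levenberg-Marquardt step (1.2) can equivalently be written through the SVD of $J_k$, as in (2.11) below; the directions associated with zero singular values drop out. The following numerical sanity check (our own helper names, assuming NumPy) verifies that the two forms agree even when $J_k$ is singular:

```python
import numpy as np

def lm_step_svd(Jk, Fk, mu):
    """LM step via the SVD J_k = U diag(s) V^T:
    d_k = -sum_i  s_i / (s_i^2 + mu) * (u_i^T F_k) * v_i,
    so components with s_i = 0 (the U_3 part) contribute nothing."""
    U, s, Vt = np.linalg.svd(Jk)        # full SVD
    coeff = s / (s ** 2 + mu)           # filter factors; safe since mu > 0
    r = len(s)
    return -Vt.T[:, :r] @ (coeff * (U.T[:r] @ Fk))

def lm_step_direct(Jk, Fk, mu):
    """LM step from the regularized normal equations (1.2)."""
    n = Jk.shape[1]
    return np.linalg.solve(Jk.T @ Jk + mu * np.eye(n), -(Jk.T @ Fk))
```

The equivalence is deterministic despite SVD sign ambiguity, because each term pairs $u_i$ with its matching $v_i$ and the zero singular directions are annihilated by the zero filter factor.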

Theorem 2.2. Under Assumptions 2.1 and 2.2, if the sequence $\{x_k\}$ is generated by the new Levenberg-Marquardt method without line search with $x_0$ sufficiently close to $x^*$, then $\{x_k\}$ converges to a solution of (1.1) quadratically.

Proof. By the SVD of $J_k$, the step at the current iterate is

    d_k = -V_1 (\Sigma_1^2 + \mu_k I)^{-1} \Sigma_1 U_1^T F_k - V_2 (\Sigma_2^2 + \mu_k I)^{-1} \Sigma_2 U_2^T F_k.    (2.11)

So we have

    F_k + J_k d_k = F_k - U_1 \Sigma_1 (\Sigma_1^2 + \mu_k I)^{-1} \Sigma_1 U_1^T F_k - U_2 \Sigma_2 (\Sigma_2^2 + \mu_k I)^{-1} \Sigma_2 U_2^T F_k
                  = \mu_k U_1 (\Sigma_1^2 + \mu_k I)^{-1} U_1^T F_k + \mu_k U_2 (\Sigma_2^2 + \mu_k I)^{-1} U_2^T F_k + U_3 U_3^T F_k.    (2.12)

Since $\{x_k\}$ converges to $\bar{x}$ superlinearly, without loss of generality we assume that $L_1 \|x_k - \bar{x}\| < \bar{\sigma}_r/2$ holds for all sufficiently large $k$. Then we obtain from (2.10) that

    \|(\Sigma_1^2 + \mu_k I)^{-1}\| \le \|\Sigma_1^{-2}\| \le (\bar{\sigma}_r - L_1 \|x_k - \bar{x}\|)^{-2}.

3. Global Convergence of the Levenberg-Marquardt Method

In this section we consider the Levenberg-Marquardt method with a line search. One well-known inexact line search is the Wolfe line search, which chooses $\alpha_k > 0$ to satisfy

    \|F(x_k + \alpha_k d_k)\|^2 \le \|F(x_k)\|^2 + \alpha_k \beta_1 F_k^T J_k d_k    (3.1)

and

    F(x_k + \alpha_k d_k)^T J(x_k + \alpha_k d_k) d_k \ge \beta_2 F_k^T J_k d_k,    (3.2)

where $\beta_1 \le \beta_2$ are two constants in $(0, 1)$. Another famous inexact line search is the Armijo line search, which sets $\alpha_k = \tau^t \bar{\alpha}$, where $\bar{\alpha} > 0$ and $\tau \in (0, 1)$ are two positive constants, and $t$ is the smallest nonnegative integer satisfying

    \|F(x_k + \tau^t \bar{\alpha} d_k)\|^2 \le \|F(x_k)\|^2 + \beta_1 \tau^t \bar{\alpha} F_k^T J_k d_k.    (3.3)

Both inexact line searches imply that

    \|F(x_{k+1})\|^2 \le \|F(x_k)\|^2 - \beta_1 \beta_3 \, (F_k^T J_k d_k)^2 / \|d_k\|^2,    (3.4)

where $\beta_3$ is some positive constant. For more details, please see [13].
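The Armijo rule (3.3) can be sketched as a simple backtracking loop. The sketch below is ours; the default parameter values (β₁, ᾱ, τ) are illustrative assumptions, not values fixed by the paper:

```python
import numpy as np

def armijo_step(F, J, x, d, beta1=1e-4, a_bar=1.0, tau=0.5, max_t=50):
    """Armijo line search (3.3): return alpha_k = tau**t * a_bar for the
    smallest nonnegative integer t meeting sufficient decrease. A sketch."""
    Fk = F(x)
    slope = Fk @ (J(x) @ d)          # F_k^T J_k d_k; negative for a descent step
    phi0 = Fk @ Fk                   # ||F(x_k)||^2
    alpha = a_bar
    for _ in range(max_t):
        Ft = F(x + alpha * d)
        if Ft @ Ft <= phi0 + beta1 * alpha * slope:
            return alpha
        alpha *= tau                 # backtrack
    return alpha
```

For an LM step, $F_k^T J_k d_k = -d_k^T (J_k^T J_k + \mu_k I) d_k < 0$, so the loop terminates for small enough $\alpha$ whenever $d_k \ne 0$.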


Algorithm 3.1 (New Levenberg-Marquardt method with line search).

Step 1. Given $x_0 \in \mathbb{R}^n$, $\delta \in [1, 2]$, $\eta \in (0, 1)$; set $k := 0$.
Step 2. If $\|J_k^T F_k\| = 0$, stop. Set $\mu_k := \|F_k\|^\delta$ and compute $d_k$ by (1.2).
Step 3. If $d_k$ satisfies

    \|F(x_k + d_k)\| \le \eta \|F(x_k)\|,    (3.5)

then set $x_{k+1} = x_k + d_k$; otherwise set $x_{k+1} = x_k + \alpha_k d_k$, where $\alpha_k$ is obtained by the Wolfe or Armijo line search.
Step 4. Set $k := k + 1$; go to Step 2.

Theorem 3.1. Suppose Assumption 2.2 holds and $F(x)$ is continuously differentiable. Let the sequence $\{x_k\}$ be generated by Algorithm 3.1. Then any accumulation point of $\{x_k\}$ is a stationary point of $\phi(x) = \frac{1}{2}\|F(x)\|^2$. Moreover, if an accumulation point $x^*$ is a solution of the nonlinear equations (1.1) and if Assumption 2.1 holds, then the whole sequence $\{x_k\}$ converges to $x^*$ quadratically.

Proof. It is easy to see that $\|F(x_k)\|$ is monotonically decreasing and bounded below. If $\|F(x_k)\|$ converges to zero, any accumulation point of $\{x_k\}$ is a solution of (1.1). Otherwise $\|F(x_k)\| \to c > 0$, which means that (3.5) holds only finitely many times. Thus, inequality (3.4) is satisfied for all large $k$, which gives

    \sum_{k=1}^{\infty} (F_k^T J_k d_k)^2 / \|d_k\|^2 < +\infty.    (3.6)

The above inequality, the definition of $d_k$ and $\|F(x_k)\| \ge c > 0$ imply that

    (F_k^T J_k d_k)^2 = (d_k^T (J_k^T J_k + \mu_k I) d_k)^2 \ge c^{2\delta} \|d_k\|^4.    (3.7)

Relations (3.6) and (3.7) show that

    \lim_{k\to\infty} \|d_k\| = 0.    (3.8)

This limit, (1.2) and the continuity of $J(x)$ imply that at any accumulation point $x^*$ of $\{x_k\}$ we have $J(x^*)^T F(x^*) = 0$, which says that $x^*$ is a stationary point of $\phi(x)$.

We now proceed to prove the second part of the theorem. It suffices to prove that (3.5) holds for all sufficiently large $k$. Since the stationary point $x^*$ is a solution of (1.1), there exists a large $\tilde{k}$ such that $x_{\tilde{k}} \in N(x^*, r)$ and

    \|F_{\tilde{k}}\| \le ( \eta \, c_1^{(2+\delta)/2} / (L_2 c_3) )^{2/\delta},

where $c_1$, $c_3$, $r$ and $L_2$ are defined in Sect. 2. We now verify that (3.5) holds for all $k \ge \tilde{k}$. Since $x_{\tilde{k}} \in N(x^*, r)$, we have $x_k \in N(x^*, b_1/2)$ for all $k \ge \tilde{k}$. In view of (2.6), we see that

    \|F(x_{k+1})\| / \|F(x_k)\| \le L_2 \|x_{k+1} - \bar{x}_{k+1}\| / (c_1 \|x_k - \bar{x}_k\|)
        \le (L_2 c_3 / c_1) \|x_k - \bar{x}_k\|^{\delta/2}
        \le (L_2 c_3 / c_1^{(2+\delta)/2}) \|F(x_k)\|^{\delta/2}.
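Algorithm 3.1 can be rendered as the following sketch. This is our illustrative code, not the authors'; for simplicity the fallback line search is Armijo backtracking rather than the Wolfe conditions, and the tolerance and constants are assumed values:

```python
import numpy as np

def new_lm(F, J, x0, delta=1.0, eta=0.9, tol=1e-5, max_iter=500):
    """Sketch of Algorithm 3.1 (new LM method with line search)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        Fk, Jk = F(x), J(x)
        g = Jk.T @ Fk
        if np.linalg.norm(g) <= tol:                # Step 2: stationarity test
            break
        mu = np.linalg.norm(Fk) ** delta            # mu_k = ||F_k||^delta
        d = np.linalg.solve(Jk.T @ Jk + mu * np.eye(len(x)), -g)
        if np.linalg.norm(F(x + d)) <= eta * np.linalg.norm(Fk):   # (3.5)
            x = x + d
        else:                                       # Armijo fallback, cf. (3.3)
            alpha, slope, phi0 = 1.0, Fk @ (Jk @ d), Fk @ Fk
            while np.linalg.norm(F(x + alpha * d)) ** 2 > phi0 + 1e-4 * alpha * slope:
                alpha *= 0.5
                if alpha < 1e-12:
                    break
            x = x + alpha * d
    return x
```

Near a solution the test (3.5) accepts the full step, so the fast local rate of Sect. 2 takes over.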

Hence, it follows from $\|F_{\tilde{k}}\| \le (\eta \, c_1^{(2+\delta)/2} / (L_2 c_3))^{2/\delta}$ that $\|F(x_{\tilde{k}+1})\| \le \eta \|F(x_{\tilde{k}})\|$, and so $\|F(x_{k+1})\| \le \eta \|F(x_k)\|$ for all $k \ge \tilde{k} + 1$, which implies that the step size $\alpha_k = 1$ is taken for all sufficiently large $k$ in Algorithm 3.1. Thus $\{x_k\}$ converges to the solution quadratically. □

4. Numerical Results

We tested our new Levenberg-Marquardt algorithm on some singular problems and compared it with the traditional trust region algorithm for the nonlinear equations (1.1). The traditional trust region algorithm computes the trial step $d_k$ at the $k$-th iterate by solving the subproblem

    \min_{d \in \mathbb{R}^n} \|F_k + J_k d\|^2 =: \psi_k(d)
    s.t. \|d\| \le \Delta_k,    (4.1)

where $\Delta_k > 0$ is the current trust region bound. The ratio between the actual reduction and the predicted reduction of the objective,

    r_k = Ared_k / Pred_k = (\|F_k\|^2 - \|F(x_k + d_k)\|^2) / (\psi_k(0) - \psi_k(d_k)),

is used to decide whether the trial step is acceptable and to adjust the new trust region radius $\Delta_k$. The algorithm can be stated as follows.

Algorithm 4.1 (Trust region algorithm for nonlinear equations [12]).

Step 1. Given $x_1 \in \mathbb{R}^n$, $\Delta_1 > 0$, $\epsilon \ge 0$, $0 \le p_0 \le p_1 \le p_2 < 1$; set $k := 1$.
Step 2. If $\|J_k^T F_k\| \le \epsilon$, stop. Solve (4.1) giving $d_k$.
Step 3. Compute $r_k = Ared_k / Pred_k$; set

    x_{k+1} = x_k + d_k    if r_k > p_0,
    x_{k+1} = x_k          otherwise.    (4.2)

Step 4. Choose $\Delta_{k+1}$ as

    \Delta_{k+1} = \min\{\Delta_k/4, \|d_k\|/2\}    if r_k < p_1,
                 = \Delta_k                         if p_1 \le r_k \le p_2,
                 = \max\{4\|d_k\|, 2\Delta_k\}      if r_k > p_2.    (4.3)

Set $k := k + 1$; go to Step 2.
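The acceptance test (4.2) and radius update (4.3) can be sketched as below. Caveat: the case conditions of (4.3) are partly garbled in the printed source, so the brackets `r < p1`, `p1 <= r <= p2`, `r > p2` follow the usual trust-region convention and should be read as an assumption; the update values are transcribed as printed:

```python
def update_radius(r, delta, d_norm, p1=0.25, p2=0.75):
    """Trust-region radius update (4.3), a sketch.
    r:      ratio of actual to predicted reduction
    delta:  current radius Delta_k
    d_norm: length of the trial step ||d_k||"""
    if r < p1:                            # poor agreement: shrink
        return min(delta / 4.0, d_norm / 2.0)
    if r > p2:                            # very good agreement: expand
        return max(4.0 * d_norm, 2.0 * delta)
    return delta                          # acceptable: keep the radius
```

The step itself is accepted whenever $r_k > p_0$ (with $p_0 = 0.0001$ in the experiments), so a step can be taken even while the radius shrinks.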


Both the trust region algorithm and the Levenberg-Marquardt algorithm prevent the trial step from being too large, which is especially useful for solving singular nonlinear equations. However, the trust region algorithm achieves this by updating the trust region radius directly, while the Levenberg-Marquardt algorithm modifies the parameter $\mu_k$. Many papers have considered the relationship between these two algorithms; for more details, please see [4], [5], [12], [14], etc.

First we test the Powell singular function [8], where $n = 4$ and $\mathrm{rank}(J(x^*)) = 2$. The results are given in Table 1. The other test problems were created by modifying the nonsingular problems given by Moré, Garbow and Hillstrom [6], and have the same form as in [9]:

    \hat{F}(x) = F(x) - J(x^*) A (A^T A)^{-1} A^T (x - x^*),    (4.4)

where $F(x)$ is the standard nonsingular test function, $x^*$ is its root, and $A \in \mathbb{R}^{n \times k}$ has full column rank with $1 \le k \le n$. Obviously, $\hat{F}(x^*) = 0$ and

    \hat{J}(x^*) = J(x^*) (I - A (A^T A)^{-1} A^T)

has rank $n - k$. A disadvantage of these problems is that $\hat{F}(x)$ may have roots that are not roots of $F(x)$. We created two sets of singular problems, with $\hat{J}(x^*)$ having rank $n-1$ and $n-2$, by using

    A \in \mathbb{R}^{n \times 1},    A^T = (1, 1, \dots, 1)

and

    A \in \mathbb{R}^{n \times 2},    A^T = ( 1,  1, 1,  1, \dots;  1, -1, 1, -1, \dots ),

respectively. Meanwhile, we made a slight alteration to the variable dimension problem, which has $n + 2$ equations in $n$ unknowns: we eliminated the $(n-1)$-th and $n$-th equations. (The first $n$ equations in the standard problem are linear.)
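Construction (4.4) is easy to reproduce; the following is a sketch assuming NumPy, with function names of our own choosing:

```python
import numpy as np

def make_singular(F, J, x_star, A):
    """Build the rank-deficient test problem (4.4):
        F_hat(x) = F(x) - J(x*) A (A^T A)^{-1} A^T (x - x*),
    so F_hat(x*) = 0 and J_hat(x*) = J(x*)(I - A (A^T A)^{-1} A^T)
    has rank n - k when A in R^{n x k} has full column rank."""
    P = A @ np.linalg.solve(A.T @ A, A.T)     # orthogonal projector onto range(A)
    Jstar = J(x_star)
    F_hat = lambda x: F(x) - Jstar @ (P @ (x - x_star))
    J_hat = lambda x: J(x) - Jstar @ P
    return F_hat, J_hat
```

As the paper notes, the modified problem may acquire roots that the original $F$ does not have, so convergence to a root of $\hat{F}$ need not mean convergence to a root of $F$.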

respectively. Meanwhile, we made a slight alteration on the variable dimension problem, which has n þ 2 equations in n unknowns; we eliminated the ðn  1Þ-th and n-th equations. (The first n equations in the standard problem are linear.) We used p0 ¼ 0:0001; p1 ¼ 0:25 and p2 ¼ 0:75, which are popular for tests in trust region method. And we applied Algorithm 2.6 in [7] to solve the trust region subproblem (4.1) in Algorithm 4.1. And the initial trust region radius is chosen as Table 1. Results on Powell singular problem lk ¼ akF ðxk Þk Problem

n

x0

a¼1 NF

Powell singular

4

1 10 100

13 34 198

lk ¼ akF ðxk Þk2

TR

a ¼ 104 NF

a¼1 NF

a ¼ 104 NF

NF/NG

10 13 16

15 485 –

10 13 22

11 13 16

36

Jin-yan Fan and Ya-xiang Yuan

D1 ¼ kðJ1T J1 þ l1 kF1 kIÞ1 J1T F1 k:

ð4:5Þ
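For reference, the Powell singular function of Table 1 in its standard form from the literature (root $x^* = 0$, with the last two residuals quadratic, so $J(x^*)$ has rank 2):

```python
import numpy as np

def powell_singular(x):
    """Powell singular function [8]; x in R^4, root x* = 0, rank(J(x*)) = 2.
    Standard starting point: x0 = (3, -1, 0, 1)."""
    return np.array([
        x[0] + 10.0 * x[1],
        np.sqrt(5.0) * (x[2] - x[3]),
        (x[1] - 2.0 * x[2]) ** 2,
        np.sqrt(10.0) * (x[0] - x[3]) ** 2,
    ])
```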

We tested several choices of the Levenberg-Marquardt parameter in the Levenberg-Marquardt method: $\mu_k = a\|F_k\|^\delta$, with $a = 1$ or $10^{-4}$ and $\delta = 1$ or $2$. The algorithm is terminated when the norm of $J_k^T F_k$, i.e., the gradient of $\phi(x) = \frac{1}{2}\|F(x)\|^2$ at the $k$-th iterate, is less than $10^{-5}$, or when the number of iterations exceeds $100(n+1)$. The results for the first test set, of rank $n-1$, are listed in Table 2, and those for the second set, of rank $n-2$, in Table 3. The third column of each table indicates the starting point: $x_0$, $10 x_0$ and $100 x_0$, where $x_0$ is the point suggested by Moré, Garbow and Hillstrom in [6]. "NF" and "NJ" denote the numbers of function and Jacobian evaluations, respectively. We report only "NF" for the Levenberg-Marquardt method, since there "NF" equals "NJ", and likewise for the trust region method whenever "NF" and "NJ" coincide. If the method

Table 2. Results on the first singular test set with rank(F'(x*)) = n - 1

                       mu_k = a||F(x_k)||      mu_k = a||F(x_k)||^2    TR
Problem    n    x0     a=1 NF    a=1e-4 NF     a=1 NF    a=1e-4 NF     NF/NJ
1          2    1      43        15            24        15            15
                10     63        18            125       19            17
                100    234       21            –         25            21
3          2    1      64        OF            OF        OF            OF
                10     46        OF            OF        OF            OF
                100    –         OF            –         OF            OF
4          4    1      25        16            28        16            16
                10     31        19            –         18            19
                100    62        22            –         47            22
5          3    1      18        8             16        8             8
                10     22        8             68        8             8
                100    32        8             –         10            8
8          10   1      9         8             10        8             9
                10     23        23            35        23            23
                100    45        OF            OF        45            OF
9          10   1      4         4             3         3             4
                10     311       7             23        8             8
                100    100       9             515       9             10
10         30   1      13        5             6         5             6
                10     87        7             43        6             9
                100    28        10            –         10            10
11         30   1      6         12            7         –             23/13
                10     12        –             –         13            –
                100    –         –             260       –             –
12         10   1      14        14            15        14            14
                10     16        16            112       16            16
                100    36        19            –         20            19
13         30   1      14        –             –         –             9
                10     26        –             –         –             14
                100    30        –             –         –             18
14         30   1      12        11            112       11            13
                10     18        17            519       17            19
                100    24        22            –         28            24


Table 3. Results on the second singular test set with rank(F'(x*)) = n - 2

                       mu_k = a||F(x_k)||      mu_k = a||F(x_k)||^2    TR
Problem    n    x0     a=1 NF    a=1e-4 NF     a=1 NF    a=1e-4 NF     NF/NJ
1          2    1      11        11            12        11            24
                10     14        13            58        13            31
                100    17        17            –         17            38
3          2    1      –         33            35        OF            59/51
                10     3         27            14        OF            4
                100    3         114           3         OF            4
4          4    1      14        14            23        14            4
                10     17        17            –         17            –
                100    21        20            –         26            –
5          3    1      29        13            21        13            13
                10     35        14            74        14            14
                100    83        15            –         17            66/44
6          31   1      16        19            60        20            –
8          10   1      9         8             10        8             328
                10     23        23            35        23            –
                100    44        OF            –         45            –
9          10   1      4         688           –         76            4
                10     311       216           384       70            8
                100    113       10            517       10            10
10         30   1      13        –             20        –             6
                10     87        –             45        –             9
                100    142       –             –         –             10
11         30   1      –         13            9         –             20/13
                10     12        –             13        23            –
                100    –         –             261       –             –
12         10   1      14        14            15        14            –
                10     16        16            109       16            –
                100    36        19            –         20            –
13         30   1      14        –             461       –             9
                10     23        –             –         –             14
                100    30        –             –         –             18
14         30   1      12        11            12        11            13
                10     18        17            519       17            9
                100    24        22            –         28            24

failed to find the solution within $100(n+1)$ iterations, we denote this by the sign "–", and if the iterations underflow or overflow, by "OF".

From the results, we can see that our new Levenberg-Marquardt algorithm performs about the same as the traditional trust region algorithm on problems with $\mathrm{rank}(J(x^*)) = n-1$, but performs much better when $\mathrm{rank}(J(x^*)) = n-2$. Hence, it seems that our new Levenberg-Marquardt algorithm may be more efficient for nonlinear equations with higher rank deficiency. When the starting point is far away from the solution set of the nonlinear equations, the choice $a = 10^{-4}$ is better than $a = 1$, and the choice $\delta = 1$ is better than $\delta = 2$, whatever the rank of $J(x^*)$. These facts indicate that $\mu_k = \|F_k\|^2$ may be very large at the beginning of the iterations, which can lead to small steps, prevent the sequence from converging quickly, and sometimes keep the method from solving the problem within $100(n+1)$ iterations.

All in all, the Levenberg-Marquardt algorithm with parameter $\mu_k = \|F_k\|$ performs the most stably among the traditional trust region algorithm and the Levenberg-Marquardt algorithm with the other three choices of the parameter. Hence, the choice $\mu_k = \|F_k\|$ may be preferable for a problem of unknown rank. Finally, it is worth pointing out that in the $100 x_0$ case of Problem 11, when the parameter is chosen as $\mu_k = \|F_k\|^2$, the new Levenberg-Marquardt algorithm converges to a stationary point of $\min_{x \in \mathbb{R}^n} \|F(x)\|$ that is not a solution of $F(x) = 0$.

Acknowledgements. This work was supported by Chinese NSF grants 19731010 and 10231060 and by the Knowledge Innovation Program of CAS. Prof. C. T. Kelley pointed out (private communication) that the choice $\mu_k = \|F(x_k)\|$ was suggested in his book [1]. We would like to thank two anonymous referees for their valuable comments.

References

[1] Kelley, C. T.: Iterative methods for optimization (Frontiers in Applied Mathematics 18). Philadelphia: SIAM 1999.
[2] Levenberg, K.: A method for the solution of certain nonlinear problems in least squares. Quart. Appl. Math. 2, 164–166 (1944).
[3] Marquardt, D. W.: An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math. 11, 431–441 (1963).
[4] Moré, J. J.: The Levenberg-Marquardt algorithm: implementation and theory. In: Numerical analysis (Watson, G. A., ed.), Lecture Notes in Mathematics 630, pp. 105–116. Berlin: Springer 1978.
[5] Moré, J. J.: Recent developments in algorithms and software for trust region methods. In: Mathematical programming: The state of the art (Bachem, A., Grötschel, M., Korte, B., eds.), pp. 258–287. Berlin: Springer 1983.
[6] Moré, J. J., Garbow, B. S., Hillstrom, K. E.: Testing unconstrained optimization software. ACM Trans. Math. Software 7, 17–41 (1981).
[7] Nocedal, J., Yuan, Y. X.: Combining trust region and line search techniques. In: Advances in nonlinear programming (Yuan, Y., ed.), pp. 153–175. Dordrecht: Kluwer 1998.
[8] Powell, M. J. D.: An iterative method for finding stationary values of a function of several variables. Comput. J. 5, 147–151 (1962).
[9] Schnabel, R. B., Frank, P. D.: Tensor methods for nonlinear equations. SIAM J. Numer. Anal. 21, 815–843 (1984).
[10] Stewart, G. W., Sun, J. G.: Matrix perturbation theory. San Diego: Academic Press 1990.
[11] Yamashita, N., Fukushima, M.: On the rate of convergence of the Levenberg-Marquardt method. Computing (Suppl. 15), 237–249 (2001).
[12] Yuan, Y. X.: Trust region algorithms for nonlinear equations. Information 1, 7–20 (1998).
[13] Yuan, Y. X.: Problems on convergence of unconstrained optimization algorithms. In: Numerical linear algebra and optimization (Yuan, Y. X., ed.), pp. 95–107. Beijing: Science Press 1999.
[14] Yuan, Y. X.: A review of trust region algorithms for optimization. In: ICIAM 99: Proc. 4th Int. Congress on Industrial and Applied Mathematics (Ball, J. M., Hunt, J. C. R., eds.), pp. 271–282. Oxford: Oxford University Press 2000.

Jin-yan Fan, State Key Laboratory of Scientific/Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences, P.O. Box 2719, Beijing 100080, P.R. China. e-mail: [email protected]

Ya-xiang Yuan, State Key Laboratory of Scientific/Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences, P.O. Box 2719, Beijing 100080, P.R. China. e-mail: [email protected]