A NEW TRUST REGION ALGORITHM FOR EQUALITY CONSTRAINED OPTIMIZATION
THOMAS F. COLEMAN AND WEI YUAN
Abstract. We present a new trust region algorithm for solving nonlinear equality constrained optimization problems. At each iterate a change of variables is performed to improve the ability of the algorithm to follow the constraint level sets. The algorithm employs L2 penalty functions for obtaining global convergence. Under certain assumptions we prove that this algorithm globally converges to a point satisfying the second order necessary optimality conditions; the local convergence rate is quadratic. Results of preliminary numerical experiments are presented.
1. Introduction. We consider the equality constrained optimization problem
\[
\min_x \; f(x) \quad \text{subject to} \quad c(x) = 0, \tag{1.1}
\]
where $x \in \mathbb{R}^n$, $f:\mathbb{R}^n\to\mathbb{R}$, and $c:\mathbb{R}^n\to\mathbb{R}^m$.

From (2.20) and (2.23), for any $l > \bar l$,
\[
p_{\mu_i}(x_i^{(l+1)}) - p_{\mu_i}(x_i^{(l)}) \;\le\; p_{\mu_i}(x_i^{(l+)}) - p_{\mu_i}(x_i^{(l)}) \;\le\; -(1-\sigma)\,\frac{\|c(x_i^{(l)})\|^2}{\mu_i}. \tag{3.7}
\]
Thus, inequality (2.14b) holds for fixed $i$ after a finite number of iterations. Otherwise, if (2.14b) failed at an infinite subsequence of iterations $l_1 < l_2 < \cdots$, then from (3.7) and Lemma 2.1 we have
\[
p_{\mu_i}(x_i^{(l_{s+1})}) - p_{\mu_i}(x_i^{(l_s)}) \;\le\; -(1-\sigma)\,\frac{(\Delta_i^{(l_s)}\mu_i)^2}{\mu_i} \;\le\; -(1-\sigma)\,\bar\Delta^2\mu_i \;<\; 0. \tag{3.8}
\]
Inequality (3.8) contradicts the fact that $p_{\mu_i}(x)$ is bounded below for any fixed $i$. Therefore we obtain the following result.

Lemma 3.2. Suppose Assumption 3.1 is satisfied. For any fixed $i$, suppose $\{x_i^{(l)}\}$ is generated by Algorithm 2.1. Then there exists an integer $l_i > 0$ such that for all $l \ge l_i$
\[
\|c(x_i^{(l)})\| \;\le\; \Delta_i^{(l)}\mu_i. \tag{3.9}
\]
Let $\Delta > 0$ be a trust radius. Suppose that $h$ satisfies Conditions 1, 2, and 3 in Section 2 with $\Delta_i^{(l)} = \Delta$. Define
\[
r_i^{(l)}(\Delta) \;=\; \bigl[p_{\mu_i}(u_i^{(l)}(h)) - p_{\mu_i}(u_i^{(l)}(0))\bigr] \big/ \, q_i^{(l)}(h).
\]
Using Taylor's theorem, we obtain that
\[
p_{\mu_i}(u_i^{(l)}(h)) - p_{\mu_i}(u_i^{(l)}(0))
= \nabla_h p_{\mu_i}(u_i^{(l)}(0))^T h + h^T\Bigl[\int_0^1 \nabla_h^2 p_{\mu_i}(u_i^{(l)}(\tau h))(1-\tau)\,d\tau\Bigr]h
\]
\[
= q_i^{(l)}(h) + h^T\Bigl\{\int_0^1\bigl[\nabla_h^2 p_{\mu_i}(u_i^{(l)}(\tau h)) - \nabla_h^2 p_{\mu_i}(u_i^{(l)}(0))\bigr](1-\tau)\,d\tau\Bigr\}h.
\]
Subtracting $q_i^{(l)}(h)$ from both sides and then dividing both sides by $q_i^{(l)}(h)$, this becomes
\[
|r_i^{(l)}(\Delta) - 1| \;=\; \frac{\bigl|\,[p_{\mu_i}(u_i^{(l)}(h)) - p_{\mu_i}(u_i^{(l)}(0))] - q_i^{(l)}(h)\,\bigr|}{|q_i^{(l)}(h)|}
\;\le\; \frac{\|h\|^2}{|q_i^{(l)}(h)|}\int_0^1 \bigl\|\nabla_h^2 p_{\mu_i}(u_i^{(l)}(\tau h)) - \nabla_h^2 p_{\mu_i}(u_i^{(l)}(0))\bigr\|(1-\tau)\,d\tau. \tag{3.10}
\]
Based on (3.10), in Lemma 3.3 we show that (2.14 a) and (2.14 c) hold for some integer l.
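The ratio $r_i^{(l)}(\Delta)$ in (3.10) is the usual trust-region agreement measure between the penalty function and its local model, and the arguments below repeatedly appeal to the associated radius updates (shrinking by a factor of $\tfrac14$ after a rejected step, enlarging after a very successful one). The following Python sketch illustrates one generic acceptance test of this form; the parameter names eta1, eta2, delta_max and the factors 1/4 and 2 are assumptions patterned on the thresholds $\eta_1$, $\eta_2$ and the updates referred to in the proofs of Lemmas 3.3 and 4.5, not the authors' exact choices.
\begin{verbatim}
def trust_region_update(p_old, p_new, q_h, delta, norm_h,
                        eta1=0.1, eta2=0.75, delta_max=1.0):
    """Generic trust-region acceptance test and radius update (a sketch).

    p_old, p_new : penalty (merit) values before and after the trial step
    q_h          : predicted model reduction q(h), negative for a descent model
    delta        : current trust radius
    norm_h       : length of the trial step h
    """
    r = (p_new - p_old) / q_h          # both numerator and q_h are negative for a good step
    if r < eta1:                       # poor agreement: reject and shrink the radius
        return False, 0.25 * delta
    if r >= eta2 and norm_h >= 0.99 * delta:
        return True, min(2.0 * delta, delta_max)   # very good agreement: accept and enlarge
    return True, delta                 # acceptable step: keep the radius
\end{verbatim}
A driver would call this once per trial step and re-solve the model subproblem with the returned radius whenever the step is rejected.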
Lemma 3.3. Suppose Assumption 3.1 is satisfied. For fixed $i > 0$, suppose $\{x_i^{(l)}\}$ is generated by Algorithm 2.1. Then there exists a subsequence of $\{x_i^{(l)}\}$, $l \in L$, such that
\[
\lim_{l\to\infty,\; l\in L}\|g(x_i^{(l)})\| = 0 \tag{3.11}
\]
and
\[
\lim_{l\to\infty,\; l\in L}\mathrm{eig}_{\min}(H(x_i^{(l)})) \;\ge\; 0, \tag{3.12}
\]
where $g(x) = Z(x)^T\nabla f(x)$, $H(x) = Z(x)^T\nabla^2 L(x)Z(x)$, and $\mathrm{eig}_{\min}(H)$ denotes the smallest eigenvalue of a matrix $H$.

Proof. Due to Lemma 3.2, we can assume that $x_i^{(l+)} = x_i^{(l)}$ without loss of generality. There are two possible cases: either
1. $\inf_l \Delta_i^{(l)} = \bar\Delta > 0$, or
2. there exists an integer set $\bar L$ such that $\Delta_i^{(l+1)} < \Delta_i^{(l)}$ and $\Delta_i^{(l)} \to 0$ for $l \in \bar L$.
In case 1, suppose there exists an $\epsilon > 0$ such that, for all $l$, $\|g(x_i^{(l)})\| \ge \epsilon$. Then, because $\|B_i^{(l)}\| \le K$, it follows from (2.15) in Condition 1 that
\[
q_i^{(l)}(h_i^{(l)}) \;\le\; -\beta_1\,\epsilon\,\min\{\bar\Delta,\;\epsilon/K\} \;\equiv\; -\zeta. \tag{3.13}
\]
Recall that according to Algorithm 2.1, we have that for all $l$
\[
p_{\mu_i}(x_i^{(l+1)}) - p_{\mu_i}(x_i^{(l)}) \;\le\; r_i^{(l)}\,q_i^{(l)}(h_i^{(l)}) \;\le\; \eta_1\,q_i^{(l)}(h_i^{(l)}) \;\le\; -\eta_1\zeta \;<\; 0.
\]
But by the definition of $p_{\mu_i}(x)$ we know that $p_{\mu_i}(x)$ is bounded below, thus
\[
p_{\mu_i}(x_i^{(l+1)}) - p_{\mu_i}(x_i^{(l)}) \;\to\; 0 \quad\text{as}\quad l \to \infty.
\]
This contradiction establishes that there exists an integer set $\tilde L$ such that
\[
\lim_{l\to\infty,\;l\in\tilde L} g(x_i^{(l)}) = 0.
\]
Now, suppose there exists an $\epsilon > 0$ such that, for all $l \in \tilde L$, $\mathrm{eig}_{\min}(H(x_i^{(l)})) \le -\epsilon < 0$. Then, (2.16) in Condition 2 yields $q_i^{(l)}(h_i^{(l)}) \le -\tfrac12\beta_2\,\epsilon\,(\Delta_i^{(l)})^2 \le -\tfrac12\beta_2\,\epsilon\,\bar\Delta^2 < 0$. Similar to (3.13), this
inequality contradicts the fact that $p_{\mu_i}(x)$ is bounded below. Therefore (3.11) and (3.12) hold with an integer set $L \subseteq \tilde L$.

In case 2, by Assumption 3.1, $\{x_i^{(l)}\}_{l\in\bar L}$ is in a compact set $D$. Thus there exists a convergent subsequence. Without loss of generality, we assume that
\[
x_i^{(l)} \to \bar x \quad\text{and}\quad \Delta_i^{(l)} \to 0 \quad\text{for}\quad l \in \bar L.
\]
Recall that in Algorithm 2.1, if the radius $\Delta_i^{(l)}$ decreases, there must be a previously tried radius $\tilde\Delta_i^{(l)}$ for which $\Delta_i^{(l)} = \tfrac14\tilde\Delta_i^{(l)}$ and $r_i^{(l)}(\tilde\Delta_i^{(l)}) < \eta_1$. Since $\Delta_i^{(l)} \to 0$, we have
\[
r_i^{(l)}(\tilde\Delta_i^{(l)}) < \eta_1 \quad\text{and}\quad \tilde\Delta_i^{(l)} \to 0 \quad\text{for}\quad l \in \bar L. \tag{3.14}
\]
Suppose $\|g(\bar x)\| = 2\epsilon > 0$; then for $l \in \bar L$ sufficiently large, $\|g(x_i^{(l)})\| \ge \epsilon$. It follows from (2.15) that for $l \in \bar L$
\[
q_i^{(l)}(\tilde h_i^{(l)}) \;\le\; -\beta_1\,\epsilon\,\min\{\tilde\Delta_i^{(l)},\;\epsilon/K\} \;=\; -\beta_1\,\epsilon\,\tilde\Delta_i^{(l)}
\]
since $\tilde\Delta_i^{(l)} \to 0$. Thus, from (3.10), for $l \in \bar L$,
\[
|r_i^{(l)}(\tilde\Delta_i^{(l)}) - 1| \;\le\; \frac{\|\tilde h_i^{(l)}\|}{\beta_1\epsilon}\int_0^1 \bigl\|\nabla_h^2 p_{\mu_i}(u_i^{(l)}(\tau\tilde h_i^{(l)})) - \nabla_h^2 p_{\mu_i}(u_i^{(l)}(0))\bigr\|(1-\tau)\,d\tau \;\to\; 0.
\]
This contradicts (3.14). Thus (3.11) holds with $L = \bar L$.

Suppose $\mathrm{eig}_{\min}(H(\bar x)) = -2\epsilon < 0$; then for $l \in \bar L$ sufficiently large, $\mathrm{eig}_{\min}(H(x_i^{(l)})) \le -\epsilon$. Thus, (2.16) implies that for $l \in \bar L$
\[
q_i^{(l)}(\tilde h_i^{(l)}) \;\le\; -\frac{\beta_2\,\epsilon\,(\tilde\Delta_i^{(l)})^2}{2}.
\]
It follows from (3.10) that for $l \in \bar L$
\[
|r_i^{(l)}(\tilde\Delta_i^{(l)}) - 1| \;\le\; \frac{2}{\beta_2\epsilon}\int_0^1 \bigl\|\nabla_h^2 p_{\mu_i}(u_i^{(l)}(\tau\tilde h_i^{(l)})) - \nabla_h^2 p_{\mu_i}(u_i^{(l)}(0))\bigr\|(1-\tau)\,d\tau \;\to\; 0
\]
since $\|\tilde h_i^{(l)}\| \le \tilde\Delta_i^{(l)} \to 0$. This also contradicts (3.14). Therefore (3.12) holds with $L = \bar L$.

Lemmas 3.2 and 3.3 indicate that for any fixed $i$ there exists an integer $l_i > 0$ such that for $l = l_i$ the criteria in (2.14) are satisfied. Thus, we obtain the following results.
Lemma 3.4. Suppose Assumption 3.1 is satisfied. For every integer $i$, let $l_i$ denote the first integer $l \ge 0$ for which the criteria in (2.14) are satisfied. Then Algorithm 2.1 generates a sequence $\{x_k\}$ such that $x_k = x_i^{(l)}$, where $k = k(i,l) = \sum_{j=1}^{i-1} l_j + l$ and $l \le l_i$. Furthermore,
\[
\lim_{k\to\infty}\bigl[\|Z(x_k)^T\nabla f(x_k)\| + \|c(x_k)\|\bigr] = 0,
\]
and
\[
\lim_{k\to\infty}\mathrm{eig}_{\min}\bigl[Z(x_k)^T\nabla^2 L(x_k)Z(x_k) + \epsilon_i I\bigr] \;\ge\; 0.
\]
Proof. Since the criteria in (2.14) are satisfied for $l = l_i$, according to Algorithm 2.1, we reduce the penalty parameter $\mu_i$ to $\mu_{i+1}$, set $x_{i+1}^{(0)} = x_i^{(l_i)}$, and start another inner loop with the new parameter $\mu_{i+1}$. Therefore, Algorithm 2.1 generates the sequence
\[
x_0^{(0)},\ldots,x_0^{(l_0-1)},\; x_1^{(0)},\ldots,x_1^{(l_1-1)},\; x_2^{(0)},\ldots\;\ldots\;\ldots,\; x_{i-1}^{(l_{i-1}-1)},\; x_i^{(0)},\ldots,x_i^{(l_i)},\ldots.
\]
We can reindex the sequence $\{x_i^{(l)}\}$ as $\{x_k\}$ such that $x_k = x_i^{(l)}$, where $k = k(i,l) = \sum_{j=1}^{i-1} l_j + l$ with $l \le l_i$. Since (2.14) holds for $l = l_i$ and $\mu_i$ tends to zero, we get
\[
\lim_{i\to\infty}\bigl[\|Z(x_i^{(l_i)})^T\nabla f(x_i^{(l_i)})\| + \|c(x_i^{(l_i)})\|\bigr] = 0
\]
and
\[
\lim_{i\to\infty}\mathrm{eig}_{\min}\bigl[Z(x_i^{(l_i)})^T\nabla^2 L(x_i^{(l_i)})Z(x_i^{(l_i)}) + \epsilon_i I\bigr] \;\ge\; 0.
\]
Thus, Lemma 3.4 is proved.
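For orientation, the nesting of the outer loop over $i$ (which reduces $\mu_i$) and the inner trust-region loop over $l$ described in this proof can be summarized by the schematic Python loop below. The stopping tests standing in for the criteria (2.14) are simplified, the routine inner_step is hypothetical, and the update mu = mu**1.2 mirrors the relation $\mu_i = \mu_{i-1}^{6/5}$ assumed in the proof of Lemma 3.5; none of this is the authors' implementation.
\begin{verbatim}
import numpy as np

def outer_loop(x0, mu0=0.1, mu_min=1e-12, inner_step=None,
               reduced_grad=None, constraint=None):
    """Schematic double loop of Algorithm 2.1 (a sketch, not the paper's code)."""
    x, mu, iterates = x0, mu0, [x0]
    while mu > mu_min:
        while True:
            # simplified stand-ins for the inner stopping criteria (2.14)
            if (np.linalg.norm(reduced_grad(x)) ** 2 <= mu
                    and np.linalg.norm(constraint(x)) <= mu):
                break
            x = inner_step(x, mu)      # one trust-region step on the penalty function
            iterates.append(x)
        mu = mu ** 1.2                 # mu_{i+1} = mu_i^{6/5}
    return x, iterates
\end{verbatim}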
From the proof of Lemma 3.3 we see that Condition 1 yields (3.11), and Condition 2 implies (3.12). Thus, if $h_i^{(l)}$ does not satisfy Conditions 2 and 3, we can still establish (3.11) from Condition 1. As we will see in the rest of this section, $x_k$ converges to a point satisfying the first order necessary optimality conditions if $h_i^{(l)}$ in Algorithm 2.1 satisfies Condition 1. Before we prove in Lemmas 3.6 and 3.7 that all limit points of $\{x_k\}$ satisfy the first order necessary conditions, we need the following lemma.

Lemma 3.5. Suppose Assumption 3.1 is satisfied. Assume $\{x_i^{(l)}\}$ is generated by Algorithm 2.1. Then
\[
\sum_{i=1}^{\infty}\sum_{l=0}^{l_i-1}\bigl[p_{\mu_i}(x_i^{(l)}) - p_{\mu_i}(x_i^{(l+1)})\bigr] \;<\; +\infty. \tag{3.15}
\]
Proof. Due to (2.19), we assume that $i_0$ is an integer such that $\mu_i = \mu_{i-1}^{6/5}$ for all $i \ge i_0$. Since $x_i^{(0)} = x_{i-1}^{(l_{i-1})}$, we have that
\[
\sum_{i=i_0+1}^{\infty}\sum_{l=0}^{l_i-1}\bigl[p_{\mu_i}(x_i^{(l)}) - p_{\mu_i}(x_i^{(l+1)})\bigr]
= \sum_{i=i_0+1}^{\infty}\bigl[p_{\mu_i}(x_i^{(0)}) - p_{\mu_i}(x_i^{(l_i)})\bigr]
\]
\[
= \sum_{i=i_0+1}^{\infty}\bigl[p_{\mu_i}(x_i^{(0)}) - p_{\mu_{i-1}}(x_i^{(0)})\bigr]
+ \sum_{i=i_0+1}^{\infty}\bigl[p_{\mu_{i-1}}(x_{i-1}^{(l_{i-1})}) - p_{\mu_i}(x_i^{(l_i)})\bigr]
\;\le\; \sum_{i=i_0+1}^{\infty}\bigl[p_{\mu_i}(x_i^{(0)}) - p_{\mu_{i-1}}(x_i^{(0)})\bigr] + N_2, \tag{3.16}
\]
where $N_2 = p_{\mu_{i_0}}(x_{i_0}^{(l_{i_0})}) - \inf_i p_{\mu_i}(x_i^{(l_i)})$ is a constant since $p_{\mu_i}(x)$ is bounded below. It follows from Assumption 3.1 that there exists a constant $N_1 > 0$ such that for all integers $i > 0$ and $0 \le l \le l_i$,
\[
\Delta_i^{(l)} \;\le\; \max\{\,K_0\|\nabla f(x_i^{(l)})\|/\sigma,\;1\,\} \;\le\; N_1.
\]
Thus
\[
\|c(x_i^{(0)})\| \;\le\; \Delta_{i-1}^{(l_{i-1})}\mu_{i-1} \;\le\; N_1\,\mu_{i-1}. \tag{3.17}
\]
From (3.17) and $\mu_i = \mu_{i-1}^{6/5}$, we get that
\[
p_{\mu_i}(x_i^{(0)}) - p_{\mu_{i-1}}(x_i^{(0)})
= \Bigl[\frac{1}{2\mu_i} - \frac{1}{2\mu_{i-1}}\Bigr]\|c(x_i^{(0)})\|^2
\;\le\; \frac{N_1^2\,\mu_{i-1}^2}{2\mu_i}
\;\le\; N_1^2\,\mu_{i-1}^{4/5}.
\]
Thus, (3.16) yields
\[
\sum_{i=i_0}^{\infty}\sum_{l=0}^{l_i-1}\bigl[p_{\mu_i}(x_i^{(l)}) - p_{\mu_i}(x_i^{(l+1)})\bigr]
\;\le\; N_1^2\sum_{i=1}^{\infty}\mu_{i-1}^{4/5} + N_2
\;\le\; \frac{N_1^2\,\mu_0^{4/5}}{1 - \mu_0^{4/5}} + N_2.
\]
Therefore, (3.15) is established.

Because of (3.15), we are able to prove in Lemmas 3.6 and 3.7 that any limit point of the sequence $\{x_k\}$ described in Lemma 3.4 satisfies the first order necessary optimality conditions.

Lemma 3.6. Suppose Assumption 3.1 is satisfied. Suppose $\{x_k\}$ is the sequence described in Lemma 3.4. Then
\[
\lim_{k\to\infty}\|c(x_k)\| = 0. \tag{3.18}
\]
Proof. To establish (3.18), we define
\[
d_k = \begin{cases}
\dfrac{\|c(x_k)\|^2}{\mu_i} - c(x_k)^T\lambda_k, & \text{if } \|c(x_k)\| > \Delta_k\mu_i;\\[4pt]
0, & \text{otherwise.}
\end{cases}
\]
Therefore, $d_k \ge 0$. It is obvious from Lemma 3.1 and Lemma 2.1 that
\[
d_k \;\le\; d_i^{(l)} \;\le\; p_{\mu_i}(x_i^{(l)}) - p_{\mu_i}(x_i^{(l+)}) \;\le\; p_{\mu_i}(x_i^{(l)}) - p_{\mu_i}(x_i^{(l+1)}),
\]
which, with (3.15), implies that
\[
\sum_{k=k_0}^{\infty} d_k \;=\; \sum_{i=1}^{\infty}\sum_{l=0}^{l_i-1} d_i^{(l)} \;<\; +\infty. \tag{3.19}
\]
Therefore, $\lim_{k\to\infty} d_k = 0$. Notice that, since $\mu_i\|\lambda_k\| \le \nu_k\,\Delta_k\mu_i$ with $\nu_k \le \nu < 1$, we have that if $\|c(x_k)\| > \Delta_k\mu_i$,
\[
d_k \;\ge\; \frac{\|c(x_k)\|}{\mu_i}\bigl[\|c(x_k)\| - \mu_i\|\lambda_k\|\bigr]
\;\ge\; \frac{\|c(x_k)\|}{\mu_i}\,(1-\nu_k)\,\Delta_k\mu_i
\;\ge\; (1-\nu)\,\|c(x_k)\|. \tag{3.20}
\]
Therefore, $\|c(x_k)\| \le \max\{\,d_k/(1-\nu),\;\Delta_k\mu_i\,\}$, which implies (3.18) since $\mu_i \to 0$ and $d_k \to 0$.

Lemma 3.7. Suppose Assumption 3.1 is satisfied. Suppose $\{x_k\}$ is the sequence described in Lemma 3.4. Then
\[
\lim_{k\to\infty}\|Z(x_k)^T\nabla f(x_k)\| = 0. \tag{3.21}
\]
Proof. Let $g(x) = Z(x)^T\nabla f(x)$ and let $x_i^{(l_i+)}$ denote $x_{i+1}^{(0+)}$. We first prove by contradiction that $\|g(x_i^{(l_i+)})\| \to 0$. Without loss of generality, we suppose that for any $i$ there is an integer $k_i$ ($1 \le k_i < l_i$) such that $\|g(x_i^{(k_i+)})\| \ge \epsilon > 0$. Because of (3.1) and the definition of $v_i^{(l)}$, it follows from (3.18) and Taylor's theorem that for all $0 \le l < l_i$ and all $i$
\[
\|g(x_i^{(l+)})\| = \|g(x_i^{(l)} + \alpha_i^{(l)}Y_i^{(l)}v_i^{(l)})\| = \|g(x_i^{(l)})\| + O(\|c(x_i^{(l)})\|). \tag{3.22}
\]
Since $\|g(x_i^{(l_i)})\| \le \mu_i^{1/2}$ and $\mu_i \to 0$, (3.18) implies that $\|g(x_i^{(l_i+)})\| < \epsilon/2$ for $i$ sufficiently large. Assume that $l = j_i$ is the nearest index to $k_i$ in the integer set $\{\,l \in (k_i, l_i] : \|g(x_i^{(l+)})\| < \epsilon/2\,\}$. Then $\|g(x_i^{(l+)})\| \ge \epsilon/2$ for $l$ satisfying $k_i \le l < j_i$. Hence, from (2.15), for $k_i \le l \le j_i - 1$,
\[
p_{\mu_i}(x_i^{(l)}) - p_{\mu_i}(x_i^{(l+1)}) \;\ge\; -r_i^{(l)}q_i^{(l)}(h_i^{(l)}) \;\ge\; -\eta_1\, q_i^{(l)}(h_i^{(l)}) \;\ge\; \tfrac{\epsilon}{2}\,\beta_1\eta_1\min\{\Delta_i^{(l)},\;\epsilon/(2K)\}. \tag{3.23}
\]
Inequality (3.23) leads to
\[
p_{\mu_i}(x_i^{(k_i)}) - p_{\mu_i}(x_i^{(j_i)})
= \sum_{l=k_i}^{j_i-1}\bigl[p_{\mu_i}(x_i^{(l)}) - p_{\mu_i}(x_i^{(l+1)})\bigr]
\;\ge\; \sum_{l=k_i}^{j_i-1}\tfrac{\epsilon}{2}\,\beta_1\eta_1\min\{\Delta_i^{(l)},\;\epsilon/(2K)\}
\;\ge\; \sum_{l=k_i}^{j_i-1}\tfrac{\epsilon}{2}\,\beta_1\eta_1\min\{\|h_i^{(l)}\|,\;\epsilon/(2K)\}. \tag{3.24}
\]
Combining (3.15) and (3.24), we have that $\sum_{l=k_i}^{j_i-1}\|h_i^{(l)}\| \to 0$ as $i \to \infty$. Because of Assumption 3.1(2), (3.19) and (3.20) yield
\[
\sum_{l=k_i}^{j_i}\|v_i^{(l)}\| \;\le\; K_0\sum_{l=k_i}^{j_i}\|c(x_i^{(l)})\| \;\le\; \frac{K_0}{1-\nu}\sum_{l=k_i}^{j_i} d_i^{(l)} \;\to\; 0, \quad\text{as } i \to \infty.
\]
Therefore, the continuity of $u(h)$ implies that there exists a constant $\bar K > 0$ such that
\[
\|x_i^{(k_i+)} - x_i^{(j_i+)}\| \;\le\; \sum_{l=k_i}^{j_i-1}\|x_i^{(l+)} - x_i^{(l+1+)}\| \;\le\; \bar K\Bigl(\sum_{l=k_i}^{j_i-1}\|h_i^{(l)}\| + \sum_{l=k_i+1}^{j_i}\|v_i^{(l)}\|\Bigr) \;\to\; 0 \tag{3.25}
\]
as $i \to \infty$. Due to the continuity of $Z(x)$ and $\nabla f(x)$ and (3.25), we have that
\[
\|g(x_i^{(k_i+)}) - g(x_i^{(j_i+)})\| \;\le\; \epsilon/4
\]
for all $i$ sufficiently large. Therefore, when $i$ is sufficiently large,
\[
\|g(x_i^{(k_i+)})\| \;\le\; \|g(x_i^{(k_i+)}) - g(x_i^{(j_i+)})\| + \|g(x_i^{(j_i+)})\| \;\le\; \epsilon/4 + \epsilon/2 \;=\; 3\epsilon/4.
\]
This contradicts the assumption that $\|g(x_i^{(k_i+)})\| \ge \epsilon$. Since $\|g(x_i^{(l+)})\| \to 0$, it follows from (3.22) and (3.18) that (3.21) holds.
Now we further assume that the sequence $\{x_k\}$ has only a finite number of limit points. We end this section by showing in Theorem 3.1 that in this case there is actually only one limit point $x^*$ of $\{x_k\}$, and that the second order necessary optimality conditions are satisfied at $x^*$.

Theorem 3.1. Suppose Assumption 3.1 is satisfied, $\{x_k\}$ is the sequence described in Lemma 3.4, and there are only a finite number of limit points of $\{x_k\}$. Then
\[
\lim_{k\to\infty} x_k = x^*,
\]
where $x^*$ is a point at which the second order necessary optimality conditions are satisfied.

Proof. We first prove by contradiction that the sequence converges. Suppose $\{x_k\}$ does not converge. Since there are only a finite number of limit points of $\{x_k\}$, every limit point is an isolated one. Thus, Lemma 4.10 of [19] yields that there exist a subsequence $\{x_{k_j}\}$ of $\{x_k\}$ and an $\epsilon > 0$ such that $\|x_{k_j+1} - x_{k_j}\| \ge \epsilon$ for all $j$. Inequality (3.25) shows that $\|v_k\|$ tends to zero, which implies that $\|x_{k+1} - x_k\| \to 0$ because of (2.18). This contradiction shows that $\{x_k\}$ converges. Note that $\epsilon_i$ goes to zero in criterion (2.14c). Since $\{x_k\}$ converges to a point $x^*$, from Lemma 3.4 we know that the second order necessary optimality conditions are satisfied at $x^*$.
4. Local Quadratic Convergence. Theorem 3.1 shows that, under certain assumptions, the sequence $\{x_k\}$ converges to a point $x^*$ satisfying the second order necessary conditions. We show in this section that the convergence rate is quadratic in a neighborhood of $x^*$. Throughout this section we let $B_i^{(l+)}$ denote the reduced Hessian matrix $H(x_i^{(l+)})$.

Assumption 4.1. The point $x^*$ is a local minimizer of problem (1.1) where the following conditions hold.
1. The functions $\nabla^2 f(x)$ and $\nabla^2 c_i(x)$, $i = 1,\ldots,m$, are Lipschitz continuous in a neighborhood of $x^*$.
2. The matrix $A(x^*)$ has full column rank.
3. The reduced Hessian matrix $Z_*^T\nabla_x^2 L(x^*,\lambda^*)Z_*$ is positive definite.

Note that the full column rank assumption in Assumption 4.1 implies that there is a constant $K_0$ such that $\|Y_i^{(l)}(R_i^{(l)})^{-T}\| \le K_0$ when $x_i^{(l)}$ is in a neighborhood of $x^*$. In Lemmas 4.1, 4.2 and 4.3 we establish some preliminary results. Lemma 4.1 is due to Byrd and Nocedal [2]. It says that when $x$ is sufficiently close to $x^*$, the distance between $x$ and $x^*$ is "equivalent" to the sum of the norms of the reduced gradient of $f(x)$ and the constraint function $c(x)$.

Lemma 4.1. Suppose Assumption 4.1 holds. Then
\[
K_1\|x - x^*\|^2 \;\le\; \|Z(x)^T\nabla f(x)\|^2 + \|c(x)\|^2 \;\le\; K_2\|x - x^*\|^2 \tag{4.1}
\]
for all $x$ sufficiently near $x^*$.

Proof. See [2].
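The quantities $Y(x)$, $Z(x)$ and $R(x)$ that appear throughout, and the error measure in (4.1), can be computed in practice from a QR factorization of the constraint Jacobian $A(x)$. The sketch below is one standard way to do this in Python; the full QR convention and the vertical step $v = -R^{-T}c(x)$ are assumptions consistent with the identities used in this excerpt (e.g., $c(x_i^{(l)}) + (R_i^{(l)})^T v_i^{(l)} = 0$ in (4.9)), since the paper's own definitions appear in Section 2, which is not reproduced here.
\begin{verbatim}
import numpy as np

def reduced_quantities(A, grad_f, c):
    """A (n x m) Jacobian with full column rank, grad_f (n,), c (m,).

    Returns the reduced gradient Z^T grad_f, the vertical step Y v with
    v = -R^{-T} c, and the error measure ||Z^T grad_f||^2 + ||c||^2 of (4.1).
    """
    n, m = A.shape
    Q, R_full = np.linalg.qr(A, mode="complete")   # A = Q [R; 0]
    Y, Z = Q[:, :m], Q[:, m:]                      # range and null-space bases
    R = R_full[:m, :]
    g = Z.T @ grad_f                               # reduced gradient
    v = np.linalg.solve(R.T, -c)                   # vertical (normal) step coefficients
    err = float(g @ g + c @ c)
    return g, Y @ v, err
\end{verbatim}
With these quantities, the reduced Newton system of this section is posed in the $(n-m)$-dimensional null-space coordinates.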
In Lemma 4.2 we demonstrate that the unit step length $\alpha_i^{(l)} = 1$ is admissible after a certain number of iterations.

Lemma 4.2. Suppose Assumption 4.1 holds. Suppose the sequence generated by Algorithm 2.1 converges to $x^*$. If $\sigma < 1 - \frac{1}{\sqrt{2}}$, then there exists an integer $i_1$ such that for $i \ge i_1$ and $0 \le l \le l_i$,
\[
\alpha_i^{(l)} = 1.
\]
Proof. We need to prove that for sufficiently large $i$
\[
p_{\mu_i}(x_k + Y_k v_k) \;\le\; p_{\mu_i}(x_k) + \sigma\,\nabla p_{\mu_i}(x_k)^T Y_k v_k \tag{4.2}
\]
whenever (2.14b) does not hold. Lemma 3.6 tells us that both $\|c(x_k)\|$ and $d_k$ tend to zero as $k \to \infty$. Thus, from the definition of $v_k$ and $d_k$, we get that $\|Y_k v_k\|^2/\mu_i \le K_0^2\|c(x_k)\|^2/\mu_i \to 0$. Since $\|c(x_k)\| \to 0$ and $\mu_i \to 0$, a simple calculation yields
\[
v_k^T Y_k^T\nabla^2 p_{\mu_i}(x_k)Y_k v_k \;=\; \frac{\|c(x_k)\|^2}{\mu_i} + o\Bigl(\frac{\|Y_kv_k\|^2}{\mu_i}\Bigr) \;=\; \frac{\|c(x_k)\|^2}{\mu_i} + o\Bigl(\frac{\|c(x_k)\|^2}{\mu_i}\Bigr).
\]
From Taylor's theorem, we obtain that
\[
p_{\mu_i}(x_k + Y_kv_k) = p_{\mu_i}(x_k) + \nabla p_{\mu_i}(x_k)^TY_kv_k + \tfrac12 v_k^TY_k^T\nabla^2 p_{\mu_i}(x_k)Y_kv_k + o\Bigl(\frac{\|Y_kv_k\|^2}{\mu_i}\Bigr)
\]
\[
= p_{\mu_i}(x_k) + \sigma\,\nabla p_{\mu_i}(x_k)^TY_kv_k + (1-\sigma)\,\nabla p_{\mu_i}(x_k)^TY_kv_k + \frac{\|c(x_k)\|^2}{2\mu_i} + o\Bigl(\frac{\|c(x_k)\|^2}{\mu_i}\Bigr). \tag{4.3}
\]
It follows from (2.21) and (2.14b) that
\[
\nabla p_{\mu_i}(x_k)^TY_kv_k \;\le\; \|\lambda_k\|\,\|c(x_k)\| - \frac{\|c(x_k)\|^2}{\mu_i} \;\le\; -(1-\sigma)\,\frac{\|c(x_k)\|^2}{\mu_i}. \tag{4.4}
\]
Since $-(1-\sigma)^2 + \tfrac12 < 0$ by assumption, (4.4) implies that
\[
(1-\sigma)\,\nabla p_{\mu_i}(x_k)^TY_kv_k + \frac{\|c(x_k)\|^2}{2\mu_i} + o\Bigl(\frac{\|c(x_k)\|^2}{\mu_i}\Bigr)
\;\le\; \Bigl[-(1-\sigma)^2 + \tfrac12\Bigr]\frac{\|c(x_k)\|^2}{\mu_i} + o\Bigl(\frac{\|c(x_k)\|^2}{\mu_i}\Bigr)
\;\le\; 0. \tag{4.5}
\]
Combining (4.3) and (4.5), we obtain (4.2).

Due to Lemma 4.2, we assume that $\alpha_i^{(l)} = 1$. Therefore, we can establish two important inequalities in Lemma 4.3.

Lemma 4.3. Suppose Assumption 4.1 holds. Assume the sequence generated by Algorithm 2.1 converges to $x^*$. Then there exist an integer $i_3 > 0$ and a constant $K_3 > 0$ such that
\[
\|c(x_i^{(l)})\| \;\le\; K_3\,\mu_{i-1} \tag{4.6}
\]
and
\[
\|c(x_i^{(l+)})\| \;\le\; K_3\,\mu_i \tag{4.7}
\]
for all $0 \le l \le l_i$ whenever $i \ge i_3$.

Proof. First, by Assumption 4.1, there exists a constant $K_3 \ge 1$ such that $\Delta_i^{(l)} \le K_3$. If (2.14b) holds, then, by Algorithm 2.1, $x_i^{(l+)} = x_i^{(l)}$. Hence
\[
\|c(x_i^{(l+)})\| = \|c(x_i^{(l)})\| \;\le\; \Delta_i^{(l)}\mu_i \;\le\; K_3\,\mu_i \;\le\; K_3\,\mu_{i-1}, \tag{4.8}
\]
which means that both (4.6) and (4.7) are true. If (2.14b) does not hold, then, since $\alpha_i^{(l)} = 1$ when $i$ is sufficiently large, Taylor's theorem yields that there exists a constant $N_2 > 0$ such that
\[
\|c(x_i^{(l+)})\| = \|c(x_i^{(l)} + Y_i^{(l)}v_i^{(l)})\|
= \|c(x_i^{(l)}) + (A_i^{(l)})^TY_i^{(l)}v_i^{(l)}\| + O(\|v_i^{(l)}\|^2)
\;\le\; \|c(x_i^{(l)}) + (R_i^{(l)})^Tv_i^{(l)}\| + N_2\|c(x_i^{(l)})\|^2
= N_2\|c(x_i^{(l)})\|^2. \tag{4.9}
\]
We prove (4.6) by induction as follows. Algorithm 2.1 shows that (4.6) is true for $l = 0$. Suppose that (4.6) holds for $l = j$. It follows from (2.5) and (2.18) that there exists a constant $\bar K > 0$ such that
\[
\|c(x_i^{(j+1)})\| \;\le\; \|c(x_i^{(j+)})\| + \bar K\|h_i^{(j)}\|^3 \;\le\; \|c(x_i^{(j+)})\| + \bar K\,\mu_{i-1}^{6/5}. \tag{4.10}
\]
Thus, combining (4.8), (4.9), and (4.10), we obtain that

an integer $i_5 > 0$ and a constant $K_5$, independent of $i$, such that for $i \ge i_5$ and $0 \le l \le l_i$
\[
\bigl|\,[p_{\mu_i}(u_i^{(l)}(h_i^{(l)})) - p_{\mu_i}(u_i^{(l)}(0))] - [f(u_i^{(l)}(h_i^{(l)})) - f(u_i^{(l)}(0))]\,\bigr|
= \frac{1}{2\mu_i}\Bigl|\,\|c(u_i^{(l)}(h_i^{(l)}))\|^2 - \|c(u_i^{(l)}(0))\|^2\,\Bigr|
\;\le\; \frac{O(\mu_i\|h_i^{(l)}\|^3)}{2\mu_i}
\;\le\; K_5\|h_i^{(l)}\|^3. \tag{4.13}
\]
Define
\[
G_i^{(l)} = \int_0^1\nabla_h^2 f(u(\tau h_i^{(l)}))(1-\tau)\,d\tau
\quad\text{and}\quad
H_i^{(l+)} = Z(x_i^{(l+)})^T\nabla^2 L(x_i^{(l+)})Z(x_i^{(l+)}).
\]
Since $u(0) = x_i^{(l+)}$ and using (2.6), we have that $H_i^{(l+)} = \nabla_h^2 f(u_i^{(l)}(0))$. By the Lipschitz continuity of $\nabla^2 L(x)$ and $u(h)$, there exists a constant $N_1$ such that for any $0 \le \tau \le 1$,
\[
\|\nabla_h^2 f(u_i^{(l)}(\tau h_i^{(l)})) - H_i^{(l+)}\| = \|\nabla_h^2 f(u_i^{(l)}(\tau h_i^{(l)})) - \nabla_h^2 f(u(0))\| \;\le\; N_1\|h_i^{(l)}\|. \tag{4.14}
\]
Thus, we have
\[
\bigl\|G_i^{(l)} - \tfrac12 H_i^{(l+)}\bigr\| = \Bigl\|\int_0^1\bigl[\nabla_h^2 f(u(\tau h_i^{(l)})) - H_i^{(l+)}\bigr](1-\tau)\,d\tau\Bigr\| \;\le\; \tfrac12 N_1\|h_i^{(l)}\|.
\]
Using Taylor's theorem, it follows from (4.14) that
\[
f(u_i^{(l)}(h_i^{(l)})) - f(u_i^{(l)}(0)) = g(x_i^{(l+)})^Th_i^{(l)} + (h_i^{(l)})^TG_i^{(l)}h_i^{(l)}
\;\le\; q_i^{(l)}(h_i^{(l)}) + \bigl\|G_i^{(l)} - \tfrac12 H_i^{(l+)}\bigr\|\,\|h_i^{(l)}\|^2
\;\le\; q_i^{(l)}(h_i^{(l)}) + N_1\|h_i^{(l)}\|^3. \tag{4.15}
\]
Combining (4.13) and (4.15), we obtain that
\[
\bigl|\,[p_{\mu_i}(u_i^{(l)}(h_i^{(l)})) - p_{\mu_i}(u_i^{(l)}(0))] - q_i^{(l)}(h_i^{(l)})\,\bigr|
\;\le\; \bigl|\,[f(u_i^{(l)}(h_i^{(l)})) - f(u_i^{(l)}(0))] - q_i^{(l)}(h_i^{(l)})\,\bigr| + K_5\|h_i^{(l)}\|^3
\;\le\; (N_1 + K_5)\|h_i^{(l)}\|^3.
\]
Therefore, (4.12) holds with $K_6 = N_1 + K_5$.
When $x_i^{(l+)}$ is sufficiently close to $x^*$, $H(x_i^{(l+)})$ is positive definite. By means of Lemma 4.4 we prove in Lemma 4.5 that, if (4.16) is satisfied, then an approximate Newton step is taken, i.e., inequality (4.17) holds. It is obvious from Algorithm 2.1 that (4.16) is true for $l = 0$. In Lemma 4.6 we show inductively that (4.16) holds for all $l$ with $0 \le l < l_i$ when $i$ is sufficiently large. Thus, approximate Newton steps are eventually taken for all $0 \le l < l_i$ when $i$ is sufficiently large.

Lemma 4.5. Suppose Assumption 4.1 holds. Assume the sequence generated by Algorithm 2.1 converges to $x^*$. Then there exists an integer $i_7$ such that for all $i \ge i_7$ and $0 \le l \le l_i$, if
\[
\|Z(x_i^{(l)})^T\nabla f(x_i^{(l)})\| \;\le\; \mu_{i-1}^{1/2}, \tag{4.16}
\]
then
\[
\|H_i^{(l+)}h_i^{(l)} + Z(x_i^{(l+)})^T\nabla f(x_i^{(l+)})\| \;\le\; \theta\,\|Z(x_i^{(l+)})^T\nabla f(x_i^{(l+)})\|. \tag{4.17}
\]
Proof. We first show that, regardless of inequality (4.16), $\Delta_i^{(l)} = 2\mu_{i-1}^{2/5}$ for $i$ sufficiently large. It follows from Algorithm 2.1 and (4.6) that
\[
\|x_i^{(l+)} - x_i^{(l)}\| \;\le\; \|Y_i^{(l)}v_i^{(l)}\| \;\le\; K_0\|c(x_i^{(l)})\| \;\le\; K_0K_3\,\mu_{i-1},
\]
which yields that $\|x_i^{(l+)} - x_i^{(l)}\| \to 0$. Thus $x_i^{(l+)}$ converges to $x^*$ since $x_i^{(l)}$ tends to $x^*$. Since $H(x^*) = Z_*^T\nabla^2 L(x^*,\lambda^*)Z_*$ is positive definite, $H(x_i^{(l+)})$ is positive definite for $i$ sufficiently large. Due to the continuity of $H(x)$, it follows that $\|H(x_i^{(l+)})\| \le 2\|H(x^*)\|$ and $\|H(x_i^{(l+)})^{-1}\| \le 2\|H(x^*)^{-1}\|$ when $i$ is sufficiently large. According to Algorithm 2.1, when $H(x_i^{(l+)})$ is positive definite, Condition 3 in Section 2 yields that either
\[
\Delta_i^{(l)} \;\le\; \|H(x_i^{(l+)})^{-1}g(x_i^{(l+)})\| \tag{4.18}
\]
or
\[
\|H(x_i^{(l+)})h_i^{(l)} + g(x_i^{(l+)})\| \;\le\; \theta\,\|g(x_i^{(l+)})\|. \tag{4.19}
\]
If (4.18) holds, then $\|h_i^{(l)}\| \le \Delta_i^{(l)} \le \|H(x_i^{(l+)})^{-1}\|\,\|g(x_i^{(l+)})\| \le \|H(x_i^{(l+)})^{-1}\|\,[\theta\|g(x_i^{(l+)})\| + \|g(x_i^{(l+)})\|]$. Otherwise, (4.19) is true, and then
\[
\|h_i^{(l)}\| \;\le\; \|H(x_i^{(l+)})^{-1}\|\,\bigl\|\,[H(x_i^{(l+)})h_i^{(l)} + g(x_i^{(l+)})] - g(x_i^{(l+)})\,\bigr\|.
\]
Thus, in either case, we have
\[
\|h_i^{(l)}\| \;\le\; \|H(x_i^{(l+)})^{-1}\|\,\bigl[\theta\|g(x_i^{(l+)})\| + \|g(x_i^{(l+)})\|\bigr] \;\le\; 2\,\|H(x_i^{(l+)})^{-1}\|\,\|g(x_i^{(l+)})\|.
\]
From (2.15), we get that
\[
-q_i^{(l)}(h_i^{(l)}) \;\ge\; \beta_1\,\|g(x_i^{(l+)})\|\min\bigl\{\Delta_i^{(l)},\;\|g(x_i^{(l+)})\|/\|H(x_i^{(l+)})\|\bigr\}
\;\ge\; \frac{\beta_1\|h_i^{(l)}\|}{2\|H(x_i^{(l+)})^{-1}\|}\min\Bigl\{\|h_i^{(l)}\|,\;\frac{\|h_i^{(l)}\|}{2\|H(x_i^{(l+)})\|\,\|H(x_i^{(l+)})^{-1}\|}\Bigr\}
\;\ge\; \frac{\beta_1 K_7}{16}\,\frac{\|h_i^{(l)}\|^2}{\|H(x^*)^{-1}\|},
\]
where $K_7 = 1/[\|H(x^*)\|\,\|H(x^*)^{-1}\|]$. It follows from (3.10) and (4.12) that, as $i \to \infty$,
\[
|r_i^{(l)} - 1| \;\le\; \frac{K_6\|h_i^{(l)}\|^3}{|q_i^{(l)}(h_i^{(l)})|} \;\le\; \frac{16K_6\|H(x^*)^{-1}\|}{\beta_1 K_7}\,\|h_i^{(l)}\| \;\to\; 0.
\]
Therefore, when $i$ is sufficiently large, for all $l$,
\[
r_i^{(l)} \;\ge\; \eta_2.
\]
According to Algorithm 2.1, when $r_i^{(l)} \ge \eta_2$, we enlarge the radius $\Delta_i^{(l)}$ until it is equal to $2\mu_{i-1}^{2/5}$. Since $\mu_i$ decreases, there exists $i_7$ such that $\Delta_i^{(l)} = 2\mu_{i-1}^{2/5}$ holds for $i \ge i_7$.

Using the equality $\Delta_i^{(l)} = 2\mu_{i-1}^{2/5}$, we now prove that (4.17) holds if inequality (4.16) is true. Since $v_i^{(l)} = -(R_i^{(l)})^{-T}c(x_i^{(l)})$, Taylor's theorem implies that
\[
\|Z(x_i^{(l+)})^T\nabla f(x_i^{(l+)})\| \;\le\; \|Z(x_i^{(l)})^T\nabla f(x_i^{(l)})\| + \tilde K\|Y_i^{(l)}v_i^{(l)}\| \;\le\; \|Z(x_i^{(l)})^T\nabla f(x_i^{(l)})\| + \hat K\|c(x_i^{(l)})\|, \tag{4.20}
\]
where $\tilde K$ and $\hat K$ are constants. If (4.16) holds, it follows from (4.6) that for $i$ sufficiently large
\[
\|Z(x_i^{(l+)})^T\nabla f(x_i^{(l+)})\| \;\le\; \mu_{i-1}^{1/2} + \hat K K_3\,\mu_{i-1} \;\le\; 2\mu_{i-1}^{1/2}. \tag{4.21}
\]
That is, $\|g(x_i^{(l+)})\| \le 2\mu_{i-1}^{1/2}$, where $g(x)$ denotes $Z(x)^T\nabla f(x)$. Since $\|H(x_i^{(l+)})^{-1}\| \le \bar K$ for a constant $\bar K > 0$, $\Delta_i^{(l)} = 2\mu_{i-1}^{2/5}$, and $\mu_i$ tends to zero, we get that
\[
\|H(x_i^{(l+)})^{-1}g(x_i^{(l+)})\| \;\le\; \bar K\|g(x_i^{(l+)})\| \;\le\; 2\bar K\mu_{i-1}^{1/2} \;\le\; 2\mu_{i-1}^{2/5} \;=\; \Delta_i^{(l)} \tag{4.22}
\]
when $i$ is large enough that $\mu_{i-1} \le (1/\bar K)^{10}$. Thus, it follows from Condition 3 in Section 2 that (4.17) holds.

To prove Lemma 4.6, we need inequality (4.24). By the definition of $u(h)$ in (2.3), we have
\[
u(h) - u(0) = s(Zh) = Zh - \tfrac12 YR^{-1}\bigl[h^TZ^T\nabla^2 c(x_c)Zh\bigr].
\]
It follows from the boundedness of $\|Z_i^{(l+)}\|$, $\|Y_i^{(l+)}(R_i^{(l+)})^{-1}\|$ and $\nabla^2 c(x_i^{(l+)})$ that
\[
x_i^{(l+1)} - x_i^{(l+)} = s(Z_i^{(l+)}h_i^{(l)}) = Z_i^{(l+)}h_i^{(l)} + O(\|h_i^{(l)}\|^2). \tag{4.23}
\]
Since $x_i^{(l+)} - x_i^{(l)} = Y_i^{(l)}v_i^{(l)}$, (4.23) yields
\[
x_i^{(l+1)} - x_i^{(l)} = x_i^{(l+1)} - x_i^{(l+)} + Y_i^{(l)}v_i^{(l)}
= Z_i^{(l+)}h_i^{(l)} + Y_i^{(l)}v_i^{(l)} + O(\|h_i^{(l)}\|^2)
\]
\[
= Z_i^{(l+)}h_i^{(l)} + Y_i^{(l+)}v_i^{(l)} + (Y_i^{(l)} - Y_i^{(l+)})v_i^{(l)} + O(\|h_i^{(l)}\|^2)
= Z_i^{(l+)}h_i^{(l)} + Y_i^{(l+)}v_i^{(l)} + o(\|h_i^{(l)}\| + \|v_i^{(l)}\|).
\]
By the definition of $Y$ and $Z$, it is easy to see that
\[
\|x_i^{(l+1)} - x_i^{(l)}\| \;\ge\; \|Z_i^{(l+)}h_i^{(l)} + Y_i^{(l+)}v_i^{(l)}\| - \tfrac14\bigl[\|h_i^{(l)}\| + \|v_i^{(l)}\|\bigr]
\;\ge\; \bigl(\|h_i^{(l)}\|^2 + \|v_i^{(l)}\|^2\bigr)^{1/2} - \tfrac14\bigl[\|h_i^{(l)}\| + \|v_i^{(l)}\|\bigr]
\;\ge\; \tfrac14\|h_i^{(l)}\|.
\]
Thus, there exist an integer $i_8$ and a constant $K_8$ such that for $i \ge i_8$
\[
\|h_i^{(l)}\|^2 \;\le\; K_8\bigl[\|x_i^{(l+1)} - x^*\|^2 + \|x_i^{(l)} - x^*\|^2\bigr]. \tag{4.24}
\]
Before proving Lemma 4.6, we define
\[
\rho_i^{(l)} = H(x_i^{(l+)})h_i^{(l)} + g(x_i^{(l+)}).
\]
That is, $\rho_i^{(l)}$ is the residual when the linear system $H(x_i^{(l+)})h_i^{(l)} = -g(x_i^{(l+)})$ is solved.

Lemma 4.6. Suppose Assumption 4.1 holds. Assume the sequence generated by Algorithm 2.1 converges to $x^*$. Then there exist an integer $i_9$ and a constant $K_9 > 0$ such that for $0 \le l < l_i$, if
\[
\|Z(x_i^{(l)})^T\nabla f(x_i^{(l)})\| \;\le\; \mu_{i-1}^{1/2}, \tag{4.25}
\]
then
\[
\|Z(x_i^{(l+1)})^T\nabla f(x_i^{(l+1)})\| + \|c(x_i^{(l+1)})\| \;\le\; K_9\bigl[\|Z(x_i^{(l)})^T\nabla f(x_i^{(l)})\|^2 + \|c(x_i^{(l)})\|^2\bigr] + 2\bigl[\|c(x_i^{(l+)})\| + \|\rho_i^{(l)}\|\bigr] \tag{4.26}
\]
and
\[
\|Z(x_i^{(l+1)})^T\nabla f(x_i^{(l+1)})\| \;\le\; \mu_{i-1}^{1/2} \tag{4.27}
\]
whenever $i \ge i_9$.
Proof. Due to the definition of $\rho_i^{(l)}$ and the equality $x_i^{(l+1)} = x_i^{(l+)} + s(Z_i^{(l+)}h_i^{(l)})$, Taylor's theorem and (4.23) yield
\[
Z(x_i^{(l+1)})^T\nabla f(x_i^{(l+1)})
= Z(x_i^{(l+)})^T\nabla f(x_i^{(l+)}) + \bigl[Z(x_i^{(l+)})^T\nabla f(x_i^{(l+)})\bigr]'_x\, s(Z_i^{(l+)}h_i^{(l)}) + O(\|h_i^{(l)}\|^2)
\]
\[
= -H_i^{(l+)}h_i^{(l)} + \bigl[Z(x_i^{(l+)})^T\nabla f(x_i^{(l+)})\bigr]'_x\, Z(x_i^{(l+)})h_i^{(l)} + \rho_i^{(l)} + O(\|h_i^{(l)}\|^2). \tag{4.28}
\]
It follows from [7] that $[Z(x)^T\nabla f(x)]'_x = Z(x)^T\nabla^2 L(x)$. Thus, by the definition of $H_i^{(l+)}$ and (4.28), we have that $\|Z(x_i^{(l+1)})^T\nabla f(x_i^{(l+1)})\| = \|\rho_i^{(l)}\| + O(\|h_i^{(l)}\|^2)$. In addition, from (2.5), in which $c(x_i^{(l+1)}) = c(x_i^{(l+)}) + O(\|h_i^{(l)}\|^3)$, we obtain that there exists a constant $\tilde K > 0$ such that
\[
\|Z(x_i^{(l+1)})^T\nabla f(x_i^{(l+1)})\| + \|c(x_i^{(l+1)})\| \;\le\; \|c(x_i^{(l+)})\| + \|\rho_i^{(l)}\| + \tilde K\|h_i^{(l)}\|^2. \tag{4.29}
\]
Inequalities (4.24) and (4.1) imply that for $i$ large enough
\[
\|h_i^{(l)}\|^2 \;\le\; \epsilon_i^{(l)}\|x_i^{(l+1)} - x^*\| + K_8\|x_i^{(l)} - x^*\|^2
\;\le\; \frac{\epsilon_i^{(l)}}{\sqrt{K_1}}\bigl[\|Z(x_i^{(l+1)})^T\nabla f(x_i^{(l+1)})\| + \|c(x_i^{(l+1)})\|\bigr] + K_8\|x_i^{(l)} - x^*\|^2
\]
\[
\;\le\; \frac{1}{2\tilde K}\bigl[\|Z(x_i^{(l+1)})^T\nabla f(x_i^{(l+1)})\| + \|c(x_i^{(l+1)})\|\bigr] + K_8\|x_i^{(l)} - x^*\|^2, \tag{4.30}
\]
where $\epsilon_i^{(l)} = K_8\|x_i^{(l+1)} - x^*\| \to 0$. Combining (4.29) and (4.30), it follows from (4.1) that for $i$ large enough
\[
\tfrac12\bigl[\|Z(x_i^{(l+1)})^T\nabla f(x_i^{(l+1)})\| + \|c(x_i^{(l+1)})\|\bigr]
\;\le\; \|c(x_i^{(l+)})\| + \|\rho_i^{(l)}\| + \tilde K K_8\|x_i^{(l)} - x^*\|^2
\;\le\; \frac{\tilde K K_8}{K_1}\bigl[\|Z(x_i^{(l)})^T\nabla f(x_i^{(l)})\|^2 + \|c(x_i^{(l)})\|^2\bigr] + \|c(x_i^{(l+)})\| + \|\rho_i^{(l)}\|.
\]
Therefore (4.26) holds with $K_9 = 2\tilde K K_8/K_1$. From (4.25), similar to the proof of Lemma 4.5, we obtain that (4.21) and (4.17) hold, i.e., $\|g(x_i^{(l+)})\| \le 2\mu_{i-1}^{1/2}$ and $\|\rho_i^{(l)}\| \le \theta\|g(x_i^{(l+)})\| \le 2\theta\mu_{i-1}^{1/2}$ with $0 < \theta \le 1/5$. Therefore, it follows from (4.26), (4.25), (4.6), (4.7), and (4.21) that
\[
\|Z(x_i^{(l+1)})^T\nabla f(x_i^{(l+1)})\| \;\le\; K_9\bigl[\mu_{i-1} + K_3^2\mu_{i-1}^2\bigr] + 2K_3\mu_i + 2\|\rho_i^{(l)}\|
\;\le\; \bigl(K_9 + K_9K_3^2\mu_{i-1} + 2K_3\bigr)\mu_{i-1} + 4\theta\,\mu_{i-1}^{1/2}
\;\le\; \mu_{i-1}^{1/2}
\]
when $i$ is large enough.
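In an implementation, the residual $\rho_i^{(l)}$ can be controlled by solving the reduced system $H(x_i^{(l+)})h = -g(x_i^{(l+)})$ with an iterative method stopped on a relative-residual test of the form (4.17). The conjugate-gradient sketch below is illustrative only: it assumes $H$ is positive definite (as it is near $x^*$ by Assumption 4.1), and the default theta = 0.2 simply echoes the bound $\theta \le 1/5$ quoted in the proof of Lemma 4.6.
\begin{verbatim}
import numpy as np

def inexact_newton_step(H, g, theta=0.2, max_iter=200):
    """Solve H h = -g by conjugate gradients until ||H h + g|| <= theta * ||g||."""
    n = g.shape[0]
    h = np.zeros(n)
    r = -g - H @ h                      # residual of H h = -g (equals -g at h = 0)
    p = r.copy()
    target = theta * np.linalg.norm(g)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= target:
            break
        Hp = H @ p
        alpha = (r @ r) / (p @ Hp)
        h = h + alpha * p
        r_new = r - alpha * Hp
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return h, np.linalg.norm(H @ h + g)   # step and final residual norm ||rho||
\end{verbatim}
Driving theta to zero along the iteration, or making it proportional to $\|g\|$, is what conditions (4.31) and (4.32) below ask for.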
The results in Lemma 4.6 pave the way for quadratic convergence. Since inequality (4.25) is always true for $l = 0$, it follows inductively from Lemma 4.6 that (4.25) and (4.26) hold for all $l$ with $0 \le l < l_i$ when $i$ is sufficiently large.

Theorem 4.1. Suppose Assumption 4.1 holds. Suppose the sequence generated by Algorithm 2.1 converges to $x^*$. Then there exists a neighborhood $N$ of $x^*$ such that for $x_k \in N$, $\{x_k\}$ converges to $x^*$ superlinearly if
\[
\|\rho_i^{(l)}\| = o\bigl(\|Z(x_i^{(l+)})^T\nabla f(x_i^{(l+)})\|\bigr). \tag{4.31}
\]
Moreover, $\{x_k\}$ converges to $x^*$ quadratically if
\[
\|\rho_i^{(l)}\| = O\bigl(\|Z(x_i^{(l+)})^T\nabla f(x_i^{(l+)})\|^2\bigr). \tag{4.32}
\]
Proof. Since $x_i^{(l)} \to x^*$, it follows from Assumption 4.1 that for $i$ sufficiently large, the reduced Hessian matrix at $x_i^{(l)}$ is positive definite. Therefore, inequality (2.14c) is always true. To establish the local convergence rate, we first prove that there exists a constant $\bar K > 0$ such that, when $i$ is sufficiently large,
\[
\|c(x_i^{(l+)})\| \;\le\; \bar K\bigl[\|Z(x_i^{(l)})^T\nabla f(x_i^{(l)})\|^2 + \|c(x_i^{(l)})\|^2\bigr] \tag{4.33}
\]
for $0 \le l < l_i$. Actually, if (2.14b) does not hold, then, similar to the proof of inequality (4.9), we know that there exists a constant $N_2 > 0$ such that $\|c(x_i^{(l+)})\| \le N_2\|c(x_i^{(l)})\|^2$, which implies (4.33). On the other hand, when (2.14b) holds with $0 \le l < l_i$, we are in the inner loop. According to Algorithm 2.1, at least one inequality in (2.14) does not hold when we are in the inner loop. Because inequality (2.14c) is always true when $i$ is sufficiently large, inequality (2.14a) must not hold for $l$ with $0 \le l < l_i$. That is, $\mu_i < \|Z(x_i^{(l)})^T\nabla f(x_i^{(l)})\|^2$. Thus, inequality (4.8) yields that $\|c(x_i^{(l+)})\| \le K_3\mu_i \le K_3\|Z(x_i^{(l)})^T\nabla f(x_i^{(l)})\|^2$. Therefore, inequality (4.33) is always true whether or not (2.14b) holds. Thus, combining (4.26) and (4.33), we have
\[
\|Z(x_i^{(l+1)})^T\nabla f(x_i^{(l+1)})\| + \|c(x_i^{(l+1)})\| \;\le\; (K_9 + 2\bar K)\bigl[\|Z(x_i^{(l)})^T\nabla f(x_i^{(l)})\|^2 + \|c(x_i^{(l)})\|^2\bigr] + 2\|\rho_i^{(l)}\|. \tag{4.34}
\]
If (4.31) holds, then it follows from (4.20) and (4.34) that
\[
\|Z(x_i^{(l+1)})^T\nabla f(x_i^{(l+1)})\| + \|c(x_i^{(l+1)})\| \;=\; o\bigl(\|Z(x_i^{(l)})^T\nabla f(x_i^{(l)})\| + \|c(x_i^{(l)})\|\bigr).
\]
Thus, Lemma 4.1 implies that Algorithm 2.1 has a superlinear convergence rate.
If (4.32) holds, then it follows from (4.20) and (4.34) that
\[
\|Z(x_i^{(l+1)})^T\nabla f(x_i^{(l+1)})\| + \|c(x_i^{(l+1)})\| \;=\; O\bigl(\|Z(x_i^{(l)})^T\nabla f(x_i^{(l)})\|^2 + \|c(x_i^{(l)})\|^2\bigr).
\]
Thus, Algorithm 2.1 has a quadratic convergence rate.
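A simple way to check rates like these numerically is to tabulate successive error quotients. The helper below is a generic sketch and not part of the authors' experiments: for a superlinearly convergent sequence the first ratio tends to zero, while a roughly constant second ratio indicates quadratic convergence.
\begin{verbatim}
import numpy as np

def rate_table(iterates, x_star):
    """Return pairs (e_{k+1}/e_k, e_{k+1}/e_k^2) for errors e_k = ||x_k - x*||."""
    errs = [np.linalg.norm(np.asarray(x) - np.asarray(x_star)) for x in iterates]
    rows = []
    for ek, ek1 in zip(errs[:-1], errs[1:]):
        if ek == 0.0:
            break
        rows.append((ek1 / ek, ek1 / ek**2))
    return rows
\end{verbatim}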
5. Numerical Experiments. In the final section, we present preliminary computational results to illustrate the performance of Algorithm 2.1. We test implementations of Algorithm 2.1 on a set of nonlinear equality constrained problems from the CUTE collection [1] and our own test problems. To exhibit the important role the change of variables plays in Algorithm 2.1, we compare the implementations of Algorithm 2.1 to a variation of our algorithm without the change of variables. All our experiments were performed in MATLAB Version 4.1 on a Sun 4/670 workstation.

problems     n    m    nnz(A)  constraints
BT6          5    2       5    nonlinear
BT11         5    3       8    nonlinear
DIPIGRI      7    4      19    nonlinear
DTOC2       58   36     144    nonlinear
DTOC4       29   18      65    nonlinear
DTOC6       21   10      31    nonlinear
GENHS28    300  298     894    linear
HS100        7    4      19    nonlinear
MWRIGHT      5    3       8    nonlinear
ORTHREGA   517  256    1792    nonlinear
ORTHREGC   505  250    1750    nonlinear
ORTHREGD   203  100     500    nonlinear
TEST1      500  300    2308    quadratic
TEST2      500  300    2661    nonlinear

Table I: Description of Problems

Table I gives a brief description of our test problem set. Most problems in Table I (all except TEST1 and TEST2) are selected from the CUTE collection [1]. Problems TEST1 and TEST2 are our own test problems. Problem TEST1 is to minimize a Rosenbrock function [11] subject to some quadratic equality constraints, i.e.,
\[
\begin{array}{ll}
\text{minimize} & \sum_{i=1}^{n-1}\bigl[(1 - x_i)^2 + 100\,(x_{i+1} - x_i^2)^2\bigr]\\
\text{subject to} & a_i^T x + 0.5\,x^T M_i x = 0, \quad i = 1,\ldots,m,
\end{array} \tag{5.1}
\]
where $a_i \in \mathbb{R}^n$
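For reference, problem (5.1) can be generated as follows. The random choice of the vectors $a_i$ and the symmetric matrices $M_i$, and the small default dimensions, are assumptions for illustration only; the excerpt does not specify how the TEST1 data were constructed (the paper's TEST1 uses $n = 500$, $m = 300$).
\begin{verbatim}
import numpy as np

def make_test1(n=50, m=30, seed=0):
    """Rosenbrock objective with quadratic equality constraints, as in (5.1)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((m, n))           # assumed data for the a_i
    M = []
    for _ in range(m):
        B = rng.standard_normal((n, n))
        M.append(0.5 * (B + B.T))             # assumed symmetric M_i

    def f(x):
        return np.sum((1.0 - x[:-1]) ** 2
                      + 100.0 * (x[1:] - x[:-1] ** 2) ** 2)

    def c(x):
        return np.array([a[i] @ x + 0.5 * x @ (M[i] @ x) for i in range(m)])

    return f, c   # note that x = 0 is feasible for these constraints: c(0) = 0
\end{verbatim}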