REPORTS ON COMPUTATIONAL MATHEMATICS, NO. 98/1997, DEPARTMENT OF MATHEMATICS, THE UNIVERSITY OF IOWA

On the local convergence of a predictor-corrector method for semidefinite programming

Jun Ji∗, Florian A. Potra† and Rongqin Sheng†

January 1997

Abstract

We study the local convergence of a predictor-corrector algorithm for semidefinite programming problems based on the Monteiro-Zhang unified direction, whose polynomial convergence was recently established by Monteiro. We prove that the sufficient condition for superlinear convergence of Potra and Sheng applies to this algorithm and is independent of the scaling matrices. Under strict complementarity and nondegeneracy assumptions, superlinear convergence with Q-order 1.5 is proved if the scaling matrices in the corrector step have bounded condition number. A version of the predictor-corrector algorithm enjoys quadratic convergence if the scaling matrices in both predictor and corrector steps have bounded condition numbers. The latter results apply in particular to algorithms using the AHO direction, since there the scaling matrix is the identity matrix.

∗ Department of Mathematics and Computer Science, Valdosta State University, Valdosta, GA 31698, USA. The work of this author was supported in part by a grant from the Center for Faculty Development and Research at Valdosta State University.
† Department of Mathematics, The University of Iowa, Iowa City, IA 52242, USA. The work of these two authors was supported in part by NSF Grant DMS 9305760.


1 Introduction

The study of the superlinear convergence of interior-point methods for linear programming (LP) was initiated in the early 90s in an effort to explain the fact that interior point methods tend to perform significantly better in practice than indicated by their polynomial complexity bounds. This discrepancy is due to the limitations of the worst case analysis used in deriving polynomial complexity bounds, and it reflects the inherent conflict between the requirements of global convergence and fast local convergence. Superlinear convergence is especially important for semidefinite programming (SDP), since no finite termination schemes exist for such problems. As predicted by theory and confirmed by numerical experiments, the condition number of the linear systems defining the search directions increases like $1/\mu$, where $\mu$ is the normalized duality gap, so that the respective systems become very ill conditioned as we approach the solution. Therefore an interior point method that is not superlinearly convergent is unlikely to attain high accuracy in practice, in spite of its theoretical "polynomial complexity". On the other hand, a superlinearly convergent interior point method will achieve good accuracy (e.g. $10^{-10}$ or better) in substantially fewer iterations than indicated by its worst case global linear convergence rate, which is related to the polynomial complexity.

The local convergence analysis of interior point algorithms for SDP is much more challenging than that for LP, as reflected by the relatively small number of papers addressing this subject. The first two papers investigating the superlinear convergence of interior point algorithms were written independently by Kojima, Shida and Shindoh [4] and by Potra and Sheng [13]. The algorithm investigated in these papers is an extension of the Mizuno-Todd-Ye predictor-corrector algorithm for LP and uses the KSH/HRVW/M search direction (see the next section for a definition of this search direction). Kojima, Shida and Shindoh [4] established superlinear convergence under the following three assumptions:

(A) SDP has a strictly complementary solution;

(B) SDP is nondegenerate in the sense that the Jacobian matrix of its KKT system is nonsingular;

(C) the iterates converge tangentially to the central path in the sense that the size of the neighborhood containing the iterates must approach zero, namely,

$$\lim_{k\to\infty} \frac{\|(X^k)^{1/2} S^k (X^k)^{1/2} - (X^k \bullet S^k/n)\,I\|_F}{X^k \bullet S^k/n} = 0.$$

Here $\|\cdot\|_F$ denotes the Frobenius norm of a matrix and "$\bullet$" denotes the corresponding scalar product (see the next section for precise definitions). In [13] we did not use assumptions (B) and (C). Instead we proposed a sufficient condition for superlinear convergence that is implied by the above assumptions. In [14] we improved this result and obtained superlinear convergence under assumption (A) and the following condition:

$$\text{(D)}\qquad \lim_{k\to\infty} X^k S^k / \sqrt{X^k \bullet S^k} = 0,$$

which is clearly weaker than (C). Of course both (C) and (D) can be enforced by the algorithm, but the practical efficiency of such an approach is questionable. However, from a theoretical point of view, it is proved in [14] that the modified algorithm of [4], which uses several corrector steps in order to enforce (C), has polynomial complexity and is superlinearly convergent under assumption (A) only. It is well known that assumption (A) is necessary for the superlinear convergence of standard interior point methods even in the QP case (see [10]). Kojima, Shida and Shindoh [4] also gave an example suggesting that interior point algorithms for SDP based on the KSH/HRVW/M search direction are unlikely to be superlinearly convergent without imposing a condition like (C). In [5] the same authors showed that a predictor-corrector algorithm using the AHO direction is quadratically convergent under assumptions (A) and (B) (see the next section for a definition of the AHO search direction). They also proved that the algorithm is globally convergent, but no polynomial complexity bounds have been found for it. It is shown that condition (C) is automatically satisfied by the iteration sequence generated by the algorithm. It appears that the use of the AHO direction in the corrector step has a strong effect on centering. We exploited this property in [15], where we showed that a direct extension of the Mizuno-Todd-Ye algorithm, based on the KSH/HRVW/M direction in the predictor step and the AHO direction in the corrector step, has polynomial complexity and is superlinearly convergent with Q-order 1.5 under assumptions (A) and (B).

An interesting superlinearly convergent predictor-corrector algorithm based on the NT search direction was proposed by Luo, Sturm and Zhang [7]. The algorithm depends on a parameter $\delta > 0$. It produces points $(X^k, y^k, S^k) \in \mathcal{N}_F(\gamma_k)$, where the neighborhood $\mathcal{N}_F(\gamma)$ is defined in (2.7), $\gamma_k = 1/4$ if $\mu_k := X^k \bullet S^k/n \geq \delta/4$, and $\gamma_k = \mu_k/\delta$ if $\mu_k < \delta/4$. The algorithm starts from a feasible point $(X^0, y^0, S^0) \in \mathcal{N}_F(1/4)$ and, for any given $\tilde\epsilon \geq \delta/4$, finds a feasible point $(X^k, y^k, S^k)$ with $\mu_k \leq \tilde\epsilon$ in at most $O(\sqrt{n}\,\ln(\mu_0/\tilde\epsilon))$ iterations. However, this bound on the number of iterations is not proved to hold for $0 < \tilde\epsilon < \delta/4$, hence the algorithm is not polynomial in the usual sense. The algorithm is superlinearly convergent under assumption (A). It turns out that (C) is enforced by the algorithm, since it is proved in [7] that for sufficiently large $k$

$$\|(X^k)^{1/2} S^k (X^k)^{1/2} - (X^k \bullet S^k/n)\,I\|_F / (X^k \bullet S^k/n) \leq (X^k \bullet S^k)/(4n\delta).$$

It is also proved that if one uses one predictor and $r$ correctors per iteration, then $\mu_k$ converges to zero with Q-order $2/(1+2^{-2r})$.

In this paper we investigate the local behavior of the predictor-corrector algorithm considered by Monteiro [9] for SDP using the MZ-family of search directions. We show that the sufficient condition of Potra and Sheng [13] for superlinear convergence applies to this algorithm. The sufficient condition is independent of the scaling matrices. In particular, we show that the algorithm is superlinearly convergent if (A) and (D) are satisfied. More specifically, we show that under assumptions (A) and (B), superlinear convergence with Q-order 1.5 is obtained if the scaling matrices in the corrector step have bounded condition number. Finally, we propose a new version of the predictor-corrector algorithm which enjoys quadratic convergence if the scaling matrices in both predictor and corrector steps have bounded condition numbers and (A) and (B) are satisfied.

The following notation and terminology are used throughout the paper:

$\mathbb{R}^p$: the $p$-dimensional Euclidean space;
$\mathbb{R}^p_+$: the nonnegative orthant of $\mathbb{R}^p$;
$\mathbb{R}^p_{++}$: the positive orthant of $\mathbb{R}^p$;
$\mathbb{R}^{p\times q}$: the set of all $p \times q$ matrices with real entries;
$\mathcal{S}^p$: the set of all $p \times p$ symmetric matrices;
$\mathcal{S}^p_+$: the set of all $p \times p$ symmetric positive semidefinite matrices;
$\mathcal{S}^p_{++}$: the set of all $p \times p$ symmetric positive definite matrices;
$[M]_{ij}$: the $(i,j)$-th entry of a matrix $M$;
$\mathrm{Tr}(M)$: the trace of a $p \times p$ matrix $M$, equal to $\sum_{i=1}^p [M]_{ii}$;
$M \succeq 0$: $M$ is positive semidefinite;
$M \succ 0$: $M$ is positive definite;
$\lambda_i(M),\ i = 1,\dots,n$: the eigenvalues of $M \in \mathcal{S}^n$;
$\lambda_{\max}(M)$, $\lambda_{\min}(M)$: the largest and the smallest eigenvalue of $M \in \mathcal{S}^n$;
$G \bullet H \equiv \mathrm{Tr}(G^T H)$;
$\|\cdot\|$: the Euclidean norm of a vector and the corresponding operator norm of a matrix, i.e., $\|y\| \equiv \sqrt{\sum_{i=1}^p y_i^2}$ and $\|M\| \equiv \max\{\|My\| : \|y\| = 1\}$;
$\|M\|_F \equiv \sqrt{M \bullet M}$, $M \in \mathbb{R}^{p\times q}$: the Frobenius norm of a matrix;
$\|(G,H)\|_F \equiv \sqrt{G \bullet G + H \bullet H}$, $G, H \in \mathbb{R}^{p\times q}$;
$M^k = o(1)$: $\|M^k\| \to 0$ as $k \to \infty$;
$M^k = O(1)$: $\|M^k\|$ is bounded;
$M^k = o(\mu_k)$: $M^k/\mu_k = o(1)$;
$M^k = O(\mu_k)$: $M^k/\mu_k = O(1)$.


2 The predictor-corrector algorithm for SDP

We consider the semidefinite programming (SDP) problem

$$\min\{C \bullet X : A_i \bullet X = b_i,\ i = 1,\dots,m;\ X \succeq 0\}, \tag{2.1}$$

and its associated dual problem

$$\max\Big\{b^T y : \sum_{i=1}^m y_i A_i + S = C,\ S \succeq 0\Big\}, \tag{2.2}$$

where $C \in \mathcal{S}^n$, $A_i \in \mathcal{S}^n$, $i = 1,\dots,m$, and $b = (b_1,\dots,b_m)^T \in \mathbb{R}^m$ are given data, and $X \in \mathcal{S}^n_+$, $(y, S) \in \mathbb{R}^m \times \mathcal{S}^n_+$ are the primal and dual variables, respectively. By $G \bullet H$ we denote the trace of $G^T H$. Also, for simplicity we assume that $A_i$, $i = 1,\dots,m$, are linearly independent. Throughout this paper we assume that both (2.1) and (2.2) have finite solutions and that their optimal values are equal. Under this assumption, $X^*$ and $(y^*, S^*)$ are solutions of (2.1) and (2.2) if and only if they are solutions of the following nonlinear system:

$$A_i \bullet X = b_i,\ i = 1,\dots,m, \tag{2.3a}$$
$$\sum_{i=1}^m y_i A_i + S = C, \tag{2.3b}$$
$$XS = 0,\quad X \succeq 0,\ S \succeq 0. \tag{2.3c}$$

We denote the feasible set of the problem (2.3) by

$$\mathcal{F} = \{(X, y, S) \in \mathcal{S}^n_+ \times \mathbb{R}^m \times \mathcal{S}^n_+ : (X, y, S) \text{ satisfies (2.3a) and (2.3b)}\}$$

and its solution set by $\mathcal{F}^*$, i.e., $\mathcal{F}^* = \{(X, y, S) \in \mathcal{F} : X \bullet S = 0\}$.

We consider the symmetrization operator [17]

$$H_P(M) = \frac{1}{2}\big[PMP^{-1} + (PMP^{-1})^T\big], \qquad \forall\, M \in \mathbb{R}^{n\times n}.$$

Since, as observed by Zhang [17],

$$H_P(M) = \tau I \iff M = \tau I

for any nonsingular matrix $P$, any matrix $M$ with real spectrum, and any $\tau \in \mathbb{R}$, it follows that for any given nonsingular matrix $P$, (2.3) is equivalent to

$$A_i \bullet X = b_i,\ i = 1,\dots,m, \tag{2.4a}$$
$$\sum_{i=1}^m y_i A_i + S = C, \tag{2.4b}$$
$$H_P(XS) = 0,\quad X \succeq 0,\ S \succeq 0. \tag{2.4c}$$

A perturbed Newton method applied to the system (2.4) leads to the following linear system:

$$H_P(XV + US) = \sigma\mu I - H_P(XS), \tag{2.5a}$$
$$A_i \bullet U = 0,\ i = 1,\dots,m, \tag{2.5b}$$
$$\sum_{i=1}^m w_i A_i + V = 0, \tag{2.5c}$$

where $(U, w, V) \in \mathcal{S}^n \times \mathbb{R}^m \times \mathcal{S}^n$ is the unknown search direction, $\sigma \in [0,1]$ is the centering parameter, and $\mu = (X \bullet S)/n$ is the normalized duality gap corresponding to $(X, y, S)$. The search direction obtained through (2.5) is called the Monteiro-Zhang (MZ) unified direction [17, 11]. The matrix $P$ used in (2.5) is called the scaling matrix for the search direction. It is well known that taking $P = I$ results in the Alizadeh-Haeberly-Overton (AHO) search direction [1], $P = S^{1/2}$ corresponds to the Kojima-Shindoh-Hara/Helmberg-Rendl-Vanderbei-Wolkowicz/Monteiro (KSH/HRVW/M) search direction [6, 3, 8], and the case $P^T P = X^{-1/2}[X^{1/2} S X^{1/2}]^{1/2} X^{-1/2}$ coincides with the Nesterov-Todd (NT) search direction [12]. Monteiro and Zhang [11] established the polynomiality of a long-step path-following method based on search directions defined by scaling matrices belonging to the class

$$\{W^{1/2} : W \in \mathcal{S}^n_{++} \text{ such that } WXS = SXW\}.$$

Following [11], Sheng et al. [16] proved the polynomiality of a Mizuno-Todd-Ye type predictor-corrector algorithm for SDP by requiring the scaling matrices to be chosen from the class

$$\{P : P \in \mathbb{R}^{n\times n} \text{ is nonsingular and } PXSP^{-1} \in \mathcal{S}^n\}.$$

Moreover, its superlinear convergence was proved under an additional simple condition.

The primal-dual algorithms considered by Monteiro [9] are based on the centrality measure

$$d(X, S) \equiv \|X^{1/2} S X^{1/2} - \mu I\|_F = \left(\sum_{i=1}^n \big(\lambda_i(XS) - \mu\big)^2\right)^{1/2}, \tag{2.6}$$

where $(X, S) \in \mathcal{S}^n_+ \times \mathcal{S}^n_+$ and $\mu = (X \bullet S)/n = \big(\sum_{i=1}^n \lambda_i(XS)\big)/n$. Given $\gamma \in (0,1)$, we denote by $\mathcal{N}(\gamma)$ the following neighborhood of the central path:

$$\mathcal{N}(\gamma) = \{(X, y, S) \in \mathcal{F} : d(X, S) \leq \gamma\mu,\ X \succ 0,\ S \succ 0,\ \mu = (X \bullet S)/n\}. \tag{2.7}$$
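The objects defined above are straightforward to compute, which is convenient for experimenting with the algorithm. The following sketch is our illustration, not part of the original paper; all function names are ours. It implements the symmetrization operator $H_P$, the centrality measure (2.6), the neighborhood test (2.7), and the three classical scaling choices:

```python
import numpy as np

def sym_sqrt(M):
    """Symmetric positive definite square root, via an eigendecomposition."""
    w, Q = np.linalg.eigh(M)
    return (Q * np.sqrt(w)) @ Q.T

def H(P, M):
    """Monteiro-Zhang symmetrization: H_P(M) = [P M P^{-1} + (P M P^{-1})^T]/2."""
    A = P @ M @ np.linalg.inv(P)
    return 0.5 * (A + A.T)

def mu(X, S):
    """Normalized duality gap mu = (X . S)/n."""
    return np.trace(X @ S) / X.shape[0]

def d(X, S):
    """Centrality measure d(X, S) = ||X^{1/2} S X^{1/2} - mu*I||_F, cf. (2.6)."""
    Xh = sym_sqrt(X)
    return np.linalg.norm(Xh @ S @ Xh - mu(X, S) * np.eye(X.shape[0]), 'fro')

def in_neighborhood(X, S, gamma):
    """Test d(X, S) <= gamma*mu defining the central-path neighborhood (2.7)."""
    return d(X, S) <= gamma * mu(X, S)

def scaling_matrix(X, S, direction):
    """Scaling matrices P giving three classical members of the MZ family."""
    if direction == "AHO":   # P = I
        return np.eye(X.shape[0])
    if direction == "KSH":   # P = S^{1/2} (KSH/HRVW/M direction)
        return sym_sqrt(S)
    if direction == "NT":    # P^T P = X^{-1/2} [X^{1/2} S X^{1/2}]^{1/2} X^{-1/2}
        Xh = sym_sqrt(X)
        Xih = np.linalg.inv(Xh)
        return sym_sqrt(Xih @ sym_sqrt(Xh @ S @ Xh) @ Xih)
    raise ValueError(direction)
```

For the NT choice the sketch returns the symmetric square root of $P^TP$; since replacing $P$ by $QP$ with $Q$ orthogonal leaves the system (2.5) unchanged, this particular normalization of the factor is harmless.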

Monteiro's generalized predictor-corrector algorithm for semidefinite programming based on the MZ family of directions consists of a predictor step and a corrector step at each iteration. Starting from a strictly feasible pair $(X^0, y^0, S^0) \in \mathcal{N}(\bar\gamma)$, it generates a sequence of iterates $\{(X^k, y^k, S^k)\} \subset \mathcal{N}(\bar\gamma)$. An iteration of Monteiro's generalized predictor-corrector algorithm can be described as follows.

Predictor-Corrector Algorithm. Given $(X^k, y^k, S^k) \in \mathcal{N}(\bar\gamma)$, choose nonsingular $n \times n$ matrices $P^k$ and $\bar P^k$.

• Predictor Step. Solve the system (2.5) with $(X, y, S) = (X^k, y^k, S^k)$, $\sigma = 0$ and $P = P^k$. Denote the solution by $(U, w, V) \in \mathcal{S}^n \times \mathbb{R}^m \times \mathcal{S}^n$, and set

$$X^k(\theta) = X^k + \theta U, \qquad y^k(\theta) = y^k + \theta w, \qquad S^k(\theta) = S^k + \theta V. \tag{2.8}$$

Compute the step length

$$\theta_k = \max\big\{\tilde\theta \in [0,1] : (X^k(\theta), S^k(\theta)) \in \mathcal{N}(\gamma)\ \ \forall\,\theta \in [0, \tilde\theta]\big\}. \tag{2.9}$$

• Corrector Step. Solve the system (2.5) with $(X, y, S) = (X^k(\theta_k), y^k(\theta_k), S^k(\theta_k))$, $\sigma = 1$ and $P = \bar P^k$. Let $(\bar U, \bar w, \bar V)$ be the solution, and set

$$X^{k+1} = X^k(\theta_k) + \bar U, \qquad y^{k+1} = y^k(\theta_k) + \bar w, \qquad S^{k+1} = S^k(\theta_k) + \bar V. \tag{2.10}$$

End of iteration.

Using an elegant analysis, Monteiro [9] proved that the predictor-corrector algorithm defined above, with properly chosen parameters $\bar\gamma$ and $\gamma$ ($0 < \bar\gamma < \gamma < 1$), is well defined, and that it needs at most $O(\sqrt{n}\,\ln(\epsilon_0/\epsilon))$ iterations to produce a pair $(X^k, y^k, S^k)$ such that $X^k \bullet S^k \leq \epsilon$, where $\epsilon_0 = X^0 \bullet S^0$ is the initial gap. More precisely, Monteiro showed that

$$(X^k, y^k, S^k) \in \mathcal{N}(\bar\gamma) \quad \text{and} \quad (X^k(\theta_k), y^k(\theta_k), S^k(\theta_k)) \in \mathcal{N}(\gamma), \tag{2.11}$$

$$X^{k+1} \bullet S^{k+1} = \big(1 - 1/O(\sqrt{n})\big)\, X^k \bullet S^k, \tag{2.12}$$

for all $k \geq 0$.
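The step length (2.9) is defined implicitly by the neighborhood condition. Since, by construction, the admissible set is an interval $[0, \theta_k]$, the step can be approximated to any accuracy by bisection on the membership test. A minimal sketch of this (ours, not from the paper), reusing `mu` and `in_neighborhood` from the previous listing:

```python
import numpy as np
# Assumes in_neighborhood() from the previous sketch.

def predictor_step_length(X, S, U, V, gamma, tol=1e-12):
    """Approximate the step length (2.9): the largest theta in [0, 1] such
    that (X(theta), S(theta)) stays in N(gamma) for all smaller theta, where
    X(theta) = X + theta*U and S(theta) = S + theta*V.  Because the
    admissible set is by definition an interval [0, theta_k], bisection on
    the membership test converges to theta_k."""
    def ok(theta):
        Xt, St = X + theta * U, S + theta * V
        try:  # X(theta), S(theta) must remain positive definite
            np.linalg.cholesky(Xt)
            np.linalg.cholesky(St)
        except np.linalg.LinAlgError:
            return False
        return in_neighborhood(Xt, St, gamma)

    if ok(1.0):
        return 1.0
    lo, hi = 0.0, 1.0  # invariant: ok(lo) holds, ok(hi) fails
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if ok(mid) else (lo, mid)
    return lo
```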


3 Technical results

In analyzing the local behavior of the predictor-corrector algorithm of Monteiro, we need the following technical result, proved in [8, Lemma 2.6] and [9, Lemma 2.1(b)].

Lemma 3.1 Suppose that $M \in \mathbb{R}^{p\times p}$ is a nonsingular matrix and $E \in \mathbb{R}^{p\times p}$ has at least one real eigenvalue. Then,

$$\lambda_{\max}(E) \leq \lambda_{\max}(H_M(E)), \tag{3.1}$$
$$\lambda_{\min}(E) \geq \lambda_{\min}(H_M(E)), \tag{3.2}$$
$$\|G\|_F \leq \|H_M(G)\|_F, \qquad \forall\, G \in \mathcal{S}^p, \tag{3.3}$$
$$d(G, J) \leq \Big\|H_M\Big(GJ - \frac{G \bullet J}{n}\, I\Big)\Big\|_F, \qquad \forall\, G, J \in \mathcal{S}^n_{++}. \tag{3.4}$$

The following lemma is part of Lemma 3.5 of Monteiro [9].

Lemma 3.2 Let $W \in \mathbb{R}^{n\times n}$ be such that $GWG^{-1}$ is skew-symmetric for some nonsingular $G \in \mathbb{R}^{n\times n}$. Then,

$$\|W\|_F \leq \frac{\sqrt{2}}{2}\,\|W - W^T\|_F. \tag{3.5}$$
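Both lemmas are easy to sanity-check numerically. The following throwaway script is our addition (it assumes the `H` helper from the sketch in Section 2) and verifies (3.1), (3.2) and (3.5) on random data:

```python
import numpy as np
# Assumes H() from the sketch in Section 2.

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))               # nonsingular with probability 1

# Lemma 3.1: E = XS with X, S symmetric positive definite has real (positive)
# eigenvalues, and they are bracketed by the extreme eigenvalues of H_M(E).
A = rng.standard_normal((n, n)); X = A @ A.T + n * np.eye(n)
B = rng.standard_normal((n, n)); S = B @ B.T + n * np.eye(n)
eigs_E = np.linalg.eigvals(X @ S).real
eigs_H = np.linalg.eigvalsh(H(M, X @ S))
assert eigs_E.max() <= eigs_H.max() + 1e-8    # (3.1)
assert eigs_E.min() >= eigs_H.min() - 1e-8    # (3.2)

# Lemma 3.2: if G W G^{-1} is skew-symmetric, ||W||_F <= (sqrt(2)/2)||W - W^T||_F.
G = rng.standard_normal((n, n))
K = rng.standard_normal((n, n)); K = K - K.T  # skew-symmetric
W = np.linalg.inv(G) @ K @ G                  # then G W G^{-1} = K is skew
assert np.linalg.norm(W, 'fro') <= np.sqrt(2) / 2 * np.linalg.norm(W - W.T, 'fro') + 1e-8
```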

The following technical result will play an important role in our analysis.

Lemma 3.3 Let $(X, y, S) \in \mathcal{N}(\gamma)$ for some $\gamma \in [0, \sqrt{2} - 1)$. Suppose that $(D_x, \Delta y, D_s) \in \mathcal{S}^n \times \mathbb{R}^m \times \mathcal{S}^n$ is a solution of the linear system

$$H_P(XD_s + D_xS) = H_P(K), \tag{3.6a}$$
$$A_i \bullet D_x = 0,\ i = 1,\dots,m, \tag{3.6b}$$
$$\sum_{i=1}^m \Delta y_i A_i + D_s = 0, \tag{3.6c}$$

for some $K \in \mathbb{R}^{n\times n}$. Then we have

(i) $\|X^{1/2} D_s X^{1/2}\|_F^2 + \mu^2\,\|X^{-1/2} D_x X^{-1/2}\|_F^2 \leq \tau_1^2\,\|X^{-1/2} K X^{1/2}\|_F^2$,

(ii) $\|X^{-1/2}(XD_s + D_xS - K)X^{1/2}\|_F \leq \tau_2\,\|X^{-1/2} K X^{1/2}\|_F$,

where

$$\tau_1 = \frac{\sqrt{2}+1}{1 - (\sqrt{2}+1)\gamma}, \qquad \tau_2 = \sqrt{2} + \sqrt{2}\,\gamma\,\tau_1.$$

Proof. By denoting

$$X' = PXP^T, \quad S' = (P^{-1})^T S P^{-1}, \quad D_x' = P D_x P^T, \quad D_s' = (P^{-1})^T D_s P^{-1}, \quad K' = PKP^{-1},$$

and

$$\omega_x = \mu\,\|X^{-1/2} D_x X^{-1/2}\|_F, \qquad \omega_s = \|X^{1/2} D_s X^{1/2}\|_F,$$

we can write

$$X' D_s' + D_s' X' + D_x' S' + S' D_x' = K' + K'^T \tag{3.7}$$

and

$$D_x' \bullet D_s' = 0.$$

It is easily seen that $\hat Q = (X')^{1/2} (P^{-1})^T X^{-1/2}$ is orthogonal. Then

$$(X')^{1/2} = \hat Q X^{1/2} P^T = P X^{1/2} \hat Q^T$$

and

$$(X')^{-1/2} = (P^{-1})^T X^{-1/2} \hat Q^T = \hat Q X^{-1/2} P^{-1}.$$

Using the notation

$$B = X' D_s' + D_x' S' - K',$$

it follows that

$$(X')^{-1/2} B (X')^{1/2} = (X')^{1/2} D_s' (X')^{1/2} + (X')^{-1/2} D_x' S' (X')^{1/2} - (X')^{-1/2} K' (X')^{1/2} = \hat Q\,\big[X^{1/2} D_s X^{1/2} + X^{-1/2} D_x S X^{1/2} - X^{-1/2} K X^{1/2}\big]\,\hat Q^T. \tag{3.8}$$

Using (3.8) and Lemma 3.2 with $W = (X')^{-1/2} B (X')^{1/2}$ and $G = (X')^{1/2}$, we have

$$\begin{aligned}
\|(X')^{-1/2} B (X')^{1/2}\|_F &\leq \frac{\sqrt{2}}{2}\,\big\|(X')^{-1/2} B (X')^{1/2} - [(X')^{-1/2} B (X')^{1/2}]^T\big\|_F \\
&\leq \frac{\sqrt{2}}{2}\,\big\|X^{-1/2} D_x S X^{1/2} - [X^{-1/2} D_x S X^{1/2}]^T\big\|_F + \frac{\sqrt{2}}{2}\,\big\|X^{-1/2} K X^{1/2} - [X^{-1/2} K X^{1/2}]^T\big\|_F \\
&\leq \frac{\sqrt{2}}{2}\,\big\|X^{-1/2} D_x X^{-1/2}(X^{1/2} S X^{1/2} - \mu I) - [X^{-1/2} D_x X^{-1/2}(X^{1/2} S X^{1/2} - \mu I)]^T\big\|_F + \sqrt{2}\,\|X^{-1/2} K X^{1/2}\|_F \\
&\leq \sqrt{2}\,\gamma\,\omega_x + \sqrt{2}\,\|X^{-1/2} K X^{1/2}\|_F. \tag{3.9}
\end{aligned}$$

On the other hand, using (3.8) again, we obtain

$$\begin{aligned}
\|X^{-1/2} K X^{1/2}\|_F &= \|\hat Q X^{-1/2} K X^{1/2} \hat Q^T\|_F \\
&\geq \|X^{1/2} D_s X^{1/2} + X^{-1/2} D_x S X^{1/2}\|_F - \|(X')^{-1/2} B (X')^{1/2}\|_F \\
&= \|X^{1/2} D_s X^{1/2} + \mu X^{-1/2} D_x X^{-1/2} + X^{-1/2} D_x X^{-1/2}(X^{1/2} S X^{1/2} - \mu I)\|_F - \|(X')^{-1/2} B (X')^{1/2}\|_F \\
&\geq \|X^{1/2} D_s X^{1/2} + \mu X^{-1/2} D_x X^{-1/2}\|_F - \|X^{-1/2} D_x X^{-1/2}(X^{1/2} S X^{1/2} - \mu I)\|_F - \|(X')^{-1/2} B (X')^{1/2}\|_F \\
&\geq (\omega_x^2 + \omega_s^2)^{1/2} - \gamma\,\omega_x - \|(X')^{-1/2} B (X')^{1/2}\|_F \\
&\geq (\omega_x^2 + \omega_s^2)^{1/2} - (\sqrt{2}+1)\,\gamma\,\omega_x - \sqrt{2}\,\|X^{-1/2} K X^{1/2}\|_F \\
&\geq (\omega_x^2 + \omega_s^2)^{1/2}\,\big(1 - (\sqrt{2}+1)\gamma\big) - \sqrt{2}\,\|X^{-1/2} K X^{1/2}\|_F,
\end{aligned}$$

which implies (i). Then (ii) follows from (i), (3.9), and the fact that

$$\|X^{-1/2}(XD_s + D_xS - K)X^{1/2}\|_F = \|(X')^{-1/2} B (X')^{1/2}\|_F. \qquad \Box$$

It is interesting to note that the inequalities in the above lemma are independent of the scaling matrix $P$. In the next lemma we establish a lower bound for the stepsize $\theta_k$ which, together with Lemma 3.3, enables us to analyze the asymptotic behavior of the predictor-corrector algorithm.

Lemma 3.4 Let $(X^k, S^k)$, $(U, V)$, and $\theta_k$ be generated by the predictor-corrector algorithm. Then

$$\theta_k \geq \hat\theta_k,$$

where

$$\omega_k = \frac{1}{\mu_k}\,\|(X^k)^{-1/2}(X^kV + US^k + X^kS^k)(X^k)^{1/2}\|_F, \qquad \eta_k = \frac{1}{\mu_k}\,\|(X^k)^{-1/2}\, U V\, (X^k)^{1/2}\|_F,$$

$$\hat\theta_k = \frac{2}{\sqrt{\Big(\dfrac{\omega_k}{\gamma - \bar\gamma} + 1\Big)^2 + \dfrac{4\eta_k}{\gamma - \bar\gamma}} + \dfrac{\omega_k}{\gamma - \bar\gamma} + 1}.$$

Proof. For simplicity, let us omit the index $k$. By (2.8), we have

$$X(\theta)S(\theta) = XS + \theta(XV + US) + \theta^2 UV,$$

which, together with the linearity of $H_P(\cdot)$, the fact that $\mathrm{Tr}[H_P(M)] = \mathrm{Tr}\,M$ for $M \in \mathbb{R}^{n\times n}$, and (2.5a) with $\sigma = 0$, implies that

$$X(\theta) \bullet S(\theta) = \mathrm{Tr}[X(\theta)S(\theta)] = \mathrm{Tr}[H_P(X(\theta)S(\theta))] = \mathrm{Tr}[(1-\theta)H_P(XS) + \theta^2 H_P(UV)] = (1-\theta)\,X \bullet S + \theta^2\, U \bullet V.$$

Using the fact that $U \bullet V = 0$, we have

$$\mu(\theta) = (X(\theta) \bullet S(\theta))/n = (1-\theta)(X \bullet S)/n = (1-\theta)\mu.$$

Therefore,

$$X(\theta)S(\theta) - \mu(\theta)I = (1-\theta)(XS - \mu I) + \theta(XV + US + XS) + \theta^2\, UV,$$

and

$$\begin{aligned}
\|H_{X^{-1/2}}(X(\theta)S(\theta) - \mu(\theta)I)\|_F &\leq (1-\theta)\,\|X^{1/2}SX^{1/2} - \mu I\|_F + \theta\,\|X^{-1/2}(XV + US + XS)X^{1/2}\|_F + \theta^2\,\|X^{-1/2}\, UV\, X^{1/2}\|_F \\
&\leq \big[(1-\theta)\bar\gamma + \theta\omega + \theta^2\eta\big]\,\mu \leq (1-\theta)\gamma\mu \qquad \text{for } 0 \leq \theta \leq \hat\theta.
\end{aligned}$$

Hence, we have $X(\theta) \succ 0$ and $S(\theta) \succ 0$ for all $\theta \in [0, \hat\theta]$. Otherwise, there would exist a $\theta_0 \in [0, \hat\theta]$ such that $X(\theta_0)S(\theta_0)$ is singular, which means

$$\lambda_{\min}\big(X(\theta_0)S(\theta_0) - \mu(\theta_0)I\big) \leq -\mu(\theta_0). \tag{3.10}$$

On the other hand, (3.2) with $M = X^{-1/2}$ and $E = X(\theta_0)S(\theta_0) - \mu(\theta_0)I$ implies that

$$\lambda_{\min}\big(X(\theta_0)S(\theta_0) - \mu(\theta_0)I\big) \geq \lambda_{\min}\big(H_{X^{-1/2}}(X(\theta_0)S(\theta_0) - \mu(\theta_0)I)\big) \geq -\|H_{X^{-1/2}}(X(\theta_0)S(\theta_0) - \mu(\theta_0)I)\|_F \geq -\gamma\mu(\theta_0) > -\mu(\theta_0),$$

which contradicts (3.10). Using (3.4) with $G = X(\theta)$, $J = S(\theta)$, and $M = X^{-1/2}$, we have

$$d(X(\theta), S(\theta)) \leq \|H_{X^{-1/2}}(X(\theta)S(\theta) - \mu(\theta)I)\|_F \leq \gamma\mu(\theta) \qquad \text{for } \theta \in [0, \hat\theta].$$

Therefore, $(X(\theta), S(\theta)) \in \mathcal{N}(\gamma)$ for $0 \leq \theta \leq \hat\theta$. The result follows from the definition of $\theta_k$. $\Box$

4 A sufficient condition for superlinear convergence

In this section we investigate the asymptotic behavior of the predictor-corrector algorithm and obtain a sufficient condition for superlinear convergence.

Definition 4.1 A triple $(X^*, y^*, S^*) \in \mathcal{F}^*$ is called a strictly complementary solution of (2.3) if $X^* + S^* \succ 0$.

Throughout the paper we assume that the following condition holds.

Assumption 1. The SDP problem has a strictly complementary solution $(X^*, y^*, S^*)$.

Let $Q = (q_1, \dots, q_n)$ be an orthogonal matrix such that $q_1, \dots, q_n$ are eigenvectors of $X^*$ and $S^*$, and define

$$I_B = \{i : q_i^T X^* q_i > 0\}, \qquad I_N = \{i : q_i^T S^* q_i > 0\}.$$

It is easily seen that $I_B \cup I_N = \{1, 2, \dots, n\}$. For simplicity, let us assume that

$$Q^T X^* Q = \begin{pmatrix} \Lambda_B & 0 \\ 0 & 0 \end{pmatrix}, \qquad Q^T S^* Q = \begin{pmatrix} 0 & 0 \\ 0 & \Lambda_N \end{pmatrix},$$

where $\Lambda_B$ and $\Lambda_N$ are diagonal matrices. Here and in the sequel, if we write a matrix $M$ in the block form

$$M = \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix},$$

then we assume that the dimensions of $M_{11}$ and $M_{22}$ are $|I_B| \times |I_B|$ and $|I_N| \times |I_N|$, respectively. In the next lemma we use the following notation:

$$Q^T X^{1/2} Q = (x_1, x_2, \dots, x_n), \qquad Q^T S^{1/2} Q = (s_1, s_2, \dots, s_n),$$
$$Q^T X^{-1/2} Q = (\hat x_1, \hat x_2, \dots, \hat x_n), \qquad Q^T S^{-1/2} Q = (\hat s_1, \hat s_2, \dots, \hat s_n).$$

Lemma 4.2 (Potra-Sheng [13, Lemma 4.4]) Under Assumption 1 we have

$$\|x_i\| = O(\sqrt{\mu}),\ \|s_i\| = O(1),\ \forall i \in I_N; \qquad \|x_i\| = O(1),\ \|s_i\| = O(\sqrt{\mu}),\ \forall i \in I_B;$$
$$\|\hat x_i\| = O(1),\ \|\hat s_i\| = O(1/\sqrt{\mu}),\ \forall i \in I_B; \qquad \|\hat x_i\| = O(1/\sqrt{\mu}),\ \|\hat s_i\| = O(1),\ \forall i \in I_N.$$

Using Lemma 4.2, we can write

$$Q^T (X^k)^{1/2} Q = \begin{pmatrix} O(1) & O(\sqrt{\mu_k}) \\ O(\sqrt{\mu_k}) & O(\sqrt{\mu_k}) \end{pmatrix}, \qquad Q^T (X^k)^{-1/2} Q = \begin{pmatrix} O(1) & O(1) \\ O(1) & O(1/\sqrt{\mu_k}) \end{pmatrix},$$

$$Q^T (S^k)^{1/2} Q = \begin{pmatrix} O(\sqrt{\mu_k}) & O(\sqrt{\mu_k}) \\ O(\sqrt{\mu_k}) & O(1) \end{pmatrix}, \qquad Q^T (S^k)^{-1/2} Q = \begin{pmatrix} O(1/\sqrt{\mu_k}) & O(1) \\ O(1) & O(1) \end{pmatrix}.$$

Using the same techniques, we obtain a similar result for the predicted pair $(\bar X^k, \bar S^k)$.

Lemma 4.3 Let $\bar X^k = X^k(\theta_k)$, $\bar S^k = S^k(\theta_k)$. If Assumption 1 is satisfied, then we have

$$Q^T (\bar X^k)^{1/2} Q = \begin{pmatrix} O(1) & O(\sqrt{\mu_{k+1}}) \\ O(\sqrt{\mu_{k+1}}) & O(\sqrt{\mu_{k+1}}) \end{pmatrix}, \qquad Q^T (\bar X^k)^{-1/2} Q = \begin{pmatrix} O(1) & O(1) \\ O(1) & O(1/\sqrt{\mu_{k+1}}) \end{pmatrix},$$

$$Q^T (\bar S^k)^{1/2} Q = \begin{pmatrix} O(\sqrt{\mu_{k+1}}) & O(\sqrt{\mu_{k+1}}) \\ O(\sqrt{\mu_{k+1}}) & O(1) \end{pmatrix}, \qquad Q^T (\bar S^k)^{-1/2} Q = \begin{pmatrix} O(1/\sqrt{\mu_{k+1}}) & O(1) \\ O(1) & O(1) \end{pmatrix}.$$

As in [13], let us define the linear manifold

$$\begin{aligned}
\mathcal{M} = \{(X', y', S') \in \mathcal{S}^n \times \mathbb{R}^m \times \mathcal{S}^n :\ & A_i \bullet X' = b_i,\ i = 1,\dots,m;\ \sum_{i=1}^m y_i' A_i + S' = C; \\
& q_i^T X' q_j = 0 \text{ if } i \in I_N \text{ or } j \in I_N;\ q_i^T S' q_j = 0 \text{ if } i \in I_B \text{ or } j \in I_B\}.
\end{aligned} \tag{4.1}$$

It is easily seen that if $(X', y', S') \in \mathcal{M}$, then

$$Q^T X' Q = \begin{pmatrix} M_B & 0 \\ 0 & 0 \end{pmatrix}, \qquad Q^T S' Q = \begin{pmatrix} 0 & 0 \\ 0 & M_N \end{pmatrix}.$$

Lemma 4.4 (Potra-Sheng [13, Lemma 4.5]) Under Assumption 1, $\mathcal{F}^* \subset \mathcal{M}$.

Lemma 4.5 (Potra-Sheng [13, Lemma 4.6]) Under Assumption 1, every accumulation point of $\{(X^k, y^k, S^k)\}$ is a strictly complementary solution of (2.3).

Let us define

$$\delta_k = \delta_k(\Gamma) = \frac{1}{\mu_k}\,\|(X^k)^{-1/2}(X^k - \tilde X^k)(S^k - \tilde S^k)(X^k)^{1/2}\|_F, \tag{4.2}$$

where $(\tilde X^k, \tilde y^k, \tilde S^k)$ is a solution of the following minimization problem:

$$\min\big\{\|(X^k)^{-1/2}(X^k - X')(S^k - S')(X^k)^{1/2}\|_F : (X', y', S') \in \mathcal{M},\ \|(X', S')\|_F \leq \Gamma\big\}, \tag{4.3}$$

and $\Gamma$ is a constant such that $\|(X^k, S^k)\|_F \leq \Gamma$ for all $k$. Note that every accumulation point of $\{(X^k, y^k, S^k)\}$ belongs to the feasible set of the above minimization problem, and the feasible set is bounded. Therefore $(\tilde X^k, \tilde S^k)$ exists for each $k$.

Theorem 4.6 Under Assumption 1, if $\delta_k \to 0$ as $k \to \infty$, then the predictor-corrector algorithm is superlinearly convergent. Moreover, if there exists a constant $\tau > 0$ such that $\delta_k = O(\mu_k^\tau)$, then the convergence has Q-order at least $1+\tau$ in the sense that $\mu_{k+1} = O(\mu_k^{1+\tau})$.

Proof. For simplicity, let us omit the index $k$. It is easily seen that $(U + X - \tilde X,\ w + y - \tilde y,\ V + S - \tilde S)$ satisfies (3.6) with

$$K = (X - \tilde X)(S - \tilde S). \tag{4.4}$$

Here we have used the relation $\tilde X \tilde S = \tilde S \tilde X = 0$. The matrix

$$\Phi = X^{-1/2}(X - \tilde X)(S - \tilde S)X^{1/2} \tag{4.5}$$

clearly satisfies

$$\|\Phi\|_F = \|X^{-1/2} K X^{1/2}\|_F = \delta\mu. \tag{4.6}$$

Denoting

$$\Phi_x = X^{-1/2}(U + X - \tilde X)X^{-1/2}, \qquad \Phi_s = X^{1/2}(V + S - \tilde S)X^{1/2},$$

and applying (i) of Lemma 3.3, we obtain

$$\mu\,\|\Phi_x\|_F \leq \tau_1\,\|\Phi\|_F \leq \tau_1\delta\mu,$$

which implies

$$\|\Phi_x\|_F \leq \tau_1\delta. \tag{4.7}$$

Similarly,

$$\|\Phi_s\|_F \leq \tau_1\,\|\Phi\|_F \leq \tau_1\delta\mu. \tag{4.8}$$

By Lemma 4.2 and the fact that $(\tilde X, \tilde y, \tilde S) \in \mathcal{M}$, we have

$$\begin{aligned}
\|X^{-1/2}(X - \tilde X)X^{-1/2}\|_F &= \|I - X^{-1/2}\tilde X X^{-1/2}\|_F \leq \|I\|_F + \|X^{-1/2}\tilde X X^{-1/2}\|_F \\
&= \sqrt{n} + \|Q^T X^{-1/2} Q\; Q^T \tilde X Q\; Q^T X^{-1/2} Q\|_F \\
&= \sqrt{n} + \|(\hat x_1, \dots, \hat x_n)\; Q^T \tilde X Q\; (\hat x_1, \dots, \hat x_n)^T\|_F \\
&= \sqrt{n} + \Big\|\sum_{i,j \in I_B} (q_i^T \tilde X q_j)\,\hat x_i \hat x_j^T\Big\|_F = O(1).
\end{aligned} \tag{4.9}$$

In a similar manner we obtain

$$\|S^{-1/2}(S - \tilde S)S^{-1/2}\|_F = O(1).$$

Let us observe that

$$\begin{aligned}
X^{-1/2}\, U V\, X^{1/2} &= \big(X^{-1/2} U X^{-1/2}\big)\big(X^{1/2} V X^{1/2}\big) \\
&= \big(\Phi_x - X^{-1/2}(X - \tilde X)X^{-1/2}\big)\big(\Phi_s - X^{1/2}(S - \tilde S)X^{1/2}\big) \\
&= \Phi_x\Phi_s - \big(X^{-1/2}(X - \tilde X)X^{-1/2}\big)\Phi_s - \Phi_x\, X^{1/2}S^{1/2}\big(S^{-1/2}(S - \tilde S)S^{-1/2}\big)S^{1/2}X^{1/2} + \Phi.
\end{aligned} \tag{4.10}$$

Then from (4.6), (4.7), (4.8), (4.9) and (4.10), we get

$$\|X^{-1/2}\, U V\, X^{1/2}\|_F \leq \|\Phi_x\|_F\|\Phi_s\|_F + \|X^{-1/2}(X - \tilde X)X^{-1/2}\|_F\,\|\Phi_s\|_F + \|X^{1/2}S^{1/2}\|^2\,\|\Phi_x\|_F\,\|S^{-1/2}(S - \tilde S)S^{-1/2}\|_F + \|\Phi\|_F = O(\delta\mu).$$

Hence, $\eta_k = O(\delta_k) \to 0$. Applying (ii) of Lemma 3.3, we obtain

$$\|X^{-1/2}\big[X(V + S - \tilde S) + (U + X - \tilde X)S - K\big]X^{1/2}\|_F \leq \tau_2\,\|\Phi\|_F \leq \tau_2\delta\mu.$$

Noting that

$$X(V + S - \tilde S) + (U + X - \tilde X)S - K = XV + US + XS,$$

we deduce

$$\omega_k = \frac{1}{\mu}\,\|X^{-1/2}(XV + US + XS)X^{1/2}\|_F \leq \tau_2\delta = O(\delta).$$

Hence, $\omega_k \to 0$. Finally, if $\delta_k = O(\mu_k^\tau)$ for some constant $\tau > 0$, then we have $\eta_k = O(\mu_k^\tau)$ and $\omega_k = O(\mu_k^\tau)$. From Lemma 3.4, it follows that

$$1 - \theta_k \leq 1 - \hat\theta_k = \frac{\sqrt{\Big(\dfrac{\omega_k}{\gamma-\bar\gamma}+1\Big)^2 + \dfrac{4\eta_k}{\gamma-\bar\gamma}} + \dfrac{\omega_k}{\gamma-\bar\gamma} - 1}{\sqrt{\Big(\dfrac{\omega_k}{\gamma-\bar\gamma}+1\Big)^2 + \dfrac{4\eta_k}{\gamma-\bar\gamma}} + \dfrac{\omega_k}{\gamma-\bar\gamma} + 1} \leq \frac{\omega_k + \eta_k}{\gamma - \bar\gamma} = O(\delta_k) = O(\mu_k^\tau),$$

where the last inequality uses $\sqrt{a^2 + b} \leq a + b/(2a)$ with $a = \omega_k/(\gamma-\bar\gamma) + 1 \geq 1$ in the numerator and bounds the denominator from below by 2. Therefore, $\mu_{k+1} = (1 - \theta_k)\mu_k = O(\mu_k^{1+\tau})$. $\Box$

Theorem 4.6 was originally obtained by Potra and Sheng [14]. Based on Theorem 4.6, we establish the following generalization of the result of Potra and Sheng [14, Theorem 6.1].

Theorem 4.7 Under Assumption 1, if $X^kS^k/\sqrt{X^k \bullet S^k} \to 0$ as $k \to \infty$, then the predictor-corrector algorithm is superlinearly convergent. Moreover, if $X^kS^k = O(\mu_k^{0.5+\tau})$ for some constant $\tau > 0$, then the convergence has Q-order at least $1 + \min\{\tau, 0.5\}$.
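Unlike condition (C), the quantity appearing in Theorem 4.7 is directly computable from the current iterate, so it can be monitored at run time. A one-function sketch (ours; the Frobenius norm is our choice of matrix norm for the illustration):

```python
import numpy as np

def potra_sheng_indicator(X, S):
    """||X S||_F / sqrt(X . S): by Theorem 4.7, the algorithm converges
    superlinearly whenever this quantity tends to zero along the iterates."""
    return np.linalg.norm(X @ S, 'fro') / np.sqrt(np.trace(X @ S))
```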

5 Superlinear convergence under strict complementarity and nondegeneracy

Throughout this section we assume that Assumption 1 (strict complementarity) holds. Let $(X^*, y^*, S^*)$ be a strictly complementary solution of (2.1) and (2.2). We will also assume the following nondegeneracy condition, introduced by Kojima, Shida and Shindoh [4, 5]. First, let us define an affine space $\mathcal{G}_0$ by

$$\mathcal{G}_0 = \Big\{(U, V) \in \mathcal{S}^n \times \mathcal{S}^n : A_i \bullet U = 0,\ i = 1,\dots,m;\ \sum_{i=1}^m w_i A_i + V = 0 \text{ for some } w \in \mathbb{R}^m\Big\}.$$

Assumption 2. (Nondegeneracy) If $X^*V + US^* = 0$ and $(U, V) \in \mathcal{G}_0$, then $(U, V) = (0, 0)$.

As remarked in Section 5 of Kojima, Shida and Shindoh [5], under the strict complementarity assumption the above nondegeneracy condition is equivalent to the combination of the primal and dual nondegeneracy conditions given by Alizadeh, Haeberly and Overton [2]. Under Assumptions 1 and 2, the solution $(X^*, S^*)$ is unique. Therefore the iteration sequence $\{(X^k, S^k)\}$ converges to $(X^*, S^*)$, and so does the sequence of predicted pairs $\{(\bar X^k, \bar S^k)\}$.

Lemma 5.1 (Kojima-Shida-Shindoh [5], Lemma 5.3) Assume that $H_I(US^* + X^*V) = 0$ and $(U, V) \in \mathcal{G}_0$. Then $(U, V) = (0, 0)$.

Let $R$ be a nonsingular matrix and

$$\tilde A_i = (R^{-1})^T A_i R^{-1},\ i = 1,\dots,m, \qquad \tilde C = (R^{-1})^T C R^{-1}, \qquad \tilde b = b.$$

It is easily seen that the $R$-scaled SDP

$$\tilde A_i \bullet X = \tilde b_i,\ i = 1,\dots,m, \tag{5.1a}$$
$$\sum_{i=1}^m y_i \tilde A_i + S = \tilde C, \tag{5.1b}$$
$$XS = 0,\quad X \succeq 0,\ S \succeq 0, \tag{5.1c}$$

also satisfies the strict complementarity and nondegeneracy conditions. Its unique solution is $(RX^*R^T,\ y^*,\ (R^{-1})^T S^* R^{-1})$. Using Lemma 5.1 and considering the new SDP (5.1), we can easily obtain the following lemma.

Lemma 5.2 Assume that for some nonsingular matrix $R$, $H_R(US^* + X^*V) = 0$ and $(U, V) \in \mathcal{G}_0$. Then $(U, V) = (0, 0)$.

In the next lemma, $\mathrm{cond}_F(B) = \|B\|_F\,\|B^{-1}\|_F$ denotes the condition number of a matrix $B$.

Lemma 5.3 If $\mathrm{cond}_F(\bar P^k) = O(1)$, then

$$\|(\bar U^k, \bar V^k)\|_F = O(\sqrt{\mu_{k+1}}). \tag{5.2}$$

Proof. Let $R^k = \bar P^k/\|\bar P^k\|_F$. Then $\|R^k\|_F = 1$ and $\|(R^k)^{-1}\|_F = \mathrm{cond}_F(\bar P^k) = O(1)$. At the corrector step of the algorithm, we have

$$H_{R^k}\big(\bar U^k \bar S^k + \bar X^k \bar V^k\big) = H_{R^k}\big(\mu_{k+1} I - \bar X^k \bar S^k\big), \tag{5.3}$$

where $\bar X^k = X^k(\theta_k)$, $\bar S^k = S^k(\theta_k)$. From Lemma 4.3, it is easily seen that

$$\|\bar X^k \bar S^k + \bar S^k \bar X^k\|_F = O(\sqrt{\mu_{k+1}}).$$

Suppose (5.2) is not true, i.e., the sequence $\{(\bar U^k, \bar V^k)/\sqrt{\mu_{k+1}}\}$ is unbounded. Then we can choose a subsequence such that

$$\frac{\|(\bar U^k, \bar V^k)\|_F}{\sqrt{\mu_{k+1}}} \to \infty, \qquad R^k \to R^*,\quad (R^k)^{-1} \to (R^*)^{-1},$$

and

$$\frac{(\bar U^k, \bar V^k)}{\|(\bar U^k, \bar V^k)\|_F} \to (U', V').$$

Obviously, $(U', V') \in \mathcal{G}_0$. The fact that the matrices $A_i$, $i = 1,\dots,m$, are linearly independent, together with $(U', V') \in \mathcal{G}_0$, implies that $(U', V') \neq 0$. Dividing both sides of (5.3) by $\|(\bar U^k, \bar V^k)\|_F$ and letting $k \to \infty$ along the subsequence, we obtain

$$H_{R^*}(U'S^* + X^*V') = 0,$$

which contradicts Lemma 5.2. $\Box$

Theorem 5.4 Under the strict complementarity and nondegeneracy assumptions, if $\mathrm{cond}_F(\bar P^k) = O(1)$, then the algorithm is superlinearly convergent with Q-order at least 1.5.

Proof. At the predictor step, we have

$$(X^k, S^k) = (\bar X^{k-1}, \bar S^{k-1}) + (\bar U^{k-1}, \bar V^{k-1}).$$

Thus,

$$H_{\bar P^{k-1}}(X^k S^k) = H_{\bar P^{k-1}}\big([\bar X^{k-1} + \bar U^{k-1}][\bar S^{k-1} + \bar V^{k-1}]\big) = \mu_k I + H_{\bar P^{k-1}}(\bar U^{k-1}\bar V^{k-1}).$$

Then, by Lemma 5.3, we obtain

$$\|H_{\bar P^{k-1}}(X^k S^k)\|_F = O(\mu_k).$$

Note that

$$\|H_{\bar P^{k-1}}(X^k S^k)\|_F^2 = \frac{1}{2}\,\|\bar P^{k-1} X^k S^k (\bar P^{k-1})^{-1}\|_F^2 + \frac{1}{2}\,\|(X^k)^{1/2} S^k (X^k)^{1/2}\|_F^2 \geq \frac{1}{2}\,\|\bar P^{k-1} X^k S^k (\bar P^{k-1})^{-1}\|_F^2. \tag{5.4}$$

Therefore,

$$\|X^k S^k\|_F = \|(\bar P^{k-1})^{-1}\,\big[\bar P^{k-1} X^k S^k (\bar P^{k-1})^{-1}\big]\,\bar P^{k-1}\|_F \leq O(1)\,\|H_{\bar P^{k-1}}(X^k S^k)\|_F = O(\mu_k), \tag{5.5}$$

which ends the proof by invoking Theorem 4.7. $\Box$

The above result says that the superlinear convergence of the predictor-corrector algorithm is independent of the choice of the scaling matrices $P^k$ in the predictor step of the algorithm, while the scaling matrices used in the corrector step need to be "well-conditioned" for superlinear convergence. Clearly, the family of scaling matrices admissible in the corrector step for superlinear convergence includes the identity matrix, which defines the AHO direction, as a special case. By imposing the same assumption on the scaling matrices used in the predictor step, together with a new strategy for the step size, we can improve the order of convergence stated in Theorem 5.4. In order to achieve quadratic convergence we need to slightly modify the choice of the step size. Instead of $\theta_k$ given by (2.9), we will use

$$\theta_k = \max\big\{\tilde\theta \in [0,\ \max\{0.99,\ 1 - \mu_k^2\}] : (X^k(\theta), S^k(\theta)) \in \mathcal{N}(\gamma)\ \ \forall\,\theta \in [0, \tilde\theta]\big\}. \tag{5.6}$$
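In code, the modification only caps the interval over which the line search of (2.9) is performed. A sketch (ours), reusing `predictor_step_length` from the listing in Section 2:

```python
# Assumes predictor_step_length() from the sketch in Section 2.

def modified_step_length(X, S, U, V, gamma, mu_k, tol=1e-12):
    """Step-size rule (5.6): the same neighborhood line search as (2.9),
    except that theta is restricted to [0, cap] with cap = max{0.99, 1 - mu_k^2},
    so that 1 - theta_k >= min{0.01, mu_k^2}."""
    cap = max(0.99, 1.0 - mu_k ** 2)
    # Search t = theta/cap over [0, 1] with the direction rescaled by cap.
    return cap * predictor_step_length(X, S, cap * U, cap * V, gamma, tol)
```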

The predictor-corrector algorithm with this new strategy will be called the modified predictor-corrector algorithm. It is easily seen that the modified predictor-corrector algorithm still has polynomial complexity. In what follows we show that it is also quadratically convergent.

Theorem 5.5 Under the hypothesis of Theorem 5.4, if $\mathrm{cond}_F(P^k) = O(1)$, then the modified predictor-corrector algorithm is quadratically convergent.

Proof. From the proof of Theorem 5.4 (cf. (5.5)), we have

$$X^k S^k = O(\mu_k). \tag{5.7}$$

Using (5.7) and an argument similar to that employed in the proof of Lemma 5.3, we get

$$U^k = O(\mu_k), \qquad V^k = O(\mu_k). \tag{5.8}$$

Then we can write

$$H_{P^k}(\bar X^k \bar S^k) = H_{P^k}\big([X^k + \theta_k U^k][S^k + \theta_k V^k]\big) = (1-\theta_k)H_{P^k}(X^k S^k) + \theta_k^2 H_{P^k}(U^k V^k) = (1-\theta_k)O(\mu_k) + O(\mu_k^2) = O(\omega_k\mu_k),$$

where $\omega_k = \max\{\mu_k,\ 1-\theta_k\}$. As in (5.4)-(5.5), we can prove that

$$\|\bar X^k \bar S^k\|_F = O(\omega_k\mu_k). \tag{5.9}$$

Using (5.9) and the same argument as in Lemma 5.3, we get

$$\bar U^k = O(\omega_k\mu_k) \quad \text{and} \quad \bar V^k = O(\omega_k\mu_k). \tag{5.10}$$

Observing that

$$H_{\bar P^k}(X^{k+1}S^{k+1}) - \mu_{k+1}I = H_{\bar P^k}(\bar U^k \bar V^k) = O(\omega_k^2\mu_k^2),$$

we have

$$\|H_{\bar P^k}(X^{k+1}S^{k+1}) - \mu_{k+1}I\|_F/\mu_{k+1} = \frac{O(\omega_k^2\mu_k^2)}{(1-\theta_k)\mu_k} = \frac{O(\omega_k^2\mu_k)}{1-\theta_k}.$$

Since $\mathrm{cond}_F(P^k) \leq C_1$ and $\mathrm{cond}_F(\bar P^k) \leq C_1$ for some constant $C_1$, we can write

$$\gamma_k := \|H_{P^k}(X^k S^k) - \mu_k I\|_F/\mu_k \leq C_1\,\|X^k S^k - \mu_k I\|_F/\mu_k \tag{5.11}$$
$$\leq C_1^2\,\|\bar P^{k-1} X^k S^k (\bar P^{k-1})^{-1} - \mu_k I\|_F/\mu_k \tag{5.12}$$
$$\leq C_2\,\mu_{k-1}\max\Big\{1-\theta_{k-1},\ \frac{\mu_{k-1}^2}{1-\theta_{k-1}}\Big\}, \tag{5.13}$$

where $C_2$ is a positive constant. Without loss of generality, we may assume

$$\mu_{k-1} \leq \min\{0.1,\ \bar\gamma/C_2\} \quad \text{for } k \geq K,$$

which, together with (5.6) and (5.13), implies

$$\theta_{k-1} \leq 1 - \mu_{k-1}^2 \quad \text{and} \quad \gamma_k \leq C_2\,\mu_{k-1} \leq \bar\gamma \quad \text{for } k \geq K.$$

Let

$$\theta_k' = \frac{2}{\sqrt{1 + 4\zeta_k/(\gamma - \gamma_k)} + 1}, \qquad \zeta_k = \|P^k U^k V^k (P^k)^{-1}\|_F/\mu_k.$$

Evidently, $\zeta_k = O(\mu_k)$. Then for all $\theta \in [0, \theta_k']$, we have

$$\|(X^k(\theta))^{1/2} S^k(\theta) (X^k(\theta))^{1/2} - \mu_k(\theta)I\|_F \leq \|H_{P^k}(X^k(\theta)S^k(\theta)) - \mu_k(\theta)I\|_F \leq \big[(1-\theta)\gamma_k + \theta^2\zeta_k\big]\,\mu_k \leq (1-\theta)\gamma\mu_k = \gamma\mu_k(\theta),$$

i.e., $(X^k(\theta), S^k(\theta)) \in \mathcal{N}(\gamma)$. This means $\theta_k \geq \min\{\theta_k',\ 1 - \mu_k^2\}$ for $k \geq K$. Therefore,

$$1 - \theta_k \leq \max\{1 - \theta_k',\ \mu_k^2\} = O(\zeta_k) + O(\mu_k^2) = O(\mu_k),$$

and

$$\mu_{k+1} = (1 - \theta_k)\mu_k = O(\mu_k^2). \qquad \Box$$

6 Remarks

In this paper we considered only the feasible version of the predictor-corrector method, to keep the presentation simple. However, the analysis used here can easily be extended to infeasible predictor-corrector algorithms based on the unified direction proposed by Monteiro and Zhang. Under the strict complementarity and nondegeneracy assumptions we have established the superlinear convergence, with Q-order 1.5, of the "pure" predictor-corrector algorithm, provided the scaling matrices for the corrector step satisfy $\mathrm{cond}_F(\bar P^k) = O(1)$. Whether superlinear convergence can be obtained under a weaker condition is an interesting topic for future research. Finally, we mention that quadratic convergence was established for the predictor-corrector algorithm with a slight modification of the step size selection. It would be interesting to find out whether quadratic convergence can be proved for the "original" predictor-corrector algorithm.

References

[1] F. Alizadeh, J.-P. A. Haeberly, and M. L. Overton. Primal-dual interior point methods for semidefinite programming. Working paper, 1994.

[2] F. Alizadeh, J.-P. A. Haeberly, and M. L. Overton. Complementarity and nondegeneracy in semidefinite programming. Technical Report 681, NYU Computer Science Dept, 1995. To appear in Mathematical Programming (Series B).

[3] C. Helmberg, F. Rendl, R. J. Vanderbei, and H. Wolkowicz. An interior-point method for semidefinite programming. Technical report, Program in Statistics and Operations Research, Princeton University, 1994.

[4] M. Kojima, M. Shida, and S. Shindoh. Local convergence of predictor-corrector infeasible-interior-point algorithms for semidefinite programs. Research Reports on Information Sciences B-306, Department of Information Sciences, Tokyo Institute of Technology, 2-12-1 Oh-Okayama, Meguro-ku, Tokyo 152, Japan, December 1995.

[5] M. Kojima, M. Shida, and S. Shindoh. A predictor-corrector interior-point algorithm for the semidefinite linear complementarity problem using the Alizadeh-Haeberly-Overton search direction. Research Reports on Information Sciences B-311, Department of Information Sciences, Tokyo Institute of Technology, 2-12-1 Oh-Okayama, Meguro-ku, Tokyo 152, Japan, January 1996.

[6] M. Kojima, S. Shindoh, and S. Hara. Interior-point methods for the monotone linear complementarity problem in symmetric matrices. Research Reports on Information Sciences B-282, Department of Information Sciences, Tokyo Institute of Technology, 2-12-1 Oh-Okayama, Meguro-ku, Tokyo 152, Japan, April 1994.

[7] Z.-Q. Luo, J. F. Sturm, and S. Zhang. Superlinear convergence of a symmetric primal-dual path following algorithm for semidefinite programming. Report 9607/A, Econometric Institute, Erasmus University Rotterdam, The Netherlands, January 1996.

[8] R. D. C. Monteiro. Primal-dual path following algorithms for semidefinite programming. Working paper, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA, September 1995.

[9] R. D. C. Monteiro. Polynomial convergence of primal-dual algorithms for semidefinite programming based on Monteiro and Zhang family of directions. Working paper, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA, July 1996.

[10] R. D. C. Monteiro and S. J. Wright. Local convergence of interior-point algorithms for degenerate monotone LCP. Computational Optimization and Applications, 3:131-155, 1994.

[11] R. D. C. Monteiro and Y. Zhang. A unified analysis for a class of path-following primal-dual interior-point algorithms for semidefinite programming. Working paper, June 1996.

[12] Y. E. Nesterov and M. J. Todd. Primal-dual interior-point methods for self-scaled cones. Technical Report 1125, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, New York 14853-3801, USA, 1995.

[13] F. A. Potra and R. Sheng. A superlinearly convergent primal-dual infeasible-interior-point algorithm for semidefinite programming. Reports on Computational Mathematics 78, Department of Mathematics, The University of Iowa, Iowa City, IA 52242, USA, October 1995, revised January 1997. To appear in SIAM Journal on Optimization.

[14] F. A. Potra and R. Sheng. Superlinear convergence of interior-point algorithms for semidefinite programming. Reports on Computational Mathematics 86, Department of Mathematics, The University of Iowa, Iowa City, IA 52242, USA, April 1996. Revised May 1996.

[15] F. A. Potra and R. Sheng. Superlinear convergence of a predictor-corrector method for semidefinite programming without shrinking central path neighborhood. Reports on Computational Mathematics 91, Department of Mathematics, The University of Iowa, Iowa City, IA 52242, USA, August 1996.

[16] R. Sheng, F. A. Potra, and J. Ji. On a general class of interior-point algorithms for semidefinite programming with polynomial complexity and superlinear convergence. Reports on Computational Mathematics 89, Department of Mathematics, The University of Iowa, Iowa City, IA 52242, USA, June 1996.

[17] Y. Zhang. On extending primal-dual interior-point algorithms from linear programming to semidefinite programming. TR 95-20, Department of Mathematics and Statistics, University of Maryland Baltimore County, Baltimore, Maryland 21228-5398, USA, October 1995.