SEMIDEFINITE PROGRAMMING IN THE SPACE OF ... - CiteSeerX

Report 2 Downloads 81 Views
SIAM J. OPTIM. Vol. 14, No. 1, pp. 139–172

c 2003 Society for Industrial and Applied Mathematics 

SEMIDEFINITE PROGRAMMING IN THE SPACE OF PARTIAL POSITIVE SEMIDEFINITE MATRICES∗ SAMUEL BURER† Abstract. We build upon the work of Fukuda et al. [SIAM J. Optim., 11 (2001), pp. 647–674] and Nakata et al. [Math. Program., 95 (2003), pp. 303–327], in which the theory of partial positive semidefinite matrices was applied to the semidefinite programming (SDP) problem as a technique for exploiting sparsity in the data. In contrast to their work, which improved an existing algorithm based on a standard search direction, we present a primal-dual path-following algorithm that is based on a new search direction, which, roughly speaking, is defined completely within the space of partial symmetric matrices. We show that the proposed algorithm computes a primal-dual solution to the SDP problem having duality gap less than a fraction ε > 0 of the initial duality gap in O(n log(ε−1 )) iterations, where n is the size of the matrices involved. Moreover, we present computational results showing that the algorithm possesses several advantages over other existing implementations. Key words. semidefinite programming, sparsity, matrix completion, numerical experiments AMS subject classifications. 90C06, 90C22, 90C51 DOI. 10.1137/S105262340240851X

1. Introduction. The semidefinite programming (SDP) problem has been studied extensively in recent years, and many different types of algorithms for solving SDPs have been proposed. Various primal-dual interior-point methods for linear programming can be extended to SDP with equivalent iteration complexities, typically √ O( n log(ε−1 )), where n is the size of matrices in the SDP problem and ε > 0 is the desired fractional reduction in the duality gap; for example, see [1, 15, 17, 22, 23, 25, 28, 30, 32]. In practice, these methods have many advantages, including applicability to any standard form SDP, accurate primal-dual optimal solutions in a small number of iterations, and exploitation of sparsity in certain key stages of the algorithm. On the other hand, they also exhibit some notable disadvantages, such as the need to compute, store, and work with dense matrices—in particular, handling the n × n primal iterate X and the m × m Schur complement matrix M , where m is the number of linear constraints in the primal SDP, as well as solving the Schur complement equation involving M . Techniques for dealing with the disadvantages of primal-dual methods have also been developed. For example, to avoid working with the dense matrix X (while maintaining the use of M ), Benson, Ye, and Zhang [2] have developed a polynomialtime interior-point method that involves only the dual variables (S, y) and the lower Cholesky factor L of S, since S and L are generally sparse when the SDP data is sparse. In contrast, others have eliminated the need to compute and store M (while maintaining the use of primal-dual iterates (X, S, y)) by using iterative methods such as the preconditioned conjugate gradient method to solve the Schur complement equation (see [20, 27, 29]). When solving the Schur complement equation using an iterative method, however, an inevitable side effect is the increased difficulty of obtaining ac∗ Received

by the editors May 29, 2002; accepted for publication (in revised form) January 24, 2003; published electronically July 18, 2003. This work was supported in part by NSF grant CCR0203426. http://www.siam.org/journals/siopt/14-1/40851.html † Department of Management Sciences, University of Iowa, Iowa City, IA 52242-1000 ([email protected]). 139

140

SAMUEL BURER

curate primal-dual optimal solutions, due to the ill-conditioning of the matrix near optimality. Other methods, the so-called first-order nonlinear programming algorithms for SDP, depart even more significantly from the standard primal-dual interior-point methods. Generally speaking, these methods solve special classes of SDPs, work in either the primal or dual space, operate on sparse matrices (or compact representations of dense matrices), and sacrifice the underlying theoretical guarantee of polynomial convergence for better opportunities to exploit sparsity and structure. As a result of these algorithmic choices as well as the ill-conditioning that is inherent near optimality, these methods typically can compute optimal solutions of low to medium accuracy in a reasonable balance of iterations and time. See [4, 6, 5, 8, 13, 14] for more information on this class of algorithms. So far, no one has proposed a method that possesses theoretical polynomial convergence, can solve any standard-form SDP, works in both the primal and dual spaces, and can aggressively exploit sparsity in all stages of computation, including the complete avoidance of dense matrices. In this paper, we propose such a method and explore its theoretical and practical characteristics. The basic idea of the method presented in this paper is drawn from the recent work of Fukuda et al. [9], in which they show that the theory of partial positive semidefinite matrices can be applied to SDPs to help better take advantage of sparsity. In particular, their “completion method” demonstrates that primal-dual interior-point algorithms can be implemented using a certain “partial” representation of the dense matrix variable X. Computational results given in Nakata et al. [26], which employ the sparse representation of X together with the computation and storage of M in each iteration, indicate the efficiency of the completion method on several classes of problems. The completion method can be viewed as a computational enhancement of an existing primal-dual path-following implementation that is based on the Helmberg– Rendl–Venderbei–Wolkowicz/Kojima–Shindoh–Hara/Monteiro (or HRVW/KSH/M) search direction (which was first defined in Helmberg et al. [15]). From a theoretical point of view, however, the completion method is not known to converge in polynomial time, with the main obstacle being how to measure the proximity of a partial primal-dual solution to the central path. (See the concluding comments of section 5 in [9], where a polynomial potential-reduction algorithm is discussed but the problem of a path-following algorithm is considered open.) In addition, since the completion method employs the Schur complement matrix M directly, there is a practical limitation to the size of SDP that can be solved by this method. Of course, a simple idea to eliminate the direct use of M would be to use an iterative method to solve the Schur complement equation. The method of this paper improves upon the completion method of Fukuda et al. in two primary ways. The first is theoretical: the method is a polynomial-time pathfollowing algorithm based entirely on partial positive semidefinite matrices, where the main idea is a reformulation of the central path that yields search directions in the space of partial matrices and that also motivates a new neighborhood of the central path, which has some critical properties when viewed in the context of matrix completion. 
The second is practical: when the Schur complement equation in our method is solved using an iterative method, our approach provides even more opportunity to take advantage of the sparsity of the SDP data. In section 5, computational results are given to demonstrate this.

SEMIDEFINITE PROGRAMMING WITH PARTIAL MATRICES

141

Computational results are also given comparing our method with two other successful methods: a primal-dual interior-point method that possesses polynomial convergence but computes and stores both X and M ; and a dual-only first-order algorithm that works exclusively with sparse matrices but does not possess polynomial convergence. The overall conclusion of this paper is that our method achieves several advantages that were previously found only in separate algorithms: theoretically strong convergence, applicability to any SDP, a primal-dual framework, and the opportunity to exploit sparsity in all stages of computation. 1.1. Basic notation and terminology. In this paper, , p , and p×q denote p p the typical Euclidean spaces, and S p , S+ , and S++ denote symmetric, symmetric positive semidefinite, and symmetric positive definite matrices, respectively. Lower triangular matrices are denoted by Lp , and those with positive diagonal entries are p signified by Lp++ . Similarly, we define U p and U++ for upper triangular matrices. p For any v ∈  , vi is the ith component of v; for any A ∈ p×q , Aij , Ai· , and A·j T denote the standard subparts of A. For vectors por matrices, the notation · denotes the p×p The transpose, and for any A ∈  , tr(A) = i=1 Aii denotes the trace q p function. standard inner product on p×q is denoted as A • B = tr(AT B) = i=1 j=1 Aij Bij , and the Frobenius norm, which is induced by the inner product •, is denoted A F = (A•A)1/2 . For any A ∈ S p (which necessarily has real eigenvalues), we note that tr(A) also equals the sum of the eigenvalues of A. The maximum and minimum eigenvalues p×q of A are denoted by λmax [A] and λmin [A]. For any A ∈  , we define the 2-norm T of A to be A = λmax [A A]. Some important inequalities are as follows: for all A, B ∈ p×q , A • B ≤ A F B F , and for all A ∈ p×q , A ≤ A F . If a function f defined between p and q is twice differentiable at v ∈ p , then the first derivative of f at v is denoted as the linear operator f  (v) : p → q , which operates on a ∈ p as f  (v)[a]. In addition, if q = 1, then the gradient ∇f (v) ∈ p uniquely satisfies ∇f (v)T a = f  (v)[a] for all a ∈ p . The second derivative of f at v is denoted by the bilinear map f  (v) : p × p → q , which is written as f  (v)[a, b] for all (a, b) ∈ p × p . If b = a, then we write f  (v)[a](2) . If f is also invertible, then f −1 denotes its inverse, and if f is linear, then its adjoint is denoted by f ∗ . In addition, if p = q and f is linear, then f is called positive definite if v T f (v) > 0 for all v = 0. p The function chol : S++ → Lp++ computes the lower Cholesky factor; that is, T chol(S) = L, where S = LL . Letting M −1 denote the matrix inverse of M and inv the matrix inverse function, we have that inv (M )[A] = −M −1 AM −1 . We also let argmax denote the unique optimal solution of a given maximization problem, and we define argmin similarly. The matrix I denotes the identity matrix (of appropriate size), and for any A ∈ p×p , diag(A) extracts the diagonal of A. Also, for any p M ∈ S+ , M 1/2 denotes the matrix square root of M . 2. The partial SDP problem. We consider the standard-form primal SDP problem   ˆ : A(X) ˆ = b, X ˆ ∈ Sn (Pˆ ) min C • X + and its dual ˆ (D)

  n , max bT y : A∗ (y) + Sˆ = C, Sˆ ∈ S+

ˆ S, ˆ y) ∈ S n × S n × m and the data are C ∈ S n , b ∈ m , where the variables are (X, m n and {Ak }k=1 ⊂ S . The symbol A denotes the linear map A : S n → m defined by

142

SAMUEL BURER

ˆ and its adjoint A∗ : m → S n is defined by A∗ (y) = m yk Ak . ˆ k = Ak • X, A(X) k=1 We use similar notation for the sets of primal-dual feasible solutions and primal-dual ˆ and F 0 (Pˆ )×F 0 (D), ˆ respectively— interior feasible solutions as in [22]—F(Pˆ )×F(D) and we also make the following standard assumptions: ˆ the matrices {Ak }m are linearly independent; A1. k=1 ˆ is nonempty. ˆ the set of primal-dual interior feasible solutions F 0 (Pˆ ) × F 0 (D) A2. ˆ and A2, ˆ both (Pˆ ) and (D) ˆ have opIt is well known that, under assumptions A1 ˆ ∗ and (Sˆ∗ , y ∗ ), which are characterized by the equivalent conditions timal solutions X ˆ ∗ • Sˆ∗ is zero and that the matrix product X ˆ ∗ Sˆ∗ is zero. Morethat the duality gap X ˆ ν , Sˆν , yν ), over, for every ν > 0, there exists a unique primal-dual feasible solution (X ˆ Sˆ = νI. The set of all solutions which satisfies the perturbed optimality equation X ˆ ν , Sˆν , yν ) : ν > 0} is known as the central path, and Cˆ serves as the basis Cˆ ≡ {(X ˆ The basic idea is to construct for path-following algorithms that solve (Pˆ ) and (D). ˆ that stays in a neighborhood of the ˆ k , Sˆk , y k )}k≥0 ⊂ F 0 (Pˆ ) × F 0 (D) a sequence {(X ˆ k • Sˆk goes to zero. central path such that the duality gap X A scaled measure of the duality gap that proves useful in the presentation and analysis of path-following algorithms is ˆ ˆ ˆ S) ˆ ∈ S n × S n. ˆ S) ˆ ≡ X •S ∀ (X, µ(X, n ˆ S) ˆ > 0 unless X ˆ Sˆ = 0. Moreover, ˆ S) ˆ ∈ S n × S n , we have µ(X, Note that, for all (X, + + ˆ ν , Sˆν , yν ) on the central path. ˆ ν , Sˆν ) = ν for all points (X µ(X (2.1)

2.1. The positive semidefinite matrix completion. Recently, Fukuda et al. [9] introduced techniques for exploiting sparsity using ideas from the theory of matrix completions. In this section, we recapitulate their main results and introduce corresponding notation that will be used throughout the paper. Let V = {1, . . . , n} denote the row and column indices of an n × n matrix. Also define the aggregate density pattern E of the data {C} ∪ {Ak }m k=1 as follows: E ≡ {(i, j) ∈ V × V : ∃ y ∈ m such that [C − A∗ (y)]ij = 0}. We assume throughout that {(i, i) : i ∈ V } ⊆ E, that is, that E contains all of the diagonal entries. Notice also that E is symmetric in the sense that (i, j) ∈ E if and only if (j, i) ∈ E because, by definition, C − A∗ (y) ∈ S n . (We also remark that the alternative terminology, “aggregate sparsity pattern,” has been used in [9] to describe E.) ˆ y) ∈ F(D), ˆ it is clear from the definition of E that those elements of Given any (S, ¯ ≡ V ×V \E V × V that correspond to the nonzeros of Sˆ are contained in E. Hence, E ˆ ˆ ˆ represents the generic sparsity pattern of the variable S of (D). Unlike S, the variable ˆ of (Pˆ ) has no sparsity in general, but the sparsity represented by E ¯ does affect X ˆ and the the primal problem in terms of evaluation of the objective function C • X ˆ ˆ constraints A(X). In particular, it is not difficult to see that the quantities C • X ˆ ˆ ˆ and Ak • X are dependent upon only those entries Xij of X, where (i, j) ∈ E. In ˆ ij for (i, j) ∈ E ¯ are irrelevant for the objective function and other words, the entries X ˆ ∈ Sn. constraints, but still, they do impact the positive semidefiniteness constraint X + These were precisely the observations that were exploited in [9], as we detail next. Given a symmetric G ⊆ V × V , we define the following subset of S n , which has the density pattern G: ˆ ∈ Sn : M ˆ ij = 0 ∀ (i, j) ∈ G}. S G ≡ {M

SEMIDEFINITE PROGRAMMING WITH PARTIAL MATRICES

143

We also define the corresponding operator π G : S n → S G , which performs orthogonal projection onto S G :  ˆ ij , (i, j) ∈ G, M ˆ )]ij = [π G (M 0, (i, j) ∈ G. We then define the following subsets of S G : G n S+ = S G ∩ S+ , n G S++ = S G ∩ S++ , G? ˆ ) = M }, ˆ ∈ S n such that π G (M S+ = {M ∈ S G : ∃ M + G n G G? ˆ ) = M }. ˆ ∈ S++ such that π (M S++ = {M ∈ S : ∃ M G? G? In words, we describe the last two sets defined above as follows: S+ and S++ consist G n n of those matrices in S that can be completed to matrices in S+ and S++ , respectively. We use the question mark (?) notation to illustrate the informal idea that, for G? example, M ∈ S+ is a positive semidefinite matrix except that the entries Mij for G? (i, j) ∈ G have yet to be specified. In addition, an important observation is that S++ G is an open subset of S , which will play an important role when we investigate the G? derivatives of functions defined on S++ . Using these ideas from matrix completion along with the discussion of E above, ˆ are equivalent to the following two it is not difficult to see that problems (Pˆ ) and (D) problems, respectively:     E E? , max bT y : A∗ (y) + S = C, S ∈ S+ . min C • X : A(X) = b, X ∈ S+

It is interesting to note that the above equivalence holds even when E is replaced by any symmetric F ⊇ E. In fact, for technical reasons that will become clear later, it is desirable to apply this idea with an F that satisfies specific structural properties, as discussed next. ˜ It is straightforward to identify a symmetric G ⊆ V × V with a simple graph G on V , and we make the following graph theoretic definitions. G is said to be chordal ˜ is chordal, that is, if every cycle in G ˜ having length greater than three has a if G chord. A perfect elimination ordering for G is an ordering (v1 , . . . , vn ) of the vertices ˜ such that, for each 1 ≤ i ≤ n − 1, the G-neighbors ˜ V = {1, . . . , n} of G of vi in ˜ {vi+1 , . . . , vn } form a clique in G. A fundamental fact (see Fulkerson and Gross [10]) is that G is chordal if and only if it has a perfect elimination ordering. Now let F be a symmetric extension of E, i.e., F ⊇ E, that satisfies two properties: (i) F is chordal; and (ii) the standard ordering (1, . . . , n) is a perfect elimination ordering for F . We then define the pair of SDP problems   F? (P ) min C • X : A(X) = b, X ∈ S+ and (D)

  F max bT y : A∗ (y) + S = C, S ∈ S+ ,

ˆ respectively. It which, from the discussion above, are equivalent to (Pˆ ) and (D), is worthwhile to note that, under the assumption that no numerical cancellations E occur during the calculation of the lower Cholesky factor L ∈ Ln++ of S ∈ S++ ,

144

SAMUEL BURER

the symmetrized density pattern of L yields a chordal extension F of E such that (1, . . . , n) is a perfect elimination ordering. ˆ and A2 ˆ translate to the problems (P ) and How do the standard assumptions A1 (D)? It is not difficult to see that, with the analogous definitions F(P ) × F(D) and F 0 (P ) × F 0 (D), both of the following assumptions hold easily: A1. the matrices {Ak }m k=1 are linearly independent; A2. the set of interior feasible solutions F 0 (P ) × F 0 (D) is nonempty. 2.2. The partial central path. A critical observation is that the central path ˆ That is, ˆ Sˆ = νI implies that X ˆ −1 has the same density pattern as S. equation X ˆ −1 ∈ S F , as was proven by Grone et al. in [12] (and used extensively by Fukuda et X al. in [9]). The following observation represents an important connection between the F? n spaces S++ and S++ . F? ˆ ∈ S n satisfying . Then there exists a unique X Theorem 2.1. Let X ∈ S++ ++ −1 F F ˆ ˆ ∈ S . Moreover, π (X) = X and X   n ˆ = argmax det(Yˆ ) : π F (Yˆ ) = X, Yˆ ∈ S++ . X ˆ the maximum-determinant positive definite completion of X, As in [9], we call X n F? ˆ ˆ from X, that is, and we also let X : S++ → S++ denote the function that yields X ˆ ≡ X(X). ˆ ˆ and the direct correspondence Sˆ = S between the X Using the function X ˆ and (D), the central path equation X ˆ Sˆ = νI can be described spaces of problems (D) ˆ in terms of X and S as X(X)S = νI. For the algorithm presented in this paper, we wish to reformulate the central path ˆ equation X(X)S = νI once more, and so we now introduce some notation and a few small results. We define the following sets of lower triangular matrices, each of which have a density pattern equal to the lower triangular part of F : LF ≡ {L ∈ Ln : Lij = 0 ∀ i ≥ j such that (i, j) ∈ F } , F n LF ++ ≡ L ∩ L++ .

Noting the standard fact that the Cholesky factorization has no fill-in when the associated density pattern is chordal and (1, . . . , n) is a perfect elimination ordering, we F? F −1 F ˆ × S++ , chol(X(X) ) ∈ LF see that, for all (X, S) ∈ S++ ++ and chol(S) ∈ L++ . We thus define (2.2)

F? → LF V : S++ ++ ,

(2.3)

F L : S++ → LF ++ ,

−1 ˆ V (X) ≡ chol(X(X) ),

L(S) ≡ chol(S).

Using these definitions, it is now possible to further reformulate the central path ˆ equation X(X)S = νI. F? F ˆ Proposition 2.2. Let √(X, S) ∈ S++ × S++ and ν > 0. Then X(X)S = νI if −1 and only if V (X) L(S) = ν I. ˆ ≡ X(X), ˆ Proof. Let X V ≡ V (X), and L ≡ L(S). From (2.2) and (2.3), we see −1 ˆ that XS = νI is equivalent to V −1 LLT V −T = νI, √ which itself shows that V L is −1 the lower Cholesky factor of νI, that is, V L = ν I. Proposition 2.2 now allows us to characterize the point (Xν , Sν , yν ) on the central path C corresponding to ν > 0 as the unique solution of the system (2.4a)

A(X) = b,

SEMIDEFINITE PROGRAMMING WITH PARTIAL MATRICES

(2.4b) (2.4c)

145

A∗ (y) + S = C, √ V (X)−1 L(S) = ν I.

Having expressed the central path in terms of the variables (X, S), we now wish ˆ S) ˆ and defining to express the duality gap in terms of X and S as well. Given (X, F ˆ ˆ (X, S) = (π (X), S), we have ˆ • S = X • S. ˆ • Sˆ = X ˆ • S = π F (X) X ˆ be any completion of X, we see that the Alternatively, given (X, S) and letting X equality also holds. Hence, X • S is the appropriate measure of the duality gap in the space F(P ) × F(D). Furthermore, from (2.1) and the above equality, we have ˆ S) ˆ = µ(X, S). µ(X, The equation of the previous paragraph introduces a simple but important idea that will be used several times throughout the paper, and so we now give it a verbal description in order to make it simpler to refer to. Given A, B ∈ S F and Aˆ ∈ S n such ˆ • B = A • B, and we say that (A, ˆ A, B) ˆ = A, we see that Aˆ • B = π F (A) that π F (A) are trace-compatible. Given our usage of (2.4c) in this paper, we also wish to define the square root of the scaled duality gap measure µ(·, ·): (2.5)

ρ(X, S) ≡ µ(X, S)1/2

F? F ∀ (X, S) ∈ S+ × S+ .

ˆ Note that, using the fact that (X(X), X, S) are trace-compatible, (2.2), (2.3), and (2.1), along with standard properties of the trace function and the Frobenius norm, we easily have that (2.6)

ρ(X, S) =

V (X)−1 L(S) F √ . n

Equation (2.6) will come in handy throughout the presentation of this paper. 2.3. Nonsingularity of the partial central path. In section 4, we will develop a primal-dual path-following algorithm based on the central path equations (2.4), and so in this subsection we consider the nonsingularity of the Jacobian of the equations defining the central path, which will be necessary for the existence of the SDP direction proposed in section 4. Noting that V (X)−1 is generically dense, it is not difficult to see that the left-hand F? F sides of the equations (2.4) are not “square” since they map a point S++ × S++ × m m F n to  × S × L . As has become standard in the SDP literature, however, we can reconfigure the central path√equations to obtain a square system. In this case, we replace (2.4c) with L(S) − ν V (X) = 0, which yields a system of equations F? F H(X, S, y) = (0, 0, 0), where H : S++ × S++ × m → m × S F × LF is given by 

(2.7)

 A(X) . A∗ (y) H(X, S, y) ≡  √+ S L(S) − ν V (X)

Note that the definition of H is dependent on ν > 0, which we consider fixed in the subsequent discussion. We remark that (2.7) is reminiscent of the central path

146

SAMUEL BURER

ˆ −1 = 0, where the usual complementarity equation X ˆ Sˆ = νI has equation Sˆ − ν X been reconfigured. This formulation is not appropriate for primal-dual path-following ˆ S, ˆ y) ∈ S n ×S n ×m , the resulting Newton direction algorithms because, at any (X, ++ ++ ˆ ∆S, ˆ ∆y) has the property that ∆X ˆ depends only on X ˆ and not on (S, ˆ y); i.e., (∆X, the Newton direction is not “primal-dual.” In fact, an analogous system for partial −1 ˆ ˆ = 0 instead of X(X)S = νI, but this system also suffers matrices uses S − ν X(X) a similar drawback. Hence, we have chosen to model the central path as in (2.7), in part because, in section 4, (2.7) will yield a primal-dual Newton direction due to the special structure of the functions L and V . We now wish to investigate the Jacobian of H and to determine whether it is nonsingular (perhaps under suitable conditions). Since the derivative of H clearly depends on the derivatives of V (·) and L(·), we first describe these in the set of propositions and corollaries below (whose proofs are not difficult and are hence left to the reader). n Proposition 2.3. Let M ∈ S++ . Then the first derivative of chol(·) at M is given by the invertible, linear map chol  (M ) : S n → Ln , which is defined by the following: for all N ∈ S n , K  ≡ chol  (M )[N ] ∈ Ln is the unique solution of the equation N = K  K T + K(K  )T , where K ≡ chol(M ). F Corollary 2.4. Let S ∈ S++ . Then the first derivative of L(·) at S is the  F invertible, linear map L (S) : S → LF , which is defined by the following: for all B ∈ S F , L ≡ L (S)[B] ∈ LF is the unique solution of the equation (2.8)

B = L LT + L(L )T ,

where L ≡ L(S). F? ˆ  (X) : S F → S n is Proposition 2.5. Let X ∈ S++ . Then the linear map X F   n ˆ ≡X ˆ (X)[A] ∈ S uniquely satisfies the defined by the following: for all A ∈ S , X requirements (2.9)

ˆ  ) = A, π F (X

ˆ −1 X ˆ −1 ∈ S F , ˆ X X

ˆ ≡ X(X). ˆ where X F? Corollary 2.6. Let X ∈ S++ . Then the linear map V  (X) : S F → LF is defined by the following: for all A ∈ S F , V  ≡ V  (X)[A] ∈ LF is the unique solution of the equation (2.10)

ˆ −1 X ˆ X ˆ −1 = V  V T + V (V  )T , −X

ˆ = X(X), ˆ ˆ ≡ X ˆ  (X)[A]. In addition, V  (X) is invertible. where V ≡ V (X), X and X Having described the derivatives of V (·) and L(·), we now turn to the derivative of H. From (2.7), we see that the linear map H  : S F × S F × m → m × S F × LF is defined by   A(A) . A∗ (c) (2.11) H  (X, S, y)[A, B, c] =  √+ B  L (S)[B] − ν V (X)[A] In Lemma 2.8, Corollary 2.9, and Theorem 2.10, we show that H  (X, S, y) is invertible as long as the product V (X)−1 L(S) is sufficiently close to some positive multiple of the identity matrix, but first we need a technical lemma whose proof is straightforward that will prove useful below and also throughout the paper.

SEMIDEFINITE PROGRAMMING WITH PARTIAL MATRICES

147

√ Lemma 2.7. Let J ∈ Ln . Then J F ≤ J + J T F / 2, with equality holding if and only if J is strictly lower triangular. F? F Lemma 2.8. Let (X, S) ∈ S++ × S++ . Define V ≡ V (X) and L ≡ L(S), and let   V (X) and L (S) be as in Corollaries 2.6 and 2.4, respectively. Then, for all A ∈ S F and for all β > 0, (2.12)

√ √ 2



−A • V  (X)−1 ◦ L (S) [A] ≥ L−1 AL−T F (1 + 2)β 3 − 2(β + Q F )3 ,

where Q ≡ V −1 L − βI. Proof. With L ≡ L (S)[A], we see from (2.8) that (2.13)

A = L LT + L(L )T .

ˆ ≡ X(X) ˆ ˆ ≡ X ˆ  (X)[V  (X)−1 [L ]], we see from (2.9) and (2.10) Also defining X and X that (2.14)

ˆ  ) = V  (X)−1 [L ], π F (X

ˆ −1 X ˆ X ˆ −1 = L V T + V (L )T . −X

ˆ  , V  (X)−1 [L ], A), we see Now using (2.14), (2.2), and the trace-compatibility of (X that the left-hand side of (2.12) equals

ˆ ˆ = A • X ˆ L V T + V (L )T X −A • V  (X)−1 [L ] = −A • X (2.15)

= 2 A • V −T V −1 L V −1 .

˜ ≡ L−1 L so that A˜ = L ˜+L ˜ T by Introducing the notation A˜ ≡ L−1 AL−T and L (2.13) and using the definition of Q, we observe that

(2.16)

˜ −1 L) A • V −T V −1 L V −1 = A˜ • (V −1 L)T (V −1 L)L(V T ˜ (Q + βI) . = A˜ • (Q + βI) (Q + βI) L

Expanding the right-hand argument of the inner-product just obtained, we see that T ˜ (Q + βI) = QT QLQ ˜ + βQT QL ˜ + βQT LQ ˜ + β 2 QT L ˜ (Q + βI) (Q + βI) L 2 2 3 ˜ + β QL ˜ + β LQ ˜ + β L. ˜ + βQLQ (2.17)

˜ and using Now combining (2.15), (2.16), and (2.17), applying Lemma 2.7 with J = L, standard properties of the trace function and the Frobenius norm, we have ˜ + 2 β A˜ • QT QL ˜ + 2 β A˜ • QT LQ ˜ + 2 β 2 A˜ • QT L ˜ −A • V  (X)−1 [L ] = 2 A˜ • QT QLQ 2 2 3 ˜ + 2 β A˜ • QL ˜ + 2 A˜ • β LQ ˜ + 2 β A˜ • L ˜ + 2 β A˜ • QLQ √ √ √ ˜ 2 − 2 A ˜ 2 Q 3 −3 2β A ˜ 2 Q 2 −3 2β 2 A ˜ 2 Q F ≥ β 3 A F F F F F F √ √



2 3 3 2 2 ˜ = A F (1 + 2)β − 2 Q F + 3 β Q F + 3 β Q F + β 3 √ √

˜ 2 (1 + 2)β 3 − 2 ( Q F + β)3 , = A F which proves the lemma. (Note that, within the inequality, we have also used the ˜ 2 .) ˜ = β 3 A equality 2 β 3 A˜ • L F

148

SAMUEL BURER

F? F Corollary 2.9. Let (X, S) ∈ S++ × S++ . Define V ≡ V (X) and L ≡ L(S),   and let V (X) and L (S) be as in Corollaries 2.6 and 2.4, respectively. Then, if V −1 L − β I F ≤ β/6 for some β > 0, the operator −V  (X)−1 ◦ L (S) is positive definite. Proof. By (2.12), −V  (X)−1 ◦ L (S) is positive definite as long there exists some β such that √ √ 3 (1 + 2)β 3 − 2 ( Q F + β) > 0,

where Q ≡ V −1 L − βI, which is equivalent to √

 Q F < 2−1/6 (1 + 2)1/3 − 1 β. Since the coefficient in front of β is approximately 0.1951, the result follows. Corollary 2.9 now allows us to prove that H  is nonsingular under certain conditions. F? F Theorem 2.10. Let (X, S, y) ∈ S++ × S++ × m , and suppose there exists some −1 β > 0 such that V L − β I F ≤ β/6, where V ≡ V (X) and L ≡ L(S). Then the linear map H  : S F × S F × m → m × S F × LF defined by (2.11) is invertible. Proof. To show that H  is invertible, we show that (0, 0, 0) is the only solution of the equation H  (X, S, y)[A, B, c] = (0, 0, 0), where (A, B, c) ∈ S F × S F × m . As is standard in the SDP literature, it is not difficult to see that this equation can be reduced to the m × m system

A ◦ V  (X)−1 ◦ L (S) ◦ A∗ (c) = 0.

Since V  (X)−1 ◦ L (S) is negative definite by Corollary 2.9 and since A∗ is injective by assumption A1, we conclude that c = 0, which immediately implies that (A, B) = (0, 0), as desired. The above theorem will help us establish the existence of the Newton direction that will be used as the basis for our algorithm to solve problems (P ) and (D) in section 4. Moreover, the theorem motivates the need for the neighborhood condition V −1 L − βI F ≤ β/6 that we will formally introduce next in section 3. 3. Technical results. In this section, we prove several results that will be used for establishing the polynomial convergence of the algorithm that we propose in section 4. 3.1. Properties of the partial central path map. Given γ ∈ [0, 1/6], we define a feasible neighborhood of the central path as follows: (3.1)   N (γ) ≡ (X, S, y) ∈ F 0 (P ) × F 0 (D) : V (X)−1 L(S) − ρ(X, S) I F ≤ γ ρ(X, S) . Clearly N (γ) is nonempty as (Xν , Sν , yν ) ∈ N (γ) for all ν > 0. Note also, by Theorem 2.10 with β = ρ(X, S) as well as the fact that γ ≤ 1/6, that H  (X, S, y) is invertible for all (X, S, y) ∈ N (γ). We now wish to establish several fundamental results concerning both the central path function V (X)−1 L(S) and the neighborhood N (γ). The first result establishes how the neighborhood condition can be restated simply as an inequality on tr(V (X)−1 L(S)).

SEMIDEFINITE PROGRAMMING WITH PARTIAL MATRICES

and

149

Proposition 3.1. (X, S, y) ∈ N (γ) if and only if (X, S, y) is primal-dual feasible

(3.2)



tr V (X)−1 L(S) ≥ Γ ρ(X, S),

F? F × S++ , it holds that where Γ ≡ n − γ 2 /2. Moreover, for any (X, S) ∈ S++ tr(V (X)−1 L(S)) ≤ n ρ(X, S). Proof. Letting V ≡ V (X), L ≡ L(S), and ρ ≡ ρ(X, S) and using (2.6), we have

V −1 L −ρI 2F = (V −1 L− ρI) • (V −1 L − ρI) = V −1 L • V −1 L − 2ρV −1 L • I +ρ2 I • I = V −1 L 2F − 2 ρ tr(V −1 L) + n ρ2 = 2 n ρ2 − 2 ρ tr(V −1 L), from which the first statement of the proposition follows, using (3.1). The second statement of the proposition also follows from the above equations, which imply V −1 L − ρ I 2F /(2 ρ) = n ρ − tr(V −1 L). The next proposition establishes some results concerning the second derivative of ˆ −1 L( ˆ S), ˆ which, as described in (3.3), is analogous to V (X)−1 L(S) the function Vˆ (X) n n but is defined on all of S++ × S++ . n ˆ : S n → Ln be defined by ˆ Proposition 3.2. Let V : S++ → Ln++ and L ++ ++ (3.3)

ˆ = chol(X ˆ −1 ), Vˆ (X)

ˆ S) ˆ = chol(S) ˆ L(

n ˆ Sˆ ∈ S++ . ∀ X,

n n ˆ S) ˆ = Vˆ (X) ˆ −1 L( ˆ S) ˆ satisfies Then the function Φ : S++ × S++ → Ln++ defined by Φ(X, the following: n ˆ is a strictly concave function; , tr(Φ(·, S)) (i) for any fixed Sˆ ∈ S++ n ˆ B ˆ∈S , (ii) for all A,

(3.4)



ˆ S)[ ˆ A, ˆ B] ˆ (2) ≤ √1 Vˆ −1 L ˆ −1 B ˆ Vˆ T AˆVˆ F + L ˆL ˆ −T F 2 , Φ (X, F 2

ˆ and L ˆ ≡ L( ˆ S). ˆ where Vˆ ≡ Vˆ (X) Proof. In certain places throughout this proof, we will avoid the use of the hat (ˆ·) notation, which indicates fully dense matrices, in order to simplify the notation; the meanings of the expressions will be clear from the context. Also to simplify notation, ˆ : S n → U n as being defined by U ˆ (X) ˆ = Vˆ (X) ˆ −T . With this definition, we define U ++ ++ Tˆ ˆ ˆ ˆ ˆ ˆ ˆ T. ˆ ˆ ˆ ˆ we see that Φ(X, S) = U (X) L(S) and that X = U (X)U (X) To prove both (i) and (ii), we consider the second derivative of Φ. Using (3.3) along with arguments similar to those found in the derivation of V  (X) and L (S) in section 2.3, we see that, for all A, B ∈ S n , (3.5)

ˆ S)[A, ˆ Φ ≡ Φ (X, B] = (U  )T L + U T L ,

ˆ (X) ˆ and L ≡ L( ˆ S) ˆ and U  ≡ U ˆ  (X)[A] ˆ ˆ  (S)[B] ˆ where U ≡ U ∈ U n and L ≡ L ∈ Ln are, respectively, the unique solutions of the equations (3.6)

A = U  U T + U (U  )T ,

B = L LT + L(L )T .

Differentiating once again, we see that (3.7)

ˆ S)[A, ˆ Φ ≡ Φ (X, B](2) = (U  )T L + 2 (U  )T L + U T L ,

150

SAMUEL BURER

(2) (2) ˆ  (S)[B] ˆ ˆ  (X)[A] ˆ ∈ U n and L ≡ L ∈ Ln are, respectively, the where U  ≡ U unique solutions of the equations

(3.8)

0 = U  U T + 2 U  (U  )T + U (U  )T ,

0 = L LT + 2 L (L )T + L(L )T .

ˆ where Sˆ is fixed, it is We now prove (i). Letting h denote the function tr(Φ(·, S)), (2) ˆ = tr((U  )T L), where U  and L are defined straightforward to verify that h (X)[A] as above. From (3.8), we have (3.9)

U −1 U  + (U  )T U −T = −2(U −1 U  )(U −1 U  )T ,

which implies that diag(U −1 U  ) ≤ 0, since the right-hand side of (3.9) is negative n semidefinite, which in turn implies that diag(U  ) ≤ 0, since U −1 ∈ U++ . It follows   ˆ (2)  that h (X)[A] < 0 unless diag(U ) = 0. So suppose diag(U ) = 0. Then, by (3.9), we see

⇐⇒ U −1 U  = 0 ⇐⇒ U  = 0 ⇐⇒ A = 0. diag (U −1 U  )(U −1 U  )T ) = 0 (2) ˆ < 0. This proves that h is strictly Thus, we conclude that for all A = 0, h (X)[A] concave. We now prove (ii). Using (3.5)–(3.8), Lemma 2.7, and standard properties of the 2-norm and the Frobenius norm, we have

Φ F ≤ (U  )T L F + 2 (U  )T L F + U T L F

≤ U T L U −1 U  F + 2 U −1 U  F L−1 L F + L−1 L F √ ≤ U T L ( 2 (U −1 U  )(U −1 U  )T F + 2 U −1 U  F L−1 L F √ + 2 (L−1 L )(L−1 L )T F ) √ √ ≤ U T L ( 2 U −1 U  2F + 2 U −1 U  F L−1 L F + 2 L−1 L 2F )  1 ≤ U T L √ U −1 AU −T 2F + U −1 AU −T F L−1 BL−T F 2  1 −1 −T 2 √ + L BL F 2 2

1 ≤ √ U T L U −1 AU −T F + L−1 BL−T F . 2 ˆ (·). The result now follows from the definition of U The next result plays a crucial role in the analysis of section 4. In words, the F? F theorem says that, given a fixed pair (X, S) ∈ S++ × S++ , the maximum-determinant ˆ completion X(X) also maximizes the function tr(Vˆ (·)−1 L(S)) among all positive definite completions of X. F? F n Theorem 3.3. Let (X, S) ∈ S++ × S++ , and let Vˆ : S++ → Ln++ be defined as in Proposition 3.2. Then 

 n ˆ X(X) = argmax tr Vˆ (Yˆ )−1 L(S) : π F (Yˆ ) = X, Yˆ ∈ S++ . ˆ Proof. Noting that L(S) = L(S) and letting L ≡ L(S), we have from Proposition 3.2 that h(Yˆ ) ≡ tr(Vˆ (Yˆ )−1 L) is a strictly concave function of Yˆ . Hence, since the constraints of the optimization problem under consideration are convex, any stationary point of this problem is a unique global maximum, and so we prove the theorem ˆ ≡ X(X) ˆ by showing that X is a stationary point.

SEMIDEFINITE PROGRAMMING WITH PARTIAL MATRICES

151

The derivative h (Yˆ ) : S n →  of h at Yˆ is given by

ˆ = −tr Vˆ −1 Vˆ  Vˆ −1 L h ≡ h (Yˆ )[A] (3.10) ˆ is the unique solution of the for all Aˆ ∈ S n , where Vˆ ≡ Vˆ (Yˆ ) and Vˆ  ≡ Vˆ  (Yˆ )[A] system −Yˆ −1 AˆYˆ −1 = Vˆ  Vˆ T + Vˆ (Vˆ  )T .

(3.11)

Premultiplying (3.11) by Vˆ −1 , postmultiplying by Vˆ −T , and using the fact that Yˆ −1 = Vˆ Vˆ T , we see that −Vˆ T AˆVˆ = Vˆ −1 Vˆ  +(Vˆ  )T Vˆ −T , which shows that − diag(Vˆ T AˆVˆ ) = n be the diagonal 2 diag(Vˆ −1 Vˆ  ). Applying this equality to (3.10) and letting W ∈ S++ −1 ˆ matrix defined by Wjj = Vjj Ljj /2, we deduce that h =

1 ˆ diag(Vˆ T AˆVˆ )T diag(Vˆ −1 L) = W • Vˆ T AˆVˆ = Vˆ W Vˆ T • A, 2

which implies that ∇h(Yˆ ) = Vˆ W Vˆ T . Let F¯ ≡ V × V \ F . Considering that the variables Yˆij for (i, j) ∈ F can be eliminated by the equation π F (Yˆ ) = X, we see that a stationary point Yˆ of the ¯ optimization problem satisfies π F (∇h(Yˆ )) = 0, that is, ∇h(Yˆ ) = Vˆ W Vˆ T ∈ S F . n is diagonal, Vˆ W Vˆ T ∈ S F is equivalent to Vˆ Vˆ T = Yˆ −1 ∈ S F , Since W ∈ S++ ˆ satisfies uniquely by Theorem 2.1. So X ˆ is a which is precisely the condition that X stationary point of the optimization problem, which completes the proof. 3.2. Miscellaneous results. In this subsection, we catalog a few results that will prove useful in section 4. The first two results give details about the system H  (X, S, y)[A, B, c] = (0, 0, R), where H  is given as in (2.11). F F? Lemma 3.4. Let (X, S, y) ∈ S++ × S++ × m and R ∈ LF be given, and suppose F F m  that (A, B, c) ∈ S × S ×  satisfies H (X, S, y)[A, B, c] = (0, 0, R). Then (3.12)

A • B = (V −1 V  + (V  )T V −T ) • V −1 BV −T = 0,

where V ≡ V (X) and V  ≡ V  (X)[A]. Proof. From (2.11), we see that A(A) = 0 and A∗ (c) + B = 0. Hence, A • B = ˆ  (X)[A] and using ˆ ≡ X(X), ˆ ˆ ≡ X −A • A∗ (c) = −cT A(A) = 0. Also, letting X X  ˆ , A, B), we see that (2.9), (2.10), (2.2), and the trace-compatibility of (X ˆ  • B = −X(V ˆ  V T + V (V  )T )X ˆ • B = (V −1 V  + (V  )T V −T ) • V −1 BV −T , A•B =X which completes the proof. −1 √ V L− √ Proposition 3.5. Let the conditions of Lemma 3.4 hold, and define√Q ≡ ν I, where L ≡ L(S) and ν > 0 is as in (2.7). Suppose that Q F < ν/ 2. Then (3.13) (3.14)

√ −1 −1 1 √ ν − 2 Q F V (RLT + LRT )V −T F , V −1 V  F ≤ √ 2ν √ √ √ V −1 BV −T F ≤ ν( ν − 2 Q F )−1 V −1 (RLT + LRT )V −T F .

√ √ From (2.11), we have B = L (S)−1 [R + ν V  ] = (R + ν V  )LT + L(R + √ Proof. ˜ ≡ RLT + LRT , ν V  )T . Hence, letting R √ √ ˜ −T + ν V −1 V  (V −1 L)T + ν V −1 L(V −1 V  )T . V −1 BV −T = V −1 RV

152

SAMUEL BURER

It follows that √

1 √ V −1 BV −T − ν V −1 V  + (V  )T V −T ν 1 ˜ −T + V −1 V  QT + Q(V −1 V  )T , = √ V −1 RV ν which, from (3.12), implies   √ 1 max √ V −1 BV −T F , ν V −1 V  + (V  )T V −T F ν 1 √ −1  −1 −T  T −T √ ≤ V (3.15) BV − ν V V + (V ) V ν F 1 −1 ˜ −T −1  ≤ √ V RV F + 2 Q F V V F . ν Applying Lemma 2.7 with J = V −1 V  to (3.15), we see that √ √ √ 1 ˜ −T F , 2 ν − 2 Q F V −1 V  F ≤ √ V −1 RV ν which proves (3.13). To prove (3.14), we combine (3.13) and (3.15) to obtain   √ 1 2 Q F 1 −1 −T −1 ˜ −T √ V BV F ≤ √ V RV F 1 + √ √ ν ν ν − 2 Q F √

√ −1 −1 ˜ −T F . = ν − 2 Q F V RV We next establish several inequalities that relate to bounds on the maximum and F? F ˆ × S++ . minimum eigenvalues of X(X)S for (X, S) ∈ S++ F? F ˆ ≡ X(X), ˆ V ≡ V (X), Proposition 3.6. Let (X, S) ∈ S++ × S++ , and define X −1 L ≡ L(S), and ρ ≡ ρ(X, S). Suppose V L − ρ I ≤ γ ρ for some γ ≥ 0 satisfying γ 2 + 2γ < 1. Then the following hold: (i) V −1 L ≤ (γ + 1)ρ; (ii) L−1 V ≤ (1 − (γ 2 + 2γ))−1/2 ρ−1 . Proof. Let Q ≡ V −1 L − ρ I, and note that Q 2 = QT Q = LT V −T V −1 L − ρV −1 L − ρLT V −T + ρ2 I = LT V −T V −1 L − ρ2 I − ρ(Q + QT ) . Hence, from (2.2), (2.3), standard properties of · , and the assumptions of the proposition, ˆ − ρ2 I = LT V −T V −1 L − ρ2 I XS = LT V −T V −1 L − ρ2 I − ρ(Q + QT ) + ρ(Q + QT ) (3.16)

≤ Q 2 + 2 ρ Q ≤ (γ 2 + 2γ)ρ2 .

ˆ • S = X • S = nρ2 , that is, nρ2 equals the Recall from the definition of ρ that X ˆ ˆ ≥ ρ2 and that λmin [XS] ˆ ≤ ρ2 . sum of the eigenvalues of XS. It follows that λmax [XS] Hence, ˆ − ρ2 I = max{λmax [XS] ˆ − ρ2 , ρ2 − λmin [XS]}. ˆ XS

SEMIDEFINITE PROGRAMMING WITH PARTIAL MATRICES

153

Thus, using (3.16), (2.2), and (2.3), we have ˆ ˆ ≤ XS ˆ − ρ2 I + ρ2 ≤ (γ 2 + 2γ + 1)ρ2 , = λmax [XS] V −1 L 2 = LT V −T V −1 L = XS which proves (i). Similarly, ˆ −1 ˆ −1 = λmin [XS] L−1 V 2 = V T L−T L−1 V = S −1 X



ˆ − ρ2 I −1 ≤ ρ2 − (γ 2 + 2γ)ρ2 −1 , ≤ ρ2 − XS which proves (ii). We remark that the condition (X, S, y) ∈ N (γ) implies that the hypotheses of Proposition 3.6 hold since V −1 L − ρ I ≤ V −1 L − ρ I F . Finally, we state the following proposition, which follows as a direct extension of Lemmas 3.4 and 3.5 of Monteiro and Tsuchiya [24]. F? F ¯ S¯ ∈ S n . Define X ˆ ≡ X(X), ˆ Proposition 3.7. Let (X, S) ∈ S++ × S++ and X, V ≡ V (X), and L ≡ L(S). Suppose that there exists some τ ∈ (0, 1) such that   ¯ − X)V ˆ , L−1 (S¯ − S)L−T ≤ τ. max V T (X n ¯ S¯ ∈ S++ , Then X,

  ¯ L ¯ −1 L ≤ √ 1 , max V −1 V¯ , V¯ −1 V , L−1 L , 1−τ

and

¯ ≤ V¯ −1 L

V −1 L , 1−τ

¯ = chol(S). ¯ ¯ −1 ) and L where V¯ = chol(X 4. The partial primal-dual algorithm. The algorithm described in this section is based on the same ideas that typical path-following algorithms are based on— namely, the use of a Newton direction to decrease the duality gap, and a bound on the step-size to ensure proximity to the central path. Using these ideas, we establish the polynomiality of the algorithm in Theorem 4.9. Suppose that (X, S, y) ∈ N (γ), where γ ∈ [0, 1/6]. Then, for a fixed constant 0 ≤ σ < 1, we define the Newton direction (∆X, ∆S, ∆y) at (X, S, y) as the solution of the system (4.1)

H  (X, S, y)[∆X, ∆S, ∆y] = (0, 0, σ ρ V − L),

where ν ≡ ρ2 is implicit in the definition of H, ρ ≡ ρ(X, S), V ≡ V (X), and L ≡ L(S). Note that (∆X, ∆S, ∆y) is well defined by Theorem 2.10. We also make the following F? F definitions for all α ∈  such that (Xα , Sα ) ∈ S+ × S+ : (Xα , Sα , yα ) ≡ (X, S, y) + α(∆X, ∆S, ∆y), µα ≡ µ(Xα , Sα ), ρα ≡ ρ(Xα , Sα ). We have the following proposition. Proposition 4.1. Let (X, S, y) ∈ N (γ), and define µ ≡ µ(X, S), ρ ≡ ρ(X, S), and ζ = (∆X • S + X • ∆S)(nµ)−1 . Then for all α ∈  such that (Xα , Sα , yα ) ∈ F? F S+ × S+ , (4.2)

µα = µ(1 + α ζ),

 αζ . ρα ≤ ρ 1 + 2

154

SAMUEL BURER

Proof. Note that ∆X • ∆S = 0 from (3.12). Using (2.1), we thus have µα =

X • S + αζnµ (X + α∆X) • (S + α∆S) = µ + αζµ, = n n

which proves the equality. Similarly, using (2.5), we see that  αζ ρα = ρ(1 + α ζ)1/2 ≤ ρ 1 + , 2 where the inequality follows from the real-number relation 1 + 2x ≤ 1 + 2x + x2 . With regard to the above proposition, it is important to mention that we anticipate that ζ is negative due to the fact that σ < 1, which would imply that µα < µ and ρα < ρ; that is, the duality gap decreases along the direction (∆X, ∆S, ∆y). This, however, must be proven under certain assumptions, as will be shown below. For the discussion of the generic algorithm next, it would be useful for the reader to assume that ζ < 0. We now state the generic primal-dual path-following algorithm that we study in this section. Algorithm SDP. Let ε > 0, γ ∈ [0, 1/6], σ ∈ [0, 1), and (X 0 , S 0 , y 0 ) ∈ N (γ) be given. Set k = 0. Repeat until X k • S k ≤ ε do 1. Set (X, S, y) ≡ (X k , S k , y k ) and ρ ≡ ρ(X, S). 2. Compute the Newton direction (∆X, ∆S, ∆y) at (X, S, y). 3. Choose α ≥ 0 such that (Xα , Sα , yα ) ∈ N (γ). 4. Set (X k+1 , S k+1 , y k+1 ) = (Xα , Sα , yα ) and increment k by 1. End The remainder of this section is devoted to determining constants γ and σ and a constant step-size α such that Algorithm SDP terminates within a polynomial number of loops, where the polynomial depends on n, ε, and X√0 • S 0 . To this end, we introduce constants γ ≥ 0, δ ∈ [0, n), and τ ∈ (0, 1) satisfying (4.3) and (4.4)

γ≤

1 , 6

1 0 < 1 − (γ 2 + 2γ) ≤ √ 2



1/2

2 (γ + 1) δ 2 + γ 2 ≤ 1 − 2 γ 1 − (γ 2 + 2 γ) τ.

Note that, for example, the triple (γ, δ, τ ) = (0.138, 0.138, 0.79) satisfies (4.3) and (4.4) irrespective of the value of n. We also define (4.5)

δ σ ≡1− √ . n

In addition, we make the mild assumption that n is large enough so that √ δ n ≥ τ γ(γ + 1), (4.6) (4.7) σ ≥ τ θ, √ γ 2 (σ − τ θ)(1 − τ ) √ n≥ (4.8) , 2 2 τ 2 (γ + 1) where θ ≡ 1 + γ(γ + 1)/(2n). In fact, taking (γ, δ, τ ) = (0.138, 0.138, 0.79) as above shows that (4.6)–(4.8) are also satisfied for all values of n.

SEMIDEFINITE PROGRAMMING WITH PARTIAL MATRICES

155

Lemma 4.2. Define V ≡ V (X), L ≡ L(S), and ρ ≡ ρ(X, S), and suppose that F? F × S++ satisfies V −1 L − ρI F ≤ γρ. Then (X, S) ∈ S++ σρI − V −1 L F ≤ (δ 2 + γ 2 )1/2 ρ. Proof. Using the definition of the Frobenius norm, we have σρI − V −1 L 2F = ρI − V −1 L + (σ − 1)ρI 2F = ρI − V −1 L 2F + 2(σ − 1)ρ(ρI − V −1 L) • I + (σ − 1)2 nρ2 ,

= ρI − V −1 L 2F + 2(σ − 1)ρ nρ − tr(V −1 L) + (σ − 1)2 nρ2 ≤ γ 2 ρ2 + (σ − 1)2 nρ2 , where the inequality follows by assumption and by the second statement of Proposition 3.1. Since (σ − 1)2 n = δ 2 by (4.5), the result follows. Proposition 4.3. Let (X, S, y) ∈ N (γ). Then   ˆ  V F , L−1 ∆SL−T F ≤ τ, max V T X ˆ ≡ X ˆ  (X)[∆X]. As a result, X ˆ +X ˆ  ∈ S n and S + ∆S ∈ S F , where where X ++ ++ ˆ ≡ X(X). ˆ X Proof. Let V ≡ V (X), L ≡ L(S), and ρ ≡ ρ(X, S). Also, letting V  ≡ V  (X)[∆X] and using (2.10) and (2.2), we see that ˆ  V = V −1 V  + (V  )T V −T −V T X

=⇒

ˆ  V F ≤ 2 V −1 V  F . V T X

Note that the hypotheses of Proposition 3.5 hold, with R ≡ σρV − L, ν ≡ ρ2 , and Q ≡ V −1 L − ρI. Hence, using (3.13), standard properties of norms, (3.1), Lemma 4.2, Proposition 3.6(i), (4.3), and (4.4), we have √ √ −1 −1 2

−1  2 V V F ≤ ρ − 2 Q F V ((σρV − L)LT + L(σρV − L)T )V −T F ρ √ √ −1 −1 2 2

ρ − 2γ ρ ≤ V (σρV − L)LT V −T F ρ √ √ −1 2 2

σρI − V −1 L F V −1 L ≤ 2 1 − 2γ ρ √ −1 2 √

1/2 δ + γ2 (γ + 1) ≤ 2 2 1 − 2γ √

1/2

−1 −1 2 1 − 2γ δ + γ2 (4.9) (γ + 1) ≤ τ. ≤ 2 1 − (γ 2 + 2 γ) Now using (3.14) and Proposition 3.6(ii) along with similar arguments, we have L−1 ∆SL−T F ≤ L−1 V 2 V −1 ∆SV −T F √ −1 −1

≤ L−1 V 2 ρ ρ − 2 Q F V ((σρV − L)LT + L(σρV − L)T )V −T F √ −1 2 1/2

−1

1 − 2γ δ + γ2 ≤ 2 1 − (γ 2 + 2γ) (γ + 1) ≤ τ, which concludes the proof of the first statement of the proposition. The second ¯ =X ˆ +X ˆ  and S¯ = S + ∆S (and the statement follows from Proposition 3.7, with X F fact that ∆S ∈ S ). Corollary 4.4. Let (X, S, y) ∈ N (γ). Then for all 0 ≤ α ≤ 1, (Xα , Sα ) ∈ F? F S++ × S++ .

156

SAMUEL BURER

F Proof. By Proposition 4.3, we see that S1 = S + ∆S ∈ S++ . Since Sα is a convex F combination of S and S1 for any 0 ≤ α ≤ 1, it follows that Sα ∈ S++ . Likewise,  n ˆ ˆ X + αX ∈ S++ for all 0 ≤ α ≤ 1. Noting that, by Theorem 2.1 and (2.9),

ˆ + αX ˆ  ) = X + α∆X = Xα , π F (X ˆ + αX ˆ  is a positive definite completion of Xα , which implies Xα ∈ we see that X F? S++ . Lemma 4.5. Let (X, S, y) ∈ N (γ), and define V ≡ V (X), L ≡ L(S), ρ ≡ ρ(X, S), V  ≡ V  (X)[∆X], and L ≡ L (S)[∆S]. Also let Γ be defined as in Proposition 3.1. Then  

−1  1 −1 −1  −1 (4.10) V L • V L − V V V L ≤ (σ − 1)n + τ (γ + 1)γ ρ2 , 2



Γ −1 V L • V −1 L − V −1 V  V −1 L − ρ tr V −1 V  (ρI − V −1 L) n    1 1 2 (4.11) ≤ (σ − 1)Γ + τ γ 1 + γ(γ + 1) ρ2 . 2 2n Proof. We first prove some simple bounds that will allow us to prove (4.10) and

(4.11) more easily. Defining P ≡ V −1 V  ρI − V −1 L and using standard properties of tr(·), · , and · F along with (3.1), Proposition 3.6(i), and (4.9) (which appears inside the proof of Proposition 4.3), we see that |V −1 L • P | = |(V −1 V  )T (V −1 L) • (ρI − V −1 L)| ≤ V −1 V  F V −1 L ρI − V −1 L F 1 (4.12) ≤ τ (γ + 1)γρ2 . 2 Similarly, (4.13)

|(V −1 L − ρI) • P | ≤ V −1 V  F V −1 L − ρI 2F ≤

1 2 2 τγ ρ . 2

Now, the equation (4.1) for the Newton direction (∆X, ∆S, ∆y) shows that L − ρV = σρV −L, which implies V −1 L = σρI−V −1 L+ρV −1 V  . Substituting for V −1 L in the left-hand side of (4.10) and using (2.6), the second statement of Proposition 3.1, and (4.12), we see that



V −1 L • V −1 L − V −1 V  V −1 L = V −1 L • σρI − V −1 L + ρV −1 V  − V −1 V  V −1 L 

= σρ tr(V −1 L) − V −1 L 2F + V −1 L • P 1 ≤ (σ − 1)nρ2 + τ (γ + 1)γρ2 , 2 as desired. Using similar arguments along with (4.13) and the fact that Γ/n = 1 − γ 2 /(2n), (4.11) is proven as follows:



Γ −1 V L • V −1 L − V −1 V  V −1 L − ρ tr V −1 V  (ρI − V −1 L) n   Γ Γ −1 Γ V L − ρI • P = σρ tr(V −1 L) − V −1 L 2F + n n n

Γ Γ γ 2 −1 = σρ tr(V −1 L) − V −1 L 2F + V −1 L − ρI • P − V L•P n n 2n 1 1 ≤ Γσρ2 − Γρ2 + τ γ 2 ρ2 + τ (γ + 1)γ 3 ρ2 . 2 4n

SEMIDEFINITE PROGRAMMING WITH PARTIAL MATRICES

157

Proposition 4.6. Let (X, S, y) ∈ N (γ), and define µ ≡ µ(X, S). Then ζ≡

δ ∆X • S + X • ∆S ≤ −√ . nµ n

Hence, µ(·, ·) and ρ(·, ·) decrease from (X, S, y) along the direction (∆X, ∆S, ∆y). ˆ ≡ X(X), ˆ ˆ ≡ X ˆ  (X)[∆X], V ≡ V (X), V  ≡ V  (X)[∆X], L ≡ Proof. Let X X   L(S), L ≡ L (S)[∆S], and ρ ≡ ρ(X, S). Then using Theorem 2.1, (2.9) with A = ∆X, the fact that S ∈ S F and ∆S ∈ S F , (2.10) with A = ∆X, (2.8) with B = ∆S, (2.2), and (2.3), we see that ˆ • S + X ˆ • ∆S ∆X • S + X • ∆S = X

ˆ  V T + V (V  )T )X ˆ • LLT +V −T V −1 • (L LT + L(L )T ) = − X(V ˆ V T X ˆ • LLT = 2 V −T V −1 • L LT − 2 XV (4.14)

= 2 V −T V −1 • L LT − 2 V −T V −1 V  V −1 • LLT

= 2 V −1 L • V −1 L − V −1 V  V −1 L   1 ≤ 2 (σ − 1)n + τ (γ + 1)γ ρ2 , 2

where the inequality follows from (4.10). By the definition of ζ, the inequality just proven, (4.5), and (4.6), we have ζ ≤ 2(σ − 1) + τ (γ + 1)

2δ γ δ δ ≤ −√ + √ = −√ . n n n n

The conclusion that the duality gap measures µ(·, ·) and ρ(·, ·) both decrease along (∆X, ∆S, ∆y) can now be seen from Proposition 4.1. Proposition 4.7. Let (X, S, y) ∈ N (γ), and define the constant Γ as in Propoˆ as in Proposition 3.2. Suppose that α ≥ 0 sition 3.1 and the functions Φ, Vˆ , and L satisfies the inequality (4.15)

α≤

γ 2 (σ − τ θ)(1 − τ ) 1 √ , √ n 2 2 τ 2 (γ + 1)

where θ ≡ 1 + γ(γ + 1)/(2n). Then ˆ + αX ˆ  , Sα )) = tr(Vˆ (X ˆ + αX ˆ  )−1 L(S ˆ α )) ≥ Γρα , tr(Φ(X ˆ ≡ X(X) ˆ ˆ ≡ X ˆ  (X)[∆X]. where X and X Proof. We first remark that the right-hand side of (4.15) is nonnegative by (4.7), and is less than or equal to 1 by (4.8), and thus 0 ≤ α ≤ 1, which clearly shows ˆ + αX ˆ  ∈ S n and Sα ∈ S n by Proposition 4.3. Hence, Φ is defined at that X ++ ++  ˆ ˆ (X + αX , Sα ). ˆ Vˆ  ≡ Vˆ  (X)[∆X], ˆ ˆ ≡ L(S), ˆ ˆ≡L ˆ  (S)[∆S]. In addition, Define Vˆ ≡ Vˆ (X), L and L    define V ≡ V (X), V ≡ V (X)[∆X], L ≡ L(S), and L ≡ L (S)[∆S]. Noting that ˆ ˆ is identical to L(·) on the domain S F , we see that V (·) = Vˆ (X(·)) and that L(·) ++ (4.16)

V = Vˆ ,

V  = Vˆ  ,

ˆ L = L,

ˆ. L = L

The Taylor integral formula implies that (4.17)

ˆ + αX ˆ  , Sα ) = Φ(X, ˆ S) + α Φ (X, ˆ S)[X ˆ  , ∆S] + α2 Tα , Φ(X

158

SAMUEL BURER

where (4.18)

 Tα ≡

0

1

ˆ  , ∆S](2) dt. ˆ + tαX ˆ  , Stα )[X (1 − t)Φ (X

ˆ S) = Analyzing the first two components of (4.17), we first see by (4.16) that Φ(X, −1 ˆ −1 ˆ V L = V L. Secondly, letting ρ ≡ ρ(X, S), we have ˆ S)[X ˆ  , ∆S] = −Vˆ −1 Vˆ  Vˆ −1 L ˆ ˆ + Vˆ −1 L Φ (X, = V −1 L − V −1 V  V −1 L = σρI − V −1 L + V −1 V  (ρI − V −1 L), where the third equality comes from substituting for V −1 L as was done in the proof of Lemma 4.5. Hence, we can rewrite (4.17) as

ˆ + αX ˆ  , Sα ) = V −1 L + α σρI − V −1 L + V −1 V  (ρI − V −1 L) + α2 Tα Φ(X (4.19)

= (1 − α)V −1 L + ασρI + αP + α2 Tα ,

where P ≡ V −1 V  (ρI − V −1 L). Taking the trace of (4.19) and using Proposition 3.1 and Proposition 4.1, where ζ ≡ (∆X • S + X • ∆S)/(nρ2 ), we see

ˆ + αX ˆ  , Sα ) ≥ (1 − α)Γρ + ασnρ + αtr(P ) + α2 tr(Tα ) tr Φ(X   ζ  + ασnρ + αtr(P ) + α2 tr(Tα ) ≥ Γ ρα − αρ 1 + 2   1 (∆X • S + X • ∆S) + ασnρ + αtr(P ) + α2 tr(Tα ) = Γρα − αΓρ 1 + 2nρ2   Γ −1 = Γρα − αΓρ + ασnρ + αρ ρtr(P ) − (∆X • S + X • ∆S) + α2 tr(Tα ). 2n From (4.14) (which is inside the proof of Proposition 4.6), we have ∆X •S +X •∆S = 2 V −1 L • (V −1 L − V −1 V  V −1 L). Thus, letting θ ≡ 1 + γ(γ + 1)/(2n), we can apply (4.11) to the above inequality to get  

τ γ2θ  ˆ ˆ + α2 tr(Tα ) tr Φ(X + αX , Sα ) ≥ Γρα − αΓρ + ασnρ + αρ −(σ − 1)Γ − 2   τ γ2θ + α2 tr(Tα ) = Γρα + ασnρ + αρ −σΓ − 2   2 σγ τ γ2θ + α2 tr(Tα ) = Γρα + αρ − 2 2 αργ 2 (σ − τ θ) = Γρα + + α2 tr(Tα ), 2 where the second equality comes from the definition Γ ≡ n − γ 2 /2. From the above inequality, the statement of the proposition will hold if (4.20)

ργ 2 (σ − τ θ) + α tr(Tα ) ≥ 0, 2

and so we now devote our efforts to establishing (4.20). We start by providing a bound on Tα F , which can be obtained as follows from (4.18), the standard properties of

SEMIDEFINITE PROGRAMMING WITH PARTIAL MATRICES

159

integration, and Proposition 3.2(ii):  1 ˆ + tαX ˆ  , Stα )[X ˆ  , ∆S](2) dt Tα F ≤ (1 − t) Φ (X F 0

 1

T  2 1 −1 ˆ ˆ −1 ˆ −T ˆ Vˆtα F + L ≤√ (1 − t) Vˆtα dt Ltα Vˆtα X tα ∆S Ltα F 2 0  1

1 −1 ˆ ˆ  V F ≤√ (1 − t) Vˆtα Ltα V −1 Vˆtα 2 V T X 2 0 2 2 −1 ˆ −1 + L ∆SL−T F dt, tα L L

ˆ + tαX ˆ  ) and L ˆ tα ≡ L(S ˆ tα ). We note that the hypotheses of Propowhere Vˆtα ≡ Vˆ (X  ˆ ˆ ˆ sition 3.7 hold with (X, S), (X +tαX , Stα ), and the scalar tατ due to Proposition 4.3. Now using Propositions 4.3, 3.7, and 3.6(i) as well as (4.16) and simple integration with respect to t, the above inequality shows that  1 ˆ

2 1 Vˆ −1 L (1 − tατ )−1 τ + (1 − tατ )−1 τ dt (1 − t) Tα F ≤ √ 1 − tατ 2 0 √ 2 −1  1 1−t = 2 2 τ V L dt 3 0 (1 − tατ ) √ 2 ≤ 2 2 τ ((γ + 1)ρ) (2(1 − ατ ))−1 √ ≤ 2 τ 2 (1 − τ )−1 (γ + 1)ρ. Hence, √ ργ 2 (σ − τ θ) ργ 2 (σ − τ θ) ργ 2 (σ − τ θ) + αtr(Tα ) ≥ − α |tr(Tα )| ≥ − α n Tα F 2 2 2 √ √ ργ 2 (σ − τ θ) − α 2τ 2 (1 − τ )−1 (γ + 1) nρ ≥   22 √ √ γ (σ − τ θ) − α 2τ 2 (1 − τ )−1 (γ + 1) n ≥ 0, =ρ 2 where the last inequality follows from (4.15). This completes the proof of the proposition. Corollary 4.8. Let (X, S, y) ∈ N (γ), and suppose that α ≥ 0 satisfies (4.15). Then (Xα , Sα , yα ) ∈ N (γ). Proof. As discussed in the proof of Proposition 4.7, we have α ≤ 1, and so F? F (Xα , Sα ) ∈ S++ × S++ by Corollary 4.4. In addition, the Newton system defining (∆X, ∆S, ∆y) clearly shows that (Xα , Sα , yα ) ∈ F 0 (P ) × F 0 (D). ˆ + αX ˆ  is a positive definite compleAs was shown in the proof of Corollary 4.4, X   ˆ ˆ ˆ ˆ tion of Xα , where X ≡ X(X) and X ≡ X (X)[∆X]. By Theorem 3.3, we have that, among all positive definite completions of Xα , the maximum-determinant completion ˆ α ) of Xα maximizes the function tr(Vˆ (·)−1 L(S ˆ are as in ˆ α )), where Vˆ (·) and L(·) X(X ˆ ˆ Proposition 3.2. Thus, using the fact that V (·) = V (X(·)) and combining Theorem 3.3 and Proposition 4.7, we see ˆ α ))−1 L(S ˆ + αX ˆ  )−1 L(S ˆ α )) ≥ tr(Vˆ (X ˆ α )) ≥ Γρα . tr(V (Xα )−1 L(Sα )) = tr(Vˆ (X(X Combining this inequality with the fact that (Xα , Sα , yα ) ∈ F 0 (P ) × F 0 (D), and applying Proposition 3.1, we conclude that (Xα , Sα , yα ) ∈ N (γ).

160

SAMUEL BURER

√ Theorem 4.9. Let n ≥ 1 and constants γ ≥ 0, δ ∈ [0, n), and τ ∈ (0, 1) be given such that (4.3)–(4.8) are satisfied. Suppose that Algorithm SDP is initialized with (X 0 , S 0 , y 0 ) ∈ N (γ) and a tolerance ε ≥ 0, and suppose that the constant stepsize α ≥ 0 is used in each iteration of the algorithm, where α is given by the right-hand side of (4.15). Then, for each k ≥ 0, the sequence {(X k , S k , y k )}k≥0 produced by the algorithm satisfies the following: (i) (X k , S k , y k ) ∈ N (γ); √ (ii) µ(X k , S k ) ≤ (1 − α δ/ n)k µ(X 0 , S 0 ). As a consequence, Algorithm SDP terminates with a point (X k , S k , y k ) satisfying X k • S k ≤ ε in at most O(n log(X 0 • S 0 /ε)) iterations. Proof. Item (i) follows from Corollary 4.8 and induction on k. Likewise, item (ii) follows from the combination of Propositions 4.1 and 4.6 along with induction on k. The bound on the number of iterations now follows from (ii) and the standard argument that shows that √ the duality gap is reduced in each iteration by a factor on the order of O(1 − α δ/ n), which is O(1 − 1/n) due to (4.15). 5. Computational issues and results. In this section, we discuss several computational issues related to Algorithm SDP, and then present computational results comparing our method with three other SDP implementations. 5.1. Implementation features of Algorithm SDP. We now demonstrate how the theoretical presentation of Algorithm SDP in section 4 can be specified into a practical implementation. First, as is typical with practical primal-dual interior-point algorithms, we implement Algorithm SDP as an infeasible method. Thus, we do not require full primal-dual F? F feasibility of the iterates (X, S, y) but rather require only X ∈ S++ and S ∈ S++ and then define the Newton direction (∆X, ∆S, ∆y) by (5.1)

H  (X, S, y)[∆X, ∆S, ∆y] = (b − A(X), C − A∗ (y) − S, σ ρ V − L),

where ρ ≡ ρ(X, S) and 0 ≤ σ < 1, instead of by (4.1). In particular, this makes the choice of initial iterate trivial. (In fact it is chosen as described in the SDPT3 user’s guide [31] in all of the computational results presented in the following subsections.) Second, we choose a different stopping criterion than that originally given for Algorithm SDP. A more practical stopping criterion is based on the relative duality gap and relative feasibility of the current iterate (X, S, y), which we define respectively as   X •S b − A(X) C − A∗ (y) − S F . and max , 1 + (|C • X| + |bT y|)/2 1 + b 1 + C F Target values for the gap and feasibility can then be specified at run time. Third, in our implementation of Algorithm SDP, we do not bother to stay in the neighborhood N (γ), since our computational experience indicates that this does not yield a substantial practical improvement. Instead, we essentially take α as large as F? F possible, while keeping (Xα , Sα , yα ) in S++ × S++ × m . In fact, as is common in SDP implementations, we differentiate two step-sizes, αp for the primal and αd for the dual. Then, for the primal and dual separately, the actual step-size is calculated by estimating the infeasible boundary step-size α ¯ to within an absolute accuracy of 1.0e−2 using a simple bisection method and then choosing the step-size slightly less than min(¯ α, 1).


Fourth, we have found it advantageous for reducing the duality gap in the early stages of Algorithm SDP to update X and S in each iteration by performing an alternative linesearch in the spaces of V and L. More specifically, we choose α_p ≤ 1 and α_d ≤ 1 so that L_{α_d} ≡ L(S) + α_d L′(S)[∆S] and V_{α_p} ≡ V(X) + α_p V′(X)[∆X] are close to the boundary of L^F_{++}, and we then define X_{α_p} = π^F(V_{α_p}^{-T} V_{α_p}^{-1}) and S_{α_d} = L_{α_d} L_{α_d}^T. (We note that the calculation of X_{α_p} and S_{α_d} can be done efficiently; see below.) This update method, however, does not effectively achieve feasibility, and so it is always necessary to revert back to the typical linesearch in the space of X and S.

Fifth, our choice of σ in each iteration is adaptive rather than constant as in the statement of Algorithm SDP. Roughly speaking, we choose σ conservatively whenever we are experiencing small step-sizes, but more aggressively when our step-sizes are larger. In particular, we set σ = 1 − γ min(α_p, α_d), where α_p and α_d are the successful step-sizes of the preceding iteration and γ ∈ [0.1, 0.3] is an adaptive parameter that is incrementally increased when step-sizes satisfying α_p = α_d = 1 are encountered and incrementally reduced otherwise. As such, our typical values for σ are smaller than those for other SDP implementations. (For example, SDPT3 employs a constant γ = 0.9.) In addition, it is not immediately clear whether a predictor-corrector approach would improve the choice of σ or even whether such an approach would be computationally efficient (see the following subsections), and so our current implementation of Algorithm SDP does not use a predictor-corrector strategy.

Finally, we mention some details regarding the calculation of the Newton direction (∆X, ∆S, ∆y). As with other primal-dual path-following algorithms, the calculation can be reduced to the solution of the system M∆y = h, where M is the so-called Schur complement matrix and h is in accordance with the system (5.1). Here, M is the m × m matrix representation of the linear operator −A ∘ V′(X)^{-1} ∘ L′(S) ∘ A*, so that M is positive definite by Corollary 2.9. Two fundamental techniques for calculating ∆y can then be considered: either (1) solution of the system via forward and backward substitution after the direct formation and factorization of M; or (2) solution of the equation via an iterative method. We present numerical results for both methods in later subsections. It is important to note, however, that M has no inherent sparsity (as is typical with other methods) and that M is nonsymmetric (which is atypical). Hence, the first method for calculating ∆y requires Gaussian elimination with pivoting, and the second requires an efficient iterative method for nonsymmetric systems (like BiCGSTAB, which we have chosen in the computational results). Thus, the ill-conditioning of M near optimality can have a negative impact upon both methods. A natural way to reduce this impact in the case of BiCGSTAB is to perform some preconditioning of the linear system, but this has not been implemented in the current version of Algorithm SDP since more investigation is necessary to develop reasonable preconditioning strategies.

Having described the key implementation choices of Algorithm SDP, we now consider the basic operations of the algorithm and in particular discuss their computational complexities. From the statement of Algorithm SDP and the definition of the Newton direction, we see that the main operations are checking X ∈ S^{F?}_{++} and S ∈ S^F_{++} and evaluating V(X), V′(X)^{-1}[N], L(S), and L′(S)[B] for any N ∈ L^F and B ∈ S^F. To describe the complexity of these operations, we introduce a few definitions. For each j = 1, . . . , n, we define K_j ≡ {i ∈ V : (i, j) ∈ F, i ≥ j}.


That is, for each j = 1, . . . , n, K_j is the set of row indices of the jth column of the lower part of F. We have the following fact (detailed in Fukuda et al. [9]), which expresses the chordal structure F as a union of dense blocks, or cliques:
\[
F = \bigcup_{j=1}^{n} K_j \times K_j. \tag{5.2}
\]
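To make (5.2) concrete, the following small Python sketch builds the index sets K_j from a symmetric sparsity pattern and checks that the union of the blocks K_j × K_j reproduces F. The representation of F as a set of 0-based index pairs, already given in a perfect elimination ordering, is our choice for illustration, not the paper's data structure.

```python
# Illustration of (5.2): build K_j from a chordal pattern F and verify that
# the union of the K_j x K_j blocks recovers F. The pattern is a set of
# 0-based (i, j) pairs, assumed symmetric and already ordered by a perfect
# elimination ordering.

def cliques(F, n):
    """K_j = {i : (i, j) in F, i >= j} for each column j."""
    return [sorted(i for i in range(n) if (i, j) in F and i >= j)
            for j in range(n)]

def union_of_blocks(K):
    """The union of K_j x K_j over all j, as a set of index pairs."""
    return {(i, k) for Kj in K for i in Kj for k in Kj}

if __name__ == "__main__":
    # Tiny chordal example: a path 0-1-2 plus the diagonal.
    F = {(0, 0), (1, 1), (2, 2), (1, 0), (0, 1), (2, 1), (1, 2)}
    K = cliques(F, 3)            # K_0 = [0, 1], K_1 = [1, 2], K_2 = [2]
    assert union_of_blocks(K) == F
```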

We also define
\[
f_2 \equiv \sum_{j=1}^{n} |K_j|^2.
\]
A common way to check whether S is in S^F_{++} is to simply attempt the calculation of L(S) (then S ∈ S^F_{++} if and only if the calculation is successful), and standard methods for calculating L(S) show that the time required is O(f_2). Moreover, in a similar manner, the defining equation (2.8) shows that L′(S)[B] can also be calculated in time O(f_2). Hence, each of the key operations involving S is O(f_2).

To calculate the times required for the key operations involving X, we introduce some additional notation and ideas. First, for any P, Q ⊆ V and any W ∈ S^n, we let W_{PQ} ∈ ℝ^{|P|×|Q|} denote the matrix obtained from W by deleting all rows p ∉ P and all columns q ∉ Q. Second, it is clear that (5.2) can be simplified to F = ∪_{r=1}^{ℓ} C_r × C_r, where {C_r}_{r=1}^{ℓ} are the maximal members of {K_j}_{j=1}^{n}, i.e., those members of {K_j}_{j=1}^{n} that are not properly contained in any other members. We then define

\[
f_3 \equiv \sum_{r=1}^{\ell} |C_r|^3
\]
and have the following critical theorem, proved in [12].

Theorem 5.1. Let X ∈ S^F. Then
(i) X ∈ S^{F?}_{+} if and only if X_{C_r C_r} ∈ S^{|C_r|}_{+} for all r = 1, . . . , ℓ;
(ii) X ∈ S^{F?}_{++} if and only if X_{C_r C_r} ∈ S^{|C_r|}_{++} for all r = 1, . . . , ℓ.

This theorem shows immediately that testing whether X ∈ S^{F?}_{++} can be done in time O(f_3) by simply attempting the factorizations of the submatrices X_{C_r C_r} of X.

Next, we determine the time required for V(X) and V′(X)^{-1}[N] by considering the following proposition, which gives a formula for V(X) that will be convenient for computation. In the proposition, π^{Fl} is the operator that is defined similarly to π^F except that it projects onto L^F instead of S^F.

Proposition 5.2. Let X ∈ S^{F?}_{++}. Then V ≡ V(X) ∈ L^F_{++} uniquely satisfies the equation
\[
\pi^{Fl}(XV) = \pi^{Fl}(V^{-T}). \tag{5.3}
\]

Proof. We first consider solving the simpler equation π^{Fl}(XQ) = 0 for Q ∈ L^F. Because the jth column of this equation can be expressed compactly as X_{K_j K_j} q_j = 0, where q_j is the nonzero part of Q_{·j}, we conclude from Theorem 5.1(ii) that q_j = 0, which implies that Q = 0.

Now suppose that V_1, V_2 ∈ L^F_{++} each satisfy (5.3), and note that the right-hand side of (5.3) is a diagonal matrix. For i = 1, 2, let D_i be the diagonal matrix defined by the diagonal entries of V_i. Then (5.3) implies that π^{Fl}(X V_1 D_1) = π^{Fl}(X V_2 D_2) or, equivalently, that π^{Fl}(X(V_1 D_1 − V_2 D_2)) = 0. Using the result of the previous


paragraph, this shows V_1 D_1 = V_2 D_2. Examining the jjth position of this equation and employing the definition of D_i, we see that [V_1]_{jj} = [V_2]_{jj}, which in turn implies that D_1 = D_2 and hence V_1 = V_2. In other words, at most one V ∈ L^F_{++} satisfies (5.3).

We now show that V ≡ V(X) satisfies (5.3), which will complete the proof of the proposition. For i ≥ j such that (i, j) ∈ F, consider the ijth entry of the matrix (X̂ − X)V, where X̂ ≡ X̂(X):
\[
[(\hat X - X)V]_{ij} = \sum_{k=j}^{n} [\hat X - X]_{ik} V_{kj}.
\]
We claim that the above expression equals zero. So suppose for contradiction that it is nonzero. Then there exists k ≥ j such that [X̂ − X]_{ik} ≠ 0 and V_{kj} ≠ 0. Because π^F(X̂) = X and V ∈ L^F_{++}, this implies that (i, k) ∉ F and (k, j) ∈ F. However, due to the chordal structure of F and the fact that the ordering (1, . . . , n) is a perfect elimination ordering for F, we have that (i, j), (k, j) ∈ F imply (i, k) ∈ F, which is a contradiction. Hence, the above expression equals 0. Said differently, we have π^{Fl}((X̂ − X)V) = 0, which implies π^{Fl}(XV) = π^{Fl}(X̂V) = π^{Fl}(V^{-T}), as desired.

As the proposition and its proof indicate, the nonzero part v_j ∈ ℝ^{|K_j|} of the jth column of V is simply the solution of the system X_{K_j K_j} v_j = V_{jj}^{-1} e_1, where e_1 ∈ ℝ^{|K_j|} has a one in its first position and zeros elsewhere. Hence, as long as the factorizations of X_{K_j K_j} for j = 1, . . . , n are readily available, the calculation of V can be done in time O(f_2).

Are these factorizations readily available, however? The operation to verify X ∈ S^{F?}_{++} yields the factorizations of X_{C_r C_r} only for r = 1, . . . , ℓ, and so the factorizations of X_{K_j K_j} are not explicitly available. This is not a significant obstacle, however, since it is possible to reorder the vertices V (in a preprocessing phase, for example) so that factorizations of the matrices X_{K_j K_j} are embedded in a natural manner in the upper Cholesky factorizations of the matrices X_{C_r C_r}. Moreover, this reordering can be done without altering the chordal structure of F or the property that (1, . . . , n) is a perfect elimination ordering. This property is discussed in detail in section 2.1 of [9] and section 2.2 of [26], where it is described as a perfect elimination ordering induced from an ordering of maximal cliques satisfying the running intersection property, and this feature has been incorporated into Algorithm SDP.

Differentiating (5.3) with respect to X in the direction A ∈ S^F and defining N ≡ V′(X)[A], we see that
\[
\pi^{Fl}(AV) = -\pi^{Fl}(V^{-T} N^T V^{-T}) - \pi^{Fl}(XN).
\]
Note that the first term on the right-hand side does not require the full matrix V^{-T} N^T V^{-T} but rather just its diagonal. The above equation also provides a convenient form for calculating A = V′(X)^{-1}[N] for an arbitrary N ∈ L^F, once X and V are available. In fact, it is not difficult to see that A can be computed from N in time O(f_2).

We summarize the complexities obtained from the preceding discussion in the following proposition.

Proposition 5.3. Let X, S ∈ S^F. Determining whether X is in S^{F?}_{++} requires time O(f_3), and, under the mild assumption that certain calculations are stored upon the successful determination of X ∈ S^{F?}_{++}, the calculation of V(X) and V′(X)^{-1}[N], for


any N ∈ L^F, requires time O(f_2). Determining whether S is in S^F_{++} is performed by attempting the computation of L(S), which requires time O(f_2). Upon the successful computation of L(S), the calculation of L′(S)[B], for any B ∈ S^F, requires time O(f_2).
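As an illustration of Theorem 5.1 and of the column-wise formula for V, the sketch below tests X ∈ S^{F?}_{++} by attempting a Cholesky factorization of each clique submatrix X_{C_r C_r} and computes the nonzero part of each column of V from the system X_{K_j K_j} v_j = V_{jj}^{-1} e_1. It works on dense NumPy submatrices and refactors X_{K_j K_j} from scratch for each j, so it illustrates the formulas but not the O(f_2) bookkeeping (the embedding of the K_j factorizations in the C_r factorizations) described above.

```python
import numpy as np

def is_partial_pd(X, cliques):
    """Theorem 5.1(ii): X is in S^{F?}_{++} iff every X_{C_r C_r} is positive
    definite, which we test by attempting a Cholesky factorization."""
    for C in cliques:
        try:
            np.linalg.cholesky(X[np.ix_(C, C)])
        except np.linalg.LinAlgError:
            return False
    return True

def column_of_V(X, Kj):
    """Nonzero part v_j of the jth column of V, from X_{K_j K_j} v_j =
    V_jj^{-1} e_1. Solving X_{K_j K_j} w = e_1 and normalizing so that the
    first entry of v_j equals V_jj gives v_j = w / sqrt(w[0])."""
    XK = X[np.ix_(Kj, Kj)]
    e1 = np.zeros(len(Kj))
    e1[0] = 1.0
    w = np.linalg.solve(XK, e1)
    return w / np.sqrt(w[0])
```

In an actual implementation, the solves for the v_j would reuse the Cholesky factors of the matrices X_{C_r C_r}, which is precisely the reordering device from [9, 26] discussed above.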

5.2. Comparison with a standard primal-dual method. In order to see how Algorithm SDP compares with other primal-dual path-following implementations, in this subsection we compare Algorithm SDP with SDPT3 version 3.0, a successful implementation by Tütüncü, Toh, and Todd (see [31]). We have chosen to run SDPT3 with the HRVW/KSH/M direction using the Mehrotra predictor-corrector strategy. Both algorithms use the same starting point and terminate when the relative duality gap is less than 1.0e−4 and the relative feasibility is less than 1.0e−5. We remark that these moderate values for the target gap and feasibility have been chosen for two reasons. First, it makes our presentation consistent with later subsections, where larger SDPs are considered and solved to the same accuracies. Second, we have chosen moderate values in keeping with our discussion in the previous subsection concerning how the ill-conditioning of M near optimality affects the calculation of ∆y in Algorithm SDP. In fact, for all but a few of the problems presented below (most notably the control and qap instances), Algorithm SDP can easily obtain even higher accuracy.

For these computational results, we solve the system M∆y = h in Algorithm SDP using either Gaussian elimination or BiCGSTAB, depending on the stage of the algorithm. In the early stages of the algorithm when the conditioning of M is good, we employ BiCGSTAB. As soon as the number of multiplications by M (or, equivalently, the number of evaluations of −A ∘ V′(X)^{-1} ∘ L′(S) ∘ A*) exceeds m during one call to the BiCGSTAB subroutine, however, we switch to Gaussian elimination. Assuming that evaluations of the functions A and A* require time O(f_2) (as is the case for most of the test problems below), the direct method requires time O(mf_2) to form the matrix M and then an additional O(m³) time to factor M and solve for ∆y. We have thus chosen problems having mf_2 + m³ ≤ 1.0e+9 as a heuristic guide for selecting problems for which the direct method is not too computationally intensive. In particular, no problems having m > 1000 have been selected.

The test problems come from the SDPLIB collection of problems maintained by Borchers [3], and their statistics are listed in Table 1. The first three columns are self-explanatory, and the last two give the percentage of nonzeros in the iterates S and X represented by the density patterns E and F. Complete computational results on a 2.4 GHz Pentium 4 computer are given in Table 2 and are self-explanatory, with times given in seconds. We remark that, although Algorithm SDP is designed primarily for sparse problems, i.e., when F is a relatively small subset of V × V, it can of course be applied to dense problems with F = V × V, as with a few of the test problems in Table 1. We have included these problems because we feel it is instructive to compare the performance of Algorithm SDP and SDPT3 on such instances.

The results indicate several interesting points. First and foremost, both methods were able to solve all problems to the desired accuracy in a reasonable amount of time. Second, Algorithm SDP outperformed SDPT3 on several sets of problems (the arch, max, qp, and ss problems, as well as a subset of the mcp problems) for which the chordal pattern F was very small, indicating Algorithm SDP's capability for exploiting this structure.


Table 1
Statistics for SDPLIB test problems.

Problem       n     m    Dens E (%)  Dens F (%)
arch2        335   174      3.29        6.87
arch4        335   174      3.29        6.87
arch8        335   174      3.29        6.87
control5      75   351     45.61       46.88
control6      90   496     45.42       46.76
control7     105   666     45.28       46.68
control8     120   861     45.18       46.43
gpp100       100   101    100.00      100.00
gpp124-2     124   125    100.00      100.00
gpp124-3     124   125    100.00      100.00
gpp124-4     124   125    100.00      100.00
maxG11       800   800      0.75        2.61
mcp250-2     250   250      2.75       15.27
mcp250-3     250   250      4.89       36.76
mcp250-4     250   250      8.51       58.63
mcp500-1     500   500      0.90        2.58
mcp500-2     500   500      1.38       11.99
qap7          50   358    100.00      100.00
qap8          65   529    100.00      100.00
qap9          82   748    100.00      100.00
qpG11       1600   800      0.25        0.73
ss30         426   132      4.43        9.85
theta1        50   104    100.00      100.00
theta2       100   498    100.00      100.00
truss5       331   208      3.31        3.31
truss6       451   172      0.88        0.88
truss7       301    86      0.99        0.99
truss8       628   496      3.18        3.18

On the other hand, SDPT3 significantly outperformed Algorithm SDP on the control and qap problems, as Algorithm SDP was challenged by the conditioning and density of these problems. In addition, SDPT3 was faster on the remaining mcp problems, most likely due to the related fact that Algorithm SDP consistently required more iterations than SDPT3, which itself is indicative of the strong convergence properties of the HRVW/KSH/M direction when combined with the Mehrotra predictor-corrector strategy.

Table 2
Results of Algorithm SDP (AS) and SDPT3 (T3) on the SDPLIB test problems.

             ------------- OBJECTIVE VALUE --------------------------    ----- R-GAP -----    ----- R-FEAS ----    -- ITER --   - TIME (s) -
Problem       AS primal     AS dual       T3 primal     T3 dual          AS        T3          AS        T3         AS    T3     AS     T3
arch2       −6.71397e−01  −6.71546e−01  −6.71485e−01  −6.71523e−01    8.9e−05   3.8e−05     7.6e−13   8.4e−12      50    17      8     20
arch4       −9.72501e−01  −9.72657e−01  −9.72561e−01  −9.72644e−01    7.9e−05   8.3e−05     1.4e−11   2.9e−11      52    18     10     20
arch8       −7.05642e+00  −7.05716e+00  −7.05689e+00  −7.05702e+00    9.2e−05   1.7e−05     1.4e−10   3.1e−10      42    17      8     19
control5    −1.68824e+01  −1.68838e+01  −1.68835e+01  −1.68836e+01    7.8e−05   1.1e−05     1.4e−08   1.3e−07      52    18     37     11
control6    −3.73018e+01  −3.73048e+01  −3.73035e+01  −3.73047e+01    7.8e−05   3.0e−05     1.7e−08   4.4e−08      52    19     95     21
control7    −2.06237e+01  −2.06253e+01  −2.06249e+01  −2.06251e+01    7.6e−05   1.3e−05     2.7e−08   1.2e−07      50    21    207     43
control8    −2.02856e+01  −2.02865e+01  −2.02861e+01  −2.02865e+01    4.6e−05   2.0e−05     1.5e−08   5.5e−08      51    20    429     71
gpp100       4.49437e+01   4.49435e+01   4.49445e+01   4.49435e+01    2.2e−06   2.3e−05     8.2e−06   3.1e−10      42    12      7      1
gpp124-2     4.68623e+01   4.68623e+01   4.68639e+01   4.68623e+01    8.1e−07   3.5e−05     7.0e−06   4.2e−10      44    12     15      2
gpp124-3     1.53014e+02   1.53014e+02   1.53017e+02   1.53014e+02    2.0e−06   2.1e−05     7.8e−06   4.1e−10      41    11     15      2
gpp124-4     4.18990e+02   4.18987e+02   4.19008e+02   4.18987e+02    5.7e−06   5.1e−05     9.1e−06   1.5e−09      38    12     16      2
maxG11      −6.29127e+02  −6.29165e+02  −6.29127e+02  −6.29165e+02    6.1e−05   6.0e−05     5.6e−06   9.8e−16      39    11     20     65
mcp250-2    −5.31888e+02  −5.31932e+02  −5.31926e+02  −5.31930e+02    8.2e−05   8.3e−06     2.3e−06   7.1e−16      34    10      5      4
mcp250-3    −9.81095e+02  −9.81175e+02  −9.81168e+02  −9.81173e+02    8.1e−05   5.0e−06     3.2e−06   1.3e−15      34    10     26      4
mcp250-4    −1.68183e+03  −1.68196e+03  −1.68188e+03  −1.68196e+03    8.0e−05   5.0e−05     8.5e−06   1.8e−15      34     9     52      5
mcp500-1    −5.98103e+02  −5.98154e+02  −5.98122e+02  −5.98154e+02    8.5e−05   5.4e−05     2.9e−06   7.9e−16      35    11      3     19
mcp500-2    −1.06996e+03  −1.07006e+03  −1.07003e+03  −1.07006e+03    9.0e−05   2.3e−05     2.0e−06   4.0e−15      35    11     57     23
qap7         4.24833e+02   4.24811e+02   4.24763e+02   4.24787e+02    7.1e−05   2.1e−05     4.3e−06   2.7e−10      40    12     29      2
qap8         7.56972e+02   7.56941e+02   7.56778e+02   7.56863e+02    6.6e−05   1.3e−05     9.3e−06   3.1e−10      39    12     33      4
qap9         1.40999e+03   1.40991e+03   1.40988e+03   1.40986e+03    7.1e−05   7.6e−05     8.0e−06   4.3e−10      39    11    100      7
qpG11       −2.44858e+03  −2.44866e+03  −2.44864e+03  −2.44866e+03    3.1e−05   8.6e−06     5.4e−06   2.5e−15      39    12     27    281
ss30        −2.02378e+01  −2.02396e+01  −2.02389e+01  −2.02396e+01    8.7e−05   3.2e−05     3.0e−06   5.2e−11      46    17      7     71
theta1      −2.29992e+01  −2.30002e+01  −2.29998e+01  −2.30000e+01    4.5e−05   1.1e−05     4.9e−06   2.6e−13      39     9      1      0
theta2      −3.28782e+01  −3.28793e+01  −3.28776e+01  −3.28795e+01    3.3e−05   5.8e−05     1.2e−15   1.9e−12      42     9     48      3
truss5       1.32638e+02   1.32630e+02   1.32636e+02   1.32635e+02    6.4e−05   1.3e−05     6.5e−10   5.7e−08      45    15      5      5
truss6       9.01022e+02   9.00985e+02   9.00999e+02   9.00999e+02    4.1e−05   6.8e−06     2.6e−08   6.0e−07      43    21      2      3
truss7       9.00038e+02   8.99999e+02   9.00012e+02   9.00000e+02    4.4e−05   1.5e−05     1.3e−11   2.2e−07      42    22      0      2
truss8       1.33117e+02   1.33105e+02   1.33114e+02   1.33113e+02    8.8e−05   1.1e−05     5.1e−09   3.7e−07      48    14     66     27

5.3. Comparison with the completion method. In this subsection, we compare Algorithm SDP with the completion method (CM) of Fukuda et al. on problems for which the large size of m requires the solution of the Schur complement equation by an iterative method. As in the previous subsection, Algorithm SDP is run with BiCGSTAB as the iterative solver, and the SDPs are solved to an accuracy of 1.0e−4 for the relative duality gap and 1.0e−5 for relative feasibility.

As described in [9, 26], CM stores X and S in the same sparse format as Algorithm SDP does, and the search direction in each iteration is the sparse projection of the HRVW/KSH/M direction. Moreover, the sparsity of X and S is exploited in the formation of the Schur complement matrix M, which is then factored directly to solve for ∆y. Here, however, we have implemented our own version of CM which computes ∆y using an iterative method, namely, the conjugate gradient method, which is appropriate since, in this case, M is symmetric positive definite. Other algorithmic choices for our implementation of CM mimic those of SDPT3 in the previous subsection, except that the predictor-corrector method has not been implemented due to its need to solve an extra m × m system in each iteration.
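Both iterative variants treat M only through matrix-vector products, which is easy to express with SciPy's LinearOperator interface. The sketch below is an illustration, not the authors' code: `schur_matvec` is a hypothetical callback evaluating w ↦ −A(V′(X)^{-1}[L′(S)[A*(w)]]) for Algorithm SDP. CM's variant would simply replace `bicgstab` with `cg`, since its Schur complement matrix is symmetric positive definite.

```python
from scipy.sparse.linalg import LinearOperator, bicgstab

def solve_schur(schur_matvec, h):
    """Solve M dy = h, where M is available only through products w -> M w.

    schur_matvec is a hypothetical callback for the nonsymmetric operator
    -A o V'(X)^{-1} o L'(S) o A* applied to a vector; BiCGSTAB is used
    because M is nonsymmetric (tolerance options are omitted for brevity).
    """
    m = h.shape[0]
    M = LinearOperator((m, m), matvec=schur_matvec)
    dy, info = bicgstab(M, h)
    if info != 0:
        # The implementation described in section 5.2 switches to direct
        # Gaussian elimination with pivoting when BiCGSTAB struggles.
        raise RuntimeError("BiCGSTAB did not converge (info = %d)" % info)
    return dy
```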


Table 3
Statistics for test problems with large m.

Problem              n      m     Dens E (%)  Dens F (%)
brock200-1.co       200    5067    100.00      100.00
brock200-4.co       200    6812    100.00      100.00
c-fat200-1.co       200   18367    100.00      100.00
hamming6-4.co        64    1313    100.00      100.00
hamming8-4.co       256   11777    100.00      100.00
johnson16-2-4.co    120    1681    100.00      100.00
keller4.co          171    5101    100.00      100.00
san200-0.7-1.co     200    5971    100.00      100.00
sanr200-0.7.co      200    6033    100.00      100.00
MANN-a27.co         379    1081      2.03        3.12
vibra3             1185     544      0.62        1.53
vibra4             2545    1200      0.30        0.90
vibra5             6801    3280      0.11        0.45
copo14              560    1275      1.17        1.17
copo23             2300    5820      0.31        0.31

In CM, multiplication by M is equivalent to an evaluation of the operator A(X̂ A*(·) S^{-1}), where X̂ ≡ X̂(X). It is not difficult to see that, using the sparse matrices V and L and assuming that evaluations of A and A* require time O(f_2) (as assumed in the previous subsection), this operator can be evaluated in time O(nf), where f = Σ_{j=1}^n |K_j|, by using sparse triangular solves. We thus have that
\[
f_2 = \sum_{j=1}^{n} |K_j|^2
\]
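The evaluation via triangular solves can be made concrete. The sketch below is an illustration under stated assumptions, not CM's implementation: it assumes X̂ = V^{-T}V^{-1} and S = LL^T with V and L lower triangular, and it uses hypothetical callbacks `A` and `A_adj` for the operators A and A*. Dense triangular solves are used for brevity; the O(nf) bound requires their sparse counterparts (e.g., scipy.sparse.linalg.spsolve_triangular).

```python
from scipy.linalg import solve_triangular

def cm_matvec(w, V, L, A, A_adj):
    """Evaluate w -> A( Xhat @ A*(w) @ S^{-1} ) using triangular solves,
    assuming Xhat = V^{-T} V^{-1} and S = L L^T (V, L lower triangular).

    A and A_adj are hypothetical callbacks for the operators A and A*.
    """
    B = A_adj(w)                                   # B = A*(w), symmetric
    # Xhat @ B = V^{-T} (V^{-1} B): two triangular solves.
    XB = solve_triangular(V, B, lower=True)
    XB = solve_triangular(V.T, XB, lower=False)
    # (Xhat B) S^{-1} = (S^{-1} (Xhat B)^T)^T with S^{-1} = L^{-T} L^{-1}.
    C = solve_triangular(L, XB.T, lower=True)
    C = solve_triangular(L.T, C, lower=False)
    return A(C.T)
```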