A T-algebraic approach to primal-dual interior-point algorithms

CHEK BENG CHUA

Abstract. Three primal-dual interior-point algorithms for homogeneous cone programming are presented. They are a short-step algorithm, a large-update algorithm, and a predictor-corrector algorithm. These algorithms are described and analyzed based on a characterization of homogeneous cones via T-algebras. The analysis shows that the algorithms have polynomial iteration complexity.

Key words and phrases. Homogeneous cone programming; T-algebra; Primal-dual interior-point algorithm.

1. Introduction

Primal-dual interior-point algorithms—first designed for linear programming (see, e.g., [19]), and subsequently extended to semidefinite programming (see, e.g., [18, Part II]) and symmetric cone programming (see, e.g., [13])—are the most widely used interior-point algorithms. At the same time, they are able to achieve the best iteration complexity bound known to date. This strongly motivates further development of primal-dual interior-point algorithms for wider classes of optimization problems.

Various studies to extend primal-dual algorithms beyond symmetric cone programming involve the v-space approach, i.e., the use of scaling points. Tunçel [15] showed that primal-dual symmetric algorithms that use the v-space approach can only be designed for symmetric cone programming. By dropping primal-dual symmetry, Tunçel [16] designed primal-dual algorithms based on the v-space approach. However, polynomial iteration complexity bounds were established only for symmetric cone programming.

This paper focuses on primal-dual interior-point algorithms for homogeneous cone programming. Homogeneous cone programming, which will be formally defined in the next section, is a class of convex programming problems. It includes, as a proper subclass, symmetric cone programming. While there is a finite number of non-isomorphic symmetric cones of each dimension, this number is uncountable for homogeneous cones when the dimension is at least eleven [17]. Thus the collection of homogeneous cones is significantly larger than the subclass of symmetric cones. On the other hand, the author [4] proved that every homogeneous cone can be represented as the intersection of some cone of symmetric positive definite matrices with a suitable linear subspace. Thus all homogeneous cone programming problems can be formulated as semidefinite programming problems, which are special cases of symmetric cone programming problems. However, it is argued in the same paper that there are advantages in designing algorithms specifically for homogeneous cone programming, as the ranks of these cones—which determine the iteration complexity of interior-point algorithms [9]—are always no more than the ranks of the representing positive definite cones.


The primal-dual algorithms in this paper rely on a characterization of homogeneous cones via T-algebras. Algebraic structures of convex cones have been successfully exploited in the design and analysis of interior-point algorithms for convex conic programming. This is most notable for symmetric cone programming [1, 3, 8, 14], where Jordan algebraic structures of symmetric cones were used. The primal-dual algorithms and their analyses in this paper can be applied to symmetric cone programming as a special case of homogeneous cone programming. These algorithms differ from those in [1] in the choice of search direction. The search direction in this paper is a generalization of the S-Ch-MT search direction for semidefinite programming described in [11], which was motivated via theoretical complexity and computational trials. On the other hand, as a search direction for homogeneous cone programming, this generalization is a natural consequence of the T-algebraic structure of homogeneous cones.

This paper is organized as follows. Section 2 begins with a formal definition of homogeneous cone programming, which is followed by a brief introduction to T-algebras. The quadratic representations in Euclidean Jordan algebras are generalized to T-algebras. Section 2 then concludes with a discussion on dual homogeneous cones. Section 3 defines the primal-dual central path of a pair of primal-dual homogeneous cone programming problems based on optimal barriers for the problems, and establishes various technical results needed for the design and analysis of path-following algorithms. Section 4 presents a general framework for primal-dual path-following algorithms, and describes three specializations—a short-step algorithm, a large-update algorithm, and a predictor-corrector algorithm—with proofs of their respective iteration complexities.

2. Homogeneous Cone Programming

The main object of study in this paper is a class of convex programming problems known as homogeneous cone programming, or convex conic programming over homogeneous cones. We begin with a description of the programming problems in this class. Let $E$ denote a Euclidean space with inner product $\langle\cdot,\cdot\rangle$.

Definition 1 (Homogeneous cone). A homogeneous cone is a pointed, open, convex cone $K \subset E$ whose automorphism group $\operatorname{Aut}(K)$ acts transitively on it; i.e., for every pair $(x, y) \in K \oplus K$, there exists a linear map $A : E \to E$ satisfying $A(K) = K$ and $A(x) = y$.

Henceforth, $K$ denotes a homogeneous cone in $E$, $\hat{x}$ and $\hat{s}$ denote vectors in $E$, and $L$ denotes a linear subspace of $E$. We consider the following homogeneous cone programming problem

$$\inf_x\ \langle\hat{s}, x\rangle \quad\text{subject to}\quad x \in L + \{\hat{x}\},\ \ x \in \operatorname{cl}(K) \tag{HCP}$$

and its Lagrangian dual problem

$$\inf_s\ \langle\hat{x}, s\rangle \quad\text{subject to}\quad s \in L^\perp + \{\hat{s}\},\ \ s \in \operatorname{cl}(K^\sharp) \tag{HCD}$$


where $L^\perp$ denotes the orthogonal complement of $L$ in $E$ and $K^\sharp$ denotes the (open) dual cone $\{s \in E : \langle x, s\rangle > 0\ \forall x \in \operatorname{cl}(K) \setminus \{0\}\}$. Since $K$ is pointed, the dual cone $K^\sharp$ is nonempty.

2.1. T-algebras. We begin with some basics of T-algebras; see [17] and [2, Chapter 2] for more details.

Definition 2 (Matrix algebra). A matrix algebra $A$ is a bi-graded algebra $\bigoplus_{i,j=1}^{r} A_{ij}$ over the reals satisfying
$$A_{ij}A_{kl} \subseteq \begin{cases} A_{il} & \text{if } j = k,\\ \{0\} & \text{if } j \neq k.\end{cases}$$
The positive integer $r$ is called the rank of the matrix algebra. For each $a \in A$, we denote by $a_{ij}$ its component in $A_{ij}$.

Definition 3 (Involution). An involution $(\cdot)^*$ of a matrix algebra $A$ of rank $r$ is a linear automorphism on $A$ that is (1) involutory (i.e., $(a^*)^* = a$ for all $a \in A$), (2) anti-homomorphic (i.e., $(ab)^* = b^*a^*$ for all $a, b \in A$), and further satisfies $A_{ij}^* = A_{ji}$ ($1 \le i, j \le r$).

Definition 4 (T-algebra). A T-algebra of rank $r$ is a matrix algebra $A$ of rank $r$ with involution $(\cdot)^*$ satisfying the following seven axioms:
I. For each $i \in \{1, \dots, r\}$, the subalgebra $A_{ii}$ is isomorphic to the reals.
II. For each $a \in A$ and each $i, j \in \{1, \dots, r\}$, $a_{ji}e_i = a_{ji}$ and $e_i a_{ij} = a_{ij}$, where $e_i$ denotes the unit of $A_{ii}$.
III. For each $a, b \in A$ and each $i, j \in \{1, \dots, r\}$, $a_{ij}b_{ji} = b_{ji}a_{ij}$.
IV. For each $a, b, c \in A$ and each $i, j, k \in \{1, \dots, r\}$, $a_{ij}(b_{jk}c_{ki}) = (a_{ij}b_{jk})c_{ki}$.
V. For each $a \in A$ and each $i, j \in \{1, \dots, r\}$, $a_{ij}^*a_{ij} \ge 0$ with equality when and only when $a_{ij} = 0$.
VI. For each $a, b, c \in A$ and each $i, j, k, l \in \{1, \dots, r\}$ with $i \le j \le k \le l$, $a_{ij}(b_{jk}c_{kl}) = (a_{ij}b_{jk})c_{kl}$.
VII. For each $a, b \in A$ and each $i, j, k, l \in \{1, \dots, r\}$ with $i \le j \le k$ and $l \le k$, $a_{ij}(b_{jk}b_{lk}^*) = (a_{ij}b_{jk})b_{lk}^*$.

Definition 5 (Inessential change of grading). A change in the grading of a T-algebra $A$ is the replacement of each subspace $A_{ij}$ by $A_{\pi(i),\pi(j)}$, where $\pi$ is a permutation of $\{1, \dots, r\}$. An inessential change in the grading of a T-algebra $A$ is one that preserves the subspace of upper triangular (or, equivalently, lower triangular) elements. In other words, the permutation $\pi$ in an inessential change satisfies $(i < j) \wedge (\pi(i) > \pi(j)) \implies A_{ij} = \{0\}$.
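To make these definitions concrete, the following sketch works through what is, under the standard identification used for semidefinite programming (an assumption of this sketch rather than something stated above), the prototypical rank-$n$ example: the algebra of all real $n \times n$ matrices, bi-graded by the one-dimensional subspaces $A_{ij} = \operatorname{span}\{E_{ij}\}$, with matrix multiplication as product and transposition as involution. Its associated cone, in the sense of Definition 7 below, consists of the products $tt^T$ with $t$ upper triangular and positive diagonal, i.e., the symmetric positive definite matrices.

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)

def matrix_unit(i, j):
    """E_ij spans the one-dimensional grading subspace A_ij."""
    m = np.zeros((n, n))
    m[i, j] = 1.0
    return m

# Grading rule of Definition 2: A_ij A_kl lies in A_il if j == k, and is {0} otherwise.
assert np.allclose(matrix_unit(0, 1) @ matrix_unit(1, 3), matrix_unit(0, 3))
assert np.allclose(matrix_unit(0, 1) @ matrix_unit(2, 3), 0.0)

# Associated cone: x = t t^T with t upper triangular and positive diagonal is SPD.
t = np.triu(rng.standard_normal((n, n)))
np.fill_diagonal(t, np.abs(np.diag(t)) + 1.0)
x = t @ t.T
assert np.all(np.linalg.eigvalsh(x) > 0)

# Conversely, an SPD matrix has such a factor: reverse-permute the Cholesky factor.
J = np.eye(n)[::-1]                            # reversal permutation matrix
t_rec = J @ np.linalg.cholesky(J @ x @ J) @ J  # upper triangular, positive diagonal
assert np.allclose(t_rec @ t_rec.T, x)
assert np.allclose(t_rec, np.triu(t_rec)) and np.all(np.diag(t_rec) > 0)
```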


Definition 6 (Isomorphic T-algebras). A T-algebra $A$ of rank $r$ is said to be isomorphic to another T-algebra of the same rank if, after an inessential change of grading of $A$, there is an isomorphism of the bi-graded algebras with involution.

Definition 7 (Associated cone). The cone associated with a T-algebra $A$ of rank $r$ is
$$\{tt^* : t \in A,\ t_{ij} = 0\ \forall\, 1 \le j < i \le r,\ t_{ii} > 0\ \forall\, 1 \le i \le r\}.$$
It is denoted by $K(A)$.¹

Example 1. Every $(n+2)$-dimensional T-algebra of rank 2 is isomorphic to the T-algebra $A = A_{11} \oplus A_{12} \oplus A_{21} \oplus A_{22} = \mathbb{R} \oplus \mathbb{R}^n \oplus \mathbb{R}^n \oplus \mathbb{R}$ defined by
$$(ab)_{ii} = a_{ij}^T b_{ji} \qquad (i, j \in \{1, 2\},\ i \neq j).$$
By Corollary IV.1.5 of [7], the subset of elements of $A$ satisfying $a^* = a$, when equipped with the Jordan product
$$(a, b) \mapsto \frac{ab + ba}{2},$$
is a Euclidean Jordan algebra of rank 2.

Example 2. Let $H$ be a Euclidean Hurwitz algebra (i.e., the algebra of real numbers, complex numbers, quaternions or octonions). We shall use

$0$ for all $t \in T_+ \setminus \{0\}$. We thus proved the following proposition.

Proposition 4. The dual cone $K^\sharp$ of $K$ is the homogeneous cone $\{ll^* : l \in T^*_{++}\}$. The group of automorphisms $\{Q_t : t \in T^*_{++}\}$ acts transitively on $K^\sharp$.

²It is in fact a simply transitive subgroup; i.e., for every $t, u \in T_{++}$, there exists a unique $Q_w$ with $w \in T_{++}$ that maps $tt^*$ to $uu^*$.
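The quadratic representation $Q_t$ used in Proposition 4 is defined in a part of Section 2 not reproduced above; in the positive definite matrix case it collapses, by the formula of Proposition 6 below, to the congruence $a \mapsto t\,a\,t^T$. Under that identification (an assumption of this sketch), the simple transitivity noted in the footnote is the observation that a triangular congruence carries any positive definite matrix to any other, with a unique triangular factor doing so:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

def random_spd(n):
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

X, Y = random_spd(n), random_spd(n)

# Lower-triangular Cholesky factors: X = Lx Lx^T and Y = Ly Ly^T.
Lx, Ly = np.linalg.cholesky(X), np.linalg.cholesky(Y)

# The unique lower-triangular w with positive diagonal such that Q_w(X) = w X w^T = Y.
w = Ly @ np.linalg.inv(Lx)
assert np.allclose(w, np.tril(w)) and np.all(np.diag(w) > 0)
assert np.allclose(w @ X @ w.T, Y)
```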


We now turn our attention to identifying the T-algebra $B$ with $K(B) = K^\sharp$. It is easy to see that a matrix algebra $B$ with involution satisfying $K(B) = K^\sharp$ can be obtained from $A$ by the change in grading
$$B_{ij} = A_{r+1-i,\,r+1-j} \qquad (1 \le i, j \le r).$$
It was pointed out by Vinberg [17, Chapter III, §6] that the matrix algebra $B$ is also a T-algebra. Indeed Axioms I–V translate directly from $A$ to $B$, Axiom VI for $B$ is equivalent to that for $A$ after taking involution, and Axiom VII for $B$ is proved in the following proposition.³

Proposition 5. For each $t, u \in T$, $t^*(u^*u) = (t^*u^*)u$. Equivalently, taking involution and/or polarization gives
(1) $(uu^*)t^* = u(u^*t^*)$ for all $t, u \in T^*$,
(2) $t^*(u^*w) + t^*(w^*u) = (t^*u^*)w + (t^*w^*)u$ for all $t, u, w \in T$, and
(3) $(uw^*)t^* + (wu^*)t^* = u(w^*t^*) + w(u^*t^*)$ for all $t, u, w \in T^*$.

Proof. Let $t, u \in T$ be arbitrary. Let $a$ denote the difference $t^*(u^*u) - (t^*u^*)u$. We shall prove the proposition by showing that $\|a\| = 0$, whence $a = 0$ by Axiom V. We evaluate $\|a\|^2$ as the sum of $\langle w, a\rangle$ and $\langle w^*, a\rangle$, where $w$ denotes the upper triangular element $\langle\langle a\rangle\rangle^*$. Now
$$\begin{aligned}
\langle w, t^*(u^*u)\rangle &= \langle u(tw), u\rangle && \text{(Axiom IV)}\\
&= \langle (ut)w, u\rangle && \text{(Axiom VI)}\\
&= \langle w, (t^*u^*)u\rangle && \text{(Axiom IV)}
\end{aligned}$$
implies $\langle w, a\rangle = 0$. Also
$$\begin{aligned}
\langle u, u(wt^*)\rangle &= \langle w^*(u^*u), t^*\rangle && \text{(Axiom IV)}\\
&= \langle w^*, t^*(u^*u)\rangle && \text{(Remark 5)}\\
&= \langle u(tw^*), u\rangle && \text{(Axiom IV)}
\end{aligned}$$
and
$$\begin{aligned}
\langle u, (uw)t^*\rangle &= \langle (w^*u^*)u, t^*\rangle && \text{(Axiom IV)}\\
&= \langle w^*, (t^*u^*)u\rangle && \text{(Remark 5)}\\
&= \langle (ut)w^*, u\rangle, && \text{(Axiom IV)}
\end{aligned}$$
whence
$$\begin{aligned}
2\langle w^*, t^*(u^*u)\rangle &= \langle u(wt^*) + u(tw^*), u\rangle\\
&= \langle (uw)t^* + (ut)w^*, u\rangle && \text{(Remark 7)}\\
&= 2\langle w^*, (t^*u^*)u\rangle
\end{aligned}$$
shows that $\langle w^*, a\rangle = 0$. □

The following propositions are the counterparts of Propositions 2 and 3 for $B$, stated in terms of the quadratic representations in $A$ instead of those in $B$.

³It was noted by Vinberg that, by observing that the subspace of upper triangular elements of $B$ is precisely $T^*$, both Axioms VI and VII for $B$ follow from those for $A$ by taking involution. While this is certainly true for Axiom VI, it does not hold for Axiom VII.


Proposition 6. For each $l \in T^*$, the map $Q_l$ simplifies to
$$a \mapsto l(\langle\langle a\rangle\rangle^* l^*) + (l\langle\langle a\rangle\rangle)l^*.$$
If, in addition, $a = kk^*$ for some $k \in T^*$, then $Q_l(kk^*) = (lk)(lk)^*$.

Proposition 7. For each $k \in T^*_{++}$, the map $l \in T^*_{++} \mapsto Q_l(kk^*)$ is a diffeomorphism. Moreover its derivative at $l \in T^*_{++}$ is
$$m \in T^* \mapsto (lk)(mk)^* + (mk)(lk)^*$$
and the derivative of its inverse map at $Q_l(kk^*) \in K^\sharp$ is
$$a \in H \mapsto \bigl(lk\,\langle\langle Q_{(lk)^{-1}}(a)\rangle\rangle\bigr)k^{-1}.$$

3. Primal-Dual Central Paths

In this section, we define a primal-dual central path for the pair of primal-dual homogeneous cone programming problems (HCP) and (HCD) based on a pair of optimal logarithmically homogeneous self-concordant barriers.

3.1. Optimal barriers. We begin by recalling the definition of $\vartheta$-logarithmically homogeneous self-concordant barriers.

Definition 9 (Self-concordant barrier). A function $f : C \to \mathbb{R}$ on an open convex cone $C$ is said to be a (non-degenerate, strongly) $\vartheta$-self-concordant barrier (for $C$) if it is a strictly convex, three-times continuously differentiable function that diverges to infinity as its argument approaches a point on the boundary $\partial C$ of $C$, and satisfies
$$f'''(x; h) \le 2(f''(x; h))^{3/2} \quad\text{and}\quad f'(x; h) \le (\vartheta f''(x; h))^{1/2}$$
for all $x \in C$ and $h \in E$. Here, $f^{(k)}(x; h)$ denotes the $k$-th directional derivative of $f$ along $h$ (i.e., $\frac{d^k}{dt^k} f(x + th)\big|_{t=0}$). The parameter $\vartheta$ is called the complexity parameter.

Definition 10 (Logarithmically homogeneous function). A function $f : C \to \mathbb{R}$ on an open convex cone $C$ is said to be $\vartheta$-logarithmically homogeneous if for every $t \in \mathbb{R}_{++}$ and every $x \in C$, it holds that $f(tx) = f(x) - \vartheta\log t$.

When a self-concordant barrier on $C$ is $\vartheta$-logarithmically homogeneous, it has complexity parameter $\vartheta$. We thus call such a function a $\vartheta$-logarithmically homogeneous self-concordant barrier on $C$. Every open convex cone $C$ admits logarithmically homogeneous self-concordant barriers; see [12, Section 2.5]. Among all logarithmically homogeneous self-concordant barriers for $C$, a barrier with the least complexity parameter is called an optimal barrier. Optimal barriers for a homogeneous cone $K$ of rank $r$ have complexity parameter $r$. Moreover the function
$$f : x \in K \mapsto -\sum_{i=1}^{r} \log(\rho_i(u_x)^2) \tag{3.1}$$


is an $r$-logarithmically homogeneous self-concordant barrier for $K$, where $u_x$ denotes the unique element in $T_{++}$ satisfying $u_x u_x^* = x$; see [9]. To the best of the author's knowledge, this is the only known optimal barrier for the homogeneous cone. This barrier is a member of the family of logarithmically homogeneous self-concordant barriers
$$\Bigl\{\, x \in K \mapsto -\sum_{i=1}^{r} w_i \log(\rho_i(u_x)^2) \ :\ w_i \ge 1 \,\Bigr\};$$
see [2, 5] for more details.

By writing the barrier $f$ as the composition of $x \in K \mapsto u_x$ and
$$t \in T_{++} \mapsto -\sum_{i=1}^{r} \log(\rho_i(t)^2),$$
it is straightforward to compute the gradient of $f$ using Proposition 3. This gives the gradient at each $x \in K$ as
$$\nabla f(x) = -u_x^{-*}u_x^{-1}.$$
A simple modification of the Fenchel conjugate function of $f$ gives the function
$$f^\sharp : s \in K^\sharp \mapsto -\sum_{i=1}^{r} \log(\rho_i(l_s)^2)$$
on the dual cone, where $l_s$ denotes the unique element in $T^*_{++}$ satisfying $l_s l_s^* = s$. This is in fact the optimal barrier given by (3.1) if we replace $K$ with the homogeneous cone $K^\sharp$. The gradient of this barrier at each $s \in K^\sharp$ is
$$\nabla f^\sharp(s) = -l_s^{-*}l_s^{-1}.$$
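For the special case where $K$ is the cone of symmetric positive definite matrices, $u_x$ is a triangular Cholesky-type factor of $x$ and the $\rho_i(u_x)$ may be taken to be its diagonal entries (this identification is an assumption of the sketch; $\rho_i$ is defined in the part of Section 2 omitted above). Then $f(x) = -\sum_i \log\rho_i(u_x)^2 = -\log\det x$ and the gradient formula above becomes $\nabla f(x) = -x^{-1}$. A short numerical sanity check of this specialization:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
a = rng.standard_normal((n, n))
x = a @ a.T + n * np.eye(n)          # a strictly feasible point x in K

u = np.linalg.cholesky(x)            # triangular factor with positive diagonal, x = u u^T
rho = np.diag(u)

f = -np.sum(np.log(rho ** 2))        # optimal barrier value, equals -log det x
assert np.isclose(f, -np.linalg.slogdet(x)[1])

grad = -np.linalg.inv(x)             # gradient -u^{-T} u^{-1} = -x^{-1}

# Finite-difference check of the gradient along a random symmetric direction h.
h = rng.standard_normal((n, n)); h = (h + h.T) / 2
eps = 1e-6
f_plus = -np.linalg.slogdet(x + eps * h)[1]
f_minus = -np.linalg.slogdet(x - eps * h)[1]
assert np.isclose((f_plus - f_minus) / (2 * eps), np.trace(grad @ h), rtol=1e-5)
```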

3.2. The central path. The central path of the primal-dual pair of homogeneous cone programming problems (HCP) and (HCD) is the set $\{(x(\mu), s(\mu)) : \mu > 0\}$ of pairs of solutions to the primal and dual barrier problems
$$\inf_x \Bigl\{\, \langle\hat{s}, x\rangle - \mu\sum_{i=1}^{r} \log(\rho_i(u_x)^2) \ :\ x \in L + \{\hat{x}\} \,\Bigr\}$$
and
$$\inf_s \Bigl\{\, \langle\hat{x}, s\rangle - \mu\sum_{i=1}^{r} \log(\rho_i(l_s)^2) \ :\ s \in L^\perp + \{\hat{s}\} \,\Bigr\},$$
respectively. We deduce from the optimality conditions for the barrier problems that the pair $(x(\mu), s(\mu))$ solves the central path equations
$$x \in L + \{\hat{x}\}, \quad x \in K, \qquad s \in L^\perp + \{\hat{s}\}, \quad s \in K^\sharp, \qquad Q_{l_s^*}(x) = \mu e. \tag{CP$_\mu$}$$


3.3. Tracing the central path. We now consider the problem of estimating the pair $(x(\mu_{++}), s(\mu_{++}))$ that solves the central path equations
$$x(\mu_{++}) \in L + \{\hat{x}\}, \quad x(\mu_{++}) \in K, \qquad s(\mu_{++}) \in L^\perp + \{\hat{s}\}, \quad s(\mu_{++}) \in K^\sharp, \qquad Q_{l_{s(\mu_{++})}^*}(x(\mu_{++})) = \mu_{++} e, \tag{CP$_{\mu_{++}}$}$$

where $\mu_{++} \in \mathbb{R}_+$ is given. Note that we allow $\mu_{++}$ to be zero, in which case we are solving for optimal solutions. Given a primal-dual strictly feasible pair, we measure its proximity to the pair $(x(\mu), s(\mu))$, where $\mu \in \mathbb{R}_{++}$, with the proximity measure $d : K \oplus K^\sharp \oplus \mathbb{R}_{++} \to \mathbb{R}_+$ defined by
$$(x, s; \mu) \mapsto \frac{1}{\mu}\bigl\|Q_{l_s^*}(x) - \mu e\bigr\|.$$
Suppose we have a pair of estimates $(x_+, s_+)$ satisfying $d(x_+, s_+; \mu_+) \le \beta$ for some $\mu_+ := \frac{1}{r}\langle x_+, s_+\rangle > \mu_{++}$ and some $\beta < 1$. We shall estimate $(x(\mu_{++}), s(\mu_{++}))$ with Newton's Method. This calls for the linearization of (CP$_\mu$).

Proposition 8. Suppose $x \in K$, $s \in K^\sharp$ and $a, b \in H$. Then the directional derivative of $(tt^*, ll^*) \in K \oplus K^\sharp \mapsto Q_{l^*}(tt^*) = (l^*t)(l^*t)^*$ at $(x, s)$ along $(a, b)$ is
$$Q_{l_s^*}(a) + \bigl(Q_{l_s^*}(x)\,\langle\langle Q_{l_s^{-1}}(b)\rangle\rangle\bigr)_H.$$

Proof. Write the map $(tt^*, ll^*) \in K \oplus K^\sharp \mapsto Q_{l^*}(tt^*) = (l^*t)(l^*t)^*$ as the composition of $(tt^*, ll^*) \in K \oplus K^\sharp \mapsto (tt^*, l^*)$ and $(tt^*, l^*) \in K \oplus T_{++} \mapsto Q_{l^*}(tt^*)$. By Proposition 7 the derivative of the first map at $(tt^*, ll^*)$ is $(c, d) \in H \oplus H \mapsto (c, \langle\langle Q_{l^{-1}}(d)\rangle\rangle^* l^*)$ and by Proposition 3, the derivative of the second map at $(tt^*, l^*)$ is $(c, k^*) \in H \oplus T \mapsto Q_{l^*}(c) + ((l^*t)(k^*t)^*)_H$. Hence the Chain Rule gives the required directional derivative as
$$Q_{l_s^*}(a) + \bigl((l_s^* u_x)(\langle\langle Q_{l_s^{-1}}(b)\rangle\rangle^* l_s^* u_x)^*\bigr)_H = Q_{l_s^*}(a) + \bigl((l_s^* u_x)((l_s^* u_x)^*\langle\langle Q_{l_s^{-1}}(b)\rangle\rangle)\bigr)_H.$$
By Axiom VII,
$$(l_s^* u_x)\bigl((l_s^* u_x)^*\langle\langle Q_{l_s^{-1}}(b)\rangle\rangle\bigr) = \bigl((l_s^* u_x)(l_s^* u_x)^*\bigr)\langle\langle Q_{l_s^{-1}}(b)\rangle\rangle,$$
hence the proposition follows from Proposition 2. □

The above proposition gives the linearization of (CP$_\mu$) at $(x_+, s_+)$ as
$$\Delta_x \in L, \qquad \Delta_s \in L^\perp, \qquad Q_{l_{s_+}^*}(\Delta_x) + \bigl(Q_{l_{s_+}^*}(x_+)\,\langle\langle Q_{l_{s_+}^{-1}}(\Delta_s)\rangle\rangle\bigr)_H = \mu_{++} e - Q_{l_{s_+}^*}(x_+). \tag{3.2}$$
One way to ensure that $(x_+ + \Delta_x, s_+ + \Delta_s)$ is a good estimate of $(x(\mu_{++}), s(\mu_{++}))$ is to get a good upper bound on the sizes of $\Delta_x$ and $\Delta_s$.
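In the semidefinite special case, $l_s$ is the lower-triangular Cholesky factor of $s$ and $Q_{l_s^*}(x) = l_s^T x\, l_s$ (these identifications are assumptions of the sketch), so the proximity measure above reads $d(x, s; \mu) = \frac{1}{\mu}\|l_s^T x\, l_s - \mu I\|_F$. It vanishes exactly when $x = \mu s^{-1}$, i.e., on the central path:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5

def random_spd(n):
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

def proximity(x, s, mu):
    """d(x, s; mu) = (1/mu) * ||l_s^T x l_s - mu*I||_F in the SDP case."""
    ls = np.linalg.cholesky(s)
    return np.linalg.norm(ls.T @ x @ ls - mu * np.eye(n)) / mu

s = random_spd(n)
mu = 2.0

x_central = mu * np.linalg.inv(s)           # Q_{l_s^*}(x) = mu*I  <=>  x = mu * s^{-1}
print(proximity(x_central, s, mu))          # ~0: the pair lies on the central path

x_perturbed = x_central + 0.05 * np.eye(n)  # a nearby strictly feasible point
print(proximity(x_perturbed, s, mu))        # small but positive
```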


The analysis can be greatly simplified by applying the primal-dual transformations
$$(x, s) \mapsto \bigl(Q_{l_{s_+}^*}(x),\ Q_{l_{s_+}^{-1}}(s)\bigr).$$
Let $v$, $(\Delta_{\tilde x}, \Delta_{\tilde s})$ and $\tilde{L}$ denote, respectively, the images of $x_+$, $(\Delta_x, \Delta_s)$ and $L$ under these transformations. Then $(\Delta_{\tilde x}, \Delta_{\tilde s})$ solves
$$\Delta_{\tilde x} \in \tilde{L}, \qquad \Delta_{\tilde s} \in \tilde{L}^\perp, \qquad \Delta_{\tilde x} + (v\langle\langle\Delta_{\tilde s}\rangle\rangle)_H = \mu_{++} e - v. \tag{3.3}$$
Note that the above linear system has a unique pair of solutions if and only if the same holds for (3.2). Since the linear system is square, the following proposition shows that this holds when $\beta < 1/\sqrt{2}$.

Proposition 9. Suppose $v \in K$ satisfies
$$\|v - \mu e\| \le \beta\mu \tag{3.4}$$
for some $\beta \in (0, 1/\sqrt{2})$ and some $\mu > 0$. If $(\Delta_x, \Delta_s)$ satisfies $\langle\Delta_x, \Delta_s\rangle \ge 0$ and
$$\Delta_x + (v\langle\langle\Delta_s\rangle\rangle)_H = h \tag{3.5}$$
for some $h \in H$, then
$$\max\{\|\Delta_x\|, \mu\|\Delta_s\|\} \le \frac{\|h\|}{1 - \sqrt{2}\beta}.$$

Proof. If $\langle\Delta_x, \Delta_s\rangle \ge 0$, then
$$\max\{\|\Delta_x\|, \mu\|\Delta_s\|\} \le \sqrt{\|\Delta_x\|^2 + \mu^2\|\Delta_s\|^2} \le \|\Delta_x + \mu\Delta_s\|.$$
By the triangle inequality, hypotheses (3.4) and (3.5), and the sub-multiplicativity of the norm $\|\cdot\|$, we have
$$\begin{aligned}
\|\Delta_x + \mu\Delta_s\| &\le \|\Delta_x + (v\langle\langle\Delta_s\rangle\rangle)_H\| + \|((v - \mu e)\langle\langle\Delta_s\rangle\rangle)_H\|\\
&\le \|h\| + 2\|(v - \mu e)\langle\langle\Delta_s\rangle\rangle\|\\
&\le \|h\| + 2\beta\mu\|\langle\langle\Delta_s\rangle\rangle\|\\
&\le \|h\| + \sqrt{2}\beta\mu\|\Delta_s\|\\
&\le \|h\| + \sqrt{2}\beta\max\{\|\Delta_x\|, \mu\|\Delta_s\|\}.
\end{aligned}$$
Consequently, under the hypotheses of the proposition,
$$(1 - \sqrt{2}\beta)\max\{\|\Delta_x\|, \mu\|\Delta_s\|\} \le \|h\|$$
as required. □

Let $x_\alpha$ and $s_\alpha$ denote, respectively, $x_+ + \alpha\Delta_x$ and $s_+ + \alpha\Delta_s$. Similarly, let $\tilde{x}_\alpha$ and $\tilde{s}_\alpha$ denote, respectively, $v + \alpha\Delta_{\tilde x} = Q_{l_{s_+}^*}(x_\alpha)$ and $e + \alpha\Delta_{\tilde s} = Q_{l_{s_+}^{-1}}(s_\alpha)$. Let $\mu_\alpha$ denote the scaled duality gap $\frac{1}{r}\langle x_\alpha, s_\alpha\rangle = \frac{1}{r}\langle\tilde{x}_\alpha, \tilde{s}_\alpha\rangle$. It follows straightforwardly from the linearization (3.2) that
$$\mu_\alpha = (1 - \alpha + \alpha\sigma)\mu_+,$$
where $\sigma = \mu_{++}/\mu_+ < 1$. We shall give an upper bound on $d(x_\alpha, s_\alpha; \mu_\alpha)$ using the following lemma, which generalizes the local Lipschitz constant of Cholesky factorization of real symmetric matrices near the identity matrix.
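Before stating the lemma, here is a quick numerical check of the matrix-case bound it generalizes: for a symmetric $H$ with $\|H\|_F \le 1/2$, the lower-triangular Cholesky factor $L$ of $I + H$ satisfies $\|L - I\|_F \le \sqrt{2}\,\|H\|_F$. The sampling below is only an illustration of that inequality, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6

for _ in range(1000):
    h = rng.standard_normal((n, n))
    h = (h + h.T) / 2
    h *= 0.5 / (np.linalg.norm(h) * rng.uniform(1.0, 4.0))  # enforce ||h||_F <= 1/2
    lower = np.linalg.cholesky(np.eye(n) + h)                # I + h is SPD since ||h||_2 < 1
    assert np.linalg.norm(lower - np.eye(n)) <= np.sqrt(2) * np.linalg.norm(h) + 1e-12
```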


Lemma 1 (c.f. Lemma 11 of [6]). If $h \in H$ satisfies $\|h\| \le 1/2$, then
$$\|l_{e+h} - e\| \le \sqrt{2}\|h\|.$$

Proof. Let $\Delta_l(t)$ denote the lower triangular element $l_{e+th} - e$. Note that for all $t \in \mathbb{R}$,
$$\Delta_l(t)_H + \Delta_l(t)\Delta_l(t)^* = th.$$
For $t \in [0, 1]$, we have
$$t\|h\| = \|\Delta_l(t)_H + \Delta_l(t)\Delta_l(t)^*\| \ge \|\Delta_l(t)_H\| - \|\Delta_l(t)\Delta_l(t)^*\| \ge \sqrt{2}\|\Delta_l(t)\| - \|\Delta_l(t)\|^2. \tag{3.6}$$
Solving this quadratic in $\|\Delta_l(t)\|$ gives
$$\|\Delta_l(t)\| \le \frac{1}{\sqrt{2}} - \sqrt{\frac{1}{2} - t\|h\|} \qquad\text{or}\qquad \|\Delta_l(t)\| \ge \frac{1}{\sqrt{2}} + \sqrt{\frac{1}{2} - t\|h\|}.$$
Since $\Delta_l(t)$, whence $\|\Delta_l(t)\|$, is continuous in $t$, it follows that
$$\|\Delta_l(t)\| \le \frac{1}{\sqrt{2}} - \sqrt{\frac{1}{2} - t\|h\|}$$
whenever $t\|h\| \le 1/2$. Under the hypothesis $\|h\| \le 1/2$, this indeed holds for $t = 1$, thus
$$\|l_{e+h} - e\| \le \frac{1}{\sqrt{2}} - \sqrt{\frac{1}{2} - \|h\|} \le \frac{1}{\sqrt{2}}.$$
Finally, applying this upper bound in (3.6) with $t = 1$ gives
$$\|h\| \ge \sqrt{2}\|l_{e+h} - e\| - \frac{1}{\sqrt{2}}\|l_{e+h} - e\| = \frac{1}{\sqrt{2}}\|l_{e+h} - e\|$$
as required. □

Lemma 2. Suppose $\beta < 1/\sqrt{2}$ and $\chi = \beta + (1 - \sigma)\sqrt{r}$. Then $s_\alpha \in K^\sharp$ and
$$\mu_+^{-1}\bigl\|Q_{l_{s_\alpha}^*}(x_\alpha) - \mu_\alpha e\bigr\| \le (1 - \alpha)\beta + \alpha^2\frac{\chi^2(7 + 6\beta)}{(1 - \sqrt{2}\beta)^2} + \alpha^3\frac{3\chi^3}{(1 - \sqrt{2}\beta)^3} \tag{3.7}$$
whenever $0 \le \alpha \le \min\{1, (1 - \sqrt{2}\beta)/(2\chi)\}$. Moreover, if $\beta \le \bar\beta < 1$ with $\sigma$ and $\bar\beta - \beta$ not both zero, then there exists $\bar\alpha > 0$ such that
(i) the cubic polynomial
$$p : \alpha \mapsto (1 - \alpha)\beta + \alpha^2\frac{\chi^2(7 + 6\beta)}{(1 - \sqrt{2}\beta)^2} + \alpha^3\frac{3\chi^3}{(1 - \sqrt{2}\beta)^3} - (1 - \alpha + \sigma\alpha)\bar\beta \tag{3.8}$$
satisfies $p(\alpha) \le 0$ for all $\alpha \in [0, \bar\alpha]$; and
(ii) for all $0 \le \alpha \le \min\{1, \bar\alpha\}$, the pair $(x_\alpha, s_\alpha)$ lies in $K \oplus K^\sharp$ and
$$d(x_\alpha, s_\alpha; \mu_\alpha) \le \frac{1}{1 - \alpha + \alpha\sigma}\left((1 - \alpha)\beta + \alpha^2\frac{\chi^2(7 + 6\beta)}{(1 - \sqrt{2}\beta)^2} + \alpha^3\frac{3\chi^3}{(1 - \sqrt{2}\beta)^3}\right) \le \bar\beta. \tag{3.9}$$

Proof. According to Proposition 9, if $\beta < 1/\sqrt{2}$, then
$$\max\{\|\Delta_{\tilde x}\|, \mu_+\|\Delta_{\tilde s}\|\} \le \frac{\|\mu_{++} e - v\|}{1 - \sqrt{2}\beta} \le \mu_+\frac{\beta + (1 - \sigma)\sqrt{r}}{1 - \sqrt{2}\beta} \tag{3.10}$$


by triangle inequality. Thus for every $u \in T_+ \setminus \{0\}$,
$$\langle\tilde{s}_\alpha, uu^*\rangle = \langle e, uu^*\rangle + \alpha\langle\Delta_{\tilde s}, uu^*\rangle \ge \|u\|^2 - |\alpha|\|\Delta_{\tilde s}\|\|u\|^2 = (1 - |\alpha|\|\Delta_{\tilde s}\|)\|u\|^2 > 0$$
whenever $|\alpha| < (1 - \sqrt{2}\beta)/\chi$. Thus $\tilde{s}_\alpha \in K^\sharp$, whence $s_\alpha = Q_{l_{s_+}}(\tilde{s}_\alpha) \in K^\sharp$, for every $\alpha \in [0, (1 - \sqrt{2}\beta)/(2\chi)]$. Note that whenever $s_\alpha \in K^\sharp$, we have
$$\tilde{s}_\alpha = Q_{l_{s_+}^{-1}}(s_\alpha) = (l_{s_+}^{-1} l_{s_\alpha})(l_{s_+}^{-1} l_{s_\alpha})^*,$$
hence $l_{\tilde{s}_\alpha} = l_{s_+}^{-1} l_{s_\alpha}$. In this case we deduce, using Corollary 1, that
$$Q_{l_{\tilde{s}_\alpha}^*}(\tilde{x}_\alpha) = Q_{l_{s_\alpha}^* l_{s_+}^{-*}}\bigl(Q_{l_{s_+}^*}(x_\alpha)\bigr) = Q_{l_{s_\alpha}^*}(x_\alpha).$$
Thus we may equivalently prove (3.7) for the pair $(\tilde{x}_\alpha, \tilde{s}_\alpha)$ instead.

Let $l_\alpha$ denote the lower triangular element $l_{\tilde{s}_\alpha} - e = l_{e+\alpha\Delta_{\tilde s}} - e$, which is well-defined when $\alpha \in [0, (1 - \sqrt{2}\beta)/(2\chi)]$. By Proposition 2,
$$\begin{aligned}
Q_{l_{\tilde{s}_\alpha}^*}(\tilde{x}_\alpha) = Q_{e+l_\alpha^*}(\tilde{x}_\alpha) &= \bigl((e + l_\alpha^*)(\langle\langle\tilde{x}_\alpha\rangle\rangle(e + l_\alpha))\bigr)_H\\
&= \bigl(\langle\langle\tilde{x}_\alpha\rangle\rangle + \langle\langle\tilde{x}_\alpha\rangle\rangle l_\alpha + l_\alpha^*\langle\langle\tilde{x}_\alpha\rangle\rangle + l_\alpha^*(\langle\langle\tilde{x}_\alpha\rangle\rangle l_\alpha)\bigr)_H\\
&= \tilde{x}_\alpha + (\tilde{x}_\alpha l_\alpha)_H + Q_{l_\alpha^*}(\tilde{x}_\alpha)\\
&= v + \alpha\Delta_{\tilde x} + (v l_\alpha)_H + \alpha(\Delta_{\tilde x} l_\alpha)_H + Q_{l_\alpha^*}(v) + \alpha Q_{l_\alpha^*}(\Delta_{\tilde x}),
\end{aligned}$$
whence the difference $Q_{l_{\tilde{s}_\alpha}^*}(\tilde{x}_\alpha) - \mu_\alpha e$ is
$$v - (1 - \alpha)\mu_+ e - \alpha\sigma\mu_+ e + \alpha\Delta_{\tilde x} + (v l_\alpha)_H + \alpha(\Delta_{\tilde x} l_\alpha)_H + Q_{l_\alpha^*}(v) + \alpha Q_{l_\alpha^*}(\Delta_{\tilde x}).$$
Using (3.3), we re-express this as
$$\begin{aligned}
&(1 - \alpha)(v - \mu_+ e) + \bigl(v(l_\alpha - \alpha\langle\langle\Delta_{\tilde s}\rangle\rangle)\bigr)_H + \alpha(\Delta_{\tilde x} l_\alpha)_H + \mu_+ l_\alpha^* l_\alpha + Q_{l_\alpha^*}(v - \mu_+ e) + \alpha Q_{l_\alpha^*}(\Delta_{\tilde x})\\
&\quad= (1 - \alpha)(v - \mu_+ e) - \mu_+ l_\alpha l_\alpha^* - \bigl((v - \mu_+ e)\langle\langle l_\alpha l_\alpha^*\rangle\rangle\bigr)_H + \alpha(\Delta_{\tilde x} l_\alpha)_H + \mu_+ l_\alpha^* l_\alpha + Q_{l_\alpha^*}(v - \mu_+ e) + \alpha Q_{l_\alpha^*}(\Delta_{\tilde x}).
\end{aligned}$$
Using the sub-multiplicativity of $\|\cdot\|$, Lemma 1, and (3.10), we bound, for each $\alpha \in [0, (1 - \sqrt{2}\beta)/(2\chi)]$,
$$\|l_\alpha l_\alpha^*\|,\ \|l_\alpha^* l_\alpha\| \le \|l_\alpha\|^2 \le 2\alpha^2\|\Delta_{\tilde s}\|^2 \le 2\alpha^2\frac{\chi^2}{(1 - \sqrt{2}\beta)^2},$$
$$\|((v - \mu_+ e)\langle\langle l_\alpha l_\alpha^*\rangle\rangle)_H\| \le 2\|v - \mu_+ e\|\,\|\langle\langle l_\alpha l_\alpha^*\rangle\rangle\| \le \sqrt{2}\|v - \mu_+ e\|\,\|l_\alpha l_\alpha^*\| \le 2\sqrt{2}\alpha^2\mu_+\beta\frac{\chi^2}{(1 - \sqrt{2}\beta)^2} \le 3\alpha^2\mu_+\beta\frac{\chi^2}{(1 - \sqrt{2}\beta)^2},$$
$$\|\alpha(\Delta_{\tilde x} l_\alpha)_H\| \le 2\alpha\|\Delta_{\tilde x}\|\,\|l_\alpha\| \le 2\sqrt{2}\alpha^2\mu_+\frac{\chi^2}{(1 - \sqrt{2}\beta)^2} \le 3\alpha^2\mu_+\frac{\chi^2}{(1 - \sqrt{2}\beta)^2},$$


$$\|Q_{l_\alpha^*}(v - \mu_+ e)\| \le 2\|\langle\langle v - \mu_+ e\rangle\rangle\|\,\|l_\alpha\|^2 \le 2\sqrt{2}\alpha^2\mu_+\beta\frac{\chi^2}{(1 - \sqrt{2}\beta)^2} \le 3\alpha^2\mu_+\beta\frac{\chi^2}{(1 - \sqrt{2}\beta)^2},$$
and
$$\|\alpha Q_{l_\alpha^*}(\Delta_{\tilde x})\| \le 2\alpha\|\langle\langle\Delta_{\tilde x}\rangle\rangle\|\,\|l_\alpha\|^2 \le 2\sqrt{2}\alpha^3\mu_+\frac{\chi^3}{(1 - \sqrt{2}\beta)^3} \le 3\alpha^3\mu_+\frac{\chi^3}{(1 - \sqrt{2}\beta)^3}.$$
The required inequality (3.7) then follows from the triangle inequality.

(i) If $\bar\beta > \beta$, then $p(0) = \beta - \bar\beta < 0$ shows that $p$ has at least one positive real root. If not, then $\sigma > 0$, $p(0) = 0$ and $p'(0) = \bar\beta - \sigma\bar\beta - \beta = -\sigma\beta < 0$ also shows the same. Hence $p$ has a smallest positive real root $\bar\alpha$ and $p(\alpha) \le 0$ for all $\alpha \in [0, \bar\alpha]$.

(ii) If $(1 - \sqrt{2}\beta)/(2\chi) < 1$, then for $\alpha = (1 - \sqrt{2}\beta)/(2\chi) \in (0, 1)$, we have
$$p(\alpha) > 7\alpha^2\frac{\chi^2}{(1 - \sqrt{2}\beta)^2} - \bar\beta = \frac{7}{4} - \bar\beta > 0.$$
Hence $0 \le \alpha \le \min\{1, (1 - \sqrt{2}\beta)/(2\chi)\}$ whenever $0 \le \alpha \le \min\{1, \bar\alpha\}$. Subsequently $s_\alpha \in K^\sharp$ and (3.7) holds whenever $0 \le \alpha \le \min\{1, \bar\alpha\}$. This leads to
$$\bigl\|Q_{l_{s_\alpha}^*}(x_\alpha) - \mu_\alpha e\bigr\| \le \mu_+ p(\alpha) + (1 - \alpha + \alpha\sigma)\mu_+\bar\beta \le (1 - \alpha + \alpha\sigma)\mu_+\bar\beta = \mu_\alpha\bar\beta,$$
whence
$$\bigl\langle Q_{l_{s_\alpha}^*}(x_\alpha), ll^*\bigr\rangle = \mu_\alpha\langle e, ll^*\rangle + \bigl\langle Q_{l_{s_\alpha}^*}(x_\alpha) - \mu_\alpha e, ll^*\bigr\rangle \ge \mu_\alpha\|l\|^2 - \bigl\|Q_{l_{s_\alpha}^*}(x_\alpha) - \mu_\alpha e\bigr\|\,\|l\|^2 \ge \mu_\alpha(1 - \bar\beta)\|l\|^2 > 0$$
for all $l \in T^*_+ \setminus \{0\}$. We then conclude that $Q_{l_{s_\alpha}^*}(x_\alpha) \in K$, whence $x_\alpha \in K$. Finally (3.9) follows directly from (3.7). □

4. Primal-Dual Path-Following Algorithms

The three algorithms in this paper are based on the following generic primal-dual path-following framework.

Algorithm 1. (Primal-dual path-following framework) Given a pair of primal-dual strictly feasible solutions $(x_{in}, s_{in}) \in K \oplus K^\sharp$ and $\mu_{in} \in \mathbb{R}_{++}$ with $d(x_{in}, s_{in}; \mu_{in}) \le \beta$ for some $\beta \in (0, 1/\sqrt{2})$, and the required accuracy $\varepsilon > 0$.
(1) Set $(x_+, s_+) = (x_{in}, s_{in})$ and $\mu_+ = \mu_{in}$.
(2) While $\langle x_+, s_+\rangle > \varepsilon\langle x_{in}, s_{in}\rangle$,
  (a) Pick $\tilde\beta \in (0, 1/\sqrt{2})$ and $\sigma \in [0, 1]$.
  (b) Solve (3.2) with $\mu_{++}$ replaced by $\sigma\mu_+$. For each $\alpha \in [0, 1]$, let $(x_\alpha, s_\alpha) = (x_+ + \alpha\Delta_x, s_+ + \alpha\Delta_s)$, and let $\mu_\alpha = (1 - \alpha + \alpha\sigma)\mu_+$. Pick $\hat\alpha \in [0, 1]$ such that $(x_{\hat\alpha}, s_{\hat\alpha}) \in K \oplus K^\sharp$ and $d(x_{\hat\alpha}, s_{\hat\alpha}; \mu_{\hat\alpha}) \le \tilde\beta$.
  (c) Update $(x_+, s_+) \leftarrow (x_{\hat\alpha}, s_{\hat\alpha})$ and $\mu_+ \leftarrow \mu_{\hat\alpha}$.
(3) Output $(x_{out}, s_{out}) = (x_+, s_+)$.
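To make the framework concrete, the sketch below instantiates Algorithm 1 for the simplest homogeneous cone, the nonnegative orthant (linear programming), with the short-step choices $\sigma = 1 - \delta/\sqrt{r}$ and $\hat\alpha = 1$ of Section 4.1 below. In this case $e$ is the all-ones vector, $Q_{l_{s_+}^*}(x)$ is the componentwise product $s_+ \circ x$, and the Newton system (3.2) reduces to $\Delta_x \in \operatorname{null}(A)$, $\Delta_s \in \operatorname{range}(A^T)$, $s_+ \circ \Delta_x + x_+ \circ \Delta_s = \sigma\mu_+ e - x_+ \circ s_+$. The test problem, starting point and parameter values are fabricated for illustration and are not claimed to satisfy condition (4.1) of Theorem 3; this is a sketch of the framework, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(5)
m, r = 3, 8                        # r = rank of the cone = number of variables
A = rng.standard_normal((m, r))    # L = null(A), L^perp = range(A^T)

# Fabricated strictly feasible start on the central path:
# x_hat = s_hat = 1 gives x_in = s_in = 1 and mu_in = 1, so d(x_in, s_in; mu_in) = 0.
x_hat = np.ones(r)
s_hat = np.ones(r)                 # objective <s_hat, x>
x, s = x_hat.copy(), s_hat.copy()
mu = x @ s / r

beta, delta, eps = 0.1, 0.1, 1e-6
sigma = 1.0 - delta / np.sqrt(r)   # short-step target mu_++ = sigma * mu_+

def newton_step(x, s, mu_target):
    """Solve (3.2) on the orthant: A dx = 0, ds = -A^T dy, s*dx + x*ds = mu_target*e - x*s."""
    rhs = mu_target * np.ones(r) - x * s
    M = A @ np.diag(x / s) @ A.T
    dy = np.linalg.solve(M, -A @ (rhs / s))
    ds = -A.T @ dy                 # keeps s in L^perp + {s_hat}
    dx = (rhs - x * ds) / s        # keeps x in L + {x_hat}
    return dx, ds

def proximity(x, s, mu):
    return np.linalg.norm(x * s - mu * np.ones(r)) / mu

iters = 0
while x @ s > eps * (x_hat @ s_hat):
    dx, ds = newton_step(x, s, sigma * mu)
    x, s = x + dx, s + ds          # full Newton step, alpha_hat = 1
    mu = x @ s / r
    assert np.all(x > 0) and np.all(s > 0) and proximity(x, s, mu) <= beta
    iters += 1

print(f"duality gap {x @ s:.2e} after {iters} iterations")
```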


4.1. Short-step algorithm. In our short-step algorithm, conservative updates of the parameter $\mu$ are used and full Newton steps are taken. The target parameter $\mu_{++}$ is chosen so that the ratio $\sigma = \mu_{++}/\mu_+$ is $(1 - \delta/\sqrt{r})$, where $\delta \in (0, 1)$ is a constant. We shall show that with appropriate choices of $\beta$ and $\delta$, Algorithm 1 requires at most $O(\sqrt{r})$ iterations to reduce the duality gap by a constant factor.

Theorem 3. If $\beta \in (0, 1/\sqrt{2})$ and $\delta \in (0, 1)$ satisfy
$$\frac{(\beta + \delta)^2(7 + 6\beta)}{(1 - \sqrt{2}\beta)^2} + \frac{3(\beta + \delta)^3}{(1 - \sqrt{2}\beta)^3} < \left(1 - \frac{\delta}{\sqrt{r}}\right)\beta, \tag{4.1}$$
then we may use $\hat\alpha = 1$ in each iteration of Algorithm 1 with $\sigma = 1 - \delta/\sqrt{r}$ and $\tilde\beta = \beta$. Moreover the algorithm terminates after $O(\sqrt{r}\log\frac{1}{\varepsilon})$ iterations.

Proof. Consider an iteration of the algorithm. By choice of $\tilde\beta$, we have $d(x_+, s_+; \mu_+) \le \beta$ at the beginning of the iteration. Let $\bar\alpha$ be the smallest positive root of the cubic polynomial (3.8) with $\sigma = 1 - \delta/\sqrt{r}$, $\chi = \beta + (1 - \sigma)\sqrt{r} = \beta + \delta$, and $\bar\beta = \beta$. Under the hypothesis (4.1), we have
$$(1 - \alpha)\beta + \alpha^2\frac{\chi^2(7 + 6\beta)}{(1 - \sqrt{2}\beta)^2} + \alpha^3\frac{3\chi^3}{(1 - \sqrt{2}\beta)^3} - (1 - \alpha + \sigma\alpha)\beta \le \alpha\left(\frac{(\beta + \delta)^2(7 + 6\beta)}{(1 - \sqrt{2}\beta)^2} + \frac{3(\beta + \delta)^3}{(1 - \sqrt{2}\beta)^3} - \left(1 - \frac{\delta}{\sqrt{r}}\right)\beta\right) < 0$$
for all $0 < \alpha \le 1$, whence $\bar\alpha \ge 1$. Thus by Lemma 2, we have $(x_\alpha, s_\alpha) \in K \oplus K^\sharp$ and $d(x_\alpha, s_\alpha; \mu_\alpha) \le \beta$ for all $\alpha \in [0, 1]$. This means we may take $\hat\alpha = 1$ in the iteration. Finally, the duality gap reduces by the factor $(1 - \delta/\sqrt{r})$ in each iteration, whence by a factor of $\varepsilon$ in $O(\sqrt{r}\log\frac{1}{\varepsilon})$ iterations. □

4.2. Large-update algorithm. For the large-update algorithm, large updates $\mu_{++} = \sigma\mu_+$, where $\sigma \in (0, 1)$ is a constant independent of $r$, are taken. We shall show that as long as $\beta \in (0, 1/(2\sqrt{2}))$, Algorithm 1 requires at most $O(r)$ iterations to reduce the duality gap by a constant factor.

Theorem 4. If $\beta \in (0, 1/(2\sqrt{2}))$, then we may use $\tilde\alpha = \Omega(1/r)$ in Algorithm 1 with $\tilde\beta = \beta$ and $\sigma \in (0, 1)$ arbitrary but fixed. Moreover the algorithm terminates after $O(r\log\frac{1}{\varepsilon})$ iterations.

Proof. Consider an iteration of the algorithm. By choice of $\tilde\beta$, we have $d(x_+, s_+; \mu_+) \le \beta$ at the beginning of the iteration. Let $\bar\alpha$ be the smallest positive root of the cubic polynomial (3.8) with $\chi = \beta + (1 - \sigma)\sqrt{r}$ and $\bar\beta = \beta$; i.e., $\bar\alpha$ is the smallest positive root of
$$\alpha \mapsto -\sigma\alpha\beta + \alpha^2\frac{(\beta + (1 - \sigma)\sqrt{r})^2(7 + 6\beta)}{(1 - \sqrt{2}\beta)^2} + \alpha^3\frac{3(\beta + (1 - \sigma)\sqrt{r})^3}{(1 - \sqrt{2}\beta)^3} = \left(-\sigma\beta + \alpha\frac{(\beta + (1 - \sigma)\sqrt{r})^2(7 + 6\beta)}{(1 - \sqrt{2}\beta)^2} + \alpha^2\frac{3(\beta + (1 - \sigma)\sqrt{r})^3}{(1 - \sqrt{2}\beta)^3}\right)\alpha.$$
By Lemma 2, we have $(x_\alpha, s_\alpha) \in K \oplus K^\sharp$ and
$$d(x_\alpha, s_\alpha; \mu_\alpha) \le \beta = \tilde\beta$$


whenever $0 \le \alpha \le \min\{1, \bar\alpha\}$. Since $\bar\alpha = \Omega(1/r)$, we may use $\tilde\alpha = \min\{1, \bar\alpha\} = \Omega(1/r)$. Finally, the duality gap reduces by the constant factor $(1 - \Omega(1/r))$ every two iterations, whence by a factor of $\varepsilon$ in $O(r\log\frac{1}{\varepsilon})$ iterations. □

4.3. Predictor-corrector algorithm. In our predictor-corrector algorithm, the most aggressive updates $\mu_{++} = 0$ are taken, each of which is immediately followed by a zero update step $\mu_{++} = \mu_+$. This is a direct extension of the Mizuno-Todd-Ye predictor-corrector algorithm for linear programming [10]. The aggressive update steps are called predictor steps and the zero update steps are corrector steps. We shall show that with appropriate choices of $\beta$ and $\tilde\beta$, Algorithm 1 requires at most $O(\sqrt{r})$ iterations to reduce the duality gap by a constant factor.

Theorem 5. If $\beta \in (0, 1/(2\sqrt{2}))$ satisfies
$$\frac{4\beta^2(7 + 12\beta)}{(1 - 2\sqrt{2}\beta)^2} + \frac{24\beta^3}{(1 - 2\sqrt{2}\beta)^3} \le \beta, \tag{4.2}$$
then by alternating between $(\sigma, \tilde\beta) = (0, 2\beta)$ and $(\sigma, \tilde\beta) = (1, \beta)$ in Algorithm 1, we may use $\tilde\alpha = \Omega(1/\sqrt{r})$ and $\tilde\alpha = 1$, respectively. Moreover the algorithm terminates after $O(\sqrt{r}\log\frac{1}{\varepsilon})$ iterations.

Proof. Consider an iteration of the algorithm where a predictor step is to be taken. By choice of $\tilde\beta$ in the previous iteration, we have $d(x_+, s_+; \mu_+) \le \beta$ at the beginning of the iteration. Let $\bar\alpha$ be the smallest positive root of the cubic polynomial (3.8) with $\sigma = 0$, $\chi = \beta + (1 - \sigma)\sqrt{r} = \beta + \sqrt{r}$, and $\bar\beta = 2\beta \in (\beta, 1)$; i.e., $\bar\alpha$ is the smallest positive root of
$$\alpha \mapsto -\beta + \alpha\beta + \alpha^2\frac{(\beta + \sqrt{r})^2(7 + 6\beta)}{(1 - \sqrt{2}\beta)^2} + \alpha^3\frac{3(\beta + \sqrt{r})^3}{(1 - \sqrt{2}\beta)^3}.$$
By Lemma 2, we have $(x_\alpha, s_\alpha) \in K \oplus K^\sharp$ and
$$d(x_\alpha, s_\alpha; \mu_\alpha) \le 2\beta = \tilde\beta$$
whenever $0 \le \alpha \le \min\{1, \bar\alpha\}$. Since $\bar\alpha = \Omega(1/\sqrt{r})$, we may use $\tilde\alpha = \min\{1, \bar\alpha\} = \Omega(1/\sqrt{r})$.

Now consider an iteration of the algorithm where a corrector step is to be taken. By choice of $\tilde\beta$ in the previous iteration, we have $d(x_+, s_+; \mu_+) \le 2\beta$ at the beginning of the iteration. Under the hypothesis (4.2), for $\sigma = 1$, $\chi = (2\beta) + (1 - \sigma)\sqrt{r} = 2\beta$ and $\bar\beta = 2\beta$, we have
$$(1 - \alpha)(2\beta) + \alpha^2\frac{\chi^2(7 + 6(2\beta))}{(1 - \sqrt{2}(2\beta))^2} + \alpha^3\frac{3\chi^3}{(1 - \sqrt{2}(2\beta))^3} - (1 - \alpha + \sigma\alpha)\bar\beta \le \alpha\left(\frac{4\beta^2(7 + 12\beta)}{(1 - 2\sqrt{2}\beta)^2} + \frac{24\beta^3}{(1 - 2\sqrt{2}\beta)^3} - 2\beta\right) < 0$$
for all $0 < \alpha \le 1$, whence by Lemma 2, we have $(x_\alpha, s_\alpha) \in K \oplus K^\sharp$ and
$$d(x_\alpha, s_\alpha; \mu_\alpha) \le \frac{1}{1 - \alpha + \alpha\sigma}\left((1 - \alpha)(2\beta) + \alpha^2\frac{\chi^2(7 + 6(2\beta))}{(1 - \sqrt{2}(2\beta))^2} + \alpha^3\frac{3\chi^3}{(1 - \sqrt{2}(2\beta))^3}\right) \le 2(1 - \alpha)\beta + \alpha\left(\frac{4\beta^2(7 + 12\beta)}{(1 - 2\sqrt{2}\beta)^2} + \frac{24\beta^3}{(1 - 2\sqrt{2}\beta)^3}\right) \le 2\beta - \alpha\beta$$


for all $\alpha \in [0, 1]$, in particular $d(x_1, s_1; \mu_1) \le \beta$. This means we may take $\hat\alpha = 1$ in the iteration.

Finally, the duality gap reduces by the constant factor $(1 - \Omega(1/\sqrt{r}))$ every two iterations, whence by a factor of $\varepsilon$ in $O(\sqrt{r}\log\frac{1}{\varepsilon})$ iterations. □

5. Conclusion

In this paper, a generic primal-dual path-following framework for homogeneous cone programming is presented. With it, we can design a short-step algorithm, a large-update algorithm, and a predictor-corrector algorithm. Each algorithm is shown to have a polynomial iteration complexity bound that matches the existing bound for its counterpart in semidefinite programming. There are two rather unsatisfactory features of this generic primal-dual framework worth further investigation:
(1) There is a lack of primal-dual symmetry. In other words, when we swap the primal problem with the dual problem, together with the initial primal-dual pair of solutions, and apply the same algorithm using the same respective T-algebras for the primal and dual homogeneous cones, we may not get the same iterates.
(2) In order for the primal-dual search direction to be uniquely defined, it is necessary for the primal-dual framework to use the narrow $\ell_2$-neighbourhood. A first step towards designing a wide-neighbourhood primal-dual algorithm for homogeneous cone programming is to find a search direction that is defined at all pairs of primal-dual strictly feasible solutions.

References

[1] F. Alizadeh and S. H. Schmieta, Extension of primal-dual interior point algorithms to symmetric cones, Math. Program. 96 (2003), 409–438.
[2] C. B. Chua, An algebraic perspective on homogeneous cone programming, and the primal-dual second order cone approximations algorithm for symmetric cone programming, Ph.D. thesis, Cornell University, Ithaca, NY, USA, May 2003.
[3] C. B. Chua, The primal-dual second-order cone approximations algorithm for symmetric cone programming, manuscript, Cornell University, February 2003, accepted by Found. Comput. Math.
[4] C. B. Chua, Relating homogeneous cones and positive definite cones via T-algebras, SIAM J. Optim. 14 (2003), no. 2, 500–506.
[5] C. B. Chua, A new notion of weighted centers for semidefinite programming, SIAM J. Optim. 16 (2006), no. 4, 1092–1109.
[6] C. B. Chua, Target following algorithms for semidefinite programming, Research Report CORR 2006-10, Department of Combinatorics and Optimization, Faculty of Mathematics, University of Waterloo, Canada, May 2006, 40 pages.
[7] J. Faraut and A. Korányi, Analysis on symmetric cones, Oxford University Press, New York, NY, USA, 1994.
[8] L. Faybusovich, Linear systems in Jordan algebras and primal-dual interior-point algorithms, J. Comput. Appl. Math. 86 (1997), 149–175.
[9] O. Güler and L. Tunçel, Characterization of the barrier parameter of homogeneous convex cones, Math. Program. 81 (1998), 55–76.
[10] S. Mizuno, M. J. Todd, and Y. Ye, An $O(\sqrt{n}L)$-iteration homogeneous and self-dual linear programming algorithm, Math. Oper. Res. 19 (1994), 53–67.
[11] R. D. C. Monteiro and P. Zanjácomo, Implementation of primal-dual methods for semidefinite programming based on Monteiro and Tsuchiya Newton directions and their variants, Optim. Methods Softw. 11/12 (1999), 91–140.
[12] Yu. E. Nesterov and A. S. Nemirovski, Interior point polynomial algorithms in convex programming, SIAM Stud. Appl. Math., vol. 13, SIAM Publication, Philadelphia, PA, USA, 1994.


[13] Yu. E. Nesterov and M. J. Todd, Primal-dual interior-point methods for self-scaled cones, SIAM J. Optim. 8 (1998), 324–364.
[14] B. K. Rangarajan, Polynomial convergence of infeasible-interior-point methods over symmetric cones, SIAM J. Optim. 16 (2006), 1211–1229.
[15] L. Tunçel, Primal-dual symmetry and scale invariance of interior-point algorithms for convex optimization, Math. Oper. Res. 23 (1998), 708–718.
[16] L. Tunçel, Generalization of primal-dual interior-point methods to convex optimization problems in conic form, Found. Comput. Math. 1 (2001), 229–254.
[17] È. B. Vinberg, The theory of convex homogeneous cones, Tr. Mosk. Mat. O.-va 12 (1963), 303–358.
[18] H. Wolkowicz, R. Saigal, and L. Vandenberghe (eds.), Handbook of semidefinite programming: Theory, algorithms, and applications, Springer-Verlag, Berlin-Heidelberg-New York, 2000.
[19] S. J. Wright, Primal-dual interior-point methods, SIAM Publication, Philadelphia, PA, USA, 1997.

Division of Mathematical Sciences, Nanyang Technological University, 1 Nanyang Walk, Blk 5 Level 3, Singapore 637616, Singapore
E-mail address: [email protected]