Optimization Methods and Software
Vol. 00, No. 00, February 2008, 1-23
A Generic Primal-Dual Interior-Point Method for Semidefinite Optimization Based on a New Class of Kernel Functions
Mohamed El Ghami†, Cornelis Roos§ and Trond Steihaug‡

†‡ Department of Informatics, University of Bergen, P.O. Box 7803, N-5020 Bergen, Norway.
§ Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands.

(28 February 2007; revised 26 August 2008)
Abstract. In this paper we present a class of polynomial-time primal-dual interior-point methods (IPMs) for semidefinite optimization based on a new class of kernel functions. This class is fairly general and includes the class of finite kernel functions [1]: the corresponding barrier functions have a finite value at the boundary of the feasible region. They are not exponentially convex and also not strongly convex, unlike many usual barrier functions. We show that the IPMs based on these functions have favorable complexity results. To achieve this, several new tools are derived in the analysis. The kernel functions depend on parameters $p \in [0,1]$ and $\sigma \ge 1$. When these parameters are chosen appropriately, the iteration bound of large-update IPMs based on these functions coincides with the currently best known bounds for primal-dual IPMs.

Keywords: Kernel function; Interior-point; Semidefinite optimization; Primal-dual method.
AMS Subject Classification: 90C22; 90C31
† Corresponding author. Tel: +47 55584195; Fax: +47 55584199; E-mail: [email protected]
§ Tel: +31 152782530; Fax: +31 152786632; E-mail: [email protected]
‡ Tel: +47 55584169; Fax: +47 55584199; E-mail: [email protected]
1 Introduction

A semidefinite optimization problem (SDO) is a convex optimization problem in the space of symmetric matrices. We consider the standard semidefinite optimization problem

\[ (SDP) \qquad p^* = \inf_X \left\{ \mathrm{Tr}(CX) : \mathrm{Tr}(A_i X) = b_i \ (1 \le i \le m),\ X \succeq 0 \right\}, \]

and its dual problem

\[ (SDD) \qquad d^* = \sup_{y,S} \left\{ b^T y : \sum_{i=1}^m y_i A_i + S = C,\ S \succeq 0 \right\}, \]

where $C$ and $A_i$ are symmetric $n \times n$ matrices, $b, y \in \mathbb{R}^m$, $X \succeq 0$ means that $X$ is symmetric positive semidefinite, and $\mathrm{Tr}(A)$ denotes the trace of $A$ (i.e., the sum of its diagonal elements). Without loss of generality the matrices $A_i$ are assumed to be linearly independent. Recall that for any two $n \times n$ matrices $A$ and $B$ their natural inner product is given by

\[ \mathrm{Tr}(A^T B) = \sum_{i=1}^n \sum_{j=1}^n A_{ij} B_{ij}. \]

IPMs provide a powerful approach for solving SDO problems. A comprehensive list of publications on SDO can be found in the SDO homepage maintained by Alizadeh [2]. Pioneering works are due to Alizadeh [2,3] and Nesterov and Nemirovskii [4]. Most IPMs for SDO can be viewed as natural extensions of IPMs for linear optimization (LO), and have similar polynomial complexity results. However, obtaining valid search directions is much more difficult than in the LO case. In the sequel we describe how the usual search directions are obtained for primal-dual methods for solving SDO problems. Our aim is to show that the kernel-function-based approach that we presented for LO in [1] can be generalized and applied also to SDO problems.

1.1 Classical search direction

We assume that (SDP) and (SDD) satisfy the interior-point condition (IPC), i.e., there exist $X^0 \succ 0$ and $(y^0, S^0)$ with $S^0 \succ 0$ such that $X^0$ is feasible for (SDP) and $(y^0, S^0)$ is feasible for (SDD). Moreover, we may assume that $X^0 = S^0 = E$, where $E$ is the $n \times n$ identity matrix [5]. Assuming the IPC, one can write the optimality conditions for the primal-dual pair of problems
as follows:

\[ \begin{aligned} \mathrm{Tr}(A_i X) &= b_i, \quad i = 1, \dots, m, \\ \sum_{i=1}^m y_i A_i + S &= C, \\ XS &= 0, \quad X \succeq 0, \ S \succeq 0. \end{aligned} \tag{1} \]
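To make conditions (1) concrete, the following small numerical sketch evaluates their residuals for a toy instance with numpy. It is our own illustration, not part of the paper; the data $(C, A_i, b)$ and the helper name `kkt_residuals` are hypothetical.

import numpy as np

n, m = 3, 2
C = np.diag([1.0, 2.0, 3.0])
A = [np.eye(n), np.diag([1.0, -1.0, 0.0])]
b = np.array([3.0, 0.0])

def kkt_residuals(X, y, S):
    # Residuals of system (1): primal feasibility, dual feasibility,
    # and the complementarity condition XS = 0.
    r_primal = np.array([np.trace(Ai @ X) for Ai in A]) - b
    r_dual = sum(yi * Ai for yi, Ai in zip(y, A)) + S - C
    r_comp = X @ S
    return r_primal, r_dual, r_comp

# (X, y, S) = (E, 0, C) is strictly feasible for this instance but not
# optimal: the complementarity residual XS = C is nonzero.
rp, rd, rc = kkt_residuals(np.eye(n), np.zeros(m), C.copy())
print(np.linalg.norm(rp), np.linalg.norm(rd), np.linalg.norm(rc))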
The basic idea of primal-dual IPMs is to replace the complementarity condition $XS = 0$ by the parameterized equation $XS = \mu E$, $X, S \succ 0$, where $\mu > 0$. The resulting system has a unique solution for each $\mu > 0$. This solution is denoted by $(X(\mu), y(\mu), S(\mu))$; $X(\mu)$ is called the $\mu$-center of (SDP) and $(y(\mu), S(\mu))$ is the $\mu$-center of (SDD). The set of $\mu$-centers (with $\mu > 0$) defines a homotopy path, which is called the central path of (SDP) and (SDD) [6,7]. The principal idea of IPMs is to follow this central path and approach the optimal set as $\mu$ goes to zero. Newton's method amounts to linearizing this system, thus yielding the following system of equations:

\[ \begin{aligned} \mathrm{Tr}(A_i \Delta X) &= 0, \quad i = 1, \dots, m, \\ \sum_{i=1}^m \Delta y_i A_i + \Delta S &= 0, \\ \Delta X\, S + X \Delta S &= \mu E - XS. \end{aligned} \tag{2} \]

This so-called Newton system has a unique solution $(\Delta X, \Delta y, \Delta S)$. Note that $\Delta S$ is symmetric, due to the second equation in (2). However, a crucial point is that $\Delta X$ may be not symmetric. Many researchers have proposed various ways of `symmetrizing' the third equation in the Newton system so that the new system has a unique symmetric solution. All these proposals can be described by using a symmetric nonsingular scaling matrix $P$ and by replacing (2) by the system

\[ \begin{aligned} \mathrm{Tr}(A_i \Delta X) &= 0, \quad i = 1, \dots, m, \\ \sum_{i=1}^m \Delta y_i A_i + \Delta S &= 0, \\ \Delta X + P \Delta S P^T &= \mu S^{-1} - X. \end{aligned} \tag{3} \]
Now $\Delta X$ is automatically a symmetric matrix.

1.2 Nesterov-Todd direction

In this paper we consider the symmetrization scheme of Nesterov and Todd [8]. So we use

\[ P = X^{\frac12} \left( X^{\frac12} S X^{\frac12} \right)^{-\frac12} X^{\frac12} = S^{-\frac12} \left( S^{\frac12} X S^{\frac12} \right)^{\frac12} S^{-\frac12}, \]

where the last equality can be easily verified. Let $D = P^{\frac12}$, where $P^{\frac12}$ denotes the symmetric square root of $P$. Now the matrix $D$ can be used to scale $X$ and $S$ to the same matrix $V$, namely [5,9]:

\[ V := \frac{1}{\sqrt{\mu}}\, D^{-1} X D^{-1} = \frac{1}{\sqrt{\mu}}\, D S D. \tag{4} \]
Obviously the matrices $D$ and $V$ are symmetric and positive definite. Let us further define

\[ \bar{A}_i := \frac{1}{\sqrt{\mu}}\, D A_i D, \qquad i = 1, 2, \dots, m, \]

and

\[ D_X := \frac{1}{\sqrt{\mu}}\, D^{-1} \Delta X D^{-1}, \qquad D_S := \frac{1}{\sqrt{\mu}}\, D \Delta S D. \tag{5} \]

We refer to $D_X$ and $D_S$ as the scaled search directions. Now (3) can be rewritten as follows:

\[ \begin{aligned} \mathrm{Tr}(\bar{A}_i D_X) &= 0, \quad i = 1, \dots, m, \\ \sum_{i=1}^m \Delta y_i \bar{A}_i + D_S &= 0, \\ D_X + D_S &= V^{-1} - V. \end{aligned} \tag{6} \]
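The scaling (4) is straightforward to verify numerically. The sketch below is our own illustration; `sym_sqrt` and `nt_scaling` are hypothetical helper names. It computes $P$, $D = P^{1/2}$, and checks that the two expressions for $V$ in (4) coincide.

import numpy as np

def sym_sqrt(M, power=0.5):
    # Symmetric matrix power via eigendecomposition: M = Q diag(w) Q^T.
    w, Q = np.linalg.eigh(M)
    return (Q * w**power) @ Q.T

def nt_scaling(X, S, mu):
    Xh = sym_sqrt(X)                      # X^{1/2}
    inner = sym_sqrt(Xh @ S @ Xh, -0.5)   # (X^{1/2} S X^{1/2})^{-1/2}
    P = Xh @ inner @ Xh
    D = sym_sqrt(P)                       # D = P^{1/2}
    V1 = np.linalg.inv(D) @ X @ np.linalg.inv(D) / np.sqrt(mu)
    V2 = D @ S @ D / np.sqrt(mu)
    assert np.allclose(V1, V2)            # both expressions in (4) agree
    return D, V1

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4)); X = B @ B.T + 4 * np.eye(4)
B = rng.standard_normal((4, 4)); S = B @ B.T + 4 * np.eye(4)
D, V = nt_scaling(X, S, mu=1.0)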
In the sequel we use the following notational conventions. Throughout this paper, $\|\cdot\|$ denotes the 2-norm of a vector. The nonnegative and the positive orthants are denoted by $\mathbb{R}^n_+$ and $\mathrm{int}\,\mathbb{R}^n_+$, respectively, and $S^n$, $S^n_+$, and $\mathrm{int}\,S^n_+$
denote the cone of symmetric, symmetric positive semidefinite, and symmetric positive definite $n \times n$ matrices, respectively. For any $V \in S^n$, we denote by $\lambda(V)$ the vector of eigenvalues of $V$ arranged in increasing order, $\lambda_1(V) \le \lambda_2(V) \le \dots \le \lambda_n(V)$. For any square matrix $A$, we denote by $\sigma_1(A) \le \sigma_2(A) \le \dots \le \sigma_n(A)$ the singular values of $A$; if $A$ is symmetric, then one has $\sigma_i(A) = |\lambda_i(A)|$, $i = 1, 2, \dots, n$. If $z \in \mathbb{R}^n$ and $f : \mathbb{R} \to \mathbb{R}$, then $f(z)$ denotes the vector in $\mathbb{R}^n$ whose $i$-th component is $f(z_i)$, with $1 \le i \le n$, and if $D$ is a diagonal matrix, then $f(D)$ denotes the diagonal matrix with $f(D_{ii})$ as $i$-th diagonal component. For $X \in S^n$ with eigenvalue decomposition $X = Q^{-1} D Q$, where $Q$ is orthogonal and $D$ diagonal, $f(X) = Q^{-1} f(D) Q$. Finally, if $v$ is a vector, $\mathrm{diag}(v)$ denotes the diagonal matrix with diagonal elements $v_i$.

2 New search direction

In this section we introduce the new search direction. But we start with the definition of a matrix function [10,11].

Definition 2.1 Let $X$ be a symmetric matrix, and let

\[ X = Q_X^{-1}\, \mathrm{diag}(\lambda_1(X), \lambda_2(X), \dots, \lambda_n(X))\, Q_X \]

be an eigenvalue decomposition of $X$, where $\lambda_i(X)$, $1 \le i \le n$, denotes the $i$-th eigenvalue of $X$, and $Q_X$ is orthogonal. If $\psi(t)$ is any univariate function whose domain contains $\{\lambda_i(X);\ 1 \le i \le n\}$, then the matrix function $\psi(X)$ is defined by

\[ \psi(X) = Q_X^{-1}\, \mathrm{diag}(\psi(\lambda_1(X)), \psi(\lambda_2(X)), \dots, \psi(\lambda_n(X)))\, Q_X, \]

and the scalar function $\Psi(X)$ is defined as follows [7]:

\[ \Psi(X) := \sum_{i=1}^n \psi(\lambda_i(X)) = \mathrm{Tr}(\psi(X)). \tag{7} \]

The univariate function $\psi$ is called the kernel function of the scalar function $\Psi$. In this paper, when we use the function $\psi(\cdot)$ and its first three derivatives $\psi'(\cdot)$, $\psi''(\cdot)$, and $\psi'''(\cdot)$ without any specification, it denotes a matrix function if the argument is a matrix and a univariate function (from $\mathbb{R}$ to $\mathbb{R}$) if the argument is in $\mathbb{R}$. Analogous to the case of LO, the kernel-function-based approach to SDO is obtained by modifying the Nesterov-Todd direction [7].
The observation underlying our approach is that the right-hand side $V^{-1} - V$ in the third equation of (6) is precisely $-\psi'(V)$ if $\psi(t) = (t^2 - 1)/2 - \log t$, the latter being the kernel function of the well-known logarithmic barrier function. Note that this kernel function is strictly convex and nonnegative, its domain contains all positive reals, and it vanishes at $t = 1$. As we will now show, any continuously differentiable kernel function $\psi(t)$ with these properties gives rise to a primal-dual algorithm for SDO. Given such a kernel function $\psi(t)$, we replace the right-hand side $V^{-1} - V$ in the third equation of (6) by $-\psi'(V)$, with $\psi'(V)$ defined according to Definition 2.1. Thus we use the following system to define the (scaled) search directions $D_X$ and $D_S$:

\[ \begin{aligned} \mathrm{Tr}(\bar{A}_i D_X) &= 0, \quad i = 1, \dots, m, \\ \sum_{i=1}^m \Delta y_i \bar{A}_i + D_S &= 0, \\ D_X + D_S &= -\psi'(V). \end{aligned} \tag{8} \]

Having $D_X$ and $D_S$, $\Delta X$ and $\Delta S$ can be calculated from (5). Due to the orthogonality of $\Delta X$ and $\Delta S$, it is trivial to see that $D_X \perp D_S$, and so

\[ \mathrm{Tr}(D_X D_S) = \mathrm{Tr}(D_S D_X) = 0. \tag{9} \]

The algorithm considered in this paper is described in Figure 1. The inner while loop in the algorithm is called an inner iteration and the outer while loop an outer iteration. So each outer iteration consists of an update of the barrier parameter $\mu$ and a sequence of one or more inner iterations. Note that by using the embedding strategy [5], we can initialize the algorithm with $X = S = E$. Since then $XS = \mu E$ for $\mu = 1$, it follows from (4) that $V = E$ at the start of the algorithm, whence $\Psi(V) = 0$. We then decrease $\mu$ to $\mu := (1 - \theta)\mu$, for some $\theta \in (0, 1)$. In general this will increase the value of $\Psi(V)$ above the threshold value $\tau$. To get this value small again, thus coming closer to the current $\mu$-center, we solve the scaled search directions from (8), and unscale these directions by using (5). By choosing an appropriate step size $\alpha$, we move along the search direction and construct a new triple $(X_+, y_+, S_+)$ with

\[ X_+ = X + \alpha \Delta X, \qquad y_+ = y + \alpha \Delta y, \qquad S_+ = S + \alpha \Delta S. \tag{10} \]

If necessary, we repeat the procedure until we find iterates such that $\Psi(V)$ no longer exceeds the threshold value $\tau$, which means that the iterates are in a small enough neighborhood of $(X(\mu), y(\mu), S(\mu))$. Then $\mu$ is again reduced by the factor $1 - \theta$ and we apply the same procedure, targeting the new $\mu$-centers.
Generic Primal-Dual Algorithm for SDO

Input:
  a kernel function ψ(t);
  a threshold parameter τ ≥ 1;
  an accuracy parameter ε > 0;
  a barrier update parameter θ, 0 < θ < 1;
begin
  X := E; S := E; μ := 1; V := E;
  while nμ ≥ ε do
  begin
    μ := (1 − θ)μ;
    V := V / √(1 − θ);
    while Ψ(V) > τ do
    begin
      Find search directions by solving system (8);
      Determine a step size α;
      X := X + αΔX; y := y + αΔy; S := S + αΔS;
      Compute V from (4);
    end
  end
end

Figure 1. Generic primal-dual interior-point algorithm for SDO.
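For readers who prefer code, the following Python sketch mirrors the structure of Figure 1. It is schematic only: `solve_scaled_newton` and `choose_step_size` are hypothetical placeholders for solving system (8) and for the step-size rule of Section 5, while `nt_scaling` and `Psi` refer to the sketches given earlier.

import numpy as np

def generic_ipm(A, b, C, psi, tau, eps, theta):
    # Schematic transcription of Figure 1; not a working solver.
    n = C.shape[0]
    X, S = np.eye(n), np.eye(n)
    y = np.zeros(len(A))
    mu = 1.0
    V = np.eye(n)                                 # V = E at the start
    while n * mu >= eps:                          # outer iteration
        mu *= (1.0 - theta)                       # barrier parameter update
        V = V / np.sqrt(1.0 - theta)
        while Psi(psi, V) > tau:                  # inner iteration
            # Placeholders: system (8) and the step size of Section 5.
            dX, dy, dS = solve_scaled_newton(A, b, C, X, y, S, mu, psi)
            alpha = choose_step_size(V, psi)
            X, y, S = X + alpha * dX, y + alpha * dy, S + alpha * dS
            D, V = nt_scaling(X, S, mu)           # recompute V from (4)
    return X, y, S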
This process is repeated until $\mu$ is small enough, i.e. until $n\mu \le \epsilon$. At this stage we have found an $\epsilon$-solution of (SDP) and (SDD).

Just as in the LO case, the parameters $\tau$, $\theta$, and the step size $\alpha$ should be chosen in such a way that the algorithm is `optimized' in the sense that the number of iterations required by the algorithm is as small as possible. Obviously, the resulting iteration bound will depend on the kernel function underlying the algorithm, and our main task becomes to find a kernel function that minimizes the iteration bound.

The rest of the paper is organized as follows. In Section 3 we introduce the kernel function $\psi(t)$ considered in this paper and discuss some of its properties that are needed in the analysis of the corresponding algorithm. In Section 4 we derive the properties of the barrier function $\Psi(V)$. The step size and the resulting decrease of the barrier function are discussed in Section 5. The total iteration bound of the algorithm and the complexity results are derived in Section 6. Finally, some concluding remarks follow in Section 7.

3 Our kernel function and some of its properties

In this paper we consider kernel functions of the form
\[ \psi(t) := \frac{t^{1+p} - 1}{1+p} + \frac{e^{\sigma(1-t)} - 1}{\sigma}, \qquad p \in [0,1], \ \sigma \ge 1, \ t > 0. \tag{11} \]

Note that $\psi$ with $p = 1$ was first investigated in [1] for linear optimization. Up till then all kernel functions in the literature had the property that $\lim_{t \downarrow 0} \psi(t) = \infty$ and $\lim_{t \to \infty} \psi(t) = \infty$. Our kernel function has the second property, but it fails to have the first property, because

\[ \lim_{t \downarrow 0} \psi(t) = \psi(0) = \frac{e^{\sigma} - 1}{\sigma} - \frac{1}{1+p} < \infty. \]

This means that if either $X$ or $S$ approaches the boundary of the feasible region, then $\Psi(V)$ converges to a finite value, depending on the value of $\sigma$. Another special feature of $\psi(t)$ is its so-called growth term, which dominates the behavior if $t$ approaches $\infty$. In the present case it is given by $\frac{t^{1+p}-1}{1+p}$, which is linear if $p = 0$ and quadratic only if $p = 1$. So, for $p < 1$ the growth term is not quadratic, as it is for almost all kernel functions. In the analysis of the algorithm based on $\psi(t)$ we need its first three derivatives. These are given by

\[ \psi'(t) = t^p - e^{\sigma(1-t)}, \tag{12} \]
\[ \psi''(t) = p\, t^{p-1} + \sigma e^{\sigma(1-t)}, \tag{13} \]
\[ \psi'''(t) = -p(1-p)\, t^{p-2} - \sigma^2 e^{\sigma(1-t)}. \tag{14} \]
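For reference, here are the kernel (11) and its derivatives (12)-(14) as plain Python functions (a sketch of ours), together with a numerical spot-check of the properties discussed next.

import numpy as np

def psi(t, p, sigma):
    return (t**(1 + p) - 1) / (1 + p) + (np.exp(sigma * (1 - t)) - 1) / sigma

def d_psi(t, p, sigma):      # (12)
    return t**p - np.exp(sigma * (1 - t))

def dd_psi(t, p, sigma):     # (13)
    return p * t**(p - 1) + sigma * np.exp(sigma * (1 - t))

def ddd_psi(t, p, sigma):    # (14)
    return -p * (1 - p) * t**(p - 2) - sigma**2 * np.exp(sigma * (1 - t))

# Spot-check: psi(1) = psi'(1) = 0, psi'' > 0, and psi''' < 0 on a grid.
ts = np.linspace(0.1, 5.0, 50)
assert abs(psi(1.0, 0.5, 2.0)) < 1e-12 and abs(d_psi(1.0, 0.5, 2.0)) < 1e-12
assert np.all(dd_psi(ts, 0.5, 2.0) > 0) and np.all(ddd_psi(ts, 0.5, 2.0) < 0)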
Note that $\psi(1) = \psi'(1) = 0$ and $\psi''(t) > 0$, showing that $\psi(t)$ is strictly convex and minimal at $t = 1$, where it vanishes. Also note that if $p < 1$ and $t$ approaches $\infty$, then $\psi''(t)$ converges to zero. So the second derivative is not bounded away from zero, i.e., $\psi(t)$ is not strongly convex if $p < 1$. For the analysis of the algorithm we need the inequalities in the following lemma.
Lemma 3.1 Let $\psi(t)$ be as defined in (11). Then

\[ t \psi''(t) + \psi'(t) > 0, \qquad \text{if } t \ge \tfrac{1}{\sigma}; \tag{15-a} \]
\[ \psi'''(t) < 0, \qquad \text{if } t > 0; \tag{15-b} \]
\[ \psi''(t)\psi'(\beta t) - \beta \psi'(t) \psi''(\beta t) > 0, \qquad \text{if } t > 1,\ \beta > 1. \tag{15-c} \]

Proof. Using (12) and (13) we write, also using $t \ge \frac{1}{\sigma}$,

\[ \psi'(t) + t \psi''(t) = (1+p)\, t^p + (\sigma t - 1)\, e^{\sigma(1-t)} \ge (1+p)\, t^p > 0. \]

Thus (15-a) follows. Inequality (15-b) is immediate from (14). To prove (15-c) we write

\[ \psi''(t)\psi'(\beta t) - \beta \psi'(t) \psi''(\beta t) = \sigma(\beta - 1)\, e^{\sigma(2 - t - \beta t)} + t^{p-1} g(\beta) > t^{p-1} g(\beta). \]

Here we used that $\beta > 1$, and

\[ g(\beta) = \beta^p (p + \sigma t)\, e^{\sigma(1-t)} - (\sigma \beta t + p)\, e^{\sigma(1 - \beta t)}. \]

It remains to show that $g(\beta) > 0$. One has $g(1) = 0$ and

\[ g'(\beta) = p \beta^{p-1} (p + \sigma t)\, e^{\sigma(1-t)} + \sigma t\, (\sigma \beta t + p - 1)\, e^{\sigma(1 - \beta t)}. \]

Since $\beta > 1$, $\sigma \ge 1$ and $t > 1$, we have $\sigma \beta t > 1$. Since also $p \ge 0$, we obtain $g'(\beta) > 0$, and hence $g(\beta) > 0$ for $\beta > 1$, whence (15-c) follows. ∎

Note that the condition $t \ge \frac{1}{\sigma}$ in (15-a) is more restrictive than in [12], where we had the same inequality for all $t > 0$. In Section 5.4 we will ensure that we may apply inequality (15-a) in the region where the iterates of the algorithm occur.

At some places below we apply the function $\Psi$ to a positive vector $v$. The interpretation of $\Psi(v)$ is compatible with Definition 2.1 when identifying the vector $v$ with its diagonal matrix $\mathrm{diag}(v)$. When applying $\Psi$ to this matrix we obtain

\[ \Psi(v) = \sum_{i=1}^n \psi(v_i), \qquad v \in \mathrm{int}\,\mathbb{R}^n_+. \]
4 Properties of Ψ(V) and δ(V)

In this section we extend Theorem 4.9 in [12] to the cone of positive definite matrices. The next theorem gives a lower bound on the norm-based proximity measure $\delta(V)$, defined by

\[ \delta(V) := \frac12 \| \psi'(V) \| = \frac12 \sqrt{ \sum_{i=1}^n \psi'(\lambda_i(V))^2 } = \frac12 \| D_X + D_S \|, \tag{16} \]

in terms of $\Psi(V)$. Since $\Psi(V)$ is strictly convex and attains its minimal value zero at $V = E$, we have

\[ \Psi(V) = 0 \iff \delta(V) = 0 \iff V = E. \]

We denote by $\varrho : [0, \infty) \to [1, \infty)$ the inverse function of $\psi(t)$ for $t \ge 1$. In other words,

\[ s = \psi(t) \iff t = \varrho(s), \qquad t \ge 1. \tag{17} \]

Theorem 4.1 Let $\varrho$ be as defined in (17). Then

\[ \delta(V) \ge \frac12\, \psi'(\varrho(\Psi(V))). \]

Proof. If $V = E$ then $\delta(V) = \Psi(V) = 0$. Since $\varrho(0) = 1$ and $\psi'(1) = 0$, the inequality holds with equality if $V = E$. Otherwise, by the definitions of $\delta(V)$ in (16) and $\Psi(V)$ in (7), we have $\delta(V) > 0$ and $\Psi(V) > 0$. Let $v_i := \lambda_i(V)$, $1 \le i \le n$. Then $v > 0$ and

\[ \delta(V) = \frac12 \sqrt{ \sum_{i=1}^n \psi'(\lambda_i(V))^2 } = \frac12 \sqrt{ \sum_{i=1}^n \psi'(v_i)^2 }. \]

Since $\psi(t)$ satisfies (15-b), we may apply Theorem 4.9 in [12] to the vector $v$. This gives

\[ \delta(V) \ge \frac12\, \psi' \left( \varrho \left( \sum_{i=1}^n \psi(v_i) \right) \right). \]
Since

\[ \sum_{i=1}^n \psi(v_i) = \sum_{i=1}^n \psi(\lambda_i(V)) = \Psi(V), \]

the proof of the theorem is complete. ∎

Lemma 4.2 One has

\[ t\, \psi'(t) \ge \psi(t), \qquad \text{if } t \ge 1. \]

Proof. Define $g(t) := t\psi'(t) - \psi(t)$. Then one has $g(1) = 0$ and $g'(t) = t\, \psi''(t) \ge 0$. Hence $g(t) \ge 0$ for $t \ge 1$, and the lemma follows. ∎
Lemma 4.3 If $\Psi(V) \ge 1$, then

\[ \delta(V) \ge \frac16\, \Psi(V)^{\frac{p}{1+p}}. \tag{18} \]

Proof. The proof of this lemma uses Theorem 4.1 and Lemma 4.2. Putting $s = \Psi(V)$, we obtain from Theorem 4.1 that $\delta(V) \ge \frac12 \psi'(\varrho(s))$. Putting $t = \varrho(s)$, we have by (17),

\[ \psi(t) = \frac{t^{1+p} - 1}{1+p} + \frac{e^{\sigma(1-t)} - 1}{\sigma} = s, \qquad t \ge 1. \]

Using $s, t, \sigma \ge 1$ we get

\[ \frac{t^{1+p} - 1}{1+p} = s + \frac{1 - e^{\sigma(1-t)}}{\sigma} \le s + \frac{1}{\sigma} \le s + 1 \le 2s, \]

whence $t^{1+p} \le 1 + 2(1+p)s \le 3(1+p)s$, and therefore, since $p \in [0,1]$,

\[ \varrho(s) = t \le \left( 3(1+p)s \right)^{\frac{1}{1+p}} \le 3\, s^{\frac{1}{1+p}}. \]

Now applying Lemma 4.2 we may write

\[ \delta(V) \ge \frac12\, \psi'(\varrho(s)) \ge \frac{\psi(\varrho(s))}{2\varrho(s)} = \frac{s}{2\varrho(s)} \ge \frac{s}{6\, s^{\frac{1}{1+p}}} = \frac16\, s^{\frac{p}{1+p}} = \frac16\, \Psi(V)^{\frac{p}{1+p}}. \]

This proves the lemma. ∎

Note that since $\tau \ge 1$, we have at the start of each inner iteration that $\Psi(V) \ge \tau \ge 1$. Substitution in (18) gives

\[ \delta(V) \ge \frac16. \tag{19} \]

5 Analysis of the algorithm

In the analysis of the algorithm exponential convexity is a crucial ingredient. In the next subsection we define this concept and show that property (15-a) guarantees that the barrier function $\Psi(V)$ is exponentially convex in the region where the iterates occur.

5.1 Three technical lemmas

The next lemma is cited from [10, Lemma 3.3.14 (c)].

Lemma 5.1 Let $A, B \in S^n$ be two nonsingular matrices and let $f(t)$ be a given real-valued function such that $f(e^t)$ is convex. One has
\[ \sum_{i=1}^n f(\sigma_i(AB)) \le \sum_{i=1}^n f(\sigma_i(A)\, \sigma_i(B)), \]

where $\sigma_i(A)$ and $\sigma_i(B)$, $i = 1, 2, \dots, n$, denote the singular values of $A$ and $B$, respectively.

Lemma 5.2 Let $A, A + B \in S^n_+$. Then one has

\[ \lambda_i(A + B) \ge \lambda_1(A) - |\lambda_n(B)|, \qquad i = 1, 2, \dots, n. \]
Proof. It is obvious that $\lambda_i(A+B) \ge \lambda_1(A+B)$. By the Rayleigh-Ritz theorem (see [13]), there exists a nonzero $x_0 \in \mathbb{R}^n$ such that

\[ \lambda_1(A+B) = \frac{x_0^T (A+B)\, x_0}{x_0^T x_0} = \frac{x_0^T A\, x_0}{x_0^T x_0} + \frac{x_0^T B\, x_0}{x_0^T x_0}. \]

We therefore may write

\[ \lambda_1(A+B) \ge \min_{x \ne 0} \frac{x^T A\, x}{x^T x} - \max_{x \ne 0} \frac{|x^T B\, x|}{x^T x} = \lambda_1(A) - |\lambda_n(B)|. \]

This completes the proof of the lemma. ∎
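Lemma 5.2 is easy to test numerically. The following sketch (ours) checks it on random matrices, reading $|\lambda_n(B)|$ as the largest eigenvalue of $B$ in absolute value, in line with the singular-value convention above.

import numpy as np

rng = np.random.default_rng(1)
for _ in range(100):
    A = rng.standard_normal((5, 5)); A = A @ A.T          # A in S^n_+
    B = rng.standard_normal((5, 5)); B = (B + B.T) / 2
    if np.linalg.eigvalsh(A + B)[0] < 0:                  # keep A + B in S^n_+
        continue
    lam_A1 = np.linalg.eigvalsh(A)[0]                     # lambda_1(A)
    lam_Bn = np.abs(np.linalg.eigvalsh(B)).max()          # |lambda_n(B)|
    assert np.all(np.linalg.eigvalsh(A + B) >= lam_A1 - lam_Bn - 1e-10)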
Lemma 5.3 Let $V_1$ and $V_2$ be two symmetric positive definite matrices with $\lambda_1(V_1), \lambda_1(V_2) \ge \frac{1}{\sigma}$. Then

\[ \Psi\left( \left( V_1^{\frac12} V_2 V_1^{\frac12} \right)^{\frac12} \right) \le \frac12 \left( \Psi(V_1) + \Psi(V_2) \right). \]

Proof. For any nonsingular matrix $U$, we have

\[ \sigma_i(U) = \lambda_i(U^T U)^{\frac12} = \lambda_i(U U^T)^{\frac12}, \qquad i = 1, 2, \dots, n. \]

Taking $U = V_1^{\frac12} V_2^{\frac12}$, we may write

\[ \sigma_i\left( V_1^{\frac12} V_2^{\frac12} \right) = \lambda_i\left( V_1^{\frac12} V_2 V_1^{\frac12} \right)^{\frac12} = \lambda_i\left( V_2^{\frac12} V_1 V_2^{\frac12} \right)^{\frac12}, \qquad i = 1, 2, \dots, n. \]

Since $V_1$ and $V_2$ are symmetric positive definite, using Lemma 5.1 one has

\[ \Psi\left( \left( V_1^{\frac12} V_2 V_1^{\frac12} \right)^{\frac12} \right) = \sum_{i=1}^n \psi\left( \sigma_i\left( V_1^{\frac12} V_2^{\frac12} \right) \right) \le \sum_{i=1}^n \psi\left( \sigma_i\left( V_1^{\frac12} \right) \sigma_i\left( V_2^{\frac12} \right) \right). \]

Since $\lambda_1(V_1), \lambda_1(V_2) \ge \frac{1}{\sigma}$, we may use that $\psi(t)$ satisfies (15-a) for $t \ge \frac{1}{\sigma}$. By Lemma 2.4 in [1] this implies $\psi(\sqrt{t_1 t_2}) \le \frac12 \left( \psi(t_1) + \psi(t_2) \right)$ for any $t_1, t_2 \ge \frac{1}{\sigma}$. Hence we obtain

\[ \Psi\left( \left( V_1^{\frac12} V_2 V_1^{\frac12} \right)^{\frac12} \right) \le \sum_{i=1}^n \psi\left( \sqrt{\lambda_i(V_1)\, \lambda_i(V_2)} \right) \le \frac12 \sum_{i=1}^n \left( \psi(\lambda_i(V_1)) + \psi(\lambda_i(V_2)) \right) = \frac12 \left( \Psi(V_1) + \Psi(V_2) \right). \]

This completes the proof. ∎
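The inequality of Lemma 5.3 can likewise be checked numerically. The sketch below (ours) relies on `psi`, `sym_sqrt` and `Psi` from the earlier sketches.

import numpy as np
from functools import partial

p_par, sigma = 0.5, 2.0
kernel = partial(psi, p=p_par, sigma=sigma)   # the kernel (11)

rng = np.random.default_rng(2)

def rand_spd(n=4):
    B = rng.standard_normal((n, n))
    return B @ B.T + np.eye(n)                # eigenvalues >= 1 >= 1/sigma

for _ in range(20):
    V1, V2 = rand_spd(), rand_spd()
    V1h = sym_sqrt(V1)                        # V1^{1/2}
    M = sym_sqrt(V1h @ V2 @ V1h)              # (V1^{1/2} V2 V1^{1/2})^{1/2}
    assert Psi(kernel, M) <= 0.5 * (Psi(kernel, V1) + Psi(kernel, V2)) + 1e-9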
5.2 The decrease of the proximity in the inner iteration

In this subsection we compute a default value for the step size $\alpha$ in order to yield a new triple $(X_+, y_+, S_+)$ as defined in (10). After a damped step, using (5) we have

\[ \begin{aligned} X_+ &= X + \alpha \Delta X = \sqrt{\mu}\, D (V + \alpha D_X) D, \\ y_+ &= y + \alpha \Delta y, \\ S_+ &= S + \alpha \Delta S = \sqrt{\mu}\, D^{-1} (V + \alpha D_S) D^{-1}. \end{aligned} \]

Denoting the matrix $V$ after the step as $V_+$, we have

\[ V_+ = \frac{1}{\sqrt{\mu}} \left( D^{-1} X_+ S_+ D \right)^{\frac12}. \]

Note that $V_+^2$ is unitarily similar to the matrix $\frac{1}{\mu} X_+^{\frac12} S_+ X_+^{\frac12}$ and hence also to

\[ (V + \alpha D_X)^{\frac12} (V + \alpha D_S) (V + \alpha D_X)^{\frac12}. \]

This implies that the eigenvalues of $V_+$ are the same as those of the matrix

\[ \tilde{V}_+ := \left( (V + \alpha D_X)^{\frac12} (V + \alpha D_S) (V + \alpha D_X)^{\frac12} \right)^{\frac12}. \]

The definition of $\Psi(V)$ implies that its value depends only on the eigenvalues of $V$. Hence we have

\[ \Psi(\tilde{V}_+) = \Psi(V_+). \]

Our aim is to find an upper bound for

\[ f(\alpha) := \Psi(V_+) - \Psi(V) = \Psi(\tilde{V}_+) - \Psi(V). \tag{20} \]

To do this we assume for the moment that the step size $\alpha$ is such that

\[ \lambda_i(V + \alpha D_X) \ge \frac{1}{\sigma}, \qquad \lambda_i(V + \alpha D_S) \ge \frac{1}{\sigma}, \qquad i = 1, 2, \dots, n. \tag{21} \]
Then we may apply Lemma 5.3. This gives

\[ \Psi(\tilde{V}_+) = \Psi\left( \left( (V + \alpha D_X)^{\frac12} (V + \alpha D_S) (V + \alpha D_X)^{\frac12} \right)^{\frac12} \right) \le \frac12 \left[ \Psi(V + \alpha D_X) + \Psi(V + \alpha D_S) \right]. \]

From the definition (20) of $f(\alpha)$, we now have $f(\alpha) \le f_1(\alpha)$, where

\[ f_1(\alpha) := \frac12 \left[ \Psi(V + \alpha D_X) + \Psi(V + \alpha D_S) \right] - \Psi(V). \]

Note that $f_1(\alpha)$ is convex in $\alpha$, since $\Psi$ is convex. Obviously, $f(0) = f_1(0) = 0$. Taking the derivative with respect to $\alpha$, we get

\[ f_1'(\alpha) = \frac12\, \mathrm{Tr}\left( \psi'(V + \alpha D_X)\, D_X + \psi'(V + \alpha D_S)\, D_S \right). \]

Using the last equality in (8) and also (16), this gives

\[ f_1'(0) = \frac12\, \mathrm{Tr}\left( \psi'(V)(D_X + D_S) \right) = -\frac12\, \mathrm{Tr}\left( \psi'(V)^2 \right) = -2\delta(V)^2. \]

Differentiating once more, we obtain

\[ f_1''(\alpha) = \frac12\, \mathrm{Tr}\left( \psi''(V + \alpha D_X)\, D_X^2 + \psi''(V + \alpha D_S)\, D_S^2 \right). \tag{22} \]

In the sequel we use the following notation: $v_1 := \min_i(\lambda_i(V))$ and $\delta := \delta(V)$.

Lemma 5.4 One has

\[ f_1''(\alpha) \le 2\delta^2\, \psi''(v_1 - 2\alpha\delta). \]

Proof. The last equality in (8) and (16) imply that $\|D_X + D_S\|^2 = \|D_X\|^2 + \|D_S\|^2 = 4\delta^2$. Thus we have $|\lambda_n(D_X)| \le 2\delta$ and $|\lambda_n(D_S)| \le 2\delta$. Using Lemma 5.2 and $V + \alpha D_X \succ 0$, we have, for each $i$,

\[ \lambda_i(V + \alpha D_X) \ge \lambda_1(V) - \alpha |\lambda_n(D_X)| \ge v_1 - 2\alpha\delta, \qquad \lambda_i(V + \alpha D_S) \ge \lambda_1(V) - \alpha |\lambda_n(D_S)| \ge v_1 - 2\alpha\delta. \]

Due to (14), $\psi''$ is monotonically decreasing. So the above inequalities imply that

\[ \psi''(\lambda_i(V + \alpha D_X)) \le \psi''(v_1 - 2\alpha\delta), \qquad \psi''(\lambda_i(V + \alpha D_S)) \le \psi''(v_1 - 2\alpha\delta). \]

Substitution into (22) gives

\[ f_1''(\alpha) \le \frac12\, \psi''(v_1 - 2\alpha\delta)\, \mathrm{Tr}\left( D_X^2 + D_S^2 \right) = \frac12\, \psi''(v_1 - 2\alpha\delta) \left( \|D_X\|^2 + \|D_S\|^2 \right). \]
Now, using that $D_X$ and $D_S$ are orthogonal, by (9), and also $\|D_X + D_S\|^2 = 4\delta^2$, by (16), we obtain

\[ f_1''(\alpha) \le 2\delta^2\, \psi''(\lambda_1(V) - 2\alpha\delta). \]

This proves the lemma. ∎

Using the notation $v_1 = \lambda_1(V)$ again, we have

\[ f_1''(\alpha) \le 2\delta^2\, \psi''(v_1 - 2\alpha\delta), \tag{23} \]

which is exactly the same inequality as inequality (41) in [1]. This means that our analysis closely resembles the analysis of the LO case in [1]. Recall, however, that in [1] we only dealt with the case where $p = 1$. From this stage on we can apply similar arguments as in the LO case. In particular, the following two lemmas can be stated without proof.

Lemma 5.5 Let $\rho$ be the inverse function of $-\frac12 \psi'(t)$ for $t \in (0, 1]$. Then the largest value of the step size $\alpha$ satisfying (23) is given by

\[ \hat{\alpha} := \frac{1}{2\delta} \left[ \rho(\delta) - \rho(2\delta) \right]. \]

Moreover,

\[ \hat{\alpha} \ge \frac{1}{\psi''(\rho(2\delta))}. \]

For future use we define

\[ \tilde{\alpha} := \frac{1}{\psi''(\rho(2\delta))}. \tag{24} \]

By Lemma 5.5 this step size satisfies (23).

Lemma 5.6 If the step size $\alpha$ is such that $\alpha \le \hat{\alpha}$, then

\[ f(\alpha) \le -\alpha\delta^2. \]

Using the above lemmas from [1] we proceed as follows.
Theorem 5.7 Let $\hat{\alpha}$ be as defined in Lemma 5.5 and $\tilde{\alpha}$ as in (24). Then

\[ f(\tilde{\alpha}) \le -\frac{\delta^2}{\psi''(\rho(2\delta))} \le -\frac{\delta}{16\sigma} \le -\frac{1}{96\sigma}\, \Psi(V)^{\frac{p}{1+p}}. \]
Proof. Since $\tilde{\alpha} \le \hat{\alpha}$, Lemma 5.6 gives $f(\tilde{\alpha}) \le -\tilde{\alpha}\delta^2$, where $\tilde{\alpha} = \frac{1}{\psi''(\rho(2\delta))}$. Thus the first inequality follows. To obtain the second inequality we put $t = \rho(2\delta)$. Due to the definition of $\rho$ this means

\[ -\psi'(t) = e^{\sigma(1-t)} - t^p = 4\delta, \qquad t \in (0, 1]. \]

This implies

\[ e^{\sigma(1-t)} = 4\delta + t^p \le 4\delta + 1. \tag{25} \]

Using (13), $t \ge \frac{1}{\sigma}$, $\sigma \ge 1$, and $p \in [0,1]$, we get

\[ \tilde{\alpha} = \frac{1}{\psi''(t)} = \frac{1}{p\, t^{p-1} + \sigma e^{\sigma(1-t)}} \ge \frac{1}{p\, \sigma^{1-p} + \sigma e^{\sigma(1-t)}} \ge \frac{1}{\sigma \left( 1 + e^{\sigma(1-t)} \right)}. \]

Also using (25) and (19) (i.e., $6\delta \ge 1$), we get

\[ \tilde{\alpha} \ge \frac{1}{\sigma(2 + 4\delta)} = \frac{1}{2\sigma(1 + 2\delta)} \ge \frac{1}{2\sigma(6\delta + 2\delta)} = \frac{1}{16\sigma\delta}. \]

We define

\[ \bar{\alpha} := \frac{1}{16\sigma\delta}. \tag{26} \]

This will be the default step size used below in the theoretical analysis. It is a pessimistic estimate of the optimal step size, which gives the largest decrease of the barrier function value, but for our purpose it is large enough. Since $\bar{\alpha} \le \tilde{\alpha} \le \hat{\alpha}$, Lemma 5.6 yields for this step size

\[ f(\tilde{\alpha}) \le -\tilde{\alpha}\delta^2 \le -\bar{\alpha}\delta^2 = -\frac{\delta^2}{16\sigma\delta} = -\frac{\delta}{16\sigma}. \]

Substitution of (18) yields the theorem. ∎
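In code, the proximity measure (16) and the default step size (26) amount to a few lines. This is a sketch of ours; `d_psi_func` stands for the derivative (12).

import numpy as np

def delta(V, d_psi_func):
    # delta(V) = 0.5 * || psi'(V) ||, computed via the eigenvalues as in (16).
    lam = np.linalg.eigvalsh(V)
    return 0.5 * np.linalg.norm(d_psi_func(lam))

def default_step_size(V, d_psi_func, sigma):
    # The default step size alpha_bar = 1 / (16 * sigma * delta) of (26).
    return 1.0 / (16.0 * sigma * delta(V, d_psi_func))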
5.3 A uniform upper bound for Ψ

In this subsection we extend Theorem 3.2 in [12] to the cone of positive definite matrices. As we will see, the proof of the next theorem easily follows from Theorem 3.2 in [12].

Theorem 5.8 Let $\varrho$ be as defined in (17). Then for any positive definite matrix $V$ and any $\beta > 1$ we have

\[ \Psi(\beta V) \le n\, \psi\left( \beta\, \varrho\left( \frac{\Psi(V)}{n} \right) \right). \]

Proof. Let $v_i := \lambda_i(V)$, $1 \le i \le n$. Then $v > 0$ and

\[ \Psi(\beta V) = \sum_{i=1}^n \psi(\lambda_i(\beta V)) = \sum_{i=1}^n \psi(\beta \lambda_i(V)) = \sum_{i=1}^n \psi(\beta v_i) = \Psi(\beta v). \]

Due to the fact that $\psi(t)$ satisfies (15-c), at this stage we may use Theorem 3.2 in [12], which gives

\[ \Psi(\beta v) \le n\, \psi\left( \beta\, \varrho\left( \frac{\Psi(v)}{n} \right) \right). \]

Since

\[ \Psi(v) = \sum_{i=1}^n \psi(v_i) = \sum_{i=1}^n \psi(\lambda_i(V)) = \Psi(V), \]

the theorem follows. ∎

Before the update of $\mu$ we have $\Psi(V) \le \tau$, and after the update of $\mu$ to $(1-\theta)\mu$ we have $V_+ = \frac{V}{\sqrt{1-\theta}}$. Application of Theorem 5.8, with $\beta = \frac{1}{\sqrt{1-\theta}}$, yields

\[ \Psi(V_+) \le n\, \psi\left( \frac{\varrho\left( \frac{\tau}{n} \right)}{\sqrt{1-\theta}} \right). \]

To estimate the right-hand side expression, we need an upper bound for the inverse function $\varrho$ of $\psi(t)$ for $t \in [1, \infty)$. So, if $\psi(t) = s$, $t \ge 1$, we need an
upper bound for $t$. By using $\sigma \ge 2$, one has

\[ \frac{t^{1+p} - 1}{1+p} = s + \frac{1 - e^{\sigma(1-t)}}{\sigma} \le s + \frac{1}{\sigma} \le s + \frac12, \]

which gives

\[ t \le \left( 1 + (1+p)\left( s + \tfrac12 \right) \right)^{\frac{1}{1+p}} \le (2s + 2)^{\frac{1}{1+p}} \le 2(s+1), \qquad p \in [0,1]. \tag{27} \]

At this stage we need the following lemma.

Lemma 5.9 If $\sigma \ge 2$, then one has

\[ t\, \psi(t) \ge (t-1)^2, \qquad \text{for } t \ge 1. \]

Proof. Let $g(t) := t\psi(t) - (t-1)^2$. Then one has $g(1) = 0$ and $g'(t) = \psi(t) + t\psi'(t) - 2(t-1)$. Hence $g'(1) = 0$ and $g''(t) = 2\psi'(t) + t\psi''(t) - 2$. Since

\[ g''(t) = 2(t^p - 1) + (\sigma t - 2)\, e^{\sigma(1-t)} + p\, t^p \ge 0, \qquad t \ge 1, \]

the lemma follows. ∎

This lemma implies $t \le 1 + \sqrt{t\psi(t)} = 1 + \sqrt{ts}$. Now substituting (27) we obtain

\[ \varrho(s) = t \le 1 + \sqrt{2(s^2 + s)}. \]

Substituting this and

\[ \psi(t) = \frac{t^{1+p} - 1}{1+p} + \frac{e^{\sigma(1-t)} - 1}{\sigma} \le \frac{t^{1+p} - 1}{1+p} \le \frac{t^{1+p}}{1+p}, \qquad \text{for } t \ge 1, \]

into the bound for $\Psi(V_+)$ above yields

\[ \Psi(V_+) \le \frac{n}{1+p} \left( \frac{ 1 + \sqrt{ 2\frac{\tau}{n}\left( \frac{\tau}{n} + 1 \right) } }{ \sqrt{1-\theta} } \right)^{1+p}. \]

For further use we define

\[ L = L(n, \theta, \tau) := \frac{n}{1+p} \left( \frac{ 1 + \sqrt{ 2\frac{\tau}{n}\left( \frac{\tau}{n} + 1 \right) } }{ \sqrt{1-\theta} } \right)^{1+p}. \tag{28} \]

In the sequel the value $L(n, \theta, \tau)$ is simply denoted as $L$. A crucial (but trivial) observation is that during the course of the algorithm the value of $\Psi(V)$ will
never exceed $L$, because during the inner iterations the value of $\Psi$ always decreases.

5.4 Fixing the value of σ

With $L$ as just defined, we have the following result.

Lemma 5.10 Suppose that $L \ge 9$ and $\Psi(V) \le L$. If $\sigma \ge 1 + 2\log(L+1)$, then

\[ \lambda_i(V) > \frac{3}{2\sigma}, \qquad \text{for all } i = 1, \dots, n. \]

Proof. First note that $\Psi(V) \le L$ implies $\psi(\lambda_i(V)) \le L$ for each $i = 1, \dots, n$. Hence, putting $t = \lambda_i(V)$, we have

\[ \frac{t^{1+p} - 1}{1+p} + \frac{e^{\sigma(1-t)} - 1}{\sigma} \le L. \]

It follows that

\[ \frac{e^{\sigma(1-t)} - 1}{\sigma} \le L + \frac{1 - t^{1+p}}{1+p} \le L + 1. \]

This implies

\[ e^{\sigma(1-t)} \le 1 + \sigma(L+1), \]

and hence $\sigma t \ge \sigma - \log(1 + \sigma(L+1))$. The expression at the right-hand side is monotonically increasing in $\sigma$, so using $\sigma \ge 1 + 2\log(L+1)$ we obtain

\[ \sigma t \ge 1 + 2\log(L+1) - \log\left( 1 + (1 + 2\log(L+1))(L+1) \right). \]

The expression at the right-hand side is monotonically increasing in $L$. Its value at $L = 9$ is $1.5612\ldots > \frac32$, proving the lemma. ∎

Note that at the start of each inner iteration $\tau < \Psi(V) \le L$. To ensure that $L$ satisfies the conditions of Lemma 5.10, we assume from now on that $L \ge 9$, and we choose

\[ \sigma = 1 + 2\log(L+1). \]

Finally, to validate the above analysis we need to show that the step size
satisfies (21). Using (26) and Lemma 5.10, we may write, for all $i = 1, 2, \dots, n$,

\[ \lambda_i(V + \bar{\alpha} D_X) \ge \lambda_1(V) - 2\bar{\alpha}\delta > \frac{3}{2\sigma} - \frac{2\delta}{16\sigma\delta} = \frac{3}{2\sigma} - \frac{1}{8\sigma} = \frac{11}{8\sigma} \ge \frac{1}{\sigma}, \]

and similarly $\lambda_i(V + \bar{\alpha} D_S) \ge \frac{1}{\sigma}$.

6 Complexity

We are now ready to derive the iteration bounds for large-update methods. An upper bound for the total number of (inner) iterations is obtained by multiplying an upper bound for the number of inner iterations between two successive updates of $\mu$ by the number of barrier parameter updates. The latter number is bounded above by (cf. [14, Lemma II.17, page 116])

\[ \frac{1}{\theta} \log\frac{n}{\epsilon}. \]
To obtain an upper bound $K$ for the number of inner iterations between two successive updates of $\mu$ we need a few more technical lemmas. The following lemma is taken from Proposition 1.3.2 in [7]. Its relevance is due to the fact that the barrier function values between two successive updates of $\mu$ yield a decreasing sequence of positive numbers. We will denote this sequence as $\Psi_0, \Psi_1, \dots$.

Lemma 6.1 Let $t_0, t_1, \dots, t_K$ be a sequence of positive numbers such that

\[ t_{k+1} \le t_k - \kappa\, t_k^{1-\gamma}, \qquad k = 0, 1, \dots, K-1, \]

where $\kappa > 0$ and $0 < \gamma \le 1$. Then $K \le \left\lceil \frac{t_0^{\gamma}}{\kappa\gamma} \right\rceil$.

Lemma 6.2 If $K$ denotes the number of inner iterations between two successive updates of $\mu$, then

\[ K \le 96\sigma (1+p)\, \Psi_0^{\frac{1}{1+p}}. \]

Proof. The definition of $K$ implies $\Psi_{K-1} > \tau$, $\Psi_K \le \tau$ and, according to Theorem 5.7,

\[ \Psi_{k+1} \le \Psi_k - \kappa\, \Psi_k^{1-\gamma}, \qquad k = 0, 1, \dots, K-1, \]
with $\kappa = \frac{1}{96\sigma}$ and $\gamma = \frac{1}{1+p}$. Application of Lemma 6.1, with $t_k = \Psi_k$, yields the desired inequality. ∎

Using $\Psi_0 \le L$ and Lemma 6.2, we obtain the following upper bound on the total number of iterations:

\[ 96\sigma(1+p)\, L^{\frac{1}{1+p}}\, \frac{1}{\theta} \log\frac{n}{\epsilon} \le \frac{192\sigma L^{\frac{1}{1+p}}}{\theta} \log\frac{n}{\epsilon}, \qquad \text{for all } p \in [0,1], \tag{29} \]
where the number $L$ is as given in (28). Substituting (28) into (29), the total number of iterations is thus bounded above by

\[ \frac{192\sigma}{\theta} \left( \frac{n}{1+p} \right)^{\frac{1}{1+p}} \frac{ 1 + \sqrt{ 2\frac{\tau}{n}\left( \frac{\tau}{n} + 1 \right) } }{ \sqrt{1-\theta} }\, \log\frac{n}{\epsilon}. \tag{30} \]

A large-update method uses $\tau = O(n)$ and $\theta = \Theta(1)$. Since $\sigma = O(\log n)$, we obtain the following iteration bound for large-update methods:

\[ O\left( n^{\frac{1}{1+p}} \log n\, \log\frac{n}{\epsilon} \right). \]
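As a rough illustration (ours; it simply evaluates the expressions (28)-(30) together with the choice $\sigma = 1 + 2\log(L+1)$ of Section 5.4, with $\tau = n$ and $\theta = 1/2$ modeling a large-update method), the bound can be tabulated for sample parameters:

import numpy as np

def iteration_bound(n, p, eps, theta=0.5):
    tau = float(n)                      # tau = O(n)
    L = (n / (1 + p)) * ((1 + np.sqrt(2 * (tau / n) * (1 + tau / n)))
                         / np.sqrt(1 - theta)) ** (1 + p)
    sigma = 1 + 2 * np.log(L + 1)       # the choice of Section 5.4
    return 192 * sigma * L ** (1 / (1 + p)) / theta * np.log(n / eps)

for p in (0.0, 0.5, 1.0):
    print(p, iteration_bound(n=100, p=p, eps=1e-6))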
7 Concluding Remarks

In this paper we introduced a new class of kernel functions depending on parameters $\sigma \ge 1$ and $p \in [0,1]$. If $p < 1$ then these functions are not strongly convex, and hence also not self-regular. The induced barrier functions differ from existing barrier functions in the sense that they are finite at the boundary of the feasible region. We proved that a large-update IPM for SDO based on our new class of kernel functions has the iteration bound $O\left( \sqrt{n}\, \log n\, \log\frac{n}{\epsilon} \right)$ if $p = 1$ and $\sigma = O(\log n)$. This bound is the currently best known bound for primal-dual IPMs for SDO.

Acknowledgement

The authors kindly acknowledge the help of the guest editor and two anonymous referees in improving the readability of the paper.
References
[1] Bai, Y.Q., El Ghami, M. and Roos, C., 2003, A new efficient large-update primal-dual interior-point method based on a finite barrier. SIAM Journal on Optimization, 13(3):766-782.
[2] Alizadeh, F., 1991, Combinatorial Optimization with Interior Point Methods and Semi-Definite Matrices. PhD thesis, University of Minnesota, Minneapolis, Minnesota, USA.
[3] Alizadeh, F., 1995, Interior point methods in semidefinite programming with applications to combinatorial optimization. SIAM Journal on Optimization, 5(1):13-51.
[4] Nesterov, Y.E. and Nemirovskii, A.S., 1993, Interior Point Polynomial Methods in Convex Programming: Theory and Algorithms. SIAM, Philadelphia, USA.
[5] de Klerk, E., 2002, Aspects of Semidefinite Programming, volume 65 of Applied Optimization. Kluwer Academic Publishers, Dordrecht, The Netherlands.
[6] de Klerk, E., 1997, Interior Point Methods for Semidefinite Programming. PhD thesis, TU Delft, The Netherlands.
[7] Peng, J., Roos, C. and Terlaky, T., 2002, Self-Regularity: A New Paradigm for Primal-Dual Interior-Point Algorithms. Princeton University Press.
[8] Nesterov, Y.E. and Todd, M.J., 1997, Self-scaled barriers and interior-point methods for convex programming. Mathematics of Operations Research, 22(1):1-42.
[9] Sturm, J.F. and Zhang, S., 1999, Symmetric primal-dual path following algorithms for semidefinite programming. Applied Numerical Mathematics, 29:301-315.
[10] Horn, R.A. and Johnson, C.R., 1985, Matrix Analysis. Cambridge University Press, Cambridge, UK.
[11] Rudin, W., 1978, Principles of Mathematical Analysis. McGraw-Hill Book Company, New York.
[12] Bai, Y.Q., El Ghami, M. and Roos, C., 2004, A comparative study of kernel functions for primal-dual interior-point algorithms in linear optimization. SIAM Journal on Optimization, 15(1):101-128.
[13] Horn, R.A. and Johnson, C.R., 1991, Topics in Matrix Analysis. Cambridge University Press.
[14] Roos, C., Terlaky, T. and Vial, J.-Ph., 2005, Theory and Algorithms for Linear Optimization. An Interior-Point Approach. Springer Science.