Condition Measures and Properties of the Central Trajectory of a Linear Program

Manuel A. Nunez and Robert M. Freund

March 25, 1996; revised May 27, 1997

Abstract
Given a data instance $d = (A,b,c)$ of a linear program, we show that certain properties of solutions along the central trajectory of the linear program are inherently related to the condition number $C(d)$ of the data instance $d = (A,b,c)$, where $C(d)$ is a scale-invariant reciprocal of a closely related measure $\rho(d)$ called the "distance to ill-posedness." (The distance to ill-posedness essentially measures how close the data instance $d = (A,b,c)$ is to being primal or dual infeasible.) We present lower and upper bounds on sizes of optimal solutions along the central trajectory, and on rates of change of solutions along the central trajectory, as either the barrier parameter or the data $d = (A,b,c)$ of the linear program is changed. These bounds are all linear or polynomial functions of certain natural parameters associated with the linear program, namely the condition number $C(d)$, the distance to ill-posedness $\rho(d)$, the norm of the data $\|d\|$, and the dimensions $m$ and $n$.
1 Introduction

The central trajectory of a linear program consists of the set of optimal solutions $x = x(\mu)$ and $(y,s) = (y(\mu), s(\mu))$ to the logarithmic barrier problems:
$$P_\mu(d):\quad \min\{c^T x + \mu\, p(x) : Ax = b,\ x > 0\},$$
$$D_\mu(d):\quad \max\{b^T y - \mu\, p(s) : A^T y + s = c,\ s > 0\},$$
where for $u > 0$ in $\Re^n$, $p(u) = -\sum_{j=1}^n \ln(u_j)$ denotes the logarithmic barrier function and $\mu > 0$ is the barrier parameter.

Proposition 2.2 Exactly one of the following two systems has a solution:
(1) $Ax = b$, $x > 0$.
(2) $A^T y \le 0$, $b^T y \ge 0$, and $(A^T y, b^T y) \ne 0$.

Proposition 2.3 Exactly one of the following two systems has a solution:
(1) $A^T y < c$.
(2) $Ax = 0$, $x \ge 0$, $c^T x \le 0$, and $x \ne 0$.

Finally, we introduce the following notational convention, which is standard in the field of interior point methods: if $x \in \Re^n$, then $X$ denotes the $n \times n$ diagonal matrix whose diagonal entries are the components of $x$.

Theorem 3.1 Suppose that $P(d)$ has an optimal solution and $\rho(d) > 0$. Then for any $\mu > 0$,
$$\|x(\mu)\|_1 \le K(d,\mu), \qquad (3)$$
$$\|y(\mu)\|_\infty \le K(d,\mu), \qquad (4)$$
$$\|s(\mu)\|_\infty \le 2\|d\|K(d,\mu), \qquad (5)$$
for the optimal solution $x(\mu)$ to $P_\mu(d)$ and the optimal solution $(y(\mu), s(\mu))$ to the dual problem $D_\mu(d)$, where $K(d,\mu) = C(d)^2 + n\mu/\rho(d)$ is the scalar defined in (2).
This theorem states that the norms of optimal solutions along the central trajectory are bounded above by quantities only involving the condition number $C(d)$ and the distance to ill-posedness $\rho(d)$ of the data $d$, as well as the dimension $n$ and the barrier parameter $\mu$. Furthermore, for example, the theorem shows that the norm of the optimal primal solution along the central trajectory grows at most linearly in the barrier parameter $\mu$, and at a rate no larger than $n/\rho(d)$, as $\mu$ goes to $\infty$.
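For concreteness, here is a minimal numeric sketch of the quantity $K(d,\mu)$ (using the reconstructed expression $K(d,\mu) = C(d)^2 + n\mu/\rho(d)$ from (2)) and of the three upper bounds of Theorem 3.1; the input values are illustrative assumptions rather than data taken from the paper.

```python
def K(norm_d, rho_d, n, mu):
    """K(d, mu) = C(d)^2 + n*mu/rho(d), where C(d) = ||d|| / rho(d)."""
    C = norm_d / rho_d
    return C ** 2 + n * mu / rho_d

def theorem_3_1_bounds(norm_d, rho_d, n, mu):
    """Upper bounds of Theorem 3.1 on ||x(mu)||_1, ||y(mu)||_inf, ||s(mu)||_inf."""
    k = K(norm_d, rho_d, n, mu)
    return {"x": k, "y": k, "s": 2.0 * norm_d * k}

if __name__ == "__main__":
    # Illustrative values (they match the small m = 1, n = 2 example given below).
    print(theorem_3_1_bounds(norm_d=1.0, rho_d=1.0, n=2, mu=0.5))
```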
Proof of Theorem 3.1: Let $\hat x = x(\mu)$ be the optimal solution to $P_\mu(d)$ and $(\hat y, \hat s) = (y(\mu), s(\mu))$ be the optimal solution to the corresponding dual problem $D_\mu(d)$. Note that the optimality conditions of $P_\mu(d)$ and $D_\mu(d)$ imply that $c^T\hat x = b^T\hat y + n\mu$. Observe that since $\hat s = c - A^T\hat y$, then $\|\hat s\|_\infty \le \|c\|_\infty + \|A^T\|_{\infty,\infty}\|\hat y\|_\infty$. Since $\|A^T\|_{\infty,\infty} = \|A\|$, we have that $\|\hat s\|_\infty \le \|d\|(1 + \|\hat y\|_\infty)$, and using the fact that $C(d) \ge 1$ the bound (5) on $\|\hat s\|_\infty$ is a consequence of the bound (4) on $\|\hat y\|_\infty$. It therefore is sufficient to prove the bounds on $\|\hat x\|_1$ and on $\|\hat y\|_\infty$. In addition, the bound on $\|\hat y\|_\infty$ is trivial if $\hat y = 0$, so from now on we assume that $\hat y \ne 0$. Also, let $\bar y$ be a vector in $\Re^m$ such that $\bar y^T\hat y = \|\hat y\|_\infty$ and $\|\bar y\|_1 = 1$. The rest of the proof proceeds by examining three cases: (i) $c^T\hat x \le 0$, (ii) $0 < c^T\hat x \le n\mu$, and (iii) $n\mu < c^T\hat x$.

In case (i), let $\Delta A = -be^T/\|\hat x\|_1$. Then $(A+\Delta A)\hat x = 0$, $\hat x > 0$, and $c^T\hat x \le 0$. From Proposition 2.3, we have that $D_\mu(d+\Delta d)$ is infeasible, and so $\rho(d) \le \|\Delta d\| = \|\Delta A\| = \|b\|_1/\|\hat x\|_1 \le \|d\|/\|\hat x\|_1$. Therefore, $\|\hat x\|_1 \le \|d\|/\rho(d) = C(d) \le K(d,\mu)$, since $C(d) \ge 1$ for any $d$. This proves (3) in this case. Let $\alpha = b^T\hat y$, $\Delta b = -\alpha\bar y/\|\hat y\|_\infty$, $\Delta A = -\bar yc^T/\|\hat y\|_\infty$, and $d+\Delta d = (A+\Delta A, b+\Delta b, c)$. Observe that $(b+\Delta b)^T\hat y = 0$ and $(A+\Delta A)^T\hat y < 0$, so that $P(d+\Delta d)$ is infeasible from Proposition 2.2. Therefore, $\rho(d) \le \|\Delta d\| = \max\{\|c\|_\infty, |\alpha|\}/\|\hat y\|_\infty$. Hence, $\|\hat y\|_\infty \le \max\{C(d), |\alpha|/\rho(d)\}$. Furthermore, $|\alpha| = |b^T\hat y| = |c^T\hat x - n\mu| \le \|\hat x\|_1\|c\|_\infty + n\mu \le C(d)\|d\| + n\mu$. Therefore, again using the fact that $C(d) \ge 1$ for any $d$, we have (4).

In case (ii), let $d+\Delta d = (A+\Delta A, b, c+\Delta c)$, where $\Delta A = -be^T/\|\hat x\|_1$ and $\Delta c = -n\mu e/\|\hat x\|_1$. Observe that $(A+\Delta A)\hat x = 0$ and $(c+\Delta c)^T\hat x \le 0$. From Proposition 2.3, $D_\mu(d+\Delta d)$ is infeasible, and so we conclude that $\rho(d) \le \|\Delta d\| = \max\{\|\Delta A\|, \|\Delta c\|_\infty\} = \max\{\|b\|_1, n\mu\}/\|\hat x\|_1 \le (\|d\| + n\mu)/\|\hat x\|_1$. Therefore, $\|\hat x\|_1 \le C(d) + n\mu/\rho(d) \le K(d,\mu)$. This proves (3) for this case. Now, let $d+\Delta d = (A+\Delta A, b+\Delta b, c)$, where $\Delta A = -\bar yc^T/\|\hat y\|_\infty$ and $\Delta b = n\mu\bar y/\|\hat y\|_\infty$. Observe that $(b+\Delta b)^T\hat y = b^T\hat y + n\mu = c^T\hat x > 0$ and $(A+\Delta A)^T\hat y < 0$. Again, from Proposition 2.2, $P(d+\Delta d)$ is infeasible, and so we conclude that $\rho(d) \le \|\Delta d\| = \max\{\|\Delta A\|, \|\Delta b\|_1\} = \max\{\|c\|_\infty, n\mu\}/\|\hat y\|_\infty \le (\|d\| + n\mu)/\|\hat y\|_\infty$. Therefore, $\|\hat y\|_\infty \le C(d) + n\mu/\rho(d) \le K(d,\mu)$.

In case (iii), we first consider the bound on $\|\hat y\|_\infty$. Let $d+\Delta d = (A+\Delta A, b, c)$, where $\Delta A = -\bar yc^T/\|\hat y\|_\infty$. Since $(A+\Delta A)^T\hat y < 0$ and $b^T\hat y = c^T\hat x - n\mu > 0$, it follows from Proposition 2.2 that $P(d+\Delta d)$ is infeasible and so $\rho(d) \le \|\Delta d\| = \|c\|_\infty/\|\hat y\|_\infty$. Therefore, $\|\hat y\|_\infty \le C(d) \le K(d,\mu)$. Finally, let $\Delta A = -be^T/\|\hat x\|_1$ and $\Delta c = -\beta e/\|\hat x\|_1$, where $\beta = c^T\hat x$. Observe that $(A+\Delta A)\hat x = 0$ and $(c+\Delta c)^T\hat x = 0$. Using Proposition 2.3, we conclude that $D_\mu(d+\Delta d)$ is infeasible and so $\rho(d) \le \|\Delta d\| = \max\{\|\Delta A\|, \|\Delta c\|_\infty\} = \max\{\|b\|_1, \beta\}/\|\hat x\|_1$, so that $\|\hat x\|_1 \le \max\{C(d), \beta/\rho(d)\}$. Furthermore, $\beta = c^T\hat x = b^T\hat y + n\mu \le \|b\|_1\|\hat y\|_\infty + n\mu \le \|d\|C(d) + n\mu$. Therefore, $\|\hat x\|_1 \le K(d,\mu)$.
q.e.d.
Note that the scalar quantity $K(d,\mu)$ appearing in Theorem 3.1 is scale invariant in the sense that $K(\alpha d, \alpha\mu) = K(d,\mu)$ for any $\alpha > 0$. From this it follows that the bounds in Theorem 3.1 on $\|x(\mu)\|_1$ and $\|y(\mu)\|_\infty$ are also scale invariant. However, as one would expect, the bound on $\|s(\mu)\|_\infty$ is not scale invariant, since $\|s(\mu)\|_\infty$ is sensitive to positive scalings of the data. Moreover, observe that as $\mu \to 0$ the bounds in Theorem 3.1 converge to the bounds presented by Vera in [28] for optimal solutions to linear programs of the form $\min\{c^T x : Ax = b,\ x \ge 0\}$. Examining the proof of Theorem 3.1, it is clear that the bounds stated in Theorem 3.1 will not generally be achieved. Indeed, implicit in the proof is the fact that bounds tighter than those in the theorem can be proved, and will depend on which of the three cases in the proof are applicable. However, our goal lies mainly in establishing bounds that are polynomial in the condition number $C(d)$, the parameter $\mu$, the size of the data $\|d\|$, and the dimensions $m$ and $n$, and not necessarily in establishing the best achievable bounds. We now present a simple example illustrating that the bounds in Theorem 3.1 are not necessarily tight. Let $m = 1$, $n = 2$, and
$$d = (A,b,c) = \left([1,\ 1],\ [1],\ \begin{bmatrix} -1 \\ -1 \end{bmatrix}\right).$$
For this data instance, we have that $\|d\| = 1$ and $\rho(d) = 1$, so that $C(d) = 1$ and $K(d,\mu) = 1 + n\mu$. Now observe that $x(\mu) = (1/2, 1/2)^T$ for all $\mu > 0$, so that $\|x(\mu)\|_1 = 1 < K(d,\mu) = 1 + n\mu$ for all $\mu > 0$, which demonstrates that (3) is not tight in general. Furthermore, notice that in this example $c^T x(\mu) < 0$, and so case (i) of the proof implies that $\|x(\mu)\|_1 \le C(d)$ (in fact, $\|x(\mu)\|_1 = C(d) = 1$ in this example), which is a tighter bound than (3).
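As a check on this example, the following sketch traces $x(\mu)$ numerically by minimizing $c^T x + \mu p(x)$ over the segment $x = (t, 1-t)$, $t \in (0,1)$; the one-dimensional parametrization is specific to this instance, and scipy is assumed to be available.

```python
import numpy as np
from scipy.optimize import minimize_scalar

c = np.array([-1.0, -1.0])

def barrier_objective(t, mu):
    # x = (t, 1 - t) parametrizes {x : x1 + x2 = 1, x > 0} for t in (0, 1).
    x = np.array([t, 1.0 - t])
    return c @ x - mu * np.sum(np.log(x))

for mu in [0.01, 0.1, 1.0, 10.0]:
    res = minimize_scalar(barrier_objective, bounds=(1e-9, 1 - 1e-9),
                          args=(mu,), method="bounded")
    x = np.array([res.x, 1.0 - res.x])
    K = 1.0 + 2.0 * mu   # K(d, mu) = C(d)^2 + n*mu/rho(d) = 1 + 2*mu for this instance
    print(f"mu={mu:6.2f}  x(mu)={x}  ||x||_1={x.sum():.4f}  K(d,mu)={K:.2f}")
```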
Corollary 3.1 Let $\beta \in (0,1)$ be given and fixed, and let $\delta$ be such that $\delta \le \beta\rho(d)$, where $d \in F$ and $\rho(d) > 0$. If $d + \Delta d \in D$ is such that $\|\Delta d\| \le \delta$, then
$$\|x(\mu)\|_1 \le \left(\frac{1+\beta}{1-\beta}\right)^2 K(d,\mu), \qquad \|y(\mu)\|_\infty \le \left(\frac{1+\beta}{1-\beta}\right)^2 K(d,\mu), \qquad \|s(\mu)\|_\infty \le 2(\|d\|+\delta)\left(\frac{1+\beta}{1-\beta}\right)^2 K(d,\mu),$$
where $x(\mu)$ is the optimal solution to $P_\mu(d+\Delta d)$, $(y(\mu), s(\mu))$ is the optimal solution to $D_\mu(d+\Delta d)$, and $K(d,\mu)$ is the scalar defined in (2).

Proof: The proof follows by observing that for $\bar d \in B(d,\delta)$ we have $\|\bar d\| \le \|d\| + \delta$ and $\rho(\bar d) \ge (1-\beta)\rho(d)$, so that
$$C(\bar d) \le \frac{\|d\|+\delta}{(1-\beta)\rho(d)} = \frac{1}{1-\beta}\left(C(d) + \frac{\delta}{\rho(d)}\right) \le \frac{1}{1-\beta}\left(C(d) + \beta\right) \le \frac{1+\beta}{1-\beta}\,C(d), \quad \text{since } C(d) \ge 1.$$
q.e.d.
Note that for a fixed value of $\mu$, Corollary 3.1 shows that the norms of solutions to any suitably perturbed problem are uniformly upper-bounded by a fixed constant times the upper bounds on the solutions to the original problem. The next result presents a lower bound on the norm of the optimal solutions $x(\mu)$ and $s(\mu)$ to the central trajectory problems $P_\mu(d)$ and $D_\mu(d)$, respectively.
Theorem 3.2 If the program $P(d)$ has an optimal solution and $\rho(d) > 0$, then
$$\|x(\mu)\|_1 \ge \frac{n\mu}{2\|d\|K(d,\mu)}, \qquad \|s(\mu)\|_\infty \ge \frac{n\mu}{K(d,\mu)}, \qquad x_j(\mu) \ge \frac{\mu}{2\|d\|K(d,\mu)}, \qquad s_j(\mu) \ge \frac{\mu}{K(d,\mu)},$$
for all $j = 1,\ldots,n$, where $x(\mu)$ is the optimal solution to $P_\mu(d)$, $(y(\mu), s(\mu))$ is the optimal solution to $D_\mu(d)$, and $K(d,\mu)$ is the scalar defined in (2).
This theorem shows that $\|x(\mu)\|_1$ and $x_j(\mu)$ are bounded from below by functions only involving the quantities $\|d\|$, $C(d)$, $\rho(d)$, $n$, and $\mu$. In addition, the theorem shows that for $\mu$ close to zero, $x_j(\mu)$ grows at least linearly in $\mu$, and at a rate that is at least $1/(2\|d\|C(d)^2)$ (since $K(d,\mu) = C(d)^2 + n\mu/\rho(d) \approx C(d)^2$ near $\mu = 0$). Furthermore, the theorem also shows that for $\mu$ close to zero, $s_j(\mu)$ grows at least linearly in $\mu$, and at a rate that is at least $1/C(d)^2$. The theorem offers less insight when $\mu \to \infty$, since the lower bound on $\|x(\mu)\|_1$ presented in the theorem converges to $(2C(d))^{-1}$ as $\mu \to \infty$. When the feasible region is unbounded, it is well known (see also the results at the end of this section) that $\|x(\mu)\| \to \infty$ as $\mu \to \infty$, so that as $\mu \to \infty$ the lower bound of Theorem 3.2 does not adequately capture the behavior of the sizes of optimal solutions to $P_\mu(d)$ when the feasible region is unbounded. We will present a more relevant bound shortly, in Theorem 3.3. Similar remarks apply to the bound on $\|s(\mu)\|_\infty$ as $\mu \to \infty$.
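A quick numeric sanity check of these lower bounds on the $m=1$, $n=2$ example introduced after Theorem 3.1: there $x(\mu) = (1/2,1/2)^T$, so complementarity $x_j(\mu)s_j(\mu) = \mu$ gives $s_j(\mu) = 2\mu$, and $\|d\| = \rho(d) = 1$ with the reconstructed $K(d,\mu) = 1 + 2\mu$. This is an illustration, not part of the paper's argument.

```python
for mu in [0.01, 0.1, 1.0, 10.0]:
    K = 1.0 + 2.0 * mu            # K(d, mu) for this instance
    x_j = 0.5                     # x(mu) = (1/2, 1/2) for every mu here
    s_j = mu / x_j                # complementarity: x_j(mu) * s_j(mu) = mu
    lb_x = mu / (2.0 * 1.0 * K)   # Theorem 3.2: x_j(mu) >= mu / (2 ||d|| K)
    lb_s = mu / K                 # Theorem 3.2: s_j(mu) >= mu / K
    assert x_j >= lb_x and s_j >= lb_s
    print(f"mu={mu:6.2f}  x_j={x_j}  lower={lb_x:.4f}   s_j={s_j:.3f}  lower={lb_s:.4f}")
```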
Proof of Theorem 3.2: By the Karush-Kuhn-Tucker optimality conditions of the dual pair of problems $P_\mu(d)$ and $D_\mu(d)$, we have that $s(\mu)^T x(\mu) = n\mu$. Since $s(\mu)^T x(\mu) \le \|s(\mu)\|_\infty\|x(\mu)\|_1$, it follows that $\|x(\mu)\|_1 \ge n\mu/\|s(\mu)\|_\infty$ and $\|s(\mu)\|_\infty \ge n\mu/\|x(\mu)\|_1$. Therefore, the first two inequalities follow from Theorem 3.1. For the remaining inequalities, observe that for each $j = 1,\ldots,n$, $\mu = s_j(\mu)x_j(\mu)$, $x_j(\mu) \le \|x(\mu)\|_1$, and $s_j(\mu) \le \|s(\mu)\|_\infty$. Therefore, the result follows again from Theorem 3.1.
q.e.d.
The following corollary uses Theorem 3.2 to provide lower bounds for solutions to perturbed problems.
Corollary 3.2 Let $\beta \in (0,1)$ be given and fixed, and let $\delta$ be such that $\delta \le \beta\rho(d)$, where $d \in F$ and $\rho(d) > 0$. If $d + \Delta d \in D$ is such that $\|\Delta d\| \le \delta$, then
$$\|x(\mu)\|_1 \ge \left(\frac{1-\beta}{1+\beta}\right)^2 \frac{n\mu}{2(\|d\|+\delta)K(d,\mu)}, \qquad \|s(\mu)\|_\infty \ge \left(\frac{1-\beta}{1+\beta}\right)^2 \frac{n\mu}{K(d,\mu)},$$
$$x_j(\mu) \ge \left(\frac{1-\beta}{1+\beta}\right)^2 \frac{\mu}{2(\|d\|+\delta)K(d,\mu)}, \qquad s_j(\mu) \ge \left(\frac{1-\beta}{1+\beta}\right)^2 \frac{\mu}{K(d,\mu)},$$
for all $j = 1,\ldots,n$, where $x(\mu)$ is the optimal solution to $P_\mu(d+\Delta d)$, $(y(\mu), s(\mu))$ is the optimal solution to $D_\mu(d+\Delta d)$, and $K(d,\mu)$ is the scalar defined in (2).
Proof: The proof follows the same logic as that of Corollary 3.1. q.e.d.

Note that for a fixed value of $\mu$, Corollary 3.2 shows that the norms of solutions to any suitably perturbed problem are uniformly lower-bounded by a fixed constant times the lower bounds on the solutions to the original problem. The last result of this section, Theorem 3.3, presents different lower bounds on components of $x(\mu)$ along the central trajectory that are relevant when $\mu \to \infty$ and when the primal feasible region is unbounded. We will prove this theorem in Section 5. In this theorem, $C_I(d_B)$ denotes a certain condition number that is independent of $\mu$ and only depends on the part of the data instance $d$ associated with a certain partition of the indices of the components of $x$. We will formally define this other condition number in Section 5.
Theorem 3.3 Let $x(\mu)$ denote the optimal solution to $P_\mu(d)$ and $(y(\mu), s(\mu))$ denote the optimal solution to $D_\mu(d)$. Then there exists a unique partition of the indices $\{1,\ldots,n\}$ into two subsets $B$ and $N$ such that
$$x_j(\mu) \ge \frac{\mu}{2\|d\|C_I(d_B)}, \qquad s_j(\mu) \le 2\|d\|C_I(d_B),$$
for all $j \in B$, and $x_j(\mu)$ is uniformly bounded for all $\mu > 0$ for all $j \in N$, where $d_B = (A_B, b, c_B)$ is a data instance in $\Re^{m|B| + m + |B|}$ composed of those elements of $d$ indexed by the set $B$.
Note that the set $B$ is the index set of components of $x$ that are unbounded over the feasible region of $P(d)$, and $N$ is the index set of components of $x$ that are bounded over the feasible region of $P(d)$. Theorem 3.3 states that as $\mu \to \infty$, $x_j(\mu)$ for $j \in B$ will go to $\infty$ at least linearly in $\mu$, and at a rate that is at least $1/(2\|d\|C_I(d_B))$. Of course, from Theorem 3.3, it also follows that when the feasible region of $P(d)$ is unbounded, that is, $B \ne \emptyset$, then $\lim_{\mu\to\infty}\|x(\mu)\|_1 = \infty$. Finally, note that Theorem 3.1 combined with Theorem 3.3 states that as $\mu \to \infty$, $x_j(\mu)$ for $j \in B$ goes to $\infty$ exactly linearly in $\mu$.

We end this section with the following remark concerning the scalar quantity $K(d,\mu)$ defined in (2). Rather than using the quantity $K(d,\mu)$, the results in this section could alternatively have been expressed in terms of the following scalar quantity:
$$R(d,\mu) = \left(\frac{\max\{\|d\|,\ n\mu\}}{\min\{\rho(d),\ \mu\}}\right)^2. \qquad (6)$$
One can think of the quantity $R(d,\mu)$ as the square of the condition number of the data instance $(A,b,c,\mu)$ associated with the problem $P_\mu(d)$, where now $\mu > 0$ is considered as part of the data. The use of $R(d,\mu)$ makes more sense intuitively relative to other results obtained in similar contexts (see for instance [28]). In this case, the norm on the data space would be defined as $\|(A,b,c,\mu)\| = \max\{\|A\|, \|b\|_1, \|c\|_\infty, n\mu\}$, and the corresponding distance to ill-posedness would be defined by $\rho(A,b,c,\mu) = \min\{\rho(d), \mu\}$. However, we prefer to use the scalar $K(d,\mu)$ of (2), which arises more naturally in the proofs and conveniently leads to slightly tighter results, and also because it more accurately conveys the behavior of the optimal solutions to $P_\mu(d)$ as $\mu$ changes.
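The partition $[B, N]$ in Theorem 3.3 can be probed numerically: a coordinate $j$ belongs to $B$ exactly when $x_j$ is unbounded over the feasible region, which happens precisely when the recession system $Au = 0$, $u \ge 0$, $u_j \ge 1$ is feasible. The sketch below, a diagnostic under that characterization and not a construction used in the paper, tests each index with one small feasibility LP via scipy.optimize.linprog on a hypothetical instance.

```python
import numpy as np
from scipy.optimize import linprog

def unbounded_indices(A):
    """Return B = {j : A u = 0, u >= 0, u_j >= 1 is feasible} and its complement N."""
    m, n = A.shape
    B = []
    for j in range(n):
        # Feasibility LP with zero objective: find u >= 0 with A u = 0 and u_j >= 1.
        bounds = [(0, None)] * n
        bounds[j] = (1, None)
        res = linprog(c=np.zeros(n), A_eq=A, b_eq=np.zeros(m), bounds=bounds,
                      method="highs")
        if res.status == 0:
            B.append(j)
    N = [j for j in range(n) if j not in B]
    return B, N

A = np.array([[1.0, 1.0, -1.0]])   # hypothetical instance with an unbounded feasible region
print(unbounded_indices(A))        # expected: B = [0, 1, 2], N = []
```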
4 Bounds on Changes in Optimal Solutions as the Data is Changed

In this section, we present upper bounds on changes in optimal solutions to $P_\mu(d)$ and $D_\mu(d)$ as the data $d = (A,b,c)$ is changed or as the barrier parameter $\mu$ is changed. The major results of this section are contained in Theorems 4.1, 4.2, 4.3, 4.4, and 4.5. We first present all five theorems; the proofs of the theorems are deferred to the end of the section. As in the previous section, the bounds stated in these theorems are not necessarily the best achievable. Rather, it has been our goal to establish bounds that are polynomial in terms of the condition number $C(d)$, the parameter $\mu$, the size of the data $\|d\|$, and the dimensions $m$ and $n$. The first theorem, Theorem 4.1, presents upper bounds on the sizes of changes in optimal solutions to $P_\mu(d)$ and $D_\mu(d)$ as the data $d = (A,b,c)$ is changed to data $d + \Delta d = (A+\Delta A, b+\Delta b, c+\Delta c)$ in a suitably small neighborhood of the original data $d$.
Theorem 4.1 Let $d = (A,b,c)$ be a data instance in $F$ such that $\rho(d) > 0$, and let $\mu > 0$ be given and fixed. Given $\beta \in (0,1)$ fixed, let $\Delta d = (\Delta A, \Delta b, \Delta c) \in D$ be such that $\|\Delta d\| \le \beta\rho(d)$. Then,
$$\|x(\mu) - \bar x(\mu)\|_1 \le \|\Delta d\|\,\frac{640\, n\, C(d)^2 K(d,\mu)^5 (\mu + \|d\|)}{\mu^2 (1-\beta)^6}, \qquad (7)$$
$$\|y(\mu) - \bar y(\mu)\|_\infty \le \|\Delta d\|\,\frac{640\, m\, C(d)^2 K(d,\mu)^5 (\mu + \|d\|)}{\mu^2 (1-\beta)^6}, \qquad (8)$$
$$\|s(\mu) - \bar s(\mu)\|_\infty \le \|\Delta d\|\,\frac{640\, m\, C(d)^2 K(d,\mu)^5 (\mu + \|d\|)^2}{\mu^2 (1-\beta)^6}, \qquad (9)$$
where $x(\mu)$ and $\bar x(\mu)$ are the optimal solutions to $P_\mu(d)$ and $P_\mu(d+\Delta d)$, respectively; $(y(\mu), s(\mu))$ and $(\bar y(\mu), \bar s(\mu))$ are the optimal solutions to $D_\mu(d)$ and $D_\mu(d+\Delta d)$, respectively; and $K(d,\mu)$ is the scalar defined in (2).
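As a rough numerical illustration of the linear-in-$\|\Delta d\|$ behavior that Theorem 4.1 bounds, the following sketch perturbs the right-hand side $b$ of the $m=1$, $n=2$ example from Section 3 and measures $\|x(\mu) - \bar x(\mu)\|_1$; the closed-form central-path solution $(b_1/2, b_1/2)$ used below is specific to that instance, and the experiment is a toy check rather than the theorem's worst case.

```python
import numpy as np

def x_of_mu(b1):
    # For A = [1, 1], c = (-1, -1): c^T x is constant on {x1 + x2 = b1}, so the
    # barrier minimizer is the analytic center (b1/2, b1/2) for every mu > 0.
    return np.array([b1 / 2.0, b1 / 2.0])

mu = 1.0
x = x_of_mu(1.0)
for delta in [0.2, 0.1, 0.05, 0.025]:
    x_pert = x_of_mu(1.0 + delta)         # perturbation Delta d = (0, delta, 0)
    change = np.abs(x_pert - x).sum()     # ||x(mu) - xbar(mu)||_1
    print(f"||Delta d|| = {delta:5.3f}   ||x - xbar||_1 = {change:5.3f}   ratio = {change/delta:.2f}")
```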
Notice that the bounds are linear in $\|\Delta d\|$, which indicates that the central trajectory associated with $d$ changes at most linearly and in direct proportion to perturbations in $d$, as long as the perturbations are smaller than $\beta\rho(d)$. Also, the bounds are polynomial in the condition number $C(d)$ and the barrier parameter $\mu$. Furthermore, notice that as $\mu \to 0$ these bounds diverge to $\infty$. This is because small perturbations in $d$ can produce extreme changes in the limit of the central trajectory associated with $d$ as $\mu \to 0$. The next theorem is important in that it establishes lower and upper bounds on the operator norm of the matrix $(AX^2(\mu)A^T)^{-1}$, where $x(\mu)$ is the optimal solution of $P_\mu(d)$. This is of central importance in interior point algorithms for linear programming that use Newton's method.
Theorem 4.2 Let $d = (A,b,c)$ be a data instance in $F$ such that $\rho(d) > 0$. Let $x(\mu)$ be the optimal solution of $P_\mu(d)$, where $\mu > 0$. Then
$$\frac{1}{mn}\left(\frac{1}{K(d,\mu)\|d\|}\right)^2 \;\le\; \|(AX^2(\mu)A^T)^{-1}\|_{1,\infty} \;\le\; 4m\left(\frac{C(d)K(d,\mu)}{\mu}\right)^2,$$
where $K(d,\mu)$ is the scalar defined in (2), and $\|\cdot\|_{1,\infty}$ denotes the operator norm from $(\Re^m, \|\cdot\|_1)$ to $(\Re^m, \|\cdot\|_\infty)$.
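The next sketch evaluates $(AX^2(\mu)A^T)^{-1}$ for the running $m=1$, $n=2$ example, where $x(\mu) = (1/2, 1/2)^T$, and compares its norm with the two bounds of Theorem 4.2; interpreting $\|\cdot\|_{1,\infty}$ as the maximum absolute entry is an assumption of this reconstruction.

```python
import numpy as np

A = np.array([[1.0, 1.0]])
m, n = A.shape
norm_d, rho_d = 1.0, 1.0                  # ||d|| and rho(d) for this instance
C = norm_d / rho_d

for mu in [0.1, 1.0, 10.0]:
    x = np.array([0.5, 0.5])              # x(mu) for this instance
    X = np.diag(x)
    Minv = np.linalg.inv(A @ X @ X @ A.T) # (A X^2(mu) A^T)^{-1}
    norm_Minv = np.max(np.abs(Minv))      # max-absolute-entry norm (assumed ||.||_{1,inf})
    K = C ** 2 + n * mu / rho_d
    lower = 1.0 / (m * n * (K * norm_d) ** 2)
    upper = 4.0 * m * (C * K / mu) ** 2
    print(f"mu={mu:5.2f}  norm={norm_Minv:.3f}  bounds=({lower:.4f}, {upper:.1f})")
```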
Notice that the bounds in the theorem only depend on the condition number $C(d)$, the distance to ill-posedness $\rho(d)$, the size of the data instance $d = (A,b,c)$, the barrier parameter $\mu$, and the dimensions $m$ and $n$. Also note that as $\mu \to 0$, the upper bound on $\|(AX^2(\mu)A^T)^{-1}\|_{1,\infty}$ in the theorem goes to $\infty$ quadratically in $1/\mu$ in the limit. Incidentally, the matrix $(AX^2(\mu)A^T)^{-1}$ differs from the inverse of the Hessian of the dual objective function at its optimum by the scalar $-\mu^2$. Theorem 4.3 presents upper bounds on the sizes of changes in optimal solutions to $P_\mu(d)$ and $D_\mu(d)$ as the barrier parameter $\mu$ is changed:
Theorem 4.3 Let $d = (A,b,c)$ be a data instance in $F$ such that $\rho(d) > 0$. Given $\mu, \bar\mu > 0$, let $x(\mu)$ and $x(\bar\mu)$ be the optimal solutions of $P_\mu(d)$ and $P_{\bar\mu}(d)$, respectively; and let $(y(\mu), s(\mu))$ and $(y(\bar\mu), s(\bar\mu))$ be the optimal solutions of $D_\mu(d)$ and $D_{\bar\mu}(d)$, respectively. Then
$$\|x(\mu) - x(\bar\mu)\|_1 \le \frac{n}{\mu\bar\mu}\,|\mu - \bar\mu|\, K(d,\mu)K(d,\bar\mu)\,\|d\|, \qquad (10)$$
$$\|y(\mu) - y(\bar\mu)\|_\infty \le \frac{4m}{\mu\bar\mu}\,|\mu - \bar\mu|\, K(d,\mu)K(d,\bar\mu)\,\|d\|\,C(d)^2, \qquad (11)$$
$$\|s(\mu) - s(\bar\mu)\|_\infty \le \frac{4m}{\mu\bar\mu}\,|\mu - \bar\mu|\, K(d,\mu)K(d,\bar\mu)\,\|d\|^2 C(d)^2, \qquad (12)$$
where $K(d,\mu)$ is the scalar defined in (2).

Notice that these bounds are linear in $|\mu - \bar\mu|$, which indicates that solutions along the central trajectory associated with $d$ change at most linearly and in direct proportion to changes in $\mu$. Also, the bounds are polynomial in the condition number $C(d)$ and the barrier parameter $\mu$. The next result, Corollary 4.1, states upper bounds on the first derivatives of the optimal solutions $x(\mu)$ and $(y(\mu), s(\mu))$ of $P_\mu(d)$ and $D_\mu(d)$, respectively, with respect to the barrier parameter $\mu$. We first define the derivatives along the central trajectory as follows:
$$\dot x(\mu) = \lim_{\bar\mu\to\mu}\frac{x(\bar\mu) - x(\mu)}{\bar\mu - \mu}, \qquad \dot y(\mu) = \lim_{\bar\mu\to\mu}\frac{y(\bar\mu) - y(\mu)}{\bar\mu - \mu}, \qquad \dot s(\mu) = \lim_{\bar\mu\to\mu}\frac{s(\bar\mu) - s(\mu)}{\bar\mu - \mu}.$$
See Adler and Monteiro [1] for the application of these derivatives to the limiting behavior of central trajectories in linear programming.
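A finite-difference check of these derivatives is easy on the running example, whose central trajectory has the closed form $x(\mu) = (1/2, 1/2)^T$, $s(\mu) = \mu X(\mu)^{-1}e = (2\mu, 2\mu)^T$, and $y(\mu) = -1 - 2\mu$ (from $A^T y + s = c$); this closed form is a property of that particular instance, not a general formula.

```python
import numpy as np

def central_path(mu):
    # Closed-form central path of the m=1, n=2 example A=[1,1], b=[1], c=(-1,-1).
    x = np.array([0.5, 0.5])
    s = mu / x                        # complementarity X s = mu e
    y = np.array([-1.0 - 2.0 * mu])   # from A^T y + s = c
    return x, y, s

mu, h = 1.0, 1e-5
x0, y0, s0 = central_path(mu)
x1, y1, s1 = central_path(mu + h)
print("x_dot ~", (x1 - x0) / h)   # expected (0, 0)
print("y_dot ~", (y1 - y0) / h)   # expected (-2,)
print("s_dot ~", (s1 - s0) / h)   # expected (2, 2)
```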
Corollary 4.1 Let $d = (A,b,c)$ be a data instance in $F$ such that $\rho(d) > 0$, and let $\mu > 0$ be given and fixed. Let $x(\mu)$ and $(y(\mu), s(\mu))$ be the optimal solutions of $P_\mu(d)$ and $D_\mu(d)$, respectively. Then
$$\|\dot x(\mu)\|_1 \le \frac{n}{\mu^2}\,K(d,\mu)^2\,\|d\|, \qquad \|\dot y(\mu)\|_\infty \le \frac{4m}{\mu^2}\,K(d,\mu)^2\,\|d\|\,C(d)^2, \qquad \|\dot s(\mu)\|_\infty \le \frac{4m}{\mu^2}\,K(d,\mu)^2\,\|d\|^2 C(d)^2,$$
where $K(d,\mu)$ is the scalar defined in (2).
The proof of this corollary follows immediately from Theorem 4.3. Theorem 4.4 presents an upper bound on the size of the change in the optimal objective function value of $P_\mu(d)$ as the data $d$ is changed to data $d + \Delta d$ in a specific neighborhood of the original data $d$. Before stating this theorem, we introduce the following notation. Let $d$ be a data instance in $F$; then we denote by $z_\mu(d)$ the corresponding optimal objective value associated with $P_\mu(d)$, keeping the parameter $\mu$ fixed, that is,
$$z_\mu(d) = \min\{c^T x + \mu\, p(x) : Ax = b,\ x > 0\}.$$
Theorem 4.4 Let $d = (A,b,c)$ be a data instance in $F$ such that $\rho(d) > 0$, and let $\mu \ge 0$ be given and fixed. Given $\beta \in (0,1)$ fixed, let $\Delta d = (\Delta A, \Delta b, \Delta c) \in D$ be such that $\|\Delta d\| \le \beta\rho(d)$. Then,
$$|z_\mu(d+\Delta d) - z_\mu(d)| \le 3\|\Delta d\|\left(\frac{1+\beta}{1-\beta}\right)^4 K(d,\mu)^2, \qquad (13)$$
where $K(d,\mu)$ is the scalar defined in (2).
Observe that, as in Theorem 4.1, the upper bound on the change in the objective function value is linear in $\|\Delta d\|$ so long as $\|\Delta d\|$ is no larger than $\beta\rho(d)$, which indicates that optimal objective values along the central trajectory will change at most linearly and in direct proportion to changes in $d$, for small changes in $d$. Note also that the bound is polynomial in the condition number $C(d)$ and in the barrier parameter $\mu$. Last of all, Theorem 4.5 presents an upper bound on the size of the change in the optimal objective function value of $P_\mu(d)$ as the barrier parameter $\mu$ is changed. As before, it is convenient to introduce the following notation. Let $d$ be a data instance in $F$; then we denote by $z(\mu)$ the corresponding optimal objective value associated with $P_\mu(d)$, keeping the data instance $d$ fixed, that is,
$$z(\mu) = \min\{c^T x + \mu\, p(x) : Ax = b,\ x > 0\}.$$
Theorem 4.5 Let $d = (A,b,c)$ be a data instance in $F$ such that $\rho(d) > 0$. Then,
$$|z(\mu) - z(\bar\mu)| \le |\mu - \bar\mu|\, n\left[\ln(2) + \ln\left(K(d,\mu)K(d,\bar\mu)\right) + |\ln(\|d\|)| + \max\{|\ln(\mu)|, |\ln(\bar\mu)|\}\right],$$
for given $\mu, \bar\mu > 0$, where $K(d,\mu)$ is the scalar defined in (2). As in Theorem 4.3, the upper bound given by this theorem is linear in $|\mu - \bar\mu|$, which indicates that optimal objective function values along the central trajectory associated with $d$ change at most linearly and in direct proportion to changes in $\mu$. Also, the bounds are logarithmic in the condition number $C(d)$ and in the barrier parameter $\mu$.
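A quick numeric check of Theorem 4.5 on the running example, for which $z(\mu) = c^T x(\mu) + \mu\, p(x(\mu)) = -1 + 2\mu\ln 2$; the values of $\mu$ and $\bar\mu$ below are arbitrary illustrative choices.

```python
import numpy as np

def z(mu):
    # z(mu) = c^T x(mu) + mu * p(x(mu)) for the running example; x(mu) = (1/2, 1/2).
    return -1.0 + 2.0 * mu * np.log(2.0)

def K(mu):
    return 1.0 + 2.0 * mu   # K(d, mu) for this instance

mu, mubar, n, norm_d = 0.5, 2.0, 2, 1.0
lhs = abs(z(mu) - z(mubar))
rhs = abs(mu - mubar) * n * (np.log(2) + np.log(K(mu) * K(mubar))
                             + abs(np.log(norm_d))
                             + max(abs(np.log(mu)), abs(np.log(mubar))))
print(f"|z(mu) - z(mubar)| = {lhs:.3f}  <=  bound = {rhs:.3f}")
```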
Remark 1 Since $z(\mu) = c^T x(\mu) + \mu\, p(x(\mu))$, it follows from the smoothness of $x(\mu)$ that $z(\mu)$ is also a smooth function, and from Theorem 4.5 it then follows that
$$|\dot z(\mu)| \le n\left[\ln(2) + 2\ln(K(d,\mu)) + |\ln(\|d\|)| + |\ln(\mu)|\right].$$
Before proving the five theorems, we first prove a variety of intermediary results that will be used in the proofs of the five theorems. The following proposition is a key proposition that relates the distance to ill-posedness of a data instance $d = (A,b,c)$ to the smallest eigenvalue of the matrix $AA^T$.
Proposition 4.1 Let $d = (A,b,c) \in F$ and $\rho(d) > 0$. Then
(i) $\frac{1}{m}\|(AA^T)^{-1}\|_2 \le \|(AA^T)^{-1}\|_{1,\infty} \le \|(AA^T)^{-1}\|_2$, and
(ii) $\rho(d) \le \sqrt{m\,\lambda_1(AA^T)}$,
where $\lambda_1(AA^T)$ denotes the smallest eigenvalue of $AA^T$.

Proof: From Lemma 2.1, $A$ has rank $m$, so that $(AA^T)^{-1}$ exists. The proof of (i) follows directly from Proposition 2.1, inequalities (i) and (iii). For the proof of (ii), let $\lambda_1 = \lambda_1(AA^T)$. There exists $v \in \Re^m$ with $\|v\|_2 = 1$ and $AA^Tv = \lambda_1 v$, so that $\|A^Tv\|_2^2 = v^TAA^Tv = \lambda_1$. Let $\Delta A = -vv^TA$ and $\Delta b = \delta v$ for any $\delta > 0$ and small. Then, $(A+\Delta A)^Tv = 0$ and $(b+\Delta b)^Tv = b^Tv + \delta \ne 0$, for all $\delta > 0$ small. Hence, $(A+\Delta A)x = b+\Delta b$ is an inconsistent system of equations for all $\delta > 0$ and small. Therefore, by Proposition 2.1, inequality (iv),
$$\rho(d) \le \max\{\|\Delta A\|, \|\Delta b\|_1\} = \|\Delta A\| \le \sqrt{m}\,\|\Delta A\|_2 = \sqrt{m}\,\|A^Tv\|_2 = \sqrt{m\lambda_1},$$
thus proving (ii).

q.e.d.
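Part (ii) is easy to check numerically on the running example, where $A = [1, 1]$ and $\rho(d) = 1$; numpy's symmetric eigenvalue routine stands in for $\lambda_1(AA^T)$.

```python
import numpy as np

A = np.array([[1.0, 1.0]])
m = A.shape[0]
rho_d = 1.0                                   # distance to ill-posedness of the example
lam_min = np.linalg.eigvalsh(A @ A.T).min()   # smallest eigenvalue of A A^T
print(rho_d, "<=", np.sqrt(m * lam_min))      # 1.0 <= sqrt(2) ~ 1.414
assert rho_d <= np.sqrt(m * lam_min) + 1e-12
```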
The next three results establish upper and lower bounds on certain quantities as the data $d = (A,b,c)$ is changed to data $d + \Delta d = (A+\Delta A, b+\Delta b, c+\Delta c)$ in a specific neighborhood of the original data $d$, or as the parameter $\mu$ is changed along the central trajectory. These results will also be used in the proofs of the theorems of this section.

Lemma 4.1 Suppose that $d = (A,b,c) \in F$ and $\rho(d) > 0$. Let $\beta \in (0,1)$ be given and fixed, and let $\Delta d$ be such that $\|\Delta d\| \le \beta\rho(d)$. If $x(\mu)$ is the optimal solution to $P_\mu(d)$, and $\bar x(\mu)$ is the optimal solution to $P_\mu(d+\Delta d)$, then for $j = 1,\ldots,n$,
$$\frac{1}{32}\left(\frac{(1-\beta)\mu}{\|d\|K(d,\mu)}\right)^2 \;\le\; x_j(\mu)\bar x_j(\mu) \;\le\; 4\left(\frac{K(d,\mu)}{1-\beta}\right)^2, \qquad (14)$$
where $\mu > 0$ is given and fixed, and $K(d,\mu)$ is the scalar defined in (2).
Proof: Let $x = x(\mu)$ and $\bar x = \bar x(\mu)$. From Theorem 3.1 we have that $\|x\|_1 \le K(d,\mu)$, and from Corollary 3.1 we also have that $\|\bar x\|_1 \le \frac{4}{(1-\beta)^2}K(d,\mu)$. Therefore, we obtain $x_j\bar x_j \le \|x\|_1\|\bar x\|_1 \le 4K(d,\mu)^2/(1-\beta)^2$ for all $j = 1,\ldots,n$. On the other hand, from Theorem 3.2 and Corollary 3.2, it follows that
$$x_j \ge \frac{\mu}{2\|d\|K(d,\mu)}, \qquad \bar x_j \ge \frac{(1-\beta)^2\mu}{8(\|d\|+\|\Delta d\|)K(d,\mu)} \ge \frac{(1-\beta)^2\mu}{16\|d\|K(d,\mu)},$$
for all $j = 1,\ldots,n$. Therefore,
$$x_j\bar x_j \ge \frac{1}{32}\left(\frac{(1-\beta)\mu}{\|d\|K(d,\mu)}\right)^2,$$
for all $j = 1,\ldots,n$.
q.e.d.
Lemma 4.2 Suppose that $d = (A,b,c) \in F$ and $\rho(d) > 0$. Let $\beta \in (0,1)$ be given and fixed, and let $\Delta d$ be such that $\|\Delta d\| \le \beta\rho(d)$. If $x = x(\mu)$ is the optimal solution to $P_\mu(d)$, and $\bar x = \bar x(\mu)$ is the optimal solution to $P_\mu(d+\Delta d)$, then
$$\frac{1}{4mn}\left(\frac{1-\beta}{K(d,\mu)\|d\|}\right)^2 \;\le\; \|(A\bar XXA^T)^{-1}\|_{1,\infty} \;\le\; 32m\left(\frac{C(d)K(d,\mu)}{\mu(1-\beta)}\right)^2,$$
where $\mu > 0$ is given and fixed, and $K(d,\mu)$ is the scalar defined in (2).
Proof: Using identical logic to Proposition 4.1 part (i), we have that
$$\|(A\bar XXA^T)^{-1}\|_{1,\infty} \le \|(A\bar XXA^T)^{-1}\|_2 \le \frac{\|(AA^T)^{-1}\|_2}{\min_j\{x_j\bar x_j\}}. \qquad (15)$$
Now, by applying Proposition 4.1, part (ii), and Lemma 4.1, we obtain that
$$\|(A\bar XXA^T)^{-1}\|_{1,\infty} \le \frac{32\|d\|^2K(d,\mu)^2}{(1-\beta)^2\mu^2\,\lambda_1(AA^T)} \le \frac{32m\|d\|^2K(d,\mu)^2}{(1-\beta)^2\mu^2\,\rho(d)^2} = \frac{32mC(d)^2K(d,\mu)^2}{(1-\beta)^2\mu^2}.$$
On the other hand, by identical logic to Proposition 4.1 part (i),
$$\|(A\bar XXA^T)^{-1}\|_{1,\infty} \ge \frac{1}{m}\|(A\bar XXA^T)^{-1}\|_2 \ge \frac{1}{m\,\|A\bar XXA^T\|_2} \ge \frac{1}{m\,\max_j\{x_j\bar x_j\}\,\lambda_m(AA^T)}.$$
Now, by applying Proposition 2.1, part (iv), and Lemma 4.1, we obtain that
$$\|(A\bar XXA^T)^{-1}\|_{1,\infty} \ge \frac{(1-\beta)^2}{4mK(d,\mu)^2\lambda_m(AA^T)} = \frac{(1-\beta)^2}{4mK(d,\mu)^2\|A\|_2^2} \ge \frac{(1-\beta)^2}{4mnK(d,\mu)^2\|A\|^2} \ge \frac{(1-\beta)^2}{4mnK(d,\mu)^2\|d\|^2},$$
where $\lambda_m(AA^T)$ is the largest eigenvalue of $AA^T$.
q.e.d.
Lemma 4.3 Let $d = (A,b,c)$ be a data instance in $F$ such that $\rho(d) > 0$. Let $x = x(\mu)$ and $\bar x = x(\bar\mu)$ be the optimal solutions of $P_\mu(d)$ and $P_{\bar\mu}(d)$, respectively, where $\mu, \bar\mu > 0$. Then
$$\frac{1}{mnK(d,\mu)K(d,\bar\mu)\|d\|^2} \;\le\; \|(A\bar XXA^T)^{-1}\|_{1,\infty} \;\le\; \frac{4mC(d)^2K(d,\mu)K(d,\bar\mu)}{\mu\bar\mu},$$
where $K(d,\mu)$ is the scalar defined in (2).
Proof: Following the proof of Lemma 4.2, we have from Proposition 4.1 and Theorem 3.2 that
$$\|(A\bar XXA^T)^{-1}\|_{1,\infty} \le \frac{m}{\min_j\{x_j\bar x_j\}\,\rho(d)^2} \le \frac{4m\|d\|^2K(d,\mu)K(d,\bar\mu)}{\mu\bar\mu\,\rho(d)^2} = \frac{4mC(d)^2K(d,\mu)K(d,\bar\mu)}{\mu\bar\mu}.$$
On the other hand, we have again from Proposition 2.1, Theorem 3.1, and Proposition 4.1 that
$$\|(A\bar XXA^T)^{-1}\|_{1,\infty} \ge \frac{\|(AA^T)^{-1}\|_2}{m\,\max_j\{x_j\bar x_j\}} \ge \frac{1}{mK(d,\mu)K(d,\bar\mu)\lambda_m(AA^T)} = \frac{1}{mK(d,\mu)K(d,\bar\mu)\|A\|_2^2} \ge \frac{1}{mnK(d,\mu)K(d,\bar\mu)\|A\|^2} \ge \frac{1}{mnK(d,\mu)K(d,\bar\mu)\|d\|^2}.$$
q.e.d.

Note that the proof of Theorem 4.2 follows as an immediate application of Lemma 4.3, by setting $\bar\mu = \mu$. We are now ready to prove Theorem 4.1.
Proof of Theorem 4.1: Let $x = x(\mu)$ and $\bar x = \bar x(\mu)$ be the optimal solutions to $P_\mu(d)$ and $P_\mu(d+\Delta d)$, respectively; and let $(y,s) = (y(\mu), s(\mu))$ and $(\bar y, \bar s) = (\bar y(\mu), \bar s(\mu))$ be the optimal solutions to $D_\mu(d)$ and $D_\mu(d+\Delta d)$, respectively. Then from the Karush-Kuhn-Tucker optimality conditions we have that:
$$Xs = \mu e, \quad \bar X\bar s = \mu e, \quad A^Ty + s = c, \quad (A+\Delta A)^T\bar y + \bar s = c + \Delta c, \quad Ax = b, \quad (A+\Delta A)\bar x = b + \Delta b, \quad x > 0, \quad \bar x > 0.$$
Therefore,
$$x - \bar x = \frac{1}{\mu}\bar XX(\bar s - s) = \frac{1}{\mu}\bar XX\left[(c + \Delta c - (A+\Delta A)^T\bar y) - (c - A^Ty)\right] = \frac{1}{\mu}\bar XX(\Delta c - \Delta A^T\bar y) + \frac{1}{\mu}\bar XXA^T(y - \bar y). \qquad (16)$$
On the other hand, $A(x - \bar x) = b - A\bar x$. Since $A$ has rank $m$ (otherwise $\rho(d) = 0$), $P = A\bar XXA^T$ is a positive definite matrix. By combining these statements together with (16), we obtain
$$b - A\bar x = \frac{1}{\mu}A\bar XX(\Delta c - \Delta A^T\bar y) + \frac{1}{\mu}P(y - \bar y),$$
and so
$$\mu P^{-1}(b - A\bar x) = P^{-1}A\bar XX(\Delta c - \Delta A^T\bar y) + y - \bar y.$$
Therefore, we have the following identity:
$$y - \bar y = \mu P^{-1}(b - A\bar x) - P^{-1}A\bar XX(\Delta c - \Delta A^T\bar y). \qquad (17)$$
From this identity, it follows that
$$\|y - \bar y\|_\infty \le \|P^{-1}\|_{1,\infty}\left(\mu\|b - A\bar x\|_1 + \|A\|\,\|\bar XX(\Delta c - \Delta A^T\bar y)\|_1\right). \qquad (18)$$
Note that
$$\|\bar XX(\Delta c - \Delta A^T\bar y)\|_1 \le \|\bar XX\|_{\infty,1}\|\Delta c - \Delta A^T\bar y\|_\infty \le \|x\|_1\|\bar x\|_1\|\Delta c - \Delta A^T\bar y\|_\infty. \qquad (19)$$
From Corollary 3.1, we have that
$$\|b - A\bar x\|_1 \le \|\Delta d\|(1 + \|\bar x\|_1) \le \|\Delta d\|\left(1 + \frac{4}{(1-\beta)^2}K(d,\mu)\right) \le \frac{5\|\Delta d\|}{(1-\beta)^2}K(d,\mu), \qquad (20)$$
$$\|\Delta c - \Delta A^T\bar y\|_\infty \le \|\Delta d\|(1 + \|\bar y\|_\infty) \le \|\Delta d\|\left(1 + \frac{4}{(1-\beta)^2}K(d,\mu)\right) \le \frac{5\|\Delta d\|}{(1-\beta)^2}K(d,\mu). \qquad (21)$$
Therefore, by combining (18), (19), (20), and (21), and by using Theorem 3.1, Corollary 3.1, and Lemma 4.2, we obtain the following bound on $\|y - \bar y\|_\infty$:
$$\|y - \bar y\|_\infty \le 32m\left(\frac{C(d)K(d,\mu)}{(1-\beta)\mu}\right)^2\frac{5\|\Delta d\|K(d,\mu)}{(1-\beta)^2}\left(\mu + \frac{4\|d\|K(d,\mu)^2}{(1-\beta)^2}\right) \le 640\,m\,\|\Delta d\|\,\frac{C(d)^2K(d,\mu)^5(\mu + \|d\|)}{\mu^2(1-\beta)^6},$$
thereby demonstrating the bound (8) on $\|y - \bar y\|_\infty$. Now, by substituting identity (17) into equation (16), we obtain
$$x - \bar x = \frac{1}{\mu}\bar XX\left[I - A^TP^{-1}A\bar XX\right](\Delta c - \Delta A^T\bar y) + \bar XXA^TP^{-1}(b - A\bar x) = \frac{1}{\mu}D^{\frac12}\left[I - D^{\frac12}A^TP^{-1}AD^{\frac12}\right]D^{\frac12}(\Delta c - \Delta A^T\bar y) + DA^TP^{-1}(b - A\bar x),$$
where $D = \bar XX$. Observe that the matrix $Q = I - D^{\frac12}A^TP^{-1}AD^{\frac12}$ is a projection matrix, and so $\|Qv\|_2 \le \|v\|_2$ for all $v \in \Re^n$. Bounding the two terms on the right in the same manner as above (using $\|\cdot\|_1 \le \sqrt{n}\|\cdot\|_2$, the projection property of $Q$, (20), (21), Lemma 4.1, Lemma 4.2, Theorem 3.1, and Corollary 3.1) yields the bound (7) on $\|x(\mu) - \bar x(\mu)\|_1$. Finally, since $s = c - A^Ty$ and $\bar s = c + \Delta c - (A+\Delta A)^T\bar y$, we have $\|s - \bar s\|_\infty \le \|\Delta d\|(1 + \|\bar y\|_\infty) + \|d\|\,\|y - \bar y\|_\infty$, and combining this inequality with Corollary 3.1 and the bound (8) yields (9).

q.e.d.

Proof of Theorem 4.3: Let $x = x(\mu)$ and $\bar x = x(\bar\mu)$ be the optimal solutions to $P_\mu(d)$ and $P_{\bar\mu}(d)$, respectively; and let $(y,s) = (y(\mu), s(\mu))$ and $(\bar y, \bar s) = (y(\bar\mu), s(\bar\mu))$ be the optimal solutions to $D_\mu(d)$ and $D_{\bar\mu}(d)$, respectively. From the Karush-Kuhn-Tucker optimality conditions,
$$Xs = \mu e, \quad \bar X\bar s = \bar\mu e, \quad A^Ty + s = c, \quad A^T\bar y + \bar s = c, \quad Ax = b, \quad A\bar x = b, \quad x > 0, \quad \bar x > 0.$$
Therefore,
$$x - \bar x = \frac{1}{\mu\bar\mu}\bar XX(\mu\bar s - \bar\mu s) = \frac{1}{\mu\bar\mu}\bar XX\left[\mu(c - A^T\bar y) - \bar\mu(c - A^Ty)\right] = \frac{1}{\mu\bar\mu}\bar XX\left[(\mu - \bar\mu)c - A^T(\mu\bar y - \bar\mu y)\right]. \qquad (22)$$
On the other hand, $A(x - \bar x) = b - b = 0$. Since $A$ has rank $m$ (otherwise $\rho(d) = 0$), $P = A\bar XXA^T$ is a positive definite matrix. By combining these statements together with (22), we obtain
$$0 = \frac{1}{\mu\bar\mu}A\bar XX\left[(\mu - \bar\mu)c - A^T(\mu\bar y - \bar\mu y)\right],$$
and so
$$P(\mu\bar y - \bar\mu y) = (\mu - \bar\mu)A\bar XXc, \qquad \text{equivalently} \qquad \mu\bar y - \bar\mu y = (\mu - \bar\mu)P^{-1}A\bar XXc. \qquad (23)$$
By substituting identity (23) into equation (22) and by letting $D = \bar XX$, we obtain:
$$x - \bar x = \frac{\mu - \bar\mu}{\mu\bar\mu}\,\bar XX\left[c - A^TP^{-1}A\bar XXc\right] = \frac{\mu - \bar\mu}{\mu\bar\mu}\,D\left[c - A^TP^{-1}ADc\right] = \frac{\mu - \bar\mu}{\mu\bar\mu}\,D^{\frac12}\left[I - D^{\frac12}A^TP^{-1}AD^{\frac12}\right]D^{\frac12}c.$$
Observe that the matrix $Q = I - D^{\frac12}A^TP^{-1}AD^{\frac12}$ is a projection matrix, and so $\|Qv\|_2 \le \|v\|_2$ for all $v \in \Re^n$. Therefore, using Theorem 3.1,
$$\|x - \bar x\|_1 \le \sqrt{n}\,\|x - \bar x\|_2 \le \frac{|\mu - \bar\mu|}{\mu\bar\mu}\,\sqrt{n}\,\|D^{\frac12}\|_2^2\,\|c\|_2 \le \frac{|\mu - \bar\mu|}{\mu\bar\mu}\,\sqrt{n}\,\max_j\{x_j\bar x_j\}\,\sqrt{n}\,\|c\|_\infty \le \frac{n}{\mu\bar\mu}\,|\mu - \bar\mu|\,K(d,\mu)K(d,\bar\mu)\,\|d\|,$$
which is the bound (10). The bounds (11) and (12) on $\|y(\mu) - y(\bar\mu)\|_\infty$ and $\|s(\mu) - s(\bar\mu)\|_\infty$ are obtained by similar arguments, using identity (23) together with Lemma 4.3 and Theorem 3.1.

q.e.d.

Proof of Theorem 4.4: For the given $\mu$, consider the Lagrangian functions $L(x,y) = c^Tx + \mu p(x) + y^T(b - Ax)$ and $\bar L(x,y) = (c+\Delta c)^Tx + \mu p(x) + y^T(b + \Delta b - (A+\Delta A)x)$, and define $\phi(x,y) = L(x,y) - \bar L(x,y) = -\Delta c^Tx - y^T\Delta b + y^T\Delta Ax$. Since $P_\mu(d)$ and $P_\mu(d+\Delta d)$ are both solvable, Lagrangian duality gives
$$z_\mu(d) = \max_y\min_{x>0}L(x,y) = \min_{x>0}\max_y L(x,y), \qquad z_\mu(d+\Delta d) = \max_y\min_{x>0}\bar L(x,y) = \min_{x>0}\max_y\bar L(x,y).$$
Hence, if $(x(\mu), y(\mu))$ is a pair of optimal solutions to the primal and dual programs corresponding to $d$, and $(\bar x(\mu), \bar y(\mu))$ is a pair of optimal solutions to the primal and dual programs corresponding to $d + \Delta d$, then
$$z_\mu(d) = L(x(\mu), y(\mu)) = \max_y L(x(\mu), y) = \max_y\left\{\bar L(x(\mu), y) + \phi(x(\mu), y)\right\} \ge \bar L(x(\mu), \bar y(\mu)) + \phi(x(\mu), \bar y(\mu)) \ge z_\mu(d+\Delta d) + \phi(x(\mu), \bar y(\mu)).$$
Thus, $z_\mu(d) - z_\mu(d+\Delta d) \ge \phi(x(\mu), \bar y(\mu))$. Similarly, we can prove that $z_\mu(d) - z_\mu(d+\Delta d) \le \phi(\bar x(\mu), y(\mu))$. Therefore, we obtain the following bounds: either
$$|z_\mu(d+\Delta d) - z_\mu(d)| \le |\phi(x(\mu), \bar y(\mu))|, \quad\text{or}\quad |z_\mu(d+\Delta d) - z_\mu(d)| \le |\phi(\bar x(\mu), y(\mu))|.$$
On the other hand, using Holder's inequality and the bounds from Corollary 3.1 we have
$$|\phi(\bar x(\mu), y(\mu))| = |\Delta c^T\bar x(\mu) + y(\mu)^T\Delta b - y(\mu)^T\Delta A\bar x(\mu)| \le \|\Delta c\|_\infty\|\bar x(\mu)\|_1 + \|y(\mu)\|_\infty\|\Delta b\|_1 + \|y(\mu)\|_\infty\|\Delta A\|\,\|\bar x(\mu)\|_1$$
$$\le \|\Delta d\|\left(\|\bar x(\mu)\|_1 + \|y(\mu)\|_\infty + \|y(\mu)\|_\infty\|\bar x(\mu)\|_1\right) \le 3\|\Delta d\|\left(\frac{1+\beta}{1-\beta}\right)^4 K(d,\mu)^2.$$
Similarly, we can show that
$$|\phi(x(\mu), \bar y(\mu))| \le 3\|\Delta d\|\left(\frac{1+\beta}{1-\beta}\right)^4 K(d,\mu)^2,$$
and the result follows.
q.e.d.
Finally, we prove Theorem 4.5.
Proof of Theorem 4.5: Let $x(\mu)$ and $x(\bar\mu)$ be the optimal solutions to $P_\mu(d)$ and $P_{\bar\mu}(d)$, respectively; and $(y(\mu), s(\mu))$ and $(y(\bar\mu), s(\bar\mu))$ be the optimal solutions to $D_\mu(d)$ and $D_{\bar\mu}(d)$, respectively. As in Theorem 4.4, for given $\mu, \bar\mu > 0$, consider the following Lagrangian functions: $L(x,y) = c^Tx + \mu p(x) + y^T(b - Ax)$ and $\bar L(x,y) = c^Tx + \bar\mu p(x) + y^T(b - Ax)$. Define $\phi(x,y) = L(x,y) - \bar L(x,y) = (\mu - \bar\mu)p(x)$. By a similar argument as in the proof of Theorem 4.4, we have that $z(\mu) - z(\bar\mu) \ge \phi(x(\mu), y(\bar\mu))$ and $z(\mu) - z(\bar\mu) \le \phi(x(\bar\mu), y(\mu))$. Therefore, we obtain the following bounds: either
$$|z(\mu) - z(\bar\mu)| \le |\phi(x(\mu), y(\bar\mu))| = |\mu - \bar\mu|\,|p(x(\mu))|, \quad\text{or}\quad |z(\mu) - z(\bar\mu)| \le |\phi(x(\bar\mu), y(\mu))| = |\mu - \bar\mu|\,|p(x(\bar\mu))|.$$
Therefore,
$$|z(\mu) - z(\bar\mu)| \le |\mu - \bar\mu|\,\max\{|p(x(\mu))|,\ |p(x(\bar\mu))|\}.$$
On the other hand, from Theorem 3.1 and Theorem 3.2, we have
$$\frac{\mu}{2\|d\|K(d,\mu)} \le x_j(\mu) \le K(d,\mu), \quad \text{for all } j = 1,\ldots,n.$$
Hence,
$$-n\ln\!\left(\frac{2\|d\|K(d,\mu)}{\mu}\right) \le -p(x(\mu)) \le n\ln(K(d,\mu)),$$
so that
$$|p(x(\mu))| \le n\max\left\{\ln\!\left(\frac{2\|d\|K(d,\mu)}{\mu}\right),\ \ln(K(d,\mu))\right\} \le n\left[\ln(2) + \ln(K(d,\mu)K(d,\bar\mu)) + |\ln(\|d\|)| + \max\{|\ln(\mu)|, |\ln(\bar\mu)|\}\right].$$
Similarly, using $\bar\mu$ instead of $\mu$ we also obtain
$$|p(x(\bar\mu))| \le n\left[\ln(2) + \ln(K(d,\mu)K(d,\bar\mu)) + |\ln(\|d\|)| + \max\{|\ln(\mu)|, |\ln(\bar\mu)|\}\right],$$
and the result follows.
q.e.d.
5 Bounds for Analytic Center Problems

In this section, we study some elementary properties of primal and dual analytic center problems, which are used in the proof of Theorem 3.3, presented at the end of this section. Given a data instance $d = (A,b,c)$ for a linear program, the analytic center problem in equality form, denoted $AE(d)$, is defined as:
$$AE(d): \quad \min\{p(x) : Ax = b,\ x > 0\}.$$
Structurally, the program $AE(d)$ is closely related to the central trajectory problem $P_\mu(d)$, and was first extensively studied by Sonnevend, see [26] and [27]. In terms of data dependence, note that the program $AE(d)$ does not depend on the data $c$. It is well known that $AE(d)$ has a unique solution when its feasible region is bounded and non-empty. We call this unique solution the (primal) analytic center. Similarly, we define the analytic center problem in inequality form, denoted $AI(d)$, as:
$$AI(d): \quad \max\{-p(s) : s = c - A^Ty,\ s > 0\}.$$
In terms of data dependence, the program $AI(d)$ does not depend on the data $b$. The program $AI(d)$ has a unique solution when its feasible region is bounded and non-empty, and we call this unique solution the (dual) analytic center. Note in particular that the two programs $AE(d)$ and $AI(d)$ are not duals of each other. (In fact, direct calculation reveals that $AE(d)$ and $AI(d)$ cannot both be solvable, since at least one of $AE(d)$ and $AI(d)$ must be unbounded.) As we will show, the study of these problems is relevant to obtaining certain results on the central trajectory problem.

We will now present some particular upper bounds on the norms of feasible solutions of the analytic center problems $AE(d)$ and $AI(d)$, which are similar in spirit to certain results of the previous sections on the central trajectory problems $P_\mu(d)$ and $D_\mu(d)$. In order to do so, we first introduce a bit more notation. Define the following data sets: $D_E = \{(A,b) : A \in \Re^{m\times n},\ b \in \Re^m\}$ and $D_I = \{(A,c) : A \in \Re^{m\times n},\ c \in \Re^n\}$, and
$$F_E = \{(A,b) \in D_E : \text{there exists } (x,y) \text{ such that } Ax = b,\ x > 0,\ \text{and } A^Ty < 0\},$$
$$F_I = \{(A,c) \in D_I : \text{there exists } (x,y) \text{ such that } A^Ty < c,\ \text{and } Ax = 0,\ x > 0\};$$
in other words, $F_E$ consists of data instances $d$ for which $AE(d)$ is feasible and attains its optimal value, that is, $AE(d)$ is solvable; and $F_I$ consists of data instances $d$ for which $AI(d)$ is feasible and attains its optimal value, that is, $AI(d)$ is solvable. It is also appropriate to introduce the corresponding sets of ill-posed data instances: $B_E = \mathrm{cl}(F_E) \cap \mathrm{cl}(F_E^C) = \partial F_E = \partial F_E^C$, and $B_I = \mathrm{cl}(F_I) \cap \mathrm{cl}(F_I^C) = \partial F_I = \partial F_I^C$. For the analytic center problem in equality form $AE(d)$, the distance to ill-posedness of a data instance $d = (A,b,c)$ is defined as $\rho_E(d) = \inf\{\|(\Delta A, \Delta b)\|_E : (A+\Delta A,\ b+\Delta b) \in B_E\}$. For the analytic center problem in inequality form $AI(d)$, the distance to ill-posedness of a data instance $d = (A,b,c)$ is defined as $\rho_I(d) = \inf\{\|(\Delta A, \Delta c)\|_I : (A+\Delta A,\ c+\Delta c) \in B_I\}$, where $\|(A,b)\|_E = \max\{\|A\|, \|b\|_1\}$ and $\|(A,c)\|_I = \max\{\|A\|, \|c\|_\infty\}$. Likewise, the corresponding condition measures are $C_E(d) = \|(A,b)\|_E/\rho_E(d)$ if $\rho_E(d) > 0$ and $C_E(d) = \infty$ otherwise; $C_I(d) = \|(A,c)\|_I/\rho_I(d)$ if $\rho_I(d) > 0$ and $C_I(d) = \infty$ otherwise.
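As a small illustration of the primal analytic center problem $AE(d)$, the sketch below minimizes $p(x) = -\sum_j \ln x_j$ over $\{x > 0 : Ax = b\}$ for the bounded running example using scipy's SLSQP solver; the choice of solver and starting point are incidental assumptions, and the analytic center of this instance is $(1/2, 1/2)^T$.

```python
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 1.0]])
b = np.array([1.0])

def p(x):
    return -np.sum(np.log(x))   # the barrier function p(x)

res = minimize(p, x0=np.array([0.3, 0.7]),
               constraints=[{"type": "eq", "fun": lambda x: A @ x - b}],
               bounds=[(1e-9, None), (1e-9, None)], method="SLSQP")
print("primal analytic center:", res.x)   # expected approximately (0.5, 0.5)
```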
Proposition 5.1 If $d = (A,b,c)$ is such that $(A,b) \in F_E$, then $\rho_E(d) \le \rho(d)$.

Proof: Given any $\epsilon > 0$, consider $\delta = \rho_E(d) - \epsilon$. If $d + \Delta d = (A+\Delta A, b+\Delta b, c+\Delta c)$ is a data instance such that $\|\Delta d\| \le \delta$, then $\|(\Delta A, \Delta b)\|_E \le \delta$. Hence, $(A+\Delta A, b+\Delta b) \in F_E$, so that the system $(A+\Delta A)x = b+\Delta b$, $x > 0$, $(A+\Delta A)^Ty < 0$ has a solution, and therefore the system $(A+\Delta A)x = b+\Delta b$, $x > 0$, $(A+\Delta A)^Ty < c+\Delta c$ also has a solution, that is, $d + \Delta d \in F$. Therefore, $\rho(d) \ge \delta = \rho_E(d) - \epsilon$, and the result follows by letting $\epsilon \to 0$.
q.e.d.
The following two lemmas present upper bounds on the norms of all feasible solutions for analytic center problems in equality form and in inequality form, respectively.
Lemma 5.1 Let $d = (A,b,c)$ be such that $(A,b) \in F_E$ and $\rho_E(d) > 0$. Then $\|x\|_1 \le C_E(d)$ for any feasible $x$ of $AE(d)$.

Proof: Let $x$ be a feasible solution of $AE(d)$. Define $\Delta A = -be^T/\|x\|_1$ and $\Delta d = (\Delta A, 0, 0)$. Then, $(A+\Delta A)x = 0$ and $x > 0$. Now, consider the program $AE(d+\Delta d)$. Because $(A+\Delta A)x = 0$, $x > 0$, has a solution, there cannot exist $y$ for which $(A+\Delta A)^Ty < 0$, and so $(A+\Delta A, b) \in F_E^C$, whereby $\rho_E(d) \le \|(\Delta A, 0)\|_E$. On the other hand, $\|(\Delta A, 0)\|_E = \|b\|_1/\|x\|_1 \le \|(A,b)\|_E/\|x\|_1$, so that $\|x\|_1 \le \|(A,b)\|_E/\rho_E(d) = C_E(d)$.
q.e.d.
Lemma 5.2 Let $d = (A,b,c)$ be such that $(A,c) \in F_I$ and $\rho_I(d) > 0$. Then
$$\|y\|_\infty \le C_I(d), \qquad \|s\|_\infty \le 2\|(A,c)\|_I\,C_I(d),$$
for any feasible $(y,s)$ of $AI(d)$.

Proof: Let $(y,s)$ be a feasible solution of $AI(d)$. If $y = 0$, then $s = c$ and the bounds are trivially true, so we assume $y \ne 0$. Let $\bar y$ be such that $\bar y^Ty = \|y\|_\infty$ and $\|\bar y\|_1 = 1$. Let $\Delta A = -\bar yc^T/\|y\|_\infty$ and $\Delta d = (\Delta A, 0, 0)$. Hence, $(A+\Delta A)^Ty = A^Ty - c < 0$. Because
q.e.d.
With the aid of Lemma 5.2, we are now in a position to present the proof of Theorem 3.3.
Proof of Theorem 3.3: From Tucker's strict complementarity theorem (see Dantzig [4], p. 139, and [31]), there exists a unique partition $[B, N]$ of the set $\{1,\ldots,n\}$ into subsets $B$ and $N$, $B \cap N = \emptyset$ and $B \cup N = \{1,\ldots,n\}$, satisfying the following two properties:

1. $Au = 0$, $u \ge 0$ implies $u_N = 0$, and there exists $\hat u$ for which $A\hat u = 0$, $\hat u_B > 0$, and $\hat u_N = 0$;

2. $A^Ty = v$, $v \le 0$ implies $v_B = 0$, and there exists $(\hat y, \hat v)$ for which $A^T\hat y = \hat v$, $\hat v_B = 0$, and $\hat v_N < 0$.

Consider the set $S = \{s_B \in \Re^{|B|} : s_B = c_B - A_B^Ty \text{ for some } y \in \Re^m,\ s_B > 0\}$. Because $P(d)$ has an optimal solution, $S$ is non-empty. Also, $S$ is bounded. To see this, suppose instead that $S$ is unbounded, in which case there exists $\tilde y$ such that $A_B^T\tilde y \le 0$ and $A_B^T\tilde y \ne 0$. Then, using the vector $\hat y$ from property 2 above, we obtain that $A_N^T(\tilde y + \lambda\hat y) = A_N^T\tilde y + \lambda\hat v_N \le 0$ for $\lambda$ sufficiently large, and since $A_B^T\hat y = \hat v_B = 0$, it follows that $A^T(\tilde y + \lambda\hat y) \le 0$ for $\lambda$ sufficiently large. By the definition of the partition $[B, N]$, we have that $A_B^T(\tilde y + \lambda\hat y) = 0$. This in turn implies that $A_B^T\tilde y = 0$, a contradiction. Because $S$ is non-empty and bounded, $d_B = (A_B, b, c_B) \in F_I$. Therefore, by Lemma 5.2, for any $s_B \in S$, $\|s_B\|_\infty \le 2\|(A_B, c_B)\|_I\,C_I(d_B)$; in particular
$$\|s_B(\mu)\|_\infty \le 2\|(A_B, c_B)\|_I\,C_I(d_B) \le 2\|d\|C_I(d_B).$$
Hence, for any $j \in B$, $s_j(\mu) \le \|s_B(\mu)\|_\infty \le 2\|d\|C_I(d_B)$. Moreover, since $x_j(\mu)s_j(\mu) = \mu$, then
$$x_j(\mu) \ge \frac{\mu}{2\|d\|C_I(d_B)},$$
for $j \in B$. Finally, by definition of the partition of $\{1,\ldots,n\}$ into $B$ and $N$, $x_j(\mu)$ is bounded for all $j \in N$ and for all $\mu > 0$. This also ensures that $B$ is unique.
q.e.d.
References

[1] Ilan Adler and Renato D. C. Monteiro. Limiting behavior of the affine scaling continuous trajectories for linear programming problems. Mathematical Programming, 50(1):29-51, 1991.
[2] S. A. Ashmanov. Stability conditions for linear programming problems. U.S.S.R. Comput. Maths. Math. Phys., 21(6):40-49, 1981.
[3] Mokhtar S. Bazaraa, Hanif D. Sherali, and C. M. Shetty. Nonlinear Programming, Theory and Algorithms. John Wiley & Sons, Inc., New York, second edition, 1993.
[4] George B. Dantzig. Linear Programming and Extensions. Princeton University Press, Princeton, New Jersey, 1963.
[5] D. Den Hertog, C. Roos, and T. Terlaky. On the classical logarithmic barrier function method for a class of smooth convex programming problems. Journal of Optimization Theory and Applications, 73(1):1-25, 1992.
[6] Sharon Filipowski. On the complexity of solving linear programs specified with approximate data and known to be feasible. Technical report, Dept. of Industrial and Manufacturing Systems Engineering, Iowa State University, May 1994.
[7] Sharon Filipowski. On the complexity of solving sparse symmetric linear programs specified with approximate data. Technical report, Dept. of Industrial and Manufacturing Systems Engineering, Iowa State University, December 1994.
[8] Sharon Filipowski. On the complexity of solving feasible systems of linear inequalities specified with approximate data. Mathematical Programming, 71(3):259-288, December 1995.
[9] Robert M. Freund and Jorge R. Vera. Some characterizations and properties of the "distance to ill-posedness" in conic linear systems. MIT Sloan Working Paper 386295-MSA, Sloan School of Management, Massachusetts Institute of Technology, 1995.
[10] D. Gale. The Theory of Linear Economic Models. McGraw-Hill Book Company, Inc., New York, 1960.
[11] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, New York, third edition, 1997.
[12] C. C. Gonzaga. Path following methods for linear programming. SIAM Review, 34(2):167-227, 1992.
[13] B. Jansen, C. Roos, and T. Terlaky. A short survey on ten years interior point methods. Technical Report 95-45, Delft University of Technology, 1995.
[14] Leonid G. Khachiyan. A polynomial algorithm in linear programming. Soviet Math. Dokl., 20(1):191-194, 1979.
[15] David G. Luenberger. Optimization by Vector Space Methods. John Wiley & Sons, Inc., New York, 1968.
[16] O. L. Mangasarian. A stable theorem of the alternative: An extension of the Gordan theorem. Linear Algebra and Its Applications, 41:209-223, 1981.
[17] G. L. Nemhauser and L. A. Wolsey. Integer and Combinatorial Optimization. John Wiley & Sons, Inc., New York, 1988.
[18] Yurii Nesterov and Arkadii Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, 1994.
[19] James Renegar. Some perturbation theory for linear programming. Mathematical Programming, 65(1):73-91, 1994.
[20] James Renegar. Incorporating condition measures into the complexity theory of linear programming. SIAM Journal on Optimization, 5(3):506-524, August 1995.
[21] James Renegar. Linear programming, complexity theory, and elementary functional analysis. Mathematical Programming, 70(3):279-351, November 1995.
[22] James Renegar. Condition numbers, the barrier method, and the conjugate gradient method. SIAM Journal on Optimization, 6(4):879-912, November 1996.
[23] Stephen M. Robinson. Stability theory for systems of inequalities. Part I: Linear systems. SIAM Journal on Numerical Analysis, 12(5):754-769, 1975.
[24] Stephen M. Robinson. Stability theory for systems of inequalities. Part II: Nonlinear systems. SIAM Journal on Numerical Analysis, 13(4):497-513, 1976.
[25] Stephen M. Robinson. A characterization of stability in linear programming. Operations Research, 25(3):435-447, 1977.
[26] G. Sonnevend. An "analytic" center for polyhedrons and new classes of global algorithms for linear (smooth, convex) optimization. Technical report, Dept. of Numerical Analysis, Institute of Mathematics, Eotvos University, 1088, Budapest, Muzeum Korut 6-8, 1985. Preprint.
[27] G. Sonnevend. A new method for solving a set of linear (convex) inequalities and its applications for identification and optimization. Technical report, Dept. of Numerical Analysis, Institute of Mathematics, Eotvos University, 1088, Budapest, Muzeum Korut 6-8, 1985. Preprint.
[28] Jorge R. Vera. Ill-posedness and the computation of solutions to linear programs with approximate data. Technical report, Cornell University, May 1992.
[29] Jorge R. Vera. On the complexity of linear programming under finite precision arithmetic. Technical report, Cornell University, August 1994.
[30] Jorge R. Vera. Ill-posedness and the complexity of deciding existence of solutions to linear programs. SIAM Journal on Optimization, 6(3):549-569, August 1996.
[31] A. C. Williams. Complementarity theorems for linear programming. SIAM Review, 12(1):135-137, January 1970.
[32] Stephen J. Wright. Primal-Dual Interior-Point Methods. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, 1997.