Rounding of convex sets and efficient gradient methods for linear programming problems

Yu. Nesterov



January, 2004

Abstract

In this paper we propose new efficient gradient schemes for two non-trivial classes of linear programming problems. These schemes are designed to compute approximate solutions with relative accuracy $\delta$. We prove that the upper complexity bound for both schemes is $O\left(\frac{\sqrt{n}\,\ln m}{\delta}\ln n\right)$ iterations of a gradient-type method, where $n$ and $m$ ($n < m$) are the sizes of the corresponding linear programming problems. The proposed schemes are based on a preliminary computation of an ellipsoidal rounding for some polytopes in $R^n$. In both cases this computation can be performed very efficiently, in $O(n^2 m \ln m)$ operations at most.

Keywords: Nonlinear optimization, convex optimization, complexity bounds, relative accuracy, fully polynomial approximation schemes, gradient methods, optimal methods.

CORE DISCUSSION PAPER 2004/04



Center for Operations Research and Econometrics (CORE), Catholic University of Louvain (UCL), 34 voie du Roman Pays, 1348 Louvain-la-Neuve, Belgium; e-mail: [email protected]. This paper presents research results of the Belgian Program on Interuniversity Poles of Attraction initiated by the Belgian State, Prime Minister’s Office, Science Policy Programming. The scientific responsibility is assumed by the author. The paper was produced during the visit of the author to the research centers IFOR and FIM at ETH, Zurich. Their support is kindly acknowledged.

1 Introduction

Motivation. Among the modern methods for solving linear programming problems (LP-problems, for short), the interior-point methods (IPM) are considered the most efficient ones. However, these methods are based on heavy machinery. For an LP-problem of size $m \times n$ ($m > n$), in order to get an approximate solution with absolute accuracy $\epsilon$, these methods need to perform $O(\sqrt{m}\,\ln\frac{m}{\epsilon})$ iterations of a Newton method. Note that for a problem with a dense matrix of linear constraints each iteration can take up to $O(n^2 m)$ operations. Clearly these bounds leave considerable room for competition from gradient-type schemes, for which each iteration is much cheaper. However, the main drawback of these schemes is their very slow convergence. In general, gradient schemes need $O(\frac{C_0}{\epsilon^2})$ iterations in order to find an $\epsilon$-solution to the problem. In this estimate a heavy dependence on $\epsilon$ is coupled with the presence of a constant $C_0$, which depends on the norm of the constraint matrix, the size of the solution, etc., and which can be uncontrollably large. Hence, until recently, gradient-type schemes provided serious competition for IPM only on very large transportation-type problems (see [4], [12], and [3]).

However, it was shown recently in [9] that it is possible to use the structure of LP-problems in order to design gradient-type schemes which converge in $O(\frac{C_1}{\epsilon})$ iterations. Moreover, it was shown that for some LP-problems the constant $C_1$ can be found explicitly and that it is reasonably small. In [11] this result was extended to cover minimization schemes for finding an approximate solution with a certain relative accuracy. Specifically, it was shown that for some classes of LP-problems it is possible to compute an approximate solution of relative accuracy $\delta$ in $O(\frac{\sqrt{m}}{\delta})$ iterations of a gradient-type scheme. Note that for many applications the concept of relative accuracy is very attractive, since it adapts automatically to any size of solution.
So, there is no necessity to fight against big and unknown constants. For many problems in engineering and economics a level of relative accuracy of the order 0.5%–0.05% is acceptable.

The approach of [11] is applicable to a special conic unconstrained minimization problem, that is, the minimization of a non-negative homogeneous convex function $f(x)$, $x \in R^n$, over an affine subspace. In order to compute a solution to this problem with some relative accuracy, we need to know a rounding ellipsoid for the subdifferential of $f(x)$ at the origin. In [11] it was shown that for some LP-problems it is possible to use the structure of $f(x)$ in order to compute such an ellipsoid with radius $O(\sqrt{m})$.

It is well known that for any centrally symmetric set in $R^n$ there exists a rounding ellipsoid of radius $\sqrt{n}$ (see [5]). Moreover, a good approximation to such an ellipsoid can be easily computed (see [6], [1]). It appears that this ellipsoid provides us with a reasonable norm, which allows one to solve the corresponding minimization problem up to a certain relative accuracy. Here we analyze two non-trivial classes of LP-problems and show that for both classes approximate solutions with relative accuracy $\delta$ can be computed in $O\left(\frac{\sqrt{n}\,\ln m}{\delta}\ln n\right)$ iterations of a gradient-type method. Note that the preliminary computation of the rounding ellipsoids in both situations is fairly cheap: it takes $O(n^2 m \ln m)$ operations at most. Up to a logarithmic factor, this is the complexity of finding a projection onto a linear subspace in $R^m$ defined by $n$


linear equations. Since each iteration of a gradient scheme takes $O(nm)$ operations, in the proposed methods the dependence on $n$ and $m$ of the complexity of the preliminary stage is of higher order than that of the optimization stage.

Contents. In Section 2 we present efficient algorithms for computing rounding ellipsoids for different types of convex sets: centrally symmetric sets (Section 2.1), general sets (Section 2.2), and sign-symmetric sets (Section 2.3). The results presented in Sections 2.1 and 2.2 can be found in [6]; however, we provide the algorithms with simple geometrical proofs. In all situations the computation of a rounding ellipsoid of acceptable quality can be carried out in $O(n^2 m \ln m)$ operations. It is important that for sign-symmetric sets the rounding ellipsoid can be chosen to be diagonal. In Section 3 we consider the problem of minimizing the maximal absolute value of several linear functions over an affine subspace. LP-problems of this type arise in regression analysis and in truss topology design [2]. We show that an appropriate Euclidean metric (defined by a rounding ellipsoid for the subdifferential of the objective function) ensures an $O\left(\frac{\sqrt{n}\,\ln m}{\delta}\ln n\right)$ bound for the number of iterations of a gradient scheme. In Section 4 we consider the problem of finding a solution to a bilinear matrix game whose matrix has nonnegative coefficients (the linear packing problem). We show that after an appropriate diagonal preconditioning (which can be efficiently computed by the technique of Section 2.3), this problem can be solved in $O\left(\frac{\sqrt{n}\,\ln m}{\delta}\ln n\right)$ iterations of a gradient-type scheme.

Notation. Throughout the paper we work in a finite-dimensional vector space $E$. We let $n \stackrel{\rm def}{=} \dim E$. Sometimes it is convenient to identify $E$ with $R^n$, which we treat as a real linear space of column vectors. We denote by $E^*$ the space dual to $E$, that is, the space of linear functions on $E$. For $s \in E^*$ and $x \in E$, we denote the inner product $s(x) = x(s) = \langle s, x \rangle$. In coordinate form,
$$\langle s, x \rangle = \sum_{i=1}^n s^{(i)} x^{(i)}.$$

A linear operator $G: E \to E^*$ is positive semidefinite ($G \succeq 0$) if
$$\langle Gx, x \rangle \ge 0 \quad \forall x \in E.$$
It is positive definite ($G \succ 0$) if the above inequality is strict for all nonzero vectors. Any $G \succ 0$ defines a norm on $E$:
$$\|x\|_G = \langle Gx, x \rangle^{1/2}, \quad x \in E.$$
The dual norm is defined in the usual way:
$$\|s\|_G^* = \sup_x \{\langle s, x \rangle : \|x\|_G \le 1\} = \langle s, G^{-1} s \rangle^{1/2}, \quad s \in E^*.$$
For a closed convex bounded set $C \subset E^*$, $\xi_C(x)$ denotes its support function:
$$\xi_C(x) = \max_{s \in C} \langle s, x \rangle, \quad x \in E.$$
The notation $\partial f(x) \subset E^*$ is used for the subdifferential of a convex function $f(\cdot)$ at a point $x \in E$. Thus $\partial \xi_C(0) = C$. Finally, for $g \in E^*$, $gg^*$ denotes the following linear operator:
$$(gg^*)(x) = \langle g, x \rangle \cdot g \in E^*, \quad x \in E.$$
In a coordinate representation, $D(a)$ denotes the diagonal $n \times n$-matrix with the vector $a \in R^n$ on the diagonal. In this setting, $e_k \in R^n$ denotes the $k$th basis vector and $\bar e_n \in R^n$ denotes the vector of all ones. Thus, $I_n \equiv D(\bar e_n)$. The notation $R^n_+$ is used for the positive orthant, and $\Delta_n \equiv \{x \in R^n_+ : \langle \bar e_n, x \rangle = 1\}$ denotes the standard simplex.

Acknowledgements. The author would like to thank Hans-Jakob Lüthi and Fabian Chudak for numerous stimulating discussions, and Laurence Wolsey for careful reading and suggestions on the improvement of the presentation of the results.

2 Computing the rounding ellipsoids

In this section we analyze efficient algorithms for constructing rounding ellipsoids for different types of convex sets. Throughout the paper we represent an ellipsoid $W_r(v, G) \subset E^*$ in the following form:
$$W_r(v, G) = \{s \in E^* : \|s - v\|_G^* \equiv \langle s - v, G^{-1}(s - v) \rangle^{1/2} \le r\},$$
where $G \succ 0$ is an operator from $E$ to $E^*$. If $v = 0$, we often use the notation $W_r(G)$. We say that an ellipsoid $W_1(v, G)$ is a $\beta$-rounding for a convex set $C \subset E^*$ ($\beta \ge 1$) if
$$W_1(v, G) \subseteq C \subseteq W_\beta(v, G).$$
We call $\beta$ the radius of the ellipsoidal rounding.

2.1 Convex sets with central symmetry

For an arbitrary $g \in E^*$ consider the set $C_{\pm g}(G) = {\rm Conv}\,\{W_1(G), \pm g\}$. For $\alpha \in [0, 1]$ denote $G(\alpha) = (1 - \alpha)G + \alpha gg^*$.

Lemma 1 For any $\alpha \in [0, 1]$ the following inclusion holds:
$$W_1(G(\alpha)) \subset C_{\pm g}(G). \qquad (2.1)$$
If $\sigma \stackrel{\rm def}{=} \frac{1}{n}(\|g\|_G^*)^2 - 1 > 0$, then the function
$$V(\alpha) \stackrel{\rm def}{=} \ln \frac{\det G(\alpha)}{\det G(0)} = \ln(1 + \alpha(n(1+\sigma) - 1)) + (n-1)\ln(1-\alpha)$$
is maximized at $\alpha^* = \frac{\sigma}{n(1+\sigma)-1}$. Moreover,
$$V(\alpha^*) = \ln(1+\sigma) + (n-1)\ln \frac{(n-1)(1+\sigma)}{n(1+\sigma)-1} \ge \ln(1+\sigma) - \frac{\sigma}{1+\sigma} \ge \frac{\sigma^2}{2(1+\sigma)^2}. \qquad (2.2)$$

Proof: For any $x \in E$, we have
$$\xi_{W_1(G(\alpha))}(x) = \langle G(\alpha)x, x \rangle^{1/2} = [(1-\alpha)\langle Gx, x\rangle + \alpha \langle g, x\rangle^2]^{1/2} \le \max\{\langle Gx, x\rangle^{1/2}, |\langle g, x\rangle|\} = \max\{\xi_{W_1(G)}(x), \xi_{{\rm Conv}\{\pm g\}}(x)\} = \xi_{C_{\pm g}(G)}(x).$$
Hence inclusion (2.1) is proved. Furthermore,
$$V(\alpha) = \ln\det(G^{-1/2}G(\alpha)G^{-1/2}) = \ln\det\left((1-\alpha)I_n + \alpha G^{-1/2}gg^*G^{-1/2}\right) = \ln\left(1 - \alpha + \alpha(\|g\|_G^*)^2\right) + (n-1)\ln(1-\alpha) = \ln(1 + \alpha(n(1+\sigma)-1)) + (n-1)\ln(1-\alpha).$$
Hence the first-order optimality condition for the function $V(\alpha)$ is as follows:
$$\frac{n-1}{1-\alpha} = \frac{n(1+\sigma)-1}{1+\alpha(n(1+\sigma)-1)}.$$
The optimal solution of this equation is $\alpha^* = \frac{\sigma}{n(1+\sigma)-1}$. Note that
$$V(\alpha^*) = \ln(1+\sigma) + (n-1)\ln\frac{(n-1)(1+\sigma)}{n(1+\sigma)-1} = \ln(1+\sigma) - (n-1)\ln\left(1 + \frac{\sigma}{(n-1)(1+\sigma)}\right) \ge \ln(1+\sigma) - \frac{\sigma}{1+\sigma} \ge \frac{\sigma^2}{2(1+\sigma)^2},$$
which is inequality (2.2). $\Box$
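Since Lemma 1 gives both the maximizer and the bound (2.2) in closed form, they are easy to sanity-check numerically. The following pure-Python sketch (an illustration of the lemma, not part of the paper) compares $V(\alpha^*)$ against a grid of step sizes and verifies the chain of inequalities for random values of $n$ and $\sigma$:

```python
import math
import random

def V(alpha, n, sigma):
    # V(alpha) = ln(1 + alpha*(n*(1+sigma)-1)) + (n-1)*ln(1-alpha), as in Lemma 1
    return math.log(1.0 + alpha * (n * (1.0 + sigma) - 1.0)) \
        + (n - 1) * math.log(1.0 - alpha)

random.seed(0)
for _ in range(100):
    n = random.randint(2, 50)
    sigma = random.uniform(0.01, 5.0)
    a_star = sigma / (n * (1.0 + sigma) - 1.0)   # the maximizer from Lemma 1
    v_star = V(a_star, n, sigma)
    # a_star should dominate a fine grid of step sizes in [0, 1)
    assert all(v_star >= V(0.000999 * i, n, sigma) - 1e-12 for i in range(1000))
    # the chain of inequalities (2.2)
    low = math.log(1.0 + sigma) - sigma / (1.0 + sigma)
    assert v_star >= low - 1e-12
    assert low >= sigma ** 2 / (2.0 * (1.0 + sigma) ** 2) - 1e-12
print("Lemma 1 verified numerically")
```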

In this section we are interested in solving the following problem. Let $C$ be a convex centrally symmetric body, i.e. ${\rm int}\,C \ne \emptyset$ and $x \in C \Leftrightarrow -x \in C$. For a given $\gamma > 1$ we need to find an ellipsoidal rounding for $C$ of radius $\gamma\sqrt{n}$. An initial approximation to the solution of our problem is given by a matrix $G_0 \succ 0$ such that $W_1(G_0) \subseteq C$ and $C \subseteq W_R(G_0)$ for a certain $R \ge 1$. Let us give an example of such a problem.

Example 1 Consider a set of vectors $a_i \in E^*$, $i = 1, \dots, m$, which span the whole dual space. Let $C$ be the set
$$C = {\rm Conv}\,\{\pm a_i,\ i = 1, \dots, m\}. \qquad (2.3)$$

Define $G_0 = \frac{1}{m}\sum_{i=1}^m a_i a_i^*$. Note that for any $x \in E$ we have $\xi_C(x) = \max_{1\le i\le m} |\langle a_i, x\rangle|$. Therefore
$$\xi_{W_1(G_0)}(x) = \left[\frac{1}{m}\sum_{i=1}^m \langle a_i, x\rangle^2\right]^{1/2} \le \xi_C(x), \qquad \xi_{W_{m^{1/2}}(G_0)}(x) = m^{1/2}\left[\frac{1}{m}\sum_{i=1}^m \langle a_i, x\rangle^2\right]^{1/2} \ge \xi_C(x).$$
Thus, $W_1(G_0) \subseteq C \subseteq W_{m^{1/2}}(G_0)$. $\Box$
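Example 1 can be illustrated numerically without any matrix inversion, since $\xi_{W_1(G_0)}(x) = \left[\frac{1}{m}\sum_i \langle a_i, x\rangle^2\right]^{1/2}$. A small sketch on random data (illustrative only, not from the paper):

```python
import math
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

random.seed(1)
n, m = 4, 20
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]  # the vectors a_i

for _ in range(200):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    vals = [dot(a, x) for a in A]
    xi_W1 = math.sqrt(sum(v * v for v in vals) / m)  # support function of W_1(G_0)
    xi_C = max(abs(v) for v in vals)                 # support function of C
    # W_1(G_0) ⊆ C ⊆ W_{sqrt(m)}(G_0), in terms of support functions
    assert xi_W1 <= xi_C + 1e-12
    assert xi_C <= math.sqrt(m) * xi_W1 + 1e-12
print("Example 1 rounding verified")
```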

Let us analyze the following algorithmic scheme.

For $k \ge 0$ iterate:
1. Compute $g_k \in C$: $\|g_k\|_{G_k}^* = r_k \stackrel{\rm def}{=} \max_g\{\|g\|_{G_k}^* : g \in C\}$.
2. If $r_k \le \gamma n^{1/2}$ then Stop, else set
$$\alpha_k := \frac{1}{n}\cdot\frac{r_k^2 - n}{r_k^2 - 1}, \qquad G_{k+1} := (1 - \alpha_k)G_k + \alpha_k g_k g_k^*. \qquad (2.4)$$

A complexity bound for this scheme is given in the following statement.

Theorem 1 Let $W_1(G_0) \subseteq C \subseteq W_R(G_0)$ for some $R \ge 1$. Then scheme (2.4) terminates after at most
$$\frac{2n\ln R}{2\ln\gamma - 1 + \gamma^{-2}} \qquad (2.5)$$
iterations.

Proof: Note that the coefficient $\alpha_k$ in Step 2 of (2.4) is chosen in accordance with Lemma 1. Since the method runs as long as $\sigma_k \stackrel{\rm def}{=} \frac{1}{n}r_k^2 - 1 \ge \gamma^2 - 1$, in view of inequality (2.2) we have at each step $k \ge 0$
$$\ln\det G_{k+1} \ge \ln\det G_k + \gamma^{-2} - 1 + 2\ln\gamma. \qquad (2.6)$$
Note that for any $k \ge 0$ we have
$$\det(G_k)^{1/2}\cdot{\rm vol}_n(W_1(I_n)) = {\rm vol}_n(W_1(G_k)) \le {\rm vol}_n(C) \le {\rm vol}_n(W_R(G_0)) = R^n\cdot\det(G_0)^{1/2}\cdot{\rm vol}_n(W_1(I_n)).$$
Hence, $\ln\det G_k - \ln\det G_0 \le 2n\ln R$, and we get bound (2.5) by summing up (2.6) over $k$. $\Box$


Let us estimate the total arithmetical complexity of scheme (2.4) as applied to the particular symmetric convex set (2.3). It appears that in this situation it is reasonable to update the inverse matrices $H_k \stackrel{\rm def}{=} G_k^{-1}$ recursively, together with the set of values
$$\nu_k^{(i)} = \langle a_i, H_k a_i\rangle, \quad i = 1, \dots, m,$$
which we treat as a vector $\nu_k \in R^m$. The modified variant of scheme (2.4) looks as follows.

A. Compute $H_0 = \left[\frac{1}{m}\sum_{i=1}^m a_i a_i^*\right]^{-1}$ and the vector $\nu_0 \in R^m$.

B. For $k \ge 0$ iterate:
1. Find $i_k$ such that $\nu_k^{(i_k)} = \max_{1\le i\le m}\nu_k^{(i)}$. Set $r_k = [\nu_k^{(i_k)}]^{1/2}$.
2. If $r_k \le \gamma n^{1/2}$ then Stop, else
2.1. Set $\sigma_k = \frac{1}{n}r_k^2 - 1$, $\alpha_k = \frac{\sigma_k}{r_k^2-1}$, and compute $x_k = H_k a_{i_k}$.
2.2. Update $H_{k+1} := \frac{1}{1-\alpha_k}\left[H_k - \frac{\alpha_k}{1+\sigma_k}\, x_k x_k^*\right]$.
2.3. Update $\nu_{k+1}^{(i)} := \frac{1}{1-\alpha_k}\left[\nu_k^{(i)} - \frac{\alpha_k}{1+\sigma_k}\,\langle a_i, x_k\rangle^2\right]$, $i = 1, \dots, m$. $\qquad$ (2.7)

Let us estimate the arithmetical complexity of this scheme. For simplicity we assume that the matrix $A = (a_1, \dots, a_m)$ is dense. We write down only the leading polynomial terms of the complexity of the corresponding computations, in which we count only multiplications.

- Phase A takes $\frac{mn^2}{2}$ operations to compute the matrix $G_0$, plus $\frac{n^3}{6}$ operations to compute its inverse, and $\frac{mn^2}{2}$ more operations to compute the vector $\nu_0$.
- Step 2.1 takes $n^2$ operations.
- Step 2.2 takes $\frac{n^2}{2}$ operations.
- Step 2.3 takes $mn$ operations.

Using now the estimate (2.5) with $R = \sqrt{m}$ (see Example 1), we conclude that for $\gamma > 1$ and the centrally symmetric set (2.3), the scheme (2.7) can find a $\gamma\sqrt{n}$-rounding in
$$\frac{n^2}{6}(n + 6m) + \frac{n^2(2m+3n)\ln m}{2[2\ln\gamma - 1 + \gamma^{-2}]}$$
arithmetic operations. Note that for a sparse matrix $A$ the complexity of Phase A and Step 2.3 will be much lower.
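The recursive scheme (2.7) is straightforward to implement. The sketch below is my own illustration, not code from the paper: it uses a naive Gauss–Jordan inverse for $H_0$ (a practical code would use optimized linear algebra) and maintains $H_k$ and $\nu_k$ exactly as in Steps 2.1–2.3, stopping once $r_k \le \gamma\sqrt{n}$.

```python
import math
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def invert(M):
    """Gauss-Jordan inverse of a small dense matrix (illustrative only)."""
    n = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    return [row[n:] for row in A]

def round_symmetric(vecs, gamma, max_iter=10000):
    """Scheme (2.7): returns H = G^{-1} with W_1(G) ⊆ Conv{±a_i} ⊆ W_{gamma*sqrt(n)}(G)."""
    m, n = len(vecs), len(vecs[0])
    G0 = [[sum(a[i] * a[j] for a in vecs) / m for j in range(n)] for i in range(n)]
    H = invert(G0)
    nu = [dot(a, [dot(H[i], a) for i in range(n)]) for a in vecs]  # nu_i = <a_i, H a_i>
    for _ in range(max_iter):
        ik = max(range(m), key=lambda i: nu[i])
        rk2 = nu[ik]
        if rk2 <= gamma ** 2 * n:                    # r_k <= gamma*sqrt(n): stop
            return H
        sigma = rk2 / n - 1.0                        # Step 2.1
        alpha = sigma / (rk2 - 1.0)
        x = [dot(H[i], vecs[ik]) for i in range(n)]  # x_k = H_k a_{i_k}
        c = alpha / (1.0 + sigma)
        H = [[(H[i][j] - c * x[i] * x[j]) / (1.0 - alpha) for j in range(n)]
             for i in range(n)]                      # Step 2.2
        nu = [(nu[i] - c * dot(vecs[i], x) ** 2) / (1.0 - alpha)
              for i in range(m)]                     # Step 2.3
    return H

random.seed(2)
n, m, gamma = 3, 15, 1.5
vecs = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
H = round_symmetric(vecs, gamma)
# on exit, every a_i satisfies <a_i, H a_i> <= gamma^2 * n
assert all(dot(a, [dot(H[i], a) for i in range(n)]) <= gamma ** 2 * n + 1e-8
           for a in vecs)
print("gamma*sqrt(n)-rounding found")
```

The rank-one update in Step 2.2 is just the Sherman–Morrison formula applied to $G_{k+1} = (1-\alpha_k)G_k + \alpha_k g_k g_k^*$.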


2.2 General convex sets

For an arbitrary $g$ from $E^*$, consider the set $C_g(G) = {\rm Conv}\,\{W_1(G), g\}$. Note that the support function of this set is as follows:
$$\xi_{C_g(G)}(x) = \max\{\|x\|_G, \langle g, x\rangle\}, \quad x \in E.$$
Let $r = \|g\|_G^*$ and
$$G(\alpha) = (1-\alpha)G + \left(\frac{\alpha}{r} + \left(\frac{r-1}{2}\right)^2\left(\frac{\alpha}{r}\right)^2\right)\cdot gg^*.$$

Lemma 2 For all $\alpha \in [0, 1)$, the ellipsoid
$$E_\alpha = \left\{s \in E^* : \left\|s - \frac{r-1}{2r}\,\alpha g\right\|_{G(\alpha)}^* \le 1\right\}$$
belongs to the set $C_g(G)$. If $r \ge n$, then the function
$$V(\alpha) \stackrel{\rm def}{=} \ln\frac{\det G(\alpha)}{\det G(0)} = 2\ln\left(1 + \alpha\cdot\frac{r-1}{2}\right) + (n-1)\ln(1-\alpha)$$
is maximized at $\alpha^* = \frac{2}{n+1}\cdot\frac{r-n}{r-1}$. Moreover,
$$V(\alpha^*) = 2\ln\frac{r+1}{n+1} + (n-1)\ln\frac{(n-1)(r+1)}{(n+1)(r-1)} \ge 2\left[\ln(1+\sigma) - \frac{\sigma}{1+\sigma}\right] \ge \left(\frac{\sigma}{1+\sigma}\right)^2, \qquad (2.8)$$
where $\sigma = \frac{r-n}{n+1}$.

Proof: We need to prove that for all $x \in E$
$$\xi_{E_\alpha}(x) \equiv \alpha\cdot\frac{r-1}{2r}\,\langle g, x\rangle + \left[(1-\alpha)\|x\|_G^2 + \left(\frac{\alpha}{r} + \left(\frac{r-1}{2}\right)^2\left(\frac{\alpha}{r}\right)^2\right)\langle g, x\rangle^2\right]^{1/2} \le \xi_{C_g(G)}(x) = \max\{\|x\|_G, \langle g, x\rangle\}.$$
First, if $\|x\|_G \le \langle g, x\rangle$, then
$$\xi_{E_\alpha}(x) \le \alpha\cdot\frac{r-1}{2r}\,\langle g, x\rangle + \left|1 - \alpha\cdot\frac{r-1}{2r}\right|\cdot\langle g, x\rangle = \langle g, x\rangle.$$
Otherwise, we have $-r\|x\|_G \le \langle g, x\rangle \le \|x\|_G$. Note that the value $\xi_{E_\alpha}(x)$ depends on $\langle g, x\rangle$ in a convex way. Therefore its maximum is achieved at the end points of the feasible interval for $\langle g, x\rangle$. At the end point $\langle g, x\rangle = \|x\|_G$ we have already proved that $\xi_{E_\alpha}(x) \le \|x\|_G$. Consider now the case $\langle g, x\rangle = -r\|x\|_G$. Then
$$\xi_{E_\alpha}(x) = -\alpha\cdot\frac{r-1}{2}\,\|x\|_G + \left[(1-\alpha)\|x\|_G^2 + \left(\frac{\alpha}{r} + \left(\frac{r-1}{2}\right)^2\left(\frac{\alpha}{r}\right)^2\right)r^2\|x\|_G^2\right]^{1/2} = \|x\|_G.$$
Thus, we have proved that $E_\alpha \subseteq C_g(G)$ for any $\alpha \in [0, 1)$.

Further,
$$V(\alpha) = \ln\det(G^{-1/2}G(\alpha)G^{-1/2}) = \ln\det\left((1-\alpha)I_n + \left(\frac{\alpha}{r} + \left(\frac{r-1}{2}\right)^2\left(\frac{\alpha}{r}\right)^2\right)G^{-1/2}gg^*G^{-1/2}\right)$$
$$= \ln\left(1 - \alpha + \left(\frac{\alpha}{r} + \left(\frac{r-1}{2}\right)^2\left(\frac{\alpha}{r}\right)^2\right)r^2\right) + (n-1)\ln(1-\alpha) = 2\ln\left(1 + \alpha\cdot\frac{r-1}{2}\right) + (n-1)\ln(1-\alpha).$$
Hence, the first-order optimality condition for the function $V(\alpha)$ is as follows:
$$\frac{n-1}{1-\alpha} = \frac{r-1}{1 + \alpha\cdot\frac{r-1}{2}}.$$
Thus, the maximum is attained at $\alpha^* = \frac{2}{n+1}\cdot\frac{r-n}{r-1}$. Letting $\sigma = \frac{r-n}{n+1}$, we get
$$V(\alpha^*) = 2\ln\left(1 + \alpha^*\cdot\frac{r-1}{2}\right) + (n-1)\ln(1-\alpha^*) = 2\ln(1+\sigma) - (n-1)\ln\left(1 + \frac{2(r-n)}{(n-1)(r+1)}\right) \ge 2\ln(1+\sigma) - \frac{2(r-n)}{r+1} = 2\left[\ln(1+\sigma) - \frac{\sigma}{1+\sigma}\right]. \;\Box$$

In this section we are interested in solving the following problem. Let $C \subset E^*$ be a convex set with nonempty interior. For a given $\gamma > 1$ we need to find a $\gamma n$-rounding for $C$. An initial approximation to the solution of this problem is given by a point $v_0$ and a matrix $G_0 \succ 0$ such that $W_1(v_0, G_0) \subseteq C \subseteq W_R(v_0, G_0)$ for a certain $R \ge 1$. We assume that $n \equiv \dim E \ge 2$.

Let us analyze the following algorithmic scheme.

For $k \ge 0$ iterate:
1. Compute $g_k \in C$: $\|g_k - v_k\|_{G_k}^* = r_k \stackrel{\rm def}{=} \max_g\{\|g - v_k\|_{G_k}^* : g \in C\}$.
2. If $r_k \le \gamma n$ then Stop, else set
$$\alpha_k := \frac{2}{n+1}\cdot\frac{r_k - n}{r_k - 1}, \qquad v_{k+1} := v_k + \alpha_k\,\frac{r_k-1}{2r_k}\,(g_k - v_k),$$
$$G_{k+1} := (1-\alpha_k)G_k + \left(\frac{\alpha_k}{r_k} + \left(\frac{r_k-1}{2}\right)^2\left(\frac{\alpha_k}{r_k}\right)^2\right)(g_k - v_k)(g_k - v_k)^*. \qquad (2.9)$$

A complexity bound for this scheme is given in the following statement.

Theorem 2 Let $W_1(v_0, G_0) \subseteq C \subseteq W_R(v_0, G_0)$ for some $R \ge 1$. Then scheme (2.9) terminates after at most
$$\frac{(1+2\gamma)^2}{2(\gamma-1)^2}\cdot n\ln R \qquad (2.10)$$
iterations.

Proof: Note that the coefficient $\alpha_k$, the vector $v_{k+1}$, and the matrix $G_{k+1}$ in Step 2 of (2.9) are chosen in accordance with Lemma 2. Since the method runs as long as
$$\sigma_k \stackrel{\rm def}{=} \frac{r_k - n}{n+1} \ge \frac{n}{n+1}(\gamma - 1) \ge \frac{2}{3}(\gamma - 1),$$
in view of inequality (2.8) at each step $k \ge 0$ we have
$$\ln\det G_{k+1} \ge \ln\det G_k + \left(\frac{\sigma_k}{1+\sigma_k}\right)^2 \ge \ln\det G_k + \left(\frac{2(\gamma-1)}{1+2\gamma}\right)^2. \qquad (2.11)$$
Note that for any $k \ge 0$ we have
$$\det(G_k)^{1/2}\cdot{\rm vol}_n(W_1(I_n)) = {\rm vol}_n(W_1(v_k, G_k)) \le {\rm vol}_n(C) \le {\rm vol}_n(W_R(v_0, G_0)) = R^n\cdot\det(G_0)^{1/2}\cdot{\rm vol}_n(W_1(I_n)).$$
Hence, $\ln\det G_k - \ln\det G_0 \le 2n\ln R$, and we get bound (2.10) by summing up the inequalities (2.11). $\Box$

Note that in the case $C = {\rm Conv}\,\{a_i,\ i = 1, \dots, m\}$ scheme (2.9) can be implemented efficiently in the style of (2.7). We leave the derivation of this modification and its complexity analysis as an exercise for the reader. The starting rounding ellipsoid for such a set $C$ can be chosen as follows.

Lemma 3 Assume that the set $C = {\rm Conv}\,\{a_i,\ i = 1, \dots, m\}$ has nonempty interior. Define
$$\hat a = \frac{1}{m}\sum_{i=1}^m a_i, \qquad G = \frac{1}{R^2}\sum_{i=1}^m (a_i - \hat a)(a_i - \hat a)^*,$$
where $R = \sqrt{m(m-1)}$. Then $W_1(\hat a, G) \subset C \subset W_R(\hat a, G)$.

Proof: For any $x \in E$ and $r > 0$, we have
$$\xi_{W_r(\hat a, G)}(x) = \langle \hat a, x\rangle + r\|x\|_G = \langle \hat a, x\rangle + \frac{r}{R}\left[\sum_{i=1}^m \langle a_i - \hat a, x\rangle^2\right]^{1/2}.$$
Thus, $\xi_{W_R(\hat a, G)}(x) \ge \max_{1\le i\le m}\langle a_i, x\rangle = \xi_C(x)$; hence $W_R(\hat a, G) \supset C$. Further, let
$$\tau_i = \langle a_i - \hat a, x\rangle, \quad i = 1, \dots, m, \qquad \hat\tau = \max_{1\le i\le m}\langle a_i, x\rangle - \langle \hat a, x\rangle \ge 0.$$
Note that $\sum_{i=1}^m \tau_i = 0$ and $\tau_i \le \hat\tau$ for all $i$. Therefore
$$\xi_{W_1(\hat a, G)}(x) - \langle \hat a, x\rangle \le \frac{1}{R}\max_\tau\left\{\left[\sum_{i=1}^m \tau_i^2\right]^{1/2} : \sum_{i=1}^m \tau_i = 0,\ \tau_i \le \hat\tau,\ i = 1, \dots, m\right\} = \frac{\hat\tau}{R}\sqrt{m(m-1)} = \max_{1\le i\le m}\langle a_i, x\rangle - \langle \hat a, x\rangle = \xi_C(x) - \langle \hat a, x\rangle.$$
Thus $W_1(\hat a, G) \subset C$. $\Box$
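Both inclusions of Lemma 3 can be checked numerically through support functions, again without matrix inversion, since $R\|x\|_G = \left[\sum_i \langle a_i - \hat a, x\rangle^2\right]^{1/2}$. A small illustration on random data (mine, not from the paper):

```python
import math
import random

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

random.seed(3)
n, m = 3, 12
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]  # vertices a_i
R = math.sqrt(m * (m - 1))
a_hat = [sum(a[j] for a in A) / m for j in range(n)]

for _ in range(300):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    c = dot(a_hat, x)
    taus = [dot(a, x) - c for a in A]        # tau_i = <a_i - a_hat, x>
    s = math.sqrt(sum(t * t for t in taus))  # equals R * ||x||_G
    xi_C = max(dot(a, x) for a in A)         # support function of C
    # W_1(a_hat, G) ⊂ C:  <a_hat, x> + (1/R)*s <= xi_C(x)
    assert c + s / R <= xi_C + 1e-12
    # C ⊂ W_R(a_hat, G):  xi_C(x) <= <a_hat, x> + s
    assert xi_C <= c + s + 1e-12
print("Lemma 3 rounding verified")
```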

2.3 Sign-invariant convex sets

We call a set $C \subset R^n$ sign-invariant if, for any point $g$ from $C$, an arbitrary change of signs of its entries leaves the point inside $C$. In other words, for any $g \in C \cap R^n_+$ we have
$$B(g) \equiv \{s \in R^n : -g \le s \le g\} \subseteq C.$$
Examples of such sets are given by unit balls of $l_p$-norms or of Euclidean norms generated by diagonal matrices. Clearly, any sign-invariant set is centrally symmetric. Thus, in view of Lemma 1, there exists an ellipsoidal rounding of radius $\sqrt{n}$ for such a set (that is the John theorem [5]). We will see that the important additional feature of sign-invariant sets is that the matrix of the corresponding quadratic form can be chosen to be diagonal.

Let $D \succ 0$ be a diagonal matrix, and let us choose an arbitrary vector $g \in R^n_+ \subset E^*$. Let
$$C = {\rm Conv}\,\{W_1(D), B(g)\}, \qquad G(\alpha) = (1-\alpha)D + \alpha D^2(g).$$
Clearly the set $C$ is sign-invariant. Let
$$V(\alpha) = \ln\frac{\det G(0)}{\det G(\alpha)} = -\sum_{i=1}^n \ln(1 + \alpha(\tau_i - 1)), \quad \alpha \in [0, 1),$$
where $\tau_i = \frac{(g^{(i)})^2}{D^{(i)}}$, $i = 1, \dots, n$. Note that $V(\alpha)$ is a self-concordant function (see, for example, Chapter 4.1 in [8]). For our analysis it is important that
$$V'(0) = n - \sum_{i=1}^n \tau_i = n - (\|g\|_D^*)^2, \qquad V''(0) = \sum_{i=1}^n (\tau_i - 1)^2. \qquad (2.12)$$

Lemma 4 For any $\alpha \in [0, 1)$, $W_1(G(\alpha)) \subseteq C$. Moreover, if $(\|g\|_D^*)^2 \ge n + \sqrt{n}$, then the step
$$\alpha^* \stackrel{\rm def}{=} \frac{-V'(0)}{V''(0) + |V'(0)|\cdot[V''(0)]^{1/2}}$$
belongs to $(0, 1)$, and for any $\gamma \in \left[1, \frac{1}{\sqrt{n}}\|g\|_D^*\right]$,
$$V(\alpha^*) \le \ln\left(1 + \frac{\gamma^2-1}{\gamma^2}\right) - \frac{\gamma^2-1}{\gamma^2} < 0. \qquad (2.13)$$

Proof: For any $x \in R^n \equiv E$, we have
$$[\xi_{W_1(G(\alpha))}(x)]^2 = (1-\alpha)\langle Dx, x\rangle + \alpha\sum_{i=1}^n (g^{(i)}x^{(i)})^2 \le (1-\alpha)\langle Dx, x\rangle + \alpha\left(\sum_{i=1}^n g^{(i)}|x^{(i)}|\right)^2 \le \left[\max\{\xi_{W_1(D)}(x), \xi_{B(g)}(x)\}\right]^2 = [\xi_C(x)]^2.$$
Further, let $S = \sum_{i=1}^n \tau_i = (\|g\|_D^*)^2$. By assumption, $S \ge n + \sqrt{n}$. Therefore
$$V''(0) = \sum_{i=1}^n (\tau_i - 1)^2 \ge n\left(\frac{S}{n} - 1\right)^2 \ge 1,$$
and we conclude that $\alpha^* < 1$. Finally, let us estimate from below the Newton decrement of the self-concordant function $V(\cdot)$ at zero. Note that
$$V''(0) \le \max_\tau\left\{\sum_{i=1}^n (\tau_i - 1)^2 : \sum_{i=1}^n \tau_i = S,\ \tau_i \ge 0,\ i = 1, \dots, n\right\} = (S-1)^2 + n - 1.$$
Therefore
$$\lambda(0)^2 \stackrel{\rm def}{=} \frac{(V'(0))^2}{V''(0)} \ge \frac{(S-n)^2}{(S-1)^2 + n - 1} \ge \frac{n^2(\gamma^2-1)^2}{n^2\gamma^4 - 2n\gamma^2 + n} \ge \left(\frac{\gamma^2-1}{\gamma^2}\right)^2.$$
It remains to note that the step size $\alpha^*$ is exactly the step size of the damped Newton method:
$$\alpha^* = \frac{1}{1+\lambda(0)}\cdot\frac{-V'(0)}{V''(0)}.$$

Thus, $V(\alpha^*) \le -[\lambda(0) - \ln(1+\lambda(0))]$ in view of Theorem 4.1.4 in [8], and (2.13) follows. $\Box$

Corollary 1 For any sign-symmetric set $C \subset R^n$ with nonempty interior, there exists a diagonal matrix $D \succ 0$ such that $W_1(D) \subseteq C \subseteq W_{\sqrt{n}}(D)$.

Proof: Note that for $R$ big enough the set $\{D \succeq 0 : W_1(D) \subseteq C \subseteq W_R(D)\}$ is nonempty, closed, and bounded. Therefore the existence follows from the first statement of Lemma 4 and expression (2.12) for $V'(0)$. $\Box$

The above corollary is important for us because of the following statement.

Lemma 5 Let all vectors $a_i \in R^n$, $i = 1, \dots, m$, have nonnegative coefficients. Assume that there exists a diagonal matrix $D \succ 0$ such that
$$W_1(D) \subseteq {\rm Conv}\,\{B(a_i),\ i = 1, \dots, m\} \subseteq W_{\gamma\sqrt{n}}(D)$$
for a certain $\gamma \ge 1$. Then the function $f(x) = \max_{1\le i\le m}\langle a_i, x\rangle$ satisfies the inequalities
$$\|x\|_D \le f(x) \le \gamma\sqrt{n}\cdot\|x\|_D \quad \forall x \in R^n_+. \qquad (2.14)$$

Proof: Consider the function $\hat f(x) = \max_{1\le i\le m}\sum_{j=1}^n a_i^{(j)}|x^{(j)}|$. Note that its subdifferential at the origin can be expressed as follows:
$$\partial\hat f(0) = {\rm Conv}\,\{B(a_i),\ i = 1, \dots, m\}.$$
Thus, for any $x \in R^n$ we have
$$\|x\|_D = \max_s\{\langle s, x\rangle : s \in W_1(D)\} \le \max_s\{\langle s, x\rangle : s \in \partial\hat f(0)\} \equiv \hat f(x) \le \max_s\{\langle s, x\rangle : s \in W_{\gamma\sqrt{n}}(D)\} = \gamma\sqrt{n}\cdot\|x\|_D.$$
It remains to note that $\hat f(x) \equiv f(x)$ for all $x \in R^n_+$. $\Box$

Corollary 2 Let $a_i \in R^n_+$, $i = 1, \dots, m$. Consider the set
$$F = \{x \in R^n_+ : \langle a_i, x\rangle \le b_i,\ i = 1, \dots, m\}$$
with $b_i > 0$, $i = 1, \dots, m$. Then there exists a diagonal matrix $D \succ 0$ such that
$$W_1(D)\cap R^n_+ \subset F \subset W_{\sqrt{n}}(D)\cap R^n_+. \qquad (2.15)$$
Proof: Consider $f(x) = \max_{1\le i\le m}\frac{1}{b_i}\langle a_i, x\rangle$. Then $F = \{x \in R^n_+ : f(x) \le 1\}$, and the inclusions (2.15) follow from inequalities (2.14). $\Box$

In this section we are interested in finding a diagonal ellipsoidal rounding for the following sign-symmetric set:
$$C = {\rm Conv}\,\{B(a_i),\ i = 1, \dots, m\},$$
where $a_i \in R^n_+\setminus\{0\}$, $i = 1, \dots, m$. Our main assumption on these vectors is as follows:
$$\hat a \stackrel{\rm def}{=} \frac{1}{m}\sum_{i=1}^m a_i > 0.$$
Let $\hat D = D^2(\hat a)$.

Lemma 6 $W_1(\hat D) \subset C \subset W_{m\sqrt{n}}(\hat D)$.

Proof: As $\hat a \in C$, $W_1(\hat D) \subset B(\hat a) \subseteq C$. On the other hand,
$$C \subseteq B(m\hat a) \subset \left\{x \in R^n : \sum_{i=1}^n\left(\frac{x^{(i)}}{m\hat a^{(i)}}\right)^2 \le n\right\} = W_{m\sqrt{n}}(\hat D). \;\Box$$

For the above sign-symmetric set $C \subset R^n$, consider the following algorithmic scheme, which finds a diagonal rounding of radius $\gamma\sqrt{n}$ with $\gamma > \left[1 + \frac{1}{\sqrt{n}}\right]^{1/2}$.

Set $D_0 = \hat D$. For $k \ge 0$ iterate:
1. Compute $i_k$: $\|a_{i_k}\|_{D_k}^* = r_k \stackrel{\rm def}{=} \max_{1\le i\le m}\|a_i\|_{D_k}^*$.
2. If $r_k \le \gamma\sqrt{n}$ then Stop, else set
$$\beta_k := \sum_{j=1}^n\left(\frac{(a_{i_k}^{(j)})^2}{D_k^{(j)}} - 1\right)^2, \qquad \alpha_k := \frac{r_k^2 - n}{\beta_k + (r_k^2 - n)\beta_k^{1/2}}, \qquad D_{k+1} := (1-\alpha_k)D_k + \alpha_k D^2(a_{i_k}). \qquad (2.16)$$

Note that this scheme applies the rules described in Lemma 4, with $\beta_k$ standing for $V''(0)$. Therefore, exactly as in Theorem 1 and Theorem 2, we get the following statement.

Theorem 3 For $\gamma \ge \left[1 + \frac{1}{\sqrt{n}}\right]^{1/2}$, the scheme (2.16) terminates after at most
$$\left[\frac{\gamma^2-1}{\gamma^2} - \ln\left(1 + \frac{\gamma^2-1}{\gamma^2}\right)\right]^{-1}\cdot n(\ln n + 2\ln m)$$
iterations.

Note that the number of operations during each iteration of the scheme (2.16) is proportional to the number of nonzero elements in the matrix $A = (a_1, \dots, a_m)$.
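Because the matrices here are diagonal, scheme (2.16) is particularly easy to sketch. The pure-Python illustration below (mine, not from the paper) stores $D_k$ as a vector; for simplicity it recomputes all dual norms from scratch each iteration in $O(mn)$ instead of updating them incrementally.

```python
import math
import random

def diag_round(A, gamma, max_iter=100000):
    """Scheme (2.16): diagonal gamma*sqrt(n)-rounding of Conv{B(a_i)} (sketch)."""
    m, n = len(A), len(A[0])
    a_hat = [sum(a[j] for a in A) / m for j in range(n)]
    d = [v * v for v in a_hat]                   # D_0 = D^2(a_hat)
    for _ in range(max_iter):
        nu = [sum(a[j] ** 2 / d[j] for j in range(n)) for a in A]
        ik = max(range(m), key=lambda i: nu[i])
        rk2 = nu[ik]
        if rk2 <= gamma ** 2 * n:                # r_k <= gamma*sqrt(n): stop
            return d
        taus = [A[ik][j] ** 2 / d[j] for j in range(n)]
        beta = sum((t - 1.0) ** 2 for t in taus) # beta_k = V''(0)
        alpha = (rk2 - n) / (beta + (rk2 - n) * math.sqrt(beta))
        d = [(1.0 - alpha) * d[j] + alpha * A[ik][j] ** 2 for j in range(n)]
    return d

random.seed(4)
n, m, gamma = 5, 30, 1.5                         # gamma > (1 + 1/sqrt(n))^(1/2)
A = [[random.random() + 0.01 for _ in range(n)] for _ in range(m)]  # a_i in R^n_+
d = diag_round(A, gamma)
# on exit, every a_i lies in W_{gamma*sqrt(n)}(D): sum_j a_j^2/d_j <= gamma^2 * n
assert all(sum(a[j] ** 2 / d[j] for j in range(n)) <= gamma ** 2 * n + 1e-9
           for a in A)
print("diagonal rounding found")
```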

3 Minimizing the maximum of absolute values of linear functions

Consider the following linear programming problem:
$$\min_{y \in R^{n-1}}\ \max_{1\le i\le m} |\langle \bar a_i, y\rangle - c_i|. \qquad (3.1)$$
Letting $a_i = (\bar a_i^T, -c_i)^T$, $i = 1, \dots, m$, $x = (y^T, \tau)^T \in R^n \equiv E$, and $d = e_n$, we can rewrite this problem as a conic unconstrained minimization problem [11]:
$$\hbox{Find } f^* \stackrel{\rm def}{=} \min_x\left\{f(x) = \max_{1\le i\le m}|\langle a_i, x\rangle| : \langle d, x\rangle = 1\right\}. \qquad (3.2)$$

Another application which can be rewritten in the form (3.2) is the Truss Topology Design problem (see, for example, [2] or Section 5.3 in [11]). In [11], in order to construct an ellipsoidal rounding for $\partial f(0)$, it was suggested to use the composite structure of the function $f(x)$. However, the radius of this rounding was quite large, of the order $O(\sqrt{m})$. Now, by (2.4), we can efficiently compute a rounding for this set with radius proportional to $O(\sqrt{n})$. Let us show that this leads to a much more efficient minimization scheme.

Let us fix some $\gamma > 1$. Assume that using the process (2.4) we have managed to construct an ellipsoidal rounding of radius $\gamma\sqrt{n}$ for the centrally symmetric set $\partial f(0)$:
$$W_1(G) \subseteq \partial f(0) \equiv {\rm Conv}\,\{\pm a_i,\ i = 1, \dots, m\} \subseteq W_{\gamma\sqrt{n}}(G).$$
The immediate consequences are:
$$\|x\|_G \le f(x) \equiv \sup_s\{\langle s, x\rangle : s \in \partial f(0)\} \le \gamma\sqrt{n}\cdot\|x\|_G, \qquad (3.3)$$
$$\|a_i\|_G^* \le \gamma\sqrt{n}, \quad i = 1, \dots, m. \qquad (3.4)$$
Let us now fix a smoothing parameter $\mu > 0$. Consider the following approximation of the function $f(x)$:
$$f_\mu(x) = \mu\ln\left(\sum_{i=1}^m\left[e^{\langle a_i, x\rangle/\mu} + e^{-\langle a_i, x\rangle/\mu}\right]\right).$$
Clearly $f_\mu(x)$ is convex and infinitely many times continuously differentiable on $E$. Moreover,
$$f(x) \le f_\mu(x) \le f(x) + \mu\ln(2m) \quad \forall x \in R^n. \qquad (3.5)$$
Finally, note that for any point $x$ and any direction $h$ from $E$ we have
$$\langle\nabla f_\mu(x), h\rangle = \sum_{i=1}^m \lambda_\mu^{(i)}(x)\cdot\langle a_i, h\rangle, \qquad \lambda_\mu^{(i)}(x) = \frac{1}{\omega_\mu(x)}\left(e^{\langle a_i, x\rangle/\mu} - e^{-\langle a_i, x\rangle/\mu}\right), \quad i = 1, \dots, m,$$
$$\omega_\mu(x) = \sum_{i=1}^m\left(e^{\langle a_i, x\rangle/\mu} + e^{-\langle a_i, x\rangle/\mu}\right).$$
Therefore the expression for the Hessian is as follows:
$$\langle\nabla^2 f_\mu(x)h, h\rangle = \frac{1}{\mu}\sum_{i=1}^m\frac{e^{\langle a_i, x\rangle/\mu} + e^{-\langle a_i, x\rangle/\mu}}{\omega_\mu(x)}\cdot\langle a_i, h\rangle^2 - \frac{1}{\mu}\left(\sum_{i=1}^m\lambda_\mu^{(i)}(x)\cdot\langle a_i, h\rangle\right)^2.$$
In view of (3.4), we have
$$\langle\nabla^2 f_\mu(x)h, h\rangle \le \frac{1}{\mu}\left(\max_{1\le i\le m}\|a_i\|_G^*\right)^2\cdot\|h\|_G^2 \le \frac{\gamma^2 n}{\mu}\cdot\|h\|_G^2.$$
This implies that the gradient of the function $f_\mu(\cdot)$ is Lipschitz continuous in the metric $\|\cdot\|_G$ with Lipschitz constant $L_\mu = \frac{\gamma^2 n}{\mu}$:
$$\|\nabla f_\mu(x) - \nabla f_\mu(y)\|_G^* \le L_\mu\|x - y\|_G \quad \forall x, y \in E.$$
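The smoothed objective and the bounds (3.5) are simple to evaluate in code. The following sketch (an illustration, not from the paper) computes $f_\mu$ with the standard max-shift safeguard for the exponentials, a numerical precaution the paper does not need to discuss, and checks (3.5) on random data:

```python
import math
import random

def f_max_abs(A, x):
    return max(abs(sum(ai * xi for ai, xi in zip(a, x))) for a in A)

def f_smooth(A, x, mu):
    """f_mu(x) = mu * ln( sum_i [exp(<a_i,x>/mu) + exp(-<a_i,x>/mu)] ),
    evaluated with the standard max-shift to avoid overflow."""
    terms = []
    for a in A:
        s = sum(ai * xi for ai, xi in zip(a, x)) / mu
        terms.extend((s, -s))
    t = max(terms)
    return mu * (t + math.log(sum(math.exp(v - t) for v in terms)))

random.seed(5)
n, m, mu = 4, 10, 0.05
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
for _ in range(100):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    f, fm = f_max_abs(A, x), f_smooth(A, x, mu)
    # the relations (3.5): f(x) <= f_mu(x) <= f(x) + mu*ln(2m)
    assert f <= fm + 1e-12
    assert fm <= f + mu * math.log(2 * m) + 1e-12
print("smoothing bounds (3.5) verified")
```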

Our approach is very similar to that of [11]. Consider the problem
$$\min_x\{\phi(x) : x \in Q\}, \qquad (3.6)$$
where $Q$ is a closed convex set and the differentiable convex function $\phi(x)$ has a gradient which is Lipschitz continuous in the Euclidean norm $\|\cdot\|_G$ with constant $L$. Let us write down an efficient method for solving (3.6) (that is, scheme (3.11) in [9]).

Method $S(\phi, L, Q, G, x_0, N)$:

For $k = 0, \dots, N$ do
1. Compute $\nabla\phi(x_k)$.
2. $y_k := \arg\min_{y\in Q}\left[\langle\nabla\phi(x_k), y - x_k\rangle + \frac{L}{2}\|y - x_k\|_G^2\right]$.
3. $z_k := \arg\min_{z\in Q}\left[\left\langle\sum_{i=0}^k \frac{i+1}{2}\nabla\phi(x_i), z - x_0\right\rangle + \frac{L}{2}\|z - x_0\|_G^2\right]$. $\qquad$ (3.7)
4. $x_{k+1} := \frac{2}{k+3}z_k + \frac{k+1}{k+3}y_k$.

Return: $S(\phi, L, Q, G, x_0, N) \equiv y_N$.

In accordance with Theorem 2 of [9], the output $y_N$ of this scheme satisfies the inequality
$$\phi(y_N) - \phi(x_\phi^*) \le \frac{2L\|x_0 - x_\phi^*\|_G^2}{(N+1)^2}, \qquad (3.8)$$
where $x_\phi^*$ is an optimal solution to problem (3.6).

As in [11], we are going to use the scheme (3.7) in order to compute an approximate solution to (3.2) with a certain relative accuracy $\delta > 0$. Let
$$Q(r) = \{x \in R^n : \langle d, x\rangle = 1,\ \|x\|_G \le r\}, \qquad x_0 = \frac{G^{-1}d}{\langle d, G^{-1}d\rangle}, \qquad \tilde N = \left\lfloor 2e\gamma\sqrt{2n\ln(2m)}\left(1 + \frac{1}{\delta}\right)\right\rfloor.$$
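As a concrete illustration of method $S$ (my own sketch, not code from the paper), consider the special case $G = I_n$ with $Q$ a Euclidean ball centered at the origin; then both auxiliary minimizations in (3.7) reduce to closed-form ball projections. The toy test checks the guarantee (3.8) on a simple quadratic whose Lipschitz constant is deliberately overestimated:

```python
import math

def proj_ball(x, r):
    """Projection onto the Euclidean ball of radius r centered at the origin."""
    nrm = math.sqrt(sum(v * v for v in x))
    return list(x) if nrm <= r else [v * r / nrm for v in x]

def method_S(grad, L, r, x0, N):
    """Scheme (3.7) in the special case G = I_n, Q = {||x|| <= r}."""
    x, s, y = list(x0), [0.0] * len(x0), list(x0)
    for k in range(N + 1):
        g = grad(x)
        y = proj_ball([xi - gi / L for xi, gi in zip(x, g)], r)     # Step 2
        s = [si + 0.5 * (k + 1) * gi for si, gi in zip(s, g)]
        z = proj_ball([x0i - si / L for x0i, si in zip(x0, s)], r)  # Step 3
        x = [2.0 / (k + 3) * zi + (k + 1.0) / (k + 3) * yi
             for zi, yi in zip(z, y)]                               # Step 4
    return y

# toy problem: phi(x) = 0.5*||x - b||^2, optimum b inside Q; L = 4 is a
# deliberate overestimate of the true Lipschitz constant 1
b = [3.0, -2.0, 1.0]
phi = lambda x: 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))
grad = lambda x: [xi - bi for xi, bi in zip(x, b)]
L, N = 4.0, 200
yN = method_S(grad, L, 10.0, [0.0, 0.0, 0.0], N)
# the estimate (3.8) with phi* = 0 and ||x0 - x*||^2 = ||b||^2
assert phi(yN) <= 2.0 * L * sum(bi ** 2 for bi in b) / (N + 1) ** 2
print("scheme (3.7) meets the bound (3.8)")
```

For the sets $Q(r)$ used below, the projections are no longer a single rescaling, but they remain cheap; see the closing remark of Section 4.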

Consider the following method.

Set $\hat x_0 = x_0$. For $t \ge 1$ iterate:
$$\mu_t := \frac{\delta f(\hat x_{t-1})}{2e(1+\delta)\ln(2m)}; \qquad L_{\mu_t} := \frac{\gamma^2 n}{\mu_t}; \qquad \hat x_t := S\left(f_{\mu_t}, L_{\mu_t}, Q(f(\hat x_{t-1})), G, x_0, \tilde N\right); \qquad (3.9)$$
if $f(\hat x_t) \ge \frac{1}{e}f(\hat x_{t-1})$ then set $T := t$ and Stop.

Theorem 4 The number of points generated by method (3.9) is bounded as follows:
$$T \le 1 + \ln(\gamma\sqrt{n}). \qquad (3.10)$$
The last point generated satisfies the inequality $f(\hat x_T) \le (1+\delta)f^*$. The total number of lower-level steps in the process (3.9) does not exceed
$$2\gamma e\left(1 + \ln(\gamma\sqrt{n})\right)\sqrt{2n\ln(2m)}\left(1 + \frac{1}{\delta}\right). \qquad (3.11)$$

(3.11)

Proof: Let x∗ be an optimal solution to the problem (3.2). Note that all points xˆt generated by (3.9) are feasible for (3.2). Therefore in view of (3.3) f (ˆ xt ) ≥ f ∗ ≥ kx∗ kG . Thus, x∗ ∈ Q(f (ˆ xt )) for any t ≥ 0. Let ft∗ = fµt (x∗t ) = min{fµt (x) : x ∈ Q(f (ˆ xt−1 ))}. x

Since x∗ ∈ Q(f (ˆ xt )), we have in view of (3.5) ft∗ ≤ fµt (x∗ ) ≤ f ∗ + µt ln(2m). By the first part of (3.5), f (ˆ xt ) ≤ fµt (ˆ xt ). Note that kx0 − x∗t kG ≤ kx∗t kG ≤ f (ˆ xt−1 ),

t ≥ 1.

In view of (3.8), we have at the last iteration T : f (ˆ xT ) − f ∗ ≤ fµT (ˆ xT ) − fT∗ + µT ln(2m) ≤

2LµT f 2 (ˆ xT −1 ) (N +1)2



f 2 (ˆ xT −1 ) δ 2 4µT e2 ln(2m)(1+δ)2

+ µT ln(2m) =

2γ 2 nf 2 (ˆ xT −1 ) µT (N +1)2

+ µT ln(2m)

+ µT ln(2m) = 2µT ln(2m).

17

Further, in view of the choice of µt and the stopping criterion, we have 2µT ln(2m) =

δf (ˆ xT −1 ) e(1+δ)



δf (ˆ xT ) 1+δ .

Thus f (ˆ xT ) ≤ (1 + δ)f ∗ . It remains to prove the estimate (3.10) for the number of steps of the process. Indeed, by simple induction it is easy to prove that at the beginning of stage t the following inequality holds: ³ ´t−1 1 e

f (x0 ) ≥ f (ˆ xt−1 ),

t ≥ 1.

Note that x0 is the projection of the origin on the hyperplane hd, xi = 1. Therefore, in view of inequalities (3.3) we have f ∗ ≥ kx∗ kG ≥ kx0 kG ≥

1 √ f (x0 ). γ n

Thus at the final step of the scheme we have ³ ´T −1 1 e

f (x0 ) ≥ f (ˆ xT −1 ) ≥ f ∗ ≥

1 √ f (x0 ). γ n

This leads to the bound (3.10).

2

Recall that the preliminary stage of the method (3.9), that is, the computation of a $\gamma\sqrt{n}$-rounding for $\partial f(0)$ with relative accuracy $\gamma > 1$, can be performed by procedure (2.4) in
$$\frac{n^2}{6}(n + 6m) + \frac{n^2(2m+3n)\ln m}{2[2\ln\gamma - 1 + \gamma^{-2}]} = O(n^2(n+m)\ln m)$$
arithmetic operations. Since each step of method (3.7) takes $O(mn)$ operations, the complexity of the preliminary stage is dominant if $\delta$ is not too small, say $\delta > \frac{1}{\sqrt{n}}$.

4 Bilinear matrix games with nonnegative coefficients

Let $A = (a_1, \dots, a_m)$ be an $n \times m$-matrix with nonnegative coefficients. Consider the problem
$$\hbox{Find } f^* \stackrel{\rm def}{=} \min_x\left\{f(x) = \max_{1\le i\le m}\langle a_i, x\rangle : x \in \Delta_n\right\}. \qquad (4.1)$$
Note that this format fits different standard problem settings. Consider, for example, the linear packing problem
$$\hbox{Find } \psi^* = \max_{y\ge0\in R^n}\{\langle c, y\rangle : \langle a_i, y\rangle \le b^{(i)},\ i = 1, \dots, m\},$$
where all entries of the vectors $a_i$ are nonnegative, $b > 0 \in R^m$, and $c > 0 \in R^n$. Then
$$\psi^* = \max_{y\ge0\in R^n}\left\{\langle c, y\rangle : \max_{1\le i\le m}\frac{1}{b^{(i)}}\langle a_i, y\rangle \le 1\right\} = \max_{y\ge0\in R^n}\frac{\langle c, y\rangle}{\max_{1\le i\le m}\frac{1}{b^{(i)}}\langle a_i, y\rangle}$$
$$= \left[\min_{y\ge0\in R^n}\left\{\max_{1\le i\le m}\frac{1}{b^{(i)}}\langle a_i, y\rangle : \langle c, y\rangle = 1\right\}\right]^{-1} = \left[\min_x\left\{\max_{1\le i\le m}\frac{1}{b^{(i)}}\langle D^{-1}(c)a_i, x\rangle : x \in \Delta_n\right\}\right]^{-1}.$$

As usual, we can approximate the objective function $f(x)$ in (4.1) by the following smooth function:
$$f_\mu(x) = \mu\ln\left(\sum_{i=1}^m e^{\langle a_i, x\rangle/\mu}\right).$$
In this case the following relations hold:
$$f(x) \le f_\mu(x) \le f(x) + \mu\ln m \quad \forall x \in R^n. \qquad (4.2)$$
Let
$$\hat f(x) = \max_{1\le i\le m}\sum_{j=1}^n a_i^{(j)}|x^{(j)}|.$$
Note that the subdifferential of the homogeneous function $\hat f(x)$ at the origin is as follows:
$$\partial\hat f(0) = {\rm Conv}\,\{B(a_i),\ i = 1, \dots, m\}.$$
In Section 2.3 we have shown that it is possible to compute efficiently a diagonal matrix $D \succ 0$ such that
$$W_1(D) \subseteq \partial\hat f(0) \subseteq W_{2\sqrt{n}}(D)$$
(this corresponds to the choice $\gamma = 2$ in scheme (2.16)). In view of Lemma 5, using this matrix we can define a Euclidean norm $\|\cdot\|_D$ such that
$$\|x\|_D \le f(x) \le 2\sqrt{n}\cdot\|x\|_D \quad \forall x \in R^n_+. \qquad (4.3)$$
Moreover, in this norm the sizes of all $a_i$ are bounded by $2\sqrt{n}$. Now, using the same reasoning as in Section 3, we can show that for any $x$ and $h$ from $R^n$
$$\langle\nabla^2 f_\mu(x)h, h\rangle \le \frac{4n}{\mu}\cdot\|h\|_D^2.$$
Hence the gradient of this function is Lipschitz continuous with respect to the norm $\|\cdot\|_D$ with Lipschitz constant $\frac{4n}{\mu}$. This implies that the function $f_\mu(x)$ can be minimized by the efficient method (3.7). Let us fix a relative accuracy $\delta > 0$. Let
$$Q(r) = \{x \in \Delta_n : \|x\|_D \le r\}, \qquad x_0 = \frac{D^{-1}\bar e_n}{\langle \bar e_n, D^{-1}\bar e_n\rangle}, \qquad \tilde N = \left\lfloor 4e\sqrt{2n\ln m}\left(1 + \frac{1}{\delta}\right)\right\rfloor.$$

Consider the following method:

    Set \hat x_0 = x_0.
    For t \ge 1 iterate:
        \mu_t := \frac{\delta f(\hat x_{t-1})}{2e(1+\delta)\ln m}; \quad L_{\mu_t} := \frac{4n}{\mu_t};
        \hat x_t := S\big(f_{\mu_t}, L_{\mu_t}, Q(f(\hat x_{t-1})), D, x_0, \tilde N\big);
        if f(\hat x_t) \ge \frac{1}{e} f(\hat x_{t-1}) then T := t and Stop.    (4.4)

The justification of this scheme is very similar to that of (3.9).

Theorem 5  The number of points generated by method (4.4) is bounded as follows:

    T \le 1 + \ln(2\sqrt{n}).    (4.5)

The last point generated satisfies the inequality f(\hat x_T) \le (1+\delta) f^*. The total number of lower-level steps in the process (4.4) does not exceed

    4e\big(1 + \ln(2\sqrt{n})\big)\sqrt{2n}\,\ln m \Big(1 + \frac{1}{\delta}\Big).    (4.6)

Proof: Let x^* be an optimal solution to problem (4.1). Note that all points \hat x_t generated by (4.4) are feasible. Therefore, in view of (4.3), f(\hat x_t) \ge f^* \ge \|x^*\|_D. Thus x^* \in Q(f(\hat x_t)) for any t \ge 0. Let

    f_t^* = f_{\mu_t}(x_t^*) = \min_x \{ f_{\mu_t}(x) : x \in Q(f(\hat x_{t-1})) \}.

Since x^* \in Q(f(\hat x_{t-1})), in view of (4.2) we have

    f_t^* \le f_{\mu_t}(x^*) \le f^* + \mu_t \ln m.

By the first part of (4.2), f(\hat x_t) \le f_{\mu_t}(\hat x_t). Note that \|x_0 - x_t^*\|_D \le \|x_t^*\|_D \le f(\hat x_{t-1}) for t \ge 1. Thus, in view of (3.8), at the last iteration T we have:

    f(\hat x_T) - f^* \le f_{\mu_T}(\hat x_T) - f_T^* + \mu_T \ln m \le \frac{2 L_{\mu_T} f^2(\hat x_{T-1})}{(\tilde N + 1)^2} + \mu_T \ln m
                      = \frac{8n f^2(\hat x_{T-1})}{\mu_T (\tilde N + 1)^2} + \mu_T \ln m \le \frac{\delta^2 f^2(\hat x_{T-1})}{4 \mu_T e^2 \ln m (1+\delta)^2} + \mu_T \ln m = 2\mu_T \ln m.

Further, in view of the choice of \mu_t and the stopping criterion, we have

    2\mu_T \ln m = \frac{\delta f(\hat x_{T-1})}{e(1+\delta)} \le \frac{\delta f(\hat x_T)}{1+\delta}.

Thus, f(\hat x_T) \le (1+\delta) f^*. It remains to prove the estimate (4.5) for the number of steps of the process. Indeed, by simple induction it is easy to prove that at the beginning of stage t the following inequality holds:

    \Big(\frac{1}{e}\Big)^{t-1} f(x_0) \ge f(\hat x_{t-1}), \qquad t \ge 1.

Note that x_0 is the projection of the origin (in the norm \|\cdot\|_D) onto the hyperplane \langle \bar e_n, x \rangle = 1. Therefore, in view of inequalities (4.3), we have

    f^* \ge \|x^*\|_D \ge \|x_0\|_D \ge \frac{1}{2\sqrt{n}} f(x_0).

Thus, at the final step of the scheme we have

    \Big(\frac{1}{e}\Big)^{T-1} f(x_0) \ge f(\hat x_{T-1}) \ge f^* \ge \frac{1}{2\sqrt{n}} f(x_0).

This leads to the bound (4.5). \Box

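To illustrate the restart logic of (4.4), here is a deliberately simplified sketch of ours: the inner optimal method S and the localization set Q(f(\hat x_{t-1})) are replaced by plain projected gradient over the whole simplex with a Euclidean Lipschitz estimate, and x_0 is taken as the barycenter of \Delta_n, so none of the complexity guarantees above apply; only the outer structure — the choice of \mu_t and the stopping test f(\hat x_t) \ge (1/e) f(\hat x_{t-1}) — follows the scheme. All names are ours.

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection onto the simplex Delta_n (sort-based, O(n log n))."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.nonzero(u - css / np.arange(1, len(y) + 1) > 0)[0][-1]
    return np.maximum(y - css[k] / (k + 1), 0.0)

def f(A, x):                          # objective of (4.1)
    return (A @ x).max()

def grad_f_mu(A, x, mu):              # gradient of the log-sum-exp smoothing
    z = A @ x / mu
    z -= z.max()
    pi = np.exp(z)
    pi /= pi.sum()
    return A.T @ pi

def scheme_44(A, delta=0.1, inner_iters=2000):
    m, n = A.shape
    x0 = np.full(n, 1.0 / n)          # simplified starting point (barycenter)
    x_hat, f_hat = x0, f(A, x0)
    L0 = np.linalg.norm(A, 2) ** 2    # Euclidean Lipschitz factor: ||A||^2 / mu
    for t in range(1, 100):
        mu = delta * f_hat / (2.0 * np.e * (1.0 + delta) * np.log(m))
        x = x0.copy()                 # each stage restarts from x0, as in (4.4)
        for _ in range(inner_iters):  # plain projected gradient replaces S
            x = project_simplex(x - grad_f_mu(A, x, mu) * mu / L0)
        f_new = f(A, x)
        if f_new >= f_hat / np.e:     # stopping test of (4.4)
            return x, f_new
        x_hat, f_hat = x, f_new
    return x_hat, f_hat

# Tiny instance: f* = 5/7, attained at x = (4/7, 3/7).
A = np.array([[0.5, 1.0], [1.0, 1.0 / 3.0]])
x_T, f_T = scheme_44(A, delta=0.1)
print(f_T)
```

On this instance the stopping test fires quickly and the returned value lies within the prescribed relative accuracy \delta = 0.1 of f^* = 5/7; the `project_simplex` routine also illustrates the kind of O(n ln n) sort-based projection mentioned at the end of the section (here without the ellipsoidal constraint).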
Thus, we have seen that the scheme (4.4) needs

    O\Big( \frac{\sqrt{n}\,\ln m}{\delta} \ln n \Big)

iterations of the gradient scheme (3.7). Since the matrix D is diagonal, each iteration of this scheme is very cheap: its complexity is proportional to the number of nonzero elements in the matrix A. Note also that in Steps 2 and 3 of scheme (3.7) it is necessary to compute projections onto the set Q(r), which is the intersection of a simplex and a diagonal ellipsoid. However, since D is a diagonal matrix, this can be done in O(n \ln n) operations.

References

[1] K.M. Anstreicher. Ellipsoidal approximations of convex sets based on the volumetric barrier. CORE Discussion Paper 9745, 1997.

[2] A. Ben-Tal, A. Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, Engineering Applications. SIAM-MPS Series in Optimization, 2001.

[3] D. Bienstock. Potential Function Methods for Approximately Solving Linear Programming Problems: Theory and Practice. Kluwer Academic Publishers, Boston, 2002.

[4] M.D. Grigoriadis, L.G. Khachiyan. Fast approximation schemes for convex programs with many blocks and coupling constraints. SIAM J. Optimization, 4 (1994), 86–107.

[5] F. John. Extremum problems with inequalities as subsidiary conditions. In: Studies and Essays, Presented to R. Courant on his 60th Birthday, January 8, 1948. Wiley Interscience, New York, 187–204.

[6] L.G. Khachiyan. Rounding of polytopes in the real number model of computation. Mathematics of Operations Research, 21(2) (1996), 307–320.

[7] Yu. Nesterov, A. Nemirovskii. Interior-Point Polynomial Methods in Convex Programming: Theory and Applications. SIAM, Philadelphia, 1994.

[8] Yu. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston, 2003.

[9] Yu. Nesterov. Smooth minimization of non-smooth functions. CORE DP 2003/12, February 2003. Accepted by Mathematical Programming.

[10] Yu. Nesterov. Excessive gap technique in non-smooth convex minimization. CORE DP 2003/35, May 2003. Accepted by SIAM J. Optimization.

[11] Yu. Nesterov. Unconstrained convex minimization in relative scale. CORE DP 2003/96, November 2003.

[12] S.A. Plotkin, D.B. Shmoys, E. Tardos. Fast approximation algorithms for fractional packing and covering problems. Mathematics of Operations Research, 20 (1995), 257–301.