Facility Location in Sublinear Time

Mihai Bădoiu¹, Artur Czumaj², Piotr Indyk¹, and Christian Sohler³

¹ MIT Computer Science and Artificial Intelligence Laboratory, Stata Center, Cambridge, Massachusetts 02139, USA. {mihai, indyk}@theory.lcs.mit.edu
² Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA. [email protected]
³ Heinz Nixdorf Institute and Computer Science Department, University of Paderborn, D-33102 Paderborn, Germany. [email protected]

Research supported in part by NSF grant ITR-CCR-0313219. Research supported in part by DFG grant Me 872/8-2 and by the IST program of the EC under contract no. IST-2002-001-907 (DELIS).

Abstract. In this paper we present a randomized constant factor approximation algorithm for the problem of computing the optimal cost of the metric Minimum Facility Location problem, in the case of uniform costs and uniform demands, and in which every point can open a facility. By exploiting the fact that we approximate the optimal cost without computing an actual solution, we give the first algorithm for this problem with running time $O(n \log^2 n)$, where $n$ is the number of metric space points. Since the size of the representation of an $n$-point metric space is $\Theta(n^2)$, the complexity of our algorithm is sublinear with respect to the input size. We also consider the general version of the metric Minimum Facility Location problem, and we show that there is no $o(n^2)$-time algorithm, even a randomized one, that approximates the cost of the optimal solution to within any factor. This result can be generalized to related problems; in particular, the cost of minimum-cost matching, the cost of bichromatic matching, and the cost of $n/2$-median cannot be approximated in $o(n^2)$ time.

1 Introduction

The design of algorithms operating on massive data sets has received a lot of attention in recent years. The practical motivation for this study is that polynomial-time algorithms that are efficient on relatively small inputs may become impractical for input sizes of several gigabytes. For example, approximation algorithms for clustering problems in metric spaces typically have $\Omega(n^2)$ running time, where $n$ is the number of input points. Clearly, such a running time is not feasible for massive data sets. And for many problems, like the facility location problem considered in this paper, such a running time is provably unavoidable. Surprisingly, these lower bounds do not necessarily hold when one wants to estimate only the cost of an optimal solution. In this paper we show that there is a constant factor approximation algorithm for the metric uncapacitated facility location problem with uniform costs, in which every point can open a facility, that runs in $O(n \log^2 n)$ time, that is, in time sublinear in the input size.

Our approach is motivated by the fact that in many applications it suffices to know an approximate cost of the facility location problem rather than an approximate solution itself. Consider, for example, a company that wants to invest money and can relate the cost of the facility location problem to the possible return on investment. It would first solve an instance of the problem for every market to find the most profitable one. In such a situation it is sufficient to know the return on investment before deciding which market to enter; it is not (yet) necessary to know how to achieve it. Only once the market is chosen does one have to compute a solution to a single instance of the problem. Therefore, if one could approximate the cost of an optimal solution significantly faster than finding such an approximate solution, this would significantly speed up the market analysis. Similar arguments hold for another popular application of facility location algorithms, clustering data sets: in particular, it is good to know whether the data can be "well-clustered" before actually attempting to find the clustering.

1.1 Our Results

In this paper we consider the metric Minimum Facility Location problem with uniform opening costs and demands, in which every point can open a facility. We give a randomized $O(1)$-approximation algorithm for the cost of this problem that runs in time $O(n \log^2 n)$, where $n$ is the number of metric space points. Since the size of the representation of an $n$-point metric space is $\Theta(n^2)$, the complexity of our algorithm is sublinear with respect to the input size. No $o(n^2)$-time approximation algorithm for this problem was known before. It is known that any constant factor approximation algorithm that returns not only the cost but also a solution itself requires $\Omega(n^2)$ running time [14].

Next, we prove that if the set of facilities and the set of cities (points that are to be connected to the facilities) are allowed to be disjoint, then any approximation algorithm, even a randomized one, that guarantees any bounded approximation ratio for the cost of the Minimum Facility Location problem requires time $\Omega(n^2)$. This bound holds even when the opening costs and demands are uniform. Furthermore, our proof can be extended to the problems of estimating the cost of minimum-cost matching, the cost of bi-chromatic matching, and the cost of $k$-median for $k = n/2$; all these problems require $\Omega(n^2)$ time to estimate the cost of their optimal solution to within any factor. We feel that these results demonstrate that most optimization problems on metric instances do not have sublinear-time algorithms even for estimating the cost of the optimal solution well; results like our sublinear-time algorithm for an $O(1)$-factor approximation of the cost of the optimal solution for the metric uniform Minimum Facility Location problem are rare (see, however, [4, 6, 7]).

1.2 Our Techniques

Our sublinear-time algorithm consists of two principal steps: we first prove the existence of an appropriate estimator for the cost of the Minimum Facility Location problem, and then we show how this estimator can be approximated in time $O(n \log^2 n)$. Our estimator is obtained by extending the primal-dual approach from [12]: for each point we define an approximation of the contribution of that point to the total cost, and then we prove that the sum of the contributions over all points approximates the cost of the Minimum Facility Location problem. An important property of our estimator is that it can be efficiently approximated by adaptive sampling. We first prove that the value of the estimator for any single point can be efficiently approximated by sampling, with the running time depending on the value of the estimator, and then we apply another adaptive sampling scheme to efficiently approximate the sum of the estimators. A similar approach has been used in recent sublinear-time algorithms for estimating the cost of the minimum spanning tree in [2] and [4].

1.3 Definition of the Problem

The formal definition of the general form of the (Metric) Minimum Facility Location problem is as follows. We are given a metric $(P, D)$ and a subset $\mathcal{F} \subseteq P$ of facilities. For each facility $v \in \mathcal{F}$ we are given a nonnegative cost $f(v)$, and for each point $u \in P$ a nonnegative demand $d(u)$. The problem consists of finding a set $F \subseteq \mathcal{F}$ so as to minimize

$$\sum_{v \in F} f(v) + \sum_{u \in P} d(u) \cdot D(u, F),$$

where $D(u, F) = \min_{v \in F} D(u, v)$. In this paper we focus on the variant of the facility location problem with $\mathcal{F} = P$ and in which the costs as well as the demands are uniform. That is, for each $v \in \mathcal{F}$, $f(v) = c$ for some $c > 0$, and for each $u \in P$, $d(u) = 1$. Observe that we can assume $c = 1$ if we re-scale the given metric by dividing all distances by $c$. In what follows, we refer to this variant of the facility location problem as uniform. The key property of our formulation is that we are interested in computing the cost of the optimal solution without computing a solution itself. Thus, in what follows, our task is to approximate the value

$$\min_{F \subseteq P} \Bigl( |F| + \sum_{u \in P} D(u, F) \Bigr).$$
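To make the objective concrete, the following minimal sketch (our illustration, not code from the paper) evaluates the uniform cost $|F| + \sum_{u \in P} D(u, F)$ for a candidate facility set. It assumes the metric is given as an $n \times n$ distance matrix `dist`; the name `uniform_fl_cost` is ours.

```python
def uniform_fl_cost(dist, facilities):
    """Uniform facility location objective: |F| plus the sum, over all
    points, of the distance to the nearest open facility."""
    n = len(dist)
    opening_cost = len(facilities)  # uniform cost 1 per open facility
    connection_cost = sum(min(dist[u][v] for v in facilities)
                          for u in range(n))
    return opening_cost + connection_cost
```

For instance, `uniform_fl_cost(dist, [0])` is the cost of opening a single facility at $p_1$.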


In the final part of the paper we also consider a more general variant of the problem, in which $\mathcal{F}$ and $P$ do not have to be the same. We prove in Theorem 2 that in that case there is no hope of obtaining a sublinear-time algorithm.

1.4 Previous Work

The Minimum Facility Location problem is one of the most extensively studied problems in combinatorial optimization. The problem is known to be NP-hard, and the first constant factor approximation algorithm was given by Shmoys et al. [13]. Several other approximation algorithms are given in [1, 3, 8]. The best known approximation ratio of 1.52 is due to Mahdian, Ye, and Zhang [10], while the best lower bound of 1.463 on the approximation ratio is due to Guha and Khuller [5]. The first constant factor approximation algorithm with almost linear running time (that is, running time $O(n^2 \log n)$) was given by Jain and Vazirani [9]; Mettu and Plaxton [12] gave a simple $O(n^2)$-time constant approximation ratio algorithm. Thorup [14] considered the facility location problem in metric spaces defined by a graph. If the underlying graph has $m$ edges, then even though the metric space is of size $\Theta(n^2)$, Thorup gives a constant-factor approximation algorithm running in time $\widetilde{O}(m)$; this is sublinear time for sparse graphs. On the other hand, it has been shown [14] that for general metric spaces, any constant factor approximation algorithm, even a randomized one, requires running time $\Omega(n^2)$. Notice that this does not exclude the possibility of approximating the cost of the Minimum Facility Location problem in sublinear time, in particular, in time $O(n\,\mathrm{polylog}(n))$.

2 Estimating the Cost of Uniform Minimum Facility Location

In this section we present an $O(n \log^2 n)$ time algorithm that approximates the cost of the Minimum Facility Location problem in the uniform case, that is, when both the costs and the demands are uniform.

2.1 Preliminaries

Let $(P, D)$ be a metric with point set $P = \{p_1, \ldots, p_n\}$. For any point $p_i \in P$ and any $r \ge 0$, we denote by $B(p_i, r)$ the set of points in $P$ at distance at most $r$ from $p_i$. For each $i$, $1 \le i \le n$, let $r_i > 0$ be the number satisfying

$$\sum_{p \in B(p_i, r_i)} (r_i - D(p_i, p)) = 1.$$

Observe that the value $\sum_{p \in B(p_i, r)} (r - D(p_i, p))$ is continuous and strictly monotonically increasing in $r$. Thus, there exists a unique value $r_i$ satisfying the above equality. Moreover, for any $i$, $1 \le i \le n$, we have $1/n \le r_i \le 1$.

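Because the sum on the left-hand side is piecewise linear and strictly increasing in $r$, each $r_i$ can be computed exactly by scanning the points in order of distance from $p_i$: with exactly the $k$ nearest points inside the ball, the defining equation becomes $k r - \sum_{j \le k} d_j = 1$. The following sketch (our illustration, under the same distance-matrix assumption as above) does this in $O(n \log n)$ time per point.

```python
def exact_radius(dist_row):
    """Solve sum_{p in B(p_i, r)} (r - D(p_i, p)) = 1 for r, given the
    row of distances from p_i to all n points (including 0 for p_i itself)."""
    d = sorted(dist_row)              # d[0] == 0 is p_i itself
    prefix = 0.0                      # running sum of the k smallest distances
    for k in range(1, len(d) + 1):
        prefix += d[k - 1]
        r = (1.0 + prefix) / k        # solves k*r - prefix = 1
        # valid iff exactly the k nearest points lie within radius r
        if d[k - 1] <= r and (k == len(d) or r <= d[k]):
            return r
    raise ValueError("no radius found; input is not a valid distance row")
```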

We begin with a lemma that establishes the relation between the value of $r_i$ and the size of $B(p_i, r_i)$.

Lemma 1. For every $i$ with $1 \le i \le n$, we have

$$\frac{1}{|B(p_i, r_i)|} \le r_i \le \frac{2}{|B(p_i, r_i/2)|}.$$

Proof. By the definition of $r_i$, we have $\sum_{p \in B(p_i, r_i)} (r_i - D(p_i, p)) = 1$, which implies $\sum_{p \in B(p_i, r_i)} r_i \ge 1$, and thus $r_i \ge 1/|B(p_i, r_i)|$. The other inequality follows directly from

$$1 = \sum_{p \in B(p_i, r_i)} (r_i - D(p_i, p)) \ge \sum_{p \in B(p_i, r_i/2)} (r_i - D(p_i, p)) \ge |B(p_i, r_i/2)| \cdot \frac{r_i}{2}. \qquad \Box$$

MP algorithm. In our analysis we use a simple approximation algorithm for the Minimum Facility Location problem due to Mettu and Plaxton [12]; we refer to it as the MP algorithm.

1. Compute the value of $r_i$ for every $p_i \in P$.
2. Sort the input such that $r_1 \le r_2 \le \cdots \le r_n$.
3. For $i = 1$ to $n$: if there is no open facility in $B(p_i, 2 r_i)$, then open a facility at $p_i$.

Mettu and Plaxton [12] proved that this simple algorithm returns a set of open facilities whose total cost is at most 3 times the minimum.
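A direct transcription of these three steps might look as follows; this is a quadratic-time sketch (our rendering, reusing the `exact_radius` helper sketched above), not the authors' implementation.

```python
def mp_algorithm(dist):
    """Sketch of the Mettu-Plaxton algorithm for uniform opening costs.
    Returns the list of opened facility indices; runs in O(n^2) time."""
    n = len(dist)
    radii = [exact_radius(dist[i]) for i in range(n)]
    order = sorted(range(n), key=lambda i: radii[i])   # r_1 <= ... <= r_n
    opened = []
    for i in order:
        # open p_i only if no already-open facility lies in B(p_i, 2 r_i)
        if all(dist[i][j] > 2 * radii[i] for j in opened):
            opened.append(i)
    return opened
```

Its cost can then be read off with `uniform_fl_cost(dist, mp_algorithm(dist))`; by [12] this is at most 3 times the optimum.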

2.2 Cost Estimation

In this section we show that the sum of the radii approximates the optimal cost of the facility location problem to within a constant factor. Our analysis uses the relation between the sum $\sum_{p_i \in P} r_i$ and the costs of an optimal solution and of the solution obtained by the MP algorithm discussed above.

Let $C_{OPT}$ be the cost of an optimal solution, and let $F_{MP}$ be the set of facilities computed by the MP algorithm. For the solution given by the MP algorithm, we define $C_{MP}$, $C^{c}_{MP}$, and $C^{f}_{MP}$ to be the total cost, the connection cost, and the facility cost, respectively. The following lemma shows that the sum of the radii estimates $C_{OPT}$ well.

Lemma 2. $\frac{1}{4} \cdot C_{OPT} \le \sum_{p_i \in P} r_i \le 6 \cdot C_{OPT}$.

Proof. We first prove the lower bound, $C_{OPT} \le 4 \cdot \sum_{p_i \in P} r_i$, and then the upper bound, $\sum_{p_i \in P} r_i \le 6 \cdot C_{OPT}$.

Lower bound: Since in the MP algorithm for every $p_i \in P$ there is an open facility within distance at most $2 r_i$ (for if not, the algorithm would open a facility at $p_i$), we get that $2 \sum_{p_i \in P} r_i \ge C^{c}_{MP}$.

It remains to show that $\sum_{p_i \in P} r_i$ is an upper bound on $C^{f}_{MP}$. We first observe that every $p_i \in P$ is contained in at most one ball $B(p_j, r_j)$ with $p_j \in F_{MP}$. Indeed, if $p_i \in B(p_j, r_j) \cap B(p_k, r_k)$ for some $p_j, p_k \in F_{MP}$, $j < k$, then since $r_j \le r_k$ we would have $p_j \in B(p_k, 2 r_k)$. But this implies that the MP algorithm would not open the facility at $p_k$, a contradiction. This observation yields

$$\sum_{p_i \in P} r_i \;\ge\; \sum_{p_j \in F_{MP}} \; \sum_{p_k \in B(p_j, r_j)} r_k. \tag{1}$$

Next, we observe that if $p_j \in F_{MP}$ and $p_k \in B(p_j, r_j)$, then we must have $r_j \le 2 r_k$. Indeed, for if not, we would have $B(p_k, 2 r_k) \subseteq B(p_k, r_j) \subseteq B(p_j, r_j + D(p_j, p_k)) \subseteq B(p_j, 2 r_j)$, and thus the MP algorithm would not open the facility at $p_j$, a contradiction. This observation can now be combined with (1) to conclude

$$\sum_{p_i \in P} r_i \;\ge\; \sum_{p_j \in F_{MP}} \; \sum_{p_k \in B(p_j, r_j)} r_k \;\ge\; \sum_{p_j \in F_{MP}} \; \sum_{p_k \in B(p_j, r_j)} \frac{r_j}{2} \;=\; \frac{1}{2} \sum_{p_j \in F_{MP}} r_j \cdot |B(p_j, r_j)| \;\ge\; \frac{1}{2} \sum_{p_j \in F_{MP}} 1 \;=\; \frac{1}{2} \cdot C^{f}_{MP},$$

where the last inequality follows from the fact that $r_j \ge 1/|B(p_j, r_j)|$ (Lemma 1). Thus, we have $2 \cdot \sum_{p_i \in P} r_i \ge C^{f}_{MP}/2 + C^{c}_{MP}/2 = C_{MP}/2 \ge C_{OPT}/2$.

Upper bound: Next, we show that the sum of the radii is not much bigger than the cost of an optimal solution. Before we proceed, we introduce one definition from [12]. For a set $X \subseteq P$ and a point $p_i \in P$, we define

$$\mathrm{charge}(p_i, X) = D(p_i, X) + \sum_{p_j \in X} \max\{0, \, r_j - D(p_i, p_j)\}.$$

Mettu and Plaxton [12] proved that $C_{MP} = \sum_{p_i \in P} \mathrm{charge}(p_i, F_{MP})$. Now we are ready to prove that $\sum_{p_i \in P} r_i \le 2 \cdot C_{MP}$, which will imply that $\sum_{p_i \in P} r_i \le 6 \cdot C_{OPT}$. We have

$$2 \cdot C_{MP} = 2 \cdot \sum_{p_i \in P} \mathrm{charge}(p_i, F_{MP}) \;\ge\; 2 \cdot \Bigl( \sum_{p_i \in F_{MP}} r_i + \sum_{p_j \in P \setminus F_{MP}} \max\{ r_{\delta(j)}, \, D(p_j, p_{\delta(j)}) \} \Bigr),$$

where $\delta(j)$ denotes the index of the facility in $F_{MP}$ that is closest to $p_j$. We want to show

$$2 \cdot \Bigl( \sum_{p_i \in F_{MP}} r_i + \sum_{p_j \in P \setminus F_{MP}} \max\{ r_{\delta(j)}, \, D(p_j, p_{\delta(j)}) \} \Bigr) \;\ge\; \sum_{p_i \in P} r_i.$$

We will show that $r_j \le D(p_j, p_{\delta(j)}) + r_{\delta(j)}$, which immediately implies the above inequality, because then $\max\{ r_{\delta(j)}, D(p_j, p_{\delta(j)}) \} \ge r_j/2$. Assume $r_j > D(p_j, p_{\delta(j)}) + r_{\delta(j)}$. In this case we have $B(p_{\delta(j)}, r_{\delta(j)}) \subseteq B(p_j, r_j)$. We get

$$\sum_{p \in B(p_j, r_j)} (r_j - D(p_j, p)) \;\ge\; \sum_{p \in B(p_{\delta(j)}, r_{\delta(j)})} (r_j - D(p_j, p)) \;>\; \sum_{p \in B(p_{\delta(j)}, r_{\delta(j)})} (r_{\delta(j)} - D(p_{\delta(j)}, p)) \;=\; 1.$$

This is a contradiction, because the definition of $r_j$ requires

$$\sum_{p \in B(p_j, r_j)} (r_j - D(p_j, p)) = 1.$$

To summarize, we have proven that $2 \cdot C_{MP} \ge \sum_{p_i \in P} r_i$, and the upper bound now follows from the fact that $C_{MP} \le 3 \cdot C_{OPT}$ [12]. $\Box$

2.3 Estimating the Cost of the Facility Location Problem

From the previous section we know that to approximate the cost of the facility location problem it suffices to estimate the sum $\sum_i r_i$ of the radii $r_1, \ldots, r_n$ of the points $p_1, \ldots, p_n$. A standard approach would be to sample a set of $s$ points (for a suitable $s$), determine (possibly approximately) their radii, and then output $n$ times their average radius as an approximation of $\sum_i r_i$. However, this approach cannot lead to a sublinear-time algorithm, for the following reason. In general, the time to determine the radius of a point is $\Omega(n)$. For example, this might be the case when the radius is constant, because then only a constant number of points lie within the radius: to certify that a point has constant radius, an algorithm must certify that no more than a constant number of points are within the radius, and this cannot be done in $o(n)$ time (even if one aims at an approximation and uses randomization). We also note that, in general, $s = \Omega(n)$ samples are needed for a constant factor approximation of $\sum_i r_i$. This follows from standard Chernoff-Hoeffding bounds (which are essentially tight in this setting) and the fact that the average radius can be as small as $1/n$. Therefore, this standard sampling approach does not give a sublinear-time algorithm.

In the following we show that an adaptive sampling algorithm can estimate the value of $r_i$ in $O(r_i n \log n)$ time (recall that $r_i \le 1$). We start with a constant size sample of points and determine their average radius. If our sample is too small, we double it and continue until we have found a sample of sufficient size. For the analysis we parameterize the sample size $s$ by the average value of the $r_i$. Combining this with the running time of the adaptive algorithm leads to a sublinear algorithm. Details follow in the next two subsections.

2.4 Estimating $r_i$

In this section we present an algorithm that, for a given $i$, approximates the value of $r_i$ to within a constant factor in time $O(r_i n \log n)$, with high probability. Let us fix $i$. Our approach is to estimate the value of $r$ for which $B(p_i, r)$ contains approximately $1/r$ points. This is formalized in the following lemma.


Lemma 3. Let $j_0$ be the maximum integer $j$, with $1 \le j \le \log n$, such that $|B(p_i, 2^{-j})| \ge 2^j$. Then we have $2^{-(j_0+1)} \le r_i \le 2^{-j_0+1}$.

Proof. We use Lemma 1. By the definition of $j_0$, we have $|B(p_i, 2^{-(j_0+1)})| < 2^{j_0+1}$ and $|B(p_i, 2^{-j_0})| \ge 2^{j_0}$. The first inequality implies that for any $r < 2^{-(j_0+1)}$, $|B(p_i, r)| \le |B(p_i, 2^{-(j_0+1)})| < 2^{j_0+1} < 1/r$. This bound together with the lower bound in Lemma 1 yields $r_i \ge 2^{-(j_0+1)}$. On the other hand, the inequality $|B(p_i, 2^{-j_0})| \ge 2^{j_0}$ implies that for any $r > 2^{-j_0+1}$, $|B(p_i, r/2)| \ge |B(p_i, 2^{-j_0})| \ge 2^{j_0} > 2/r$. Therefore, by the upper bound in Lemma 1 we must have $r_i \le 2^{-j_0+1}$. $\Box$

Lemma 3 implies that in order to estimate $r_i$, it suffices to estimate the value of $j_0$. Our algorithm to estimate $j_0$ runs as follows. We begin by setting $j = \log n$, and then we decrease $j$ by one until, for the first time, $|B(p_i, 2^{-j})| \ge 2^j$. Since computing $|B(p_i, 2^{-j})|$ exactly requires $\Omega(n)$ time, we only approximate $|B(p_i, 2^{-j})|$ by random sampling; this reduces the running time. At each step, we pick uniformly at random, and with replacement, $K_j = c \, 2^{-j} n \log n$ sample points to estimate the value of $|B(p_i, 2^{-j})|$, where $c$ is a sufficiently large constant. Let $N_j$ be the number of sample points inside the ball $B(p_i, 2^{-j})$. We return $\beta_j = n N_j / K_j$ as the estimator of $|B(p_i, 2^{-j})|$. In the following three lemmas we first analyze the quality of the estimator $\beta_j$ and then discuss the running time of this sampling scheme.

Lemma 4. If $j \ge j_0 + 2$, then $\Pr[\beta_j \ge 2^j] < 1/\mathrm{poly}(n)$.

Proof. Since $j \ge j_0 + 2$, it follows that $B(p_i, 2^{-j}) \subseteq B(p_i, 2^{-(j_0+1)})$. Let $q$ be the probability that a randomly chosen sample point is in $B(p_i, 2^{-j})$. We have $q \le |B(p_i, 2^{-(j_0+1)})| / n$. By the choice of $j_0$, we have $|B(p_i, 2^{-(j_0+1)})| < 2^{j_0+1}$, and thus $q < 2^{j_0+1}/n \le 2^{j-1}/n$. The expected number of sample points that fall inside $B(p_i, 2^{-j})$ is $\mathbb{E}[N_j] = q K_j < \frac{c \log n}{2}$. Applying the Chernoff bound, we obtain

$$\Pr[\beta_j \ge 2^j] = \Pr[N_j \ge c \log n] < 1/\mathrm{poly}(n). \qquad \Box$$

Lemma 5. If $j \le j_0 - 1$, then $\Pr[\beta_j \ge 2^j] > 1 - 1/\mathrm{poly}(n)$.

Proof. Since $j \le j_0 - 1$, it follows that $|B(p_i, 2^{-j})| \ge |B(p_i, 2^{-j_0})| \ge 2^{j_0} \ge 2^{j+1}$. Let $q$ be the probability that a randomly chosen sample point is in $B(p_i, 2^{-j})$. We have $q \ge 2^{j+1}/n$. The expected number of sample points that fall inside $B(p_i, 2^{-j})$ is $\mathbb{E}[N_j] = q K_j \ge 2 c \log n$. Applying the Chernoff bound, we obtain

$$\Pr[\beta_j \ge 2^j] = \Pr[N_j \ge c \log n] > 1 - 1/\mathrm{poly}(n). \qquad \Box$$

Lemma 6. The described procedure estimates the value of $r_i$ to within a constant factor in time $O(r_i n \log n)$, with high probability.


Proof. Let $\hat{j}_0$ be the estimated value of $j_0$. By Lemmas 4 and 5, it follows that with high probability $j_0 \le \hat{j}_0 \le j_0 + 1$. If we use the value $\hat{r}_i = 2^{-\hat{j}_0}$ as an estimate of $r_i$, then by Lemma 3 we obtain $r_i/2 \le \hat{r}_i \le 4 r_i$. Moreover, with high probability, the running time of the procedure is at most $\sum_{j=\hat{j}_0}^{\log n} O(K_j) = O(r_i n \log n)$. $\Box$
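Putting Lemmas 3-6 together, the estimation procedure can be rendered as the following sketch. The constant `c`, the loop bounds, and the fallback return are our illustrative choices, not the paper's tuned values.

```python
import math
import random

def estimate_radius(dist_row, c=20, rng=random):
    """Estimate r_i within a constant factor, w.h.p., by searching for
    j0 = max { j : |B(p_i, 2^-j)| >= 2^j } with decreasing j, using
    K_j = c * 2^-j * n * log n distance probes per step."""
    n = len(dist_row)
    for j in range(max(1, math.ceil(math.log2(n))), 0, -1):
        k_j = max(1, math.ceil(c * n * math.log(n) / 2 ** j))
        hits = sum(dist_row[rng.randrange(n)] <= 2.0 ** (-j)
                   for _ in range(k_j))
        beta_j = n * hits / k_j        # estimator of |B(p_i, 2^-j)|
        if beta_j >= 2 ** j:
            return 2.0 ** (-j)         # estimated j0 found
    return 1.0                         # fallback: r_i <= 1 always holds
```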

2.5 Estimating the Sum of the Radii

In this section we show how to estimate $\sum_i r_i$ in time almost linear in $n$. Let us first assume that we know the cost $c$ of the solution, and we sample a set of $s$ points independently and uniformly at random, where $s = \Theta(\frac{n}{c} \log n)$. Since by Lemma 6 the running time to estimate a radius $r_i$ is $O(r_i n \log n)$, the total expected running time of the algorithm is

$$\mathbb{E}[\text{time}] = s \cdot \mathbb{E}[\text{one step}] = s \cdot O\Bigl( \frac{1}{n} \sum_i r_i \, n \log n \Bigr) = O(n \log^2 n).$$

Let $x_i$, for $i \in \{1, 2, \ldots, s\}$, be the radii of the sample points taken by the algorithm. We have $\mathbb{E}[x_i] = \frac{\sum_j r_j}{n}$. Let $S = \sum_{i=1}^{s} x_i$; hence

$$\mathbb{E}[S] = \frac{s \cdot \sum_i r_i}{n} = \frac{\Theta(\frac{n}{c} \log n) \cdot \sum_i r_i}{n} = \Theta\Bigl( \frac{\sum_i r_i}{c} \cdot \log n \Bigr) = \Theta(\log n).$$

Let $\epsilon > 0$ be arbitrary. Our goal is to use the value of $S$ as an estimator of $\frac{s}{n} \sum_i r_i$. To show the quality of this estimator, we bound $\Pr[\,|S - \mathbb{E}[S]| \ge \epsilon \cdot \mathbb{E}[S]\,]$. Using the fact that $0 \le x_i \le 1$ for every $i$, we apply a variant of the Hoeffding inequality (see [11, Theorem 2.3]) to obtain

$$\Pr[S \ge (1+\epsilon) \cdot \mathbb{E}[S]] \le e^{-\frac{\epsilon^2 \cdot \mathbb{E}[S]}{2(1+\epsilon/3)}}, \qquad \Pr[S \le (1-\epsilon) \cdot \mathbb{E}[S]] \le e^{-\frac{1}{2} \epsilon^2 \cdot \mathbb{E}[S]}.$$

This immediately implies the following bound for any $0 < \epsilon \le 1$:

$$\Pr[\,|S - \mathbb{E}[S]| \ge \epsilon \cdot \mathbb{E}[S]\,] \le 2 e^{-\Theta(\epsilon^2 \cdot \mathbb{E}[S])} = 2 e^{-\Theta(\epsilon^2 \cdot \log n)}.$$

We now show how to remove the assumption that we know the cost of the solution. We run the algorithm in phases. We start in the first phase by guessing $c = n$, since we know that the cost of an optimal solution is not bigger than $n$. If $S < \frac{s}{n} \cdot c$, then we start a new phase with estimated cost $c/2$, and so on. If $S \ge \frac{s}{n} \cdot c$, we return $S \cdot n/s$ as the approximation of the cost. The probability that the algorithm ends in a bad phase (one where $S$ is far away from $\frac{s}{n} \cdot c$) is low, because $\Pr[S \ge (1+\epsilon) \cdot \mathbb{E}[S]] < 1/\mathrm{poly}(n)$, as shown above. Since any solution must open at least one facility, we have $c \ge 1$, and therefore there are at most logarithmically many phases. Note that guessing $c$ in phases incurs only a constant factor slowdown, because the last phase, for the smallest $c$, dominates the running time of all the other phases. Thus we obtain the following theorem.
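The overall estimator can then be sketched as follows (again our rendering; the sample-size constant is a placeholder, and `estimate_radius` is the hypothetical helper sketched above).

```python
def estimate_fl_cost(dist, rng=random):
    """Estimate the uniform facility location cost within O(1), w.h.p.:
    guess c = n, sample s = Theta((n/c) log n) points, estimate their
    radii, and halve c until the sample sum S confirms the guess."""
    n = len(dist)
    c = float(n)                        # the optimal cost is at most n
    while True:
        s = max(1, math.ceil(20 * n * math.log(n + 1) / c))
        S = sum(estimate_radius(dist[rng.randrange(n)], rng=rng)
                for _ in range(s))
        if S >= s * c / n or c <= 1.0:  # guess confirmed, or c bottomed out
            return S * n / s            # scaled sum estimates sum_i r_i
        c /= 2.0                        # guess too high: next phase
```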


Theorem 1. There exists a constant factor approximation algorithm for the uniform case of the Minimum Facility Location problem which runs in time $O(n \log^2 n)$ with high probability.
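As a sanity check, one might compare the estimator against the MP baseline on a toy instance; the demo below is ours and relies on the helpers sketched earlier, with points drawn from the unit square so that the (rescaled) opening cost is 1.

```python
if __name__ == "__main__":
    n = 200
    pts = [(random.random(), random.random()) for _ in range(n)]
    dist = [[math.dist(p, q) for q in pts] for p in pts]
    mp_cost = uniform_fl_cost(dist, mp_algorithm(dist))   # <= 3 * OPT
    print("MP cost:", mp_cost, "sublinear estimate:", estimate_fl_cost(dist))
```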

3 Lower Bounds: Estimating the Cost in the General Case of the Uniform Minimum Facility Location Problem Requires $\Omega(n^2)$ Time (Even for Randomized Algorithms)

In this section we consider the general case of the Minimum Facility Location problem, in which we do not impose the restriction that $\mathcal{F} = P$ (that is, we allow only a subset of the points to open facilities). We focus again on the uniform case, and the goal is to minimize the cost

$$\min_{F \subseteq \mathcal{F}} \Bigl( |F| + \sum_{p \in P} D(p, F) \Bigr).$$

Our main result is the following theorem.

Theorem 2. For any $\lambda \ge 1$, every approximation algorithm (even a randomized one) with approximation ratio $\lambda$ for the cost of the Minimum Facility Location problem as defined above requires time $\Omega(n^2)$.

Proof. We show the existence of two instances of metric spaces that are indistinguishable by any $o(n^2)$-time algorithm and such that the cost of the Minimum Facility Location in one instance is greater than $\lambda$ times the one in the other instance (see Fig. 1).

Fig. 1. Two metric spaces indistinguishable by any $o(n^2)$-time algorithm whose costs of the Minimum Facility Location differ by a factor of $\lambda$. The perfect matching connecting $\mathcal{F}$ with $P$ is selected at random, and the edge $e$ is selected as a random edge from the matching. We set $Q = 2n(\lambda - 1) + 2$. The distances not shown are all equal to $n^3$.

Let us consider the metric space with $2n$ points: $n$ points in $P$ and $n$ points in $\mathcal{F}$. Take a random perfect matching $M$ between the points in $P$ and $\mathcal{F}$, and choose an edge $e \in M$ at random. Now, we define the distances in $(P \cup \mathcal{F}, D)$ as follows:

– for any $e^* \in M \setminus \{e\}$, $D(e^*) = 1$,
– $D(e)$ is either $1$ or $Q = 2n(\lambda - 1) + 2$, and
– for any pair of points $x, y$ not connected by an edge from $M$, $D(x, y) = n^3$.

It is easy to see that both instances properly define a metric space $(P \cup \mathcal{F}, D)$. Furthermore, for such problem instances the optimal solution to the Minimum Facility Location problem opens all facilities, and its cost depends on the choice of $D(e)$: if $D(e) = Q$ then the cost is $2n - 1 + Q > 2\lambda n$, and if $D(e) = 1$, then the cost is $2n$. Hence, any $\lambda$-factor approximation algorithm for the cost of the Minimum Facility Location problem must distinguish between these two problem instances. However, this requires determining whether there is an edge of length $Q$, which is known to require time $\Omega(n^2)$, even if a randomized algorithm is used. $\Box$
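For concreteness, the construction can be sketched as follows (our illustration; `hard_instance` and its parameters are hypothetical names). The flag `long_edge` chooses between the two instances, which differ only in the length of the hidden edge $e$.

```python
def hard_instance(n, lam, long_edge, rng=random):
    """Distance matrix on 2n points (cities 0..n-1, facilities n..2n-1):
    a random perfect matching of unit edges, one random matching edge of
    length 1 or Q = 2n(lam - 1) + 2, and n^3 for all other pairs."""
    Q = 2 * n * (lam - 1) + 2
    far = float(n ** 3)
    dist = [[far] * (2 * n) for _ in range(2 * n)]
    for i in range(2 * n):
        dist[i][i] = 0.0
    partners = list(range(n, 2 * n))
    rng.shuffle(partners)              # random perfect matching P <-> F
    e = rng.randrange(n)               # the hidden distinguished edge
    for city, fac in enumerate(partners):
        d = Q if (long_edge and city == e) else 1.0
        dist[city][fac] = dist[fac][city] = d
    return dist
```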

3.1 Extensions

It is not difficult to see that an almost identical proof works for estimating the cost of minimum-cost matching, the cost of minimum-cost bi-chromatic matching, and also the cost of $k$-median for $k = n/2$; all these problems require $\Omega(n^2)$ time to estimate the cost of their optimal solution to within any factor. No such lower bounds were previously known.

Theorem 3. For any $\lambda \ge 1$, every approximation algorithm (even a randomized one) with approximation ratio $\lambda$ for each of the following problems requires time $\Omega(n^2)$:

– estimating the cost of minimum-cost matching for a set of $n$ points in a metric space,
– estimating the cost of minimum-cost bi-chromatic matching for a set of $n$ points in a metric space,
– estimating the cost of metric $k$-median for $k = n/2$.

References

1. M. Charikar and S. Guha. Improved combinatorial algorithms for the facility location and k-median problems. Proceedings of the 40th IEEE Symposium on Foundations of Computer Science (FOCS), pp. 378–388, 1999.
2. B. Chazelle, R. Rubinfeld, and L. Trevisan. Approximating the minimum spanning tree weight in sublinear time. Proceedings of the 28th Annual International Colloquium on Automata, Languages and Programming (ICALP), pp. 190–200, 2001.


3. F. A. Chudak. Improved approximation algorithms for uncapacitated facility location. Proceedings of the 6th International Integer Programming and Combinatorial Optimization Conference (IPCO), pp. 180–194, 1998.
4. A. Czumaj and C. Sohler. Estimating the weight of metric minimum spanning trees in sublinear time. Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), pp. 175–183, 2004.
5. S. Guha and S. Khuller. Greedy strikes back: Improved facility location algorithms. Journal of Algorithms, 31(1): 228–248, 1999.
6. P. Indyk. Sublinear time algorithms for metric space problems. Proceedings of the 31st Annual ACM Symposium on Theory of Computing (STOC), pp. 428–434, 1999.
7. P. Indyk. A sublinear time approximation scheme for clustering in metric spaces. Proceedings of the 40th IEEE Symposium on Foundations of Computer Science (FOCS), pp. 154–159, 1999.
8. K. Jain, M. Mahdian, and A. Saberi. A new greedy approach for facility location problems. Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC), pp. 731–740, 2002.
9. K. Jain and V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. Journal of the ACM, 48(2): 274–296, 2001.
10. M. Mahdian, Y. Ye, and J. Zhang. Improved approximation algorithms for metric facility location problems. Proceedings of the 5th International Workshop on Approximation Algorithms for Combinatorial Optimization (APPROX), pp. 229–242, 2002.
11. C. McDiarmid. Concentration. In M. Habib, C. McDiarmid, J. Ramirez-Alfonsin, and B. Reed, editors, Probabilistic Methods for Algorithmic Discrete Mathematics, Algorithms and Combinatorics, pp. 195–247. Springer-Verlag, Berlin, 1998.
12. R. R. Mettu and C. G. Plaxton. The online median problem. SIAM Journal on Computing, 32(3): 816–832, 2003.
13. D. B. Shmoys, E. Tardos, and K. Aardal. Approximation algorithms for facility location problems. Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC), pp. 265–274, 1997.
14. M. Thorup. Quick k-median, k-center, and facility location for sparse graphs. SIAM Journal on Computing, 34(2): 405–432, 2005.