Geometric Clustering: Fixed-Parameter Tractability and Lower Bounds with Respect to the Dimension

Sergio Cabello∗, Panos Giannopoulos†, Christian Knauer‡, Günter Rote‡

∗ University of Ljubljana, Department of Mathematics, IMFM and FMF, Jadranska 19, SI-1000 Ljubljana, Slovenia, sergio. [email protected]; partially supported by the Slovenian Research Agency, project J1-7218.
† Humboldt-Universität zu Berlin, Institut für Informatik, Unter den Linden 6, D-10099 Berlin, Germany, panos@informatik.hu-berlin.de.
‡ Freie Universität Berlin, Institut für Informatik, Takustr. 9, D-14195 Berlin, Germany, {knauer,rote}@inf.fu-berlin.de.

Abstract

We present an algorithm for the 3-center problem in $(\mathbb{R}^d, L_\infty)$, i.e., for finding the smallest side length for 3 cubes that cover a given $n$-point set in $\mathbb{R}^d$, that runs in $O(n \log n)$ time for any fixed dimension $d$. This shows that the problem is fixed-parameter tractable when parameterized with $d$. On the other hand, using tools from parameterized complexity theory, we show that this is unlikely to be the case with the $k$-center problem in $(\mathbb{R}^d, L_2)$, for any $k \ge 2$. In particular, we prove that deciding whether a given $n$-point set in $\mathbb{R}^d$ can be covered by the union of 2 balls of given radius is W[1]-hard with respect to $d$, and thus not fixed-parameter tractable unless FPT = W[1]. Our reduction also shows that even an $O(n^{o(d)})$-time algorithm for the latter does not exist, unless $\mathrm{SNP} \subset \mathrm{DTIME}(2^{o(n)})$.

Keywords: Clustering, Fixed-parameter tractability, Complexity, Lower bound, Dimension.

1 Introduction

A common type of facility location or clustering problem is the $k$-center problem, which is defined as follows: Given a set $P$ of $n$ points in a metric space and a positive integer $k$, find a set of $k$ supply points such that the maximum distance between a point in $P$ and its nearest supply point is minimized. For the cases of the $(\mathbb{R}^d, L_2)$ and $(\mathbb{R}^d, L_\infty)$ metric the problem is usually referred to as the Euclidean and rectilinear $k$-center, respectively. Drezner [6] describes many variations of the facility location problem and their numerous applications. $k$-center problems as well as other clustering problems can be formulated as geometric optimization problems and, as such, they have been studied extensively in the field of computational geometry; see, for example, the survey by Agarwal and Sharir [1] and the references therein.

Our Results. We show that the rectilinear 3-center problem can be solved in $O(6^d\, dn \log(dn))$ time, which is a considerable improvement over the fastest currently known algorithm of Assa and Katz [2], which runs in $O(n^{\lfloor d/3 \rfloor} \log n)$ time. Our algorithm is based on two ingredients. First, we solve the corresponding decision problem in $O(6^d\, dn + dn \log n)$ time by an elegant and quite simple reduction to 2-satisfiability (2SAT). In the decision problem, one asks whether $P$ can be covered by 3 axis-aligned cubes of given size. Second, we use the technique by Frederickson and Johnson [10] to efficiently search among the candidate values for the optimal side length of the cubes.

In terms of parameterized complexity theory (see below), our algorithm proves that the rectilinear 3-center problem is fixed-parameter tractable with respect to $d$: the running time is of the form $O(f(d) \cdot n^c)$, where $c$ is independent of $d$. Note that we cannot expect an algorithm that is polynomial in $n$ and $d$ because the problem is NP-hard [12].

On the negative side, we prove that the Euclidean $k$-center can probably not be solved with a running time of the form $O(f(d) \cdot n^c)$, even when $k = 2$. More precisely, we show that the corresponding decision problem is W[1]-hard with respect to $d$. The decision problem amounts to deciding whether $P$ can be covered by 2 balls of given radius. We prove this by an fpt-reduction from the $k$-independent set problem in general graphs, which is known to be W[1]-complete [8]. Moreover, our reduction implies that this problem and, consequently, the Euclidean $k$-center problem, for any $k \ge 2$, cannot be solved in $O(n^{o(d)})$ time unless $\mathrm{SNP} \subset \mathrm{DTIME}(2^{o(n)})$. This considerably strengthens Megiddo's [12] result that Euclidean 2-center is NP-hard.

Parameterized Complexity. We review some basic definitions of parameterized complexity theory; for an introduction to the field, the reader is referred to the textbooks by Downey and Fellows [5], and Flum and Grohe [8].
A problem with input size $n$ and a positive integer parameter $k$ is fixed-parameter tractable if it can be solved by an algorithm that runs in $O(f(k) \cdot n^c)$ time, where $f$ is a computable function depending only on $k$, and $c$ is a constant independent of $k$; such an algorithm is (informally) said to run in fpt-time. The class of all fixed-parameter tractable problems is denoted by FPT. An infinite hierarchy of classes, the W-hierarchy,


S. Cabello, P. Giannopoulos, C. Knauer, G. Rote — Geometric Clustering: Fixed-Parameter Tractability

has been introduced for establishing fixed-parameter intractability. Its first level, W[1], can be thought of as the parameterized analog of NP: a parameterized problem that is hard for W[1] is not in FPT unless FPT = W[1], which is considered highly unlikely under standard complexity theoretic assumptions. Hardness is sought via fpt-reductions: an fpt-reduction is an fpt-time Turing reduction from a problem $\Pi$, parameterized with $k$, to a problem $\Pi'$, parameterized with $k'$, such that $k' \le g(k)$ for some computable function $g$.

Related Results. Efficient polynomial-time algorithms have been found for the planar $k$-center problem when $k$ is a small constant [3, 7, 14]. Also, the rectilinear 2-center problem can be solved in polynomial time, even when $d$ is part of the input [12]. However, only $O(n^{O(kd)})$-time algorithms are known when both $k$ and $d$ are part of the input, in particular, for $k \ge 2$ and $d > 2$ for the Euclidean case [1], and $k \ge 3$ and $d \ge 6$ for the rectilinear case [2]. As for lower bounds, the only ones known come from classical complexity theory: Both the Euclidean and rectilinear (decision) problems are NP-hard for $d = 2$ when $k$ is part of the input [9, 13], while, as mentioned above, the Euclidean 2-center and rectilinear 3-center are NP-hard when $d$ is part of the input [12]. These results do not exclude the possibility of algorithms in which the exponent of $n$ in the running time is independent of the parameter $k$ or $d$ or both.

In this paper, we study the parameterized $k$-center problem when the dimension $d$ is the parameter. For the case where the number $k$ of clusters is considered as the parameter, Marx [11] showed that the rectilinear $k$-center decision problem is W[1]-hard, for any $d \ge 2$; according to him (personal communication), the reduction carries over to the Euclidean case as well.

Note: While this paper was under review, Dániel Marx has shown that the rectilinear 4-center decision problem is W[1]-hard when parameterized with the dimension $d$.
2 The Rectilinear 3-Center Problem

The function $x_j$ denotes the projection onto the $j$-th coordinate axis. Therefore, $x_j(p)$ is the $j$-th coordinate of a point $p$ and $x_j(A)$ is an interval for any cube $A$.

Theorem 2.1. (a) Given $n$ points in $d$ dimensions, we can decide whether they can be covered by three axis-aligned cubes of given side length $m$ in $O(6^d \cdot dn + dn \log n)$ time. (b) The smallest side length $m$ for which the given points can be covered can be determined in $O(6^d \cdot dn \log(dn))$ time.

Proof. (a) We can assume w.l.o.g. that $m = 1$. Let $P = \{p_1, \ldots, p_n\}$ be the input point set. We denote the three cubes by $A$, $B$, and $C$. Each cube is the Cartesian product of $d$ unit intervals. Projecting the $n$ points and the cubes on the $j$-th coordinate axis, we get $n$ real numbers $x_j(p_u)$ and 3 unit intervals $x_j(A)$, $x_j(B)$, and $x_j(C)$ (whose positions are to be determined). We sort the coordinates of the points in each of the coordinate directions in $O(dn \log n)$ time. We have a covering if we can assign every point $p_u$ to one of the cubes ($A$, $B$, or $C$) such that, in each coordinate, this point is covered by the interval corresponding to the assigned cube.

In the following, we will consider the dimensions separately. We will look at the projection on each coordinate $j$ and try to see by which interval a point can be covered in this coordinate. Let the minimum and maximum coordinate values be $l_j$ and $r_j$. If the diameter $r_j - l_j$ is at most one, we can, for example, align the three left interval endpoints with the leftmost point $l_j$. Then, in this coordinate, all points are covered by all intervals. This means that we can eliminate this coordinate from consideration. From now on, we will assume that all these irrelevant coordinates have been eliminated, and thus, the diameter in coordinate $j$ is bigger than one. Then we can assume, w.l.o.g., that no interval sticks out to the left of $l_j$ or to the right of $r_j$. On the other hand, these points must be covered by some interval. Thus we can make the following assumption:

In dimension $j$, one of the intervals $x_j(A)$, $x_j(B)$, $x_j(C)$ has its left endpoint aligned with the leftmost point $l_j$. Another interval has its right endpoint aligned with the rightmost point $r_j$. The third interval (the "middle" interval) lies between these two positions.

Intuitively we can see the middle interval "floating" between $l_j$ and $r_j$ because its position is not yet determined. The boundary cases, where the middle interval coincides with the left or right interval, are permitted.
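The elimination of irrelevant coordinates described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `relevant_coordinates` and the representation of points as tuples of floats are hypothetical and not taken from the paper.

```python
def relevant_coordinates(points, m=1.0):
    """Return (j, l_j, r_j) for every coordinate j whose spread exceeds m.

    Coordinates with r_j - l_j <= m are irrelevant: aligning all three
    intervals with l_j covers every point in that coordinate.
    """
    d = len(points[0])
    kept = []
    for j in range(d):
        column = [p[j] for p in points]
        l_j, r_j = min(column), max(column)
        if r_j - l_j > m:
            kept.append((j, l_j, r_j))
    return kept


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (0.5, 3.0, 0.2), (0.9, 1.0, 2.5)]
    # Coordinate 0 has spread 0.9 and is dropped for m = 1.
    print(relevant_coordinates(pts))
```

The sketch scans each coordinate once, so it takes $O(dn)$ time and does not need the sorted order.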
We can thus classify the solutions into $6^d$ patterns, according to the intervals ($x_j(A)$, $x_j(B)$, or $x_j(C)$) which are the left, middle, and right intervals in each coordinate direction. Formally, a pattern is represented as a sequence $(L_1, M_1, R_1), \ldots, (L_d, M_d, R_d)$, where each triplet $(L_j, M_j, R_j)$ is a permutation of the three symbols $A, B, C$. Let us restrict our attention to one fixed pattern. We now describe how to model this restricted covering problem as a logical satisfiability problem in conjunctive normal form, and decide whether such a restricted covering exists in $O(dn)$ time. We have $3n$ Boolean variables $y_{Au}, y_{Bu}, y_{Cu}$. The variable $y_{Xu}$ represents the fact that point $p_u$ is covered by box $X$, for $X = A, B, C$.
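As a small illustration of the pattern space, the following sketch enumerates all $6^d$ patterns as sequences of permutations of the symbols A, B, C. The function name `patterns` is an assumption made for this example; the paper does not prescribe any particular enumeration order.

```python
from itertools import permutations, product


def patterns(d):
    """Iterate over all 6**d patterns ((L_1, M_1, R_1), ..., (L_d, M_d, R_d))."""
    triplets = list(permutations("ABC"))  # the 6 orderings (left, middle, right)
    return product(triplets, repeat=d)


if __name__ == "__main__":
    for pattern in patterns(1):
        print(pattern)
```

Because `itertools.product` is lazy, the patterns can be tested one at a time without storing all $6^d$ of them.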

We have the $n$ covering clauses

(2.1)   $(y_{Au} \lor y_{Bu} \lor y_{Cu})$,

for $u = 1, \ldots, n$, expressing the fact that every point is covered (by at least one box).

Let us now look at some dimension $j$, where $x_j(L_j)$, $x_j(M_j)$, $x_j(R_j)$ are the left, middle, and right interval in dimension $j$ according to the chosen pattern. ($L_j, M_j, R_j$ is a permutation of $A, B, C$.) The positions of the intervals $x_j(L_j)$ and $x_j(R_j)$ are fixed, and we only have to decide the position of the middle interval $x_j(M_j)$, which floats between $l_j$ and $r_j$. When $x_j(p_u) > l_j + 1$, the point $p_u$ cannot be covered by the box $L_j$ and we can put the following set of clauses with one literal:

(2.2)   $(\lnot y_{L_j u})$,

for all $u$ with $x_j(p_u) > l_j + 1$. A similar argument applies to the box $R_j$, and we can put the following set of clauses:

(2.3)   $(\lnot y_{R_j u})$,

for all $u$ with $x_j(p_u) < r_j - 1$. We can cover two points $p_u$ and $p_v$ with the box $M_j$ only if the distance between $x_j(p_u)$ and $x_j(p_v)$ is at most one. Thus we add the following set of clauses:

(2.4)   $(\lnot y_{M_j u} \lor \lnot y_{M_j v})$,

for all $u, v$ with $|x_j(p_u) - x_j(p_v)| > 1$.

Figure 1: The points in the indicated region do not appear in a clause of the form (2.2–2.4). (The figure shows the intervals $L_j$, $M_j$, $R_j$ between $l_j$ and $r_j$; variables for points in the marked range do not appear negated.)

Lemma 2.1. There is a covering conforming to the chosen pattern if and only if the clauses (2.1–2.4) are satisfiable.

Proof. Suppose we have a covering conforming to the chosen pattern. Set $y_{Xu}$ to true if and only if point $p_u$ is covered by box $X$. Then it is easy to check that all clauses are satisfied.

Conversely, assume that we have a Boolean assignment that satisfies all clauses. In each dimension $j$, the intervals $x_j(L_j)$ and $x_j(R_j)$ are already fixed, and we place the interval $x_j(M_j)$ as follows: we align its left endpoint with the leftmost point $x_j(p_u)$ (in dimension $j$) for which $y_{M_j u}$ is true. This defines the position of the boxes $A, B, C$. For a point $p_u$ the clauses (2.1) imply that at least one of $y_{Au}, y_{Bu}, y_{Cu}$ is true. We have to show that, if $y_{Xu}$ is true, then these chosen unit intervals for box $X$ cover point $p_u$ in every dimension. If $X = L_j$ or $X = R_j$ in dimension $j$, the clauses (2.2) or (2.3) ensure that point $p_u$ is covered in dimension $j$. Thus, suppose finally that $X = M_j$. The interval for $M_j$ was chosen such that $x_j(p_u)$ does not lie to the left of $x_j(M_j)$. If $x_j(p_u)$ lies to the right of $x_j(M_j)$ it means that some point $p_v$, whose distance $x_j(p_u) - x_j(p_v)$ from $p_u$ is bigger than 1, has also $y_{M_j v}$ true. This contradicts the clause (2.4). □

All clauses except the clauses (2.1) contain at most two literals. We will now show that the clauses (2.1) can be eliminated, turning the problem into a 2-satisfiability problem, which can be solved in linear time.

Any of the clauses (2.2) or (2.3) effectively sets a variable to false, and it can be immediately used to eliminate a literal from one of the clauses (2.1). If we perform this elimination for all literals, we end up with $n$ clauses, each of which contains a subset of $y_{Au}, y_{Bu}, y_{Cu}$. (If we obtain an empty clause, we know that the problem is not satisfiable.) We call the resulting clauses which contain at most two literals the reduced covering clauses, and we denote them by (2.1′).

Lemma 2.2. There is a covering conforming to the chosen pattern if and only if the clauses (2.1′) and (2.2–2.4) are satisfiable.

Proof. The new set of clauses is weaker than the old one: it is derived by drawing logical conclusions (actually, some form of resolution), and omitting the clauses with three literals. Therefore, when the clauses (2.1–2.4) are satisfiable, the new set of clauses is satisfiable as well. Thus we only have to show that the clauses (2.1–2.4) are satisfiable whenever the reduced system of clauses is satisfiable. A reduced clause (2.1′) implies that the corresponding original clause is also satisfied. Consider now a clause (2.1) for a point $p_u$ which remains intact during the reduction process. None of $y_{Au}$, $y_{Bu}$, and $y_{Cu}$ ever appears in a clause (2.2) or (2.3). In other words, in each dimension $j$, point $p_u$ lies within distance 1 both of the leftmost point $l_j$ and of the rightmost point $r_j$; see Figure 1. This means that point $p_u$ is covered by all three intervals, no matter where the interval $x_j(M_j)$ is.


On the logical level, none of $y_{Au}$, $y_{Bu}$, and $y_{Cu}$ appears in the clauses (2.4), and thus they do not appear in negated form at all. We can thus satisfy the clause $(y_{Au} \lor y_{Bu} \lor y_{Cu})$ simply by setting all three variables to true. □

Thus we have reduced the covering problem for a fixed pattern to an equivalent 2-SAT instance. There are $O(n)$ clauses of type (2.1′), $O(dn)$ clauses of types (2.2) and (2.3), but $O(dn^2)$ clauses of type (2.4). The clauses of the last type can be replaced by $O(dn)$ clauses by introducing auxiliary variables, as follows: Let us look at a fixed dimension $j$. The $O(n^2)$ clauses of the form (2.4) involve the $n$ variables $y_{M_j u}$, which we abbreviate by $w_u$, and we assume for simplicity of notation that the points are ordered by the $j$-th coordinate: $x_j(p_1) \le x_j(p_2) \le \cdots \le x_j(p_n)$. The $O(n^2)$ clauses of the form (2.4) can be equivalently written as implications:

(2.5)   $w_u \Rightarrow \lnot w_v$,

whenever $x_j(p_v) - x_j(p_u) > 1$.

We introduce auxiliary variables $z_u$ that are intended to represent the fact that the interval for $M_j$ starts left of $x_j(p_u)$ or at $x_j(p_u)$. Then we have the implications

(2.6)   $w_u \Rightarrow z_u$,

for $u = 1, \ldots, n$, and

(2.7)   $z_u \Rightarrow z_{u+1}$,

for $u = 1, \ldots, n - 1$.

Finally, for a given point $p_v$ with $x_j(p_v) > l_j + 1$, let $\bar{u}(v)$ denote the largest index $u$ such that $x_j(p_u) < x_j(p_v) - 1$ (i.e., $p_{\bar{u}(v)}$ is the right-most point with this property). Then we add the $O(n)$ clauses

(2.8)   $z_{\bar{u}(v)} \Rightarrow \lnot w_v$,

for all $v = 1, \ldots, n$ with $x_j(p_v) > l_j + 1$. We have omitted the reference to $j$ for the variables $w$ and $z$, but it should be kept in mind that this procedure has to be carried out for each dimension $j$ separately.

Lemma 2.3. For any given values of the variables $w_1, \ldots, w_n$, the clauses (2.5) are satisfied if and only if there is a truth assignment for the variables $z_1, \ldots, z_n$ that satisfies (2.6–2.8).

Proof. If we have a truth assignment $w_1, \ldots, w_n$ satisfying the clauses (2.5), we set $z_u := w_1 \lor w_2 \lor \cdots \lor w_u$. Then (2.6) and (2.7) are satisfied by construction. To prove (2.8), assume for contradiction that $w_v$ and $z_{\bar{u}(v)}$ are true, for some $v$. By the definition of $z_{\bar{u}(v)}$, there is some true $w_u$ with $u \le \bar{u}(v)$. Since we have $x_j(p_u) \le x_j(p_{\bar{u}(v)}) < x_j(p_v) - 1$ and $w_u$, $w_v$ are true, $w_u$ and $w_v$ violate (2.5).

Conversely, assume that (2.6–2.8) is fulfilled, and let us prove (2.5) for each pair $u, v$ with $x_j(p_u) < x_j(p_v) - 1$. The clauses (2.8) include the clause $z_{\bar{u}(v)} \Rightarrow \lnot w_v$, and from the definition of $\bar{u}(v)$ we have $u \le \bar{u}(v) < v$. Thus, the chain of implications $w_u \Rightarrow z_u \Rightarrow z_{u+1} \Rightarrow \cdots \Rightarrow z_{\bar{u}(v)} \Rightarrow \lnot w_v$ proves (2.5). □

We have reduced the number of clauses to $O(dn)$, and each clause has at most two literals. The clauses can be generated in $O(dn)$ time if the input coordinates are sorted in each dimension, and the satisfiability of these clauses can be tested in $O(dn)$ time as well. This procedure has to be repeated for each of the $6^d$ patterns. This concludes the proof of part (a) of Theorem 2.1.

(b) The minimum side length $m$ for which the given points are covered is one of the $O(dn^2)$ pairwise distances $|x_j(p_u) - x_j(p_v)|$. We initially sort in $O(dn \log n)$ time the input coordinates in each dimension. For each dimension $j$, assuming for simplicity of notation that the points are indexed such that $x_j(p_1) \le x_j(p_2) \le \cdots \le x_j(p_n)$, we define an $n \times n$ matrix $\Delta^j = \{\delta^j_{uv}\}$ with entries $\delta^j_{uv} = x_j(p_u) - x_j(p_v)$. Each matrix $\Delta^j$ is a sorted matrix: each column has nondecreasing values and each row has non-increasing values. The matrices $\Delta^1, \ldots, \Delta^d$ are not constructed explicitly, but only some of their entries will be evaluated. Let $\Delta$ denote the multiset of $dn^2$ entries in $\Delta^1, \ldots, \Delta^d$. Clearly, the sought value $m$ is one of the values in $\Delta$.

Frederickson and Johnson [10] showed how to select, for any $1 \le k \le dn^2$, the $k$-th largest entry in the collection of sorted matrices $\Delta^1, \ldots, \Delta^d$ by evaluating $O(dn)$ entries. In our scenario, any desired entry $\delta^j_{uv}$ can be obtained in $O(1)$ time, after the initial sorting of the coordinates. Thus, we can find the $k$-th largest value of $\Delta$ in $O(dn)$ time.

We can now perform a binary search for $m$ on the entries of $\Delta$. Since $\Delta$ has $dn^2$ values, the binary search requires $O(\log(dn))$ calls to the selection procedure and applications of the decision algorithm from part (a). Therefore, each of the $O(\log(dn))$ steps of the binary search requires $O(6^d \cdot dn)$ time, after the initial sorting of the coordinates. □

3 The Euclidean 2-Center Problem

In this section we give an fpt-reduction from the parameterized $k$-independent set problem in general graphs, which is known to be W[1]-complete [8], to the Euclidean 2-center decision problem, parameterized with the dimension $d$. Let $[n] = \{1, \ldots, n\}$. Let $k$ be the size of the independent set being looked for in a graph $G([n], E)$. We assume that $n \ge 4$ and $n$ is even, by adding an additional vertex to $G$ if necessary and connecting it to all other vertices. Using $G$, we will construct a point set $P$ in $\mathbb{R}^{2k+1}$ with the property that $P$ can be covered by 2 unit balls if and only if $G$ has an independent set of size $k$.
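The preprocessing step that makes $n$ even can be sketched as follows. The function name `pad_graph` and the edge-list representation are hypothetical. Repeating the step until $n \ge 4$ is an assumption of this sketch (the text only mentions adding one vertex), and the padding is harmless only for $k \ge 2$, since a vertex adjacent to all others can never belong to an independent set of size at least 2.

```python
def pad_graph(n, edges):
    """Add universal vertices until n is even and n >= 4.

    Vertices are numbered 1..n; edges is a list of pairs (u, v).
    Each new vertex is connected to every vertex present so far.
    """
    edges = list(edges)
    while n < 4 or n % 2 == 1:
        n += 1
        edges.extend((u, n) for u in range(1, n))
    return n, edges


if __name__ == "__main__":
    # A single edge on 3 vertices becomes a 4-vertex graph
    # in which vertex 4 is adjacent to 1, 2, and 3.
    print(pad_graph(3, [(1, 2)]))
```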

Figure 2: Point set $P_i$ and the two pairs $a_{i1} = \{p_{i1}, p_{i,\frac{n}{2}+2}\}$ and $a_{i2} = \{p_{i2}, p_{i,\frac{n}{2}+3}\}$. (The figure shows the circle $C_i$ in the plane with axes $X_i$, $Y_i$ and origin $o$, the angle $2\pi/n$ between consecutive points, and the value $\sigma$.)

We first give a high-level overview of our reduction at the logical level. We start with a scaffolding point set $P'$ of $nk + 2$ points. For an appropriate radius $\rho$, the set $P'$ has the property that there are $n^k$ ways to cover it with two balls of radius $\rho$, in one-to-one correspondence with all $k$-tuples $(u_1, \ldots, u_k)$ with $1 \le u_i \le n$. These coverings allow us to represent the potential independent sets of vertices in the graph. More precisely, they represent ordered selections of $k$ not necessarily distinct vertices of the graph.

The structure of the input graph is represented using additional constraint points: for each pair of distinct indices $i \ne j$ ($1 \le i, j \le k$) and for each pair of (possibly equal) vertices $u, v \in [n]$, we define a constraint point $q^{uv}_{ij}$ which is covered by all solutions $(u_1, \ldots, u_k)$ with the exception of those with $u_i = u$ and $u_j = v$. In particular, we add to $P'$ the $\binom{k}{2} n$ constraint points $Q_V = \{\, q^{uu}_{ij} \mid 1 \le u \le n,\ 1 \le i < j \le k \,\}$ to ensure that all components $u_i$ in a solution must be distinct. Also, for each edge $uv \in E$ we add all $k(k-1)$ points $q^{uv}_{ij}$ with $i \ne j$. In this way, we ensure that the remaining coverings $(u_1, \ldots, u_k)$ represent independent sets of size $k$. In total the edges are represented by the $k(k-1)|E|$ points $Q_E = \{\, q^{uv}_{ij} \mid uv \in E,\ 1 \le i, j \le k,\ i \ne j \,\}$. The resulting set $P = P' \cup Q_V \cup Q_E$ will have in total $nk + 2 + \binom{k}{2}(n + 2|E|)$ points. A covering of $P$ exists if and only if the graph has an independent set of size $k$. (Each independent set of size $k$ is represented by $k!$ coverings.)
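The counting in this overview can be checked at the purely logical level, without any geometry. The sketch below treats each constraint point $q^{uv}_{ij}$ as a label $(i, j, u, v)$ that excludes the tuples with $u_i = u$ and $u_j = v$; all function names are hypothetical, and the brute-force enumeration of $[n]^k$ is meant only for tiny examples.

```python
from itertools import combinations, permutations, product
from math import comb


def constraint_labels(n, k, edges):
    """Labels (i, j, u, v) of the constraint points in Q_V and Q_E."""
    q_v = [(i, j, u, u)
           for i, j in combinations(range(1, k + 1), 2)
           for u in range(1, n + 1)]
    q_e = [(i, j, u, v)
           for (u, v) in edges
           for i, j in permutations(range(1, k + 1), 2)]
    return q_v, q_e


def total_points(n, k, num_edges):
    """|P| = nk + 2 + C(k,2) * (n + 2|E|)."""
    return n * k + 2 + comb(k, 2) * (n + 2 * num_edges)


def surviving_tuples(n, k, edges):
    """Tuples (u_1..u_k) not excluded by any constraint label."""
    q_v, q_e = constraint_labels(n, k, edges)
    labels = q_v + q_e
    return [t for t in product(range(1, n + 1), repeat=k)
            if not any(t[i - 1] == u and t[j - 1] == v
                       for (i, j, u, v) in labels)]


if __name__ == "__main__":
    path = [(1, 2), (2, 3), (3, 4)]  # path on 4 vertices
    print(total_points(4, 2, len(path)))
    print(surviving_tuples(4, 2, path))
```

For the path on 4 vertices and $k = 2$, the surviving tuples are the ordered versions of the three independent sets $\{1,3\}$, $\{1,4\}$, $\{2,4\}$, in line with the factor $k!$ mentioned above.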

We will first describe the geometry of the point sets exactly, as if exact square roots and expressions of the form $\sin\frac{\pi}{n}$ were available. We will later show that the essential features of our construction are preserved when the data are perturbed within some tolerance. This allows us to work with fixed-precision roundings of the exact construction, making the reduction suitable for the Turing machine model.

Notation. For our construction it is convenient to view $\mathbb{R}^{2k+1}$ as the product of $k$ orthogonal planes $E_1, \ldots, E_k$, where each $E_i$ has coordinate axes $X_i, Y_i$, plus an extra axis denoted by $Z$. For giving coordinates, the axes are considered in the order $X_1, Y_1, \ldots, X_k, Y_k, Z$. The coordinate on $X_i$, $Y_i$, and $Z$ of a point $p$ is denoted by $x_i(p)$, $y_i(p)$, and $z(p)$, respectively.

3.1 The Scaffolding Point Set. On each plane $E_i$ we define a set $P_i$ consisting of $n$ points regularly spaced on the unit circle $C_i$ centered at the origin $o$. We define $P_i = \{p_{i1}, \ldots, p_{in}\} \subset E_i$ with

    $x_i(p_{iu}) = \cos\frac{(2u-3)\pi}{n}$,   $y_i(p_{iu}) = \sin\frac{(2u-3)\pi}{n}$.

We also use two anchor points $p_z$, $-p_z$ on the $Z$-axis with $z(p_z) = 2$. The scaffolding point set $P'$ is defined by $P' = P_1 \cup \cdots \cup P_k \cup \{p_z, -p_z\}$. We have $|P'| = nk + 2$. This point set is highly symmetric. In particular, since the planes $E_i$ are orthogonal, we can independently rotate each set $P_i$ by multiples of $2\pi/n$.

The radius $\rho$ of the two balls with which we would like to cover $P'$ will be slightly smaller than $5/4$. Thus the two anchor points must be covered by two different balls. The ball containing $p_z$ will be called the top ball, while the ball containing $-p_z$ is called the bottom ball.

Two points $p$ and $-p$ of $P_i$ are called antipodal. For any $u$ with $1 \le u \le n$ we define the index of its almost antipodal partner as $\bar{u} = (u + \frac{n}{2}) \bmod n + 1$. The pair $a_{iu}$ of almost antipodal points is defined as $a_{iu} = \{p_{iu}, p_{i\bar{u}}\}$, for $i = 1, \ldots, k$ and $u = 1, \ldots, n$. See Fig. 2 for an illustration. Any pair $a_{iu}$ can be seen as a counter-clockwise rotation, on the plane $E_i$, of the pair $a_{i1}$ about $o$ by $\theta_u = (u - 1)\frac{2\pi}{n}$. In our computations, we will frequently use the value

    $\sigma := -y_i(p_{i1}) = \sin\frac{\pi}{n}$.

Two antipodal points of $P_i$ and the top anchor form an isosceles triangle whose circumradius is $5/4$. Therefore, if $\rho < 5/4$, the top ball (or the bottom ball) cannot contain two antipodal points. (With a radius of $5/4$, the top ball could be centered on the $Z$-axis at height $3/4$ and cover all points of $P'$ except the bottom anchor.) By choosing $\rho < 5/4$, we ensure that each ball can cover at most half of the points from every $P_i$. We define the radius $\rho$ as the smallest radius such that the top ball can cover precisely $n/2$ consecutive points of each subset $P_i$, besides the anchor $p_z$. From the discussion, it is clear that $1 < \rho < 5/4$. The precise value of $\rho$ will be given below.

Let $A(u_1, \ldots, u_k)$ be the set of $2k$ points $A(u_1, \ldots, u_k) = a_{1u_1} \cup a_{2u_2} \cup \cdots \cup a_{ku_k}$. We denote by $B(u_1, \ldots, u_k)$ the smallest enclosing ball of $A(u_1, \ldots, u_k) \cup \{p_z\}$. The intersection of $B(u_1, \ldots, u_k)$ with $E_i$ is a disk of radius smaller than 1 that contains the pair $a_{iu_i}$. It follows that $B(u_1, \ldots, u_k)$ also contains the $n/2$ consecutive points of $P_i$ between the points of the pair $a_{iu_i}$. Since the planes $E_1, \ldots, E_k$ are orthogonal, each $u_i$ independently defines which of the $n/2$ consecutive points of the sets $P_i$ is covered by $B(u_1, \ldots, u_k)$. The complementary halves can then be covered by the bottom ball. In total, we have $n^k$ possible partitions of $P'$ into two groups covered by the two balls, which correspond to the $n^k$ possible tuples $(u_1, \ldots, u_k) \in [n]^k$.
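The scaffolding set can be built and sanity-checked numerically. The following sketch uses floating-point arithmetic and hypothetical helper names (`scaffolding`, `partner`, `circumradius`). It only illustrates the definitions above, including the circumradius $5/4$ of the triangle formed by two antipodal points and the top anchor, and is not the exact construction used in the reduction.

```python
import math


def scaffolding(n, k):
    """Return (points, top, bottom) for P' in R^(2k+1).

    Axes are ordered X1, Y1, ..., Xk, Yk, Z; points[(i, u)] is p_iu.
    """
    dim = 2 * k + 1
    points = {}
    for i in range(1, k + 1):
        for u in range(1, n + 1):
            angle = (2 * u - 3) * math.pi / n
            p = [0.0] * dim
            p[2 * (i - 1)] = math.cos(angle)      # x_i(p_iu)
            p[2 * (i - 1) + 1] = math.sin(angle)  # y_i(p_iu)
            points[(i, u)] = tuple(p)
    top = tuple([0.0] * (dim - 1) + [2.0])
    bottom = tuple([0.0] * (dim - 1) + [-2.0])
    return points, top, bottom


def partner(u, n):
    """Index of the almost antipodal partner of u (n even)."""
    return (u + n // 2) % n + 1


def circumradius(a, b, c):
    """Circumradius of triangle abc via Heron's formula."""
    la, lb, lc = math.dist(b, c), math.dist(a, c), math.dist(a, b)
    s = (la + lb + lc) / 2
    area = math.sqrt(s * (s - la) * (s - lb) * (s - lc))
    return la * lb * lc / (4 * area)


if __name__ == "__main__":
    pts, top, _ = scaffolding(8, 2)
    # p_{1,1} and p_{1,5} are antipodal for n = 8.
    print(circumradius(pts[(1, 1)], pts[(1, 5)], top))
```

For $n = 8$ the partner of index 1 is $n/2 + 2 = 6$, which matches the pair $a_{i1}$ shown in Fig. 2.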

Lemma 3.1. All balls $B(u_1, \ldots, u_k)$ have the same radius

    $\rho = \frac{5}{4} \cdot \sqrt{\frac{k}{k + \sigma^2/4}}$.

The center $c$ of $B(u_1, \ldots, u_k)$ has coordinates

    $x_i(c) = -w \cos\frac{(u_i - 1)2\pi}{n}$,   $y_i(c) = -w \sin\frac{(u_i - 1)2\pi}{n}$,   for $i = 1, \ldots, k$,
    $z(c) = h$,

with

    $w = \frac{5\sigma}{2(4k + \sigma^2)}$,   $h = \frac{3k + 2\sigma^2}{4k + \sigma^2}$.

Proof. By symmetry, it is sufficient to show this for the ball $B(1, \ldots, 1)$, whose center we claim to be $c = (0, -w, 0, -w, \ldots, 0, -w, h)$. We use the following well-known characterization of the smallest enclosing ball:

Proposition 3.1. A ball $B$ containing a finite set of points $A$ is the smallest enclosing ball for $A$ if and only if its center lies in the convex hull of the points of $A$ that lie on the boundary of $B$. □

It is straightforward to check that all points of $A(1, \ldots, 1)$ have the same distance $\rho$ to $c$. Moreover, the center $c$ lies on the line segment between $p_z$ and the center of gravity of the remaining $2k$ points of $A(1, \ldots, 1)$, which is the point $(0, -\sigma/k, 0, -\sigma/k, \ldots, 0, -\sigma/k, 0)$. Thus it lies in the convex hull of $A(1, \ldots, 1)$. □

We will later need the fact that $3/4 < h < 1$, which follows from $\sigma^2 < k$ and $h > 1 - k/(4k + \sigma^2) > 1 - k/(4k) = 3/4$. Note that by symmetry, any bottom ball will have its center $c'$ with $z(c') = -h$.

We conclude with the following characterization of the possible coverings of $P'$ with two balls of radius $\rho$.

Lemma 3.2. Assume that two balls $B$, $B'$ of radius $\rho$ cover $P'$, and that $p_z \in B$. Then there is a tuple $(u_1, \ldots, u_k) \in [n]^k$ such that $B = B(u_1, \ldots, u_k)$ and $B' = -B(u_1, \ldots, u_k)$.

Proof. As discussed before, $B$ or $B'$ can contain at most $n/2$ consecutive points of $P_i$, and therefore, $B$ and $B'$ must cover complementary halves of each $P_i$. If $B$ covers the halves between the pairs $a_{1u_1}, \ldots, a_{ku_k}$, it follows from the uniqueness of the minimum enclosing balls that $B = B(u_1, \ldots, u_k)$. Since $B'$ covers the complementary halves of each $P_i$ and $-p_z$, it follows that $B' = -B(u_1, \ldots, u_k)$. □

From this characterization, the bijection between the possible coverings of $P'$ and $[n]^k$ is clear. Every covering consists of a symmetric pair of balls $B = B(u_1, \ldots, u_k)$ and $B' = -B(u_1, \ldots, u_k)$.

3.2 Constraint Points. We now continue the construction of the point set $P$ by showing how we encode the structure of $G$. For each pair of distinct indices $i \ne j$ ($1 \le i, j \le k$) and for each pair of (possibly equal) vertices $u, v \in [n]$, we define a constraint point $q^{uv}_{ij}$. All constraint points lie on the hyperplane $H = \{\, p \in \mathbb{R}^{2k+1} \mid z(p) = h \,\}$, on which also the center of the top ball lies. This breaks the symmetry between top and bottom balls that the construction had until now. Since the center of the bottom ball lies on the hyperplane $-H$ and $\rho < 5/4 < 2h$, none of the constraint points can be covered by a bottom ball. Therefore, our discussion will only consider top balls. The constraint point $q^{uv}_{ij}$ will lie in all top balls $B(u_1, \ldots, u_k)$ except in those with $u_i = u$ and $u_j = v$.

We choose $q^{uv}_{ij}$ in the four-dimensional affine subspace (4-flat)

    $F_{ij} = \{\, p \in \mathbb{R}^{2k+1} \mid x_r(p) = y_r(p) = 0 \text{ for } r \ne i, j, \text{ and } z(p) = h \,\} = o' + E_i \times E_j$,

where $o' = (0, \ldots, 0, h)$. We look at the intersections of the balls $B(u_1, \ldots, u_k)$ with $F_{ij}$. Let $\mathcal{D} = \{\, B(u_1, \ldots, u_k) \cap F_{ij} \mid (u_1, \ldots, u_k) \in [n]^k \,\}$. The intersection of any ball $B = B(u_1, \ldots, u_k)$ with $F_{ij}$ is a 4-dimensional ball $D$, whose center $c$ is the


orthogonal projection of the center of $B$ on $F_{ij}$. From Lemma 3.1, we have

    $x_i(c) = -w \cos\theta_i$,   $y_i(c) = -w \sin\theta_i$,
    $x_j(c) = -w \cos\theta_j$,   $y_j(c) = -w \sin\theta_j$,

where $\theta_i = (u_i - 1)\frac{2\pi}{n}$ and $\theta_j = (u_j - 1)\frac{2\pi}{n}$. The location of the center $c$ thus depends only on $u_i$ and $u_j$. We denote this center by $c^{u_i u_j}_{ij}$. Looking at the distance between the centers of $B$, $D$, and $o'$, we get the following properties:

a) every ball $D$ has radius $\rho^* = \sqrt{\rho^2 - (k-2)w^2}$;

b) a point $q \in F_{ij}$ lies in the ball $B(u_1, \ldots, u_k)$ if and only if $|q - c^{uv}_{ij}| \le \rho^*$ (with $u = u_i$ and $v = u_j$);

c) the center $c$ of $D$ lies on the three-dimensional sphere $C = \{\, p \in F_{ij} \mid |p - o'| = w\sqrt{2} \,\}$;

d) the sphere $C$ is contained in the interior of $D$.

Let $D^{uv}_{ij}$ denote the ball in $F_{ij}$ with center $c^{uv}_{ij}$ and radius $\rho^*$. For each $u, v$, we want to find a point $q^{uv}_{ij} \in F_{ij}$ that lies outside the ball $D^{uv}_{ij}$ but in all other balls of $\mathcal{D}$.

Since the centers $c^{uv}_{ij} \in F_{ij}$ form a completely symmetric set and all balls have the same radius, we can find this point as follows. (See Fig. 3a for a two-dimensional analog of this situation.) Start at $c^{uv}_{ij}$ and move along the ray $L^+ = \{\, c^{uv}_{ij} + \lambda(o' - c^{uv}_{ij}) \mid \lambda \ge 0 \,\}$ through $o'$. By properties (c) and (d), we are initially inside all balls $D \in \mathcal{D}$. At some point $l_1$, we hit the boundary of some ball. We prove below that this ball is $D^{uv}_{ij}$. Thus, after passing $l_1$, we are outside $D^{uv}_{ij}$ but still inside any ball $D \ne D^{uv}_{ij}$. We place $q^{uv}_{ij}$ at the point $l_2$ where $L^+$ intersects the boundary of the next ball.

Figure 3: (a) Finding the point $l_2 = q^{uv}_{ij}$. (b) The situation of Lemma 3.3, in a two-dimensional intersection with the plane through $o'$, $c^{uv}_{ij}$, and $c$.

Lemma 3.3. The ray $L^+$, after having visited $o'$, hits the boundary of the ball $D^{uv}_{ij}$ before the boundary of any other ball $D \in \mathcal{D}$.

Proof. Let $l_1$ be the point where $L^+$ intersects the boundary of $D^{uv}_{ij}$, and let $c$ be the center of any other ball $D \in \mathcal{D}$ (see Fig. 3b). By the triangle inequality and property (c), we have

    $|c - l_1| < |c - o'| + |o' - l_1| = |c^{uv}_{ij} - o'| + |o' - l_1| = |c^{uv}_{ij} - l_1| = \rho^*$.

This implies that the boundary of $D^{uv}_{ij}$ is the first boundary intersected by $L^+$. □

3.3 The Reduction. As mentioned in the beginning of this section, we add $\binom{k}{2}(n + 2|E|)$ constraint points $Q_V$ and $Q_E$ to the scaffolding set $P'$ to represent the structure of the input graph $G([n], E)$.

Lemma 3.4. The set $P = P' \cup Q_V \cup Q_E$ can be covered with two balls of radius $\rho$ if and only if $G$ has an independent set of size $k$.

Proof. Any covering of $P$ with two balls $B$, $B'$ of radius $\rho$ must consist of the two balls $B = B(u_1, \ldots, u_k)$ and $B' = -B(u_1, \ldots, u_k)$ for some tuple $(u_1, \ldots, u_k)$, by Lemma 3.2. Since the constraint points exclude the tuples with two equal indices $u_i = u_j$, or with indices $u_i$ and $u_j$ when $(u_i, u_j)$ is an edge of $G$, the coverings represent precisely the independent sets of $G$. □

Rounding coordinates. To make the reduction suitable for a Turing machine, we round all data to multiples of a small "unit" $U$. Scaling by $U$ will then convert the input to integral data. We will show that



choosing $U = \Theta(1/(n^6 k^2))$ will preserve all important characteristics of our point set. Since it is easy to evaluate $\sin\frac{u\pi}{n}$ or $\sqrt{\cdot}$ to this precision of $O(\log(nk))$ bits, the reduction can be carried out in polynomial time.

More precisely, we replace each input coordinate $x$ by a multiple $\hat{x}$ of $U$ in the range $x - U < \hat{x} < x + U$. This ensures that each input point is moved at most $\sqrt{5} \cdot U$ from its original position. (Recall that most coordinates of our input points are 0.) We replace $\rho$ by a multiple $\hat{\rho}$ of $U$ in the range $\rho + \sqrt{5} \cdot U \le \hat{\rho} \le \rho + \sqrt{5} \cdot U + 2U$. In this way we ensure that for each ball that covers some set of input points, there exists an enlarged ball with radius $\hat{\rho}$ that covers the same input points. We want to exclude the possibility that the enlarged ball covers additional points (possibly after moving its center). In the following we will give only asymptotic estimates. The detailed calculations will be given in the full paper.

Proposition 3.2. Every input point that is not in one of the balls $B(u_1, \ldots, u_k)$ is at least $\varepsilon_1 = \Omega(1/(n^3 k))$ away from this ball. □

(This is the bound for the constraint points: the distance between $l_1$ and $l_2$ is at least $w/n^2 = \Omega(1/(n^3 k))$ (see Fig. 3a). The scaffolding points are at least $\Omega(1/(nk))$ away from the ball.) Since we have introduced some slack by enlarging the radius and moving the points, the center of the ball may move away from the original center. The following lemma bounds this movement.

Lemma 3.5. If the center of the ball $B(u_1, \ldots, u_k)$ is moved by more than $\varepsilon_2$, the distance to some point on its boundary increases by at least $\varepsilon_3 = \Omega(\min(\varepsilon_2^2, \varepsilon_2/(nk)))$. □

(The first bound comes from moving the center perpendicular to the hyperplane through the boundary points of the ball; the second one comes from any movement in this hyperplane.)
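As a rough illustration of the rounding step described above, the following Python sketch rounds each coordinate to the nearest multiple of $U$ and enlarges the radius accordingly. The function names, the sample values of $n$, $k$, the point and the radius, the constant 1 in $U$, and the rational bound 2237/1000 used in place of $\sqrt{5}$ are all illustrative assumptions. So is the premise that a point has at most five nonzero coordinates, which is only suggested by the $\sqrt{5} \cdot U$ displacement bound.

```python
from fractions import Fraction
import math

# Rational upper bound on sqrt(5); an assumption made to keep the arithmetic exact.
# Any bound in the interval (sqrt(5), sqrt(5) + 1) would serve the same purpose.
SQRT5_UPPER = Fraction(2237, 1000)


def round_coordinate(x, unit):
    """Return the integer m such that m * unit is the multiple of unit nearest to x."""
    return round(Fraction(x) / unit)


def round_point(point, unit):
    """Round every coordinate of a point; zero coordinates stay exactly zero."""
    return [round_coordinate(x, unit) for x in point]


def enlarged_radius(rho, unit):
    """Return an integer m with
    rho + sqrt(5)*unit <= m*unit <= rho + sqrt(5)*unit + 2*unit."""
    return math.ceil(Fraction(rho) / unit + SQRT5_UPPER)


# Hypothetical sample values, chosen only for illustration.
n, k = 4, 2
U = Fraction(1, n**6 * k**2)            # unit U = const/(n^6 k^2) with const = 1
point = [0.3, 0.0, -0.71, 0.0, 0.25]    # stand-in for an input point
scaled = round_point(point, U)          # integral data after scaling by 1/U
rho_scaled = enlarged_radius(1.25, U)   # integral radius after scaling by 1/U
```

The helpers return the integer multipliers rather than the rounded reals, since the scaled values are what the reduction ultimately outputs. Rounding to the nearest multiple moves each coordinate by at most $U/2$, which is stricter than the range $x - U < \hat{x} < x + U$ required in the text.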
It follows from Lemma 3.5 that the center $\hat{c}$ of a ball with the modified data can move only $\varepsilon_2 = O(1/(n^3 k))$ from its original position, since otherwise its distance from some boundary point would increase by $\varepsilon_3 = \Omega(1/(n^6 k^2))$. If $\varepsilon_3 > \sqrt{5} \cdot U + (\sqrt{5} \cdot U + 2U)$, this means that the sphere can no longer contain this boundary point. Now, since $\varepsilon_2 + (\sqrt{5} \cdot U + 2U) < \varepsilon_1 - \sqrt{5} \cdot U$, the sphere can swallow no additional points:

Theorem 3.1. If we choose $U = \mathrm{const}/(n^6 k^2)$, for a sufficiently small constant, all possible coverings by two balls of radius $\hat{\rho}$ partition the rounded point set in the same way as two balls $B(u_1, \ldots, u_k)$ and $-B(u_1, \ldots, u_k)$ for the original data. □

Using the rounded coordinates for the points of $P$, and since $|P| = nk + 2 + \binom{k}{2}(n + 2|E|)$, we see that this is an fpt-reduction. Thus, we have the following:

Theorem 3.2. The Euclidean 2-center decision problem is W[1]-hard with respect to the dimension $d$. □

For a parameterized complexity upper bound, the (integral) Euclidean 2-center decision problem, parameterized with $d$, is trivially in W[P]; see [5]. Since $d = 2k + 1$ in the above fpt-reduction, an $O(n^{o(d)})$-time algorithm for the Euclidean 2-center decision problem implies an $O(n^{o(k)})$-time algorithm for the parameterized $k$-independent set problem, which in turn implies that $\mathrm{SNP} \subset \mathrm{DTIME}(2^{o(n)})$ [4]. Thus:

Corollary 3.1. The Euclidean $k$-center problem, for any $k \ge 2$, cannot be solved in $O(n^{o(d)})$ time, unless $\mathrm{SNP} \subset \mathrm{DTIME}(2^{o(n)})$. □

References

[1] P. K. Agarwal and M. Sharir. Efficient algorithms for geometric optimization. ACM Comput. Surv., 30:412–458, 1998.
[2] E. Assa and M. J. Katz. 3-piercing of d-dimensional boxes and homothetic triangles. Int. J. Comput. Geometry Appl., 9(3):249–260, 1999.
[3] T. M. Chan. Geometric applications of a randomized optimization technique. Discrete & Computational Geometry, 22(4):547–567, 1999.
[4] J. Chen, B. Chor, M. Fellows, X. Huang, D. Juedes, I. A. Kanj, and G. Xia. Tight lower bounds for certain parameterized NP-hard problems. Information and Computation, 201(2):216–231, 2005.
[5] R. Downey and M. Fellows. Parameterized Complexity. Springer-Verlag, 1999.
[6] Z. Drezner. Facility Location. Springer-Verlag, 1995.
[7] D. Eppstein. Fast construction of planar two-centers. In Proc. 8th Ann. ACM-SIAM Sympos. Discrete Algorithms (SODA), pages 131–138, 1997.
[8] J. Flum and M. Grohe. Parameterized Complexity Theory. Springer-Verlag, 2006.
[9] R. J. Fowler, M. S. Paterson, and S. L. Tanimoto. Optimal packing and covering in the plane are NP-complete. Inf. Proc. Lett., 12(3):133–137, 1981.
[10] G. N. Frederickson and D. B. Johnson. Generalized selection and ranking: sorted matrices. SIAM J. Comput., 13:14–30, 1984.
[11] D. Marx. Efficient approximation schemes for geometric problems? In Proc. 13th Ann. European Sympos. Algorithms (ESA), volume 3669 of Lect. Notes Comput. Sci., pages 448–459. Springer-Verlag, 2005.
[12] N. Megiddo. On the complexity of some geometric problems in unbounded dimension. J. Symb. Comput., 10(3-4):327–334, 1990.
[13] N. Megiddo and K. J. Supowit. On the complexity of some common geometric location problems. SIAM J. Comput., 13:182–196, 1984.
[14] M. Sharir and E. Welzl. Rectilinear and polygonal p-piercing and p-center problems. In Proc. 12th Ann. Symp. Computat. Geom., pages 122–132. ACM Press, 1996.