Approximation Algorithms for Minimum Tree Partition

Nili Guttmann-Beck and Refael Hassin¹

April 21, 1998

Abstract: We consider a problem of locating communication centers. In this problem, it is required to partition the set of customers into subsets, minimizing the length of the nets required to connect all the customers to the communication centers. Suppose that $p$ communication centers are to be placed in $p$ of the customers' locations. The number of customers each center supports is also given. The problem is then to divide a graph into sets of the given sizes, keeping the sum of the lengths of their spanning trees minimal. The problem is NP-complete, and no polynomial algorithm with bounded error ratio can be given, unless $P = NP$. We present an approximation algorithm for the problem assuming that the edge lengths satisfy the triangle inequality. It runs in $O(p^2 4^p + n^2)$ time ($n = |V|$) and comes within a factor of $2p - 1$ of optimal. When the sets' sizes are all equal, this algorithm runs in $O(n^2)$ time. Next, an improved algorithm is presented. It takes as input a positive integer $x$ ($x \le n - p + 1$) and runs in $O(f(p,x)\,n^2)$ time, where $f$ is an exponential function of $p$ and $x$. It comes within a factor of $2 + \frac{2p-3}{x}$ of optimal. When the sets' sizes are all equal, it runs in $O(2^{p+x} n^2)$ time. A special algorithm is presented for the case $p = 2$.

Keywords: Approximation algorithms, minimum spanning tree, graph partitioning.

1 Introduction

Let $G = (V, E)$ be a complete undirected graph, with a node set $V$ and an edge set $E$. The edges $e \in E$ have lengths $l(e)$ that satisfy the triangle inequality. We assume that each vertex represents a customer. The goal is to partition $V$ into $p$ subsets of given sizes, in order to locate a communication center in one node of each subset. The nodes of each subset will then be connected to this server through a subnetwork of minimum total length, that is, a minimum spanning tree (MST) of the subgraph induced by this subset of nodes. The Minimum Tree Partition Problem is to compute a partition with minimum total length.

More formally, we are given $G = (V, E)$ with $|V| = n$, and $p$ positive integers $\{k_i\}_{i=1}^{p}$ such that $\sum_{i=1}^{p} k_i = n$. The Minimum Tree Partition Problem is to find a partition of $V$ into disjoint sets $\{P_i\}_{i=1}^{p}$ such that $|P_i| = k_i$ for all $i \in \{1, \dots, p\}$ and $\sum_{i=1}^{p} l(MST(P_i))$ is minimized. Here $MST(P_i)$ is a minimum spanning tree in the graph induced on $P_i$, and $l(E') = \sum_{e \in E'} l(e)$ for $E' \subseteq E$.
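For very small instances the objective can be evaluated exactly. The following Python sketch is our illustration and is not part of the paper. It assumes the metric is given as a symmetric distance matrix `dist` on the nodes `0..n-1`, and the helper names `mst_length` and `exact_tree_partition` are hypothetical. It enumerates every partition with the prescribed sizes, so it runs in exponential time and is only useful as a reference point for the approximation bounds.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def mst_length(dist: Matrix, nodes: Sequence[int]) -> float:
    """Length of a MST of the complete graph induced by `nodes` (Prim, O(|nodes|^2))."""
    nodes = list(nodes)
    if len(nodes) <= 1:
        return 0.0
    # best[v] = length of the cheapest edge joining v to the tree built so far
    best = {v: dist[nodes[0]][v] for v in nodes[1:]}
    total = 0.0
    while best:
        v = min(best, key=best.get)
        total += best.pop(v)
        for u in best:
            best[u] = min(best[u], dist[v][u])
    return total


def exact_tree_partition(dist: Matrix, sizes: Sequence[int]) -> Tuple[float, List[List[int]]]:
    """Exhaustive search over all partitions with |P_i| = sizes[i]; exponential time."""
    n = len(dist)
    assert sum(sizes) == n, "the sizes must add up to n"
    best_val, best_parts = float("inf"), []

    def extend(remaining: List[int], i: int, parts: List[List[int]], acc: float) -> None:
        nonlocal best_val, best_parts
        if i == len(sizes):
            if acc < best_val:
                best_val, best_parts = acc, parts
            return
        for block in combinations(remaining, sizes[i]):
            rest = [v for v in remaining if v not in block]
            extend(rest, i + 1, parts + [list(block)], acc + mst_length(dist, block))

    extend(list(range(n)), 0, [], 0.0)
    return best_val, best_parts
```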

¹ Department of Statistics and Operations Research, Tel Aviv University, Tel Aviv 69978, Israel. {nili,[email protected]}

The problem is NP-hard [6]. In this paper we introduce approximation algorithms with bounded error ratio.

First, we describe a general algorithm for dividing the graph into $p$ sets of customers. It runs in $O(p^2 4^p + n^2)$ time, where $n = |V|$, and comes within a factor of $2p - 1$ of optimal. When the sizes of the sets are all equal, it runs in $O(n^2)$ time.

Next, we describe an algorithmic scheme with a parameter $x \in \{1, 2, \dots, n - p + 1\}$. For any given value of $x$, this algorithm runs in $O(f(p,x)\,n^2)$ time, where $f$ is an exponential function of $p$ and $x$, and comes within a factor of $2 + \frac{2p-3}{x}$ of optimal. When the sizes of the sets are all equal, it runs in $O(2^{p+x} n^2)$ time.

For the case $p = 2$ we present an $O(n^2)$ time algorithm that comes within a factor of 2 of optimal. For dividing the graph into 2 equal-sized sets, we prove that the optimal solution value is bounded by $\frac{3}{2}\,l(MST(G))$. To approximate the solution to this problem, we define a 'K-centroid', which generalizes the concept of a centroid of a tree, and prove its existence.

For small values of $p$, these algorithms improve on the previously best-known performance guarantee of $4(1 - \frac{p}{n})$ for partitioning the graph into $p$ equal-sized sets, given by Goemans and Williamson in [4] (see also Gabow, Goemans and Williamson [2] and Williamson [10]). For small values of $p$, our algorithms also improve the time requirement, which was $O(n^2\sqrt{\log\log n})$. They also generalize the problem by allowing the sets to be of different sizes.

For a given $S \subseteq V$, consider all the partitions of $S$ into $p$ sets. For each partition, compute the sum of the lengths of the MSTs over the subgraphs induced by the partition. Denote by $z(S)$ the minimum sum obtained over all the possible partitions. Our problem is to approximate $z(V)$. Chandra and Halldorsson [1] present a 4-approximation algorithm for the related problem of maximizing $z(S)$ over all subsets $S \subseteq V$ with $|S| = k$, where $k$ is given.
Imielinska, Kalantari and Khachiyan in [7] and Goemans and Williamson in [5] present polynomial algorithms with bounded performance guarantees for the following problem, without the triangle inequality assumption: given $m \in \{2, \dots, n\}$, find a minimum length spanning forest such that each of its trees spans at least $m$ vertices. This is different from our problem, in which the trees' exact sizes are given. For our problem, it has been shown in [6] that no such approximation can be given unless $P = NP$.

Approximation algorithms with bounded performance guarantees for the related min-max tree partition problem are described in [6]. In that problem, the goal is to partition the node set into $p$ sets of equal size $P_1, \dots, P_p$ minimizing $\max_{i \in \{1,\dots,p\}} l(MST(P_i))$.

Our algorithms can also be used to approximate the problem of covering the graph by cycles. To do this, double all the trees and use the triangle inequality to replace each tree by a cycle whose length is at most twice the length of the tree. The resulting error bound is twice the corresponding bound for the tree partition problem.

We will use the following notation:
- For an edge $e$, $l(e)$ is the length of $e$.
- For a set of edges $E$, $l(E) = \sum_{e \in E} l(e)$.
- For a graph $G = (V, E)$, $l(G) = l(E)$.
- For a set of nodes $V'$, $MST(V')$ is a MST on the subgraph induced by $V'$.
- For a subgraph $B$, we denote by $V_B$ and $E_B$ the sets of nodes and edges in $B$, respectively.
- We denote by $opt$ the optimal solution value of the problem.

2 First approximation

2.1 The cycle procedure

Consider Cycle Part, given in Figure 1. This procedure takes a MST on the given graph and doubles its edges, getting an Eulerian cycle. This cycle is changed into a simple one of shorter or equal length using the triangle inequality. The nodes of the graph are then divided according to the order in which they appear in the cycle. First the longest edge of the cycle is removed. Then the first $k_1$ nodes of the resulting path go into the first set, the next $k_2$ nodes go into the second set, and so on.

Lemma 2.1 Let $e$ be the longest edge in an MST $T$ of $G$. Then the value $r$ returned by Cycle Part satisfies $r \le 2l(T) - l(e)$.

Proof: If $p = 1$, then $r = l(T) \le 2l(T) - l(e)$.

Suppose $p > 1$. For $i \in \{1, \dots, p\}$, if $k_i \ge 2$ then $\{(v_{m_i}, v_{m_i+1}), \dots, (v_{m_i+k_i-2}, v_{m_i+k_i-1})\}$ is a spanning tree of $P_i$. Thus,

$$l(MST(P_i)) \le \sum_{j=m_i}^{m_i+k_i-2} l(v_j, v_{j+1}) \qquad \forall i \in \{1, \dots, p\}.$$

Note that when $k_i = 1$, $l(MST(P_i)) = 0$ and the sum is also 0. It follows that

$$r = \sum_{i=1}^{p} l(MST(P_i)) \le \sum_{i=1}^{p} \sum_{j=m_i}^{m_i+k_i-2} l(v_j, v_{j+1}) \le l(C) - l(v_n, v_1).$$

Since $(v_n, v_1)$ is the longest edge in $C$, $l(v_n, v_1) \ge l(e)$. To see this, note that the longest edge of the MST appears in the cycle created by doubling the edges. When the cycle is changed into a simple one, this edge is either untouched or replaced by a longer edge. So the cycle contains an edge of length $\ge l(e)$. Hence, $r \le l(C) - l(e) \le 2l(T) - l(e)$.

To see that Cycle Part may produce a bad approximation, consider the graph with $V = \{v_1, v_2, v_3\}$, $l(v_1, v_2) = l(v_1, v_3) = 1$ and $l(v_2, v_3) = 0$. The desired sizes for partitioning are $\{2, 1\}$. A MST for this graph consists of the edges $(v_1, v_2)$ and $(v_2, v_3)$. The resulting simple cycle is $(v_1 - v_2 - v_3 - v_1)$, with the nodes numbered so that $(v_3, v_1)$ is a longest edge of the cycle. In this case $P_1 = \{v_1, v_2\}$ and $P_2 = \{v_3\}$, giving $r = 1$, while $opt = 0$.

Cycle Part
input
   1. A graph $G = (V, E)$, $|V| = n$, and a MST $T$.
   2. A set of positive integers $\{k_1, \dots, k_p\}$ satisfying $\sum_{i=1}^{p} k_i = n$.
returns
   1. $\{P_i\}_{i=1}^{p}$ where $\bigcup_{i=1}^{p} P_i = V$ and $|P_i| = k_i$ for all $i \in \{1, \dots, p\}$.
   2. A value $r$ satisfying $r = \sum_{i=1}^{p} l(MST(P_i))$.
begin
   if ($p = 1$) then
      $P_1 := V$. $r := l(T)$. return $(\{P_1\}, r)$
   end if
   Double all the edges in $E_T$. A cycle $C$ is created. Change it into a simple cycle of equal or smaller length, using the triangle inequality. Number the nodes in $V$ so that $E_C = \{(v_1, v_2), (v_2, v_3), \dots, (v_{n-1}, v_n), (v_n, v_1)\}$, where $(v_n, v_1)$ is the longest edge in $C$.
   $m_1 := 1$; $m_i := m_{i-1} + k_{i-1}$, $i = 2, \dots, p$.
   $P_i := \{v_{m_i}, \dots, v_{m_i+k_i-1}\}$, $i = 1, \dots, p$.
   $r := \sum_{i=1}^{p} l(MST(P_i))$.
   return $(\{P_i\}_{i=1}^{p}, r)$
end Cycle Part

Figure 1: The cycle partitioning routine
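The following Python sketch mirrors Cycle Part under stated assumptions:
- The metric is a symmetric distance matrix.
- The MST is computed with Prim's algorithm in $O(n^2)$.
- The shortcut of the Euler tour of the doubled tree is realized by visiting the nodes in DFS preorder, which is the standard way to shortcut a doubled tree.

The names `prim_mst`, `length` and `cycle_part` are illustrative and are not taken from the paper. Ties for the longest cycle edge are broken by taking the first one found.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Edge = Tuple[int, int]


def prim_mst(dist: Matrix, nodes: Sequence[int]) -> List[Edge]:
    """MST edges of the complete graph induced by `nodes` (Prim, O(|nodes|^2))."""
    nodes = list(nodes)
    if len(nodes) <= 1:
        return []
    link = {v: dist[nodes[0]][v] for v in nodes[1:]}  # cheapest link into the tree
    parent = {v: nodes[0] for v in nodes[1:]}
    edges: List[Edge] = []
    while link:
        v = min(link, key=link.get)
        del link[v]
        edges.append((parent[v], v))
        for u in link:
            if dist[v][u] < link[u]:
                link[u], parent[u] = dist[v][u], v
    return edges


def length(dist: Matrix, edges: Sequence[Edge]) -> float:
    return float(sum(dist[a][b] for a, b in edges))


def cycle_part(dist: Matrix, tree: Sequence[Edge], nodes: Sequence[int],
               sizes: Sequence[int]) -> Tuple[List[List[int]], float]:
    """Sketch of Cycle Part on the node set `nodes`, spanned by the MST `tree`."""
    nodes = list(nodes)
    if len(sizes) == 1:
        return [nodes], length(dist, tree)
    adj: Dict[int, List[int]] = {v: [] for v in nodes}
    for a, b in tree:
        adj[a].append(b)
        adj[b].append(a)
    # Shortcutting the Euler tour of the doubled tree = DFS preorder of the tree.
    order, stack, seen = [], [nodes[0]], set()
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        order.append(v)
        stack.extend(reversed(adj[v]))
    # Rotate so that the longest cycle edge becomes (v_n, v_1) and is never used.
    n = len(order)
    cut = max(range(n), key=lambda i: dist[order[i]][order[(i + 1) % n]])
    order = order[cut + 1:] + order[:cut + 1]
    parts, start = [], 0
    for k in sizes:  # consecutive blocks of k_1, k_2, ... nodes along the path
        parts.append(order[start:start + k])
        start += k
    r = sum(length(dist, prim_mst(dist, part)) for part in parts)
    return parts, r
```

Consider the three-node example above, with nodes 0, 1, 2 standing for $v_1, v_2, v_3$. With this tie-breaking, the sketch happens to return $P_1 = \{v_2, v_3\}$ and $r = 0$. Choosing the other of the two longest cycle edges gives the bad outcome $r = 1$ described in the text.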

2.2 Algorithm Part Alg

To partition $G$ into $p$ sets of sizes $\{k_1, \dots, k_p\}$, call Part Alg($G$, $\{k_1, \dots, k_p\}$), where Part Alg is defined in Figure 2. This algorithm uses the previously defined Cycle Part.

During its $j$-th application, Step 2 of Part Alg removes the $j$ longest edges from a MST of $G$, creating $j + 1$ components. It then checks whether $\{k_1, \dots, k_p\}$ can be partitioned into $j + 1$ groups whose sums equal the numbers of nodes in these components. If so, every component can be split into subsets whose sizes are taken from $k_1, \dots, k_p$. The step is repeated as long as such a partition exists.

Finally, Step 3 applies Cycle Part to each component obtained in the last successful iteration.
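Steps 1 and 2 of Part Alg can be sketched in Python as follows. The sketch assumes the MST is already available as a list of `(length, u, v)` tuples on the nodes `0..n-1`. As a simplification, it tests whether a size grouping exists by plain backtracking instead of the dynamic program analyzed in Section 2.4. The function names are hypothetical. Step 3 would then run Cycle Part on each returned pair.

```python
from typing import List, Optional, Sequence, Tuple

WEdge = Tuple[float, int, int]  # (length, u, v)


def components(n: int, edges: Sequence[WEdge]) -> List[List[int]]:
    """Node sets of the connected components of the forest (range(n), edges)."""
    parent = list(range(n))

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for _, a, b in edges:
        parent[find(a)] = find(b)
    groups: dict = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


def assign_sizes(sizes: Sequence[int], targets: Sequence[int]) -> Optional[List[List[int]]]:
    """Split the multiset `sizes` into len(targets) groups whose sums are `targets`."""
    groups: List[List[int]] = [[] for _ in targets]
    room = list(targets)
    order = sorted(sizes, reverse=True)  # place large k_i first to prune early

    def place(t: int) -> bool:
        if t == len(order):
            return all(r == 0 for r in room)
        k = order[t]
        for g in range(len(room)):
            if room[g] >= k:
                room[g] -= k
                groups[g].append(k)
                if place(t + 1):
                    return True
                room[g] += k
                groups[g].pop()
        return False

    return groups if place(0) else None


def part_alg_step2(n: int, tree: Sequence[WEdge], sizes: Sequence[int]):
    """Steps 1-2 of Part Alg, given the MST edges: returns the pairs (T_i, K_i) of PT."""
    longest_first = sorted(tree, reverse=True)
    pt = [(list(range(n)), list(sizes))]
    for e_cou in range(1, len(sizes)):
        comps = components(n, longest_first[e_cou:])  # drop the e_cou longest edges
        groups = assign_sizes(sizes, [len(c) for c in comps])
        if groups is None:  # no feasible grouping: keep the previous PT
            break
        pt = list(zip(comps, groups))
    return pt
```

Consider a six-node path whose unique longest edge separates two nodes from four. For sizes $\{2, 4\}$ the sketch returns two pairs. For sizes $\{3, 3\}$ it keeps the whole tree, which is the case $y = 1$.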

Part Alg
input
   1. A graph $G = (V, E)$, $|V| = n$.
   2. A set of positive integers $\{k_1, \dots, k_p\}$ satisfying $\sum_{i=1}^{p} k_i = n$.
returns
   1. $\{P_i\}_{i=1}^{p}$ where $\bigcup_{i=1}^{p} P_i = V$ and $|P_i| = k_i$.
   2. A value $apx$ satisfying $apx = \sum_{i=1}^{p} l(MST(P_i))$.
begin
Step 1
   $T := MST(G)$. $PT := \{(T, \{k_1, \dots, k_p\})\}$.
end Step 1
Step 2
   done := 0. e_cou := 1. (e_cou stands for edge-count.)
   while (done = 0)
      Remove the e_cou longest edges in $E_T$. A set of connected components $\{T_i\}_{i=1}^{e\_cou+1}$ is created ($T_i$ is a spanning tree of $V_{T_i}$).
      Compute a partition of $\{k_1, \dots, k_p\}$ into e_cou + 1 sets $\{K_1, \dots, K_{e\_cou+1}\}$, where $\sum_{k_j \in K_i} k_j = |V_{T_i}|$.
      if (a partitioning $\{K_1, \dots, K_{e\_cou+1}\}$ is found)
      then
         $PT := \{(T_i, K_i) \mid i = 1, \dots, e\_cou + 1\}$.
         e_cou := e_cou + 1.
         if (e_cou = p) then done := 1.
      else
         done := 1.
      end if
   end while
end Step 2
Step 3
   for every ($i = 1, \dots, e\_cou$)
      Call Cycle Part($T_i$, $K_i$), where $\{P_{i1}, \dots, P_{i|K_i|}\}$ is the returned partition and $r_i$ is the returned value.
   end for
   return $(\{P_{11}, \dots, P_{1|K_1|}, \dots, P_{e\_cou\,1}, \dots, P_{e\_cou\,|K_{e\_cou}|}\},\ apx := \sum_{i=1}^{e\_cou} r_i)$
end Step 3
end Part Alg

Figure 2: The partitioning algorithm

Lemma 2.2 Let $y$ be the value of e_cou when Part Alg reaches Step 3. Let $\{g_1, \dots, g_{y-1}\}$ be the $y - 1$ longest edges in $T$, where $g_1$ is the longest of them. Let $apx$ be the value returned by Part Alg($G$, $\{k_1, \dots, k_p\}$). Then

$$apx \le 2l(T) - l(g_1),$$

and if $y > 1$,

$$apx \le 2\Big(l(T) - \sum_{i=1}^{y-1} l(g_i)\Big).$$

Proof: When entering Step 3, the set $PT$ satisfies $PT = \{(T_1, K_1), \dots, (T_y, K_y)\}$. Here the $T_i$'s are the connected components created from $T$ by removing the edges $\{g_1, \dots, g_{y-1}\}$ (if $y = 1$, then $T_1 = T$).

Suppose that $y = 1$. By Lemma 2.1, activating the cycle routine on $T$ gives a value $r \le 2l(T) - l(g_1)$. Hence for this special case $apx = r \le 2l(T) - l(g_1)$.

Suppose now that $y > 1$. From the way the $T_i$'s were obtained,

$$\sum_{i=1}^{y} l(T_i) = l(T) - \sum_{i=1}^{y-1} l(g_i). \qquad (1)$$

For every $i \in \{1, \dots, y\}$, Lemma 2.1 gives $r_i \le 2l(T_i)$. Combining this with Equation (1), we get

$$apx = \sum_{i=1}^{y} r_i \le 2\sum_{i=1}^{y} l(T_i) = 2\Big(l(T) - \sum_{i=1}^{y-1} l(g_i)\Big) \le 2l(T) - l(g_1).$$

Let $\{O_i\}_{i=1}^{p}$ be an optimal partition, and denote the set of edges of $MST(O_i)$ by $E_{O_i}$, $i \in \{1, \dots, p\}$. For every $i \ne j$, $\{i, j\} \subseteq \{1, \dots, p\}$, define $e_{(i,j)}$ to be an edge satisfying

$$l(e_{(i,j)}) = \min_{v \in O_i,\ u \in O_j} \{l(v, u)\}.$$

Consider the graph $G'$ in which the nodes represent the sets $O_i$. The length of the edge between the node representing $O_i$ and the node representing $O_j$ is $l(e_{(i,j)})$.

Define $\{e_i\}_{i=1}^{p-1}$ to be the $p - 1$ edges of a MST in $G'$. Rename the edges so that $l(e_1) \le l(e_2) \le l(e_3) \le \dots \le l(e_{p-1})$. The set of edges $\bigcup_{i=1}^{p} E_{O_i} \cup \{e_1, \dots, e_j\}$ defines a set of $p - j$ connected components. Let $\{U_1^j, \dots, U_{p-j}^j\}$ be the sets of nodes in these components.

Lemma 2.3 The shortest edge between $U_i^j$ and $U_k^j$, for $i \ne k$ and $\{i, k\} \subseteq \{1, \dots, p - j\}$, is of length $\ge l(e_{j+1})$.

Proof: The set of edges $\{e_1, \dots, e_{p-1}\}$ is a MST in $G'$. Suppose there is an edge $g$ between $U_i^j$ and $U_k^j$ such that $l(g) < l(e_{j+1})$. Add the corresponding edge $\hat{g}$ of $G'$ to $\{e_1, \dots, e_{p-1}\}$. A cycle has been created. This cycle contains at least one edge $\hat{f}$ from $\{e_{j+1}, \dots, e_{p-1}\}$, since $e_1, \dots, e_j$ are all edges inside the $U_i^j$ sets. Then $l(\hat{f}) \ge l(e_{j+1}) > l(g) \ge l(\hat{g})$. Hence $\{e_1, \dots, e_{p-1}\} \setminus \{\hat{f}\} \cup \{\hat{g}\}$ is a strictly shorter spanning tree than $\{e_1, \dots, e_{p-1}\}$, contradicting the fact that the latter is a MST.

Theorem 2.4 (Gale [3], see also [9]) Let $T_1 = (V, H)$ be a MST of $G = (V, E)$, and let $T_2 = (V, F)$ be any spanning tree of $G$. Suppose that $H = \{h_1, h_2, \dots, h_{n-1}\}$ is ordered so that $l(h_1) \le \dots \le l(h_{n-1})$, and that $F = \{f_1, f_2, \dots, f_{n-1}\}$ is ordered so that $l(f_1) \le \dots \le l(f_{n-1})$. Then $l(h_i) \le l(f_i)$ for all $i \in \{1, \dots, n-1\}$.
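Theorem 2.4 can be sanity-checked numerically. The Python sketch below is an illustration and is not part of the paper. It builds a random Euclidean instance, computes a MST with Kruskal's algorithm, and compares the MST's sorted edge lengths with those of randomly generated spanning trees. The Euclidean instance satisfies the triangle inequality, although the theorem does not need it. The tree generator attaches each node to a random earlier node; this is an arbitrary choice and does not sample spanning trees uniformly.

```python
import random
from typing import List, Sequence, Tuple


def kruskal_lengths(n: int, edges: Sequence[Tuple[float, int, int]]) -> List[float]:
    """Edge lengths of a MST (Kruskal with union-find), in non-decreasing order."""
    parent = list(range(n))

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    chosen = []
    for w, a, b in sorted(edges):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            chosen.append(w)
    return chosen


def random_tree(n: int, rng: random.Random) -> List[Tuple[int, int]]:
    """Some spanning tree: attach each node to a random earlier node (not uniform)."""
    order = list(range(n))
    rng.shuffle(order)
    return [(order[i], order[rng.randrange(i)]) for i in range(1, n)]


def gale_holds(n: int = 12, trials: int = 200, seed: int = 0) -> bool:
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]

    def d(a: int, b: int) -> float:
        return ((pts[a][0] - pts[b][0]) ** 2 + (pts[a][1] - pts[b][1]) ** 2) ** 0.5

    h = kruskal_lengths(n, [(d(a, b), a, b) for a in range(n) for b in range(a + 1, n)])
    for _ in range(trials):
        f = sorted(d(a, b) for a, b in random_tree(n, rng))
        # Theorem 2.4: the i-th shortest MST edge is never longer than the
        # i-th shortest edge of any other spanning tree.
        if any(hi > fi for hi, fi in zip(h, f)):
            return False
    return True
```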

Theorem 2.5 $apx \le (2p - 1)\,opt$.

Proof: Adding $\{e_1, \dots, e_{p-1}\}$ to $\bigcup_{i=1}^{p} E_{O_i}$ creates a spanning tree of $G$. Hence

$$l(T) \le opt + \sum_{i=1}^{p-1} l(e_i). \qquad (2)$$

Since $T$ is a spanning tree, it must contain at least $p - 1$ edges between the sets of the optimal solution. Let the number of these edges in $T$ be $z$. Consider a graph that contains $p$ nodes (the same nodes as in $G'$), corresponding to $O_1, \dots, O_p$. Its edges correspond to the $z$ edges of $T$ mentioned above: an edge of $T$ joining a node of $O_i$ to a node of $O_j$ becomes an edge between the nodes corresponding to $O_i$ and $O_j$. Take $p - 1$ edges that form a spanning tree in this graph, and call them $f_1, \dots, f_{p-1}$, ordered so that $l(f_1) \le l(f_2) \le \dots \le l(f_{p-1})$. These edges satisfy $\{f_1, f_2, \dots, f_{p-1}\} \subseteq E_T$, and by Theorem 2.4,

$$l(e_i) \le l(f_i) \qquad \forall i \in \{1, \dots, p-1\}. \qquad (3)$$

We consider three cases:

1. $opt < l(e_1)$. The set of edges $\bigcup_{i=1}^{p} E_{O_i} \cup \{e_1, \dots, e_{p-1}\}$ is a spanning tree with at most $p - 1$ edges of length $\ge l(e_1)$. Hence, by Theorem 2.4, $T$ contains at most $p - 1$ edges of length $\ge l(e_1)$. The shortest edge between two nodes from two different $O_i$'s has length at least $l(e_1)$. Therefore, removing from $T$ its $p - 1$ longest edges leaves $p$ components, all of whose edges lie inside the optimal solution's sets of nodes. These components are exactly the optimal solution, so the value e_cou $= p$ will be reached and $apx = opt$.

2. $l(e_j) \le opt < l(e_{j+1})$ for some $j \in \{1, \dots, p - 2\}$. We will show that in this case $y$, the value of e_cou when Step 3 is reached, satisfies $y \ge p - j$.

The set of edges $\bigcup_{i=1}^{p} E_{O_i} \cup \{e_1, \dots, e_{p-1}\}$ is a spanning tree with at most $p - j - 1$ edges of length $\ge l(e_{j+1})$. Then, according to Theorem 2.4, $T$ contains at most $p - j - 1$ edges of this length. Removing from $T$ its $p - j - 1$ longest edges will leave only edges of length $< l(e_{j+1})$.

Now consider the sets $\{U_1^j, \dots, U_{p-j}^j\}$ defined before. By Lemma 2.3, the shortest edge between $U_i^j$ and $U_k^j$, for every $i, k$, is at least as long as $l(e_{j+1})$. Thus, after the removal of the $p - j - 1$ longest edges from $T$, no edges are left between nodes from different $U_i^j$'s. $T$ is disconnected into $p - j$ connected components, and this partitioning has to be $\{U_1^j, \dots, U_{p-j}^j\}$. So $y \ge p - j$. From $j \le p - 2$ it follows that $y \ge 2$.

We now use Lemma 2.2, the fact that the $f_i$ edges are in $T$, and Equation (3):

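$$\begin{aligned} apx &\le 2\Big(l(T) - \sum_{i=1}^{y-1} l(g_i)\Big) \le 2\Big(l(T) - \sum_{i=1}^{p-j-1} l(g_i)\Big) \\ &\le 2\Big(l(T) - \sum_{i=1}^{p-j-1} l(f_{p-i})\Big) \le 2\Big(l(T) - \sum_{i=1}^{p-j-1} l(e_{p-i})\Big) \\ &= 2\Big(l(T) - \sum_{i=1}^{p-1} l(e_i)\Big) + 2\sum_{i=1}^{j} l(e_i). \end{aligned}$$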

By Equation (2), the assumption of this case, and $j \le p - 2$,

$$apx \le 2\,opt + 2\sum_{i=1}^{j} l(e_i) \le 2\,opt + 2j\,l(e_j) \le 2\,opt + 2j\,opt \le (2p - 2)\,opt.$$

3. $l(e_{p-1}) \le opt$. By Lemma 2.2, and since $f_{p-1}$ is in $T$, $apx \le 2l(T) - l(g_1) \le 2l(T) - l(f_{p-1})$. By Equations (3) and (2) and the assumption of this case,

$$\begin{aligned} apx &\le 2l(T) - l(e_{p-1}) = 2\Big(l(T) - \sum_{i=1}^{p-1} l(e_i)\Big) + 2\sum_{i=1}^{p-2} l(e_i) + l(e_{p-1}) \\ &\le 2\Big(l(T) - \sum_{i=1}^{p-1} l(e_i)\Big) + (2p - 3)\,l(e_{p-1}) \\ &\le 2\,opt + (2p - 3)\,l(e_{p-1}) \le (2p - 1)\,opt. \end{aligned}$$

2.3 Tight example for Part Alg

Consider the graph with $p(p + 1)$ nodes in Figure 3 (a). There are $p + 1$ sets of nodes in this graph. $p - 1$ of these sets contain $p + 1$ nodes each, one more set contains $p$ nodes, and one more set contains only one node. The edges inside each of these sets are of length 0. Edges between two sets have length 1. The objective is to divide the nodes into $p$ sets of $p + 1$ nodes each.

A MST is shown in Figure 3 (b). Step 2 tries to remove a longest edge; let the chosen edge be $\hat{e}$, shown in the figure. There is no partitioning of $\{k_1, \dots, k_p\}$ into sets with sums $\{1, p(p + 1) - 1\}$, so e_cou $= 1$ when Step 3 is reached. The cycle routine is therefore activated for the whole MST. The simple cycle obtained is shown in Figure 3 (c), and the resulting partitioning is shown in Figure 4 (a), giving $apx = 2p - 1$.

An optimal solution uses the original $(p + 1)$-node sets as sets of the partition, and puts the original $p$-node set together with the single-node set, giving $opt = 1$. So $apx = (2p - 1)\,opt$.

Figure 3: A tight example for $p$ partitioning, first figure. (a) The graph: $p - 1$ sets of $p + 1$ nodes, one set of $p$ nodes and one single node. (b) A MST, containing the edge $\hat{e}$. (c) The simple cycle.

Figure 4: A tight example for $p$ partitioning, second figure. (a) The resulting partition.

2.4 Complexity

We now analyze the complexity of Part Alg.

Step 1. Finding a MST takes $O(n^2)$ time.

Step 2. This step implements a loop which is activated at most $p$ times. In each iteration, the tree is scanned to find the next longest edge and the sizes of the connected components obtained by removing this edge. Next, a partitioning of $\{k_1, \dots, k_p\}$ is searched for; only one is needed. Finding such a partitioning takes $O(p4^p)$ time, using the following dynamic program.

Suppose we want to know whether $\{k_1, \dots, k_p\}$ can be partitioned into $e\_cou + 1$ sets with sums $\{a_1, \dots, a_{e\_cou+1}\}$. The function $f_i(S)$ is defined for every $i \in \{1, \dots, e\_cou + 1\}$ and every $S \subseteq \{k_1, \dots, k_p\}$. It takes the value true if there is a partitioning of $S$ into $i$ sets with sums $\{a_1, \dots, a_i\}$:

$$f_1(S) := \text{true} \iff \sum_{k_j \in S} k_j = a_1,$$

$$f_{i+1}(S) := \text{true} \iff \exists\, T \subseteq S \text{ such that } f_i(S \setminus T) = \text{true} \text{ and } \sum_{k_j \in T} k_j = a_{i+1}.$$

There are $2^p$ possible values of $S$ and $e\_cou + 1 \le p$ values of $i$, so there are $O(p2^p)$ values to compute. Each value takes $O(2^p)$ time to calculate, since there are $O(2^p)$ possible subsets $T$. So each iteration takes $O(p4^p)$ time, and Step 2 requires $O(p(n + p4^p))$ time altogether. When $k_i = n/p$ for all $i \in \{1, \dots, p\}$, this step only needs to find the longest edge at each iteration and compute the components' sizes. It therefore requires only $O(pn)$ time.

Step 3. This step calls Cycle Part for every pair in $PT$. Each call takes $O(|V_{T_i}|^2)$ time, so all the calls to this procedure take $O(n^2)$ time. Summing the values $r_i$ takes an additional $O(p)$ time. Thus Step 3 takes $O(n^2)$ time altogether.

Altogether, the algorithm takes $O(p(n + p4^p) + n^2)$ time. When $k_i = n/p$ for all $i \in \{1, \dots, p\}$, it takes $O(pn + n^2) = O(n^2)$ time.
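The dynamic program above can be sketched in Python by encoding subsets $S \subseteq \{k_1, \dots, k_p\}$ as bitmasks. This encoding is an implementation choice that the text does not specify, and `can_partition` is a hypothetical name. With memoization, each $f_i(S)$ enumerates the $2^{|S|}$ subsets $T \subseteq S$. The total work is therefore $O(p\,3^p)$, which is within the $O(p4^p)$ bound above.

```python
from functools import lru_cache
from typing import Sequence


def can_partition(k: Sequence[int], a: Sequence[int]) -> bool:
    """Can the multiset k be split into len(a) sets with sums a_1, ..., a_m?"""
    p, m = len(k), len(a)
    if m == 0:
        return p == 0
    if sum(k) != sum(a):
        return False
    # subset_sum[S] = sum of the k_j whose bit is set in the bitmask S
    subset_sum = [0] * (1 << p)
    for s in range(1, 1 << p):
        low = s & -s
        subset_sum[s] = subset_sum[s ^ low] + k[low.bit_length() - 1]

    @lru_cache(maxsize=None)
    def f(i: int, s: int) -> bool:
        """f_i(S): S can be split into i sets with sums a_1, ..., a_i."""
        if i == 1:
            return subset_sum[s] == a[0]
        t = s
        while True:  # every subset T of S, from S itself down to the empty set
            if subset_sum[t] == a[i - 1] and f(i - 1, s ^ t):
                return True
            if t == 0:
                return False
            t = (t - 1) & s

    return f(m, (1 << p) - 1)
```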

3 Improving the bound

In this section we describe an algorithm that achieves a better bound at the expense of a higher complexity. It uses a parameter $x$ ($x \le n - p + 1$) which determines both the improvement in the bound and the increase in complexity. To partition $G$ into $p$ parts with sizes $\{k_1, \dots, k_p\}$, call Part Alg x($G$, $\{k_1, \dots, k_p\}$), where Part Alg x is defined in Figure 5.

This algorithm considers the $x + p - 1$ components obtained when the $x + p - 2$ longest edges are removed from a MST of $G$. It considers all the possible ways to aggregate these components into sets whose sizes allow a solution to be produced by applying the cycle routine to each set. The case $x = 1$ gives the same bound as Part Alg, but with a higher time complexity. This is because it enumerates all the possible combinations, while Part Alg only checks the existence of such a combination for each value of e_cou.
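Step 2 of Part Alg x can be sketched in Python as an exhaustive search over the set partitions of the components $C_1, \dots, C_{x+p-1}$. The number of such partitions is a Bell number, which is one source of the exponential factor $f(p,x)$.

The sketch makes the following assumptions, and all names in it are hypothetical:
- Distances are given as a symmetric matrix.
- Components are given as node lists.
- An initial value `r_init` plays the role of $l(T) - l(g_1)/2$.

Because $r_{temp}$ depends only on $W_1, \dots, W_y$, the sketch keeps one feasible grouping $K_1, \dots, K_y$ per candidate instead of enumerating all of them. It also stores node sets rather than the trees $MST(W_i)$.

```python
from typing import Iterator, List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def mst_length(dist: Matrix, nodes: Sequence[int]) -> float:
    """Prim's algorithm on the complete graph induced by `nodes`."""
    nodes = list(nodes)
    if len(nodes) <= 1:
        return 0.0
    best = {v: dist[nodes[0]][v] for v in nodes[1:]}
    total = 0.0
    while best:
        v = min(best, key=best.get)
        total += best.pop(v)
        for u in best:
            best[u] = min(best[u], dist[v][u])
    return total


def set_partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """All partitions of `items` into non-empty blocks (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def group_sizes(sizes: Sequence[int], targets: Sequence[int]) -> Optional[List[List[int]]]:
    """Some grouping of `sizes` whose group sums equal `targets`, or None."""
    groups: List[List[int]] = [[] for _ in targets]
    room = list(targets)

    def place(t: int) -> bool:
        if t == len(sizes):
            return all(r == 0 for r in room)
        for g in range(len(room)):
            if room[g] >= sizes[t]:
                room[g] -= sizes[t]
                groups[g].append(sizes[t])
                if place(t + 1):
                    return True
                room[g] += sizes[t]
                groups[g].pop()
        return False

    return groups if place(0) else None


def best_aggregation(dist: Matrix, comps: Sequence[Sequence[int]], sizes: Sequence[int],
                     r_init: float) -> Tuple[float, Optional[List[Tuple[List[int], List[int]]]]]:
    """Step 2 of Part Alg x (sketch): best grouping of the components into W_1..W_y."""
    best_r, best_pt = r_init, None
    for blocks in set_partitions(list(range(len(comps)))):
        if not 2 <= len(blocks) <= len(sizes):  # condition 2 <= y <= p
            continue
        w = [sorted(v for j in b for v in comps[j]) for b in blocks]
        k_groups = group_sizes(sizes, [len(x) for x in w])
        if k_groups is None:
            continue
        r_temp = sum(mst_length(dist, x) for x in w)
        if r_temp < best_r:
            best_r, best_pt = r_temp, list(zip(w, k_groups))
    return best_r, best_pt
```

If no grouping beats `r_init`, the sketch returns `None` for PT. This corresponds to $y = 1$, where Step 3 runs Cycle Part on the whole tree.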

3.1 Evaluating Part Alg x

The next two lemmas are proved together.

Lemma 3.1 Let $g_1$ be the longest edge in $T$. Then $apx \le 2l(T) - l(g_1)$.

Lemma 3.2 Let $r_{temp}$ be a value calculated in Step 2. If $y > 1$ in Step 3, then $apx \le 2r_{temp}$.

Part Alg x
input
   1. A graph $G = (V, E)$.
   2. A set of positive integers $\{k_1, \dots, k_p\}$ satisfying $\sum_{i=1}^{p} k_i = n$.
returns
   1. $\{P_i\}_{i=1}^{p}$ where $\bigcup_{i=1}^{p} P_i = V$ and $|P_i| = k_i$.
   2. A value $apx$ satisfying $apx = \sum_{i=1}^{p} l(MST(P_i))$.
begin
Step 1
   $T := MST(G)$. $PT := \{(T, \{k_1, \dots, k_p\})\}$. $r := l(T) - \frac{l(g_1)}{2}$, where $g_1$ is the longest edge in $T$.
end Step 1
Step 2
   Remove the $x + p - 2$ longest edges in $E_T$. A set of connected components $\{C_1, \dots, C_{x+p-1}\}$ has been created.
   Compute all the partitions of $\{k_1, \dots, k_p\}$ into $y$ sets $\{K_1, \dots, K_y\}$, and all the partitions of $V$ into $y$ sets $\{W_1, \dots, W_y\}$, such that:
      1. $2 \le y \le p$.
      2. For every $j \in \{1, \dots, x + p - 1\}$ there is $i \in \{1, \dots, y\}$ for which $V_{C_j} \subseteq W_i$.
      3. $\sum_{k_j \in K_i} k_j = |W_i|$.
   for every pair of such partitionings $\{K_1, \dots, K_y\}$ and $\{W_1, \dots, W_y\}$:
      $r_{temp} := \sum_{i=1}^{y} l(MST(W_i))$.
      if ($r_{temp} < r$) then $r := r_{temp}$. $PT := \{(MST(W_i), K_i) \mid i = 1, \dots, y\}$.
      end if
   end for
end Step 2
Step 3
   $y := |PT|$.
   for every ($i = 1, \dots, y$)
      Call Cycle Part($T_i$, $K_i$), where $\{P_{i1}, \dots, P_{i|K_i|}\}$ is the returned partition and $r_i$ is the returned value.
   end for
   return $(\{P_{11}, \dots, P_{1|K_1|}, \dots, P_{y1}, \dots, P_{y|K_y|}\},\ apx := \sum_{i=1}^{y} r_i)$
end Step 3
end Part Alg x

Figure 5: The improved partitioning algorithm

Proof: When entering Step 3, the set $PT$ is given by $PT = \{(T_1, K_1), \dots, (T_y, K_y)\}$, and if $y > 1$ then $\sum_{i=1}^{y} l(T_i) = r$.

Suppose that $y = 1$. By Lemma 2.1, activating the cycle routine on $T$ gives a value $r_1 \le 2l(T) - l(g_1)$. Hence for this special case $apx = r_1 \le 2l(T) - l(g_1)$.

Suppose now that $y > 1$. In this case, the value of $r$ when entering Step 3 is different from the initial value $l(T) - \frac{l(g_1)}{2}$. For every $r_{temp}$ computed along the algorithm,

$$\sum_{i=1}^{y} l(T_i) = r \le r_{temp}. \qquad (4)$$

By Lemma 2.1, $r_i \le 2l(T_i)$ for every $i \in \{1, \dots, y\}$, implying

$$apx = \sum_{i=1}^{y} r_i \le 2\sum_{i=1}^{y} l(T_i).$$

From Equation (4), $apx \le 2r_{temp}$. Since $r$ at the end of the algorithm clearly satisfies $r \le l(T) - \frac{l(g_1)}{2}$, in this case too $apx \le 2r \le 2l(T) - l(g_1)$.

Theorem 3.3 $apx \le \left(2 + \frac{2p-3}{x}\right) opt$.

Proof: Adding $\{e_1, \dots, e_{p-1}\}$ to $\bigcup_{i=1}^{p} E_{O_i}$ creates a spanning tree of $G$. Hence,

$$l(T) \le opt + \sum_{i=1}^{p-1} l(e_i). \qquad (5)$$

1. $opt < x\,l(e_1)$. The set of edges $\bigcup_{i=1}^{p} E_{O_i}$ contains at most $x - 1$ edges of length $\ge l(e_1)$. It follows that the set of edges $\bigcup_{i=1}^{p} E_{O_i} \cup \{e_1, \dots, e_{p-1}\}$ is a spanning tree with at most $(x - 1) + (p - 1)$ edges of length $\ge l(e_1)$. Hence, by Theorem 2.4, $T$ contains at most $x + p - 2$ edges of length $\ge l(e_1)$. The shortest edge between two nodes from two different $O_i$'s has length at least $l(e_1)$. Therefore, removing from $T$ its $x + p - 2$ longest edges leaves only edges inside the optimal solution's sets of nodes. So there is a partitioning $\{W_1, \dots, W_p\}$ which is exactly the optimal solution. This proves that when Step 3 is reached, $r \le opt$, and by Lemma 3.2, $apx \le 2r \le 2\,opt$.

2. $x\,l(e_j) \le opt < x\,l(e_{j+1})$ for some $j \in \{1, \dots, p - 2\}$. In this case, the set of edges $\bigcup_{i=1}^{p} E_{O_i} \cup \{e_1, \dots, e_{p-1}\}$ is a spanning tree with at most $(x - 1) + (p - j - 1)$ edges of length $\ge l(e_{j+1})$. Then, according to Theorem 2.4, $T$ contains at most $x + p - j - 2$ edges of this length. Removing from $T$ its $x + p - 2$ longest edges will leave only edges of length $< l(e_{j+1})$.

Now consider the sets $\{U_1^j, \dots, U_{p-j}^j\}$ defined before. By Lemma 2.3, the shortest edge between $U_i^j$ and $U_k^j$, for every $i, k$, is at least as long as $l(e_{j+1})$. Thus, after the removal of the $x + p - 2$ longest edges from $T$, no edges are left between nodes from different $U_i^j$'s. So there is a partitioning $\{W_1, \dots, W_{p-j}\}$ which is exactly $\{U_1^j, \dots, U_{p-j}^j\}$. Hence, when Step 3 is reached,

$$r \le \sum_{i=1}^{p-j} l(MST(U_i^j)).$$

By the way the $U_i^j$ were defined,

$$\sum_{i=1}^{p-j} l(MST(U_i^j)) \le opt + \sum_{i=1}^{j} l(e_i) \quad\Longrightarrow\quad r \le opt + \sum_{i=1}^{j} l(e_i).$$

By Lemma 3.2, $apx \le 2r \le 2\big(opt + \sum_{i=1}^{j} l(e_i)\big) \le 2\,opt + 2j\,l(e_j)$. Since $j \le p - 2$, and by the assumption of this case, $apx \le 2\,opt + (2p - 4)\,l(e_j) \le \left(2 + \frac{2p-4}{x}\right) opt$.

3. $x\,l(e_{p-1}) \le opt$. By Lemma 3.1, $apx \le 2l(T) - l(g_1)$. Clearly $l(g_1) \ge l(e_{p-1})$. Combining this with Equation (5) and the assumption of this case,

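$$\begin{aligned} apx &\le 2l(T) - l(e_{p-1}) = 2\Big(l(T) - \sum_{i=1}^{p-1} l(e_i)\Big) + 2\sum_{i=1}^{p-2} l(e_i) + l(e_{p-1}) \\ &\le 2\Big(l(T) - \sum_{i=1}^{p-1} l(e_i)\Big) + (2p - 3)\,l(e_{p-1}) \\ &\le 2\,opt + (2p - 3)\,l(e_{p-1}) \le \Big(2 + \frac{2p-3}{x}\Big) opt. \end{aligned}$$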

3.2 Complexity

The complexity of this algorithm is $O(f(p,x)\,n^2)$, where $f$ is an exponential function of $p$ and $x$.

Step 1. Finding a MST takes $O(n^2)$ time.

Step 2. We can scan the tree to find the $x + p - 2$ longest edges, which requires $O((x + p)n)$ time. Looking for all the partitions of $\{C_1, \dots, C_{x+p-1}\}$ requires $O(f_1(p,x))$ time, where $f_1$ is an exponential function of $p$ and $x$. Scanning all the partitions of $\{k_1, \dots, k_p\}$ then takes $O(f_2(p,x))$ time, where $f_2$ is also an exponential function of $p$ and $x$. For every acceptable pair of partitionings, finding all the MSTs and their lengths requires $O(n^2)$ time. Altogether this step requires $O(f(p,x)\,n^2)$ time, where $f$ is an exponential function of $p$ and $x$.

When $k_i = n/p$ for all $i \in \{1, \dots, p\}$, finding all the possible partitionings takes $O(2^{p+x})$ time. Each partitioning needs $O(n^2)$ work, so this step requires $O(2^{p+x} n^2)$ time altogether.

Step 3. As before, this step calls Cycle Part for every pair in $PT$, taking $O(n^2)$ time altogether.

So the complexity of Part Alg x is dominated by that of Step 2, namely $O(f(p,x)\,n^2)$. When $k_i = n/p$ for all $i \in \{1, \dots, p\}$, the algorithm takes $O(2^{p+x} n^2)$ time.

4 Partitioning into 2 sets

In this section we treat the following case. We are given a graph $G = (V, E)$ with $|V| = n$, and a constant $K \le n/2$. The task is to partition $V$ into disjoint sets $P$ and $Q$ such that $|P| = K$, $|Q| = n - K$, and $l(MST(P)) + l(MST(Q))$ is minimized.

4.1 The K-centroid

To approximate the solution in the case $p = 2$, we define a 'K-centroid' and prove its existence. Let a tree $T = (V, E_T)$ and a constant $K \le n/2$ be given. For a node $r \in V$, remove all the edges in $E_T$ incident to $r$; a set of connected components is created. Let $\{C_1, C_2, \dots, C_m\}$ be all of these components which satisfy $|V_{C_i}| \le K$. If $\sum_{i=1}^{m} |V_{C_i}| \ge K$, then $r$ is a K-centroid.

For the special case $K = n/2$, the K-centroid is simply a centroid. A centroid is a node such that, when it is removed from $T$, each of the connected components created contains at most $n/2$ nodes. The definition of a centroid of a tree, and a linear time algorithm for finding it, are presented in [8].
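The Python sketch below checks the definition directly and also gives a simple way to find some K-centroid. It is an illustration under stated assumptions, not the paper's procedure. Instead of the repeated centroid computation of Figure 6, it roots the tree at an arbitrary node and computes subtree sizes. It then returns a node whose subtree has more than $K$ nodes while each child subtree has at most $K$ nodes. Those child subtrees are components of $T$ minus that node; each has at most $K$ nodes, and together they hold at least $K$ nodes. So the node satisfies the definition above, and the search also runs in $O(n)$ time. The tree is assumed to be an adjacency dictionary, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Adj = Dict[int, List[int]]


def rooted_sizes(adj: Adj, root: int) -> Tuple[Dict[int, int], Dict[int, Optional[int]]]:
    """Subtree sizes and parents when the tree is rooted at `root` (iterative DFS)."""
    parent: Dict[int, Optional[int]] = {root: None}
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for u in adj[v]:
            if u != parent[v]:
                parent[u] = v
                stack.append(u)
    size = {v: 1 for v in order}
    for v in reversed(order):  # every child appears after its parent in `order`
        if parent[v] is not None:
            size[parent[v]] += size[v]
    return size, parent


def is_k_centroid(adj: Adj, r: int, k: int) -> bool:
    """Components of T - r with at most k nodes must contain at least k nodes in total."""
    size, _ = rooted_sizes(adj, r)
    return sum(size[u] for u in adj[r] if size[u] <= k) >= k


def some_k_centroid(adj: Adj, k: int) -> int:
    """A node whose subtree has more than k nodes while every child subtree has at most k."""
    size, parent = rooted_sizes(adj, next(iter(adj)))
    for v in size:
        if size[v] > k and all(size[u] <= k for u in adj[v] if u != parent[v]):
            return v
    raise ValueError("the tree must have more than k nodes")
```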

Lemma 4.1 A K-centroid exists for every tree and every $K \le n/2$. It can be found in $O(n)$ time.

Proof: Consider Find K-cent, defined in Figure 6. During this procedure, the spanning tree given to Find K-cent as input always contains at least $K + 1$ nodes.

In each iteration, the number of nodes in the tree is at most half the number of nodes in the previous iteration. Since the tree is always kept to contain at least $K + 1$ nodes, Find K-cent always stops and finds the required node. In each iteration the most expensive operation is finding the centroid, which takes linear time. Since the tree size at least halves in each iteration, the whole algorithm takes $O(n)$ time.

Find K-cent
input
   1. A tree $T$.
   2. An integer $K$ ($1 \le K \le n/2$).
returns
   1. A node $r$ which is a K-centroid.
   2. A forest $\{T_1, \dots, T_m\}$ such that $|V_{T_i}| \le K$ for all $i \in \{1, \dots, m\}$ and $\sum_{i=1}^{m} |V_{T_i}| \ge K$.
begin
   $c :=$ a centroid in $T$.
   Delete $c$ from $T$; a set of connected components $\{C_1, \dots, C_m\}$ is created.
   if ($|V_{C_i}| \le K$, $i = 1, \dots, m$)
   then
      return $(c, \{C_1, \dots, C_m\})$
   else
      $S := V_{C_i}$ such that $|V_{C_i}| > K$.
      $T_S := T$ induced on $S$.
      return (Find K-cent($T_S$, $K$))
   end if
end Find K-cent

Figure 6: Finding the K-centroid

4.2 The approximation algorithm

To divide the graph into two sets of sizes $K$ and $|V| - K$, call Part 2 Alg($G$, $K$), where Part 2 Alg is defined in Figures 7 and 8. This algorithm finds a MST of $G$. First, it tries to find one edge whose removal divides the graph into sets of the desired sizes. If this fails, it doubles part of the tree's edges, getting a graph that can easily be divided into sets of the desired sizes.

Note that when Step 3 is reached, there is no edge whose removal creates a connected component of size exactly $K$. Hence $|V_{T_i}| < K$ for every $i \in \{1, \dots, m\}$. Also, since in Find K-cent the set $S$ was always kept to contain at least $K + 1$ nodes, $m \ge 2$.

When $l(E_{T_2}) + l(e_2) \ge l(E_{T_S})$, the set $P$ defined in Figure 8 can be found. This is because $T_2$ and the cycle contain all the nodes not in $V_{T_1} \cup \{u\}$, hence at least $n - K \ge K$ nodes.

When $l(E_{T_2}) + l(e_2) < l(E_{T_S})$, the set $P$ can also be found, since $|V_{T_1}| < K$ while $T_1$ and the cycle contain at least $K + 1$ nodes. In that case, the nodes from the cycle that are inserted into $P$ are obtained by walking on the cycle. The walk starts at $c$ and covers $K - |V_{T_1}| - 1$ nodes in one of the two possible directions.
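Step 2 of Part 2 Alg looks for a tree edge whose removal leaves a component of exactly $K$ nodes. This can be done in $O(n)$ time with subtree sizes: removing the edge between $v$ and its parent leaves components of $size(v)$ and $n - size(v)$ nodes. The Python sketch below assumes the MST $T$ is given as an adjacency dictionary; `split_edge` is a hypothetical name.

```python
from typing import Dict, List, Optional, Set, Tuple

Adj = Dict[int, List[int]]


def split_edge(adj: Adj, k: int) -> Optional[Tuple[Tuple[int, int], Set[int]]]:
    """Return a tree edge whose removal leaves a component P with exactly k nodes,
    together with P, or None if no such edge exists."""
    n = len(adj)
    root = next(iter(adj))
    parent: Dict[int, Optional[int]] = {root: None}
    order, stack = [], [root]
    while stack:  # iterative DFS: parents precede children in `order`
        v = stack.pop()
        order.append(v)
        for u in adj[v]:
            if u != parent[v]:
                parent[u] = v
                stack.append(u)
    size = {v: 1 for v in order}
    for v in reversed(order):
        if parent[v] is not None:
            size[parent[v]] += size[v]
    for v in order:
        # removing (parent[v], v) leaves components of size[v] and n - size[v] nodes
        if parent[v] is None or k not in (size[v], n - size[v]):
            continue
        below, todo = {v}, [v]
        while todo:  # collect the subtree hanging below v
            w = todo.pop()
            for u in adj[w]:
                if u != parent[w] and u not in below:
                    below.add(u)
                    todo.append(u)
        return (parent[v], v), (below if size[v] == k else set(adj) - below)
    return None
```

Consider a star with three leaves, which is the shape of the MST in the example of Section 4.4. For this tree, `split_edge(star, 2)` returns `None`, so the algorithm proceeds to Step 3.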

4.3 Evaluating Part 2 Alg

Lemma 4.2 If Step 3 is reached, then $apx \le 2l(T) - \big(l(E_{T_1}) + l(e_1) + \max\{l(E_{T_2}) + l(e_2),\ l(E_{T_S})\}\big)$.

Cre Cycle
input
   1. A tree $T_0$.
   2. A set of edges $F \subseteq E_{T_0}$.
returns
   1. A graph $H$.
begin
   Double all the edges in $F$. A cycle has been created. Change this cycle into a simple cycle of equal or smaller length, using the triangle inequality. Let $H$ be the obtained graph.
   return ($H$)
end Cre Cycle

Figure 7: Dividing the graph into 2 sets (Cre Cycle routine)

Proof:

1. If $l(E_{T_2}) + l(e_2) \ge l(E_{T_S})$, then the length of the part of the graph which is doubled is $l(T) - (l(E_{T_1}) + l(e_1) + l(E_{T_2}))$. Therefore

$$apx \le l(G_2) - l(e_2) \le 2l(T) - \big(l(E_{T_1}) + l(e_1) + l(E_{T_2}) + l(e_2)\big).$$

By the assumption of this case, this implies the claimed inequality.

2. If $l(E_{T_2}) + l(e_2) < l(E_{T_S})$, then the length of the part of the graph which is doubled is $l(T) - (l(E_{T_1}) + l(e_1) + l(E_{T_S}))$. Hence,

$$apx \le 2l(T) - \big(l(E_{T_1}) + l(e_1) + l(E_{T_S})\big).$$

By the assumption of this case, this implies the claimed inequality.

Lemma 4.3 If Step 3 is reached, then $apx \le 2l(T) - (l(g_1) + l(g_2))$, where $g_1$ and $g_2$ are two longest edges in $T$.

Proof:

1. If $\{g_1, g_2\} \subseteq E_{T_1} \cup \{e_1\} \cup E_{T_2} \cup \{e_2\}$, then $l(E_{T_1}) + l(e_1) + l(E_{T_2}) + l(e_2) \ge l(g_1) + l(g_2)$. It follows from Lemma 4.2 that $apx \le 2l(T) - (l(g_1) + l(g_2))$.

2. If $\{g_1, g_2\} \subseteq E_{T_1} \cup \{e_1\} \cup E_{T_S}$, then $l(E_{T_1}) + l(e_1) + l(E_{T_S}) \ge l(g_1) + l(g_2)$. Again, Lemma 4.2 gives the claimed result.

Part 2 Alg
input
   1. A graph $G$.
   2. An integer $K$ ($1 \le K \le n/2$).
returns
   1. $\{P, Q\}$ where $P \cup Q = V$, $|P| = K$ and $|Q| = n - K$.
   2. A value $apx = l(MST(P)) + l(MST(Q))$.
begin
Step 1
   $T := MST(G)$.
end Step 1
Step 2
   if (there exists an edge $e$ whose removal from $T$ disconnects $T$ into 2 connected components $P$ and $Q$ such that $|P| = K$)
   then
      return $(\{P, Q\},\ apx := l(MST(P)) + l(MST(Q)))$.
   end if
end Step 2
Step 3
   Call Find K-cent($T$, $K$), where $c$ is the returned K-centroid and $\{T_1, \dots, T_m\}$ is the returned forest.
   $e_i :=$ the edge connecting $T_i$ to $c$ in $T$, $i = 1, \dots, m$.
   W.l.o.g. suppose that $l(E_{T_1}) + l(e_1) \ge l(E_{T_2}) + l(e_2) \ge l(E_{T_i}) + l(e_i)$ for all $i \in \{3, \dots, m\}$.
   $T_S :=$ the subtree of $T$ induced by $V \setminus \left(\bigcup_{i=1}^{m} V_{T_i}\right)$.
   if ($l(E_{T_2}) + l(e_2) \ge l(E_{T_S})$)
   then
      $G_2 :=$ Cre Cycle($T$, $E_T \setminus (E_{T_1} \cup E_{T_2} \cup \{e_1\})$) (see Figure 9).
      Delete $e_2$ from $G_2$.
      $P := V_{T_2} \cup \{$the first $K - |V_{T_2}|$ nodes from the path connecting $T_2$ to $c\}$.
      $Q := V \setminus P$.
   else
      $G_3 :=$ Cre Cycle($T$, $\bigcup_{i=2}^{m} (\{e_i\} \cup E_{T_i})$) (see Figure 9).
      $P := V_{T_1} \cup \{c\} \cup \{K - 1 - |V_{T_1}|$ nodes that are adjacent to $u$ on the cycle, found when walking from $c$ on the cycle in one direction$\}$.
      $Q := V \setminus P$.
   end if
   return $(\{P, Q\},\ apx := l(MST(P)) + l(MST(Q)))$.
end Step 3
end Part 2 Alg

Figure 8: Dividing the graph into 2 sets (Part 2 Alg)

Figure 9: $G_2$ and $G_3$ (panel labels: $T_1$, $e_1$, $c$, $e_2$, $T_2$, $T_S$).

3. If $\{g_1, g_2\} \subseteq E_{T_2} \cup \{e_2\} \cup E_{T_S}$, then $l(E_{T_2}) + l(e_2) + l(E_{T_S}) \ge l(g_1) + l(g_2)$. By the algorithm, $l(E_{T_1}) + l(e_1) \ge l(E_{T_2}) + l(e_2)$. Therefore $l(E_{T_1}) + l(e_1) + l(E_{T_S}) \ge l(g_1) + l(g_2)$, and by Lemma 4.2, $apx \le 2l(T) - (l(g_1) + l(g_2))$.

Theorem 4.4 $apx \le 2\,opt$.

Proof: Consider an optimal solution. It divides $V$ into two sets $O_1$ and $O_2$ with $|O_1| = K$ and $|O_2| = n - K$. Let $e$ be a shortest edge between $O_1$ and $O_2$. The edges $E_{O_1} \cup E_{O_2} \cup \{e\}$ define a spanning tree of $G$, where $E_{O_1}$ and $E_{O_2}$ are the edges of $MST(O_1)$ and $MST(O_2)$, respectively. Therefore,

$$l(T) \le l(MST(O_1)) + l(MST(O_2)) + l(e) = opt + l(e). \qquad (6)$$

1. $opt < l(e)$. Every edge of $E_{O_1} \cup E_{O_2}$ has length at most $opt < l(e)$, so the spanning tree $E_{O_1} \cup E_{O_2} \cup \{e\}$ has exactly one edge of length $\ge l(e)$. Therefore, by Theorem 2.4, $T$ contains at most one edge of length $\ge l(e)$. Every edge between $O_1$ and $O_2$ has length $\ge l(e)$, and $T$ must contain at least one such edge; hence $T$ contains exactly one edge between $O_1$ and $O_2$, and removing it from $T$ disconnects $O_1$ from $O_2$. Since this removal leaves 2 connected components, they must be $O_1$ and $O_2$. So Step 2 finds this edge and $apx = opt$.
2. $l(e) \le opt$. By Equation (6), $l(T) \le opt + l(e) \le 2\,opt$. If the algorithm halts at Step 2, then $apx \le l(T) \le 2\,opt$. If Step 3 is reached, then $T$ contains at least 2 edges between $O_1$ and $O_2$ (otherwise Step 2 would have found a suitable edge), and each of them has length $\ge l(e)$. Hence $2l(e) \le l(g_1) + l(g_2)$. By Lemma 4.3, $apx \le 2l(T) - (l(g_1) + l(g_2))$, so $apx \le 2l(T) - 2l(e) = 2(l(T) - l(e))$, and by Equation (6), $apx \le 2\,opt$.
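Equation (6) holds for any partition, so it can be sanity-checked numerically. The Python sketch below is an illustration under stated assumptions, not part of the paper: points are random in the unit square (Euclidean distances satisfy the triangle inequality), $opt$ is computed by brute force over all $K$-subsets, $e$ is a shortest edge between the optimal $O_1$ and $O_2$, and the function names are hypothetical. It checks the inequality on instances; it does not prove it.

```python
import itertools
import math
import random


def mst_length(pts, nodes):
    """Prim's algorithm on the complete Euclidean graph induced by `nodes`."""
    nodes = list(nodes)
    if len(nodes) <= 1:
        return 0.0
    best = {v: math.dist(pts[nodes[0]], pts[v]) for v in nodes[1:]}
    total = 0.0
    while best:
        v = min(best, key=best.get)        # closest node not yet in the tree
        total += best.pop(v)
        for w in best:
            best[w] = min(best[w], math.dist(pts[v], pts[w]))
    return total


def check_inequality_6(pts, K):
    """Return (l(T), opt, l(e)) for the points `pts` and part size K."""
    n = len(pts)
    V = set(range(n))
    l_T = mst_length(pts, V)
    # Brute-force optimum over all K-subsets O1 (exponential; small n only).
    opt, O1 = min(
        (mst_length(pts, S) + mst_length(pts, V - set(S)), S)
        for S in itertools.combinations(range(n), K)
    )
    O2 = V - set(O1)
    l_e = min(math.dist(pts[u], pts[v]) for u in O1 for v in O2)
    return l_T, opt, l_e


random.seed(0)
for _ in range(20):
    pts = [(random.random(), random.random()) for _ in range(7)]
    l_T, opt, l_e = check_inequality_6(pts, 3)
    assert l_T <= opt + l_e + 1e-9         # Equation (6), up to rounding
```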

4.4 Example

We now show that the bound of Theorem 4.4 is tight. Consider the graph in Figure 10(a), and let $K = 2$. The optimal partition is $\{v_0, v_3\}, \{v_1, v_2\}$, with $opt = 1$. The MST $T$ chosen in Step 1 of the algorithm is shown in Figure 10(b), and $l(T) = 2$. Deleting any edge of $T$ gives one set of 3 nodes and one set of 1 node, so the algorithm continues to Step 3. In this case $K = n/2$, so the $K$-centroid we are looking for is the centroid $c = v_0$. The algorithm finds $T_1$, $T_2$ and $T_3$ ($m = 3$). Let $V_{T_1} = \{v_1\}$, $V_{T_2} = \{v_2\}$, $V_{T_3} = \{v_3\}$, so that $E_{T_1} = E_{T_2} = E_{T_3} = \emptyset$, and $e_1 = (v_0, v_1)$, $e_2 = (v_0, v_2)$, $e_3 = (v_0, v_3)$. Then $l(E_{T_1}) + l(e_1) = l(E_{T_2}) + l(e_2) = 1$ and $l(E_{T_3}) + l(e_3) = 0$. Doubling $\{e_2\} \cup E_{T_3} \cup \{e_3\}$ gives the graph shown in Figure 11(a), and creating the simple cycle gives the graph shown in Figure 11(b). Deleting $e_2 = (v_0, v_2)$ and the opposite edge $(v_0, v_3)$ leaves the edges $(v_0, v_1)$ and $(v_2, v_3)$, so the partition offered by Part 2 Alg consists of $\{v_0, v_1\}$ and $\{v_2, v_3\}$. Thus $apx = l(v_0, v_1) + l(v_2, v_3) = 2 = 2\,opt$.

Figure 10: (a) The graph, with $l(v_0, v_3) = 0$ and all other edges of length 1. (b) The MST $T$, a star centered at $v_0$.
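The numbers in this example can be checked mechanically. The Python sketch below assumes, from the example text and the surviving edge labels of Figure 10, that $l(v_0, v_3) = 0$ and every other edge has length 1; nodes $0, \ldots, 3$ stand for $v_0, \ldots, v_3$, and the names `L` and `length` are illustrative.

```python
from itertools import combinations, permutations

# Edge lengths of Figure 10(a) as read from the example (nodes 0..3 = v0..v3):
# l(v0, v3) = 0 and every other edge has length 1.
L = {frozenset(p): 1 for p in combinations(range(4), 2)}
L[frozenset((0, 3))] = 0


def length(edges):
    return sum(L[frozenset(e)] for e in edges)


# The metric satisfies the triangle inequality, as the analysis assumes.
assert all(L[frozenset((a, c))] <= L[frozenset((a, b))] + L[frozenset((b, c))]
           for a, b, c in permutations(range(4), 3))

T = [(0, 1), (0, 2), (0, 3)]            # the MST of Figure 10(b), a star at c = v0
# With K = 2, each part has two nodes, so its MST is the single edge joining them.
opt = min(length([S]) + length([tuple(set(range(4)) - set(S))])
          for S in combinations(range(4), 2))
apx = length([(0, 1), (2, 3)])          # partition {v0, v1}, {v2, v3} of Part 2 Alg

print(length(T), opt, apx)              # 2 1 2
```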

4.5 A bound on opt

Theorem 4.5 Let $T = MST(G)$; then $opt \le \frac{3l(T)}{2}$.

To prove the bound, we describe an algorithm that achieves $apx \le \frac{3l(T)}{2}$. Since $opt \le apx$, the theorem is proved.

Figure 11: (a) The graph after doubling $\{e_2\} \cup E_{T_3} \cup \{e_3\}$. (b) The graph after creating the simple cycle.

The algorithm is described as Part 2 Bound in Figure 12. It calls Cre Cycle, defined in Figure 7. The algorithm finds an MST for $G$ and a centroid $c$ in this spanning tree. It then doubles part of the edges of the tree to find two spanning trees, each containing $n/2$ nodes, whose lengths sum to at most $\frac{3l(T)}{2}$. Note that if $l(E_{T_{i_0}}) + l(e_{i_0}) < \frac{l(T)}{2}$, then all the connected components satisfy this inequality (since $i_0$ maximizes $l(E_{T_i}) + l(e_i)$), so when $n_1$ is defined it must satisfy $n_1 \ge 2$.

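4.5.1 Evaluating Part 2 Bound

To evaluate the algorithm we distinguish several cases. Recall that every edge of $T$ lies in exactly one $E_{T_i}$ or is one of the $e_i$, so $\sum_{i=1}^{m} (l(E_{T_i}) + l(e_i)) = l(T)$.

1. $l(E_{T_{i_0}}) + l(e_{i_0}) \ge \frac{l(T)}{2}$. In this case $l(E_T \setminus (E_{T_{i_0}} \cup \{e_{i_0}\})) \le \frac{l(T)}{2}$. Hence the sum of the edges in the graph after creating the cycle is at most $\frac{3l(T)}{2}$. Spanning trees of $P$ and $Q$ can be obtained by deleting edges from this graph. Hence $apx \le \frac{3l(T)}{2}$.
2. $l(E_{T_{i_0}}) + l(e_{i_0}) < \frac{l(T)}{2}$.
   (a) $\sum_{i=1}^{n_1} |V_{T_i}| + 1 \le \frac{n}{2}$. By the definition of $n_1$, $\sum_{i=1}^{n_1} (l(E_{T_i}) + l(e_i)) \ge \frac{l(T)}{2}$, so the part of the graph that is doubled, $\bigcup_{i=n_1+1}^{m} (E_{T_i} \cup \{e_i\})$, has length at most $\frac{l(T)}{2}$, and the bound is achieved.
   (b) $\sum_{i=1}^{n_1} |V_{T_i}| + 1 > \frac{n}{2}$. By the definition of $n_1$,
   $$\sum_{i=1}^{n_1 - 1} \left(l(E_{T_i}) + l(e_i)\right) \le \frac{l(T)}{2}.$$
   This is the part of the graph that is doubled, so the length of the graph after the simple cycle is created is at most $\frac{3l(T)}{2}$.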

Part 2 Bound
input
1. A graph $G$.
returns
1. $\{P, Q\}$, where $P \cup Q = V$ and $|P| = |Q| = n/2$.
2. A value $apx$ satisfying $apx = l(MST(P)) + l(MST(Q))$.
begin
Step 1
    $T := MST(G)$.
    $c :=$ a centroid of $T$.
end Step 1
Step 2
    Remove $c$ and the edges incident with it from $T$. A set of connected components $\{T_1, T_2, \ldots, T_m\}$ is created. (Since $c$ is a centroid, $m \ge 2$ and $|V_{T_i}| \le n/2$ for all $i$.)
    $e_i :=$ the edge connecting $T_i$ to $c$ in $T$, $i = 1, \ldots, m$.
    Let $i_0$ be the index in $\{1, \ldots, m\}$ with the biggest $l(E_{T_{i_0}}) + l(e_{i_0})$.
    if ($l(E_{T_{i_0}}) + l(e_{i_0}) \ge l(T)/2$) then
        $G_2 :=$ Cre Cycle$(T, E_T \setminus (E_{T_{i_0}} \cup \{e_{i_0}\}))$.
        $P := V_{T_{i_0}} \cup \{c\} \cup \{$the adjacent $n/2 - |V_{T_{i_0}}| - 1$ nodes on the cycle$\}$.
    else
        $n_1 := \min\{ j \in \{1, \ldots, m\} \mid \sum_{i=1}^{j} (l(E_{T_i}) + l(e_i)) \ge l(T)/2 \}$.
        if ($\sum_{i=1}^{n_1} |V_{T_i}| + 1 \le n/2$) then
            $G_2 :=$ Cre Cycle$(T, \bigcup_{i=n_1+1}^{m} (E_{T_i} \cup \{e_i\}))$.
            $P := \bigcup_{i=1}^{n_1} V_{T_i} \cup \{c\} \cup \{$the adjacent $n/2 - \sum_{i=1}^{n_1} |V_{T_i}| - 1$ nodes on the cycle$\}$.
        else
            $G_2 :=$ Cre Cycle$(T, \bigcup_{i=1}^{n_1 - 1} (E_{T_i} \cup \{e_i\}))$.
            $P := V_{T_{n_1}} \cup \{c\} \cup \{$the adjacent $n/2 - |V_{T_{n_1}}| - 1$ nodes on the cycle$\}$.
        end if
    end if
    $Q := V \setminus P$.
    return ($\{P, Q\}$, $apx := l(MST(P)) + l(MST(Q))$).
end Step 2
end Part 2 Bound

Figure 12: Dividing the graph into 2 sets, an algorithm to bound opt
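The case analysis of Section 4.5.1 only needs to know which components Part 2 Bound doubles. Below is a minimal Python sketch of that bookkeeping, assuming the MST $T$ is already given as a weighted tree on nodes $0, \ldots, n-1$ and that $T_1, \ldots, T_m$ are taken in the iteration order of the centroid's neighbours (the paper fixes no order). Cre Cycle itself (Figure 7) and the final choice of $P$ are not reproduced, and all function names are hypothetical.

```python
def tree(edges):
    """Adjacency map node -> {neighbour: length} of a weighted tree."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def centroid(n, adj):
    """A node whose removal leaves components of at most n/2 nodes each."""
    parent, order = {0: None}, [0]
    for x in order:                        # BFS from node 0; `order` grows as we go
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                order.append(y)
    size = {x: 1 for x in order}           # subtree sizes, computed bottom-up
    for x in reversed(order):
        if parent[x] is not None:
            size[parent[x]] += size[x]
    for x in order:
        biggest = n - size[x]              # the component on the parent's side
        for y in adj[x]:
            if parent[y] == x:
                biggest = max(biggest, size[y])
        if 2 * biggest <= n:
            return x


def split_at(c, adj):
    """Components T_i of T - c, as (node set, l(E_Ti) + l(e_i)) pairs."""
    comps = []
    for y, e_len in adj[c].items():        # e_i = (c, y)
        nodes, stack, inner = {y}, [y], 0
        while stack:
            x = stack.pop()
            for z, w in adj[x].items():
                if z != c and z not in nodes:
                    nodes.add(z)
                    inner += w             # each tree edge of T_i is seen once
                    stack.append(z)
        comps.append((nodes, inner + e_len))
    return comps


def part2_bound_doubling(n, adj):
    """Return (c, doubled, length): the 0-based indices i whose E_Ti and e_i
    Part 2 Bound passes to Cre Cycle, and their total length."""
    l_T = sum(w for x in adj for w in adj[x].values()) / 2
    c = centroid(n, adj)
    comps = split_at(c, adj)
    weight = [w for _, w in comps]         # l(E_Ti) + l(e_i)
    m = len(comps)
    i0 = max(range(m), key=lambda i: weight[i])
    if 2 * weight[i0] >= l_T:              # case 1: T_i0 and e_i0 are not doubled
        doubled = [i for i in range(m) if i != i0]
    else:
        prefix, n1 = 0, 0                  # n1: shortest prefix reaching l(T)/2
        while 2 * prefix < l_T and n1 < m:
            prefix += weight[n1]
            n1 += 1
        if 2 * (sum(len(comps[i][0]) for i in range(n1)) + 1) <= n:
            doubled = list(range(n1, m))   # case 2(a): double T_{n1+1}, ..., T_m
        else:
            doubled = list(range(n1 - 1))  # case 2(b): double T_1, ..., T_{n1-1}
    return c, doubled, sum(weight[i] for i in doubled)


adj = tree([(0, 1, 3), (0, 2, 3), (0, 3, 1), (3, 4, 1), (4, 5, 1), (0, 6, 1), (6, 7, 1)])
print(part2_bound_doubling(8, adj))       # (0, [2, 3], 5)
```

Here $l(T) = 11$ and the doubled length is $5 \le 11/2$; in every branch the returned length is at most $l(T)/2$, which is the inequality the evaluation relies on.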

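Thus, in both cases $apx \le \frac{3l(T)}{2}$, giving $opt \le \frac{3l(T)}{2}$. This algorithm, however, does not improve the factor-2 bound achieved before, since $l(T)$ may be bigger than $opt$. To see that the bound $\frac{3l(T)}{2}$ can be (asymptotically) achieved, consider a graph with $n + 1$ nodes ($n$ odd): a node $u$ and the nodes $\{v_1, \ldots, v_n\}$. The distances between the nodes are $l(v_i, u) = 1$ for all $i \in \{1, \ldots, n\}$ and $l(v_i, v_j) = 2$ for all $i \ne j \in \{1, \ldots, n\}$. An MST $T$ has $l(T) = n$. In any partition into two halves, the half containing $u$ is spanned by a star of $\frac{n-1}{2}$ edges of length 1, and the other half consists of $\frac{n+1}{2}$ nodes $v_j$ joined by edges of length 2, so
$$opt = \frac{n-1}{2} + 2\left(\frac{n+1}{2} - 1\right) = \frac{3(n-1)}{2},$$
and $opt / l(T) \to 3/2$ as $n \to \infty$.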
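This family can be checked by brute force for small odd $n$. The Python sketch below assumes node 0 plays the role of $u$ and nodes $1, \ldots, n$ play $v_1, \ldots, v_n$; it computes $l(T)$ with Prim's algorithm and $opt$ by enumerating all balanced partitions. The names `tight_instance` and `opt_equal_halves` are hypothetical, and the enumeration is exponential, so it is only meant for tiny instances.

```python
from itertools import combinations


def mst_length(dist, nodes):
    """Prim's algorithm on the complete graph induced by `nodes`."""
    nodes = list(nodes)
    if len(nodes) <= 1:
        return 0
    best = {v: dist[nodes[0]][v] for v in nodes[1:]}
    total = 0
    while best:
        v = min(best, key=best.get)
        total += best.pop(v)
        for w in best:
            best[w] = min(best[w], dist[v][w])
    return total


def tight_instance(n):
    """Node 0 plays u; nodes 1..n play v_1..v_n (n odd)."""
    N = n + 1
    return [[0 if a == b else 1 if 0 in (a, b) else 2 for b in range(N)]
            for a in range(N)]


def opt_equal_halves(dist):
    """Brute-force opt over all partitions into two halves."""
    N = len(dist)
    V = set(range(N))
    return min(mst_length(dist, S) + mst_length(dist, V - set(S))
               for S in combinations(range(N), N // 2))


for n in (3, 5, 7):
    d = tight_instance(n)
    l_T, opt = mst_length(d, range(n + 1)), opt_equal_halves(d)
    print(n, l_T, opt, opt / l_T)
```

The printed ratio equals $3(n-1)/(2n)$, which approaches $3/2$ only as $n$ grows.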
