Improved approximation bounds for the minimum rainbow subgraph problem

Ján Katrenič^{a,1}, Ingo Schiermeyer^{b}

a Institute of Computer Science, P. J. Šafárik University in Košice, Jesenná 5, 04154 Košice, Slovakia
b Institut für Diskrete Mathematik und Algebra, Technische Universität Bergakademie Freiberg, 09596 Freiberg, Germany
Abstract

In this paper we consider the Minimum Rainbow Subgraph problem (MRS): given a graph $G$ with $n$ vertices whose edges are coloured with $p$ colours, find a subgraph $F \subseteq G$ of minimum order and with $p$ edges such that $F$ contains each colour exactly once. We present a polynomial time $(\frac{1}{2} + (\frac{1}{2} + \varepsilon)\Delta)$-approximation algorithm for the MRS problem for an arbitrarily small positive $\varepsilon$. This improves the previously best known approximation ratio of $\frac{5}{6}\Delta$. We also prove the MRS problem to be NP-hard and APX-hard for graphs with maximum degree 2. Finally, we present an algorithm that finds an optimal solution in running time $O(2^{p + 2p \log_2 \Delta}\, n^{O(1)})$.

1. Introduction and motivation

In this paper we consider only finite undirected graphs without loops or multiple edges. For a graph $G = (V, E)$, $V(G)$ and $E(G)$ are the sets of vertices and edges, respectively. The numbers of vertices and edges, denoted by $|G|$ and $\|G\|$, are the order and the size of $G$, respectively.

Definition 1 (rainbow subgraph). Let $G$ be a graph with an edge-colouring. A subgraph $X$ of $G$ is called a rainbow subgraph if $X$ does not contain two edges of the same colour.

Email addresses:
[email protected] (Ján Katrenič), [email protected] (Ingo Schiermeyer)
1 This work was partially supported by the Slovak VEGA grant 1/0035/09.
Preprint submitted to Elsevier
September 22, 2010
Problem 1 (Minimum Rainbow Subgraph problem (MRS)). Given a graph $G$ with $n$ vertices and an edge-colouring with $p$ colours, find a rainbow subgraph $F \subseteq G$ of minimum order and with $p$ edges.

Note that a solution for the MRS problem may consist of several rainbow components. Furthermore, the given edge-colouring need not be a proper edge-colouring.

Our research on the Minimum Rainbow Subgraph problem was motivated by an application from bioinformatics. The generation of genome populations in bioinformatics can be solved by computing minimum rainbow subgraphs (cf. [10] for a detailed description). This problem was also studied in [5] under the name minimum primer set, where it was proved inapproximable to within less than a $[1 - o(1)] \ln n - o(1)$ factor.

In [10] a polynomial time approximation algorithm is presented with an approximation ratio of $\frac{5}{6}\Delta$ for graphs with maximum degree $\Delta$. In this paper we improve the approximation ratio for polynomial time algorithms by proving that for each positive number $\varepsilon$, one can construct a polynomial time approximation algorithm with an approximation ratio of $0.5 + (0.5 + \varepsilon)\Delta$.

Matos Camacho et al. [10] proved the MRS problem to be NP-hard and APX-hard. In this paper we prove that this is true even for graphs with maximum degree 2.

We also discuss the complexity of finding an optimal solution to the MRS problem. We show that an optimal solution can be found in running time $O^*(2^p)$ for graphs with maximum degree $\Delta = 2$.² For the general case, we show that an optimal solution can be found in running time $O^*(2^p \Delta^{2p})$. This shows that the MRS problem is fixed-parameter tractable for the class of graphs of bounded maximum degree when the number of edge colours is used as the parameter.

2. Improved approximation for MRS

We first present an approximation algorithm for trees. Suppose the edges of $G$ are coloured with $p$ colours $1, 2, \dots, p$. Let $c$ be a colour function defined by $c(e) = i$ if the edge $e$ receives colour $i$.
Construct a graph $G'$ with $V(G') = \{v_1, v_2, \dots, v_p\}$ ($v_i$ corresponds to colour $i$) and $v_i v_j \in E(G')$ if there exist two adjacent edges $e, f \in E(G)$ with $c(e) = i$ and $c(f) = j$. Now compute a maximum matching $M$ of $G'$, of size $\beta(G')$. This can be done in running time $O(\sqrt{n}\,m)$ or $O(n^{2.376})$, respectively [11]. Next construct a graph $H$ with $V(H) \subseteq V(G)$ as follows: for each edge of $M$ choose two adjacent edges in $G$ with these two colours; for each vertex of $V(G')$ not covered by $M$ choose an edge in $G$ with this colour. In this way we obtain a graph $H$ with $p$ edges and

$$|H| \le 2p - \beta(G'). \qquad (1)$$

² The notation $O^*(f(n))$ suppresses factors that are polynomial in $n$.
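The construction behind bound (1) can be sketched in Python as follows. This is only an illustrative sketch: the function names (`colour_graph`, `maximum_matching`, `forest_approximation`) and the representation of $G$ as a dictionary from edges to colours are assumptions made here, and the matching step uses exhaustive branching rather than the polynomial algorithms of [11], so it is suitable for small instances only.

```python
from itertools import combinations


def colour_graph(edges):
    """Build G': colours i and j are adjacent if G has two adjacent edges coloured i and j.

    `edges` maps an edge (u, v) of G to its colour (hypothetical representation).
    Returns a dict mapping each pair frozenset({i, j}) to one witnessing pair of edges.
    """
    witness = {}
    for (e, ce), (f, cf) in combinations(edges.items(), 2):
        if ce != cf and set(e) & set(f):
            witness.setdefault(frozenset((ce, cf)), (e, f))
    return witness


def maximum_matching(pairs):
    """Exact maximum matching by exhaustive branching (exponential time)."""
    pairs = list(pairs)

    def best(idx, used):
        if idx == len(pairs):
            return []
        skip = best(idx + 1, used)
        if pairs[idx] & used:
            return skip
        take = [pairs[idx]] + best(idx + 1, used | pairs[idx])
        return take if len(take) > len(skip) else skip

    return best(0, frozenset())


def forest_approximation(edges):
    """Pick one edge per colour: two adjacent edges per matched colour pair,
    and an arbitrary edge for every unmatched colour."""
    witness = colour_graph(edges)
    matching = maximum_matching(witness.keys())
    chosen, covered = [], set()
    for pair in matching:
        chosen.extend(witness[pair])
        covered |= pair
    for e, c in edges.items():
        if c not in covered:
            chosen.append(e)
            covered.add(c)
    return chosen, len(matching)


# A rainbow path a-b-c-d with colours 1, 2, 3.
edges = {("a", "b"): 1, ("b", "c"): 2, ("c", "d"): 3}
chosen, beta = forest_approximation(edges)
order = len({v for e in chosen for v in e})
print(order, "<=", 2 * 3 - beta)
```

Each matched pair of colours contributes at most three vertices and each unmatched colour at most two, which is where the bound $2p - \beta(G')$ comes from.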
In [10] the following result was presented, which was first proved for connected graphs of even size $m$ by Kotzig [7].

Theorem 1. Let $G$ be a connected graph of size $m$. Then $E(G)$ contains $\lfloor m/2 \rfloor$ pairwise edge-disjoint paths of order three.

If $G$ contains a rainbow tree with $p$ edges, then the $\lfloor p/2 \rfloor$ edge-disjoint paths of order three given by Theorem 1 correspond to a matching of that size in $G'$, and with (1) we obtain

$$|H| \le 2p - \beta(G') \le 2p - \left\lfloor \frac{p}{2} \right\rfloor \le \frac{3}{2}p + \frac{1}{2}.$$

Similarly, if $G$ contains a rainbow forest with $p$ edges and $t$ components, this implies $|H| \le \frac{3}{2}p + \frac{1}{2}t$.

Lemma 1. If $G$ contains a rainbow forest $F$ with $p$ edges and $t$ components, then one can construct a solution $H$ for the MRS problem with $|H| \le 1.5p + 0.5t$ in polynomial time.

An approximation $H$ computed in this way will be called ForestApproximation($G$). We propose the following greedy algorithm for a fixed integer index $k$, $k \ge 3$.

Algorithm 1: Greedy MRS_k(G)
Input: A graph G on n vertices whose edges are coloured by p colours;
Output: A rainbow subgraph H on p edges;
   H := ∅;
0. while G contains some rainbow subgraph L on exactly k vertices and at least k − 1 edges do
      H := H ∪ L; G := G ∖ edges whose colours occur in L;
1. while G contains some rainbow cycle L do
      H := H ∪ L; G := G ∖ edges whose colours occur in L;
2. H := H ∪ ForestApproximation(G);
   return H;
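Step 0 of Algorithm 1 can be sketched as a brute-force search over all $k$-element vertex sets. The code below is a minimal sketch under assumptions made here: the graph is given as a dictionary `colour_of` from `frozenset({u, v})` to a colour, and the helper names `find_rainbow_k_subgraph` and `remove_colours` are hypothetical.

```python
from itertools import combinations


def find_rainbow_k_subgraph(vertices, colour_of, k):
    """Look for a rainbow subgraph on exactly k vertices with at least k - 1 edges.

    Returns (vertex_set, rainbow_edges) for the first k-subset whose induced
    edges carry at least k - 1 distinct colours, or None if there is none.
    """
    for subset in combinations(vertices, k):
        one_per_colour = {}
        # Inspect the O(k^2) vertex pairs of the subset, keeping one edge per colour.
        for u, v in combinations(subset, 2):
            colour = colour_of.get(frozenset((u, v)))
            if colour is not None:
                one_per_colour.setdefault(colour, (u, v))
        if len(one_per_colour) >= k - 1:
            return set(subset), list(one_per_colour.values())
    return None


def remove_colours(colour_of, used):
    """Delete from G every edge whose colour occurs in `used`."""
    return {e: c for e, c in colour_of.items() if c not in used}


# A rainbow triangle a, b, c plus a pendant edge c-d that repeats colour 1.
vertices = ["a", "b", "c", "d"]
colour_of = {
    frozenset(("a", "b")): 1,
    frozenset(("b", "c")): 2,
    frozenset(("a", "c")): 3,
    frozenset(("c", "d")): 1,
}
found = find_rainbow_k_subgraph(vertices, colour_of, 3)
print(found)
```

Repeating the search after `remove_colours` until it returns `None` gives the while loop of step 0; each successful round removes at least $k - 1$ colours.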
2.1. The complexity

Since in each iteration during step 0 and step 1 at least one colour is removed from $G$, the total number of iterations is at most $p$. We claim that each iteration can be carried out in time $O(k^2 n^k)$:

• To find a rainbow subgraph on exactly $k$ vertices and at least $k - 1$ edges, one can check all combinations of $k$ vertices, of which there are $\binom{n}{k}$. The number of colours in the graph induced by a given combination of $k$ vertices can be computed in running time $O(k^2)$. This gives a complexity upper bound of $O(\binom{n}{k} k^2)$, which is $O(k^2 n^k)$.

• To find a rainbow cycle, note that after the previous step $G$ does not contain a connected rainbow subgraph on $k$ vertices. Therefore, such a cycle has at most $k - 1$ vertices. A variation (ordered selection) of vertices uniquely identifies a cycle, therefore one can check all variations of at most $k - 1$ vertices, of which there are at most $O(\binom{n}{k-1}(k-1)!)$. For a given variation of at most $k - 1$ vertices of $G$ one can easily check whether it forms a rainbow cycle. This gives a complexity upper bound of $O(k^2 n^k)$.

Thus the Greedy MRS_k algorithm has time complexity $O(p k^2 n^k)$, which is polynomial in the size of the input.

2.2. The approximation ratio

We shall prove that the Greedy MRS_k($G$) algorithm has an approximation ratio of $0.5 + (0.5 + \frac{1}{k})\Delta$. Consider an optimal solution $F$ for $G$. Let $p_1$ denote the number of colours corresponding to edges that were added to $H$ during steps 0 and 1. Let $p_2 = p - p_1$. Let $t_2$ denote the number of components of $F \cap G$ during step 2. By construction, the number of vertices added to $H$ during steps 0 and 1 is at most $(1 + \frac{1}{k-1})p_1$, and the number of vertices added to $H$ during step 2 is at most $1.5p_2 + 0.5t_2$ by Lemma 1. Therefore

$$|H| \le \left(1 + \frac{1}{k-1}\right)p_1 + 1.5p_2 + 0.5t_2. \qquad (2)$$

Since $F \cap G$ is a forest at step 2, with $p_2$ edges and $t_2$ components on at most $|F|$ vertices, we have $p_2 \le |F| - t_2$. Since $(1 + \frac{1}{k-1}) \le 1.5$ for $k \ge 3$, the right side of (2) is largest for $p_2 = |F| - t_2$ and $p_1 = p - |F| + t_2$. We obtain
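$$\begin{aligned}
|H| &\le \frac{k}{k-1}\,(p - |F| + t_2) + 1.5\,(|F| - t_2) + 0.5\,t_2 \\
    &= \frac{k}{k-1}\,p - \frac{k}{k-1}\,|F| + \frac{k}{k-1}\,t_2 + 1.5\,|F| - t_2 \\
    &= |F| \left( \frac{k}{k-1} \cdot \frac{p}{|F|} - 1 - \frac{1}{k-1} + 1.5 \right) + \frac{1}{k-1}\,t_2 \\
    &\le |F| \left( \frac{k}{k-1} \cdot \frac{p}{|F|} + 0.5 \right).
\end{aligned}$$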
Note that in the last line we have used $t_2 \le |F|$.

Theorem 2. Let $G$ be an instance of the MRS problem having $n$ vertices and edges coloured by $p$ colours. Let $F$ be an optimal solution having $|F|$ vertices and $p$ edges. The algorithm Greedy MRS_k($G$) runs in time $O(p k^2 n^k)$ and returns a solution having at most $0.5|F| + (1 + \frac{1}{k-1})p$ vertices.

Corollary 1. For every $\varepsilon > 0$ there is a polynomial time approximation algorithm for MRS with an approximation ratio of $0.5 + (0.5 + \varepsilon)\Delta$ for graphs having maximum degree $\Delta$.

Proof. Let $F$ be an optimal solution for the MRS problem having $|F|$ vertices and $p$ edges. Clearly, the degree of each vertex in $F$ is at most $\Delta$. Therefore, $\frac{p}{|F|} \le \frac{\Delta}{2}$. Using Theorem 2 we get an approximation ratio of $0.5 + (1 + \frac{1}{k-1})\frac{\Delta}{2}$, which is at most $0.5 + (0.5 + \frac{1}{k})\Delta$. To get an approximation algorithm with a ratio of $0.5 + (0.5 + \varepsilon)\Delta$, it is enough to use the Greedy MRS_k algorithm for a sufficiently large integer $k$.
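The last step of the proof can be checked numerically with exact rational arithmetic. The following sketch uses hypothetical helper names (`ratio_from_theorem2`, `ratio_claimed`, `index_for_epsilon`); it compares the two ratios and picks an index $k \ge 3$ with $\frac{1}{k} \le \varepsilon$, which is one possible reading of "sufficiently large $k$".

```python
import math
from fractions import Fraction


def ratio_from_theorem2(delta, k):
    """Ratio 0.5 + (1 + 1/(k-1)) * delta/2 obtained from Theorem 2 with p/|F| <= delta/2."""
    return Fraction(1, 2) + (1 + Fraction(1, k - 1)) * Fraction(delta, 2)


def ratio_claimed(delta, k):
    """Ratio 0.5 + (0.5 + 1/k) * delta stated for Greedy MRS_k."""
    return Fraction(1, 2) + (Fraction(1, 2) + Fraction(1, k)) * delta


def index_for_epsilon(eps):
    """Smallest integer k >= 3 with 1/k <= eps (eps given as a Fraction)."""
    return max(3, math.ceil(1 / eps))


k = index_for_epsilon(Fraction(1, 4))
print(k, ratio_from_theorem2(3, k), ratio_claimed(3, k))
```

The inequality between the two ratios reduces to $\frac{1}{2(k-1)} \le \frac{1}{k}$, which holds for every $k \ge 2$.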
3. The MRS problem for ∆ = 2

The MRS problem is NP-hard and APX-hard [10], since it can be viewed as a generalization of the Pure Parsimony Haplotyping problem [8, 9]. We provide a proof that the MRS problem remains hard even for graphs with maximum degree 2. Our proofs rely on a result of Berman and Karpinski [2] on the 3-OCC-MAX 2SAT problem. This problem is the restriction of the maximum satisfiability problem to instances in which each clause contains at most two variables and each variable occurs in at most three clauses.

Theorem 3 (Berman and Karpinski [2]). For every $\varepsilon > 0$, it is NP-hard to approximate 3-OCC-MAX 2SAT within a factor of $\frac{2012}{2011} - \varepsilon$.
In the following we shall assume that no clause contains a variable more than once, and that for each variable $x$ the input formula contains at least one clause with literal $x$ and at least one clause with literal $\bar{x}$. Such a reduced problem is NP-hard to approximate within a factor of $\frac{2012}{2011} - \varepsilon$, see e.g. [12].

Theorem 4. The Minimum Rainbow Subgraph problem is NP-hard for graphs with maximum degree $\Delta = 2$.

Proof. We reduce 3-OCC-MAX 2SAT to MRS with $\Delta = 2$. Let $F$ be an instance of the 3-OCC-MAX 2SAT problem having $n$ variables and $c$ clauses $C_1, \dots, C_c$. Let $occ(l, j)$ denote the total number of occurrences of literal $l$ in clauses $C_1, C_2, \dots, C_j$. Clearly, $occ(l, j) \in \{0, 1, 2\}$ for each $l$ and $j$. From a given formula $F$ we create a graph $G$ having $4n + c$ vertices

$$a_1, \dots, a_c,\; x_1^1, \dots, x_n^1,\; x_1^2, \dots, x_n^2,\; \bar{x}_1^1, \dots, \bar{x}_n^1,\; \bar{x}_1^2, \dots, \bar{x}_n^2$$

and $n + c$ colours $A_1, \dots, A_c, B_1, \dots, B_n$. For $i = 1, \dots, n$ we put into $G$ the edges $(x_i^1, x_i^2)$ and $(\bar{x}_i^1, \bar{x}_i^2)$ coloured by $B_i$. For each literal $l$ and $j = 1, \dots, c$, if the $j$-th clause contains literal $l$ then we put into $G$ an edge $(l^{occ(l,j)}, a_j)$ coloured by colour $A_j$. Clearly, $G$ has maximum degree at most 2. To finish the proof we prove the following:

Lemma 2. One can satisfy at least $s$ clauses in $F$ if and only if an optimal solution of the MRS problem for $G$ has at most $2n + 2c - s$ vertices.

We will say that a subgraph $H$ covers a colour if at least one edge of $H$ has this colour.

⇒ Let $R$ be a solution for $F$ satisfying $s$ clauses. We construct a solution $H$ for the graph $G$ having $2n + 2c - s$ vertices in the following way. $H$ contains all vertices $a_1, \dots, a_c$, and for each literal $l \in R$ we put into $H$ the vertices $\{l^1, l^2\}$. Now, at most $c - s$ colours among $A_1, \dots, A_c$ are not covered by $H$. Therefore, for each clause $C_j$ that is unsatisfied by $R$, we add into $H$ one vertex $l^{occ(l,j)}$ with $l \in C_j$, that is, the literal endpoint of an edge coloured $A_j$. Now $|H| = c + 2n + c - s$ and $H$ covers all the colours.

⇐ Let $H \subseteq G$ be an optimal solution of the MRS problem for $G$ with $|H| = 2n + 2c - s$. To cover all colours $A_1, \dots, A_c$, $H$ must contain all the vertices $a_1, \dots, a_c$. To cover colour $B_i$, $H$ must satisfy $\{x_i^1, x_i^2\} \subseteq V(H)$ or $\{\bar{x}_i^1, \bar{x}_i^2\} \subseteq V(H)$, for each $i = 1, \dots, n$. We construct a solution $R$ for the formula $F$ satisfying $s$ clauses. For each variable $x$, if $\{x^1, x^2\} \subseteq V(H)$, then we put $x$ into $R$; otherwise $\{\bar{x}^1, \bar{x}^2\} \subseteq V(H)$ and we put $\bar{x}$ into $R$. Now let $H' \subseteq H$ with $V(H') = \{a_1, \dots, a_c\} \cup \{l^q : l \in R, q = 1, 2\}$. Since $V(H') \subseteq V(H)$ and $|H'| = c + 2n$, $H'$ leaves at most $c - s$ colours among $A_1, \dots, A_c$ uncovered.
Since colour $A_i$ corresponds to satisfying clause $C_i$, $R$ satisfies at least $s$ clauses.

Theorem 5. The Minimum Rainbow Subgraph problem is APX-hard for graphs with maximum degree $\Delta = 2$.

Proof. We use the same reduction as in the previous proof. Let $s$ denote the maximum number of clauses that can be satisfied. First, note that $c \le 3s$, because one can always satisfy at least $\frac{1}{3}$ of the clauses of any instance of 3-OCC-MAX 2SAT. This also implies $n \le 3s$, because there are at most 2 literals in each clause and each variable occurs at least twice. Let $H$ be an optimal solution of the MRS problem for $G$. According to Lemma 2, $|H| = 2n + 2c - s$. Now consider a polynomial time approximation algorithm with an approximation ratio of $1 + a$ for the MRS problem with $\Delta = 2$. Such an algorithm returns a solution $H''$ with

$$|H''| \le (1 + a)(2n + 2c - s) = (2n + 2c - s) + 2an + 2ac - as.$$

Using $c \le 3s$ and $n \le 3s$ we get $|H''| \le (2n + 2c - s) + 11as$.

Now we create a solution $R''$ for SAT. Since $H''$ covers all colours $B_1, \dots, B_n$, for each variable $x$ it holds that $\{x^1, x^2\} \subseteq V(H'')$ or $\{\bar{x}^1, \bar{x}^2\} \subseteq V(H'')$. If $\{x^1, x^2\} \subseteq V(H'')$ then we put literal $x$ into $R''$; otherwise $\{\bar{x}^1, \bar{x}^2\} \subseteq V(H'')$ and we put $\bar{x}$ into $R''$. Now let $H' \subseteq H''$ with $V(H') = \{a_1, \dots, a_c\} \cup \{l^q : l \in R'', q = 1, 2\}$. Since $H' \subseteq H''$ and $|H'| = c + 2n$, $H'$ leaves at most $c - s + 11as$ colours among $A_1, \dots, A_c$ uncovered. Since colour $A_i$ corresponds to satisfying clause $C_i$, $R''$ satisfies at least $s - 11as$ clauses. Therefore, $R''$ has an approximation ratio of at most $\frac{1}{1 - 11a}$. Finally, Theorem 3 states that 3-OCC-MAX 2SAT is NP-hard to approximate within ratio $\frac{2012}{2011} - \varepsilon$, which implies that $a$ cannot be smaller than $\frac{1}{11 \cdot 2012}$, unless P = NP.
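The reduction used in Theorems 4 and 5 can be sketched in Python as follows. The encoding is an assumption made for this illustration: literals are non-zero integers ($+i$ for $x_i$, $-i$ for $\bar{x}_i$), vertices are tagged tuples, and the function names `build_mrs_instance` and `max_degree` are hypothetical.

```python
def build_mrs_instance(num_vars, clauses):
    """Build the edge-coloured graph G of the reduction from 3-OCC-MAX 2SAT.

    Returns a dict mapping each edge (u, v) to its colour.
    Vertex ("lit", l, q) stands for l^q and ("clause", j) stands for a_j.
    """
    edges = {}
    # Edges (x_i^1, x_i^2) and (not-x_i^1, not-x_i^2), both coloured B_i.
    for i in range(1, num_vars + 1):
        for lit in (i, -i):
            edges[(("lit", lit, 1), ("lit", lit, 2))] = ("B", i)
    # Edge (l^{occ(l, j)}, a_j) coloured A_j for every literal l of clause C_j.
    occ = {}
    for j, clause in enumerate(clauses, start=1):
        for lit in clause:
            occ[lit] = occ.get(lit, 0) + 1
            edges[(("lit", lit, occ[lit]), ("clause", j))] = ("A", j)
    return edges


def max_degree(edges):
    """Maximum vertex degree of the graph given by its edge dictionary."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return max(degree.values())


# (x1 or x2), (not x1 or x2), (x1 or not x2): n = 2 variables, c = 3 clauses.
clauses = [(1, 2), (-1, 2), (1, -2)]
graph = build_mrs_instance(2, clauses)
vertices = {v for e in graph for v in e}
print(len(vertices), len(set(graph.values())), max_degree(graph))
```

For this formula the graph has $4n + c = 11$ vertices and $n + c = 5$ colours. The degree bound relies on the assumptions stated before Theorem 4, under which every literal occurs in at most two clauses.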
4. Finding an optimal solution

Let $G$ be a graph having $n$ vertices, $m$ edges, maximum degree $\Delta$ and edges coloured by $p$ colours. A straightforward algorithm to find an optimal solution for the MRS problem can check all possible combinations of edges. Such an approach leads to the complexity upper bound $O(m^p)$, since it chooses $p$ edges from the set of $m$ edges.

Let $\mathcal{C}$ denote the set of all rainbow connected subgraphs of $G$. The MRS problem can be transformed to a weighted set covering problem [6], where one needs to cover the set of all colours by elements of $\mathcal{C}$, and where the weight of a subgraph $C \in \mathcal{C}$ is $|C|$. Therefore, if $|\mathcal{C}|$ is polynomial in $n$ (e.g. if $\Delta \le 2$), a simple dynamic programming approach can be used to get an exact algorithm with $O^*(2^p)$ time and $O^*(2^p)$ space [4]. Moreover, for any $r \ge 1$ one can construct a $(1 + \ln r)$-approximation algorithm which runs in $O^*(2^{p/r})$ time [4].

In the following, we provide an exact algorithm for MRS running in $O^*(2^p \Delta^{2p})$ time and $O^*(2^p)$ space. Our technique is motivated by [1], where a combination of random ordering and dynamic programming was used for finding simple paths in graphs.

First consider the case that the colouring in the input is a proper edge colouring and an optimal solution $F$ is a connected subgraph of $G$. We denote by Walk$(v, c_1, c_2, \dots, c_t)$ the set of all vertices of $G$ contained in a walk starting in a vertex $v \in V(G)$ and travelling along edges of colours $c_1, c_2, \dots, c_t$. We say that a sequence of colours $c_1, c_2, \dots, c_t$ reveals the subgraph $F$ if $V(F) =$ Walk$(v, c_1, c_2, \dots, c_t)$ holds for some vertex $v$.

A sequence of colours revealing $F$ is enough to construct a solution of size $|F|$ in polynomial time. The only question is how to get such a sequence. Clearly, $F$ can be traversed in at most $2(|F| - 1)$ steps, since one can construct such a walk from a spanning tree of $F$, visiting each edge at most twice. Generating all possible sequences of colours of length at most $2p - 1$, we get an algorithm which runs in time $O^*(p^{2p})$.

Now consider the case that the optimal solution $F$ is not connected, but we have one sequence of colours $c_1, c_2, \dots, c_t$ which is a concatenation of several sequences, each revealing one component of $F$. We resolve this sequence using dynamic programming, i.e. for each subset of colours $S$ and for each pair of integers $i, j$ with $i < j$ we solve a subproblem $(S, i, j)$ indicating the minimum order of a subgraph of $G$ having reveal sequence $c_i, \dots, c_j$ and containing all colours of $S$.
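The subproblem $(S, i, j)$ for one fixed colour sequence can be sketched in Python with memoization. Several details are assumptions of this sketch rather than statements of the text above: indices are 0-based, a walk that meets a vertex with no edge of the required colour is discarded, the colours credited to a walk are those of the subgraph induced by its vertex set, and the names `make_solver`, `walk` and `colour_set` are hypothetical.

```python
import math
from functools import lru_cache


def make_solver(vertices, colour_of, sequence):
    """Return solve(S, i, j) for a properly edge-coloured graph.

    `colour_of` maps frozenset({u, v}) to the colour of that edge.
    """
    # In a proper colouring each vertex has at most one incident edge per colour.
    step = {}
    for edge, colour in colour_of.items():
        u, v = tuple(edge)
        step[(u, colour)] = v
        step[(v, colour)] = u

    def walk(v, i, k):
        """Vertices visited from v along colours sequence[i..k], or None if stuck."""
        seen = {v}
        for colour in sequence[i:k + 1]:
            v = step.get((v, colour))
            if v is None:
                return None
            seen.add(v)
        return frozenset(seen)

    def colour_set(vertex_set):
        """Colours on edges with both ends inside vertex_set."""
        return frozenset(c for e, c in colour_of.items() if e <= vertex_set)

    @lru_cache(maxsize=None)
    def solve(S, i, j):
        if not S:
            return 0
        if i > j:
            return math.inf
        best = math.inf
        for k in range(i, j + 1):
            for v in vertices:
                C = walk(v, i, k)
                if C is not None:
                    best = min(best, len(C) + solve(S - colour_set(C), k + 1, j))
        return best

    return solve


# Path a-b-c coloured 1, 2 and a separate edge d-e coloured 3.
colour_of = {
    frozenset(("a", "b")): 1,
    frozenset(("b", "c")): 2,
    frozenset(("d", "e")): 3,
}
solve = make_solver(("a", "b", "c", "d", "e"), colour_of, [1, 2, 3])
print(solve(frozenset({1, 2, 3}), 0, 2))
```

The sequence `[1, 2, 3]` is the concatenation of `[1, 2]`, which reveals the path, and `[3]`, which reveals the separate edge; the solver finds the split between them by trying every position `k`.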
For a subgraph $L \subseteq G$, let ColourSet($L$) denote the set of all colours occurring on edges of $L$. The following function is an example implementation which computes the minimum order of a solution using memoization [3], a technique that avoids repeating the calculation of results for previously processed inputs. The algorithm starts by setting $sol[S, i, j] = {?}$ for every $S \in 2^{[p]}$ and $i, j \in \{1, \dots, t\}$, and then calls solve($\{1, 2, \dots, p\}, 1, t$), where $2^{[p]}$ denotes the power set of $\{1, 2, \dots, p\}$.
Function solve(S, i, j)
Input: set of colours S, integers i, j ≤ t;
Output: the minimum order of a subgraph of G covering S and having reveal sequence c_i, ..., c_j;
   if S = ∅ then return 0;
   if i > j then return +∞;
   if sol[S, i, j] ≠ ? then return sol[S, i, j];
   sol[S, i, j] := +∞;
   for k := i to j do
      foreach v ∈ V do
         C := Walk(v, c_i, ..., c_k);
         sol[S, i, j] := min(sol[S, i, j], |C| + solve(S ∖ ColourSet(C), k + 1, j));
   return sol[S, i, j];

Since the total number of subproblems is $O(2^p t^2)$ and each subproblem is computed in $O(n^{O(1)})$ time (using results of shorter subproblems), one sequence of colours can be resolved in time and space $O^*(2^p t^2)$. Finally, considering all sequences of colours of length at most $2p$, we get an algorithm which runs in time $O^*(2^{p + 2p \log_2 p})$ if the input colouring is a proper colouring.

To resolve the case when the input colouring is not a proper edge colouring, one can generate a sequence of positive integers not greater than $\Delta$ instead of a sequence of colours. The total number of such sequences is at most $O(2^{2p \log_2 \Delta})$.

Theorem 6. There is a deterministic algorithm to find an optimal solution of the MRS problem which runs in $O(2^{p + 2p \log_2 \Delta}\, n^{O(1)})$ time and $O(2^p n^{O(1)})$ space.

Using the $O^*$ notation, this algorithm runs in $O^*(2^p \Delta^{2p})$. This shows that the MRS problem is fixed-parameter tractable for the class of graphs of bounded maximum degree when the number of edge colours is used as the parameter.

Acknowledgement

We would like to thank the three referees for their valuable comments and suggestions.
[1] N. Alon, R. Yuster, and U. Zwick. Color-coding. J. ACM, 42(4):844–856, 1995.
[2] P. Berman and M. Karpinski. On some tighter inapproximability results. In Proc. of the 26th Int. Colloquium on Automata, Languages and Programming, volume 1644 of Lecture Notes in Computer Science, pages 200–209. Springer, Berlin, 1999.
[3] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms, pages 347–349. MIT Press, 2007.
[4] M. Cygan, L. Kowalik, and M. Wykurz. Exponential-time approximation of weighted set cover. Inf. Process. Lett., 109(16):957–961, 2009.
[5] R. J. Fernandes and S. S. Skiena. Microarray synthesis through multiple-use PCR primer design. Bioinformatics, 18(1):128–135, 2002.
[6] R. Karp. Reducibility among combinatorial problems. In R. Miller and J. Thatcher, editors, Complexity of Computer Computations, pages 85–103. Plenum Press, 1972.
[7] A. Kotzig. From the theory of finite regular graphs of degree three and four. Časopis Pěstov. Mat., 82:76–92, 1957.
[8] G. Lancia, M. C. Pinotti, and R. Rizzi. Haplotyping populations by pure parsimony: Complexity of exact and approximation algorithms. INFORMS Journal on Computing, 16(4):348–359, 2004.
[9] G. Lancia and R. Rizzi. A polynomial case of the parsimony haplotyping problem. Oper. Res. Lett., 34(3):289–295, 2006.
[10] S. Matos Camacho, I. Schiermeyer, and Z. Tuza. Approximation algorithms for the minimum rainbow subgraph problem. Discrete Mathematics, 310(20):2666–2670, 2010.
[11] M. Mucha and P. Sankowski. Maximum matchings via Gaussian elimination. In Proc. of the 45th Annual IEEE Symposium on Foundations of Computer Science, pages 248–255. IEEE Computer Society, Washington, DC, 2004.
[12] D. Rautenbach and F. Regen. On packing shortest cycles in graphs. Inf. Process. Lett., 109(14):816–821, 2009.