Phylogenetic k-Root and Steiner k-Root
Guo-Hui Lin 1,2,*, Paul E. Kearney 1,**, and Tao Jiang 2,3,***

1 Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada.
2 Department of Computing and Software, McMaster University, Hamilton, Ontario L8S 4L7, Canada.
3 Department of Computer Science, University of California, Riverside, CA 92521.
Abstract. Given a graph $G = (V, E)$ and a positive integer $k$, the Phylogenetic k-Root Problem asks for an (unrooted) tree $T$ without degree-2 nodes such that its leaves are labeled by $V$ and $(u, v) \in E$ if and only if $d_T(u, v) \le k$. If the vertices in $V$ are also allowed to be internal nodes in $T$, then we have the Steiner k-Root Problem. Moreover, if the vertices of a particular subset $S$ of $V$ are required to be internal nodes in $T$, then we have the Restricted Steiner k-Root Problem. Phylogenetic k-roots and Steiner k-roots extend the standard notion of graph roots and are motivated by applications in computational biology. In this paper, we first present $O(n + e)$-time algorithms to determine if a (not necessarily connected) graph $G = (V, E)$ has an $S$-restricted 1-root Steiner tree for a given subset $S \subseteq V$, and to determine if a connected graph $G = (V, E)$ has an $S$-restricted 2-root Steiner tree for a given subset $S \subseteq V$, where $n = |V|$ and $e = |E|$. We then use these two algorithms as subroutines to design $O(n + e)$-time algorithms to determine if a given (not necessarily connected) graph $G = (V, E)$ has a 3-root phylogeny and to determine if a given connected graph $G = (V, E)$ has a 4-root phylogeny.

Keywords: Graph power, graph root, tree power, tree root, phylogeny, computational biology, maximal clique, critical clique, efficient algorithm
* Supported in part by NSERC Research Grant OGP0046613 and a CITO grant. Email: [email protected].
** Supported in part by NSERC Research Grant 160321 and a CITO grant. Email: [email protected].
*** Supported in part by NSERC Research Grant OGP0046613 and a UCR startup grant. Email: [email protected].

1 Introduction

A fundamental problem in computational biology is the reconstruction of the evolutionary history of a set of species from biological data. This evolutionary history is typically modeled by an evolutionary tree or phylogeny. A phylogeny is a tree where the leaves are labeled by species and each internal node represents a special event whereby an ancestral species gives rise to two or more child species. Both rooted and unrooted trees have been used to describe phylogenies in the literature, but here we will only consider unrooted trees. The internal nodes of a phylogeny usually have degrees (in the sense of unrooted trees) at least three. Proximity within the phylogeny corresponds to evolutionary similarity.

In this paper we investigate the computational feasibility of constructing phylogenies from species similarity data. Specifically, interspecies similarity is represented by a graph where the vertices are the species and adjacencies represent evidence of evolutionary similarity. A phylogeny is then constructed from the graph such that the leaves of the phylogeny are labeled by vertices of the graph and two vertices are adjacent in the graph if and only if the corresponding leaves in the tree are connected by a path of length at most $k$, where $k$ is a chosen proximity threshold. Recall that the length of the path connecting two nodes $u$ and $v$ in tree $T$ is the distance between $u$ and $v$, and is denoted by $d_T(u, v)$. This gives rise to the following algorithmic problem:
Phylogenetic k-Root Problem (PkRP): Given a graph $G = (V, E)$, find a phylogenetic tree $T$ such that its leaves are labeled by $V$ and for each pair of vertices $u, v \in V$, $(u, v) \in E$ if and only if $d_T(u, v) \le k$.
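The definition above can be checked mechanically for a candidate tree. The following is a minimal sketch, not part of the original algorithms: the function name `is_k_root_phylogeny` and the adjacency-dictionary representation are assumptions made for illustration, and the input `tree` is assumed to be a connected, acyclic graph.

```python
from collections import deque
from itertools import combinations

def is_k_root_phylogeny(tree, vertices, edges, k):
    """Check the PkRP conditions for a candidate tree.

    tree: dict node -> set of neighbours (assumed to be a connected tree).
    vertices: the vertex set V of G; edges: iterable of pairs over V.
    """
    leaves = {v for v, nbrs in tree.items() if len(nbrs) == 1}
    if leaves != set(vertices):
        return False                      # leaves must be labeled exactly by V
    if any(len(nbrs) == 2 for nbrs in tree.values()):
        return False                      # degree-2 nodes are not allowed
    edge_set = {frozenset(e) for e in edges}

    def distances_from(src):
        # Plain breadth-first search over the tree.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            a = queue.popleft()
            for b in tree[a]:
                if b not in dist:
                    dist[b] = dist[a] + 1
                    queue.append(b)
        return dist

    dist = {v: distances_from(v) for v in vertices}
    # (u, v) is an edge of G exactly when the leaves are within distance k.
    return all((frozenset((u, v)) in edge_set) == (dist[u][v] <= k)
               for u, v in combinations(vertices, 2))
```

For example, a tree with two adjacent Steiner points, each carrying two leaves, is a 2-root phylogeny of the graph consisting of two disjoint edges, but not a 3-root phylogeny of it.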
Such a tree $T$ (if it exists) is called a $k$-root phylogeny of $G$, and $G$ is called the $k$th phylogenetic power of $T$.

It is useful to extend PkRP to allow the vertices in $V$ to appear as internal nodes in the output tree $T$. Let us call such a tree a Steiner tree of $V$. The nodes of $T$ that are not vertices in $V$ are termed Steiner points. Observe that a phylogenetic tree of $V$ is a Steiner tree of $V$ where all internal nodes are Steiner points and have degree at least 3. We will be interested in the following extension of PkRP.

Steiner k-Root Problem (SkRP): Given a graph $G = (V, E)$, find a Steiner tree $T$ of $V$ such that all Steiner points have degree at least 3 and for each pair of vertices $u, v \in V$, $(u, v) \in E$ if and only if $d_T(u, v) \le k$.

If $T$ exists then it is called a $k$-root Steiner tree of $G$, and $G$ the $k$th Steiner power of $T$.

Phylogenetic power and Steiner power might be thought of as Steiner extensions of the standard notion of graph power. A graph $G$ is the $k$th power of a graph $H$ (or equivalently, $H$ is a $k$-root of $G$) if vertices $u$ and $v$ are adjacent in $G$ if and only if the length of the shortest path from $u$ to $v$ in $H$ is at most $k$. An important special case of graph power/root problems is the following:
Tree k-Root Problem (TkRP): Given a graph $G = (V, E)$, find a tree $T$ on $V$ such that $(u, v) \in E$ if and only if $d_T(u, v) \le k$.

If $T$ exists then it is called a $k$-root tree of $G$.
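To make the notion of a tree power concrete, the sketch below computes the edge set of the $k$th power of a tree by a depth-limited breadth-first search from every node. The name `tree_power` and the dictionary representation are illustrative assumptions; a tree $T$ on $V$ is a $k$-root tree of $G$ exactly when the returned set equals $E$.

```python
from collections import deque

def tree_power(tree, k):
    """Return the edge set of the k-th power of `tree`.

    tree: dict node -> set of neighbours.
    The result is a set of frozenset({u, v}) pairs with d_T(u, v) <= k.
    """
    power = set()
    for src in tree:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            a = queue.popleft()
            if dist[a] == k:
                continue                  # do not explore beyond distance k
            for b in tree[a]:
                if b not in dist:
                    dist[b] = dist[a] + 1
                    queue.append(b)
        power.update(frozenset((src, v)) for v in dist if v != src)
    return power
```

This brute-force computation takes time quadratic in the number of nodes in the worst case and is meant only to illustrate the definition, not the recognition algorithms cited in the text.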
It is NP-complete to recognize a graph power [5], but it is possible to determine if a graph has a $k$-root tree, for any fixed $k$, in $O(n^3)$ time, where $n$ is the number of vertices in the input graph [3]. For the special case $k = 2$, determining if a graph has a 2-root tree takes $O(n + e)$ time [4], where $e$ is the number of edges in the input graph. There is rich literature on graph roots and powers (see [1] for an overview) but few results on phylogenetic roots/powers and Steiner roots/powers. Recently, Nishimura, Ragde, and Thilikos [6] presented an $O(n^3)$-time algorithm for a variant of PkRP, for $k = 3$ and $k = 4$, where internal nodes (Steiner points) are allowed to have degree 2. This algorithm does not work directly for PkRP, as we will see later that eliminating degree-2 Steiner points is not an easy task. Moreover, it is assumed in [6] that each connected component of the input graph can be dealt with separately and the resulting trees joined together via paths of length $k$, on which all Steiner points have degree 2. We see that this is not true when degree-2 Steiner points are not allowed.

In this paper, we investigate PkRP and a restricted version of SkRP that has applications in the solution of PkRP. In this restricted version, we are given an input graph $G = (V, E)$ and a subset $S \subseteq V$, and we need to find a $k$-root Steiner tree $T$ for $G$ with the further constraint that every vertex in $S$ must have degree at least 2 (i.e., they must be internal nodes in $T$). If such a Steiner tree $T$ exists, it is called an $S$-restricted $k$-root Steiner tree. This restricted version of SkRP will be referred to as the Restricted Steiner k-Root Problem (RSkRP). As in the practice of computational biology, where the set of species under consideration are related, we are interested in connected graphs. Nonetheless, whenever possible, we will extend our results to disconnected graphs.

Notice that problem P1RP is not interesting and problem P2RP can be trivially answered by checking whether or not the input graph $G$ is a complete graph (when $G$ is disconnected, by checking whether or not every connected component of $G$ is complete and whether there are at least two non-singleton connected components). Therefore, we will assume that $k \ge 3$ in PkRP. The main contributions in this paper are
1. an $O(n + e)$-time algorithm to determine if a (not necessarily connected) graph $G = (V, E)$ has a 3-root phylogeny, and if so, demonstrate one such phylogeny;
2. an $O(n + e)$-time algorithm to determine if a connected graph $G = (V, E)$ has a 4-root phylogeny, and if so, demonstrate one such phylogeny;
3. an $O(n)$-time algorithm to determine if a (not necessarily connected) graph $G = (V, E)$ has an $S$-restricted 1-root Steiner tree for a given subset $S \subseteq V$, and if so, demonstrate one such Steiner tree; and
4. an $O(n + e)$-time algorithm to determine if a connected graph $G = (V, E)$ has an $S$-restricted 2-root Steiner tree for a given subset $S \subseteq V$, and if so, demonstrate one such Steiner tree.
The results (3) and (4) are essential to the construction of our algorithms for results (1) and (2). We also characterize some properties for graphs that have a $k$-root phylogeny for general $k \ge 3$ and some structural properties for $k$-root phylogenies. These characterizations may be useful in discovering algorithms for general $k$. We assume throughout the paper that in PkRP, $G$ is not a complete graph, since otherwise it always has a $k$-root phylogeny.

We introduce some notations and definitions, as well as some existing related results, in Section 2. Section 3 is devoted to SkRP and RSkRP, where we give the two $O(n + e)$-time algorithms. PkRP is studied in Section 4. We first present some properties of graphs having a $k$-root phylogeny for general $k$ and some structural properties of $k$-root phylogenies. We then reduce P3RP and P4RP to RS1RP and RS2RP, respectively, in $O(n + e)$ time. We conclude our paper in Section 5 with several possible future research topics.
2 Definitions

For an induced subgraph $S$ of $G$, we abuse $S$ to denote its vertex set. Fixing a subset $U$ of $V$, let $N_U(v)$ (where $v \in U$) denote the set of neighbors of $v$ in $V \setminus U$. If all the $N_U(v)$ are identical, we call it the set of neighbors of $U$ (in $G$) and denote it by $N(U)$. Note that by definition, $U \cap N(U) = \emptyset$.

Suppose $G$ has a $k$-root phylogeny $T$. Then for a Steiner point $s$ in $T$, $N(s)$ denotes the set of vertices in $V$ that are adjacent to $s$ in $T$. $|N(s)|$ is the V-degree of $s$, denoted by V-deg($s$); the number of Steiner points adjacent to $s$ is the S-degree of $s$, denoted by S-deg($s$). The degree of $s$ in $T$, denoted by deg($s$), is the sum of V-deg($s$) and S-deg($s$), which should be greater than or equal to 3. A Steiner point $s$ with S-deg($s$) = 1 is called a leaf Steiner point, otherwise an internal Steiner point.

A graph is chordal if it contains no induced subgraph which is a cycle of size greater than 3.
Lemma 1. If $G$ has a $k$-root phylogeny, then $G$ is chordal.

Lemma 2. If $G$ has a $k$-root Steiner tree, then $G$ is chordal.

Corollary 1. If $G$ has an $S$-restricted $k$-root Steiner tree for a subset $S$, then $G$ is chordal.

Lemma 3. [7,8] Checking if a graph $G$ is chordal can be carried out in $O(n + e)$ time.
Lemma 4. [2,7] Computing all maximal cliques in a chordal graph can be carried out in O(n + e) time.
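For readers who want to experiment with Lemmas 3 and 4, the following is a simple sketch based on maximum cardinality search. It is an illustration under stated assumptions rather than the linear-time procedures of [2,7,8]: the function names are hypothetical, the vertex selection and the final maximality filter are quadratic, and the graph is given as a dictionary mapping each vertex to its neighbor set.

```python
def mcs_order(adj):
    """Maximum cardinality search; returns the vertices in visit order."""
    weight = {v: 0 for v in adj}
    order = []
    while weight:
        v = max(weight, key=weight.get)   # unvisited vertex with most visited neighbours
        order.append(v)
        del weight[v]
        for u in adj[v]:
            if u in weight:
                weight[u] += 1
    return order

def chordal_maximal_cliques(adj):
    """Return the maximal cliques of G if G is chordal, otherwise None."""
    order = mcs_order(adj)
    pos = {v: i for i, v in enumerate(order)}
    candidates = []
    for v in order:
        earlier = {u for u in adj[v] if pos[u] < pos[v]}
        if earlier:
            # In a chordal graph the earlier-visited neighbours of v form a
            # clique; it suffices to test them against the latest one, p.
            p = max(earlier, key=pos.get)
            if not (earlier - {p}) <= adj[p]:
                return None
        candidates.append(frozenset(earlier | {v}))
    # Every maximal clique is among the candidates; drop the non-maximal ones.
    return [c for c in candidates if not any(c < d for d in candidates)]
```

On two triangles sharing an edge the function returns the two triangles, while on a chordless 4-cycle it returns `None`.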
3 Restricted Steiner k-Root Problem

In this section, we present efficient algorithms for RS1RP and RS2RP. These results have applications in the solutions of P3RP and P4RP discussed in the next section. We first note that RSkRP is actually more general than SkRP since the given subset $S$ of vertices can be empty.

3.1 k = 1
Problem S1RP, for a graph $G = (V, E)$, simply asks if $G$ is a forest containing one tree or more than two trees. That is, if $G$ is a tree, then it is a 1-root Steiner tree for itself, and if $G$ is a forest containing more than 2 trees, then connecting one vertex from each tree to a common Steiner point forms a 1-root Steiner tree. In the following, we concentrate on RS1RP where the given subset $S$ is nonempty.

An easy observation is that an $S$-restricted 1-root Steiner tree for $G$ must include $G$ as an induced subgraph. It follows that $G$ is a forest. If $G$ is connected, thus a tree, and every vertex in $S$ has degree at least 2 in $G$, then $G$ is an $S$-restricted 1-root Steiner tree for itself. Otherwise, we conclude that $G$ has no $S$-restricted 1-root Steiner tree. Thus, we assume in the following that $G$ is disconnected.

Let us consider how to optimally connect the components of $G$ into a tree via Steiner points so that every vertex in $S$ has degree at least 2. We may simplify set $S$ by excluding vertices with degree at least 2 in $G$, and assume that the vertices in $S$ are either leaves or isolated vertices in $G$. Call a (component) tree in $G$ containing at least two vertices of $S$ a Z-tree. A vertex in $S$ appearing in some Z-tree is called a Y-vertex. Suppose that there are $z$ Z-trees and $y$ Y-vertices. Assume that $S$ contains $w$ isolated vertices in $G$, each corresponding to a singleton component tree of $G$. Call these vertices W-vertices. For each of the remaining component trees of $G$, which either has more than one vertex and contains at most one vertex of $S$ or consists of a single isolated vertex outside $S$, we choose a vertex to be connected to a Steiner point, as follows: if it contains a vertex of $S$, we choose this vertex; otherwise we choose an arbitrary vertex. These chosen vertices are called X-vertices. Suppose that we have $x$ X-vertices. We want to interconnect the Y- and X-vertices via Steiner points into a tree with the maximum number of internal edges (so that the W-vertices can be inserted). Therefore, every Steiner point has degree exactly 3.

Three Y-vertices from different Z-trees can be interconnected into a bigger tree via a Steiner point. This new tree has at least 3 Y-vertices that do not yet satisfy the degree requirement, and can thus be thought of as a Z-tree. It follows that by introducing $(z - 1)/2$ Steiner points we can interconnect all the Z-trees into a single Z-tree, when $z$ is odd. This will make $3(z - 1)/2$ Y-vertices satisfy the degree requirement, that is, have degree 2. Each of the other $y - 3(z - 1)/2$ Y-vertices needs 2 X-vertices to satisfy its degree requirement (if an X-vertex belonging to $S$ is used here, the connection gives it degree 2 as well). Therefore, if $x < 2y - 3(z - 1)$ then we can conclude that $G$ has no $S$-restricted 1-root Steiner tree. The same conclusion can be made when $z$ is even.

If $x \ge 2y - 3(z - 1)$, we can use $2y - 3(z - 1) - 2$ X-vertices to make $y - 3(z - 1)/2 - 1$ Y-vertices have degree 2. This leaves us with $x - 2y + 3(z - 1) + 3 = x - 2y + 3z$ vertices to be interconnected into a tree, among which one is a Y-vertex and all the others are X-vertices. A full Steiner tree interconnecting $x - 2y + 3z$ vertices can contain up to $x - 2y + 3z - 3$ internal edges, on which the W-vertices may be inserted (to satisfy their degree requirements). If $w > x - 2y + 3z - 3$ then it is impossible to insert these W-vertices to produce an $S$-restricted 1-root Steiner tree of $G$. Therefore if this is the case, $G$ has no $S$-restricted 1-root Steiner tree. If $w \le x - 2y + 3z - 3$, then putting each W-vertex on its own internal edge results in an $S$-restricted 1-root Steiner tree of $G$.

Consider the final case that $z = 0$, which implies that $y = 0$. If $x = 2$, then $G$ has no $S$-restricted 1-root Steiner tree. If $x \ge 3$, we interconnect the $x$ X-vertices into a full Steiner tree with $x - 3$ internal edges. If $w \le x - 3$, we can construct an $S$-restricted 1-root Steiner tree by inserting each W-vertex on its own internal edge of the full Steiner tree; otherwise $G$ has no $S$-restricted 1-root Steiner tree.

A high-level description of the overall algorithm, Algorithm Restricted 1-Root Steiner Tree, is depicted in Figure 1. Note that whenever No is returned, we may stop and conclude that $G$ has no $S$-restricted 1-root Steiner tree. The time complexity is linear in $n$ since Step 1 guarantees that $e < n$.
Theorem 1. Given a graph $G = (V, E)$ and a subset $S$ of vertices, there is an $O(n)$-time algorithm that determines if $G$ has an $S$-restricted 1-root Steiner tree, and if so, demonstrates one such Steiner tree.
1. If $G$ contains a cycle, return No.
2. Modify set $S$;
3. Determine the $z$ Z-trees, $y$ Y-vertices, $x$ X-vertices, and $w$ W-vertices.
4. If $z = 0$:
   4.1 If $w = 0$:
       4.1.1 If $x = 1$ ($G$ is connected):
             4.1.1.1 If the X-vertex is in $S$, return No;
             4.1.1.2 Otherwise the tree is a solution.
       4.1.2 If $x = 2$, return No.
       4.1.3 If $x \ge 3$, interconnect the X-vertices into a full Steiner tree.
   4.2 If $w > 0$:
       4.2.1 If $w > x - 3$, return No;
       4.2.2 Otherwise, interconnect the X-vertices into a full Steiner tree with $x - 3$ internal edges, and insert one W-vertex on an internal edge.
5. If $z > 0$:
   5.1 If $x < 2y - 3z + 3$, return No.
   5.2 Else if $w > x - 2y + 3z - 3$, return No.
   5.3 Else, output the following tree:
       5.3.1 Use a Steiner point to interconnect 3 Y-vertices, one from a Z-tree; if necessary, use a Steiner point to interconnect 2 Y-vertices, from the last two Z-trees and one X-vertex.
       5.3.2 Use a Steiner point to interconnect one Y-vertex and two X-vertices, if there is more than one Y-vertex;
       5.3.3 Interconnect the last Y-vertex and the remaining X-vertices into a full Steiner tree with $x - 2y + 3z - 3$ internal edges;
       5.3.4 Insert one W-vertex on an internal edge.
Fig. 1. Algorithm Restricted 1-Root Steiner Tree.
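The decision part of Figure 1 (Steps 1 to 5.2, without building the tree) can be sketched as follows. This is only an illustrative reading of the counting tests; the function names `classify_components`, `decide` and `has_restricted_1_root`, and the use of a union-find structure for cycle detection, are assumptions of this sketch and not taken from the original.

```python
from collections import defaultdict

def classify_components(vertices, edges, S):
    """Return (z, y, x, w, lone_x_in_S), or None if G contains a cycle."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    degree = defaultdict(int)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return None                     # Step 1: G is not a forest
        parent[ru] = rv
        degree[u] += 1
        degree[v] += 1

    # Step 2: vertices of degree >= 2 already satisfy the requirement.
    S = {v for v in S if degree[v] < 2}

    size, s_count = defaultdict(int), defaultdict(int)
    for v in vertices:
        r = find(v)
        size[r] += 1
        if v in S:
            s_count[r] += 1

    # Step 3: classify each component tree.
    z = y = x = w = 0
    lone_x_in_S = False
    for r in size:
        if s_count[r] >= 2:                 # Z-tree with its Y-vertices
            z += 1
            y += s_count[r]
        elif size[r] == 1 and s_count[r] == 1:
            w += 1                          # isolated vertex of S: W-vertex
        else:
            x += 1                          # one X-vertex per remaining tree
            lone_x_in_S = s_count[r] == 1   # only consulted when x == 1
    return z, y, x, w, lone_x_in_S

def decide(z, y, x, w, lone_x_in_S=False):
    """Apply the counting tests of Steps 4 and 5 of Figure 1."""
    if z == 0:
        if w == 0:
            if x == 1:
                return not lone_x_in_S
            return x >= 3
        return w <= x - 3
    if x < 2 * y - 3 * z + 3:
        return False
    return w <= x - 2 * y + 3 * z - 3

def has_restricted_1_root(vertices, edges, S):
    counts = classify_components(vertices, edges, S)
    return counts is not None and decide(*counts)
```

For instance, five isolated vertices with exactly one of them in $S$ give $(z, y, x, w) = (0, 0, 4, 1)$, and the test $w \le x - 3$ succeeds, whereas three isolated vertices with one in $S$ fail it.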
In the instance illustrated in Figure 2, nodes in circle are vertices and those shaded are in subset $S$, and nodes in square are Steiner points added for the sake of interconnection. For this instance, $(z, y, x, w) = (2, 5, 9, 2)$. Therefore $G$ has an $S$-restricted 1-root Steiner tree, as shown in Figure 2(c). The tree shown in Figure 2(b) is the full Steiner tree obtained before Step 5.3.4, which has 2 internal edges on which the 2 W-vertices are going to be inserted respectively.
Fig. 2. An instance of RS1RP: (a) G and subset S; (b) full Steiner tree; (c) final Steiner tree.
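As a quick check of the instance of Figure 2, the two tests of Step 5 in Figure 1 can be evaluated by hand for $(z, y, x, w) = (2, 5, 9, 2)$:

```latex
\begin{align*}
2y - 3z + 3 &= 2\cdot 5 - 3\cdot 2 + 3 = 7 \le 9 = x
  && \text{(Step 5.1 does not return No)},\\
x - 2y + 3z - 3 &= 9 - 10 + 6 - 3 = 2 \ge 2 = w
  && \text{(Step 5.2 does not return No)}.
\end{align*}
```

The second value, 2, is also the number of internal edges of the full Steiner tree built in Step 5.3.3, which matches the 2 internal edges mentioned for Figure 2(b).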
3.2 k = 2

We assume in this subsection that $G$ is connected. We prove some structural properties of 2-root Steiner trees first, and then design the $O(n + e)$-time algorithm for RS2RP. Due to the space constraint, proofs of lemmas are not provided. The interested reader may refer to the full paper available at URL http://www.math.uwaterloo.ca/~ghlin.
Lemma 5. Suppose that $G$ has a 2-root Steiner tree $T$. The vertices in a maximal clique $K$ of $G$ either are adjacent to a common Steiner point, implying that $|K| \ge 3$, or form a claw rooted at some vertex in $K$, in tree $T$.

Proof. Since $k = 2$, two vertices in $K$ are either adjacent or separated by a node. If they are adjacent, then all the other vertices in $K$ are adjacent to one of them, that is, vertices in $K$ form a claw in $T$. If they are separated by a node and this separating node is a vertex, then it is a vertex in $K$ and all the other vertices in $K$ must be adjacent to it, that is, vertices in $K$ form a claw in $T$. If this separating node is a Steiner point, then all vertices in $K$ are adjacent to this Steiner point. Furthermore, we conclude that no vertex outside $K$ could be adjacent to the separating node, whatever it is, since otherwise $K$ wouldn't be maximal. It follows that if the node is a Steiner point, then $|K| \ge 3$. □
We also say in the above situation that maximal clique $K$ is adjacent to a Steiner point, or maximal clique $K$ forms a claw, in $T$.

Corollary 2. Suppose that $G$ has a 2-root Steiner tree $T$.
1. Two maximal cliques in $G$ cannot form claws rooted at the same vertex.
2. If there is a maximal clique $K$ of size 2, then the two vertices in $K$ are adjacent in $T$. We say that $K$ forms a claw, rooted at either vertex, in $T$.
3. If vertex $v$ belongs to $c$ distinct maximal cliques in $G$, then $v$ has degree at least $c$ in $T$.
4. If a vertex $v$ belongs to a unique maximal clique $K$ of $G$ and the degree of $v$ is greater than 1 in $T$, then $K$ forms a claw in $T$ rooted at $v$ and $|K| \ge 3$.
5. If vertices $v_1$ and $v_2$ both belong to a unique maximal clique $K$ in $G$, then one of them must be a leaf in $T$.
Lemma 6. If $G$ has a 2-root Steiner tree $T$, then two maximal cliques $K_1$ and $K_2$ in $G$ can overlap by at most two vertices; and if they overlap by two vertices, say $v_1$ and $v_2$, then $K_1$ and $K_2$ form claws in $T$ rooted at $v_1$ and $v_2$ (or, $v_2$ and $v_1$), respectively.

Proof. If $K_1$ and $K_2$ overlap by more than 2 vertices, then $T$ must contain a cycle, contradicting the fact that $T$ is a tree. Suppose that they overlap by two vertices $v_1$ and $v_2$. Then neither of them could be adjacent to a common Steiner point, nor could they form claws rooted at vertices other than $v_1$ and $v_2$. The lemma then follows from Item 1 of Corollary 2. □

Corollary 3. Suppose that $G$ has a 2-root Steiner tree $T$.
1. No three distinct maximal cliques can overlap by two vertices.
2. For every maximal clique $K$ of $G$, there is at most one vertex $v$ in $K$ such that if $|K \cap K_1| = 2$ and $|K \cap K_2| = 2$, then $\{v\} = K \cap K_1 \cap K_2$. If such a vertex exists in $K$, then $K$ forms a claw in $T$ rooted at this vertex.
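The overlap conditions of Lemma 6 and Item 1 of Corollary 3 give necessary conditions that are easy to test once the maximal cliques are known. The sketch below is a quadratic-time illustration only; the function name `violates_overlap_conditions` is hypothetical, and the linear-time bookkeeping described later in the proof of Theorem 2 is not reproduced here.

```python
from itertools import combinations

def violates_overlap_conditions(cliques):
    """cliques: list of sets, the maximal cliques of G.

    Returns True if two cliques share more than two vertices (Lemma 6) or
    three distinct cliques share the same two vertices (Corollary 3, Item 1),
    in which case G has no 2-root Steiner tree.
    """
    owners_of_pair = {}
    for a, b in combinations(range(len(cliques)), 2):
        common = cliques[a] & cliques[b]
        if len(common) > 2:
            return True
        if len(common) == 2:
            owners = owners_of_pair.setdefault(frozenset(common), set())
            owners.update((a, b))
            if len(owners) >= 3:      # a third clique contains the same pair
                return True
    return False
```

Passing this test does not by itself guarantee that a 2-root Steiner tree exists; it only rules out the configurations excluded above.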
Theorem 2. Given a connected graph $G = (V, E)$ and a subset $S$ of vertices, there is an $O(n + e)$-time algorithm that determines if $G$ has an $S$-restricted 2-root Steiner tree, and if so, demonstrates one such Steiner tree.

Proof. We spend $O(n + e)$ time to check if $G$ is chordal, and if so, to compute all the maximal cliques in $G$. For every vertex $v$ in $S$, if we find that it belongs to more than one maximal clique, we simply ignore it since by Corollary 2 it has degree more than 1 in any 2-root Steiner tree of $G$; otherwise, we restrict the maximal clique $K$ containing $v$ to form a claw rooted at $v$. Whenever a conflict occurs, we may stop and conclude that $G$ has no $S$-restricted 2-root Steiner tree. After this, we move on to determine for the relevant maximal cliques how they form claws, according to Corollaries 2 and 3, and Lemma 6. Finally, for every remaining maximal clique, we proceed to connect all its vertices to a common Steiner point. Note that distinct maximal cliques need distinct Steiner points. We will get in this way a connected graph $T$ spanning a superset of $V$ in which every vertex in $S$ has degree at least 2. Moreover, since $G$ is chordal, $T$ contains no cycle and thus is a tree. By Lemma 5 and Corollary 2, nonadjacent vertices in $G$ have distance at least 3 in $T$. Therefore, $T$ is an $S$-restricted 2-root Steiner tree for $G$.

A high-level description of the overall algorithm, Algorithm Restricted 2-Root Steiner Tree, is depicted in Figure 3. Note that whenever No is returned, we may stop and conclude that $G$ has no $S$-restricted 2-root Steiner tree. Notice that handling those maximal cliques takes time in $O(e)$. In fact, we may record for each edge in $E$ which maximal clique contains it, and use this information to determine which maximal clique should form a claw in the Steiner tree. The overall algorithm thus runs in $O(n + e)$ time. □

1. Determine if $G$ is chordal; if not, return No.
2. Compute all maximal cliques of $G$;
3. For each vertex $v \in S$:
   3.1 If $v$ belongs to more than one maximal clique, ignore it;
   3.2 Else if the maximal clique containing it has size 2, return No;
   3.3 Else the maximal clique forms a claw rooted at $v$;
   3.4 If a conflict occurs, return No.
4. For relevant maximal cliques:
   4.1 If $|K| = 2$, connect its two vertices by an edge.
   4.2 If $|K_1 \cap K_2| > 2$, return No.
   4.3 If $|K_1 \cap K_2 \cap K_3| = 2$, return No.
   4.4 If $K_1 \cap K_2 = \{v_1, v_2\}$ and $K_1 \cap K_3 = \{v_3, v_4\}$ with $v_1, v_2, v_3, v_4$ being four distinct vertices, return No.
   4.5 If $K_1 \cap K_2 = \{v_1, v_2\}$ and $K_1 \cap K_3 = \{v_1, v_3\}$, $K_1$ forms a claw rooted at $v_1$, and $K_2$ and $K_3$ form a claw rooted at $v_2$ and $v_3$, respectively.
   4.6 If $K_1 \cap K_2 = \{v_1, v_2\}$, $K_1$ and $K_2$ form a claw rooted at $v_1$ and $v_2$, respectively.
   4.7 If a conflict occurs, return No.
5. For each remaining maximal clique $K$:
   5.1 Connect the vertices to a common Steiner point.
6. Output the obtained graph as the $S$-restricted 2-root Steiner tree.
Fig. 3. Algorithm Restricted 2-Root Steiner Tree.
4 Phylogenetic k-Root Problem
4.1 Preliminaries
In this subsection, we present some preliminary results for problem PkRP as well as some structural properties of the $k$-root phylogenies of a graph $G$. Graph $G$ could be disconnected.

Lemma 7. If $G$ has a $k$-root phylogeny $T$, then for any Steiner point $s$, the induced subgraph on $N(s)$ is a clique in $G$ and vertices in $N(s)$ have the same set of neighbors in $V \setminus N(s)$.
Given a graph $G = (V, E)$, let $K$ denote a clique in $G$ such that the vertices in $K$ have the same set of neighbors in $V \setminus K$, which is denoted as $N(K)$. The number of vertices in $K$ is $|K|$ and we let $n(K)$ denote the size of $N(K)$. We say that $K$ is maximal if either $n(K) = 0$, meaning that $K$ is a complete connected component of $G$, or $n(K) \ge 1$ such that in $N(K)$ there is no vertex which is adjacent to all other vertices in $N(K)$ but no other vertex outside $K \cup N(K)$, meaning that $K$ is not expandable. This kind of maximal cliques are called critical cliques of $G$. Note that a critical clique is not necessarily a maximal clique, but it is always completely contained in some maximal clique(s).
Corollary 4. If $G$ has a $k$-root phylogeny $T$, then for any Steiner point $s$, $N(s) \subseteq K$ for some critical clique $K$.

Lemma 8. If $G$ has a $k$-root phylogeny $T$ in which vertices in a critical clique $K$ are adjacent to Steiner points $s_1, s_2, \ldots, s_l$, then $d_T(s_i, s_j) \le k - 2$ for any pair $i$ and $j$, and no vertices in other critical cliques can be adjacent to Steiner points in the minimal induced subtree in $T$ containing $s_1, s_2, \ldots, s_l$.

We also say in this case that $K$ is adjacent to Steiner points $s_1, s_2, \ldots, s_l$. Obviously, if $l \ge 2$ and S-deg($s_1$) $\ge 3$, then we may move the vertices adjacent to $s_1$ to be adjacent to $s_2$. The resultant tree is still a $k$-root phylogeny, and thus we may assume that when $l \ge 2$, S-deg($s_i$) $\le 2$ for all $i$, $1 \le i \le l$.

Lemma 9. For two critical cliques $K_1$ and $K_2$ in $G$, if $K_1 \cap N(K_2) \ne \emptyset$ then $K_1 \subseteq N(K_2)$.

Proof. From $K_1 \cap N(K_2) \ne \emptyset$, we derive that every vertex of $K_1$ is adjacent to all vertices in $K_2$, that is, $K_1 \subseteq N(K_2)$. □

Notice that for two critical cliques $K_1$ and $K_2$ in $G$, $K_1 \subseteq N(K_2)$ if and only if $K_2 \subseteq N(K_1)$ (but it is impossible that $K_1 = N(K_2)$ and $K_2 = N(K_1)$ hold simultaneously). And if $K_1 \subseteq N(K_2)$, then we say that $K_1$ and $K_2$ are adjacent.

Lemma 10. Every vertex of $G$ belongs to exactly one critical clique.

Proof. It is clear that every vertex $v$ belongs to some critical clique. Suppose $v \in K_1 \cap K_2$, and there is a vertex $u \in K_1 \setminus K_2$. That means $u \in N(K_2)$ and thus $K_1 \subseteq N(K_2)$ by Lemma 9, which implies that $K_1 \cap K_2 = \emptyset$, a contradiction. □
From the above two lemmas we see that graph $G$ induces a critical clique graph, denoted as $CC(G)$, in which a node is a critical clique in $G$ and two nodes are adjacent whenever the two corresponding critical cliques are adjacent. Let $\mathcal{K}$ denote the node set in $CC(G)$, and $\mathcal{E}$ the edge set. If $G$ is chordal and we have all the maximal cliques at hand, we may spend $O(e)$ time to partition (if necessary) every maximal clique into critical cliques. In fact, vertices belonging to the same set of maximal cliques form a critical clique. Therefore, we have

Theorem 3. There is an $O(n + e)$-time algorithm to check if a graph $G$ is chordal and, if so, to construct the critical clique graph $CC(G) = (\mathcal{K}, \mathcal{E})$.
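The critical clique graph can be illustrated with a short sketch. Instead of partitioning maximal cliques as described above, the code below groups vertices by closed neighborhood, which yields the same partition because two vertices lie in the same set of maximal cliques exactly when their closed neighborhoods coincide. The function name and the dictionary-based representation are assumptions of this sketch, and no attempt is made to match the $O(n + e)$ bound of Theorem 3.

```python
from collections import defaultdict

def critical_clique_graph(adj):
    """adj: dict vertex -> set of neighbours.

    Returns (cliques, cc_edges): a list of critical cliques (frozensets) and
    the edges of CC(G) as frozensets of indices into that list.
    """
    groups = defaultdict(set)
    for v, nbrs in adj.items():
        groups[frozenset(nbrs | {v})].add(v)   # key: closed neighbourhood of v
    cliques = [frozenset(g) for g in groups.values()]
    owner = {v: i for i, c in enumerate(cliques) for v in c}
    # Two critical cliques are adjacent when some (hence every) pair of their
    # vertices is adjacent in G.
    cc_edges = {frozenset((owner[u], owner[v]))
                for u in adj for v in adj[u] if owner[u] != owner[v]}
    return cliques, cc_edges
```

For a triangle $\{a, b, c\}$ with a pendant vertex $d$ attached to $c$, the critical cliques are $\{a, b\}$, $\{c\}$ and $\{d\}$, and $CC(G)$ is a path on these three nodes.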
4.2 k = 3

Lemma 11. If $G$ has a 3-root phylogeny, then it has a 3-root phylogeny in which every critical clique of $G$ is adjacent to exactly one Steiner point.

Proof. From Lemma 8 we know that a critical clique can be adjacent to one Steiner point or to two adjacent Steiner points. Let $K$ be a critical clique which is adjacent to two adjacent Steiner points, say $s_1$ and $s_2$, in a 3-root phylogeny. It is easy to check that $K$ is a complete connected component in $G$. Therefore, we may collapse $s_1$ and $s_2$ into one Steiner point. Repeat this "collapsing" process until we arrive at a 3-root phylogeny in which all critical cliques are adjacent to exactly one Steiner point. □

In the following we will only consider 3-root phylogenies $T$ where every critical clique $K$ of $G$ is adjacent to exactly one Steiner point in $T$. Such a Steiner point is called the representing Steiner point of $K$ and is denoted by $s(K)$.

Lemma 12. If $G$ has a 3-root phylogeny $T$ and two critical cliques $K_1$ and $K_2$ are adjacent, then $s(K_1)$ and $s(K_2)$ are adjacent in $T$.

Theorem 4. Problem P3RP is solvable in $O(n + e)$ time.

Proof. By Theorem 3, given a graph $G$, we can spend $O(n + e)$ time to determine if $G$ is chordal, and if so, to compute the graph $CC(G) = (\mathcal{K}, \mathcal{E})$. For a node $K \in \mathcal{K}$, if $|K| = 1$ and it is a leaf or an isolated node in $CC(G)$, it is included into subset $S$. We then apply the Algorithm Restricted 1-Root Steiner Tree to compute an $S$-restricted 1-root Steiner tree for $CC(G)$. If the Algorithm returns No, then we conclude that $G$ has no 3-root phylogeny. Otherwise, we substitute node $K$ by the representing Steiner point $s(K)$ for $K$ in the $S$-restricted 1-root Steiner tree, and then attach the vertices in $K$ to $s(K)$. The resultant tree is a 3-root phylogeny for $G$. Notice that $|\mathcal{K}| \le n$ and $|\mathcal{E}| \le e$. Therefore, we can determine if $G$ has a 3-root phylogeny in $O(n + e)$ time, and if so, demonstrate one such phylogeny. □

Graph $G$ illustrated in Figure 4(a) has 13 connected components. It is easy to check that those shaded vertices each forms a critical clique having degree 0 or 1 in $CC(G)$, which is isomorphic to the graph shown in Figure 2(a). It follows that we have a 3-root phylogeny for $G$, as shown in Figure 4(b).
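The first step of the reduction in the proof of Theorem 4, namely the choice of the subset $S$ of nodes of $CC(G)$, can be sketched as follows. The function name and the index-based encoding of $CC(G)$ (a list of critical cliques and a set of index pairs) are assumptions made for illustration.

```python
def restricted_set_for_p3rp(cliques, cc_edges):
    """Select the nodes of CC(G) that must be internal in the 1-root Steiner tree.

    cliques: list of vertex sets (the critical cliques of G).
    cc_edges: set of frozenset({i, j}) over indices into `cliques`.
    A singleton critical clique that is a leaf or isolated in CC(G) is put
    into S, so that its representing Steiner point ends up with degree >= 3.
    """
    degree = [0] * len(cliques)
    for e in cc_edges:
        for i in e:
            degree[i] += 1
    return {i for i, c in enumerate(cliques)
            if len(c) == 1 and degree[i] <= 1}
```

The returned index set would then be passed, together with $CC(G)$, to a procedure for RS1RP such as the one of Figure 1.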
4.3 k = 4

Unlike the fact that if $CC(G)$ has a cycle then $G$ has no 3-root phylogeny, $CC(G)$ may contain cycles for graph $G$ that has a 4-root phylogeny, but such cycles are limited to have size 3, as we shall see below.

Fig. 4. An instance of P3RP: (a) Graph G; (b) 3-root phylogeny.
Lemma 13. If
G has a 4-root phylogeny, then there is a 4-root phylogeny in which every critical clique is { either adjacent to one Steiner point, { or adjacent to two non-adjacent leaf Steiner points, { or adjacent to two adjacent internal Steiner points. Proof. Suppose that in the phylogeny there are two adjacent Steiner points s1
and 2 adjacent to vertices in some critical clique, say , and 1 is a leaf Steiner point, then clearly collapsing 1 to 2 would form a new 4-root phylogeny. On the other hand, if there are two non-adjacent Steiner points 1 and 2 adjacent to vertices in , and 1 is an internal Steiner point, then we may move the vertices adjacent to 1 to be adjacent to 2 and if degree of 1 becomes 2 we delete it from the tree. Repeat the above two processes we will arrive at a 4-root phylogeny in which if a critical clique is adjacent to more than two Steiner points, then these Steiner points are all leaf Steiner points adjacent to a same internal Steiner point. In this case, we may keep only two of them while move the vertices adjacent to the others to be adjacent to these two. Note that a critical clique adjacent to two non-adjacent leaf Steiner points must have size at least 4. 2 Corollary 5. There is a 4-root phylogeny in which, s
K
s
s
s
s
K
s
s
s
s
s
1. Suppose that critical clique $K$ is adjacent to two non-adjacent leaf Steiner points $s_1$ and $s_2$, and they are adjacent to $s_3$; then $\text{S-deg}(s_3) = 3$ and $\text{V-deg}(s_3) = 0$. We call $s_3$ the representing Steiner point of $K$.
2. Suppose that critical clique $K$ is adjacent to two adjacent internal Steiner points $s_1$ and $s_2$; then $\text{S-deg}(s_1) = \text{S-deg}(s_2) = 2$. We call $s_1$ and $s_2$ the representing Steiner points of $K$.

Proof. We may start with a phylogeny as in Lemma 13. Therefore, $\text{V-deg}(s_3) = 0$. If to the contrary $\text{S-deg}(s_3) > 3$, then the vertices adjacent to $s_1$ can be moved to be adjacent to $s_2$ and $s_1$ can be deleted. This forms a new 4-root phylogeny and does not change the adjacency status of other critical cliques. For the second part, if to the contrary, say, $\text{S-deg}(s_1) > 2$, then the vertices adjacent to $s_1$ can be moved to be adjacent to $s_2$. This forms a new 4-root phylogeny and again does not change the adjacency status of other critical cliques. The same argument applies to $s_2$ if $\text{S-deg}(s_1) = 2$ and $\text{S-deg}(s_2) > 2$. □

From Lemma 13 and Corollary 5, we arrive at a situation similar to the case $k = 3$. However, rather than having one representing structure for a critical clique $K$, this time we have three R(epresenting-)S(tructure)s, namely:
(RS1) all vertices are adjacent to a Steiner point;
(RS2) vertices are (arbitrarily) partitioned into two (non-empty) parts each adjacent to a Steiner point, and the two Steiner points are adjacent;
(RS3) vertices are (arbitrarily) partitioned into two (non-empty, non-singular) parts each adjacent to a Steiner point, and the two Steiner points are both adjacent to a third Steiner point.
We say that the critical clique is in RS1, RS2, and RS3, respectively. Note that in RS2 (RS3), the two representing Steiner points have S-degree 2 (the representing Steiner point has S-degree 3, respectively), thus it can represent critical cliques of degree up to two (one, respectively) in a 4-root phylogeny. We only consider 4-root phylogenies of $G$ in which every critical clique is in one of RS1, RS2, and RS3.

Lemma 14. If two critical cliques are adjacent then in any 4-root phylogeny
one of them must be in RS1.

Lemma 15. Suppose $G$ is connected and it has a 4-root phylogeny, and a degree-1 critical clique $K_1$ ($|K_1| \ge 4$) is adjacent to critical clique $K_2$; then there is a 4-root phylogeny in which $K_1$ is in RS3 and its representing Steiner point is adjacent to the representing Steiner point of $K_2$, which is in RS1.

Proof. Notice first that $K_1$ cannot be in RS2, and if it is already in RS3 we are done. Therefore, we assume that $K_1$ is in RS1. Let $s_1$ denote its representing Steiner point, and $s_2$ denote the representing Steiner point of $K_2$ (the one closer to $s_1$ if there are two). Our first claim is that $s_1$ and $s_2$ are adjacent, since otherwise either $K_1$ has degree greater than one or $G$ is disconnected. Note also that $\deg(K_2) \ge 2$, and thus $K_2$ cannot be in RS3. If it is in RS2, then $\deg(K_2) = 2$ and we may let $K_1$ be in RS3 while collapsing the two representing Steiner points of $K_2$ into one Steiner point, that is, letting $K_2$ be in RS1. Note that this does not change the adjacency status of critical cliques other than $K_1$ and $K_2$. If $\deg(K_2) > 2$, then $K_2$ must be in RS1 and we may let $K_1$ be in RS3 without changing the adjacency status of critical cliques other than $K_1$. □

Lemma 16. Suppose $G$ is connected and it has a 4-root phylogeny; then there is a 4-root phylogeny in which a critical clique is in RS2 if and only if its degree is 2, its two neighboring critical cliques are not adjacent, and none of them is in RS3.
Proof. Notice first that if a critical clique is in RS2, then its degree is at most 2, and if its degree is two then the two neighboring critical cliques are not adjacent and none of them is in RS3. If $\deg(K) = 1$, then we may collapse $s_1$ and $s_2$ into one Steiner point. This way, we finally arrive at a 4-root phylogeny in which every critical clique in RS2 has degree 2.

On the other hand, suppose that there is a degree-2 critical clique $K$ which is in RS1 in the phylogeny (note that it cannot be in RS3), and let $s_1$ denote its representing Steiner point. Since the two neighboring critical cliques of $K$ are not adjacent and none of them is in RS3, we conclude that their representing Steiner points (if one of them has two, choose the one closer to $s_1$), say $s_2$ and $s_3$, must be at least 3 distant apart. Suppose without loss of generality that $s_2$ and $s_1$ are (exactly) 2 distant apart; then the critical clique, say $K_2$, adjacent to $s_2$ should be in RS1. Let $s_4$ denote the Steiner point on the path connecting $s_1$ and $s_2$. It follows that the critical cliques adjacent to Steiner points in the other branches attaching at $s_4$ are not adjacent to any critical cliques adjacent to Steiner points in the branches attaching at $s_4$ containing $s_1$ and $s_2$. In other words, $G$ is disconnected. This contradiction indicates that $K$ must be in RS2. □
Theorem 5. Suppose $G$ is connected; then there is an $O(n + e)$ algorithm which determines if $G$ has a 4-root phylogeny, and if so, demonstrates one such phylogeny.

Proof. By Theorem 3 we spend $O(n + e)$ time to determine if $G$ is chordal, and if so, to construct the graph $CC(G) = (\mathcal{K}, \mathcal{E})$. By Lemmas 15 and 16, we have completely determined for every critical clique which type of RS it is in. That is, for each degree-1 critical clique whose size is greater than or equal to 4, we let it be in RS3; for each degree-2 critical clique whose two neighboring critical cliques are not adjacent and none of them is in RS3, we let it be in RS2 (and thus its size must be at least 2); for any other critical clique, we let it be in RS1. Notice that for the two representing Steiner points in an RS2, which one of them is adjacent to the (unique) representing Steiner point of a neighboring critical clique is not a problem, and in fact could be chosen arbitrarily. In the case that we encounter two adjacent critical cliques that are both in RS2, by Lemma 14 $G$ has no 4-root phylogeny and we may stop here.

Suppose $G$ does have a 4-root phylogeny and there is an RS2 in the phylogeny. Let $s_1$ and $s_2$ denote the two representing Steiner points in the RS2 and $s_3$ denote the other Steiner point adjacent to $s_2$, which, by Lemma 14, is the unique representing Steiner point for some neighboring critical clique in RS1. If we delete $s_1$ and $s_2$ and their adjacent vertices from the phylogeny, we are left with two branches: representing Steiner points in the branch containing $s_3$ and representing Steiner points in the other branch (not containing $s_3$) are at least 3 distant apart, namely, the critical clique in RS2 is a cut vertex in $CC(G)$. It follows that we may delete all critical cliques in RS2 from $CC(G)$ and turn to consider every connected component separately, and after that we connect them together via those RS2's.
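The assignment rule stated at the start of this proof can be illustrated with a small sketch. It assumes the critical clique graph is already available as an adjacency dictionary together with the size of each critical clique; the function names and the input representation are hypothetical and not part of the paper, and the sketch makes no attempt to match the $O(n + e)$ bound.

```python
def assign_representing_structures(cc_adj, size):
    """Assign RS1/RS2/RS3 to every node of a critical clique graph.

    cc_adj: dict mapping each critical clique (any hashable label) to the
            set of its neighboring critical cliques (assumed input format).
    size:   dict mapping each critical clique to its number of vertices.
    """
    rs = {}
    # RS3 is decided first because the RS2 rule refers to it:
    # degree-1 critical cliques of size at least 4.
    for clique, nbrs in cc_adj.items():
        if len(nbrs) == 1 and size[clique] >= 4:
            rs[clique] = "RS3"
    for clique, nbrs in cc_adj.items():
        if clique in rs:
            continue
        if len(nbrs) == 2:
            a, b = nbrs
            # RS2: the two neighbors are non-adjacent and neither is in RS3.
            if b not in cc_adj[a] and rs.get(a) != "RS3" and rs.get(b) != "RS3":
                rs[clique] = "RS2"
                continue
        # Every other critical clique is put in RS1.
        rs[clique] = "RS1"
    return rs


def has_adjacent_rs2(cc_adj, rs):
    """True if two adjacent critical cliques are both in RS2 (the stop case)."""
    return any(
        rs[a] == "RS2" and rs[b] == "RS2" for a in cc_adj for b in cc_adj[a]
    )


# Hypothetical example: a path K1 - K2 - K3 of critical cliques.
example_adj = {"K1": {"K2"}, "K2": {"K1", "K3"}, "K3": {"K2"}}
example_size = {"K1": 4, "K2": 1, "K3": 2}
example_rs = assign_representing_structures(example_adj, example_size)
```

In this example K1 has degree 1 and size 4, so it is placed in RS3; K2 then fails the RS2 condition because one of its neighbors is in RS3, and falls back to RS1 together with K3.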
Let us consider the component containing $s_3$. Let $C$ denote this component and $K$ the critical clique that $s_3$ represents. $C$ might be a singleton; in that case $K$ must have size at least two. If in $C$ a maximal clique containing $K$ (as a node) has size two, then the other critical clique must be in RS3; otherwise we may conclude that $G$ has no 4-root phylogeny. We note that two maximal cliques in $C$ both containing $K$ should not have any other node in common; otherwise $G$ would not have a 4-root phylogeny. Another trivial observation is that in $C$ a critical clique in RS3 must have its representing Steiner point adjacent to the unique representing Steiner point of its neighboring critical clique. Therefore, critical cliques in RS3 are not interesting anymore.

Let $C'$ denote the induced subgraph of $C$ whose set of nodes $\mathcal{K}$ includes all critical cliques in RS1 in $C$. If for some $K \in \mathcal{K}$, $|K| = 1$ and in $CC(G)$ no critical clique in RS3 or in RS2 is adjacent to it, it is selected into subset $S$. We then apply Algorithm Restricted 2-Root Steiner Tree to compute an $S$-restricted 2-root Steiner tree for graph $C' = (\mathcal{K}, \mathcal{E})$. If the algorithm returns No, then we can conclude that $G$ has no 4-root phylogeny. Otherwise, we substitute node $K$ in the $S$-restricted 2-root Steiner tree by the representing Steiner point $s(K)$ of $K$, and attach the vertices in $K$ to $s(K)$. After that, we add RS3's for critical cliques in $C \setminus C'$, by attaching their representing Steiner points to the appropriate $s(K)$ for some $K \in \mathcal{K}$. We then add RS2's for the deleted critical cliques, by connecting their representing Steiner points appropriately to the representing Steiner points of neighboring critical cliques. This gives a tree that is a 4-root phylogeny for $G$.

Notice that since we can deal with each component, for example $C$, separately, the total time consumed is $O(n + e)$. That is, in $O(n + e)$ time, we can determine if a graph $G$ has a 4-root phylogeny, and if so, demonstrate one such phylogeny. □
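As a sanity check on any tree produced by such a construction, one can test the defining condition directly: the leaves are exactly the vertices of $G$, no node has degree 2, and two vertices are adjacent in $G$ if and only if their tree distance is at most $k$. The following is a minimal sketch of such a check, not the algorithm of Theorem 5; all function names and the example instance are hypothetical, and the check runs a truncated breadth-first search from every leaf, so it is not linear time.

```python
from collections import deque


def tree_from_edges(edges):
    """Build an undirected adjacency dict from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def within_distance(adj, source, k):
    """Nodes at distance 1..k from source (BFS truncated at depth k)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        if dist[x] == k:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    del dist[source]
    return set(dist)


def is_k_root_phylogeny(tree_adj, vertices, graph_edges, k):
    """Check the k-root phylogeny condition for a candidate tree."""
    vertices = set(vertices)
    if not tree_adj:
        return False
    # A tree is connected and has exactly |nodes| - 1 edges.
    n_nodes = len(tree_adj)
    n_edges = sum(len(nb) for nb in tree_adj.values()) // 2
    root = next(iter(tree_adj))
    if n_edges != n_nodes - 1:
        return False
    if len(within_distance(tree_adj, root, n_nodes)) != n_nodes - 1:
        return False
    # Leaves must be exactly V, and degree-2 nodes are forbidden.
    leaves = {x for x, nb in tree_adj.items() if len(nb) == 1}
    if leaves != vertices or any(len(nb) == 2 for nb in tree_adj.values()):
        return False
    edges = {frozenset(e) for e in graph_edges}
    for u in vertices:
        near = within_distance(tree_adj, u, k) & vertices
        expected = {v for v in vertices if frozenset((u, v)) in edges}
        if near != expected:
            return False
    return True


# Hypothetical instance: {a, b, c, d} is placed as an RS3 (Steiner points
# s1, s2 joined to s3), {x} hangs off t, and {y, z} hangs off u.
example_tree = tree_from_edges([
    ("a", "s1"), ("b", "s1"), ("c", "s2"), ("d", "s2"),
    ("s1", "s3"), ("s2", "s3"), ("s3", "t"),
    ("t", "x"), ("t", "u"), ("u", "y"), ("u", "z"),
])
example_vertices = ["a", "b", "c", "d", "x", "y", "z"]
example_edges = [
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
    ("a", "x"), ("b", "x"), ("c", "x"), ("d", "x"),
    ("x", "y"), ("x", "z"), ("y", "z"),
]
```

Calling `is_k_root_phylogeny(example_tree, example_vertices, example_edges, 4)` exercises the check on this small instance.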
5 Concluding Remarks

We considered in this paper the problems P$k$RP for $k \le 4$ and RS$k$RP for $k \le 2$. By examining the interconnecting structures of $k$-root phylogenies and Steiner trees, we presented linear time algorithms to determine if a given graph $G = (V, E)$ has a $k$-root phylogeny for $k \le 4$ (or an $S$-restricted $k$-root Steiner tree for $k \le 2$), and if so, demonstrate one such phylogeny (or Steiner tree, respectively). We also characterized some properties of graphs having a $k$-root phylogeny for general $k$, and some structural properties of $k$-root phylogenies.

The definition of critical clique is crucial in the design of algorithms for P$k$RP when $k = 3$ and $k = 4$. We believe that it will continue to play an important role in the discussions for large $k$. Notice that P$k$RP and RS$(k - 2)$RP are closely related, for $k \le 4$. Does this relation exist for larger $k$? A key step towards establishing this relation would be to eliminate critical cliques in RS2. Understanding similar representing structures for larger $k$ would be very important.

Since there may be errors in the input graph, one may consider the approximation version of P$k$RP: Given a graph $G$, find a $k$-root phylogeny $T$ whose $k$th phylogenetic power differs from $G$ by the smallest number of edges. The approximation version of T$k$RP has been shown to be NP-hard [3]. We suspect that the approximation versions of P$k$RP, S$k$RP and RS$k$RP are also NP-hard. The next natural question is then how well we can approximate these problems.
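The objective of the approximation version can be made concrete with a short sketch. It assumes that the $k$th phylogenetic power of $T$ is the graph on the leaves of $T$ in which two leaves are adjacent exactly when their tree distance is at most $k$, and that "differs by" counts edges in the symmetric difference; the function names and the adjacency-dictionary representation are hypothetical.

```python
from collections import deque


def tree_distances(tree_adj, source):
    """Breadth-first distances from source to every node of the tree."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in tree_adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def phylogenetic_power(tree_adj, k):
    """Edge set of the k-th phylogenetic power: leaf pairs at distance <= k."""
    leaves = [x for x, nb in tree_adj.items() if len(nb) == 1]
    power = set()
    for u in leaves:
        dist = tree_distances(tree_adj, u)
        for v in leaves:
            if v != u and dist[v] <= k:
                power.add(frozenset((u, v)))
    return power


def power_error(tree_adj, graph_edges, k):
    """Number of edges in which the k-th phylogenetic power differs from G."""
    target = {frozenset(e) for e in graph_edges}
    return len(phylogenetic_power(tree_adj, k) ^ target)


# Hypothetical example: a star with center "c" and leaves a, b, d.
star = {"c": {"a", "b", "d"}, "a": {"c"}, "b": {"c"}, "d": {"c"}}
star_error = power_error(star, [("a", "b"), ("b", "d")], 2)
```

Here the second phylogenetic power of the star contains all three leaf pairs, so it differs from the two-edge input graph in exactly one edge.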
References

1. A. Brandstädt, V.B. Le, and J.P. Spinrad. Graph classes: a survey. SIAM Monographs on Discrete Mathematics and Applications, 1999.
2. F. Gavril. Algorithms for minimum coloring, maximum clique, minimum covering by cliques, and maximum independent set of a chordal graph. SIAM Journal on Computing, 1:180–187, 1972.
3. P.E. Kearney and D.G. Corneil. Tree powers. Journal of Algorithms, 29:111–131, 1998.
4. Y.-L. Lin and S.S. Skiena. Algorithms for square roots of graphs. SIAM Journal on Discrete Mathematics, 8:99–118, 1995.
5. R. Motwani and M. Sudan. Computing roots of graphs is hard. Discrete Applied Mathematics, 54:81–88, 1994.
6. N. Nishimura, P. Ragde, and D.M. Thilikos. On graph powers for leaf-labeled trees. In Proceedings of the Seventh Scandinavian Workshop on Algorithm Theory (SWAT 2000), 2000. To appear.
7. D.J. Rose, R.E. Tarjan, and G.S. Lueker. Algorithmic aspects of vertex elimination on graphs. SIAM Journal on Computing, 5:266–283, 1976.
8. R.E. Tarjan and M. Yannakakis. Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM Journal on Computing, 13:566–579, 1984.