
A Fully Dynamic Algorithm for Maintaining the Transitive Closure

Valerie King*    Garry Sagert†

* Department of Computer Science, University of Victoria, Victoria, BC. Work partially done at the University of Copenhagen and Hebrew University. Email: [email protected].
† Department of Computer Science, University of Victoria, Victoria, BC. Work partially done at the University of Copenhagen. Email: [email protected]

Abstract

This paper presents an efficient fully dynamic graph algorithm for maintaining the transitive closure of a directed graph. The algorithm updates the adjacency matrix of the transitive closure with each update to the graph. Hence, each reachability query of the form "Is there a directed path from i to j?" can be answered in O(1) time. The algorithm is randomized; it is correct when answering yes, but has O(1/n^c) probability of error when answering no, for any constant c. In acyclic graphs, worst case update time is O(n^2). In general graphs, update time is O(n^{2+ε}), where n^ε = min{n^{0.26}, maximum size of a strongly connected component}. The space complexity of the algorithm is O(n^2).

1 Introduction

This paper presents a fully dynamic graph algorithm for maintaining the transitive closure of a directed graph. A fully dynamic graph algorithm is a data structure for a graph which implements an on-line sequence of update operations that insert and delete edges in the graph, and answers queries about a given property of the graph. A dynamic algorithm should process queries quickly and must perform update operations faster than computing from scratch (as performed by the fastest "static" algorithm). A graph algorithm is said to be partially dynamic if it allows only insertions or only deletions.

Researchers have been studying dynamic graph problems for over twenty years. The study of dynamic graph problems on undirected graphs has met with much success, for a number of graph properties. For a survey of the earlier work, see [6]. For more recent work, see [10, 8, 9]. Directed graph problems have proven to be much tougher and very little is known, especially for fully dynamic graph algorithms.


Yet maintaining the transitive closure of a changing directed graph is a fundamental problem, with applications to such areas as compilers, data bases, and garbage collection. This paper offers a breakthrough in the area of fully dynamic algorithms for directed graphs, using a surprisingly simple technique. The algorithm presented here is the first to process reachability queries quickly, indeed, to maintain the adjacency matrix of the transitive closure after each update, while assuming no lookahead, i.e., no knowledge about future updates. Unlike other dynamic graph algorithms, in one update operation it can insert an arbitrary set of edges incident to the same vertex (in acyclic graphs, or graphs with strongly connected components containing fewer than n^{0.26} vertices) and delete an arbitrary set of edges incident to the same vertex (in general graphs). In addition, unlike other algorithms, in acyclic graphs it can answer a sensitivity query of the form "Is there a path from i to j not containing edge e?" in O(1) time.

Let |SCC| denote the number of vertices in the largest strongly connected component in the graph, and let E_v denote an arbitrary set of edges incident to a common vertex v. For graphs containing n vertices, our results are as follows:

1. For acyclic graphs:
   • Update: Insert or delete an edge set E_v in O(n^2) worst case time.
   • Reachability query: "Can vertex i reach vertex j?" in O(1) time.
   • Sensitivity query: "Is there a path from i to j not containing edge e?" in O(1) time.

2. For graphs such that |SCC| ≤ n^ε:
   • Update: Insert or delete an edge set E_v in O(n^{2+ε}) worst case time.
   • Reachability query: "Can vertex i reach vertex j?" in O(1) time.

3. For general graphs:
   • Update: Insert an edge, or delete an edge set E_v, in O(n^{2.26}) amortized time.
   • Reachability query: "Can vertex i reach vertex j?" in O(1) time.

Algorithm (3) uses a subroutine which computes the product of two rectangular matrices. Our update time depends on the method by which the product is computed. If simple matrix multiplication is used, the amortized update time is O(n^{2.5}). If fast square matrix multiplication is used, then the amortized cost of executing updates on a graph is O(n^{2+(ω−2)/(ω−1)}), where n^ω is the cost of multiplying two n × n matrices. Note that as long as ω > 2, the cost per update is less than the cost of multiplying two n × n matrices. For ω = 2.81 (Strassen [17]) the update time is O(n^{2.45}); for ω = 2.38 (Coppersmith and Winograd [2]) the update time is O(n^{2.28}). If the fast rectangular matrix multiplication technique of Huang and Pan [12] is used, the update time is O(n^{2.26}).

These update times are improved if the graph is sparse. For m < n^{1.54}, an improved update time is given by O(n^{1.5 + log m/(2 log n)}). More improvements are possible if the size of the transitive closure is o(n^2).

Initialization for acyclic graphs, and for graphs with small strongly connected components, requires the insertion of the edges incident to each vertex in the graph, for total costs of O(n^3) and O(n^{3+ε}) respectively. For general graphs, some of the costs for processing the initial graph may be incurred after the start of the sequence of update operations. The total cost of the initialization procedure is O(n^{3.26} lg n), which when amortized over an update sequence of length Ω(n lg n), costs O(n^{2.26}) per update.

We assume a unit cost RAM model with wordsize O(log n). The algorithm is randomized. If the answer to a reachability query is "yes" then the answer is always correct. If it is "no" then the answer is incorrect with probability O(1/n^c) for any fixed constant c, i.e., there is a small chance a path may exist if the answer to the query is negative. The randomization is used only to reduce wordsize. If wordsize n is permitted, or all paths in the graph are of constant length, then the algorithm becomes deterministic.

1.1 Related work

There are only two previously known algorithms for fully dynamic transitive closure in directed graphs, even for the restricted class of acyclic graphs. Both of the algorithms permit only a single edge insertion or deletion in a given update. Neither gives improved running times for the special case of acyclic graphs, nor provides sensitivity queries.

The most recent (1996), by S. Khanna, R. Motwani and R. Wilson [15], is not strictly a fully dynamic algorithm in that it assumes some knowledge of future update operations at the time of each update. It uses matrix multiplication as a subroutine. If fast matrix multiplication is used (i.e. ω = 2.38), then the amortized cost of an update is O(n^{2.18}), but a lookahead of Θ(n^{0.18}) updates is required. This algorithm is deterministic, but depends heavily on the use of lookahead information.

The other (1995), by Henzinger and King [8], has an amortized update time of Õ(nm^{(ω−1)/ω}), or Õ(nm^{0.58}) if fast square matrix multiplication is used. A cost as high as Ω(n/log n) may be incurred for each reachability query. Consequently, the adjacency matrix of the transitive closure cannot be updated with each update to the graph. While this algorithm is also Monte Carlo, its techniques are quite different from the algorithm presented here.

Other related work includes partially dynamic algorithms. The best result for updates allowing only edge insertions is O(n) amortized time per inserted edge and O(1) time per query, by Italiano (1986) [13], and by La Poutre and van Leeuwen (1987) [16]. This improved upon Ibaraki and Katoh's (1983) [11] algorithm with running time O(n^3) for an arbitrary number of insertions. There is also Yellin's (1993) [18] algorithm with cost O(δm*) for m insertions, where m* is the number of edges in the transitive closure and δ is the out-degree of the final graph.

The best deletions-only algorithm for general graphs is by La Poutre and van Leeuwen (1987) [16] and uses O(m) amortized time per edge deletion and O(1) per query. This improved upon the deletions-only algorithm of Ibaraki and Katoh (1983) [11], which has an update time of O(n^2). For acyclic graphs, Italiano (1988) [14] has a deletions-only algorithm with amortized time O(n) per edge deletion and O(1) per query. There is also Yellin's (1993) [18] deletions-only algorithm with cost O(δm*) for m deletions, where m* is the number of edges in the transitive closure and δ is the out-degree of the initial graph.

There is a Monte Carlo algorithm of E. Cohen's (1997) [1] for computing static transitive closure (i.e. transitive closure from scratch) based on her linear time algorithm for estimating the size of the transitive closure. However, this algorithm has O(mn) cost in the worst case.

The only known lower bound for this problem is that for undirected connectivity. The bound is Ω(log n/log log n) per update [7]. This has almost been matched by the upper bound for undirected connectivity. Khanna, Motwani and Wilson [15] give some evidence to suggest that dynamic graph problems on directed graphs are intrinsically harder. They show that the certificate complexity of connectivity and other directed graph problems is Θ(n^2), rather than Θ(n) as for undirected graphs, which implies that sparsification, a technique useful for undirected graphs, is not applicable to directed graph problems.

2 Definitions

The following definitions are used throughout the entire paper. Other definitions are introduced as needed.

Definition 2.1 In a graph G = (V, E), vertex u can reach vertex v, and vertex v is reachable from vertex u, iff there exists a directed path u ⇝ v.

Definition 2.2 A directed path p is a sequence p = v_1, v_2, ..., v_k of distinct vertices of V, such that for every v_i, v_{i+1} ∈ p, (v_i, v_{i+1}) ∈ E.

Definition 2.3 For V = {1, 2, ..., n}, the transitive closure for G is represented by an n × n matrix M^G such that M^G(i, j) = 1 if there exists a directed path i ⇝ j; else M^G(i, j) = 0.

Definition 2.4 Let E_v denote a set of edges incident to a common vertex v.

Definition 2.5 A dynamic graph is a graph in which edges or vertices are inserted or deleted. We consider updates of the form:

• insertion (deletion) of an edge (u, v) into (from) G;
• insertion (deletion) of an edge set E_v adjacent to vertex v ∈ V into (from) G;
• insertion (deletion) of a vertex into (from) V.

After each update, M^G(i, j) is updated to reflect the current transitive closure. Hence, queries can be answered in O(1) time.

Definition 2.6 We consider queries of the form:

• reachability: "Can vertex i reach vertex j?" and
• sensitivity: "Is there a path from i to j not containing edge e?"

Definition 2.7 A unit cost RAM with wordsize x is a machine which can process each arithmetic operation in O(1) time using a wordsize x.

3 Acyclic Graphs

For now, let us assume that we are working with a unit cost RAM with wordsize n. We later show how to reduce wordsize using a randomization technique.

The following algorithm makes use of a simple idea: we maintain Paths(i, j) to be equal to the number of directed paths i ⇝ j in the current graph. Paths(i, j) is defined to be 1 if i = j. The n × n adjacency matrix M^G representing the transitive closure of G is given by M^G(i, j) = 1 iff Paths(i, j) ≠ 0, else M^G(i, j) = 0. A sensitivity query of the form "Is there a path from i to j not using edge (u, v)?" is answered true iff Paths(i, u) · Paths(v, j) ≠ Paths(i, j). The following lemma is easy to see, as illustrated in Figure 3.1.

Lemma 3.1 Let G be a directed acyclic graph containing vertices i, j, u and v. If edge (u, v) is inserted (deleted) such that no cycles are created, the number of paths i ⇝ j increases (decreases) by Paths(i, u) · Paths(v, j).
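For concreteness, here is a minimal sketch (in Python) of the path-counting idea for single-edge insertions into a DAG; the class and method names (AcyclicTC, insert_edge, reachable, avoids_edge) are ours and not the paper's, and no wordsize reduction is applied yet.

class AcyclicTC:
    def __init__(self, n):
        self.n = n
        # paths[i][j] = number of directed paths i -> j; paths[i][i] = 1 by definition.
        self.paths = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

    def insert_edge(self, u, v):
        # Lemma 3.1: inserting (u, v) without creating a cycle adds
        # paths[i][u] * paths[v][j] new i -> j paths for every pair (i, j).
        # paths[i][u] and paths[v][j] themselves cannot change (that would imply
        # a cycle through (u, v)), so the in-place update is safe in any order.
        for i in range(self.n):
            if self.paths[i][u] == 0:
                continue
            for j in range(self.n):
                self.paths[i][j] += self.paths[i][u] * self.paths[v][j]

    def reachable(self, i, j):
        # Reachability query in O(1): is there a directed path i -> j?
        return self.paths[i][j] != 0

    def avoids_edge(self, i, j, u, v):
        # Sensitivity query: is there an i -> j path that does not use edge (u, v)?
        return self.paths[i][u] * self.paths[v][j] != self.paths[i][j]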

It is not much more difficult to insert or delete a set of edges adjacent to a common vertex, as illustrated by Figure 3.2 and the following lemma.

Lemma 3.2 Let G be an acyclic directed graph and E_v be a set of edges incident to v. If the edges in E_v are inserted (deleted) such that no cycles are created, then the number of paths i ⇝ j increases (decreases) by
[(number of old paths i ⇝ v) · (number of new paths v ⇝ j)] + [(number of new paths i ⇝ v) · (number of old paths v ⇝ j)] + [(number of new paths i ⇝ v) · (number of new paths v ⇝ j)].

Figure 3.1: Inserting a single edge. The number of paths i ⇝ j which contain neither u nor v is unchanged during the update.

Figure 3.2: Inserting an edge set. Paths from i ⇝ j which contain none of the u vertices, nor v, play no role in altering the number of paths from i ⇝ j during the update.

Insert_Acyclic(E_v, G)
  for all i ∈ V do
    From(i) ← Σ_{(u,v)∈E_v} Paths(i, u)
  for all j ∈ V do
    To(j) ← Σ_{(v,u)∈E_v} Paths(u, j)
  for all i, j ∈ V \ v such that From(i) ≠ 0 ∨ To(j) ≠ 0 do
    Paths(i, j) ← Paths(i, j) + [Paths(i, v)·To(j) + From(i)·Paths(v, j) + From(i)·To(j)]
  j ← v
  for all i ∈ V \ v such that From(i) ≠ 0 ∨ To(j) ≠ 0 do
    Paths(i, j) ← Paths(i, j) + [Paths(i, v)·To(j) + From(i)·Paths(v, j) + From(i)·To(j)]
  i ← v
  for all j ∈ V \ v such that From(i) ≠ 0 ∨ To(j) ≠ 0 do
    Paths(i, j) ← Paths(i, j) + [Paths(i, v)·To(j) + From(i)·Paths(v, j) + From(i)·To(j)]
  for all i, j ∈ V such that Paths(i, j) has increased from 0 do
    M^G(i, j) ← 1

Figure 3.3: Insertion routine for acyclic graphs

3.1 Implementation

It follows from Lemma 3.2 that the algorithm Insert_Acyclic(E_v, G) in Figure 3.3 correctly updates Paths() when E_v is inserted into G. The arrays From() and To() are used as temporary storage, allowing each summation to be carried out only once per vertex each update. Note that we update paths beginning or ending with v after all other paths, because we are assuming that Paths() contains "old" numbers of paths. Similarly, it follows that Delete_Acyclic(E_v, G) in Figure 3.4 correctly updates Paths() when E_v is deleted from G. Again, to avoid disturbing our assumptions about the states of Paths(i, v) and Paths(v, j), we update paths beginning or ending with v before all other paths.
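As an illustration of the ordering discussed above, the following sketch (our own Python rendering, not the paper's code) applies the Lemma 3.2 correction term to every pair, updating the pairs that begin or end at v last on insertion and first on deletion, so that Paths(i, v) and Paths(v, j) hold the intended values when the remaining pairs are processed; paths is a dense n × n list of path counts and frm/to are the From()/To() arrays.

def apply_edge_set_update(paths, frm, to, v, n, insert=True):
    # Correction term from Lemma 3.2 / Figures 3.3 and 3.4:
    #   Paths(i, v)*To(j) + From(i)*Paths(v, j) + From(i)*To(j)
    sign = 1 if insert else -1

    def touch(i, j):
        paths[i][j] += sign * (paths[i][v] * to[j] + frm[i] * paths[v][j] + frm[i] * to[j])

    v_pairs = [(i, v) for i in range(n) if i != v] + [(v, j) for j in range(n) if j != v]
    others = [(i, j) for i in range(n) for j in range(n) if i != v and j != v]
    # Insertion: ordinary pairs first, then pairs beginning or ending at v.
    # Deletion: the v pairs first, then the ordinary pairs.
    order = others + v_pairs if insert else v_pairs + others
    for (i, j) in order:
        touch(i, j)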

Delete_Acyclic(E_v, G)
  for all i ∈ V such that Paths(i, u) ≠ 0 for some (u, v) ∈ E_v do
    From(i) ← Σ_{(u,v)∈E_v} Paths(i, u)
  for all j ∈ V such that Paths(u, j) ≠ 0 for some (v, u) ∈ E_v do
    To(j) ← Σ_{(v,u)∈E_v} Paths(u, j)
  j ← v
  for all i ∈ V \ v such that From(i) ≠ 0 ∨ To(j) ≠ 0 do
    Paths(i, j) ← Paths(i, j) − [Paths(i, v)·To(j) + From(i)·Paths(v, j) + From(i)·To(j)]
  i ← v
  for all j ∈ V \ v such that From(i) ≠ 0 ∨ To(j) ≠ 0 do
    Paths(i, j) ← Paths(i, j) − [Paths(i, v)·To(j) + From(i)·Paths(v, j) + From(i)·To(j)]
  for all i, j ∈ V \ v such that From(i) ≠ 0 ∨ To(j) ≠ 0 do
    Paths(i, j) ← Paths(i, j) − [Paths(i, v)·To(j) + From(i)·Paths(v, j) + From(i)·To(j)]
  for all i, j ∈ V such that Paths(i, j) has decreased to 0 do
    M^G(i, j) ← 0

Figure 3.4: Deletion routine for acyclic graphs

To initialize: Set Paths(i, j) = M^G(i, j) = 0 for all i, j (with Paths(i, i) = 1, as defined above) and then apply Insert_Acyclic(E_v, G) for each v ∈ V and each set E_v of outgoing edges incident to v.

Analysis: If the algorithm is implemented so that for each vertex u there is a list of vertices v such that Paths(v, u) ≠ 0, and a list of vertices w such that Paths(u, w) ≠ 0, then the algorithm can perform updates in time proportional to the number of edges in the transitive closure. The algorithm requires the computation of a sum of up to |E_v| = O(n) numbers for each vertex. Also, for each pair of vertices, a constant number of operations are performed, for a total of O(n^2) time per update. The initialization cost is O(n^2) per vertex, or O(n^3) in total.

3.2 Extensions to multigraphs and vertex insertions and deletions

It is not difficult to see that the same algorithm above also works for multigraphs, i.e. graphs with more than one edge between a pair of vertices. We view each copy of an edge as a distinct path between its endpoints. If |E_v| > n, then the sums can still be computed in time O(n) per vertex by first determining the multiplicity of each edge in E_v. Let x_u and y_u denote the number of times (u, v) and (v, u), respectively, appear in E_v. Then the computations of From and To become, respectively:

  From(i) ← Σ_{(u,v)∈E_v} x_u · Paths(i, u)    and    To(j) ← Σ_{(v,u)∈E_v} y_u · Paths(u, j)
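A small sketch of this multiplicity counting (Python; the function name is ours):

from collections import Counter

def from_to_with_multiplicities(paths, Ev, v, n):
    # x[u] = multiplicity of (u, v) in Ev; y[u] = multiplicity of (v, u) in Ev.
    x = Counter(u for (u, w) in Ev if w == v)
    y = Counter(u for (w, u) in Ev if w == v)
    frm = [sum(mult * paths[i][u] for u, mult in x.items()) for i in range(n)]
    to = [sum(mult * paths[u][j] for u, mult in y.items()) for j in range(n)]
    return frm, to

Each sum now touches every distinct neighbour of v once, so From and To still take O(n) time per vertex even when |E_v| > n.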

Inserting and deleting isolated vertices can be done easily. Depending upon the implementation details, this can be done in O(1) time.

3.3 Reducing wordsize

The algorithm as stated above may produce numbers as large as 2^n, as there may be 2^n distinct paths in an acyclic graph (without multiple edges). If we are to assume the arithmetic operations can be done in unit time, then it is usual to assume a wordsize of O(log n). To reduce the wordsize to 2c lg n, c ≥ 5, the algorithm begins by randomly picking a prime p of value between n^c and n^{c+1}. All operations (additions, subtractions, multiplications) are performed modulo p and can be done in constant time with wordsize 2c lg n. As the number of computations involving a particular prime increases, so do the chances of getting a "false zero", i.e., the result of some computation may equal 0 mod p with wordsize 2c lg n when the result is not equal to 0 when computed with wordsize n. To keep the probability of false zeroes low, the algorithm chooses a new prime every n update operations and reinitializes the data structures. To preserve O(n^2) worst case time, the steps of the reinitialization with a new prime may be interleaved with the operations involving the current prime. We observe the following:

Lemma 3.3 If O(n^k) arithmetic computations involving numbers no greater than 2^n are performed modulo a random prime p of value Θ(n^c), then the probability that a false zero arises is O(1/n^{c−k−1}).

Proof: There are O(n/log n) prime divisors of value Θ(n^c) which divide a number no greater than 2^n, and therefore O(n^{k+1}/log n) prime divisors of any of the numbers generated. By the Prime Number Theorem, see [3], there are approximately Θ(n^c/log n) primes of value Θ(n^c). Hence the probability that a random prime of value Θ(n^c) divides any of the numbers generated is O(1/n^{c−k−1}).

Corollary 3.4 During a sequence of n updates taking time O(n^{2+ε}), the probability that after any update there is a pair of vertices i, j such that M^G(i, j) = 0 and i ⇝ j is O(1/n^{c−4−ε}). In particular, for the algorithms just shown, ε = 0.

For the remainder of the paper, we assume all operations are done modulo a random prime.
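A minimal sketch of the wordsize reduction (Python; sympy's randprime is used here purely for illustration, and the class name ModularPaths is ours):

from sympy import randprime

class ModularPaths:
    # Keep all path counts modulo a random prime p with n^c <= p < n^(c+1), so each
    # entry fits in O(log n) bits.  A "false zero" (an entry that is 0 mod p although
    # the true count is nonzero) occurs with probability O(1/n^(c-k-1)) by Lemma 3.3.
    def __init__(self, n, c=5):
        self.p = randprime(n ** c, n ** (c + 1))
        self.paths = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

    def add(self, i, j, delta):
        # All additions, subtractions and multiplications are performed mod p.
        self.paths[i][j] = (self.paths[i][j] + delta) % self.p

    def reachable(self, i, j):
        # One-sided error: a "yes" answer is always correct, a "no" answer is wrong
        # only on a false zero.
        return self.paths[i][j] != 0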

4 Graphs with Small Strongly Connected Components

In this section we describe a fully dynamic transitive closure algorithm which works quickly when the strongly connected components are small. In the next section, we show how to deal with big strongly connected components. A strongly connected component (SCC) is a maximal subset of vertices C ⊆ V such that for every pair of vertices u, v ∈ C, there exists a path u ⇝ v and a path v ⇝ u. A vertex which is contained in no cycles is a strongly connected component of size one.

The idea here is to represent each strongly connected component in the original graph G by a single "super" vertex in the compressed graph G^c. We then use the acyclic algorithm from the previous section to maintain the connectivity of G^c, which is acyclic. When a strongly connected component in G is no longer strongly connected, we "expand" the corresponding super vertex in G^c. We formalize these ideas:

Definition 4.1 Given a general graph G = (V, E), for each v ∈ V, let c(v) denote the (maximal) strongly connected component containing v.

Definition 4.2 For any subset of edges E′ ⊆ E and graph G, define E′^c to be the compressed multiset with respect to E′ as follows: Edge (I, J) appears with multiplicity k in E′^c, where k = |{(u, v) ∈ E′ | c(u) = I, c(v) = J and c(u) ≠ c(v)}|.

Definition 4.3 The graph G^c = (V^c, E^c) is the compressed acyclic multigraph with respect to G, where V^c is the set of strongly connected components of G, and E^c is the compressed multiset with respect to E.

Figure 4.1 illustrates this. As a convention, we use lowercase letters to denote vertices in V and uppercase letters to denote vertices in V^c. We define Paths(I, J) for I, J ∈ V^c as the number of paths from I to J in G^c. The answer to a reachability query "Is vertex j reachable from vertex i?" is given by M^G(i, j).

Figure 4.1: The compressed acyclic multigraph: The vertices b, c and d which make up the strongly connected component are "compressed" into one vertex in G^c.

4.1 Update Procedures

Both the insertion and deletion routines make use of a linear time algorithm for finding the strongly connected components of a graph, which we call Find_SCCs(G) [3]. We assume that this algorithm maintains for each vertex v its strongly connected component c(v). The update routines keep a representation of the original graph G, as well as the compressed graph G^c. The routine Insert_Small(E_v, G) in Figure 4.2 inserts an arbitrary set E_v of edges incident to vertex v into G, as illustrated by Figure 4.3.
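For illustration, a sketch of the compression step (Definitions 4.1-4.3) using an off-the-shelf SCC routine; here networkx stands in for Find_SCCs, and the function name compress is ours:

import networkx as nx
from collections import Counter

def compress(G):
    # G is a directed graph (nx.DiGraph).  c maps each vertex to an identifier of
    # its strongly connected component c(v).
    c = {}
    for idx, component in enumerate(nx.strongly_connected_components(G)):
        for v in component:
            c[v] = idx
    # Compressed multiset E^c (Definition 4.2): edge (c(u), c(v)) appears with
    # multiplicity equal to the number of original edges between distinct components.
    Ec = Counter((c[u], c[v]) for (u, v) in G.edges() if c[u] != c[v])
    return c, Ec

The multiset Ec is the form in which edge sets are handed to the acyclic routines of Section 3.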

Insert_Small(E_v, G)
  C_old ← c(v)
  E ← E ∪ E_v
  Run Find_SCCs on G.
  if C_old = c(v)   {no new SCCs have been formed by the new insertions}
    then Insert_Acyclic(E_v^c, G^c) to insert the compressed multiset E_v^c into G^c.
  else
    Let C_1, ..., C_k be the old SCCs which now compose c(v).
    for all i = 1, ..., k do
      Let E_{C_i} be the set of edges in E^c with at least one end in C_i.
      Delete_Acyclic(E_{C_i}, G^c) to remove these edges from G^c.
      Remove C_i from V^c.
    Add a vertex labelled c(v) to V^c.
    Let E_{c(v)} be the set of edges in E with one endpoint in c(v).
    Insert_Acyclic(E_{c(v)}^c, G^c) to add the compressed multiset E_{c(v)}^c to E^c.
  for all i, j such that Paths(c(i), c(j)) is changed from 0 do
    M^G(i, j) ← 1

Figure 4.2: Insertion routine for graphs containing small strongly connected components



Figure 4.3: Compressing a strongly connected component: When an insertion creates a strongly connected component, the essential steps of compression in G^c are: 1. Remove all edges incident to strongly connected vertices; 2. Replace the strongly connected vertices with a super vertex; 3. Add the incident edges back in. Expanding a super vertex is essentially the same procedure, but executed in reverse order.

The deletion algorithm Delete_Small(E_v, G) is shown in Figure 4.4.

To initialize: For every vertex v, insert the outgoing edges E_v incident from v using Insert_Small(E_v, G).

Analysis: The cost of the algorithms is dominated by the calls to Insert_Acyclic(E_v^c, G^c) and Delete_Acyclic(E_v^c, G^c), each of which has O(n^2) cost. If no strongly connected components have been destroyed or created, then there is a single such call. A deletion may cause a vertex of V^c, which represents a strongly connected component consisting of s vertices of V, to be removed and replaced by up to s new vertices of V^c. An insertion which results in the creation of a new strongly connected component containing s vertices of V may require the deletion of up to s vertices of V^c. Each insertion and deletion of a vertex of V^c requires a call to Insert_Acyclic(E_v^c, G^c) and Delete_Acyclic(E_v^c, G^c), respectively. Thus the total cost is O(n^2 s). If we assume an upper bound of n^ε on the size of any strongly connected component, then the total cost is O(n^{2+ε}). Initialization requires n applications of Insert_Small(E_v, G^c), which takes time O(n^{3+ε}) in total.

5 General Graphs

We briefly describe the main idea: We maintain a graph G′ = (V, E′) such that G′ contains no big strongly connected components (see Figure 5.1). The edge set E′ ⊆ E contains all edges between vertices which are not both in the same big component (and possibly other edges).

Delete_Small(E_v, G)
  C_old ← c(v)
  E ← E \ E_v
  Run Find_SCCs(G) to determine new c(v).
  if C_old = c(v)
    then Delete_Acyclic(E_v^c, G^c) to delete E_v^c from G^c.
  else
    Let C_1, C_2, ..., C_k be the new SCCs into which C_old has decomposed.
    Let E_{c(v)} ⊆ E^c be the set of all edges incident to c(v) in G^c.
    Delete_Acyclic(E_{c(v)}, G^c) to remove E_{c(v)}.
    Remove c(v) from G^c.
    for all i = 1, ..., k do
      Insert a vertex labelled C_i into V^c.
      Let E_{C_i} be the set of edges in E with one end in C_i.
      Insert_Acyclic(E_{C_i}^c, G^c) to add E_{C_i}^c to E^c.
  for all i, j such that Paths(c(i), c(j)) is changed to 0 do
    M^G(i, j) ← 0

Figure 4.4: Deletion routine for graphs containing small strongly connected components

This property ensures that there is a path from i to j in G′ if there is a path from i to j in G which does not include any intermediate vertices contained in big strongly connected components. Then a path in G can be described by a concatenation of paths in G′, where two such paths are joined iff the end of the first is contained in the same big component as the start of the second. The transitive closure of G′ is maintained using the Insert_Small and Delete_Small routines of the previous section. After each update, paths in G′ are concatenated as described to determine all paths of G.

Initially, E′ contains only edges of E whose endpoints are not both contained in the same big component. An edge in E \ E′ is added to E′ if and when both its endpoints cease to be in the same big component. Once an edge is added to E′, it is not removed unless it is deleted from G. So, if a big component forms again due to an edge insertion, only the newly inserted edge which causes the big component to form is omitted from E′.


Figure 5.1: The graph G′ with "missing" edges: The edges which would cause a big strongly connected component are simply not inserted into G′.

We define the following:

Definition 5.1 A strongly connected component is big if it contains more than n^ε vertices.

Definition 5.2 Let M^{G′} be the n × n matrix representing the transitive closure of G′, as maintained by our algorithm for small components.

Definition 5.3 Let c(v) denote the name of the strongly connected component in G containing vertex v.

Definition 5.4 Let b(v) denote the name of the big strongly connected component in G containing vertex v, provided v is in a big component.

Definition 5.5 Let V^B = {B_1, B_2, ..., B_k} be the set of big strongly connected components of G. Since each has size greater than n^ε, k ≤ n^{1−ε}.

Definition 5.6 Let B be a k × k Boolean matrix such that B(I, J) = 1 iff there exists a pair of vertices i ∈ B_I and j ∈ B_J such that M^{G′}(i, j) = 1. B(I, I) is defined to be 1 for all I. That is, B is the adjacency matrix for the graph whose vertices are the big strongly connected components of G; there is an edge from big strongly connected component B_I to B_J iff some vertex in B_I can reach some vertex in B_J in G′.

Definition 5.7 Let M^{G′B} be the n × k Boolean matrix such that M^{G′B}(i, J) = 1 iff M^{G′}(i, j) = 1 for some j ∈ B_J, i.e., iff there is a path in G′ from vertex i to some vertex in big component J.

Definition 5.8 Let BM^{G′} be the k × n Boolean matrix such that BM^{G′}(I, j) = 1 iff M^{G′}(i, j) = 1 for some i ∈ B_I, i.e., iff there is a path in G′ from some vertex in big component I to vertex j.

The subroutine Generate_Matrices uses the output of Find_SCCs and M^{G′} to generate B, M^{G′B}, BM^{G′}, and the transitive closure M^B of B. Then it multiplies M^{G′B} · M^B · BM^{G′}. It is not hard to see that the elementwise logical OR of (M^{G′B} · M^B · BM^{G′}) and M^{G′} is the transitive closure of G, assuming that M^{G′} is correct. If there are no big components, then it simply outputs M^{G′}.

The Insert routine in Figure 5.2 inserts a set of edges incident to a common vertex, with one restriction: we require that at most one of these edges has both its endpoints in the same big strongly connected component of the resulting graph. Otherwise, it is possible that the edges would be inserted into G′ at different times in the future, each time requiring a call to Insert_Small and adding to the cost. Edges in E \ E′ are marked.
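A sketch of the combination step performed by Generate_Matrices, using 0/1 numpy matrices; the function name combine and the big_members argument are ours, and the paper builds B, M^{G′B} and BM^{G′} from the SCC output rather than exactly as below.

import numpy as np

def combine(M_gp, big_members):
    # M_gp: n x n 0/1 matrix, the transitive closure of G'.
    # big_members: list of k lists; big_members[I] = vertices of big component B_I.
    n = M_gp.shape[0]
    k = len(big_members)
    if k == 0:
        return M_gp.copy()
    MgB = np.zeros((n, k), dtype=np.int64)   # M^{G'B}(i, J): i reaches some vertex of B_J in G'
    BMg = np.zeros((k, n), dtype=np.int64)   # BM^{G'}(I, j): some vertex of B_I reaches j in G'
    for J, comp in enumerate(big_members):
        MgB[:, J] = M_gp[:, comp].max(axis=1)
        BMg[J, :] = M_gp[comp, :].max(axis=0)
    # B(I, J) = 1 iff some vertex of B_I reaches some vertex of B_J in G'; B(I, I) = 1.
    B = np.array([MgB[comp, :].max(axis=0) for comp in big_members], dtype=np.int64)
    np.fill_diagonal(B, 1)
    # Transitive closure M^B of B by repeated Boolean squaring.
    MB = B
    for _ in range(max(1, int(np.ceil(np.log2(k))))):
        MB = ((MB @ MB) > 0).astype(np.int64)
    # Elementwise OR of M^{G'} and M^{G'B} * M^B * BM^{G'} gives the closure of G.
    through_big = (MgB @ MB @ BMg) > 0
    return (M_gp.astype(bool) | through_big).astype(np.int64)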

Insert(E_v)
  E ← E ∪ E_v
  Run Find_SCCs(G) to find b(v) for each vertex v.
  Let (u, v) be the one edge (if any) such that b(u) = b(v). Mark (u, v).
  Insert_Small(E_v \ {(u, v)}, G′).
  Generate_Matrices

Figure 5.2: Insertion routine for general graphs

The delete routine in Figure 5.3 uses the Delete_Small operation to delete edges between small components. It then goes through each marked edge, i.e., each edge in E \ E′. If the edge does not connect two vertices in the same big strongly connected component, it unmarks the edge and calls Insert_Small to insert the edge into E′.

Delete(E_v)
  E ← E \ E_v
  Run Find_SCCs(G) to find b(v) for each vertex v.
  Delete_Small(E_v ∩ E′, G′)
  Let E_ins be the set of marked edges {(u, w) | b(u) ≠ b(w)}.
  Unmark all edges in E_ins.
  while E_ins ≠ ∅ do
    Let (u, w) be an arbitrary edge in E_ins.
    Let E_u and E_w be the edges in E_ins incident to u and w, respectively.
    Insert(E_u) and Insert(E_w) to insert E_u and E_w, respectively, into G′.
    Remove E_u and E_w from E_ins.

Figure 5.3: Deletion routine for general graphs

To initialize: No edges which have endpoints in the same big strongly connected component are inserted into G′. All such edges are marked and will be inserted by the deletion routine only when both endpoints cease to be in the same big strongly connected component. Figure 5.4 presents the initialization routine.

Initialize(G)
  Run Find_SCCs(G) to find b(v) for each vertex v.
  Let E_init be the set of edges {(u, w) | b(u) ≠ b(w)}.
  Mark all edges in E \ E_init.
  while E_init ≠ ∅ do
    Let (u, w) be an arbitrary edge in E_init.
    Let E_u and E_w be the edges in E_init incident to u and w, respectively.
    Insert(E_u) and Insert(E_w) to insert E_u and E_w, respectively, into G′.
    Remove E_u and E_w from E_init.

Figure 5.4: Initialization routine for general graphs

Analysis: It is easy to see how the Generate_Matrices routine can generate the matrices BM^{G′}, M^{G′B}, and B from M^{G′} and the results of the SCC algorithm in O(n^2) time, and we leave this to the reader. The transitive closure of an n × n matrix can be computed in O(n^ω) time, where O(n^ω) is the cost of multiplying two n × n matrices [4].

Since there are no more than n^{1−ε} big components, finding the transitive closure of B takes at most O(n^{ω(1−ε)}) time.

The cost of multiplying M^{G′B} · M^B · BM^{G′} depends on the technique used. The asymptotically fastest way is with fast rectangular matrix multiplication [12], in time n^{ω(1,1−ε,1)}, where ω(1, 1−ε, 1) is the exponent for multiplying an n × n^{1−ε} matrix by an n^{1−ε} × n matrix; the bound on ω(1, 1−ε, 1) given in [12] is an expression in two parameters q and σ,

where q is an integer and σ is a small value, typically between 0.005 and 0.05, which together minimize the expression.

To use square matrix multiplication instead, we observe that multiplying an n × k matrix by a k × n matrix can be done with (n/k)^2 multiplications of k × k matrices. Here, k ≤ n^{1−ε}, and the multiplication costs no more than O((n^ε)^2 n^{ω(1−ε)}), or O(n^{2ε+ω−ωε}).

Each insertion calls the Insert_Small operation once, for an additional cost of O(n^{2+ε}). Each deletion may call the Insert_Small operation once for each edge in E \ E′ which is added to E′. This cost can be charged to the Insert operation which inserted the edge into E, since we have restricted the Insert operation so that in a single update only one edge is added to E \ E′. Note that the cost of an insert might not be fully realized until a later deletion.

If fast rectangular multiplication is used, the total amortized cost is O(n^{2+ε} + n^{ω(1,1−ε,1)}). Experimentally we found that this value is minimized when ε = 0.257..., q = 10 and σ = 0.0226, yielding a running time of O(n^{2.26}). To find these numbers, we wrote a computer program which looped over a range of integer values for q ∈ 0, ..., 30, and real values for σ ∈ 0.005, ..., 0.05 and ε ∈ 0, ..., 1. The program calculated 2 + ε − ω(1, 1−ε, 1) for every combination and kept track of the best results encountered. The program used an increment size of 0.0001 for non-integers.

If square matrix multiplication is used, the total amortized cost is O(n^{2+ε} + n^{2ε+ω−ωε}). Setting ε to (ω−2)/(ω−1) yields a sum of O(n^{2+(ω−2)/(ω−1)}).

If the graph is sparse and the total number of edges m is less than n^{1.54}, then every pair of vertices joined by a path which runs through some B_i can be found by using depth-first search to determine all vertices which can reach big component B_i, and another depth-first search to determine all vertices which can be reached from B_i. To do this for each B_i costs O(m) per B_i, or O(mn^{1−ε}) in total. The total cost per update is O(n^{2+ε} + mn^{1−ε}), or O(n^{2+log(m/n)/(2 log n)}) for ε = log(m/n)/(2 log n).

To efficiently insert the initial edges, the insertion routine finds an approximate vertex cover on edges which are ready for insertion, i.e., the endpoints of each edge lie in different big components. The rest are marked. The marked edges are inserted as they become ready, by the deletion algorithm.
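For reference, a sketch of the standard maximal-matching style 2-approximation to vertex cover that can be used for this step (the function name is ours; [3] describes the same approach):

def approx_vertex_cover(edges):
    # Repeatedly pick an uncovered edge and put both of its endpoints into the cover.
    # The chosen edges form a matching, so the cover is at most twice the optimum.
    cover = set()
    for (u, w) in edges:
        if u not in cover and w not in cover:
            cover.add(u)
            cover.add(w)
    return cover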

To efficiently insert a set of edges, both algorithms find a vertex cover on such edges, whose size is no more than twice that of the optimal vertex cover [3]. To analyze the cost of initialization, we observe that the costs attributable to initial edges are not realized until some time after updates have occurred. Since our vertex cover algorithm produces vertex covers no larger than twice the size of the optimal vertex cover, it suffices to reason about the optimal vertex cover. In addition, since we are only concerned with calls to Insert() required solely for the insertion of initial edges, it suffices to analyze the deletions-only graph consisting of the initial edges. Now it could happen that initial edges are inserted with the same call to Insert() as some "newer" marked edges, but these insertions can be considered free, as the insertion routine would have been called anyway, regardless of whether the initial edges were there.

It is easy to see that in the worst case, all edges are initially marked, and big strongly connected components repeatedly split in half until there are no big components left. After the first split, all edges between the two resulting big components must be inserted. Clearly the minimal vertex cover required to perform the insertions has size n/2. Indeed, at each level of splitting, the minimal vertex cover required to perform the insertions at that level has size n/2. It follows that in the worst case, the sum of the sizes of the vertex covers required to insert the initial edges is (n/2) lg n. Therefore, in the worst case, we require Θ(n lg n) calls to Insert() to insert the initial marked edges into G′. Thus the total cost of the initialization is O(n^{2+ε}(n lg n)), which when amortized over an update sequence of length Ω(n lg n), costs O(n^{2+ε}) per update.

References

[1] E. Cohen, "Size-Estimation Framework with Applications to Transitive Closure and Reachability", J. of Computer and System Sciences 55, 1997, pp. 441-453.
[2] D. Coppersmith and S. Winograd, "Matrix multiplication via arithmetic progressions", Journal of Symbolic Computation 9, 1990, pp. 251-280.
[3] T. Cormen, C. Leiserson and R. Rivest, Introduction to Algorithms, MIT Press, 1990.
[4] D. Kozen, The Design and Analysis of Algorithms, Springer-Verlag, 1992, pp. 26-27.
[5] S. Even and Y. Shiloach, "An On-Line Edge-Deletion Problem", J. ACM 28, 1981, pp. 1-4.
[6] J. Feigenbaum and S. Kannan, "Dynamic Graph Algorithms", in Handbook of Discrete and Combinatorial Mathematics, pp. 583-591.
[7] M. L. Fredman and M. Rauch Henzinger, "Lower Bounds for Fully Dynamic Connectivity Problems in Graphs", Algorithmica 22, 1998, pp. 351-362.
[8] M. R. Henzinger and V. King, "Fully Dynamic Biconnectivity and Transitive Closure", FOCS, 1995, pp. 664-672.

[9] M. R. Henzinger and M. Thorup, "Sampling to Provide or to Bound: With Applications to Fully Dynamic Graph Algorithms", Random Structures and Algorithms 11, 1997, pp. 369-379.
[10] J. Holm, K. de Lichtenberg and M. Thorup, "Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity", Proc. 30th Symp. on Theory of Computing, 1998, pp. 79-89.
[11] T. Ibaraki and N. Katoh, "On-line computation of transitive closure of graphs", Information Processing Letters, 1983, pp. 95-97.
[12] X. Huang and V. Y. Pan, "Fast Rectangular Matrix Multiplication and Applications", Journal of Complexity 14, 1998, pp. 257-299.
[13] G. F. Italiano, "Amortized efficiency of a path retrieval data structure", Theoretical Computer Science 48, 1986, pp. 273-281.
[14] G. F. Italiano, "Finding Paths and Deleting Edges in Directed Acyclic Graphs", Information Processing Letters, 1988, pp. 5-11.
[15] S. Khanna, R. Motwani and R. Wilson, "Graph Certificates and Lookahead in Dynamic Directed Graph Problems, with Applications", Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, 1996.
[16] H. La Poutre and J. van Leeuwen, "Maintenance of transitive closure and transitive reduction of graphs", Proc. Workshop on Graph-Theoretic Concepts in Computer Science, LNCS 314, Springer-Verlag, Berlin, 1987, pp. 106-120.
[17] V. Strassen, "Gaussian elimination is not optimal", Numerische Mathematik 14, 1969, pp. 354-356.
[18] D. M. Yellin, "Speeding up dynamic transitive closure for bounded degree graphs", Acta Informatica 30, 1993, pp. 369-384.
