JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 35, 104–109 (1996)
ARTICLE NO. 0073
Embedding Large Complete Binary Trees in Hypercubes with Load Balancing
KEMAL EFE
Center for Advanced Computer Studies, University of Southwestern Louisiana, Lafayette, Louisiana 70504
This paper presents two methods for embedding arbitrarily large complete binary trees in fixed size hypercubes. The ability to embed arbitrarily large graphs in smaller graphs has important applications in massively parallel computing. The presented embedding methods are optimized mainly for balancing the processor loads, while minimizing dilation and congestion as far as possible.
© 1996 Academic Press, Inc.
0743-7315/96 $18.00 Copyright 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.
1. INTRODUCTION
This paper presents methods for embedding large complete binary trees in fixed size hypercubes. If implemented in a compiler, such embeddings allow the programmer to view the system as a tree machine with no bound on its size.
Due to its significant practical applications, the problem of embedding trees in hypercubes has received much attention in the literature. For example, many researchers [1, 5, 6, 11, 15] have presented methods for embedding complete binary trees into their like-size hypercubes. Ho and Johnsson investigated spanning trees and balanced spanning trees in the hypercube [7]. Bhatt et al. [2, 3] investigated embedding of arbitrary trees and graphs with O(1) separators, and Monien and Sudborough [13] investigated the embedding of incomplete binary trees in their smallest size hypercubes with enough nodes.
The emphasis of this paper is different from the above works, in that we address optimal embeddings of arbitrary size complete binary trees in fixed size hypercubes. Our main focus is on how to balance the load of embedding while minimizing dilation and congestion (these terms will be defined in the next section). We are motivated by the fact that a given parallel computer has a fixed number of processors by its physical design, while the size of the tree being embedded can vary depending on the application. The mapping should be optimal for all problem and system sizes.
The problem of embedding large graphs in smaller hypercubes has been investigated previously [4, 8, 10, 12], and embedding methods have been developed for various structures. However, there is no paper that shows how to embed arbitrary size complete binary trees into fixed size hypercubes with load balancing.
In the next section we present basic definitions and notations where the criterion of optimality is defined and related to the concept of "normal" algorithms. In Section 3 we present an optimal embedding method that balances the processor loads. In Section 4 we present a nonoptimal embedding method which has the advantage of simplicity. In Section 5 we present our concluding remarks.
2. DEFINITIONS AND MOTIVATIONS
Emulation of a "guest" graph G by a "host" graph H means performance of the steps of an algorithm developed for the guest on the processors of the host. The major cost measure in emulating one graph by another is the "slowdown." This is the ratio of the emulation time on the host to the running time on the guest. The slowdown of emulation depends on the following cost parameters:
Load. The maximum number of nodes of G mapped to any node of H.
Dilation. The length of the longest path in H to which any edge of G is mapped.
Congestion. The maximum number of paths (images of the edges of G) which share an edge of H.
If G is not larger than H, and if the embedding is isomorphic, then there is no slowdown and the algorithm running on the system cannot tell the difference between G and H from the viewpoint of the above cost measures.
When the guest graph is larger than the host graph, the load of embedding must be at least $V_G/V_H$, where $V_G$ and $V_H$ are the numbers of nodes in the guest and host, respectively. If the load is $V_G/V_H$, then the load of embedding is balanced. In this case, the slowdown of emulation is $S = O(V_G/V_H)$ if dilation and congestion are O(1).
When we pay special attention to the common requirements of algorithms developed for the guest architecture, it may be possible to remove the big-oh notation from the slowdown expressions, and define strict requirements for optimality of emulations. This is possible because most parallel architectures have their corresponding favorite computation types for which they are topologically well suited.
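The three cost parameters above can be computed mechanically for any concrete embedding into a hypercube. The following is a minimal sketch, not part of the original paper: the function names `hamming`, `bit_fixing_path`, and `embedding_costs` are hypothetical, and it assumes that a guest edge whose endpoints are not adjacent in the host is routed by correcting the differing bits from lowest to highest, which is only one of several possible routings.

```python
from collections import Counter

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hypercube labels differ."""
    return bin(a ^ b).count("1")

def bit_fixing_path(a: int, b: int) -> list:
    """Host path from a to b, flipping differing bits from lowest to highest.

    This routing rule is an assumption made for illustration only.
    """
    path, cur, diff, bit = [a], a, a ^ b, 0
    while diff:
        if diff & 1:
            cur ^= 1 << bit
            path.append(cur)
        diff >>= 1
        bit += 1
    return path

def embedding_costs(guest_edges, mapping):
    """Return (load, dilation, congestion) for a guest-to-hypercube mapping.

    guest_edges: iterable of (g1, g2) pairs of guest nodes
    mapping:     dict from guest node to hypercube label (int)
    """
    load = max(Counter(mapping.values()).values())
    dilation = 0
    usage = Counter()  # how many guest edges use each host edge
    for g1, g2 in guest_edges:
        path = bit_fixing_path(mapping[g1], mapping[g2])
        dilation = max(dilation, len(path) - 1)
        for a, b in zip(path, path[1:]):
            usage[frozenset((a, b))] += 1
    congestion = max(usage.values(), default=0)
    return load, dilation, congestion

# A 4-node path folded onto the 1-dimensional hypercube (two processors).
path_edges = [(0, 1), (1, 2), (2, 3)]
folded = {0: 0, 1: 0, 2: 1, 3: 1}
print(embedding_costs(path_edges, folded))
```

In the folded example the load equals $V_G/V_H = 4/2 = 2$, so the load is balanced in the sense defined above.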
For complete binary trees, a common pattern of computation has been observed by many researchers. This pattern is termed "normal" by Ullman [14] (such algorithms are sometimes called "leveled" in other papers). Normal algorithms running on the binary tree architecture use only one level of the tree nodes at a time. Also, communication is between adjacent levels of tree nodes. This pattern covers all semigroup computations, prefix computations, searching, and many other divide-and-conquer algorithms. The consideration of normal algorithms allows us to identify the subset of busy tree nodes for each step of computation and to balance the busy workload assigned to each hypercube node for every step of emulation.
Normalized Embedding. If an embedding is specifically optimized for the needs of normal algorithms running on complete binary trees, then we call this embedding "normalized." These embeddings must satisfy the following requirements:
(a) the nodes at each level of the tree must be distributed equally between the nodes of the hypercube;
(b) the dilation of the embedding must be unit; and
(c) the actual congestion of the embedding must be unit.
Actual congestion is the congestion which occurs during a normal computation, and it is different from the graph theoretical definition above. The graph theoretical definition of congestion is an upper bound on the actual congestion. Depending on the communication patterns of a computation, the actual congestion may be less.
3. AN OPTIMAL EMBEDDING METHOD
Consider a complete binary tree $T_r$, where r is the number of levels. Assume that the levels of the tree are indexed as $0, \ldots, r-1$ starting from the root. In this paper, we say "higher" levels to refer to the levels of the tree near the root. "Lower" levels refer to the levels near the leaves. It is well known that $T_r$ is not a subgraph of the r-dimensional hypercube $Q_r$, but that if we modify the tree so that it has two roots as shown in Fig. 1, the resulting graph is a subgraph of the hypercube. This graph is frequently referred to as the "double-rooted tree" in the literature. For a double-rooted tree, which we denote as $DT_r$, the indexing of levels is the same as that of $T_r$, except that level 0 has two nodes. In many of the papers cited above, it is convincingly argued that $DT_r$ can emulate $T_r$ with negligible overhead, since only one edge of $T_r$ maps to a distance of 2. We therefore assume in this paper that emulating $DT_r$ on the hypercube is a desirable alternative to emulating $T_r$.
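One standard way to see why $T_r$ cannot be a subgraph of $Q_r$ is a two-coloring count: the hypercube is bipartite with $2^{r-1}$ nodes of each color, so any subgraph must fit inside those two classes. The sketch below is an illustration rather than the argument used in the cited papers; the function names are hypothetical, and the structure assumed for $DT_r$ (two adjacent roots, each carrying one complete binary subtree with r − 1 levels) is read off the description above. The check is only a necessary condition, and it rules out $T_r$ for r ≥ 3.

```python
def color_classes_T(r: int):
    """Sizes of the two color classes of T_r (level i holds 2**i nodes).

    Nodes on even levels get one color, nodes on odd levels the other.
    """
    even = sum(2**i for i in range(0, r, 2))
    odd = sum(2**i for i in range(1, r, 2))
    return even, odd

def color_classes_DT(r: int):
    """Sizes of the two color classes of the double-rooted tree DT_r.

    Assumed structure: roots u and v are adjacent, and each root has a
    single child that is the root of a complete binary tree with r-1 levels.
    """
    sizes = [1, 1]  # root u has color 0, root v has color 1
    for root_color in (0, 1):
        for depth in range(1, r):  # depth below the root
            sizes[(root_color + depth) % 2] += 2 ** (depth - 1)
    return tuple(sizes)

def fits_bipartition(classes, r: int) -> bool:
    """Necessary (not sufficient) condition for being a subgraph of Q_r."""
    return max(classes) <= 2 ** (r - 1)

for r in range(3, 7):
    print(r, color_classes_T(r), fits_bipartition(color_classes_T(r), r),
          color_classes_DT(r), fits_bipartition(color_classes_DT(r), r))
```

For every r the double-rooted tree splits into two classes of exactly $2^{r-1}$ nodes, which is consistent with it being a spanning subgraph of $Q_r$.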
FIG. 1. Double rooted tree.
FIG. 2. Construction of the normalized tree.
The following theorem is one of the two main results of this paper.
THEOREM 1. There exists a normalized embedding of $DT_{r+k}$ in $Q_r$.
Proof. Our approach for proving this theorem is as follows. First, we construct a new graph, which we call a "normalized tree" of order r (or $NT_r$ for short). Then we show that there exists a normalized embedding of $DT_{r+k}$ on $NT_r$ with unit dilation and unit congestion. Finally, we show that $NT_r$ can be embedded in $Q_r$ with unit dilation and unit actual congestion.
$NT_r$ is constructed recursively as follows. As a basis, $NT_2$ is obtained from $DT_2$ by adding two dotted edges from the leaves to the two roots (see Fig. 2a). Suppose $NT'_{r-1}$ is a copy of $NT_{r-1}$ (see Fig. 2b). Let u, v denote the two roots of $NT_{r-1}$, with x and y as the roots of their subtrees. Similarly, for $NT'_{r-1}$, we have the vertices u', v', x', y' with the corresponding roles. We construct $NT_r$ from these graphs by adding the edges (u, u'), (v, x'), (v', x) and deleting the edges (u, x) and (u', x'), as shown in Figs. 2c and 2d. Figure 3 shows $NT_3$ and $NT_4$ as examples.
Let u be a leaf vertex in $NT_r$. We say that the vertex v is the parent of u if and only if there is a dotted edge (u, v). Note that for half of the leaf nodes, the parent is the same node as in the traditional definition of parent for binary trees. For the other half of the leaves, the parent is an internal node. For the proof of the above theorem, we need the following fact.
LEMMA 1. If u is a vertex in $NT_r$, then u either is a leaf or is the parent of a leaf. Moreover, no two leaves have the same parent.
Proof. The claim is true for $r \le 4$ by inspection of Fig. 3. For $r > 4$, suppose the two copies of $NT_{r-1}$ used for constructing $NT_r$ satisfy the claim. Then $NT_r$ also satisfies the claim, since no dotted edges are deleted during its construction from two copies of $NT_{r-1}$. ∎
We can now easily show that there is a normalized mapping from the nodes of $DT_{r+k}$ to the nodes of $NT_r$. The mapping of the highest r levels of $DT_{r+k}$ is straightforward: every edge (u, v) is mapped to its corresponding (solid) edge in $NT_r$. This leaves $2^r$ subtrees (two below each of the $2^{r-1}$ nodes at level $r-1$), each with height $k-1$, yet to be assigned. Let u be a leaf node in $NT_r$. Also let u' be the corresponding node of $DT_{r+k}$ which has already been assigned to u, with x and y as its two children (see Fig. 4a). Assign the subtree rooted at x to u, and the subtree rooted at y to the parent of u. From Lemma 1 we know that each subtree is assigned to a distinct node of $NT_r$. Figure 4b shows the embedding of a large binary tree in the 8-node normalized tree. It is clear that the dilation and congestion of this embedding are unit. Moreover, for normal algorithms, every node of $NT_r$ is equally busy while emulating the work of the levels below $r-1$ of $DT_{r+k}$. To complete the proof of the theorem it remains to show the following.
LEMMA 2. $NT_r$ can be embedded in $Q_r$ with dilation 1 and congestion 2 such that the congestion of 2 occurs only where a solid edge and a dotted edge connect the same pair of nodes.
Remark. Note that solid edges and dotted edges are not used at the same step of a normal computation. Thus, the actual congestion of the emulation is unit.
Proof. We prove the lemma by constructing the embedding recursively.
Let diff(z, t) be a function which returns the differing bit position between the adjacent vertices z and t of a hypercube. As shown in Fig. 5a, $NT_2$ can be embedded in $Q_2$ as claimed. For $r > 2$, assume that $NT_{r-1}$ is a subgraph of $Q_{r-1}$, with u and v as the two roots and x and y as the roots of their subtrees. Let i = diff(u, x) and j = diff(u, v). Note from the basis case of Fig. 5a that the bit values at the ith and jth bit positions of u are both zero. Our construction method relies on this property and preserves it at every step of the recursion.
First, let $NT'_{r-1}$ be the copy in $Q'_{r-1}$. If u', v' are the two roots of $NT'_{r-1}$, with x' and y' as the roots of their subtrees, then again the ith and jth bit positions of u' are both zero, where i = diff(u', x') and j = diff(u', v'). If we exchange the bits at positions i and j for every vertex in $NT'_{r-1}$, the root u' will remain at its original address. However, this bit exchange will bring v' to the same address as x, and bring x' to the same address as v. Now if we connect $Q_{r-1}$ and $Q'_{r-1}$ to obtain $Q_r$ (prefixing every vertex label in $Q_{r-1}$ with 0, and those in $Q'_{r-1}$ with 1), we observe two cycles. One of these cycles contains $\langle u, u', v', x \rangle$, and the other contains $\langle u, u', x', v \rangle$. By adding the edges (u, u'), (x, v'), (v, x') and deleting the edges (u, x), (u', x'), we obtain the desired $NT_r$. After this step, the new labels are computed as $u \to u$, $x \to v$, $v \to u'$, $y \to v'$, which implies that $i \to j$ and $j \to r$, and it follows that the vertex u continues to have its ith and jth bits equal (both zero). Figure 5c shows the embedding of a 16-node normalized tree in the hypercube. This completes the proof of Lemma 2 and the proof of Theorem 1. ∎
FIG. 3. Normalized trees of height 2, 3, and 4.
FIG. 4. Embedding of a large binary tree in the normalized tree.
4. A NONOPTIMAL METHOD
In this section we consider a nonoptimal method. This method is interesting because it is simple and easy to implement. The approach uses a well-known embedding method [3, 9], based on assigning the labels $0, \ldots, 2^r - 1$ to the nodes of a complete binary tree by tracing the nodes in inorder sequence. Figure 6 shows the assignment of labels 0000, ..., 1111, where it is assumed that the tree is the left subtree of an auxiliary root node with label 1111. The labels assigned this way to the tree nodes indicate which hypercube nodes they are mapped to. Note that, since an internal vertex u and its right descendant differ in two bit positions, there are exactly two paths of length 2 between the corresponding nodes of the hypercube. One is through the left descendant of u, and the other is through the parent of u. In both cases, we must incur a congestion of 2. For the purposes of this section, we assume that the edge from a node u to its right child is routed through the parent of u.
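The bit-distance property of the inorder labeling can be checked directly. The sketch below is illustrative only and uses hypothetical function names; it assumes the labeling described above, in which the tree with r levels receives the labels 0 through $2^r - 2$ in inorder (the label $2^r - 1$ being reserved for the auxiliary root), so that a node at height h above the leaves has its children at label ± $2^{h-1}$.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two labels differ."""
    return bin(a ^ b).count("1")

def inorder_edges(r: int):
    """Yield (parent, left, right) labels of the inorder-labeled tree with r levels.

    The root has label 2**(r-1) - 1 and height r-1; leaves have height 0.
    """
    stack = [((1 << (r - 1)) - 1, r - 1)]
    while stack:
        node, height = stack.pop()
        if height == 0:
            continue  # leaves have no children
        step = 1 << (height - 1)
        left, right = node - step, node + step
        yield node, left, right
        stack.extend([(left, height - 1), (right, height - 1)])

def check_inorder_embedding(r: int) -> bool:
    """True if every left child is at distance 1 and every right child at distance 2."""
    return all(hamming(p, l) == 1 and hamming(p, rt) == 2
               for p, l, rt in inorder_edges(r))

for parent, left, right in inorder_edges(3):
    print(f"{parent:03b}: left {left:03b}, right {right:03b}")
print(check_inorder_embedding(4))
```

For the 15-node tree of Fig. 6 the root 0111 has left child 0011 (one differing bit) and right child 1011 (two differing bits), which is the dilation-2 case discussed above.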
FIG. 5. Embedding of the normalized tree in the hypercube.
Before discussing how to extend this method for embedding large trees in fixed size hypercubes, we present a new operation defined on hypercubes as follows:
Contracting the r-dimensional hypercube at dimension i means joining each vertex u with its dimension-i neighbor. The resulting hypercube contains half as many nodes as before, and the ith bit position of each vertex label becomes the don't care symbol "x." This is illustrated in Fig. 7. We can now give the main result for the inorder embedding method.
FIG. 6. Inorder labeling of a complete binary tree induces an embedding in the hypercube.
FIG. 7. Contracting the three-dimensional hypercube of (a) in dimension 2 (b), dimension 1 (c), and dimension 0 (d).
THEOREM 2. For the embedding of a complete binary tree in a hypercube by inorder labeling, the load of each processor can be doubled by contracting the hypercube at the lowest dimension. In the resultant smaller hypercube each leaf will be at distance 0 or 1 from its parent. The internal nodes will be at the same distance from their parents as before contraction.
Proof. For the purposes of this proof, assume that the levels of the tree are numbered starting from the leaves, with leaves at level 0. From the construction of the inorder embedding, observe that the binary label of each left child at level i differs from the label of its parent at dimension i, and the label of each right child differs from that of its parent at dimensions i and $i+1$. Moreover, for every node at level i the binary label differs from that of its inorder successor at dimension i. Then it immediately follows that if we contract the hypercube (with the tree embedded in it) at the lowest dimension, we have a one-to-one mapping of $2^{r-1}$ leaves onto $2^{r-1}$ internal nodes. In particular, each leaf is mapped to its inorder successor. ∎
FIG. 8. Embedding a large tree in a small hypercube by repeated contractions. Dotted lines show the new edges created by the contraction.
To embed a large binary tree in a smaller hypercube, we can start with a unit load embedding of the required size and repeatedly apply the above theorem, contracting the hypercube at the next lowest dimension. Figure 8 shows this process starting with a 16 node binary tree. The following fact implies that, when emulating a normal algorithm, each node of the hypercube will be equally busy for every step of emulation.
COROLLARY 1. The embedding method of Theorem 2 distributes the leaves uniformly between the nodes of the hypercube. If repeated k times, this method yields a balanced distribution of the tree nodes for each of the lowest k levels of the tree.
The proof follows from the fact that each contraction step maps the leaves to their inorder successors.
The disadvantages of this method are that (a) every edge from an internal node to its right descendant is mapped with dilation 2, and (b) the congestion of the embedding is 2.
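As an illustration of Theorem 2 and Corollary 1, the following sketch models k successive contractions at dimensions 0, ..., k − 1 as discarding the k lowest bits of each label, which is one way to represent the don't care positions and is an assumption of this example. The function names are hypothetical. Levels are counted from the leaves, as in the proof, so the level of an inorder label is the index of its lowest zero bit; the label $2^r - 1$ of the auxiliary root is included.

```python
from collections import Counter

def level_from_leaves(label: int) -> int:
    """Level of an inorder-labeled node, counted from the leaves (leaves are 0).

    This is the index of the lowest zero bit of the label.
    """
    level = 0
    while label & 1:
        label >>= 1
        level += 1
    return level

def contract(label: int, k: int) -> int:
    """Host node after contracting dimensions 0..k-1 (lowest k bits ignored)."""
    return label >> k

def per_level_load(r: int, k: int) -> dict:
    """Map each tree level to a Counter of host node -> tree nodes of that level."""
    loads = {}
    for label in range(2 ** r):  # includes the auxiliary root 2**r - 1
        level = level_from_leaves(label)
        loads.setdefault(level, Counter())[contract(label, k)] += 1
    return loads

def balanced_levels(r: int, k: int) -> list:
    """Levels whose nodes are spread equally over all 2**(r-k) host nodes."""
    hosts = 2 ** (r - k)
    return [level for level, counts in sorted(per_level_load(r, k).items())
            if len(counts) == hosts and len(set(counts.values())) == 1]

# 16 tree nodes (Fig. 8 setting) contracted twice onto a 4-node hypercube.
print(balanced_levels(4, 2))
```

With r = 4 and k = 2 every host node receives $2^k = 4$ tree nodes in total, and the two lowest levels are each spread evenly over the four host nodes, as the corollary states.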
However, the actual congestion will still be unit for a normal computation.
5. CONCLUDING REMARKS
We presented two methods for embedding arbitrarily large complete binary trees in fixed size hypercubes. The first method is more complex to implement but it is optimal, while the second method is easier to implement but is nonoptimal. An occasional user may find it easier to implement the nonoptimal method. If the first method is implemented in a compiler, it may be used by any application program without the user having to define the mapping.
REFERENCES
1. Bhatt, S. N., and Ipsen, I. C. F. How to embed trees in hypercubes. Rep. YALE/DCS/RR-43, Department of Computer Science, Yale University, 1985.
2. Bhatt, S. N., Chung, F., Leighton, T., and Rosenberg, A. Optimal simulation of tree machines. Proceedings of the 1986 27th Annual Symposium of Foundations of Computer Science. Pp. 274–282.
3. Bhatt, S. N., Chung, F. R. K., Leighton, F. T., and Rosenberg, A. L. Efficient embeddings of trees in hypercubes. SIAM J. Comput. 21, 1 (Feb. 1992), 151–162.
4. Chan, T. F., and Saad, Y. Multigrid algorithms on the hypercube multiprocessor. IEEE Trans. Comput. 35, 11 (Nov. 1986), 969–977.
5. Desphande, S. R., and Jenevein, R. Scalability of binary trees on a hypercube. Proc. 1986 International Conference on Parallel Processing. Pp. 661–668.
6. Efe, K. Embedding mesh of trees in the hypercube. J. Parallel Distrib. Comput. 11, 3 (Mar. 1991), 222–230.
7. Ho, C-T., and Johnsson, S. L. Spanning balanced trees in Boolean cubes. SIAM J. Sci. Statist. Comput. 10, 4 (July 1989), 607–630.
8. Ho, C. T., and Johnsson, S. L. Embedding hyper-pyramids into hypercubes. IBM J. Res. Dev. 38, 1 (Jan. 1994), 31–45.
9. Johnsson, S. L. Communication efficient basic linear algebra computations on hypercube architectures. J. Parallel Distrib. Comput. 4, 2 (Apr. 1987), 133–172.
10. Koch, R., Leighton, T., Maggs, B., Rao, S., and Rosenberg, A. Work-preserving emulations of fixed-connection networks. Proc. 21st Annual ACM Symposium on Theory of Computing. 1989, pp. 227–240.
11. Leiss, I. L., and Reddy, H. N. Embedding complete binary trees into hypercubes. Inform. Process. Lett. 38 (May 1991), 197–199.
12. Miller, Z., and Sudborough, I. H. Compressing grids into small hypercubes. Networks 24 (1994).
13. Monien, B., and Sudborough, I. H. Simulating binary trees on hypercubes. Proc. 3rd Aegean Workshop on Computing: VLSI Algorithms and Architectures, Corfu, Greece, Lecture Notes in Computer Science, Vol. 319. Springer-Verlag, Berlin/New York, 1988, pp. 170–180.
14. Ullman, J. D. Computational Aspects of VLSI. Comput. Sci. Press, New York, 1984.
15. Wu, A. Y. Embedding of tree networks into hypercubes. J. Parallel Distrib. Comput. 2, 3 (Aug. 1985), 238–249.
Received July 11, 1994; revised February 22, 1996; accepted February 26, 1996
KEMAL EFE received the B.Sc. in electronic engineering from Istanbul Technical University, the M.S. in computer science from the University of California, Los Angeles, and the Ph.D. in computer science from the University of Leeds. He is currently an associate professor of computer science in the Center for Advanced Computer Studies, University of Southwestern Louisiana, Lafayette. His research interests are in parallel architectures and algorithms, interconnection networks, distributed operating systems, performance evaluation, and algorithms for loosely coupled workstation networks. Dr. Efe has served on the technical committees of several conferences and has given invited talks in the U.S. and Europe. In 1995, he received the certificate of recognition from NASA for his research contributions. He is a member of the ACM and the IEEE.