Area Minimization for Hierarchical Floorplans

Peichen Pan†, Weiping Shi‡, and C. L. Liu†

† Department of Computer Science, University of Illinois at Urbana-Champaign

‡ Department of Computer Science, University of North Texas

Abstract

Two results are presented in this paper. First, we settle the long-standing open problem on the complexity of the area minimization problem for hierarchical floorplans by showing that the problem is NP-complete. We then present a fast pseudo-polynomial area minimization algorithm for an important class of hierarchical floorplans called hierarchical floorplans of order-5. This algorithm is based on a new algorithm for determining the set of nonredundant realizations of a wheel. The new algorithm for wheels has time cost $O(k^2 \log k)$ and space cost $O(k^2)$ if each of the (five) blocks in a wheel has at most $k$ realizations, and it achieves a factor-$k$ reduction in both time and space costs compared to previous algorithms. The area minimization algorithm was implemented, and our experimental results show that it is very fast in practice.


1 Introduction

Area minimization (optimization) is an important subtask in the floorplanning phase of the VLSI chip design process. Once the floorplan, which specifies the relative positions of the subcircuits on a chip, has been determined, area minimization is the problem of selecting a layout alternative for each subcircuit so as to minimize the total chip area. This problem has been studied extensively [1, 6, 7, 8, 9, 11, 12, 13]. The difficulty of the area minimization problem is closely related to the topological structure of the floorplans being considered. On the one hand, Stockmeyer showed that the problem is strongly NP-complete for general (non-hierarchical) floorplans [9]. On the other hand, Otten and Stockmeyer proposed a polynomial-time algorithm for a special kind of hierarchical floorplan called slicing floorplans [7, 9]. In practice, many floorplans are constructed either by top-down partitioning or by bottom-up clustering. In each step, a group of subcircuits is partitioned into at most $p$ subgroups for some positive integer $p$. (Often, not all patterns of partitioning a group into $p$ or fewer subgroups are allowed; instead, only a selected set of partitioning patterns is allowed in order to reduce the size of the search space in floorplan design [6].) A floorplan obtained this way is called a hierarchical floorplan (of order-$p$). Since the cost of the partitioning (or clustering) process increases very rapidly as $p$ increases, $p$ is usually confined to a small integer such as $p \le 5$ [2, 5]. A hierarchical floorplan (of order-$p$) can be described by a $p$-ary tree that corresponds to the partitioning process (note that $p$ is a constant)¹. In this paper we study the area minimization problem for hierarchical floorplans. Of particular interest are hierarchical floorplans of order-5, since they are the hierarchical floorplans commonly used in practice. The complexity of the area minimization problem for hierarchical floorplans was a long-standing open problem. In this paper we first show that the area minimization problem for hierarchical floorplans of order-5 is NP-complete. This result further implies that the area minimization problem for hierarchical floorplans constructed by any set of partitioning patterns that contains one or more nonslicing patterns is NP-complete.

¹ Although a hierarchy can always be extracted from a non-hierarchical floorplan [1], the corresponding tree will in general have unbounded degree. In the extreme case, the whole floorplan forms just one partition.


In [11], Wang and Wong proposed an area minimization algorithm for hierarchical floorplans of order-5. Their algorithm is based on an extension of the technique in [9] to L-shaped modules. In this paper we present a new algorithm for this problem with better time and space costs. The time cost of the new algorithm is pseudo-polynomial, although it can, of course, be exponential in the worst case. (Thus, the area minimization problem for hierarchical floorplans of order-5 is only weakly NP-complete, as opposed to the corresponding problem for non-hierarchical floorplans, which is strongly NP-complete.)

The remainder of this paper is organized as follows. In Section 2 we introduce some definitions and give a precise description of the problem. In Section 3 we show the NP-completeness of the problem. Section 4 presents our new area minimization algorithm. Section 5 lists our experimental results. Finally, Section 6 concludes this paper.

2 Preliminaries

A floorplan is a dissection of an enveloping rectangle by horizontal and vertical line segments into rectangular (basic) blocks such that no four blocks share a common point. A slice is a floorplan with two blocks. There are only two different slices, the vertical slice and the horizontal slice, as shown in Figure 1. A floorplan is slicing if it can be constructed, starting with a single block, by recursively partitioning a block into a slice. A wheel is a nonslicing floorplan with five blocks. There are only two different wheels, the left wheel and the right wheel, as shown in Figure 2. A floorplan is said to be hierarchical of order-5 if it can be constructed, starting with a single block, by recursively partitioning a block into a slice or a wheel². Figure 3 shows an example illustrating the construction process of a hierarchical floorplan of order-5.

Figure 1: (a) Vertical slice, (b) Horizontal slice.

² Since wheels are the smallest nonslicing partitioning patterns, all other partitioning patterns with five or fewer blocks can be decomposed into slices. This is why the other partitioning patterns are ignored in this definition.


Figure 2: (a) Left wheel, (b) Right wheel.

Figure 3: Construction of an order-5 floorplan.

A layout alternative of a subcircuit is called a realization of the corresponding block in the floorplan. If we select a realization for each block and arrange them according to the floorplan, we obtain a realization of the floorplan. In this paper we only consider the case in which the shapes of the blocks and the floorplans are rectangular. Thus, a realization (of a block or a floorplan) has two dimensions, the width and height of the smallest rectangle that can accommodate it. The width of a realization of a floorplan can be determined by computing the length of the longest path in an edge-weighted DAG constructed from the floorplan, and the height can be determined in the same way [5, 8]. The area minimization problem can be formally defined as follows: given a floorplan $F$ and a set of realizations for each of its (basic) blocks, determine a realization of $F$ that has the minimum area. In this paper we are interested in the case in which $F$ is a hierarchical floorplan. We also assume that the width and height of any realization of a block are nonnegative integers. The following is the decision version of this problem when $F$ is a hierarchical floorplan of order-5.

AREAMIN-5

Instance: A hierarchical floorplan of order-5, $F$, a set of realizations for each of its basic blocks, and an integer $A$.

Question: Does $F$ have a realization with area less than or equal to $A$?

Let $r$ be a realization (of a block or a floorplan). We use $w(r)$ to denote the width and $h(r)$ to denote the height of $r$. When the exact composition of $r$ does not matter, or when $r$ is a realization of a block, we simply represent $r$ by $(w(r), h(r))$. For two realizations $r_1$ and $r_2$, $r_1$ is said to dominate $r_2$ if $w(r_1) \le w(r_2)$ and $h(r_1) \le h(r_2)$³. For a set of realizations, a realization in the set is said to be nonredundant if it is not dominated by any other realization in the set; otherwise, the realization is said to be redundant. A set of nonredundant realizations obviously has the following properties: (i) no two realizations in the set have the same width or the same height, and (ii) if the realizations are sorted in increasing order of one dimension, they are in decreasing order of the other dimension. These properties suggest the following simple algorithm for determining the set of nonredundant realizations of a given set of realizations: sort all realizations in the set in increasing order of width (if two realizations have the same width, delete the one with the larger height), then inspect the realizations in this order one by one and retain a realization only if its height is smaller than that of the last retained realization. We shall use $L(F)$ to denote the set of nonredundant realizations of $F$ (which is a block or a floorplan). We list here two facts about $L(F)$ which are self-evident.

³ Two realizations having the same width and height are viewed as the same.

Fact 2.1 For a floorplan $F$, $L(F)$ contains all the minimum-area realizations of $F$.

Fact 2.2 Suppose $F_1$ is a sub-floorplan⁴ of $F$. Let $F'$ be the floorplan formed by substituting a basic block $B$ for $F_1$ in $F$, and let $L(B) = L(F_1)$. Then $L(F) = L(F')$ as far as the dimensions of the realizations are concerned.
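The sort-and-scan procedure described above for extracting the nonredundant realizations can be sketched in Python as follows. This is an illustration of our own, not code from the paper: realizations are assumed to be `(width, height)` integer pairs, and the function name `nonredundant` is hypothetical.

```python
def nonredundant(realizations):
    """Return the nonredundant (width, height) pairs of a collection.

    The result is sorted by increasing width and strictly decreasing height.
    Identical pairs are treated as one realization (footnote 3).
    """
    result = []
    # Sorting orders pairs by width and, among equal widths, by height,
    # so the shortest realization of each width is seen first.
    for w, h in sorted(set(realizations)):
        # Widths are nondecreasing in this scan, so (w, h) is dominated
        # unless it is strictly shorter than the last realization kept.
        if not result or h < result[-1][1]:
            result.append((w, h))
    return result
```

For example, `nonredundant([(3, 7), (1, 9), (3, 5), (2, 9), (4, 5), (5, 1)])` returns `[(1, 9), (3, 5), (5, 1)]`. The cost is dominated by the sort, i.e. $O(s \log s)$ for $s$ input realizations.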

3 An NP-completeness result

In this section, we show that AREAMIN-5 is NP-complete. Before giving the proof, we first remark on the implication of this result. It was shown in [10] that any nonslicing partitioning pattern contains a wheel-type structure (see Figure 4). In Figure 4, if we assume that all unshaded blocks have only one realization, $(0, 0)$, this pattern is equivalent to a wheel formed by the five shaded blocks as far as the area is concerned. The NP-completeness of AREAMIN-5, therefore, implies that the area minimization problem for hierarchical floorplans constructed by any set of partitioning patterns that contains a nonslicing pattern is NP-complete. In other words, if P ≠ NP, slicing floorplans are the only type of floorplans for which area minimization can be done in polynomial time.

Figure 4: A partially drawn nonslicing pattern.

Now we show that AREAMIN-5 is NP-complete. We first introduce an important lemma.

Lemma 3.1 In the left wheel $W$, suppose $L(B_1) = L(B_3)$, $L(B_2) = L(B_4)$, and $L(B_5) = \{(0, 0)\}$, and moreover, for any $(w, h) \in L(B_1)$ and $(w', h') \in L(B_2)$, $w \le w'$ and $h \ge h'$. Then
$$L(W) \subseteq \{(w + w', h + h') \mid (w, h) \in L(B_1),\ (w', h') \in L(B_2)\}.$$

Proof: Let $r$ be a realization of $W$, and let $(w_i, h_i)$ denote $r(B_i)$ for $1 \le i \le 4$; of course, $r(B_5) = (0, 0)$. Since $w \le w'$ and $h \ge h'$ for any $(w, h) \in L(B_1)$ and $(w', h') \in L(B_2)$, it follows that
$$w(r) = \max\{w_1 + w_2,\ w_4 + w_3\}, \qquad h(r) = \max\{h_1 + h_4,\ h_2 + h_3\}.$$
From these formulas it is easy to see that by letting $(w_1, h_1) = (w_3, h_3) = (w, h)$ and $(w_2, h_2) = (w_4, h_4) = (w', h')$, the resulting $r$ is $(w + w', h + h')$. Now it suffices to prove the following claim, because it shows that every realization $r$ is dominated by (or equal to) $(w_1 + w_4, h_1 + h_4)$ or $(w_2 + w_3, h_2 + h_3)$, both of which belong to the set on the right-hand side.

Claim: If $h(r) = h_1 + h_4$ (resp. $h(r) = h_2 + h_3$), then $w(r) \ge w_1 + w_4$ (resp. $w(r) \ge w_2 + w_3$).

Proof of Claim: By symmetry, we only need to prove the case $h(r) = h_1 + h_4$. Suppose the claim is not true, i.e., $w(r) < w_1 + w_4$. Then
$$w_1 + w_2 \le w(r) < w_1 + w_4 \implies w_2 < w_4 \implies h_2 > h_4,$$
$$w_4 + w_3 \le w(r) < w_1 + w_4 \implies w_3 < w_1 \implies h_3 > h_1.$$
Now we have $h(r) \ge h_2 + h_3 > h_1 + h_4 = h(r)$, an obvious contradiction. □

The problem we reduce from is the following problem, 2-1-PARTITION, which is known to be NP-complete [4].

⁴ A sub-floorplan is a set of blocks which form a rectangular superblock.


2-1-PARTITION

Instance: A list of positive integers $a_1, a_2, \ldots, a_{2n}$, where $n \ge 1$.

Question: Is there an index set $I$ such that $I$ contains exactly one of $2t - 1$ and $2t$ for $1 \le t \le n$, and $\sum_{i \in I} a_i = (a_1 + \cdots + a_{2n})/2$?

In an instance of 2-1-PARTITION, if $2n$ is not a power of 2, we can always append a sufficient number of 1's to the end of the list to make the total number of integers a power of 2. Each appended pair $(1, 1)$ adds exactly 1 to $\sum_{i \in I} a_i$ and exactly 2 to the total, so the modified instance has a yes answer iff the original instance has a yes answer. Therefore, without loss of generality we assume $2n = 2^m$, where $m \ge 1$. For convenience, we introduce the following notation:

$$s(I) = \sum_{i \in I} a_i \quad \text{for } I \subseteq \{1, 2, \ldots, 2n\},$$
$$s_{i,l} = \sum_{l \le t \le l + 2^i - 1} a_t,$$
$$P = s_{m,1} = \sum_{1 \le t \le 2^m} a_t = a_1 + \cdots + a_{2n},$$
$$\mathcal{I}_{i,l} = \{I \mid I \text{ contains exactly one of } l + 2t \text{ and } l + 2t + 1 \text{ for } 0 \le t \le 2^{i-1} - 1\}.$$

We define a sequence of integers $x_i$, $i \ge 1$, as follows:
$$x_1 = P, \qquad x_i = 2x_{i-1} + P \quad (i \ge 2).$$
Solving the recurrence, we obtain $x_i = (2^i - 1)P$. In particular, $x_m = (2^m - 1)P = (2n - 1)P$. We now define a family of floorplans and the sets of realizations for their basic blocks recursively as follows:

(1) $A_{1,l}$, $1 \le l \le 2^m - 1$, is the one-block floorplan shown in Figure 5 with $L(B_0) = \{(x_1 + a_l, x_1 + a_{l+1}), (x_1 + a_{l+1}, x_1 + a_l)\}$.

Figure 5: $A_{1,l}$.

Figure 6: (a) Construction of $A_{i,l}$, (b) The left wheel equivalent to $A_{i,l}$.

(2) $A_{i,l}$, $2 \le i \le m$ and $1 \le l \le 2^m - 2^i + 1$, is recursively defined by $A_{i-1,l}$ and $A_{i-1,l+2^{i-1}}$ as shown in Figure 6(a). Here, $B_1$, $B_2$, $B_3$, $B_4$, and $B_5$ are basic blocks with $L(B_1) = L(B_3) = \{(1, P)\}$, $L(B_2) = L(B_4) = \{(P, 1)\}$, and $L(B_5) = \{(0, 0)\}$.

Let us now count the number of basic blocks in $A_{i,l}$, which we denote by $b_i$. We have the following recurrence for $b_i$:
$$b_1 = 1, \qquad b_i = 4b_{i-1} + 5 \quad (i \ge 2).$$
Solving this recurrence, we obtain $b_i = \frac{8}{3} \cdot 4^{i-1} - \frac{5}{3}$. In particular, since $4^{m-1} = (2^m)^2/4 = n^2$, we have $b_m = \frac{8}{3} n^2 - \frac{5}{3}$. We have the following result concerning $A_{i,l}$.

Lemma 3.2 $L(A_{i,l}) = \{(x_i + s(I),\ x_i + s_{i,l} - s(I)) \mid I \in \mathcal{I}_{i,l}\}$.

Proof: (By induction on $i$.) Basis: For $i = 1$, the lemma follows directly from the definition of $A_{1,l}$. Induction step: By the induction hypothesis, we have
$$L(A_{i-1,l}) = \{(x_{i-1} + s(I_1),\ x_{i-1} + s_{i-1,l} - s(I_1)) \mid I_1 \in \mathcal{I}_{i-1,l}\},$$
$$L(A_{i-1,l+2^{i-1}}) = \{(x_{i-1} + s(I_2),\ x_{i-1} + s_{i-1,l+2^{i-1}} - s(I_2)) \mid I_2 \in \mathcal{I}_{i-1,l+2^{i-1}}\}.$$
Let $B_1A_{i-1,l}$ denote the sub-floorplan formed by $B_1$ and $A_{i-1,l}$. Similarly, we define $B_3A_{i-1,l}$, $B_2A_{i-1,l+2^{i-1}}$, and $B_4A_{i-1,l+2^{i-1}}$. It is easy to see that
$$L(B_1A_{i-1,l}) = L(B_3A_{i-1,l}) = \{(w(r),\ h(r) + P) \mid r \in L(A_{i-1,l})\} = \{(x_{i-1} + s(I_1),\ x_{i-1} + P + s_{i-1,l} - s(I_1)) \mid I_1 \in \mathcal{I}_{i-1,l}\},$$
$$L(B_2A_{i-1,l+2^{i-1}}) = L(B_4A_{i-1,l+2^{i-1}}) = \{(w(r) + P,\ h(r)) \mid r \in L(A_{i-1,l+2^{i-1}})\} = \{(x_{i-1} + P + s(I_2),\ x_{i-1} + s_{i-1,l+2^{i-1}} - s(I_2)) \mid I_2 \in \mathcal{I}_{i-1,l+2^{i-1}}\}.$$


By viewing $B_1A_{i-1,l}$, $B_2A_{i-1,l+2^{i-1}}$, $B_3A_{i-1,l}$, and $B_4A_{i-1,l+2^{i-1}}$ as basic blocks, $A_{i,l}$ can now be treated as the left wheel shown in Figure 6(b). Moreover, it is easy to check that the conditions of Lemma 3.1 are all satisfied. Thus, using $x_i = 2x_{i-1} + P$, $s_{i,l} = s_{i-1,l} + s_{i-1,l+2^{i-1}}$, and $I = I_1 \cup I_2$, the set
$$R = \{(x_{i-1} + s(I_1) + x_{i-1} + P + s(I_2),\ x_{i-1} + P + s_{i-1,l} - s(I_1) + x_{i-1} + s_{i-1,l+2^{i-1}} - s(I_2)) \mid I_1 \in \mathcal{I}_{i-1,l},\ I_2 \in \mathcal{I}_{i-1,l+2^{i-1}}\} = \{(x_i + s(I),\ x_i + s_{i,l} - s(I)) \mid I \in \mathcal{I}_{i,l}\}$$
contains $L(A_{i,l})$. Since $w(r) + h(r) = 2x_i + s_{i,l}$, a constant, for every $r \in R$, there is no redundant realization in $R$. Therefore, $L(A_{i,l}) = R$. □
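Lemma 3.2 can be checked numerically on small instances. The Python sketch below is our own illustration (the function names are hypothetical): `by_recursion` builds $L(A_{i,l})$ through the recursive construction, using the componentwise sums that Lemma 3.1 gives for each wheel step, and `closed_form` evaluates the right-hand side of the lemma. The list `a` holds $a_1, \ldots, a_{2^m}$ with 0-based indexing, so $a_t$ is `a[t - 1]`, and `P` must equal the sum of `a`.

```python
from itertools import product


def closed_form(a, i, l, P):
    """Right-hand side of Lemma 3.2 for A_{i,l} (l is 1-based)."""
    x_i = (2**i - 1) * P
    # The t-th pair of indices in I_{i,l} is {l + 2t, l + 2t + 1}.
    pairs = [(a[l + 2 * t - 1], a[l + 2 * t]) for t in range(2 ** (i - 1))]
    s_il = sum(p + q for p, q in pairs)
    # s(I) for every I in I_{i,l}: pick exactly one element from each pair.
    sums = {sum(choice) for choice in product(*pairs)}
    return {(x_i + s, x_i + s_il - s) for s in sums}


def by_recursion(a, i, l, P):
    """L(A_{i,l}) obtained from the recursive construction of Section 3."""
    if i == 1:
        # A_{1,l} is the single block B0, and x_1 = P.
        return {(P + a[l - 1], P + a[l]), (P + a[l], P + a[l - 1])}
    left = by_recursion(a, i - 1, l, P)                # A_{i-1,l}
    right = by_recursion(a, i - 1, l + 2 ** (i - 1), P)  # A_{i-1,l+2^(i-1)}
    # B1 (1 x P) adds P to the height of A_{i-1,l}; B2 (P x 1) adds P to the
    # width of A_{i-1,l+2^(i-1)}. By Lemma 3.1 the wheel's nonredundant
    # realizations are componentwise sums (w + h is constant, so none is redundant).
    return {(w1 + w2 + P, h1 + h2 + P) for (w1, h1) in left for (w2, h2) in right}
```

For `a = [3, 1, 4, 1, 5, 9, 2, 6]` (so $m = 3$ and $P = 31$), the two functions agree on $A_{3,1}$, and every realization satisfies $w + h = 2x_3 + P = 465$.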

Figure 7: Construction of $F_{m,1}$.

Let $F_{m,1}$ be the floorplan formed by $A_{m,1}$ and two basic blocks $B_6$ and $B_7$, as shown in Figure 7. Let the realizations for the blocks in $A_{m,1}$ be as defined in (2), and let $L(B_6) = \{(1, 1 + x_m + P/2)\}$ and $L(B_7) = \{(x_m + P/2, 1)\}$. A given instance of 2-1-PARTITION is then reduced to the instance of AREAMIN-5 with $F = F_{m,1}$ and $A = (1 + x_m + P/2)^2$. Obviously, this is a polynomial reduction because $b_m$ is a polynomial in $n$, and $x_1$ and $x_m$ are polynomials in $n$ and $P$. We now show that the instance of 2-1-PARTITION has a yes answer iff $F_{m,1}$ has a realization with area less than or equal to $A = (1 + x_m + P/2)^2$. We note that $F_{m,1}$ has a realization with area less than or equal to $A$ iff $A_{m,1}$ has a realization $r$ such that $w(r) \le x_m + P/2$ and $h(r) \le x_m + P/2$. By Lemma 3.2, we know that

$$L(A_{m,1}) = \{(x_m + s(I),\ x_m + s_{m,1} - s(I)) \mid I \in \mathcal{I}_{m,1}\}.$$
Thus, $w(r) + h(r) = 2x_m + s_{m,1} = 2x_m + P$ for every $r \in L(A_{m,1})$. Consequently, $w(r) \le x_m + P/2$ and $h(r) \le x_m + P/2$ iff $w(r) = x_m + P/2$ and $h(r) = x_m + P/2$, which in turn is equivalent to the existence of an index set $I \in \mathcal{I}_{m,1}$ with $s(I) = P/2$, i.e., to the instance of 2-1-PARTITION having a yes answer.

As was mentioned in Section 2, determining $w(r)$ and $h(r)$ amounts to computing the lengths of the longest paths in two DAGs. Furthermore, the number of edges in each of the two DAGs is equal to the number of blocks in the corresponding floorplan. Hence AREAMIN-5 is clearly in NP. From the analysis above and the fact that $F_{m,1}$ is balanced, i.e., $F_{m,1}$ can be constructed in $O(m) = O(\log(b_m + 2)) = O(\log(\text{the number of blocks of } F_{m,1}))$ recursive partitioning steps, we obtain one of the main results of this paper.

Theorem 3.1 The decision version of the area minimization problem for hierarchical floorplans of order-5 is NP-complete (even when the floorplans are balanced).

□
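To make the reduction concrete, the sketch below (our own illustration, with hypothetical function names) pads a 2-1-PARTITION instance to length $2^m$ with 1's, forms $L(A_{m,1})$ from the closed form of Lemma 3.2, and answers the question by testing whether the square realization $(x_m + P/2, x_m + P/2)$ is present, which is exactly the condition under which $F_{m,1}$ meets the area bound $A = (1 + x_m + P/2)^2$. It enumerates every index set in $\mathcal{I}_{m,1}$, so it only illustrates the equivalence and is not an efficient decision procedure.

```python
from itertools import product


def pad_to_power_of_two(a):
    """Append 1's until the length is a power of 2. Both lengths are even,
    so the 1's come in pairs (1, 1), each adding 1 to s(I) and 2 to P."""
    a = list(a)
    while len(a) & (len(a) - 1):     # length is not yet a power of two
        a.append(1)
    return a


def two_one_partition_via_reduction(a):
    """Answer 2-1-PARTITION by checking for the square realization of A_{m,1}."""
    a = pad_to_power_of_two(a)
    m = len(a).bit_length() - 1      # 2n = 2^m
    P = sum(a)                       # P = s_{m,1}
    if P % 2:
        return False                 # s(I) = P/2 is impossible for odd P
    x_m = (2**m - 1) * P
    pairs = [(a[2 * t], a[2 * t + 1]) for t in range(len(a) // 2)]
    # Lemma 3.2 with i = m and l = 1.
    L = {(x_m + sum(c), x_m + P - sum(c)) for c in product(*pairs)}
    # w + h = 2 x_m + P on L(A_{m,1}), so both dimensions are at most
    # x_m + P/2 exactly when the realization is the square below.
    half = x_m + P // 2
    return (half, half) in L
```

For example, `[1, 2, 3, 4]` is a yes-instance (take $a_1 + a_4 = 5 = P/2$), so $(35, 35) \in L(A_{2,1})$, whereas `[2, 4, 1, 1]` is a no-instance.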

4 An area minimization algorithm

In this section we present an area minimization algorithm for hierarchical floorplans of order-5. The algorithm employs a hierarchical approach [1, 11] in which a fast algorithm for determining the set of nonredundant realizations of a wheel plays an important role. If each block in a wheel has at most $k$ realizations, previous algorithms for determining the set of nonredundant realizations of a wheel have time and space costs $O(k^3 \log k)$ and $O(k^3)$, respectively [8, 11]. In Section 4.1 we present an algorithm with time cost $O(k^2 \log^2 k)$ and space cost $O(k^2)$. The time cost of the algorithm is further improved to $O(k^2 \log k)$ in Section 4.2. First we describe the hierarchical area minimization algorithm.

The area minimization algorithm determines $L(F)$, the set of nonredundant realizations of a floorplan $F$, instead of simply a realization with minimum area. By Fact 2.1, after determining $L(F)$, we can readily select a realization with minimum area from $L(F)$. Since $L(F)$ contains all the minimum-area realizations of $F$, determining $L(F)$ also gives the designer the freedom to choose, among all the minimum-area realizations of $F$, the one with a preferred aspect ratio. More importantly, because of Fact 2.2, determining $L(F)$ makes it possible for the algorithm to exploit the floorplan hierarchy in area minimization. The area minimization algorithm can be summarized as follows:

AreaMin(F)
Input: a hierarchical floorplan of order-5, F, and the sets of nonredundant realizations of the blocks in F.
Output: the set of nonredundant realizations of F.

    if F is a floorplan with one block then
        return the set of nonredundant realizations of the block;
    F1 ← a slice or a wheel in F;
    F′ ← the floorplan formed by substituting a basic block B for F1 in F;
    L(B) ← L(F1);
    return AreaMin(F′).
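The structure of AreaMin can be illustrated with the following Python sketch, which rests on simplifying assumptions of our own: the hierarchy is given explicitly as a tree (a leaf is a list of `(w, h)` block realizations, an internal node is a `(pattern, children)` tuple), the tree is processed bottom-up by recursion rather than by repeated substitution, and each $L(F_1)$ is computed by brute force over all combinations of the children's realizations instead of by the fast slice and wheel algorithms. We also assume that the two blocks of a vertical slice sit side by side, and we take the left wheel's dimensions from the formulas in the proof of Lemma 4.1; the right wheel is omitted. All names are hypothetical.

```python
from itertools import product


def nonredundant(rs):
    """Sort-and-scan filter from Section 2: increasing width, decreasing height."""
    out = []
    for w, h in sorted(set(rs)):
        if not out or h < out[-1][1]:
            out.append((w, h))
    return out


def vertical(a, b):
    # Assumed orientation: the two blocks sit side by side.
    return a[0] + b[0], max(a[1], b[1])


def horizontal(a, b):
    # The two blocks are stacked on top of each other.
    return max(a[0], b[0]), a[1] + b[1]


def left_wheel(b1, b2, b3, b4, b5):
    # Width and height of a left wheel (formulas from the proof of Lemma 4.1).
    (w1, h1), (w2, h2), (w3, h3), (w4, h4), (w5, h5) = b1, b2, b3, b4, b5
    return (max(w1 + w2, w1 + w5 + w3, w4 + w3),
            max(h1 + h4, h2 + h5 + h4, h2 + h3))


PATTERNS = {"V": vertical, "H": horizontal, "LW": left_wheel}


def area_min(node):
    """Return L(node): a leaf is a list of (w, h) realizations of a basic
    block; an internal node is a (pattern, children) tuple."""
    if isinstance(node, list):
        return nonredundant(node)
    pattern, children = node
    # Fact 2.2: each child can be replaced by a basic block B whose
    # realization set is the child's set of nonredundant realizations.
    child_lists = [area_min(child) for child in children]
    combine = PATTERNS[pattern]
    # Brute-force stand-in for the fast slice and wheel algorithms.
    return nonredundant(combine(*choice) for choice in product(*child_lists))
```

A minimum-area realization is then `min(area_min(tree), key=lambda r: r[0] * r[1])` (Fact 2.1). Applied to the left wheel of the tightness example after Lemma 4.1 with $k = 2$, the brute force returns the $k^2 = 4$ realizations $(8, 8), (9, 7), (10, 6), (11, 5)$.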

AreaMin follows the reverse of the process by which $F$ is constructed. At each stage, it determines the set of nonredundant realizations of the sub-floorplan of $F$ that originated from the partition $F_1$, that is, $L(F_1)$. Thus, when a floorplan with only one block is reached, the set of nonredundant realizations of $F$ has been determined. AreaMin is actually a general area minimization method: it can be used for any hierarchical floorplan, provided that the sets of nonredundant realizations are determined for all of its partitioning patterns, not only for slices and wheels. In AreaMin the area minimization problem for hierarchical floorplans of order-5 is reduced to determining the sets of nonredundant realizations of slices and wheels (determining $L(F_1)$ in AreaMin). An efficient algorithm for determining the set of nonredundant realizations of a slice was presented in [7, 9]; it has time cost $O(k)$ if each of the (two) blocks in a slice has at most $k$ realizations. What remains is to design an algorithm for determining the set of nonredundant realizations of a wheel. The basic algorithm presented in Section 4.1 has time cost $O(k^2 \log^2 k)$ and space cost $O(k^2)$ if each of the (five) blocks in the wheel has at most $k$ realizations. The time cost of the algorithm is improved to $O(k^2 \log k)$ in Section 4.2.

We now determine the time complexity of AreaMin. Suppose $F$ has $n$ blocks, and the dimensions of all realizations of the blocks are integers less than or equal to a positive integer $M$. It is easy to see that the number of nonredundant realizations of any sub-floorplan of $F$ is at most $nM$. Thus, $|L(F_1)| \le nM$. The total number of calls to determine $L(F_1)$ is obviously less than $n$, so the time cost of AreaMin is $O(n(nM)^2 \log(nM)) = O(n^3 M^2 \log(nM))$, a polynomial in $n$ and $M$. That is, AreaMin is a pseudo-polynomial time algorithm. As for the space cost, it is $O(n^2 M^2)$ if only the dimensions of a realization are needed; it is $O(n^3 M^2)$ if we also want to keep the information on how each nonredundant realization is formed.
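The $O(k)$ slice step cited above from [7, 9] is a linear merge of two sorted lists. The following Python sketch is our own rendering of that idea, not the cited code. It assumes, as in Section 4.1, that each list is sorted by increasing width and decreasing height, and that in a vertical slice the two blocks sit side by side (widths add, heights take the maximum); the horizontal slice is handled by exchanging the roles of width and height. The function names are hypothetical.

```python
def vertical_slice(L1, L2):
    """L(F) for two blocks side by side: width w1 + w2, height max(h1, h2).
    L1 and L2 are sorted by increasing width and decreasing height."""
    i, j, out = 0, 0, []
    while i < len(L1) and j < len(L2):
        (w1, h1), (w2, h2) = L1[i], L2[j]
        out.append((w1 + w2, max(h1, h2)))
        # Only a shorter realization of the taller block can lower the height,
        # so advance past it (past both blocks on a tie).
        if h1 > h2:
            i += 1
        elif h2 > h1:
            j += 1
        else:
            i += 1
            j += 1
    return out  # at most len(L1) + len(L2) - 1 pairs, already nonredundant


def horizontal_slice(L1, L2):
    """L(F) for two stacked blocks: width max(w1, w2), height h1 + h2."""
    def swap(L):
        # Exchange width and height; reversing keeps the first coordinate increasing.
        return [(h, w) for w, h in reversed(L)]
    return swap(vertical_slice(swap(L1), swap(L2)))
```

For example, `vertical_slice([(1, 5), (3, 2)], [(2, 4), (4, 1)])` returns `[(3, 5), (5, 4), (7, 2)]`, which is what filtering all four combinations by brute force also gives. Each iteration advances at least one index, which is where the $O(k)$ bound comes from.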

4.1 An algorithm for determining the set of nonredundant realizations of a wheel

Since there is an obvious symmetry between the left and the right wheels (one is the mirror image of the other), we only consider the left wheel $W$ shown in Figure 2(a). To be specific, the problem we focus on is: for the left wheel $W$ in Figure 2(a), determine $L(W)$ for given $L(B_i)$ ($1 \le i \le 5$). For convenience, we introduce some notation. If $r$ is a realization of $W$, we use $r(B_i)$ to denote the realization selected for $B_i$ in $r$. We also assume that the nonredundant realizations of a block are ordered by increasing width and decreasing height. The $j$-th realization of block $B_i$ is denoted by $(w_i^j, h_i^j)$. Let $k_i = |L(B_i)|$ ($1 \le i \le 5$) and $k = \max\{k_i \mid 1 \le i \le 5\}$. The algorithm used in [11] for determining $L(W)$ has time cost $O(k^3 \log k)$ and space cost $O(k^3)$. A simpler algorithm with the same time and space costs was derived from a general method in [8]. Here, we present an algorithm with time cost $O(k^2 \log^2 k)$ and space cost $O(k^2)$. In Section 4.2, we show how the time cost can be improved to $O(k^2 \log k)$. First we derive an upper bound on the size of $L(W)$.

Figure 8: Case 2 in the proof of Lemma 4.1.

Lemma 4.1 $|L(W)| \le (k_1 + k_3)(k_2 + k_4) + \min\{k_1, k_2, k_3, k_4\} \cdot k_5 = O(k^2)$.

Proof: Let $r$ be a realization of $W$, and let $(w_i, h_i)$ denote $r(B_i)$. It is obvious that
$$w(r) = \max\{w_1 + w_2,\ w_1 + w_5 + w_3,\ w_4 + w_3\}, \qquad h(r) = \max\{h_1 + h_4,\ h_2 + h_5 + h_4,\ h_2 + h_3\}.$$
There are two cases depending on how the width and the height of $r$ are determined.

Case 1. Either the width or the height of $r$ is determined by two blocks, i.e., $w(r) = w_1 + w_2$, or $w(r) = w_4 + w_3$, or $h(r) = h_1 + h_4$, or $h(r) = h_2 + h_3$. Since no two realizations in $L(W)$ have the same width or the same height, the number of nonredundant $r$'s in this class is at most $k_1k_2 + k_4k_3 + k_1k_4 + k_2k_3$, which is equal to $(k_1 + k_3)(k_2 + k_4)$.

Case 2. Neither the width nor the height of $r$ is determined by two blocks, i.e., $w(r) > w_1 + w_2$, $w(r) > w_4 + w_3$, $h(r) > h_1 + h_4$, and $h(r) > h_2 + h_3$. Without loss of generality we assume $k_1 = \min\{k_1, k_2, k_3, k_4\}$. If $r$ is nonredundant, it must satisfy (see Figure 8)
$$w_4 = \max\{w \mid (w, h) \in L(B_4),\ w \le w_1 + w_5\},$$
$$h_3 = \max\{h \mid (w, h) \in L(B_3),\ h \le h_4 + h_5\},$$
$$w_2 = \max\{w \mid (w, h) \in L(B_2),\ w \le w_5 + w_3\},$$
since otherwise the realization of $B_4$, $B_3$, or $B_2$ could be replaced by one that strictly decreases one dimension of $r$ without increasing the other. Thus, for fixed $(w, h)$ in $L(B_1)$ and $(w', h')$ in $L(B_5)$, there is at most one nonredundant $r$ in this class such that $r(B_1) = (w, h)$ and $r(B_5) = (w', h')$. Therefore, there are at most $k_1k_5$ nonredundant realizations of $W$ in this class. Combining the two cases, the lemma follows. □

The bound in Lemma 4.1 is achievable, as the following example shows. Let $L(B_5) = \{(0, 0)\}$ and
$$L(B_1) = L(B_3) = \{(K - i, K + i) \mid i = 1, \ldots, k\}, \qquad L(B_2) = L(B_4) = \{(K + ik, K - ik) \mid i = 1, \ldots, k\},$$
where $K = k^2$ and $k \ge 2$. It is easy to see that the conditions in Lemma 3.1 are all satisfied. Thus, the set
$$X = \{(2K + (ik - j),\ 2K - (ik - j)) \mid i, j = 1, \ldots, k\}$$
contains $L(W)$. Since $w(r) + h(r) = 4K$ for any $r \in X$, no realization in $X$ is redundant, so $L(W) = X$. Therefore, $|L(W)| = |X| = k^2$.

Before presenting the algorithm, we define a procedure $BS(S, b, \mathit{flag})$, where $S = \{(w_1, h_1), (w_2, h_2), \ldots, (w_s, h_s)\}$ with $w_1 < w_2 < \cdots < w_s$ and $h_1 > h_2 > \cdots > h_s$, $b$ is a nonnegative integer, and $\mathit{flag}$ is either width or height. When $\mathit{flag}$ = width, $BS(S, b, \mathit{flag})$ is the maximum $i$ such that $w_i \le b$, or nil if $w_l > b$ for all $1 \le l \le s$; when $\mathit{flag}$ = height, $BS(S, b, \mathit{flag})$ is the minimum $i$ such that $h_i \le b$, or nil if $h_l > b$ for all $1 \le l \le s$. Clearly, BS can be implemented with binary search, so we can assume that BS has time and space costs $O(\log s)$ and $O(1)$, respectively.

Now we are ready to introduce our algorithm for determining $L(W)$. The first step of the algorithm is to generate a superset of $L(W)$. There are two cases according to how the width and the height of a nonredundant realization are determined (exactly as in the proof of Lemma 4.1).

Case 1. The nonredundant realizations whose widths or heights are determined by two blocks of the wheel. The nonredundant realizations in this case can be partitioned into classes according to which blocks determine their widths and heights. Our algorithm generates a superclass for each class. There are eight classes altogether, but we only consider two of them here: Class 1, the nonredundant realizations whose widths are determined by $B_4$ and $B_3$ and whose heights are determined by $B_3$ and $B_2$; and Class 2, the nonredundant realizations whose widths are determined by $B_4$ and $B_3$ and whose heights are determined by $B_4$, $B_5$, and $B_2$. A superclass of each of the other classes can be generated symmetrically, since every other class can be turned into either Class 1 or Class 2 by rotating $W$ by 90°, 180°, or 270° clockwise. We consider Class 1 and Class 2 separately.

Subcase 1.1: Class 1. For $(w_4, h_4)$ in $L(B_4)$ and $(w_3, h_3)$ in $L(B_3)$, a realization of $W$ is called the I-support (for $(w_4, h_4)$ and $(w_3, h_3)$) if it is the one with minimum height among those realizations $r$ such that $r(B_3) = (w_3, h_3)$, $r(B_4) = (w_4, h_4)$, $w(r) = w_4 + w_3$, and $h(r) = h_3 + h(r(B_2))$. Obviously, a nonredundant realization in Class 1 must be the I-support for some $(w_4, h_4)$ in $L(B_4)$ and $(w_3, h_3)$ in $L(B_3)$. To obtain a superclass of Class 1, we simply generate the set consisting of all I-supports (there are at most $k_4k_3$ of them). We now describe a procedure for determining the I-support $r$ for given $(w_4, h_4)$ and $(w_3, h_3)$, if it exists.
Note that if $h_3 < h_4$, there is no corresponding I-support, so we assume $h_3 \ge h_4$. Since $h(r) = h_3 + h(r(B_2))$, we have $h(r(B_5)) \le h_3 - h_4$. Let $t_5 = BS(L(B_5), h_3 - h_4, \text{height})$. If $t_5$ = nil, again there is no I-support for $(w_4, h_4)$ and $(w_3, h_3)$. Otherwise, we simply let $r(B_5) = (w_5^{t_5}, h_5^{t_5})$, the $t_5$-th realization in $L(B_5)$, because it is the one with minimum width among those realizations in $L(B_5)$ whose heights are less than or equal to $h_3 - h_4$. We now search $L(B_2)$ to determine $r(B_2)$. Suppose the $tmp$-th realization $(w_2^{tmp}, h_2^{tmp})$ in $L(B_2)$ is currently being considered. We check whether there is a realization in $L(B_1)$ that fits into the slot with width $w_4 + w_3 - \max\{w_2^{tmp}, w_5^{t_5} + w_3\}$ and height $h_2^{tmp} + h_3 - h_4$ (this is the space left for $B_1$). This can be done by considering the $t_1$-th realization $(w_1^{t_1}, h_1^{t_1})$ in $L(B_1)$, where $t_1 = BS(L(B_1), h_2^{tmp} + h_3 - h_4, \text{height})$. If $w_1^{t_1} \le w_4 + w_3 - \max\{w_2^{tmp}, w_5^{t_5} + w_3\}$, then $(w_1^{t_1}, h_1^{t_1})$ can be put into the slot; otherwise no realization in $L(B_1)$ can be placed in this slot. In the former situation, all realizations before $(w_2^{tmp}, h_2^{tmp})$ in $L(B_2)$ can be ignored, because $r(B_2)$ is the one with minimum height having this property. In the latter situation, all realizations after $(w_2^{tmp}, h_2^{tmp})$ in $L(B_2)$ (including $(w_2^{tmp}, h_2^{tmp})$ itself) can be safely ignored, simply because none of them can leave enough space for $B_1$. Notice that BS is called once for determining $t_5$ at the beginning and once for testing each $(w_2^{tmp}, h_2^{tmp})$. If we use binary search to determine $r(B_2)$, the I-support can be found in $O(\log^2 k)$ time and $O(1)$ space. The pseudo-code of this procedure is given in Table 1.
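A Python transcription of this procedure (whose pseudo-code appears in Table 1) is sketched below under our own conventions, not the paper's: lists are 0-indexed and sorted by increasing width and decreasing height, `None` stands for nil, `bs` implements BS with an explicit binary search, and `i_support` returns the chosen realizations $(r(B_1), \ldots, r(B_5))$. The function names are hypothetical.

```python
def bs(S, b, flag):
    """BS(S, b, flag): for flag == "width", the largest index i with
    S[i][0] <= b; for flag == "height", the smallest i with S[i][1] <= b.
    Returns None (nil) if no such index exists. O(log len(S)) time."""
    lo, hi = 0, len(S)
    if flag == "width":
        while lo < hi:                  # find the first index with width > b
            mid = (lo + hi) // 2
            if S[mid][0] <= b:
                lo = mid + 1
            else:
                hi = mid
        return lo - 1 if lo > 0 else None
    while lo < hi:                      # find the first index with height <= b
        mid = (lo + hi) // 2
        if S[mid][1] <= b:
            hi = mid
        else:
            lo = mid + 1
    return lo if lo < len(S) else None


def i_support(L1, L2, L5, b4, b3):
    """I-support for r(B4) = b4 and r(B3) = b3 in the left wheel, or None."""
    (w4, h4), (w3, h3) = b4, b3
    # Narrowest B5 with height <= h3 - h4 (this is None when h3 < h4,
    # because heights are nonnegative).
    t5 = bs(L5, h3 - h4, "height")
    if t5 is None:
        return None
    w5 = L5[t5][0]

    def b1_index(t2):
        """Narrowest B1 fitting the slot left when r(B2) = L2[t2], or None."""
        w2, h2 = L2[t2]
        t1 = bs(L1, h2 + h3 - h4, "height")
        if t1 is None or L1[t1][0] > w4 + w3 - max(w2, w5 + w3):
            return None
        return t1

    # Whether B1 fits is monotone in t2 (feasible, then infeasible), so binary
    # search for the last feasible index: the shortest B2 leaving room for B1.
    i, j = 0, len(L2) - 1
    while i != j:
        tmp = (i + j + 1) // 2          # ceiling of (i + j) / 2
        if b1_index(tmp) is None:
            j = tmp - 1
        else:
            i = tmp
    t1 = b1_index(i)
    if t1 is None:
        return None
    return (L1[t1], L2[i], b3, b4, L5[t5])
```

For instance, with `L1 = [(1, 1)]`, `L2 = [(1, 3), (2, 2), (4, 1)]`, `L5 = [(0, 0)]`, `b4 = (3, 1)` and `b3 = (2, 2)`, the search settles on the shortest realization `(4, 1)` of $B_2$, giving a wheel of width $w_4 + w_3 = 5$ and height $h_2 + h_3 = 3$.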

t5 ← BS(L(B5), h3 − h4, height);
if t5 = nil then return nil;   /* nil means that the sought I-support does not exist. */
i ← 1; j ← k2;
while i ≠ j do
    tmp ← ⌈(i + j)/2⌉;
    t1 ← BS(L(B1), h2^tmp + h3 − h4, height);
    if t1 = nil or w1^t1 > w4 + w3 − max{w2^tmp, w5^t5 + w3} then j ← tmp − 1;
    else i ← tmp;
t1 ← BS(L(B1), h2^i + h3 − h4, height);
if t1 = nil or w1^t1 > w4 + w3 − max{w2^i, w5^t5 + w3} then return nil;
else return r: r(B1) = (w1^t1, h1^t1), r(B2) = (w2^i, h2^i), r(B3) = (w3, h3),
               r(B4) = (w4, h4), r(B5) = (w5^t5, h5^t5).

Table 1: Pseudo-code for determining the I-support for $(w_4, h_4)$ and $(w_3, h_3)$.

By calling the above procedure for all realization pairs, one from $L(B_4)$ and the other from $L(B_3)$, all the I-supports can be determined in $O(k^2 \log^2 k)$ time and $O(k^2)$ space.

Subcase 1.2: Class 2.

Let $r_0$ be a nonredundant realization in Class 2 with $r_0(B_i) = (w_i, h_i)$ for $1 \le i \le 5$. Obviously, $h_5 + h_4 \ge h_3$. We can further assume
$$h_3 = \max\{h \mid (w, h) \in L(B_3),\ h \le h_4 + h_5\}. \qquad (1)$$
Otherwise, either $r_0$ would be redundant (when $w_4 + w_3 > w_1 + w_2$), contradicting our choice of $r_0$, or there is a realization that has the same width and height as $r_0$ but does not belong to this class (when $w_4 + w_3 = w_1 + w_2$). We therefore consider the pair $(w_4, h_4)$ and $(w_3, h_3)$ here only when there is a $(w_5, h_5)$ such that (1) is satisfied. For $(w_4, h_4)$ in $L(B_4)$ and $(w_5, h_5)$ in $L(B_5)$, let $t_3 = BS(L(B_3), h_4 + h_5, \text{height}) \ne$ nil. A realization of $W$ is called the II-support (for $(w_4, h_4)$ and $(w_5, h_5)$) if it is the one with minimum height among those realizations $r$ such that $r(B_4) = (w_4, h_4)$, $r(B_5) = (w_5, h_5)$, $w(r) = w_4 + w_3^{t_3}$, and $h(r) = h_4 + h_5 + h(r(B_2))$. To obtain a superclass of Class 2, we simply generate the set consisting of all the II-supports (there are at most $k_4k_5$ of them). Let $r$ be the II-support for $(w_4, h_4)$ and $(w_5, h_5)$, if it exists. By our assumption, $r(B_3) = (w_3^{t_3}, h_3^{t_3})$. We are now in a situation very similar to that of determining an I-support, and a similar procedure can be designed to determine $r$ in $O(\log^2 k)$ time and $O(1)$ space. The pseudo-code of the procedure is given in Table 2.

t3 ← BS(L(B3), h5 + h4, height);
if t3 = nil then return nil;   /* nil means that the sought II-support does not exist. */
i ← 1; j ← k2;
while i ≠ j do
    tmp ← ⌈(i + j)/2⌉;
    t1 ← BS(L(B1), h2^tmp + h5, height);
    if t1 = nil or w1^t1 > w4 + w3^t3 − max{w2^tmp, w5 + w3^t3} then j ← tmp − 1;
    else i ← tmp;
t1 ← BS(L(B1), h2^i + h5, height);
if t1 = nil or w1^t1 > w4 + w3^t3 − max{w2^i, w5 + w3^t3} then return nil;
else return r: r(B1) = (w1^t1, h1^t1), r(B2) = (w2^i, h2^i), r(B3) = (w3^t3, h3^t3),
               r(B4) = (w4, h4), r(B5) = (w5, h5).

Table 2: Pseudo-code for determining the II-support for $(w_4, h_4)$ and $(w_5, h_5)$.

Thus, all the II-supports can be determined in $O(k^2 \log^2 k)$ time and $O(k^2)$ space. By putting all the superclasses together, we obtain, in $O(k^2 \log^2 k)$ time and $O(k^2)$ space, a set of realizations of $W$ that contains all nonredundant realizations in Case 1.

Case 2. The nonredundant realizations not in Case 1. This case is exactly the same as Case 2 in the proof of Lemma 4.1: for a nonredundant realization in this case, neither its width nor its height is determined by two blocks of the wheel. For any $(w_1, h_1)$ in $L(B_1)$ and $(w_5, h_5)$ in $L(B_5)$, the realization $r$ such that
$$r(B_1) = (w_1, h_1),$$
$$r(B_5) = (w_5, h_5),$$
$$r(B_4) = (w_4^{t_4}, h_4^{t_4}), \text{ where } t_4 = BS(L(B_4), w_1 + w_5, \text{width}) \ne \text{nil},$$
$$r(B_3) = (w_3^{t_3}, h_3^{t_3}), \text{ where } t_3 = BS(L(B_3), h_5 + h_4^{t_4}, \text{height}) \ne \text{nil},$$
$$r(B_2) = (w_2^{t_2}, h_2^{t_2}), \text{ where } t_2 = BS(L(B_2), w_5 + w_3^{t_3}, \text{width}) \ne \text{nil},$$
is called the III-support for $(w_1, h_1)$ and $(w_5, h_5)$. According to the proof of Lemma 4.1, a nonredundant realization in this case must be the III-support for some $(w_1, h_1)$ in $L(B_1)$ and $(w_5, h_5)$ in $L(B_5)$. For given $(w_1, h_1)$ and $(w_5, h_5)$, the III-support can obviously be determined by calling BS three times, following the definition. Thus, all the III-supports (there are at most $k_1k_5$ of them) can be determined in $O(k^2 \log k)$ time and $O(k^2)$ space. Let $R$ denote the set of realizations of $W$ generated in Case 1 and Case 2. By construction, $L(W) \subseteq R$ and $|R| = O(k^2)$. The last step of the algorithm is to remove the redundant realizations in $R$. This can be accomplished by sorting the realizations in $R$ by width and removing all redundant ones as described in Section 2, which takes $O(k^2 \log k^2) = O(k^2 \log k)$ time and $O(k^2)$ space. Thus we have the following result.

Theorem 4.1 If each block has at most $k$ realizations, the set of nonredundant realizations of a wheel can be determined in $O(k^2 \log^2 k)$ time and $O(k^2)$ space.

□
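The Case 2 step of the basic algorithm needs only three BS calls per pair $((w_1, h_1), (w_5, h_5))$. The sketch below is our own illustration with hypothetical names and 0-indexed lists sorted by increasing width and decreasing height. For self-containment it repeats the binary-search helper `bs`, and `wheel_dims` evaluates the left-wheel formulas from the proof of Lemma 4.1 so that the generated candidates can be inspected. It covers only the III-supports; the full algorithm also needs the I- and II-supports of all eight classes and the final nonredundant filter.

```python
from itertools import product


def bs(S, b, flag):
    """BS(S, b, flag) on a 0-indexed list sorted by increasing width and
    decreasing height; returns None for the paper's nil."""
    lo, hi = 0, len(S)
    if flag == "width":                 # largest i with S[i][0] <= b
        while lo < hi:
            mid = (lo + hi) // 2
            if S[mid][0] <= b:
                lo = mid + 1
            else:
                hi = mid
        return lo - 1 if lo > 0 else None
    while lo < hi:                      # smallest i with S[i][1] <= b
        mid = (lo + hi) // 2
        if S[mid][1] <= b:
            hi = mid
        else:
            lo = mid + 1
    return lo if lo < len(S) else None


def wheel_dims(choice):
    """Width and height of a left wheel for r(B1), ..., r(B5) (Lemma 4.1)."""
    (w1, h1), (w2, h2), (w3, h3), (w4, h4), (w5, h5) = choice
    return (max(w1 + w2, w1 + w5 + w3, w4 + w3),
            max(h1 + h4, h2 + h5 + h4, h2 + h3))


def iii_supports(L1, L2, L3, L4, L5):
    """All III-supports: one candidate per pair (r(B1), r(B5)); pairs for
    which some BS call returns nil are skipped."""
    out = []
    for b1, b5 in product(L1, L5):
        t4 = bs(L4, b1[0] + b5[0], "width")        # widest B4 with w4 <= w1 + w5
        if t4 is None:
            continue
        t3 = bs(L3, b5[1] + L4[t4][1], "height")   # tallest B3 with h3 <= h5 + h4
        if t3 is None:
            continue
        t2 = bs(L2, b5[0] + L3[t3][0], "width")    # widest B2 with w2 <= w5 + w3
        if t2 is None:
            continue
        out.append((b1, L2[t2], L3[t3], L4[t4], b5))
    return out
```

For example, if $L(B_1) = \cdots = L(B_4) = \{(1, 3), (3, 1)\}$ and $L(B_5) = \{(4, 4)\}$, the two III-supports have dimensions $(6, 6)$ and $(8, 6)$; the final nonredundant filter would keep only $(6, 6)$.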

4.2 Improvements

The basic algorithm presented in Section 4.1 for determining the set of nonredundant realizations of a wheel can be further improved. We first show how to improve the time cost of the basic algorithm to $O(k^2 \log k)$. To achieve the improvement, we only need to speed up the process of determining the I-supports and the II-supports. We consider the I-supports first. For a fixed $(w_3, h_3)$ in $L(B_3)$, let $r_i$ denote the I-support for $(w_3, h_3)$ and $(w_4^i, h_4^i)$ if it exists, and let $r_i$ = nil otherwise. It suffices to show that $r_1, r_2, \ldots, r_{k_4}$ can be determined in $O(k \log k)$ time. The following is the key observation.

Observation 4.1 If $r_i \neq \mathrm{nil}$, then $r_{i+1} \neq \mathrm{nil}$ and $w(r_{i+1}(B_2)) \geq w(r_i(B_2))$.

We determine $r_1, r_2, \ldots, r_{k_4}$ one by one, in this order. For each i, the realizations in $L(B_2)$ are examined in order of increasing width to search for $r_i(B_2)$. By Observation 4.1, the search for $r_i(B_2)$ can start from $r_{i-1}(B_2)$ if $r_{i-1} \neq \mathrm{nil}$, and from the first realization in $L(B_2)$ otherwise. Since the realizations in $L(B_2)$ never need to be examined backwards, the total number of calls to BS (each checking whether enough room is left to place a realization of $B_1$ for a realization in $L(B_2)$) is bounded by $k_4 + k_2$. Thus, $r_1, r_2, \ldots, r_{k_4}$ can be determined in $O(k \log k)$ time. It is also easy to see that the realizations selected for $B_5$ in $r_1, r_2, \ldots, r_{k_4}$, as determined by $t_5$, are in order of nonincreasing width. We can therefore use linear search instead of BS to determine the realizations for $B_5$, which reduces the total time for determining them over $r_1, r_2, \ldots, r_{k_4}$ from $O(k \log k)$ to $O(k)$.

For the II-supports, let $p_i$ denote the II-support for $(w_4^i, h_4^i)$ and $(w_5, h_5)$, for a fixed realization $(w_5, h_5)$ in $L(B_5)$. We have the following observation:

Observation 4.2 If $p_{i+1} \neq \mathrm{nil}$, then $p_i \neq \mathrm{nil}$ and $w(p_{i+1}(B_2)) \geq w(p_i(B_2))$.

As for the I-supports, we determine $p_1, p_2, \ldots, p_{k_4}$ one by one, in this order. By a method similar to that for the I-supports, $p_1, p_2, \ldots, p_{k_4}$ can be determined in $O(k \log k)$ time. Notice also that the realizations selected for $B_3$ in $p_1, p_2, \ldots, p_{k_4}$, as determined by $t_3$, are in order of increasing width. We can therefore use linear search instead of BS to determine the realizations for $B_3$, which reduces the total time for determining them over $p_1, p_2, \ldots, p_{k_4}$ from $O(k \log k)$ to $O(k)$. By the above analysis, we have another main result of this paper:

Theorem 4.2 If each block has at most k realizations, the set of nonredundant realizations of a wheel can be determined in $O(k^2 \log k)$ time and $O(k^2)$ space. □
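The speed-ups behind Observations 4.1 and 4.2 (and the replacement of BS by linear search for $B_5$ and $B_3$) rest on a standard idea: when successive searches in a sorted list have answers that never move backwards, a single forward-only pointer replaces one binary search per query, so m queries over a list of n items cost O(m + n) comparisons instead of O(m log n). The Python sketch below shows this idea in isolation. The threshold/width setting and the function names are illustrative assumptions, not the paper's exact I-support or II-support search.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def monotone_search(thresholds: List[float], widths: List[float]) -> Tuple[List[Optional[int]], int]:
    """For nondecreasing thresholds and increasing widths, return, for each
    threshold, the index of the first width >= threshold (None if none),
    plus the number of comparisons made. The pointer j never moves back,
    so at most len(thresholds) + len(widths) comparisons are made."""
    answers: List[Optional[int]] = []
    comparisons = 0
    j = 0
    for t in thresholds:
        while j < len(widths):
            comparisons += 1
            if widths[j] >= t:
                break
            j += 1
        answers.append(j if j < len(widths) else None)
    return answers, comparisons


def per_query_binary_search(thresholds: List[float], widths: List[float]) -> List[Optional[int]]:
    """Baseline: one binary search per query, O(m log n) in total."""
    answers: List[Optional[int]] = []
    for t in thresholds:
        i = bisect_left(widths, t)
        answers.append(i if i < len(widths) else None)
    return answers
```

When the answers instead move monotonically in the other direction, as for the nonincreasing widths chosen for $B_5$, the same scan simply runs from the other end of the list.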

Although it does not reduce the order of the time cost of the algorithm, the determination of the III-supports can also be sped up. For a fixed realization $(w_5, h_5)$ in $L(B_5)$, let $q_i$ denote the III-support for $(w_1^i, h_1^i)$ and $(w_5, h_5)$. We have the following observation.

Observation 4.3 If $q_i \neq \mathrm{nil}$ and $q_j \neq \mathrm{nil}$ with $j > i$, then $w(q_j(B_4)) \geq w(q_i(B_4))$, $w(q_j(B_3)) \geq w(q_i(B_3))$, and $w(q_j(B_2)) \geq w(q_i(B_2))$.

By this observation, $q_1, q_2, \ldots, q_{k_1}$ can be determined in this order in O(k) time by linearly searching $L(B_4)$, $L(B_3)$, and $L(B_2)$. Hence, all III-supports can be determined in $O(k^2)$ time.

Finally, we point out that if the realizations in $L(W)$ must be output in sorted order, then whether the $O(k^2 \log k)$ time bound can be further improved is related to a long-standing open problem called sorting X + Y, which asks whether $\{x_i + y_j \mid i, j = 1, \ldots, k\}$ can be sorted in $o(k^2 \log k)$ time [3]. We can reduce sorting X + Y to our problem in O(k) time as follows. Given $X = \{x_1, \ldots, x_k\}$ and $Y = \{y_1, \ldots, y_k\}$, in the left wheel W we let $L(B_5) = \{(0, 0)\}$ and

$$
L(B_1) = L(B_3) = \{(K + x_i,\ 2K - x_i) \mid i = 1, \ldots, k\}, \qquad
L(B_2) = L(B_4) = \{(2K + y_i,\ K - y_i) \mid i = 1, \ldots, k\},
$$

where K is the largest of $x_1, \ldots, x_k, y_1, \ldots, y_k$. It can be shown (by Lemma 3.1) that

$$
L(W) = \{(3K + (x_i + y_j),\ 3K - (x_i + y_j)) \mid i, j = 1, \ldots, k\}.
$$

Therefore, producing $L(W)$ in sorted order is the same as sorting X + Y.
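The arithmetic of this reduction can be checked with the small Python sketch below. It builds the block lists exactly as defined above and, rather than running a wheel algorithm, takes $L(W)$ from the formula stated above (which the text justifies by Lemma 3.1). It shows that every realization lies on the line width + height = 6K, so no realization strictly dominates another, and that reading the widths in sorted order and subtracting 3K yields X + Y in sorted order. The function names are illustrative, and the inputs are assumed to be nonnegative integers so that all dimensions are nonnegative.

```python
from typing import Dict, List, Tuple

Realization = Tuple[int, int]  # (width, height)


def reduction_lists(X: List[int], Y: List[int]) -> Dict[int, List[Realization]]:
    """Block realization lists of the reduction; B5 is the point (0, 0)."""
    K = max(max(X), max(Y))
    b13 = [(K + x, 2 * K - x) for x in X]
    b24 = [(2 * K + y, K - y) for y in Y]
    return {1: b13, 2: b24, 3: list(b13), 4: list(b24), 5: [(0, 0)]}


def claimed_wheel_list(X: List[int], Y: List[int]) -> List[Realization]:
    """L(W) as given by the formula in the text, one entry per pair (i, j).
    (As a set, repeated sums x_i + y_j would collapse into one entry.)"""
    K = max(max(X), max(Y))
    return [(3 * K + (x + y), 3 * K - (x + y)) for x in X for y in Y]


def sort_x_plus_y_via_wheel(X: List[int], Y: List[int]) -> List[int]:
    """Sorting L(W) by width and subtracting 3K gives X + Y in sorted order."""
    K = max(max(X), max(Y))
    return [w - 3 * K for w, _ in sorted(claimed_wheel_list(X, Y))]
```

For example, with X = [3, 1, 4] and Y = [1, 5, 9] we have K = 9, and the widths of the sorted list, minus 27, are the nine pairwise sums in increasing order.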
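Returning to Observation 4.3: if BS is read as returning the widest fit for a width bound and the narrowest fit for a height bound on lists sorted by increasing width and decreasing height (an assumption, since BS is defined earlier in the paper), then as $w_1$ grows the indices chosen for $B_4$, $B_3$, and $B_2$ never decrease, so three forward-only pointers can replace the BS calls. The Python sketch below compares such a linear scan with a direct reading of the III-support definition. It is an illustration under that assumption, not the authors' code, and all names are ours.

```python
from typing import Dict, List, Optional, Tuple

Realization = Tuple[float, float]            # (width, height)
Lists = Dict[int, List[Realization]]         # block -> L(B_i), width up, height down
Support = Optional[Dict[int, Realization]]   # None plays the role of nil


def _widest_fit(lst: List[Realization], bound: float) -> Optional[int]:
    fits = [i for i, (w, _) in enumerate(lst) if w <= bound]
    return fits[-1] if fits else None


def _narrowest_fit_height(lst: List[Realization], bound: float) -> Optional[int]:
    fits = [i for i, (_, h) in enumerate(lst) if h <= bound]
    return fits[0] if fits else None


def iii_supports_reference(L: Lists, r5: Realization) -> List[Support]:
    """q_1, ..., q_{k1} for a fixed r5, read straight off the definition."""
    w5, h5 = r5
    out: List[Support] = []
    for r1 in L[1]:
        t4 = _widest_fit(L[4], r1[0] + w5)
        t3 = None if t4 is None else _narrowest_fit_height(L[3], h5 + L[4][t4][1])
        t2 = None if t3 is None else _widest_fit(L[2], w5 + L[3][t3][0])
        out.append(None if t2 is None else
                   {1: r1, 2: L[2][t2], 3: L[3][t3], 4: L[4][t4], 5: r5})
    return out


def iii_supports_linear(L: Lists, r5: Realization) -> List[Support]:
    """Same q_i using three forward-only pointers (Observation 4.3):
    O(k1 + k2 + k3 + k4) steps for a fixed r5 instead of O(k1 log k)."""
    w5, h5 = r5
    c4 = 0  # how many B4 realizations have width <= w1 + w5
    p3 = 0  # first B3 index with height <= h5 + h4
    c2 = 0  # how many B2 realizations have width <= w5 + w3
    out: List[Support] = []
    for r1 in L[1]:  # increasing w1, so the width bound for B4 only grows
        while c4 < len(L[4]) and L[4][c4][0] <= r1[0] + w5:
            c4 += 1
        if c4 == 0:
            out.append(None)
            continue
        h4 = L[4][c4 - 1][1]  # never increases, so the B3 bound only shrinks
        while p3 < len(L[3]) and L[3][p3][1] > h5 + h4:
            p3 += 1
        if p3 == len(L[3]):
            out.append(None)
            continue
        w3 = L[3][p3][0]  # never decreases, so the B2 bound only grows
        while c2 < len(L[2]) and L[2][c2][0] <= w5 + w3:
            c2 += 1
        if c2 == 0:
            out.append(None)
            continue
        out.append({1: r1, 2: L[2][c2 - 1], 3: L[3][p3], 4: L[4][c4 - 1], 5: r5})
    return out
```

Running the linear version once for each $(w_5, h_5)$ in $L(B_5)$ gives the $O(k^2)$ total stated above.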

5 Experimental Results

The algorithm (AreaMin) was implemented in C and executed on a SUN SPARCstation 10. The improvements presented in Section 4.2 were not implemented. We ran the program on a set of examples, most of which are from the literature. We first describe the 12 test examples. EX1 to EX5 are taken from [11]; they all use the floorplan shown in Figure 9. For EX1 each block has three realizations: (4, 1), (2, 2), and (1, 4). For EX2 each block has four realizations: (6, 1), (3, 2), (2, 3), and (1, 6). For EX3 each block has five realizations: (16, 1), (8, 2), (4, 4), (2, 8), and (1, 16). For EX4 each block has six realizations: (12, 1), (6, 2), (4, 3), (3, 4), (2, 6), and (1, 12). For EX5 each block has eight realizations: (24, 1), (12, 2), (8, 3), (6, 4), (4, 6), (3, 8), (2, 12), and (1, 24). EX6 is the 24-block floorplan together with the sets of realizations$^5$ in [12] (also see [11]). EX7 to EX12 are derived from EX1 to EX6, respectively. EX7 is constructed by substituting each of the five blocks of a wheel with a copy of EX1. EX8 to EX12 are constructed in the same way from EX2 to EX6.
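The total numbers of possible realizations listed in Table 3 follow from multiplying the numbers of realizations of the individual blocks. For example, for EX1 and its five-copy extension EX7, and for EX6 and its five-copy extension EX12:

$$
\begin{aligned}
\text{EX1: } & 3^{25} \quad (25 \text{ blocks, 3 realizations each}), \\
\text{EX7: } & \left(3^{25}\right)^5 = 3^{125} \quad (125 \text{ blocks}), \\
\text{EX12: } & \left(2.03 \times 10^{16}\right)^5 = 2.03^5 \times 10^{80} \approx 34.5 \times 10^{80} = 3.45 \times 10^{81}.
\end{aligned}
$$

Since 2.03 is itself a rounded value, the last figure agrees with the table only to the precision shown.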

Figure 9: The floorplan used in EX1 to EX5.

Table 3: The experimental results

test      no.      no. realizations   no. realizations    running
example   blocks   (floorplan)        (sub-floorplans)    time (sec)
EX1        25      3^25                  168               0.02
EX2        25      4^25                  365               0.03
EX3        25      5^25                  776               0.13
EX4        25      6^25                  994               0.18
EX5        25      8^25                 2031               0.46
EX6        24      2.03 × 10^16          441               0.08
EX7       125      3^125                 865               0.12
EX8       125      4^125                1930               0.27
EX9       125      5^125                4730               1.05
EX10      125      6^125                5723               1.23
EX11      125      8^125               12785               3.42
EX12      120      3.45 × 10^81         2382               1.78

Table 3 shows our experimental results. The column "no. realizations (floorplan)" lists the total number of possible realizations for each example. The column "no. realizations (sub-floorplans)" gives the total number of realizations (including nonredundant ones) of the sub-floorplans ever generated by AreaMin for each example; this number reflects the number of realizations examined by AreaMin. From the table it is clear that AreaMin prunes a large number of redundant realizations in determining the set of nonredundant realizations. For example, although EX11 has $8^{125}$ possible realizations, our algorithm examined only 12785 realizations to determine the set of nonredundant realizations, and it took only 3.42 seconds on a SUN SPARCstation 10. The running times for the other 11 examples are all under two seconds, as shown in Table 3. The experimental results clearly show that our algorithm is very efficient.

$^5$In this example, not all the widths and heights of the realizations are integers, but our algorithm handles this situation without any modification. The integrality requirement is needed only for the complexity analysis.

6 Conclusions

In this paper, we showed that the area minimization problem for hierarchical floorplans is NP-complete, settling the long-standing open problem on the complexity of this problem. We then presented a fast pseudo-polynomial area minimization algorithm for hierarchical floorplans of order-5, the class of hierarchical floorplans commonly used in practice. The algorithm is based on a new algorithm for determining the set of nonredundant realizations of a wheel, which runs faster and uses less memory than the previous algorithms, in both theory and practice.

References

[1] K. Chong and S. Sahni, "Optimal realizations of floorplans," in IEEE Trans. on Computer-Aided Design, vol. CAD-12, no. 6, pp. 793-801, 1993.
[2] Wei-Ming Dai and E. S. Kuh, "Simultaneous floor planning and global routing for hierarchical building block layout," in IEEE Trans. on Computer-Aided Design, vol. CAD-6, no. 5, pp. 828-837, 1987.
[3] M. L. Fredman, "How good is the information theory bound in sorting?," in Theoretical Computer Science 1, pp. 355-361, 1976.
[4] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, San Francisco, 1979.
[5] T. Lengauer, Combinatorial Algorithms for Integrated Circuit Layout. John Wiley & Sons, New York, 1990.
[6] T. Lengauer and R. Muller, "Robust and accurate hierarchical floorplanning with integrated global wiring," in IEEE Trans. on Computer-Aided Design, vol. CAD-12, no. 6, pp. 802-809, 1993.

[7] R. H. J. M. Otten, "Automatic floorplan design," in Proc. 19th ACM/IEEE Design Automation Conf., 1982, pp. 261-267.
[8] Peichen Pan and C. L. Liu, "Area minimization for general floorplans," in Digest Int'l. Conf. on Computer-Aided Design, 1992, pp. 606-609.
[9] L. Stockmeyer, "Optimal orientations of cells in slicing floorplan designs," in Info. and Control, vol. 59, pp. 91-101, 1983.
[10] K. J. Supowit and E. A. Slutz, "Placement algorithms for custom VLSI," in Computer Aided Design, vol. 16, no. 1, pp. 45-50, 1984.
[11] Ting-Chi Wang and D. F. Wong, "Optimal floorplan area optimization," in IEEE Trans. on Computer-Aided Design, vol. CAD-11, no. 8, pp. 992-1002, 1992.
[12] S. Wimer, I. Koren, and I. Cederbaum, "Optimal aspect ratios of building blocks in VLSI," in IEEE Trans. on Computer-Aided Design, vol. 8, no. 2, pp. 139-145, 1989.
[13] D. F. Wong and P. S. Sakhamuri, "Efficient floorplan area optimization," in Proc. 26th ACM/IEEE Design Automation Conf., 1989, pp. 586-589.
