Optimal Replication for Min-Cut Partitioning

James Hwang and Abbas El Gamal

Information Systems Laboratory, Stanford University, Stanford, CA 94305

Heuristics for replicating logic have been shown to reduce pin count and wiring density in partitioned logic networks. We present an efficient algorithm for determining an optimal min-cut replication set for a k-partitioned graph in $O(knm \log(n^2/m))$ time. For the NP-hard case with limited size partition components, we propose a new replication heuristic which reduces the worst-case running time by a factor of $O(k^2)$ over previous methods. Experimental results are presented.

0-8186-3010-8/92 $03.00 © 1992 IEEE

1 Introduction

Graph partitioning is an extensively studied problem with many applications, including floorplanning, placement and multi-chip design (e.g. [1]). The problem is to find a partition of the vertices of a graph which minimizes the number of edges between vertices in distinct components. With constraints on component sizes, the problem is NP-complete, but fast and effective heuristics exist [3, 6].

Vertex replication as demonstrated in Fig. 1 can be used to dramatically reduce the size of a cut in a partitioned graph. To date, however, work on replication has been experimental. An automated logic partitioning system supporting replication was described in [8], requiring trial and error by the user to determine logic to replicate. More recent heuristic approaches were reported in [2, 7].

In this paper we define vertex replication as a transformation on directed graphs. We present an $O(knm \log(n^2/m))$ algorithm for determining optimal min-cut replication sets in a k-partitioned directed graph. We propose Flow-FM, a new heuristic for the NP-hard case with limited size partition components, and show that the heuristic provides significant benefit as an optimization step applied after partitioning.

Our work on replication has been motivated by the increasingly popular practice of mapping large logic networks into multiple FPGAs. Often the number of pins on an FPGA is not large enough to permit high utilization of its gate capacity after partitioning. The excess gate capacity may be used for replication with the following potential benefits.

• Replication may reduce the number of FPGAs required to implement a design.

• Replication can reduce the number of wires interconnecting the FPGAs.

• Replication can reduce the number of inter-chip wires along a path in a design, resulting in increased performance.

The paper is organized as follows. Section 2 contains basic definitions and a formal statement of the min-cut replication problem. In Section 3 we present a solution to the replication problem, and in Section 4, a new replication heuristic. Section 5 contains experimental results.

2 Preliminaries

Given a directed graph $G = (V, E)$, a k-cut $\mathcal{V} = \{V_1, V_2, \ldots, V_k\}$ is a partition of the vertex set $V$ into $k$ disjoint subsets or components. The complement of a component $V_i$, $V \setminus V_i$, is denoted $\bar{V}_i$. A vertex $u$ is a source if it has in-degree zero, and a sink if it has out-degree zero. The set of sources in a subset $S \subseteq V$ is denoted $I_S$.

For any vertex $u \in V_i$, the replication of $u$ into another component $V_j$ is defined to be the graph $G' = (V', E')$, with vertex set $V' = V \cup \{u_j\}$ obtained by adding a new vertex $u_j$ to $V_j$, and edge set $E'$ identical to $E$ with the following modifications.

• Every cut edge $(u, v)$ incident from $u$, where $v \in V_j$, is replaced by an edge $(u_j, v)$.

• For every edge $(v, u)$ incident into $u$, $E'$ contains a new edge $(v, u_j)$.

Subsequent replication of $u_j$ into $V_i$ and further replications of $u$ into $V_j$ are defined to be null operations, and the transformation is extended in the natural way to arbitrary sets of vertices.

Figure 1: Graph corresponding to an $n$-to-$2^n$ decoder. Replicating $u$ reduces the cut size from $2^n$ to $n$.
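The replication transformation defined above, applied to the decoder of Figure 1, can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the names `replicate`, `cut_size`, and `decoder`, the tuple `(u, j)` used to name the copy $u_j$, and the placement of the inputs and decoder in component 0 with the outputs in component 1 are all assumptions made for the example. Replication of copies themselves is not modeled.

```python
def cut_size(edges, part):
    """Number of edges whose endpoints lie in different components."""
    return sum(1 for (a, b) in edges if part[a] != part[b])


def replicate(edges, part, u, j):
    """Replicate vertex u into component j.

    Returns (new_edges, new_part, copy). The copy u_j is named (u, j),
    a hypothetical convention for this sketch.
    """
    copy = (u, j)
    if part[u] == j or copy in part:
        # Further replications of u into V_j are null operations.
        return list(edges), dict(part), None
    new_part = dict(part)
    new_part[copy] = j
    new_edges = []
    for (a, b) in edges:
        if a == u and part[b] == j:
            new_edges.append((copy, b))   # cut edge (u, v), v in V_j, becomes (u_j, v)
        else:
            new_edges.append((a, b))
        if b == u:
            new_edges.append((a, copy))   # each in-edge (v, u) adds a new edge (v, u_j)
    return new_edges, new_part, copy


def decoder(n):
    """n-to-2^n decoder: inputs and decoder u in component 0, outputs in component 1."""
    inputs = [f"x{i}" for i in range(n)]
    outputs = [f"y{i}" for i in range(2 ** n)]
    edges = [(x, "u") for x in inputs] + [("u", y) for y in outputs]
    part = {v: 0 for v in inputs + ["u"]}
    part.update({y: 1 for y in outputs})
    return edges, part


edges, part = decoder(3)
print(cut_size(edges, part))              # 8, i.e. 2^n
new_edges, new_part, _ = replicate(edges, part, "u", 1)
print(cut_size(new_edges, new_part))      # 3, i.e. n
```

The original vertex `u` stays in its component; only its outgoing cut edges move to the copy, so the cut now consists of the `n` input edges feeding the copy.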

Let $\mathcal{V} = \{V_1, V_2, \ldots, V_k\}$ be a k-cut of $G$. We define

$$\mathrm{in}(V_i) = |\{(u, v) \in E : u \notin V_i,\ v \in V_i\}|$$

to be the number of cut edges incident into component $V_i$. Then, as each cut edge is incident into exactly one component, we have that $\mathrm{cutsize}(\mathcal{V}) = \sum_i \mathrm{in}(V_i)$.
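As a small check of the identity above, the following sketch computes $\mathrm{in}(V_i)$ for each component and sums the results. The function names and the five-edge example graph are hypothetical, chosen only for illustration.

```python
def in_count(edges, part, i):
    """in(V_i): number of edges (u, v) with u outside V_i and v inside V_i."""
    return sum(1 for (u, v) in edges if part[u] != i and part[v] == i)


def cutsize(edges, part):
    """cutsize as the sum of in(V_i) over all components."""
    return sum(in_count(edges, part, i) for i in set(part.values()))


# Hypothetical 3-cut: a in V_0, b in V_1, c and d in V_2.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c"), ("c", "d")]
part = {"a": 0, "b": 1, "c": 2, "d": 2}

print([in_count(edges, part, i) for i in (0, 1, 2)])  # [1, 1, 2]
print(cutsize(edges, part))                           # 4
```

The edge `("c", "d")` is internal to component 2 and is not counted, so the sum equals the number of edges that cross between components.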

Figure 2: Flow network construction and determination of $V_1^*$.

The main problem addressed in this paper is the following.

Min-cut Replication Problem. Given a directed graph $G = (V, E)$ and cut $\{V_1, V_2, \ldots, V_k\}$, determine a collection of sets of vertices, $\{V_{ij} : 1 \le i, j \le k\}$, which minimizes $\mathrm{cutsize}(\mathcal{V}')$, where $\mathcal{V}'$ is the partition that results when $V_{ij}$ is replicated from $V_i$ to $V_j$ for all $i$ and $j$.

We note in passing that choosing cost functions other than cut size results in distinctly different replication problems.

3 Solving the Replication Problem

Before treating the general problem, it is useful to consider the simplest nontrivial replication problem, namely determining the optimal unidirectional replication set when $k = 2$.

Simple Min-cut Replication Problem. Given a directed graph $G$ and cut $\{V_1, V_2\}$, determine a set of vertices $V_1^* \subseteq V_1$ which minimizes $\mathrm{in}(V_2 \cup V_1^*)$.

Let $G = (V, E)$ be a directed graph with cut $\{V_1, V_2\}$. We define a flow network $G' = (V', E')$ with vertex set $V' = V \cup \{s, t\}$, where source $s$ and sink $t$ are new vertices (see Figure 2). Sets $V_1' = V_1 \cup \{s\}$ and $V_2' = V_2 \cup \{t\}$ define a cut in $G'$. The edge set $E' = (E - E_{21}) \cup E_s \cup E_t$, where

• $E_s = \{(s, u) \mid u \in I_{V_1}\}$,

• $E_t = \{(v, t) \mid v \in V_2,\ v \text{ on a cut edge}\}$, and

• $E_{21} = \{(u, v) \mid u \in V_2 \text{ and } v \in V_1\}$.

The capacity function defined on $E'$, $\mathrm{cap}(e) =$ [...] and a $u$-$t$ path exists in $G'$. We then have the following result.

Theorem 1 $V_1^*$ is a solution to the simple min-cut replication problem.

Corollary 1 Given a directed graph $G = (V, E)$ and a cut $\{V_1, V_2\}$, an optimal replication set into $V_2$ can be determined in time $O(nm \log(n^2/m))$.

Proof. The flow network can be easily constructed in $O(m)$ time. Using, for instance, the algorithm in [5], a maximal flow can be found in time $O(nm \log(n^2/m))$. Determining $V_1^*$ by breadth-first search in the residual graph can be done in time $O(m)$. Hence, the entire replication algorithm runs in time $O(nm \log(n^2/m))$. ∎

The general k-cut min-cut replication problem can be solved by making the following observation.

Proposition 1 An equivalent statement of the min-cut replication problem is to determine sets $V_i^* \subseteq \bar{V}_i$ that minimize $\mathrm{in}(V_i \cup V_i^*)$ for $i = 1, 2, \ldots, k$. Furthermore, each subset $V_i^*$ can be determined independently.

The proposition can be proved with the following simple lemmas.

Lemma 1 Let $\{V_1, V_2, \ldots, V_k\}$ be a k-cut of $G$, and let $V_i^* \subseteq \bar{V}_i$, for $i = 1, 2, \ldots, k$, be arbitrary subsets of nodes. Then replicating $V_i^*$ in $V_i$ can affect $\mathrm{in}(V_j)$ only when $j$ equals $i$.

Lemma 2 Let $\{V_1, V_2, \ldots, V_k\}$ be a k-cut of $G$, and let $\{V_i^*\}$ be replication sets, $V_i^* \subseteq \bar{V}_i$, that minimize $\mathrm{in}(V_i \cup V_i^*)$ for $i = 1, 2, \ldots, k$. Then replicating $V_i^*$ in $V_i$ for each $i$ results in a minimal cut size over all replication sets.

Proof (of Proposition 1). Suppose $\{V_{ij}\}$ is an optimal solution to the min-cut replication problem. Then $\{V_j^* :$ [...]

Conversely, suppose $\{V_j^*\}$ is an optimal solution to the modified problem. Then letting $V_{ij} = V_j^* \cap V_i$ yields a solution to the original problem. Clearly the $V_{ij}$ are well defined. Furthermore, by Lemma 2, these $\{V_{ij}\}$ minimize cut size over all possible replication sets. ∎

By Proposition 1, a solution to the general vertex replication problem can be obtained by applying the simple algorithm independently to the cuts $\{V_i, \bar{V}_i\}$ for $i = 1, 2, \ldots, k$.

Corollary 2 Given a directed graph $G = (V, E)$ and a k-cut $\{V_1, V_2, \ldots, V_k\}$, optimal replication sets can be determined in time $O(knm \log(n^2/m))$.
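The flow-based procedure of this section can be sketched in Python as follows, under several stated assumptions. The capacity function is not legible in the text above, so the sketch assumes unit capacity on edges of $E$ and infinite capacity on the edges of $E_s$ and $E_t$. It uses a plain breadth-first augmenting-path (Edmonds-Karp) max-flow rather than the algorithm of [5], so it does not achieve the stated time bound. $V_1^*$ is taken to be the vertices of $V_1$ from which $t$ is reachable in the residual graph. All function and variable names are hypothetical.

```python
from collections import defaultdict, deque

INF = float("inf")


def simple_replication(edges, v1, v2):
    """Return (V1*, max-flow value) for the cut {v1, v2} of a directed graph."""
    s, t = "_s", "_t"                      # hypothetical names for the new source and sink
    cap = defaultdict(lambda: defaultdict(int))

    def add(a, b, c):
        cap[a][b] += c
        cap[b][a] += 0                     # make sure the reverse residual arc exists

    has_in_edge = {b for (_, b) in edges}
    for (a, b) in edges:
        if a in v2 and b in v1:
            continue                       # E_21 is removed from the network
        add(a, b, 1)                       # ASSUMPTION: unit capacity on graph edges
        if a in v1 and b in v2:
            add(b, t, INF)                 # E_t: b lies in V2 on a cut edge
    for u in v1:
        if u not in has_in_edge:
            add(s, u, INF)                 # E_s: u is a source (in-degree zero) in V1

    flow = 0
    while True:                            # Edmonds-Karp: shortest augmenting paths
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            a = queue.popleft()
            for b, c in cap[a].items():
                if c > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if t not in parent:
            break
        bottleneck, b = INF, t
        while parent[b] is not None:       # find the bottleneck along the path
            bottleneck = min(bottleneck, cap[parent[b]][b])
            b = parent[b]
        b = t
        while parent[b] is not None:       # push flow and update residual arcs
            cap[parent[b]][b] -= bottleneck
            cap[b][parent[b]] += bottleneck
            b = parent[b]
        flow += bottleneck

    reach = {t}                            # vertices with a residual path to t
    queue = deque([t])
    while queue:
        x = queue.popleft()
        for w in cap[x]:
            if w not in reach and cap[w][x] > 0:
                reach.add(w)
                queue.append(w)
    return {u for u in v1 if u in reach}, flow


def kcut_replication(edges, parts):
    """Apply the simple algorithm to each component and its complement."""
    all_vertices = set().union(*parts)
    return [simple_replication(edges, all_vertices - vi, vi) for vi in parts]


# Decoder of Figure 1 with n = 3 (component placement is an assumption).
ins = [f"x{i}" for i in range(3)]
outs = [f"y{i}" for i in range(8)]
edges = [(x, "u") for x in ins] + [("u", y) for y in outs]
print(simple_replication(edges, set(ins) | {"u"}, set(outs)))   # ({'u'}, 3)
```

On the decoder, the three input edges saturate, the decoder `u` ends up on the sink side of the minimum cut, and the flow value 3 matches the cut size `n` reported in Figure 1. `kcut_replication` mirrors the independence stated in Proposition 1 by solving one two-way problem per component.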

The simple min-cut replication problem for hypergraphs can be solved in a similar manner, replacing each multiple vertex hyperedge by a directed tree. However, for the general hypergraph problem, cut size is not equal to $\sum_i \mathrm{in}(V_i)$, so minimizing the sum does not immediately guarantee a minimal cut. As discussed later, this may not be a serious limitation in practice.

4 A k-Cut Replication Heuristic

When graph edges and vertices are assigned positive weights and partition component sizes are limited, the replication problem becomes NP-hard.

The optimal solution of the previous section suggests a coherent approach to designing replication heuristics. Specifically, any heuristic for approximating a minimal cut can be used to approximate the max-flow solution to the constrained k-cut replication problem. By considering in turn each cut $\{V_i, \bar{V}_i\}$, we reduce the replication problem to a well studied partitioning problem.

For example, with only slight modification, a partitioning heuristic due to Fiduccia and Mattheyses [3], denoted FM, can be used for the constrained replication problem. It is straightforward to modify FM to take into account edge directionality and to keep track of the number of cut edges incident into and out of each component (the original algorithm does not distinguish the direction of an edge). Similarly, the gain calculations must be modified to locally minimize $\mathrm{in}(V_i)$ instead of the cut size, and the size constraint should apply only to the component $V_i$. This approach is in contrast to [7], which uses an extended FM routine to perform replication between component pairs explicitly during partitioning.

Figure 3 contains a pseudocode description of Flow-FM, a replication heuristic which combines the max-flow approach with the directed graph FM partitioning heuristic. For each component, the max-flow solution is determined. If the resulting replication set does not violate the capacity constraints, it is kept as part of the solution. Otherwise, the FM heuristic is used to find a good, feasible solution.

By replacing hyperedges by directed trees, Flow-FM can be applied to hypergraphs. When most of the edges in a hypergraph have only two vertices it is reasonable to assume the flow results are close to optimal. Indeed, data in the next section show that Flow-FM substantially reduces the number of outgoing edges, which indicates that minimizing $\mathrm{in}(V_i)$ does not come at the expense of increasing $\mathrm{out}(V_i)$.

Advantages of Flow-FM include the following.

• When all max-flow solutions are feasible in a graph, the overall answer is provably optimal.

• By using the FM approximation without a flow step, the run time complexity is reduced by a factor of $O(k^2)$ over previous approaches by solving $k$ simple problems instead of up to $k$ rounds of $O(k^2)$ bipartition problems.

• Existing partitioning routines can be used with straightforward modifications to account for edge directionality.

5 Experimental Results

We devised the following experiments to examine the performance of Flow-FM as an optimization step applied after partitioning. Input netlists, ranging in size from a few hundred to four thousand Actel modules, were flattened to the level of the Actel library. Several designs, t2901, pt1020, gme-a and vrcl, were industrial examples; others were derived from MCNC benchmarks, and graphic was a netlist for a graphics engine. Initial partitions were obtained using the FM partitioning heuristic recursively to obtain a feasible solution, then iteratively between all pairs of components to improve the quality of the partition as much as possible.

In Experiment 1, each design was partitioned for Actel 1020 FPGAs with capacity 546 modules and 69 I/Os, unless the design fit in a single FPGA, in which case it was partitioned with capacity 100 and 50 I/O pins per chip. The Flow-FM replication heuristic was then applied to each component of the partition, and the reductions in total and output pins were recorded.

Table 1 contains the results of Experiment 1. In the table, Δpin is the relative percentage change in total and output pins resulting from replication, and $T_R/T_P$ is the ratio of the running time for Flow-FM over the time for the partitioning step. For comparison, Δpin is also shown for replication using the FM heuristic without the max-flow step. Replication provided a substantial reduction in the number of pins using available unused modules on the chips, with reductions as large as 40% over min-cut partitioning without replication. Furthermore, it is clear that minimizing $\mathrm{in}(V_i)$ did not come at the expense of increasing $\mathrm{out}(V_i)$. This suggests that the flow step finds replication sets that are probably close to optimal. Comparing the reductions from Flow-FM to those from FM alone shows that the flow step improves gain substantially.

Experiment 2 was devised to examine replication gain as a function of chip capacity. Each design of size $N$ was partitioned into five fictitious chips with capacity $\lfloor N/5 \rfloor + 2$. The chip capacity was then increased by increments of $N/10$ modules, the Flow-FM heuristic was applied to each chip, and the gain recorded.

Figure 4 contains the results averaged over all the input designs, measuring gain (percentage reduction in the number of required pins) as a function of capacity expansion. Doubling the capacity allowed many of the max-flow solutions to become feasible, and an almost fourfold capacity increase was required for all the max-flow solutions to be accepted.

As can be readily seen from Table 1 and Figure 4, replication gain varied significantly over the designs in the data set, but in all cases, Flow-FM provided a substantial reduction in total pins at a low computational cost.

Table 1: Results of Experiment 1.

Design     Δpin total (FM)    Δpin total (Flow-FM)
C7552      -4 %               -38 %
dec        -12 %              -24 %
des        -21 %              -25 %
entmisc    -6 %               -16 %
gmea       -3 %               -4 %
graphic    -13 %              -40 %
pt1020     -17 %              -21 %
t2901      -7 %               -23 %
vrcl       -4 %               -4 %

Design sizes (modules), row assignment not recoverable: 1316, 1734, 1898, 1529, 2013, 3954, 1824, 279, 294, 544.
[Output-pin Δpin and time-ratio columns not recoverable.]

procedure Flow-FM(G);
begin
    for each component Vi
    begin
        Gi = BuildFlowNetwork(G, Vi);
        Vi* = FlowRep(Gi);
        if ( |Vi ∪ Vi*| > capacity )
            Vi* = FM-Rep(G, Vi);
    end;
end;

procedure FM-Rep(G, Vi);
begin
    for each node u ∈ V
        if (u ∈ Vi) u.lock = FIXED;
        else u.lock = UNLOCKED;
    DirectedFM(Vi, V̄i);
    for each node u ∈ Vi
        if (u.lock ≠ FIXED)
            Vi* = Vi* ∪ {u};
end;

Figure 3: Pseudocode for Flow-FM replication heuristic.
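The control flow of the Flow-FM pseudocode in Figure 3 can be sketched in Python as below. Only the accept-or-fall-back structure is shown: `flow_rep` and `fm_rep` are hypothetical callables standing in for the max-flow step and the directed FM step, and the toy stand-ins in the usage lines are invented purely to exercise the branch. They are not implementations of either step.

```python
def flow_fm(parts, capacity, flow_rep, fm_rep):
    """Return one replication set per component.

    parts    : list of vertex sets V_i
    capacity : maximum allowed size of V_i together with its replication set
    flow_rep : callable V_i -> unconstrained (max-flow) replication set
    fm_rep   : callable V_i -> feasible replication set from the FM fallback
    """
    replication_sets = []
    for vi in parts:
        star = set(flow_rep(vi))
        if len(vi | star) > capacity:
            # The max-flow solution violates the capacity constraint,
            # so fall back to the heuristic.
            star = set(fm_rep(vi))
        replication_sets.append(star)
    return replication_sets


# Toy stand-ins (hypothetical) that only exercise the two branches.
parts = [{"a", "b"}, {"c"}]
toy_flow_rep = lambda vi: {"c", "d"} if "a" in vi else {"a"}
toy_fm_rep = lambda vi: {"c"}

print(flow_fm(parts, 3, toy_flow_rep, toy_fm_rep))   # [{'c'}, {'a'}]
```

Passing the two steps as parameters keeps the sketch independent of any particular max-flow or FM implementation.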

Although the constrained replication problem is NP-hard, it appears that the unconstrained min-cut algorithm can still determine useful replication sets, especially when chip utilization is low.

Figure 4: Results of Experiment 2, with error bars indicating one standard deviation.

Acknowledgements

We thank Dana How, John Gill, and Sanko Lan for helpful discussions, and Jack Kouloheris and Polly Siegel for assistance in running the experiments. This research was in part supported by DARPA under contract J-FBI-89-101.

References

[1] I. Dobbelaere, A. El Gamal, D. How, and B. Kleveland, "Field Programmable MCM Systems-Design of an Interconnection Frame," FPGA Workshop, 1992.

[2] A. El Gamal, et al., Architectures, Circuits, and Computer-Aided Design for Electrically Programmable VLSI, Semi-Annual Technical Report, Defense Advanced Research Projects Agency, March 1991.

[3] C. M. Fiduccia and R. M. Mattheyses, "A Linear Time Heuristic for Improving Network Partitions," Proceedings of the 19th Design Automation Conference, 1982, pp. 175-181.

[4] L. R. Ford, Jr. and D. R. Fulkerson, "Maximal Flow Through a Network," Canadian Journal of Mathematics, Vol. 8, 1956, pp. 399-404.

[5] A. V. Goldberg and R. E. Tarjan, "A New Approach to the Maximum Flow Problem," Proceedings of the 18th ACM Symposium on Theory of Computing, 1986, pp. 136-146.

[6] B. W. Kernighan and S. Lin, "An Efficient Heuristic Procedure for Partitioning Graphs," The Bell System Technical Journal, Vol. 49, 1970, pp. 291-307.

[7] C. Kring and A. R. Newton, "A Cell-Replicating Approach to Mincut-Based Circuit Partitioning," Digest of Technical Papers, ICCAD-91, Nov. 1991, pp. 2-5.

[8] R. L. Russo, P. H. Oden, and P. K. Wolff, Sr., "A Heuristic Procedure for the Partitioning and Mapping of Computer Logic Graphs," IEEE Trans. on Computers, vol. C-20, no. 12, December 1971, pp. 1455-1462.