
The Amorphous FPGA Architecture

Mingjie Lin

Department of Electrical Engineering, Stanford University, CA 94305

[email protected]

ABSTRACT

This paper describes the Amorphous FPGA, an innovative architecture that attempts to optimally allocate logic and routing resources on a per-mapping basis. Designed for high performance, routability, and ease of use, it supports variable-granularity logic blocks, dedicated wide multiplexers, and variable-length bypassing interconnects with a symmetrical structure. Due to its many unconventional architectural features, the amorphous FPGA requires several major modifications to the standard VPR placement/routing CAD flow, including a new placement algorithm and a modified delay-based routing procedure. It is shown that, on average, an FPGA with the amorphous architecture can achieve a 1.35 times improvement in logic density, a 9% improvement in average net delay, and a 4% improvement in critical-path delay for the largest 20 MCNC benchmark circuits over an island-style baseline.

Categories and Subject Descriptors

B.7.1 [Integrated Circuits]: [Types and Design Styles]

General Terms

Design, Experimentation, Measurement, Performance

Keywords

FPGA, architecture, amorphous, performance analysis.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
FPGA’08, February 24-26, 2008, Monterey, California, USA. Copyright 2008 ACM 978-1-59593-934/08/02 ...$5.00.

1. INTRODUCTION

Despite the many advantages of FPGAs, the huge performance and cost-efficiency gap between FPGAs and ASICs severely limits their application. Previous studies [1, 2] have shown that without innovations in FPGA architecture, advances in device technology alone cannot significantly shrink this gap. Unfortunately, optimizing FPGA architecture proves to be quite challenging, mainly because an FPGA’s overall performance is jointly determined by many factors, including logic block, IO block, clock network, and routing architectures, etc. As a result, the architecture of today’s FPGAs, although enhanced with extra features such as block RAMs and embedded microprocessors, still very much resembles the one used in the first generation of FPGAs, with a similar well-structured island style in which an array of logic blocks is surrounded by pre-fabricated programmable routing channels, as illustrated in Figure 1.

Figure 1: A generic island-style FPGA.

The conventional island-style FPGA poses several challenges to architecture design. The first challenge is how to find a good balance between flexibility and efficiency in terms of area, performance, and power, i.e., how to optimally allocate hardware resources between logic and interconnects while still achieving maximum overall performance for a given set of target designs. Conventionally, an FPGA has a strict separation between logic and routing resources. This division is determined before chip fabrication and is fixed at configuration time. Despite many advances in device technology over the last decade, a large proportion of the silicon area (≈60-80%) is always devoted to routing resources in order to ensure sufficient routability [3]. Meanwhile, the logic blocks are becoming ever more complex, attempting to perform coarse-grain functions and therefore lighten the stress on the routing resources, but often end up being under-utilized. Given a fixed amount of hardware, how to optimally partition it between logic and routing remains an open problem in FPGA architecture design. This challenge is further complicated by the fact that a generic FPGA family often needs to maximize the application spectrum, covering both control- and data-path applications. One conceptually appealing idea is to remove the hard boundary between logic and routing, thereby permitting applications with regular logic structures to utilize silicon area more efficiently, while still permitting the use of many interconnects at the expense of logic for random logic circuits. It should be noted that trading hardware resources between routing and logic is not a totally new idea: the early Pilkington architecture and Triptych [4] were two early attempts to follow this idea and have shown significant density advantages over the traditional island style. More recently, [1, 5, 6] follow a similar approach of spreading out logic to match routing demand.

The second challenge is to determine the granularity of logic blocks in an FPGA. All prominent FPGAs [7, 8, 9, 10] today have fixed and uniform logic granularity for each logic block. From the architecture point of view, coarse-grain blocks put much less stress on placement and routing but often result in long internal logic delays and under-utilization for small designs, whereas fine-grain logic blocks can achieve shorter internal delay but often require an excessive amount of routing resources in order to successfully route a circuit. From the application point of view, datapath functions, in particular arithmetic functions, often operate on coarser arguments than control-path logic and are usually realized by fine-grain logic elements, while the implementation of control-path logic mostly benefits from coarser granularity. A rather interesting question is whether the logic blocks in an FPGA should be heterogeneous or homogeneous in size. There are two architectural reasons to believe that an FPGA with heterogeneous blocks can potentially provide superior speed and density: (i) Different kinds of logic may be more efficiently implemented with different kinds of blocks. (ii) Previous studies have shown that coarse-grain blocks exhibit superior speed to fine-grain blocks, yet the smaller blocks have better density [11, 12]. A mixture of the two may provide a superior speed-area trade-off. Partially motivated by the above observations, Hutton et al. [13] proposed a new adaptable FPGA logic element based on fracturable 6-LUTs, which fundamentally alters the long-standing belief that the 4-LUT is the best choice for the area/delay trade-off.

The third challenge of designing an island-style FPGA is to determine the optimal segmentation of routing interconnects. In a conventional routing architecture, each routing channel consists of a group of interconnects with variable lengths. For example, Virtex II [8] has 16 Single, 40 Double, 120 HEX, and 24 long interconnects in each routing channel. In general, short segments are advantageous to routability but bad for delay and power performance, while long interconnects achieve better delay performance but may result in higher power consumption. For a given set of benchmark circuits, exactly what the optimal segmentation is for a particular routing architecture remains an open question.

New Approach

We propose the Amorphous FPGA architecture to meet the above design challenges. Our objective is to develop an architecture that maximizes the application spectrum for both data-path and control-path applications without compromising performance and area efficiency. The main motivation behind the amorphous architecture is to reduce the significant cost paid for routing in standard FPGAs and translate the saving in hardware usage into performance gain. The central idea of the amorphous FPGA is to make several architectural choices dynamically configurable on a per-mapping basis at configuration time. The concept of this architectural “shapelessness” is illustrated in Figure 2. In the conventional island-style architecture, there is a strong separation between logic and routing resources, and this resource partition is fixed after chip fabrication; the amorphous FPGA, in contrast, allows dynamic resource partitioning at configuration time. In addition, the amorphous architecture can readily perform several system-level functions such as (i) dynamic resource allocation between logic and routing, (ii) variable-granularity logic blocks, (iii) dedicated wide multiplexers, and (iv) variable-length interconnect overlay without passing through switching points.

Figure 2: Conceptual picture of FPGA architectures: (a) Conventional island-style FPGA, (b) Amorphous FPGA.

In the following section, we describe the amorphous FPGA architecture in detail. We then illustrate in Section 3 how various system functions can be performed inside an amorphous FPGA. Before presenting the performance comparison results between the amorphous architecture and an island-style baseline in Section 5, we present our placement and routing algorithms in Section 4. Finally, in Section 6, we summarize our findings and comment on several open research problems related to the amorphous FPGA architecture.

2. THE AMORPHOUS ARCHITECTURE

As depicted in Figure 3, the top-level architecture of the amorphous FPGA consists of an array of Routing or Logic Element (ROLE) blocks with horizontal and vertical routing channels overlaid on top. Unlike the conventional island-style FPGA, the amorphous FPGA replaces logic blocks with specially designed ROLE blocks that allow the dynamic partition of hardware resources between logic and routing on a per-mapping basis after chip fabrication. Each ROLE block is capable of performing logic only, routing only, or a combination of both tasks.

Routing or Logic Element (ROLE)

As shown in Figure 4, a typical ROLE block contains three types of functional structures: 4-input look-up tables (4-LUTs), flip-flop registers, and MUXes. The main motivation of this design is the observation that the logic capability of a LUT subsumes that of a MUX or a multiple-input switch. As shown in Figure 5, a 4-LUT can readily implement a 2:1 MUX or a 4-input switch. To differentiate it from conventional LUTs (look-up tables) and MUXes (multiplexers), we name the structure depicted in Figure 4(b) a MUT (Multiplexer or look-Up Table). Three parameters, W, m, and k, define the structure of a ROLE block. W denotes the total number of MUXes and MUTs along each side of the ROLE block, m is the number of MUTs on each side, and k is the number of inputs for a MUX or MUT in a ROLE block. Figure 4 depicts a ROLE block with W = 3, m = 1, and k = 6. A ROLE block can be configured into different types of functional blocks. If all 6-MUTs are used as 6-MUXes, then the whole ROLE will behave like a routing block. In contrast, if we use all 6-MUTs as combinations of 4-LUTs and their associated FFs, then the whole ROLE can be viewed as a typical logic block with four 4-LUTs. Alternatively, we can partially use 6-MUTs and use the ROLE block as a hybrid
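The ROLE description above makes two concrete claims that a small model can illustrate: a 4-LUT can be configured to act as a 2:1 MUX, and the parameters W and m determine how many MUTs and MUXes a block offers. The following is a minimal Python sketch under stated assumptions, not the paper's circuit. The names `lut4`, `build_mux2_table`, and `RoleBlock`, the pin order (sel, a, b, unused), and the choice of four sides per block are all hypothetical; the four-side assumption is chosen only because it reproduces the "four 4-LUTs" count given for W = 3, m = 1.

```python
from dataclasses import dataclass
from itertools import product


def lut4(table, inputs):
    """Evaluate a 4-input LUT: `table` holds 16 configuration bits,
    `inputs` is a tuple of four 0/1 values used as the address."""
    i3, i2, i1, i0 = inputs
    address = (i3 << 3) | (i2 << 2) | (i1 << 1) | i0
    return table[address]


def build_mux2_table():
    """Configuration bits that make a 4-LUT behave as a 2:1 MUX.
    Assumed pin order (hypothetical): (sel, a, b, unused)."""
    table = [0] * 16
    for sel, a, b, unused in product((0, 1), repeat=4):
        address = (sel << 3) | (a << 2) | (b << 1) | unused
        table[address] = b if sel else a  # sel = 0 passes a, sel = 1 passes b
    return table


@dataclass(frozen=True)
class RoleBlock:
    W: int          # MUXes plus MUTs along each side
    m: int          # MUTs along each side
    k: int          # inputs per MUX or MUT
    sides: int = 4  # assumption: a square block with four sides

    def mut_count(self):
        return self.sides * self.m

    def mux_count(self):
        return self.sides * (self.W - self.m)

    def max_luts(self):
        # every MUT used as a 4-LUT with its flip-flop: pure logic block
        return self.mut_count()

    def max_routing_muxes(self):
        # every MUT used as a k-input MUX: pure routing block
        return self.sides * self.W


table = build_mux2_table()
role = RoleBlock(W=3, m=1, k=6)
print(role.max_luts(), role.mux_count(), role.max_routing_muxes())  # 4 8 12
```

The unused fourth LUT input is simply ignored by the truth table, which is one way to see why a LUT's capability covers that of a small MUX: the MUX function is just one of the 2^16 possible configurations.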

[Figure labels: Interconnect Overlay, ROLE block, 4-MUX, 6-MUT]