8C-1

High-Density Integration of Functional Modules Using Monolithic 3D-IC Technology

Shreepad Panth¹, Kambiz Samadi², Yang Du², and Sung Kyu Lim¹
¹Dept. of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332
²Qualcomm Research, San Diego, CA 92121

Abstract: Three-dimensional integrated circuits (3D-ICs) have emerged as a promising solution to continue device scaling. They can be realized using through-silicon vias (TSVs), or by monolithic integration using monolithic inter-tier vias (MIVs), an emerging alternative that provides much higher via densities. In this paper, we provide a framework for floorplanning existing 2D IP blocks into 3D-ICs using MIVs. We take the floorplanning solution all the way through place-and-route and report post-layout metrics for area, wirelength, timing, and power consumption. Results show that TSV-based 3D designs improve wirelength over 2D designs by up to 14%, and only in large-scale circuits. MIV-based 3D designs, however, offer an average wirelength improvement of 33% for a wide range of benchmark circuits. We also show that while TSV-based 3D cannot improve performance and power unless the TSV capacitance is reduced, MIV-based 3D offers a significant reduction of up to 33% in the longest path delay and 35% in the inter-block net power.

Fig. 1. A sample monolithic 3D technology with three metal layers per tier.

I. INTRODUCTION

Three-dimensional integrated circuits (3D-ICs) have emerged as a promising solution to extend the 2D scaling trajectory predicted by Moore’s Law. Currently, through-silicon vias (TSVs) enable 3D-ICs by allowing vertical stacking of multiple dies fabricated separately. However, the quality of TSV-based 3D-ICs strongly depends on the TSV dimensions and parasitics, and such designs are limited to memory-on-logic or large logic-on-logic designs with a relatively small number of global interconnects [1], [2].

An emerging alternative to TSV-based 3D is monolithic 3D, which enables orders of magnitude higher integration density than TSV-based technology due to the extremely small size of the monolithic inter-tier vias (MIVs). Monolithic 3D integration technology fabricates two or more tiers of devices sequentially, instead of bonding two previously fabricated dies using micro bumps and TSVs. Figure 1 shows a typical monolithic 3D structure with three metal layers per tier. The two device tiers are connected by inter-tier vias, which are essentially the same size as intra-tier vias. To fabricate the top device tier, a low-thermal-budget process must be applied to prevent damage to the underlying tier’s back-end-of-line (BEOL).

Several monolithic 3D integration processes have been developed. CEA/LETI [3], [4] has developed a sequential integration flow based on a low-temperature bonding process. Samsung [5] has developed an “S3” technology for a 3-tier SRAM cell using a low-thermal TFT process. Overall, MIVs provide better electrical characteristics (i.e., lower parasitics, electrical coupling, etc.) than TSVs, and also enable higher integration densities due to their small size.

In this paper, we propose an efficient 3D design space exploration framework (i.e., 3D floorplanning) that accounts for the different characteristics of TSV-based and monolithic 3D integration technologies. Since re-designing existing logic, memory and IP blocks for 3D incurs significant design overhead and cost, near-term 3D-ICs will focus on reusing existing 2D blocks [6], [7], [8]. We present a floorplanning framework that uses different optimization objectives, based on the physical characteristics of both TSVs and MIVs. To the best of our knowledge, this paper is the first to provide a 3D floorplanning framework specifically for monolithic 3D integration technology.

978-1-4673-3030-5/13/$31.00 ©2013 IEEE

To further enhance the applicability of our approach, we integrate our proposed 3D block-level floorplanning framework with existing commercial place and route (P&R) tools to more accurately assess the solution quality. We use four different testcases of varying complexity, ranging from 33K to 1.7M gates in 45nm technology, to better show the impact of our proposed methodology. The contributions of our work are listed below.
• We propose and develop an integrated 3D block-level floorplanning framework with appropriate objective functions for both TSV-based and monolithic 3D to enable an efficient 3D-IC design space exploration.
• In addition to using simulated annealing for floorplanning, we propose a post-floorplan refinement (PFPR) heuristic which achieves an average reduction of 6.13% in inter-block wirelength with respect to the initial floorplan.
• We propose a methodology for MIV planning, which relies on custom scripts and existing commercial P&R tools.
• We develop a methodology that takes the obtained 3D floorplan all the way through place and route, and is capable of reporting post-layout timing and power numbers.

II. RELATED WORK

Monolithic design for high-performance ICs was presented in [9]. That paper presented two design styles: one in which PMOS and NMOS devices are fabricated on separate layers, and another in which standard cells have both PMOS and NMOS devices in the same tier. The authors presented a placement algorithm to fully utilize high-density MIVs. Although the stackup is similar to ours, that study was carried out at the gate level, and the placement algorithm is not applicable to block-level monolithic 3D-ICs. Design for monolithic 3D SRAM was carried out in [10]. The authors provided different design styles of the SRAM cell, assuming different PMOS and NMOS tiers, and compared them with respect to static noise margin, write margin, and data retention voltage.
While several prior works exist on adding TSVs at the gate level or core level, only a few works consider adding TSVs into existing whitespace blocks at the floorplanning stage. Simultaneous buffering and TSV planning was carried out in [11], but the authors reported inaccurate 3D HPWL and timing metrics. An improved algorithm was presented in [7], but the same inaccurate HPWL metric was used. Results based on an improved BB-2D-HPWL metric were presented in [8], and the most accurate HPWL metric, based on subnets, was presented in [6]. However, none of these papers compared the quality of their engine with that of a commercially available tool, or took the obtained floorplans through place and route and reported post-layout numbers. These shortcomings are overcome in this paper, and therefore, the numbers reported are the most accurate. To the best of our knowledge, this is the first work to fully exploit the high density offered by monolithic 3D integration, use a validated floorplanner to perform block-level monolithic 3D design, and compare post-layout 3D wirelength, timing and power numbers with those of a commercial 2D tool.

III. 3D FLOORPLANNING WITH MONOLITHIC INTER-TIER VIAS

A. Problem Formulation and Overview

A general form of the 3D floorplanning problem can be stated as follows: Given the number of desired tiers, and a set of blocks along with their corresponding widths and heights, determine the (x, y, z) locations of each of the blocks and all MIVs/TSVs. The overall design flow is shown in Figure 2. We first perform floorplanning to determine the location of all the blocks assuming the pins are placed at the center. Once the locations of all the blocks are determined, we update the locations of the pins and perform a refinement step (i.e., PFPR) to further minimize wirelength. Depending on whether we are dealing with TSVs or MIVs, we have different via planning engines. Finally, we create separate Verilog files for each die/tier with the corresponding connectivity information, and a design exchange format (DEF) file with the location of blocks and TSVs/MIVs. Each of the above steps is further explained in the following subsections.

Fig. 2. The design flow to obtain a 3D floorplan with TSV/MIV insertion. (Flowchart: center-to-center based annealing; update with pin locations; annealing-based refinement; if not monolithic, TSV planning using existing work; if monolithic, create Verilog and DEF files with pins, route with Encounter, and extract MIV location and connectivity; finally, create a Verilog/DEF file for each die. Steps are marked as either custom program or Cadence Encounter.)

B. Floorplanning Engine

In this step, we take the description of all the blocks as well as the connectivity information and generate an output floorplan that minimizes a certain cost function depending on whether we are using TSV-based or monolithic 3D. We use a simulated annealing engine similar to [6], maintaining a separate sequence pair for each die. We perform the following moves during the annealing process: (1) change the aspect ratio of a block (or rotate it in the case of hard blocks), (2) swap two blocks in either the positive sequence, negative sequence, or both, and (3) move or swap two blocks between a pair of dies/tiers. In TSV-based 3D, we need to control the number of TSVs due to their significant silicon area. Hence, the TSV-based 3D cost function is given as follows.

$$C_{TSV} = \alpha \cdot WL + \beta \cdot A + \gamma \cdot N_{TSV} \qquad (1)$$

In the above equation, $WL$ represents the inter-block wirelength, $A$ represents the chip area, and $N_{TSV}$ represents the number of TSVs. However, if we are dealing with monolithic 3D, then the MIV size is negligible, and we do not need to constrain the number of MIVs, opening up the possibility for further optimization. The monolithic 3D cost function is given as follows.

$$C_{MIV} = \alpha \cdot WL + \beta \cdot A \qquad (2)$$
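To make the difference between the two objectives concrete, the following is a minimal Python sketch of Equations (1) and (2). The function names, the two candidate floorplans, and the weight values are illustrative assumptions; the paper does not state the weights it uses.

```python
def cost_tsv(wl, area, n_tsv, alpha, beta, gamma):
    """Equation (1): weighted sum of wirelength, area, and TSV count."""
    return alpha * wl + beta * area + gamma * n_tsv


def cost_miv(wl, area, alpha, beta):
    """Equation (2): no via-count term, since MIV size is negligible."""
    return alpha * wl + beta * area


# Hypothetical candidates: (inter-block WL in um, area in um^2, number of vias).
few_vias = (400_000.0, 70_000.0, 100)
many_vias = (300_000.0, 70_000.0, 2_000)

# Illustrative weights only (not taken from the paper).
ALPHA, BETA, GAMMA = 1.0, 1.0, 100.0

tsv_costs = [cost_tsv(wl, a, n, ALPHA, BETA, GAMMA) for wl, a, n in (few_vias, many_vias)]
miv_costs = [cost_miv(wl, a, ALPHA, BETA) for wl, a, _ in (few_vias, many_vias)]
```

With these made-up numbers, the TSV objective prefers the floorplan with fewer vias, while the MIV objective prefers the one with the shorter wirelength regardless of how many vias it needs.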

Considering the pin locations of the blocks during floorplanning would require an extra step to compute the physical location of all block-pins. Since the number of block-pins is quite large, this would lead to a large runtime overhead. We instead propose a post-floorplanning refinement (PFPR) step to consider pin locations once block locations have been determined.

C. Post-Floorplan Refinement (PFPR)

After we determine the relative locations of all the blocks, we update the blocks with the pin locations. Each block has 8 possible orientations: 0°, 90°, 180°, 270°, and their flipped counterparts. Without changing the relative locations of the blocks in the floorplan, each block can only have four possible orientations. For example, if the pins are in the center of a block, 0°, 180° or 90°, 270° and their flipped counterparts are all the same. However, if the pins are placed along the periphery, each of the four allowed orientations gives a different wirelength result. The goal of this step is to determine the orientation of each block such that the wirelength is minimized. To do this, we use simulated annealing, where the only operation allowed is to change block orientation. The block orientation can only be changed among the four allowed scenarios. No sequence pair is necessary, as the relative locations of blocks do not change. Furthermore, wirelength computation can be done incrementally, as we only change one block at a time.

D. MIV Planning Algorithm

Once we obtain the 3D floorplanning result, we need to insert TSVs, or MIVs in the case of monolithic 3D, to connect blocks in different tiers. Since TSVs are big (around 5μm to 10μm) and we may not have enough whitespace in the dies, a whitespace manipulation step is required. We use an existing TSV planner [6] that constructs a 3D rectilinear Steiner tree (RST) from a 2D rectilinear Steiner minimum tree (RSMT), and then moves TSVs to nearby whitespace based on a network-flow formulation.
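The orientation-only annealing of the PFPR step (Section III-C) can be sketched in Python as follows. The data layout (blocks as (x, y, w, h) tuples, pins as offsets from the lower-left corner), the orientation names N/S/FN/FS, and the cooling schedule are assumptions made for illustration. For brevity the sketch recomputes the total half-perimeter wirelength after every move instead of updating it incrementally as the text describes.

```python
import math
import random

# Orientations that keep a block's width and height unchanged (hypothetical names).
ORIENTATIONS = ("N", "S", "FN", "FS")


def pin_position(block, orient, pin):
    """Absolute pin location for a block (x, y, w, h) and pin offset (px, py)."""
    x, y, w, h = block
    px, py = pin
    if orient in ("S", "FN"):   # mirrored left-right
        px = w - px
    if orient in ("S", "FS"):   # mirrored top-bottom
        py = h - py
    return x + px, y + py


def total_hpwl(blocks, orients, nets):
    """Sum of half-perimeter wirelengths; a net is a list of (block index, pin offset)."""
    total = 0
    for net in nets:
        pts = [pin_position(blocks[b], orients[b], pin) for b, pin in net]
        xs, ys = zip(*pts)
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def refine_orientations(blocks, nets, steps=2000, t0=10.0, cooling=0.995, seed=0):
    """Anneal over block orientations only; block locations never change."""
    rng = random.Random(seed)
    orients = ["N"] * len(blocks)
    cur = total_hpwl(blocks, orients, nets)
    best, best_orients = cur, list(orients)
    temp = t0
    for _ in range(steps):
        b = rng.randrange(len(blocks))
        old = orients[b]
        orients[b] = rng.choice([o for o in ORIENTATIONS if o != old])
        new = total_hpwl(blocks, orients, nets)
        if new <= cur or rng.random() < math.exp((cur - new) / temp):
            cur = new
            if cur < best:
                best, best_orients = cur, list(orients)
        else:
            orients[b] = old   # reject the move
        temp *= cooling
    return best_orients, best


# Two 10x10 blocks whose connected pins initially face away from each other.
blocks = [(0, 0, 10, 10), (20, 0, 10, 10)]
nets = [[(0, (0, 5)), (1, (10, 5))]]
initial = total_hpwl(blocks, ["N", "N"], nets)
orients, refined = refine_orientations(blocks, nets)
```

In this toy case the refinement mirrors both blocks so that the connected pins face each other, while the block locations stay fixed.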
In the case that there is insufficient whitespace, we insert whitespace at desired locations. However, in the case of monolithic 3D, MIVs are very small (around 70nm) and hence, we can safely assume that there is always whitespace available for MIV insertion. In this case, we can utilize existing obstacle-avoiding routers to perform MIV insertion. We use the 2D IC router in Cadence SOC Encounter, and since it is limited to 15 metal layers, we use three metal layers to represent a given tier for the MIV planning stage only. This allows us to represent up to 5 tiers. For example, if a block is in tier 2, we use metal layer 4 to place block-pins, and metal layers 5 and 6 to represent inter-block routing on that tier. Vias between metal 6 and 7 represent MIVs between tier 2 and 3. Our choice of the number of metal layers used is justified because we only route the inter-block nets in our block-level monolithic 3D designs, and they are routed in the top 2 or 3 metal layers of each tier.

Our MIV planning heuristic starts with creating a netlist that contains the connectivity information of the pins of all the 3D nets, as shown in Lines 1 to 3 of Algorithm 1, where $N_{net}$ denotes the total number of 3D nets. We then create a DEF file that contains the physical location of every pin of each block; $x_p^{b_i}$ and $y_p^{b_i}$ denote the x and y coordinates of pin $p$ of block $b_i$, respectively, and $l^{b_i}$ denotes the metal layer that block $b_i$ is assigned to. In addition, we add routing blockages for each block to account for (1) the fact that MIVs cannot be placed within the blocks and (2) the internal wiring of each block (Lines 4 to 9). Next, we give the Verilog and DEF files to SOC Encounter to route all the 3D nets simultaneously (Lines 10 and 11). Simultaneous routing of all 3D nets avoids any possible congestion issues due to the small size of MIVs. Once we obtain the routed DEF, we trace the routing topology to determine (1) which MIV belongs to which net, and (2) which block-pin the MIV connects to (Lines 12 and 13). Finally, we generate the Verilog and DEF files for each tier (Lines 14 and 15) that contain the block/MIV locations.

Algorithm 1: MIV Planning Algorithm
Input: Location of all blocks in $B$, block orientation, block-pin locations, and connectivity information
Output: Number, location, and connectivity information of MIVs
 1  for $n \leftarrow 1$ to $N_{net}$ do
 2      add connectivity information into a Verilog file;
 3  end
 4  for $i \leftarrow 1$ to $|B|$ do
 5      for $p \leftarrow 1$ to $N_{pin}^{b_i}$ do
 6          add pin physical location $(x_p^{b_i}, y_p^{b_i}, l^{b_i})$ in the DEF;
 7      end
 8      add routing blockage for $b_i$ on its assigned layer $l_j^{b_i}$;
 9  end
10  read the above Verilog and DEF files into SOC Encounter;
11  route the design and save the routed DEF file;
12  read the routed DEF file and reconstruct the routing graphs;
13  extract corresponding subnets in each die/tier from the routing graphs;
14  create Verilog file for each die/tier with subnet connectivity;
15  create DEF file for each die/tier with MIV locations;

IV. EVALUATION

A. Experimental Setup

All required code and scripts are implemented in C/C++ and Python, and all experiments are carried out on a 2.5 GHz 64-bit Linux system. The 45nm Nangate open source standard cell library is used in our experiments. The TSV diameter, landing pad size, pitch, and thickness are assumed to be 6μm, 7μm, 10μm, and 50μm respectively. The MIV diameter, pitch and thickness are 0.07μm, 0.28μm and 0.31μm respectively. The TSV resistance and capacitance are 50mΩ and 122fF respectively. These parasitics are measured values, taken from [12]. The MIV resistance and capacitance are similar to those of local vias and are 4Ω and 1fF respectively. The monolithic structure is similar to that of Figure 1, except that we use six metal layers per tier.

We consider four benchmarks in this work, statistics of which are shown in Table I. The first three are taken from the Opencores benchmark suite [13], and the fourth is a custom-built 256-bit integer multiplier. This multiplier is built out of 256x4-bit multiplier and 512-bit adder blocks, arranged into an adder tree. Each multiplier block has 3 pipeline stages and each adder block has 4 pipeline stages.

TABLE I
DESIGN STATISTICS FOR ALL BENCHMARKS

| Design | # Gates | # Blk | # Inter-blk nets | Intra-blk WL (μm) | Target period (ns) |
|---|---|---|---|---|---|
| des_perf | 33,024 | 38 | 2,378 | 210,488 | 0.9 |
| cf_rca_16 | 146,542 | 95 | 3,135 | 1,210,618 | 1.3 |
| cf_fft_256_8 | 288,145 | 49 | 1,402 | 4,490,813 | 1.5 |
| mult_256_256 | 1,639,050 | 127 | 49,471 | 12,354,340 | 0.845 |

The design flow used to obtain all results is shown in Figure 3. It consists of roughly two steps: block design, and top-level design and analysis.

Fig. 3. Our design flow used to get post-layout simulation results.

1) Block Design: We begin by designing each block separately in Cadence SOC Encounter. The netlist for each block is obtained by grouping modules bottom-up along the hierarchy until they reach a certain area threshold. Timing constraints for each block depend on the overall system frequency, and are determined by context characterization. Each block is then placed, routed and timing optimized in SOC Encounter. This step finalizes the pin locations within each block. We choose four blocks at random from the “cf_rca_16” testcase and show their layouts in Figure 4.

2) Top-level Design and Analysis: We perform floorplanning using the methodology described in Section III-B. Three different floorplanning methodologies are considered. The first two, (1) TSV-based 3D (TSV) and (2) monolithic 3D (MIV), are already described. The third one, MIV TF, is obtained by using the same floorplan output as in the TSV case (before whitespace insertion), but using the MIV-planning engine instead of the TSV-planning engine. This compares the quality of the two different methodologies, starting with the same floorplan.
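As a rough arithmetic check on why the via count matters for TSVs but not for MIVs, the sketch below compares the silicon occupied by vias using the pitches listed in the experimental setup. It assumes each via occupies one pitch-by-pitch square, which is a simplification, and it uses the des_perf 4-die via counts and 2D footprint reported in Table III.

```python
TSV_PITCH_UM = 10.0   # TSV pitch from the experimental setup
MIV_PITCH_UM = 0.28   # MIV pitch from the experimental setup


def via_area_um2(count, pitch_um):
    """Area taken by `count` vias, assuming one pitch x pitch square per via."""
    return count * pitch_um ** 2


# des_perf numbers from Table III.
footprint_2d_um2 = 256 * 256                   # Encounter 2D footprint
tsv_area = via_area_um2(984, TSV_PITCH_UM)     # 4-die TSV design
miv_area = via_area_um2(3823, MIV_PITCH_UM)    # 4-die MIV design

print(f"TSV area / 2D footprint: {tsv_area / footprint_2d_um2:.2f}")
print(f"MIV area / 2D footprint: {miv_area / footprint_2d_um2:.4f}")
```

Under this assumption the 984 TSVs alone take about 1.5 times the 2D footprint, while the 3,823 MIVs take well under 1% of it.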
The number of MIVs used in MIV TF can be more than the number of TSVs because multi-pin nets might use far more MIVs due to their small size. Some sample layouts for 2D floorplanning and 2-die implementations of cf_rca_16 are shown in Figure 4. We next route each die separately in SOC Encounter. We perform parasitic extraction to obtain the SPEF files for each die. In addition, we create a top-level Verilog file with the interconnections between dies, and a top-level SPEF file with the TSV/MIV parasitics. All netlist and parasitic information is then fed into Synopsys PrimeTime to obtain true 3D timing and power numbers.

Fig. 4. Some sample layouts for the cf_rca_16 testcase, along with select block designs, and zoomed-in shots of TSVs and MIVs.

B. Experimental Results and Discussions

1) Floorplanner Validation: We run our floorplanner in 2D mode, and compare it with the results obtained from wirelength-driven floorplanning in Cadence Encounter. The Encounter footprint area is obtained by gradually increasing the area and running floorplanning until no block overlap is observed. The results are summarized in Table II. The large area reduction in the cf_fft_256_8 design is due to the fact that Cadence Encounter repeatedly produces module overlaps when provided with a smaller area. This is presumably due to some bug in the legalization stage of SOC Encounter. It can still provide comparable wirelength to our floorplanner, however, as this particular testcase is only locally connected, and each block communicates with only one or two neighbours. As seen from Table II, our floorplanner produces results comparable with those of SOC Encounter.

TABLE II
A COMPARISON OF THE PERFORMANCE OF OUR FLOORPLANNER AND CADENCE ENCOUNTER

| Design | Footprint (mm²), Encounter | Footprint (mm²), Ours | Inter-block WL (m), Encounter | Inter-block WL (m), Ours |
|---|---|---|---|---|
| des_perf | 0.0655 (1.00) | 0.0604 (0.92) | 0.352 (1.00) | 0.356 (1.01) |
| cf_rca_16 | 0.445 (1.00) | 0.413 (0.93) | 0.361 (1.00) | 0.368 (1.02) |
| cf_fft_256_8 | 1.690 (1.00) | 1.141 (0.68) | 0.414 (1.00) | 0.437 (1.06) |
| mult_256_256 | 5.198 (1.00) | 4.896 (0.94) | 17.01 (1.00) | 17.87 (1.05) |
| Average | 1.00 | 0.87 | 1.00 | 1.035 |

2) Comparison of 2D versus 3D: In this section, we compare the wirelength, timing and top-level net power of the 2D and 3D cases of all designs. The clock period assumed for total negative slack (TNS) and power calculation is taken from Table I. The different components of net power are explained in Figure 5. We have intra-block nets and inter-block nets. The inter-block net power is further split up into three components: (1) intra-block component (OBN-Int.), (2) inter-block component (OBN-Top) and (3) pin component of the loading cell (OBN-Pin). At the block level, the only component of net power that can be optimized is OBN-Top. Furthermore, since we do not have a true 3D timing optimization engine, we report pre-optimization timing and power numbers.

Fig. 5. Various components of net power reported in this paper.

The results for all designs are summarized in Table III. From this table, we see that with respect to the inter-block wirelength, monolithic 3D gives us a significant advantage. The total wirelength reduction depends upon the ratio of inter-block wirelength to intra-block wirelength, and varies depending on the circuit. TSV-based 3D design, however, does not give any improvement in wirelength for the small design des_perf, and we start to see small improvements in the cf_rca_16 and cf_fft_256_8 testcases. With the largest design, however, we see no improvement, mainly because we need to travel a large distance to the nearest whitespace block to place a TSV. Also, as expected, MIV TF gives better wirelength than the TSV-based method, but worse than the MIV case.

With respect to timing and net power, we see that the MIV case improves the longest path delay (LPD), the total negative slack (TNS) and the top-level net power. The timing of MIV TF is sometimes better than the timing of MIV, as wirelength-driven floorplanning does not guarantee the best timing. In the benchmarks considered, except in the 2-die case of cf_fft_256_8, the TSV case does not give any timing or power improvement over 2D. This is because the large 122fF capacitance is equivalent to more than 700μm of Metal 10 wire in the 45nm technology, and a significant number of such long wires is required to see a noticeable reduction. In general, the reduction in top net power of MIV follows the reduction in top net wirelength. The only exception is mult_256_256. Here we see that our 2D design has 43% more power than Encounter, with only 5% more wirelength. This is because power consumption depends on the wirelength distribution, and our floorplanner results in solutions with the longer nets having higher switching activity.
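The capacitance argument above can be illustrated with a back-of-the-envelope calculation. The sketch assumes a per-micron wire capacitance derived from the statement that 122fF corresponds to roughly 700μm of wire, treats each net as a lumped capacitance, and uses a hypothetical example net; none of these numbers are measured data from the paper.

```python
C_TSV_FF = 122.0                   # TSV capacitance from the experimental setup
C_MIV_FF = 1.0                     # MIV capacitance from the experimental setup
C_WIRE_FF_PER_UM = C_TSV_FF / 700  # assumption: 122 fF ~ 700 um of wire


def net_cap_ff(wirelength_um, n_vias=0, c_via_ff=0.0):
    """Lumped net capacitance in fF: wire plus inter-tier vias."""
    return wirelength_um * C_WIRE_FF_PER_UM + n_vias * c_via_ff


# Hypothetical net: 2000 um long in 2D, shortened to 1500 um in 3D with one via.
cap_2d = net_cap_ff(2000)
cap_tsv = net_cap_ff(1500, 1, C_TSV_FF)
cap_miv = net_cap_ff(1500, 1, C_MIV_FF)

print(f"2D: {cap_2d:.1f} fF, TSV 3D: {cap_tsv:.1f} fF, MIV 3D: {cap_miv:.1f} fF")
```

In this example the 500μm saving is not enough to pay for one TSV, which needs about 700μm of saved wire per via to break even, whereas the same net routed through an MIV has clearly lower capacitance than in 2D.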
Therefore, we conclude that monolithic 3D can provide significant benefits over 2D even in the case of small designs, while TSV-based 3D is suitable for designs with a large number of long interconnections or for memory-on-logic stacking applications; improvement in the case of logic-on-logic will be observed only with smaller TSV parasitics.

TABLE III
A COMPARISON OF WIRELENGTH, TIMING AND TOP NET POWER OF 2D VERSUS 3D

| Design | Implementation | Footprint (μm × μm) | Normalised Si. area | #MIV/#TSV | Inter-block routed WL (μm) | Total routed WL (μm) | LPD (ns) | TNS (ns) | OBN-Top power (mW) |
|---|---|---|---|---|---|---|---|---|---|
| des_perf | 2D Encounter | 256x256 | 1 | - | 352,805 (1.00) | 563,293 (1.00) | 1.65 (1.00) | -135.02 (1.00) | 11.24 (1.00) |
| des_perf | 2D Ours | 251x241 | 0.92 | - | 356,489 (1.01) | 566,977 (1.01) | 1.73 (1.05) | -162.95 (1.21) | 11.86 (1.06) |
| des_perf | MIV 2 Dies | 146x211 | 0.94 | 1,800 | 267,678 (0.76) | 478,166 (0.85) | 1.44 (0.87) | -78.93 (0.58) | 8.55 (0.76) |
| des_perf | MIV 3 Dies | 127x179 | 1.04 | 2,738 | 222,240 (0.63) | 432,728 (0.77) | 1.23 (0.74) | -47.62 (0.35) | 7.29 (0.65) |
| des_perf | MIV 4 Dies | 111x149 | 1.01 | 3,823 | 204,868 (0.58) | 415,356 (0.74) | 1.10 (0.67) | -22.80 (0.17) | 6.41 (0.57) |
| des_perf | TSV 2 Dies | 215x323 | 2.12 | 120 | 473,092 (1.34) | 683,580 (1.21) | 2.18 (1.32) | -259.6 (1.92) | 21.16 (1.88) |
| des_perf | TSV 3 Dies | 320x235 | 3.44 | 456 | 515,267 (1.46) | 725,755 (1.29) | 2.46 (1.49) | -450.04 (3.33) | 30.26 (2.69) |
| des_perf | TSV 4 Dies | 359x402 | 8.81 | 984 | 734,739 (2.08) | 945,227 (1.68) | 4.09 (2.48) | -590.19 (4.37) | 48.12 (4.28) |
| des_perf | MIV TF 2 Dies | 213x250 | 1.63 | 124 | 370,823 (1.05) | 581,311 (1.03) | 2.06 (1.25) | -182.17 (1.35) | 12.71 (1.13) |
| des_perf | MIV TF 3 Dies | 211x233 | 2.25 | 482 | 353,226 (1.00) | 563,714 (1.00) | 1.65 (1.00) | -162.14 (1.2) | 11.72 (1.04) |
| des_perf | MIV TF 4 Dies | 186x160 | 1.82 | 1,098 | 238,356 (0.68) | 448,844 (0.80) | 1.25 (0.75) | -54.49 (0.40) | 7.29 (0.65) |
| cf_rca_16 | 2D Encounter | 667x667 | 1 | - | 361,673 (1.00) | 1,572,291 (1.00) | 1.85 (1.00) | -2,762.73 (1.00) | 4.71 (1.00) |
| cf_rca_16 | 2D Ours | 555x744 | 0.93 | - | 367,542 (1.02) | 1,578,160 (1.00) | 1.75 (0.95) | -2,159.27 (0.78) | 4.73 (1.00) |
| cf_rca_16 | MIV 2 Dies | 416x477 | 0.89 | 1,747 | 289,156 (0.80) | 1,499,774 (0.95) | 1.73 (0.94) | -1,949.23 (0.71) | 3.74 (0.79) |
| cf_rca_16 | MIV 3 Dies | 367x370 | 0.92 | 2,925 | 255,910 (0.71) | 1,466,258 (0.93) | 1.72 (0.93) | -1,729.42 (0.63) | 3.61 (0.77) |
| cf_rca_16 | MIV 4 Dies | 273x384 | 0.94 | 3,936 | 240,583 (0.67) | 1,451,201 (0.92) | 1.69 (0.92) | -1,576.67 (0.57) | 3.37 (0.72) |
| cf_rca_16 | TSV 2 Dies | 484x418 | 0.91 | 156 | 354,347 (1.07) | 1,564,965 (1.00) | 2.48 (1.34) | -11,093 (4.02) | 7.49 (1.59) |
| cf_rca_16 | TSV 3 Dies | 377x370 | 0.94 | 334 | 401,425 (1.11) | 1,612,043 (1.03) | 3.23 (1.75) | -16,074 (5.82) | 11.51 (2.44) |
| cf_rca_16 | TSV 4 Dies | 350x349 | 1.10 | 477 | 345,090 (0.95) | 1,555,708 (0.99) | 3.63 (1.97) | -18,825 (6.81) | 13.4 (2.85) |
| cf_rca_16 | MIV TF 2 Dies | 438x416 | 0.82 | 324 | 323,631 (0.89) | 1,534,249 (0.98) | 1.79 (0.97) | -2,463.8 (0.89) | 4.12 (0.87) |
| cf_rca_16 | MIV TF 3 Dies | 375x369 | 0.93 | 609 | 281,093 (0.78) | 1,491,711 (0.95) | 1.65 (0.90) | -1,353.31 (0.49) | 3.7 (0.79) |
| cf_rca_16 | MIV TF 4 Dies | 317x311 | 0.89 | 850 | 263,092 (0.73) | 1,473,710 (0.94) | 1.66 (0.90) | -1,242.01 (0.45) | 3.45 (0.73) |
| cf_fft_256_8 | 2D Encounter | 1,300x1,300 | 1.00 | - | 413,674 (1.00) | 4,904,487 (1.00) | 2.18 (1.00) | -22,308 (1.00) | 7.7 (1.00) |
| cf_fft_256_8 | 2D Ours | 1,142x999 | 0.68 | - | 436,933 (1.06) | 4,927,746 (1.00) | 2.12 (0.97) | -11,388 (0.51) | 8.2 (1.06) |
| cf_fft_256_8 | MIV 2 Dies | 819x718 | 0.70 | 1,050 | 263,787 (0.64) | 4,754,600 (0.97) | 1.96 (0.90) | -3,618 (0.16) | 5.3 (0.69) |
| cf_fft_256_8 | MIV 3 Dies | 581x799 | 0.82 | 1,921 | 254,256 (0.61) | 4,745,069 (0.97) | 1.9 (0.87) | -4,447 (0.20) | 5.06 (0.66) |
| cf_fft_256_8 | MIV 4 Dies | 595x594 | 0.84 | 2,475 | 269,049 (0.65) | 4,759,862 (0.97) | 1.85 (0.85) | -4,023 (0.18) | 5.29 (0.69) |
| cf_fft_256_8 | TSV 2 Dies | 679x932 | 0.75 | 75 | 369,166 (0.89) | 4,859,979 (0.99) | 2.1 (0.96) | -14,655 (0.66) | 9.22 (1.20) |
| cf_fft_256_8 | TSV 3 Dies | 653x674 | 0.78 | 147 | 357,592 (0.86) | 4,848,405 (0.99) | 2.47 (1.13) | -34,950 (1.57) | 11.1 (1.44) |
| cf_fft_256_8 | TSV 4 Dies | 584x527 | 0.73 | 377 | 422,216 (1.02) | 4,913,029 (1.00) | 3.22 (1.48) | -67,602 (3.03) | 16.67 (2.16) |
| cf_fft_256_8 | MIV TF 2 Dies | 675x925 | 0.74 | 105 | 357,887 (0.87) | 4,848,700 (0.99) | 1.87 (0.86) | -6,314 (0.28) | 6.82 (0.89) |
| cf_fft_256_8 | MIV TF 3 Dies | 649x668 | 0.77 | 210 | 339,045 (0.82) | 4,829,858 (0.98) | 1.74 (0.80) | -1,358 (0.06) | 6.24 (0.81) |
| cf_fft_256_8 | MIV TF 4 Dies | 578x523 | 0.72 | 518 | 310,465 (0.75) | 4,801,278 (0.98) | 1.85 (0.85) | -1,626 (0.07) | 5.74 (0.75) |
| mult_256_256 | 2D Encounter | 2,280x2,280 | 1.00 | - | 17,089,968 (1.00) | 29,444,308 (1.00) | 1.12 (1.00) | -216.33 (1.00) | 144.41 (1.00) |
| mult_256_256 | 2D Ours | 2,144x2,284 | 0.94 | - | 17,870,346 (1.05) | 30,224,686 (1.03) | 1.27 (1.14) | -253.94 (1.17) | 206.1 (1.43) |
| mult_256_256 | MIV 2 Dies | 1,506x1,718 | 1.00 | 48,513 | 13,815,376 (0.81) | 26,169,716 (0.89) | 1.17 (1.05) | -251.95 (1.16) | 146.5 (1.01) |
| mult_256_256 | MIV 3 Dies | 1,286x1,295 | 0.96 | 79,682 | 11,392,196 (0.67) | 23,746,536 (0.81) | 0.95 (0.85) | -133.96 (0.62) | 125 (0.87) |
| mult_256_256 | MIV 4 Dies | 1,177x1,131 | 1.02 | 102,994 | 10,116,222 (0.59) | 22,470,562 (0.76) | 0.97 (0.87) | -128.78 (0.60) | 111.2 (0.77) |
| mult_256_256 | TSV 2 Dies | 1,608x1,616 | 1.00 | 1,683 | 18,825,744 (1.10) | 31,180,084 (1.06) | 1.76 (1.58) | -441.29 (2.04) | 304.4 (2.11) |
| mult_256_256 | TSV 3 Dies | 1,508x1,236 | 1.08 | 3,599 | 21,184,404 (1.24) | 33,538,744 (1.14) | 2.02 (1.8) | -838.1 (3.87) | 373.1 (2.58) |
| mult_256_256 | TSV 4 Dies | 1,240x1,190 | 1.14 | 4,232 | 20,890,062 (1.22) | 33,244,402 (1.13) | 2.45 (2.19) | -945.89 (4.37) | 376.5 (2.61) |
| mult_256_256 | MIV TF 2 Dies | 1,601x1,609 | 0.99 | 13,162 | 16,127,948 (0.94) | 28,482,288 (0.97) | 1.06 (0.95) | -205.7 (0.95) | 187.4 (1.30) |
| mult_256_256 | MIV TF 3 Dies | 1,501x1,182 | 1.02 | 20,955 | 15,256,050 (0.89) | 27,610,390 (0.94) | 0.99 (0.88) | -185.82 (0.86) | 180.2 (1.25) |
| mult_256_256 | MIV TF 4 Dies | 1,182x1,131 | 1.03 | 24,260 | 15,124,651 (0.89) | 27,478,991 (0.93) | 1.12 (1.00) | -230.95 (1.07) | 187.8 (1.30) |

3) Power benefit of monolithic 3D: We provide a detailed pre-optimization power split-up of all four testcases in Table IV, with the legend explained in Figure 5. We compare 2D with MIV-based 3D, and also provide a reference case of “ideal interconnections”. This ideal case does not correspond to any real physical scenario, but represents the theoretical minimum power consumption at the block level. The values are obtained by setting the parasitics of the OBN-Top nets to zero in PrimeTime. With a reduction in the wirelength of top-level nets, we expect a reduction in the following power components: (1) inter-block components of inter-block nets (OBN-Top), and (2) switching power of the standard cells driving inter-block nets.

Fig. 6. Timing slack histograms comparing 2D and MIV-based 3D (2 die) for the FFT benchmark. Negative slacks are shown in red, and positive slacks in green.

From Table IV, we see that even theoretically, only a 10% average reduction in the total power consumption is possible, and the reduction is larger for designs with relatively more inter-block nets. We also see that MIV-based 3D gives us a 3.1% average reduction in the total power consumption across our four testcases. If we consider the parameter that is being optimized by floorplanning, i.e., OBN-Top, we see that a large reduction in the power consumption is obtained by using monolithic 3D. The reduction in the driving cell power is present in all testcases, but is most noticeable in mult_256_256, which has a huge number of driving cells.
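The two averages quoted from Table IV can be checked with a few lines of Python. The sketch uses the normalized total power values printed in Table IV and assumes that the 3.1% figure averages the 2-, 3- and 4-die MIV cases of all four designs; that interpretation is an assumption, as the text does not spell out how the average is taken.

```python
from statistics import mean

# Normalized total power from Table IV (2D Encounter = 1.00).
# "miv" lists the 2-, 3- and 4-die MIV cases in that order.
NORM_TOTAL = {
    "des_perf":     {"ideal": 0.80, "miv": [0.95, 0.93, 0.91]},
    "cf_rca_16":    {"ideal": 0.97, "miv": [0.99, 0.99, 0.99]},
    "cf_fft_256_8": {"ideal": 0.98, "miv": [0.99, 0.99, 0.99]},
    "mult_256_256": {"ideal": 0.84, "miv": [0.98, 0.97, 0.95]},
}

# Average reduction relative to the 2D Encounter baseline.
ideal_reduction = 1 - mean(d["ideal"] for d in NORM_TOTAL.values())
miv_reduction = 1 - mean(r for d in NORM_TOTAL.values() for r in d["miv"])

print(f"ideal interconnections: {ideal_reduction:.3f}")
print(f"MIV-based 3D:           {miv_reduction:.3f}")
```

Under that assumption, both results land close to the roughly 10% and 3.1% reductions stated in the text.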

Since we do not have a true 3D timing optimization tool, we cannot compare post-optimization numbers directly. However, we can predict the trend from the TNS reduction (Table III), and timing slack histograms (shown for the cf_fft_256_8 testcase in Figure 6). Due to the average reduction of 51% in TNS, fewer buffer insertions and cell

upsizing will be required to meet timing. Also, since the entire slack histograms are shifted towards the right, techniques such as timing slack redistribution or multi-Vth design can be employed to achieve further power benefit.

TABLE IV
A DETAILED SPLIT UP OF THE POWER FOR 2D AND MONOLITHIC 3D (IN mW)

| Design | Implementation | Std. Cell | Leakage | IBN | OBN-Pin | OBN-Int. | OBN-Top | Total |
|---|---|---|---|---|---|---|---|---|
| des_perf | Ideal interconnections | 26.1 | 0.5 | 21.09 | 1.07 | 1.34 | 0 (-) | 50.1 (0.80) |
| des_perf | 2D Encounter | 27.1 | 0.5 | 21.03 | 1.19 | 1.34 | 11.24 (1.00) | 62.5 (1.00) |
| des_perf | 2D Ours | 27.2 | 0.5 | 21.11 | 1.19 | 1.34 | 11.86 (1.06) | 63.3 (1.01) |
| des_perf | MIV 2 Dies | 26.9 | 0.5 | 21.01 | 1.2 | 1.34 | 8.55 (0.76) | 59.5 (0.95) |
| des_perf | MIV 3 Dies | 26.7 | 0.5 | 21.07 | 1.2 | 1.34 | 7.29 (0.65) | 58.1 (0.93) |
| des_perf | MIV 4 Dies | 26.6 | 0.5 | 21.04 | 1.21 | 1.34 | 6.41 (0.57) | 57.1 (0.91) |
| cf_rca_16 | Ideal interconnections | 107.4 | 2.9 | 57.43 | 0.12 | 0.75 | 0 (-) | 168.6 (0.97) |
| cf_rca_16 | 2D Encounter | 108.1 | 2.9 | 57.33 | 0.12 | 0.75 | 4.71 (1.00) | 173.9 (1.00) |
| cf_rca_16 | 2D Ours | 108.1 | 2.9 | 57.31 | 0.12 | 0.75 | 4.73 (1.00) | 173.9 (1.00) |
| cf_rca_16 | MIV 2 Dies | 107.9 | 2.9 | 57.3 | 0.12 | 0.75 | 3.74 (0.79) | 172.8 (0.99) |
| cf_rca_16 | MIV 3 Dies | 107.9 | 2.9 | 57.32 | 0.12 | 0.75 | 3.61 (0.77) | 172.6 (0.99) |
| cf_rca_16 | MIV 4 Dies | 107.9 | 2.9 | 57.46 | 0.12 | 0.75 | 3.37 (0.72) | 172.5 (0.99) |
| cf_fft_256_8 | Ideal interconnections | 353.5 | 8 | 120.24 | 0.56 | 1.4 | 0 (-) | 483.7 (0.98) |
| cf_fft_256_8 | 2D Encounter | 353.7 | 8 | 120.27 | 0.53 | 1.4 | 7.7 (1.00) | 491.6 (1.00) |
| cf_fft_256_8 | 2D Ours | 353.7 | 8 | 120.27 | 0.53 | 1.4 | 8.2 (1.06) | 492.2 (1.00) |
| cf_fft_256_8 | MIV 2 Dies | 353.6 | 8 | 120.26 | 0.54 | 1.4 | 5.3 (0.69) | 489.1 (0.99) |
| cf_fft_256_8 | MIV 3 Dies | 353.6 | 8 | 120.21 | 0.54 | 1.4 | 5.06 (0.66) | 488.8 (0.99) |
| cf_fft_256_8 | MIV 4 Dies | 353.6 | 8 | 120.18 | 0.53 | 1.4 | 5.29 (0.69) | 489.1 (0.99) |
| mult_256_256 | Ideal interconnections | 1807.8 | 33.9 | 744.33 | 10.47 | 33.1 | 0 (-) | 2629.6 (0.84) |
| mult_256_256 | 2D Encounter | 2174.9 | 33.9 | 744.24 | 9.85 | 33.1 | 144.41 (1.00) | 3140.4 (1.00) |
| mult_256_256 | 2D Ours | 2233.3 | 33.9 | 741.38 | 9.82 | 33.1 | 206.1 (1.43) | 3257.6 (1.04) |
| mult_256_256 | MIV 2 Dies | 2110.7 | 33.9 | 741.35 | 9.85 | 33.1 | 146.5 (1.01) | 3075.4 (0.98) |
| mult_256_256 | MIV 3 Dies | 2095.5 | 33.9 | 741.38 | 9.82 | 33.1 | 125 (0.87) | 3038.7 (0.97) |
| mult_256_256 | MIV 4 Dies | 2049 | 33.9 | 741.27 | 9.83 | 33.1 | 111.2 (0.77) | 2978.3 (0.95) |

C. Design Guidelines for block-level MIV-based 3D

We consider two possible scenarios: timing-critical and power-critical designs. In the case of timing-critical designs, we have shown that MIV-based 3D can give a significant reduction in the longest path delay, as well as the total negative slack. Larger reductions in delay will be seen for designs with combinational paths through blocks. In the case of power-critical designs, we have shown that MIV-based 3D gives a significant reduction in inter-block net power and, depending on the number of inter-block nets, significant savings in the power of the cells driving inter-block nets. Further power reduction can be achieved in one of several ways: (1) re-designing the blocks to downsize inter-block drivers, (2) voltage scaling of the 3D system, which will shift the entire timing distribution back to the 2D case, and (3) multi-Vth optimization, which will require fewer low-Vth cells to meet timing, reducing device power.

V. CONCLUSIONS

In this paper, we provided a floorplanning framework for monolithic 3D-ICs, and a methodology to obtain post-layout wirelength, timing, and power numbers for block-level 3D-ICs. We demonstrated that monolithic inter-tier via (MIV)-based 3D-ICs can achieve up to 42% reduction in wirelength when compared with 2D-ICs. In addition, we compared our monolithic 3D designs to through-silicon-via (TSV)-based 3D-IC designs in terms of area, wirelength, power and performance. We observed that TSV-based 3D is only beneficial if either the TSV capacitance scales down, or the circuit has a large number of long wires. We also showed that due to a significant reduction in the total negative slack, and an increase in the positive slacks, MIV-based 3D-ICs require less timing optimization.

Moreover, with the application of advanced methods such as multi-Vth, etc., further reduction in power is possible.

REFERENCES

[1] K. Yang, D. H. Kim, and S. K. Lim, “Design Quality Tradeoff Studies for 3D ICs Built with Nano-scale TSVs and Devices,” in Proc. Int. Symp. on Quality Electronic Design, 2012, pp. 1–8.
[2] X. Dong, J. Zhao, and Y. Xie, “Fabrication Cost Analysis and Cost-Aware Design Space Exploration for 3D-ICs,” in IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 2010, pp. 1959–1972.
[3] P. Batude et al., “Advances in 3D CMOS Sequential Integration,” in Proc. IEEE Int. Electron Devices Meeting, 2009, pp. 1–4.
[4] O. Thomas et al., “Compact 6T SRAM cell with robust read/write stabilizing design in 45nm Monolithic 3D IC technology,” in Proc. IEEE Int. Conf. on Integrated Circuit Design and Tech., 2009, pp. 195–198.
[5] S.-M. Jung, H. Lim, K. Kwak, and K. Kim, “500-MHz DDR High-Performance 72-Mb 3-D SRAM Fabricated With Laser-Induced Epitaxial c-Si Growth Technology for a Stand-Alone and Embedded Memory Application,” in IEEE Trans. on Electron Devices, 2010, pp. 474–481.
[6] D. H. Kim, R. O. Topaloglu, and S. K. Lim, “Block-Level 3D IC Design with Through-Silicon-Via Planning,” in Proc. Asia and South Pacific Design Aut. Conf., 2012, pp. 335–340.
[7] M. Tsai, T. Wang, and T. Hwang, “Through-Silicon Via Planning in 3-D Floorplanning,” in IEEE Trans. on VLSI Systems, 2011, pp. 1448–1457.
[8] J. Knechtel, I. Markov, and J. Lienig, “Assembling 2-D Blocks Into 3-D Chips,” in IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 2012, pp. 228–241.
[9] S. Bobba et al., “CELONCEL: Effective design technique for 3-D monolithic integration targeting high performance integrated circuits,” in Proc. Asia and South Pacific Design Aut. Conf., 2011, pp. 336–343.
[10] C. Liu and S. K. Lim, “Ultra-High Density 3D SRAM Cell Designs for Monolithic 3D Integration,” in Proc. IEEE Int. Interconnect Technology Conference, 2012.
[11] H. Xu, D. Sheqin, M. Yuchun, and H. Xianlong, “Simultaneous buffer and interlayer via planning for 3D floorplanning,” in Proc. Int. Symp. on Quality Electronic Design, 2009, pp. 740–745.
[12] X. Wu et al., “Electrical Characterization for Inter-tier Connections and Timing Analysis for 3-D ICs,” in IEEE Trans. on VLSI Systems, 2012, pp. 186–191.
[13] http://www.opencores.org/.
