8C-1
High-Density Integration of Functional Modules Using Monolithic 3D-IC Technology
Shreepad Panth¹, Kambiz Samadi², Yang Du², and Sung Kyu Lim¹
¹Dept. of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332
²Qualcomm Research, San Diego, CA 92121
Abstract—Three dimensional integrated circuits (3D-ICs) have emerged as a promising solution to continue device scaling. They can be realized using Through Silicon Vias (TSVs), or by monolithic integration using Monolithic Inter-tier Vias (MIVs), an emerging alternative that provides much higher via densities. In this paper, we provide a framework for floorplanning existing 2D IP blocks into 3D-ICs using MIVs. We take the floorplanning solution all the way through place-and-route and report post-layout metrics for area, wirelength, timing, and power consumption. Results show that TSV-based 3D designs improve wirelength over 2D designs by up to 14%, and only in large-scale circuits. MIV-based 3D designs, however, offer an average wirelength improvement of 33% for a wide range of benchmark circuits. We also show that while TSV-based 3D cannot improve performance and power unless the TSV capacitance is reduced, MIV-based 3D offers significant reductions of up to 33% in the longest path delay and 35% in the inter-block net power.
Fig. 1. A sample monolithic 3D technology with three metal layers per tier.
I. INTRODUCTION
Three dimensional integrated circuits (3D-ICs) have emerged as a promising solution to extend the 2D scaling trajectory predicted by Moore’s Law. Currently, through-silicon vias (TSVs) enable 3D-ICs, allowing vertical stacking of multiple dies fabricated separately. However, the quality of TSV-based 3D-ICs strongly depends on the TSV dimensions and parasitics, and such designs are limited to memory-on-logic or large logic-on-logic designs with a relatively small number of global interconnects [1], [2].
An emerging alternative to TSV-based 3D is monolithic 3D, which enables orders of magnitude higher integration density than TSV-based technology, due to the extremely small size of the monolithic inter-tier vias (MIVs). Monolithic 3D integration technology fabricates two or more tiers of devices sequentially, instead of bonding two previously fabricated dies using micro bumps and TSVs. Figure 1 shows a typical monolithic 3D structure with three metal layers per tier. The two device tiers are connected by inter-tier vias, which are essentially the same size as intra-tier vias. To fabricate the top device tier, a low-thermal-budget process must be applied to prevent damage to the underlying tier’s back-end-of-line (BEOL). Several monolithic 3D integration processes have been developed. CEA/LETI [3], [4] has developed a sequential integration flow based on a low-temperature bonding process. Samsung [5] has developed an “S3” technology for 3-tier SRAM cells using a low-thermal TFT process. Overall, MIVs provide better electrical characteristics (e.g., lower parasitics and electrical coupling) than TSVs, and also enable higher integration densities due to their small size.
In this paper, we propose an efficient 3D design space exploration framework (i.e., 3D floorplanning) which accounts for the different characteristics of TSV-based and monolithic 3D integration technologies.
Since re-designing existing logic, memory and IP blocks for 3D incurs significant design overhead and cost, near-term 3D-ICs will focus on reusing existing 2D blocks [6], [7], [8]. In this paper, we present a floorplanning framework that uses different optimization objectives, based on physical characteristics of both TSVs and MIVs. To the best of our knowledge, this paper is the first to provide a 3D floorplanning framework specifically for monolithic 3D integration technology.
978-1-4673-3030-5/13/$31.00 ©2013 IEEE
To further enhance the applicability of our approach, we integrate our proposed 3D block-level floorplanning framework with existing commercial place and route (P&R) tools to more accurately assess the solution quality. We use four different testcases with varying complexities, ranging from 33K to 1.7M gates in 45nm technology, to better show the impact of our proposed methodology. The contributions of our work are listed below.
• We propose and develop an integrated 3D block-level floorplanning framework with appropriate objective functions for both TSV-based and monolithic 3D to enable efficient 3D-IC design space exploration.
• In addition to using simulated annealing for floorplanning, we propose a post-floorplan refinement (PFPR) heuristic which achieves an average reduction of 6.13% in inter-block wirelength with respect to the initial floorplan.
• We propose a methodology for MIV planning, which relies on custom scripts and existing commercial P&R tools.
• We develop a methodology that takes the obtained 3D floorplan all the way through place and route, and is capable of reporting post-layout timing and power numbers.
II. RELATED WORK
Monolithic design for high-performance ICs was presented in [9]. That paper presented two design styles: one in which PMOS and NMOS devices are fabricated on separate layers, and another in which standard cells have both PMOS and NMOS devices in the same tier. The authors presented a placement algorithm to fully utilize high-density MIVs. Although the stackup is similar to our case, that study was carried out at the gate level, and the placement algorithm is not applicable to block-level monolithic 3D-ICs. Design for monolithic 3D-SRAM was carried out in [10]. The authors provided different design styles of the SRAM cell, assuming different PMOS and NMOS tiers, and compared them with respect to static noise margin, write margin, and data retention voltage.
While several prior works exist on adding TSVs at the gate level or core level, only a few works consider adding TSVs into existing whitespace blocks at the floorplanning stage. Simultaneous buffering and TSV planning was carried out in [11], but the authors reported inaccurate 3D HPWL and timing metrics. An improved algorithm was presented in [7], but the same inaccurate HPWL metric was used. Results based on an improved BB-2D-HPWL metric were presented in [8], and the most accurate HPWL metric, based on subnets, was presented in [6]. However, none of these papers compared the quality of their engine with that of a commercially available tool, or took the obtained floorplans through place and route and reported post-layout numbers. These shortcomings are overcome in this paper, and therefore the numbers reported here are more accurate. To the best of our knowledge, this is the first work to fully exploit the high density offered by monolithic 3D integration, use a validated floorplanner to perform block-level monolithic 3D design, and compare post-layout 3D wirelength, timing and power numbers with those of a commercial 2D tool.
III. 3D FLOORPLANNING WITH MONOLITHIC INTER-TIER VIAS
A. Problem Formulation and Overview
A general form of the 3D floorplanning problem can be stated as follows: Given the number of desired tiers, and a set of blocks along with their corresponding widths and heights, determine the (x, y, z) locations of each of the blocks and all MIVs/TSVs.
The overall design flow is shown in Figure 2. We first perform floorplanning to determine the location of all the blocks, assuming the pins are placed at the center. Once the locations of all the blocks are determined, we update the locations of the pins and perform a refinement step (i.e., PFPR) to further minimize wirelength. Depending on whether we are dealing with TSVs or MIVs, we have different via planning engines. Finally, we create separate Verilog files for each die/tier with the corresponding connectivity information, and a design exchange format (DEF) file with the location of blocks and TSVs/MIVs. Each of the above steps is further explained in the following subsections.
Fig. 2. The design flow to obtain a 3D floorplan with TSV/MIV insertion. Flowchart labels: center-to-center based annealing; update with pin locations; annealing based refinement; Monolithic? (yes/no); create Verilog and DEF files with pins; route with Encounter; extract MIV location and connectivity; TSV planning; create Verilog/DEF file for each die. Legend: existing work, custom program, Cadence Encounter.
B. Floorplanning Engine
In this step, we take the description of all the blocks as well as the connectivity information and generate an output floorplan that minimizes a certain cost function, depending on whether we are using TSV-based or monolithic 3D. We use a simulated annealing engine similar to [6], maintaining a separate sequence pair for each die. We perform the following moves during the annealing process: (1) change the aspect ratio of a block (or rotate in case of hard blocks), (2) swap two blocks in either the positive sequence, negative sequence, or both, and (3) move or swap two blocks between a pair of dies/tiers. In TSV-based 3D, we need to control the number of TSVs because they occupy significant silicon area. Hence, the TSV-based 3D cost function is given as follows.
$$C_{TSV} = \alpha \cdot WL + \beta \cdot A + \gamma \cdot N_{TSV} \tag{1}$$
In the above equation, $WL$ represents the inter-block wirelength, $A$ represents the chip area, and $N_{TSV}$ represents the number of TSVs. However, if we are dealing with monolithic 3D, then the MIV size is negligible, and we do not need to constrain the number of MIVs, opening up the possibility for further optimization. The monolithic 3D cost function is given as follows.
$$C_{MIV} = \alpha \cdot WL + \beta \cdot A \tag{2}$$
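The two cost functions differ only in the TSV-count penalty, which a minimal Python sketch can make concrete. The class and function names, the sample floorplan values, and the weights below are illustrative assumptions; the paper does not state the weight values it uses.

```python
from dataclasses import dataclass


@dataclass
class FloorplanMetrics:
    """Quantities evaluated for one candidate floorplan (hypothetical container)."""
    wirelength: float  # WL: inter-block wirelength
    area: float        # A: chip area
    num_tsv: int = 0   # N_TSV: number of TSVs (unused for monolithic 3D)


def cost_tsv(m: FloorplanMetrics, alpha: float, beta: float, gamma: float) -> float:
    """Equation (1): the TSV count is penalized because TSVs occupy silicon area."""
    return alpha * m.wirelength + beta * m.area + gamma * m.num_tsv


def cost_miv(m: FloorplanMetrics, alpha: float, beta: float) -> float:
    """Equation (2): MIV size is negligible, so the via count is not penalized."""
    return alpha * m.wirelength + beta * m.area


# Hypothetical candidate and weights, chosen only to illustrate the two formulas.
candidate = FloorplanMetrics(wirelength=100.0, area=50.0, num_tsv=10)
tsv_cost = cost_tsv(candidate, alpha=1.0, beta=2.0, gamma=3.0)
miv_cost = cost_miv(candidate, alpha=1.0, beta=2.0)
print(tsv_cost, miv_cost)
```

With the same weights, the two costs differ by exactly the γ·N_TSV term, so under the monolithic cost the annealer is free to accept moves that add inter-tier connections whenever they shorten wirelength.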
Considering the pin locations of the blocks during floorplanning would require an extra step to compute the physical location of all block-pins. Since the number of block-pins is quite large, this would lead to a large runtime overhead. We instead propose a post-floorplanning refinement (PFPR) step to consider pin locations once block locations have been determined.
C. Post-Floorplan Refinement (PFPR)
After we determine the relative locations of all the blocks, we update the blocks with the pin locations. Each block has 8 possible orientations: 0°, 90°, 180°, 270°, and their flipped counterparts. Without changing the relative locations of the blocks in the floorplan, each block can only take four of these orientations, namely those that preserve its footprint. For example, if the pins are at the center of a block, the four orientations sharing a footprint (0° and 180°, or 90° and 270°, plus their flipped counterparts) all give the same wirelength. However, if the pins are placed along the periphery, each of these four orientations gives a different wirelength result. The goal of this step is to determine the orientation of each block such that the wirelength is minimized. To do this, we use simulated annealing, where the only operation allowed is to change block orientation. The block orientation can only be changed among the four allowed scenarios. No sequence pair is necessary, as the relative locations of blocks do not change. Furthermore, wirelength computation can be done incrementally, as we only change one block at a time.
D. MIV Planning Algorithm
Once we obtain the 3D floorplanning result, we need to insert TSVs, or MIVs (monolithic inter-tier vias) in the case of monolithic 3D, to connect blocks in different tiers. Since TSVs are big (around 5μm to 10μm) and we may not have enough whitespace in the dies, a whitespace manipulation step is required.
We use an existing TSV planner [6] that constructs a 3D rectilinear Steiner tree (RST) from a 2D rectilinear Steiner minimum tree (RSMT), and then moves TSVs to nearby whitespace based on a network-flow formulation. In the case that there is insufficient whitespace, we insert whitespace at desired locations. However, in the case of monolithic 3D, MIVs are very small (around 70nm) and hence, we can safely assume that there is always whitespace available for MIV insertion. In this case, we can utilize existing obstacle-avoiding routers to perform MIV insertion. We use the 2D IC router in Cadence SOC Encounter, and since it is limited to 15 metal layers, we use three metal layers to represent a given tier for the MIV planning stage only. This allows us to represent up to 5 tiers. For example, if a block is in tier 2, we use metal layer 4 to place block-pins, and metal layers 5 and 6 to represent inter-block routing on that tier. Vias between metal 6 and 7 represent MIVs between tiers 2 and 3. Our choice of the number of metal layers used is justified because we only route the inter-block nets in our block-level monolithic 3D designs, and they are routed in the top 2 or 3 metal layers of each tier.
Our MIV planning heuristic starts with creating a netlist that contains the connectivity information of the pins of all the 3D nets, as shown in Lines 1 to 3 of Algorithm 1, where $N_{net}$ denotes the total number of 3D nets. We then create a DEF file that contains the physical location of every pin of each block; $x_p^{b_i}$ and $y_p^{b_i}$ denote the x and y coordinates of pin $p$ of block $b_i$, respectively, and $l^{b_i}$ denotes the metal layer that block $b_i$ is assigned to. In addition, we add routing blockages for each block to account for (1) the fact that MIVs cannot be placed within the blocks and (2) the internal wiring of each block (Lines 4 to 9). Next, we give the Verilog and DEF files to SOC Encounter to route all the 3D nets simultaneously (Lines 10 and 11). Simultaneous routing of all 3D nets avoids any possible congestion issues due to the small size of MIVs. Once we obtain the routed DEF, we trace the routing topology to determine (1) which MIV belongs to which net, and (2) which block-pin the MIV connects to (Lines 12 and 13). Finally, we generate the Verilog and DEF files for each tier (Lines 14 and 15) that contain the block/MIV locations.
Algorithm 1: MIV Planning Algorithm
Input: Location of all blocks in B, block orientation, block-pin locations, and connectivity information
Output: Number, location, and connectivity information of MIVs
 1 for n ← 1 to $N_{net}$ do
 2     add connectivity information into a Verilog file;
 3 end
 4 for i ← 1 to |B| do
 5     for p ← 1 to $N_{pin}^{b_i}$ do
 6         add pin physical location ($x_p^{b_i}$, $y_p^{b_i}$, $l^{b_i}$) in the DEF;
 7     end
 8     add routing blockage for $b_i$ on its assigned layer $l_j^{b_i}$;
 9 end
10 read the above Verilog and DEF files into SOC Encounter;
11 route the design and save the routed DEF file;
12 read the routed DEF file and reconstruct the routing graphs;
13 extract corresponding subnets in each die/tier from the routing graphs;
14 create Verilog file for each die/tier with subnet connectivity;
15 create DEF file for each die/tier with MIV locations;
IV. EVALUATION
A. Experimental Setup
All required code and scripts are implemented in C/C++ and Python, and all experiments are carried out on a 2.5 GHz 64-bit Linux system. The 45nm Nangate open source standard cell library is used in our experiments. The TSV diameter, landing pad size, pitch, and thickness are assumed to be 6μm, 7μm, 10μm, and 50μm, respectively. The MIV diameter, pitch and thickness are 0.07μm, 0.28μm and 0.31μm, respectively. The TSV resistance and capacitance are 50 mΩ and 122 fF, respectively. These parasitics are measured values, taken from [12]. The MIV resistance and capacitance are similar to those of local vias, and are 4 Ω and 1 fF, respectively. The monolithic structure is similar to that of Figure 1, except that we use six metal layers per tier.
We consider four benchmarks in this work, statistics of which are shown in Table I. The first three are taken from the OpenCores benchmark suite [13], and the fourth is a custom-built 256-bit integer multiplier. This multiplier is built out of 256x4-bit multiplier and 512-bit adder blocks, arranged into an adder tree. Each multiplier block has 3 pipeline stages and each adder block has 4 pipeline stages.
TABLE I. DESIGN STATISTICS FOR ALL BENCHMARKS
| Design | # Gates | #Blk | #Inter-blk nets | Intra-blk WL (μm) | Target period (ns) |
|---|---|---|---|---|---|
| des_perf | 33,024 | 38 | 2,378 | 210,488 | 0.9 |
| cf_rca_16 | 146,542 | 95 | 3,135 | 1,210,618 | 1.3 |
| cf_fft_256_8 | 288,145 | 49 | 1,402 | 4,490,813 | 1.5 |
| mult_256_256 | 1,639,050 | 127 | 49,471 | 12,354,340 | 0.845 |
The design flow used to obtain all results is shown in Figure 3. It consists of roughly two steps: block design, and top-level design and analysis.
Fig. 3. Our design flow used to get post-layout simulation results.
1) Block Design: We begin by designing each block separately in Cadence SOC Encounter. The netlist for each block is obtained by grouping modules bottom-up along the hierarchy, until they reach a certain area threshold. Timing constraints for each block depend on the overall system frequency, and are determined by context characterization. Each block is then placed, routed and timing optimized in SOC Encounter. This step finalizes the pin locations within each block. We choose four blocks at random from the cf_rca_16 testcase and show their layouts in Figure 4.
2) Top-level Design and Analysis: We perform floorplanning using the methodology described in Section III-B. Three different floorplanning methodologies are considered. The first two, (1) TSV-based 3D (TSV) and (2) monolithic 3D (MIV), have already been described. The third, MIV_TF, is obtained by using the same floorplan output as in the TSV case (before whitespace insertion), but with the MIV-planning engine instead of the TSV-planning engine. This compares the quality of the two planning methodologies, starting from the same floorplan.
The number of MIVs in MIV_TF can be more than the number of TSVs, because multi-pin nets might use far more MIVs due to their small size. Some sample layouts for 2D floorplanning and 2-Die implementations of cf_rca_16 are shown in Figure 4. We next route each die separately in SOC Encounter. We perform parasitic extraction to obtain the SPEF files for each die. In addition, we create a top-level Verilog file with the interconnections between dies, and a top-level SPEF file with the TSV/MIV parasitics. All netlist and parasitic information is then fed into Synopsys Primetime to obtain true 3D timing and power numbers.
Fig. 4. Some sample layouts for the cf_rca_16 testcase, along with select block designs, and zoomed-in shots of TSVs and MIVs.
B. Experimental Results and Discussions
1) Floorplanner Validation: We run our floorplanner in 2D mode, and compare it with the results obtained from wirelength-driven floorplanning in Cadence Encounter. The Encounter footprint area is obtained by gradually increasing the area and running floorplanning until no block overlap is observed. The results are summarized in Table II. The large area reduction in the cf_fft_256_8 design is due to the fact that Cadence Encounter repeatedly produces module overlaps when provided with a smaller area. This is presumably due to some bug in the legalization stage of SOC Encounter. It can still provide wirelength comparable to our floorplanner, however, as this particular testcase is only locally connected, and each block communicates with only one or two neighbours. As seen from Table II, our floorplanner produces results comparable to SOC Encounter.
TABLE II. A COMPARISON OF THE PERFORMANCE OF OUR FLOORPLANNER AND CADENCE ENCOUNTER
| Design | Footprint (mm²), Encounter | Footprint (mm²), Ours | Inter-block WL (m), Encounter | Inter-block WL (m), Ours |
|---|---|---|---|---|
| des_perf | 0.0655 (1.00) | 0.0604 (0.92) | 0.352 (1.00) | 0.356 (1.01) |
| cf_rca_16 | 0.445 (1.00) | 0.413 (0.93) | 0.361 (1.00) | 0.368 (1.02) |
| cf_fft_256_8 | 1.690 (1.00) | 1.141 (0.68) | 0.414 (1.00) | 0.437 (1.06) |
| mult_256_256 | 5.198 (1.00) | 4.896 (0.94) | 17.01 (1.00) | 17.87 (1.05) |
| Average | 1.00 | 0.87 | 1.00 | 1.035 |
Fig. 5. Various components of net power reported in this paper.
2) Comparison of 2D versus 3D: In this section, we compare the wirelength, timing and top-level net power of the 2D and 3D cases of all designs. The clock period assumed for Total Negative Slack (TNS) and power calculation is taken from Table I. The different components of net power are explained in Figure 5. We have intra-block nets and inter-block nets. The inter-block net power is further split up into three components: (1) intra-block component (OBN-Int.), (2) inter-block component (OBN-Top) and (3) pin component of the loading cell (OBN-Pin). At the block level, the only component of
net power that can be optimized is OBN-Top. Furthermore, since we do not have a true 3D timing optimization engine, we report pre-optimization timing and power numbers. The results for all designs are summarized in Table III. From this table, we see that with respect to the inter-block wirelength, monolithic 3D gives a significant advantage. The total wirelength reduction depends upon the ratio of inter-block wirelength to intra-block wirelength, and varies depending on the circuit. TSV-based 3D design, however, does not give any improvement in wirelength for the small design des_perf, and we start to see small improvements in the cf_rca_16 and cf_fft_256_8 testcases. However, with the largest design, we see no improvement, mainly because we need to travel a large distance to the nearest whitespace block to place a TSV. Also, as expected, MIV_TF gives better wirelength than the TSV-based method, but worse than the MIV case. With respect to timing and net power, we see that the MIV case improves the longest path delay (LPD), the total negative slack (TNS) and the top-level net power. The timing of MIV_TF is sometimes better than the timing of MIV, as wirelength-driven floorplanning does not guarantee the best timing. In the benchmarks considered, except in the 2-Die case of cf_fft_256_8, the TSV case does not give any timing or power improvement over 2D. This is because the large 122 fF capacitance is equivalent to more than 700μm of Metal 10 wire in the 45nm technology, and a significant number of such long wires must be shortened before a noticeable reduction is seen. In general, the reduction in top net power of MIV follows the reduction in top net wirelength. The only exception is mult_256_256. Here we see that our 2D design has 43% more power than Encounter, with only 5% more wirelength. This is because power consumption depends on the wirelength distribution, and our floorplanner results in solutions with the longer nets having higher switching activity.
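The dependence of the total wirelength reduction on the inter-block to intra-block ratio can be checked with a few lines of Python. This is a small sketch, not part of the paper's tool flow: the function name is an assumption, and the inputs are the intra-block wirelengths from Table I and the inter-block routed wirelengths (Encounter 2D versus 2-die MIV) from Table III.

```python
def total_wl_ratio(intra_wl: float, inter_wl_2d: float, inter_wl_3d: float) -> float:
    """Ratio of total 3D wirelength to total 2D wirelength.

    Intra-block wirelength is fixed by the block design, so only the
    inter-block part changes between the 2D and 3D floorplans.
    """
    return (intra_wl + inter_wl_3d) / (intra_wl + inter_wl_2d)


# des_perf: inter-block nets are a large share of the total wirelength (values in μm).
des_perf_inter = 267_678 / 352_805
des_perf_total = total_wl_ratio(210_488, 352_805, 267_678)

# cf_fft_256_8: intra-block wirelength dominates the total (values in μm).
fft_inter = 263_787 / 413_674
fft_total = total_wl_ratio(4_490_813, 413_674, 263_787)

print(f"des_perf:     inter {des_perf_inter:.2f}, total {des_perf_total:.2f}")
print(f"cf_fft_256_8: inter {fft_inter:.2f}, total {fft_total:.2f}")
```

Although cf_fft_256_8 sees the larger inter-block reduction, its total wirelength barely moves because intra-block wiring dominates, whereas des_perf converts a smaller inter-block reduction into a much larger total reduction.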
Therefore, we conclude that monolithic 3D can provide significant benefits over 2D even in the case of small designs, while TSV-based 3D is suitable for designs with a large number of long interconnections or memory-on-logic stacking applications; the improvement in the case of logic-on-logic will be observed only with smaller TSV parasitics.
TABLE III. A COMPARISON OF WIRELENGTH, TIMING AND TOP NET POWER OF 2D VERSUS 3D (values in parentheses are normalized to the Encounter 2D result)
| Design | Config | Footprint (μm × μm) | Normalised Si. area | #MIV/#TSV | Inter-block routed WL (μm) | Total routed WL (μm) | LPD (ns) | TNS (ns) | OBN-Top power (mW) |
|---|---|---|---|---|---|---|---|---|---|
| des_perf | 2D Encounter | 256x256 | 1 | - | 352,805 (1.00) | 563,293 (1.00) | 1.65 (1.00) | -135.02 (1.00) | 11.24 (1.00) |
| des_perf | 2D Ours | 251x241 | 0.92 | - | 356,489 (1.01) | 566,977 (1.01) | 1.73 (1.05) | -162.95 (1.21) | 11.86 (1.06) |
| des_perf | MIV 2 Dies | 146x211 | 0.94 | 1,800 | 267,678 (0.76) | 478,166 (0.85) | 1.44 (0.87) | -78.93 (0.58) | 8.55 (0.76) |
| des_perf | MIV 3 Dies | 127x179 | 1.04 | 2,738 | 222,240 (0.63) | 432,728 (0.77) | 1.23 (0.74) | -47.62 (0.35) | 7.29 (0.65) |
| des_perf | MIV 4 Dies | 111x149 | 1.01 | 3,823 | 204,868 (0.58) | 415,356 (0.74) | 1.10 (0.67) | -22.80 (0.17) | 6.41 (0.57) |
| des_perf | TSV 2 Dies | 215x323 | 2.12 | 120 | 473,092 (1.34) | 683,580 (1.21) | 2.18 (1.32) | -259.6 (1.92) | 21.16 (1.88) |
| des_perf | TSV 3 Dies | 320x235 | 3.44 | 456 | 515,267 (1.46) | 725,755 (1.29) | 2.46 (1.49) | -450.04 (3.33) | 30.26 (2.69) |
| des_perf | TSV 4 Dies | 359x402 | 8.81 | 984 | 734,739 (2.08) | 945,227 (1.68) | 4.09 (2.48) | -590.19 (4.37) | 48.12 (4.28) |
| des_perf | MIV_TF 2 Dies | 213x250 | 1.63 | 124 | 370,823 (1.05) | 581,311 (1.03) | 2.06 (1.25) | -182.17 (1.35) | 12.71 (1.13) |
| des_perf | MIV_TF 3 Dies | 211x233 | 2.25 | 482 | 353,226 (1.00) | 563,714 (1.00) | 1.65 (1.00) | -162.14 (1.2) | 11.72 (1.04) |
| des_perf | MIV_TF 4 Dies | 186x160 | 1.82 | 1,098 | 238,356 (0.68) | 448,844 (0.80) | 1.25 (0.75) | -54.49 (0.40) | 7.29 (0.65) |
| cf_rca_16 | 2D Encounter | 667x667 | 1 | - | 361,673 (1.00) | 1,572,291 (1.00) | 1.85 (1.00) | -2,762.73 (1.00) | 4.71 (1.00) |
| cf_rca_16 | 2D Ours | 555x744 | 0.93 | - | 367,542 (1.02) | 1,578,160 (1.00) | 1.75 (0.95) | -2,159.27 (0.78) | 4.73 (1.00) |
| cf_rca_16 | MIV 2 Dies | 416x477 | 0.89 | 1,747 | 289,156 (0.80) | 1,499,774 (0.95) | 1.73 (0.94) | -1,949.23 (0.71) | 3.74 (0.79) |
| cf_rca_16 | MIV 3 Dies | 367x370 | 0.92 | 2,925 | 255,910 (0.71) | 1,466,258 (0.93) | 1.72 (0.93) | -1,729.42 (0.63) | 3.61 (0.77) |
| cf_rca_16 | MIV 4 Dies | 273x384 | 0.94 | 3,936 | 240,583 (0.67) | 1,451,201 (0.92) | 1.69 (0.92) | -1,576.67 (0.57) | 3.37 (0.72) |
| cf_rca_16 | TSV 2 Dies | 484x418 | 0.91 | 156 | 354,347 (1.07) | 1,564,965 (1.00) | 2.48 (1.34) | -11,093 (4.02) | 7.49 (1.59) |
| cf_rca_16 | TSV 3 Dies | 377x370 | 0.94 | 334 | 401,425 (1.11) | 1,612,043 (1.03) | 3.23 (1.75) | -16,074 (5.82) | 11.51 (2.44) |
| cf_rca_16 | TSV 4 Dies | 350x349 | 1.10 | 477 | 345,090 (0.95) | 1,555,708 (0.99) | 3.63 (1.97) | -18,825 (6.81) | 13.4 (2.85) |
| cf_rca_16 | MIV_TF 2 Dies | 438x416 | 0.82 | 324 | 323,631 (0.89) | 1,534,249 (0.98) | 1.79 (0.97) | -2,463.8 (0.89) | 4.12 (0.87) |
| cf_rca_16 | MIV_TF 3 Dies | 375x369 | 0.93 | 609 | 281,093 (0.78) | 1,491,711 (0.95) | 1.65 (0.90) | -1,353.31 (0.49) | 3.7 (0.79) |
| cf_rca_16 | MIV_TF 4 Dies | 317x311 | 0.89 | 850 | 263,092 (0.73) | 1,473,710 (0.94) | 1.66 (0.90) | -1,242.01 (0.45) | 3.45 (0.73) |
| cf_fft_256_8 | 2D Encounter | 1,300x1,300 | 1.00 | - | 413,674 (1.00) | 4,904,487 (1.00) | 2.18 (1.00) | -22,308 (1.00) | 7.7 (1.00) |
| cf_fft_256_8 | 2D Ours | 1,142x999 | 0.68 | - | 436,933 (1.06) | 4,927,746 (1.00) | 2.12 (0.97) | -11,388 (0.51) | 8.2 (1.06) |
| cf_fft_256_8 | MIV 2 Dies | 819x718 | 0.70 | 1,050 | 263,787 (0.64) | 4,754,600 (0.97) | 1.96 (0.90) | -3,618 (0.16) | 5.3 (0.69) |
| cf_fft_256_8 | MIV 3 Dies | 581x799 | 0.82 | 1,921 | 254,256 (0.61) | 4,745,069 (0.97) | 1.9 (0.87) | -4,447 (0.20) | 5.06 (0.66) |
| cf_fft_256_8 | MIV 4 Dies | 595x594 | 0.84 | 2,475 | 269,049 (0.65) | 4,759,862 (0.97) | 1.85 (0.85) | -4,023 (0.18) | 5.29 (0.69) |
| cf_fft_256_8 | TSV 2 Dies | 679x932 | 0.75 | 75 | 369,166 (0.89) | 4,859,979 (0.99) | 2.1 (0.96) | -14,655 (0.66) | 9.22 (1.20) |
| cf_fft_256_8 | TSV 3 Dies | 653x674 | 0.78 | 147 | 357,592 (0.86) | 4,848,405 (0.99) | 2.47 (1.13) | -34,950 (1.57) | 11.1 (1.44) |
| cf_fft_256_8 | TSV 4 Dies | 584x527 | 0.73 | 377 | 422,216 (1.02) | 4,913,029 (1.00) | 3.22 (1.48) | -67,602 (3.03) | 16.67 (2.16) |
| cf_fft_256_8 | MIV_TF 2 Dies | 675x925 | 0.74 | 105 | 357,887 (0.87) | 4,848,700 (0.99) | 1.87 (0.86) | -6,314 (0.28) | 6.82 (0.89) |
| cf_fft_256_8 | MIV_TF 3 Dies | 649x668 | 0.77 | 210 | 339,045 (0.82) | 4,829,858 (0.98) | 1.74 (0.80) | -1,358 (0.06) | 6.24 (0.81) |
| cf_fft_256_8 | MIV_TF 4 Dies | 578x523 | 0.72 | 518 | 310,465 (0.75) | 4,801,278 (0.98) | 1.85 (0.85) | -1,626 (0.07) | 5.74 (0.75) |
| mult_256_256 | 2D Encounter | 2,280x2,280 | 1.00 | - | 17,089,968 (1.00) | 29,444,308 (1.00) | 1.12 (1.00) | -216.33 (1.00) | 144.41 (1.00) |
| mult_256_256 | 2D Ours | 2,144x2,284 | 0.94 | - | 17,870,346 (1.05) | 30,224,686 (1.03) | 1.27 (1.14) | -253.94 (1.17) | 206.1 (1.43) |
| mult_256_256 | MIV 2 Dies | 1,506x1,718 | 1.00 | 48,513 | 13,815,376 (0.81) | 26,169,716 (0.89) | 1.17 (1.05) | -251.95 (1.16) | 146.5 (1.01) |
| mult_256_256 | MIV 3 Dies | 1,286x1,295 | 0.96 | 79,682 | 11,392,196 (0.67) | 23,746,536 (0.81) | 0.95 (0.85) | -133.96 (0.62) | 125 (0.87) |
| mult_256_256 | MIV 4 Dies | 1,177x1,131 | 1.02 | 102,994 | 10,116,222 (0.59) | 22,470,562 (0.76) | 0.97 (0.87) | -128.78 (0.60) | 111.2 (0.77) |
| mult_256_256 | TSV 2 Dies | 1,608x1,616 | 1.00 | 1,683 | 18,825,744 (1.10) | 31,180,084 (1.06) | 1.76 (1.58) | -441.29 (2.04) | 304.4 (2.11) |
| mult_256_256 | TSV 3 Dies | 1,508x1,236 | 1.08 | 3,599 | 21,184,404 (1.24) | 33,538,744 (1.14) | 2.02 (1.8) | -838.1 (3.87) | 373.1 (2.58) |
| mult_256_256 | TSV 4 Dies | 1,240x1,190 | 1.14 | 4,232 | 20,890,062 (1.22) | 33,244,402 (1.13) | 2.45 (2.19) | -945.89 (4.37) | 376.5 (2.61) |
| mult_256_256 | MIV_TF 2 Dies | 1,601x1,609 | 0.99 | 13,162 | 16,127,948 (0.94) | 28,482,288 (0.97) | 1.06 (0.95) | -205.7 (0.95) | 187.4 (1.30) |
| mult_256_256 | MIV_TF 3 Dies | 1,501x1,182 | 1.02 | 20,955 | 15,256,050 (0.89) | 27,610,390 (0.94) | 0.99 (0.88) | -185.82 (0.86) | 180.2 (1.25) |
| mult_256_256 | MIV_TF 4 Dies | 1,182x1,131 | 1.03 | 24,260 | 15,124,651 (0.89) | 27,478,991 (0.93) | 1.12 (1.00) | -230.95 (1.07) | 187.8 (1.30) |
3) Power benefit of monolithic 3D: We provide a detailed pre-optimization power split-up of all four testcases in Table IV, with the legend explained in Figure 5. We compare 2D with MIV-based 3D, and also provide a reference case of “ideal interconnections”. This ideal case does not correspond to any real physical scenario, but represents the theoretical minimum power consumption at the block level. The values are obtained by setting the parasitics of the OBN-Top nets to zero in Primetime. With a reduction in the wirelength of top-level nets, we expect a reduction in the following power components: (1) inter-block components of inter-block nets (OBN-Top), and (2) switching power of the standard cells driving inter-block nets.
From Table IV, we see that even theoretically, only a 10% average reduction in the total power consumption is possible, and the reduction is larger for designs with relatively more inter-block nets. We also see that MIV-based 3D gives us a 3.1% average reduction in the total power consumption across our four testcases. If we consider the parameter that is being optimized by floorplanning, i.e., OBN-Top, we see that a large reduction in the power consumption is obtained by using monolithic 3D. The reduction in the driving cell power is present in all testcases, but is most noticeable in mult_256_256, which has a huge number of driving cells.
Fig. 6. Timing slack histograms comparing 2D and MIV-based 3D (2 die) for FFT benchmark. Negative slacks are shown in red, and positive slacks in green.
Since we do not have a true 3D timing optimization tool, we cannot compare post-optimization numbers directly. However, we can predict the trend from the TNS reduction (Table III), and timing slack histograms (shown for the cf_fft_256_8 testcase in Figure 6). Due to the average reduction of 51% in TNS, fewer buffer insertions and cell
upsizing will be required to meet timing. Also, since the entire slack histograms are shifted towards the right, techniques such as timing slack redistribution or multi-Vth design can be employed to achieve further power benefit.
TABLE IV. A DETAILED SPLIT UP OF THE POWER FOR 2D AND MONOLITHIC 3D (IN mW)
| Design | Config | Std. Cell | Leakage | IBN | OBN-Pin | OBN-Int. | OBN-Top | Total |
|---|---|---|---|---|---|---|---|---|
| des_perf | Ideal interconnections | 26.1 | 0.5 | 21.09 | 1.07 | 1.34 | 0 (-) | 50.1 (0.80) |
| des_perf | 2D Encounter | 27.1 | 0.5 | 21.03 | 1.19 | 1.34 | 11.24 (1.00) | 62.5 (1.00) |
| des_perf | 2D Ours | 27.2 | 0.5 | 21.11 | 1.19 | 1.34 | 11.86 (1.06) | 63.3 (1.01) |
| des_perf | MIV 2 Dies | 26.9 | 0.5 | 21.01 | 1.2 | 1.34 | 8.55 (0.76) | 59.5 (0.95) |
| des_perf | MIV 3 Dies | 26.7 | 0.5 | 21.07 | 1.2 | 1.34 | 7.29 (0.65) | 58.1 (0.93) |
| des_perf | MIV 4 Dies | 26.6 | 0.5 | 21.04 | 1.21 | 1.34 | 6.41 (0.57) | 57.1 (0.91) |
| cf_rca_16 | Ideal interconnections | 107.4 | 2.9 | 57.43 | 0.12 | 0.75 | 0 (-) | 168.6 (0.97) |
| cf_rca_16 | 2D Encounter | 108.1 | 2.9 | 57.33 | 0.12 | 0.75 | 4.71 (1.00) | 173.9 (1.00) |
| cf_rca_16 | 2D Ours | 108.1 | 2.9 | 57.31 | 0.12 | 0.75 | 4.73 (1.00) | 173.9 (1.00) |
| cf_rca_16 | MIV 2 Dies | 107.9 | 2.9 | 57.3 | 0.12 | 0.75 | 3.74 (0.79) | 172.8 (0.99) |
| cf_rca_16 | MIV 3 Dies | 107.9 | 2.9 | 57.32 | 0.12 | 0.75 | 3.61 (0.77) | 172.6 (0.99) |
| cf_rca_16 | MIV 4 Dies | 107.9 | 2.9 | 57.46 | 0.12 | 0.75 | 3.37 (0.72) | 172.5 (0.99) |
| cf_fft_256_8 | Ideal interconnections | 353.5 | 8 | 120.24 | 0.56 | 1.4 | 0 (-) | 483.7 (0.98) |
| cf_fft_256_8 | 2D Encounter | 353.7 | 8 | 120.27 | 0.53 | 1.4 | 7.7 (1.00) | 491.6 (1.00) |
| cf_fft_256_8 | 2D Ours | 353.7 | 8 | 120.27 | 0.53 | 1.4 | 8.2 (1.06) | 492.2 (1.00) |
| cf_fft_256_8 | MIV 2 Dies | 353.6 | 8 | 120.26 | 0.54 | 1.4 | 5.3 (0.69) | 489.1 (0.99) |
| cf_fft_256_8 | MIV 3 Dies | 353.6 | 8 | 120.21 | 0.54 | 1.4 | 5.06 (0.66) | 488.8 (0.99) |
| cf_fft_256_8 | MIV 4 Dies | 353.6 | 8 | 120.18 | 0.53 | 1.4 | 5.29 (0.69) | 489.1 (0.99) |
| mult_256_256 | Ideal interconnections | 1807.8 | 33.9 | 744.33 | 10.47 | 33.1 | 0 (-) | 2629.6 (0.84) |
| mult_256_256 | 2D Encounter | 2174.9 | 33.9 | 744.24 | 9.85 | 33.1 | 144.41 (1.00) | 3140.4 (1.00) |
| mult_256_256 | 2D Ours | 2233.3 | 33.9 | 741.38 | 9.82 | 33.1 | 206.1 (1.43) | 3257.6 (1.04) |
| mult_256_256 | MIV 2 Dies | 2110.7 | 33.9 | 741.35 | 9.85 | 33.1 | 146.5 (1.01) | 3075.4 (0.98) |
| mult_256_256 | MIV 3 Dies | 2095.5 | 33.9 | 741.38 | 9.82 | 33.1 | 125 (0.87) | 3038.7 (0.97) |
| mult_256_256 | MIV 4 Dies | 2049 | 33.9 | 741.27 | 9.83 | 33.1 | 111.2 (0.77) | 2978.3 (0.95) |
C. Design Guidelines for block-level MIV-based 3D
We consider two possible scenarios: timing critical and power critical designs. In the case of timing critical designs, we have shown that MIV-based 3D can give a significant reduction in the longest path delay, as well as in the total negative slack. Larger reductions in delay will be seen for designs with combinational paths through blocks. In the case of power critical designs, we have shown that MIV-based 3D gives a significant reduction in inter-block net power and, depending on the number of inter-block nets, significant savings in the power of the cells driving inter-block nets. Further power reduction can be achieved in one of several ways: (1) re-designing the blocks to downsize inter-block drivers, (2) voltage scaling of the 3D system, which will shift the entire timing distribution back to the 2D case, and (3) multi-Vth optimization, which will require fewer low-Vth cells to meet timing, reducing device power.
V. CONCLUSIONS
In this paper, we provided a floorplanning framework for monolithic 3D-ICs, and a methodology to obtain post-layout wirelength, timing, and power numbers for block-level 3D-ICs. We demonstrated that monolithic inter-tier via (MIV)-based 3D-ICs can achieve up to 42% reduction in wirelength when compared with 2D-ICs. In addition, we compared our monolithic 3D designs to through-silicon-via (TSV)-based 3D-IC designs in terms of area, wirelength, power and performance. We observed that TSV-based 3D is only beneficial if either the TSV capacitance scales down, or the circuit has a large number of long wires. We also showed that, due to a significant reduction in the total negative slack and an increase in the positive slacks, MIV-based 3D-ICs require less timing optimization.
Moreover, with the application of advanced methods such as multi-Vth design, further reduction in power is possible.
REFERENCES
[1] K. Yang, D. H. Kim, and S. K. Lim, “Design Quality Tradeoff Studies for 3D ICs Built with Nano-scale TSVs and Devices,” in Proc. Int. Symp. on Quality Electronic Design, 2012, pp. 1–8.
[2] X. Dong, J. Zhao, and Y. Xie, “Fabrication Cost Analysis and Cost-Aware Design Space Exploration for 3D-ICs,” in IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 2010, pp. 1959–1972.
[3] P. Batude et al., “Advances in 3D CMOS Sequential Integration,” in Proc. IEEE Int. Electron Devices Meeting, 2009, pp. 1–4.
[4] O. Thomas et al., “Compact 6T SRAM cell with robust read/write stabilizing design in 45nm Monolithic 3D IC technology,” in Proc. IEEE Int. Conf. on Integrated Circuit Design and Tech., 2009, pp. 195–198.
[5] S.-M. Jung, H. Lim, K. Kwak, and K. Kim, “500-MHz DDR High-Performance 72-Mb 3-D SRAM Fabricated With Laser-Induced Epitaxial c-Si Growth Technology for a Stand-Alone and Embedded Memory Application,” in IEEE Trans. on Electron Devices, 2010, pp. 474–481.
[6] D. H. Kim, R. O. Topaloglu, and S. K. Lim, “Block-Level 3D IC Design with Through-Silicon-Via Planning,” in Proc. Asia and South Pacific Design Aut. Conf., 2012, pp. 335–340.
[7] M. Tsai, T. Wang, and T. Hwang, “Through-Silicon Via Planning in 3-D Floorplanning,” in IEEE Trans. on VLSI Systems, 2011, pp. 1448–1457.
[8] J. Knechtel, I. Markov, and J. Lienig, “Assembling 2-D Blocks Into 3-D Chips,” in IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 2012, pp. 228–241.
[9] S. Bobba et al., “CELONCEL: Effective design technique for 3-D monolithic integration targeting high performance integrated circuits,” in Proc. Asia and South Pacific Design Aut. Conf., 2011, pp. 336–343.
[10] C. Liu and S. K. Lim, “Ultra-High Density 3D SRAM Cell Designs for Monolithic 3D Integration,” in Proc. IEEE Int. Interconnect Technology Conference, 2012.
[11] H. Xu, D. Sheqin, M. Yuchun, and H. Xianlong, “Simultaneous buffer and interlayer via planning for 3D floorplanning,” in Proc. Int. Symp. on Quality Electronic Design, 2009, pp. 740–745.
[12] X. Wu et al., “Electrical Characterization for Inter-tier Connections and Timing Analysis for 3-D ICs,” in IEEE Trans. on VLSI Systems, 2012, pp. 186–191.
[13] http://www.opencores.org/.