22nd International Conference on Microelectronics (ICM 2010)

Design of a robust, high performance standard cell threshold logic family for DSM technology

Samuel Leshner, Niranjan Kulkarni, and Sarma Vrudhula
Arizona State University, Tempe, AZ, U.S.

Krzysztof Berezowski
TIMA Laboratory, Grenoble, France

Abstract—This paper presents the threshold logic latch (TLL), which provides a high performance, low power alternative to traditional CMOS logic networks. TLL is highly robust, even in deep sub-micron technology nodes. Experimental results obtained from simulation of a commercial 65 nm low power process demonstrate a static noise margin up to an order of magnitude greater than those of existing implementations of threshold logic. Examples of automated synthesis of pipelined multipliers using a combination of standard CMOS and a small number of TLL gates are shown through simulation to improve both area and total power by a factor of up to 1.5 and reduce leakage power by a factor of up to 2.3.

978-1-4244-5816-5/09/$26.00 ©2009 IEEE

I. INTRODUCTION

The demand for power efficiency and high performance in embedded systems has inspired a large body of research into models of computation that utilize available transistor resources more efficiently than standard CMOS logic gates. Among these, threshold logic [7] has recently been rediscovered as a potential design alternative.

Threshold functions are a proper subset of Boolean functions. A function $y = f(x_0, x_1, \ldots, x_{n-1})$ is threshold if there exists a set of weights $w_0, w_1, \ldots, w_{n-1}$ and a threshold $T$ such that

$$y = \begin{cases} 1 & \text{if } \sum_{i=0}^{n-1} w_i x_i \geq T, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

The advantages of threshold logic rest in its ability to compute complex functions very efficiently. In many cases, it is possible to replace a large multi-level CMOS network with a single threshold gate. For instance, the function f = a(b(c(d + e) + de) + cde) + bcde can be implemented as a single threshold gate, whereas an optimally synthesized version using traditional gate libraries requires four or more levels of logic. This absorption of CMOS logic reduces both gate count and critical path delay, and recent advances in threshold logic synthesis [3], [13] promise very fast, low power designs.

Unfortunately, implementation of threshold gates requires circuits that implement a comparison between physical quantities, and such circuits are inherently more susceptible to parameter variations than switching networks implemented by static CMOS logic. This property is a severe obstacle for threshold logic: since the miniaturization of digital ICs has led to diminished control over the fabrication process, robustness has become the prime design metric for deep sub-micron technologies. Thus, in addition to the challenge of achieving higher performance and lower power consumption, a certain level of reliability must be guaranteed as well.

The main contribution of this paper is a novel architecture of a CMOS-based threshold logic gate, which exhibits all of the advantages of threshold logic including functional capacity, high speed, and low power consumption. In addition, the proposed architecture provides reliability levels (expressed in terms of noise margins) comparable to those of contemporary static CMOS logic, and improved by up to an order of magnitude over existing threshold logic gate implementations. We also demonstrate the benefits of threshold logic based design through the comparative evaluation of automatically generated conventional and threshold logic-based multiplier designs.

II. PREVIOUS WORK

Threshold logic has been studied extensively since its inception decades ago [7]. Advances in the field have come slowly, however, largely due to a lack of efficient physical implementations. Early implementations of threshold logic have relied on static currents [8] or the accumulation of stored charge in floating capacitances [4], [5]. However, these solutions have proven to be ultimately impractical in modern processes due to low speed, high power consumption, and/or overly taxing processing requirements.

Recently, a number of novel implementations based on the principles of differential impedance have emerged which have provided very promising results [1]. Differential implementations can be constructed from any type of FET, and do not require any special devices or processing techniques. They achieve high performance and do not draw static current (other than leakage current). All logic values are stored statically, thus keeper devices, typically required in dynamic logic elements, are not needed.
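The single-gate claim made for the five-input example in the Introduction can be checked directly against Equation (1). The sketch below is a minimal brute-force check in Python. The weights (3, 3, 2, 2, 2) and the threshold 9 are one assignment worked out for this illustration and do not appear in the paper; the function names are likewise hypothetical.

```python
from itertools import product


def threshold_gate(weights, threshold, inputs):
    """Equation (1): output 1 when the weighted input sum reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def f(a, b, c, d, e):
    """f = a(b(c(d + e) + de) + cde) + bcde, evaluated on 0/1 integers."""
    inner = b and (c and (d or e) or d and e) or c and d and e
    return int(bool(a and inner or (b and c and d and e)))


# Assumed weights for (a, b, c, d, e) and assumed threshold; not taken from the paper.
WEIGHTS = (3, 3, 2, 2, 2)
THRESHOLD = 9


def matches_everywhere():
    """Compare the gate with f on all 32 input combinations."""
    return all(
        threshold_gate(WEIGHTS, THRESHOLD, bits) == f(*bits)
        for bits in product((0, 1), repeat=5)
    )


if __name__ == "__main__":
    print("single threshold gate realizes f:", matches_everywhere())
```

Because the check enumerates every input combination, agreement on all 32 rows is sufficient to show that this particular weight assignment realizes f; other valid assignments exist.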
Of the many differential implementations of threshold logic gates, two examples have demonstrated uniformly high performance and low power consumption: single input current-sensing differential logic (SCSDL) [12] and differential current switch threshold logic (DCSTL) [9]. These are shown in Figures 1 and 2, respectively. The operation principle of all differential threshold logic elements is essentially the same. All implementations comprise a differential sense amplifier ($M_1$–$M_8$) and two networks of parallel transistors ($M_{9,10}$) on either side. These parallel networks are controlled by the primary input signals, and provide configurable impedances in series with the intrinsic impedances of the differential amplifier.

Fig. 1. Device-level schematic of the SCSDL element.

Fig. 2. Device-level schematic of the DCSTL element.

Fig. 3. Device-level schematic of the proposed TLL element.

When the clock signal is low, the logic elements precharge both output nodes ($N_{1,2}$) of the differential amplifier. In order to ensure that the function is evaluated properly, every differential pair of capacitive nodes ($N_{1,2}$, $N_{3,4}$, $N_{5,6}$) must be at the same potential prior to the rising edge of the clock signal. When the clock signal rises, both output nodes of the differential amplifier begin to discharge at different rates determined by the current state of the primary input signals. The first side of the differential amplifier to discharge sufficiently determines the new output state. Neglecting the pre-charged capacitance of any nodes other than the outputs, the discharge delay $\tau$ of the output node of a single side of the amplifier can be formulated as an RC network as given by Equations (2) and (3) using the Elmore delay model.

$$\tau = C_1 (Z_5 + Z_7 + Z_9) \tag{2}$$

$$\tau = C_2 (Z_6 + Z_8 + Z_{10}) \tag{3}$$

$C_i$ corresponds to the capacitance of the node $N_i$; $Z_i$ corresponds to the effective impedance of the transistor $M_i$. Note that transistors $M_{11}$, $M_{12}$, and $M_{13}$ of the DCSTL element create alternative discharge paths for nodes $N_1$ and $N_2$. Strictly speaking, they are not essential to the proper operation of the logic element, but they significantly improve the speed of the gate by reducing the total impedance for each side of the differential amplifier.

III. PROPOSED THRESHOLD LOGIC ARCHITECTURE

All of the previously mentioned implementations of differential threshold logic integrate the parallel input networks into the differential amplifier. Since $Z_9$ and $Z_{10}$ correspond to the networks of parallel transistors and are inversely proportional to the number of active transistors in the network, the total series impedance becomes dominated by $Z_5$–$Z_8$ when the number of active transistors composing $Z_9$ and $Z_{10}$ is large. The actual impedances $Z_5$–$Z_{10}$ will experience some variation in practice, and if $Z_9$ and $Z_{10}$ are very small, variations in $Z_5$–$Z_8$ may effectively mask the designed difference in impedance between the two differential discharge paths, resulting in unpredictable behavior.

The simplest techniques for mitigating these problems are to increase the difference in impedance between the two discharge paths in order to improve the noise margin, and/or to improve transistor matching between differential pairs of transistors in order to reduce unintentional variations in impedance. These techniques are straightforward to apply, but quite costly. A 0.25 μm test chip utilizing DCSTL elements fabricated in [10] used very large transistors (10 μm $W_{max}$, 1.32 μm $L_{max}$) to provide sufficient matching in the threshold logic gates, the size of which, the authors concluded, greatly diminished the delay and power advantages of the logic style.

Our new differential threshold logic element, the threshold logic latch (TLL), utilizes an alternative technique to mitigate these issues. Instead of manipulating transistor dimensions, we isolate the parallel input networks ($M_{9,10}$) from the differential amplifier ($M_1$–$M_8$). Consequently, the input networks act as banks of pass transistors, or more precisely, as input-programmable delay elements. Instead of comparing effective impedances, the amplifier in a TLL gate effectively evaluates the race between the two clock signals traversing through each input network, i.e., it compares the timing constants of the input networks.
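The masking effect described for the integrated input networks can be illustrated numerically with the Elmore model of Equations (2) and (3). The following Python sketch is only an illustration under stated assumptions: the on-resistance of a single input transistor (`r_on`), the lumped amplifier impedance `z_amp` (standing in for $Z_5 + Z_7$ or $Z_6 + Z_8$), and the node capacitance are hypothetical values, and each input network is modeled as `n_active` identical transistors in parallel.

```python
def elmore_delay(c_node, z_amp, r_on, n_active):
    """Equations (2)/(3): tau = C_i * (Z_amp + Z_network).

    Z_network models Z9 or Z10 as n_active identical transistors in
    parallel, so it is inversely proportional to n_active.
    """
    if n_active < 1:
        raise ValueError("at least one transistor must be active")
    return c_node * (z_amp + r_on / n_active)


def critical_mismatch(z_amp, r_on, k):
    """Fractional increase in z_amp on the faster side (k + 1 active
    transistors) that cancels its designed advantage over the slower
    side (k active transistors)."""
    designed_gap = r_on / k - r_on / (k + 1)
    return designed_gap / z_amp


if __name__ == "__main__":
    C_NODE = 1e-15   # hypothetical 1 fF output node
    Z_AMP = 2000.0   # hypothetical Z5 + Z7 (ohms)
    R_ON = 5000.0    # hypothetical on-resistance of one input transistor (ohms)
    for k in (1, 3, 5, 7):
        slow = elmore_delay(C_NODE, Z_AMP, R_ON, k)
        fast = elmore_delay(C_NODE, Z_AMP, R_ON, k + 1)
        tol = critical_mismatch(Z_AMP, R_ON, k)
        print(f"k={k}: tau_slow={slow:.3e} s, tau_fast={fast:.3e} s, "
              f"tolerable amplifier mismatch={tol:.1%}")
```

With these illustrative values, the amplifier mismatch that the gate can tolerate before the two delays cross falls from 125% when one transistor is active to roughly 4.5% when seven are active, which is the masking behavior the text attributes to large input networks. Isolating the input networks from the amplifier, as TLL does, removes `z_amp` from the quantity being compared.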
An annotated figure detailing the TLL element is shown in Figure 3. The novel structure of the TLL element provides a number of advantages over existing differential threshold logic implementations. In other implementations, both output nodes of the differential amplifier begin to discharge at the same time at different rates determined by the input configuration. In the TLL element, the output node at the winning side of the amplifier begins to discharge before the other, determined by which parallel input network the clock signal is able to propagate through the fastest. Essentially:

1. The side of the differential amplifier that is triggered first is able to partially discharge unhindered by the efforts of the opposite side, and
2. The early discharge of one side of the amplifier reduces the initial rate of discharge of the opposite side.

This difference in initial discharge times is determined solely by the impedances of the parallel input networks, whereas the difference in discharge rates utilized by other implementations is determined by the impedances of the parallel input networks in series with the transistors of the discharge path of the differential amplifier. Thus, while variations in the differential amplifier in a TLL gate may still lead to some imbalance in impedance between the two discharge paths, the designed difference between the two parallel input networks is not as easily masked.

IV. EXPERIMENTAL RESULTS

To highlight the advantages of the TLL gate and hybrid multiplier designs, extensive measurements were obtained through simulations. Simulations were conducted using Synopsys HSpice, Synopsys Nanosim, and the design kit for a commercial 65 nm low power process. The typical process corner for the technology assumes a supply voltage of 1.2 V and a temperature of 25 °C.

Cell reliability is of utmost importance in threshold logic gates, as process variations and signal noise in differential threshold logic gates have the potential to induce functional failures at any operating frequency. To compare the reliabilities of the various differential threshold logic implementations, Monte Carlo simulations were performed for ten different logic functions: a buffer, an AND function of 3, 5, and 7 inputs, an OR function of 3, 5, and 7 inputs, and a MAJORITY function of 3, 5, and 7 inputs.
The number of failures out of 10,000 random samples was recorded for each function, assuming the parameters of each device in a gate to be completely independent. Additionally, the static noise margin was computed for each function, defined as the maximum DC voltage by which one of the bistable nodes $N_1$ and $N_2$ of each logic element may be offset before a failure occurs. A failure is defined as the event that the gate evaluates an incorrect output for any possible input combination, as described by the fault model in [11]. The results are summarized in Table I. Each threshold function will exhibit a different static noise margin; increased contention between the two branches of the differential amplifier will effectively reduce the noise margin of the gate, thus the noise margin will vary from function to function depending on the combination of input signals applied. Gates with small differences in impedance values between input networks will demonstrate the smallest noise margins, and must thus be sized more aggressively to compensate.
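The counting procedure described above (independent variation on every device, a sample counted as failed if any input combination is mis-evaluated) can be sketched in Python as follows. This is a toy stand-in, not the HSpice flow used for Table I: the gate is reduced to a comparison of two summed conductances, the reference network with one half-size device is a hypothetical construction, the Gaussian variation `sigma` is an assumed parameter, and the resulting counts are not expected to reproduce the published figures.

```python
import itertools
import random


def sample_fails(weights, threshold, sigma, rng):
    """Return True if one randomly perturbed gate instance mis-evaluates
    at least one input combination (toy conductance-comparison model)."""
    # One unit device per unit of weight, each with independent variation.
    left = [[1.0 + rng.gauss(0.0, sigma) for _ in range(w)] for w in weights]
    # Hypothetical reference network worth (threshold - 0.5) nominal units.
    right = [1.0 + rng.gauss(0.0, sigma) for _ in range(threshold - 1)]
    right.append(0.5 * (1.0 + rng.gauss(0.0, sigma)))
    g_right = sum(right)
    for bits in itertools.product((0, 1), repeat=len(weights)):
        ideal = sum(w * b for w, b in zip(weights, bits)) >= threshold
        g_left = sum(sum(units) for units, b in zip(left, bits) if b)
        if (g_left > g_right) != ideal:
            return True
    return False


def count_failures(weights, threshold, sigma, samples=10_000, seed=0):
    """Count failing instances out of `samples` independent random gates."""
    rng = random.Random(seed)
    return sum(sample_fails(weights, threshold, sigma, rng) for _ in range(samples))


if __name__ == "__main__":
    # MAJORITY3 expressed as weights (1, 1, 1) with threshold 2.
    for sigma in (0.0, 0.1, 0.2):
        failures = count_failures([1, 1, 1], 2, sigma, samples=2000)
        print(f"sigma={sigma}: {failures} failures out of 2000")
```

The seeded generator makes each run repeatable, and setting `sigma` to zero recovers the ideal gate, which is a convenient sanity check before any variation is introduced.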

TABLE I
CELL RELIABILITY COMPARISON BETWEEN DCSTL, SCSDL, AND TLL

              MC failures (out of 10,000)    Static noise margin (mV)
Function      DCSTL    SCSDL    TLL          DCSTL    SCSDL    TLL
BUFFER        1784     0        0            231      445      533
AND3/OR3      1771     638      0            102      67       345
AND5/OR5      2367     906      0            89       60       363
AND7/OR7      3005     1146     0            77       54       379
MAJORITY3     2542     1106     0            109      70       339
MAJORITY5     3984     3222     0            68       39       268
MAJORITY7     1654     2695     0            68       27       238
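The static noise margins in Table I are easier to compare once they are expressed relative to the supply voltage. The short Python sketch below does this arithmetic on the tabulated values. It assumes that the margins were measured at the 1.2 V typical-corner supply stated earlier in this section, and it uses a 10% cutoff, the lower end of the range commonly quoted for static logic; the helper names are hypothetical.

```python
VDD_MV = 1200.0  # assumed: 1.2 V typical-corner supply from Section IV

# Static noise margins (mV) copied from Table I.
SNM_MV = {
    "BUFFER":    {"DCSTL": 231, "SCSDL": 445, "TLL": 533},
    "AND3/OR3":  {"DCSTL": 102, "SCSDL": 67,  "TLL": 345},
    "AND5/OR5":  {"DCSTL": 89,  "SCSDL": 60,  "TLL": 363},
    "AND7/OR7":  {"DCSTL": 77,  "SCSDL": 54,  "TLL": 379},
    "MAJORITY3": {"DCSTL": 109, "SCSDL": 70,  "TLL": 339},
    "MAJORITY5": {"DCSTL": 68,  "SCSDL": 39,  "TLL": 268},
    "MAJORITY7": {"DCSTL": 68,  "SCSDL": 27,  "TLL": 238},
}


def snm_fraction(snm_mv, vdd_mv=VDD_MV):
    """Noise margin as a fraction of the supply voltage."""
    return snm_mv / vdd_mv


def meets_margin(family, min_fraction=0.10):
    """Functions whose noise margin is at least min_fraction of the supply."""
    return [fn for fn, row in SNM_MV.items()
            if snm_fraction(row[family]) >= min_fraction]


def best_improvement(reference):
    """Largest ratio of TLL noise margin to that of the reference family."""
    return max(row["TLL"] / row[reference] for row in SNM_MV.values())


if __name__ == "__main__":
    for family in ("DCSTL", "SCSDL", "TLL"):
        print(family, "meets 10% of supply for:", meets_margin(family))
    for reference in ("DCSTL", "SCSDL"):
        print(f"max TLL/{reference} ratio: {best_improvement(reference):.2f}")
```

Under these assumptions, every TLL function clears the 10% cutoff, whereas DCSTL and SCSDL clear it only for the buffer. The largest ratio occurs for MAJORITY7 against SCSDL (238 mV versus 27 mV), which is the "up to an order of magnitude" case.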

All logic elements were loaded by four unit inverters and sized for appropriate drive strength. Clocked signals were driven with a local clock buffer, whereas input signals were supplied via ideal voltage sources assuming a linear signal slew. Simulations were performed assuming a clock frequency of 1 GHz.

As Table I shows, for gate sizings identical to those of the DCSTL and SCSDL gates, the TLL element exhibits no functional failures. The static noise margin of the TLL element is also significantly higher than that of the other differential threshold logic implementations for every function, by up to an order of magnitude. For static logic gates, 10-20% of the supply voltage is generally regarded as a safe and viable noise margin; the TLL element satisfies this constraint for all of the functions simulated, while the DCSTL and SCSDL gates typically exhibit a noise margin that is less than 10% of the supply voltage.

V. THRESHOLD LOGIC-BASED DESIGN

The ability to efficiently compute threshold logic functions, combined with robustness in the presence of static noise, makes TLL a viable candidate for automated design flows. To demonstrate the advantages of TLL gates over traditional components in automated design, a 32-bit 2's complement integer multiplier implemented in two stages was automatically hybridized from RTL using a library of TLL gates. Custom double-height TLL standard cells were designed, extracted, and characterized for the hybridized designs using a commercial 65 nm design kit. Using Cadence RC Encounter and a commercial 65 nm standard cell library, the designs were synthesized, placed, and routed across a range of maximum frequency values under worst case delay conditions. Blocks were designed with a square aspect ratio and a target density of 70-75%. All external input and clock pins are positioned opposite from the external output pins. Five metal layers were provided for routing.
For comparison, conventional designs using only the CMOS standard cells were synthesized, placed, and routed for the same frequencies (up to the attainable maximum) using the same methodology. Timing closure of all designs was confirmed using both Cadence RC Encounter and Synopsys PrimeTime. Power simulations were performed at the maximum attainable frequency using Synopsys Nanosim over 1000 high activity input vectors under worst case delay conditions, assuming a supply voltage of 1.1 V and a temperature of 105 °C.

A summary of all of the experimental results is given in Figure 4. As the results indicate, the hybrid designs are capable of achieving higher speeds of operation and lower power consumption than those utilizing the conventional approach. At higher frequencies, the hybrid multiplier is up to 1.5× smaller than an equivalent conventional multiplier, consuming up to 1.5× less total power and 2.3× less leakage power. The hybrid multiplier is also capable of outperforming the conventional multiplier by 20% while still requiring less area, power, and leakage.

Fig. 4. Frequency vs. area, total power, and leakage between conventional and hybrid multiplier designs.

VI. SUMMARY AND CONCLUSIONS

In this paper, we proposed the threshold logic latch (TLL), a new type of differential threshold logic that can be used to augment CMOS designs for higher performance and lower power consumption. The innovative structure of the logic element reduces its sensitivity to signal noise and process variations, increasing static noise margin by up to an order of magnitude over that of existing implementations of differential threshold logic. TLL's ease of use and broad impact make it extremely well suited to aggressively scaled modern processes.

ACKNOWLEDGMENT

This work was supported in part by the National Science Foundation under Award CCF-0702831, the Science Foundation Arizona SFAZ-SBC, and the Stardust Foundation. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the supporting agencies.

REFERENCES

[1] V. Beiu, J. M. Quintana, and M. J. Avedillo, "VLSI implementations of threshold logic - a comprehensive survey," IEEE Trans. on Neural Networks, vol. 14, no. 5, pp. 1217-1243, 2003.
[2] H. Chow and I. Wey, "A 3.3V 1GHz high speed pipelined Booth multiplier," ISCAS 2002, vol. 1, pp. 457-460, 2002.
[3] T. Gowda and S. Vrudhula, "Decomposition based approach for synthesis of multi-level threshold logic circuits," Proc. of ASP-DAC 2008, pp. 125-130, 2008.
[4] H. Huang and T. Wang, "CMOS capacitor coupling logic (C3L) circuits," Proc. of AP-ASIC 2000, pp. 33-36, 2000.
[5] C. Jia, L. Milor, and H. Huang, "Capacitor coupling threshold logic," MWSCAS 2002, vol. 1, pp. I483-I486, 2002.
[6] J. A. H. López, J. G. Tejero, J. F. Ramos, and A. G. Bohórquez, "New types of digital comparators," ISCAS 1995, vol. 1, pp. 29-32, Apr. 1995.
[7] S. Muroga, Threshold Logic and Its Applications. New York: Wiley-Interscience, 1971.
[8] S. D. Naffziger, "Feedback-induced pseudo-NMOS static (FIPNS) logic gate and method," U.S. Patent 6 466 057, 2002.
[9] M. Padure, S. Cotofana, C. Dan, S. Vassiliadis, and M. Bodea, "A new latch-based threshold logic family," Proc. of CAS 2001, vol. 2, pp. 531-534, 2001.
[10] M. Padure, S. Cotofana, and S. Vassiliadis, "Design and experimental results of a CMOS flip-flop featuring embedded threshold logic," Proc. of ISCAS 2003, vol. 5, pp. V253-V256, 2003.
[11] M. K. Goparaju and S. Tragoudas, "Parametric fault model for RTD based threshold gates," Proc. of WSEAS 2006, pp. 1-6, 2006.
[12] R. Strandberg and J. Yuan, "Single input current-sensing differential logic (SCSDL)," Proc. of ISCAS 2000, vol. 1, pp. 764-767, 2000.
[13] J. L. Subirats, J. M. Jerez, and L. Franco, "A new decomposition algorithm for threshold synthesis and generalization of boolean functions," IEEE Trans. on Circuits and Systems, vol. 55, no. 10, pp. 3188-3196, 2008.
