Delay Model for Static CMOS Complex Gates

Felipe S. Marranghello, André I. Reis, Renato P. Ribas
PGMICRO, Federal University of Rio Grande do Sul – UFRGS, Porto Alegre, Brazil
{fsmarranghello,andreis,rpribas}@inf.ufrgs.br

Abstract – This paper presents a novel approach for delay modeling of CMOS complex gates containing series and parallel transistor arrangements. The model uses a charge-based approach instead of evaluating voltages as a function of time. The impact of input transition time, input-to-output coupling capacitance and short-channel effects, such as drain-induced barrier lowering (DIBL) and velocity saturation, is taken into account. The only empirical parameters are those required to calibrate the transistor model. Analytical results are in good agreement with HSPICE simulation data, based on the BSIM4 transistor model, over a wide range of input slopes and output loads. Additionally, model accuracy is improved when compared to previous related work.

Keywords – Digital circuits, complex CMOS gates, delay model, UDSM CMOS, transistor stack.

978-1-4799-1132-5/13/$31.00 ©2013 IEEE

I. INTRODUCTION

Standard cell methodology is often used in VLSI design. Since design quality is tied to the cells available in the targeted library, virtual libraries appear as a promising alternative [1]. However, characterizing the timing of cells through electrical simulation rapidly becomes impractical due to the huge amount of data. A possible solution is the use of analytical delay models, which trade accuracy for runtime. These models have been investigated over the years [2]-[21] and applied to the optimization of circuits such as I/O buffers.

Deriving an accurate delay model for static CMOS gates is a hard task because many effects impact the circuit behavior. An accurate model must consider the influence of input transition time, parasitic capacitances, short-circuit current, body effect, channel length modulation and drain-induced barrier lowering (DIBL). Therefore, a purely mathematical approach becomes unfeasible for delay models aimed at general CMOS gates. To deal with this complexity, some simplifications to the original structure are usually made. Common simplifications are the collapsing of series and parallel transistors, and the replacement of transistors by resistors and current sources.

Methods that model the gate as an RC network present a straightforward formulation, at the cost of either neglecting important effects [8] or using fitting parameters [9]. In such approaches, accuracy is quite limited because resistors fail to reproduce the behavior of transistors in most cases. More sophisticated methods aim to keep the electrical behavior of the equivalent structure similar to that of the original one. Some methods based on the reduction of CMOS complex gates to an equivalent inverter [10-13] can benefit from inverter models [2-7]. Alternatively, the equivalent structure is more complex than an inverter but simple enough to be efficiently applied [14-21].

Due to the difficulty of modeling the most significant effects, existing approaches either neglect important effects, which limits accuracy, or use fitting parameters. Models that consider multiple input switching tend to be too complex [11, 12, 15]. Moreover, static timing analysis (STA) is usually performed considering a single input transition per gate in the circuit. In this work, a novel analytical delay model for static CMOS complex gates is proposed for standard cell library characterization. The major contribution is a model that considers the relevant physical effects without fitting parameters for the delay model calibration.

The rest of the paper is organized as follows. Section II presents a brief overview of related work. Section III describes the model deduction. Section IV shows the results, and Section V outlines the conclusions.

II. RELATED WORK AND MOTIVATION

As already mentioned, the main challenge in modeling complex gates is to consider the influence of different physical effects. Since parallel arrangements are easily reduced by summing the widths of the associated devices, modeling series arrangements represents the main challenge. Roughly, existing methods can be divided into three categories: RC-based models, reduction to an equivalent inverter, and current estimation in stacked devices.

In the RC model category, the transistor network is modeled as an equivalent RC circuit. The advantage is that the related formulation is straightforward. However, RC-based models fail to reproduce the impact of important effects and the nonlinear behavior of CMOS gates. The well-known Elmore delay is widely adopted due to its simplicity [8]. In [9], an improved RC-based model is proposed through the use of fitting parameters, whereas in [7], the inverter RC model is improved through a good estimation of the overshoot time. However, the extension to CMOS gates is not addressed.

Methods based on equivalent inverter reduction, on the other hand, aim to convert each logic plane of the static CMOS gate into an equivalent transistor. For that, the output voltage waveform of the equivalent inverter should be as close as possible to the output voltage behavior of the original structure. In [10], a simple conductance association is applied, in which the conductance of each device is considered to be proportional to its W/L ratio. This approach presents acceptable results only when all transistors operate in linear mode, or when the transistors are driven by the same input signal and the input transition is fast enough [22]. In several approaches, the estimation of the equivalent transistor is improved using several fitting parameters [11, 12, 15]. Furthermore, an iterative reduction technique for single input switching is presented in [13]. Nevertheless, it has only been demonstrated for the energy-delay product of dynamic gates, without presenting results of gate delay evaluation.

Models based on current estimation in stacked devices aim to predict the gate delay by determining an average discharge current. Transistor collapsing and the replacement of transistors by resistors or current sources are the most common strategies. In [14], the current reduction factor is calculated independently of the switching device and the input transition time, whereas in [16], the current depends on the input transition. Moreover, in [16], when the switching transistor is not placed at the top of the pull-down NMOS network and the input transition is slow, it is assumed that the top transistor operates in the linear region, which is not actually true. In [17], the current through the transistor stack is considered to be approximately the same as the current through the inverter. This assumption has not been verified.

On the other hand, the logical effort (LE) method provides very simple expressions for the delay, even though it considers neither the input transition time nor the I/O coupling capacitance [18]. Some improvements to the original LE model have been proposed [19]-[21]. It is worth mentioning that the proposed modifications usually have a great impact on the model simplicity, which is one of the main advantages of LE. In [21], a slope correction factor is proposed to capture the impact of different input and output slopes. However, one transient simulation is required for each gate in order to extract the slope correction parameter, and the model is only valid for fast inputs. In [19], the current reduction factor and the influence of internal capacitances are assumed to be independent of the input transition time. In [20], in turn, the reduction factor is deduced for both fast and slow input ramps, and empirical parameters are added to fit the model results.

Considering the delay models presented in previous works, it seems clear that there is room for a model that considers all relevant effects without fitting parameters. In this work, such a model is proposed, being suitable for static CMOS complex gates, as illustrated in Fig. 1.

Figure 1: Static CMOS complex gate.

III. PROPOSED GATE DELAY MODEL

The gate delay (Td) is defined as the time interval between the moments when the input and the output reach half of the supply voltage (Vdd):

$$T_d = T_{out50} - \frac{T_{in}}{2} \tag{1}$$

where Tout50 is the time when the output signal reaches Vdd/2, and Tin is the input transition time. Since Tin is known, the model accuracy relies on the prediction of Tout50. The basic principle of the proposed model is that the time interval (Δt) needed to cause a voltage drop ΔV in a capacitance C under an average current (Iavg) is expressed as follows [16][20]:

$$\Delta t = \frac{C\,\Delta V}{I_{avg}} \tag{2}$$
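As a quick numerical illustration of equations (1) and (2), the following minimal Python sketch evaluates both expressions. The function names and the component values are hypothetical and chosen only for illustration; they are not taken from the paper, and the example simply assumes that the average current is already known.

```python
def discharge_time(c, delta_v, i_avg):
    """Equation (2): time for an average current i_avg to change the
    voltage across a capacitance c by delta_v."""
    if i_avg <= 0:
        raise ValueError("average current must be positive")
    return c * delta_v / i_avg


def gate_delay(t_out50, t_in):
    """Equation (1): delay between the 50% points of input and output."""
    return t_out50 - t_in / 2.0


# Illustrative values only (assumed, not from the paper).
c_load = 2e-15   # F
vdd = 1.0        # V
i_avg = 50e-6    # A
t_in = 20e-12    # s

t_out50 = discharge_time(c_load, vdd / 2.0, i_avg)
print(t_out50, gate_delay(t_out50, t_in))
```

The rest of the model can be read as a way of estimating the charge and the average current that enter equation (2).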

The alpha-power transistor model [3] is applied, although other device models can be easily adapted [23]. The NMOS drain current (Ids) is modeled as follows:

$$I_{ds} = \begin{cases} 0 & \text{cutoff} \\ K_{ln} W_n (V_{gs} - V_{tn})^{\alpha/2}\, V_{ds} & \text{linear} \\ K_{sn} W_n (V_{gs} - V_{tn})^{\alpha} (1 + \lambda V_{ds}) & \text{saturation} \end{cases} \tag{3}$$

$$V_{dsat} = P_{vn} (V_{gs} - V_{tn})^{\alpha/2} \tag{4}$$
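Equations (3) and (4) can be sketched as a small piecewise function. The region boundaries used below (cutoff when Vgs does not exceed Vtn, linear when Vds is below Vdsat) are the usual convention for the alpha-power model and are an assumption here, since the text only names the regions; the function names are hypothetical.

```python
def vdsat_nmos(vgs, vtn, pvn, alpha):
    """Equation (4): saturation voltage of the alpha-power model."""
    return pvn * (vgs - vtn) ** (alpha / 2.0)


def ids_nmos(vgs, vds, vtn, wn, kln, ksn, pvn, alpha, lam):
    """Equation (3): NMOS drain current in cutoff, linear or saturation."""
    vov = vgs - vtn
    if vov <= 0.0:
        return 0.0  # cutoff (assumed boundary: Vgs <= Vtn)
    if vds < vdsat_nmos(vgs, vtn, pvn, alpha):
        # linear region (assumed boundary: Vds < Vdsat)
        return kln * wn * vov ** (alpha / 2.0) * vds
    # saturation region, including channel length modulation
    return ksn * wn * vov ** alpha * (1.0 + lam * vds)
```

The threshold voltage passed as `vtn` would itself be evaluated from the bias point when DIBL and body effect are included.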

where the parameter α is the velocity saturation index, Wn is the effective transistor width, and Kln, Ksn and Pvn are empirical parameters. Vds is the drain-to-source voltage, Vgs is the gate-to-source voltage, Vdsat is the saturation voltage, and λ is the channel length modulation parameter. Similar parameters are defined for the PMOS transistor. The transistor threshold voltage (Vtn) also depends on Vds due to the DIBL effect, and on the source-to-bulk voltage (Vsb) due to the body effect. These effects should not be neglected in delay modeling. Considering DIBL and body effect, Vtn is given as:

$$V_{tn} = V_{t0} + \delta_n V_{sb} - \eta_n V_{ds} \tag{5}$$

where δ is the body effect coefficient, and η is the DIBL coefficient. Notice that a linear approximation for the body effect is used. It must be noted that the transistor model used is only valid for strong inversion. As a consequence, the proposed delay model carries the same restriction.

Hereafter, a rising input (i.e., falling output) is considered. The case of a falling input (i.e., rising output) is analyzed in a similar manner by exchanging NMOS and PMOS parameters.

A. Short Circuit Current

During part of the output transition, both the pull-up PMOS and pull-down NMOS planes of a CMOS gate are partially ON, and a short-circuit (SC) current may flow from the power supply to ground. This current is especially important for the CMOS inverter. Any other gate has a transistor stack in at least one of the planes, which increases the internal capacitances and thereby reduces the amount of SC current. Additionally, the SC current can be neglected for sufficiently fast inputs with no significant accuracy loss. For general input transitions, however, specific modeling of the SC current is essential for accurate timing prediction.

In this work, this current is seen as an extra charge that must be discharged, as discussed in [4] and in [5]. Another approach is to consider a reduction of the discharge current [9, 16, 20]. Both strategies are equivalent. Accurate prediction of the total SC current is known to be a hard task [24-26]. However, for gate delay models, the need for a very accurate estimation of this current is limited, since even moderately accurate methods can lead to a good precision gain. For instance, if the model error due to neglecting the SC current is 30%, and its estimation also presents an error of 30%, then the final error is reduced to 9% (0.3 × 0.3 = 0.09). For this reason, a first-order approximation of the short-circuit charge is assumed to be enough for our goal.
In order to evaluate the SC charge, both the duration of the short-circuit period (Tsc) and the average SC current (Isc) should be estimated. Tsc is given by the time needed for the input to switch between the PMOS and NMOS threshold voltages, Vtp and Vtn, respectively:

$$T_{sc} = \frac{(V_{dd} - |V_{tp}| - V_{tn})\, T_{in}}{V_{dd}} \tag{6}$$

In (6), the threshold voltages are measured with Vds=Vdd/2. Isc is calculated considering the maximum possible |Vgs| for the PMOS transistor, and Vds=Vdd/2, as follows:

$$I_{sc} = K_{sp} W_p (V_{dd} - |V_{tp}| - V_{tn})^{\alpha_p} \left(1 + \lambda_p \frac{V_{dd}}{2}\right) \tag{7}$$

To account for the fact that the SC current is very small for fast inputs, the final expression for the total short-circuit charge (Qsctot) is defined as:

$$Q_{sctot} = \frac{V_{dd} - |V_{tp}| - V_{tn}}{V_{dd}} \left(T_{in} - T_{inref}\right) I_{sc} \tag{8}$$

where the term Tinref is a reference input transition time that defines the boundary between fast and slow inputs. An expression for Tinref is derived later. The SC current contribution to delay (Qsc) is half of the total value:

$$Q_{sc} = 0.5\, Q_{sctot} \tag{9}$$
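The short-circuit charge of equations (7) to (9) can be sketched in a few lines of Python. The function name and argument names are hypothetical. Returning zero for inputs faster than Tinref follows the statement that the SC current is neglected for fast inputs; returning zero when the two thresholds leave no overlap window is an extra assumption made here to keep the expression well defined.

```python
def short_circuit_charge(t_in, t_in_ref, vdd, vtn, vtp_abs,
                         ksp, wp, alpha_p, lam_p):
    """Short-circuit charge contribution to delay, equations (7)-(9).

    vtp_abs is |Vtp|; both thresholds are taken at Vds = Vdd/2.
    """
    v_window = vdd - vtp_abs - vtn
    if v_window <= 0.0 or t_in <= t_in_ref:
        # Assumed: no overlap window, or fast input, gives no SC charge.
        return 0.0
    i_sc = ksp * wp * v_window ** alpha_p * (1.0 + lam_p * vdd / 2.0)  # (7)
    q_sc_tot = (v_window / vdd) * (t_in - t_in_ref) * i_sc             # (8)
    return 0.5 * q_sc_tot                                              # (9)
```

With this form the charge grows linearly with the amount by which Tin exceeds Tinref, which is what equation (8) expresses.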

B. Parasitic Network

In this work, the term 'parasitic network' refers to the network that stops conducting during the input transition, i.e., the pull-up PMOS plane for a rising input. The parasitic network adds extra charge due to internal capacitances and due to the SC current. Initially, all transistors in the network are either cutoff or in the linear region. Stacked devices can be reduced to an equivalent transistor (Weq) using [15]:

$$\frac{1}{W_{eq}} = \sum \frac{1}{W} \tag{10}$$

whereas parallel devices are merged using:

$$W_{eq} = \sum W \tag{11}$$

In both equation (10) and equation (11), the effective widths are considered. Equation (10) is accurate under these conditions because, during a considerable part of the output transition, all devices operate in the linear mode. The accuracy of equation (11) is known to be sufficient. Naturally, off-devices are not taken into account.

The extra charge due to internal capacitances considers only the nodes that are between the output node and the drain node of the switching transistor. These nodes are initially charged to Vdd and discharged to a minimum value of Vdrop (to be defined in the next section). If the output transition to Vdd/2 is considered, the source voltage variation of the transistor connected to the output is nearly zero for very fast inputs, and close to (Vdd-|Vdrop|)/2 for very slow inputs. The actual value depends on the input transition time. The final voltage of the other nodes is estimated considering a voltage divider. Each internal capacitance (Ci) is added to the output node as an equivalent capacitance (Ceqi) given by:

$$C_{eq_i} = \frac{C_i\,(V_{dd} - V_i)}{V_{dd}} \tag{12}$$

where the term Vi represents the node voltage when the output crosses Vdd/2.

C. Initial Value of Internal Nodes

When the output signal is high, some nodes of the pull-down NMOS network may be partially charged. Previous models state that this voltage is Vdd-Vtn_top, where Vtn_top is the threshold voltage of the device connected at the output node [13, 14, 16, 19, 20]. However, this statement is inaccurate and arises from the fact that Vdd-Vtn is the maximum voltage to which an NMOS transistor can charge a capacitance, as observed in pass-transistor logic (PTL) design. In the analysis of switched-off series transistors, the situation is quite different, since the current that flows in the structure must be the same in all devices, except for small variations due to parasitic currents like gate leakage. For this reason, the internal voltages rise until all currents are equal. The initial voltage value depends on the cell topology. In order to illustrate such behavior, the circuit presented in Fig. 2 is used. Table 1 shows the initial source voltage value of the top transistor for different numbers of bottom devices. In the same way, the value for the initial voltage drop is considered a technology constant. The supply voltage for the technology considered is 1 V.

Figure 2: Test circuit to determine Vdrop.

Table 1: Comparison of initial top transistor source voltage value and top transistor threshold voltage.

N    Vdrop (V)    Vtn_top (V)
1    0.252        0.571
2    0.268        0.566
3    0.278        0.563
4    0.285        0.561

In order to obtain a value for Vdrop analytically, the subthreshold current of the top transistor is equated to the subthreshold current of the bottom transistor. Assuming that both transistors are saturated, Vdrop is found:

$$V_{drop} = V_{dd} - \frac{n U_t \ln\left(\dfrac{W_{top}}{W_{bot}}\right) + V_{dd}(1 - \eta_n)}{1 + \delta_n + 2\eta_n} \tag{13}$$

D. Top Transistor

In Fig. 1, the NMOS transistors controlled by the variables a and c are the top ones on the pull-down network. When the top transistor switches, the other transistors act as a resistive path that reduces the discharging current. All transistors below the top ones can be accurately merged using equation (10), since they operate in the linear region during the entire output transition. The switching transistor enters saturation once the input reaches Vtn, and stays saturated during the whole output swing to Vdd/2. In order to consider the influence of Vds on the saturation current, an average value is used. The minimum output voltage value (Vmin) is Vdd/2, and the maximum (Vmax) is taken to be the maximum output voltage due to the overshoot. Vmax is calculated with a capacitive voltage divider, as follows:

$$V_{max} = V_{dd}\left(1 + \frac{C_m}{C_m + C_{out}}\right) \tag{14}$$

where Cm is the coupling capacitance of the switching transistors connected at the output node and Cout is the total capacitance connected to the output node. For instance, if input a in Fig. 1 switches, both the NMOS and PMOS transistors contribute to Cm at the output node, whereas if input c rises, only the NMOS transistor contributes to the overshoot.

In order to estimate the discharging current, the source voltage (Vs) of the top transistor has to be calculated for Vout=Vmax and for Vout=Vmin. For a fixed top transistor drain voltage (Vd), Vs can be found by equating the saturation current of the top transistor (Isattop) with the linear current of the bottom devices (Ilineq), merged using equation (10) and equation (11):

$$I_{sat_{top}} = I_{lin_{eq}} \tag{15}$$

In order to obtain a simple expression, the effect of the increased source voltage on channel length modulation (CLM) is neglected (only the reduction of the drain voltage is considered). The approaches in [12, 15] neglect CLM completely. Substituting the threshold voltage of equation (5) with Vgs = Vdd - Vs and Vsb = Vs, equation (15) becomes:

$$K_{sn} W_{top} \left(V_{dd} - V_{t0} + \eta V_{ds} - (1 + \delta) V_s\right)(1 + \lambda V_d) = K_{ln} W_{bot} \sqrt{V_{dd} - V_{t0}}\; V_s \tag{16}$$

where Wbot is the equivalent width of the merged bottom devices and Wtop is the width of the top transistor. Vs is found by solving equation (16). Once the Vs values for both Vmax (Vsmax) and Vdd/2 (Vsmin) are calculated, the stack current for each case is estimated and the average value (Idstop) is calculated. Moreover, the average threshold voltage is also estimated. If Vs is zero, then the only transistor in the stack is the switching transistor. Capacitances connected to the source node of the top transistor diminish the current reduction factor of the transistor stack. Nevertheless, this effect is quite small and is neglected in this work.

To create a clear separation between the fast and slow input domains, a critical input transition time (Tinref) is defined such that Tout50=Tinref. For any input faster than Tinref, the input reaches Vdd before the output reaches Vdd/2. If the input is slower than Tinref, the input reaches Vdd after the output reaches Vdd/2. This is the same Tinref used in equation (8). It is found by equating the charge that must be drained from the output node with the charge through the topmost device during the input transition time, as follows:

$$T_{inref} = (\alpha + 1)\, \frac{(C_m + C_{out})\dfrac{V_{dd}}{2} + C_m V_{dd}}{I_{ds_{top}} \left(1 - \dfrac{V_{tn}}{V_{dd}}\right)} \tag{17}$$

Since Tinref separates the fast and slow domains, the SC current can be neglected in this estimation with small accuracy loss. In the fast domain, Tout50 becomes:

$$T_{out50} = \frac{T_{in}\left(\alpha + \dfrac{V_{tn}}{V_{dd}}\right)}{\alpha + 1} + \frac{Q_{tot}}{I_{ds_{top}}} \tag{18}$$

where Qtot is the total charge to be discharged. The SC current is neglected for fast inputs. In the slow input domain a different approach is used. It is considered that the maximum current capability of NMOS device occurs when the output reaches half the supply voltage, since the device is saturated, and it has the maximum Vgs voltage at this instant.

The total charge to be removed from the output node (Qtot) equals the charge drained by the top transistor, as follows:

$$Q_{tot} = \frac{I_{ds_{top}}}{(V_{dd} - V_{tn})^{\alpha}} \int_{T_{vtn}}^{T_{out50}} \left(\frac{V_{dd}\, t}{T_{in}} - V_{tn}\right)^{\alpha} dt \tag{19}$$

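Solving (19) yields the value for Tout50, including the SC contribution:

$$T_{out50} = \left(\frac{(\alpha + 1)\, Q_{tot}\, T_{in}^{\alpha}\, (V_{dd} - V_{tn})^{\alpha}}{I_{ds_{top}}\, V_{dd}^{\alpha}}\right)^{\frac{1}{\alpha + 1}} + T_{in}\frac{V_{tn}}{V_{dd}} \tag{20}$$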
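The two domains for a switching top transistor can be sketched as follows, using equations (17), (18) and (20). The function names are hypothetical, and the average current Idstop and the average threshold voltage are assumed to be available from the preceding steps. A useful sanity check, which follows algebraically from the three equations, is that both expressions return Tout50 = Tinref when Tin = Tinref and the same Qtot is used.

```python
def output_charge(cm, cout, vdd):
    """Numerator charge of equation (17): output swing to Vdd/2 plus
    the coupling charge injected through Cm."""
    return (cm + cout) * vdd / 2.0 + cm * vdd


def t_in_ref_top(q_tot, ids_top, vdd, vtn, alpha):
    """Equation (17): boundary between fast and slow input domains."""
    return (alpha + 1.0) * q_tot / (ids_top * (1.0 - vtn / vdd))


def t_out50_fast(t_in, q_tot, ids_top, vdd, vtn, alpha):
    """Equation (18): fast-input domain (Tin <= Tinref)."""
    return t_in * (alpha + vtn / vdd) / (alpha + 1.0) + q_tot / ids_top


def t_out50_slow(t_in, q_tot, ids_top, vdd, vtn, alpha):
    """Equation (20): slow-input domain (Tin > Tinref)."""
    base = ((alpha + 1.0) * q_tot * t_in ** alpha * (vdd - vtn) ** alpha
            / (ids_top * vdd ** alpha))
    return base ** (1.0 / (alpha + 1.0)) + t_in * vtn / vdd


def t_out50_top(t_in, q_tot, ids_top, vdd, vtn, alpha, t_ref):
    """Select the expression according to the input domain."""
    if t_in <= t_ref:
        return t_out50_fast(t_in, q_tot, ids_top, vdd, vtn, alpha)
    return t_out50_slow(t_in, q_tot, ids_top, vdd, vtn, alpha)
```

In the slow domain, `q_tot` would also carry the short-circuit charge and the parasitic-network charge, which is why it is passed in as a parameter rather than recomputed inside the functions.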

E. Bottom Transistor

In Fig. 1, the NMOS transistors controlled by variables b, e and f are the bottom ones in the pull-down network. The top transistor is saturated and the switching transistor enters saturation as soon as the input voltage reaches the threshold voltage. During the output voltage excursion to Vdd/2, the topmost device stays saturated while the switching device may enter the linear region. For a sufficiently fast (slow) input, the switching device is in the linear (saturation) region. All other devices operate in the linear region during the entire period. For fast inputs, the final source voltage value of the top transistor is approximately Vsmin (as defined in the previous section). The values for the other nodes can be estimated through a voltage division. The total charge due to the internal capacitances (Qin) is given by:

$$Q_{in} = \sum_i C_i \left(V_{dd} - V_{drop} - V_{end_i}\right) \tag{21}$$

where Ci represents the capacitance associated with each node that must be discharged, Vdd-Vdrop is the initial voltage value, and Vendi is the final voltage value of the node. Moreover, one can consider that the internal capacitances are discharged prior to the output node [20]. Hence, it can be considered that the output node is discharged according to the current capability of the top transistor, and that the internal nodes are discharged according to the current capability of the switching transistor. The reference input transition time is:

$$T_{inref} = \frac{\alpha + 1}{1 - \dfrac{V_{tn_{sw}}}{V_{dd}}} \left(\frac{Q_{out}}{I_{ds_{top}}} + \frac{Q_{in}}{I_{ds_{sw}}}\right) \tag{22}$$

where Vtnsw is the average threshold voltage of the switching transistor, Qout is the total charge to be removed from the output node, Qin is the total charge to be discharged from internal capacitances, Idstop is the average current for the top transistor, and Idssw is the average current for the switching transistor. The value for Idssw is obtained in a similar way to Idstop, considering the switching transistor as a top transistor. It should be noted that, when estimating Idssw, Vdd-Vdrop is used in (14) instead of Vdd to calculate Vmax. For fast inputs, Tout50 is given by:

$$T_{out50} = \left(\frac{Q_{out}}{I_{ds_{top}}} + \frac{Q_{in}}{I_{ds_{sw}}}\right) + \frac{T_{in}}{\alpha + 1}\left(\alpha + \frac{V_{tn_{sw}}}{V_{dd}}\right) \tag{23}$$
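For a switching bottom transistor with a fast input, equations (21) to (23) can be sketched as below. The function names are hypothetical, and the node capacitances, final node voltages and average currents are assumed to be available from the earlier steps. As in the top-transistor case, equation (23) evaluated at Tin = Tinref returns Tinref, which follows directly from equation (22).

```python
def internal_charge(caps, v_end, vdd, vdrop):
    """Equation (21): charge removed from internal nodes that start at
    Vdd - Vdrop and end at the per-node voltages in v_end."""
    if len(caps) != len(v_end):
        raise ValueError("caps and v_end must have the same length")
    return sum(c * (vdd - vdrop - v) for c, v in zip(caps, v_end))


def discharge_term(q_out, q_in, ids_top, ids_sw):
    """Output node drained by the top device, internal nodes by the
    switching device (common term of equations (22) and (23))."""
    return q_out / ids_top + q_in / ids_sw


def t_in_ref_bottom(q_out, q_in, ids_top, ids_sw, vdd, vtn_sw, alpha):
    """Equation (22): reference input transition time."""
    return ((alpha + 1.0) / (1.0 - vtn_sw / vdd)
            * discharge_term(q_out, q_in, ids_top, ids_sw))


def t_out50_bottom_fast(t_in, q_out, q_in, ids_top, ids_sw,
                        vdd, vtn_sw, alpha):
    """Equation (23): Tout50 for a fast input on a bottom transistor."""
    return (discharge_term(q_out, q_in, ids_top, ids_sw)
            + t_in / (alpha + 1.0) * (alpha + vtn_sw / vdd))
```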

When the input transition is slow, the voltage drop of internal capacitances is smaller, and their final value is higher than Vsmin. Some works neglect the dependence of the internal voltage variation on the input transition time [14, 19]. In [20], this problem is solved by considering that the top transistor operates in the linear region rather than in saturation. In [13], an empirical formula that neglects the voltage drop of internal capacitances for sufficiently slow inputs is presented. However, there must always be a voltage drop on internal nodes. Otherwise, the topmost transistor would not be able to discharge the output. For sufficiently slow inputs, (Vdd-Vdrop)/2 is a good approximation for the source voltage of the top transistor. In this work, an empirical formula is used, which presents better accuracy than the others. The final source voltage of the top transistor is given by:

$$V_{end_{slow}} = \frac{0.5\,(V_{dd} - V_{drop})}{1 + b\, e^{-0.5\, t'}} \tag{24}$$

where b and t' are, respectively, given by:

$$b = \frac{0.5\,(V_{dd} - V_{drop})}{V_{half}} - 1 \tag{25}$$

and:

$$t' = \frac{T_{in} - T_{inref}}{T_{inref}} \tag{26}$$
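Equations (24) to (26) describe a logistic-style transition, which the following minimal Python sketch makes explicit. The function name is hypothetical, and `v_half` is simply passed in as a parameter because the text does not give its definition at this point. Two properties follow directly from the formulas: at Tin = Tinref the result equals Vhalf, and for very slow inputs it approaches (Vdd-Vdrop)/2.

```python
import math


def v_end_slow(t_in, t_in_ref, vdd, vdrop, v_half):
    """Final source voltage of the top transistor, equations (24)-(26)."""
    v_final = 0.5 * (vdd - vdrop)          # asymptote for very slow inputs
    b = v_final / v_half - 1.0             # equation (25)
    t_norm = (t_in - t_in_ref) / t_in_ref  # equation (26)
    return v_final / (1.0 + b * math.exp(-0.5 * t_norm))  # equation (24)


# Illustrative values only (assumed, not from the paper).
for t_in in (50e-12, 100e-12, 500e-12):
    print(t_in, v_end_slow(t_in, 50e-12, 1.0, 0.25, 0.2))
```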

The voltages of other nodes are estimated considering a voltage divider, even though the bottom transistor is expected to be saturated. Fig. 3 shows measured values for the source voltage of the top transistor in 2- to 4-input NAND gates, when the bottom transistor switches compared to the calculated value using equation (24) for different input transitions. Good agreement with HSPICE simulation has been verified.

Figure 3: Final values for internal voltages.

For a sufficiently slow input, it can be considered that the bottom transistor stays saturated during the whole output transition to Vdd/2, and that all capacitances discharge simultaneously. The estimation of Tout50 is performed in the same way as for the top switch, using (20), setting Vsmax to zero and adding the extra charge due to the internal nodes to Qtot.

F. Intermediate Transistor

In Fig. 1, the NMOS transistor controlled by variable d is an intermediate transistor in the network. In this situation, devices operate in regions similar to those of the bottom location. The equations for this situation are obtained considering all effects of top and bottom transistor switching.

IV. MODEL VALIDATION

This section presents the results of the proposed model, considering the electrical parameters from PTM32 [27]. The minimum drawn width used is twice the drawn length. Model evaluation must be carefully done because gate delay diminishes (and may become zero) when the input is slow, and any error becomes a large relative error although the absolute error is insignificant. In this work, accuracy is evaluated by considering the precision on evaluating Tout50. Notice that Tout50 always increases with the input transition.

Table 2 summarizes the errors obtained for different gates. The input transition time varies from 1 ps to 500 ps, with a time step of 1 ps. The values for output load are 1/4, 1, 4, 16, 64 and 256 times the input capacitance of the NMOS transistor. The first column (Gate) depicts the gate under test, the second the switching input (Input), the third the average relative error (Avg), and the last one the worst case error (WC). Both the average and the worst case errors consider all values of input transition times and output loads. The CMOS inverter, NAND 2-4, NOR 2-4, and the gates depicted in Fig. 1 and Fig. 4 are considered in the analysis. Notice that there is more than one arc related to input a of the circuit in Fig. 4, and all of them are considered. NAND (NOR) gates are especially interesting to evaluate the stack modeling in the conducting (parasitic) network. For NAND and NOR gates, input a is the stacked transistor connected at the output node, input b is connected to the source of a, and so forth.

The worst case error is circa 11%. Previous related works report worst case errors of more than 15% [13][19][20]. Moreover, the method proposed in [20] requires a fitting parameter to consider the SC current. In [19], the error increases if the coupling capacitance or the SC current represents a significant fraction of the total charge. The method proposed in [13] is only suitable for dynamic gates. Thus, the gate illustrated in Fig. 1 cannot be evaluated by this method. Finally, the model proposed in [21] is only valid for fast inputs.

Table 2: Modeling error for several gates.

Gate    Input   Avg(%)  WC(%)     Gate    Input   Avg(%)  WC(%)
NAND4   a       4.22    9.7       NOR4    a       3.4     7.45
NAND4   b       3.41    5.6       NOR4    b       3.3     7.13
NAND4   c       3.67    8.4       NOR4    c       3.1     7.23
NAND4   d       3.69    10.6      NOR4    d       2.9     6.35
NAND3   a       2.07    3.0       NOR3    a       2.0     5.3
NAND3   b       1.79    2.4       NOR3    b       2.3     6.2
NAND3   c       3.16    6.6       NOR3    c       2.8     5.71
NAND2   a       2.12    5.0       NOR2    a       2.5     6.1
NAND2   b       1.76    3.2       NOR2    b       2.7     6.3
FIG1    a       1.40    5.56      FIG1    d       3.45    7.1
FIG1    b       2.82    4.66      FIG1    e       4.1     9.1
FIG1    c       1.33    10.1      FIG1    f       3.72    6.51
INV     a       2.35    7.45      FIG4    b       2.83    8.27
FIG4    d       3.74    6.90      FIG4    c       2.31    6.47

Table 3 details the average relative error (Avg), and Table 4 details the worst case relative error (WC) for different loads for the 4-input NAND gate (NAND4), which is the gate that presents the overall worst case error. Fig. 5 presents the relative error as a function of the input transition time, for different switching situations. Furthermore, Fig. 6 illustrates the need to use a small time step when validating the model. Many works use sparse input transition values, and the worst case error may not reflect the real situation [19, 20]. For instance, the work presented in [20] considers only 10 input transition times, although it covers a wide range of input transition time. Using a small time step gives confidence that the worst case error has been correctly evaluated.

V. CONCLUSIONS

This paper presented an accurate analytical model for the evaluation of the delay of static CMOS gates. Moreover, the delay model explicitly considers the most relevant physical effects, without requiring additional fitting parameters. The average error is circa 3%, with the worst case around 11%. The behavior of stacked transistors is also discussed considering different input positions and transition slopes.

Figure 4: Example of gate used to validate the model.

Table 3: Average error (%) for NAND4 gate.

        Normalized output capacitance
Input   1/4     1       4       16      64      256
a       5.31    5.03    4.32    3.56    3.56    3.58
b       4.76    4.53    3.87    2.69    2.22    2.40
c       4.96    4.56    3.46    2.12    2.42    4.50
d       1.52    1.78    2.66    4.32    5.44    6.41

Table 4: Worst case model error (%) for NAND4 gate.

        Normalized output capacitance
Input   1/4     1       4       16      64      256
a       9.70    8.69    6.51    4.43    4.69    4.86
b       5.55    5.39    4.88    4.27    4.25    4.19
c       6.20    5.77    6.13    7.46    8.10    8.44
d       7.66    7.63    7.92    9.18    10      10.3

Figure 5: Relative error of the model as a function of Tin for NAND4.

ACKNOWLEDGMENTS

Research partially supported by Brazilian funding agencies CAPES, CNPq and FAPERGS, under grant 11/2053-9 (Pronem).

REFERENCES

[1] F. S. Marques, L. S. da Rosa Jr., R. P. Ribas, S. S. Sapatnekar, and A. I. Reis, "DAG based library-free technology mapping," in Proc. of ACM Great Lakes Symposium on VLSI, 2007, pp. 293-298.
[2] N. Hedenstierna and K. O. Jeppson, "CMOS circuit speed and buffer optimization," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 6, no. 2, pp. 270-281, Mar. 1987.
[3] T. Sakurai and A. R. Newton, "Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas," IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 584-594, Apr. 1990.
[4] K. O. Jeppson, "Modeling the influence of the transistor gain ratio and the input-to-output coupling capacitance on the CMOS inverter delay," IEEE J. Solid-State Circuits, vol. 29, no. 6, pp. 646-654, June 1994.
[5] J. L. Rossello and J. Segura, "An analytical charge-based compact delay model for submicrometer CMOS inverters," IEEE Trans. on Circuits and Systems I: Regular Papers, vol. 51, no. 7, pp. 1301-1311, July 2004.
[6] F. S. Marranghello, A. I. Reis, and R. P. Ribas, "Design-oriented delay model for CMOS inverter," in Proc. of 25th Symp. Integr. Circuits and Syst. Design (SBCCI), Aug. 2012, pp. 1-6.
[7] Zhangcai H. et al., "Modeling the overshooting effect for CMOS inverter delay analysis in nanometer technologies," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 29, no. 2, pp. 250-260, Feb. 2010.
[8] W. C. Elmore, "The transient analysis of damped linear networks with particular regard to wideband amplifiers," J. Applied Physics, vol. 19, no. 1, 1948.
[9] L. F. Uebel and S. Bampi, "A timing analysis tool for VLSI CMOS synchronous circuits," in Proc. of IEEE Int'l Symp. on Circuits and Systems (ISCAS), May 1996, pp. 516-519.
[10] J. Young-Hyun, J. Ki, and P. Song-Bai, "An accurate and efficient delay time modeling for MOS logic circuits using polynomial approximation," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 8, no. 9, pp. 1027-1032, Sep. 1989.
[11] A. Nabavi-Lishi and N. C. Rumin, "Inverter models of CMOS gates for supply current and delay evaluation," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 13, no. 10, pp. 1271-1279, Oct. 1994.
[12] A. Chatzigeorgiou, S. Nikolaidis, and I. Tsoukalas, "A modeling technique for CMOS gates," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 18, no. 5, pp. 557-575, May 1999.
[13] J. L. Rossello, C. de Benito, and J. Segura, "A compact gate-level energy and delay model of dynamic CMOS gates," IEEE Trans. on Circuits and Systems II: Express Briefs, vol. 52, no. 10, pp. 685-689, Oct. 2005.
[14] T. Sakurai and A. R. Newton, "Delay analysis of series-connected MOSFET circuits," IEEE J. Solid-State Circuits, vol. 26, no. 2, pp. 122-131, Feb. 1991.
[15] S. Nikolaidis and A. Chatzigeorgiou, "Modeling the transistor chain operation in CMOS gates for short channel devices," IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications, vol. 46, no. 10, pp. 1191-1202, Oct. 1999.
[16] J. M. Daga and D. Auvergne, "A comprehensive delay macro modeling for submicrometer CMOS logics," IEEE J. Solid-State Circuits, vol. 34, no. 1, pp. 42-55, Jan. 1999.
[17] Li Ding et al., "Modeling the overshooting effect of multi-input gate in nanometer technologies," in Proc. of IEEE Int'l Midwest Symp. on Circuits and Systems (MWSCAS), 2011, pp. 1-4.
[18] I. Sutherland, B. Sproull, and D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann, 1999.
[19] A. Kabbani, D. Al-Khalili, and A. J. Al-Khalili, "Delay analysis of CMOS gates using modified logical effort model," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 24, no. 6, pp. 937-947, June 2005.
[20] B. Lasbouygues et al., "Logical effort model extension to propagation delay representation," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 25, no. 9, pp. 1677-1684, Sep. 2006.
[21] C. C. Wang and D. Markovic, "Delay estimation and sizing of CMOS logic using logical effort with slope correction," IEEE Trans. on Circuits and Systems II: Express Briefs, vol. 56, no. 8, pp. 634-638, Aug. 2009.
[22] Jeong-Taek Kong and D. Overhauser, "Methods to improve digital MOS macromodel accuracy," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 14, no. 7, pp. 868-881, Jul. 1995.
[23] K.-Y. Toh, P.-K. Ko, and R. G. Meyer, "An engineering model for short-channel MOS devices," IEEE J. Solid-State Circuits, vol. 23, no. 4, pp. 950-958, Aug. 1988, doi: 10.1109/4.346.
[24] H. J. M. Veendrick, "Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits," IEEE J. Solid-State Circuits, vol. 19, no. 4, pp. 468-473, Aug. 1984.
[25] J. L. Rossello and J. Segura, "Charge-based analytical model for the evaluation of power consumption in submicron CMOS buffers," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 21, no. 4, pp. 433-448, Apr. 2002.
[26] C. C. Liu, C. Jian, and L. G. Johnson, "Energy model of CMOS gates using a piecewise linear model," in Proc. of IEEE Int'l Symp. on Circuits and Systems (ISCAS), 2010, pp. 3829-3832.
[27] W. Zhao and Y. Cao, "New generation of Predictive Technology Model for sub-45nm early design exploration," IEEE Trans. on Electron Devices, vol. 53, no. 11, pp. 2816-2823, Nov. 2006. [http://ptm.asu.edu]