Electr Eng (2007) 89: 371–375 DOI 10.1007/s00202-006-0018-2

ORIGINAL PAPER

Moo-Young Kim · Inhwa Jung · Young-Ho Kwak · Chulwoo Kim

Differential pass transistor pulsed latch

Received: 13 October 2005 / Accepted: 5 December 2005 / Published online: 3 August 2006 © Springer-Verlag 2006

Abstract This paper describes the Differential Pass Transistor Pulsed Latch (DPTPL), which improves D-Q delay and reduces power consumption by using NMOS pass transistors and feedback PMOS transistors. The proposed flip-flop exploits the fact that an NMOS transistor has stronger drivability than a transmission gate of the same total transistor width. Positive feedback PMOS transistors speed up the latch and guarantee full swing at the internal nodes. The power consumption of the proposed pulsed latch is also reduced significantly because of its reduced clock load and smaller total transistor width compared to conventional differential flip-flops. DPTPL reduces E×D by 45.5% relative to ep-SFF. The simulations were performed in a 0.13 µm CMOS technology at a 1.2 V supply voltage with a 1.25 GHz clock frequency.

Keywords Flip-flop · CMOS · Pulsed-latch · Low-power

M.-Y. Kim · I. Jung · Y.-H. Kwak · C. Kim (B) Engineering building room 238, Department of electronics and computer engineering, Korea University, 5-1 Anam-Dong, Seongbuk-Gu, Seoul, 136-713, Korea

1 Introduction

As the operating clock frequency of microprocessors rises with advanced process technology and deeper pipelines, the clock period gets shorter and the flip-flop overhead increases. Because the clock period can shrink to 6–8 FO4 inverter delays [1], high-speed flip-flops are essential. The power consumption of flip-flops is another significant problem in many digital systems. In a recent high-frequency microprocessor, the clocking system consumed 70% of the total chip power [2], and 90% of the clocking-system power is consumed by the flip-flops. It is therefore important to reduce flip-flop power consumption. In addition, the number of flip-flops required increases in deep-pipeline architectures, and the area overhead caused by flip-flops has become a serious problem.

Many flip-flops described in the literature reduce delay, power consumption, or both. The Master-Slave Latch (MSL) is a good candidate for low-power applications [3,4]. The hybrid latch flip-flop (HLFF) and the semi-dynamic flip-flop (SDFF) achieve small delay at the cost of power consumption [4–6]. There are also differential-type flip-flops such as the sense amplifier-based flip-flop (SAFF) and the modified sense amplifier-based flip-flop (MSAFF) [4,7]. The ep-SFF offers both low power consumption and small delay [8]. In addition, reduced clock-swing flip-flops (RCSFF) and low-swing clock double edge-triggered flip-flops use a reduced clock swing scheme to lower the power consumption of clock networks [9,10]. The modified SDFF (MSDFF) is one of the fastest flip-flops [11], but it still consumes a large amount of power.

This paper proposes pulsed latches using pass-transistor logic, which exhibit high speed, low power consumption, and a simple structure. To overcome the voltage drop of pass-transistor logic for a ‘High’ input, the proposed pulsed latches use positive feedback PMOS transistors to restore the full VDD level. The rest of this paper describes the characteristics of conventional flip-flops and of the proposed flip-flop, and presents simulation results for D-Q delay, total transistor width, setup time, power, P×D, and E×D in each case.

2 Conventional flip-flops

This section describes several conventional flip-flops. The MSL, which uses a transmission-gate master-slave latch pair, is reported as a low-power flip-flop. Although the Clk-Q delay of MSL is small, its large setup time makes its D-Q delay relatively large. The positive setup time of MSL also makes slack borrowing, which uses time left over by previous partitions, difficult.

Both HLFF and SDFF have been described as fast flip-flops. Their D-Q delay is smaller than that of MSL because they have a negative setup time. However, both have two disadvantages. One is that they consume much more power because they use dynamic circuits. The other is that the Q output can show a voltage bump if a ‘High’ data input is applied while output Q is ‘Low’. MSDFF improves on SDFF by reducing the D-Q delay and avoiding the glitch that consumes unnecessary power. However, MSDFF still consumes more power than MSL, HLFF, and ep-SFF.

The sense amplifier-based flip-flop (SAFF) and its modified version (MSAFF) are built around a sense amplifier that detects a small difference between inputs D and Db. SAFF has asymmetric rise and fall times because of its SR latch, which is a speed bottleneck. SAFF also incurs a large area cost and large power consumption because it uses many transistors. MSAFF has symmetric rise and fall times and a shorter D-Q delay than SAFF, but it too uses many transistors and incurs a large area cost.

The ep-SFF is based on a single latch driven by a pulsed clock generator, as shown in Fig. 1. The pulsed clock generator provides a pulsed clock as shown in Fig. 1c. The width of the generated short pulse is controlled by the delay of a three-stage inverter chain. The ep-SFF has a fair D-Q delay, consumes little energy, and occupies a small area.
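The pulse generation described for ep-SFF can be illustrated behaviourally. The following is a minimal discrete-time sketch, assuming the common structure in which the pulse is the AND of the clock with a copy that has been delayed and inverted by the three-inverter chain. The text states only that the pulse width is set by the three-stage inverter delay, so the AND structure, the function name `pulsed_clock`, and the unit time step are illustrative assumptions rather than the authors' circuit.

```python
def pulsed_clock(clk, inverter_delay=1, stages=3):
    """Derive a short pulse from each rising clock edge.

    Assumed structure (not taken from the paper's schematic):
    pulse = clk AND NOT(clk delayed by `stages` inverter delays).
    `stages` must be odd so that the chain inverts the clock.
    """
    total_delay = inverter_delay * stages
    pulse = []
    for t, level in enumerate(clk):
        # Before the chain has filled, treat the delayed clock as 'Low'.
        delayed = clk[t - total_delay] if t >= total_delay else 0
        delayed_inverted = 1 - delayed
        pulse.append(level & delayed_inverted)
    return pulse


# Hypothetical clock: 4 low samples, then alternating 8-sample phases.
clk = [0] * 4 + [1] * 8 + [0] * 8 + [1] * 8
pulse = pulsed_clock(clk)
print(pulse)
```

In this sketch each rising edge yields a pulse whose width equals the total chain delay, which is the dependence on the three-stage inverter delay that the text describes.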

Fig. 1 Schematics of (a) the explicit-pulsed hybrid static flip-flop, (b) the pulsed clock generator, and (c) the pulse generator timing diagram

3 Proposed flip-flop design

Figure 2 shows the proposed differential pass transistor pulsed latch (DPTPL). DPTPL is a differential-type flip-flop with two data inputs and two outputs. It consists of two parts: a pulsed clock generator and a static latch. The static latch consists of four parts: pass transistors, feedback PMOS transistors, clocked feedback NMOS transistors, and output drivers.

Fig. 2 Schematics of (a) the differential pass transistor pulsed latch (DPTPL) and (b) the pulsed clock generator

Generally, an NMOS transistor has higher carrier mobility than a PMOS transistor. On the assumption that NMOS pass transistors have better drive strength than transmission gates of equivalent size, the PMOS transistor of the transmission gate of ep-SFF (P1) is removed. However, an NMOS pass transistor cannot swing its output all the way up to VDD. PMOS transistors are therefore connected to keep internal nodes A and B from suffering a Vth voltage drop. Because these PMOS transistors are cross-coupled, they not only prevent the Vth drop but also reduce the evaluation time of DPTPL through positive feedback.

In Fig. 2a, when dck is ‘Low’, small feedback NMOS transistors controlled by dclkb maintain the previous state. Whereas HLFF and SDFF use back-to-back inverters at the output node without clock control, the small feedback NMOS transistors in DPTPL are controlled by clock signals to prevent fighting current, which makes DPTPL faster and reduces its power consumption.

DPTPL has a short D-Q delay because the path from input to output contains only one NMOS pass transistor and one inverter. Because DPTPL has a symmetrical structure, the D-Qb and Db-Q delays are almost the same. DPTPL uses a pulsed clock generator, which supplies the static latch with a pulsed clock. The power consumption of the pulsed clock generator can be a significant portion of the total power consumption of a pulsed latch. If an external Local Clock Buffer (LCB) includes the pulsed clock generator and provides the flip-flops with a pulsed clock signal, the pulse-generation power attributed to each flip-flop can be reduced. Because of the pulsed clock generator, DPTPL has a large negative setup time, which makes slack borrowing possible. As shown in Fig. 2, the schematic is simpler than that of other differential-type flip-flops, which dramatically reduces area cost.

The operation of DPTPL is as follows. During the short pulse provided by the pulsed clock generator, when dck is ‘High’, the NMOS pass transistors turn on and transmit the input data to the output; at that time, the feedback NMOS transistors are off. Consequently, DPTPL can be regarded as an edge-triggered flip-flop. The PMOS transistors of DPTPL keep internal nodes A and B from seeing the Vth voltage drop. When dck is ‘Low’, the pass transistors are off and the feedback NMOS transistors are on. The small feedback NMOS transistors, together with the cross-coupled PMOS transistors, make the latch hold the previous state.
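The operation described in this section can be summarized in a small behavioural model. The sketch below is not a circuit simulation: it assumes ideal switches, takes VDD = 1.2 V from the simulation setup, and uses a hypothetical threshold voltage `VTH = 0.35` purely for illustration. The class name `DPTPLModel` and its attributes are likewise assumptions. Node A follows D and drives Qb through an inverter, and node B follows Db and drives Q.

```python
VDD = 1.2   # supply voltage used in the paper's simulations [V]
VTH = 0.35  # hypothetical NMOS threshold voltage [V], illustration only


class DPTPLModel:
    """Behavioural stand-in for the DPTPL static latch (ideal switches)."""

    def __init__(self):
        self.node_a = 0.0  # internal node A, follows D
        self.node_b = VDD  # internal node B, follows Db

    def step(self, d, dck):
        """Apply data `d` with pulsed clock `dck`; return (Q, Qb) as booleans."""
        if dck:
            # Pass transistors on: an NMOS passes a 'High' only up to VDD - VTH.
            raw_a = VDD - VTH if d else 0.0
            raw_b = 0.0 if d else VDD - VTH
            # Cross-coupled PMOS: the low node turns on the PMOS that
            # pulls the opposite node up to full VDD.
            self.node_a = VDD if raw_b == 0.0 else raw_a
            self.node_b = VDD if raw_a == 0.0 else raw_b
        # dck 'Low': pass transistors off, feedback devices hold the state.
        q = self.node_b < VDD / 2    # output inverter on node B
        qb = self.node_a < VDD / 2   # output inverter on node A
        return q, qb


latch = DPTPLModel()
print(latch.step(d=True, dck=True))    # transparent during the pulse
print(latch.step(d=False, dck=False))  # input change ignored while dck is 'Low'
```

The model captures two points made in the text: the latch is transparent only while dck is ‘High’, and the cross-coupled PMOS devices restore a degraded ‘High’ level to full VDD.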


4 Simulation conditions and test bench

DPTPL and the conventional flip-flops are evaluated using two methods. First, all flip-flops are simulated in a 0.13 µm CMOS technology at 100 °C with a 1.2 V supply voltage and normal process corners. The operating clock frequency in this simulation is 1.25 GHz. For a fair comparison, all of the flip-flops are optimized for minimum E×D with the same output load of 25 fF. Figure 3 shows a block diagram for the measurement of delay and power. Flip-flop speed is measured by the data-to-output delay, which reflects the real performance of a flip-flop. Each flip-flop is designed to have balanced ‘Low’-to-‘High’ and ‘High’-to-‘Low’ transition delays, and the worse of the two is reported. For ep-SFF, the D-Q delay was used as the delay parameter; for all other flip-flops, the D-Qb delay was used. The power consumption of all flip-flops is measured using the test bench of Fig. 3, where ep-SFF and DPTPL include a pulsed clock generator.

Second, for chip testing, DPTPL, ep-SFF, SDFF, MSL, and MSAFF are laid out as shown in Fig. 5. The operating frequency in this simulation is 1 GHz. It is difficult to feed the clk and data signals with a small time difference such as 5 ps or 2 ps, so a signal generator that controls the time difference is needed. As shown in Fig. 4, as the signal passes through three phase interpolators, the delay (τ) of the delay cell is divided into three delays of τ/2, τ/4, and τ/8. Using these delays from the phase interpolators, clk and data signals can be generated with a fine time difference. The outputs of the multiplexer provide two cases: in one, the clk signal leads the data signal; in the other, the clk signal lags the data signal. The former is used to measure the positive setup time and the latter to measure the negative setup time of each flip-flop. The time difference between the clk and data signals is controlled by Vctrl, which is applied from outside the chip to control the delay of the delay cell.

Figure 5 shows that the layout is divided into four sub-blocks. The first sub-block is for power measurement; thirty-two flip-flops are connected to measure the power consumption. The second sub-block is the clk leading block, which is used to measure the positive setup time of each flip-flop. The third sub-block is the clk lagging block, which is used to measure the negative setup time of each flip-flop. The remaining sub-block, the signal generator, generates all of the signals needed by the other sub-blocks. With this circuit, the setup time, Clk-Q delay, D-Q delay, and power of DPTPL, MSL, SDFF, ep-SFF, and MSAFF can be measured.

Fig. 5 Layout of overall block diagram for chip test
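The way the phase interpolators subdivide the delay-cell delay can be illustrated with a short calculation. The sketch below assumes that the multiplexer selects one of the three subdivided delays at a time and that a positive offset means clk leads data; both are assumptions, as are the function name `skew_options` and the example value τ = 16 ps. In the chip, τ itself is set by Vctrl.

```python
def skew_options(tau_ps):
    """List the clk-to-data time differences available for a given tau.

    Assumes one of tau/2, tau/4, tau/8 is selected at a time, and that a
    positive value means clk leads data (sign convention is hypothetical).
    """
    offsets = [tau_ps / divisor for divisor in (2, 4, 8)]
    return {
        "clk_leading": offsets,
        "clk_lagging": [-offset for offset in offsets],
    }


options = skew_options(16.0)  # hypothetical delay-cell delay of 16 ps
print(options)
```

With this hypothetical τ, the smallest step is τ/8 = 2 ps, which is the order of time difference that the text says is difficult to apply directly from outside the chip.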

5 Simulation results

5.1 Waveforms

Fig. 3 Power and delay measurement test bench for overall comparison

Fig. 4 On-chip delay measurement block diagram

The signal waveforms of DPTPL are shown in Fig. 6. After the clk signal goes ‘High’, input D arrives 45 ps later, and the voltage at node A then follows input D. Finally, output Qb goes ‘Low’ after 57 ps.

Fig. 6 Signal waveforms of DPTPL


5.2 Delay

The worst-case delay of each flip-flop is plotted in Fig. 7. The setup time of the proposed DPTPL is about −45 ps. As shown in Fig. 7, because of this large negative setup time, the minimum D-Q delay of DPTPL is smaller than that of the conventional flip-flops. Consequently, the speed improvement over conventional flip-flops is up to 45%.

Fig. 7 Delay comparison: conventional versus proposed flip-flops

5.3 Power

Figure 8 presents the power comparison with 20% input data activity. The internal power shown in Fig. 8 for ep-SFF and DPTPL includes the power consumed by the pulsed clock generator. DPTPL reduces power consumption by 23 to 25% relative to conventional differential flip-flops. Compared to single-ended flip-flops, the power consumption of DPTPL is smaller than that of HLFF and SDFF but larger than that of ep-SFF and MSL. If the pulsed clock generator is embedded in a local clock buffer (LCB), additional power savings can be achieved.

Fig. 8 Overall power comparison (power in µW at α = 20%, split into clock, data, and internal components, for ep-SFF, MSL, HLFF, SDFF, MSDFF, SAFF, MSAFF, and DPTPL)

5.4 PDP and EDP

P×D and E×D simulation results are presented in Table 1. DPTPL reduces P×D and E×D by up to 46% and 69%, respectively, compared to differential-type flip-flops. The PDP and EDP of DPTPL improve significantly both because DPTPL is faster than conventional differential flip-flops and because it has a smaller clock load and total transistor width. Table 1 summarizes the general characteristics of the proposed and conventional flip-flops. Ep-SFF and DPTPL have good negative setup time characteristics but a large hold time, so some padding may be needed to avoid the shortest-path problem. The total transistor width of the proposed pulsed latch is significantly smaller than that of its counterparts. If the pulse generator is embedded in the LCB, further area and power savings can be achieved.
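Two of the headline percentages in this section can be re-derived from the values listed in Table 1. The sketch below is only an arithmetic cross-check: the numbers are copied from Table 1, while the helper name `reduction_percent` and the choice of SAFF and MSAFF as the conventional differential flip-flops are assumptions made for the illustration.

```python
# Values copied from Table 1 (alpha = 20%).
total_power_uw = {"SAFF": 234.5, "MSAFF": 226.3, "DPTPL": 175.3}
edp_fj_ps = {"ep-SFF": 1058.5, "DPTPL": 577.4}


def reduction_percent(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (1.0 - proposed / reference)


power_vs_saff = reduction_percent(total_power_uw["SAFF"], total_power_uw["DPTPL"])
power_vs_msaff = reduction_percent(total_power_uw["MSAFF"], total_power_uw["DPTPL"])
edp_vs_epsff = reduction_percent(edp_fj_ps["ep-SFF"], edp_fj_ps["DPTPL"])

print(f"Power reduction vs SAFF:  {power_vs_saff:.1f}%")
print(f"Power reduction vs MSAFF: {power_vs_msaff:.1f}%")
print(f"E×D reduction vs ep-SFF:  {edp_vs_epsff:.1f}%")
```

Under these assumptions the power reductions come out at roughly 25% and 23%, in line with the quoted 23 to 25% range, and the E×D reduction relative to ep-SFF comes out at about 45.5%.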

6 Conclusion

This paper has presented the DPTPL and compared its speed, power consumption, P×D, and E×D characteristics with those of conventional flip-flops. By combining the strong drivability of NMOS pass transistors with positive feedback PMOS transistors, DPTPL operates faster than its conventional counterparts. It also consumes less power, mainly because of its simplicity, smaller clock load, and smaller total gate width. As a result, DPTPL reduces E×D by 45.5% relative to ep-SFF, which has the best characteristics among the conventional flip-flops in our simulations. Hence, DPTPL is a good candidate for deep-pipeline multi-GHz microprocessors that demand high-speed, low-power operation with small area.

Table 1 General characteristics (α = 20%)

Parameter                                   MSL     ep-SFF       HLFF    SDFF    MSDFF   SAFF    MSAFF   DPTPL
Clk-Qb (Clk-Q) [ps]                         77.8    (96.8)       79.4    70.3    78.3    96.8    76.4    79.5
D-Qb (D-Q) [ps]                             116.3   (88)         91.9    77.4    73.5    100.3   73.2    57.4
Setup time [ps]                             20      −40          −5      −10     −20     −10     −20     −45
Hold time [ps]                              18      88           58      54      48      48      48      92
Total power [µW]                            159.2   136.8        181     244.4   225.4   234.5   226.3   175.3
PDP [fJ]                                    18.52   12.03        16.63   18.91   16.56   23.52   16.54   10.06
EDP [fJ×ps]                                 2153.6  1058.5       1527.3  1464.0  1216.7  2359.1  1209.4  577.4
No. of Tr (including pulse gen.)            22      12 (24)      20      23      26      20      28      12 (24)
Total Tr width (including pulse gen.) [µm]  50.26   25.7 (41.7)  53.13   69.21   66.38   82.04   87.04   33.4 (55.4)
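The PDP and EDP rows of Table 1 can be cross-checked against the delay and power rows. The sketch below assumes that PDP is the product of total power and the reported D-Qb delay (D-Q for ep-SFF) and that EDP is PDP multiplied by the same delay; these definitions are inferred from the tabulated numbers rather than stated explicitly in the text. The dictionary layout and function names are illustrative.

```python
# Per flip-flop: (delay [ps], total power [uW], PDP [fJ], EDP [fJ*ps]),
# all copied from Table 1. The ep-SFF delay is D-Q; the others are D-Qb.
table1 = {
    "MSL":    (116.3, 159.2, 18.52, 2153.6),
    "ep-SFF": (88.0,  136.8, 12.03, 1058.5),
    "HLFF":   (91.9,  181.0, 16.63, 1527.3),
    "SDFF":   (77.4,  244.4, 18.91, 1464.0),
    "MSDFF":  (73.5,  225.4, 16.56, 1216.7),
    "SAFF":   (100.3, 234.5, 23.52, 2359.1),
    "MSAFF":  (73.2,  226.3, 16.54, 1209.4),
    "DPTPL":  (57.4,  175.3, 10.06, 577.4),
}


def pdp_fj(delay_ps, power_uw):
    """Power-delay product: 1 ps * 1 uW = 1e-18 J = 1e-3 fJ."""
    return delay_ps * power_uw * 1e-3


def edp_fj_ps(delay_ps, power_uw):
    """Energy-delay product, assumed here to be PDP times the same delay."""
    return pdp_fj(delay_ps, power_uw) * delay_ps


for name, (delay, power, pdp_ref, edp_ref) in table1.items():
    pdp = pdp_fj(delay, power)
    edp = edp_fj_ps(delay, power)
    print(f"{name:7s} PDP {pdp:6.2f} fJ (table {pdp_ref}), "
          f"EDP {edp:7.1f} fJ*ps (table {edp_ref})")
```

The recomputed values track the tabulated ones closely, with small differences that are consistent with rounding of the published delay, power, and PDP figures.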


Acknowledgement This research was supported by Korea Research Foundation Grant (KRF-2003-003-D00-143) and IDEC.

References

1. Hrishikesh MS et al (2002) The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays. In: International symposium on computer architecture
2. Anderson CJ et al (2001) Physical design of a fourth-generation power GHz microprocessor. In: IEEE International solid-state circuits conference, pp 232–233
3. Gerosa G et al (1994) A 2.2 W, 80 MHz superscalar RISC microprocessor. IEEE J Solid-State Circuits 29(12):1440–1454
4. Stojanovic V, Oklobdžija VG (1999) Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems. IEEE J Solid-State Circuits 34:536–548
5. Partovi H et al (1996) Flow-through latch and edge-triggered flip-flop hybrid elements. ISSCC digest of technical papers, pp 138–139
6. Klass F (1996) Semi-dynamic and dynamic flip-flops with embedded logic. In: Symposium on VLSI circuits digest technical papers, pp 108–109
7. Nikolić B et al (2000) Improved sense-amplifier-based flip-flop: design and measurement. IEEE J Solid-State Circuits 35(6):876–884
8. Tschanz J, Narendra S, Chen Z, Borkar S, Sachdev M, De V (2001) Comparative delay and energy of single edge-triggered & dual edge-triggered pulsed flip-flops for high-performance microprocessors. In: International symposium on low power electronics and design, August 6–7, pp 147–152
9. Kawaguchi H, Sakurai T (1998) A reduced clock-swing flip-flop (RCSFF) for 63% power reduction. IEEE J Solid-State Circuits 33:807–811
10. Kim C, Kang S-M (2002) A low-swing clock double-edge triggered flip-flop. IEEE J Solid-State Circuits 37(5):648–652
11. Nedovic N, Oklobdžija VG (2000) Dynamic flip-flop with improved power. In: International conference on computer design, pp 323–326