Pipelined Phase Accumulator using Han Carlson Adders and reduced pre-skewing Flip-flops for DDFS

Salman Nazir*, Shahid Masud**

Department of Electrical Engineering Lahore University of Management Sciences Lahore, Pakistan 54792 {14100110*, smasud**}@lums.edu.pk

Usama Awais

Department of Electrical Engineering The University of Texas at Dallas Richardson, TX, USA 75080 [email protected]

ISOCC 2015

Abstract- This paper presents a 32-bit pipelined Phase Accumulator (PA) design for direct digital frequency synthesizers (DDFSs) in 90 nm CMOS technology. The proposed PA both lowers power dissipation and reduces the number of pre-skewing flip-flops (FFs). This is achieved by replacing the pre-skewing registers of the first stage of a previously proposed architecture with low-power D-latches. The proposed PA uses a specialized parallel prefix adder in each 4-bit accumulator sub-unit. These sub-units are fed directly with the Frequency Control Word (FCW) bit stream through a sequential loading scheme driven by gated clocks. A speed of 1.5 GHz is achieved, with power consumption reduced by 47% compared to its predecessor.

Keywords- pipelined phase accumulator; Han Carlson; low power; sequential loading; DDFS

I. INTRODUCTION

Direct digital frequency synthesizers (DDFSs) enable fast frequency hopping and low power dissipation in many communication and instrumentation applications. Pipelined architectures are commonly used to build high-speed PAs for DDFSs; however, pipelined PAs consume more power as the pipeline depth (M) increases [1]. Recent literature [2, 3] discusses effective power reduction techniques for pipelined PAs. The sequential clock gating scheme introduced in [2] uses a large number of pre-skewing FFs, which increases chip area and reduces operating speed in exchange for instantaneous FCW loading and the associated dynamic power savings. The gating schemes in [3, 4] greatly reduce the number of pre-skewing FFs; however, the FCW update rate is then constrained by the pipeline depth.

The proposed PA incorporates a signal gating scheme that reduces the pre-skewing FFs further and uses low-power D-latches in the first pipeline column. It also uses a high-speed, low-power adder in its accumulator circuit. This paper describes the low-power PA architecture and compares the proposed PA with those presented in [2-4].

II. PROPOSED PIPELINE PA

A conventional pipelined PA as introduced in [2, 3] has two main drawbacks. First, increasing M for higher-speed operation incurs a large area penalty because of the growing number of pre-skewing registers. Second, there is no mechanism to detect an FCW update, and the global clock that drives the individual FFs switches them continuously, leading to sustained dynamic power dissipation [2]. Thus, along with pipelining, clock frequency reduction and gated clocks are necessary to cut area and power consumption. In [3], the number of pre-skewing FFs is reduced, but the PA architecture tends to have a longer latency than the one in [2]. The proposed PA is designed on a similar basis, but it restricts the instantaneous update of the FCW bits (N). To sustain high speed and low power, all logic gates in the PA architecture are implemented in domino logic. Carry Look Ahead (CLA) adders are fast adders suited to high-performance addition, and the Han Carlson adder is a parallel prefix adder of this kind. Its sum logic, built from XOR gates, is arranged so that in any given row there is never more than one cell in each pair of columns [5]. It has a low overall gate count, reduced horizontal wire capacitance, and fewer layers than other parallel prefix adder architectures [6].

The proposed 32-bit pipelined PA architecture is shown in Fig. 1. M has been chosen as eight since the lowest power point for a 32-bit accumulator is 8-stage pipelining [4]. The FCW input bit stream is first latched by the low-power D-latches proposed in [7]. These D-latches are efficient enough not to experience sub-threshold current leakage and have very low standby power dissipation. The enable signal driving the latches ensures that the current input remains stored and is not updated before a sequential cycle of 8 clock periods has elapsed. As seen in the figure, the number of pre-skewing FFs is reduced from 144 to 32 (N) compared with [2]. This is also an advantage over [3], which uses 2N pre-skewing FFs. These DFFs use the Push-pull Isolation design [8], which is characterized by high performance and energy efficiency. The ACU is followed by post-skewing FFs for the storage and computation of the upper 12 MSBs (K).

Figure 1. Proposed 32-bit pipelined PA (N=32, M=8, K=12)
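The sequential loading idea described in this section can be illustrated with a small behavioural model. The sketch below is not the authors' circuit: it assumes ideal registers, a textbook Han Carlson prefix structure for each 4-bit sub-unit (not the domino-logic implementation), and it post-skews all N bits so the result can be compared with a plain 32-bit accumulator, whereas the paper keeps post-skewing FFs only for the K = 12 MSBs. The names `han_carlson_add`, `flat_pa`, `pipelined_pa` and the `updates` dictionary are hypothetical and introduced only for illustration; N, M and K are taken from Fig. 1.

```python
N, M, K = 32, 8, 12      # word length, pipeline depth, output MSBs (Fig. 1)
W = N // M               # bits per accumulator sub-unit (4)


def han_carlson_add(a, b, cin=0, width=W):
    """Add two width-bit words with a Han Carlson prefix tree (textbook form).

    Returns (sum, carry_out).
    """
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # propagate bits
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]     # generate bits
    G, P = g[:], p[:]
    G[0] |= p[0] & cin                  # fold the carry-in into bit 0
    # First row: every odd column absorbs its even neighbour.
    for i in range(1, width, 2):
        G[i], P[i] = G[i] | (P[i] & G[i - 1]), P[i] & P[i - 1]
    # Kogge-Stone rows restricted to the odd columns.
    d = 2
    while d < width:
        old_G, old_P = G[:], P[:]
        for i in range(1 + d, width, 2):
            G[i] = old_G[i] | (old_P[i] & old_G[i - d])
            P[i] = old_P[i] & old_P[i - d]
        d *= 2
    # Last row: even columns take the carry from the odd column below.
    for i in range(2, width, 2):
        G[i] |= P[i] & G[i - 1]
    carries = [cin] + G[:-1]            # carry into each bit position
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, G[width - 1]


def nibble(word, i):
    """The W bits of word that belong to pipeline stage i."""
    return (word >> (W * i)) & ((1 << W) - 1)


def flat_pa(updates, cycles):
    """Reference model: one N-bit addition per clock cycle."""
    acc, fcw, out = 0, 0, []
    for t in range(cycles):
        fcw = updates.get(t, fcw)
        acc = (acc + fcw) % (1 << N)
        out.append(acc)
    return out


def pipelined_pa(updates, cycles):
    """M-stage pipelined PA with sequential FCW loading.

    updates maps a clock cycle to a new FCW. The FCW must be held for at
    least M cycles, which is the restriction the paper accepts.
    Returns the aligned N-bit phase per cycle (None while the pipe fills).
    """
    times = sorted(updates)
    assert all(b - a >= M for a, b in zip(times, times[1:])), "hold FCW for M cycles"
    latch, step = 0, M                  # FCW latch; step < M while loading
    stage_fcw = [0] * M                 # one W-bit register per stage (N FFs)
    s, c = [0] * M, [0] * M             # stage sum and carry-out registers
    history, out = [], []
    for t in range(cycles):
        if t in updates:                # new FCW captured by the input latch
            latch, step = updates[t], 0
        if step < M:                    # gated signal for stage `step` fires
            stage_fcw[step] = nibble(latch, step)
            step += 1
        # All stages add in parallel, using last cycle's carry from below.
        results = [han_carlson_add(s[i], stage_fcw[i], c[i - 1] if i else 0)
                   for i in range(M)]
        s = [r[0] for r in results]
        c = [r[1] for r in results]
        history.append(s)
        if t < M - 1:
            out.append(None)
            continue
        # Post-skew: stage i is delayed by M-1-i cycles to realign the word.
        word = 0
        for i in range(M):
            word |= history[t - (M - 1 - i)][i] << (W * i)
        out.append(word)
    return out
```

Under these assumptions the pipelined output is the flat accumulator output delayed by M-1 cycles, so `pipelined_pa(u, n)[M-1:]` can be compared with `flat_pa(u, n)[:n-(M-1)]` for any update schedule `u` that holds each FCW for at least M cycles. The K-bit phase word would be obtained as `word >> (N - K)`.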

Figure 2. Inside view of the Gated Signal Generator

Fig. 2 shows the gated signal generator. The gated signals (GS1-GS8) drive the sequential loading of the FCW bit stream. As long as En remains high, the latches load new FCW bits. The 8-bit NOR gate controls the state of En. The simulation of the gated signal generator is shown in Fig. 3. Here fclk is kept at 1.5 GHz and the FCW is varied, as depicted by FCW (D) and its delayed input FCW (D-1). As seen, the gated signals are generated as soon as En goes low, and a new FCW can be loaded by the D-latches when En goes high again.

Figure 3. Timing Diagram of the Gated Signal Generator

III. MEASUREMENTS AND RESULTS

The proposed 32-bit PA is simulated in a 90 nm CMOS process and consists of 8 pipeline stages. The PA operates at a maximum clock frequency of 1.5 GHz and consumes no more than 0.841 mW at this frequency with the FCW update rate fixed at eight clock cycles (ccs). Fig. 4 shows the average power consumption of the conventional and proposed PAs for different FCW update rates. As seen, the conventional PA consumes an average power of around 1.59 mW for all update rates, while the proposed PA cuts its consumption as the FCW update rate is increased beyond eight ccs. It consumes as little as 0.655 mW at an FCW update rate of 100 ccs. Compared with the conventional pipelined PA, the proposed architecture reduces the number of pre-skewing FFs by 77.7% and the power consumption by 47.2%. It also has a higher operating speed than the PAs in [2] and [3].

Figure 4. Average Power Dissipation profiles at different FCW update rates

IV. CONCLUSION

Operating a DDFS at high frequency calls for measures that keep power consumption low. A pipelined PA becomes one of the most power-dissipating elements as its pipeline depth and operating speed increase. This paper proposes a pipelined PA that addresses these concerns by reducing the number of pre-skewing FFs to the number of FCW bits (N) and by using low-power D-latches as the FCW loading registers. It also uses a power-efficient, high-speed parallel prefix adder in the accumulator sub-units to sustain better performance at a higher operating frequency.

REFERENCES

[1] Yeoh, H., Jung, J.-H., Jung, Y.-H., and Baek, K.-H., “A 1.3-GHz 350mW hybrid direct digital frequency synthesizer in 90 nm CMOS”, IEEE J. Solid-State Circuits, 2010, vol. 45, no. 9, pp. 1845-1855
[2] Kim, Y.S., Lee, J., Hong, Y., Kim, J.E., and Baek, K.-H., “Low power pipelined phase accumulator with sequential clock gating for DDFS”, IET Electron. Lett., 2013, vol. 49, pp. 1445-1446
[3] Jung, Y.-H., Yoo, T., Cho, S.-J., and Baek, K.-H., “Pipelined phase accumulator using sequential FCW loading scheme for DDFSs”, IET Electron. Lett., 2012, vol. 48, pp. 1044-1046
[4] Kim, Y.S., and Kang, S.M., “A high speed low-power accumulator for direct digital frequency synthesizer”, Dig. IEEE MTT-S Int. Symp. Microwave, San Francisco, CA, USA, June 2006, pp. 502-505
[5] Weste, N. H. E., Harris, D., and Banerjee, A., “CMOS VLSI Design”, Pearson Education, 3rd Edition, pp. 55-57, 2005
[6] Betowski, D.J., and Beiu, V., “Considerations for phase accumulator design for Direct Digital Frequency Synthesizers”, IEEE International Conference on Neural Networks and Signal Processing, Dec. 2003, Nanjing, China, vol. 1, pp. 176-179
[7] Ching, L. P., and Ling, O. G., “Low-power and low-voltage D-latch”, IEE Electronics Letters, vol. 34, no. 7, pp. 641-642, 2nd April 1998
[8] Ko, U., and Balsara, P. T., “High-Performance Energy-Efficient D-Flip-Flop Circuits”, IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 8, no. 1, pp. 94-98, February 2000
