Power-speed Trade-off in Parallel Prefix Circuits

ITCom 2002 High-Performance Pervasive Computing Conference, Boston, MA, July 29 – August 2, 2002

by S. Vanichayobon, S. K. Dhall, S. Lakshmivarahan, J. K. Antonio
School of Computer Science, University of Oklahoma
www.cs.ou.edu

Outline

• Overview of Prefix Circuits
• Overview of Power Consumption in CMOS Circuits
• Power Consumption Model for Prefix Circuits
• Simulated Power Consumption for Prefix Circuits
• Power-speed Trade-off for Prefix Circuits
• Conclusions

Overview of Prefix Circuits

• The prefix problem is to compute

$y_i = x_1 \cdot x_2 \cdots x_{i-1} \cdot x_i$ for $i = 1, 2, \ldots, N$

where “·” denotes an associative binary operation
• Prefix circuits are used in a number of areas, including
  • carry-look-ahead adders
  • ranking
  • radix sort
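As a minimal sketch of the definition above, the following Python function computes all N prefixes serially for any binary operation. The function name `prefix` and the sample inputs are illustrative assumptions, not part of the slides.

```python
from operator import add, mul

def prefix(xs, op):
    """Return [y_1, ..., y_N] with y_i = x_1 op x_2 op ... op x_i."""
    ys = []
    for x in xs:
        # y_1 = x_1; afterwards y_i = y_{i-1} op x_i
        ys.append(x if not ys else op(ys[-1], x))
    return ys

sums = prefix([1, 2, 3, 4], add)      # running sums
products = prefix([1, 2, 3, 4], mul)  # running products
```

This loop evaluates the operations strictly left to right. Associativity of the operation is what allows a parallel circuit to regroup them without changing the results.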

Overview of Prefix Circuits

Structure and Notation for a Prefix Circuit

[Figure: a four-input prefix circuit drawn in two levels. Input nodes (x_i = i) are labeled 1 to 4, output nodes (y_i = 1:i) are labeled 1:1 to 1:4, and a legend marks the operation nodes.]

Overview of Prefix Circuits

Basic Definitions for Prefix Circuits
• The circuit size is the total number of operation nodes
• The circuit depth is the length of the longest path
• The circuit fan-in is the maximum fan-in of all nodes
• The circuit fan-out is the maximum fan-out of all nodes

Known Lower Bounds for N-Input Prefix Circuits
• size ≥ N − 1
• depth ≥ lg N
• size + depth ≥ 2N − 2

Overview of Prefix Circuits

Comparison of Serial and Parallel Prefix Circuits
• Serial Prefix Circuit with size = 3 and depth = 3

[Figure: inputs 1 to 4, levels 1 to 3, outputs 1:1 to 1:4.]

• Parallel Prefix Circuit with size = 4 and depth = 2

[Figure: inputs 1 to 4, levels 1 and 2, outputs 1:1 to 1:4.]
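The two four-input circuits can be sketched as lists of operation nodes and checked for size and depth. The wire names and the exact wiring of the parallel circuit are assumptions; the wiring is one circuit with size 4 and depth 2, which may differ from the one drawn on the slide.

```python
from operator import add

# Each operation node is (left_wire, right_wire, output_wire).
# Wires "1".."4" are inputs; "a:b" names the combination of inputs a..b.
SERIAL_4 = [("1", "2", "1:2"), ("1:2", "3", "1:3"), ("1:3", "4", "1:4")]
PARALLEL_4 = [("1", "2", "1:2"), ("3", "4", "3:4"),
              ("1:2", "3", "1:3"), ("1:2", "3:4", "1:4")]

def evaluate(circuit, inputs, op=add):
    """Return (outputs, size, depth) for a circuit listed in topological order."""
    values = {str(i + 1): x for i, x in enumerate(inputs)}
    level = {name: 0 for name in values}  # input nodes sit at level 0
    for left, right, out in circuit:
        values[out] = op(values[left], values[right])
        level[out] = 1 + max(level[left], level[right])
    # Output 1:1 is input 1 itself; the others are the wires named 1:i.
    outputs = [values["1"]] + [values[f"1:{i}"] for i in range(2, len(inputs) + 1)]
    return outputs, len(circuit), max(level.values())

serial_result = evaluate(SERIAL_4, [1, 2, 3, 4])
parallel_result = evaluate(PARALLEL_4, [1, 2, 3, 4])
```

Both circuits produce the same prefixes. The parallel one spends one extra operation node to cut the depth from 3 to 2, and both meet the bound size + depth ≥ 2N − 2 with equality.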

Overview of Prefix Circuits

Prefix Circuits Considered
• Serial Prefix Circuit (S)
• Divide and Conquer Parallel Prefix Circuit (DC)
• Ladner and Fischer Parallel Prefix Circuit (LF), 1980
• Brent and Kung Parallel Prefix Circuit (BK), 1982
• Snir Parallel Prefix Circuit (SN), 1986
• Lakshmivarahan, Yang and Dhall Parallel Prefix Circuit (LYD), 1994
• Lin and Shih Parallel Prefix Circuit (SL), 1999

Overview of Prefix Circuits

Size, depth, and fan-out of each prefix circuit
• S(N): size = N − 1; depth = N − 1; fan-out = 2
• DC(N): size = (N/2) lg N; depth = lg N; fan-out = (N/2) + 1
• BK(N): size = 2N − lg N − 2; depth = 2 lg N − 2; fan-out = lg N + 1
• LF0(N): size = 4N − F(5 + lg N) + 1; depth = lg N + k; fan-out = (N/2) + k
• LFk(N): size = 2N − lg N − 2; depth = 2 lg N − 2; fan-out = lg N + 1
• SN(N): size = 2N − 2 − depth; max(lg N, 2 lg N − 2) ≤ depth ≤ N − 1; fan-out = lg N + 1
• SL(N): size = 2N − 2 − depth; 2 lg N − 5 ≤ depth ≤ 2 lg N − 3; fan-out = lg N + 1
• LYD(N): size = 2N − 2 − depth; 2 lg N − 6 ≤ depth ≤ 2 lg N − 3

2 lg N − 2, k + 1 when k ≥ lg N − 2
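To make the entries concrete, the sketch below evaluates the size and depth formulas for S(N), DC(N), and BK(N) at N = 64 and checks them against the three lower bounds. The helper names are illustrative assumptions. N is assumed to be a power of two, and N ≥ 4 is assumed for BK because its depth formula gives 0 at N = 2.

```python
def lg(n):
    """Base-2 logarithm of a power of two."""
    return n.bit_length() - 1

# (size, depth) formulas for N a power of two
FORMULAS = {
    "S":  lambda n: (n - 1, n - 1),
    "DC": lambda n: ((n // 2) * lg(n), lg(n)),
    "BK": lambda n: (2 * n - lg(n) - 2, 2 * lg(n) - 2),  # assumes N >= 4
}

def meets_lower_bounds(n, size, depth):
    """Check size >= N - 1, depth >= lg N, and size + depth >= 2N - 2."""
    return size >= n - 1 and depth >= lg(n) and size + depth >= 2 * n - 2

N = 64
table = {name: formula(N) for name, formula in FORMULAS.items()}
all_ok = all(meets_lower_bounds(N, s, d) for s, d in table.values())
```

At N = 64 the serial circuit has the smallest size and the largest depth, DC has the smallest depth and the largest size, and BK sits between them on both measures.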

Overview of Power Consumption

Sources of Power Consumption in CMOS
• Leakage Current
• Dynamic Capacitance Charging Current
  • Most important for CMOS
  • Dependent on clock frequency
  • Dependent on signal activity
• Transient Current

Overview of Power Consumption

• The power consumption due to switching is given by

$P_{\mathrm{switching}} = \mathrm{cap}_{\mathrm{eff}} \, V_{DD}^2 \, f$

• $V_{DD}$ is the supply voltage
• $f$ is the clock frequency
• $\mathrm{cap}_{\mathrm{eff}}$ is the effective switched capacitance, $\mathrm{cap}_{\mathrm{eff}} = a_f C_L$
• $C_L$ is the circuit load capacitance
• $a_f$ is the circuit switching activity factor*

*Note: without consideration of glitching, $0 \le a_f \le 1$; with consideration of glitching, it is possible to have $a_f > 1$
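The switching-power formula can be evaluated directly, as in the sketch below. The function name and all numeric values are hypothetical except the 2.8 V supply, which is the value used later in the simulations.

```python
def switching_power(activity, c_load, vdd, freq):
    """P_switching = cap_eff * VDD^2 * f, with cap_eff = a_f * C_L."""
    cap_eff = activity * c_load
    return cap_eff * vdd ** 2 * freq

# Hypothetical operating point: a_f = 0.5, C_L = 1 pF, f = 100 MHz
p_full = switching_power(0.5, 1e-12, 2.8, 100e6)
# Same circuit and clock at half the supply voltage
p_half = switching_power(0.5, 1e-12, 1.4, 100e6)
ratio = p_half / p_full
```

Because the supply voltage enters squared, halving it at a fixed clock frequency cuts the switching power to a quarter.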

Overview of Power Consumption

Examples to illustrate the effect of glitching, assume
• signals q, r, s, t are synchronously clocked
• non-zero propagation delay at each node
• load capacitance at each node is $c_0$

For each circuit below, the load capacitance is $C_L = 3c_0$

(a) Chained Implementation: computes q+r, then q+r+s, then q+r+s+t, with 1, 2, and 3 transitions at the three nodes, so $\mathrm{cap}_{\mathrm{eff}} = (1 + 2 + 3)c_0 = 6c_0 = 2C_L$

(b) Tree Implementation: computes q+r and s+t, then q+r+s+t, so $\mathrm{cap}_{\mathrm{eff}} = 4c_0 = (4/3)C_L$

Overview of Power Consumption

Worst case analysis
• Input signals cause node outputs to transition at every cycle
• A node at level i experiences i transitions (includes glitches)
• Assuming constant load capacitance $c_0$ at each node, then the effective switched capacitance of a circuit is

$\mathrm{cap}_{\mathrm{eff}} = \left( \sum_{i=1}^{\mathrm{depth}} i \, w_i \right) c_0$

where $w_i$ is the number of nodes at level $i$
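The worst-case sum can be checked against the chained and tree adders from the glitching example. This is a small sketch; the function name and the list-of-counts representation are assumptions made for illustration.

```python
def worst_case_cap_eff(nodes_per_level, c0=1.0):
    """Sum of i * w_i over levels i = 1..depth, times c0.

    nodes_per_level[i - 1] holds w_i, the number of nodes at level i.
    """
    return c0 * sum(i * w for i, w in enumerate(nodes_per_level, start=1))

chained = worst_case_cap_eff([1, 1, 1])  # one node at each of levels 1, 2, 3
tree = worst_case_cap_eff([2, 1])        # two nodes at level 1, one at level 2
C_L = 3.0                                # three nodes, each with load c0 = 1
```

With $c_0 = 1$, the chained adder switches $6c_0 = 2C_L$ and the tree adder $4c_0 = (4/3)C_L$, matching the values quoted for the two implementations.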

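Power Consumption Model

• Load capacitance of a node with fan-out of k is modeled by $c_0 + (k - 1)c'$

[Figure: a node with fan-out = k, labeled with $c_0$ and $c'$.]

• Calculation of effective switched capacitance for a prefix circuit is divided into two parts

$\mathrm{cap}_{\mathrm{eff}}(N) = c_0 \, \mathrm{Kcap}_{\mathrm{eff}}(N) + c' \, \mathrm{Rcap}_{\mathrm{eff}}(N)$

where the first term is the constant part and the second term is the residual part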
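Combining the worst-case transition count with this load model, a node at level i with fan-out k contributes $i\,(c_0 + (k-1)c')$. The sketch below splits that sum into its constant and residual parts for two four-input circuits. The node lists, the function names, and the convention that an output connection counts toward fan-out are assumptions made for illustration.

```python
def cap_eff_parts(nodes):
    """Return (Kcap_eff, Rcap_eff) for nodes given as (level, fan_out) pairs."""
    k_part = sum(level for level, _ in nodes)
    r_part = sum(level * (fan_out - 1) for level, fan_out in nodes)
    return k_part, r_part

def cap_eff(nodes, c0, c_prime):
    """cap_eff = c0 * Kcap_eff + c' * Rcap_eff."""
    k_part, r_part = cap_eff_parts(nodes)
    return c0 * k_part + c_prime * r_part

# Serial circuit, N = 4: nodes 1:2 and 1:3 each drive an output and the next node.
serial_4 = [(1, 2), (2, 2), (3, 1)]
# Divide-and-conquer circuit, N = 4: node 1:2 drives its output, 1:3, and 1:4.
dc_4 = [(1, 3), (1, 1), (2, 1), (2, 1)]

serial_parts = cap_eff_parts(serial_4)
dc_parts = cap_eff_parts(dc_4)
```

Both circuits have the same constant part here, so the difference in effective switched capacitance at N = 4 comes entirely from fan-out.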

Power Consumption Model

$\mathrm{cap}_{\mathrm{eff}}(N) = c_0 \, \mathrm{Kcap}_{\mathrm{eff}}(N) + c' \, \mathrm{Rcap}_{\mathrm{eff}}(N)$

Constant part:

$\mathrm{Kcap}_{\mathrm{eff}}(N) = \sum_{i=1}^{\mathrm{depth}(N)} i \, w_i$

where $w_i$ is the number of nodes at level $i$

Residual part:

$\mathrm{Rcap}_{\mathrm{eff}}(N) = \sum_{i=1}^{\mathrm{depth}(N)} i \sum_{j=2}^{k} (j - 1) \, w_{ij}$

where $w_{ij}$ is the number of nodes at level $i$ having fan-out equal to $j$

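Power Consumption Model

Example calculation of $\mathrm{cap}_{\mathrm{eff}}$ for the DC(N) circuit

$\mathrm{Kcap}_{\mathrm{eff}}(2) = 1$

$\mathrm{Kcap}_{\mathrm{eff}}(N) = 2\,\mathrm{Kcap}_{\mathrm{eff}}\!\left(\frac{N}{2}\right) + \frac{N}{2}\left(\mathrm{depth}\!\left(\frac{N}{2}\right) + 1\right)$

[Figure: DC(N) built from two DC(N/2) blocks. Inputs 1 to N/2 feed the first block and inputs N/2 + 1 to N feed the second; the outputs are 1:1, 1:2, …, 1:N/2, 1:N/2 + 1, …, 1:N−1, 1:N.]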

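Power Consumption Model

Example calculation of $\mathrm{cap}_{\mathrm{eff}}$ for the DC(N) circuit

$\mathrm{Rcap}_{\mathrm{eff}}(2) = 0$

$\mathrm{Rcap}_{\mathrm{eff}}(N) = 2\,\mathrm{Rcap}_{\mathrm{eff}}\!\left(\frac{N}{2}\right) + \frac{N}{2}\,\mathrm{depth}\!\left(\frac{N}{2}\right)$

[Figure: DC(N) built from two DC(N/2) blocks. Inputs 1 to N/2 feed the first block and inputs N/2 + 1 to N feed the second; the outputs are 1:1, 1:2, …, 1:N/2, 1:N/2 + 1, …, 1:N−1, 1:N.]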

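Power Consumption Model

Example calculation of $\mathrm{cap}_{\mathrm{eff}}$ for the DC(N) circuit

$\mathrm{cap}_{\mathrm{eff}}(N) = c_0 \, \mathrm{Kcap}_{\mathrm{eff}}(N) + c' \, \mathrm{Rcap}_{\mathrm{eff}}(N)$

With $\mathrm{depth}(N) = \lg N$:

$\mathrm{Kcap}_{\mathrm{eff}}(N) = 2\,\mathrm{Kcap}_{\mathrm{eff}}\!\left(\frac{N}{2}\right) + \frac{N}{2}\left(\mathrm{depth}\!\left(\frac{N}{2}\right) + 1\right) = \frac{N}{4}\left((\lg N)^2 + \lg N\right)$

$\mathrm{Rcap}_{\mathrm{eff}}(2) = 0, \quad \mathrm{Rcap}_{\mathrm{eff}}(N) = 2\,\mathrm{Rcap}_{\mathrm{eff}}\!\left(\frac{N}{2}\right) + \frac{N}{2}\,\mathrm{depth}\!\left(\frac{N}{2}\right) = \frac{N}{4}\left((\lg N)^2 - \lg N\right)$

$\mathrm{cap}_{\mathrm{eff}}(N) = \frac{N}{4}\left((\lg N)^2 + \lg N\right) c_0 + \frac{N}{4}\left((\lg N)^2 - \lg N\right) c' = \left(\frac{sd + s}{2}\right) c_0 + \left(\frac{sd - s}{2}\right) c'$

where $d = \mathrm{depth}(N) = \lg N$ and $s = \mathrm{size}(N) = \frac{N}{2} \lg N$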
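The DC(N) recurrences and their closed forms can be cross-checked numerically. The sketch below assumes N is a power of two and uses depth(N) = lg N; the function names are illustrative.

```python
def lg(n):
    """Base-2 logarithm of a power of two."""
    return n.bit_length() - 1

def kcap(n):
    """Constant part: Kcap(2) = 1, Kcap(N) = 2 Kcap(N/2) + (N/2)(depth(N/2) + 1)."""
    return 1 if n == 2 else 2 * kcap(n // 2) + (n // 2) * (lg(n // 2) + 1)

def rcap(n):
    """Residual part: Rcap(2) = 0, Rcap(N) = 2 Rcap(N/2) + (N/2) depth(N/2)."""
    return 0 if n == 2 else 2 * rcap(n // 2) + (n // 2) * lg(n // 2)

def kcap_closed(n):
    return n * (lg(n) ** 2 + lg(n)) // 4

def rcap_closed(n):
    return n * (lg(n) ** 2 - lg(n)) // 4

sizes = [2 ** e for e in range(1, 11)]
recurrence_matches = all(
    kcap(n) == kcap_closed(n) and rcap(n) == rcap_closed(n) for n in sizes
)
```

The same values follow from the size and depth alone: with s = (N/2) lg N and d = lg N, the constant part is (sd + s)/2 and the residual part is (sd − s)/2.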

Power Consumption Model

Value of $\mathrm{cap}_{\mathrm{eff}}$ for different prefix circuits
• S(N): $\frac{N(N-1)}{2}\, c_0 + \frac{(N-1)(N-2)}{2}\, c'$, which is $O(N^2)$
• DC(N): $\frac{N}{4}\left((\lg N)^2 + \lg N\right) c_0 + \frac{N}{4}\left((\lg N)^2 - \lg N\right) c'$, which is $O(N \lg^2 N)$
• BK(N): $\left[1 + \frac{3}{2} N \lg N - \frac{1}{2}\left(2N + (\lg N)^2 + \lg N\right)\right] c_0 + \left[3\left(1 + \frac{N}{2} \lg \frac{N}{2}\right) - \frac{1}{2}\left(3N + \left(\lg \frac{N}{2}\right)^2 + \lg \frac{N}{2}\right)\right] c'$, which is $O(N \lg N)$
• LFk(N): bounded by the DC(N) and BK(N) expressions above, which is $O(N \lg^2 N)$

• SN(N) and SL(N): closed-form expressions in $c_0$ and $c'$ stated in terms of the parameters $N_1$ and $N_2$; each is $O(N \lg N)$
• LYD(N): closed-form expression in $c_0$ and $c'$ stated in terms of the parameters $N_1$, $N_3$, and $N_4$; it is $O(N \lg N)$
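As a worked evaluation, the sketch below computes the constant and residual parts from the closed forms for S(N), DC(N), and BK(N). It assumes N is a power of two with N ≥ 4; the function names and the capacitance values $c_0 = 1$ and $c' = 0.5$ are hypothetical.

```python
def lg(n):
    """Base-2 logarithm of a power of two."""
    return n.bit_length() - 1

def parts_serial(n):
    return n * (n - 1) // 2, (n - 1) * (n - 2) // 2

def parts_dc(n):
    return n * (lg(n) ** 2 + lg(n)) // 4, n * (lg(n) ** 2 - lg(n)) // 4

def parts_bk(n):
    m = lg(n) - 1  # lg(N/2)
    k_part = 1 + 3 * n * lg(n) // 2 - (2 * n + lg(n) ** 2 + lg(n)) // 2
    r_part = 3 * (1 + (n // 2) * m) - (3 * n + m ** 2 + m) // 2
    return k_part, r_part

def cap_eff(parts, c0=1.0, c_prime=0.5):
    """Combine (Kcap_eff, Rcap_eff) using hypothetical capacitance values."""
    k_part, r_part = parts
    return c0 * k_part + c_prime * r_part

N = 32
values = {name: cap_eff(f(N)) for name, f in
          [("S", parts_serial), ("DC", parts_dc), ("BK", parts_bk)]}
```

For N = 32 and these capacitances, the serial circuit switches the most capacitance and BK the least of the three, in line with the stated orders of growth.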

Simulated Power Consumption

Parameter Values
• Number of inputs, N = 8, 16, 32, 64
• Binary operation is XOR gate
• Supply voltage, VDD = 2.8 V
• Simulated power consumption computed by averaging power consumption from many input vectors
• Simulated power consumption compared to analytical model for seven prefix circuits

Simulated Power Consumption

Comparison of analytical model and simulation results for N = 32

[Figure: two bar charts, titled Simulation and Analytical Model, over the prefix circuits BK, Snir, SL, LYD, DC, LF0, LF1, Serial. Vertical axes: Power Consumption (W) and Normalized Power Consumption. Horizontal axis: Prefix Circuit.]

Simulated Power Consumption

Comparison of analytical model and simulation results for parallel prefix circuits

[Figure: two charts, titled Simulation and Analytical Model, for BK, Snir, SL, LYD, DC, LF0, LF1 at 8, 16, 32, and 64 bits. Vertical axes: Power Consumption (W) and Normalized Power Consumption. Horizontal axis: Number of Bits.]

Power-speed Trade-off

• Assuming the speed of a circuit to be inversely proportional to the circuit’s depth is appropriate for comparing circuits with the same supply voltage
• However, scaling the supply voltage can also be an effective way of making a power-speed trade-off

$P_{\mathrm{switching}} = \mathrm{cap}_{\mathrm{eff}} \, V_{DD}^2 \, f$

• Reducing voltage decreases power consumption but generally increases circuit delay

Power-speed Trade-off

[Figure: curves for BK, Snir, SL, LYD, DC, LF0, LF1 plotted against Supply Voltage (V) from 1.2 to 3.2; the vertical axis is labeled Depth.]

A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Kluwer Academic Publishers, 1995.

Power-speed Trade-off

Modified Depth-Based Delay Model (depth-based delay model scaled with empirical delay measurements)

[Figure: Normalized Delay versus Supply Voltage (V) from 1.2 to 3.2 for BK, Snir, SL, LYD, DC, LF0, LF1.]

Power-speed Trade-off

Comparison of modified depth-based delay model and simulation results for N = 64

[Figure: two panels, titled Modified Depth-Based Delay Model and Simulation, for BK, Snir, SL, LYD, DC, LF0, LF1 versus Supply Voltage (V) from 1.2 to 3.2. Vertical axes: Normalized Delay and Delay (us).]

Power-speed Trade-off

Power-Speed Trade-off Example
• Assume maximum acceptable delay is 6.4 µs
• Power reduction of about 1.6 times can be obtained without speed loss by using LYD prefix circuit compared with using the divide-and-conquer prefix circuit

[Figure: two panels for BK, Snir, SL, LYD, DC, LF0, LF1 versus Supply Voltage (V) from 1.2 to 3.2. One panel shows Power Consumption (W), with the values 2.25 W and 1.44 W marked; the other shows Delay (us).]
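The selection procedure behind such an example can be sketched as follows: for each circuit, pick the lowest supply voltage that still meets the delay bound, then compare switching power at that voltage. Everything numeric here is hypothetical. The delay shape V/(V − Vt)² is a common first-order CMOS approximation assumed for illustration, not the slides' empirically scaled model, and the threshold voltage, depths, and capacitances are made-up values.

```python
def normalized_delay(vdd, vt=0.7):
    """Assumed first-order delay shape; decreases as vdd rises above vt."""
    return vdd / (vdd - vt) ** 2

def lowest_voltage(depth, max_delay, voltages, unit_delay=1.0):
    """Smallest voltage whose modeled delay meets max_delay, or None."""
    for v in sorted(voltages):
        if depth * unit_delay * normalized_delay(v) <= max_delay:
            return v
    return None

def power_at_bound(depth, cap_eff, max_delay, voltages, freq=1.0):
    """Return (voltage, cap_eff * V^2 * f) at the lowest feasible voltage."""
    v = lowest_voltage(depth, max_delay, voltages)
    return None if v is None else (v, cap_eff * v ** 2 * freq)

VOLTAGES = [v / 10 for v in range(12, 33, 2)]  # 1.2 V to 3.2 V, as on the plot axes

# Hypothetical circuits: a shallow one that switches more capacitance,
# and a deeper one that switches less.
shallow = power_at_bound(depth=6, cap_eff=400.0, max_delay=10.0, voltages=VOLTAGES)
deep = power_at_bound(depth=10, cap_eff=250.0, max_delay=10.0, voltages=VOLTAGES)
```

In this made-up case the shallow circuit can run at a lower voltage, yet the deeper circuit still consumes slightly less power because it switches less capacitance. The marked values 2.25 W and 1.44 W give a ratio of 2.25 / 1.44 ≈ 1.56, consistent with the stated reduction of about 1.6 times.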

Conclusions
• Analytical power consumption model was developed and applied to seven prefix circuits
• Accuracy of analytical power consumption model verified with PSpice simulations
• Modified depth-based delay model proposed to account for propagation delay dependency on supply voltage value
• Example of power-speed trade-off design provided
• Future Work
  • effect of pipelining on power-speed trade-off
  • effect of fan-out on delay