Power-speed Trade-off in Parallel Prefix Circuits
ITCom 2002 High-Performance Pervasive Computing Conference, Boston, MA, July 29 – August 2, 2002
S. Vanichayobon, S. K. Dhall, S. Lakshmivarahan, J. K. Antonio
School of Computer Science, University of Oklahoma, www.cs.ou.edu
Outline
• Overview of Prefix Circuits
• Overview of Power Consumption in CMOS Circuits
• Power Consumption Model for Prefix Circuits
• Simulated Power Consumption for Prefix Circuits
• Power-speed Trade-off for Prefix Circuits
• Conclusions
Overview of Prefix Circuits
• The prefix problem is to compute
$y_i = x_1 \cdot x_2 \cdots x_{i-1} \cdot x_i$ for $i = 1, 2, \ldots, N$
where "·" denotes an associative binary operation
• Prefix circuits are used in a number of areas, including
  • carry-look-ahead adders
  • ranking
  • radix sort
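As a small illustration of the prefix problem, the sketch below computes all N prefixes one after another for an arbitrary associative operation. The function name `serial_prefix` is hypothetical and is not taken from the slides; it only mirrors the definition above.

```python
from operator import add, xor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def serial_prefix(xs: Sequence[T], op: Callable[[T, T], T]) -> List[T]:
    """Return [x1, x1.x2, ..., x1.x2...xN] for an associative operation op."""
    out: List[T] = []
    for x in xs:
        # The first prefix is x1 itself; each later prefix extends the previous one.
        out.append(x if not out else op(out[-1], x))
    return out


sums = serial_prefix([1, 2, 3, 4], add)      # running sums
parities = serial_prefix([1, 0, 1, 1], xor)  # running XOR of bits
```

This uses N − 1 applications of the operation, one after the other, which corresponds to a serial evaluation order.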
Structure and Notation for a Prefix Circuit
[Figure: a 4-input prefix circuit. Input nodes ($x_i = i$) are labeled 1 to 4, output nodes ($y_i = 1{:}i$) are labeled 1:1 to 1:4, and the operation nodes are arranged on levels 1 and 2.]
Basic Definitions for Prefix Circuits
• The circuit size is the total number of operation nodes
• The circuit depth is the length of the longest path
• The circuit fan-in is the maximum fan-in of all nodes
• The circuit fan-out is the maximum fan-out of all nodes
Known Lower Bounds for N-Input Prefix Circuits
• $\text{size} \ge N - 1$
• $\text{depth} \ge \lg N$
• $\text{size} + \text{depth} \ge 2N - 2$
Comparison of Serial and Parallel Prefix Circuits
[Figure: two 4-input prefix circuits with inputs 1 to 4 and outputs 1:1 to 1:4. The serial prefix circuit has size = 3 and depth = 3 (levels 1 to 3). The parallel prefix circuit has size = 4 and depth = 2 (levels 1 and 2).]
Prefix Circuits Considered
• Serial Prefix Circuit (S)
• Divide and Conquer Parallel Prefix Circuit (DC)
• Ladner and Fischer Parallel Prefix Circuit (LF), 1980
• Brent and Kung Parallel Prefix Circuit (BK), 1982
• Snir Parallel Prefix Circuit (SN), 1986
• Lakshmivarahan, Yang and Dhall Parallel Prefix Circuit (LYD), 1994
• Lin and Shin Parallel Prefix Circuit (SL), 1999
Prefix Circuit | Size | Depth | Fan-out
S(N) | $N - 1$ | $N - 1$ | $2$
DC(N) | $(N/2)\lg N$ | $\lg N$ | $(N/2) + 1$
BK(N) | $2N - \lg N - 2$ | $2\lg N - 2$ | $\lg N + 1$
LF0(N) | $4N - F(5 + \lg N) + 1$ | $\lg N + k$ | $(N/2) + k$
LFk(N) | $2N - \lg N - 2$ | $2\lg N - 2$ | $\lg N + 1$
SN(N) | $2N - 2 - \text{depth}$ | $\max(\lg N,\ 2\lg N - 2) \le \text{depth} \le N - 1$ | $\lg N + 1$
SL(N) | $2N - 2 - \text{depth}$ | $2\lg N - 5 \le \text{depth} \le 2\lg N - 3$ | $\lg N + 1$
LYD(N) | $2N - 2 - \text{depth}$ | $2\lg N - 6 \le \text{depth} \le 2\lg N - 3$ | $2\lg N - 2$
$k + 1$ when $k \ge \lg N - 2$
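The S(N), DC(N), and BK(N) rows can be checked against the known lower bounds stated earlier. The sketch below is only an illustration: the helper name `size_depth` is hypothetical, N is assumed to be a power of two with N ≥ 4, and the formulas are the ones listed for those three circuits.

```python
from typing import Tuple


def size_depth(circuit: str, n: int) -> Tuple[int, int]:
    """Size and depth of S(N), DC(N), or BK(N) for N a power of two, N >= 4."""
    if not (n >= 4 and bin(n).count("1") == 1):
        raise ValueError("n must be a power of two and at least 4")
    lg_n = n.bit_length() - 1  # exact lg N for a power of two
    if circuit == "S":
        return n - 1, n - 1
    if circuit == "DC":
        return (n // 2) * lg_n, lg_n
    if circuit == "BK":
        return 2 * n - lg_n - 2, 2 * lg_n - 2
    raise ValueError("unknown circuit: " + circuit)


def meets_lower_bounds(circuit: str, n: int) -> bool:
    """Check size >= N-1, depth >= lg N, and size + depth >= 2N-2."""
    size, depth = size_depth(circuit, n)
    lg_n = n.bit_length() - 1
    return size >= n - 1 and depth >= lg_n and size + depth >= 2 * n - 2
```

For example, `size_depth("BK", 8)` gives size 11 and depth 4, so size + depth = 15, one more than the bound 2N − 2 = 14.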
Overview of Power Consumption
Sources of Power Consumption in CMOS
• Leakage current
• Dynamic capacitance charging current
  • most important for CMOS
  • dependent on clock frequency
  • dependent on signal activity
• Transient current
• The power consumption due to switching is given by
$P_{\text{switching}} = cap_{\text{eff}} \, V_{DD}^{2} \, f$
where
  • $V_{DD}$ is the supply voltage
  • $f$ is the clock frequency
  • $cap_{\text{eff}}$ is the effective switched capacitance, $cap_{\text{eff}} = a_f \, C_L$
  • $C_L$ is the circuit load capacitance
  • $a_f$ is the circuit switching activity factor*
*Note: without consideration of glitching, $0 \le a_f \le 1$; with consideration of glitching, it is possible to have $a_f > 1$
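The two relations above can be evaluated directly. The sketch below is a minimal illustration; the function names and the numeric values (activity factor, capacitance, voltage, frequency) are hypothetical and are not taken from the slides.

```python
def effective_capacitance(activity_factor: float, load_capacitance: float) -> float:
    """cap_eff = a_f * C_L (a_f may exceed 1 when glitching is counted)."""
    return activity_factor * load_capacitance


def switching_power(cap_eff: float, vdd: float, freq: float) -> float:
    """P_switching = cap_eff * VDD^2 * f, in watts for SI inputs."""
    return cap_eff * vdd ** 2 * freq


# Hypothetical operating point: a_f = 0.5, C_L = 2 pF, VDD = 2.0 V, f = 1 MHz.
cap = effective_capacitance(0.5, 2e-12)
power = switching_power(cap, 2.0, 1e6)
```

Because the voltage enters squared, halving the supply voltage at fixed effective capacitance and frequency cuts the switching power to one quarter.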
Examples to illustrate the effect of glitching. Assume:
• signals q, r, s, t are synchronously clocked
• non-zero propagation delay at each node
• load capacitance at each node is $c_0$
For each circuit below, the load capacitance is $C_L = 3c_0$
(a) Chained Implementation: q+r, then q+r+s, then q+r+s+t. The three nodes see 1, 2, and 3 transitions, so $cap_{\text{eff}} = (1 + 2 + 3)\,c_0 = 6c_0 = 2C_L$
(b) Tree Implementation: q+r and s+t in parallel, then q+r+s+t. The three nodes see 1, 1, and 2 transitions, so $cap_{\text{eff}} = (1 + 1 + 2)\,c_0 = 4c_0 = (4/3)\,C_L$
Worst case analysis
• Input signals cause node outputs to transition at every cycle
• A node at level i experiences i transitions (includes glitches)
• Assuming constant load capacitance $c_0$ at each node, the effective switched capacitance of a circuit is
$cap_{\text{eff}} = \sum_{i=1}^{\text{depth}} i \, w_i \, c_0$
where $w_i$ is the number of nodes at level i
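The worst-case sum can be evaluated from the number of nodes on each level. The sketch below uses a hypothetical helper, `worst_case_cap_eff`, and applies it to a three-node chain (one node per level) and a three-node tree (two nodes on level 1, one on level 2), taking $c_0 = 1$ as a unit.

```python
from typing import Sequence


def worst_case_cap_eff(nodes_per_level: Sequence[int], c0: float = 1.0) -> float:
    """Sum of i * w_i * c0 over levels i = 1..depth, where w_i = nodes_per_level[i-1]."""
    return c0 * sum(i * w for i, w in enumerate(nodes_per_level, start=1))


chained = worst_case_cap_eff([1, 1, 1])  # levels 1, 2, 3 with one node each
tree = worst_case_cap_eff([2, 1])        # two nodes on level 1, one on level 2
```

With three nodes of load $c_0$ each, the chain gives $6c_0$ and the tree gives $4c_0$, so the shallower arrangement switches less capacitance under this model.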
Power Consumption Model
• Load capacitance of a node with fan-out of k is modeled by $c_0 + (k - 1)\,c'$
[Figure: a node with fan-out = k, annotated with $c_0$ and $c'$.]
• Calculation of effective switched capacitance for a prefix circuit is divided into two parts, a constant part (the $c_0$ term) and a residual part (the $c'$ term):
$cap_{\text{eff}}(N) = c_0 \, Kcap_{\text{eff}}(N) + c' \, Rcap_{\text{eff}}(N)$
$cap_{\text{eff}}(N) = c_0 \, Kcap_{\text{eff}}(N) + c' \, Rcap_{\text{eff}}(N)$
Constant part:
$Kcap_{\text{eff}}(N) = \sum_{i=1}^{\text{depth}(N)} i \, w_i$
where $w_i$ is the number of nodes at level i
Residual part:
$Rcap_{\text{eff}}(N) = \sum_{i=1}^{\text{depth}(N)} i \sum_{j=2}^{k} (j - 1) \, w_{ij}$
where $w_{ij}$ is the number of nodes at level i having fan-out equal to j
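Both parts can be computed from a list of operation nodes, each described by its level and its fan-out. The sketch below is an illustration under stated assumptions: the function names are hypothetical, a node's fan-out is assumed to count the connection to its own output node, and the two 4-input node lists are hand-derived examples rather than data from the slides.

```python
from typing import Iterable, List, Tuple

Node = Tuple[int, int]  # (level, fan_out)


def constant_and_residual(nodes: Iterable[Node]) -> Tuple[int, int]:
    """Return (Kcap_eff, Rcap_eff): sums of level and level * (fan_out - 1)."""
    k_part = 0
    r_part = 0
    for level, fan_out in nodes:
        k_part += level                  # each node contributes c0 * level
        r_part += level * (fan_out - 1)  # extra fan-out contributes c' * level each
    return k_part, r_part


def cap_eff(nodes: Iterable[Node], c0: float, c_prime: float) -> float:
    """cap_eff = c0 * Kcap_eff + c' * Rcap_eff."""
    k_part, r_part = constant_and_residual(list(nodes))
    return c0 * k_part + c_prime * r_part


# Assumed node lists for N = 4 (level, fan-out):
SERIAL_4: List[Node] = [(1, 2), (2, 2), (3, 1)]       # 1:2, 1:3, 1:4 in a chain
DC_4: List[Node] = [(1, 3), (1, 1), (2, 1), (2, 1)]   # 1:2, 3:4, 1:3, 1:4
```

For these lists the serial circuit gives (6, 3) and the divide-and-conquer circuit gives (6, 2), so the two differ only in the residual part at N = 4.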
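Example calculation of $cap_{\text{eff}}$ for the DC(N) circuit
$Kcap_{\text{eff}}(2) = 1$
$Kcap_{\text{eff}}(N) = 2\,Kcap_{\text{eff}}\!\left(\frac{N}{2}\right) + \frac{N}{2}\left(\text{depth}\!\left(\frac{N}{2}\right) + 1\right)$
[Figure: DC(N) built from two DC(N/2) blocks, one on inputs 1 to N/2 and one on inputs N/2+1 to N, with outputs 1:2, …, 1:N/2, 1:N/2+1, …, 1:N-1, 1:N.]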
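Example calculation of $cap_{\text{eff}}$ for the DC(N) circuit
$Rcap_{\text{eff}}(2) = 0$
$Rcap_{\text{eff}}(N) = 2\,Rcap_{\text{eff}}\!\left(\frac{N}{2}\right) + \frac{N}{2}\,\text{depth}\!\left(\frac{N}{2}\right)$
[Figure: DC(N) built from two DC(N/2) blocks, one on inputs 1 to N/2 and one on inputs N/2+1 to N, with outputs 1:2, …, 1:N/2, 1:N/2+1, …, 1:N-1, 1:N.]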
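Example calculation of $cap_{\text{eff}}$ for the DC(N) circuit
$cap_{\text{eff}}(N) = c_0 \, Kcap_{\text{eff}}(N) + c' \, Rcap_{\text{eff}}(N)$
Constant part: $Kcap_{\text{eff}}(2) = 1$ and $Kcap_{\text{eff}}(N) = 2\,Kcap_{\text{eff}}\!\left(\frac{N}{2}\right) + \frac{N}{2}\left(\text{depth}\!\left(\frac{N}{2}\right) + 1\right)$ give
$Kcap_{\text{eff}}(N) = \frac{N}{4}\left((\lg N)^2 + \lg N\right)$
Residual part: $Rcap_{\text{eff}}(2) = 0$ and $Rcap_{\text{eff}}(N) = 2\,Rcap_{\text{eff}}\!\left(\frac{N}{2}\right) + \frac{N}{2}\,\text{depth}\!\left(\frac{N}{2}\right)$ give
$Rcap_{\text{eff}}(N) = \frac{N}{4}\left((\lg N)^2 - \lg N\right)$
Therefore
$cap_{\text{eff}}(N) = \frac{N}{4}\left((\lg N)^2 + \lg N\right) c_0 + \frac{N}{4}\left((\lg N)^2 - \lg N\right) c'$
Equivalently, with $d = \text{depth}(N) = \lg N$ and $s = \text{size}(N) = \frac{N}{2}\lg N$:
$cap_{\text{eff}}(N) = \frac{sd + s}{2}\, c_0 + \frac{sd - s}{2}\, c'$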
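The DC(N) recurrences and their closed forms can be cross-checked numerically. The sketch below assumes N is a power of two and depth(N/2) = lg(N/2); the function names are hypothetical.

```python
def lg(n: int) -> int:
    """Exact base-2 logarithm of a power of two."""
    if n <= 0 or bin(n).count("1") != 1:
        raise ValueError("n must be a power of two")
    return n.bit_length() - 1


def dc_kcap(n: int) -> int:
    """Kcap(2) = 1; Kcap(N) = 2*Kcap(N/2) + (N/2) * (depth(N/2) + 1)."""
    if n == 2:
        return 1
    half = n // 2
    return 2 * dc_kcap(half) + half * (lg(half) + 1)


def dc_rcap(n: int) -> int:
    """Rcap(2) = 0; Rcap(N) = 2*Rcap(N/2) + (N/2) * depth(N/2)."""
    if n == 2:
        return 0
    half = n // 2
    return 2 * dc_rcap(half) + half * lg(half)


def dc_closed_forms(n: int) -> tuple:
    """Closed forms (N/4)((lg N)^2 + lg N) and (N/4)((lg N)^2 - lg N)."""
    d = lg(n)
    return n * (d * d + d) // 4, n * (d * d - d) // 4
```

For N = 8 both routes give Kcap = 24 and Rcap = 12, which also equal (sd + s)/2 and (sd − s)/2 with s = 12 and d = 3.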
Value of $cap_{\text{eff}}$ for different prefix circuits
• S(N): $\frac{N(N-1)}{2}\, c_0 + \frac{(N-1)(N-2)}{2}\, c'$, which is $O(N^2)$
• DC(N): $\frac{N}{4}\left((\lg N)^2 + \lg N\right) c_0 + \frac{N}{4}\left((\lg N)^2 - \lg N\right) c'$, which is $O(N \lg^2 N)$
• BK(N): $\left[1 + \frac{3}{2} N \lg N - \frac{1}{2}\left(2N + (\lg N)^2 + \lg N\right)\right] c_0 + \left[3\left(1 + \frac{N}{2}\lg\frac{N}{2}\right) - \frac{1}{2}\left(3N + \left(\lg\frac{N}{2}\right)^2 + \lg\frac{N}{2}\right)\right] c'$, which is $O(N \lg N)$
• LFk(N): the value lies between the DC(N) expression and the BK(N) expression given above, which is $O(N \lg^2 N)$
• SN(N) and SL(N): $\left[1 + \frac{3}{2} N_1 \lg N_1 - \frac{1}{2}\left(2N_1 + (\lg N_1)^2 + \lg N_1\right) + N_2 \lg N_1 - \lg N_1 + \frac{N_2^2 - N_2}{2}\right] c_0 + \left[3\left(1 + \frac{N_1}{2}\lg\frac{N_1}{2}\right) - \frac{1}{2}\left(3N_1 + \left(\lg\frac{N_1}{2}\right)^2 + \lg\frac{N_1}{2}\right) + \lg N_1\,(N_2 - 1) + \frac{1}{2}\left(N_2^2 - 3N_2 + 2\right)\right] c'$, which is $O(N \lg N)$
• LYD(N): [closed-form expression in $N_1$, $N_3$, and $N_4$; not recoverable from the slide text], which is $O(N \lg N)$
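The closed forms for S(N), DC(N), and BK(N) can be evaluated side by side. The sketch below is an illustration only: the function names are hypothetical, N is assumed to be a power of two with N ≥ 4, and the BK(N) coefficients follow one reading of the slide's expression, which should be treated as an assumption.

```python
from typing import Tuple


def _lg(n: int) -> int:
    if not (n >= 4 and bin(n).count("1") == 1):
        raise ValueError("n must be a power of two and at least 4")
    return n.bit_length() - 1


def serial_parts(n: int) -> Tuple[int, int]:
    """(Kcap, Rcap) for S(N): N(N-1)/2 and (N-1)(N-2)/2."""
    _lg(n)  # validates n
    return n * (n - 1) // 2, (n - 1) * (n - 2) // 2


def dc_parts(n: int) -> Tuple[int, int]:
    """(Kcap, Rcap) for DC(N): (N/4)(lg^2 N + lg N) and (N/4)(lg^2 N - lg N)."""
    d = _lg(n)
    return n * (d * d + d) // 4, n * (d * d - d) // 4


def bk_parts(n: int) -> Tuple[int, int]:
    """(Kcap, Rcap) for BK(N), as read from the slide (assumed reading)."""
    d = _lg(n)
    h = d - 1  # lg(N/2)
    k_part = (2 + 3 * n * d - 2 * n - d * d - d) // 2
    r_part = (6 + 3 * n * h - 3 * n - h * h - h) // 2
    return k_part, r_part


def cap_eff(parts: Tuple[int, int], c0: float, c_prime: float) -> float:
    """cap_eff = c0 * Kcap + c' * Rcap."""
    return c0 * parts[0] + c_prime * parts[1]
```

For N = 8 these give (28, 21) for S, (24, 12) for DC, and (23, 12) for BK; the gap between DC and BK widens as N grows, in line with the $O(N \lg^2 N)$ and $O(N \lg N)$ orders.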
Simulated Power Consumption
Parameter Values
• Number of inputs, N = 8, 16, 32, 64
• Binary operation is an XOR gate
• Supply voltage, VDD = 2.8 V
• Simulated power consumption computed by averaging power consumption from many input vectors
• Simulated power consumption compared to analytical model for seven prefix circuits
Comparison of analytical model and simulation results for N = 32
[Figure: two bar charts, labeled Simulation and Analytical Model, over the prefix circuits BK, Snir, SL, LYD, DC, LF0, LF1, and Serial. Y-axes: Power Consumption (W) and Normalized Power Consumption. X-axis: Prefix Circuit.]
Comparison of analytical model and simulation results for parallel prefix circuits
[Figure: two plots, labeled Analytical Model and Simulation, for BK, Snir, SL, LYD, DC, LF0, and LF1. Y-axes: Power Consumption (W) and Normalized Power Consumption. X-axis: Number of Bits (8, 16, 32, 64).]
Power-speed Trade-off
• Assuming the speed of a circuit to be inversely proportional to the circuit's depth is appropriate for comparing circuits with the same supply voltage
• However, scaling the supply voltage can also be an effective way of making a power-speed trade-off:
$P_{\text{switching}} = cap_{\text{eff}} \, V_{DD}^{2} \, f$
• Reducing voltage decreases power consumption but generally increases circuit delay
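To see how effective capacitance and supply voltage combine in such a trade-off, the sketch below compares two designs running at the same clock frequency. The function name `relative_power` and the sample values are hypothetical; only the switching-power relation above is taken from the slides.

```python
def relative_power(cap_eff_a: float, vdd_a: float,
                   cap_eff_b: float, vdd_b: float) -> float:
    """Ratio P_a / P_b of switching power at equal clock frequency.

    The frequency cancels, leaving (cap_eff_a * vdd_a^2) / (cap_eff_b * vdd_b^2).
    """
    return (cap_eff_a * vdd_a ** 2) / (cap_eff_b * vdd_b ** 2)


# Same circuit at two supply voltages: 2.8 V versus 1.4 V.
same_circuit = relative_power(1.0, 2.8, 1.0, 1.4)

# A circuit with twice the effective capacitance but run at 1.4 V instead of 2.8 V.
bigger_but_slower_supply = relative_power(2.0, 1.4, 1.0, 2.8)
```

The second case shows why a circuit with more switched capacitance can still draw less power if its smaller depth lets it meet the delay target at a lower voltage.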
A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Kluwer Academic Publishers, 1995.
[Figure: curves for BK, Snir, SL, LYD, DC, LF0, and LF1. Y-axis: Depth. X-axis: Supply Voltage (V), 1.2 to 3.2.]
Modified Depth-Based Delay Model (depth-based delay model scaled with empirical delay measurements)
[Figure: curves for BK, Snir, SL, LYD, DC, LF0, and LF1. Y-axis: Normalized Delay. X-axis: Supply Voltage (V), 1.2 to 3.2.]
Comparison of modified depth-based delay model and simulation results for N = 64
[Figure: two panels for BK, Snir, SL, LYD, DC, LF0, and LF1 over Supply Voltage (V), 1.2 to 3.2. Panel titles: Modified Depth-Based Delay Model and Simulation. Y-axes: Normalized Delay and Delay (us).]
Power-Speed Trade-off Example
• Assume maximum acceptable delay is 6.4 µs
• Power reduction of about 1.6 times can be obtained without speed loss by using the LYD prefix circuit instead of the divide-and-conquer prefix circuit
[Figure: two panels for BK, Snir, SL, LYD, DC, LF0, and LF1 over Supply Voltage (V), 1.2 to 3.2. Y-axes: Power Consumption (W), with annotations at 2.25 W and 1.44 W, and Delay (us).]
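The quoted reduction can be checked from the two power annotations in the figure. Assuming the 2.25 W annotation belongs to the divide-and-conquer circuit and the 1.44 W annotation to the LYD circuit, each at a supply voltage that meets the 6.4 µs delay limit (this pairing is an assumption, since only the annotations survive here), the ratio is:

```latex
\[
\frac{P_{\mathrm{DC}}}{P_{\mathrm{LYD}}}
  = \frac{2.25\ \mathrm{W}}{1.44\ \mathrm{W}}
  = 1.5625 \approx 1.6
\]
```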
Conclusions
• Analytical power consumption model was developed and applied to seven prefix circuits
• Accuracy of analytical power consumption model verified with PSpice simulations
• Modified depth-based delay model proposed to account for propagation delay dependency on supply voltage value
• Example of power-speed trade-off design provided
• Future Work
  • effect of pipelining on power-speed trade-off
  • effect of fan-out on delay