System-Level Power and Thermal Modeling and Analysis by Orthogonal Polynomial based Response Surface Approach (OPRS)

Janet M. Wang, Bharat Srinivas, Dongsheng Ma, Charlie Chung-Ping Chen, and Jun Li
University of Arizona at Tucson; University of Wisconsin at Madison

Abstract— This paper proposes a new statistical response surface based power estimation technique. The new approach is able to include a number of parameters such as multiple Vdd, multiple Vth and gate sizing parameters. It has both deterministic ability and statistical ability. The deterministic ability allows the new model to provide optimal design parameters for power reduction. The statistical ability can be used to model the process variation impact on power.

I. INTRODUCTION

The design of future nanometer chips requires power and thermal estimation that is simultaneously accurate and efficient, as well as an optimization methodology. Thermal and power properties are mutually dependent: temperature depends on the power density, and power is a function of temperature. Due to the high complexity of thermal and power models at the device level, it is very difficult to accurately compute the system level power and temperature with device level resolution. For example, the static power consumption is a function of the leakage current. Leakage current, in turn, includes sub-threshold and gate-oxide leakage and is a function of multiple parameters such as threshold voltage Vth, supply voltage Vdd, oxide thickness tox, channel length L, channel width W and temperature T. The functional dependence of leakage current differs from parameter to parameter. While the leakage current changes exponentially with temperature and Vth [10], it has a polynomial dependence on the channel length L. In addition to varying design parameters, within-die and die-to-die process variations [11] and input vector alterations [3] also affect the leakage current. We conclude that the estimation of leakage power alone requires methodologies that handle multiple parameters, are aware of process variations, and exploit both deterministic and statistical techniques.

A sizable fraction of recent EDA research has been devoted to full chip (i.e. system level) power and thermal estimation and optimization. These efforts aim to discover the dependence of power and temperature on process variations and circuit input vector patterns [1], [2], [3], [7], [10], [13], [14] through either deterministic approaches (analytical formulas for the power and temperature) or statistical approaches. Most of the existing research makes use of an iterative method that cycles between system level power estimation and system level thermal analysis. Any change in the chip temperature requires a full-chip power simulation. Likewise, power fluctuations require a full-chip thermal analysis. This inefficient iterative framework is impractical for system level power optimization and for analyzing temperature control mechanisms [4], [5], [6], [11].

This paper proposes a new orthogonal polynomial response surface method (OPRS). The new approach characterizes design metrics such as timing, power and temperature at the device and gate levels. The proposed method estimates system level power and thermal fluctuations efficiently through a top-down recursive framework. OPRS has

0-7803-9254-X/05/$20.00 ©2005 IEEE.

four advantages over the existing approaches:

1) The orthogonal polynomials approximate exponential functions with a reasonably low order. Even though leakage power increases exponentially with temperature and Vth, polynomials can still model the dependence of leakage current on multiple parameters.
2) The orthogonal polynomial variables may be either deterministic or random. Therefore, we can apply OPRS to model the impact of process variation.
3) The roots of the orthogonal polynomials provide the sampling points for OPRS. The order of the orthogonal polynomial limits the number of necessary sampling points. Therefore, even with multiple variation sources, the complexity of the model remains manageable.
4) The new approach is a response surface type method that requires NO additional information about the structure of gates or functional blocks.

The accuracy of OPRS depends on the models or tools that provide the response surface. Given accurate models at the device level, OPRS provides system level performance prediction with device level resolution.

The paper is organized as follows. Section II introduces the foundation of the orthogonal polynomial response surface method. Section III demonstrates the top-down recursive OPRS framework. Section IV uses leakage power as an example to explain the OPRS characterization procedure. Section V explains the OPRS based thermal model and its decoupling from power analysis. Section VI discusses two applications of the proposed method. Experimental results are included in Section VII. Section VIII summarizes our conclusions.

II. ORTHOGONAL POLYNOMIAL BASED RESPONSE SURFACE METHOD (OPRS)

In this section, we explain the concept of the orthogonal polynomial based response surface method. First, we consider the method for the deterministic case. Then, in the second part, we extend the approach to the statistical case.

A. Deterministic orthogonal polynomial based response surface method (DOPRS)

Consider a simple model with one input $x$ and one output $y$,

$$y = f(x) \tag{1}$$

where $f$ is a known explicit or implicit function. The output $y$ can be approximated using a set of polynomials

$$\hat{y} = \hat{f}(x) = \sum_{i=0}^{N} \alpha_i x^i \tag{2}$$

Since the approximation of $y$ may have errors compared with the original function, we define the residual of the above $N$th order model as follows:

$$R_N(\alpha, x) = \hat{y}(x) - y(x) \tag{3}$$

where $\alpha$ represents the coefficient vector $\{\alpha_i\}$. The goal of the response surface method is to construct the model $\hat{f}(x)$ with minimum error. Depending on how the residual is minimized, there are four categories of response surface methods: the collocation method, the subdomain method, the method of moments (or Galerkin) and the least squares method. While the least squares method is used by the majority of research work, the remaining approaches are rarely mentioned. Villadsen and Michelsen [12] demonstrated the pros and cons of each method. To summarize, the least squares method provides accurate results, but the choice of sample points may affect the accuracy enormously. The subdomain and moment methods are the most accurate but have efficiency issues. The collocation method is the most efficient but the least accurate of the four. It appears that we have to choose between the easily applicable but in general rather inaccurate collocation method and the trustworthy but cumbersome subdomain or moment methods. The orthogonal polynomial based collocation method provides an approach that can be as accurate as the moment and subdomain methods while still having good efficiency [17].

The orthogonal polynomial based collocation method approximates $y$ by using a set of specified orthogonal polynomials $g_i(x)$:

$$\hat{y} = \sum_{i=0}^{N} \alpha_i g_i(x) \tag{4}$$

where $N + 1$ is the order of approximation. The set of coefficients in the approximation $\hat{y}$ can be calculated by requiring that the residual be orthogonal to each member of $g_i(x)$:

$$\int_x R(\alpha, x)\, g_i(x)\, dx = 0, \quad i = 0, \ldots, N \tag{5}$$

We can apply Gaussian quadrature approximation to (5) and get

$$\int_x R(\alpha, x)\, g_i(x)\, dx \approx \sum_{j=1}^{M} v_j R(\alpha, x_j)\, g_i(x_j) = 0 \tag{6}$$

for $i = 0, \ldots, N$, where $v_j$ and $x_j$ are the weights and abscissas respectively. If $v_j$ and $g_i(x_j)$ have the same sign and are not zero for all $i$ and $j$, (6) can be approximated by

$$R(\alpha, x_j) = y(x_j) - \hat{y}(x_j) = y(x_j) - \sum_{i=0}^{N} \alpha_i g_i(x_j) = 0, \quad j = 1 \ldots M \tag{7}$$

From (7), one notices that if $R(\alpha, x_j) = 0$, the residue in (5) will always be at its minimum, which is zero in this case. This also implies that the orthogonal polynomial based response surface method does not require the complete definition of the residual function. As long as we can calculate the value of the residual at several given values of the inputs, we will be able to obtain the zero residue. The sample points (also called collocation points) are the zeros of the orthogonal polynomials. Villadsen and Michelsen [12] proved that by choosing the collocation points as the zeros of orthogonal polynomials, we can achieve accuracy as high as that of the subdomain method, the method of moments and Galerkin's method. Given the $y$ values at the points $x_j$, $y(x_j)$, and the orthogonal polynomial values at the points $x_j$, $g_i(x_j)$, (7) becomes an $M \times (N + 1)$ linear system

$$\begin{bmatrix} g_0(x_1) & g_1(x_1) & \cdots & g_N(x_1) \\ g_0(x_2) & g_1(x_2) & \cdots & g_N(x_2) \\ \cdots & \cdots & \cdots & \cdots \\ g_0(x_M) & g_1(x_M) & \cdots & g_N(x_M) \end{bmatrix} \begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \cdots \\ \alpha_N \end{bmatrix} = \begin{bmatrix} y(x_1) \\ y(x_2) \\ \cdots \\ y(x_M) \end{bmatrix} \tag{8}$$

Solving the above linear system requires $M = N + 1$. That is, the required number of collocation points is decided by the system order $N + 1$. For multiple input cases, assume $I$ inputs, so that every $x_j$ represents an $I$-entry vector. The above linear system becomes $(M \times I) \times ((N + 1) \times I)$. Therefore, we only need $M = N + 1$ collocation point sets. The complexity of the approach does not increase dramatically with the number of inputs or variation sources. For example, if the approximation order is $N = 3$ and we have 40 variation sources, the number of collocation point sets is $N + 1 = 4$. Each collocation point set includes 40 entries that represent the 40 variation sources. The linear system is of size $160 \times 160$. According to [12], the collocation points are selected as the roots of the orthogonal polynomial $g_{N+1}(x)$.

B. Statistical orthogonal polynomial response surface method (SOPRS)

In statistical models, where variables and functions are random, we can extend the deterministic method to a statistical response surface method. Consider a simple model with one random variable as input, denoted $x(w)$, where $w$ represents the sampling space, and one output $y$,

$$y = f(x(w)) \tag{9}$$

where $f$ is a known explicit or implicit random function. The output $y$ can be approximated using a set of specified orthogonal functions $g_i(x(w))$:

$$\hat{y} = \sum_{i=0}^{N} \beta_i g_i(x(w)) \tag{10}$$

where $N + 1$ is the order of approximation. The orthogonal relationship defined by (5) is transformed to the probabilistic space by incorporating the Probability Density Function (PDF) of the inputs, $p(x(w))$:

$$\int_{x(w)} p(x(w))\, R(\beta, x(w))\, g_i(x(w))\, dx(w) = 0 \tag{11}$$

Since the PDF $p(x(w))$ is always positive, (11) now becomes

$$R(\beta, x_j) = 0, \quad j = 0, 1 \ldots N \tag{12}$$

As in deterministic DOPRS, the sampling points $x_j$ are simply the roots of the $(N + 1)$ order orthogonal polynomial $g_{N+1}(x)$. The $g_i(x)$ can be chosen as a special group of orthogonal polynomials according to the application. For example, if the input $x$ is Gaussian distributed, we choose Hermite orthogonal polynomials.

III. RECURSIVE ORTHOGONAL POLYNOMIAL BASED RESPONSE SURFACE (OPRS) FRAMEWORK

The recursive application of the OPRS approach across the hierarchical structure brings device level resolution to the system level. The variation sources such as L, tox and Temperature can be treated as global variables. The following pseudo code for the OPRS model shows that the OPRS models recursively and implicitly call their own function in a cascading manner down the system design scales: system to component to cell/gate.

Performance = OPRS(L, tox, Temperature, ...) % system level call
{

    If (not library cell) {
        component1_performance = OPRS(L, tox, Temperature, ...); % component level call
        component2_performance = OPRS(L, tox, Temperature, ...); % component level call
        ...
        performance = component1_performance + component2_performance + ...;
        call OPRS model construction (performance, L, tox, Temperature);
        return OPRS model and Performance at system level;
    } else { % cell level OPRS model already exists
        cell1_performance = OPRS_AND(L, tox, Temperature); % assume cell is an AND gate
        cell2_performance = OPRS_OR(L, tox, Temperature); % assume cell is an OR gate
        ...
        performance = cell1_performance + cell2_performance + ...
        return OPRS model and performance at component level
    }
}

Fig. 1. OPRS Hierarchical Models: system-level diagram

Figure 1 shows the design scales (system → component → gate) in a circuit diagram that corresponds to the pseudo-code. On its first application at the system (or chip) level, the OPRS subroutine calls the OPRS model for each of the system's components. The OPRS model at the component level calls, in turn, the OPRS model at the cell/gate level. The component level timing and power depend on the timing and power of all the component's cells/gates. OPRS models for the cell/gate library have already been built analytically. SPICE (or another device level simulator) provides a precise form of the OPRS cell/gate-level analytical equation that yields the timing and power from the input temperature, the geometry parameters of interconnects and devices, and the input signal delays. The "golden data" that stems from the OPRS cell/gate model enables the construction of the OPRS component model. The same gate-component cycle repeats at the component-system level to yield an OPRS system model. In both cycles, device level simulators such as SPICE provide the golden data that is critical for the construction of the model at the successive, larger scale. It is therefore possible to proceed from cell/gate level performance to component level performance and finally to system level performance.

Note that the output of the recursive OPRS is the system level analytical equation in terms of L, tox and Temperature. This top-down recursive procedure automatically identifies the collocation points of L, tox and Temperature for the system level performance, or system response surface. Moreover, since the resulting system level OPRS model is a function of temperature, it predicts the power fluctuations when temperature changes without a full-chip power analysis. The recursive framework can be further extended to future new device based designs simply by replacing SPICE with nanodevice or molecular level simulators. The resulting system level OPRS model will have molecular level resolution. Note that the complexity of the recursive framework depends on the number of components in the system and the level of abstraction. For example, at the cell/gate level, the cells/gates are precharacterized, and the recursive framework has a complexity of O(N), with N the number of cells/gates.

The proposed framework, if used in power estimation, relies on leakage power estimation (Section IV), dynamic power estimation for components (Section V), and power dissipation on the interconnections of the components (Section V). The end result of the hierarchical framework is that power is a function of the global process variation variables (e.g. L and tox) and Temperature, P(L, tox, Temperature). This is an explicit formula which describes the impact of temperature and global process variation on power.

IV. LEAKAGE POWER ESTIMATION BY DOPRS AND SOPRS

In this section, we first describe the leakage power characterization by DOPRS. Then we explain the SOPRS model for estimating the impact of process variation on leakage power. Leakage power is generally estimated as

$$P_{leak} = I_{leak} V_{dd} \tag{13}$$

In this paper, we model $P_{leak}$ as a function of multiple variables. That is,

$$P_{leak} = f(V_{dd}, V_{th}, W, t_{ox}, T) \tag{14}$$

The function $f$ in (14) is unknown or implicit. Every variable has a range of values it can take. In addition, some of them are deterministic variables, such as Vdd, Vth and temperature T. Others may be statistical, such as gate width w and gate oxide thickness tox. We use DOPRS (deterministic OPRS) to model power with deterministic variables, and SOPRS to model power with statistical variables.

We start with the deterministic variables Vdd, Vth and temperature T. Assuming each is equally likely to take any value in its range, we can model the variables as uniformly distributed in the defined region. We choose Legendre polynomials as the orthogonal polynomials, since they are found to give good accuracy with uniformly distributed inputs [12]. The first few Legendre polynomials (in monic form) are

$$P_0(z) = 1, \quad P_1(z) = z, \quad P_2(z) = z^2 - \frac{1}{3} \tag{15}$$

$$P_3(z) = z^3 - \frac{3}{5} z, \quad P_4(z) = z^4 - \frac{6}{7} z^2 + \frac{3}{35} \tag{16}$$

To map the variables under investigation to the $[-1, 1]$ variable range of the Legendre polynomials, we use

$$x = \frac{b - a}{2} z + \frac{a + b}{2} \tag{17}$$

assuming the variables of interest are of range $[a, b]$. Let $x$ represent the variable vector, each entry of which corresponds to one variable. For instance, $x_1$ represents multiple Vdd values, $x_2$ multiple Vth values, and $x_3$ and $x_4$ represent gate size (W) and temperature. Using the standard normalization of the Legendre polynomials, which differs from (15) and (16) only by constant factors absorbed into the coefficients, the third order model is

$$\hat{P}_{leak} = a_0 + a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + a_{14} x_4 + 0.5 a_{21}(3x_1^2 - 1) + 0.5 a_{22}(3x_2^2 - 1) + 0.5 a_{23}(3x_3^2 - 1) + 0.5 a_{24}(3x_4^2 - 1) + 0.5 a_{31}(5x_1^3 - 3x_1) + 0.5 a_{32}(5x_2^3 - 3x_2) + 0.5 a_{33}(5x_3^3 - 3x_3) + 0.5 a_{34}(5x_4^3 - 3x_4) \tag{18}$$

We can rewrite the above equation in vector form as follows:

$$\hat{P}_{leak} = a_0 + a_1^t \vec{x} + 0.5\, a_2^t (3\vec{x}^2 - 1) + 0.5\, a_3^t (5\vec{x}^3 - 3\vec{x}) \tag{19}$$

where $a_0$ is the 0th order coefficient in the Legendre expansion, $a_1^t$ represents all the 1st order coefficients $(a_{11}, a_{12}, a_{13}, a_{14})$, $a_2^t$ the 2nd order coefficients and $a_3^t$ the 3rd order coefficients. $a_0$, $a_1$, $a_2$ and $a_3$ are the unknowns. We use

HSPICE with the BSIM3 model to estimate the leakage current. In the BSIM3 model, the leakage current model includes both subthreshold leakage current and gate leakage current. The subthreshold leakage current is modeled as

$$I_{sub} = \mu_0 C_{ox} \frac{W}{L} (m - 1) V_T^2\, e^{\frac{V_g - V_{th}}{m V_T}} \left(1 - e^{\frac{-V_{DS}}{V_T}}\right) \tag{20}$$

where

$$m = 1 + \frac{C_{dm}}{C_{ox}} = 1 + \frac{3 t_{ox}}{W_{dm}} \tag{21}$$

$V_T$ is the thermal voltage; $T$ is the temperature; $C_{ox}$ is the gate oxide capacitance; $\mu_0$ is the zero-bias mobility; $m$ is the subthreshold swing coefficient (also called the body coefficient); $W_{dm}$ is the maximum depletion layer width; $t_{ox}$ is the gate oxide thickness; $C_{dm}$ is the capacitance of the depletion layer.

Fig. 2. Model the Interconnect Network as Blackbox by OPRS

The gate leakage current is modeled as

$$I_{gate} = A\, E_{ox}^2\, e^{-B / E_{ox}} \tag{22}$$

where $E_{ox}$ is the electric field across the oxide; A and B are physical parameters.

Let $P_{leak\_golden}$ denote the leakage power value obtained from HSPICE or any other trusted tool or measurement. We choose the sample points of the variables according to the roots of the Legendre polynomials. A simple heuristic for the selection of sample points is as follows. For terms involving two or more random variables, the values of the corresponding variables are set to the values of the roots of the higher order polynomial, and so on. If more points corresponding to a set of terms are available than needed, the points closer to the origin are preferred, as they fall in regions of higher probability. Further, when there is still an unresolved choice, the collocation points are selected such that the overall distribution of the collocation points is more symmetric about the origin. If still more points are available, the collocation point is selected randomly. The advantage of this method is that the behavior of the model is captured reasonably well at points corresponding to regions of high probability. Furthermore, singularities can be avoided in the resulting linear equations because the number of collocation points is equal to the number of unknown coefficients.

For a 3rd order approximation as in (19), only three sampling points are needed for each variable because we only have three unknown coefficients. The roots of the 3rd order Legendre polynomial are $0$, $-\sqrt{3/5}$ and $\sqrt{3/5}$. By (17), we can derive the real values of each design parameter. At each sampling point, HSPICE with the BSIM3 model is used to generate $P_{leak\_golden}$. Therefore, the residual becomes

$$R_3(\alpha, x_{cj}) = \hat{P}_{leak} - P_{leak\_golden} = 0 \tag{23}$$

which leads to

$$a_0 + a_1^t \vec{x} + 0.5\, a_2^t (3\vec{x}^2 - 1) + 0.5\, a_3^t (5\vec{x}^3 - 3\vec{x}) = P_{leak\_golden} \tag{24}$$
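The collocation step behind (23) and (24) can be sketched for a single variable as follows. This is a minimal illustration, not the authors' implementation: it follows the Section II convention of N + 1 roots of the order N + 1 Legendre polynomial for an order N model, and the golden model `toy_leakage`, its constants and the function names are hypothetical stand-ins for an HSPICE run.

```python
import numpy as np
from numpy.polynomial import legendre as leg


def fit_legendre_collocation(golden, a, b, order):
    """Return Legendre coefficients c_0..c_order fitted by collocation on [a, b]."""
    # Collocation points: roots of P_{order+1} on [-1, 1].
    z, _ = leg.leggauss(order + 1)
    # Map z in [-1, 1] to the physical range [a, b], as in (17).
    x = 0.5 * (b - a) * z + 0.5 * (a + b)
    # Square system G c = y with G[j, i] = P_i(z_j): zero residual at each point.
    G = leg.legvander(z, order)
    return np.linalg.solve(G, golden(x))


def evaluate(coeffs, x, a, b):
    """Evaluate the fitted response surface at physical points x in [a, b]."""
    z = (2.0 * np.asarray(x) - (a + b)) / (b - a)
    return leg.legval(z, coeffs)


def toy_leakage(temp_c):
    """Hypothetical golden model: leakage (nW) growing exponentially with T."""
    return 2.5 * np.exp(0.02 * (temp_c - 25.0))


# Third order model over an assumed 20 C to 40 C temperature range.
coeffs = fit_legendre_collocation(toy_leakage, 20.0, 40.0, order=3)
grid = np.linspace(20.0, 40.0, 201)
rel_err = np.abs(evaluate(coeffs, grid, 20.0, 40.0) - toy_leakage(grid)) / toy_leakage(grid)
print(f"max relative error of the cubic model: {rel_err.max():.2e}")
```

Only four golden-model evaluations are needed here, which is the point of the method: the expensive simulator is run at the collocation points only, and the polynomial is used everywhere else in the range.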

With (19) and (8), we solve the linear equation to obtain the coefficients $a_0$, $a_1$, $a_2$ and $a_3$. Table I shows the coefficient vectors of (19). Here, Column 1 shows the different types of basic gates modeled. Column 2 displays the coefficient $a_0$, and Columns 3, 4 and 5 display the coefficient vectors $a_1$, $a_2$ and $a_3$ respectively. $\vec{x}$ includes Vdd, Vth and temperature T. This is the reason why the coefficient vectors $a_1$ to $a_3$ all have three entries. $a_0$ is the 0th order coefficient and is not associated with any entry of $\vec{x}$. Hence, it always has a single entry and is not in vector format. The methodology could be extended to include the input vector as an input variable (X) to the DOPRS approach. The inclusion of input vectors can be explained with the example of a 3-input NAND gate (referred to as NAND3). We map each of the possible input combinations of a combinational gate to a distribution

Fig. 3. Decouple the Iterations between Power and Thermal Analysis

distrib(a, b), where a and b are the lower and upper bounds. For a NAND3 gate, there are 8 possible input combinations. Each of these input combinations corresponds to one leakage value under a given supply voltage, threshold voltage and gate geometry. We first assume a uniform distribution U(a, b) for the inputs, with a = 0 and b = 7 (8 different states altogether) for the NAND3 gate. Then, using (17), the roots of the 3rd order Legendre polynomial are mapped to the nearest integer in the input vector design space of U(a, b). For input vectors with different probabilities (nonuniform distribution), other types of polynomials are suggested by [17]. With the input vector variable (X) included, the leakage power becomes a function of the variable set (Vdd, Vth, T, X). We use HSPICE with the BSIM3 model to estimate the leakage power at the chosen collocation points for (Vdd, Vth, T, X), as described in Section II. Table III shows the results of modeling input vectors using the DOPRS approach. Here, Column 1 shows the different input vectors. Columns 2 and 3 display the supply voltage and threshold voltage (in volts) respectively. Column 4 gives the temperature. The DOPRS leakage power results (in watts) are in Column 5, and they are compared with the SPICE results in Column 6. The error percentage is shown in Column 7. The average error percentage is 3.5%.

Any given circuit can be sub-divided into standard cells such as INVERTER, NAND, NOR etc. The proposed DOPRS method can be used for leakage precharacterization of each type of standard cell. The precharacterization is a "one-time" cost. Once the leakage equation is obtained for a given standard cell, it is stored in the database. These equations are available for REUSE in any given circuit leakage power calculation or optimization.

Now let us consider transistor width (w) and gate-oxide thickness (tox) as the process variation parameters. w and tox are random variables. Assume the distributions of w and tox are $w \sim N(\mu_1, \sigma_1)$ and $t_{ox} \sim N(\mu_2, \sigma_2)$. The input random variables are represented using the standard random variables $\xi_1$, $\xi_2$. Because the Hermite polynomial is found to be the best choice for the collocation method when the variables are normally distributed [12], in this paper we use a 3rd order Hermite polynomial expansion for the leakage current:

$$P_{leak\_with\_pv} = a_0 + a_1^t \vec{\xi} + a_2^t (\vec{\xi}^2 - 1) + a_3^t (\vec{\xi}^3 - 3\vec{\xi}) \tag{25}$$

The calculation of the unknown coefficients for input random variables follows a methodology analogous to that described for DOPRS models. Table II shows the computed coefficient vectors for (25).
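For one Gaussian variable, the Hermite collocation fit of (25) and the moments it yields can be sketched as below. This is an illustrative sketch under stated assumptions: the golden model `toy_leakage` and its constants are hypothetical (chosen to echo a 4 nm mean oxide thickness with 20% standard deviation), and N + 1 roots of the order N + 1 Hermite polynomial are used as in Section II. The mean and standard deviation follow from the orthogonality of the probabilists' Hermite polynomials under the standard normal density.

```python
import math
import numpy as np
from numpy.polynomial import hermite_e as herm


def fit_hermite_collocation(golden, mu, sigma, order=3):
    """Fit y ~ sum_i a_i He_i(xi), xi ~ N(0, 1), for the input x = mu + sigma * xi."""
    xi, _ = herm.hermegauss(order + 1)     # roots of He_{order+1}
    G = herm.hermevander(xi, order)        # G[j, i] = He_i(xi_j)
    return np.linalg.solve(G, golden(mu + sigma * xi))


def mean_and_std(coeffs):
    """Moments from orthogonality: E[He_i He_k] = i! if i == k, else 0."""
    norms = np.array([math.factorial(i) for i in range(len(coeffs))], dtype=float)
    variance = float(np.sum(coeffs[1:] ** 2 * norms[1:]))
    return float(coeffs[0]), math.sqrt(variance)


def toy_leakage(t_ox_nm):
    """Hypothetical golden model: normalized leakage falling exponentially with t_ox."""
    return np.exp(-0.25 * (t_ox_nm - 4.0))


coeffs = fit_hermite_collocation(toy_leakage, mu=4.0, sigma=0.8)
mean, std = mean_and_std(coeffs)
print(f"mean = {mean:.4f}, std = {std:.4f}")
```

Because this toy response is lognormal, its exact mean and standard deviation are known in closed form, which makes it a convenient check of the four-point fit before replacing `toy_leakage` with a simulator call.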

TABLE I
DOPRS METHOD COEFFICIENT VECTORS FOR (19) FOR INPUT PARAMETERS OF Vdd, Vth AND T FOR A 0.18µ TECHNOLOGY

Gate     a0       a1                            a2                            a3
INV      571.25   (895.34, -3606.1, 277.62)     (-187.59, 273.83, -253.52)    (-1241.1, -2529.1, -649.3)
NAND2    167.39   (200.7, 418.171.8, 76.09)     (-13.49, 114.18, -31.41)      (131.78, -304.02, -77.66)
NAND3    180.67   (219.3, 498.01, 86.04)        (-20.25, 125.73, -38.38)      (-145.32, -375.75, -85.43)
NAND4    182.25   (355.28, -924.31, 99.57)      (-49.99, 148.84, -78.05)      (-237.69, -777.83, -101.14)
NOR2     128.57   (252.73, -620.37, 64.16)      (-31.01, 99.53, -52.35)       (-165.72, 519.35, -67.75)
NOR4     257.16   (505.45, -1240.7, 128.34)     (-62.04, 199.07, -104.7)      (-331.44, -1038.7, -135.5)

TABLE II
SOPRS METHOD COEFFICIENT VECTORS FOR (25) FOR INPUT PARAMETERS OF w AND tox FOR A 0.18µ TECHNOLOGY

Gate     a0       a1                   a2                  a3
INV      199.65   (-165.76, 171.36)    (13.87, -113.16)    (2.65, 106.11)
NAND2    17.17    (-8.23, 10.42)       (1.15, -5.69)       (0.2, 4.73)
NAND3    20.84    (-8.93, 11.42)       (1.28, -6.08)       (0.22, 4.97)
NAND4    26.41    (-11.89, 15.23)      (1.71, -8.11)       (0.29, 6.63)
NOR2     19.6     (-8.21, 10.42)       (1.14, -5.69)       (0.2, 4.73)
NOR4     38.43    (-15.97, 20.27)      (2.23, -11.07)      (0.39, 9.21)
XOR2     811.98   (-670.57, 694.84)    (56.54, -457.76)    (10.79, 428.67)

TABLE III
LEAKAGE POWER OBTAINED FOR A 3-INPUT NAND GATE FOR AN INPUT-VECTOR BASED DOPRS APPROACH FOR 0.18µ TECHNOLOGY

Input Vector   Vdd (V)   Vth (V)   T (C)   DOPRS (W)   SPICE (W)   Error (%)
000            1.5       0.5       25      2.72p       2.59p       5.02
001            1.5       0.5       25      2.61p       2.62p       -0.3
010            1.5       0.6       25      2.79p       2.75p       1.45
011            1.2       0.4       30      2.99p       2.99p       0
100            1.2       0.3       25      3.87p       3.99p       -0.5
101            1.2       0.5       35      4.17p       4.19p       -0.4
110            1.0       0.2       20      5.81p       5.82p       -0.1

V. SYSTEM LEVEL THERMAL ANALYSIS WITH VARIATIONS

In addition to the proposed leakage power estimation model, we also provide a dynamic power estimation methodology for both interconnect networks and gates. The basic principles are based on the OPRS method. For example, as shown in Figure 2, interconnect networks can be modeled as a blackbox with line width and line thickness as the variables of interconnect power dissipation. The power dissipation on the interconnect, $P_{int}(W_{line}, H_{line})$, is a function of the line width $W_{line}$ and the line thickness $H_{line}$. The same methodology can be applied to dynamic power estimation for gates. Note that the OPRS based methodology is completely independent of the underlying tools that provide the performance estimation at the collocation points. Therefore, we may use existing tools to provide power estimation for both the interconnect network (e.g. thermal-ADI [15]) and gates (e.g. Wattch version 1.02 [13]). Again, we only need to run the simulation at the collocation points.

The most popular way of modeling temperature is to average power dissipation over a window of time. By summing up the leakage power, interconnect power and dynamic power dissipated within a certain time period, we can obtain the temperature directly. Though this direct averaging approach captures localized heating, because OPRS based power estimation can be done at the granularity of on-chip components, it may fail to account for lateral coupling among components. In particular, when temperature is derived from instantaneous power dissipation, the thermal capacitance acts as a low-pass filter in translating power variation to temperature variation. Hence, it is important to use a thermal model instead of a power metric directly. We use Hotspot as the thermal analysis tool [13]. Hotspot is an RC model based thermal simulator. Including the proposed power dissipation model, the KCL equations for heat flow throughout the chip become

$$C_i^{th} \frac{dT_i}{dt} = I_i^{th}(T_i, W, L, t_{ox}) + \sum_j \frac{T_i - T_j}{R_{ij}^{th}} \tag{26}$$

Node $i$ in the thermal circuit represents a self-heating element (e.g. a gate or a segment of wire) at location (x, y, z) on the chip. The thermal capacitance $C_i^{th}$ is proportional to the local specific heat, the thermal resistance $R_{ij}^{th}$ is inversely proportional to the local thermal conductivity, and the thermal current $I_i^{th}$ is the power of the $i$th element, which has been obtained by OPRS based power estimation with the process variation variables w, L and tox. $I_i^{th}$ may be a nonlinear function. An accurate way is to solve the nonlinear system described in (26) with certain boundary conditions. In the current paper, we directly provide the average $I_i^{th}$ for each component over the variational range of temperature ($T_i$), w, L and tox. The experimental results show good accuracy compared with a Monte Carlo based approach. The end result of including variations in the thermal model is that we are able to decouple the transient iterations (iterations at every time step) between power analysis and thermal analysis, as in Figure 3, by incorporating the changing trend in the power and thermal models respectively.
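A small numerical sketch may help show why an explicit power model P(T) removes the need for a full-chip power simulation at every thermal time step. The example below integrates an RC thermal network of the kind described by (26) with forward Euler. Everything in it is an assumption made for illustration: the two-block network, all numerical values, the stand-in `block_power` function (which plays the role of an OPRS power model), the added path to ambient, and the sign convention, which takes the conduction term as heat flowing into node i, (T_j - T_i)/R_ij. It is not the Hotspot model.

```python
import numpy as np


def heat_flow_rhs(T, power_fn, G, T_amb, g_amb):
    """Net heat flow into each node: power + lateral coupling + path to ambient."""
    lateral = G @ T - G.sum(axis=1) * T      # sum_j (T_j - T_i) / R_ij
    return power_fn(T) + lateral + g_amb * (T_amb - T)


def simulate(power_fn, C, G, T_amb, g_amb, T0, dt, steps):
    """Forward-Euler integration of C_i dT_i/dt = heat_flow_rhs_i."""
    T = np.array(T0, dtype=float)
    for _ in range(steps):
        T = T + dt * heat_flow_rhs(T, power_fn, G, T_amb, g_amb) / C
    return T


# Hypothetical two-block example (all values are illustrative).
C = np.array([0.05, 0.05])                   # thermal capacitance, J/K
G = np.array([[0.0, 0.5], [0.5, 0.0]])       # lateral conductance 1/R_ij, W/K
g_amb = np.array([1.0, 1.0])                 # conductance to ambient, W/K
T_amb = 45.0                                 # ambient temperature, deg C
P_dyn = np.array([10.0, 2.0])                # dynamic power, W


def block_power(T):
    """Stand-in for an OPRS power model P(T): dynamic plus T-dependent leakage."""
    return P_dyn + 1.0 * np.exp(0.02 * (T - T_amb))


T_final = simulate(block_power, C, G, T_amb, g_amb, T0=[45.0, 45.0], dt=1e-3, steps=2000)
print("steady-state temperatures (deg C):", np.round(T_final, 2))
```

Each time step evaluates only the closed-form `block_power`, so the temperature-dependent leakage is updated inside the thermal loop without calling a separate power simulator.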

VI. APPLICATIONS OF OPRS MODEL

A. Application 1: Joint Power Optimization with Genetic Algorithm

We store cell/gate level OPRS models as part of the cell libraries. The genetic algorithm uses one chromosome to represent multiple variables. For example, multiple Vdd, multiple Vth and w occupy one bit of the chromosome. For each chromosome value, we have a corresponding leakage power value obtained from the OPRS model. Figure 4 shows a chromosome with each bit storing multiple values [5]. The initial value of the chromosome represents the starting design parameters of the gate, function block or chip. After evaluating leakage power with the gate OPRS model as well as the recursive OPRS model for each gate/block, we reach a power value for the chip. Changing the chromosome settings leads to another power value. This procedure continues until we find a minimum power value.

Fig. 4. Chromosome of Genetic Algorithm

B. Application 2: Power Management for Network-on-Chip Design

A function block, shown in Figure 5, has two power states: active and standby. When the function block is idle, i.e. in standby mode, the power consumed is mainly due to leakage current. Given a fixed work load $\theta$, we use block1 to process the tasks. When block1's temperature approaches the upper bound ($T_{up}$), we switch to block2. The $\tau_{total}$ of Figure 6 is the total time taken to finish all the tasks; $\tau_1$ and $\tau_2$ are the times for function block1 to rise from a temperature $T_{o1}$ to $T_{up}$ and for function block2 to rise from a temperature $T_{o2}$ to $T_{up}$ respectively. $\tau_{cycle}$ is the sum of $\tau_1$ and $\tau_2$. Thus,

$$\tau_1 = f_1(T_{o1}, \omega_1, T_{up}) \triangleq f_1(\omega_1), \quad \tau_2 = f_2(T_{o2}, \omega_2, T_{up}) \triangleq f_2(\omega_2), \quad \tau_{cycle} = \tau_1 + \tau_2 \tag{27}$$

where $\omega_1$ and $\omega_2$ are the clock frequencies of block1 and block2 respectively.

Fig. 5. Network on Chip (NOC) System Blocks

Fig. 6. Scheduling of System Blocks

The total energy ($E_{total}$) for the total work load $\theta$ is

$$E_{total} = (E_1 + E_2) N = \left( \int_0^{\tau_1} P_1(\omega_1(t))\, dt + P_{leak1}(T)(\tau_{cycle} - \tau_1) + \int_{\tau_1}^{\tau_{cycle}} P_2(\omega_2(t))\, dt + P_{leak2}(T)\, \tau_1 \right) N \tag{28}$$

Minimizing the total energy means

$$\min(E_{total}) \quad \text{subject to} \quad \omega_{min1} \le \omega_1 \le \omega_{max1}, \quad \omega_{min2} \le \omega_2 \le \omega_{max2}, \quad T(block_1) \le T_{up}, \quad T(block_2) \le T_{up} \tag{29}$$

where $P_1(\omega_1(t))$ and $P_2(\omega_2(t))$ are the power functions during the active mode, expressed as functions of the clock frequency for block1 and block2 respectively. $P_{leak1}(T)$ and $P_{leak2}(T)$ are the leakage power functions during the standby state as functions of temperature. $T(block_1)$ and $T(block_2)$ are the temperatures of block1 and block2. $N = \tau_{total} / \tau_{cycle}$ is the number of toggling times. Since the constraints $T(block_1) \le T_{up}$ and $T(block_2) \le T_{up}$ are already taken into account by (27), the temperature constraints in (29) can be eliminated. Hence, minimizing the energy consumed reduces to choosing the frequencies $\omega_1$ and $\omega_2$ for the two blocks. The simplified model for the total energy consumed by both blocks at $\omega_1$ and $\omega_2$ is

$$E_{total} = \left(P_1(\omega_1) f(\omega_1) + P_{leak1} f(\omega_2)\right) N_1 + \left(P_2(\omega_2) f(\omega_2) + P_{leak2} f(\omega_1)\right) N_2 \tag{30}$$

such that $\omega_{min1} \le \omega_1 \le \omega_{max1}$ and $\omega_{min2} \le \omega_2 \le \omega_{max2}$, where $N_1$ and $N_2$ are the toggling times of block1 and block2:

$$N_1 = \frac{\theta_1}{\tau_1} = \frac{2\theta}{(\omega_1 + \omega_2) f(\omega_1)}, \quad N_2 = \frac{\theta_2}{\tau_2} = \frac{2\theta}{(\omega_1 + \omega_2) f(\omega_2)} \tag{31}$$

If the two blocks are identical and $\omega_1 = \omega_2$, then the total energy becomes

$$E_{total} = \left(P(\omega) f(\omega) + P_{leak} f(\omega)\right) \frac{\theta}{\omega f(\omega)} \tag{32}$$

such that $\omega_{min} \le \omega \le \omega_{max}$, where $\omega_{min} = \frac{\theta}{2 f(\omega)}$. The results of this application are shown in the Experimental section.

VII. EXPERIMENTAL RESULTS

BSIM3 model parameters for a 0.13 µm technology are used for all the test cases. Table IV shows the results obtained using the proposed technique for fundamental gates. The design space ranges from 1 V to 2 V for Vdd, from 0.1 V to 0.7 V for Vth and from 20 C to 40 C for T. Values for any given set of gate parameters are computed using the modeled equation at a random point in the design space. The result is compared to HSPICE to verify the flexibility and accuracy of the model in the design space. The DOPRS and SPICE columns give the leakage power in W (p and n denote pico and nano) computed from DOPRS and HSPICE respectively. The average error over the different cases is 3%.

TABLE IV
LEAKAGE POWER OBTAINED AT DIFFERENT VALUES OF Vth, Vdd AND TEMPERATURE IN THE DESIGN SPACE FOR 0.13 µm TECHNOLOGY

| Gate Type | Input Vector | Vdd (V) | Vth (V) | T (C) | DOPRS (W) | SPICE (W) | Error % |
|---|---|---|---|---|---|---|---|
| INV | 1 | 1.99 | 0.375 | 20 | 213.34p | 207.63p | 2.75 |
| | | 1.5 | 0.335 | 25 | 2.66n | 2.79n | -4.65 |
| | | 1.23 | 0.42 | 30 | 142.66p | 142.73p | -0.05 |
| NAND2 | 01 | 1.2 | 0.375 | 20 | 19.57p | 19.83p | -1.33 |
| | | 1.8 | 0.375 | 25 | 52.16p | 54.71p | -4.88 |
| | | 1.99 | 0.37 | 30 | 24.31p | 24.45p | -0.57 |
| NAND3 | 101 | 1.23 | 0.4 | 20 | 10.60p | 10.32p | 2.75 |
| | | 1.5 | 0.38 | 25 | 25.64p | 25.44p | 0.8 |
| | | 1.85 | 0.35 | 30 | 22.11p | 20.85p | 6.02 |
| NAND4 | 1011 | 1.3 | 0.4 | 20 | 121.11p | 113.83p | 6.4 |
| | | 1.8 | 0.35 | 25 | 1.82n | 1.75n | 3.8 |
| | | 1.5 | 0.32 | 30 | 1.85n | 1.82n | 1.6 |
| NOR2 | 10 | 1.77 | 0.375 | 20 | 2.38n | 2.40n | -0.74 |
| | | 1.77 | 0.375 | 25 | 1.38n | 1.27n | 7.9 |
| | | 1.99 | 0.33 | 30 | 4.13n | 3.98n | 3.6 |
| NOR4 | 0000 | 1.22 | 0.4 | 20 | 2.51n | 2.39n | 5.32 |
| | | 1.5 | 0.35 | 25 | 13.53n | 13.22n | 2.35 |
| | | 1.8 | 0.38 | 30 | 3.98n | 3.71n | 7.5 |

The proposed DOPRS technique is also applied to ISCAS'85 circuits. Table V shows the results of applying the recursive framework explained in Section III. We compare these results with HSPICE to assess the accuracy of the model. The first two columns give the benchmark circuit and the number of gates in that circuit. The DOPRS and HSPICE columns give the leakage power in nW for DOPRS and HSPICE respectively, with an average error of 6%.

TABLE V
COMPARISON OF LEAKAGE POWER FOR ISCAS85 BENCHMARK CIRCUITS AFTER GA BASED DOPRS APPROACH AND HSPICE

| Benchmark | # of Gates | DOPRS (nW) | HSPICE (nW) | Error (%) |
|---|---|---|---|---|
| C17 | 6 | 0.073 | 0.078 | -5.78 |
| C432 | 160 | 92.47 | 99.88 | -7.42 |
| C499 | 202 | 29.99 | 32.61 | -8.03 |
| C880 | 320 | 33.45 | 36.44 | -7.42 |
| C1355 | 506 | 21.05 | 22.80 | -7.67 |
| C1908 | 880 | 82.9 | 90.61 | -8.41 |
| C5315 | 2307 | 269.60 | 293.77 | -8.23 |
| C6288 | 2416 | 69.93 | 73.29 | -4.58 |
| 16×16 Multiplier | 1456 | 123.5 | 134.05 | -7.86 |

Table VI shows the results obtained using the SOPRS technique under process variations for fundamental gates. w and tox are taken as the gate parameters subject to process variations. The mean of the PDF of w is taken to be 0.13 µm and its standard deviation to be 20%. The mean of the PDF of tox is taken to be 4 nm and its standard deviation to be 20%. A Gaussian distribution is assumed for w and tox. Table VI lists the mean and standard deviation of the leakage for each gate, obtained using the SOPRS technique for the specified w and tox variations. The SOPRS Mean and SPICE Mean columns give the means computed using the SOPRS and HSPICE approaches. The Mean Error column gives the percentage error in the means; the average error is approximately 4% over the different cases. Similarly, the SOPRS STD, SPICE STD and STD Error columns give the standard deviations for SOPRS and HSPICE and the percentage error between them.

TABLE VI
MEAN AND STANDARD DEVIATIONS OF VARIOUS GATES UNDER PROCESS VARIATIONS OF w AND tox

| Gate Type | SOPRS Mean (pA) | SPICE Mean (pA) | Mean Error (%) | SOPRS STD | SPICE STD | STD Error (%) |
|---|---|---|---|---|---|---|
| INV | 17.94 | 18.13 | -1.05 | 13.8 | 12.98 | 6.32 |
| NAND2 | 17.49 | 18.85 | -7.21 | 17.12 | 15.83 | 8.15 |
| NAND3 | 20.82 | 21.71 | -4.09 | 18.06 | 16.97 | 6.42 |
| NAND4 | 26.63 | 29.48 | -9.66 | 24.57 | 22.48 | 9.29 |
| NOR2 | 19.65 | 21.76 | -9.69 | 16.68 | 15.35 | 8.66 |
| NOR4 | 38.78 | 41.56 | -6.69 | 32.89 | 29.98 | 9.71 |
| XOR2 | 207.84 | 222.9 | -6.75 | 168.23 | 157.65 | 6.7 |

The SOPRS technique is also applied to ISCAS'85 circuits to check its validity for the same w and tox variations. Table VII shows these SOPRS based results. The first two columns give the benchmark circuit and the number of gates in that circuit. The SOPRS and HSPICE columns give the leakage power in nW for SOPRS and HSPICE respectively. The results show that the error in leakage estimation under process variations is around 3%, so the SOPRS approach models leakage power under process variations with good accuracy.

Figure 7 and Figure 8 show the PDF curves of the leakage current for a 2-input NAND gate and a 2-input XOR gate respectively under w and tox process variations. The X-axis is the leakage current and the Y-axis shows the frequency of leakage current values obtained under w and tox variations. The curve drawn as a line shows the PDF obtained from the SOPRS technique and the curve drawn as dots shows the PDF obtained from HSPICE Monte Carlo results.

Figure 9 shows the static and dynamic power breakdown for minimal power optimized circuits without including the effect of process variations on the static leakage power. The X-axis lists the circuit names and the Y-axis shows the percentage of the total power. The white and black blocks represent the percentage of static power and dynamic power respectively. The static power for C17 is 1.1% of the total power, which matches the percentage reported in [5]. Figure 10 includes the effect of process variations on the static power for minimal power optimized circuits. As in Figure 9, the X-axis and Y-axis of Figure 10 represent the circuit names and the percentage of total power. The increase in the leakage power contribution to the total power due to process variations is shown as a white block in Figure 10.

Table VIII shows the unknown coefficients of (26) for different blocks of the Alpha EV6 processor. The HotSpot 2.0 simulator is used to obtain the collocation points [13]. The unknown coefficients of the equation are then calculated using OPRS. The equations obtained for the various blocks estimate the time taken for a block to rise from an initial temperature (To1) to an upper bound temperature (Tup) in (27) at the given frequency. The initial and upper bound temperatures depend on the type of the block. The equation for each block is modeled for a specific range of temperatures, depending on the type of the block, and for a frequency range of 4 GHz to 6 GHz. Table IX shows the temperature variation comparison results.
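To make the link between the time-to-rise function f(ω) of (27) and the identical-block energy model (32) concrete, the following is a minimal sketch of a frequency sweep over the 4 GHz to 6 GHz range. Everything numeric here is an assumption for illustration: the cubic active-power law, the constant leakage power, the stand-in `time_to_rise` function and the constants `THETA`, `P_LEAK` and `C_DYN` are hypothetical and are not taken from the paper or from Table VIII.

```python
import numpy as np

# All constants and functional forms below are hypothetical placeholders.
THETA = 1.0e9   # total workload in clock cycles (assumed)
P_LEAK = 10.0   # standby leakage power in W (assumed constant in temperature)
C_DYN = 0.04    # active-power coefficient in W/GHz^3 (assumed)


def active_power(omega_ghz):
    """Hypothetical active power P(omega) in W, cubic in frequency."""
    return C_DYN * omega_ghz ** 3


def time_to_rise(omega_ghz):
    """Stand-in for f(omega) in (27): seconds to heat from T_o to T_up."""
    return 1.0e-3 / omega_ghz ** 2


def total_energy(omega_ghz):
    """Energy in J following the structure of (32) for two identical blocks."""
    f = time_to_rise(omega_ghz)
    cycles_per_active_period = omega_ghz * 1.0e9 * f   # omega * f(omega)
    n_toggles = THETA / cycles_per_active_period       # theta / (omega f(omega))
    # Note: in this form f(omega) cancels algebraically between the two factors.
    return (active_power(omega_ghz) * f + P_LEAK * f) * n_toggles


def best_frequency(omega_min=4.0, omega_max=6.0, steps=2001):
    """Grid search for the frequency (GHz) that minimizes total_energy."""
    grid = np.linspace(omega_min, omega_max, steps)
    energies = total_energy(grid)
    i = int(np.argmin(energies))
    return float(grid[i]), float(energies[i])


if __name__ == "__main__":
    omega_star, e_star = best_frequency()
    print(f"best frequency: {omega_star:.3f} GHz, energy: {e_star:.3f} J")
```

With these assumed forms the trade-off is between active energy, which grows with frequency, and leakage energy, which shrinks as the workload finishes sooner. A fitted f(ω) and measured power functions would replace the placeholders.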

TABLE VII
COMPARISON OF THE MEAN OF LEAKAGE POWER UNDER PROCESS VARIATIONS FOR ISCAS85 BENCHMARK CIRCUITS FOR SOPRS METHOD AND HSPICE

| Benchmark | # of Gates | SOPRS (nW) | HSPICE (nW) | Error (%) |
|---|---|---|---|---|
| C17 | 6 | 0.104 | 0.113 | -7.21 |
| C432 | 160 | 137.19 | 136.56 | 0.4 |
| C499 | 202 | 42.23 | 42.62 | -0.9 |
| C880 | 320 | 55.99 | 52.94 | 5.75 |
| C1355 | 506 | 34.21 | 33.19 | 3.17 |
| C1908 | 880 | 139.78 | 131.81 | 6.10 |
| C5315 | 2307 | 451.87 | 426.05 | 6.06 |
| C6288 | 2416 | 103.28 | 104.38 | -1.04 |
| 16×16 Multiplier | 1456 | 167.49 | 172.72 | -3.03 |
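The Error (%) column of Table VII can be recomputed from the SOPRS and HSPICE columns. The short sketch below does this under the assumption that the error is defined as (SOPRS − HSPICE) / HSPICE, which is consistent with the signs in the table but is not stated explicitly in the text. Because the tabulated power values are rounded, the recomputed errors may differ slightly from the printed ones.

```python
# Values copied from Table VII (leakage power in nW).
BENCHMARKS = ["C17", "C432", "C499", "C880", "C1355",
              "C1908", "C5315", "C6288", "16x16 Multiplier"]
SOPRS_NW = [0.104, 137.19, 42.23, 55.99, 34.21, 139.78, 451.87, 103.28, 167.49]
HSPICE_NW = [0.113, 136.56, 42.62, 52.94, 33.19, 131.81, 426.05, 104.38, 172.72]


def percent_error(model, reference):
    """Signed relative error in percent (assumed convention)."""
    return 100.0 * (model - reference) / reference


def error_summary(model_values, reference_values):
    """Return per-circuit signed errors and their mean absolute value."""
    errors = [percent_error(m, r) for m, r in zip(model_values, reference_values)]
    mean_abs = sum(abs(e) for e in errors) / len(errors)
    return errors, mean_abs


if __name__ == "__main__":
    errors, mean_abs = error_summary(SOPRS_NW, HSPICE_NW)
    for name, err in zip(BENCHMARKS, errors):
        print(f"{name:>18s}: {err:+.2f} %")
    print(f"mean absolute error: {mean_abs:.2f} %")
```

The same two functions apply unchanged to the DOPRS and HSPICE columns of Table V.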

TABLE VIII
COEFFICIENTS OF (26) FOR DIFFERENT BLOCKS OF THE ALPHA EV6 PROCESSOR

| Block | a0 | a1 | a2 | a3 |
|---|---|---|---|---|
| IntReg | 258.493 | -75.2555 | 21.4488 | 10.9994 |
| | | 39.0566 | 46.7473 | -5.4997 |
| | | 112.4069 | 54.9969 | 94.5946 |
| D-Cache | 260.4235 | -110.978 | 27.2235 | 13.7492 |
| | | -57.1561 | -1.0999 | -1.0999 |
| | | 3.8104 | 0 | 0 |
| I-Cache | 292.048 | -150.5109 | 47.2973 | 31.8982 |
| | | -126.6959 | -2.7498 | -30.7983 |
| | | 97.1653 | -9.8994 | -144.0918 |
| Bpred | 259.0471 | -126.6959 | 40.1477 | 42.8976 |
| | | -89.5445 | 12.0993 | -4.9497 |
| | | 17.1468 | -1.0999 | 1.0999 |

TABLE IX
COMPARISON OF THE MEAN AND STD OF TEMPERATURE UNDER PROCESS VARIATIONS FOR ALPHA EV6 PROCESSOR BY SOPRS METHOD AND MONTE CARLO BASED HOTSPOT

| Blocks | Mean by SOPRS (C) | STD by SOPRS | Mean Error (%) | STD Error (%) |
|---|---|---|---|---|
| IntReg | 26 | 2.14 | 0.31 | 4.2 |
| D-Cache | 38 | 4.39 | 0.78 | 0.4 |
| I-Cache | 41 | 2.37 | 2.21 | -7.3 |
| Bpred | 23 | 3.4 | 0.43 | 4.71 |
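Tables VI, VII and IX all compare a mean and a standard deviation from the polynomial model against a Monte Carlo reference. The sketch below illustrates that kind of comparison on a toy problem. It is not the paper's SOPRS implementation: the single Gaussian parameter, the exponential `leakage` function with its constants, the second-order probabilists' Hermite basis and the choice of the roots of He3 as collocation points are all assumptions made for illustration, following common probabilistic collocation practice.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermevander


def leakage(xi, i0=20.0, k=0.3):
    """Hypothetical leakage (pA) as a function of a standard normal parameter xi."""
    return i0 * np.exp(-k * xi)


def fit_second_order_surrogate(response):
    """Fit y ~ c0*He0(xi) + c1*He1(xi) + c2*He2(xi) at three collocation points."""
    # Collocation points: roots of He3(xi) = xi^3 - 3*xi.
    points = np.array([-math.sqrt(3.0), 0.0, math.sqrt(3.0)])
    basis = hermevander(points, 2)          # columns He0, He1, He2 at each point
    return np.linalg.solve(basis, response(points))


def surrogate_moments(coeffs):
    """Mean and standard deviation from Hermite coefficients (E[He_n^2] = n!)."""
    mean = float(coeffs[0])
    variance = sum(math.factorial(n) * float(c) ** 2
                   for n, c in enumerate(coeffs) if n > 0)
    return mean, math.sqrt(variance)


def monte_carlo_moments(response, samples=200_000, seed=0):
    """Reference mean and standard deviation from direct sampling."""
    rng = np.random.default_rng(seed)
    y = response(rng.standard_normal(samples))
    return float(y.mean()), float(y.std())


if __name__ == "__main__":
    m_s, s_s = surrogate_moments(fit_second_order_surrogate(leakage))
    m_mc, s_mc = monte_carlo_moments(leakage)
    print(f"surrogate  : mean={m_s:.3f}  std={s_s:.3f}")
    print(f"Monte Carlo: mean={m_mc:.3f}  std={s_mc:.3f}")
```

The surrogate needs only three evaluations of the response, whereas the Monte Carlo reference uses many thousands of samples. That cost gap is the motivation for a response surface approach when each evaluation is a circuit or thermal simulation.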

Fig. 7. PDF of leakage current for a 2-input NAND gate under process variations of w and tox

Fig. 8. PDF of leakage current for a 2-input XOR gate under process variations of w and tox

Fig. 9. Power Breakdown for Power Optimized Circuits Without Considering Process Variations

Fig. 10. Power Breakdown for Power Optimized Circuits With Process Variations

VIII. CONCLUSION

This paper proposes a new statistical response surface based leakage power estimation technique. The new approach is able to include a number of parameters such as multiple Vdd, multiple Vth and gate sizing parameters. It has both deterministic and statistical ability. The deterministic ability allows the new model to provide optimal design parameters for power reduction. The statistical ability can be used to model the impact of process variation on leakage power.

REFERENCES

[1] R. Gu and M. Elmasry, "Power dissipation analysis and optimization of deep submicron CMOS digital circuits," IEEE Journal of Solid-State Circuits, vol. 31, no. 5, pp. 707-713, May 1996.
[2] Z. Chen, L. Wei and K. Roy, "Estimation of standby leakage power in CMOS circuits considering accurate modeling of transistor stacks," in Proc. of the International Symposium on Low Power Electronics and Design, vol. SC-9, p. 256, 1998.
[3] J. Halter and F. Najm, "A gate-level leakage power reduction method for ultra low power CMOS circuits," in Proc. of the CICC, 1997.
[4] S. Sirichotiyakul, T. Edwards, C. Oh, R. Panda and D. Blaauw, "Duet: An accurate leakage estimation and optimization tool for dual-Vt circuits," IEEE Trans. on VLSI, vol. 10, no. 2, pp. 79-90, April 2002.
[5] W. Hung, Y. Xie, N. Vijaykrishnan, M. Kandemir, M. J. Irwin and Y. Tsai, "Total Power Optimization through Simultaneously Multiple-VDD Multiple-VTH Assignment and Device Sizing with Stack Forcing," in Proc. of the International Symposium on Low Power Electronics and Design, August 2004, pp. 144-149.
[6] D. Lee, H. Deogun, D. Blaauw and D. Sylvester, "Simultaneous State, Vt and Tox Assignment for Total Standby Power Minimization," in Proc. of DATE, 2004, pp. 494-499.
[7] A. Srivastava, R. Bai, D. Blaauw and D. Sylvester, "Modeling and analysis of leakage power considering within-die process variations," in Proc. of the International Symposium on Low Power Electronics and Design, 2002, pp. 64-67.
[8] S. Zhang, V. Wason and K. Banerjee, "A Probabilistic Framework to Estimate Full-Chip Subthreshold Leakage Power Distribution Considering Within-Die and Die-to-Die P-T-V Variations," in Proc. of the International Symposium on Low Power Electronics and Design, 2004, pp. 156-161.
[9] 2004 International Technology Roadmap for Semiconductors.
[10] L. He, W. Liao and M. Stan, "System Level Leakage Reduction Considering the Interdependency between Temperature and Leakage," Design Automation Conference, June 2004.
[11] A. Srivastava, D. Sylvester and D. Blaauw, "Statistical Optimization of Leakage Power Considering Process Variations Using Dual-Vth and Sizing," Design Automation Conference, June 2004.
[12] J. Villadsen and M. L. Michelsen, Solution of Differential Equation Models by Polynomial Approximation, Prentice-Hall, Inc., 1978.
[13] W. Huang, M. Stan, K. Skadron and K. Sankaranarayanan, "A Compact Thermal Modeling for Temperature-Aware Design," Design Automation Conference, June 2004.
[14] H. Su, F. Liu, A. Devgan, E. Acar and S. Nassif, "Full Chip Leakage Estimation Considering Power Supply and Temperature Variations," Design Automation Conference, June 2004.
[15] T. Y. Wang and C. Chen, "Thermal-ADI: A Linear Time Chip Level dynamic thermal simulation algorithm based on alternating-direction-implicit (ADI) method," International Symposium on Physical Design, 2001.
[16] X. Chen and L. Peh, "Leakage Power Modeling and Optimization in Interconnection Networks," ISLPED 2003.
[17] xxxxxxxxxxxxx, "A Probabilistic Collocation Method Based Statistical Gate Delay Model Considering Process Variations and Multiple Input Switching," accepted by Design Automation and Test in Europe (DATE) 2005.
