4E-1
Fast and Effective Gate-Sizing with Multiple-Vt Assignment using Generalized Lagrangian Relaxation
Hsinwei Chou
Dept. of Electrical and Computer Engineering, University of Wisconsin, Madison
hsinweichou@wisc.edu
Yu-Hao Wang
Incentia Design Systems, Inc.
[email protected]
Abstract-Simultaneous gate-sizing with multiple-Vt assignment for delay and power optimization is a complicated task in modern custom designs. In this work, we make the key contribution of a novel gate-sizing and multi-Vt assignment technique based on generalized Lagrangian Relaxation. Experimental results show that our technique exhibits linear runtime and memory usage, and can effectively tune circuits with over 15,000 variables and 8,000 constraints in under 8 minutes (250x faster than state-of-the-art optimization solvers).
function. Therefore, a posynomial function is a sum of monomials. Posynomials have the property that they are closed under addition, multiplication, and non-negative scaling. Furthermore, it is well known that posynomial functions can be converted into convex functions via a simple change of variables [10]. In general, a convex optimization problem has the form: minimize $f_0(x)$ subject to $g_i(x) \le 0$
I. INTRODUCTION
Transistor sizing is a crucial task in modern custom designs for achieving high performance. From delay optimization [1] [2] [3] to dynamic power reduction [4], sizing plays an important role in timing and power closure. In recent years, due to the exponential surge in leakage power consumption, multiple-Vt assignment [5] [6] has also become an essential task in high-end designs. At present, research is ongoing [7] [8] to determine how transistor sizing can be optimally combined with multi-Vt assignment to achieve the best performance. In this work, we make the key contribution of a novel and effective gate-sizing with multi-Vt assignment technique. Our method is based on the classical theory of Lagrangian Relaxation [9] and a class of functions known as posynomials [10]. Due to the convexity of our problem as well as the mathematically proven theories behind our formulation
¹Optimality is with respect to the posynomial-based formulations, without any discretization heuristics.
0-7803-8736-8/05/$20.00 ©2005 IEEE.
Fig. 1. A combinational circuit.
ASP-DAC 2005
The primary inputs, gates, and primary outputs are individually referred to as a component. The output of each component is referred to as a node. Two additional auxiliary nodes are introduced in such a way that one has fan-ins from all the PO primary outputs and the other has fan-outs to all the PI primary inputs. Every node is unique. Let $N = PI + NG + PO + 1$. The nodes are labeled by indices $0, \ldots, N$ in a reverse topological ordering of the circuit viewed as a weighted directed acyclic graph (DAG). See Fig. 1 for illustration. For $0 \le i \le N-1$, let $a_i$ be the arrival time at node $i$, and let $input(i)$ and $output(i)$ be the sets of node indices that connect directly to the input(s) and output(s) of node $i$ respectively. For example, $input(0) = \{1, 2\}$ and $output(3) = \{1\}$ for the circuit shown in Fig. 1. Let $\mathcal{D}$ and $\mathcal{G}$ be the sets of primary input and gate (including primary outputs) component indices in the circuit, respectively. For example, $\mathcal{D} = \{4, 5, 6\}$ and $\mathcal{G} = \{1, 2, 3\}$ for the circuit shown in Fig. 1. For $i \in \mathcal{G}$, let $W_{g_i}$ be the parameter controlling the widths of all the NMOSs and PMOSs (adjusted by a $\gamma$ ratio), $V_{tn_i}$ and $V_{tp_i}$ be the NMOS and PMOS threshold voltages respectively, $C_{L_i}$ be the load capacitance² of $i$, and $s_i$ be the output slew of $i$. For simplicity of presentation, $a_i$ and $s_i$ can be either the rising or the falling version. Let $T_i$, $D_i$, $P_{dynamic_i}$, and $P_{leakage_i}$ denote the slew, propagation delay, dynamic power, and leakage power functions of $i$ respectively. Finally, let $L_{w_i}$ and $U_{w_i}$ be the lower and upper bounds of $W_{g_i}$ respectively, $L_{tn_i}$ and $U_{tn_i}$ be the lower and upper bounds of $V_{tn_i}$ respectively, and $L_{tp_i}$ and $U_{tp_i}$ be the lower and upper bounds of $V_{tp_i}$ respectively.
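The node-labeling convention above can be made concrete with a small sketch. Only $input(0) = \{1, 2\}$, $output(3) = \{1\}$, $\mathcal{D} = \{4, 5, 6\}$ and $\mathcal{G} = \{1, 2, 3\}$ are taken from the text; the remaining connections, the delay values, and all function names are hypothetical, and the auxiliary source node is omitted for brevity. The sketch derives $output(i)$ from $input(i)$, checks the reverse topological labeling, and propagates worst-case arrival times.

```python
# Fan-in sets input(i) for a small 7-node example. Only input(0) = {1, 2},
# output(3) = {1}, D = {4, 5, 6} and G = {1, 2, 3} come from the text; the
# other connections and every delay value below are hypothetical.
FANIN = {0: {1, 2}, 1: {3, 4}, 2: {5, 6}, 3: {4, 5}, 4: set(), 5: set(), 6: set()}
DELAY = {1: 2.0, 2: 3.0, 3: 1.5, 4: 0.5, 5: 0.5, 6: 1.0}  # D_i, arbitrary units


def fanout_sets(fanin):
    """Derive output(i) from input(i): j drives i whenever j is in input(i)."""
    fanout = {i: set() for i in fanin}
    for i, drivers in fanin.items():
        for j in drivers:
            fanout[j].add(i)
    return fanout


def is_reverse_topological(fanin):
    """True if every node has a smaller index than all of its fan-in nodes."""
    return all(j > i for i, drivers in fanin.items() for j in drivers)


def arrival_times(fanin, delay):
    """Worst-case arrival time a_i = max over fan-ins of a_j, plus D_i."""
    arrival = {}
    for i in sorted(fanin, reverse=True):  # highest index first: fan-ins before fan-outs
        latest_input = max((arrival[j] for j in fanin[i]), default=0.0)
        arrival[i] = latest_input + delay.get(i, 0.0)  # sink node 0 adds no delay
    return arrival


FANOUT = fanout_sets(FANIN)
ARRIVAL = arrival_times(FANIN, DELAY)
```

Because fan-ins always carry larger indices than the node they drive, a single pass from the highest index down to the sink visits every node after all of its inputs, which is why the labeling convention is convenient for timing propagation.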
TABLE I. MODEL FITTING ERROR MEAN AND STANDARD DEVIATION
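The fitting errors that Table I reports come from the posynomial regression of Section III. A minimal sketch of that kind of fit is given here under loudly stated assumptions: the objective is taken to be a plain sum of squared errors, the coefficients are found by coordinate descent rather than CFSQP, exponent guesses are drawn from the small set {-1, 0, 1}, and the sample data are synthetic rather than SPICE-simulated. All function names are hypothetical.

```python
import itertools
import math


def design_matrix(samples, alpha):
    """A[l][j] = prod_i samples[l][i] ** alpha[j][i], one monomial per column."""
    return [[math.prod(x_i ** e for x_i, e in zip(x, row)) for row in alpha]
            for x in samples]


def nonneg_least_squares(A, b, sweeps=200):
    """Coordinate descent for min ||A c - b||^2 subject to c >= 0.

    Stand-in for the SQP solver used in the paper (an assumption of this sketch).
    """
    cols = list(zip(*A))
    c = [0.0] * len(cols)
    resid = list(b)  # resid = b - A c, and c starts at zero
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            norm2 = sum(v * v for v in col)
            # Best c_j with the other coefficients fixed, clipped at zero.
            new_cj = max(0.0, c[j] + sum(v * r for v, r in zip(col, resid)) / norm2)
            delta = new_cj - c[j]
            resid = [r - delta * v for r, v in zip(resid, col)]
            c[j] = new_cj
    return c, sum(r * r for r in resid)


def fit_posynomial(samples, b, exponents=(-1, 0, 1), k=2):
    """Try every guess of k exponent rows and keep the lowest squared error."""
    rows = list(itertools.product(exponents, repeat=len(samples[0])))
    best = None
    for alpha in itertools.combinations(rows, k):
        c, err = nonneg_least_squares(design_matrix(samples, alpha), b)
        if best is None or err < best[2]:
            best = (alpha, c, err)
    return best


def percent_errors(samples, b, alpha, c):
    """Signed model error per sample, in percent of the sample value."""
    A = design_matrix(samples, alpha)
    return [100.0 * (sum(cj * v for cj, v in zip(c, row)) - bl) / bl
            for row, bl in zip(A, b)]


# Synthetic stand-in for SPICE data: x = (width, load), delay = 1.5 * load / width + 0.3.
SAMPLES = [(w, cl) for w in (0.2, 0.5, 0.8, 1.1) for cl in (1.0, 2.0, 4.0)]
TARGET = [1.5 * cl / w + 0.3 for w, cl in SAMPLES]
ALPHA, COEFFS, SQ_ERR = fit_posynomial(SAMPLES, TARGET)
ERRORS = percent_errors(SAMPLES, TARGET, ALPHA, COEFFS)
```

The sketch keeps the best guess over the whole search, whereas the procedure in the text stops as soon as the error falls below a threshold; the mean and standard deviation of `ERRORS` are the kind of statistics Table I summarizes.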
III. POSYNOMIAL DELAY AND POWER APPROXIMATIONS
The benefits of using posynomials as a form of approximation were described earlier in Section II-A. In this section, we detail the process by which we generated accurate posynomial characterizations of the delay and power (both dynamic and leakage) behavior of all simple CMOS gates. These forms will be used in the core LARTTE algorithm, as we will show in Section IV.

A. Posynomial Parametric Regression
Regression analysis was performed to generate the posynomial approximations. In other words, we tried to best fit a set of SPICE-simulated data points to the general posynomial equation. A posynomial parametric regression problem has the following form:

subject to $c_j \ge 0$, $1 \le j \le k$    (3)

where $x \in \mathbb{R}^n$ is an $n$-vector of tunable parameters (e.g., $W_g$s, $V_t$s), $c \in \mathbb{R}^k$ and $\alpha \in \mathbb{Z}^{k \times n}$ are the unknown characterization coefficients to be determined, and $b \in \mathbb{R}^m$ is an $m$-vector of SPICE-simulated sample data values corresponding to a particular metric which we are trying to approximate (e.g., delay, power).

B. Sample Data Point Generation
To generate the necessary data for curve-fitting (vector $b$), we first designed a series of experiments such that the worst-case delay, leakage power, and dynamic power of all the various gates can be captured. This was done with slew effects taken into account for the highest accuracy. Then, for each gate, we exhaustively ran tens of thousands of SPICE simulations (in 0.1 μm technology) to obtain a meaningful sample of data points for use in the regression analysis of Section III-C.
²For simplicity of discussion, $C_{L_i}$ is treated as a variable by itself in this paper. However, in the actual algorithm, $C_{L_i}$ is expressed in terms of the widths of the loading gates, which are tunable parameters themselves.

C. Posynomial Characterizations
After enough sample data points had been collected, we used a general SQP package, CFSQP [12], to solve the parametric regression problem for the needed coefficients $c$ and $\alpha$ in equation 3. This was performed as follows. First, we guess a value for the vector $\alpha$ and its dimension $k$. Then, using that $\alpha$, $k$, and the available SPICE-simulated sample data point vector $b$, we solve the corresponding least-squares problem in equation 3 for the coefficient vector $c$ (using CFSQP). We iteratively and exhaustively repeat this procedure for different guesses of $\alpha$ and $k$ until we obtain a least-squares error that is below a certain threshold level, at which point we will have found an accurate posynomial approximation for the particular metric involved (e.g., delay, power). This posynomial approximation process was performed for every relevant metric of every simple CMOS gate (e.g., NAND, NOR) until the resulting fitting errors for all the gates had at least 90% of their errors contained within ±10%. For illustration purposes, the posynomial approximation we found for the propagation delay of a CMOS inverter is shown below in equation 4. All other forms for all other gates are omitted due to space limitations.

For the slew-related term in equation 4, $j \in input(i)$ where $i \in (\mathcal{D} \cup \mathcal{G})$. Note that each individual term in the posynomial approximation may not have any direct physical meaning due to the nature of the multi-dimensional curve-fitting and guessing procedure. Table I shows the model fitting error mean and standard deviation for the characterized gates. Prefixes Inv, Na, and No in the table represent Inverter, NAND, and NOR gates. Suffixes TP, PL, and PD denote delay, leakage power, and dynamic power respectively.

IV. THE LARTTE ALGORITHM
We now present the main LARTTE algorithm. Problem formulations and theories involving optimality conditions are detailed to give insight into the superior runtime and performance of LARTTE.
A. Delay and Total Power Optimization: Problem Formulation

The problem of minimizing the maximum delay and total power subject to arrival time and slew constraints can be formulated as a general, large-scale optimization problem as follows:

$$\text{minimize}\quad \alpha_1 a_0 + \alpha_2 P_{leakage} + \alpha_3 P_{dynamic} \quad \text{subject to the arrival time, slew, and simple bound constraints} \qquad (5)$$

where $\alpha_1$, $\alpha_2$ and $\alpha_3$ are user-specified weighting factors to the normalized maximum delay $a_0$, normalized total leakage power $P_{leakage}$, and normalized total dynamic power $P_{dynamic}$ functions respectively, with $\alpha_1 + \alpha_2 + \alpha_3 = 1$. The weights are there to allow the overall importance to be divided amongst the various terms based on application-specific conditions, e.g., the percentage of time the circuit spends in idle mode. The weighting factors also enable tradeoff analysis between delay, leakage, and dynamic power to be performed easily. $W_g$, $V_{tn}$ and $V_{tp}$ are vectors of tunable parameters consisting of the parameters controlling the widths of all transistors and the transistor Vts respectively. $C_L$ and $s$ are vectors of load capacitances and slews. From simple rearrangement, equation 5 can be transformed into the following geometric program, which we will denote as the primal problem (PP).

$$
\begin{aligned}
PP:\quad \text{minimize}\quad & \alpha_1 a_0 + \alpha_2 P_{leakage}(W_g, V_{tn}, V_{tp}, s) + \alpha_3 P_{dynamic}(W_g, C_L, V_{tn}, V_{tp}, s) \\
\text{subject to}\quad & a_j / a_0 \le 1, \quad j \in input(0) \\
& \text{the remaining arrival time and slew constraints, each in the form (posynomial)} \le 1 \\
& L_{w_i} W_{g_i}^{-1} \le 1, \quad W_{g_i} U_{w_i}^{-1} \le 1, \quad i \in \mathcal{G} \\
& L_{tn_i} V_{tn_i}^{-1} \le 1, \quad V_{tn_i} U_{tn_i}^{-1} \le 1, \quad i \in \mathcal{G} \\
& L_{tp_i} V_{tp_i}^{-1} \le 1, \quad V_{tp_i} U_{tp_i}^{-1} \le 1, \quad i \in \mathcal{G}
\end{aligned}
\qquad (6)
$$

In general, PP is not in the form of a convex optimization problem. However, posynomials can be readily transformed into convex form by the following simple exponential transformation of the variables [10]: Let $x$ represent the vector of all tunable parameters, and transform each entry $x_i$ in $x$ to a new variable $y_i$, where $x_i = e^{y_i}$. After that, $y$ is used to represent the vector of all new tunable parameters and is thus used in the tuner. After tuning is complete, the original targets, the $x_i$'s, can be easily recovered from the optimal $y_i$'s via exponentiation.

B. Generalized Lagrangian Relaxation with Logarithmic Constraint Transformations

From PP, after making the necessary exponential variable transformations, the next step is to make a logarithmic transformation on the non-simple constraints by taking the natural log of both sides. Since the logarithmic function is monotonically increasing, this can be done without affecting the final result. The newly transformed problem is the following:

$$
\begin{aligned}
\text{minimize}\quad & \alpha_1 e^{a_0^*} + \alpha_2 P^*_{leakage}(W_g, V_{tn}, V_{tp}, s) + \alpha_3 P^*_{dynamic}(W_g, C_L, V_{tn}, V_{tp}, s) \\
\text{subject to}\quad & \ln(\cdot) \le 0 \quad \text{for each arrival time and slew constraint of PP} \\
& \text{the simple bound constraints of (6), expressed in the transformed variables}
\end{aligned}
\qquad (7)
$$

where parameters with a * superscript represent those after an exponential change of variables. This logarithmic transformation was done because, empirically, we found that this formulation resulted in greater stability in our tuning process than the original formulation, PP. The log function also couples nicely with the exponential function to reduce the complexity of the optimality conditions (to be shown later). From (7), we can form the general Lagrangian function [13] by introducing non-negative Lagrange multipliers to relax each arrival time and slew constraint into the objective function. Simple bounds on the transistor widths and Vts are not relaxed. For example, one multiplier is introduced for each constraint $\ln(\cdot) \le 0$ with $j \in input(0)$; one for each constraint with $i \in \mathcal{G}$ and $j \in input(i)$; one for each constraint with $i \in (\mathcal{D} \cup \mathcal{G})$ and $j \in input(i)$; and one for each constraint with $i \in \mathcal{D}$. Finally, let $\lambda$ be the vector of all the multipliers introduced. Then, the general Lagrangian function can be written as:

$$
\begin{aligned}
L_\lambda(W_g, V_{tn}, V_{tp}, a, s) =\ & \alpha_1 e^{a_0^*} + \alpha_2 P^*_{leakage}(W_g, V_{tn}, V_{tp}, s) + \alpha_3 P^*_{dynamic}(W_g, C_L, V_{tn}, V_{tp}, s) \\
& + \sum (\text{multiplier}) \cdot \ln(\text{relaxed constraint})
\end{aligned}
\qquad (8)
$$

The Lagrangian relaxation subproblem associated with a particular fixed Lagrange multiplier value $\lambda$ ($LRS/\lambda$) is then:

$$
\begin{aligned}
LRS/\lambda:\quad \text{minimize}\quad & L_\lambda(W_g, V_{tn}, V_{tp}, a, s) \\
\text{subject to}\quad & \text{the simple bound constraints of (6), expressed in the transformed variables}
\end{aligned}
\qquad (9)
$$

From basic theory on the Lagrangian function [13], it is known that there exists a vector value of $\lambda$ for which the optimal solution of $LRS/\lambda$ is actually equal to the optimal solution of the original problem, PP. Hence, if we can find this $\lambda$ value, then we can find the desired optimal solution of the original problem, PP (through solving $LRS/\lambda$).
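The effect of the exponential change of variables and the logarithmic constraint transformation can be checked numerically. The sketch below uses a hypothetical two-variable posynomial (its coefficients and exponents are illustrative and not taken from the paper) and a midpoint test: a convex function never exceeds the average of its endpoint values at the midpoint.

```python
import math
import random

# Hypothetical posynomial f(x1, x2) = 2 * x1^0.5 * x2^-1 + 0.3 * x1 * x2^0.5.
POSY_COEFFS = [2.0, 0.3]
POSY_EXPONENTS = [(0.5, -1.0), (1.0, 0.5)]


def posynomial(x):
    """Evaluate the posynomial in the original (positive) variables x."""
    return sum(c * math.prod(xi ** e for xi, e in zip(x, exps))
               for c, exps in zip(POSY_COEFFS, POSY_EXPONENTS))


def transformed(y):
    """Same function after the change of variables x_i = exp(y_i)."""
    return posynomial([math.exp(yi) for yi in y])


def log_transformed(y):
    """Natural log of the transformed posynomial (a log-sum-exp of affine terms)."""
    return math.log(transformed(y))


def midpoint_gap(func, p, q):
    """func(midpoint) minus the endpoint average; never positive if func is convex."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]
    return func(mid) - (func(p) + func(q)) / 2


def worst_midpoint_gap(func, trials=1000, seed=0):
    """Largest midpoint gap seen over random point pairs in [-2, 2]^2."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        p = [rng.uniform(-2.0, 2.0) for _ in range(2)]
        q = [rng.uniform(-2.0, 2.0) for _ in range(2)]
        worst = max(worst, midpoint_gap(func, p, q))
    return worst


# In the original variables the posynomial fails the midpoint test on this pair.
GAP_ORIGINAL = midpoint_gap(posynomial, [1.0, 1.0], [4.0, 1.0])
# After the transformations the gap stays non-positive up to rounding.
GAP_EXP = worst_midpoint_gap(transformed)
GAP_LOG = worst_midpoint_gap(log_transformed)
```

A random midpoint test is evidence rather than proof; the underlying reason is that each monomial becomes the exponential of an affine function of $y$, and both sums and log-sums of such terms are convex.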
Before we discuss our strategy for finding the correct $\lambda$ value, we shall first present a key part of our algorithm which is largely responsible for the excellent runtime of LARTTE.

C. First-Order KKT Necessary Condition for the Lagrangian Function Solution

For a given Lagrangian function that we are interested in solving, proven mathematical theories [13] tell us that for a particular vector value $\lambda$ to be the correct, optimal solution multiplier, the first-order Karush-Kuhn-Tucker (KKT) necessary condition must hold. Under the first-order KKT condition, the gradient of the Lagrangian function with respect to all variable parameters must be equal to 0. That is, $\nabla_{W^*_{g_i}} L_\lambda = 0$, $\nabla_{V^*_{tn_i}} L_\lambda = 0$, and $\nabla_{V^*_{tp_i}} L_\lambda = 0$ for $1 \le i \le NG+PO$. Also, $\nabla_{a^*_i} L_\lambda = 0$ and $\nabla_{s^*_i} L_\lambda = 0$ for $1 \le i \le PI+NG+PO$. Therefore, in trying to find out what the correct, optimal multiplier value $\lambda$ should be, we need only consider cases where the above conditions are satisfied. This 'filtering' process is the key to dramatic runtime reduction. By applying $\nabla_{a^*_i} L_\lambda = 0$ and $\nabla_{s^*_i} L_\lambda = 0$ to the Lagrangian, we obtain the following required optimality condition on the arrival time and slew constraint multipliers:
Note that each line in (10) applies to an individual set of components of $\lambda$ and is independent of the other lines. For example, if a particular vector value $\lambda'$ is to be deemed a candidate for the correct, optimal multiplier $\lambda$, then all of its outgoing PO multiplier components (from a PO gate to the sink node 0) must sum up to $\alpha_1 e^{a_0^*}$. Furthermore, for all gates in $\mathcal{D} \cup \mathcal{G}$, all of their incoming multipliers (from fan-in gates) must sum up to their outgoing multipliers multiplied by the corresponding factor given in (10). By considering only those values of $\lambda'$ which satisfy equation 10 as solution candidates for the correct, optimal multiplier $\lambda$, our tuning process can significantly cut down on runtime by avoiding unnecessary computation involving impossible $\lambda$ candidates. Using equation 10, we now present our method for solving for the correct, optimal $\lambda$ value (and consequently the optimal solution of our original problem as well).
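A multiplier vector obeying a conservation condition of this kind can be built in one pass over the nodes in index order. The sketch below is a simplification and not the paper's exact condition: it assumes the scaling factor on outgoing multipliers is 1, uses the hypothetical parameter `sink_total` for the required sum at the sink, splits each node's total equally among its fan-ins, and reuses a hypothetical fan-in graph.

```python
# Hypothetical fan-in sets; node 0 is the sink and indices follow a reverse
# topological order, so every node feeds only lower-numbered nodes.
FANIN = {0: {1, 2}, 1: {3, 4}, 2: {5, 6}, 3: {4, 5}, 4: set(), 5: set(), 6: set()}


def init_multipliers(fanin, sink_total=1.0):
    """Assign one multiplier per edge (j, i) so incoming and outgoing sums match.

    Simplifying assumption: the factor applied to outgoing multipliers is 1.
    """
    lam = {}  # lam[(j, i)] is the multiplier on the edge from node j into node i
    for i in sorted(fanin):  # node i's outgoing edges were all set at lower indices
        drivers = sorted(fanin[i])
        if not drivers:
            continue  # primary inputs have no fan-in edges in this simplified graph
        if i == 0:
            outgoing = sink_total
        else:
            outgoing = sum(v for (src, _), v in lam.items() if src == i)
        for j in drivers:
            lam[(j, i)] = outgoing / len(drivers)  # equal split among fan-ins
    return lam


def conservation_residual(lam, node):
    """Incoming minus outgoing multiplier sum at a node (zero when conserved)."""
    incoming = sum(v for (_, dst), v in lam.items() if dst == node)
    outgoing = sum(v for (src, _), v in lam.items() if src == node)
    return incoming - outgoing


LAMBDA0 = init_multipliers(FANIN)
```

An equal split is only one valid starting point; any non-negative split that preserves the sums would satisfy the same simplified condition.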
D. Iterative Multiplier Adjustment for Determining Optimal $\lambda$

We employ an iterative, modified sub-gradient method [14] for finding the desired $\lambda$ vector. First, we arbitrarily pick a starting $\lambda$ value which satisfies equation (10). For example, we started by assigning each of the multipliers into sink node 0 an equal $1/N$ share of their required sum, where $N$ is the number of inputs to sink node 0 (the number of actual primary outputs). All other multiplier components were assigned in a similar way via reverse topological order. After an initial guess $\lambda^1$ was formed, we then iteratively update $\lambda^k$ using the modified sub-gradient approach shown in Table II, line 3, to form a new guess at every iteration. $\theta_k$ is a step size value which was initialized to 1 and gradually modified over iterations using a trust-region approach [15]. We continue to iterate and make new guesses for the correct, optimal value of $\lambda$ until our $LRS/\lambda^k$ value converges to that of the PP value, at which point we will have found our desired multiplier $\lambda$, which is just equal to the $\lambda^k$ at the stopped iteration.

E. Solving $LRS/\lambda$

Our LARTTE algorithm terminates when the solution of $LRS/\lambda$ converges to that of PP. In order to do this, we must present a method for solving the unconstrained optimization problem in $LRS/\lambda$ (neglecting simple bound constraints). Since the field of unconstrained optimization is mature [13], we resort to using an off-the-shelf solver, L-BFGS-B [16], to do this. L-BFGS-B implements a limited-memory variant of the well-known BFGS method [13] with support for simple bounds, which has been proven to be exceptional for handling large-scale problems with limited memory usage. The efficiency provided by L-BFGS-B contributes largely to the fast runtime of LARTTE.

F. Vt Discretization and LARTTE Summary

Up to now, we have treated the parameter Vt as a continuously tunable parameter. This was done because the Lagrangian Relaxation technique is a technique for continuously differentiable optimization problems. This is a problem because, in practice, there are usually only a fixed and limited number of Vt devices to choose from (due to fabrication issues). Hence, in order to rectify this situation, we must discretize our Vt solutions in the end to the nearest allowable Vt value. For example, if we find that after tuning, one of our transistors has an optimal Vt solution value of 0.17 V, but we can only choose between a device with 0.24 V Vt and a device with 0.16 V Vt, then we would discretize this transistor's Vt solution to be 0.16 V. This discretization step is carried out at the end of the tuning process for all transistors and their corresponding continuous Vt solutions.

One may question the validity of this 'solve-continuous-then-discretize' heuristic, since the solution after discretization may no longer correspond to the optimal solution in the original problem. However, as will be shown in our experimental results (Section V), the solution after discretization is always very close to the ideal, optimal solution in the original problem. This will be demonstrated to hold even when the number of Vts to discretize from is small (e.g., 4, which was the value used in this work). Hence, our strategy is justifiable and sound. LARTTE has now been fully presented and is summarized in Table II for clarity.
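The discretization step amounts to snapping each continuous Vt to the closest available level. A minimal sketch follows; the function names are hypothetical, and the default levels are the four values listed in the experimental setup of Section V, with the second level read as 0.18 V (an assumption, since that figure is damaged in the source).

```python
# Available Vt levels in volts (second level assumed to be 0.18 V).
ALLOWED_VT = (0.14, 0.18, 0.22, 0.26)


def discretize_vt(vt, allowed=ALLOWED_VT):
    """Return the allowed Vt level closest to the continuous solution vt."""
    return min(allowed, key=lambda level: abs(level - vt))


def discretize_all(vts, allowed=ALLOWED_VT):
    """Discretize every transistor's continuous Vt solution independently."""
    return [discretize_vt(v, allowed) for v in vts]


# The example from the text: 0.17 V with devices at 0.24 V and 0.16 V available.
EXAMPLE = discretize_vt(0.17, allowed=(0.24, 0.16))
```

Each transistor is rounded independently here, which matches the nearest-value rule in the text but makes no attempt to re-balance timing after rounding; in the algorithm that role falls to the final re-solve of $LRS/\lambda$.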
V. EXPERIMENTAL RESULTS
We implemented LARTTE in C/C++ and ran all our experiments on a 1.0 GHz P4 machine with 1.0 GB of RAM. The stopping criterion of LARTTE was set to when PP and $LRS/\lambda$ agree to within 1.0%. Lower and upper bounds of transistor widths were 0.2 μm and 1.1 μm respectively. For Vt, the lower and upper bounds were 0.14 V and 0.26 V. $V_{DD}$ was 1.W and a 0.1 activity factor was used. Input slew ranged from 30 to 150 ps. For multi-Vt selection (Table III), four Vt values were made available for discretization: 0.14 V, 0.18 V, 0.22 V, and 0.26 V. All SPICE simulations were done in 0.1 μm technology with multiple-Vt transistor models. We conducted our experiments on the ISCAS85 benchmarks, where the number of gates ranged from 214 to 3,512 and the total number of tunable parameters from 654 to 15,198. Table III shows the LARTTE optimization results.
ALGORITHM LARTTE
Output: optimal gate-sizing and Vt allocation solution
1. k := 1 /* iteration number */
   λ := arbitrary initial vector of constraint multipliers satisfying (10)
   Initialize all optimization tunable parameters
2. Solve LRS/λ by calling L-BFGS-B to minimize $L_\lambda(W_g, V_{tn}, V_{tp}, a, s, \lambda)$ until the optimal solution is found, and then compute $a_1, \ldots, a_{PI+NG+PO}$ and $s_1, \ldots, s_{PI+NG+PO}$
3. /* Adjust multipliers λ */
   for i := 0 to PI+NG+PO do
     foreach j ∈ input(i) do
       update the multiplier associated with (j, i) using step size $\theta_k$, with a separate case for $i \in (\mathcal{D} \cup \mathcal{G})$
   Project λ to the nearest point satisfying (10)
4. k := k + 1
5. Goto step 2 until the cost functions of PP and LRS/λ converge to within a specified tolerance
6. Discretize the Vt solutions
7. Solve LRS/λ by calling L-BFGS-B to find the optimal solution
TABLE II. LARTTE ALGORITHM

Fig. 2. The (a) runtime and (b) storage requirements of LARTTE vs. number of variables.

A. Optimal Timing and Power Gate-Sizing and Vt Assignment

In Table III, the 'optimize delay' columns show the maximum delay before and after tuning, with only timing involved in the objective function ($\alpha_1 = 1$, $\alpha_2 = \alpha_3 = 0$). All transistors have a nominal Vt value of 0.18 V. After obtaining the best possible delay value from sizing optimization alone, we then try to optimize the total power consumption subject to that same optimal-delay value. Hence, the solution obtained from tuning the power consumption is guaranteed to have a critical path delay not exceeding the optimal delay value shown in the 'optimize delay' column. For power tuning, the dynamic and leakage power terms were arbitrarily assigned equal weights (in practice, these weights should be assigned based on application-specific conditions, such as the percentage of time the target circuit spends in idle mode). The resulting optimized power solution from tuning both the transistor widths and Vts is shown in the 'optimize total power' columns. This is compared to the power consumption of the circuit after tuning for delay only (with nominal Vts). The table shows that an average of over 58% total power reduction can be achieved with the same delay target using simultaneous gate-sizing and multi-Vt assignment. The table also shows that LARTTE has linear runtime and memory usage requirements (see Fig. 2 as well). Lastly, in order to justify our strategy of first treating Vt as a continuous variable and then discretizing in the end, we show the leakage power consumption of the various tested circuits before and after discretization in Table III. As expected, the discretized solution is always inferior to the continuous solution. However, it can be seen that the difference in leakage power consumption before and after discretization is relatively trivial in all of the tested circuits. This suggests that our heuristic works fairly well in practice and can result in a solution point which is not too far from the globally optimal solution.

To gauge the effectiveness and runtime of LARTTE, we employed a state-of-the-art general-purpose large-scale convex optimization solver, SNOPT [11], to solve the same primal problem. The runtime results are tabulated in Table III, where it can be seen that our method is over 250x faster. Furthermore, we verified that our LARTTE solution agreed with that from SNOPT to within 1% in all cases.

Surveying the literature, we find that another previously proposed sizing-with-Vt-assignment technique [17] took over 1.5 hours to tune a circuit with only 5318 transistors on a Sparc 60. This is much slower than LARTTE, as c7552 has many more components and takes only 7.2 minutes to finish with LARTTE. In [8], it was reported that their concurrent sizing-with-Vt scheme achieves on average 37% total power reduction, which is again inferior to LARTTE. Similarly, in [18], their dual-Vt with sizing method can reduce total power by 50% without any timing optimization. As we have shown, LARTTE can achieve higher power savings on top of delay optimization. Many other works [19] [20] exhibit similar inferiority to LARTTE.

By simultaneously optimizing for delay, dynamic power, and leakage power using varying $\alpha$ weights, LARTTE can also be used to explore several tradeoff relationships between delay, leakage and dynamic power. Fig. 3 shows the dynamic power versus delay and leakage power versus delay optimal tradeoff curves for a 12-bit ALU, and Fig. 4(a) shows the dynamic power versus leakage power optimal tradeoff curve for the same 12-bit ALU. In Fig. 4(b), we show the effects of varying the number of Vts available for discretization. The circuit used was c432. It can be seen that any more than 4 available Vts results in minor savings.
TABLE III. RESULTS OF OPTIMIZATION ON ISCAS'85 BENCHMARK CIRCUITS

Fig. 4. (a) Dynamic vs. leakage power trade-off curve for c2670 (b) Effects of variable Vt on power reduction

VI. CONCLUSION, SHORTCOMING, AND FUTURE WORK

In this work, we made the key contribution of a novel gate-sizing and multi-Vt assignment technique using Lagrangian Relaxation. Our solution is mathematically guaranteed to find the most timing- and power-optimal solution point due to the use of accurate, convex posynomial approximations. Although our experimental results validate the effectiveness of LARTTE, there is currently one shortcoming with our approach that we would like to acknowledge. That is, in the tuning process, the pmos-to-nmos ratio $\gamma$ was not tunable. We statically assigned this ratio for each gate based on sound heuristics involving fan-in count and gate type information. Not being able to tune $\gamma$ can non-trivially reduce the optimization space. This problem exists because of the way we simulated our SPICE sample data points (vector $b$ in Section III-B) in the posynomial characterization process. Due to time limitations, we had to carry out the thousands of SPICE simulations in such a way that the statically assigned ratio was always inherently enforced. Hence, because our posynomial approximations were generated based on a fixed $\gamma$, the tuning process also had to abide by this $\gamma$ value to preserve accuracy. We intend to correct this issue in future work by spending more time on the posynomial characterization process and adding a new constraint $L_{\gamma_i} \le \gamma_i \le U_{\gamma_i}$ for each gate $i$.

REFERENCES

[1] A. R. Conn, P. K. Coulman, R. A. Haring, G. L. Morrill, and C. Visweswariah, "JiffyTune: circuit optimization using time-domain sensitivities," IEEE Transactions on Computer-Aided Design of ICs and Systems, vol. 17, no. 12, pp. 1292-1309, December 1998.
[2] C. P. Chen, C. C. N. Chu, and D. F. Wong, "Fast and exact simultaneous gate and wire sizing by Lagrangian relaxation," IEEE Transactions on Computer-Aided Design of ICs and Systems, vol. 18, no. 7, pp. 1014-1025, July 1999.
[3] Y. Jiang, S. Sapatnekar, C. Bamji, and J. Kim, "Combined transistor sizing with buffer insertion for timing optimization."
[4] M. Borah, R. M. Owens, and M. J. Irwin, "Transistor sizing for low power CMOS circuits," IEEE Transactions on Computer-Aided Design of ICs and Systems, vol. 15, no. 6, pp. 665-671, June 1996.
[5] M. Hirabayashi, K. Nose, and T. Sakurai, "Design methodology and optimization strategy for dual-Vth scheme using commercially available tools," in Proc. of the International Symposium on Low Power Electronics and Design, 2001, pp. 283-286.
[6] N. Tripathi, A. Bhosle, D. Samanta, and A. Pal, "Optimal assignment of high threshold voltage for synthesizing dual threshold CMOS circuits," in VLSI Design, India, 2001, pp. 227-232.
[7] T. Karnik, Y. Ye, J. Tschanz, L. Wei, S. Burns, V. Govindarajulu, V. De, and S. Borkar, "Total power optimization by simultaneous dual-Vt allocation and device sizing in high performance microprocessors," in IEEE/ACM DAC, 2002, pp. 48-9
[8] A. Srivastava, D. Sylvester, and D. Blaauw, "Concurrent sizing, Vdd, and Vth assignment for low power design," in Design, Automation and Test in Europe, 2004, pp. 718-719.
[9] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming: Theory and Algorithms, 2nd ed. New York: Wiley, 1997.
[10] K. Kasamsetty, M. Ketkar, and S. S. Sapatnekar, "A new class of convex functions for delay modeling and their application to the transistor sizing problem," IEEE Transactions on Computer-Aided Design of ICs and Systems, vol. 19, no. 7, pp. 779-788, July 2000.
[11] P. E. Gill, W. Murray, and M. A. Saunders, "SNOPT: An SQP algorithm for large-scale constrained optimization," Department of Mathematics, University of California, San Diego, La Jolla, CA, Numerical Analysis Report 97-2, 1997.
[12] C. Lawrence, J. L. Zhou, and A. L. Tits, "User's guide for CFSQP version 2.4: A C code for solving (large scale) constrained nonlinear (minimax) optimization problems, generating iterates satisfying all inequality constraints," Institute for Systems Research, University of Maryland, College Park, MD, Tech. Rep. TR-94-16r1, 1996.
[13] J. Nocedal and S. J. Wright, Numerical Optimization. Heidelberg, Berlin, New York: Springer Verlag, 1999.
[14] H. Tennakoon and C. Sechen, "Gate sizing using Lagrangian relaxation combined with a fast gradient-based preprocessing step," in ICCAD, 2002, pp. 395-402.
[15] A. R. Conn, N. Gould, and P. L. Toint, "Global convergence of a class of trust region algorithms for optimization with simple bounds," SIAM J. Numerical Analysis, vol. 25, pp. 433-460, 1988.
[16] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, "A limited memory algorithm for bound constrained optimization," Northwestern University EECS, Technical Report NAM-08, 1994.
[17] S. Sirichotiyakul, T. Edwards, C. Oh, J. Zuo, A. Dharchoudhury, R. Panda, and D. Blaauw, "Stand-by power minimization through simultaneous threshold voltage selection and circuit sizing," in IEEE/ACM DAC, 1999, pp. 436-441.
[18] P. Pant, R. K. Roy, and A. Chatterjee, "Dual-threshold voltage assignment with transistor sizing for low power CMOS circuits," IEEE Transactions on VLSI Systems, vol. 9, no. 2, pp. 390-394, April 2001.
[19] M. Ketkar and S. Sapatnekar, "Parameter variations and impacts on circuits and microarchitecture," in IEEE Conference on Computer-Aided Design, 2002, pp. 375-378.
[20] D. Nguyen, A. Davare, M. Orshansky, D. Chinnery, B. Thompson, and K. Keutzer, "Minimization of dynamic and static power through joint assignment of threshold voltages and sizing optimization," in International Symposium on Low-Power Electronics and Design, 2003, pp. 158-163.