
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 29, NO. 12, DECEMBER 2010

Advanced Variance Reduction and Sampling Techniques for Efficient Statistical Timing Analysis

Javid Jaffari, Student Member, IEEE, and Mohab Anis, Senior Member, IEEE

Abstract—The Monte-Carlo (MC) technique is a traditional solution for reliable statistical analysis, and in contrast to probabilistic methods, it can account for any complicated model. However, a precise analysis that involves a traditional MC-based technique requires many simulation iterations, especially for the extreme quantile points. In this paper, advanced sampling and variance reduction techniques are studied, along with applications to efficient digital circuit timing yield analysis. Three techniques are proposed: 1) an enhanced quasi-MC-based sampling which generates optimally low-discrepancy samples suitable for yield estimation of digital circuits; 2) an order statistics-based control variate technique that improves the quality of the yield estimations when a moderate number of samples is used; and 3) a classical control variate technique utilized for a variance-reduced estimation of the critical delay's statistical moments. This last solution is shown to be effective even for a very low number of samples.

Index Terms—Control variate, digital very large scale integration (VLSI) circuits, quasi-Monte Carlo, statistical static timing analysis (SSTA), timing yield, variance reduction.

Manuscript received July 16, 2009; revised January 11, 2010 and May 4, 2010; accepted June 14, 2010. Date of current version November 19, 2010. This paper was recommended by Associate Editor C. V. Kashyap. J. Jaffari is with IGNIS Innovation, Inc., Kitchener, ON N2H 6M6, Canada (e-mail: [email protected]). M. Anis is with the Department of Electronics Engineering, American University in Cairo, New York, NY 10018, USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TCAD.2010.2061553

I. Introduction

An accurate yet efficient statistical static timing analysis (SSTA) is a pivotal task in predicting the yield of a high-performance digital very large scale integration circuit. Corner-based timing verification techniques are prone to overdesign and may not lead to an efficient design under a tight power consumption budget. Therefore, several probabilistic (non-Monte Carlo) SSTA methods have been proposed to address the challenge of statistical timing analysis for high-performance digital circuits. In the probabilistic SSTA methods, the signal arrival times are treated as random variables, and the probability distribution function (PDF) of the circuit's critical delay is extracted by a statistical analysis. Blaauw et al. [1] have provided a survey of SSTA methods. Despite the considerable improvement of SSTA methods, there are still some concerns about their application to reliable and large-scale timing sign-off. The key challenge and drawback of current probabilistic-SSTA approaches originate from the presence of complex timing and process variation effects that are partly ignored or simplified

in each solution. Such effects include the nonlinearity of the gate delays as a function of the process parameters and capacitive loads, the nonlinearity of the MAX operation, and the resultant non-zero-skew signal arrival time PDFs. The complex relations between the input/output rise/fall signal transition times and gate delays introduced by current source models, interconnect delay models, non-Gaussian process parameters, and the spatial/structural correlations are some of the other complex issues that have been partially overlooked in previous probabilistic methods.

Therefore, the Monte-Carlo (MC) method, as a traditional alternative to probabilistic techniques, has recently attracted attention for reliable and accurate digital circuit timing sign-off [2]-[5]. The major advantage of the MC method is its capability to account for any complicated timing and process model. Moreover, the development and integration costs of MC-based SSTA tools are minimal, since the available deterministic STA engines can be maximally reused in developing a new MC-based yield analysis tool. In addition, MC is in the class of embarrassingly parallel problems, benefiting the most from running on multi-core systems. However, its most threatening disadvantage is the slow convergence rate. To achieve a reasonably precise estimation of the yield, thousands of simulations (samples) are needed by the traditional-MC analysis. The precision of MC-based methods is defined in terms of the statistical confidence interval of the estimation. The traditional-MC's rate of convergence is independent of the problem's dimension, but it decays at the very slow rate of O(N^{-1/2}) with respect to the number of samples, N.

To improve the convergence rate, hybrid sampling methods composed of the Latin hypercube sampling (LHS) and a quasi-MC (QMC) sequence have been proposed recently [4]-[6]. The LHS method generates samples in each dimension by stratifying its domain into equiprobable sub-ranges, improving the uniformity of the samples in 1-D projections. The QMC utilizes low-discrepancy sequences for 1-D and higher projections. It is proved that the estimation error is mitigated by the equi-distribution (uniformity) of the samples rather than by their randomness [7]. Therefore, the upper bound on the convergence rate of the QMC method, O(log^d N / N), is found to be asymptotically superior to the MC technique, where d is the problem dimension (e.g., the number of process parameters or the number of its principal components). This asymptotic advantage seems achievable only if N >> e^d, and it is absolutely impractical, even for moderate-size problems, to use that huge a number of samples.


Fig. 1. Approximate range preferred for each proposed method.

However, the QMC method has, in practice, shown a significant advantage over the traditional-MC for the analysis of high-dimensional computational finance problems during the 1990s [8]. This surprising behavior was later justified through the analysis of variance (ANOVA) of high-dimensional problems and quantified with the notion of the effective dimension [9]-[11]. In Section III, this phenomenon is reviewed to provide insight into how a QMC sampling method can be effectively adjusted for the SSTA problem, and the effective dimension of the digital circuit yield estimation problem is investigated. Based on these observations, an algorithm is proposed to improve the uniformity of Sobol's QMC samples [12], [13] in high-order projections. By using the proposed optimized-Sobol samples, an SSTA engine is developed to target high-performance timing yield analysis that requires fewer iterations than a non-optimized QMC sampler to estimate the yield with a certain confidence. As discussed in Section III, however, the proposed QMC-based SSTA engine does not significantly outperform the traditional-MC method for moderate numbers of samples.

Let D(p) be the circuit's critical delay as a function of the process-parameter vector p, distributed with PDF ϕ(p). The timing acceptance of a sample is captured by the indicator function

I_τ(p) = { 0,  D(p) > τ
           1,  D(p) ≤ τ }    (3)

where τ is the maximum acceptable critical delay. Therefore, the following integral represents the timing yield:

y = P(I_τ = 1) = E_ϕ[I_τ(p)] = ∫_{R^d} I_τ(p) ϕ(p) dp.    (4)

The MC method suggests a numerical technique to solve the integral in (4): sample N times from the ϕ(p) distribution, evaluate the circuit's critical delay for each sample, and extract the mean of I_τ(p) by using the following estimator:

ŷ = #{i | D_i < τ} / N    (5)

where #{·} is the number of elements in a set, and N is the total number of simulation iterations. The problem of the traditional-MC is its slow convergence rate, σ_ŷ = O(N^{-1/2}); that is, the standard deviation of the estimation error declines with the inverse of the square root of the number of samples [20]. The following formulation can then be used to determine the number of samples for an α-confidence half-range of β(1 - y) at a yield of y:

N = (Φ^{-1}(0.5 + α/2) / β)² · y / (1 - y)    (6)

where Φ^{-1}(·) is the inverse of the normal cumulative distribution function (CDF). It is evident that to reduce the interval range (β) by a factor of K, the number of samples must be increased K² times. However, it is important to note that the estimation error, e = ŷ - y, is related to the equi-distribution (uniformity) of the samples rather than to their randomness. This strongly suggests that by using a well-spread sequence, one more uniformly spread than a pseudo-random sequence, a more precise estimation can be achieved [7].
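As a concrete illustration of (5) and (6), the following Python sketch is ours and purely illustrative: the quadratic toy delay model is a hypothetical stand-in for a full STA run, and the constants are arbitrary.

import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

def critical_delay(p):
    # Hypothetical toy model standing in for a deterministic STA engine.
    return 1.0 + 0.1 * p.sum(axis=1) + 0.02 * (p ** 2).sum(axis=1)

d, N, tau = 10, 10_000, 1.6
p = rng.standard_normal((N, d))             # samples from phi(p)
y_hat = np.mean(critical_delay(p) < tau)    # natural estimator, eq. (5)

# Samples needed for an alpha-confidence half-range of beta*(1 - y), eq. (6)
alpha, beta = 0.95, 0.1
N_req = (NormalDist().inv_cdf(0.5 + alpha / 2) / beta) ** 2 * y_hat / (1 - y_hat)
print(f"y_hat = {y_hat:.3f}; required N for this precision: {N_req:.0f}")

The quadratic growth of N_req as beta shrinks is exactly the K² penalty noted above.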

Fig. 2. 2-D projections of different sampling approaches. The gray squares represent areas with high or low concentration of samples. (a) Traditional-MC. (b) LHS. (c) QMC (Sobol).

For example, LHS [21] is a sampling technique which increases the convergence rate by providing more uniform samples in 1-D. This is achieved by partitioning the domain of each random variable into equiprobable sub-ranges and generating the same number of samples in each sub-range, randomly. A random permutation of the LHS samples is finally adopted to generate random sample vectors. However, the discrepancy of the LHS samples is not noticeably better than that of the traditional pseudo-random MC samples in projections higher than 1-D, because the permutation of the samples is performed randomly. Fig. 2(b) illustrates a 2-D projection of the LHS-based samples; the samples are not much more uniform than the traditional MC-based samples [Fig. 2(a)] in 2-D projections.

Instead of generating random samples by a pseudo-random number generator, or stratifying each dimension separately as is done in the LHS, the QMC is a technique to produce deterministic low-discrepancy sequences that are more uniformly distributed over the d-dimensional problem space than the former methods. A higher than 1-D uniformity is achievable, which leads to a faster convergence rate than that of the MC or LHS technique. Fig. 2(c) depicts the 2-D projection of the QMC samples, generated by the Sobol algorithm [13]. Other examples of low-discrepancy sequences include the Halton [22], Faure [23], and Niederreiter [24] sequences. The upper error bound of the QMC technique is given by the Koksma-Hlawka bound, O(log^d N / N), which promises an asymptotically faster convergence than that of the MC method [7]. However, this superiority seems to be unachievable unless N >> e^d, which is absolutely impractical for even moderate-size problems (d > 10). Surprisingly, the practical applications of the QMC to some high-dimensional computational finance problems [8] have shown significant advantages over the MC's performance. The convergence rate for such problems is roughly O(N^{-1}), independent of the problem dimension. Much research has been conducted to explain this surprisingly good performance [25], [26], and a qualitative explanation was developed under the notion of an effective dimension in [9] and [11].

The effective dimension idea is explained by the ANOVA decomposition of the integrand function f. Suppose f(x), a function of the random variables x, is decomposed into a sum of orthogonal functions as follows, for S = {1, 2, ···, d}:

f(x) = Σ_{u⊆S} f_u(x) = f_0 + Σ_{i=1}^{d} f_i(x^{(i)}) + Σ_{i<j} f_{ij}(x^{(i)}, x^{(j)}) + ··· + f_{1···d}(x^{(1)}, ···, x^{(d)}).    (7)
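The practical MC/QMC gap discussed above can be seen even on a toy integrand whose product form contributes an ANOVA term f_u for every subset u. The following sketch is ours, assuming SciPy's stats.qmc module is available; the dimension, sample count, and integrand are arbitrary illustrative choices.

import numpy as np
from scipy.stats import qmc

d, m = 8, 12
n = 2 ** m
# Product integrand; its exact integral over [0, 1]^d is 1.
f = lambda x: np.prod(1.0 + 0.5 * (x - 0.5), axis=1)

mc = np.random.default_rng(1).random((n, d))
lhs = qmc.LatinHypercube(d=d, seed=1).random(n)
sob = qmc.Sobol(d=d, scramble=True, seed=1).random_base2(m)

for name, pts in (("MC", mc), ("LHS", lhs), ("Sobol", sob)):
    print(f"{name:5s} |error| = {abs(f(pts).mean() - 1.0):.2e}")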


Caflisch et al. [9] introduced the notion of the effective dimension as follows:
1) the effective dimension of f in the superposition sense is d_S if Σ_{|u| ≤ d_S} σ²(f_u) ≥ P σ²(f);
2) the effective dimension of f in the truncation sense is d_T if Σ_{u ⊆ {1,2,···,d_T}} σ²(f_u) ≥ P σ²(f);
where P is a proportion chosen to be less than, but close to, 1. For example, if P = 0.99 of the variance of f is due to functions of single x variables, the effective dimension in the superposition sense, d_S, is 1; this means the interactions among the parameters have negligible effects on the function. The truncation sense of the effective dimension, on the other hand, is related to the list of important variables: d_T = m means that the first m variables create the highest portion of the integrand variance. It should be noted that the variance of any MC-based estimation is directly related to the variance (error) of the integrand function. Therefore, the superiority of the QMC over the pseudo-random MC method for some high-dimensional problems arises from the low effective dimensionality of such problems and from the fact that QMC sequences produce high uniformity in the first few dimensions (≤ 12) and low-order projections (≤ 3) [10], [11].

As a result, we first investigate the effectiveness of the QMC method for the analysis of the timing yield through the effective dimension of the yield function, I_τ(p). For this purpose, a numerical technique [27] is used to estimate the variance of the different ANOVA terms of the indicator-type yield function. This technique utilizes the quasi-regression method [28] and uses shifted Legendre polynomial functions as the basis for the orthogonal ANOVA terms. Table I lists the relative importance of the 1-D and 2-D ANOVA terms when the yield is 0.5 and 0.99. The relative importance values are computed by using

1-D:      100 × Σ_{i=1}^{m} σ²(f_i) / σ²(f)
Full 1-D: 100 × Σ_{i=1}^{d} σ²(f_i) / σ²(f)    (8)
2-D:      100 × Σ_{i=1}^{m-1} Σ_{j=i+1}^{m} σ²(f_ij) / σ²(f)

where m is the number of grids of the mesh structure, and d is the total number of variables, including the grid and purely random parameters. The resultant analysis demonstrates a reduction in 1-D significance as the yield increases, indicating that the LHS technique exhibits only a slight improvement over the traditional-MC for a typical yield analysis, close to the extreme of the critical delay distribution tail. The analysis also reveals that to benefit more from a QMC sampling, the sampling technique should be carefully optimized to maximize the high-dimensional uniformity. These are significant observations, suggesting that the excellent performance of the QMC method seen in financial problems cannot easily be achieved for digital circuit yield analysis problems.


TABLE I
Relative Importance of ANOVA Terms for the Yield Function

                 Yield = 0.5                Yield = 0.99
Circuit    1-D   Full 1-D   2-D       1-D    Full 1-D   2-D
C432       65    69         7         19.4   20.1       18.5
C499       64    67         7.4       15.2   15.8       19.6
C880       65    70         7         19.6   20.2       18.8
C1355      63    66         6.7       14.8   14.9       24.4
C1908      67    71         8         18.6   19         18.3
C2670      60    64         4.4       11     11.6       21.5
C3540      61    64         4.1       9      9.2        18.6
C5315      61    64         7.1       12     12.3       25.3
C6288      63    64         3.3       9.1    9.3        22.5
C7552      61    64         3.4       9.2    9.6        21.2
S9234      60    64         3.4       8.2    8.4        19.4
S13207     59    63         1.2       7.5    7.9        17.9
S15850     61    65         1.2       6.8    7.5        17.0
S35932     45    59         13        5.4    10.7       36.5
S38417     59    67         25        5.2    7.9        10.8
S38584     56    64         29        5      6.1        23.5

This is due to the fact that, although the aforementioned financial problems are found to be effectively very low dimensional, i.e., at most 2-D with significant 1-D portions [27], the yield estimation function is not. Therefore, the investigation and possible improvement of the high-order discrepancies of the QMC samples should be seriously considered if such a method is adopted for an SSTA, particularly for the yield analysis.

It should also be noted that our analysis reveals strong 1-D ANOVA terms for the mean and strong 2-D terms for the standard deviation of the critical delay, as opposed to the yield, which also has strong higher-order terms. Therefore, both the LHS and QMC are good candidates for the mean estimation [4]. However, for the standard deviation estimation, a carefully designed QMC sampler that produces highly uniform 2-D projections should be considered. Justifying the 1-D and 2-D behaviors of the mean and variance of the critical delay is not difficult. A promising probabilistic SSTA technique, proposed in [29], approximates the critical delay with a linear additive function of the principal components of the process parameters. This approximation suggests that the critical delay function is effectively 1-D in the superposition sense, so the first moment (mean) remains 1-D. The second moment (variance), however, includes a significant set of 2-D terms, due to the pairwise products of the principal-component factors that appear when the additive delay function is squared. It is now easier to see why the yield function is composed of one-, two-, and higher-dimensional terms: the indicator function of (3) consists of many terms of degree higher than one, especially when the yield is close to the two extremes. This is verified by the following approximation. Suppose a sigmoid function, the error function, is used to approximate the indicator function. A Taylor expansion then gives

I_τ(h) ≈ (1 + erf(α(τ - h))) / 2
       = 1/2 + (1/√π)(α(τ - h) - α³(τ - h)³/3 + α⁵(τ - h)⁵/10 - ···).    (9)

As a result, the closer the yield is to 1 or 0, the further h = D(p) is from τ, and hence the stronger the higher-order terms become. In fact, one application of the LHS to analog circuit yield analysis, proposed in [30], shows a significant efficiency drop when the yield reaches the extremes (e.g., above 90% or below 10%).

This drop can be well justified by the existence of high-dimensional ANOVA terms in the high/low yield ranges. This is a critical issue, since the domain of attraction in a digital circuit yield analysis problem is actually close to the high extreme. It is now easy to predict that the application of LHS alone does not provide a significant improvement in the yield estimation accuracy, and that the QMC sampling should be carefully optimized to obtain the lowest discrepancy in high-dimensional projections.

B. Proposed QMC/LHS-Based Yield Analyzer

In this part, the discrepancy of the Sobol QMC sequence is investigated, and a method is proposed that produces low-discrepancy Sobol samples with as much high-dimensional uniformity as possible. The method also ranks the sample dimensions according to their discrepancies, so that the most uniform dimension comes first. The generated samples can then be assigned to the process parameters and their principal components in an order sorted by their importance (criticality), to reduce the estimation error.

1) Sobol Sequence Generation and Discrepancy: The Sobol sequence [13] is a low-discrepancy QMC sequence that is preferred over many other QMC sequences [22]-[24], especially for high-dimensional estimations, due to its higher uniformity in both 1-D and 2-D projections, a result of its prime base of two [31]. However, due to the finite number of samples, all QMC methods, including the Sobol, produce low uniformity in many high-dimensional projections, which is undesirable for an efficient digital circuit yield analysis. Recall that the yield problem was shown to be composed of many high-dimensional ANOVA terms, so non-uniformity of the samples in a projection increases the error variance of the corresponding term in the ANOVA decomposition.

The Sobol sequence can be represented as a (t, m, s)-net and a (t, s)-sequence in base 2. A (t, m, s)-net in base 2 is a set of 2^m points in the [0, 1]^s cube such that every elementary subinterval of volume 2^{t-m} contains exactly 2^t points, 0 ≤ t ≤ m [32]. Based on the upper bound proposed in [32] for the discrepancy of a general (t, m, s)-net in base b, the following can be derived as an upper bound on the Sobol discrepancy:

D((t, m, s)-net in base 2) ≤ (2^t / (s - 1)!) m^{s-1} + O(m^{s-2}).    (10)

Fig. 3. Some bad pairings (high discrepancy) of Sobol samples. (a) (9, 10, 2)-net. (b) (8, 10, 2)-net. (c) (7, 10, 2)-net.

Fig. 3 illustrates some of the poor 2-D projection pairings for 1024 Sobol samples. Each of the projections in the figure is a sequence from a (t, 10, 2)-net in base 2, where t is 9, 8, and 7 for Fig. 3(a)-(c), respectively. The lower t is, the lower the discrepancy; therefore, the proposed Sobol sequence generator should target minimizing t as its objective.

Before discussing the discrepancy optimization method, the general algorithm for generating Sobol samples [13], along with an approach to determine the discrepancy of the Sobol samples, is reviewed. To generate N = 2^m samples of a d-dimensional Sobol sequence, x_i^{(j)}, where i = 0, ···, N - 1 and j = 1, ···, d, each x_i^{(j)} is generated from the following equation:

x_i^{(j)} = a_1 v_1^{(j)} ⊕ a_2 v_2^{(j)} ⊕ ··· ⊕ a_m v_m^{(j)}    (11)

where ⊕ denotes a bitwise XOR operation, the v_k^{(j)} are binary direction numbers, and the coefficients a_i ∈ {0, 1} are extracted from the binary representation of the Gray code of i. The Gray code of i is defined as G(i) = i ⊕ int[i/2], where int[x] is the largest integer less than or equal to x. For example, to find x_25^{(j)}: i = 25 → G(i) = 11001 ⊕ 01100 = 10101, and hence

x_25^{(j)} = v_1^{(j)} ⊕ v_3^{(j)} ⊕ v_5^{(j)}.    (12)

Each direction number v_k^{(j)} is a binary fraction written as

v_k^{(j)} = ν_k^{(j)} / 2^k    (13)

where ν_k^{(j)} is an odd integer, 0 < ν_k^{(j)} < 2^k, for k = 1, ···, m. For each dimension j, the sequence of integers ν_k^{(j)} is defined by a q-term recurrence relation:

ν_i^{(j)} = 2 b_1^{(j)} ν_{i-1}^{(j)} ⊕ 2² b_2^{(j)} ν_{i-2}^{(j)} ⊕ ··· ⊕ 2^{q-1} b_{q-1}^{(j)} ν_{i-q+1}^{(j)} ⊕ 2^q ν_{i-q}^{(j)} ⊕ ν_{i-q}^{(j)}    (14)

where the b_k^{(j)} ∈ {0, 1}, k = 1, ···, q - 1, are the coefficients of a q-degree primitive polynomial [33] specified for each dimension j. Jaeckel [34] offers a collection of more than 8 million primitive polynomials, up to degree q = 27, to be used for Sobol generation. It is evident from (14) that in each dimension there is a great deal of flexibility in choosing the initial values (ν_1^{(j)}, ···, ν_q^{(j)}), whereas the remaining values (ν_{q+1}^{(j)}, ···, ν_m^{(j)}) are generated through the q-degree recurrence relation (14). The constraints on the initial direction values ν_k^{(j)}, k = 1, ···, q^{(j)}, are that they must be odd integers less than 2^k; therefore, for a dimension with a q-degree primitive polynomial, there are 2^{q(q-1)/2} possible choices of the initial direction values. Consequently, a random technique is traditionally used to choose the initial ν_k^{(j)} terms for each dimension [34].

By referring to Fig. 3, it can be seen that to fill the empty regions and increase the uniformity of the samples, either more samples are needed or the initial direction values of the corresponding dimension must be changed. This is where the proposed technique comes into the picture. The objective of this part of the work is to pick a set of initial direction values that reduces the bad pairings as much as possible. Moreover, the more uniform projections should be generated for the first dimensions, with the uniformity worsening as the dimension index increases. This is helpful because not all the process parameters, and hence the principal components, are equally critical, so sorting them can be considered to boost the efficiency of the method.

Sobol himself realized the importance of the initial direction values for the quality of the generated sequences, and proposed two properties to increase the uniformity of the samples [35]. However, to satisfy Sobol's proposed properties, 2^{2d} samples are needed. Cheng and Druzdzel defined a measure of 2-D uniformity and proposed a search algorithm to find a set of initial direction values with a defined uniformity [36]. The primary drawback of this technique is that the number of samples and dimensions must be known in advance. Moreover, it regenerates the Sobol sequences and re-evaluates the defined discrepancy measure in each iteration (after each initial direction value update), which significantly increases the runtime of the algorithm, especially when optimizing the initial values for large numbers of samples and dimensions. This was due to the assumption that the bad dimension pairings cannot be found prior to the generation of the sequences [31]. However, there is no need to actually generate a Sobol sequence to detect the poor pairings or to measure t. The t value, as a measure of the uniformity, can be found for a pair of dimensions by using the definitions given in [37]. Suppose v_{i,b}^{(j)} is the bth most significant bit of the binary representation of v_i^{(j)}, the ith direction value of the jth dimension. If, for any integers d1 and d2 in the range [0, m] with d1 + d2 = l, the binary system of d1 + d2 vectors of length m composed of {v_{i,b1}^{(j1)} | 1 ≤ i ≤ m, 1 ≤ b1 ≤ d1} and {v_{i,b2}^{(j2)} | 1 ≤ i ≤ m, 1 ≤ b2 ≤ d2} is full-rank, then the 2-D projection of dimensions j1 and j2 creates an (m - l, m, 2)-net point set. For example, if v_{i,1}^{(j1)} = v_{i,1}^{(j2)} for all i = 1, ···, m, then l = 1, so t = m - 1. This means that up to the 2^m-th sample, the projection of the j1th and j2th dimensions is similar to that in Fig. 3(a). In other words, for all samples 0 ≤ s ≤ 2^m - 1, x_s^{(j1)} < 0.5 ⇔ x_s^{(j2)} < 0.5, because the MSBs of the first m direction values of dimensions j1 and j2 are equal.

2) Optimization of the Sobol Sequence Discrepancy: The timing yield analysis problem was found to be composed of high-dimensional terms; therefore, the discrepancy of the high-dimensional projections affects the estimation error. The discrepancy of a multi-dimensional Sobol sequence can be improved by careful selection of the initial direction values. However, not all the random variables of the yield function are equally important. For example, the m variables representing the principal components (PCs) of the grid random variables in the PCA decomposition contribute the most to the ANOVA decomposition (refer to Table I). This is because a single PC affects the propagation delay of many gates, and hence affects the critical delay more strongly. Moreover, the gates with close-to-zero timing slack have a greater chance of becoming critical, leading to the conclusion that the PCs with greater contributions to those gates have more effect on the circuit yield than the rest. As a result, if the direction values of the Sobol generator are set such that the lower-discrepancy dimensions are obtained first and the higher-discrepancy ones later, the generated Sobol samples can be used in an ordering scheme according to the importance of the PCs for an efficient estimation. In this part, a simulated annealing optimization algorithm (Algorithm 1) is proposed which produces such direction values. Note that this is a relatively lengthy process, and can take as long as a day, but it is only a one-time process: the extracted initial direction values are saved and used for future Sobol sequence generations.

Algorithm 1 Optimize Initial Direction Values (m)
Generate random initial direction values (IDVs) for 2^{m-1} dimensions
Generate up to m DVs by using the recursions for each dimension
{Find pairwise discrepancies and initialize priorities}
for k = m - 1 downto 1 do
  for d2 = 1 to 2^{m-k} do
    for d1 = 2^{m-k-1} + 1 to 2^{m-k} do
      t(d1, d2) = t of the 2-D projection of d1 and d2
      if t(d1, d2) > m - k - 1 then
        priority(d1) += 2^{t(d1,d2)-(m-k-1)-1}
      end if
    end for
  end for
  Initialize Temperature
  while there is a bad pairing (any t(d1, d2) > 0) do
    while inner-loop criterion do
      Randomly select a dimension, directed by priorities
      Randomly select an IDV in that dimension
      Randomly change that IDV up to the kth bit
      Compute the new t matrix and the priority vector
      if accept(new - old priority, Temperature) then
        Apply the change to the selected IDV
        Update t and priority
        Generate up to m DVs by using the recursion
      end if
    end while
    Update Temperature
  end while
end for

For a given m, the objective of the optimizer is to set the initial values such that the maximum t for the pairs of dimensions in {1, ···, 2^{m-k}} is m - k - 1, where k = 1, ···, m - 1. This means t = 0 (perfectly uniform) for the first two dimensions, t ≤ 1 for pairs among dimensions one to four, t ≤ 2 for pairs among dimensions one to eight, and so on. Therefore, only the pairings of dimensions d1 ∈ {2^{m-k-1} + 1, ···, 2^{m-k}} with dimensions d2 ∈ {1, ···, 2^{m-k}} need to be verified to satisfy the t ≤ m - k - 1 condition, speeding up the optimization process. To help the optimizer converge even faster, only the first k bits of the initial values in the dimensions {2^{m-k-1} + 1, ···, 2^{m-k}} are included in the search during the optimization. This is due to the fact that, in this range, the maximum t is m - l = m - k - 1, hence l = k + 1, meaning that, at most, up to the kth bit of these dimensions forms the system of independent binary vectors.
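Both the generation recurrence (14) with the Gray-code update (11) and the rank-based t test just described are straightforward to prototype. The following Python sketch is ours and purely illustrative; it is not the authors' implementation, direction-number conventions vary between Sobol variants, and the two primitive polynomials and unit seed values are hypothetical choices.

import numpy as np

def direction_values(b, init, m):
    # nu_k via the q-term recurrence (14); b = [b_1..b_{q-1}] of a
    # degree-q primitive polynomial, init = odd seeds nu_1..nu_q.
    q, nu = len(init), list(init)
    for i in range(q, m):
        acc = nu[i - q] ^ ((1 << q) * nu[i - q])
        for k, bk in enumerate(b, start=1):
            if bk:
                acc ^= (1 << k) * nu[i - k]
        nu.append(acc)
    # v_k = nu_k / 2^k, stored scaled by 2^m as m-bit integers
    return [nu[k] << (m - k - 1) for k in range(m)]

def sobol_points(dirs, m):
    # Gray-code ordering: each new point XORs one direction value, eq. (11)
    n = 1 << m
    x, v = np.zeros(n), 0
    for i in range(1, n):
        v ^= dirs[(i & -i).bit_length() - 1]  # rightmost set bit of i
        x[i] = v / n
    return x

def gf2_rank(rows):
    # Rank of a set of m-bit vectors over GF(2), via XOR basis reduction.
    basis = []
    for r in rows:
        for e in basis:
            r = min(r, r ^ e)
        if r:
            basis.append(r)
    return len(basis)

def bit_rows(dirs, d, m):
    # Row b (b = 1..d): the b-th MSB of each of the m direction values.
    return [sum((((dirs[i] >> (m - 1 - b)) & 1) << (m - 1 - i))
                for i in range(m)) for b in range(d)]

def pair_t(dirs1, dirs2, m):
    # Largest l for which every split d1 + d2 = l yields a full-rank
    # bit system gives an (m - l, m, 2)-net, i.e., t = m - l (cf. [37]).
    l = 0
    while l < m and all(
            gf2_rank(bit_rows(dirs1, a, m) + bit_rows(dirs2, l + 1 - a, m)) == l + 1
            for a in range(l + 2)):
        l += 1
    return m - l

m = 10
dim1 = direction_values([], [1], m)        # poly x + 1 (q = 1)
dim2 = direction_values([1], [1, 1], m)    # poly x^2 + x + 1 (q = 2)
print("t of this 2-D projection:", pair_t(dim1, dim2, m))
x1, x2 = sobol_points(dim1, m), sobol_points(dim2, m)   # 1024 points each

Changing the initial seeds (e.g., [1, 3] instead of [1, 1] for the second dimension) may change the reported t, which is exactly the degree of freedom that Algorithm 1 searches over, without ever generating the points themselves.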

Moreover, the simulated annealing engine is directed by an initial-value selection criterion that gives high priority to the dimensions with the worst discrepancies.

Fig. 4. Distribution of t, the measure of discrepancy, for 1024 Sobol samples using (top) random initial direction values and (bottom) optimized initial direction values. (a) Dimensions 1-512. (b) Dimensions 1-256. (c) Dimensions 1-128. (d) Dimensions 1-64. (e) Dimensions 1-512. (f) Dimensions 1-256. (g) Dimensions 1-128. (h) Dimensions 1-64.

Fig. 4 compares the distribution of t, the measure of the discrepancy, before and after the optimization for m = 10 (1024 samples). As depicted in Fig. 4(d), even for the first few dimensions (1, ···, 64) and before the optimization, some pairs of dimensions have very high discrepancies (t = m - 1 = 9), and many others have discrepancies higher than the maximum of the optimized version (t > 5). However, as shown in Fig. 4(e)-(h), for the optimized version the maximum discrepancy reduces from 8 to 5, moving from dimension 512 down to 64.

3) Yield Analyzer: The proposed SSTA framework is constructed by combining the proposed low-discrepancy Sobol sequence and the Latin hypercube samples. A similar hybrid approach was developed in [4] to leverage the high uniformity of a few QMC dimensions for the important parameters. In this research, for a given number of N = 2^m samples, 2^{m-1} dimensions use Sobol samples, whereas the remaining dimensions use LHS samples. The optimum initial direction values of the Sobol generator for a given m are pre-computed and stored by using the algorithm proposed earlier. Since the Sobol samples provide higher than 1-D uniformity, they are prioritized to be assigned to the most important PCs of the process parameters. As discussed earlier, the PCs contribute the most to the variance of the critical delay; therefore, the more uniform dimensions are assigned to them. The LHS samples, in turn, provide samples for the non-spatially-correlated process parameters (e.g., RDF) or any less important PCs that are not assigned to the Sobol samples. The number of Sobol dimensions is limited to 2^{m-1} for a given number of samples, to limit t ≤ m - 2. However, approaching the first dimension, the uniformity increases; therefore, it is beneficial to rank the PCs according to their importance. Consequently, a weight is assigned to each PC as a measure of its criticality.

The following equation is used to derive the criticality of each PC:

c_i = Σ_{j=1}^{p} Σ_{k=1}^{N_j} ψ_{i,j}² exp(α · Slack_{j,k} / D_nom)    (15)

where c_i is the measure of the criticality of the ith principal component, p is the number of PCs, ψ_{i,j} is the coefficient of the jth PC in the ith grid variable (obtained from the PC analysis [29]), N_j is the number of logic cells in the jth grid, Slack_{j,k} is the slack of the kth cell in the jth grid, D_nom is the nominal critical delay of the circuit, and α < 0 is a constant factor. As a result, if a grid has many close-to-zero-slack cells, and/or its neighboring grids do, the corresponding PC of that grid has a high criticality. The PCs are then ordered based on their criticalities and assigned to the Sobol dimensions sequentially. If there are more Sobol dimensions than PCs, the remaining Sobol dimensions are assigned to some of the non-correlated process parameters, according to a simple criticality measure equal to -1 × slack_cell; thus, the smaller the slack of a cell, the higher the probability that the non-correlated parameters of that cell are assigned to the Sobol samples.

C. Results

The standard deviations of the estimation errors are investigated by applying the benchmark circuits. The values reported in Table II are the improvement percentages, relative to the traditional-MC, of the proposed method with and without the optimized direction values. The maximum acceptable delay is set such that the circuits have 95% yield. The standard deviation of the estimated yield is obtained by repeating the MC or LHS/QMC analysis 100 times with a constant number of samples (32, 128, 512, or 2048), recording the yield in each run, and finally calculating the standard deviation of the 100 estimated yields. Note that in the QMC-based sampling, rerunning the original Sobol generator does not generate different sequences. Therefore, the scrambled Sobol technique [38] is used to generate randomized-QMC samples, so that the generated samples differ in each run and can be used to estimate the variance of the estimation error. The scrambling adds a degree of randomization to the samples but maintains their structure in terms of the discrepancies.

TABLE II
Standard Deviation Reduction (Percentage) of the Estimated Yield Compared to the Traditional-MC Analysis

Samples      32            128           512           2048
Opt-DV     w/o    w/     w/o    w/     w/o    w/     w/o    w/
C432       11.7   18.6   26.7   34.1   35.8   38.7   42.6   34.0
C499       12.9   28.3   13.4   35.6   21.6   39.8   27.5   47.0
C880       25.6   16.8   25.4   27.5   28.1   42.1   34.6   47.3
C1355      16.3   20.5   21.9   23.3   18.0   34.2   17.7   40.9
C1908      19.0   34.3   30.2   37.0   13.9   37.1   32.4   52.9
C2670      16.8   13.5   19.6   25.8   3.7    29.1   17.4   35.4
C3540      16.0   14.2   16.6   13.7   11.7   35.7   22.2   43.2
C5315      16.2   13.0   23.6   10.1   17.6   27.0   10.8   32.9
C6288      17.1   15.5   7.7    19.6   13.2   22.4   8.7    37.9
C7552      18.4   18.4   12.3   29.5   15.2   22.8   9.9    25.3
S9234      18.0   7.4    2.6    30.6   11.2   19.8   15.8   38.5
S13207     13.1   12.0   12.8   8.8    0.5    36.8   10.1   31.5
S15850     8.4    18.3   16.9   10.5   20.8   20.2   11.9   43.1
S35932     5.9    20.3   3.0    14.9   11.2   20.8   6.8    27.6
S38417     13.9   9.3    5.4    12.6   8.0    20.6   0.4    20.1
S38584     13.8   18.3   23.2   27.6   12.6   20.7   5.4    23.4
Average    15.2   17.4   16.3   22.5   15.3   29.2   17.1   36.3

The proposed technique (QMC/LHS) is tested with and without applying the optimized direction values (Opt-DV).

Fig. 5. Standard deviation of the error of the estimated yield for C6288: comparison of the traditional-MC, the QMC/LHS method with non-optimized IDVs, and the proposed QMC/LHS method with optimized IDVs.

Fig. 5 depicts the standard deviation of the error for the yield of C6288 as an example. As listed in Table II, the non-optimized direction-value version shows, on average, an almost 16% improvement across the range of sample counts. This occurs because the high-dimensional discrepancy of only a few random variables is low in the non-optimized technique; therefore, it is mostly the 1-D ANOVA terms that contribute to the estimation variance reduction. The average improvement of the standard deviation with the optimized direction values, however, is as high as 36%, because the optimized direction-value Sobol sampler generates more low-discrepancy random variables than the non-optimized sampler. Given the O(N^{-0.5}) convergence rate of the traditional-MC, if the improvement in standard deviation is r%, then (1 - r/100)^{-2} times as many samples are needed for the traditional-MC method to reach the same confidence interval as the proposed method; the 36% improvement thus translates into 2.44 × 2048 samples for the same accuracy with the traditional-MC method.

IV. Order Statistics-Based Control Variate for Yield Estimation

In this section, a timing yield estimation technique is introduced which has a higher efficiency than the QMC/LHS-based method for a moderate number of samples (a few hundred to a few thousand). The problem of the QMC/LHS method is its negligible variance reduction when a small number of samples is used. Moreover, due to the strong high-dimensional ANOVA terms in the yield function, the QMC/LHS is, generally, not very effective. The classical control variate, a variance reduction technique, is first reviewed in this section. Then the inefficiency of directly applying the classical approach to the yield function is investigated through the analysis of the correlation between the actual and control variables for the yield problem. Finally, an order statistics-based technique is applied to the timing yield estimation with the use of an auxiliary control variable.

A. Control Variate and Yield Estimation Problem

The control variate is known as a promising variance reduction technique for expected value estimation when an auxiliary model (control variable) is available. The two necessary conditions on the control variable are that it be highly correlated with the parameter under expected value estimation, and that its expected value be known exactly. A rigorous exposition of the classical control variate technique is presented in [39]. The following is an overview of the classical control variate method: if X is the random variable under expected value estimation, and C is the control variable with a known expected value µ_C, then X is substituted by X* in computing E[X], as

X* = X - β(C - µ_C)    (16)

where β is a constant. This leads to an estimation variance of

var(X*) = var(X) - 2β cov(C, X) + β² var(C).    (17)

Therefore, a significant variance reduction can be achieved by the proper setting of β, if X and C are highly correlated. However, this classical formulation is not effective for the yield estimation problem if it is applied directly to the yield function of (3), because the random variable of the yield problem (I_τ) is a crisp function. Consequently, no matter how well a control variable is correlated with the critical delay D, a slight shift (bias or scale) between the random variable D and the control variable model C harms the correlation between the yield functions due to D and C. For example, assume D ≡ N(0, 1) is a random variable under yield estimation, and C = 0.75(D - 1) + 0.2 N(0, 1) is a highly correlated control variable with corr(D, C) = 0.966.


If the threshold value for the yield estimation is set to 1.5, the correlation between the yield indicators of D and C is very low, corr(I_{1.5}(D), I_{1.5}(C)) = 0.095, due to the shift and scaling of the control variable with respect to the actual variable. In other words, P(D < 1.5) cannot be well approximated (modeled) by P(C < 1.5); therefore, no improvement is gained by the direct application of the classical approach to the yield problem.
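The correlation collapse in this example is easy to reproduce; the following sketch is ours, for illustration, and mirrors the numbers quoted above:

import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal(1_000_000)                     # D ~ N(0, 1)
C = 0.75 * (D - 1) + 0.2 * rng.standard_normal(D.size)

print(np.corrcoef(D, C)[0, 1])               # ~0.966
# Crisp indicator functions at threshold 1.5
print(np.corrcoef(D < 1.5, C < 1.5)[0, 1])   # ~0.1: the correlation collapses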

B. Proposed Order Statistics-Based Control Variate Method

The control variate method was used for a quantile estimation problem earlier in [40], where a median-unbiased order statistics-based [41] estimator was derived. A similar approach is used in this article, but to derive a median-unbiased estimator for the yield. Suppose an auxiliary control variable C is available which is highly correlated with the circuit's critical delay, D, and assume that the CDF of C, Φ(c), is known exactly. This control variable is introduced later in this section. The problem is then reduced to finding a quantile point c_q such that the hypothesis test of P(D < τ) = P(C < c_q) is satisfied. Then, simply, y = P(D < τ) = Φ(c_q) is an estimation of the yield, based on the knowledge of the control variable as well as the simulation of D. Such a point c_q is located where the following condition is satisfied:

#{i | C_i < c_q} = #{i | D_i < τ}.    (18)

This estimator is used instead of the natural one formulated in (5). For example, suppose 95 out of 100 simulations yield a critical delay lower than the maximum acceptable delay. If the natural estimator (5) were used, 0.95 would be reported for the yield. However, the actual yield might be greater (e.g., 0.97); due to the random nature of selecting the simulation points, only 95 points (two points fewer than expected) landed in the acceptable region. Because C is correlated with D, it is also very probable that only 95 points (two points fewer) reside in the acceptable region of the control variable formed by (C < c_q). Therefore, c_q can be determined such that the number of points where C < c_q equals 95. Obtaining such a c_q results in an estimation ŷ = Φ(c_q) which is more accurate than the natural estimator based on D alone, since the CDF of C is known exactly, and the control variable is highly correlated with the critical delay.

Algorithm 2 describes the proposed yield estimation method. The C and D vectors are sorted in ascending order. If k is the largest integer such that D(k) < τ, then c_q can be set to any value between C(k) and C(k + 1) in order to satisfy (18). A linear interpolation is used such that the closer τ is to D(k), the closer c_q is to C(k). Note that if k = N, no simulation with a delay higher than the threshold (τ) is observed, and c_q is set to C(N). As will be seen later in this paper, this condition (k = N) is a source of bias, which becomes problematic as the number of samples shrinks.

The derivation of such an estimator is similar to that of the median-unbiased quantile point estimator reported in [40]. The hypothesis H: P(D < τ) = P(C < c_q) is equivalent to H: P(D < τ, C < c_q) + P(D < τ, C > c_q) = P(D < τ, C < c_q) + P(D > τ, C < c_q), which is equal to H: P(D < τ, C > c_q) = P(D > τ, C < c_q).

Algorithm 2 Order Statistics-Based Control Variate Yield
Generate an approximate model for the critical delay
for spls = 1 to N do
  p = generate a d-dim random process vector
  D(spls) = simulate the critical delay of the circuit with p
  C(spls) = evaluate the approximate model with p
end for
Sort C and D in ascending order
k = the largest integer such that D(k) < τ
if k = N then
  c_q = C(N)
else
  c_q = (τ - D(k))(C(k + 1) - C(k)) / (D(k + 1) - D(k)) + C(k)
end if
ŷ = Φ(c_q)
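A direct NumPy transcription of Algorithm 2 is given below as a sketch under our own conventions; mu_C and sigma_C are the known parameters of the Gaussian control variable introduced later in this section, and the k = 0 edge case is our addition, not part of the paper's pseudocode.

import numpy as np
from statistics import NormalDist

def order_stat_cv_yield(D, C, tau, mu_C, sigma_C):
    D, C = np.sort(D), np.sort(C)       # ascending order statistics
    k = int(np.searchsorted(D, tau))    # k = #{i | D(i) < tau}
    if k == 0:
        return 0.0                      # edge case (our assumption)
    if k == len(D):
        c_q = C[-1]                     # the bias source discussed below
    else:
        frac = (tau - D[k - 1]) / (D[k] - D[k - 1])
        c_q = C[k - 1] + frac * (C[k] - C[k - 1])
    return NormalDist(mu_C, sigma_C).cdf(c_q)   # y_hat = Phi(c_q)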

The test of this hypothesis is achieved by a uniformly most powerful unbiased test and the application of McNemar's test [40].

Up to this point, it has been assumed that an auxiliary variable C is available. Such a control variable can be determined by extracting the nominal critical path of the digital circuit and deriving a linear expression for its delay with respect to the process parameters. The nominal critical path is defined as the path with the highest delay when all process random variables are set to their nominal values. The linear modeling of the path delay in relation to the process parameters leads to a Gaussian path delay; therefore, such a control variable has a known CDF. The expression for this variable is extracted as follows. Suppose C(p), the control variable, is the delay of the nominal critical path as a function of the process parameters p. Then

C(p) = Σ_{i=1}^{#gates} T^{(i)}(p^{(i)}, S^{(i-1)})    (19)

where T^{(i)}(p^{(i)}, S^{(i-1)}) is the delay of the ith gate in the critical path as a linear function of the process parameters of that gate, p^{(i)}, and the signal transition time of the fan-in gate, S^{(i-1)}, as follows:

T^{(i)}(p^{(i)}, S^{(i-1)}) = a_0^{(i)} + a_1^{(i)} S^{(i-1)} + Σ_{j=1}^{#p^{(i)}} a_{j+1}^{(i)} p_j^{(i)}    (20)

where a_0^{(i)} is the nominal delay of the ith gate, when the process variation parameters are all set to zero and the input transition is zero (step function), and

S^{(i)}(p^{(i)}, S^{(i-1)}) = b_0^{(i)} + b_1^{(i)} S^{(i-1)} + Σ_{j=1}^{#p^{(i)}} b_{j+1}^{(i)} p_j^{(i)}    (21)

where b_0^{(i)} is the nominal transition time of the ith gate. S^{(0)} is set to the constant primary-input transition time.
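The composition of (19)-(21) into a single linear, and hence Gaussian, expression can be sketched as follows. This is our illustration only: the dictionary-based gate description, the coefficient names, and the example values are placeholders, not the paper's data structures.

import numpy as np

def path_delay_linear(gates, n_params, s0):
    # Returns (c0, w) such that C(p) = c0 + w.p along the nominal
    # critical path, by propagating the linear forms (20) and (21).
    d0, dw = 0.0, np.zeros(n_params)
    sc, sw = s0, np.zeros(n_params)
    for g in gates:
        d0 += g["a0"] + g["a1"] * sc       # gate delay from incoming slew, eq. (20)
        dw = dw + g["a1"] * sw
        dw[g["idx"]] += g["a"]             # this gate's own parameters
        sc, sw = g["b0"] + g["b1"] * sc, g["b1"] * sw   # outgoing slew, eq. (21)
        sw[g["idx"]] += g["b"]
    return d0, dw

gates = [
    {"a0": 10.0, "a1": 0.3, "a": np.array([0.5, 0.2]), "idx": [0, 1],
     "b0": 4.0, "b1": 0.2, "b": np.array([0.1, 0.05])},
    {"a0": 12.0, "a1": 0.4, "a": np.array([0.6]), "idx": [2],
     "b0": 5.0, "b1": 0.3, "b": np.array([0.2])},
]
c0, w = path_delay_linear(gates, n_params=3, s0=2.0)
Sigma = np.eye(3)                          # parameter covariance (placeholder)
mu_C, sigma_C = c0, float(np.sqrt(w @ Sigma @ w))
# For p ~ N(0, Sigma): C ~ N(mu_C, sigma_C^2), so Phi in Algorithm 2 is known.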


Finally, to model the spatially correlated process parameters, the PCA method adopted for the SSTA in [29] is used. The jth process variable of the ith gate, p_j^{(i)}, is decomposed into a weighted linear sum of a set of independent normal random variables. As a result, C, the delay of the nominally critical path, is formed as a linear function of a set of independent Gaussian random variables, so the PDF of C is Gaussian with a known mean and variance, leading to a known CDF.

Some important issues should be considered here. First, the assumption of linearity applies only to the control variable, not to the actually estimated critical delay, D. Second, although accounting for only the nominally critical path leads to an underestimated value, this poses no problem as long as the control variable remains correlated with the actual delay, D. In fact, D and C are highly correlated mostly because the underlying process parameters are globally and spatially correlated. However, even if the process parameters were purely random, there would be considerable correlation due to the sharing of the critical gates by different paths.

Table III lists the correlation factor between the control variable and the actual critical delays. The first column is obtained when the gate length variations are globally and spatially correlated, as modeled in Section II, and the second set of correlation numbers is obtained for purely random gate length variations. As can be seen, the C499 and C1355 circuits show the lowest correlation factors in the purely random case. This is due to the structure of these two circuits, where many paths have delays equal or very close to the delay of the nominally critical path. In fact, both C499 and C1355 are 32-bit single-error-correcting circuits in which most paths are critical, and very few gates have a non-zero slack. A similar situation is seen for S35932, where the number of gates with low slack is very high compared with the other circuits in the same range of gate count. Therefore, using only one nominally critical path to generate the control variable leads, in these cases, to a variable with a low correlation to the actual critical delay: it is always possible that another path becomes critical, and the model then almost certainly underestimates the delay.

Finally, it is also possible to detect different potential critical paths from different process corners. We suggest two approaches to leverage this information. First, the control variable can be set to a weighted sum (or average) of the delays of the potential critical paths obtained from a limited number of corner analyses. This way, the control variable remains Gaussian but can represent a larger set of critical paths. Second, the control variable can be set to the maximum of the limited set of potential critical paths. This technique models the actual critical delay more accurately than the weighted-sum method, but the control variable is no longer Gaussian; as a result, finding its quantile in the order-statistics estimator requires a numerical integration.

C. Results

Table IV shows the percentage of the standard deviation reduction achieved by applying the proposed order statistics-based method, compared with the traditional-MC sampling.


TABLE III
Correlation Between the Defined Control Variable and the Actual Critical Delay, With and Without Gate Length Spatial Correlations

Circuit    Correlated   Random
C432       0.9966       0.9502
C499       0.9806       0.7364
C880       0.9936       0.9175
C1355      0.9707       0.6976
C1908      0.9997       0.9977
C2670      0.9958       0.9709
C3540      0.9984       0.9679
C5315      0.9934       0.8927
C6288      0.9989       0.9777
C7552      0.9996       0.9969
S9234      0.9878       0.8491
S13207     0.9996       0.9960
S15850     0.9964       0.9304
S35932     0.9071       0.8301
S38417     0.9922       0.9068
S38584     0.9895       0.9296

TABLE IV
Standard Deviation Reduction (Percentage) of the Estimated Yield Compared to the Traditional-MC Analysis

Variations       Correlated             Random
Samples      128    512    2048     128    512    2048
C432         72.4   65.5   68.3     42.7   42.3   34.7
C499         48.6   47.7   41.6     1.9    9.8    -3.5
C880         65.1   59.4   58.9     28.7   14.7   27.0
C1355        50.2   34.5   42.8     3.9    -4.5   -0.6
C1908        86.9   80.3   76.6     78.1   69.8   62.0
C2670        70.4   65.4   66.9     44.6   43.2   42.4
C3540        80.0   76.0   71.8     53.2   48.2   55.1
C5315        62.3   58.1   56.8     29.3   24.4   16.2
C6288        82.5   73.3   74.6     41.1   42.2   37.8
C7552        87.9   81.7   77.5     74.3   65.2   63.2
S9234        45.7   47.8   51.2     1.6    -1.7   -4.7
S13207       86.7   79.2   75.1     77.1   64.8   64.2
S15850       62.3   65.2   66.0     37.4   21.6   22.7
S35932       16.1   7.2    10.1     7.8    -5.0   4.1
S38417       53.5   53.0   51.5     7.7    15.8   3.9
S38584       53.3   50.6   54.7     48.4   40.0   45.7

The order statistics-based control variate technique is tested with and without considering spatially correlated random variables.

Similar to the analysis in Section III-C, the yield of each circuit is set to 0.95. As discussed earlier, the efficiency of this approach is highly dependent on the magnitude of the correlation between the critical delay and the control variable; therefore, we also report the efficiency of the method in an extreme case where all process variations are purely random. The advantage of this method is that the variance reduction is considerable even for a moderate number of samples. However, due to its faster convergence rate, the QMC/LHS method can outperform this method for some benchmark circuits (e.g., S35932), even with a low number of samples. Fig. 6 depicts the histograms of 100 runs of the traditional-MC and of the order statistics-based control variate method for the C499 circuit, with 512 samples in each run. The average estimated yield of both methods is 0.95, but the standard deviation of the proposed method shows a 48% reduction.


Fig. 6. Histograms of 100 estimated yields, each obtained from 512 samples of (a) the traditional-MC and (b) the order statistics-based control variate, for the C499 circuit. The proposed method's estimation is unbiased, with E[ŷ] = 0.95, and shows a 48% standard deviation reduction.

Fig. 7. Bias of the estimated yield for C6288: a comparison between 99% and 95% yields.

Finally, as discussed for Algorithm 2, there is always a possibility, with probability y^N, of detecting no failing circuit in N simulations. This is a source of bias which increases as the number of samples, N, is reduced. The bias of an estimation is the deviation of the expected value of the estimate, ŷ, from the actual yield, y. Ideally a bias of zero (E[ŷ] - y = 0) is desired, but as seen in Fig. 7 this is not the case for the proposed method when the number of samples is reduced: a negative bias is introduced, due to the possible underestimation of c_q by C(N) in Algorithm 2. Moreover, for a fixed N, the bias is larger for a 99% yield than for a 95% yield, since the probability of no failure (y^N) is higher.
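For instance, with y = 0.99 the no-failure probability y^N is still about 0.60 at N = 50 and about 0.37 at N = 100, whereas with y = 0.95 it is already below 0.08 at N = 50. This is why the 99% curve in Fig. 7 requires considerably more samples than the 95% curve before its bias becomes negligible.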

V. Classical Control-Variate and Gaussian Modeling for Yield Estimation

As reported in the previous section, the order statistics-based control variate technique outperforms the optimized QMC/LHS method, especially for a moderate number of samples; however, the method is prone to underestimating the yield. This leads to a negatively biased estimation when the yield analysis is performed with a low number of samples, and the magnitude of the bias increases rapidly as the number of samples falls below a threshold. In this section, the Gaussian PDF is used to approximately model the PDF of the critical delay. Under the Gaussian assumption, the mean and variance of the critical delay are estimated by MC simulations and used for yield estimation.

This technique is suitable for large circuits and the early stages of the design phase, when a quick estimation of yield is required with a small number of samples. The hybrid QMC/LHS and the order statistics-based control variate have been shown to be inefficient in such cases, due to a large error variance or an unwanted negative bias. However, two problems are associated with this technique. First, the variance of the error can still be very large, since the low number of samples leads to a less confident estimation of the critical delay's mean and variance. To address this issue, the classical control variate technique is applied to provide a highly accurate estimation of the mean and variance. The second problem arises from the error originating from the Gaussian model approximation. The Gaussian assumption can lead to an error that cannot be fixed by increasing the number of samples or by improving the estimation variance with a control variate technique, simply because the actual PDF of the critical delay is not exactly normal. However, a designer may accept this level of inaccuracy and credit the traditional approach of yield analysis in terms of µ̂ + kσ̂ quantiles, as long as the estimated mean and variance are highly accurate. This is especially true for the early stages of the design phase, where the other previously reviewed MC techniques are inefficient. Other solutions would be to employ PDF models that capture a higher number of moments, such as the skew-normal distribution [42], or the asymptotic probability extraction [43].

Inspired by the classical control variate equation (16), the following formula is used to derive a variance-reduced estimation of the critical delay mean, µ̂_{D*}:

µ̂_{D*} = (1/N) Σ_{i=1}^{N} (D_i - β_µ C_i) + β_µ µ_C    (22)

where D_i is the critical delay at the ith iteration, C_i is the control variable value at the ith iteration, β_µ is a constant, and µ_C is the exact expected value of the control variable, C. Note that the control variable is the same as the one used for the order-statistics method: the delay of the nominally critical path, expressed in terms of the linear gate delay equations (19)-(21). By using (17) and its derivative with respect to β, the optimum β_µ, which minimizes the variance of the estimated mean, is

β_µ = cov(D, C) / var(C) = ρ σ_D / σ_C    (23)

where ρ is the correlation coefficient between D and C, and σ_D and σ_C are their respective standard deviations. If a similar regression is used for the variance estimation, the following expression is derived for the variance-reduced estimation of the critical delay variance, σ̂²_{D*}:

σ̂²_{D*} = [ Σ_{i=1}^{N} ((D_i - µ̂_D)² - β_σ (C_i - µ̂_C)²) ] / (N - 1) + β_σ σ_C²    (24)

where µ̂_D = (Σ_{i=1}^{N} D_i) / N and µ̂_C = (Σ_{i=1}^{N} C_i) / N.
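For concreteness, a minimal sketch of the estimators (22) and (24) follows. It is ours, not the authors' code, and the β values are taken as arguments in anticipation of the fixed β = 1 choice discussed below.

import numpy as np
from statistics import NormalDist

def cv_mean_var(D, C, mu_C, var_C, beta_mu=1.0, beta_sigma=1.0):
    N = len(D)
    mu_hat = np.mean(D - beta_mu * C) + beta_mu * mu_C          # eq. (22)
    dD, dC = D - D.mean(), C - C.mean()
    var_hat = (np.sum(dD ** 2 - beta_sigma * dC ** 2) / (N - 1)
               + beta_sigma * var_C)                            # eq. (24)
    return mu_hat, var_hat

# Gaussian-model yield of this section:
# mu_hat, var_hat = cv_mean_var(D, C, mu_C, var_C)
# y_hat = NormalDist(mu_hat, var_hat ** 0.5).cdf(tau)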


The term, (N − 1), in the denominator eliminates the bias of the variance estimation as the E[(x − µ ˆ x )2 ] = N−1 σx2 for N N samples. In order to achieve a variance reduction for the introduced critical delay variance estimator, the two variables, (D − µ ˆ D )2 2 and (C − µ ˆ C ) must be correlated. The covariance between them are obtained as 

ˆ C) cov (D − µ ˆ D ) , (C − µ 2

2



 =2

N −1 N

2 2 ρ2 σC2 σD

(25)

if the critical delay is approximated with a Gaussian PDF. As a result, the optimum β_σ is

\beta_\sigma = \frac{\rho^2 \sigma_D^2}{\sigma_C^2} \qquad (26)

since \operatorname{var}\left[(C - \hat{\mu}_C)^2\right] = 2\left(\frac{N-1}{N}\right)^2 \sigma_C^4.

The problem with (23) and (26) is that they are functions of ρ and σ_D, which are themselves unknown. One option is to use the MC simulation data to estimate ρ and σ_D and then compute the optimum β factors from them; however, this biases the estimation of the mean and variance. Another option is to use a portion of the samples to estimate ρ and σ_D, obtaining an approximate optimum β, and to use the remaining samples to compute the variance-reduced mean and variance with (22) and (24). This method is unbiased, but the variance reduction is smaller than that reported in (17), since β itself is now a varying parameter. The variance of the estimated β grows as the number of samples shrinks, which degrades the variance reduction for very small sample counts (e.g., 10). Looking back at (17), the estimation variance is a quadratic function of β; hence, setting β to a value not too far from the optimum incurs only a modest penalty in estimation variance. In fact, as long as β < 2 × β_optimum, some variance reduction is still achieved [refer to (23) and (17)]. It is therefore reasonable to assume that σ_D is very close to σ_C, since the former is the standard deviation of the circuit's delay and the latter is that of its nominal critical path. Likewise, ρ can be assumed to be very close to 1, given the highly correlated behavior of the two random variables. Consequently, β_µ and β_σ can be roughly set to 1.
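The split-sample option described above can be sketched as follows, reusing the cv_mean and cv_var helpers from the earlier sketch; the one-third/two-thirds split mirrors the experiment reported in Fig. 8, and the helper name is illustrative.

```python
import numpy as np

def two_phase_cv(D, C, mu_C, var_C):
    # Phase 1: the first third of the samples estimates rho, sigma_D, sigma_C.
    n0 = len(D) // 3
    rho = np.corrcoef(D[:n0], C[:n0])[0, 1]
    s_D = D[:n0].std(ddof=1)
    s_C = C[:n0].std(ddof=1)
    beta_mu = rho * s_D / s_C            # optimum beta for the mean, per (23)
    beta_sigma = (rho * s_D / s_C) ** 2  # optimum beta for the variance, per (26)
    # Phase 2: the remaining two thirds feed the variance-reduced estimators.
    mean_est = cv_mean(D[n0:], C[n0:], mu_C, beta_mu)
    var_est = cv_var(D[n0:], C[n0:], var_C, beta_sigma)
    return mean_est, var_est

# Fixed-beta alternative: with sigma_D close to sigma_C and rho close to 1,
# cv_mean(D, C, mu_C, 1.0) and cv_var(D, C, var_C, 1.0) use all N samples.
```

The split keeps the estimate unbiased at the price of a noisier β, while the fixed β = 1 alternative uses all N samples, which is why it wins at very small N in Fig. 8.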

Fig. 8. Standard deviation of the error of the estimated yield using the Gaussian approximation: comparison of the traditional-MC and the proposed classical control variate method using the optimum β and a constant β (= 1). (a) C6288. (b) C1355.

Fig. 8 compares the standard deviation of the yield estimated with a Gaussian approximation for the traditional-MC and the classical control variate methods. Here, by traditional-MC we mean extracting the mean and variance by traditional-MC and calculating the yield from a Gaussian fit; this differs from extracting the yield from traditional-MC with the estimator of (5). Two options are considered for the β calculation in the control variate technique. In the first, the optimum β is determined from one third of the sample population, and the remaining two thirds are used to estimate the critical delay's mean and variance. In the second, both β_µ and β_σ are set to 1, as discussed earlier. As can be seen, the magnitude of the variance reduction is lower for C1355 than for C6288, as with the order-statistics-based method. This is due to the lower correlation between the control variable and the actual circuit delay in the C1355 circuit. As is also expected, the variance reduction is higher for the fixed β = 1 than for the estimated optimum β, but the difference gradually vanishes as the number of samples increases, until the optimum β eventually outperforms the fixed one. It is noteworthy that the reported standard deviation does not reflect the intrinsic error (bias) of the Gaussian approximation. Such an error is governed by the factors that shape the moments of the critical delay PDF beyond second order (e.g., circuit graph and topology, and technology parameters). Table V lists the standard deviation reduction obtained with the proposed technique for fixed β = 1 and 16 samples. As listed, the variance reduction is significantly higher than that of the two earlier methods; however, the estimation is biased due to the intrinsic error of modeling the critical delay with a Gaussian PDF. For example, in S35932, a −0.43 bias in the estimate means E[ŷ] = 94.57%; the standard deviation of error is reduced by 43.5% relative to the MC estimate, whose standard deviation of error is 4% for 16 samples, so it drops to 2.26%. Therefore, [µ ± σ] becomes [94.57% ± 2.26%] vs. [95% ± 4%].
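As a reminder of how the Gaussian fit used throughout this comparison converts moment estimates into a yield figure, a one-line sketch is given below; t_clk denotes the timing constraint, and the names are illustrative.

```python
from math import erf, sqrt

def gaussian_yield(mu_hat, sigma_hat, t_clk):
    # Timing yield under a Gaussian critical-delay model: P(D <= t_clk) = Phi(z).
    z = (t_clk - mu_hat) / sigma_hat
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))
```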


TABLE V
Standard Deviation Reduction (Percentage) and Bias (100E[ŷ] − 95) of the Estimated Yield Compared to the Traditional-MC Analysis

                 Correlated            Random
Variations     Bias      Std.       Bias      Std.
C432          −0.19      91.0       0         66.9
C499          −0.22      78.6       0.19      33.8
C880           0.05      90.4      −0.04      65.1
C1355         −0.16      75.9      −0.33      21.8
C1908         −0.12      96.2       0.19      93.6
C2670         −0.11      89.8       0.20      78.9
C3540         −0.07      93.4       0.02      77.9
C5315         −0.04      88.7       0         61.5
C6288         −0.11      94.8       0.04      81.4
C7552         −0.21      96.3       0.11      91.6
S9234         −0.07      84.7      −0.33      50.5
S13207        −0.18      96.5       0.15      93.3
S15850        −0.24      88.6      −0.36      57.5
S35932        −0.32      55.4      −0.43      43.5
S38417        −0.27      84.9       0.10      59.1
S38584        −0.45      68.7       0.27      77.9

The classical control variate technique is tested with and without considering spatially correlated random variables.

VI. Putting Them All Together

In this section, the proposed timing yield analysis techniques are integrated into a unified engine. So far, three MC-based timing analysis methods have been reviewed.
1) The low-discrepancy QMC/LHS engine: this method is not efficient for small numbers of samples (e.g., < 500). However, the magnitude of the variance reduction it achieves is almost constant across different types of circuits, since it depends only on the relative importance of the yield function's ANOVA terms, which are quite similar for various circuits (refer to Table I).
2) The order-statistics control variate engine: this method is highly biased for small numbers of samples, and the number of samples required to remove the bias is yield-dependent (refer to Fig. 7). Moreover, the magnitude of the variance reduction is highly dependent on the circuit topology.
3) The classical control variate engine: this method models the critical delay with a generic distribution (a Gaussian in this paper), so it is inherently biased regardless of how many samples are used.
Since the QMC/LHS method is robust (circuit-independent) but not very efficient with a low number of samples, one may combine it with the order-statistics control variate method. This means the samples are generated by the QMC/LHS method, and the order-statistics-based control variate estimator is used for the yield estimation. Consequently, the combined engine still works well even if the circuit's topology does not produce a highly correlated control variable. The standard deviation of the error is found to be slightly better than the maximum of the variance reductions achieved by applying each method alone. However, the bias issue for small sample counts is inherited by this engine. To fix it, the classical control variate estimator should first be used to provide an estimate of the yield, while the sampling technique remains QMC/LHS, so that the simulation results can later be reused by the combined QMC/LHS and order-statistics control variate estimator. Therefore, suppose y is the yield estimated with the Gaussian approximation. The bias of the order-statistics method originates from cases where no failing circuit is observed in N simulations. Intuitively, if the number of samples is large enough that the probability of observing no failure, y^N, is less than 0.1, the bias can be assumed negligible; solving y^N < 0.1 gives the threshold N > −1/log₁₀(y). This threshold is 230 and 45 for 99% and 95% yields, respectively, and the validity of these numbers is verified in Fig. 7. Therefore, given an approximate estimate of y from the classical control variate method, one may continue the timing simulation with QMC/LHS samples and apply the order-statistics-based control variate estimator; a sketch of this combined flow is given below.
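In the minimal sketch below, qmc_lhs_samples, run_sta, classical_cv_yield, and os_cv_yield are hypothetical stand-ins for the QMC/LHS sample generator, the STA engine, and the two control variate yield estimators; only the control flow is taken from the text above.

```python
import math

def unified_yield_engine(circuit, batch=32, max_samples=4096):
    # Unified engine: QMC/LHS sampling throughout. The classical control
    # variate gives an early (biased) yield estimate y; once N exceeds
    # -1/log10(y), the small-N bias of the order-statistics estimator is
    # negligible, so the same simulation results are re-estimated with the
    # combined QMC/LHS + order-statistics control variate estimator.
    samples, delays = [], []
    y = None
    while len(delays) < max_samples:
        new = qmc_lhs_samples(batch, skip=len(samples))   # hypothetical generator
        samples += new
        delays += [run_sta(circuit, s) for s in new]      # hypothetical STA call
        y = classical_cv_yield(delays, samples)           # Gaussian-fit estimate
        if 0.0 < y < 1.0 and len(delays) > -1.0 / math.log10(y):
            return os_cv_yield(delays, samples)           # hypothetical estimator
    return y  # fall back to the (biased) classical estimate
```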

VII. Conclusion

In this paper, three MC-based timing yield estimation techniques for digital circuits were proposed. The principal drawback of MC techniques is their slow convergence rate; advanced sampling techniques and the control variate method were applied to reduce the number of simulations. The following three methods were investigated.
1) An optimized-discrepancy QMC/LHS engine: this method provides a greater variance reduction than the non-optimized QMC/LHS method. The quality of the results is almost circuit-independent, but not significantly better than the traditional-MC, especially for low numbers of samples (e.g., < 500).
2) An order-statistics-based control variate engine: the reduction in the number of samples achieved by this method can reach an order of magnitude. However, in circuits where many paths have zero slack, the saving can drop significantly. Moreover, this method is highly biased for low numbers of samples, and the number of samples required to lessen the bias is yield-dependent.
3) A classical control variate engine: this method models the critical delay with a generic distribution and is therefore inherently biased. However, it is a good candidate for early-stage timing yield estimation, where the first two techniques are either slow or highly biased.


Javid Jaffari (S’00) received the B.S. degree in bioelectrical engineering from the Science and Research Branch, Azad University, Tehran, Iran, in 2002, the M.S. degree in electrical engineering from the University of Tehran, Tehran, in 2005, and the Ph.D. degree in electrical and computer engineering from the University of Waterloo, Waterloo, ON, Canada, in 2010. In his M.S. dissertation, he worked on low-power test and leakage power reduction techniques for deep sub-micron digital very large scale integration (VLSI) circuits. His Ph.D. research was on statistical yield analysis and design for nanometer VLSI. He is currently with IGNIS Innovation, Inc., Montreal, QC, Canada, where he develops intelligent algorithms to improve the activematrix organic light-emitting diode display’s yield, uniformity, and aging. He has authored/co-authored over 20 papers in international journals and conferences.

Mohab Anis (S’98–M’03–SM’09) received the B.S. degree (with honors) in electronics and communication engineering from Cairo University, Cairo, Egypt, in 1997, and the M.A.Sc. and Ph.D. degrees in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, in 1999 and 2003, respectively. He holds an MBA degree with a concentration in entrepreneurship and innovation and an M.S. degree with a concentration in management of technology. He is currently an Associate Professor with the Department of Electronics Engineering, American University in Cairo, New York. Previously, he was a Tenured Associate Professor with the University of Waterloo, where he is now an Adjunct. He has authored/co-authored over 100 papers in international journals and conferences and is the author of two books: Multi-Threshold CMOS Digital Circuits-Managing Leakage Power (Kluwer, 2003) and Low-Power Design of Nanometer FPGAs: Architecture and EDA (Morgan Kaufmann, 2009). His current research interests include integrated circuit design and design automation for very large scale integration systems in the nanometer regime. Dr. Anis is an Associate Editor of the ACM Transactions on Design Automation of Electronic Systems, the Microelectronics Journal, the Journal of Circuits, Systems and Computers, and the ASP Journal of Low Power Electronics. He is an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I. From 2008 to 2009, he was an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: EXPRESS BRIEFS. He is a member of the program committees for several IEEE and ACM conferences, and is the Chair of the 2010 International Conference on Microelectronics. He was the recipient of the 2009 Early Research Award from Ontario’s Ministry of Research and Innovation, the 2004 Douglas R. Colton Medal for Research Excellence in recognition of his excellence in research, leading to new understanding and novel developments in microsystems in Canada, and the 2002 International Low-Power Design Contest. He is an advocate of technological innovation within organizations, for which he founded INNOVETY LLC, Egypt, a management consulting and software development firm that focuses on innovation management.