Why We Need Statistical Static Timing Analysis

Cristiano Forzan and Davide Pandini
Central CAD and Design Solutions, STMicroelectronics, Agrate Brianza, 20041 Italy
[email protected], [email protected]

Abstract

As technology continues to advance deeper into the nanometer regime, tight control of the process parameters is increasingly difficult. As a consequence, variability has become a dominant factor in the design of complex ICs. Traditional Static Timing Analysis (STA) is becoming insufficient to accurately evaluate the impact of process variations on design performance, given the increasing number of process, power supply voltage, and temperature (PVT) corners. In contrast, Statistical Static Timing Analysis (SSTA) is a promising technique to handle increasingly large environmental and process fluctuations, especially on-chip parameter variations. However, the statistical approach requires costly additional data, such as an accurate description of the process variations and a statistical standard cell library characterization. In this paper, STA and SSTA are applied to a real industrial design to compare the two techniques in terms of both accuracy and cost. From our analysis, we conclude that the potential advantages offered by SSTA exceed the additional library characterization cost and the effort of assembling the process data.

1. Introduction

Following the aggressive technology scaling trends and the fundamental limits of optical lithography, the gap between the designed layout and what is actually fabricated on silicon is increasing. As a consequence, the performance predicted during design implementation may significantly differ from post-silicon measurements. Furthermore, with the increasing difficulty of process control in nanometer technologies, manufacturing variations are growing as a percentage of the feature sizes [1]. The number of variability sources is also growing as the fabrication processes become more and more complex, and the correlation between different variability sources is increasingly difficult to predict. These parameter fluctuations cause parametric yield loss, i.e., performance degradation such that the fabricated chips do not function as required by the specification.

Traditionally, the methodology adopted to determine the timing performance spread of a design in the presence of variability is to run multiple static timing analyses at different "corners", including "best-", "nominal-", and "worst-case". This approach is breaking down because STA would require 2^n runs, where n is the number of variability sources. Considering the list of the principal sources of variation and their impact on delay reported in [2], a complete corner-case analysis requires from 2^7 to 2^20 STA runs! A possible solution to reduce the number of timing analyses is worst-case design and verification. Worst-case timing analysis determines the design timing performance by assuming that the worst process and operating conditions occur simultaneously, based on the assumption that if a circuit works correctly under the most pessimistic conditions, then it will also work correctly under nominal conditions. Therefore, designing for extreme conditions would automatically take care of the nominal case. However, taking the corner value of each electrical parameter may lead to an over-pessimistic estimation of the performance, because the actual correlation between variations is not considered. In other words, the scenario in which all parameters take their worst-case values has a very low probability of occurring in practice, and in several cases it cannot occur at all. As an example, considering the variation impact on delay reported in [2], the worst-case approach requires a [-65%, +80%] guard-band timing interval, thus leading to a strong underutilization of the technology. Furthermore, as predicted by the ITRS, within-die variations are becoming more significant. For technologies down to 180nm, on-chip variations were mostly within 10% of the nominal parameter value. However, shrinking the technology to 90nm, 65nm, and below may introduce on-chip variations larger than 50% with respect to the nominal parameter value. These variations can be handled by existing corner-based design methodologies only by applying different derating factors to data-path and clock-path delays, and/or by introducing uncertainty margins. Another drawback of the worst-case approach is that it does not provide designers with information about the sensitivity to the various parameters, which can potentially be very useful in driving the optimization effort towards a more robust design.
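As a simple back-of-the-envelope illustration of this corner explosion, the following Python sketch (illustrative only; the source names are hypothetical and not the list from [2]) enumerates two-valued min/max corners for a set of variability sources, showing the 2^n growth that makes exhaustive corner-based STA impractical.

from itertools import product

sources = ["Vdd", "T", "L", "W", "Tox", "Vth_n", "Vth_p"]   # hypothetical variability sources
corners = list(product(("min", "max"), repeat=len(sources)))
print(len(sources), "sources ->", len(corners), "corner STA runs")   # 7 sources -> 128 runs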



A solution for accurately evaluating the design performance in the presence of variability is SSTA. Starting from the statistics of the variability sources, including their probability distributions, variances, and covariances, statistical timing analysis computes the probability distribution of the design slack in a single analysis. The slack distribution brings several advantages. For products that are at-speed tested and binned, like microprocessors, it makes it possible to forecast the percentage of chips that will fall in the highest frequency bin. More generally, it predicts the true operating frequency. Therefore, this approach may reduce and even eliminate the need for broad guard-banding [3]. Moreover, a statistical timer can properly identify the sensitivities with respect to variations, thus enabling statistical optimization methods. Several algorithms and techniques have been proposed in the literature for accurate and efficient SSTA. The approaches can be divided into various categories: block-based or path-based, parameterized or non-parameterized statistical timing analysis, based on the assumption of Gaussian linear process parameters or, more generally, considering non-Gaussian non-linear process parameter behavior, and addressing both timing verification and circuit optimization [4][5][6][7][8][9]. Driven by the increasing impact of parametric yield loss in nanometer designs, every design automation conference now features sessions and panels dedicated to statistical analysis and optimization. In this work, we are not attempting to propose new algorithmic or modeling techniques for SSTA. Instead, we believe that a critical issue that has not been adequately addressed so far is how to use the currently available SSTA methods in existing design flows, given the limited, and quite often missing, process data available to model the process parameter variability and its spatial and temporal correlation. Hence, we believe that it is not yet possible to use SSTA for a reliable sign-off and, given the current difficulty of obtaining accurate process data from silicon characterization, different approaches to exploiting SSTA should be explored by both IDMs and fabless design houses. In this work, we analyze and propose alternative methodologies to utilize SSTA without relying on it for sign-off, in order to show the potential benefits of this technique in advanced nanometer technologies. The rest of this paper is organized as follows: Section 2 describes two experiments demonstrating the limitations of corner-based analysis and the advantages of SSTA. The preparation of the data necessary to perform statistical timing analysis is discussed in Section 3, while in Section 4 the results of SSTA applied to a real industrial design are reported and discussed. Finally, Section 5 summarizes a few concluding remarks.
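As a small illustration of the kind of forecast a slack distribution enables, the following Python sketch computes a parametric yield and a frequency-bin share from an assumed Gaussian slack; the mean, sigma, and 0.1ns bin margin are invented numbers, not data from the design analyzed in Section 4.

from math import erf, sqrt

def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

mu_slack, sigma_slack = 0.28, 0.18      # hypothetical design slack statistics (ns)

# Parametric yield at the target period = probability of a non-negative slack.
print(f"predicted parametric yield: {1.0 - gaussian_cdf(0.0, mu_slack, sigma_slack):.1%}")

# Share of dice whose slack exceeds a hypothetical 0.1ns margin, i.e. dice that
# could be binned at a correspondingly faster clock frequency.
print(f"predicted fastest-bin share: {1.0 - gaussian_cdf(0.1, mu_slack, sigma_slack):.1%}")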

Figure 1. Propagation delay: percentage difference between 3σ MC and worst-case on the characterization grid for an INVERTER in 65nm CMOS technology

2. Process Corners vs. Performance Corners


In this Section, two simple experiments are described to show the benefits of SSTA at both the gate and the path level. In particular, we demonstrate that there is no single process corner that yields the worst timing result. It is important to notice that this observation may potentially jeopardize the robustness of current sign-off methodologies based on worst-case corners.

2.1. Process Corners vs. Input Slew and Output Load

The first experiment shows that the worst timing corner depends on both the input slew and the output capacitive load. It also shows that it depends on the driving strength and the functionality of the standard cell. We analyzed the propagation delay characterization of an INVERTER in a 65nm CMOS technology. The delay values stored in the libraries at a given PVT corner are obtained by performing a circuit simulation for each combination of input slew and output capacitive load, whose values are defined in the library cell look-up table. When worst-corner characterization is considered, the worst process corner is used during these simulations. However, the worst process corner does not necessarily correspond to the worst timing value. Here, worst-case timing is defined as the 3σ value of the timing distribution obtained with Monte Carlo (MC) analysis carried out with a transistor-level circuit simulator. Following this definition, we performed MC analysis on each point of the characterization grid (10K runs for each input slew/output load pair), using the process parameter distributions provided by manufacturing. Subsequently, we compared the 3σ value of the timing distribution against the value obtained with the worst-case corner. The percentage difference is illustrated in Figure 1. It can be observed that the difference between the worst timing value (computed with MC) and the timing of the worst process corner (stored in the libraries) is a function of both input slew and output capacitive load. Similar results were obtained by repeating the experiment with different standard cells and/or different driving strengths, thus demonstrating that the worst timing corner depends on both cell functionality and size.
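The following Python sketch reproduces the flavor of this experiment with a toy two-parameter delay model (the model, parameters, and grid values are purely hypothetical, not the characterized 65nm INVERTER): at each slew/load point the delay at a fixed all-worst process corner is compared against the 3σ Monte Carlo delay, and the gap changes across the grid because the relative weight of the two parameters depends on slew and load.

import numpy as np

rng = np.random.default_rng(42)

def inverter_delay(slew, load, d_len, d_vth):
    # Toy delay model (ns): nominal value plus slew/load-dependent
    # sensitivities to two normalized process parameters.
    nominal = 0.02 + 0.8 * load + 0.15 * slew
    s_len = 0.02 + 0.25 * load      # channel-length sensitivity grows with the load
    s_vth = 0.01 + 0.10 * slew      # threshold-voltage sensitivity grows with the slew
    return nominal + s_len * d_len + s_vth * d_vth

for slew in (0.01, 0.05, 0.20):            # input slew (ns)
    for load in (0.001, 0.010, 0.050):     # output load (arbitrary units)
        mc = inverter_delay(slew, load,
                            rng.normal(0.0, 1.0, 10_000),
                            rng.normal(0.0, 1.0, 10_000))
        three_sigma = mc.mean() + 3.0 * mc.std()
        worst_corner = inverter_delay(slew, load, 3.0, 3.0)   # both parameters at +3σ
        diff = 100.0 * (worst_corner - three_sigma) / three_sigma
        print(f"slew={slew:4.2f} load={load:5.3f}  worst-corner vs 3σ: {diff:+5.1f}%")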


2.2. Die-to-die and Mismatch Impact on Path Delay

We considered a simple twelve-NAND chain, and we measured the path delay between the first stage and each of the downstream stages along the chain. For each path, we considered the ratio between the standard deviation and the nominal value of the delay distribution. Initially, we performed MC analysis considering only die-to-die variations. Because of the perfect correlation between the parameters, the ratio between sigma and mean of the delay of an n-stage path is given by [11]:

(σ/µ)_path = sqrt( Σ_{i=1..n} Σ_{j=1..n} ρ_ij σ_i σ_j ) / ( Σ_{i=1..n} µ_i ) = sqrt( n² σ²_cell ) / ( n µ_cell ) = (σ/µ)_cell .   (1)

Therefore, when considering only die-to-die variations, the ratio (1) is constant as a function of the path length. In the second test, we performed MC analysis of the twelve-NAND chain with only the mismatch variations, which are part of the on-chip variations (OCV). In this case, since the parameters are uncorrelated, the ratio between sigma and mean of the path delay is given by [11]:

(σ/µ)_path = sqrt( Σ_{i=1..n} σ²_i ) / ( Σ_{i=1..n} µ_i ) = sqrt( n σ²_cell ) / ( n µ_cell ) = (1/√n) (σ/µ)_cell .   (2)

The variance as a percentage of the path delay mean value is thus expected to decrease with the square root of the number of stages. The standard deviation reduction with the path length in (2) is quite intuitive, and it is due to cancellations between uncorrelated components along the path. The results of these tests are summarized in Figure 2, where the ratio σ/µ in (1) and (2) is reported as a function of the path length for both the die-to-die and the mismatch variations. The combined effect of the two variability sources, which are assumed to be mutually uncorrelated, is illustrated by the red plot. With the traditional worst-case approach, the variation impact in STA is typically bounded by timing derating factors. The derating technique is represented in Figure 2 as a straight line, constant as a function of the number of stages (green dashed plot). Therefore, timing derating is an appropriate technique to model die-to-die variations, while it is not adequate to accurately model mismatch variations and, more generally, OCV. For example, by choosing to accurately model the variation for a path length of 3 stages, as illustrated in Figure 2, we obtain a pessimistic model for longer paths and an optimistic model for shorter paths. Since the on-chip component of variation is expected to increase as technology scales down, the derating solution will become more and more inaccurate, leading to a performance analysis that is at the same time risky (optimistic for short paths) and overly pessimistic (for long paths).

Figure 2. Die-to-die and mismatch variations as a function of the path length (σ/µ vs. number of stages, 1 to 12; curves: die-to-die variations, mismatch variations, combined die-to-die and mismatch, and constant timing derating).
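A minimal Monte Carlo check of this scaling behavior is sketched below with made-up per-stage delay statistics (not the characterized NAND data): a fully shared, die-to-die-like deviation keeps σ/µ constant with the number of stages, while independent, mismatch-like deviations shrink it as 1/√n, in agreement with (1) and (2).

import numpy as np

rng = np.random.default_rng(7)
mu_cell, sigma_cell = 0.1, 0.01     # hypothetical per-stage delay statistics (ns)
N = 50_000

for n in (1, 3, 6, 12):
    # Die-to-die: one common deviation shared by all the stages of the chain.
    common = rng.normal(0.0, sigma_cell, N)
    d2d_path = n * (mu_cell + common)

    # Mismatch: an independent deviation for every stage of the chain.
    mism_path = (mu_cell + rng.normal(0.0, sigma_cell, (N, n))).sum(axis=1)

    print(f"n={n:2d}  die-to-die σ/µ = {d2d_path.std() / d2d_path.mean():.4f}"
          f"   mismatch σ/µ = {mism_path.std() / mism_path.mean():.4f}")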


3. Additional Data for SSTA

In order to accurately predict the distribution of the design performance, a statistical timing analyzer must have access to robust and accurate models of the sources of variation, including the type of probability distribution and the variances. This variability characterization is costly, because of the large amount of data required to estimate distributions and correlations, which translates into a large number of characterization test structures, silicon area, and testing resources. The other piece of additional information necessary for SSTA is the dependence of timing quantities, such as the propagation delay and the transition time, on the process variations. Concerning the devices, timing must be re-characterized to take the physical variations into account. As an example, the propagation delay can be described as the sum of its nominal value and a function of the parameter variations:

delay = delay_nom + f(p_1, p_2, …, p_k) .   (3)

By assuming small fluctuations of the process parameters around their nominal values, the function f in (3) can be approximated by its first-order Taylor expansion. Hence, (3) can be expressed as:


delay = delay_nom + Σ_{i=1..k} (∂delay/∂p_i) Δp_i ,   (4)

where the term ∂delay/∂p_i is the delay deviation with respect to the deviation of the process parameter p_i, and is called the sensitivity. By assuming that the process parameter variations Δp_i are described by Gaussian distributions, (4) makes it easy to compute the Gaussian distribution of the delay. This method is known as the linear sensitivity approach, and during statistical library characterization both the nominal value and the sensitivities with respect to each parameter are evaluated. This task must be carried out for each timing quantity described in the library. In contrast, the environmental sources of variation, such as power supply and temperature, are typically kept separate from the spatial variations, as their impact on cell delay is significantly nonlinear. Therefore, a mixed approach is generally adopted, performing the statistical library characterization with respect to the process variations around different (power supply, temperature) corners.

It is important to notice that the assumptions underlying (4) are not valid in general. The linear impact of the process parameters on delay is justified only for small variations. However, as the critical feature size shrinks, process variability is getting larger and the linear approximation is becoming less and less accurate [10]. Regarding the Gaussian assumption, some process parameters have a significantly non-Gaussian probability distribution; in particular, via resistance is known to have an asymmetric probability distribution. Furthermore, the result of the max operation (performed during block-based analysis) between Gaussians is not itself a Gaussian distribution. Major efforts have been devoted by researchers to extending parameterized statistical timing analysis to non-Gaussian and non-linear parameters [8][9]. However, the application of the proposed techniques is expected to significantly impact the SSTA run-time, and a trade-off between accuracy and run-time is needed when SSTA is performed on modern multi-million gate industrial designs with a large number of variability sources. It is important to point out that, even assuming a linear behavior of the delay as a function of the process parameter variations, statistical library characterization is a very critical issue, as it requires significantly more computational effort and characterization time, tighter accuracy requirements, and larger file sizes with respect to traditional library characterization. Again, the balance between cost and accuracy must be carefully analyzed. Similarly, a linear approach can also be adopted for the wire parasitics, which can be computed by the statistical extraction tool along with their sensitivities to the sources of variation.
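A minimal sketch of the linear sensitivity model in (4) under the Gaussian assumption is given below; the nominal delay, sensitivities, sigmas, and correlations are invented numbers, not characterized library data.

import numpy as np

delay_nom = 0.120                                  # nominal delay (ns)
sens   = np.array([0.010, -0.004, 0.006])          # ns per unit deviation of each parameter
sigmas = np.array([1.0, 1.0, 1.0])                 # normalized parameter sigmas
corr = np.array([[1.0, 0.2, 0.0],
                 [0.2, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])                 # assumed parameter correlations

cov = corr * np.outer(sigmas, sigmas)              # parameter covariance matrix
sigma_delay = float(np.sqrt(sens @ cov @ sens))    # first-order delay standard deviation
print(f"delay ~ N({delay_nom:.3f} ns, σ = {sigma_delay * 1e3:.2f} ps)")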

4. Case Study and Experimental Results


In this Section, the results of statistical analysis applied to an industrial design (about 400K gates) manufactured in a 90nm CMOS technology are analyzed in detail. The SSTA tool used in this case study implements both the block-based and the path-based statistical timing approaches, along with a statistical library characterization engine and a variation-aware parasitic extractor. We re-characterized all the standard cell libraries onto which the design is mapped, considering the following primary sources of variation: channel length, channel width, oxide thickness, and threshold voltage, for both nMOS and pMOS devices. The total time required by the statistical characterization, performed around three different power supply/temperature corners, was between 4X and 5X the corresponding time necessary to perform the traditional library characterization. For the parasitic extraction, we provided the process stack-up technology described by the Interconnect Technology File to the SSTA tool, which generated the extraction rules with its built-in field solver. The computational time necessary to generate the variation-aware extraction rules was comparable to the time necessary to perform the same operation with other commercial extractors that do not account for process variability.
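To make the characterization step more concrete, the sketch below shows one way the per-parameter delay sensitivities used by (4) could be extracted with finite differences around the nominal point at a given slew/load; the simulate() function is a purely hypothetical stand-in for a transistor-level simulation, not the characterization engine used in this case study.

def simulate(slew, load, params):
    # Hypothetical stand-in for a circuit simulation returning a delay in ns.
    return (0.02 + 0.8 * load + 0.15 * slew
            + 0.03 * params["dL"] + 0.02 * params["dVth"])

def characterize_point(slew, load, param_names, step=0.1):
    nominal = {p: 0.0 for p in param_names}
    d0 = simulate(slew, load, nominal)
    sens = {}
    for p in param_names:
        perturbed = dict(nominal, **{p: step})
        sens[p] = (simulate(slew, load, perturbed) - d0) / step   # finite difference
    return d0, sens

print(characterize_point(0.05, 0.010, ["dL", "dVth"]))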

4.1. Statistical Slack vs. Arrival/Required Times

A very interesting result demonstrating the effectiveness of SSTA is related to the statistical slack. First, we considered the required and arrival times for an endpoint, and we performed traditional (deterministic) timing analysis at the worst- and best-case process corners. Then, we carried out SSTA, considering only the device variations, and obtained the corresponding timing distributions. As expected, the distributions of both the required and the arrival times are bounded by the best- and worst-case analyses. However, in timing analysis the criticality of a path is expressed by its slack, which is defined as the difference between the required and the arrival time. Therefore, we used the statistical and the worst-/best-case techniques to compute the slack, obtaining the results shown in Figure 3. From the analysis of the required, arrival, and slack values, we can observe that, while the required and arrival distributions are bounded by the worst-/best-case analysis, the slack distribution is not: the 3σ value of the slack distribution is negative (-0.238ns) even though both the worst- and best-case values are positive. This means that the process corner is not necessarily the slack corner: the worst slack value does not necessarily occur at the worst or best process corner. While both the arrival and the required time are necessarily monotonic with respect to the process parameters, the slack is not, and its worst value does not occur at the min/max values of the parameters. By focusing our attention on a critical path that includes both High Threshold Voltage (HVT) and Standard Threshold Voltage (SVT) devices, we discovered one of the reasons why this happens. After an SSTA run, the list of the process parameters that most affect the slack distribution, along with their sensitivities, can be reported. In particular, for the path under analysis we found that the sensitivities with respect to the threshold voltages of the HVT and SVT devices had opposite signs: an increase in the SVT threshold voltage increases the arrival time (as most of the data-path cells are SVT cells) and therefore decreases the slack, while an increase in the HVT threshold voltage increases the required time (since, for power reasons, the clock tree of the latching flip-flop is composed only of HVT cells) and therefore increases the slack. In our analysis, since the actual correlation value was not available, we assumed that the SVT and HVT threshold parameters were highly (but not completely) correlated, since the HVT threshold is obtained from the SVT threshold by means of an additional process step (an extra implantation). Therefore, during SSTA those parameters may move in opposite directions, producing the result reported above. Instead, when traditional timing analysis is performed, the process parameters are completely correlated, since they are all considered at their worst-case values. In conclusion, SSTA can detect a situation where the slack is negative, while traditional timing analysis cannot. Moreover, after a statistical timing analysis run, the sensitivities of the timing quantities with respect to the process parameters can be reported to the designers, and can potentially be very useful to obtain a more robust design.
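The mechanism described above can be reproduced with the following sketch, using invented numbers rather than the analyzed endpoint: the arrival and required times depend on two highly correlated threshold parameters with opposite-sign slack sensitivities, so the slack is positive at both extreme corners while its statistical mean minus 3σ is negative.

import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Hypothetical normalized threshold-voltage variations, highly correlated.
rho = 0.9
cov = [[1.0, rho], [rho, 1.0]]
d_svt, d_hvt = rng.multivariate_normal([0.0, 0.0], cov, size=N).T

# Hypothetical nominal values (ns) and sensitivities (ns per σ of each parameter).
arrival  = 4.00 + 0.10 * d_svt      # data path built mostly from SVT cells
required = 4.10 + 0.08 * d_hvt      # clock path built from HVT cells
slack = required - arrival

# Corner-based slack: both quantities evaluated with all parameters at the same corner.
corner_slacks = [(4.10 + 0.08 * c) - (4.00 + 0.10 * c) for c in (-3.0, +3.0)]
print("corner slacks (ns):", [round(s, 3) for s in corner_slacks])        # both positive
print("statistical mean - 3σ slack (ns):",
      round(slack.mean() - 3.0 * slack.std(), 3))                         # negative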

Figure 3. Slack obtained with SSTA vs. worst-/best-case corners for one endpoint (x-axis: time in ns; statistical slack distribution with mean 0.280ns and σ 0.183ns, 3σ point at -0.268ns; best-case slack 0.240ns; worst-case slack 0.360ns).

4.2. Design Slack vs. Critical Path Slack


Another important effect that cannot be accurately taken into account by traditional STA is the computation of the design slack. By definition, the design slack is the minimum over all the path slacks. When traditional static timing analysis is performed, each slack is just a number; therefore, the design slack is equal to the slack of the most critical path. In contrast, in SSTA the slacks are probability distributions; hence, the min operation is performed between distributions, yielding another probability distribution. Since the slacks are in general not perfectly correlated, the mean of the min distribution is smaller than the min of the slack distribution means. In such cases, traditional STA may provide an optimistic result. The design under analysis had a dominant critical path, so this effect did not appear. However, in order to quantify its impact, we disabled the dominant path, obtaining the critical path and design slack distributions reported in Table 1. Even though in this case the difference is very small, i.e., about 9ps at the 3σ point, by considering the design slack we could have a potential violation that was not predicted by the critical path slack.
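A minimal sketch of this effect with invented numbers (not the slacks of Table 1) is shown below: the minimum of three partially correlated path slacks has a mean below the minimum of the individual means.

import numpy as np

rng = np.random.default_rng(1)
N = 200_000

# Three hypothetical near-critical path slacks (ns), partially correlated.
means  = np.array([0.40, 0.41, 0.42])
sigmas = np.array([0.13, 0.13, 0.12])
rho = 0.8
corr = np.full((3, 3), rho) + (1.0 - rho) * np.eye(3)
cov = corr * np.outer(sigmas, sigmas)
slacks = rng.multivariate_normal(means, cov, size=N)

design_slack = slacks.min(axis=1)
print("min of the means (STA-like view):", means.min())
print("mean of the min (design slack):  ", round(design_slack.mean(), 4))
print("design slack mean - 3σ:          ",
      round(design_slack.mean() - 3.0 * design_slack.std(), 4))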

Table 1. Critical path and design slack distribution for the entire design

                              Mean       σ          Mean - 3σ
Critical path slack (ns)      0.40171    0.13246     0.00433
Design slack (ns)             0.38389    0.12956    -0.00479


4.3. SSTA vs. STA

Another important result was obtained by comparing SSTA against traditional STA. We performed STA on the industrial design at the worst-case corner, obtaining the slack value for each path. We sorted the paths by increasing STA slack, obtaining the black curve shown in Figure 4 for the top 100 critical paths. The black curve represents the typical slack behavior after an optimization based on traditional STA. Then, for each path, we performed statistical timing analysis and plotted the slacks evaluated at 3σ (red plot). SSTA was performed under the same conditions listed in the previous paragraph. By comparing the STA slacks against the SSTA slacks, we can make some interesting observations (an illustrative sketch follows the list):

• The oscillating behavior of the SSTA curve means that the paths are sorted differently when their slacks are computed with SSTA instead of STA. In other words, SSTA yields a different path criticality order with respect to traditional STA;


• The 3σ slacks obtained with SSTA are generally larger than the corresponding STA slacks. This means that SSTA reduces the level of pessimism introduced by the worst-case approach;
• For a small number of paths, SSTA provides a 3σ slack value smaller than the slack computed by STA (this result is related to the correlation between parameters and was discussed in the previous section);
• By fixing only a small number of paths, a 10% performance improvement (measured with respect to the faster clock period and represented by the blue dashed line) could be achieved; in contrast, it would be very hard to gain a 15% performance improvement, corresponding to the black dashed line.

Figure 4. SSTA vs. STA: top 100 critical paths (slack in ns vs. paths sorted by STA slack; curves: traditional STA analysis, statistical analysis at 3σ, and the 10% and 15% performance improvement levels).
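The re-ordering effect noted in the first observation can be illustrated with the following sketch, where the path names, worst-case slacks, and statistical moments are all made up: sorting by deterministic slack and sorting by the 3σ statistical slack produce different criticality orders.

paths = [
    # (name, worst-case slack ns, statistical mean ns, statistical σ ns)
    ("p1", 0.05, 0.30, 0.05),
    ("p2", 0.08, 0.25, 0.02),
    ("p3", 0.10, 0.40, 0.12),
    ("p4", 0.12, 0.45, 0.04),
]

by_sta  = sorted(paths, key=lambda p: p[1])               # deterministic slack
by_ssta = sorted(paths, key=lambda p: p[2] - 3.0 * p[3])  # 3σ statistical slack

print("STA order: ", [p[0] for p in by_sta])    # p1, p2, p3, p4
print("SSTA order:", [p[0] for p in by_ssta])   # p3, p1, p2, p4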


After SSTA, the design yield can be predicted before manufacturing, thus making it possible to estimate whether the forecasted profit margins are satisfactory with respect to the soaring NRE and mask costs at 65nm and below. Moreover, with an automatic design optimization based on the sensitivities computed during statistical analysis, the overall design performance and robustness could be increased without introducing excessively pessimistic margins, as the most timing-critical paths can be optimized by replacing only the cells that are most sensitive to parameter variations.


5. Conclusions

In this paper we have shown that SSTA can be performed on an industrial design in an affordable run-time. Furthermore, we have demonstrated some of the advantages of SSTA with respect to traditional STA. In particular, we have shown at different levels of analysis (standard cell, path, and design level) that only statistical timing analysis can model the real timing corners. This is a very important result, as there is no single process corner that can capture the worst timing result. Finally, we have also shown that SSTA makes it possible to increase the design performance without introducing overly pessimistic margins, thus taking full advantage of technology scaling. Therefore, even if SSTA is more costly than STA in terms of both input data and run-time, we may conclude that in the next technology nodes the adoption of this emerging methodology for both sign-off and timing/power optimization will become crucial.

6. References

[1] S. R. Nassif, "Modeling and Analysis of Manufacturing Variations," in Proc. Custom Integrated Circuits Conference, May 2001, pp. 223-228.
[2] L. Stok and J. Koehl, "Structured CAD: Technology Closure for Modern ASICs," Tutorial, DATE, Mar. 2004.
[3] C. Visweswariah, "Death, Taxes and Failing Chips," in Proc. Design Automation Conf., Jun. 2003, pp. 343-347.
[4] H. Chang and S. S. Sapatnekar, "Statistical Timing Analysis Considering Spatial Correlations using a Single PERT-like Traversal," in Proc. Intl. Conf. on Computer-Aided Design, Nov. 2003, pp. 621-625.
[5] A. Agarwal, D. Blaauw, and V. Zolotov, "Statistical Timing Analysis for Intra-Die Process Variations with Spatial Correlation," in Proc. Intl. Conf. on Computer-Aided Design, Nov. 2003, pp. 900-907.
[6] C. Visweswariah, K. Ravindran, K. Kalafala, S. G. Walker, and S. Narayan, "First-Order Incremental Block-Based Statistical Timing Analysis," in Proc. Design Automation Conf., Jun. 2004, pp. 331-336.
[7] A. Agarwal, K. Chopra, D. Blaauw, and V. Zolotov, "Circuit Optimization using Statistical Static Timing Analysis," in Proc. Design Automation Conf., Jun. 2005, pp. 321-324.
[8] H. Chang, V. Zolotov, S. Narayan, and C. Visweswariah, "Parameterized Block-Based Statistical Timing Analysis with Non-Gaussian Parameters, Nonlinear Delay Functions," in Proc. Design Automation Conf., Jun. 2005, pp. 71-76.
[9] Y. Zhan, A. J. Strojwas, X. Li, L. T. Pileggi, D. Newmark, and M. Sharma, "Correlation-Aware Statistical Timing Analysis with Non-Gaussian Delay Distribution," in Proc. Design Automation Conf., Jun. 2005, pp. 77-82.
[10] X. Li, J. Le, P. Gopalakrishnan, and L. T. Pileggi, "Asymptotic Probability Extraction for non-Normal Distributions of Circuit Performance," in Proc. Intl. Conf. on Computer-Aided Design, Nov. 2004, pp. 2-9.
[11] C. S. Amin, N. Menezes, K. Killpack, F. Dartu, U. Choudhury, N. Hakim, and Y. I. Ismail, "Statistical Static Timing Analysis: How Simple Can We Get?," in Proc. Design Automation Conf., Jun. 2005, pp. 652-657.
