Accurate Statistical Soft Error Rate (SSER) Analysis Using A Quasi-Monte Carlo Framework With Quality Cell Models

Yu-Hsin Kuo, Huan-Kai Peng, and Charles H.-P. Wen
Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
E-mail: {[email protected], [email protected], [email protected]}
Abstract— For CMOS designs in sub-90nm technologies, statistical methods are necessary to accurately estimate circuit SER considering process variations. However, due to the lack of quality statistical models, current statistical SER (SSER) frameworks have not yet achieved satisfactory accuracy. In this work, we present accurate table-based cell models, based on which a Monte Carlo SSER analysis framework is built. We further propose a heuristic to customize the use of quasirandom sequences, which successfully speeds up the convergence of simulation error and hence shortens the runtime. Experimental results show that this framework is capable of estimating circuit SSERs more precisely with reasonable speed.

I. Introduction

Formerly a concern only for memories, soft errors have emerged as one of the major failure mechanisms for logic circuits in sub-90nm technologies. Such errors result from radiation-induced transient faults latched by state-holding elements, and their occurrence depends on three masking effects [1]: logical, electrical, and timing masking. As shown in Figure 1, logical masking occurs when the input values of a cell block the propagation of the transient fault under a given input pattern. A transient fault attenuated by electrical masking may further disappear due to a cell's electrical properties. Timing masking occurs when a surviving transient fault arrives at a state-holding element outside the latching window of the clock transition.

Fig. 1. Three masking mechanisms for soft errors

Subject to these three mechanisms, numerous studies have been conducted to evaluate soft errors in logic circuits. The work in [3] propagates transient faults through a gate according to its logic function and meanwhile uses analytical models to electrically evaluate how the transient faults change. A refined model is presented in [4] to incorporate non-linear transistor currents, which is further applied to all gates with different deposited charges. A static analysis is also proposed in [10] that handles timing masking by efficiently computing the propagation of error-latching windows backwards. Consequently, soft error rate (SER) has become a key metric for circuit reliability and has been extensively investigated. SERA [5] computes SER by means of a waveform model to consider the electrical attenuation effect and the error-latching probability while ignoring logical masking. Whereas MARS-C [9] applies a symbolic technique to both logical and electrical masking and scales the error probability according to the specified clock period, AnSER [10] applies signature observability and latching-window computation for logical and timing masking to estimate SER for circuit hardening. SEAT-LA [6] and the algorithm in [7] characterize cells, flip-flops, and the propagation of transient faults simultaneously by waveform models, and yield good SER estimates when compared to SPICE simulation.

Recently, process variations, which worsen in sub-90nm technologies, have brought a paradigm shift to soft-error research. The authors of [13] first investigate the various sources of process variations and conclude that the traditional static approach underestimates circuit SER in the presence of process variations [14]. More specifically, according to Figure 2 from [11], static approaches underestimate circuit SER by up to 50% under the process variation σproc = 5% (±3σproc covers 99.73% of the distribution), or by over 100% under σproc = 10%. However, although [14] and [11], respectively, propose symbolic-
Fig. 2. SER discrepancies between static and Monte Carlo SPICE simulation w.r.t. process-variation rates
and statistical-learning-based frameworks for statistical SER (SSER) analysis, their SSER results are not accurate enough; the main challenge comes from the difficulty of constructing quality cell models for transient-fault distributions. In this work, we first build accurate table-based models for transient-fault distributions, according to which a Monte Carlo SSER analysis framework is built. Further, we propose a heuristic to customize the use of quasirandom sequences, which successfully speeds up the convergence of simulation error and hence shortens the runtime. The experimental results show that the framework yields more accurate SSER results than previous works with reasonable speed.

The rest of this paper is organized as follows. Section II presents the SSER analysis framework. Section III details the generation of our table-based cell models. Section IV proposes a heuristic of using quasirandom sequences to speed up the framework. Section V describes the experimental results, including the accuracy of our models, the Monte Carlo convergence with and without quasirandom sequences, and the SSERs as well as runtimes over a variety of benchmark circuits. Finally, we draw our conclusion and outline future work in Section VI.

II. SSER Analysis Framework

In this section, we describe the SSER analysis framework that considers process-variation impacts for cell-based designs. The proposed framework is illustrated in Figure 3 and mainly consists of four stages: (1) cell modeling, (2) electrical probability computation, (3) signal probability computation, and (4) SER estimation. The components are explained stage by stage in reverse order, from SER estimation back to cell modeling.

Fig. 3. The proposed SSER analysis framework
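The four-stage flow can be sketched as a top-level loop. Every function name below is a hypothetical stub (none of these identifiers come from the paper); the sketch only shows how the stages fit together.

```python
# Hypothetical orchestration of the four stages; stage internals are stubbed.
def characterize_cells(tech_lib, sigma_proc):
    """Stage 1 (cell modeling): build first-strike, propagation, and
    FF latching-window tables via Monte Carlo SPICE characterization (stub)."""
    return {"first_strike": {}, "propagation": {}, "latching_window": {}}

def signal_probability(netlist, node):
    """Stage 3: logical-masking probability along paths from `node`,
    with DWAA correction for reconvergent fanouts (stubbed)."""
    return 0.5

def electrical_probability(tables, node, ff, charge):
    """Stage 2: electrical + timing masking via table lookups (stubbed)."""
    return 0.1

def estimate_sser(netlist, tech_lib, sigma_proc, charges, flip_flops):
    """Stage 4 (SER estimation): accumulate per-node SER contributions."""
    tables = characterize_cells(tech_lib, sigma_proc)
    total = 0.0
    for node in netlist["strike_nodes"]:
        for q in charges:
            for ff in flip_flops:
                total += signal_probability(netlist, node) * \
                         electrical_probability(tables, node, ff, q)
    return total
```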
A. SER estimation

We first introduce the estimation of the overall SER in our framework. The overall SER of the circuit under test (CUT) can be computed by summing up the SERs of the individual nodes in the circuit. That is,

SER_{CUT} = \sum_{i=0}^{N_{node}} SER_i    (1)

where N_{node} is the total number of possible nodes that can be struck by radiation particles in the CUT. Each SER_i can be further formulated by integrating, over the range q = 0 to Q_{MAX}, the product of the particle-hit rate and the probability that a soft error survives:

SER_i = \int_{q=0}^{Q_{MAX}} R(q) \times P_{soft-err}(i, q) \, dq    (2)

Here P_{soft-err}(i, q) represents the probability that a transient fault originating from a particle of charge q at node i results in a soft error at any flip-flop. R(q) represents the effective frequency of a particle hit of charge q per unit time according to [1][5]. That is,

R(q) = F \times K \times A \times \frac{1}{Q_s} \times \exp\left(\frac{-q}{Q_s}\right)    (3)

where F, K, A, and Q_s denote the constants for the neutron flux (> 10MeV), the technology-independent fitting parameter, the susceptible area in cm^2, and the charge-collection slope, respectively.

B. Signal probability computation

P_{soft-err}(i, q) depends on all three masking effects and can be further decomposed into

P_{soft-err}(i, q) = \sum_{j=0}^{N_{ff}} P_{logic}(i, j) \times P_{elec}(i, j, q)    (4)

where N_{ff} denotes the total number of flip-flops in the circuit under test. P_{logic}(i, j) denotes the overall signal probability of propagating the transient fault through all cells along the path from node i to flip-flop j. It can be computed by multiplying the signal probabilities of all cells as follows:

P_{logic}(i, j) = \prod_{k \in i \leadsto j} P_{sig}(k)    (5)

where k denotes a node on the path i \leadsto j and P_{sig}(k), accordingly, denotes the probability that the input signals of node k jointly take values such that the transient fault is not logically masked on this path.

The handling of reconvergent fanout nodes (RFONs) is an important issue in computing signal probability; omitting it may cause considerable error [15]. In this work, a linear-time algorithm, the dynamic weighted averaging algorithm (DWAA), is employed to account for the RFON effect and correct the signal probability. The main idea behind DWAA is to capture the dependency of signals between the fanout cone and the reconvergent node by forcing the reconvergent signals to the values corresponding to their respective fanins. More details on DWAA can be found in [15].
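Equations (1)–(3) can be combined in a small numerical sketch: SER_i is obtained by trapezoidal integration of R(q) × P_soft-err(i, q), and SER_CUT sums the per-node contributions. The constant values (F, K, A, Q_s, Q_MAX) below are illustrative placeholders, not the calibrated values from [1][5].

```python
import math

def particle_hit_rate(q, F=56.5, K=2.2e-5, A=1e-8, Qs=17.3):
    """Equation (3): R(q) = F*K*A*(1/Qs)*exp(-q/Qs).
    Constant values are illustrative placeholders (q and Qs in fC)."""
    return F * K * A * (1.0 / Qs) * math.exp(-q / Qs)

def node_ser(p_soft_err, q_max=150.0, steps=300):
    """Equation (2): trapezoidal integration of R(q)*P_soft_err(q) over [0, Q_MAX]."""
    dq = q_max / steps
    total = 0.0
    for k in range(steps):
        f0 = particle_hit_rate(k * dq) * p_soft_err(k * dq)
        f1 = particle_hit_rate((k + 1) * dq) * p_soft_err((k + 1) * dq)
        total += 0.5 * (f0 + f1) * dq
    return total

def circuit_ser(p_soft_err_per_node):
    """Equation (1): SER_CUT as the sum of SER_i over all strike nodes."""
    return sum(node_ser(p) for p in p_soft_err_per_node)
```

In the full framework, `p_soft_err` would itself be the sum over flip-flops of P_logic × P_elec from Equations (4)–(5); here it is passed in as an opaque callable.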
C. Electrical probability computation

The electrical probability P_{elec}(i, j, q) accounts for the electrical and timing masking effects and can be defined as

P_{elec}(i, j, q) = P_{err-latch}(pw_j, w_j) = P_{err-latch}(\lambda_{elec-mask}(i, j, q), w_j)    (6)

where P_{err-latch} is defined as follows.

Definition (P_{err-latch}, error-latching probability) Assume that the pulse width of an arriving transient fault and the latching window (t_{setup} + t_{hold}) of a flip-flop are random variables, denoted pw and w, respectively. Let x = pw - w be a new random variable with mean µ_x and standard deviation σ_x. Then

P_{err-latch}(pw, w) = \int_{0}^{\mu_x + 3\sigma_x} x \times P(x > 0) \times \frac{1}{t_{clk}} \, dx    (7)

While P_{err-latch} accounts for the timing masking effect, λ_{elec-mask} accounts for the electrical masking effect with the following definition.

Definition (λ_{elec-mask}, electrical masking function) Given a node i where a particle strike causes a transient fault and a flip-flop j at which the transient fault finally ends, assume that the transient fault propagates along one path i \leadsto j through v_0, v_1, ..., v_m, v_{m+1}, where v_0 and v_{m+1} denote node i and flip-flop j, respectively. Then

\lambda_{elec-mask}(i, j, q) = \underbrace{\delta_{prop}(\cdots(\delta_{prop}(\delta_{prop}(pw_0, 1), 2), \cdots), m)}_{m \text{ times}}    (8)

where pw_0 = \delta_{strike}(q, i). In Equation (8), δ_{strike} and δ_{prop} represent the first-strike function and the propagation distribution function of transient faults, respectively.

D. Cell modeling

Since δ_{strike} and δ_{prop} are both non-linear functions of distributions, they are non-deterministic in nature and can only be approximated by efficient and accurate models M_{strike} and M_{prop}. As detailed in the next section, they are also the most critical components of an accurate SSER analysis framework due to the difficulty of integrating process-variation impacts.

III. Table-based Statistical Models

M_{strike} and M_{prop} are respectively the generation and propagation models of the pulse width pw, which is a random variable. According to [11], pw follows a normal distribution, which can be written as:

pw ∼ N(µ_{pw}, σ_{pw})    (9)

Therefore, we decompose M_{strike} and M_{prop} into four models: M^µ_{strike}, M^σ_{strike}, M^µ_{prop}, and M^σ_{prop}, where each can be defined as:

M : \vec{x} \mapsto y    (10)

where \vec{x} denotes a vector of input variables and y is called the model's label or target value. For M^µ_{strike} and M^σ_{strike}, the input variables include charge strength, driving gate, input pattern, and output loading. For M^µ_{prop} and M^σ_{prop}, the input variables include input pattern, pin index, driving gate, input pulse-width distribution (µ^{i-1}_{pw} and σ^{i-1}_{pw}), propagation depth, and output loading.

To build these models, a traditional approach is to construct tables according to manually selected corner cases. However, such an approach has two difficulties. First, these models have many input variables, so enumerating all corner-case combinations is prohibitively expensive. Second, input variables such as the input pulse-width distribution are dependent variables in nature, which cannot be specified directly according to pre-selected combinations. Therefore, we use a different approach, as shown in Figure 4, consisting of three steps: random sample generation, table fill-up, and table lookup.

A. Random sample generation

We use a unified Monte Carlo SPICE simulation framework to build the two kinds of models (M_{strike} and M_{prop}) with distinct mapping spaces, as illustrated by Step 1 of Figure 4. The framework first generates a random path loaded with additional random cells. A charge is then injected as a current source at the beginning of the path according to the following equation [4]:

I(q, t) = \frac{q}{\tau_\alpha - \tau_\beta} \times (e^{-t/\tau_\alpha} - e^{-t/\tau_\beta})    (11)

In each Monte Carlo instance, the pulse-width distributions are recorded along the path and later collected separately for the different models. Note that this framework can be applied to all sources of process variations, as long as each of their impacts can be reflected in SPICE simulation. Also, to build accurate models, it is essential to acquire a sufficiently large number of samples in this step; in our case, for example, 500K.
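The injected current of Equation (11) can be sketched as below. The time constants τα and τβ are illustrative choices (they are technology-dependent and not given in this paper). A useful sanity check is that integrating I(q, t) over time recovers the deposited charge q, since ∫₀^∞ (e^{−t/τα} − e^{−t/τβ}) dt = τα − τβ cancels the prefactor's denominator.

```python
import math

def injected_current(q, t, tau_a=0.2e-9, tau_b=0.05e-9):
    """Equation (11): I(q,t) = q/(tau_a - tau_b) * (exp(-t/tau_a) - exp(-t/tau_b)).
    tau_a and tau_b are illustrative collection time constants (tau_a > tau_b)."""
    return q / (tau_a - tau_b) * (math.exp(-t / tau_a) - math.exp(-t / tau_b))

def total_collected_charge(q, horizon=20e-9, steps=200_000):
    """Midpoint-rule integral of I(q,t) over [0, horizon]; should return ~q."""
    dt = horizon / steps
    return sum(injected_current(q, (k + 0.5) * dt) * dt for k in range(steps))
```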
Fig. 4. Construction of table-based models (Step 1: random sample generation; Step 2: table fill-up; Step 3: table lookup)
B. Table fill-up

In Step 2 of Figure 4, we classify all samples according to their corresponding input variables to fill up the tables. For discrete variables such as charge strength, driving gate, input pattern, pin index, propagation depth, and output loading (in terms of equivalent INVs), this can be done directly, which is like having multiple slices of tables, as illustrated in Figure 4. For continuous variables such as the width and height of the input pulse, however, we must discretize them to form a number of table cells. This is done by determining the upper/lower bounds and the number of partitions. For the two bounds, we use the MIN and MAX values of the samples sharing the same discrete input-variable combination. For the number of partitions, there is a trade-off between table resolution and size: with sufficient samples, a larger number of partitions leads to finer table resolution and higher accuracy, at the expense of a larger table size. To balance table size against resolution, we estimate the table error as

MEAN_{C_i \in all\ cells} \left[ \frac{MAX(C_i) - MIN(C_i)}{MEAN(C_i)} \right] \le \hat{\epsilon}    (12)

where C_i represents the samples within a specific cell, \hat{\epsilon} represents the error-rate threshold, and MAX, MIN, and MEAN respectively represent the maximum, minimum, and mean of the sample labels of C_i. We iteratively increase the number of partitions and recalculate the mean error estimate until it falls below the target threshold. In our case, we found that good accuracy can be reached with no more than 25 partitions for all tables.

C. Table lookup

After all samples are allocated to table cells, there are two types of cells: non-empty cells containing a number of samples, and empty cells containing none. For non-empty cells, we calculate the lookup value from the samples within; while there are many ways to do so, we found the mean to be a good and efficient representative. For the lookup values of empty cells, a traditional approach would be to extrapolate them from non-empty ones. However, given a sufficiently large number of random samples, it is very likely that the empty cells originate from unrealistic situations. For example, as in Step 3 of Figure 4, the empty cells are distributed only in the top-right and lower-left corners, representing extremely flat and extremely sharp transient faults, respectively. Although neither kind of transient fault exists in reality, accesses to these cells occasionally happen during SSER analysis as a result of error propagation. In such cases, we use the lookup value of the nearest non-empty cell instead to offset the expected error.

IV. Using Quasirandom Sequences

Pseudorandom number generation plays a key role in the success of the Monte Carlo method. However, using the rand() function for sampling points often suffers from the clustering problem [16] in high-dimensional spaces. Figure 5(a) illustrates this problem on an example of generating an (X,Y)-distribution by the Monte Carlo method using the rand() function. The sampling points are observed not to be evenly scattered over the (X,Y) plane, which means that sampling points from pseudorandom generation may not be representative enough of the entire space.

The clustering problem motivates the search for a deterministic sequence whose well-chosen points are distributed uniformly in high-dimensional spaces. Such sequences are named quasirandom sequences. Figure 5(b) shows the same number of sampling points using quasirandom sequences on the (X,Y) plane; the Sobol algorithm [16] is used to generate the corresponding sequences.
Fig. 5. Distributions from the Monte Carlo methods with random number generation and quasirandom sequences

TABLE I
Summary of table error (error rate, %)

cell     | M^µ_strike | M^σ_strike | M^µ_prop | M^σ_prop
INV      | 0.35       | 0.19       | 0.38     | 1.07
AND      | 0.30       | 0.23       | 0.36     | 1.35
OR       | 0.39       | 0.21       | 0.37     | 2.07
Average  | 0.35       | 0.21       | 0.37     | 1.50
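The qualitative behavior of quasirandom sequences can be reproduced with a simple radical-inverse generator. The sketch below uses a Van der Corput/Halton construction as a stand-in for the Sobol generator of [16] (the generator choice and the toy integrand are illustrative, not from the paper), and compares pseudorandom MC with quasirandom sampling on a one-dimensional test integral.

```python
import random

def radical_inverse(n, base=2):
    """Van der Corput radical inverse of integer n >= 1: the 1-D building
    block of Halton quasirandom sequences."""
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        x += digit / denom
    return x

def halton_point(n, bases=(2, 3)):
    """One low-discrepancy point in [0,1)^d, one coprime base per dimension."""
    return tuple(radical_inverse(n, b) for b in bases)

# Toy convergence check: estimate the integral of x^2 over [0,1] (exact: 1/3).
def estimate(points):
    return sum(p * p for p in points) / len(points)

N = 4096
rng = random.Random(0)
mc_est = estimate([rng.random() for _ in range(N)])                # pseudorandom
qmc_est = estimate([radical_inverse(i) for i in range(1, N + 1)])  # quasirandom
```

With N = 4096, the quasirandom estimate typically lands within about 10^-4 of 1/3, while the pseudorandom error is on the order of the standard error σ/√N ≈ 5×10^-3, illustrating the O(1/N)-versus-O(1/√N) behavior discussed below.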
From Figure 5(b), the new sampling points are observed to be more uniformly distributed over the (X,Y) plane and thus more representative.

Monte Carlo methods with quasirandom sequences are termed Quasi-Monte Carlo (QMC) methods. Given a sampling number N and a dimension d, Monte Carlo methods converge with O(1/\sqrt{N}) simulation error, whereas QMC methods converge with O(1/N) in the optimal case. Previous research has demonstrated better results for QMC than for MC methods on problems with ≤ 360 dimensions in finance and physics. Since each gate in the circuit becomes a free dimension (regardless of spatial correlations), the total dimension of the corresponding SSER system can be very high. However, for a large d and moderate N, quasirandom sequences perform no better than pseudorandom sequences [16]. Besides, high-dimensional quasirandom sequences tend to suffer from the clustering problem again. In the worst case, QMC's convergence rate, O((\ln N)^d / N), is even worse than MC's O(1/\sqrt{N}) as d grows. Therefore, we are motivated to apply dimension reduction to ensure the effectiveness of the proposed QMC framework for SSER analysis.

Effective dimensions of circuits can be observed through experiments. Figure 6 shows the convergence rates for four sample circuits, where the vertical lines indicate the logic depths (a.k.a. levels) of each circuit. All convergence rates drop quickly as the number of dimensions increases. This phenomenon implies that the underlying SSER systems can be properly described using much lower dimensions. For example, the intuitive dimension number for the circuit c7552 is 2114, the total number of its nodes; from Figure 6(d), however, a dimension number of 60 is already good enough. Figure 6 also indicates that the circuit's logic depth suffices to represent the total dimension and thus makes the SER converge faster. In Table II of the next section, more benchmark circuits are used to validate this hypothesis of using the circuit's logic depth as the reduced dimension.

Fig. 6. Convergence rate, dimension number, and logic depth of benchmark circuits

V. Experimental Results

A series of table-based models are built and evaluated for accuracy. These models are then integrated into our SSER analysis framework to evaluate its SER estimation capability.

A. Model accuracy

We build the table-based models according to Figure 4 for three cells in a 45nm technology. Assuming 5% process variation (σproc = 5%), the models are built using 500K training samples. The total size of the cell models in our experiments is 9.5MB. We then examine the models' accuracy using another 10K test samples. The average errors of the models are summarized in Table I according to model type. Two observations can be made: (1) For M^µ_{strike}, M^σ_{strike}, and M^µ_{prop}, the models are highly accurate, with average errors of no more than 0.4%. For the M^σ_{prop} models, the average error is still within 2.1%. (2) In [11], the M^µ_{strike}, M^µ_{prop}, and M^σ_{prop} models have average errors of up to 3.9%, and for its M^σ_{strike} models the average error even reaches 12.9%. In summary, our models exhibit much better quality.

B. SSER measurement
The proposed framework is implemented in C/C++ and exercised on a Linux machine with a Pentium Core Duo (2.4GHz) processor and 4GB RAM. The 45nm Predictive Technology Model (PTM) [17] is used for cell modeling. For all circuits, each node under every input-pattern combination is injected with four levels of electrical charge: Q0 = 34fC, Q1 = 66fC, Q2 = 99fC, and Q3 = 132fC, where 32fC is observed to be the weakest charge capable of generating a transient fault with positive pulse width under the settings of our experiments.

Both circuit SER and SSER are measured and compared. For SER, we use static SPICE simulation; for SSER, we use Monte Carlo SPICE simulation as well as the proposed framework with (QMC) and without (MC) quasirandom sequences. Considering the extremely long runtime of Monte Carlo SPICE simulation (with 100 runs), we can only afford to perform tests on small circuits (i4, i6, i18, and c17), with the largest containing 7 gates, 12 strike nodes, and 5 inputs. The runtime of the Monte Carlo SPICE simulation ranges from 8 hours to slightly more than one day, whereas our framework requires less than 1 second, an average speedup of 10^6.

Fig. 7. SSER comparison from static and Monte Carlo SPICE simulations, the proposed MC and QMC frameworks

Figure 7 compares the results from SPICE simulation and our frameworks. Three facts are observed: (1) Considering 5% process variations, the SSERs obtained by Monte Carlo SPICE simulation are 35% ∼ 52% above the SERs obtained by static SPICE analysis (indicated by the black bars). Since process variation worsens the stability of circuits beyond the deep-submicron era, statistical effects should be considered to avoid increasingly underestimating circuit SER. (2) The proposed MC and QMC frameworks yield very similar SSERs, with mismatches within 0.4%. This means that we can use the faster QMC without serious accuracy degradation. (3) Compared to the results of Monte Carlo SPICE simulation, the proposed QMC framework has error rates of 2.5%, 0.8%, 2.9%, and 2.8%, respectively. Compared to [14] and [11], where the error rates are around 10%, our framework is quite accurate, which can be well attributed to our models.

To investigate the SER difference between static and statistical analysis more closely, we break down the results of Figure 7 by charge-strength level and present them in Figure 8. Comparing the results between static and statistical SPICE simulations across all test circuits, it is observed that the results of the two SPICE simulations and the proposed framework are very similar for the Q1 ∼ Q3 parts (within 1% difference). However, the static SPICE simulation dramatically underestimates the SERs for the Q0 part (indicated by the white bars), whereas the proposed framework gives much closer results, with Q0-part errors of 11.4%, 2.3%, 5.6%, and 4.1%, respectively. This also discloses that the analysis of weak charges is the most challenging task in SSER analysis.

Fig. 8. SER breakdown by charge strength

C. SSER estimation on benchmark circuits

Using the proposed MC/QMC frameworks, we conduct SSER analysis on a variety of circuits, including the ones in Figure 7, the ISCAS'85 benchmark circuits, and a series of multipliers. Table II first lists the name, the total number of nodes, and the total number of outputs for each circuit. The following four columns report the SSER values and the runtimes required by the MC and QMC frameworks, respectively. The last two columns compute the SER difference and the speedup, respectively, by comparing the results from the MC and QMC frameworks.

From Table II, SSER is clearly related to the number of nodes and primary outputs of a circuit, which correspond to the probability of the circuit being struck by radiation particles and the probability of the transient faults being observed at primary outputs, respectively. The runtime, however, depends not only on the number of strike nodes, but also on the number of convolutions between nodes. The SER difference is computed as |SSER_{MC} − SSER_{QMC}| / SSER_{MC}, and the average difference of 0.88% implies that the QMC and MC frameworks are of the same quality. For all benchmark circuits, the overall speedup brought by QMC is 2.95X on average, and the QMC runtime is comparable to that of [11].

VI. Conclusion

Traditional SER analysis techniques intend to mimic static SPICE simulation. However, in the presence of process variation, all static techniques unavoidably underestimate the true SER, and thus research on statistical SER analysis is emerging. In this paper, we propose a method for building quality statistical cell models, based on which a Monte Carlo SSER framework is built. A heuristic is particularly proposed to apply quasirandom
TABLE II
Benchmark circuits, SER and runtime from the baseline MC and QMC frameworks

circuit | N_node | N_po  | MC SSER (FIT) | T_MC (sec)
i4      | 4      | 1     | 24.22E-05     |
i6      | 6      | 2     |               |
i18     | 12     | 3     |               |
c17     | 12     | 3     |               |
c432    | 233    | 7     |               |
c499    | 638    | 32    |               |
c880    | 443    | 26    |               |
c1355   | 629    | 32    |               |
c1908   | 425    | 25    |               |
c2670   | 841    | 157   |               |
c3540   | 901    | 22    |               |
c5315   | 1806   | 123   |               |
c6288   | 2788   | 32    |               |
c7552   | 2114   | 126   |               |
mul 4   | 158    | 8     |               |
mul 8   | 728    | 16    |               |
mul 16  | 3156   | 32    |               |
mul 24  | 7234   | 48    |               |