Truncation Error Analysis of MTBF Computation for Multi-Latch Synchronisers Terrence Mak ∗† Abstract Chip designs have an increasing number of independent clock domains. Synchroniser circuits are used to facilitate reliable data transfers between these clock domains. The task of these synchronisers is inherently prone to the occasional, statistically random, failure. These failures are frequently quantified by the synchronisers’ Mean Time Between Failures, MTBF. The MTBF becomes worse at an exponential rate with increasing frequency. In contrast, the MTBF improves exponentially as more latches are cascaded to form the synchroniser, but at the cost of increasing the data transfer latency. Thus, selecting the number of latch stages to employ in the synchroniser is a trade-off between reliability and latency. We present equations for accurate estimation of the MTBF of multi-latch synchronisers, combined with an error analysis of these equations. We compare MTBF estimates obtained by using these equations to estimates gathered from comprehensive simulation analysis, and show that error terms are not insignificant. We provide a detailed description of all the assumptions that we have made in both the formulation of the MTBF equations and the circuit simulation environment.

Keywords: Metastability, mean-time-between-failure (MTBF), latch/flip-flop, Taylor series, truncation error.

1 Introduction

The continuous scaling of process technology makes low-skew clock distribution increasingly challenging. As a result, the number of independent clock domains on a single chip increases, leading to a growing number of synchronizers for interfacing communication signals. The purpose of a synchronizer is to capture incoming data from another clock domain, a task that is vulnerable to metastability. While it is well known that no synchronizer can completely avoid metastability [6], it is vital to characterize the probabilistic performance of a synchronizer design in terms of mean time between failure (MTBF). There are a number of ways to characterize the MTBF of a synchronizer. Measuring the number of failure events of a synchronizer can yield an accurate characterization [4, 9, 2], while numerical simulation of synchronizer circuits provides an effective evaluation of such circuits [10] and can usually generate new insights and discoveries [7].

Cascading latches can significantly improve the MTBF of synchronizers. However, it is not trivial to derive a mathematical expression for the overall MTBF from the characteristic parameters of the individual latches. Different versions of such expressions can be found in the literature, for example in [1, 5], but their derivations are either omitted or ambiguous. In this paper, we present a method to approximate the MTBF of cascaded latches, and we rigorously derive and analyze the truncation and propagation error bounds of the resulting expression. Although alternative derivation methodologies were presented in [5, 8], to our knowledge this is the first work to derive and analyze the error of the MTBF approximation. We found that the error contributes significantly to the overall evaluation and, as a result, the evaluation underestimates the MTBF. This paper is organized as follows: the MTBF derivations for single and multiple latches are presented in Section 2; the truncation and propagation errors are computed in Section 3; and Section 4 gives the concluding remarks.

∗ T. Mak is with the School of Electrical, Electronic and Computer Engineering, Newcastle University, Newcastle upon Tyne, UK. This work was carried out when he was an intern at Sun Microsystems Laboratories. E-mail: [email protected].

† Manuscript received January 8, 2011; revised manuscript received August 31, 2011; accepted September 22, 2011.

2 Basic MTBF Computations

2.1 MTBF for a Single Latch

An estimate of the MTBF can be made from the settling time $T_s$, the timing window constant $T_w$ and the settling time constant $\tau$. Consider a simple latch: if the data input goes HIGH sufficiently far in advance of the clock edge, the synchronizer output will be driven HIGH, and if it goes HIGH significantly after the clock edge, the output will be driven LOW. If the two edges are close enough, then, in the absence of noise, the output may enter the metastable state. Fig. 1 shows two data-arrival waveforms: the earlier one drives the output Q to settle HIGH, and the later one drives Q to settle LOW. There is a balance point at which the separation between data and clock gives exactly equal probabilities of a HIGH and a LOW outcome. Thus, for two input arrival times separated by $\Delta t_{in}$ and spanning this balance point, where $\Delta t_{in} < T_w$, the HIGH and LOW outputs have the same settling time $T_s$ (see Fig. 1). The relationship between $\Delta t_{in}$ and $T_s$ can be modeled as [5, 8]

$$T_s = \tau \ln\frac{T_w}{\Delta t_{in}} \tag{1}$$

Since the arrival-time window $\Delta t_{in}$ is a function of $T_s$, we write it as $\Delta t_{in}(T_s)$, and Eq. 1 can be rearranged as

$$\Delta t_{in}(T_s) = T_w e^{-T_s/\tau} \tag{2}$$
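As a quick consistency check, Eq. 1 and Eq. 2 can be coded as a pair of inverse functions. This is a minimal sketch; the function names and the latch constants `TAU` and `T_W` are hypothetical illustration values, not parameters reported in this paper.

```python
import math


def settling_time(dt_in: float, tau: float, t_w: float) -> float:
    """Eq. 1: settling time T_s produced by an input separation dt_in."""
    if not 0.0 < dt_in < t_w:
        raise ValueError("Eq. 1 assumes 0 < dt_in < t_w")
    return tau * math.log(t_w / dt_in)


def window(t_s: float, tau: float, t_w: float) -> float:
    """Eq. 2: input window dt_in(T_s) that yields a settling time of T_s."""
    return t_w * math.exp(-t_s / tau)


# Hypothetical latch constants in seconds (illustrative only).
TAU = 20e-12   # settling time constant tau
T_W = 40e-12   # timing window constant T_w

t_s = settling_time(1e-15, TAU, T_W)
print(f"a 1 fs input separation settles in about {t_s * 1e12:.1f} ps")
# Each additional tau of settling time shrinks the window by a factor of e.
print(window(t_s, TAU, T_W) / window(t_s + TAU, TAU, T_W))
```

Because of the logarithm in Eq. 1, reducing $\Delta t_{in}$ a thousandfold adds only $\tau \ln 1000 \approx 6.9\tau$ of settling time.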


Tab. 1: The notations used in the MTBF error analysis

| Symbol | Description |
|---|---|
| $T_w$ | Time window constant |
| $T_i^w$ | Time window constant of the $i$-th latch |
| $T_c$ | Clock period |
| $T_d$ | Average data period |
| $T_s$ | Settling time |
| $T_i^s$ | Settling time of the $i$-th latch |
| $\Delta t_i(T_j^s)$ | Time window of the $i$-th latch that corresponds to the settling time $T_j^s$ of the $j$-th latch |
| $\tau_i$ | Settling time constant of the $i$-th latch |

Fig. 1: An illustration of the data-arrival and clock waveforms and the resulting Q output.

Suppose the maximum allowed settling time of a latch is $T_s$. Then there is a metastability window $\Delta t(T_s)$ such that any data arriving within this window yields a settling time equal to or longer than $T_s$, and the latch fails. The probability of synchronizer failure is therefore the probability that an incoming signal arrives within this window. Assuming that the incoming data arrival time is uniformly distributed over a clock period $T_c$, the probability of a synchronizer failure is

$$P(\mathrm{Fail}) = \frac{\Delta t_{in}(T_s)}{T_c} = \frac{T_w e^{-T_s/\tau}}{T_c} \tag{3}$$

Further, denoting the average incoming data period by $T_d$, the failure rate, i.e. the probability of a synchronizer failure per second, is

$$P(\mathrm{Fail})/\mathrm{second} = \frac{T_w e^{-T_s/\tau}}{T_c T_d} \tag{4}$$
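Eq. 3 rests on the assumption that data arrival times are uniformly distributed over a clock period. The sketch below evaluates Eq. 3 and Eq. 4 and checks Eq. 3 with a small Monte Carlo experiment. The parameter values, the random seed, and placing the balance point at the middle of the clock period are hypothetical choices made only for illustration; a zero settling time is used so that the window is wide enough to be hit by random sampling.

```python
import math
import random

# Hypothetical constants in seconds (illustrative only).
TAU, T_W = 20e-12, 40e-12   # latch settling time constant and timing window
T_C, T_D = 1e-9, 10e-9      # clock period and average data period


def window(t_s: float) -> float:
    """Eq. 2: metastability window for an allowed settling time t_s."""
    return T_W * math.exp(-t_s / TAU)


def p_fail(t_s: float) -> float:
    """Eq. 3: probability that one data arrival lands in the window."""
    return window(t_s) / T_C


def failures_per_second(t_s: float) -> float:
    """Eq. 4: failure rate with one data arrival every T_D seconds on average."""
    return window(t_s) / (T_C * T_D)


def monte_carlo_p_fail(t_s: float, n: int = 200_000, seed: int = 1) -> float:
    """Sample arrival times uniformly over one clock period and count those
    within window(t_s) centred on a balance point placed at T_C / 2."""
    rng = random.Random(seed)
    half_width = window(t_s) / 2
    hits = sum(abs(rng.uniform(0.0, T_C) - T_C / 2) < half_width for _ in range(n))
    return hits / n


print(p_fail(0.0), monte_carlo_p_fail(0.0))   # both approximately 0.04
print(failures_per_second(0.0))
```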

Taking the reciprocal of the failure rate in Eq. 4 yields the MTBF [8]:

$$MTBF = \frac{T_c T_d\, e^{T_s/\tau}}{T_w} \tag{5}$$
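Eq. 5 makes the exponential dependence on the settling time explicit: every additional $\tau$ of settling time multiplies the MTBF by $e$. A minimal sketch, using hypothetical constants chosen only for illustration (they are not measurements from this paper):

```python
import math

# Hypothetical constants in seconds (illustrative only).
TAU, T_W = 20e-12, 40e-12   # settling time constant and timing window constant
T_C, T_D = 1e-9, 10e-9      # clock period and average data period
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def mtbf(t_s: float) -> float:
    """Eq. 5: MTBF in seconds for an allowed settling time t_s."""
    return T_C * T_D * math.exp(t_s / TAU) / T_W


for fraction in (0.25, 0.5, 1.0):
    t_s = fraction * T_C
    print(f"T_s = {fraction:.2f} T_c -> MTBF = {mtbf(t_s) / SECONDS_PER_YEAR:.3g} years")

# One extra tau of settling time multiplies the MTBF by e.
print(mtbf(T_C + TAU) / mtbf(T_C))
```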

2.2 MTBF for Cascaded Latches

In this section, we derive a general expression for estimating the MTBF of $n$ cascaded latches. We first derive the MTBF of two latches connected in series; the MTBF expression for $n$ latches is then obtained by generalizing the two-latch equation. Suppose two latches are connected in series as shown in Fig. 2. Let $T_2^s$ denote the maximum allowable settling time of the second latch; if data arrive within a timing window $\Delta t_2(T_2^s)$, the settling time of the second latch will be equal to or greater than $T_2^s$. From Eq. 2, we have

$$\Delta t_2(T_2^s) = T_2^w e^{-T_2^s/\tau_2} \tag{6}$$

From Fig. 2, we can see that if the output of the first latch resolves within the timing window $[T_1^s, T_1^s + \Delta t_2(T_2^s)]$, the settling time of the second latch will be equal to or greater than $T_2^s$. Therefore, we need to compute the time window of the first latch that corresponds to the settling-time window $[T_1^s, T_1^s + \Delta t_2(T_2^s)]$. The windows $\Delta t_1(T_1^s)$ and $\Delta t_1(T_1^s + \Delta t_2(T_2^s))$ yield settling times $T_1^s$ and $T_1^s + \Delta t_2(T_2^s)$, respectively. The timing window of the first latch that leads to a failure of the second latch, $\Delta t_1(T_2^s)$, is therefore their difference:

$$\Delta t_1(T_2^s) = \Delta t_1(T_1^s) - \Delta t_1\big(T_1^s + \Delta t_2(T_2^s)\big) \tag{7}$$

Fig. 2: Relationship between input time windows and their corresponding output times for two cascaded latches.
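Eq. 6 and Eq. 7 can be evaluated directly, but the literal difference in Eq. 7 subtracts two nearly equal numbers. When $\Delta t_2(T_2^s)$ is smaller than half the floating-point spacing near $T_1^s$, the sum $T_1^s + \Delta t_2(T_2^s)$ rounds to $T_1^s$ and the computed window becomes zero. The sketch below, with hypothetical parameter values, compares the literal form with the algebraically equivalent form $\Delta t_1(T_1^s)\,(1 - e^{-\Delta t_2(T_2^s)/\tau_1})$, which follows from Eq. 2 and is evaluated here with `math.expm1`.

```python
import math


def window(t_s: float, tau: float, t_w: float) -> float:
    """Eq. 2 / Eq. 6: input window that yields a settling time of t_s."""
    return t_w * math.exp(-t_s / tau)


def two_latch_window_literal(t1s, tau1, t1w, t2s, tau2, t2w):
    """Eq. 7 evaluated literally as a difference of two first-latch windows."""
    dt2 = window(t2s, tau2, t2w)                                # Eq. 6
    return window(t1s, tau1, t1w) - window(t1s + dt2, tau1, t1w)


def two_latch_window(t1s, tau1, t1w, t2s, tau2, t2w):
    """Eq. 7 rewritten as dt1(T1s) * (1 - exp(-dt2 / tau1)).
    expm1 keeps full precision when dt2 / tau1 is tiny."""
    dt2 = window(t2s, tau2, t2w)
    return window(t1s, tau1, t1w) * -math.expm1(-dt2 / tau1)


# Hypothetical latch constants in seconds (illustrative only).
TAU, T_W = 20e-12, 40e-12

short_ts = (20e-12, TAU, T_W, 20e-12, TAU, T_W)   # short settling times
long_ts = (1e-9, TAU, T_W, 1e-9, TAU, T_W)        # long settling times
print(two_latch_window_literal(*short_ts), two_latch_window(*short_ts))  # agree
print(two_latch_window_literal(*long_ts), two_latch_window(*long_ts))    # 0.0 vs. a tiny positive value
```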

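Starting from Eq. 7 and using a Taylor series¹ expansion of the term $\Delta t_1(T_1^s + \Delta t_2(T_2^s))$ with $f(x) = \Delta t_1(x)$ and $a = T_1^s$, we have

$$\Delta t_1(T_2^s) = \Delta t_1(T_1^s) - \left[\Delta t_1(T_1^s) + \frac{\partial \Delta t_1(T_1^s)}{\partial T_1^s}\,\Delta t_2(T_2^s) + \frac{1}{2!}\frac{\partial^2 \Delta t_1(T_1^s)}{\partial (T_1^s)^2}\big(\Delta t_2(T_2^s)\big)^2 + \cdots\right] \tag{8}$$

$$= -\frac{\partial \Delta t_1(T_1^s)}{\partial T_1^s}\,\Delta t_2(T_2^s) - \frac{1}{2!}\frac{\partial^2 \Delta t_1(T_1^s)}{\partial (T_1^s)^2}\big(\Delta t_2(T_2^s)\big)^2 - \cdots \tag{9}$$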

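From Eq. 2, we have

$$\frac{\partial \Delta t_1(T_1^s)}{\partial T_1^s} = \frac{-T_1^w e^{-T_1^s/\tau_1}}{\tau_1} = \frac{-\Delta t_1(T_1^s)}{\tau_1} \tag{10}$$

and, similarly, the $k$-th derivative is $(-1)^k \Delta t_1(T_1^s)/\tau_1^k$. Substituting Eq. 10 and the higher-order derivatives into Eq. 9, we obtain the simplified expression

$$\Delta t_1(T_2^s) = \frac{\Delta t_1(T_1^s)}{\tau_1}\,\Delta t_2(T_2^s) - \frac{1}{2!}\frac{\Delta t_1(T_1^s)}{\tau_1^2}\big(\Delta t_2(T_2^s)\big)^2 + \cdots \tag{11}$$

Truncating Eq. 11 after the first term gives

$$\Delta t_1(T_2^s) \approx \frac{\Delta t_1(T_1^s)}{\tau_1}\,\Delta t_2(T_2^s) \tag{12}$$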
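Eq. 10 and its higher-order analogue, $\partial^k \Delta t_1(T_1^s)/\partial (T_1^s)^k = (-1)^k \Delta t_1(T_1^s)/\tau_1^k$, can be spot-checked numerically with central finite differences. The first-latch constants and the step size below are hypothetical choices for illustration only.

```python
import math

# Hypothetical first-latch constants in seconds (illustrative only).
TAU1, T1W, T1S = 20e-12, 40e-12, 100e-12


def dt1(x: float) -> float:
    """Eq. 2 for the first latch: window as a function of settling time x."""
    return T1W * math.exp(-x / TAU1)


def analytic_derivative(k: int, x: float) -> float:
    """k-th derivative implied by Eq. 10: (-1)**k * dt1(x) / tau1**k."""
    return (-1) ** k * dt1(x) / TAU1 ** k


def central_differences(f, x: float, h: float):
    """Second-order central-difference estimates of f'(x) and f''(x)."""
    first = (f(x + h) - f(x - h)) / (2 * h)
    second = (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2
    return first, second


h = 1e-3 * TAU1                                # hypothetical step size
fd1, fd2 = central_differences(dt1, T1S, h)
print(fd1 / analytic_derivative(1, T1S))       # ratio close to 1
print(fd2 / analytic_derivative(2, T1S))       # ratio close to 1
```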

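The truncation error $\epsilon$ of the above expression, i.e. the sum of the neglected terms, becomes

$$\epsilon = \frac{1}{2!}\frac{\Delta t_1(T_1^s)}{\tau_1^2}\big(\Delta t_2(T_2^s)\big)^2 - \frac{1}{3!}\frac{\Delta t_1(T_1^s)}{\tau_1^3}\big(\Delta t_2(T_2^s)\big)^3 + \cdots \tag{13}$$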
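Eq. 13 can be checked against the closed form implied by Eq. 2. Writing $u = \Delta t_2(T_2^s)/\tau_1$, the exact window is $\Delta t_1(T_1^s)(1 - e^{-u})$ and the linear approximation of Eq. 12 is $\Delta t_1(T_1^s)\,u$, so $\epsilon = \Delta t_1(T_1^s)(u - 1 + e^{-u}) \ge 0$. The sketch below works in the dimensionless variable $u$ with $\Delta t_1(T_1^s) = 1$; the sample values of $u$ and the number of series terms are arbitrary illustration choices.

```python
import math


def epsilon_series(u: float, terms: int = 30) -> float:
    """Eq. 13 with dt1(T1s) = 1: sum of (-1)**i * u**i / i! for i = 2 .. terms + 1."""
    return sum((-1) ** i * u ** i / math.factorial(i) for i in range(2, terms + 2))


def epsilon_closed_form(u: float) -> float:
    """Linear window (Eq. 12) minus exact window (Eq. 7), both with dt1(T1s) = 1."""
    return u + math.expm1(-u)


for u in (0.01, 0.1, 0.5, 1.0):
    print(f"u = {u:4}: Eq. 13 series = {epsilon_series(u):.6e}, "
          f"closed form = {epsilon_closed_form(u):.6e}")
```

The error is positive for every $u > 0$, consistent with the linearized window overestimating the true window.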

Note that we compute $\Delta t_1(T_2^s)$ backward, from the second latch to the first latch. This approach provides a clear presentation of the relationship between the time windows and the settling times. Applying Eq. 12 recursively in the same way, the metastability window of the first latch that results in an output failure of the $n$-th latch is, in general,

$$\Delta t_1(T_n^s) \approx \left(\prod_{j=1}^{n-1}\frac{T_j^w}{\tau_j}\right) T_n^w\, e^{-\sum_{i=1}^{n} T_i^s/\tau_i} \tag{14}$$

¹ The Taylor series gives $f(x) = f(a) + \frac{f'(a)}{1!}(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f^{(3)}(a)}{3!}(x-a)^3 + \cdots$, where the expansion point $a$ is a real number in a neighborhood of $x$.

We can compute the MTBF from the probability that an incoming signal arrives within the window of Eq. 14. As in the single-latch case, given the clock period $T_c$ and the average incoming data period $T_d$, the MTBF of $n$ latches becomes

$$MTBF_n \approx T_c T_d \left(\prod_{j=1}^{n-1}\frac{\tau_j}{T_j^w}\right)\frac{e^{\sum_{i=1}^{n} T_i^s/\tau_i}}{T_n^w} \tag{15}$$

This MTBF expression is exactly the same as those presented in [5, 8]. However, the truncation error was not analyzed in those works; typically, $\Delta t_2(T_2^s)$ in Eq. 7 was assumed to be infinitesimally small, so that a simplified linear expression could be obtained. This linear approximation introduces a truncation error, and the error accumulates throughout the computation over multiple latch stages. We present a thorough truncation error analysis in Section 3.
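Eq. 14 and Eq. 15 can be evaluated either by applying Eq. 12 backward, stage by stage, or through the closed form. The sketch below does both; the closed form is evaluated in the logarithmic domain because the product of exponentials in Eq. 15 quickly exceeds floating-point range for long settling times. The per-stage constants are hypothetical illustration values, and the result inherits the truncation error of Eq. 12.

```python
import math
from typing import Sequence

# Hypothetical clock and data periods in seconds (illustrative only).
T_C, T_D = 1e-9, 10e-9


def window_recursive(t_s: Sequence[float], tau: Sequence[float], t_w: Sequence[float]) -> float:
    """Eq. 14 built backward: start from latch n (Eq. 2), then apply Eq. 12
    for latches n-1, ..., 1."""
    win = t_w[-1] * math.exp(-t_s[-1] / tau[-1])
    for j in range(len(t_s) - 2, -1, -1):
        win = t_w[j] * math.exp(-t_s[j] / tau[j]) * win / tau[j]
    return win


def log_mtbf(t_s: Sequence[float], tau: Sequence[float], t_w: Sequence[float]) -> float:
    """Natural log of Eq. 15, summed term by term to avoid overflow."""
    n = len(t_s)
    total = math.log(T_C * T_D) - math.log(t_w[-1])
    total += sum(math.log(tau[j] / t_w[j]) for j in range(n - 1))
    total += sum(t_s[i] / tau[i] for i in range(n))
    return total


for n in (1, 2, 3):
    ts, taus, tws = [0.5 * T_C] * n, [20e-12] * n, [40e-12] * n
    print(f"{n} latch(es): log10(MTBF / s) = {log_mtbf(ts, taus, tws) / math.log(10):.1f}")
```

For $n = 1$ the closed form reduces to Eq. 5, and for $n = 2$ the recursive form reproduces Eq. 12.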

3 Error Computation

3.1 Truncation Error Analysis

The MTBF approximation incurs a truncation error because of the truncated Taylor series expansion in Eq. 12. The approximation is illustrated in Fig. 3. The objective is to compute the time window $t(T_1) - t(T_1 + \Delta t)$. Because the function $f(x) = \Delta t_1(x)$ is exponential and $\Delta t$ is also an exponential function, the calculation involves a double exponential. A linear approximation simplifies the computation by using the first derivative at $T_1$. Since $\Delta t_1(x)$ is convex, its tangent at $T_1$ lies below the curve, so the linearized value $t'(T_1 + \Delta t)$ in Fig. 3 lies below the true value $t(T_1 + \Delta t)$, and the time window calculated with this approximation is larger than the true window. We now derive the relative error of the approximation. The error $\epsilon$ equals the sum of the remaining terms of the Taylor series, as shown in Eq. 13. For a 2-latch synchronizer, the relative error is

$$\frac{\epsilon}{\Delta t_1(T_2^s)} = \frac{\sum_{i=2}^{\infty}\frac{(-1)^i}{i!}\frac{\Delta t_1(T_1^s)}{\tau_1^i}\big(\Delta t_2(T_2^s)\big)^i}{\Delta t_1(T_2^s)} \tag{16}$$
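With $u = \Delta t_2(T_2^s)/\tau_1$, and with $\Delta t_1(T_1^s)$ cancelled from numerator and denominator (the exact denominator being $\Delta t_1(T_1^s)(1 - e^{-u})$ by Eq. 2), Eq. 16 reduces to a function of $u$ alone. The sketch below evaluates the series of Eq. 16 term by term, compares it with the closed form $u/(1 - e^{-u}) - 1$, and checks the small-$u$ behaviour $u/2 + u^2/12$, which follows from the series expansion of $u/(1 - e^{-u})$. The sample values of $u$ are arbitrary illustration choices.

```python
import math


def rel_error_series(u: float, terms: int = 40) -> float:
    """Eq. 16 divided through by dt1(T1s): the neglected Taylor terms over the
    exact window 1 - exp(-u)."""
    tail = sum((-1) ** i * u ** i / math.factorial(i) for i in range(2, terms + 2))
    return tail / -math.expm1(-u)


def rel_error_closed_form(u: float) -> float:
    """Linear window over exact window, minus 1: u / (1 - exp(-u)) - 1.
    Loses relative precision when u is very small."""
    return u / -math.expm1(-u) - 1.0


for u in (1e-6, 1e-3, 0.1, 1.0):
    print(f"u = {u:g}: relative error = {rel_error_series(u):.6e}")
```

The term-by-term series avoids the cancellation in the closed form, which is why it is preferred here for very small $u$.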

By substituting Eq. 11 into Eq. 16, we obtain

$$\frac{\epsilon}{\Delta t_1(T_2^s)} = \frac{\frac{\Delta t_1(T_1^s)}{\tau_1}\,\Delta t_2(T_2^s)}{\Delta t_1(T_2^s)} - 1 \tag{17}$$

From Eq. 11, by taking the first two terms of the Taylor series of $\Delta t_1(T_2^s)$, we know that $\Delta t_1(T_2^s) > \frac{\Delta t_1(T_1^s)}{\tau_1}\Delta t_2(T_2^s) - \frac{\Delta t_1(T_1^s)}{2\tau_1^2}\big(\Delta t_2(T_2^s)\big)^2$. Therefore, we have the following by substituting the inequality into Eq. 17.

 ∆t1 (T2 )