A DIGITAL-PLL-BASED TRUE RANDOM NUMBER GENERATOR

Chengxin Liu, John McNeill
Worcester Polytechnic Institute, Electrical and Computer Engineering Department, 100 Institute Road, Worcester, MA 01609, USA
E-mail: [email protected]

ABSTRACT

A true random number generator (RNG) based on a digital phase-locked loop (PLL) has been designed and implemented in a 1.5 µm CMOS process. It achieved an output data rate of 100 kbps from the sampling of two 30 MHz ring oscillators, and successfully passed the NIST SP800-22 test suite.

2. JITTER IN PLLS
2.1 Jitter and phase noise in ring oscillators

White noise and 1/f noise are the main noise sources in MOS transistors, and both are upconverted to phase noise. Fig. 2 shows, on log-log scales, the typical phase noise spectrum and the time-domain plot of rms jitter versus measurement time delay for open-loop ring oscillators.
1. INTRODUCTION
Several emerging cryptographic applications, such as smart cards and pervasive computing, require a low-cost way of obtaining high-quality random numbers in system-on-chip designs for secure data communication. By far the most popular approach is oscillator sampling [1][2], because of its small die area, power efficiency, and high speed.
The -20 dB/dec region in the phase noise spectrum is due to white noise and has the form

$$S_{\Phi W}(f) = \frac{N_1}{f^2} \tag{1}$$

where N1 is the frequency-domain white noise figure of merit, and f is the offset frequency from the carrier. In the time domain it is a Gaussian random process, simply reflecting the central limit theorem. The rms jitter after a measurement time delay ∆T is [3]

$$\sigma_w(\Delta T) = \kappa \sqrt{\Delta T} \tag{2}$$

where κ is the time-domain white noise figure of merit, which is determined by the delay cell parameters and is independent of the number of stages in the ring [3]. κ is related to the frequency-domain figure of merit N1 by [3]

$$\kappa = \frac{\sqrt{N_1}}{f_0} \tag{3}$$

where f0 is the VCO's oscillating frequency.

Upconverted 1/f noise dominates the phase noise at lower offset frequencies and thus at longer time delays. It has a slope of -30 dB/dec in the phase noise spectrum and is represented by [4]

$$S_{\Phi 1/f}(f) = \frac{N_1 f_c}{f^3} \tag{4}$$
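Figure 1. Traditional oscillator-based RNG (a noisy low-frequency oscillator clocks a D flip-flop that samples a high-frequency oscillator to produce the random bit stream)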
0-7803-9345-7/05/$20.00©2005 IEEE.
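Figure 2. Jitter and phase noise in ring oscillators: (a) frequency domain, log(SΦ(f)) versus log(f), with a -20 dB/dec region (N1/f^2) and a -30 dB/dec region (N1 fc/f^3) meeting at fc; (b) time domain, log(σ∆T) versus log(∆T), with a slope-0.5 region (κ√∆T) and a slope-1 region (ζ∆T) meeting at tc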
This paper presents a new dual-oscillator sampling architecture for random number generation. Its main advantage over the traditional approach is that it achieves the same data rate with slower clocks, which allows a cheaper process and lower cost. Section 2 reviews the jitter theory of ring oscillators and PLLs. The RNG architecture and circuit details are discussed in Section 3. Section 4 gives the experimental results.
As illustrated in Fig. 1, a low-frequency oscillator with high jitter samples the output of a high-frequency oscillator using a D flip-flop to produce the random number sequence. To achieve a high level of randomness, the rms jitter of the low-frequency oscillator must be much greater than the period of the fast oscillator [2]. Experimental results in [2] have shown that for CMOS ring oscillators in a 0.18 µm digital library, the ratio of jitter to mean period is less than 10^-4, which limits the maximum output data rate to 100 kbps if a 1 GHz fast oscillator is used.
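The 100 kbps figure follows from simple arithmetic, sketched below. The sketch assumes the limiting case in which the rms jitter of the slow oscillator just equals one period of the fast oscillator (the text asks for "much greater", so this is an upper bound); the function name is illustrative and does not come from [2].

```python
def max_data_rate(fast_freq_hz, jitter_to_period_ratio):
    """Upper bound on the sampling (output) rate of the traditional scheme.

    Assumes the slow oscillator's rms jitter, ratio * T_slow, must be at
    least one fast-oscillator period; solving for T_slow gives the bound.
    """
    fast_period = 1.0 / fast_freq_hz
    min_slow_period = fast_period / jitter_to_period_ratio
    return 1.0 / min_slow_period


# 1 GHz fast oscillator, jitter-to-period ratio of 1e-4 (values from the text)
rate = max_data_rate(1e9, 1e-4)
print(f"{rate / 1e3:.0f} kbps")
```

Under this assumption the bound scales linearly with both the fast-oscillator frequency and the jitter-to-period ratio, which is why the traditional scheme needs a very fast clock to raise the data rate.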
3. RNG DESIGN

3.1 System architecture

The proposed RNG architecture is illustrated in Fig. 4. Two identical noisy ring oscillators are designed with white-noise-dominated jitter. Oscillator I is free-running and serves as the clock. The phase error between the two oscillators is sampled by a low metastability D flip-flop, which also acts as a bang-bang phase detector. Two up/down counters form the loop filter. The 24-bit up/down counter p integrates the phase error of the two ring oscillators to set the average frequency of oscillator II, and introduces a pole into the loop transfer function. The 1-bit up/down counter z introduces the zero that stabilizes the loop, and provides instantaneous phase correction without affecting the average oscillating frequency. The two oscillators are therefore always synchronized through the feedback. The whole system is powered by a voltage regulator to reject power supply noise. It should be noted that the whole system is nonlinear and is therefore difficult to model analytically.

In equation (4), fc is the 1/f^3 phase noise corner [4]. In the time domain, the upconverted 1/f noise is a correlated random process. The accumulated rms jitter is proportional to the measurement time delay:

$$\sigma_{1/f}(\Delta T) = \zeta \Delta T \tag{5}$$

where ζ is the time-domain 1/f noise figure of merit.
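As a small numerical illustration of equations (2) and (5), the sketch below evaluates both jitter asymptotes and the delay at which they intersect, (κ/ζ)^2. The values of κ and ζ are the ones annotated on the measured jitter plot (Fig. 10); identifying the intersection with the tc marked in Fig. 2(b) is an assumption, and the function names are illustrative.

```python
import math

KAPPA = 1.48e-7  # white noise figure of merit, value annotated in Fig. 10
ZETA = 2.52e-4   # 1/f noise figure of merit, value annotated in Fig. 10


def white_jitter(delay_s, kappa=KAPPA):
    """Eq. (2): rms jitter due to white noise, kappa * sqrt(delay)."""
    return kappa * math.sqrt(delay_s)


def flicker_jitter(delay_s, zeta=ZETA):
    """Eq. (5): rms jitter due to upconverted 1/f noise, zeta * delay."""
    return zeta * delay_s


def crossover_delay(kappa=KAPPA, zeta=ZETA):
    """Delay at which the two asymptotes are equal: (kappa / zeta) ** 2."""
    return (kappa / zeta) ** 2


tc = crossover_delay()
print(f"crossover delay: {tc * 1e6:.3f} us")
print(f"jitter at crossover: {white_jitter(tc) * 1e12:.1f} ps")
```

For delays shorter than this crossover the white noise term dominates; for longer delays the 1/f term takes over.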
2.2 Jitter and phase noise in PLLs

The noise transfer function of a PLL is a high-pass transfer function. If the PLL loop bandwidth is wide enough to cover the 1/f^3 phase noise corner fc, the upconverted 1/f noise jitter is filtered out, leaving only the filtered white noise as in Fig. 3. This remaining noise has a Gaussian distribution, and the jitter process is a correlated white noise process. The closed-loop phase noise spectrum has the form

$$S_{\Phi CL}(f) = \frac{N_1 / f_L^2}{1 + (f / f_L)^2} \tag{6}$$

where fL is the PLL loop bandwidth. By the Wiener-Khinchine theorem, the autocorrelation function of this jitter process is obtained by taking the inverse Fourier transform of its power spectral density (6). The autocorrelation coefficient of the jitter process is therefore

$$\rho_{xx}(\Delta t) = \exp(-2 \pi f_L \Delta t) \tag{7}$$
If the loop bandwidth is wide enough to filter out the upconverted 1/f noise jitter, the jitter difference of the two oscillators is simply correlated white noise. After being sampled by the D flip-flop, it yields a correlated data stream with equal probability of '1's and '0's. From equation (7), dividing the output data rate down to around or below the PLL loop bandwidth significantly decreases the autocorrelation of the output data, so that the data stream can be considered random. For the proposed RNG, the maximum achievable data rate is therefore limited by the loop bandwidth of the system. Since the PLL loop bandwidth can be as high as 10% of the clock frequency, ideally the maximum RNG data rate can be as high as 10% of the ring oscillator frequency.
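The effect of dividing the data rate can be illustrated by evaluating equation (7) at a lag of one output bit period. This is only a sketch: it uses the loop bandwidth of about 500 kHz reported in Section 4, assumes one raw bit per 30 MHz clock cycle, and applies equation (7) to adjacent output bits as an approximation; the function name is illustrative.

```python
import math


def jitter_autocorrelation(loop_bw_hz, lag_s):
    """Eq. (7): autocorrelation coefficient exp(-2*pi*f_L*lag)."""
    return math.exp(-2.0 * math.pi * loop_bw_hz * lag_s)


LOOP_BW_HZ = 500e3  # equivalent loop bandwidth reported in Section 4

# Lag between adjacent bits for a few output data rates (assumed examples).
for data_rate_hz in (30e6, 3e6, 500e3, 100e3):
    rho = jitter_autocorrelation(LOOP_BW_HZ, 1.0 / data_rate_hz)
    print(f"{data_rate_hz / 1e3:8.0f} kbps -> rho = {rho:.3g}")
```

Under these assumptions the coefficient is close to 1 at the raw rate and falls off exponentially once the bit period becomes comparable to 1/fL.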
The rms jitter with respect to the PLL reference clock is [3]

$$\sigma_x = \kappa \sqrt{\frac{1}{4 \pi f_L}} \tag{8}$$

So for a white-noise-dominated PLL, only κ and the PLL loop bandwidth fL are needed for jitter prediction.
However, a wider loop bandwidth results in lower PLL output jitter according to equation (8). The output jitter should be much larger than the LSB of the DAC; otherwise the phases of the two ring oscillators will oscillate instead of staying synchronized. Thus the key factors of this design are the DAC resolution and the noise figure of merit κ.
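The trade-off can be checked numerically with equation (8). The sketch below uses the values reported in Section 4 (κ = 1.48E-7, an equivalent loop bandwidth of about 500 kHz, and a measured DAC LSB of 44 ps); the function name and the extra bandwidth of 3 MHz (10% of the clock frequency) are illustrative choices.

```python
import math


def closed_loop_jitter(kappa, loop_bw_hz):
    """Eq. (8): rms jitter relative to the reference, kappa*sqrt(1/(4*pi*f_L))."""
    return kappa * math.sqrt(1.0 / (4.0 * math.pi * loop_bw_hz))


KAPPA = 1.48e-7     # white noise figure of merit from Section 4
DAC_LSB_S = 44e-12  # measured DAC LSB from Section 4

for loop_bw_hz in (500e3, 3e6):
    sigma_x = closed_loop_jitter(KAPPA, loop_bw_hz)
    print(f"f_L = {loop_bw_hz / 1e3:.0f} kHz: sigma_x = {sigma_x * 1e12:.1f} ps, "
          f"sigma_x / LSB = {sigma_x / DAC_LSB_S:.2f}")
```

With the 500 kHz bandwidth the predicted jitter is close to the 60 ps quoted in Section 4, only slightly above the 44 ps LSB, and a wider bandwidth pushes the jitter below the LSB.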
3.2 Time domain analysis
Figure 3. Jitter and phase noise in PLLs: (a) frequency domain, (b) time domain
As illustrated in Fig. 5, in the presence of jitter and assuming there is no frequency drift, the transition times of the two oscillators can be expressed as

$$t_1[n] = t_1[n-1] + T_1 + \varepsilon_1[n] \tag{9}$$

$$t_2[n] = t_2[n-1] + T_2[n] + \varepsilon_2[n] \tag{10}$$

where T1 is the constant average period of the free-running oscillator I, T2[n] is the average period of oscillator II in the nth interval, and εi[n] is a zero-mean, discrete Gaussian random process that expresses the deviation of the period from the average. Since white noise dominates, the random process εi[n] is uncorrelated. From equation (2), the standard deviation of εi[n] is

$$\sigma_{\varepsilon_i} = \kappa_i \sqrt{T_i}, \quad i = 1, 2 \tag{11}$$
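As a quick worked example of equation (11), the sketch below evaluates the per-period rms jitter for the measured figure of merit κ = 1.48E-7 and the 30 MHz oscillation frequency given in Section 4. The function name is illustrative.

```python
import math


def period_jitter(kappa, freq_hz):
    """Eq. (11): per-period rms jitter, kappa * sqrt(T) with T = 1 / f."""
    return kappa * math.sqrt(1.0 / freq_hz)


sigma_eps = period_jitter(1.48e-7, 30e6)
print(f"sigma_eps = {sigma_eps * 1e12:.1f} ps")
```

For these values the jitter added in one period comes out at roughly 27 ps, which is of the same order as the DAC LSB reported in Section 4.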
Figure 4. The proposed RNG architecture
3.3 DAC-controlled ring oscillator
The DAC-controlled ring oscillator is realized by adding a capacitor array to the load of the 3-stage single-ended ring oscillator, as illustrated in Fig. 7. To provide more jitter, a current-starved inverter with six extra 50 kΩ resistors at the voltage control nodes is used as the delay stage instead of a simple CMOS inverter. The two ring oscillators are fully symmetric and placed next to each other in the layout. Deterministic jitter due to the power supply and substrate is therefore common-mode and is rejected.
Figure 5. Definition of the random processes for a clock with jitter (transition times t[1], t[2], ..., t[n], with successive periods T0 + ε[1], T0 + ε[2], ..., T0 + ε[n])
Assuming there is no metastability in the D flip-flop, the phase error sequence rb[n], which is also the output sequence of this RNG system, is

$$rb[n] = \frac{\operatorname{sign}(t_d[n]) + 1}{2} \tag{12}$$

where sign() is the signum function, and td[n] is

$$t_d[n] = t_1[n] - t_2[n] = (t_1[n-1] - t_2[n-1]) + (T_1 - T_2[n]) + (\varepsilon_1[n] - \varepsilon_2[n]) \tag{13}$$
Figure 7. The DAC-controlled ring oscillator
For the current-starved inverter, the noise figure of merit κ can be maximized by minimizing the power dissipated for inverter switching [5]. Minimum-size switching NMOS transistors are used to achieve a higher κ at the fastest available speed. The PMOS transistors are sized relative to the NMOS transistors to provide symmetric rise and fall times. The control transistors are twice the size of the switching transistors, and are biased with very small excess gate-source voltages to limit the switching current.
Since ε1[n] and ε2[n] are independent zero-mean Gaussian random processes, their difference is also a zero-mean Gaussian random process. The first two difference terms in equation (13) make td[n] depend on its previous values. Therefore the sequence td[n] is a correlated random process.

The frequency of oscillator II is adjusted in every clock cycle and can be modeled as

$$f_2[n+1] = f_2[1] - (k_p \cdot ctr_p[n] + k_z \cdot ctr_z[n]) \cdot K_{vco} \tag{14}$$

where kp and kz are the gains of the counters p and z, ctrp[n] and ctrz[n] are their counting results, and Kvco is the oscillator control constant in units of Hz/bit.

In the implementation of this design, the eight most significant bits of the 24-bit counter p are connected to the DAC control of oscillator II, while the output of the 1-bit counter z is connected to the DAC directly. Equation (14) therefore becomes

$$f_2[n+1] = f_2[1] - (\operatorname{fix}(ctr_p[n] / 2^{16}) + ctr_z[n]) \cdot K_{vco} \tag{15}$$

where fix(A) is the Matlab fix function, which rounds the elements of A toward zero, resulting in an array of integers. The loop bandwidth of this digital PLL system is proportional to the oscillator control constant Kvco and to the counter gains kp and kz. Fig. 6 shows the loop acquisition process simulated in Matlab for different numbers of bits in counter p. A shorter locking time indicates a wider loop bandwidth.

3.4 Low metastability D flip-flop

Since the edges of the oscillators are always aligned with each other, a low metastability, falling-edge-triggered D flip-flop is designed in this work, as shown in Fig. 8. When CLK is high, both the pre-amplifier [6] and the D latch [6] are reset. The transmission gate is on and the data is sampled. When CLK switches to low, the transmission gate turns off to hold the data. The pre-amplifier amplifies the difference between the held data and Vdd/2, and the D latch regenerates it to a valid logic level. To reduce the metastability error, two D flip-flops are cascaded.
Figure 8. The low metastability D flip-flop
3.5 Digital post-processing
In order to lower the autocorrelation, the raw output data stream of the digital PLL system is divided down by 10 first and then fed into a Von Neumann corrector to eliminate the bias. Finally another frequency divider is used to further reduce the autocorrelation.
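The post-processing chain can be sketched in a few lines. This is a minimal model under stated assumptions: each frequency divider is modeled as keeping every Nth bit, the Von Neumann corrector maps the pair 01 to 0 and 10 to 1 (the opposite mapping is equally valid), and the ratio of the final divider (`second_div`) is a hypothetical parameter because the text does not give it.

```python
def decimate(bits, factor):
    """Model of a frequency divider: keep every `factor`-th bit."""
    return bits[::factor]


def von_neumann(bits):
    """Von Neumann corrector on non-overlapping pairs.

    Pairs 00 and 11 are discarded; 01 yields 0 and 10 yields 1
    (assumed convention).
    """
    out = []
    for first, second in zip(bits[0::2], bits[1::2]):
        if first != second:
            out.append(first)
    return out


def post_process(raw_bits, first_div=10, second_div=2):
    """Divide by 10, remove bias, then divide again (second_div is assumed)."""
    return decimate(von_neumann(decimate(raw_bits, first_div)), second_div)


print(von_neumann([0, 1, 1, 0, 1, 1, 0, 0]))  # pairs: 01, 10, 11, 00
```

Because the corrector discards equal pairs, its output rate depends on the data and is at most a quarter of its input rate on average for unbiased, uncorrelated input.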
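Figure 6. Loop acquisition simulated by Matlab: td[n] = t1[n] - t2[n] (ns) versus t (µs) for 14-bit, 16-bit, and 18-bit counter p, with f1 = 30 MHz, f2[1] = 29.9 MHz, td[1] = 20 ns, σεi = 60 ps, and KVCO = 40 kHz/bit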
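A comparable acquisition run can be sketched in Python by iterating equations (12), (13), and (15) cycle by cycle. This is not the authors' Matlab code. It takes its parameter values from the Fig. 6 annotations and makes several assumptions that the text does not state: counter p is signed, starts at zero, and saturates; counter z takes the values +1 and -1; and only the 8 most significant bits of an N-bit counter p reach the DAC, so the divisor 2^16 of equation (15) becomes 2^(N-8).

```python
import math
import random


def simulate_loop(n_cycles=6000, counter_bits=14, f1=30e6, f2_init=29.9e6,
                  td_init=20e-9, sigma_eps=60e-12, k_vco=40e3, seed=1):
    """Behavioral model of the bang-bang loop; returns (td_trace, bits)."""
    rng = random.Random(seed)
    shift = counter_bits - 8                 # assumed: 8 MSBs of counter p drive the DAC
    p_limit = 2 ** (counter_bits - 1) - 1    # assumed: signed, saturating counter p
    period1 = 1.0 / f1
    td, ctr_p, f2 = td_init, 0, f2_init
    td_trace, bits = [], []
    for _ in range(n_cycles):
        eps1 = rng.gauss(0.0, sigma_eps)
        eps2 = rng.gauss(0.0, sigma_eps)
        td += (period1 - 1.0 / f2) + (eps1 - eps2)   # Eq. (13)
        rb = 1 if td > 0 else 0                      # Eq. (12)
        step = 1 if rb else -1
        ctr_p = max(-p_limit, min(p_limit, ctr_p + step))
        ctr_z = step                                 # assumed: counter z is +1 or -1
        # Eq. (15) with 2**16 generalized to 2**shift; trunc rounds toward zero
        f2 = f2_init - (math.trunc(ctr_p / 2 ** shift) + ctr_z) * k_vco
        td_trace.append(td)
        bits.append(rb)
    return td_trace, bits


if __name__ == "__main__":
    for n_bits in (14, 16, 18):
        trace, _ = simulate_loop(n_cycles=60000, counter_bits=n_bits)
        tail = trace[-2000:]
        print(n_bits, "bits: max |td| over last 2000 cycles =",
              f"{max(abs(x) for x in tail) * 1e9:.2f} ns")
```

Under these assumptions a smaller counter gives a larger effective integral gain, so the 14-bit case pulls in within the first few hundred cycles, while wider counters take longer, in line with the observation that a shorter locking time indicates a wider loop bandwidth.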
4. EXPERIMENTAL RESULTS

The custom part of this design, including the oscillators, the D flip-flops, and the voltage regulator, is implemented in a 5 V, 1.5 µm, 2-poly, 2-metal CMOS process and occupies an area of 1 mm^2. The chip micrograph is shown in Fig. 9. Excluding the output buffers, the total power dissipation is 1.92 mW. The counters and the digital post-processing circuits are implemented in an off-chip FPGA for design flexibility.

Figure 9. The die photo (labeled blocks: DAC1, Ring1, DAC2, Ring2, 2 DFFs, resistors)

The two oscillators run at 30 MHz in this RNG system. Fig. 10 shows the measured rms jitter versus measurement time delay for the open-loop ring oscillators. The extracted white noise figure of merit κ is 1.48E-7.

Figure 10. Jitter performance of the free-running ring oscillators at 30 MHz (rms jitter in seconds versus measurement time delay in seconds, log-log axes; fitted κ = 1.48E-7, ζ = 2.52E-4)

By analyzing the autocorrelation of the raw output data before post-processing, the equivalent loop bandwidth of this PLL system is extracted as around 500 kHz. The rms jitter of this system is thus about 60 ps according to equation (8). The DAC has a measured LSB of 44 ps with a total adjusting range of ±1.6 ns. The resolution is not as fine as expected because of variation in the fabrication process. It was therefore necessary to apply more division than expected to lower the autocorrelation sufficiently, and a data rate of only 100 kbps is achieved. This rate is expected to improve in a future iteration of the design. The quality of the randomness has been verified with the NIST SP800-22 test suite [7] over 2 Mbit long sequences. This test suite consists of 16 statistical tests, and the passing criterion for each test is a p-value larger than 0.01. Table 1 shows the complete test results for 3 data sequences.

Table 1. NIST SP800-22 statistical test results (P-values)

Test                  Data Set I   Data Set II   Data Set III
Frequency             0.403123     0.321518      0.346485
Block Frequency       0.151636     0.755148      0.055258
Cusum-Forward         0.323588     0.556027      0.677662
Cusum-Reverse         0.338798     0.243561      0.506432
Runs                  0.17792      0.194128      0.903376
Longest Run           0.891002     0.945998      0.124246
Rank                  0.239545     0.24964       0.141371
FFT                   0.270026     0.6652        0.718271
Universal             0.685955     0.673109      0.841297
Approx. Entropy       0.399766     0.196523      0.981364
Serial1               0.866552     0.183901      0.619666
Serial2               0.793059     0.411532      0.319159
Lempel Ziv            1.000000     1.000000      1.000000
Linear Complexity     0.448407     0.187954      0.980834
Periodic Templates    0.362154     0.65189       0.299952
Aperiodic Templates   all passed   all passed    all passed
Random Excursions     all passed   all passed    all passed
Random Ex. Variant    all passed   all passed    all passed

5. CONCLUSION

A new type of true RNG based on a digital PLL has been proposed. The random bits are generated by sampling the jitter of two identical, synchronized ring oscillators. Compared with the traditional oscillator sampling approach, this method can achieve a higher data rate at the same clock speed. The structure has been realized in a 1.5 µm process and has successfully passed the NIST SP800-22 statistical test suite.

6. REFERENCES

[1] B. Jun and P. Kocher, The Intel random number generator, Technical Report, Intel, 1999.
[2] M. Bucci et al., A high speed oscillator-based truly random number source for cryptographic applications on a smart card IC, IEEE Trans. on Computers, vol. 52, no. 4, pp. 403-409, 2003.
[3] J. McNeill, Jitter in ring oscillators, IEEE J. Solid-State Circuits, vol. 32, no. 6, pp. 870-879, 1997.
[4] C. Liu and J. McNeill, Jitter in oscillators with 1/f noise sources, Proc. IEEE ISCAS'04, vol. 1, pp. 773-776, 2004.
[5] C. Liu and J. McNeill, Jitter in deep submicron CMOS single-ended ring oscillators, Proc. 5th International Conference on ASIC, pp. 715-718, 2003.
[6] C. Portmann and T. Meng, Power-efficient metastability error reduction in CMOS flash A/D converters, IEEE J. Solid-State Circuits, vol. 31, no. 8, pp. 1132-1140, 1996.
[7] A. Rukhin et al., A statistical test suite for random and pseudorandom number generators for cryptographic applications, NIST Special Publication 800-22, 2001.