Three ways to mount distinguishing attacks on irregularly clocked stream ciphers

Håkan Englund
Department of Information Technology, Lund University, Lund, Sweden
E-mail: [email protected]

Thomas Johansson
Department of Information Technology, Lund University, Lund, Sweden
E-mail: [email protected]

Abstract: Many stream ciphers use irregular clocking to introduce nonlinearity to the keystream. We present three distinguishers on irregularly clocked linear feedback shift registers. The general idea used is to find suitable linear combinations of keystream bits, here called samples, that are drawn from a biased distribution. We describe how to place windows around the estimated positions of the members of the linear combinations, and very efficiently create many samples with low computational complexity. We also describe ideas based on constructing samples consisting of vectors of bits (words) instead of single binary samples. These vector-based methods can distinguish the cipher using fewer keystream bits but sometimes require a higher computational complexity.

Keywords: irregularly clocked LFSR; distinguishing attack.

Reference to this paper should be made as follows: Englund, H. and Johansson, T. (2006) ‘Three ways to mount distinguishing attacks on irregularly clocked stream ciphers’, Inaugural issue of Int. J. of Security and Networks, Vol. 1, No. 1, 2006.
Copyright © 200x Inderscience Enterprises Ltd.

1 INTRODUCTION

Since the standardisation of the block cipher AES (Daemen and Rijmen, 2002), a lot of attention in symmetric cryptology has moved to the area of stream ciphers. In many applications stream ciphers can offer advantages of different kinds, e.g., in situations when low power consumption is required, low hardware complexity or when we need extreme software efficiency. To ensure trust in the security claims of stream ciphers it is imperative that the security of stream ciphers is carefully studied and analysed.

A common building block in stream ciphers is a Linear Feedback Shift Register (LFSR). By using an LFSR with a primitive feedback polynomial, sequences with very long periods and good statistical properties can be achieved. LFSRs are also very suitable for fast implementation in hardware. Since LFSRs are linear, the initial state can easily be reconstructed from its output. Hence some nonlinearity has to be introduced. One common way is to use some binary values from the LFSR state and feed them through a nonlinear boolean function. Such stream ciphers are called filter generators. Another possibility is to decimate the sequence in an unpredictable way. Several such designs have been proposed, for example the shrinking generator, the self-shrinking generator and the alternating step. There are also designs that use both irregular clocking and boolean functions, for example LILI-128 and LILI-II.

In stream cipher cryptanalysis we usually consider the plaintext to be known, i.e., the keystream is known and we try to recover the key. Siegenthaler (1984) introduced the idea of exploiting the correlations in the keystream. As a consequence of this attack, nonlinear functions must have high nonlinearity.
This attack was later followed by the fast correlation attack of Meier and Staffelbach (1988). In a fast correlation attack one first tries to find a low weight parity check polynomial of the LFSR and then applies some iterative decoding procedure. Many improvements have been introduced on this topic, see Canteaut and Trabbia (2000); Chepyzhov et al. (2000); Johansson and Jönsson (2002, 2000, 1999a,b).
Figure 1: Irregularly clocked LFSR (an LFSR producing s_t is decimated under the control of a clock control sequence c_t, giving the keystream z_t).

Algebraic attacks have received much interest lately. These attacks try to reduce the key recovery problem to the problem of solving a large system of algebraic equations, see Courtois and Meier (2003); Courtois (2003).

In this paper we consider distinguishing attacks, a well known class of attacks. A distinguishing attack is a known keystream attack, i.e., we have access to some amount of the keystream and from this data we try to decide whether the data originates from the cipher we consider or appears to be random, see e.g., Ekdahl and Johansson (2002); Golić and Menicocci (2003); Junod (2003); Watanabe et al. (2003); Englund and Johansson (2005).
Throughout the paper we will consider stream ciphers that can be modelled as an irregularly clocked linear feedback shift register, where the sequence that determines the decimation is created in some arbitrary way. The only requirement we assume is that we know the expected behaviour (e.g. expectation and variance) of the clock control sequence.

We start by returning to the ideas used in Englund and Johansson (2005), where a very efficient distinguisher was described. We introduce new developments of this attack in two directions, which in general require less keystream but sometimes require a higher computational complexity. The general idea used in Englund and Johansson (2005) is to find suitable linear combinations of keystream bits, here called samples, that are drawn from a biased distribution. Collecting enough samples will enable us to detect whether the bias is present or not, and thus to learn whether the keystream sequence came from the considered cipher or was purely random. The ideas we describe in this paper are based on constructing samples consisting of vectors of bits (words) instead of single binary samples.

The outline of the paper is as follows. In Section 2 we describe irregularly clocked linear shift registers and discuss hypothesis testing. We then review the ideas described in Englund and Johansson (2005) in Section 3. In Section 4 we present a simple word based distinguisher. We introduce ideas on how to extend the word based attack by considering the distribution of the undecimated LFSR sequence; this is discussed in Section 5. Finally, we conclude the paper in Section 6.

2 PRELIMINARIES

2.1 Irregularly clocked linear shift registers

We consider irregularly clocked binary stream ciphers where the output is taken from some decimated LFSR sequence. As depicted in Figure 1, a binary LFSR produces a binary sequence denoted $s = s_0, s_1, \ldots$. A clock control sequence denoted $c = c_0, c_1, \ldots$ is generated in some arbitrary way. This clock control sequence is used to decimate the LFSR sequence, i.e., to remove some symbols from the stream. The decimated sequence is denoted $z = z_0, z_1, \ldots$ and is used as the keystream. In other words, the clock control sequence uniquely determines a sequence of integers $d_0, d_1, \ldots$ such that

$$z_t = s_{d_t}$$

and where $d_0 < d_1 < d_2 < \cdots$. As usual, the keystream is added to the plaintext to produce the ciphertext.

As a slightly more advanced model, we mention also the irregularly clocked filter generator. The irregular clocking is as before, but the keystream is now generated as

$$z_t = f(s_{d_t}, s_{d_t+\gamma_1}, s_{d_t+\gamma_2}, \ldots, s_{d_t+\gamma_{k-1}}),$$

where $f()$ is a boolean function in $k$ variables. A general approach in cryptanalysis in this model would be to approximate $f()$ by a linear function. This brings us back to the first model with some small modifications. Thus we focus on the first and simpler model.

2.2 Hypothesis testing

We give a brief introduction to binary hypothesis testing. In a binary hypothesis test we have two claims, namely that hypothesis $H_0$ is the explanation for an observed measurement, or alternatively that $H_1$ is the explanation for an observed measurement. In the test we try to decide which of the two hypotheses is the correct one.

Assume that we have a sequence of $m$ independent and identically distributed (i.i.d.) random variables $X_1, X_2, \ldots, X_m$ over an alphabet $\mathcal{X}$. Their distribution is denoted $Q(x) = Pr(X_i = x)$, $1 \le i \le m$, and the sample values obtained in an experiment are denoted $x = x_1, x_2, \ldots, x_m$. We have the two hypotheses $H_0 : Q = P_0$ and $H_1 : Q = P_1$, where $P_0$ and $P_1$ are two different distributions. To distinguish between the two hypotheses, one defines a decision function $\phi : \mathcal{X}^m \to \{0, 1\}$. $\phi(x) = 0$ implies that $H_0$ is accepted and $\phi(x) = 1$ implies that $H_1$ is accepted.

Two probabilities of error are associated with the decision function, denoted $\alpha = P(\phi(x) = 1 \mid H_0 \text{ is true})$ and $\beta = P(\phi(x) = 0 \mid H_1 \text{ is true})$. Let $H_0$ be the hypothesis that the distribution $Q$ is induced by the cipher and let $H_1$ be the hypothesis that $Q$ is uniform. The Neyman-Pearson lemma tells us how to carry out the actual test when we have a sequence of samples.
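In practice such a test reduces to summing the log-likelihood ratios of the individual samples and comparing the sum with a threshold (zero when the two error probabilities are weighted equally). The following is a minimal Python sketch of that decision function, not code from the paper. The function names and the dictionary representation of the distributions are assumptions made for illustration, and every observed symbol is assumed to have non-zero probability under both distributions.

```python
import math


def log_likelihood_ratio(samples, p0, p1):
    """Sum of log2(P0(x) / P1(x)) over the observed samples.

    p0 and p1 map each symbol to its probability; all observed symbols
    are assumed to have non-zero probability under both distributions.
    """
    return sum(math.log2(p0[x] / p1[x]) for x in samples)


def decide(samples, p0, p1, threshold=0.0):
    """Return 0 (accept H0) if the summed ratio exceeds the threshold, else 1."""
    return 0 if log_likelihood_ratio(samples, p0, p1) > threshold else 1


# Hypothetical example: a slightly biased bit source versus a uniform one.
p_cipher = {0: 0.6, 1: 0.4}
p_uniform = {0: 0.5, 1: 0.5}
print(decide([0] * 60 + [1] * 40, p_cipher, p_uniform))
print(decide([0] * 50 + [1] * 50, p_cipher, p_uniform))
```

With the hypothetical distributions above, the first sample set (60 zeros out of 100) is attributed to the biased source and the second (50 zeros out of 100) to the uniform one.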
Lemma (Neyman-Pearson). Let $X_1, X_2, \ldots, X_m$ be i.i.d. random variables drawn according to probability distribution $Q$. Consider the decision problem corresponding to the hypotheses $Q = P_0$ vs. $Q = P_1$. For $T \ge 0$ define a region

$$A_m(T) = \left\{ \frac{P_0(x_1, x_2, \ldots, x_m)}{P_1(x_1, x_2, \ldots, x_m)} > T \right\}.$$

Let $\alpha = P_0^m(A_m^c(T))$ and $\beta = P_1^m(A_m(T))$ be the error probabilities corresponding to the decision region $A_m$. Let $B$ be any other decision region with associated error probabilities $\alpha^*$ and $\beta^*$. If $\alpha^* \le \alpha$, then $\beta^* \ge \beta$.

The region $A_m(T)$ that is determined by $\frac{P_0(x)}{P_1(x)} > T$ is the one that jointly minimizes $\alpha$ and $\beta$. In our case we consider $\alpha$ and $\beta$ to be equal and hence $T = 1$. This gives us the test

$$\frac{P_0(x_1, x_2, \ldots, x_m)}{P_1(x_1, x_2, \ldots, x_m)} > 1 \;\Rightarrow\; \sum_{n=1}^{m} \log_2 \frac{P_0(x_n)}{P_1(x_n)} > 0. \qquad (1)$$

The ratio in (1) is called a log-likelihood ratio, and the test is thus called a log-likelihood test. Finally, we note that the number of samples we need to collect in order to reliably distinguish between the two distributions is roughly

$$N \approx \frac{1}{D(P_0, P_1)}, \qquad (2)$$

where $D(P_0, P_1)$ is the relative entropy defined as $D(P_0, P_1) = \sum_{x \in \mathcal{X}} P_0(x) \log \frac{P_0(x)}{P_1(x)}$. If $P_1$ is the uniform distribution, $D(P_0, P_1) = \log |\mathcal{X}| - H(P_0)$, where $H(P_0)$ is the entropy function. We use this relative entropy as our measure of bias. The statistical distance, denoted $\varepsilon$, between $P_0$ and $P_1$ is defined as

$$\varepsilon = |P_0 - P_1| = \frac{1}{2} \sum_{x \in \mathcal{X}} |P_0(x) - P_1(x)|. \qquad (3)$$

Furthermore, if $P_0$ is very close to a uniform distribution $P_1$, then $D(P_0, P_1) \approx \frac{1}{\ln 2} |P_0 - P_1|^2$, leading to $N \approx \frac{1}{\varepsilon^2}$. For a more thorough treatment of hypothesis testing, we refer to any textbook on the subject, e.g. Cover and Thomas (1991).

3 AN EFFICIENT DISTINGUISHER

In Englund and Johansson (2005) an efficient distinguisher was introduced for irregularly clocked filter generators. The attack is based on a low weight recurrence relation for the LFSR sequence in the form

$$s_t \oplus s_{t+\tau_1} \oplus \ldots \oplus s_{t+\tau_{w-1}} = 0, \quad t \ge 0, \qquad (4)$$

where $w$ is the weight of the recurrence. If the LFSR recurrence relation is not of low weight, the first step would be to find such a low weight recurrence relation. This is discussed below.

Looking at the keystream as the decimated LFSR sequence, and assuming $z_{t'} = s_t$ for some $t'$, $t$, we would like to identify where in the keystream we can find $s_{t+\tau_1}$ etc. Due to the irregular clocking, this can in general not be determined. However, we can instead determine where $s_{t+\tau_1}$ is likely to have appeared (if at all). The idea is to place windows around the most likely positions of appearance. From the number of zeros inside the windows it is easy to calculate $P(s_{t+\tau_1} = 0)$ etc. conditioned on the keystream.

To support a very fast implementation of these ideas we proceed a bit differently. Every possible combination of one position from each window xored together is referred to as a sample. We basically just keep the number of zeros inside each window in memory, and from this information we can easily calculate in how many ways we can combine one symbol from each window such that the combination fulfils the recurrence equation. The advantage of this procedure is that the number of zeros in a window is easily updated when moving the windows, and hence at each time instant one receives many samples with a very low computational complexity. These ideas are now described in more detail.
3.1 Finding a low weight multiple

As in many attacks, such as fast correlation attacks and distinguishing attacks, we exploit a low weight recurrence relation of the LFSR sequence, i.e., a recursion of weight $w$ for the LFSR sequence $s$ that adds to zero for all time instances $t$, see (4). If the weight of the original feedback polynomial is too high, it is possible to find multiples of the polynomial that have a lower weight. Several methods of finding such multiples have been proposed. Some methods focus on finding multiples with as low weight as possible, while some other methods accept a higher weight but reduce the complexity of finding the multiple.

Assume that we have a feedback polynomial $g(x)$ of degree $r$ and search for a multiple of weight $w$. According to Golić (1996), the critical degree at which these multiples start to appear is $(w-1)!^{1/(w-1)} 2^{r/(w-1)}$. Golić (1996) also describes an algorithm that focuses on finding multiples of degree around the critical degree. The first step is to calculate the residues $x^i \bmod g(x)$, then one computes the residues $x^{i_1} + \ldots + x^{i_k} \bmod g(x)$ for all $\binom{n}{k}$ combinations $1 \le i_1 \le \ldots \le i_k \le n$, with $n$ being the maximum degree of the multiples. The last step is to use fast sorting to find all of the zero and one matches of the residues from the second step. The complexity of this algorithm is approximately $O(S \log S)$ with $S = \frac{(2k)!^{1/2}}{k!} 2^{r/2}$ for odd multiples of weight $w = 2k + 1$, and $S = \frac{(2k-1)!^{k/(2k-1)}}{k!} 2^{rk/(2k-1)}$ for even multiples of weight $w = 2k$.

Wagner (2002) presented a generalization of the birthday problem, i.e., given $k$ lists of $r$-bit values, find a way to choose one element from each list so that these $k$ values XOR to zero. This algorithm finds a multiple of weight $w = k + 1$ using lower computational complexity, $k \cdot 2^{r/(1+\lfloor \log k \rfloor)}$, than the method described above, at the expense of a higher degree, which is $2^{r/(1+\lfloor \log k \rfloor)}$. Since the number of samples is of high concern to us, we choose to work with the method described in Golić (1996).

In the attack we focus on using a multiple of weight three in the form

$$s_t \oplus s_{t+\tau_1} \oplus s_{t+\tau_2} = 0, \quad t \ge 0. \qquad (5)$$

The complexity of finding this multiple is roughly $2^{r/2}$ and the expected degree of this multiple is also approximately $2^{r/2}$, where $r$ denotes the degree of the original LFSR. It is also possible to mount the attack with multiples of higher weight. Using a multiple of higher weight lowers the degree of the multiple, but the probability that one or several of the members are decimated also increases. So from now on we assume that we use a weight three recurrence relation as in (5).

3.2 Positioning the windows

Consider again the weight three relation, but now with irregular clocking. We denote by $C$ the random variable giving the number of clocks between two consecutive keystream symbols, by $P(C)$ its distribution and by $E(C)$ the expectation of $C$. The size of the windows depends on the distance from the fixed position, hence we will fix the centre position in the recurrence and use windows around the other two positions. We rewrite the recurrence as

$$s_{t-\tau_1} \oplus s_t \oplus s_{t+\tau_2-\tau_1} = 0, \quad t \ge 0. \qquad (6)$$

Assume that at time instant $t$, none of the terms in the above equation is decimated, i.e., they all appear somewhere in the keystream. Considering a fixed position $t'$ in the keystream, we would like to estimate the distance from $z_{t'} = s_t$ to $z_{t'-\tau_1'} = s_{t-\tau_1}$, and also the distance between $z_{t'} = s_t$ and $z_{t'+\tau_2'-\tau_1'} = s_{t+\tau_2-\tau_1}$, i.e., the values of $\tau_1'$ and $\tau_2'$. This decimation is illustrated in Figure 2. The expected distance between the outputs from the LFSR corresponding to the inputs $s_{t-\tau_1}$ and $s_t$ is $\tau_1/E(C)$. Similarly, the expected distance between $s_t$ and $s_{t+\tau_2-\tau_1}$ is $\frac{\tau_2-\tau_1}{E(C)}$. So we will place one window around the position $t' - \tau_1/E(C)$ and another window around the position $t' + \frac{\tau_2-\tau_1}{E(C)}$.

Figure 2: Decimated LFSR sequence (the LFSR symbols $s_{t-\tau_1}$, $s_t$, $s_{t+\tau_2-\tau_1}$ appear in the keystream as $z_{t'-\tau_1'}$, $z_{t'}$, $z_{t'+\tau_2'-\tau_1'}$).

3.3 Determining the size of the windows

The output sequence from the clock-control part, denoted by $c_t$, is assumed to have a fixed distribution independent of $t$. By the central limit theorem we know that the sum of a large number of random variables approaches the normal distribution. So $Y_n = C_1 + C_2 + \ldots + C_n \in N(n \cdot E(C), \sigma_c \sqrt{n})$, where $n$ denotes the number of observed symbols and $\sigma_c$ the standard deviation of the clocking sequence. If we choose the window size sufficiently large, the desired position will be located inside the window with high probability. Since

$$P(nE(C) - \sigma_c\sqrt{n} < Y_n < nE(C) + \sigma_c\sqrt{n}) = 0.682,$$
$$P(nE(C) - 2\sigma_c\sqrt{n} < Y_n < nE(C) + 2\sigma_c\sqrt{n}) = 0.954,$$

we choose a window size of approximately four standard deviations.

3.4 Estimation of the number of required samples

Having established the location and size of the windows, the main idea of the distinguishing attack is now to create samples of the form

$$z_{w_1} \oplus z_t \oplus z_{w_2}, \quad t \ge 0,$$

where $w_1$ is any position inside the first window and $w_2$ is any position inside the second window. We will run through all such possible combinations. As will be demonstrated, each sample is drawn according to a biased distribution. To determine how many bits we need to observe to reliably distinguish the cipher from a random source, we need to estimate the bias.

We denote the window sizes by $r_1$ and $r_2$. Note that we are calculating a number of samples, and that for every time instant we get $r_1 \cdot r_2$ new samples. Assume that at time $t$ none of the members in the recurrence relation is decimated. Then among all $r_1 \cdot r_2$ constructed samples, one will correspond to the actual recurrence relation and thus will always sum to zero, contributing with the bias 1/2. The other $r_1 \cdot r_2 - 1$ samples are assumed to be random. So the distribution for one sample can then roughly be calculated as

$$P(z_{w_1} \oplus z_t \oplus z_{w_2} = 0) = \frac{1}{2} + \frac{1}{2} \cdot \frac{1}{r_1 r_2} \cdot p_{dec}$$

and the statistical distance is

$$\varepsilon = \frac{1}{2} \cdot \frac{1}{r_1} \cdot \frac{1}{r_2} \cdot p_{dec},$$

where $p_{dec}$ denotes the probability that none of the three members in the recursion is decimated. In the approximation we neglect the probability that the correct position in some cases deviates more than two standard deviations from the expected position. In practice, this would lead to a slightly smaller bias.

We can now estimate how many keystream bits we need to observe in order to make a correct decision. If the distributions are smooth (close to uniform), we need to observe $N \approx 1/\varepsilon^2$ samples in order to reliably distinguish between the two distributions, see for example Coppersmith et al. (2002). Note that the error probabilities are decreasing exponentially with $N$. At each time instant we receive $r_1 \cdot r_2$ new samples, and hence the total number of bits we need for the distinguisher can be estimated by

$$N \approx \frac{r_1 \cdot r_2}{p_{dec}^2}.$$

3.5 Complexity of calculating the samples

The strength of the distinguisher lies in the fact that the calculation of the number of ones and zeros in the windows can be performed very efficiently. When we move the centre position from $z_t$ to $z_{t+1}$ we also move the windows one step to the right.

We denote the number of zeros in window one and two by $P_t^{Z_{win1}}$ and $P_t^{Z_{win2}}$, respectively. The number of samples that fulfil $z_{w_1} + z_t + z_{w_2} = 0$ at time $t$ is denoted $P_t^W$, where $w_1$, $w_2$ are any pair of positions in window one and window two, respectively. When moving the windows we get $P_{t+1}^{Z_{win1}}$ from $P_t^{Z_{win1}}$ by adjusting for the bit leaving the window and the one entering the window, e.g. $P_{t+1}^{Z_{win1}} = P_t^{Z_{win1}} + p$, where $p = 1$ if a one is leaving the window and a zero entering, $p = -1$ if it is the opposite, and $p = 0$ otherwise.

From $P_{t+1}^{Z_{win1}}$ and $P_{t+1}^{Z_{win2}}$ we can, with few basic computations, calculate $P_{t+1}^W$. Define one operation as the computations required to calculate $P_{t+1}^W$ from $P_{t+1}^{Z_{win1}}$ and $P_{t+1}^{Z_{win2}}$.

Theorem 1. The proposed distinguisher requires about $N = \frac{r_1 \cdot r_2}{p_{dec}^2}$ bits of keystream and uses a computational complexity of approximately $N$ operations.

Although the numbers of zeros in the windows, $P_{t+1}^{Z_{win1}}$ and $P_{t+1}^{Z_{win2}}$, are dependent on the numbers of zeros in the previous windows, $P_t^{Z_{win1}}$ and $P_t^{Z_{win2}}$, the covariance between the number of samples received at time instants $t$ and $t + 1$ is zero, $Cov(P_{t+1}^W, P_t^W) = 0$, see Englund and Johansson (2005).

The last step in the attack is to determine whether the collected data really is biased. A rough method for the hypothesis test is to check whether the result deviates more than two standard deviations from the expected result in the case when the bits are truly random. In Englund and Johansson (2005) it was shown that the standard deviation for a sum of these samples can be estimated by $\sigma = \sqrt{\frac{N r_1 r_2}{4}}$.

1. Find a weight three multiple of the LFSR recurrence.
2. Determine the expected centre positions of the windows.
3. Calculate the sizes, r1 and r2, of the windows.
4. Calculate the bias ε.
5. Calculate the number of bits N we need to observe.
6. for t from 0 to N
     if z_t = 0
       P^W += P_t^{Zwin1} · P_t^{Zwin2} + (r1 − P_t^{Zwin1})(r2 − P_t^{Zwin2})
     else if z_t = 1
       P^W += P_t^{Zwin1}(r2 − P_t^{Zwin2}) + (r1 − P_t^{Zwin1}) P_t^{Zwin2}
     end if
     Move windows and calculate P_{t+1}^{Zwin1}, P_{t+1}^{Zwin2}
   end for
7. if |P^W − N·r1·r2/2| > sqrt(N·r1·r2) output “cipher” otherwise “random”.

Figure 3: Summary of the proposed distinguishing attack.
3.6 Summary of previous attack

The described attack is summarized in Figure 3. In this figure, $P^W = \sum_{t=0}^{N} P_t^W$ is the total number of zero samples out of a total of $N \cdot r_1 \cdot r_2$ samples. Simulation results for the attack were presented in Englund and Johansson (2005).
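The counting loop and the final decision of Figure 3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the windows are described by assumed start offsets `a1`, `a2` relative to the centre position `t` together with their sizes `r1`, `r2`, and only positions `t` for which both windows lie inside the keystream are used.

```python
import math


def count_zero_samples(z, a1, r1, a2, r2):
    """Count the samples z[w1] ^ z[t] ^ z[w2] that equal zero.

    Window 1 covers z[t+a1 : t+a1+r1] and window 2 covers z[t+a2 : t+a2+r2].
    Returns (number of zero samples, total number of samples).
    """
    t_first = max(0, -a1, -a2)
    t_last = len(z) - max(1, a1 + r1, a2 + r2)  # last usable t, inclusive
    if t_last < t_first:
        return 0, 0
    # Zero counts of both windows at the first usable position.
    zeros1 = z[t_first + a1:t_first + a1 + r1].count(0)
    zeros2 = z[t_first + a2:t_first + a2 + r2].count(0)
    pw = 0
    for t in range(t_first, t_last + 1):
        if z[t] == 0:
            # The two window bits must be equal.
            pw += zeros1 * zeros2 + (r1 - zeros1) * (r2 - zeros2)
        else:
            # The two window bits must differ.
            pw += zeros1 * (r2 - zeros2) + (r1 - zeros1) * zeros2
        if t < t_last:
            # Slide both windows one step: add the entering bit, drop the leaving bit.
            zeros1 += (z[t + a1 + r1] == 0) - (z[t + a1] == 0)
            zeros2 += (z[t + a2 + r2] == 0) - (z[t + a2] == 0)
    return pw, (t_last - t_first + 1) * r1 * r2


def looks_like_cipher(pw, total):
    """Two-standard-deviation test, using sigma = sqrt(total / 4)."""
    return abs(pw - total / 2) > math.sqrt(total)
```

Each step of the loop costs a constant number of operations regardless of the window sizes, which is the property the complexity estimate in Section 3.5 relies on.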
4 AN EFFICIENT DISTINGUISHER USING VECTORIAL SAMPLES

In the previous section an efficient distinguisher was created by adding binary symbols from different windows in the keystream. The natural step to improve this strategy is to consider a word based attack. Instead of considering the number of zeros and ones inside the windows, we increase the alphabet size and count words (vectors of consecutive bits) inside the windows and around the centre member of the weight three recurrence equation. In this section the idea of using words is applied directly on the keystream, i.e., a length two word would be of the form $(z_t, z_{t+1})$.

We start by considering the calculation and storage of the number of different words inside the two windows. Let $P_t^{Z_{win1}}$ now denote the (empirical) distribution of words in window one and $P_t^{Z_{win2}}$ the same in window two. The word at the centre position of the recurrence relation is denoted $Z_t$. Note that we can maintain an efficient updating procedure for these distributions, i.e., $P_{t+1}^{Z_{win1}}$ and $P_{t+1}^{Z_{win2}}$ are easily updated from $P_t^{Z_{win1}}$ and $P_t^{Z_{win2}}$.

The disadvantage of the proposed procedure, compared to the previous method, is that the complexity of calculating the overall number of samples of a certain value increases with larger word size. Using a larger alphabet means a higher computational complexity to calculate the number of ways to add the words in the different windows. On the other hand we expect a higher bias in a word based approach, giving a shorter required keystream.

Window 1 bits:   0 1 1 0 1 1 1 1 1 0 1 1
Window 1 words:  01 11 10 01 11 11 11 11 10 01 11
Centre word:     10
Window 2 bits:   0 0 0 1 1 0 0 0 0 1 0 0
Window 2 words:  00 00 01 11 10 00 00 00 01 10 00

Word i   P_t^{Zwin1}[i]   P_t^{Zwin2}[i]   P_t^W[i]
00       0                6                0·2 + 3·1 + 2·6 + 6·2 = 27
01       3                2                0·1 + 3·2 + 2·2 + 6·6 = 46
10       2                2                0·6 + 3·2 + 2·2 + 6·1 = 16
11       6                1                0·2 + 3·6 + 2·1 + 6·2 = 32

Figure 4: Calculating the samples.

1. Find a weight three multiple of the LFSR.
2. Determine the positions of the windows.
3. Calculate the sizes r1, r2 of the windows.
4. Estimate the number of bits N we need to observe.
5. for t from 0 to N
     for i from 0 to 2^b − 1
       for j from 0 to 2^b − 1
         P^W(i ⊕ j ⊕ Z_t) += P_t^{Zwin1}(i) · P_t^{Zwin2}(j)
       end for
     end for
     Move window and calculate Z_{t+1}, P_{t+1}^{Zwin1} and P_{t+1}^{Zwin2}.
   end for
6. Calculate I = Σ_{x∈X} P^W(x) · log2 [ P0(x) / 2^{−b} ]
   If I > 0 output “cipher” otherwise “random”.

Figure 5: Summary of the proposed distinguishing attack.
4.1 Convolution of large distributions

As described, the main idea of the distinguishing attack using word size $b$ is to create samples of the form

$$(z_{w_1}, z_{w_1+1}, \ldots, z_{w_1+b-1}) \oplus (z_t, z_{t+1}, \ldots, z_{t+b-1}) \oplus (z_{w_2}, z_{w_2+1}, \ldots, z_{w_2+b-1}), \quad t \ge 0,$$

where the word $(z_{w_1}, z_{w_1+1}, \ldots, z_{w_1+b-1})$ is any word inside the first window and $(z_{w_2}, z_{w_2+1}, \ldots, z_{w_2+b-1})$ is any word inside the second window. The total number of samples created at time $t$ is $r_1 \cdot r_2$, where now $r_1$ is the number of words inside the first window, etc.

Consider the two stored tables $P_t^{Z_{win1}}$ and $P_t^{Z_{win2}}$ containing the number of different words inside each window as empirical distributions. Then at each time instant in the attack we need to perform a convolution of two such distributions. This is a time consuming operation if the distributions are large. A trivial calculation of the convolution of two distributions over $n$-bit words has complexity $O(2^{2n})$. However, it has been demonstrated, see for example Maximov and Johansson (2005), that convolutions over bitwise addition of two large distributions can be performed via the Fast Hadamard Transform (FHT) with complexity $O(n \cdot 2^n)$.
4.2 An example

Consider the case when we have a keystream according to Figure 4. In the example we have window sizes of 12 bits and we use vectors of length two. The first step is to calculate the number of words in each window. The next step is to calculate in how many ways we can add together one word from each window and the centre word to get a specific word. For example, consider calculating in how many ways we can combine (add bitwise) words from the windows to get the bitwise modulo two sum (0, 1). The centre word in the recurrence is assumed to be (1, 0), so we need to find all combinations adding to (1, 1). That would be 0 · 1 + 3 · 2 + 2 · 2 + 6 · 6 = 46 samples having value (0, 1) out of a total of 121 samples. The calculation is illustrated in Figure 4.
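The counts in this example can be reproduced with a few lines of Python. The sketch below is illustrative only and its function names are hypothetical; it computes the convolution over bitwise addition both directly and through an unnormalised fast Walsh-Hadamard transform, a standard way of realising the Fast Hadamard Transform mentioned in Section 4.1. Words are encoded as integers, so (1, 0) becomes `0b10`, and table lengths are assumed to be equal powers of two.

```python
def xor_convolve_naive(p, q):
    """out[x] = sum of p[i] * q[j] over all pairs with i XOR j == x."""
    out = [0] * len(p)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i ^ j] += pi * qj
    return out


def fwht(v):
    """Unnormalised fast Walsh-Hadamard transform of a power-of-two length list."""
    v = list(v)
    h = 1
    while h < len(v):
        for start in range(0, len(v), 2 * h):
            for k in range(start, start + h):
                a, b = v[k], v[k + h]
                v[k], v[k + h] = a + b, a - b
        h *= 2
    return v


def xor_convolve_fwht(p, q):
    """Same result as xor_convolve_naive for integer count tables."""
    product = [a * b for a, b in zip(fwht(p), fwht(q))]
    # The inverse transform is the forward transform divided by the length;
    # for integer inputs the division is exact.
    return [x // len(p) for x in fwht(product)]


def sample_counts(win1, win2, centre):
    """Number of samples of each value x = i XOR j XOR centre."""
    conv = xor_convolve_fwht(win1, win2)
    return [conv[x ^ centre] for x in range(len(conv))]


# Word counts of the two windows in the example, indexed 00, 01, 10, 11.
print(sample_counts([0, 3, 2, 6], [6, 2, 2, 1], 0b10))  # [27, 46, 16, 32]
```

The four counts sum to 121, the total number of samples in the example, and the entry for word 01 is the 46 computed in the text.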
4.3 Rough theoretical estimates for word size two

For word size two, we create samples $W_l$ of the form

$$W_l = (z_{w_1}, z_{w_1+1}) \oplus (z_t, z_{t+1}) \oplus (z_{w_2}, z_{w_2+1}), \quad t \ge 0,$$

where $w_1$ is any position in the first window and $w_2$ is any position in the second window. We will run through all such possible combinations. Let us assume that at time $t$ we consider $(z_t, z_{t+1})$ and suitably placed windows. With probability $(1 - p_{dec})$ some symbol in the first relation is not present and with the same probability $(1 - p_{dec})$ some symbol in the second relation is not present. We get four possibilities, one being that the sample is drawn according to the uniform distribution, happening with probability $(1 - p_{dec})^2$. Another is that the sample is drawn according to a distribution where the first coordinate is zero, happening with probability $(1 - p_{dec}) p_{dec}$. The same probability holds for the case when the second recursion holds and the second coordinate is zero, and finally all LFSR symbols in both relations are in place with probability $p_{dec}^2$.

This leads to $r_1 \cdot r_2 - 1$ samples from the uniform distribution and one sample from a skew distribution. For an arbitrary sample, this gives

$$P(W_l = (0, 0)) = \frac{1}{4} + \frac{\frac{1}{2} p_{dec} + \frac{1}{4} p_{dec}^2}{r_1 r_2},$$
$$P(W_l = (0, 1)) = P(W_l = (1, 0)) = \frac{1}{4} - \frac{\frac{1}{4} p_{dec}^2}{r_1 r_2},$$
$$P(W_l = (1, 1)) = \frac{1}{4} + \frac{-\frac{1}{2} p_{dec} + \frac{1}{4} p_{dec}^2}{r_1 r_2}.$$

Clearly, a similar analysis can be done for larger word sizes.
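These probabilities can be checked by writing out the four cases for the one sample that corresponds to the true recurrence. In the sketch below that sample is called $W^{\ast}$ and $p$ abbreviates $p_{dec}$; both names are introduced here for illustration only, and the two relations are treated as independent, as in the rough estimate of the text.

```latex
\begin{align*}
P(W^{\ast} = (0,0)) &= p^2 + 2 \cdot p(1-p) \cdot \tfrac{1}{2} + (1-p)^2 \cdot \tfrac{1}{4}
                     = \tfrac{1}{4} + \tfrac{1}{2}p + \tfrac{1}{4}p^2, \\
P(W^{\ast} = (0,1)) = P(W^{\ast} = (1,0)) &= p(1-p) \cdot \tfrac{1}{2} + (1-p)^2 \cdot \tfrac{1}{4}
                     = \tfrac{1}{4} - \tfrac{1}{4}p^2, \\
P(W^{\ast} = (1,1)) &= (1-p)^2 \cdot \tfrac{1}{4}
                     = \tfrac{1}{4} - \tfrac{1}{2}p + \tfrac{1}{4}p^2, \\
P(W_l = x) &= \frac{1}{r_1 r_2} P(W^{\ast} = x) + \left(1 - \frac{1}{r_1 r_2}\right) \cdot \tfrac{1}{4}
            = \tfrac{1}{4} + \frac{P(W^{\ast} = x) - \tfrac{1}{4}}{r_1 r_2}.
\end{align*}
```

The four deviations from $\tfrac{1}{4}$ sum to zero, so the probabilities of $W_l$ add up to one as they should.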
4.4 Summary of attack

The proposed attack using words is summarized in Figure 5. The distribution $P_0(x)$, $\forall x \in \mathcal{X}$, is determined through simulation before the attack and needs only be done once. A convolution of $P_t^{Z_{win1}}(i)$ and $P_t^{Z_{win2}}(j)$ is performed at each time instant; to decrease the complexity, this step can be performed via the Fast Hadamard Transform.
Figure 6: Cipher used in simulations (LFSR_c generates the clock control sequence c_t, which decimates the output s_t of LFSR_s to give the keystream z_t).

P(S = z0 z1 z1    | Z = z0 z1) = 0.5
P(S = z0 z1 z1'   | Z = z0 z1) = 0.25
P(S = z0 z1' z1   | Z = z0 z1) = 0.25
P(S = z0 z1' z1'  | Z = z0 z1) = 0
P(S = z0' z1 z1   | Z = z0 z1) = 0
P(S = z0' z1 z1'  | Z = z0 z1) = 0
P(S = z0' z1' z1  | Z = z0 z1) = 0
P(S = z0' z1' z1' | Z = z0 z1) = 0

Figure 7: Keystream generated using c_t ∈ {1, 2} (the LFSR symbols s0 s1 s2 give the keystream symbols z0 z1; a prime denotes the complemented bit).

Vector size   Bits        Samples     D(P0, P1)
1             2^{29.90}   2^{45.46}   2^{−33.9}
2             2^{29.90}   2^{45.45}   2^{−32.9}
3             2^{29.90}   2^{45.44}   2^{−32.2}
4             2^{29.90}   2^{42.10}   2^{−31.7}
5             2^{29.90}   2^{42.09}   2^{−31.3}
6             2^{23.25}   2^{38.75}   2^{−31.1}
8             2^{23.25}   2^{38.73}   2^{−29.9}

Table 1: D(P0, P1) for different vector lengths.
4.5 Simulation results

To verify the effectiveness of our ideas we implemented the attack on an irregularly clocked LFSR. The simulation results are based on the step-one-step-two generator, i.e., at every time instant an LFSR producing the keystream is clocked one or two times, $c_t \in \{1, 2\}$ with equal probability. Hence $E(C) = 1.5$ and $V(C) = 2.5$; the generator is illustrated in Figure 6. For $LFSR_c$ we used the primitive trinomial $x^{41} + x^{20} + 1$, and for $LFSR_s$ we used the primitive trinomial $x^{3660} + x^{1637} + 1$. We fix the central member of the feedback polynomial. The centre position for window one will be at $t - \tau_1/E(C) = t - \frac{3660-1637}{1.5} = t - 1349$, and at $t + \frac{\tau_2-\tau_1}{E(C)} = t + \frac{1637}{1.5} = t + 1091$ for window two. We use window sizes of four standard deviations, i.e., $r_1 = 4\sqrt{2.5 \cdot 1349} = 232$ and $r_2 = 4\sqrt{2.5 \cdot 1091} = 209$.

We can compare the bit oriented approach from the previous section (corresponding to vector size 1) with the new word based approach. Simulation results for different vector sizes are presented in Table 1, where the divergence between the empirical distribution of the cipher, $P_0$, and the uniform distribution, $P_1$, is denoted by $D(P_0, P_1)$.

By using larger vector sizes we can achieve a higher bias, and thus fewer keystream bits are needed for the attack. However, the computational complexity may be higher. In situations where the number of keystream bits is more important than the computational complexity, the idea with vectors may be of great interest.
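The window centres and sizes quoted above follow from a short calculation, sketched here in Python. The function name and its parameters are hypothetical, the variance argument is simply the value stated in the text, and rounding to the nearest integer is an assumption about how the figures in the text were obtained.

```python
import math


def window_parameters(lfsr_distance, mean_clock, var_clock, num_std=4):
    """Return (centre offset, window size) in keystream positions.

    lfsr_distance is the distance in the undecimated LFSR sequence between
    the fixed centre member and the member the window is meant to capture.
    """
    centre = round(lfsr_distance / mean_clock)
    size = round(num_std * math.sqrt(var_clock * centre))
    return centre, size


# Values from the simulation: trinomial x^3660 + x^1637 + 1, E(C) = 1.5,
# and the variance 2.5 as stated in the text.
print(window_parameters(3660 - 1637, 1.5, 2.5))  # (1349, 232)
print(window_parameters(1637, 1.5, 2.5))         # (1091, 209)
```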
5 CONSIDERING WORDS IN THE UNDECIMATED SEQUENCE

Instead of considering words in the keystream sequence we can estimate the distribution of words in the undecimated sequence of $LFSR_s$. These words can have a different word size than in the decimated sequence. A word in the keystream is as before denoted $Z_t = (z_t, z_{t+1}, \ldots, z_{t+b-1})$, and a corresponding word of size $d$ in the undecimated LFSR stream is denoted by $S_t = (s_t, s_{t+1}, \ldots, s_{t+d-1})$.

Consider the case depicted in Figure 7. In this figure we consider keystream words of size two and the decimation is done according to $c_t \in \{1, 2\}$ with equal probability. The distribution of the undecimated words will be skew, e.g., $P(S = 111 \mid Z = 00) = 0$. The probabilities $P(S_t \mid Z_t)$, $\forall S_t, Z_t$, can be precomputed. For simplicity, our considerations are purely combinatorial. Each observed $Z$ vector in a window will give rise to a number of samples for the $S$ vector. In the previous example, an observed $Z = (00)$ would give 4 possible samples for $S$, namely two samples with value $S = (000)$, one with value $S = (001)$ and one with value $S = (010)$.

When performing the attack, as in the previous section, windows are placed around the expected positions of members in the recurrence relation. Let us denote the empirical distributions of the corresponding undecimated words of size $d$ as $P_t^{S_{win1}}$, $P_t^{S_{win2}}$ and $P_t^{S_{centre}}$. The empirical distributions of undecimated words in the windows, $P_t^{S_{win1}}$ and $P_t^{S_{win2}}$, are calculated and, as before, the update of these tables can be done very efficiently when moving the windows. The update is performed for the two windows at each time instant $t$. For the centre word the table $P_t^{S_{centre}}$ is obtained by a simple table lookup.

In the cipher case, we are looking at an equation of the form

$$S_{t-\tau_1} + S_t + S_{t+\tau_2-\tau_1} = 0, \quad t \ge 0.$$

The final procedure is almost exactly as in the previous section. We have an empirical distribution of $S_{t-\tau_1}$ in the array $P_t^{S_{win1}}$, etc., and we estimate the distribution of $S_{t-\tau_1} + S_t + S_{t+\tau_2-\tau_1}$ through the convolution of $P_t^{S_{win1}}$, $P_t^{S_{win2}}$ and $P_t^{S_{centre}}$.
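The combinatorial counting behind $P(S_t \mid Z_t)$ can be sketched in Python as below. This is one possible reading of the counting described above rather than the authors' code: the function name is hypothetical, every clocking pattern that fits inside the $d$ undecimated symbols is enumerated, every symbol not pinned down by the keystream word is filled with both values, and $d$ is assumed to be at least the keystream word size.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def undecimated_given_keystream(z, d, clocks=(1, 2)):
    """Combinatorial distribution of the d-bit LFSR word S given keystream word z."""
    counts = Counter()
    for steps in product(clocks, repeat=len(z) - 1):
        # Positions inside S that hold z[0], z[1], ... for this clocking pattern.
        positions = [0]
        for c in steps:
            positions.append(positions[-1] + c)
        if positions[-1] >= d:
            continue  # the keystream word does not fit into d LFSR symbols
        free = [p for p in range(d) if p not in positions]
        for fill in product((0, 1), repeat=len(free)):
            s = [0] * d
            for p, bit in zip(positions, z):
                s[p] = bit
            for p, bit in zip(free, fill):
                s[p] = bit
            counts[tuple(s)] += 1
    total = sum(counts.values())
    return {s: Fraction(n, total) for s, n in counts.items()}


dist = undecimated_given_keystream((0, 0), 3)
for word in sorted(dist):
    print(word, dist[word])
```

For $Z = (00)$ and $d = 3$ this yields probability 1/2 for $S = (000)$ and 1/4 each for $S = (001)$ and $S = (010)$, matching the four samples listed in the text; words such as $(111)$ do not occur at all.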
This distribution is denoted PtW . By computing the convolution via Fast Hadamard Transform the complexity of these calculations can, as mentioned in Section 4.1, be significantly lowered. Finally, by summing over all t, we obtain the empirical distribution P W . As before, this needs to be checked against the uniform distribution and possibly some experimentally verified P0 (x). An outline of the proposed attack is depicted in Figure 8. To investigate the ideas we implemented the attack
1. Find a weight three multiple of the LFSR.
2. Determine the positions of the windows.
3. Calculate the sizes $r_1, r_2$ of the windows.
4. Estimate the number of bits $N$ we need to observe.
5. for $t$ from 0 to $N$
       for $i$ from 0 to $2^d$
           for $j$ from 0 to $2^d$
               for $k$ from 0 to $2^d$
                   $P^W(i \oplus j \oplus k)$ += $P_t^{S_{win1}}(i) \cdot P_t^{S_{win2}}(j) \cdot P_t^{S_{centre}}(k)$
               end for
           end for
       end for
       Move window and calculate $P_{t+1}^{S_{win1}}$, $P_{t+1}^{S_{win2}}$ and $P_{t+1}^{S_{centre}}$
   end for
6. Calculate $I = \sum_{x \in X} P^W(x) \cdot \log_2 \left[ \frac{P_0(x)}{2^{-b}} \right]$
   If $I > 0$ output "cipher", otherwise "random".
Figure 8: The attack considering undecimated sequences.
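The inner loops of step 5 in Figure 8 amount to an XOR-convolution of three distributions over $d$-bit words. The following is a minimal sketch, not the implementation used for the experiments, that compares the direct triple loop with a transform-based version. It assumes that the Fast Hadamard Transform referred to in the text is the unnormalised Walsh-Hadamard transform; the function names `fwht`, `xor_convolve_naive` and `xor_convolve_fwht` are hypothetical and chosen only for illustration.

```python
from itertools import product


def fwht(values):
    """Unnormalised fast Walsh-Hadamard transform.

    The input length must be a power of two (here 2**d).
    """
    v = list(values)
    h = 1
    while h < len(v):
        for start in range(0, len(v), 2 * h):
            for i in range(start, start + h):
                a, b = v[i], v[i + h]
                v[i], v[i + h] = a + b, a - b
        h *= 2
    return v


def xor_convolve_naive(p1, p2, p3):
    """Direct triple loop, as in step 5 of Figure 8: about 2**(3d) products."""
    n = len(p1)
    out = [0.0] * n
    for i, j, k in product(range(n), repeat=3):
        out[i ^ j ^ k] += p1[i] * p2[j] * p3[k]
    return out


def xor_convolve_fwht(p1, p2, p3):
    """Same result via the transform: about d * 2**d additions per transform.

    The transform turns XOR-convolution into a pointwise product, and
    applying it twice multiplies every entry by n, hence the division.
    """
    n = len(p1)
    spectrum = [a * b * c for a, b, c in zip(fwht(p1), fwht(p2), fwht(p3))]
    return [x / n for x in fwht(spectrum)]
```

Both functions take three lists of length $2^d$ (hypothetical stand-ins for $P_t^{S_{win1}}$, $P_t^{S_{win2}}$ and $P_t^{S_{centre}}$) and return the estimate of $P_t^W$; accumulating the result over all $t$ gives $P^W$.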
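| Attack | Bits | Samples | $D(P_0, P_1)$ |
|---|---|---|---|
| Binary attack | $2^{29.90}$ | $2^{45.46}$ | $2^{-33.9}$ |
| Undecimated attack (b=2, d=3) | $2^{29.90}$ | $2^{51.45}$ | $2^{-33.86}$ |
| Undecimated attack (b=3, d=5) | $2^{23.25}$ | $2^{50.79}$ | $2^{-33.79}$ |

Table 2: $D(P_0, P_1)$ using words in the undecimated sequence.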
on a step-one-step-two generator and compared it with the bit-oriented attack presented in Section 3. The cipher used in the simulation is the same as the one used in Section 4.5. The results are summarized in Table 2, where $b$ and $d$ denote the word sizes in the decimated and the undecimated sequence, respectively, and $D(P_0, P_1)$ denotes the divergence between the simulated empirical distribution and the uniform distribution. In this example the gain from using vectors in the undecimated sequence is not very large, but on other ciphers the gain might be larger. Using larger words in the decimated sequence, similarly to Section 4, can also be considered to improve the attack. It can be noted that each keystream bit yields more samples in the undecimated attack than in the bit-oriented attack.
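The larger number of samples per keystream bit follows from the combinatorial enumeration used in this section: every observed keystream word $Z$ contributes several candidate undecimated words $S$. The sketch below reproduces that enumeration for the step-one-step-two example ($b = 2$, $d = 3$). It rests on stated assumptions: the first keystream bit is aligned with the first bit of $S$, every clocking pattern that fits inside the $d$-bit window is counted once, and unobserved bits are left free. The names `undecimated_samples` and `conditional_distribution` are hypothetical.

```python
from collections import Counter
from itertools import product


def undecimated_samples(z, d, steps=(1, 2)):
    """Count the d-bit words S that are consistent with keystream word z.

    Assumption: z[0] sits at position 0 of S and each later keystream bit
    lies c positions further on, with c drawn from `steps`.
    """
    counts = Counter()
    for clocks in product(steps, repeat=len(z) - 1):
        positions = [0]
        for c in clocks:
            positions.append(positions[-1] + c)
        if positions[-1] >= d:
            continue  # this clocking pattern does not fit in d bits
        free = [p for p in range(d) if p not in positions]
        for fill in product((0, 1), repeat=len(free)):
            s = [0] * d
            for p, bit in zip(positions, z):
                s[p] = bit  # bits fixed by the observed keystream word
            for p, bit in zip(free, fill):
                s[p] = bit  # bits skipped by the decimation are free
            counts[tuple(s)] += 1
    return counts


def conditional_distribution(z, d, steps=(1, 2)):
    """Normalise the sample counts into an estimate of P(S | Z = z)."""
    counts = undecimated_samples(z, d, steps)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}
```

For `z = (0, 0)` and `d = 3` the enumeration yields four samples: two with value (000), one with value (001) and one with value (010), while (111) never occurs, in agreement with the example given earlier in this section.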
6 CONCLUSIONS
We propose three distinguishers for irregularly clocked stream ciphers. The first attack distinguishes the cipher with very low computational complexity. Using vectors of keystream bits, a procedure was shown that requires fewer keystream bits but possibly a higher complexity to distinguish the cipher. Finally, an idea for a distinguisher that estimates the distribution of the undecimated LFSR stream was proposed.