On the Distribution of Characteristics in Bijective Mappings

Luke O'Connor¹
Department of Computer Science
University of Waterloo, Canada

Abstract

Differential cryptanalysis is a method of attacking iterated mappings based on differences known as characteristics. The probability of a given characteristic is derived from the XOR tables associated with the iterated mapping. If $\pi$ is a mapping $\pi : Z_2^m \to Z_2^m$, then for each $\Delta X, \Delta Y \in Z_2^m$ the XOR table for $\pi$ gives the number of input pairs of difference $\Delta X = X + X'$ for which $\pi(X) + \pi(X') = \Delta Y$. The complexity of a differential attack depends upon two properties of the XOR tables: the density of zero entries in the table, and the size of the largest entry in the table. In this paper we present the first results on the expected values of these properties for a general class of mappings $\pi$. We prove that if $\pi : Z_2^m \to Z_2^m$ is a bijective mapping then the expected size of the largest entry in the XOR table for $\pi$ is bounded by $2m$, while the fraction of the XOR table that is zero approaches $e^{-1/2} = 0.60653$. We are then able to demonstrate that there are easily constructed classes of iterated mappings for which the probability of a differential-like attack succeeding is very small.
Keywords: Differential cryptanalysis, iterated mapping, product cipher.
The author is presently employed by the Distributed System Technology Center, Brisbane, Australia. Correspondence should be sent to ISRC, QUT Gardens Point, 2 George Street, GPO Box 2434, Brisbane Q 4001, Australia; Email
[email protected].
1 Introduction

Differential cryptanalysis is a statistical attack popularized by Biham and Shamir [3, 5] that has been applied to a wide range of cryptosystems including LUCIFER, DES, FEAL, REDOC and Khafre [7, 8, 10, 17, 18, 25]. The attack is universal in that it can be used against any cryptographic mapping which is constructed by iterating a fixed round function (compare this to the universality of the birthday paradox against hash functions). For this reason the differential attack must be considered one of the most general cryptanalytic attacks known to date. The main shortcoming of differential cryptanalysis is that large amounts of chosen plaintext may be required to determine the key, which will not be possible in most practical circumstances. Nevertheless, differential cryptanalysis has caused the revision and redesign of several iterated mappings [1, 5, 6, 20] and is the only known attack which can theoretically recover DES keys in time less than the expected cost of exhaustive search [4]. Importantly, the method has shown that the security of DES is not significantly increased if independent subkeys are used.

We will give a brief description of differential cryptanalysis with reference to product ciphers, though any iterated mapping could be used. For a product cipher $E$ that consists of $R$ rounds, let $E_r(X, K)$ be the ciphertext of the plaintext $X$ under the key $K$ for $r$ rounds, $1 \le r \le R$, where $E_R(X, K) = E(X, K) = C$ is the mapping for $X$. Let $\Delta C(r) = E_r(X, K) + E_r(X', K)$ be the difference between the ciphertexts of two plaintexts $X, X'$ after $r$ rounds where $1 \le r \le R$. An $r$-round characteristic is defined as an $(r+1)$-tuple $\Omega_r(\Delta X, \Delta Y_1, \Delta Y_2, \ldots, \Delta Y_r)$ where $\Delta X$ is a plaintext difference, and the $\Delta Y_i$ are ciphertext differences. A plaintext pair $X, X'$ of difference $\Delta X$ is called a right pair with respect to a key $K$ and a characteristic $\Omega_r(\Delta X, \Delta Y_1, \Delta Y_2, \ldots, \Delta Y_r)$ if when the pair $X, X'$ is encrypted, $\Delta C(i) = \Delta Y_i$ for $1 \le i \le r$. That is, the characteristic correctly predicts the ciphertext differences at each round. The characteristic $\Omega_r$ has probability $p^{\Omega_r}$ if a fraction $p^{\Omega_r}$ of the plaintext pairs of difference $\Delta X$ are right pairs. On the other hand, if $X, X'$ is not a right pair, then it is said to be a wrong pair (with respect to the characteristic and the key).

A table which records the number of pairs of difference $\Delta X$ that give the output difference $\Delta Y$ for a mapping $\pi$ is called the XOR table distribution of $\pi$. A characteristic $\Delta X, \Delta Y$ is said to be impossible for $\pi$ if its corresponding XOR table entry is zero. Also a characteristic is said to be nonzero if $w(\Delta X) > 0$, with $w$ denoting Hamming weight.

Assume that we wish to determine the subkey $K_R$ that is being used in round $R$. The method of differential cryptanalysis proceeds as follows:
Step 1 Find a highly probable $r$-round characteristic $\Omega_r(\Delta X, \Delta Y_1, \Delta Y_2, \ldots, \Delta Y_r)$ which gives (partial) information about the input and output differences of the round mapping $F$ at round $R$;

Step 2 Uniformly select a plaintext pair $X, X'$ with difference $\Delta X$ and encrypt this pair, assuming that $X, X'$ is a right pair. Determine candidate subkeys $K'_1, K'_2, \ldots, K'_d$ such that each $K'_i$ could have caused the observed output difference. Increment a counter for each candidate subkey $K'_i$;

Step 3 Repeat Step 2 until one subkey $K'_R$ is distinguished as being counted significantly more often than other subkeys. Take $K'_R$ to be the actual subkey.
If $X, X'$ is a right pair then one of the candidate subkeys $K'_1, K'_2, \ldots, K'_d$ is the actual subkey $K_R$, and $K_R$ will be counted for each right pair. On the other hand, if $X, X'$ is a wrong pair then we assume that the candidate keys are distributed uniformly over the set of possible subkeys for the round, and $K_R$ will be counted with small probability. Similarly, we assume that any key other than the actual subkey will also be counted infrequently. It is then natural to define the complexity of a differential cryptanalysis to be the number of encrypted plaintext pairs of a specified difference required to determine the key or subkey. From experiments on restricted versions of DES, Biham and Shamir [3] found that the complexity of the attack was approximately $c/p^{\Omega}$, where $p^{\Omega}$ is the probability of the characteristic being used, and $c$ is a constant bounded as $2 < c < 8$.

A variant of the attack is to perform Step 2 using only a subset of the subkey bits which could be counted for that round. For example, in DES each subkey is 48 bits in length, which could possibly require $2^{48}$ counters to record the individual frequencies of the candidate subkeys. It is then more practical to count on fewer key bits, and for DES it is natural to count on $6k$ key bits, representing the subkey bits entering $k$ S-boxes. Observe that those S-boxes $S_j$ whose subkeys are not being counted may still be used to discard, before counting, those plaintext pairs $X, X'$ which yield an impossible characteristic for $S_j$ when $X, X'$ is assumed to be a right pair. That is, if for S-box $S_j$ the observed input/output difference is $\Delta X_j, \Delta Y_j$, and $\Delta X_j, \Delta Y_j$ is an impossible characteristic in the XOR table for $S_j$, then the pair encrypted to produce the differences $\Delta X_j, \Delta Y_j$ cannot be a right pair. By filtering plaintext pairs in this way, the ratio of right to wrong pairs that will be counted is enlarged, and the actual subkey will be distinguished more directly. Thus the density of impossible characteristics in an S-box is important in determining the effectiveness of this filtering process.

1.1 Results
Observe that an $r$-round characteristic is simply the concatenation of $r$ 1-round, or single round, characteristics defined on the round mapping $F$. It then follows that the probability of the $r$-round characteristic $\Omega_r$ can be bounded as $p^{\Omega_r} \le (p^{\Omega})^r$ where $p^{\Omega}$ is the probability of the most likely nonzero single round characteristic. At present there are no general bounds known for $p^{\Omega}$. The main result of this paper is to bound $p^{\Omega}$ when the round mapping $F$ is derived from a set of bijective mappings. These results will lead to bounds on the probability of characteristics in a large class of product ciphers.

Let $\pi : Z_2^m \to Z_2^m$ be a bijective mapping, which will be referred to as an $m$-bit permutation. The set of all $m$-bit permutations is known as the symmetric group on $2^m$ objects and is denoted as $S_{2^m}$. Let $\Lambda_\pi(\Delta X, \Delta Y)$ be the value of the XOR table entry of the pair $\Delta X, \Delta Y \in Z_2^m$ for the permutation $\pi \in S_{2^m}$. We will consider $\Lambda_\pi(\Delta X, \Delta Y)$ as a random variable $\Lambda_\pi(\Delta X, \Delta Y) : S_{2^m} \to \{0, 2, \ldots, 2^m\}$, assuming the uniform distribution on the set $S_{2^m}$. We prove (Theorem 2.1) that
\[
\Pr(\Lambda_\pi(\Delta X, \Delta Y) = 0) = \frac{1}{2^m!} \sum_{k=0}^{2^{m-1}} (-1)^k \binom{2^{m-1}}{k}^{2} 2^k\, k!\, (2^m - 2k)! \tag{1}
\]
from which we are able to determine the exact distribution of $\Lambda_\pi(\Delta X, \Delta Y)$ (see Corollary 2.1). Note that eq. (1) gives the distribution of a single entry in the XOR table, but we are interested in global properties of the table such as the largest entry and the fraction of the table that is zero. Fortunately, we are able to manipulate the distribution of eq. (1) to yield bounds on such global properties of the XOR table.

In §3 we prove that for large $m$, the expected probability of the most likely nonzero characteristic for an $m$-bit permutation is at most $m/2^{m-1}$ when the uniform distribution on $S_{2^m}$ is assumed. Equivalently, the expected maximum entry in the XOR table for nonzero characteristics is at most $2m$ for large $m$. Experiments indicate that this bound is significant for $m \ge 8$ (see Table 1 in §3). Theorem 3.1 indicates that the individual entries of an XOR table are expected to be distributed in the interval $[0, 2, \ldots, 2m]$. At this time we are not able to determine the exact distribution of the entries within this interval, but we are able to show that most XOR table entries are in fact zero. We prove (Corollary 2.2) that the expected fraction of the XOR table for nonzero characteristics that is zero approaches $e^{-1/2} = 0.60653$. Put another way, approximately 60% of the entries for nonzero characteristics will be zero for a permutation selected uniformly. It follows that impossible characteristics can then be used to discard a high percentage of wrong pairs $X, X'$ which give no probabilistic information about the actual key.

The sections of this paper are arranged as follows. In §1.2 some relevant notation is defined. In §2 we present the Pairing Theorem, which is the counting result that will be used to prove the major results concerning the distribution of characteristics in XOR tables. Later in §2 we determine the expected fraction of the XOR table that is zero, and use these calculations in §3 to prove our results concerning the largest entry of the XOR table. In sections 3.1 and 3.2 we use our previous results to bound the probability of characteristics in two common product ciphers.

1.2 Notation
Throughout the paper we will let $[\cdot]$ denote a boolean predicate which evaluates to 0 or 1. For example, the sum $\sum_{i=1}^{n} [i \text{ is prime}]$ computes $\pi(n)$, the number of primes less than or equal to $n$, while $\phi(n) = \sum_{i=1}^{n} [\gcd(i, n) = 1]$. This notation should not be confused with $E[\cdot]$, which is the expected value of a random variable.

We will now formalize some of the notation given in the introduction. For a given $\pi \in S_{2^m}$, define $\Lambda_\pi(\Delta X, \Delta Y)$ as
\[
\Lambda_\pi(\Delta X, \Delta Y) = \sum_{\substack{X, X' \in Z_2^m \\ \Delta X = X + X'}} [\pi(X) + \pi(X') = \Delta Y]. \tag{2}
\]
Then $2^{-m} \Lambda_\pi(\Delta X, \Delta Y)$ is a random variable giving the probability that the difference in the output of the mapping $\pi$ is $\Delta Y$ when the difference of the input pair $X, X'$ is $\Delta X$. For all $\pi \in S_{2^m}$, observe that when $\Delta X = 0$ or $\Delta Y = 0$ it follows that $\Lambda_\pi(\Delta X, \Delta Y) = 0$, unless $\Delta X = \Delta Y = 0$ whereupon $\Lambda_\pi(\Delta X, \Delta Y) = 2^m$. The distribution of $\Lambda_\pi(\Delta X, \Delta Y)$ taken over all possible $\Delta X, \Delta Y \in Z_2^m$ is known as the pairs XOR distribution table for $\pi$, or simply the XOR table for $\pi$. Unless otherwise stated, when we speak of a characteristic $\Delta X, \Delta Y$ for a product cipher we will refer to a nonzero single round characteristic.
Example 1.1 For an $m$-bit permutation $\pi$, let $\mathrm{XOR}_\pi$ be the $2^m \times 2^m$ matrix where $\mathrm{XOR}_\pi(i, j) = \Lambda_\pi(i, j)$, $0 \le i, j \le 2^m - 1$, where $i, j$ are treated as 3-bit binary vectors (that is, $m = 3$). Observe that $\mathrm{XOR}_\pi(0, 0) = 8$, and all other entries in the first row or column of $\mathrm{XOR}_\pi$ are zero. For $\pi = (7, 2, 4, 1, 5, 6, 3, 0)$ the corresponding XOR table is given as:
\[
\mathrm{XOR}_\pi =
\begin{bmatrix}
8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 4 & 0 & 4 & 0 & 0 \\
0 & 0 & 0 & 4 & 0 & 0 & 4 & 0 \\
0 & 0 & 0 & 0 & 0 & 4 & 4 & 0 \\
0 & 2 & 2 & 0 & 2 & 0 & 0 & 2 \\
0 & 2 & 2 & 0 & 2 & 0 & 0 & 2 \\
0 & 2 & 2 & 0 & 2 & 0 & 0 & 2 \\
0 & 2 & 2 & 0 & 2 & 0 & 0 & 2
\end{bmatrix}. \tag{3}
\]
Notice that if each entry in the XOR table is divided by $2^m$ then the resulting matrix will be doubly stochastic. □

The XOR table for an $m$-bit permutation has the following general form:
\[
\mathrm{XOR}_\pi =
\begin{bmatrix}
2^m & 0 & 0 & \cdots & 0 \\
0 & a_{1,1} & a_{1,2} & \cdots & a_{1,2^m-1} \\
0 & a_{2,1} & a_{2,2} & \cdots & a_{2,2^m-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & a_{2^m-1,1} & a_{2^m-1,2} & \cdots & a_{2^m-1,2^m-1}
\end{bmatrix}
=
\begin{bmatrix}
2^m & 0 \\
0 & A
\end{bmatrix}. \tag{4}
\]
We are interested in the properties of the $(2^m - 1) \times (2^m - 1)$ submatrix $A = [a_{i,j}]$, $1 \le i, j \le 2^m - 1$, which corresponds to that portion of the XOR table entries attributed to nonzero characteristics. In this paper we will show that for large $m$, approximately 60% of the entries in $A$ are zero and the largest entry in $A$ is expected to be bounded by $2m$.

It is evident that all the entries of an XOR table are even, since the summation in eq. (2) is taken over all ordered pairs. However, for clarity in the counting to follow we will consider the pairs to be unordered. To this end, define $\Lambda'_\pi(\Delta X, \Delta Y)$ as
\[
\Lambda'_\pi(\Delta X, \Delta Y) = \Lambda_\pi(\Delta X, \Delta Y)/2. \tag{5}
\]
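The table of Example 1.1 can be recomputed directly from the definition in eq. (2). The following is a minimal sketch, not part of the original paper: the function name `xor_table` is illustrative, and elements of $Z_2^m$ are assumed to be encoded as Python integers so that the difference operator `+` becomes bitwise XOR (`^`).

```python
def xor_table(pi):
    """Return the XOR table of the permutation pi as a list of rows.

    Entry [dx][dy] counts the ordered pairs (x, x') with x ^ x' == dx
    and pi[x] ^ pi[x'] == dy, following eq. (2).
    """
    n = len(pi)
    table = [[0] * n for _ in range(n)]
    for dx in range(n):
        for x in range(n):
            dy = pi[x] ^ pi[x ^ dx]
            table[dx][dy] += 1
    return table


pi = (7, 2, 4, 1, 5, 6, 3, 0)  # the 3-bit permutation of Example 1.1
table = xor_table(pi)
for row in table:
    print(" ".join(str(v) for v in row))
```

Every row and every column of the result sums to $2^m = 8$, which is the doubly stochastic property noted in the example, and every entry is even because each unordered pair is counted twice.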
2 The Pairing Theorem

Observe that a characteristic $\Delta X, \Delta Y$ corresponds to a pairing of the inputs and outputs of a permutation (namely the pairs $X, X'$ and $Y, Y'$ where $\Delta X = X + X'$ and $\Delta Y = Y + Y'$). For $\pi : A \to B$, let $\Gamma_A$ and $\Gamma_B$ be pairings on the sets $A$ and $B$, respectively. Theorem 2.1 determines the number of functions $\pi$ which take no pair of $\Gamma_A$ to a pair in $\Gamma_B$, and will be referred to as the Pairing Theorem. All our results concerning the distribution of characteristics in bijective mappings are derived from the Pairing Theorem.
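Theorem 2.1 (Pairing Theorem) Let $A = \{a_1, a_2, \ldots, a_{2d}\}$ and $B = \{b_1, b_2, \ldots, b_{2d}\}$ be sets of distinct elements. Let $\Gamma_A \subseteq A \times A$ and $\Gamma_B \subseteq B \times B$ be sets of unordered pairs, such that $a_i$ ($b_i$) occurs in one pair of $\Gamma_A$ ($\Gamma_B$) for $1 \le i \le 2d$. Then the number $\Phi(d)$ of bijective functions $\pi : A \to B$ such that for all $(a_i, a_j) \in \Gamma_A$, $(\pi(a_i), \pi(a_j)) \notin \Gamma_B$ is
\[
\Phi(d) = \sum_{k=0}^{d} (-1)^k \binom{d}{k}^{2} 2^k\, k!\, (2d - 2k)!. \tag{6}
\]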
Proof. Order the elements of $\Gamma_B$ as $(b'_i, b'_{d+i})$, $1 \le i \le d$. For $1 \le i \le d$ define $P(i)$ as
\[
P(i) = \{\, \pi \mid (\pi(a), \pi(a')) = (b'_i, b'_{d+i}),\ (a, a') \in \Gamma_A \,\}
\]
which is the set of functions that map some pair of $\Gamma_A$ to the pair $(b'_i, b'_{d+i}) \in \Gamma_B$. It follows that
\[
\Phi(d) = (2d)! - \Bigl|\bigcup_{1 \le j \le d} P(j)\Bigr| = (2d)! + \sum_{\substack{S \subseteq \{1, 2, \ldots, d\} \\ S \ne \emptyset}} (-1)^{|S|} \Bigl|\bigcap_{j \in S} P(j)\Bigr| \tag{7}
\]
using the inclusion-exclusion principle [13]. For $1 \le k \le d$ define the integers
\[
P(i'_1, i'_2, \ldots, i'_k) = \Bigl|\bigcap_{1 \le j \le k} P(i'_j)\Bigr| \tag{8}
\]
and by symmetry $P(1, 2, \ldots, k) = P(i'_1, i'_2, \ldots, i'_k) \stackrel{\mathrm{def}}{=} P(d, k)$. From eq. (7) it then follows that
\[
\Phi(d) = (2d)! + \sum_{k=1}^{d} (-1)^k \binom{d}{k} P(d, k). \tag{9}
\]
It remains to determine $P(d, k)$ for $1 \le k \le d$. To this end, order the pairs within $\Gamma_A$ as $(a'_i, a'_{d+i})$ for $1 \le i \le d$. Then $P(d, k)$ is the number of functions $\pi$ for which there are $k$ pairs $(a''_i, a''_{d+i})$ such that $\{\pi(a''_i), \pi(a''_{d+i})\} = \{b'_i, b'_{d+i}\}$, $1 \le i \le k$. There are $\binom{d}{k}$ ways to choose the $k$ pairs $(a''_i, a''_{d+i})$ from $\Gamma_A$, $k!$ ways of assigning the $(a''_i, a''_{d+i})$ to the $(b'_i, b'_{d+i})$, and $2^k$ ways of assigning $(a''_i, a''_{d+i})$ to a particular pair in $\Gamma_B$. It then follows that
\[
P(d, k) = \binom{d}{k} 2^k\, k!\, (2d - 2k)! \tag{10}
\]
where $(2d - 2k)!$ is the number of ways to assign the elements in $A - \{a''_i, a''_{d+i} \mid 1 \le i \le k\}$. We then have that
\[
\Phi(d) = (2d)! + \sum_{k=1}^{d} (-1)^k \binom{d}{k} P(d, k) = \sum_{k=0}^{d} (-1)^k \binom{d}{k}^{2} 2^k\, k!\, (2d - 2k)!
\]
which completes the proof of the theorem. □
Observe that $\Phi(2^{m-1})$ will give the number of permutations for which the entry in the XOR table for $\Delta X, \Delta Y$ is zero; also, since $\pi$ is bijective we are able to determine the probability that $k$ pairs of difference $\Delta X$ lead to difference $\Delta Y$. We have defined differences using the `+' operator but we observe that the Pairing Theorem will apply to any notion of difference that pairs input and output differences uniquely. Using the Pairing Theorem we will now derive the distribution of the random variable $\Lambda'_\pi(\Delta X, \Delta Y)$ as defined in eq. (5).
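For small $d$ the count in eq. (6) can be compared against exhaustive enumeration. The sketch below is an illustration added here and not the author's code; it assumes the pairings $\Gamma_A$ and $\Gamma_B$ are both taken to be $\{0,1\}, \{2,3\}, \ldots$ on the integers $0, \ldots, 2d-1$, which loses no generality because only the pairing structure matters. The function names are hypothetical.

```python
from itertools import permutations
from math import comb, factorial


def phi_formula(d):
    """Phi(d) from eq. (6), evaluated with exact integer arithmetic."""
    return sum(
        (-1) ** k * comb(d, k) ** 2 * 2 ** k * factorial(k) * factorial(2 * d - 2 * k)
        for k in range(d + 1)
    )


def phi_brute_force(d):
    """Count bijections on {0, ..., 2d-1} that send no pair {2i, 2i+1}
    onto a pair {2j, 2j+1}."""
    count = 0
    for sigma in permutations(range(2 * d)):
        # Two distinct images form a pair of Gamma_B exactly when
        # they agree after integer division by 2.
        if all(sigma[2 * i] // 2 != sigma[2 * i + 1] // 2 for i in range(d)):
            count += 1
    return count


for d in (1, 2, 3):
    print(d, phi_formula(d), phi_brute_force(d))
```

The enumeration is only feasible for very small $d$, since it visits all $(2d)!$ bijections; the formula is what makes the larger cases tractable.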
Corollary 2.1 For any fixed nonzero $\Delta X, \Delta Y \in Z_2^m$, assuming $\pi$ is chosen uniformly from the set $S_{2^m}$,
\[
E[\Lambda'_\pi(\Delta X, \Delta Y)] = \sum_{k=0}^{2^{m-1}} k \cdot \frac{\binom{2^{m-1}}{k}^{2} k!\, 2^k\, \Phi(2^{m-1} - k)}{2^m!}. \tag{11}
\]

Proof. From the Pairing Theorem the number of $\pi \in S_{2^m}$ for which $\Lambda_\pi(\Delta X, \Delta Y) = 0$ is $\Phi(2^{m-1})$. By definition $P(d, k)$ is the number of mappings which take any $k$ pairs from $\Gamma_A$ to the fixed $k$ pairs $(b'_i, b'_{i+d}) \in \Gamma_B$ where $1 \le i \le k$. Then from eq. (10) it follows that
\[
\frac{P(d, k) \cdot \Phi(d - k)}{(2d - 2k)!} \tag{12}
\]
is the number of mappings which take exactly $k$ pairs from $\Gamma_A$ to the $(b'_i, b'_{i+d})$. Then the number of $m$-bit permutations for which $k$ pairs of difference $\Delta X$ can be mapped into $k$ fixed pairs of difference $\Delta Y$ is
\[
\sum_{\substack{S \subseteq [2^{m-1}] \\ |S| = k}} \frac{P(2^{m-1}, k) \cdot \Phi(2^{m-1} - k)}{(2^m - 2k)!} = \binom{2^{m-1}}{k} \frac{P(2^{m-1}, k) \cdot \Phi(2^{m-1} - k)}{(2^m - 2k)!}.
\]
It follows from eq. (10) that
\[
\bigl|\{\pi : \Lambda'_\pi(\Delta X, \Delta Y) = k\}\bigr| = \binom{2^{m-1}}{k} \frac{P(2^{m-1}, k) \cdot \Phi(2^{m-1} - k)}{(2^m - 2k)!} = \binom{2^{m-1}}{k}^{2} k!\, 2^k\, \Phi(2^{m-1} - k) = 2^m! \cdot \Pr(\Lambda'_\pi(\Delta X, \Delta Y) = k). \tag{13}
\]
The theorem follows from the definition of expectation. □
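The distribution in eq. (13) can be evaluated exactly for small $m$. The following sketch is an added illustration under stated assumptions: `phi` re-implements eq. (6), `half_entry_distribution` is a hypothetical helper name, and exact rationals from the standard `fractions` module are used to avoid rounding. The comparison value $2^{m-1}/(2^m - 1)$ is not stated in the paper; it follows from the observation that the $2^{m-1}$ unordered pairs of a fixed nonzero $\Delta X$ are spread over $2^m - 1$ nonzero output differences, which are interchangeable by symmetry.

```python
from fractions import Fraction
from math import comb, factorial


def phi(d):
    """Phi(d) from eq. (6)."""
    return sum(
        (-1) ** k * comb(d, k) ** 2 * 2 ** k * factorial(k) * factorial(2 * d - 2 * k)
        for k in range(d + 1)
    )


def half_entry_distribution(m):
    """Exact Pr(Lambda' = k) for k = 0, ..., 2^(m-1), following eq. (13)."""
    d = 2 ** (m - 1)
    total = factorial(2 * d)  # |S_{2^m}| = 2^m!
    return [
        Fraction(comb(d, k) ** 2 * factorial(k) * 2 ** k * phi(d - k), total)
        for k in range(d + 1)
    ]


for m in range(2, 6):
    dist = half_entry_distribution(m)
    mean = sum(k * p for k, p in enumerate(dist))
    print(m, float(dist[0]), mean)
```

For $m = 3$ this gives $\Pr(\Lambda' = 0) = 4/7$ and $E[\Lambda'] = 4/7$, and the mean moves towards $1/2$ as $m$ grows.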
Now consider any row of the XOR table corresponding to an input difference $\Delta X$. Since there are $2^m$ pairs of difference $\Delta X$, each of the $2^m$ column entries in the row is expected to be 1. We will prove this formally in the next theorem by determining that $E[\Lambda'_\pi(\Delta X, \Delta Y)]$ tends to the constant $\frac{1}{2}$. The proof will also contain the information required to show that approximately 60% of the XOR table is expected to be zero.

Remark. The proof of the next theorem is based on approximating a summation $\sum_k T_k$ by its first term as $\sum_k T_1 \prod_{1 \le j <}$ […] $> 0$ and sufficiently large $m$. Then by definition
\[
\lim_{m \to \infty} \frac{\Phi(2^{m-1})}{T(m, 0)} = e^{-1/2}
\]
and
\[
\Phi(2^{m-1}) \sim e^{-1/2}\, T(m, 0) \sim e^{-1/2}\, 2^m!. \tag{18}
\]
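A similar method can be used to determine an asymptotic estimate of $E[\Lambda'_\pi(\Delta X, \Delta Y)]$. To this end, define $2^{m-1}$ terms $\hat{T}(m, k)$ for $1 \le k \le 2^{m-1}$ as
\[
\hat{T}(m, k) = k \binom{2^{m-1}}{k}^{2} k!\, 2^k\, \Phi(2^{m-1} - k) \tag{19}
\]
and $E[\Lambda'_\pi(\Delta X, \Delta Y)] = \sum_{1 \le k \le 2^{m-1}} \hat{T}(m, k)/2^m!$. Then for large $m$ and $1 \le k \le 2^{m-1} - 1$ the ratio of successive terms is
\[
\frac{\hat{T}(m, k+1)}{\hat{T}(m, k)} = \left[ 2k \left( 1 - \frac{1}{2^m - 2k} \right) \right]^{-1}. \tag{20}
\]
As in the first part of the proof, it can be shown that $\hat{T}(m, 1)$ is the dominant term amongst the $\hat{T}(m, k)$. For large $m$ it then follows that
\[
\left| \frac{2^m! \cdot E[\Lambda'_\pi(\Delta X, \Delta Y)]}{\hat{T}(m, 1)} - e^{1/2} \right| < O\!\left(\frac{1}{2^m}\right) + o(1) < \epsilon
\]
for any $\epsilon > 0$ and sufficiently large $m$. Then by definition
\[
\lim_{m \to \infty} \frac{2^m! \cdot E[\Lambda'_\pi(\Delta X, \Delta Y)]}{\hat{T}(m, 1)} = e^{1/2}
\qquad \text{and} \qquad
E[\Lambda'_\pi(\Delta X, \Delta Y)] \sim \frac{e^{1/2}\, \hat{T}(m, 1)}{2^m!}.
\]
The theorem now follows since
\[
\hat{T}(m, 1) = 2\,(2^{m-1})^2\, \Phi(2^{m-1} - 1) \sim 2 \cdot 2^{2m-2}\, e^{-1/2}\, (2^m - 2)! = 2^{2m-1}\, e^{-1/2}\, (2^m - 2)!
\]
and
\[
\lim_{m \to \infty} \frac{e^{1/2}\, \hat{T}(m, 1)}{2^m!} = \lim_{m \to \infty} \frac{2^{2m-1}\, (2^m - 2)!}{2^m!} = \lim_{m \to \infty} \frac{2^{2m-1}}{2^{2m} - 2^m} = \frac{1}{2}.
\]
□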
As noted in the introduction, the presence of impossible characteristics assists in discarding certain plaintext pairs which cannot give any probabilistic information concerning the actual key. It has been observed that 20-30% of the characteristics in the S-boxes of DES are impossible. Let $\rho_{m,0}$ be the expected number of nonzero characteristics $\Delta X, \Delta Y$ which have zero entries in the XOR table of a uniformly selected $m$-bit permutation. We are able to compute $\rho_{m,0}$ as a direct application of the Pairing Theorem and the previous theorem.
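Corollary 2.2 For any fixed nonzero $\Delta X, \Delta Y \in Z_2^m$ and assuming $\pi$ is chosen uniformly from the set $S_{2^m}$,
\[
\lim_{m \to \infty} \frac{\rho_{m,0}}{(2^m - 1)^2} = \frac{1}{e^{1/2}}. \tag{21}
\]

Proof. By the definition of $\rho_{m,0}$ and eq. (18) it follows that
\[
\begin{aligned}
\rho_{m,0} &= \frac{1}{2^m!} \sum_{\substack{\Delta X, \Delta Y \in Z_2^m \\ w(\Delta X), w(\Delta Y) > 0}} \ \sum_{\pi \in S_{2^m}} [\Lambda_\pi(\Delta X, \Delta Y) = 0] \\
&= \frac{1}{2^m!} \sum_{\substack{\Delta X, \Delta Y \in Z_2^m \\ w(\Delta X), w(\Delta Y) > 0}} \Phi(2^{m-1}) \\
&= \frac{(2^m - 1)^2}{2^m!}\, \Phi(2^{m-1}) \sim \frac{(2^m - 1)^2}{e^{1/2}}.
\end{aligned} \tag{22}
\]
This completes the proof of the theorem. □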
It now follows that approximately 60% of the entries of the $A$ submatrix defined in eq. (4) are zero since $e^{-1/2} = 0.6065$. Then from eq. (4), the total fraction of an XOR table that is expected to be zero is
\[
\frac{(2^m - 1)^2}{e^{1/2}\, 2^{2m}} + \frac{2^{m+1} - 2}{2^{2m}}.
\]
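The speed at which the zero fraction approaches $e^{-1/2}$ can be checked numerically from the Pairing Theorem. This is a hedged sketch rather than anything taken from the paper: `phi` re-implements eq. (6), while `zero_probability` and `total_zero_fraction` are hypothetical helper names. The second function uses the exact finite-$m$ probability in place of the limiting value $e^{-1/2}$ in the expression above.

```python
from fractions import Fraction
from math import comb, exp, factorial


def phi(d):
    """Phi(d) from eq. (6)."""
    return sum(
        (-1) ** k * comb(d, k) ** 2 * 2 ** k * factorial(k) * factorial(2 * d - 2 * k)
        for k in range(d + 1)
    )


def zero_probability(m):
    """Exact probability that a fixed nonzero characteristic is impossible."""
    d = 2 ** (m - 1)
    return Fraction(phi(d), factorial(2 * d))


def total_zero_fraction(m):
    """Expected fraction of zeros in the whole table, including row and column 0."""
    n = 2 ** m
    return ((n - 1) ** 2 * zero_probability(m) + 2 * n - 2) / n ** 2


for m in range(2, 9):
    print(m, float(zero_probability(m)), float(total_zero_fraction(m)))
print("e^(-1/2) =", exp(-0.5))
```

Under these assumptions the exact probability is $4/7 \approx 0.571$ for $m = 3$ and lies within about 0.002 of $e^{-1/2}$ by $m = 8$.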
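3 The largest entry in the XOR table

For a random $m$-bit permutation $\pi$ let $\lambda_m$ be defined as follows
\[
\lambda_m \stackrel{\mathrm{def}}{=} \max_{\substack{\Delta X, \Delta Y \in Z_2^m \\ w(\Delta X), w(\Delta Y) > 0}} \Lambda'_\pi(\Delta X, \Delta Y). \tag{23}
\]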
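Thus twice $\lambda_m$ is the size of the largest XOR entry for the mapping $\pi$, and will bound the probability of the most likely difference passing through $\pi$. Using the Pairing Theorem we are able to bound the expected value of $\lambda_m$.

Theorem 3.1 Assuming the uniform distribution on the set $S_{2^m}$,
\[
\lim_{m \to \infty} \frac{E[\lambda_m]}{m} \le 1. \tag{24}
\]

Proof. For $1 \le k \le 2^{m-1}$, let $\rho'_{m,k}$ be the expected number of nonzero characteristics $\Delta X, \Delta Y$ for which $\Lambda'_\pi(\Delta X, \Delta Y) = k$. Further let $\Pr(\Lambda' = k)$ be the probability that an $m$-bit permutation has a nonzero characteristic $\Delta X, \Delta Y$ for which $\Lambda'_\pi(\Delta X, \Delta Y) = k$. The proof rests on the following inequalities
\[
\Pr(\lambda_m = k) < \Pr(\Lambda' = k) \le \rho'_{m,k}.
\]
The left inequality follows from definitions and the right inequality follows from expanding the expectation:
\[
\rho'_{m,k} = \sum_{i=0}^{2^{m-1}} i \cdot \Pr(i \text{ characteristics of size } k) > \Pr(1 \text{ characteristic of size } k).
\]
We will prove that $\sum_{k > m} k \cdot \rho'_{m,k} = o(1)$ from which the theorem will follow. For $0 \le k \le 2^{m-1}$ we have by definition that
\[
\rho'_{m,k} = \frac{1}{2^m!} \sum_{\pi \in S_{2^m}} \ \sum_{\substack{\Delta X, \Delta Y \in Z_2^m \\ w(\Delta X), w(\Delta Y) > 0}} [\Lambda'_\pi(\Delta X, \Delta Y) = k].
\]
Then using the Pairing Theorem it follows that
\[
\begin{aligned}
\rho'_{m,k} &= \frac{1}{2^m!} \sum_{\substack{\Delta X, \Delta Y \in Z_2^m \\ w(\Delta X), w(\Delta Y) > 0}} \ \sum_{\pi \in S_{2^m}} [\Lambda'_\pi(\Delta X, \Delta Y) = k] \\
&= \frac{1}{2^m!} \sum_{\substack{\Delta X, \Delta Y \in Z_2^m \\ w(\Delta X), w(\Delta Y) > 0}} \binom{2^{m-1}}{k} \frac{P(2^{m-1}, k) \cdot \Phi(2^{m-1} - k)}{(2^m - 2k)!} \\
&= \frac{1}{2^m!} \sum_{\substack{\Delta X, \Delta Y \in Z_2^m \\ w(\Delta X), w(\Delta Y) > 0}} \binom{2^{m-1}}{k}^{2} 2^k\, k!\, \Phi(2^{m-1} - k) \\
&= \frac{(2^m - 1)^2}{2^m!} \binom{2^{m-1}}{k}^{2} 2^k\, k!\, \Phi(2^{m-1} - k).
\end{aligned} \tag{25}
\]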
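If the sequence in eq. (25) is plotted for increasing $k$, then when $k > m$ the terms become negligible in size. Consider obtaining asymptotic estimates of eq. (25) for $k = m + 1$, which will be achieved in two steps. Using eq. (18) from the proof of Theorem 2.2 and asymptotic estimates of the factorial function [14, p.221], the three largest terms of eq. (25) can be estimated as
\[
\begin{aligned}
\binom{2^{m-1}}{m+1}^{2} \frac{\Phi(2^{m-1} - (m+1))}{2^m!}
&\sim \binom{2^{m-1}}{m+1}^{2} \frac{(2^m - 2m - 2)!}{e^{1/2}\, 2^m!} \\
&= O(1) \cdot \frac{(2^m - 2m - 2)^{1/2}\, 2^{m - \frac{m}{2}}\, e^{2m + \frac{3}{2}}}{(2^{m-1} - m - 1)\, (m+1)^{2m+3}} \\
&= O(1) \cdot \frac{2^{m - \frac{m}{2}}\, e^{2m + \frac{3}{2}}}{(2^{m-1} - m - 1)^{1/2}\, (m+1)^{2m+3}}.
\end{aligned}
\]
Then estimating the remaining terms of eq. (25) in a similar way, which are $2^{m+1}$, $(2^m - 1)^2$ and $(m+1)!$, we have that
\[
\rho'_{m,m+1} = O(1) \cdot \frac{2^{4m - \frac{m}{2} - 1}\, e^{m + \frac{1}{2}}}{(2^{m-1} - m - 1)^{1/2}\, (m+1)^{m+2}} \tag{26}
\]
from which it follows that $\lim_{m \to \infty} (m+1) \cdot \rho'_{m,m+1} = 0$. Observe that for large $m$, $\rho'_{m,k}$ defined in eq. (25) and $T(m, k)$ as defined in eq. (15) only differ by the multiplicative factor
\[
\frac{(-1)^k (2^m - 1)^2}{2^m!\, e^{1/2}}.
\]
It then follows from eq. (16) that for large $m$, and $1 \le k \le 2^{m-1} - 1$,
\[
\frac{\rho'_{m,k+1}}{\rho'_{m,k}} = \left[ 2(k+1) \left( 1 - \frac{1}{2^m - 2k} \right) \right]^{-1}. \tag{27}
\]
We then have
\[
\begin{aligned}
\lim_{m \to \infty} \sum_{k=m+1}^{2^{m-1}} k \cdot \rho'_{m,k}
&\le \lim_{m \to \infty} \sum_{k=m+1}^{2^{m-1}} k \cdot \rho'_{m,m+1} \\
&= \lim_{m \to \infty} \rho'_{m,m+1} \sum_{k=m+1}^{2^{m-1}} k \\
&= \lim_{m \to \infty} \rho'_{m,m+1} \cdot O(2^{2m}) \\
&= \lim_{m \to \infty} O(1) \cdot \frac{2^{6m - \frac{m}{2} - 1}\, e^{m + \frac{1}{2}}}{(2^{m-1} - m - 1)^{1/2}\, (m+1)^{m+2}} \\
&= 0
\end{aligned}
\]
and thus $\sum_{k=m+1}^{2^{m-1}} k \cdot \rho'_{m,k} = o(1)$. Finally observe that for large $m$
\[
\begin{aligned}
E[\lambda_m] &= \sum_{k=1}^{m} k \cdot \Pr(\lambda_m = k) + \sum_{k=m+1}^{2^{m-1}} k \cdot \Pr(\lambda_m = k) \\
&= \sum_{k=1}^{m} k \cdot \Pr(\lambda_m = k) + \sum_{k=m+1}^{2^{m-1}} O(k \cdot \rho'_{m,k}) \\
&\le m \left[ 1 - O\Bigl( \sum_{k > m} \rho'_{m,k} \Bigr) \right] + o(1) \\
&= m\,(1 + o(1)) + o(1).
\end{aligned}
\]
This completes the proof of the theorem. □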
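The quantity $\lambda_m$ of eq. (23) can also be sampled directly for small $m$. The sketch below is one possible way to run such an experiment and is not the procedure used in the paper, which does not describe its sampling method; the helper names, the use of `random.shuffle` to draw permutations, and the fixed seed are all assumptions made for illustration.

```python
import random


def max_half_entry(pi):
    """Half of the largest XOR table entry over nonzero input differences."""
    n = len(pi)
    best = 0
    for dx in range(1, n):
        counts = [0] * n
        for x in range(n):
            counts[pi[x] ^ pi[x ^ dx]] += 1
        best = max(best, max(counts))
    return best // 2


def sample_lambda(m, trials, seed=0):
    """Return max_half_entry for `trials` pseudo-randomly shuffled m-bit permutations."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        pi = list(range(2 ** m))
        rng.shuffle(pi)
        values.append(max_half_entry(pi))
    return values


values = sample_lambda(4, 200)
print(min(values), sum(values) / len(values), max(values))
```

Because $\pi$ is bijective, a nonzero input difference can never produce a zero output difference, so restricting the loop to nonzero `dx` already restricts the maximum to nonzero characteristics. The running time grows as $2^{2m}$ per permutation, so larger $m$ or larger samples need a faster implementation.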
Let $\hat{E}[\lambda_m]$ be an empirical estimate of $E[\lambda_m]$ based on a sample of 10,000 random permutations. Further, let $\lambda_{\min}$ ($\lambda_{\max}$) be the smallest (largest) maximum XOR entry found across the 10,000 permutations. Table 1 lists these quantities for $m = 4, 5, \ldots, 10$. We see that for $m \ge 6$, $(m+1) \cdot \rho'_{m,m+1}$ is a good approximation to the tail of the summation for $E[\lambda_m]$ beginning at $k = m + 1$. The reader is reminded that the results of Table 1 have been derived using $\Lambda'_\pi(\Delta X, \Delta Y)$ where $\Lambda'_\pi(\Delta X, \Delta Y) = \Lambda_\pi(\Delta X, \Delta Y)/2$.

 m    $(m+1)\rho'_{m,m+1}$     $\sum_{k=m+1}^{2^{m-1}} k\,\rho'_{m,k}$     $\hat{E}[\lambda_m]$    $\lambda_{\min}$    $\lambda_{\max}$
 4    .76863                   .87258                                      3.114                   2                   6
 5    .25973                   .28436                                      3.839                   3                   6
 6    .80244 × 10^{-1}         .86489 × 10^{-1}                            4.495                   3                   7
 7    .22027 × 10^{-1}         .23498 × 10^{-1}                            5.126                   4                   8
 8    .53856 × 10^{-2}         .57019 × 10^{-2}                            5.606                   5                   8
 9    .11818 × 10^{-2}         .12438 × 10^{-2}                            6.190                   6                   8
10    .23470 × 10^{-3}         .24584 × 10^{-3}                            6.700                   6                   9

Table 1: The distribution of characteristics.

Recall that $p^{\Omega}$ was defined in the introduction as the probability of the most likely single round characteristic for an iterated mapping. The main use of Theorem 3.1 is its application in constructing classes of product ciphers for which $p^{\Omega}$ is bounded. If a product cipher uses $s$ S-boxes, and each S-box has $\lambda_m \le m$, then $p^{\Omega} \le \frac{2m}{2^m} = \frac{m}{2^{m-1}}$. The probability of this being the case is at least $(1 - \sum_{k > m} \rho'_{m,k})^s$, which from Theorem 3.1 will approach 1 when $m$ is large and $s = o(m^m)$. More practically, in Table 1 it was easy to find 10,000 8-bit permutations for which $\lambda_8 \le 8$, which could be used to construct a suitable product cipher with a known bound on $p^{\Omega}$. In the next two sections we will describe two such product ciphers, assuming that 8-bit permutations with $\lambda_8 \le 8$ can be found directly by random selection.

Figure 1: The general SP-network product cipher.

3.1 Characteristics in SP-networks
In this section we will derive bounds on the expected value of $p^{\Omega}$ assuming that the round mapping $F$ is based on random $m$-bit permutations selected uniformly from $S_{2^m}$. Consider the network shown in Figure 1. Let $F$ consist of $s$ S-boxes implementing $m$-bit permutations $\pi_1, \pi_2, \ldots, \pi_s$ such that $F : Z_2^{ms} \to Z_2^{ms}$ where $\pi_1$ operates on the first block of $m$ bits, $\pi_2$ operates on the second block of $m$ bits, and so on. Then if we redefine $\lambda_m$ as
\[
\lambda_m \stackrel{\mathrm{def}}{=} 2^{m-1} \cdot p^{\Omega} = \max_{\substack{\pi \in \{\pi_1, \pi_2, \ldots, \pi_s\} \\ \Delta X, \Delta Y \in Z_2^m \\ w(\Delta X), w(\Delta Y) > 0}} \Lambda'_\pi(\Delta X, \Delta Y)
\]
it follows that $\frac{\lambda_m}{2^{m-1}}$ is the probability of the most likely characteristic across all $s$ permutations in $F$. Then for any $r$-round characteristic $\Omega_r$ containing no zero differences it also follows that
\[
p^{\Omega_r} \le \left( \frac{\lambda_m}{2^{m-1}} \right)^{r}. \tag{28}
\]
In general the bound in eq. (28) is not expected to hold with equality when $r > 1$. This discrepancy is accounted for by observing that the most likely characteristics have a zero input difference to a subset of the S-boxes, which means that these differences cause the expected output difference with probability 1. However, the avalanche effect diffuses the nonzero output difference of an S-box at round $i$ to the inputs of several S-boxes at round $i + 1$, making it likely that more S-boxes at round $i + 1$ will have nonzero input difference than at round $i$, thus decreasing the probability of the characteristic predicting the difference from one round to the next. Thus characteristics are chosen not only because they are probable, but also because they may limit the influence of the avalanche effect on causing nonzero input differences to S-boxes.

Consider a 16-round 64-bit product cipher $E$ for which the round mapping consists of 8 8-bit permutations followed by a transposition of the 64 ciphertext bits, which is an example of an SP-network. Then to predict the input difference to round 16 requires a 15-round characteristic $\Omega_{15}$ where the input difference to each of the first 15 rounds is nonzero. Let us assume that the permutations are selected uniformly from $S_{2^8}$ and that at each round there is only one S-box which has a nonzero input difference. It then follows that
\[
p^{\Omega_{15}} \le (p^{\Omega})^{15} \le \left( \frac{8}{2^7} \right)^{15} = 0.86736 \times 10^{-18}. \tag{29}
\]
On the other hand, if $\Omega_{15}$ has nonzero input differences to two S-boxes at 7 out of the 15 rounds then
\[
p^{\Omega_{15}} \le \left( \frac{8}{2^7} \right)^{2 \cdot 7} \left( \frac{8}{2^7} \right)^{8} = 0.32311 \times 10^{-26}. \tag{30}
\]
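The arithmetic behind eqs. (29) and (30) can be reproduced in a few lines. This is a sketch added for illustration, with a hypothetical helper name; it assumes, as the text does, that every active S-box contributes at most $\lambda_m / 2^{m-1}$ and that the number of active S-boxes at each round is known.

```python
def sp_characteristic_bound(m, lam, active_sboxes_per_round):
    """Upper bound on a characteristic's probability in an SP-network.

    Each S-box with a nonzero input difference contributes a factor of
    at most lam / 2^(m-1); S-boxes with zero input difference contribute 1.
    """
    per_sbox = lam / 2 ** (m - 1)
    bound = 1.0
    for active in active_sboxes_per_round:
        bound *= per_sbox ** active
    return bound


# One active S-box in each of 15 rounds, as in eq. (29).
one_active = sp_characteristic_bound(8, 8, [1] * 15)
# Two active S-boxes in 7 of the 15 rounds and one in the other 8, as in eq. (30).
mixed = sp_characteristic_bound(8, 8, [2] * 7 + [1] * 8)
print(one_active, mixed)
```

With $m = 8$ and $\lambda_8 = 8$ the per-S-box factor is $2^{-4}$, so the two bounds are exactly $2^{-60}$ and $2^{-88}$.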
3.2 Characteristics in DES-like networks
DES-like networks are symmetric ciphers whose round function is of the form shown in Figure 2. The ciphertext is divided into halves, the left half $L_r$ and the right half $R_r$. The round function $F$ acts on $R_r$ under the action of $K_r$, the subkey for round $r$. For the SP-networks displayed in Figure 1, if at any point a characteristic $\Omega_r$ predicts a zero difference then all subsequent differences will be zero since the round mapping is bijective. However, in the case of a DES-like mapping, the round function $F$ need not be bijective and nonzero input differences to $F$ can be used to produce zero output differences.

Figure 2: The round function of a DES-like cipher.

For $a = a_1 a_2 \in Z_2^{2m}$ where $a_i \in Z_2^m$, let $r(a) = a_2 a_1$. An $r$-round characteristic for a DES-like mapping $\Omega_r = (\Delta X, \Delta Y_1, \ldots, \Delta Y_r)$ is said to be iterative if $\Delta X = r(\Delta Y_r)$. Taking into account the swapping operation at each round, an iterative characteristic essentially maps plaintexts of difference $\Delta X$ to ciphertexts of difference $\Delta X$ in $r$ rounds. We observe that $k$ $r$-round iterative characteristics can be concatenated to form a $(kr)$-round characteristic, $k > 0$. The best known characteristic that has been used against DES is a 2-round iterative characteristic found by Biham and Shamir [5] that is concatenated $6\frac{1}{2}$ times to break 16-round DES.

Let $F$ be defined as in the previous section to consist of $s$ S-boxes implementing $m$-bit permutations $\pi_1, \pi_2, \ldots, \pi_s$ such that $F : Z_2^{ms} \to Z_2^{ms}$ where $\pi_1$ operates on the first block of $m$ bits, $\pi_2$ operates on the second block of $m$ bits, and so on. Also let the output of these substitutions be acted on by an $(ms)$-bit permutation $P$. With respect to DES, consider removing the $E$ expansion and creating 4 new S-boxes that are 8-bit permutations; the $P$ permutation is retained. A new key schedule yielding 32 bits for $K_r$ would need to be devised. Let $\Delta L_r$ be the left half difference at round $r$, $\Delta R_r$ the right half difference at round $r$, and $\delta_r$ the output difference of $F$ at round $r$. Then these differences are listed in Table 2 for rounds 1 to 4. It is straightforward to argue that no 2-round iterative characteristics will exist when the $F$ function is bijective, unless $\Delta L_1 = \Delta R_1 = 0$. On the other hand, we will prove that 3-round characteristics are possible. We will call $\gamma$ a fixed point [15] of $F$ if inputs of difference $\gamma$ to $F$ can lead to an output difference of $\gamma$ in $F$ (that is, $\Lambda_F(\gamma, \gamma) > 0$).

round $r$    $\Delta L_r$               $\Delta R_r$                           $\delta_r$
1            $\Delta L_1$               $\Delta R_1$                           $\delta_1$
2            $\Delta R_1$               $\Delta L_1 + \delta_1$                $\delta_2$
3            $\Delta L_1 + \delta_1$    $\Delta R_1 + \delta_2$                $\delta_3$
4            $\Delta R_1 + \delta_2$    $\Delta L_1 + \delta_1 + \delta_3$

Table 2: Round differences.
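Lemma 3.1 Let $\gamma$ be a fixed point of $F$. Then $\Omega_3$ is a 3-round nonzero iterative characteristic if
\[
\Delta X \in \{\gamma 0,\ 0 \gamma,\ \gamma\gamma\}. \tag{31}
\]

Proof. If $\Omega_3$ is a 3-round iterative characteristic then from Table 2 we must have that $\Delta L_1 = \Delta R_1 + \delta_2$ and $\Delta R_1 = \Delta L_1 + \delta_1 + \delta_3$, which implies that $\delta_1 + \delta_2 + \delta_3 = 0$. There are three cases to consider corresponding to the three possible values of $\Delta X$ in (31). We will prove the case where $\Delta X = \gamma 0$ explicitly, and the other cases are similar. If $\Delta X = \gamma 0$ then $\delta_1 = 0$, and $\delta_2 = \gamma = \Delta L_1$ since $\gamma$ is a fixed point of $F$. But then $\Delta R_1 + \delta_2 = \gamma$ and $\delta_3 = \gamma$, from which we have that $\Delta L_4 = \Delta L_1 = \gamma$ and $\Delta R_4 = \Delta L_1 + 0 + \Delta L_1 = 0$. □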
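The three cases of the lemma can be traced mechanically through the round differences of Table 2. The sketch below is an added illustration and not part of the paper: it assumes differences are encoded as integers combined with XOR, that a zero input difference to $F$ always yields a zero output difference, and that the fixed-point event (input difference $\gamma$ giving output difference $\gamma$) occurs whenever the input difference to $F$ is $\gamma$. The function name and the sample value of `gamma` are hypothetical.

```python
def propagate(delta_left, delta_right, gamma, rounds=3):
    """Follow a (left, right) difference through Feistel rounds.

    Assumes F maps input difference 0 to 0 with certainty and maps
    input difference gamma to gamma (the fixed-point event of the text).
    """
    history = [(delta_left, delta_right)]
    for _ in range(rounds):
        if delta_right not in (0, gamma):
            raise ValueError("F input difference is neither 0 nor gamma")
        f_out = delta_right  # 0 -> 0 always, gamma -> gamma when the event occurs
        # Table 2: the new left half is the old right half,
        # the new right half is the old left half plus the F output difference.
        delta_left, delta_right = delta_right, delta_left ^ f_out
        history.append((delta_left, delta_right))
    return history


gamma = 0x5A  # an arbitrary nonzero 8-bit difference, chosen only for illustration
for start in [(gamma, 0), (0, gamma), (gamma, gamma)]:
    print(start, propagate(*start, gamma))
```

Each of the three starting differences returns to itself after three rounds, and in each case exactly one of the three rounds has a zero input difference to $F$.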
For each difference $\gamma$ such that $\Lambda_F(\gamma, \gamma) > 0$ there will exist three 3-round iterative characteristics of the form in (31). Each of these 3 characteristics has a round difference of the form $\gamma 0$ which will go to the difference $0 \gamma$ with probability 1. Then one in every 3 round differences is predicted with certainty, implying that
\[
p^{\Omega_r} \le (p^{\Omega})^{\,2 \lfloor r/3 \rfloor + \lfloor (r \bmod 3)/2 \rfloor}. \tag{32}
\]
Consider a Feistel cipher similar to DES obtained by removing the $E$ expansion and replacing the S-boxes by four 8-bit bijective mappings followed by a 32-bit permutation. Then the probability of a 15-round characteristic $\Omega_{15}$ is bounded as
\[
p^{\Omega_{15}} \le (p^{\Omega})^{10} = \left( \frac{8}{2^7} \right)^{10} = 0.90949 \times 10^{-12}. \tag{33}
\]
The bound is lowered further if we assume that more than one S-box has a nonzero input difference at a round which has a nonzero input difference.
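The exponent in eq. (32) and the numerical value in eq. (33) can be checked as follows. This is a minimal sketch with hypothetical function names; it assumes, as in the text, a single active S-box with $\lambda_m = 8$ at each round whose input difference to $F$ is nonzero.

```python
def iterative_exponent(r):
    """Number of rounds, out of r, that are not predicted with certainty (eq. (32))."""
    return 2 * (r // 3) + (r % 3) // 2


def feistel_bound(r, m, lam):
    """Bound of eq. (32) with p^Omega taken as lam / 2^(m-1)."""
    return (lam / 2 ** (m - 1)) ** iterative_exponent(r)


print(iterative_exponent(15), feistel_bound(15, 8, 8))
```

For $r = 15$ the exponent is 10, and with $m = 8$, $\lambda_8 = 8$ the bound is exactly $2^{-40}$.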
4 Conclusion and Remarks The method of dierential cryptanalysis is based on the distribution of multi-round characteristics r (X; Y1; Y2; : : :; Yr ). The probability of r correctly predicting ciphertext dierences at each round in turn depends on the distribution of single-round characteristics Yi; Yi+1 for the round mapping F . When F consists of S -boxes implementing m-bit permutations, we have shown that the probability of the most likely single-round characteristic is expected to be less than 2mm?1 . It may well be the case that Theorem 3.1 can be improved to show that limm!1 E[m]=m = 0. Further research may attempt to prove that E[m] = O(log m) for example. Our results then show that a relatively simple design can produce product ciphers for which all characteristics r are expected to (correctly) predict dierences with low probability. We further note that random m-bit permutations can be generated eciently [24], and that the fraction of permutations that are with linear [11] or degenerate [23] in any output bit is tending to zero rapidly as a function of m. On the other hand, Biham and Shamir [5] found that replacing the S -boxes of DES by random 4-bit permutations yielded systems that were far weaker than the original DES. The weakness of these S boxes appears to be due to the dimension of the permutation rather than the use of permutations per se. The XOR properties of S -boxes that are contructed from several permutations, as in the case of DES, is considered by O'Connor [22]. An apparent defense against dierential cryptanalysis would be to design a round mapping F for which the corresponding XOR table contains uniform or nearly uniform entries. It has been shown by Nyberg [20], and independently by Adams [1], that it is possible to construct mappings : Z2m1 ! Z2m2 for which each entry of XOR is 2m1 ?m2 when the input dierence is nonzero. For the construction to be possible it must be the case that m1 2m2 which implies that the mapping cannot be bijective. 
Detombe and Tavares [9] have shown that for bijective mappings $\pi : Z_2^m \to Z_2^m$ the most balanced XOR tables are those for which each row has $2^{m-1}$ entries that are 2, with the remaining XOR entries being zero. In both cases the mappings are constructed from boolean functions that are either bent or almost bent. More recently, several other such constructions have been found [2, 19].

We have concentrated on characteristics $\Omega_r$, but more important to the system designer are differentials. A differential is similar to a characteristic except that only an input difference $\Delta X$ and an output difference $\Delta Y_r = \Delta Y$ are specified, while the intermediate differences $\Delta Y_1, \Delta Y_2, \ldots, \Delta Y_{r-1}$ are unspecified and may assume any values which lead to $\Delta Y$ at the $r$th round. The notion of a differential follows from modeling differences using Markov chains, as suggested by Lai [16]. It may be the case that all characteristics are unlikely and yet high-probability differentials exist. A deeper analysis using Markov chains will be required to bound the probability of the most likely differential in a cipher. Notwithstanding, Knudsen and Nyberg [21] have shown that the probability of any differential is bounded from above by $2(p_{\max})^2$, where $p_{\max}$ is the probability of the most likely single-round differential, regardless of the number of rounds.

Acknowledgements

I would like to thank Prabhakar Ragde for his assistance in developing the results in this thesis. I would also like to thank the referees for their cogent comments and for the correction of several errors in the original manuscript.
References

[1] C. M. Adams. On immunity against Biham and Shamir's differential cryptanalysis. Information Processing Letters, 41:77–80, 1992.
[2] T. Beth and C. Ding. On almost perfect nonlinear permutations. Abstracts of papers, EUROCRYPT 93.
[3] E. Biham and A. Shamir. Differential cryptanalysis of DES-like cryptosystems. Journal of Cryptology, 4(1):3–72, 1991.
[4] E. Biham and A. Shamir. Differential cryptanalysis of the full 16-round DES. Technical Report 708, Technion, Israel Institute of Technology, Haifa, Israel, 1991.
[5] E. Biham and A. Shamir. Differential cryptanalysis of Snefru, Khafre, REDOC-II, LOKI and LUCIFER. Advances in Cryptology, CRYPTO 91, Lecture Notes in Computer Science, vol. 576, J. Feigenbaum ed., Springer-Verlag, pages 156–171, 1992.
[6] L. P. Brown, M. Kwan, J. Pieprzyk, and J. Seberry. Improving resistance to differential cryptanalysis and the redesign of LOKI. To appear, Advances in Cryptology, proceedings of ASIACRYPT 91.
[7] L. P. Brown, J. Pieprzyk, and J. Seberry. LOKI - a cryptographic primitive for authentication and secrecy applications. Advances in Cryptology, AUSCRYPT 90, Lecture Notes in Computer Science, vol. 453, J. Seberry and J. Pieprzyk eds., Springer-Verlag, pages 229–236, 1990.
[8] T. Cusick and M. Wood. The REDOC-II cryptosystem. Advances in Cryptology, CRYPTO 90, Lecture Notes in Computer Science, vol. 537, A. J. Menezes and S. A. Vanstone eds., Springer-Verlag, pages 545–563, 1991.
[9] J. Detombe and S. Tavares. Constructing large cryptographically strong S-boxes. Abstracts of papers, AUSCRYPT 92.
[10] H. Feistel. Cryptography and computer privacy. Scientific American, 228(5):15–23, 1973.
[11] J. Gordon and H. Retkin. Are big S-boxes best? In T. Beth, editor, Cryptography, proceedings, Burg Feuerstein, pages 257–262, 1982.
[12] R. L. Graham, D. E. Knuth, and O. Patashnik. Concrete Mathematics, A Foundation for Computer Science. Addison Wesley, 1989.
[13] M. Hall. Combinatorial Theory. Blaisdell Publishing Company, 1967.
[14] M. Hofri. Probabilistic Analysis of Algorithms. Springer-Verlag, 1987.
[15] L. R. Knudsen. Cryptanalysis of LOKI. To appear, Advances in Cryptology, proceedings of ASIACRYPT 91, 1991.
[16] X. Lai. On the design and security of block ciphers. ETH Series in Information Processing, editor J. Massey, Hartung-Gorre Verlag Konstanz, 1992.
[17] X. Lai and J. L. Massey. A proposal for a new block encryption standard. In Advances in Cryptology, EUROCRYPT 90, Lecture Notes in Computer Science, vol. 473, I. B. Damgård ed., Springer-Verlag, pages 389–404, 1991.
[18] R. Merkle. Fast software encryption functions. Advances in Cryptology, CRYPTO 90, Lecture Notes in Computer Science, vol. 537, A. J. Menezes and S. A. Vanstone eds., Springer-Verlag, pages 476–501, 1991.
[19] K. Nyberg. Differentially uniform mappings for cryptography. Abstracts of papers, EUROCRYPT 93.
[20] K. Nyberg. Perfect nonlinear S-boxes. Advances in Cryptology, EUROCRYPT 91, Lecture Notes in Computer Science, vol. 547, D. W. Davies ed., Springer-Verlag, pages 378–386, 1991.
[21] K. Nyberg and L. R. Knudsen. Provable security against differential cryptanalysis. Talk given at the Rump Session of CRYPTO '92, August 1992.
[22] L. J. O'Connor. On the distribution of characteristics in composite permutations. Presented at CRYPTO 93, USA, August 1993.
[23] L. J. O'Connor. Enumerating nondegenerate permutations. Advances in Cryptology, EUROCRYPT 91, Lecture Notes in Computer Science, vol. 547, D. W. Davies ed., Springer-Verlag, pages 368–377, 1991.
[24] E. M. Reingold, J. Nievergelt, and N. Deo. Combinatorial Algorithms: Theory and Practice. Prentice-Hall, 1976.
[25] A. Shimizu and S. Miyaguchi. Fast data encipherment algorithm FEAL. Advances in Cryptology, EUROCRYPT 87, Lecture Notes in Computer Science, vol. 304, D. Chaum and W. L. Price eds., Springer-Verlag, pages 267–278, 1988.