IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 21, NO. 12, DECEMBER 2010
On the Improvement of Neural Cryptography Using Erroneous Transmitted Information with Error Prediction

Ahmed M. Allam and Hazem M. Abbas, Member, IEEE
Abstract—Neural cryptography deals with the problem of "key exchange" between two neural networks using the mutual learning concept. The two networks exchange their outputs (in bits) and the key between the two communicating parties is eventually represented in the final learned weights, when the two networks are said to be synchronized. Security of neural synchronization is put at risk if an attacker is capable of synchronizing with any of the two parties during the training process. Therefore, diminishing the probability of such a threat improves the reliability of exchanging the output bits through a public channel. The synchronization with feedback algorithm is one of the existing algorithms that enhances the security of neural cryptography. This paper proposes three new algorithms to enhance the mutual learning process. They mainly depend on disrupting the attacker confidence in the exchanged outputs and input patterns during training. The first algorithm is called "Do not Trust My Partner" (DTMP), which relies on one party sending erroneous output bits, with the other party being capable of predicting and correcting this error. The second algorithm is called "Synchronization with Common Secret Feedback" (SCSFB), where inputs are kept partially secret and the attacker has to train its network on input patterns that are different from the training sets used by the communicating parties. The third algorithm is a hybrid technique combining the features of the DTMP and SCSFB. The proposed approaches are shown to outperform the synchronization with feedback algorithm in the time needed for the parties to synchronize.

Index Terms—Cryptography, mutual learning, neural cryptography, neural synchronization, tree parity machine.
I. INTRODUCTION

Synchronization between different entities is a known phenomenon that exists in different physical and biological systems. Synchronization in biological systems can be found in the behavior of Southeast Asian fireflies [1], which is a biological type of phase synchronization of multiple oscillators. Another type of synchronization exists in chaotic systems [2]. The synchronization process in artificial neural networks (NNs) can likewise be exploited to secure information transmission.
Manuscript received October 28, 2010; revised September 13, 2010; accepted September 14, 2010. Date of publication October 11, 2010; date of current version November 30, 2010. A. M. Allam is with the Department of Computer and Systems Engineering, Ain Shams University, Cairo 11361, Egypt (e-mail: [email protected]). H. M. Abbas is with Mentor Graphics Egypt, Cairo 11361, Egypt (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNN.2010.2079948
NNs are mathematical models that have been developed to simulate biological neurons inside the brain [3]. These models can solve difficult problems that cannot be handled by classical methods due to the lack of sufficient information. NNs can replace the process of developing complex algorithms by means of learning by examples, also known as supervised learning (see [4], and references therein).

Synchronization in NNs has been found to solve one of the common problems in cryptography, which is to share a common key between two parties A and B, while an attacker E should be unable to retrieve the key even with the ability to access the communication channel [5]. In classical cryptography, this problem has been extensively studied [6]–[9]. In 1976, Diffie and Hellman developed a mechanism based on number theory by which a secret key can be exchanged by two parties over a public channel which is accessible to any attacker [6], [9]. Alternatively, two networks trained on their outputs are able to achieve the same objective by means of mutual learning [10], bringing about what is known as neural cryptography. The most common model used in neural cryptography is the tree parity machine (TPM) [10], which keeps the state of the two parties secret, making it more secure than using a simple network.

This paper presents three algorithms to enhance the security of neural cryptography in such a way that the attacker faces difficulties in trusting the transmitted information on the public channel. The proposed algorithms tamper with the listening process, which is the basic mechanism the attacker depends on to break into the system.

This paper is organized as follows. Section II presents the mutual learning method for both simple networks and the TPM. Section III summarizes the best-known attacks against mutual learning.
In Section IV, the Do not Trust My Partner (DTMP) with error prediction approach is proposed to improve the security of exchanging the output bits of two communicating parties. Section V presents the possible break-in scenarios against the proposed method. In Section VI, the performance of the proposed algorithm is analyzed. Section VII presents simulation and experimental results for the DTMP algorithm. Section VIII introduces the Synchronization with Common Secret Feedback (SCSFB) algorithm as a modification of the synchronization with feedback algorithm. In Section IX, the two proposed approaches, i.e., DTMP and SCSFB, are combined to provide additional communication security.
1045–9227/$26.00 © 2010 IEEE
Fig. 1. Two perceptrons with identical inputs x that learn their mutual output bits σ.
II. MUTUAL LEARNING IN TPMs

The basic building block for the mutual learning process is a single perceptron. Fig. 1 depicts two communicating perceptrons having different initial weights w^{A/B} and receiving the same random input x at every training step. The mutual learning process is based on exchanging the output bits σ^{A/B} between the two perceptrons. The output σ is defined as

σ^i = sign(w_i^T · x),  i ∈ {A, B}   (1)

where

sign(n) = 1 for n ≥ 0, and −1 otherwise.

At the end of training step t, the weight vectors w are updated using the following learning rule [10]:

w^A(t + 1) = w^A(t) + (η/N) x(t) σ^B(t) Θ(−σ^A(t) σ^B(t))
w^B(t + 1) = w^B(t) + (η/N) x(t) σ^A(t) Θ(−σ^A(t) σ^B(t))   (2)

where η is a suitable learning rate and Θ is the step function. Clearly, the weights will be updated only if the two output bits σ^A and σ^B disagree. After each weight update, the weight vectors of the two networks are kept normalized. If the learning rate exceeds a critical value η_c = 1.816, the two weight vectors will satisfy the condition w^A = −w^B [10].

There are some restrictions on both the input and weight vector generation mechanisms in order to achieve full synchronization. The input pattern x has to be an N-dimensional vector with its components generated from a zero-mean unit-variance Gaussian distribution (continuous values). Also, the weight vector w is an N-dimensional vector with continuous components which should be normalized, i.e., w^T · w = 1, since only normalized weights can synchronize.

The single-perceptron-based protocol transmits very detailed information about the state representation of each party. Therefore, a single perceptron is insecure and can be easily broken into. If the two parties agree in their mutual output bits while the attacker does not, the reverse learning rule can be applied (here it is anti-Hebbian), so that the attacker will be able to synchronize with a probability of 99% [10]. Consequently, another structure should be developed to hide the internal state of each party so that it cannot be reproduced from the transmitted information. A TPM structure that consists of K perceptrons (Fig. 2), each with an output bit σ_k, is a candidate for improving the security of the mutual learning process and can be defined as

σ_k = sign(w_k^T · x_k).   (3)

The output of the TPM is

τ = ∏_{k=1}^{K} σ_k.   (4)

Fig. 2. TPM with K = 3 and N = 3.
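As a concrete illustration of the single-perceptron protocol, the learning rule (2) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code; the dimension N, learning rate η, and step count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, eta = 100, 0.5                        # illustrative dimension and learning rate

# Different random initial weights, kept normalized as the protocol requires.
wA = rng.standard_normal(N); wA /= np.linalg.norm(wA)
wB = rng.standard_normal(N); wB /= np.linalg.norm(wB)

sign = lambda n: 1 if n >= 0 else -1     # sign(n) as defined under (1)

for t in range(2000):
    x = rng.standard_normal(N)           # zero-mean unit-variance Gaussian input
    sA, sB = sign(wA @ x), sign(wB @ x)
    if sA != sB:                          # Theta(-sA*sB): update only on disagreement
        wA += (eta / N) * x * sB          # rule (2) for party A
        wB += (eta / N) * x * sA          # rule (2) for party B
        wA /= np.linalg.norm(wA)          # keep both weight vectors normalized
        wB /= np.linalg.norm(wB)

overlap = wA @ wB                         # mutual overlap of the two perceptrons
```

Because both vectors stay normalized, the inner product directly measures how close the two parties are to synchronization.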
The continuous type of input and weight vector components is not suitable for cryptographic applications. When only digital signaling (0s and 1s) is permitted, the input and weight components should be drawn from a discrete distribution rather than a continuous one. Bipolar input patterns x ∈ {−1, 1}^N and discrete weight vectors w_{k,j} ∈ {−L, −L + 1, . . . , L − 1, L}^N will be used here, where L is an integer value chosen by the designer to represent the synaptic depth of the network [10].

The two partners who need to share a common key will maintain two identical TPMs. Since the weights need to be of discrete values, they must be drawn from a uniform distribution to achieve synchronization between the two parties. Then, the two partners are trained using the same input vector x_k, which is either drawn from another uniform distribution [11] or generated using a linear feedback shift register (LFSR) [12]. The learning mechanism proceeds as follows ([10], [13]).
1) If the output bits are different, i.e., τ^A ≠ τ^B, nothing is changed.
2) If τ^A = τ^B = τ, only the weights of the hidden units with σ_k^{A/B} = τ^{A/B} will be updated.
3) The weight vector of such a hidden unit is adjusted using any of the following learning rules.
   a) Anti-Hebbian
      w_k^{A/B} = w_k^{A/B} − τ^{A/B} x_k Θ(σ_k τ^{A/B}) Θ(τ^A τ^B).   (5)
   b) Hebbian
      w_k^{A/B} = w_k^{A/B} + τ^{A/B} x_k Θ(σ_k τ^{A/B}) Θ(τ^A τ^B).   (6)
   c) Random walk
      w_k^{A/B} = w_k^{A/B} + x_k Θ(σ_k τ^{A/B}) Θ(τ^A τ^B).   (7)
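The discrete TPM protocol above can be sketched as follows, using the Hebbian rule (6). This is a hedged illustration, not the paper's implementation: the sizes K, N, L and the step budget are small toy values chosen so the sketch runs quickly, far below the paper's N = 3000 and L = 127.

```python
import numpy as np

K, N, L = 3, 10, 3                       # illustrative sizes only
rng = np.random.default_rng(1)

def tpm_output(w, x):
    """sigma_k = sign(w_k . x_k) per (3); tau = product of the sigma_k per (4)."""
    sigma = np.where((w * x).sum(axis=1) >= 0, 1, -1)
    return sigma, int(np.prod(sigma))

def hebbian_update(w, x, sigma, tau):
    """Hebbian rule (6): only hidden units with sigma_k = tau move, and the
    weights are kept inside the synaptic depth [-L, L]."""
    for k in range(len(w)):
        if sigma[k] == tau:
            w[k] = np.clip(w[k] + tau * x[k], -L, L)
    return w

wA = rng.integers(-L, L + 1, size=(K, N))   # discrete uniform initial weights
wB = rng.integers(-L, L + 1, size=(K, N))

for step in range(10000):
    x = rng.choice([-1, 1], size=(K, N))    # bipolar input patterns
    sA, tA = tpm_output(wA, x)
    sB, tB = tpm_output(wB, x)
    if tA == tB:                             # step 2: update only when tau^A = tau^B
        wA = hebbian_update(wA, x, sA, tA)
        wB = hebbian_update(wB, x, sB, tB)
    if np.array_equal(wA, wB):               # full synchronization reached
        break
```

After synchronization, the common weight matrix serves as the shared key material.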
During step 2, if there is at least one common hidden unit with σ_k = τ in the two networks, then there are three possibilities that characterize the behavior of the hidden nodes.
1) An attractive move, when hidden units at similar positions k have equal output bits, σ_k^A = σ_k^B = τ^{A/B}.
2) A repulsive move, when hidden units at similar positions k have unequal output bits, σ_k^A ≠ σ_k^B.
3) No move, when σ_k^A = σ_k^B ≠ τ^{A/B}.

The distance between hidden units can be defined by their mutual overlap ρ_k

ρ_k = (w_k^A · w_k^B) / sqrt((w_k^A · w_k^A)(w_k^B · w_k^B))   (8)

where 0 ≤ ρ_k ≤ 1, with ρ_k = 0 at the start of learning and ρ_k = 1 when synchronization occurs with the two hidden units having a common weight vector.

III. ATTACKS ON NEURAL CRYPTOGRAPHY

The security of the neural key-exchange protocol is based on the competition between the attractive and repulsive forces mentioned earlier [14]–[16]. Two NNs interacting with each other synchronize much faster than a third network only trained with their inputs and outputs. The difference between the two parties and an attacker is that the parties synchronize in a time polynomial in the synaptic depth L, as shown in (9)

t_sync = t_0 ln N,  t_0 = O(L²)   (9)

while the complexity of the attacker scales exponentially [17]. However, the process is stochastic and depends on the random attractive and repulsive forces. Consequently, there is a slight probability that an attacker will succeed in synchronizing with one of the parties [10], [17], [18].

The challenge the attacker faces with the TPM structure is the lack of knowledge about the internal representation {σ_1, σ_2, . . . , σ_K} of A's or B's TPM. Most known attacks depend on estimating the state of the hidden units. Hence, if a smart attack strategy exists, it can break the security of neural cryptography. The basic attack strategies against neural cryptography are given below.

A. Simple Attack [5], [17]

Here the attacker E's NN has the same structure as that of A and B. All that E has to do is to start with random initial weights and to train with the same inputs transmitted between A and B over the public channel. Then, the attacker E learns the mutual output bit τ^{A/B} between them and applies the same learning rule, replacing τ^E with τ^{A/B}

w_k^E = w_k^E − τ^A x_k Θ(σ_k^E τ^A) Θ(τ^A τ^B).   (10)

B. Geometric Attack [10], [17], [19]

A geometric attack can outperform a simple attack because, in addition to applying the same learning process, E can exploit τ^E and the local fields of its hidden units, h_1^E, h_2^E, . . . , h_K^E. When τ^{A/B} = τ^E, E applies only the same learning rule used by A and B. If τ^{A/B} ≠ τ^E, then E tries to correct the internal representation of its TPM. The lower the absolute value |h_i^E|, the higher the probability that σ_i^E ≠ σ_i^{A/B}; this probability is known as the prediction error ε_i. Therefore, E will manipulate the hidden unit with minimum |h_i^E| before applying the learning rule [10], [17]. Since the geometric attack follows a probabilistic behavior, it is not guaranteed to succeed in correcting the internal representation of the attacker's TPM.
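The correction step of the geometric attack can be sketched as follows. This is a minimal illustration of the flip heuristic only (the subsequent application of the learning rule is omitted), not the authors' code.

```python
import numpy as np

def geometric_correction(wE, x, tau_AB):
    """If tau^E != tau^{A/B}, flip the hidden unit whose local field h_k^E
    has the smallest absolute value -- the unit most likely to be wrong."""
    h = (wE * x).sum(axis=1)            # local fields h_k^E = w_k^E . x_k
    sigma = np.where(h >= 0, 1, -1)
    if int(np.prod(sigma)) != tau_AB:
        k = int(np.argmin(np.abs(h)))   # minimum |h_k^E|: highest prediction error
        sigma[k] = -sigma[k]            # corrected internal representation
    return sigma

# Toy example: local fields (2, 0, 4) give sigma = (1, 1, 1) and tau^E = 1;
# if tau^{A/B} = -1, the middle unit (smallest |h|) is the one flipped.
wE = np.array([[1, 1], [1, -1], [2, 2]])
x = np.ones((3, 2))
corrected = geometric_correction(wE, x, -1)
```

Flipping a single σ_k always flips the product τ, so one flip is sufficient to match the observed output bit.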
C. Majority Attack [10], [17]

The majority attack is similar to the geometric attack, but E can increase the probability of predicting the internal representation of either partner's TPM. The majority attack is also called a "cooperative attack," since the M TPMs work as a cooperating group rather than as individuals. Instead of using only one TPM, the attacker starts with M TPMs whose random weight vectors give a zero overlap between them. If τ^A ≠ τ^B, weights are not updated. But for τ^A = τ^B, the attacker must update its own TPMs. It calculates the output bit of all its attacking networks; if the mth attacking network's output bit τ^{E,m} is not equal to τ^A, the geometric attack is applied individually to that network, and no update occurs before cooperating with the other networks. Next, E searches for the most common internal representation among the M TPMs and updates their weights according to this majority vote. In order to avoid the expected high correlation between the M networks, the majority attack and the geometric attack are applied alternately in even and odd steps, respectively. Also, the majority attack is replaced with the geometric attack in the first 100 steps.

D. Genetic Attack [17], [18]

In the genetic attack, the attacker starts with only one TPM but is permitted to use up to M TPMs. Since the most challenging issue in the mutual learning process is to predict the internal representation of either TPM^A or TPM^B, the genetic attack deals with this difficulty directly. For a specific value of τ^{A/B}, there are 2^{K−1} different internal representations (using K binary variables {σ_i}_{i=1}^K) that reproduce this value. The genetic attack handles all these possibilities in parallel. It proceeds as follows.
1) If τ^A = τ^B and E has at most M/2^{K−1} TPMs, 2^{K−1} TPMs are generated from each one, and each updates its weights based on one of the possible internal representations. This step is known in genetic algorithms as the mutation step.
2) If E has more than M/2^{K−1} TPMs, the mutation step becomes an overhead due to the exponential storage needed. Therefore, the attacker must discard some of the TPMs to avoid an exponential storage increase. As in genetic algorithms, the discarding procedure removes the TPMs with the lowest fitness. The algorithm uses two variables, U and V, in the fitness function: U is the number of correct predictions of τ^A in the last V training steps. For more details about genetic algorithms and evolutionary strategies, refer to [20].
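The 2^{K−1} count used by the genetic attack can be checked directly by enumerating all sign assignments whose product equals a given τ. This is a quick illustration of the counting argument, not part of any attack implementation.

```python
from itertools import product
from math import prod

def consistent_representations(K, tau):
    """All internal representations (sigma_1, ..., sigma_K) whose product
    is tau; the genetic attack tracks these 2**(K-1) candidates in parallel."""
    return [s for s in product((-1, 1), repeat=K) if prod(s) == tau]
```

For the paper's K = 3 there are four candidate representations per observed output value, which is why a single observed τ leaves the attacker with a fourfold ambiguity at every step.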
IV. DTMP WITH ERROR PREDICTION

The attack strategies discussed in Section III basically depend on listening to the conversation between the two parties, sniffing their mutual output bits τ^{A/B}, and using the data to carry out the attack. This can be summarized in the following rules.
1) If τ^A = τ^B and τ^A = τ^E, then E applies the learning rule.
2) If τ^A = τ^B and τ^A ≠ τ^E, then E takes an action according to the attack type.
Therefore, hiding the mutual output bits or sending erroneous values can increase the security level significantly, as the attacker will not be able to break in unless the values of the mutual output bits are known. Moreover, applying any attack strategy using erroneous τ^{A/B} will cause the attacker to diverge away from the two parties.

The DTMP algorithm proposed here aims to send an erroneous τ in a probabilistically controlled manner, in such a way that it improves the security of neural cryptography without requiring extra computation or storage on the parties' side. Consequently, applying any type of attack becomes totally futile unless E knows the correct values of τ^{A/B}. In fact, the proposed method brings up two challenging issues.
1) If A transmits a wrong value of τ^A, how will B be able to recover the correct value of τ^A from the transmitted value?
2) Even if the first issue can be tackled, how can one guarantee that E will not apply the same strategy, eventually detect the error, and consequently break in?
These two challenges can be handled by controlling the probabilistic generation of the erroneous bit, by finding and exploiting a property that is common between the two communicating parties A and B but unavailable to the attacker E. It was shown that the average overlap between A and B is higher than that between E and either of the two parties.
The effort needed by the partners to synchronize increases polynomially with the synaptic depth L, while the complexity of the attack grows exponentially with L. The probability of a successful geometric attack P_E is given by (for K = 3)

P_E ≈ 0.92 exp(−0.15L − 0.05L²)   (11)

which becomes very small for large values of L [17]. Therefore, this property can provide a mechanism by which the erroneous bit is essentially a function of the average overlap between the two parties. Let H_k^A and H_k^B be the absolute sums of the inputs to the kth hidden unit in TPM^A and TPM^B, respectively

H_k^{A/B} = |Σ_{i=1}^N w_{k,i}^{A/B} x_{k,i}^{A/B}|.   (12)

In the following, it will be shown how to use the variable H_k^i, i ∈ {A, B}, as the common property between the two parties to achieve the desired objective. Let s be the number of bits required to store H_k^{A/B}. Initially, the values of H_k^A and H_k^B differ in d bits, where d ranges randomly between 0 and s according to the initial configuration of the two parties. Through
the mutual learning process, the overlap ρ^{A/B} between the two parties increases and the corresponding weight values w_{k,j}^{A/B} become closer. Consequently, the value of d decreases gradually until it reaches zero at full synchronization. At a partial synchronization state, d holds a value that is relatively so small that an attacker cannot match the most significant (s − d) bits with the two parties if the least significant d bits are truncated or ANDed with d zeros. From this point on, H_k^{A/B} is set to its own value after masking. Therefore, if d is estimated correctly, then there are two possibilities

H_k^A = H_k^B   (13)

or

|H_k^A − H_k^B| = 2^d.   (14)
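The two possibilities (13) and (14) can be demonstrated with plain integer masking. The H values below are toy binary numbers chosen for illustration, not simulation data.

```python
def mask_lsb(H, d):
    """Zero the least significant d bits of H (equivalent to ANDing with d zeros)."""
    return H & ~((1 << d) - 1)

d = 4

# Case (13): the two sums differ only inside the masked low d bits,
# so after masking they become identical.
HA, HB = 0b1011_0110_1101, 0b1011_0110_0111
equal_after_mask = mask_lsb(HA, d) == mask_lsb(HB, d)

# Case (14): a small difference straddles the mask boundary, so the
# masked sums end up exactly 2**d apart even though HC - HD is only 2.
HC, HD = 0b1011_1000_0001, 0b1011_0111_1111
gap = abs(mask_lsb(HC, d) - mask_lsb(HD, d))
```

The second case shows why (14) arises: a carry across the bit-d boundary leaves the truncated values one masked unit, i.e., 2^d, apart.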
The binary representations of H_k^A and H_k^B will control which equality occurs. Hence, parties A and B have the means to validate the occurrence of (13) or (14), whereas the attacker has no access to either H_k^A or H_k^B. To further exploit this property, H_k^A and H_k^B will be used as two seeds in a random number generator (RNG)

R_{0,k}^p = H_k^p,   R_{i,k}^p = Rem(a · R_{i−1,k}^p + c, m)   (15)

where Rem(x, y) is the remainder of the division x/y, R_{0,k}^p is the seed for the RNG, R_{i,k}^p is the ith generated random number, k ∈ {1, 2} is the hidden unit being used, {a, c, m} is the set of RNG parameters, and p denotes either party A or B. Since the two seeds are equal, i.e., R_{1,k}^A = R_{1,k}^B, one party can send erroneous bits as a function of the generated random variable and be sure that its partner is able to recover the correct mutual output value. In addition, E has no access to either H_k^A or H_k^B; consequently, the probability of reproducing R_{i,k}^E equal to R_{i,k}^{A/B} is very small. It is worth mentioning here that the use of another RNG parameter set is necessary to generate the erroneous τ, since the sequence generated by the old set would be easily recovered by the attacker once R_{1,k}^A and R_{1,k}^B are sent over the public network. A function F(R_{i,1}^A, R_{i,2}^A) will be used in such a way that A generates an erroneous τ^A if F(R_{i,1}^A, R_{i,2}^A) > 0 and a correct τ^A otherwise. Here, F is chosen to be (R_{i,1}^A > R_{i,2}^A), and it can be set by the designer to secure the system; a more complex F(R_{i,1}^A, R_{i,2}^A) will definitely improve the system security. Since B has the same values of H_k, it should be able to recover the correct value of τ^A. The algorithm is detailed in Algorithm 1.

If only one party sends the erroneous bit, the security of the mutual learning algorithm is jeopardized, as it is always sufficient for the attacker to follow the output value of the other party. To avoid this vulnerability, both parties may send erroneous bits using an additional local parameter, e.g., H_k^{A/B} (k = 1, 2), to prevent the scenario in which the attacker attacks each of the two parties separately.

V. BREAK-IN SCENARIOS FOR DTMP

The DTMP algorithm mainly hinges on the fact that the average overlap between the two parties is higher than that between the attacker and either of the communicating parties.
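The seed-and-flip mechanism of (15) and the comparison form of F can be sketched as follows. This is a hedged sketch: the LCG constants are illustrative glibc-style values, not parameters from the paper, and the masked sums are toy numbers.

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """R_i = Rem(a * R_{i-1} + c, m), as in (15); parameters are illustrative."""
    r = seed
    while True:
        r = (a * r + c) % m
        yield r

def flip_pattern(H1, H2, n):
    """F(R_{i,1}, R_{i,2}) = (R_{i,1} > R_{i,2}): flip tau wherever F holds."""
    g1, g2 = lcg(H1), lcg(H2)
    return [next(g1) > next(g2) for _ in range(n)]

# A and B hold identical masked sums H_1, H_2, so they derive identical
# flip patterns: A negates tau^A where the pattern is True, and B negates
# the received bit at exactly the same positions, recovering the true tau^A.
flips_A = flip_pattern(123456, 654321, 20)
flips_B = flip_pattern(123456, 654321, 20)
```

Because both parties run the same deterministic generator from the same secret seeds, the erroneous positions are perfectly predictable for B yet opaque to E, who never sees the seeds.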
Algorithm 1 The DTMP Algorithm with Error Prediction
Require: n, a predefined period during which the two partners apply the DTMP algorithm; τ_computed^A, the τ^A computed by A; τ_sent^A, the τ^A sent from A to B; and τ_used^A, the τ^A used by B in the mutual learning algorithm.
1: A and B estimate the current overlap ρ_i and hence estimate d
2: A and B select two predefined hidden units and mask the least significant d bits of each sum by ANDing with zeros
3: the two masked sums, H_k^A and H_k^B, are used as two seeds in a predefined RNG as suggested in (15) and the results are exchanged
4: if R_{1,k}^A = R_{1,k}^B then
5:   H_k^A and H_k^B are equal
6: else
7:   goto step 19
8: end if
9: use H_k^A and H_k^B in another RNG
10: loop {for n}
11:   apply F(R_{i,1}^A, R_{i,2}^A)
12:   if F(R_{i,1}^A, R_{i,2}^A) > 0 then
13:     τ_sent^A = −τ_computed^A
14:   end if
15:   if F(R_{i,1}^B, R_{i,2}^B) > 0 then
16:     τ_used^A = −τ_sent^A
17:   end if
18: end loop [and the algorithm is over]
19: increment d or continue in mutual learning for more iterations, then goto step 3

Fig. 3. Value of the absolute sum of a hidden unit |H_k| covering the entire possible spectrum.

Therefore, the attacker can succeed in removing the impact of the DTMP algorithm only by coincidentally obtaining a correct value of H_k^E (= H_k^{A/B}). Also, if the attacker uses the same seed, there is no guarantee that it will be the correct seed, since a large number of repulsive steps are also probable due to the security of the original mutual learning process. However, there are two possible methods for E to break in.
1) To make a brute-force attack by applying all possible available seeds, i.e., H_k^{A/B} for k = 1, 2. This is computationally a very expensive process, both in processing time and in storage, since it needs around 2^{(s−d)} attackers applying the algorithm in parallel. The absolute sum of a hidden unit (16) nearly covers the entire spectrum, as shown in Fig. 3. Therefore, it is highly unlikely that its value will be predicted, and thus the seed cannot be recovered. The maximum value of the hidden unit sum |Σ_{i=1}^N w_i x_i| is calculated as

H_max = L × N = 127 × 3000 = 381 000   (16)

where L is the maximum weight value and N is the number of weights per hidden unit (in our implementation, L = 127 and N = 3000). The value H_max is stored in s = 19 bits, and if d = 13, then the number of bits
remaining after masking is s − d = 6, which needs 64 different trials for guessing the correct value of H_k^{A/B}. This is needed for only a single hidden unit. Using two hidden units, as proposed in the algorithm, will double the remaining bits to 2(s − d). Since different hidden units are assumed to be statistically independent and no relationship between H_{k1}^{A/B} and H_{k2}^{A/B} can be established, 2^{12} trials are required to reach the correct two values of H_{k1}^{A/B} and H_{k2}^{A/B}.
2) To skip H_k^{A/B} and try to detect the error in the generated τ bits. Since the error does not have a simple probability distribution, E will not be able to distinguish the erroneous bits from the correct ones, even if E is looking for the event τ^A = τ^B. However, this is a computationally expensive attempt on the part of the attacker and does not guarantee synchronization with the parties. The complexity of this trial is O(2^n), where n is the number of steps needed by a partner to generate erroneous τ bits. In this case, when the attacker exploits the event τ^A = τ^B at large values of the overlap ρ_i, a break-in can occur. There are two countermeasures against this possibility.
a) Applying the algorithm before the overlap reaches a value at which P_ρ(τ^A = τ^B) is very high, while d is still small, so that the truncated part of the hidden unit sum is negligible with respect to the total sum. The variable P_ρ(τ^A = τ^B) refers to the probability of τ^A = τ^B at a certain overlap ρ.
b) Generating erroneous bits for a certain period of time without applying the DTMP algorithm, producing a decreased overlap between the two partners, followed by an algorithm restart. Despite being viable, this method will definitely prolong the synchronization time.

The DTMP algorithm aims to improve the security of neural cryptography without affecting the synchronization time. It
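The brute-force counting above is simple to verify numerically, using the values from the paper's implementation (L = 127, N = 3000, d = 13).

```python
L, N = 127, 3000
H_max = L * N                    # largest possible hidden unit sum: 381000
s = H_max.bit_length()           # bits needed to store H_max: 19

d = 13                           # masked least significant bits
trials_one_unit = 2 ** (s - d)   # 64 guesses to recover one hidden unit's sum
trials_two_units = trials_one_unit ** 2   # independent units: 2**(2*(s-d)) = 4096
```

The squaring in the last line encodes the statistical-independence assumption between the two selected hidden units: the attacker must guess both sums jointly.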
Fig. 4. Value of |H_k^A − H_k^B| versus d during training at a large value of overlap. (a) d = 9. (b) d = 13.

Fig. 5. Occurrence of τ^A = τ^B through the mutual learning, represented by white areas.
works on estimating the parameter d so that the variable H_k^{A/B} can be used to generate the erroneous values of τ. The value of d can be determined in one of two ways: 1) randomly select different values of d and check for the equality H_k^A = H_k^B; if the value of d is still unsafe, the two parties resume the learning process for a few more iterations and then select a new random value of d; 2) make a direct estimation based on the current overlap ρ^{A/B}. It should be noted that finding a direct relationship between d and the current overlap (measured in the frequency of updates), especially with three hidden units, is not a straightforward task, as it is a function of many parameters such as the initial weights of the two parties, the learning method, and the input vector generation mechanism. Therefore, the random selection of d is adopted in this paper. This solution will not affect the performance of the algorithm as far as the average synchronization time is concerned, since the maximum value of d is equal to s, which is a finite small number. Finding a closed-form expression for the value of d will be taken up in a future extension of this paper.

The DTMP algorithm exploits the high probability of having H_k^A = H_k^B for a good estimation of the parameter d. In Fig. 4(a) and (b), the value |H_k^A − H_k^B| is displayed from iteration 3000 to 5000 for different values of d. Our simulation results show that the two parties synchronized within approximately 10 000 iterations. During the training period from 3000 to 5000, the Euclidean distance between the two parties became small enough to study the effect of the DTMP algorithm at high values of overlap. It is shown that the higher the overlap between the two parties, the larger the training region in which H_k^A = H_k^B. Also, the higher the value of d, the larger the number of steps needed to reach the equality H_k^A = H_k^B. Actually, if H_{k1}^A ≠ H_{k1}^B, then increasing d will bring the values of the two variables closer. However, it is preferred to continue the mutual learning until H_k^A = H_k^B, as this implies a low probability that H_k^E = H_k^{A/B}. It is clear that at high values of d, more training steps should elapse before the equality H_k^A = H_k^B occurs. One might consider d = 13 an excessively large number but, given the large value of N (= 3000), the amount 2^13 is very reasonable, and the average absolute difference per weight is

|H_k^A − H_k^B| / N = 8000 / 3000 ≈ 2.667.   (17)

It is very unlikely that the overlap between the attacker and either of the two parties reaches the value shown in (17). Fig. 3 shows that the amount 2^13 is much smaller than the absolute sum. Moreover, the DTMP algorithm is applied from iteration 3000 to 5000, where the occurrence of τ^A = τ^B is not highly likely, as depicted in Fig. 5, due to incomplete synchronization. Therefore, it is difficult for E to predict whether the two parties will update their weights or not.

VI. PERFORMANCE ENHANCEMENT

As mentioned before, when d gets smaller, the system becomes more secure. Here, a binary coded decimal (BCD) representation of the sum of the hidden units in the DTMP algorithm is introduced in such a way that the needed value of d decreases, thereby providing more security.
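The random d-selection described above can be sketched as a search for the smallest mask under which the two sums agree. This is a simplified stand-in: here the sums are compared directly for clarity, whereas in the protocol the comparison happens implicitly through the exchanged RNG outputs, never by revealing H itself.

```python
def estimate_d(HA, HB, s):
    """Smallest d such that HA and HB agree on their top (s - d) bits."""
    for d in range(s + 1):
        if (HA >> d) == (HB >> d):   # same effect as masking the low d bits
            return d
    return s
```

At high overlap the returned d is small, so the truncated part of the sum stays negligible, which is exactly the regime in which the DTMP algorithm is meant to operate.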
The DTMP–BCD algorithm (Algorithm 2) requires some additional parameters in order to accommodate the BCD representation.
The complexity of the attacks needs to be analyzed. The absolute sum of a hidden unit H_k^{A/B} is stored in S BCD digits (4S bits). Knowing that H_k^{A/B} may not cover the entire span of the 4S bits, the number of most significant bits that will not be affected is referred to as u. For this algorithm, this leads to (S − D_safe) equal digits between H_k^A and H_k^B, with a complexity of O(2^{K(4(S−D_safe)−u)}). For example, when D_safe = 3, S = 6, K = 2 (two selected hidden units), and u = 2 (notice that the absolute value of H_k^{A/B} cannot exceed L × N = 127 × 3000 = 381 000, so the most significant digit can be represented in only 2 bits and the other 2 bits remain unchanged), a complexity of O(2^20) is achieved. Although some of the algorithm parameter values can be left for the two communicating parties to set, many of the DTMP–BCD parameters should be properly adjusted to achieve the
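The complexity expression can be checked numerically with the example's parameter values.

```python
def bcd_attack_complexity(K, S, D_safe, u):
    """Brute-force trials O(2**(K * (4*(S - D_safe) - u))) for the BCD variant."""
    return 2 ** (K * (4 * (S - D_safe) - u))

# K = 2 selected hidden units, S = 6 BCD digits, D_safe = 3 neglected digits,
# u = 2 unused high bits in the most significant digit -> 2**20 trials.
trials = bcd_attack_complexity(2, 6, 3, 2)
```
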
Algorithm 2 DTMP–BCD Algorithm with Error Prediction
Require: S, the minimum number of digits required to represent H_k^{A/B}; D, the number of digits in which H_k^A and H_k^B differ; C_D, the value of the Dth digit; D_safe, the number of digits that is safe for the algorithm to neglect; and n, a predefined period during which the two partners apply the DTMP algorithm.
1: A and B calculate the current overlap ρ_i and hence compute D
2: A and B select two predefined hidden units and mask the least significant bits of each sum by ANDing with zeros
3: use the two masked sums H_k^A and H_k^B as two seeds in a predefined RNG as shown in (15) and exchange the results
4: if R_{1,k}^A = R_{1,k}^B then
5:   H_k^A and H_k^B are equal and control is transferred to step 9
6: else
7:   goto step 24
8: end if
9: D = D − 1
10: if D > D_safe then
11:   one of the two partners may add a random value for C_D
12:   goto step 3
13: end if
14: use H_k^A and H_k^B in another RNG
15: apply F(R_{i,1}^A, R_{i,2}^A)
16: loop {for n}
17:   if F(R_{i,1}^A, R_{i,2}^A) > 0 then
18:     τ_sent^A = −τ^A
19:   end if
20:   if F(R_{i,1}^B, R_{i,2}^B) > 0 then
21:     τ^A = −τ_received^A
22:   end if
23: end loop [and the algorithm is over]
24: any partner will increment/decrement C_D randomly; then goto step 2

Fig. 6. Euclidean distance between weight vectors of the two TPMs for different scenarios. (A) Mutual learning normal operation. (B) Party A sending erroneous bits till the end of the simulation time. (C) Two partners applying the DTMP algorithm.
intended security enhancement of the DTMP–BCD algorithm. Two of these parameters are n and D_safe. Equation (9) shows how to estimate the synchronization time at which the two communicating parties can synchronize at common weight vectors. Instead of analytically estimating the current overlap, the frequency of occurrence of τ^A = τ^B is utilized to achieve this objective. The higher the frequency of τ^A = τ^B, the higher the overlap between A and B. It is recommended that either the DTMP or the DTMP–BCD start before the overlap between the two communicating partners reaches a high frequency of τ^A = τ^B, to avoid the attacker's break-in. The duration of the DTMP can begin at the start of the application of the algorithm and end at the synchronization time. In this case, the communicating parties must accept the fact that the DTMP technique will not be able to deceive the attacker at a very high overlap if only one of the communicating parties is sending erroneous output bits. The two communicating parties need to select D_safe in such a way that they become confident that the attacker will not have an equal absolute truncated sum up to the D_safe-th digit. In our simulation, since N is high (3000), truncating three digits (order of hundreds) will not decrease the security of the DTMP–BCD. It is also clear from the DTMP–BCD algorithm that A and B can apply it with D_safe equal to zero, which brings about another advantage of representing H_k^{A/B} in BCD instead of the conventional binary representation.

VII. EXPERIMENTAL RESULTS

In this section, some experimental results will be presented to demonstrate the performance of the DTMP algorithms, where two TPMs are trained with K = 3, N = 3000, and L = 127. The mutual learning algorithm is applied in training the networks, and the simulation results are displayed in Fig. 6.
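The overlap heuristic described above, namely tracking how often τ^A = τ^B instead of computing ρ analytically, can be sketched as a small monitor. The class name, window length, and threshold below are illustrative assumptions; the paper does not prescribe specific values.

```python
from collections import deque

class OverlapMonitor:
    """Track the recent frequency of tau^A == tau^B as a proxy for the
    overlap between the two communicating parties."""

    def __init__(self, window=100, threshold=0.8):
        self.history = deque(maxlen=window)  # sliding window of agreements
        self.threshold = threshold

    def record(self, tau_a, tau_b):
        self.history.append(1 if tau_a == tau_b else 0)

    def agreement_frequency(self):
        return sum(self.history) / len(self.history) if self.history else 0.0

    def overlap_is_high(self):
        # DTMP / DTMP-BCD should start *before* this returns True.
        return self.agreement_frequency() >= self.threshold
```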
In this figure, three different scenarios are simulated starting with the conventional mutual learning algorithm and ending
with the DTMP algorithm with error prediction. The Euclidean distance ED = ||W^A − W^B|| between the weight vectors of the two parties is plotted against the training step. This distance should become zero when the two networks are synchronized. The LFSR is used here to generate the input vector. Although the LFSR is not secure, due to the correlation between successive input patterns, it is used here to show how the DTMP algorithm can improve the security of neural cryptography. Improving the LFSR-based mutual learning process suggests that the system will be even more secure using pure random input generation. It should be mentioned here that the LFSR leads to faster synchronization than a pure random generator. The scenario in which only one party applies the DTMP algorithm indicates how the DTMP algorithm improves the security of neural synchronization, since it represents the difficulties an attacker faces when trying to synchronize with the two communicating parties. In fact, the attacker faces even more difficulties, since it can only listen to the conversation between the two parties, while they have the additional advantage of performing mutual learning. Fig. 6, curve A, shows the normal scenario where the mutual learning algorithm is applied to the two networks. The second scenario is depicted in curve B. It shows the case when A does not stop sending the erroneous τ^A. Therefore, B can never trust the value of τ^A. As a result, and as shown in the curve, the ED exhibits an oscillating pattern for the rest of the simulation period and never reaches zero. Consequently, A and B will never synchronize. Finally, the simulation of the proposed DTMP algorithm with error prediction is shown in curve C. It is evident that the distance is always decreasing, without any oscillation.
It starts to trace curve A from the early phase of the simulation, since the two parties use the same random strategy (the same seed) and a receiving party is capable of correcting the erroneous output bits received from the sending party. Clearly, the DTMP algorithm does not affect the synchronization time. Three important observations can be drawn when the DTMP algorithm with error prediction is applied.
1) The generated erroneous bits can confuse the attacker and reduce its probability of successful break-ins.
2) The algorithm eliminates the period in which the increase of the overlap between the two parties is disrupted by the generated erroneous bits.
3) It enables one of the parties to generate erroneous bits from a certain time (d is relatively small) until the end of the mutual learning process. Therefore, it eliminates the oscillations experienced in scenario (B).

VIII. SCSFB

The synchronization with feedback algorithm has been developed to improve the security of the mutual learning scheme. As shown in Fig. 7, the feedback mechanism depends on generating input patterns that are partially secret (E does not know σ_k^{A/B}) by employing the following rules [13]:
x_{i,j}^{A/B} = x_{i,j−1}^{A/B},  for j > 1        (18)
x_{i,1}^{A/B} = σ_i^{A/B}                          (19)
Fig. 7. TPM with feedback (N = 3, K = 3).

where x_{i,j}^{A/B} is the jth component of the input vector of the ith hidden unit and σ_i^{A/B} is the output of the ith hidden unit, 1 ≤ i ≤ K and 1 ≤ j ≤ N. Applying (18) and (19) does not guarantee synchronization, because each occurrence of σ_i^A ≠ σ_i^B reduces the number of identical input bits by 1 for the next N steps. Alternatively, the algorithm is modified in such a way that it is applied only if τ^A = τ^B. To reduce the possibility of generating uncorrelated inputs when τ^A ≠ τ^B, only (18) is applied, while x_{i,1}^{A/B} is set to a common random bit. If τ^A ≠ τ^B for a number of R steps, then all input bits are reinitialized to a common random input vector [13]. Synchronization with feedback decreases the probability of successful attacks even for low values of L, while it increases the average number of steps needed to achieve full synchronization [13]. In the following, a modification to the synchronization with feedback is introduced so that the synchronization time is not affected while the low probability of successful attacks is maintained. The modified feedback algorithm is based on a mechanism similar to that used in the DTMP algorithm with error prediction, utilizing the absolute sum H_k^{A/B} of a hidden unit. The value of H_k^{A/B} is used as a seed for an RNG, in a fashion similar to that mentioned in Section IV, to generate secret common input patterns. By using this mechanism, the repulsive effect caused by the original feedback mechanism is eliminated, since the input patterns, while being partially secret, are common to the two communicating parties. The rules of the modified algorithm are as follows:
x_{i,j}^{A/B} = x_{i,j−1}^{A/B},  for j > 1        (20)
x_{i,1}^{A/B} = −x_{i,j}^{A/B}                     (21)
x_{i,z}^{A/B} = −x_{i,z}^{A/B} · G(R_i)            (22)

where z = mod(R_i, N) and

G(R_i) = 1 if R_i is even, −1 otherwise,
and the condition used to evaluate G(R_i) (R_i is an even number in our simulation) can be set by the two communicating parties. Fig. 8 displays the performance of mutual learning using the Hebbian learning rule and the SCSFB. Several simulated scenarios are introduced to show how the SCSFB algorithm improves the security of neural cryptography without increasing the synchronization time. The first scenario is depicted in curve A, where one of the two communicating partners applies the SCSFB algorithm while the other accepts
Fig. 8. TPM with K = 3, N = 3000, and common feedback using the Hebbian learning rule. (A) Mutual learning with only one party using the SCSFB algorithm. (B) Mutual learning normal operation. (C) Mutual learning with the two communicating parties applying the SCSFB.

Fig. 9. TPM with K = 3, N = 3000, and common feedback using the random walk learning rule. (A) Mutual learning with the two parties applying the SCSFB. (B) Mutual learning with one party applying the SCSFB algorithm.
IX. DTMP ALGORITHM WITH COMMON SECRET FEEDBACK AND ERROR PREDICTION

While the DTMP algorithm hides the real values of the mutual output bits, the SCSFB makes the input vectors partially invisible. Hence, the two mechanisms can be combined to exploit the strengths of both and produce a system that is more immune to the possible attack strategies. As shown in Fig. 10, curve B, the DTMP algorithm improves the security of neural cryptography by decreasing the overlap between the two parties if one party is still using the value of τ^{A/B} received from its partner without error prediction. It is also depicted in curve C that combining the SCSFB algorithm with the DTMP algorithm with error prediction leads to a larger value of the Euclidean distance between the
the input patterns as received. The results show that the two communicating partners are not able to synchronize if one of the two parties keeps using the original input vectors. This scenario is similar to what the attacker will experience under any attack strategy. Curve B shows the normal operation of the mutual learning algorithm, while curve C represents mutual learning with the SCSFB algorithm. It is clear that curve C is very close to curve B, and there is no difference in synchronization time between them. Applying the SCSFB with the Hebbian learning rule (6) has no significant effect on the security. Therefore, the two partners become unable to synchronize if one of them is continuously sending erroneous τ bits (Fig. 8, curve A), and the overlap remains constant. In contrast, if the random walk learning rule (7) is used instead, the weight updates are affected only by the input vectors, and consequently the Euclidean distance increases dramatically, as shown in Fig. 9, curve B, which clearly shows how the attacker deviates from the two parties as a result of applying a wrong input vector. Curve A also indicates that synchronization is achieved when using the SCSFB algorithm with the random walk learning rule.
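The SCSFB input update of rules (20)–(22) can be sketched in Python. Two assumptions are made here for concreteness: rule (21) is interpreted as negating the component shifted out of the vector, and Python's random module stands in for the shared RNG seeded by the secret sum H_k; the function names are likewise illustrative. Because both parties derive the same R_i from the same secret seed, the perturbed position z stays common to A and B while remaining hidden from an attacker.

```python
import random

def G(r):
    # G(R_i) = 1 when R_i is even, -1 otherwise (the parity condition is
    # configurable by the two communicating parties).
    return 1 if r % 2 == 0 else -1

def scsfb_update(x, h_secret):
    """x: list of +/-1 input components of one hidden unit.
    h_secret: shared secret absolute sum H_k used as the RNG seed.
    Returns the next input vector, identical for both parties."""
    n = len(x)
    r = random.Random(h_secret).getrandbits(32)  # common R_i for A and B
    shifted = [-x[-1]] + x[:-1]                  # (20) shift; (21) negate (assumed)
    z = r % n                                    # z = mod(R_i, N)
    if G(r) == 1:                                # (22): x_z <- -x_z * G(R_i)
        shifted[z] = -shifted[z]
    return shifted
```

Running the update with the same secret seed on both sides yields identical vectors, which is what eliminates the repulsive effect of the original feedback mechanism.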
3000 2000 1000 0
0
2000
4000 6000 Iteration steps
8000
10000
Fig. 10. TPM K = 3, N = 3000 DTMP with common feedback. (A) Mutual learning normal operation. (B) Mutual learning with one party applying the DTMP algorithm. (C) Mutual learning with one party applying the DTMP with SCSFB algorithm. (D) Mutual learning with the two communicating parties applying the DTMP and SCSFB algorithm.
two parties and the attacker. When both parties apply the DTMP algorithm with error prediction together with the SCSFB, they evidently synchronize (curve D), while any opponent will face difficulties in detecting the correct values of both τ^{A/B} and the input patterns x_{i,k}. The values of these variables are functions of the internal weights of the two communicating partners, which are largely different from the attacker's weights.

X. CONCLUSION

In this paper, the security of neural cryptography was shown to be improved by introducing three algorithms: the DTMP algorithm with error prediction, the SCSFB algorithm, and a hybrid solution of both DTMP and SCSFB. The DTMP algorithm with error prediction is based on the idea that the attacker will use the mutual output bits of the two communicating parties (i.e., τ^A and τ^B) in its decision to
apply its own attack strategy before applying the learning rule. Allowing either τ^A or τ^B (or both) to become erroneous via a certain strategy will affect both the attack decision and the updated weights in the Hebbian and anti-Hebbian learning rules. However, it influences only the attack decision in the random walk learning rule. While the synchronization with feedback hides the input pattern from the attacker, the DTMP hides the mutual output bits. Moreover, the DTMP algorithm with error prediction does not increase the synchronization time and thus outperforms the synchronization with feedback. The new SCSFB algorithm is a hybrid of the DTMP algorithm and the synchronization with feedback scheme. It is based on the idea that the input vector becomes partially secret. Therefore, the attacker cannot apply the learning rule directly without predicting the secret part of the input patterns. Similar to the DTMP, the SCSFB does not affect the synchronization time. The DTMP and SCSFB algorithms are then combined to make use of each in hiding information from the attacker. The SCSFB was shown to improve the security of the DTMP. The combined algorithm has outperformed neural synchronization with feedback, since it does not increase the number of repulsive steps (i.e., it has no effect on the synchronization time). In addition, the algorithm is effective in its ability to confuse the attacker by not giving it any window in which to trust the transmitted bits during a certain predefined period. Also, the DTMP with SCSFB gives better security than using the DTMP alone. Therefore, only the two partners can detect the transmitted error, obtain common input vectors, and hence guarantee synchronization, while the attacker faces difficulties in detecting this error.

REFERENCES

[1] A. Pikovsky, M. Rosenblum, and J. Kurths, Synchronization: A Universal Concept in Nonlinear Sciences. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[2] C.-M. Kim, S. Rim, and W.-H. Kye, "Sequential synchronization of chaotic systems with an application to communication," Phys. Rev. Lett., vol. 88, no. 1, pp. 014103-1–014103-4, Dec. 2001.
[3] A. I. Galushkin, Neural Network Theory. New York: Springer-Verlag, 2007.
[4] G. Pölzlbauer, T. Lidy, and A. Rauber, "Decision manifolds—a supervised learning algorithm based on self-organization," IEEE Trans. Neural Netw., vol. 19, no. 9, pp. 1518–1530, Sep. 2008.
[5] I. Kanter, W. Kinzel, and E. Kanter, "Secure exchange of information by synchronization of neural networks," Europhys. Lett., vol. 57, no. 1, pp. 141–147, 2002.
[6] W. Diffie and M. Hellman, "New directions in cryptography," IEEE Trans. Inform. Theory, vol. 22, no. 6, pp. 644–654, Nov. 1976.
[7] A. J. Menezes, S. A. Vanstone, and P. C. Van Oorschot, Handbook of Applied Cryptography. Boca Raton, FL: CRC Press, 1996.
[8] B. Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C, 2nd ed. New York: Wiley, 1995.
[9] W. Stallings, Cryptography and Network Security: Principles and Practice. Upper Saddle River, NJ: Pearson, 2002.
[10] E. Klein, R. Mislovaty, I. Kanter, A. Ruttor, and W. Kinzel, "Synchronization of neural networks by mutual learning and its application to cryptography," in Advances in Neural Information Processing Systems 17, L. K. Saul, Y. Weiss, and L. Bottou, Eds. Cambridge, MA: MIT Press, 2005, pp. 689–696.
[11] M. Rosen-Zvi, E. Klein, I. Kanter, and W. Kinzel, "Mutual learning in a tree parity machine and its application to cryptography," Phys. Rev. E, vol. 66, no. 6, pp. 066135-1–066135-13, Dec. 2002.
[12] M. Volkmer, "Entity authentication and authenticated key exchange with tree parity machines," in "Cryptology ePrint Archive," Inst. Comput. Technol., Hamburg Univ. Technol., Hamburg, Germany, Rep. 2006/112, 2006.
[13] A. Ruttor, W. Kinzel, L. Shacham, and I. Kanter, "Neural cryptography with feedback," Phys. Rev. E, vol. 69, no. 4, pp. 046110-1–046110-7, Apr. 2004.
[14] A. Engel and C. P. L. Van den Broeck, Statistical Mechanics of Learning. Cambridge, U.K.: Cambridge Univ. Press, 2001.
[15] J. Hertz, R. G. Palmer, and A. S. Krogh, Introduction to the Theory of Neural Computation. Cambridge, MA: Perseus Publishing, 1991.
[16] A. Ruttor, W. Kinzel, and I. Kanter, "Dynamics of neural cryptography," Phys. Rev. E, vol. 75, no. 5, pp. 056104-1–056104-4, May 2007.
[17] A. Ruttor, "Neural synchronization and cryptography," Ph.D. thesis, Fakultät für Physik und Astronomie, Inst. für Theoretische Physik und Astrophysik, Würzburg, Germany, 2006.
[18] A. Ruttor, W. Kinzel, R. Naeh, and I. Kanter, "Genetic attack on neural cryptography," Phys. Rev. E, vol. 73, no. 3, pp. 036121-1–036121-8, 2006.
[19] A. Klimov, A. Mityagin, and A. Shamir, "Analysis of neural cryptography," in Proc. 8th Int. Conf. Theory Appl. Cryptology Inform. Secur., London, U.K., 2002, pp. 288–298.
[20] N. Nedjah, A. Abraham, and L. de M. Mourelle, Genetic Systems Programming: Theory and Experiences. New York: Springer-Verlag, 2009.
Ahmed M. Allam received the Graduate degree from the Computer and Systems Engineering Department, Ain Shams University, Cairo, Egypt, in 2008. He is currently pursuing the Master's degree at the same university. He joined Mentor Graphics Egypt, Cairo, as a Quality Assurance Engineer in 2008. In 2010, he joined a Synopsys partner, Swiftronix, Cairo, as a Digital Design and Verification Engineer. His current research interests include computational intelligence, digital design, quantum computing, cryptography, and quantum cryptography.
Hazem M. Abbas (S'92–M'94) received the B.Sc. and M.Sc. degrees from Ain Shams University, Cairo, Egypt, in 1983 and 1988, respectively, and the Ph.D. degree from Queen's University, Kingston, ON, Canada, in 1993, all in electrical and computer engineering. He worked as a Post-Doctoral Fellow at Queen's University and as a Research Fellow at the Royal Military College, Kingston, ON, and then joined the IBM Toronto Laboratory, Toronto, ON, as a Research Associate. He also worked in the Department of Electrical and Computer Engineering, Queen's University, as an Adjunct Assistant Professor. He is currently a Professor in the Department of Computers and Systems Engineering at Ain Shams University and is working for Mentor Graphics Egypt, Cairo, as a Senior Research and Development Engineering Manager. His current research interests include neural networks, pattern recognition, evolutionary computations, and image processing and their implementation on parallel architectures. Prof. Abbas serves as the President of the IEEE Signal Processing Chapter in Cairo.