This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE Globecom 2011 proceedings.

Investigation on Scrambler Reconstruction With Minimum A Priori Knowledge

Xiao-Bei Liu∗, Soo Ngee Koh∗, Xin-Wen Wu† and Chee-Cheon Chui‡

∗ School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798
† School of Information and Communication Technology, Griffith University, Gold Coast, QLD 4222, Australia
‡ Temasek Laboratories, Nanyang Technological University, Singapore 639798

Abstract: The existing algorithm for reconstruction of the feedback polynomial in a linear scrambler relies on the assumption that the input sequence is produced by a biased memoryless source and that the source bias is known and used for reconstruction. In this paper, the problem of reconstructing the feedback polynomial in a linear scrambler without knowledge of the source bias is studied. An algorithm is proposed to reconstruct the scrambler without knowledge of the source bias, and factors which affect the performance of this algorithm are discussed. A scheme which reduces the number of bits used in the reconstruction without affecting the detection capability of the reconstruction algorithm is also proposed.

Index Terms: scrambler, linear feedback shift register, feedback polynomial

I. INTRODUCTION

A linear scrambler is usually used in communication systems to convert a data bit sequence into a pseudorandom sequence that is free from long strings of 1s or 0s. In this paper, we consider a scenario wherein the specifications of the scrambler used by the transmitter are not known perfectly by the receiver. In this case, to obtain the data before the scrambling, some parameters of the scrambler need to be first recovered at the receiver. The capability of recovering the scrambler when its specifications are fuzzy is envisaged to be an enabling technology in digital communication systems with flexible platforms such as software defined radio (SDR). As SDR adaptively changes its configurations (frequency band, modulation type, scrambler, encoder, etc.) according to the changing communication environments, some work has been done on the design of an intelligent receiver which can adapt itself to different building blocks of the transmitter [1]-[3]. A similar application is also envisaged in [4] for “multi-standard adaptive receivers”. However, the work that has been done on the reconstruction of scramblers is still limited [5].

There are generally two types of linear scrambler. One is the synchronous scrambler and the other is the self-synchronized scrambler. Both types of scrambler usually consist of a linear feedback shift register (LFSR) whose output sequence $(s_t)_{t \ge 0}$ is combined with the input sequence $(x_t)_{t \ge 0}$, and the result is $(y_t)_{t \ge 0}$, i.e.,

$$y_t = x_t \oplus s_t, \quad t \ge 0 \tag{1}$$

The structure of a synchronous scrambler is shown in Fig. 1.

Fig. 1. Structure of synchronous scrambler
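As a rough illustration of (1), the sketch below builds a small synchronous scrambler in Python. The function names `lfsr_stream` and `scramble`, the tap convention (s_t is the XOR of s_{t-k} over the listed delays k), and the example delays (1, 3) are assumptions made for this example; they are not taken from the paper.

```python
from itertools import islice


def lfsr_stream(taps, state):
    """Yield LFSR output bits s_t = XOR of s_{t-k} for every delay k in taps.

    state[k-1] holds s_{t-k}; len(state) must be at least max(taps).
    (Hypothetical helper, written only for this illustration.)
    """
    reg = list(state)
    while True:
        bit = 0
        for k in taps:
            bit ^= reg[k - 1]
        yield bit
        reg = [bit] + reg[:-1]  # shift the new bit into the register


def scramble(bits, taps, state):
    """Synchronous scrambling y_t = x_t XOR s_t, as in equation (1)."""
    return [x ^ s for x, s in zip(bits, lfsr_stream(taps, state))]


# Example: delays (1, 3), i.e. s_t = s_{t-1} XOR s_{t-3}, start state 1,0,0.
key = list(islice(lfsr_stream((1, 3), [1, 0, 0]), 14))
data = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
sent = scramble(data, (1, 3), [1, 0, 0])
recovered = scramble(sent, (1, 3), [1, 0, 0])  # same key stream undoes the XOR
print(key, sent, recovered)
```

Because scrambling is an XOR with the key stream, applying `scramble` a second time with identical taps and state returns the input. This is why a receiver needs both the feedback polynomial and the initial state to descramble a synchronous scrambler.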

To achieve the maximum period for the sequences produced by the scramblers, binary primitive polynomials are usually used as the feedback polynomials for the LFSR. Reconstructing a linear scrambler consists of recovering the feedback polynomial of the LFSR as well as its initial state in the case of a synchronous scrambler. In this paper, we will focus on reconstructing the feedback polynomial of the LFSR, as reconstructing the initial state of the LFSR is a well known problem for stream ciphers and it has been extensively studied in the literature [6]-[9]. In [5], an algorithm is proposed for reconstructing the feedback polynomial of a scrambler by using only the output bit sequence of the scrambler. The reconstruction relies heavily on the assumption that the input of the scrambler is produced by a biased memoryless source with $\Pr(x_t = 0) = \frac{1}{2} + \varepsilon$, where $\varepsilon \neq 0$, and that ε is perfectly known and used for reconstruction. In this paper, we consider a more general case, i.e., the input sequence is produced by a biased source, but ε, instead of being known a priori, needs to be estimated. For simplicity, only the synchronous scrambler is considered. However, our idea is also applicable to the self-synchronized scrambler. The paper is organized as follows. In Section II, the existing algorithm to

978-1-4244-9268-8/11/$26.00 ©2011 IEEE


reconstruct a synchronous scrambler is reviewed. In Section III, a scheme to estimate ε is proposed, and the accuracy of the estimation is analyzed. In Section IV, a new algorithm which reconstructs the scrambler without knowing ε is proposed, and factors which affect the performance of the proposed algorithm are studied. In Section V, a scheme which allows a much higher false-alarm probability in the detection, but does not affect the detection capability, is proposed. Conclusions are drawn in Section VI.

II. THE EXISTING ALGORITHM TO RECONSTRUCT A SYNCHRONOUS SCRAMBLER

In [5], an algorithm for reconstruction of the feedback polynomial P(X) of a synchronous scrambler is proposed. Instead of searching for the feedback polynomial of the constituent LFSR of the scrambler directly, the algorithm searches for sparse multiples of the feedback polynomial, with the degree of the sparse multiples varying from low to high. After 2 or more sparse multiples are detected, the feedback polynomial of the constituent LFSR can be deduced by computing the greatest common divisor (gcd) of the detected sparse multiples. The algorithm is outlined below for the reader's convenience. It includes the following steps:

1) Compute the threshold T as follows:

$$T = \frac{a(a + b\bar{\sigma}_l)}{(2|\varepsilon|)^d} \tag{2}$$

where

$$a = \Phi^{-1}\left(1 - \frac{P_f}{2}\right) \tag{3}$$

$$b = -\Phi^{-1}(P_n) \tag{4}$$

and

$$\bar{\sigma}_l = \sqrt{(1 + 2d(d-1))(1 - (2\varepsilon)^{2d})} \tag{5}$$

In the above equations, Φ denotes the normal distribution function, i.e.,

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{t^2}{2}\right) dt \tag{6}$$

$P_f$ and $P_n$ denote the false-alarm probability and the non-detection probability respectively, and they are predefined. d denotes the weight of the sparse multiples of P(X), and typical values of d are 3, 4, and 5.

2) For $(i_1, ..., i_{d-1})$, $0 < i_1 < ... < i_{d-1} \le D$ (D is the maximum degree of the sparse multiple we want to search), compute the number of bits N required to recover the feedback polynomial as follows:

$$N = i_{d-1} + \frac{(a + b\bar{\sigma}_l)^2}{(2|\varepsilon|)^{2d}} \tag{7}$$

3) Initialize Z with Z = 0. For t from $i_{d-1}$ to N, compute

$$z_t = y_t \oplus \bigoplus_{j=1}^{d-1} y_{t-i_j} \tag{8}$$

and

$$Z = Z + (-1)^{z_t} \tag{9}$$

4) If |Z| > T, store the sparse multiple $Q(X) = 1 + \sum_{j=1}^{d-1} X^{i_j}$ in a table.

5) When another sparse multiple, say $Q'(X)$, is detected and $Q'(X) \neq Q(X)$, compute the nontrivial gcd of $(Q(X), Q'(X))$.

Steps 2 to 5 are repeated until a $\gcd(Q(X), Q'(X)) = P(X)$ with $P(X) \neq 1$ is found or all combinations of $(i_1, ..., i_{d-1})$ are tested.

The algorithm described above uses the distribution of Z to determine whether Q(X) is a multiple of P(X). When Q(X) is not a multiple of P(X), Z has a Gaussian distribution with mean value 0 and variance $N - i_{d-1}$; when Q(X) is a multiple of P(X), Z has a Gaussian distribution with mean value μ and variance $\sigma^2$, where μ is given by

$$\mu = (N - i_{d-1})(2\varepsilon)^d \tag{10}$$

and σ is given by

$$\sigma \le \sqrt{N - i_{d-1}} \cdot \bar{\sigma}_l \tag{11}$$

It is noted that the threshold T and the number of bits N must be determined before the search for sparse multiples starts. According to Equations (2) and (7), the values of T and N depend on the value of ε. In the algorithm proposed by Cluzeau, ε is assumed to be known and used for reconstruction. Consider a more general case, i.e., ε is unknown in the reconstruction. Let $\tilde{N}$ denote the number of bits used in the reconstruction when ε is unknown. Obviously $\tilde{N}$ may be smaller or larger than N. The proper choice of $\tilde{N}$ will be discussed in Section IV-C. To see how the value of Z varies according to $\tilde{N}$ and ε, in Figure 2, the variations of Z versus $\tilde{N}$ are plotted for different biased sources. The solid curves in Figure 2 represent the values of Z obtained in a simulation. As Z is Gaussian distributed, we know that 99.7% of the values of Z will fall within the range $[\mu - 3\sigma, \mu + 3\sigma]$. In the following, we will call $\mu - 3\sigma$ the statistical lower bound of Z and $\mu + 3\sigma$ the statistical upper bound of Z. We also assume that $\tilde{N} \gg i_{d-1}$, and thus $\tilde{N} - i_{d-1}$ can be approximated by $\tilde{N}$. According to (11), the statistical lower bound of Z is approximately $\tilde{N}(2\varepsilon)^d - 3\sqrt{\tilde{N}}\,\bar{\sigma}_l$ and the statistical upper bound of Z is approximately $\tilde{N}(2\varepsilon)^d + 3\sqrt{\tilde{N}}\,\bar{\sigma}_l$. In Figure 2, the statistical upper and lower bounds of Z are represented by the dashed curves above and below each solid curve.

III. ESTIMATION OF THE SOURCE BIAS
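Because both the threshold (2) and the bit count (7) of the existing algorithm depend on ε, it may help to see the size of that dependence before turning to the estimation of the bias. The following is a minimal sketch, not the authors' implementation: the function name, the argument `i_last` (standing for $i_{d-1}$), and the sample values $P_f = 2 \cdot 10^{-7}$ and $P_n = 10^{-5}$ (borrowed from later sections of the paper) are assumptions of this example.

```python
from math import sqrt
from statistics import NormalDist


def cluzeau_threshold_and_bits(eps, d, pf, pn, i_last=0):
    """Evaluate equations (2)-(5) and (7) for a known bias eps.

    Returns (a, T, N). All names are choices made for this sketch.
    """
    nd = NormalDist()
    a = nd.inv_cdf(1 - pf / 2)                                   # (3)
    b = -nd.inv_cdf(pn)                                          # (4)
    sigma_bar = sqrt((1 + 2 * d * (d - 1))
                     * (1 - (2 * eps) ** (2 * d)))               # (5)
    scale = (2 * abs(eps)) ** d
    threshold = a * (a + b * sigma_bar) / scale                  # (2)
    n_bits = i_last + (a + b * sigma_bar) ** 2 / scale ** 2      # (7)
    return a, threshold, n_bits


for eps in (0.3, 0.2, 0.1, 0.05):
    a, T, N = cluzeau_threshold_and_bits(eps, d=3, pf=2e-7, pn=1e-5)
    print(f"eps={eps}: T={T:.1f}, N={N:.3g}")
```

Since N scales as $(2|\varepsilon|)^{-2d}$, halving the bias with d = 3 multiplies the required number of bits by roughly 64, so a wrong assumption about ε quickly leads to far too few or far too many bits.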

From Figure 2, it can be seen that the value of Z generally grows with $\tilde{N}$ when Q(X) is a multiple of P(X), and that Z grows faster with bigger ε. (In this paper, we only consider the case ε > 0. When ε < 0, the estimation is still the same, except that we need to take note of the sign of Z.)
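The growth of Z with the number of bits and with the bias can be illustrated numerically from the mean (10) and the bound (11). The sketch below is only an illustration under the approximation $\tilde{N} - i_{d-1} \approx \tilde{N}$; the function name `z_statistics` and the chosen sample values are assumptions of this example.

```python
from math import sqrt


def z_statistics(n_bits, eps, d=3):
    """Mean (10) and 3-sigma bounds of Z when Q(X) is a multiple of P(X).

    Uses N~ - i_{d-1} ~ N~ and the upper bound (11) on sigma, so the
    returned interval is the "statistical" range described in the text.
    """
    mu = n_bits * (2 * eps) ** d
    sigma_bar = sqrt((1 + 2 * d * (d - 1)) * (1 - (2 * eps) ** (2 * d)))
    sigma = sqrt(n_bits) * sigma_bar
    return mu - 3 * sigma, mu, mu + 3 * sigma


for eps in (0.1, 0.2, 0.3):
    low, mu, high = z_statistics(1e6, eps)
    print(f"eps={eps}: mean={mu:.0f}, bounds=[{low:.0f}, {high:.0f}]")
```

The mean grows linearly in the number of bits while the width of the interval grows only with its square root, which is why the curves for different biases separate as more bits are used.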


Fig. 2. Variation of Z versus $\tilde{N}$ (d = 3). [Plot of the value of Z against the number of bits used in the reconstruction ($\tilde{N}$). Curves: Q(X) is a multiple of P(X) with ε = 0.3, ε = 0.2 and ε = 0.1; Q(X) is not a multiple of P(X).]

Based on the observation from Figure 2, when ε is unknown, we can roughly estimate ε from the values of Z and $\tilde{N}$ when Q(X) is a multiple of P(X). More specifically, ε can be approximated by an estimated value $\tilde{\varepsilon}$, which is given by

$$\tilde{\varepsilon} = \frac{1}{2}\left(\frac{Z}{\tilde{N} - i_{d-1}}\right)^{1/d} \approx \frac{1}{2}\left(\frac{Z}{\tilde{N}}\right)^{1/d} \tag{12}$$

As an example, from Figure 2, we can see that when $\tilde{N} = 10^6$, the corresponding values of Z for ε = 0.1, 0.2 and 0.3 are 9379, 64861 and 217403 respectively. By using Equation (12), we obtain the estimated values of ε of 0.105, 0.201 and 0.301 respectively, which are very close to the actual values of ε.

In the following, the accuracy of estimating ε will be discussed. Suppose $Z_n = Z/\tilde{N}$, then we have

$$F_{Z_n}(z) = P(Z_n \le z) = P\left(\frac{Z}{\tilde{N}} \le z\right) = P(Z \le \tilde{N} \cdot z) = \int_{-\infty}^{\tilde{N} \cdot z} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(t - \mu)^2}{2\sigma^2}\right) dt \tag{13}$$

and

$$f_{Z_n}(z) = F'_{Z_n}(z) = \frac{\tilde{N}}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(\tilde{N} \cdot z - \mu)^2}{2\sigma^2}\right) = \frac{1}{\sqrt{2\pi}\,(\sigma/\tilde{N})} \exp\left(-\frac{(z - \mu/\tilde{N})^2}{2(\sigma/\tilde{N})^2}\right) \tag{14}$$

From (14), we can see that $Z_n$ is Gaussian distributed with mean value $\mu_n = \mu/\tilde{N}$ and variance $\sigma_n^2 = (\sigma/\tilde{N})^2$. As $\tilde{\varepsilon} = \frac{1}{2}\sqrt[d]{Z_n}$, the probability that the difference between the estimated bias $\tilde{\varepsilon}$ and the actual bias ε is smaller than a value δ is given by

$$\begin{aligned} P(|\tilde{\varepsilon} - \varepsilon| < \delta) &= P(\varepsilon - \delta < \tilde{\varepsilon} < \varepsilon + \delta) \\ &= P\left((2(\varepsilon - \delta))^d < Z_n < (2(\varepsilon + \delta))^d\right) \\ &= \Phi\left(\frac{(2(\varepsilon + \delta))^d - \mu_n}{\sigma_n}\right) - \Phi\left(\frac{(2(\varepsilon - \delta))^d - \mu_n}{\sigma_n}\right) \\ &\approx \Phi\left(\frac{\sqrt{\tilde{N}}\,\left((2(\varepsilon + \delta))^d - (2\varepsilon)^d\right)}{\sqrt{1 + 2d(d-1)}}\right) - \Phi\left(\frac{\sqrt{\tilde{N}}\,\left((2(\varepsilon - \delta))^d - (2\varepsilon)^d\right)}{\sqrt{1 + 2d(d-1)}}\right) \end{aligned} \tag{15}$$

In Table I, the values of δ such that more than 99.9% of $\tilde{\varepsilon}$ satisfy $|\tilde{\varepsilon} - \varepsilon| < \delta$ are calculated and shown for different values of $\tilde{N}$ and ε.

TABLE I
VALUES OF δ SUCH THAT MORE THAN 99.9% OF $\tilde{\varepsilon}$ SATISFY $|\tilde{\varepsilon} - \varepsilon| < \delta$ FOR DIFFERENT COMBINATIONS OF $\tilde{N}$ AND ε (d = 3)

$\tilde{N}$    ε = 0.3    ε = 0.2    ε = 0.1    ε = 0.05
$10^4$         0.022      0.051      0.246      0.206
$10^5$         0.0068     0.014      0.167      0.154
$10^6$         0.0022     0.0042     0.0162     0.115
$10^7$         0.00068    0.0014     0.005      0.04
$10^8$         0.00021    0.00042    0.0015     0.006

Let us call $|\tilde{\varepsilon} - \varepsilon|$ the estimation error of ε. From Table I, it can be observed that the estimation error normally increases as ε decreases when $\tilde{N}$ is fixed, and decreases as $\tilde{N}$ increases when ε is fixed. When $\tilde{N} = 10^6$, 99.9% of the estimation errors for ε = 0.1, 0.2 and 0.3 are smaller than 0.0162, 0.0042 and 0.0022 respectively. Those results correspond very well to the estimation errors deduced from Figure 2. From Figure 2, we know that when $\tilde{N} = 10^6$, the estimation errors for ε = 0.1, 0.2 and 0.3 are 0.005, 0.001 and 0.001 respectively, which are smaller than the δ values shown in Table I.

IV. RECONSTRUCT A SYNCHRONOUS SCRAMBLER WITHOUT KNOWLEDGE OF THE SOURCE BIAS

In this Section, the algorithm which recovers the synchronous scrambler without knowledge of the source bias will be described. It uses the scheme proposed in Section III to estimate the source bias during the reconstruction.

A. The Proposed Algorithm

The proposed algorithm includes the following steps:

1) Set a false-alarm probability $P_f$ and a number of bits $\tilde{N}$ used in the reconstruction. We will discuss later what a proper choice of $\tilde{N}$ should be.

2) For $(i_1, ..., i_{d-1})$, $0 < i_1 < ... < i_{d-1} \le D$ (D is the maximum degree of the sparse multiple we want to search), calculate a by

$$a = \Phi^{-1}\left(1 - \frac{P_f}{2}\right) \tag{16}$$

and the threshold T by

$$T = a \cdot \sqrt{\tilde{N} - i_{d-1}} \tag{17}$$

3) Initialize Z with Z = 0. For t from $i_{d-1}$ to $\tilde{N}$, compute

$$z_t = y_t \oplus \bigoplus_{j=1}^{d-1} y_{t-i_j} \tag{18}$$

and

$$Z = Z + (-1)^{z_t} \tag{19}$$

4) If |Z| > T, store $Q(X) = 1 + \sum_{j=1}^{d-1} X^{i_j}$ in a table. Estimate ε by Equation (12).

5) When ε is obtained, follow Cluzeau's algorithm as described before until another multiple $Q'(X)$ is detected.
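Steps 2 to 4 for a single candidate multiple can be sketched as follows. This is a hedged illustration, not the authors' code: the helper names `lfsr_bits` and `check_candidate`, the recurrence $s_t = s_{t-1} \oplus s_{t-15}$, the simulated source with ε = 0.3, the sequence length of 20000 bits and the random seed are all assumptions chosen to keep the example small.

```python
import random
from math import sqrt
from statistics import NormalDist


def lfsr_bits(taps, state, n):
    """Return n LFSR bits with s_t = XOR of s_{t-k} for k in taps."""
    reg = list(state)                       # reg[k-1] holds s_{t-k}
    out = []
    for _ in range(n):
        bit = 0
        for k in taps:
            bit ^= reg[k - 1]
        out.append(bit)
        reg = [bit] + reg[:-1]
    return out


def check_candidate(y, delays, pf):
    """Steps 2-4 for one candidate Q(X) = 1 + sum of X^i over i in delays.

    Returns (Z, T, eps_estimate); eps_estimate is None when |Z| <= T.
    """
    top = max(delays)                       # i_{d-1}
    terms = len(y) - top                    # plays the role of N~ - i_{d-1}
    z_sum = 0
    for t in range(top, len(y)):
        z = y[t]
        for i in delays:
            z ^= y[t - i]                   # (18)
        z_sum += 1 - 2 * z                  # (19): adds (-1)^z
    a = NormalDist().inv_cdf(1 - pf / 2)    # (16)
    threshold = a * sqrt(terms)             # (17)
    if abs(z_sum) <= threshold:
        return z_sum, threshold, None
    d = len(delays) + 1
    # (12); abs() because only the magnitude of the bias is estimated here
    eps_estimate = 0.5 * (abs(z_sum) / terms) ** (1 / d)
    return z_sum, threshold, eps_estimate


# Simulated scrambler output: biased source with Pr(x_t = 0) = 0.5 + 0.3.
rng = random.Random(2011)
n_bits, eps_true = 20000, 0.3
x = [0 if rng.random() < 0.5 + eps_true else 1 for _ in range(n_bits)]
s = lfsr_bits((1, 15), [1] + [0] * 14, n_bits)
y = [xi ^ si for xi, si in zip(x, s)]

hit = check_candidate(y, (1, 15), pf=2e-7)   # delays matching the recurrence
miss = check_candidate(y, (2, 15), pf=2e-7)  # delays that do not match
print(hit, miss)
```

In this sketch a full search would call `check_candidate` for every delay tuple up to the maximum degree D; only the per-candidate test is shown. With a bias as strong as 0.3, a few tens of thousands of bits are already enough for the matching candidate to clear the threshold, which is in line with the later discussion of how the required number of bits depends on ε.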

B. The Impact of $\tilde{N}$ on the Performance of the Proposed Algorithm

The difference between the proposed algorithm and Cluzeau's algorithm is that, before ε is obtained in Step 4, the number of bits used in the recovery must be “guessed”. Next, we will examine the impact of $\tilde{N}$ on the performance of the proposed algorithm.

Firstly, let us study the impact of $\tilde{N}$ on the false-alarm probability $P_f$. According to (16), $P_f = 2(1 - \Phi(a))$, i.e., $P_f$ is independent of $\tilde{N}$. Next, let us examine the impact of $\tilde{N}$ on the non-detection probability $P_n$. According to the distributions of Z, $P_n$ is given by

$$P_n \approx \Phi\left(\frac{T - |\mu|}{\sqrt{\tilde{N}} \cdot \bar{\sigma}_l}\right) \approx \Phi\left(\frac{\sqrt{\tilde{N}} \cdot a - \tilde{N}(2\varepsilon)^d}{\sqrt{\tilde{N} \cdot (1 + 2d(d-1))(1 - (2\varepsilon)^{2d})}}\right) \approx \Phi\left(\frac{a - \sqrt{\tilde{N}}\,(2\varepsilon)^d}{\sqrt{1 + 2d(d-1)}}\right) \tag{20}$$

According to Equation (20), $P_n$ decreases as $\tilde{N}$ increases. If $\tilde{N}$ is very small, $P_n$ might be too big for multiples of the feedback polynomial P(X) to be detected. The source bias ε also affects the choice of $\tilde{N}$. For a fixed $P_n$, the larger the value of ε is, the smaller the value of $\tilde{N}$ will be. To clarify this, in Figure 3, we take a closer look at Z for a short range of $\tilde{N}$.

Fig. 3. Variation of Z versus $\tilde{N}$ for short range of $\tilde{N}$. [Plot of the value of Z against the number of bits used in the reconstruction ($\tilde{N}$). Curves: Q(X) is a multiple of P(X) with ε = 0.3, ε = 0.2 and ε = 0.1; Q(X) is not a multiple of P(X).]

From Figure 3, it can be observed that the area between the upper and lower bounds of the curve obtained when Q(X) is a multiple of P(X) and ε = 0.1 largely overlaps the area between the upper and lower bounds of the curve obtained when Q(X) is not a multiple of P(X). It means that when $\tilde{N}$ is not big enough and the source bias ε ≤ 0.1, Q(X) might not be detected as a multiple of P(X), even though it is actually a multiple of P(X). When $\tilde{N}$ becomes smaller, e.g., $< 10^4$, Q(X) might not be detected as a multiple of P(X) even though the source bias is bigger, i.e., ε ≤ 0.2. This corresponds very well to the conclusions we draw from Equation (20).

C. Setting of $\tilde{N}$ for the Proposed Algorithm

From the previous description, we know that when ε is unknown, the choice of $\tilde{N}$ is critical for the algorithm proposed in Section IV-A to be successful. On one hand, if $\tilde{N}$ is too small, the non-detection probability $P_n$ will be too high for the algorithm to be successful. On the other hand, if $\tilde{N}$ is too big, search time will be wasted. To show what a proper choice of $\tilde{N}$ should be, in Table II, the values of $P_n$ are listed for different combinations of $\tilde{N}$ and ε.

TABLE II
VALUES OF $P_n$ FOR DIFFERENT $\tilde{N}$ AND ε ($P_f = 2 \cdot 10^{-7}$)

$\tilde{N}$    ε = 0.3       ε = 0.2       ε = 0.1       ε = 0.05            ε = 0.01
$10^4$         $< 10^{-10}$  0.11          0.99999       0.9999998           0.9999999
$10^5$         $< 10^{-10}$  $< 10^{-10}$  0.996         0.9999995           0.9999999
$10^6$         $< 10^{-10}$  $< 10^{-10}$  0.0025        0.999986            0.9999999
$10^7$         $< 10^{-10}$  $< 10^{-10}$  $< 10^{-10}$  0.9787              0.9999999
$10^8$         $< 10^{-10}$  $< 10^{-10}$  $< 10^{-10}$  $7.5 \cdot 10^{-7}$  0.9999998
$10^{12}$      $< 10^{-10}$  $< 10^{-10}$  $< 10^{-10}$  $< 10^{-10}$        0.0025

From Table II, it can be observed that for a strongly biased source (ε ≥ 0.3), using a bit sequence of length $10^4$ is enough for the reconstruction. In contrast, for a weakly biased source (ε ≤ 0.01), the length of the bit sequence needs to be larger than $10^{12}$. In practical situations, typical values of ε are between 0.1 and 0.05 [5], and the corresponding lengths of bit sequence needed are in the range of $10^7$ to $10^8$. Therefore, when ε is totally unknown, starting the reconstruction process with $10^8$ bits is a reasonable choice.
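The trend behind this choice can be explored with a small sketch of the last approximation in (20). The function name and the optional `sigma_bar` argument are assumptions of this example. The text does not state which approximation or rounding was used to produce the tabulated values, so the output should not be expected to match them digit for digit.

```python
from math import sqrt
from statistics import NormalDist


def non_detection_probability(n_bits, eps, d=3, pf=2e-7, sigma_bar=None):
    """Approximate P_n from the last form of equation (20).

    sigma_bar defaults to sqrt(1 + 2d(d-1)); passing another value is a
    hypothetical knob added for experimentation only.
    """
    nd = NormalDist()
    a = nd.inv_cdf(1 - pf / 2)                      # (16)
    if sigma_bar is None:
        sigma_bar = sqrt(1 + 2 * d * (d - 1))
    return nd.cdf((a - sqrt(n_bits) * (2 * eps) ** d) / sigma_bar)


for n_bits in (1e4, 1e5, 1e6, 1e7, 1e8):
    row = [non_detection_probability(n_bits, eps) for eps in (0.3, 0.2, 0.1, 0.05)]
    print(f"{n_bits:.0e}", " ".join(f"{p:.3g}" for p in row))
```

The printed rows show the same qualitative behavior as described above: for a fixed bias the non-detection probability falls as more bits are used, and a weaker bias needs many more bits before it starts to fall.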


V. SCHEME TO REDUCE THE NUMBER OF BITS IN THE RECONSTRUCTION

From Equation (20), it can be observed that when a is reduced, a smaller value of $\tilde{N}$ is required to achieve the same value of $P_n$. However, according to (16), when a becomes smaller, the false-alarm probability $P_f$ will increase. In the following, a scheme to reduce a without affecting $P_f$ will be proposed.

When a binary polynomial $Q(X) = 1 + \sum_{j=1}^{d-1} X^{i_j}$ is detected as a t-nomial multiple of P(X), the probability that Q(X) is actually not a multiple of P(X) is $P_f$. According to the property of Q(X), we know that if Q(X) is a t-nomial multiple of P(X), then $Q^{2^k}(X)$, k = 1, 2, ..., are also t-nomial multiples of P(X). This property can be used to improve the performance of the detection. The scheme is as follows: when Q(X) is detected as a multiple of P(X), $Q^2(X)$, $Q^4(X)$, ..., $Q^{2^k}(X)$ (k is a small positive integer) are then tested. If all of them are detected as multiples of P(X), Q(X) is stored in the table as a multiple of P(X). Otherwise, Q(X) will not be stored as a multiple of P(X). In this case, the new false-alarm probability $P'_f$ will be the product of the false-alarm probabilities in each individual detection, i.e., $P'_f = (P_f)^{k+1}$. The new non-detection probability will become $P'_n = 1 - (1 - P_n)^{k+1}$.

Suppose the target false-alarm probability is $2 \cdot 10^{-7}$, the non-detection probability is $P_n = 10^{-5}$ and k = 3. Then $P_f$ in each individual detection can be as big as $\sqrt[4]{2 \cdot 10^{-7}} \approx 0.02$, and the corresponding value of a is 2.33. Compared with the value of a when $P_f = 2 \cdot 10^{-7}$, i.e., a = 5.19, it can be observed that a is significantly reduced. The new non-detection probability is $P'_n = 1 - (1 - 10^{-5})^4 \approx 4 \cdot 10^{-5}$. Compared with $P_n$ in each individual detection, it can be observed that $P_n$ is not affected much.

The percentages of the reduction in $\tilde{N}$ for different values of k are shown in Table III. These results are obtained with a target false-alarm probability of $2 \cdot 10^{-7}$ and $P_n = 10^{-5}$.

TABLE III
THE PERCENTAGES OF THE REDUCTION IN $\tilde{N}$

k                          1                     2        3       4
$p_f$                      $4.47 \cdot 10^{-4}$  0.0058   0.02    0.046
a                          3.51                  2.76     2.31    2.0
Reduction in $\tilde{N}$   32%                   45%      52%     56%

From Table III, it can be observed that as k increases, the number of bits required to do the reconstruction is reduced. However, the difference in the reduction between k − 1 and k (k = 2, 3 and 4) becomes smaller and smaller as k increases. Considering that when k increases, $p_f$ will also increase and more time is needed to check those wrongly-detected multiples, we recommend using k = 2 or k = 3 in real applications.
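The bookkeeping of the repeated-test scheme can be sketched as follows. This is an illustrative sketch only: the function names `per_test_settings` and `square_exponents`, and the representation of Q(X) by its tuple of nonzero exponents, are assumptions of this example.

```python
from statistics import NormalDist


def per_test_settings(target_pf, pn, k):
    """Per-test false-alarm probability, its threshold factor a, and the
    combined non-detection probability when Q, Q^2, ..., Q^(2^k) are tested.
    """
    stages = k + 1                                  # Q itself plus k squarings
    pf_each = target_pf ** (1 / stages)             # from P'_f = (P_f)^(k+1)
    a = NormalDist().inv_cdf(1 - pf_each / 2)       # (16) with the relaxed P_f
    pn_total = 1 - (1 - pn) ** stages               # P'_n = 1 - (1 - P_n)^(k+1)
    return pf_each, a, pn_total


def square_exponents(delays, k):
    """Exponents of Q(X)^(2^k) over GF(2): each exponent is multiplied by 2^k."""
    return tuple(i * 2 ** k for i in delays)


for k in (1, 2, 3, 4):
    pf_each, a, pn_total = per_test_settings(2e-7, 1e-5, k)
    print(f"k={k}: p_f={pf_each:.3g}, a={a:.2f}, P'_n={pn_total:.2g}")

print(square_exponents((1, 15), 2))  # exponents of Q^4 for Q = 1 + X + X^15
```

Squaring over GF(2) only doubles the exponents, so each extra test reuses the same received bits with larger delays. The extra cost is therefore computation time rather than additional data.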

VI. CONCLUSION

The algorithm proposed in [5] is very promising for reconstruction of the LFSR in a synchronous scrambler, as it does not need any a priori knowledge of the input bits except the source bias ε. In this paper, a more general case is considered, i.e., ε is unknown in the reconstruction. An algorithm which reconstructs the scrambler without knowing ε a priori is proposed. This is important for multi-standard adaptive and cognitive radios that may operate in heterogeneous wireless environments. In the proposed algorithm, a scheme is used to estimate ε during the reconstruction process. The larger the number of bits used in the reconstruction, the more accurate the estimation is. When the number of bits is fixed, the larger the source bias is, the more accurate the estimation will be. The number of bits used in the reconstruction is also critical for the proposed algorithm to be successful. A weakly biased source generally requires a larger number of bits than a strongly biased source to do the recovery. For typical values of ε, which are between 0.1 and 0.05, the corresponding lengths of bit sequence needed are in the range of $10^7$ to $10^8$. A new scheme which exploits the property of the sparse multiples is also proposed in this paper to reduce the number of bits used in the reconstruction. Our analysis shows that by using our proposed scheme, the number of bits required to do the reconstruction can be reduced by more than half, without affecting the detection capability of the algorithm. It should be noted that the ideas proposed in this paper are also applicable to self-synchronized scramblers.

REFERENCES

[1] K. Umebayashi, S. Ishii, and R. Kohno, “Blind adaptive estimation of modulation scheme for software defined radio,” in Proc. PIMRC, 2000, pp. 43-47.
[2] H. Ishii, S. Kawamura, T. Suzuki, M. Kuroda, H. Hosoya, H. Fujishima, “An Adaptive Receiver based on Software Defined Radio Techniques,” in Proc. 12th PIMRC, vol. 2, pp. 120-124, USA, Sep. 2001.
[3] C. Han, A. Doufexi, S. Armour, K. H. Ng, J. McGeehan, “Adaptive MIMO OFDMA for Future Generation Cellular Systems in Realistic Outdoor Environment,” IEEE VTC Spring, May 2006.
[4] R. Gautier, G. Burel, J. Letessier and O. Berder, “Blind Estimation of Scrambler Offset Using Encoder Redundancy,” Thirty-Sixth Asilomar Conference on Signals, Systems and Computers, 2002.
[5] M. Cluzeau, “Reconstruction of a linear scrambler,” IEEE Transactions on Computers, vol. 56, no. 9, 2007, pp. 1283-1291.
[6] T. Siegenthaler, “Decrypting a class of stream ciphers using ciphertext only,” IEEE Trans. Computers, vol. 34, no. 1, pp. 81-84, 1985.
[7] W. Meier and O. Staffelbach, “Fast correlation attack on certain stream ciphers,” J. Cryptology, vol. 1, no. 3, pp. 159-176, 1989.
[8] A. Canteaut and E. Filiol, “Ciphertext Only Reconstruction of Stream Ciphers Based on Combination Generators,” Proc. Seventh Int’l Workshop Fast Software Encryption (FSE ’00), 2000, pp. 165-180.
[9] X. Wu, S. N. Koh and C. C. Chui, “Primitive polynomials for robust scramblers and stream ciphers against reverse engineering,” Proc. IEEE Int’l Symp. Information Theory (ISIT), Austin, Texas, U.S.A., June 13-18, pp. 2473-2477, 2010.