JOURNAL OF ELECTRONIC TESTING: Theory and Applications 22, 143–150, 2006. © 2006 Springer Science + Business Media, LLC. Manufactured in The United States. DOI: 10.1007/s10836-006-7446-9

Concurrent Error Detection in a Polynomial Basis Multiplier over $GF(2^m)$

CHIOU-YNG LEE Department of Computer Information and Network Engineering, Lunghwa University of Science and Technology, Taoyuan County 333, Taiwan, R.O.C. [email protected]

CHE WUN CHIOU∗ Department of Computer Science and Information Engineering, Ching Yun University, Chung-Li 320, Taiwan, R.O.C. [email protected]

JIM-MIN LIN Department of Information Engineering and Computer Science, Feng Chia University, Taichung City 407, Taiwan, R.O.C. [email protected]

Received June 13, 2005; Revised January 20, 2006 Editor: M. Sonza Reorda

Abstract. Eliminating cryptographic computation errors is vital for preventing attacks. A simple approach is to verify the correctness of the cipher before outputting it. Multiplication is the most significant arithmetic operation among the cryptographic computations. Hence, a multiplier with concurrent error detection ability is urgently needed to avert attacks. Employing the concept of recomputing with shifted operands, this study presents a semi-systolic array polynomial basis multiplier with concurrent error detection and minimal area overhead. Moreover, the proposed multiplier requires only two extra clock cycles, whereas traditional multipliers using XOR trees consume at least $\log_2 m$ extra XOR gate delays in $GF(2^m)$ fields.

Keywords: finite fields arithmetic, multiplier, fault-tolerant computing, fault detection, cryptography

1. Introduction

In recent years, finite field arithmetic operations in $GF(2^m)$ have been widely used in coding theory [1], cryptography [2], digital signal processing [3, 4], switching theory [5], and pseudorandom number generation [6]. There are three popular types of bases over finite fields: polynomial basis (PB) [7–15], normal basis (NB) [16, 17], and dual basis (DB) [18, 19]. Among the finite field arithmetic operations, multiplication is the most important, complex, and time-consuming. Other complex operations, such as exponentiation, inversion, and division, can normally be carried out as iterated multiplications by means of Fermat's theorem, particularly in cryptographic systems. With the advantages of low design complexity, simplicity, regularity, and modularity in architecture, polynomial basis multipliers are widely used for producing efficient VLSI multiplier implementations.

Fault-based cryptanalysis, which deliberately injects faults into cryptographic devices, is an effective cryptanalysis technique against symmetrical and asymmetrical encryption algorithms. Kelsey et al. [20] indicated that differential fault analysis requires only 50–200 ciphertext blocks to recover a Data Encryption Standard (DES) key by deliberate fault injection. Biham and Shamir [21] and Boneh et al. [22] also developed fault-based cryptanalysis on symmetrical and asymmetrical cryptosystems, respectively. The presence of faults in cryptographic devices can lead to an active attack. Hence, the simplest method for protecting the encryption/decryption circuitry from an attacker is to ensure that the computational device can confirm the accuracy of the signature before outputting it.

Many error-detection schemes have been developed for symmetrical [23, 24] and asymmetrical cryptosystems [25–28] to output error-free values. Fenn et al. [27] proposed an on-line error detection scheme for bit-serial multipliers in $GF(2^m)$ using parity prediction. By applying the same parity checking approach, Reyhani-Masoleh and Hasan [28] provided error detection methods for bit-parallel and bit-serial polynomial basis multipliers in $GF(2^m)$. The major problem with parity checking is that generating the parity takes a long time: the XOR tree used for computing the parity requires at least $\log_2 m$ XOR-gate delays. The value of $m$ in practical cryptosystems is typically very large, for example, $m = 512$ or more. Hence, the parity checking method is not suitable for providing on-line error detection in systolic array multipliers with bit-parallel output. However, our previous paper [29] solved this difficulty for the dual basis representation using the parity checking method. In [30], Chiou provided a concurrent error detection scheme for a special case of the polynomial basis representation, termed the all-one polynomial. However, concurrent error detection for the general case of the polynomial basis representation is very difficult and has not been attempted. As mentioned above, on-line error detection is very important for protecting encryption/decryption processes from both faults and attackers. To overcome this problem for the polynomial basis representation, this work employs the concept of REcomputing with Shifted Operands (RESO) [31–33] to provide a concurrent error detection method for polynomial basis multipliers based on general polynomials. The proposed method needs only two additional clock cycles.

∗ To whom correspondence should be addressed.

2. Preliminaries

It is assumed that the reader is familiar with the basic concepts of finite fields and the RESO method. For more information, the reader can refer to [2] for finite fields and [31] for the RESO method. The relevant results on finite fields and the RESO method are briefly reviewed in the following paragraphs.

Let $GF(2^m)$ be a finite field of $2^m$ elements. $GF(2^m)$ is an extension field of the ground field $GF(2)$. Let $\alpha$ be a root of an irreducible primitive polynomial $P(x) = p_0 + p_1 x + p_2 x^2 + \cdots + p_{m-1} x^{m-1} + x^m$ of degree $m$ over $GF(2)$, where $p_0 = 1$ because $P(x)$, being irreducible, is not divisible by $x$. Thus, $\psi = \{1, \alpha, \alpha^2, \alpha^3, \ldots, \alpha^{m-1}\}$ is a polynomial basis of $GF(2^m)$. Any elements $A, B \in GF(2^m)$ can be represented by

$$A = a_0 + a_1\alpha + a_2\alpha^2 + \cdots + a_{m-1}\alpha^{m-1}, \qquad B = b_0 + b_1\alpha + b_2\alpha^2 + \cdots + b_{m-1}\alpha^{m-1},$$

where $a_i, b_i \in \{0, 1\}$ for all $0 \le i \le m-1$. A polynomial of the form $P(x) = p_0 + p_1 x + p_2 x^2 + \cdots + p_{m-1} x^{m-1} + x^m$ over $GF(2)$ is termed an All-One Polynomial (AOP) of degree $m$ if $p_i = 1$ for all $0 \le i \le m-1$ [11].

The RESO scheme [31, 32] is based on time redundancy. Let $G$ be a function unit and let $T$ be a function such that $T^{-1}(G(T(x))) = G(x)$ for all inputs $x$. The result is computed twice. The result of the first computation step, which computes $G(x)$, is stored in a register. During the second computation step, the result obtained by computing $T^{-1}(G(T(x)))$ is compared with the result from the first step. A mismatch indicates an error. The RESO method has been proven effective for a bit-sliced ripple-carry adder and a carry-lookahead adder in [31], and has been extended to both multiply and divide arrays in [32]. A similar technique has been developed by Minero et al. [33]. The advantage of the RESO scheme is that it can detect both permanent and intermittent failures. The fault model assumed in the RESO scheme is the functional fault model, which assumes that any faults are confined to a small area of the circuit and that the precise nature of these faults is not known. The functional fault model is appropriate for VLSI circuits. The error detection capability of the RESO method depends on the amount of shift. The RESO-1 method (shift by 1 bit) is employed in this paper.

3. The Proposed Polynomial Basis Semi-Systolic Array Multiplier

Let the product $C$ of $A$ and $B$ be represented by

$$C = AB \bmod P(x) = c_0 + c_1\alpha + c_2\alpha^2 + \cdots + c_{m-1}\alpha^{m-1},$$

where $c_i \in GF(2)$ for $0 \le i \le m-1$. Using Horner's rule, we can write the product $C$ as

$$C = AB = (a_0 + a_1\alpha + a_2\alpha^2 + \cdots + a_{m-1}\alpha^{m-1})B = a_0 B + a_1\alpha B + a_2\alpha^2 B + \cdots + a_{m-1}\alpha^{m-1} B. \tag{1}$$

Let $D[i, j]$ denote a 1-bit latch, $0 \le i, j \le m-1$. The relations between the $D[i, j]$ are given by

$$D[0, j+1] = D[m-1, j] \quad \text{for } 0 \le j \le m-2,$$

$$D[i+1, j+1] = D[i, j] + p_{i+1} D[m-1, j] \quad \text{for } 0 \le i, j \le m-2.$$

The initial values of the $D[i, j]$ are assigned as follows:

$$D[i, 0] = b_i \quad \text{for } 0 \le i \le m-1.$$
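The latch recurrence can be traced in software. The following is a minimal Python sketch, not the authors' circuit: it assumes the small field $GF(2^4)$ with $P(x) = 1 + x + x^4$ (chosen here purely for illustration), and the names `to_bits`, `build_d_table`, and `times_alpha` are hypothetical helpers. Column $j$ of the array is stored as a list, so `cols[j][i]` plays the role of $D[i, j]$.

```python
M = 4
P = [1, 1, 0, 0]          # p_0..p_{m-1} for P(x) = 1 + x + x^4 (illustrative choice)
P_MASK = 0b10011          # the same polynomial encoded as an integer bit mask


def to_bits(x, m=M):
    """Coefficient list [x_0, ..., x_{m-1}] of the integer-encoded element x."""
    return [(x >> i) & 1 for i in range(m)]


def build_d_table(b, p=P):
    """Return cols with cols[j][i] = D[i, j], following the recurrence in the text."""
    m = len(b)
    cols = [list(b)]                                  # D[i, 0] = b_i
    for j in range(m - 1):
        prev = cols[j]
        top = prev[m - 1]                             # D[m-1, j]
        nxt = [top]                                   # D[0, j+1] = D[m-1, j]
        for i in range(m - 1):
            nxt.append(prev[i] ^ (p[i + 1] & top))    # D[i+1, j+1]
        cols.append(nxt)
    return cols


def times_alpha(x, m=M, mask=P_MASK):
    """Reference: multiply the integer-encoded element x by alpha, reduced mod P."""
    x <<= 1
    if (x >> m) & 1:
        x ^= mask
    return x


b = 0b1000                                            # B = alpha^3
for j, col in enumerate(build_d_table(to_bits(b))):
    print(j, col)                                     # coordinates of alpha^j * B
```

Under these assumptions, column $j$ holds the polynomial-basis coordinates of $\alpha^j B$, which is what each row of the array needs before it is gated by $a_j$.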


Based on the $D[i, j]$ notation and the property

$$\alpha^m = 1 + p_1\alpha + p_2\alpha^2 + \cdots + p_{m-1}\alpha^{m-1}, \tag{2}$$

each term in Eq. (1) is computed separately as follows:

$$\begin{aligned}
a_0 B &= a_0 (b_0 + b_1\alpha + b_2\alpha^2 + \cdots + b_{m-1}\alpha^{m-1}) \\
&= a_0 (D[0,0] + D[1,0]\alpha + D[2,0]\alpha^2 + \cdots + D[m-1,0]\alpha^{m-1}), \\
a_1\alpha B &= a_1\alpha (D[0,0] + D[1,0]\alpha + D[2,0]\alpha^2 + \cdots + D[m-1,0]\alpha^{m-1}) \\
&= a_1 (D[0,0]\alpha + D[1,0]\alpha^2 + D[2,0]\alpha^3 + \cdots + D[m-2,0]\alpha^{m-1} + D[m-1,0]\alpha^m) \\
&= a_1 \bigl(D[0,0]\alpha + D[1,0]\alpha^2 + D[2,0]\alpha^3 + \cdots + D[m-2,0]\alpha^{m-1} + D[m-1,0](1 + p_1\alpha + p_2\alpha^2 + \cdots + p_{m-1}\alpha^{m-1})\bigr) \\
&= a_1 \bigl(D[m-1,0] + (D[0,0] + p_1 D[m-1,0])\alpha + (D[1,0] + p_2 D[m-1,0])\alpha^2 + \cdots + (D[m-2,0] + p_{m-1} D[m-1,0])\alpha^{m-1}\bigr) \\
&= a_1 (D[0,1] + D[1,1]\alpha + D[2,1]\alpha^2 + \cdots + D[m-1,1]\alpha^{m-1}).
\end{aligned}$$

In general, $a_i\alpha^i B$ ($0 \le i \le m-1$) can be obtained by

$$a_i\alpha^i B = a_i (D[0,i] + D[1,i]\alpha + D[2,i]\alpha^2 + \cdots + D[m-1,i]\alpha^{m-1}). \tag{3}$$

Fig. 1. The semi-systolic array multiplier using a general polynomial.

The coefficients of the product $C$ are produced by summing the corresponding coefficients of each term in Eq. (1). In other words, $c_i$ for $0 \le i \le m-1$ is given by

$$c_i = \sum_{j=0}^{m-1} a_j D[i, j]. \tag{4}$$
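Eq. (4) can be checked against an ordinary multiply-then-reduce routine. The sketch below is a software model only, written under the assumption of $GF(2^4)$ with $P(x) = 1 + x + x^4$ (an illustrative choice); `pb_multiply` and `reference_multiply` are hypothetical names, and the row-by-row loop mirrors the data flow of the array rather than its timing.

```python
M = 4
P = [1, 1, 0, 0]          # p_0..p_{m-1} for P(x) = 1 + x + x^4 (illustrative choice)
P_MASK = 0b10011          # the same polynomial encoded as an integer bit mask


def to_bits(x, m=M):
    """Coefficient list [x_0, ..., x_{m-1}] of the integer-encoded element x."""
    return [(x >> i) & 1 for i in range(m)]


def pb_multiply(a, b, p=P):
    """Product coefficients c_i = sum_j a_j D[i, j] over GF(2), as in Eq. (4)."""
    m = len(a)
    col = list(b)                              # column 0: D[i, 0] = b_i
    c = [0] * m
    for j in range(m):
        for i in range(m):
            c[i] ^= a[j] & col[i]              # accumulate a_j * D[i, j]
        top = col[m - 1]                       # D[m-1, j], fed back to every cell
        col = [top] + [col[i] ^ (p[i + 1] & top) for i in range(m - 1)]
    return c


def reference_multiply(x, y, m=M, mask=P_MASK):
    """Reference: carry-less multiply of integers, then reduce modulo P(x)."""
    acc = 0
    for j in range(m):
        if (x >> j) & 1:
            acc ^= y << j
    for k in range(2 * m - 2, m - 1, -1):      # clear degrees 2m-2 down to m
        if (acc >> k) & 1:
            acc ^= mask << (k - m)
    return acc


# alpha^3 * alpha = alpha^4 = 1 + alpha in this field
print(pb_multiply([0, 0, 0, 1], [0, 1, 0, 0]))
```

The two routines agree for every pair of field elements in this small example, which is a useful sanity check before modelling the error-detecting version.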

The proposed semi-systolic array implementation of the multiplication based on Eq. (4) is shown in Fig. 1. The detailed circuit for cell $U_{i,j}$ in Fig. 1 is described in Fig. 2. The $(j+1)$th row in Fig. 1 consists of the cells $U_{i,j}$ for $0 \le i \le m-1$ and is responsible for computing $a_j\alpha^j B$. The major advantage of the semi-systolic multiplier array is that its latency is only about one-third of that of the corresponding systolic multiplier array. Its drawback is the existence of the global line.

4. The Proposed Concurrent Error Detection Multiplier Over $GF(2^m)$

As mentioned above, $\psi = \{1, \alpha, \alpha^2, \alpha^3, \ldots, \alpha^{m-1}\}$ is a polynomial basis. In [34], the shifted polynomial basis $\psi^* = \{1, \alpha, \alpha^2, \alpha^3, \ldots, \alpha^{m-1}, \alpha^m\}$ was also proven to be a basis. Using the shifted polynomial basis $\psi^*$, we can express two elements $A$ and $B$ as

$$A = a_0 + a_1\alpha + a_2\alpha^2 + \cdots + a_{m-1}\alpha^{m-1} + a_m\alpha^m, \qquad B = b_0 + b_1\alpha + b_2\alpha^2 + \cdots + b_{m-1}\alpha^{m-1} + b_m\alpha^m,$$

where $a_m = b_m = 0$. Multiplying both sides of Eq. (2) by $\alpha$, one has

$$\alpha^{m+1} = \alpha + p_1\alpha^2 + p_2\alpha^3 + \cdots + p_{m-2}\alpha^{m-1} + p_{m-1}\alpha^m. \tag{5}$$

Fig. 2. The circuit for the cell $U_{i,j}$.


Fig. 3. The semi-systolic array multiplier with concurrent error detection.

In the shifted polynomial basis $\psi^*$, the product $C$ of $A$ and $B$ is re-computed using

$$C = AB = (a_0 + a_1\alpha + a_2\alpha^2 + \cdots + a_{m-1}\alpha^{m-1} + a_m\alpha^m)B = a_0 B + a_1\alpha B + a_2\alpha^2 B + \cdots + a_{m-1}\alpha^{m-1} B + a_m\alpha^m B. \tag{6}$$

Let $D[i, j]$ ($0 \le i, j \le m$) and $D$ denote 1-bit latches. The relations between these 1-bit latches are defined as follows: for $0 \le i, j \le m-1$,

$$D[i+1, j+1] = D[i, j] + p_i D[m, j].$$

Some of these latches are assigned initial values as follows: $D[i, 0] = b_i$ for $0 \le i \le m-1$, $D[m, 0] = 0$, and $D[0, j] = 0$ for $1 \le j \le m$. Using the $D[i, j]$ notation, each term in Eq. (6) is expressed, for $0 \le i \le m$, by

$$a_i\alpha^i B = a_i (D[0,i] + D[1,i]\alpha + D[2,i]\alpha^2 + \cdots + D[m,i]\alpha^m).$$

The intermediate result $C$ is also represented in the shifted polynomial basis:

$$C = \bar{c}_0 + \bar{c}_1\alpha + \bar{c}_2\alpha^2 + \cdots + \bar{c}_m\alpha^m,$$

where, for $0 \le i \le m$,

$$\bar{c}_i = \sum_{j=0}^{m} a_j D[i, j]. \tag{7}$$

Since this result is represented in the shifted polynomial basis, a conversion from the shifted polynomial basis $\psi^*$ back to the original polynomial basis $\psi$ is needed. Based on Eqs. (2) and (7), the following conversion must be done:

$$C = c_0 + c_1\alpha + c_2\alpha^2 + \cdots + c_{m-1}\alpha^{m-1},$$

where, for $0 \le i \le m-1$,

$$c_i = \bar{c}_i + p_i \bar{c}_m. \tag{8}$$

In Eq. (8), the $\bar{c}_i$ on the right-hand side and the $c_i$ on the left-hand side are the corresponding coefficients in the bases $\psi^*$ and $\psi$, respectively. The concurrent error detection in the proposed polynomial basis multiplier is based on the RESO method [31, 32]. In the normal computation, the computation follows Eq. (4). In the re-computation, the product $C$ is computed according to Eqs. (7) and (8). The result from the first computation is temporarily stored in a register and then compared with the result from the second computation. A mismatch indicates the presence of an error.
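The compare-after-recompute idea can be sketched in software. The Python model below is an illustration under several assumptions that are ours, not the paper's: the field is $GF(2^4)$ with $P(x) = 1 + x + x^4$; a fault is modelled as a hypothetical stuck-at-1 on the accumulated sum leaving one cell; and physical column $k$ is taken to carry coefficient $k-1$ in the first pass and coefficient $k$ in the second, loosely following the shift-by-one behaviour described for the array. The names `accumulate`, `first_pass`, `second_pass`, and `reso_check` are hypothetical.

```python
P = [1, 1, 0, 0]   # p_0..p_{m-1} for P(x) = 1 + x + x^4 (illustrative choice)


def accumulate(a, col, step, width, fault=None):
    """Sum a_j * (column j) over GF(2); `step` maps column j to column j+1."""
    acc = [0] * width
    for j, a_j in enumerate(a):
        for i in range(width):
            acc[i] ^= a_j & col[i]
        if fault is not None and fault[0] == j and 0 <= fault[1] < width:
            acc[fault[1]] = 1            # assumed stuck-at-1 on this cell's sum output
        col = step(col)
    return acc


def first_pass(a, b, p=P, fault=None):
    """Normal computation in the basis psi, following Eq. (4)."""
    m = len(a)
    step = lambda c: [c[m - 1]] + [c[i] ^ (p[i + 1] & c[m - 1]) for i in range(m - 1)]
    return accumulate(a, list(b), step, m, fault)


def second_pass(a, b, p=P, fault=None):
    """Re-computation in psi* (Eq. (7)), then conversion back to psi (Eq. (8))."""
    m = len(a)
    step = lambda c: [0] + [c[i] ^ (p[i] & c[m]) for i in range(m)]
    cbar = accumulate(a, list(b) + [0], step, m + 1, fault)   # a_m = 0: row m skipped
    return [cbar[i] ^ (p[i] & cbar[m]) for i in range(m)]


def reso_check(a, b, p=P, faulty_cell=None):
    """Return (first-pass product, error flag); faulty_cell = (row j, column k)."""
    f1 = f2 = None
    if faulty_cell is not None:
        j, k = faulty_cell
        f1, f2 = (j, k - 1), (j, k)      # assumed one-position shift between passes
    c1 = first_pass(a, b, p, f1)
    c2 = second_pass(a, b, p, f2)
    return c1, c1 != c2


a, b = [0, 0, 0, 1], [0, 1, 0, 0]        # A = alpha^3, B = alpha
print(reso_check(a, b))                  # fault-free run
print(reso_check(a, b, faulty_cell=(0, 1)))
```

In this toy model, a single faulty cell either leaves the first-pass product correct or makes the two passes disagree, because the same physical fault lands on different coefficients in the two passes. This is only a behavioural analogy to the hardware comparison, not a proof of fault coverage for the actual circuit.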


Fig. 4. The detailed circuit for the cell V.

The semi-systolic array polynomial basis multiplier with concurrent error detection is shown in Fig. 3. The detailed circuits for cells U and V are depicted in Figs. 2 and 4, respectively. Since $a_m$ is equal to 0 in the basis $\psi^*$, the row for $a_m$ in Fig. 3 is eliminated to save space.

As mentioned above, the functional fault model is assumed. Faults are assumed to make their effect felt at the cell level as changed logical output values, so a faulty cell produces erroneous output(s). Throughout this paper, a single faulty cell is assumed. Suppose that the faulty cell is $U_{i,j}$, where $0 \le i, j \le m-1$. The faulty behavior can be classified into the following five cases, each of which is detectable.

(1) Error on $p_{out}$. The output $p_{out}$ of the cell $U_{i,j}$ is a go-through line from the input $p_{in}$, so an error on it is easily detected by comparing $p_{in}$ and $p_{out}$. Therefore, the error on $p_{out}$ can be neglected.

(2) Error on $a_{out}$. The output $a_{out}$ of the cell $U_{i,j}$ is also a go-through line from the input $a_{in}$, so an error on it is easily detected. Therefore, the error on $a_{out}$ can be neglected.

(3) Error on $c_{out}$. In the first computation, this error affects output $c_{i-1}$. In the second computation, it affects $c_i$. Because different output bits are affected in the two computations, the results mismatch and the error is detected.

(4) Error on $b_{out}$. In the first computation, this error affects outputs $c_i, c_{i+1}, \ldots, c_{i+t}$, where $t = \min(m-j-2,\ m-i-2)$. In the second computation, the error affects outputs $c_{i+1}, c_{i+2}, \ldots, c_{i+h}$, where $h = \min(m-j-1,\ m-i)$. Therefore, this error is detected.

(5) Errors on both $c_{out}$ and $b_{out}$. In the first computation, these errors affect outputs $c_{i-1}, c_i, \ldots, c_{i+t}$, where $t = \min(m-j-2,\ m-i-2)$. In the second computation, the errors affect outputs $c_i, c_{i+1}, \ldots, c_{i+h}$, where $h = \min(m-j-1,\ m-i)$. Therefore, these errors are detectable.

If the faulty cell is $U_{m,j}$ in the last column of Fig. 3, where $0 \le j \le m-1$, the errors on its outputs $p_{out}$ and $a_{out}$ are ignored. An error on the output $c_{out}$ affects $c_{m-1}$ and $c_m$ in the first and the second computations, respectively, and can thus be detected. However, the output $b_{out}$ of the cell $U_{m,j}$ is a global line and thus affects nearly all cells in the next stage. Therefore, an extra column is added for detecting errors on the $b_{out}$ outputs of the cells $U_{m,j}$ ($0 \le j \le m-1$). Two comparators are employed for comparing the results from the first and second computations.

The multiplier without concurrent error detection in Fig. 1 requires $2m^2$ AND gates, $2m^2$ XOR gates, and $2m^2$ D flip-flops. The multiplier with concurrent error detection in Fig. 3 requires about $2m^2 + 5m$ AND gates, $2m^2 + 7m$ XOR gates, and $2m^2 + 6m$ D flip-flops. The proposed polynomial basis multiplier with concurrent error detection thus adds only a small space overhead compared with a multiplier without concurrent error detection. Moreover, it requires only two extra clock cycles compared with the original multiplier in Fig. 1. Table 1 compares various polynomial basis multipliers in $GF(2^m)$ with error detection capability.

Suppose that $k$-out-of-$2k$ totally self-checking (TSC) checkers [35] are used as the comparators in Fig. 3, with $m-1$ serial 2-out-of-4 TSC checkers for Comparator-1 and one $m$-out-of-$2m$ TSC checker for Comparator-2. Minimal two-level implementations of these TSC checkers are also assumed. Transistor counts are taken for a standard CMOS VLSI realization, in which an inverter, a $k$-input AND, a $k$-input OR, a $k$-input XOR, and a 1-bit latch are composed of 2, $2k+2$, $2k+2$, $2k+2$, and 8 transistors, respectively [36–39]. The comparison results for multipliers with and without concurrent error detection are given in Table 2. If the extra checkers are not included, the proposed multiplier with concurrent error detection requires 1.4% extra transistors (for example, for $m = 131$). If both extra checkers are included, it needs about 120% extra transistors. Most of the extra space cost is due to the totally self-checking checkers. Table 3 shows the extra area overhead for various multipliers with different $m$ values. The proposed polynomial basis multiplier with concurrent error detection using the RESO method takes 2 extra clock cycles and requires $88m^2 + 156m - 56$ transistors if totally self-checking checkers are included. For comparison, our previously proposed dual basis multiplier with concurrent error detection using the parity check method takes one extra clock cycle and needs $122m^2 + 66m - 60$ transistors [29].
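The area figures can be reproduced with a few lines of arithmetic. The Python sketch below uses only the transistor model and the closed-form totals quoted in the text and in Table 2; the function names (`gate`, `fig1_transistors`, `fig3_transistors`, `overhead_percent`) are hypothetical, and the Fig. 3 totals are taken as given rather than re-derived from the gate list.

```python
LATCH = 8                      # transistors per 1-bit latch (model quoted in the text)


def gate(k):
    """Transistors in a k-input AND, OR, or XOR gate under the 2k + 2 model."""
    return 2 * k + 2


def fig1_transistors(m):
    """Fig. 1 array: 2m^2 AND2 + 2m^2 XOR2 + 2m^2 one-bit latches."""
    return 2 * m * m * gate(2) + 2 * m * m * gate(2) + 2 * m * m * LATCH


def fig3_transistors(m, with_checkers=False):
    """Closed forms quoted in the text and Table 2 for the Fig. 3 multiplier."""
    if with_checkers:
        return 88 * m * m + 156 * m - 56
    return 40 * m * m + 74 * m + 8


def overhead_percent(m, with_checkers=False):
    """Extra transistors of Fig. 3 relative to Fig. 1, in percent."""
    base = fig1_transistors(m)
    return 100.0 * (fig3_transistors(m, with_checkers) - base) / base


for m in (54, 131, 500, 1210):
    print(m, fig1_transistors(m), fig3_transistors(m),
          fig3_transistors(m, True),
          round(overhead_percent(m), 2), round(overhead_percent(m, True), 1))
```

The gate counts for Fig. 1 sum to exactly $40m^2$ transistors under this model. Evaluating the closed form with checkers at $m = 500$ gives 22,077,944, slightly above the figure printed in Table 3, which may have been rounded.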


Table 1. Comparison of time overhead for various polynomial basis multipliers with error detection.

Multipliers            Fenn et al. [27]             Reyhani-Masoleh and Hasan [28]   Fig. 3
Output                 Bit-serial output            Bit-parallel output              Bit-parallel output
Generated polynomial   General form                 General form                     General form
Error detection        Concurrent error detection   Error detection                  Concurrent error detection
Time overhead          1 XOR gate delay             log2 m XOR gate delays           2 clock cycles

Table 2. Comparison of area overhead for various polynomial basis multipliers with and without error detection capability.

Multipliers                  Multiplier without concurrent     The proposed multiplier with concurrent
                             error detection in Fig. 1         error detection in Fig. 3

Multiplier array itself
  Gate count                 2m^2 AND2                         2m^2 + 2m AND2
                             2m^2 XOR2                         m AND3
                             2m^2 1-bit latches                2m^2 + 5m XOR2
                                                               2m^2 + 6m + 1 1-bit latches
  Transistor count           40m^2                             40m^2 + 74m + 8

Extra comparator circuits
  Gate count                 0                                 2m inverters
                                                               4m-4 AND2
                                                               4m-1 OR2
                                                               m-1 AND4
                                                               m-1 OR4
                                                               2m ANDm
                                                               2m ORm
                                                               1 AND2m
                                                               1 OR2m
                                                               3m-3 1-bit latches
  Transistor count           0                                 48m^2 + 82m − 64

Total space cost
  Transistor count           40m^2                             88m^2 + 156m − 56

Note: ANDk: k-input AND gate, XORk: k-input XOR gate, ORk: k-input OR gate.

Table 3. Comparison of area overhead for various polynomial basis multipliers with different m values (number of transistors).

Multipliers                                             m = 54    m = 131    m = 500     m = 1210
The multiplier in Fig. 1                                116640    686440     10000000    58564000
The multiplier in Fig. 3 (not including comparators)    120644    696142     10037008    58653548
The multiplier in Fig. 3 (including comparators)        264976    1530548    22000000    129029504
Extra area overhead (not including comparators)         3.4%      1.4%       0.37%       0.2%
Extra area overhead (including comparators)             127%      122%       120%        120%

5. Conclusions

A semi-systolic array polynomial basis multiplier using a general polynomial has been presented. By employing the concept of recomputing with shifted operands [31–33], which is based on time redundancy, a semi-systolic array polynomial basis multiplier with concurrent error detection capability was produced. A single faulty cell is assumed under the functional fault model. Both permanent and intermittent faults in the faulty cell can be detected. The proposed multiplier with concurrent error detection capability adds only a small space overhead compared with a multiplier without concurrent error detection. Furthermore, it requires only two extra clock cycles compared with a multiplier without concurrent error detection. Most existing multipliers with a parity check require at least $\log_2 m$ gate delays for error detection. This is a long delay if this type of multiplier is applied in cryptographic computations. Therefore, the parity check method is better suited to concurrent error detection in bit-serial multipliers with bit-serial output. Our proposed method provides concurrent error detection capability for polynomial basis multipliers with bit-parallel output. If totally self-checking checkers are not considered, the proposed multiplier with concurrent error detection capability requires only a little extra space, for example, 1.4% for $m = 131$. If totally self-checking checkers are included, the proposed polynomial basis multiplier employing the RESO method requires about $88m^2$ transistors, while the dual basis multiplier using the parity check method in [29] needs about $122m^2$ transistors.

Acknowledgments

The authors would like to thank the anonymous referees and the editor for carefully reading the paper and for their great help in improving it. The work was supported in part by the National Science Council of the Republic of China under grant numbers NSC 93-2213-E-231-012, NSC 93-2218-E231-005, and NSC 94-2213-E231-021.

References

1. F.J. MacWilliams and N.J.A. Sloane, The Theory of Error-Correcting Codes, Amsterdam, North-Holland, 1977.
2. R. Lidl and H. Niederreiter, Introduction to Finite Fields and Their Applications, Cambridge Univ. Press, New York, 1994.
3. R.E. Blahut, Fast Algorithms for Digital Signal Processing, Addison-Wesley, Reading, Mass., 1985.
4. I.S. Reed and T.K. Truong, “The Use of Finite Fields to Compute Convolutions,” IEEE Trans. Information Theory, Vol. IT-21, No. 2, pp. 208–213, 1975.
5. B. Benjauthrit and I.S. Reed, “Galois Switching Functions and Their Applications,” IEEE Trans. Computers, Vol. C-25, pp. 78–86, Jan. 1976.
6. C.C. Wang and D. Pei, “A VLSI Design for Computing Exponentiation in $GF(2^m)$ and its Application to Generate Pseudorandom Number Sequences,” IEEE Trans. Computers, Vol. 39, No. 2, pp. 258–262, Feb. 1990.
7. T.C. Bartee and D.J. Schneider, “Computation with Finite Fields,” Information and Control, Vol. 6, pp. 79–98, Mar. 1963.
8. E.D. Mastrovito, “VLSI Architectures for Multiplication Over Finite Field $GF(2^m)$,” Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes, Proc. Sixth Int’l Conf., AAECC-6, T. Mora (Ed.), Rome, July 1988, pp. 297–309.
9. Ç.K. Koç and B. Sunar, “Low-Complexity Bit-Parallel Canonical and Normal Basis Multipliers for a Class of Finite Fields,” IEEE Trans. Computers, Vol. 47, No. 3, pp. 353–356, 1998.
10. C.Y. Lee, “Low Complexity Bit-Parallel Systolic Multiplier Over $GF(2^m)$ using Irreducible Trinomials,” IEE Proc.-Comput. Digit. Tech., Vol. 150, No. 1, pp. 39–42, Jan. 2003.
11. T. Itoh and S. Tsujii, “Structure of Parallel Multipliers for a Class of Fields $GF(2^m)$,” Information and Computation, Vol. 83, pp. 21–40, 1989.
12. M.A. Hasan, M. Wang, and V.K. Bhargava, “Modular Construction of Low Complexity Parallel Multipliers for a Class of Finite Fields $GF(2^m)$,” IEEE Trans. Computers, Vol. 41, No. 8, pp. 962–971, 1992.


13. C.Y. Lee, E.H. Lu, and J.Y. Lee, “Bit-Parallel Systolic Multipliers for $GF(2^m)$ Fields Defined by All-One and Equally-Spaced Polynomials,” IEEE Trans. Computers, Vol. 50, No. 5, pp. 385–393, 2001.
14. C. Paar, “A New Architecture For a Parallel Finite Field Multiplier with Low Complexity Based on Composite Fields,” IEEE Trans. Computers, Vol. 45, No. 7, pp. 856–861, 1996.
15. C.W. Chiou, L.C. Lin, F.H. Chou, and S.F. Shu, “Low Complexity Finite Field Multiplier Using Irreducible Trinomials,” Electronics Letters, Vol. 39, No. 24, pp. 1709–1711, 2003.
16. J.L. Massey and J.K. Omura, “Computational method and apparatus for finite field arithmetic,” U.S. Patent Number 4,587,627, May 1986.
17. A. Reyhani-Masoleh and M.A. Hasan, “A New Construction of Massey-Omura Parallel Multiplier Over $GF(2^m)$,” IEEE Trans. Computers, Vol. 51, No. 5, pp. 511–520, 2002.
18. E.R. Berlekamp, “Bit-Serial Reed-Solomon Encoders,” IEEE Trans. Information Theory, Vol. IT-28, pp. 869–874, 1982.
19. H. Wu, M.A. Hasan, and I.F. Blake, “New Low-Complexity Bit-Parallel Finite Field Multipliers Using Weakly Dual Bases,” IEEE Trans. Computers, Vol. 47, No. 11, pp. 1223–1234, November 1998.
20. J. Kelsey, B. Schneier, D. Wagner, and C. Hall, “Side-Channel Cryptanalysis of Product Ciphers,” Proc. of ESORICS, Springer, pp. 97–110, Sep. 1998.
21. E. Biham and A. Shamir, “Differential Fault Analysis of Secret Key Cryptosystems,” Proceedings of Crypto, Springer LNCS 1294, pp. 513–525, 1997.
22. D. Boneh, R. DeMillo, and R. Lipton, “On the Importance of Checking Cryptographic Protocols for Faults,” Proc. of Eurocrypt, Springer LNCS 1233, pp. 37–51, 1997.
23. R. Karri, G. Kuznetsov, and M. Goessel, “Parity-Based Concurrent Error Detection of Substitution-Permutation Network Block Ciphers,” Proc. of CHES 2003, Springer LNCS 2779, pp. 113–124, 2003.
24. G. Bertoni, L. Breveglieri, I. Koren, P. Maistri, and V. Piuri, “Error Analysis and Detection Procedures for a Hardware Implementation of the Advanced Encryption Standard,” IEEE Trans. Computers, Vol. 52, No. 4, pp. 492–505, 2003.
25. M. Joye, A. Lenstra, and J.-J. Quisquater, “Chinese remaindering based cryptosystems in the presence of faults,” Journal of Cryptology, Vol. 12, pp. 241–245, 1999.
26. D. Boneh, R. DeMillo, and R.J. Lipton, “On The Importance of Eliminating Errors in Cryptographic Computations,” Journal of Cryptology, Vol. 14, pp. 101–119, 2001.
27. S. Fenn, M. Gossel, M. Benaissa, and D. Taylor, “On-line Error Detection for Bit-Serial Multipliers in $GF(2^m)$,” Journal of Electronic Testing: Theory and Applications, Vol. 13, pp. 29–40, 1998.
28. A. Reyhani-Masoleh and M.A. Hasan, “Error Detection in Polynomial Basis Multipliers Over Binary Extension Fields,” Proc. of Cryptographic Hardware and Embedded Systems-CHES 2002, LNCS 2523, pp. 515–528, 2003.
29. C.-Y. Lee, C.W. Chiou, and J.-M. Lin, “Concurrent Error Detection in A Bit-Parallel Systolic Multiplier for Dual Basis of $GF(2^m)$,” Journal of Electronic Testing: Theory and Applications, Vol. 21, No. 5, pp. 539–549, 2005.
30. C.W. Chiou, “Concurrent Error Detection in Array Multipliers for $GF(2^m)$ Fields,” IEE Electronics Letters, Vol. 38, No. 14, pp. 688–689, 4th July 2002.
31. J.H. Patel and L.Y. Fung, “Concurrent Error Detection in ALU’s by Recomputing with Shifted Operands,” IEEE Trans. Computers, Vol. C-31, No. 7, pp. 589–595, 1982.
32. J.H. Patel and L.Y. Fung, “Concurrent Error Detection in Multiply and Divide Arrays,” IEEE Trans. Computers, Vol. C-32, No. 4, pp. 417–422, 1983.
33. R.H. Minero, A.J. Anello, R.G. Furey, and L.R. Palounek, “Checking by Pseudoduplication,” U.S. Patent 3660646, May 1972.
34. Applications of Finite Fields, A.J. Menezes (Ed.), Kluwer Academic, Boston, 1993.
35. P.K. Lala, Fault Tolerant and Fault Testable Hardware Design, London, Prentice-Hall International, Inc., 1985.


36. K.Z. Pekmestzi, “Multiplexer-Based Array Multipliers,” IEEE Trans. Computers, Vol. 48, No. 1, pp. 15–23, 1999.
37. R.J. Baker, H.W. Li, and D.E. Boyce, CMOS: Circuit Design, Layout, and Simulation, IEEE Press, New York, 1998.
38. S.M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits: Analysis and Design, McGraw-Hill, 1999.
39. G.-Y. Byun and H.-S. Kim, “Low-Complexity Multiplexer-Based Multiplier of $GF(2^m)$,” IEICE Trans. Infor. & Syst., Vol. E86-D, No. 12, pp. 2684–2690, 2003.

Chiou-Yng Lee received the Bachelor’s degree in medical engineering (1986) and the M.S. degree in electronic engineering (1992), both from Chung Yuan University, Taiwan, and the Ph.D. degree in electrical engineering from Chang Gung University, Taiwan, in 2001. Since 1988, he has been a research associate with Chunghwa Telecommunication Laboratory in Taiwan, where he joined the department of project planning. He has taught courses in related fields at Ching-Yun Technology University. He is currently an assistant professor in the Department of Computer Information and Network Engineering at Lunghwa University of Science and Technology. His research interests include computations in finite fields, error-control coding, signal processing, and digital transmission systems. He is a member of the IEEE and the IEEE Computer Society. He became an honorary member of Phi Tau Phi in 2001.

Che Wun Chiou received his B.S. degree in Electronic Engineering from Chung Yuan Christian University in 1982, and the M.S. and Ph.D. degrees in Electrical Engineering from National Cheng Kung University in 1984 and 1989, respectively. From 1990 to 2000, he was with the Chung Shan Institute of Science and Technology in Taiwan. He joined the Department of Electronic Engineering and the Department of Computer Science and Information Engineering, Ching Yun University, in 2000 and 2005, respectively. He is currently Dean of the Division of Continuing Education at Ching Yun University. His current research interests include fault-tolerant computing, computer arithmetic, parallel processing, and cryptography.

Jim-Min Lin was born on March 5, 1963 in Taipei, Taiwan. He received the B.S. degree in Engineering Science and the M.S. and Ph.D. degrees in Electrical Engineering, all from National Cheng Kung University, Tainan, Taiwan, in 1985, 1987, and 1992, respectively. Since February 1993, he has been an Associate Professor at the Department of Information Engineering and Computer Science, Feng Chia University, Taichung City, Taiwan. He is currently a Professor in the same department. His research interests include operating systems, software integration and reuse, embedded systems, software agent technology, and testable design.