Concurrent Error Detection for Involutional Functions with Applications in Fault Tolerant Cryptographic Hardware Design

Nikhil Joshi1, Kaijie Wu2, Jayachandran Sundararajan1, Ramesh Karri1
1 ECE Department, Polytechnic University, Brooklyn NY, {njoshi01@utopia, jsunda01@utopia, ramesh@india}.poly.edu
2 ECE Department, University of Illinois at Chicago, [email protected]

Abstract

In this paper we present a time redundancy based concurrent error detection (CED) technique targeting involutional functions. A function F is an involution if F(F(x))=x. The proposed CED technique exploits the involution property and checks if x=F(F(x)). Unlike traditional time redundancy based CED methods, this technique can detect both permanent and transient faults.

Keywords: Concurrent Error Detection (CED), Involution, KHAZAD, ANUBIS

1. INTRODUCTION

Faults that occur in VLSI chips can broadly be classified into two categories: transient faults, which die away after some time, and permanent faults, which do not die away with time but remain until they are repaired or the faulty component is replaced. These faults may originate from internal phenomena in the system, such as threshold changes, shorts and opens, or from external influences such as electromagnetic radiation. They affect the memory as well as the combinational parts of a circuit and can only be detected using Concurrent Error Detection (CED) methods. This is especially true for sensitive devices such as cryptographic chips [1][2][3][4][5].

The most straightforward methods of performing CED are hardware redundancy and time redundancy. In hardware redundancy, two copies of the hardware are used concurrently to perform the same computation on the same data [6]. At the end of each computation, the results are compared and any discrepancy is reported as an error. The advantage of this technique is that it has minimum error detection latency and can detect both transient and permanent faults. Its drawback is that it entails ≥100% hardware overhead. In time redundancy, the same hardware is used to perform both the normal computation and the re-computation on the same input data. The advantage of this technique is that it uses minimum hardware. Its drawbacks are that it entails ≥100% time overhead and that it can only detect transient faults.

In this paper, we propose a time redundancy based CED technique for involutional functions that can detect both transient and permanent faults. A function F is an involution if F(F(x))=x, ∀x in the domain of F. The CED technique proposed in this paper exploits this involution property and checks if x=F(F(x)) to detect permanent and transient faults.

The paper is organized as follows. In section 2 we describe the involution based time redundancy CED technique. In section 3 we adapt the proposed CED technique to functions composed of involutions, using the involutional symmetric block ciphers KHAZAD and ANUBIS as running examples. The error detection capabilities of the proposed methods are evaluated by fault-injection simulation in section 4. We modeled the various CED implementations in VHDL (a hardware description language) and synthesized them using the Cadence ASIC synthesis tool PKS; the results of this implementation are presented in section 5. Additional optimizations that can be performed for cryptographic hardware are discussed in section 6. Conclusions are reported in section 7.

2. CONCURRENT ERROR DETECTION OF INVOLUTION FUNCTIONS

If a hardware module implements an involution F, faults in this module can be detected by checking if x=F(F(x)). Figure 1 shows this basic idea. At the beginning of the involution F, the input x is buffered in a register. The output F(x) of the module implementing F is saved in another register and then fed back to the input of the module F through a multiplexer controlled by the CED control signal. The result of this second computation, namely F(F(x)), is compared to the original input x, with a mismatch indicating an error.
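The register, feedback and compare structure described above can be sketched in software. This is a minimal model rather than the authors' hardware: `ced_involution`, `complement8` and `complement8_stuck` are hypothetical names, and the 8-bit complement merely stands in for an arbitrary involution F.

```python
from typing import Callable, Tuple


def ced_involution(module: Callable[[int], int], x: int) -> Tuple[int, bool]:
    """Run `module` twice and compare F(F(x)) with the buffered input x.

    Returns (y, error): y = F(x) is the normal result, and `error` is True
    when the re-computation does not reproduce x.
    """
    x_reg = x                  # input register: buffers x
    y_reg = module(x_reg)      # first pass: normal computation F(x)
    z = module(y_reg)          # second pass: F(x) fed back through the multiplexer
    return y_reg, z != x_reg   # comparator: a mismatch signals an error


def complement8(v: int) -> int:
    """Hypothetical example involution: bitwise complement on 8 bits."""
    return ~v & 0xFF


def complement8_stuck(v: int) -> int:
    """The same module with output bit 0 permanently stuck at 1."""
    return complement8(v) | 0x01


y, err = ced_involution(complement8, 0x3C)            # fault-free module
y_f, err_f = ced_involution(complement8_stuck, 0x3C)  # faulty module
```

In the faulty case the first-pass output happens to be correct (bit 0 of F(x) is already 1), yet the re-computation returns 0x3D instead of 0x3C, so the mismatch is still flagged.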

Figure 1: Concurrent Error Detection for an involution F

We will now outline four example involutions. Consider the two 4-bit input, 4-bit output functions P and Q shown in Table 1. Notice that P(P(x)) = x and Q(Q(x)) = x, ∀x. In addition, P(x) ≠ x and Q(x) ≠ x. A careful analysis of the P and Q functions shows that all single-bit faults, all two-bit faults and so on can be detected by the proposed CED technique. For example, with an input x = (0000)2, a correct operation of P(x) yields (0011)2 and P(P(x)) = (0000)2 = x. Consider a fault inside the P-box that makes the P-box output any number other than (0011)2. Feeding this wrong output back to the input of the P-box yields a number that is not equal to (0000)2, thus detecting the fault.

x      0 1 2 3 4 5 6 7 8 9 A B C D E F
P(x)   3 F E 0 5 4 B C D A 9 6 7 8 2 1
Q(x)   9 E 5 6 A 2 3 C F 0 4 D 7 B 1 8

Table 1: 4x4 involutions P and Q

Next, consider the 8-bit input, 8-bit output S function shown in Figure 2, composed using involutions P and Q. Once again the proposed CED technique can detect all faults in S. For example, consider a correct S function that takes (0000, 0000)2 as the input. After the first level of P-box and Q-box, the result is (0011, 1001)2. The bits of the two parts are exchanged and the input to the second level of Q-box and P-box is (0010, 1101)2. The output (0101, 1000)2 then has its bits exchanged and is sent to the third level. The final output of the S function is (1011, 1010)2. Feeding this output back as the input yields (0000, 0000)2. Now assume there is a single-bit fault in the P-box at the third level, so that it outputs (1010)2 instead of the correct output (1011)2. Feeding the wrong output (1010, 1010)2 back to this function yields (1111, 1110)2. Comparing this to the original input (0000, 0000)2 detects the fault.
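The worked example can be replayed with a small software model of S. This is a sketch under one stated assumption: the bit-exchange wiring in `exchange` is not spelled out in the text and is inferred here from the intermediate values of the example. The helper names `exchange`, `sbox` and `ced_detects` are hypothetical.

```python
# 4-bit involutions from Table 1 (list index = input x, entry = output).
P = [0x3, 0xF, 0xE, 0x0, 0x5, 0x4, 0xB, 0xC, 0xD, 0xA, 0x9, 0x6, 0x7, 0x8, 0x2, 0x1]
Q = [0x9, 0xE, 0x5, 0x6, 0xA, 0x2, 0x3, 0xC, 0xF, 0x0, 0x4, 0xD, 0x7, 0xB, 0x1, 0x8]


def exchange(hi, lo):
    """Bit exchange between levels (assumed wiring): the upper nibble keeps its
    two high bits and receives the two high bits of the lower nibble; the lower
    nibble receives the two low bits of each nibble."""
    return (hi & 0xC) | (lo >> 2), ((hi & 0x3) << 2) | (lo & 0x3)


def sbox(u):
    """8-bit S function built from three levels of P and Q boxes."""
    hi, lo = P[u >> 4], Q[u & 0xF]   # level 1: P on the upper nibble, Q on the lower
    hi, lo = exchange(hi, lo)
    hi, lo = Q[hi], P[lo]            # level 2: Q on the upper nibble, P on the lower
    hi, lo = exchange(hi, lo)
    hi, lo = P[hi], Q[lo]            # level 3: same arrangement as level 1
    return (hi << 4) | lo


def ced_detects(first_pass_output, original_input):
    """Feed a (possibly faulty) first-pass output back through S and compare."""
    return sbox(first_pass_output) != original_input


pq_are_involutions = all(P[P[x]] == x and Q[Q[x]] == x for x in range(16))
s_is_involution = all(sbox(sbox(u)) == u for u in range(256))
```

With this wiring `sbox(0x00)` gives 0xBA, i.e. (1011, 1010)2 as in the example, and `ced_detects(0xAA, 0x00)` reports the corrupted output (1010, 1010)2 as an error.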

Figure 2: S-Box used in the non-linear function γ of KHAZAD and ANUBIS (three levels of 4-bit boxes: P and Q, then Q and P, then P and Q)

Consider the transposition function τ of a matrix a: τ(a) = b ⇒ bij = aji. This is an involution as well and hence the proposed CED technique can detect all faults in it.

Consider the linear mapping functions θ1 and θ2 defined as θ1(a) = b ⇔ b = a × H1, where

H1 =
  [ 0x01 0x03 0x04 0x05 0x06 0x08 0x0B 0x07 ]
  [ 0x03 0x01 0x05 0x04 0x08 0x06 0x07 0x0B ]
  [ 0x04 0x05 0x01 0x03 0x0B 0x07 0x06 0x08 ]
  [ 0x05 0x04 0x03 0x01 0x07 0x0B 0x08 0x06 ]
  [ 0x06 0x08 0x0B 0x07 0x01 0x03 0x04 0x05 ]
  [ 0x08 0x06 0x07 0x0B 0x03 0x01 0x05 0x04 ]
  [ 0x0B 0x07 0x06 0x08 0x04 0x05 0x01 0x03 ]
  [ 0x07 0x0B 0x08 0x06 0x05 0x04 0x03 0x01 ]

or as θ2(a) = b ⇔ b = a × H2, where

H2 =
  [ 0x01 0x02 0x04 0x06 ]
  [ 0x02 0x01 0x06 0x04 ]
  [ 0x04 0x06 0x01 0x02 ]
  [ 0x06 0x04 0x02 0x01 ]
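That multiplying twice by H1 or by H2 returns the original vector can be checked numerically. The sketch below assumes byte arithmetic in GF(2^8) with the reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D); the text above does not state the polynomial, so treat it as an assumption taken from the cipher specifications. The function names are hypothetical.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8) modulo `poly` (assumed reduction polynomial)."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly       # reduce as soon as the degree reaches 8
        b >>= 1
    return r


def theta(a, H):
    """Row vector a times matrix H over GF(2^8)."""
    n = len(H)
    out = []
    for j in range(n):
        acc = 0
        for k in range(n):
            acc ^= gf_mul(a[k], H[k][j])
        out.append(acc)
    return out


def is_involution_matrix(H):
    """True when theta(theta(e)) == e for every unit vector e, i.e. H x H = I."""
    n = len(H)
    for i in range(n):
        unit = [1 if k == i else 0 for k in range(n)]
        if theta(theta(unit, H), H) != unit:
            return False
    return True


H1 = [
    [0x01, 0x03, 0x04, 0x05, 0x06, 0x08, 0x0B, 0x07],
    [0x03, 0x01, 0x05, 0x04, 0x08, 0x06, 0x07, 0x0B],
    [0x04, 0x05, 0x01, 0x03, 0x0B, 0x07, 0x06, 0x08],
    [0x05, 0x04, 0x03, 0x01, 0x07, 0x0B, 0x08, 0x06],
    [0x06, 0x08, 0x0B, 0x07, 0x01, 0x03, 0x04, 0x05],
    [0x08, 0x06, 0x07, 0x0B, 0x03, 0x01, 0x05, 0x04],
    [0x0B, 0x07, 0x06, 0x08, 0x04, 0x05, 0x01, 0x03],
    [0x07, 0x0B, 0x08, 0x06, 0x05, 0x04, 0x03, 0x01],
]
H2 = [
    [0x01, 0x02, 0x04, 0x06],
    [0x02, 0x01, 0x06, 0x04],
    [0x04, 0x06, 0x01, 0x02],
    [0x06, 0x04, 0x02, 0x01],
]

sample = [0x10, 0x32, 0x54, 0x76, 0x98, 0xBA, 0xDC, 0xFE]
round_trip = theta(theta(sample, H1), H1)   # should reproduce `sample`
```

Both matrices have the form H[i][j] = a[i XOR j] with first-row entries that XOR to 0x01, which is why the square collapses to the identity in characteristic 2.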

Since matrices H1 and H2 are symmetric and unitary, θ1 and θ2 are involutions [12][14]. We performed an exhaustive simulation and found that all faults in these involutions can be detected using the proposed CED as well.

Finally, consider the exclusive-or function with one of its inputs fixed. Although this is an involution, the proposed CED scheme can detect only 50% of all possible faults. To see why, consider a 64-bit exclusive-or module with one of its inputs fixed. If the module has a stuck-at fault at one of its output bits and produces a faulty output because of it, the re-computation can still regenerate the input that was originally applied. Consider (0000 0000 0000 0000)16 as the 64-bit input to the exclusive-or module with (0000 0000 0000 0001)16 as the fixed input. The correct 64-bit output is (0000 0000 0000 0001)16. If the Least Significant Bit (LSB) of the exclusive-or output is stuck-at-0, we obtain the faulty output (0000 0000 0000 0000)16. When we apply this faulty output to the exclusive-or module again, the stuck-at-0 LSB once more forces the result to (0000 0000 0000 0000)16, which equals the original 64-bit input. Involution based CED fails to detect such faults.

This can be solved by interchanging the operands applied to the exclusive-or module during the normal and the re-computation phases as follows. Divide the 64-bit input to the exclusive-or module into two halves. Similarly, divide the 64-bit exclusive-or module into two halves. During the normal computation, the left half of the input is applied to the left half of the exclusive-or module and the right half of the input is applied to the right half of the exclusive-or module. During re-computation, the inputs and the modules to which they are applied are interchanged. By doing so, all permanent faults in the module can be detected.

An inherent advantage of the proposed involution based CED method is that it can detect permanent faults in a function even though the faults might not affect the output, i.e. are not activated by the current inputs. Consider a situation where a faulty bit is stuck-at-1 and the output at that bit was supposed to be logic '1'. In this case, although the output is correct, the fault will be detected because the involution will not yield the correct result. This enhances the security of the implementation, since any attempt to clandestinely attack the algorithm can be detected. It also improves the overall fault coverage as well as the error detection latency.

3. INVOLUTIONAL TIME REDUNDANCY FOR BLOCK CIPHERS KHAZAD AND ANUBIS

We will now apply the proposed CED technique to involutional functions from the domain of cryptography, namely involutional substitution-permutation-network based block ciphers. A substitution-permutation-network (SPN) symmetric block cipher is composed of several rounds, and each round consists of a non-linear substitution function, a linear diffusion function and a key-mixing function. The linear diffusion function ensures that after a few rounds all the output bits depend on all the input bits. The non-linear function ensures that this dependency is complex and non-linear. The key-mixing function adds the round keys (derived from the user-provided secret key).

In traditional symmetric block ciphers, the functions used in decryption are the inverse of those used in encryption. In involutional block ciphers, on the other hand, all functions are involutions and therefore the functions used in decryption are identical to those used in encryption. The only difference between encryption and decryption is that the round keys used in decryption are the inverse of those used in encryption. Besides the implementation benefit (the same hardware can be used for both encryption and decryption), an involutional structure implies equivalent security for both encryption and decryption [10]. Several involutional SPN symmetric block ciphers have been proposed and analyzed [11][12][13][14][15]. Figure 3 shows the involutional SPN ciphers KHAZAD [12] and ANUBIS [14].

Figure 3: (a) 64-bit KHAZAD involutional cipher (b) round function of KHAZAD ρ(x) = σKey(θ1(γ(x))) (c) 128-bit ANUBIS involutional cipher and (d) round function of ANUBIS ρ[Key](x) = σKey(θ2(τ(γ(x))))

KHAZAD encrypts a block of 64-bit plaintext using 8 identical rounds. A KHAZAD round, shown in Figure 3 (b), is composed of an involutional non-linear substitution function γ that uses eight identical copies of the 8×8 involution S described earlier. This is followed by the involutional linear diffusion function θ1, also described earlier, and a 64-bit exclusive-or function that mixes in the round key (σKey). We used the hardware implementation for θ1 described in [16].

ANUBIS encrypts a block of 128-bit plaintext using 14 rounds. An ANUBIS round, shown in Figure 3 (d), is composed of an involutional non-linear substitution function γ that uses sixteen identical copies of the 8×8 involutional S function. This is followed by the matrix transposition function τ that was described earlier. The τ function permutes the input bytes, and a hardware implementation just changes the wire order, costing no hardware. For simplicity we ignore this function in the rest of this paper. This is followed by the involutional linear diffusion function θ2 described earlier and a 128-bit exclusive-or function that mixes in the round key (σKey). We used the hardware implementation for θ2 described in [17].

Since all round functions are involutions, γ(γ(x)) = x, τ(τ(x)) = x, θ1(θ1(x)) = x, θ2(θ2(x)) = x and σ(σ(x)) = x, ∀x. Since KHAZAD and ANUBIS have almost identical structure, we will use KHAZAD as the running example to explain involutional time redundancy. We will show how involutional time redundancy can be incorporated into an iterative implementation of KHAZAD. Since the KHAZAD round itself is not involutional, we split the round function into three stages to facilitate involutional time redundancy based CED. In this modified KHAZAD architecture, each KHAZAD round takes three clock cycles to finish, with round operation γ, round operation θ and round operation σ each taking one clock cycle. In contrast to the 1.18 Gbps throughput and 26922 units of area of the single stage per round implementation of KHAZAD, this three-stage per round implementation has a throughput of 0.89 Gbps (a 33% degradation, measured relative to the three-stage throughput) and an area of 28697 units (a 7% area overhead).

The KHAZAD architecture with involutional CED is shown in Figure 4. This involutional CED can detect permanent faults in addition to transient faults. This is because, although the same module is used twice, the data that it operates on is different in each case. Three additional registers and three additional comparators are the main sources of the area overhead associated with this technique.
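Why the data fed to a module during re-computation matters can be illustrated on the σ stage, which is a 64-bit exclusive-or with the round key. The sketch below is an illustrative software model, not the authors' datapath: `make_xor_module` models the exclusive-or hardware with optional stuck-at-0 output bits, and the three checkers compare plain re-computation, the involutional check, and the involutional check with the two 32-bit halves interchanged as described in section 2. All names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def swap_halves(v):
    """Exchange the left and right 32-bit halves of a 64-bit word."""
    return ((v << 32) | (v >> 32)) & MASK64


def make_xor_module(stuck_at_0_mask=0):
    """Model of the 64-bit XOR hardware; output bits set in the mask are stuck at 0."""
    def xor_module(a, k):
        return (a ^ k) & ~stuck_at_0_mask & MASK64
    return xor_module


def normal_tr(mod, x, key):
    """Normal time redundancy: compute twice on the same operands and compare."""
    return mod(x, key) != mod(x, key)


def involutional(mod, x, key):
    """Involutional check: feed the output back and compare with the input x."""
    y = mod(x, key)
    return mod(y, key) != x


def involutional_swapped(mod, x, key):
    """Involutional check with the operand halves interchanged during re-computation."""
    y = mod(x, key)
    z = swap_halves(mod(swap_halves(y), swap_halves(key)))
    return z != x


good = make_xor_module()
lsb_stuck = make_xor_module(0x1)               # permanent stuck-at-0 on output bit 0
mirrored = make_xor_module(0x1 | (1 << 32))    # stuck-at-0 on bits 0 and 32
x, key = 0x0, 0x1                              # operands of the example in section 2

results = {
    "normal": normal_tr(lsb_stuck, x, key),
    "involutional": involutional(lsb_stuck, x, key),
    "involutional_swapped": involutional_swapped(lsb_stuck, x, key),
}
```

For these operands only the swapped variant flags the stuck LSB. A pair of identical faults at mirrored positions (bits 0 and 32) still escapes, which is the limitation of this remedy.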

Figure 4: KHAZAD and ANUBIS round functions with involutional time redundancy

3.1. Normal Time Redundancy

In normal time redundancy, round operation γ is performed in clock cycles 1 and 2, both times on the same input x. If γ(x) obtained at the end of clock cycle 1 equals γ(x) obtained at the end of clock cycle 2 (i.e. no transient fault is detected in module γ), round operation θ1 is performed in clock cycles 3 and 4 on the same input γ(x). If θ1(γ(x)) at the end of clock cycle 3 equals θ1(γ(x)) at the end of clock cycle 4 (i.e. no transient fault is detected in module θ1), round operation σ is performed in clock cycles 5 and 6 on the same input θ1(γ(x)). If σ(θ1(γ(x))) at the end of clock cycle 5 equals σ(θ1(γ(x))) at the end of clock cycle 6 (i.e. no transient fault is detected in module σ), one KHAZAD round is successfully completed. This scheme can only detect transient faults.

3.2. Involutional Time Redundancy

In involutional time redundancy, round operation γ is performed in clock cycle 1 on input x, followed by the corresponding CED operation γ(γ(x)) in clock cycle 2. If x = γ(γ(x)) (i.e. no fault is detected in module γ), round operation θ1 is performed in clock cycle 3 on γ(x), followed by the corresponding CED operation θ1(θ1(γ(x))) in clock cycle 4. If γ(x) = θ1(θ1(γ(x))) (i.e. no fault is detected in module θ1), round operation σ is performed in clock cycle 5 on θ1(γ(x)), followed by the corresponding CED operation σ(σ(θ1(γ(x)))) in clock cycle 6. If θ1(γ(x)) = σ(σ(θ1(γ(x)))) (i.e. no fault is detected in module σ), one KHAZAD round is successfully completed.

              Time redundancy based CED
Clock cycle   Normal                           Involutional
1             γ(x) of round 1                  γ(x) of round 1
2             γ(x) of round 1 + check          γ(γ(x)) of round 1 + check
3             θ(γ(x)) of round 1               θ(γ(x)) of round 1
4             θ(γ(x)) of round 1 + check       θ(θ(γ(x))) of round 1 + check
5             σ(θ(γ(x))) of round 1            σ(θ(γ(x))) of round 1
6             σ(θ(γ(x))) of round 1 + check    σ(σ(θ(γ(x)))) of round 1 + check

Table 2: Normal and Involutional time redundancy steps for one round of KHAZAD.

Table 2 summarizes the normal and the involutional time redundancy based CED for one round of KHAZAD and ANUBIS. Although both methods entail 100% time overhead, involutional time redundancy can detect permanent faults as well.

4. FAULT-INJECTION SIMULATION BASED EVALUATION OF INVOLUTIONAL TIME REDUNDANCY

4.1. Single-bit faults

In order to evaluate the error detection capability of the proposed CED scheme, the architecture implementation was modeled in C. Single-bit stuck-at faults (both stuck-at-0 and stuck-at-1) were injected at all points in the design, for every input used for testing. This was accomplished by adding a multiplexer with a fault injection control at the point of fault insertion, as shown in Figure 5.
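A fault-injection multiplexer of this kind is easy to mimic in software. The sketch below is not the authors' C model: it applies a hypothetical `inject` helper to a single 8-bit S-box (rebuilt from Table 1, with the bit-exchange wiring inferred from the worked example in section 2) and runs the involutional check with the fault active in chosen clock cycles. It also reproduces, on the last byte only, the two-cycle transient case discussed later in this section.

```python
P = [0x3, 0xF, 0xE, 0x0, 0x5, 0x4, 0xB, 0xC, 0xD, 0xA, 0x9, 0x6, 0x7, 0x8, 0x2, 0x1]
Q = [0x9, 0xE, 0x5, 0x6, 0xA, 0x2, 0x3, 0xC, 0xF, 0x0, 0x4, 0xD, 0x7, 0xB, 0x1, 0x8]


def exchange(hi, lo):
    """Assumed bit-exchange wiring between S-box levels."""
    return (hi & 0xC) | (lo >> 2), ((hi & 0x3) << 2) | (lo & 0x3)


def sbox(u):
    hi, lo = exchange(P[u >> 4], Q[u & 0xF])
    hi, lo = exchange(Q[hi], P[lo])
    return (P[hi] << 4) | Q[lo]


def inject(value, bit, stuck, enable):
    """Fault-injection multiplexer: when enabled, force `bit` of `value` to `stuck`."""
    if not enable:
        return value
    return value | (1 << bit) if stuck else value & ~(1 << bit)


def run_ced(x, fault_at, bit, stuck, active_cycles):
    """Involutional check on one S-box with a stuck-at fault.

    fault_at is "in" or "out"; active_cycles holds the clock cycles (1 = normal
    computation, 2 = re-computation) in which the fault is present.
    Returns (first-pass output, error flag).
    """
    def module(v, cycle):
        on = cycle in active_cycles
        if fault_at == "in":
            v = inject(v, bit, stuck, on)
        out = sbox(v)
        if fault_at == "out":
            out = inject(out, bit, stuck, on)
        return out

    y = module(x, 1)
    z = module(y, 2)
    return y, z != x


# Permanent stuck-at-0 on output bit 4: detected.
permanent = run_ced(0x00, "out", 4, 0, {1, 2})
# Stuck-at-1 on input bit 2 lasting exactly two cycles: wrong output, no error flag.
two_cycle_transient = run_ced(0xA1, "in", 2, 1, {1, 2})
# The same fault lasting one cycle only: detected.
one_cycle_transient = run_ced(0xA1, "in", 2, 1, {1})
```

In the two-cycle case the first pass yields 0x28 instead of the fault-free 0x2C, and the fault then turns the fed-back 0x28 into 0x2C, so the re-computation returns the original 0xA1 and nothing is reported.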

(Figure 5 labels: original data bit; stuck-at fault 0 or 1; output with a fault injected.)

Figure 5: Fault injection at the output of a function

A stuck-at-1 fault is injected at a point by setting the fault injection control to 1 and, similarly, a stuck-at-0 fault is injected at a point by setting the fault injection control to 0. Single-bit faults were inserted not only at the inputs and outputs of the modules, but also internal to them. In the fault simulation, the lowest level of fault injection was performed at the inputs and outputs of the P-Box and Q-Box of an S-Box for the γ layer, and at the exclusive-or operation for the functions θ and σ. The number of single-bit faults is shown in Table 3. For example, since an S-Box of KHAZAD consists of three 4x4 P-Boxes and three 4x4 Q-Boxes, the total number of connections of an S-Box is 4 × 3 (for P-Box) + 4 × 3 (for Q-Box) + 8 (the number of inputs to S-Box) = 32. Since the γ functions of KHAZAD and ANUBIS consist of 8 and 16 S-Boxes respectively, they have 256 and 512 connections; with a stuck-at-0 and a stuck-at-1 fault at each connection, the corresponding total number of single-bit faults is 512 and 1024. Simulations were performed for 1.5 million random inputs; for every input, all possible single-bit permanent faults were simulated and encryption was performed. Table 3 shows the fault coverage.

Layer   # of single-bit faults   # of inputs applied   Fault coverage
        KHAZAD      ANUBIS                             Permanent   Transient
γ       512         1024         1500000               100%        99.995%
θ       2144        704          1500000               100%        100%
σ       384         768          1500000               100%        100%

Table 3: Single-bit fault coverage of involutional time redundancy

Exhaustive fault simulations were also performed for transient faults, with the duration of the fault ranging from one to ten clock cycles. The overall fault coverage in the case of transient faults was 99.995% for the γ layer and 100% for the θ and σ layers. The only scenario in which this CED scheme fails for transient faults in the γ layer is as follows: in some rare cases, the fault causes the output of the module to be wrong but the involutional output to be correct. If the transient fault lasts exactly two clock cycles and the input/fault combination satisfies this condition, the fault goes undetected.

As an example, consider the following case. If the input to the γ layer is 0xC8D3 EEF4 AE2D 2DA1, the output will be 0x7E05 93BC 87F3 F32C. If a stuck-at-1 fault occurs for two clock cycles at the 3rd LSB of the input of the γ layer with the same inputs, then the γ layer receives 0xC8D3 EEF4 AE2D 2DA5 as the input, which results in a faulty output of 0x7E05 93BC 87F3 F328. When this faulty output is fed back for re-computation, the γ layer receives 0x7E05 93BC 87F3 F32C as the input because of the stuck-at fault. The re-computation results in 0xC8D3 EEF4 AE2D 2DA1, which is exactly the value that was input to the module, hence the system does not report an error. This problem can be solved by performing another normal re-computation during the third idle clock cycle in the datapath, at the cost of extra hardware. However, the probability of such scenarios occurring for all the rounds in the cipher is extremely low.

4.2. Multiple-bit faults

Since it was not feasible to inject every possible combination of multiple-bit faults, random multiple-bit faults were injected into the system and encryption was performed.
It was observed that the fault coverage for the γ layer was 100% for permanent faults and 99.999% for transient faults. The only transient faults that go undetected are those explained in section 4.1, but at multiple locations (the probability of occurrence is further reduced). The fault coverage for the θ layer was 100%, since θ is a diffusion function and every bit in the output depends on every bit in the input. The only problem with detection of multiple-bit faults occurred at the σ layer. Since the right and left halves of the operands are exchanged during the involutional re-computation of the σ layer, "symmetric faults" go undetected. Symmetric faults are those faults which occur in such a way that they affect the same two bits during the normal computation and the involutional re-computation. For example, output bit-0 of the σ layer during normal computation corresponds to output bit-32 during the exchange operation for involutional re-computation. If a pair of faults occurs in such a way that they affect both bit-0 and bit-32 in the same way (stuck-at-0 or stuck-at-1), then this fault goes undetected. Except for this case, all other transient and permanent multiple-bit faults are detected. A summary of the results is presented in Table 4. It can be seen that the permanent fault coverage in the case of 3-bit and 5-bit faults is 100%, since an odd number of faults does not result in symmetric faults.

Type of fault   % of total # of faults inserted   Overall fault coverage
                                                  Permanent   Transient
2-bit           27.8                              99.998%     99.996%
3-bit           24.6                              100%        99.999%
4-bit           27.2                              99.999%     99.999%
5-bit           20.4                              100%        100%

Table 4: Multiple-bit fault coverage of the implementation of involution based CED for KHAZAD and ANUBIS

4.3. Bridge faults

To complete the evaluation of the CED capability, bridge faults were simulated. The simulated bridge faults belonged to two categories: wired-AND and wired-OR. Injection of bridge faults was more complicated than injection of single or multiple-bit faults. Suppose a bridge fault occurs at nodes x and y. In a wired-AND scenario, nodes x and y take '1' only if both nodes are high; otherwise both nodes are driven low. Similarly, in a wired-OR scenario, nodes x and y take '0' only if both nodes are low; otherwise both nodes are driven high. Bridge faults (both AND and OR) were injected at 1.5 million random adjacent pairs of points in the datapath. Since the fault injection was performed using C, the lowest level of bridge faults inserted was restricted to the locations discussed in section 4.1, i.e. the P- and Q-Box level for the γ layer and the exclusive-or gate level for the θ and σ layers. The simulation results showed that all faults were detected.

5. IMPLEMENTATION BASED VALIDATION

KHAZAD and ANUBIS with involution based CED were implemented using the IBM 0.13 micron library. The architectures were modeled in VHDL, and the Cadence Buildgates PKS system was used for synthesis and place and route. The normal designs without CED were implemented using the same library and design flow. Tables 5 and 6 show the details of the overheads for the CED architectures of KHAZAD and ANUBIS respectively, compared to the three clock cycles per round implementation. The second row shows the area used by the designs. An inverter of this library takes 32 units of area. The area overheads of the CED designs are 17.45% and 12.7% respectively. The third row shows the minimum clock periods of the synthesized designs. Due to the extra hardware inserted in the datapath, the clock periods of the CED designs are longer than those of the three cycles per round designs. The fourth row shows that the CED designs take about twice the number of clock cycles of the three clock cycles per round designs, since this is a time redundancy based technique. Finally, the throughput comparisons are shown in the fifth row. The throughput is calculated as the number of bits encrypted per second, i.e. the number of text bits / (the number of clock cycles × the clock period). We also implemented pipelined versions of KHAZAD and ANUBIS with and without CED. Throughput results similar to those in Tables 5 and 6 were obtained, owing to the time redundancy based technique.
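The throughput formula can be checked against the reported figures. The sketch below uses the block sizes of the two ciphers (64 and 128 bits) and the clock cycle counts and clock periods listed in the overhead tables for the designs with and without CED; `throughput_gbps` is a hypothetical helper name.

```python
def throughput_gbps(block_bits, clock_cycles, clock_period_ps):
    """Bits encrypted per second, in Gbps: block size / (cycles x clock period)."""
    seconds_per_block = clock_cycles * clock_period_ps * 1e-12
    return block_bits / seconds_per_block / 1e9


# (block size in bits, # clock cycles, clock period in ps)
khazad_normal = throughput_gbps(64, 23, 3136.35)
khazad_ced = throughput_gbps(64, 45, 3187.4)
anubis_normal = throughput_gbps(128, 35, 3198.78)
anubis_ced = throughput_gbps(128, 69, 3226.2)

for name, value in [
    ("KHAZAD, 3 cycles/round", khazad_normal),
    ("KHAZAD, involution time redundancy", khazad_ced),
    ("ANUBIS, 3 cycles/round", anubis_normal),
    ("ANUBIS, involution time redundancy", anubis_ced),
]:
    print(f"{name}: {value:.4f} Gbps")
```

The computed values round to the 0.89, 0.4462, 1.14 and 0.575 Gbps entries of the throughput rows.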

                     Normal (3 cycles/round)   Involution Time Redundancy   Overhead %
Area                 28697                     33705                        17.45
Clock period (ps)    3136.35                   3187.4                       1.6
# clock cycles       23                        45                           95.65
Throughput (Gbps)    0.89                      0.4462                       50.13

Table 5: Overhead for the Involution time redundancy CED implementation of KHAZAD

                     Normal (3 cycles/round)   Involution Time Redundancy   Overhead %
Area                 129304                    145722                       12.7
Clock period (ps)    3198.78                   3226.2                       1
# clock cycles       35                        69                           97.14
Throughput (Gbps)    1.14                      0.575                        50.43

Table 6: Overhead for the Involution time redundancy CED implementation of ANUBIS

6. ADDITIONAL CRYPTOGRAPHIC APPLICATION SPECIFIC OPTIMIZATION

Symmetric block ciphers are typically used in one of three feedback modes: Cipher Block Chaining (CBC) mode, Output FeedBack (OFB) mode and Cipher FeedBack (CFB) mode. Since KHAZAD and ANUBIS are symmetric block ciphers, they are typically implemented as non-pipelined iterative architectures. In such an iterative, non-pipelined implementation of KHAZAD encryption/decryption, round operation γ is busy in clock cycles 1, 4, 7, … and idles in clock cycles 2, 3, 5, 6, 8, 9, … Similarly, round operation θ1 is busy in clock cycles 2, 5, 8, … and idles in clock cycles 1, 3, 4, 6, 7, … Finally, round operation σ is busy in clock cycles 3, 6, 9, … and idles in clock cycles 1, 2, 4, 5, 7, 8, … All of these idle cycles exist only in a non-pipelined implementation.

Involutional time redundancy in Figure 4 can be adapted to exploit these idle clock cycles as follows. Round operation γ(x) is performed in clock cycle 1 and the input x is saved in CED Register 1. The corresponding CED operation for γ(x), i.e. γ(γ(x)), is performed in clock cycle 2. Concurrently, the round operation θ(γ(x)) is performed and the input γ(x) is saved in CED Register 2. If x = γ(γ(x)) then no fault is detected in module γ and hence no errors are reported. The corresponding CED operation for θ(γ(x)), i.e. θ(θ(γ(x))), is performed in clock cycle 3 concurrently with round operation σ(θ(γ(x))), while the input θ(γ(x)) is saved in CED Register 3. If γ(x) = θ(θ(γ(x))) then no fault is detected in module θ. At this point, one KHAZAD round ρ is completed in only 3 clock cycles, in contrast to the 6 cycles consumed by the two other schemes described above. Now, in clock cycle 4, the corresponding CED operation for σ(θ(γ(x))), i.e. σ(σ(θ(γ(x)))), is performed concurrently with the round operation γ(y), where y = σ(θ(γ(x))) is the input to the second round of the KHAZAD encryption/decryption. If σ(σ(θ(γ(x)))) = θ(γ(x)) then no fault is detected in module σ. This involutional time redundancy + idle cycles scheme is compared with involutional time redundancy in Table 7.

Figure 6: The darkly shaded cells show the cycles when the modules perform normal operation, while the lightly shaded cells show the cycles when the modules perform involutional re-computation. Unshaded cells show the idle cycles in the design. (a) Involution time redundancy in an iterative KHAZAD implementation takes 6 clock cycles for one round of KHAZAD. (b) Involution time redundancy with idle cycles takes four cycles for one round of KHAZAD.

        Time redundancy based CED
Clock   Involutional                        Involutional + Idle cycles
1       γ(x) of round 1                     γ(x) of round 1
2       γ(γ(x)) of round 1 + check          θ(γ(x)) of round 1, γ(γ(x)) of round 1 + check
3       θ(γ(x)) of round 1                  σ(θ(γ(x))) of round 1, θ(θ(γ(x))) of round 1 + check
4       θ(θ(γ(x))) of round 1 + check       γ(y) of round 2, σ(σ(θ(γ(x)))) of round 1 + check
5       σ(θ(γ(x))) of round 1               θ(γ(y)) of round 2, γ(γ(y)) of round 2 + check
6       σ(σ(θ(γ(x)))) of round 1 + check    σ(θ(γ(y))) of round 2, θ(θ(γ(y))) of round 2 + check

Table 7: Comparison between involution time redundancy without and with idle cycles

6.1. Implementation

Involutional time redundancy + idle cycles was implemented using the IBM 0.13 micron library. The same synthesis methodology as outlined previously was followed, and the results are summarized in Tables 8 and 9. It can be seen that, for a negligible additional area, this optimization dramatically reduced the time overhead from about 100% to less than 10%.

                     No CED (3 cycles/round)   Involution time red. + idle cycles   Overhead
Area                 28697                     33910                                18.1%
Clock period (ps)    3136.35                   3227.77                              2.9%
# clock cycles       23                        24                                   4.3%
Throughput (Gbps)    0.89                      0.83                                 7.2%

Table 8: Overhead associated with Involution time redundancy + idle cycles for KHAZAD
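The idle-cycle schedule of Table 7 and the cycle count above can be reproduced with a small scheduling model. This is a sketch with hypothetical function names, assuming every stage operation takes one clock cycle and that consecutive operations run on different modules, so that the check of one operation can overlap the next operation.

```python
def schedule_involutional(ops):
    """Each operation occupies its module for two cycles: compute, then re-compute and check."""
    cycles = []
    for module, label in ops:
        cycles.append([f"{module}: {label}"])
        cycles.append([f"{module}: check {label}"])
    return cycles


def schedule_with_idle_cycles(ops):
    """Overlap the check of operation i with operation i+1, which runs on another module."""
    cycles = []
    for c in range(len(ops) + 1):
        slot = []
        if c < len(ops):
            module, label = ops[c]
            slot.append(f"{module}: {label}")
        if c > 0:
            module, label = ops[c - 1]
            slot.append(f"{module}: check {label}")
        cycles.append(slot)
    return cycles


def no_module_conflict(ops):
    """The overlap only works if consecutive operations use different modules."""
    return all(a[0] != b[0] for a, b in zip(ops, ops[1:]))


two_rounds = [(m, f"round {r}") for r in (1, 2) for m in ("gamma", "theta", "sigma")]
plain = schedule_involutional(two_rounds[:3])        # one round without overlap
overlapped = schedule_with_idle_cycles(two_rounds)   # two rounds with overlap

for number, slot in enumerate(overlapped, start=1):
    print(number, ", ".join(slot))
```

In this model n single-cycle operations finish in n + 1 cycles, which agrees with the 23 and 24 clock cycles reported for KHAZAD without CED and with the idle-cycle scheme.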

                     No CED (3 cycles/round)   Involution time red. + idle cycles   Overhead
Area                 129304                    145957                               12.9%
Clock period (ps)    3198.78                   3400.89                              6.3%
# clock cycles       35                        36                                   2.9%
Throughput (Gbps)    1.14                      1.04                                 9.6%

Table 9: Overhead associated with Involution time redundancy + idle cycles for ANUBIS

The KHAZAD round key generation algorithm expands the 128-bit user key K into nine 64-bit round keys K0, K1, …, K8 using the KHAZAD round function. Hence, the CED method proposed in this paper can be applied to detect faults in the KHAZAD round key generation datapath as well.

6.2. Comparison with Related Research in Fault Tolerant Crypto

Several CED techniques for crypto chips have been proposed. In [7], a Register Transfer Level CED approach for the Advanced Encryption Standard (AES) was developed that exploits the inverse relationship between encryption and decryption at the algorithm level, round level and individual operation level. This technique has an area overhead of 21% at the algorithm level, 18.9% at the round level and 38.08% at the operation level. Similarly, the time overhead is 61.15%, 26.55% and 23.56% respectively. In [8], this inverse-relationship technique was extended to AES round key generation. A drawback of this approach is that it assumes that the AES crypto device operates in half-duplex mode (i.e. either encryption or decryption, but not both, is active at a time).

In [9], a parity-based CED method for the AES encryption algorithm was presented. This technique has a relatively high hardware overhead. It adds one parity bit per byte, resulting in 16 additional bits for the 128-bit data stream. Each of the sixteen 8-bit×8-bit AES S-Boxes is modified into a 9-bit×9-bit S-Box, more than duplicating the hardware for implementing the S-Boxes. In addition, this technique adds one parity bit per byte to the outputs of the Mix-Column operation, because Mix-Column does not preserve the parity of its inputs at the byte level.

In contrast to all these techniques, involution based CED yielded less area overhead (18%) and less time overhead (8%), and is best in terms of fault coverage (100%) and fault detection latency.

7. CONCLUSIONS

Involutional time redundancy is a powerful CED method. It can detect both transient and permanent faults that affect the output.
Furthermore, it can detect permanent faults in a function even if the faults do not affect the output (i.e. are not activated by the current inputs). Consider the situation where a bit is stuck-at-1 and the output at that bit was supposed to be logic '1'. In this case, although the output is correct, the fault will be detected because the involution will not yield the correct result. This improves the overall fault coverage as well as the error detection latency. It also enhances the security of the implementation, since any attempt to clandestinely attack the algorithm can be detected. An additional optimization specific to involutional substitution-permutation ciphers resulted in almost negligible time overhead (less than 8% degradation in throughput).

8. REFERENCES

[1] D. Boneh, R. DeMillo and R. Lipton, "On the importance of checking cryptographic protocols for faults," Proceedings of Eurocrypt, Lecture Notes in Computer Science vol. 1233, Springer-Verlag, pp. 37-51, 1997.
[2] E. Biham and A. Shamir, "Differential Fault Analysis of Secret Key Cryptosystems," Proceedings of Crypto, Aug 1997.
[3] J. Bloemer and J.-P. Seifert, "Fault based cryptanalysis of the Advanced Encryption Standard," www.iacr.org/eprint/2002/075.pdf.
[4] C. Giraud, "Differential Fault Analysis on AES," http://eprint.iacr.org/2003/008.ps
[5] Jean-Jacques Quisquater, Gilles Piret, "A Differential Fault Attack Technique Against SPN Structures, with Application to the AES and KHAZAD," Fifth International Workshop on Cryptographic Hardware and Embedded Systems (CHES 2003), Volume 2779 of Lecture Notes in Computer Science, pp. 77-88, Springer-Verlag, September 2003.
[6] B.W. Johnson, "Design and Analysis of Fault-Tolerant Digital Systems," Addison-Wesley, 1989.
[7] R. Karri, K. Wu, P. Mishra and Y. Kim, "Concurrent Error Detection of Fault Based Side-Channel Cryptanalysis of 128-Bit Symmetric Block Ciphers," IEEE Transactions on CAD, Dec 2002.
[8] G. Bertoni, L. Breveglieri, I. Koren and V. Piuri, "On the propagation of faults and their detection in a hardware implementation of the advanced encryption standard," Proceedings of ASAP'02, pp. 303-312, 2002.

[9] G. Bertoni, L. Breveglieri, I. Koren, and V. Piuri, "Error Analysis and Detection Procedures for a Hardware Implementation of the Advanced Encryption Standard," IEEE Transactions on Computers, vol. 52, no. 4, pp. 492-505, Apr 2003.
[10] Joan Daemen, Vincent Rijmen, Paulo S.L.M. Barreto, "Rijndael: Beyond the AES," Mikulášská kryptobesídka 2002, 3rd Czech and Slovak cryptography workshop, Dec. 2002, Prague, Czech Republic.
[11] A. Biryukov, "Analysis of Involutional Ciphers: KHAZAD and ANUBIS," Proceedings of the 3rd NESSIE Workshop, Springer-Verlag, pp. 45-53.
[12] P.S.L.M. Barreto and V. Rijmen, "The KHAZAD legacy-level Block Cipher," First Open NESSIE Workshop, Leuven, 13-14 November 2000.
[13] J. Daemen, M. Peeters, G. Assche and V. Rijmen, "The Noekeon Block Cipher," First Open NESSIE Workshop, November 2000.
[14] P.S.L.M. Barreto and V. Rijmen, "The ANUBIS Block Cipher," Primitive submitted to NESSIE, September 2000, available at www.cosic.esat.kuleuven.ac.be/nessie
[15] F. Standaert, G. Piret, G. Rouvroy, "ICEBERG: an involutional cipher Efficient for block encryption in Reconfigurable hardware," FSE 2004, Springer-Verlag, February 2004.
[16] F. Standaert, G. Rouvroy, J. Quisquater, J. Legat, "Efficient FPGA Implementations of Block Ciphers KHAZAD and MISTY1," Proceedings of the 3rd NESSIE Workshop, Munich, November 2002.
[17] J. Daemen and V. Rijmen, "The Rijndael Block Cipher," AES Proposal submitted to NIST, March 1999.