Some new results on binary polynomial multiplication Murat Cenk Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey joint work with Anwar Hasan
April 10, 2015
Murat Cenk
New results on binary polynomial multiplication
1 / 36
Outline
1. Why do we need efficient multiplication algorithms in $\mathbb{F}_{2^n}$?
2. Known methods
3. Complexity tables for high speed cryptography
4. Improving complexities further
5. New results
Motivation: Why do we need efficient multiplication algorithms?
Cryptographic systems must be efficient, and $\mathbb{F}_{2^n}$ is well suited to implementation. For elliptic curve cryptography as used in practice, $n$ ranges from 163 to 571, and one scalar multiplication requires several hundred field multiplications, so it is not efficient unless careful designs and efficient algorithms are used.
Example: Bernstein [Crypto 2009]. A binary Edwards curve over $\mathbb{F}_{2^{251}} = \mathbb{F}_2[t]/(t^{251} + t^7 + t^4 + t^2 + 1)$ is used. A single scalar multiplication requires 1266 field multiplications. Each multiplication needs 33974 bit operations: 33096 for the 251-bit polynomial multiplication and 878 for reducing the 501-bit product modulo the defining polynomial. The total is therefore $1266 \times 33974 = 43011084$ bit operations. The other operations (additions, squarings, multiplications by a fixed field element, and conditional swaps) require 1668531 bit operations in total, which is small compared to the cost of multiplication.
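The totals in this example can be rechecked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative and not taken from the original work.

```python
# Figures quoted in the example (Bernstein, Crypto 2009); names are illustrative.
FIELD_MULTS = 1266       # field multiplications per scalar multiplication
POLY_MULT_OPS = 33096    # bit operations for one 251-bit polynomial product
REDUCTION_OPS = 878      # bit operations for one modular reduction
OTHER_OPS = 1668531      # additions, squarings, constant multiplications, swaps

ops_per_field_mult = POLY_MULT_OPS + REDUCTION_OPS
mult_total = FIELD_MULTS * ops_per_field_mult
mult_share = mult_total / (mult_total + OTHER_OPS)

print(ops_per_field_mult)   # 33974
print(mult_total)           # 43011084
print(f"{mult_share:.1%}")  # share of all bit operations spent in multiplications
```

The last line makes the motivation concrete: multiplications account for well over nine tenths of the bit operations in this example.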
Notation and model of computation
$\mathbb{F}_{q^n}$ denotes the finite field with $q^n$ elements (where $q$ is a prime power), and $\mathbb{F}_q[X]$ denotes the ring of polynomials over $\mathbb{F}_q$. $M_q(n)$ is the minimum number of bit operations required to compute the product of two polynomials of degree less than $n$ over $\mathbb{F}_q$. $D_A$ and $D_X$ denote the delay of a bit-level multiplication and of a bit-level addition, respectively. The cost metric for polynomial multiplication is the number of bit operations (bit additions and bit multiplications) required to multiply polynomials over $\mathbb{F}_2$ or $\mathbb{F}_4$. Since all computations are over fields of characteristic two, addition and subtraction coincide.
Two measures of the complexity of an algorithm:
Arithmetic complexity: the total number of operations required for multiplying polynomials, denoted by $M(n)$.
Delay complexity: the depth of the corresponding arithmetic circuit, i.e., the length of its longest path, denoted by $D(n)$.
The computational complexity of multiplication
Polynomial multiplication: Consider two polynomials of degree $n - 1$,
$A(x) = \sum_{i=0}^{n-1} a_i x^i, \quad B(x) = \sum_{i=0}^{n-1} b_i x^i.$
The schoolbook method computes the product $C(x)$ of $A(x)$ and $B(x)$ as
$C(x) = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_i b_j x^{i+j}.$
This algorithm requires $n^2$ multiplications and $(n - 1)^2$ additions.
Reduction: This step is generally easy, and its cost is less than $5n$.
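The operation counts of the schoolbook method can be observed directly. The following is a minimal sketch over $\mathbb{F}_2$, assuming polynomials are stored as lists of 0/1 coefficients (lowest degree first); the function name `schoolbook_f2` is illustrative.

```python
def schoolbook_f2(a, b):
    """Multiply two n-term polynomials over F2 (lists of 0/1, low degree first).

    Returns (product coefficients, number of ANDs, number of XORs).
    """
    n = len(a)
    assert len(b) == n
    c = [None] * (2 * n - 1)
    ands = xors = 0
    for i in range(n):
        for j in range(n):
            term = a[i] & b[j]        # one bit multiplication
            ands += 1
            if c[i + j] is None:      # first term for this coefficient: no addition
                c[i + j] = term
            else:
                c[i + j] ^= term      # one bit addition
                xors += 1
    return c, ands, xors


# (1 + x + x^3) * (1 + x^2 + x^3) over F2
product, ands, xors = schoolbook_f2([1, 1, 0, 1], [1, 0, 1, 1])
print(product, ands, xors)
```

For $n = 4$ the counters come out as $16 = n^2$ multiplications and $9 = (n-1)^2$ additions, because each of the $2n - 1$ output coefficients receives its first term for free.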
Karatsuba Algorithm
The Karatsuba algorithm has better complexity. For example, consider two 2-term polynomials,
$A(x) = a_0 + a_1 x, \quad B(x) = b_0 + b_1 x.$
The Karatsuba algorithm computes the product $C(x) = A(x)B(x)$ as
$C(x) = a_1 b_1 x^2 + [(a_0 + a_1)(b_0 + b_1) - a_0 b_0 - a_1 b_1]x + a_0 b_0.$
This needs just three multiplications, $a_0 b_0$, $(a_0 + a_1)(b_0 + b_1)$ and $a_1 b_1$, and four additions.
Asymptotic complexity of Karatsuba Algorithm
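Now take polynomials with four terms (degree three) and set $y = x^2$:
$A(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 = (a_0 + a_1 x) + x^2 (a_2 + a_3 x) = A_0 + y A_1,$
$B(x) = b_0 + b_1 x + b_2 x^2 + b_3 x^3 = (b_0 + b_1 x) + x^2 (b_2 + b_3 x) = B_0 + y B_1,$
where $A_0 = a_0 + a_1 x$, $A_1 = a_2 + a_3 x$, $B_0 = b_0 + b_1 x$ and $B_1 = b_2 + b_3 x$. Applying the 2-term formula to the halves gives
$A(x)B(x) = A_1 B_1 y^2 + [(A_0 + A_1)(B_0 + B_1) - A_0 B_0 - A_1 B_1]y + A_0 B_0.$
For $2n$-term polynomials, we have
$M(2n) \le 3M(n) + 8n - 4, \quad M(1) = 1,$
and solving this recurrence for $n$ a power of two gives
$M(n) \le 7n^{\log_2 3} - 8n + 2$, where $\log_2 3 \approx 1.585$.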
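The recursion can be made concrete with a small instrumented implementation. This is a sketch under stated assumptions: coefficients are lists of 0/1 over $\mathbb{F}_2$, the length is a power of two, and the name `karatsuba_f2` is illustrative. It counts one operation per bit AND or XOR actually performed, so the counter follows $M(2n) = 3M(n) + 8n - 4$, and unrolling that recurrence for $n = 2^k$ gives exactly $7 \cdot 3^k - 8 \cdot 2^k + 2$.

```python
def karatsuba_f2(a, b):
    """Return (coefficients of a*b over F2, bit operations used).

    a and b are lists of 0/1 of equal length n, with n a power of two.
    """
    n = len(a)
    if n == 1:
        return [a[0] & b[0]], 1
    h = n // 2
    a0, a1, b0, b1 = a[:h], a[h:], b[:h], b[h:]
    sa = [x ^ y for x, y in zip(a0, a1)]               # h XORs
    sb = [x ^ y for x, y in zip(b0, b1)]               # h XORs
    p0, c0 = karatsuba_f2(a0, b0)
    p2, c2 = karatsuba_f2(a1, b1)
    pm, cm = karatsuba_f2(sa, sb)
    ops = c0 + c2 + cm + 2 * h
    mid = [m ^ x ^ y for m, x, y in zip(pm, p0, p2)]   # 2(2h - 1) XORs
    ops += 2 * (2 * h - 1)
    res = p0 + [0] + p2          # coefficient 2h - 1 is structurally empty
    for i, m in enumerate(mid):
        pos = h + i
        if pos == 2 * h - 1:
            res[pos] = m         # nothing to add to: no XOR needed
        else:
            res[pos] ^= m        # overlap with p0 or p2
            ops += 1
    return res, ops


for k in range(6):
    n = 2 ** k
    _, ops = karatsuba_f2([1] * n, [1] * n)
    print(n, ops, 7 * 3 ** k - 8 * 2 ** k + 2)   # the last two columns agree
```

The subtraction in the middle term is written as XOR because the coefficients are in characteristic two.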
Karatsuba algorithm (with Bernstein’s improvement)
Write $A(X) = A_0 + X^n A_1$ and $B(X) = B_0 + X^n B_1$, where
$A_0 = \sum_{i=0}^{n-1} a_i X^i, \quad A_1 = \sum_{i=0}^{n-1} a_{i+n} X^i,$
$B_0 = \sum_{i=0}^{n-1} b_i X^i, \quad B_1 = \sum_{i=0}^{n-1} b_{i+n} X^i.$
Then
$(A_0 + X^n A_1)(B_0 + X^n B_1) = (1 + X^n)(A_0 B_0 + X^n A_1 B_1) + X^n (A_0 + A_1)(B_0 + B_1).$
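The identity above can be checked numerically. The sketch below assumes polynomials over $\mathbb{F}_2$ are encoded as Python integers (bit $i$ is the coefficient of $X^i$); `clmul` is a plain shift-and-XOR reference product and `refined_karatsuba` is an illustrative name for one level of the identity, with the three half-size products computed directly rather than recursively.

```python
def clmul(a, b):
    """Carry-less product in F2[X]; polynomials are encoded as bitmasks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def refined_karatsuba(a, b, n):
    """One level of the refined identity for 2n-term inputs."""
    mask = (1 << n) - 1
    a0, a1 = a & mask, a >> n
    b0, b1 = b & mask, b >> n
    inner = clmul(a0, b0) ^ (clmul(a1, b1) << n)    # A0*B0 + X^n * A1*B1
    inner ^= inner << n                             # multiply by (1 + X^n)
    return inner ^ (clmul(a0 ^ a1, b0 ^ b1) << n)   # + X^n (A0 + A1)(B0 + B1)


a, b = 0b10110110, 0b11100101    # two 8-term polynomials, so n = 4
print(refined_karatsuba(a, b, 4) == clmul(a, b))
```

Multiplying by $1 + X^n$ costs a single shifted XOR, which is where the saving over the classical arrangement comes from.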
The arithmetic and delay complexities of the algorithm are as follows:
$M_2(n + k) \le 2M_2(n) + M_2(k) + 3n + 4k - 3, \quad n/2 \le k \le n,$
$D_2(2n) \le D_2(n) + 3D_X,$
$M_2(n) \le 6.5n^{1.58} - 7n + 1.5,$
$D_2(n) \le 3\log_2(n)D_X + D_A.$
Karatsuba-like improved 3-way split algorithm
This algorithm was obtained by Cenk, Negre and Hasan in 2012 using a technique similar to that employed in [Zhou-Michalik].
$P_0 = A_0 B_0 = P_{0L} + P_{0H} X^n, \quad P_1 = A_1 B_1 = P_{1L} + P_{1H} X^n,$
$P_2 = A_2 B_2 = P_{2L} + P_{2H} X^n, \quad P_3 = (A_1 + A_2)(B_1 + B_2) = P_{3L} + P_{3H} X^n,$
$P_4 = (A_0 + A_1)(B_0 + B_1) = P_{4L} + P_{4H} X^n,$
$P_5 = (A_0 + A_2)(B_0 + B_2) = P_{5L} + P_{5H} X^n,$
$R_0 = P_{0H} + P_{1L}, \quad R_1 = R_0 + P_{0L}, \quad R_2 = R_1 + P_{4L}, \quad R_3 = P_{1H} + P_{2L},$
$R_4 = R_1 + R_3, \quad R_5 = P_{4H} + P_{5L}, \quad R_6 = R_4 + R_5, \quad R_7 = R_3 + P_{2H},$
$R_8 = R_7 + R_0, \quad R_9 = R_8 + P_{3L}, \quad R_{10} = R_9 + P_{5H}, \quad R_{11} = R_7 + P_{3H},$
$C = P_{0L} + R_2 X^n + R_6 X^{2n} + R_{10} X^{3n} + R_{11} X^{4n} + P_{2H} X^{5n}.$
Complexities:
$M_2(3n) \le 6M_2(n) + 18n - 6,$
$M_2(2n + k) \le 5M_2(n) + M_2(k) + 12n + 6k - 6, \quad n/2 < k \le n,$
$M_2(2n + k) \le 5M_2(n) + M_2(k) + 13n + 4k - 5, \quad k \le n/2,$
$D_2(3n) \le D_2(n) + 4D_X,$
$M_2(n) \le 5.8n^{1.63} - 6n + 1.2,$
$D_2(n) \le 4\log_3(n)D_X + D_A.$
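The reconstruction sequence $R_0, \dots, R_{11}$ can be exercised on small inputs. The sketch below assumes the bitmask encoding of $\mathbb{F}_2[X]$ (bit $i$ is the coefficient of $X^i$) and reads the low and high parts of each product as its low $n$ coefficients and the remaining $n - 1$ coefficients, with $P_1 = P_{1L} + P_{1H}X^n$; the function names are illustrative, and the six products are computed with a reference multiplier rather than recursively.

```python
def clmul(a, b):
    """Carry-less product in F2[X]; polynomials are encoded as bitmasks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def cnh_three_way(a, b, n):
    """Six-product 3-way split for 3n-term inputs, following the R_i sequence."""
    mask = (1 << n) - 1
    A = [(a >> (i * n)) & mask for i in range(3)]
    B = [(b >> (i * n)) & mask for i in range(3)]
    P = [clmul(A[0], B[0]), clmul(A[1], B[1]), clmul(A[2], B[2]),
         clmul(A[1] ^ A[2], B[1] ^ B[2]),
         clmul(A[0] ^ A[1], B[0] ^ B[1]),
         clmul(A[0] ^ A[2], B[0] ^ B[2])]
    L = [p & mask for p in P]     # low n coefficients of each product
    H = [p >> n for p in P]       # high n - 1 coefficients
    R0 = H[0] ^ L[1]
    R1 = R0 ^ L[0]
    R2 = R1 ^ L[4]
    R3 = H[1] ^ L[2]
    R4 = R1 ^ R3
    R5 = H[4] ^ L[5]
    R6 = R4 ^ R5
    R7 = R3 ^ H[2]
    R8 = R7 ^ R0
    R9 = R8 ^ L[3]
    R10 = R9 ^ H[5]
    R11 = R7 ^ H[3]
    c = 0
    for i, block in enumerate([L[0], R2, R6, R10, R11, H[2]]):
        c ^= block << (i * n)     # blocks do not overlap
    return c


a, b = 0b101101110, 0b110010011   # two 9-term polynomials, so n = 3
print(cnh_three_way(a, b, 3) == clmul(a, b))
```

Twelve block additions $R_0, \dots, R_{11}$ plus six operand additions are consistent with the $18n$ term in $M_2(3n)$.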
Bernstein 4-way split algorithm
$A = A_0 + A_1 X^n + A_2 X^{2n} + A_3 X^{3n}, \quad B = B_0 + B_1 X^n + B_2 X^{2n} + B_3 X^{3n},$
where $A_j = \sum_{i=0}^{n-1} a_{i+nj} X^i$ and $B_j = \sum_{i=0}^{n-1} b_{i+nj} X^i$ for $j = 0, 1, 2, 3$. Bernstein's 4-way algorithm is the following:
$AB = (1 + X^{2n})\big((1 + X^n)(A_0 B_0 + X^n A_1 B_1 + X^{2n} A_2 B_2 + X^{3n} A_3 B_3) + X^n (A_0 + A_1)(B_0 + B_1) + X^{3n}(A_2 + A_3)(B_2 + B_3)\big) + X^{2n}(A_0 + A_2 + (A_1 + A_3)X^n)(B_0 + B_2 + (B_1 + B_3)X^n).$
Complexities:
$M_2(4n) \le M_2(2n) + 6M_2(n) + 27n - 8,$
$M_2(3n + k) \le M_2(2n) + 5M_2(n) + M_2(k) + 19n + 8k - 8, \quad n/2 \le k \le n,$
$D_2(4n) \le D_2(n) + 5D_X,$
$M_2(n) \le 6.425n^{1.58} - 6.8n + 1.375,$
$D_2(n) \le 5\log_4(n)D_X + D_A.$
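The 4-way identity is two levels of the refined Karatsuba identity folded together, and it can be checked in the same way. The sketch below assumes the bitmask encoding of $\mathbb{F}_2[X]$; `bernstein_four_way` is an illustrative name, and the six $n$-term products and the single $2n$-term product are computed with a reference multiplier.

```python
def clmul(a, b):
    """Carry-less product in F2[X]; polynomials are encoded as bitmasks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def bernstein_four_way(a, b, n):
    """4-way split for 4n-term inputs: six n-term products, one 2n-term product."""
    mask = (1 << n) - 1
    A = [(a >> (i * n)) & mask for i in range(4)]
    B = [(b >> (i * n)) & mask for i in range(4)]
    inner = 0
    for i in range(4):
        inner ^= clmul(A[i], B[i]) << (i * n)       # sum of X^{in} A_i B_i
    inner ^= inner << n                             # multiply by (1 + X^n)
    inner ^= clmul(A[0] ^ A[1], B[0] ^ B[1]) << n
    inner ^= clmul(A[2] ^ A[3], B[2] ^ B[3]) << (3 * n)
    outer = inner ^ (inner << (2 * n))              # multiply by (1 + X^{2n})
    sa = (A[0] ^ A[2]) ^ ((A[1] ^ A[3]) << n)       # 2n-term operand
    sb = (B[0] ^ B[2]) ^ ((B[1] ^ B[3]) << n)
    return outer ^ (clmul(sa, sb) << (2 * n))


a, b = 0b101101110011, 0b110010011101   # two 12-term polynomials, so n = 3
print(bernstein_four_way(a, b, 3) == clmul(a, b))
```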
Interpolation method
Let $C(x) = A(x)B(x)$, where $A(x)$ and $B(x)$ are $n$-term polynomials over $\mathbb{F}_q$, $q$ is a prime power and $2n - 1 \le q$. Set $d = 2n - 1$.
Step 1 (Selection): Choose $d$ distinct points $w_0, w_1, \dots, w_{d-1}$ in $\mathbb{F}_q$.
Step 2 (Evaluation): For $i = 0, 1, \dots, d - 1$, (i) compute $A(w_i)$ and $B(w_i)$; (ii) compute the product $A(w_i) \cdot B(w_i)$.
Step 3 (Interpolation): Compute the $(2n - 1)$-term product polynomial $C(x)$ such that $C(w_i) = A(w_i) \cdot B(w_i)$ for all $i$.
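Step 3 can be done explicitly by the following matrix equation:
$\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{d-1} \end{pmatrix} = V^{-1} \cdot \begin{pmatrix} A(w_0)B(w_0) \\ A(w_1)B(w_1) \\ \vdots \\ A(w_{d-1})B(w_{d-1}) \end{pmatrix}, \quad (1)$
where the left-hand vector holds the coefficients of $C$ and
$V = \begin{pmatrix} 1 & w_0 & w_0^2 & \cdots & w_0^{d-1} \\ 1 & w_1 & w_1^2 & \cdots & w_1^{d-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & w_{d-1} & w_{d-1}^2 & \cdots & w_{d-1}^{d-1} \end{pmatrix}.$
The matrix $V$ is called the interpolation matrix. Since $V$ is a Vandermonde matrix built from distinct points, it is invertible.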
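The three steps can be illustrated over a small prime field. The sketch below is an assumption-laden toy, not the method used for binary fields: it works in $\mathbb{F}_7$ so that ordinary modular arithmetic suffices, takes the points $w_i = i$, and solves $V c = y$ by Gauss-Jordan elimination instead of forming $V^{-1}$ explicitly. All function names are illustrative, and the modular inverse `pow(x, -1, p)` needs Python 3.8 or later.

```python
def poly_eval(coeffs, x, p):
    """Evaluate a polynomial (low degree first) at x modulo p (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def solve_mod_p(matrix, rhs, p):
    """Solve matrix * c = rhs over F_p (p prime) by Gauss-Jordan elimination."""
    d = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(d):
        pivot = next(r for r in range(col, d) if aug[r][col] % p)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)
        aug[col] = [(v * inv) % p for v in aug[col]]
        for r in range(d):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % p for v, w in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def interpolation_multiply(a, b, p):
    """Multiply two n-term polynomials over F_p by evaluation and interpolation."""
    d = 2 * len(a) - 1
    assert d <= p, "need 2n - 1 distinct points in F_p"
    points = list(range(d))                                    # Step 1
    values = [poly_eval(a, w, p) * poly_eval(b, w, p) % p      # Step 2
              for w in points]
    vandermonde = [[pow(w, j, p) for j in range(d)] for w in points]
    return solve_mod_p(vandermonde, values, p)                 # Step 3


# (1 + 2x + 3x^2)(4 + 5x + 6x^2) over F_7
print(interpolation_multiply([1, 2, 3], [4, 5, 6], 7))   # [4, 6, 0, 6, 4]
```

Only $2n - 1$ coefficient multiplications are performed in Step 2; the rest of the work is evaluation and interpolation, which is linear algebra with fixed constants.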
Bernstein's 3-way split formula
$A = A_0 + A_1 Y + A_2 Y^2, \quad B = B_0 + B_1 Y + B_2 Y^2$, with $Y = X^{n/3}$.
Bernstein uses the five evaluation points $0$, $1$, $X$, $X + 1$ and $\infty$.
Evaluations
$P_0 = A_0 B_0,$
$P_1 = (A_0 + A_1 + A_2)(B_0 + B_1 + B_2),$
$P_2 = (A_0 + A_1 X + A_2 X^2)(B_0 + B_1 X + B_2 X^2),$
$P_3 = \big((A_0 + A_1 + A_2) + (A_1 X + A_2 X^2)\big)\big((B_0 + B_1 + B_2) + (B_1 X + B_2 X^2)\big),$
$P_4 = A_2 B_2.$
Reconstruction
$U = P_0 + (P_0 + P_1)X^{n/3},$
$V = P_2 + (P_2 + P_3)(X^{n/3} + X),$
$W = (U + V + P_4(X^4 + X))(X^{2n/3} + X^{n/3}),$
$C = U + P_4(X^{4n/3} + X^{n/3}) + \frac{W}{X^2 + X}.$
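The evaluation and reconstruction steps can be traced on small inputs. The sketch below assumes the bitmask encoding of $\mathbb{F}_2[X]$ and writes $m$ for the part size $n/3$, so that $Y = X^m$ and $U$ is the line through the values at $Y = 0$ and $Y = 1$, namely $U = P_0 + (P_0 + P_1)X^m$. The function names are illustrative, the five products use a reference multiplier instead of recursion, and the exact division by $X^2 + X$ is done as a shift followed by a running XOR.

```python
def clmul(a, b):
    """Carry-less product in F2[X]; polynomials are encoded as bitmasks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def div_by_x2_plus_x(w):
    """Exact division by X^2 + X (assumes w is a multiple of it)."""
    v = w >> 1                       # dividing by X is a coefficient shift
    q, acc = 0, 0
    for k in range(v.bit_length() - 1, 0, -1):
        acc ^= (v >> k) & 1          # running XOR of the higher coefficients
        q |= acc << (k - 1)          # quotient by X + 1
    return q


def bernstein_three_way(a, b, m):
    """Five-product 3-way split for 3m-term inputs (m = n/3)."""
    mask = (1 << m) - 1
    a0, a1, a2 = a & mask, (a >> m) & mask, a >> (2 * m)
    b0, b1, b2 = b & mask, (b >> m) & mask, b >> (2 * m)
    ax = a0 ^ (a1 << 1) ^ (a2 << 2)              # A evaluated at Y = X
    bx = b0 ^ (b1 << 1) ^ (b2 << 2)
    p0 = clmul(a0, b0)                           # Y = 0
    p1 = clmul(a0 ^ a1 ^ a2, b0 ^ b1 ^ b2)       # Y = 1
    p2 = clmul(ax, bx)                           # Y = X
    p3 = clmul(ax ^ a1 ^ a2, bx ^ b1 ^ b2)       # Y = X + 1
    p4 = clmul(a2, b2)                           # Y = infinity
    y = 1 << m                                   # Y = X^m
    u = p0 ^ ((p0 ^ p1) << m)
    v = p2 ^ clmul(p2 ^ p3, y ^ 0b10)            # times (X^m + X)
    w = clmul(u ^ v ^ clmul(p4, 0b10010), (y << m) ^ y)
    return u ^ clmul(p4, (1 << (4 * m)) ^ y) ^ div_by_x2_plus_x(w)


a, b = 0b101101110, 0b110010011   # two 9-term polynomials, so m = 3
print(bernstein_three_way(a, b, 3) == clmul(a, b))
```

The loop in `div_by_x2_plus_x` is inherently sequential, since each quotient coefficient depends on the previous one.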
Complexities
$M(n) \le 3M(n/3) + 2M(n/3 + 2) + 35n/3 - 12,$
$M(n/3 + 2) \le M(n/3) + 8n/3 + 4,$
$M(n) \le 25.5n^{\log_3 5} - 25.5n + 1,$
$M(n) = O(n^{1.46}).$
Multi-evaluation and reconstruction data flow
[Figure: data flow of multi-evaluation and reconstruction for Bernstein's 3-way split, with blocks of size $n/3$, products $P_0, \dots, P_4$, a division by $X^2 + X$, and output $C$.]
Delay evaluations
Reconstruction:
$C = U + P_4(X^{4n/3} + X^{n/3}) + \frac{W}{X^2 + X}.$
Division by $X^2 + X$: First divide $W$ by $X$, which is a shift of the coefficients of $W$. Then divide $W/X$ by $X + 1$. The coefficients of $W/(X^2 + X)$ are
$w'_{n-j} = w_n + w_{n-1} + \cdots + w_{n-j+2}.$
The corresponding delay is $(n - 2)D_\oplus$, where $D_\oplus$ is the delay of a bit addition.
Delay complexity:
$D(n) = \left(\frac{3n}{2} + 8\log_3(n) - \frac{3}{2}\right)D_\oplus + D_\otimes.$
Three-way formula based on field extension
Cenk, Negre and Hasan proposed a different approach. Let $\mathbb{F}_4 = \mathbb{F}_2[\alpha]/(\alpha^2 + \alpha + 1) = \{0, 1, \alpha, \alpha + 1\}$ and evaluate the polynomials at $0$, $1$, $\alpha$, $\alpha + 1$ and $\infty$.
Evaluations
$P_0 = A_0 B_0,$
$P_1 = (A_0 + A_1 + A_2)(B_0 + B_1 + B_2),$
$P_2 = (A_0 + A_2 + \alpha(A_1 + A_2))(B_0 + B_2 + \alpha(B_1 + B_2)),$
$P_3 = (A_0 + A_1 + \alpha(A_1 + A_2))(B_0 + B_1 + \alpha(B_1 + B_2)),$
$P_4 = A_2 B_2.$
Reconstruction
$C = (P_0 + X^{n/3} P_4)(1 + X^n) + (P_1 + (1 + \alpha)(P_2 + P_3))(X^{n/3} + X^{2n/3} + X^n) + \alpha(P_2 + P_3)X^n + P_2 X^{2n/3} + P_3 X^{n/3}.$
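The reconstruction over $\mathbb{F}_4$ can also be checked on small inputs. In the sketch below, which is an illustration under stated assumptions, an element $u_0 + \alpha u_1$ of $\mathbb{F}_4[X]$ is stored as a pair of bitmasks `(u0, u1)`, $m$ stands for the part size $n/3$, and all helper names are hypothetical. The function returns the full pair so that the vanishing of the $\alpha$ component is visible.

```python
def clmul(a, b):
    """Carry-less product in F2[X]; polynomials are encoded as bitmasks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def f4_mul(u, v):
    """(u0 + alpha*u1)(v0 + alpha*v1) with alpha^2 = alpha + 1."""
    p00, p11 = clmul(u[0], v[0]), clmul(u[1], v[1])
    cross = clmul(u[0] ^ u[1], v[0] ^ v[1]) ^ p00 ^ p11   # u0*v1 + u1*v0
    return (p00 ^ p11, cross ^ p11)


def f4_add(*terms):
    r0 = r1 = 0
    for t0, t1 in terms:
        r0 ^= t0
        r1 ^= t1
    return (r0, r1)


def shift(u, k):
    """Multiply by X^k."""
    return (u[0] << k, u[1] << k)


def times_alpha(u):
    """alpha*(u0 + alpha*u1) = u1 + alpha*(u0 + u1)."""
    return (u[1], u[0] ^ u[1])


def three_way_f4(a, b, m):
    """3-way split of 3m-term binary polynomials using evaluation in F4."""
    mask = (1 << m) - 1
    a0, a1, a2 = a & mask, (a >> m) & mask, a >> (2 * m)
    b0, b1, b2 = b & mask, (b >> m) & mask, b >> (2 * m)
    p0 = (clmul(a0, b0), 0)
    p1 = (clmul(a0 ^ a1 ^ a2, b0 ^ b1 ^ b2), 0)
    p2 = f4_mul((a0 ^ a2, a1 ^ a2), (b0 ^ b2, b1 ^ b2))   # point alpha
    p3 = f4_mul((a0 ^ a1, a1 ^ a2), (b0 ^ b1, b1 ^ b2))   # point alpha + 1
    p4 = (clmul(a2, b2), 0)
    s23 = f4_add(p2, p3)
    t = f4_add(p1, s23, times_alpha(s23))    # P1 + (1 + alpha)(P2 + P3)
    base = f4_add(p0, shift(p4, m))
    return f4_add(base, shift(base, 3 * m),
                  shift(t, m), shift(t, 2 * m), shift(t, 3 * m),
                  shift(times_alpha(s23), 3 * m),
                  shift(p2, 2 * m), shift(p3, m))


a, b = 0b101101110, 0b110010011   # two 9-term polynomials, so m = 3
print(three_way_f4(a, b, 3) == (clmul(a, b), 0))
```

No division appears in this reconstruction, in contrast to the formula that evaluates at $X$ and $X + 1$.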
Complexities
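$M_{\mathbb{F}_2}(n) \le 2M_{\mathbb{F}_4}(n/3) + 3M_{\mathbb{F}_2}(n/3) + 29n/3 - 12,$
$M_{\mathbb{F}_4}(n) \le 5M_{\mathbb{F}_4}(n/3) + 58n/3 - 21,$
$M_{\mathbb{F}_4}(n) \le 30.75n^{\log_3 5} - 29n + 5.25,$
$M_{\mathbb{F}_2}(n) \le 30.75n^{\log_3 5} - 9.67n\log_3(n) - 30.5n + 0.75.$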
Multi-evaluation and reconstruction data flow
[Figure: data flow of multi-evaluation and reconstruction for the three-way formula over $\mathbb{F}_4$, with blocks of size $n/3$, products $P_0, \dots, P_4$, multiplications by $\alpha$ and $1 + \alpha$, and output $C$.]
Delay evaluations
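$D_{\mathbb{F}_2}(n) \le 7D_\oplus + D_{\mathbb{F}_4}(n/3),$
$D_{\mathbb{F}_4}(n) \le 9D_\oplus + D_{\mathbb{F}_4}(n/3),$
$D_{\mathbb{F}_4}(n) \le 9\log_3(n)D_\oplus + D_\otimes,$
$D_{\mathbb{F}_2}(n) \le (9\log_3(n) - 2)D_\oplus + D_\otimes.$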
Complexity comparisons
CNH complexities:
$M(n) \le 30.75n^{\log_3 5} - 9.67n\log_3(n) - 30.5n + 0.75,$
$D(n) \le (9\log_3(n) - 2)D_\oplus + D_\otimes.$
Bernstein's complexities:
$M(n) \le 25.5n^{\log_3 5} - 25.5n + 1,$
$D(n) \le \left(\frac{3n}{2} + 8\log_3(n) - \frac{3}{2}\right)D_\oplus + D_\otimes.$
A new split method for Bernstein’s 3-way split algorithm
We compute $(XA(X))(XB(X))$ instead of $A(X)B(X)$ using Bernstein's 3-way split algorithm:
$XA(X) = A_0 + A_1 X^{n+1} + A_2 X^{2n+2},$
$XB(X) = B_0 + B_1 X^{n+1} + B_2 X^{2n+2}.$
This method splits $3n$-term polynomials as $(n, n + 1, n - 1)$ rather than $(n, n, n)$:
$M_2(3n) \le M_2(n) + 2M_2(n + 1) + M_2(n + 2) + M_2(n - 1) + 35n - 12,$
$M_2(3n - 2) \le 2M_2(n) + M_2(n + 1) + 2M_2(n - 1) + 35n - 13.$
Improved 5-way split algorithm
$A = A_0 + A_1 X^n + A_2 X^{2n} + A_3 X^{3n} + A_4 X^{4n},$
$B = B_0 + B_1 X^n + B_2 X^{2n} + B_3 X^{3n} + B_4 X^{4n}.$
$m_1 = A_0 B_0, \quad m_2 = A_1 B_1, \quad m_3 = A_2 B_2, \quad m_4 = A_3 B_3, \quad m_5 = A_4 B_4,$
$m_6 = (A_0 + A_1)(B_0 + B_1),$
$m_7 = (A_0 + A_2)(B_0 + B_2),$
$m_8 = (A_2 + A_4)(B_2 + B_4),$
$m_9 = (A_3 + A_4)(B_3 + B_4),$
$m_{10} = (A_0 + A_2 + A_3)(B_0 + B_2 + B_3),$
$m_{11} = (A_1 + A_2 + A_4)(B_1 + B_2 + B_4),$
$m_{12} = (A_0 + A_3 + A_1 + A_4)(B_0 + B_3 + B_1 + B_4),$
$m_{13} = (A_0 + A_1 + A_2 + A_3 + A_4)(B_0 + B_1 + B_2 + B_3 + B_4).$
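Let $C = \sum_{i=1}^{10} U_i X^{(i-1)n}$. Here $p_{2i-1}$ and $p_{2i}$ are taken to be the low and high halves of $m_i$, i.e. $m_i = p_{2i-1} + p_{2i}X^n$ (an assumption, consistent with $U_1 = p_1$ and $U_{10} = p_{10}$).
$t_1 = p_1 + p_2, \quad t_2 = t_1 + p_3, \quad t_3 = t_2 + p_{11}, \quad t_4 = p_4 + p_5, \quad t_5 = p_{12} + p_{13},$
$t_6 = t_4 + t_5, \quad t_7 = t_2 + t_6, \quad t_8 = t_1 + t_4, \quad t_9 = p_6 + p_7, \quad t_{10} = t_8 + t_9,$
$t_{11} = t_{10} + p_9, \quad t_{12} = p_{14} + p_{15}, \quad t_{13} = t_{11} + t_{12}, \quad t_{14} = p_{19} + p_{23}, \quad t_{15} = t_{14} + p_{25},$
$t_{16} = t_{13} + t_{15}, \quad t_{17} = p_8 + p_9, \quad t_{18} = t_{17} + p_{10}, \quad t_{19} = t_{18} + p_{18}, \quad t_{20} = p_6 + p_7,$
$t_{21} = t_{18} + t_{20}, \quad t_{22} = p_{16} + p_{17}, \quad t_{23} = t_{21} + t_{22}, \quad t_{24} = t_{23} + t_3, \quad t_{25} = p_{20} + p_{21},$
$t_{26} = p_{25} + p_{26}, \quad t_{27} = p_{19} + p_{24}, \quad t_{28} = t_{25} + t_{26}, \quad t_{29} = t_{28} + t_{27}, \quad t_{30} = t_{29} + t_{24},$
$t_{31} = t_7 + t_{19}, \quad t_{32} = t_{28} + t_{31}, \quad t_{33} = p_{22} + p_{23}, \quad t_{34} = t_{32} + t_{33}, \quad t_{35} = t_{11} + p_1,$
$t_{36} = t_{35} + p_{10}, \quad t_{37} = t_{36} + t_{12}, \quad t_{38} = t_{37} + p_{22}, \quad t_{39} = t_{38} + p_{24}, \quad t_{40} = t_{39} + p_{26},$
$U_1 = p_1, \quad U_2 = t_3, \quad U_3 = t_7, \quad U_4 = t_{16}, \quad U_5 = t_{30},$
$U_6 = t_{34}, \quad U_7 = t_{40}, \quad U_8 = t_{23}, \quad U_9 = t_{19}, \quad U_{10} = p_{10}.$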
The asymptotic complexities of this algorithm are as follows:
$M_2(n) \le 13M_2(n/5) + 56n/5 - 18, \quad M_2(1) = 1,$
$M_2(n) \le 6.5n^{\log_5 13} - 7n + 1.5$, where $\log_5 13 \approx 1.59$,
$D_2(n) \le D_2(n/5) + 12D_X, \quad D_2(1) = D_A,$
$D_2(n) \le 12\log_5(n)D_X + D_A.$
New improved 3-way algorithm
$P_0 = A_0 B_0, \quad P_1 = (A_0 + A_1 + A_2)(B_0 + B_1 + B_2), \quad P_4 = A_2 B_2,$
$P_2 = (A_0 + A_2 + \alpha(A_1 + A_2))(B_0 + B_2 + \alpha(B_1 + B_2)) = P_{2,0} + \alpha P_{2,1},$
$C = P_4 X^{4n} + (P_0 + P_1 + P_{2,1})X^{3n} + (P_{2,0} + P_1 + P_{2,1})X^{2n} + (P_4 + P_1 + P_{2,0})X^n + P_0.$
The asymptotic complexities of this algorithm are as follows:
$M_2(n) \le 3M_2(n/3) + M_4(n/3) + 20n/3 - 5, \quad M_2(1) = 1,$
$M_2(n) \le 15.125n^{1.46} - 14.25n - 2.4274\log_3(n) + 0.125,$
$D_2(n) \le D_4(n/3) + 8D_X, \quad D_2(1) = D_A,$
$D_2(n) \le 10\log_3(n)D_X + D_A.$
Comparison of complexities
Table: Cost of multiplication

Algorithm   Split   M(n)                       Delay
Bernstein   2       $6.5n^{1.58} + O(n)$       $3\log_2(n)D_X$
Bernstein   3       $25.5n^{1.46} + O(n)$      $(1.5n + O(\log_3(n)))D_X$
CNH         3       $5.8n^{1.63} + O(n)$       $4\log_3(n)D_X$
CNH         3       $30.25n^{1.46} + O(n)$     $10\log_3(n)D_X$
CH          3       $15.125n^{1.46} + O(n)$    $10\log_3(n)D_X$
Bernstein   4       $6.425n^{1.58} + O(n)$     $5\log_4(n)D_X$
CH          5       $6.5n^{1.59} + O(n)$       $11\log_5(n)D_X$
New results
n      Previous   New
9      132        126
15     329        317
17     414        407
18     456        438
19     502        498
21     602        596
22     641        632
23     678        676
24     704        702
25     800        791
26     856        853
27     922        912
163    16923      16828
233    29354      29156
251    33096      32604
256    34079      33397
283    38735      38432
407    67374      66931
408    67582      67137
409    67753      67284
571    112569     111621
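To put the improvements in perspective, the relative savings can be computed from the table. The sketch below copies a subset of rows and, as a clearly labeled assumption, reuses the figure of 1266 field multiplications from the earlier binary Edwards curve example to estimate the saving per scalar multiplication for $n = 251$, with the reduction cost left unchanged.

```python
# (n, previous, new) rows copied from the table above (a subset).
ROWS = [(163, 16923, 16828), (233, 29354, 29156), (251, 33096, 32604),
        (283, 38735, 38432), (409, 67753, 67284), (571, 112569, 111621)]

for n, previous, new in ROWS:
    saved = previous - new
    print(f"n={n}: saves {saved} bit operations ({saved / previous:.2%})")

# Assumption: 1266 field multiplications per scalar multiplication, as in the
# earlier example over the 251-bit field.
FIELD_MULTS = 1266
saved_251 = FIELD_MULTS * (33096 - 32604)
print(saved_251)   # 622872
```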
Thank you for your attention.