Simulation Bounds for Equivalence Verification of Polynomial Datapaths Using Finite Ring Algebra∗
Namrata Shekhar1, Priyank Kalla1, M. Brandon Meredith2, Florian Enescu2
1 Department of Electrical & Computer Engineering, University of Utah, Salt Lake City, UT-84112
2 Department of Mathematics & Statistics, Georgia State University, Atlanta, GA-30303
Submitted For Review to the IEEE Transactions on VLSI Special Section on Design Verification and Validation: Theory and Techniques Guest Editors: Dhiraj K. Pradhan and Ian G. Harris
Designated Contact Author: Priyank Kalla Email:
[email protected] Ph: (801)-587-7617 Fax: (801)-581-5281
Abstract
This paper addresses simulation-based verification of high-level (algorithmic, behavioral or RTL) descriptions of arithmetic datapaths that perform polynomial computations over finite word-length operands. Such designs are typically found in digital signal processing (DSP) for audio/video and multi-media applications, where the word-lengths of input and output signals (bit-vectors) are predetermined and fixed according to the desired precision. Initial descriptions of such systems are usually specified as Matlab/C code. These are then automatically translated into behavioral/RTL descriptions (HDL) for subsequent hardware synthesis. In order to verify that the initial Matlab/C model is bit-true equivalent to the translated RTL, how many simulation vectors need to be applied? This paper derives results which show that exhaustive simulation is not necessary to prove or disprove their equivalence. To derive these results, we model the datapath computations as polynomial functions over finite integer rings of the form $\mathbb{Z}_{2^m}$, where m corresponds to the bit-vector word-length. Subsequently, by exploring some number-theoretic and algebraic properties of these rings, we derive an upper bound on the number of simulation vectors required to prove equivalence or to identify bugs. Moreover, these vectors cannot be arbitrarily generated. We identify exactly those vectors that need to be simulated. Experiments are performed within practical CAD settings to demonstrate the validity and applicability of these results. While our results cannot prove equivalence of an RTL description against its gate-level netlist, we show that there exists a large class of practical applications that can benefit from these results.

Keywords: Simulation-based verification, Equivalence Checking, Bit-Vector Arithmetic, Finite Integer Rings, Polynomial Functions

∗ This work is sponsored in part by: (i) NSF CAREER grant CCF-546859 and (ii) Georgia State University Research Initiation Grant.
I. INTRODUCTION

The increasing size and complexity of digital systems has resulted in a vast array of formal verification techniques which operate at different levels of abstraction. These include model checking, equivalence checking, and theorem proving, among others. In spite of many advances in formal verification, simulation-based validation has remained an important method for ensuring functional correctness during various stages of the design cycle. This paper addresses the problem of simulation-based equivalence testing of arithmetic datapaths that perform polynomial computations over finite word-length operands. Such datapath-oriented designs are commonly found in digital signal processing applications that perform ADD and MULT types of algebraic computations over bit-vectors of specified widths. Fig. 1 describes a typical design flow for such datapath-intensive (signal-processing) applications, along with the context in which the equivalence verification problem appears.

[Figure 1 flow diagram; box labels: Simulink Model (Floating point Design), Automated Fixed Point Generation, Simulink Model (Fixed point Design), Conversion Utility, RTL models, Optimisation, Equivalence Verification?, Logic Simulation, RTL models, Synthesis.]
Fig. 1. The Equivalence Verification problem: Matlab to RTL design flow.
Initial algorithmic specifications (such as MATLAB models) of most signal processing applications involve data representation using floating-point formats. However, they are often implemented with fixed-point architectures, where the required precision dictates the bit-vector sizes of the variables. Various automated tools exist for this translation [1]. For synthesis and optimization purposes, these high-level descriptions may be subsequently converted to HDL using automatic utilities [2] [3]. Design optimization may be further achieved by applying high-level synthesis and restructuring operations on the translated RTL model [4] [5]. It is required to show that the translated and optimized RTL models are bit-true equivalent to the fixed-point specification.

Simulation is extensively used to validate the input-output behavior of the original model (at the MATLAB or C level), say to validate the pass-band of a filter. These vectors can then also be applied at the RT-level. Validation of the Matlab model is fast, even with a large number of test vectors, since it is compiled-code simulation. However, simulating the RTL model with the same set of vectors is generally slow. In general, equivalence proofs require exhaustive simulation, which makes simulation-based approaches impractical. In this regard, this paper derives an important result related to simulation-based verification of high-level descriptions of arithmetic datapaths. In particular, we show:
1. That exhaustive simulation is not always necessary to verify equivalence or to find bugs;
2. That an upper bound can be derived for the maximum number of test vectors required for this purpose; and
3. Which simulation vectors to choose for the equivalence proofs.

A. Bit-Vector Arithmetic and Finite Ring Algebra

Polynomial algebra provides a suitable platform for modeling such arithmetic-intensive designs. However, for correct modeling of such systems, the bit-vector size needs to be accounted for in the polynomial model. For example, the largest (unsigned) integer value that a bit-vector of size m can represent is $2^m - 1$, implying that the bit-vector represents integer values reduced modulo $2^m$ (written $\% 2^m$). This suggests that bit-vector arithmetic can be efficiently modeled as algebra over finite integer rings [6]. Therefore, we model the datapaths as polynomial functions over the ring $\mathbb{Z}_{2^m}$, where the bit-vector word-length (m) dictates the cardinality of the ring. Subsequently, we exploit the number-theoretic and algebraic properties of these rings to systematically establish the claims mentioned above. Let us motivate this issue using a few examples and put our contribution in perspective.

    module fixed_bit_width (x, t1, t2);
      input  [7:0] x;
      output [7:0] t1, t2;
      // x*x implements the x^2 term of each polynomial
      assign t1[7:0] = 64*x*x + 192*x - 169;
      assign t2[7:0] = 192*x*x + 64*x + 87;
    endmodule
Consider the RTL computations t1 and t2 shown above. The outputs t1 and t2 are symbolically distinct polynomials. However, because the datapath size is fixed to 8 bits, the computations t1 and t2 are bit-true equivalent. In other words, $t_1[7:0] \equiv t_2[7:0]$, or mathematically speaking, $t_1 \ \% \ 2^8 \equiv t_2 \ \% \ 2^8$. To prove the desired equivalence, exhaustive simulation would require that we compute and compare the values of t1 and t2 for $x = 0, 1, \ldots, 2^8 - 1$. This paper derives results which prove that in the above case: i) if the two designs are not equivalent (bug), a maximum of 10 (specific) vectors are sufficient to capture the erroneous behavior; ii) if no bug is detected for these 10 vectors, then the designs are indeed equivalent. This is a significant reduction in the number of required simulation vectors (10 instead of $2^8 = 256$).

Now, consider a practical application. Given two degree-k polynomials $F_1(x)$ and $F_2(x)$, their coefficients can be represented as (k + 1)-length vectors $A = (a_0, \ldots, a_k)$ and $B = (b_0, \ldots, b_k)$. Computing the convolution of such vectors results in another (k + 1)-length vector, according to:

$$c_i = \sum_{j=0}^{k} a_j \, b_{i-j}, \qquad 0 \le i \le k \tag{1}$$

where $c_i$ is the i-th component of the vector C. This procedure, however, has a complexity of $O(n^2)$. It is well-known [7] [8] that convolution can be effectively implemented in hardware by:
1. Computing the DFT of vectors A and B;
2. Calculating their pairwise product; and finally,
3. Taking the inverse DFT of the result.
In other words, the result vector $(c'_0, c'_1, \ldots, c'_k)$ is computed as

$$C' = DFT^{-1}(DFT(A) \cdot DFT(B)) \tag{2}$$
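As a rough illustration of Eqns. (1) and (2), the sketch below compares the two computations in Python. It rests on assumptions that the text does not state: the index $i - j$ is taken modulo $k + 1$ (cyclic convolution, which is what a same-length DFT product computes), and a plain $O(n^2)$ DFT stands in for a hardware FFT. The function names are hypothetical.

```python
import cmath

def cyclic_convolution(a, b):
    """Eqn. (1), assuming the index i - j wraps modulo len(a)."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

def dft(v, inverse=False):
    """Naive O(n^2) DFT; a stand-in for an FFT block."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [
        sum(v[t] * cmath.exp(sign * 2j * cmath.pi * s * t / n) for t in range(n))
        for s in range(n)
    ]
    return [z / n for z in out] if inverse else out

def convolution_via_dft(a, b):
    """Eqn. (2): DFT of both inputs, pairwise product, inverse DFT."""
    product = [x * y for x, y in zip(dft(a), dft(b))]
    # Inputs are integers, so the exact result is an integer vector;
    # rounding removes floating-point noise.
    return [round(z.real) for z in dft(product, inverse=True)]

A = [1, 2, 3, 4]
B = [5, 6, 7, 8]
direct = cyclic_convolution(A, B)
via_dft = convolution_via_dft(A, B)

# A 4-bit datapath keeps only the values modulo 2^4.
direct_4bit = [c % 16 for c in direct]
print(direct, via_dft, direct_4bit)
```

Both routes agree on the integers; the verification question in the text is whether two fixed-width implementations of them agree after reduction modulo $2^m$.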
This operation is shown in Fig. 2 for degree-3 polynomials. Such a 'divide-and-conquer' strategy results in a complexity of $O(n \cdot \log n)$, and is a popular way of implementing the convolution of two vectors.

[Figure 2 block diagram; labels: inputs a0, a1, a2, a3 into FFT(A); inputs b0, b1, b2, b3 into FFT(B); products FAB0, FAB1, FAB2, FAB3 into InvFFT(FAB); outputs c'0, c'1, c'2, c'3.]
Fig. 2. Convolution of A and B.

Suppose that we intend to verify that both implementations compute the same values, say, for coefficients $c_0$ (obtained via direct convolution) and $c'_0$ (obtained via DFT-product-InvDFT). Further, as is often the case, assume that the entire datapath word-length in both cases is fixed to a certain width, say 4 bits. Each of the inputs ($a_i$ and $b_i$, $0 \le i \le 3$) can take values in $\{0, \ldots, 2^4 - 1\}$, requiring a total of $2^{4 \cdot 8}$ test vectors. We provide theoretical results which show that $6^8$ vectors are sufficient to prove or disprove equivalence. Again, a method for generating these specific simulation vectors is also derived.

B. Problem Modeling

We model the arithmetic computations over bit-vectors as follows. Let $x_1, x_2, \ldots, x_d$ denote the d variables (bit-vectors) in the design. Let $n_1, n_2, \ldots, n_d$ denote the sizes of the corresponding bit-vectors. Therefore, $x_1 \in \mathbb{Z}_{2^{n_1}}, x_2 \in \mathbb{Z}_{2^{n_2}}, \ldots, x_d \in \mathbb{Z}_{2^{n_d}}$. Note that $\mathbb{Z}_{2^{n_i}}$ corresponds to the finite set of integers $\{0, 1, \ldots, 2^{n_i} - 1\}$. Let m correspond to the size of the output bit-vector f; hence, $f \in \mathbb{Z}_{2^m}$. Subsequently, we model the arithmetic datapath computation as a polynomial function (or polyfunction) from $\mathbb{Z}_{2^{n_1}} \times \mathbb{Z}_{2^{n_2}} \times \cdots \times \mathbb{Z}_{2^{n_d}}$ to $\mathbb{Z}_{2^m}$ [9]. Here $\mathbb{Z}_a \times \mathbb{Z}_b$ represents the Cartesian product of $\mathbb{Z}_a$ and $\mathbb{Z}_b$. In other words, the computation is modeled as a multi-variate polynomial $F(x_1, x_2, \ldots, x_d) \ \% \ 2^m$. The equivalence problem then corresponds to checking the congruence of two polynomials: $F(x_1, \ldots, x_d) \ \% \ 2^m \equiv G(x_1, \ldots, x_d) \ \% \ 2^m$.

The verification problem of Fig. 1 has seen a lot of interest recently in [10] [11] [12] [13] [14]. The works of [13] [14] use the same polynomial function model to derive a symbolic approach to prove or disprove equivalence of arithmetic datapaths. However, these works are restricted inasmuch as they can only provide a "yes/no" answer to the equivalence check. They cannot provide an error trace when bugs are detected. Moreover, our results also have implications on the applicability of the fundamental theorem of algebra, which has been used in hardware design and verification [10] [11] [12], as described below.
C. Bit-Vector Arithmetic versus the Fundamental Theorem of Algebra

Lemma 1: Let P(x) be a degree-k uni-variate polynomial. If P(x) = 0 for (k + 1) distinct values of x, then all the coefficients of P(x) are zero.

The above lemma [15] is based on the fundamental theorem of algebra [16]. This theorem states that a degree-k uni-variate polynomial P(x) has exactly k complex roots, unless all its coefficients are zero. Since integers are a special case of complex numbers, this theorem holds for the set $\mathbb{Z}$ as well. The work of [11] used this result for equivalence verification by modeling arithmetic datapaths as polynomials, and showed that k + 1 vectors were sufficient to prove equivalence of any given degree-k polynomials. It was later extended in [15] to be applicable to multi-variate polynomials as well, and was further applied to reduce the complexity of model-checking. Also, [10] used the same concepts for extracting polynomial representations from RTL descriptions for high-level synthesis purposes.

However, the above results are only relevant in unique factorization domains (UFDs), such as the set of real numbers ($\mathbb{R}$), the set of integers ($\mathbb{Z}$), finite fields ($\mathbb{Z}_p$, $GF(p^n)$, p prime), etc. In our context, the specific modulo value ($2^m$) does not correspond to unique factorization, due to the presence of zero-divisors (e.g., $4 \neq 0$ and $2 \neq 0$, yet $4 \cdot 2 \equiv 0 \ \% \ 8$), and correspondingly due to the lack of multiplicative inverses. In other words, finite rings of residue classes $\mathbb{Z}_{2^m}$ do not form a field. As a result, factorization is not unique in such rings, as shown for the polynomial $F(x) = x^2 + 6x$ over $\mathbb{Z}_8$ below.

$$x^2 + 6x = x(x + 6) \ \% \ 2^3 = (x + 4)(x + 2) \ \% \ 2^3$$

The polynomial F(x) has degree k = 2, but can be factorized in two different ways, corresponding to four distinct roots (x = 0, 2, 4, 6). In such cases, Lemma 1 does not hold: simulating for k + 1 = 3 values such as x = 0, 2, 4 does show F = 0, but that does not mean that all the coefficients of F(x) are zero. Indeed, for x = 1, $F(x) = 7 \neq 0$. Therefore, simulating for only k + 1 vectors is insufficient for verification. Clearly, for bit-vector arithmetic, properties of this class of rings (of the type $\mathbb{Z}_{2^m}$) need to be investigated further for simulation-based verification. We will now explore results for polynomial functions over such finite integer rings, with applications in simulation-based verification of arithmetic datapaths.

D. Scope of the Paper

The approach presented in this paper has been applied to verify high-level descriptions of arithmetic datapaths, such as those in C and RTL (Verilog/VHDL), some of which were automatically generated by MATLAB (Simulink and filter design toolboxes) [3].
Our technique is applicable to designs that implement unsigned and two's complement (overflow) arithmetic. In the DSP domain, however, rounding as well as saturation are also common modes of approximation. Modeling such architectures as polynomial functions over finite rings is significantly more involved and is not the subject of this paper. For the same reason, verification of (behavioral) RTL against its corresponding gate-level implementation (netlist) is also not dealt with in the paper. Even within this scope, we demonstrate that there exists a large class of applications that can benefit from our results.

E. Paper Organization

This paper is organized as follows. The next section reviews related work in simulation-based verification. Section III covers preliminary concepts and background material regarding polynomial functions and finite ring theory. Section IV describes the proposed results for univariate polynomials and provides the mathematical foundation for their support. These results are extended to multi-variate computations in Section V. Finally, Section VI describes the experimental setup and results, while Section VII concludes the paper.

II. PREVIOUS WORK

Bryant, in [17] [18], used three-valued logic simulation to reduce the required number of simulation vectors for circuit verification. Subsequently, [19] incorporated some of these techniques into their framework, which provided a method for design verification at different levels of abstraction. Brand [20] proposed exploiting information from the design specification to significantly reduce the complexity of simulation. Clarke et al. further researched the problem of specifications and generators in [21]. However, BDDs were used to demonstrate a practical approach to this problem in the SimGen project [22]. Later on, Shimizu et al. [23] [24] automated this approach to verify large industrial designs. The above methods are focused on generating the appropriate number of test vectors to ensure sufficient verification coverage. Our approach, on the other hand, applies polynomial methods that make exhaustive simulation unnecessary for arithmetic datapaths.

An algebraic approach to reducing the test vector set was proposed in [11], and later extended in [12]. These works model the given datapaths as polynomials and apply the fundamental theorem of algebra to verify the descriptions. Recently, [15] extended the theorem to be applicable to multi-variate polynomials as well.
However, as mentioned earlier, these results do not always hold over the proposed model, which accounts for bit-vector sizes of the input and output variables.

The works which come closest to ours have been presented in [13] [14]. They model the given datapath as a polynomial function over a system of finite integer rings. [13] proves equivalence of fixed-size datapaths by using canonical representations of polynomials over finite rings. The concept of vanishing polynomials is used in [14] to derive a symbolic approach to test equivalence of polynomial functions. Both approaches use some form of algebraic simplification, which suffers from the well-known intermediate-expression swell problem [25]. In addition, these techniques cannot provide an error trace whenever non-equivalence is detected.

This paper reinterprets the polynomial function model and extends the concepts presented in [9] to derive a novel solution for simulation-based verification of high-level descriptions of arithmetic datapaths. The next few sections cover some preliminary concepts, and then derive the theoretical contributions of this paper. Practical application of our work is subsequently demonstrated.

III. PRELIMINARIES

This section briefly reviews basic commutative algebra concepts to put our polynomial equivalence problems in perspective. The material is mostly taken from [6].

Definition III.1: An Abelian group is a set G and a binary operation '+' satisfying:
• Closure: For every $a, b \in G$, $a + b \in G$.
• Associativity: For every $a, b, c \in G$, $a + (b + c) = (a + b) + c$.
• Commutativity: For every $a, b \in G$, $a + b = b + a$.
• Identity: There is an identity element $0 \in G$ such that for all $a \in G$, $a + 0 = a$.
• Inverse: If $a \in G$, then there is an element $a^{-1} \in G$ such that $a + a^{-1} = 0$.
The set of integers $\mathbb{Z}$, for instance, forms an Abelian group under addition.

Definition III.2: A commutative ring with unity is a set R and two binary operations '+' and '·', as well as two distinguished elements $0, 1 \in R$, such that R is an Abelian group with respect to addition with additive identity element 0, and the following properties are satisfied:
• Multiplicative Closure: For every $a, b \in R$, $a \cdot b \in R$.
• Multiplicative Associativity: For every $a, b, c \in R$, $a \cdot (b \cdot c) = (a \cdot b) \cdot c$.
• Multiplicative Commutativity: For every $a, b \in R$, $a \cdot b = b \cdot a$.
• Multiplicative Identity: There is an identity element $1 \in R$ such that for all $a \in R$, $a \cdot 1 = a$.
• Distributivity: For every $a, b, c \in R$, $a \cdot (b + c) = a \cdot b + a \cdot c$.
The set $\mathbb{Z}_n = \{0, 1, \ldots, n - 1\}$, where $n \in \mathbb{N}$, forms a commutative ring with unity. It is called the residue class ring, where addition and multiplication are defined modulo n (% n) according to the rules below. For our application, $n = 2^m$.

$$(a + b) \ \% \ n = (a \ \% \ n + b \ \% \ n) \ \% \ n \tag{3}$$
$$(a \cdot b) \ \% \ n = (a \ \% \ n \cdot b \ \% \ n) \ \% \ n \tag{4}$$
$$(-a) \ \% \ n = (n - a \ \% \ n) \ \% \ n \tag{5}$$
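The rules in Eqns. (3)-(5) can be tried out directly. The following is a minimal sketch, assuming an 8-bit datapath ($n = 2^8$); the helper names `add_mod`, `mul_mod`, `neg_mod` and `truncate` are hypothetical and only illustrate that reducing modulo $2^m$ matches keeping the low m bits of a result.

```python
M = 8            # assumed bit-vector width m
N = 1 << M       # ring size n = 2^m

def add_mod(a, b, n=N):
    """Eqn. (3): addition in the residue class ring Z_n."""
    return ((a % n) + (b % n)) % n

def mul_mod(a, b, n=N):
    """Eqn. (4): multiplication in Z_n."""
    return ((a % n) * (b % n)) % n

def neg_mod(a, n=N):
    """Eqn. (5): additive inverse in Z_n."""
    return (n - a % n) % n

def truncate(value, m=M):
    """Keep only the low m bits, as an m-bit register would."""
    return value & ((1 << m) - 1)

samples = [(200, 100), (255, 255), (17, 3)]
sums_match = all(add_mod(a, b) == truncate(a + b) for a, b in samples)
products_match = all(mul_mod(a, b) == truncate(a * b) for a, b in samples)
print(sums_match, products_match, neg_mod(169), truncate(-169))
```

Here `neg_mod(169)` evaluates to 87, which is why the constants −169 and 87 in the 8-bit example of Section I-A denote the same ring element.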
Definition III.3: Integers x, y are called congruent modulo n ($x \equiv y \ \% \ n$) if n is a divisor of their difference: $n \mid (x - y)$.

Definition III.4: A zero divisor is a non-zero element x of a ring R for which $x \cdot y \equiv 0$, where y is some other non-zero element of R and the multiplication $x \cdot y$ is defined according to Eqn. (4).

As an example, consider the non-zero integers 2 and 4 in the ring $\mathbb{Z}_8$. Since $2 \cdot 4 \equiv 0 \ \% \ 8$, 2 and 4 are zero divisors of each other. A commutative ring that has no zero divisors is known as an integral domain. The set of integers, $\mathbb{Z}$, is an example.

Definition III.5: A field F is a commutative ring with unity, where every element in F, except 0, has a multiplicative inverse, i.e. $\forall a \in F - \{0\}$, $\exists \hat{a} \in F$ such that $a \cdot \hat{a} = 1$.

The system $\mathbb{Z}_n$ forms a field if and only if n is prime. Hence, $\mathbb{Z}_{2^m}$ (for m > 1) is not a field, as not every element in $\mathbb{Z}_{2^m}$ has an inverse. The lack of inverses in $\mathbb{Z}_{2^m}$ makes RTL verification complicated, since Euclidean algorithms for division and factorization are no longer applicable.

Definition III.6: Let R be a ring. A polynomial over R in the indeterminate x is an expression of the form:

$$a_0 + a_1 x + a_2 x^2 + \cdots + a_k x^k = \sum_{i=0}^{k} a_i x^i \tag{6}$$
where $a_i \in R$ for all i. The elements $a_i$ are the coefficients and k is the degree. The element $a_k$ is called the leading coefficient; when $a_k = 1$, the polynomial is monic.

The system consisting of the set of all polynomials in x over the ring R, with addition and multiplication defined accordingly, also forms a ring, called the ring of polynomials R[x]. Similarly, $R[x_1, \ldots, x_d]$ denotes a ring of multi-variate polynomials in d variables. Note that, when $R = \mathbb{Z}_{2^m}$, the corresponding polynomials are evaluated $\% \ 2^m$.

Functions over finite rings that can be represented by polynomials are generally termed polynomial functions (or polyfunctions). Singmaster [26] initially studied various properties of polynomial functions of the type $f : \mathbb{Z}_n \to \mathbb{Z}_n$. Subsequently, Chen extended the study to analyze polyfunctions of the type $f : \mathbb{Z}_n \to \mathbb{Z}_m$ [27] and further to those of the type $f : \mathbb{Z}_{n_1} \times \mathbb{Z}_{n_2} \times \cdots \times \mathbb{Z}_{n_d} \to \mathbb{Z}_m$ [9]. The following definition of such a polynomial function is taken from [9], and modified, for our application, to rings modulo an integer power of 2.

Definition III.7: A function f from $\mathbb{Z}_{2^{n_1}} \times \mathbb{Z}_{2^{n_2}} \times \cdots \times \mathbb{Z}_{2^{n_d}}$ to $\mathbb{Z}_{2^m}$ is said to be a polynomial function (or polyfunction) if it is represented by a polynomial $F \in \mathbb{Z}[x_1, x_2, \ldots, x_d]$; i.e. $f(x_1, x_2, \ldots, x_d) \equiv F(x_1, x_2, \ldots, x_d)$ for all $x_i \in \mathbb{Z}_{2^{n_i}}$, $i = 1, 2, \ldots, d$, where ≡ denotes congruence (mod $2^m$).

Example III.1: Let $f : \mathbb{Z}_2 \times \mathbb{Z}_{2^2} \to \mathbb{Z}_{2^3}$ be a polyfunction in two variables $(x_1, x_2)$, defined as: f(0, 0) = 1, f(0, 1) = 3, f(0, 2) = 5, f(0, 3) = 7, f(1, 0) = 1, f(1, 1) = 4, f(1, 2) = 1, f(1, 3) = 0. Then, f is a polyfunction representable by $F = 1 + 2x_2 + x_1 x_2^2$, since $f(x_1, x_2) \equiv F(x_1, x_2) \ \% \ 2^3$ for $x_1 = 0, 1$ and $x_2 = 0, 1, 2, 3$.

It is possible for a polynomial with non-zero coefficients to vanish on such mappings, in which case the polynomials are called vanishing polynomials ($F \equiv 0$) and their underlying functions correspond to nil polyfunctions.

Example III.2: Consider the function $f(x_1, x_2) : \mathbb{Z}_2 \times \mathbb{Z}_{2^2} \to \mathbb{Z}_{2^3}$ represented by the polynomial $F = 4x_1 x_2^2 + 4x_1 x_2$. While F has non-zero coefficients, $F \ \% \ 2^3 \equiv 0$, $\forall x_1 \in \mathbb{Z}_2$, $\forall x_2 \in \mathbb{Z}_4$.

Properties of such polynomials have been extensively studied in number theory and commutative algebra [28] [26] [27] [9]. We will later review some of these results in the context of deriving simulation bounds for equivalence of polyfunctions. Examples are used to demonstrate relevant results; their corresponding proofs can be found in [27] [9] and are therefore not reproduced here.

Definition III.8: Two polynomials $F_1$ and $F_2$ are equivalent $\% \ 2^m$ if their associated polyfunctions $f_1$ and $f_2$ from $\mathbb{Z}_{2^{n_1}} \times \mathbb{Z}_{2^{n_2}} \times \cdots \times \mathbb{Z}_{2^{n_d}}$ to $\mathbb{Z}_{2^m}$ are equal.
We denote equivalence as $F_1 \ \% \ 2^m \equiv F_2 \ \% \ 2^m$.

IV. UNIVARIATE POLYNOMIALS

Given a polynomial function f(x) and its representative polynomial F(x) over $\mathbb{Z}_{2^m}$, we need to reinterpret F in a way that is more suitable for our purposes. In this context, we first define the forward difference operator (∆) [29], which is a discrete analog of the derivative of a polynomial function.

Definition IV.1: Let F(x) be a degree-p polynomial over $\mathbb{Z}$.

$$(\Delta F)(x) = F(x + 1) - F(x)$$
$$(\Delta^2 F)(x) = (\Delta F)(x + 1) - (\Delta F)(x)$$
$$\vdots$$
$$(\Delta^p F)(x) = \sum_{k=0}^{p} (-1)^k \binom{p}{k} F(p - k + x) \tag{7}$$

Let us now demonstrate the application of the forward difference operator on a degree-2 polynomial in x.

Example IV.1: Let $F(x) = 4x^2 + 3x$ be a polynomial over $\mathbb{Z}$. Applying the ∆ operator described in Def. IV.1, we get

$$(\Delta F)(x) = F(x + 1) - F(x) = 4(x + 1)^2 + 3(x + 1) - (4x^2 + 3x) = 8x + 7$$
$$(\Delta^2 F)(x) = (\Delta F)(x + 1) - (\Delta F)(x) = 8(x + 1) + 7 - (8x + 7) = 8$$
$$(\Delta^3 F)(x) = (\Delta^2 F)(x + 1) - (\Delta^2 F)(x) = 0$$

We now state Newton's interpolation formula based on the above definition. The proof of this formula is widely available in the literature [30] and is not reproduced here.

Proposition 1: If F(x) is a polynomial of degree p with integral coefficients, then it can be written as

$$F(x) = \sum_{k=0}^{p} (\Delta^k F)(0) \binom{x}{k} \tag{8}$$

Note that the binomial term in Eqn. (8) can be expanded according to

$$\binom{x}{k} = \frac{x(x - 1) \cdots (x - k + 1)}{k!} \tag{9}$$

The numerator of this term represents (in polynomial form) a product of k consecutive numbers in x. More formally, we have the following definition [27]:

Definition IV.2: Falling factorials of degree k are defined as:
• $Y_0(x) = 1$
• $Y_1(x) = x$
• $Y_2(x) = x(x - 1)$
  ⋮
• $Y_k(x) = x \cdot (x - 1) \cdots (x - k + 1)$
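The definitions above translate almost directly into code. The sketch below is illustrative only: the function names are hypothetical, polynomials are represented as plain Python callables, and the polynomial of Example IV.1 is used to compare the recursive definition of ∆ with the closed form of Eqn. (7).

```python
from math import comb

def forward_difference(f):
    """Return the function (Delta f)(x) = f(x + 1) - f(x)."""
    return lambda x: f(x + 1) - f(x)

def delta_p(f, p, x):
    """Closed form of Eqn. (7) for the p-th forward difference at x."""
    return sum((-1) ** k * comb(p, k) * f(p - k + x) for k in range(p + 1))

def falling_factorial(k, x):
    """Y_k(x) = x (x - 1) ... (x - k + 1), with Y_0(x) = 1."""
    result = 1
    for i in range(k):
        result *= x - i
    return result

# Polynomial of Example IV.1.
F = lambda x: 4 * x * x + 3 * x

d1 = forward_difference(F)    # expected to behave like 8x + 7
d2 = forward_difference(d1)   # expected to be the constant 8
d3 = forward_difference(d2)   # expected to be identically 0

print([d1(x) for x in range(4)], d2(0), d3(0), delta_p(F, 2, 5))
```

Repeated application of `forward_difference` and the closed form `delta_p` agree, which is the property Newton's formula in Eqn. (8) relies on.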
Eqn. (8) can now be written as:

$$F(x) = \sum_{k=0}^{p} (\Delta^k F)(0) \, \frac{x(x - 1) \cdots (x - k + 1)}{k!} = \sum_{k=0}^{p} \frac{(\Delta^k F)(0)}{k!} \, Y_k(x) = \sum_{k=0}^{p} c_k \cdot Y_k(x) \tag{10}$$

where $c_k = \frac{(\Delta^k F)(0)}{k!}$. Since F(x) has integral coefficients, $c_k$ is always an integer for $0 \le k \le p$. Note that F(x) has a maximum of (p + 1) terms in the sum. This is illustrated in the following example:

Example IV.2: Consider the polynomial $F(x) = 4x^2 + 3x$ over $\mathbb{Z}$. Here, the degree of F(x) is p = 2 and F can be expanded into p + 1 = 3 terms. Using Eqn. (10) and the $(\Delta^k F)(x)$ values computed in Example IV.1, this polynomial can be expressed as

$$F(x) = \sum_{k=0}^{2} c_k \cdot Y_k(x) = c_0 \cdot Y_0(x) + c_1 \cdot Y_1(x) + c_2 \cdot Y_2(x)$$
$$= \frac{(\Delta^0 F)(0)}{0!} \cdot 1 + \frac{(\Delta^1 F)(0)}{1!} \cdot x + \frac{(\Delta^2 F)(0)}{2!} \cdot x(x - 1)$$
$$= 7x + 4x(x - 1)$$

The above interpretation is useful in deriving the results proposed in this paper. However, before we proceed to the relevant theorems, we briefly review the concept of vanishing polynomials.
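The coefficients $c_k$ of Eqn. (10) can be obtained from a table of successive differences of the values F(0), F(1), ..., F(p). The following is a small sketch of that computation, not the authors' implementation; `newton_coefficients` and `evaluate_newton` are hypothetical names, and the integer division is exact only under the stated assumption that F has integral coefficients.

```python
from math import factorial

def falling_factorial(k, x):
    """Y_k(x) = x (x - 1) ... (x - k + 1), with Y_0(x) = 1."""
    result = 1
    for i in range(k):
        result *= x - i
    return result

def newton_coefficients(f, degree):
    """Return [c_0, ..., c_degree] with c_k = (Delta^k f)(0) / k!."""
    values = [f(x) for x in range(degree + 1)]
    coeffs = []
    for k in range(degree + 1):
        # values[0] currently holds (Delta^k f)(0); it is divisible by k!
        # when f has integer coefficients.
        coeffs.append(values[0] // factorial(k))
        values = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    return coeffs

def evaluate_newton(coeffs, x):
    """Evaluate sum_k c_k * Y_k(x), the right-hand side of Eqn. (10)."""
    return sum(c * falling_factorial(k, x) for k, c in enumerate(coeffs))

F = lambda x: 4 * x * x + 3 * x      # polynomial of Example IV.2
coeffs = newton_coefficients(F, 2)
print(coeffs)
```

For Example IV.2 this yields $c_0 = 0$, $c_1 = 7$, $c_2 = 4$, matching $F(x) = 7x + 4x(x - 1)$. Only p + 1 consecutive evaluations of F are needed to build the table.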
A. Vanishing Polynomials over $\mathbb{Z}_{2^m}$

According to a fundamental result in number theory, for any $n \in \mathbb{N}$, it is possible to find the least value $\lambda \in \mathbb{N}$ such that $n \mid \lambda!$. We denote this value as SF(n)¹, i.e. λ = SF(n) [33]. Consider the ring $\mathbb{Z}_{2^m}$. Here, $SF(2^m)$ corresponds to the least value λ such that $2^m \mid \lambda!$.

¹ This function has been extensively studied in number theory. It was initially studied by Lucas [31] and Kempner [32] and was recently revisited by Smarandache [33]. In recent literature it is often referred to as the Smarandache function, and hence we refer to it as SF(n).

The significance of the above concept can be explained as follows. In the ring $\mathbb{Z}_{2^3}$, $SF(2^3) = 4$, as 8 divides $4! \ (= 4 \times 3 \times 2 \times 1 = 8 \times 3)$. Note that 8 does not divide 3!, and hence the least λ in question is 4. Consequently, any integer that can be factored into a product of (at least) 4 consecutive numbers will be divisible by $2^3$. In other words, an integer that can be factored into a product of λ consecutive numbers will be $\equiv 0 \ \% \ 2^3$ and vanish in $\mathbb{Z}_{2^3}$.

Now consider a polynomial F(x) over the ring $\mathbb{Z}_{2^3}$. If F, evaluated at any x, can be represented as the product of 4 consecutive numbers (depending on x), then $2^3 \mid F(x)$ and F vanishes in $\mathbb{Z}_{2^3}$. Falling factorials are natural examples of such polynomials. Indeed, if $F(x) = Y_4(x) = x(x - 1)(x - 2)(x - 3)$, then $F(x) \equiv 0 \ \% \ 2^3$. More formally, we have the following result:

Lemma 2: Any polynomial in $\mathbb{Z}_{2^m}[x]$ that can be expressed as a multiple of $Y_\lambda(x)$ will be divisible by $2^m$ and vanish.

Example IV.3: Consider the polynomial F(x) corresponding to the function $f : \mathbb{Z}_{2^4} \to \mathbb{Z}_{2^4}$:

$$F(x) = x^6 + x^5 + 5x^4 + 15x^3 + 2x^2 + 8x \tag{11}$$

Here, $\lambda = SF(2^4) = 6$. Therefore, if F(x) can be factored into a product of 6 consecutive numbers in x, then F(x) is a vanishing polynomial in $\mathbb{Z}_{2^4}$. In fact, $F(x) \equiv x(x - 1)(x - 2)(x - 3)(x - 4)(x - 5) \ \% \ 2^4 \equiv Y_6(x) \ \% \ 2^4$; hence $F(x) \equiv 0 \ \% \ 2^4$.

When a polynomial cannot be factorized according to Lemma 2, it can still vanish. In this regard, Chen [27] identified the constraints on the coefficients which determine whether the polynomial in question vanishes. We state the following result.

Lemma 3: The expression $c_k \cdot Y_k(x) \equiv 0$ in $\mathbb{Z}_{2^m}[x]$ if and only if $\frac{2^m}{(k!, 2^m)} \mid c_k$, where
• $c_k$ is an arbitrary integer in $\mathbb{Z}$,
• $Y_k(x)$ is as defined above,
• $(k!, 2^m)$ is the greatest common divisor (GCD) of k! and $2^m$, and
• $k \in \mathbb{Z}$, such that $0 \le k < \lambda$.

In other words, if $c_k \equiv 0 \ \% \ \frac{2^m}{(k!, 2^m)}$, then $c_k \cdot Y_k(x) \equiv 0$.

Example IV.4: Let $F(x) = 4x^2 - 4x$ over $\mathbb{Z}_{2^3}[x]$. Note that $F(x) = 4(x)(x - 1) = 4 \cdot Y_2(x)$. Here, λ = 4, degree k = 2 < λ and coefficient $c_2 = 4$. Therefore, F(x) cannot be written as a multiple of $Y_4(x)$, and Lemma 2 does not apply. However, $c_2 = 4 \equiv 0 \ \% \ \frac{2^3}{(2!, 2^3)}$. Because the condition in Lemma 3 is satisfied, $F(x) \ \% \ 2^3 \equiv 0$. If $c_2$ were replaced by 3, then $F(x) = 3(x)(x - 1) \ \% \ 2^3$ would not be a vanishing polynomial, as $c_2 = 3 \not\equiv 0 \ \% \ \frac{2^3}{(2!, 2^3)}$.
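The quantities used in this subsection are easy to compute. The following sketch, with hypothetical function names, finds $\lambda = SF(n)$ by trial and compares the divisibility test of Lemma 3 against a brute-force evaluation of $c_k \cdot Y_k(x)$ over every x in $\mathbb{Z}_{2^m}$; it is meant as an illustration of the statements above rather than as part of the verification flow.

```python
from math import factorial, gcd

def smallest_factorial_index(n):
    """SF(n): the least lam such that n divides lam! (for n >= 1)."""
    lam, fact = 1, 1
    while fact % n != 0:
        lam += 1
        fact *= lam
    return lam

def falling_factorial(k, x):
    """Y_k(x) = x (x - 1) ... (x - k + 1), with Y_0(x) = 1."""
    result = 1
    for i in range(k):
        result *= x - i
    return result

def term_vanishes_by_lemma3(c_k, k, m):
    """Lemma 3 test: 2^m / gcd(k!, 2^m) must divide c_k."""
    modulus = 1 << m
    return c_k % (modulus // gcd(factorial(k), modulus)) == 0

def term_vanishes_exhaustive(c_k, k, m):
    """Brute force: c_k * Y_k(x) is 0 modulo 2^m for every x in Z_{2^m}."""
    modulus = 1 << m
    return all(c_k * falling_factorial(k, x) % modulus == 0 for x in range(modulus))

print(smallest_factorial_index(2 ** 3),
      smallest_factorial_index(2 ** 4),
      smallest_factorial_index(2 ** 8))
print(term_vanishes_by_lemma3(4, 2, 3), term_vanishes_by_lemma3(3, 2, 3))
```

The three printed SF values are 4, 6 and 10, consistent with Example IV.4, Example IV.3 and the 10-vector bound quoted for the 8-bit example in Section I. The two Lemma 3 checks reproduce Example IV.4 ($c_2 = 4$ vanishes, $c_2 = 3$ does not).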
B. Application to Equivalence Verification

The properties of vanishing expressions can be applied to reduce any given polynomial over a finite integer ring. Let F(x) be any polynomial corresponding to a polyfunction $f : \mathbb{Z}_{2^m} \to \mathbb{Z}_{2^m}$. Now, F(x) can be written as [27]

$$F(x) = Q_\lambda(x) \, Y_\lambda(x) + R(x) \tag{12}$$

where
• $\lambda = SF(2^m)$,
• $Q_\lambda(x)$ is an arbitrary polynomial in $\mathbb{Z}[x]$,
• $Y_\lambda(x)$ represents the product of λ consecutive numbers in x, and
• R(x) is an arbitrary polynomial in $\mathbb{Z}[x]$, such that degree(R(x)) < λ.

To obtain the above representation, we divide F(x) by $Y_\lambda(x)$, resulting in the quotient $Q_\lambda(x)$ and remainder R(x). Note that $Q_\lambda(x) \cdot Y_\lambda(x)$ represents a multiple of a product of λ consecutive numbers and is $\equiv 0 \ \% \ 2^m$, according to Lemma 2. Therefore, modulo $2^m$,

$$F(x) = Q_\lambda(x) \, Y_\lambda(x) + R(x) \equiv 0 + R(x) \equiv R(x) \tag{13}$$
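The reduction in Eqns. (12) and (13) amounts to an ordinary polynomial division by the monic polynomial $Y_\lambda(x)$. A minimal sketch is given below, under the assumption that polynomials are stored as integer coefficient lists with the lowest degree first; all helper names are hypothetical. The sample input is $x^4 + 3x^3 + 7x^2 + 6x$ over $\mathbb{Z}_{2^3}$, for which $\lambda = SF(2^3) = 4$.

```python
def poly_mul(p, q):
    """Multiply two coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def falling_factorial_poly(k):
    """Coefficients of Y_k(x) = x (x - 1) ... (x - k + 1)."""
    poly = [1]
    for i in range(k):
        poly = poly_mul(poly, [-i, 1])   # multiply by (x - i)
    return poly

def divide_by_monic(f, g):
    """Return (q, r) with f = q * g + r and deg r < deg g; g must be monic."""
    r = list(f)
    dg = len(g) - 1
    q = [0] * max(len(f) - dg, 1)
    for shift in range(len(f) - 1 - dg, -1, -1):
        coeff = r[shift + dg]            # current leading coefficient
        q[shift] = coeff
        for j, gj in enumerate(g):
            r[shift + j] -= coeff * gj
    return q, r[:dg]

def evaluate(p, x):
    return sum(c * x ** i for i, c in enumerate(p))

m, lam = 3, 4                            # lam = SF(2^3) = 4
F = [0, 6, 7, 3, 1]                      # x^4 + 3x^3 + 7x^2 + 6x
Q, R = divide_by_monic(F, falling_factorial_poly(lam))
R_mod = [c % (1 << m) for c in R]
print(Q, R_mod)
```

The quotient is 1 and the remainder, reduced modulo $2^3$, is $x^3 + 4x^2 + 4x$, which has degree below λ and agrees with F at every point of $\mathbb{Z}_{2^3}$.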
The degree of F(x) has now been reduced to less than λ. Let us explain this representation with an example.

Example IV.5: Consider the polynomial $F(x) = x^4 + 3x^3 + 7x^2 + 6x$ over $\mathbb{Z}_{2^3}$. The degree of F(x) is 4. We compute $\lambda = SF(2^3) = 4$, and divide F(x) by $Y_4$ to represent it (modulo $2^3$) as:

$$F(x) = Y_4(x) + x^3 + 4x^2 + 4x$$

Here, $Q_4 = 1$ and $R(x) = x^3 + 4x^2 + 4x$. Now, $Y_4(x)$ represents a product of λ consecutive numbers in $\mathbb{Z}_{2^3}$, and evaluates to $0 \ \% \ 2^3$. Thus, $F(x) \equiv x^3 + 4x^2 + 4x$, where the degree is now 3 < λ.

We now state the following results:

Theorem 1: Let $F(x) \in \mathbb{Z}[x]$ be a polynomial and let $f : \mathbb{Z}_{2^m} \to \mathbb{Z}_{2^m}$ be its associated polyfunction. To prove that F(x) vanishes in $\mathbb{Z}_{2^m}$, it is sufficient to show that $F(x) \equiv 0 \ \% \ 2^m$ for any λ consecutive integer values of x. Here, λ is the least integer such that $2^m \mid \lambda!$.

Proof: Given a polynomial F(x) corresponding to the polyfunction $f : \mathbb{Z}_{2^m} \to \mathbb{Z}_{2^m}$, we can use Eqn. (13) to reduce it to:

$$F(x) = Q_\lambda(x) \, Y_\lambda(x) + R(x) \equiv R(x)$$

From Newton's interpolation formula, we know that any polynomial with integral coefficients can be represented according to Eqn. (10). Note that Eqns. (7)-(10) also hold over $\mathbb{Z}_{2^m}$, since the coefficients remain integral. We can now rewrite F(x) as

$$F(x) \equiv R(x) = \sum_{k=0}^{\lambda - 1} \frac{(\Delta^k R)(0)}{k!} \, Y_k(x) = \sum_{k=0}^{\lambda - 1} c_k \, Y_k(x) \tag{14}$$
since deg(R(x)) < λ. According to Lemma 3, if all the coefficients ck of this expression reduce to 0 when 2m m computed % (2m ,k!) , then F (x) vanishes in Z2 . Thus, we need to verify this condition for each value c k , where 0 ≤ k < λ. Further, each such computation requires evaluating F (x) according to Eqn. (14). This implies that F (x) is evaluated a maximum of λ consecutive times. Thus, the above theorem holds for integers from 0 to λ − 1. Since F (x) ≡ 0∀x ⇔ F (x + a) ≡ 0∀x, where a ∈ Z, the Theorem 1 is also true for any λ consecutive numbers. Example IV.6: Let us explain this concept using the polynomial F (x) % 23 from Example IV.5. Since λ = 4, we need to compute ck for 0 ≤ k < 4. 3 (∆0 F )(0) % (232,0!) 0! 3 (∆1 F )(0) • k = 1: c1 = % (232,1!) 1! Since c1 6= 0 % 23 , F (x) is not a
•
k = 0: c0 =
=0 =1
vanishing polynomial. Corollary 1: Let F1 (x) and F2 (x) be two polynomials with coefficients in Z. To prove F 1 (x) % 2m = F2 (x) % 2m , it is sufficient to show that F1 (x) ≡ F2 (x) in Z2m for any λ consecutive integer values of x. Proof: We need to prove that: F1 (x) % 2m ≡ F2 (x) % 2m ⇒ F1 (x) − F2 (x) ≡ 0 % 2m From Theorem 1, we know that any λ consecutive values of x are sufficient to prove that F1 (x) − F2 (x) vanishes in Z2m . Corollary 1 directly follows from this result. Example IV.7: Consider the two symbolically distinct polynomials F1 (x) = 2x5 + 13x4 + 15x2 + 2x + 8 and F2 (x) = x4 + 10x3 + 3x2 + 2x + 8. To prove that F1 % 24 ≡ F2 % 24 , we need to compare the values of F1 (x) and F2 (x) for any λ = SF (24 ) = 6 consecutive values of x (instead of all 24 values). Therefore, F1 (0) = 8; F2 (0) = 8
$F_1(1) = 8$; $F_2(1) = 8$
$F_1(2) = 8$; $F_2(2) = 8$
$F_1(3) = 8$; $F_2(3) = 8$
$F_1(4) = 0$; $F_2(4) = 0$
$F_1(5) = 0$; $F_2(5) = 0$

Since $F_1(x)$ and $F_2(x)$ are equivalent for all 6 values, Corollary 1 implies that the two polynomials are equivalent for all $x$ in $Z_{2^4}$.

Example IV.8: Now consider the polynomials $F_1(x) = x^5 + 15x^4 + 5x^3 + x^2 + 2x + 8$ and $F_2(x) = x^4 + 10x^3 + 3x^2 + 2x + 8$ over $Z_{2^4}$. These polynomials are not equivalent $\%\, 2^4$. Let us apply the above result to determine their points of difference.

$F_1(2) = 8$; $F_2(2) = 8$
$F_1(3) = 0$; $F_2(3) = 8$
$F_1(4) = 0$; $F_2(4) = 0$
$F_1(5) = 0$; $F_2(5) = 0$
$F_1(6) = 0$; $F_2(6) = 0$
$F_1(7) = 0$; $F_2(7) = 0$

Within $\lambda = 6$ values, we were able to find that $F_1(3) \neq F_2(3) \,\%\, 2^4$.

V. EQUIVALENCE OF MULTI-VARIATE POLYNOMIALS

In Section I, we had shown how multiple-wordlength bit-vector computations can be modeled as polynomial functions from $Z_{2^{n_1}} \times Z_{2^{n_2}} \times \cdots \times Z_{2^{n_d}}$ to $Z_{2^m}$. Moreover, we had noted that a fixed-size datapath with multiple variables is a special case of the above, where $n_1 = \cdots = n_d = m$. Furthermore, in the previous section, we had sought to exploit the concept of vanishing polynomials to determine the required number of simulation vectors over $Z_{2^m}[x]$. We will now extend these concepts to address multi-variate polynomials.

In what follows, we use the multi-index notation: $k = \langle k_1, k_2, \ldots, k_d \rangle$ are the (non-negative) degrees corresponding to the $d$ input variables $x = \langle x_1, x_2, \ldots, x_d \rangle$, respectively. Here, $k$ and $x$ are $d$-tuples, where $k_i \in Z^+$ and $x_i \in Z_{2^{n_i}}$, for $i = 1, 2, \ldots, d$. $Z^+$ denotes the set of non-negative integers.

Definition V.1: Let $k = \langle k_1, k_2, \ldots, k_d \rangle \in (Z^+)^d$. We define,

$$Y_k(x) = \prod_{i=1}^{d} Y_{k_i}(x_i) = Y_{k_1}(x_1) \cdot Y_{k_2}(x_2) \cdots Y_{k_d}(x_d), \tag{15}$$

where $Y_{k_i}(x_i)$ is the falling factorial of degree $k_i$ in variable $x_i$, as defined in Definition IV.2. We now extend Newton's interpolation formula for $d$ variables.

Definition V.2: Let $F$ be a polynomial in $d$ variables $x_1, x_2, \ldots, x_d$ with degrees $p = \langle p_1, p_2, \ldots, p_d \rangle$. We define the multi-variate $\Delta$ operator as:

$$(\Delta^p F)(x) = (\Delta_1^{p_1} F)(x) \circ \cdots \circ (\Delta_d^{p_d} F)(x) \tag{16}$$

Here, $\circ$ denotes the successive application of the $\Delta$ operator for each of the $d$ variables.

Example V.1: Let us now apply the $\Delta$ operation to the polynomial $F(x_1, x_2) = 4x_1x_2$. The degrees of $\langle x_1, x_2 \rangle$ are $\langle 1, 1 \rangle$.

$$\begin{aligned}
(\Delta^{\langle 0,1 \rangle} F)(x_1, x_2) &= 4x_1(x_2 + 1) - 4x_1x_2 = 4x_1 \\
(\Delta^{\langle 1,0 \rangle} F)(x_1, x_2) &= 4(x_1 + 1)x_2 - 4x_1x_2 = 4x_2 \\
(\Delta^{\langle 1,1 \rangle} F)(x_1, x_2) &= 4(x_2 + 1) - 4x_2 = 4
\end{aligned}$$

Proposition 2: Let $F(x_1, x_2, \ldots, x_d)$ be a polynomial with degree $p$. Then, Newton's formula can be written in multi-index notation as,

$$\begin{aligned}
F(x) &= \sum_{k \le p} (\Delta^k F)(0) \binom{x}{k} \\
&= \sum_{k \le p} \frac{(\Delta^k F)(0)}{k_1! \cdots k_d!} \left( \prod_{i=1}^{d} Y_{k_i}(x_i) \right) \\
&= \sum_{k \le p} \frac{(\Delta^k F)(0)}{k!}\, Y_k(x) \\
&= \sum_{k \le p} c_k\, Y_k(x)
\end{aligned} \tag{17}$$

where $c_k = \frac{(\Delta^k F)(0)}{k!}$. In the above, $k \le p$ implies that $k_1 \le p_1, k_2 \le p_2, \ldots, k_d \le p_d$. The coefficients $c_k$ are computed for all vectors $k = \langle k_1, \ldots, k_d \rangle$, where each $k_i = 0, \ldots, p_i$. This corresponds to a maximum of $\prod_{i=1}^{d} (p_i + 1)$ coefficients.

Example V.2: Let us represent the polynomial $F = 3x_2^2 + 4x_1x_2$ according to Proposition 2. Here, the degrees of $x_1$ and $x_2$ can be represented as $p = \langle 1, 2 \rangle$.
$$\begin{aligned}
F &= \sum_{\langle k_1, k_2 \rangle \le \langle 1, 2 \rangle} c_{\langle k_1, k_2 \rangle}\, Y_{\langle k_1, k_2 \rangle}(x_1, x_2) \\
&= \frac{(\Delta^k F)(0, 0)}{0! \cdot 0!}\, Y_{\langle 0,0 \rangle}(x_1, x_2) + \cdots + \frac{(\Delta^k F)(0, 0)}{1! \cdot 2!}\, Y_{\langle 1,2 \rangle}(x_1, x_2) \\
&= 3 \cdot x_2 + 3 \cdot x_2(x_2 - 1) + 4 \cdot x_1 \cdot x_2 \\
&= 3 \cdot Y_{\langle 0,1 \rangle}(x_1, x_2) + 3 \cdot Y_{\langle 0,2 \rangle}(x_1, x_2) + 4 \cdot Y_{\langle 1,1 \rangle}(x_1, x_2)
\end{aligned}$$

A. Multi-variate Vanishing Polynomials

As in the univariate case, we review results related to nil polyfunctions. Lemma 2 can be extended as follows: If a polynomial in $d$ variables can be factorized into a product of $\lambda$ consecutive numbers in at least one of the variables $x_i$, then it vanishes $\%\, 2^m$. The following example illustrates this idea.

Example V.3: Consider the polynomial $F(x_1, x_2) = x_1^4 x_2 + 2x_1^3 x_2 + 3x_1^2 x_2 + 2x_1 x_2$ over $Z_{2^2}$. Here, $\lambda = 4$ and the highest degrees of $x_1$ and $x_2$ are $k_1 = 4$ and $k_2 = 1$, respectively. Now $F \,\%\, 2^2$ can be equivalently written as $F = Y_4(x_1) \cdot Y_1(x_2) \,\%\, 2^2 = Y(x_1, x_2) \,\%\, 2^2$. Since $F \,\%\, 2^2$ can be represented as a product of 4 consecutive numbers in $x_1$, $2^2 \mid F$ and $F \equiv 0$ in $Z_{2^2}$.

In the above example, both the input variables $x_1, x_2$, as well as the output $F$ are in $Z_{2^2}$. We wish to generalize these results to analyze any arbitrary polynomial over $Z_{2^{n_1}} \times Z_{2^{n_2}} \times \ldots \times Z_{2^{n_d}}$ to $Z_{2^m}$. For this purpose, we define another quantity, $\mu_i$ [9]:

Definition V.3:

$$\mu_i = \min\{2^{n_i}, \lambda\}; \quad i = 1, 2, \ldots, d \tag{18}$$

where $\lambda = SF(2^m)$. We now present the following results from [9]:

Lemma 4: Let $k = \langle k_1, k_2, \ldots, k_d \rangle \in (Z^+)^d$, where $Z^+$ denotes the set of non-negative integers. Then, $Y_k(x) \equiv 0$ if and only if $k_i \ge \mu_i$, for some $i$.

Example V.4: Let $F = x_1^2 x_2 - x_1 x_2$ be a polynomial corresponding to $f : Z_2 \times Z_{2^2} \to Z_{2^3}$. We show that $F$ is a vanishing polynomial as $F$ can be written according to:

$$x_1^2 x_2 - x_1 x_2 \equiv x_1(x_1 - 1)x_2 \equiv Y_{\langle 2,1 \rangle}(x_1, x_2) \equiv 0$$

Here, $\lambda = 4$ and the degrees of $x_1$ and $x_2$ are $k_1 = 2$ and $k_2 = 1$. Now $\mu_1 = \min\{2, 4\} = 2 = k_1$ and $\mu_2 = \min\{2^2, 4\} = 4 > k_2$. Since the condition in Lemma 4 is satisfied ($\mu_1 = k_1$), $F \equiv 0 \,\%\, 2^3$.

Lemma 5: The expression $c_k \cdot Y_k(x) \equiv 0$ if and only if $\frac{2^m}{(2^m, \prod_{i=1}^{d} k_i!)} \,\Big|\, c_k$; where:
• $k = \langle k_1, k_2, \ldots, k_d \rangle$ represents the degrees of the variables $x = \langle x_1, x_2, \ldots, x_d \rangle$ and $k_i < \mu_i$ for all $i = 1, \ldots, d$,
• $c_k$ is an arbitrary integer,
• $Y_k(x)$ is as defined above and
• $(2^m, \prod_{i=1}^{d} k_i!)$ is the greatest common divisor (GCD) of $2^m$ and $\prod_{i=1}^{d} k_i!$.
Alternately, $c_k \equiv 0 \,\%\, \frac{2^m}{(2^m, \prod_{i=1}^{d} k_i!)}$ implies that $c_k \cdot Y_k(x) \equiv 0$.

Example V.5: Consider the polynomial $F = 4x_1x_2^2 + 4x_1x_2$ corresponding to $f(x_1, x_2) : Z_2 \times Z_{2^2} \to Z_{2^3}$. We can use Lemma 5 to prove that $f$ is a nil polyfunction. Here, $2^{n_1} = 2$, $2^{n_2} = 4$ and $2^m = 8$. Also, $\lambda = 4$; $\mu_1 = \min\{2, 4\} = 2$, $\mu_2 = \min\{4, 4\} = 4$.

$$F \equiv 4x_1x_2^2 + 4x_1x_2 \equiv 4 \cdot x_1 \cdot x_2 \cdot (x_2 - 1) \equiv c_{\langle 1,2 \rangle} \cdot Y_{\langle 1,2 \rangle}(x_1, x_2) \equiv 0$$

because $c_{\langle 1,2 \rangle} = 4 \equiv 0 \,\%\, \frac{8}{(8,\, 1! \cdot 2!)}$.
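The divisibility test of Lemma 5 is easy to cross-check numerically. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the brute-force routine is only there to confirm the lemma's verdict on small input spaces such as the one in Example V.5.

```python
from itertools import product
from math import factorial, gcd, prod


def falling_factorial(x: int, k: int) -> int:
    """Y_k(x) = x (x-1) ... (x-k+1), with Y_0(x) = 1."""
    return prod(x - j for j in range(k))


def lemma5_condition(c: int, k: tuple, m: int) -> bool:
    """True when 2^m / gcd(2^m, prod k_i!) divides c (the test of Lemma 5)."""
    modulus = 2 ** m
    g = gcd(modulus, prod(factorial(ki) for ki in k))
    return c % (modulus // g) == 0


def vanishes_exhaustively(c: int, k: tuple, widths: tuple, m: int) -> bool:
    """Brute force: is c * Y_k(x) == 0 (mod 2^m) for every input tuple x?"""
    domain = product(*(range(2 ** n) for n in widths))
    return all(
        (c * prod(falling_factorial(xi, ki) for xi, ki in zip(x, k))) % 2 ** m == 0
        for x in domain
    )


# Example V.5: c = 4, k = <1, 2>, inputs of 1 and 2 bits, 3-bit output.
print(lemma5_condition(4, (1, 2), 3), vanishes_exhaustively(4, (1, 2), (1, 2), 3))
```

For the data of Example V.5 both checks agree that the term vanishes; replacing the coefficient 4 by 2 makes both report a non-vanishing term.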
B. Application to Equivalence Verification

The results of Sec. IV can be easily extended to multivariate polynomials as well. Given any polynomial $F(x)$ corresponding to the polyfunction $f : Z_{2^{n_1}} \times Z_{2^{n_2}} \times \ldots \times Z_{2^{n_d}} \to Z_{2^m}$, we can represent it as

$$F(x) = \sum_{i=1}^{d} Q_i(x)\, Y_{\mu(i)}(x) + R(x) \tag{19}$$

where,
• $\mu(i) = \langle 0, \ldots, \mu_i, \ldots, 0 \rangle$ is a $d$-tuple, where $\mu_i$ is in position $i$ and $\mu_i$ is defined according to Eqn. (18),
• $Q_i(x) \in Z[x_1, \ldots, x_d]$ are arbitrary polynomials, possibly zero,
• $Y_{\mu(i)}(x)$ is the falling factorial of degree $\mu_i$ in variable $x_i$,
• $R(x)$ is an arbitrary polynomial in $\langle x_1, x_2, \ldots, x_d \rangle$, such that degree$(x_i) < \mu_i$, for all $i = 1, 2, \ldots, d$.

From Lemma 4, $\sum_{i=1}^{d} Q_i(x)\, Y_{\mu(i)}(x) \equiv 0 \,\%\, 2^m$.
This results in reducing the given polynomial to $R(x)$, where the degree $k_i$ of each $x_i$ is $< \mu_i$. This is illustrated in the example below.

Example V.6: Let $F(x) = x_1^2 + 3x_2^2 + 4x_1x_2 + 7x_1$ over $Z_2 \times Z_{2^2} \to Z_{2^3}$. Here, $\lambda = 4$, and $\mu_1 = \min\{2, \lambda\} = 2$ and $\mu_2 = \min\{4, \lambda\} = 4$. We represent the polynomial as,

$$F(x_1, x_2) = Y(x_1, x_2) + 3x_2^2 + 4x_1x_2 \tag{20}$$

In this case, $Y(x_1, x_2)$ is a product of $\mu_1 = 2$ consecutive numbers in $x_1$ and vanishes $\%\, 2^3$. Hence, $F(x) = R(x) = 3x_2^2 + 4x_1x_2$, where the maximum degrees of $x_1$ and $x_2$ are, respectively, $k_1 = 1 < \mu_1$ and $k_2 = 2 < \mu_2$.

We now state the following theorem:

Theorem 2: Let $F(x) \in Z[x_1, \ldots, x_d]$ and let $f : Z_{2^{n_1}} \times Z_{2^{n_2}} \times \ldots \times Z_{2^{n_d}} \to Z_{2^m}$ be its associated polyfunction. To prove that $F(x)$ vanishes, it is sufficient to show that $F(x) \equiv 0 \,\%\, 2^m$ for any $\prod_{i=1}^{d} \mu_i$ values of $x$. Here $\mu_i$ is defined as $\min\{2^{n_i}, \lambda\}$. Note that every such 'value' for $x = \langle x_1, \ldots, x_d \rangle$ is a $d$-tuple, such that each $x_i$ can take any $\mu_i$ consecutive values.

Proof: The proof is based on the corresponding procedure for univariate polynomials, which is extended and reproduced below. Given any polynomial $F(x)$ corresponding to the polyfunction $f : Z_{2^{n_1}} \times Z_{2^{n_2}} \times \ldots \times Z_{2^{n_d}} \to Z_{2^m}$, we can reduce it according to Eqn. (19):

$$\begin{aligned}
F(x) &= \sum_{i=1}^{d} Q_i(x)\, Y_{\mu(i)}(x) + R(x) \\
&= 0 + R(x) \\
&= R(x)
\end{aligned} \tag{21}$$

From Newton's formula in Proposition 2, we can now reinterpret the polynomial as:

$$F(x) = R(x) = \sum \frac{(\Delta^k R)(0)}{k!} \cdots$$
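TABLE I

| Benchmark | Var/Deg/⟨n1, …, nd⟩/m | $2^{n_1 + \cdots + n_d}$ | Our Approach: Test Vectors/Time(s) |
|---|---|---|---|
| Univariate | | | |
| Anti-alias function | …/16 | $2^{11}$ | 18/< 1 |
| Poly unopt | 1/4/⟨16⟩/16 | $2^{16}$ | 18/< 1 |
| $\cos x$ | 1/6/⟨32⟩/32 | $2^{32}$ | 34/< 1 |
| $\cot^{-1} x$ | 1/9/⟨32⟩/32 | $2^{32}$ | 34/< 1 |
| erf $x$ | 1/7/⟨32⟩/32 | $2^{32}$ | 34/< 1 |
| $\ln(\frac{1+x}{1-x})$ | 1/7/⟨32⟩/32 | $2^{32}$ | 34/< 1 |
| Multivariate | | | |
| IRR | 2/4/⟨12, 8⟩/16 | $2^{20}$ | $18^2$/< 1 |
| PSK | 2/4/⟨11, 14⟩/16 | $2^{25}$ | $18^2$/< 1 |
| Cubic filter | 3/3/⟨12, 14, 10⟩/16 | $2^{36}$ | $18^3$/< 1 |
| Degree-4 filter | 3/4/⟨15, 11, 13⟩/16 | $2^{39}$ | $18^3$/< 1 |
| 4th Order IIR | 2/4/⟨24, 29⟩/32 | $2^{53}$ | $34^2$/< 1 |
| MIBENCH | 2/9/⟨16, 12⟩/16 | $2^{28}$ | $18^2$/< 1 |
| Horner Polynomial | 3/4/⟨14, 14, 16⟩/16 | $2^{44}$ | $18^3$/< 1 |
| Vanishing polynomial | 2/10/⟨12, 12⟩/16 | $2^{24}$ | $18^2$/< 1 |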
TABLE II
SIMULATION RESULTS FOR DETECTING NON-EQUIVALENCE

| Benchmark | Random Simulation: Test Vectors | Our Approach: Simulation Bound | Our Approach: Required Test Vectors/Time(s) |
|---|---|---|---|
| Anti-alias function | 12 | 18 | 18/< 1 |
| PSK | 14 | $18^2$ | 308/< 1 |
| Horner Polynomial | 4 | $18^3$ | 4335/< 1 |
| 4th Order IIR | 9 | $34^2$ | 416/< 1 |
| Vanishing polynomial | 8 | $18^2$ | 103/< 1 |

… second example is a polynomial expression from [34]. The other univariate examples are implementations of elementary function computations. The first benchmark in the set of multivariate datapath instances represents an image rejection computation (IRR). The phase-shift keying (PSK) function is from [4] and is used in digital communication. The polynomial filters are Volterra models of polynomial signal processing applications taken from [35]. MIBENCH is a 9th-degree polynomial from a set of automotive applications in [36]. Horner polynomials are borrowed from [34]. Polynomial computations commonly used in DSP are often implemented in Horner's form using multiply-add-accumulate (MAC) units. In [4], it was shown how computations by these MAC units can be extracted as polynomials in Horner's form. The vanishing polynomial example was specifically created to validate our concepts. Some of these designs were available as RTL code.
The others were available as high-level specifications in MATLAB or C code. RTL code for these reference designs was automatically generated using the MATLAB Simulink and Filter Design toolboxes (particularly for the digital filter designs) [3]. Once the reference RTL descriptions were obtained, they were further optimized using techniques from [4] and [5]. In [4], the application of high-level restructuring and symbolic algebra-based transformations was presented for high-level synthesis. These include factorization and expansion, tree-height reduction, etc. The recent work of [5] has derived a sequence of polynomial algebra based transformations to reduce the area-cost of the implementation. This is achieved by modulating and segmenting the coefficients and subsequently removing algebraic redundancy (vanishing polynomials). These transformations were applied to the original RTL description to obtain functionally equivalent implementations.
Subsequently, the data-flow graphs for the original and optimized RTL descriptions were extracted using GAUT [37]. Traversing the DFGs from the inputs to the outputs, the polynomial representations were constructed. The datapath sizes of both inputs and outputs ($n_1, \ldots, n_d$ and $m$) were also recorded. We then used the algorithm given in the Appendix to compute the appropriate $\lambda$ and $\mu$ values. Based on these parameters, the proposed results were applied to generate the maximum number of required test vectors. The descriptions were then simulated with these vectors to verify equivalence. The results are given in Table I.

Let us explain our approach using the polynomial benchmark Poly unopt. Fig. 3 depicts the dataflow graph for the original expression. We have used the optimization procedure presented in [5] to apply a series of algebraic reductions yielding an equivalent representation with the minimum estimated cost (in terms of area). The corresponding graph is depicted in Fig. 4. It is now required to show that the reduced cost implementation is the same as the original representation. Hence, we first compute the required number of simulation vectors. From Corollary 1, we know that $\lambda = SF(2^{16}) = 18$ consecutive test vectors are required. We then extract the polynomials corresponding to both designs and perform the simulation run. It was found that both polynomials evaluate to the same values $\%\, 2^{16}$, implying that the corresponding designs are equivalent.

… approach in the presence of bugs. To verify that we can detect non-equivalence of designs with the proposed simulation bound, we experimented with some designs by arbitrarily changing one or more of the coefficients. In all cases, we were able to detect the erroneous values within the required number of simulations. Table II presents the results of these experiments. Consider, for instance, the benchmark PSK. We were able to determine the error in 308 consecutive test vectors, well within the bound of 324 vectors.
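The univariate check behind this flow (Corollary 1) can be sketched in a few lines. This is an illustrative sketch, not the tool flow used in the experiments: the function names and the coefficient-list encoding (highest degree first) are assumptions, and the polynomials are the ones from Examples IV.7 and IV.8.

```python
from math import factorial


def smarandache_pow2(m: int) -> int:
    """Least lambda such that 2^m divides lambda! (brute force)."""
    lam = 1
    while factorial(lam) % 2 ** m != 0:
        lam += 1
    return lam


def evaluate(coeffs, x: int, modulus: int) -> int:
    """Horner evaluation of a polynomial (highest degree first) modulo `modulus`."""
    acc = 0
    for c in coeffs:
        acc = (acc * x + c) % modulus
    return acc


def first_mismatch(f1, f2, m: int, start: int = 0):
    """Simulate lambda consecutive inputs; return the first differing x, or None."""
    modulus = 2 ** m
    lam = smarandache_pow2(m)
    for x in range(start, start + lam):
        if evaluate(f1, x, modulus) != evaluate(f2, x, modulus):
            return x
    return None


F2 = [1, 10, 3, 2, 8]            # x^4 + 10x^3 + 3x^2 + 2x + 8
F1_EQUIV = [2, 13, 0, 15, 2, 8]  # Example IV.7
F1_BUGGY = [1, 15, 5, 1, 2, 8]   # Example IV.8

print(first_mismatch(F1_EQUIV, F2, 4))           # None: equivalent over Z_{2^4}
print(first_mismatch(F1_BUGGY, F2, 4, start=2))  # 3: first differing input
```

A return value of `None` after $\lambda$ consecutive inputs establishes equivalence over the whole ring, while any other value is a concrete counterexample input.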
2, we compute the value of $\nu$ as $\lfloor \log_2(1 + 3) \rfloor = 2$.
2. Now, initialize $rem = 3$ and $n = 0$. Continue to the while loop since $rem > 0$.
   • Compute $a_2 = 2^{\nu} - 1 = 3$ and $n = 0 + \lfloor rem / a_2 \rfloor = 1$
   • Update $rem = rem \,\%\, a_2 = 0$ and $\nu = \nu - 1 = 1$
3. Since $rem = 0$, exit the while loop. The computed value of $\lambda$ is $(m + n) = 4$.

From Sec. IV, we can verify that this is indeed the correct value.

Complexity: The worst case complexity of the algorithm is $O(m / \log(m))$, where $m$ corresponds to the word-length of the output variable in the datapath.
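The steps traced above can be sketched as follows. This is a reconstruction from the worked example only, so the function and variable names are assumptions and any special cases handled by the full Appendix algorithm are not covered; a brute-force routine is included purely as a cross-check.

```python
from math import factorial


def smarandache_pow2(m: int) -> int:
    """Least lambda with 2^m | lambda!, following the traced steps (assumes m >= 1)."""
    nu = (m + 1).bit_length() - 1  # floor(log2(1 + m))
    rem, n = m, 0
    while rem > 0:
        a = 2 ** nu - 1            # a_nu = 2^nu - 1
        n += rem // a
        rem %= a
        nu -= 1
    return m + n


def smarandache_brute(m: int) -> int:
    """Reference value: grow lambda until 2^m divides lambda!."""
    lam = 1
    while factorial(lam) % 2 ** m != 0:
        lam += 1
    return lam


print(smarandache_pow2(3), smarandache_pow2(16), smarandache_pow2(32))  # 4 18 34
```

The loop always terminates because the last divisor tried is $2^1 - 1 = 1$, which leaves a remainder of 0.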