IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 61, NO. 3, MARCH 2013


Fast Pruned Interleaving

Mohammad M. Mansour, Senior Member, IEEE

Abstract—In this paper, computationally efficient schemes for enumerating the so-called inliers of a wide range of permutations employed in pruned variable-size (turbo) interleavers are proposed. The objective is to accelerate pruned interleaving time in turbo codes by computing a statistic known as the pruning gap that enables determining a permuted address under pruning without serially permuting all its predecessors. It is shown that for any linear or quadratic permutation, including variations such as dithered relative prime or almost regular, the pruning gap can be computed in logarithmic time. Moreover, it is shown that Dedekind sums form efficient building blocks for enumerating inliers of the widely adopted polynomial-based permutations. An efficient algorithm for computing such sums in vector form using integer operations is presented. The results are extended to 2D and higher dimensional interleavers that combine multiple permutations along all dimensions, and closed-form expressions for inliers are derived. It is shown that the inliers statistic is a linear combination of the constituent permutation inliers. A lower bound on the minimum spread of serially pruned interleavers using the inliers statistic is also derived. Moreover, it is shown that serially pruned interleavers inherit the contention-free property of the mother interleaver, and hence they are parallelizable. Simulation results of practical pruned turbo interleavers demonstrate a speedup improvement of several orders of magnitude compared to serial interleaving.

Index Terms—Pruned interleavers, turbo codes, permutation polynomials, QPPs, 3GPP LTE, channel interleavers.

I. INTRODUCTION

In this paper, computationally efficient schemes to accelerate the operations of pruned interleavers employed in turbo codes are proposed. For background on the theory of interleavers, see [1]–[3]. Several popular interleaving schemes for turbo codes [4] have been proposed in, for example, [5]–[8], and practical interleaver aspects have been treated in [9]–[14], among others. Interleavers are devices that reshuffle a sequence of symbols according to some permutation [1]. They are widely used in communication systems as an adjunct to coding for error correction to reduce the impact of noise on communication performance. Interleavers are based on carefully chosen permutations that break temporal/spatial correlation between successive symbols in the input sequence. For example applications, see [4], [15]–[17]. Interleavers with random properties are often generated by pseudo-random algorithms (e.g., high spread S-random interleavers [5]). However, such interleavers lack a compact representation that leads to a simple implementation.

Paper approved by L. Szczecinski, the Editor for Coded Modulation and ARQ of the IEEE Communications Society. Manuscript received April 13, 2012; revised August 16, 2012. The author is with the Department of Electrical and Computer Engineering, American University of Beirut, P.O. Box 11-0236, Riad El-Solh Beirut, 1107 2020, Lebanon (e-mail: [email protected]). Digital Object Identifier 10.1109/TCOMM.2012.120512.120252

Deterministic interleavers [3], [18], whose address generation is simple and expressed in closed form, are a class of computationally efficient interleavers. Examples include bit-reversal permutations [19] and permutation polynomials (PPs) [20] (e.g., circular shift [7], parity and column twist [21]), linear permutation polynomials (LPPs) [19], [22], almost regular permutations [8], dithered relative prime (DRP) [7], and quadratic permutation polynomials (QPPs) [20], [23], [24]. Many practical deterministic interleavers are limited to a small set of discrete lengths. Pruning is a technique used to support more flexible block lengths [25]–[27]. Communication standards [19], [24], [28] typically vary the block length depending on the input data rate requirements and channel conditions. To support any length β, interleaving is done using a mother interleaver with smallest length k > β such that outlier interleaved addresses ≥ β are excluded. However, pruning alters the spread characteristics of the mother interleaver. It also creates a serial bottleneck since interleaved indices become address-dependent, and hence permuting streaming data in parallel on the fly is no longer practically feasible [29]. Expensive buffering of the data is required to maintain a desired system throughput. Hence it is essential to characterize the pruned permutation structure to study its spread characteristics, and to parallelize the pruning operation to reduce latency by interleaving an address without interleaving all its predecessors.

In [30], [31], a solution for pruned bit-reversal and linear congruential interleavers was proposed based on solving the following mathematical problem. Given a set of integers [k] ≜ {0, 1, · · · , k − 1} and a permutation π on [k], enumerate all integers < α in [k] that map to indices less than some β ∈ [k] in the permuted sequence. This permutation statistic was called the (α, β)-inliers of π, and the inliers count is denoted by I(π, k; α, β) [31].
A serially pruned interleaver (SPI) of size α < k and pruning length β < k, with α < β, is defined by π° : D → R, x ↦ y = π°(x) = π(p(x)), where |D| = |R| = α, such that: 1) π°(x) < β, and 2) p(x) ≜ x + Δ_x is the pruning function, where Δ_x is the pruning gap of x, defined to be the minimum Δ ≥ 0 such that I(π, k; x + Δ, β) = x (i.e., for j = 0, · · · , x + Δ_x − 1, π(j) < β is satisfied exactly x times). The domain and range of π° are D = [α] and R = π(p([α])). If this gap can be efficiently computed, then pruned interleaving can be parallelized by windowing using the minimal inliers and parallel pruning algorithms in [31]. This gap is also used to characterize the minimum spread of an SPI, as shown later. The difficult part is how to compute the pruning gap, given that it heavily depends on the structure of π. In this paper, we present efficient schemes for computing this statistic for a large class of practical turbo interleavers,


using a mathematical framework that extends beyond our earlier work in [30], [31]. In particular, the contributions of this work are:
1) We prove that for any linear or quadratic permutation, including dithered relative prime or almost regular permutations, the pruning gap can be computed in O(log k);
2) We show that Dedekind sums [22] form efficient building blocks for enumerating inliers of PPs, which have been widely adopted in the literature (e.g., QPP in LTE [20], [23], [24]);
3) We present an efficient algorithm to compute such sums in vector form using only integer operations by optimizing a certain reciprocity relationship;
4) We extend the results to 2D and higher dimensional interleavers that combine multiple permutations along all dimensions, and show that the inliers statistic is a linear combination of the underlying permutation inliers;
5) We derive a lower bound on the minimum spread of an SPI and show that it remains close to that of its mother interleaver for small k − β; and finally
6) We show that serial pruning preserves the contention-free property of the mother interleaver.

The reader is referred to [31] for parallel pruning algorithms and architectures of bit-reversal permutations. The remainder of the paper is organized as follows. Section II presents the permutation inliers problem and gives a solution for LPPs using Dedekind sums, together with an algorithm for evaluating these sums. In Section III, LPPs are studied in more detail and some arithmetic properties of Dedekind sums are derived. Inliers of almost regular, dithered, and quadratic permutations are addressed in Section IV. Inliers of 2D and 3D interleavers are discussed in Sections V and VI. The minimum spread and contention-free properties of SPIs are treated in Section VII. Sections VIII and IX include simulation results of practical pruned interleavers and conclude the paper. Proofs of all theorems are included in the Appendices.

II.
PERMUTATION INLIERS

Consider the set of integers [k] = {0, 1, · · · , k − 1}, and let π(·) be a permutation on [k]. We are interested in enumerating all j ∈ [k] such that j < α and π(j) < β, where 0 < α < β ≤ k. Here, π represents the interleaver mapping under investigation. In [31], a permutation statistic useful for analyzing pruned interleavers, called permutation inliers, was introduced. An integer j ∈ [k] is called an (α, β)-inlier of π if j < α and π(j) < β. Let

$$\mathcal{I}(\pi, k; \alpha, \beta) \triangleq \{\, j \in [k] \mid j < \alpha,\ \pi(j) < \beta \,\}, \quad 0 < \alpha, \beta \le k, \tag{1}$$

be the set of all (α, β)-inliers and $I(\pi, k; \alpha, \beta)$ be the (α, β)-inliers count. Otherwise, j is called an (α, β)-outlier if j < α and π(j) ≥ β. $\mathcal{O}(\pi, k; \alpha, \beta) = [\alpha] - \mathcal{I}(\pi, k; \alpha, \beta)$ denotes the set of all (α, β)-outliers, where '−' is the set-difference operator, and $O(\pi, k; \alpha, \beta)$ their count. For example, if

$$\pi = \begin{pmatrix} 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 9 & 1 & 7 & 2 & 5 & 8 & 6 & 4 & 0 & 3 \end{pmatrix},$$

α = 5, β = 7, then among the indices j < 5 only {1, 3, 4} map to positions < 7. The (5, 7)-inliers are $\mathcal{I}(\pi, 10; 5, 7) = \{1, 3, 4\}$, while the outliers are $\mathcal{O}(\pi, 10; 5, 7) = [5] - \mathcal{I}(\pi, 10; 5, 7) = \{0, 2\}$.

The more general case of counting inliers in a bounded region α1 ≤ j < α2 and β1 ≤ π(j) < β2,

$$\mathcal{I}'(\pi, k; \alpha_1, \beta_1, \alpha_2, \beta_2) = \{\, j \in [k] \mid \alpha_1 \le j < \alpha_2,\ \beta_1 \le \pi(j) < \beta_2 \,\}, \quad 0 \le \alpha_1 < \alpha_2 \le k,\ 0 \le \beta_1 < \beta_2 \le k,$$

reduces to the original problem in (1) by observing that

$$\mathcal{I}'(\pi, k; \alpha_1, \beta_1, \alpha_2, \beta_2) = \{\mathcal{I}(\pi, k; \alpha_2, \beta_2) - \mathcal{I}(\pi, k; \alpha_2, \beta_1)\} - \{\mathcal{I}(\pi, k; \alpha_1, \beta_2) - \mathcal{I}(\pi, k; \alpha_1, \beta_1)\}.$$

Hence without loss of generality, we focus on (1) in the remainder of this paper. The inliers count $I(\pi, k; \alpha, \beta) \triangleq |\mathcal{I}(\pi, k; \alpha, \beta)|$ can be expressed as (see Appendix B)

$$I(\pi, k; \alpha, \beta) = \sum_{j=0}^{k-1} \left\lfloor \frac{j - \alpha}{k} \right\rfloor \left\lfloor \frac{\pi(j) - \beta}{k} \right\rfloor \tag{2}$$

where ⌊x⌋ is the largest integer ≤ x. Each factor equals −1 when its condition (j < α, respectively π(j) < β) holds and 0 otherwise, so the product contributes 1 exactly for the inliers. The outliers count is O(π, k; α, β) = α − I(π, k; α, β). We give a more convenient formulation of (2) below using the saw-tooth function $(\!(x)\!) \triangleq x - \lfloor x \rfloor - \tfrac{1}{2} + \tfrac{1}{2}\delta(x)$, where δ(x) = 1 if x is an integer, and 0 otherwise (for properties of ((·)), see Appendix A and [22]).

Theorem 1: For any permutation π on [k], the (α, β)-inliers count is given by

$$I(\pi, k; \alpha, \beta) = \frac{\alpha\beta}{k} + K + \sum_{j=0}^{k-1} \left[ (\!(\tfrac{j-\alpha}{k})\!) - (\!(\tfrac{j}{k})\!) \right] \left[ (\!(\tfrac{\pi(j)-\beta}{k})\!) - (\!(\tfrac{\pi(j)}{k})\!) \right] \tag{3}$$

where K is a constant given in (54) in Appendix B. Also, there exist small positive constants c1, c2 such that

$$\frac{\alpha\beta}{k} - c_1 \le I(\pi, k; \alpha, \beta) \le \frac{\alpha\beta}{k} + c_2.$$

We show in this work that inliers can be enumerated more efficiently in O(log k) as opposed to O(k) for a wide class of useful permutations using (3) instead of (2).

A. Dedekind Sums: Building Blocks for Enumerating Inliers

In this section, we develop an efficient scheme for evaluating sums involving the saw-tooth function similar to those used in the inliers sum in (3). A large class of permutations adopted in interleavers in the literature is based on permutation polynomials of the form π(j) = F(j) mod k, where $F(j) = \sum_{i=0}^{d} a_i j^i$ is a degree-d polynomial over the ring of integers. Such permutations have been studied in [20], [32] and appropriate conditions on the integer coefficients a_i were derived. If k = 2^n, n > 2, then F(j) is a PP mod k if and only if a_1 is odd, (a_2 + a_4 + · · ·) is even, and (a_3 + a_5 + · · ·) is even [32]. Also, if F(j) is a PP modulo k, then F(j) is a PP modulo k/2. Examples include linear PPs F(j) = jh + c with gcd(h, k) = 1, and quadratic PPs F(j) = jh + j²b + c with h odd and b even. For values of k other than 2^n, see [20].
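The definitions in (1) and (2) can be checked directly by brute force. The sketch below is only an O(k) reference, not the fast method developed in this paper; the function names are illustrative and do not come from the paper. It reproduces the (5, 7)-inliers example and the reduction of the bounded-region count to four counts of the form (1).

```python
def inliers(pi, alpha, beta):
    """Set of (alpha, beta)-inliers of pi, as defined in (1)."""
    return {j for j in range(alpha) if pi[j] < beta}


def inliers_count_floor(pi, alpha, beta):
    """Inliers count via the floor-product sum in (2).

    Python's // floors toward minus infinity, so each factor is -1 when
    its condition holds and 0 otherwise.
    """
    k = len(pi)
    return sum(((j - alpha) // k) * ((pi[j] - beta) // k) for j in range(k))


def bounded_count(pi, a1, b1, a2, b2):
    """Direct count of j with a1 <= j < a2 and b1 <= pi(j) < b2."""
    return sum(1 for j in range(a1, a2) if b1 <= pi[j] < b2)


def bounded_count_via_inliers(pi, a1, b1, a2, b2):
    """Same count written as a combination of four counts of type (1)."""
    count = lambda a, b: len(inliers(pi, a, b))
    return (count(a2, b2) - count(a2, b1)) - (count(a1, b2) - count(a1, b1))


# The example permutation from the text (bottom row of the two-line form).
PI = [9, 1, 7, 2, 5, 8, 6, 4, 0, 3]
EXAMPLE_INLIERS = inliers(PI, 5, 7)
EXAMPLE_OUTLIERS = set(range(5)) - EXAMPLE_INLIERS
```

Calling `inliers(PI, 5, 7)` returns the set {1, 3, 4} quoted in the text, and the floor-sum form gives the same count.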
We show next that inliers for this class of permutations can be enumerated efficiently using Dedekind sums [33] (see properties in Appendix A and generalization (64) in Appendix C):

$$D(h, k, x) \triangleq \sum_{j=0}^{k-1} (\!(\tfrac{j}{k})\!)\, (\!(\tfrac{jh + x}{k})\!), \tag{4}$$

where x is real and gcd(h, k) = 1. These sums possess a well-known reciprocity relation [33] between the arguments h and k, which enables efficient evaluation of (4) in O(log k) using a Euclidean-like algorithm with integer operations. From (3), the inliers count in general depends on the difference of l pairs of Dedekind sums (e.g., l = 2 in (3) for π(j) = jh), which can be expressed in vector form as

$$D(\mathbf{h}, \mathbf{k}, \mathbf{x}, \mathbf{y}; \mathbf{w}) = \sum_{i=0}^{l-1} (-1)^i \left[ D(h_i, k_i, x_i) - D(h_i, k_i, y_i) - \frac{1}{2} \left( (\!(\tfrac{x_i}{k_i})\!) - (\!(\tfrac{y_i}{k_i})\!) \right) \cdot \delta[w_i] \right] \tag{5}$$


procedure DEDVEC-ALGORITHM(h, k, x, y; w)
    D ← 0
    h ← h mod k                                      ▷ reduce mod k
    θ ← x − ⌊x⌋, φ ← y − ⌊y⌋                         ▷ split into integer and fraction
    x ← x mod k, y ← y mod k
    v ← −1                                           ▷ alternate between add and subtract
    for i = 0 to len(h) − 1 do                       ▷ loop over all vector elements
        s1 ← 0, s2 ← 0                               ▷ to store new difference
        p ← 1, p′ ← 0
        s ← 1, v ← −v                                ▷ sign
        D ← D + v · ((x_i mod k_i − y_i mod k_i)/k_i + δ[x_i]/2 − δ[y_i]/2) · δ[w_i]
        while h_i > 0 do                             ▷ apply reciprocity relation between h_i, k_i
            a ← ⌊k_i/h_i⌋
            b1 ← ⌊x_i/h_i⌋, r1 ← x_i mod h_i
            b2 ← ⌊y_i/h_i⌋, r2 ← y_i mod h_i
            e1 ← (0 < θ_i < 1), e2 ← (0 < φ_i < 1)   ▷ (true) = 1; (false) = 0
            f1 ← (r1 = 0)·(x_i = 0)·(e1 = 0), f2 ← (r2 = 0)·(y_i = 0)·(e2 = 0)
            s1 ← s1 − 3s · (2b1 − 2b2 − f1 + f2)
            s2 ← s2 + 6sp · (b1 · (x_i + r1 + e1) − b2 · (y_i + r2 + e2))
            x_i ← r1, y_i ← r2, s ← −s
            t ← k_i mod h_i, k_i ← h_i, h_i ← t
            t ← a·p + p′, p′ ← p, p ← t
        end while
        D ← D + (s1 + s2/p) · v                      ▷ add or subtract new difference
    end for
end procedure

Fig. 1: Dedekind vector sum algorithm for computing (5).
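As a cross-check for any implementation of Fig. 1, the sum in (4) can also be evaluated by brute force in O(k) with exact rational arithmetic. The sketch below is a reference under stated assumptions, not the paper's O(log k) algorithm, and all function names are hypothetical. It also checks numerically that, for an LPP π(j) = jh + c (mod k), the double saw-tooth sum in (3) collapses to l = 2 pairs of Dedekind sums, taking y = (π(α), π(0)) and x = y − (β, β).

```python
from fractions import Fraction
from math import floor


def sawtooth(x):
    """((x)) = x - floor(x) - 1/2 + delta(x)/2, which is 0 at integers."""
    x = Fraction(x)
    if x.denominator == 1:
        return Fraction(0)
    return x - floor(x) - Fraction(1, 2)


def dedekind(h, k, x):
    """Brute-force O(k) evaluation of D(h, k, x) in (4)."""
    return sum(sawtooth(Fraction(j, k)) * sawtooth(Fraction(j * h + x, k))
               for j in range(k))


def lpp_sawtooth_sum(h, c, k, alpha, beta):
    """The sum over j in (3), evaluated term by term for pi(j) = jh + c mod k."""
    total = Fraction(0)
    for j in range(k):
        pj = (j * h + c) % k
        left = sawtooth(Fraction(j - alpha, k)) - sawtooth(Fraction(j, k))
        right = sawtooth(Fraction(pj - beta, k)) - sawtooth(Fraction(pj, k))
        total += left * right
    return total


def lpp_dedekind_pairs(h, c, k, alpha, beta):
    """Same quantity as two differences of Dedekind sums (l = 2)."""
    y = ((alpha * h + c) % k, c % k)      # (pi(alpha), pi(0))
    x = (y[0] - beta, y[1] - beta)
    return ((dedekind(h, k, x[0]) - dedekind(h, k, y[0]))
            - (dedekind(h, k, x[1]) - dedekind(h, k, y[1])))
```

Using `Fraction` avoids floating-point rounding, so the two evaluations can be compared for exact equality.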

In (5), $\mathbf{h} = (h_i)_{i=0}^{l-1}$, $\mathbf{k} = (k_i)_{i=0}^{l-1}$, $\mathbf{x} = (x_i)_{i=0}^{l-1}$, $\mathbf{y} = (y_i)_{i=0}^{l-1}$, and $\mathbf{w} = (w_i)_{i=0}^{l-1}$ are vectors of length l, and δ[x] = 1 if x ≠ 0, and 0 otherwise (not to be confused with δ(x) above, which equals 1 when x is an integer, and 0 otherwise). When all h_i = h and all k_i = k, we use the notation D(h, k, x, y; w), and when all δ[w_i] = 0, we use D(h, k, x, y). In Fig. 1, we present an algorithm to compute (5) by generalizing the work in [30], [33] to handle: 1) any l > 0, 2) real-valued x_i and y_i (not just integers), 3) distinct h_i's and k_i's, and 4) the last term in (5).

Lemma 1: For any LPP π(j) = jh + c (mod k) with gcd(h, k) = 1, the inliers count is

$$I_{\mathrm{LPP}}(\pi, k; \alpha, \beta) = \frac{\alpha\beta}{k} + D(h, k, \mathbf{x}, \mathbf{y}) + K \tag{6}$$

where l = 2, y = (π(α), π(0)), x = y − (β, β), and K is defined in (54) (see Appendix D).

III. MORE ON LPPS AND SOME RELATED ARITHMETIC ON DEDEKIND SUMS

An immediate generalization that follows from (2) is that bounded elements generated by any two permutations σ and π on [k] can be enumerated as well. Let $\mathcal{I}(\sigma, \pi, k; \alpha, \beta) \triangleq \{\, j \in [k] \mid \sigma(j) < \alpha,\ \pi(j) < \beta \,\}$ for 0 < α < β < k. The inliers count is given by

$$I(\sigma, \pi, k; \alpha, \beta) = \sum_{j=0}^{k-1} \left\lfloor \frac{\sigma(j) - \alpha}{k} \right\rfloor \left\lfloor \frac{\pi(j) - \beta}{k} \right\rfloor = \sum_{j=0}^{k-1} \left\lfloor \frac{j - \alpha}{k} \right\rfloor \left\lfloor \frac{\pi(\sigma^{-1}(j)) - \beta}{k} \right\rfloor \tag{7}$$

$$= I(\pi \circ \sigma^{-1}, k; \alpha, \beta) = I(\sigma \circ \pi^{-1}, k; \beta, \alpha) \tag{8}$$

where (7) follows because the sum does not change if the summation indices j are permuted.
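The index-permutation argument behind (7) and (8) is easy to confirm numerically. The following sketch uses two small LPPs as a concrete case (the parameter values are arbitrary choices for illustration, and the helper names are hypothetical). It counts the joint condition σ(j) < α, π(j) < β directly and compares it with the single-permutation counts for π∘σ⁻¹ and σ∘π⁻¹. It relies on `pow(g, -1, k)` for the modular inverse, which needs Python 3.8 or later.

```python
def lpp(g, b, k):
    """Table of the linear permutation j -> j*g + b (mod k)."""
    return [(j * g + b) % k for j in range(k)]


def inverse(perm):
    """Inverse permutation as a lookup table."""
    inv = [0] * len(perm)
    for j, v in enumerate(perm):
        inv[v] = j
    return inv


def compose(outer, inner):
    """Table of j -> outer(inner(j))."""
    return [outer[inner[j]] for j in range(len(inner))]


def count_inliers(perm, alpha, beta):
    """Number of j < alpha with perm(j) < beta."""
    return sum(1 for j in range(alpha) if perm[j] < beta)


def count_joint(sigma, pi, alpha, beta):
    """Number of j with sigma(j) < alpha and pi(j) < beta."""
    return sum(1 for j in range(len(pi)) if sigma[j] < alpha and pi[j] < beta)


K_SIZE, G, B, H, C = 64, 13, 4, 51, 1         # illustrative parameters
SIGMA, PI = lpp(G, B, K_SIZE), lpp(H, C, K_SIZE)
RHO = compose(PI, inverse(SIGMA))             # pi o sigma^{-1}
LAM = compose(SIGMA, inverse(PI))             # sigma o pi^{-1}
# Closed form of rho for two LPPs: g' * h * (j - b) + c (mod k), g' = g^{-1}.
RHO_CLOSED = [(pow(G, -1, K_SIZE) * H * (j - B) + C) % K_SIZE
              for j in range(K_SIZE)]
```

Note that the roles of α and β swap when σ∘π⁻¹ is used, exactly as in the last equality of (8).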

Example 1: Let σ(j) = jg + b (mod k), π(j) = jh + c (mod k) be two LPPs, where gcd(g, k) = gcd(h, k) = 1. Then σ⁻¹(j) = g′·(j − b) (mod k), where g′g ≡ 1 (mod k), and ρ(j) ≜ π(σ⁻¹(j)) = g′h·(j − b) + c (mod k). Then

$$I_{\mathrm{LPP}}(\pi \circ \sigma^{-1}, k; \alpha, \beta) = \frac{\alpha\beta}{k} + D(g'h, k, \mathbf{x}, \mathbf{y}) + K,$$

where l = 2, y = (ρ(α), ρ(0)), x = y − (β, β). If k = 2^30, g = 77, h = 51, b = 4, c = 1, α = 2^20, β = 2^23 + 2^15, then K = 0 using (54) and I_LPP(ρ, 2^30; 2^20, 2^23 + 2^15) evaluates to 13618 using (5) in 8 iterations.

We generalize to the case when σ(j) is an arbitrary permutation on [k1] and π(j) is a permutation on k2 = pk1 elements for some p, such that for j = 0, · · · , k1 − 1, π(j + mk1) = π_m(j) (mod k1), where the π_m's are arbitrary permutations on [k1] for m = 0, · · · , p − 1. We enumerate the inliers $\mathcal{I}(\sigma, \pi, k_1, k_2; \alpha, \beta) \triangleq \{\, j \in [k_2] \mid \sigma(j) < \alpha,\ \pi(j) < \beta \,\}$, where α ∈ [k1], β ∈ [k2], using the sum:

$$\begin{aligned} I(\sigma, \pi, k_1, k_2; \alpha, \beta) &= \sum_{j=0}^{k_2-1} \left\lfloor \frac{\sigma(j) - \alpha}{k_1} \right\rfloor \left\lfloor \frac{\pi(j) - \beta}{k_2} \right\rfloor \\ &= \sum_{j=0}^{k_2-1} \left\lfloor \frac{\sigma(\pi^{-1}(j)) - \alpha}{k_1} \right\rfloor \left\lfloor \frac{j - \beta}{k_2} \right\rfloor \\ &= \alpha \cdot m + \sum_{j=0}^{k_1-1} \left\lfloor \frac{j - \beta^{*}}{k_1} \right\rfloor \left\lfloor \frac{\sigma(\pi_m^{-1}(j)) - \alpha}{k_1} \right\rfloor \\ &= \alpha \cdot m + I(\sigma \circ \pi_m^{-1}, k_1; \beta^{*}, \alpha) \end{aligned} \tag{9}$$

where m = ⌊β/k1⌋ and β* = β mod k1. Note that when k1 = k2, then p = 1, m = 0 and hence (9) reduces to (8).

Example 2: Let σ(j) = jg + b (mod k1), π(j) = jh + c (mod k2) be two LPPs where k1 = 2^21, k2 = 8k1. π satisfies π(j + mk1) = π(j) (mod k1) for any m. Let λ(j) ≜ σ(π⁻¹(j)) = h′g·(j − c) + b (mod k1), where h′h ≡ 1 (mod k2). With the parameters from Example 1, the inliers count evaluates to 2^20 · ⌊(2^23 + 2^15)/2^21⌋ + I_LPP(λ, 2^21; (2^23 + 2^15) mod 2^21, 2^20) = 4210743 using (9).
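The block decomposition in (9) can be illustrated for two LPPs of different sizes, following the setup of Example 2 but with small, arbitrarily chosen parameters so that a brute-force count is cheap. This is a sketch under those assumptions, with hypothetical function names: the split version counts α inliers for each of the m = ⌊β/k1⌋ full blocks of length k1 and then handles the remaining β* = β mod k1 indices through λ = σ∘π⁻¹. The modular inverse uses `pow(h, -1, k2)` (Python 3.8 or later).

```python
def joint_count_bruteforce(g, b, k1, h, c, k2, alpha, beta):
    """Count j in [k2] with sigma(j) < alpha and pi(j) < beta, where
    sigma(j) = j*g + b (mod k1) and pi(j) = j*h + c (mod k2)."""
    return sum(1 for j in range(k2)
               if (j * g + b) % k1 < alpha and (j * h + c) % k2 < beta)


def joint_count_split(g, b, k1, h, c, k2, alpha, beta):
    """Same count via the decomposition alpha*m + I(lambda, k1; beta*, alpha)."""
    h_inv = pow(h, -1, k2)                     # h' with h'h = 1 (mod k2)

    def lam(j):
        # lambda(j) = sigma(pi^{-1}(j)) = h'g(j - c) + b (mod k1)
        return (h_inv * g * (j - c) + b) % k1

    m, beta_star = divmod(beta, k1)
    tail = sum(1 for j in range(beta_star) if lam(j) < alpha)
    return alpha * m + tail
```

The split version touches at most k1 indices instead of k2, which mirrors how (9) reduces the larger problem to an inliers count of size k1.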


Similarly, for LPPs σ, π as defined in Example 1, we can express the following sum in terms of Dedekind sums (4), using the fact that sums remain invariant upon permuting the indices j:

$$\sum_{j=0}^{k-1} (\!(\tfrac{\sigma(j) - \alpha}{k})\!)\, (\!(\tfrac{\pi(j) - \beta}{k})\!) = D(g'h, k, \rho(\alpha) - \beta) = D(h'g, k, \lambda(\beta) - \alpha), \tag{10}$$

where g′g ≡ 1 (mod k), h′h ≡ 1 (mod k), ρ(j) = π(σ⁻¹(j)), λ(j) = σ(π⁻¹(j)). For the case when σ, π are permutations on [k1] and [k2] with k2 = pk1 (used in 2D interleavers), we have

$$\sum_{j=0}^{k_2-1} (\!(\tfrac{\sigma(j) - \alpha}{k_1})\!)\, (\!(\tfrac{\pi(j) - \beta}{k_2})\!) = d \cdot D(g'h/d,\ k_1,\ (\rho(\alpha) - \beta)/d) \tag{11}$$

where g′g ≡ 1 (mod k1) and d = gcd(h, k2) (see Appendix C). Moreover, a further generalization useful for counting inliers for almost linear, dithered relative prime, and QPPs is evaluating Dedekind sums only over indices j ≡ r (mod q), for r = 0, · · · , q − 1, where q is a divisor of k (denoted as q | k):

$$D_{r,q}(h, k, x) \triangleq \sum_{\substack{j=0 \\ j \equiv r\,(q)}}^{k-1} (\!(\tfrac{j}{k})\!)\, (\!(\tfrac{jh + x}{k})\!)$$
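The restricted sum just defined can be evaluated directly in O(k/q), which gives a reference value for the closed form (12) stated immediately below. The sketch compares the two with exact rationals. It assumes that the indicator δ[r] in (12) equals 1 for r ≠ 0 and 0 for r = 0, which is the reading under which the identity holds in these checks, and it evaluates D(qh, k, ·) by the plain sum in (4) even though gcd(qh, k) ≠ 1. Function names are illustrative only.

```python
from fractions import Fraction
from math import floor


def sawtooth(x):
    """((x)) = x - floor(x) - 1/2 + delta(x)/2, which is 0 at integers."""
    x = Fraction(x)
    if x.denominator == 1:
        return Fraction(0)
    return x - floor(x) - Fraction(1, 2)


def dedekind(h, k, x):
    """Plain O(k) evaluation of the sum in (4)."""
    return sum(sawtooth(Fraction(j, k)) * sawtooth(Fraction(j * h + x, k))
               for j in range(k))


def dedekind_residue(h, k, x, r, q):
    """D_{r,q}(h, k, x): the sum in (4) restricted to j = r (mod q)."""
    return sum(sawtooth(Fraction(j, k)) * sawtooth(Fraction(j * h + x, k))
               for j in range(r, k, q))


def dedekind_residue_closed(h, k, x, r, q):
    """Right-hand side of (12), with delta[r] taken as 1 for r != 0."""
    y = r * h + x
    correction = sawtooth(Fraction(y, k)) / 2 if r != 0 else Fraction(0)
    return (dedekind(q * h, k, y)
            + Fraction(r, k) * sawtooth(Fraction(y, q))
            - correction)
```

Summing `dedekind_residue` over all r = 0, ..., q − 1 recovers the full sum D(h, k, x), which gives a second consistency check.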

We show that these sums can be evaluated in terms of Dedekind sums using the relation:

$$D_{r,q}(h, k, x) = D(qh, k, rh + x) + \frac{r}{k}\, (\!(\tfrac{rh + x}{q})\!) - \frac{1}{2}\, (\!(\tfrac{rh + x}{k})\!) \cdot \delta[r] \tag{12}$$

Similarly, generalized (r mod q)-Dedekind sums defined by two LPPs σ and π are evaluated as:

$$\sum_{\substack{j=0 \\ j \equiv r\,(q)}}^{k-1} (\!(\tfrac{\sigma(j) - \alpha}{k})\!)\, (\!(\tfrac{\pi(j) - \beta}{k})\!) = D(g'hq, k, z(r) + \rho(\alpha + \omega) - \beta) + \frac{\omega}{k}\, (\!(\tfrac{\pi(r) - \beta}{q})\!) - \frac{1}{2}\, (\!(\tfrac{z(r) + \rho(\alpha + \omega) - \beta}{k})\!) \cdot \delta[\omega] \tag{13}$$

where ω = σ(r) − α (mod q), g′g ≡ 1 (mod k/q), z(r) = (1 − g′g)hr, and ρ(j) = π(σ⁻¹(j)). Note that when σ(j) = j, α = β = 0, then ω = r, z = 0, ρ = π, and hence (13) reduces to (12) with c = x. The proofs of (10), (11) and (12), (13) are given in Appendices C and E, respectively.

IV. INLIERS OF POLYNOMIAL-BASED PRUNED INTERLEAVERS

In this section, inliers of other polynomial-based permutations are considered.

A. Almost Regular Permutation (ARP) Interleavers

Almost regular permutations were proposed by Berrou et al. in [8]. They are LPPs with an irregular constant term (degree-0 term in the LPP in j) that is periodic with even period q | k:

$$\pi(j) = jh + P(j \bmod q) \pmod{k}, \qquad P(i) = q \cdot (c_i h + d_i),$$

where gcd(h, k) = 1, and c_i, d_i are small integers that are periodic with period q. Since q | k and gcd(h, k) = 1, then gcd(h, q) = 1 and hence it is easy to show that π is a permutation mod k. Here we study the inliers of a more general case defined by two ARPs σ and π:

$$\sigma(j) = jg + S(j \bmod q) \pmod{k} \tag{14}$$

$$\pi(j) = jh + P(j \bmod q) \pmod{k} \tag{15}$$

where gcd(g, k) = gcd(h, k) = 1, and S, P are appropriately chosen periodic functions. Since the degree-0 terms in the ARPs in (14), (15) are periodic, each ARP can be viewed as q LPPs:

$$\sigma_r(j) = jg + S(r) \pmod{k}, \quad j \equiv r \bmod q, \quad r = 0, \cdots, q-1 \tag{16}$$

$$\pi_r(j) = jh + P(r) \pmod{k}, \quad j \equiv r \bmod q, \quad r = 0, \cdots, q-1 \tag{17}$$

and hence the inliers set is the disjoint union of q LPP inlier sets:

$$\mathcal{I}_{\mathrm{ARP}}(\sigma, \pi, k; \alpha, \beta) = \bigcup_{r=0}^{q-1} \mathcal{I}_{\mathrm{LPP},r}(\sigma_r, \pi_r, k; \alpha, \beta)$$

$$\mathcal{I}_{\mathrm{LPP},r}(\sigma_r, \pi_r, k; \alpha, \beta) = \{\, j \in [k] \mid j \equiv r \bmod q,\ \sigma_r(j) < \alpha,\ \pi_r(j) < \beta \,\}$$

Consequently, the ARP inliers count is the sum of q LPP inlier counts:

$$I_{\mathrm{ARP}}(\sigma, \pi, k; \alpha, \beta) = \sum_{r=0}^{q-1} I_{\mathrm{LPP},r}(\sigma_r, \pi_r, k; \alpha, \beta) \tag{18}$$

$$I_{\mathrm{LPP},r}(\sigma_r, \pi_r, k; \alpha, \beta) = \sum_{\substack{j=0 \\ j \equiv r\,(q)}}^{k-1} \left\lfloor \frac{\sigma_r(j) - \alpha}{k} \right\rfloor \left\lfloor \frac{\pi_r(j) - \beta}{k} \right\rfloor \tag{19}$$

To evaluate (18), (19), we apply Theorem 1 and use the (r mod q)-Dedekind sum in (13).

Theorem 2: Let σ_r(j), π_r(j) be two ARPs as defined in (16), (17), and let ρ_r(j) = π_r(σ_r⁻¹(j)) = g′h·(j − S(r)) + P(r) (mod k), where g′ is the inverse of g modulo (k/q). Then

$$I_{\mathrm{ARP}}(\sigma, \pi, k; \alpha, \beta) = \frac{\alpha\beta}{k} + D(h_1, k, \mathbf{x}, \mathbf{y}; \mathbf{w}) + \frac{q}{k}\, D(h_2, q, \mathbf{u}, \mathbf{v}) + K_{\mathrm{ARP}} \tag{20}$$

where h1 = g′hq, h2 = g′h, g′g ≡ 1 (mod q), and $\mathbf{x} = (x_i)_{i=0}^{l-1}$, $\mathbf{y} = (y_i)_{i=0}^{l-1}$, $\mathbf{w} = (w_i)_{i=0}^{l-1}$ are vectors of length l = 2q whose even and odd elements are defined for r = 0, · · · , q − 1 as

$$y_{2r} = z(r) + \rho_r(\alpha + w_{2r}), \qquad x_{2r} = y_{2r} - \beta \tag{21}$$

$$y_{2r+1} = z(r) + \rho_r(w_{2r+1}), \qquad x_{2r+1} = y_{2r+1} - \beta \tag{22}$$

$$w_{2r} = gr - \alpha \pmod{q}, \qquad w_{2r+1} = gr \pmod{q} \tag{23}$$

where z(r) = (1 − g′g)hr, v = (h2 α, 0), u = v − (β, β), and K_ARP is given in (70) in Appendix F.

B. Dithered Relative Prime (DRP) Interleavers

DRP interleavers were defined by Crozier and Guinand in [7] using the recursion

$$\pi(j) = \pi(j - 1) + S(j \bmod q) \pmod{k}$$

for j = 0, · · · , k − 1, where π(−1) = 0 and S is a vector of q (typically small) appropriately chosen increments. By unrolling the recursion, it is easy to show that when j ≡ r (mod q) for some r = 0, · · · , q − 1, a DRP permutation reduces to an ARP as follows:

$$\pi_r(j) = \sum_{i=0}^{r} S(i) + \frac{j - r}{q} \sum_{i=0}^{q-1} S(i) \pmod{k}, \quad (j \equiv r \bmod q)$$

$$= hj + P(r) - hr \pmod{k}$$

where $h \triangleq \frac{1}{q} \sum_{i=0}^{q-1} S(i)$ is an integer co-prime to k and $P(r) = \sum_{i=0}^{r} S(i)$. Hence the inliers of a DRP permutation can be computed similarly to those of an ARP using Theorem 2 and (20).

C. Quadratic Permutation Polynomial Interleavers

Let π(j) = jh + j²b + c (mod k). For π to be a QPP, the coefficients h and b must satisfy certain conditions with respect to the prime factors of k [34]. Let $k = \prod_i p_i^{n_i}$ be the prime factorization of k, where p_i are the prime factors, and n_i ≥ 1 the corresponding powers. If 2 is not a prime factor of k, or if 2 is a prime factor with a power ≥ 2 (case 1), then gcd(h, k) = 1 and the prime factors of b must include at least all the prime factors of k (but possibly with different powers), i.e., $b = m \prod_i p_i^{m_i}$ where p_i ∤ m and m_i ≥ 1. Otherwise, if 2 is a prime factor with a power of 1 (case 2), then gcd(h, k/2) = 1, h + b must be odd, and the prime factors of b must include at least all the prime factors of k other than 2 (again possibly with different powers), i.e., $b = m \prod_i p_i^{m_i}$ where p_i ∤ m, p_i ≠ 2, m_i ≥ 1, and 2 ∤ m if h is even. For example, if k = 2²·3³·5², then case 1 applies and hence h = 7 and b = 2·3·5 would generate a QPP mod k. If k = 2·3³·5², then case 2 applies and h = 7, b = 2·(3·5) or h = 2·7, b = 3·5 would both generate a QPP mod k. Let q = gcd(k, b) and b = mk/q, where m satisfies the above conditions. Theorem 3 computes Quadratic-Dedekind sums of the following form using Dedekind sums (see properties/proof in Appendix G):

$$Q(h, k, c, b) \triangleq \sum_{j=0}^{k-1} (\!(\tfrac{j}{k})\!)\, (\!(\tfrac{jh + j^2 b + c}{k})\!) \tag{24}$$

Theorem 3: Let π(j) = jh + j²b + c (mod k) where $k = \prod_i p_i^{n_i}$ and $b = m \prod_i p_i^{m_i}$ such that the above QPP conditions are satisfied. Assume that m_i ≥ n_i/2 for all prime factors p_i. Then

$$Q(h, k, c, b) = \sum_{r=0}^{q-1} \left[ D(h, k/q, \pi(r)/q) - \frac{1}{2}\, (\!(\tfrac{\pi(r)}{k})\!) \cdot \delta[r] \right] + \frac{q}{k} \left[ D(h, q, c) - \frac{1}{2}\, (\!(\tfrac{c}{q})\!) \right]. \tag{25}$$

Depending on the value of q in b = mk/q, (25) can be further simplified. The term j²b = j²mk/q, when taken mod k, depends only on j² mod q, and hence is periodic with period q. We give a solution for (25) when q = 8. Similar steps can be followed for other values of q. It is easy to verify that

$$j^2 b \equiv \begin{cases} 0 \pmod{k}, & \text{when } j \equiv 0 \bmod 4; \\ mk/8 \pmod{k}, & \text{when } j \equiv 1 \bmod 2; \\ mk/2 \pmod{k}, & \text{when } j \equiv 2 \bmod 4, \end{cases}$$

and hence π can be viewed as the union of three LPPs:

$$\pi(j) = jh + \begin{cases} c \pmod{k}, & \text{when } j \equiv 0 \bmod 4; \\ c + mk/8 \pmod{k}, & \text{when } j \equiv 1 \bmod 2; \\ c + mk/2 \pmod{k}, & \text{when } j \equiv 2 \bmod 4. \end{cases} \tag{26}$$
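The QPP examples quoted above and the three-way split used for q = 8 can both be confirmed by direct enumeration. The sketch below is a brute-force check with hypothetical helper names, not part of the paper's method. The constant term c is set to 0 in the permutation checks, which does not affect whether the map is a permutation, and the values k = 64, m ∈ {1, 3} for the split are arbitrary illustrative choices.

```python
def qpp_table(h, b, c, k):
    """Table of j -> j*h + j^2*b + c (mod k)."""
    return [(j * h + j * j * b + c) % k for j in range(k)]


def is_permutation(table):
    """True when the table hits every value in [k] exactly once."""
    return sorted(table) == list(range(len(table)))


def quadratic_term_class(j, m, k):
    """Value of j^2 * b (mod k) predicted by the three cases for b = m*k/8."""
    if j % 4 == 0:
        return 0
    if j % 2 == 1:
        return (m * k // 8) % k
    return (m * k // 2) % k          # remaining case: j = 2 (mod 4)


def split_matches(m, k):
    """Check the three-case formula against direct evaluation for all j."""
    b = m * k // 8
    return all((j * j * b) % k == quadratic_term_class(j, m, k)
               for j in range(k))


CASE1_K = 2**2 * 3**3 * 5**2     # 2700, case 1 in the text
CASE2_K = 2 * 3**3 * 5**2        # 1350, case 2 in the text
```

A coefficient pair that violates the conditions, such as h = 6 and b = 30 for k = 2700 (where gcd(h, k) ≠ 1), fails the same test.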

Using (r mod q)-Dedekind sums in (12) with π as given in (26), we obtain, for b = mk/8:

$$\begin{aligned} Q(h, k, c, b) &= D_{0,4}(h, k, c) + D_{1,2}(h, k, c + mk/8) + D_{2,4}(h, k, c + mk/2) \\ &= D(h, k/4, c/4) + D(h, k/2, (c + b + h)/2) - \frac{1}{2}\, (\!(\tfrac{c + b + h}{k})\!) \\ &\quad + D(h, k/4, (c + 2h)/4 + b) + \frac{2}{k}\, (\!(\tfrac{c + 2h}{4})\!) - \frac{1}{2}\, (\!(\tfrac{c + 4b + 2h}{k})\!) \end{aligned} \tag{27}$$

which includes only 3 Dedekind sums as opposed to q + 1 = 9 in (25). With Q(h, k, c, b) in (25) evaluated using (27), Theorem 1 can now be applied to compute the QPP inliers count:

$$\begin{aligned} I_{\mathrm{QPP}}(\pi, k; \alpha, \beta) = \frac{\alpha\beta}{k} &+ Q(h + 2\alpha b, k, \pi(\alpha) - \beta, b) - Q(h + 2\alpha b, k, \pi(\alpha), b) \\ &+ Q(h, k, \pi(0), b) - Q(h, k, \pi(0) - \beta, b) + K \end{aligned} \tag{28}$$

using the following lemma (see proofs of equation (27) and Lemma 2 in Appendix H):

Lemma 2: For a QPP with k, b as defined in Theorem 3 and q = 8, the inliers count is

$$I_{\mathrm{QPP8}}(\pi, k; \alpha, \beta) = \frac{\alpha\beta}{k} + D(\mathbf{h}, \mathbf{k}, \mathbf{x}, \mathbf{y}; \mathbf{w}) + K_{\mathrm{QPP8}} \tag{29}$$

using (28), where

$$\mathbf{h} = (h + 2\alpha b,\ h,\ h,\ h,\ h,\ h), \quad \mathbf{k} = (k/2,\ k/2,\ k/4,\ k/4,\ k/4,\ k/4), \quad \mathbf{w} = (1, 1, 0, 0, 1, 1),$$

$$K_{\mathrm{QPP8}} = K - \frac{2}{k} \left[ (\!(\tfrac{\pi(\alpha) - \beta}{4})\!) - (\!(\tfrac{\pi(\alpha)}{4})\!) - (\!(\tfrac{\pi(0) - \beta}{4})\!) + (\!(\tfrac{\pi(0)}{4})\!) \right],$$

$$\mathbf{x} = \mathbf{y} - \tfrac{1}{4}(2\beta,\ 2\beta,\ \beta,\ \beta,\ \beta,\ \beta),$$

$$\mathbf{y} = \left( \frac{\pi(\alpha) + b + h}{2} + \alpha b,\ \frac{\pi(0) + b + h}{2},\ \frac{\pi(\alpha)}{4},\ \frac{\pi(0)}{4},\ \frac{\pi(\alpha) + 2h}{4} + (\alpha + 1) b,\ \frac{\pi(0) + 2h}{4} + b \right).$$

Example 3: Let σ(j) = jg + d (mod k) be an LPP and π(j) = hj + bj² + c (mod k) be a QPP with b = mk/8. Then ρ(j) ≜ π(σ⁻¹(j)) = h1 j + b1 j² + c1 (mod k), where h1 = g′h − 2bg′²d, b1 = bg′², c1 = c − g′dh + bg′²d², and g′g ≡ 1 mod k, is a QPP. Using (8), (29), the inliers count is I_QPP(ρ, k; α, β) = αβ/k + D(h, k, x, y; w) + K_QPP8. If k = 2^24, g = 13, d = 0, b = k/8, c = 0, α = k/4, β = k/2 + 1, then h1 = 2581111, b1 = b, c1 = 0, π(0) = 0, π(α) = 12582912, and K_QPP8 = 3/4. Applying (5), I_QPP(ρ, 2^24; 2^22, 2^23 + 1) evaluates to 2097153 in 9 iterations.

Using (8) as well, the inliers count does not change if we use λ(j) = σ(π⁻¹(j)). The inverse of π is another QPP [34], π⁻¹(j) = g1 j + g2 j² (mod k), where

$$g_2 = -b \cdot [(h + b)(h + 2b)(h + 3b)]^{-1} \pmod{k/2} \tag{30}$$

$$g_1 = (1 - g_2 (h + b)^2) \cdot (h + b)^{-1} \pmod{k} \tag{31}$$

Then λ(j) = gg1 j + gg2 j² + d (mod k) is a QPP (k even) with gg2 = m2 k/8 for some m2 since gcd(g, k) = 1 and [(h + b)(h + 2b)(h + 3b)]⁻¹ is odd, and hence Lemma 2 applies, from which I_QPP(λ, 2^24; 2^23 + 1, 2^22) evaluates to 2097153 in 8 iterations.

V. 2D BLOCK INTERLEAVERS

We next consider the inliers of widely adopted 2D interleavers (e.g., [28], [35]). A 2D block interleaver [3], [25] of size k = k1 k2 is defined as follows. Data elements to be interleaved are written row-wise into an array with k1 rows and k2 columns, and read out pseudo-randomly. An element at entry x = i·k2 + j ∈ [k] is written into row i = ⌊x/k2⌋ ∈ [k1] and column j = x mod k2 ∈ [k2]. The row and column read addresses i_r and j_r are defined by the functions

$$R(\cdot, \cdot) : k_1 \times k_2 \to k_1, \quad (i, j) \mapsto i_r = R(i, j)$$

$$C(\cdot, \cdot) : k_1 \times k_2 \to k_2, \quad (i, j) \mapsto j_r = C(i, j)$$

Hence, entry (i, j) maps to entry (i_r, j_r), or equivalently, the interleaver mapping is given by

$$\pi(i \cdot k_2 + j) = i_r \cdot k_2 + j_r. \tag{32}$$

For π to be a permutation, the pair of functions R(·, ·) and C(·, ·) must be orthogonal, i.e., all the k1·k2 pairs (R(i, j), C(i, j)) must be distinct for i = 0, · · · , k1 − 1 and j = 0, · · · , k2 − 1.

Example 4: If R(i, j) = R(i) : k1 → k1 and C(i, j) = C(j) : k2 → k2 are permutation functions in i on [k1] and in j on [k2], respectively, then R(·) and C(·) are orthogonal and π is a permutation.

Example 5: Let R′(i, j) : k1 × k1 → k1 be a bipermutation function [36] in i, j on [k1], k1 = 2^n (i.e., both R′(i, ·) and R′(·, j) are permutations on [k1]). For example, R′(i, j) = pi + qj + c (mod k1) with gcd(p, k1) = gcd(q, k1) = 1 is a bipermutation since R′(i, ·) is a LPP in i and R′(·, j) is a LPP in j. Let C(i, j) = C(j) : k2 → k2 be a permutation function in j on [k2], such that k2 | k1. If R′(·, j) is a PP modulo k1, then it is a PP modulo k2 [32], since k2 = k1/2^m for some m. Choosing R(i, j) : k1 × k2 → k1 to be R′(i, j) for i = 0, · · · , k1 − 1 and j = 0, · · · , k2 − 1 results in R(·, ·) and C(·) being orthogonal, and hence π as defined in (32) is a permutation.

Example 6: If R(i, j) is as defined in Example 5 and if C(i, j) = C(R(i, j), j) : k1 × k2 → k2 is a permutation function in j on [k2], then R(·, ·) and C(·, ·) are orthogonal and π is a permutation.

A. 2D Square Interleavers Employing Linear Permutations

We first study the inliers of a generalized class of 2D square interleavers proposed by Berrou and Glavieux in [37], of the type presented in Example 6 with k1 = k2 = w and k = w²:

$$i_r = R(i, j) = pi + qj + c \pmod{w} \tag{33}$$

$$j_r = C(i_r, j) = f_1(i_r)\, j + f_2(i_r) \pmod{w} \tag{34}$$

R is a linear bipermutation (p, q co-prime to w), and C is a family of LPPs with appropriately chosen functions f1, f2. We count all pairs (i, j) that satisfy π(i·w + j) < β, subject to i·w + j < α. Let $I^{(2)}_{\mathrm{SQ}}(\pi, k; \alpha, \beta)$ be the number of (α, β)-inliers, with β = β1·w + β2 and α = α1·w + α2. Then

$$\pi(i \cdot w + j) = i_r \cdot w + j_r < \beta_1 \cdot w + \beta_2 \quad \text{s.t.} \quad i \cdot w + j < \alpha_1 \cdot w + \alpha_2$$

We distinguish among four cases under which the above conditions are satisfied:

$$\begin{cases} \text{Case 11}: & i_r < \beta_1; & i < \alpha_1, \text{ any } j \in [w] \\ \text{Case 12}: & i_r < \beta_1; & i = \alpha_1,\ j < \alpha_2 \\ \text{Case 21}: & i_r = \beta_1,\ j_r < \beta_2; & i < \alpha_1, \text{ any } j \in [w] \\ \text{Case 22}: & i_r = \beta_1,\ j_r < \beta_2; & i = \alpha_1,\ j < \alpha_2 \end{cases} \tag{35}$$

Write $I^{(2)}_{\mathrm{SQ}}(\pi, k; \alpha, \beta) = I_{11} + I_{12} + I_{21} + I_{22}$ corresponding to the four cases for a 2D interleaver.

• Case 11: We can express I11 as

$$I_{11} = \sum_{i=0}^{w-1} \sum_{j=0}^{w-1} \left\lfloor \frac{i - \alpha_1}{w} \right\rfloor \left\lfloor \frac{R(i, j) - \beta_1}{w} \right\rfloor = \sum_{i=0}^{w-1} \left\lfloor \frac{i - \alpha_1}{w} \right\rfloor \sum_{j=0}^{w-1} \left\lfloor \frac{R(i, j) - \beta_1}{w} \right\rfloor = \alpha_1 \beta_1 \tag{36}$$

using the fact that R(i, j) is a permutation in j on [w] for any i.

• Case 12: We need to enumerate all R(α1, j) = qj + pα1 + c (mod w) < β1 subject to j < α2. They are simply the inliers of an LPP as defined in (6):

$$I_{12} = I_{\mathrm{LPP}}(R(\alpha_1, j), w; \alpha_2, \beta_1). \tag{37}$$

• Case 21: We enumerate all C(β1, j) = f1(β1) j + f2(β1) < β2 subject to R(i, j) = β1 and i < α1. We have pi + qj + c (mod w) = β1. Hence i = σ(j) ≜ (β1 − qj − c)p⁻¹ (mod w), and its inverse is σ⁻¹(j) = (β1 − pj − c)q⁻¹ (mod w). This is equivalent to enumerating all C(β1, j) < β2 such that σ(j) < α1, which are again the inliers of an LPP (see Example 1):

$$I_{21} = I_{\mathrm{LPP}}(C \circ \sigma^{-1}, w; \alpha_1, \beta_2) \tag{38}$$

where C ∘ σ⁻¹ = −q⁻¹ p f1(β1) j + f2(β1) + (β1 − c) q⁻¹ · f1(β1) (mod w).

• Case 22: We enumerate all C(β1, j) = f1(β1) j + f2(β1) < β2 subject to R(i, j) = β1, i = α1 and j < α2. The condition R(α1, j) = β1 gives j* = (β1 − c − pα1) q⁻¹ (mod w). Therefore

$$I_{22} = \begin{cases} 1, & \text{if } j^{*} < \alpha_2 \text{ and } C(\beta_1, j^{*}) < \beta_2; \\ 0, & \text{otherwise.} \end{cases} \tag{39}$$

Summing all four results (36), (37), (38), (39), we obtain (I_LPP is evaluated using (6)):

$$I^{(2)}_{\mathrm{SQ}}(\pi, k; \alpha, \beta) = \alpha_1 \beta_1 + I_{\mathrm{LPP}}(R(\alpha_1, j), w; \alpha_2, \beta_1) + I_{\mathrm{LPP}}(C \circ \sigma^{-1}, w; \alpha_1, \beta_2) + I_{22} \tag{40}$$

1) Generalization to Other Permutations: The above steps can be extended beyond LPPs as well. We consider below a few variations to illustrate the concept further. First, if C(i_r, j) in (34) is a QPP of the form C(i_r, j) = f1(i_r) j + f2(i_r) j² (mod w), then (40) remains the same except for I21 in (38), which becomes I21 = I_QPP(C ∘ σ⁻¹, w; α1, β2) with C ∘ σ⁻¹ = (β1 − pj − c) q⁻¹ f1(β1) + (β1 − pj − c)² q⁻² f2(β1) (mod w).

Another example is to replace (33) by R(i, j) = pi + bi² + qj + c (mod w), which is a QPP in i and a LPP in j. Again, (40) applies with I21 in (38) changed to I_QPP(C ∘ σ⁻¹, w; α1, β2). To find σ(j), Case 21 above is revisited to solve for i in terms of j from pi + bi² + qj + c (mod w) = β1. The inverse of pi + bi² (mod w) is determined using (30), (31) to get g1 i + g2 i² (mod w), and hence the root is i = σ(j) = g1(β1 − c − qj) + g2(β1 − c − qj)² (mod w). Then σ⁻¹ is also a QPP, σ⁻¹(j) = r1 j + r2 j², with r1, r2 determined from (30), (31), and hence, with C as in (34), C ∘ σ⁻¹ = f1(β1)·(r1 j + r2 j²) + f2(β1), which is a QPP. Other generalizations follow a similar approach.
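The four-case decomposition behind (40) for the square interleaver of (33), (34) can be sanity-checked against a direct O(w²) count. In the sketch below each case is itself counted by brute force in O(w) rather than through (6), so it verifies the case analysis, not the Dedekind-sum evaluation. The function name and the sample choices f1(i_r) = 2i_r + 1 and f2(i_r) = 3i_r + 1 are assumptions for illustration only, and `pow(p, -1, w)` requires Python 3.8 or later.

```python
def square_interleaver_counts(w, p, q, c, f1, f2, alpha, beta):
    """Return (direct count, I11 + I12 + I21 + I22) for 0 < alpha, beta < w*w."""
    a1, a2 = divmod(alpha, w)                 # alpha = a1*w + a2
    b1, b2 = divmod(beta, w)                  # beta  = b1*w + b2

    def R(i, j):                              # row read address, as in (33)
        return (p * i + q * j + c) % w

    def C(ir, j):                             # column read address, as in (34)
        return (f1(ir) * j + f2(ir)) % w

    direct = sum(1 for i in range(w) for j in range(w)
                 if i * w + j < alpha and R(i, j) * w + C(R(i, j), j) < beta)

    i11 = a1 * b1                                             # Case 11
    i12 = sum(1 for j in range(a2) if R(a1, j) < b1)          # Case 12

    p_inv = pow(p, -1, w)

    def sigma(j):                             # the row i with R(i, j) = b1
        return ((b1 - q * j - c) * p_inv) % w

    i21 = sum(1 for j in range(w)                             # Case 21
              if sigma(j) < a1 and C(b1, j) < b2)

    j_star = ((b1 - c - p * a1) * pow(q, -1, w)) % w          # Case 22
    i22 = int(j_star < a2 and C(b1, j_star) < b2)

    return direct, i11 + i12 + i21 + i22
```

With w a power of two, any odd p and q are co-prime to w, so values such as w = 16, p = 3, q = 5 give a valid linear bipermutation R for experimentation.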

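B. 2D Rectangular Interleavers

We next study inliers of more flexible 2D rectangular interleavers where $k_2 \ge k_1$ and $k_1 \mid k_2$:

$$i_r = R(i,j) = pi + qj + c \pmod{k_1}$$
$$j_r = C(i_r,j) = f_1(i_r)j + f_2(i_r) \pmod{k_2}$$

We enumerate all $(i,j)$ that satisfy $\pi(i\cdot k_2+j) < \beta = \beta_1\cdot k_2+\beta_2$, subject to $i\cdot k_2+j < \alpha = \alpha_1\cdot k_2+\alpha_2$:

$$\pi(i\cdot k_2+j) = i_r\cdot k_2 + j_r < \beta_1\cdot k_2+\beta_2 \quad \text{s.t.} \quad i\cdot k_2+j < \alpha_1\cdot k_2+\alpha_2$$

We again distinguish among four cases similar to (35) with $j \in [w]$ replaced by $j \in [k_2]$:

• Case 11: ($i_r < \beta_1$, $i < \alpha_1$) These inliers can be enumerated as ($k_2 \ge k_1$):
$$I_{11} = \sum_{j=0}^{k_2-1}\sum_{i=0}^{k_1-1} \left\lfloor\frac{i-\alpha_1}{k_1}\right\rfloor \left\lfloor\frac{R(i,j)-\beta_1}{k_1}\right\rfloor = \sum_{i=0}^{k_1-1}\left\lfloor\frac{i-\alpha_1}{k_1}\right\rfloor \sum_{j=0}^{k_2-1}\left\lfloor\frac{R(i,j)-\beta_1}{k_1}\right\rfloor = \alpha_1\beta_1\frac{k_2}{k_1} \qquad (41)$$

• Case 12: ($i_r < \beta_1$, $i = \alpha_1$, $j < \alpha_2$) We enumerate all $R(\alpha_1,j) < \beta_1$ subject to $j < \alpha_2$:
$$I_{12} = \sum_{j=0}^{k_2-1} \left\lfloor\frac{R(\alpha_1,j)-\beta_1}{k_1}\right\rfloor \left\lfloor\frac{j-\alpha_2}{k_2}\right\rfloor = \beta_1\cdot\lfloor\alpha_2/k_1\rfloor + I_{\mathrm{LPP}}(R(\alpha_1,j), k_1; \alpha_2 \bmod k_1, \beta_1) \qquad (42)$$
by applying (9) with $\pi$ as the identity map, and $R(\alpha_1,j) = qj + c + p\alpha_1 \pmod{k_1}$ is an LPP.

• Case 21: ($i_r = \beta_1$, $j_r < \beta_2$; $i < \alpha_1$, any $j$) We enumerate all $C(\beta_1,j) = f_1(\beta_1)j + f_2(\beta_1) < \beta_2$ subject to $i_r = \beta_1$ and $i < \alpha_1$. From $pi + qj + c \pmod{k_1} = \beta_1$, we get $i = \sigma(j) \triangleq (\beta_1-qj-c)p^{-1} \pmod{k_1}$. This is equivalent to enumerating all $C(\beta_1,j) < \beta_2$ such that $\sigma(j) < \alpha_1$:
$$I_{21} = \sum_{j=0}^{k_2-1} \left\lfloor\frac{\sigma(j)-\alpha_1}{k_1}\right\rfloor \left\lfloor\frac{C(\beta_1,j)-\beta_2}{k_2}\right\rfloor = \alpha_1\cdot\lfloor\beta_2/k_1\rfloor + I_{\mathrm{LPP}}(\sigma\circ C_j^{-1}, k_1, \beta_2 \bmod k_1, \alpha_1) \qquad (43)$$
again by applying (9), where $C_j^{-1}$ is the inverse of $C(\beta_1,j) \bmod k_2$.

• Case 22: ($i_r = \beta_1$, $j_r < \beta_2$; $i = \alpha_1$, $j < \alpha_2$) We enumerate all $C(\beta_1,j) < \beta_2$ subject to $R(\alpha_1,j) = \beta_1$ and $j < \alpha_2$. The condition $R(\alpha_1,j) = \beta_1$ gives $j_d^* = (\beta_1-c-p\alpha_1)q^{-1} \pmod{k_1} + d\cdot k_1$, for $d = 0, 1, \cdots, k_2/k_1-1$. Therefore
$$I_{22} = \sum_{d=0}^{k_2/k_1-1} \delta_d, \quad \text{where} \quad \delta_d = \begin{cases} 1, & \text{if } j_d^* < \alpha_2 \text{ and } C(\beta_1,j_d^*) < \beta_2; \\ 0, & \text{otherwise.} \end{cases} \qquad (44)$$

Collecting the four terms in (41), (42), (43), (44), we get the total inliers count $I^{(2)}_{\mathrm{REC}}(\pi,k;\alpha,\beta)$.

C. 2D Rectangular Interleavers with Reversal of Dimensions

We consider 2D interleavers with reversal of dimensions having improved spread properties: $\pi: (i,j) \mapsto (j_r, i_r)$, where

$$j_r = C(i,j) = pi + qj + c \pmod{k_2}$$
$$i_r = R(i,j_r) = f_1(j_r)i + f_2(j_r) \pmod{k_1}$$

We enumerate all $(i,j)$ that satisfy $\pi(i\cdot k_2+j) < \beta = \beta_1\cdot k_1+\beta_2$ subject to $i\cdot k_2+j < \alpha = \alpha_1\cdot k_2+\alpha_2$:

$$\pi(i\cdot k_2+j) = j_r\cdot k_1 + i_r < \beta_1\cdot k_1+\beta_2 \quad \text{s.t.} \quad i\cdot k_2+j < \alpha_1\cdot k_2+\alpha_2$$

We again distinguish among four cases, $I^{(2)}_{\mathrm{REC,REV}}(\pi,k,\alpha,\beta) = I_{11}+I_{12}+I_{21}+I_{22}$, corresponding to

Case 11: $j_r < \beta_1$; $i < \alpha_1$, any $j \in [k_2]$
Case 12: $j_r < \beta_1$; $i = \alpha_1$, $j < \alpha_2$
Case 21: $j_r = \beta_1$, $i_r < \beta_2$; $i < \alpha_1$, any $j \in [k_2]$
Case 22: $j_r = \beta_1$, $i_r < \beta_2$; $i = \alpha_1$, $j < \alpha_2$

• Case 11: ($j_r < \beta_1$, $i < \alpha_1$) We can express $I_{11}$ similar to (41) as
$$I_{11} = \sum_{j=0}^{k_2-1}\sum_{i=0}^{k_1-1} \left\lfloor\frac{i-\alpha_1}{k_1}\right\rfloor \left\lfloor\frac{C(i,j)-\beta_1}{k_2}\right\rfloor = \alpha_1\beta_1$$

• Case 12: ($j_r < \beta_1$, $i = \alpha_1$, $j < \alpha_2$) We enumerate all $C(\alpha_1,j) < \beta_1$ subject to $j < \alpha_2$:
$$I_{12} = \sum_{j=0}^{k_2-1} \left\lfloor\frac{j-\alpha_2}{k_2}\right\rfloor \left\lfloor\frac{C(\alpha_1,j)-\beta_1}{k_2}\right\rfloor = I_{\mathrm{LPP}}(C(\alpha_1,j), k_2; \alpha_2, \beta_1)$$

• Case 21: ($j_r = \beta_1$, $i_r < \beta_2$; $i < \alpha_1$, any $j$) Count all $R(i,\beta_1) < \beta_2$ subject to $C(i,j) = \beta_1$, $i < \alpha_1$:
$$I_{21} = \sum_{i=0}^{k_1-1} \left\lfloor\frac{i-\alpha_1}{k_1}\right\rfloor \left\lfloor\frac{R(i,\beta_1)-\beta_2}{k_1}\right\rfloor = I_{\mathrm{LPP}}(R(i,\beta_1), k_1; \alpha_1, \beta_2)$$

• Case 22: ($j_r = \beta_1$, $i_r < \beta_2$; $i = \alpha_1$, $j < \alpha_2$) If $R(\alpha_1,\beta_1) < \beta_2$, $C(\alpha_1,j) = \beta_1$, $j < \alpha_2$, then we have 1 more inlier. From $C(\alpha_1,j) = \beta_1$ we have $j^* = (\beta_1-c-p\alpha_1)q^{-1} \pmod{k_2}$. Therefore
$$I_{22} = \begin{cases} 1, & \text{if } j^* < \alpha_2 \text{ and } R(\alpha_1,\beta_1) < \beta_2; \\ 0, & \text{otherwise.} \end{cases}$$

VI. EXTENSION TO 3D BLOCK INTERLEAVERS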

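In this section, the analysis of inliers is extended to 3D interleavers. Due to lack of space, only 3D cube interleavers based on linear multi-permutations [36] are considered. Generalizations to higher (and reversed) dimensions follow a similar approach. Let $\pi: (i,j,h) \to (i_r,j_r,h_r)$ where

$$i_r = R(i,j,h) = pi + qj + vh + c \pmod w$$
$$j_r = C(i_r,j) = f_1(i_r)j + f_2(i_r) \pmod w$$
$$h_r = H(j_r,h) = f_3(j_r)h + f_4(j_r) \pmod w$$

$k = w^3$, $R$ is a linear multi-permutation ($p, q, v$ co-prime to $w$), and $C, H$ are families of LPPs with appropriately chosen functions $f_1, \cdots, f_4$. We count all triplets $(i,j,h)$ satisfying

$$\pi(i\cdot w^2+j\cdot w+h) = i_r\cdot w^2+j_r\cdot w+h_r < \beta_1\cdot w^2+\beta_2\cdot w+\beta_3 \quad \text{s.t.} \quad i\cdot w^2+j\cdot w+h < \alpha_1\cdot w^2+\alpha_2\cdot w+\alpha_3$$

where $\beta = \beta_1\cdot w^2+\beta_2\cdot w+\beta_3$ and $\alpha = \alpha_1\cdot w^2+\alpha_2\cdot w+\alpha_3$. Denote by $I^{(3)}_{\mathrm{CUBE}}(\pi,k;\alpha,\beta) = \sum_{i,j=1}^{3} I_{ij}$ the number of inliers of a 3D block cube interleaver. We now distinguish among nine cases:

Case 11: $i_r < \beta_1$; $i < \alpha_1$, any $j, h \in [w]$
Case 12: $i_r < \beta_1$; $i = \alpha_1$, $j < \alpha_2$, any $h \in [w]$
Case 13: $i_r < \beta_1$; $i = \alpha_1$, $j = \alpha_2$, $h < \alpha_3$
Case 21: $i_r = \beta_1$, $j_r < \beta_2$; $i < \alpha_1$, any $j, h \in [w]$
Case 22: $i_r = \beta_1$, $j_r < \beta_2$; $i = \alpha_1$, $j < \alpha_2$, any $h \in [w]$
Case 23: $i_r = \beta_1$, $j_r < \beta_2$; $i = \alpha_1$, $j = \alpha_2$, $h < \alpha_3$
Case 31: $i_r = \beta_1$, $j_r = \beta_2$, $h_r < \beta_3$; $i < \alpha_1$, any $j, h \in [w]$
Case 32: $i_r = \beta_1$, $j_r = \beta_2$, $h_r < \beta_3$; $i = \alpha_1$, $j < \alpha_2$, any $h \in [w]$
Case 33: $i_r = \beta_1$, $j_r = \beta_2$, $h_r < \beta_3$; $i = \alpha_1$, $j = \alpha_2$, $h < \alpha_3$

• Case 11: We can express $I_{11}$ similar to (41) as
$$I_{11} = \sum_{j=0}^{w-1}\sum_{i=0}^{w-1}\sum_{h=0}^{w-1} \left\lfloor\frac{i-\alpha_1}{w}\right\rfloor \left\lfloor\frac{R(i,j,h)-\beta_1}{w}\right\rfloor = \alpha_1\beta_1 w$$

• Case 12: We can express $I_{12}$ as
$$I_{12} = \sum_{j=0}^{w-1}\sum_{h=0}^{w-1} \left\lfloor\frac{j-\alpha_2}{w}\right\rfloor \left\lfloor\frac{R(\alpha_1,j,h)-\beta_1}{w}\right\rfloor = \alpha_2\beta_1$$

• Case 13: We enumerate all $R(\alpha_1,\alpha_2,h) < \beta_1$ subject to $h < \alpha_3$:
$$I_{13} = I_{\mathrm{LPP}}(R(\alpha_1,\alpha_2,h), w; \alpha_3, \beta_1)$$

• Case 21: Since $i_r = \beta_1$ and $i < \alpha_1$, we have $pi + qj + vh + c \pmod w = \beta_1$, so $\sigma(j,h) \triangleq (\beta_1-c-qj-vh)p^{-1} \pmod w < \alpha_1$. Also $i_r = \beta_1$ and $j_r < \beta_2$ imply $C(\beta_1,j) < \beta_2$. Exactly $\beta_2$ values of $j$ satisfy $C(\beta_1,j) < \beta_2$, and for each such $j$ the map $h \mapsto \sigma(j,h)$ is a permutation of $[w]$, so $\sigma(j,h) < \alpha_1$ holds for exactly $\alpha_1$ values of $h$. Hence
$$I_{21} = \sum_{j=0}^{w-1}\sum_{h=0}^{w-1} \left\lfloor\frac{\sigma(j,h)-\alpha_1}{w}\right\rfloor \left\lfloor\frac{C(\beta_1,j)-\beta_2}{w}\right\rfloor = \alpha_1\beta_2$$

• Case 22: From $i_r = \beta_1$ and $i = \alpha_1$, we have $R(\alpha_1,j,h) = \beta_1 \pmod w$. From $j_r < \beta_2$ and $j < \alpha_2$, we enumerate all $C(\beta_1,j) < \beta_2$ subject to $j < \alpha_2$. Since $j$ and $h$ are related by the above equation for any $h \in [w]$, for every $j$ that satisfies both inequalities there is a corresponding unique $h$. Hence we only need to count all $C(\beta_1,j) < \beta_2$ subject to $j < \alpha_2$:
$$I_{22} = I_{\mathrm{LPP}}(C(\beta_1,j), w; \alpha_2, \beta_2)$$

• Case 23: From $i_r = \beta_1$, $i = \alpha_1$, $j = \alpha_2$, we have $p\alpha_1 + q\alpha_2 + vh + c = \beta_1 \pmod w$. Solving for $h$, we get $h^* = (\beta_1-c-p\alpha_1-q\alpha_2)v^{-1} \pmod w$. Conditions $j_r < \beta_2$ and $h < \alpha_3$ yield:
$$I_{23} = \begin{cases} 1, & \text{if } h^* < \alpha_3 \text{ and } C(\beta_1,\alpha_2) < \beta_2; \\ 0, & \text{otherwise.} \end{cases}$$

• Case 31: We enumerate all $H(\beta_2,h) < \beta_3$ subject to $i_r = \beta_1$, $j_r = \beta_2$ and $i < \alpha_1$. Solving for $j$ from $f_1(\beta_1)j + f_2(\beta_1) = \beta_2$, we get $j^* = (\beta_2-f_2(\beta_1))[f_1(\beta_1)]^{-1} \pmod w$. Substituting for $j$ in the condition $i_r = \beta_1$, then together with the condition $i < \alpha_1$, we obtain $\sigma(h) \triangleq (\beta_1-c-qj^*-vh)p^{-1} < \alpha_1$. Its inverse is $\sigma^{-1}(h) = (\beta_1-c-qj^*-ph)v^{-1} \pmod w$. This is equivalent to enumerating all $H(\beta_2,h) < \beta_3$ such that $\sigma(h) < \alpha_1$, which equals
$$I_{31} = I_{\mathrm{LPP}}(H\circ\sigma^{-1}, w; \alpha_1, \beta_3) \qquad (45)$$
where $H\circ\sigma^{-1} = f_3(\beta_2)\cdot(\beta_1-c-qj^*-ph)\cdot v^{-1} + f_4(\beta_2) \pmod w$.

• Case 32: Conditions $i_r = \beta_1$, $j_r = \beta_2$, $i = \alpha_1$ give $j^* = (\beta_2-f_2(\beta_1))[f_1(\beta_1)]^{-1} \pmod w$ and $h^* = (\beta_1-c-p\alpha_1-qj^*)v^{-1} \pmod w$. From $j < \alpha_2$ and $h_r < \beta_3$ we have
$$I_{32} = \begin{cases} 1, & \text{if } j^* < \alpha_2 \text{ and } H(\beta_2,h^*) < \beta_3; \\ 0, & \text{otherwise.} \end{cases}$$

• Case 33: Conditions $i_r = \beta_1$, $i = \alpha_1$, $j = \alpha_2$ give $h^* = (\beta_1-c-p\alpha_1-q\alpha_2)v^{-1} \pmod w$. To satisfy conditions $j_r = \beta_2$, $h_r < \beta_3$, $h < \alpha_3$, we have
$$I_{33} = \begin{cases} 1, & \text{if } C(\beta_1,\alpha_2) = \beta_2 \text{ and } h^* < \alpha_3 \text{ and } H(\beta_2,h^*) < \beta_3; \\ 0, & \text{otherwise.} \end{cases}$$

Summing all $I_{ij}$ above we obtain the total inliers count $I^{(3)}_{\mathrm{CUBE}}(\pi,k;\alpha,\beta)$ of a 3D cube interleaver.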
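The nine-case partition of the 3D cube interleaver can be spot-checked by exhaustive enumeration. The sketch below classifies every triplet $(i,j,h)$ by its output condition (first case digit) and input condition (second case digit), then compares three of the closed-form counts: $I_{11} = \alpha_1\beta_1 w$, $I_{12} = \alpha_2\beta_1$, and the count $\alpha_1\beta_2$ for Case 21. It is only an illustration under assumptions: $w = 8$, the coefficients follow the LPP-LPP-LPP-SQ-3D row of Table I, $f_2(x) = f_4(x) = (x \bmod 8) + 1$ is assumed, $h_r$ is taken as $f_3(j_r)h + f_4(j_r)$ as in this section, and the helper names are hypothetical.

```python
from itertools import product

W = 8                                   # cube side, so k = W**3
P, Q, V, C0 = 3, 11, 13, 19             # R(i, j, h) coefficients, all odd
F1 = (17, 37, 19, 29, 41, 23, 13, 7)    # multipliers of C, all odd
F3 = (41, 23, 13, 7, 17, 37, 19, 29)    # multipliers of H, all odd


def permute(i, j, h):
    ir = (P * i + Q * j + V * h + C0) % W
    jr = (F1[ir % 8] * j + (ir % 8 + 1)) % W
    hr = (F3[jr % 8] * h + (jr % 8 + 1)) % W
    return ir, jr, hr


def region(t, bound):
    # 1, 2 or 3 when t is lexicographically below bound because of its
    # first, second or third coordinate; 0 when t is not below bound.
    if t[0] < bound[0]:
        return 1
    if t[0] == bound[0] and t[1] < bound[1]:
        return 2
    if t[:2] == bound[:2] and t[2] < bound[2]:
        return 3
    return 0


def case_counts(alpha, beta):
    # counts[(x, y)] is the brute-force value of I_xy.
    counts = {}
    for src in product(range(W), repeat=3):
        out_case = region(permute(*src), beta)
        in_case = region(src, alpha)
        if out_case and in_case:
            key = (out_case, in_case)
            counts[key] = counts.get(key, 0) + 1
    return counts


def to_index(t):
    return (t[0] * W + t[1]) * W + t[2]


def brute_force_inliers(alpha, beta):
    # Direct definition: source index below alpha, permuted index below beta.
    return sum(1 for src in product(range(W), repeat=3)
               if to_index(src) < to_index(alpha)
               and to_index(permute(*src)) < to_index(beta))


if __name__ == "__main__":
    alpha, beta = (3, 5, 2), (4, 6, 3)   # (alpha1, alpha2, alpha3), (beta1, beta2, beta3)
    for case, value in sorted(case_counts(alpha, beta).items()):
        print("I%d%d = %d" % (case[0], case[1], value))
```

Because the nine cases are mutually exclusive, the per-case counts should sum to the directly enumerated inlier total.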

VII. APPLICATION TO TURBO INTERLEAVERS AND BOUND ON MINIMUM SPREAD

Serial pruning is valuable in turbo coding applications because it can accommodate flexible codeword lengths. Typically, in a communication system employing adaptive modulation and coding, only a small set of discrete codeword lengths $k$ is supported. Bits are either punctured or filled in to match the nearest supported length. For a pruned interleaver $\overset{\circ}{\pi}$ of length $\beta$ to be useful, it is desirable to have the following characteristics: 1) it does not require extra storage memory to store the pruned indices, 2) its spread factor [5] degrades gracefully with the number of pruned indices $g \triangleq k-\beta$, and hence the impact on BER performance is limited, and 3) pruning preserves the contention-free property [12], [38] of its mother interleaver (if present). Serial pruning clearly satisfies the first property. We next show that it also satisfies the other two.

The spread factor of an interleaver is a popular measure of merit for turbo codes [5]. The spread measures of $\pi$ and $\overset{\circ}{\pi}$ associated with two indices $i, j$ are $S(i,j) = |\pi(i)-\pi(j)| + |i-j|$ and $S_p(i,j) = |\overset{\circ}{\pi}(i)-\overset{\circ}{\pi}(j)| + |i-j| = |\pi(p(i))-\pi(p(j))| + |i-j|$. The minimum spreads of $\pi$ and $\overset{\circ}{\pi}$ are defined as $S_{\min} \triangleq \min_{i,j}$ …

$$\Delta(j,t) = \begin{cases} \cdots \\ \Delta(j-1,t), & \text{if } j > 0,\ \pi(j+tW) < \beta; \\ \Delta(j-1,t)+1, & \text{if } j > 0,\ \pi(j+tW) \ge \beta, \end{cases} \qquad (47)$$

with initial condition $\Delta_0 = 0$. Then, for $0 \le j < W$ and $0 \le t \ne v \le k/W$, we have

$$\left\lfloor\frac{\overset{\circ}{\pi}(j+tW-\Delta(j,t))}{W}\right\rfloor = \left\lfloor\frac{\pi(j+tW)}{W}\right\rfloor \ne \left\lfloor\frac{\pi(j+vW)}{W}\right\rfloor = \left\lfloor\frac{\overset{\circ}{\pi}(j+vW-\Delta(j,v))}{W}\right\rfloor.$$

Hence a serially pruned interleaver is contention-free when the banks are accessed sequentially using a counter from j = 0, 1, · · · , W −1, if the mother interleaver is contention-free. The pruning gaps in (47) can be computed efficiently using Theorem 1 based on Dedekind sums. The implications are that serially-pruned contention-free interleavers are parallelizable at a low implementation cost using the proposed schemes. When coupled with windowing techniques to parallelize the constituent APP decoders, a turbo decoder can then be efficiently parallelized to meet throughput requirements in 4G wireless standards and beyond.
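The contention-free argument can be illustrated on a small QPP. The sketch below is an assumption-laden toy rather than the paper's implementation: it builds $\pi(j) = f_1 j + f_2 j^2 \pmod k$ for the hypothetical example values $k = 40$, $f_1 = 3$, $f_2 = 10$, prunes it serially by skipping addresses $\ge \beta$, and checks that, for every counter value $j$, the addresses $\pi(j+tW)$ that survive pruning fall in distinct memory banks $\lfloor \pi(j+tW)/W \rfloor$. The function names are invented for illustration.

```python
def qpp(k, f1, f2):
    # Quadratic permutation polynomial pi(j) = f1*j + f2*j^2 (mod k).
    return [(f1 * j + f2 * j * j) % k for j in range(k)]


def serially_prune(pi, beta):
    # Serial pruning: scan pi in order and drop addresses >= beta.
    return [y for y in pi if y < beta]


def is_contention_free(pi, window, beta=None):
    # For each counter value j, the surviving accesses pi[j + t*window]
    # (one per window t) must hit pairwise distinct banks.
    k = len(pi)
    beta = k if beta is None else beta
    for j in range(window):
        banks = [pi[j + t * window] // window
                 for t in range(k // window)
                 if pi[j + t * window] < beta]
        if len(banks) != len(set(banks)):
            return False
    return True


if __name__ == "__main__":
    pi = qpp(40, 3, 10)
    print(is_contention_free(pi, 10))            # mother interleaver
    print(is_contention_free(pi, 10, beta=36))   # after pruning 4 addresses
    print(serially_prune(pi, 36)[:8])
```

Pruning only removes accesses from each group of simultaneous accesses, which is why the property carries over from the mother interleaver in this sketch.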

[Fig. 3 plots BER versus $E_b/N_0$ (dB). Panel (a), "LTE turbo-coding with pruned QPP interleavers of length ∼2048": (31,64)-QPPI with k = 2048; pruned versions with β = 2046, 2044, 2042, 2040, 2036, 2032, 2028, 2024, 2020, 2016, 1664; (113,420)-QPPI with k = 2016; (183,104)-QPPI with k = 1664. Panel (b), "LTE turbo-coding with pruned QPP interleavers of length ∼4096": (31,64)-QPPI with k = 4096; pruned versions with β = 4032, 3968, 3904, 3840; (127,168)-QPPI with k = 4032; (375,248)-QPPI with k = 3968; (363,244)-QPPI with k = 3904; (331,120)-QPPI with k = 3840.]

Fig. 3: BER of LTE turbo codes with pruned QPP interleavers: (a) k = 2048 and (b) k = 4096.

TABLE I: Parameters of 1D, 2D, 3D interleavers considered.

| Interleaver | Permutation $\pi$ | Size $k = 2^n$ |
|---|---|---|
| LPP-1D | $\pi(j) = (k/2-1)j + 10 \bmod k$ | $n = 10, 11, \cdots, 20$ |
| QPP-1D | $\pi(j) = (k/2-1)j + 3kj^2/8 \bmod k$ | $n = 10, 11, \cdots, 20$ |
| ARP-1D | $\sigma(j) = (k/2-3)j + S(j \bmod 8) \bmod k$; $\pi(j) = (k/2-1)j + P(j \bmod 8) \bmod k$ | $n = 10, 11, \cdots, 20$; $S, P$ periodic with period 8 |
| 2-Dimensional | $\pi$ | $k_1 = 2^{n_1}$, $k_2 = 2^{n_2}$, $n = n_1+n_2$ |
| LPP-LPP-SQ-2D | $i_r = 5i + 7j + 33 \bmod k_1$; $j_r = f_1(i_r \bmod 8)j + f_2(i_r \bmod 8) \bmod k_1$; $\pi(ik_2+j) = i_r k_1 + j_r \bmod k$ | $n_1 = n_2 = 5, \cdots, 12$; $f_1 = (17, 37, 19, 29, 41, 23, 13, 7)$; $f_2 = 1, \cdots, 8$ |
| LPP-LPP-REC-2D | $i_r = 11i + 13j + 33 \bmod k_1$; $j_r = f_1(j_r \bmod 8)j + f_2(j_r \bmod 8) \bmod k_2$; $\pi(ik_2+j) = i_r k_1 + j_r \bmod k$ | $n_1 = 4, \cdots, 11$; $n_2 = 6, \cdots, 13$; $f_1 = (17, 37, 19, 29, 41, 23, 13, 7)$; $f_2 = 1, \cdots, 8$ |
| LPP-LPP-RECV-2D | $j_r = 7i + 13j + 33 \bmod k_2$; $i_r = f_1(j_r \bmod 8)j + f_2(j_r \bmod 8) \bmod k_1$; $\pi(ik_2+j) = j_r k_1 + i_r \bmod k$ | $n_1 = 4, \cdots, 11$; $n_2 = 6, \cdots, 13$; $f_1 = (17, 37, 19, 29, 41, 23, 13, 7)$; $f_2 = 1, \cdots, 8$ |
| LPP-QPP1-SQ-2D | $i_r = 3i + 7j + 21 \bmod k_1$; $j_r = f_1(i_r \bmod 8)j + f_2(i_r \bmod 8)j^2 \bmod k_1$; $\pi(ik_2+j) = i_r k_1 + j_r \bmod k$ | $n_1 = n_2 = 5, \cdots, 12$; $f_1 = (17, 37, 19, 29, 41, 23, 13, 7)$; $f_2 = (1, 3, 5, 7, 9, 11, 13, 15, 17)k_1/8 \pmod{k_1}$ |
| LPP-QPP2-SQ-2D | $i_r = 3i + 5k_1/8\, i^2 + 7j + 12 \bmod k_1$; $j_r = f_1(i_r \bmod 8)j + f_2(i_r \bmod 8)j^2 \bmod k_1$; $\pi(ik_2+j) = i_r k_1 + j_r \bmod k$ | $n_1 = n_2 = 5, \cdots, 12$; $f_1 = (17, 37, 19, 29, 41, 23, 13, 7)$; $f_2 = 1, \cdots, 8$ |
| 3-Dimensional | $\pi$ | $w = 2^n$, $k = w^3$ |
| LPP-LPP-LPP-SQ-3D | $i_r = 3i + 11j + 13h + 19 \bmod w$; $j_r = f_1(i_r \bmod 8)j + f_2(i_r \bmod 8) \bmod w$; $h_r = f_3(j_r \bmod 8)j + f_4(j_r \bmod 8) \bmod w$; $\pi(ik_2+j) = i_r w^2 + j_r w + h_r \bmod k$ | $n = 4, 5, \cdots, 8$; $f_1 = (17, 37, 19, 29, 41, 23, 13, 7)$, $f_2 = 1, \cdots, 8$; $f_3 = (41, 23, 13, 7, 17, 37, 19, 29)$, $f_4 = 1, \cdots, 8$ |

VIII. PRACTICAL EXAMPLES AND SIMULATION RESULTS

To demonstrate the performance advantage of the proposed schemes, several pruned interleavers were constructed and simulated using the proposed algorithms and compared with existing serial pruning algorithms in the literature. The generalized Dedekind sum algorithm (Fig. 1) was implemented using integer operations and optimized to operate on large integers. The pruning gap of interleavers was computed using the equations developed in this work. The minimal inliers and parallel algorithms in [30], [31] were then applied. 1D, 2D, and 3D block interleavers were considered with various combinations of permutations, dimensions, and ordering of dimensions. The parameters are summarized in Table I. Figure 4 plots the time to compute the pruning gap of 1D and 2D interleavers as a function of index. Also shown are the corresponding times of the serial pruning algorithms. The plots demonstrate an improvement of several orders of magnitude in pruning time compared to the serial case. The absolute times are down from seconds to milliseconds on a state-of-the-art desktop computer. This is reaffirmed in Fig. 5

which shows the normalized pruning time as a function of interleaver size. The speedup attained in all cases is orders of magnitude better than the serial case. Both QPP permutations have identical performance (see (40) and Section V-A1).

[Fig. 4 plots the time to compute the pruning gap (logarithmic scale) versus index, with the curves grouped as "serial pruned interleaver" and "proposed". Panel (a), index axis in units of $10^4$: LPP-1D, LPP-1D*, QPP-1D, QPP-1D*, ARP-1D, ARP-1D*. Panel (b), index axis in units of $10^5$: LPP-2D, LPP-2D*, LPP-3D, LPP-3D*.]

Fig. 4: Time to compute pruning gap versus index for (a) 1D and (b) 2D and 3D interleavers.

[Fig. 5 plots normalized time (logarithmic scale) versus interleaver size k, with the curves grouped as "serial pruned interleaver" and "proposed". Panel (a), size axis in units of $10^5$: LPP-1D, LPP-1D*, QPP-1D, QPP-1D*, ARP-1D, ARP-1D*. Panel (b), size axis in units of $10^6$: LPP-LPP-SQ-2D, LPP-LPP-SQ-2D*, LPP-LPP-REC-2D, LPP-LPP-REC-2D*, LPP-LPP-RECV-2D, LPP-LPP-RECV-2D*, LPP-LPP-LPP-SQ-3D, LPP-LPP-LPP-SQ-3D*.]

Fig. 5: Relative pruning times for (a) 1D and (b) 2D and 3D interleavers as function of k.

IX. CONCLUSION

A mathematical framework for analyzing and accelerating pruned polynomial-based turbo interleavers has been presented. The pruning gap of a pruned interleaver has been cast in terms of a statistic involving sums of integer floor and saw-tooth functions. A computationally efficient algorithm for evaluating these sums has been proposed. Extensions to higher dimensional interleavers have also been considered. The minimum spread of serially pruned turbo interleavers has been shown to closely match that of the mother interleaver for small pruning gaps, with minimal impact on BER performance. Moreover, serial pruning has been shown to preserve

the contention-free property of the mother interleaver. The efficiency of the proposed schemes in reducing interleaving latency has been demonstrated. The significance of this work lies in that it enables flexible and high throughput implementations of pruned turbo interleavers employed in communication systems that support multiple data rates and variable-length data blocks.

APPENDIX

A. Useful Identities Involving the ((·)) Function

Throughout the proofs, we make use of the following identities involving the $((\cdot))$ function, which can be proved using the definition $((x)) = x - \lfloor x \rfloor - \tfrac{1}{2} + \tfrac{1}{2}\delta(x)$. Assuming $m, n, k$ are integers and $x$ is any real number, we have: $((n)) = 0$, $((n/2)) = 0$, $((-x)) = -((x))$, $((x+n)) = ((x))$, $((x \pm \tfrac{1}{2})) = ((2x)) - ((x))$, $(((x+nk)/k)) = ((x/k))$,

$$\left(\!\left(\frac{n\pm\theta}{k}\right)\!\right) = \left(\!\left(\frac{n}{k}\right)\!\right) \pm \frac{\theta}{k} \mp \frac{1}{2}\delta\!\left(\frac{n}{k}\right) \qquad (48)$$

$$\left(\!\left(\frac{n\pm 1}{k}\right)\!\right) = \left(\!\left(\frac{n}{k}\right)\!\right) \pm \frac{1}{k} \mp \frac{1}{2}\delta\!\left(\frac{n\pm 1}{k}\right) \mp \frac{1}{2}\delta\!\left(\frac{n}{k}\right) \qquad (49)$$

$$\left(\!\left(\frac{m-n}{k}\right)\!\right) - \left(\!\left(\frac{m}{k}\right)\!\right) = -\frac{n \bmod k}{k} + \begin{cases} 1, & \text{if } 0 < m < n \pmod k; \\ \tfrac{1}{2}, & \text{if } m = 0 \text{ or } m = n;\ n \ne 0 \pmod k; \\ 0, & \text{otherwise} \end{cases} \qquad (50)$$

$$\left(\!\left(\frac{m-n}{k}\right)\!\right) - \left(\!\left(\frac{m}{k}\right)\!\right) + \left(\!\left(\frac{n}{k}\right)\!\right) = \begin{cases} +\tfrac{1}{2}, & \text{if } 0 < m < n \pmod k; \\ -\tfrac{1}{2}, & \text{if } 0 < n < m \pmod k; \\ 0, & \text{otherwise} \end{cases} \qquad (51)$$

where in (48), $\theta$ is real with $0 < \theta < 1$. Moreover, the function $((x))$ possesses many interesting properties when $x$ is a rational number $j/k$, especially when $((j/k))$ is summed over a complete residue system modulo $k$:

$$\sum_{j=0}^{k-1} \left(\!\left(\frac{\pi(j)+x}{k}\right)\!\right) = ((x)) \qquad (52)$$

where $\pi$ is any permutation on $[k]$ and $x$ is any real number. Moreover, the above sum does not change if $j$ is run over any interval $[nk, nk+k-1]$ of length $k$, for any integer $n$.

B. Proof of Equation (2) and Theorem 1

Noting that $0 \le j < k$ and $0 \le \pi(j) < k$, the first integer floor function in (2) evaluates to $-1$ when $0 \le j < \alpha$ and to $0$ when $j \ge \alpha$, while the second evaluates to $-1$ when $0 \le \pi(j) < \beta$ and to $0$ when $\pi(j) \ge \beta$. Hence, the sum of the products of the two floor functions enumerates the required inliers. Next, rewriting the floor functions in (2) in terms of $((\cdot))$, where $0 \le j < k$ and $0 \le \pi(j) < k$, multiplying out terms, and applying the property $((n)) = 0$ in Appendix A and property (52), we obtain



αβ 1 π(α)−β 1 π(0)−β I(π, k; α, β) = + − k 2 k 2 k

−1



−1

 π (β) π (0) 1 π −1(β)−α 1 π −1(0)−α − − + − 2 k k 2 k k







 k−1  j π(j)−β π(j) j −α − − (53) + k k k k j=0

from which (3) follows, where the constant K is given by       −1  π(α)−β π(0)−β π (β)−α 2K = − − k k k   −1     −1    −1  π (β) π (0)−α π (0) + + − k k k After simplifying the saw differences above, K can then be evaluated as follows, starting with K = 0: if 0 < π −1 (β) < α, elseif π −1 (β) = 0 or α, if 0 < π

−1

elseif π

(0) < α,

−1

(0) = 0 or α,

K = − 21 K = − 41 K=K+ K=K+

if 0 ≤ π(α) < β,

K=K−

if 0 ≤ π(0) < β,

K=K+

αβ Finally, the bound αβ k −c1 ≤ I(π, k; α, β) ≤  k +c2  can be easily proved by evaluating (2) without ·.
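The counting argument behind (2) and the saw-tooth identities of Appendix A are easy to confirm numerically with exact rational arithmetic. The sketch below is a minimal check, not part of the proof: it assumes (2) is the sum over $j$ of the product $\lfloor (j-\alpha)/k \rfloor \lfloor (\pi(j)-\beta)/k \rfloor$ as described above, takes $\delta(x) = 1$ for integer $x$ and $0$ otherwise, and uses an arbitrary example LPP; the function names are hypothetical.

```python
from fractions import Fraction
from math import floor


def saw(x):
    # ((x)) = x - floor(x) - 1/2 + delta(x)/2, with delta(x) = 1 iff x is an integer.
    x = Fraction(x)
    if x.denominator == 1:
        return Fraction(0)
    return x - floor(x) - Fraction(1, 2)


def inliers_floor_sum(pi, alpha, beta):
    # Product of floors: each factor is -1 inside the range and 0 outside it.
    k = len(pi)
    return sum(((j - alpha) // k) * ((pi[j] - beta) // k) for j in range(k))


def inliers_brute(pi, alpha, beta):
    # Direct definition: indices j < alpha whose image is below beta.
    return sum(1 for j in range(alpha) if pi[j] < beta)


def saw_sum(pi, x):
    # Left-hand side of (52) for a permutation pi on [k] and a rational x.
    k = len(pi)
    return sum(saw((Fraction(pj) + x) / k) for pj in pi)


if __name__ == "__main__":
    k = 24
    pi = [(7 * j + 10) % k for j in range(k)]   # example LPP, gcd(7, 24) = 1
    print(inliers_floor_sum(pi, 9, 14), inliers_brute(pi, 9, 14))
    print(saw_sum(pi, Fraction(3, 7)), saw(Fraction(3, 7)))
```

Using `Fraction` avoids floating-point rounding at the discontinuities of $((\cdot))$, which is where a floating-point check would be least reliable.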

C. Dedekind Sum Identities, Generalization of (4), and Proofs of Equations (10), (11) A Dedekind sum (4) satisfies the following properties, which can be verified using identities (48) and (52):    1 h ·x D(h, k, x) = D(h, k, x)+ ·(1−δ(x))(55) 2 k D(d·h, d·k, d·x) = d·D(h, k, x), integer d (56) D(h, k, −x) = D(h, k, x) (57) D(h, k, x + nk) = D(h, k, x), any integer n D(h + nk, k, x) = D(h, k, x), any integer n D(k − h, k, x) = −D(h, k, x) 

(58) (59) (60)



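where in (55), $h'$ is such that $h'h = 1 \pmod k$. We next prove (11). The proof for (10) is similar when $k_1 = k_2 = k$ and $d = \gcd(h,k_2) = 1$. Let $\sigma(j) = jg + b \pmod{k_1}$, $\pi(j) = jh + c \pmod{k_2}$, $g'g = 1 \pmod{k_1}$, $k_2 = pk_1$, and $\gcd(g,k_1) = \gcd(h,k_1) = 1$. Let $d = \gcd(h,k_2)$, then $\gcd(h,pk_1) = \gcd(h,p) = d$:

$$\sum_{j=0}^{k_2-1} \left(\!\left(\frac{\sigma(j)-\alpha}{k_1}\right)\!\right)\left(\!\left(\frac{\pi(j)-\beta}{k_2}\right)\!\right) = \sum_{j=0}^{k_1-1}\sum_{i=0}^{p-1} \left(\!\left(\frac{(ik_1+j)g+b-\alpha}{k_1}\right)\!\right)\left(\!\left(\frac{(ik_1+j)h+c-\beta}{k_2}\right)\!\right) \qquad (61)$$

$$= \sum_{j=0}^{k_1-1} \left(\!\left(\frac{jg+b-\alpha}{k_1}\right)\!\right) \sum_{i=0}^{p-1} \left(\!\left(\frac{ih/d + (jh+c-\beta)/dk_1}{p/d}\right)\!\right) \qquad (62)$$

$$= d\cdot\sum_{j=0}^{k_1-1} \left(\!\left(\frac{jg+b-\alpha}{k_1}\right)\!\right)\left(\!\left(\frac{jh+c-\beta}{dk_1}\right)\!\right) \qquad (63)$$

$$= d\cdot D\big(g'h/d,\ k_1,\ (c-\beta-g'h(b-\alpha))/d\big)$$

where $j \to ik_1 + j$ in (61), Appendix A inline identities are applied in (62), and (52) is applied in (63).

Finally, let $x_1$ and $x_2$ be two real numbers. Decompose $x_1 = c_1 + \theta_1$ into an integer part ($c_1$) and a fractional part ($\theta_1$). The Dedekind sums in (4) and (10) can be generalized to:

$$\sum_{j=0}^{k-1} \left(\!\left(\frac{jg+x_1}{k}\right)\!\right)\left(\!\left(\frac{jh+x_2}{k}\right)\!\right) = \sum_{j=0}^{k-1} \left(\!\left(\frac{jg+c_1+\theta_1}{k}\right)\!\right)\left(\!\left(\frac{jh+x_2}{k}\right)\!\right) \qquad (64)$$

and evaluated in terms of (10) as follows. If $x_1$ is an integer, then (64) can be evaluated similar to (10). Next, assume that $\theta_1 \ne 0$. Let $g'g = 1 \pmod k$. Applying identities (48), (52) to (64), we obtain

$$\sum_{j=0}^{k-1} \left(\!\left(\frac{jg+c_1+\theta_1}{k}\right)\!\right)\left(\!\left(\frac{jh+x_2}{k}\right)\!\right) = D(g'h, k, x_2-g'hc_1) + \frac{\theta_1}{k}\cdot((x_2)) - \frac{1}{2}\left(\!\left(\frac{x_2-g'hc_1}{k}\right)\!\right). \qquad (65)$$

D. Proof of Lemma 1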

1 2 1 4 1 2 1 2

(54)

Referring to (3) in Theorem 1 with π(j) = jh+c (mod k), the sum involving the saw functions can be expressed in terms of the Dedekind vector sum equation given in (5) with l = 2 and h0 = h1 = h, k0 = k1 = k, x0 = π(α)−β, x1 = π(0)−β, y0 = π(α), y1 = π(0), and w0 = w1 = 0.

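E. Proofs of (r mod q)-Dedekind Sum Equations (12) and (13)

Equation (12) is a special case of (13) when $\sigma(j) = j$, $\alpha = \beta = 0$. Hence we prove (13) only. Doing the substitutions $j \to jq + r$, then $j \to jg'$ where $g'g = 1 \pmod{k/q}$, we obtain

$$\sum_{\substack{j=0 \\ j\equiv r\,(q)}}^{k-1} \left(\!\left(\frac{\sigma(j)-\alpha}{k}\right)\!\right)\left(\!\left(\frac{\pi(j)-\beta}{k}\right)\!\right) = \sum_{j=0}^{k/q-1} \left(\!\left(\frac{j + (\sigma(r)-\alpha)/q}{k/q}\right)\!\right)\left(\!\left(\frac{jg'h + (\pi(r)-\beta)/q}{k/q}\right)\!\right) \qquad (66)$$

If $(rg+b-\alpha)/q$ is an integer, i.e., $\sigma(r)-\alpha = 0 \pmod q$, then substitute $j \to j-(\sigma(r)-\alpha)/q$ to get

$$\sum_{j=0}^{k/q-1} \left(\!\left(\frac{j}{k/q}\right)\!\right)\left(\!\left(\frac{jg'h + (rh+c-\beta-(rg+b-\alpha)g'h)/q}{k/q}\right)\!\right) = \sum_{j=0}^{k/q-1} \left(\!\left(\frac{j}{k/q}\right)\!\right)\left(\!\left(\frac{jg'h + (g'h\cdot(\alpha-b)+c-\beta)/q}{k/q}\right)\!\right) = D\big(g'h,\ k/q,\ (\rho(\alpha)-\beta)/q\big)$$

where $\rho(j) \triangleq g'h\cdot(j-b)+c \pmod k$. Otherwise, if $(rg+b-\alpha)/q$ is not an integer, let $w(r) \triangleq \sigma(r)-\alpha \pmod q$ and $z(r) \triangleq (1-g'g)hr$. (Note that $g'g = 1 \pmod{k/q}$ but $g'g \ne 1 \pmod k$.) We split $(\sigma(r)-\alpha)/q$ in (66) into an integer and a fractional part, and then apply (65) to get the following, from which (13) follows:

$$\sum_{j=0}^{k/q-1} \left(\!\left(\frac{j + \lfloor(\sigma(r)-\alpha)/q\rfloor + w(r)/q}{k/q}\right)\!\right)\left(\!\left(\frac{jg'h + (\pi(r)-\beta)/q}{k/q}\right)\!\right) = D\big(g'h,\ k/q,\ (\pi(r)-\beta-g'h\cdot(\sigma(r)-\alpha-w(r)))/q\big) + \frac{w(r)}{k}\left(\!\left(\frac{\pi(r)-\beta}{q}\right)\!\right) - \frac{1}{2}\left(\!\left(\frac{\pi(r)-\beta-g'h\cdot(\sigma(r)-\alpha-w(r))}{k}\right)\!\right)$$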
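The two index reductions used in Appendices C and E, namely the collapse of the length-$k_2$ sum in (61) to the length-$k_1$ sum in (63), and the reduction of the sum over $j \equiv r \pmod q$ to a sum of length $k/q$ in (66), can be spot-checked with exact rationals. The sketch below is an illustrative check under assumptions, not the paper's algorithm: $\sigma(j) = jg + b$ and $\pi(j) = jh + c$ are used without modular reduction (which the periodicity of $((\cdot))$ permits), the parameter values are arbitrary examples satisfying the stated coprimality conditions, and all names are hypothetical.

```python
from fractions import Fraction
from math import floor, gcd


def saw(x):
    # ((x)) with the convention ((n)) = 0 for integer n.
    x = Fraction(x)
    if x.denominator == 1:
        return Fraction(0)
    return x - floor(x) - Fraction(1, 2)


def lhs_61(k1, p, g, h, b, c, alpha, beta):
    # Full sum over j in [k2], with k2 = p * k1.
    k2 = p * k1
    return sum(saw(Fraction(j * g + b - alpha, k1)) *
               saw(Fraction(j * h + c - beta, k2)) for j in range(k2))


def rhs_63(k1, p, g, h, b, c, alpha, beta):
    # Reduced sum over j in [k1], scaled by d = gcd(h, k2).
    d = gcd(h, p * k1)
    return d * sum(saw(Fraction(j * g + b - alpha, k1)) *
                   saw(Fraction(j * h + c - beta, d * k1)) for j in range(k1))


def lhs_66(k, q, r, g, h, b, c, alpha, beta):
    # Sum restricted to indices j congruent to r modulo q.
    return sum(saw(Fraction(j * g + b - alpha, k)) *
               saw(Fraction(j * h + c - beta, k)) for j in range(r, k, q))


def rhs_66(k, q, r, g, h, b, c, alpha, beta):
    # After j -> j*q + r and j -> j*g', with g'*g = 1 (mod k/q).
    m = k // q
    g_inv = pow(g, -1, m)
    s = Fraction(r * g + b - alpha, q)      # (sigma(r) - alpha) / q
    t = Fraction(r * h + c - beta, q)       # (pi(r) - beta) / q
    return sum(saw((j + s) / m) * saw((j * g_inv * h + t) / m)
               for j in range(m))


if __name__ == "__main__":
    print(lhs_61(8, 3, 5, 3, 2, 7, 3, 10), rhs_63(8, 3, 5, 3, 2, 7, 3, 10))
    print(lhs_66(24, 4, 1, 5, 7, 2, 3, 5, 11), rhs_66(24, 4, 1, 5, 7, 2, 3, 5, 11))
```

The three-argument `pow` with exponent $-1$ computes the modular inverse $g'$ and requires Python 3.8 or later.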

F. Proof of Theorem 2 Let σ(j), π(j) be two ARPs defined using σr (j) = jg+S(r) (mod k) and πr (j) = jh+P (r) (mod k), for j = r (mod q) and r = 0, · · · , q − 1, where q|k, gcd(g, k) = gcd(h, k) = 1, P (r) = Q(r) = 0 (mod k). Let g  g = 1 (mod k) and h h = 1 (mod k). To get the total inliers count, we apply (18) and sum the inlier counts using (19) for r = 0, · · · , q −1. Let ρr (j) = Q(r) + g  h · (j − P (r)) (mod k), λr (j) = P (r) + h g · (j − Q(r)) (mod k), σr (r) = gr+P (r) (mod k), πr (r) = hr+Q(r) (mod k), g  g = 1 (mod q), h h = 1 (mod q), z(r) = (1 − g  g)hr, and y(r) = (1−h h)gr. Expanding (19) in terms of ((·)), then substituting in (18) gives (67) at the top of the next page. The remainder of the proof focuses on simplifying the terms in this summation. Let S1 , S2 , S3 , S4 denote the four sums of (r mod q)-Dedekind sums above. After some manipulations, the sum reduces to   αβ 1 (z(r1 )+ρr1 (α)) mod k−β + IARP (σ, π, k; α, β) = k 2 k   1 ρ0 (0) mod k−β − +(S1 −S2 )−(S3 −S4 ) 2 k      y(r2 )+λr2 (β) 1 y(r2 )+λr2 (β)−α − − 2 k k       λ0 (0) 1 λ0 (0)−α + − (68) 2 k k


We next apply (13) to each of the Si ’s in (68) and simplify. For S1 we obtain (69) at the top of the next page, where wr = σr (r)−α (mod q) = gr −α (mod q) since q|k and Q(r) = 0 (mod q). A similar derivation can be carried for the remaining sums S2 , S3 , S4 . After simplification, the combination of the four Dedekind sums S  (S1 −S2 )−(S3 −S4 ) in (68) reduces to q S = D(g  h, q, u, v)+D(g  hq, k, x, y) k         q g  hα−β g hα β − − + 2k q q q       q−1   z(r)+ρr (α+wr )−β z(r)+ρr (α+wr ) 1 − − 2 r=0 k k r=g α

     q−1 z(r)+ρr (wr ) 1  z(r)+ρr (wr )−β + − 2 r=0 k k r=0

l−1 l−1 where x = (xi )l−1 i=0 , y = (yi )i=0 , w = (wi )i=0 are vectors of length l = 2q whose elements are respectively defined in (21), (22), (23) in Theorem 2 for r = 0, · · · , q − 1, and u, v are vectors of length 2 defined as v = (g  hα, 0), u = v−(β, β). Substituting back in (68) and applying (51), the inliers count reduces to the form given in (20) with the constant KARP given by the following, starting with KARP = 0 (α = 0, β = 0):

if if if if if if if if

(mod z(r1 ) + ρr1 (α) < β ρ0 (0) < β (mod 0 < y(r2 )+λr2 (β) < α (mod y(r2 )+λr2 (β) = 0 or α (mod λ0 (0) < α (mod λ0 (0) = 0 or α (mod  0 < g hα < β (mod 0 < β < g  hα (mod

k), KARP = − 12 k), KARP = KARP + 12 k), KARP = KARP − 12 k), KARP = KARP − 14 k), KARP = KARP + 12 k), KARP = KARP + 14 q k), KARP = KARP − 4k q k), KARP = KARP + 4k

(70)

G. Proof of Theorem 3 In the following, we assume the integers h, k, b satisfy the properties stated in Section IV-C for π(j) = jh+j 2 b+c (mod k) to be a QPP. The inverse of π is another QPP π −1 (j) = jg1 + j 2 g2 (mod k), where g1 and g2 given in (30), (31). Let x be any real number. It is easy to verify that the QuadraticDedekind sum (24) satisfies identical properties to (56)-(60) for Dedekind sums, except for property (55), which becomes    2 (1−δ(x)) g1 x−g2 x · Q(h, k, x, b) = Q(h, k, x , b)+ k 2 To prove Theorem 3, we assume x = c is an integer, and consider the prime factorization of k and b. Let k = 2n2 3n3 p1 np1 p2 np2 · · · pe npe b = 2m2 3m3 p1 mp1 p2 mp2 · · · pe mpe × m, where pi ’s are prime factors = 2, 3, ni ’s and mi ’s are the prime factor multiplicities, and m is any integer = 2, 3. Then π has at least one quadratic inverse iff [34]:  n2 −2    , 1 , n2 > 1; 2 m2 ≥ max 0,    n2 = 0, 1.  n3 −1 , 1 , n3 > 0; 2 m3 ≥ max n3 = 0.  0,  npi mpi ≥ , pi = 2, 3 2 Then, mk = 2n2−m2·3n3−m3·p1 np1−mp1·p2 np2−mp2 · · ·pe npe−mpe×




q−1 ⎪ ⎨ 

         α πr (r)−β πr (r) σr (r)−α σr (r) β − − + − ⎪ k q q k q q r=0 ⎩           z(r)+ρr (α)−β z(r)+ρr (α) z(r)+ρr (0)−β z(r)+ρr (0) 1 σr (r)−α 1 σr (r) + δ − − δ − 2 q k k 2 q k k                y(r)+λr (β)−α y(r)+λr (β) y(r)+λr (0) 1 πr (r)−β 1 πr (r) y(r)+λr (0)−α − δ − + δ − 2 q k k 2 q k k ⎫                               ⎪ k−1  σr (j)−α πr (j)−β σr (j)−α πr (j) σr (j) πr (j)−β σr (j) πr (j) ⎬ (67) + − − + ⎪ k k k k k k k k ⎭ j=0

IARP (σ, π, k; α, β) =

j≡r(q)

S1 

=

     q−1 k−1    σr (j)−α πr (j)−β k k r=0 j=0 q−1 

j≡r(q)

D(g  hq, k, z(r)+ρr (α)−β) 

r=0

D(g hq, k, z(r)+ρr (α+wr )−β) +

wr k

!!

""

πr (r)−β q

!!

− 12

wr = 0;

""

z(r)+ρr (α+wr )−β k

,

wr = 0.

⎧ ⎫     ⎪ q−1 ⎪ ⎨ k−1    j j(h + rb) + c ⎬ Q(h, k, c, b) = ⎪ ⎪ k k ⎭ r=0 ⎩ j=0 j≡r(q)        q−1 q−1 q−1    r 1 π(r) 1  π(r) q q = − ((π(0))) − D(h, k/q, π(r)/q) + + k r=0 q 2 q 2k 2 r=1 k r=0        q−1 q−1  q c 1  π(r) q q ((c)) − D(h, k/q, π(r)/q) + Q(h, q, c, b) − + = k 2k q 2k 2 r=1 k r=0 Q(h, k, c, b) =

(69)

(71)

(72)

  k−1           k−1   j  jh + c + m k jh + c + m k8 j j jh + c 2 + + k k k k k k j=0 j=0

 k−1  j=0 j≡0(4)

j≡1(2)

j≡2(4)

= D0,4 (h, k, c) + D1,2 (h, k, c+mk/8) + D2,4 (h, k, c+mk/2) = D(h, k/4, c/4) + D(h, k/2, (c+mk/8+h)/2) + D(h, k/4, (c+mk/2+2h)/4)           1 c+mk/8+h 1 c+mk/8+h 2 c+mk/2+2h 1 c+mk/2+2h + − + − k 2 2 k k 4 2 k = D(h, k/4, c/4) + D(h, k/2, (c+b+h)/2) + D(h, k/4, (c+2h)/4+b)           1 c+b+h 2 c+2h 1 c+4b+2h − + − 2 k k 4 2 k

b = q×b. Hence j 2 b ≡ jrb (mod k) for r = 0, 1, · · · , q−1. We q−1 k−1 k−1 can then replace j=0 by r=0 j=0,j=r to simplify the sum in (24) as shown in (71), (72) above, where the (r mod q)-Dedekind  n  sum has been used in (71). Also, if we assume mpi ≥ 2pi for all prime factors of k, b, then npi − mpi ≤ mpi , and hence q|b. Consequently, in (72), we have Q(h, q, c, b) = D(h, q, c). Also, when c is an integer, then ((c)) = 0. Therefore, (25) follows from (72). H. Proof of Equation (27) and Lemma (2) Again assume that h, k, b are such that π is a QPP, and let b = mk/8 where m  8. According to (26), π can be viewed as the union of three LPPs. Applying this to Q(h, k, c, b) in (24), the simplifications in (73) shown above follow. Applying formula for (r mod q)-Dedekind sums in (12) and simplifying terms, (27) follows.

(73)

To prove Lemma (2) each of the Q terms in (28) is expanded using (27) to obtain 12 Dedekind sums and associated saw fractions. These terms are combined in vector form using (5), from#!!which " the The " !result ! ""follows. !! "" !constant ! ""$ π(α) π(0)−β − − + π(0) term KQPP8 = K − k2 π(α)−β 4 4 4 4 includes K in (54). From the conditions on h, k for π to be a QPP, either gcd(h, k) = 1 or gcd(h, k/2) = 1. Since 8|k, then h is odd, and therefore h/2 cancels out from the saw fractions in the equation of KQPP8 since ((αh/2)) = ((α/2)).

REFERENCES

[1] J. Ramsey, “Realization of optimum interleavers,” IEEE Trans. Inf. Theory, vol. 16, no. 3, pp. 338–345, May 1970.
[2] G. D. Forney, Jr., “Burst-correcting codes for the classic bursty channel,” IEEE Trans. Commun. Technol., vol. 19, no. 5, pp. 772–781, Oct. 1971.
[3] R. Garello, G. Montorsi, S. Benedetto, and G. Cancellieri, “Interleaver properties and their applications to the trellis complexity analysis of turbo codes,” IEEE Trans. Commun., vol. 49, no. 5, pp. 793–807, May 2001.
[4] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: turbo codes,” in Proc. 1993 IEEE Conf. Commun., vol. 2, pp. 1064–1070.
[5] S. Dolinar and D. Divsalar, “Weight distributions for turbo codes using random and nonrandom permutations,” JPL TDA Progress Report 42-122, Tech. Rep., Aug. 1995.
[6] C. Heegard and S. B. Wicker, Turbo Coding. Kluwer Academic Publishers, 1999.
[7] S. Crozier and P. Guinand, “High-performance low-memory interleaver banks for turbo-codes,” in Proc. 2001 IEEE Veh. Tech. Conf. – Fall, vol. 4, pp. 2394–2398.
[8] C. Berrou, Y. Saouter, C. Douillard, S. Kerouedan, and M. Jezequel, “Designing good permutations for turbo codes: towards a single model,” in Proc. 2004 IEEE Conf. Commun. (ICC), vol. 1, pp. 341–345.
[9] E. Dunscombe and F. C. Piper, “Optimal interleaving scheme for convolutional coding,” IEE Electron. Lett., vol. 25, no. 22, pp. 1517–1518, Oct. 1989.
[10] F. Daneshgaran and M. Mondin, “Design of interleavers for turbo codes: iterative interleaver growth algorithms of polynomial complexity,” IEEE Trans. Inf. Theory, vol. 45, no. 6, pp. 1845–1859, Sep. 1999.
[11] F. Daneshgaran and P. Mulassano, “Interleaver pruning for construction of variable-length turbo codes,” IEEE Trans. Inf. Theory, vol. 50, no. 3, pp. 455–467, Mar. 2004.
[12] A. Nimbalker, T. K. Blankenship, B. Classon, T. E. Fuja, and D. J. Costello, Jr., “Contention-free interleavers for high-throughput turbo decoding,” IEEE Trans. Commun., vol. 56, no. 8, pp. 1258–1267, Aug. 2008.
[13] K. Kusume and G. Bauch, “Simple construction of multiple interleavers: cyclically shifting a single interleaver,” IEEE Trans. Commun., vol. 56, no. 9, pp. 1394–1397, Sep. 2008.
[14] S. J. Johnson and S. R. Weller, “Practical interleavers for repeat-accumulate codes,” IEEE Trans. Commun., vol. 57, no. 5, pp. 1225–1228, May 2009.
[15] R. Tanner, “A recursive approach to low complexity codes,” IEEE Trans. Inf. Theory, vol. 27, pp. 533–547, Sep. 1981.
[16] R. Gallager, Low-Density Parity-Check Codes. MIT Press, 1963.
[17] E. Zehavi, “8-PSK trellis codes for a Rayleigh channel,” IEEE Trans. Commun., vol. 40, no. 5, pp. 873–884, May 1992.
[18] O. Y. Takeshita and D. J. Costello, Jr., “New deterministic interleaver designs for turbo codes,” IEEE Trans. Inf. Theory, vol. 46, no. 6, pp. 1988–2006, Sep. 2000.
[19] “IEEE standard for local and metropolitan area networks, part 20: air interface for mobile broadband wireless access systems supporting vehicular mobility,” IEEE, Piscataway, NJ, Tech. Rep. 802.20, 2008.
[20] J. Sun and O. Takeshita, “Interleavers for turbo codes using permutation polynomials over integer rings,” IEEE Trans. Inf. Theory, vol. 51, no. 1, pp. 101–119, Jan. 2005.
[21] E. S. E. . . v1.2.1, “Frame structure channel coding and modulation for a second generation digital terrestrial television broadcasting system (DVB-T2),” ETSI, Tech. Rep., 2011.
[22] D. Knuth, The Art of Computer Programming, 3rd edition. Addison-Wesley, 1998, vol. 2.
[23] A. Nimbalker, Y. Blankenship, B. Classon, and T. K. Blankenship, “ARP and QPP interleavers for LTE turbo coding,” in Proc. 2008 IEEE Wireless Commun. Netw. Conf., pp. 1032–1037.
[24] “Evolved universal terrestrial radio access (E-UTRA): multiplexing and channel coding,” 3rd Generation Partnership Project (3GPP), Tech. Rep. 3GPP TS 36.212, Sep. 2008.
[25] M. Eroz and A. R. Hammons, Jr., “On the design of prunable interleavers for turbo codes,” in Proc. 1999 IEEE Veh. Tech. Conf. – Spring, vol. 2, pp. 1669–1673.
[26] M. Ferrari, F. Scalise, and S. Bellini, “Prunable S-random interleavers,” in Proc. 2002 IEEE Conf. Commun., vol. 3, pp. 1711–1715.
[27] L. Dinoi and S. Benedetto, “Design of fast-prunable S-random interleavers,” IEEE Trans. Wireless Commun., vol. 4, no. 5, pp. 2540–2548, Sep. 2005.
[28] “IEEE standard for local and metropolitan area networks, part 16: air interface for broadband wireless access systems,” IEEE, Piscataway, NJ, Tech. Rep. 802.16, 2009.
[29] A. Parsons, “The symmetric group in data permutation, with applications to high-bandwidth pipelined FFT architectures,” IEEE Signal Process. Lett., vol. 16, no. 6, pp. 477–480, June 2009.
[30] M. M. Mansour, “Parallel lookahead algorithms for pruned interleavers,” IEEE Trans. Commun., vol. 57, no. 11, pp. 3188–3194, Nov. 2009.
[31] M. M. Mansour, “Pruned bit-reversal permutations: Mathematical characterization, fast algorithms and architectures (under review),” IEEE Trans. Signal Process., Oct. 2011.
[32] R. L. Rivest, “Permutation polynomials modulo $2^w$,” Finite Fields and their Applications, vol. 7, pp. 287–292, 2001.
[33] D. Knuth, “Notes on generalized Dedekind sums,” Acta Arithmetica, vol. 33, pp. 297–325, 1977.
[34] J. Ryu and O. Takeshita, “On quadratic inverses for quadratic permutation polynomials over integer rings,” IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 1254–1260, Mar. 2006.
[35] “IEEE standard for local and metropolitan area networks, part 11: wireless LAN medium access control (MAC) and physical layer (PHY) specifications: Enhancements for higher throughput,” IEEE, Piscataway, NJ, Tech. Rep. 802.11n, 2009.
[36] C. P. Schnorr and S. Vaudenay, “Parallel FFT-hashing,” in Proc. 1994 Int. Workshop on Fast Software Encryption, ser. LNCS, vol. 809. Springer-Verlag, Dec. 1994, pp. 149–156.
[37] C. Berrou and A. Glavieux, “Near optimum error correcting coding and decoding: turbo-codes,” IEEE Trans. Commun., vol. 44, no. 10, pp. 1261–1271, Oct. 1996.
[38] O. Takeshita, “On maximum contention-free interleavers and permutation polynomials over integer rings,” IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 1249–1253, Mar. 2006.

Mohammad M. Mansour received his B.E. degree with distinction in 1996 and his M.E.
degree in 1998, both in computer and communications engineering, from the American University of Beirut (AUB), Beirut, Lebanon. In August 2002, Mohammad received his M.S. degree in mathematics from the University of Illinois at Urbana-Champaign (UIUC), Urbana, Illinois, USA. Mohammad also received his Ph.D. in electrical engineering in May 2003 from UIUC. He is currently an Associate Professor of Electrical and Computer Engineering with the ECE department at AUB, Beirut, Lebanon. From December 2006 to August 2008, he was on research leave with QUALCOMM Flarion Technologies in Bridgewater, New Jersey, USA, where he worked on modem design and implementation for 3GPP-LTE, 3GPP-UMB, and peer-to-peer wireless networking PHY layer standards. From 1998 to 2003, he was a research assistant at the Coordinated Science Laboratory (CSL) at UIUC. During the summer of 2000, he worked at National Semiconductor Corp., San Francisco, CA, with the wireless research group. In 1997 he was a research assistant at the ECE department at AUB, and in 1996 he was a teaching assistant in the same department. His research interests are VLSI design and implementation for embedded signal processing and wireless communications systems, coding theory and its applications, digital signal processing systems, and general purpose computing systems. Prof. Mansour is a member of the Design and Implementation of Signal Processing Systems Technical Committee of the IEEE Signal Processing Society, and a Senior Member of the IEEE. He has been serving as an Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II since April 2008, Associate Editor for IEEE TRANSACTIONS ON VLSI SYSTEMS since January 2011, and Associate Editor for IEEE SIGNAL PROCESSING LETTERS since January 2012. He served as the Technical Co-Chair of the IEEE Workshop on Signal Processing Systems (SiPS 2011), and as a member of the technical program committee of various international conferences.
He received the Phi Kappa Phi Honor Society Award twice, in 2000 and 2001, and the Hewlett Foundation Fellowship Award in March 2006. He joined the faculty at AUB in October 2003.