Weight Distributions and Bounds for Turbo–Codes (∗)

Yuri V. Svirid

Chair for Communications, Technical University of Munich, Arcisstr. 21, D-80290 Munich, Germany
and Belarusian State University of Informatics and Radioelectronics, P. Brovka str. 6, 220027 Minsk, Belarus

Abstract

An optimal interleaving between the two component encoders of a turbo–code is proposed. An optimality criterion for any real constructable interleaver is given. For component codes with known weight distribution (WD), the WD of the turbo–code with optimal interleaving is calculated. Furthermore, a method for the approximate calculation of the WD of convolutional codes with terminated feed–forward and feed–back encoders is proposed. Using these WDs and the proposed interleaving, union upper bounds on the bit error rate of the turbo–code are given. It is shown that the often observed “break” in the performance curves of turbo–codes is a result of their “broken” WD. For component codes lying on the Gilbert–Varshamov bound, an upper bound on the ratio between the minimum distance of the turbo–code and that of the component code is determined. Lower bounds on the signal–to–noise ratio needed to achieve error free decoding of turbo–codes with random component codes are given. These bounds are compared with the Shannon limit and with analogous bounds for random codes.

∗ This work has already been reported in part in [1] and [2].

1 Introduction

1.1 Turbo–Code and Component Codes

Any codeword of the recently introduced turbo–codes [3] has the following structure:

F(I) = [I | IΛ1 | I′Λ2],   (1)

where F(·) is the encoding function, I is the k–tuple of information bits, Λ1 and Λ2 are the mapping matrices from the space of dimension k to the spaces of dimensions r and r′, respectively, and I′ is a version of I with interleaved (permuted) coordinates. An encoder corresponding to (1) is shown in Figure 1.

Figure 1: Basic structure of the turbo–encoder

The codes with generator matrices G1 = [Ik|Λ1] and G2 = [Ik|Λ2], where Ik is the k × k unit matrix, will be called the first and the second component code of the turbo–code¹, respectively. As component codes, both systematic block codes and convolutional codes with terminated encoders have been in use until now. The rate of the whole code in both cases is R = k/(k + 2r) (if not mentioned otherwise, we only consider the case G1 = G2 and consequently r′ = r), but for convolutional codes the redundancy part is r = (n0 − k0)(k/k0 + m), where it is assumed that each component code has rate Rc = k0/n0 and memory m. Obviously, the overall turbo–code is a block code regardless of whether the component codes are of block or convolutional type.
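As a quick numerical check of the rate relations above, a small sketch (the function name is illustrative, not from the paper):

```python
# Sketch: overall rate of a turbo-code built from two identical
# terminated convolutional component codes, as given above.

def turbo_rate(k, k0, n0, m):
    """R = k / (k + 2r) with r = (n0 - k0) * (k/k0 + m) redundancy bits
    per terminated rate-k0/n0, memory-m component encoder."""
    r = (n0 - k0) * (k // k0 + m)
    return k / (k + 2 * r)

# Example: rate-1/2 components (k0=1, n0=2), memory 4, k = 1000 info bits;
# R = 1000/(1000 + 2*1004), approaching 1/3 for large k.
print(turbo_rate(1000, 1, 2, 4))
```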

1.2 Linearity of Turbo–Codes

For the analysis we need to know whether turbo–codes are linear or not. The linearity condition for binary codes, F(I1 ⊕ I2) = F(I1) ⊕ F(I2), where ⊕ denotes componentwise modulo–two addition of the corresponding vectors, has to hold for any information vectors I1 and I2. In our case it immediately reduces to (I1 ⊕ I2)′ = I1′ ⊕ I2′, which is true for any permutation. Thus, the all–zero codeword belongs to the code, and only the WDs instead of distance profiles have to be found.

¹ We note that from a decoding point of view it is better to define a component code as the smallest code to be decoded. For the consideration of the whole turbo–code the definition above is preferable. For convolutional codes both definitions are equivalent.
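The key step in the linearity argument, namely that any permutation commutes with componentwise XOR, can be checked numerically; a toy sketch (illustrative only, not the paper's method):

```python
# Sketch: verify (I1 ^ I2)' = I1' ^ I2' for a random permutation,
# the property used in the linearity argument above.
import random

def permute(bits, perm):
    """Apply a coordinate permutation to a bit list."""
    return [bits[p] for p in perm]

k = 16
perm = list(range(k))
random.shuffle(perm)
for _ in range(100):
    I1 = [random.randint(0, 1) for _ in range(k)]
    I2 = [random.randint(0, 1) for _ in range(k)]
    xor = [a ^ b for a, b in zip(I1, I2)]
    assert permute(xor, perm) == [a ^ b for a, b in
                                  zip(permute(I1, perm), permute(I2, perm))]
print("permutation commutes with XOR")
```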


1.3 Union Bounds

The classical additive (union) upper bound on the bit error rate (BER) for any linear binary (k + r, k) code is [4]:

P_BER ≤ (1/k) Σ_{w=1}^{k+r} δ_w A(w) P(w),   (2)

where A(w) is the number of codewords with Hamming weight w, δ_w is the average number of nonzero information symbols associated with a codeword of weight w, and P(w) is the probability of an error under maximum likelihood decoding for a code consisting of two codewords whose weights are zero and w. For a binary symmetric channel:

P(w) = Σ_{i=0}^{(w−1)/2} C(w, w−i) p^{w−i} (1 − p)^i   for odd w,

P(w) = (1/2) C(w, w/2) p^{w/2} (1 − p)^{w/2} + Σ_{i=0}^{w/2−1} C(w, w−i) p^{w−i} (1 − p)^i   for even w,   (3)

where p is the probability of a transmission error and C(·, ·) denotes the binomial coefficient. For an unquantized Gaussian channel and BPSK modulation this probability is:

P(w) = (1/2) erfc(√(w Es/N0)),   (4)

where Es/N0 is the signal–to–noise ratio (SNR) of the channel, and erfc(·) is the complementary error function. The probabilities (3) and (4) depend only on the weight w (neither on the error pattern nor on the length of the code), and they are strictly decreasing functions of w. For systematic codes, bound (2) can be rewritten as follows:

P_BER ≤ (1/k) Σ_{i=1}^{k} i Σ_{j=0}^{r} A(i, j) P(i + j),   (5)

where A(i, j) is the number of codewords with Hamming weight i in the information bits and weight j in the redundancy bits. Similarly, we can write the union bound for a turbo–code with the structure (1) as:

P_BER ≤ (1/k) Σ_{i=1}^{k} i Σ_{j=0}^{r} Σ_{j′=0}^{r′} A(i, j, j′) P(i + j + j′),   (6)

where A(i, j, j′) is the number of codewords with Hamming weight i in the information bits, weight j in the first redundancy part, and weight j′ in the second redundancy part. Bound (6) can easily be extended to turbo–codes of three and more dimensions.
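Bounds of the form (5) are straightforward to evaluate once a WD A(i, j) is available; a minimal sketch for the Gaussian channel (4), with a toy WD and illustrative function names:

```python
# Sketch: evaluating the systematic-code union bound (5) on an AWGN
# channel, using the pairwise error probability P(w) from (4).
import math

def P_awgn(w, es_n0):
    """Equation (4): pairwise error probability for BPSK on the
    unquantized Gaussian channel, Es/N0 given as a linear ratio."""
    return 0.5 * math.erfc(math.sqrt(w * es_n0))

def union_bound(A, k, es_n0):
    """Bound (5): P_BER <= (1/k) * sum_i i * sum_j A(i,j) * P(i+j).
    A is a dict mapping (i, j) -> number of codewords."""
    return sum(i * a * P_awgn(i + j, es_n0) for (i, j), a in A.items()) / k

# Toy (3,1) repetition code: its single nonzero codeword has i=1, j=2,
# so the bound reduces to P(3).
A = {(1, 2): 1}
print(union_bound(A, 1, 0.5))  # equals P(3) at Es/N0 = 0.5
```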

2 Optimal Interleaving for Component Codes with Known WDs

2.1 General Principles

We first consider one of the component codes. Subdivide all 2^k − 1 nonzero codewords into k groups so that the ith (i = 1, 2, . . . , k) group consists of the C(k, i) codewords of weight i in the information part. Note that if the information vector I belongs to the ith group, then the permuted vector I′ will be in this group too. Due to the quick decrease of the function P(i + j + j′) (see (3), (4) and (6)) with increasing weight i + j + j′, the overall weight should be as large as possible. Thus, the aim of interleaving is to produce (by manipulating the weights of the second redundancy part) the codewords (1) with overall weights as


large as possible. That means that after interleaving, within each group the first redundancy part with small weight should be associated with a second redundancy part with large weight, and vice versa. Let the weight distribution (WD) of a component code be known in the form A(i, j). Now, within each group order the codewords by non–decreasing weights of the redundancy part, so that for any l: j(i, l + 1) ≥ j(i, l), where j(i, l), l = 1, 2, . . . , C(k, i), is the weight of the redundancy part of the lth codeword in the ordered ith group. Note that for any i and l the numbers j(i, l) are determined by A(i, j). We shall call the association of the codeword with parity–weight j(i, l), after interleaving, with the codeword of parity–weight j(i, l̃), l̃ = 1, 2, . . . , C(k, i), and vice versa, a pair connection. The aim of the interleaving is fulfilled for the ith group if l̃ = C(k, i) − l + 1 (see Figure 2).

Figure 2: Optimal interleaving for the ith group

The lth codeword of the turbo–code in this group then has weight

W(i, l) = i + j(i, l) + j(i, C(k, i) − l + 1).   (7)

From (7) we immediately obtain the WD of the turbo–code in the form A(i, j), which also yields the number A(w) of codewords with weight w: A(w) = Σ_{i+j=w} A(i, j). An interleaving which leads to the same WD of the turbo–code as obtained from (7) will be called a fully optimal interleaving (f.o.i.). It is easy to see that with pair connections alone (if, e.g., ∀i, l: j(i, l + 1) > j(i, l)) the f.o.i. does not exist. Indeed, in the ith group each pair of codewords associated after interleaving determines i + 1 up to 2i positions in the permutation, depending on whether two up to i positions differ in the information parts of these codewords (we only have to consider the case i ≤ k/2; for i > k/2 the situation is the same if each “zero” is replaced by “one” and vice versa). But there exist altogether 2^{k−1} pairs of codewords, while only k positions have to be determined for the permutation. Already in the first group (i = 1) the pair connections do not leave any degree of freedom for the other groups. If there exists some l such that for some α > 0 all parity–weights from the set S1 = {j(i, l), j(i, l + 1), . . . , j(i, l + α)} are equal, then the codewords with parity–weights from the set S2 = {j(i, C(k, i) − l + 1), j(i, C(k, i) − l), . . . , j(i, C(k, i) − l − α + 1)} can be associated (not necessarily pair connected) with any codeword with parity–weights from the set S1 and the interleaving remains fully optimal. The problem of the existence of f.o.i. in this case is not solved yet and seems to be a very interesting combinatorial problem. We shall not discuss it here, because for our purpose it is enough to assume the existence of f.o.i. in order to obtain “optimistic” WDs and upper bounds. But, as we shall see especially in Section 5, this assumption is too optimistic and can lead to codes which cannot exist.
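The pairing rule (7) can be sketched in code: given a component WD A(i, j), expand each group into its sorted list of parity weights and pair the lth with the (C(k, i) − l + 1)th. The function name and the toy WD are illustrative:

```python
# Sketch: applying the f.o.i. rule (7) to a component WD A(i, j)
# to obtain the weight distribution A(w) of the turbo-code.
from collections import Counter

def foi_weight_distribution(A, k):
    """A: dict (i, j) -> count for the component code. Returns a Counter
    mapping W(i, l) = i + j(i, l) + j(i, C(k,i)-l+1) -> codeword count."""
    turbo = Counter()
    for i in range(1, k + 1):
        # sorted parity weights j(i, l), repeated according to their counts
        js = sorted(j for (ii, j), c in A.items() if ii == i
                    for _ in range(c))
        for l, j in enumerate(js):
            # pair the l-th smallest with the l-th largest parity weight
            turbo[i + j + js[len(js) - 1 - l]] += 1
    return turbo

# Toy example, k = 4: group i = 1 has parity weights [1, 1, 3, 3];
# the f.o.i. pairs 1 with 3, so all four codewords get weight 1+1+3 = 5.
print(foi_weight_distribution({(1, 1): 2, (1, 3): 2}, 4))
```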


2.2 Optimality Criteria

Now, we shall give optimality criteria for an interleaving which is, or can be, constructed in reality, for the general case r ≠ r′. Let

W(i, l) = i + j(i, l) + j′(i, l̃),   (8)

where j′(i, l̃) is the weight of the second parity–part associated with the lth codeword in the ith group (the number l̃ is determined by l and the interleaving rule). Again, we can obtain from (8) the WD A(i, j) of the turbo–code. Then, viewing W(i, l) for each i as a random variable of l, the criterion for the optimal interleaving can be formulated as the minimization of its variance:

σ_l²{W(i, l)} → min,   (9)

where

σ_l²{W(i, l)} = (1/C(k, i)) Σ_{l=1}^{C(k,i)} W²(i, l) − [ (1/C(k, i)) Σ_{l=1}^{C(k,i)} W(i, l) ]²   (10)

= (1/C(k, i)) Σ_{j=0}^{r+r′} j² A(i, j) − [ (1/C(k, i)) Σ_{j=0}^{r+r′} j A(i, j) ]² = V(i).   (11)

We prove in Appendix A that for f.o.i. the variance (11) reaches its minimal possible value

𝒱(i) = σ_l²{W(i, l)} evaluated at l̃ = C(k, i) − l + 1,   (12)

and the minimum weight (8) in the ith group is maximal. Thus, the turbo–code with f.o.i. has the maximal possible minimum distance. The smaller the ith variance (9) of some real interleaver is, the nearer this interleaver is to the f.o.i. in the ith group. The sum

Q = (1/k) Σ_{i=1}^{k} (V(i) − 𝒱(i))^β   (13)

can be taken as an empirical optimality criterion for the whole interleaver. It represents a measure of the difference between the sequences V(i) and 𝒱(i), i = 1, 2, . . . , k (β = 1, 2, . . .). In the sequel we set β = 1. Again, the smaller the Q (13) achieved by a real constructable interleaver is, the nearer this interleaver is to the f.o.i. A more significant criterion consists in obtaining bounds for each specific channel and a given error rate. Due to (8), counting all nonzero codewords, the bound (6) can be rewritten as:

P_BER ≤ (1/k) Σ_{i=1}^{k} i Σ_{l=1}^{C(k,i)} P(W(i, l)) = (1/k) Σ_{i=1}^{k} i Σ_{j=0}^{r+r′} A(i, j) P(i + j).   (14)

Examples of the application of the proposed criteria are given in Section 3. Just as (6), these criteria can easily be extended to higher dimensions.
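The variance criterion (11)–(13) can be sketched as follows, with the per–group weight lists supplied directly (a toy setup with illustrative names, not tied to a particular code):

```python
# Sketch of criterion (13): per-group variance V(i) of the turbo
# codeword weights W(i, l), compared against the f.o.i. variance from (12).

def variance(weights):
    """Variance (10) of the codeword weights W(i, l) within one group."""
    n = len(weights)
    mean = sum(weights) / n
    return sum(w * w for w in weights) / n - mean * mean

def Q_criterion(groups, opt_groups):
    """Q = (1/k) * sum_i (V(i) - V_opt(i)) with beta = 1, as in (13).
    groups / opt_groups: per-group weight lists for the tested and the
    fully optimal interleaver."""
    k = len(groups)
    return sum(variance(g) - variance(go)
               for g, go in zip(groups, opt_groups)) / k

# Two toy groups: the f.o.i. pairing equalizes the weights (variance 0),
# while a poor interleaver leaves them spread out.
poor = [[4, 6, 8, 10], [7, 7, 9, 9]]
opt = [[7, 7, 7, 7], [8, 8, 8, 8]]
print(Q_criterion(poor, opt))  # (5 + 1)/2 = 3.0
```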

3 Examples with Block Codes as Component Codes

3.1 Hamming codes

Consider the (7,4) Hamming code with generator matrix G_H = [I4 | Λ_H], where

        | 0 1 1 |
Λ_H =   | 1 0 1 |
        | 1 1 0 |
        | 1 1 1 | .
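The WD A(i, j) of this code, needed for the interleaver analysis below, can be obtained by direct enumeration of all 16 codewords; a sketch using the Λ_H given above:

```python
# Sketch: enumerate the (7,4) Hamming code G_H = [I4 | L_H] and tabulate
# its WD in the form A(i, j) (info weight i, parity weight j).
from itertools import product
from collections import Counter

L_H = [(0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]

A = Counter()
for info in product((0, 1), repeat=4):
    # parity = info * L_H over GF(2)
    parity = [sum(info[m] * L_H[m][c] for m in range(4)) % 2
              for c in range(3)]
    A[(sum(info), sum(parity))] += 1
print(sorted(A.items()))
```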

Using this code for turbo–codes usually means that the information vector I (with coordinates numbered from 1 to 16) is written row–wise into a 4 × 4 matrix². Each row is encoded with a (7,4) code, so that the first component code (actually a (28,16) code now!) has the generator matrix G_TH = [I16 | Λ_TH], where

Λ_TH = diag(Λ_H, Λ_H, Λ_H, Λ_H)   (15)

(blank positions denote zeros). This code will be called the (28,16) Hamming code. Afterwards, the coordinates of the information vector I are permuted and the new vector I′ is encoded according to the same procedure in order to obtain the second parity–part. Thus, we get a (40,16) code, which will be called the (40,16) Hamming turbo–code. The permutation rule is the very important part of the encoding procedure.

Figure 3: Optimality criterion for distinct interleavers for the Hamming (40,16) turbo–code

In Figure 3 the variances V(i) (11) are plotted in terms of the group number i for three different interleavers:

“F” — f.o.i. (variances (12));
“Int1” — a so–called block interleaver: (1)(2 5)(3 9)(4 13)(6)(7 10)(8 14)(11)(12 15)(16);
“Int2” — the trivial interleaver (1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13)(14)(15)(16),

where a convenient notation for permutations is used (e.g. (2 5) denotes that the second information symbol is located after permutation in position five and the fifth in position two; (1) denotes that the first symbol remains in position one). Block interleaving means that the 4 × 4 information bits are encoded in the “vertical” and in the “horizontal” direction; trivial interleaving corresponds to transmitting the parity–part of one component code twice. Applying criterion (13) to the interleavers above yields QF = 0, QInt1 = 4.059, and QInt2 = 8.076, respectively. No value V(i) below the f.o.i. value 𝒱(i) from (12) exists for any interleaver. The influence of the interleaver quality on the performance of the considered codes is shown in Figure 4, where, together with the union bounds for the (40,16) Hamming turbo–code with interleavers “F”, “Int1”, and “Int2”, the simulation results (“S”) for the block interleaver “Int1” (from [5]) and the union bound for the (28,16) Hamming code (“H”) itself are plotted; Eb/N0 denotes the SNR, where Eb is the energy per information bit.

² Other variants (e.g. the information vector of length kγ written into a k × γ matrix) are of course possible. Our aim in this Section is only to give some examples which are easy to analyze and nevertheless allow checking the principal points. An example with γ = 1 is given in Section 3.2.

We see that


Figure 4: Union bounds and simulation results for Hamming codes in turbo scheme

• depending on the interleaver quality, a coding gain of approximately 2 dB (at least for the bounds) can be reached at a bit error rate of 10−3 by the transition from the trivial interleaver “Int2” to the f.o.i. “F”;

• the transition to the realizable block interleaver “Int1” yields a coding gain of approximately 1.5 dB under the same conditions;

• the performance of the turbo–code with the extremely poor trivial interleaver is even worse than the performance of the (28,16) Hamming code alone!

• due to the looseness of the union bound at low SNR, the simulation results and the bound differ a lot in this range;

• the curves marked “F” and “Int1” have a slight “break” in the range of 2–3 dB.

The explanation of the last observation can be found in the weight distribution of the turbo–code, which is plotted in Figure 5 in the form A(w) together with the WD of the component code. If we look closely at the area of small weights w (Figure 6), we see that the minimum distance increases from three to five by the transition to the turbo–code, while the number of codewords at minimum distance decreases from 28 to 8, and the number of the most probable codewords increases from A_H(14) = 15558 to A_TH(20) = 22318 for the component code and the turbo–code, respectively. For a Gaussian channel the curves corresponding to the terms (δ5/16) A_TH(5) P(5) and (δ20/16) A_TH(20) P(20) in (2) differ much at low and at large SNR and intersect at approximately 2 dB, which results in a “break” of the whole bound in this area. The fact that for some codes all components of the spectrum should be taken into account was first clearly stated by Battail in [6] and previous publications in his “criticism of the conventional minimum distance criterion” for the search for good codes. It seems to be a general property of combined codes like turbo–codes with weak component codes or Battail's iterated product of parity–check codes.

3.2 The (24,12) Golay code

We now consider the very powerful extended (24,12) Golay code as a component code. The information part of 12 bits (numbered from 1 to 12) is interleaved in order to obtain the second parity–part, so that we get a (36,12) code, which will be called the (36,12) Golay turbo–code. Figure 7 shows the union bounds for this code with f.o.i. (“F”), with the interleaving “Int3”: (1 12)(2 11)(3 10)(4 9)(5 8)(6 7), and with trivial interleaving (“Int4”), as well as the union bound for the (24,12) Golay code itself (“G”).



Figure 5: Weight distributions of the (28,16) Hamming code (“H”) and the (40,16) Hamming turbo–code (“TH”) with fully optimal interleaving


Figure 6: Weight distributions of the (28,16) Hamming code (“H”) and the (40,16) Hamming turbo–code (“TH”) with fully optimal interleaving (small weights)


Generally, we can make the same observations as for the Hamming code (the minimum distance increases, e.g. from 8 to 12), but two points differ in principle. At low SNR the (36,12) Golay turbo–code is essentially worse (at least in the bound) than the (40,16) Hamming turbo–code, and no “break” is seen in the curves. Again, the explanations are found in the weight distributions (Figure 8). There are now enough codewords at minimum distance, so that the corresponding term in bound (2) does not differ much from the other terms of the sum. Besides, it is known that the performance of codes with small minimum distance (5 for the Hamming turbo–code) is better than the performance of codes with large minimum distance (12 for the Golay turbo–code) at low SNR, and vice versa at large SNR. Supported by this fact and the observations above, we do not recommend very powerful codes as component codes of turbo–codes if they result in many codewords of the whole code at minimum distance.

4 Convolutional Codes as Component Codes

4.1 Assumptions

In this Section we first give a method for the approximate calculation of the weight distribution of systematic convolutional codes with terminated feed–forward as well as feed–back encoders. We assume here that k is large and neglect termination effects. We also assume that each component code has rate Rc = 1/2 with an encoder consisting of only one shift register. This restriction is actually only relevant for feed–back encoders; the extension to feed–forward encoders of codes with Rc ≠ 1/2 is straightforward. To obtain a high–rate turbo–code, puncturing of the parity bits produced by each component encoder can be used (for systematic codes only the parity bits have to be punctured). Let Rp = (r − rp)/r be the puncturing rate, where rp is the number of punctured bits within a parity–part of length r. Furthermore, we assume the shortest puncturing period, which e.g. for Rp = 1/2 means that every second bit in the parity–part is punctured, so that the resulting rate of each component code is Rc = 1/(1 + Rp). Applying the f.o.i. rule to the calculated WDs, we give the corresponding WDs of the turbo–code and derive the union upper bounds on its BER.

4.2 Component Codes with Feed–Forward Encoders

Each bit in the parity–part of a convolutional code with a feed–forward encoder is the modulo–two sum of some fixed number J of statistically independent information bits. The probability that this bit is “one”, given in [7], is:

ρ_i = (1 − (1 − 2i/k)^J)/2,   (16)

where the probability that each information bit is “one” is assumed to be i/k (this is true for large k according to the law of large numbers [8]). Hence, the probability of j “ones” in the parity–part of length r can be written in two ways: first, as the number of codewords A(i, j) divided by the number of k–tuples with i “ones”, and second, as Bernoulli trials over the parity length r:

A(i, j)/C(k, i) = C(r, j) ρ_i^j (1 − ρ_i)^{r−j}.   (17)

According to the DeMoivre–Laplace theorem [8], the right–hand side of (17) can be approximated for large r by a Gaussian distribution:

A(i, j)/C(k, i) ≈ (1/(σ_i √(2π))) exp(−(j − µ_i)²/(2σ_i²)),   (18)

with mean and variance, respectively:

µ_i = rρ_i,   σ_i² = rρ_i(1 − ρ_i).   (19)
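Equations (16)–(19) are easy to evaluate numerically; a sketch with illustrative parameters:

```python
# Sketch of (16)-(19): the probability rho_i that a parity bit is "one"
# given information weight i, and the Gaussian approximation (18) of
# A(i, j) / C(k, i). The parameters k, r, J below are illustrative.
import math

def rho(i, k, J):
    """Equation (16)."""
    return (1 - (1 - 2 * i / k) ** J) / 2

def gauss_fraction(j, i, k, r, J):
    """Right-hand side of (18): approximate A(i, j) / C(k, i)."""
    p = rho(i, k, J)
    mu, var = r * p, r * p * (1 - p)  # equation (19)
    return math.exp(-(j - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

k, r, J = 400, 400, 4
print(rho(k // 2, k, J))            # exactly 0.5 at i = k/2, as used in (21)
print(gauss_fraction(200, k // 2, k, r, J))
```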


Figure 7: Union bounds for Golay codes in turbo scheme


Figure 8: Weight distributions of the (24,12) Golay code (“G”) and the (36,12) Golay turbo–code (“TG”) with fully optimal interleaving


Due to the symmetry of the Gaussian distribution (18) around its mean, one sees that after applying the f.o.i. rule (7) all codewords of the turbo–code within the ith group have weight W(i, l) ≈ i + 2µ_i, while the total number of codewords in this group is C(k, i). Thus, from (14) we have:

P_BER ≤ (1/k) Σ_{i=1}^{k} i C(k, i) P(i + 2µ_i).   (20)

Let us discuss the performance for small i and for i around k/2. For a fixed rate R and lim_{k→∞} i/k = 0, from (19) and (16) we have µ_i = rJi/k and σ_i² = r(iJ/k)(1 − iJ/k). For i = k/2: µ_i = r/2, σ_i² = r/4, and the WD (17) is:

A(i, j)/C(k, i) evaluated at 2i = k:  C(r, j)/2^r.   (21)

Also for i lying in a large area around the point 2i = k the WD (18) remains approximately (21). Thus,

W(i, l) ≈ i + 2rJi/k for small i,   W(i, l) ≈ i + r for i near k/2,   (22)

where the value 2rJi/k (J ≈ 3–7) is usually much smaller than r. According to (22), the minimum distance for k = r is dmin = 1 + 2J, which remains constant as k increases. Hence, we again have the situation observed for the (40,16) Hamming turbo–code: few codewords at minimum distance and many codewords in the central groups, which results in a noticeable “break” in the performance curves of the whole code. An example of the union bound (20) for a turbo–code with R = 1/2 (component codes with J = 4, punctured with Rp = 1/2) and f.o.i. of various sizes is given in Figure 9. The size of the


interleaver has little influence on the bound (and, very likely, on the real performance, too) for f.o.i. sizes close to 400. Real interleavers of course need larger sizes, but they should not exceed 1000 in any case if the interleaver is selected cleverly (see Section 5 for an explanation of the chosen interleaver sizes).

Figure 9: Union bound for turbo–code with feed–forward component encoders and f.o.i. of sizes 10, 20, . . . , 50, 100, 400

4.3 Component Codes with Feed–Back Encoders

These encoders were initially used in turbo schemes [3]. Each bit in the parity–part of a convolutional code with a feed–back encoder is the modulo–two sum of some time–varying number J(t), t = 1, 2, . . . , k, of statistically independent information bits. The probability that this bit is “one” is then time–varying too:

ρ_{i,t} = (1 − (1 − 2i/k)^{J(t)})/2.   (23)

If we used the same method as for feed–forward encoders, the probability of j “ones” in the parity–part of length r = k would be calculated with the following formula:

A(i, j)/C(k, i) = Σ_{Υ ⊆ N_k, card Υ = j} Π_{t∈Υ} ρ_{i,t} Π_{t′∈N_k\Υ} (1 − ρ_{i,t′}),   (24)

which generalizes in a simple way the Bernoulli trials (17) to the time–varying probability ρ_{i,t} (here N_k = {1, 2, . . . , k} and card Υ denotes the cardinality of the set Υ). However, the complexity of calculations with (24) increases exponentially with k (we would have to generate all 2^k subsets of N_k). For a possible simplification of the calculation we first analyze the function J(t) in (23) for the encoder in Figure 10. A code with this encoder was used for both component codes in [3].

Figure 10: Feed–back convolutional encoder

In Figure 10 let the input sequence be e_t, the sequence after the left modulo–two adder be y_t, and the output sequence be z_t, where t = 1, 2, . . . , k. Then for the sequence y_t the following recurrence holds:

y_1 = e_1,  y_2 = e_2 + e_1,  y_3 = e_3 + e_2,  y_4 = e_4 + e_3,  y_t = e_t + y_{t−1} + y_{t−2} + y_{t−3} + y_{t−4},   (25)

and the output sequence z_t is:

z_1 = y_1,  z_2 = y_2,  z_3 = y_3,  z_4 = y_4,  z_t = y_t + y_{t−4},   (26)

where all additions are modulo–two. If we add y_t and y_{t−1} obtained from (25), we get:

y_t = e_t + e_{t−1} + y_{t−5}.   (27)

Because the term y_{t−5} contains neither e_t nor e_{t−1}, we conclude from (27) that after every five time units (from y_{t−5} to y_t) the number of input bits in the sequence y_t increases by two (e_t and e_{t−1}). The same holds for z_t (26), if we note that (z_t + e_t) = e_{t−1} + e_{t−4} + (z_{t−5} + e_{t−5}). Thus, the function J(t) for the encoder in Figure 10 can be approximated as J(t) = 2t/5 + 3/5, where 3/5 is added because z_1 = e_1 and consequently J(1) = 1. Let for some feed–back encoder J(t) = η(t − 1) + 1 hold. For the example above η = 2/5, and we conjecture that the function J(t) is linear for any encoder. If puncturing of the parity bits is used, then

J(t) = (η/Rp)(t − 1) + 1.   (28)

Regarding (28), consider the time average of (23):

ρ̂_i = (1/k) Σ_{t=1}^{k} ρ_{i,t} = (1/2) [1 − (1/k) · (1 − 2i/k) · (1 − (1 − 2i/k)^{kη/Rp}) / (1 − (1 − 2i/k)^{η/Rp})],   (29)

or for large k (k → ∞):

ρ̂_i = (1/2) [1 − (1 − exp(−2iη/Rp)) / (2iη/Rp)].   (30)

Now, we can propose as an approximation of (24) the use of (17) with ρ̂_i instead of ρ_i. All other steps for obtaining the bound

P_BER ≤ (1/k) Σ_{i=1}^{k} i C(k, i) P(i + 2µ̂_i),   (31)

where µ̂_i = rρ̂_i, are similar to the bound for feed–forward encoders. According to (30) and (31), the minimum distance of the turbo–code for large k is dmin = 1 + 2rρ̂_1, which for the example above is approximately equal to 1 + r/2. Therefore, the relationship between rate R, block length n = k + 2r, and dmin of the turbo–code is given by R = 1 − 4dmin/n. We should, however, note that for small i (i < k/10 for the example above) the approximation of ρ_{i,t} by ρ̂_i is unfortunately not tight, and the minimum distance of codes with feed–back encoders is thus determined by the proposed method inexactly (it is probably too large). Moreover, because of the recursion the minimum distance cannot fall into the first group. For increasing i the term i + 2µ̂_i in bound (31) becomes more and more exact, because ρ_{i,t} (23) tends to 1/2 for any t, and thus equation (24) is well approximated by (17) with the time average (29). An example of the union bound (31) for the code of Figure 10 punctured with rate Rp = 1/2 (Rc = 2/3, R = 1/2) and f.o.i. of various sizes is given in Figure 11.
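The large–k average (30) and the resulting minimum–distance estimate can be sketched as follows (η = 2/5 and Rp = 1/2 as in the example above; the function name is illustrative):

```python
# Sketch of (30): the large-k time-average parity-bit probability for a
# feed-back encoder with linear J(t) = (eta/Rp)(t-1) + 1.
import math

def rho_hat(i, eta, rp):
    """Equation (30), valid for large k."""
    x = 2 * i * eta / rp
    return 0.5 * (1 - (1 - math.exp(-x)) / x)

eta, rp = 2 / 5, 1 / 2       # encoder of Figure 10, punctured with Rp = 1/2
r1 = rho_hat(1, eta, rp)
print(r1)  # ≈ 0.25, so dmin = 1 + 2*r*rho_hat_1 ≈ 1 + r/2 as in the text
```

Note also that rho_hat tends to 1/2 as i (or η/Rp) grows, which is the random-code limit discussed below.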


Figure 11: Union bound for turbo–code with feed–back component encoders and f.o.i. of sizes 50, 100, 200, 300, 400

The simulation from [3], after 18 iterations and with an interleaver of size 256 × 256, lies for the example considered here approximately 1 dB further to the left than our bound in the BER range from 10−1 to 10−5. The changes in the Eb/N0 needed to achieve these BERs in bound (31) with interleavers of size greater than 400 are very small. Because the minimum distance increases with the block length according to (29), the “break” in the bound vanishes. In the next Section we shall consider idealized codes for which, as we shall see, 2µ_i = r for any i, and the corresponding union bound has no “break”. These codes are useful for estimating the potential capabilities of turbo–codes as far as minimum distance or error free decoding is concerned.

According to (30), if η/Rp → ∞, then ρ̂_i = 1/2 and 2µ̂_i = r, which corresponds to random codes. Hence, for a fixed encoder (fixed η) the codes with smaller puncturing rate (more bits punctured) are better. Without puncturing, or for a fixed puncturing, the encoder with greater η is better. The value of η can thus be used as a quality criterion for a feed–back encoder.

5 Turbo–Codes with Random Component Codes

5.1 Weight Distributions, BER and Size of the Interleaver

A random code (or a code with average spectrum) is a useful idealization for calculations in coding theory. It can be formally defined as a code with a binary generator matrix whose entries are chosen at random [9]³. Random codes were suggested in [6] as optimal codes for soft decision decoding. In the following we analyze turbo–codes with such component codes. The WD A(w) of a random (k + r, k) code can be obtained by equating the probability of the occurrence of a binary (k + r)–tuple of weight w and the probability of the occurrence of a codeword of that weight:

A(w) = C(k + r, w)/2^r.   (32)

However, to apply equation (7) the WD is required in the form A(i, j). From a similar equation for each group we get:

A(i, j) = C(k, i) C(r, j)/2^r.   (33)

Because of the Vandermonde convolution

Σ_{i+j=w} C(k, i) C(r, j) = C(k + r, w),

the code with WD (33) is a random code, too. Some known classes of codes (see e.g. [10] for binary primitive BCH codes) have approximately the WD (32). Note that the WD (21) of convolutional codes with feed–forward encoders is close to (33) for k/2 − ε < i < k/2 + ε, where k/2 > ε > 0, and that the WD of convolutional codes with feed–back encoders, obtained from (17) and (29), is close to (33) for k/2 − ϵ < i < k, where k/2 > ϵ ≫ ε > 0. Note also that even though the WD (32) can be a good approximation to some real WDs, no code can have this WD exactly, for two reasons. First, the value of A(w) in (32) is not an integer, and second, applying the MacWilliams identity [9] to a code with this WD gives a dual code consisting of only the all–zero codeword, which obviously cannot be true. We shall call turbo–codes whose component codes have the WD (33) random turbo–codes. Combining (33) and (7), we see that for each group i (due to the symmetry C(r, j) = C(r, r − j)) each parity–weight j is associated after f.o.i. with a second parity–weight r − j. Thus,

∀i, l : W(i, l) = i + r,   (34)

and the variance σ_l²{W(i, l)} (10) is equal to zero, which is the minimum of any variance. Then A(i, r) = C(k, i), A(i, j ≠ r) = 0, and the WD of the random turbo–code is:

A(0) = 1,  A(w) = C(k, w − r) for r < w ≤ k + r,  A(w) = 0 otherwise.   (35)

From (14) and (34) we can get the union upper bound

P_BER ≤ (1/k) Σ_{i=1}^{k} i C(k, i) P(i + r)   (36)

for random turbo–codes with f.o.i. (9) and maximum likelihood decoding. No other union bound for turbo–codes can be better than (36). Figure 12 depicts the influence of the interleaver size k on the BER obtained from (36). One sees

³ Note that this type of random codes is different from the random codes used in the proof of the Shannon capacity theorem.


that the needed SNR changes very little in the range k = 300 . . . 1000 for any BER. Moreover, we shall show in Section 6 that for k → ∞ there exists a lower bound on the Eb/N0 needed to achieve a BER of zero. We shall see that this bound for the considered rate of the turbo–code is 3 ln((1 + √5)/2) (see Figure 12). Note also that, because all results stem from an upper bound on the BER, the actually needed Eb/N0 (for any BER > 0 and finite k) is smaller.

Figure 12: Eb/N0 needed to achieve a BER of 10−3, 10−4, and 10−5 for random turbo–codes with f.o.i. (Rc = 1/2, R = 1/3)
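Bound (36) itself can be sketched directly (Rc = 1/2 components, so r = k and R = 1/3 as in Figure 12; function names are illustrative):

```python
# Sketch of bound (36) for random turbo-codes with f.o.i.: every nonzero
# codeword in group i has weight i + r, per (34)-(35).
import math

def P_awgn(w, eb_n0_db, R):
    """Pairwise error probability (4); Es/N0 = R * Eb/N0."""
    es_n0 = R * 10 ** (eb_n0_db / 10)
    return 0.5 * math.erfc(math.sqrt(w * es_n0))

def random_turbo_bound(k, eb_n0_db):
    """Equation (36) for rate-1/2 component codes (r = k, R = 1/3)."""
    r = k
    return sum(i * math.comb(k, i) * P_awgn(i + r, eb_n0_db, 1 / 3)
               for i in range(1, k + 1)) / k

print(random_turbo_bound(100, 2.0))  # BER bound at Eb/N0 = 2 dB, k = 100
```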

5.2 Asymptotic Behavior of the Minimum Distance

Let us look at the asymptotic properties of codes with WD (35). We see that the minimum distance of these codes is

dmin = r + 1,   (37)

while the whole block length is n = k + 2r. Therefore, the rate of the code can be represented as R = 1 − 2(dmin − 1)/n, or asymptotically (as n → ∞):

R = 1 − 2 dmin/n,   (38)

which corresponds exactly to the asymptotic Plotkin upper bound [9]! This straight line in the coordinates (dmin/n, R) is plotted in Figure 13 together with the asymptotic Gilbert–Varshamov lower bound and the Elias upper bound for block codes [9]. It is immediately clear that codes satisfying relation (38) cannot exist, because they would fall on the wrong side of the Elias bound. We attribute this fact, first, to the nonexistence of the interleaving assumed in (34), although not all required connections have to be pair connections here, and second, to the nonexistence of codes with WD (32). Besides, even though the WDs of real constructable codes can be close to (32) or (33), and thus to (35) if these codes are used in the turbo scheme, their minimum distances are essentially smaller than (37) (see [9] for BCH codes and Section 4 for convolutional codes with feed–forward and feed–back encoders). For good component codes (lying on the Gilbert–Varshamov bound) we can derive an upper bound on the increase of the minimum distance (dmin) of the turbo–code compared to the minimum distance of both component codes (dminc).

It is well known (see e.g. [11]) that the minimum distance of random linear codes (considered as component codes with rate Rc) lies on the Gilbert–Varshamov bound; more precisely, as nc → ∞, where nc = k + r:

$$\frac{d_{min_c}}{n_c} = H^{-1}(1 - R_c), \tag{39}$$

where H⁻¹(·) is the inverse of the binary entropy function H(x) = −x log₂(x) − (1−x) log₂(1−x), defined for 0 ≤ x ≤ 0.5. Then, using the relation r/nc = 1 − Rc, from (37) and (39) we have:

$$\frac{d_{min}}{d_{min_c}} = \frac{1 - R_c}{H^{-1}(1 - R_c)}. \tag{40}$$

This function is plotted in Figure 14, together with the points corresponding to the Golay (Rc = 0.5) and Hamming (Rc = 4/7) codes with the distinct interleavers from Section 3 (fully optimal interleavers ("F") are marked with circles, others with crosses). We can consider equation (40) as an upper bound on dmin/dminc in the sense that random turbo–codes with f.o.i. have the maximal possible dmin for a given R (see proof in Appendix A). It is important to note that for large Rc turbo–codes have the potential to surpass (in an asymptotic sense) the classical "product of codes", for which dmin/dminc = dminc. Note also that turbo–codes with f.o.i. are always better than the classical "sum of codes" (or turbo–codes with block interleaver), for which dmin/dminc ≈ 2.
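The bound (40) only needs a numerical inverse of the binary entropy function. A minimal sketch (bisection on [0, 0.5]; function names are hypothetical):

```python
from math import log2

def H(x: float) -> float:
    """Binary entropy function, H(0) = H(1) = 0."""
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

def H_inv(y: float, tol: float = 1e-12) -> float:
    """Inverse of H on [0, 0.5] by bisection (H is increasing there)."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if H(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def dmin_gain(Rc: float) -> float:
    """Upper bound (40) on d_min/d_min_c for component rate Rc."""
    return (1 - Rc) / H_inv(1 - Rc)

print(f"{dmin_gain(0.5):.3f}")  # about 4.5 for Rc = 1/2, cf. Figure 14
```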

6 Turbo–Codes and Shannon Limit

Turbo–codes were first presented in a paper entitled "Near Shannon limit coding and decoding" [3]. The authors achieved a BER of 10⁻⁵ at Eb/N0 ≈ 0.7 dB after 18 decoding iterations, using a pseudo–random interleaver of size 65536 at rate R = 1/2 of the turbo–code, and compared this SNR with the Shannon limit. The Shannon limit [12], calculated from the mutual information [13] of a memoryless Gaussian channel with binary input and unquantized output, lies around Eb/N0 ≈ 0.2 dB for a BER equal to zero, on the assumption that R equals the channel capacity C. Thus, comparing Eb/N0 at a BER of 10⁻⁵ with the Shannon limit is not quite fair. In this Section we derive from bound (36) the lower bounds on Es/N0 and Eb/N0 needed to achieve error–free maximum likelihood decoding of turbo–codes with interleaver size k → ∞ and any fixed rate R (consequently, the derivation holds for n = (k + 2r) → ∞, too). These results are compared with the Shannon limit and with analogous lower bounds for random codes with block length nc = (k + r) → ∞.

Consider the ith term in bound (36). Let λ = i/k and denote this term by T(k, λ). Then, from (36) and (4) we have:

$$T(k,\lambda) = \frac{\lambda}{2}\binom{k}{k\lambda}\operatorname{erfc}\left(\sqrt{k\xi}\right), \tag{41}$$

where $\xi = \left(\lambda + \frac{1-R}{2R}\right)\frac{E_s}{N_0}$. Furthermore, we use two well known bounds: for the binomial coefficient [9],

$$\frac{2^{kH(\lambda)}}{\sqrt{8k\lambda(1-\lambda)}} \le \binom{k}{k\lambda} \le \frac{2^{kH(\lambda)}}{\sqrt{2\pi k\lambda(1-\lambda)}}, \tag{42}$$

and for the complementary error function [14],

$$\frac{\exp(-x)}{\sqrt{\pi x}}\left(1 - \frac{1}{2x}\right) < \operatorname{erfc}(\sqrt{x}) < \frac{\exp(-x)}{\sqrt{\pi x}}. \tag{43}$$
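Both bounds are easy to sanity-check numerically; a small sketch verifying the two-sided erfc bound (43) for a few arguments:

```python
from math import erfc, exp, pi, sqrt

def erfc_bounds(x: float) -> tuple[float, float]:
    """Lower and upper bound (43) on erfc(sqrt(x)) for x > 0."""
    base = exp(-x) / sqrt(pi * x)
    return base * (1 - 1 / (2 * x)), base

for x in [1.0, 4.0, 25.0]:
    lo, hi = erfc_bounds(x)
    assert lo < erfc(sqrt(x)) < hi  # bound (43) holds
```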

Figure 13: Random turbo–codes with fully optimal interleaving cannot exist! (The Plotkin line (38) for random turbo–codes with f.o.i. is plotted against the Elias upper bound and the Gilbert–Varshamov lower bound in the coordinates (dmin/n, R) or (dminc/nc, Rc).)

Figure 14: Asymptotic bound on the minimum distance gain (dmin/dminc versus Rc, with points for the interleavers F, Int1 (F), Int2, Int3, and Int4).

Inserting (42) and (43) into (41) yields:

$$\frac{\psi}{2}\,\Psi(k,\lambda)\left(1 - \frac{1}{2k\xi}\right) < T(k,\lambda) < \frac{\psi}{\sqrt{\pi}}\,\Psi(k,\lambda), \tag{44}$$

where $\psi = \frac{\lambda}{2\sqrt{2\pi\lambda(1-\lambda)\xi}}$ is independent of k, and

$$\Psi(k,\lambda) = \frac{\exp\{k[H(\lambda)\ln(2) - \xi]\}}{k}. \tag{45}$$

We are interested in lim_{k→∞} T(k, λ). Both sides of (44) tend to zero as k → ∞ for H(λ) ln(2) ≤ ξ and to infinity for H(λ) ln(2) > ξ. Thus, lim_{k→∞} T(k, λ) = lim_{k→∞} Ψ(k, λ), and we have to analyze the exponent in (45). Figure 15 depicts an example of the graphical solution of the equation

$$H(\lambda)\ln(2) = \left(\lambda + \frac{1-R}{2R}\right)\frac{E_s}{N_0} \tag{46}$$

(the concave function describes the left–hand side and the straight lines correspond to the right–hand side for various Es/N0).
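The threshold Es/N0 at which the exponent in (45) changes sign can be found numerically. The sketch below is an assumption-laden illustration, not the paper's derivation: setting the derivative ln((1−λ)/λ) of H(λ)ln(2) equal to the slope Es/N0 gives the maximizing λ* = 1/(1 + exp(Es/N0)) (the same λ* that appears below for random codes), after which the maximum of the exponent simplifies to ln(1 + e⁻ˢ) − c·s with s = Es/N0 and c = (1−R)/(2R); for R = 1/3 this reproduces the closed form ln((1+√5)/2) behind Figure 12:

```python
from math import exp, log, sqrt

def threshold_esno(R: float, tol: float = 1e-12) -> float:
    """Smallest Es/N0 for which the exponent in (45) is non-positive
    for every lambda, i.e. max over lambda of
    H(lambda)*ln(2) - (lambda + (1-R)/(2R))*Es/N0 equals zero."""
    c = (1 - R) / (2 * R)

    def max_exponent(s: float) -> float:
        # value of the exponent at lambda* = 1/(1 + exp(s));
        # strictly decreasing in s, so bisection applies
        return log(1 + exp(-s)) - c * s

    lo, hi = 1e-9, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if max_exponent(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# for R = 1/3 the closed form is the logarithm of the golden ratio
assert abs(threshold_esno(1 / 3) - log((1 + sqrt(5)) / 2)) < 1e-9
```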

Figure 15: Graphical solution of (46): functions of λ for SNR greater than, equal to, and smaller than the threshold Ω.

This fact confirms in a more precise way Battail's inference [6] about the "non–importance" of the minimum distance. Figures 16 and 17 also contain the lower bounds on the analogous SNR's for a random code alone. These bounds (as nc → ∞) are obtained in a similar way from the union bound

$$P_{BER} \le \sum_{w=1}^{n_c} \frac{w}{n_c}\binom{n_c}{w}\frac{1}{2^r}\,P(w), \tag{49}$$

which follows from (2) and (32). For random codes $\delta_k^w$ is equal to $w/n_c$, which follows from the definition of random codes. The function Ψ(nc, λ) (analogous to (45)) for random codes, on the assumption that λ = w/nc, is:

$$\Psi(n_c,\lambda) = \frac{\exp\left\{n_c\left[H(\lambda)\ln(2) - \lambda\,\frac{E_s}{N_0} - (1-R_c)\ln(2)\right]\right\}}{n_c},$$

and the function to be set to zero (analogous to (47)) is:

$$f(\lambda^*) = \ln\left[1 + \exp\left(\frac{E_s}{N_0}\right)\right] - \frac{E_s}{N_0} - (1-R_c)\ln(2), \tag{50}$$

where Rc = k/(k + r) and λ* = 1/(1 + exp(Es/N0)). From (50) we obtain the required bounds on Es/N0 and Eb/N0:

$$\frac{E_s}{N_0} = -\ln\left(2^{1-R_c} - 1\right) \quad\text{and}\quad \frac{E_b}{N_0} = -\frac{\ln\left(2^{1-R_c} - 1\right)}{R_c}. \tag{51}$$

Note that again $\lim_{R_c \to 0} E_b/N_0 = \ln(4)$. The bounds (51) for random codes are well known in coding theory, but are usually obtained in another way (see, e.g., [4]). From Figures 16 and 17 it follows that turbo–codes indeed have the capability to reach the Shannon limit at large code rates. The use of turbo–codes with small rates is (according to the Figures above) not recommended, because their performance would be close to that of the component code alone. In [5] it was observed that a turbo–code with (63,57) Hamming codes, used for encoding the information in both "vertical" and "horizontal" directions through a block interleaver of size 57 × 57 = 3249, reaches a BER of 10⁻⁵ at Eb/N0 ≈ 3.4 dB, which is 1.2 dB away from the Shannon limit at overall rate R = 0.826 (see Figure 17). Our limit for turbo–codes at this code rate lies at approximately 3.02 dB. Thus, the result achieved in [5] is only 0.38 dB away from the potential capabilities of turbo–codes, even though the comparison at a BER of 10⁻⁵ is again not quite fair.
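The closed forms (51) are easy to evaluate; a minimal sketch (the function name is hypothetical) that also checks the limit lim_{Rc→0} Eb/N0 = ln(4):

```python
from math import log, log10

def random_code_ebno_db(Rc: float) -> float:
    """Lower bound (51) on Eb/N0 for random codes of rate Rc, in dB."""
    ebno = -log(2 ** (1 - Rc) - 1) / Rc  # Eb/N0 on a linear scale
    return 10 * log10(ebno)

# as Rc -> 0 the linear Eb/N0 tends to ln(4), the marking in Figure 17
print(f"{random_code_ebno_db(0.001):.2f} dB")  # close to 10*log10(ln 4)
```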

7 Conclusion

Using turbo–codes in a practical manner means gradually (iteratively) decoding the component codes until the decisions about the transmitted bits become stable. In contrast to those practical implementations, we have investigated here an approach for the analysis of the complete code.

Figure 16: Comparison of random turbo–codes ("TC") with the Shannon limit and random codes: lower bounds on Es/N0 (rate R or Rc, taken equal to the capacity C, versus Es/N0 in dB).

Figure 17: Comparison of random turbo–codes with the Shannon limit and random codes: lower bounds on Eb/N0 (rate R or Rc, taken equal to the capacity C, versus Eb/N0 in dB; ln(2) and ln(4) are marked in dB).


We would now like to conjecture that the iterative (turbo) decoding of the component codes is a manner of decoding the whole code. From this point of view we can interpret the iterations as a means of using more and more of the information that is available from the channel and constrained by the code structure, while a decoder of the complete code would use this information completely at once. If this is true, we can consider bound (14) (and its consequences) on the bit error rate of the whole code as a bound for the turbo decoding of the component codes after a sufficient number of iterations. The detailed analysis of the iterative decoding process remains, of course, a future task. Using the proposed WD's, one can also obtain bounds tighter than the union bound at low SNR according to [15], as well as cutoff–rate lower bounds on the error exponent according to [16]. The most important theoretical result is the comparison of turbo–codes under the best conditions (fully optimal interleaving, infinite length, random component codes) with the Shannon limit, which is closely approached only by high–rate codes.

8 Acknowledgements

The author would like to thank Prof. J. Hagenauer for his wonderful introduction to turbo–codes and his guidance throughout this research. I am also very grateful to the German Academic Exchange Service (DAAD) for its financial support. My further thanks go to my colleagues at the Chair for Communications of TU Munich (J. Berkmann, G. Buch, F. Burkert, M. Drimmel, M. Fleischmann, and S. Riedel) for many fruitful discussions during the preparation of this work, to Prof. G. Battail, Prof. J. Huber, and U. Wachsmann for their useful comments on the preliminary version of the manuscript, and to two anonymous reviewers for many helpful suggestions.

A Appendix

We have to show that the variance (10) reaches its minimal possible value for f.o.i. in each ith group. Let $\sum_{l=1}^{\binom{k}{i}} j(i,l) = S$. Then

$$\frac{1}{\binom{k}{i}} \sum_{l=1}^{\binom{k}{i}} W(i,l) = i + \frac{2S}{\binom{k}{i}}$$

is independent of l, and we can consider in (10) only the term

$$\sum_{l=1}^{\binom{k}{i}} W^2(i,l). \tag{52}$$

Let some parity–weights a and c, such that without loss of generality

$$a \le c, \tag{53}$$

be associated after f.o.i. with the weights b and d of the second parity–part, respectively. Then, due to the interleaving rule,

$$b \ge d. \tag{54}$$

Assume that another interleaver X associates the weight a with d and the weight c with b. The comparison of the terms (52) for both interleavers leads to comparing

$$(i+a+b)^2 + (i+c+d)^2 \quad\text{and}\quad (i+a+d)^2 + (i+c+b)^2,$$

or, equivalently, ab + cd and ad + bc, or (a − c)(b − d) and 0, from which it immediately follows, due to (53) and (54), that for the interleaver X the variance (10) is greater than for f.o.i. Furthermore, because of the arbitrary choice of a and c, no other interleaver can have a variance (10) smaller than that of f.o.i.
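The rearrangement argument above can be checked exhaustively on a toy example: pairing the first parity–weights in ascending order with the second parity–weights in descending order (the f.o.i. rule) attains the minimum variance of the sums over all possible pairings. The weight lists below are hypothetical, and the constant offset i is omitted since it does not affect the variance:

```python
from itertools import permutations
from statistics import pvariance

def pairing_variance(first, second, perm):
    """Variance of the sums a + b over one pairing of parity-weights
    (perm maps positions in `first` to elements of `second`)."""
    return pvariance([a + second[p] for a, p in zip(first, perm)])

# hypothetical parity-weight multisets for one group i
first = sorted([1, 2, 4, 7])
second = [0, 3, 5, 6]

# f.o.i.-style pairing: ascending first weights, descending second weights
foi_var = pvariance([a + b
                     for a, b in zip(first, sorted(second, reverse=True))])

# exhaustive check: no pairing achieves a smaller variance
best = min(pairing_variance(first, second, p)
           for p in permutations(range(len(second))))
assert foi_var == best
```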


Now we show that for f.o.i. the minimum weight in the group i,

$$w_{min} = i + a + b, \tag{55}$$

is maximal. It is obvious that the minimum distance of the turbo–code corresponds to the minimum weight in some group i. The minimum weight $w'_{min} = i + c + d$ after another interleaver X can be greater than (55) in the following cases:

1. a = c and b < d
2. a < c and b = d
3. a < c and b < d
4. a > c and d > b + a − c
5. b > d and c > a + b − d

Consider the first case. Let the second weight d be associated after f.o.i. with a first weight e, where e < a = c because d > b. If after the interleaver X the weight e is associated with some weight f ≤ d, then e + f < c + d and thus $w'_{min} = i + c + d$ is not the minimum weight. If f > d, then let the weight f be associated after f.o.i. with a first weight g, where g < e because f > d. Considering the weight g analogously to the weight e above, and applying the described procedure until we reach the smallest parity–weight in the group, we become convinced that no other interleaving can yield a minimum weight within any group i greater than wmin (55) for f.o.i. The proofs for the other cases are similar.

References

[1] Yu. V. Svirid: Additive upper bounds for turbo–codes with perfect interleaving. EIDMA Winter Meeting on Coding Theory, Information Theory and Cryptology, Veldhoven, The Netherlands, p. 35, December 19–21, 1994.
[2] Yu. V. Svirid: Weight distributions of turbo–codes. Accepted for the 1995 IEEE International Symposium on Information Theory, Whistler, B.C., Canada, September 17–22, 1995.
[3] C. Berrou, A. Glavieux, P. Thitimajshima: Near Shannon limit error–correcting coding and decoding: Turbo–codes (1). International Conference on Communications (ICC'93), Geneva, Switzerland, pp. 1064–1070, May 1993.
[4] G. C. Clark, Jr., J. B. Cain: Error–correction coding for digital communications. Plenum Press, New York, 1981.
[5] J. Hagenauer, E. Offer, L. Papke: Iterative decoding of binary block and convolutional codes. Submitted to "IEEE Transactions on Information Theory", January 1995.
[6] G. Battail: Coding for the Gaussian channel: the promise of weighted–output decoding. "International Journal of Satellite Communications", vol. 7, no. 3, pp. 183–192, 1989.
[7] J. L. Massey: Threshold decoding. M.I.T. Press, Cambridge, MA, 1963.
[8] A. Papoulis: Probability, random variables, and stochastic processes. McGraw–Hill, Inc., 1991.
[9] F. J. MacWilliams, N. J. A. Sloane: The theory of error correcting codes. North–Holland, 1977.
[10] V. M. Sidel'nikov: Weight spectrum of binary Bose–Chaudhuri–Hocquenghem codes. "Problems of Information Transmission", vol. 7, pp. 11–17, 1971.
[11] J. N. Pierce: Limit distribution of the minimum distance of random linear codes. "IEEE Transactions on Information Theory", vol. 13, pp. 595–599, October 1967.
[12] C. E. Shannon: A mathematical theory of communication. "Bell System Technical Journal", vol. 27, pp. 379–423, 623–656, July, October 1948.
[13] R. E. Blahut: Principles and practice of information theory. Addison–Wesley, 1987.
[14] N. M. Blachman: Noise and its effect in communication. McGraw–Hill, 1966.
[15] G. Poltyrev: Bounds on the decoding error probability of binary linear codes via their spectra. "IEEE Transactions on Information Theory", vol. 40, pp. 1284–1292, July 1994.
[16] D. E. Lazić, V. Šenk: A direct geometrical method for bounding the error exponent for any specific family of channel codes — Part I: Cutoff rate lower bounds for block codes. "IEEE Transactions on Information Theory", vol. 38, pp. 1548–1559, September 1992.
