SELECTION OF COMPONENT CODES FOR TURBO CODING BASED ON CONVERGENCE PROPERTIES
Jakob Dahl Andersen
Dept. of Telecommunication, Technical University of Denmark, http://www.tele.dtu.dk/~jda/
ABSTRACT
Turbo decoding is sub-optimal, i.e. it is not maximum likelihood decoding. It is important to be aware of this fact when the parameters for the scheme are chosen. This is especially true for the selection of component codes, which has often been based solely on the performance at high SNR's. We will show that it is important to base the choice on the performance at low SNR's, i.e. the convergence properties, as well. Further, the study of the performance with different component codes may lead to an understanding of the convergence process in the turbo decoder.

1. INTRODUCTION

The main principle in the turbo coding scheme [1] is to use two codes in parallel. This means that the information sequence is encoded twice, the second time after a reordering of the information bits. The component codes are chosen as small convolutional codes in recursive systematic form. With this encoding we are able to decode the two encoded streams with an iterative process using two soft-in soft-out decoders, one corresponding to each of the encoders. For the simulations shown in this paper we have used a MAP decoder. With this coding scheme we have to be concerned about two things: what is the distance profile of the total encoded sequence, and can we assure that the iterative decoding will converge? As regards the distance profile, it is important that the
information patterns giving low weight words for the first encoder are not interleaved to similar patterns for the second encoder. This is in fact the reason why we use the convolutional codes in the recursive systematic form. With the normal feed-forward form a single 1 will give a codeword of low weight, and this single 1 will of course appear as a single 1 after any interleaving. With the recursive systematic codes the single 1's will give codewords of semi-infinite weight, and the low weight words appear with patterns of 2, 3 and 4 information bits. Concerning the convergence properties we must look at the information exchange between the decoders. The output from each decoder is the log-likelihood ratio for the information bits:
Λ(d_k) = log [ P(d_k = 1 | observation) / P(d_k = 0 | observation) ]
       = Λ_apriori(d_k) + Λ_ext(d_k)
The log-likelihood ratio can be divided into the log-likelihood ratio for the a priori probabilities, Λ_apriori, and a component, Λ_ext, depending on the parity bits and the a priori probabilities for the surrounding information bits. The output from the first decoding in decoder 1 can be used directly as a priori probabilities in decoder 2. But when we return to decoder 1 for the second iteration we cannot use the output from decoder 2 directly, since part of it came from decoder 1 in the first place; this is mainly Λ_apriori. We therefore subtract Λ_apriori before the results are used in decoder 1. However, there is still some positive feed-back since Λ_ext depends on the a priori probabilities for the surrounding information bits. This means that we cannot assure that the iterative decoding algorithm will converge. The sub-optimal decoding has been thoroughly analysed in [2].

Figure 1 Encoder and decoder for turbo codes

Due to the sub-optimal decoding, the performance curves for the turbo scheme consist of two parts as seen in Figure 2. For low Signal-to-Noise Ratios (SNR) the problem is lack of convergence in the decoding process, resulting in frames with a large number of bit errors. If we use a large block and a large number of iterations, this part of the curve may be very steep. For high SNR's the decoding is close to maximum-likelihood decoding. But the turbo coding scheme usually suffers from a few codewords of low weight. We call this part of the performance curve the “error floor” since the improvement for increasing SNR is very small. For pseudo-random interleavers the performance in this region can be bounded quite accurately [3,4]. However, the “error floor” depends on the specific interleaver and can be lowered by improving the interleaver construction instead of using pseudo-random interleavers [5,6].

When we compare coding schemes with different component codes, it seems to be a typical behaviour that codes with good convergence properties have inferior performance on the error floor. We will try to give an explanation for this in the following sections. The performance curves and the typical relation between schemes with different component codes are illustrated in Figure 2. The choice between code 1 and code 2 in this example depends on the required Bit Error Rate. If we just require a BER of A, code 2 is obviously the best choice. If we require a BER of C, none of the codes will do. Although the BER will reach C for both codes, this will require a very high Eb/N0, and there may be other coding schemes which will be superior to the turbo codes. Finally, if the required BER falls in the region between the two error floors, B, code 1 will be the best choice. The relation between the schemes with different component codes depends on the interleaver size and on the specific interleaver used. The relations may be different if we choose to look at the Frame Error Rate (FER) instead of the BER. This means that we cannot claim that one specific component code is optimal. All the parameters - component code, number of iterations, interleaver size and construction, and performance requirements - must be considered.
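To make the exchange of information concrete, the following is a minimal sketch of one way to organise the iterations, assuming a hypothetical soft-in soft-out routine siso_decode (e.g. a MAP decoder) that returns the full log-likelihood ratio Λ = Λ_apriori + Λ_ext for every information bit, and assuming the same component code for both encoders. Only the extrinsic part, obtained by subtracting the a priori input as described above, is passed on to the other decoder. The function and variable names are ours, not from the paper.

import numpy as np

def turbo_iterations(rx_sys, rx_par1, rx_par2, pi, n_iter, siso_decode):
    """Minimal sketch of the iterative exchange of extrinsic information.

    rx_sys, rx_par1, rx_par2: channel values for the systematic bits and the
    two parity streams (rx_par2 belongs to the interleaved data).
    pi: interleaver permutation (index array of length K).
    siso_decode(sys, par, apriori): hypothetical soft-in soft-out decoder
    (e.g. MAP) for one component code, returning the log-likelihood ratio
    Lambda = Lambda_apriori + Lambda_ext for every information bit.
    """
    K = len(rx_sys)
    inv = np.argsort(pi)                # inverse permutation (de-interleaver)
    apriori1 = np.zeros(K)              # no a priori knowledge before the first pass
    for _ in range(n_iter):
        llr1 = siso_decode(rx_sys, rx_par1, apriori1)
        ext1 = llr1 - apriori1          # subtract the a priori part before passing it on
        apriori2 = ext1[pi]             # interleave: a priori input for decoder 2

        llr2 = siso_decode(rx_sys[pi], rx_par2, apriori2)
        ext2 = llr2 - apriori2          # again remove what decoder 2 was given
        apriori1 = ext2[inv]            # de-interleave back to decoder 1's bit order
    return (llr2[inv] > 0).astype(int)  # hard decisions, with Lambda = log P(1)/P(0)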
Figure 2 Typical performance curves for turbo codes
However, we can find component codes with good (or optimal) behaviour in a more restricted area. That is, codes with optimal performance at high SNR's or codes with optimal convergence properties - or codes with an appropriate blend of these two properties. In the following we will make some considerations on the choice of component code. These considerations will be illustrated by the selection of component codes with rate 1/2 and 16 states.

2. COMPONENT CODES GIVING LOW ERROR FLOORS

At high SNR's the decoding will converge after only a few iterations, and the iterated turbo decoding is close to maximum-likelihood decoding. With maximum-likelihood decoding we will make a decoding error whenever the received word is closer to another codeword than to the correct one. With the turbo codes we get some additional errors because the decoding is not true maximum-likelihood. Several times we have seen the decoder locate a codeword that is close to the correct one, but is not the maximum-likelihood decision. We expect that this occurs when the iterated decoding gets caught in a local minimum. In any case the decoding errors at high SNR's relate to codewords of low weight for the complete turbo scheme. The low weight codewords appear when low weight words for the first component code are interleaved to low weight words for the second component code. The probability of interleaving a pattern of information bits to a similar pattern decreases with the number of information bits involved. The worst patterns are patterns with only two information bits.

The selection of component codes based on the performance at high SNR's is thoroughly discussed in [7,8]. The best code with rate 1/2 and 16 states is found to be (1,37/31), i.e. (1, (1+D+D^2+D^3+D^4)/(1+D+D^4)). The important part of the distance profile for this code is found in Table 1. The most important distance is the minimum weight of two-input words, 12 in this case. Since 31, i.e. 1+D+D^4, is a primitive polynomial, the two-input words are as long as possible for the given memory. Even with this component code the minimum weight of the turbo scheme might be as low as 8, since A(6,4)=1. However, most interleavers will corrupt this pattern. If we use a pseudo-random interleaver of any size, we must expect two codewords of weight 22, due to the two-input word with weight 12 (the weight-12 word contributes its 2 systematic and 10 parity bits, and the second encoder adds another 10 parity bits). We can do much better by using interleavers constructed for the specific codes as described in [5,6]. Currently we have found a 7×577 interleaver where the minimum distance is estimated to be 66.
A(Z,w)    w=2    w=3    w=4    w=5
Z=6         0      0      1      0
Z=8         0      3      2      0
Z=10        0      3      6     14
Z=12        1      0     21     40
Z=14        0      4     29     86

Table 1 Distance profile for code (1,37/31). A(Z,w) denotes the number of codewords with input weight w and output weight Z.
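As an illustration of how such an entry can be checked, the sketch below encodes weight-2 input words with the recursive systematic code (1,37/31) and reports the weights of those that drive the encoder back to the zero state; it should reproduce the minimum two-input weight of 12 from Table 1. The helper names and the brute-force search range are our own.

def taps(poly_octal, memory=4):
    """Coefficients [g_0, ..., g_memory], reading the octal form MSB-first (g_0 is the D^0 tap)."""
    return [int(b) for b in bin(poly_octal)[2:].zfill(memory + 1)]

def rsc_encode(info, g_ff=0o37, g_fb=0o31, memory=4):
    """Recursive systematic encoder (1, g_ff/g_fb); returns (parity bits, final register)."""
    ff, fb = taps(g_ff, memory), taps(g_fb, memory)
    reg, parity = [0] * memory, []
    for d in info:
        a = (d + sum(fb[i + 1] * reg[i] for i in range(memory))) % 2   # feedback bit
        parity.append((ff[0] * a + sum(ff[i + 1] * reg[i] for i in range(memory))) % 2)
        reg = [a] + reg[:-1]
    return parity, reg

# Weight-2 inputs: a 1, then t-1 zeros, then another 1.  Only separations that
# drive the encoder back to the all-zero state give finite-weight codewords.
weights = {}
for t in range(1, 40):
    info = [1] + [0] * (t - 1) + [1]
    parity, reg = rsc_encode(info)
    if any(reg):
        continue                           # encoder not back in the zero state
    weights[t] = sum(info) + sum(parity)   # Z = systematic weight + parity weight

print(weights)   # expected: the smallest value is 12, matching A(12,2)=1 in Table 1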
3. COMPONENT CODES WITH OPTIMAL CONVERGENCE PROPERTIES
In order to find component codes with optimal convergence properties, we have made simulations with all memory-four codes where the generator polynomials start and end with a 1. The results of these simulations are shown in Figure 3. The block length is 5000 and the signal-to-noise ratio is Eb/N0 = 0.1 dB.
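As a small illustration of the search space, the snippet below lists candidate generator pairs for this kind of exhaustive run: memory-four polynomials whose first and last coefficients are 1, printed in the octal notation used throughout the paper. The pairing and printing details are our own; the paper does not specify how the search was organised.

# Memory-4 polynomials g(D) = 1 + ... + D^4 have g_0 = g_4 = 1, i.e. the three
# middle coefficients are free: 2^3 = 8 polynomials, octal 21, 23, 25, ..., 37.
candidates = [0o21 | (m << 1) for m in range(8)]      # 0o21 = 1 + D^4, middle bits filled in

# Pair a feed-forward with a feedback polynomial, excluding the degenerate choice ff == fb.
pairs = [(ff, fb) for ff in candidates for fb in candidates if ff != fb]
print([f"(1,{ff:o}/{fb:o})" for ff, fb in pairs])
print(len(pairs), "candidate component codes")        # 56 pairs, e.g. (1,37/31) and (1,37/25)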
Figure 3 Performance with different 16-state, rate 1/2 component codes at Eb/N0 = 0.1 dB, block size 5000, pseudo-random interleaver. The codes are shown in octal representation.

A(Z,w)    w=2    w=3    w=4    w=5
Z=6         1      0      0      0
Z=8         1      2      2      0
Z=10        1      4      7     10
Z=12        1      6     19     45
Z=14        1      8     38    124

Table 2 Distance profile for code (1,37/25). A(Z,w) denotes the number of codewords with input weight w and output weight Z.

The code with the best convergence properties found in our search was (1,37/25). The distance profile for this code is shown in Table 2. We will now compare the simulation results to the distance profile, and especially the minimum weight of the two-input words, but before we do that we will make a definition of a pseudo-codeword for a recursive systematic code. For the usual feed-forward convolutional codes we define a catastrophic code as a code where an input with finite
weight produces an output with infinite weight. Obviously this cannot be the case for a systematic code. However, with the recursive codes there may be a cyclic sequence of states with an all-zero output. This results in pseudo-codewords, where a finite weight input gives a finite weight output but does not bring the encoder back to the zero state. When we look at the distance profile we must be aware of these pseudo-codewords. For example, the code (1,33/21) (octal representation) has one of these pseudo-codewords. The input sequence ...01010... will produce the parity output ...01110..., but the state will not return to zero. This means that although the true free distance is 6, the code will act as if the free distance were only 5. In Figure 4 we have compared the performance at Eb/N0 = 0.1 dB with the minimum weight of the two-input words, including pseudo-codewords, for all the codes. We see that the codes with the best convergence properties all have minimum weight 6 for the two-input words. The picture is even clearer when we remove the codes with free distance less than 6 (again including pseudo-codewords).
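As a check of the pseudo-codeword example above, the short sketch below runs the (1,33/21) encoder (feed-forward 33, feedback 21 in the paper's octal notation) on the two-input sequence and prints the parity output and the register contents. The encoder routine re-states the sketch from Section 2, and the exact input framing is our own.

def taps(poly_octal, memory=4):
    return [int(b) for b in bin(poly_octal)[2:].zfill(memory + 1)]

def rsc_encode(info, g_ff, g_fb, memory=4):
    """Same recursive systematic encoder as in the Section 2 sketch."""
    ff, fb = taps(g_ff, memory), taps(g_fb, memory)
    reg, parity = [0] * memory, []
    for d in info:
        a = (d + sum(fb[i + 1] * reg[i] for i in range(memory))) % 2
        parity.append((ff[0] * a + sum(ff[i + 1] * reg[i] for i in range(memory))) % 2)
        reg = [a] + reg[:-1]
    return parity, reg

# Two-input sequence 1 0 1 followed by zeros, for the code (1,33/21):
info = [1, 0, 1] + [0] * 9
parity, reg = rsc_encode(info, g_ff=0o33, g_fb=0o21)
print(parity)  # parity weight stays at 3 (the pattern 1 1 1) no matter how many zeros follow
print(reg)     # the register never returns to all-zero: a pseudo-codeword of total weight 5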
Figure 5 shows this comparison restricted to codes with a free distance of at least 6. Similar results are obtained with 8 and 32 states, and with codes of rate 1/3 and 8 states.

We will try to explain why we have this relation. The iterated decoding can be viewed as a series of adjustments to the decoded results based on small changes in the a priori probabilities. Since we use a soft-output decoder all changes in the a priori probabilities will give small changes to the output, but the more pronounced changes to the decoder output correspond to adding (low weight) codewords to the previous estimate. This is clearly the case when a Viterbi decoder (SOVA) is used, and the output is the most probable codeword. The MAP decoder will have the same behaviour, although the final output may be found as a mix of the most probable codewords. The changes in a priori probabilities originate from the other decoder. Here the information symbols do not form a low weight codeword, due to the interleaver. This means that the direction of the changes in a priori probabilities appears at random. Therefore the probability that the changes will give a significant overall contribution decreases when the number of information symbols increases. If the number of parity bits for the codeword is large, the decoder will be very rigid and a huge change in the a priori probabilities is required. Since the change in a priori probabilities from one iteration to the next is moderate, the number of parity bits in the involved codewords must be low. This means that codes with good convergence properties have low weight words with few information bits. This requirement points at very bad codes, but of course we must still require that the component code has good error-correcting capabilities, which explains why the codes with low free distances do not perform well.
Figure 4 BER at Eb/N0=0.1 dB versus minimum weight of two-input words.
These requirements for good convergence properties explain the typical relation between the codes mentioned in Section 1 and illustrated in Figure 2. For low SNR's the optimal codes have two-input and three-input words with low weight, and these words contribute to the error floor at high SNR's.
Figure 5 BER at Eb/N0=0.1 dB versus minimum weight of two-input words for codes with dfree ≥ 6.
4. ERROR EVENTS
To get an understanding of the convergence process we have divided the error events into 6 categories, depending on the convergence and the number of bit errors. The error categories are described in Table 3. By “agreement” we mean that the two decoders have reached the same hard-decision, erroneous or not. These categories are chosen since there seem to be two critical phases in the turbo decoding, one in the beginning, where there is still about 10% of bit errors, and one at the end when we are very close to the final decision. We will also see in the following that the main part of the errors lies in categories 0, 1, 4 and 5.
Error cat.    Number of bit errors    Convergence status
0             5% < ERR                No Agreement
1             5% < ERR                Agreement
2             20 < ERR