IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 19, NO. 5, MAY 2001
Design of Fixed-Point Iterative Decoders for Concatenated Codes with Interleavers

G. Montorsi, Member, IEEE, and S. Benedetto, Fellow, IEEE
Abstract—We discuss the effects of quantization on the performance of the iterative decoding algorithm for concatenated codes with interleavers. Quantization refers here to the log-likelihood ratios coming from the soft demodulator and to the extrinsic information passed from one stage of the decoder to the next. We discuss the case of a single soft-input soft-output (SISO) module, in its additive log-likelihood version (L-SISO), performing all iterations sequentially (an implementation solution suited to medium-low data rates compared with the hardware clock), and that of a pipelined structure in which dedicated hardware is in charge of each SISO operation (an implementation suitable for high data rates). We give design rules in both cases, and show that a suitable rescaling of the extrinsic information yields almost ideal performance with the same number of bits (five) representing both the log-likelihood ratios and the extrinsic information at any decoder stage.

Index Terms—Iterative decoders, quantization, turbo codes.
Fig. 1. Parallel concatenation: coder and iterative decoder with L-SISO.
I. INTRODUCTION
TURBO CODES were proposed in 1993 [1]. After just a few years, there is a wide consensus on their importance as a great achievement in coding theory. Turbo codes are parallel concatenated convolutional codes (PCCC) with the structure shown in Fig. 1. A PCCC is formed by two convolutional constituent encoders and one interleaver. The information sequence enters the first encoder, which generates the first code sequence. At the same time, the information sequence is transformed by the interleaver into a permutation of itself, which is successively encoded by the second encoder, yielding the second coded sequence. The overall coded sequence, formed by concatenating the two encoder outputs in some way, is then sent to the channel through a suitable modulator. Since maximum-likelihood decoding is practically impossible for medium-large interleavers, an iterative suboptimum decoding algorithm is employed, consisting of repeated applications of a posteriori probability (APP) computations based on the trellises of the two constituent encoders. One iteration of the decoding algorithm is performed according to the block diagram of Fig. 1, which makes use of a basic building block called the logarithmic soft-input soft-output (L-SISO) module. A complete description of the L-SISO in its several variants can be found in [2], whereas a synthetic recall of the main equations will be given in Section IV. It is important to mention that both the PCCC encoder and decoder can work in a continuous or in a block-wise fashion. In
Manuscript received April 30, 2000; revised January 9, 2001. This work was supported in part by Qualcomm, Inc. and in part by the Agenzia Spaziale Italiana. G. Montorsi and S. Benedetto are with the Dipartimento di Elettronica, Politecnico di Torino, 10129 Torino, Italy (e-mail: [email protected]). Publisher Item Identifier S 0733-8716(01)03909-9.
Fig. 2. Serial concatenation: coder and iterative decoder with L-SISO.
the second case, the trellises of the two encoders need to be terminated [3], and the interleaver must be a block interleaver [4]. As an alternative to PCCCs, serially concatenated codes with interleavers (SCCC) have been proposed in [5]. Their encoder and decoder block diagrams are shown in Fig. 2. As the block diagrams of the iterative decoders for PCCC and SCCC show, the main difference between the two cases is that the PCCC decoder never makes use of the SISO output pertaining to the updated likelihood ratios of the coded symbols. Analyses and designs of PCCCs and SCCCs aimed at minimizing the bit and frame error probabilities have been presented in [6]–[8]. After years of theoretical investigations aimed at understanding and explaining their amazing performance, turbo codes have recently started to enter the field of practical applications. As two important examples, they have been chosen in the recently approved telemetry coding standard of the Consultative Committee for Space Data Systems (CCSDS), and they will be used for medium-high data rate transmission in the new UMTS third-generation mobile communication standards. When dealing with implementation, and, in particular, when using field programmable gate arrays (FPGA) and very large
0733–8716/01$10.00 © 2001 IEEE
Fig. 3. Notation of the considered system.
scale integration (VLSI) technologies, a crucial design issue is the number of bits required to represent the quantities involved in the decoding algorithm. Some preliminary results in this area, mainly related to the quantization of the log-likelihood ratios, have already been presented in [9]–[12]. In this paper, we address in a more complete and systematic form all the issues involved in the design of a finite-precision, fixed-point implementation of a turbo decoder, including the required internal precision of the SISO decoders and different strategies for the pipelined version of the decoder. Throughout the paper, all examples and simulation results will refer to the specific case of the UMTS PCCC. However, the methodology, as well as the design hints suggested in the paper, hold as well for other PCCCs (such as the CCSDS standard code) and for the case of SCCCs.

II. SYSTEM MODEL

In the following, we will assume that the system works according to the block diagram (and the notation therein) shown in Fig. 3. The information sequence emitted by the source is encoded by the encoder (PCCC or SCCC), which transforms it into the coded sequence. The coded sequence is sent to the channel as the signal sequence obtained at the output of a binary pulse-amplitude modulation (PAM) (or, equivalently, BPSK) modulator. The relationship between the modulated signal s_k and the coded bit c_k ∈ {0, 1} is

s_k = √E_s (2 c_k − 1)    (1)
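The mapping (1) and the soft-demodulator output discussed next can be sketched in Python as follows. This is a minimal illustration of ours, not the paper's code: `ES` and `N0` are assumed example values, and the scaling in `llr` is the standard BPSK-over-AWGN log-likelihood ratio relation.

```python
import math

ES = 1.0   # signal energy E_s (assumed example value)
N0 = 0.5   # noise parameter N_0 (assumed example value; two-sided PSD is N0/2)

def modulate(c):
    """Binary PAM / BPSK mapping of (1): coded bit c in {0, 1} -> +/- sqrt(E_s)."""
    return math.sqrt(ES) * (2 * c - 1)

def llr(r):
    """LLR of a coded bit from the matched-filter output r over AWGN.

    For BPSK the LLR is a simple scaling of r:
    log p(r | c = 1) / p(r | c = 0) = 4 * sqrt(E_s) * r / N_0,
    i.e., the decoder inputs are scaled matched-filter outputs.
    """
    return 4.0 * math.sqrt(ES) * r / N0
```

With the assumed values, `llr` simply multiplies the matched-filter sample by 8, which is why the LLRs inherit the real-valued nature of the channel outputs and must be quantized.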
Fig. 4. The operations performed by the soft demodulator.
having E_s the signal energy. The channel is assumed to include both an additive white Gaussian noise (AWGN) channel¹, with two-sided noise power spectral density N_0/2, and the receiver matched filter plus sampling, so that its output (the output of the demodulator matched filter) forms a sufficient statistic of the received signal. The log-likelihood ratio (LLR) of the kth coded bit, constructed by the "soft demodulator" in Fig. 3, is defined as

λ_k = log [ P(r_k | c_k = 1) / P(r_k | c_k = 0) ] = (4√E_s / N_0) r_k.    (2)

¹Also the case of the Rayleigh independent fading channel, particularly important for wireless applications, will be considered in the simulation results section. The results for this channel do not show significant differences in the design conclusions.

The inputs to the decoders shown in Figs. 1 and 2 are the sequences of the LLRs of the coded bits, which, according to (2), are a scaled version of the matched filter outputs r_k. When implementing the receiver for a system like the one depicted in Fig. 3, one faces several problems related to the finite-precision arithmetic of a digital implementation.
1) The integer representation of the LLRs, which are inherently real numbers.
2) The internal precision of the decoder arithmetic operations.
3) The integer representation of the extrinsic information involved in the iterative decoding process.
4) The fact that the values of the quantities involved in the SISO operations (namely, the extrinsic information) grow as the iterations progress, thus requiring, for the best performance/complexity tradeoff, a different integer representation of the quantities in each iteration.
In the following, we will address all the aforementioned problems. As mentioned in the Introduction, the attention will be focused on the case of PCCCs. The numerical examples and results will refer to an important application: the new UMTS coding standard.

III. INTEGER REPRESENTATION OF LLRS

We assume that the real value λ is represented by the integer λ_q using a total number of n bits according to the fixed-point representation FP(λ) = (n, p), where p is the precision of the quantity λ, i.e., the number of bits to the right of the point of λ that are kept in its integer binary representation. We will call dynamic the quantity d = n − p, which represents the number of bits to the left of the point of λ that are kept in its integer representation. To transform the real number λ into its quantized version λ_q, we then need the two (formally distinct) operations shown in Fig. 4. The first, "quantization," operates on λ to obtain an integer λ' as follows:

λ' = ⌊ λ · 2^p ⌋    (3)

where ⌊x⌋ means "integer part of x." This operation fixes the number of bits to the right of the point, and thus the precision of the representation. The second operation, denoted "saturation," yields the final quantized version λ_q of λ' as follows:

λ_q = max( −2^{n−1}, min( λ', 2^{n−1} − 1 ) )    (4)

and the negative numbers are represented using two's-complement notation. An example will help.

Example 1: Suppose that the sampled output of the matched filter has a value of 4.51278 when the nominal value is
and that the estimated signal-to-noise ratio (SNR) is available. According to (2), the LLR is then
Choosing the total number of bits n, we have different choices for the precision. For example, two of these choices are the following.
• Use two bits for the precision (p = 2), yielding a quantization step of 0.25 (whose binary representation is 0.01) and the corresponding representable range. The quantizer output then follows from (3), while the value after saturation follows from (4).
• Use only one bit for the precision (p = 1), yielding a quantization step equal to 0.5 (binary 0.1) and a wider representable range. The quantizer output and the saturated value are obtained in the same way.
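The two operations of Example 1 can be sketched as follows. This is a minimal sketch of ours: the function names are illustrative, not the paper's.

```python
import math

def quantize(lam, n, p):
    """Fixed-point FP(n, p) conversion of a real value lam.

    Step 1 ("quantization"): keep p fractional bits, i.e., floor(lam * 2**p).
    Step 2 ("saturation"): clip to the n-bit two's-complement integer range
    [-2**(n-1), 2**(n-1) - 1].
    """
    lam_int = math.floor(lam * (1 << p))            # quantization
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1    # representable integers
    return max(lo, min(lam_int, hi))                # saturation

def fp_range(n, p):
    """Representable real range (min, max) and step of an FP(n, p) format."""
    step = 2.0 ** (-p)
    return (-(1 << (n - 1)) * step, ((1 << (n - 1)) - 1) * step, step)
```

For instance, with the two representations recommended later in the paper, `fp_range(4, 1)` gives the range [-4.0, 3.5] with step 0.5, while `fp_range(5, 2)` gives [-4.0, 3.75] with step 0.25.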
The number of precision bits in the representation of the LLRs is a crucial parameter for an operation performed by the SISO that we call "max*" (see Section IV or [2]):

max*(a, b) ≜ log(e^a + e^b) = max(a, b) + log(1 + e^{−|a−b|}).    (5)

In the implementation of the SISO algorithm, the second term on the right-hand side (RHS) of (5) (which we call the "correction" term) is computed using a look-up table accessed by the magnitude of the difference between the two LLRs. The size of the look-up table, as well as the values contained in it, depend on the number of precision bits p. In particular, the value of the correction term
Fig. 5. PCCC encoder adopted in the UMTS-3GPP standard.
log(1 + e^{−m·2^{−p}})    (6)

evaluated at the integer difference m will be stored as the closest representation of the RHS of (6) using p precision bits, whereas the number of entries n_e in the look-up table will be the minimum positive integer satisfying the inequality

log(1 + e^{−n_e·2^{−p}}) < 2^{−(p+1)}.    (7)

Example 2: Assume that p = 3. Thus, from (7), we get n_e = 22. For a value of the difference equal to seven, we obtain from (6) log(1 + e^{−7/8}) ≈ 0.348, whose closest three-bit representation is 0.375, corresponding to the binary 0.011. As a consequence, in the seventh position of the look-up table, we will store the number three. Using the same procedure, we construct the whole 22-integer-element look-up table, which corresponds to the fixed-point numbers found in the equation at the bottom of the page.

As already mentioned, all simulation results in the following will refer to the new third-generation (UMTS) standard for high quality-of-service (QoS), medium-high data rate mobile communication. It employs a rate-1/3 PCCC based on two eight-state systematic recursive convolutional encoders, as shown in Fig. 5. The block sizes vary from a few hundred to a few thousand. The simulation results refer to ten iterations of the decoding algorithm and a block size of 4828. The results on the decoder design that we obtain, on the other hand, are not restricted to the specific UMTS constituent encoders. In fact, we have applied the same analysis to the new CCSDS telemetry channel coding standard, based on two 16-state constituent encoders, obtaining the same results in terms of final design parameters. The number of simulated frames has been chosen large enough to estimate bit error rate (BER) and frame error rate (FER) probabilities low enough to accurately assess the effects of quantization on the error floor of the decoder. As we will see, the FER curves often permit a better estimate of the quantization effects. In the simulations, we have used a sliding-window version of the L-SISO algorithm that is accurately described in [2].

A. Number of Bits for the Precision of LLRs

In order to design the required number p of bits of precision for the representation of the LLRs, we have simulated the iterative decoding assuming that all operations performed by the decoder are ideal (i.e., performed with infinite precision), except those involving the fractional part of the LLRs, which is represented using p bits. The look-up table yielding the correction term is constructed accordingly.
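The construction of the correction-term look-up table can be sketched as follows; this sketch, written under our reading of (6) and (7), reproduces the numbers of Example 2.

```python
import math

def correction_table(p):
    """Look-up table for the max* correction term log(1 + exp(-|a-b|)).

    The table is indexed by the integer difference d = |a - b| in FP units
    (real difference d * 2**-p); each entry stores the nearest
    p-fractional-bit integer round(2**p * log(1 + exp(-d * 2**-p))).
    The table stops at the first d for which the correction drops below
    half a quantization step, 2**-(p+1), as in (7).
    """
    scale = 1 << p
    table = []
    d = 0
    while math.log1p(math.exp(-d / scale)) >= 2.0 ** (-(p + 1)):
        table.append(round(scale * math.log1p(math.exp(-d / scale))))
        d += 1
    return table

def max_star(a, b, table, p):
    """max*(a, b) on FP integers with precision p, using the look-up table."""
    d = abs(a - b)
    corr = table[d] if d < len(table) else 0
    return max(a, b) + corr
```

For `p = 3` the loop produces exactly 22 entries, and position seven holds the integer 3, matching Example 2; fewer precision bits shrink both the table and the accuracy of the correction term.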
Fig. 6. Influence of precision on the performance of turbo codes.
In Fig. 6, we report the simulated BER and FER versus E_b/N_0 for different values of the number of precision bits p.² We have verified that three bits of precision (corresponding to a look-up table with 22 elements) yield results coincident with infinitely soft quantization, so that, in the following, this curve will be denoted as the "ideal" curve. From the curves, we see that two bits of precision come very close to the ideal case. Using only one bit of precision, on the other hand, causes a loss of 0.1 dB at medium-low SNRs but has no consequences on the error-floor performance. From the error probability curves, we notice that an insufficient number of bits of precision affects the convergence abscissa of the iterative decoding algorithm, i.e., the behavior of the algorithm at low SNRs. This is due to an insufficient accuracy in the evaluation of the correction term.

²When we have a negative value for p, interpreting the precision as the "number of bits to the right of the decimal point" has no meaning. To understand the meaning of a negative precision, see (3).
Fig. 7. Influence of dynamic on the performance of turbo codes.
B. Number of Bits for the Dynamics of LLRs

To gain information on the required number of bits for the dynamics of the LLRs, we have repeated the previous simulations assuming an infinite precision for all decoder operations (equivalent, as seen previously, to the case p = 3, which is thus used in the simulations), except for the dynamic of the LLRs, which is based on d bits. In Fig. 7, we report the simulated BER and FER versus E_b/N_0 for different values of the number of dynamic bits d. In this case, we have kept the precision fixed to the value p = 3, corresponding to the ideal case. As a benchmark, we also report the ideal case corresponding to an infinitely soft representation. From the curves, we see that three bits of dynamic yield a performance coincident with the ideal one, and that two bits lead to a degradation of about 0.3 dB. To confirm the previous design results when considering the two effects of finite precision and dynamic together, we report in Figs. 8 and 9 the FER results obtained using n = 4 and n = 5 overall bits with different precisions.
Fig. 8. Performance with n = 4 and different precisions.
Fig. 10. L-SISO.

Fig. 11. Notation for the trellis section in the SISO decoder.
Fig. 9. Performance with n = 5 and different precisions.

From the curves, we can derive the following design hints.
• For n = 4, the best choice is using one bit for the precision and three for the dynamic. It yields performance less than 0.1 dB worse than the ideal one.
• For n = 5, the best choice is using two bits for the precision and three for the dynamic. It yields performance almost identical to the ideal one.
At this point, we have completed the design of the quantizer operating on the LLRs and need to examine the SISO algorithm more closely in order to gather information on its required internal precision.

IV. INTERNAL PRECISION OF THE SISO ALGORITHM

The block L-SISO in Figs. 1 and 2 is a two-input two-output device (see Fig. 10) that accepts as inputs the quantities λ(u; I) and λ(c; I),³ and outputs the quantities λ(u; O) and λ(c; O). The outputs are the extrinsic log-likelihood ratios of the input and output bits of the encoder. As is widely known, the outputs are obtained from the inputs using the BCJR algorithm [13], [2], initially proposed to perform maximum a posteriori bit-by-bit decoding of trellis codes, whose complexity is linear in the state complexity of the code trellis. Our description of the algorithm [2] refers to the trellis-section notation shown in Fig. 11, where we distinguish the trellis edges, denoted by e, and, for each edge, its starting state s^S(e), its ending state s^E(e), and the input (uncoded, u(e)) and output (encoded, c(e)) symbols that label it. We will present the algorithm in its logarithmic form for a binary code with rate k_0/n_0, k_0 and n_0 being the numbers of bits forming an input and an output code symbol, respectively. We consider the transmission of a block of consecutive trellis steps. The algorithm first performs the two (forward and backward) recursions

α_k(s) = max*_{e : s^E(e) = s} [ α_{k−1}(s^S(e)) + λ_k(u(e); I) + λ_k(c(e); I) ]
β_k(s) = max*_{e : s^S(e) = s} [ β_{k+1}(s^E(e)) + λ_{k+1}(u(e); I) + λ_{k+1}(c(e); I) ]    (8)

and then updates the extrinsic LLRs according to

λ_k(U_j; O) = max*_{e : u_j(e) = 1} [ α_{k−1}(s^S(e)) + λ_k(u(e); I) + λ_k(c(e); I) + β_k(s^E(e)) ] − max*_{e : u_j(e) = 0} [ α_{k−1}(s^S(e)) + λ_k(u(e); I) + λ_k(c(e); I) + β_k(s^E(e)) ] − λ_k(U_j; I).    (9)

³We use for the inputs and outputs of the SISO a notation that is different from the one of Figs. 1 and 2. Depending on its application inside an iterative decoder, the inputs and outputs will assume different forms bearing a precise physical meaning. The symbols "I" and "O" in these notations refer to the "input" and "output" of the SISO.
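To make the forward recursion concrete, the following sketch runs it on a hypothetical two-state rate-1/2 trellis. The trellis, the per-bit branch metric u·λ(u; I) + c·λ(c; I), and all names are our illustrative assumptions, not the UMTS code.

```python
import math

def max_star(a, b):
    # max*(a, b) = max(a, b) + log(1 + e^{-|a - b|})
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

# Hypothetical 2-state, rate-1/2 trellis used only for illustration.
# Each edge is (start_state, end_state, u, c): u = input bit, c = coded bit.
EDGES = [(0, 0, 0, 0), (0, 1, 1, 1), (1, 1, 0, 1), (1, 0, 1, 0)]
NSTATES = 2
NEG_INF = float("-inf")

def forward(lam_u, lam_c):
    """Forward recursion of the log-domain BCJR: alpha[k][s] from input LLRs."""
    K = len(lam_u)
    alpha = [[NEG_INF] * NSTATES for _ in range(K + 1)]
    alpha[0][0] = 0.0  # trellis assumed to start in state 0
    for k in range(K):
        for s0, s1, u, c in EDGES:
            if alpha[k][s0] == NEG_INF:
                continue  # unreachable state
            # branch metric: u * lambda(u; I) + c * lambda(c; I)
            m = alpha[k][s0] + u * lam_u[k] + c * lam_c[k]
            if alpha[k + 1][s1] == NEG_INF:
                alpha[k + 1][s1] = m
            else:
                alpha[k + 1][s1] = max_star(alpha[k + 1][s1], m)
    return alpha
```

The backward recursion is identical with the roles of starting and ending states exchanged; replacing `max_star` with plain `max` turns this into the L-Max-SISO recursion discussed next.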
The inputs to the block L-SISO, when it is applied in the kth iteration of the iterative decoding of PCCCs (compare Fig. 10 with Fig. 1), are as follows.
• λ(u; I) is the extrinsic information about the jth bit of the kth information symbol, as provided by the other decoder in the previous iteration.
• λ(c; I) is the LLR provided by the soft demodulator about the jth bit of the kth coded symbol, which remains the same for all iterations.
If we substitute the max* operator with the max operator in (8) and (9), we obtain the recursions describing the L-Max-SISO

α_k(s) = max_{e : s^E(e) = s} [ α_{k−1}(s^S(e)) + λ_k(u(e); I) + λ_k(c(e); I) ]
β_k(s) = max_{e : s^S(e) = s} [ β_{k+1}(s^E(e)) + λ_{k+1}(u(e); I) + λ_{k+1}(c(e); I) ]    (10)

and then the updates of the extrinsic LLRs according to

λ_k(U_j; O) = max_{e : u_j(e) = 1} [ α_{k−1}(s^S(e)) + λ_k(u(e); I) + λ_k(c(e); I) + β_k(s^E(e)) ] − max_{e : u_j(e) = 0} [ α_{k−1}(s^S(e)) + λ_k(u(e); I) + λ_k(c(e); I) + β_k(s^E(e)) ] − λ_k(U_j; I).    (11)

The L-Max-SISO described by the recursions in (10) is a bidirectional Viterbi algorithm with modified branch metrics [14]. Based on its operations described by (10) and (11), we can now derive a fundamental theorem on the largest difference assumed by the SISO internal and output quantities, which extends, in a tighter way, the result obtained in [15] for the case of the Viterbi algorithm.

Theorem 1: If the input soft values of the L-Max-SISO algorithm are integers corresponding to fixed-point numbers with the same precision and bounded in magnitude, then, for a linear code with rate k_0/n_0, the largest difference between any two quantities involved in the computation of the forward and backward recursions (10) of the L-Max-SISO algorithm is given by

(12)

where d(w) is the function that gives the minimum weight of the code sequences generated by input sequences with weight w. Also, the largest difference between any two quantities involved in the computation of the output quantities in (11) is given by

(13)

where d(w, e) is the function that gives the minimum weight of the code sequences that pass through edge e, generated by input sequences with weight w.

Proof: The largest difference between two paths that merge into a state is bounded by their Hamming distance multiplied by the maximum value of the soft metrics. In the L-Max-SISO algorithm, both input and output symbols contribute to the path metric: each difference in a coded bit gives a maximum metric difference of twice the largest LLR magnitude, while each difference in an input bit gives a maximum difference of twice the largest extrinsic-information magnitude. Since the SISO algorithm always chooses the path with the maximum metric, the largest distance is given by (12). A similar argument can be used to prove the second part of the theorem. When the number of merging paths in a state is greater than two, the bound (12) must be slightly modified in order to take into account those paths that merge into a state from different edges.

Although we have not been able to prove an equivalent theorem for the L-SISO algorithm, extensive simulation results show that the upper limits to the spread of the quantities obtained in the previous theorem also hold, a fortiori, for the spread of the quantities in the L-SISO algorithm, and they can thus be safely used to design the internal precision of the SISO operations. As a consequence of the previous theorem, the range of the LLRs entering the L-SISO specifies in a deterministic way the largest spread of the quantities that must be computed inside the SISO.

Example 3: For the eight-state code of the UMTS-3GPP standard of Fig. 5, we have computed the quantities d(w, e) required to compute (12) and (13). They are reported in Table I. Since d(w) is easily obtained from the d(w, e),
the table contains all the information we need to compute the bounds of the theorem.

TABLE I
QUANTITIES d(w, e) FOR THE UMTS CODE OF FIG. 5

Fig. 12. Quantities involved in the iterative decoder of the PCCC.

Assuming a choice leading to almost optimal performance and corresponding to seven bits of quantization for the extrinsic information and five bits of quantization for the LLRs, the two limits⁴ become

(14)

for the α's and β's, where ⌈x⌉ means "smallest integer greater than or equal to x," and

(15)

for the extrinsic information. Thus, nine bits are sufficient for the internal SISO operations, i.e., three bits in addition to the five used to represent the input LLRs.

In [15], the following upper bound for the maximum spread between two path metrics, depending only on the constraint length of the code and on the numbers of input and output bits k_0 and n_0, was introduced:

(16)

This expression cannot be used for the evaluation of the maximum spread between the quantities involved in the computation of the output extrinsic information (11). Moreover, because of its independence from the code, the bound in [15] can be significantly looser than the exact value in (12). In the case at hand, for example, the bound (16) yields a value significantly larger than the exact value in (14); the latter yields a saving of one bit in the fixed-point representation of the internal path metrics.

To limit the values of the quantities to the previously defined spread, a normalization would be required, consisting, for example, in subtracting the minimum (or maximum) quantity from all of them. It is possible, however, to avoid the normalization of the forward and backward path metrics by using the two's-complement representation and letting the path metrics increase and, possibly, overflow (see [10] and [16]). If the spread between the path metrics is always smaller than the largest number that can be represented by the available bits, the result of the subtractions is unaffected by the occurred overflows, so that the max* operator can be performed without errors. In conclusion, considering the additional bit required to avoid normalization, the number of bits that must be used for the internal arithmetic operations of the SISO algorithm follows from the limits (14) and (15).

⁴For parallel concatenation, we are not interested in the computation of the quantities λ(c; O), so that the corresponding limit is not used.

Theorem 1 requires that the fixed-point (FP) representations of the inputs to the SISO use the same precision. When this is not the case, i.e., when the LLRs of the input and of the output of the encoder use two different precisions, say p_1 and p_2 with p_1 < p_2, we have two possibilities.
1) The SISO makes the computations using the smallest precision p_1. In this case, the quantity with the higher precision must be divided by 2^(p_2 − p_1) (right-shifted by p_2 − p_1). This first choice leads to the least complex implementation of the SISO.
2) The SISO makes the computations using the highest precision p_2. In this case, the quantity with the smaller precision must be multiplied by 2^(p_2 − p_1) (left-shifted by p_2 − p_1).

V. DESIGN OF FIXED-POINT ITERATIVE DECODER

In the previous section, we have shown that the internal precision of the L-SISO algorithm depends in a deterministic way on the FP representations of the two input quantities λ and λ̃. In the iterative decoding procedure, however, several L-SISO modules work sequentially, as depicted in Fig. 12. During the iterations that form the decoding process, the two SISO inputs show a different behavior: the LLRs bearing the channel information on the coded bits do not change their values, whereas the statistics of the extrinsic information change as the number of iterations (and, consequently, the estimated reliability) increases. In Fig. 13, we present a three-dimensional (3-D) plot showing this different behavior across the iterations. The plot reports, for each stage of the iterative decoding process, the
878
IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 19, NO. 5, MAY 2001
Fig. 13. Evolution of the statistics of the LLRs and of the extrinsic information as the number of iterations increases.
relative frequency of values with different most significant bits (MSBs); in other words, the curve labeled "kth bit" represents the relative frequency of values in the range [2^{k−1}, 2^k) or (−2^k, −2^{k−1}]. The statistics are derived considering the values of the two input quantities λ and λ̃ that enter each L-SISO module. Fig. 13 shows that the statistics split into two diverging branches: the first one, corresponding to the quantities λ̃, has an average value that increases during the iterative process, while the other one, corresponding to the input λ, keeps the same value. The next step in the design of an FP implementation of the iterative decoder is then to decide which representation should be used for the quantities that are input to the SISO in the different stages. The design can follow two different strategies depending on the final implementation.
• Single SISO: When the data rate is considerably smaller than the available computational speed, a single L-SISO module can sequentially perform the operations described in Fig. 12 and thus be responsible for all iterations. In this case, the design of the SISO must satisfy the additional constraint that the input and output quantities use the same FP representation.
• Pipelined structure: When the data rate, compared with the hardware speed, imposes a pipelined implementation in which each SISO performs only half an iteration, we can design each SISO to work with different FP representations for the input and output quantities, aiming at minimizing both the internal precision of the different L-SISO modules (and, thus, their arithmetic as well as memory complexity) and the memory requirements for the interleavers between successive L-SISO modules.

Fig. 14. Performance of the iterative decoder with FP(λ) = (4, 1) and FP(λ̃) = (4, 1), (5, 1), (6, 1), (10, 1).

A. Single SISO Solution

The solution in which a single SISO takes care of all iterations imposes that the precision for the two SISO input quantities be the same, so that the only parameter left for optimization is the overall number of bits of the quantities λ̃. In Section III, we have shown that good choices for the representation of the LLRs are FP(λ) = (4, 1), which gives less than 0.1 dB of loss, and FP(λ) = (5, 2), which gives almost ideal performance. In Fig. 14, we report the BER and FER results obtained using a single L-SISO for the iterative decoder based on FP(λ) = (4, 1) and FP(λ̃) = (4, 1), (5, 1), (6, 1), (10, 1). The numbers of bits used for the internal SISO operations are those coming from the application of Theorem 1. In Fig. 15, we report the same results obtained using a single L-SISO for the iterative decoder with FP(λ) = (5, 2) and FP(λ̃) = (5, 2), (6, 2), (7, 2), (10, 2). From the figures, it is apparent that the dynamic required for the quantities λ̃ must be at least two bits greater than the dynamic of the LLRs λ. With a completely fixed structure of the
Fig. 15. Performance of the iterative decoder with FP(λ) = (5, 2) and FP(λ̃) = (5, 2), (6, 2), (7, 2), (10, 2).
L-SISO decoder, it is then possible to achieve almost optimal performance using FP(λ) = (5, 2) and FP(λ̃) = (7, 2).

As a more realistic environment for wireless communications, we have also considered the case of a Rayleigh independent fading channel. The simulation results are reported in Fig. 16. From the figure, it is evident that the conclusions obtained for the AWGN channel are valid also in this case.

B. Pipelined Implementation

For high data rates, the implementation should be made according to the pipelined structure previously described. In this case, the designer has more freedom, and, since a pipelined
Fig. 16. Performance of the iterative decoder over the Rayleigh independent fading channel with different FP representations: FP(λ) = (4, 1) and FP(λ̃) = (6, 1); FP(λ) = (5, 2) and FP(λ̃) = (7, 2); FP(λ) = (6, 2) and FP(λ̃) = (8, 2).
implementation would normally be done using FPGAs or application-specific integrated circuits (ASICs), there is a greater incentive to use this freedom wisely. The design strategy consists in the joint optimization of the FP representations, a process that depends on the SNR range and also on the largest number of iterations allowed. In essence, to save complexity in the SISO implementation, we could keep constant the number of bits used for the representation of the extrinsic information in all iterations by progressively decreasing the precision in order to accommodate the larger dynamic. As a consequence, some of the SISO modules would need to process quantities employing different precisions. As pointed out in the previous section, we have in this case two possibilities: the first one, which leads to the largest complexity of the SISO, requires that the internal precision be equal to the maximum of the two precisions (see Fig. 17). Using this solution, changing the FP representation of the quantities
Fig. 17. Pipelined structure to save interleaving memory.
Fig. 18. Pipelined structure to save both interleaving memory and SISO complexity.
before and after each interleaver decreases the memory required to store them. In the second case (Fig. 18), on the other hand, the SISO has an internal precision corresponding to the minimum of the two precisions, so that it is possible to save complexity both in the SISO implementation (logic and memory) and in the memory requirement for the interleaver. In the following, we will refer only to this second, more convenient solution.

To design the different stages, we consider the cases of a total number of bits equal to four and five, according to the optimization results of Section III, and perform an optimization of the "gain profile", i.e., a sequence of integer numbers identifying the SISOs in the iterative decoding at which we reduce by one bit the precision of the extrinsic informations. As an example, for the five-bit case, the sequence "11-3-7" identifies the following FP representation in the iterative decoder (please see the equation at the bottom of the page).

In Figs. 19 and 20, we report the BER and FER performance versus the SNR for different choices of the gain profile, with a total number of bits equal to four and five, respectively. The ideal performance with infinitely soft quantization and the performance obtained with the quantization applied only to the LLRs are also reported as benchmarks.

Fig. 19. Performance results for different gain profiles using four bits for the FP representation of both the LLR and extrinsic information.

Fig. 19 shows that there is always a penalty in the four-bit case, which uses only four bits to represent both LLRs and extrinsic informations. The best gain profile depends on the required BER and/or FER: in the moderate FER range, the best solution is 5-5-11, leading to a penalty of around 0.2 dB, whereas for lower target FERs the best solution is instead 9-9-3.

Fig. 20 shows that using five bits to represent both LLRs and extrinsic informations yields almost ideal performance (a penalty of less than 0.1 dB), provided that we make the right choice for the gain profile, which is in this case the sequence 11-3-7. The second reported solution, i.e., the sequence 9-3-9, exhibits a higher error floor.

Fig. 20. Performance results for different gain profiles using five bits for the FP representation of both the LLR and extrinsic information.

From the wide simulation experience gained in deriving the previous results, we can affirm that a convenient strategy consists in keeping the LLR precision also for the extrinsic informations until the error floor is reached, since an insufficient precision would be detrimental to the convergence abscissa. From that point on, the precision can be decreased so as to enlarge the dynamic.

VI. CONCLUSION

We have presented an analysis of the quantization effects involved in the fixed-point implementation of iterative decoders for concatenated codes with interleavers, and outlined a design procedure with examples applying to the standard turbo code for UMTS applications over AWGN and independent Rayleigh fading channels. From the results of the paper, we can summarize the following design hints.

• The effect of an insufficient number of bits in the dynamic of the extrinsic informations shows up in the "error floor" of the performance curves. This is due to the fact that, for large SNRs, the quantized extrinsic informations tend to concentrate only on the (too low) saturated values, so that no further improvement is obtained with increasing values of the SNR.
• The effect of an insufficient number of bits of precision affects the convergence abscissa of the iterative decoding algorithm, i.e., the behavior of the algorithm at low SNRs. This is due to an insufficient accuracy in the evaluation of the correction term of the "max*" operation (not enough terms in the look-up table).
• When using a total number of five bits to represent the LLRs, the best solution, leading to almost ideal performance, is to use three bits for the dynamic and two bits for the precision. Using only four bits overall requires one bit of precision and three bits of dynamic, and leads to a 0.1-dB penalty.
• The internal SISO operations require a number of bits in addition to those used for the input quantities, as stated in Theorem 1.
• In a single-SISO implementation, a fixed-point representation of the extrinsic informations with two additional bits with respect to the LLRs yields no extra penalty.
• In a pipelined implementation, we have shown that the same number of bits can be used for both LLRs and extrinsic informations, provided that a suitable strategy is adopted to increase the dynamic (and consequently decrease the precision) along the iterations.
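As a concrete illustration of the last design hint, the following Python sketch models how a gain profile can schedule the quantization step along the pipeline at a fixed word length. It is a simplified model, not the paper's implementation: the function names, the five-bit word length, the base step of 0.25, and the reading of a profile such as "11-3-7" as run lengths between successive one-bit precision reductions are all our assumptions for illustration.

```python
import numpy as np

def saturating_quantize(x, n_bits, step):
    """Uniform quantizer with saturation to an n_bits two's-complement word.
    Values are rounded to the nearest multiple of `step` and clipped to the
    representable range [-2**(n_bits-1) * step, (2**(n_bits-1) - 1) * step]."""
    q = np.round(np.asarray(x, dtype=float) / step)
    q = np.clip(q, -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    return q * step

def steps_from_profile(profile, base_step):
    """Expand a gain profile into a per-SISO-stage quantization step.
    Hypothetical reading of the profile: each entry is the number of
    consecutive SISO stages that keep the current step before one bit of
    precision is traded for one extra bit of dynamic (step doubled)."""
    steps = []
    step = base_step
    for run_length in profile:
        steps.extend([step] * run_length)
        step *= 2.0  # drop one bit of precision, double the dynamic
    return steps

# Example: profile 11-3-7 with a (hypothetical) base step of 0.25.
steps = steps_from_profile((11, 3, 7), base_step=0.25)
# With 5-bit words the dynamic grows from 4.0 to 16.0 along the stages
# while the word length stays constant.
```

Each doubling of the step halves the resolution but doubles the representable range, which matches the strategy above: keep the LLR precision until the error floor, then enlarge the dynamic along the remaining iterations.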
REFERENCES

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes," in Proc. Int. Conf. Communications, Geneva, Switzerland, May 1993, pp. 1064–1070.
[2] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Soft-input soft-output modules for the construction and distributed iterative decoding of code networks," Eur. Trans. Telecommun., vol. 9, no. 2, pp. 155–172, Mar. 1998.
[3] J. Hokfelt, O. Edfors, and T. Maseng, "A survey on trellis termination alternatives for turbo codes," in Proc. 49th Vehicular Technology Conf., vol. 3, Nov. 1999, pp. 2225–2229.
[4] S. Benedetto, G. Cancellieri, R. Garello, and G. Montorsi, "Interleaver theory and applications to the trellis complexity analysis of turbo codes," IEEE Trans. Commun., vol. 49, May 2001, to be published.
[5] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Serial concatenation of interleaved codes: Performance analysis, design and iterative decoding," IEEE Trans. Inform. Theory, vol. 44, pp. 909–926, May 1998.
[6] S. Benedetto and G. Montorsi, "Unveiling turbo-codes: Some results on parallel concatenated coding schemes," IEEE Trans. Inform. Theory, vol. 42, pp. 409–429, Mar. 1996.
[7] S. Benedetto, R. Garello, and G. Montorsi, "A search for good convolutional codes to be used in the construction of turbo codes," IEEE Trans. Commun., vol. 46, pp. 1101–1105, Sept. 1998.
[8] S. Benedetto and G. Montorsi, "Design of parallel concatenated convolutional codes," IEEE Trans. Commun., vol. 44, pp. 591–600, May 1996.
[9] Y. Wu and B. D. Woerner, "The influence of quantization and fixed point arithmetic upon the BER performance of turbo codes," in Proc. Vehicular Technology Conf., vol. 2, 1999, pp. 1683–1687.
[10] G. Masera, G. Piccinini, M. R. Roch, and M. Zamboni, "VLSI architectures for turbo codes," IEEE Trans. VLSI Syst., vol. 7, pp. 369–379, Sept. 1999.
[11] J.-M. Hsu and C.-L. Wang, "On finite-precision implementation of a decoder for turbo-codes," in Proc. IEEE Int. Symp. Circuits and Systems, vol. 4, Orlando, FL, May 1999, pp. 423–426.
[12] J. Au and P. J. McLane, "Performance of turbo codes with quantized channel measurements," presented at the Globecom Conf., Dallas, TX, Nov. 1999.
[13] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. 20, pp. 284–287, Mar. 1974.
[14] A. J. Viterbi, "An intuitive justification and a simplified implementation of the MAP decoder for convolutional codes," IEEE J. Select. Areas Commun., vol. 16, pp. 260–264, Feb. 1998.
[15] A. P. Hekstra, "An alternative to metric rescaling in Viterbi decoders," IEEE Trans. Commun., vol. 37, no. 11, pp. 1220–1222, 1989.
[16] C. B. Shung, P. H. Siegel, G. Ungerboeck, and H. Thapar, "VLSI architectures for metric normalization in the Viterbi algorithm," presented at the IEEE Int. Conf. Communications, Atlanta, GA, Apr. 1990.
Guido Montorsi was born in Turin, Italy, on January 1, 1965. He received the Laurea degree in electrical engineering in 1990 from Politecnico di Torino, Turin, Italy, with a master's thesis concerning the study and design of coding schemes for HDTV, developed at the RAI Research Center, Turin. In 1994, he received the Ph.D. degree in telecommunications from the Electronics Department, Politecnico di Torino. In 1992, he spent a year as a Visiting Scholar with the Department of Electrical Engineering, Rensselaer Polytechnic Institute, Troy, NY. Since December 1997, he has been an Assistant Professor with the Politecnico di Torino. His current interests are in the area of channel coding, particularly the analysis and design of concatenated coding schemes and the study of iterative decoding strategies.
Sergio Benedetto (M'76–SM'90–F'97) received the Laurea degree in electrical engineering (summa cum laude) from Politecnico di Torino, Turin, Italy, in 1969. From 1970 to 1979, he was with the Istituto di Elettronica e Telecomunicazioni, Politecnico di Torino, first as a Research Engineer, then as an Associate Professor. In 1980, he was made a Professor in Radio Communications at the Università di Bari. In 1981, he rejoined Politecnico di Torino as a Professor of Data Transmission Theory with the Department of Electronics. He spent nine months in 1980–1981 with the System Science Department, University of California, Los Angeles, as a Visiting Professor, and three months at the University of Canterbury, New Zealand, as an Erskine Fellow. He has coauthored two books in signal theory and probability and random variables (in Italian), the books "Digital Transmission Theory" (Prentice-Hall, 1987) and "Optical Fiber Communications Systems" (Artech House, 1996), as well as about 200 papers in leading engineering journals and conferences. Active in the field of digital transmission systems since 1970, his current interests are in optical fiber communications systems, performance evaluation and simulation of digital communication systems, trellis coded modulation, and concatenated coding schemes. Dr. Benedetto is Area Editor for Signal Design, Modulation and Detection for the IEEE TRANSACTIONS ON COMMUNICATIONS.