Complexity/Performance Tradeoffs in Multistage Decoders
Wing-Man Lee and Frank R. Kschischang
Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada M5S 1A4
May 10, 1994
This paper appeared in European Transactions on Telecommunications, vol. 5, no. 6, pp. 665-679, Nov./Dec. 1994. Due to the editorial process, there may be slight discrepancies between this version and the published one. The latter should be regarded as the authoritative version. This paper is available over the internet at
http://www.comm.utoronto.ca/frank/
Abstract
This paper describes suboptimal decoding methods for multilevel block-encoded modulation schemes. These decoding methods can achieve large coding gains with relatively low decoding complexity, and operate, in part, by decoding a supercode of the code to be decoded. By decomposing the multilevel codes in various ways, a wide range of performance/complexity tradeoff points can be obtained. Introducing syndrome checking can further reduce the average decoding complexity with no degradation in performance. Both analysis and extensive simulation results confirm that a given coding gain can be achieved with significantly smaller decoding complexity than Viterbi decoding.
This work was supported in part by the Natural Sciences and Engineering Research Council of Canada. Please direct all correspondence to the second author.
I Introduction

An important issue in the implementation of any channel coding scheme is the computational complexity required to decode the noisy received data. Trellis codes are commonly decoded using the Viterbi algorithm, which is an optimum maximum-likelihood sequence estimation technique. Unfortunately, the number of computations required by the Viterbi algorithm grows exponentially with the constraint length of the code. The limited computing power available in practical decoders places a limitation on the constraint length, and thus also on the coding gain that can be achieved by practical trellis codes. It is of great interest, therefore, to determine the tradeoffs between decoding complexity and coding gain that can be achieved in practice. In this paper we explore the coding gain/complexity tradeoffs that can be achieved on the Gaussian channel by a class of suboptimal decoders for multilevel block-coded modulation. Block-coded modulation (BCM), like its better-known counterpart trellis-coded modulation (TCM), is a power- and bandwidth-efficient coding scheme for information transmission over noisy channels. Both BCM and TCM are designed by combining the coding and modulation processes together, and both can achieve comparable coding gains on the Gaussian channel. The main structural difference between BCM and TCM is evident from the terminology: in BCM, data is encoded into independent fixed-length blocks of channel symbols, while in TCM, data is encoded into semi-infinite sequences of channel symbols, structured by an encoding trellis. In multilevel BCM, block component codes are combined with modulation in such a way that large minimum Euclidean distances can be obtained from component codes designed for Hamming distance. Multilevel codes can be viewed as generalized concatenated (GC) codes [2, 24].
A major advantage of GC codes is that they can be decoded by staged decoders, a family of decoding algorithms that carry out decoding in multiple stages, with decoded information from earlier stages passed on to later stages to reduce the decoding complexity [4, 14, 19, 21]. Staged decoding may be viewed as a process of successive refinement; decoding each
stage in turn refines the decoder's estimate of the decoded word. In this paper, we extend the usual staged decoding methods through various decompositions of the multilevel codes. The decompositions involve codeword partitions at both the modulation and component code level, which allow for great flexibility in obtaining various tradeoff points between coding gain and decoding complexity. We show that syndrome checking can further reduce the average decoding complexity without coding gain degradation beyond that introduced by the staged decoding method itself. Both analysis and extensive simulation results confirm that a given coding gain can be achieved with significantly smaller decoding complexity than Viterbi decoding. The techniques described may also be useful for near-optimal decoding of certain lattices obtained using a multilevel construction [1, 6, 9, 17]. The remainder of this paper is organized as follows. In Section II we give a brief summary of the concepts underlying multilevel BCM. In Section III we describe the multistage coset decoding (MSCD) algorithm and the notion of coset supercodes. We also describe MSCD/SC (MSCD with syndrome checking). In Section IV we present an analysis of the decoding complexity required by the MSCD and MSCD/SC schemes. In Section V we present an analysis of the error rate performance of these decoding algorithms. Simulation results are presented in Section VI, where the various schemes are compared on the basis of complexity and performance. In Section VII we briefly sketch how some of these techniques might be extended to the decoding of lattices. Finally, some conclusions are offered in Section VIII.
II Multilevel Block Coded Modulation

A Generalized Concatenation

Multilevel block codes can be viewed as generalized concatenated (GC) codes. Generalized concatenation is a means of constructing codes with large minimum Euclidean distances in terms of codes designed for Hamming distance, using signal set partitions. The GC framework was developed by Blokh and Zyablov [3] and later extended by Zinov'ev [24] to construct block codes with large Hamming distance. BCM was originally presented by Imai and Hirakawa [14], using a generalization of Zinov'ev's construction. It was later extended by Ginzburg [12] and many other authors [4, 7, 15, 16, 19, 21]. In addition, the GC construction can be considered as a special case of the construction of coset codes (see [5, 8, 9]) using a lattice [6].
Figure 1: Ungerboeck labelings for 8-PSK and 16-QAM constellations.

The multilevel BCM construction begins with an "Ungerboeck labeling" [11] of a basic constellation, as shown, for example, in Fig. 1 for 8-PSK and 16-QAM constellations. Let $u_1$ and $u_2$ be two constellation labels, and let $U(u_1, u_2)$ be the number of leading zeros in the modulo-two sum of $u_1$ and $u_2$. An Ungerboeck labeling is characterized by the property that the squared Euclidean distance between two distinct points with labels $u_1$ and $u_2$ is bounded by a function that grows with $U(u_1, u_2)$, i.e., $d^2(u_1, u_2) \ge \Delta_{U(u_1, u_2)}$, where $\Delta_0 \le \Delta_1 \le \Delta_2 \le \cdots$. For the unit-radius 8-PSK constellation of Fig. 1, $\Delta_0 = 2 - \sqrt{2} \approx 0.5858$, $\Delta_1 = 2$, and $\Delta_2 = 4$. Similarly, for the 16-QAM constellation, if $\Delta_0 = 1$, then $\Delta_1 = 2$, $\Delta_2 = 4$, and $\Delta_3 = 8$. The main idea of an Ungerboeck labeling is that two labels agreeing in the first $U$ positions are elements of a common subset in which the minimum squared Euclidean distance is $\Delta_U$; this is precisely Ungerboeck's "mapping by set partitioning" idea [23]. Each codeword $c$ in any block code $C$ of length $n$ over a $2^L$-point constellation with a binary Ungerboeck labeling can be represented as a coordinate array, i.e.,
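The leading-zeros property is easy to check numerically. The sketch below labels point $k$ of a unit-radius 8-PSK constellation so that labels agreeing in the first $U$ bits correspond to points congruent modulo $2^U$; this particular bit-to-point assignment is an assumption (Fig. 1's labels may be ordered differently), but any assignment with the stated property behaves the same way:

```python
import cmath
import math

def point(k):
    # Point k of a unit-radius 8-PSK constellation
    return cmath.exp(2j * math.pi * k / 8)

def label(k):
    # 3-bit label; the leading (first) bit is the coarsest partition bit.
    # This assignment is an assumption, not necessarily Fig. 1's.
    return ((k >> 0) & 1, (k >> 1) & 1, (k >> 2) & 1)

def U(l1, l2):
    # Number of leading zeros in the modulo-two sum of two labels
    u = 0
    for b1, b2 in zip(l1, l2):
        if b1 ^ b2:
            break
        u += 1
    return u

delta = [2 - math.sqrt(2), 2.0, 4.0]  # Delta_0, Delta_1, Delta_2

# Verify d^2(u1, u2) >= Delta_{U(u1, u2)} for every distinct pair of points
for k1 in range(8):
    for k2 in range(8):
        if k1 != k2:
            d2 = abs(point(k1) - point(k2)) ** 2
            assert d2 >= delta[U(label(k1), label(k2))] - 1e-12
```

Running the loop confirms the distance bound for all 56 ordered pairs of distinct 8-PSK points.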
$$
c = (c_1, c_2, \ldots, c_n) =
\begin{bmatrix}
c_{0,1} & c_{0,2} & \cdots & c_{0,n} \\
c_{1,1} & c_{1,2} & \cdots & c_{1,n} \\
\vdots & \vdots & & \vdots \\
c_{L-1,1} & c_{L-1,2} & \cdots & c_{L-1,n}
\end{bmatrix} \qquad (1)
$$
The $i$th column of the coordinate array represents the constellation label for $c_i$, the $i$th component of the codeword $c$. Although, in general, an arbitrary relationship can exist among the rows of the coordinate arrays that make up the codewords of $C$, in a multilevel code, a given coordinate array constitutes a valid codeword if and only if, for every $i \in \{0, \ldots, L-1\}$, the $i$th row of the coordinate array is a codeword of the same binary block code $C_i$. If the minimum Hamming distance of $C_i$ is $h_i$, then it follows from the properties of an Ungerboeck labeling that the minimum squared Euclidean distance $d^2(c_1, c_2)$ between any two distinct codewords $c_1$ and $c_2$ is bounded as
$$d^2(c_1, c_2) \ge \min\{h_0 \Delta_0,\ h_1 \Delta_1,\ \ldots,\ h_{L-1} \Delta_{L-1}\}. \qquad (2)$$
The various codes from which the rows of the coordinate arrays are drawn are referred to as "component" codes. We denote a multilevel code over a given $2^L$-point constellation by $C = C_0 \ast C_1 \ast \cdots \ast C_{L-1}$, where $C_i$ is the component code for the $i$th row of each coordinate array. For simplicity, we use the following notation for binary linear block codes: $(n, k, d)$ denotes a code of length $n$, dimension $k$, and minimum Hamming distance $d$; $(n, 0, \infty)$ denotes the code consisting of just the all-zero codeword; $(n, n, 1)$ denotes the code containing all possible binary $n$-tuples; $RM(r, m)$ denotes the $r$th-order Reed-Muller code of length $2^m$ [18, Ch. 13]. If $C_i$ is a specified $(n, k_i, d_i)$ code, we will find it convenient to display the number of information positions and the minimum Hamming distance of each component code, and write $C = (n, k_0, d_0) \ast \cdots \ast (n, k_{L-1}, d_{L-1})$.
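As a small worked instance of bound (2), take the three-level 8-PSK code $(8,1,8) \ast (8,7,2) \ast (8,8,1)$ used later in Section III, together with the unit-radius 8-PSK distances $\Delta_0 = 2 - \sqrt{2}$, $\Delta_1 = 2$, $\Delta_2 = 4$:

```python
import math

# h_i: minimum Hamming distances of the component codes (8,1,8), (8,7,2), (8,8,1)
h = [8, 2, 1]
# Delta_i: intra-subset squared distances for unit-radius 8-PSK
delta = [2 - math.sqrt(2), 2.0, 4.0]

# Bound (2): d^2(c1, c2) >= min_i h_i * Delta_i
d2_bound = min(hi * di for hi, di in zip(h, delta))
# min(8*(2-sqrt(2)), 2*2, 1*4) = min(4.686..., 4.0, 4.0) = 4.0
```

All three levels contribute comparably here, which is the balance a good multilevel design aims for.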
B Group Partitions

The multilevel code construction technique is a special case of a construction based on group partition chains [2, 9, 12, 16]. Notice that for a binary Ungerboeck labeling, the set of labels forms a group $G_0 = \mathbb{Z}_2^L$ under componentwise modulo-two addition. The subset $G_U$ of labels having $U$ leading zeros forms a subgroup of $G_0$ for any $U \ge 1$. In fact, these subgroups form a subgroup chain $G_0 \supseteq G_1 \supseteq \cdots \supseteq G_L$. One should note that the minimum squared Euclidean distance of the points with labels in $G_i$ is $\Delta_i$, and that any coset of $G_i$ in $G_{i-1}$ has this same minimum squared Euclidean distance. Denote by $[G_i/G_{i+1}]$ a set of coset representatives for the cosets of $G_{i+1}$ in $G_i$, and let $C_i([G_i/G_{i+1}])$ be a code of block length $n$ over this set of coset representatives. The basic multilevel code construction method can then be viewed as a direct sum of these codes, i.e.,
$$C = C_0([G_0/G_1]) + C_1([G_1/G_2]) + \cdots + C_{L-1}([G_{L-1}/G_L]) + C_L(G_L).$$
For example, when $L = 1$, $C$ is the union of $|C_0([G_0/G_1])|$ translates of $C_1(G_1)$, where $|C_0([G_0/G_1])|$ denotes the number of codewords in $C_0([G_0/G_1])$.
C Staged Decoding

The multilevel structure of MB codes facilitates the use of staged suboptimal decoding [4, 19, 21]. The concept of staged decoding for MB codes was suggested by Imai and Hirakawa [14]. Although staged decoding has the drawback of reducing the effective coding gain at practical signal-to-noise ratios (SNRs), the decoding complexity of the staged decoder can be much reduced relative to optimal maximum likelihood decoding. For an MB code $C = C_0 \ast C_1 \ast \cdots \ast C_{L-1}$, the component codes $C_i$ are decoded in sequence, as shown in Fig. 2. Since $C_0$ is the most powerful code, it is decoded first, using a maximum likelihood decoder. Then $C_1$ is decoded by assuming that $C_0$ has been decoded correctly. Decoding continues in a similar manner for the code $C_2$ and so on, until $C_{L-1}$.
Figure 2: Staged decoder block diagram.

The staged decoding procedure may be thought of as a process of successive refinement or "coarse-fine decoding." Before decoding begins, the set of decoding candidates is the entire set of codewords in the multilevel code. By decoding the first component code, the decoder obtains a coarse estimate of the value of the final decoding decision; in fact, the decoder has narrowed the set of decoding candidates to those codewords with a specific first row in their coordinate array. Successive decodings narrow the set of decoding candidates further, until a final decision is made. The decoder for $C_i$ operates under the assumption that previous decoding is correct, and that all rows of the coordinate array at depths less than $i$ have been filled in correctly by the decoders for $C_j$, $j < i$. The decoder for $C_i$ also operates under the assumption that no constraints are placed on the rows of the coordinate arrays at depths greater than $i$, i.e., that $C_j = (n, n, 1)$ for $j > i$. Clearly these assumptions are invalid in general, and hence staged decoding is suboptimal. The advantage of staged decoding is, of course, a reduction in decoding complexity. Roughly speaking, the maximum number of trellis states at any coordinate for a multilevel code is the product of the number of states found in the trellises of the component codes at that coordinate; hence, Viterbi-based maximum likelihood decoding has a complexity roughly proportional to the product of the decoding complexities of the individual component codes. The complexity of staged decoding, on the other hand, is the sum of the decoding complexities of the individual component codes, which is usually much smaller than the product of complexities.
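In outline, staged decoding is a single pass over the component decoders, each conditioned on the rows already decided. The sketch below is structural only; the stage decoder functions are placeholders, not the authors' implementations:

```python
def staged_decode(received, component_decoders):
    # Decode C0, C1, ..., C_{L-1} in sequence; each stage sees the rows
    # already decided by earlier stages and assumes they are correct.
    decided_rows = []
    for decode_stage in component_decoders:
        row = decode_stage(received, decided_rows)
        decided_rows.append(row)
    return decided_rows  # the decoded coordinate array, row by row
```

A maximum-likelihood decoder would search all rows jointly; the staged pass trades that joint search for $L$ smaller searches, which is the source of the sum-versus-product complexity contrast described above.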
III Coset Decoding of Multilevel Block Codes

In this section we generalize the staged decoding procedure described in the previous section. Our generalization makes it possible to decompose the individual component codes in a BCM scheme, thereby opening a wide spectrum of possible complexity/performance tradeoff points for multistage decoders.
A Trellis Contraction/Expansion

Consider the 8-PSK code $C = (8, 1, 8) \ast (8, 7, 2) \ast (8, 8, 1)$. A trellis diagram for this code is shown in the upper portion of Fig. 3. A multistage decoder operates by decoding the trellises shown in the lower portion of Fig. 3. The first decoding resolves the first bit (denoted by `z') of the branch labels used in the second decoding.
Figure 3: A staged decoder operates by contracting a trellis (combining states) and later re-expanding to resolve ambiguities.

We observe that this multistage decoding algorithm operates by following a trellis contraction/expansion cycle (or process of successive refinement), as follows:
1. Contract: The trellis diagram is reduced by combining states, and the reduced trellis is decoded. Each branch in the reduced trellis in general represents many possible words, not all of which are valid codewords.
2. Expand: The path decoded in step 1, which represents many possible words, is expanded into another trellis, which can be used to find a codeword among these possibilities.
Note that the reduced trellis obtained in the contraction phase represents a supercode of the original code, i.e., a code with a simple trellis diagram that contains the original code as a proper subset. This observation is the basis for our generalization of the multistage decoding procedure. It is possible not only to contract/expand the trellis diagrams at the boundaries imposed by the signal set partitions, but also to contract/expand the trellis diagrams of the component codes themselves. This will, in general, lead to a multistage decoding procedure with more decoding stages than component codes. Often, the decoding complexity can be greatly reduced.
B Algebraic Description

The contraction/expansion procedure outlined above has a simple algebraic interpretation. Let $C$ be a two-level block code based on the group partition $G/H$, i.e., let $C = C_0([G/H]) + C_1(H)$. Recall that $C$ is the union of $|C_0([G/H])|$ translates of the code $C_1(H)$. One way to achieve maximum likelihood decoding of $C$ is to perform the following steps:
1. Find the best translate $c_0 + C_1(H)$ of $C_1(H)$, i.e., identify the translate of $C_1(H)$ that contains the maximum likelihood codeword;
2. Find the best codeword $c_0 + c_1$ from within the translate $c_0 + C_1(H)$ selected in step 1.
Of course, to perform step 1 optimally, the structure of $C_1(H)$ must be taken into account, and step 1 amounts to a separate decoding of each translate of $C_1(H)$. The total complexity is equal to $|C_0([G/H])| \times$ (complexity of decoding $C_1(H)$) $+\ (|C_0([G/H])| - 1)$ for the final comparison among
translates. Step 2 is then trivial. If the complexity of decoding the subcode $C_1(H)$ is high and/or $|C_0([G/H])|$ is large, the number of computations required will be impractical. The trellis contraction procedure described above can be interpreted as a means of selecting the best translate in step 1 suboptimally, using a coset supercode of $C$, rather than $C$ itself. A coset supercode $S$ of $C$ is obtained by augmenting $C_1(H)$ with additional "pseudo-codewords," while maintaining the same set of translation vectors, i.e., $S$ is a supercode of $C$ if $S = C_0([G/H]) + S'(H)$, where $S'(H) \supseteq C_1(H)$. Since $S'(H)$ is made up of more codewords than $C_1(H)$, it is possible for $S'$ to have a much simpler trellis structure (with fewer states) than $C_1$. In short, the coset supercode $S$ is obtained by adding "pseudo-codewords" to $C$ so that $S$ can then be represented by a simpler trellis diagram. To maintain asymptotic coding gain performance, coset supercodes that have the "intercoset distance preserving property" are of special interest. The interset distance between two nonempty sets $H_1$ and $H_2$ is defined as
$$s(H_1, H_2) = \min\{d(h_1, h_2) : h_1 \in H_1,\ h_2 \in H_2\}.$$
The minimum interset distance for a nonempty collection of disjoint sets $H = \{H_1, H_2, \ldots\}$ is defined as
$$x(H) = \min\{s(H_i, H_j) : H_i, H_j \in H,\ H_i \ne H_j\},$$
where $x(H) = \infty$ if $H$ contains just one element. If $C = C_0([G/H]) + C'(H)$, the minimum interset distance between the translates of $C'(H)$ whose union is $C$ is denoted $x(C/C')$. Let $S = C_0([G/H]) + S'(H)$ be a coset supercode of $C = C_0([G/H]) + C'(H)$. The coset supercode $S$ is said to have the intercoset distance preserving property if
$$x(S/S') = x(C/C'). \qquad (3)$$
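The interset-distance definitions translate directly into code. This sketch uses Hamming distance on binary tuples; the function names are mine, not the paper's:

```python
from itertools import combinations

def hamming(a, b):
    # Hamming distance between two equal-length binary tuples
    return sum(x != y for x, y in zip(a, b))

def s(H1, H2):
    # Interset distance: s(H1, H2) = min d(h1, h2) over h1 in H1, h2 in H2
    return min(hamming(h1, h2) for h1 in H1 for h2 in H2)

def x(collection):
    # Minimum interset distance over all distinct pairs of sets;
    # infinite when the collection contains just one set
    if len(collection) < 2:
        return float('inf')
    return min(s(Hi, Hj) for Hi, Hj in combinations(collection, 2))
```

For example, for the two translates {000, 111} and {100, 011} of the length-3 repetition code, the interset distance is 1, which is exactly the intercoset distance a supercode must preserve.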
Since $C \subseteq S$, it follows that $d[S] \le d[C]$; hence we expect that the decoding of $S$ will be less
reliable than that using $C$. However, since it is always true that $x(C/C') \ge d[C]$, if (3) holds, it follows that
$$x(S/S') \ge d[C]. \qquad (4)$$
Since the coset supercode $S$ is used to assist the selection of the best coset rather than actually selecting the decoded path, keeping the intercoset distance no less than the minimum distance of $C$ implies that, asymptotically (at high SNRs), the selection of the best coset using the coset supercode $S$ is no worse than that using $C$. Hence, the suboptimal scheme will have the same asymptotic decoding performance as the optimal algorithm. Since $S'$ is chosen to have a simpler trellis structure than $C'$, the computational effort required to decode $S$ is smaller than that required to decode $C$. This motivates the following two-step reduced-complexity (and suboptimal) decoding procedure:
1. Decode $S$ to find the translate $c_0 + S'(H)$ containing the best codeword in $S$.
2. Find the best codeword $c$ in the translate $c_0 + C'(H)$.
The decoded codeword for code $C$ is then given by $c$. In general, if the decoding procedure is applied recursively for a nested partition chain, a multistage coset decoding (MSCD) algorithm can be formulated. MSCD combines decompositions of the component codes with decompositions at the coordinate array level. The technique is a variant of the closest coset decoding method for $(u|u+v)$ codes presented by Hemmati [13], and of the staged decoding for decomposable block codes proposed by Takata et al. [22].
C Example

Consider the 8-PSK code $C = RM(0,4) \ast RM(2,4) \ast RM(3,4)$, or equivalently, $C = (16, 1, 16) \ast (16, 11, 4) \ast (16, 15, 2)$. The usual multistage decoding procedure would operate by using a 3-level decomposition of this code, i.e., by decoding the component codes in succession. We can obtain a 4-level decomposition of this code by decomposing the (16,11,4) code itself.
(16,11,4) = $|(4,4)/(4,3)/(4,1)|^4$
(16,12,2) = $|(4,4)/(4,3)/(4,2)|^4$
(16,7,4) = $|(4,2)/(4,2)/(4,1)|^4$

Figure 4: A staged decoder for the (16,11) RM code operates by decoding a (16,12) supercode in the first stage and a (16,7) subcode in the second stage.
The (16,11,4) RM code has a four-section trellis diagram as shown in Fig. 4, obtained by the iterated squaring construction $|(4,4)/(4,3)/(4,1)|^4$ as described in [9]. If adjacent states are merged (as indicated in Fig. 4), the trellis diagram for a (16,12,2) coset supercode is obtained. This four-section trellis can also be obtained via the $|(4,4)/(4,3)/(4,2)|^4$ iterated squaring construction. The trellis diagram for a (16,7,4) code to be used in the expansion phase of decoding is obtained by tracing the all-zeros path in the (16,12) code trellis, or by a $|(4,2)/(4,2)/(4,1)|^4$ construction. Fig. 4 illustrates this decomposition. (See also [13, Fig. 1].) It can be checked that this decomposition preserves the intercoset distance, as described above. Decoding of the multilevel code $C$ can now be carried out in four stages: in stage 0, the $(16,1) \ast (16,16) \ast (16,16)$ supercode is decoded; in stage 1, the $(16,0) \ast (16,12) \ast (16,16)$ supercode of the $(16,0) \ast (16,11) \ast (16,16)$ code is decoded; in stage 2, the $(16,0) \ast (16,7) \ast (16,16)$ subcode of the $(16,0) \ast (16,11) \ast (16,16)$ code is decoded; finally, in stage 3, the $(16,0) \ast (16,0) \ast (16,15)$ component code is decoded. The corresponding trellis diagram for each stage is illustrated in Fig. 5. The product of the maximum number of states in each trellis ($2 \times 4 \times 2 \times 2 = 32$) is equal to the maximum number of states required by maximum likelihood decoding, as expected. This four-stage procedure requires a maximum of $2 + 4 + 2 + 2 = 10$ states.
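The state-count arithmetic at the end of this example is easy to tabulate, and the same two numbers quantify the product-versus-sum complexity contrast of Section II-C:

```python
from math import prod

# Max trellis states per stage in the 4-stage MSCD example
stage_states = [2, 4, 2, 2]
assert prod(stage_states) == 32  # states needed by one-shot ML (Viterbi) decoding
assert sum(stage_states) == 10   # max states needed by the staged decoder
```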
D MSCD with Syndrome Checking

In this section we describe a variation of the MSCD algorithm that can reduce the average number of computations, without further reduction in coding gain. When the MSCD algorithm is used to decode $C$, the coset supercode $S$ contains all the valid codewords of $C$ among its codewords, since $C \subseteq S$. When decoding $S$, it is quite possible that an element of $C$ is chosen as the decoded word. In this case, no further decoding steps are needed, since no other path can be better than this decoded path when the exact subcode $C'$ of $C$ is considered in the second stage. Since not all stages are needed before obtaining the decoded path, the average complexity is reduced.
Parallel transitions per branch: stage 0: 256; stage 1: 64; stage 2: 32; stage 3: 8.
Figure 5: Trellis diagrams for the 4-stage MSCD example. The number of parallel transitions represented by each branch is indicated.

An obvious method to check whether the decoded sequence is a valid codeword is to use syndrome checking, as in classical coding theory. The syndrome for each row in the decoded coordinate array is checked before declaring the sequence to be a valid codeword. However, if some component codes have already been fully decoded after a certain stage, only the remaining (undecoded) rows need to be checked. We refer to an MSCD algorithm that uses syndrome checking as MSCD/SC. The average computational complexity required to implement syndrome checking of the block component codes is usually much smaller than that required to decode each component code. Therefore, MSCD/SC can be very effective in reducing the average decoding complexity. Moreover, there is no coding gain degradation relative to MSCD without syndrome checking. The number of stages required to achieve final decoding will depend on the SNR. Intuitively, the higher the SNR, the more likely a received word will pass the syndrome checking tests after relatively few (one or two) stages of decoding. Thus we expect MSCD/SC to be especially effective at high SNR. Simulation results presented in Section VI support this conclusion.
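The syndrome test itself is only modulo-two arithmetic. A minimal sketch (the function name is mine; the parity-check matrix shown is for the $(8,7,2)$ single-parity-check component code):

```python
def syndrome_is_zero(H, r):
    # r is a valid codeword of the code with parity-check matrix H
    # iff every parity check sums to zero modulo two.
    # Cost: one add/compare per one in H, as quantified in Section IV-C.
    return all(sum(h & b for h, b in zip(row, r)) % 2 == 0 for row in H)

# (8,7,2) single-parity-check code: one all-ones check row
H_spc = [[1] * 8]
```

An even-weight word passes the check and terminates MSCD/SC early; an odd-weight word fails it, and decoding proceeds to the next stage.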
The price to be paid for this reduction in average decoding effort is, of course, a variability in decoding effort. If a system is designed to take advantage of the reduction in average decoding effort, buffering schemes similar to those required in sequential decoding must be implemented to store incoming blocks succeeding those blocks that require a greater-than-average time to decode.
IV Decoding Complexity In this section, we estimate the complexity required to decode a multilevel code using MSCD and MSCD/SC. We focus our attention on Reed-Muller multilevel block (RMMB) codes, i.e., multilevel codes designed using Reed-Muller component codes. Such codes can be represented by simple and regular trellis diagrams with relatively few states, and they can be decoded using the iterative procedure based on the squaring construction as described in [9]. The coset supercodes and subcodes used in MSCD all have a squaring construction description, and so all can be decoded in this manner.
A BSCS

The first step in decoding any multilevel coding scheme is to compute the branch metric values that will be needed at each stage of the decoding process. This branch metric computation, which we call "basic sub-constellation selection" (BSCS), can be accomplished by using a tree of metric comparisons, as shown in Fig. 6 for an 8-PSK constellation, in which the smaller metric value is promoted to the next round of comparisons. The branch metric values labeled "0XX" and "1XX" are usually required in the first stage of decoding, the metric values labeled "00X," "01X," etc., are required in the second stage of decoding, and so on. The total number of comparisons needed for BSCS in a constellation with $M$ points is $M - 2$, when $M$ is a power of two. For a code of length $n$, BSCS is performed at each coordinate; hence the total number of comparisons required for BSCS is $n(M - 2)$.
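The comparison tree can be sketched as a tournament over binary-labeled leaves: each round halves the number of subsets and promotes the smaller metric, for $M - 2$ comparisons in total. This is a sketch under my own conventions (labels as plain bit strings, prefixes naming subsets):

```python
import math

def bscs(metrics):
    # metrics[k] is the branch metric of the constellation point with
    # binary label k.  Returns (best, comparisons), where best maps each
    # label prefix (sub-constellation) to the smallest metric it contains.
    L = int(math.log2(len(metrics)))
    current = {format(k, '0%db' % L): m for k, m in enumerate(metrics)}
    best, comparisons = {}, 0
    while len(current) > 2:
        merged = {}
        for lab, m in current.items():
            prefix = lab[:-1]  # drop the last (finest) partition bit
            if prefix in merged:
                merged[prefix] = min(merged[prefix], m)
                comparisons += 1
            else:
                merged[prefix] = m
        best.update(merged)
        current = merged
    best.update(current)
    return best, comparisons
```

For 8-PSK ($M = 8$) this performs 4 comparisons to produce the "00X"-type metrics and 2 more for "0XX"/"1XX", for $M - 2 = 6$ in total.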
Figure 6: Branch metrics for multistage decoding can be obtained via "basic sub-constellation selection" (BSCS), as shown here for an 8-PSK constellation.
B Complexity of MSCD

Given an RMMB code, the complexity of decoding by MSCD in $s$ stages can be estimated as follows:
1. BSCS is used to compute the metric of each bit in the accompanying trellis, which requires a total of $\mathrm{BSCS} = n(M - 2)$ binary operations, as described above.
2. For each stage of decoding, a four-section trellis based on the two-level squaring construction [9] is constructed, as illustrated in Fig. 7. Assuming that this trellis represents $|G/H/I|^4$ (in the notation of [9]), with $|G/H| = M$ and $|H/I| = N$, the complexity of decoding the 4-section trellis, given the branch metrics, can be computed from the formula $4MN^2 - 1$ [9]. We denote this component of the complexity at the $i$th decoding stage by $T(i)$, where $0 \le i \le s - 1$ and $s$ is the number of stages required.
3. The branch metrics are computed recursively, as follows. If the RMMB code is of length $2^e$, where $e \ge 2$ is an integer, then each section of the 4-section trellis is of length $2^{e-2}$ symbols. Based on an iterated squaring construction, the branch metrics are computed so that $2^0$-tuples are first decoded, then $2^1$-tuples, and so forth, until finally $2^{e-2}$-tuples are decoded after $e - 1$ steps. Suppose the $2^t$-tuples have the partition chain $G_0/G_1/\cdots/G_m$, and the corresponding $2^{t+1}$-tuples have the partition chain $G_{1,0}/G_{1,1}/\cdots/G_{1,m-1}$ after the squaring
construction. For each $2^{t+1}$-tuple, the complexity to select the best element and to find the corresponding metric is given by $(2|G_{m-1}/G_m| - 1)\,|G_{1,0}/G_{1,m-1}|$. We denote this component of the complexity at the $i$th decoding stage by $B(i)$, for $0 \le i \le s - 1$.
4. The total decoding complexity is found by adding the complexity required in the above steps, i.e.,
$$\mathrm{Complexity} = \mathrm{BSCS} + \sum_{i=0}^{s-1} \left[ T(i) + B(i) \right]. \qquad (5)$$
Figure 7: The 4-section trellis diagram for $|G/H/I|^4$: $M$ groups of $N$ states.

Example: Consider the 8-PSK RMMB code $C = (16, 1) \ast (16, 11) \ast (16, 15)$ used in the previous example, and decoded using the 4-stage MSCD procedure described earlier. The total decoding
complexity can be estimated as follows:
$\mathrm{BSCS} = n(M - 2) = 96$, since $M = 8$ for 8-PSK and $n = 16$ is the length of the MB code.
Stage 0: The coset supercode $S_0 = (16, 1) \ast (16, 16) \ast (16, 16)$ is decoded, for which $T(0) = 4MN^2 - 1 = 7$ (since $M = 2$ and $N = 1$), and $B(0) = 16$ (2-tuples) $+\ 8$ (4-tuples) $= 24$.
Stage 1: The coset supercode $S_1 = (16, 0) \ast (16, 12) \ast (16, 16)$ is decoded, for which $T(1) = 4MN^2 - 1 = 31$ (since $M = 2$ and $N = 2$), and $B(1) = 48$ (2-tuples) $+\ 16$ (4-tuples) $= 64$.
Stage 2: The coset supercode $S_2 = (16, 0) \ast (16, 7) \ast (16, 16)$ is decoded, for which $T(2) = 4MN^2 - 1 = 15$ (since $M = 1$ and $N = 2$), and $B(2) = 0$ (2-tuples) $+\ 24$ (4-tuples) $= 24$.
(No operations are required to decode at the 2-tuple level since this was done in the previous decoding step.)
Stage 3: The subcode $C_3 = (16, 0) \ast (16, 0) \ast (16, 15)$ is decoded, for which $T(3) = 4MN^2 - 1 = 15$ (since $M = 1$ and $N = 2$), and $B(3) = 48$ (2-tuples) $+\ 24$ (4-tuples) $= 72$.
The total complexity is 348 binary operations, as summarized in Table I. The same RMMB code $C$ can also be decoded in one stage by the Viterbi algorithm, or with a 3-stage multistage decoder which decodes the component codes one by one. For comparison, the decoding complexity of these two algorithms is also summarized in Table I.
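The tally over the four stages can be checked mechanically, using the stage costs computed above:

```python
# Complexity tally for the 4-stage MSCD example of Section IV-B
n, M = 16, 8
bscs = n * (M - 2)        # 96 branch-metric comparisons
T = [7, 31, 15, 15]       # trellis decoding cost 4MN^2 - 1 per stage
B = [24, 64, 24, 72]      # branch-metric recursion cost per stage
total = bscs + sum(t + b for t, b in zip(T, B))
assert total == 348       # binary operations, matching equation (5)
```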
C Average Complexity of MSCD/SC

To estimate the average complexity of decoding using MSCD/SC, two factors must be taken into account: first, the complexity of computing the syndromes; second, the fact that not all decoding stages are always needed. Syndrome checking is used after each stage of decoding except the last. We estimate the complexity of syndrome checking as follows. For each component code $C_i$ of an $L$-level MB code $C$ of length $n$, there exists a parity check matrix $H_i$, where $0 \le i \le L - 1$. Each $H_i$ is an $(n - k_i) \times n$ matrix, where $k_i$ is the number of information bits of the $i$th component code. To carry out syndrome checking, for each row of $H_i$, the data bits which correspond to the locations of ones in $H_i$ are added together and then compared with 0. In other words, if there are $t$ ones in a row of $H_i$, the number of binary operations required is $t$ ($t - 1$ additions and 1 comparison). Consequently, the complexity of checking component code $C_i$ is equal to the number of ones in $H_i$. The total complexity for checking the MB code is obtained by summing the complexity required for each component code. However, as described before, after a certain stage of decoding, component codes already decoded must be valid, and syndrome checking for those component codes is not required. The number of binary operations required for syndrome checking at the $i$th stage is denoted by $S(i)$ for $0 \le i \le s - 1$, where $S(s - 1) = 0$.
To take the second factor into account, we need the proportion of received sequences that require decoding in each stage. Let $P(i)$ denote the probability that a received word requires stage-$i$ decoding, for $0 \le i \le s - 1$; $P(i)$ can be found by simulation. It should be noted that $P(0) = 1$, since the first stage is required for all received sequences. The average complexity for decoding an RMMB code using MSCD/SC may then be written as
$$\mathrm{Complexity} = \mathrm{BSCS} + \sum_{i=0}^{s-1} \left[ T(i) + B(i) + S(i) \right] P(i).$$
This complexity measure depends on the SNR, since $P(i)$ is a function of the SNR.
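The average-complexity formula is a probability-weighted version of (5); a sketch with hypothetical stage-occupancy probabilities and syndrome costs (the $P(i)$ and $S(i)$ values below are illustrative only, not simulated or taken from the paper):

```python
def mscd_sc_average(bscs, T, B, S, P):
    # Average MSCD/SC complexity: BSCS + sum_i [T(i)+B(i)+S(i)] * P(i).
    # P(0) = 1 (stage 0 always runs); S(s-1) = 0 (no check after the
    # last stage).
    assert P[0] == 1 and S[-1] == 0
    return bscs + sum((t + b + s) * p for t, b, s, p in zip(T, B, S, P))

# Illustrative values only: at high SNR most blocks stop after stage 0
avg = mscd_sc_average(96, [7, 31, 15, 15], [24, 64, 24, 72],
                      [20, 20, 15, 0], [1.0, 0.3, 0.2, 0.2])
```

With these hypothetical numbers the average drops well below the worst-case 348 operations of the previous example, which is the effect syndrome checking is designed to produce.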
V Performance Analysis In this section, we derive simple analytical expressions for the block-error rate performance of our decoding procedures. The results of this performance analysis are compared with simulation results in the next section.
A General Considerations

To estimate the performance of MSCD on the Gaussian channel, we use the following union bound approximation. Let $d^2[C]$ denote the minimum squared Euclidean distance between distinct codewords of a multilevel block code $C$, and let $N_z(d[C])$ denote the "error coefficient" for the codeword $z$, i.e., the number of codewords separated from $z$ by a Euclidean distance of $d[C]$. Let $N(d[C])$ denote the error coefficient averaged over all codewords, assuming all codewords are equally likely. Then, by standard arguments [20], the average probability of block error under maximum-likelihood decoding (denoted $P_e(C)$) is approximated (at high SNR) by
$$P_e(C) \approx N(d[C])\, Q\!\left( \sqrt{ \frac{d^2[C]}{N_0} } \right),$$
where Q(·) is the Gaussian tail distribution function, defined as

    Q(x) = (1/√(2π)) ∫_x^∞ e^(−u²/2) du.    (6)
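For concreteness, here is the union-bound approximation in code. Q(x) is expressed through the standard library's complementary error function; the 2N₀ normalization inside the square root is the standard AWGN pairwise-error form (as in Proakis [20]) and is an assumption of this sketch rather than a quotation of the paper's formula.

```python
import math

def Q(x):
    """Gaussian tail function of equation (6): Q(x) = 0.5*erfc(x/sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def block_error_estimate(N_d, d_squared, N0):
    """High-SNR union-bound approximation
    P_e(C) ~ N(d[C]) * Q(sqrt(d^2[C] / (2*N0))).
    The 2*N0 normalization is assumed here (standard AWGN pairwise form)."""
    return N_d * Q(math.sqrt(d_squared / (2 * N0)))

print(Q(0.0))                                # 0.5 by symmetry
print(block_error_estimate(1240, 8.0, 0.5))  # illustrative numbers
```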
We define the rate R[C] of C, measured in bits per channel use, as the number of information bits per signal block normalized by the block length, i.e.,

    R[C] = (1/n) log₂ |C|,

where n is the length of the code C and |C| is the total number of codewords in C. Let Es denote the average symbol energy and Eb the average bit energy, where Es = Eb · R[C]. Consider the 8-PSK constellation shown in Fig. 1. Using set-partitioning techniques, the constellation is partitioned into QPSK and BPSK sub-constellations on the first and second partition levels, respectively. In the basic constellation, each point has n0 = 2 nearest neighbours. In the first-level QPSK signal set, each point has n1 = 2 nearest symbols, while in the second-level BPSK signal set the number of nearest symbols is n2 = 1. In the 16-QAM constellation of Fig. 1, the number of nearest symbols at each partition level depends on the chosen reference symbol. In the basic constellation, assuming equiprobable symbols, each symbol has an average of n0 = 3 nearest neighbours. Similarly, in the sub-constellations obtained by set partitioning, we have n1 = 9/4, n2 = 2, n3 = 1.
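The quoted nearest-neighbour counts can be verified by brute force. The sketch below (our code) rebuilds the 8-PSK partition chain and the 16-QAM constellation and counts minimum-distance neighbours, reproducing n0 = 2, n1 = 2, n2 = 1 for 8-PSK and the averages n0 = 3 and n1 = 9/4 for 16-QAM.

```python
import cmath

def nearest_neighbours(points):
    """Average number of minimum-distance neighbours over a constellation."""
    counts = []
    for p in points:
        dists = sorted(abs(p - q) for q in points if q != p)
        dmin = dists[0]
        counts.append(sum(1 for d in dists if d < dmin + 1e-9))
    return sum(counts) / len(counts)

# 8-PSK and its set-partitioning chain: 8-PSK -> QPSK -> BPSK.
psk8 = [cmath.exp(2j * cmath.pi * k / 8) for k in range(8)]
qpsk, bpsk = psk8[::2], psk8[::4]

# 16-QAM on the grid {-3,-1,1,3}^2, and one first-level subset (x+y = 0 mod 4).
qam16 = [complex(x, y) for x in (-3, -1, 1, 3) for y in (-3, -1, 1, 3)]
qam_sub = [p for p in qam16 if (p.real + p.imag) % 4 == 0]

print(nearest_neighbours(psk8), nearest_neighbours(qpsk), nearest_neighbours(bpsk))
print(nearest_neighbours(qam16), nearest_neighbours(qam_sub))  # 3.0 and 2.25 (= 9/4)
```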
B Maximum-Likelihood Decoding

Consider an MB code C = C0 × C1 × ⋯ × CL of length n, where Ci is the ith-level (n, ki) component code. Let Ni(h) be the number of codewords of the component code Ci with Hamming weight h. For a linear binary block component code, since the path multiplicity is independent of the reference transmitted sequence, we take the codeword 0 as the reference sequence for simplicity. Since all the nearest symbols for the reference symbol on the constellation have bit
label 1, only those bit segments where the code sequences have label 1 give rise to an increase in path multiplicity at the minimum distance. For an error sequence occurring at the ith-level component code Ci, there are ni ways to mistake the correct sequence at every bit segment of label one of the error sequence. Given an error sequence at minimum squared Euclidean distance d[C], and assuming the minimum Hamming distance of the component code Ci that can contribute to such an error sequence is hi, the total multiplicity contributed by the component code Ci is ni^hi · Ni(hi), assuming all the component codes at levels greater than i are uncoded. Given that errors occurring in component code Ci can contribute to error sequences of code C at d[C], the total path multiplicity at d[C] for MLD is given by
    N(d[C]) = Σ_{i=0}^{L} ni^hi · Ni(hi) · 2^{Σ_{j=i+1}^{L} (kj − n)}.    (7)
The factor 2^{Σ_{j=i+1}^{L} (kj − n)} is a correction factor for coded component codes at levels greater than i, which accounts for the actual proportion of sequences at the minimum distance.

Example: Consider the 8-PSK RMMB code C = (16,1) × (16,11) × (16,15) of length 16, considered in previous examples. Assume the code C is decoded by maximum-likelihood decoding. The error sequences due to errors occurring in the component code C0 have minimum squared Euclidean distance 0.586A² · 16 = 9.376A². This is greater than the minimum squared Euclidean distance of the code C (= 8A²); thus, errors in C0 do not contribute to minimum-distance error sequences. Only errors occurring in C1 or C2 result in error sequences of distance 8A². The minimum Hamming distances of component codes C1 and C2 are 4 and 2, respectively. The total numbers of codewords at the corresponding Hamming distances [18] are N1(4) = 140 and N2(2) = 120. Using equation (7), the total path multiplicity at d[C] (= 8A²) for maximum-likelihood decoding is given by

    N(8A²) = 2⁴ · N1(4) · 2^(15−16) + 1² · N2(2) · 2^(16−16) = 1240.    (8)
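Equation (7) and the worked example are easy to check numerically. The sketch below is ours; each level contributing at d[C] is described by a tuple (ni, hi, Ni(hi), correction exponent), where the exponent is Σ_{j>i} (kj − n).

```python
def mld_multiplicity(levels):
    """Equation (7): sum over contributing levels of
    (n_i ** h_i) * N_i(h_i) * 2 ** sum_{j>i}(k_j - n)."""
    return sum((n_i ** h_i) * N_i * 2.0 ** corr for n_i, h_i, N_i, corr in levels)

# Worked 8-PSK example, C = (16,1) x (16,11) x (16,15):
# level 1: n1 = 2, h1 = 4, N1(4) = 140, exponent k2 - n = 15 - 16 = -1;
# level 2: n2 = 1, h2 = 2, N2(2) = 120, exponent 0 (last level).
levels = [(2, 4, 140, -1), (1, 2, 120, 0)]
print(mld_multiplicity(levels))  # 16*140/2 + 120 = 1240.0
```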
C Multistage Decoding

Using multistage decoding, the component codes are decoded in sequence. For an MB code C = C0 × C1 × ⋯ × CL of length n, the decoding involves L + 1 stages. The probability of block error P(e) can be split into L + 1 different terms, one for each stage of decoding. Let Pi(e) be the probability of the event that a block error occurs in decoding the ith-level component code Ci with no error in prior levels. Then P(e) is given by
    P(e) = P0(e) + [1 − P0(e)] · P1(e) + ⋯ + [1 − P0(e) − ⋯ − P_{L−1}(e)] · PL(e).    (9)
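Numerically, the exact staged expression of equation (9) and the plain sum of the stage error probabilities are nearly indistinguishable when each Pi(e) is small, which is the high-SNR regime where the approximation is used. A small sketch (our code, with arbitrary small stage probabilities):

```python
def staged_block_error(stage_probs):
    """Exact expression of equation (9): each stage error is weighted by
    1 - P0(e) - ... - P_{i-1}(e), the probability of reaching that stage."""
    reach, total = 1.0, 0.0
    for p in stage_probs:
        total += reach * p
        reach -= p   # now 1 - P0(e) - ... - Pi(e)
    return total

probs = [1e-3, 5e-4, 2e-4]        # hypothetical stage error probabilities
print(staged_block_error(probs))  # exact: about 0.0016992
print(sum(probs))                 # simple union bound: 0.0017
```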
At high SNRs, all the terms [1 − P0(e)], …, [1 − P0(e) − ⋯ − P_{L−1}(e)] are less than, but close to, 1; hence the simple union bound P(e) ≤ P0(e) + P1(e) + ⋯ + PL(e) is obtained. Provided that errors occurring at the ith-level component code Ci can contribute to error sequences at minimum squared Euclidean distance d[C], the total path multiplicity at d[C] for multistage decoding is approximated by
    N(d[C]) ≈ Σ_{i=0}^{L} ni^hi · Ni(hi).    (10)
Example: Consider the same 3-level RMMB code C as in the previous example. The code is decoded by a 3-stage multistage decoder which decodes the component codes one by one. Using equation (10), the total path multiplicity at d[C] (= 8A²) for multistage decoding is approximated by

    N(8A²) ≈ 2⁴ · N1(4) + 1² · N2(2) = 2360.    (11)
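In code, the multistage multiplicity of equation (10) simply drops the 2^{kj − n} correction factors of the MLD expression, so it can only be larger. A sketch (ours), reproducing the value 2360:

```python
def multistage_multiplicity(levels):
    """Equation (10): sum over contributing levels of (n_i ** h_i) * N_i(h_i)."""
    return sum((n_i ** h_i) * N_i for n_i, h_i, N_i in levels)

# Same 8-PSK example as above: 2**4 * 140 + 1**2 * 120
print(multistage_multiplicity([(2, 4, 140), (1, 2, 120)]))  # 2360
```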
D MSCD

Since MSCD is a generalization of multistage decoding, the derivation of the path multiplicity for MSCD is similar. Consider the RMMB code C = C0 × C1 × ⋯ × CL of length n. For those stages which involve partitioning the code C into just the component codes Ci, the path multiplicity
can be obtained in the same manner as illustrated before. For stages which involve partitioning the code C into cosets within Reed-Muller component codes, the path multiplicity can be derived as follows. We assume that errors occurring at the ith-level RM component code Ci can contribute to error sequences at minimum squared Euclidean distance d[C], and that the component code Ci can be decoded in stages by modifying Ci into coset supercodes. Let Ni,j(h) be the number of codewords in the jth coset supercode of component code Ci with Hamming weight h that are not in the same coset as the codeword 0 (the reference sequence). Then the total path multiplicity at d[C] for MSCD is approximated by
    N(d[C]) ≈ Σ_{i=0}^{L} Σ_j ni^hi · Ni,j(hi).    (12)
Example: Consider the same 3-level RMMB code C as in previous examples, decoded by a 4-stage MSCD procedure. The minimum Hamming distances of component codes C1 and C2 are 4 and 2, respectively. The component code C1 is decoded in two stages. The numbers of codewords of the first and second coset supercodes of component code C1 that are at Hamming distance 4 and in a coset different from that of the reference sequence are given by N1,1(4) and N1,2(4), respectively. We have N1,1(4) = 224, N1,2(4) = 28, and N2,1(2) = 120. Using equation (12), the total path multiplicity at d[C] (= 8A²) for MSCD is approximated by

    N(8A²) ≈ 2⁴ · N1,1(4) + 2⁴ · N1,2(4) + 1² · N2,1(2) = 4152.    (13)
VI Simulation Results

Monte Carlo simulations of block-encoded 8-PSK and 16-QAM with Reed-Muller component codes, using various suboptimal decoding algorithms, were performed. For consistency with other recent papers [15, 22] on multilevel block codes and with our performance analysis, the performance
measure of interest was the block error rate. We assume that if any errors are detected in a block, the whole block is considered an erasure and must be retransmitted. An approximation to the bit error rate can be obtained by assuming that half of the information bits in an incorrectly decoded block are in error. The MB codes selected for simulation are shown in Table II for 8-PSK and Table III for 16-QAM. The decoding methods are indicated by the corresponding MSCD-oriented coset decompositions shown in Table IV for coded 8-PSK and Table V for coded 16-QAM. These codes were chosen so that the number of states in each stage is practical for simulations and so that the intercoset distance-preserving property described before is maintained. For simplicity, each method is identified by the notation MX(Y), where Y is the code used and X is the identification index. The corresponding path multiplicity and required number of trellis states for each method are given in Table VI for coded 8-PSK and Table VII for coded 16-QAM. The probability of block error is plotted against SNR. The Pb(e) for the MSCD algorithm and the MSCD/SC algorithm is the same. To calculate the complexity of MSCD/SC, which depends on the SNR, it is necessary to determine the proportion of received sequences that require stage-i decoding. This number, denoted P(i), is also plotted in the simulation results for i > 0. (Note that P(0) = 1, since all received sequences require the first stage.) For all simulations, as a basis for comparison, the error rates for blocks of uncoded QPSK and uncoded 8-AMPM symbols containing a comparable number of information bits are presented. To compare the decoding complexities of codes at different block lengths and different spectral efficiencies, the complexity is normalized to the number of binary operations per information bit. Figures 8, 9 and 10 show the performance of decoding code C16P,a by MLD, a 3-staged, and a 4-staged MSCD, respectively.
The exact coset decomposition and the corresponding coset supercodes and subcode for each of the MSCD schemes can be deduced from the parameters given in Table IV. The approximations fit well with the simulation results, as the figures indicate. Figure 11 compares decoding code C16P,a by MLD and by MSCD with 2, 3, and 4 stages. At moderate to high SNRs, all these schemes perform comparably. Generally, the
Figure 8: Probability of block error vs. SNR (Method MC(C16P,a)). [Plot: block error rate vs. SNR (dB) over 3–7 dB; curves for simulation, approximation, and uncoded QPSK (26-bit block).]
Figure 9: Probability of block error vs. SNR (Method MD(C16P,a)). [Plot: block error rate vs. SNR (dB) over 3–7 dB; curves for simulation, approximation, uncoded QPSK (26-bit block), and the probabilities of requiring stage 1 and stage 2.]
Figure 10: Probability of block error vs. SNR (Method ME(C16P,a)). [Plot: block error rate vs. SNR (dB) over 3–7 dB; curves for simulation, approximation, uncoded QPSK (26-bit block), and the probabilities of requiring stages 1, 2 and 3.]

more stages the decoding involves, the higher the degradation in performance relative to MLD. At high SNRs, however, this degradation is small, the performance of the 4-staged MSCD (the worst case) being inferior to MLD by about 0.27 dB at Pb(e) = 10⁻⁵. Meanwhile, the complexity is reduced by almost 85% if MSCD is used and by 91% if MSCD/SC is used instead. Figure 12 compares MLD and 2- and 3-staged MSCD decoding of C8Q,a. The 3-staged decoding is inferior to MLD by about 0.4 dB at Pb(e) = 10⁻⁵. For the 2-staged MSCD, the first and second component codes are combined for decoding, resulting in a scheme that performs only about 0.1 dB worse than MLD. Our simulation results show that it is advisable to combine the first and second component codes in decoding coded 16-QAM, as this improves performance with only a slight increase in decoding complexity. (This is done in Method ME in Fig. 12.) This improvement depends mainly on two factors. First, by combining component codes, the path multiplicity can be greatly reduced. Second, the first component code C0 is, in general, quite simple in structure; in our multilevel schemes, repetition codes are used. Hence, the increase in complexity is minimal
Figure 11: Comparison of decoding schemes for code C16P,a. [Plot: block error rate vs. SNR (dB) over 3–6.5 dB; curves for Methods MC, MD, ME, and MF.]
Figure 12: Comparison of decoding schemes for code C8Q,a. [Plot: block error rate vs. SNR (dB) over 4–9 dB; curves for Methods MC, MD, and ME.]
when component codes C0 and C1 are combined. We note, however, that this component-code combining technique is much less effective for coded 8-PSK, since the first component code, C0, does not in general contribute error paths at d[C], and so the path multiplicity cannot be reduced. The complexity in binary operations per information bit for coded 8-PSK and 16-QAM is given in Table VIII and Table IX, respectively. The corresponding coding gains at Pb(e) = 10⁻⁴ and Pb(e) = 10⁻⁵ are also shown in the tables. The data at Pb(e) = 10⁻⁵ are plotted in Fig. 13 and Fig. 14 for 8-PSK and 16-QAM, respectively. The plot in Fig. 13 for coded 8-PSK shows the expected trend: an increase in coding gain is accompanied by an increase in decoding complexity. For codes of short block length, both the coding gain and the complexity of MLD and MSCD are about equal. For longer codes, suboptimal decoding schemes are clearly better choices. At moderate to low block error rates, such as 10⁻⁵, MSCD and MSCD/SC can be applied to relatively complicated codes to obtain both complexity and performance advantages over MLD decoding of simpler codes. In other words, reduced-complexity decoding of complicated codes can sometimes achieve better performance, in terms of both coding gain and complexity, than optimal decoding of simpler codes, at least for the codes considered in our simulation study. For example, methods MG and MK (for the higher-dimensional codes C16P,b and C32P,c, respectively; see Table IV) using MSCD/SC achieve better coding gain at lower complexity than method MA (MLD) for the shorter code C8P,a. The observations for the plot in Fig. 14 for coded 16-QAM parallel those for coded 8-PSK. Nevertheless, it is surprising to observe a drop in the coding gain (Method MF with MSCD) accompanied by an increase in complexity. This drop is due to the combination of component codes.
Since the coding gain can be effectively improved by combining component codes, as in Method MH, a better coding gain can be achieved even at a lower decoding complexity. To illustrate roughly how well the suboptimal schemes (MSCD and MSCD/SC) do, the figures also include the performance and complexity tradeoffs of Ungerboeck's 1-D and 2-D TCM codes [8, 23]. MB codes and TCM codes are structurally different. In addition, the TCM codes
Figure 13: Gain vs. decoding complexity for block-encoded 8-PSK. [Plot: coding gain (dB) vs. complexity (binary operations per information bit, 5–40); points for n = 8, 16, 32 under MSCD and MSCD/SC, and for 1-D and 2-D TCM.]
Figure 14: Gain vs. decoding complexity for block-encoded 16-QAM. [Plot: coding gain (dB) vs. complexity (binary operations per information bit, 5–25); points for n = 4, 8, 16 under MSCD and MSCD/SC, and for 1-D and 2-D TCM.]
shown are decoded by MLD, while the MB codes are decoded by suboptimal decoding; hence it is difficult to compare their performance and complexity tradeoffs on fair ground. In general, however, for high coding gains (3.5 to 5.5 dB), the suboptimal schemes (MSCD and MSCD/SC) exhibit an outstanding performance/complexity tradeoff compared with decoding TCM codes using MLD. This is usually achieved with codes of higher dimension (n = 16 and n = 32 for 8-PSK, and n = 8 and n = 16 for 16-QAM).
VII Lattice Decoding

In this section we briefly discuss how some of these decoding techniques can be used to decode lattices. (See [6, Ch. 1–3] for a general discussion of lattices and their use in data transmission and vector quantization, among other applications.) Many lattices can be constructed using the same multilevel construction described in Section II. For example, the integer lattice Z can be partitioned according to the infinite binary partition chain Z/2Z/4Z/⋯. Integer n-tuples can then be represented by a binary coordinate array as in (1), with an infinite number of rows in the array. Integer lattices can be represented by placing constraints on the set of values allowed in the coordinate array; often, the first few rows of the coordinate array are constrained to be elements of a linear code (as in Constructions A–E in [6] or, e.g., [17]). In particular, the infinite family of Barnes-Wall lattices can be constructed from the binary Reed-Muller codes. Consequently, our decoding methods for RMMB codes transfer directly to these lattices. For example, as pointed out by a reviewer, the four-level 16-QAM block code C16Q,a from Table III is a subset of the Barnes-Wall lattice Λ32, a close relative of the Leech lattice Λ24. Our decoding methods for this code extend immediately to decoding algorithms for Λ32. The main difference between the decoding procedure for the finite 16-QAM code and the infinite lattice is the presence of an infinite number of "parallel branches" in the trellis representing the lattice. The first step in decoding is to find the best representative among the parallel branches and assign the
corresponding branch metric. Using these branch metrics, a decoding procedure for C16Q,a applies without modification. It should be noted that similar low-complexity bounded-distance decoding algorithms for the Leech lattice have been developed by Forney [10] and Amrani et al. [1]. In general, these decoding algorithms suffer less than 0.1 dB degradation relative to the maximum-likelihood decoder, at relatively low computational complexity. Forney's bounded-distance decoder operates by combining two separate decodings of the "Leech half-lattice," a sublattice of index 2 in Λ24. This approach of two separate "half-lattice" decodings extends immediately to any lattice with a simple sublattice of low index (for example, when the first row of the coordinate array is constrained by a repetition code) and has the advantage that it avoids the increase in nearest-neighbour multiplicity that would arise with conventional staged decoding. Staged decoding (including the extensions developed in this paper) can be used to decode the sublattice. The bounded-distance Leech lattice decoder described in [1] also operates in this way. The appropriate complexity normalization for lattice decoding is "operations per dimension," not "operations per information bit" as used in this paper. The reason is that the number of "information bits" can be made arbitrarily large, by taking a large enough subset of the lattice, without changing the complexity of the decoder. For data transmission purposes, however, the "per information bit" normalization is an appropriate measure for comparing different decoders that all operate at the same finite bit rate.
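Two of the ingredients above are easy to make concrete: the binary coordinate array of an integer n-tuple, and the parallel-branch step that picks the best representative of a coset of 2Z before trellis decoding. The helper names and the restriction to non-negative coordinates are ours, for illustration only.

```python
def coordinate_array(vector, rows):
    """First `rows` rows of the binary coordinate array of a tuple of
    non-negative integers: row r holds the 2**r-weight bit of each entry,
    following the partition chain Z / 2Z / 4Z / ..."""
    return [[(x >> r) & 1 for x in vector] for r in range(rows)]

def best_parallel_branch(r, a, modulus=2):
    """Best representative among the 'parallel branches' {a + modulus*k}:
    the coset point closest to the received value r, with its squared
    Euclidean distance as the branch metric."""
    point = a + modulus * round((r - a) / modulus)
    return point, (r - point) ** 2

print(coordinate_array([5, 0, 3, 6], 3))  # [[1, 0, 1, 0], [0, 0, 1, 1], [1, 0, 0, 1]]
print(best_parallel_branch(3.7, 1))       # closest point of ..., -1, 1, 3, 5, ... to 3.7
```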
VIII Conclusions

In this work, we have considered the problem of realizing high coding gain with multilevel block codes at low computational cost. The Viterbi algorithm (VA) is an optimal algorithm for maximum-likelihood decoding of MB codes. However, for codes with a large number of trellis states, the VA imposes a heavy computational burden which makes it impractical to implement. Due to this constraint, optimal decoding of some powerful MB codes with high coding gain is not feasible.
Our goal has been to derive and improve suboptimal schemes which enable high coding gain to be achieved at relatively low decoding complexity. In this paper, we used a multistage approach to decoding multilevel codes. We view this approach as a process of "successive refinement," in which the decoder narrows the set of candidate decoded words in a series of decoding stages. We have introduced the multi-staged coset decoding (MSCD) algorithm, which is a generalization of the closest-coset decoding method for binary RM codes presented by Hemmati [13] and of the staged decoding for decomposable block codes proposed by Takata et al. [22]. Using a coset supercode, chosen to have a simple trellis structure while maintaining the intercoset distance, effective suboptimal decoding is achieved, with the main performance degradation arising from an increase in the path multiplicity. We have also seen that introducing syndrome checking can further reduce the average computational effort at moderate to high SNR values, without further reduction in coding gain. The improvement obtained is worthwhile but not dramatic, and the reduction in average effort comes at the expense of variability in the number of stages required for decoding. Our results indicate that the best results are obtained by doing optimum decoding of enough of the lowest levels, combined, so that near-optimal performance is obtained. This may involve, e.g., combining the first two rows of the coordinate array into a single code to be decoded in a single step. The nearest-neighbour multiplicity is also governed largely by the code in the first row; hence, combining the codes in this way can reduce the effective nearest-neighbour multiplicity. (In the context of lattice decoding, an alternative is to perform multiple parallel decodings of the cosets of a sublattice of small index, as in [1, 10].) The performance improvement must be balanced against the computational complexity needed for decoding.
In general, the price to be paid for lower decoding complexity is lower coding gain. Therefore, there is a compromise between coding gain and decoding complexity. However, both analysis and extensive simulation results show that, for the class of codes we have considered, suboptimal decoding of a relatively complicated code can have both performance and complexity advantages relative to optimal decoding of a relatively simple code.
References

[1] O. Amrani, Y. Be'ery, A. Vardy, F.-W. Sun, and H. C. A. van Tilborg: The Leech lattice and the Golay code: bounded-distance decoding and multilevel constructions. IEEE Trans. on Inform. Theory, to appear.

[2] E. Biglieri and A. Spalvieri: Generalized concatenation: A tutorial. In: Coded Modulation and Bandwidth-Efficient Transmission (E. Biglieri and M. Luise, Eds.), Elsevier Science Publishers B.V., Amsterdam, 1992, pp. 27–39.

[3] E. L. Blokh and V. V. Zyablov: Coding of generalized concatenated codes. Probl. Peredach. Inform. Vol. 10, 1974, pp. 218–222.

[4] A. R. Calderbank: Multilevel codes and multistage decoding. IEEE Trans. Commun. Vol. 37, 1989, pp. 222–229.

[5] A. R. Calderbank and N. J. A. Sloane: New trellis codes based on lattices and cosets. IEEE Trans. on Inform. Theory Vol. IT-33, March 1987, pp. 177–195.

[6] J. H. Conway and N. J. A. Sloane: Sphere Packings, Lattices and Groups. Springer-Verlag, Berlin, 2nd Edition, 1993.

[7] E. L. Cusack: Error control codes for QAM signalling. Electronics Letters Vol. 20, 1984, pp. 62–63.

[8] G. D. Forney, Jr.: Coset codes I: Introduction and geometrical classification. IEEE Trans. on Inform. Theory Vol. 34, Sep. 1988, pp. 1123–1151.

[9] G. D. Forney, Jr.: Coset codes II: Binary lattices and related codes. IEEE Trans. on Inform. Theory Vol. 34, Sep. 1988, pp. 1152–1187.

[10] G. D. Forney, Jr.: A bounded-distance decoding algorithm for the Leech lattice, with generalizations. IEEE Trans. on Inform. Theory Vol. 35, July 1989, pp. 906–909.

[11] G. D. Forney, Jr.: Geometrically uniform codes. IEEE Trans. on Inform. Theory Vol. 37, Sept. 1991, pp. 1241–1260.

[12] V. V. Ginzburg: Multidimensional signals for a continuous channel. Problems of Information Transmission Vol. 20, 1984, pp. 20–34. (Translated from Problemy Peredachi Informatsii Vol. 20, No. 1, 1984, pp. 28–46.)

[13] F. Hemmati: Closest coset decoding of |u|u+v| codes. IEEE J. Select. Areas Commun. Vol. SAC-7, Aug. 1989, pp. 982–988.

[14] H. Imai and S. Hirakawa: A new multilevel coding method using error-correcting codes. IEEE Trans. on Inform. Theory Vol. IT-23, May 1977, pp. 371–377. Correction, Nov. 1977, p. 784.

[15] T. Kasami, T. Takata, T. Fujiwara, and S. Lin: On multilevel block modulation codes. IEEE Trans. on Inform. Theory Vol. 37, July 1991, pp. 965–975.

[16] F. R. Kschischang, P. G. de Buda, and S. Pasupathy: Block coset codes for M-ary phase shift keying. IEEE J. Selected Areas in Commun. Vol. 7, Aug. 1989, pp. 900–913.

[17] F. R. Kschischang and S. Pasupathy: Some ternary and quaternary codes and associated sphere packings. IEEE Trans. on Inform. Theory Vol. 38, March 1992, pp. 227–246.

[18] F. J. MacWilliams and N. J. A. Sloane: The Theory of Error-Correcting Codes. North-Holland, New York, 1977.

[19] G. J. Pottie and D. P. Taylor: Multilevel codes based on partitioning. IEEE Trans. on Inform. Theory Vol. 35, Jan. 1989, pp. 87–98.

[20] J. G. Proakis: Digital Communications. McGraw-Hill, New York, 1983.

[21] S. I. Sayegh: A class of optimum block codes in signal space. IEEE Trans. Commun. Vol. COM-34, 1986, pp. 1043–1045.

[22] T. Takata, Y. Yamashita, T. Fujiwara, T. Kasami, and S. Lin: On a sub-optimum decoding of decomposable block codes. In: Coded Modulation and Bandwidth-Efficient Transmission (E. Biglieri and M. Luise, Eds.), Elsevier Science Publishers B.V., Amsterdam, 1992, pp. 201–212.

[23] G. Ungerboeck: Channel coding with multilevel/phase signals. IEEE Trans. on Inform. Theory Vol. IT-28, Jan. 1982, pp. 55–67.

[24] V. A. Zinoviev: Generalized cascade codes. Probl. Peredach. Inform. Vol. 12, 1976, pp. 2–9.
                              MLD      3-staged MSCD     4-staged MSCD
Number of states              32       2+8+2             2+4+2+2
B_SCS                         0        96                96
Trellis ops T(i)              1023     Stage 0: 7        Stage 0: 7
  (4-section trellis for MLD)          Stage 1: 127      Stage 1: 31
                                       Stage 2: 15       Stage 2: 15
                                                         Stage 3: 15
Branch metrics B(i)           1280     Stage 0: 24       Stage 0: 24
                                       Stage 1: 128      Stage 1: 64
                                       Stage 2: 72       Stage 2: 24
                                                         Stage 3: 72
Total (binary operations)     2303     469               348

Table I: Complexity comparison among different algorithms for decoding the code C = (16,1) × (16,11) × (16,15).
Notation   Definition                                   d[C]    R[C]
C8P,a      (8,1,8) × (8,7,2) × (8,8,1)                  4A²     2/3
C16P,a     (16,1,16) × (16,11,4) × (16,15,2)            8A²     9/16
C16P,b     (16,5,8) × (16,15,2) × (16,16,1)             4A²     3/4
C32P,a     (32,1,32) × (32,16,8) × (32,26,4)            16A²    43/96
C32P,b     (32,6,16) × (32,26,4) × (32,31,2)            8A²     21/32
C32P,c     (32,6,16) × (32,31,2) × (32,32,1)            4A²     23/32

Table II: Multilevel block codes on 8-PSK used for simulation.

Notation   Definition                                            d[C]    R[C]
C4Q,a      (4,1,4) × (4,3,2) × (4,4,1) × (4,4,1)                 4A²     3/4
C8Q,a      (8,1,8) × (8,4,4) × (8,7,2) × (8,8,1)                 8A²     5/8
C16Q,a     (16,1,16) × (16,5,8) × (16,11,4) × (16,15,2)          16A²    1/2
C16Q,b     (16,5,8) × (16,11,4) × (16,15,2) × (16,16,1)          8A²     47/64

Table III: Multilevel block codes on 16-QAM used for simulation.
Notation      Stage   Code to be Decoded
MA(C8P,a)     0       (8,1) × (8,7) × (8,8)
MB(C8P,a)     0       (8,1) × (8,8) × (8,8)
              1       (8,0) × (8,7) × (8,8)
MC(C16P,a)    0       (16,1) × (16,11) × (16,15)
MD(C16P,a)    0       (16,1) × (16,16) × (16,16)
              1       (16,0) × (16,11) × (16,16)
              2       (16,0) × (16,0) × (16,15)
ME(C16P,a)    0       (16,1) × (16,16) × (16,16)
              1       (16,0) × (16,12) × (16,16)
              2       (16,0) × (16,7) × (16,16)
              3       (16,0) × (16,0) × (16,15)
MF(C16P,a)    0       (16,1) × (16,11) × (16,16)
              1       (16,0) × (16,0) × (16,15)
MG(C16P,b)    0       (16,5) × (16,16) × (16,16)
              1       (16,0) × (16,15) × (16,16)
MH(C32P,a)    0       (32,1) × (32,32) × (32,32)
              1       (32,0) × (32,19) × (32,32)
              2       (32,0) × (32,13) × (32,32)
              3       (32,0) × (32,0) × (32,28)
              4       (32,0) × (32,0) × (32,22)
MI(C32P,a)    0       (32,1) × (32,32) × (32,32)
              1       (32,0) × (32,19) × (32,32)
              2       (32,0) × (32,14) × (32,32)
              3       (32,0) × (32,7) × (32,32)
              4       (32,0) × (32,0) × (32,28)
              5       (32,0) × (32,0) × (32,22)
MJ(C32P,b)    0       (32,6) × (32,32) × (32,32)
              1       (32,0) × (32,28) × (32,32)
              2       (32,0) × (32,22) × (32,32)
              3       (32,0) × (32,0) × (32,31)
MK(C32P,c)    0       (32,6) × (32,32) × (32,32)
              1       (32,0) × (32,31) × (32,32)

Table IV: Methods for decoding multilevel block codes on 8-PSK.
Notation      Stage   Code to be Decoded
MA(C4Q,a)     0       (4,1) × (4,3) × (4,4) × (4,4)
MB(C4Q,a)     0       (4,1) × (4,4) × (4,4) × (4,4)
              1       (4,0) × (4,3) × (4,4) × (4,4)
MC(C8Q,a)     0       (8,1) × (8,4) × (8,7) × (8,8)
MD(C8Q,a)     0       (8,1) × (8,8) × (8,8) × (8,8)
              1       (8,0) × (8,4) × (8,8) × (8,8)
              2       (8,0) × (8,0) × (8,7) × (8,8)
ME(C8Q,a)     0       (8,1) × (8,4) × (8,8) × (8,8)
              1       (8,0) × (8,0) × (8,7) × (8,8)
MF(C16Q,a)    0       (16,1) × (16,16) × (16,16) × (16,16)
              1       (16,0) × (16,5) × (16,16) × (16,16)
              2       (16,0) × (16,0) × (16,11) × (16,16)
              3       (16,0) × (16,0) × (16,0) × (16,15)
MG(C16Q,a)    0       (16,1) × (16,5) × (16,16) × (16,16)
              1       (16,0) × (16,0) × (16,11) × (16,16)
              2       (16,0) × (16,0) × (16,0) × (16,15)
MH(C16Q,a)    0       (16,1) × (16,5) × (16,16) × (16,16)
              1       (16,0) × (16,0) × (16,12) × (16,16)
              2       (16,0) × (16,0) × (16,7) × (16,16)
              3       (16,0) × (16,0) × (16,0) × (16,15)
MI(C16Q,a)    0       (16,1) × (16,16) × (16,16) × (16,16)
              1       (16,0) × (16,5) × (16,16) × (16,16)
              2       (16,0) × (16,0) × (16,12) × (16,16)
              3       (16,0) × (16,0) × (16,7) × (16,16)
              4       (16,0) × (16,0) × (16,0) × (16,15)
MJ(C16Q,b)    0       (16,5) × (16,16) × (16,16) × (16,16)
              1       (16,0) × (16,11) × (16,16) × (16,16)
              2       (16,0) × (16,0) × (16,15) × (16,16)
MK(C16Q,b)    0       (16,5) × (16,16) × (16,16) × (16,16)
              1       (16,0) × (16,12) × (16,16) × (16,16)
              2       (16,0) × (16,7) × (16,16) × (16,16)
              3       (16,0) × (16,0) × (16,15) × (16,16)

Table V: Methods for decoding multilevel block codes on 16-QAM.
Notation      Path Multiplicity   Number of Stages   Number of States
MA(C8P,a)     120                 1 (MLD)            4
MB(C8P,a)     120                 2                  2+2
MC(C16P,a)    1240                1 (MLD)            32
MD(C16P,a)    2360                3                  2+8+2
ME(C16P,a)    4152                4                  2+4+2+2
MF(C16P,a)    2360                2                  16+2
MG(C16P,b)    496                 2                  8+2
MH(C32P,a)    506712              5                  2+8+8+4+4
MI(C32P,a)    543576              6                  2+8+4+2+4+4
MJ(C32P,b)    63344               4                  16+4+4+2
MK(C32P,c)    1984                2                  16+2

Table VI: Path multiplicity for decoding multilevel block codes on 8-PSK.
Notation      Path Multiplicity   Number of Stages   Number of States
MA(C4Q,a)     79                  1 (MLD)            4
MB(C4Q,a)     119                 2                  2+2
MC(C8Q,a)     504                 1 (MLD)            16
MD(C8Q,a)     7040                3                  2+4+2
ME(C8Q,a)     889                 2                  8+2
MF(C16Q,a)    43068786            4                  2+8+8+2
MG(C16Q,a)    43084               3                  16+8+2
MH(C16Q,a)    44876               4                  16+4+2+2
MI(C16Q,a)    43070578            5                  2+8+4+2+2
MJ(C16Q,b)    200914              3                  8+8+2
MK(C16Q,b)    203784              4                  8+4+2+2

Table VII: Path multiplicity for decoding multilevel block codes on 16-QAM.
Notation      MLD/MSCD   MSCD/SC (at 10⁻⁴)   MSCD/SC (at 10⁻⁵)   Gain (at 10⁻⁴)   Gain (at 10⁻⁵)
MA(C8P,a)     6.94       —                   —                   2.23             2.35
MB(C8P,a)     6.38       4.44                4.44                2.19             2.33
MC(C16P,a)    85.30      —                   —                   3.89             4.11
MD(C16P,a)    17.37      8.19                7.63                3.72             3.92
ME(C16P,a)    12.89      7.67                7.41                3.60             3.84
MF(C16P,a)    24.52      21.89               21.89               3.72             3.95
MG(C16P,b)    8.61       6.64                6.64                2.38             2.58
MH(C32P,a)    36.72      24.98               23.26               4.71             5.06
MI(C32P,a)    30.93      24.77               22.65               4.69             5.04
MJ(C32P,b)    18.35      10.94               10.70               3.85             4.17
MK(C32P,c)    10.06      7.88                7.87                2.23             2.41

Table VIII: Comparison of complexity (in binary operations per information bit) and coding gain for different decoding schemes for 8-PSK codes. Methods MA and MC use MLD; the remaining methods use MSCD, with the MSCD/SC columns giving the SNR-dependent average complexity when syndrome checking is added.

Notation      MLD/MSCD   MSCD/SC (at 10⁻⁴)   MSCD/SC (at 10⁻⁵)   Gain (at 10⁻⁴)   Gain (at 10⁻⁵)
MA(C4Q,a)     6.58       —                   —                   2.16             2.27
MB(C4Q,a)     6.50       5.58                5.58                2.02             2.18
MC(C8Q,a)     25.55      —                   —                   3.96             4.14
MD(C8Q,a)     10.65      7.95                7.85                3.57             3.74
ME(C8Q,a)     11.50      9.95                9.95                3.85             4.04
MF(C16Q,a)    22.63      15.25               14.25               4.65             4.88
MG(C16Q,a)    24.66      16.63               16.31               5.25             5.48
MH(C16Q,a)    20.88      16.34               16.16               5.21             5.44
MI(C16Q,a)    18.84      15.16               14.19               4.65             4.88
MJ(C16Q,b)    14.74      9.19                9.00                3.68             3.93
MK(C16Q,b)    12.17      9.04                8.94                3.68             3.93

Table IX: Comparison of complexity (in binary operations per information bit) and coding gain for different decoding schemes for 16-QAM codes. Methods MA and MC use MLD; the remaining methods use MSCD.