IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 8, NO. 5, MAY 2009


An Efficient Symbol-Level Combining Scheme for MIMO Systems with Hybrid ARQ

Edward W. Jang, Student Member, IEEE, Jungwon Lee, Member, IEEE, Leilei Song, Member, IEEE, and John M. Cioffi, Fellow, IEEE

Abstract: This paper proposes a new combining scheme for multiple-input multiple-output (MIMO) systems with hybrid automatic-repeat-request (HARQ). The proposed combining scheme is proved to have the optimal decoding performance. Furthermore, it is shown to have a low memory requirement and reduced complexity compared to other optimal combining schemes. Simulation results under an IEEE 802.16e setting with UMTS channel models verify that the proposed combining scheme achieves the optimal decoding performance and performs much better than suboptimal combining schemes.

Index Terms: MIMO, hybrid ARQ, Chase combining, symbol-level combining, QR decomposition.

Manuscript received October 17, 2007; revised April 2, 2008, October 10, 2008, and January 6, 2009; accepted January 6, 2009. The associate editor coordinating the review of this paper and approving it for publication was S. Aissa. E. W. Jang and J. M. Cioffi are with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA (e-mail: {ej1130, cioffi}@stanford.edu). J. Lee and L. Song are with Marvell Semiconductor, Inc., 5488 Marvell Ln., Santa Clara, CA 95054, USA (e-mail: [email protected], [email protected]). This work was presented in part at the IEEE Global Communications Conference, Washington, DC, USA, Nov. 2007. Digital Object Identifier 10.1109/TWC.2009.071153. 1536-1276/09$25.00 © 2009 IEEE.

I. INTRODUCTION

The fundamental objective of data communication systems is to deliver messages to their destinations without error, and there are many approaches toward this objective. One approach is to employ a forward-error-correcting (FEC) code. Knowledge of instantaneous channel state information (CSI) at both the transmitter and the receiver allows using a FEC code with coding gain sufficiently strong to overcome the noise incurred by the channel. However, to ensure reliability without the instantaneous CSI, which is often the case especially in wireless systems, the transmitter should use a powerful FEC code that guarantees a low probability of error even for the worst-case CSI. Although constant throughput is achieved, such aggressive coding results in too much redundancy in the transmitted signal and decreases the overall throughput.

Another approach is to employ automatic-repeat-request (ARQ). An ARQ system uses an error-detecting code such as a cyclic redundancy check (CRC) code. If the receiver detects an error and feeds back a negative acknowledgement (NAK), or if the transmitter does not receive an acknowledgement (ACK) within a certain waiting time, the message is retransmitted. Although this simple scheme offers high-reliability transmission, it suffers from low and time-varying throughput when multiple retransmissions occur because of severe fading.

To address the problems of using only a FEC code or only an ARQ system, the hybrid ARQ (HARQ) system was proposed. A HARQ system incorporates both a FEC code and an ARQ system [1]. In a HARQ system, the receiver first corrects errors using a FEC code and then detects any remaining errors using a CRC code. Normally the errors are corrected by the FEC code, but if errors still exist, retransmission occurs as in the ARQ system. The HARQ system thus provides tradeoff operating points between the high throughput offered by a FEC code and the high reliability offered by an ARQ system.

A HARQ system can be further enhanced by incorporating Chase combining (HARQ-CC) [2] or incremental redundancy (HARQ-IR) [1]. Instead of discarding previously received signals that were detected to contain errors, both HARQ-CC and HARQ-IR combine all the received signals to decode the transmitted message. The difference is that HARQ-CC uses the same modulation and coding scheme for retransmissions, whereas HARQ-IR varies the coding scheme for each retransmission. Additionally, HARQ can be used with repetition coding if the coding gain of the system's lowest code rate is insufficient. Both HARQ and repetition coding are used in the IEEE 802.16e standard [3], [4].

Design of a receiver combining scheme for single-input single-output (SISO) systems with HARQ is straightforward because maximal-ratio combining (MRC) is well known to achieve the maximum signal-to-noise power ratio (SNR) [5]. However, design of a receiver combining scheme for multiple-input multiple-output (MIMO) systems with HARQ is not obvious, because the symbols transmitted from each antenna interfere with each other. Recently, combining schemes using linear equalizers, namely the zero-forcing (ZF) receiver and the minimum-mean-squared-error (MMSE) receiver, were proposed in [6]. These combining schemes first use linear equalizers to convert MIMO systems into SISO systems and then combine the resulting signals. Also, weighted combining schemes at multiple levels of a receiver (post-equalizer, post-demapper, and post-decoder) were proposed in [7]. The receiver combining schemes that have the optimal decoding performance for HARQ-CC and HARQ-IR were proposed and analyzed in [8]. This paper proposes another optimal receiver combining scheme for HARQ-CC, which has reduced complexity compared to the optimal combining schemes in [8], and analyzes its performance.

This paper is organized as follows: Section II describes a system model. Section III proposes the combining schemes for MIMO systems with HARQ. Section IV compares the receiver combining schemes in terms of memory requirement and complexity. Section V provides simulation results under an IEEE 802.16e setting and shows the performance of the proposed combining schemes. Finally, Section VI provides a conclusion and remarks.

This paper uses the following notation: vectors and matrices are in boldface, respectively in lowercase and uppercase. $\mathbf{X}^T$ stands for the transpose of $\mathbf{X}$. $\mathbf{X}^*$ stands for the conjugate transpose of $\mathbf{X}$.

II. SYSTEM MODEL

Consider a MIMO system with $N_t$ transmit antennas and $N_r$ receive antennas. The system employs HARQ-CC, which uses the same modulation and coding scheme for retransmissions. If the number of transmissions is $N$, the receiver has $N$ different received signal vectors. The relationship between the transmitted signal vector and the received signal vectors is

\[ \mathbf{y}_i = \mathbf{H}_i \mathbf{x} + \mathbf{n}_i, \quad i = 1, \cdots, N, \tag{1} \]

where $\mathbf{y}_i$ and $\mathbf{n}_i$ respectively denote the $N_r \times 1$ received signal vector and the $N_r \times 1$ additive white Gaussian noise (AWGN) vector at time $i$. $\mathbf{H}_i$ denotes the $N_r \times N_t$ effective channel matrix, which possibly includes the precoding matrix that assigns multiple transmit streams across multiple transmit antennas. The channel is assumed to be block fading. Thus $\mathbf{H}_i$ remains constant during each transmission but is independent over multiple transmissions due to the inherent delay in the ARQ process. $\mathbf{x}$ denotes the $N_t \times 1$ signal vector, which is transmitted repeatedly. Without loss of generality, by appropriately scaling each noise variance, the $\mathbf{n}_i$ are assumed to be i.i.d. zero-mean circularly symmetric complex Gaussian (ZMCSCG) with covariance $\mathbf{I}_{N_r}$ for all $i = 1, \cdots, N$. Hence the conditional probability distribution function $\Pr\{\mathbf{y}_i \mid \mathbf{H}_i, \mathbf{x}\}$ is given by

\[ \Pr\{\mathbf{y}_i \mid \mathbf{H}_i, \mathbf{x}\} = \frac{1}{\pi^{N_r}} \exp\left(-\|\mathbf{y}_i - \mathbf{H}_i \mathbf{x}\|^2\right). \tag{2} \]

The system model naturally applies to MIMO orthogonal frequency-division multiplexing (MIMO-OFDM) systems with HARQ, where combining operations are performed per subchannel. The system model also covers MIMO-OFDM systems with repetition coding, where the transmitter sends multiple copies of the same transmit signal across different subchannels. In that case, the $\mathbf{H}_i$ denote the different subchannel matrices over which the same transmit signal is transmitted.

The system model uses $M$-QAM modulation with $M = 2^m$. The transmit bit sequence for the transmit signal vector $\mathbf{x}$ is $\mathbf{b} \in \{0, 1\}^{mN_t}$, i.e., the length of the sequence is $mN_t$, and each of its elements is either 0 or 1. This bit sequence $\mathbf{b}$ can be the encoder output for coded systems or the message bit sequence itself for uncoded systems. Use of a good interleaver on multiple bit sequences renders the individual bit elements of the bit sequence $\mathbf{b}$ uncorrelated. These bits are demultiplexed across the $N_t$ transmit antennas, may go through another coding and/or interleaving block, are modulated to $\mathbf{x}$, and are finally sent by the transmitter.
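The system model in (1) and (2) can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: the function names `simulate_transmissions` and `log_likelihood` are hypothetical, the channel entries are drawn as i.i.d. Rayleigh fading (the text only requires block fading that is independent across transmissions), and a unit-energy 4-QAM alphabet stands in for the general $M$-QAM case.

```python
import numpy as np

def simulate_transmissions(x, n_rx, n_transmissions, rng):
    """Return lists (H_i, y_i) with y_i = H_i x + n_i for i = 1..N, as in (1)."""
    n_tx = x.shape[0]
    channels, received = [], []
    for _ in range(n_transmissions):
        # Block fading: a fresh, independent channel for every (re)transmission.
        # ASSUMPTION: i.i.d. complex Gaussian (Rayleigh) entries.
        H = (rng.standard_normal((n_rx, n_tx))
             + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
        # ZMCSCG noise with identity covariance (unit variance per complex entry).
        n = (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx)) / np.sqrt(2)
        channels.append(H)
        received.append(H @ x + n)
    return channels, received

def log_likelihood(y, H, x):
    """Natural log of the conditional distribution in (2)."""
    n_rx = y.shape[0]
    return -n_rx * np.log(np.pi) - np.linalg.norm(y - H @ x) ** 2

rng = np.random.default_rng(0)
qam4 = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)  # illustrative alphabet
x = rng.choice(qam4, size=2)                                       # N_t = 2
Hs, ys = simulate_transmissions(x, n_rx=2, n_transmissions=3, rng=rng)
```

The same vector `x` is reused for every transmission, which is what distinguishes HARQ-CC from schemes that change the coding between retransmissions.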

III. COMBINING SCHEME

Although the existing MRC scheme with maximum-likelihood (ML) decoding achieves the optimal decoding performance with low memory requirements [8], the MRC scheme involves matrix inversions, which can be computationally complex. In contrast, the receiver combining scheme proposed in this paper avoids matrix inversions by concatenating the channel matrices and using the QR decomposition; hence it is denoted as the concatenation-assisted symbol-level combining (CASLC) scheme. It is shown by mathematical analysis that the proposed CASLC scheme with ML decoding also achieves the optimal decoding performance for MIMO systems with HARQ. Even though this paper shows only ML decoding, other decoding methods, such as ZF or MMSE, can be used with the CASLC scheme.

Moreover, the proposed combining scheme is designed to reuse the basic decoder, regardless of the number of transmissions. The basic decoder denotes a decoder that is used when only one received signal vector is available. Reusing the basic decoder when multiple retransmissions occur is beneficial in terms of complexity and flexibility, because the number of received signal vectors may differ from time to time, and designing a receiver for each case not only drastically increases the complexity but also limits the flexibility of the receiver.

This section consists of three parts. Section III-A describes the first step of the proposed combining scheme, the concatenation step. Section III-B describes the second step of the proposed combining scheme, the QR decomposition step, and proves that this scheme has the optimal decoding performance. Finally, Section III-C shows how to merge the first and second steps into a single process, resulting in the incremental QR decomposition. It also shows that the incremental QR decomposition reduces the memory requirements and computational complexity of the scheme, while maintaining the optimal decoding performance.

Throughout this section, it is assumed that the QR decomposition algorithm always outputs the same decomposed matrices for each matrix. In other words, although multiple QR decomposition algorithms exist, the same algorithm is used throughout this section. The resulting $\mathbf{Q}$ matrix has orthonormal columns, and the resulting $\mathbf{R}$ matrix is an upper-triangular matrix with real and positive diagonal elements.

A. Concatenation

When the transmitter sends a signal vector only once, the relationship between the transmitted signal vector and the received signal vector is

\[ \mathbf{y}_1 = \mathbf{H}_1 \mathbf{x} + \mathbf{n}_1. \tag{3} \]

In this case, the receiver estimates the $N_t \times 1$ transmitted signal vector $\mathbf{x}$ from the $N_r \times 1$ received signal vector $\mathbf{y}_1$ and the $N_r \times N_t$ channel matrix $\mathbf{H}_1$. The ML decoder at this stage uses $\|\mathbf{y}_1 - \mathbf{H}_1 \mathbf{x}\|^2$ as the metric, and this is the basic decoder. When the transmitter sends the common signal vector $N$ times ($N \geq 2$), the relationship between the transmitted signal vector and the received signal vectors is

\[ \mathbf{y}_i = \mathbf{H}_i \mathbf{x} + \mathbf{n}_i, \quad i = 1, \cdots, N. \tag{4} \]


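The proposed CASLC scheme concatenates the received signal vectors and the channel matrices, which are assumed to have been perfectly estimated:

\[ \tilde{\mathbf{y}}_N = [\mathbf{y}_1^T \; \mathbf{y}_2^T \; \cdots \; \mathbf{y}_N^T]^T, \tag{5} \]

\[ \tilde{\mathbf{H}}_N = [\mathbf{H}_1^T \; \mathbf{H}_2^T \; \cdots \; \mathbf{H}_N^T]^T, \tag{6} \]

where $\tilde{\mathbf{y}}_N$ and $\tilde{\mathbf{H}}_N$ respectively are the $N N_r \times 1$ concatenated received signal vector and the $N N_r \times N_t$ concatenated channel matrix. Then the relationship between the channel input and output can be represented as

\[ \tilde{\mathbf{y}}_N = \tilde{\mathbf{H}}_N \mathbf{x} + \tilde{\mathbf{n}}_N, \tag{7} \]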

where $\tilde{\mathbf{n}}_N = [\mathbf{n}_1^T \; \mathbf{n}_2^T \; \cdots \; \mathbf{n}_N^T]^T$ is the $N N_r \times 1$ concatenated noise vector. Equation (7) is equivalent to (4) because no information is lost during concatenation. The receiver estimates the $N_t \times 1$ common transmitted signal vector $\mathbf{x}$ from the $N N_r \times 1$ concatenated signal vector $\tilde{\mathbf{y}}_N$ and the $N N_r \times N_t$ concatenated channel matrix $\tilde{\mathbf{H}}_N$.

It is possible to directly perform ML decoding after this concatenation step by using $\|\tilde{\mathbf{y}}_N - \tilde{\mathbf{H}}_N \mathbf{x}\|^2$ as the metric. This scheme is denoted as the decoding scheme with concatenation only (CO). In this case, the decoding performance is optimal because no information is lost during concatenation. However, since the sizes of $\tilde{\mathbf{y}}_N$ and $\tilde{\mathbf{H}}_N$ respectively are $N N_r \times 1$ and $N N_r \times N_t$, the basic decoder cannot be used. As a result, the complexity significantly increases as the number of transmissions $N$ increases.

B. QR Decomposition

The key idea of the proposed CASLC scheme is employing the QR decomposition after the concatenation step. Using the QR decomposition also enables reusing the basic decoder, thereby decreasing the receiver complexity.

The CASLC scheme is implemented as follows: With only one received signal vector, the channel matrix $\mathbf{H}_1$ is QR-decomposed into $\mathbf{H}_1 = \mathbf{Q}_1 \mathbf{R}_1$, where $\mathbf{Q}_1$ and $\mathbf{R}_1$ respectively are an $N_r \times N_t$ matrix with orthonormal columns and an $N_t \times N_t$ upper-triangular matrix. Therefore,

\[ \mathbf{y}_1 = \mathbf{H}_1 \mathbf{x} + \mathbf{n}_1 = \mathbf{Q}_1 \mathbf{R}_1 \mathbf{x} + \mathbf{n}_1. \tag{8} \]

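Since $\mathbf{Q}_1^* \mathbf{Q}_1 = \mathbf{I}_{N_t}$, multiplying both sides of (8) by $\mathbf{Q}_1^*$ yields

\[ \mathbf{Q}_1^* \mathbf{y}_1 = \mathbf{Q}_1^* \mathbf{Q}_1 \mathbf{R}_1 \mathbf{x} + \mathbf{Q}_1^* \mathbf{n}_1 = \mathbf{R}_1 \mathbf{x} + \mathbf{Q}_1^* \mathbf{n}_1, \tag{9} \]

where $E[\mathbf{Q}_1^* \mathbf{n}_1 \mathbf{n}_1^* \mathbf{Q}_1] = \mathbf{I}_{N_t}$. The receiver estimates the $N_t \times 1$ common transmitted signal vector $\mathbf{x}$ from the $N_t \times 1$ signal vector $\mathbf{Q}_1^* \mathbf{y}_1$ and the $N_t \times N_t$ upper-triangular matrix $\mathbf{R}_1$. Now, the ML decoding metric is $\|\mathbf{Q}_1^* \mathbf{y}_1 - \mathbf{R}_1 \mathbf{x}\|^2$ instead of $\|\mathbf{y}_1 - \mathbf{H}_1 \mathbf{x}\|^2$. The complexity of ML decoding is reduced because $\mathbf{R}_1$ is an upper-triangular matrix. For example, when $N_t = N_r = 2$, ML decoding calculates the first and the second components of the ML decoding metric for every possible transmitted signal vector $\mathbf{x}$ and then adds the corresponding Euclidean distances for each $\mathbf{x}$. When using $\|\mathbf{y}_1 - \mathbf{H}_1 \mathbf{x}\|^2$, it requires $M^2$ calculations for both components. When using $\|\mathbf{Q}_1^* \mathbf{y}_1 - \mathbf{R}_1 \mathbf{x}\|^2$, it requires $M^2$ calculations for the first component, but only $M$ calculations for the second component. The basic decoder in this case uses $\|\mathbf{Q}_1^* \mathbf{y}_1 - \mathbf{R}_1 \mathbf{x}\|^2$ as the ML decoding metric.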
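A minimal sketch of such a basic decoder is given below, assuming an exhaustive search over a small illustrative 4-QAM alphabet. The function name `basic_ml_decoder` is hypothetical, and the search does not exploit the triangular structure of $\mathbf{R}_1$ to save computations; it only checks that the metric after the QR step selects the same vector as the original metric and differs from it by a term that does not depend on $\mathbf{x}$.

```python
import itertools
import numpy as np

def basic_ml_decoder(y_eff, R_eff, constellation):
    """Exhaustive ML search: return the x minimizing ||y_eff - R_eff x||^2."""
    n_tx = R_eff.shape[1]
    best_x, best_metric = None, np.inf
    for candidate in itertools.product(constellation, repeat=n_tx):
        x = np.array(candidate)
        metric = np.linalg.norm(y_eff - R_eff @ x) ** 2
        if metric < best_metric:
            best_x, best_metric = x, metric
    return best_x, best_metric

rng = np.random.default_rng(1)
qam4 = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)  # illustrative alphabet
n_rx, n_tx = 3, 2
H = (rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
x_true = rng.choice(qam4, size=n_tx)
noise = 0.1 * (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx))
y = H @ x_true + noise

# Metric ||y - H x||^2 (no QR step).
x_direct, m_direct = basic_ml_decoder(y, H, qam4)

# Metric ||Q^* y - R x||^2 (after the QR step); reduced QR gives an n_tx x n_tx R.
Q, R = np.linalg.qr(H)
x_qr, m_qr = basic_ml_decoder(Q.conj().T @ y, R, qam4)
```

Because `np.linalg.qr` does not force the diagonal of `R` to be real and positive, its factors may differ from those assumed in the text by unit-modulus phases; this leaves the metric unchanged.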


With $N$ received signal vectors ($N \geq 2$), the QR decomposition is performed on the $N N_r \times N_t$ concatenated channel matrix $\tilde{\mathbf{H}}_N$, resulting in $\tilde{\mathbf{H}}_N = \tilde{\mathbf{Q}}_N \tilde{\mathbf{R}}_N$, where $\tilde{\mathbf{Q}}_N$ and $\tilde{\mathbf{R}}_N$ respectively are an $N N_r \times N_t$ matrix with orthonormal columns and an $N_t \times N_t$ upper-triangular matrix. Therefore,

\[ \tilde{\mathbf{y}}_N = \tilde{\mathbf{H}}_N \mathbf{x} + \tilde{\mathbf{n}}_N = \tilde{\mathbf{Q}}_N \tilde{\mathbf{R}}_N \mathbf{x} + \tilde{\mathbf{n}}_N. \tag{10} \]

Since $\tilde{\mathbf{Q}}_N^* \tilde{\mathbf{Q}}_N = \mathbf{I}_{N_t}$, multiplying both sides of (10) by $\tilde{\mathbf{Q}}_N^*$ yields

\[ \tilde{\mathbf{Q}}_N^* \tilde{\mathbf{y}}_N = \tilde{\mathbf{Q}}_N^* \tilde{\mathbf{Q}}_N \tilde{\mathbf{R}}_N \mathbf{x} + \tilde{\mathbf{Q}}_N^* \tilde{\mathbf{n}}_N = \tilde{\mathbf{R}}_N \mathbf{x} + \tilde{\mathbf{Q}}_N^* \tilde{\mathbf{n}}_N, \tag{11} \]

where $E[\tilde{\mathbf{Q}}_N^* \tilde{\mathbf{n}}_N \tilde{\mathbf{n}}_N^* \tilde{\mathbf{Q}}_N] = \mathbf{I}_{N_t}$.

The QR decomposition step after the concatenation step renders the metric for ML decoding as $\|\tilde{\mathbf{Q}}_N^* \tilde{\mathbf{y}}_N - \tilde{\mathbf{R}}_N \mathbf{x}\|^2$ instead of $\|\tilde{\mathbf{y}}_N - \tilde{\mathbf{H}}_N \mathbf{x}\|^2$. The sizes of $\tilde{\mathbf{Q}}_N^* \tilde{\mathbf{y}}_N$ and $\tilde{\mathbf{R}}_N$ respectively are $N_t \times 1$ and $N_t \times N_t$, which are the same as the sizes of the received signal and the channel matrix used by the basic decoder. Therefore, the basic decoder can be reused even when multiple received signal vectors are available from the common transmitted signal vector, thereby reducing the complexity of the receiver. This CASLC scheme with the concatenation step followed by the QR decomposition step is denoted as the CASLC scheme with the direct QR decomposition (CASLC-DQ).

The following theorem proves that the CASLC-DQ scheme maintains all the necessary information for decoding, thereby achieving the optimal decoding performance.

Theorem 1: The CASLC-DQ scheme achieves the optimal decoding performance.

Proof: See Appendix A.

Theorem 1 can also be proved intuitively as follows. The $N_t$ columns of the matrix $\tilde{\mathbf{Q}}_N$ can be thought of as an $N N_r$-dimensional orthonormal basis for the $N_t$-dimensional subspace in which the transmitted signal vector lies. Since the transmitted signal vector dimension is $N_t$, the noise parts lying in the extra $N N_r - N_t$ dimensions do not affect the decoding process. Therefore, although multiplication by $\tilde{\mathbf{Q}}_N^*$ reduces the dimensions of both the signal part and the noise part from $N N_r$ to $N_t$, no information is lost during the multiplication, enabling the CASLC-DQ scheme to achieve the optimal decoding performance.

C. Incremental QR Decomposition

Although the CASLC-DQ scheme achieves the optimal decoding performance, it suffers in two areas if implemented as is: the memory requirement and the QR decomposition. First, all the previous channel matrices and the received signal vectors need to be stored for the concatenation step and the QR decomposition step, requiring large memory at the receiver. Second, for every retransmission, the size of the QR decomposition increases. This leads to designing the QR decomposition blocks for various matrix sizes, making the receiver complex and inflexible.

To overcome these problems, the CASLC scheme with the incremental QR decomposition (CASLC-IQ) is proposed. The incremental QR decomposition modifies both the concatenation step and the QR decomposition step, and merges those steps into a single process.
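For reference, the direct scheme (CASLC-DQ), whose storage and growing QR size motivate the incremental variant, can be sketched as follows. The function name `caslc_dq` is hypothetical, and noise is omitted in the usage lines so that the result can be checked exactly; the sketch assumes the stacked channel has full column rank.

```python
import numpy as np

def caslc_dq(channels, received):
    """Concatenate all H_i and y_i, then QR-decompose the stacked channel.

    Returns (Q^* y, R), whose sizes match what the basic decoder expects.
    Note that every previous H_i and y_i must be kept to call this function.
    """
    H_cat = np.vstack(channels)        # (N * N_r) x N_t, grows with N
    y_cat = np.concatenate(received)   # length N * N_r, grows with N
    Q, R = np.linalg.qr(H_cat)         # reduced QR: R is N_t x N_t
    return Q.conj().T @ y_cat, R

rng = np.random.default_rng(3)
n_tx, n_rx, n_transmissions = 2, 2, 3
x = np.array([1 + 1j, -1 - 1j]) / np.sqrt(2)
channels = [
    (rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
    for _ in range(n_transmissions)
]
received = [H @ x for H in channels]   # noise omitted for an exact check
y_eff, R = caslc_dq(channels, received)
```

The output sizes stay at $N_t \times 1$ and $N_t \times N_t$ regardless of $N$, while the inputs to the QR step keep growing.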


When the first transmission occurs, the receiver performs the same procedure described in the previous subsection. The receiver computes

\[ \mathbf{y}'_1 = \mathbf{Q}_1^* \mathbf{y}_1 = \mathbf{R}_1 \mathbf{x} + \mathbf{Q}_1^* \mathbf{n}_1, \tag{12} \]

where $\mathbf{H}_1 = \mathbf{Q}_1 \mathbf{R}_1$. Then it estimates the $N_t \times 1$ common transmitted signal vector $\mathbf{x}$ from the $N_t \times 1$ signal vector $\mathbf{y}'_1$ and the $N_t \times N_t$ upper-triangular matrix $\mathbf{R}_1$ by using $\|\mathbf{y}'_1 - \mathbf{R}_1 \mathbf{x}\|^2$ as the ML decoding metric. If an error is detected after decoding, the receiver stores $\mathbf{y}'_1$ and $\mathbf{R}_1$ for the later combining process.

When the second transmission occurs, the incremental QR decomposition concatenates $\mathbf{y}'_1$ and $\mathbf{R}_1$ respectively with $\mathbf{y}_2$ and $\mathbf{H}_2$. Next, it performs the QR decomposition on the $(N_t + N_r) \times N_t$ concatenated matrix of $\mathbf{R}_1$ and $\mathbf{H}_2$,

\[ \begin{bmatrix} \mathbf{R}_1 \\ \mathbf{H}_2 \end{bmatrix} = \mathbf{Q}'_2 \mathbf{R}'_2, \tag{13} \]

where $\mathbf{Q}'_2$ and $\mathbf{R}'_2$ respectively are an $(N_t + N_r) \times N_t$ matrix with orthonormal columns and an $N_t \times N_t$ upper-triangular matrix. Then the receiver multiplies the $(N_t + N_r) \times 1$ concatenated vector of $\mathbf{y}'_1$ and $\mathbf{y}_2$ by $\mathbf{Q}_2^{\prime *}$ to get $\mathbf{y}'_2$,

\[ \mathbf{y}'_2 = \mathbf{Q}_2^{\prime *} \begin{bmatrix} \mathbf{y}'_1 \\ \mathbf{y}_2 \end{bmatrix}. \tag{14} \]

Finally, it estimates the $N_t \times 1$ common transmitted signal vector $\mathbf{x}$ from the $N_t \times 1$ signal vector $\mathbf{y}'_2$ and the $N_t \times N_t$ upper-triangular matrix $\mathbf{R}'_2$ by using $\|\mathbf{y}'_2 - \mathbf{R}'_2 \mathbf{x}\|^2$ as the ML decoding metric.

The following theorem proves that this incremental QR decomposition is equivalent to the concatenation step followed by the QR decomposition step when $N_t = N_r$.

Theorem 2: When $N_t = N_r$, the ML decoding metric of the incremental QR decomposition, $\|\mathbf{y}'_2 - \mathbf{R}'_2 \mathbf{x}\|^2$, is equal to the ML decoding metric of the direct QR decomposition, $\|\tilde{\mathbf{Q}}_2^* \tilde{\mathbf{y}}_2 - \tilde{\mathbf{R}}_2 \mathbf{x}\|^2$.

Proof: See Appendix B.

Theorem 2 suggests that the CASLC-IQ scheme also has the optimal decoding performance when the number of transmissions $N$ is 2 and when $N_t = N_r$. Although the equalities no longer hold when the number of transmissions $N$ is larger than 3 or when $N_t \neq N_r$, Theorem 2 implies that it might be unnecessary to store all the previous channel matrices and the received signal vectors.
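The statement of Theorem 2 can be checked numerically for one random instance with $N_t = N_r = 2$. The sketch below is only a sanity check under stated assumptions, not a proof: the helper names `qr_positive_diag` and `crandn` are hypothetical, the matrices are random draws assumed to have full column rank, and the helper rescales the output of `np.linalg.qr` so that the diagonal of $\mathbf{R}$ is real and positive, matching the convention assumed in this section.

```python
import numpy as np

def qr_positive_diag(A):
    """Reduced QR with the diagonal of R made real and positive."""
    Q, R = np.linalg.qr(A)
    d = np.diag(R)
    phases = d / np.abs(d)                 # unit-modulus phase of each diagonal entry
    # A = (Q D)(D^* R) with D = diag(phases), so the product is unchanged.
    return Q * phases, phases.conj()[:, None] * R

rng = np.random.default_rng(2)

def crandn(*shape):
    """Complex Gaussian samples with unit variance per entry."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

n = 2                                      # N_t = N_r = 2
H1, H2 = crandn(n, n), crandn(n, n)
y1, y2 = crandn(n), crandn(n)

# Incremental QR decomposition, following (12)-(14).
Q1, R1 = qr_positive_diag(H1)
y1p = Q1.conj().T @ y1
Q2p, R2p = qr_positive_diag(np.vstack([R1, H2]))
y2p = Q2p.conj().T @ np.concatenate([y1p, y2])

# Direct QR decomposition on the concatenated channel, following (10)-(11).
Q2t, R2t = qr_positive_diag(np.vstack([H1, H2]))
y2t = Q2t.conj().T @ np.concatenate([y1, y2])
```

If `R2p` matches `R2t` and `y2p` matches `y2t`, the two ML decoding metrics coincide for every candidate $\mathbf{x}$.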
Therefore, instead of going through the concatenation step and the QR decomposition step, the CASLC-IQ scheme is implemented as follows: When the first transmission occurs, the receiver performs the QR decomposition on the $N_r \times N_t$ channel matrix, $\mathbf{H}_1 = \mathbf{Q}_1 \mathbf{R}_1$. Then, it decodes the transmitted message with $\mathbf{y}'_1 = \mathbf{Q}_1^* \mathbf{y}_1$ and $\mathbf{R}_1$, using $\|\mathbf{y}'_1 - \mathbf{R}_1 \mathbf{x}\|^2$ as the ML decoding metric. If an error is detected, the receiver stores the $N_t \times 1$ vector $\mathbf{y}'_1$ and the $N_t \times N_t$ upper-triangular matrix $\mathbf{R}_1$. When the $N$th transmission occurs ($N \geq 2$), the receiver concatenates the stored matrix $\mathbf{R}'_{N-1}$ with the channel matrix $\mathbf{H}_N$. Then, it performs the QR decomposition on the $(N_t + N_r) \times N_t$ concatenated matrix, $[\mathbf{R}_{N-1}^{\prime T} \; \mathbf{H}_N^T]^T = \mathbf{Q}'_N \mathbf{R}'_N$. Finally, it decodes the transmitted message with $\mathbf{y}'_N = \mathbf{Q}_N^{\prime *} [\mathbf{y}_{N-1}^{\prime T} \; \mathbf{y}_N^T]^T$ and $\mathbf{R}'_N$, using $\|\mathbf{y}'_N - \mathbf{R}'_N \mathbf{x}\|^2$ as the ML decoding metric. If an error is detected, the receiver stores the $N_t \times 1$ vector $\mathbf{y}'_N$ and the $N_t \times N_t$ upper-triangular matrix $\mathbf{R}'_N$, and waits for retransmission.

The following theorem shows that this CASLC-IQ scheme achieves the optimal decoding performance for general cases, i.e., for any number of transmissions and for arbitrary $N_t$ and $N_r$.

Theorem 3: The CASLC-IQ scheme achieves the optimal decoding performance.

Proof: See Appendix C.

In Appendix C, Theorem 3 is proved by showing that equality holds between the log-likelihood ratios (LLRs) calculated by the CASLC-IQ scheme and by the CASLC-DQ scheme, which is shown to achieve the optimal decoding performance. It can also be proved intuitively as follows: Since the QR decomposition step is proved in Theorem 1 to maintain all the necessary information, no loss of information occurs even if the QR decomposition step is applied multiple times, which is essentially what the incremental QR decomposition does. Therefore, all the necessary information is kept after the incremental QR decomposition. As a result, the CASLC-IQ scheme achieves the optimal decoding performance.

IV. ANALYSIS OF THE COMBINING SCHEMES

Both the CASLC scheme with the direct QR decomposition (CASLC-DQ) and the CASLC scheme with the incremental QR decomposition (CASLC-IQ) have the optimal decoding performance. Both of them reuse the basic decoder as well, thereby decreasing the receiver complexity. However, they differ greatly in terms of memory requirement and the size of the QR decomposition.

When an error is detected after the $N$th transmission, the CASLC-DQ scheme stores the $N N_r \times N_t$ concatenated matrix and the $N N_r \times 1$ concatenated received signal vector. On the other hand, the CASLC-IQ scheme stores the $N_t \times N_t$ upper-triangular matrix and the $N_t \times 1$ signal vector. Therefore, the CASLC-IQ scheme requires a small memory size regardless of the number of transmissions.
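The CASLC-IQ recursion and its fixed storage can be sketched as a small stateful combiner. This is an illustration under stated assumptions rather than the authors' implementation: the class name `CaslcIqCombiner` and the helper `metric` are hypothetical, $N_r \geq N_t$ is assumed so that the reduced QR of the first channel matrix returns a square $\mathbf{R}$, and `np.linalg.qr` is used without normalizing the diagonal of $\mathbf{R}$, which does not change the decoding metric.

```python
import numpy as np

class CaslcIqCombiner:
    """Keeps only y'_N (length N_t) and R'_N (N_t x N_t) between transmissions."""

    def __init__(self):
        self.y_prev = None
        self.R_prev = None

    def update(self, y, H):
        """Fold a new (y_N, H_N) into the stored state and return (y'_N, R'_N)."""
        if self.R_prev is None:
            stacked_H, stacked_y = H, y                      # first transmission
        else:
            stacked_H = np.vstack([self.R_prev, H])          # (N_t + N_r) x N_t
            stacked_y = np.concatenate([self.y_prev, y])     # length N_t + N_r
        Q, R = np.linalg.qr(stacked_H)                       # reduced QR
        self.y_prev = Q.conj().T @ stacked_y
        self.R_prev = R
        return self.y_prev, self.R_prev

def metric(y_eff, R_eff, x):
    """ML decoding metric ||y_eff - R_eff x||^2."""
    return np.linalg.norm(y_eff - R_eff @ x) ** 2

rng = np.random.default_rng(4)
n_tx, n_rx, n_transmissions = 2, 3, 4
x = np.array([1 + 1j, 1 - 1j]) / np.sqrt(2)
Hs = [(rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
      for _ in range(n_transmissions)]
ys = [H @ x + (rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx)) / np.sqrt(2)
      for H in Hs]

combiner = CaslcIqCombiner()
for y, H in zip(ys, Hs):
    y_iq, R_iq = combiner.update(y, H)

# Direct (CASLC-DQ) reference computed from all stored transmissions.
Q_dq, R_dq = np.linalg.qr(np.vstack(Hs))
y_dq = Q_dq.conj().T @ np.concatenate(ys)
```

Only `y_prev` and `R_prev` survive between calls, so the stored state has the same size after every transmission. Counting real values, an upper-triangular $N_t \times N_t$ matrix with a real diagonal plus a complex $N_t \times 1$ vector gives $N_t^2 + 2N_t$ numbers, which appears consistent with the $N_t(2 + N_t)$ entry reported for CASLC-IQ in Table I.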
When $N$ transmissions occur, the CASLC-DQ scheme performs QR decompositions on matrices of size $N_r \times N_t$, $2N_r \times N_t$, $\cdots$, $N N_r \times N_t$. On the other hand, the CASLC-IQ scheme performs the QR decomposition on an $N_r \times N_t$ matrix once and on an $(N_t + N_r) \times N_t$ matrix $N - 1$ times. Therefore, the CASLC-IQ scheme is a more flexible combining scheme. This analysis is summarized in Table I, where CO denotes the decoding scheme with concatenation only.

TABLE I
COMPARISON OF THE COMBINING SCHEMES

Combining Scheme | Decoding Performance | Basic Decoder Reused | Memory Requirement | Size of QR Decomposition
CO               | Optimal              | No                   | $2 N N_r (N_t + 1)$ | -
CASLC-DQ         | Optimal              | Yes                  | $2 N N_r (N_t + 1)$ | $N N_r \times N_t$
CASLC-IQ         | Optimal              | Yes                  | $N_t (2 + N_t)$     | $N_r \times N_t$ for $N = 1$; $(N_t + N_r) \times N_t$ for $N \geq 2$

V. SIMULATION RESULTS

This section provides simulation results to compare the decoding performances of the receiver combining schemes under an IEEE 802.16e system setting. A block diagram of the IEEE 802.16e system is shown in Fig. 1. The number of transmit antennas $N_t$ and the number of receive antennas $N_r$ are both set to 2. The number of subcarriers is 1024, and the full-usage-of-subcarriers mode is assumed. One packet consists of multiple coded blocks and spans 3 OFDM symbol times. One OFDM symbol time is 102.85 μs. Each OFDM symbol consists of 16 subchannels, and each subchannel has 48 subcarriers. The carrier frequency, the sampling rate, and the subcarrier spacing are 2.5 GHz, 11.2 MHz, and 10.9375 kHz, respectively. The 3GPP UMTS channel model case II with mobile speed 60 km/h and case III with mobile speed 3 km/h are used for the simulations [9].

Fig. 1. Block diagram of IEEE 802.16e system. [Block labels recoverable from the figure: data bits + tail bits; convolutional encoder r = 1/2; puncture bits; r = 1/2, 2/3, 3/4, 5/6; append bits; coded bits; interleaver; QAM modulator; subcarrier assignment & spatial mapping.]

Fig. 2. PER for 4QAM with code rate 1/2 for 3GPP channel case II. [Plot of PER versus SNR (dB) for N = 1, 2, 3; curves: CO-ML, CASLC-ML, ML-BLC, ML, CO-ZF, CASLC-ZF, ZF-BLC, ZF.]

Fig. 3. PER for 16QAM with code rate 3/4 for 3GPP channel case II. [Plot of PER versus SNR (dB) for N = 1, 2, 3; curves: CO-ML, CASLC-ML, ML-BLC, ML, CO-ZF, CASLC-ZF, ZF-BLC, ZF.]

Figures 2, 3, and 4 respectively show the packet-error rate (PER) versus SNR when the modulation and coding schemes are 4-QAM with code rate $r_c = 1/2$, 16-QAM with code rate $r_c = 3/4$, and 64-QAM with code rate $r_c = 3/4$, for the 3GPP UMTS channel model case II. Figures 5, 6, and 7 respectively show the PER versus SNR when the modulation and coding schemes are 4-QAM with code rate $r_c = 1/2$, 16-QAM with code rate $r_c = 3/4$, and 64-QAM with code rate $r_c = 3/4$, for the 3GPP UMTS channel model case III. In the figures, both ML and ZF denote the cases when only one transmission occurs. For all the other receiver combining schemes, the numbers of transmissions are 2 and 3, and the multiple received signal vectors are combined accordingly.
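The OFDM parameters quoted in this section can be cross-checked with a few lines of arithmetic. The cyclic-prefix ratio of 1/8 used below is an assumption that is not stated in the text; it is chosen only because it reproduces the quoted symbol time to within rounding. All variable names are illustrative.

```python
sampling_rate_hz = 11.2e6      # from the text
fft_size = 1024                # number of subcarriers, from the text
cp_ratio = 1 / 8               # ASSUMPTION: cyclic-prefix ratio, not given in the text

# Subcarrier spacing is the sampling rate divided by the FFT size.
subcarrier_spacing_hz = sampling_rate_hz / fft_size

# Useful symbol time is the inverse of the spacing; the cyclic prefix extends it.
useful_symbol_time_s = 1 / subcarrier_spacing_hz
symbol_time_us = useful_symbol_time_s * (1 + cp_ratio) * 1e6

# 16 subchannels of 48 subcarriers each.
subcarriers_in_subchannels = 16 * 48

print(subcarrier_spacing_hz / 1e3, "kHz")
print(round(symbol_time_us, 2), "us")
print(subcarriers_in_subchannels, "subcarriers in subchannels")
```

Under the assumed prefix ratio the symbol time evaluates to about 102.86 μs, which agrees with the quoted 102.85 μs up to rounding.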

Fig. 4. PER for 64QAM with code rate 3/4 for 3GPP channel case II. [Plot of PER versus SNR (dB) for N = 1, 2, 3; curves: CO-ML, CASLC-ML, ML-BLC, ML, CO-ZF, CASLC-ZF, ZF-BLC, ZF.]

CO-ML and CO-ZF denote receiver combining schemes that perform concatenation only and then respectively perform ML decoding and ZF decoding. CASLC-ML and CASLC-ZF denote the proposed CASLC schemes with incremental QR decomposition, respectively with ML decoding and with ZF decoding. Finally, ML-BLC and ZF-BLC denote receiver combining schemes that perform ML decoding and ZF decoding respectively and then add the resulting LLR values. These BLC schemes are representative post-combining schemes for HARQ systems.

The simulation results show that the PER decreases significantly when multiple received signal vectors are combined. They also show that the proposed CASLC-ML scheme performs equivalently to the CO-ML scheme. The CO-ML scheme is shown in Section III-A to have the optimal decoding performance, because it is essentially a big ML decoder. Therefore, the fact that the proposed CASLC-ML scheme performs equivalently to the CO-ML scheme supports Theorem 3 in Section III-C, which states that the proposed CASLC-ML scheme also achieves the optimal decoding performance.

Additionally, the CASLC-ML scheme performs better than all the other schemes. The simulation results show that the decoding performance of the CASLC scheme with only 2 transmissions is almost equivalent to that of a suboptimal post-combining scheme, the BLC scheme, with 3 transmissions. This underlines the importance of the proposed CASLC scheme having the optimal decoding performance.

Fig. 5. PER for 4QAM with code rate 1/2 for 3GPP channel case III. [Plot of PER versus SNR (dB) for N = 1, 2, 3; curves: CO-ML, CASLC-ML, ML-BLC, ML, CO-ZF, CASLC-ZF, ZF-BLC, ZF.]

Fig. 6. PER for 16QAM with code rate 3/4 for 3GPP channel case III. [Plot of PER versus SNR (dB) for N = 1, 2, 3; curves: CO-ML, CASLC-ML, ML-BLC, ML, CO-ZF, CASLC-ZF, ZF-BLC, ZF.]

Fig. 7. PER for 64QAM with code rate 3/4 for 3GPP channel case III. [Plot of PER versus SNR (dB) for N = 1, 2, 3; curves: CO-ML, CASLC-ML, ML-BLC, ML, CO-ZF, CASLC-ZF, ZF-BLC, ZF.]

VI. CONCLUSION AND REMARKS

This paper proposes a novel receiver combining scheme for MIMO systems with HARQ: the CASLC scheme. By using the QR decomposition, the CASLC scheme is designed to reuse the basic decoder regardless of the number of transmissions, thereby having low receiver complexity. By using the incremental QR decomposition, the CASLC scheme can be implemented with a low memory requirement and high flexibility. Through mathematical analysis as well as computer simulations, both the CASLC scheme with the direct QR decomposition and the CASLC scheme with the incremental QR decomposition are shown to achieve the optimal decoding performance.

The CASLC scheme is preferable to the MRC scheme proposed in [8] for implementation, even though both have the optimal decoding performance and require only a small memory size. This is because the MRC scheme requires matrix inversions, which comprise not only add and subtract operations but also many multiply and divide operations. In contrast, the CASLC scheme only needs the QR decomposition, which can be efficiently performed by Givens rotations [11] with the CORDIC method [12], which requires only add and subtract operations along with multiply-by-power-of-2 and divide-by-power-of-2 operations. This is a very attractive feature because multiplying or dividing by a power of 2 only requires shifting bits to the left or to the right in the memory where the operands are stored. Therefore, using the proposed CASLC scheme with the incremental QR decomposition is beneficial for MIMO systems with HARQ in terms of decoding performance, complexity, and memory requirement.
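To make the remark about Givens rotations concrete, the sketch below performs one incremental update by rotating the rows of a new channel matrix into a stored upper-triangular matrix. It is a floating-point illustration only: the function name `givens_qr_update` is hypothetical, the shift-and-add CORDIC arithmetic of [12] is not modeled, and the first transmission is handled with `np.linalg.qr` purely for convenience.

```python
import numpy as np

def givens_qr_update(R_prev, H_new, y_prev, y_new):
    """Triangularize [R_prev; H_new] with complex Givens rotations.

    Each rotation is also applied to [y_prev; y_new], so Q is never formed.
    Returns the updated pair (y', R').
    """
    n_tx = R_prev.shape[1]
    A = np.vstack([R_prev, H_new]).astype(complex)
    b = np.concatenate([y_prev, y_new]).astype(complex)
    for col in range(n_tx):
        for row in range(n_tx, A.shape[0]):
            a, c = A[col, col], A[row, col]
            r = np.hypot(abs(a), abs(c))
            if r == 0.0:
                continue
            # Unitary 2x2 rotation mapping (a, c) to (r, 0) with r real and nonnegative.
            G = np.array([[np.conj(a), np.conj(c)],
                          [-c, a]]) / r
            A[[col, row], :] = G @ A[[col, row], :]
            b[[col, row]] = G @ b[[col, row]]
    # np.triu discards round-off below the diagonal of the top block.
    return b[:n_tx], np.triu(A[:n_tx, :])

rng = np.random.default_rng(5)
n_tx, n_rx = 2, 2
H1, H2 = [(rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)
          for _ in range(2)]
y1, y2 = [(rng.standard_normal(n_rx) + 1j * rng.standard_normal(n_rx)) / np.sqrt(2)
          for _ in range(2)]

Q1, R1 = np.linalg.qr(H1)              # first transmission
y1p = Q1.conj().T @ y1
y2p, R2p = givens_qr_update(R1, H2, y1p, y2)
```

Because the stored matrix is already triangular, only the rows of the new channel matrix need to be rotated away, so the number of rotations per update does not depend on how many transmissions have occurred.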

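APPENDIX A
PROOF OF THEOREM 1

When $N$ transmissions occur for the common transmit signal vector ($N \geq 1$), the decoding scheme that uses the metric $\|\tilde{\mathbf{y}}_N - \tilde{\mathbf{H}}_N \mathbf{x}\|^2$ has the optimal decoding performance, because no information is lost during concatenation. This metric expands as

\[ \|\tilde{\mathbf{y}}_N - \tilde{\mathbf{H}}_N \mathbf{x}\|^2 = \tilde{\mathbf{y}}_N^* \tilde{\mathbf{y}}_N - \mathbf{x}^* \tilde{\mathbf{H}}_N^* \tilde{\mathbf{y}}_N - \tilde{\mathbf{y}}_N^* \tilde{\mathbf{H}}_N \mathbf{x} + \mathbf{x}^* \tilde{\mathbf{H}}_N^* \tilde{\mathbf{H}}_N \mathbf{x}. \tag{15} \]

On the other hand, the CASLC-DQ scheme uses the metric $\|\tilde{\mathbf{Q}}_N^* \tilde{\mathbf{y}}_N - \tilde{\mathbf{R}}_N \mathbf{x}\|^2$. Since $\tilde{\mathbf{Q}}_N^* \tilde{\mathbf{Q}}_N = \mathbf{I}_{N_t}$ and $\tilde{\mathbf{H}}_N = \tilde{\mathbf{Q}}_N \tilde{\mathbf{R}}_N$,

\begin{align}
\|\tilde{\mathbf{Q}}_N^* \tilde{\mathbf{y}}_N - \tilde{\mathbf{R}}_N \mathbf{x}\|^2
&= \tilde{\mathbf{y}}_N^* \tilde{\mathbf{Q}}_N \tilde{\mathbf{Q}}_N^* \tilde{\mathbf{y}}_N - \mathbf{x}^* \tilde{\mathbf{R}}_N^* \tilde{\mathbf{Q}}_N^* \tilde{\mathbf{y}}_N - \tilde{\mathbf{y}}_N^* \tilde{\mathbf{Q}}_N \tilde{\mathbf{R}}_N \mathbf{x} + \mathbf{x}^* \tilde{\mathbf{R}}_N^* \tilde{\mathbf{R}}_N \mathbf{x} \tag{16} \\
&= \tilde{\mathbf{y}}_N^* \tilde{\mathbf{Q}}_N \tilde{\mathbf{Q}}_N^* \tilde{\mathbf{y}}_N - \mathbf{x}^* \tilde{\mathbf{H}}_N^* \tilde{\mathbf{y}}_N - \tilde{\mathbf{y}}_N^* \tilde{\mathbf{H}}_N \mathbf{x} + \mathbf{x}^* \tilde{\mathbf{H}}_N^* \tilde{\mathbf{H}}_N \mathbf{x}. \tag{17}
\end{align}

Equations (15) and (17) show that the ML decoding metrics are equal except for the first term, which does not depend on $\mathbf{x}$. When calculating the log-likelihood ratio (LLR), these first terms are common factors in the numerator and the denominator. Therefore, the LLRs calculated by the optimal decoding scheme and by the CASLC-DQ scheme are equal. As a result, the CASLC-DQ scheme also has the optimal decoding performance.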

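APPENDIX B
PROOF OF THEOREM 2

Performing QR decomposition on the concatenated channel matrix $\tilde{\mathbf{H}}_2$ yields $\tilde{\mathbf{H}}_2 = \tilde{\mathbf{Q}}_2\tilde{\mathbf{R}}_2$, where $\tilde{\mathbf{Q}}_2$ and $\tilde{\mathbf{R}}_2$ respectively are a $2N_r \times N_t$ matrix with orthonormal columns and an $N_t \times N_t$ upper-triangular matrix. On the other hand,

\begin{align}
\tilde{\mathbf{H}}_2 &= \begin{bmatrix} \mathbf{H}_1 \\ \mathbf{H}_2 \end{bmatrix} = \begin{bmatrix} \mathbf{Q}_1\mathbf{R}_1 \\ \mathbf{H}_2 \end{bmatrix} \tag{23} \\
&= \begin{bmatrix} \mathbf{Q}_1 & \mathbf{0}_{N_r \times N_r} \\ \mathbf{0}_{N_r \times N_t} & \mathbf{I}_{N_r} \end{bmatrix} \begin{bmatrix} \mathbf{R}_1 \\ \mathbf{H}_2 \end{bmatrix} \tag{24} \\
&= \mathbf{A} \begin{bmatrix} \mathbf{R}_1 \\ \mathbf{H}_2 \end{bmatrix}, \tag{25}
\end{align}

where the $2N_r \times (N_t + N_r)$ matrix $\mathbf{A}$ is defined as follows for notational brevity.

\begin{align}
\mathbf{A} := \begin{bmatrix} \mathbf{Q}_1 & \mathbf{0}_{N_r \times N_r} \\ \mathbf{0}_{N_r \times N_t} & \mathbf{I}_{N_r} \end{bmatrix}. \tag{26}
\end{align}

Since $\mathbf{A}^{*}\mathbf{A} = \mathbf{I}_{N_t + N_r}$, multiplying $\mathbf{A}^{*}$ on both sides of (25) yields

\begin{align}
\begin{bmatrix} \mathbf{R}_1 \\ \mathbf{H}_2 \end{bmatrix} = \mathbf{A}^{*}\tilde{\mathbf{H}}_2 = \mathbf{A}^{*}\tilde{\mathbf{Q}}_2\tilde{\mathbf{R}}_2 = \mathbf{B}\tilde{\mathbf{R}}_2, \tag{27}
\end{align}

where the $(N_t + N_r) \times N_t$ matrix $\mathbf{B}$ is defined as follows for notational brevity.

\begin{align}
\mathbf{B} := \mathbf{A}^{*}\tilde{\mathbf{Q}}_2. \tag{28}
\end{align}

Since $N_t = N_r$, $\mathbf{B}^{*}\mathbf{B} = \mathbf{I}_{N_t}$ because $\mathbf{A}\mathbf{A}^{*} = \mathbf{I}_{2N_t}$. Moreover, $\tilde{\mathbf{R}}_2$ is an $N_t \times N_t$ upper-triangular matrix. Thus, from (27) and $[\mathbf{R}_1^{T}\ \mathbf{H}_2^{T}]^{T} = \mathbf{Q}'_2\mathbf{R}'_2$, where $\mathbf{Q}'_2$ and $\mathbf{R}'_2$ respectively are a $2N_t \times N_t$ matrix with orthonormal columns and an $N_t \times N_t$ upper-triangular matrix, $\mathbf{Q}'_2 = \mathbf{B} = \mathbf{A}^{*}\tilde{\mathbf{Q}}_2$ and $\mathbf{R}'_2 = \tilde{\mathbf{R}}_2$, because of the uniqueness of the QR decomposition. Therefore,

\begin{align}
\tilde{\mathbf{Q}}_2^{*}\tilde{\mathbf{y}}_2 &= \tilde{\mathbf{Q}}_2^{*} \begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix} = \mathbf{Q}_2'^{*}\mathbf{A}^{*} \begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix} \tag{29} \\
&= \mathbf{Q}_2'^{*} \begin{bmatrix} \mathbf{Q}_1^{*}\mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix} = \mathbf{Q}_2'^{*} \begin{bmatrix} \mathbf{y}'_1 \\ \mathbf{y}_2 \end{bmatrix} \tag{30} \\
&= \mathbf{y}'_2. \tag{31}
\end{align}

Since $\mathbf{y}'_2 = \tilde{\mathbf{Q}}_2^{*}\tilde{\mathbf{y}}_2$ and $\mathbf{R}'_2 = \tilde{\mathbf{R}}_2$, the ML decoding metric of the incremental QR decomposition, $\|\mathbf{y}'_2 - \mathbf{R}'_2\mathbf{x}\|^2$, is equal to the ML decoding metric of the direct QR decomposition, $\|\tilde{\mathbf{Q}}_2^{*}\tilde{\mathbf{y}}_2 - \tilde{\mathbf{R}}_2\mathbf{x}\|^2$, when $N_t = N_r$.

APPENDIX C
PROOF OF THEOREM 3

When the first transmission occurs, there is no difference between the incremental QR decomposition and the direct QR decomposition. Therefore, from Theorem 1, the CASLC-IQ scheme also has the optimal decoding performance. When the $N$th transmission occurs ($N \geq 2$), the decoding scheme that uses the metric $\|\tilde{\mathbf{y}}_N - \tilde{\mathbf{H}}_N\mathbf{x}\|^2$ has the optimal decoding performance, because no information is lost during concatenation.

\begin{align}
\|\tilde{\mathbf{y}}_N - \tilde{\mathbf{H}}_N\mathbf{x}\|^2
&= \tilde{\mathbf{y}}_N^{*}\tilde{\mathbf{y}}_N - \mathbf{x}^{*}\tilde{\mathbf{H}}_N^{*}\tilde{\mathbf{y}}_N - \tilde{\mathbf{y}}_N^{*}\tilde{\mathbf{H}}_N\mathbf{x} + \mathbf{x}^{*}\tilde{\mathbf{H}}_N^{*}\tilde{\mathbf{H}}_N\mathbf{x} \tag{32} \\
&= \sum_{i=1}^{N}\mathbf{y}_i^{*}\mathbf{y}_i - \mathbf{x}^{*}\Big(\sum_{i=1}^{N}\mathbf{H}_i^{*}\mathbf{y}_i\Big) - \Big(\sum_{i=1}^{N}\mathbf{y}_i^{*}\mathbf{H}_i\Big)\mathbf{x} + \mathbf{x}^{*}\Big(\sum_{i=1}^{N}\mathbf{H}_i^{*}\mathbf{H}_i\Big)\mathbf{x}. \tag{33}
\end{align}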

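On the other hand, the CASLC-IQ scheme uses the metric $\|\mathbf{y}'_N - \mathbf{R}'_N\mathbf{x}\|^2$, where $\mathbf{y}'_N = \mathbf{Q}_N'^{*}[\mathbf{y}_{N-1}'^{T}\ \mathbf{y}_N^{T}]^{T}$ and $[\mathbf{R}_{N-1}'^{T}\ \mathbf{H}_N^{T}]^{T} = \mathbf{Q}'_N\mathbf{R}'_N$.

\begin{align}
\|\mathbf{y}'_N - \mathbf{R}'_N\mathbf{x}\|^2 = \mathbf{y}_N'^{*}\mathbf{y}'_N - \mathbf{x}^{*}\mathbf{R}_N'^{*}\mathbf{y}'_N - \mathbf{y}_N'^{*}\mathbf{R}'_N\mathbf{x} + \mathbf{x}^{*}\mathbf{R}_N'^{*}\mathbf{R}'_N\mathbf{x}. \tag{34}
\end{align}

The second term on the right side of (34), $\mathbf{x}^{*}\mathbf{R}_N'^{*}\mathbf{y}'_N$, is

\begin{align}
\mathbf{x}^{*}\mathbf{R}_N'^{*}\mathbf{y}'_N &= \mathbf{x}^{*}\mathbf{R}_N'^{*}\mathbf{Q}_N'^{*} \begin{bmatrix} \mathbf{y}'_{N-1} \\ \mathbf{y}_N \end{bmatrix} \tag{35} \\
&= \mathbf{x}^{*} \begin{bmatrix} \mathbf{R}_{N-1}'^{*} & \mathbf{H}_N^{*} \end{bmatrix} \begin{bmatrix} \mathbf{y}'_{N-1} \\ \mathbf{y}_N \end{bmatrix} \tag{36} \\
&= \mathbf{x}^{*}\big(\mathbf{R}_{N-1}'^{*}\mathbf{y}'_{N-1} + \mathbf{H}_N^{*}\mathbf{y}_N\big) \tag{37} \\
&= \cdots = \mathbf{x}^{*}\sum_{i=1}^{N}\mathbf{H}_i^{*}\mathbf{y}_i. \tag{38}
\end{align}

Similarly, the third term on the right side of (34), $\mathbf{y}_N'^{*}\mathbf{R}'_N\mathbf{x}$, is

\begin{align}
\mathbf{y}_N'^{*}\mathbf{R}'_N\mathbf{x} = \sum_{i=1}^{N}\mathbf{y}_i^{*}\mathbf{H}_i\mathbf{x}. \tag{39}
\end{align}

Finally, since $\mathbf{Q}_N'^{*}\mathbf{Q}'_N = \mathbf{I}_{N_t}$, the last term on the right side of (34), $\mathbf{x}^{*}\mathbf{R}_N'^{*}\mathbf{R}'_N\mathbf{x}$, is

\begin{align}
\mathbf{x}^{*}\mathbf{R}_N'^{*}\mathbf{R}'_N\mathbf{x} &= \mathbf{x}^{*}\mathbf{R}_N'^{*}\mathbf{Q}_N'^{*}\mathbf{Q}'_N\mathbf{R}'_N\mathbf{x} \tag{40} \\
&= \mathbf{x}^{*} \begin{bmatrix} \mathbf{R}_{N-1}'^{*} & \mathbf{H}_N^{*} \end{bmatrix} \begin{bmatrix} \mathbf{R}'_{N-1} \\ \mathbf{H}_N \end{bmatrix} \mathbf{x} \tag{41} \\
&= \mathbf{x}^{*}\big(\mathbf{R}_{N-1}'^{*}\mathbf{R}'_{N-1} + \mathbf{H}_N^{*}\mathbf{H}_N\big)\mathbf{x} \tag{42} \\
&= \cdots = \mathbf{x}^{*}\sum_{i=1}^{N}\mathbf{H}_i^{*}\mathbf{H}_i\mathbf{x}. \tag{43}
\end{align}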

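\begin{align}
& \ln\frac{\sum_{\hat{\mathbf{x}}^{(1)} \in X_\lambda^{(1)}} \exp\big\{-\|\tilde{\mathbf{y}}_N - \tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(1)}\|^2\big\}}{\sum_{\hat{\mathbf{x}}^{(0)} \in X_\lambda^{(0)}} \exp\big\{-\|\tilde{\mathbf{y}}_N - \tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(0)}\|^2\big\}} \tag{18} \\
&= \ln\frac{\sum_{\hat{\mathbf{x}}^{(1)} \in X_\lambda^{(1)}} \exp\big\{-\big(\tilde{\mathbf{y}}_N^{*}\tilde{\mathbf{y}}_N - \hat{\mathbf{x}}^{(1)*}\tilde{\mathbf{H}}_N^{*}\tilde{\mathbf{y}}_N - \tilde{\mathbf{y}}_N^{*}\tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(1)} + \hat{\mathbf{x}}^{(1)*}\tilde{\mathbf{H}}_N^{*}\tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(1)}\big)\big\}}{\sum_{\hat{\mathbf{x}}^{(0)} \in X_\lambda^{(0)}} \exp\big\{-\big(\tilde{\mathbf{y}}_N^{*}\tilde{\mathbf{y}}_N - \hat{\mathbf{x}}^{(0)*}\tilde{\mathbf{H}}_N^{*}\tilde{\mathbf{y}}_N - \tilde{\mathbf{y}}_N^{*}\tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(0)} + \hat{\mathbf{x}}^{(0)*}\tilde{\mathbf{H}}_N^{*}\tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(0)}\big)\big\}} \tag{19} \\
&= \ln\frac{\sum_{\hat{\mathbf{x}}^{(1)} \in X_\lambda^{(1)}} \exp\big\{-\big(-\hat{\mathbf{x}}^{(1)*}\tilde{\mathbf{H}}_N^{*}\tilde{\mathbf{y}}_N - \tilde{\mathbf{y}}_N^{*}\tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(1)} + \hat{\mathbf{x}}^{(1)*}\tilde{\mathbf{H}}_N^{*}\tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(1)}\big)\big\}}{\sum_{\hat{\mathbf{x}}^{(0)} \in X_\lambda^{(0)}} \exp\big\{-\big(-\hat{\mathbf{x}}^{(0)*}\tilde{\mathbf{H}}_N^{*}\tilde{\mathbf{y}}_N - \tilde{\mathbf{y}}_N^{*}\tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(0)} + \hat{\mathbf{x}}^{(0)*}\tilde{\mathbf{H}}_N^{*}\tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(0)}\big)\big\}} \tag{20} \\
&= \ln\frac{\sum_{\hat{\mathbf{x}}^{(1)} \in X_\lambda^{(1)}} \exp\big\{-\big(\tilde{\mathbf{y}}_N^{*}\tilde{\mathbf{Q}}_N\tilde{\mathbf{Q}}_N^{*}\tilde{\mathbf{y}}_N - \hat{\mathbf{x}}^{(1)*}\tilde{\mathbf{H}}_N^{*}\tilde{\mathbf{y}}_N - \tilde{\mathbf{y}}_N^{*}\tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(1)} + \hat{\mathbf{x}}^{(1)*}\tilde{\mathbf{H}}_N^{*}\tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(1)}\big)\big\}}{\sum_{\hat{\mathbf{x}}^{(0)} \in X_\lambda^{(0)}} \exp\big\{-\big(\tilde{\mathbf{y}}_N^{*}\tilde{\mathbf{Q}}_N\tilde{\mathbf{Q}}_N^{*}\tilde{\mathbf{y}}_N - \hat{\mathbf{x}}^{(0)*}\tilde{\mathbf{H}}_N^{*}\tilde{\mathbf{y}}_N - \tilde{\mathbf{y}}_N^{*}\tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(0)} + \hat{\mathbf{x}}^{(0)*}\tilde{\mathbf{H}}_N^{*}\tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(0)}\big)\big\}} \tag{21} \\
&= \ln\frac{\sum_{\hat{\mathbf{x}}^{(1)} \in X_\lambda^{(1)}} \exp\big\{-\|\tilde{\mathbf{Q}}_N^{*}\tilde{\mathbf{y}}_N - \tilde{\mathbf{R}}_N\hat{\mathbf{x}}^{(1)}\|^2\big\}}{\sum_{\hat{\mathbf{x}}^{(0)} \in X_\lambda^{(0)}} \exp\big\{-\|\tilde{\mathbf{Q}}_N^{*}\tilde{\mathbf{y}}_N - \tilde{\mathbf{R}}_N\hat{\mathbf{x}}^{(0)}\|^2\big\}}. \tag{22}
\end{align}

Equations (38), (39), and (43) show that the ML decoding metrics (33) and (34) are equal except for the first term. When calculating the log-likelihood ratio (LLR), these first terms are common factors in the numerator and the denominator. Therefore, as in the proof of Theorem 1 in Appendix A, equality holds between the LLRs calculated by the optimal decoding scheme and by the CASLC-IQ scheme.

\begin{align}
\ln\frac{\sum_{\hat{\mathbf{x}}^{(1)} \in X_\lambda^{(1)}} \exp\big\{-\|\tilde{\mathbf{y}}_N - \tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(1)}\|^2\big\}}{\sum_{\hat{\mathbf{x}}^{(0)} \in X_\lambda^{(0)}} \exp\big\{-\|\tilde{\mathbf{y}}_N - \tilde{\mathbf{H}}_N\hat{\mathbf{x}}^{(0)}\|^2\big\}}
= \ln\frac{\sum_{\hat{\mathbf{x}}^{(1)} \in X_\lambda^{(1)}} \exp\big\{-\|\mathbf{y}'_N - \mathbf{R}'_N\hat{\mathbf{x}}^{(1)}\|^2\big\}}{\sum_{\hat{\mathbf{x}}^{(0)} \in X_\lambda^{(0)}} \exp\big\{-\|\mathbf{y}'_N - \mathbf{R}'_N\hat{\mathbf{x}}^{(0)}\|^2\big\}}. \tag{44}
\end{align}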
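The recursion behind (35)-(43) can be illustrated with a short NumPy sketch. It is an illustration under assumed parameters (antenna counts, number of transmissions, noise level, QPSK alphabet), not the authors' implementation, and the helper `crandn` and the variable names are hypothetical. Because `numpy.linalg.qr` does not normalize the sign or phase of the diagonal of the triangular factor, the check compares Gram matrices and matched-filter outputs rather than the triangular factors themselves.

```python
import numpy as np

rng = np.random.default_rng(1)
Nt = Nr = 2  # assumed antenna counts
N = 4        # assumed number of transmissions


def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


H = [crandn(Nr, Nt) for _ in range(N)]               # per-transmission channels
x_true = np.array([1 - 1j, -1 - 1j]) / np.sqrt(2)
y = [Hi @ x_true + 0.1 * crandn(Nr) for Hi in H]     # per-transmission receptions

# First transmission: ordinary QR of H_1.
Q1, R_inc = np.linalg.qr(H[0])
y_inc = Q1.conj().T @ y[0]

# Each retransmission: stack the stored (R', y') on the new (H_N, y_N)
# and triangularize again; only an Nt x Nt matrix and an Nt-vector are kept.
for Hn, yn in zip(H[1:], y[1:]):
    Qn, R_new = np.linalg.qr(np.vstack([R_inc, Hn]))
    y_inc = Qn.conj().T @ np.concatenate([y_inc, yn])
    R_inc = R_new

# Quantities appearing in (38) and (43).
gram_sum = sum(Hi.conj().T @ Hi for Hi in H)
mf_sum = sum(Hi.conj().T @ yi for Hi, yi in zip(H, y))
print("R'^* R' matches sum of H_i^* H_i:", np.allclose(R_inc.conj().T @ R_inc, gram_sum))
print("R'^* y' matches sum of H_i^* y_i:", np.allclose(R_inc.conj().T @ y_inc, mf_sum))

# Metric comparison against the concatenated system over all QPSK candidates.
H_cat, y_cat = np.vstack(H), np.concatenate(y)
symbols = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
candidates = [np.array([a, b]) for a in symbols for b in symbols]
gaps = np.array([
    np.linalg.norm(y_cat - H_cat @ x) ** 2 - np.linalg.norm(y_inc - R_inc @ x) ** 2
    for x in candidates
])
print("metric gap constant over x:", np.allclose(gaps, gaps[0]))
```

The gap between the two metrics is the same for every candidate vector, so it cancels between the numerator and the denominator of the LLR.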

As a result, the CASLC-IQ scheme also achieves the optimal decoding performance.

Edward W. Jang (S'04) received his B.S. degree in electrical engineering from Seoul National University, Korea, in 2002, and his M.S. degree in electrical engineering from Stanford University, Stanford, CA, in 2004. He is currently pursuing his Ph.D. degree at Stanford University. His research interests include transmission schemes for systems with a limited feedback rate and for MIMO systems with hybrid ARQ.

Jungwon Lee (S'00, M'05) received his B.S. degree in electrical engineering from Seoul National University in 1999, and his M.S. and Ph.D. degrees in electrical engineering from Stanford University in 2001 and 2005, respectively. From 2000 to 2003, he worked as an intern for National Semiconductor, Telcordia Technologies, and AT&T Shannon Labs Research and as a consultant for Ikanos Communications. Since 2003, he has worked for Marvell Semiconductor Inc., Santa Clara, California, where he is now a Senior Manager/Principal Engineer leading algorithm development and system architecture design for the next generation wireless communication network. His research interests lie in wireless and wireline communication theory with emphasis on multi-input multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) system design.

Leilei Song (M'99) received the Ph.D. degree in electrical and computer engineering from the University of Minnesota, Twin Cities, MN, in 1999. From 1999 to 2001, she was a Principal Investigator at Bell Lab Research in Holmdel, New Jersey, where she designed algorithms and efficient architectures for applying forward error correction to high-speed optical communication systems. From 2002 to 2006, she was a Distinguished Member of Technical Staff in the PHY IP and Architecture group at Agere Systems, formerly the Microelectronics division of Lucent Technologies, and contributed to a wide range of communications IC products, including SONET, Gigabit Ethernet, Satellite Radio, and PCI Express. Since April 2006, she has been with Marvell Semiconductor, Santa Clara, California, where she is currently a senior engineering manager, leading the physical layer development effort for mobile WiMAX and other next generation wireless systems. She has 19 patents, granted and pending, and has published more than 20 papers in IEEE journals and international conferences.


John M. Cioffi BSEE, 1978, Illinois; PhDEE, 1984, Stanford; Bell Laboratories, 1978-1984; IBM Research, 1984-1986; EE Prof., Stanford, 1986-present. Cioffi founded Amati Com. Corp in 1991 (purchased by TI in 1997) and was officer/director from 1991-1997. He currently is on the Board of Directors of ASSIA (Chairman), ClariPhy, Teranetics, Vector Silicon Inc., and the Marconi Foundation. He is on the advisory boards of Focus Ventures, Quantenna, and Amicus. Cioffi's specific interests are in the area of high-performance digital transmission. Various Awards: International Marconi Fellow (2006); Holder of Hitachi America Professorship in Electrical Engineering at Stanford (2002); Member, National Academy of Engineering (2001); IEEE Kobayashi Medal (2001); IEEE Millennium Medal (2000); IEEE Fellow (1996); IEE J. J. Thomson Medal (2000); 1999 U. of Illinois Outstanding Alumnus; 1991 and 2007 IEEE Comm. Mag. best paper; 1995 ANSI T1 Outstanding Achievement Award; NSF Presidential Investigator (1987-1992); ISSLS 2004, ICC 2006, 2007, and 2008 Conference Best-Paper awards. Cioffi has published over 250 papers and holds over 80 patents, of which many are heavily licensed, including key necessary patents for the international standards in ADSL, VDSL, DSM, and WiMAX.