
IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 1, JANUARY 2008

Low Complexity Affine MMSE Detector for Iterative Detection-Decoding MIMO OFDM Systems Daniel N. Liu, Student Member, IEEE, and Michael P. Fitz

Abstract— Iterative turbo processing between detection and decoding shows near-capacity performance on a multiple-antenna system. Combining iterative processing with optimum front-end detection is particularly challenging because the front-end maximum a posteriori (MAP) algorithm has a computational complexity that is exponential. Sub-optimum detectors such as the soft interference cancellation linear minimum mean square error (SIC-LMMSE) detector, with near front-end MAP performance, have been proposed in the literature. The asymptotic computational complexity of SIC-LMMSE is $O(n_t^2 n_r + n_t n_r^3 + n_t M_c 2^{M_c})$ per detection-decoding cycle, where $n_t$ is the number of transmit antennas, $n_r$ is the number of receive antennas, and $M_c$ is the modulation size. A lower complexity detector is the hard interference cancellation LMMSE (HIC-LMMSE) detector. HIC-LMMSE has asymptotic complexity of $O(n_t^2 n_r + n_t M_c 2^{M_c})$ but suffers extra performance degradation. In this paper, two front-end detection algorithms are introduced that achieve asymptotic computational complexity of $O(n_t^2 n_r + n_t n_r^2[\Gamma(\beta) + 1] + n_t M_c 2^{M_c})$, where $\Gamma(\beta)$ is a function with discrete output $\{-1, 2, 3, \ldots, n_t\}$, and $O(n_t M_c 2^{M_c})$, respectively. Simulation results demonstrate that the proposed low complexity detection algorithms offer exactly the same performance as their full complexity counterpart in an iterative receiver while being computationally more efficient.

Index Terms— Turbo processing, soft interference cancellation, affine MMSE filtering, low density parity check (LDPC) codes, iterative decoding.

I. INTRODUCTION

Ever since Berrou and Glavieux published their landmark paper on iterative decoding between two parallel concatenated convolutional codes (turbo-codes) [1], [2], it has been generally accepted that iterative (turbo) processing techniques have great value. As pointed out in [3], the “Turbo Principle” not only can be used with traditional concatenated channel coding schemes, but also generally applies to

Paper approved by A. Lozano, the Editor for Wireless Network Access and Performance of the IEEE Communication society. Manuscript received December 16, 2005; revised January 16, 2007, and June 7, 2007. This work was supported in part by the grant from STMicroelectronics Inc. with a matching grant from the University of California Discovery Program under Grant COM-0310142. The material in this paper was presented in part at the IEEE Wireless Communications and Networking Conference 2006, Las Vegas, NV, April 2006 and in part at IEEE International Conference on Communications 2006, Istanbul, Turkey, June 2006. D. N. Liu is with UnWiReD Laboratory, the Department of Electrical Engineering, University of California Los Angeles, Los Angeles, CA 90095 USA (e-mail: [email protected]). M. P. Fitz was with UnWiReD Laboratory, the Department of Electrical Engineering, University of California Los Angeles, Los Angeles, CA 90095 USA. He is now with Northrop Grumman Corp., Redondo Beach, CA 90278 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TCOMM.2008.050623.

many detection-decoding algorithms. Of late, multiple-input multiple-output (MIMO) systems have received a tremendous amount of attention due to the information theoretic studies done by Telatar, Foschini and Gans [4], [5]. To approach channel capacity in a computationally efficient manner, it seems quite natural to apply the “Iterative (Turbo) Paradigm” to MIMO systems. Therefore, many of the aforementioned iterative detection-decoding algorithms have successfully been generalized to the MIMO environment [6]–[9], especially multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems [10].

The complexity of optimum front-end MIMO detection motivates the search for a low complexity suboptimal detector. In fact, the optimal front-end MAP detector has complexity that grows exponentially with both the modulation size and the number of antennas as $O(n_t n_r 2^{n_t M_c})$, and becomes impractical as either parameter grows large. Thus, it is important to seek a detector that has reasonable performance while keeping manageable complexity. To address complexity issues, suboptimal detectors/decoders such as the soft interference cancellation linear minimum mean square error (SIC-LMMSE) detector [6], [10], [11], the hard interference cancellation LMMSE (HIC-LMMSE) detector [9], [12] and the “list” sphere decoder [13] have been proposed in the literature. It has been shown that a system combining an iterative front-end SIC-LMMSE detector with a properly designed channel code, such as a low density parity check (LDPC) code, yields a performance 2 to 3 dB away from ergodic channel capacity [9], [10]. However, the asymptotic computational complexity of this SIC-LMMSE detector is $O(n_t^2 n_r + n_t n_r^3 + n_t M_c 2^{M_c})$ per detection-decoding cycle (i.e. turbo iteration) [9], where $n_t$ is the number of transmit antennas, $n_r$ is the number of receive antennas and $M_c$ is the modulation size. Although the complexity of the SIC-LMMSE detector grows only linearly in the number of transmit antennas, its computational complexity remains high even for moderate values of $n_t$, $n_r$ and $M_c$. Further reduced complexity detection such as the HIC-LMMSE detector is also advocated in [9], [12]. HIC-LMMSE has asymptotic computational complexity of $O(n_t^2 n_r + n_t M_c 2^{M_c})$ at the price of performance degradation.

This paper proposes two linear front-end detectors which not only achieve a significant amount of complexity reduction but, more importantly, offer the same performance as their full complexity counterpart, the SIC-LMMSE detector. They are named SIC-LMMSE detector with Recursive Update Algorithm (RUA) and SIC-affine minimum mean square error

© 2008 IEEE 0090-6778/08$25.00

LIU and FITZ: LOW COMPLEXITY AFFINE MMSE DETECTOR FOR ITERATIVE DETECTION-DECODING MIMO OFDM SYSTEMS

(SIC-AMMSE) detector, respectively. Unlike any linear front-end detectors previously suggested in the literature, the detectors proposed in this paper use a priori information fed back from the outer channel decoder for both soft interference cancellation and computational complexity reduction. By reformulating the matrix inversion step of the SIC-LMMSE detection algorithm into the RUA, the SIC-LMMSE detector is transformed into a structure more suitable for an iterative detection and decoding receiver. In particular, the SIC-LMMSE detector with RUA allocates its computational power depending on the level of the a priori information provided by the outer channel decoder. As the number of turbo iterations increases, a priori information becomes more and more reliable. Thus, a further reduced computational complexity of $O(n_t^2 n_r + n_t n_r^2[\Gamma(\beta) + 1] + n_t M_c 2^{M_c})$ is achieved without any performance degradation, where $\Gamma(\beta)$ is a function with discrete output $\{-1, 2, 3, \ldots, n_t\}$.

The novelty of the SIC-AMMSE detector lies in the detection process. The SIC-LMMSE detector is given as

$$\hat{x} = w^\dagger\left(y - \bar{y}^{(-)}\right) \tag{1}$$

where $\bar{y}^{(-)}$ denotes the mean of the observation vector given the transmitted data symbols other than the one being detected. With no a priori information available, it is natural to assume that $\bar{x} \equiv E x = 0$. Therefore, the optimal detector $w$ minimizes the mean square error (MSE) under the constraint of zero bias [14]–[16]. That is, $E\hat{x} = w^\dagger E\left(y - \bar{y}^{(-)}\right) = \bar{x} = 0$. But in an iterative detection and decoding receiver, a priori information about the current detection estimate $x$ does become available and $\bar{x} \neq 0$ after the first turbo iteration. Thus, a priori information should also be taken into account in the detection algorithm. Different from the conventional SIC-LMMSE detection algorithm, the AMMSE detector forms its detection estimate as

$$\hat{x} = w^\dagger\left(y - \bar{y}\right) + \tilde{b} \tag{2}$$

where $\bar{y}$ is the mean of the observation vector given every transmitted data symbol and $\{w, \tilde{b}\}$ are constants to be determined. More importantly, the affine formulation in (2) no longer assumes that $x$ is a zero-mean random variable throughout the whole iterative detection and decoding process. Indeed, in the absence of a priori information about $x$, the AMMSE detector has exactly the same form as the LMMSE detector [14]. Hence, the AMMSE formulation in (2) is really just a generalization of the LMMSE formulation in (1) [17], [18]. Again, as the number of turbo iterations increases, a priori information about the current detection estimate converges to the true detection estimate and much less computational power is needed in the detection process. Hence, the SIC-AMMSE detector achieves an asymptotic computational complexity of $O(n_t M_c 2^{M_c})$, which is linear in $n_t$, and, more importantly, does so without any performance degradation compared to the conventional SIC-LMMSE detector.

The remainder of the paper is organized as follows: Section II presents the system model and introduces our notation. Section III introduces further reduced complexity algorithms, namely the SIC-LMMSE detector with RUA and the SIC-AMMSE detector. Section IV presents several numerical examples


for different numbers of receive and transmit antennas on a standard wireless local area network (WLAN) channel model. Section V concludes the paper.

II. SYSTEM MODEL

A. Transmitter

This paper considers a multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) system with $n_t$ transmit and $n_r$ receive antennas. Let the vector $b$ of size $N_b$ be the source information bits entering the rate-$R_c$ LDPC channel encoder. The vector of encoded bits, $c$, is grouped into blocks of $M_c$ bits, where $M_c$ is the number of bits per constellation symbol, and multiplexed onto $n_t$ sub-streams. Each block is then mapped onto an M-ary quadrature amplitude modulation (QAM) complex symbol by the mapper µ. These symbols are transformed to the time domain using the inverse fast Fourier transform (IFFT). To eliminate inter-symbol interference (ISI), a guard interval (GI) that exceeds the delay spread of the MIMO channel is appended to the original time domain signal at each transmit antenna. This paper considers a linear model at the $k$th frequency subcarrier in which the received vector $y(k) = [y_1(k), \ldots, y_{n_r}(k)]^T \in \mathbb{C}^{n_r \times 1}$ depends on the transmitted vector $x(k) = [x_1(k), \ldots, x_{n_t}(k)]^T \in \mathbb{C}^{n_t \times 1}$ via

$$y(k) = H(k)x(k) + n(k) \tag{3}$$

where $H(k) \in \mathbb{C}^{n_r \times n_t}$ is the complex channel matrix, known perfectly by the receiver, $n(k) \in \mathbb{C}^{n_r \times 1}$ is a vector of independent zero-mean complex Gaussian noise entries with variance $\sigma^2 = N_0/2$ per real component, and $k = 1, 2, \ldots, K$, where $K$ is the total number of frequency subcarriers. This paper assumes the average symbol energy $E_s \equiv E|x_i(k)|^2 = 1$, where $i = 1, 2, \ldots, n_t$, and symbols are chosen with equal probability from a complex constellation $\mathcal{X}$ with cardinality $|\mathcal{X}| = 2^{M_c}$. The spectral efficiency $R$ is then defined as $R = n_t M_c R_c$ bits per channel use (BPCU). The signal-to-noise ratio (SNR) is defined as $E_b/N_0$, where $E_b$ is the energy per transmitted information bit per receive antenna.
Notice that each receive antenna collects a total energy of $n_t E_s$, which carries $n_t M_c R_c$ information bits; therefore $E_b$ can be expressed as $E_b = E_s/(M_c R_c)$.

B. Iterative Receiver Structure

Approaching maximum-likelihood (ML) performance with reasonable complexity relies on iterative processing between detection and decoding. Analogous to a turbo decoder, the inner MIMO detector and outer channel decoder can be regarded as two elementary “decoders” [1] in a serial concatenation architecture. A naïve ML decoder would have to compute the likelihood of each bit given the received vector $y(k)$, $k = 1, 2, \ldots, K$. Since there are $N_b$ bits in total, a true ML implementation has to compare (correlate) $2^{N_b}$ hypotheses and choose the hypothesis with the best correlation metric. Clearly, such an algorithm is computationally infeasible. Thus, this paper uses the idea of turbo decoding: the inner MIMO detector incorporates soft information provided by the outer channel decoder, and the outer channel decoder incorporates soft information provided by the inner MIMO detector.
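As a quick numerical check of the transmitter-side definitions $R = n_t M_c R_c$ and $E_b = E_s/(M_c R_c)$, the following sketch computes the spectral efficiency and converts an $E_b/N_0$ value in dB into the noise level $N_0$. The function names and the chosen operating point are illustrative assumptions, not part of the paper.

```python
def spectral_efficiency(n_t, m_c, r_c):
    """R = n_t * M_c * R_c in bits per channel use (BPCU)."""
    return n_t * m_c * r_c


def noise_density(ebn0_db, m_c, r_c, e_s=1.0):
    """N0 implied by Eb/N0 in dB, using Eb = Es / (M_c * R_c)."""
    e_b = e_s / (m_c * r_c)
    return e_b / (10.0 ** (ebn0_db / 10.0))


# Hypothetical operating point: 4 x 4 antennas, 16-QAM (M_c = 4), rate-1/2 code.
R = spectral_efficiency(4, 4, 0.5)
N0 = noise_density(6.0, 4, 0.5)
```

With $E_s = 1$, a rate-1/2 code and 16-QAM give $E_b = 0.5$, so $N_0$ follows directly from the chosen $E_b/N_0$.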


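[Figure 1: block diagram. Transmitter: information bits $b$, LDPC encoder, coded bits $c$, S/P, mapper µ, IFFT and GI insertion per transmit antenna (symbols $x_1(k), \ldots, x_{n_t}(k)$). MIMO channel $H(k)$ with AWGN $n(k)$. Receiver: GI removal and FFT per receive antenna (observations $y_1(k), \ldots, y_{n_r}(k)$), MIMO detector and LDPC decoder exchanging $L_E$ and $L_A$, decoded bits $\hat{b}$.]

Fig. 1. Iterative detection-decoding MIMO-OFDM system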

Soft information is then exchanged between detector and decoder until the desired performance is obtained. In what follows, this paper emphasizes the development of the MIMO detection algorithm while leaving the well-known channel decoding details to references [19]–[21].

Upon receiving a noisy superposition of the transmitted signals corrupted by the MIMO channel in the time domain, standard OFDM demodulation, namely GI removal and fast Fourier transform (FFT), is applied to recover the transmitted signal in the frequency domain. The MIMO detector takes the channel observation $y(k)$ and the a priori log-likelihood ratio (LLR) $L_A(c_l)$ to compute the extrinsic information $L_E(c_l)$ for each of the $n_t M_c$ bits per received vector $y(k)$. With $c_l = +1$ representing a binary one and $c_l = -1$ representing a binary zero, $L_A(c_l)$ from the outer channel decoder is defined as

$$L_A(c_l) \equiv \log\frac{P[c_l = +1]}{P[c_l = -1]} \tag{4}$$

where $l = 1, \ldots, n_t M_c$. The a posteriori LLR $L_D(c_l|y(k))$ for bit $c_l$, conditioned on the received vector $y(k)$, is similarly defined as

$$L_D(c_l|y(k)) \equiv \log\frac{P[c_l = +1|y(k)]}{P[c_l = -1|y(k)]} \tag{5}$$

where $P[c_l = m|y(k)]$, $m = \pm 1$, is the a posteriori probability (APP) of bit $c_l$. “New” (extrinsic) information learned at the detection stage can easily be separated from the a posteriori LLR $L_D(c_l|y(k))$ by subtracting off the a priori LLR $L_A(c_l)$. That is,

$$L_E(c_l) = L_D(c_l|y(k)) - L_A(c_l). \tag{6}$$

In view of (6), the extrinsic information $L_E(c_l)$ is then fed into the outer channel decoder as a priori information on the coded bit $c_l$.

III. ITERATIVE DETECTION AND DECODING RECEIVER

A. SIC-LMMSE Detector with RUA

The SIC-LMMSE detector generally consists of three distinct stages of processing: soft interference cancellation, LMMSE

detection/filtering and LLR computation, respectively [22], [23]. This paper focuses on the development of the LMMSE filtering stage and leaves details of the other two stages to reference [22]. It can be shown that the optimal solution [6], [9]–[11] is given by

$$w_i(k) = \left(\frac{N_0}{E_s} I_{n_r} + H(k)\Delta_i(k)H(k)^\dagger\right)^{-1} h_i(k), \tag{7}$$

where the covariance matrix $\Delta_i(k)$ is

$$\Delta_i(k) = \mathrm{diag}\left(\frac{\sigma^2_{x_1}(k)}{E_s}, \cdots, \frac{\sigma^2_{x_{i-1}}(k)}{E_s}, 1, \frac{\sigma^2_{x_{i+1}}(k)}{E_s}, \cdots, \frac{\sigma^2_{x_{n_t}}(k)}{E_s}\right), \tag{8}$$

and $\sigma^2_{x_n}(k)$, $n = 1, 2, \ldots, n_t$ with $n \neq i$, is the transmit symbol variance, which can generally be computed as

$$\sigma^2_{x_i}(k) = \sum_{x \in \mathcal{X}} |x - \bar{x}_i(k)|^2 P[x_i(k) = x]. \tag{9}$$

There is an interesting way to perform the matrix inversion via a recursive update algorithm (RUA). Finding the optimal LMMSE filter coefficient $w_i(k)$ often involves solving a system of equations, which is also the most “expensive” step in the algorithm in terms of complexity. Efficient methods such as QR decomposition and Cholesky factorization [24] are used in practice for solving such systems of equations, but still at the cost of cubic complexity [24]. One naïve way to “solve” the system of equations would be finding the $n_r \times n_r$ inverse matrix

$$P_i(k) = \left(\frac{N_0}{E_s} I_{n_r} + H(k)\Delta_i(k)H(k)^\dagger\right)^{-1}, \tag{10}$$

and computing $w_i(k) = P_i(k)h_i(k)$. In what follows, an algorithm to construct $P_i(k)$ directly via recursive update will be shown. A similar idea can also be found in [11] for multiuser detection. Let us define $e_i$ to be a column vector whose entries are all zero except for a 1 at the $i$th entry, and

$$\sigma^2_{x_n}(k) = \begin{cases} \text{variance of the } n\text{th transmit symbol}, & n \neq i, \\ E_s, & n = i. \end{cases}$$
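The symbol statistics entering (8) and (9) can be illustrated with a short sketch that turns a priori bit LLRs into a symbol mean and variance. It assumes a unit-energy, Gray-mapped QPSK constellation and independent bits; the names `QPSK`, `bit_prob` and `symbol_stats` are hypothetical and the mapping is only an example, not the one used in the paper's simulations.

```python
import math

# Assumed Gray-mapped QPSK with unit energy; bit labels are +1/-1 as in the text.
QPSK = {(+1, +1): complex(1, 1) / math.sqrt(2),
        (+1, -1): complex(1, -1) / math.sqrt(2),
        (-1, +1): complex(-1, 1) / math.sqrt(2),
        (-1, -1): complex(-1, -1) / math.sqrt(2)}


def bit_prob(bit, llr):
    """P[c = bit] for bit in {+1, -1}, given L_A = log(P[c=+1] / P[c=-1])."""
    return 1.0 / (1.0 + math.exp(-bit * llr))


def symbol_stats(llrs, constellation=QPSK):
    """Return (mean, variance) of one transmit symbol from its a priori bit LLRs."""
    probs = {}
    for bits, sym in constellation.items():
        p = 1.0
        for b, l in zip(bits, llrs):
            p *= bit_prob(b, l)  # bits assumed independent
        probs[sym] = p
    mean = sum(p * s for s, p in probs.items())
    var = sum(p * abs(s - mean) ** 2 for s, p in probs.items())  # as in (9)
    return mean, var
```

With all-zero LLRs the sketch returns zero mean and variance $E_s = 1$; with large LLR magnitudes the variance tends to zero, which is the regime the recursive update exploits. Very large LLR magnitudes would overflow `math.exp`, so a practical version would clip them.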

Matrices $P_i^{(n_t-1)}(k)$ and $P_i^{(n_t)}(k)$ are defined as follows,

$$P_i^{(n_t-1)}(k) = \left(I_{n_r} + \frac{E_s}{N_0}\sum_{n=1}^{n_t-1}\frac{\sigma^2_{x_n}(k)}{E_s} h_n h_n^\dagger\right)^{-1}, \tag{11}$$

$$P_i^{(n_t)}(k) = \left(I_{n_r} + \frac{E_s}{N_0}\sum_{n=1}^{n_t}\frac{\sigma^2_{x_n}(k)}{E_s} h_n h_n^\dagger\right)^{-1}. \tag{12}$$

The term $H(k)\Delta_i(k)H(k)^\dagger$ in (10) can be rewritten as a sum of vector outer products

$$H(k)\Delta_i(k)H(k)^\dagger = H(k)\left(\sum_{n=1}^{n_t}\frac{\sigma^2_{x_n}(k)}{E_s} e_n e_n^\dagger\right)H(k)^\dagger = \sum_{n=1}^{n_t}\frac{\sigma^2_{x_n}(k)}{E_s} h_n h_n^\dagger. \tag{13}$$

In view of (13), (10) can be re-expressed as

$$P_i(k) = \frac{E_s}{N_0}\left(I_{n_r} + \frac{E_s}{N_0}\sum_{n=1}^{n_t}\frac{\sigma^2_{x_n}(k)}{E_s} h_n h_n^\dagger\right)^{-1} = \frac{E_s}{N_0} P_i^{(n_t)}(k). \tag{14}$$

The recursive update relation hinges on rewriting $P_i^{(n_t)}(k)$ as

$$P_i^{(n_t)}(k) = P_i^{(n_t-1)}(k) - \frac{\dfrac{\sigma^2_{x_{n_t}}(k)}{N_0}}{1 + \dfrac{\sigma^2_{x_{n_t}}(k)}{N_0}\, h_{n_t}^\dagger P_i^{(n_t-1)}(k) h_{n_t}} \left(P_i^{(n_t-1)}(k) h_{n_t}\right)\left(P_i^{(n_t-1)}(k) h_{n_t}\right)^\dagger. \tag{15}$$

To arrive at (15), the “degenerate” matrix inversion lemma [24] (see the Appendix) has been applied. As (15) suggests, a recursive update relation between $P_i^{(n_t-1)}(k)$ and $P_i^{(n_t)}(k)$ has been found. Therefore, $P_i(k)$ can be directly constructed by the recursive update algorithm (RUA), which is outlined in [22] (i.e. Table I).

Because of this recursive algorithm, the detection problem on the MIMO channel can be transformed into a structure more suitable for an iterative detection and decoding receiver. Conventionally, the SIC-LMMSE detector forms its optimum LMMSE filter coefficient $w_i(k)$ by solving a system of equations without incorporating a priori information. Thus, a fixed amount of computational resources is allocated uniformly throughout the iterative detection-decoding process. Different from the SIC-LMMSE detector, the SIC-LMMSE detector with RUA obtains $w_i(k)$ by directly constructing $P_i(k)$, which is made explicitly a function of a priori information. Without a priori information, RUA is still a cubic complexity algorithm to form $P_i(k)$. But, once a priori information becomes available, $P_i^{(n)}(k)$ is only updated from the previous iteration $P_i^{(n-1)}(k)$ in (15) when $\sigma^2_{x_n}(k)/N_0 \gg 0$, $n \neq i$, where $\sigma^2_{x_n}(k)$ is computed from the a priori LLR. Hence, the SIC-LMMSE detector with RUA enables a more flexible allocation of computing power depending on the level of a priori information.

The RUA is mainly a function of the residual interference-to-noise ratio, RINR(n),

$$\mathrm{RINR}(n) = \frac{\sigma^2_{x_n}(k)}{N_0}, \quad n \neq i, \tag{16}$$

which also appears in (15). The actual value of RINR(n) varies with the number of turbo iterations and $E_b/N_0$. If RINR(n) < β, where β is the threshold, the RUA skips the updating step in (15) and achieves a lower complexity. In particular, since the a priori LLR becomes more and more reliable as the number of turbo iterations increases, the “estimated” symbol mean $\bar{x}_i(k)$ becomes more likely to be the true transmit symbol while the “estimated” symbol variance $\sigma^2_{x_n}(k)$, $n \neq i$, approaches zero. When $\sigma^2_{x_n}(k) = 0$, $n \neq i$ (i.e. perfect cancellation), RUA achieves further complexity reduction since it costs nothing to iterate from $P_i^{(n-1)}(k)$ to $P_i^{(n)}(k)$ with $n \neq i$, as clearly shown in (15). Thus, $P_i(k)$ is formed with exactly one iteration at $n = i$, which in effect forms a maximal ratio combining (MRC) filter with the corresponding column vector $h_n$ of the channel matrix.

The explicit parameterization of the threshold β in the SIC-LMMSE detector with RUA enables a trade-off between lower complexity and better performance. The smaller the value selected for β (e.g. β = 0), the less likely it is that $P_i(k)$ will be formed in exactly one iteration, which implies more computational complexity. On the other hand, a larger value of β (e.g. $\beta = \sigma^2_{\max}/N_0$) makes it more likely that $P_i(k)$ is formed in exactly one iteration (i.e. HIC-LMMSE detection).

Replacing the original matrix inversion with RUA in the SIC-LMMSE detector allows a more efficient computation of the detection symbol estimate as the number of turbo iterations increases. When the a priori information fed back from the outer channel decoder becomes very reliable, the SIC-LMMSE detector with RUA forms its detection estimate $\hat{x}_i(k)$ via the MRC filter, which is the same as the HIC-LMMSE detector. On the other hand, unlike the HIC-LMMSE detector, which always uses the MRC filter, the SIC-LMMSE detector with RUA also utilizes “unreliable” a priori information to form $\hat{x}_i(k)$, as clearly shown in (15). Table II in [22] gives a detailed outline of the SIC-LMMSE detector with RUA.
At the first turbo iteration, the SIC-LMMSE detector with RUA shares about the same computational complexity as the SIC-LMMSE detection algorithm, which is $O(n_r^3 + M_c 2^{M_c})$. But, for each subsequent turbo iteration at reasonable $E_b/N_0$, the dominant computation per transmit symbol involves performing interference cancellation with complexity $O(n_t n_r)$, finding $P_i(k)$ via RUA and obtaining $w_i(k)$ with complexity $O(n_r^2[\Gamma(\beta) + 1])$, and computing the a posteriori LLR $L_D(c_{\tilde{l}}|\hat{x}_i(k))$ with complexity $O(M_c 2^{M_c})$. The function $\Gamma(\beta)$ is defined as

$$\Gamma(\beta) = \begin{cases} n_t, & \beta = 0, \\ 2, 3, \ldots, n_t, & 0 < \beta < \dfrac{\sigma^2_{\max}}{N_0}, \\ -1, & \beta = \dfrac{\sigma^2_{\max}}{N_0}. \end{cases} \tag{17}$$

If $0 < \beta < \sigma^2_{\max}/N_0$, $\Gamma(\beta)$ may be an integer chosen from 2 to $n_t$ depending on the actual value of RINR(n). Therefore, the SIC-LMMSE detector with RUA achieves an asymptotic complexity of $O(n_t^2 n_r + n_t n_r^2[\Gamma(\beta) + 1] + n_t M_c 2^{M_c})$.


B. SIC-AMMSE Detector

The optimization problem in [23] (i.e. (12)) for the LMMSE filter coefficient can be generalized. Realizing the fact that a priori information about the current detection symbol $\hat{x}_i(k)$ becomes available after the first turbo iteration, it is useful to seek an affine estimator of the following form:

$$\begin{aligned} &\text{minimize} && E|x_i(k) - \hat{x}_i(k)|^2 \\ &\text{subject to} && \hat{x}_i(k) = w_i(k)^\dagger y_i(k) + m_i(k), \end{aligned} \tag{18}$$

where $\{w_i(k), m_i(k)\}$ are to be determined and $y_i(k)$ is obtained from (11) in [23]. To find $w_i(k)$ and $m_i(k)$, the following two observations can be made: 1) the affine estimator should be unbiased, and 2) the filter coefficient $w_i(k)$ should be chosen optimally to minimize the MSE in (18). It can be shown that the optimization problem in (18) is reduced to

$$\begin{aligned} &\text{minimize} && E|x_i(k) - \hat{x}_i(k)|^2 \\ &\text{subject to} && \hat{x}_i(k) - \bar{x}_i(k) = w_i(k)^\dagger\left(y_i(k) - \bar{y}_i(k)\right), \end{aligned} \tag{19}$$

where $\bar{x}_i(k) = E\hat{x}_i(k)$ is defined in [23] (i.e. (9)) and $\bar{y}_i(k) = E y_i(k) = \bar{x}_i(k) h_i(k)$.

In view of (19), it should be obvious that the LMMSE formulation is really a sub-class of a more general AMMSE formulation. Assuming both $\bar{x}_i(k)$ and $\bar{y}_i(k)$ are zero throughout the iterative detection and decoding process, (19) has exactly the same formulation as (12) in [23]. More interestingly, (19) can also be derived from the original observation $y(k)$ in (3) instead of from $y_i(k)$ in [23] (i.e. (11)). To see this, it is helpful to substitute (11) in [23] into the constraint equation of (19):

$$\begin{aligned} &\text{minimize} && E|x_i(k) - \hat{x}_i(k)|^2 \\ &\text{subject to} && \hat{x}_i(k) = w_i(k)^\dagger\left(y(k) - \bar{y}(k)\right) + \bar{x}_i(k). \end{aligned}$$

It can be shown that the optimal filter coefficient for (19) is given by

$$w_i(k) = \left(N_0 I_{n_r} + H(k)\Delta(k)H(k)^\dagger\right)^{-1} \sigma^2_{x_i}(k)\, h_i(k) \tag{20}$$

where the covariance matrix $\Delta(k)$ can be expressed as

$$\Delta(k) = \mathrm{diag}\left(\sigma^2_{x_1}(k), \cdots, \sigma^2_{x_i}(k), \cdots, \sigma^2_{x_{n_t}}(k)\right) \tag{21}$$

and $\sigma^2_{x_i}(k)$, $i = 1, 2, \ldots, n_t$, is the transmit symbol variance defined in (9). Therefore, the optimal solution to (18) is given by

$$\hat{x}_i(k) = w_i(k)^\dagger\left(y_i(k) - \bar{y}_i(k)\right) + \bar{x}_i(k) \tag{22}$$

where $w_i(k)$ is obtained from (20).

Calculating the a posteriori LLR from the output of the AMMSE estimator in (22) relies on a Gaussian approximation of the ISI-plus-noise term as in [25]. Rewriting (22) shows that

$$\hat{x}_i(k) = w_i(k)^\dagger h_i(k)\left(x_i(k) - \bar{x}_i(k)\right) + \bar{x}_i(k) + \sum_{n=1, n \neq i}^{n_t} w_i(k)^\dagger h_n(k)\left(x_n(k) - \bar{x}_n(k)\right) + w_i(k)^\dagger n_i(k) \tag{23}$$

where the last two terms in (23) are viewed as the ISI-plus-noise term. Given the knowledge of $x_i(k)$, the output of the AMMSE filter is approximated as complex Gaussian distributed. More specifically,

$$P[\hat{x}_i(k)|x_i(k) = x] \sim \mathcal{N}_c\left(\alpha_i(k), \eta_i^2(k)\right) = \frac{1}{\pi\eta_i^2(k)}\, e^{-\frac{1}{\eta_i^2(k)}|\hat{x}_i(k) - \alpha_i(k)|^2} \tag{24}$$

where $\alpha_i(k)$ is the conditional mean

$$\alpha_i(k) = w_i(k)^\dagger h_i(k)\left(x - \bar{x}_i(k)\right) + \bar{x}_i(k) \tag{25}$$

and $\eta_i^2(k)$ is the conditional variance, which can be computed as

$$\eta_i^2(k) = \sigma^2_{x_i}(k)\, w_i(k)^\dagger h_i(k)\left(1 - h_i(k)^\dagger w_i(k)\right). \tag{26}$$
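To make (20), (22), (25) and (26) concrete, the sketch below computes the affine estimate for one stream on a small channel. It is an illustration under stated assumptions: the observation mean is formed as $H(k)\bar{x}(k)$, matching the constraint written in terms of $y(k) - \bar{y}(k)$, the linear system is solved by a small hand-written Gaussian elimination rather than an optimized routine, and the names `solve` and `ammse_detect` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b for a small complex system (Gaussian elimination, pivoting)."""
    n = len(A)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def ammse_detect(H, y, xbar, var, i, N0):
    """Affine MMSE estimate of stream i. Returns (x_hat, mu, eta2).

    mu = w^H h_i is the scaling of (x - xbar_i) in (25); eta2 is (26).
    """
    n_r, n_t = len(H), len(H[0])
    # R = N0 I + H diag(var) H^H, the matrix inverted in (20)
    R = [[(N0 if r == c else 0.0)
          + sum(var[n] * H[r][n] * H[c][n].conjugate() for n in range(n_t))
          for c in range(n_r)] for r in range(n_r)]
    h_i = [H[r][i] for r in range(n_r)]
    w = solve(R, [var[i] * v for v in h_i])
    # observation minus its a priori mean, y - H xbar
    resid = [y[r] - sum(H[r][n] * xbar[n] for n in range(n_t)) for r in range(n_r)]
    x_hat = sum(w[r].conjugate() * resid[r] for r in range(n_r)) + xbar[i]
    mu = sum(w[r].conjugate() * h_i[r] for r in range(n_r)).real
    eta2 = var[i] * mu * (1.0 - mu)
    return x_hat, mu, eta2
```

When every variance is zero the filter vanishes and the estimate equals the a priori mean, while with unit variances and zero means the same routine behaves as an ordinary LMMSE filter.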

Examining (20), (22), (25) and (26) shows that this generalized formulation greatly simplifies the calculation of the detection symbol estimate. The optimal AMMSE filter coefficient $w_i(k)$ obtained in (20) is clearly a function of the estimated symbol variance $\sigma^2_{x_i}(k)$. At the beginning of the iterative process (i.e. the first turbo iteration), no a priori information is available. Then, $\sigma^2_{x_i}(k)$ equals $E_s$ and (20) reduces to (7) with an identity covariance matrix. Meanwhile, the estimated symbol mean $\bar{x}_i(k)$, which is calculated from a priori information, equals zero. Hence, (24) has the same conditional mean and variance as the SIC-LMMSE detector in [6], [9]–[11]. As the number of turbo iterations increases, the estimated symbol mean $\bar{x}_i(k)$ approaches the true transmit symbol $x_i(k)$ while the estimated symbol variance $\sigma^2_{x_i}(k)$ approaches zero because of the availability of a priori information. As a result, the output of the AMMSE filter becomes the estimated symbol mean $\bar{x}_i(k)$, since $w_i(k)$ also approaches zero as clearly indicated by (20). In general, the output of the AMMSE filter is a combination of a filtered estimate and an a priori symbol mean, depending on the level of the a priori LLR.

The likelihood function for the detection symbol estimate can also be simplified and calculated via quantization due to the AMMSE formulation. Notice that the a priori symbol probability can be re-expressed in the log-domain as

$$\log P[x_i(k) = x] = \log\left(\prod_{\tilde{l}=1}^{M_c}\frac{1}{1 + e^{-x_{\tilde{l}} L_A(c_{\tilde{l}})}}\right). \tag{27}$$

Since (27) is the log of a product, it can be further simplified to

$$\log P[x_i(k) = x] = \sum_{\tilde{l}=1}^{M_c} -\log\left(1 + e^{-x_{\tilde{l}} L_A(c_{\tilde{l}})}\right). \tag{28}$$

With (28) and the Max-log approximation [26], the a posteriori LLR $L_D(c_{\tilde{l}}|\hat{x}_i(k))$ becomes

$$\begin{aligned} L_D(c_{\tilde{l}}|\hat{x}_i(k)) \approx{} & \max_{x \in \mathcal{X}_{\tilde{l}}^{+1}}\left\{-\frac{1}{\eta_i^2(k)}|\hat{x}_i(k) - \alpha_i(k)|^2 + \sum_{\tilde{l}=1}^{M_c} -\log\left(1 + e^{-x_{\tilde{l}} L_A(c_{\tilde{l}})}\right)\right\} \\ & - \max_{x \in \mathcal{X}_{\tilde{l}}^{-1}}\left\{-\frac{1}{\eta_i^2(k)}|\hat{x}_i(k) - \alpha_i(k)|^2 + \sum_{\tilde{l}=1}^{M_c} -\log\left(1 + e^{-x_{\tilde{l}} L_A(c_{\tilde{l}})}\right)\right\}. \end{aligned} \tag{29}$$

With these approximations, computing $L_D(c_{\tilde{l}}|\hat{x}_i(k))$ only needs a search over $2^{M_c}$ hypotheses. The term $-\frac{1}{\eta_i^2(k)}|\hat{x}_i(k) - \alpha_i(k)|^2$ in (29) can be viewed as the Euclidean distance between the detection symbol $\hat{x}_i(k)$ and a “scaled version” of the actual constellation symbol in $\mathcal{X}$. Then, it is obvious that the hypothesis $x_{\min}$, $x_{\min} \in \mathcal{X}$, which is closest to $\hat{x}_i(k)$ in Euclidean distance maximizes the term $-\frac{1}{\eta_i^2(k)}|\hat{x}_i(k) - \alpha_i(k)|^2$. As the number of turbo iterations increases, the conditional variance $\eta_i^2(k)$ approaches zero, indicating also that the conditional mean $\bar{x}_i(k)$ is most likely one of the actual constellation symbols in $\mathcal{X}$. This observation leads us to the following quantization

$$-\frac{1}{\eta_i^2(k)}|\hat{x}_i(k) - \alpha_i(k)|^2 = \begin{cases} A, & \hat{x}_i(k) = x_{\min}, \\ 0, & \hat{x}_i(k) \neq x_{\min}, \end{cases} \tag{30}$$

where $A$ is the quantization value, which refers to the maximum value the LLR may take. Then, the SIC-AMMSE detector forms the “new” (extrinsic) LLR $L_E(c_{\tilde{l}})$ as

$$L_E(c_{\tilde{l}}) = L_D(c_{\tilde{l}}|\hat{x}_i(k)) - L_A(c_{\tilde{l}}). \tag{31}$$
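The LLR computation in (25), (28), (29) and (31) can be sketched as follows. The sketch assumes a unit-energy, Gray-mapped QPSK constellation and a strictly positive conditional variance; it does not implement the quantization in (30), and the names `log_prior` and `extrinsic_llrs` are hypothetical.

```python
import math

SQ = 1.0 / math.sqrt(2.0)
# Assumed Gray-mapped QPSK; each key is the bit label (c_1, c_2) with entries +1/-1.
QPSK = {(b0, b1): complex(b0 * SQ, b1 * SQ) for b0 in (+1, -1) for b1 in (+1, -1)}


def log_prior(bits, llr_a):
    """log P[x_i = x] as the sum in (28), written as a numerically stable softplus."""
    total = 0.0
    for b, l in zip(bits, llr_a):
        z = -b * l
        total -= max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # -log(1 + e^z)
    return total


def extrinsic_llrs(x_hat, mu, eta2, xbar, llr_a, constellation=QPSK):
    """Max-log a posteriori LLRs as in (29), minus the a priori LLRs as in (31).

    mu = w^H h_i and eta2 > 0 come from the filter stage; xbar is the symbol mean.
    """
    out = []
    for pos in range(len(llr_a)):
        best = {+1: -math.inf, -1: -math.inf}
        for bits, x in constellation.items():
            alpha = mu * (x - xbar) + xbar  # conditional mean, (25)
            metric = -abs(x_hat - alpha) ** 2 / eta2 + log_prior(bits, llr_a)
            best[bits[pos]] = max(best[bits[pos]], metric)
        out.append(best[+1] - best[-1] - llr_a[pos])
    return out
```

The search visits each of the $2^{M_c}$ constellation points once per bit position, which is where the $O(M_c 2^{M_c})$ term per transmit symbol comes from.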

The above-mentioned steps are summarized in [23] (i.e. Table I) for the SIC-AMMSE detector. Comparing the SIC-LMMSE detection algorithm with Table I in [23] shows that the SIC-AMMSE detection algorithm has an obvious advantage in computational complexity. At the beginning of the iterative process, no a priori information is available and the SIC-AMMSE detector has the same computational complexity of $O(n_r^3 + M_c 2^{M_c})$ as the SIC-LMMSE detector. For subsequent turbo iterations, the detection estimate $\hat{x}_i(k)$ can be directly constructed from $\bar{x}_i(k)$ depending on the level of the a priori LLR $L_A(c_{\tilde{l}})$. As the number of turbo iterations increases, a priori information becomes more and more reliable while the estimated symbol variance $\sigma^2_{x_i}(k)$ approaches zero, indicating that the estimated symbol mean $\bar{x}_i(k)$ also approaches the true transmit symbol $x_i(k)$. Then, the dominant computation per transmit symbol involves only calculating the a posteriori LLR $L_D(c_{\tilde{l}}|\hat{x}_i(k))$ in (29) via the quantization in (30), with complexity of $O(M_c 2^{M_c})$. Therefore, the SIC-AMMSE detector achieves an asymptotic complexity of $O(n_t M_c 2^{M_c})$ per turbo iteration.

IV. NUMERICAL RESULTS

This section provides computer simulation results to show the performance of the proposed front-end SIC-LMMSE detector with RUA and SIC-AMMSE detector in an iterative detection-decoding MIMO-OFDM system. This paper assumes an equal number of transmit and receive antennas (i.e. an $n_t = n_r$ system). Most OFDM-PHY parameters, such as the number of data sub-carriers, the number of pilot sub-carriers and the length of the OFDM preamble, are compatible with the IEEE 802.11a standard [27]. The channel code adopted in the iterative detection-decoding MIMO-OFDM system is an LDPC code with multiple rate compatibility [28]. The LDPC code with constant block length of 1944 and rate 1/2 is treated as the “mother” code, and higher rate codes such as rate-2/3, rate-3/4 and rate-5/6 are obtained via row combining of the original rate-1/2 parity check matrix.
The actual MIMO channel which considered in simulation is taken from IEEE 802.11n channel models [29]. Specifically, Channel Model D with 50ns RMS delay spread is simulated. The packet error rate (PER) is computed. Each

10

10

−3

−4

−2

1−turbo, SIC−LMMSE detector 1−turbo, SIC−LMMSE detector w/ RUA 2−turbo, SIC−LMMSE detector 2−turbo, SIC−LMMSE detector w/ RUA 3−turbo, SIC−LMMSE detector 3−turbo, SIC−LMMSE detector w/ RUA 4−turbo, SIC−LMMSE detector 4−turbo, SIC−LMMSE detector w/ RUA 5−turbo, SIC−LMMSE detector 5−turbo, SIC−LMMSE detector w/ RUA 6−turbo, SIC−LMMSE detector 6−turbo, SIC−LMMSE detector w/ RUA 0

2

4

6

8

10

E /N , dB b

0

Fig. 2. Performance comparison between SIC-LMMSE detector and SICLMMSE detector with RUA for 4 × 4 turbo-LDPC L=1944 with 12 decoder iterations, 16 QAM, Rate-1/2 and 8 BPCU.

packet consists of 1000 bytes of information bits. This paper further assumes perfect timing synchronization, no frequency offset, and perfect channel state information for the iterative detection-decoding MIMO-OFDM system.

Fig. 2 presents a PER performance comparison between the SIC-LMMSE detector and the SIC-LMMSE detector with RUA. For each packet transmission, 6 turbo iterations on the detection loop and 12 iterations within the LDPC decoder are performed. The SIC-LMMSE detector with RUA is shown in Fig. 2 for β = 0. At 1% PER, both the SIC-LMMSE detector and the SIC-LMMSE detector with RUA provide a performance gain of about 2 dB compared to a single turbo iteration (i.e., the MMSE suppression filter), and both detection algorithms converge at 6 turbo iterations. It can be seen that the SIC-LMMSE detector with RUA matches the performance of its full-complexity counterpart, the SIC-LMMSE detector.

Fig. 3 presents a PER comparison of the SIC-LMMSE detection algorithm with RUA at different values of β. With a higher value of β, the SIC-LMMSE detector with RUA is expected to achieve lower complexity but may suffer performance degradation. Hence, the SIC-LMMSE detector with RUA allows a more flexible trade-off between performance and complexity. As Fig. 3 suggests, no noticeable performance degradation is observed up to β = 0.1 with 3 turbo iterations. At values of β above 0.1, RUA achieves lower complexity, but at the price of performance degradation.

Fig. 3. Performance comparison of SIC-LMMSE detector with RUA at different values of β for 4 × 4 turbo-LDPC L=1944 with 12 decoder iterations, 16 QAM, Rate-1/2 and 8 BPCU. (Axes: PER versus E_b/N_0 in dB. Curves: 3-turbo RUA with β = 0, 0.1, 0.6, 0.7, 0.8, 0.9, and 3-turbo HIC-LMMSE detector.)

Fig. 4. Complexity comparison between HIC-LMMSE detector and SIC-LMMSE detector with different β for 4 × 4 turbo-LDPC L=1944 with 12 decoder iterations, 16 QAM, Rate-1/2 and 8 BPCU. (Axes: complexity ratio ρ versus zero-threshold β. Curves: SIC-LMMSE, HIC-LMMSE, and 2-turbo and 3-turbo RUA at E_b/N_0 = 6 dB and 8 dB.)

Fig. 5. Performance comparison between HIC-LMMSE detector and SIC-LMMSE detector with RUA for 4 × 4 turbo-LDPC L=1944 with 12 decoder iterations, 16 QAM, Rate-1/2 and 8 BPCU. (Axes: PER versus E_b/N_0 in dB.)

Fig. 4 compares complexity by evaluating the ratio ρ, which is defined as

$$\rho = \frac{C_{\text{SIC-LMMSE with RUA}}}{C_{\text{HIC-LMMSE}}} \tag{32}$$

between the SIC-LMMSE detector with RUA and the HIC-LMMSE detector. To measure the complexity of either detection algorithm, it can be shown that $C_{\text{SIC-LMMSE with RUA}}$ (and likewise $C_{\text{HIC-LMMSE}}$) is inversely proportional to the number of MRC operations performed during each packet detection. As shown in Fig. 4, $C_{\text{SIC-LMMSE with RUA}}$ approaches $C_{\text{HIC-LMMSE}}$ as β increases. At β = 0.1, the SIC-LMMSE detector with RUA achieves almost the same complexity as the HIC-LMMSE detector, yet incurs no performance degradation compared to the full-complexity SIC-LMMSE detector with RUA at β = 0, as shown in Fig. 3.

Fig. 5 presents a PER performance comparison between the HIC-LMMSE detector and the SIC-LMMSE detector with RUA. The HIC-LMMSE detection algorithm converges at 4 turbo iterations. At 1% PER with 4 turbo iterations, the SIC-LMMSE detector with RUA outperforms the HIC-LMMSE detector by 1 dB. Moreover, the SIC-LMMSE detector with RUA achieves an asymptotic computational complexity of $O(n_t n_r^2 + n_t M_c 2^{M_c})$, which is comparable to that of the HIC-LMMSE detector.

Fig. 6. Performance comparison between SIC-LMMSE detector and SIC-AMMSE detector for 4 × 4 turbo-LDPC L=1944 with 12 decoder iterations, 16 QAM, Rate-1/2 and 8 BPCU. (Axes: PER versus E_b/N_0 in dB.)

Fig. 6 presents a PER performance comparison between the SIC-LMMSE detector and the SIC-AMMSE detector. For each packet transmission, up to 6 turbo iterations and 12 iterations within the LDPC decoder are performed. Clearly, the performance gain from each additional turbo iteration diminishes, and performance converges at 6 turbo iterations. At 1% PER, both the SIC-LMMSE detector and the SIC-AMMSE detector provide a performance gain of about 2 dB compared to a single turbo iteration (i.e., the MMSE suppression filter). The proposed SIC-AMMSE detection algorithm not only gives exactly the same performance as the conventional full-complexity SIC-LMMSE detection algorithm, but also achieves an asymptotic computational complexity of $O(n_t M_c 2^{M_c})$, which is linear in the number of antennas.

Fig. 7. Performance comparison between HIC-LMMSE detector and SIC-AMMSE detector for 4 × 4 turbo-LDPC L=1944 with 12 decoder iterations, 16 QAM, Rate-1/2 and 8 BPCU. (Axes: PER versus E_b/N_0 in dB.)

Fig. 7 presents a PER comparison between the HIC-LMMSE detector and the SIC-AMMSE detector. At 1% PER with 4 turbo iterations, the SIC-AMMSE detector outperforms the HIC-LMMSE detector by 1 dB. Moreover, the SIC-AMMSE detector also achieves a lower computational complexity than the HIC-LMMSE detector.

V. CONCLUSION

This paper presents two computationally more efficient front-end detection algorithms for iterative detection and decoding MIMO systems, namely the SIC-LMMSE detector with RUA and the SIC-AMMSE detector. Reformulating the matrix inversion step of the conventional LMMSE filtering process as RUA allows a more flexible allocation of computational power and is more suitable for an iterative processing receiver. Moreover, a complexity analysis demonstrates that the proposed system achieves about the same complexity as the previously proposed HIC-LMMSE detector while offering better PER performance. On the other hand, the SIC-AMMSE detector generalizes the conventional SIC-LMMSE detection algorithm. As the number of turbo iterations increases, SIC-AMMSE uses much less computational power in the detection process while achieving the same performance as the conventional SIC-LMMSE detector.

APPENDIX I
PROOF OF (15)

Observe that the inverse of $P_i^{(n_t)}(k)$ in (12), $\left[P_i^{(n_t)}(k)\right]^{-1}$, is a rank-one modification of $\left[P_i^{(n_t-1)}(k)\right]^{-1}$. Then,

$$
\begin{aligned}
P_i^{(n_t)}(k)
&= \left[ \left(P_i^{(n_t-1)}(k)\right)^{-1} + \frac{\sigma_{x_{n_t}}^2(k)}{N_0}\, h_{n_t} h_{n_t}^{\dagger} \right]^{-1} \\
&= \left[ \left(P_i^{(n_t-1)}(k)\right)^{-1} \left( I_{n_r} + \frac{\sigma_{x_{n_t}}^2(k)}{N_0}\, P_i^{(n_t-1)}(k)\, h_{n_t} h_{n_t}^{\dagger} \right) \right]^{-1} \\
&= \left[ I_{n_r} + \frac{\sigma_{x_{n_t}}^2(k)}{N_0}\, P_i^{(n_t-1)}(k)\, h_{n_t} h_{n_t}^{\dagger} \right]^{-1} P_i^{(n_t-1)}(k) \\
&= \left[ I_{n_r} - \frac{\dfrac{\sigma_{x_{n_t}}^2(k)}{N_0}\, P_i^{(n_t-1)}(k)\, h_{n_t} h_{n_t}^{\dagger}}{1 + \dfrac{\sigma_{x_{n_t}}^2(k)}{N_0}\, h_{n_t}^{\dagger} P_i^{(n_t-1)}(k)\, h_{n_t}} \right] P_i^{(n_t-1)}(k) \\
&= P_i^{(n_t-1)}(k) - \frac{\dfrac{\sigma_{x_{n_t}}^2(k)}{N_0}}{1 + \dfrac{\sigma_{x_{n_t}}^2(k)}{N_0}\, h_{n_t}^{\dagger} P_i^{(n_t-1)}(k)\, h_{n_t}} \left( P_i^{(n_t-1)}(k)\, h_{n_t} \right) \left( P_i^{(n_t-1)}(k)\, h_{n_t} \right)^{\dagger}.
\end{aligned}
$$

Here the fourth equality follows from the matrix inversion lemma applied to a rank-one update, and the last equality uses the Hermitian symmetry of $P_i^{(n_t-1)}(k)$.
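The rank-one update derived in Appendix I can be checked numerically. The following is a minimal sketch, not the authors' implementation: the helper names (`rank_one_update`, `inverse`, `demo`) are hypothetical, the recursion is assumed to start from an identity matrix (the initialization in (12) is not reproduced here), and the channel vectors and the weights standing in for $\sigma_{x_{n_t}}^2(k)/N_0$ are random placeholders. It applies the update once per transmit antenna and compares the result with a direct matrix inversion.

```python
import random


def identity(n):
    """n x n complex identity matrix as a list of rows."""
    return [[1.0 + 0j if i == j else 0j for j in range(n)] for i in range(n)]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (used only as a reference)."""
    n = len(A)
    M = [list(row) + eye_row for row, eye_row in zip(A, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def rank_one_update(P, h, c):
    """Return (P^{-1} + c h h^H)^{-1} for Hermitian P without inverting a matrix.

    Implements P - c (P h)(P h)^H / (1 + c h^H P h), the last line of Appendix I,
    with c standing in for sigma^2 / N0 (an assumption of this sketch).
    """
    n = len(P)
    Ph = [sum(P[i][k] * h[k] for k in range(n)) for i in range(n)]
    # h^H P h is real for Hermitian P; .real discards rounding noise.
    denom = 1.0 + c * sum(h[i].conjugate() * Ph[i] for i in range(n)).real
    scale = c / denom
    return [[P[i][j] - scale * Ph[i] * Ph[j].conjugate() for j in range(n)]
            for i in range(n)]


def demo(n_r=4, n_t=4, seed=0):
    """Largest entrywise gap between the recursive and the direct inverse."""
    rng = random.Random(seed)
    # Hypothetical channel columns h_1..h_nt and per-antenna weights.
    columns = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_r)]
               for _ in range(n_t)]
    weights = [rng.uniform(0.1, 2.0) for _ in range(n_t)]

    P = identity(n_r)   # recursive estimate, assumed to start from I
    A = identity(n_r)   # accumulates the matrix whose inverse P should equal
    for h, c in zip(columns, weights):
        P = rank_one_update(P, h, c)
        for i in range(n_r):
            for j in range(n_r):
                A[i][j] += c * h[i] * h[j].conjugate()

    P_direct = inverse(A)
    return max(abs(P[i][j] - P_direct[i][j])
               for i in range(n_r) for j in range(n_r))


print(demo())
```

Each call to `rank_one_update` needs only a matrix-vector product and an outer product, so its cost grows on the order of $n_r^2$, whereas the reference `inverse` grows on the order of $n_r^3$. This is the kind of saving the recursive formulation is meant to expose.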

Daniel N. Liu (S’03) received the B.S.E.E. degree (magna cum laude) in 2003 and the M.S. degree in electrical engineering in 2005, both from the University of California, Los Angeles (UCLA). He is currently pursuing the Ph.D. degree in electrical engineering at UCLA. His research interests are in the area of physical layer communication theory and signal processing, particularly detection and estimation theory, equalization, channel estimation, and space-time coding theory.

Michael P. Fitz received his B.E.E. degree (summa cum laude) from the University of Dayton, Dayton, Ohio, in 1983 and his M.S. and Ph.D. degrees in electrical engineering from the University of Southern California in 1984 and 1989, respectively. From 1983 to 1989 he worked as a communication systems engineer for Hughes Aircraft and TRW Inc. In 1989 he moved into academia and served on the faculty at Purdue University, the Ohio State University (OSU), and the University of California, Los Angeles. Dr. Fitz is currently employed at the Northrop Grumman Corporation as a senior systems engineer working on satellite communications. Dr. Fitz’s research is in the broad area of statistical communication theory and experimentation. He is the author of the textbook Fundamentals of Communications Systems. He was a recipient of the 2001 IEEE Communications Society Leonard G. Abraham Prize Paper Award in the Field of Communications Systems. Prof. Fitz’s research group at UCLA is currently interested in the theory of space-time modems and operates an experimental wireless wide area network and a space-time coding testbed.