IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 52, NO. 12, DECEMBER 2004
2057
Lattice-Reduction-Aided Broadcast Precoding

Christoph Windpassinger, Robert F. H. Fischer, and Johannes B. Huber
Abstract—A precoding scheme for multiuser broadcast communications is described, which fills the gap between the low-complexity Tomlinson–Harashima precoding and the sphere-decoder-based system of Peel et al. Simulation results show that, when the closest-point search is replaced with the Babai approximation, the full diversity order supported by the channel is available to each user, as in the system of Peel et al., and unlike Tomlinson–Harashima precoding, which suffers some diversity penalty. The complexity of the scheme is similar to that of Tomlinson–Harashima precoding.
Index Terms—Lattice reduction, multiple-input multiple-output (MIMO) broadcast channels, MIMO precoding.

Paper approved by A. H. Banihashemi, the Editor for Coding and Communication Theory of the IEEE Communications Society. Manuscript received August 1, 2003; revised February 13, 2004 and June 8, 2004. This paper was presented in part at the 5th International ITG Conference on Source and Channel Coding, Erlangen, Germany, January 2004. The authors are with the Lehrstuhl für Informationsübertragung, Universität Erlangen-Nürnberg, 91058 Erlangen, Germany. Digital Object Identifier 10.1109/TCOMM.2004.838732

I. INTRODUCTION

RECENTLY, the multiuser broadcast precoding problem has received considerable attention. Peel et al. [1] have introduced a "vector perturbation technique," and have shown that the uncoded error-rate curves thus obtained exhibit the full diversity order of the system, unlike those obtained, e.g., by Tomlinson–Harashima precoding [3]. The key idea of suitably choosing precoding symbols for periodic extensions was already present in shaping without scrambling [4], [5], which uses successive processing to efficiently find the perturbation vector. However, the technique of Peel et al., which can be described as some kind of "maximum-likelihood detection at the transmitter," requires the use of the rather complex sphere decoder [6] in order to solve a lattice closest-point problem. In this letter, we consider the use of Babai's approximate closest-point solution [2] to come up with a much less complex precoding technique, along the lines of [7] and [8]. It turns out that while there is some loss in power efficiency with respect to the sphere-decoder-based precoding technique, the full diversity is still present with the approximate solution, leading to significant gains in uncoded error rate for high signal-to-noise ratios (SNRs).

Section II will introduce the system model; subsequently, the precoding method by Peel et al. [1] is discussed in Section III. Section IV describes the approximate solutions, followed by simulation results in Section V and a complexity comparison in Section VI. Some concluding remarks are offered in Section VII.

II. SYSTEM MODEL AND CONVENTIONAL PRECODING

We consider the transmission from a base station with $N$ transmit antennas to $K$ users with a single receive antenna each. The transmission channel is assumed to be frequency-flat, and the fading coefficient from the $l$th transmit antenna to the receiver of user $k$ is denoted as $h_{kl}$. We use the vector/matrix notation

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n} \tag{1}$$

with the transmit signal vector $\mathbf{x} = [x_1, \ldots, x_N]^T$, channel matrix $\mathbf{H} = [h_{kl}]$, per-receiver noise $\mathbf{n} = [n_1, \ldots, n_K]^T$, and receive vector $\mathbf{y} = [y_1, \ldots, y_K]^T$, where we keep in mind that receiver processing is limited to individual components of $\mathbf{y}$. A necessary prerequisite for performing any kind of pre-equalization is channel state information at the transmitter. Here we assume the current realization of $\mathbf{H}$ to be perfectly known to the transmitter. With $\mathrm{Re}\{\cdot\}$ and $\mathrm{Im}\{\cdot\}$ denoting real and imaginary parts,

$$\mathbf{x}_r = \begin{bmatrix} \mathrm{Re}\{\mathbf{x}\} \\ \mathrm{Im}\{\mathbf{x}\} \end{bmatrix}, \qquad \mathbf{H}_r = \begin{bmatrix} \mathrm{Re}\{\mathbf{H}\} & -\mathrm{Im}\{\mathbf{H}\} \\ \mathrm{Im}\{\mathbf{H}\} & \mathrm{Re}\{\mathbf{H}\} \end{bmatrix} \tag{2}$$

and letting the subscript $r$ denote vectors and matrices obtained in this fashion, the $2K$-dimensional real-valued equivalent to (1) is

$$\mathbf{y}_r = \mathbf{H}_r\mathbf{x}_r + \mathbf{n}_r \tag{3}$$

The vector of data symbols to be transmitted to the $K$ users will be denoted as $\mathbf{a}$, with components chosen from an $M$-ary quadrature amplitude modulation (QAM) constellation. For example, for $M = 4$, each $a_k$ takes one of four complex values, or equivalently, each component of $\mathbf{a}_r$ takes one of two real values.

The most obvious method to perform precoding for this setup is to select $\mathbf{x} = \mathbf{H}^{+}\mathbf{a}$, where $\mathbf{H}^{+}$ is the right pseudoinverse of $\mathbf{H}$, i.e., $\mathbf{H}\mathbf{H}^{+} = \mathbf{I}$. In this case, the receiver simply quantizes $y_k$ to the $M$-ary QAM constellation to recover $a_k$. A more power-efficient precoding method is Tomlinson–Harashima precoding (cf., e.g., [9]). This method employs modulo arithmetic in the precoding stage, and requires a modulo operation at the receiver before quantizing to the QAM constellation. It is based on a QR-type decomposition of the channel matrix $\mathbf{H}$, and its performance can be increased by using the V-BLAST algorithm [10] to optimize the ordering of the subchannels.

However, since all of these schemes perform linear preequalization for at least one of the subchannels, for Rayleigh fading channels, the average bit-error rate (BER) curve will show diversity order $N - K + 1$, and, in particular, for $N = K$, the diversity order of a single Rayleigh fading channel.
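As a quick numerical check of the real-valued representation used in this section, the following minimal Python sketch stacks the real and imaginary parts of a small complex channel and confirms that the real-valued product reproduces the complex one. The helper names (`real_matrix`, `real_vector`, `matvec`) and the 2×2 example values are illustrative assumptions, not taken from the letter, and noise is omitted.

```python
def real_matrix(H):
    """Map a complex K x N matrix (list of rows) to its 2K x 2N real form
    [[Re H, -Im H], [Im H, Re H]]."""
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    return top + bottom


def real_vector(v):
    """Stack real parts on top of imaginary parts."""
    return [c.real for c in v] + [c.imag for c in v]


def matvec(A, v):
    """Plain matrix-vector product for lists of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# Hypothetical example values (assumption): K = N = 2, no noise.
H = [[1 + 2j, 0.5 - 1j], [-1j, 2 + 0j]]
x = [1 - 1j, -0.5 + 2j]

y = matvec(H, x)                                # complex model
y_r = matvec(real_matrix(H), real_vector(x))    # real-valued model

# Largest deviation between the two representations.
max_err = max(abs(p - q) for p, q in zip(y_r, real_vector(y)))
```

Working with the stacked real form is what allows the later lattice arguments to use integer offsets per real dimension rather than Gaussian integers.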
0090-6778/04$20.00 © 2004 IEEE
Fig. 1. Left: Lattice AH H and associated Voronoi regions. Illustration of linear equalization with respect to original basis (dashed arrows, middle) and reduced basis (solid arrows, right).
III. SEARCH-BASED BROADCAST PRECODING

We assume that the receiver employs a modulo operation as in Tomlinson–Harashima precoding, i.e., applied to each real component,

$$\mathrm{mod}_\tau(y) = y - \tau\left\lfloor \frac{y}{\tau} + \frac{1}{2} \right\rfloor \tag{4}$$

The scalar $\tau$ is chosen such that the points from the signal constellation can be uniquely recovered from $\mathrm{mod}_\tau(y)$ (for 4-QAM as above, $\tau$ is taken such that a periodic extension of the signal constellation results at the channel output). Shaping without scrambling [4], [5], initially only introduced for single-input single-output intersymbol interference channels, and the "vector perturbation" technique of [1] are both based on the observation that the transmit signal which requires minimum power is found as

$$\mathbf{x}_r = \mathbf{H}_r^{+}(\mathbf{a}_r + \tau\mathbf{z}_r), \qquad \mathbf{z}_r = \arg\min_{\mathbf{z} \in \mathbb{Z}^{2K}} \left\| \mathbf{H}_r^{+}(\mathbf{a}_r + \tau\mathbf{z}) \right\|^2 \tag{5}$$

where $\mathbf{H}_r^{+}$ denotes the real-valued representation of the right pseudoinverse of the channel matrix and $\mathbf{a}_r$ that of the data vector. The receiver observes $\mathbf{y}_r = \mathbf{H}_r\mathbf{x}_r + \mathbf{n}_r = \mathbf{a}_r + \tau\mathbf{z}_r + \mathbf{n}_r$, and with the modulo frontend, for each component $k$ (real representation),

$$\mathrm{mod}_\tau(y_{r,k}) = \mathrm{mod}_\tau(a_{r,k} + n_{r,k}).$$

Taking a closer look at the minimization (5), we see that $\tau\mathbf{H}_r^{+}\mathbf{z}_r$ is found as the point in the lattice $\tau\mathbf{H}_r^{+}\mathbb{Z}^{2K}$ that is closest to $-\mathbf{H}_r^{+}\mathbf{a}_r$. As the search space for this closest-point problem has a finite number of dimensions $2K$, unlike in the applications for which shaping without scrambling was originally considered, the full search can be effectively performed using standard techniques, e.g., lattice decoding, in particular, the sphere-decoder algorithm [1], [6], cf. also [5].

IV. LATTICE-REDUCTION-AIDED BROADCAST PRECODING

It has been shown that the average complexity of the sphere decoder is not as bad as suggested by its worst-case complexity (which is exponential in the number of dimensions) [11]. However, compared with both linear preequalization and Tomlinson–Harashima precoding, which merely require a few matrix (and modulo) operations, it still is quite high. Here we suggest using the closest-point approximation [2], similar to [7] and [8], to obtain a simple but efficient method for broadcast precoding.

Starting from $\mathbf{H}_r^{+}$, we use the polynomial-time Lenstra–Lenstra–Lovász (LLL) algorithm¹ [12] on its columns to obtain

$$\mathbf{B} = \mathbf{H}_r^{+}\mathbf{T} \tag{6}$$

Here, $\mathbf{B}$ is the LLL-reduced basis with approximately orthogonal columns, and $\mathbf{T}$ is an integer matrix with $|\det \mathbf{T}| = 1$, which describes this transform. Since particularly matrices that are far from being orthogonal can be transformed into "rather" orthogonal matrices in this way, less noise enhancement is suffered by linear equalization based on these LLL-reduced matrices (truly orthogonal matrices would have no noise enhancement).

The diagrams in Fig. 1 illustrate this for a simple two-dimensional (2-D) example, cf. also [7]. On the left-hand side, the Voronoi regions associated with the set of lattice points are shown, together with the column vectors (basis vectors) of the original basis (dashed) and the column vectors of a reduced basis for the same set of points (solid). The Voronoi regions partition the real plane corresponding to the (exact) solutions of the closest-point problem for the lattice. If the original basis is used for linear equalization, the diagram in the middle of Fig. 1 results, where the dashed basis vectors are now forced to be orthonormal. However, the integer grid used for rounding does not match the distorted Voronoi regions very well, which gives a poor approximation to the closest-vector problem. The shaded area in the diagram shows where the approximation is correct. On the other hand, if the reduced basis is used for linear equalization (right diagram), the mismatch between the distorted Voronoi regions and the integer grid is much smaller. This procedure is known as the "rounding off" approximation [2], and the approximate solution of (5) is given by

$$\mathbf{x}_r = \mathbf{B}\left(\mathbf{T}^{-1}\mathbf{a}_r - Q_{\tau\mathbb{Z}^{2K}}\{\mathbf{T}^{-1}\mathbf{a}_r\}\right) \tag{7}$$

where we have used $Q_{\tau\mathbb{Z}^{2K}}\{\cdot\}$ to denote componentwise rounding of a $2K$-dimensional vector to the scaled integer lattice $\tau\mathbb{Z}^{2K}$.

We also consider a variant of the nearest-plane algorithm [2] for the solution of the closest-point problem. This approximation is similar to decision-feedback equalization, i.e., it consists of a successive quantization taking into account previously quantized values. From the V-BLAST algorithm [10] applied to the reduced basis $\mathbf{B}$ obtained from the LLL algorithm as above, we get the factorization (8) into a lower triangular matrix with unit diagonal (ones on the main diagonal, zeros above it), a matrix with orthogonal rows, and a permutation matrix corresponding to the optimized decision order. (If the decision order is not optimized, i.e., if the permutation matrix is the identity, these matrices can be obtained from a QR-type decomposition of $\mathbf{B}$.) The implementation of this scheme is shown in Fig. 2. The algorithm performs the initialization (9), after which the feedback loop in Fig. 2 successively calculates (10). Finally, the transmit vector is obtained from (11).

Fig. 2. Illustration of the nearest-plane approximation broadcast precoding scheme. Only one receiver processing path is shown.

Except for the upfront calculation of the LLL-reduced basis $\mathbf{B}$, the complexity is similar to that of, e.g., Tomlinson–Harashima precoding. It is worth noting that while in the detection case, the effect of neglecting the boundary region of the constellation has a negative impact on the performance [8], [15], all lattice points are equally valid in the present situation.

¹ We note that the LLL algorithm only approximately solves the lattice basis-reduction problem, i.e., the problem of finding the "most orthogonal" lattice basis. The results in [6] suggest that using the Korkine–Zolotareff reduced basis, which is significantly more complex to construct, does not offer much improvement up to about 16 dimensions, cf. also [13] and [14].

V. SIMULATION RESULTS

We now present simulation results in the form of BER curves over the average transmitted energy per information bit $E_b$ divided by the one-sided noise power spectral density $N_0$, for systems with constant transmit power. The channel matrix $\mathbf{H}$ is generated with independent zero-mean complex Gaussian entries, i.e., independent Rayleigh fading channels for each user's receive antenna.

Fig. 3. Simulation results for the different broadcast precoding schemes. N = 4; K = 4, 4-QAM (left), N = 8; K = 8, 16-QAM (right).

Fig. 3 (left) shows the results for a system with $N = 4$ and $K = 4$, where 4-QAM constellations are used on all subchannels. Linear preequalization (labeled "linear"), straightforward Tomlinson–Harashima precoding ("THP"), and THP based on the V-BLAST permutation ("THP/VB") all exhibit diversity order 1, in contrast to the diversity that could be obtained if no interference were present, shown by the curve labeled "orthogonal." Precoding based on the sphere-decoder approach ("search") shows significant improvement, particularly exhibiting the full diversity order 4. Strikingly, both of the schemes based on the Babai approximations, lattice-reduction-aided precoding with linear ("LR-lin") and V-BLAST nearest-plane ("LR-VB"), also exhibit the full diversity order 4, and particularly LR-VB shows only little loss in power efficiency with respect to the full search, at significantly lower complexity.
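To make the "LR-lin" scheme compared above more concrete, the following Python sketch runs LLL reduction on the columns of the channel inverse and then applies Babai's rounding-off step. It is only a toy under stated assumptions, not the authors' implementation: a real-valued, square 2×2 channel (so the pseudoinverse is the ordinary inverse), symbols from {−1/2, +1/2}, modulo period τ = 2, LLL parameter δ = 3/4, and hypothetical function names throughout.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def transpose(m):
    return [list(c) for c in zip(*m)]


def inv2(m):
    """Inverse of a 2 x 2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def gram_schmidt(b):
    """Orthogonalized vectors and projection coefficients mu[i][j]."""
    bs, mu = [], [[0.0] * len(b) for _ in b]
    for i, v in enumerate(b):
        w = list(v)
        for j in range(i):
            mu[i][j] = dot(v, bs[j]) / dot(bs[j], bs[j])
            w = [wi - mu[i][j] * bj for wi, bj in zip(w, bs[j])]
        bs.append(w)
    return bs, mu


def lll(cols, delta=0.75):
    """Textbook LLL on a list of basis vectors; also returns the columns
    of the unimodular integer transform T."""
    n = len(cols)
    b = [list(c) for c in cols]
    t = [[int(i == j) for i in range(n)] for j in range(n)]
    k = 1
    while k < n:
        for j in range(k - 1, -1, -1):          # size reduction
            _, mu = gram_schmidt(b)
            q = round(mu[k][j])
            if q:
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
                t[k] = [x - q * y for x, y in zip(t[k], t[j])]
        bs, mu = gram_schmidt(b)
        if dot(bs[k], bs[k]) >= (delta - mu[k][k - 1] ** 2) * dot(bs[k - 1], bs[k - 1]):
            k += 1                              # Lovasz condition holds
        else:
            b[k], b[k - 1] = b[k - 1], b[k]     # swap and step back
            t[k], t[k - 1] = t[k - 1], t[k]
            k = max(k - 1, 1)
    return b, t


def mod_tau(y, tau):
    """Symmetric modulo reduction into [-tau/2, tau/2)."""
    return y - tau * math.floor(y / tau + 0.5)


def lr_lin_precode(H, a, tau):
    """Rounding-off precoding in the LLL-reduced basis (2 x 2 toy case)."""
    b_cols, t_cols = lll(transpose(inv2(H)))
    B, T = transpose(b_cols), transpose(t_cols)
    c = matvec(inv2(T), a)                      # coordinates in reduced basis
    d = [ci - tau * round(ci / tau) for ci in c]
    return matvec(B, d)


# Hypothetical ill-conditioned toy channel and data (assumptions).
H = [[4.0, -7.0], [-1.0, 2.0]]
a = [0.5, -0.5]
tau = 2.0

x = lr_lin_precode(H, a, tau)                   # lattice-reduction-aided
x_lin = matvec(inv2(H), a)                      # plain linear preequalization
rx = [mod_tau(v, tau) for v in matvec(H, x)]    # noise-free receiver output
```

For these example values the modulo receiver recovers `a` exactly, and the transmit power `dot(x, x)` is 0.5 compared with 8.5 for `x_lin`, which illustrates the kind of power saving that the reduced basis makes available. The sketch recomputes the Gram-Schmidt coefficients in every step for brevity; a practical implementation would update them incrementally.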
The same holds for a larger setup with $N = 8$ and $K = 8$ and 16-QAM modulation, shown in Fig. 3 (right), where THP/VB improves quite substantially upon plain THP, but again eventually settles to diversity order 1. Even for this larger system, the loss of LR-VB precoding, compared with the full search, is not substantial.

VI. COMPLEXITY COMPARISON

We briefly compare the complexity of the proposed scheme with that of the sphere-decoder-based scheme. Indeed, the fastest known sphere-decoder algorithms use LLL (or Korkine–Zolotareff) reduced bases [6], [13]. This preprocessing makes sense as long as several vectors are to be decoded for a given lattice, which we can assume in a precoding application. Therefore, the overhead of LLL reduction is the same for the sphere-decoder-based and the proposed scheme.

The average number of arithmetic operations (additions and multiplications) required by the sphere decoder varies with the basis of the lattice that it operates in. This can be seen in Fig. 4, where the distribution of this average is shown for random $\mathbf{H}$, $N = 8$, and $K = 8$. (Results for the sphere decoder with and without LLL preprocessing are shown, which illustrate the significant performance advantage of the former.) The complexity of the sphere decoder with LLL preprocessing varies from about 2000 to more than 20 000 operations, while the proposed scheme requires a constant number of roughly 1200 operations, which can be a distinct advantage in a practical implementation.

Fig. 4. Distribution of the average number of arithmetic operations (ops) required by sphere-decoder-based precoding for random H (N = 8; K = 8). Sphere decoder without preprocessing (dashed) and sphere decoder with LLL preprocessing (solid).

VII. CONCLUSION

The simulations conducted show that for system dimensions of practical interest, the expensive search for the precoding symbol in [1] can be avoided using the Babai approximation, at only a little cost in performance, retaining the full diversity, and improving substantially over linear preequalization, as well as THP. The LLL basis reduction required for Babai's closest-point approximation is necessary only once per block, and for reasonable block sizes introduces merely a negligible additional overhead, resulting in complexity comparable to THP. Altogether, the computational structure of the resulting precoder is straightforward and simple to implement. Note that the precoding schemes considered in this letter are optimized under the constraint that a simple scalar modulo receiver frontend is used, whereas precoding systems based on higher dimensional lattices may achieve additional gains, cf., e.g., [5].
REFERENCES

[1] C. B. Peel, B. M. Hochwald, and A. L. Swindlehurst. (2003, June) A vector-perturbation technique for near-capacity multi-antenna multi-user communication. [Online]. Available: http://mars.bell-labs.com/papers/mod_precoding/
[2] L. Babai, "On Lovász' lattice reduction and the nearest lattice point problem," Combinatorica, vol. 6, no. 1, pp. 1–13, 1986.
[3] R. F. H. Fischer, C. Windpassinger, A. Lampe, and J. B. Huber, "MIMO precoding for decentralized receivers," in Proc. IEEE Int. Symp. Information Theory, Lausanne, Switzerland, June 2002, p. 496.
[4] R. F. H. Fischer, W. H. Gerstacker, and J. B. Huber, "Dynamics limited precoding, shaping, and blind equalization for fast digital transmission over twisted pair lines," IEEE J. Select. Areas Commun., vol. 13, pp. 1622–1633, Dec. 1995.
[5] R. F. H. Fischer, C. Stierstorfer, and C. Windpassinger, "Precoding and signal shaping for transmission over MIMO channels," in Proc. Canadian Workshop Information Theory, Waterloo, ON, Canada, May 2003, pp. 83–87.
[6] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, "Closest point search in lattices," IEEE Trans. Inform. Theory, vol. 48, pp. 2201–2214, Aug. 2002.
[7] H. Yao and G. W. Wornell, "Lattice-reduction-aided detectors for MIMO communication systems," in Proc. IEEE Globecom, Taipei, Taiwan, Nov. 2002, pp. 424–428.
[8] C. Windpassinger and R. F. H. Fischer, "Low-complexity near-maximum-likelihood detection and precoding for MIMO systems using lattice reduction," in Proc. IEEE Information Theory Workshop, Paris, France, Mar. 2003, pp. 345–348.
[9] R. F. H. Fischer, Precoding and Signal Shaping for Digital Transmission. New York: Wiley, 2002.
[10] G. J. Foschini, G. D. Golden, R. A. Valenzuela, and P. W. Wolniansky, "Simplified processing for high spectral efficiency wireless communication employing multi-element arrays," IEEE J. Select. Areas Commun., vol. 17, pp. 1841–1852, Nov. 1999.
[11] B. Hassibi and H. Vikalo, "On the expected complexity of integer least-squares problems," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Orlando, FL, May 2002, pp. 1497–1500.
[12] A. K. Lenstra, H. W. Lenstra, and L. Lovász, "Factoring polynomials with rational coefficients," Math. Ann., vol. 261, pp. 515–534, 1982.
[13] A. H. Banihashemi and A. K. Khandani, "On the complexity of decoding lattices using the Korkin–Zolotarev reduced basis," IEEE Trans. Inform. Theory, vol. 44, pp. 162–171, Jan. 1998.
[14] C. Windpassinger and R. F. H. Fischer, "Optimum and sub-optimum lattice-reduction-aided detection and precoding for MIMO communications," in Proc. Canadian Workshop Information Theory, Waterloo, ON, Canada, May 2003, pp. 88–91.
[15] C. Windpassinger, L. Lampe, and R. F. H. Fischer, "From lattice-reduction-aided detection toward maximum-likelihood detection in MIMO systems," in Proc. Wireless, Optical Communications Conf., Banff, AB, Canada, July 2003, [CD-ROM].