Efficient User, Bit and Power Allocation for Adaptive Multiuser MIMO-OFDM with Low Signalling Overhead
Antti Tölli and Markku Juntti
Centre for Wireless Communications, University of Oulu, P.O. Box 4500, 90014 University of Oulu, Finland. Email:
[email protected]

Abstract— Block diagonalization (BD) of multiple users combined with coordinated transmitter-receiver processing and scheduling between users is a simple and straightforward method to utilize the available spatial degrees of freedom in the downlink multi-user (MU) multiple-input multiple-output (MIMO) channel for increasing the system throughput. In this paper, an efficient user, bit and power allocation algorithm with low signalling overhead (LSO) is proposed for maximizing the downlink spectral efficiency of the adaptive multiuser MIMO-OFDM system with BD decomposition. A greedy allocation algorithm with Hughes-Hartogs loading is also proposed and used as a reference for the LSO algorithm. For a small number of users, the performance of the LSO algorithm is shown to be very close to that of the greedy algorithm. Iterative BD decomposition with a single reiteration is shown to provide significant gains over the non-iterative solution. A linear minimum mean square error (LMMSE) filter is applied at the receiver to suppress the remaining multi-access interference from incomplete orthogonalization, together with a simple compensation algorithm with low rate scalar feedback to the transmitter. The proposed MU MIMO-OFDM system is also shown to be robust against imperfect channel state information at the transmitter, always providing performance superior to a time switched single user MIMO-OFDM solution.
I. INTRODUCTION

The allocation of resources among users of a communication system can be performed jointly in the time, frequency and space domains using advanced multi-user multiple-input multiple-output (MIMO) resource allocation and scheduling techniques in both uplink and downlink. While dirty paper coding (DPC) is known to be the capacity achieving, albeit very complex, precoding technique in the downlink [1], several suboptimal but less complex techniques have been proposed in the literature to perform multi-user (MU) transmission at the transmitter. Recently, a suboptimal precoding method known as block diagonalization (BD) was introduced in [2]. When combined with coordinated transmitter (TX) - receiver (RX) processing and scheduling between users, the method is applicable to any number of antennas and users. Optimal allocation of resources among different users while considering the spatial processing at both the base station and the mobile nodes has been a topic of significant recent interest [3]–[5]. An orthogonal frequency-division multiplexing (OFDM) system with frequency selective fading adds another dimension to the optimization problem. The optimal solution for the 2-dimensional (2D) space-frequency allocation problem
would require an exhaustive search over all possible combinations of user allocations for each sub-carrier, and hence the number of computations required would be clearly prohibitive. Greedy allocation solutions have been proposed for the flat fading case, for example in [3]–[5]. These ideas are extended in this paper to the frequency selective OFDM case. However, the greedy allocation of resources among sub-carriers and spatial dimensions results in an orthogonal frequency division multiple access (OFDMA) solution where the 2D allocation table can be severely fragmented, and the amount of signalling required becomes a performance limiting factor.

In this paper, we consider a time division duplex (TDD) OFDM system with adaptive MU MIMO transmission, where the transmitter and receivers are equipped with multiple antennas. The BD method with iterative coordinated TX-RX processing is utilized to guarantee interference free reception for each scheduled user. In addition to the greedy algorithm, we propose an efficient low complexity joint user, bit and power allocation algorithm with low signalling overhead (LSO) for maximizing the downlink spectral efficiency of the system. Moreover, the impact of imperfect channel estimation at the transmitter and imperfect orthogonalization of allocated users on the system performance is studied.

II. SYSTEM MODEL

An adaptive multi-user MIMO-OFDM system is considered with $N_C$ sub-carriers, $K$ users, each equipped with $N_{R_k}$ antennas, and a single base station having $N_T$ antennas. The total number of receive antennas is thus $N_R = \sum_k N_{R_k}$. The downlink OFDM modulator/demodulator chain for user $k$ can be written as

$$\mathbf{y}_{k,c} = \mathbf{H}_{k,c}\mathbf{x}_c + \mathbf{n}_{k,c}, \tag{1}$$

where $c = 1, \ldots, N_C$ represents the sub-carrier index, $\mathbf{x}_c$ and $\mathbf{y}_{k,c}$ denote the transmitted signal vector and the received signal vector, respectively, $\mathbf{n}_{k,c} \sim \mathcal{CN}(\mathbf{0}, N_0\mathbf{I}_{N_{R_k}})$ represents the noise vector, and $\mathbf{H}_{1,c}, \ldots, \mathbf{H}_{K,c}$ are the $N_{R_k} \times N_T$ channel matrices of users 1 to $K$, respectively.
The elements of $\mathbf{H}_{k,c}$ are normalized to have unit variance. The transmitted vector at sub-carrier $c$ is generated as

$$\mathbf{x}_c = \sum_{k=1}^{K} \mathbf{M}_{k,c}\mathbf{d}_{k,c} = \bar{\mathbf{M}}_c\bar{\mathbf{d}}_c, \tag{2}$$
where $\mathbf{M}_{k,c} \in \mathbb{C}^{N_T \times m_{k,c}}$ represent the pre-coding vectors used to transmit $m_{k,c} \le N_{R_k}$ parallel streams of data $\mathbf{d}_{k,c}$ to user $k$. The matrix $\bar{\mathbf{M}}_c = [\mathbf{M}_{1,c}, \ldots, \mathbf{M}_{K,c}] \in \mathbb{C}^{N_T \times r_c}$ and the vector $\bar{\mathbf{d}}_c = [d_{1,c}, \ldots, d_{r_c,c}]^T$ consist of $r_c$ pre-coding vectors and complex data symbols transmitted at sub-carrier $c$, respectively. $r_c = \sum_{k=1}^{K} m_{k,c} \le N_T$ denotes the total number of active data streams per sub-carrier decided by the user, bit and power allocation algorithm described in Section III. By incorporating (2) into (1), the received signal for user $k$ at sub-carrier $c$ becomes

$$\mathbf{y}_{k,c} = \mathbf{T}_{k,c}\mathbf{d}_{k,c} + \sum_{i=1,\, i \ne k}^{K} \mathbf{H}_{k,c}\mathbf{M}_{i,c}\mathbf{d}_{i,c} + \mathbf{n}_{k,c}, \tag{3}$$

where $\mathbf{T}_{k,c} = \mathbf{H}_{k,c}\mathbf{M}_{k,c}$ represents the equivalent channel matrix of the desired signal at sub-carrier $c$. It includes the accumulated effect of signal processing at the transmitter side and channel propagation on the transmitted data signal. Matrix $\mathbf{T}_{k,c}$ is assumed to be perfectly known at the receiver. The receiver is equipped with a linear MMSE filter and the decision variables are generated as $\hat{\mathbf{d}}_{k,c} = \mathbf{W}_{k,c}^H\mathbf{y}_{k,c}$. The weight matrix $\mathbf{W}_{k,c} \in \mathbb{C}^{N_{R_k} \times m_{k,c}}$ of the MMSE filter is found by the minimization $\mathbf{W}_{k,c} = \arg\min_{\mathbf{W}_{k,c}} \mathrm{E}\left[\|\mathbf{d}_{k,c} - \mathbf{W}_{k,c}^H\mathbf{y}_{k,c}\|^2\right]$ and is given as

$$\mathbf{W}_{k,c}^H = \mathbf{T}_{k,c}^H\left(\mathbf{H}_{k,c}\bar{\mathbf{M}}_c\bar{\mathbf{M}}_c^H\mathbf{H}_{k,c}^H + \mathbf{R}_{k,c}\right)^{-1}, \tag{4}$$

where the noise covariance matrix $\mathbf{R}_{k,c}$ is assumed to be white Gaussian ($\mathbf{R}_{k,c} = N_0\mathbf{I}_{N_{R_k}}$).

A simple channel estimation error model is adopted to consider the impact of imperfect channel knowledge at the transmitter on the system performance. The estimated channel matrix $\hat{\mathbf{H}}_{k,c}$ is given as $\hat{\mathbf{H}}_{k,c} = \frac{1}{\sqrt{1 + \sigma_{\mathrm{est}}^2}}\left(\mathbf{H}_{k,c} + \mathbf{H}_{k,c}^{n}\right)$, where the entries of the estimation noise matrix $\mathbf{H}_{k,c}^{n}$ between each TX antenna $t$ and RX antenna $r$ are i.i.d. and generated as $[\mathbf{H}_{k,c}^{n}]_{r,t} \sim \mathcal{CN}(0, \sigma_{\mathrm{est}}^2)$.

A. Block Diagonalization

The basic BD method in [2] relies on the condition that $N_T \ge N_R$. In general, the transmitter can send up to $N_T$ interference free data streams, regardless of the number of users. The BD algorithm can be extended to operate with any number of users by coordinating the processing between TX and RX. This is achieved by using a new equivalent channel matrix $\bar{\mathbf{H}}_{k,c} = \mathbf{F}_{k,c}^H\mathbf{H}_{k,c}$ instead of $\mathbf{H}_{k,c}$, where $\mathbf{F}_{k,c}$ is an $N_{R_k} \times m_{k,c}$ matrix consisting of $m_{k,c}$ receive beamformers (and spatial dimensions) [2]. The matrix $\bar{\mathbf{H}}_{k,c}$ can be used in the BD algorithm instead of $\mathbf{H}_{k,c}$ as long as $r_c = \sum_{k=1}^{K} m_{k,c} \le N_T$. In order to eliminate the multi-access interference (MAI), the constraint $\bar{\mathbf{H}}_{i,c}\mathbf{M}_{k,c} = \mathbf{0}$ for $i \ne k$ is imposed. If the channel matrix for all other users except user $k$ is defined as $\tilde{\bar{\mathbf{H}}}_{k,c} = [\bar{\mathbf{H}}_{1,c}^T \ldots \bar{\mathbf{H}}_{k-1,c}^T\ \bar{\mathbf{H}}_{k+1,c}^T \ldots \bar{\mathbf{H}}_{K,c}^T]^T$, the zero interference constraint forces $\mathbf{M}_{k,c}$ to lie in the null space of $\tilde{\bar{\mathbf{H}}}_{k,c}$. Following the procedure from [2], [6], $\tilde{\bar{\mathbf{V}}}_{k,c}^{(0)}$ is defined as an orthogonal basis for the null space of $\tilde{\bar{\mathbf{H}}}_{k,c}$ so that $\bar{\mathbf{H}}_{i,c}\tilde{\bar{\mathbf{V}}}_{k,c}^{(0)} = \mathbf{0}$ for $i \ne k$. The SVD of $\mathbf{H}'_{k,c} = \mathbf{H}_{k,c}\tilde{\bar{\mathbf{V}}}_{k,c}^{(0)}$ is now determined individually for each user as

$$\mathbf{H}'_{k,c} = \left[\mathbf{U}'^{(1)}_{k,c}\ \mathbf{U}'^{(0)}_{k,c}\right] \begin{bmatrix} \boldsymbol{\Lambda}'^{1/2}_{k,c} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \left[\mathbf{V}'^{(1)}_{k,c}\ \mathbf{V}'^{(0)}_{k,c}\right]^H, \tag{5}$$

where $\mathbf{U}'^{(1)}_{k,c}$ and $\mathbf{V}'^{(1)}_{k,c}$ represent the first $m_{k,c}$ left and right singular vectors of $\mathbf{H}'_{k,c}$, and $\boldsymbol{\Lambda}'_{k,c}$ is an $m_{k,c} \times m_{k,c}$ diagonal matrix of eigenvalues. The pre-coding matrix $\mathbf{M}_{k,c}$ for user $k$ is now defined as

$$\mathbf{M}_{k,c} = \tilde{\bar{\mathbf{V}}}_{k,c}^{(0)}\mathbf{V}'^{(1)}_{k,c}\mathbf{P}_{k,c}^{1/2}, \tag{6}$$
where the diagonal matrix $\mathbf{P}_{k,c} = \mathrm{diag}(P_{1,c}, \ldots, P_{m_{k,c},c})$ controls the powers allocated to each of the $m_{k,c}$ eigenmodes. The loading algorithms from Section III are used to find the elements of $\mathbf{P}_{k,c}$ for the corresponding eigenvalues, assuming a total power constraint $P_T$.

A simple non-iterative method was presented in [2] to choose proper beamforming matrices $\mathbf{F}_{k,c}$ at the transmitter for any $m_{k,c}$ (see footnote 1). The non-iterative method results in somewhat smaller spectral efficiency than an iterative solution [6], where the transmitter and receiver matrices are successively recomputed by assigning $\mathbf{F}_{k,c}(n) = \mathbf{U}'^{(1)}_{k,c}(n-1)$, where $n$ denotes the iteration number. The initial set of $\mathbf{F}_{k,c}(1)$ is the same as in the non-iterative method. However, the iterative solution may in some cases require even tens of iterations to meet the constraint $\bar{\mathbf{H}}_{i,c}\mathbf{M}_{k,c} = \mathbf{0}$ for $i \ne k$.

We noticed that even a single reiteration round is sufficient to achieve most of the gains from the iterative solution. The linear MMSE filter (4) can be applied at the receiver to compensate for the remaining MAI. In such a case, however, the signal-to-interference-plus-noise ratio (SINR) per sub-carrier cannot be fully controlled at the transmitter. If the SINR values per eigenmode are fixed at the transmitter by the bit and power loading algorithm in order to achieve a certain frame error rate (FER), they cannot necessarily be guaranteed at the receiver due to the remaining interference. Therefore, some additional mechanisms are needed to maintain the quality of service at the receiver, as will be discussed in Section III-D.

III. USER, BEAM, BIT AND POWER ALLOCATION

A. Greedy Beam Selection and Ordering Algorithm

The number of beams assigned per user and per sub-carrier may vary depending on the channel realization so that $0 \le r_c \le N_T$, $0 \le m_{k,c} \le N_{R_k}$.
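For any such choice of the stream counts, the precoders of (5)-(6) can be computed along the lines of the following NumPy sketch for a single sub-carrier. It is only an illustration under stated assumptions: the function names `null_space` and `bd_precoders` are hypothetical, the receive beamformers are the dominant left singular vectors of each user's channel (the non-iterative choice, for which the footnote defines the channel in (5) through the equivalent channel), every scheduled user is assumed to have at least one stream, and the power loading matrix in (6) is left out so that the returned precoders have unit-norm columns.

```python
import numpy as np

def null_space(A, tol=1e-10):
    """Orthonormal basis (columns) for the null space of A, via the full SVD."""
    _, s, vh = np.linalg.svd(A)
    rank = int(np.sum(s > tol * s.max()))
    return vh[rank:].conj().T

def bd_precoders(H, m):
    """Non-iterative coordinated BD for one sub-carrier (illustrative sketch).

    H : list of (N_Rk x N_T) channel matrices, one per user
    m : list of stream counts m_k >= 1, with sum(m) <= N_T
    Returns receive beamformers F, unit-power precoders M and eigenvalues lam.
    """
    n_t = H[0].shape[1]
    # Receive beamformers: m_k dominant left singular vectors of H_k.
    F = [np.linalg.svd(Hk)[0][:, :mk] for Hk, mk in zip(H, m)]
    # Equivalent channels H_bar_k = F_k^H H_k.
    H_bar = [Fk.conj().T @ Hk for Fk, Hk in zip(F, H)]
    M, lam = [], []
    for k, mk in enumerate(m):
        others = [H_bar[i] for i in range(len(H)) if i != k]
        # Null-space basis of the stacked equivalent channels of the other users.
        V0 = null_space(np.vstack(others)) if others else np.eye(n_t)
        # SVD of the user's channel seen through that null space, cf. (5).
        _, s, vh = np.linalg.svd(H_bar[k] @ V0)
        M.append(V0 @ vh.conj().T[:, :mk])   # (6) without the power matrix
        lam.append(s[:mk] ** 2)              # eigenmode gains for the loading
    return F, M, lam

rng = np.random.default_rng(0)
H = [rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4)) for _ in range(2)]
F, M, lam = bd_precoders(H, [2, 1])
# Residual multi-access interference between the two users; numerically zero here.
print(np.linalg.norm(F[0].conj().T @ H[0] @ M[1]))
```

The returned eigenvalues are the quantities a bit and power loading algorithm would operate on; an iterative variant would recompute the receive beamformers from the left singular vectors in (5) and repeat.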
In [5], an efficient greedy beam selection algorithm was proposed to maximize the downlink spectral efficiency of the linear BD with coordinated TX-RX processing. As in [5], $N_T$ beams are selected for each sub-carrier by projecting each row $\bar{\mathbf{h}}_{i,c}$ of the stacked $N_R \times N_T$ row matrix $\bar{\mathbf{H}}_{S,c} = \left[(\mathbf{F}_{1,c}^H\mathbf{H}_{1,c})^T \ldots (\mathbf{F}_{K,c}^H\mathbf{H}_{K,c})^T\right]^T$ to the set of previously selected beams, and choosing the one resulting in the maximum norm of the projection. For the beam selection, $\mathbf{F}_{k,c}$ now includes all $N_{R_k}$ left singular vectors of $\mathbf{U}_{k,c}$. The $l$'th beam $b_{l,c}$ for sub-carrier $c$ is selected as

$$b_{l,c} = \arg\max_{1 \le i \le N_R,\ i \notin S_{l-1,c}} \left\|\bar{\mathbf{h}}_{i,c}\left(\mathbf{B}_{S_{l-1,c}} - \mathbf{I}\right)\right\|^2, \tag{7}$$

where the projection matrix $\mathbf{B}_{S_{l-1,c}}$ is defined as

$$\mathbf{B}_{S_{l-1,c}} = \bar{\mathbf{H}}_{S_{l-1,c}}^H\left(\bar{\mathbf{H}}_{S_{l-1,c}}\bar{\mathbf{H}}_{S_{l-1,c}}^H\right)^{-1}\bar{\mathbf{H}}_{S_{l-1,c}}, \tag{8}$$

and where $S_{l-1,c}$ is the index vector of the previously selected $l-1$ beams for sub-carrier $c$. The first beam is selected based on the largest channel eigenvalue, which may not be the optimal choice (full search) on all occasions. Finally, the $N_T \times N_C$ index matrix $\mathbf{S}$ includes the indices of the selected $N_T$ beams for all the sub-carriers, and can also be considered as a resource allocation table. When $N_R = \sum_k N_{R_k} > N_T$, $m_{k,c}$ per user $k$ can take any value between zero and $N_{R_k}$, depending on how strong the channels available for the given user are and how compatible they are with other users and/or beams.

Footnote 1: The $\mathbf{F}_{k,c}$ matrices are generated by selecting the $m_{k,c}$ dominant left singular vectors (columns) of $\mathbf{U}_{k,c}$ of each $\mathbf{H}_{k,c} = \mathbf{U}_{k,c}\boldsymbol{\Lambda}_{k,c}\mathbf{V}_{k,c}^H$ [2]. Also, $\mathbf{H}'_{k,c}$ in (5) must be defined as $\bar{\mathbf{H}}_{k,c}\tilde{\bar{\mathbf{V}}}_{k,c}^{(0)}$ in this case.

[Figure 1: spectral efficiency [bits/s/Hz] versus iterations (excluded subcarriers / spatial modes), 0 to 200, with one curve per SNR value: 0, 4, 8, 12, 16, 20 and 24−28 dB.]
Fig. 1. Evolution of spectral efficiency versus discarded sub-carriers / spatial modes with Hughes-Hartogs loading algorithm (NT = 4, NRk = 2, K = 2, NC = 64, SNR = 0-28 dB)
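The selection rule (7)-(8) of Section III-A can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `greedy_beam_order` is hypothetical, and the input is assumed to be the stacked row matrix for one sub-carrier, with rows built from all left singular vectors so that the squared norm of a row equals the corresponding channel eigenvalue and the first pick coincides with the largest eigenvalue.

```python
import numpy as np

def greedy_beam_order(H_stack, n_beams):
    """Greedy beam ordering for one sub-carrier, following (7)-(8).

    H_stack : (N_R x N_T) array whose rows are the candidate beams
    n_beams : number of beams to select (at most N_T)
    Returns the list of selected row indices in selection order.
    """
    n_rows, n_t = H_stack.shape
    n_beams = min(n_beams, n_rows, n_t)
    # First beam: the row with the largest squared norm.
    selected = [int(np.argmax(np.linalg.norm(H_stack, axis=1) ** 2))]
    eye = np.eye(n_t)
    while len(selected) < n_beams:
        Hs = H_stack[selected]
        # (8): projection onto the row space of the already selected beams.
        B = Hs.conj().T @ np.linalg.solve(Hs @ Hs.conj().T, Hs)
        # (7): energy of each candidate row outside that row space.
        metric = np.linalg.norm(H_stack @ (B - eye), axis=1) ** 2
        metric[selected] = -np.inf          # never reselect a beam
        selected.append(int(np.argmax(metric)))
    return selected

rows = np.array([[3.0, 0.0, 0.0],
                 [2.9, 0.1, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 0.5]])
print(greedy_beam_order(rows, 3))  # → [0, 2, 3]
```

In the small example the second row is strong but almost parallel to the first, so it is passed over in favour of weaker rows that are more compatible with the beams already selected.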
B. Greedy Beam Allocation with Hughes-Hartogs Loading

Due to the inherent noise amplification property of the linear BD method, it is often beneficial to allocate $r_c < N_T$ beams to maximize the mutual information, especially in the low signal-to-noise ratio (SNR) region and with a low number of users. The solution for the flat fading single carrier case given in [5] was simply to find a subset of the first indices from the full ordered set of indices so that the mutual information was maximized. An OFDM system adds another dimension to the optimization problem above. The optimal solution for the 2D space-frequency allocation problem would require an exhaustive search over all possible combinations of beam allocations (order and number of beams allocated) for each sub-carrier. The number of computations required is clearly prohibitive. Therefore, a sub-optimal method is proposed below.

The basic idea is to look for the best allocation of beams resulting in the highest spectral efficiency by iteratively discarding one of the beams from the allocation table and recomputing the BD decomposition for the affected sub-carrier. The Hughes-Hartogs (HH) loading algorithm [7], with a given modulation and coding scheme (MCS) set and the corresponding SNRs required by each MCS to achieve the desired target FER, is used in the algorithm below to compute the maximum spectral efficiency $\zeta$ for each 2D beam allocation and for each iteration round.

1) Compute the (iterative) BD decomposition for the selected beams in allocation table $\mathbf{S}$, and compose the $N_T \times N_C$ eigenvalue matrix $\bar{\boldsymbol{\Lambda}}'$, where the columns $\bar{\boldsymbol{\Lambda}}'_c$ are the diagonal elements of $\mathrm{blockdiag}(\boldsymbol{\Lambda}'_{1,c}, \ldots, \boldsymbol{\Lambda}'_{K,c})$ corresponding to indices $S_c$.
2) Run the HH algorithm on the values of $\bar{\boldsymbol{\Lambda}}'$ with power constraint $P_T$ to compute $\zeta$.
3) Look for the sub-carrier $c$ with the lowest eigenmode $\lambda'_{l,c}$ from $\bar{\boldsymbol{\Lambda}}'$.
4) Discard the last index from the set $S_c$ and update $m_{k,c}$ of the affected user $k$ accordingly.
5) Compute the (iterative) BD decomposition for the affected sub-carrier with the reduced set of beams, and update $\bar{\boldsymbol{\Lambda}}'_c$.
6) Go to 2) and iterate until $\zeta(n) < \zeta(n-1)$, where $n$ denotes the iteration index.
7) Select $\zeta(n-1)$ as the maximum spectral efficiency, with final allocation table $\mathbf{S}(n-1)$.
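Step 2) of the procedure above can be illustrated with a short sketch of greedy (Hughes-Hartogs style) loading over a matrix of eigenmode gains. Several points are assumptions made for the illustration: the function names are hypothetical, the MCS set and required SNRs (1.8, 7.1 and 11.6 dB for rate-1/2 QPSK, 16QAM and 64QAM) are the ones listed in Section IV, the rate per eigenmode is counted as 1, 2 or 3 information bits per symbol, and the spectral efficiency is scaled by one minus the 10% target FER, in line with the saturation value quoted in Section IV.

```python
import numpy as np

# MCS set from Section IV (index 0 means "eigenmode not used").
MCS_BITS = np.array([0.0, 1.0, 2.0, 3.0])                      # info bits per symbol
MCS_SNR = np.concatenate(([0.0], 10 ** (np.array([1.8, 7.1, 11.6]) / 10)))

def hughes_hartogs(gains, p_total, n0=1.0):
    """Greedy loading: repeatedly upgrade the eigenmode whose next MCS step
    costs the least extra power per extra bit, until the power budget is spent.

    gains : array of eigenmode power gains (e.g. N_T x N_C)
    Returns (level, power): MCS index and allocated power per eigenmode.
    """
    gains = np.asarray(gains, dtype=float)
    level = np.zeros(gains.shape, dtype=int)
    power = np.zeros(gains.shape)
    top = len(MCS_BITS) - 1
    while True:
        nxt = np.minimum(level + 1, top)
        with np.errstate(divide="ignore", invalid="ignore"):
            d_power = (MCS_SNR[nxt] - MCS_SNR[level]) * n0 / gains
            d_bits = MCS_BITS[nxt] - MCS_BITS[level]
            cost = np.where(d_bits > 0, d_power / d_bits, np.inf)
        idx = np.unravel_index(np.argmin(cost), cost.shape)
        # Stop when nothing can be upgraded or the cheapest upgrade does not fit.
        if not np.isfinite(cost[idx]) or power.sum() + d_power[idx] > p_total:
            break
        level[idx] += 1
        power[idx] += d_power[idx]
    return level, power

def spectral_efficiency(level, n_subcarriers, fer=0.1):
    """Spectral efficiency in bits/s/Hz for a given loading and target FER."""
    return (1.0 - fer) * MCS_BITS[level].sum() / n_subcarriers

level, power = hughes_hartogs(np.ones((4, 64)), p_total=1e9)
print(spectral_efficiency(level, 64))  # saturated case: 0.9 * 4 * 3 bits/s/Hz
```

The outer loop of steps 3) to 6) would call such a routine once per iteration, after recomputing the eigenvalues of the affected sub-carrier.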
An example in Fig. 1 illustrates the evolution of ζ for one channel realization with different SNR (= PT /N0 ) values versus the number of iterations for {NT , NRk , K, NC } = {4, 2, 2, 64} system (solid curves). The parameters for the HH algorithm used in the example case are given in Section IV. The maximum available ζ with the given parameter set is 12 bits/s/Hz. The maximum ζ for each SNR is pointed with asterisk mark. One way to simplify the algorithm above is to one-by-one discard the last row entirely from the index matrix S(n − 1) until the peak ζ is reached. The spectral efficiency achieved with this method (dashed curves, max value marked with triangle) is also shown in Fig. 1. It can be seen from the figure that achieving the maximum ζ with the BD method requires partial loading in spatial domain, especially at low SNR. At high SNR region, the system becomes saturated due to the finite MCS set, and full spatial loading is required. Moreover, simplified version of the algorithm can achieve nearly the same performance, especially at low SNR. The greedy allocation of resources among sub-carriers and spatial modes results in an OFDMA solution where the allocation table can be severely fragmented (0 < rc ≤ NT , 0 ≤ mk,c ≤ NRk ). Also, the MCS selected per sub-carrier per beam can be anyone from the given MCS set. Moreover, the iterative greedy algorithm proposed above is still computationally very expensive. Therefore, less complex scheduling and
loading algorithms with low signalling overhead are required for practical systems.

C. Low Complexity Bit, Power and Beam Allocation with Low Signalling Overhead

A low complexity bit and power loading algorithm requiring a low signalling overhead was introduced in [8] for a single user adaptive MIMO-OFDM system. The throughput degradation relative to the universally accepted Hughes-Hartogs (HH) algorithm for a fixed target frame error rate (FER) was shown to be negligible, while the signalling overhead was reduced by a factor of $N_C$. This was achieved by grouping all $\min(N_R, N_T)N_C$ eigenmodes into $\min(N_R, N_T)$ clusters. The first cluster contains the strongest eigenmodes from each sub-carrier, the second cluster contains the second strongest eigenmodes from each sub-carrier, and so on. Some of the most faded eigenmodes are skipped, and the same MCS is used for all selected eigenmodes belonging to the same cluster.

Herein, the single user algorithm is extended to the considered multiuser MIMO-OFDM case. The aim is to avoid the fragmentation of user/beam allocation in the sub-carrier domain by imposing the condition $m_{k,c} = m_k\ \forall\, c$. Consequently, the single user loading algorithm is computed for each user $k$ with $m_k$ clusters (allocated beams) [8]. For each transmitted frame, the algorithm first divides the total power between the clusters according to their eigenmode gains, and then the MCS and the selected eigenmodes (sub-carriers) in each cluster are independently optimized to maximize the throughput subject to an equal SNR constraint for all the selected sub-carriers. The target FER is maintained by using a look-up table which contains the SNR required by each of the available MCS in order to achieve the same FER in an additive white Gaussian noise (AWGN) channel. A standard half rate turbo code is used for channel coding. The encoding is performed jointly in the time and frequency domains, with a code word covering the selected eigenmodes from one cluster during the whole transmitted frame.

The amount of signalling needed is significantly less than in the case of the greedy scheduling and loading algorithm introduced in Section III-B. The signalling information required at the terminal is reduced to the number of beams/clusters allocated for each selected user and the MCS used on each cluster. The allocation problem is now reduced to finding the optimum $m_k$ for each user $k$, and the optimum total number of beams $r_c = r\ \forall\, c$. The straightforward solution is to perform an exhaustive search over all combinations of possible user/beam allocations, where the BD decomposition and loading are computed for each allocation, and the allocation with the highest spectral efficiency is selected. Some statistical fairness can also be easily included, for example by limiting the maximum number of beams per user. Obviously, the complexity of this method grows with increasing $K$, as the number of beam combinations (iterations) increases. Also, the scheduling gain from exhaustive search quickly saturates in highly frequency selective channels due to frequency diversity on vertically coded frames. For a relatively small $K$ ($\approx N_T$), however, the performance of the proposed method is very close to that of the greedy scheduling and loading algorithm (III-B), as will be demonstrated numerically in Section IV.

D. Compensation for the Estimation Deficiencies at the Transmitter
A simple closed-loop algorithm for compensating the effect of interference non-reciprocity at the transmitter was introduced in [9]. The basic idea of the method was to apply a single power offset value at the transmitter over all eigenmodes which compensates for the frequency, time and space selective non-reciprocal interference between TDD DL and UL. The receiver estimates a signal quality metric (BER after the 2nd turbo decoding iteration corresponding to the target FER) from the transmitted signal, compares the estimated metric to the target metric, and transmits the updated offset value to the transmitter. The offset value is then used to adjust the transmission rate per sub-channel so that the target signal quality metric is reached.

The same method is used herein to compensate for the residual MAI caused by the limited number of iterations in the iterative BD decomposition and/or the channel estimation errors at the transmitter. The power offset value feedback is used to modify the waterfilling reference level ($N_0$) in the bit and power loading algorithm

$$\{P_{l,c}, \mathrm{MCS}_{l,c}\} = f\left(N_0 + P_k^{(\mathrm{offset})},\ \lambda'_{l,c},\ P_T\right), \tag{9}$$

where $f$ denotes the function for the bit and power loading algorithm [8], $l = [1, \ldots, r_c]$ and $P_k^{(\mathrm{offset})}$ is the power offset value for user $k$ corresponding to beam $l$. For example, the transmission rate is reduced ($P_k^{(\mathrm{offset})}$ increased) if the received quality is less than the target rate, and vice versa.

IV. SIMULATION RESULTS

The performance of the proposed algorithms was studied by simulations. The simulated system was based on HIPERLAN/2 [10] and IEEE 802.11a assumptions, where the number of sub-carriers in the OFDM air interface is $N_C = 64$ and the cyclic prefix length is 16 samples. In all cases, the users' channels were assumed to be constant during one coded OFDM frame, and changing randomly from block to block.
The channel’s delay taps were considered independent of each other with power delay profile specified by ETSI BRAN Channel A [10]. One coded OFDM frame consists of 16 OFDM symbols. Modulation schemes used in the simulations were QPSK, 16QAM and 64QAM, all turbo encoded and punctured to rate = 1/2. The minimum codeword length is lmin = 500 bits and the SNRs required by each MCS to achieve the desired target FER=10% in AWGN channel are: 1.8 dB, 7.1 dB and 11.6 dB, respectively. The number of TX antennas was set to four, while the number of RX antennas was varied between 2 and 4. The fading in antenna elements was assumed to be independent of each other. The number of active users was kept relatively low varying from 1 to 8. The spectral efficiency with the proposed LSO allocation algorithm and with linear non-iterative or iterative BD decomposition is depicted in Fig. 2 for {NT , NRk , K} = {4, 2-4, 2}.
[Figures 2 and 3: spectral efficiency [bits/s/Hz] versus SNR [dB], 0 to 30 dB. Fig. 2 curves: Linear BD (orig.); Iter. loading & linear BD; Iter. loading & BD (20 iters); Iter. loading & BD (2 iters); TDMA, NRk=2; TDMA, NRk=4; curve groups marked NRk=2 and NRk=4. Fig. 3 curves: BD, LSO; BD, HH; TDMA (NRk=2), LSO; TDMA (NRk=2), HH; TDMA (NRk=4), LSO; TDMA (NRk=4), HH; curve groups marked K=2, NRk=2; K=2, NRk=4; K=4, NRk=4; K=8, NRk=4.]
Fig. 2. Non-iterative versus iterative BD decomposition with LSO loading algorithm, {NT , NRk , K} = {4, 2-4, 2}
Fig. 3. Comparison of LSO algorithm to greedy allocation with HH loading, {NT , NRk , K} = {4, 2-4, 2-8}
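The LSO curves in Figs. 2 and 3 rely on the cluster-wise loading of Section III-C, in which one MCS is chosen per cluster and the selected sub-carriers all operate at the same SNR. The sketch below shows one way such a per-cluster decision could be made. It is an assumption-laden illustration and not the algorithm of [8]: the name `load_cluster` is hypothetical, the cluster power is taken as given, the minimum codeword length is ignored, and the equal SNR constraint is met by channel inversion on the strongest sub-carriers.

```python
import numpy as np

# (info bits per symbol, required SNR in dB) for the MCS set of Section IV.
MCS_TABLE = [(1.0, 1.8), (2.0, 7.1), (3.0, 11.6)]

def load_cluster(gains, p_cluster, n0=1.0):
    """Pick one MCS and a set of sub-carriers for a single cluster.

    gains     : positive eigenmode gains of the cluster, one per sub-carrier
    p_cluster : power assigned to this cluster
    Returns (bits_per_ofdm_symbol, mcs_index, selected_subcarriers);
    mcs_index is None when no MCS can be supported.
    """
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]               # strongest sub-carriers first
    best = (0.0, None, np.array([], dtype=int))
    for j, (bits, snr_db) in enumerate(MCS_TABLE):
        # Power each sub-carrier needs to reach exactly the required SNR.
        need = 10 ** (snr_db / 10) * n0 / gains[order]
        # Largest number of strongest sub-carriers that fits in the budget.
        n_sel = int(np.searchsorted(np.cumsum(need), p_cluster, side="right"))
        if bits * n_sel > best[0]:
            best = (bits * n_sel, j, order[:n_sel])
    return best

bits, mcs, used = load_cluster([1.0, 1.0, 1.0, 1.0], p_cluster=21.0)
print(bits, mcs, sorted(used.tolist()))  # → 8.0 1 [0, 1, 2, 3]
```

With this structure, only the MCS index per cluster has to be signalled to the terminal in addition to the number of clusters, which is the reduction in signalling that Section III-C describes.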
The maximum number of beams per user is limited to mk ≤ 2 in all cases for fair comparison. The performance of time division multiple access (TDMA) with best user scheduling is also depicted for the reference. The best user scheduling means simply that a user with higher spectral efficiency is selected for scheduling among two users at each time instant. The performance of BD is shown to be superior to TDMA in all cases. The spectral efficiency of BD method with NT = 4 and with 10% FER target is saturated at high SNR to (1 − 0.1) × 4 × 3 = 10.8 bits/s/Hz. The right most curves with the square tick mark in Fig. 2 from the groups of curves with NRk = 2 and 4 correspond to the original fully loaded case {m1 , m2 } = {2, 2} with noniterative BD processing [2] and with LSO algorithm from Section III-C. The iterative search for optimum allocation (partial spatial loading) results in significant gains at low SNR region, especially for NRk = 2. The possible combinations for mk used in the simulations were {m1 , m2 } = [{2, 2}, {2, 1}, {1, 2}, {1, 1}]. The iterative BD decomposition gives significant gains when mk < NRk , as seen clearly from the curves with NRk = 4. For NRk = 2, all the curves overlap at high SNR as the maximum ζ is achieved from full spatial loading (mk = NRk ), and there is no gain from iterative loading algorithm nor iterative BD decomposition since NR = NT . Even a single reiteration round (labelled as 2 iters.) is sufficient to achieve most of the gains from the iterative BD decomposition. This is seen by comparing the curves with 2 iterations to the ones with 20 iterations, which was sufficient in nearly all of the channel realizations to reach the steady state situation. Fig. 3 compares the spectral efficiency provided by the proposed LSO algorithm (Section III-C) to the greedy algorithm with HH loading (Section III-B). Also, the single user performance (TDMA, best user scheduling) is plotted for
comparison. Results show that the performance loss relative to the greedy algorithm, less than 10% in spectral efficiency, is rather insignificant for small $K$ considering the far less complicated processing and signalling. In the four user case with LSO, the number of beams per user was limited to one ($m_k = 1$) in order to reduce the number of allocation combinations. The optimization was reduced to finding the optimum $r$, which was mostly 4 in this case, except in the very low SNR region. By allowing $m_k > 1$ for LSO, slightly better spectral efficiency could have been achieved at high SNR, but with increased complexity. The performance loss relative to the greedy algorithm increases as $K$ becomes large, since the scheduling gains in the sub-carrier domain cannot be fully utilized. If $K$ is doubled to 8, $\zeta$ of the greedy algorithm is improved by 10% while $\zeta$ of the LSO algorithm is only increased by 5% (not shown in Fig. 3). However, the practical implementation of the greedy algorithm as such is difficult for large $K$. In addition to the large amount of signalling required and the high complexity, the allocation table becomes fragmented and it is difficult to construct data frames such that the target (frame) error rate can be controlled and the frames are long enough to achieve some coding gain.

As the basic principle of the BD method is to orthogonalize the scheduled beams/users, it is interesting to see the impact of the partially lost orthogonality due to imperfect transmitter channel state information (CSI). Fig. 4 depicts the spectral efficiency with three different levels of channel estimation errors $\sigma_{\mathrm{est}}^2 = [-\infty, -20, -10]$ dB. It is seen from Fig. 4 that a high channel estimation error destroys the orthogonality between beams/users, and full spatial loading cannot be supported in the case where $N_R = N_T$, i.e., the saturated spectral efficiency cannot be reached even at very high SNR. A straightforward solution to overcome the problem is to
[Figures 4 and 5: spectral efficiency [bits/s/Hz] versus SNR [dB], 0 to 30 dB. Fig. 4 curves: TDMA (NRk=2) and BD (NRk=2), each with perfect CSI, σ²est = −20 dB and σ²est = −10 dB; additional curve groups marked BD (NRk=3) and BD (NRk=4). Fig. 5 curves: TDMA and BD, each with perfect CSI, σ²est = −20 dB and σ²est = −10 dB.]
Fig. 4. Performance of BD MU MIMO-OFDM system with channel estimation errors, {NT , NRk , K} = {4, 2-4, 2}
Fig. 5. TDMA versus BD performance comparison with channel estimation errors, {NT , NRk , K} = {4, 4, 2}
increase the number of receive antennas, or impose a limit $m_k < N_{R_k}$, to give more degrees of freedom for interference suppression at the receiver. Three receiver antennas are sufficient for reaching the saturated $\zeta$ for $\sigma_{\mathrm{est}}^2 = -10$ dB, although with a clear penalty relative to the full CSI case. The plot is not fully fair, however, since the channel estimation error does not scale along the SNR axis but is fixed for the whole SNR range. In reality, the channel estimation error could be smaller at high SNR and vice versa. When $N_{R_k} = N_T$, one clear advantage of MU-MIMO is lost, since the TDMA solution also has an equal number of spatial modes to utilize and can reach the saturated $\zeta$ at high SNR. In Fig. 5, a comparison between 2-user BD and TDMA with the same number of TX and RX antennas is carried out. It is seen that the BD method clearly outperforms TDMA with best user scheduling even with large channel estimation errors. With more users the advantage of BD would be even clearer.
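The loss of orthogonality discussed above can be reproduced in a small numerical experiment with the channel estimation error model of Section II. The sketch below is only an illustration under simplifying assumptions: the helper names are hypothetical, a single sub-carrier with two users, $N_T = 4$ and $N_{R_k} = 2$ is used, and the interference is measured as the power that a precoder confined to the null space of the estimated channel of a user still leaks into that user's true channel.

```python
import numpy as np

def crandn(rng, shape, var=1.0):
    """i.i.d. circularly symmetric complex Gaussian samples, CN(0, var)."""
    return np.sqrt(var / 2.0) * (rng.standard_normal(shape)
                                 + 1j * rng.standard_normal(shape))

def estimated_channel(H, sigma2_est, rng):
    """Error model of Section II: H_hat = (H + H_n) / sqrt(1 + sigma2_est)."""
    return (H + crandn(rng, H.shape, sigma2_est)) / np.sqrt(1.0 + sigma2_est)

def null_space(A, tol=1e-10):
    """Orthonormal basis (columns) for the null space of A."""
    _, s, vh = np.linalg.svd(A)
    rank = int(np.sum(s > tol * s.max()))
    return vh[rank:].conj().T

def mean_leakage(sigma2_est, n_trials=200, seed=1):
    """Average interference power leaking into a 2-antenna user when the other
    user's precoder is built from the estimated instead of the true channel."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_trials):
        H = crandn(rng, (2, 4))                       # true channel of the victim
        V0 = null_space(estimated_channel(H, sigma2_est, rng))
        total += np.linalg.norm(H @ V0) ** 2          # residual MAI power
    return total / n_trials

for sigma2_db in (-20.0, -10.0):
    print(sigma2_db, mean_leakage(10 ** (sigma2_db / 10)))
```

With perfect CSI the leakage is zero up to rounding, and it grows with the estimation error variance, which is the residual interference that the LMMSE filter and the power offset feedback have to absorb.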
V. CONCLUSION

An efficient user, beam, bit and power allocation algorithm was proposed to maximize the downlink spectral efficiency of the adaptive multiuser MIMO-OFDM system. The performance of the proposed allocation algorithm with low signalling overhead was shown to be close to that of the greedy allocation algorithm with Hughes-Hartogs loading. The performance loss was less than 10% for a small number of active users. Iterative block diagonalization decomposition with a single reiteration round was shown to provide significant gains over the non-iterative solution, especially if the number of beams assigned per user is less than the number of receive antennas. An LMMSE filter is applied at the receiver to suppress the remaining multi-access interference from incomplete orthogonalization, together with a simple compensation algorithm with low rate scalar feedback to the transmitter. The proposed MU MIMO-OFDM system was also shown to be robust against imperfect channel estimation at the transmitter, always providing performance superior to the single user TDMA solution.
ACKNOWLEDGMENT

This research was supported by the Finnish Funding Agency for Technology and Innovation (TEKES), Nokia, the Finnish Defence Forces, Elektrobit, Tauno Tönning Foundation, Nokia Foundation and Infotech Oulu Graduate School.

REFERENCES

[1] W. Yu and J. Cioffi, “Sum capacity of Gaussian vector broadcast channels,” IEEE Trans. Inform. Theory, vol. 50, no. 9, pp. 1875–1892, Sept. 2004.
[2] Q. H. Spencer, A. L. Swindlehurst, and M. Haardt, “Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels,” IEEE Trans. Signal Processing, vol. 52, no. 2, pp. 461–471, Feb. 2004.
[3] Z. Tu and R. Blum, “Multiuser diversity for a dirty paper approach,” IEEE Commun. Lett., vol. 7, no. 8, pp. 370–372, Aug. 2003.
[4] G. Primolevo, O. Simeone, and U. Spagnolini, “Channel aware scheduling for broadcast MIMO systems with orthogonal linear precoding and fairness constraints,” in Proc. IEEE Int. Conf. Commun., vol. 4, June 2005, pp. 2749–2753.
[5] A. Tölli and M. Juntti, “Scheduling for multiuser MIMO downlink with linear processing,” in Proc. IEEE Int. Symp. Pers., Indoor, Mobile Radio Commun., vol. 3, Berlin, Germany, Sept. 2005.
[6] B. Farhang-Boroujeny, Q. H. Spencer, and A. L. Swindlehurst, “Layering techniques for space-time communication in multi-user networks,” in Proc. IEEE Veh. Technol. Conf., vol. 2, Orlando, Florida, USA, Oct. 2003, pp. 1339–1343.
[7] D. Hughes-Hartogs, “Ensemble modem structure for imperfect transmission media,” U.S. Patent 4,833,706, May 1989.
[8] M. Codreanu, D. Tujkovic, and M. Latva-aho, “Adaptive MIMO-OFDM with low signalling overhead for unbalanced antenna systems,” in Proc. IEEE Int. Symp. Pers., Indoor, Mobile Radio Commun., vol. 4, Barcelona, Spain, Sept. 2004, pp. 2382–2386.
[9] A. Tölli and M. Codreanu, “Compensation of interference nonreciprocity in adaptive TDD MIMO-OFDM systems,” in Proc. IEEE Int. Symp. Pers., Indoor, Mobile Radio Commun., vol. 2, Barcelona, Spain, Sept. 2004, pp. 859–863.
[10] ETSI TS 101 683 V1.1.1, “Broadband radio access networks (BRAN); HIPERLAN type 2; system overview,” European Telecommunications Standards Institute (ETSI), Tech. Rep., 2000-02.