ARTICLE IN PRESS
Signal Processing 85 (2005) 1059–1072 www.elsevier.com/locate/sigpro
Multi-user pdf estimation based criteria for adaptive blind separation of discrete sources Charles Casimiro Cavalcantea,, Joa˜o Marcos T. Romanob a
GTEL/DETI/CT/UFC, C.P. 6005, Campus do Pici, CEP: 60.455-900, Fortaleza-CE, Brazil DSPCom/DECOM/FEEC/UNICAMP, C.P. 6101, CEP: 13.083-852, Campinas-SP, Brazil
b
Received 28 April 2004; received in revised form 27 September 2004
Abstract This paper deals with criteria for adaptive blind separation of discrete sources. The criteria are based on the estimation of the probability density function (pdf) of the recovered signal using a parametric model and the divergence of Kullback–Leibler to measure the similarities between the involved signals. Two strategies that guarantee the recovering of all sources are employed: the first one introduces a penalty when the sources are correlated and the second one constrains the filtering to an orthogonal global system response. Simulations are carried out to evaluate the performance of the criteria compared with existing blind methods in typical multi-user environments such as spatial and space-time processing. r 2005 Elsevier B.V. All rights reserved. Keywords: Blind source separation; Discrete sources; Pdf estimation; Kullback–Leibler divergence; Constrained filtering
1. Introduction Blind source separation (BSS) has been gaining increasing attention in the signal processing community due to its wide applicability in many fields such as digital communications, biomedical engineering and financial data analysis among others [14]. Since the milestone work by He´rault et al. in 1985 [13], much effort has been done in order to Corresponding author. Tel.:/fax: +55 85 3288 9470.
E-mail address:
[email protected] (C.C. Cavalcante).
construct suitable statistical criteria that reflect some known structural properties of the sources [19]. A common characteristic of many criteria is the use of higher order statistics (HOS), since second order statistics (SOS) are not sufficient to solve the source separation problem. The information-theoretic approach has been introduced by Donoho in [11], who has treated the BSS problem from an entropy minimization viewpoint. Another well-known method to solve BSS problems is the use of contrast functions introduced by Comon [8], is any non-linear function which is invariant to permutation and scaling
0165-1684/$ - see front matter r 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.sigpro.2004.11.023
ARTICLE IN PRESS 1060
C.C. Cavalcante, J.M.T. Romano / Signal Processing 85 (2005) 1059–1072
matrices, and attains its minimum value in correspondence of the mutual independence among the output components [19]. Clearly, the methods are based in some existing requirements for BSS such as linear and time-invariant mixture, as many sensors as sources and mutually independent sources. These works have provided important results on the issue of necessary and sufficient conditions to blindly provide source separation. Despite the development of techniques that rely directly on HOS, some single user techniques, such as constant modulus (CM) and Shalvi–Weinstein criteria, have been proposed for BSS in a singlestage and multi-stage context [16,20]. Multiple-input multiple-output (MIMO) systems have received a lot of attention during the last decade, with several multiple access systems emerging during that time. Through BSS techniques, several strategies have shown promising results in the context of multiuser detection. To cite a few references, [9,17,21] reveal the potential of equalization strategies to source separation. Papadias proposed in [19,20] a source separation approach that is based on the Shalvi–Weinstein criterion. The proposal is called multiuser kurtosis and consists of the kurtosis maximization, constrained to an orthogonal global response. It has been a great advance on the field of BSS because it has proved global convergence for an arbitrary number of users, which had not been done before. We have previously proposed a source separation criterion based on the estimation of the probability density function (pdf) of the ideally recovered signals [5]. The criterion uses the knowledge of the pdf of transmitted signals, as in [25], to construct a parametric model that retains the statistical properties of the source signals. Thus, the separation matrix is optimized in order to maximize the similarities between the pdf of the recovered signal and the parametric model. For this sake, we use the Kullback–Leibler divergence to measure these similarities. The criterion is then similar to the one presented in [22]; however we consider a much simpler stochastic approximation to derive an adaptive algorithm.
Our objective in this work is to propose multiuser methods based on the pdf estimation based criterion and use different strategies to ensure that all sources are recovered given some requirements (e.g., linear mixture, discrete sources). Two strategies are used for this task: (i) an auxiliary criterion that penalizes the estimated sources (outputs) that are correlated and (ii) a filtering that constrains the obtained global response to be orthogonal. Also, applications in the context of multi-user processing are presented in order to evaluate the performance of the proposals comparing them with existing blind criteria. The rest of the paper is organized as follows. The system model for memoryless and convolutive systems is described in details in Section 2. In Section 3, the pdf estimation based blind criterion is revisited in the context of SISO systems. In Section 4, the criteria for adaptive blind source separation are presented in detail, showing the different strategies for multi-user consideration. Simulation results that illustrate the performance of the proposals in some typical situations are presented in Section 5. Finally, our conclusions are stated in Section 6. 2. System model 2.1. Blind source separation: general assumptions In the model we made the following assumptions [26]: (AS1) K sources with independent and identically distributed (i.i.d.) and mutually independent zero mean discrete sequences ak ðnÞ; k ¼ 1; . . . ; K: (AS2) MIMO linear channel. (AS3) At least as many sensors as sources. (AS4) Noise is zero-mean, ergodic, stationary Gaussian sequence independent of ak ðnÞ: If we consider M sensors in the receiver we can represent, for instantaneous mixtures, the received signal at time instant n as (1)
xðnÞ ¼ HaðnÞ þ vðnÞ, K
where aðnÞ ¼ ½ a1 ðnÞ aK ðnÞ is the vector of sources, H is the M K channel matrix, vðnÞ is the
ARTICLE IN PRESS C.C. Cavalcante, J.M.T. Romano / Signal Processing 85 (2005) 1059–1072
1061
M 1 vector of additive noise and xðnÞ is the M 1 vector of received signals. The received signals are then processed by the MIMO equalizer given by the matrix
Through the use of a linear antenna array in the receiver, we can write the discrete time model for the received signal of the K users as [7]
WðnÞ ¼ ½ w1 ðnÞ
xðnÞ ¼
wK ðnÞ ,
(2)
K pffiffiffiffiffiffi X Pk ak ðnÞ fðyk Þ þ vðnÞ,
(4)
k¼1
which produces a K 1 vector yðnÞ ¼ ½ y1 ðnÞ yK ðnÞ K that consists of the estimate of the sources. The receiver output can be mathematically written as yðnÞ ¼ WH ðnÞxðnÞ ¼ WH ðnÞHaðnÞ þ v0 ðnÞ ¼ GðnÞaðnÞ þ v0 ðnÞ, where the superscript transposition and
H
ð3Þ
stands for Hermitian
GðnÞ ¼ WH ðnÞH ¼ ½ g1 gK 2 3 g11 gK1 6 7 .. 7 .. 6 . ¼ 6 .. . 7 . 4 5 g1K
gKK
KK
is the global response matrix and v0 ðnÞ ¼ WH ðnÞvðnÞ is the filtered noise at the receiver output. Fig. 1 depicts the above-described system. For the case of multi-user processing in practical applications such as wireless communications, some particular characteristics for the system are considered. This is the topic of next section. 2.2. Multi-user processing For multi-user processing we consider a space division multiple access (SDMA) scenario with several users transmitting simultaneously.
where Pk is the transmission power, ak is the transmitted sequence, vðnÞ is a vector of spatial additive white Gaussian noise (AWGN) samples. For simplification reasons we assume that the signal comes from a preferential direction, thus yk is the direction of arrival (DOA) of the kth user and fðyk Þ ¼ ½ f 1 ðyk Þ f M ðyk Þ T is the array response vector, where the array phase response at the mth antenna is modeled as
2d pðm 1Þ sinðyk Þ f m ðyk Þ ¼ exp , (5) l where d is the inter-element spacing in the array and l is the carrier wavelength. We assume in this paper that d ¼ l2 : The channel (mixture matrix) for all users is then denoted by the matrix F ¼ ½ fðy1 Þ
fðyK Þ .
(6)
In order to detect/separate multiple users, adaptive spatial filters are provided to each desired user as shown in Fig. 2. This corresponds to the separation matrix in the general BSS scheme. Then, the output for the kth user, provided by the kth beamformer (spatial filter), denoted by wk ; is given at time instant n by yk ðnÞ ¼ wH k ðnÞxðnÞ.
Fig. 1. General blind source separation scheme.
(7)
ARTICLE IN PRESS C.C. Cavalcante, J.M.T. Romano / Signal Processing 85 (2005) 1059–1072
1062
Fig. 2. Antenna array architecture for multi-user detection in memoryless systems.
Moreover, the vector of beamformers outputs is denoted by yðnÞ ¼ ½ y1 ðnÞ yK ðnÞ T : When intersymbol interference (ISI) is considered, we have for the received signal given by convolutive mixtures the following model: xðnÞ ¼
K LX k 1 pffiffiffiffiffiffi X Pk ak ðiÞ ak ðn iÞ
is the diagonal matrix with the complex gains (amplitude and phase rotations) of the kth user channel. In order to simplify notation, we can use some special vectors. Then, the mixture signal can be given by XðnÞ ¼
k¼1 i¼0
fðyk;i Þ þ vðnÞ,
K X
(12)
Hk Ak ðnÞ þ VðnÞ,
k¼1
ð8Þ where
where ak ðiÞ is the channel coefficient for ith path of the kth user that has a DOA yk;i and Lk is the number of multipaths for user k: In a matricial form, we can write the model for the mixtures [20]: pffiffiffiffiffiffi Hk ¼ Pk Fðyk Þ ak , (9)
where N is the number of coefficients in each space–time filter of the receiver;
where
Ak ðnÞ ¼ ½ ak ðnÞ
Fðyk Þ ¼ ½ fðyk;0 Þ fðyk;1 Þ
fðyk;Lk 1 Þ ML
xM ðnÞ T ,
ak ðn Lk N þ 2Þ TðL
ð13Þ
k þN 1Þ1
(14)
is a composite matrix with the steering vectors for the kth user and, ak ðLk 1Þ Þ
xðnÞ ¼ ½ x1 ðnÞ
xT ðn N þ 1Þ TMN1 ,
k
(10)
ak ¼ diagð½ ak ð0Þ ak ð1Þ
XðnÞ ¼ ½ xT ðnÞ
(11)
VðnÞ ¼ ½ vT ðnÞ and
vðnÞ ¼ ½ v1 ðnÞ
vT ðn N þ 1Þ TMN1
vM ðnÞ T .
ð15Þ
The matrix Hk is the channel convolution matrix of the space–time channel for the kth user
ARTICLE IN PRESS C.C. Cavalcante, J.M.T. Romano / Signal Processing 85 (2005) 1059–1072
1063
Fig. 3. Space–time multi-user receiver structure.
given by [20] Hk
2
Hk ðnÞ
3
0MðN 1Þ
6 6 0M1 6 6 60 ¼ 6 M1 6 6 .. 6 . 4
0M1
Hk ðn 1Þ
0MðN 2Þ
Hk ðn 2Þ
0MðN 3Þ
..
0MðN 1Þ
Hk ðn N þ 1Þ
.
where 0kl is a k l matrix of zeros. The receiver structure, which consists of a set of space–time filters is represented in Fig. 3. Then, we can write the following expressions for the estimated sources and the receiver structure: yk ðnÞ ¼ WH (17) k ðnÞXðnÞ, where Wk ¼ ½ wTk0 wTk _n ¼ ½ wk;1;n_
wkðN 1Þ TMN1 , wk;M;n_ T
M1
ð18Þ
_
and n is the time index of the temporal filter associated to each antenna. 3. PDF estimation based blind criterion: SISO systems 3.1. Review
7 7 7 7 7 7 7 7 7 5
,
ð16Þ
MNðNþLk 1Þ
Let wideal be an ideal zero-forcing linear equalizer, its output can be written as yðnÞ ¼ wH ideal xðnÞ,
(19)
where xðnÞ ¼ HaðnÞ þ vðnÞ,
(20)
H is the N ðN þ L 1Þ convolution matrix of the single-user channel with impulse response of length L and N is the length of the impulse response of the equalizer [2]. Then, using Eq. (20) in (19), it is possible to write: yðnÞ ¼ ðHaðnÞ þ vðnÞÞH wideal ¼ aH ðnÞHH wideal þ vH ðnÞwideal ¼ aH ðnÞ HH wideal þvH ðnÞwideal |fflfflfflfflffl{zfflfflfflfflffl} gideal
In [5] was proposed a blind criterion in the context of SISO systems (blind deconvolution). This method is revisited here.
H
¼ a ðnÞgideal þ WðnÞ ¼ aðn dÞ þ WðnÞ,
ð21Þ
ARTICLE IN PRESS C.C. Cavalcante, J.M.T. Romano / Signal Processing 85 (2005) 1059–1072
1064
where gideal is the ideal system response (a ‘‘Dirac’’-type vector), d is an arbitrary delay and WðnÞ is a random variable (r.v.) assumed Gaussian [2]. Eq. (21) states that the pdf of the signal at the output of the ideal equalizer is a mixture of Gaussians given by
(23) yields pðyÞ ¼ pðyÞ ln dy Fðy; s2r Þ
1 Z 1 pðyÞ ln½pðyÞdy ¼
1 Z 1
pðyÞ ln½Fðy; s2r Þdy. Z
DpðyÞjjFðy;s2r Þ
1
ð24Þ
1
S X 1 jy ai j2 pY ;ideal ðyÞ ¼ qffiffiffiffiffiffiffiffiffiffi exp
pðai Þ, 2s2W 2ps2W i¼1 (22) where the ai are the possible values (scalars) of aðn dÞ; assumed to belong to the transmitted alphabet A which has cardinality S. Since the pdf of the equalized signal is known, we desire to construct a criterion that forces the adaptive filter to produce signals with the same (or as similar as possible) pdf as the ideal one. This leads to the well-known measure of similarities between strictly positive functions (such as the pdfs), the Kullback–Leibler Divergence (KLD) [12]. In order to use the KLD to provide pdf estimation, a parametric model, which is function of the filter parameters, is constructed [2]. A natural choice is the same model of mixture of Gaussians like the one in Eq. (22) considering the estimate of the pdf of the sample yðnÞ: Then
jyðnÞ ai j2 exp
pðai Þ, 2s2r
J FPC ðwÞ ¼ Efln½Fðy; s2r Þg ( "
S X jyðnÞ ai j2 exp
¼ E ln A 2s2r i¼1 #) pðai Þ
.
ð25Þ
The Fitting pdf Criterion (FPC) criterion corresponds to minimizing J FPC ðwÞ: Furthermore, it is known that minimizing Eq. (25) corresponds to finding the entropy of y if Fðy; s2r Þ ¼ pY ðyÞ [1, p. 59]. A stochastic algorithm for filter adaptation is given by wðn þ 1Þ ¼ wðnÞ mrJ FPC ðwÞ Ps jyðnÞ ai j2 ðyðnÞ ai Þ i¼1 exp
2s2r rJ FPC ðwðnÞÞ ¼ x ; P S jyðnÞ ai j2 2 sr i¼1 exp 2s2
S X 1 Fðy; s2r Þ ¼ pffiffiffiffiffiffiffiffiffiffi 2ps2 |fflfflffl{zfflfflffl}r i¼1 A
This procedure aims to force pðyÞ to be ‘‘as close’’ as possible to the model given by Eq. (23) adapting the filter parameters. So, using Eq. (24) we can write the integrals as expectations over pðyÞ since such density has to be approximated to the structure given in Eq. (23). Minimizing (24) is equivalent to minimizing only the Fðy; s2r Þ-dependent term, and this is used to construct our cost function, that is
r
ð23Þ
(26)
is the chosen parametric model, where s2r is the variance of each Gaussian in the model. In the pattern classification field this kind of parametric functions, which are used to measure similarities against other functions, are called target functions [2]. Then, applying KLD to compare the pdf of the recovered signal (equalizer output) and
where m is the step size. The adaptive algorithm which uses the proposed criterion will be called fitting pdf algorithm (FPA). Eq. (26) shows an important property of the algorithm: it takes into account the phase of the transmitted symbols. The computational complexity of this algorithm is proportional to the computation of S exponentials which are required by Eq. (26). Thus, its
ARTICLE IN PRESS C.C. Cavalcante, J.M.T. Romano / Signal Processing 85 (2005) 1059–1072
complexity is a little higher than other blind LMSlike algorithms [10]. In the sequel we discuss some issues of the selection of the cost function and similarities with criteria previously reported in the literature.
1065
Quadratic distance (QD)—the cost function is defined using the quadratic distance between the pdfs to measure the similarities Z 1 JðwÞ ¼ ½pY 2 ðuÞ pA2 ðuÞ2 du, (28)
1
3.2. Discussion One important issue on the criterion design is the cost function selection. Indeed, as the KLD is a semi-distance, the order of the functions in the KLD impacts the resulting functional. If the inverse order on KLD were taken, we would obtain Z 1 Fðy; s2r Þ DFðy;s2r ÞkpðyÞ ¼ Fðy; s2r Þ ln dy pðyÞ
1 Z 1 ¼ Fðy; s2r Þ ln½Fðy; s2r Þdy
1 Z 1
Fðy; s2r Þ ln½pðyÞdy
1
¼ H½Fðy; s2r Þ E F fln½pðyÞg,
ð27Þ
where H½ stands for the entropy. In this case, if the selected cost function were the one represented in Eq. (27), the resultant algorithm from the minimization of DFðy;s2r ÞkpðyÞ would be computationally more complex due to the existence of two terms dependant on the parametric model. Clearly, the main motivation of retaining DpðyÞjjFðy;s2r Þ and not DFðy;s2r ÞkpðyÞ concerns the complexity of implementation. The analysis about possible differences on performance issues will be investigated in later works. On the other hand, other information-theoretic approaches have been employed in the context of equalization. We briefly describe the major common points of them in the sequel. Firstly, it should be noted that the FPC can be viewed as a sort of Parzen estimation [18,23], with the center of the Gaussian kernels chosen to be the elements of the transmitted alphabet A: Using Parzen estimation for the density of the recovered data, in [18] some variations of the consideration of pdf fitting are discussed. In summary, they use the following three approaches:
where Y 2 ¼ fjyðnÞj2 g and A2 ¼ fjai j2 g: Sampled pdf fitting—aims to fit the pdf only at a number N p of representative points JðwÞ ¼
Np 1 X ½p 2 ðui Þ T i 2 , N p i¼1 Y
(29)
where T i ¼ pA2 ðui Þ: Matched pdf—the similarities between the desired pdf and the estimate are measured according to Z 1 JðwÞ ¼ p2Y 2 ðuÞp2A2 ðuÞ du. (30)
1
The resultant algorithms for the three abovementioned criteria are somewhat similar to the FPA since the Parzen window is the Gaussian one. In [15], a clustering method is proposed also based on measure of similarities between density functions. The measure is a sort of Cauchy–Schwartz measure between pdfs given as " # R pðxÞqðxÞdx ffi , JðwÞ ¼ ln pRffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi (31) R p2 ðxÞdx q2 ðxÞdx where pðxÞ and qðxÞ are pdfs of the involved signals. In this case, a Gaussian (Parzen) kernel is employed to estimate the density of recovered data, providing an adaptation procedure similar to the one in [18]. The work with major similarities is reported in [22]. In this case, a statistical reference criterion is derived based on the use of KLD. It inserts a regeneration function to provide a recursive estimation of the data using the consideration that the noise is assumed Gaussian. Despite the very similar structure of that criterion and the FPC, the algorithms from both are quite different. Another difference is the existence of an additional degree of freedom (parameter s2r ) in the FPC. In [22], the authors claim that the choice of the variance of the Gaussians in the model does
ARTICLE IN PRESS 1066
C.C. Cavalcante, J.M.T. Romano / Signal Processing 85 (2005) 1059–1072
not affect the performance of their algorithm. Particularly, the choice of such a parameter has been shown to play a key role in the cost function since the variance of the Gaussians can be used to make the estimate more robust, as discussed in details in [3]. The FPC has been generalized to the multi-user processing case as described in the next section.
proposed in the context of a multi-user constant modulus algorithm (MU-CMA) in [21]. This strategy was called explicit decorrelation by [7], a nomenclature adopted in this paper. This approach results in the following multiuser criterion [4]: J MU-FPC ðwk Þ ¼ J FPC ðwk Þ þ g
K X K X i¼1
4. Multi-user pdf estimation based criteria In order to generalize the criterion to the multiple sources case, we have used the conditions for signal recovering, that are directly stated from the Shalvi–Weinstein (SW) criterion. These conditions can be written as [19]: (C1) ak ðnÞ is i.i.d. and zero mean ðk ¼ 1; . . . ; KÞ; (C2) ak ðnÞ and aq ðnÞ are statistically independent for kaq and have the same pdf, (C3) jk½yk ðnÞj ¼ jka j ðk ¼ 1; . . . ; KÞ; (C4) Efjyk ðnÞj2 g ¼ s2a ðk ¼ 1; . . . ; KÞ; (C5) Efyk ðnÞyq ðnÞg ¼ 0; kaq; where ak ðnÞ is the sequence transmitted by the kth source, ka is the kurtosis, s2a is the variance of the transmitted sequence, and k½ is the kurtosis operator. Conditions (C1)–(C4) are the direct interpretation of the SW theorem [24] for signal recovering and Condition (C5) guarantees achievement of all the sources involved. However, other criteria can be used to perform the signal recovering (or separation) than kurtosis maximization proposed in [24]. Since the pdf estimation approach aims, indirectly, for the equalization of all higher order statistics, FPC replaces the kurtosis maximization for interference removal. Furthermore, the strategy for making Condition (C5) hold defines multi-user criteria as described in the sequel. 4.1. Explicit decorrelation The first criterion uses the strategy of a penalization term that measures the cross-correlation of the beamformers outputs. It was initially
jrij j2 ,
(32)
j¼1 jai
where g is a regularization parameter and rij is the correlation term between the outputs of the ith and jth beamformers. The criterion described in Eq. (32) is called multi-user fitting pdf criterion (MU-FPC). The adaptation procedure of the MU-FPC for the kth user in a memoryless system (spatial processing) is then wk ðn þ 1Þ ¼ wk ðnÞ mrJ FPC ðwk Þ
g
K X i¼1 iak
b rik ðnÞpi ðnÞ,
ð33Þ
where b rik ðnÞ is the ði; kÞ element of the matrix Ry ðnÞ and pi ðnÞ is the ith column of matrix PðnÞ; which are computed as Ry ðn þ 1Þ ¼ oRy ðnÞ þ ð1 oÞyðnÞyH ðnÞ Pðn þ 1Þ ¼ oPðnÞ þ ð1 oÞxðnÞyH ðnÞ,
ð34Þ
where o is a forgetting factor. Eqs. (33) and (34) define the multi-user fitting pdf algorithm (MUFPA). For space–time multi-user processing it is also necessary to guarantee decorrelation of the signals in a time interval in order to remove the ISI. Thus, the criterion becomes [4] J MU-FPC ðWk Þ ¼ J FPC ðWk Þ D
þg
K X K X 2 X i¼1
j¼1 jai
jrij ð‘Þj2 ,
ð35Þ
‘¼ D2
where rij ð‘Þ ¼ Efyi ðnÞyj ðn ‘Þg is the cross-correlation between the ith and jth outputs from the space–time receivers with a time interval ‘; and D2 is the maximum delay for which the signals from the several users must be decorrelated.
ARTICLE IN PRESS C.C. Cavalcante, J.M.T. Romano / Signal Processing 85 (2005) 1059–1072
The MU-FPA, for the space–time processing, is then given by [3]: Wk ðn þ 1Þ ¼ Wk ðnÞ mrJ FPC ðWk Þ D
g
K X 2 X i¼1 iak
‘¼ D2
b pi;‘ ðnÞ, rik;‘ ðnÞb
ð36Þ
Ry;‘ ðn þ 1Þ ¼ oRy;‘ ðnÞ þ ð1 oÞyðnÞyH ðn ‘Þ, (37) P‘ ðn þ 1Þ ¼ oP‘ ðnÞ þ ð1 oÞXðnÞyH ðn ‘Þ, yðn ‘Þ ¼ ½ y1 ðn ‘Þ
yK ðn ‘Þ T ,
(38) (39)
D D ;...; , (40) 2 2 where b rik;‘ ðnÞ is the cross-correlation between the ith and jth users estimate with time delay ‘; given by the ði; jÞth element of matrix Ry;‘ ðnÞ; and b pi;‘ ðnÞ is the ith column of matrix P‘ ðnÞ: This criterion and its stochastic adaptive algorithm present good results but suffer from the strong trade-off between the number of different recovered sources (successful recovering) and steady state error w.r.t. the parameter g; due to the use of a penalization term in the multi-user criterion. An alternative to this strategy aiming to cope with the problem of losing users (unsuccessful recovering) and a good steady state error is presented in next subsection. ‘¼
4.2. Constrained criterion Using conditions (C1)–(C5) [19] proposed a multi-user criterion based on the maximization of the kurtosis of the recovered signals named multiuser kurtosis (MUK) maximization. The recovering of all sources is ensured by a constraint that forces global response to be an orthogonal matrix. The criterion divides the separation task into two parts: the equalization step, that maximizes the kurtosis, and the separation one, that performs the decorrelation of the outputs. For each task, we denote We for the equalization part, and W for the later one. The separation is performed by means of
1067
a Gram–Schmidt orthogonalization of matrix We [19]. However, this criterion requires a pre-whitening process on the received signals in order to assure that the variance will also be equalized. As we have mentioned before, the use of such ‘‘decoupled’’ processing makes possible the use of equalization techniques to obtain source separation, constraining the global system response to be orthogonal. The key point is that the considered equalization criterion has to respect the necessary and sufficient conditions for blind signal recovering. We have noticed that the FPC can be viewed as a general case of the Shalvi–Weinstein one. This is due to the fact that equalizing the pdfs of signals corresponds to equalizing all higher-order moments (cumulants) of the signals, and matching all cumulants comprises the kurtosis one, as stated by Shalvi–Weinstein and MUK criteria. Indeed, we benefit from the parametric form of pdf estimation provided by the FPC to cope with the high computational load required to match all cumulants between the input and output of the system. Since matching all cumulants implies matching the kurtosis, we can replace the pdf estimation procedure with the kurtosis maximization, in the MUK criterion, and the set of conditions is still valid as necessary and sufficient conditions for blind source separation. With this approach, we replace the processing with a penalization term that is performed by the MU-FPA, see Eq. 32, by a constrained procedure aiming at improving the performance. Finally, the criterion is given by 8 K P > < min J FPC ðWÞ ¼ DpY ðyÞkFðyj ;s2r Þ ; W
> :
j¼1
(41)
H
subject to: G G ¼ cI;
where c is an arbitrary constant and I is the identity matrix. Since, there is no pre-whitening performed, the unit variance of the received signals cannot be assured. Furthermore, when the pdfs are equalized all the statistics have to be equal. Then, it is not necessary an orthonormalization and a scaled and permuted version of the sources will be achieved as for BSS strategies [14].
ARTICLE IN PRESS C.C. Cavalcante, J.M.T. Romano / Signal Processing 85 (2005) 1059–1072
1068 Table 1 MU-CFPA
5.1. Spatial multi-user processing
1. 2. 3. 4. 5. 6.
An SDMA system with four users is used as the evaluation scenario of the algorithms. In the receiver, we use a linear antenna array with eight elements. The DOAs for the users are the following: 1 ; 52 ; 29 and 76 : In each direction, 2000 QPSK unit power symbols are transmitted over 100 Monte Carlo trials. In this environment, we have assumed that power control is used to assure the same received power to all users, and the signal-to-noise ratio (SNR) for each sensor equals 30 dB. In order to evaluate the performance of the proposed algorithm we use the constant modulus error (CME) defined for the kth user as follows:
Initialize Wð0Þ for n40 Obtain We ðn þ 1Þ from Eqs. (42) and (26) Obtain w1 ðn þ 1Þ ¼ we1 ðn þ 1Þ for j ¼ 2 : K Compute
wj ðn þ 1Þ ¼ wej ðn þ 1Þ
j 1 X e ðwH l ðn þ 1Þwj ðn þ 1ÞÞwl ðn þ 1Þ l¼1
7.Go to 5 8. Go to 2
Thus, the new adaptive algorithm is obtained by replacing the step of kurtosis maximization by the following computation: We ðn þ 1Þ ¼ WðnÞ mw rJ FPC ðWðnÞÞ,
(42)
where rJ FPC ðWðnÞÞ is given in Eq. (26) and performing just an orthogonalization in matrix We by means of the Gram–Schmidt procedure. The resulting algorithm is then called multiuser constrained FPA (MU-CFPA) and is summarized in Table 1. In terms of computational complexity, the MUCFPA is an LMS-like algorithm, although the orthogonalization procedure slightly increases the complexity of the FPA. However, the MU-CFPA can be applied only on memoryless systems. In the case of the space–time processing, the global response is a tensor and its orthogonalization is more complex than the matrix one.
5. Computational simulations This section demonstrates the performance of the proposed multi-user criteria, which is compared to blind separation criteria found in the literature. We have chosen three common situations in multi-user processing in order to evaluate the performance over different environments. They are described in the sequel.
CMEk ðnÞ ¼ ðjyk j2 RÞ2 .
(43)
The constant R is related to the power of the transmitted constellation. In our case we will assume a normalized power, i.e., R ¼ 1: Firstly, we have investigated the gain in terms of steady state error when we use two different strategies for obtaining all sources, namely the MU-FPA and MU-CFPA. For the environment previously described, we have simulated both algorithms and observed the temporal evolution of the CME. The simulation parameters for the algorithms are: m ¼ 10 3 ; g ¼ 10 3 ; s2r ¼ 0:1 chosen to achieve a CME as low as possible and initializations are performed as Wð0Þ ¼ We ð0Þ ¼ Ry ð0Þ ¼ Pð0Þ ¼ I: For the MU-FPA, the forgetting factor used in the correlation matrices estimation is set to o ¼ 0:96: Fig. 4 shows the comparison of the average CME for both algorithms. As one can see, the MU-CFPA outperforms the MU-FPA in terms of steady state error, where the difference of performance between the two algorithms is about 10 dB. This is due to the dropping of the decorrelation term in Eq. (32), which eliminates the trade-off between steady state error and number of lost users [7]. In the case of MUCFPA, this term is not taken into account, and the decorrelation procedure is done by means of the orthogonalization of the separation matrix, which improves the steady state performance. It is worth mentioning that g was chosen to ensure successful
ARTICLE IN PRESS C.C. Cavalcante, J.M.T. Romano / Signal Processing 85 (2005) 1059–1072 15 10 5
CME [dB]
0 −5 −10 −15
MU-FPA
−20 10 dB
−25 −30 MU-CFPA
−35
0
250
500
750
1000
Iterations Fig. 4. Performance evaluation of MU-FPA MU-CFPA:
1069
computed over 20 independent runs (Monte Carlo trials). Since we define convergence as the necessary number of iterations to reach the lower value of CME (or another metric), we observe that the MU-CFPA (around 300 iterations) outperforms the MU-CMA (around 2000 iterations) in terms of convergence rate and steady-state error due to the use of different decorrelation methods. The MUCMA step size was taken as the highest one that achieves convergence. This behavior of faster convergence is due to the considered number of higher order statistics in the criteria that allow a faster estimation of the signal pdf since the MUFPA uses much more HOS than the MU-CMA [3,6].
5.2. Instantaneous mixtures
15 10 5
CME [dB]
0 −5 MU-CMA
−10 −15 −20 MU-CFPA
−25 −30 −35 0
500
1000
1500
2000
2500
3000
3500
4000
Iterations Fig. 5. Temporal evolution of the average constant modulus error for multi-user processing K ¼ 4 and M ¼ 8:
recovering of all sources without replication of a specific user (or users). Thus, we use the MU-CFPA instead of the MUFPA for comparing the criterion with other known blind multi-user algorithms, namely the MUCMA [21]. For this sake, we use the same setup and the simulation parameters for MU-CFPA described previously. The simulation parameters are: mMU-CFPA ¼ 4 10 3 ; mMU-CMA ¼ 10 3 ; Wð0Þ ¼ Ry ð0Þ ¼ Pð0Þ ¼ I; o ¼ 0:96 and g ¼ 10 3 : Fig. 5 shows the temporal evolution of the average CME for MU-CFPA and MU-CMA
In this experiment, we have taken memoryless mixture matrices (instantaneous ones) for evaluating the behavior of BSS algorithms on this environment. We have considered K ¼ 4 users transmitting QPSK unit power signals and M ¼ 6 sensors in the receiver. The noise is inserted in each sensor with power given by SNR ¼ 15 dB: The mixture matrix is complex valued and drawn randomly from a Gaussian zero mean and unit variance distribution for each of 100 independent trials. Fig. 6 shows the temporal evolution of the average CME for the considered algorithms. We have also included the MUK algorithm, and principal component analysis (PCA) is performed for this algorithm since pre-whitening is required. Simulation parameters are: mMUK ¼ mMU-CMA ¼ 4 10 3 ; mMU-CFPA ¼ 10 2 ; o ¼ 0:98; g ¼ 10 3 ; s2r ¼ 0:2 and Wð0Þ ¼ We ð0Þ ¼ Ry ð0Þ ¼ Pð0Þ ¼ I: One can easily observe that the random behavior of the mixture makes the MU-CFPA achieve the final CME at the same order of the other methods, however, it outperforms the other blind criteria w.r.t. the convergence rate. Again, this behavior is due to the use of a different number of higher order statistics by the criteria. Considering all HOS provides the observed gain [3]. The MU-CFPA was used instead of the
ARTICLE IN PRESS C.C. Cavalcante, J.M.T. Romano / Signal Processing 85 (2005) 1059–1072
1070 0 −2 −4
MUK
CME [dB]
−6 −8
MU-CMA
−10
MU-CFPA
−12 −14 −16 −18
0
1000
2000
3000
4000
Iterations Fig. 6. Temporal evolution of the average constant modulus error for blind source separation of random channels K ¼ 4 and M ¼ 6:
Table 2 Space–time system setup User
Path 1
Path 2
Delay (T)
DOA (rad)
Delay (T)
DOA (rad)
#1
0.1
1.1
#2
0.4
2p 5 p 7
1.2
p 3 p
6
MU-FPA due to its better performance in this case of study.
A continuously trained LMS algorithm is included in this comparison as a reference for the blind algorithms. In this case, the transmitted sequences from all users are known at the receiver to update the filters making a data-aided processing. It should be noted that the MU-CFPA is not used because it does not have a known space–time version. For each algorithm, we have measured the normalized residual interference (RI), which is defined as "P " " jgk;i ðnÞj maxi jgk;i ðnÞj" i , (44) RIk ðnÞ ¼ maxi jgk;i ðnÞj where gk;i is the ith component of the vector gk ðnÞ ¼ WH k H and H is a composite channel matrix given by H ¼ ½ H1 H2 HK : This figure of merit indicates the amount of residual interference that exists in each case when compared to a zero-forcing optimal solution. Fig. 7 shows the temporal evolution of the average RI for the considered algorithms (MU-CMA and MU-FPA) under the scenario described above. The curve is an average over both users and 200 independent trials of the experiment. Simulation parameters are: mMU-FPA ¼ 10 3 ; mMU-CMA ¼ 5 10 3 ; mLMS ¼ 10 2 ; o ¼ 0:99; s2r ¼ 0:1 and Wk ð0Þ ¼ Ry;‘ ð0Þ ¼ P‘ ð0Þ ¼ I: It can be seen that the MU-FPA converges much faster than the MU-CMA and slightly slower than the LMS in the initial acquisition phase. Similar performance 10
5.3. Space– time multi-user processing 5
MU-CMA
RI [dB]
In this case, we consider K ¼ 2 users transmitting simultaneously unit power QPSK signals. We also consider Lk ¼ 2 for the users and their relative delays (in terms of the symbol interval T), and DOAs are shown in Table 2. Additive noise is inserted in each antenna that composes the receiver with an SNR ¼ 30 dB: Furthermore, the pulse shape is a raised cosine with roll-off 0.35. At the receiver, each sensor has temporal filters with N ¼ 2 coefficients. We also consider D ¼ 2 delayed samples for temporal decorrelation of the signals. Also, we assume that both users are recovered at a decision delay ‘ ¼ 0:
0
MU-FPA 5
−10
LMS −15 0
500
1000
1500
2000
2500
3000
Iterations Fig. 7. Averaged residual interference for space–time multiuser processing.
ARTICLE IN PRESS C.C. Cavalcante, J.M.T. Romano / Signal Processing 85 (2005) 1059–1072
superiority of MU-FPA was verified over a large range of multiuser scenarios where different number of sources, relative powers, relative delays and angle displacements were considered.
1071
Acknowledgements The authors would like to deeply thank the anonymous reviewers for their comments and suggestions that contributed a lot to the manuscript. They also thank Dr. Renato R. Lopes by his careful proofreading and comments.
6. Conclusions and perspectives In this paper we have presented a two blind criteria based on the estimation of the probability density function of the recovered signals for adaptive blind source separation. The criterion of interference removal is based on the maximization of the similarity between the recovered signal probability density function and a parametric model that fits the system order and transmitted signal statistical characteristics. The Kullback–Leibler divergence is then used to derive the algorithm. Two different strategies that guarantee the recovering of the different sources involved in the process are used to construct multi-user approaches. In the first, a penalization term is inserted to force the recovered sources to be uncorrelated with each other. The latter employs a procedure that constrains the global system matrix being orthogonal so that the different fonts can be recovered. By evaluating the proposals in typical environments of multi-user communications, we have observed a better convergence rate and steady state error from our approaches compared to other existing blind criteria for the same purpose. A direct extension of this work is to derive a space–time version of the constrained algorithm using tensors to model the system and signals involved. Furthermore, an analytical analysis of the influence of the number of considered higher order statistics on the criteria is under development. Another direction is the analytical and computational comparison of different measures than KLD one to similarities between functions and different directions to KLD, as mentioned in Section 3. Afterwards, the inclusion of nonparametric estimation using kernels is envisaged to extend the use of the proposed methods for problems when the density of the sources is not available, such as in biomedical processing.
References [1] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, UK, 1995. [2] C.C. Cavalcante, Neural prediction and probability density function estimation applied to blind equalization, Master’s Thesis, Federal University of Ceara´ (UFC), Fortaleza, CE - Brazil, February 2001. [3] C.C. Cavalcante, On blind source separation: proposals and analysis of multi-user processing strategies, Ph.D. Thesis, State University of Campinas (UNICAMP) – DECOM, Campinas, SP - Brazil, April 2004. [4] C.C. Cavalcante, F.R.P. Cavalcanti, J.C.M. Mota, Adaptive blind multiuser separation criterion based on loglikelihood maximisation, IEE Electron. Lett. 38 (20) (2002) 1231–1233. [5] C.C. Cavalcante, F.R.P. Cavalcanti, J.C.M. Mota, A PDF estimation-based blind criterion for adaptive equalization, in: Proceedings of the IEEE International Symposium on Telecommunications (ITS 2002), Natal, Brazil, 2002. [6] C.C. Cavalcante, J.C.M. Mota, J.M.T. Romano, Polynomial Expansion of the Probability Density Function About Gaussian Mixtures, in: Proceedings of the IEEE Workshop on Machine Learning for Signal Processing (MLSP 2004), Sa˜o Luı´ s, Brazil, 2004. [7] F.R.P. Cavalcanti, J.M.T. Romano, Blind multiuser detection in space division multiple access systems, Ann. des Te´le´commun. (7–8) (1999) 411–419. [8] P. Comon, Independent component analysis: a new concept?, Signal Process. 36 (3) (1994) 287–314. [9] N. Delfosse, P. Loubaton, Adaptive blind separation of independent sources: a deflation approach, Signal Process. 45 (1995) 59–83. [10] Z. Ding, Y.G. Li, Blind Equalization and Identification, Marcel Dekker, New York, USA, 2001. [11] D. Donoho, On Minimum Entropy Deconvolution, Academic Press, New York, 1981, pp. 565–608. [12] S. Haykin (Ed.), Unsupervised adaptive filtering, Source Separation, vol. I, Wiley, New York, 2000. [13] J. He´rault, C. Jutten, B. Ans, De´tection de grandeurs primitives dans un message composite par une architecture de calcul neuromime´tique en apprentissage non supervise´, in: Actes du Xe´me Colloque GRETSI, Nice, France, 1985, pp. 1017–1022. [14] A. Hyva¨rinen, E. Oja, J. Karhunen, Independent Component Analysis, Wiley, New York, 2001.
ARTICLE IN PRESS 1072
C.C. Cavalcante, J.M.T. Romano / Signal Processing 85 (2005) 1059–1072
[15] R. Jenssen, J.C. Prı´ ncipe, T. Eltoft, Information cut and information forces for clustering, in: Proceedings of the IEEE Workshop on Neural Networks for Signal Processing (NNSP2003), Toulose, France, 2003, pp. 459–467. [16] J.-L. Lacoume, P.-O. Amblard, P. Comon, Statistiques d’Ordre Supe´rieur pour le Traitement du Signal, (Traitement du Signal), Masson, Paris, 1997. [17] S. Lambotharan, J.A. Chambers, A.G. Constatinides, Adaptive blind retrieval techniques for multiuser DSCDMA signals, IEE Electron. Lett. 35 (9) (1999) 693–695. [18] M. La´zaro, I. Santamarı´ a, D. Erdogmus, K.E. Hild, II, C. Pantaleo´n, J. C. Prı´ ncipe, Stochastic blind equalization methods based on PDF fitting using parzen estimator, IEEE Trans. Signal Process., to appear. [19] C.B. Papadias, Globally convergente blind source separation based on a multiuser kurtosis maximization criterion, IEEE Trans. Signal Process. 48 (12) (2000) 3508–3519. [20] C.B. Papadias, Blind Separation of Independent Sources Based on Multiuser Kurtosis Optimization Criteria, vol. 2, Wiley, New York, 2000, pp. 147–179 (Chapter 4). [21] C.B. Papadias, A.J. Paulraj, A constant modulus algorithm for multiuser signal separation in presence of delay
[22]
[23]
[24]
[25]
[26]
spread using antenna array, IEEE Signal Process. Lett. 4 (6) (1997) 178–181. J. Sala-Alvarez, G. Va´zquez-Grau, Statistical reference criteria for adaptive signal processing in digital communications, IEEE Trans. Signal Process. 45 (1) (1997) 14–31. I. Santamarı´ a, C. Pantaleo´n, L. Vielva, J.C. Prı´ ncipe, Adaptive Blind Equalization Through Quadratic PDF Matching, in: Proceedings of the XI European Signal Processing Conference (EUSIPCO 2002), vol. II, Toulouse, France, 2002, pp. 289–292. O. Shalvi, E. Weinstein, New criteria for blind deconvolution of nonminimum phase systems (Channels), IEEE Trans. on Information Theory 36 (2) (1990) 312–321. K. Torkkola, Blind signal separation in communications: making use of known signal distributions, in: Proceedings of the 1998 IEEE DSP Workshop, Bryce Canion, UT, USA, 1998. J.K. Tugnait, Adaptive blind separation of convolutive mixtures of independent linear signals, Signal Process. 73 (1999) 139–152.