ON PROPERTIES OF THE WIDELY LINEAR MSE FILTER AND ITS LMS IMPLEMENTATION

Tülay Adalı, Hualiang Li, and Ronald Aloysius

Department of Computer Science and Electrical Engineering
University of Maryland Baltimore County
Baltimore, Maryland 21250

ABSTRACT

Widely linear filters have been receiving much attention lately and have been proposed for many signal processing applications where the traditional circularity assumptions on the complex data do not hold. In this paper, we study the properties of the mean-square-error (MSE) widely linear filter and its least mean squares (LMS) adaptive implementation. We show that in certain cases, widely linear filter does not provide any additional advantage compared to the linear filter even with highly noncircular data. On the other hand, we show examples of cases where it can lead to important performance gains even when the input is circular. We also show that its performance can slow down significantly with highly noncircular inputs when it is implemented using an LMS type gradient descent algorithm thus making recursive least squares type adaptive implementations more desirable for an adaptive widely linear filter.

Keywords: Complex-valued signal processing, widely linear, adaptive filtering, LMS, convergence.

1. INTRODUCTION

Complex-valued signals arise frequently in applications as diverse as communications, radar, and biomedicine. A fundamental result in the processing of complex signals states that one has to take into account both the covariance and the pseudo-covariance functions in order to completely characterize their second-order statistics. Only when the process is circular is the covariance function sufficient, since the pseudo-covariance is zero in this case. Most signal processing algorithms developed for the complex domain have assumed circularity of the signals either explicitly or implicitly, by discarding the pseudo-covariance information, which is an oversimplification of the problem and is likely to lead to suboptimal solutions. One of the reasons for the prevalence of the circularity assumption in the development of many signal processing algorithms has been the inherent assumption of stationarity. Since the complex envelope of a stationary signal is second-order circular [1], circularity is directly implied in this case. However, many signals are not stationary, and a good number of complex-valued signals, such as functional magnetic resonance and wind data as shown in [2], do not necessarily have circular distributions.

Widely linear filters [3] augment the data vector with its conjugate, and thus provide the complete second-order statistical information when computed using the minimum mean-square-error cost function. Given the rich structure of the data types in today's applications, the limiting nature of the circularity assumption has been more clearly emphasized, leading to an increased interest in widely linear filters. In a recent literature search, we counted 82 publications in IEEE Xplore that propose widely linear solutions following their introduction in 1995 [3]. Only three of these publications appeared before 2000 (all three by the authors of [3]), and the vast majority of the 82 papers appeared within the last three years. Widely linear filters have been proposed for applications such as interference cancelation, demodulation, and equalization for direct-sequence code-division-multiple-access systems, and array receivers (see e.g. [4-7]). Being the most popular adaptive filtering solution, the stochastic gradient descent least mean squares (LMS) algorithm [8] has been implemented for the widely linear filter, and has been the most commonly used approach for adaptively estimating the weights of a widely linear MSE filter. See e.g. [5, 9, 10], and the recently re-derived versions that are re-named the augmented LMS algorithm [11, 12].

In this paper, we derive the widely linear filter using an augmented notation that allows easy extension of most results for the performance of the linear LMS filter to the widely linear case. We discuss the tradeoffs involved in the selection of the two filter types, widely linear versus linear, and particularly note that even though the widely linear filter might lead to lower MSE in certain cases, this advantage comes at the expense of a decreasing convergence rate with increasing noncircularity of the input signal in its LMS implementation. We also note cases where the performance of the widely linear and the linear filter are the same, i.e., cases where using a widely linear filter that doubles the filter length does not provide any additional advantages. Finally, we show that the widely linear filter can provide performance advantages even when the input is circular. Given the recent interest in the use of widely linear filters, these results thus help define the problems of particular interest for the use of these filters and the issues one needs to consider in their use.

THIS WORK WAS SUPPORTED BY THE NSF GRANTS NSF-CCF 0635129 AND NSF-IIS 0612076.

978-1-4244-2734-5/09/$25.00 ©2009 IEEE

Authorized licensed use limited to: University of Maryland Baltimore Cty. Downloaded on June 22, 2009 at 11:42 from IEEE Xplore. Restrictions apply.

2. COMPLEX PRELIMINARIES

A complex-valued random variable $X = X_r + jX_i$ is defined through the joint probability density function (pdf) $f_X(x) \triangleq f_{X_r X_i}(x_r, x_i)$, provided that it exists. To simplify the expressions, in this paper we assume that the input $x(n)$ is zero mean without loss of generality. The second-order statistics of a zero-mean complex random vector $\mathbf{X}$ are completely defined through two matrices: the commonly used covariance matrix $\mathbf{C} = E\{\mathbf{X}\mathbf{X}^H\}$ and, in addition, the pseudo-covariance matrix [13], also called the complementary covariance [14] or the relation matrix [15], given by $\mathbf{P} = E\{\mathbf{X}\mathbf{X}^T\}$.

The covariance matrix is a Hermitian and nonnegative definite matrix, whereas the pseudo-covariance matrix is a complex symmetric matrix. Hence, while the nonnegative eigenvalues of the covariance matrix can be identified using a simple eigenvalue decomposition, for the pseudo-covariance matrix we have to use Takagi's factorization [16] to obtain the spectral representation such that

$$\mathbf{P} = \mathbf{Q}\mathbf{D}\mathbf{Q}^T \tag{1}$$

where $\mathbf{Q}$ is a unitary matrix and $\mathbf{D} = \mathrm{diag}\{k_1, k_2, \ldots, k_N\}$ contains the singular values, $1 \geq k_1 \geq k_2 \geq \cdots \geq k_N \geq 0$, on its diagonal. The values $k_n$ are canonical correlations of a given vector and its complex conjugate [17] and are called the circularity coefficients [18]. Since these values are all zero for a second-order circular random vector, which we define next, they should rather be called the noncircularity coefficients.

An important property of complex-valued random variables is related to their circular nature. A zero-mean complex random vector is called second-order circular [1] (or proper [13, 14]) when its pseudo-covariance matrix $\mathbf{P} = \mathbf{0}$, which implies that $E\{\mathbf{X}_r\mathbf{X}_r^T\} = E\{\mathbf{X}_i\mathbf{X}_i^T\}$ and $E\{\mathbf{X}_r\mathbf{X}_i^T\} = -E\{\mathbf{X}_i\mathbf{X}_r^T\}$. A stronger condition for circularity is based on the pdf of the random variable: the random variable is called circular in the strict sense, or simply circular, if $X$ and $Xe^{j\theta}$ have the same pdf, i.e., the pdf is rotation invariant [1].

Circularity is a strong property, preserved under linear transformations, and since it implies non-informative phase, a real-valued approach and a complex-valued approach for this case are usually equivalent [19].

3. WIDELY LINEAR MSE FILTER

A widely linear filter forms the estimate of a desired sequence $d(n)$ through the inner product

$$y(n) = \mathbf{w}^H\bar{\mathbf{x}}(n) \tag{2}$$

where the weight vector $\mathbf{w} = [w_0\ w_1\ \cdots\ w_{2N-1}]^T$ has double the dimension of the linear filter and

$$\bar{\mathbf{x}}(n) = \begin{bmatrix}\mathbf{x}(n)\\ \mathbf{x}^*(n)\end{bmatrix}$$

with the definition $\mathbf{x}(n) = [x(n)\ x(n-1)\ \cdots\ x(n-N+1)]^T$. The MSE cost in this case is written as

$$J_{WL}(\mathbf{w}) = E\{|d(n) - \mathbf{w}^H\bar{\mathbf{x}}(n)|^2\}$$

while the traditional linear MSE cost is given by $J_L(\mathbf{w}_L) = E\{|d(n) - \mathbf{w}_L^H\mathbf{x}(n)|^2\}$ with $\mathbf{w}_L \in \mathbb{C}^N$. By using Wirtinger calculus [2, 20], we can directly take the derivative of the MSE with respect to $\mathbf{w}^*$ (by treating the variable $\mathbf{w}$ as a constant) such that

$$\frac{\partial E\{e(n)e^*(n)\}}{\partial\mathbf{w}^*} = -E\{\bar{\mathbf{x}}(n)[d^*(n) - \mathbf{w}^T\bar{\mathbf{x}}^*(n)]\} \tag{3}$$

and obtain the widely linear version of the Wiener-Hopf equation

$$E\{\bar{\mathbf{x}}(n)\bar{\mathbf{x}}^H(n)\}\,\mathbf{w}_{opt} = E\{d^*(n)\bar{\mathbf{x}}(n)\}$$

by setting (3) to zero. For simplicity, we assume that the input is zero mean so that the covariance and correlation functions coincide. Thus the optimal widely linear weight vector is given by $\mathbf{w}_{opt} = \bar{\mathbf{C}}^{-1}\bar{\mathbf{p}}$, where

$$\bar{\mathbf{C}} = \begin{bmatrix}\mathbf{C} & \mathbf{P}\\ \mathbf{P}^* & \mathbf{C}^*\end{bmatrix} \quad\text{and}\quad \bar{\mathbf{p}} = E\{d^*(n)\bar{\mathbf{x}}(n)\} = \begin{bmatrix}\mathbf{p}\\ \mathbf{q}^*\end{bmatrix}$$

with the definition of the two cross-covariance vectors $\mathbf{p} = E\{d^*(n)\mathbf{x}(n)\}$ and $\mathbf{q} = E\{d(n)\mathbf{x}(n)\}$. Matrix $\bar{\mathbf{C}}$ provides the complete second-order statistical information for a zero-mean complex random process and is called the augmented covariance matrix. Note that the traditional MSE solution is simply given by $\mathbf{w}_{opt} = \mathbf{C}^{-1}\mathbf{p}$, thus completely ignoring the information given by the pseudo-covariance matrix $\mathbf{P}$ and the pseudo cross-correlation vector $\mathbf{q}$.
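As a rough illustration of the two solutions above, the following sketch specializes them to a single tap (N = 1), where the augmented covariance matrix is 2 x 2 and can be inverted in closed form. It is only a sketch under stated assumptions: expectations are replaced by sample averages, the function names (`second_order_stats`, `widely_linear_wiener`, and so on) are hypothetical, and the test signal d(n) = Re{h* x(n)} with h = 0.8 - 0.6j and the noncircular Gaussian input are choices made here for illustration, not taken from the paper's experiments.

```python
import math
import random

def second_order_stats(x, d):
    """Sample estimates of c = E|x|^2, P = E{x^2}, p = E{d* x}, q = E{d x}."""
    n = len(x)
    c = sum(abs(v) ** 2 for v in x) / n
    P = sum(v * v for v in x) / n
    p = sum(dv.conjugate() * xv for xv, dv in zip(x, d)) / n
    q = sum(dv * xv for xv, dv in zip(x, d)) / n
    return c, P, p, q

def linear_wiener(c, p):
    """Linear solution w = C^{-1} p for N = 1."""
    return p / c

def widely_linear_wiener(c, P, p, q):
    """Solve [[c, P], [P*, c]] w = [p, q*] in closed form (N = 1)."""
    det = c * c - abs(P) ** 2  # assumed nonzero (input not rectilinear)
    w0 = (c * p - P * q.conjugate()) / det
    w1 = (c * q.conjugate() - P.conjugate() * p) / det
    return w0, w1

def mse(x, d, w0, w1=0j):
    """Average |e|^2 with e = d - (w0* x + w1* x*), i.e. e = d - w^H xbar."""
    total = 0.0
    for xv, dv in zip(x, d):
        y = w0.conjugate() * xv + w1.conjugate() * xv.conjugate()
        total += abs(dv - y) ** 2
    return total / len(x)

# Illustrative data (assumption): noncircular Gaussian input, real-part target.
rng = random.Random(0)
rho = 0.3
x = [complex(math.sqrt(1 - rho ** 2) * rng.gauss(0, 1), rho * rng.gauss(0, 1))
     for _ in range(5000)]
h = 0.8 - 0.6j
d = [complex((h.conjugate() * v).real, 0.0) for v in x]

c, P, p, q = second_order_stats(x, d)
w_lin = linear_wiener(c, p)
w0, w1 = widely_linear_wiener(c, P, p, q)
mse_lin = mse(x, d, w_lin)
mse_wl = mse(x, d, w0, w1)
print("linear MSE:", mse_lin, "widely linear MSE:", mse_wl)
```

Because Re{h* x} = (h* x + h x*)/2 lies exactly in the span of x and x*, the widely linear weights in this toy case come out as h/2 and h*/2 with essentially zero error, while the single linear weight leaves a residual.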

The difference in the minimum MSE value for the two linear models can be calculated as [3]:

$$J_{diff} = J_L(\mathbf{w}_{opt}) - J_{WL}(\mathbf{w}_{opt}) = \bar{\mathbf{p}}^H\bar{\mathbf{C}}^{-1}\bar{\mathbf{p}} - \mathbf{p}^H\mathbf{C}^{-1}\mathbf{p} = (\mathbf{q}^* - \mathbf{P}^*\mathbf{C}^{-1}\mathbf{p})^H(\mathbf{C}^* - \mathbf{P}^*\mathbf{C}^{-1}\mathbf{P})^{-1}(\mathbf{q}^* - \mathbf{P}^*\mathbf{C}^{-1}\mathbf{p}). \tag{4}$$

Since the covariance matrix $\mathbf{C}$ is assumed to be nonsingular and thus is positive definite, the error difference $J_{diff}$ is always nonnegative. In [21], it is shown that the performance gain offered by the widely linear filter can be as much as twice that of a linear filter. When the joint-circularity condition is satisfied, i.e., when $\mathbf{P} = \mathbf{0}$ and $\mathbf{q} = \mathbf{0}$, the performance of the two filters, the linear and the widely linear filter, coincide, and there is no gain in using a widely linear filter. However, it is important to note that the performance of the two filters can be the same for cases where the input is noncircular as well. Let the desired signal be generated from a purely linear model such that

$$d(n) = \mathbf{w}_{opt}^H\mathbf{x}(n) + v(n) \tag{5}$$

where $\mathbf{w}_{opt} \in \mathbb{C}^N$ and $v(n)$ is additive white Gaussian noise strongly uncorrelated with the input, i.e., $E\{\mathbf{x}(n-k)v^*(n)\} = \mathbf{0}$ and $E\{\mathbf{x}(n-k)v(n)\} = \mathbf{0}$ for all $k$. Also let the input be noncircular, which implies that $\mathbf{P} \neq \mathbf{0}$ and $\mathbf{q} \neq \mathbf{0}$. We can estimate the optimal coefficients using either the traditional Wiener-Hopf equations $\mathbf{w}_{opt} = \mathbf{C}^{-1}\mathbf{p}$ or what we call the complementary Wiener-Hopf equations $\mathbf{w}_{opt}^* = \mathbf{P}^{-1}\mathbf{q}$. It is then easy to observe that for this case we have $\mathbf{q}^* - \mathbf{P}^*\mathbf{C}^{-1}\mathbf{p} = \mathbf{0}$, and hence $J_{diff} = 0$ in (4). This result includes a number of important cases such as linear system identification and linear prediction where the input is written as an autoregressive process $x(n) = -\mathbf{a}^H\mathbf{x}(n) + v(n)$ where the driving noise process $v(n)$ is doubly white, i.e., uncorrelated with both $v(n-k)$ and $v^*(n-k)$ for all $k$. In the literature, we see that examples are presented for this case with widely linear filters [11]. It is important to note that a process can be doubly white but still noncircular, which is a case that will not provide any advantages for the widely linear filter as we note here. The properties of the widely linear MSE filter for the autoregressive case are discussed in detail in [22] for a number of cases.

Hence there are cases where the performance of the linear and the widely linear filters are equal even when the circularity conditions for the data do not hold. In contrast, in certain situations, the widely linear filters provide additional advantages even for circular data, as we show with an example in the next section. Thus, the advantage offered by the widely linear filter is not limited to noncircular cases, and when assessing the performance of a widely linear filter one has to be careful about additional assumptions such as having a doubly white process.

4. WIDELY LINEAR LMS ALGORITHM

The weight vector $\mathbf{w}$ can be computed adaptively using gradient descent updates

$$\mathbf{w}(n+1) = \mathbf{w}(n) - \mu\frac{\partial J_{WL}(\mathbf{w})}{\partial\mathbf{w}^*(n)} = \mathbf{w}(n) + \mu E\{e^*(n)\bar{\mathbf{x}}(n)\}$$

or using stochastic gradient updates as in

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu e^*(n)\bar{\mathbf{x}}(n)$$

where $\mu$ is the step size and $e(n) = d(n) - \mathbf{w}^H(n)\bar{\mathbf{x}}(n)$, which leads to the widely linear version of the popular least-mean-square (LMS) algorithm [23]. The convergence of the LMS algorithm depends on the eigenvalues of the input covariance matrix, which in the case of a widely linear LMS filter are replaced by the eigenvalues of the augmented covariance matrix. A main result in this context can be described through the natural modes of the LMS algorithm [24] as follows. Define $\boldsymbol{\epsilon}(n)$ as the weight vector error difference $\boldsymbol{\epsilon}(n) = \mathbf{w}(n) - \mathbf{w}_{opt}$ and let the desired response be written as

$$d(n) = \mathbf{w}_{opt}^H\bar{\mathbf{x}}(n) + v(n)$$

where, different from the example in (5), here $\mathbf{w}_{opt} \in \mathbb{C}^{2N}$ since we have used the augmented input vector in the characterization. When the noise term $v(n)$ is strongly uncorrelated with the input, we have

$$E\{\boldsymbol{\epsilon}(n+1)\} = (\mathbf{I} - \mu\bar{\mathbf{C}})E\{\boldsymbol{\epsilon}(n)\}.$$

We introduce the rotated version of the weight vector error difference $\boldsymbol{\epsilon}'(n) = \mathbf{Q}^H\boldsymbol{\epsilon}(n)$ where $\mathbf{Q}$ is the unitary matrix composed of the eigenvectors associated with the eigenvalues of $\bar{\mathbf{C}}$, i.e., we factor the augmented covariance matrix using the unitary similarity transformation $\bar{\mathbf{C}} = \mathbf{Q}\bar{\boldsymbol{\Lambda}}\mathbf{Q}^H$ and write the mean value of the natural mode $\epsilon'_k(n)$, the $k$th element of vector $\boldsymbol{\epsilon}'(n)$, as

$$E\{\epsilon'_k(n)\} = \epsilon'_k(0)(1 - \mu\bar{\lambda}_k)^n \tag{6}$$

where $\bar{\lambda}_k$ is the $k$th eigenvalue of $\bar{\mathbf{C}}$. Thus for the convergence of LMS updates to the true solution in the mean, the step size has to be chosen such that $0 < \mu < 2/\bar{\lambda}_{max}$ where $\bar{\lambda}_{max}$ is the maximum eigenvalue of the augmented covariance matrix $\bar{\mathbf{C}}$. The bound for the maximum step size has been noted in [9] where the widely linear MSE filter's performance is studied for multiple interference suppression for direct-sequence code-division-multiple-access systems. However, the distribution of the eigenvalues and the values of all eigenvalues play an important role in the convergence behavior of the LMS algorithm. As observed in (6), small eigenvalues significantly slow down the convergence in the mean, and hence it is especially important to study the behavior of the eigenvalues of the augmented covariance matrix as the input noncircularity changes.

We can use the strongly uncorrelating transform [25] to jointly diagonalize $\mathbf{C}$ and $\mathbf{P}$ such that $\mathbf{C} = \mathbf{Q}\mathbf{Q}^H$ and $\mathbf{P} = \mathbf{Q}\mathbf{D}\mathbf{Q}^T$ where $\mathbf{Q}$ is a nonsingular complex matrix and $\mathbf{D}$ is a real diagonal matrix defined in (1). We can then write the augmented covariance matrix as

$$\bar{\mathbf{C}} = \begin{bmatrix}\mathbf{Q} & \mathbf{0}\\ \mathbf{0} & \mathbf{Q}^*\end{bmatrix}\begin{bmatrix}\mathbf{I} & \mathbf{D}\\ \mathbf{D} & \mathbf{I}\end{bmatrix}\begin{bmatrix}\mathbf{Q}^H & \mathbf{0}\\ \mathbf{0} & \mathbf{Q}^T\end{bmatrix}$$

or can further factor $\bar{\mathbf{C}}$ as in [17]. However, such factorizations or tools such as majorization used in [17] do not allow us to obtain a direct relationship for the eigenvalues of $\bar{\mathbf{C}}$ and $\mathbf{C}$. We can, however, note the relationship of the eigenvalues for the two matrices, $\bar{\mathbf{C}}$ and $\mathbf{C}$, as we move from circular to noncircular data. When the signal is circular, the augmented covariance matrix assumes the block diagonal form

$$\bar{\mathbf{C}}_{circ} = \begin{bmatrix}\mathbf{C} & \mathbf{0}\\ \mathbf{0} & \mathbf{C}^*\end{bmatrix}$$

and has eigenvalues that occur with even multiplicity. In this case, the conditioning of the augmented covariance matrix $\bar{\mathbf{C}}$ and the covariance matrix $\mathbf{C}$ are the same. As the noncircularity of the signal increases, the entries of the pseudo-covariance matrix move away from zero, increasing the condition number of the augmented covariance matrix $\bar{\mathbf{C}}$. Thus the advantage of using a widely linear filter for noncircular signals comes at a cost when the LMS algorithm is used to estimate the widely linear MSE solution. An update scheme such as the recursive least squares algorithm [26], which is less sensitive to the eigenvalue spread, can be more desirable in such cases.

Fig. 1. Convergence of the linear and widely linear filter for a circular input (a) and a noncircular input (b) for a linear finite impulse response system identification problem. [Plots not reproduced. Panels: (a) Circular input ($\rho = 1/\sqrt{2}$); (b) Noncircular input ($\rho = 0.1$). Curves: widely linear, linear. Horizontal axis: samples.]

5. EXAMPLES

In order to quantify the impact of noncircularity on the performance of the widely linear filter, we introduce a simple input model and demonstrate the performance differences of linear and widely linear filters. We define a random process $X(n) = \sqrt{1-\rho^2}\,X_r(n) + j\rho X_i(n)$ where $X_r(n)$ and $X_i(n)$ are two uncorrelated real-valued random processes, both Gaussian distributed with zero mean and unit variance. By changing the value of $\rho \in [0, 1]$, we can change the degree of noncircularity of $X(n)$, and for $\rho = 1/\sqrt{2}$ the random process $X(n)$ becomes circular. Note that since second-order circularity implies strict-sense circularity for Gaussian signals, this model lets us generate a circular signal as well. The covariance matrix of $\mathbf{X}(n) = [X(n)\ X(n-1)\ \cdots\ X(n-N+1)]^T$ is given by $\mathbf{C} = \mathbf{I}$, and the pseudo-covariance matrix by $\mathbf{P} = (1 - 2\rho^2)\mathbf{I}$. The eigenvalues of the augmented covariance matrix $\bar{\mathbf{C}}$ can be shown to be $2\rho^2$ and $2(1-\rho^2)$, each with multiplicity $N$. Hence, the condition number is given by

$$\kappa(\bar{\mathbf{C}}) = \frac{\bar{\lambda}_{max}}{\bar{\lambda}_{min}} = \frac{1}{\rho^2} - 1$$

if $\rho \in [0, 1/\sqrt{2}]$ and by its inverse if $\rho \in [1/\sqrt{2}, 1]$.

In Figure 1, we show the convergence behavior of a linear and a widely linear LMS filter with input generated using the model defined above for identification of a system with coefficients $w_{opt,n} = \alpha(1 + \cos(2\pi(n-3)/5) - j[1 + \cos(2\pi(n-3)/10)])$, $n = 1, \ldots, 5$, where $\alpha$ is chosen so that the weight norm is unity (in this case, $\alpha = 0.432$). The input signal-to-noise ratio is 20 dB and the step size is fixed at $\mu = 0.04$ for all runs. In Figure 1 (a), we show the learning curve for a circular input, i.e., $\rho = 1/\sqrt{2}$, and in Figure 1 (b), for a noncircular input where $\rho = 0.1$. For the first case, the condition numbers for both $\bar{\mathbf{C}}$ and $\mathbf{C}$ are approximately unity, whereas for the second case, $\kappa(\mathbf{C}) \approx 1$ but $\kappa(\bar{\mathbf{C}}) \approx 100$. As expected, when the input is noncircular, the convergence rate of the widely linear LMS filter decreases. Since the lengths for the linear and widely linear filter are selected to match that of the unknown system (as 5 and 10 respectively), both filters yield similar steady-state mean-square error values, as shown for the purely linear model (5). In this example, even though the input is noncircular, the use of a widely linear filter does not provide an additional advantage in terms of minimum MSE. In addition, the convergence rate of the LMS algorithm decreases when the input is noncircular. Another observation to note for this example is that the steady-state error variance for the widely linear filter is slightly higher compared to the linear filter. The steady-state mean-square error for the widely linear LMS filter can be approximated as

$$J_{WL}(\infty) = J_{WL,min} + \frac{\mu J_{WL,min}}{2}\sum_{k=1}^{2N}\bar{\lambda}_k$$

when the step size is assumed to be small. The steady-state error expression for the linear LMS filter has the same form except for the very last term, which is replaced by $\sum_{k=1}^{N}\lambda_k$ where $\lambda_k$ denotes the eigenvalues of $\mathbf{C}$ [26]. Since we have $\sum_{k=1}^{2N}\bar{\lambda}_k = \mathrm{Trace}(\bar{\mathbf{C}}) = 2N\sigma^2$ and $\sum_{k=1}^{N}\lambda_k = N\sigma^2$ where $\sigma^2 = E\{|X(n)|^2\}$, doubling the dimension for the widely linear filter increases the residual mean-square error compared to the linear LMS filter, as expected. The difference can be eliminated by using an annealing procedure in which the step size is adjusted such that $\mu(n) \to 0$ as $n \to \infty$.

[Fig. 2: plots not reproduced. Curves: widely linear, linear. Horizontal axis: samples. Panel (a): Circular input ($\rho = 1/\sqrt{2}$).]

In Figure 2, we show the learning curves for the linear and widely linear LMS filters for a simple nonlinear channel. All the settings for the simulation are the same as those in Figure 1 except that the unknown system output is given by $d(n) = \mathrm{Re}\{\mathbf{w}_{opt}^H\mathbf{x}(n)\}$, and the filter coefficients $w_{opt,n}$ are selected as before. As observed in the figures, for both the circular and the noncircular case, the widely linear filter provides smaller MSE, though its convergence is again slower for the noncircular input due to the increased eigenvalue spread. An interesting point to note in this example is that the advantage of using a widely linear filter, in terms of the minimum MSE that is achieved, is more pronounced in this case for circular input, even though the advantages of widely linear filters have been particularly noted for noncircular signals. For a circular input, the MSE gain by using a widely linear filter given in (4) reduces to $J_{diff} = \|\mathbf{q}\|^2 = \|E\{d(n)\mathbf{x}(n)\}\|^2$ and is clearly nonzero for the nonlinear system chosen in the example shown in Figure 2.
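The behavior described in this section can be explored with a small simulation. The sketch below is not the authors' experimental code: it assumes a 3-tap system with hypothetical unit-norm coefficients `h`, real Gaussian observation noise of standard deviation 0.1, a single run rather than an ensemble average, and helper names (`make_input`, `condition_number`, `lms`) introduced only for illustration. It generates the input from the model $X(n) = \sqrt{1-\rho^2}X_r(n) + j\rho X_i(n)$, applies the update $\mathbf{w}(n+1) = \mathbf{w}(n) + \mu e^*(n)\mathbf{u}(n)$ with either the plain or the augmented input vector, and evaluates the condition number from the eigenvalues $2\rho^2$ and $2(1-\rho^2)$.

```python
import math
import random

def make_input(rho, n, rng):
    """Samples of X(n) = sqrt(1 - rho^2) Xr(n) + j rho Xi(n), unit power."""
    a = math.sqrt(1.0 - rho * rho)
    return [complex(a * rng.gauss(0, 1), rho * rng.gauss(0, 1)) for _ in range(n)]

def condition_number(rho):
    """Ratio of the augmented-covariance eigenvalues 2(1 - rho^2) and 2 rho^2."""
    lo, hi = sorted((2 * rho ** 2, 2 * (1 - rho ** 2)))
    return math.inf if lo == 0 else hi / lo

def lms(xs, ds, taps, mu, widely_linear):
    """Run LMS: e(n) = d(n) - w^H u(n), w(n+1) = w(n) + mu e*(n) u(n)."""
    dim = 2 * taps if widely_linear else taps
    w = [0j] * dim
    sq_err = []
    for n in range(taps - 1, len(xs)):
        u = [xs[n - k] for k in range(taps)]        # [x(n) ... x(n-N+1)]
        if widely_linear:
            u = u + [v.conjugate() for v in u]       # augmented vector [x; x*]
        y = sum(wk.conjugate() * uk for wk, uk in zip(w, u))
        e = ds[n] - y
        w = [wk + mu * e.conjugate() * uk for wk, uk in zip(w, u)]
        sq_err.append(abs(e) ** 2)
    return w, sq_err

rng = random.Random(1)
taps = 3
h = [0.6 + 0.0j, 0.0 + 0.64j, 0.48 + 0.0j]           # hypothetical, unit norm
xs = make_input(1 / math.sqrt(2), 6000, rng)          # circular input
ds = [0.0] * len(xs)
for n in range(taps - 1, len(xs)):
    s = sum(hk.conjugate() * xs[n - k] for k, hk in enumerate(h))  # h^H x(n)
    ds[n] = s.real + 0.1 * rng.gauss(0, 1)            # d(n) = Re{h^H x(n)} + noise

_, err_lin = lms(xs, ds, taps, 0.04, widely_linear=False)
_, err_wl = lms(xs, ds, taps, 0.04, widely_linear=True)
mse_lin = sum(err_lin[-1000:]) / 1000                 # steady-state estimates
mse_wl = sum(err_wl[-1000:]) / 1000
print("steady-state MSE, linear:", mse_lin, "widely linear:", mse_wl)
print("condition number at rho = 0.1:", condition_number(0.1))
```

Rerunning the same script with `make_input(0.1, ...)` gives a noncircular input, for which the smallest augmented eigenvalue is 0.02 and the widely linear filter should be expected to need many more samples to settle at the same step size.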

6. SUMMARY
