IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 56, NO. 11, NOVEMBER 2008
On the Asymptotic Behavior of the Sample Estimates of Eigenvalues and Eigenvectors of Covariance Matrices

Xavier Mestre, Member, IEEE
Abstract—This paper analyzes the asymptotic behavior of the sample estimators of the eigenvalues and eigenvectors of covariance matrices. Rather than considering traditional large sample-size asymptotics, in this paper the focus is on limited sample-size situations, whereby the number of available observations is comparable in magnitude to the observation dimension. Using tools from random matrix theory, the asymptotic behavior of the traditional sample estimates is investigated under the assumption that both the sample size and the observation dimension tend to infinity, while their quotient converges to a positive quantity. Assuming that an asymptotic eigenvalue splitting condition is fulfilled, closed-form asymptotic expressions of these estimators are derived, proving the inconsistency of the traditional sample estimators under these asymptotic conditions. The derived expressions are shown to provide valuable insight into the behavior of the sample estimators in the small sample-size regime.

Index Terms—Random matrix theory, sample covariance matrix, sample eigenvalues, sample eigenvectors.
I. INTRODUCTION

EIGENVALUES and eigenvectors of covariance matrices appear naturally in multiple applications of signal processing, econometrics, and general statistical inference. For instance, eigendecomposition methods are extensively used in many signal processing problems, such as direction-of-arrival estimation [1], beamforming [2], frequency estimation [3], blind channel identification [4], or model order selection [5]. In practice, the covariance matrix of the observed signals is unknown, and so are therefore both its eigenvalues and its eigenvectors, which need to be estimated from the observations. Very often, the eigenvalues and eigenvectors of the sample covariance matrix (hereafter referred to as sample eigenvalues and eigenvectors) are used in order to estimate the original quantities. These are, indeed, the maximum-likelihood estimators of the true eigenvalues and eigenvectors under the usual Gaussianity assumption for the observations.

Manuscript received April 2, 2008; revised July 11, 2008. First published August 19, 2008; current version published October 15, 2008. The associate editor coordinating the review of this paper and approving it for publication was Dr. Walid Hachem. This work was partially supported by the Catalan Government under Grant SGR2005-00690 and the European Commission under project NEWCOM++ 7FP-ICT-216715. The author is with the Centre Tecnològic de Telecomunicacions de Catalunya, 08860 Castelldefels, Barcelona, Spain (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TSP.2008.929662

To introduce the form of these estimators, consider a collection of $N$ independent and identically distributed (i.i.d.) observations of a certain $M$-dimensional stochastic process, denoted by $y(n) \in \mathbb{C}^{M}$, $n = 1,\dots,N$. We assume that these observations have zero mean and covariance matrix $R = \mathbb{E}[y(n)y^{H}(n)]$. We will denote by $\gamma_1 < \gamma_2 < \cdots < \gamma_{\bar M}$ the set of pairwise different eigenvalues of the covariance matrix $R$, where $\bar M \le M$ is the number of distinct true eigenvalues. Each of the eigenvalues $\gamma_m$ has multiplicity $K_m$, $m = 1,\dots,\bar M$, so that $K_1 + \cdots + K_{\bar M} = M$. Associated with each eigenvalue $\gamma_m$, $m = 1,\dots,\bar M$, there is a complex subspace of dimension $K_m$. This subspace is determined by an $M \times K_m$ matrix of eigenvectors, denoted by $E_m$, such that $E_m^{H} E_m = I_{K_m}$. Note that this specification is unique up to right multiplication by an orthogonal matrix. The sample covariance matrix is constructed from the observations as

$$\hat R = \frac{1}{N} \sum_{n=1}^{N} y(n) y^{H}(n).$$

The eigenvalues and associated eigenvectors of the sample covariance matrix will be denoted by $\hat\lambda_1 \le \cdots \le \hat\lambda_M$ and $\hat e_1,\dots,\hat e_M$, respectively. Throughout the paper, we will refer to these quantities as sample eigenvalues and sample eigenvectors. Let us assume that we want to estimate the $m$th eigenvalue of the true covariance matrix, $\gamma_m$, which has multiplicity $K_m$. Let $\mathcal{K}_m$ be the set of indexes

$$\mathcal{K}_m = \{K_1 + \cdots + K_{m-1} + 1, \dots, K_1 + \cdots + K_m\}$$

(the cardinality of $\mathcal{K}_m$ is equal to the multiplicity of the eigenvalue $\gamma_m$, namely $K_m$). Now, the classical estimator of the $m$th eigenvalue of the true covariance matrix is given by

$$\hat\gamma_m = \frac{1}{K_m} \sum_{k \in \mathcal{K}_m} \hat\lambda_k. \qquad (1)$$

This is, indeed, the maximum-likelihood estimator of $\gamma_m$ when the observations $y(1),\dots,y(N)$ follow a multivariate Gaussian distribution (see [6, Theorem 2] and [7, Theorem 9.3.1]). In the particular case where the estimated eigenvalue is known to have multiplicity one ($K_m = 1$), the estimated eigenvalues in (1) are equal to the sample eigenvalues. On the other hand, the associated subspaces are traditionally estimated using the sample eigenvectors, i.e., the eigenvectors of the sample covariance matrix. In order to avoid the underlying uncertainty in the definition of eigenvector matrices associated with a particular subspace (which are defined up to right multiplication by an orthonormal matrix), we consider the estimation
1053-587X/$25.00 © 2008 IEEE
of the corresponding eigenprojection matrices. The eigenprojection matrix associated with a subspace spanned by the columns of an eigenvector matrix $E_m$ is defined as $P_m = E_m E_m^{H}$. One can therefore reformulate the problem of estimating a subspace associated with a particular eigenvalue as the estimation of the corresponding eigenprojection matrix. The classical estimator of this eigenprojection matrix is given by

$$\hat P_m = \sum_{k \in \mathcal{K}_m} \hat e_k \hat e_k^{H}. \qquad (2)$$

Again, in the particular case where the estimated subspace is known to have dimension one, this matrix is just the outer product $\hat e_k \hat e_k^{H}$ of the corresponding sample eigenvector.

The exact statistical behavior of (1) and (2) is in practice too complicated to offer valuable insight in practical applications. For this reason, researchers have usually considered the statistical behavior of these two estimators in the asymptotic regime, assuming that the number of samples is high compared to the observation dimension. Under some standard statistical assumptions, the two estimators in (1) and (2) are strongly $N$-consistent (i.e., almost surely consistent as the number of samples goes to infinity, $N \to \infty$), and their joint asymptotic distribution has been extensively studied in the literature (again, as $N \to \infty$). For example, it is well known [6], [8], [9] that, assuming that the observations are Gaussian distributed with zero mean and covariance matrix $R$, and assuming that the estimated eigenvalue has single multiplicity, $\sqrt{N}(\hat\gamma_m - \gamma_m)$ converges to a zero-mean Gaussian random variable, where convergence is in distribution. A similar result can also be established for the sample eigenvectors, which can be shown to converge (after proper centering and scaling) to a multivariate Gaussian random vector with zero mean; closed-form expressions are available for the covariance of each sample eigenvector and for the cross-covariance between pairs of sample eigenvectors in the asymptotic distribution (assuming that the associated eigenvalues have both multiplicity one). These results have been used in a number of papers in order to establish the asymptotic behavior of a number of subspace-based estimators.

In this paper, we take a different approach in order to investigate the properties of the classical estimators of the eigenvalues and eigenvectors of the covariance matrix. We focus on the finite sample-size situation, whereby the sample size and the observation dimension are comparable in magnitude. So, instead of focusing on the situation where the sample size is large whereas the observation dimension is fixed, we consider the asymptotic situation where these two quantities are large but comparable in magnitude. In other words, we assume that the sample size is a function of the observation dimension, namely $N = N(M)$, so that $N \to \infty$ as $M \to \infty$, whereas
the quotient of these two quantities, denoted by $c_M = M/N$, converges to a fixed finite quantity $c$, $0 < c < \infty$. We will analyze the asymptotic behavior of the classical sample estimators in (1) and (2) when both sample size and observation dimension tend to infinity at the same rate. In particular, we will see that these traditional estimators do not converge to the true quantities, so that they are inconsistent in this asymptotic regime. This confirms the poor performance of generic subspace estimators in the finite sample-size regime, where1 $M$ and $N$ are comparable in magnitude [10].

Regarding the estimation of the subspaces associated with the eigenvalues of the covariance matrix, it makes little sense to try to characterize the behavior of the eigenprojection matrices $\hat P_m$, because their dimension increases to infinity as $M$ grows large. Instead of that, we will consider the behavior of quadratic forms of the type

$$\hat\eta_m = a^{H} \hat P_m b \qquad (3)$$

where $a$, $b$ are two generic deterministic column vectors. In particular, one can choose $a$ and $b$ as the $k$th and $l$th column vectors of the identity matrix. In this case, $\hat\eta_m$ in (3) would correspond to the $(k,l)$th entry of the matrix $\hat P_m$.

The rest of the paper is organized as follows. Section II presents the main properties of the spectrum of large sample covariance matrices that are used in the rest of the paper. Section III presents the main results of the paper, namely the asymptotic expressions for the traditional sample estimators. These results are proven in Section IV. Section V presents some simulation results and, finally, Section VI concludes the paper.

II. ASYMPTOTIC SPECTRAL BEHAVIOR OF THE SAMPLE COVARIANCE MATRIX

The asymptotic characterization of the spectrum of sample correlation matrices when both $M$ and $N$ increase without bound at the same rate has received a lot of attention in the past decades.
Indeed, random matrix theory has studied the asymptotic behavior of the eigenvalues and eigenvectors of different random matrix models, including the sample covariance matrix (see, e.g., [11] for a review of the most important results in the area). Using traditional random matrix theory one can show that, under some statistical assumptions to be specified later, the empirical distribution of the eigenvalues of $\hat R$ is almost surely close (as $M, N \to \infty$ at the same rate) to a nonrandom distribution function associated with a compactly supported density. Furthermore, one can characterize the asymptotic distribution of the eigenvalues of $\hat R$ in terms of the asymptotic distribution of the eigenvalues of $R$. Usually, the asymptotic characterization of the eigenvalues of $\hat R$ is established in terms of the Stieltjes transform of the corresponding eigenvalue distribution, which is a complex function defined as

$$\hat m(z) = \frac{1}{M} \operatorname{tr}\left[(\hat R - z I_M)^{-1}\right] = \frac{1}{M} \sum_{k=1}^{M} \frac{1}{\hat\lambda_k - z}, \qquad z \in \mathbb{C}^{+}. \qquad (4)$$

1From now on, with some abuse of notation, we will drop the subindexes from N and c (it will be clear from the context whether c refers to M/N or to the corresponding asymptotic limit).
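The definition in (4) is straightforward to evaluate numerically. The following sketch (hypothetical sizes, and real Gaussian data used in place of complex data for simplicity) computes the empirical Stieltjes transform for $R = I$ and compares its smoothed imaginary part against the classical Marchenko-Pastur density, which is the known limit in this white case:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 300, 600                          # hypothetical sizes, c = M/N = 0.5
Y = rng.standard_normal((M, N))          # white data: true covariance R = I
lam = np.linalg.eigvalsh(Y @ Y.T / N)    # sample eigenvalues of R_hat

def m_hat(z):
    # empirical Stieltjes transform (4): (1/M) tr[(R_hat - z I)^{-1}]
    return np.mean(1.0 / (lam - z))

# Stieltjes inversion: Im m_hat(x + j*eps)/pi smooths the eigenvalue density
c, x, eps = M / N, 1.0, 0.05
approx = m_hat(x + 1j * eps).imag / np.pi

# Marchenko-Pastur density at x, for comparison (support edges a and b)
a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
mp = np.sqrt((x - a) * (b - x)) / (2 * np.pi * c * x)
print(approx, mp)   # the two values are close
```

The agreement improves as $M$ grows and the smoothing parameter shrinks, which is exactly the mechanism behind the inversion formula used in the sequel.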
We will denote by $\hat F(\lambda)$ the empirical distribution function of the sample eigenvalues, namely2 $\hat F(\lambda) = \frac{1}{M}\#\{k : \hat\lambda_k \le \lambda\}$. The Stieltjes inversion formula can be used to retrieve the eigenvalue density from $\hat m(z)$, namely

$$\hat q(x) = \lim_{\varepsilon \to 0^{+}} \frac{1}{\pi} \operatorname{Im}\left[\hat m(x + j\varepsilon)\right] \qquad (5)$$

for every $x \in \mathbb{R}$. Hence, in order to characterize the asymptotic distribution of the sample eigenvalues, one can alternatively characterize the asymptotic behavior of the corresponding Stieltjes transform, and then use the Stieltjes inversion formula in (5) to obtain the corresponding eigenvalue distribution.

2Here, $\#\{\cdot\}$ denotes the cardinality of a set.

The Stieltjes transforms in (4) are appropriate to characterize the asymptotic behavior of the eigenvalues of the sample covariance matrix, but are not useful to describe the asymptotic properties of the subspaces associated with each of the eigenvalues (note that these functions depend on the eigenvalues, but not on the eigenvectors, of the associated matrices). Hence, for our purposes, it is also convenient to consider the following function:

$$\hat m_2(z) = a^{H} \left(\hat R - z I_M\right)^{-1} b \qquad (6)$$

where $a$ and $b$ are two deterministic vectors. These functions are a class of Stieltjes transforms that were introduced by Girko in, e.g., [12] and [13] and are very useful in order to characterize the asymptotic behavior of quadratic forms of sample eigenvectors. We will assume throughout the paper the following.

AS1) $R$ has uniformly bounded spectral norm for all $M$, namely $\sup_M \|R\| < \infty$, where $\|\cdot\|$ denotes the spectral norm.

AS2) The two column vectors $a$ and $b$ have uniformly bounded norm for all $M$.

AS3) The sample covariance matrix takes the form $\hat R = \frac{1}{N} R^{1/2} X X^{H} R^{1/2}$, where $X$ is an $M \times N$ matrix with complex independent and identically distributed (i.i.d.) absolutely continuous random entries, each one of them having i.i.d. real and imaginary parts with zero mean, variance $1/2$, and finite eighth-order moments (the actual distribution of these entries is irrelevant). On the other hand, $R^{1/2}$ is the Hermitian positive definite square root of the true covariance matrix $R$.

The following result establishes the fact that the two Stieltjes transforms $\hat m(z)$ and $\hat m_2(z)$ are almost surely close to two deterministic counterparts, denoted by $\bar m(z)$ and $\bar m_2(z)$, respectively. These two deterministic counterparts become essential for the asymptotic analysis of the spectrum of $\hat R$.

Theorem 1 [13], [14]: For all $z \in \mathbb{C}^{+}$, under AS1)-AS3) and as $M, N \to \infty$ at the same rate ($M/N \to c$, $0 < c < \infty$), then

$$\hat m(z) - \bar m(z) \to 0 \qquad (7)$$

almost surely for all $z \in \mathbb{C}^{+}$, where $\bar m(z)$ is the unique solution to the following equation in the set $\{m \in \mathbb{C} : \operatorname{Im}[m] > 0\}$:

$$\bar m(z) = \frac{1}{M} \sum_{m=1}^{\bar M} \frac{K_m}{\gamma_m \left(1 - c - c z \bar m(z)\right) - z}. \qquad (8)$$

Furthermore,

$$\hat m_2(z) - \bar m_2(z) \to 0 \qquad (9)$$

also almost surely for all $z \in \mathbb{C}^{+}$, where

$$\bar m_2(z) = a^{H} \left( R \left(1 - c - c z \bar m(z)\right) - z I_M \right)^{-1} b \qquad (10)$$

and $\bar m(z)$
is as defined above.

Proof: The result in (7) is well known in the random matrix theory literature (see, for instance, [14]). The convergence of $\hat m_2(z)$ in (9) was presented in [13], although part of the proof was omitted. In [15], we provide a full alternative proof, under slightly less generic assumptions.

As an immediate consequence of Theorem 1 we can observe that, as $M, N \to \infty$ at the same rate, if the empirical eigenvalue distribution of $R$ converges to a fixed distribution, then the empirical eigenvalue distribution of $\hat R$ also tends almost surely to a nonrandom distribution that can be retrieved from $\bar m(z)$ using the Stieltjes inversion formula in (5). Note, however, that the result of Theorem 1 is more general, in the sense that the empirical distribution of eigenvalues of $R$ may not converge as $M \to \infty$, but still $\hat m(z)$ and $\hat m_2(z)$ become asymptotically close to $\bar m(z)$ and $\bar m_2(z)$, respectively.

1) Example 1: Assume that the asymptotic eigenvalue distribution of $R$ is composed of four atoms (Dirac deltas) at the positions $\{1, 4, 8, 10\}$. In Fig. 1, we represent the asymptotic eigenvalue density of $\hat R$ for different values of $c$. This density was obtained by solving the equation in (8) and then using the inverse Stieltjes transform in (5). Observe that, as the number of samples per observation dimension increases ($N/M \to \infty$ or, equivalently, $c \to 0$), the eigenvalues of $\hat R$ tend to concentrate around the four eigenvalues of the true spatial covariance matrix $R$. This is reasonable, because as $N \to \infty$ for a fixed $M$, the entries of $\hat R$ tend almost surely to those of $R$. Conversely, when the observation dimension is high compared to the number of samples (high $c$), the eigenvalues of $\hat R$ tend to concentrate on clusters around the asymptotic eigenvalues of $R$. If $c$ is sufficiently low, or if the asymptotic eigenvalues of $R$ are sufficiently separated, each asymptotic eigenvalue generates a different cluster. If, on the contrary, $c$ is high or the asymptotic eigenvalues of $R$ are very close to one another, one cluster may be associated with multiple eigenvalues of $R$.

In the example we are considering, we see that for $c = 0.01$ (100 samples per observation dimension) we can clearly distinguish a different cluster for each eigenvalue of $R$. However, for the case where $c = 1$ (one sample per observation dimension) the asymptotic distribution of eigenvalues of $\hat R$ consists of a single cluster. Next, we give a more formal characterization of this clustering behavior of the asymptotic density of eigenvalues of $\hat R$.
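Example 1 can be reproduced numerically. The sketch below assumes the fixed-point characterization of $\bar m(z)$ takes the standard Marchenko-Pastur/Silverstein form (consistent with the discussion around (8)); the density is recovered by evaluating slightly above the real axis, as in (5), with the atom positions $\{1, 4, 8, 10\}$ of Example 1:

```python
import numpy as np

t = np.array([1.0, 4.0, 8.0, 10.0])   # distinct true eigenvalues, equal multiplicities

def density(x, c, iters=3000, eps=1e-2):
    # Damped fixed-point iteration for the self-consistent equation
    #   m = mean_t 1 / (t (1 - c - c z m) - z),  evaluated at z = x + j*eps;
    # the density then follows from the Stieltjes inversion formula.
    z = x + 1j * eps
    m = -1.0 / z
    for _ in range(iters):
        m = 0.5 * m + 0.5 * np.mean(1.0 / (t * (1 - c - c * z * m) - z))
    return m.imag / np.pi

# c = 0.01 (100 samples per dimension): each atom generates its own cluster
print(density(4.0, c=0.01))   # clearly positive: x = 4 lies inside a cluster
print(density(2.5, c=0.01))   # nearly zero: x = 2.5 lies in a gap between clusters
```

For larger $c$ the gaps fill in and the clusters merge, matching the qualitative behavior described above for $c = 1$.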
Fig. 1. Asymptotic eigenvalue distribution of $\hat R$ in a situation where the distribution of eigenvalues of $R$ is given by four different eigenvalues 1, 4, 8, 10 with the same multiplicity.

A. Characterization of the Support of the Asymptotic Sample Eigenvalue Distribution

The number and position of the different clusters in the asymptotic distribution of eigenvalues of sample covariance matrices has been studied by several authors under different assumptions (see, e.g., [16]-[18]). The asymptotic support of the empirical eigenvalue distribution can easily be established by analyzing the behavior of the imaginary part of $\bar m(z)$ on the real axis, as shown by the Stieltjes inversion formula in (5). More specifically, the sample eigenvalue density will be (almost surely) asymptotically close to the density

$$\bar q(x) = \lim_{\varepsilon \to 0^{+}} \frac{1}{\pi} \operatorname{Im}\left[\bar m(x + j\varepsilon)\right] \qquad (11)$$

and the support of $\hat q(x)$ will coincide with the support of $\bar q(x)$ on the real axis. Note that the support of $\bar q(x)$ is given for finite values of $M$, since no assumption is made regarding the convergence of the eigenvalues of the true covariance matrix (in fact, this is the approach followed in [16]). If one additionally assumes that the empirical distribution function of the true eigenvalues converges as $M \to \infty$, then it is still possible (see further [17]) to give a full characterization of the asymptotic support of the sample eigenvalue density. This characterization is far more complex than the one given here, because we are focusing on the study of $\bar q(x)$ for finite values of $M$. In what follows, we give a characterization of this support (see further Appendix I). Consider the following equation in the unknown $x$:

$$\frac{1}{M} \sum_{m=1}^{\bar M} K_m \left(\frac{\gamma_m}{\gamma_m - x}\right)^{2} = \frac{1}{c}. \qquad (12)$$

In Fig. 2(a), we give a typical representation of the function on the left-hand side of the above equation. The solutions of the equation can be graphically represented as the crossing points with a horizontal line at $1/c$. Note that (12) is a polynomial equation of degree $2\bar M$, so there exist at most $2\bar M$ real-valued solutions counting multiplicities. As illustrated in Fig. 2(a), the number of real-valued solutions counting multiplicities is always even, and they can be ordered as $x_1^{-} \le x_1^{+} \le x_2^{-} \le \cdots \le x_Q^{+}$, where equality is understood to hold when there exists a solution with double multiplicity (here, $2Q$ is the total number of real-valued solutions counting multiplicities). Now, it turns out that $Q$ is the number of different clusters of $\bar q(x)$, excluding atoms at zero whenever $c > 1$. Moreover, each of these clusters is supported by an interval of the type $[x_q^{-}, x_q^{+}]$, $q = 1,\dots,Q$, where the cluster boundaries are given by (13).

It is shown in Appendix I that $x_1^{-} > 0$, so we can express the support of $\bar q(x)$ (excluding zero when $c > 1$) as the union of $Q$ compact intervals, namely

$$\mathcal{S} = \left[x_1^{-}, x_1^{+}\right] \cup \cdots \cup \left[x_Q^{-}, x_Q^{+}\right]. \qquad (14)$$

It is interesting to observe that a particular eigenvalue $\gamma_m$, $m = 1,\dots,\bar M$, of the true covariance matrix can be univocally associated with one of the clusters of $\bar q(x)$ in (11) as follows. Observing the position of the roots of (12), it can be seen that, given a particular eigenvalue $\gamma_m$ of the true covariance matrix, there exists a unique $q$ such that $x_q^{-} < \gamma_m < x_q^{+}$; see further Fig. 2(a). We can therefore associate $\gamma_m$ with the interval $[x_q^{-}, x_q^{+}]$ of the support of the asymptotic sample eigenvalue distribution. In general, however, this correspondence is not bijective, and different true eigenvalues can be associated with the same eigenvalue cluster of $\bar q(x)$. For example, in Fig. 2(a) the eigenvalues $\gamma_1$ and $\gamma_2$ are associated with two different clusters, whereas the eigenvalues $\gamma_3$ and $\gamma_4$ are associated with the same cluster. In fact, it is readily observed from Fig. 2(a) that the number of eigenvalue clusters increases with decreasing $c$ (or, equivalently, increasing number of samples per observation dimension). In particular, for sufficiently high $c$, the density $\bar q(x)$ always consists of a single cluster, and this cluster splits up into smaller ones as $c$ decreases to zero. For a sufficiently low $c$, $\bar q(x)$ will consist of exactly $\bar M$ different clusters (each one associated with one of the distinct eigenvalues $\gamma_m$, $m = 1,\dots,\bar M$); see further Fig. 2(b).

For the purposes of this paper, it is useful to investigate how low the value of $c$ must be (or, equivalently, how many samples are needed for a fixed observation dimension) in order for a cluster associated with a particular eigenvalue of the true covariance matrix to separate from the rest of the density $\bar q(x)$. From the above description, it can readily be seen that we can ensure that the cluster of $\bar q(x)$ corresponding to the eigenvalue $\gamma_m$ is separated from the clusters associated with adjacent eigenvalues if and only if $c < \alpha_m$, where $\alpha_m$ is given by (15), shown at the bottom of the next page, and $x_1 < x_2 < \cdots$ are the different real-valued solutions to the equation (16), ordered as indicated.
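The cluster-counting argument above can be checked directly. Taking the support equation in the standard form $\Phi(x) = \frac{1}{\bar M}\sum_m \gamma_m^2/(\gamma_m - x)^2 = 1/c$ (equal multiplicities assumed for simplicity), the number of clusters follows by counting sign changes of $\Phi(x) - 1/c$ on a fine grid:

```python
import numpy as np

gammas = np.array([1.0, 4.0, 8.0, 10.0])   # distinct true eigenvalues (Example 1)

def num_clusters(c):
    # Count real solutions of Phi(x) = 1/c, with Phi(x) = mean(gamma^2/(gamma-x)^2).
    # 2Q real roots correspond to Q separate clusters; roots are located by
    # sign changes on a dense grid (small offset avoids hitting the poles).
    x = np.linspace(-5.0, 30.0, 700001) + 1.7e-7
    phi = np.mean(gammas[None, :] ** 2 / (gammas[None, :] - x[:, None]) ** 2, axis=1)
    sign = np.sign(phi - 1.0 / c)
    crossings = int(np.sum(sign[1:] != sign[:-1]))
    return crossings // 2

print(num_clusters(0.01))  # 4: every true eigenvalue generates its own cluster
print(num_clusters(1.0))   # 1: one sample per dimension, all clusters merge
```

Sweeping $c$ between these two values reproduces the splitting behavior described in the text: clusters break off one by one as the number of samples per dimension grows.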
Fig. 2. Typical representation of the function on the left-hand side of (12) (a), and associated form of the density $\bar q(x)$ (b).
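Before stating the main result, the subspace-leakage phenomenon it quantifies can be observed directly in simulation. A sketch with hypothetical sizes and eigenvalues (real Gaussian data used for simplicity): with the observation dimension comparable to the sample size, the classical eigenprojection estimator captures only part of the true subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 100, 200                                       # c = M/N = 0.5
gammas = np.concatenate([np.ones(50), 5.0 * np.ones(50)])  # two true eigenvalues

Q, _ = np.linalg.qr(rng.standard_normal((M, M)))      # random true eigenbasis
R_half = (Q * np.sqrt(gammas)) @ Q.T                  # symmetric square root of R
P_true = Q[:, 50:] @ Q[:, 50:].T                      # projection onto gamma = 5 subspace

Y = R_half @ rng.standard_normal((M, N))
w, V = np.linalg.eigh(Y @ Y.T / N)                    # sample covariance eigendecomposition
P_hat = V[:, 50:] @ V[:, 50:].T                       # classical subspace estimator

overlap = np.trace(P_true @ P_hat) / 50               # 1 would mean a perfect match
print(overlap)   # noticeably below 1: the estimate leaks into the other subspace
```

The gap between the overlap and 1 does not close when $M$ and $N$ are scaled up together, which is precisely the inconsistency established next.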
Having introduced the asymptotic behavior of the sample eigenvalues, we now turn to the presentation of the main result of the paper.

III. MAIN RESULT

The following result characterizes the asymptotic behavior of the traditional eigenvalue and eigenvector estimators in (1) and (2) when both the number of samples and the observation dimension increase without bound at the same rate. It points out that the traditional estimators are $N$-consistent (i.e., consistent for large sample size), but not $(M, N)$-consistent, that is, they fail to provide consistent estimates when the observation dimension goes to infinity at the same rate as the sample size ($M, N \to \infty$, $M/N \to c$). The characterization of the asymptotic behavior of these two estimators turns out to be especially simple when the eigenvalue cluster associated with the estimated eigenvalue is separated from the rest of the density $\bar q(x)$ by a closed non-empty interval for all $M$. As pointed out in Section II-A, this can be guaranteed if we assume the following:

AS4) If $\gamma_m$, $m = 1,\dots,\bar M$, is the eigenvalue that is to be estimated, then $c < \alpha_m$, where $\alpha_m$ is as defined in (15).

Theorem 2: Consider the two traditional sample estimators of the eigenvalues and associated quadratic forms of eigenprojection matrices, namely

$$\hat\gamma_m = \frac{1}{K_m} \sum_{k \in \mathcal{K}_m} \hat\lambda_k, \qquad \hat\eta_m = a^{H} \hat P_m b. \qquad (17)$$

Under AS1)-AS4), and as $M, N \to \infty$ at the same rate ($M/N \to c$, $0 < c < \infty$), $\hat\gamma_m - \bar\gamma_m \to 0$ and $\hat\eta_m - \bar\eta_m \to 0$ almost surely, where $\bar\gamma_m$ and $\bar\eta_m$ are defined as in

(18)

where $\mu_r$ is the $r$th solution to the following equation in $\mu$:

(19)

under the convention $\mu_1 \le \mu_2 \le \cdots$.

Proof: See Section IV.

As a direct consequence of this theorem, we observe that the traditional sample estimators of both the eigenvalues and the associated subspaces are not in general $(M, N)$-consistent.
(15)
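The inconsistency described by Theorem 2 is visible in a direct simulation (a sketch with hypothetical sizes and eigenvalues, real Gaussian data for simplicity): at a fixed ratio $c = M/N$, the classical cluster averages in (1) remain biased no matter how large $M$ and $N$ become.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 500, 1000                      # c = M/N = 0.5, both dimensions "large"
gammas = np.concatenate([np.ones(M // 2), 5.0 * np.ones(M // 2)])
Y = np.sqrt(gammas)[:, None] * rng.standard_normal((M, N))   # R = diag(gammas)
lam = np.linalg.eigvalsh(Y @ Y.T / N)  # sample eigenvalues, ascending

g1_hat = lam[: M // 2].mean()   # classical estimator (1) of gamma_1 = 1
g2_hat = lam[M // 2:].mean()    # classical estimator (1) of gamma_2 = 5
print(g1_hat, g2_hat)  # below 1 and above 5: the smallest eigenvalue is
                       # underestimated and the largest one overestimated
```

Rerunning with larger $M$ and $N$ at the same ratio leaves the biases essentially unchanged, which is the $(M, N)$-inconsistency the theorem formalizes.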
In fact, they are only $N$-consistent (this can be observed from the above expressions, taking limits as $c \to 0$ and noting that in this case $\bar\gamma_m \to \gamma_m$ and $\bar\eta_m \to a^{H} P_m b$). Interestingly enough, the asymptotic form of the traditional sample estimator of the eigenvalues coincides with the first-order asymptotic expression obtained by Lawley in [19] assuming that $N \to \infty$ with $M$ fixed. Here, we show that, under AS4), this is in fact the asymptotic expression of the traditional eigenvalue estimator when these two quantities increase without bound at the same rate. Observe from the expression of $\bar\gamma_m$ that the smallest eigenvalue is asymptotically underestimated, whereas the largest one is overestimated. On the other hand, it follows from (19) that the limit $\bar\eta_m$ contains contributions from all the true subspaces and, hence, not only from the intended one. This implies that the traditional estimator of the subspace associated with the $m$th eigenvalue becomes asymptotically equivalent to a linear combination of the intended subspace with the subspaces associated with the rest of eigenvalues.

Remark 1: It is important to note that the result presented in Theorem 2 is independent of the behavior of the true eigenvalues as $M \to \infty$. In particular, the result is valid regardless of whether the empirical distribution of the true eigenvalues of $R$ converges or not. Hence, the presented expressions can be used to predict the behavior of a single-multiplicity eigenvalue even when its relative multiplicity goes to zero as $M \to \infty$.

IV. PROOF OF THEOREM 2

The proof of Theorem 2 is based on the fact that we can express the sample estimates of the eigenvalues and eigenvectors in terms of the Stieltjes transforms presented in Section II. The asymptotic behavior of these Stieltjes transforms, which is given in Theorem 1, will then be used to derive the result. We start with the observation that, using the Cauchy integral formula [20], the sample estimates $\hat\gamma_m$ and $\hat\eta_m$ in (17) can be expressed in terms of $\hat m(z)$ and $\hat m_2(z)$, as follows:
$$\hat\gamma_m = \frac{M}{K_m} \frac{1}{2\pi j} \oint_{\partial_m} z\, \hat m(z)\, dz, \qquad \hat\eta_m = \frac{1}{2\pi j} \oint_{\partial_m} \hat m_2(z)\, dz \qquad (20)$$

where $\partial_m$ is the boundary, negatively oriented,3 of a rectangle vertically centered at the real axis, described as

$$\partial_m = \left\{ x + jy : x \in [a_1, a_2],\; |y| \le \delta \right\}$$

where $\delta > 0$ and $a_1$, $a_2$ are chosen so that $\partial_m$ only encloses the sample eigenvalues in the set $\{\hat\lambda_k : k \in \mathcal{K}_m\}$. Hence, in order to study the asymptotic behavior of $\hat\gamma_m$ and $\hat\eta_m$, we only need to concentrate on the asymptotic behavior of these two integrals. The asymptotic behavior of the integrands can be derived using Theorem 1, which basically states that $\hat m(z) \to \bar m(z)$ and $\hat m_2(z) \to \bar m_2(z)$ pointwise almost surely as $M, N \to \infty$ at the same rate, for all $z \in \mathbb{C}^{+}$. This convergence can be extended to the lower complex plane, $\mathbb{C}^{-}$, by simply defining $\hat m(z^{*}) = \hat m(z)^{*}$ and $\bar m(z^{*}) = \bar m(z)^{*}$. Still, this does not guarantee the convergence of the integral, because the behavior of the contour is not clear, and neither is its influence on the corresponding integrals.

In order to avoid some technical difficulties, we concentrate first on the case $m > 1$, i.e., we do not consider the estimation of the lowest eigenvalue of the correlation matrix and its associated subspace. We will see that the behavior of the estimator for the case $m = 1$ can be easily obtained from the performance of the quantities for $m > 1$. The basic difference between these two cases is the following. When $m > 1$, the eigenvalue splitting condition AS4) necessarily implies that all the sample eigenvalues $\hat\lambda_k$, $k \in \mathcal{K}_m$, are positive (almost surely, for sufficiently high $M$, $N$). For the case $m = 1$ this may not be true, so that we might have $\hat\lambda_k = 0$ for some $k \in \mathcal{K}_1$. The fact that some of the eigenvalues may be zero brings some additional complications to the proof.

A. Proof for the Case $m > 1$

The following proposition basically states that, asymptotically as $M, N \to \infty$, we can replace both the integrands and the integration contours by some deterministic counterparts without affecting the asymptotic behavior of the integrals. This gives us two deterministic asymptotic equivalents for the two integrals in (20).

Proposition 1: Under AS1)-AS4),

$$\hat\gamma_m - \frac{M}{K_m} \frac{1}{2\pi j} \oint_{\bar\partial_m} z\, \bar m(z)\, dz \to 0 \qquad (21)$$

and

$$\hat\eta_m - \frac{1}{2\pi j} \oint_{\bar\partial_m} \bar m_2(z)\, dz \to 0 \qquad (22)$$

almost surely as $M, N \to \infty$ at the same rate. Here, $\bar\partial_m$ is a negatively oriented contour described by the boundary of the rectangle

$$\bar\partial_m = \left\{ x + jy : x \in [x^{-}, x^{+}],\; |y| \le \delta \right\}$$

where $x^{-}$ and $x^{+}$ are two real values such that $\bar\partial_m$ encloses the support of the eigenvalue cluster of $\bar q(x)$ associated with $\gamma_m$, and no other component of its support.4

Proof: See Appendix II.

At this point, we have shown that the two random quantities $\hat\gamma_m$ and $\hat\eta_m$ are almost surely asymptotically close to two deterministic counterparts given in integral form in (21) and (22), respectively. Hence, we only need to evaluate these two integrals. In order to do this, it will be crucial to use a new function of complex variable $z$, defined as follows.
3We follow the usual convention of defining positive contours as those oriented counterclockwise.
4Observe that this choice for $x^{-}$ and $x^{+}$ can be guaranteed thanks to the cluster separability condition in AS4).
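The contour representation in (20) is easy to sanity-check numerically: integrating $z\,\hat m(z)$ over a negatively oriented contour that encloses one cluster of sample eigenvalues returns the average of the enclosed eigenvalues, i.e., exactly the classical estimator (1). A sketch with an arbitrary toy spectrum:

```python
import numpy as np

lam = np.array([0.9, 1.1, 5.0, 9.0])     # toy "sample eigenvalues"
M, K = len(lam), 2                       # the cluster {0.9, 1.1} has K = 2 members

# empirical Stieltjes transform (4), evaluated on a circle around the cluster
theta = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
z = 1.0 + 0.5 * np.exp(1j * theta)                   # counterclockwise circle
dzdt = 0.5j * np.exp(1j * theta)                     # dz/dtheta
m_vals = np.mean(1.0 / (lam[None, :] - z[:, None]), axis=1)

# (20) uses the negative (clockwise) orientation, hence the minus sign
ccw = np.sum(z * m_vals * dzdt) * (2.0 * np.pi / theta.size)
gamma_hat = (M / K) * (-ccw) / (2j * np.pi)
print(gamma_hat.real)   # 1.0 = average of the enclosed eigenvalues (0.9 + 1.1)/2
```

Because the integrand is analytic and periodic on the circle, the simple Riemann sum converges to machine precision; the same mechanism, with the random integrand replaced by its deterministic equivalent, drives the proof below.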
Based on the definition of $\bar m(z)$ in Theorem 1, we consider here a new function of complex variable $f(z)$, given by (23). In principle, $f(z)$ is defined only for $z \in \mathbb{C} \setminus \mathbb{R}$. We can extend this definition to the real axis as follows. It was shown in [17] that $\lim_{\varepsilon \to 0^{+}} \bar m(x + j\varepsilon)$ exists for any $x \ne 0$, and hence we can use this definition to extend $f$ to $\mathbb{R} \setminus \{0\}$. For $x = 0$, we further define $f(0)$ separately for the cases $c > 1$ and $c < 1$, where in the latter case $f(0)$ is the unique negative solution to (25).

Remark 2: It can be shown (see further Appendix I) that this definition of $f(z)$ is equivalent to the following. If $z \in \mathbb{C}^{+}$ (respectively, $z \in \mathbb{C}^{-}$), then $f(z)$ is chosen as the unique solution to (24) in $\mathbb{C}^{+}$ (respectively, $\mathbb{C}^{-}$). For $z \in \mathbb{R}$, $f(z)$ is chosen on the set of solutions of (24) as follows. If there exists a solution with positive imaginary part, then it is unique, and we choose $f(z)$ as this solution. Otherwise, all the solutions are real-valued, and we choose $f(z)$ as the unique one with the continuity property described above.

This function has several interesting properties, which we summarize in the following proposition.

Proposition 2: The function $f(x)$ defined above has the following properties.

1) When restricted to the real axis, $f(x)$ is continuous on $\mathbb{R}$ and differentiable on $\mathbb{R}$ except for a finite set of points. We denote by $f'(x)$ the derivative of $f$ on this set.

2) The real part of $f(x)$ is monotonically increasing when $x$ grows from $-\infty$ to $+\infty$.

3) The imaginary part of $f(x)$ is positive for $x$ in the support of $\bar q(x)$, and zero elsewhere. In other words, the imaginary part of $f(x)$ is positive only when the real part of $z = x$ is in the support of the asymptotic sample eigenvalue density, and zero otherwise.

Proof: See Appendix III.

Fig. 3. Schematic representation of the evolution of $f(x)$ and $f^{*}(x)$ on the complex plane.

As a consequence of the last two properties, we can conclude that, as $x$ moves from $x^{-}$ to $x^{+}$, the complex function $f(x)$ describes a curve on the complex plane as shown in Fig. 3, where we have also represented the corresponding complex conjugate curve $f^{*}(x)$. Note that, if a cluster associated with a particular eigenvalue $\gamma_m$ is separated from the rest of the asymptotic eigenvalue distribution of $\hat R$, the curves generated by $f(x)$ concatenated with $f^{*}(x)$, with $x$ restricted to the appropriate interval, describe a contour on the complex plane that encloses only $\gamma_m$ and no other eigenvalue. The main idea behind the proposed approach is to use this contour as integration contour in the expression of the quantities in (21) and (22). We summarize the result in the following proposition.

Proposition 3: Under AS4), and for $m > 1$,

(26)

and [see (27), shown at the bottom of the page], where $\mu_r$, $r = 1,\dots,M$, are defined in Theorem 2.

Before concentrating on the proof of this proposition, it is worth pointing out that the expressions in (26) and (27) coincide with those presented in the statement of Theorem 2. More specifically, to obtain the expression of $\bar\eta_m$ in Theorem 2, we need to simplify (27) for the case $m > 1$, using the fact that $\mu_r$ is a solution to the equation in (19). Hence, with this proposition we conclude the proof of Theorem 2 for the case $m > 1$.

Proof: Taking a trivial parametrization of $\bar\partial_m$ and using the fact that, by definition, $\bar m(z^{*}) = \bar m(z)^{*}$ for any
(27)
$z$, we can express these two integrals as [see the equations shown at the bottom of the page]. It can be shown that the two complex-valued functions $z \bar m(z)$ and $\bar m_2(z)$ are holomorphic on $\mathbb{C} \setminus \mathbb{R}$ and uniformly bounded over $\bar\partial_m$. As a consequence of the first observation, we know that the two integrals do not depend on the value of $\delta$ (the height of the rectangle $\bar\partial_m$). We can therefore take $\delta$ to be as small as desired, without altering the value of these integrals. So, from the boundedness of the last two integrands in the above equations over the set $\bar\partial_m$, we can ensure

(28)

and [see (29), shown at the bottom of the page]. Let us first deal with the integral in (28). It follows from the boundedness of the integrand over $\bar\partial_m$ that we can apply the dominated convergence theorem, so that

(30)

In order to evaluate the above integral, we will use the properties of the complex function $f(x)$ when $x$ is restricted to the real axis. First, we reformulate (30) in terms of $f(x)$ (note that this can be done because $x$ and $f(x)$ are in bijective relationship for each $c$; see further Appendix I), where in the first identity we have used the fact that $\bar m(z)$ is a solution to (8). Now, when $x$ varies from $x^{-}$ to $x^{+}$, the function $f(x)$ can be seen as a parameterized curve on the complex plane that, concatenated with its complex conjugate $f^{*}(x)$, describes a simple closed curve enclosing only $\gamma_m$ (see further Fig. 3). Noting this, we can express the integral in (28) as a contour integral over this curve. We only need to multiply and divide the integrands by $f'(x)$, the derivative of $f(x)$. An expression for this derivative can be obtained by noting that $f$ is a solution to (24), so that taking derivatives on both sides, we obtain

(31)

(29)
whenever exists, i.e., when . Hence, we can write
Inserting the expression of the derivative of given in (31) into the integral, we get
(32)
To solve this integral, we can use the residue theorem [20]. Indeed, the function is holomorphic on , except for two poles. The first one is located at the true eigenvalue , and its multiplicity is one or two depending on whether or . The residue of at can easily be seen to be given by
(33)
otherwise. As for the second term in (32), we equivalently have . The second pole, denoted as , is the unique solution to the following equation in on the set , with such that
(34)
Inserting (33) and (34) into (32), we get to (26). Let us now deal with the integral in (29). As stated before, the integrand is uniformly bounded over , and we can use the dominated convergence theorem in (29), namely
(35)
Now, using the fact that is a solution to (24), we see that, after some algebra,
(36)
where , , otherwise. Now, using the Cauchy integral formula
(37)
This equation has a total of distinct real-valued solutions positioned as follows. Comparing the functions on the left-hand side of (37) and (12), one can conclude that the largest values in are always located on intervals of the type . Since we are dealing with the case , we see that under As4), there will be a single solution sharing the same interval with , namely , and this value will always correspond to a pole with multiplicity one. The residues of at can be calculated using the method described in [20, Prop. 4.1.2]. Taking into account the fact that is a solution to (37), one can readily write
Using the expression of the residues and in (36) we obtain (27). Proof for the Case : When , it may happen that, even under As4), some of the sample eigenvalues are equal to zero. In this case, the asymptotic behavior of
and for can be easily obtained from the corresponding quantities , by noting that
and therefore the traditional estimators can be expressed as a function of the , for , namely
where, apart from basic algebraic manipulations, we have used the identity
(38)
and
(39)
Now, the problem here is that we cannot directly use the proof of Theorem 2 for , because the eigenvalue splitting condition in may not be satisfied for each . In any case, an equivalent result can be established by regarding the eigenvalues as belonging to the same super-cluster of . Under As4), this super-cluster is not connected to the cluster associated with , and consequently we can formulate the following result, which is a direct consequence of the proof of Theorem 2 under the case . The proof is identical to the one for the case , and is therefore omitted.
Proposition 4: Under As1)–As4), and as at the same rate ( , ),
almost surely, where for are as defined in Theorem 2. The above equation can also be expressed as
Consider first the traditional estimator of the smallest eigenvalue, namely . It follows from (38), and also the fact that almost surely as at the same rate,5 that
Let us finally characterize . We only need to use (39) to see that
where is defined as [see equation at the bottom of the page]. In order to obtain the expression for given in Theorem 2, we only need to use the following lemma.
Lemma 1: The following identity
(40)
is valid for any .
Proof: See Appendix IV.
5This can be shown using the Markov inequality plus the Borel–Cantelli lemma.

V. NUMERICAL EVALUATION
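The argument alluded to in footnote 5 is standard and can be sketched as follows, in generic notation of our own (the symbols below are not the paper's):

```latex
\begin{align*}
&\text{If } \mathbb{E}\,|X_N|^{p} \le C\,N^{-\alpha} \text{ for some } p \ge 1 \text{ and } \alpha > 1,
\text{ then for every } \varepsilon > 0, \text{ Markov's inequality gives}\\
&\qquad \mathbb{P}\bigl(|X_N| > \varepsilon\bigr) \;\le\; \frac{\mathbb{E}\,|X_N|^{p}}{\varepsilon^{p}}
\;\le\; \frac{C}{\varepsilon^{p} N^{\alpha}},\\
&\text{so } \sum_{N} \mathbb{P}\bigl(|X_N| > \varepsilon\bigr) < \infty,
\text{ and the Borel--Cantelli lemma implies } X_N \to 0 \text{ almost surely.}
\end{align*}
```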
In this section, we provide a brief numerical evaluation to illustrate the accuracy of the derived asymptotic expressions. We consider an 8 × 8 diagonal covariance matrix with four different eigenvalues, namely , all of them with multiplicity 2. In Fig. 4(a), we represent the average value of the sample estimates of the minimum and maximum eigenvalues taken from realizations, together with the asymptotic values presented here. In circles we represent the minimum number of samples for which As4) holds in this particular example. In Fig. 4(b), we represent the average simulated and the predicted orthogonality factors of the eigenvectors associated with the maximum and minimum eigenvalues, defined as
with as in (2). We observe that the derived formulas provide a very accurate prediction of the behavior of the sample estimators of the eigenvalues and eigenvectors even for relatively low values of and .
On the other hand, Fig. 5 compares the simulated versus the predicted values of the eigenvalues and eigenvector orthogonality factors as both , increase without bound. In particular, we considered the same eigenvalues as before, and we fixed , and let go from 1 to 10. To illustrate the fact that the predicted expressions are independent of the behavior of the dimensionality of the subspaces, we fixed the multiplicity of the lowest eigenvalue to , whereas the multiplicity of the largest eigenvalue was kept constant and equal to 1 for all . Observe that the mean simulated behavior is very close to the asymptotic prediction, even for as low as 1 (in fact, mean simulated and predicted curves appear superposed). On the other hand, the variance of the simulated values decreases as increases (vertical bars are proportional to the measured standard deviation), indicating the convergence towards the asymptotic predictions.

Fig. 4. Average versus asymptotic behavior of eigenvalues and eigenvectors for different values of the sample size.
Fig. 5. Simulated versus asymptotic behavior of eigenvalues and eigenvectors as M and N grow without bound at the same rate. The length of the vertical bars is equal to half of the standard deviation of the simulated values.
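A Monte Carlo sketch of the experiment described above can be set up as follows. The general setup (an 8 × 8 diagonal covariance with four distinct eigenvalues of multiplicity 2) follows the text, but the concrete eigenvalue values (1, 2, 3, 4), the number of realizations, and the grid of sample sizes are our own assumptions, since the paper's exact settings are not reproduced in this extraction:

```python
import numpy as np

# Monte Carlo sketch: average sample estimates of the extreme eigenvalues
# of an 8 x 8 diagonal covariance matrix with four distinct eigenvalues of
# multiplicity 2. Eigenvalue values (1, 2, 3, 4), trial counts, and sample
# sizes are our assumptions, not the paper's exact settings.

rng = np.random.default_rng(1)
gamma = np.repeat([1.0, 2.0, 3.0, 4.0], 2)   # true eigenvalues, multiplicity 2
M = gamma.size                               # observation dimension (8)

def avg_extreme_sample_eigs(N, trials=500):
    """Average smallest/largest sample covariance eigenvalues over trials."""
    lo = hi = 0.0
    for _ in range(trials):
        # zero-mean Gaussian observations with covariance diag(gamma)
        X = rng.standard_normal((M, N)) * np.sqrt(gamma)[:, None]
        w = np.linalg.eigvalsh(X @ X.T / N)  # sample covariance eigenvalues
        lo += w[0]
        hi += w[-1]
    return lo / trials, hi / trials

for N in (8, 32, 128, 1024):
    lo, hi = avg_extreme_sample_eigs(N)
    # the smallest estimate is biased below 1 and the largest above 4;
    # the bias shrinks only as N grows large relative to M
    print(N, round(lo, 3), round(hi, 3))
```

This reproduces qualitatively the behavior reported in Fig. 4(a): a pronounced spreading of the extreme sample eigenvalues when N is comparable to M.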
VI. CONCLUSION
This paper has considered the asymptotic behavior of the traditional eigenvalue and eigenvector estimators based on the sample covariance matrix, assuming that both the sample size and the observation dimension are large but have the same order of magnitude. Our analysis is valid under the assumption that the asymptotic sample eigenvalue cluster associated with the eigenvalue/eigenvector to be estimated is separated from the rest of the asymptotic sample eigenvalue distribution. Under this assumption, and using random matrix theory techniques, we have derived the exact limit of the traditional sample estimators of the eigenvalues and eigenvectors in closed form. Results demonstrate that traditional sample estimators are inconsistent in this asymptotic regime, indicating a poor performance in finite sample size situations. Finally, it should be pointed out that the same type of technique that has been proposed here could also be used to derive improved estimators of the eigenvalues and eigenvectors of the covariance matrix. The interested reader is referred to [21].

APPENDIX I
CHARACTERIZATION OF THE ASYMPTOTIC SAMPLE EIGENVALUE DISTRIBUTION
In this Appendix, we derive a closed-form characterization of the support of the density associated with the asymptotic sample eigenvalue distribution. As we have seen in Section II, thanks to the Stieltjes inversion formula, it turns out that this support can be studied by analyzing the behavior of as moves to the real axis. This is because the support of the asymptotic sample eigenvalue distribution coincides with the support of . It can be shown that exists [17] for any , and therefore we only need to characterize the support of for real-valued . To do this, we will study the solutions to the equation
(41)
for both complex-valued and real-valued . To simplify the analysis, consider a new variable defined in terms of as
(42)
It is easy to see that, if , these relationships are the inverse of one another and establish a bijection between the sets and . Furthermore, for real-valued , we observe that the imaginary parts of and have the same sign. In particular, the support of as a function of coincides with the support of on the real axis. Inserting the second expression into (41) and assuming , we see that is a solution to (41) if and only if is a solution to
(43)
We will now characterize the solutions to (43). Since these solutions are in bijection with those of (41), we will be implicitly characterizing the solutions to the original equation. We will differentiate between the case where and the case where .
When , we know from [14] that there exists a unique solution to (41) in the set . Noting from (42) that the imaginary parts of and have the same sign, we see that there exists a unique solution to (43) in the set , which is in bijective correspondence with . This corresponds to the definition of given in Remark 2.
When , the solutions to (41) or (43) when is restricted to the real axis are the roots of a polynomial equation of degree with real-valued coefficients. This means that there are at most different complex roots counting multiplicities and including their corresponding complex conjugates. Let us define as the left-hand side of (43), namely
(44)
so that (43) for can be represented as . Note that only depends on the eigenvalues and on , but not on . In Fig. 6, we give a typical representation of as a function of for the cases and .6 The real-valued solutions to (43) are determined as the crossing points between and the horizontal line at (points marked with the symbol in these figures).
Proposition 5: The equation has different real-valued solutions located at each one of the intervals , . The other two solutions (counting multiplicities) are a pair of conjugate roots with nonzero imaginary part if , and real-valued if , where , , , , and where are the real-valued solutions to the following equation:
(45)
Given a fixed , there is a single solution out of the real-valued roots of the equation (counting multiplicities) such that or, equivalently,
Furthermore, equality only holds when belongs to the boundary of , that is .
Proof: Using basic analysis tools, and noting the position of the asymptotes in Fig. 6, one can easily see that there exists at least one solution of the equation located at each interval , . The other two solutions of the polynomial equation can be either a pair of conjugate roots or two additional real values. It is easy to check that there will be real-valued roots (see further Fig. 6)
6The only difference between the case c < 1 and the case c > 1 is the position of the first inflexion point, which occurs for positive f if c < 1 and for negative f if c > 1. In general, the number of solutions at each interval ( , ) can vary from 1 to 3 in both cases.
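As a numerical companion to this characterization, the sketch below locates the support edges of the limiting sample eigenvalue distribution from the real critical points of the function Φ. Since the paper's symbols are not reproduced in this extraction, we assume the standard Silverstein–Choi form Φ(f) = f(1 + c Σ_k (K_k/M) γ_k/(f − γ_k)); all variable names are ours:

```python
import numpy as np

# Support of the limiting sample eigenvalue distribution via the critical
# points of Phi (Silverstein-Choi characterization). The concrete form of
# Phi and the variable names below are our assumptions.

def support_edges(gammas, mults, c):
    """Return the images Phi(f*) of the real critical points Phi'(f*) = 0."""
    M = sum(mults)
    # Phi'(f) = 1 - c * sum_k (K_k/M) * gamma_k^2 / (f - gamma_k)^2 = 0.
    # Clearing denominators gives a polynomial in f:
    #   prod_j (f-g_j)^2 - c * sum_k w_k g_k^2 prod_{j!=k} (f-g_j)^2 = 0.
    num_poly = np.poly1d([1.0])
    for g in gammas:
        num_poly *= np.poly1d([1.0, -g]) ** 2
    for g, K in zip(gammas, mults):
        rest = np.poly1d([1.0])
        for g2 in gammas:
            if g2 != g:
                rest *= np.poly1d([1.0, -g2]) ** 2
        num_poly = num_poly - c * (K / M) * g ** 2 * rest
    crit = [r.real for r in num_poly.roots if abs(r.imag) < 1e-6]
    Phi = lambda f: f * (1.0 + c * sum((K / M) * g / (f - g)
                                       for g, K in zip(gammas, mults)))
    return sorted(Phi(f) for f in crit)

# single unit eigenvalue: recovers the Marchenko-Pastur support
# [(1 - sqrt(c))^2, (1 + sqrt(c))^2] = [0.25, 2.25] for c = 0.25
print(support_edges([1.0], [1], 0.25))
```

For a small ratio c, the edges come in pairs, one narrow cluster around each distinct true eigenvalue; as c grows, neighboring clusters merge, which is precisely when the splitting condition As4) fails.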
Fig. 6. Typical representation of the solutions to the equation Φ(f) = x for c < 1 in (a) and c > 1 in (b).
whenever the derivative of is positive at one of the intersecting points. Furthermore, if there exists a solution with positive derivative of , meaning that
(46)
then this solution is unique and has multiplicity one. Let denote the left-hand side of (46). In Fig. 2(a), we give a typical representation of as a function of . The points at which is a decreasing function of are the points for which . Now, observe that is a positive function and that it presents vertical asymptotes at the points . Now, since as and is always positive, we can conclude that the number of solutions to the equation (counting multiplicities) is always even, and higher than or equal to two. We will denote them as , where . The presence of vertical asymptotes guarantees that we can order them as follows: (equality is understood to hold when there is a solution with double multiplicity). On the other hand, we can also ensure that , and that, given an eigenvalue , there exists a unique such that .
All this proves that the original function is monotonically decreasing for all except for the compact set , and that the boundary points of are the inflexion points of . We will denote as the corresponding images by , i.e., , . We claim that the transformation preserves the ordering of the pre-images, in the sense that . Indeed, let and be two different inflexion points of . Using the definition of in (44), we can write
This means that, in order to prove that and have the same sign (so that preserves the ordering of ), we only need to show that
(47)
To see that, observe that the following quantity is always positive for any , such that :
Expanding this expression, we obtain the inequality
(48)
However, since and are inflexion points of , they correspond to the solutions of the equation , and consequently
Inserting this into (48), we get to (47). From all this, we can conclude that the equation has exactly real-valued solutions (counting multiplicities) when , and only different real-valued solutions with multiplicity one when . In the first case, there is a single solution that corresponds to a positive derivative of , whereas in the second case there is a single solution with positive imaginary part.
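Stepping back, the residue-theorem evaluations used earlier in the proofs (e.g., in obtaining (26) and (27)) can be checked numerically on a toy function: a contour integral of a rational function equals 2πi times the sum of the residues at the enclosed poles. The integrand below is our own illustration, not the paper's:

```python
import numpy as np

# Residue-theorem sanity check on a toy rational function (not the paper's
# integrand): integrate numerically around a circle enclosing both poles
# and compare with 2*pi*i times the sum of residues.

f = lambda z: z / ((z - 1.0) * (z - 3.0))   # simple poles at z = 1 and z = 3

# residues: z/(z-3) at z=1 gives -1/2; z/(z-1) at z=3 gives 3/2; sum = 1
res_sum = -0.5 + 1.5

# contour: circle of radius 2 centered at 2, enclosing both poles
theta = np.linspace(0.0, 2.0 * np.pi, 4000, endpoint=False)
z = 2.0 + 2.0 * np.exp(1j * theta)
dz = 2.0j * np.exp(1j * theta) * (theta[1] - theta[0])
integral = np.sum(f(z) * dz)

# the periodic trapezoidal rule converges rapidly for analytic integrands,
# so the two evaluations agree to near machine precision
print(abs(integral - 2j * np.pi * res_sum))
```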
With Proposition 5, we have completely characterized the solutions to (43) in terms of for . This in turn determines the position of the real roots of (41) in terms of . Noting that the support of over the real axis, which is equal to the support of , coincides with the values of for which (43) has a pair of complex conjugate roots, we conclude that the support of is given by .
As a corollary of all the above, it is easy to see that coincides with the choice of the root of the equation that is proposed in Remark 2 of Section IV-A. Indeed, observe that, denoting and expressing , we can write the imaginary part of the equation as
(49)
where we know from Section I that , which implies that is the unique root such that
for . Now, observe that since is a solution to a polynomial equation, will be a solution to the equation . Since , one can only have (by continuity) positive or zero. If , then can only be the only solution to with positive imaginary part. If, on the contrary, , then from (49) we see that it necessarily is the unique root that verifies (25). The reciprocal statements are natural consequences of continuity of the roots of the original polynomial equation as functions of .

APPENDIX II
PROOF OF PROPOSITION 1
To prove this proposition, we use two powerful results presented in [18] and [22]. In [18], it was proven that, for sufficiently high , with probability one there are no sample eigenvalues located outside the support of the asymptotic sample eigenvalue distribution. This result was later improved in [22], where it was shown that the separation of the sample eigenvalues into the asymptotic clusters is exact. This means that the relative number of sample and true eigenvalues (counting multiplicities) contained in a particular cluster of the asymptotic sample eigenvalue distribution is exactly the same with probability one, for sufficiently high , . The only exception to that is the lowest asymptotic sample eigenvalue cluster in the case . Indeed, when , there are exactly sample eigenvalues equal to zero, which do not contribute to the first asymptotic sample eigenvalue cluster.
Let us go back to the integrals in (20). Observe that, with probability one, the integrands of these two integrals are uniformly bounded for all large , in an open set within containing every realization of . Indeed, it is easy to see that, applying the Cauchy–Schwarz inequality,
where the inequalities are understood to hold with probability one for all large , , and denotes the distance between and the support of (related to the asymptotic sample eigenvalue distribution) excluding when . On the other hand
Note that the values of , can be chosen arbitrarily for each as long as the interval encloses only the eigenvalues . In particular, from the results above and under As4), we see that can be chosen to stay bounded away from the support for all , . This guarantees that, under As4), is bounded away from zero with probability one for all large , . In other words, under As4), thanks to the exact separation arguments presented above, we can assume that and have almost sure approximants7, denoted by and (respectively, and ), such that with probability one for all large , . The result of the proposition follows from all the above, together with Theorem 1 and the dominated convergence theorem. Indeed, one can see that8
Now, it follows from the discussion above that, with probability one and for sufficiently high , ,
and that by definition. On the other hand, pointwise almost surely for all , while it can also be seen that
because is holomorphic on (by the Schwarz reflection principle [20]). Hence, we only need to apply the dominated convergence theorem to ensure that
almost surely, from where the final result follows. The proof of the convergence of is almost identical, and is therefore omitted.
7These two quantities can in general depend on M.
8The following equalities/inequalities are understood to hold with probability one for sufficiently high M, N.
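The exact-separation property of [22] invoked in this appendix can be illustrated empirically. The toy covariance below (not the paper's example) has two distinct eigenvalues, and for a small ratio M/N every realization places exactly as many sample eigenvalues in each cluster as the multiplicity of the corresponding true eigenvalue:

```python
import numpy as np

# Empirical illustration (not the paper's proof) of exact separation:
# with covariance diag(1,...,1,5,...,5) and c = M/N = 0.1, the sample
# eigenvalues split into two well-separated clusters, each containing
# exactly as many sample eigenvalues as the true multiplicity.

rng = np.random.default_rng(0)
M, N = 20, 200                                  # dimension, sample size
gamma = np.array([1.0] * 10 + [5.0] * 10)       # true eigenvalues
X = rng.standard_normal((M, N)) * np.sqrt(gamma)[:, None]
R_hat = X @ X.T / N                             # sample covariance matrix
w = np.sort(np.linalg.eigvalsh(R_hat))

# the clusters sit roughly around 1 and 5; 2.0 lies in the gap between them
upper = int(np.sum(w > 2.0))
print(upper)                                    # matches the multiplicity 10
```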
APPENDIX III
PROPERTIES OF
Property 1: When is restricted to the real axis, is continuous on and differentiable on , except for the points .
Proof: If , continuity and differentiability of follow from the fact that is a root of the polynomial in (43), whose coefficients are differentiable functions of (note that the roots of a polynomial are holomorphic functions of its coefficients; see further [23]). Consequently, we only need to prove continuity at the boundary points of , namely , which is equivalent to showing that and also for all . The proof follows from Proposition 5 in Appendix I and is omitted due to space constraints.
Property 2: The real part of , , is monotonically increasing with , going from to when grows from to .
Proof: Let and denote the real and imaginary parts of , i.e., . Taking imaginary parts at both sides of the expression of the derivative in (31), we see
where is a positive function of , and if , and
so that
for all . Now, from Proposition 5, this quantity is always positive. On the other hand, if , we will have . Taking imaginary parts on both sides of (24), we see that
so that
. Finally, consider belonging to the boundary of . In this case, is not defined, and we have to consider the left and right limits of . These limits can be shown to be positive quantities (details are omitted).
Property 3: The imaginary part of , , is positive for , and zero elsewhere. In other words, the imaginary part of is positive only when the real part of is in the set and zero otherwise.
Proof: This is a direct consequence of Proposition 5 in Appendix I and the way is chosen out of the roots of (24).

APPENDIX IV
PROOF OF LEMMA 1
Observe that can be obtained as the roots of the following polynomial
(50)
where the expression on the right-hand side can easily be obtained from (37). Evaluating this polynomial at , we obtain
(51)
On the other hand, observing the form of the derivative of
and evaluating it at , we obtain
Using this in the expression of given in (50),
Dividing both sides of the above equation by and using (51) to simplify the term on the left-hand side, we obtain
Multiplying the whole equation by , adding to both sides and moving the second term on the right-hand side to the left, we obtain
Finally, using the fact that is a solution to (19), so that can be replaced by the corresponding quantity on the right-hand side of (19) multiplied by , we can write
and grouping the last four terms, we get to the expression in (40).

ACKNOWLEDGMENT
The author would like to thank the anonymous reviewers for their valuable and constructive comments, which helped to greatly improve the quality of the manuscript.

REFERENCES
[1] R. Schmidt, "Multiple emitter localization and signal parameter estimation," in Proc. RADC Spectral Estimation Workshop, Rome, NY, 1979, pp. 243–258.
[2] T. Citron and T. Kailath, "An improved eigenvector beamformer," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, San Diego, CA, pp. 33.3.1–33.3.4.
[3] D. Tufts and R. Kumaresan, "Singular value decomposition and improved frequency estimation using linear prediction," IEEE Trans. Acoust., Speech, Signal Process., vol. 30, pp. 671–675, Aug. 1982.
[4] K. Abed-Meraim, P. Loubaton, and E. Moulines, "A subspace algorithm for certain blind identification problems," IEEE Trans. Inf. Theory, vol. 43, pp. 499–511, Mar. 1997.
[5] M. Wax and T. Kailath, "Detection of signals by information theoretic criteria," IEEE Trans. Acoust., Speech, Signal Process., vol. 33, pp. 387–392, Apr. 1985.
[6] T. Anderson, "Asymptotic theory for principal component analysis," Ann. Math. Stat., vol. 34, pp. 122–148, Mar. 1963.
[7] R. Muirhead, Aspects of Multivariate Analysis. New York: Wiley, 1982.
[8] R. Gupta, "Asymptotic theory for principal component analysis in the complex case," J. Indian Stat. Assoc., vol. 3, pp. 97–106, 1965.
[9] J. Schott, "Asymptotics of eigenprojections of correlation matrices with some applications in principal component analysis," Biometrika, vol. 84, pp. 327–337, Jun. 1997.
[10] D. Tufts, A. Kot, and R. Vaccaro, "The threshold effect in signal processing algorithms which use an estimated subspace," in SVD and Signal Processing, II: Algorithms, Analysis and Applications. New York: Elsevier, 1991.
[11] A. Tulino and S. Verdú, "Random matrix theory and wireless communications," Found. Trends Commun. Inf. Theory, vol. 1, pp. 1–182, 2004.
[12] V. Girko, "G25-estimators of principal components," Theory Probab. Math. Stat., vol. 40, pp. 1–10, 1990.
[13] V. Girko, "Strong law for the eigenvalues and eigenvectors of empirical covariance matrices," Random Oper. Stoch. Eq., vol. 4, pp. 176–204, 1996.
[14] J. Silverstein, "Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices," J. Multivar. Anal., vol. 5, pp. 331–339, 1995.
[15] X. Mestre, "On the asymptotic behavior of quadratic forms of the resolvent of covariance-type matrices," Centre Tecnològic de Telecomunicacions de Catalunya (CTTC), Catalunya, Spain, Tech. Rep. CTTC/RC/2006-001, 2006 [Online]. Available: http://www.cttc.cat/en/person/xmestre/publications.jsp
[16] V. Girko, "Asymptotic behavior of eigenvalues of empirical covariance matrices," Theory Probab. Math. Stat., vol. 44, pp. 37–44, 1992.
[17] J. Silverstein and S. Choi, "Analysis of the limiting spectral distribution of large dimensional random matrices," J. Multivar. Anal., vol. 54, pp. 295–309, 1995.
[18] Z. Bai and J. Silverstein, "No eigenvalues outside the support of the limiting spectral distribution of large dimensional sample covariance matrices," Ann. Probab., vol. 26, pp. 316–345, 1998.
[19] D. Lawley, "Test of significance for the latent roots of covariance and correlation matrices," Biometrika, vol. 43, pp. 128–136, 1956.
[20] J. Marsden and M. Hoffman, Basic Complex Analysis, 3rd ed. New York: Freeman, 1987.
[21] X. Mestre, "Improved estimation of eigenvalues of covariance matrices and their associated subspaces using their sample estimates," IEEE Trans. Inf. Theory, vol. 54, no. 11, Nov. 2008.
[22] Z. Bai and J. Silverstein, "Exact separation of eigenvalues of large dimensional sample covariance matrices," Ann. Probab., vol. 27, no. 3, pp. 1536–1555, 1999.
[23] D. Brillinger, "The analyticity of the roots of a polynomial as functions of the coefficients," Math. Mag., vol. 39, pp. 145–147, May 1966.

Xavier Mestre (S'96–M'03) received the M.S. and Ph.D. degrees in electrical engineering from the Technical University of Catalonia (UPC), Spain, in 1997 and 2002, respectively. He is also working towards, and nearing completion of, a Mathematics degree at the Facultat de Matemàtiques i Estadística at UPC.
From January 1998 to December 2002, he was with UPC's Communications Signal Processing Group, where he worked as a Research Assistant and participated actively in the European-funded projects ACTS TSUNAMI-II, ACTS SUNBEAM, IST METRA, MEDEA+ UNILAN, and IST-IMETRA. In January 2003, he joined the Telecommunications Technological Center of Catalonia (CTTC), Spain, where he currently holds a position as a Senior Research Associate in the area of radio communications. During this time, he has actively participated in the European projects MEDEA+ MARQUIS, CELTIC WISQUAS, and IST COOPCOM, the networks of excellence IST NEWCOM, IST ACE, and IST NEWCOM++, and he has led several contracts with the local industry. Since 2004, he has been the coordinator of the Radio Communications Research Area at CTTC.
Dr. Mestre was a recipient of a 1998–2001 Ph.D. scholarship (granted by the Catalan government) during the pursuit of his Ph.D. degree and was awarded the 2002 Rosina Ribalta Second Prize for the Best Doctoral Thesis Project within the areas of Information Technologies and Communications by the Epson Iberica foundation.