
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 53, NO. 11, NOVEMBER 2015

Low-Rank Subspace Representation for Estimating the Number of Signal Subspaces in Hyperspectral Imagery

Alex Sumarsono, Member, IEEE, and Qian Du, Senior Member, IEEE

Abstract—In this paper, we consider signal subspace estimation based on low-rank representation for hyperspectral imagery. It is often assumed that major signal sources occupy a low-rank subspace. Due to the mixed nature of hyperspectral remote sensing data, however, the underlying data structure may include multiple subspaces instead of a single subspace. Therefore, in this paper, we propose the use of low-rank subspace representation to estimate the number of subspaces in hyperspectral imagery. In particular, we develop simple estimation approaches without user-defined parameters, since the parameters involved can be fixed as constants. Both real-data experiments and computer simulations demonstrate the excellent performance of the proposed approaches compared with those currently in the literature.

Index Terms—Data dimensionality, hyperspectral imagery, low-rank representation (LRR), low-rank subspace representation (LRSR), rank estimation, signal subspace estimation.

I. INTRODUCTION

Estimation of data intrinsic dimensionality (ID) is a very challenging problem [1]. By definition, the ID is the minimum number of parameters required to account for the observed properties of a data set [2], and it is often much smaller than the data dimensionality. For remotely sensed hyperspectral data with hundreds of spectral bands, the data dimensionality, which equals the number of spectral channels, is very high, yet the ID can be very small. For instance, in the linear mixture model, a pixel in a remotely sensed image with relatively coarse spatial resolution is considered a linear mixture of the pure materials, called endmembers, present in the image scene. Based on the definition of ID, the number of true endmembers that constitute an image is closely related to the ID. In [3], the concept of virtual dimensionality (VD) was proposed as the minimum number of distinct signal sources; the VD may be larger than the ID, but it is easier to estimate.

ID estimation can be approached by rank estimation techniques, which are often based on information criteria.

Manuscript received January 28, 2015; revised April 1, 2015 and April 25, 2015; accepted May 23, 2015. The authors are with the Department of Electrical and Computer Engineering, Mississippi State University, Mississippi State, MS 39762 USA. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TGRS.2015.2438079

Such criteria include the Akaike information criterion [4] and the minimum description length [5]; more discussion can be found in [6]. Unfortunately, it has been shown in [3] that these methods do not perform well for hyperspectral image data, because the noise is often not white. Another frequently used approach is principal component analysis (PCA), which searches for the most significant eigenvalues of the data covariance matrix. Due to the complexity of real data, the significance level is difficult to predetermine; thus, thresholding methods, such as one using an energy percentage or Malinowski's method [7], can be applied. A more sophisticated generalized PCA (GPCA) was also proposed [8]. However, threshold selection in these methods is critical but unfortunately empirical. To statistically gauge the significance of eigenvalues, the Harsanyi-Farrand-Chang (HFC), noise-whitened HFC (NWHFC), and noise subspace projection (NSP) methods were proposed in [3]. The HFC method was modified to be parameter-free in [9], which also examines the difference between eigenvalue pairs of the covariance and correlation matrices of the original data, as in [3].

Although hyperspectral imagery has very high data dimensionality, it is well known that the major data information tends to be distributed in a low-dimensional subspace. An interesting method called hyperspectral signal identification by minimum error (HySime) was proposed in [10], which searches for the significant eigenvectors that construct the low-rank signal subspace. Considering that data may contain rare components, such as outliers, the maximum orthogonal complement analysis (MOCA) algorithm [11] and the robust signal subspace estimation (RSSE) algorithm [12] were developed. Later, a computationally efficient MOCA, called modified MOCA (MMOCA) [13], and the robust signal subspace identification in the presence of signal-dependent noise (RSSI-SD) method [14] were also developed. Other work considering outliers during rank estimation can be found in [15]. Nearest-neighbor distance and random matrix theory have also been investigated for ID or VD estimation [16]-[18].

In this paper, the recently popular low-rank and sparse matrix decomposition technique is considered to estimate the signal subspace together with a sparse matrix accommodating outliers or rare components. The original idea was proposed as robust principal component analysis (RPCA) [19]. If data recovery can be achieved by low-rank representation (LRR), more accurate estimation of signal subspaces may be possible by working on the low-rank matrix.


Unfortunately, due to the complexity of real data such as hyperspectral imagery, the resulting low-rank matrix is not truly low-rank, which makes it of little use for rank estimation. Given the mixed nature of real data, the underlying data structure includes multiple subspaces instead of a single subspace, and low-rank subspace representation (LRSR) has been proposed and applied to traditional digital images with better performance [20]. Therefore, in this paper, we propose the use of LRSR to estimate the number of signal subspaces, with the resulting low-rank matrix being much closer to truly low-rank. The proposed technique has three major advantages: 1) it can accommodate within-class variations, because each signal subspace is related to a single class, so the estimate is close to the number of classes present in an image scene; 2) it is insensitive to rare components, because anomalies and outliers are preseparated from the signal matrix; and 3) several parameters in the algorithm can simply be fixed to constants, so the overall technique becomes parameter-free. Real-data experiments and computer simulations demonstrate the excellent performance of the proposed approaches without user-defined parameters. Note that the original idea was presented in the conference version of this paper [21]; here, we add discussion on the sensitivity to the regularization parameter, noise impact, and computational cost, and we include more experimental results with real and simulated data.

This paper is organized as follows. Section II briefly introduces low-rank and low-rank subspace representations. Section III discusses how to use the low-rank subspace representation to estimate the number of signal subspaces. Section IV presents experimental results, and conclusions are drawn in Section V.

II. LOW-RANK REPRESENTATIONS

A. Low-Rank and Low-Rank Subspace Representations

Let a hyperspectral image data matrix $X \in \mathbb{R}^{d \times n}$ ($d$ is the number of spectral bands, and $n$ is the number of pixels) be decomposed into a low-rank matrix $D_0$ and a small perturbation matrix $N_0$ [19]

$$ X = D_0 + N_0. \qquad (1) $$

PCA can successfully recover $D_0$ via the singular value decomposition (SVD), assuming the noise $N_0$ is small. If this assumption is violated, i.e., $N_0$ is replaced by $E_0$ whose entries may have arbitrarily large magnitude, the result of PCA can be quite inaccurate. This problem can be solved by RPCA: if $D_0$ is not sparse and the elements of $E_0$ are uniformly random, then the low-rank matrix $D_0$ and the sparse matrix $E_0$ can be recovered by casting $X = D_0 + E_0$ as the convex optimization problem [19]

$$ \min_{D,E} \; \|D\|_* + \lambda \|E\|_1 \quad \text{subject to} \quad X = D + E \qquad (2) $$

where $\lambda$ is a positive regularization parameter, $\|\cdot\|_*$ denotes the nuclear norm of a matrix (the sum of its singular values), and $\|\cdot\|_1$ denotes the sum of the absolute values of the matrix entries.

RPCA has been applied successfully in many applications. However, its underlying assumption is that the data are drawn from a single subspace. Data drawn from a union of multiple subspaces $S_1, S_2, \ldots, S_k$, i.e., $S = \bigcup_{i=1}^{k} S_i$, will be treated as if sampled from a single subspace defined by the sum $S = \sum_{i=1}^{k} S_i$, which may lead to inaccuracy in the recovered data. Mixed data can be better handled by low-rank subspace representation (LRSR) with the following formulation [20]:

$$ \min_{Z,E} \; \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{subject to} \quad X = AZ + E \qquad (3) $$

where $A$ is a dictionary that linearly spans the data space, and $\|\cdot\|_{2,1}$ denotes the sum of the $\ell_2$ norms of the columns. Solving the optimization problem in (3) yields the minimizers $Z^*$ and $E^*$ [20]. $Z^*$ can be interpreted as the lowest rank representation of $X$ with respect to $A$. Since $\operatorname{rank}(AZ^*) \le \operatorname{rank}(Z^*)$, $AZ^*$ also represents a low-rank recovery of the original data; that is, the true segmentation of the data contained in the underlying row space can be recovered by $AZ^*$. In the ideal case where the data are clean, $\operatorname{rank}(Z^*) = \operatorname{rank}(X)$. Data samples such as errors and outliers are by definition not part of the underlying subspaces and are recovered by the sparse matrix $E^*$.

B. Augmented Lagrange Multiplier Method

The general augmented Lagrange multiplier (ALM) method solves a constrained optimization problem [22], [23] of the general form

$$ \min f(X) \quad \text{subject to} \quad h(X) = 0 \qquad (4) $$

by first defining the Lagrangian function

$$ L(X, Y, \mu) = f(X) + \langle Y, h(X) \rangle + \frac{\mu}{2} \|h(X)\|_F^2 \qquad (5) $$

where $Y$ is the Lagrange multiplier and $\mu$ is a positive scalar controlling the convergence rate. To use this method, the problem in (3) is first converted to the equivalent problem

$$ \min_{J,Z,E} \; \|J\|_* + \lambda \|E\|_{2,1} \quad \text{subject to} \quad X = AZ + E \ \text{ and } \ Z = J. \qquad (6) $$

Then the Lagrangian function for LRSR becomes

$$ L = \|J\|_* + \lambda \|E\|_{2,1} + \langle Y_1, X - AZ - E \rangle + \langle Y_2, Z - J \rangle + \frac{\mu}{2} \|X - AZ - E\|_F^2 + \frac{\mu}{2} \|Z - J\|_F^2. \qquad (7) $$

The last two terms in (7) actually represent a single equality constraint in the form of (5), with $h(X) = X - AZ - E$ and the introduced variable $J$ kept as close to $Z$ as possible, so the same penalty parameter $\mu$ is used for both; in other words, the levels of penalty on these two terms ought to be the same. ALM solves this problem by repeatedly minimizing over $J$, $Z$, and $E$, each time keeping the other variables fixed, and then updating the Lagrange multipliers. However, solving $(J_k, Z_k, E_k) = \arg\min_{J,Z,E} L(J, Z, E, Y_k, \mu)$ is itself an iterative process. As it turns out, this subproblem does not have to be solved exactly, which leads to the inexact augmented Lagrange multiplier (IALM) method [23]. The IALM method converges as fast as the exact ALM but requires significantly less computation because far fewer partial SVDs need to be computed. Thus, IALM is adopted in this research.
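For concreteness, the following is a minimal numpy sketch of the IALM iteration for (6) and (7), following the standard update rules of [20] and [23]. The function names and the `rho`/`mu_max`/`tol` defaults are illustrative assumptions; the paper itself reports only $\mu = 0.001$ and the $10^{-8}$ convergence criterion (see Section IV-E).

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: prox of tau * nuclear norm (J-update)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def l21_shrink(M, tau):
    """Column-wise l2 shrinkage: prox of tau * ||.||_{2,1} (E-update)."""
    norms = np.maximum(np.linalg.norm(M, axis=0), 1e-12)
    return M * np.maximum(1.0 - tau / norms, 0.0)

def lrsr_ialm(X, A=None, lam=None, mu=1e-3, rho=1.1,
              mu_max=1e10, tol=1e-8, max_iter=200):
    """Inexact ALM for min ||J||_* + lam ||E||_{2,1}
    s.t. X = A Z + E, Z = J; A defaults to X as in Section III."""
    d, n = X.shape
    A = X if A is None else A
    lam = 1.0 / np.sqrt(n) if lam is None else lam  # empirical rule of [19]
    m = A.shape[1]
    Z = np.zeros((m, n)); J = np.zeros((m, n)); E = np.zeros((d, n))
    Y1 = np.zeros((d, n)); Y2 = np.zeros((m, n))
    AtA = A.T @ A
    inv = np.linalg.inv(np.eye(m) + AtA)   # system matrix does not depend on mu
    for _ in range(max_iter):
        J = svt(Z + Y2 / mu, 1.0 / mu)
        # closed-form Z-update obtained from dL/dZ = 0 in (7)
        Z = inv @ (A.T @ (X - E) + J + (A.T @ Y1 - Y2) / mu)
        E = l21_shrink(X - A @ Z + Y1 / mu, lam / mu)
        R1 = X - A @ Z - E                 # residuals of the two
        R2 = Z - J                         # equality constraints in (6)
        Y1 = Y1 + mu * R1
        Y2 = Y2 + mu * R2
        mu = min(rho * mu, mu_max)
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break                          # the 10^-8 criterion of Sec. IV-E
    return Z, E
```

Note that with $A = X$ the linear system involves an $n \times n$ matrix, consistent with the $O(n^3)$ complexity discussed in Section IV-E.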


Note that the parameters in IALM are chosen by default as in [23] ($J$, $Z$, $E$, and $Y$ are initialized to 0, and $\mu = 0.001$); in our experiments, the final low-rank and sparse matrices are not sensitive to these parameters. The convergence speed is related to $\mu$; in any event, the algorithm converges within 200 iterations.

C. Regularization Parameter

The choice of $\lambda$ in (3) may be critical. Theoretically, a larger $\lambda$ is required for noisier data. Candes et al. provided an empirical rule in [19]: $\lambda = 1/\sqrt{n}$. Fortunately, it works very well in our experiments; thus, $\lambda$ is fixed to this value in this research.

III. ESTIMATION OF THE NUMBER OF SUBSPACES USING LOW-RANK REPRESENTATION

A. Direct Rank Counting

In reality, it often happens that the dictionary $A$ is unknown. In such a case, a good choice is to set $A = X$, and the optimization problem in (3) becomes

$$ \min_{Z,E} \; \|Z\|_* + \lambda \|E\|_{2,1} \quad \text{subject to} \quad X = XZ + E. \qquad (8) $$

Intuitively, the number of subspaces $\hat{k}$ can be estimated by directly calculating the rank of $Z^*$ or $AZ^*$, or by counting the number of significant singular values. In the experiments, we will show that large and small singular values become very distinctive after LRSR, making direct counting possible.
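As an illustration, a direct count of the significant singular values of $Z^*$ can be as simple as the following sketch; the relative cutoff `tol` is an assumed numerical tolerance, since the paper fixes no explicit value for "significant."

```python
import numpy as np

def count_subspaces_direct(Z_star, tol=1e-6):
    """Count the singular values of Z* that are significant relative
    to the largest one; tol is an assumed relative cutoff."""
    s = np.linalg.svd(Z_star, compute_uv=False)
    return int(np.sum(s > tol * s[0]))
```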

B. Parameter-Free Soft-Thresholding

An automatic approach that avoids directly counting large singular values is soft-thresholding. Here, the data Laplacian matrix $L$ can be used, because the number of signal subspaces corresponds to the number of zero singular values of $L$; for noisy data in practice, one counts the number of small singular values (less than a threshold $\tau$). Let $U^* \Sigma^* (V^*)^T$ be the reduced form of the SVD of $Z^*$, where $\Sigma^*$ contains only the positive singular values, and $U^*$ and $V^*$ are constructed by taking the first $r$ columns of $U$ and $V$, respectively. An affinity matrix $W$ and the Laplacian matrix $L$ can be formed as

$$ [W]_{ij} = \left( [\tilde{U}\tilde{U}^T]_{ij} \right)^2 \qquad (9) $$

where $\tilde{U} = U^* (\Sigma^*)^{1/2}$, and

$$ L = I - D^{-1/2} W D^{-1/2} \qquad (10) $$

where $D = \operatorname{diag}\big(\textstyle\sum_j [W]_{1j}, \ldots, \sum_j [W]_{nj}\big)$. Using soft-thresholding, the singular values $\{\sigma_i\}_{i=1}^n$ of the Laplacian matrix $L$ can be used to estimate the number of subspaces

$$ \hat{k} = n - \operatorname{round}\left( \sum_{i=1}^{n} f_\tau(\sigma_i) \right) \qquad (11) $$

where $f_\tau(\cdot)$ is a soft-thresholding operator defined as

$$ f_\tau(\sigma_i) = \begin{cases} 1 & \text{if } \sigma_i \ge \tau \\ \log_2\!\left( 1 + \dfrac{\sigma_i^2}{\tau^2} \right) & \text{if } \sigma_i < \tau \end{cases} \qquad (12) $$

where $\tau$ is a threshold taking a value within $[0, 1]$; in this research, $\tau = 1$. We find that the estimate $\hat{k}$ from (11) is sensitive to the parameter $\tau$. In reality, the smallest subspace number whose error level matches (or is very close to) that of the full SVD of $Z^*$ (i.e., with all of the singular values included) is considered to represent the true rank. Let the error level be defined as

$$ \mathrm{Error} = \frac{\|X - X_r\|_F}{\|X\|_F} \qquad (13) $$

where $X_r = AZ_r$ denotes the reduced data set using the largest $r$ positive singular values. Then $\hat{k}$ should be the smallest value whose error level is the same (or almost the same) as the one obtained using $Z^*$ with all singular values. Interestingly, the experiments will show that this corresponds to the estimate when $\tau = 1$; this means that singular values larger than 1 are significant, and the others are insignificant.

Note that the output of the soft-thresholding operator in (12) is a continuous function. Contrary to hard-thresholding, where values above the threshold are kept and values below the threshold are deleted, the soft-thresholding function makes a smooth transition at the threshold: values slightly below the threshold are not set to zero but are gradually attenuated. Such soft-thresholding is widely used in image processing to generate smooth output with fewer artifacts. In this research, as shown in (11), values less than $\tau$ still contribute to the count, so the final estimate is less sensitive to the specific value of $\tau$ than with hard-thresholding.
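Putting (9)-(12) together, a compact numpy sketch of the estimator might look as follows; `rank_tol`, used to keep the "positive" singular values of $Z^*$, is an assumed numerical cutoff.

```python
import numpy as np

def estimate_k_soft(Z_star, tau=1.0, rank_tol=1e-6):
    """Subspace count via the Laplacian soft-thresholding rule (9)-(12)."""
    U, s, _ = np.linalg.svd(Z_star)
    r = int(np.sum(s > rank_tol))             # reduced SVD: positive part only
    U_tilde = U[:, :r] * np.sqrt(s[:r])       # U* (Sigma*)^(1/2)
    W = (U_tilde @ U_tilde.T) ** 2            # affinity matrix, eq. (9)
    d_inv = 1.0 / np.sqrt(W.sum(axis=1))      # D^(-1/2) from row sums of W
    Lap = np.eye(W.shape[0]) - d_inv[:, None] * W * d_inv[None, :]  # eq. (10)
    sigma = np.linalg.svd(Lap, compute_uv=False)
    f = np.where(sigma >= tau, 1.0,
                 np.log2(1.0 + sigma ** 2 / tau ** 2))              # eq. (12)
    return int(W.shape[0] - round(f.sum()))                         # eq. (11)
```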

IV. EXPERIMENTS

A. Data Sets

Four real data sets are used in the experiments and are briefly introduced below.

The Salinas-A data set in Fig. 1(a) is a subset of the Salinas image collected by the AVIRIS sensor over Salinas Valley, California. Twenty water-absorption bands (108-112, 154-167, and 224) are removed, yielding a total of 204 spectral bands. It has six classes, and the image spatial size is 86 x 83 pixels.

The Indian Pines scene in Fig. 1(b) is a subset of a larger scene collected by the AVIRIS sensor over the Indian Pines test site in northwestern Indiana. It has 16 classes and 224 spectral bands acquired in the 0.4-2.5 μm region, and it is composed of 145 x 145 pixels. The scene contains two-thirds agriculture and one-third forest or other natural perennial vegetation. The bands covering the water-absorption region are removed, reducing the number of bands to 200.

The Pavia University scene in Fig. 1(c) was acquired by the ROSIS sensor over Pavia in northern Italy. It has 103 spectral bands, 612 x 340 pixels, and nine classes. To save computational cost, this data set is downsampled by a factor of 4, keeping the center pixel of each 4 x 4 nonoverlapping patch.

The Pavia Center scene in Fig. 1(d) was also acquired by the ROSIS sensor over Pavia. It has 102 spectral bands, 1096 x 723 pixels, and nine classes. This data set is downsampled by a factor of 7, keeping the center pixel of each 7 x 7 nonoverlapping patch.
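For reproducibility, the patch-center downsampling used for the two Pavia scenes can be sketched as follows; the function name and the (height, width, bands) cube layout are our assumptions.

```python
import numpy as np

def center_pixel_downsample(cube, f):
    """Keep the center pixel of each f x f nonoverlapping spatial patch
    of an (H, W, bands) cube, as described for the Pavia scenes."""
    H, W, _ = cube.shape
    c = f // 2
    return cube[c:H - (H % f):f, c:W - (W % f):f, :]
```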


Fig. 2. Singular value distribution of Salinas-A.

Fig. 1. Image scenes used in experiments. (a) Salinas-A (Band 75). (b) Indian Pines (Band 150). (c) Pavia University (Band 45). (d) Pavia Center (Band 100).

TABLE I
ESTIMATION RESULTS FROM DIFFERENT METHODS

Fig. 3. Singular value distribution of Indian Pines.

B. Performance of the Proposed Methods

We consider the number of classes in each data set to be the actual number of signal subspaces, under the assumption that one class occupies one subspace. The results of all three methods in [3] (i.e., HFC, NWHFC, and NSP) are shown in Table I; as expected, they may vary as the false-alarm probability $P_F$ changes. Note that these methods are derived from detection theory, and $P_F = 10^{-4}$ may be a reasonable choice for hyperspectral imagery. The results are not very accurate. The results from HySime [10] are also listed, which are larger than the actual values. "Not applicable" (N/A) means that the LRR method (with the estimation methods in Section III) cannot produce reasonable estimates.

Since the matrix $Z^*$ is a low-rank representation of $X$ with respect to $A$ (here, $A = X$), one way to estimate the actual rank is to take the singular value decomposition of $Z^*$ and count the number of significantly "large" singular values. Figs. 2-5 show the singular value distributions of the four data sets. In each case, the singular values are either relatively large or extremely small. When the effects of noise and outliers become negligible in the low-rank matrices from LRSR, this approach works very well, and a sudden jump separating the two groups can be clearly seen. The estimated numbers of 6, 15, 8, and 8 (obtained by directly calculating the rank of $Z^*$ or counting the number of significantly large singular values) are fairly close to the ground truth of 6, 16, 9, and 9 for the four data sets, respectively.
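One simple way to automate the visual inspection of Figs. 2-5 is to place the rank estimate at the largest log-ratio between consecutive singular values; this gap criterion is our illustrative assumption rather than a rule from the paper.

```python
import numpy as np

def estimate_k_by_gap(Z_star):
    """Locate the 'sudden jump' between large and near-zero singular
    values of Z* as the largest log-ratio of consecutive values."""
    s = np.linalg.svd(Z_star, compute_uv=False)
    s = s[s > 0]
    ratios = np.log(s[:-1]) - np.log(s[1:])
    return int(np.argmax(ratios)) + 1   # rank = position of the gap
```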


TABLE II
SUBSPACE ESTIMATES WITH SOFT-THRESHOLDING FOR THE SALINAS-A DATA SET

TABLE III
SUBSPACE ESTIMATES WITH SOFT-THRESHOLDING FOR THE INDIAN PINES DATA SET

Fig. 4. Singular value distribution of Pavia University.

TABLE IV
SUBSPACE ESTIMATES WITH SOFT-THRESHOLDING FOR THE PAVIA UNIVERSITY DATA SET

Fig. 5. Singular value distribution of Pavia Center.

It is worth mentioning that the singular value distribution of the original data $X$ does not present a sudden change of values; direct rank calculation yields the total number of bands, and directly counting the number of significantly large singular values is difficult. A similar situation occurs when using the low-rank matrix from LRR. Hence, LRSR is truly powerful in retrieving signal subspaces, even when the data include outliers and noise.

When using the soft-thresholding technique, the error level associated with the output of the LRSR is minimized when $X_r$ equals $AZ$. In the case of Salinas-A, as shown in Table II, setting $\tau$ to 1 (the largest value allowed) in the soft-thresholding operator yields $\hat{k} = 6$, whose error level is very close to the one using all the singular values (to four decimal places). Table III shows the soft-thresholding results for the Indian Pines data set as the value of $\tau$ is increased. When $\tau$ is as large as 0.98, the error is similar to the one using all the singular values, and the resulting estimate is 15.

TABLE V
SUBSPACE ESTIMATES WITH SOFT-THRESHOLDING FOR THE PAVIA CENTER DATA SET

However, if $\tau$ is set to the maximal value of 1, the resulting estimate is 16, which equals the number of classes. Similarly, Table IV shows the LRSR result for the Pavia University data set, where the estimated rank is 8, and Table V shows the result for the Pavia Center data set. Interestingly, when $\tau = 1$, the results are more accurate or unchanged. The physical meaning of $\tau$ is that it thresholds the significant singular values of the data Laplacian $L$: a singular value less than $\tau$ is insignificant for $L$. Thus, we can simply and reasonably consider singular values less than 1 to be insignificant.
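To illustrate the error-level criterion in (13), the following sketch selects the smallest rank whose reconstruction error matches that of the full SVD of $Z^*$ (with $A = X$, so $X_r = XZ_r$); the tolerance `rel_tol` is an assumed value standing in for "almost the same."

```python
import numpy as np

def smallest_rank_matching_full_error(X, Z_star, rel_tol=1e-4):
    """Return the smallest r whose error level (13), computed from the
    rank-r truncation Z_r of Z*, matches the full-SVD error level."""
    U, s, Vt = np.linalg.svd(Z_star)
    err = lambda Zr: np.linalg.norm(X - X @ Zr) / np.linalg.norm(X)
    full = err(U @ np.diag(s) @ Vt)          # all singular values kept
    for r in range(1, len(s) + 1):
        Zr = (U[:, :r] * s[:r]) @ Vt[:r]     # rank-r truncation of Z*
        if err(Zr) <= full + rel_tol:
            return r
    return len(s)
```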


TABLE VI
IMPACT OF THE SELECTION OF THE REGULARIZATION PARAMETER


TABLE VII
IMPACT OF NOISE (SNR IN dB) ON ESTIMATION ACCURACY

TABLE VIII
COMPUTING TIME OF THE PROPOSED METHODS (IN SECONDS) AND DATA RECONSTRUCTION ERROR

Fig. 6. Six signatures used in computer simulations.

C. Impact of the Regularization Parameter

Table VI shows how the estimates vary with $\lambda$. It can be seen that the value of the regularization parameter $\lambda$ suggested by [19] (i.e., $\lambda = 1/\sqrt{n}$, where $n = 7138$, $21\,025$, $13\,005$, and $16\,171$ for the Salinas-A, Indian Pines, Pavia University, and Pavia Center data sets, respectively) produces the estimates closest to the ground truth.

D. Impact of Noise

The synthetic data set has six classes, each with 204 dimensions. A total of 3200 pixels are generated by linearly mixing the six signatures in Fig. 6 with random abundances within $[0, 1]$, and additive uncorrelated Gaussian noise is added. By varying the signal-to-noise ratio (SNR, in dB), the performance of the different algorithms is measured and compared on this synthetic data set. Table VII shows that LRSR produces the most reliable results with respect to noise tolerance.
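A sketch of this simulation protocol follows; the function name and the random seed are ours, and since the paper does not say how the abundances are normalized, plain uniform draws on [0, 1] are assumed.

```python
import numpy as np

def make_synthetic(signatures, n_pixels=3200, snr_db=30, seed=0):
    """Linearly mix endmember signatures (bands x k) with abundances
    drawn uniformly from [0, 1], then add white Gaussian noise scaled
    to the requested SNR, as described in Section IV-D."""
    rng = np.random.default_rng(seed)
    k = signatures.shape[1]
    abund = rng.uniform(0.0, 1.0, size=(k, n_pixels))
    clean = signatures @ abund                     # linear mixture model
    noise_power = np.mean(clean ** 2) / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=clean.shape)
    return clean + noise
```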

E. Computational Cost

The major computational cost of the proposed methods is in the data decomposition, which is associated with the SVD of an $n \times n$ matrix and its computational complexity of $O(n^3)$. Table VIII shows the computing time of ALM and IALM executed on a 2.9-GHz eight-core Intel Xeon E5-2690 computer with 192 GB of memory. In the table, the data reconstruction error for LRSR itself is defined as $\|X - \hat{X}\| / \|X\| = \|X - XZ^* - E^*\| / \|X\|$. We can see that IALM is significantly faster than ALM, with a negligible difference in data reconstruction. Here, $J$, $Z$, $E$, and $Y$ are all initialized to 0. $\mu$ controls the convergence speed and does not obviously affect the data reconstruction error or the signal subspace estimate; however, a too small $\mu$ means the algorithm needs more iterations to converge, which increases the computational cost. Thus, in this experiment, $\mu$ is set to 0.001, and convergence is declared when $\max(\|X - AZ - E\|, \|Z - J\|) < 10^{-8}$.

V. CONCLUSION

Estimation of the number of signal subspaces is very challenging. Some algorithms may perform quite well on a certain data set but produce poor results on others. The LRSR technique, on the other hand, produces a cleaner low-rank matrix, which can be used for reliable estimation. Experiments show its consistently excellent performance across different data sets, and the results are highly correlated with the number of classes, as expected. The estimates from direct rank calculation and soft-thresholding are essentially consistent. The regularization parameter in LRSR can be set to a default value, and the threshold in the soft-thresholding method can be set to unity, which makes the proposed methods easy to implement.


REFERENCES

[1] K. Fukunaga, "Intrinsic dimensionality extraction," in Classification, Pattern Recognition and Reduction of Dimensionality (Handbook of Statistics, vol. 2), P. R. Krishnaiah and L. N. Kanal, Eds. Amsterdam, The Netherlands: North-Holland, 1982, pp. 347-360.
[2] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. New York, NY, USA: Academic, 1992.
[3] C.-I. Chang and Q. Du, "Estimation of number of spectrally distinct signal sources in hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 3, pp. 608-619, Mar. 2004.
[4] H. Akaike, "A new look at the statistical model identification," IEEE Trans. Autom. Control, vol. AC-19, no. 6, pp. 716-723, Dec. 1974.
[5] A. Barron, J. Rissanen, and B. Yu, "The minimum description length principle in coding and modeling," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2743-2760, Oct. 1998.
[6] P. Stoica and Y. Selen, "Model order selection: A review of information criterion rules," IEEE Signal Process. Mag., vol. 21, no. 4, pp. 36-47, Jul. 2004.
[7] E. R. Malinowski, "Determination of the number of factors and experimental error in a data matrix," Anal. Chem., vol. 49, pp. 612-617, 1977.
[8] K. Huang, Y. Ma, and R. Vidal, "Minimum effective dimension for mixtures of subspaces: A robust GPCA algorithm and its applications," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2004, pp. 631-638.
[9] B. Luo, J. Chanussot, S. Douté, and L. Zhang, "Empirical automatic estimation of the number of endmembers in hyperspectral images," IEEE Geosci. Remote Sens. Lett., vol. 10, no. 1, pp. 24-28, Jan. 2013.
[10] J. M. Bioucas-Dias and J. Nascimento, "Hyperspectral subspace identification," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 8, pp. 2435-2445, Aug. 2008.
[11] O. Kuybeda, D. Malah, and M. Barzohar, "Rank estimation and redundancy reduction of high dimensional noisy signals with preservation of rare vectors," IEEE Trans. Signal Process., vol. 55, no. 12, pp. 5579-5592, Dec. 2007.
[12] N. Acito, M. Diani, and G. Corsini, "A new algorithm for robust estimation of the signal subspace in hyperspectral images in the presence of rare signal components," IEEE Trans. Geosci. Remote Sens., vol. 47, no. 11, pp. 3844-3856, Nov. 2009.
[13] N. Acito, M. Diani, and G. Corsini, "Hyperspectral signal subspace identification in the presence of rare signal components," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 4, pp. 1940-1954, Apr. 2010.
[14] N. Acito, M. Diani, and G. Corsini, "Hyperspectral signal subspace identification in the presence of rare vectors and signal-dependent noise," IEEE Trans. Geosci. Remote Sens., vol. 51, no. 1, pp. 283-299, Jan. 2013.
[15] C. Andreou and V. Karathanassi, "Estimation of the number of endmembers using robust outlier detection method," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 1, pp. 247-256, Jan. 2014.
[16] Q. Du, "Virtual dimensionality estimation for hyperspectral imagery with a fractal-based method," in Proc. Workshop Hyperspectral Image Signal Process., Evolut. Remote Sens., Reykjavik, Iceland, 2010, pp. 1-4.
[17] R. Heylen and P. Scheunders, "Hyperspectral intrinsic dimensionality estimation with nearest-neighbor distance ratios," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 2, pp. 570-579, Apr. 2013.
[18] K. Cawse-Nicholson, S. B. Damelin, A. Robin, and M. Sears, "Determining the intrinsic dimension of a hyperspectral image using random matrix theory," IEEE Trans. Image Process., vol. 22, no. 4, pp. 1301-1310, Apr. 2013.
[19] E. Candès, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis?" J. ACM, vol. 58, no. 3, pp. 1-39, May 2011.
[20] G. Liu et al., "Robust recovery of subspace structures by low-rank representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 171-184, Jan. 2013.
[21] A. Sumarsono and Q. Du, "Estimation of number of signal subspaces in hyperspectral imagery using low-rank subspace representation," in Proc. Workshop Hyperspectral Image Signal Process., Evolut. Remote Sens., Lausanne, Switzerland, Jun. 2014, pp. 1-4.
[22] D. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods. New York, NY, USA: Academic, 1982.
[23] Z. Lin, M. Chen, and L. Wu, "The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices," University of Illinois at Urbana-Champaign, Champaign, IL, USA, Tech. Rep., Nov. 2009.

Alex Sumarsono (M’13) received the M.S. degree in electrical and computer engineering in 1985 from Iowa State University, Ames, IA, USA, and is currently working toward the Ph.D. degree in electrical and computer engineering at Mississippi State University, Starkville, MS, USA. At present, he is a member of the R&D organization at Cisco Systems as a Senior Technical Leader responsible for next-generation product development. Previously, he was a Senior Manager and Principal Engineer at Nortel Networks and a Senior Developer at Intel Corporation. His research interests include machine learning and pattern recognition.

Qian Du (S'98-M'00-SM'05) received the Ph.D. degree in electrical engineering from the University of Maryland, Baltimore County, Baltimore, MD, USA, in 2000. She is currently the Bobby Shackouls Professor with the Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS, USA. Her research interests include hyperspectral remote sensing image analysis, pattern recognition, and machine learning. Dr. Du served as Cochair of the Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society (GRSS) in 2009-2013 and as Chair of the Remote Sensing and Mapping Technical Committee of the International Association for Pattern Recognition (IAPR) in 2010-2014. She currently serves as an Associate Editor for the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, IEEE SIGNAL PROCESSING LETTERS, and the Journal of Applied Remote Sensing. She is a Fellow of SPIE, the International Society for Optics and Photonics.