An Efficient Method for Supervised Hyperspectral Band Selection


IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 8, NO. 1, JANUARY 2011

He Yang, Student Member, IEEE, Qian Du, Senior Member, IEEE, Hongjun Su, and Yehua Sheng

Abstract—Band selection is often applied to reduce the dimensionality of hyperspectral imagery. When the desired object information is known, it can be achieved by finding the bands that contain the most object information. It is expected that these bands can provide an overall satisfactory detection and classification performance. In this letter, we propose a new supervised band-selection algorithm that uses only the known class signatures, without examining the original bands or requiring class training samples. Thus, it can complete the task much faster than traditional methods that test bands or band combinations. Experimental results show that our approach generally yields better results than other popular supervised band-selection methods in the literature.

Index Terms—Band selection, hyperspectral imagery.

Manuscript received February 22, 2010; revised April 19, 2010 and May 19, 2010; accepted June 4, 2010. Date of publication July 29, 2010; date of current version December 27, 2010. H. Yang and Q. Du are with the Department of Electrical and Computer Engineering and Geosystem Research Institute in High Performance Computing Collaboratory, Mississippi State University, Mississippi State, MS 39762 USA (e-mail: [email protected]; [email protected]). H. Su and Y. Sheng are with the Key Laboratory of Virtual Geographic Environment of the Chinese Ministry of Education, School of Geography Science, Nanjing Normal University, Nanjing 210097, China (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/LGRS.2010.2053516

I. INTRODUCTION

DUE TO vast data volume, dimensionality reduction is often conducted before hyperspectral image analysis. It can be achieved by a transform-based approach (e.g., principal component analysis) or a band-selection method. Since band selection keeps a subset of the original bands, it may be preferred when the physical meaning of the original bands needs to be maintained.

Band selection can be conducted based on the availability of class information. When class information is known, supervised band selection can be applied to preserve the desired object information. When such information is unknown, unsupervised band selection has to be implemented to find the most informative and distinctive bands. In this letter, we limit the discussion to supervised band selection.

A large group of supervised band-selection algorithms calculates class separability when a subset of bands is selected; class separability may be measured with divergence, transformed divergence (TD), Bhattacharyya distance, or Jeffries–Matusita (JM) distance, and the band subset that yields the largest class separability is selected [1]–[5]. In this case, enough class samples are usually required to estimate class statistics. Other selection criteria include the spectral angle mapper (SAM) and orthogonal projection divergence (OPD) [6]. The aforementioned criteria measure the pairwise class distance and then take the

average of all the pairwise distances as the final value for band selection. To avoid testing all the possible combinations, subset forward-searching strategies, e.g., sequential forward selection (SFS) and sequential forward floating selection (SFFS), can be used [7]. Another group of algorithms employs a criterion to prioritize bands, and then dissimilar bands are selected from the bands with higher rankings. The ranking criteria include (unsupervised) variance (i.e., maximum-variance principal component analysis), (unsupervised) signal-to-noise ratio (SNR) (i.e., maximum-SNR principal component analysis), and (supervised) canonical-analysis-based minimum misclassification [i.e., minimum misclassification canonical analysis (MMCA)] [8]. These algorithms need to examine all the original bands for prioritization.

For simplicity, we adopt the SFS searching strategy in this letter. If an initial band is fixed, the next best band is the one providing the highest classification accuracy; after exhaustively testing all the possible initial bands, the final result is the band subset yielding the highest accuracy. Of course, this kind of band selection is unattractive in practice because classification needs to be conducted at each step, and it is computationally prohibitive if the selected classifier [e.g., a support vector machine (SVM)] is expensive to train and test. In this letter, we propose a simple but effective method that performs no classification during band selection, calculates no class statistics from training samples, and examines no original bands (or band combinations). It selects bands based on class spectral signatures only.

II. PROPOSED BAND-SELECTION METHOD

A. Theoretical Background

In a supervised situation where class signatures are known, the band-selection process can be greatly simplified, because we may utilize only the class signatures rather than the entire bands for band selection.
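Before detailing the proposed method, a minimal sketch of the separability criteria used by the traditional methods above (and by the comparison methods in Section III) may be helpful. The function names are hypothetical, and the two classes are assumed to be Gaussian with known means and covariances:

```python
import numpy as np

def bhattacharyya_distance(m1, c1, m2, c2):
    """Bhattacharyya distance between two Gaussian classes with
    means m1, m2 and covariance matrices c1, c2."""
    c = (c1 + c2) / 2.0
    dm = (m1 - m2).reshape(-1, 1)
    # Mean-separation term plus covariance-mismatch term.
    term1 = 0.125 * float(dm.T @ np.linalg.inv(c) @ dm)
    term2 = 0.5 * np.log(np.linalg.det(c) /
                         np.sqrt(np.linalg.det(c1) * np.linalg.det(c2)))
    return term1 + term2

def jm_distance(m1, c1, m2, c2):
    """Jeffries-Matusita distance, saturating in [0, 2]."""
    return 2.0 * (1.0 - np.exp(-bhattacharyya_distance(m1, c1, m2, c2)))
```

A separability-based selector would average such pairwise distances over all class pairs for each candidate band subset and keep the subset with the largest average, which is exactly the exhaustive testing that the proposed method avoids.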
Assume that there are p classes present in an image scene with L bands. Based on the linear-mixture model, a pixel r can be considered a mixture of the p endmembers. Let the endmember matrix be S = [s1, s2, ..., sp]. The pixel r can be expressed as

r = Sα + n    (1)

where α = (α1, α2, ..., αp)ᵀ is the abundance vector and n is uncorrelated white noise with E(n) = 0 and Cov(n) = σ²I (I is an identity matrix). The least squares estimate of α, denoted as α̂, can be obtained as

α̂ = (Sᵀ S)⁻¹ Sᵀ r.    (2)

1545-598X/$26.00 © 2010 IEEE
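As an illustration, the least squares estimate in (2) can be sketched as follows (a hypothetical helper; `S` holds one class signature per column):

```python
import numpy as np

def ls_abundance(S, r):
    """Least squares abundance estimate of (2):
    alpha_hat = (S^T S)^{-1} S^T r.
    S is L x p (one column per class signature); r is an L-vector pixel."""
    # Solving the normal equations is preferred to forming the
    # explicit inverse (S^T S)^{-1}.
    return np.linalg.solve(S.T @ S, S.T @ r)
```

For a noiseless mixed pixel r = Sα, this recovers α exactly; with white noise it is the minimum-variance unbiased estimate under the model of (1).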


According to [9], the stochastic features of α̂ include

E(α̂) = α,  Cov(α̂) = σ²(Sᵀ S)⁻¹.    (3)

If there are q classes present and q > p, which means that only p class signatures are known, then the noise n in (1) is not white anymore. Instead, Cov(n) = σ²Σ, where Σ is the noise covariance matrix. In this case, the abundance of the p classes can be estimated using the weighted least squares solution as

α̂ = (Sᵀ Σ⁻¹ S)⁻¹ Sᵀ Σ⁻¹ r.    (4)

According to [9], the first- and second-order moments of α̂ become

E(α̂) = α,  Cov(α̂) = σ²(Sᵀ Σ⁻¹ S)⁻¹.    (5)
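Similarly, the weighted least squares solution (4) and the abundance covariance of (5) can be sketched as follows (hypothetical helper names; Σ is assumed positive definite):

```python
import numpy as np

def wls_abundance(S, r, Sigma):
    """Weighted least squares abundance estimate of (4):
    alpha_hat = (S^T Sigma^{-1} S)^{-1} S^T Sigma^{-1} r."""
    Si = np.linalg.inv(Sigma)
    A = S.T @ Si @ S
    return np.linalg.solve(A, S.T @ Si @ r)

def wls_abundance_cov(S, Sigma, sigma2):
    """Abundance covariance of (5): sigma^2 (S^T Sigma^{-1} S)^{-1}."""
    Si = np.linalg.inv(Sigma)
    return sigma2 * np.linalg.inv(S.T @ Si @ S)
```

With Σ = I, both reduce to the white-noise expressions (2) and (3); for a noiseless pixel, the estimate is exact for any positive-definite Σ.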

This analysis is consistent with the understanding that when all the structured signal sources (or classes) can be extracted, the remaining noise can be modeled as independent and identically distributed Gaussian noise [10]. When such extraction is difficult, background (i.e., unknown classes) and noise whitening should be applied first. If class pixels can be removed beforehand, the resulting Σ represents the background and noise covariance, denoted as Σb+n. Using Σb+n⁻¹ in (4) and (5) for background and noise whitening may yield better performance. Here, for simplicity, all pixels are used for Σ estimation [11]. In practice, noise variance is rarely uniform even when all the signal sources are known and extracted. If an appropriate noise-variance estimate is available, then a diagonal noise covariance matrix Σn can be used for noise whitening, which may reduce model error.

B. Proposed Method

Intuitively, the selected bands should make the deviation of α̂ from the actual α as small as possible. When all the classes are known, this is equivalent to minimizing the trace of the covariance, i.e.,

arg min_ΦS trace[(Ŝᵀ Ŝ)⁻¹]    (6)

based on (3), where ΦS is the selected band subset and Ŝ is the matrix containing the class spectral signatures in ΦS. If the classes are partially known, it is equivalent to determining

arg min_ΦS trace[(Ŝᵀ Σ̂⁻¹ Ŝ)⁻¹]    (7)

based on (5), where Σ̂ is the data covariance matrix with the selected bands in ΦS only. The resulting band-selection algorithm is referred to as the minimum estimated abundance covariance (MEAC) method. The basic steps of the MEAC algorithm with the SFS searching strategy can be described as follows.
1) Initialize the algorithm by choosing a pair of bands B1 and B2. Then, ΦS = {B1, B2}.
2) Find a third band B3 such that (6) or (7) is minimized. Then, the selected band subset is updated as ΦS = ΦS ∪ {B3}.
3) Continue with step 2) until the number of bands in ΦS is large enough.
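The steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the signatures are stacked in an L × p array `S`, the function names are hypothetical, and the rank-deficient case (fewer selected bands than classes) is handled by summing the inverses of the nonzero eigenvalues only:

```python
import numpy as np

def meac_criterion(S_sub, Sigma_sub=None):
    """Trace criterion of (6)/(7) for a band subset.
    S_sub: k x p class signatures restricted to the selected bands.
    Sigma_sub: optional k x k data covariance for the partially
    known case of (7); None selects the fully known case of (6)."""
    if Sigma_sub is None:
        M = S_sub.T @ S_sub                               # (6)
    else:
        M = S_sub.T @ np.linalg.inv(Sigma_sub) @ S_sub    # (7)
    # When k < p, M is rank deficient: sum the inverses of the
    # nonzero eigenvalues instead of inverting M directly.
    ev = np.linalg.eigvalsh(M)
    ev = ev[ev > 1e-10]
    return float(np.sum(1.0 / ev))

def meac_sfs(S, n_bands, init=(0, 1), Sigma=None):
    """Greedy SFS: grow the band set by the band minimizing the criterion."""
    selected = list(init)
    L = S.shape[0]
    while len(selected) < n_bands:
        best, best_val = None, np.inf
        for b in range(L):
            if b in selected:
                continue
            idx = selected + [b]
            Sig = Sigma[np.ix_(idx, idx)] if Sigma is not None else None
            val = meac_criterion(S[idx, :], Sig)
            if val < best_val:
                best, best_val = b, val
        selected.append(best)
    return selected
```

Note that each step touches only the small p × p matrix built from the class signatures, which is why no classification, training samples, or full-band evaluations are needed.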


For an original image with L bands, after the initial band pair is determined, (L − 3) evaluations of (6) or (7) are needed when selecting the third band; after the first three bands are selected, (L − 4) evaluations are needed for the selection of the fourth band; and so on. The MEAC algorithm does not require training samples, does not conduct classification during band selection, and does not evaluate the original bands or band combinations. Thus, it is computationally efficient.

When the number of selected bands is less than the number of classes, the matrix Ŝᵀ Ŝ in (6) and the matrix Ŝᵀ Σ̂⁻¹ Ŝ in (7) are rank deficient. In this case, the nonzero eigenvalues can be computed first, and the trace of the inverse matrix is the sum of the inverses of these nonzero eigenvalues.

C. Initial Band Pair for Band Selection

Instead of exhaustively searching for the best initial bands or using a random initialization, MEAC can be initialized with the two bands whose dissimilarity is the largest based on maximum linear prediction error [12]. The detailed algorithm is as follows.
1) Randomly select a band A1, and project all the other L − 1 bands to its orthogonal subspace A1⊥.
2) Find the band A2 with the maximum projection in A1⊥, which is considered the most dissimilar to A1.
3) Project all the other L − 1 bands to the orthogonal subspace A2⊥, and find the band A3 with the maximum projection.
4) If A3 = A1, then A1 and A2 are confirmed to be the pair with the most significant dissimilarity, and the algorithm is terminated; if A3 ≠ A1, go to the next step.
5) Continue the algorithm until Ai+1 = Ai−1; then, either Ai−1 or Ai can be used as the band-selection initial B1 (or Ai−1 and Ai are used as the initial band pair B1 and B2).

We found that this method can always extract the two most distinctive bands regardless of the initial A1, although the following band selection may still result in a suboptimal set.

D. Performance Evaluation

In order to evaluate the amount of class information and class separability in the selected bands, a supervised classification algorithm, such as orthogonal subspace projection (OSP) [14], constrained linear discriminant analysis (CLDA) [15], or SVM, can be applied. It is worth mentioning that SVM provides hard classification, which may be suitable for images with fine spatial resolution. When pixel-level ground truth is unavailable, the classification maps from using all the original bands can be considered as ground truth, and those from the selected bands are compared with them using the spatial correlation coefficient ρ. An average ρ closer to one generally means better performance. This is under the assumption that using all the original spectral bands (after bad-band removal) provides the best or at least satisfactory classification performance. For classes with similar but separable spectra, this is a reasonable assumption [13]. Such a method based on image similarity provides quantitative performance assessment even in an unsupervised situation.

In the experiments, the MEAC algorithm is compared with other supervised band-selection algorithms where OPD, SAM,



Fig. 1. AVIRIS Lunar Lake scene.

Fig. 2. Band selection for AVIRIS Lunar Lake experiment.
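The spatial correlation coefficient ρ used in comparisons such as the one in Fig. 2 can be sketched as follows (a hypothetical helper; the inputs are per-class soft-classification maps from all bands and from the selected bands):

```python
import numpy as np

def spatial_corr(map_all, map_sel):
    """Pearson correlation between two classification maps of the
    same class, e.g., OSP outputs from all bands vs. selected bands."""
    a = map_all.ravel().astype(float)
    b = map_sel.ravel().astype(float)
    a -= a.mean()
    b -= b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))
```

Averaging this coefficient over all class maps gives the ρ curves plotted against the number of selected bands; identical maps give ρ = 1.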

Fig. 3. AVIRIS Cuprite scene.

Fig. 4. Band selection comparison for AVIRIS Cuprite experiment.

TD, and JM are adopted as searching criteria. To make the comparison feasible, training samples are selected manually based on known class signatures for TD and JM. The MMCA algorithm in [8] was not included for comparison since its performance in our experiments could not compete with the others. It is noteworthy that MEAC evaluates bands jointly, which offers efficiency in band selection, while the other criteria perform pairwise inspection.

III. EXPERIMENTS

A. AVIRIS Lunar Lake Experiment

The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data used in this experiment, as shown in Fig. 1, were taken from the Lunar Crater Volcanic Field in Northern Nye County, Nevada, with 200 × 200 pixels and 158 bands after low-SNR and water-absorption bands were removed. The spatial resolution is about 20 m, which means that a significant number of pixels are mixed. According to prior information, there are six classes: {cinders, playa, rhyolite, shade, vegetation, anomaly}. Since all six spectral signatures are available, (6) was used for band selection. The OSP classifier was chosen for soft classification, and the spatial correlation coefficient was calculated between the corresponding classification maps using all the original bands and the selected bands. The averaged correlation coefficient versus the number of selected bands is plotted in Fig. 2. As we can see, the proposed MEAC method significantly outperformed the other methods. OPD, TD, and JM performed similarly. The performance of SAM was the poorest.

B. AVIRIS Cuprite Experiment

As shown in Fig. 3, an AVIRIS Cuprite subimage of size 350 × 350 was used in this experiment. After the water-absorption and low-SNR bands were removed, 189 bands were left for band selection. This image scene is mineralogically well understood. At least five minerals are present: {alunite, buddingtonite, calcite, kaolinite, muscovite}. Based on previous studies, the actual number of classes in this scene is more than 20. Due to scene complexity, the signatures of many background classes are unknown. Therefore, (7) was used for band selection, with S including the five materials of interest. The CLDA classifier was adopted for classification, which can be implemented when only a part of the classes are to be classified [15]. The spatial correlation coefficient was calculated between the corresponding classification maps using all the original bands and using the selected bands. The averaged correlation coefficient versus the number of selected bands is plotted in Fig. 4. Once again, MEAC significantly outperformed the other methods. TD and JM provided similar performance, and the performances of OPD and SAM were close.

C. HYDICE DC Data

The hyperspectral data used in this experiment were taken by the airborne Hyperspectral Digital Imagery Collection Experiment (HYDICE) sensor. They were collected over the Mall in Washington, DC, with 210 bands covering the 0.4–2.4 μm spectral region. The low-SNR and water-absorption bands were deleted, resulting in 191 bands. This image has high spatial resolution, about 2.8 m. The original image was cropped into a subimage of size 304 × 301 pixels. The image in pseudocolor is shown in Fig. 5, which includes six classes: {road, grass,


Fig. 5. HYDICE DC data.

Fig. 6. Band-selection comparison for HYDICE data.


Fig. 7. HyMap Purdue data.

Fig. 8. Band selection comparison for HyMap data.

shadow, trail, tree, roof}. Training and test samples are available for this scene. Since all six spectral signatures are available, (6) was used for band selection. SVM was deployed for classification since it is suitable for hard classification of this image with high spatial resolution. Due to the availability of pixel-level ground truth, the overall accuracy (OA) was calculated using the selected bands. As shown in Fig. 6, MEAC still outperformed the other methods overall; TD and SAM performed similarly, and they all provided better classification than using all the original bands. The performance of OPD in this experiment was poorer than that of the others.

D. HyMap Purdue Data

In this experiment, the original HyMap image of an area close to Purdue University was cropped into a subimage with 377 × 512 pixels, shown in pseudocolor in Fig. 7. It includes six classes: {road, grass, shadow, soil, tree, roof}. The original image has 128 bands and about 3.5-m spatial resolution, and 126 bands participated in band selection after bad-band removal. Fig. 8 shows the OA values using different numbers of selected bands, where MEAC performed the best, JM ranked second, and OPD third. When more than three bands were selected, they all provided better classification than using all the bands.

E. Impact of Initial Band Pair Selection

To evaluate the initial bands used in MEAC, we exhaustively searched for the best initial bands for comparison. This is doable only if no training or test is required for classification (e.g., OSP and CLDA). It is not tractable for SVM-based classification

Fig. 9. Impact of initial band pair in AVIRIS Lunar Lake scene.

because the time required for training and test is prohibitive. For the Lunar Lake scene, 158 × 157/2 searches are required; for the Cuprite scene, the number of searches is 189 × 188/2. The upper and lower bounds from these searches were recorded. As shown in Figs. 9 and 10, MEAC with the initial band-selection method in Section II-C can provide results close to the upper bound of the performance, and the discrepancy becomes insignificant as the number of selected bands increases; moreover, bands can be selected much faster using MEAC. We also compared the case when MEAC started from an empty set, i.e., ΦS = ∅, which means that band selection starts from finding the first band. Table I lists the OA values from the SVM classifier for the HYDICE and HyMap data when using no initial band or using the two initial bands selected as in Section II-C. It seems that the algorithm is not sensitive to the initial conditions since the two cases yielded similar



When selecting bands from all the 158 original bands, SFFS is much more time-consuming than SFS. In addition, the SFS implementation is much more flexible because all the previously selected k − 1 bands remain the same when selecting the kth band; the SFFS and B&B methods do not have this property.

IV. CONCLUSION

Fig. 10. Impact of initial band pair in AVIRIS Cuprite scene.

TABLE I
OVERALL CLASSIFICATION ACCURACY (IN PERCENT) USING DIFFERENT INITIALS

In this letter, we have proposed a new method based on the minimum estimated abundance covariance for supervised band selection. It can outperform other frequently used band-selection methods. This method does not need training samples; all it needs is the class signatures. In addition, it does not examine all the original bands or band combinations. With the SFS searching strategy and the proposed initial-band-pair selection, our method can complete band selection very quickly.

ACKNOWLEDGMENT

The authors would like to thank Dr. D. Landgrebe at Purdue University for providing the HYDICE and HyMap data, and Dr. L. Zhang and Dr. X. Huang at Wuhan University for providing the training and test samples.

REFERENCES

TABLE II
COMPARISON OF DIFFERENT SEARCHING STRATEGIES USING 30-BAND LUNAR LAKE DATA PRESELECTED WITH MEAC

performance. However, using the two chosen initial bands, the searching process is faster because the search begins from the third band.

F. Impact of Searching Strategies

A comparison among the SFS, SFFS, and branch-and-bound (B&B) searching strategies was also conducted. B&B is an optimal feature-selection method [16]. However, it suffers from the curse of dimensionality and can be used for low-dimensional data only, because the number of levels in the solution tree grows with data dimensionality, causing the computational complexity to increase exponentially. To make the comparison feasible, 30 bands of the AVIRIS Lunar Lake data were preselected, and then the three searching strategies were applied for further selection. As shown in Table II, the number of searching-criterion evaluations required by B&B was significantly larger than those of SFS and SFFS, while the resulting classification accuracy was similar to that of SFFS and slightly higher than that of SFS. When selecting 5 from 30 bands, the numbers of criterion evaluations in SFS and SFFS were similar, but when selecting 15 out of 30 bands, the number required by SFFS tripled.
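The operation counts quoted in this letter are easy to tabulate. The sketch below follows the letter's own counting for SFS ((L − 3) evaluations for the third band, (L − 4) for the fourth, and so on) and the exhaustive initial-pair search sizes of Section III-E; the function names are hypothetical:

```python
def sfs_eval_count(L, k):
    """Criterion evaluations for SFS after a fixed initial band pair,
    per the letter: (L-3) for the 3rd band, ..., (L-k) for the k-th."""
    return sum(L - j for j in range(3, k + 1))

def exhaustive_pair_count(L):
    """Number of candidate initial band pairs: L(L-1)/2."""
    return L * (L - 1) // 2
```

For the Lunar Lake scene (L = 158), the exhaustive initial-pair search alone evaluates 12 403 pairs, which is why the projection-based initialization of Section II-C is attractive.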

[1] R. Pouncey, K. Swanson, and K. Hart, ERDAS Field Guide, 5th ed. Atlanta, GA: ERDAS Inc., 1999.
[2] J. A. Richards and X. Jia, Remote Sensing Digital Image Analysis: An Introduction, 4th ed. Berlin, Germany: Springer-Verlag, 2006, ch. 10, pp. 267–292.
[3] A. Ifarraguerri, "Visual method for spectral band selection," IEEE Geosci. Remote Sens. Lett., vol. 1, no. 2, pp. 101–106, Apr. 2004.
[4] R. Huang and M. He, "Band selection based on feature weighting for classification of hyperspectral imagery," IEEE Geosci. Remote Sens. Lett., vol. 2, no. 2, pp. 156–159, Apr. 2005.
[5] S. D. Backer, P. Kempeneers, W. Debruyn, and P. Scheunders, "A band selection technique for spectral classification," IEEE Geosci. Remote Sens. Lett., vol. 2, no. 3, pp. 319–323, Jul. 2005.
[6] C.-I Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification. New York: Kluwer, 2003, ch. 2, pp. 15–35.
[7] P. Pudil, J. Novovicova, and J. Kittler, "Floating search methods in feature selection," Pattern Recognit. Lett., vol. 15, no. 11, pp. 1119–1125, Nov. 1994.
[8] C.-I Chang, Q. Du, T.-L. Sun, and M. L. G. Althouse, "A joint band prioritization and band decorrelation approach to band selection for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 37, no. 6, pp. 2631–2641, Jun. 1999.
[9] R. A. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis, 6th ed. Englewood Cliffs, NJ: Prentice-Hall, 2007.
[10] B. Thai and G. Healey, "Invariant subpixel detection in hyperspectral imagery," IEEE Trans. Geosci. Remote Sens., vol. 40, no. 3, pp. 599–608, Mar. 2002.
[11] Q. Du, H. Ren, and C.-I Chang, "A comparative study for orthogonal subspace projection and constrained energy minimization," IEEE Trans. Geosci. Remote Sens., vol. 41, no. 6, pp. 1525–1529, Jun. 2003.
[12] Q. Du and H. Yang, "Similarity-based unsupervised band selection for hyperspectral image analysis," IEEE Geosci. Remote Sens. Lett., vol. 5, no. 4, pp. 564–568, Oct. 2008.
[13] R. V. Platt and A. F. H. Goetz, "A comparison of AVIRIS and Landsat for land use classification at the urban fringe," Photogramm. Eng. Remote Sens., vol. 70, no. 7, pp. 813–819, Jul. 2004.
[14] J. Harsanyi and C.-I Chang, "Hyperspectral image classification and dimensionality reduction: An orthogonal subspace projection approach," IEEE Trans. Geosci. Remote Sens., vol. 32, no. 4, pp. 779–785, Jul. 1994.
[15] Q. Du, "Modified Fisher's linear discriminant analysis for hyperspectral imagery," IEEE Geosci. Remote Sens. Lett., vol. 4, no. 4, pp. 503–507, Oct. 2007.
[16] P. M. Narendra and K. Fukunaga, "A branch and bound algorithm for feature subset selection," IEEE Trans. Comput., vol. C-26, no. 9, pp. 917–922, Sep. 1977.