IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 8, NO. 5, SEPTEMBER 2011
On the Use of Feature Selection for Classifying Multitemporal Radarsat-1 Images for Forest Mapping
Yasser Maghsoudi, Michael J. Collins, Senior Member, IEEE, and Donald Leckie
Abstract—As the number of satellite-borne SAR systems increases, both the availability and the length of multitemporal (MT) sequences of SAR images have increased. Reported research with MT SAR sequences suggests that, for all applications, they increase classification accuracy over single-date images. The length of the MT SAR sequences reported in the literature is still quite modest, on the order of six images. As the length of a sequence increases, the selection of images to use in a classification becomes important. The current practice is to add scenes chronologically, and some researchers have suggested that image selection does not affect classification accuracy. Our research explored the problem of image selection in MT SAR classification. We compared the chronological selection scheme with two feature selection algorithms: a very simple algorithm and a more complex class-based algorithm. We found that, while the simple feature selection algorithm was more efficient than chronological selection, yielding peak accuracy with few features, it saturated at the same accuracy as chronological selection. The more complex algorithm was significantly more accurate than chronological selection, even with just two features. Our results suggest that a feature selection algorithm produces more efficient and more accurate classification results than chronological selection.
Index Terms—Forestry, synthetic aperture radar (SAR), time series analysis.
Manuscript received August 24, 2010; revised February 4, 2011; accepted March 28, 2011. Date of publication May 31, 2011; date of current version August 26, 2011. Y. Maghsoudi and M. J. Collins are with the Department of Geomatics Engineering, University of Calgary, Calgary, AB T2N 1N4, Canada. D. Leckie is with the Pacific Forestry Centre, Victoria, BC V8Z 1M5, Canada. Digital Object Identifier 10.1109/LGRS.2011.2140311

I. INTRODUCTION

FOREST mapping is one of the core applications of remote sensing. While the majority of studies are based on multispectral optical images, weather conditions often limit the use of these data. Synthetic aperture radar (SAR) data, on the other hand, are not only independent of weather conditions but also sensitive, depending on the radar frequency, to the geometry of both the canopy and the branching structure. For these reasons, SAR data have consistently been shown to be a useful tool for forest mapping. Despite these advantages, the presence of speckle noise makes it difficult to obtain satisfactory results from the classification of SAR data [1]; that is, the data have poor radiometric resolution. A speckle filtering preprocessing step is therefore generally necessary in any SAR image analysis. The goal of speckle filtering is to provide an estimate of the scattering coefficient, such as σ⁰, at each pixel, i.e., to increase the radiometric resolution. The most common method reported in the literature is to apply the speckle filter in the spatial domain [1], [2]. The drawback of spatial speckle filters is that they reduce the spatial resolution of the image data, which is an issue when the analysis aims at extracting or analyzing spatial features. In addition, a single application of a spatial filter is sometimes not enough to raise the radiometric resolution (often measured by the equivalent number of looks, ENL) to the level the analysis requires. Alternatively, when multitemporal (MT) data are available, one can filter, at least partly, in the temporal dimension, increasing the radiometric resolution without the associated loss of spatial resolution [3].

Bruniquel and Lopes [4] developed several filters, including two that assumed heterogeneous speckle: one that minimized the temporal variance and another that minimized a mean square error. The former was found to be better at reducing speckle, while the latter was better at preserving spatial structure. Although these algorithms performed excellently, they were computationally intensive. Gineste [5] developed an adaptive filter inspired by Lee’s sigma filter. Quegan and Yu [6] simplified the filter of Bruniquel and Lopes to produce an algorithm with a simple form that allows a recursive implementation and performs excellently as measured by the increase in ENL. In analyzing the recursive behavior of their MT filter, Quegan and Yu [6] added images to the filter chronologically and stated that the order in which the images are added does not matter, i.e., “all images will end up being filtered in the same way, whatever the order.” In an agricultural application demonstrating the value of MT data, Skriver [7] also added his images chronologically. Quegan and Yu further observed that, as they added images to the MT data set, the ENL began to saturate, and they derived an expression for the maximum ENL value.
However, the number, and particularly the selection, of scenes required to achieve this saturating value is an open question. MT SAR data have been used by several investigators to improve classification accuracies for several land covers [8]–[10]. While the vast majority of studies use a spatial speckle filter as a preprocessing step [11]–[13], a few authors have started reporting the use of MT filters [8], [9], [14]. In their forest/nonforest classification, Quegan et al. [10] found that a single-stage MT filter did not increase the ENL enough to generate good results. To increase the ENL further, they added a spatial filtering stage to create a two-stage filtering system. This, of course, reduced the spatial resolution; however, in most forest mapping applications, the detailed spatial structure of the image is not the focus of attention. In reporting the performance of their MT filters, most authors use some measure of radiometric resolution, such as the ENL [4]–[6] or the coefficient of variation [15]. In a more recent comparison of three MT filters, Trouve et al. [16] combined radiometric criteria with a few operational criteria. However, these operational criteria focused on the extraction of spatial features from the image and are thus not relevant for forest mapping, where the goal is to relate the scattering coefficient of regions to forest characteristics. While forest mapping may be a pixel-based analysis, it is not focused on spatial features.

In this letter, we have two objectives. First, we propose the use of classification accuracy as the relevant operational metric for comparing MT filtering scenarios, and we use McNemar’s test to evaluate the significance of accuracy differences. Second, we demonstrate that classification accuracy is influenced by the selection of the SAR scenes. Although Quegan and Yu stated that the order in which scenes are added has no effect on the results, we employ two feature selection algorithms to show that classification accuracy saturates with fewer scenes when the scenes are chosen by feature selection than when they are added chronologically.

1545-598X/$26.00 © 2011 IEEE

II. DATA SET

A. Study Site

The study site is the Petawawa Research Forest (PRF), located near Chalk River, Ontario, Canada (45°57′ N, 77°34′ W), approximately 200 km west of Ottawa and 180 km east of North Bay. This experimental forest covers more than 100 km² and is characterized by white, red, and jack pines, white and black spruces, poplar, and red oak. About 85% of the PRF is productive forestland, with growing stock estimated at 1.5 million m³. Harvesting schedules vary depending on research program needs, but the typical volume of harvested wood is 2400–7000 m³/year.

TABLE I. MT Radarsat-1 SAR data used in this letter. Asc is an ascending pass, and Dsc is a descending pass. F5 is Fine mode beam 5, F5F is Fine mode beam 5 far, and F3 is Fine beam 3. The images have been organized into three groups, as described in the text. These groups are as follows: 1) (left) 17 ascending F5 mode images from the summer; 2) (right, top) 6 ascending F3 mode images from the summer; and 3) (right, bottom) 11 descending F3 mode images from the winter.
B. Experimental Data

Thirty-four Radarsat-1 ascending and descending images acquired from August 1996 to February 2007 were used in this letter; they are listed in Table I. These data are a mixture of fine-mode beams, as shown in the table, with nominal spatial resolutions of 7.1 m × 8.4 m for F5 and 7.6 m × 8.4 m for F3, and a pixel spacing of 6.25 m in both range and azimuth. The incidence angle ranges are 41.5°–44.0° for F3, 45.3°–47.8° for F5, and 45.6°–47.8° for F5 far. All fine-mode Radarsat-1 image data were processed with one look. Based on season and imaging mode, the MT images were then divided into three groups, as shown in the table: summer-F3-A (6 images: 12, 13, 18, 20, 25, and 27), summer-F5-A (17 images: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 19, 24, 26, 28, and 32), and winter-F3-D (11 images: 15, 16, 17, 21, 22, 23, 29, 30, 31, 33, and 34). As explained hereinafter, each group was MT filtered separately. Reference data were collected using a digital land use inventory and a set of Landsat images. The classes and the numbers of training and test samples are listed in Table II.

TABLE II. Classes and Numbers of Training and Testing Samples Used in the Classification

III. METHODS

Our analysis consists of five steps: preprocessing, MT filtering, feature selection, classification, and classifier combination. Preprocessing consists of two parts: calculation of the calibrated radar backscattering coefficient σ⁰ from the digital number at each pixel, and registration of the images in the MT sequences. The total RMS error of the registration was 1.2, and the x and y RMS errors were both less than 1.0.

In our research, we used the MT filter developed by Quegan and Yu [6]. This filter assumes that the correlation between the temporal images can be neglected and takes the simple form

$$J_k(x,y) = \frac{\langle I_k \rangle}{M} \sum_{i=1}^{M} \frac{I_i(x,y)}{\langle I_i \rangle} \qquad (1)$$

where M is the number of images, I_i(x, y) is the intensity of the ith image at pixel (x, y), and ⟨I_k⟩ can be calculated by spatially averaging the kth image over a small window. Hence, in addition to the pixel-by-pixel summation in (1), the calculation of ⟨I_k⟩ requires spatial averaging, which degrades the geometric resolution. Although Quegan and Yu reported mixed results, an adaptive filter can ameliorate this problem. We therefore adopted an adaptive filtering method, the enhanced Frost filter [17], which is designed to smooth out noise while retaining edges and shape features in the image and thus mitigates the loss of spatial resolution implied by the spatial averaging.
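Equation (1) can be illustrated with a short NumPy sketch. It is a minimal sketch under stated assumptions, not the processing chain used in this letter: the local mean ⟨I⟩ is estimated with a plain box-car window of hypothetical size `win`, whereas the letter uses the adaptive enhanced Frost filter, and the function names (`local_mean`, `quegan_yu_filter`, `enl`) and the simulated data are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_mean(img: np.ndarray, win: int = 5) -> np.ndarray:
    """Box-car estimate of <I> over a win x win window with edge padding.

    Assumption: stands in for the adaptive (enhanced Frost) spatial
    filter used in the letter.
    """
    if win % 2 == 0:
        raise ValueError("win must be odd")
    pad = win // 2
    padded = np.pad(img, pad, mode="edge")
    return sliding_window_view(padded, (win, win)).mean(axis=(-2, -1))


def quegan_yu_filter(stack: np.ndarray, win: int = 5, eps: float = 1e-12) -> np.ndarray:
    """Apply eq. (1) to a stack of shape (M, rows, cols).

    stack holds M co-registered intensity images (linear sigma-nought).
    Returns J with the same shape, where J_k = <I_k> / M * sum_i I_i / <I_i>.
    """
    M = stack.shape[0]
    means = np.stack([local_mean(img, win) for img in stack])  # <I_i>, shape (M, rows, cols)
    ratio_sum = (stack / (means + eps)).sum(axis=0)             # sum_i I_i / <I_i>
    return means * ratio_sum / M                                # broadcast over k


def enl(img: np.ndarray) -> float:
    """Equivalent number of looks of a homogeneous region: mean^2 / variance."""
    return float(img.mean() ** 2 / img.var())


# Usage on simulated single-look speckle over a homogeneous area (hypothetical data).
rng = np.random.default_rng(0)
stack = 0.1 * rng.exponential(1.0, size=(10, 64, 64))  # M = 10 one-look intensity images
filtered = quegan_yu_filter(stack, win=5)
print(f"ENL before: {enl(stack[0]):.2f}, after: {enl(filtered[0]):.2f}")
```

On a constant-backscatter stack the filter returns the input unchanged, and on simulated one-look speckle the ENL of each filtered image rises well above the single-look value of about 1, which is the radiometric gain the MT filter is meant to deliver.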
A. Feature Selection

The M MT-filtered images may be thought of as features and are used to build the feature space required for classification. For example, if we use all the images in the MT set, we will have a 34-dimensional feature space. However, we cannot use all 34 images because of the small-sample-size problem. Thus, a key stage in classifier design is the selection of the most discriminative and informative features. Feature selection is the process of finding the “best” subset of features to use for classification, where “best” is defined by an evaluation function used to compare feature subsets. Because it is not feasible to explore every combination of features (34 features already yield 2³⁴ − 1 ≈ 1.7 × 10¹⁰ nonempty subsets), practical feature selection algorithms are suboptimal and consist of a search algorithm and an evaluation function. The search algorithm generates subsets of features from the original feature space, and the evaluation function compares these subsets in terms of discrimination. A large number of feature selection algorithms have been proposed, e.g., [18] and [19].

The first and most commonly used group of feature selection methods are sequential methods. They begin with a single solution (a feature subset) and progressively add and discard features according to a certain strategy. These methods range from sequential forward selection (SFS) to sequential backward selection (SBS) [20]. SFS starts from an empty set and iteratively generates new feature sets by adding the feature chosen by some evaluation function. SBS, on the other hand, starts from the complete set and generates new subsets by removing a feature chosen by some evaluation function. The main problem with these two algorithms is that features, once selected, cannot be removed (SFS), and features, once discarded, cannot be reselected (SBS). To overcome these problems, Pudil et al. [21] proposed floating versions of SFS and SBS. The sequential forward floating selection (SFFS) algorithm can backtrack without limit as long as it finds a better feature subset; sequential backward floating selection is its backward counterpart.
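The SFFS search just described can be sketched in Python as follows. This is a minimal illustration, not the authors’ implementation: it assumes NumPy, a Gaussian model of each class, and equal class priors, and it uses as evaluation function the prior-weighted JM distance that the letter adopts (defined in the next paragraph). The pairwise term is computed from the Bhattacharyya distance B as J = 2(1 − e^(−B)), a common Gaussian form that the letter itself does not spell out. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def jm_pair(m1, S1, m2, S2):
    """JM distance between two Gaussian classes, 2 * (1 - exp(-B))."""
    S = 0.5 * (S1 + S2)
    d = m1 - m2
    _, logdet = np.linalg.slogdet(S)
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    # Bhattacharyya distance between N(m1, S1) and N(m2, S2).
    B = 0.125 * d @ np.linalg.solve(S, d) + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return 2.0 * (1.0 - np.exp(-B))


def make_jm_criterion(X, y):
    """Return f(subset) = sum_{i<j} P_i P_j J_ij, assuming equal priors P = 1/N."""
    classes = np.unique(y)
    prior = 1.0 / len(classes)

    def criterion(subset):
        cols = sorted(subset)  # same set -> identical computation
        stats = []
        for c in classes:
            Xc = X[y == c][:, cols]
            stats.append((Xc.mean(axis=0), np.atleast_2d(np.cov(Xc, rowvar=False))))
        total = 0.0
        for i in range(len(classes)):
            for j in range(i + 1, len(classes)):
                total += prior * prior * jm_pair(*stats[i], *stats[j])
        return total

    return criterion


def sffs(n_features, criterion, k_max):
    """Sequential forward floating selection.

    Returns {subset size: (best score, sorted feature tuple)}.
    """
    best = {}
    selected = []
    while len(selected) < k_max:
        # Forward step: add the feature that maximizes the criterion.
        remaining = [f for f in range(n_features) if f not in selected]
        f_add = max(remaining, key=lambda f: criterion(selected + [f]))
        selected = selected + [f_add]
        score = criterion(selected)
        k = len(selected)
        if k not in best or score > best[k][0]:
            best[k] = (score, tuple(sorted(selected)))
        # Conditional backward steps: drop a feature only if the smaller
        # subset beats the best subset of that size found so far.
        while len(selected) > 2:
            f_rm = max(selected, key=lambda f: criterion([g for g in selected if g != f]))
            reduced = [g for g in selected if g != f_rm]
            reduced_score = criterion(reduced)
            if reduced_score > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (reduced_score, tuple(sorted(reduced)))
            else:
                break
    return best


# Hypothetical data: 3 classes, 6 "images"; only features 1 and 4 carry class information.
rng = np.random.default_rng(0)
means = np.zeros((3, 6))
means[1, 1] = 3.0
means[2, 4] = 3.0
X = np.vstack([rng.normal(m, 1.0, size=(200, 6)) for m in means])
y = np.repeat([0, 1, 2], 200)
best = sffs(X.shape[1], make_jm_criterion(X, y), k_max=3)
for k, (score, subset) in sorted(best.items()):
    print(k, subset, round(score, 3))
```

The backward step only accepts a removal that strictly improves on the best subset recorded for that size, which is what lets SFFS backtrack without cycling. Replacing `make_jm_criterion` with any other subset-scoring function leaves the search unchanged.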
Comparing several feature selection algorithms, Jain and Zongker [22] showed that the SFFS method not only performs well even on high-dimensional problems but also has a feasible computational load. For these reasons, SFFS was used as the feature selection method in this letter. Its evaluation function is the Jeffries–Matusita (JM) distance

$$JM = \sum_{i=1}^{N} \sum_{j>i} P_i P_j J_{i,j}$$

where P_i and P_j are the prior probabilities of classes i and j, respectively, N is the number of classes, and J_{i,j} is the JM distance between classes i and j.

Most feature selection algorithms seek a single set of features that distinguishes among all the classes simultaneously. This can increase the complexity of the decision boundary between classes in the feature space [23]. In addition, using one set of features for all classes requires a large number of features. To overcome these problems, we developed a class-based feature selection (CBFS) algorithm [24]; the reader is referred to that paper for the details. The main idea of the CBFS method is that, among the many images in an MT data set, some discriminate a given class better than others do. The CBFS method works as follows. First, the feature selection process is applied to the first class, and the most appropriate features for discriminating the first class from the others are selected. Next, the same procedure is applied to the second class to select its most discriminative features. This process is repeated until feature subsets have been selected for all classes. The CBFS algorithm is shown in the shaded box of Fig. 1. The JM distance employed with the CBFS algorithm is the distance between one class and all the other classes

$$JM_{CB}(j) = \sum_{i=1}^{N} P_i P_j J_{i,j}$$

in which j is the class for which features are selected. We use the CBFS algorithm as an example of a more sophisticated feature selection algorithm.

Fig. 1. Schematic showing components of class-based classification of the MT image set. The shaded box is the CBFS algorithm, where JM_CB stands for the class-based JM distance.

B. Classification
After speckle filtering and feature selection, the images are ready for classification. The classification problem consists of two parts: choice of the classifier and selection of the input features. In this letter, we use the standard maximum likelihood classifier with equal prior probabilities for all classes. This classifier assumes that the image data have a Gaussian distribution; we found that the spatial and MT speckle filters transformed the data distribution into something closely resembling a Gaussian distribution.

Fig. 2. Overall classification accuracy versus number of images for chronological and selected images.

C. Classifier Combination

The CBFS algorithm produces a set of classifiers, one for each class. Once the posterior probabilities have been computed for all classes by all classifiers, a combination mechanism is used to merge the outputs of the individual classifiers. Several consensus rules exist for this combination. Since the classifier outputs are lists of probabilities for each class, measurement-level methods can be used to combine them [25]. The most commonly used measurement-level methods are the mean and product combination rules, which produce the same classification in most cases. For independent feature spaces, however, the product rule outperforms the mean rule [26]; hence, it was adopted as the combination method in this letter. According to the product combination rule, pixel x is assigned to class c_i if

$$\prod_{j=1}^{N} p(\mathbf{x}_j \mid c_i) = \max_{k=1,\ldots,N} \prod_{j=1}^{N} p(\mathbf{x}_j \mid c_k) \qquad (2)$$

where x_j denotes the feature vector of pixel x restricted to the subset selected for the jth class-specific classifier. With equal priors, the likelihoods p(x_j | c_k) rank the classes in the same order as the posterior probabilities.
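A minimal NumPy sketch of this classification stage follows. It assumes Gaussian class-conditional densities, as in the maximum likelihood classifier described above, and fits one classifier per class-specific feature subset. The product rule (2) is evaluated as a sum of log-likelihoods, which selects the same class while avoiding numerical underflow. The subsets in `class_subsets`, the function names, and the synthetic data are hypothetical stand-ins for CBFS output and for the Radarsat-1 samples.

```python
import numpy as np


def gaussian_loglik(X, mean, cov):
    """Log of the multivariate normal density N(mean, cov) at each row of X."""
    d = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", d, np.linalg.solve(cov, d.T).T)  # squared Mahalanobis distance
    return -0.5 * (maha + logdet + X.shape[1] * np.log(2.0 * np.pi))


def product_rule_predict(X_train, y_train, X_test, class_subsets):
    """One Gaussian ML classifier per feature subset, combined with eq. (2).

    class_subsets maps each class-specific classifier to the feature columns it
    uses. Equal priors are assumed, so log prod_j p(x_j | c_k) =
    sum_j log p(x_j | c_k) is enough to rank the classes.
    """
    classes = np.unique(y_train)
    log_prod = np.zeros((X_test.shape[0], len(classes)))
    for cols in class_subsets.values():
        for k, c in enumerate(classes):
            Xc = X_train[y_train == c][:, cols]
            mean = Xc.mean(axis=0)
            cov = np.atleast_2d(np.cov(Xc, rowvar=False))
            log_prod[:, k] += gaussian_loglik(X_test[:, cols], mean, cov)
    return classes[np.argmax(log_prod, axis=1)]


# Hypothetical synthetic data: 3 classes, 6 features, classes separated on features 1 and 4.
rng = np.random.default_rng(0)
means = np.zeros((3, 6))
means[1, 1] = 4.0
means[2, 4] = 4.0


def sample(n_per_class):
    X = np.vstack([rng.normal(m, 1.0, size=(n_per_class, 6)) for m in means])
    return X, np.repeat([0, 1, 2], n_per_class)


X_train, y_train = sample(300)
X_test, y_test = sample(300)
class_subsets = {0: [1, 4], 1: [1], 2: [4]}  # hypothetical CBFS output
pred = product_rule_predict(X_train, y_train, X_test, class_subsets)
print("overall accuracy:", (pred == y_test).mean())
```

Passing a single subset, e.g. `{"all": [1, 4]}`, reduces the function to one ordinary maximum likelihood classifier, which corresponds to the single-classifier chronological and SFFS experiments.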
The proposed CBFS method is shown schematically in Fig. 1. In this letter, we apply the system outlined in this schematic to the 34 calibrated, registered, and MT-filtered scenes in Table I.

IV. EXPERIMENTAL METHODS

We ran three experiments. In the first, we add the images to the classifier chronologically, following the order in Table I. In the second, we use the SFFS algorithm to select the features, and in the third, we use the CBFS algorithm. Note that in the first two experiments we have a single set of features and a single classifier, whereas in the third experiment we have a different set of features, and thus a different classifier, for each class; these classifier outputs are then combined. We report the overall accuracy. Finally, McNemar’s test was used to evaluate whether the differences in classification accuracy are statistically significant [27]. The null hypothesis H0 is that the two classifications do not differ from each other. The McNemar test statistic is

$$T = \frac{(f_{12} - f_{21})^2}{f_{12} + f_{21}}$$

where f_12 is the number of test samples correctly classified by the first classification and misclassified by the second, and f_21 is the number misclassified by the first and correctly classified by the second.
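The statistic T, together with the decision rule given in the following sentence of the letter, can be sketched in plain Python. This is an illustrative sketch, not the authors’ code: the p-value uses the identity P(χ² > T) = erfc(√(T/2)) for one degree of freedom, the constant 3.841 is the 0.95 quantile of χ² with one degree of freedom, and the function name and prediction lists are hypothetical.

```python
import math

CHI2_1DF_95 = 3.841458820694124  # 0.95 quantile of chi-square with 1 degree of freedom


def mcnemar(y_true, pred_a, pred_b):
    """Return (T, f12, f21, p_value) for two classifications of the same samples.

    f12: correct in A, wrong in B; f21: wrong in A, correct in B.
    """
    f12 = f21 = 0
    for t, a, b in zip(y_true, pred_a, pred_b):
        if a == t and b != t:
            f12 += 1
        elif a != t and b == t:
            f21 += 1
    if f12 + f21 == 0:
        return 0.0, f12, f21, 1.0  # the classifications never disagree on correctness
    T = (f12 - f21) ** 2 / (f12 + f21)
    p_value = math.erfc(math.sqrt(T / 2.0))  # upper tail of chi-square(1)
    return T, f12, f21, p_value


# Hypothetical example: 100 test pixels, all of class 0.
y_true = [0] * 100
pred_a = [0] * 30 + [1] * 10 + [0] * 60
pred_b = [1] * 30 + [0] * 10 + [0] * 60
T, f12, f21, p = mcnemar(y_true, pred_a, pred_b)
print(f12, f21, T)  # 30 10 10.0
print("reject H0 at alpha = 0.05:", T > CHI2_1DF_95)
```

Only the discordant samples (f_12 and f_21) enter the statistic; pixels that both classifications get right or both get wrong carry no information about which one is better.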
At a given significance level α (we used α = 0.05), H0 can be rejected if the test statistic T is greater than χ²(1, 1−α), the (1 − α) quantile of the chi-square distribution with one degree of freedom (about 3.84 for α = 0.05).

V. EXPERIMENTAL RESULTS

Fig. 2 shows the overall accuracy as the number of features increases for the three scenarios. The graph shows that, in all cases, accuracy generally increases as features are added. The SFFS algorithm and the chronological selection scheme start at the same point at two features. The SFFS curve then rises rapidly to a saturation point at five features, whereas the chronological scheme rises slowly until 14 features, where it finally jumps to the same saturation point. Thus, the advantage of even a very simple feature selection scheme is that it needs far fewer images of the time series (one-third in this case) to reach the accuracy of the chronological selection scheme. Eventually, however, the two schemes reach comparable accuracies.
The optimal 5- and 15-feature sets selected by the SFFS and CBFS algorithms are shown in Table III. In this table, scenes from the winter group are shaded, and scenes from the summer-F3 group are in a slanted font. The incidence angle difference between F3 and F5 is only about 3° and is not likely to produce meaningful differences in backscatter. The same applies to the difference between ascending and descending scenes: we do not expect significant azimuthal differences in backscatter. We believe that the main differences are between the summer and winter scenes. Several observations are worth noting. First, every 5-feature set includes one or two winter scenes, with two exceptions: the dense and sparse ground vegetation classes have no winter observations, and the jack pine class contains three winter scenes. This carries over to the 15-feature sets, where every set contains four or five winter scenes, except sparse ground vegetation, which has only two winter scenes, and jack pine, which has six. Hence, accurate classification requires both winter and summer scenes, although the majority of scenes must be from the summer. Second, with few exceptions, the scenes included in the 5-feature sets are also included in the 15-feature sets, indicating consistency between the selected features. Finally, scenes from “midwinter” (January to mid-February) are shaded in a darker gray. While only a few of the 5-feature sets include a midwinter scene, in the 15-feature sets all classes but one (sparse ground vegetation) contain at least one midwinter scene. Jack pine clearly relies on midwinter observations for accurate results, whereas sparse ground vegetation requires no winter scenes. For all other classes, midwinter observations increase the accuracy beyond that obtained from summer and early/late winter scenes.
A similar argument extends to scenes 2 and 9, which are both from early June, when most deciduous leaves have not yet fully emerged. Both SFFS feature sets contain an early-summer observation. Moreover, while few of the 5-feature CBFS sets contain an early-summer observation, the 15-feature sets for all classes except jack pine and black spruce contain at least one early-summer scene.

In general, the advantage of a more sophisticated feature selection algorithm, such as CBFS, is clear. Even at two features, CBFS yields a significantly higher accuracy than simple feature selection according to McNemar’s test: in all cases, the calculated statistic T was larger than χ²(1, 1−α), with p-values near zero. We attribute this to the use of multiple feature subsets instead of one. The CBFS accuracy saturates at 11 features at 84% and then declines slowly. The CBFS-based classifier had significantly higher accuracies than both the chronological and SFFS systems according to McNemar’s test; the p-values, which are not shown here, were all very small, on the order of 10⁻⁵.

TABLE III. Optimal 5 and 15 features selected by the SFFS and CBFS algorithms. For the CBFS features, the class codes from Table II are indicated in the left-hand column. Features from the winter-F3 group are shaded in gray, and those from “midwinter” (January to mid-February) are shaded in a darker gray. Scenes from the summer-F3 group are in a slanted font.

VI. CONCLUSION

Our goal in this letter has been to demonstrate that feature selection, when applied to the classification of MT SAR image data, is more efficient and more accurate than the chronological addition of MT scenes to a feature space. Our results support this hypothesis. Even simple feature selection was more efficient, with classification accuracy saturating at one-third the number of features required by the chronological selection scheme. However, while simple feature selection and chronological selection eventually reached comparable accuracies, a more sophisticated feature selection algorithm generated significantly higher accuracies for all feature-space dimensions according to McNemar’s test.

An analysis of the 5- and 15-feature sets suggests that the seasonal observations needed differ among the classes. It is clear, however, that all classes required both summer and winter scenes, although the summer scenes played a more prominent role. Midwinter observations, when the ground is frozen, and early-summer observations, when the leaves are still unfurling, appear to be important as well. These results suggest that an exploration of the seasonal requirements of the different classes would be useful.

Our results contradict the current practice of adding MT images chronologically, as reported, for example, by Skriver [7]. In addition, although the results were obtained for a particular application, the mapping of boreal forest in eastern Canada, they also contradict the assertion of Quegan and Yu [6] that the order in which images are added does not affect the classification accuracy. Our data consisted of a relatively long multiyear time series of SAR images characterized by diversity in season, environmental conditions, and imaging geometry. The particular results will, of course, vary with the application, the characteristics of the time series, and the feature selection algorithm used. Nevertheless, our results suggest that a feature selection algorithm will improve the classification accuracy of MT SAR images.

REFERENCES
[1] C. J. Oliver and S. Quegan, Understanding Synthetic Aperture Radar Images. Norwood, MA: Artech House, 1998.
[2] R. Touzi, “A review of speckle filtering in the context of estimation theory,” IEEE Trans. Geosci. Remote Sens., vol. 40, no. 11, pp. 2392–2404, Nov. 2002.
[3] S. Quegan and T. L. Toan, “Analyzing multi-temporal SAR images,” in Proc. 2nd J. Latino Amer. Sensoriamento Remoto Radar ESA, Buenos Aires, Argentina, 1998, pp. 17–25.
[4] J. Bruniquel and A. Lopes, “Multi-variate optimal speckle reduction in SAR imagery,” Int. J. Remote Sens., vol. 18, no. 3, pp. 603–627, 1997.
[5] P. Gineste, “A simple, efficient filter for multi-temporal SAR images,” Int. J. Remote Sens., vol. 20, no. 13, pp. 2565–2576, 1999.
[6] S. Quegan and J. J. Yu, “Filtering of multichannel SAR images,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 11, pp. 2373–2379, Nov. 2001.
[7] H. Skriver, “Comparison between multitemporal and polarimetric SAR data for land cover classification,” in Proc. IEEE IGARSS, 2008, vol. 3, pp. III-558–III-561.
[8] G. Satalino, F. Mattia, T. L. Toan, and M. Rinaldi, “Wheat crop mapping by using ASAR AP data,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 2, pp. 527–530, Feb. 2009.
[9] C. Thiel, O. Cartus, R. Eckardt, N. Richter, C. Thiel, and C. Schmullius, “Analysis of multi-temporal land observation at C-band,” in Proc. IEEE IGARSS, 2009, vol. 3, pp. III-318–III-321.
[10] S. Quegan, T. L. Toan, J. J. Yu, F. Ribbes, and N. Floury, “Multitemporal ERS SAR analysis applied to forest mapping,” IEEE Trans. Geosci. Remote Sens., vol. 38, no. 2, pp. 741–753, Mar. 2000.
[11] D. Wang, H. Lin, J. Chen, Y. Zhang, and Q. Zeng, “Application of multi-temporal ENVISAT ASAR data to agricultural area mapping in the Pearl River Delta,” Int. J. Remote Sens., vol. 31, no. 6, pp. 1555–1572, Feb. 2010.
[12] Y. Zhang, C. Wan, J. Wu, J. Qi, and W. A. Salas, “Mapping paddy rice with multitemporal ALOS PALSAR imagery in Southeast China,” Int. J. Remote Sens., vol. 30, no. 23/24, pp. 6301–6315, 2009.
[13] B. Waske and M. Braun, “Classifier ensembles for land cover mapping using multitemporal SAR imagery,” ISPRS J. Photogramm. Remote Sens., vol. 64, no. 5, pp. 450–457, Sep. 2009.
[14] J.-M. Martinez and T. L. Toan, “Mapping of flood dynamics and spatial distribution of vegetation in the Amazon floodplain using multitemporal SAR data,” Remote Sens. Environ., vol. 108, no. 3, pp. 209–223, Jun. 2007.
[15] D. Coltuc, E. Trouve, F. Bujor, N. Classeau, and J. Rudant, “Time-space filtering of multi-temporal SAR images,” in Proc. IEEE IGARSS, 2000, pp. 2909–2911.
[16] E. Trouve, Y. Chambenoit, N. Classeau, and P. Bolon, “Statistical and operational performance assessment of multitemporal SAR image filtering,” IEEE Trans. Geosci. Remote Sens., vol. 41, no. 11, pp. 2519–2530, Nov. 2003.
[17] Z. Shi and K. B. Fung, “A comparison of digital speckle filters,” in Proc. IEEE IGARSS, Aug. 1994, pp. 2129–2133.
[18] S. B. Serpico, M. D’Inca, F. Melgani, and G. Moser, “Comparison of feature reduction techniques for classification of hyperspectral remote sensing data,” in Proc. SPIE—Image and Signal Processing for Remote Sensing VIII, S. B. Serpico, Ed., 2003, vol. 4885, pp. 347–358.
[19] T. Kavzoglu and P. M. Mather, “The role of feature selection in artificial neural network applications,” Int. J. Remote Sens., vol. 23, no. 15, pp. 2919–2937, 2002.
[20] J. Kittler, “Feature selection and extraction,” in Advances in Computer Vision and Image Processing: Image Reconstruction From Incomplete Observations, T. S. Huang, Ed. Greenwich, CT: JAI Press, 1984, pp. 60–81.
[21] P. Pudil, J. Novovičová, and J. Kittler, “Floating search methods in feature selection,” Pattern Recog. Lett., vol. 15, no. 11, pp. 1119–1125, Nov. 1994.
[22] A. Jain and D. Zongker, “Feature selection: Evaluation, application and small sample performance,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 2, pp. 153–158, Feb. 1997.
[23] S. Kumar, J. Ghosh, and M. Crawford, “Best-bases feature extraction algorithms for classification of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 7, pp. 1368–1379, Jul. 2001.
[24] Y. Maghsoudi, M. J. V. Zoej, and M. J. Collins, “Using a class-based feature selection for the classification of hyperspectral data,” Int. J. Remote Sens., to be published, DOI: 10.1080/01431161.2010.486416.
[25] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, “On combining classifiers,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 3, pp. 226–239, Mar. 1998.
[26] D. M. J. Tax, M. V. Breukelen, R. P. W. Duin, and J. Kittler, “Combining multiple classifiers by averaging or by multiplying,” Pattern Recog., vol. 33, no. 9, pp. 1475–1485, Sep. 2000.
[27] G. Foody, “Classification accuracy comparison, hypothesis tests and the use of confidence intervals in evaluations of difference, equivalence and non-inferiority,” Remote Sens. Environ., vol. 113, no. 8, pp. 1658–1663, Aug. 2009.