Pixel Unmixing in Hyperspectral Data by Means of Neural Networks

Report 2 Downloads 84 Views
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 49, NO. 11, NOVEMBER 2011

4163

Pixel Unmixing in Hyperspectral Data by Means of Neural Networks Giorgio A. Licciardi and Fabio Del Frate, Senior Member, IEEE

Abstract—Neural networks (NNs) are recognized as very effective techniques when facing complex retrieval tasks in remote sensing. In this paper, the potential of NNs has been applied in solving the unmixing problem in hyperspectral data. In its complete form, the processing scheme uses an NN architecture consisting of two stages: the first stage reduces the dimension of the input vector, while the second stage performs the mapping from the reduced input vector to the abundance percentages. The dimensionality reduction is performed by the so-called autoassociative NNs, which yield a nonlinear principal component analysis of the data. The evaluation of the whole performance is carried out for different sets of experimental data. The first one is provided by the Airborne Hyperspectral Scanner. The second set consists of images from the Compact High-Resolution Imaging Spectrometer on board the Project for On-Board Autonomy satellite, and it includes multiangle and multitemporal acquisitions. The third set is represented by Airborne Visible/InfraRed Imaging Spectrometer measurements. A quantitative performance analysis has been carried out in terms of effectiveness in the dimensionality reduction phase and in terms of the accuracy in the final estimation. The results obtained, when compared with those produced by appropriate benchmark techniques, show the advantages of this approach. Index Terms—Autoassociative neural networks (AANNs), dimensionality reduction, hyperspectral, NNs, nonlinear principal components, pixel unmixing.

I. I NTRODUCTION

N

EURAL networks (NNs) appeared on the scene of remote sensing at the beginning of the 1990s when it was proven that, in multisource analysis, where we do not always know the distribution functions, they could be more appropriate than traditional statistical algorithms [1]. Since then, the use of NNs has spread in the remote sensing community, leading to an increasing number of studies reported in literature in recent years [2]. Many of them have shown considerable advantages of NNs over other methods. In brief, the rapid growth of neural approaches in remote sensing is mainly due to their largely demonstrated ability to learn the complex patterns characterizing both the forward radiative transfer model and the

Manuscript received September 30, 2010; revised February 1, 2011 and April 1, 2011; accepted May 19, 2011. Date of publication August 1, 2011; date of current version October 28, 2011. G. A. Licciardi was with the Computer Science, Systems and Production Department, Tor Vergata University, 00133 Rome, Italy. He is now with the GIPSA-Lab, Grenoble Institute of Technology, 38402 Grenoble, France (e-mail: [email protected]). F. Del Frate is with the Computer Science, Systems and Production Department, Tor Vergata University, 00133 Rome, Italy (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TGRS.2011.2160950

inversion problem [3]. Their capability to generalize in noisy environments makes NNs robust techniques when the input data are incomplete or incorrect [4]. Moreover, NNs can be flexible and capable of positively combining different types of data, with no need of assumptions about the distributions of the data set used [5]. Also, in the specific context of the pixel unmixing of multispectral images, NNs have already been proven to be an effective approach [6]–[8]. In particular, although a heavy learning computation can be required, the multilayer perceptron (MLP) models have provided competitive performance with respect to other techniques such as the linear unmixing model and fuzzy c-means classifiers [9], [10]. Considering the hyperspectral data, the unmixing problem may become more complex due to the high dimensionality of the input vector, i.e., the number of spectral bands in the acquired data. For such data, the design of effective procedures aiming at lowering the size of the data, preserving as much as possible their information content, can be one of the key steps for the success of the whole unmixing procedure [11]. In fact, a high number of spectral bands exhibit high correlation, adding a redundancy that may obscure information relevant for abundance estimation and may decrease the accuracy of final products. Dealing with an NN inversion scheme, the extraction of representative features plays an even more crucial role. A network with fewer inputs has fewer adaptive parameters to be determined, which need a smaller training set to be properly constrained. This leads to a network with improved generalization properties providing smoother mappings. In addition, a network with fewer weights may be faster to train. All of these benefits make the reduction of the dimension of the input data a recommended procedure in many NN applications [12]. As far as the use of NNs in the field of hyperspectral data is concerned, they have already been recognized as representing a rather competitive family of algorithms for image classification [13]. Moreover, NNs have already been successfully applied for the design of one of the first end-to-end processing scheme dedicated to hyperspectral imagery provided by the Compact High-Resolution Imaging Spectrometer (CHRIS)-Project for On-Board Autonomy (PROBA) satellite [14]. However, despite the recognized potential of NNs in inversion problems, their exploitation in hyperspectral pixel unmixing has so far been rather sparsely investigated, and the management of a highdimension input was confirmed to be one of the crucial issues [15], [16]. The objective of this paper is to present a new methodology in designing robust schemes for hyperspectral unmixing using NNs. The main technical innovation is that MLP topologies are

0196-2892/$26.00 © 2011 IEEE

4164

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 49, NO. 11, NOVEMBER 2011

exploited for both tasks of feature reduction and abundance estimation, providing an effective solution to the management of a high-dimension input vector for pixel unmixing. The rationale of using MLP models relies on two main factors: they have been found to have the best-suited topology for classification and inversion problems [17], and they are able to perform with one single architecture (both the feature extraction and the unmixing). In fact, the final implementation chains the two tasks to a rather compact scheme and is characterized by a high level of automatism. In the study, both airborne and satellite images are used for the evaluation of the performance. II. M ETHODOLOGY A. Training MLP NNs If, on one hand, NNs are recognized as being universal approximators [18], on the other hand, their inappropriate use might lead to an undesired network overfitting over the training set. This means that the network becomes tailored too much to the learning examples and does not provide a satisfactory performance when it is asked to generalize over new patterns. Hence, in NN design, two main issues have to be addressed: 1) when to stop the training algorithm, and 2) how many neurons have to be included in the hidden layers of the topology. For the first aspect, we considered the so-called early stopping algorithm [12]. According to this algorithm, training and test sets are involved in the learning phase. The test set consists of examples not belonging to the training one. The network parameters are iteratively changed to minimize the error on the training set, but the network performance is, in parallel, evaluated also on the test set. The training of the net is stopped when the error on the test set reaches its minimum. Whereas this procedure is not applied, a net overtraining is very likely, and despite the fact that the error on the training data set may be smaller, the generalization capability is reduced. For the second task, a grid search method aiming at minimizing the mean-square error M SE was carried out. The M SE error is given by ! ! (tpj − opj )2 M SE =

p∈P j∈N

P

where P is the number of training patterns, N is the number of output neurons, tpj is the desired (true) output of neuron j for pattern p, and opj is the actual output. It should be mentioned that most of the neural simulations were provided by the Stuttgart Neural Network Simulator package [19]. B. AANNs Autoassociative NNs (AANNs) have already been used in remote sensing for data dimensionality reduction in [20]. However, in that case, the authors considered the AANN model in a rather different application, i.e., for the retrieval of atmospheric variables from ground-based microwave radiometry. AANNs are MLP networks where the targets used to train the network are simply the input vectors themselves so that the network is attempting to map each input vector onto itself. For this reason,

Fig. 1.

AANN topology.

the network is said to form an autoassociative mapping. The number of units in the central bottleneck layer is significantly reduced with respect to the number of input (or output) units; therefore, in general, a perfect reconstruction of all inputs is not possible. However, if network training finds an acceptable solution, i.e., a solution which gives an error below a predefined threshold, a good representation of the input must exist in the bottleneck layer. In other words, data compression caused by the network bottleneck may force hidden units to represent significant features in the data. As shown in Fig. 1, the network can be viewed as two successive functional mappings. The first mapping projects the original d-dimensional data onto a lower dimensional subspace defined by the activations of the units in the central hidden layer. Because of the presence of nonlinear units, this mapping is essentially arbitrary and, in particular, not restricted to being linear, as for the standard principal component analysis (PCA). Similarly, the second half of the network defines an arbitrary functional mapping from the lower dimensional space back into the original d-dimensional space. The described network performs a nonlinear PCA (NLPCA). It might be thought that these arbitrary mappings can be performed by a net with one hidden layer, obtained by the one in Fig. 1, removing the mapping and demapping layers, provided that the bottleneck layer has nonlinear (sigmoid) activation functions. However, it was shown by Bourlard and Kamp [21] that, in this case, the PCA analysis is the one providing the minimum error on the representation of the data in a subspace with reduced dimensionality. There is therefore no advantage in using the net with one hidden layer to perform dimensionality reduction. It also has to be noted that, as the output has to simply replicate the input, no independent target data are provided, and there is no need to have an a priori knowledge for the implementation of the learning phase. This implies that the AANN training can be performed in a fully automatic way and that all pixels in the image can be considered for this task, which has actually been the technique adopted in this paper. As to the number of nonlinear principal components to be included into the final processing scheme, we observed that the PCA technique is still a consolidated technique for dimensionality extraction in hyperspectral data. For this reason, it could be used for two purposes: first, it was the benchmark in evaluating the dimensionality reduction performance of the AANN technique on hyperspectral data, and second, it was used to decide the

LICCIARDI AND DEL FRATE: PIXEL UNMIXING IN HYPERSPECTRAL DATA BY MEANS OF NEURAL NETWORKS

4165

processing units are real-valued sigmoidal functions, providing an output value in the range [0, 1]. Therefore, such a value can be considered correlated to the fuzzy membership value. More analytically, the abundance ai corresponding to i-ism class is given by the following expression: oi ai = M ! ok k=1

where ok indicates the NN output associated to the k-ism endmember and M is the total number of endmembers. D. NN Performance Evaluation

Fig. 2. Cascade NN architecture performing feature extraction (left stage) and pixel unmixing (right stage).

number of units to be considered in the AANN bottleneck layer. In particular, the number was set equal to the number of PCA components enabling the representation of at least 99% of the variance of the data. C. Unmixing Algorithm Once feature extraction was performed by means of AANN, the reduced measurement vector could be used as input to a new MLP scheme for a pixel-based fuzzy classification procedure (Fig. 2). When adequate and available, a priori knowledge was exploited for two purposes: to define the number and the typology of the endmembers and to select the pure training samples (pixels) necessary for the learning phase of the NN performing the fuzzy classification. For the first task, this means that the most important types of land cover characterizing the considered test areas were individuated. For the second task, the a priori information available guided the selection of a set of individual seed pixels in the image. At this point, to generate a statistically significant number of training pixels, other representative pure pixels, belonging to the same land cover type, were automatically extracted. The selection condition was that their Euclidean distance from the seed pixels was below a predefined threshold. It has to be noted that providing reliable examples is essential for the success of the network learning phase. As the selection of the training pixels was based on image photo-interpretation, the exact individuation of mixed pixels could be critical. Hence, the strategy of using only pure pixels, also considered by Foody in [7], seemed more robust and appropriate in minimizing photo-interpretation errors. However, in one of the three experiments considered (Airborne Visible/InfraRed Imaging Spectrometer (AVIRIS) data), it was not possible to extract individual pure pixels in the image. In this case, the training of the NN performing the fuzzy classification was carried out using pure reference spectra made available by the United States Geological Survey (USGS) spectral library. In any case, as only pure pixels were considered in the learning phase, the following standard code was considered for the output vector: the output unit associated to the actual land cover class had a value of “1,” while the remaining ones had a value of “0.” Although the network is trained with binary values in the output vector, the activation functions of its

The results yielded by the NN scheme have been qualitatively and quantitatively evaluated both in terms of the NNs’ capability of representing the hyperspectral data with a reduced number of components and in terms of the accuracy obtained on the endmember abundance estimation. For the dimensionality reduction results, a performance index was introduced and computed by considering the pixel mean percentage reconstruction error obtained over the whole image, including all bands. More precisely, we defined the reconstruction mean error RM E as

RM E =

B M ! ! vt −vr 1

1

vt

M ×B

where M is the overall number of pixels in the image, B is the number of bands, vt is the true original band value for the specific pixel, and vr is the reconstructed value. Note that, in one case (Airborne Hyperspectral Scanner (AHS) data), an analysis of the robustness to variable atmospheric path contribution was carried out. In fact, the scattering of atmospheric molecules and aerosols and the absorption of gases such as water vapor can weaken the upwelling signals received by the sensors [22]. For the dimensionality reduction, the PCA and minimum noise fraction (MNF) algorithms have been considered as benchmarks for the assessment of the NN methodology. In fact, they are still commonly used approaches for dimensionality reduction of hyperspectral data [23], [24]. As far as the unmixing performance is concerned, the quantitative evaluation was made by considering the results provided by the network on a set of individual pixels, not used in the training phase. Some mixed pixels interpreted based on the available ground truth (in most cases, an image at higher spatial resolution) were carefully selected for this purpose. With regard to the benchmark algorithm used for comparison, the linear spectral unmixing (LSU) algorithm provided by ENVI software has been chosen. It belongs to the class of inversion algorithms based on squared-error minimization [25]. The algorithm has been set up to follow the full-additivity and nonnegativity constraints. III. E XPERIMENTAL DATA A. INTA-AHS Data The airborne INTA-AHS instrument data set has been acquired in the framework of the European Space Agency (ESA)

4166

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 49, NO. 11, NOVEMBER 2011

Fig. 4. RGB image of the Tor Vergata–Frascati test site taken by CHRISPROBA. R: band 6. G: band 4. B: band 2.

Fig. 3. RGB image of the DEMMIN test site taken by the AHS instrument. R: band 6. G: band 4. B: band 2.

AGRISAR measurement campaign [26]. The test site is the area of Durable Environmental Multidisciplinary Monitoring Information Network (DEMMIN). This is a consolidated test site located in Mecklenburg-Western Pomerania, North-East Germany, which is based on a group of farms within a farming association, covering approximately 25 000 ha. The fields are very large in this area (in average, 200–250 ha). The main crops grown are wheat, barley, rape, maize, and sugar beet. The altitudinal range within the test site is around 50 m. The AHS has 80 spectral channels available in the visible, shortwave infrared, and thermal infrared, with a pixel size of 5.2 m. In this paper, the acquisition taken on the June 6, 2006, has been considered. At that time, five bands in the SWIR region became blind due to loose bonds in the detector array, so they were not used in this paper [27]. An RGB image of the DEMMIN test site taken by the AHS instrument is shown in Fig. 3. Note that the AGRISAR campaign included the collection of extensive ground truth data that, together with a 50-cm resolution orthophoto, formed the a priori knowledge for the algorithm training and the result assessment. B. CHRIS-PROBA Data The payload of the PROBA-1 satellite includes the CHRIS instrument, which provides acquisitions up to 62 narrow and quasi-contiguous spectral bands with a spatial resolution of 34–40 m (mode 1) [28]. In this paper, only acquisitions in mode 3, recommended for land-use and land-change investigations, have been considered. This configuration is characterized by 18 spectral bands covering the visible and near-infrared ranges at 18 m of spatial resolution. Bands for atmospheric correction are present (442 and 490 nm), and also, the red edge range

(between 700 and 800 nm) is well sampled in order to produce accurate identification and monitoring of crops and green areas. From the scientific point of view, one of the most interesting capabilities of this satellite consists in tilting the spacecraft on two axes (along and across track) during the target overpass, allowing quasi-simultaneous acquisitions from five different angles. More particularly, the images are acquired when the zenith angle of the platform, with respect to the fly-by position, is one of the following: +55◦ , +36◦ , 0◦ , −36◦ , or −55◦ . This multiple-view-angle imaging capability, in addition to the high spectral and spatial sensor resolution, permits the collection of a large amount of data in order to perform new studies regarding the land and the atmosphere. The area between Frascati and Tor Vergata is in a test site for the PROBA-1 mission and has also been considered for this paper. This is a mainly flat area located in the southeast of Rome, Italy, which presents an interesting heterogeneous landscape consisting of both natural and artificial areas (see Fig. 4). The multiangular acquisitions given by CHRIS have been combined with a multitemporal data set. We considered three acquisition dates with a flyby zenith angle (FZA) of 0◦ : February 28, 2006, August 19, 2006, and October 9, 2006. Such dates provided 54 (18 × 3) inputs and seemed particularly suitable in sampling the crops’ growth cycle and, hence, in catching the differences among the multitemporal signatures associated to each land cover type. Additionally, the acquisition at an FZA of 36◦ on the second date (month of August) was also considered for an overall number of 72 inputs to be exploited for the unmixing scheme. The choice of the second date stems from the fact that the multiangular information should carry the best contribution when the agricultural fields are at the stage of full development. Note that all data have been atmospherically calibrated. A VHR panchromatic QuickBird image represented the available ground truth. C. AVIRIS Data The AVIRIS data set used in this paper has been obtained by connection to the AVIRIS-JPL website (http://aviris.jpl.nasa.gov). It was acquired over the Cuprite site during a 1997 campaign. AVIRIS is an optical sensor delivering calibrated images of the upwelling spectral radiance in 224 contiguous spectral bands with wavelengths from 0.4 to 2.5 µm. Cuprite is a place located in NV, located approximately

LICCIARDI AND DEL FRATE: PIXEL UNMIXING IN HYPERSPECTRAL DATA BY MEANS OF NEURAL NETWORKS

4167

Fig. 5. RGB image of the Tor Vergata–Frascati test site taken by CHRISPROBA. R: band 6. G: band 4. B: band 2.

200 km northwest of Las Vegas, with a relatively undisturbed acid-sulfate hydrothermal system exhibiting well-exposed alteration mineralogy consisting principally of kaolinite, alunite, and hydrothermal silica. Cuprite has been used as a geologic remote sensing test site since the early 1980s, where so many studies have been published [29], [30] and its geology mapped in detail [31]. In this paper, the original 224 bands were reduced to 134, selecting the spectral range 1.0–2.5 µm containing very diagnostic absorption features. Noisy bands were also discarded. On the Cuprite test site, nonlinear mixtures of endmembers dominate the image, and they prevented the identification of pure pixels by visual inspection. Hence, the training phase was carried out by using the mineral spectral samples taken from the USGS spectral library. The selection of the types of signatures was carried out in agreement with those individuated in [32]. These correspond to the following 18 main spectral typologies: diopsite, hyperstene, lizardite, illite, montmorillonite, goethite, hematite, jarosite, dolomite, calcite, chlorite, muscovite, alunite, dickite, halloysite, kaolinite, nontronite, and olivine. IV. R ESULTS AND D ISCUSSIONS A. AHS Data In the case of the AHS data, from the PCA analysis, it resulted that the first five PCA components contained almost 99.9% of the whole statistical information. For this reason, five principal components have also been considered for the nonlinear dimensionality reduction performed by AANN. In Fig. 5, we show the result of the analysis aiming at selecting the suitable number of neurons for the intermediate layers. We can see that the minimization of the cost function was obtained with 25 neurons. This led to the following topology for the AANN: 75 nodes for both input and output layers, 25 nodes for the mapping and demapping hidden layers, and five nodes for the bottleneck layer. As anticipated in Section II, a similar quantitative analysis, yielding plots as the one shown in Fig. 5, has been carried out each time an MLP topology has to be designed. However, to be concise, in the rest of this paper, only the final selected configurations will be provided. We observed that, with some minor exceptions, the components

Fig. 6. Atmospheric effects on the first (up) and fourth (down) principal components extracted by means of the different considered techniques. (Left) NLPCA. (Center) PCA. (Right) MNF.

extracted by means of the AANN are generally not seriously disturbed by atmospheric effects. On the other hand, this problem seems to be more visible in the PCA components, and it seriously degrades the MNF components. This is shown in Fig. 6, where we show, for each technique, the two components that appear as the most affected by the atmospheric effects. We see how the level of disturbance is slight for NLPCA, more significant for the PCA, and even dramatic for MNF. In Fig. 7, we examined the accuracy in reconstructing the original spectra starting from the five extracted principal components. Two examples of reconstructed signatures are shown. The plots have been obtained by averaging over pixels of the same area of interest. The two land cover types considered are water and trees. In the case of water (about 650 pixels), it can be noted that the NLPCA is significantly more effective than PCA in encoding the spectral information. In particular, the behavior in the visible and near infrared with strong curvatures is better resembled. Differently, the forest case (again about 650 pixels) is an example where the two techniques are rather comparable, even though the PCA shows slight discrepancies with the true spectra at the higher wavelengths. Trends similar to those shown in Fig. 7 have been observed for the other land cover types. An RM E value of 0.051 was computed for the NLPCA technique versus a value of 0.101 for the PCA method; hence, the NLPCA improved the performance of the linear approach by about 50%. The five components extracted from each pixel spectral signature have then been used for the implementation of the unmixing algorithm via an MLP topology of 5-25-25-10. About 45 000 pixels were considered for the training phase, and about 50 000 were considered for the test set. The training phase lasted less than 150 epochs; hence, a limited computational burden was required. It must be remembered that the decrease of the complexity of the NN training phase was one of the objectives of this paper. In the implementation of the final automatic unmixing scheme, the

4168

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 49, NO. 11, NOVEMBER 2011

Fig. 8. Examples of AHS individual pixels (in the red box) for which ground truth was evaluated by photo-interpretation and compared with the unmixing results provided by LSU and NN. For the correspondence with Table I, example 1 is on the left, and example 2 is on the right. TABLE I A BUNDANCE E STIMATION ( IN P ERCENT ) FOR T WO P IXELS ( THE R ED B OXES IN F IG. 8) E XTRACTED F ROM THE AHS I MAGE

Fig. 7. Original spectral signatures and spectral signatures reconstructed by using NLPCA and PCA. Examples for water (up) and tree (down) land cover types. Values between bands 25 and 60 have been skipped because they are not significant.

whole NN architecture consists of the following layers: one input layer of 75 units; four hidden layers of 25, 5, 25, and 25 neurons, respectively; and an output vector of ten components. The first hidden layer extracts the five nonlinear principal components which feed the subsequent hidden layers performing the abundance estimation. Ten different land cover types were individuated in the scene, corresponding to the following: maize, winter wheat, winter barley, rape, sugar beet, pasture, water, artificial areas, coniferous trees, and deciduous trees. The abundance estimation was obtained starting from the fuzzy classification of the NN algorithm, adopting the procedure previously explained. The technique was compared through a quantitative exercise with the LSU method. For such an evaluation, we derived the ground truth for 40 separated individual pixels by photo-interpretation of a 50-cm spatial resolution orthophoto taken on the same area. In Fig. 8, we show two examples of the individual selected pixels whose abundances were estimated and verified. The corresponding results in terms of percentages of each endmember are given in Table I. In the first example, it can be noted that the radiance measured by the sensor stems from the mixed contribution of three different endmembers: mostly from pasture and, for a lower percentage, from artificial area and deciduous trees. The LSU is better than NN in retrieving the artificial percentage and also detects the significant presence of deciduous trees. However, it fails in the abundance estimation of the other agricultural endmembers, particularly pasture. On the other hand, NN correctly assigns a high percentage value to pasture and low values to the other low-vegetated

classes. This behavior can be explained by considering that standard unmixing methodologies such as LSU require the endmembers to be the most uncorrelated as possible. If, conversely, the components are strictly connected, the mixing process between the different components is essentially nonlinear [20]. In our case, most of the endmembers correspond to crops, which may be correlated to each other, leading the linear unmixing technique to a wrong result. On the other hand, the NN unmixing technique should be able to exploit nonlinear dependencies more effectively and should provide good accuracy even if the chosen endmembers are closely correlated. The results of example 2, showing certain confusion in the LSU performance, seem to confirm this assumption. A more general quantitative assessment is provided in Table II, which reports the mean and the standard deviation values of each endmember, considering the whole set of the 40 pixels selected for the quantitative evaluation. In particular, Table II gives the quantitative assessment in terms of root-mean-square error (rmse), computed by considering all classes. For each class, the rmse value is given by "! # # (a − ati )2 $ i∈N ei rmse = N where N is the number of pixels selected for the performance evaluation (in this case 40), aei is the estimated abundance for pixel i, and ati is the corresponding true abundance as calculated via photo-interpretation. Note that the percentage mean value of some classes is zero. In fact, it was difficult to find pixels characterized by some specific contributions by

LICCIARDI AND DEL FRATE: PIXEL UNMIXING IN HYPERSPECTRAL DATA BY MEANS OF NEURAL NETWORKS

4169

TABLE II A BUNDANCE S TATISTICS AND RMSE VALUES O BTAINED IN THE Q UANTITATIVE P ERFORMANCE E VALUATION FOR THE AHS DATA

photo-interpretation. From the reported values, we can see that NN definitely seems to be more effective than LSU. Averaging the rmse values over all of the endmembers, we obtained a more concise performance parameter, which is 0.065 for NN and 0.329 for LSU. Therefore, the NN approach improves the result obtained with LSU by almost one order of magnitude. B. CHRIS-PROBA Data In the case of CHRIS-PROBA, a multitemporal– multiangular data set has been considered. The final selected AANN topology was 72-25-5-25-72. A total of 36 inputs are given by the acquisitions taken on two days, but with the FZA only at 0◦ , the remaining inputs are provided by the acquisition taken in a different date but at two FZAs (0◦ and 36◦ ). In Fig. 9, the five NLPCA and PCA components are shown. It can be observed that the nonlinear components seem to focus on specific features of the scene, which is rather interesting. For example, components 2 and 3 appear rather suitable in analyzing the building distribution, while component 1 is more sensitive to natural and agricultural areas, and component 5 is sensitive to the road network. Note that components 4 and 5 clearly detect the presence of cloud in one of the acquisitions, while this is not the case for the other three components. We can see in Fig. 9 that the PCA results did not show such a specific property. For example, none of the five PCA components is able to highlight the urban fabric as the third NLPCA component does. These results can be, at least partially, explained with the capability of the NLPCA in detecting nonlinear dependencies among the data. Also for the CHRIS-PROBA imagery, the more quantitative evaluation considering the overall error in the reconstruction of the original band information was carried out. In this case, parameter B includes the multitemporal and multiangular acquisitions. We obtained an RM E value of 0.006 for the NLPCA and 0.083 for the PCA, so in this case, the improvement of the NLPCA with respect to PCA is even more significant. For the case of the CHRIS-PROBA data, the network topology estimating the abundances from the five nonlinear components is 5-36-36-11. The following different classes have been considered: vineyards, pasture, permanent crops, industrial, dark asphalt, maize, built-up area, bright asphalt, and agricultural area. It should be noted that three agricultural areas have been differentiated in the network

Fig. 9. Five principal components (first to fifth from the top to the bottom) obtained for the multitemporal–multiangular CHRIS imagery using (left) PCA and (right) NLPCA.

Fig. 10. Examples of PROBA individual pixels (in the red box) for which the ground truth was evaluated by photo-interpretation and compared with the unmixing results provided by LSU and NN. For the correspondence with Table III, example 1 is on the left, and example 2 is on the right.

output layer according to the growth cycle, but they have been incorporated within the same class for abundance estimation. The NN used for abundance estimation has been obtained using training and test sets of 4000 and 1200 patterns, respectively. With only five inputs, the number of epochs necessary to get the network trained was again around 150, which is much less than those required in performing the abundance estimation directly starting from the 72 measurements. This confirmed that dimensionality reduction significantly lowered the complexity of the training phase. In this exercise, the ground truth in terms of percentages of abundances was determined for 50 selected individual pixels by visual inspection using a VHR panchromatic QuickBird image. It should be observed that the considered CHRIS-PROBA imagery pixel size is 18 m, so the panchromatic QuickBird data characterized by less than 1-m resolution can be recognized as suitable in producing the

4170

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 49, NO. 11, NOVEMBER 2011

TABLE III A BUNDANCE E STIMATION ( IN P ERCENT ) FOR T WO P IXELS ( THE R ED B OXES IN F IG. 10) E XTRACTED F ROM THE CHRIS-PROBA I MAGERY

of each endmember, considering the whole set of measured CHRIS-PROBA pixels. With regard to the two dominant categories (pasture and built-up areas), the NN methodology gives an rmse quite below the standard deviation value, which proves the effectiveness of the approach. The mean NN rmse value computed over all of the endmembers is 0.161, and again, it is better than the value of 0.391 obtained with LSU. C. AVIRIS Data

TABLE IV A BUNDANCE S TATISTICS AND RMSE VALUES O BTAINED IN THE Q UANTITATIVE P ERFORMANCE E VALUATION FOR THE CHRIS-PROBA DATA

ground truth reference. In Fig. 10, we report two examples extracted from the selected pixels whose abundances were computed by interpretation of the VHR data. In Table III, the corresponding results obtained with the two methods are shown. We can see that, in the first example, the pasture estimation of NN is rather accurate, while some confusion between built-up and bright asphalt appears. Also, in example 2, the performance of NN is positive: the percentage of bright asphalt is detected with a good approximation, and the underestimation of the built-up area with respect to the industrial area can be recognized as a minor error. On the other hand, the LSU does not provide a good performance, especially in the first example where the industrial area is given as the prevailing class. In the comment of such results, we have to note that the CHRIS-PROBA data set is different from AHS under two (linked) points of view: spatial resolution is lower, and endmembers correspond to categories whose spectral signatures can be more easily distinguished. In the AHS scene, most of the endmembers correspond to agricultural classes. In this second test site, we have more heterogeneity in terms of spectral signatures. These factors should, in principle, facilitate the unmixing analysis via a linear approach. However, in the CHRIS-PROBA case, we have to manage rather different types of measurements: hyperspectral, multitemporal, and multiangular. Therefore, it can be possible that only the NNs have the appropriate flexibility to manage such a variability in one single input quantity. Finally, we evaluated the rmse over the entire set of the pixels selected for the performance assessment. As for the AHS case, we were not able to include all endmembers in the considered ground truth mixed pixels. Table IV reports the mean and the standard deviation values

The 134-band data set was reduced using the NLPCA approach. In this case, we found the best topology to be 134-50-6-50-134. This means that we reduced the original data set to six nonlinear principal components. In Fig. 11, we show two examples of spectral signatures reconstructed from the six NLPCA components compared with the original one. For the training phase of the NN performing the unmixing, the six NLPCs corresponding to the reference spectra of the minerals were computed and given as input for the training of a network with topology 6-45-45-18. The spectra from the USGS spectral library were subdivided into 36 samples for the training set and 18 for the test set. For this exercise, the performance evaluation could only be carried out at a more qualitative level as the exact composition values of the single pixel were not available. In Fig. 12, we can see the image fractions of the most significant minerals. The maps shown agree with the results reported in [32]. V. C ONCLUSION In this paper, a novel approach based on NNs for the extraction of pixel abundances from hyperspectral data has been developed. The NN performs both the dimensionality reduction procedure and the final unmixing. The final scheme is a single architecture chaining the two operations in an automatic mode. However, the procedures need to be designed separately, with special care in avoiding overfitting effects. The methodology has been applied to the following: the images acquired by the INTA-AHS instrument, characterized by 75 working bands, over a German test site; a set of satellite CHRIS-PROBA images taken over the extra-urban area of the Tor Vergata University campus, Rome, Italy; and the AVIRIS imagery taken on the Cuprite test site, NV. We have noted that the AANN approach is more suitable than other techniques such as PCA in eliminating nonlinear correlations in the data (hence, to optimize the design of successive inversion schemes). In particular, in the AHS data, we have observed a better robustness to atmospheric effects. For the CHRIS-PROBA data, the nonlinear NLPCA technique seems to have interesting capabilities in feature extraction. It often happened that single NLPCA components were polarized on a specific land cover type, e.g., urban fabric, which could be of great use for object detection problems. The interesting overall performance of NLPCA highlights the potential of this technique in reducing dimensionality of hyperspectral data even only for storage or transmission purposes. The unmixing results show that the reduced vector, when used as input for a subsequent NN dedicated module, allows us to yield accurate pixel abundance estimation, in any case, better than that obtained with LSU. In fact, from the quantitative comparison with the

LICCIARDI AND DEL FRATE: PIXEL UNMIXING IN HYPERSPECTRAL DATA BY MEANS OF NEURAL NETWORKS

4171

Fig. 11. Two examples of original (solid line) spectral signatures and spectral signatures reconstructed by using NLPCA (dashed line) from the AVIRIS data set on Cuprite, NV.

Fig. 12. On the left, examples of distribution of (a) alunite, (b) calcite, (c) kaolinite, (d) muscovite, (e) halloysite, and (f) jarosite derived from the NN unmixing algorithm. On the right, the classification map obtained in [32]. Note that the dashed yellow box approximately shows the common area for the two studies.

ground truth, both in AHS and in the CHRIS-PROBA data, the observed rmse on the abundance estimation is significantly lower than that obtained using the LSU approach. In the case of AHS, the NN exploits their inherent characteristic of being a nonlinear model, which is more appropriate in dealing with highly correlated endmembers. In the case of the CHRISPROBA data, the NN approach seems to be more effective in the management of the different types of information simultaneously available with the hyperspectral–multitemporal and multiangular acquisition modes. The experiment with AVIRIS imagery basically confirmed the effectiveness of the approach on the third type of hyperspectral data. We note that, so far, NN hyperspectral unmixing schemes have been mainly applied to airborne data, so the inclusion of the spaceborne data can be considered as an additional innovative result provided by this

paper. In particular, even if the analysis of a multitemporal and multiangular configuration might represent a rather uncommon scenario, it has to be noted that the availability of satellite multiconfiguration data is continuously increasing. Under this perspective, the impact of the presented technique could be even more significant. ACKNOWLEDGMENT The authors would like to thank the anonymous reviewers for their helpful comments. The AHS data set was provided within the ESA CAT-1 project n. 6519, while the CHRISPROBA data set was provided within the ESA CAT-1 project n. 3075. The Cuprite AVIRIS data set was provided by JPL (http://aviris.jpl.nasa.gov).

4172

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 49, NO. 11, NOVEMBER 2011

R EFERENCES [1] J. A. Benediktsson, P. H. Swain, and O. K. Ersoy, “Neural network approaches versus statistical methods in classification of multisource remote sensing data,” IEEE Trans. Geosci. Remote Sens., vol. 28, no. 4, pp. 540– 552, Jul. 1990. [2] J. F. Mas and J. J. Flores, “The application of artificial neural networks to the analysis of remotely sensed data,” Int. J. Remote Sens., vol. 29, no. 3, pp. 617–663, 2008. [3] M. S. Dawson, “Applications of electromagnetic scattering models to parameter retrieval and classification,” in Microwave Scattering and Emission Models and Their Applications, A. K. Fung, Ed. Norwood, MA: Artech House, 1994, ch. 12. [4] F. Del Frate and G. Schiavon, “A combined natural orthogonal functions—Neural network technique for the radiometric estimation of atmospheric profiles,” Radio Sci., vol. 33, no. 2, pp. 405–410, 1998. [5] J. A. Benediktsson and J. R. Sveinsson, “Feature extraction for multisource data classification with artificial neural networks,” Int. J. Remote Sens., vol. 18, no. 4, pp. 727–740, 1997. [6] A. Baraldi, E. Binaghi, P. Blonda, P. A. Brivio, and A. Rampini, “Comparison of the multi-layer perceptron with neurofuzzy techniques in the estimation of cover class mixture in remotely sensed data,” IEEE Trans. Geosci. Remote Sens., vol. 39, no. 5, pp. 994–1005, May 2001. [7] G. M. Foody, “Relating the land-cover composition of mixed pixels to artificial neural network classification output,” Photogramm. Eng. Remote Sens., vol. 62, no. 5, pp. 491–499, 1996. [8] R. Fernandes, R. Fraser, R. Latifovic, J. Cihlar, J. Beaubien, and Y. Du, “Approaches to fractional land cover and continuous field mapping: A comparative assessment over the BOREAS study region,” Remote Sens. Environ., vol. 89, no. 2, pp. 234–251, Jan. 2004. [9] P. M. Atkinson, M. E. J. Cutler, and H. Lewis, “Mapping sub-pixel proportional land cover with AVHRR imagery,” Int. J. Remote Sens., vol. 18, no. 4, pp. 917–935, 1997. [10] W. Liu and E. Y. Wu, “Comparison of non-linear mixture models: Subpixel classification,” Remote Sens. Environ., vol. 94, no. 2, pp. 145–154, Jan. 2005. [11] N. Keshava, “A survey of spectral unmixing algorithms,” Lincoln Lab. J., vol. 14, no. 1, pp. 55–78, 2003. [12] C. Bishop, Neural Networks for Pattern Recognition. New York: Oxford Univ. Press, 1995. [13] G. Licciardi, F. Pacifici, D. Tuia, S. Prasad, T. West, F. Giacco, C. Thiel, J. Inglada, E. Christophe, J. Chanussot, and P. Gamba, “Decision fusion for the classification of hyperspectral data: Outcome of the 2008 GRS-S Data Fusion Contest,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 11, pp. 3857–3865, Nov. 2009. [14] R. Duca and F. Del Frate, “Hyperspectral and multi-angle CHRIS PROBA images for the generation of land cover maps,” IEEE Trans. Geosci. Remote Sens., vol. 46, no. 10, pp. 2857–2866, Oct. 2008. [15] J. L. Crespo, R. J. Duro, and F. López Peña, “Gaussian synapse ANNs in multi- and hyperspectral image data analysis,” IEEE Trans. Instrum. Meas., vol. 52, no. 3, pp. 724–732, Jun. 2003. [16] J. Plaza, A. Plaza, R. Perez, and P. Martinez, “On the use of small training sets for neural network-based characterization of mixed pixels in remotely sensed hyperspectral images,” Pattern Recognit., vol. 42, no. 11, pp. 3032–3045, Nov. 2009. [17] S.-Y. Hsu, T. Masters, M. Olson, M. Tenorio, and T. Grogan, “Comparative analysis of five neural network models,” Remote Sens. Rev., vol. 6, no. 1, pp. 319–329, 1992. [18] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Netw., vol. 2, no. 5, pp. 359– 366, 1989. [19] A. Zell, G. Mamier, M. Vogt, N. Mache, R. Hübner, S. Döring, K. U. Herrmann, T. Soyez, M. Schmalzl, T. Sommer, A. Hatzigeorgiou, D. Posselt, T. Schreiner, B. Kett, G. Clemente, and J. Wieland, “SNNS Stuttgart Neural Network Simulator user manual,” Univ Stuttgart, Inst. Parallel Distrib. High Perform. Syst., Stuttgart, Germany, Rep. N6/95, 1995. [20] F. Del Frate and G. Schiavon, “Nonlinear principal component analysis for the radiometric inversion of atmospheric profiles by using neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 5, pp. 2335– 2342, Sep. 1999. [21] H. Bourlard and Y. Kamp, “Auto-association by multilayer perceptrons and singular value decomposition,” Biol. Cybern., vol. 59, no. 4/5, pp. 291–294, 1988. [22] R. S. Fraser and Y. J. Kaufmann, “The relative importance of aerosol scattering and absorption in remote sensing,” IEEE Trans. Geosci. Remote Sens., vol. GRS-23, no. 5, pp. 625–633, Sep. 1985.

[23] M. D. Farrell and R. M. Mersereau, “On the impact of PCA dimension reduction for hyperspectral detection of difficult targets,” IEEE Geosci. Remote Sens. Lett., vol. 2, no. 2, pp. 192–195, Apr. 2005. [24] A. Green, M. Berman, P. Switzer, and M. D. Craig, “A transformation for ordering multispectral data in terms of image quality with implications for noise removal,” IEEE Trans. Geosci. Remote Sens., vol. 26, no. 1, pp. 65– 74, Jan. 1988. [25] N. Keshava and J. F. Mustard, “Spectral unmixing,” IEEE Signal Process. Mag., vol. 19, no. 1, pp. 44–57, Jan. 2002. [26] “AGRISAR final Rep.,” Document ID 19974/06/I-LG, 2008. [27] J. A. Gómez, E. de Miguel, Ó. Gutiérrez de la Cémara, and A. FernándezRenau, “Status of the INTA AHS sensor,” in Proc. 5th EARSeL Workshop Imag. Spectrosc., Bruges, Belgium, Apr. 23–25, 2007. [28] M. J. Barnsley, J. J. Settle, M. A. Cutter, D. R. Lobb, and F. Teston, “The PROBA/CHRIS mission: A low-cost smallsat for hyperspectral multiangle observations of the Earth surface and atmosphere,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 7, pp. 1512–1520, Jul. 2004. [29] A. F. H. Goetz and V. Strivastava, “Mineralogical mapping in the Cuprite mining district,” in Proc. AIS Data Anal. Workshop, JPL Publ. 85-41, 1985, pp. 22–29. [30] G. A. Swayze, R. L. Clark, S. Sutley, and A. J. Gallagher, “Groundtruthing AVIRIS mineral mapping at Cuprite, Nevada,” in Proc. Summaries 3rd Annu. JPL Airborne Geosci. Workshop, V 1, AVIRIS Workshop, JPL Publ. 92-14, 1992, pp. 47–49. [31] G. A. Swayze, “The hydrothermal and structural history of the Cuprite mining district, southwestern Nevada: An integrated geological and geophysical approach,” Ph.D. dissertation, Univ. Colorado, Boulder, CO, 1997, 341 p. [32] R. N. Clark, G. A. Swayze, K. E. Livo, R. F. Kokaly, S. J. Sutley, J. B. Dalton, R. R. McDougal, and C. A. Gent, “Imaging spectroscopy: Earth and planetary remote sensing with the USGS Tetracorder and expert systems,” J. Geophys Res., vol. 108, no. E12, pp. 5.1–5.44, Dec. 2003, DOI: 101029/2002JE001847.

Giorgio A. Licciardi received the M.S. degree in telecommunication engineering and the Ph.D. degree in “geoinformation” from the Tor Vergata University, Rome, Italy, in 2005 and 2010, respectively. He is currently a Postdoctoral Fellow with the Grenoble Institute of Technology (INPG), Grenoble, France, where he is conducting his research at the Laboratoire Grenoblois de l’Image, de la Parole, du Signal et de l’Automatique GIPSA-Lab. His research includes information extraction from remote sensing data and multispectral and hyperspectral image analysis. He is also a European Space Agency Category-1 Principal Investigator for Earth observation data. Dr. Licciardi serves as a Referee for several scientific journals such as the IEEE T RANSACTIONS ON G EOSCIENCE AND R EMOTE S ENSING and the IEEE G EOSCIENCE AND R EMOTE S ENSING L ETTERS. Fabio Del Frate (M’03–SM’09) received the Laurea degree in electronic engineering and the Ph.D. degree in computer science from the Tor Vergata University, Rome, Italy, in 1992 and 1997. From September 1995 to June 1996, he was a Visiting Scientist with the Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge. In 1998 and 1999, he was a Research Fellow with the ESRIN establishment, European Space Agency (ESA), Frascati, Italy, where he was engaged in projects concerning end-to-end remote sensing applications. He is currently a Research Professor with the Tor Vergata University, where he teaches courses of electromagnetics and neural networks (NNs). He has been a PI in several remote sensing projects supported by ESA. He is the author or coauthor of more than 100 proceeding and journal papers with a special focus on the applications of NNs to remote sensing inversion problems. His main research topics include retrieval and classification algorithms for land cover from satellite data, oil spill detection in SAR imagery, retrieval of atmospheric variables with microwave radiometry, and data exploitation for the PROBA and OMI missions. Dr. Del Frate has been a session organizer and a member of technical committees in different international conferences. He serves as an Associate Editor of the IEEE G EOSCIENCE AND R EMOTE S ENSING L ETTERS.