www.DownloadPaper.ir
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 48, NO. 3, MARCH 2010
1355
Simulated Multispectral Imagery for Tree Species Classification Using Support Vector Machines Ville Heikkinen, Timo Tokola, Jussi Parkkinen, Ilkka Korpela, and Timo Jääskeläinen
Abstract—The information content of remotely sensed data depends primarily on the spatial and spectral properties of the imaging device. This paper focuses on the classification performance of the different spectral features (hyper- and multispectral measurements) with respect to three tree species. The Support Vector Machine was chosen as the classification algorithm for these features. A simulated optical radiation model was constructed to evaluate the identification performance of the given multispectral system for the tree species, and the effects of spectral-band selection and data preprocessing were studied in this setting. Simulations were based on the reflectance measurements of the pine (Pinus sylvestris L.), spruce [Picea abies (L.) H. Karst.], and birch trees (Betula pubescens Ehrh. and Betula pendula Roth). Leica ADS80 airborne sensor with four spectral bands (channels) was used as a fixed multispectral sensor system that leads to response values for the at-sensor radiance signal. Results suggest that this four-band system has inadequate classification performance for the three tree species. The simulations demonstrate on average a 5–15 percentage points improvement in classification performance when the Leica system is combined with one additional spectral band. It is also demonstrated for the Leica data that feature mapping through a Mahalanobis kernel leads to a 5–10 percentage points improvement in classification performance when compared with other kernels. Index Terms—Feature extraction, image sensors, pattern classification, radiometry, remote sensing.
I. I NTRODUCTION LASSIFICATION is one of the approaches in deriving information from remotely sensed forest data, and detailed tree species classification is important in forest inventories for technical, ecological, and economic reasons [14]. The adequate accuracy level for practical applications is above 90%, as the value of the forest data deteriorates rapidly at lower accuracies [15]. Tree species classification is an evident bottleneck in current remote sensing of forests, in spite of the ample research carried out into the use of airborne laser scanning and the recently introduced digital aerial multispectral cameras (e.g., [25]). These cameras offer enhanced radiometric and geometric properties when compared with traditional film cameras, but they are not customized for forestry applications but for sur-
C
Manuscript received April 6, 2009; revised May 22, 2009 and July 27, 2009. First published December 4, 2009; current version published February 24, 2010. This work was supported by the Academy of Finland under Grant 123193. V. Heikkinen, J. Parkkinen, and T. Jääskeläinen are with the Faculty of Science, InFotonics Center, University of Joensuu, 80101 Joensuu, Finland (e-mail:
[email protected]). T. Tokola is with the Faculty of Forest Sciences, University of Joensuu, 80101 Joensuu, Finland. I. Korpela is with the Faculty of Agriculture and Forestry, Department of Forest Resource Management, University of Helsinki, 00014 Helsinki, Finland. Digital Object Identifier 10.1109/TGRS.2009.2032239
veying and mapping purposes. There is still a substantial lack of basic research into the spectral characteristics distinguishing given forest objects, and such information would be valuable in specifying optimal sensors designed for forest use. The classification of objects in images is based on their geometrical or spectral features. At the single-tree level, important structures contribute to the image texture only at very high resolutions, and such images are often too expensive to acquire over large areas. We will thus ignore the spatial features of images here and focus entirely on features derived from pointwise multispectral measurements. Tree species recognition algorithms that operate at the individual tree level have been developed, and they are mainly based on the spectral properties of the observed signal [9], [10], [13], [22]. A property which characterizes the spectral imaging system is the number of the individual wavelength bands sensed in the electromagnetic spectrum. Every spectral band of the sensor has a corresponding spectral response function with some shape, and the number of these bands defines the dimensionality of the measurement vectors. The bandwidth of an individual band in the sensor system is usually called its spectral resolution. Current sensor technology allows the capture of spectral data using hundreds of high-resolution spectral bands simultaneously. Imaging devices with these capabilities provide a possibility to use well-known analytical methods to extract representative spectral space features from the data. It has been shown that the classification based on high-dimensional reflectance data can be carried out accurately and efficiently using linear mappings to lower-dimensional subspaces for each class [3], [13], [20]. Although the most informative spectral data are obtained with systems involving hundreds of spectral bands, the use of such imaging devices can be impractical or too costly in some applications. For example, the width of the imaged area (swath width) of hyperspectral devices for remote sensing is smaller than that of multispectral devices. Usually, the highdimensional hyperspectral data also involve a high level of redundancy, implying an inefficient data management and storage. The identification and usage of a small number of datadependent relevant bands would increase the applicability of such an imaging system. When the imaging device has a low number of spectral bands, the available data already reside in the fixed lowerdimensional subspace defined by the spectral response functions of the data-independent sensor system. Consequently, an efficient linear feature extraction might be disrupted due to the lower information content of the measured data. It is also possible that the system may have been optimized for some particular task, which could lead to poor performance when it
0196-2892/$26.00 © 2009 IEEE Authorized licensd use limted to: IE Xplore. Downlade on May 13,20 at 1:487 UTC from IE Xplore. Restricon aply.
1356
www.DownloadPaper.ir
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 48, NO. 3, MARCH 2010
is used for classification purposes. In this paper, the modeling problems due to lower information content of the multispectral measurements are compensated for by means of various data preprocessing methods and nonlinear feature space mappings through positive definite kernel functions. The kernel technique does not compensate for the lack of information content, but they introduce the tools to model complex data structures nonlinearly. The features derived from the kernel give representations for the data in a high-dimensional feature space, where the classification task is assumed to be easier to accomplish [24]. Recently, support vector machine (SVM) classifiers have been found to achieve excellent accuracy when used for the classification of remotely sensed data. In particular, when combined with kernel functions, SVMs have been found to give results that compete well with the best previously available classifiers [4], [7], [11], [18], [19]. An SVM classifier is also robust to noise and high-dimensional data and gives the solution (support vectors) in the form of a sparse representation, which is beneficial in practical usage [4]. Performance of a classifier is also affected by the preprocessing stage for data. Standard methods for preprocessing are different scaling methods, outlier detection, principal component analysis, and whitening. In this paper, we concentrate on outlier removal and whitening transformation. Whitening transformation is related to the use of translation invariant Mahalanobis kernel, which has been suggested to allow fast model selection of the SVM algorithm [1]. The objective of the present paper is to evaluate the effects of spectral-band selection and data preprocessing on tree species identification with the SVM algorithm. Simulated high-spectral-resolution radiance measurements and simulated response values [digital numbers (DNs)] of a four-channel Leica ADS80 airborne camera were used as a basis for this paper [16]. The simulation features are based on real reflectance measurements of pine (Pinus sylvestris L.), spruce [Picea abies (L.) H. Karst.], and birch trees (Betula pubescens Ehrh. and Betula pendula Roth). The simulated setting and the availability of a reflectance ensemble allow the Leica ADS80 system to be studied in conjunction with additional spectral response functions. Simulated measurements obtained using alternative multispectral system are compared in terms of the classification accuracy of the SVM algorithm, when the first-order polynomial, Gaussian, and Mahalanobis kernels are used. Statistical significance between classifications using different kernels and systems was evaluated with McNemar’s test. We demonstrate that significant improvements for classification accuracy are obtained when an additional spectral response function is added to the Leica system. In addition, we also present improvements of classification accuracy by using preprocessing of data. II. O PTICAL R ADIATION M ODEL In the following, we introduce the optical radiation model and data used for the simulations of the hyper- and multispectral measurements. In the notation used in the following, the wavelength variable is denoted by λ (in nanometer) and the wavelength region by Ω. Operators are written in capital Authorized licensd use limted to: IE Xplore. Downlade on May 13,20 at 1:487 UTC from IE Xplore. Restricon aply.
letters, and functions in lower case letters. Vectors are denoted by boldface letters, and the corresponding vector components are superscripted. In the visible and near-infrared regions, the at-sensor radiance component of a perfectly diffuse reflecting surface (this is also known as Lambertian surface model, where reflectance is independent of viewing angle) is approximated as η(λ) = r(λ)
l(λ)τv (λ) l0 (λ)τs (λ)τv (λ) cos(Θ)+r(λ) +η s (λ) π π (1)
where [23] l0 : Ω → R+ l : Ω → R+ τs : Ω → [0, 1]
the exo-atmospheric solar irradiance; the irradiance at the surface due to skylight; the atmospheric transmittance along the solar path; τv : Ω → [0, 1] the atmospheric transmittance along the sensor view path; r : Ω → [0, 1] the spectral reflectance of the object; Θ the angle between the surface normal and the solar incident angle; the path scattered radiance at-sensor η s (λ) component. The dependence on the spatial location is not explicitly written into the earlier equation. For a non-Lambertian surface, we could assume a fixed viewing angle and replace r(λ)/π with the bidirectional reflectance distribution function [23]. We approximate that τv = 1 for our airborne sensor and assume that η s (λ) = 0 for the indirect component. Assuming that the angle Θ is fixed, the effect of the electromagnetic radiation η in the k-band camera can be modeled as ⎞ ⎛ (2) xi = Γ ⎝ξ1i η(λ)τc (λ)si (λ)dλ⎠ Ω
where i = 1, . . . , k, si : Ω → [0, 1] is the spectral response function of the ith camera band, and τc : Ω → [0, 1] is the transmittance of the camera optics. The scalar ξ1i corresponds to the chosen measurement geometry and exposure setting, while the function Γ gathers together the nonlinearity of the system. Substituting the radiance function into (2), we have ⎞ ⎛ (3) xi = Γ ⎝ξ2i wi (λ)r(λ)dλ⎠ Ω
with ξ2i = (1/π)ξ1i cos(Θ) as the system calibration constant and
wi (λ) = (l0 (λ)τs (λ) + l(λ)) τc (λ)si (λ) = ld (λ)τc (λ)si (λ). (4) We assume here that the response functions {si }ki=1 are located in the wavelength interval of the visible and near-infrared radiation Ω = [390, 850] nm. This wavelength region was based on the properties of the available reflectance ensemble. We use the discrete high-resolution daylight measurement as an approximation for the irradiance ld (λ) = l0 (λ)τs (λ) + l(λ).
(5)
www.DownloadPaper.ir
HEIKKINEN et al.: SIMULATED MULTISPECTRAL IMAGERY FOR TREE SPECIES CLASSIFICATION
Fig. 1.
1357
Daylight irradiance, sampling of 5 nm.
The spectral power distribution of daylight used here corresponds to spectroradiometer measurements of the hemispheric daylight in the region of 380–780 nm, including the global spectral irradiance E on a horizontal surface from direct sunlight and the entire sky. The measurements correspond to midday conditions with clear weather, and they were carried out in Joensuu, Finland. Due to the restricted wavelength range of the measurements, the daylight irradiance was extended to the region of 380–850 nm using a constant value in the region of 780–850 nm (see Fig. 1). This daylight irradiance was used for all the simulations. A. Reflectance Ensemble The reflectance spectra of needles of young (less than 40 years old) Scots pines (Pinus sylvestris L.) and Norway spruces [Picea abies (L.) H. Karst.] and the leaves of birches (Betula pubescens Ehrh. and Betula pendula Roth) collected from Finland and Sweden were used as experimental data [13]. The spectroradiometric measurements were made in clear weather during the growing season in June 1992. Each radiance measurement represented the average spectrum of thousands of leaves on a growing tree. The measured crowns were thick in order to minimize the effects of branch color and background illumination. The reflectance component of the signal was calculated using the baseline measurement at a distance of 5 m, with the aid of a BaSO4 reference surface [12]. In the measurement setting, the sun was always behind the measurement direction of the sensor (backscattering), with a clear path toward the object. The solar vector had an almost constant elevation angle, but the azimuth angle with respect to the measurement direction had a variation. The measurements can be considered to be free of spectral signatures from other classes. They were carried out on the ground at a distance of 50 m (field of view 0.6 m2 ) using a PR 713/702 AM spectroradiometer in the wavelength interval of 390–1070 nm with a 4-nm spectral sampling. Repetition accuracy of the device is 3.5%. The original spectra were transformed to the wavelength range of 390–850 nm at 5-nm sampling intervals by linear interpolation. The upper limit of the wavelength was set to 850 nm, as no significant differences in the spectra were found above this wavelength for these measurements. The ensemble sizes for the three classes were almost equal: 336 samples for birch, 369 samples for pine, and Authorized licensd use limted to: IE Xplore. Downlade on May 13,20 at 1:487 UTC from IE Xplore. Restricon aply.
Fig. 2. Average spectra of birch, pine, and spruce ensembles.
348 samples for spruce. Differences between average spectra of deciduous (birch) and coniferous (pine and spruce) groups are shown in Fig. 2. More details on the measurement setting and the analysis of the data subspaces for discrimination and approximation are given in [13]. B. Response Functions Systems {si }ki=1 based on rectangular response functions were used in this paper. Examples of real multispectral systems using response functions of this kind are the Leica ADS40 and ADS80 (airborne digital sensor) systems [16]. These ideal functions {si }ki=1 correspond to the characteristic functions of nonoverlapping wavelength intervals {Ωi }ki=1 and are defined as si (λ) = ci χΩi (λ)
(6)
where ci ∈ R+ and χΩi (λ) =
1, λ ∈ Ωi 0, otherwise.
(7)
The wavelength supports for the Leica system are Ω1 = [428, 492], Ω2 = [533, 587], Ω3 = [608, 662], and Ω4 = [833, 887]. These bands are comparable in their spectral properties to Landsat bands 1–4 [2]. The response functions of the Leica system are shown in Fig. 3. C. Numerical Approximation of Multispectral Responses High-spectral-resolution reflectance measurements were used as approximations for the true reflectance values r(λi ), where the measurements i = 1, . . . , n correspond to the spectroradiometer measurements and λi correspond to uniform sampling of the wavelength interval Ω. For the camera, we assume that Γ can be inverted and τc = 1. We used Simpson’s rule as a standard quadrature model for the numerical integration in order to simulate the camera response values in accordance with (2). The approximated model is formulated as
m i i Ω x ≈ ξ2 wi (λt )q(λt )r(λt ) (8) 3m t=0
1358
www.DownloadPaper.ir
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 48, NO. 3, MARCH 2010
Fig. 3. Leica system with (solid) an additional band in the 705–755-nm waveband, (dash-dotted) an additional band in the 710–725-nm waveband, and (dashed) an additional band in the 695–725-nm waveband.
where m is given by the fixed sampling interval ∆λ for the region Ω and q is the quadrature weight of Simpson’s rule [21]. Using a vector notation r = (r(λ0 ), . . . , r(λm ))T ∈ n R (n = m + 1) for the measured reflectance and a matrix notation for the responsivity matrix W ∈ Rk×n (including the weight q), we can write (2) as x = (x1 , . . . , xk )T ≈ W r.
(9)
This numerical model was used to form the simulated responses from a sampled spectrum r. The sampling interval was set to 5 nm (n = 93), which corresponded to the maximal spectral resolution of our reflectance data. III. E XTENSION OF THE S ENSOR S YSTEM We chose to study how the addition of a new characteristic response function would improve the classification performance of the four-band Leica system introduced in Section II-B. Three alternative response functions with different bandwidths are considered in this paper. In the first case, the additional response function was chosen based on the existing real system. In some configurations, the Leica system provides an optional near-infrared channel supported in the region of 705–755 nm [2]. The four-band Leica system and the extended system are shown in Fig. 3. In the second case, the four-band Leica system was extended using an extra response function based on the properties of the reflectance ensembles of the trees and the original Leica system. We fixed two different bandwidths and calculated the positions of these response functions by measuring the separability among the three classes. The positions of the new response functions were allowed to change only in the nonsupported 662–833-nm gap of the original Leica system. The details of the calculation are described below. Two new response functions corresponding to the different bandwidths are derived by maximizing the average Jeffries–Matusita distance (see [26]) of the three tree classes. This distance is based on the calculation of the average Bhattacharyya distance between the density functions of two classes. Assuming normal distribution for the classes, the distance between class i and j is defined as Jij = 2 (1 − exp(−αij )) Authorized licensd use limted to: IE Xplore. Downlade on May 13,20 at 1:487 UTC from IE Xplore. Restricon aply.
(10)
Fig. 4. Leica system and average spectra for the second derivatives of the tree ensembles in the wavelength region of 420–850 nm (solid line = spruce, dash-dotted line = birch, and dashed line = pine).
where 1 1 αij = (xi −xj )T ((Σi +Σj )/2)−1 (xi −xj )+ δij (11) 8 2 1 1 δij = ln(10) lg (|(Σi +Σj )/2|)− lg (|Σi |)− lg (|Σj |) . 2 2 (12) In the earlier formulation, covariance matrices Σi and Σj correspond to the simulated 5-D multispectral responses, and the determinants of these matrices are presented in their tenbased logarithmic scales (see [26] for more information). Multispectral responses were simulated using (9) with concatenated ˆ T = [w ˆp,b W ]T ∈ Rn×5 , where W corresponsivity matrix W p,b respond to the four-band Leica system and w ˆp,b correspond to the fifth response function with bandwidth (b) and position (p). In this paper, the bandwidths were chosen to be 30 and 15 nm. For both of these bandwidths, we calculated the optimal positions from the region of 662–833 nm (using a 5-nm sampling grid), which were defined to correspond to the maximal average distances between the classes. In the calculations, we used randomizations of available dataset, where 75% of the available reflectance data were used to simulate the multispectral ensembles for the three classes. Approximately, same amount of data samples was used for every class. Because of the deviation in the properties of the randomized datasets, there was some variation in the optimal positions. This variation with respect to optimal positions was approximately ±10 nm, but all the distance values in this region were close to the maximal values. The response functions chosen for this paper are located in the regions of 695–725 and 710–725 nm (see Fig. 3). To support the choice of the new response functions presented earlier, we analyzed the differences in second-order derivative features between classes so as to identify changes in the reflectance curve. Derivative analysis has already been used for remotely sensed data in previous studies [5] [26]. In order to dampen the effect of measurement noise, mean filtering with a window size of three units was performed before the calculation of the divided difference approximation for the second derivative [26]. When the average derivative curves are analyzed, it can be seen that the 660–830-nm interval shows an interesting behavior when compared with other wavelength regions and locations of the original Leica response functions (see Fig. 4). The average of the second derivative for the birch
www.DownloadPaper.ir
HEIKKINEN et al.: SIMULATED MULTISPECTRAL IMAGERY FOR TREE SPECIES CLASSIFICATION
minimization problem ⎧ ⎨ minw,b,ξ 12 wT w + C li=1 ξi , s.t. yi wT Φ(xi ) + b > 1 − ξi , ⎩ and ξi > 0,
i = 1, . . . , l i = 1, . . . , l
1359
(15)
where the term wT w/2 corresponds to the margin between classes, the parameter C controls the penalization of the samples located at the incorrect side of the decision boundary, and {ξi }li=1 are slack variables which indicate misclassification of sample xi when ξi > 1 [24]. The solution is obtained using the dual space of Lagrange multipliers and the property κ(x, z) = Φ(x)T Φ(z) Fig. 5. Average spectra for the second derivatives of the tree ensembles in the unsupported wavelength region of 660–780 nm of the Leica system (solid line = spruce, dash-dotted line = birch, and dashed line = pine).
spectra deviates strongly from that of the pine and spruce spectra in the regions of 690–720 and 725–755 nm. It can be also seen that the derivative of spruce spectra shows deviation in shape from that of the pine and birch spectra in the region of 710–725 nm (see Fig. 5). Summarizing the earlier discussion, the final candidates for the fifth band were χ[705,755] (λ), χ[695,725] (λ), and χ[710,725] (λ). The classification performance was studied for the three five-channel systems corresponding to these response functions. Details of the classification are presented in the following sections. IV. C LASSIFICATION D ETAILS We used an SVM classification algorithm to discriminate between the simulated multispectral or hyperspectral measurements. The algorithm is based on the optimization problem of finding a separating hyperplane between the feature vectors of two classes [24]. The separating hyperplanes identified by the SVM maximize the margin between the classes and are robust for the classification of unseen samples. Since the multispectral/hyperspectral signatures inside the three classes were not mixed with signatures from other sources, this setting can be defined as a pure pixel classification [3]. Let us assume that we have a binary classification problem with training data in the form {xi , yi }li=1 , with xi ∈ Rk and yi ∈ {−1, 1}. In the SVM framework, it is assumed that the data are mapped to some feature space F with feature map Φ : Rk → F
(13)
and an explicit representation of the decision function is written as
f (x) = sign wT Φ(x) + b (14)
where b is a bias term and wT Φ(x) + b = 0 defines a hyperplane in the feature space. If the data are separable in the feature space, it can be written that f (xi ) ≥ +1, when yi = +1 and f (xi ) ≤ −1, when yi = −1. Assuming that the two classes are not separable in the feature space, the classification model is derived as the solution to the Authorized licensd use limted to: IE Xplore. Downlade on May 13,20 at 1:487 UTC from IE Xplore. Restricon aply.
(16)
where kernel κ defines the mapping Φ : Rk → F of input samples x, z ∈ Rk to the feature space F [24]. In this way, nonlinear decision boundaries in the input space are defined without having to make explicit use of the possibly infinite dimensional feature space F. The decision function for the SVM becomes
l s αi yi κ(x, xi ) + b (17) f (x) = sign i=1
s where ls is the number of support vectors, {αi }li=1 are the calculated Lagrange multipliers, and κ is the selected positive definite kernel function [24]. The algorithm for multiclass classification is an extension of binary classifications using separate binary classifications. For more information on SVM and multiclass techniques, please refer to the study in [7], [18], and [24]. The present SVM classification was performed with a polynomial kernel of first degree
κL (x, z) = xT z + 1
(18)
and a Gaussian kernel
κG (x, z) = exp −δx − z22
(19)
where δ defines the length scale of the kernel. When the firstdegree polynomial kernel is used, it is assumed that the decision boundaries between the classes are hyperplanes in the original input space. A. Data Preprocessing The data were standardized to a zero mean and unit variance before the calculations. For the Gaussian kernel, this preprocessing is equivalent to the use of a kernel (20) κG (x, z) = exp −δ (x − z)2Σ−1 .
The norm x2Σ−1 = xT Σ−1 x is defined by a diagonal matrix 2 Σ−1 ii = 1/σi , and σi denotes the standard deviation of the ith components of the training set. The above preprocessing step can be generalized with a full covariance matrix using the translation invariant Mahalanobis
www.DownloadPaper.ir
1360
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 48, NO. 3, MARCH 2010
TABLE I C LASSIFICATION E RROR R ATES W HEN THE S IMULATED H YPERSPECTRAL DATA A RE U SED . S TATISTICALLY S IGNIFICANT D IFFERENCES TO THE B EST P ERFORMING K ERNEL A RE I NDICATED W ITH U NDERLINES
Fig. 6. Assumed birch outliers (10% of the data) for one training set. The outliers are calculated from the Leica DN using the Mahalanobis distance. The data in the figure are represented in terms of the two most significant principal components of the set.
kernel κM (x, z) = exp −δ/m (x − z)2Σ−1
(21)
where Σ = E[(x − x)(x − x)T ] denotes the covariance matrix of the training ensemble {xi }li=1 , x denotes the expected value of the ensemble and m = k [1]. This preprocessing corresponds to data whitening, where the data are first represented in orthogonal directions defined by the eigenvectors of the covariance matrix and scaled to have equal variance in this orthogonal representation. Using eigendecomposition of the covariance matrix Σ = U SU T and notation d = (x − z) for the difference vector, the generalized norm can be written in the form dT Σ−1 d = dT U S −1 U T d =
k
(dT ui )2 /Sii
(22)
i=1
where ui and Sii denote the ith eigenvector and corresponding eigenvalue, respectively. Equation (22) shows that coordinates dT ui of the difference vector are divided by the data variances in the corresponding directions. Directions corresponding to a small data variance are given more weight. Usage of Mahalanobis kernel is closely related to outlier removal based on the Mahalanobis distance [23] x − x2Σ−1 = (x − x)T Σ−1 (x − x).
(23)
When using this distance for data preprocessing, we assume that the data distribution in one class has an ellipsoidal shape. According to this assumption, large Mahalanobis distances indicate anomalous data items and should be removed from the ensemble. For example, the dataset used in this study includes reflectance measurements with different solar geometries and therefore contains some variation that might disrupt the classification performance. It can be expected that this kind of refinement method for training data weights the essential parts of the ensemble more efficiently (see Fig. 6). B. Model Training The kernel and margin parameters were found using a tenfold cross-validation routine [24], with a grid search performed for the Gaussian kernel, while an alternative two-step line search Authorized licensd use limted to: IE Xplore. Downlade on May 13,20 at 1:487 UTC from IE Xplore. Restricon aply.
method was used for the Mahalanobis kernel [1]. In the first step, a line search for parameter C was performed using a fixed value δ = 1. In the second step, the resulting parameter C was fixed, and a line search was performed for parameter δ. This provided a significant speeding up of the training phase without any significant decrease in accuracy. The kernel evaluations were carried out using the highdimensional radiance measurements or simulated camera responses (DNs). The multiclass “one against one” method was used for the classifications, which were carried out using the SimpleSVM Matlab toolbox (v.2.31) [17]. In this method, the SVM is trained separately for each pair of classes, and the decision regarding the class label for the test sample x is made by voting between the binary classifiers concerned. This strategy has been verified as accurate for land cover classification [18].
V. E XPERIMENTAL D ETAILS The classification performance was calculated for the DNs from the simulated multispectral systems and for the simulated full-resolution radiance data with 5-nm sampling. The wavelength interval was fixed at 390–850 nm. The covariance matrix is poorly suited for use with full-resolution data due to the small effective dimensionality of the data and the instability of the inversion. Magnitude of the condition numbers of the covariance matrices of radiance data was 1011 . Because of this, the results of the classification of the full-resolution data with the Mahalanobis kernel are not presented. A subspace mapping or regularization technique of some kind would be needed in order to use this kernel efficiently with high-dimensional data. Misclassification ratios were calculated for the combinations of sensor system and kernel, employing five randomizations of the available data to the training and test sets. These same randomizations were used for all the systems and kernels. The sample sizes for the tree classes were the following: birch, 336; pine, 369; and spruce, 348. In each randomization, 75% of the samples in each class were assigned to the training set, and the remaining 25% were used in the test set. The data were standardized for the first-degree polynomial and Gaussian kernels. This data preprocessing method was also compared to preprocessing with “outlier” removal. In this paper, 10% of the training data items corresponding to the largest Mahalanobis distances were removed; this is done for each class separately using the simulated 4- or 5-D camera responses. The results are presented in Tables I–V. The results obtained after extending the Leica system with additional sensors are denoted in Tables IV and V by the respective wavelength intervals supported. In the following, the Gaussian and Mahalanobis kernels are called as nonlinear kernels.
www.DownloadPaper.ir
HEIKKINEN et al.: SIMULATED MULTISPECTRAL IMAGERY FOR TREE SPECIES CLASSIFICATION
TABLE II C LASSIFICATION E RROR R ATES W HEN DN S F ROM THE L EICA S YSTEM A RE U SED . S TATISTICALLY S IGNIFICANT D IFFERENCES TO THE B EST P ERFORMING K ERNEL A RE I NDICATED W ITH U NDERLINES
TABLE III C LASSIFICATION E RROR R ATES W HEN DN S F ROM THE L EICA S YSTEM A RE U SED AND 10% OF THE T RAINING DATA I TEMS A RE R EMOVED AS O UTLIERS . S TATISTICALLY S IGNIFICANT D IFFERENCES TO THE B EST P ERFORMING K ERNEL A RE I NDICATED W ITH U NDERLINES
TABLE IV C LASSIFICATION E RROR R ATES W HEN DN S F ROM THE F IVE -C HANNEL S YSTEMS A RE U SED . S TATISTICALLY S IGNIFICANT D IFFERENCES TO THE B EST P ERFORMING K ERNEL AND S YSTEM A RE I NDICATED W ITH U NDERLINES AND B OLDFACE , R ESPECTIVELY
1361
hypothesis is true, the probability of having a McNemar’s value greater than the critical value is less than 5%. Results are presented in Tables I–V, so that, for every sensor system, statistically significant differences to the best performing kernel are indicated with underlining. The McNemar’s test for statistical significance shows that there are significant differences between the first-degree polynomial and nonlinear kernels. The significant difference between the Gaussian and Mahalanobis kernels can be seen only for the simulated fourchannel Leica system (Sets 2 and 5). When the outlier removal is performed, the significant differences between these kernels vanish. In the case of the five-channel systems, no significant difference is detected between these two kernels. For every kernel (Tables IV and V), statistically significant differences to the best performing sensor system is indicated with bold face notation. It was validated for all the kernels that there are no statistically significant differences between the systems with additional support in the 695–725- and 705–755-nm regions. This result is also valid for the case where outlier removal is used. Significant differences can be found when the system supported in the region of 710–725 nm is compared with two other systems. B. Classification Results
TABLE V C LASSIFICATION E RROR R ATES W HEN DN S F ROM THE F IVE -C HANNEL S YSTEMS A RE U SED AND 10% OF THE T RAINING DATA I TEMS A RE R EMOVED AS O UTLIERS . S TATISTICALLY S IGNIFICANT D IFFERENCES TO THE B EST P ERFORMING K ERNEL AND S YSTEM A RE I NDICATED W ITH U NDERLINES AND B OLDFACE , R ESPECTIVELY
A. Statistical Significance of Classification Differences McNemar’s test was used to test the statistical significance between the classification results [6], [8]. The McNemar’s value with “continuity correction” is defined as M=
(|f12 − f21 | − 1)2 f12 + f21
(24)
where f12 is the number of samples misclassified by classifier 1 but not by classifier 2 and f21 is the number of samples misclassified by classifier 2 but not by classifier 1. The null hypothesis is that the two different classifiers 1 and 2 have the same error rate, which means that f12 = f21 . McNemar’s test is based on a χ2 test with one degree of freedom. In this paper, the χ2 critical value with a 5% level of significance was chosen, and with one degree of freedom, the value is 3.8414. If the null Authorized licensd use limted to: IE Xplore. Downlade on May 13,20 at 1:487 UTC from IE Xplore. Restricon aply.
The results in Table I show that the classification based on the hyperspectral measurements of radiance leads to superior performance relative to the classification based on the four-band measurements in Table II. The large number of wavelength bands allows small changes in spectral shape to be detected. For the high-dimensional radiance data, a first-degree polynomial kernel gives a slight improvement in classification accuracy over the Gaussian kernel. The results for the low-dimensional inputs suggest that it is beneficial to use the nonlinear kernel. The use of a nonlinear feature space compensates for the poor spectral accuracy of the measurement system; a decrease of approximately 5–23 percentage points in the misclassification performance is achieved when a nonlinear kernel is substituted for the first-degree polynomial kernel. The Mahalanobis kernel outperforms the Gaussian kernel in four cases and has almost equal performance for one set. It can be seen that the maximal difference in misclassification in favor of the Mahalanobis kernel is 16 percentage points. It is shown in Table III that the removal of 10% of data points as outliers clearly increases the accuracy of the first-degree polynomial and Gaussian kernels when compared with the training sets including outliers, whereas for the Mahalanobis kernel, the results are similar with or without this preprocessing. We also studied how an additional response function would enhance the classification performance. Three wavelength locations with varying support were tested. It is shown in Table IV that all these five-channel systems show a significant improvement in classification performance relative to the results for the four-channel system in Table II, with the misclassification ratio decreasing by 1–23 percentage points depending on the kernel. Nonlinear kernels seem to benefit most from the addition of a new band; the largest increase in accuracy is obtained for Set 2. For the Gaussian kernel, there is a decrease of 22.8 percentage
1362
www.DownloadPaper.ir
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 48, NO. 3, MARCH 2010
points in the misclassification ratio. The response function in the 705–755-nm region shows almost identical performance to that in the 695–725-nm region, with the misclassification ratio differing between these two regions by 0–4 percentage points in favor of the former. A response function in the region of 710–725 nm leads to the most accurate classification with both the Gaussian and Mahalanobis kernels, reducing the average error by approximately 0.5–9 percentage points relative to the other two five-channel systems. On the other hand, this sensor system leads to an increase in the errors relative to the other five-channel systems with the first-degree polynomial kernel. In most cases, the Gaussian and Mahalanobis kernels have similar performance with these five-channel systems. Outlier removal for the five-channel systems (Table V) again leads to improved performance for the first-degree polynomial kernel with almost every dataset and every system, whereas the results for the Gaussian kernel with these five-channel systems are poorer in almost every case due to preprocessing. For the Mahalanobis kernel, the results again show a somewhat similar performance with or without preprocessing. The results are poorer for the systems with additional support in the 695–725and 705–755-nm regions but remain almost unaltered for the system with a sensor in the 710–725-nm region. VI. D ISCUSSION We presented the classification results when additional spectral response functions were used with the Leica system. The bandwidth of the best performing response function χ[710,725] (λ) is smaller than that of the other two response functions χ[705,755] (λ) and χ[695,725] (λ), and the wavelength range of 710–725 nm is also covered by the other two additional bands. Thus, the results suggest that the deviation in performance is partly due to the smaller bandwidth of this response function. The effectiveness of the band χ[710,725] (λ) depends also on the kernel because results suggested that the first-degree polynomial kernel does not show consistent performance difference between the three different bands. The results also show that there was no significant performance difference between the χ[705,755] (λ) and χ[695,725] (λ) bands, although the bandwidth difference is 20 nm. The experiments suggest that a decrease in the bandwidth (from 30 to 15 nm in this paper) is useful only if the location is specified accurately. For a response function with 15-nm bandwidth, a small shift in the position had sometimes significant effect for the classification of samples from the test sets. The experiments also showed that the band-selection method for the optimal positions had deviating results, depending on the used data randomization (75% of the available data were used in every randomization). Although the deviation in optimal positions was small, a careful data evaluation is needed, particularly when position is calculated for the response function with small bandwidth. The three new response functions were located on the wavelength interval of 660–830 nm in this paper. This region is motivated since response functions in this region avoid overlap with the four Leica response functions. In addition, the derivative analysis shows that the tree spectra have interesting features Authorized licensd use limted to: IE Xplore. Downlade on May 13,20 at 1:487 UTC from IE Xplore. Restricon aply.
in this region. It can be argued that other useful regions can be found beyond the 660–830-nm region. For example, the average second derivatives have also elevated values in intervals of 495–525 and 590–605 nm, as shown in Figs. 4 and 5. These two intervals are not supported by the original Leica system and might also provide valuable information from the tree species. However, further investigation is needed to find out if these wavelength regions have potential also for the classification of these tree species. When classification of these reflectance data at the same resolution as in the present case was attempted previously with linear subspace classifiers in [13], a linear pseudoinverse classifier was found to give the best performance, with a similar performance to that of the SVM model used here for the simulated radiance data. In comparison with our results for the lowdimensional multispectral measurements, it has been reported that it is beneficial to increase the order of the polynomial kernel if only a small number of input variables are present [11]. Results suggest that the measurements obtained from a simulated hyperspectral imaging device will capture essential information from the training set also with first-degree polynomial kernel. Some techniques for the preprocessing of training sets were evaluated here in order to improve further the performance of the SVM classification based on the lowdimensional multispectral measurements. For the four-channel system, it was verified that the usage of the Mahalanobis distance as a preprocessor or the usage of the Mahalanobis kernel improved the classification accuracy. This suggests that the SVM model based on the Mahalanobis kernel is robust with respect to outliers in the case of the datasets used in this paper. The results for the five-channel systems suggest that the outlier removal used here is beneficial only when the classifier is used with a first-degree polynomial kernel. The Gaussian kernel was capable of extracting essential information from the training data of these five-channel systems without outlier processing. On the other hand, the usage of the Mahalanobis kernel did not lead to any decrease in performance when compared with the Gaussian kernel. It can be assumed that some of the so-called “outliers” in data will be due to, for instance, varying measurement conditions (e.g., variable sensor–object–sun geometry) or measurement errors. On the other hand, the set of “outliers” includes also samples due to natural spectral variation within and between species. In practice, it might be difficult to remove the disrupting data variation automatically in an optimal way so that no essential information is removed from the training set. The results obtained here for the five-channel systems, for example, suggest that the procedure for the removal of 10% of the training samples using Mahalanobis distance was too extreme and it decreased the classification accuracy. On the other hand, the results for this dataset suggest that the Mahalanobis kernel performed the processing of “outliers” more efficiently and automatically, without need to set any threshold value. It should be noted that, when compared with the Gaussian kernel, different cross-validation routines were used for the training of the SVM with Mahalanobis kernel. Training method based on two line searches has effect to the performance of the Mahalanobis kernel, but the difference to the grid search
www.DownloadPaper.ir
HEIKKINEN et al.: SIMULATED MULTISPECTRAL IMAGERY FOR TREE SPECIES CLASSIFICATION
was found to be small for our data. It is also noted that, when the Mahalanobis kernel is used, the inverse covariance matrix includes the effect of the training samples from all the three classes (total covariance matrix). It was verified for these data that the set of outliers detected using the Mahalanobis distance for every class separately is similar to the set of outliers when calculated using the inverse of the total covariance matrix. This similarity gives some explanation for the similar results of the Mahalanobis and Gaussian kernels with outlier removal. VII. C ONCLUSION A significant amount of redundancy exists in spectral radiance from natural objects, and intelligent signal measurement (or compression) is appropriate. This was achieved here using four-channel system based on a real Leica ADS40/ADS80 sensor system. A simulated optical radiation model was used to evaluate the tree species classification performance of the given sensor system using the SVM classifier with three different kernel functions. The effects of an additional fifth spectral band and data preprocessing were studied using this simulator. We have employed a model in the simulations which includes the following properties for the sensed signal. 1) The reflectance spectra (as measured at ground level) are assumed to correspond to the signal sensed from the geometry of the airborne camera. 2) A pure pixel assumption [3]. Depending on the distance between the camera and the surface (varying flight altitude), the reflectance distribution for practical measurements is a mixture of different signatures present in the scene. 3) The incident light at the sensor also has some effect on account of indirect scattered component, which has been ignored in this paper. 4) An unknown measurement noise component is included in the measured reflectance distributions and is propagated to simulated responses via a quadrature model in (9). 5) The spectral response functions of the simulated camera were an idealization, and in reality, they will be influenced by the properties of the lens, beam splitter, and interference filters and by the sensitivity of the chargedcoupled device system [2]. In the previous study, it has been shown that it is possible to classify reflectance spectra accurately in a 3-D subspace using spectral response functions from linear subspace classifiers [13]. The modeling presented in this paper is more closely related to practical band construction since it allows us to interpret the subspace-mapped data directly as simulated measurements from a multispectral camera. Real construction of spectral response functions derived from the optimal subspace classifier is impossible due to the wildly oscillating behavior of these functions (see [13]), and it may also be unrealistic to use only certain very narrow wavelength bands in the measurement system. Classification performance nevertheless degenerates significantly from the results obtained with high-dimensional measurements when the camera DNs corresponding to the fixed Authorized licensd use limted to: IE Xplore. Downlade on May 13,20 at 1:487 UTC from IE Xplore. Restricon aply.
1363
four-channel Leica system are used as input vectors for the SVM classifier. The results indicate a need for a higher number of bands, decrease in the bandwidths, or new positioning of the bands in order to improve the classification accuracy. Of the three extensions of the Leica system to a five-channel system evaluated here, band selection based on the use of the interval of 710–725 nm showed promising results, with an average misclassification ratio of 15%. It was also assumed that the classification performance in low-dimensional multispectral spaces was decreased due to “outliers” in the class samples. It was shown for the fourchannel Leica system that the use of the Mahalanobis kernel or outlier removal increased the accuracy of the SVM classifier. In addition to this, results suggest that the Mahalanobis kernel performed the outlier processing automatically without any user interference and also provided significant speedup in the training phase of the classifier. ACKNOWLEDGMENT The authors would like to thank anonymous reviewers for the advice and suggestions concerning this paper. R EFERENCES [1] S. Abe, “Training of support vector machines with Mahalanobis kernel,” in Proc. ICANN, 2005, pp. 571–576. [2] U. Beisl, “Absolute spectroradiometric calibration of the ADS40 sensor,” in Proc. Congrès ISPRS Commission Technique I. Symp., Marne-la-Vallée, France, 2006, pp. 14–18. [3] C.-I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification. New York: Kluwer, 2003. [4] G. Camps-Valls and L. Bruzzone, “Kernel-based methods for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 6, pp. 1351–1362, Jun. 2005. [5] T. H. Demetriades-Shah, M. D. Steven, and J. A. Clark, “High-resolution derivatives spectra in remote sensing,” Remote Sens. Environ., vol. 33, no. 1, pp. 55–64, Jul. 1990. [6] T. G. Dietterich, “Approximate statistical tests for comparing supervised classification learning algorithms,” Neural Comput., vol. 10, no. 7, pp. 1895–1923, Oct. 1998. [7] G. M. Foody and A. Mathur, “A relative evaluation of multiclass image classification by support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 6, pp. 1335–1343, Jun. 2004. [8] G. M. Foody, “Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy,” Photogramm. Eng. Remote Sens., vol. 70, no. 5, pp. 627–633, 2004. [9] F. A. Gougeon, D. A. Leckie, D. Paradine, and I. Scott, “Individual tree crown species recognition: The Nahmint study,” in Proc. Int. Forum Autom. Interpretation High Spatial Resolution Digital Imagery Forestry, D. A. Hill and D. G. Leckie, Eds., Victoria, BC, Canada, 1998, pp. 209–223. [10] A. Haara and M. Haarala, “Tree species classification using semiautomatic delineation of trees on aerial images,” Scand. J. For. Res., vol. 17, no. 6, pp. 556–565, Nov. 2002. [11] C. Huang, L. S. Davis, and J. R. G. Townshend, “An assessment of support vector machines for land cover classification,” Int. J. Remote Sens., vol. 23, no. 4, pp. 725–749, Feb. 2002. [12] R. D. Jackson, S. M. Moran, P. N. Slater, and S. F. Biggar, “Field calibration of reference reflectance panels,” Remote Sens. Environ., vol. 22, no. 1, pp. 145–158, Jun. 1987. [13] T. Jääskelainen, R. Silvennoinen, J. Hiltunen, and J. P. S. Parkkinen, “Classification of the reflectance spectra of pine, spruce, and birch,” Appl. Opt., vol. 33, no. 2, pp. 2356–2362, Apr. 1994. [14] I. Korpela, “Individual tree measurements by means of digital aerial photogrammetry,” Silva Fennica Monographs, vol. 32004. [15] I. Korpela and T. Tokola, “Potential of aerial image-based monoscopic and multiview single-tree forest inventory: A simulation approach,” For. Sci., vol. 52, no. 2, pp. 136–147, Apr. 2006. [16] ADS80 Datasheet, Leica Geosystems AG, Heerbrugg, Switzerland, 2008. [Online]. Available: http://www.leica-geosystems.com
1364
www.DownloadPaper.ir
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 48, NO. 3, MARCH 2010
[17] G. Loosli, SimpleSVM: The Matlab Toolbox2004. [Online]. Available: http://gaelle.loosli.fr/research/tools/simplesvm.html [18] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004. [19] G. Mercier and M. Lennon, “Support vector machines for hyperspectral image classification with spectral-based kernels,” in Proc. IEEE IGARSS, 2003, pp. 288–290. [20] E. Oja, Subspace Methods of Pattern Recognition. Hertfordshire, U.K.: Res. Studies Press, 1983. [21] G. M. Phillips and P. J. Taylor, Theory and Applications of Numerical Analysis. New York: Academic, 1973. [22] A. Pinz, “Tree isolation and species classification,” in Proc. Int. Forum Autom. Interpretation High Spatial Resolution Digital Imagery Forestry, D. A. Hill and D. G. Leckie, Eds., Victoria, BC, Canada, 1998, pp. 127–139. [23] R. A. Schowengerdt, Remote Sensing: Models and Methods for Image Processing, 3rd ed. Amsterdam, The Netherlands: Elsevier, 2007. [24] B. Schölkopf and A. J. Smola, Learning With Kernels. Cambridge, MA: MIT Press, 2002. [25] R. Säynäjoki, P. Packalén, M. Maltamo, M. Vehmas, and K. Eerikäinen, “Detection of aspens using high resolution aerial laser scanning data and digital aerial images,” Sensors, vol. 8, no. 8, pp. 5037–5054, 2008. [26] F. Tsai and W. D. Philbot, “Derivative-aided hyperspectral image analysis system for land-cover classification,” IEEE Trans. Geosci. Remote Sens., vol. 40, no. 2, pp. 416–425, Feb. 2002.
Jussi Parkkinen received the M.Sc. degree in medical physics and the Ph.D. degree in mathematics from the University of Kuopio, Kuopio, Finland, in 1982 and 1989, respectively. In 1989–1990, he was a Visiting Researcher with The University of Iowa, Iowa City. In 1990, he was a Visiting Professor with the University of Saskatchewan, Saskatoon, SK, Canada. In 1991– 1992, he was a Professor and the Head of the Department of Computer Science, University of Kuopio. In 1992–1998, he was a Professor of information processing, and in 1995–1998, he was the Dean of the Department of Information Technology, Lappeenranta University of Technology, Lappeenranta, Finland. Since 1999, he has been a Professor of computer science, and since 2007, he has been the Vice Rector responsible for research with the University of Joensuu, Joensuu, Finland. He specializes in spectral color image analysis and pattern recognition. Since 2007, he has been a Visiting Professor with Chiba University, Chiba, Japan. Dr. Parkkinen was the Chairman of the Finnish Pattern Recognition Society in 1995–1999. He is a fellow and a member of the governing board of the International Association of Pattern Recognition. He was the Chairman of the CIE TC8-07 technical committee on multispectral imaging in 2004–2008.
Ville Heikkinen received the M.Sc. degree in applied mathematics from the University of Joensuu, Joensuu, Finland, in 2004. He is currently with the Department of Computer Science and Statistics, University of Joensuu. He has worked with method development in spectral data analysis and classification.
Ilkka Korpela received the Ph.D. degree in forestry from the University of Helsinki, Helsinki, Finland, in 2004. He is currently a Researcher with the Department of Forest Resource Management, University of Helsinki. He has worked with method development in 3-D measurement and classification of forest vegetation using terrestrial and airborne image and light detection and ranging (LiDAR) data.
Timo Tokola received the D.Sc. degree in forestry from the University of Joensuu, Joensuu, Finland. He has over 20 years of professional experience. He is currently a Professor of forest information technology with the Faculty of Forest Sciences, University of Joensuu. He had previously mainly worked in the fields of natural resource inventory, geographical information systems (GIS), information system planning, and forest management planning. He has been working in various projects as a Coordinator. His private and public sector assignments include modern forest management including aerial photography, photogrammetry, satellite remote sensing, terrestrial and airborne laser scanning, GPS-based mapping, GIS database design, analysis of GIS data, and implementation of desktop GIS systems. He has published over 50 scientific refereed papers on database design, GIS, forest resource inventory, and remote sensing. His main interests include developing methods for using remote sensing in natural resource inventory and computer applications for supporting regional decision making.
Authorized licensd use limted to: IE Xplore. Downlade on May 13,20 at 1:487 UTC from IE Xplore. Restricon aply.
Timo Jääskeläinen was born in Luumaaki, Finland, in 1953. He received the Ph.D. degree in physics from the University of Joensuu, Joensuu, Finland, in 1981. From 1981 to 1991, he was a Chief Assistant and Acting Associate Professor with the University of Kuopio, Kuopio, Finland. From 1987 to 1989, he was a Visiting Research Scientist with Saitama University, Saitama, Japan. In 1991, he was an Associate Professor with the Department of Physics, University of Joensuu. Since 1994, he has been a Professor with the same department. He is also the Head of the department. He has published approximately 100 articles. His research interests include optical materials research, optical metrology, color research, and information optics. Prof. Jääskeläinen is a member of the Optical Society of America.