Sensor Band Selection for Multispectral Imaging via Average Normalized Information

Hongzhi Wang, Elli Angelopoulou
{hwang3, elli}@cs.stevens.edu
Department of Computer Science, Stevens Institute of Technology
Abstract. The information-rich scene descriptors created by multispectral sensors can become a bottleneck in further analysis. Many spectral band selection methods treat the two underlying tasks, feature band selection and redundancy reduction, in isolation. Furthermore, most existing work assumes reflectance data, whereas the captured surface radiance varies with scene geometry and illumination. We propose a new band selection method that uses spectral gradient entropy to choose bands that are more stable under such variations. Equally important, our measure, the average normalized information (ANI) of a set of selected bands, combines feature band selection and redundancy reduction in a single criterion. In our experiments, ANI performed comparably to mutual information on reflectance data and outperformed mutual information on surface radiance data.
1 Introduction

Multispectral imaging allows us to capture scene information beyond the capabilities of RGB or grey-scale cameras. The wealth of data provided by multispectral sensors, especially regarding the reflectance properties of objects, can greatly facilitate further processing [13]. As multispectral sensors become more readily available [7], their use introduces new limitations, particularly in applications that require real-time processing or imaging of moving scenes. Due to the high correlation between different wavelengths, multispectral images have considerably higher dimensionality than is inherent in a scene [8,18]. Without reducing this redundant data, further analysis becomes quite complicated [9,10]. Many researchers are therefore working on automated band selection [3,4,5,6,14,17]. Most methods involve two separate tasks: a) feature band selection, i.e. selecting the bands that best characterize a particular material; and b) redundancy reduction, i.e. removing feature bands that contribute redundant information. For feature band selection, Chang et al. [5] select bands based on Fisher's discriminant analysis. Rennich developed a supervised method based on correlation [14]. More recently, independent component analysis (ICA) has been used to select feature bands [6]. Information theory has also been applied to feature band selection: entropy is interpreted as a measure of the stability of each individual band in [3], and [4] uses entropy to measure the difference between classes. For redundancy reduction, [6] used relative entropy to
measure the redundancy. The band selection methods introduced so far treat feature band selection and redundancy reduction as two separate steps, which may lead to a suboptimal solution. Consider, for example, redundancy reduction based on a redundancy measure and a specified threshold. Choosing a proper threshold is crucial, and it can only be done effectively by considering the overall feature information and the redundancy information at the same time. Moreover, most existing methods assume reflectance (albedo) data. As multispectral sensors become more widely used, one should be able to select bands directly from the captured radiance values. However, scene radiance varies with geometry and illumination [2,8,18], and current methods do not take these factors into consideration. We propose a new unsupervised band selection method that integrates feature band selection and redundancy reduction into a single procedure. Entropy is used to measure the overall information carried by the selected bands based on their spectral appearances, which allows us to detect redundancy. Entropy is also used to measure the stability of the selected band set. All computations are based on spectral gradients, since these are more stable than radiance with respect to changes in geometry and illumination [2]. We combine feature selection and redundancy reduction by defining the average normalized information carried by the selected bands. Band selection experiments on various spectra, together with classification tests, demonstrate that our method selects bands representative of the material, regardless of the target number of feature bands, scene geometry and illumination.
2 Band Selection via Average Normalized Information (ANI)

In this section we first describe how we combine measures of stability and information loss into a single integrated criterion, which can be applied to feature selection in general. We then present each of the two measures in greater detail.

2.1 Average Normalized Information

Our goal is to identify the features containing the most information about a particular material, independent of variations in scene geometry and illumination. We use entropy to measure the information contained in the selected features. High entropy indicates that all values are nearly equally likely to occur, so the feature is not indicative of the particular material. Low entropy, on the other hand, can be attributed to the existence of a persistent pattern [11,12,15,4]. Hence, the preferred features should have a small joint entropy, Hjoint (see Section 2.2 for details). At the same time, to avoid including redundant features, we need to detect which ones provide repetitive information. As we show in Section 2.3, a good and fast-to-compute indicator of redundancy is how distinct the values at different features are, measured by Happearance. To combine stability, Hjoint, and information content, Happearance, into a single measure, we define the Average Normalized Information (ANI), F, as:
$$F(W) = \frac{H_{\text{appearance}}(W)}{H_{\text{joint}}(W)} \qquad (1)$$
where W = [W1, W2, …, WM] represents the selection decision: for j = 1, 2, …, M, Wj = 1 if the jth feature is selected and Wj = 0 otherwise. Following the principles of feature selection, we want Happearance(W) to be as large as possible and Hjoint(W) as small as possible. Thus, W should be chosen so as to maximize the ANI, F. Note that spectral bands are seldom perfectly correlated, so it is highly probable that Hjoint(W) > 0. We use a greedy algorithm to compute the selection vector. We initially select the full set, i.e. W(0) = {1, 1, …, 1}. Then, at each iteration i, we remove from W(i-1) the feature whose removal maximizes F(W(i)) - F(W(i-1)). The algorithm stops when the desired number of features is reached. Note that unlike mutual information, which can only measure redundancy, ANI combines a measure of information content with a redundancy measure. More specifically, ANI: a) is based on the diversity of the spectral appearance and b) explicitly measures the joint stability of the features.

2.2 Joint Entropy of Feature Bands

Assume that we have N multispectral samples of a material. The spectrum of each sample i is denoted by a spectral vector Ii = [Iλi1, Iλi2, ..., IλiM], i = 1, 2, …, N, where M is the number of spectral bands and Iλik is the radiance of the ith sample in the kth band. One could use the radiance Iλik directly, but spectral radiance varies with illumination and geometry. Instead, we use the spectral derivative L(λ):
$$L(\lambda) = \frac{\partial \log I(\lambda)}{\partial \lambda} \qquad (2)$$
as our scene descriptor because, for diffuse surfaces [2], it extracts the surface albedo (reflectance) information:
$$L(\lambda) = \frac{\rho_\lambda(\lambda)}{\rho(\lambda)} \qquad (3)$$
where ρ(λ) is the surface albedo and ρλ(λ) = ∂ρ(λ)/∂λ is its partial derivative with respect to wavelength. As a result, the spectral derivative is invariant to changes in scene geometry and incident illumination. In our discrete spectral sampling, our scene descriptor for sample i becomes:
$$d_i = \left[\log I_{\lambda i2} - \log I_{\lambda i1},\ \ldots,\ \log I_{\lambda iM} - \log I_{\lambda i(M-1)}\right] \qquad (4)$$
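The descriptor in (4) can be computed directly from the sampled radiance. The following is a minimal Python sketch, assuming each sample is given as a list of M strictly positive radiance values; the function name `spectral_derivative` and the sample values are hypothetical. It also checks numerically that a wavelength-independent scale factor applied to a sample, such as a geometry-dependent shading term, cancels in the log differences.

```python
import math
from typing import List, Sequence


def spectral_derivative(radiance: Sequence[float]) -> List[float]:
    """Eq. (4): d = [log I_2 - log I_1, ..., log I_M - log I_(M-1)] for one
    sample's radiance values I_1..I_M (which must all be positive)."""
    if any(v <= 0 for v in radiance):
        raise ValueError("radiance values must be strictly positive")
    logs = [math.log(v) for v in radiance]
    return [logs[k + 1] - logs[k] for k in range(len(logs) - 1)]


# Hypothetical radiance values for one sample, and the same sample scaled by a
# wavelength-independent factor (e.g. a shading term that depends only on geometry).
sample = [0.20, 0.25, 0.30, 0.28, 0.35]
shaded = [0.4 * v for v in sample]
d_sample = spectral_derivative(sample)
d_shaded = spectral_derivative(shaded)
# The constant factor becomes an additive log 0.4 that cancels in each difference.
print(all(abs(a - b) < 1e-12 for a, b in zip(d_sample, d_shaded)))  # True
```

A sample with M bands yields M-1 derivative values, which is why the band index in the later equations runs only up to M-1.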
If we measured entropy on the original scale of the spectral derivatives, bands with high spectral derivative values, which often span a broad range of values, would give rise to higher entropy. To remove this bias, we stretch the histogram of each band as follows. Let minj and maxj be the minimum and maximum values, respectively, of the spectral derivatives at the jth band over all N samples. The interval [minj, maxj] is evenly subdivided into C subintervals, where C is a free parameter (C = 10 in our experiments). For any band, a sample may be in one of C states, corresponding to the subinterval to which the derivative of that sample belongs. Let the index of the state at the jth band of a material be a random variable Xj. When only K bands are selected, the distribution of the samples is described by the vector of random variables $(X_{ind_1(W)}, \ldots, X_{ind_K(W)})$, where indi(W) is the index of the ith nonzero element in the selection decision W. A sample therefore lives in a space of $C^K$ states. The number of samples located in the sth state, Ns, depends on the selected band set W. The joint probability $p_s(W) = P\left((X_{ind_1(W)}, \ldots, X_{ind_K(W)}) \text{ is in the } s\text{th state} \mid \text{material}\right)$ can be estimated by Ns(W)/N. The joint entropy of the selected band set is:

$$H_{\text{joint}}(W) = H\left(X_{ind_1(W)}, \ldots, X_{ind_K(W)}\right) = -\sum_{s=1}^{C^K} p_s(W) \log p_s(W) = -\sum_{s=1}^{C^K} \frac{N_s(W)}{N} \log \frac{N_s(W)}{N} = -\sum_{i=1}^{N_S} \frac{N_{s_{ind_i}}(W)}{N} \log \frac{N_{s_{ind_i}}(W)}{N} \qquad (5)$$
where NS is the number of states containing at least one sample and sindi is the index of the ith such state. The joint entropy in (5) reaches its maximum value when the spectral derivatives are uniformly distributed across the $C^K$ states, i.e. when the samples show no discernible pattern. When the samples are concentrated in only a few states, there is an underlying pattern and the corresponding joint entropy will be small. The joint entropy is hard to evaluate accurately for large band sets, since the number of states $C^K$ quickly exceeds the number of samples N. Therefore, if removing any of several bands results in the same Happearance and the same Hjoint, we use the following approximation of the joint entropy to distinguish between these bands:
$$H_{\text{joint}}(W) = H\left(X_{ind_1(W)}, \ldots, X_{ind_K(W)}\right) \approx \frac{2}{K(K-1)} \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} H\left(X_{ind_i(W)}, X_{ind_j(W)}\right) \qquad (6)$$
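The histogram stretching and the entropies in (5) and (6) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the spectral derivatives are stored as an N×(M-1) list of lists, it uses the natural logarithm (the base cancels in the ratio F of (1), since both entropies scale by the same factor), and the names `quantize`, `joint_entropy` and `pairwise_joint_entropy` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Sequence


def quantize(derivs: Sequence[Sequence[float]], C: int = 10) -> List[List[int]]:
    """Histogram stretching: split [min_j, max_j] of each band j (over all N
    samples) into C equal subintervals and return each sample's state index."""
    n_bands = len(derivs[0])
    states = [[0] * n_bands for _ in derivs]
    for j in range(n_bands):
        column = [d[j] for d in derivs]
        lo, hi = min(column), max(column)
        width = (hi - lo) / C
        for i, v in enumerate(column):
            # A constant band maps to a single state; the maximum value is
            # clamped into the last subinterval instead of forming state C.
            s = 0 if width == 0 else int((v - lo) / width)
            states[i][j] = min(s, C - 1)
    return states


def joint_entropy(states: Sequence[Sequence[int]], bands: Sequence[int]) -> float:
    """Eq. (5): entropy of the joint state of the selected bands, with p_s
    estimated as N_s / N. Empty states never enter the Counter, so only the
    N_S occupied states contribute."""
    N = len(states)
    counts = Counter(tuple(row[j] for j in bands) for row in states)
    return -sum((n / N) * math.log(n / N) for n in counts.values())


def pairwise_joint_entropy(states: Sequence[Sequence[int]], bands: Sequence[int]) -> float:
    """Eq. (6): average joint entropy over all K(K-1)/2 pairs of selected bands."""
    pairs = list(combinations(bands, 2))
    if not pairs:
        raise ValueError("the pairwise approximation needs at least two bands")
    return sum(joint_entropy(states, p) for p in pairs) / len(pairs)
```

For example, `quantize([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]], C=2)` places the four samples in two occupied joint states with two samples each, so `joint_entropy` over both bands returns log 2.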
2.3 Information Redundancy Measurement

To control the redundancy of the selected features, we analyze the total information measured from the spectral appearances of the band set. If the total amount of information does not change when a spectral band is added or removed, that band provides redundant information and should be removed. The spectral appearance of band j is defined as its rounded average spectral derivative:

$$a_j = \operatorname{round}\left(\frac{1}{N} \sum_{k=1}^{N} d_{\lambda kj}\right), \quad j = 1, \ldots, M-1 \qquad (7)$$
where dλk,j = log Iλk(j+1) - log Iλkj. Let the integer set {G1, G2, …, GL} contain all distinct values of the spectral appearances of the selected bands, where L is the number of distinct spectral appearances. Then the number of selected bands with spectral appearance equal to Gi is $n_i = \sum_{a_j = G_i} W_j$. The probability pi that the spectral appearance of a selected band equals Gi is estimated by ni/K, where $K = \sum_{j=1}^{M-1} W_j$ is the number of selected bands. The appearance entropy of the selected bands is:

$$H_{\text{appearance}}(W) = -\sum_{i=1}^{L} p_i \log p_i = -\sum_{i=1}^{L} \frac{\sum_{a_j=G_i} W_j}{\sum_{j=1}^{M-1} W_j} \log \frac{\sum_{a_j=G_i} W_j}{\sum_{j=1}^{M-1} W_j} \qquad (8)$$
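Equations (7) and (8) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the derivatives are an N×(M-1) list of lists, the selection W is a list of 0/1 flags, and the names `spectral_appearance` and `appearance_entropy` are hypothetical. The text does not state how the derivatives are scaled before rounding in (7), so the sketch simply rounds the mean of whatever values it receives.

```python
import math
from collections import Counter
from typing import List, Sequence


def spectral_appearance(derivs: Sequence[Sequence[float]]) -> List[int]:
    """Eq. (7): a_j = round(mean over the N samples of d_kj) for every band j.
    Note: Python's round() sends exact halves to the nearest even integer;
    the text does not specify a tie rule."""
    N = len(derivs)
    return [round(sum(d[j] for d in derivs) / N) for j in range(len(derivs[0]))]


def appearance_entropy(appearance: Sequence[int], W: Sequence[int]) -> float:
    """Eq. (8): entropy of the appearance values among the selected bands,
    with p_i = n_i / K and K = sum(W) the number of selected bands."""
    selected = [a for a, w in zip(appearance, W) if w]
    K = len(selected)
    if K == 0:
        raise ValueError("no bands selected")
    counts = Counter(selected)  # n_i for each distinct appearance value G_i
    return -sum((n / K) * math.log(n / K) for n in counts.values())
```

With appearances `[0, 0, 1, 2]` and all four bands selected, the proportions are 1/2, 1/4 and 1/4, giving an entropy of 1.5 log 2. Deselecting one of the two bands with appearance 0 (`W = [1, 0, 1, 1]`) leaves three distinct appearances and raises the entropy to log 3, which is the behavior that rewards removing a redundant band.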
To minimize the duplication of information, we should choose a set of bands whose spectral appearances are as diverse as possible. Such a set of bands exhibits maximum appearance entropy, which is reached when every selected band has a distinct appearance value.
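Putting (1), (5) and (8) together, the greedy backward elimination of Section 2.1 can be sketched as below. This is a self-contained sketch under stated assumptions, not the authors' code: the inputs are the quantized states (an N×(M-1) integer array produced by the stretching step of Section 2.2) and the integer appearances a_j from (7); Hjoint is evaluated exactly with (5); F is treated as infinite when Hjoint is zero (the text only notes this is unlikely); and ties are broken by the lowest band index rather than by the pairwise approximation (6). All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def _entropy(counts, total: int) -> float:
    """Shannon entropy (natural log) of a histogram with the given counts."""
    return -sum((n / total) * math.log(n / total) for n in counts)


def h_joint(states: Sequence[Sequence[int]], W: Sequence[int]) -> float:
    """Eq. (5) on the quantized states of the bands with W[j] == 1."""
    cols = [j for j, w in enumerate(W) if w]
    joint = Counter(tuple(row[j] for j in cols) for row in states)
    return _entropy(joint.values(), len(states))


def h_appearance(appearance: Sequence[int], W: Sequence[int]) -> float:
    """Eq. (8) on the appearance values a_j of the bands with W[j] == 1."""
    selected = [a for a, w in zip(appearance, W) if w]
    return _entropy(Counter(selected).values(), len(selected))


def ani(states, appearance, W) -> float:
    """Eq. (1): F(W) = H_appearance(W) / H_joint(W); assumed +inf when
    H_joint(W) == 0."""
    hj = h_joint(states, W)
    return math.inf if hj == 0 else h_appearance(appearance, W) / hj


def greedy_ani(states, appearance, k: int) -> List[int]:
    """Backward elimination: start from the full set and, at each iteration,
    drop the band whose removal yields the largest F. Since F(W(i-1)) is fixed
    within an iteration, this also maximizes F(W(i)) - F(W(i-1))."""
    if not 1 <= k <= len(appearance):
        raise ValueError("k must be between 1 and the number of bands")
    W = [1] * len(appearance)
    while sum(W) > k:
        best_j, best_f = -1, -math.inf
        for j in range(len(W)):
            if not W[j]:
                continue
            W[j] = 0  # tentatively remove band j
            f = ani(states, appearance, W)
            W[j] = 1  # restore it
            if f > best_f:  # strict '>' keeps the lowest index on ties
                best_j, best_f = j, f
        W[best_j] = 0
    return W
```

For instance, with `states = [[0, 0, 0], [0, 1, 0], [1, 2, 1], [1, 3, 1]]` (bands 0 and 2 carry identical states) and `appearance = [0, 1, 0]`, `greedy_ani(states, appearance, 2)` returns `[0, 1, 1]`: one of the two duplicated bands is dropped, while the band with a distinct appearance is kept.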
3 Experiments with reflectance data

First, we test the performance of ANI on albedo data and compare it against other band selection methods. The experiments were aimed at testing the stability and information preservation properties of our method. As expected, ANI consistently selects the same bands independent of the attributes of the multispectral sensor, and the selected features are sufficient for further analysis (e.g. classification).

3.1 Stability

To test whether ANI selects spectral bands consistently, we applied it separately to two distinct multispectral human skin reflectance databases: a) the GRASP skin database [1], which covers the 439-691nm range at 1nm resolution, and b) the Oulu skin database [16], which covers the 400-700nm range at 10nm resolution. Note that most human skin spectra exhibit two dips and a bump around 545nm, 575nm and 560nm, respectively (Fig. 1).
Fig. 1. (a) Samples of human skin spectra from the Oulu database. (b) Samples of human skin spectra from the GRASP database. (c) Band selection results. Red and yellow bars indicate the bands selected for GRASP (40 bands) and Oulu (7 bands), respectively. The bands selected for the Oulu data have a 10nm resolution. The overlap between the two selections shows their consistency. [Panels (a) and (b): reflectivity vs. wavelength (nm); panel (c): wavelength (nm).]
Since the reflectance data does not include geometry information, the spectral gradient in (4) was computed without the log. We first applied ANI to the GRASP data, selecting 7, 15, 20 and 40 spectral bands. We then applied ANI to the Oulu data, selecting 2, 3, …, 10 spectral bands. The results are displayed in Table 1. When only 7 bands were selected from the GRASP data, 5 of them were located in the region of the characteristic skin pattern. A similar behavior was observed in the Oulu data: when 7 bands were selected, 4 were concentrated in that same region. The bands around 580nm, 590nm, 600nm, 650nm and 690nm overlap those selected from the GRASP data (Fig. 1(c)). Due to the coarse spectral resolution, the middle region of the skin spectra is rather flat in the Oulu data. As a result, that region is not selected for the Oulu data. On the other hand, since the GRASP data preserves the spectral details at those wavelengths, this region was selected for the GRASP data. When more bands are selected from the two data sets, the selected bands remain consistent with each other (Table 1). Our experiments showed that our method chooses bands consistently, both when different numbers of bands are selected and when bands are selected from data sets with different spectral resolutions.
Table 1. Band selection results (K = number of selected bands; band locations in nm)

Human skin spectra from the GRASP database
K = 7: 565 573 584 588 593 651 683
K = 15: 562 565 572 573 582 584 585 588 593 605 607 619 651 682 683
K = 20: 529 562 565 570 572 573 582 583 584 585 586 588 593 605 607 612 619 651 682 683
K = 40: 443 448 456 489 503 525 529 555 562 563 565 567 568 569 570 572 573 579 580 582 583 584 585 586 588 592 593 594 595 599 605 607 612 619 651 669 678 681 682 683

Human skin spectra from the Oulu database
K = 2: 580 590
K = 3: 580 590 600
K = 4: 580 590 600 690
K = 5: 580 590 600 650 690
K = 6: 530 580 590 600 650 690
K = 7: 430 530 580 590 600 650 690
K = 8: 430 520 530 580 590 600 650 690
K = 9: 430 520 530 580 590 600 650 670 690
K = 10: 430 440 520 530 580 590 600 650 670 690
3.2 Classification

To examine the information preservation ability of ANI, we used the selected bands for material classification. The data used in this experiment are the reflectance spectra of human skin and of a mannequin (both from the GRASP database). These two materials have similar colors in RGB space. Feature bands were selected using only the human skin training samples. The spectral derivatives at the feature bands were then used to create training samples for each of the two materials. These samples were used to train a three-layer feedforward neural network. After training, classification was performed on the remaining human skin and mannequin samples. Training and classification were repeated 50 times.
Fig. 2. Classification accuracy on spectra of human skin and mannequin.
We also tested the classification performance of bands selected using ICA [6], correlation [14] and mutual information (MI). Since the cost of computing mutual information grows exponentially with the number of features, feature selection techniques based on mutual information use some form of approximation [17]. In our experiment we use the following approximation. Let I(Xi, Xj) be the mutual information between band i and band j. To minimize the information loss, we minimize the following function, which is the average pairwise mutual information among the selected bands:

$$MI(W) = \frac{2 \sum_{i=1}^{M-1} \sum_{j=i+1}^{M-1} I(X_i, X_j)\, W_i W_j}{\left(\sum_{i=1}^{M-1} W_i\right)\left(\sum_{j=1}^{M-1} W_j - 1\right)} \qquad (9)$$
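The pairwise approximation in (9) can be sketched as follows, assuming the bands have already been discretized into integer states (for instance with the same C-level stretching used for ANI; the text does not specify the discretization used for MI). Mutual information is estimated from empirical frequencies via I(X; Y) = H(X) + H(Y) - H(X, Y), with the natural log; all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a sequence of discrete states."""
    N = len(labels)
    return -sum((n / N) * math.log(n / N) for n in Counter(labels).values())


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired state sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def column(states: Sequence[Sequence[int]], j: int) -> List[int]:
    """States of band j across all samples."""
    return [row[j] for row in states]


def mi_objective(states: Sequence[Sequence[int]], W: Sequence[int]) -> float:
    """Eq. (9): 2 * sum_{i<j} I(X_i, X_j) W_i W_j / (K (K - 1)), i.e. the mean
    pairwise mutual information over the K selected bands."""
    cols = [j for j, w in enumerate(W) if w]
    K = len(cols)
    if K < 2:
        raise ValueError("eq. (9) needs at least two selected bands")
    total = sum(mutual_information(column(states, i), column(states, j))
                for i, j in combinations(cols, 2))
    return 2 * total / (K * (K - 1))
```

For two statistically independent bands such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` the estimate is zero up to floating-point rounding, whereas two identical bands give I = H, the largest possible redundancy. Note that (9) only measures redundancy; unlike ANI, it contains no term for the stability of the selected bands.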
Fig. 2 summarizes the results. ANI and MI perform similarly (> 98% classification accuracy), and both outperform the other methods. This test showed that both ANI and MI preserve most of the discriminative information of the material.
4 Experiments with surface radiance data

We showed that ANI achieves classification performance on albedo data comparable to that of MI. In this section, we test how consistently ANI selects bands, compared to MI, when applied to radiance data, which varies with scene geometry and illumination. We took multispectral images of 2 scenes, each containing two diffuse objects: a) a painted soda can and a swan sculpture, and b) a green and a yellow plastic pepper. We placed 2 objects in each scene in order to incorporate inter-reflection effects. Each scene was illuminated by a point light source located approximately 15cm away from the objects, which were about 12cm long. Fig. 3 shows two sample images taken at 520nm. We manually selected 3 regions with different geometries on the soda can and 4 regions with different geometries on the swan sculpture and the plastic peppers (Fig. 3). The selected patches are approximately the same size and are small enough that the geometry and illumination do not vary significantly within each region. Using the samples within each region, we select bands for each region independently.
Fig. 3. Images taken at 520nm in our experiment. We used four objects: a soda can and a swan sculpture (left) and two plastic peppers (right, green – left, yellow – right) which exhibited isolated specularities. The white squares are the patches used in our experiment. Each patch is small enough that the geometry within each patch is approximately constant.
We define the following consistency measure:

$$c(k, n) = \frac{kn}{d(k, n)} \qquad (10)$$

where k is the number of features to be selected and n is the number of regions the features are selected from. d(k, n) is the total number of distinct bands when k features are independently selected from each of the n regions. By definition, 1 ≤ c ≤ n, and larger c values indicate better consistency between the bands selected from different regions. For the consistency experiment, MI was tested on radiance data as well as on stretched spectral
derivative data. The performance of ANI and MI is shown in Fig. 4. ANI is more consistent than MI because it is biased towards features that appear across the training data; these are typically features representative of the underlying material and are thus relatively constant across a variety of scenes. Hence, ANI selects the most stable informative set of bands, which can be used consistently for the same material across a variety of scenes.
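The consistency measure (10) is straightforward to compute from the per-region selections. A minimal sketch, assuming each region's selection is given as a collection of k band indices (or wavelengths); the function name `consistency` is hypothetical.

```python
from typing import Collection, Sequence


def consistency(selections: Sequence[Collection[int]]) -> float:
    """Eq. (10): c(k, n) = k * n / d(k, n), where n regions each contributed
    k selected bands and d(k, n) counts the distinct bands among all of them."""
    sets = [set(s) for s in selections]
    n = len(sets)
    if n == 0:
        raise ValueError("need at least one region")
    k = len(sets[0])
    if k == 0 or any(len(s) != k for s in sets):
        raise ValueError("every region must contribute the same number k > 0 of distinct bands")
    d = len(set().union(*sets))  # d(k, n): distinct bands over all regions
    return k * n / d
```

For three hypothetical regions that select the bands {1, 2, 3, 4}, {1, 2, 3, 5} and {1, 2, 3, 4}, there are d = 5 distinct bands, so c = 12/5 = 2.4. Identical selections reach the upper bound c = n, and pairwise disjoint selections give the lower bound c = 1.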
Fig. 4. Consistency comparison between ANI and MI with respect to scene geometry.
5 Conclusions

We introduced a new unsupervised band selection method that is insensitive to variations in scene geometry and illumination. We defined the average normalized information, which integrates the feature selection and redundancy reduction tasks into one step. The ANI-based band selection method consistently chooses the same feature set, independent of the number of selected bands and the spectral resolution. Our classification experiments showed that ANI gives very good classification accuracy, comparable to MI. More importantly, we showed that ANI is more robust and more consistent than MI with respect to changes in scene geometry and illumination.
References

1. E. Angelopoulou, “Understanding the Color of Human Skin”, SPIE, 4299, pp. 243-251, 2001
2. E. Angelopoulou, “Objective Colour from Multispectral Imaging”, ECCV, pp. 359-374, 2000
3. P. Bajcsy and P. Groves, “Methodology for Hyperspectral Band Selection”, Photogrammetric Engineering and Remote Sensing Journal, vol. 70, pp. 793-802, 2004
4. E. M. Bassett and S. S. Shen, “Information Theory-Based Band Selection for Multispectral Systems”, Proceedings of SPIE, vol. 3118, pp. 28-35, 1997
5. C.-I. Chang, Q. Du, T.-L. Sun and L. G. Althouse, “A Joint Band Prioritization and Band-Decorrelation Approach to Band Selection for Hyperspectral Image Classification”, IEEE Trans. on Geoscience and Remote Sensing, vol. 37, no. 6, pp. 2631-2641, 1999
6. H. Du, H. Qi, X. Wang, R. Ramanath and W. E. Snyder, “Band Selection Using Independent Component Analysis for Hyperspectral Image Processing”, Proceedings AIPR, 2003
7. N. Gat, “Imaging Spectroscopy Using Tunable Filters: A Review”, SPIE, 4056, pp. 50-64, 2000
8. G. Healey and D. Slater, “Invariant Recognition in Hyperspectral Images”, CVPR, 1999
9. A. Jain and D. Zongker, “Feature Selection: Evaluation, Application, and Small Sample Performance”, IEEE Trans. on PAMI, 19(2), pp. 153-158, 1997
10. M. Lennon, G. Mercier, M. C. Mouchot and L. Hubert-Moy, “Independent Component Analysis as a Tool for the Dimensionality Reduction and the Representation of Hyperspectral Images”, IGARSS, 2001
11. A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, 1991
12. H. Peng, F. Long and C. Ding, “Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy”, IEEE Trans. on PAMI, 27(8), pp. 1226-1238, 2005
13. A. Richards, Alien Vision, SPIE Press, Bellingham, Washington, 2001
14. B. D. Rennich, “Active Multispectral Band Selection and Reflectance Measurement System”, Master's Thesis, Air Force Institute of Technology, 1999
15. C. E. Shannon, “A Mathematical Theory of Communication”, Bell System Technical Journal, vol. 27, pp. 379-423 and pp. 623-656, July and October 1948
16. M. Soriano, E. Marszalec and M. Pietikäinen, “Color Correction of Face Images under Different Illuminants by RGB Eigenfaces”, AVBPA, pp. 148-153, 1999
17. J. M. Sotoca, F. Pla and A. C. Klaren, “Unsupervised Band Selection for Multispectral Images using Information Theory”, International Conference on Pattern Recognition, 2004
18. D. Slater and G. Healey, “Physics-based Model Acquisition and Identification in Airborne Spectral Images”, International Conference on Computer Vision, pp. 257-262, 2001