Hyperspectral image processing: A direct image simplification method

Christopher A. Neylan (a), Tyler Rush (b), Angel Gutierrez (c), Stefan A. Robila* (c)

(a) Department of Computer Science, The College of New Jersey
(b) Department of Mathematical Sciences, Susquehanna University
(c) Center for Imaging and Optics, Department of Computer Science, Montclair State University, Montclair, NJ 07043

ABSTRACT

We describe a novel approach to producing color composite images from hyperspectral data using weighted spectral averages. The weighted average is based on a sequence of numbers (weights) selected using pixel value information and interband distance. Separate sequences of weights are generated for each of the three color bands forming the color composite image. Tuning the weighting parameters and emphasizing different spectral areas allows one feature or another of the image to be highlighted. The produced image is distinct from a regular color composite, since all the bands contribute information to the final result. The algorithm was implemented in a high-level programming language and provided with a user-friendly graphical interface. The current design allows for stand-alone usage or for further modification into a real-time visualization module. Experimental results show that the weighted color composition is an extremely fast visualization tool.

Keywords: Hyperspectral images, feature extraction, efficient data display, color composite images
1. INTRODUCTION
A hyperspectral image (HSI) is a collection of digital images, taken simultaneously, each corresponding to a small wavelength interval of the electromagnetic spectrum and measuring the intensity of the reflected light [1]. In the last decade, hyperspectral data availability and applicability have increased significantly, mainly due to technological advances in both sensors and processing [2]. Such rapid progress has led to diversified HSI use, from traditional fields (such as agriculture [3], defense and homeland security [4], and geology and mining [5]) to food processing [6, 7, 8], medicine [9], and pollutant control [10].

Increased spatial and spectral resolution in HSI has led to increased challenges in efficient processing. Due to the large size and large number of components, hyperspectral data processing becomes inefficient when ‘traditional’ multispectral techniques are applied [1]. Often, the data is reduced to a lower dimensional space through a diverse set of feature extraction techniques such as Principal Component Analysis (PCA), Independent Component Analysis (ICA), Maximum Noise Fraction (MNF), Orthogonal Subspace Projection (OSP), etc. [2]. While feature extraction is usually followed by further processing, in many instances such an initial step is also used to provide a color composite image that allows a human observer to easily identify the scene and its details [11]. Such a stage is especially helpful when the image corresponds to an area not associated with any prior knowledge, and it can constitute an initial step for target or endmember identification.

Creating a relevant color composite image does not have to be as time consuming as PCA or ICA. Instead, we suggest that weighted averaging of the image cube's bands can lead to a similarly relevant color display at a fraction of the computational cost. In this paper we provide an overview of weighted averaging and describe our design and experiments with various techniques. The paper is organized as follows. In the next section we discuss the concept of weighting images and the formation of the color composite display. In Section 3 we provide an overview of the distance series used. The technical details of our implementation, together with the results of our experiments, are discussed in Section 4.
* [email protected]; phone 1 973 655-4230; fax 1 973 655-4166; csam.montclair.edu/~robila
2. WEIGHTED AVERAGES FOR HYPERSPECTRAL IMAGES
2.1 Hyperspectral Data Format

While there is no unifying view of the format for hyperspectral data, we describe below the main standards usually encountered in practice. Most HSI are stored as a single file, usually accompanied by an additional header file that maintains the information needed to read in the data: the number of bands, number of samples, number of lines, band interleave, data type, byte ordering, and data offset. Each image, called a band, is a two-dimensional matrix of light intensity values, with each pixel addressed by an x and y coordinate originating in the upper left corner of the image. Each row of the band is referred to as a line and each column as a sample. Together, these bands make up a data cube where the collection of values from each band at one coordinate is a pixel vector [1].

Depending on the nature of the sensor or on the application that handles the data, an HSI is represented in one of three interleaving options: band interleave by pixel (BIP), band interleave by line (BIL), and band sequential (BSQ). Figures 1 through 3 show the ordering of the pixels, where (B1L1S1) is the coordinate of a pixel at Band 1, Line 1, Sample 1. A band contains one 2D image at a specific wavelength, and the samples and lines are the (x, y) coordinates of a pixel in that band.

BIP stores pixels by band, then sample, then line. In other words, if a pixel coordinate is given by (sample, line, band), BIP cycles the band first, the sample second, and the line last. Our implementation stores the data in BIP because it keeps the pixel vectors contiguous, ordered by sample and then by line.
Fig. 1. HSI Representation: Band Interleave by Pixel

BIL stores pixels by sample, then band, then line. In other words, if a pixel coordinate is given by (sample, line, band), BIL cycles the sample first, the band second, and the line last.
Fig. 2. HSI Representation: Band Interleave by Line

BSQ stores pixels by sample, then line, then band. Thus, if a pixel coordinate is given by (sample, line, band), BSQ cycles the sample first, the line second, and the band last.
Fig. 3. HSI Representation: Band Sequential
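For concreteness, a minimal Java sketch of the three orderings follows. It computes the linear offset of a value at (sample, line, band) in a raw data file for each interleave, assuming one value per coordinate and no header offset; the names nSamples, nLines, and nBands are our own illustrative choices, not taken from any particular HSI library.

```java
// Linear offsets into a raw HSI file for the three interleave orderings.
public final class Interleave {

    // BIP: band varies fastest, then sample, then line.
    static long bipOffset(int sample, int line, int band,
                          int nSamples, int nBands) {
        return ((long) line * nSamples + sample) * nBands + band;
    }

    // BIL: sample varies fastest, then band, then line.
    static long bilOffset(int sample, int line, int band,
                          int nSamples, int nBands) {
        return ((long) line * nBands + band) * nSamples + sample;
    }

    // BSQ: sample varies fastest, then line, then band.
    static long bsqOffset(int sample, int line, int band,
                          int nSamples, int nLines) {
        return ((long) band * nLines + line) * nSamples + sample;
    }
}
```

Note how in BIP the full pixel vector occupies nBands consecutive positions, which is why spectra-oriented processing favors this layout.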
Data ordering plays an important role in the speed with which an HSI can be loaded into memory and processed. Given the large data size, it is expected that parts of the HSI file will be swapped in and out of memory. If the processing technique is focused on spectra manipulation, a BIP representation is most appropriate. When the data is handled on a band by band basis (as is the case for most feature extraction or color composite display techniques), a BSQ representation is more efficient. Finally, since most hyperspectral sensors are based on pushbroom technologies that collect one line at a time, a BIP representation is most efficient for fast data storage at collection time.

2.2 Weighted Displays

The purpose of this research is to determine whether the two-dimensional RGB image produced by applying weighted averages to hyperspectral pixel vectors can be used to identify subject matter as effectively as more complex transformation algorithms, such as the MNF and PCA transforms, that output new hyperspectral images. Compared with a regular color composite image, where each of the RGB bands is associated with a single HSI band, the weighted average is based on a sequence of numbers (weights) selected using pixel value information and interband distance, with a separate sequence of weights generated for each of the three color bands forming the color composite image. We apply the weighted averages to the pixel vectors in order to incorporate spectral information outside the visible zone. We expect this information to differentiate objects that look the same to the naked eye when they are, in fact, different, e.g., real leaves and artificial leaves. Because we produce an RGB image, three sequences are applied to each pixel vector of the image: one for R, one for G, and one for B. Each sequence contains one weight value for each band of the vector. Each value of the pixel vector is multiplied by the corresponding weight of the sequence and the products are totaled to produce the new pixel value for that coordinate of the final 2D image. In the following we discuss two approaches to weighting, one emphasizing a single band while reducing the importance of the adjacent ones, the other emphasizing the adjacent bands while lowering the importance of the central one. We note that in both cases all the bands in the scene contribute to the average.
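A minimal sketch of this per-pixel computation is shown below. The cube layout (band-major, matching BSQ), the method names, and the 8-bit clamping are our assumptions for illustration, not the paper's actual implementation.

```java
// Collapse one pixel vector to an RGB triple using three weight sequences.
public final class WeightedComposite {

    // cube[band][line][sample] holds the intensity values (hypothetical layout).
    static int[] compositePixel(double[][][] cube, int line, int sample,
                                double[] wR, double[] wG, double[] wB) {
        double r = 0, g = 0, b = 0;
        int nBands = cube.length;
        for (int k = 0; k < nBands; k++) {
            double v = cube[k][line][sample];   // intensity at band k
            r += wR[k] * v;                     // weighted sum per color channel
            g += wG[k] * v;
            b += wB[k] * v;
        }
        // Clamp to the 8-bit display range; real data may need prior scaling.
        return new int[] { clamp(r), clamp(g), clamp(b) };
    }

    static int clamp(double x) {
        return (int) Math.max(0, Math.min(255, Math.round(x)));
    }
}
```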
3. WEIGHTED SERIES
3.1 Direct Weighting

In our first approach, based on the idea that radiant energy has its amplitude attenuated by the square of the distance, we used the following set of weights. For a band B_{k_0}, the weighted average of the n bands (numbered from 0 to n-1) is obtained by:

W_{k_0} = \sum_{k=0}^{n-1} \alpha_k B_k    (1)

\alpha_k = \frac{1}{p} \cdot \frac{1}{a d_{kk_0}^2 + b d_{kk_0} + c}    (2)

p = \sum_{k=0}^{n-1} \frac{1}{a d_{kk_0}^2 + b d_{kk_0} + c}    (3)

where a, b, and c are tuning parameters, and

d_{kk_0} = |k - k_0|    (4)

The value p is computed such that:

\sum_{k=0}^{n-1} \alpha_k = 1    (5)
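A minimal Java sketch of computing this weight sequence from Eqs. (2)-(5) could look as follows; the method name and signature are illustrative, not the paper's code.

```java
// Direct weight sequence: alpha_k = (1/p) * 1/(a*d^2 + b*d + c),
// with d = |k - k0| and p chosen so the weights sum to 1.
static double[] directWeights(int nBands, int k0,
                              double a, double b, double c) {
    double[] w = new double[nBands];
    double p = 0;
    for (int k = 0; k < nBands; k++) {
        int d = Math.abs(k - k0);               // Eq. (4)
        w[k] = 1.0 / (a * d * d + b * d + c);   // unnormalized Eq. (2)
        p += w[k];                              // Eq. (3)
    }
    for (int k = 0; k < nBands; k++) {
        w[k] /= p;                              // normalize: Eq. (5) holds
    }
    return w;
}
```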
Fig. 4. Direct Weight Sequence
Fig. 5. Direct Weighting - Effect of More/Less Weight

The values of each weight sequence form a curve whose highest point is the weight value corresponding to the band on which the sequence 'focuses'. The weight values decrease as a function of distance from the focus. Figure 4 shows a pixel vector containing seven intensity values at a particular coordinate (sample, line) of a hyperspectral image. In this case, we have seven bands (numbered from 0 to 6) and focus on band number 2. The blue line represents the values of the multipliers of the weight sequence.

The coefficients a, b, and c are arbitrary positive values and only affect the shape of the sequence's curve. As a increases, the weight on the focus band increases as well, with the weights on the other bands becoming very small, thus making the curve more pointed. Conversely, as c increases, the weight on the focus band decreases, having the opposite effect and making the curve smoother. Figure 5 shows the difference between more weight on the focus and less weight on the focus. The original hyperspectral image was 120 bands in size, ranging from 400nm to 900nm of the spectrum. The Red component of the final (above) image focused on 725nm, Green on 525nm, and Blue on 424nm (the wavelengths associated with red, green, and blue) [3]. As b increased in value, so did the tendency for the denominator to become negative. For the implementation, the values a = 1, b = 1, and c = 1 were used because they produced the best balance between too much weight on the focus and not enough.

3.2 Inverse Weighting

If shifting a focus band into the non-visible spectrum visually brings forth more information in the final image, then inverting the weights on the spectral data may serve the same purpose. In our second approach, we used the following set of weights:
\alpha_k = \frac{a d_{kk_0}^2 + b d_{kk_0} + c}{p}    (6)

p = \sum_{k=0}^{n-1} \left( a d_{kk_0}^2 + b d_{kk_0} + c \right)    (7)

where all the parameters and notations are the same as above.
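The inverse sequence of Eqs. (6)-(7) requires only a change in the weight formula relative to the direct sketch above; again this is an illustrative sketch, not the authors' code.

```java
// Inverse weight sequence: alpha_k = (a*d^2 + b*d + c) / p,
// so the weight now grows with distance from the focus band k0.
static double[] inverseWeights(int nBands, int k0,
                               double a, double b, double c) {
    double[] w = new double[nBands];
    double p = 0;
    for (int k = 0; k < nBands; k++) {
        int d = Math.abs(k - k0);
        w[k] = a * d * d + b * d + c;   // numerator of Eq. (6)
        p += w[k];                      // Eq. (7)
    }
    for (int k = 0; k < nBands; k++) {
        w[k] /= p;                      // normalize so the weights sum to 1
    }
    return w;
}
```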
Fig. 6. Inverse Weight Sequence
Fig. 7. Inverse Weighting - Effect of More/Less Weight

The values of each weight sequence form an inverted curve whose lowest point is the weight value corresponding to the band on which the sequence 'focuses'. In this case we give less weight to the bands that appear similar in the visible spectrum. Figure 6 shows a pixel vector containing seven intensity values at a particular coordinate (sample, line) of a hyperspectral image. Just as in Figure 4, the blue line represents the values of the multipliers of the weight sequence. In this case, the focus of the weights is on band 2, with the weight increasing with the square of the distance from the focus band. Figure 7 shows the effect of coefficient manipulation on the final image. As in Figure 5, the Red component of the image focused on 725nm, Green on 525nm, and Blue on 424nm, and the images were created from a hyperspectral image consisting of 120 bands ranging from 400nm to 900nm. Unlike the direct distance series, where altering these coefficients produced visible changes in the final image, their manipulation here has little to no visible effect on the result. For simplicity's sake, therefore, we use a = 1, b = 1, and c = 1 for the implementation.
4. EXPERIMENTAL RESULTS
A Java-based GUI was developed in order to easily view and create images from hyperspectral data. The GUI was built mostly from Java Swing components and focuses on two tasks: displaying a particular band, and displaying a single image that portrays the whole cube by weighting the bands around user-specified wavelengths representing the red, green, and blue of the visible light spectrum. The program can also convert a band or weighted image to the PGM image file format and rotate it clockwise or counter-clockwise by ninety degrees. Java was used in order to take advantage of its extensive standard API and cross-platform compatibility [13].
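As an illustration of the PGM export the program offers, a minimal sketch of writing one band in the plain-ASCII "P2" variant of the format is given below; the class and method names are hypothetical, and the band is assumed to be scaled to the 0-255 range already.

```java
import java.io.IOException;
import java.io.PrintWriter;

public final class PgmWriter {

    // Write band[line][sample] (values 0-255) as an ASCII "P2" PGM file.
    static void writePgm(String path, int[][] band) throws IOException {
        int lines = band.length, samples = band[0].length;
        try (PrintWriter out = new PrintWriter(path)) {
            out.println("P2");                  // magic number for ASCII PGM
            out.println(samples + " " + lines); // width and height
            out.println(255);                   // maximum gray value
            for (int y = 0; y < lines; y++) {
                StringBuilder row = new StringBuilder();
                for (int x = 0; x < samples; x++) {
                    row.append(band[y][x]).append(' ');
                }
                out.println(row.toString().trim());
            }
        }
    }
}
```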
Fig. 8. MNF and PCA Transforms
Fig. 9. Direct Weighting - Color Association Shift

As stated, the purpose of producing this two-dimensional representation of the hyperspectral image is to see whether it can be effectively used in object identification. Figure 5 shows the RGB output of the direct weighted series on a plant image, with the focus of the RGB components on the red, green, and blue wavelength centers respectively. The sensor producing the image is a SOC 700 hyperspectral camera provided by Surface Optics, able to collect 120 bands between 400 and 900 nanometers. It uses pushbroom technology that collects the data cube one line at a time; in the current configuration the sensor collects the data in vertical lines from left to right. Each image band has a resolution of 640x640 pixels.

The leaves in the image are a mixture of real and artificial leaves. While visibly identical, each reflects a unique light spectrum. Transformation algorithms like MNF and PCA bring this relevant information forward. Figure 8 shows the first three bands of the results of the MNF and PCA transforms on the potted plant. The purple leaves in the MNF output are the real leaves of the plant, while the pink leaves are artificial. The MNF transform clearly shows a visible differentiation between the real and artificial leaves; however, the distance series, as shown in Figure 5, does not adequately do the same. Each of the three RGB color components that make up the image corresponds to its equivalent wavelength (the focus band used by the distance series). This means that the image will still tend to appear "natural" to the naked eye. Even though there is a visible difference between the greens of the artificial and real leaves of Figure 5, the colors still appear similar.

In order to bring out the difference in these colors, we shifted the color association of the output. Figure 9 shows the exact same output, but with the color associations shifted once. On the left, the RGB components are associated with 700nm, 525nm, and 425nm respectively. The same data is represented on the right, but the RGB components are instead associated with 525nm, 424nm, and 700nm respectively. Now the color differences in the leaves are more easily visible.

Figure 10 is a side by side comparison of the MNF transformation and the distance series. Circled are two leaves that, in the MNF output, are colored as if they were artificial; in the distance image, the same leaves are colored as if they were real. The circled leaves are indeed artificial, which means that if the distance series only focuses on the visible parts of the spectrum (R:725nm, G:525nm, B:425nm), information about the differences is lost, since the weight put on the raw data is inversely proportional to the square of the distance from the focus. Figure 11 shows the output of the distance series when the R component corresponds to a wavelength in the infrared. With R corresponding to 800nm, the real leaves begin to turn orange while the artificial leaves remain green. By moving a focus band beyond the visible wavelengths, the distance series puts a greater emphasis on the infrared information.
Fig. 10. Direct Weighting - MNF vs. Distance
Fig. 11. Direct Weighting - Infrared Shift
Fig. 12. Direct Weighting - Infrared Shift

Figure 12 shows the direct weighting applied to an aerial HYDICE hyperspectral image of 120 bands ranging from 400nm to 2500nm of the spectrum. The left image focuses on the visible spectrum (R:725nm, G:525nm, B:424nm), the center image associates R with 1000nm, and the right image associates R with 1300nm. Visually, the spot in the upper left of each image is supposed to be completely camouflaged; however, because the spectrum it reflects differs from that of its surroundings, the distance series makes this visually apparent, especially when more of the infrared spectrum is taken into consideration.

Figure 7 shows the output of the inverse distance series. The visible difference between the real and artificial leaves is now very apparent: the real leaves are bright blue while the artificial leaves remain dull blue. Figure 13 shows a comparison between the MNF transform, the new (inverse) distance series, and the old (direct) distance series. Originally, when the focus band was given the most weight of the sequence, there was ambiguity as to whether two of the leaves were real or artificial. By inverting the weights applied to the hyperspectral image, increasing the weight value with distance from the focus and thereby giving more weight to the non-visible spectrum, we were able to bring forth more spectral information in a two-dimensional RGB image without a sacrifice in computation time.

Figure 14 shows the impact of shifting for the inverse series. On the left, the RGB components are associated with 700nm, 525nm, and 425nm respectively. The same data is represented on the right, but with the RGB components associated with 525nm, 425nm, and 700nm respectively. Finally, Figure 15 shows the same output of the inverse distance series, but with the R component corresponding to a wavelength in the infrared. The purpose of color shifting and infrared shifting the images in the direct distance series was to visually bring out information that was lost by the weight sequence. With the inverse distance series, the color differences are already just as visible: color shifting only changes the way the same information is displayed, and infrared shifting has very little visible effect on the final image.
Fig. 13. a) MNF, b) Inverse Weighting, and c) Direct Weighting
Fig. 14. Inverse Weighting - Color Association Shift
Fig. 15. Inverse Weighting - Infrared Shift
5. CONCLUSION
The transformation algorithms attempt to 'bring forward' relevant information in the hyperspectral image. Of the multiband image such transforms output, only the first handful of bands actually contains relevant data; the rest is nothing but noise and blur. Figure 16 shows the first band and the 40th band of the output of the MNF transform: while the first band contains a wealth of information (as do the bands that immediately follow it), the 40th contains nothing but noise. It is often difficult to identify how many bands contain relevant information, and most color composite schemes in fact focus on only three bands.

The application of weighted averages to the vectors of a hyperspectral image, as in the distance series, is a way to turn a hyperspectral image of any size, covering any wavelength range, into a single two-dimensional RGB image that can be used to, at least, visually identify different subject matter that would otherwise be indistinguishable to the naked eye. Using focus bands corresponding to the red, green, and blue wavelengths and decreasing the weight with distance from the focus band, as in the direct distance series, displays spectral information with a strong emphasis on the visible information, mimicking the degradation of energy with distance. Using the same focus bands but increasing the weight with distance from the focus band, as in the inverse distance series, displays non-visible spectral information, thereby better illustrating the hyperspectral data than the direct distance series.
Fig. 16. MNF bands 1 and 40
6. REFERENCES
[1] C.-I. Chang, Hyperspectral Imaging: Techniques for Spectral Detection and Classification, Kluwer Academic, 2003
[2] C.-I. Chang, Hyperspectral Data Exploitation: Theory and Applications, Wiley Interscience, 2007
[3] T.M. Lillesand, R.W. Kiefer, J.W. Chipman, Remote Sensing and Image Interpretation, 5th Edition, John Wiley & Sons Inc., 2003
[4] J. Goutsias and A. Banerji, "A morphological approach to automatic mine detection problems," IEEE Trans. Aerospace and Electronic Systems, vol. 34, no. 4, pp. 1085-1096, 1998
[5] K. Yang, J. F. Huntington, J. W. Boardman, P. Mason, "Mapping hydrothermal alteration in the Comstock mining district, Nevada, using simulated satellite-borne hyperspectral data," Australian Journal of Earth Sciences, vol. 46, no. 6, pp. 915-922, 1999
[6] T. Pearson, "Spectral properties and effect of drying temperature on almonds with concealed damage," Lebensm.-Wiss. u.-Technol., vol. 32, pp. 67-72, 1999
[7] M. S. Kim, A. M. Lefcourt, K. Chao, Y. R. Chen, I. Kim, and D. E. Chan, "Multispectral detection of fecal contamination on apples based on hyperspectral imagery: Part I. Application of visible and near-infrared reflectance imaging," Transactions of the ASAE, vol. 45, no. 6, pp. 2027-2037, 2002
[8] M. S. Kim, A. M. Lefcourt, K. Chao, Y. R. Chen, I. Kim, and D. E. Chan, "Multispectral detection of fecal contamination on apples based on hyperspectral imagery: Part II. Application of hyperspectral fluorescence imaging," Transactions of the ASAE, vol. 45, no. 6, pp. 2039-2047, 2002
[9] Z. Liu, Q. Li, J. Yan, Q. Tang, "A novel hyperspectral medical sensor for tongue diagnosis," Sensor Review, vol. 27, no. 1, pp. 57-60, 2007
[10] A. Barducci, P. Marcoionni, I. Pippi, and M. Poggesi, "Effects of Light Pollution Revealed During a Nocturnal Aerial Survey by Two Hyperspectral Imagers," Applied Optics, vol. 42, no. 21, pp. 4349-4361, 2003
[11] T. Achalakul and S. Taylor, "A distributed spectral-screening PCT algorithm," Journal of Parallel and Distributed Computing, vol. 63, no. 3, pp. 373-384, 2003
[12] Wikipedia, "Electromagnetic Spectrum," http://en.wikipedia.org/wiki/Electromagnetic_spectrum, The Wikimedia Foundation, June 2007
[13] J. Gosling, B. Joy, G. Steele, G. Bracha, The Java Language Specification, Second Edition, Addison-Wesley, 2000