Implementation of Hyperspectral Image Unmixing ... - Stanford University

Report 1 Downloads 92 Views
Implementation of Hyperspectral Image Unmixing via Alternating Projected Subgradients Brandon Richardson Final Report for EE364b, Stanford University, Spring 2010-2011 May 25, 2011

1

Topic and approach

This project will focus on solving a hyperspectral image unmixing problem using the alternating projected subgradients method illustrated by Zymnis. My goal is to fully implement the method described in Zymnis paper [AZB07, §2.4], and apply the resulting algorithm to a known hyperspectral image cube.

2

Background

An imaging spectrometer is an instrument that captures a continuous spectrum of light (often ranging from 460nm to 2.5µm) for every pixel in a photograph. Using this type of instrument an image can be taken of any subject and detailed spectral analysis can be run on any part of the image produced. Hyperpectral image umixming is the process by which hyperspectral images are broken down into their most basic parts. This is a key aspect of hyperspectral imaging, allowing a complex spectrum produced by a single pixel to be decomposed into a set of basis spectrums. The basis spectrum when linearly combined make up the spectrum observed in that pixel. This set of basis spectrum can then be easily compared to a known spectrum library to determine the composition of that pixel. This particular technique is extremely useful when the image is of unknown composition.

3

Spectral unmixing problem

In a standard hyperspectral image unmixing problem we are given a hyperspectral image cube Y ∈ Rm×n with n pixels and m frequency bands, we are then asked to find two matrices + W and H such that Y ≈ W H. The W matrix is called the end member matrix. It contains the set of basis spectrum that make up the image Y. The matrix H is called the abundance matrix, it chooses the mixing coefficients of each basis spectrum that makes up the image. 1

So our hyperspectral image unmixing problem becomes an optimization problem where we want Y ≈ W H and we want the neighboring pixels to be similar. We also have to have W and H be positive, since the values represent reflectance, and the columns of H must sum to one. The problem is shown below. [AZB07, §2.4], [NB01] minimize kY − W Hk2F + λ ni=1 subject to W, H ≥ 0, 1T H = 1T P

P

j∈N (i)

khi − hj k1

(1)

In the problem W and H are the optimization variables, while λ is the regularization parameter. The λ factor will set the trade off between the pixel similarity and the data accuracy. The hi term is the ith column of matrix H. N (i) is the set of all pixels in close proximity to pixel i. This problem is biconvex, meaning that if we hold H constant the problem is convex in W, and if we hold W constant the problem is convex in H. The planned approach is an alternating projected subgradient method which is guaranteed to converge to the local minimum. I plan to initialize both H and W and then interchange between taking several subgradient steps in H while holding W constant, and taking a gradient step in W while holding H constant.

4

Method

The method chosen to solve this problem can be broken up into two parts. The first, a minimization over the H variable while holding W constant. This part of the problem is called the abundance Matrix update. The second part of the problem is a minimization over the W variable while holding H constant, this part is called the end-member matrix update. I will begin by describing the abundance Matrix update method. While holding W fixed the problem (1) can be reduced to: [AZB07, §2.4] minimize kyi − W hi k22 + λ subject to hi ≥ 0, 1T hi = 1

P

j∈N (i)

khi − hj k1

(2)

This problem will be solved for each pixel i, i.e. for each column in Matrix H. We will use a standard sub-gradient method to solve this problem. The sub-gradient for this problem is shown below: [AZB07, §2.4] g = 2W T W h − 2W T yi + λs

(3)

The sub-gradient method proceeds by taking a step in the negative sub-gradient direction and then projecting the result back onto the feasible set. Keeping in mind that the use of a sub-gradient method does not guarantee reduction of the objective at each iteration, however, the method does converge so the objective is ultimately minimized. The hi negative sub-gradient step update is shown below: [AZB07, §2.4] (t+1)

hi

(t)

= (hi − γ (t) g)P 2

(4)

In this case the projection back into the feasible set is a minimization problem itself. We need to project the results from the sub-gradient step back onto the probably simplex. This problem is shown below: [AZB07, §2.4] (x)P = argminy≥0,1T y=1 kx − yk2

(5)

This minimization can be efficiently carried out using a technique similar to water filling. Solving the problem (5) is essentially the same as finding a λ and µ such that the derivative of the lagrangian is zero. [AZB07, §2.4] x − y + µ1 − λ = 0

(6)

The algorithm shown below was presented in the paper, and will find y up to a tolerance  > 0. [AZB07, §2.4] given: x ∈ Rk ,  > 0 initialize: µ = 0, λ = 0, y = x while: 1T y − 1 >  or yl < − for some 1 ≤ l ≤ k dµ = (1T y − 1)/k µ = µ + dµ y = y − dµ1 dλ = (y)− λ = λ − dλ y = y − dλ end

(7)

Each column (hi ) of H can be updated simultaneously allowing us to write a single projected sub-gradient update of the abundance matrix shown below: [AZB07, §2.4] H (t+1) = (H (t) − γ (t) (2W T W H (t) − 2W T Y + λS (t) ))P P (t) (t) (t) Sli = j∈N (i) sgn(Hli − Hlj ) √ γ = 1/ p

(8)

The S matrix is assembled by calculating the differences between the abundance vectors of neighboring pixels. Choosing the pixels that we call neighboring pixels becomes an important part of the algorithm. In my implementation of this algorithm I choose q/2 pixels before the pixel of interest and q/2 pixels after the pixel of interest and called these the neighboring pixels. This allowed me to adjust the region of neighboring pixels by adjusting q. This approach only allows me to take into account neighboring pixels that are on the same line of the image as the pixels of interest. A more complicated approach would be necessary if I also wanted to account for neighboring pixels that lie in different lines from the pixel of interest. The second part of the un-mixing problem is the end-member matrix update. Here we will hold H constant and minimize over W. Holding H constant reduces Problem (1) to the convex problem shown below: [AZB07, §2.4] 3

minimize kY − W Hk2F (9) subject to W ≥ 0 The objective is now separable, as well as differentiable in the rows of W. Therefore its simple enough to write out the gradient update for W with the included projection into the feasible set. Here kAk denotes the spectral norm, or the maximum singular value. W (t+1) = (W (t) − γ(W (t) H − Y )H (T ) )+ (10) γ < 2/kHH T k By combining the abundance matrix update with the end-member update we can write out the complete algorithm to solve problem (1). First H and W are initialized with feasible values. The columns of H can be made to sum to 1, if every value in H = 1/k. W can be initialized by randomly pulling k columns out of Y. The algorithm then proceeds by taking P update steps of the abundance matrix and then taking one end-member update step. As shown below this alternating method is repeated T times. [AZB07, §2.4] given: initialize: for: for:

T, P, λ H = H (0) , W = W (0) t = 1 to T p = 1 to P √ H = (H − (1/ p)(2W T W H − 2W T Y + λS))P

(11)

end W = (W + (1/kHH T k)(W H − Y )H T )+ end

5

Implementation of ADMM

The algorithm (11) shown above summarizes the approach taken by Zymnis. In an attempt to improve the algorithm I implemented alternating direction method of multiplier (ADMM) for the abundance matrix update shown below. [SBE10] H (t+1) = (W T W + ρI)−1 (W T Y + ρZ (t) − U (t) ) (12) Z (t+1) = (H (t+1) + U (t) /ρ)P (t+1) (t) (t+1) (t+1) U = U + ρ(H −Z ) This method was taken straight from the ADMM lecture notes; I basically substituted this method for the abundance matrix (H) sub-gradient update that Zymnis used. The method shown above (12) is actually a condensed version of the update which is being preformed on each hi separately. Here I am updating all the columns of H simultaneously in the first step. Then in the second step the projection is done independently on every column of H to create Z. The final step is a simultaneous update of U. This particular implementation is a standard lasso problem. This update excludes the penalty function which addresses pixel dissimilarity, and adds a regularization constraint on H. 4

Figure 1: Objective versus iteration comparing the Subgradient Method to the ADMM Method.

My original implementation of ADMM used CVX to solve the exact objective function but the time it took to solve made it impractical. I tested both the ADMM method and the subgradient Method on the same data, with the same initial conditions and created the graph shown in Figure 1. We can see that the two methods converge within 50 steps. However, we cannot draw any conclusions from the optimal points because the objective functions are fundamentally different.

6

Results

In this section I will describe the application of the subgradient method on two sets of Hyper spectral images. I chose the subgradient method over the ADMM method for these results so that I could benefit from the extra term in the objective penalizing for pixel dissimilarity. I specifically selected these two data sets to stress the algorithm in drastically different areas. The first data set I actually acquired myself, it was taken of a rock sample which was chosen for its unique minerals properties. This set has large uniform areas of minerals, which should stress the algorithms ability to reduce pixel dissimilarity. The second data set was taken from the images produced by the Moon Mineralogy Mapper (M 3 ). The M 3 data I picked was taken of the Mare Orientale, a region know to contain hydroxyl signatures. This set is spectrally very uniform; it will stress the algorithms ability to pull out the proper end member spectra. [PV09] The data I acquired myself has 350 spectral bands linearly spaced from 660nm to 2.4µm. The image size is 265×200 pixels. Applying the matrix un-mixing method to this data set yields surprisingly good results. After several trial runs, I settled on k = 5 and λ = 0.001 values that give the best results. My lambda value was dependent on my q value (I picked q = 5) the number of pixels considered neighbors to the pixel of interest. I selected the same 5

number of abundance matrix update steps P=10 and the same number of total algorithm iterations T = 50 as Zymnis suggested in his paper. Figure 2 shows the objective value vs iteration for the final run of the method. On this data set, the algorithm properly identified two end members that were present in the data, one of which was easily found in the USGS library. Figures 3 and 4 show the properly identified end-members with the mineral spectra shown on the bottom and the pixel abundances shown on the top. Overall, these results seems to show the algorithm properly reduces pixel dissimilarities, creating nice smooth abundance maps. Because the algorithm was run with k = 5, three other end members where also found, they are shown in Figures 5, 6, and 7. The end members shown in Figure 5 and 6 appear to have truly different spectral properties causing the algorithm to pull the features out as an end member, however, I have not been able confirm if these features actually exist and are different minerals. Figure 7 shows a rather interesting end member, as you can see in the spectral graph the spectral signature almost perfectly matches the first end member, the only difference being an overall change in total luminance. This topic will be discussed further in the M 3 data analysis section. The M 3 data set was more challenging, it has 85 spectral bands linearly spaced from 460nm to 3µm. The image dimensions are 27548×304 pixels. This strip of data is one line from M 3 , it begins at the top of the moon, just as the sun begins to illuminate the surface, and ends once it reaches the bottom of the moon, just as the sun stops illuminating the surface. [PV09] I began by choosing the first 800 pixels out of the 27548, this data was of the northern hemisphere of the moon where the sun would illuminate the surface from a very low angle. Using the exact same settings as before, the algorithm found only two end members with non-zero abundance. Figure 8 show the results. Its easy enough to see that the algorithm simply selected two end members with the only difference being total luminance. We can see that the two spectra it found are virtually identical, with the only difference being the total intensity. Next, I choose a region of the moon near the equator which had considerably less contrast, and was overall more evenly lit. Figure 9 show the results are almost identical, with the two spectra virtually identical. I reran all the data using several different λ values without any real effect on the results. My final attempt to find the correct end member spectra, centered on the idea that perhaps the small regions of the moon I used actually had extremely uniform spectral features and I needed to give the algorithm larger variation of data. Compressing the entire 27548×304 pixel data set down to a 1000×304 pixel data set. I was able to run the algorithm on a compressed version of all the data. This resulted in exactly the same problem. There simply isnt enough spectral variation; this causes the algorithm to favor an end member matrix based only on total luminance with little to no spectral difference. The objective term penalizing differences in pixel dissimilarity is not enough to overcome the uneven illumination problem. [PV09]

6

Figure 2: Objective versus iteration for the final run of the subgradient algorithm

Further investigation into this illumination problem is necessary. With more time I would try running the M 3 data using only a small subset of the bands, perhaps from 2µm to 3µm. I would pick this subset of data because I expect there to be the largest hydroxyl signatures in this region of the spectrum. However, for the general case, in order to choose a small subset of the total data requires that assumptions are made about the absorption features of the end member spectra. [PV09] Perhaps a better approach would be to add a penalty term in the objective function which penalizes for similar end member spectra. A possible penalty function is show below: −β

Pk

i=1

Pk

j=i

kwi − wj k1

(13)

Here, wi denotes the ith column of W, and β denotes the regularization parameter. The trouble with this particular penalty function is it does not normalize the end member spectra before subtraction; therefore, differences in total illumination will still affect the results.

7

Conclusion

In conclusion, this paper described the non-convex hyperspectral image unmixing problem and the method proposed by Zymnis to obtain an approximate solution. A possible implementation of ADMM for the abundance matrix update was also described, and results were presented showing the new algorithm converges. Finally, the algorithm was applied to two drastically different sets of data. The first, a rock sample, produced results as expected finding two verified end members one of which was easy to identify in the USGS library. The second data set from M 3 proved to be too spectrally uniform causing the algorithm to choose end members based on total intensity rather then spectral differences. A few possible

7

ways to mitigate this problem were discussed but further investigation would be necessary to test these theories.

References [AZB07] J. Skaf M. Parente A. Zymnis, S.-J. Kim and S. Boyd. Hyperspectral image unmixing via alternating projected subgradients. Proceedings Asilomar Conference on Signals Systems and Computers, page 11641168, 2007. [NB01]

A. Nedi´c and D. Bertsekas. Hyperspectral image unmixing via alternating projected subgradients. SIAM J. on Optimization, 12:109–138, 2001.

[PV09]

J. N. Goswami R. N. Clark M Annadurai J. Boardman B. Buratti J.-P Combe M. D. Dyar R. Green J. W. Head C. Hibbitts M. Hicks P. Isaacson R. Klima G. Kramer S. Kumar E. Livo S. Lundeen E. Malaret T. McCord J. Mustard J. Nettles N. Petro C. Runyon M. Staid J. Sunshine L. A. Taylor S. Tompkins Pieters, C. M. and P. Varanasi. Character and spatial distribution of oh/h2o on the surface of the moon seen by m3 on chandrayaan-1. Science, 326:568–572, 2009.

[SBE10] E. Chu B. Peleato S. Boyd, N. Parikh and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multiplierss. Foundations and Trends in Machine Learning, 2010.

8

Figure 3: Top: Abundance Matrix 1 Bottom: End member spectra 1 graphed along with USGS Libary Saponite spectra to show similarity

9

Figure 4: Top: Abundance Matrix 3 Bottom: End member spectra 3

10

Figure 5: Top: Abundance Matrix 2 Bottom: End member spectra 2

11

Figure 6: Top: Abundance Matrix 4 Bottom: End member spectra 4

12

Figure 7: Top: Abundance Matrix 5 Bottom: End member spectra 5 graphed along with End member spectra 1 to show similarity

13

Figure 8: Top: Image of area taken at 1578nm Middle 1: Abundance Matrix 6 Middle 2: Abundance Matrix 10 Bottom: End member spectra 10 graphed along with End member spectra 6 to show similarity

14

Figure 9: Top: Image of area taken at wavelength 1578nm Middle 1: Abundance Matrix 5 Middle 2: Abundance Matrix 2 Bottom: End member spectra 5 graphed along with End member spectra 2 to show similarity

15