AN APPROXIMATE MESSAGE PASSING APPROACH FOR COMPRESSIVE HYPERSPECTRAL IMAGING USING A SIMULTANEOUS LOW-RANK AND JOINT-SPARSITY PRIOR

Yangqing Li*, Saurabh Prasad†, Wei Chen‡, Changchuan Yin*, and Zhu Han†
*Beijing Laboratory of Advanced Information Network, Beijing University of Posts and Telecommunications, Beijing, China
†Electrical and Computer Engineering Department, University of Houston, USA
‡State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China
ABSTRACT

This paper considers a compressive sensing (CS) approach for hyperspectral data acquisition that results in a practical compression ratio substantially higher than the state-of-the-art. Applying a simultaneous low-rank and joint-sparse (L&S) model to the hyperspectral data, we propose a novel algorithm, based on loopy belief propagation, for joint reconstruction of hyperspectral data that exploits both the structured sparsity and the amplitude correlations in the data. Experimental results with real hyperspectral datasets demonstrate that the proposed algorithm outperforms state-of-the-art CS-based solutions with substantial reductions in reconstruction error.

Index Terms— Compressive hyperspectral imaging, low-rank and joint-sparse, compressive sensing, approximate message passing

The work of S. Prasad was funded in part by the NASA New Investigator (Early Career) Award, grant # NNX14AI47G. The work of Z. Han was partially supported by the U.S. National Science Foundation under grants ECCS-1547201, CCF-1456921, CNS-1443917, ECCS-1405121, CMMI-1434789, and NSFC 61428101.

1. INTRODUCTION

Unlike traditional imaging systems, hyperspectral imaging (HSI) sensors [1], [2] acquire a scene with several million pixels at up to hundreds of contiguous wavelengths. Such high-resolution spatio-spectral data, i.e., a three-dimensional (3D) datacube organized along the spatial and spectral domains, has an extremely large size and enormous redundancy, which makes compressive sensing (CS) [3] a promising solution for hyperspectral data acquisition.

To date, most existing designs for CS-based hyperspectral imagers can be grouped into frame-based acquisition in the spatial direction [4–7] and pixel-based acquisition in the spectral direction [8–10]. While many reconstruction approaches for these two acquisition schemes have been proposed, most existing algorithms can only exploit the spatial and spectral information of hyperspectral data through sparsity (or joint sparsity), because they are built on conventional CS, which reconstructs the signals by solving a convex program and proceeds without exploiting additional information (aside from sparsity or
compressibility) [3]. For hyperspectral data, the spatial and spectral correlations, which are reflected not only in the correlation between the sparse structures of the data (i.e., structured sparsity) but also in the correlation between the amplitudes of the data, can provide helpful prior information for the reconstruction process and assist in increasing the compression ratio.

In this paper, the structured sparsity and the amplitude correlations are considered jointly by assuming that the spatially and spectrally correlated data satisfy a simultaneous low-rank and joint-sparse (L&S) structure. Using a structured L&S factorization, we propose an iterative approximate message passing (AMP) algorithm [11], [12] that enables joint reconstruction of the data at a practical compression ratio substantially higher than the state-of-the-art. Specifically, in Section 2, we introduce the structured factorization representation of the L&S model. In Section 3, we propose a novel AMP-based approach, called L&S-approximate message passing (L&S-AMP), that decouples the global inference problem into two sub-problems: one considers the linear inverse problem of recovering the signal matrix from its compressed measurements, while the other exploits the L&S structure of the signal matrix. A recently proposed "turbo AMP" framework [13] is then used to pass messages between these two phases efficiently. Section 4 presents simulation results with real hyperspectral data that support the potential of the approach to considerably reduce the reconstruction error. Section 5 concludes the paper.

2. PROBLEM SETUP

In this section, we first present the problem formulation for compressive hyperspectral imaging. We then propose a structured L&S factorization model for the signal matrix, which is later exploited to acquire an HSI datacube with very few measurements via a novel joint reconstruction approach.

2.1. Data Acquisition Model

Owing to the inherent 3D structure of the hyperspectral datacube and the two-dimensional nature of optical sensing hardware, CS-based hyperspectral imagers generally capture a group of linear measurements across either the 2D spatial extent of the scene for a spectral band or the spectral
extent for a spatial (pixel) location at a time, i.e., $\mathbf{f}_t \in \mathbb{R}^N$, $t \in \{1, \ldots, T\}$. The compressed measurements $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_T$ are then sent to a fusion station that recovers the original 3D datacube using a CS reconstruction algorithm. Taking the highly correlated hyperspectral data vectors $\mathbf{f}_1, \mathbf{f}_2, \cdots, \mathbf{f}_T$ to admit a sparse representation in some orthonormal basis, e.g., a DCT or wavelet basis, we have

$$\mathbf{f}_t = \boldsymbol{\Psi} \mathbf{x}_t, \quad t = 1, \ldots, T, \tag{1}$$

where $\mathbf{x}_t \in \mathbb{R}^N$, $\forall t$, are sparse signal vectors and $\boldsymbol{\Psi} \in \mathbb{R}^{N \times N}$ is a certain orthonormal basis matrix. We then obtain the typical CS formulation
$$\mathbf{y}_t = \mathbf{A}_t \mathbf{x}_t + \mathbf{w}_t = \mathbf{z}_t + \mathbf{w}_t, \quad t = 1, \ldots, T, \tag{2}$$
where $\mathbf{A}_t = \boldsymbol{\Phi}_t \boldsymbol{\Psi} = [a_{mn}] \in \mathbb{R}^{M_t \times N}$, $\mathbf{z}_t \in \mathbb{R}^{M_t}$ is the measurement output vector, and $\mathbf{w}_t$ is an additive noise vector with unknown variance $\tau_t^w$.
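To make the acquisition model concrete, the following minimal sketch simulates the per-band measurements of (1)-(2) in numpy, with a random Gaussian $\boldsymbol{\Phi}_t$ per frame and an orthonormal DCT basis as $\boldsymbol{\Psi}$; the sizes, sparsity rate, and noise level are illustrative placeholders, not values taken from the paper.

```python
import numpy as np
from scipy.fft import dct

# Illustrative sizes (assumptions for this sketch, not the paper's datasets).
N, T, M = 64, 10, 20                     # bands per vector, frames, measurements/frame

rng = np.random.default_rng(0)
Psi = dct(np.eye(N), axis=0, norm='ortho')          # orthonormal DCT basis matrix

support = rng.random(N) < 0.15                      # common sparse support across frames
X = np.where(support[:, None], rng.normal(size=(N, T)), 0.0)   # joint-sparse x_1..x_T
F = Psi @ X                                         # f_t = Psi x_t, eq. (1)

Y, A = np.empty((M, T)), []
for t in range(T):
    Phi_t = rng.normal(size=(M, N)) / np.sqrt(M)    # per-frame Gaussian sensing matrix
    A_t = Phi_t @ Psi                               # A_t = Phi_t Psi
    w_t = 0.01 * rng.normal(size=M)                 # additive measurement noise
    Y[:, t] = A_t @ X[:, t] + w_t                   # y_t = A_t x_t + w_t, eq. (2)
    A.append(A_t)
```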
2.2. Structured L&S Factorization for the Signal Matrix

As mentioned in the introduction, while the original hyperspectral data can be reconstructed using conventional CS recovery algorithms, much better recovery performance is possible by applying the L&S model to further exploit the structural dependencies between the values and the locations of the coefficients of the sparse signal vectors $\mathbf{x}_t$, $\forall t$, collected in the signal matrix $\mathbf{X} = [\mathbf{x}_1, \cdots, \mathbf{x}_T] \in \mathbb{R}^{N \times T}$. The main reason we model $\mathbf{X}$ as an L&S matrix is two-fold: first, images from different spectral bands enjoy similar natural image statistics, and hence can be joint-sparse in a wavelet/DCT basis [6]; second, the limited number of unique materials in a scene implies that the spectral signatures across pixels can be stacked to form a matrix that is often low-rank [14]. To precisely reap the benefits of the L&S model and reconstruct the original hyperspectral data from a Bayesian point of view, we propose a probabilistic model by performing a structured L&S factorization of $\mathbf{X}$ as

$$\mathbf{X} \triangleq \mathbf{S} \boldsymbol{\Theta} \triangleq \mathbf{S} \mathbf{H} \mathbf{L} \triangleq \mathbf{G} \mathbf{L}, \tag{3}$$
where the diagonal matrix $\mathbf{S} = \mathrm{diag}(s_1, s_2, \cdots, s_N)$ is the sparsity pattern matrix of the signals, with support indicators $s_n \in \{0, 1\}$, $\forall n$. We refer to $K = \sum_{n=1}^{N} s_n \ll N$ as the sparsity level of $\mathbf{X}$. $\mathbf{H} = [h_{nr}] \in \mathbb{R}^{N \times R}$ and $\mathbf{L} = [l_{rt}] \in \mathbb{R}^{R \times T}$ are obtained from the low-rank matrix factorization of $\boldsymbol{\Theta} \in \mathbb{R}^{N \times T}$, the amplitude matrix of $\mathbf{X}$. For a joint-sparse matrix $\mathbf{G}$ and an arbitrary matrix $\mathbf{L}$, this factorization implies that $\mathbf{X}$ is a simultaneously low-rank ($R \le K \ll N$) and joint-sparse matrix with rank $R \le \min(K, T)$, where all sparse signal vectors $\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_T$ share a common support with sparsity level $K$. Assuming independent entries for $\mathbf{S}$, $\mathbf{H}$, and $\mathbf{L}$, the separable probability density functions (PDFs) of $\mathbf{G}$ and $\mathbf{L}$ become

$$p(\mathbf{G}) = \prod_{n,r} p(g_{nr}) = \prod_{n} p(s_n) \prod_{r} \mathcal{N}(h_{nr}; \hat{g}_0, \nu_0^g), \tag{4}$$

$$p(\mathbf{L}) = \prod_{r,t} p(l_{rt}) = \prod_{r,t} \mathcal{N}(l_{rt}; 0, 1), \tag{5}$$
Fig. 1. The factor graph for the L&S structure model with $N = 4$, $T = 3$, $R = 2$, and $M = 3$.

where $\{h_{nr}\}_{\forall n,r}$ are assumed to be i.i.d. Gaussian with unknown mean $\hat{g}_0$ and variance $\nu_0^g$, while $\{l_{rt}\}_{\forall r,t}$ are assumed to follow an i.i.d. Gaussian distribution with zero mean and unit variance, i.e., $\mathcal{N}(0, 1)$, to avoid ambiguity and unnecessary model parameter updates. As $\{s_n\}_{\forall n}$ are treated as i.i.d. Bernoulli random variables with $\Pr(s_n = 1) \triangleq \lambda$, $\forall n$, the sparse coefficients $\{g_{nr}\}_{\forall n,r}$ become i.i.d. Bernoulli-Gaussian (BG), i.e., the marginal PDF is

$$p(g_{nr}) = (1 - \lambda)\,\delta(g_{nr}) + \lambda\,\mathcal{N}(g_{nr}; \hat{g}_0, \nu_0^g), \quad \forall n, r, \tag{6}$$

where $\delta(\cdot)$ is the Dirac delta function.
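As a quick sanity check on the probabilistic model (3)-(6), the short sketch below draws $\mathbf{S}$, $\mathbf{H}$, and $\mathbf{L}$ from the priors above and verifies that $\mathbf{X} = \mathbf{S}\mathbf{H}\mathbf{L}$ is jointly row-sparse with low rank; the dimensions and sparsity rate $\lambda$ are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, R, lam = 100, 30, 4, 0.2     # illustrative sizes and sparsity rate (assumptions)
g0_hat, nu0_g = 0.0, 1.0           # active-coefficient mean/variance as in (4)

s = (rng.random(N) < lam).astype(float)                 # s_n ~ Bernoulli(lambda)
H = g0_hat + np.sqrt(nu0_g) * rng.normal(size=(N, R))   # h_nr ~ N(g0_hat, nu0_g)
L = rng.normal(size=(R, T))                             # l_rt ~ N(0, 1), eq. (5)

G = s[:, None] * H                                      # G = S H: rows zeroed off-support
X = G @ L                                               # X = S H L = G L, eq. (3)

K = int(s.sum())                                        # sparsity level K = sum_n s_n
assert np.linalg.matrix_rank(X) <= min(K, R, T)         # low rank: at most min(K, R, T)
assert np.all(X[s == 0, :] == 0)                        # joint sparsity: common row support
```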
Furthermore, due to the assumption of additive Gaussian noise in (2), the likelihood function of $\mathbf{Z} = \{\mathbf{z}_t\}_{\forall t}$ is known and separable, i.e.,

$$p(\mathbf{Y} | \mathbf{Z}) = \prod_{t=1}^{T} \prod_{m=1}^{M_t} p(y_{mt} | z_{mt}) = \prod_{t=1}^{T} \prod_{m=1}^{M_t} \mathcal{N}(y_{mt}; z_{mt}, \tau_t^w), \tag{7}$$

where $\mathbf{Y} = \{\mathbf{y}_t\}_{\forall t}$ denotes the measurements.
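Since (7) is fully separable, evaluating the log-likelihood is a one-liner; the helper below is a minimal sketch assuming equal frame lengths $M_t = M$ so that $\mathbf{Y}$ and $\mathbf{Z}$ fit in single $M \times T$ arrays.

```python
import numpy as np

def log_likelihood(Y, Z, tau_w):
    """Separable Gaussian log-likelihood log p(Y|Z) of eq. (7).

    Y, Z: M x T arrays; tau_w: length-T array of per-frame noise variances.
    """
    return np.sum(-0.5 * ((Y - Z) ** 2 / tau_w + np.log(2 * np.pi * tau_w)))
```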
3. THE L&S-AMP ALGORITHM
With the problem formulation in (2) and (3), our proposed method maximizes the joint posterior distribution, i.e.,

$$\begin{aligned}
p(\mathbf{S}, \mathbf{H}, \mathbf{L} | \mathbf{Y}) &= p(\mathbf{Y} | \mathbf{S}, \mathbf{H}, \mathbf{L})\, p(\mathbf{S}, \mathbf{H}, \mathbf{L}) / p(\mathbf{Y}) \\
&\propto \prod_{t} p(\mathbf{y}_t | \mathbf{A}_t \mathbf{x}_t)\, p(\mathbf{S})\, p(\mathbf{H})\, p(\mathbf{L}) \\
&= \prod_{t=1}^{T} \prod_{m=1}^{M_t} p\Big(y_{mt} \,\Big|\, \sum_{n=1}^{N} a_{mn} s_n \sum_{r=1}^{R} h_{nr} l_{rt}\Big) \times \prod_{n=1}^{N} p(s_n) \times \prod_{n=1}^{N} \prod_{r=1}^{R} p(h_{nr}) \times \prod_{r=1}^{R} \prod_{t=1}^{T} p(l_{rt}),
\end{aligned} \tag{8}$$
where $\propto$ denotes equality up to a normalizing constant. This posterior distribution can be represented by the factor graph shown in Fig. 1, where, following the belief propagation convention [15], circles denote random variables and squares denote posterior factors. Each factor node represents the conditional probability distribution among the variable nodes connected to it. The $T$ vertical planes (parallel to the Y and Z axes) exploit the linear measurement structure $\mathbf{z}_t = \mathbf{A}_t \mathbf{x}_t$, $\forall t$ (detailed
Fig. 2. The factor subgraphs for the M-GAMP phase with $N = 4$, $T = 3$, $R = 2$, and $M = 3$.

in Fig. 2), while the remaining part of Fig. 1 exploits the L&S structure $\mathbf{X} = \mathbf{S}\mathbf{H}\mathbf{L}$. To bypass the intractable inference problem of marginalizing (8), we propose to solve an alternative problem consisting of two sub-problems that mainly require local information to complete their tasks [13]. Correspondingly, our proposed algorithm is divided into two phases: I) the multiple generalized approximate message passing (M-GAMP) phase; and II) the low-rankness and sparsity pattern decoding (L&SPD) phase. An efficient "turbo AMP" iterative framework [13] is used that updates the beliefs of one phase, passes them to the other phase, and vice versa, repeating until both phases converge.

3.1. M-GAMP Phase

In each frame $t = 1, \ldots, T$ of the M-GAMP phase, we apply the generalized approximate message passing (GAMP) approach [12] in parallel to the linear inference problem of estimating the vector $\mathbf{x}_t$ from the observation $\mathbf{y}_t$, as shown in Fig. 2. Specifically, GAMP computes the approximate posteriors on $\{x_{nt}\}_{\forall n}$ as [11], [12]

$$\Delta_{x_{nt}} = p(x_{nt}) \prod_{m} \Delta_{z_{mt} \to x_{nt}} \propto p\Big(x_{nt} \,\Big|\, \sum_{r} g_{nr} l_{rt}\Big)\, \mathcal{N}(x_{nt}; \hat{u}_{nt}, \nu_{nt}^u), \tag{9}$$
where $\Delta_{A \to B}$ denotes a message passed from node $A$ to the adjacent node $B$ in the factor graph. The parameters $\hat{u}_{nt}$ and $\nu_{nt}^u$ are obtained after the GAMP iteration converges. For the prior distribution of $x_{nt}$, i.e., $p(x_{nt} | \sum_r g_{nr} l_{rt})$ used in (9), we can assume the BG prior PDF

$$p\Big(x_{nt} \,\Big|\, \sum_{r} g_{nr} l_{rt}\Big) = (1 - \lambda)\,\delta(x_{nt}) + \lambda\,\mathcal{N}(x_{nt}; \hat{q}_{nt}, \nu_{nt}^q), \tag{10}$$
where $\hat{q}_{nt}$ and $\nu_{nt}^q$ are the active-coefficient mean and variance, respectively, of the variable $x_{nt}$. It is worth mentioning that the prior parameters $\{\hat{q}_{nt}\}_{\forall n,t}$ and $\{\nu_{nt}^q\}_{\forall n,t}$ are only initialized to agnostic values at the beginning of the L&S-AMP algorithm (e.g., $\hat{q}_{nt} = 0$, $\nu_{nt}^q = 1$, $\forall n, t$) and are then iteratively updated according to the messages passed from the L&SPD phase; this process is detailed in the next subsection. The minimum-mean-squared error (MMSE) estimation of $\{x_{nt}\}_{\forall n,t}$ is then facilitated by the prior-dependent integrals

$$\hat{x}_{nt}(j) = \int x_{nt}\, \Delta_{x_{nt}}\, dx_{nt}. \tag{11}$$
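To make the per-coefficient computation in (9)-(11) concrete: combining the Gaussian message $\mathcal{N}(x_{nt}; \hat{u}_{nt}, \nu_{nt}^u)$ with the BG prior (10) yields a closed-form spike-plus-slab posterior, so the MMSE estimate (11) reduces to a simple scalar denoiser. The sketch below is a minimal numpy implementation of that single step under the stated priors; it is not the full GAMP recursion of [12], whose remaining updates are omitted.

```python
import numpy as np

def bg_posterior_mean(u_hat, nu_u, lam, q_hat, nu_q):
    """MMSE estimate of x under BG prior (10) and Gaussian message N(x; u_hat, nu_u).

    lam is the support probability (the common rate lambda or a local estimate);
    eq. (11) integrates to the activity probability times the slab posterior mean.
    """
    def normal_pdf(x, mean, var):
        return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

    # Evidence of the two mixture components given the Gaussian message.
    p_slab = lam * normal_pdf(u_hat, q_hat, nu_q + nu_u)
    p_spike = (1 - lam) * normal_pdf(u_hat, 0.0, nu_u)
    pi = p_slab / (p_slab + p_spike)          # posterior support probability, cf. (18)

    # Gaussian-times-Gaussian slab posterior moments.
    nu_post = 1.0 / (1.0 / nu_q + 1.0 / nu_u)
    x_slab = nu_post * (q_hat / nu_q + u_hat / nu_u)
    return pi * x_slab                        # posterior mean x_hat, eq. (11)
```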
3.2. L&SPD Phase

In the L&SPD phase, to exploit the L&S structure, we employ the recently proposed bilinear generalized approximate message passing (BiG-AMP) approach [16] on a variant of the PCA problem: estimate the matrices $\mathbf{G}$ and $\mathbf{L}$ from an observation $\hat{\mathbf{X}} = [\hat{x}_{nt}] \in \mathbb{R}^{N \times T}$, which is the posterior estimate of their product $\mathbf{X} = \mathbf{G}\mathbf{L}$ obtained from the M-GAMP phase in (10).
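BiG-AMP performs full message passing over the bilinear model and is beyond a few lines; purely as a conceptual stand-in for what this phase estimates, the sketch below fits the rank-$R$ factorization $\hat{\mathbf{X}} \approx \mathbf{G}\mathbf{L}$ by alternating least squares. It ignores the per-entry variances and the Bernoulli support that BiG-AMP tracks, so it illustrates the sub-problem rather than the algorithm.

```python
import numpy as np

def als_factorize(X_hat, R, n_iter=50, seed=0):
    """Alternating least-squares fit of X_hat ~= G @ L, G: N x R, L: R x T.

    A crude surrogate for the L&SPD phase, used here only to illustrate the
    low-rank factorization sub-problem that BiG-AMP solves probabilistically.
    """
    rng = np.random.default_rng(seed)
    N, T = X_hat.shape
    G = rng.normal(size=(N, R))
    for _ in range(n_iter):
        # lstsq solves argmin ||X_hat - G L||_F for one factor at a time.
        L = np.linalg.lstsq(G, X_hat, rcond=None)[0]          # fix G, solve for L
        G = np.linalg.lstsq(L.T, X_hat.T, rcond=None)[0].T    # fix L, solve for G
    return G, L
```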
In particular, BiG-AMP [16] obtains the approximately Gaussian posterior messages $\{\Delta_{x'_{nt}}\}_{\forall n,t}$ as

$$\Delta_{x'_{nt}} = p(x'_{nt}) \int_{\{g_{nr}, l_{rt}\}_{\forall r}} \Big[\prod_{r} \Delta_{x'_{nt} \leftarrow g_{nr}} \prod_{r} \Delta_{x'_{nt} \leftarrow l_{rt}}\Big] \propto p(x'_{nt} | \mathbf{y}_t)\, \mathcal{N}(x'_{nt}; \hat{q}_{nt}, \nu_{nt}^q), \tag{12}$$

where the parameters $\hat{q}_{nt}$ and $\nu_{nt}^q$ are obtained after the BiG-AMP iteration converges. The prior distribution of $x'_{nt}$, i.e., $p(x'_{nt} | \mathbf{y}_t)$ used in (12), comes from the posterior message of $x_{nt}$ given the observation $\mathbf{y}_t$ in the M-GAMP phase; the prior distribution $p(x_{nt} | \sum_r g_{nr} l_{rt})$ used in (9) comes from the posterior message of $x_{nt}$ given the matrix factorization $x_{nt} = \sum_r g_{nr} l_{rt}$ in the L&SPD phase. To enable an effective implementation of the "turbo AMP" iteration, given the construction of the factor graph in Fig. 1, the sum-product algorithm (SPA) [15] implies that

$$p(x'_{nt} | \mathbf{y}_t) \propto \prod_{m} \Delta_{z_{mt} \to x_{nt}}(x_{nt}), \tag{13}$$

$$p\Big(x_{nt} \,\Big|\, \sum_{r} g_{nr} l_{rt}\Big) \propto \int_{\{g_{nr}, l_{rt}\}_{\forall r}} \Big[\prod_{r} \Delta_{x'_{nt} \leftarrow g_{nr}}(g_{nr}) \prod_{r} \Delta_{x'_{nt} \leftarrow l_{rt}}(l_{rt})\Big]. \tag{14}$$
Comparing (13) and (14) with (9) and (12), respectively, we have

$$p(x'_{nt} | \mathbf{y}_t) \approx \mathcal{N}(x'_{nt}; \hat{u}_{nt}, \nu_{nt}^u), \tag{15}$$

$$p\Big(x_{nt} \,\Big|\, \sum_{r} g_{nr} l_{rt}\Big) \approx \mathcal{N}(x_{nt}; \hat{q}_{nt}, \nu_{nt}^q). \tag{16}$$
Thus, the parameters $\hat{u}_{nt}(j)$ and $\nu_{nt}^u(j)$ computed during the final iteration of the M-GAMP phase are treated as the prior parameters of $\mathbf{X}$ in the L&SPD phase. Conversely, the parameters $\hat{q}_{nt}(j')$ and $\nu_{nt}^q(j')$ computed during the final iteration of the L&SPD phase are in turn used as the prior parameters of $\mathbf{X}$ in the M-GAMP phase in (10). In addition, to further exploit the joint sparsity of $\{x_{nt}\}_{\forall n,t}$, we use the local support estimate $\vec{\lambda}_{nt}$ instead of the common sparsity rate $\lambda$ in (10). Applying the SPA in the M-GAMP phase, we get

$$\vec{\lambda}_{nt} = \frac{\lambda \prod_{t' \neq t} \overleftarrow{\lambda}_{nt'}}{\lambda \prod_{t' \neq t} \overleftarrow{\lambda}_{nt'} + (1 - \lambda) \prod_{t' \neq t} (1 - \overleftarrow{\lambda}_{nt'})}, \tag{17}$$

where the posterior local support probability is

$$\overleftarrow{\lambda}_{nt} = \Bigg[1 + \frac{\mathcal{N}(0; \hat{u}_{nt}, \nu_{nt}^u)}{\mathcal{N}(\hat{u}_{nt}; \hat{q}_{nt}, \nu_{nt}^q + \nu_{nt}^u)}\Bigg]^{-1}. \tag{18}$$

3.3. Algorithm Summary

Beginning at the initial inter-phase iteration index $i = 1$, the L&S-AMP algorithm first performs the M-GAMP phase with the initial prior parameters $\hat{q}_{nt} = 0$, $\nu_{nt}^q = 1$, $\vec{\lambda}_{nt} = 0.5$, $\forall n, t$ in (10). The converged outgoing messages $\{\hat{u}_{nt}\}_{\forall n,t}$ and $\{\nu_{nt}^u\}_{\forall n,t}$ are then treated as prior parameters in the L&SPD phase. The converged messages $\{\hat{q}_{nt}\}_{\forall n,t}$ and $\{\nu_{nt}^q\}_{\forall n,t}$ obtained from the L&SPD phase, along with the updated beliefs $\{\vec{\lambda}_{nt}\}_{\forall n,t}$ in (17), are in turn used by the M-GAMP phase at inter-phase iteration $i = 2$. This procedure continues until either a stopping condition or a maximum number of allowable iterations is reached, after which we obtain the posterior mean estimates $\{\hat{x}_{nt}\}_{\forall n,t}$ computed in (11). Furthermore, we tune the prior and likelihood parameters $\{\lambda, \hat{g}_0, \nu_0^g, \{\tau_t^w\}_{\forall t}\}$ using expectation-maximization [17], [18], and estimate the rank $R$ using a rank-selection strategy based on penalized log-likelihood maximization [16]. We recommend initializing L&S-AMP with $\lambda = 0.5$, $\hat{g}_0 = 0$, $\nu_0^g = 1$, and $\tau_t^w = 100$, $\forall t$.
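The belief exchange in (17) and (18) is easy to implement once the per-frame GAMP outputs $\{\hat{u}_{nt}, \nu_{nt}^u\}$ and L&SPD outputs $\{\hat{q}_{nt}, \nu_{nt}^q\}$ are in hand. The sketch below is one possible vectorized implementation, assuming these quantities are stored as $N \times T$ numpy arrays; the log-domain leave-one-out products are an implementation choice to avoid numerical underflow, not something specified in the paper.

```python
import numpy as np

def update_support_beliefs(u_hat, nu_u, q_hat, nu_q, lam):
    """One pass of eqs. (17)-(18) over all (n, t); all array inputs are N x T."""
    def log_normal_pdf(x, mean, var):
        return -0.5 * ((x - mean) ** 2 / var + np.log(2 * np.pi * var))

    # Eq. (18): posterior local support probability for each coefficient.
    log_ratio = (log_normal_pdf(0.0, u_hat, nu_u)
                 - log_normal_pdf(u_hat, q_hat, nu_q + nu_u))
    lam_bwd = 1.0 / (1.0 + np.exp(log_ratio))
    lam_bwd = np.clip(lam_bwd, 1e-12, 1.0 - 1e-12)   # guard the logs below

    # Eq. (17): leave-one-out products over t' != t, done via log-sums.
    log_act, log_inact = np.log(lam_bwd), np.log(1.0 - lam_bwd)
    loo_act = log_act.sum(axis=1, keepdims=True) - log_act
    loo_inact = log_inact.sum(axis=1, keepdims=True) - log_inact
    num = lam * np.exp(loo_act)
    lam_fwd = num / (num + (1.0 - lam) * np.exp(loo_inact))
    return lam_fwd, lam_bwd
```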
Fig. 3. CNMSE versus $M/N$ for the recovery of the urban dataset (left) and the agriculture-oriented dataset (right). Curves shown: L&S-AMP, L&S-AMP$^{\pm}$, AMP-MMV, T-MSBL, PPXA, RA-ORMP, and SA-MUSIC; the vertical axis is the column-averaged normalized MSE (CNMSE) in dB.
4. NUMERICAL RESULTS

In this section, we present real-data results comparing the performance of the proposed L&S-AMP algorithm with the state-of-the-art PPXA [6], RA-ORMP [19], SA-MUSIC [20], and T-MSBL [21] algorithms. We evaluate the algorithms on two real hyperspectral datasets: 1) an urban dataset acquired over the University of Houston, with 144 spectral bands, $340 \times 740$ pixels, and a spatial resolution of 4 m; and 2) an agricultural dataset acquired over the Salinas Valley in California, with a spatial resolution of 3.7 m and $N = 224$ spectral bands, each band corresponding to an image with $512 \times 217$ pixels. We assume that the L&S signal matrix $\mathbf{X} \in \mathbb{R}^{N \times T}$ is obtained using pixel-based acquisition, so that $T$ denotes the number of pixels and $N$ the number of spectral bands. The DCT matrix is used as the sparsifying matrix $\boldsymbol{\Psi}$, and Gaussian noise is added to achieve $\mathrm{SNR} = 25$ dB. It is worth noting that, for the sake of comparison, different random Gaussian measurement matrices $\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_T$ are used, and that T-MSBL, RA-ORMP, and SA-MUSIC are derived only for the common-measurement-matrix case.

Fig. 3 plots the column-averaged normalized MSE (CNMSE) versus the compressive ratio $M/N$ on the two real datasets, where the CNMSE is defined as

$$\mathrm{CNMSE}(\mathbf{X}, \hat{\mathbf{X}}) = \frac{1}{T} \sum_{t} \|\mathbf{x}_t - \hat{\mathbf{x}}_t\|_2^2 \big/ \|\mathbf{x}_t\|_2^2,$$

where $\hat{\mathbf{X}}$ is an estimate of $\mathbf{X}$.
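For reference, the CNMSE above is straightforward to compute; a minimal numpy helper, with $\mathbf{X}$ and its estimate stored column-wise as in the paper, might look as follows.

```python
import numpy as np

def cnmse_db(X, X_hat):
    """Column-averaged normalized MSE, reported in dB as in Fig. 3."""
    col_err = np.sum((X - X_hat) ** 2, axis=0)   # ||x_t - x_hat_t||_2^2 per column
    col_pow = np.sum(X ** 2, axis=0)             # ||x_t||_2^2 per column
    return 10.0 * np.log10(np.mean(col_err / col_pow))
```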
From the figure, we observe that the proposed algorithm outperforms all the other algorithms in terms of CNMSE; e.g., in Fig. 3(b), L&S-AMP achieves a reconstruction gain of nearly 3 dB over the other algorithms at $M/N = 0.3$. In addition, a plus-minus sign (i.e., L&S-AMP$^{\pm}$) denotes the case of random $\pm 1$ measurement matrices, which are easy to implement on a digital micromirror device (DMD) and can significantly reduce the storage burden.
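For completeness, such a random $\pm 1$ ensemble is trivial to generate; a one-line sketch follows, where the sizes and the $1/\sqrt{M}$ normalization are assumptions for illustration rather than values from the paper.

```python
import numpy as np

M, N = 20, 64                                                # illustrative sizes
rng = np.random.default_rng(0)
Phi_pm = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)   # random +/-1 sensing matrix
```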
Fig. 4. Visual quality comparison of the recovered images for the urban dataset. From left to right and top to bottom: the original image and the images recovered by SA-MUSIC, RA-ORMP, T-MSBL, PPXA, and the proposed algorithm.

In this experiment, $M/N$ is fixed to 0.243, and the other simulation parameters remain unchanged. The whole scene is partitioned into a sequence of sub-scenes to enable parallel processing. Visual results of the hyperspectral images recovered by the different algorithms are presented in Fig. 4. As expected, our proposed algorithm preserves more fine details and much sharper edges, and shows clearer and visually better results than the competing methods.

5. CONCLUSION

In this paper, we studied joint CS reconstruction of spatially and spectrally correlated hyperspectral data, assuming that the hyperspectral signal matrix satisfies a joint-sparse model with a rank lower than the sparsity level, i.e., the L&S model. We proposed an AMP-based algorithm for recovering the signal matrix under the L&S model while exploiting the structured sparsity and the amplitude correlations of the data. Numerical results confirmed the performance advantage of our algorithm.
6. REFERENCES

[1] A. Plaza, J. A. Benediktsson, J. Boardman, J. Brazile, L. Bruzzone, G. Camps-Valls, J. Chanussot, M. Fauvel, P. Gamba, J. Gualtieri, M. Marconcini, J. C. Tilton, and G. Trianni, "Recent advances in techniques for hyperspectral image processing," Remote Sens. Environment, vol. 113, no. 1, pp. 110–122, Sep. 2009.

[2] J. M. Bioucas-Dias, A. Plaza, G. Camps-Valls, P. Scheunders, N. M. Nasrabadi, and J. Chanussot, "Hyperspectral remote sensing data analysis and future challenges," IEEE Geoscience and Remote Sens. Mag., vol. 1, no. 2, pp. 6–36, Jun. 2013.

[3] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.

[4] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. E. Kelly, and R. G. Baraniuk, "Single-pixel imaging via compressive sampling," IEEE Signal Process. Mag., vol. 25, no. 2, pp. 83–91, Mar. 2008.

[5] C. Li, T. Sun, K. F. Kelly, and Y. Zhang, "A compressive sensing and unmixing scheme for hyperspectral data processing," IEEE Trans. Image Process., vol. 21, no. 3, pp. 1200–1210, Mar. 2012.

[6] M. Golbabaee and P. Vandergheynst, "Hyperspectral image compressed sensing via low-rank and joint-sparse matrix recovery," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Process. (ICASSP), pp. 2741–2744, Kyoto, Japan, Mar. 2012.

[7] G. Martín, J. M. Bioucas-Dias, and A. Plaza, "HYCA: A new technique for hyperspectral compressive sensing," IEEE Trans. Geoscience and Remote Sens., vol. 53, no. 5, pp. 2819–2831, May 2015.

[8] R. M. Willett, M. F. Duarte, M. A. Davenport, and R. G. Baraniuk, "Sparsity and structure in hyperspectral imaging: Sensing, reconstruction, and target detection," IEEE Signal Process. Mag., vol. 31, no. 1, pp. 116–126, Jan. 2014.

[9] J. E. Fowler, "Compressive pushbroom and whiskbroom sensing for hyperspectral remote-sensing imaging," in Proc. of the Int. Conf. on Image Process., pp. 684–688, Paris, France, Oct. 2014.

[10] J. Ma, "A single-pixel imaging system for remote sensing by two-step iterative curvelet thresholding," IEEE Geoscience and Remote Sensing Letters, vol. 6, no. 4, pp. 676–680, Oct. 2009.
[11] D. L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing," Proc. Natl. Acad. Sci., vol. 106, no. 45, pp. 18914–18919, Sep. 2009.

[12] S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," in Proc. IEEE Int. Symp. Inf. Theory, pp. 2168–2172, Saint Petersburg, Russia, Aug. 2011.

[13] P. Schniter, "Turbo reconstruction of structured sparse signals," in Proc. Conf. Inform. Science & Syst., Princeton, NJ, Mar. 2010.

[14] J. Vila, P. Schniter, and J. Meola, "Hyperspectral unmixing via turbo bilinear approximate message passing," IEEE Trans. Computational Imaging, vol. 1, no. 3, pp. 143–158, Sep. 2015.

[15] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.

[16] J. T. Parker, P. Schniter, and V. Cevher, "Bilinear generalized approximate message passing – Part I: Derivation," IEEE Trans. Signal Process., vol. 62, no. 22, pp. 5839–5853, Nov. 2014.

[17] J. Ziniel and P. Schniter, "Efficient high-dimensional inference in the multiple measurement vector problem," IEEE Trans. Signal Process., vol. 61, no. 2, pp. 340–354, Jan. 2013.

[18] J. P. Vila and P. Schniter, "Expectation-maximization Bernoulli-Gaussian approximate message passing," in Proc. Asilomar Conf. Signals, Syst., and Comput., pp. 799–803, Pacific Grove, CA, Nov. 2011.

[19] M. E. Davies and Y. C. Eldar, "Rank awareness in joint sparse recovery," IEEE Trans. Inf. Theory, vol. 58, no. 2, pp. 1135–1146, Feb. 2012.

[20] K. Lee, Y. Bresler, and M. Junge, "Subspace methods for joint sparse recovery," IEEE Trans. Inf. Theory, vol. 58, no. 6, pp. 3613–3641, Jun. 2012.

[21] Z. Zhang and B. D. Rao, "Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning," IEEE J. Select. Topics Signal Process., vol. 5, no. 5, pp. 912–926, Sep. 2011.