COMPRESSED SENSING MRI WITH BAYESIAN DICTIONARY LEARNING

Xinghao Ding1, John Paisley2, Yue Huang1, Xianbo Chen1, Feng Huang3 and Xiao-ping Zhang1,4

1 Department of Communications Engineering, Xiamen University, Fujian, China
2 Department of EECS, University of California, Berkeley, USA
3 Philips Research China, Shanghai, China
4 Department of Electrical and Computer Engineering, Ryerson University, Toronto, Canada
E-mail: [email protected]

ABSTRACT

We present an inversion algorithm for magnetic resonance images (MRI) that are highly undersampled in k-space. The proposed method incorporates spatial finite differences (total variation) and patch-wise sparsity through in situ dictionary learning. We use the beta-Bernoulli process as a Bayesian prior for dictionary learning, which adaptively infers the dictionary size, the sparsity of each patch and the noise parameters. In addition, we employ an efficient numerical algorithm based on the alternating direction method of multipliers (ADMM). We present empirical results on two MR images.

Index Terms— compressed sensing, MRI reconstruction, dictionary learning, Bayesian models

1. INTRODUCTION

Magnetic resonance (MR) imaging is a widely used technique for visualizing the anatomical structure and physiological functioning of the body. A limitation of MR imaging is its slow scan speed during data acquisition, which is a drawback especially in dynamic imaging applications. Recent advances in signal reconstruction from measurements sampled below the Nyquist rate, called compressed sensing [1, 2], have had a major impact on MR imaging [3]. CS-MRI allows for significant undersampling in k-space, where each k-space measurement is a Fourier transform coefficient of the MR image. MRI reconstruction using undersampled k-space data is an ill-posed inverse problem. However, compressed sensing (CS) theory has shown that it is possible to reconstruct a signal from significantly fewer measurements than mandated by traditional Nyquist sampling if the signal is sparse in a particular transform domain [1, 2].
CS-MRI reconstruction algorithms tend to fall into two categories: those that enforce sparsity within an image-level transform domain (e.g., [3]–[7]), and those that enforce sparsity on a patch level (e.g., [8]–[11]). Most CS-MRI reconstruction algorithms belong to the first category. For example, Sparse MRI [3], the leading study in CS-MRI, performs MR image reconstruction by enforcing sparsity in both the wavelet domain and the total variation (TV) of the reconstructed image. Algorithms with image-level sparsity constraints such as Sparse MRI typically employ "off-the-shelf" dictionaries, which can usually capture only one feature of the image. The second group of algorithms learns a sparse basis on image subregions called "patches" that is adapted to the image class of interest. One approach to adapted basis learning is dictionary learning. Recent studies in the image processing literature have shown that dictionary learning can find sparse representations of images at the patch level [12], [13], [18]. These algorithms learn a patch-level basis (i.e., dictionary) by exploiting structural similarities between patches extracted from images within a class of interest (for example, K-SVD [13]). Among these approaches, an adaptively learned dictionary usually outperforms analytical dictionary approaches in denoising, super-resolution reconstruction, interpolation, inpainting, classification and other applications, since the adaptively learned dictionary suits the signals of interest [13]–[15]. Dictionary learning has been applied to CS-MRI as a sparse basis for reconstruction (e.g., LOST [8] and DLMRI [9]). Results using this approach demonstrate a significant improvement over previous CS-MRI methods. However, these methods still have restrictions in that the dictionary size, patch sparsity and noise levels must be preset. In addition, algorithms such as dictionary learning that are based only on local image sparsity do not take into account additional image-level constraints, such as total variation, which can improve reconstruction.

This project is supported by NSFC (No. 30900328, 61172179), the Natural Science Foundation of Fujian Province of China (No. 2012J05160), the National Key Technology R&D Program (2012BAI07B06), and the Fundamental Research Funds for the Central Universities (No. 2011121051).

978-1-4799-2341-0/13/$31.00 ©2013 IEEE
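To make the patch-level idea concrete, a patch can be represented as a sparse combination of dictionary atoms. The following minimal numpy sketch is illustrative only: the dictionary size, patch dimension and the greedy orthogonal matching pursuit (OMP) solver below are our assumptions for the demonstration, not the K-SVD or BPFA machinery used in the cited works.

```python
import numpy as np

def omp(D, x, k):
    """Greedy OMP: approximate x using at most k columns (atoms) of D."""
    residual, support = x.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)                 # unit-norm atoms
alpha_true = np.zeros(128)
alpha_true[[3, 40, 99]] = [1.5, -2.0, 0.7]     # a 3-sparse code
patch = D @ alpha_true                         # noiseless sparse patch
alpha_hat = omp(D, patch, k=3)
print(np.nonzero(alpha_hat)[0])                # indices of atoms used in the reconstruction
```

With an incoherent random dictionary and a noiseless patch, the greedy solver typically recovers the generating support exactly, which is the sense in which a patch is "sparse in the dictionary".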
ICIP 2013

In this paper, we address the issues discussed above by proposing a new inversion framework for CS-MRI. Our work makes two contributions: 1) we propose a combination of in situ dictionary learning and total variation as a sparsity constraint for the inverse CS-MRI problem, and we use the alternating direction method of multipliers (ADMM) to derive an efficient optimization procedure [21]; 2) we use a Bayesian approach to dictionary learning based on the beta-Bernoulli process [16]–[18]. This approach has three advantages: (i) it can learn the size of the dictionary from the data, (ii) it can learn the sparsity pattern on a patch-by-patch level, and (iii) it can adaptively learn regularization weights, which correspond to noise variances in the Bayesian framework.

2. COMPRESSED SENSING MRI WITH BPFA

In this section, we briefly review the problem of CS-MRI. We then present our CS inversion approach, which uses a Bayesian method for dictionary learning called beta process factor analysis (BPFA).

Let $x \in \mathbb{R}^N$ be a $\sqrt{N} \times \sqrt{N}$ image in vectorized form. Let $F_u \in \mathbb{C}^{u \times N}$, $u < N$, be the under-sampled Fourier encoding matrix and $y = F_u x$ the sub-sampled set of k-space measurements. The goal is to estimate $x$ from the small fraction of k-space measurements $y$. For dictionary learning, let $R_i$ be the $i$th patch extraction operator. The operator $R_i$ is a $P \times N$ matrix of all zeros except for a one in each row that extracts a vectorized $\sqrt{P} \times \sqrt{P}$ patch from the image, $x_i = R_i x \in \mathbb{R}^P$ for $i = 1, \dots, N$. We represent the CS-MRI inversion problem as optimizing an unconstrained function of the form

  $\arg\min_x \; h(x) + \frac{\lambda}{2}\|F_u x - y\|_2^2$,   (1)

where $\|F_u x - y\|_2^2$ is a data fidelity term, $\lambda > 0$ is a regularization parameter and $h(x)$ is a regularization function that controls desired properties of the image—specifically, that the reconstruction is sparse in the selected domain. As our regularization scheme for CS-MRI inversion, we propose a combination of total variation and Bayesian dictionary learning. We write this optimization problem as

  $\arg\min_{x,\varphi} \; \lambda_g h_g(x) + h_l(x) + \frac{\lambda}{2}\|F_u x - y\|_2^2$,   (2)

  $h_g(x) := TV(x)$,  $h_l(x) := \sum_i \frac{\gamma_\varepsilon}{2}\|R_i x - D\alpha_i\|_2^2 + f(\varphi)$.

For the patch-wise regularization function $h_l(x)$ we use BPFA as given in Algorithm 1 [16]–[18]. This is a Bayesian model for generating images from dictionaries that are sparsely used; the optimization problem is equivalently an inference problem in Bayesian terms. The parameters to be optimized for BPFA are contained in the set $\varphi = \{D, s, z, \gamma_\varepsilon, \pi\}$, and are defined in Algorithm 1. The regularization weight $\gamma_\varepsilon$ is a model variable that corresponds to the inverse variance parameter of the multivariate Gaussian likelihood. In (2), the function $f$ can be expanded to include the negative log likelihood from Algorithm 1, which we omit here. This term acts as the sparse basis for the image and also aids in producing a denoised reconstruction [16].

Algorithm 1 Generating an image with BPFA

1. Construct a dictionary $D = [d_1, \dots, d_K]$: $d_k \sim N(0, P^{-1} I_P)$, $k = 1, \dots, K$.
2. Draw a probability $\pi_k \in [0, 1]$ for each $d_k$: $\pi_k \sim \mathrm{Beta}(a_0, b_0)$, $k = 1, \dots, K$.
3. Draw precision values $\gamma_\varepsilon \sim \mathrm{Gamma}(g_0, h_0)$ and $\gamma_s \sim \mathrm{Gamma}(e_0, f_0)$.
4. For the $i$th patch in $x$:
   (a) Draw the vector $s_i \sim N(0, \gamma_s^{-1} I_K)$.
   (b) Draw the binary vector $z_i$ with $z_{ik} \sim \mathrm{Bernoulli}(\pi_k)$.
   (c) Define $\alpha_i = s_i \circ z_i$ by an element-wise product.
   (d) Construct the patch $R_i x = D\alpha_i + \varepsilon_i$ with noise $\varepsilon_i \sim N(0, \gamma_\varepsilon^{-1} I_P)$.
5. Construct the image $x$ as the average of all $R_i x$ that overlap on a given pixel.
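As a concrete illustration, steps 1–4 of Algorithm 1 can be simulated in a few lines of numpy. This is a sketch only: the patch size, dictionary size and all hyperparameter values below are illustrative assumptions (the paper infers these quantities rather than fixing them), and step 5, averaging overlapping patches back into an image, is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
P, K, n = 36, 64, 500                        # 6x6 patches, K atoms, n patches (assumed sizes)

D = rng.standard_normal((P, K)) / np.sqrt(P)   # step 1: d_k ~ N(0, P^{-1} I_P)
pi = rng.beta(1.0, K / 8.0, size=K)            # step 2: pi_k ~ Beta(a0, b0), assumed a0, b0
gamma_eps = rng.gamma(100.0, 1.0)              # step 3: noise precision (assumed g0, h0)
gamma_s = rng.gamma(1.0, 1.0)                  #         weight precision (assumed e0, f0)

S = rng.standard_normal((n, K)) / np.sqrt(gamma_s)   # step 4(a): s_i ~ N(0, gamma_s^{-1} I_K)
Z = rng.random((n, K)) < pi                          # step 4(b): z_ik ~ Bernoulli(pi_k)
alpha = S * Z                                        # step 4(c): element-wise product
patches = alpha @ D.T + rng.standard_normal((n, P)) / np.sqrt(gamma_eps)  # step 4(d)

print(patches.shape)                 # (500, 36)
print(Z.sum(axis=1).mean())          # average atoms used per patch -- sparse when b0 >> a0
```

The key property of the beta-Bernoulli construction is visible in `Z`: atoms with small $\pi_k$ are rarely switched on, so the effective dictionary size and per-patch sparsity emerge from the draws rather than being preset.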

The regularization function $h_g(x)$ is the total variation of the image. This term encourages homogeneity within contiguous regions of the image, while still allowing for sharp jumps in pixel value at edges, due to the underlying $\ell_1$ penalty. For the total variation penalty $TV(x)$ we use the isotropic TV model [20]. To review, let $\psi_i$ be the $2 \times N$ difference operator for pixel $i$. Each row of $\psi_i$ contains a 1 centered on pixel $i$, a $-1$ on the pixel directly above (for the first row of $\psi_i$) or to the right (for the second row of $\psi_i$) of pixel $i$, and zeros elsewhere. Let $\Psi = [\psi_1^T, \dots, \psi_N^T]^T$ be the resulting $2N \times N$ difference matrix for the entire image. The TV coefficients are $\beta = \Psi x \in \mathbb{R}^{2N}$, and the isotropic TV penalty is

  $TV(x) = \sum_i \|\psi_i x\|_2 = \sum_i \sqrt{\beta_{2i-1}^2 + \beta_{2i}^2}$,

where $i$ ranges over the pixels in the MR image. Several algorithms have been proposed for TV minimization, for example using Newton's method [22] or graph cuts [23]. Recently, a simple and efficient method based on the alternating direction method of multipliers (ADMM) [21], also called the split Bregman method, has been proposed [19] for TV denoising models. We adopt this approach in our optimization algorithm.

3. ALGORITHM

We sketch an algorithm for finding a local optimal solution to the non-convex objective function given in (2). This involves a stochastic optimization (inference) portion for updating the BPFA parameters, and an ADMM algorithm for the total variation penalty. Our use of ADMM for total variation minimization follows [20]; a general introduction to the ADMM algorithm is given in [21]. Recall from the discussion in Section 2 that the total variation part of the objective is $\lambda_g \sum_i \|\psi_i x\|_2$. We use the ADMM algorithm to split $x$ from this TV penalty. We begin by defining the TV coefficients for the $i$th pixel as $\beta_i := \psi_i x$, and we insert $\beta_i$ for $\psi_i x$ in the objective under the constraint that they are equal. For each $\beta_i$, we introduce a vector of Lagrange multipliers $\eta_i$. We then split $\beta_i$ from $\psi_i x$ by relaxing their equality via an augmented Lagrangian. This results in the objective function

  $L(x, \beta, \eta, \varphi) = \sum_i \left[\lambda_g \|\beta_i\|_2 + \eta_i^T(\psi_i x - \beta_i) + \frac{\rho}{2}\|\psi_i x - \beta_i\|_2^2\right]$
    $+ \sum_i \frac{\gamma_\varepsilon}{2}\|R_i x - D\alpha_i\|_2^2 + f(\varphi) + \frac{\lambda}{2}\|F_u x - y\|_2^2$.   (3)

The first line is the ADMM expansion of the TV penalty, which contains a Lagrangian term plus a squared-error term. From ADMM theory, holding everything else fixed, this objective has optimal values $\beta_i^*$ and $x^*$ with $\beta_i^* = \psi_i x^*$, and so the equality constraints will be satisfied. As written in (3), optimizing this function can be split into three separate sub-problems: one for the TV coefficients $\beta_i$, one for the BPFA parameters $\varphi = \{D, s, z, \gamma_\varepsilon, \pi\}$ and one for the image reconstruction $x$. Following the discussion in [21], we simplify the ADMM part by defining $u_i = (1/\rho)\eta_i$ and completing the square in the first line of (3). We then cycle through the following three sub-problems using the current values of all parameters:

  (P1) $\beta_i' = \arg\min_{\beta_i} \; \lambda_g \|\beta_i\|_2 + \frac{\rho}{2}\|\psi_i x - \beta_i + u_i\|_2^2$,

  (P2) $\varphi' = \arg\min_{\varphi} \; \sum_i \frac{\gamma_\varepsilon}{2}\|R_i x - D\alpha_i\|_2^2 + f(\alpha_i, D)$,

  (P3) $x' = \arg\min_x \; \sum_i \frac{\rho}{2}\|\psi_i x - \beta_i' + u_i\|_2^2 + \sum_i \frac{\gamma_\varepsilon'}{2}\|R_i x - D'\alpha_i'\|_2^2 + \frac{\lambda}{2}\|F_u x - y\|_2^2$,

followed by the dual update

  $u_i' = u_i + \psi_i x' - \beta_i', \quad i = 1, \dots, N$.
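Sub-problem P1 and the dual update admit a direct vectorized implementation: P1 is solved in closed form by group soft-thresholding each 2-vector of TV coefficients. The numpy sketch below is a minimal illustration under assumed conventions (a zero boundary for the difference operator and illustrative values of $\lambda_g$ and $\rho$), not the paper's full solver.

```python
import numpy as np

def grad(x):
    """Stack the two finite differences (psi_i x) for every pixel; zero at the border."""
    gv = np.zeros_like(x)
    gv[1:, :] = x[1:, :] - x[:-1, :]      # difference with the pixel above
    gh = np.zeros_like(x)
    gh[:, :-1] = x[:, :-1] - x[:, 1:]     # difference with the pixel to the right
    return np.stack([gv, gh], axis=-1)

def shrink(v, t):
    """Group soft-threshold each 2-vector: the closed-form minimizer of P1."""
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-12)
    return np.maximum(norm - t, 0.0) * (v / norm)

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 8))           # current image estimate (toy example)
u = np.zeros((8, 8, 2))                   # scaled dual variables u_i
lam_g, rho = 0.1, 1.0                     # illustrative regularization weights

beta = shrink(grad(x) + u, lam_g / rho)   # P1: beta_i' = shrink(psi_i x + u_i, lam_g / rho)
u = u + grad(x) - beta                    # dual update: u_i' = u_i + psi_i x' - beta_i'
```

Thresholding the Euclidean norm of each pair (rather than each component separately) is what makes this the isotropic TV update.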

P1 has a simple, globally optimal, closed-form solution that can be found in, e.g., [20]. The update for $u_i$ follows from the general ADMM algorithm [20, 21]. Since P2 is non-convex, we cannot perform the desired minimization exactly, and so an approximation is required. Furthermore, this problem requires iterating through several parameters. Our approach is to use stochastic optimization for problem P2 by Gibbs sampling each variable in BPFA conditioned on the current values of all other variables. A Gibbs sampling algorithm for this model can be found in [18]. P3 is a least squares problem with a closed-form solution. However, as written, P3 requires the inversion of a prohibitively large matrix. Instead, we work in the Fourier domain as done in [20], which results in the inverted matrix being diagonal. We update each Fourier coefficient one by one, and then take the inverse transform to obtain $x'$.

4. EXPERIMENTS

We perform experiments using our CS-MRI inversion approach on two 256 × 256 MR images, shown in Figures 2b and 3b. We consider sub-sampling 10%, 20%, 25%, 30% and 35% of k-space. We also consider three sub-sampling masks, which determine the sampling pattern in k-space: Cartesian, (pseudo) random and radial. We show examples of these masks in Figure 1. We compare our algorithm (BPFA+TV) with three other algorithms: (i) BPFA without TV, (ii) FCSA [24] and (iii) SparseMRI [3]. For FCSA and SparseMRI we use the code available from the authors' websites, as well as their default parameterizations. To measure performance, we use the peak signal-to-noise ratio (PSNR).

Fig. 1. Example masks for sub-sampling 25% of k-space: (left) Cartesian mask, (middle) random mask, (right) radial mask. The Cartesian mask is the most practical, followed by the radial mask.

In Figures 2 and 3 we show qualitative results from the algorithms. In Figure 2 we consider a shoulder scan with 35% sampling using a Cartesian mask. The inset shows a close-up of a region of the original image and the reconstructions. For this example, BPFA+TV is smoother than FCSA, but has better resolution than SparseMRI. We show similar results for the coronal image in Figure 3, using a random mask that samples 20% of k-space. The results again appear to slightly favor BPFA+TV. In Figure 4 we show PSNR results for all sampling percentages and masks. The BPFA models perform best on these problems. We also see that, in general, adding a total variation term to the penalty improves the reconstruction while not significantly increasing the running time. The model learned a dictionary from the images that captured the salient features in a sparse way. We also note that performance significantly improved as a result of adaptively learning BPFA-related parameters such as regularization weights, rather than fixing them in advance.

5. CONCLUSION

We have presented an algorithm for CS-MRI inversion that uses total variation and Bayesian dictionary learning as sparsity constraints.
We used the ADMM algorithm for efficient optimization of the total variation penalty, together with a Bayesian model that learns the dictionary from the data during image reconstruction. Results on two MR images show that the proposed algorithm performs favorably compared with other available software for CS-MRI.
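For completeness, the measurement model and evaluation metric used in the experiments of Section 4 can be sketched as follows. This is a minimal numpy illustration with a synthetic image and a pseudo-random mask; the mask density, the synthetic data and the zero-filled baseline are our assumptions, not the exact masks or images of Figures 1–3.

```python
import numpy as np

def psnr(ref, rec):
    """Peak signal-to-noise ratio in dB for images scaled to [0, 1]."""
    mse = np.mean((ref - rec) ** 2)
    return 10.0 * np.log10(1.0 / mse)

rng = np.random.default_rng(3)
x = rng.random((256, 256))               # stand-in for a 256 x 256 MR image
mask = rng.random((256, 256)) < 0.25     # (pseudo) random mask keeping ~25% of k-space
y = mask * np.fft.fft2(x)                # sub-sampled k-space measurements, y = F_u x
x_zf = np.real(np.fft.ifft2(y))          # naive zero-filled reconstruction (baseline)

print(round(mask.mean(), 3))             # fraction of k-space sampled
print(round(psnr(x, x_zf), 1))           # PSNR of the zero-filled baseline, in dB
```

Any reconstruction algorithm (such as BPFA+TV) would take `y` and `mask` as input and be scored against `x` with the same PSNR metric.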

Fig. 2. Reconstruction of the shoulder image with a 35% sampling in k-space using Cartesian sampling: (a) sampling mask, (b) original image, (c) BPFA+TV, (d) FCSA, (e) SparseMRI. The inset shows a region in more detail.

Fig. 3. Reconstruction of the coronal image with a 20% sampling in k-space using random sampling: (a) sampling mask, (b) original image, (c) BPFA+TV, (d) FCSA, (e) SparseMRI. The inset shows a region in more detail.

44

42 45 40

38 40 36

35

34

32 30 30

28 25 26

20 0.1

0.15

(a) Shoulder/Cartesian

0.2

0.25

0.3

0.35

24 0.1

0.15

(b) Shoulder/Random

28

0.2

0.25

0.3

0.35

0.3

0.35

(c) Shoulder/Radial

45

36

27

34 40

26

32

25

35

30

24

28 30

23

26

22

25

24

21

22 20

20

19 0.1

20

0.15

0.2

0.25

(d) Coronal/Cartesian

0.3

0.35

15 0.1

0.15

0.2

0.25

(e) Coronal/Random

0.3

0.35

18 0.1

0.15

0.2

Fig. 4. Reconstruction PSNR as a function of percentage sampled in k-space.

2322

0.25

(f) Coronal/Radial

6. REFERENCES

[1] E. Candès, J. Romberg, and T. Tao, "Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information," IEEE Trans. on Information Theory, vol. 52, no. 2, pp. 489-509, 2006.
[2] D. Donoho, "Compressed Sensing," IEEE Trans. on Information Theory, vol. 52, no. 4, pp. 1289-1306, 2006.
[3] M. Lustig, D. Donoho, and J. M. Pauly, "Sparse MRI: The Application of Compressed Sensing for Rapid MR Imaging," Magnetic Resonance in Medicine, vol. 58, no. 6, pp. 1182-1195, 2007.
[4] Y. Kim, M. S. Nadar, and A. Bilgin, "Wavelet-Based Compressed Sensing Using Gaussian Scale Mixtures," IEEE Trans. on Image Processing, vol. 21, no. 6, pp. 3102-3108, 2012.
[5] X. Qu, W. Zhang, D. Guo, C. Cai, S. Cai, and Z. Chen, "Iterative Thresholding Compressed Sensing MRI Based on Contourlet Transform," Inverse Problems in Science and Engineering, Jun. 2009.
[6] J. Trzasko and A. Manduca, "Highly Undersampled Magnetic Resonance Image Reconstruction via Homotopic L0-Minimization," IEEE Trans. on Medical Imaging, vol. 28, no. 1, pp. 106-121, 2009.
[7] J. C. Ye, S. Tak, Y. Han, and H. W. Park, "Projection Reconstruction MR Imaging Using FOCUSS," Magnetic Resonance in Medicine, vol. 57, no. 4, pp. 764-775, 2007.
[8] M. Akcakaya, T. A. Basha, B. Goddu, L. A. Goepfert, K. V. Kissinger, V. Tarokh, W. J. Manning, and R. Nezafat, "Low-Dimensional-Structure Self-Learning and Thresholding: Regularization beyond Compressed Sensing for MRI Reconstruction," Magnetic Resonance in Medicine, vol. 66, pp. 756-767, 2011.
[9] S. Ravishankar and Y. Bresler, "MR Image Reconstruction from Highly Undersampled k-space Data by Dictionary Learning," IEEE Trans. on Medical Imaging, vol. 30, no. 5, pp. 1028-1041, 2011.
[10] X. Qu, D. Guo, B. Ning, Y. Hou, Y. Lin, S. Cai, and Z. Chen, "Undersampled MRI Reconstruction with Patch-Based Directional Wavelets," Magnetic Resonance Imaging, doi:10.1016/j.mri.2012.02.019, Feb. 2012.
[11] Z. Yang and M. Jacob, "Robust Non-Local Regularization Framework for Motion Compensated Dynamic Imaging without Explicit Motion Estimation," IEEE Int. Symp. on Biomedical Imaging, pp. 1056-1059, 2012.
[12] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "Image Denoising by Sparse 3D Transform-Domain Collaborative Filtering," IEEE Trans. on Image Processing, vol. 16, no. 8, pp. 2080-2095, 2007.
[13] M. Aharon, M. Elad, A. Bruckstein, and Y. Katz, "K-SVD: An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation," IEEE Trans. on Signal Processing, vol. 54, pp. 4311-4322, 2006.
[14] M. Protter and M. Elad, "Image Sequence Denoising via Sparse and Redundant Representations," IEEE Trans. on Image Processing, vol. 18, no. 1, pp. 27-36, 2009.
[15] G. Yu, G. Sapiro, and S. Mallat, "Solving Inverse Problems with Piecewise Linear Estimators: From Gaussian Mixture Models to Structured Sparsity," IEEE Trans. on Image Processing, vol. 21, no. 5, pp. 2481-2499, 2012.
[16] J. Paisley and L. Carin, "Nonparametric Factor Analysis with Beta Process Priors," International Conference on Machine Learning, 2009.
[17] X. Ding, L. He, and L. Carin, "Bayesian Robust Principal Component Analysis," IEEE Trans. on Image Processing, vol. 20, no. 12, pp. 3419-3430, Dec. 2011.
[18] M. Zhou, H. Chen, J. Paisley, L. Ren, L. Li, Z. Xing, D. Dunson, G. Sapiro, and L. Carin, "Nonparametric Bayesian Dictionary Learning for Analysis of Noisy and Incomplete Images," IEEE Trans. on Image Processing, vol. 21, no. 1, pp. 130-144, Jan. 2012.
[19] W. Yin, S. Osher, D. Goldfarb, and J. Darbon, "Bregman Iterative Algorithms for L1 Minimization with Applications to Compressed Sensing," SIAM Journal on Imaging Sciences, vol. 1, no. 1, pp. 143-168, 2008.
[20] T. Goldstein and S. Osher, "The Split Bregman Method for L1 Regularized Problems," SIAM Journal on Imaging Sciences, vol. 2, no. 2, pp. 323-343, 2009.
[21] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers," Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1-122, 2011.
[22] T. F. Chan, G. H. Golub, and P. Mulet, "A Nonlinear Primal-Dual Method for Total Variation-Based Image Restoration," SIAM Journal on Scientific Computing, vol. 20, pp. 1964-1977, 1999.
[23] J. Darbon and M. Sigelle, "A Fast and Exact Algorithm for Total Variation Minimization," IbPRIA, Lecture Notes in Computer Science 3522, Springer-Verlag, Berlin, pp. 351-359, 2005.
[24] J. Huang, S. Zhang, and D. Metaxas, "Efficient MR Image Reconstruction for Compressed MR Imaging," Medical Image Analysis, vol. 15, no. 5, pp. 670-679, 2011.