2013 5th IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP)
Joint-sparse Recovery in Compressed Sensing with Dictionary Mismatch

Zhao Tan, Peng Yang, and Arye Nehorai
Preston M. Green Department of Electrical and Systems Engineering, Washington University in St. Louis, St. Louis, MO, USA

Abstract—In traditional compressed sensing theory, the dictionary matrix is given a priori, whereas in real applications this matrix suffers from random noise and fluctuations. This paper considers a signal model in which each column of the dictionary matrix is affected by a structured noise. This formulation is common in radar-related applications and direction-of-arrival estimation. We propose to use joint-sparse signal recovery in this compressed sensing problem with dictionary mismatch, and we give a theoretical performance bound for the joint-sparse method. We show that under mild conditions the reconstruction error of the original sparse signal is bounded by both the sparsity and the noise level in the measurement model. Moreover, a fast first-order method is implemented to speed up the computation. Numerical examples demonstrate the good performance of the proposed algorithm and show that the joint-sparse recovery method converges faster and gives better reconstruction results than previous methods.

This work was supported by the NSF Grants CCF-1014908 and CCF-0963742, the AFOSR Grant FA9550-11-1-0210, and the ONR Grant N000141310050.
I. INTRODUCTION

Compressed sensing is a fast-growing area in the field of signal reconstruction [1]-[3]. It allows us to reconstruct a signal from a sampling rate below the Nyquist rate, as long as the signal of interest is sparse in some basis representation. Compressed sensing covers a wide range of applications, such as imaging [4], radar signal processing [5], [6], and remote sensing [7]. A typical compressed sensing problem employs the linear model $y = Ds + w$, in which $D \in \mathbb{R}^{m \times n}$ is a given dictionary matrix, $y \in \mathbb{R}^m$ is the measurement vector, and $w \in \mathbb{R}^m$ is an unknown noise term. The signal of interest is $s \in \mathbb{R}^n$, and it is known to be sparse, i.e., the number of nonzero entries of $s$ is far less than $n$. In real applications we normally do not have perfect knowledge of the dictionary matrix $D$; we only know some of its structure, i.e., $D = A + E$, with $A \in \mathbb{R}^{m \times n}$ known and $E \in \mathbb{R}^{m \times n}$ unknown. In [8], [9], the authors showed that the reconstruction error increases with the mismatch level. In this work, we consider the structured dictionary mismatch case $d_i = a_i + \beta_i b_i$, $1 \le i \le n$, where $d_i$ and $a_i$ are the $i$th columns of $D$ and $A$, and $a_i$ and $b_i$ are given for all $i$. This model is often encountered in direction-of-arrival
estimation and radar applications. Thus the signal model in our paper is

$$ \text{(M)} \quad y = (A + B\Delta)s + w, $$

where $\Delta = \mathrm{diag}(\beta)$, $\beta = [\beta_1, \beta_2, \ldots, \beta_n]$, and $B = [b_1, b_2, \ldots, b_n] \in \mathbb{R}^{m \times n}$. In [10], [11], an alternating minimization method is proposed to solve simultaneously for the sparse signal $s$ and the mismatch $\beta$ in (M). However, this method suffers from slow convergence and has no performance guarantee. In [11], bounded mismatch is considered, and several reconstruction optimization frameworks are proposed. These frameworks can be solved using interior point methods [12], but since such methods require solving linear systems, the computation can become extremely slow as the problem dimension increases. In this work, we first propose to use joint-sparse recovery to further exploit the structure underlying compressed sensing with dictionary mismatch; we then use a fast algorithm, the fast iterative shrinkage-thresholding algorithm (FISTA) [13], to solve the joint-sparse recovery problem. FISTA handles high-dimensional data more efficiently than interior point methods.

We use a capital italic bold letter to represent a matrix and a lowercase italic bold letter to represent a vector. For a given matrix $D$, $D^T$ denotes its transpose. For a given vector $x$, $\|x\|_1$ and $\|x\|_2$ are the $\ell_1$ and $\ell_2$ norms, respectively. We let $\|x\|_0$ denote the number of nonzero components of a vector, also referred to as the $\ell_0$ norm. For a matrix $A$, $\|A\|_2$ is the induced spectral norm. We use $x_i$ to denote the $i$th element of the vector $x$, and $\odot$ to denote the pointwise multiplication of two vectors of the same dimension. In this paper, we call a vector $s$ $k$-sparse if there are at most $k$ nonzero entries in $s$. We call a vector $x \in \mathbb{R}^{2n}$ $k$ joint-sparse if $x = [s^T, p^T]^T$, with $s \in \mathbb{R}^n$ and $p \in \mathbb{R}^n$ both $k$-sparse with the same support set.

II. GENERAL SIGNAL MODEL AND RECOVERY METHOD

A. Compressed Sensing with Dictionary Mismatch

Traditional compressed sensing can be solved using the LASSO formulation [14], stated as
$$ \min_{s \in \mathbb{R}^n} \ \frac{1}{2}\|Ds - y\|_2^2 + \lambda \|s\|_1. $$
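As a concrete illustration, this LASSO baseline can be prototyped in a few lines with a generic convex solver. The sketch below uses numpy and cvxpy, which are our tooling assumptions rather than the paper's; the sizes match the experiments in Section V-A, and the data are arbitrary.

```python
import numpy as np
import cvxpy as cp

# Illustrative problem sizes; the paper's experiments use m = 40, n = 100.
m, n = 40, 100
rng = np.random.default_rng(0)
D = rng.standard_normal((m, n))
y = rng.standard_normal(m)
lam = 0.1  # regularization weight (a tuning parameter)

# LASSO: minimize 0.5*||D s - y||_2^2 + lam*||s||_1
s = cp.Variable(n)
objective = cp.Minimize(0.5 * cp.sum_squares(D @ s - y) + lam * cp.norm1(s))
cp.Problem(objective).solve()
s_hat = s.value  # recovered (approximately sparse) signal
```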
In order to recover the sparse signal $s$ in the mismatch model (M), the optimization problem is given as

$$ \min_{s \in \mathbb{R}^n,\, \beta \in \mathbb{R}^n} \ \frac{1}{2}\|(A + B\Delta)s - y\|_2^2 + \lambda \|s\|_1, \quad \text{s.t. } \Delta = \mathrm{diag}(\beta). \tag{1} $$

The above optimization is non-convex and generally hard to solve. We notice that when $s_i = 0$ for a certain $i$, then $\beta_i$ can take any value without affecting the reconstruction. Therefore, in the rest of this paper, we focus only on the $\beta_i$ with nonzero $s_i$. In [10], [11], the authors proposed to use the alternating minimization method to solve for both $s$ and $\beta$ when the mismatch variable $\beta$ is bounded or Gaussian distributed. In this paper, we let $p = \beta \odot s$ and $\Phi = [A, B]$, and then transform the original non-convex optimization into a relaxed convex one. Instead of enforcing the sparsity of $s$, we enforce the joint sparsity between $s$ and $p$. We let $x = [s^T, p^T]^T$, and we define the $\ell_2/\ell_1$ norm of $x$ as

$$ \|x\|_{2,1} = \sum_{i=1}^{n} \sqrt{s_i^2 + p_i^2}. $$

If $s$ is $k$-sparse, then $p$ is also $k$-sparse, with the same support set as $s$. The relaxed optimization enforcing joint sparsity will be referred to as (JS) throughout the paper, and it can be stated as

$$ \text{(JS)} \quad \min_{x \in \mathbb{R}^{2n}} \ \frac{1}{2}\|\Phi x - y\|_2^2 + \lambda \|x\|_{2,1}. $$

B. Performance Bound for Joint-Sparse LASSO

In order to analyze the recovery performance of (JS), we introduce a condition similar to the restricted isometry property (RIP) [1] in compressive sensing. This definition is a special case of the block RIP introduced in [15].

Definition II.1. We say that the measurement matrix $\Phi \in \mathbb{R}^{m \times 2n}$ obeys the joint restricted isometry property with constant $\sigma_k$ if

$$ (1 - \sigma_k)\|v\|_2^2 \le \|\Phi v\|_2^2 \le (1 + \sigma_k)\|v\|_2^2 $$

holds for all $k$ joint-sparse vectors $v \in \mathbb{R}^{2n}$. With this definition, a performance bound for (JS) can be obtained using techniques given in [16]. The details are omitted and will be included in the journal version of this paper.

Theorem II.1. Let $\Phi$ be an $m \times 2n$ matrix satisfying the joint RIP with $\sigma_{2k} < 0.1907$. Let the measurement $y$ follow $y = \Phi x + w$, where $w$ is the measurement noise in the linear system. Assume that $\lambda$ obeys $\|\Phi^T w\|_\infty \le \frac{\lambda}{2\sqrt{2}}$. Then the solution $\hat{x}$ to the optimization problem (JS) satisfies

$$ \|\hat{x} - x\|_2 \le C_0 \sqrt{k}\,\lambda + C_1 \frac{\|x - (x)_k\|_{2,1}}{\sqrt{k}}. \tag{2} $$

Here $(x)_k$ is the best $k$ joint-sparse approximation to $x$, and $C_0$ and $C_1$ are constants that depend on $\sigma_{2k}$.

Remarks: 1. In our case, $x = [s^T, p^T]^T$. So if $s$ is $k$-sparse, then since $p = \beta \odot s$, $x$ is joint $k$-sparse. Thus $\|x - (x)_k\|_{2,1} = 0$, and the reconstruction error depends only on the noise level.

2. The performance bound (2) controls the reconstruction error of $x$, while we care more about the error of $s$. In radar applications, we also care about the $\beta_i$ with corresponding nonzero $s_i$. When $\|\hat{x} - x\|_2 \le C$, it follows immediately that $\|\hat{s} - s\|_2 \le C$. For the $i$th element of the mismatch variable $\beta$, we have $|\hat{\beta}_i \hat{s}_i - \beta_i s_i| \le C$. Using the triangle inequality, we obtain $|\hat{s}_i|\,|\beta_i - \hat{\beta}_i| \le C + |\beta_i|\,|s_i - \hat{s}_i|$. When $s_i$ is nonzero, $\hat{s}_i$ is normally also nonzero, which is confirmed by numerical examples. In real applications the mismatch is often upper bounded; therefore, we can bound the reconstruction error of $\beta_i$ as

$$ |\beta_i - \hat{\beta}_i| \le \frac{C + |\beta_i|\,|s_i - \hat{s}_i|}{|\hat{s}_i|}. $$

3. There are two ways to recover the mismatch parameter $\beta$. The first is to directly use the optimal solution from solving (JS) and let $\hat{\beta}_i = \hat{p}_i/\hat{s}_i$. The other is to take the $\hat{s}$ recovered from (JS) and substitute it back into the original optimization problem (1) to solve for $\beta$.

III. OFF-GRID COMPRESSED SENSING

A. General Model and Bounded Joint-Sparse Method

In real applications, the received signal is a linear combination of several vectors, each defined by a parameter. In traditional compressed sensing, we introduce a discretized grid $\theta = [\theta_1, \theta_2, \ldots, \theta_N]$ with step size $2r$. Thus the signal model can be written as

$$ y = A(\theta)s + w, \tag{3} $$

where $A(\theta) = [a(\theta_1), a(\theta_2), \ldots, a(\theta_N)]$ and $w$ is the noise term. When the actual parameters do not fall on the discretized grid $\theta$, the model in (3) is inaccurate, and the performance of compressed sensing can be highly jeopardized [8]. Let $\varphi = [\varphi_1, \varphi_2, \ldots, \varphi_N]$ be the actual grid we want to use, and let the parameters of interest fall on this grid. In this paper, we assume that two parameters with nonzero $s_i$ are at least $2r$ apart, i.e., $|\varphi_i - \varphi_j| > 2r$ for all $i \ne j$. Using a first-order Taylor expansion, a more accurate signal model is

$$ y = A(\varphi)s + w = (A + B\Delta)s + w, \tag{4} $$

where $A = A(\theta)$, $B = \left[\frac{\partial a(\theta_1)}{\partial \theta_1}, \frac{\partial a(\theta_2)}{\partial \theta_2}, \ldots, \frac{\partial a(\theta_N)}{\partial \theta_N}\right]$, $\Delta = \mathrm{diag}(\beta)$, and $\beta = \varphi - \theta$. Every element of $\beta$ lies in the range $[-r, r]$. Hence the optimization problem can be formulated as
$$ \min_{s \in \mathbb{R}^n,\, \beta \in \mathbb{R}^n} \ \frac{1}{2}\|(A + B\Delta)s - y\|_2^2 + \lambda \|s\|_1, \quad \text{s.t. } \Delta = \mathrm{diag}(\beta), \ -r\mathbf{1} \le \beta \le r\mathbf{1}. $$
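As an aside, the pair $(A, B)$ in (4) can be formed numerically from any differentiable atom $a(\theta)$. The sketch below does this for a sinusoidal atom like the one used in Section V-B; the sampling times, the grid, and the atom itself are illustrative assumptions, not quantities fixed by the paper.

```python
import numpy as np

def taylor_dictionaries(t, grid):
    """Build A (atoms on the grid) and B (their derivatives w.r.t. the grid
    parameter) for the first-order off-grid model y ~ (A + B*diag(beta)) s.
    Here the atom is a(theta) = sin(2*pi*theta*t), as for a frequency grid."""
    # Columns a(theta_i): one sinusoid per grid point.
    A = np.sin(2 * np.pi * np.outer(t, grid))
    # Columns da(theta_i)/dtheta_i, computed in closed form.
    B = 2 * np.pi * t[:, None] * np.cos(2 * np.pi * np.outer(t, grid))
    return A, B

# Example: 40 random sampling times in [0, 1] and a 1..200 Hz grid, step 0.5 Hz.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, size=40))
grid = np.arange(1.0, 200.5, 0.5)
A, B = taylor_dictionaries(t, grid)
```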
By letting $p = \beta \odot s$, using the separating technique in [11] with $s = s^+ - s^-$, and penalizing the joint-sparse term, we can formulate the bounded joint-sparse optimization problem as

$$ \text{(BJS)} \quad \min_{s^+,\, s^-,\, p,\, x} \ \frac{1}{2}\|A s^+ - A s^- + B p - y\|_2^2 + \lambda \|x\|_{2,1}, $$
$$ \text{s.t. } s^+ \ge 0, \ s^- \ge 0, \ -r(s^+ + s^-) \le p \le r(s^+ + s^-), \ x = [(s^+ - s^-)^T, p^T]^T. $$

This transformation requires that $s^+ \odot s^- = 0$. As confirmed by the numerical examples in both [11] and this paper, the solution to the optimization problem always satisfies this condition.
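A direct way to prototype (BJS) is again a generic convex solver, before switching to FISTA for speed. Below is a minimal cvxpy sketch under the paper's constraints; the function and variable names are ours, `A`, `B`, `y`, `r`, and `lam` are assumed given, and the $\ell_2/\ell_1$ term is expressed as a sum of per-index pair norms.

```python
import cvxpy as cp

def solve_bjs(A, B, y, r, lam):
    """Prototype of the bounded joint-sparse program (BJS) via cvxpy.
    Returns estimates of s and p = beta (*) s (elementwise product)."""
    n = A.shape[1]
    s_pos = cp.Variable(n, nonneg=True)   # s+ >= 0
    s_neg = cp.Variable(n, nonneg=True)   # s- >= 0
    p = cp.Variable(n)
    s = s_pos - s_neg
    # l2/l1 norm of x = [s^T, p^T]^T: sum_i sqrt(s_i^2 + p_i^2).
    l21 = cp.sum(cp.norm(cp.vstack([s, p]), 2, axis=0))
    objective = cp.Minimize(0.5 * cp.sum_squares(A @ s + B @ p - y) + lam * l21)
    # s+ (*) s- = 0 is not enforced; per the paper it holds at the optimum.
    constraints = [p <= r * (s_pos + s_neg), p >= -r * (s_pos + s_neg)]
    cp.Problem(objective, constraints).solve()
    return s.value, p.value
```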
B. Dictionary Ambiguity

When an actual parameter falls at the midpoint between $\theta_i$ and $\theta_{i+1}$, it can be regarded as either $\theta_i + r$ or $\theta_{i+1} - r$. Even when the actual parameter is merely near the midpoint of some interval $[\theta_i, \theta_{i+1}]$, due to this ambiguity we will normally recover two parameters corresponding to nonzero terms in the interval $[\theta_i, \theta_{i+1}]$. To resolve this, we can linearly interpolate between the two parameters in the same interval and merge them into one parameter, since we know a priori that any two true parameters are at least $2r$ apart. Suppose the two recovered parameters are $\varphi_a$ and $\varphi_b$, both falling into the same interval with midpoint $\theta_c$, and their corresponding magnitudes are $s_a$ and $s_b$. After merging them, we obtain one recovered parameter $\varphi$ with magnitude $s$, given as
$$ s = s_a + s_b, \qquad \varphi = \theta_c + \frac{|s_a|(\varphi_a - \theta_c) + |s_b|(\varphi_b - \theta_c)}{|s_a| + |s_b|}. $$
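The merge rule is straightforward to implement. The following is a small Python sketch of it; the function name and argument layout are our own, not from the paper.

```python
def merge_pair(phi_a, s_a, phi_b, s_b, theta_c):
    """Merge two recovered parameters that fall in the same grid interval.
    theta_c is the interval midpoint; magnitudes are summed and the location
    is a magnitude-weighted interpolation around theta_c."""
    s = s_a + s_b
    phi = theta_c + (abs(s_a) * (phi_a - theta_c)
                     + abs(s_b) * (phi_b - theta_c)) / (abs(s_a) + abs(s_b))
    return phi, s

# Example: two spurious peaks at 10.2 and 10.3 around a midpoint of 10.25.
phi, s = merge_pair(10.2, 0.8, 10.3, 0.4, 10.25)
```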
IV. FAST FIRST-ORDER ALGORITHM
Solving (JS) with an interior point method can be time consuming for large problems. To speed up the computation, we use a first-order method based on a proximal operator, namely FISTA [13]. To introduce the algorithm, we first review a key concept used in FISTA, named Moreau's proximal operator, or proximal operator for short [17]. Given a closed proper convex function $h: \mathbb{R}^p \to \mathbb{R} \cup \{\infty\}$, the proximal operator of $h$ is defined by

$$ \mathrm{prox}_h(x) = \arg\min_{u \in \mathbb{R}^p} \ h(u) + \frac{1}{2}\|u - x\|_2^2. $$

When $h(u) = \lambda\|u\|_{2,1}$ and $u \in \mathbb{R}^{2n}$, the proximal operator applied to $x = [s^T, p^T]^T$ is a group-thresholding operator defined by

$$ \mathrm{prox}_h(\{[s_i, p_i]\}) = \frac{[s_i, p_i]}{\sqrt{s_i^2 + p_i^2}} \max\left(\sqrt{s_i^2 + p_i^2} - \lambda,\, 0\right), \quad \forall i. $$
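In code, this group-thresholding operator acts independently on each pair $(s_i, p_i)$. A minimal numpy sketch follows; the function name and the small-norm guard are our additions.

```python
import numpy as np

def prox_group(x, lam):
    """Group soft-thresholding: prox of lam*||.||_{2,1} applied to
    x = [s^T, p^T]^T, shrinking each pair (s_i, p_i) toward zero."""
    n = x.size // 2
    s, p = x[:n], x[n:]
    norms = np.sqrt(s**2 + p**2)
    # Shrink each group norm by lam, never below zero; guard against 0/0.
    scale = np.maximum(norms - lam, 0.0) / np.maximum(norms, 1e-12)
    return np.concatenate([scale * s, scale * p])
```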
The algorithm is based on [13] and summarized as follows, where $F(x) = \frac{1}{2}\|\Phi x - y\|_2^2$ is the smooth part of (JS):

Fast Proximal Gradient Method (FISTA)
Input: An upper bound $L \ge \|\Phi\|_2^2$ and an initial point $x^0$.
Step 0. Take $z^1 = x^0$, $t_1 = 1$.
Step k. ($k \ge 1$) Compute
$$ x^k = \mathrm{prox}_{\frac{1}{L}h}\left(z^k - \frac{1}{L}\nabla F(z^k)\right), \quad \text{with } h(u) = \lambda\|u\|_{2,1}, $$
$$ t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2}, \qquad z^{k+1} = x^k + \frac{t_k - 1}{t_{k+1}}\left(x^k - x^{k-1}\right). $$
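Putting the pieces together, the following sketch runs FISTA on (JS); it assumes the `prox_group` operator from the previous snippet is in scope, and the fixed iteration count and variable names are our own choices rather than the paper's.

```python
import numpy as np

def fista_js(Phi, y, lam, num_iters=500):
    """FISTA for (JS): minimize 0.5*||Phi x - y||_2^2 + lam*||x||_{2,1}.
    Assumes prox_group(x, lam) from the previous sketch is in scope."""
    L = np.linalg.norm(Phi, 2) ** 2          # upper bound: squared spectral norm
    x = np.zeros(Phi.shape[1])               # x^0
    z, t = x.copy(), 1.0                     # z^1 = x^0, t_1 = 1
    for _ in range(num_iters):
        grad = Phi.T @ (Phi @ z - y)         # gradient of the smooth part F
        x_new = prox_group(z - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```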
V. NUMERICAL EXAMPLES

A. Randomly Generated Data

In this numerical example we compare the FISTA-based joint-sparse method with the alternating minimization method proposed in [10]. Both matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times n}$ are randomly generated from a normal distribution with mean 0 and standard deviation 1. We set $m = 40$, $n = 100$, and the sparsity of the signal $s$ to 3. The noise term $w$ is randomly generated from a normal distribution with mean zero and standard deviation $\sigma_n = 0.01$. The mismatch term $\beta$ is also generated from a normal distribution, with standard deviation $\delta$. $\lambda$ is chosen as $\sigma_n\sqrt{2\log(n)}$, following [18].
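This setup is easy to reproduce. A sketch of the data generation follows; the random seed and exact draws are arbitrary, and only the sizes and noise levels follow the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 40, 100, 3                        # sizes and sparsity from the paper
sigma_n, delta = 0.01, 0.4                  # noise and mismatch levels

A = rng.standard_normal((m, n))             # known dictionary
B = rng.standard_normal((m, n))             # known mismatch directions
s = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
s[support] = rng.standard_normal(k)         # k-sparse signal
beta = delta * rng.standard_normal(n)       # mismatch coefficients
w = sigma_n * rng.standard_normal(m)        # measurement noise

y = (A + B @ np.diag(beta)) @ s + w         # model (M)
lam = sigma_n * np.sqrt(2 * np.log(n))      # regularization weight from [18]
Phi = np.hstack([A, B])                     # stacked dictionary for (JS)
# x_hat = fista_js(Phi, y, lam)             # recover x = [s^T, p^T]^T
```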
We vary $\delta$ from 4 to 0.004 to compare the two methods. The alternating method is run for 100 iterations, or fewer if it converges earlier. In Fig. 1 and Fig. 2, $E_s = \frac{\|s - \hat{s}\|_2}{\|s\|_2}$ and $E_\beta = \frac{\|\beta(I) - \hat{\beta}(I)\|_2}{\|\beta(I)\|_2}$, where $I$ denotes the support set of $s$. We can see that the joint-sparse method performs much better than alternating minimization when the mismatch is large and performs comparably when the mismatch is small. Alternating minimization is very time consuming and converges slowly: on average it takes 31.238 seconds, while the FISTA-based joint-sparse method takes only 0.668 seconds.

Fig. 1: Signal reconstruction error.

Fig. 2: Mismatch reconstruction error.
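The two error metrics are simple to compute. A short sketch, using the arrays from the generation snippet above and an `x_hat` assumed to come from `fista_js`:

```python
import numpy as np

# Split the joint estimate x_hat = [s_hat^T, p_hat^T]^T.
s_hat, p_hat = x_hat[:n], x_hat[n:]
I = np.flatnonzero(s)                        # support set of the true signal
beta_hat = np.zeros(n)
beta_hat[I] = p_hat[I] / s_hat[I]            # beta_i = p_i / s_i on the support

E_s = np.linalg.norm(s - s_hat) / np.linalg.norm(s)
E_beta = np.linalg.norm(beta[I] - beta_hat[I]) / np.linalg.norm(beta[I])
```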
B. Frequency Estimation Using Off-grid Compressed Sensing

In this numerical example, we compare the bounded joint-sparse optimization with P-BPDN [11] in a frequency estimation scenario. The received signal is given by $b(t) = \sum_{k=1}^{K} s_k \sin(2\pi f_k t)$. The goal is to estimate the values of $f_k$ with nonzero $s_k$; in this simulation there are three such frequencies. We created a discretized grid from 1 Hz to 200 Hz with a step size of 0.5 Hz, and we assume that two frequencies with nonzero $s_k$ are at least 0.5 Hz apart. We randomly choose 40 sampling points from time 0 to 1 second. From Section III-A, we can derive a dictionary matrix $A$ and a mismatch matrix $B$. We apply both the joint-sparse recovery method and P-BPDN to this model, and post-process the results using the merge method proposed in Section III-B. Results are shown in Fig. 3 and Fig. 4. $R$ in Fig. 3 is defined as the power of the three largest magnitudes versus all other recovered signal powers. $E$ in Fig. 4 is the frequency estimation error between the frequencies of the three largest reconstructed signals and the true underlying frequencies. We can see that the method proposed in this paper has consistently better reconstruction performance than P-BPDN.

Fig. 3: Detection ability.

Fig. 4: Frequency reconstruction error.

VI. CONCLUSION

In this paper, we proposed a method for overcoming dictionary mismatch in compressed sensing. We utilized the joint-sparse recovery model and gave a performance bound on the joint-sparse reconstruction. We conducted two simulations and showed that compressed sensing with dictionary mismatch can benefit from the property of joint sparsity. In future work, we will apply our method to other applications, such as distributed MIMO radar and direction-of-arrival estimation using a sensor array.

REFERENCES
[1] E. Candès and M. Wakin, "An introduction to compressive sampling," IEEE Signal Process. Mag., vol. 25, no. 2, pp. 21–30, Mar. 2008.
[2] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[3] E. Candès and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?" IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.
[4] J. Romberg, "Imaging via compressive sampling," IEEE Signal Process. Mag., vol. 25, no. 2, pp. 14–20, Mar. 2008.
[5] S. Gogineni and A. Nehorai, "Target estimation using sparse modeling for distributed MIMO radar," IEEE Trans. Signal Process., vol. 59, no. 11, pp. 5315–5325, Nov. 2011.
[6] A. Petropulu, Y. Yu, and H. Poor, "Distributed MIMO radar using compressive sampling," in Proc. 42nd Asilomar Conf. Signals, Systems and Computers, Oct. 2008, pp. 203–207.
[7] W. Bajwa, J. Haupt, A. Sayeed, and R. Nowak, "Compressive wireless sensing," in Proc. 5th Int. Conf. Information Processing in Sensor Networks (IPSN), 2006, pp. 134–142.
[8] Y. Chi, L. Scharf, A. Pezeshki, and A. Calderbank, "Sensitivity to basis mismatch in compressed sensing," IEEE Trans. Signal Process., 2011.
[9] M. Herman and T. Strohmer, "General deviants: An analysis of perturbations in compressed sensing," IEEE J. Sel. Topics Signal Process., vol. 4, no. 2, pp. 342–349, 2010.
[10] H. Zhu, G. Leus, and G. Giannakis, "Sparsity-cognizant total least-squares for perturbed compressive sampling," IEEE Trans. Signal Process., May 2011.
[11] Z. Yang, C. Zhang, and L. Xie, "Robustly stable signal recovery in compressed sensing with structured matrix perturbation," IEEE Trans. Signal Process., vol. 60, no. 9, pp. 4658–4671, 2012.
[12] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[13] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM J. Imaging Sci., vol. 2, no. 1, pp. 183–202, 2009.
[14] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York: Springer, 2009.
[15] Y. Eldar and M. Mishali, "Robust recovery of signals from a structured union of subspaces," IEEE Trans. Inf. Theory, vol. 55, no. 11, pp. 5302–5316, 2009.
[16] E. Candès, "The restricted isometry property and its implications for compressed sensing," C. R. Acad. Sci. Paris, Ser. I, 2008.
[17] J. J. Moreau, "Proximité et dualité dans un espace hilbertien," Bull. Soc. Math. France, vol. 93, pp. 273–299, 1965.
[18] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Rev., vol. 43, no. 1, pp. 129–159, 2001.