Robust Recovery of Signals From a Union of Subspaces
Yonina C. Eldar, Senior Member, IEEE and Moshe Mishali, Student Member, IEEE
Abstract Traditional sampling theories consider the problem of reconstructing an unknown signal x from a series of samples. A prevalent assumption which often guarantees a unique signal consistent with the given measurements is that x lies in a known subspace. Recently, there has been growing interest in nonlinear but structured signal models, in which x is assumed to lie in a union of subspaces. An example is the case in which x is a finite length vector that is sparse in a given basis. In this paper we develop a general framework for robust and efficient recovery of such signals from a given set of samples. More specifically, we treat the case in which x lies in a finite union of finite dimensional spaces and the samples are modelled as inner products with an arbitrary set of sampling functions. We first develop conditions under which unique and stable recovery of x is possible, albeit with algorithms that have combinatorial complexity. To derive an efficient and robust recovery algorithm, we then show that our problem can be formulated as that of recovering a block sparse vector, namely a vector whose non-zero elements appear in fixed blocks. To solve this problem, we suggest minimizing a mixed ℓ2 /ℓ1 norm subject to the measurement equations. We then develop equivalence conditions under which the proposed convex algorithm is guaranteed to recover the original signal. These results rely on the notion of block restricted isometry property (RIP), which is a generalization of the standard RIP used extensively in the context of compressed sensing. A special case of the proposed framework is that of recovering multiple measurement vectors (MMV) that share a joint sparsity pattern. Specializing our results to this context leads to new MMV recovery methods as well as equivalence conditions under which the entire set can be determined efficiently.
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Department of Electrical Engineering, Technion—Israel Institute of Technology, Haifa 32000, Israel. Phone: +972-4-8293256, fax: +972-4-8295757, E-mail: {yonina@ee,moshiko@techunix}.technion.ac.il. This work was supported in part by the Israel Science Foundation under Grant no. 1081/07 and by the European Commission in the framework of the FP7 Network of Excellence in Wireless COMmunications NEWCOM++ (contract no. 216715).

I. INTRODUCTION

Sampling theory has a rich history dating back to Cauchy in the 19th century. The most prevalent setting studied in the literature is that in which it is desired to recover an unknown signal x from a series of samples, which are modelled as inner products with a set of sampling functions [1]–[9]. Without additional prior information on x, the sampling process is in general irreversible. Therefore, in order to recover x, prior assumptions on the nature of x are necessary. Undoubtedly, the sampling theorem that had the most impact on signal processing and communications is that associated with Shannon, Whittaker and Kotelnikov [10], [11]. Their famous result is that a bandlimited function x(t) can be recovered exactly from its uniform samples as long as the sampling rate exceeds the Nyquist rate,
corresponding to twice the highest frequency of the signal [12]. More recently, this basic theorem has been extended to include more general classes of signal spaces. In particular, it can be shown that under mild technical conditions, a signal x lying in a given subspace can be recovered exactly from its linear generalized samples using a series of filtering operations. [2], [3], [13], [14]. Recently, there has been growing interest in nonlinear signal models in which the unknown x does not necessarily lie in a subspace. In order to ensure recovery from the samples, some underlying structure is needed. A general signal model that captures many interesting cases is that in which the signal x lies in a union of subspaces. Thus, x lies in one of a set of given subspaces Vi , however, a priori it is not known in which subspace the signal resides. A special case of this framework, which has been studied extensively in the recent literature, is the problem underlying the emerging field of compressed sensing (CS). In this setting, the goal is to recover a length N vector x from n < N linear measurements, where x is known to be k-sparse in some basis, namely in the given basis it contains
no more than k non-zero elements [15], [16]. In other words, x belongs to one of the subspaces spanned by the appropriate k basis vectors. Many algorithms have been proposed in the literature in order to recover x exactly in a stable and efficient manner [16]–[19]. A variety of conditions have then been developed to ensure that the proposed methods recover x exactly. One of the main tools in this context is the restricted isometry property (RIP) [16], [20], [21]. In particular, it can be shown that if the measurement matrix satisfies the RIP then x can be recovered by solving a convex ℓ1 minimization algorithm. Another special case of a union of subspaces is the setting in which the unknown signal x = x(t) has a multiband structure, so that its Fourier transform consists of at most N bands, each of limited width [22]. The important aspect of this problem setting is that the band locations are unknown, and therefore the signal lies in a union of multiband spaces. By formulating this problem within the framework of CS, explicit sub-Nyquist sampling and reconstruction schemes were developed in [22], [23] that ensure perfect-recovery of the multiband signal at the minimal possible rate. This setup was generalized in [24] to deal with sampling and reconstruction of signals that lie in a finite union of shift-invariant subspaces. By combining ideas from standard sampling theory with CS results, explicit low-rate sampling and recovery methods were developed for such signal sets. Another example of a union of subspaces model is the set of finite rate of innovation signals [25], [26], that are modelled as a weighted sum of shifts of a given generating function, where the shifts are unknown.
In this paper, our goal is to develop a unified framework for efficiently recovering signals that lie in a union of subspaces. Our emphasis is on developing methods that are stable in the presence of noise and modelling errors, and computationally efficient. In contrast to [24], here we consider unions of finite dimensional subspaces. Specifically, we restrict our attention to the case in which the union is comprised of a finite sum of subspaces Vi, where each possible Vi is the sum of k subspaces Aj, 1 ≤ j ≤ m, selected from a given set of m subspaces. In total, there are (m choose k) choices of Vi. The standard CS model fits this framework in which Ai is the space spanned by the ith column
of the identity matrix. The setting we treat here is a special case of that considered in [27], where infinite unions were allowed as well. Conditions under which unique and stable sampling are possible in this general setting were developed in [27]. However, no concrete algorithm was provided to recover such a signal from a given set of
samples in a stable and efficient manner. Here we propose a convex optimization algorithm that will often recover the true underlying x and develop explicit conditions under which perfect recovery is guaranteed. Furthermore, we prove that our method is stable and robust in the sense that the reconstruction error is bounded in the presence of noise and mismodelling, namely when x does not lie in the union. Our first contribution is showing that the problem of recovering a signal x in a union of subspaces can be cast as a sparse recovery problem, in which it is desired to recover a sparse vector c that has a particular sparsity pattern: the non-zero values appear in fixed blocks. We refer to such a model as block sparsity. Clearly any block sparse vector is also sparse in the standard sense. However, by exploiting the block structure of the sparsity pattern, recovery may be possible under more general conditions. Block sparse signal models were introduced in [28] and studied in the context of DNA microarrays in [29], [30]. In order to recover such a signal from a given set of measurements, a mixed ℓ2 /ℓ1 algorithm was proposed. By analyzing the properties of the measurement operator’s null space, it was shown that asymptotically, as the signal length grows to infinity, perfect recovery is possible with high probability. Although this result may be relevant to DNA microarray experiments in which the problems are large scale, it is less relevant to the analysis of the finite, non-asymptotic case that is of interest here. A convenient tool for analyzing the recovery capabilities of efficient algorithms for finding a sparse vector from a given set of measurements is the RIP. Generalizing this concept to our setting, we introduce the block RIP, which is a less stringent requirement. We then prove that if the measurement matrix satisfies the block RIP, then the underlying block sparse signal can be recovered exactly using the convex mixed ℓ2 /ℓ1 problem. The proposed algorithm is also shown to be stable in the presence of noise and mismodelling errors under the block RIP condition. Using ideas similar to [19], [31] we then prove that random matrices satisfy the block RIP with overwhelming probability. Moreover, the probability to satisfy the block RIP is substantially larger than that of satisfying the standard RIP. These results establish that a signal x that lies in a finite union can be recovered efficiently and stably
with overwhelming probability if the measurement matrix is constructed from a random ensemble. An interesting special case of the block sparse model is the multiple measurement vector (MMV) problem in which we have d unknown vectors that share a joint sparsity pattern. Uniqueness conditions and recovery algorithms for the MMV setting were studied in [17], [18], [23], [32], [33]. However, there are very few results that establish equivalence between the output of the proposed efficient algorithms and the true signal set. In [33] an equivalence result was derived for a mixed ℓp /ℓ1 program in which the objective is to minimize the sum of the ℓp -norms of the rows of the estimated matrix whose columns are the unknown vectors. The resulting equivalence condition is based on mutual coherence, and turns out to be the same as that obtained from a single measurement problem, so that the joint sparsity pattern does not lead to improved recovery capabilities as judged by this condition. This is in contrast to the fact that in practice, MMV methods tend to outperform algorithms that treat each of the vectors separately. In order to develop meaningful equivalence results, we cast the MMV problem as one of block sparse recovery. The mixed ℓ2 /ℓ1 method we propose for block sparsity translates into minimizing the sum of the ℓ2 norms of the rows of the unknown matrix representing the MMV set. Our general results lead to RIP-based equivalence conditions for this class of algorithms. Furthermore, our framework suggests a different type of sampling method for MMV problems which tends to increase the recovery rate. Specifically, instead of sampling each unknown vector with the same measurement matrix, we treat the unknowns as a block sparse vector and sample with a dense matrix. In this case every sample has a contribution from all the unknown vectors, in contrast with standard MMV sampling techniques in which each measurement vector depends only on the corresponding unknown vector. The equivalence condition we obtain in this case is stronger than the single measurement setting. The remaining of the paper is organized as follows. In Section II we describe the general problem of sampling from a union of subspaces. Uniqueness and stability conditions are derived in Section III by exploiting the connection between our problem and that of recovering a block-sparse vector. We also present a non-convex optimization algorithm with combinatorial complexity whose solution is the true unknown signal x. A convex relaxation of this algorithm is proposed in Section IV. We then derive equivalence conditions based on the notion of block RIP that ensure that the relaxed solution is equal to the true unknown signal x. We also prove that the algorithm is robust and stable in the presence of noise and modelling errors. This approach is specialized to MMV sampling in Section V. Finally, in Section VI we prove that random ensembles satisfy the block RIP with high probability. Throughout the paper, we denote vectors in an arbitrary Hilbert space H by lower case letters e.g., x, and sets of vectors in H by calligraphic letters, e.g., S . Vectors in RN are written as boldface lowercase letters e.g., x, and matrices as boldface uppercase letters e.g., A. The identity matrix of appropriate dimension is written as I or Id when the dimension is not clear from the context. Given a matrix A, AT is its transpose, and R(A) denotes its range space. The ith element of a vector x is denoted by x(i). 
Linear transformations from R^n to H are written as upper case letters A : R^n → H. The adjoint of A is written as A∗ and R(A) is the range space of A. The standard Euclidean norm is denoted ‖x‖_2 = √(x^T x), and ‖x‖_1 = Σ_i |x(i)| is the ℓ1 norm of x. The Kronecker product between matrices A and B is denoted A ⊗ B. The following variables are used in the sequel: n is the number of samples, N is the length of the input signal x when it is a vector, k is the sparsity or block sparsity (to be defined later on) of a vector c, and m is the number of subspaces. For ease of notation we assume throughout that all scalars are defined over the field of real numbers; however, the results are also valid over the complex domain with appropriate modifications.

II. UNION OF SUBSPACES
A. Subspace Sampling

Traditional sampling theory deals with the problem of recovering an unknown signal x ∈ H from a set of n samples yi = fi(x) where fi(x) is some function of x. The signal x can be a function of time x = x(t), or can represent a finite-length vector x = x. Typically it is assumed that x lies in a given subspace A of H [2], [3], [13], [14]. This information is then utilized in the recovery process in order to reconstruct x from the samples yi. The most common type of sampling is linear sampling in which

yi = ⟨si, x⟩,  1 ≤ i ≤ n,  (1)

for a set of functions si ∈ H [1], [2], [4]–[9]. Here ⟨x, y⟩ denotes the standard inner product on H. For example, if H = L2 is the space of real finite-energy signals then

⟨x, y⟩ = ∫_{−∞}^{∞} x(t) y(t) dt.  (2)

When H = R^N for some N,

⟨x, y⟩ = Σ_{i=1}^{N} x(i) y(i).  (3)
Nonlinear sampling is treated in [34]. However, here our focus will be on the linear case. When H = R^N the unknown x = x as well as the sampling functions si = si are vectors in R^N. Therefore, the samples can be written conveniently in matrix form as

y = S^T x,  (4)

where S is the matrix with columns si. In the more general case in which H = L2 or any other abstract Hilbert space, we can use the set transformation notation in order to conveniently represent the samples. A set transformation S : R^n → H corresponding to sampling vectors {si ∈ H, 1 ≤ i ≤ n} is defined by

Sc = Σ_{i=1}^{n} c(i) si  (5)

for all c ∈ R^n. From the definition of the adjoint, if c = S∗x, then c(i) = ⟨si, x⟩. Note that when H = R^N, S = S and S∗ = S^T. Using this notation, we can always express the samples as

y = S∗x,  (6)
where S is a set transformation for arbitrary H, and an appropriate matrix when H = RN . Our goal is to recover x from the samples y ∈ Rn . If the vectors si do not span the entire space H, then there are many possible signals x consistent with the measurements y. More specifically, if we define by S the sampling space spanned by the vectors si , which is equal to R(S), then clearly S ∗ v = 0 for any v ∈ S ⊥ . Therefore, if S ⊥ is not the trivial space then adding such a vector v to any solution x of (6) will result in the same samples y. However, by exploiting prior knowledge on x, in many cases uniqueness can be guaranteed. As a simple example, if we know a priori that x ∈ S , then there is a unique x consistent with (6). More generally, it can be shown that if x lies in a subspace A that satisfies the direct sum condition1 H = A ⊕ S ⊥ , then x can be perfectly recovered from
the samples y. In finite dimensional spaces this condition implies that S and A must have the same dimension, and that S⊥ and A intersect only at the 0 vector [3], [14].

B. Union of Subspaces

Evidently, when subspace information is available, perfect reconstruction can often be guaranteed. Furthermore, in all of the cases above, recovery can be implemented by a simple linear transformation of the given samples (6). However, there are many practical scenarios in which x does not necessarily lie in a subspace. Nevertheless, x can often be described as lying in a union of subspaces, U = ∪_i Vi, where each Vi is a subspace. Thus, x belongs to
one of the Vi , but we do not know a priori to which one [27]. Note that the set U is no longer a subspace. Indeed,
if Vi is, for example, a one-dimensional space spanned by the vector vi, then U contains vectors of the form αvi for some i but does not include their linear combinations. Our goal is to recover a vector x lying in a union of subspaces, from a given set of samples. In principle, if we knew which subspace x belonged to, then reconstruction can be obtained using standard sampling results. However, here the problem is more involved because conceptually we first need to identify the correct subspace and only then can we recover the signal within the space. In our development we focus on the case in which x lies in one of the subspaces Vi, where each Vi is generated by a given set of spaces {Aj, 1 ≤ j ≤ m} in such a way that Vi is a sum of at most k of the m subspaces Aj,¹ but we do not know in advance which k are chosen. We assume throughout the paper that m and the dimensions di = dim(Ai) of the subspaces Ai are finite, and that the underlying spaces Ai are disjoint, namely they intersect only at the zero vector. Given n samples

y = S∗x  (7)

and the knowledge that x lies in one of the subspaces Vi, we would like to recover the unknown signal x. In this setting, there are (m choose k) possible subspaces comprising the union that are generated from the m spaces Ai.

¹The sum set {a + v; a ∈ A, v ∈ S⊥} is referred to as direct when A ∩ S⊥ = {0}.
A special case of our model is the standard CS problem in which x = x is a vector of length N , that has a sparse
representation in a given basis defined by an invertible matrix W. Thus, x = Wc where c is a sparse vector that has at most k nonzero elements. This fits our framework by choosing Ai as the space spanned by the ith column of W. Another example is the block sparsity model [28] in which the vector x is divided into blocks of length d, and at most k blocks can be non zero. Such a vector can be described in our setting with H = RN by choosing A1 to be the space spanned by the first d columns of the identity, A2 to be the space spanned by the next d columns and so on. The choice d = 1 leads to the standard sparsity model. Selecting the spaces Ai to have different dimension leads to a block sparsity model with varying block lengths. Specializing the results we develop to this setting leads to efficient and stable algorithms for perfectly recovering block sparse signals. The algorithms and results we derive can also be applied to the MMV problem [17], [18], [23], [32], [33]. In an MMV setting we are given a matrix of measurements Y that is obtained by measuring a set of k-sparse vectors xi which are the columns of a matrix X. The distinguishing feature of the MMV setting is that the non-zero elements of xi are assumed to share a joint sparsity pattern. Thus, Y = MX,
(8)
where X has at most k non-zero rows, and M is a given sampling matrix. This problem can be transformed into that of recovering a k-block sparse signal by noting that if we define x = vec(X^T) as the vector obtained by stacking the rows of X, then x is k-block sparse where each block has length d. From (8) we have that

vec(Y^T) = (M ⊗ I) vec(X^T).  (9)
Denoting by y = vec(Y T ), S = MT ⊗ I the measurements can be expressed as y = ST x where x is a block sparse vector. Namely, x lies in a union of subspaces where each possible subspace is the sum of k choices of Ai , and Ai is the space spanned by the appropriate d columns of the identity. Therefore, the uniqueness and equivalence results we derive for the union of subspaces model lead to corresponding results for the MMV problem. Furthermore, our general framework suggests improved algorithms for recovering X, as we will discuss in Section V-B.
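To make the reduction in (8)–(9) concrete, the following sketch (our own illustration, not code from the paper; the dimensions, variable names and random data are hypothetical) forms the block sparse vector x = vec(X^T) and verifies that multiplying it by M ⊗ I reproduces vec(Y^T):

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, N, d, k = 10, 20, 3, 2           # rows of M, ambient dimension, MMV width, row sparsity

M = rng.standard_normal((n_rows, N))     # MMV sampling matrix
X = np.zeros((N, d))                     # at most k non-zero rows (joint sparsity)
support = rng.choice(N, size=k, replace=False)
X[support] = rng.standard_normal((k, d))
Y = M @ X                                # standard MMV measurements, eq. (8)

# Block sparse reformulation, eq. (9): stack the rows of X.
x = X.reshape(-1)                        # vec(X^T); row i of X becomes block i of length d
y = np.kron(M, np.eye(d)) @ x            # (M kron I) vec(X^T)
assert np.allclose(y, Y.reshape(-1))     # equals vec(Y^T)

# x is block k-sparse over I = {d, ..., d}: only k length-d blocks are non-zero.
active = [i for i in range(N) if np.linalg.norm(x[i*d:(i+1)*d]) > 0]
print(sorted(active) == sorted(support.tolist()))
```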
C. Problem Formulation and Main Results Given the subspaces Ai , we would like to address the following questions: 1) What are the conditions on the sampling vectors si , 1 ≤ i ≤ n in order to guarantee that the sampling is invertible and stable? 2) How can we recover the unique x (regardless of computational complexity)? 3) How can we recover the unique x in an efficient and stable manner? The first question was addressed in [27]. However, no concrete methods were proposed in order to recover x and no conditions were derived to ensure stability of the reconstruction. Here we provide efficient convex algorithms that recover x in a stable way for arbitrary k under appropriate conditions on the sampling functions si and the spaces Ai . In Section III we address the uniqueness and stability question by specializing the results of [27] to our context, and provide a method to recover the unique x. As we will see, the proposed approach has exponential complexity and is therefore not practical for large values of k or m. In Section IV we suggest an efficient convex optimization problem that takes on the form of a second order cone program (SOCP), which approximates the unknown x. We then develop a block RIP condition, similar in spirit to the RIP used in several recent works in CS [19], [16], [20], [21] under which we prove that x can be recovered exactly in a stable way using the proposed optimization program. If x does not lie in the union to begin with, then we would like to find the best approximation to it within the union. As we show, our algorithm can approximate the best sum-k solution. Furthermore, under the RIP condition, a slight modification of the method estimates the true block sparse vector in a stable way in the presence of noise. In Section VI we prove that with high probability, several random matrices satisfy the proposed block RIP condition and therefore can be used to recover x efficiently. We also show that for given dimensions and sparsity level, a random matrix has a higher probability to satisfy the block RIP than the standard one. Furthermore, when the problem dimensions are large enough, a higher fraction of non-zeros is allowed when restricting the non-zero entries to blocks. In Section V we specialize the results to recovery of the sparsest matrix in an MMV system. This leads to equivalence results in MMV models under which a mixed ℓ2 /ℓ1 algorithm can recover the true sparse matrix. In addition, viewing the MMV problem in a block sparsity context suggests new sampling strategies, in which each measurement vector yj is affected by all the unknown vectors xi and does not depend only on xj , as in standard MMV approaches. As we show, this method leads to superior recovery rate when compared with other popular MMV algorithms.
III. UNIQUENESS AND STABLE RECOVERY
A. Connection with Block Sparsity

Consider the model of a signal x in the union of k out of m subspaces Ai, with di = dim(Ai):

x ∈ ∪_{|i|=k} Ai,  (10)

where |i| = k denotes a union over all choices of k indices out of m. To write x explicitly, we choose a basis for each Ai. Denoting by Ai : R^{di} → H the set transformation corresponding to the basis for Ai, any signal x of the form (10) can be written as

x = Σ_{|i|=k} Ai ci,  (11)

where ci ∈ R^{di} are the representation coefficients in Ai, and here |i| = k denotes a sum over a set of k indices. Our goal is to recover any x of the form (11) from the samples (6). In this section we develop necessary and sufficient conditions on the sampling operator S to be invertible and stable over the set defined by (11). In order to present our results, it is useful to introduce some further notation. First, we define A : R^N → H as the set transformation that is a result of concatenating the different Ai, with

N = Σ_{i=1}^{m} di.  (12)

Next, we define the ith sub-block c[i] of a length-N vector c over I = {d1, . . . , dm}. The ith sub-block is of length di, and the blocks are formed sequentially so that c[1] is a vector of length d1 containing the first d1 elements of c, c[2] is a vector of length d2 consisting of the next d2 values, and so on. Using these sub-blocks we can define A by

Ac = Σ_{i=1}^{m} Ai c[i].  (13)
When H = RN for some N , Ai = Ai is a matrix and A = A is the matrix obtained by column-wise concatenating Ai . If the j th subspace Aj does not appear in the union (10), then c[j] = 0.
Any x of the form (10) can be written as x = Ac, where there are at most k non-zero blocks c[i]. Therefore, the problem of recovering x reduces to that of recovering a sparse vector from a set of samples. However, the sparsity pattern here has a unique form which we will exploit in our conditions and algorithms: the non-zero elements appear in blocks. The precise structure of block sparsity is now defined.

Definition 1: A vector c ∈ R^N is called block k-sparse over I = {d1, . . . , dm} if c[i] is nonzero for at most k indices i, where N = Σ_i di.

When di = 1 for each i, block sparsity reduces to the conventional definition of a sparse vector. Denoting

‖c‖_{0,I} = Σ_{i=1}^{m} I(‖c[i]‖_2 > 0),  (14)

where I(‖c[i]‖_2 > 0) is an indicator function that obtains the value 1 if ‖c[i]‖_2 > 0 and 0 otherwise, a block k-sparse vector c can be defined by ‖c‖_{0,I} ≤ k.
Since our problem is equivalent to that of recovering a block sparse vector over I from linear measurements, in the sequel we analyze this setting.
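As a small illustration of Definition 1 and the norm (14) (a sketch of our own; the block sizes and the test vector are hypothetical), the helpers below split a vector into its sub-blocks over I = {d1, . . . , dm} and evaluate ‖c‖_{0,I}; the mixed ℓ2/ℓ1 norm ‖c‖_{2,I} used in Section IV is obtained the same way by summing the block norms:

```python
import numpy as np

def blocks(c, dims):
    """Return the sub-blocks c[1], ..., c[m] of c over I = {d1, ..., dm}."""
    edges = np.cumsum([0] + list(dims))
    return [c[edges[i]:edges[i + 1]] for i in range(len(dims))]

def block_l0(c, dims):
    """||c||_{0,I}: number of blocks with non-zero Euclidean norm, eq. (14)."""
    return sum(np.linalg.norm(b) > 0 for b in blocks(c, dims))

def mixed_l2_l1(c, dims):
    """||c||_{2,I}: sum of the l2 norms of the blocks (used later in (24))."""
    return sum(np.linalg.norm(b) for b in blocks(c, dims))

dims = [2, 3, 2, 1]                              # m = 4 blocks of varying lengths
c = np.array([1.0, -2.0, 0, 0, 0, 3.0, 4.0, 0])
print(block_l0(c, dims), mixed_l2_l1(c, dims))   # -> 2 and sqrt(5) + 5
```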
B. Uniqueness and Stability

Proposition 1 below states conditions for invertibility of the sampling operator S.

Proposition 1: The sampling operator S is invertible for every x of the form (10) if and only if S∗Ac ≠ 0 for every c ≠ 0 that is block 2k-sparse, where A is defined by (13).

Proof: The proof follows from [27, Proposition 4].

Note that the invertibility condition of Proposition 1 does not depend on the choice of basis vectors Ai. Indeed, suppose that Ãi is an alternative basis for Ai. Then Ãi = Ai Mi for some square invertible matrix Mi of dimension di, so that Ãc = Σ_i Ai d[i] where d[i] = Mi c[i]. The correspondence follows from the fact that Mi is invertible.

We next address the issue of stability. A sampling operator is stable for a set U if and only if there exist constants α > 0, β < ∞ such that

α‖x1 − x2‖_H^2 ≤ ‖S∗x1 − S∗x2‖_2^2 ≤ β‖x1 − x2‖_H^2,  (15)

for every x1, x2 in U. The ratio κ = β/α provides a measure for stability of the sampling operator. The operator is maximally stable when κ = 1.

Proposition 2: The sampling operator S is stable for every x of the form (10) if and only if there exist C1 > 0 and C2 < ∞ such that

C1‖c‖_2^2 ≤ ‖S∗Ac‖_2^2 ≤ C2‖c‖_2^2,  (16)

for every c that is block 2k-sparse.

Proof: From [27, Proposition 2] it follows that S is stable if and only if

α‖x‖_H^2 ≤ ‖S∗x‖_2^2 ≤ β‖x‖_H^2,  (17)

for every x that lies in the sum set V1 + V2 where each Vi is the sum of at most k subspaces Ai. Since any x ∈ Vi can be written as Ac for some block k-sparse signal c, it follows that x ∈ V1 + V2 can be expressed as Ac where c is block 2k-sparse. Therefore, (17) is equivalent to

α‖Ac‖_H^2 ≤ ‖S∗Ac‖_2^2 ≤ β‖Ac‖_H^2,  ∀c block 2k-sparse.  (18)
Finally, since the subspaces Ai are disjoint and Ai represents a basis, the size-N Gram matrix G = A∗A is invertible and

η‖c‖_2^2 ≤ ‖Ac‖_H^2 ≤ γ‖c‖_2^2.  (19)

Here η = λmin(G) > 0 and γ = λmax(G) < ∞, where λmin, λmax are the smallest and largest eigenvalues, respectively, of G. Combining (18) and (19) completes the proof.

It is easy to see that if S satisfies (16) then S∗Ac ≠ 0 for all block 2k-sparse vectors c. Therefore, this condition implies both invertibility and stability. Property (16) is related to the RIP used in several previous works in CS [16], [20], [21]. A matrix D of size n × N is said to have the RIP if there exists a constant δk ∈ [0, 1) such that for every k-sparse c ∈ R^N,

(1 − δk)‖c‖_2^2 ≤ ‖Dc‖_2^2 ≤ (1 + δk)‖c‖_2^2.  (20)
Extending this property to block sparse vectors leads to the following definition.

Definition 2: Let D : R^N → R^n be a given matrix. Then D has the block RIP over I = {d1, . . . , dm} with parameter δk|I if for every c ∈ R^N that is block k-sparse over I we have that

(1 − δk|I)‖c‖_2^2 ≤ ‖Dc‖_2^2 ≤ (1 + δk|I)‖c‖_2^2.  (21)

By abuse of notation, we use δk for the block-RIP constant δk|I when it is clear from the context that we refer to blocks. It is easy to see that the RIP is a special case of (16) when C1 + C2 = 2. Note that a vector that is block k-sparse over I is M-sparse in the conventional sense where M is the sum of the k largest values in I, since it has at most M nonzero elements. If we require D to satisfy the RIP for all M-sparse vectors, then (21) must hold for all M-sparse vectors c. Since we only require the RIP for block sparse signals, the condition (21) only has to be satisfied for a certain subset of M-sparse signals, namely those that have block sparsity. As a result, the block-RIP constant δk|I is typically smaller than δM. In the next section, we will see that the ability to recover x in a computationally efficient way depends on the constant δ2k|I in the block RIP (21). The smaller the value of δ2k|I, the easier the recovery process. Both standard and block RIP constants δk, δk|I are by definition increasing with k. Therefore, it was suggested in [19] to normalize each of the columns of D to 1, so as to start with δ1 = 0. In the same spirit, we recommend choosing the basis for Ai such that S∗A has unit-norm columns, corresponding to δ1|I = 0.
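For small dimensions the block-RIP constant of a given matrix can be evaluated by brute force, which is how the numerical values quoted in the example and in Fig. 5 below can be reproduced in principle. The following sketch (our own, assuming equal block lengths d; it enumerates all block supports and is therefore only practical for toy problems) computes the smallest δ satisfying (21):

```python
import numpy as np
from itertools import combinations

def block_rip_constant(D, d, k):
    """Brute-force block-RIP constant: smallest delta with
    (1-delta)||c||^2 <= ||Dc||^2 <= (1+delta)||c||^2 for all block k-sparse c
    (equal block length d). Setting d = 1 gives the standard RIP constant."""
    n, N = D.shape
    m = N // d
    delta = 0.0
    for T in combinations(range(m), k):
        cols = np.concatenate([np.arange(i * d, (i + 1) * d) for i in T])
        eigs = np.linalg.eigvalsh(D[:, cols].T @ D[:, cols])  # Gram matrix of the selected blocks
        delta = max(delta, 1.0 - eigs.min(), eigs.max() - 1.0)
    return delta

rng = np.random.default_rng(1)
D = rng.standard_normal((12, 24))
D /= np.linalg.norm(D, axis=0)            # unit-norm columns, so delta_1 = 0 as recommended above
print(block_rip_constant(D, d=2, k=2))    # delta_{2|I} over blocks of length 2
print(block_rip_constant(D, d=1, k=4))    # standard delta_4 for the same total sparsity
```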
C. Recovery Method

From our previous discussion it follows that if D = S∗A satisfies the RIP (21) with δ2k < 1, then there is a unique value of x consistent with (6). The question is how to find x in practice. Below we present an algorithm that will in principle find the unique x from the samples y. Unfortunately, though, it has exponential complexity. In the next section we show that under a stronger condition on δ2k we can recover x in a stable and efficient manner. Our first claim is that x can be uniquely recovered by solving the optimization problem

min_c ‖c‖_{0,I}  s.t.  y = S∗Ac.  (22)

To show that (22) will indeed recover the true value of x = Ac, suppose that there exists a c′ such that S∗Ac′ = y and ‖c′‖_{0,I} ≤ ‖c‖_{0,I} ≤ k. Since both c, c′ are consistent with the measurements,

0 = S∗A(c − c′) = S∗Ad,  (23)

where ‖d‖_{0,I} ≤ 2k so that d is a block 2k-sparse vector. Since S∗A satisfies (21) with δ2k < 1, we must have that d = 0 or c = c′. In principle (22) can be solved by searching over all possible sets of k subspaces, and for each such choice checking whether there exists a unique c that is consistent with the measurements. The invertibility condition (21) ensures that there is only one such c. However, clearly this approach has exponential complexity.

IV. CONVEX RECOVERY ALGORITHM

In this section we develop an efficient convex optimization problem instead of (22) to approximate x. As we show, if D = S∗A satisfies (21) with a small enough value of δ2k, then the method we propose will recover x exactly. Our approach is to minimize the sum of the energy of the representation coefficients c[i] in each of the spaces Ai. To write down the problem explicitly, we define the mixed ℓ2/ℓ1 norm over the index set I = {d1, . . . , dm} as

‖c‖_{2,I} = Σ_{i=1}^{m} ‖c[i]‖_2.  (24)

Our proposed reconstruction algorithm is

min_c ‖c‖_{2,I}  s.t.  y = S∗Ac.  (25)
Denoting the solution of (25) by c0, the reconstructed signal is then x0 = Ac0. Problem (25) is an SOCP and can therefore be solved efficiently using standard software packages. To rewrite (25) explicitly as an SOCP, we define ti = ‖c[i]‖_2. Then (25) is equivalent to

min_{c, ti} Σ_{i=1}^{m} ti  s.t.  y = S∗Ac,  ti ≥ ‖c[i]‖_2, 1 ≤ i ≤ m,  ti ≥ 0, 1 ≤ i ≤ m.  (26)
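The paper only states that (25)–(26) is an SOCP solvable by standard software; as one possible realization (a sketch of our own using the open-source CVXPY package, with hypothetical problem sizes and data, and with a generic random matrix standing in for D = S∗A), the mixed ℓ2/ℓ1 objective can be written directly as a sum of block norms:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n, m, d, k = 40, 30, 3, 2                 # n samples, m blocks of length d, k active blocks
N = m * d

D = rng.standard_normal((n, N)) / np.sqrt(n)      # stands in for S*A
c_true = np.zeros(N)
for i in rng.choice(m, size=k, replace=False):
    c_true[i * d:(i + 1) * d] = rng.standard_normal(d)
y = D @ c_true

# Problem (25): minimize sum_i ||c[i]||_2 subject to y = D c.
c = cp.Variable(N)
objective = cp.Minimize(sum(cp.norm(c[i * d:(i + 1) * d], 2) for i in range(m)))
prob = cp.Problem(objective, [D @ c == y])
prob.solve()

print(np.linalg.norm(c.value - c_true))   # close to zero when the block RIP condition holds
```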
The next theorem establishes that the solution to (25) is the true c as long as δ2k is small enough.

Theorem 1: Let y = S∗x0 be measurements of a vector x0 lying in the union ∪_{|i|=k} Ai. Let A be a set transformation corresponding to a basis for the sum of all the Ai so that x0 = Ac0 for some block k-sparse vector c0. If D = S∗A satisfies the block RIP (21) with δ2k < √2 − 1 then 1) there is a unique x = Ac with c a k-block sparse vector consistent with y; 2) the SOCP (26) has a unique solution; 3) the solution to the SOCP is equal to c0.

Before proving the theorem we note that it provides a gain over standard CS results. Specifically, it is shown in [21] that if c is k-sparse and the measurement matrix D satisfies the standard RIP with δ2k < √2 − 1, then c can be recovered exactly from the measurements y = Dc via the linear program

min_c ‖c‖_1  s.t.  y = Dc.  (27)
Since any block k-sparse vector is also M -sparse with M equal to the sum of the k largest values of di , we can find c0 of Theorem 1 by solving (27) if δ2M is small enough. However, this standard CS approach does not exploit the fact that the non-zero values appear in blocks, and not in arbitrary locations within the vector c0 . On the other hand, the SOCP (26) explicitly takes the block structure of c0 into account. Therefore, the condition of Theorem 1 is not as stringent as that obtained by using equivalence results with respect to (27). Indeed, the block RIP (21) bounds the norm of kDck over block sparse vectors c, while the standard RIP considers all possible choices of c, also those that are not 2k-block sparse. Therefore, the value of δ2k in (21) can be lower than that obtained from (20) with k = 2M . To emphasize the advantage of block RIP over standard RIP, consider the following matrix, separated into three
blocks of two columns each:
0 0 −1 1 0 0 2 −1 0 0 D= 0 3 0 −1 0 0 −1 0 1 0
1 3 · B, 1 1
(28)
where B is a diagonal matrix that results in unit-norm columns of D, i.e., B = diag(1, 15, 1, 1, 1, 12)^{−1/2}. In this example m = 3 and I = {d1 = 2, d2 = 2, d3 = 2}. Suppose that c is block-1 sparse, which corresponds to at most two non-zero values. Brute-force calculations show that the smallest value of δ2 satisfying the standard RIP (20) is δ2 = 0.866. On the other hand, the block-RIP (21) corresponding to the case in which the two non-zero elements are restricted to occur in one block is satisfied with δ1|I = 0.289. Increasing the number of non-zero elements to k = 4, we can verify that the standard RIP (20) does not hold for any δ4 ∈ [0, 1). This implies that there can be two 4-sparse vectors that result in the same measurements. In contrast, δ2|I = 0.966 satisfies the lower bound in (21), which allows 4 non-zero values grouped into two blocks. Consequently, the measurements y = Dc uniquely specify c when the non-zeros are restricted to blocks. If each of the Ai has dimension equal to 1, and is spanned by the vector ai, then our problem becomes that of reconstructing a k-sparse vector c ∈ R^m in the model y = Dc with D = S∗A, where Ac = Σ_i ai c(i). This is the
basic recovery problem treated in the CS literature. In this case, (25) is equal to (27) and the equivalence condition of Theorem 1 is equal to that obtained in [21].

Proof: Our method of proof is similar to that of [21]; the main difference lies in the fact that our algorithm relies on the mixed ℓ2/ℓ1 norm rather than the ℓ1 norm alone. We first note that δ2k < 1 guarantees uniqueness of x from Proposition 1. To prove parts 2) and 3) we show that any solution to (25) has to be equal to c0, and therefore x0 = Ac0 is unique. To this end let c′ = c0 + h be a solution of (25). The true value c0 is non-zero over at most k blocks. We denote by I0 the block indices for which c0 is nonzero, and by h_{I0} the restriction of h to these blocks. Next we decompose h as

h = Σ_{i=0}^{ℓ−1} h_{Ii},  (29)

where h_{Ii} is the restriction of h to the set Ii which consists of k blocks, chosen such that the norm of h_{I0^c} over I1 is largest, the norm over I2 is second largest and so on. Our goal is to show that h = 0. We prove this by noting that

‖h‖_2 = ‖h_{I0∪I1} + h_{(I0∪I1)^c}‖_2 ≤ ‖h_{I0∪I1}‖_2 + ‖h_{(I0∪I1)^c}‖_2.  (30)

In the first part of the proof we show that ‖h_{(I0∪I1)^c}‖_2 ≤ ‖h_{I0∪I1}‖_2. In the second part we establish that ‖h_{I0∪I1}‖_2 = 0, which completes the proof.
Part I: ‖h_{(I0∪I1)^c}‖_2 ≤ ‖h_{I0∪I1}‖_2

We begin by noting that

‖h_{(I0∪I1)^c}‖_2 = ‖Σ_{i=2}^{ℓ−1} h_{Ii}‖_2 ≤ Σ_{i=2}^{ℓ−1} ‖h_{Ii}‖_2.  (31)

Therefore, it is sufficient to bound ‖h_{Ii}‖_2 for i ≥ 2. Now,

‖h_{Ii}‖_2 ≤ k^{1/2} ‖h_{Ii}‖_{∞,I} ≤ k^{−1/2} ‖h_{Ii−1}‖_{2,I},  (32)

where we defined ‖c‖_{∞,I} = max_i ‖c[i]‖_2. The first inequality follows from the fact that for any block k-sparse c,

‖c‖_2^2 = Σ_{|i|=k} ‖c[i]‖_2^2 ≤ k ‖c‖_{∞,I}^2.  (33)

The second inequality is a result of the fact that the norm of each block in h_{Ii} is by definition smaller or equal to the norm of each block in h_{Ii−1}. Since there are at most k nonzero blocks, k ‖h_{Ii}‖_{∞,I} ≤ ‖h_{Ii−1}‖_{2,I}. Substituting (32) into (31),

‖h_{(I0∪I1)^c}‖_2 ≤ k^{−1/2} Σ_{i=1}^{ℓ−2} ‖h_{Ii}‖_{2,I} ≤ k^{−1/2} Σ_{i=1}^{ℓ−1} ‖h_{Ii}‖_{2,I} = k^{−1/2} ‖h_{I0^c}‖_{2,I},  (34)

where the equality is a result of the fact that ‖c1 + c2‖_{2,I} = ‖c1‖_{2,I} + ‖c2‖_{2,I} if c1 and c2 are non-zero on disjoint blocks. To develop a bound on ‖h_{I0^c}‖_{2,I}, note that since c′ is a solution to (25), ‖c0‖_{2,I} ≥ ‖c′‖_{2,I}. Using the fact that c′ = c0 + h_{I0} + h_{I0^c} and c0 is supported on I0 we have

‖c0‖_{2,I} ≥ ‖c0 + h_{I0}‖_{2,I} + ‖h_{I0^c}‖_{2,I} ≥ ‖c0‖_{2,I} − ‖h_{I0}‖_{2,I} + ‖h_{I0^c}‖_{2,I},  (35)

from which we conclude that

‖h_{I0^c}‖_{2,I} ≤ ‖h_{I0}‖_{2,I} ≤ k^{1/2} ‖h_{I0}‖_2.  (36)

The last inequality follows from applying Cauchy-Schwarz to any block k-sparse vector c:

‖c‖_{2,I} = Σ_{|i|=k} ‖c[i]‖_2 · 1 ≤ k^{1/2} ‖c‖_2.  (37)

Substituting (36) into (34):

‖h_{(I0∪I1)^c}‖_2 ≤ ‖h_{I0}‖_2 ≤ ‖h_{I0∪I1}‖_2,  (38)

which completes the first part of the proof.

Part II: ‖h_{I0∪I1}‖_2 = 0

We next show that h_{I0∪I1} must be equal to 0. In this part we invoke the RIP.
Since Dc0 = Dc′ = y, we have Dh = 0. Using the fact that h = hI0 ∪I1 + kDhI0 ∪I1 k22 = −
ℓ−1 X i=2
P
i≥2 hIi ,
hD(hI0 + hI1 ), DhIi i.
(39)
From the parallelogram identity and the RIP it can be shown that |hDc1 , Dc2 i| ≤ δ2k kc1 k2 kc2 k2 ,
(40)
for any two block k-sparse vectors with disjoint support. The proof is given in [21, Lemma 2.1]. Therefore, |hDhI0 , DhIi i| ≤ δ2k khI0 k2 khIi k2 ,
(41)
and similarly for hDhI1 , DhIi i. Substituting into (39), ℓ−1 X 2 hD(hI0 + hI1 ), DhIi i kDhI0 ∪I1 k2 = i=2
≤
ℓ−1 X i=2
(|hDhI0 , DhIi i| + |hDhI1 , DhIi i|)
≤ δ2k (khI0 k2 + khI1 k2 )
ℓ−1 X i=2
khIi k2 .
From the Cauchy-Schwarz inequality, any length-2 vector a satisfies a(1) + a(2) ≤
(42) √
2kak2 . Therefore,
√ q √ khI0 k2 + khI1 k2 ≤ 2 khI0 k22 + khI1 k22 = 2khI0 ∪I1 k2 ,
(43)
where the last equality is a result of the fact that hI0 and hI1 have disjoint support. Substituting into (42) and using (32), (34) and (36), kDhI0 ∪I1 k22 ≤ ≤ ≤
√ √ √
2k−1/2 δ2k khI0 ∪I1 k2 khI0c k2,I 2δ2k khI0 ∪I1 k2 khI0 k2 2δ2k khI0 ∪I1 k22 ,
(44)
where the last inequality follows from khI0 k2 ≤ khI0 ∪I1 k2 . Combining (44) with the RIP (21) we have (1 − δ2k )khI0 ∪I1 k22 ≤ kDhI0 ∪I1 k22 ≤
Since δ2k
1 + (1 + ǫ)f (r) ≤ 2e−N H(r)ǫ · e−m(d−1)H(r) .
Here, the ratio r = kd/N is fixed, f (r) = entropy function defined for 0 < q < 1.
q
N n
√
(68)
p r + 2H(r) , and H(q) = −q log q − (1 − q) log(1 − q) is the
The assumption that di = d simplifies the calculations in the proof. Following the proof, we shortly address the more difficult case in which the blocks have varying lengths. We note that Proposition 3 reduces to the result of [19] when d = 1. However, since f (r) is independent of d, it follows that for d > 1 and fixed problem dimensions n, N, r , block-RIP constants are smaller than the standard RIP constant. The second exponent in the right-hand-side
of (68) is responsible for this behavior. Proof: Let λ = (1 + ǫ)f (r) and define σ ¯ = max σmax (DT ), |T |=k,d
σ = min σmin (DT ), |T |=k,d
(69)
where σmax (DT ), σmin (DT ), are the largest and the smallest singular values of DT , respectively. We use |T | = k, d to denote a column subset of D consisting of k blocks of length d. For brevity we omit subscripts and denote
25
δ = δk|I . The inequalities in the definition of block-RIP (21) imply that 1+δ ≥σ ¯2
(70)
1 − δ ≤ σ2.
(71)
Since δ is the smallest number satisfying these inequalities we have that 1 + δ = max(¯ σ 2 , 2 − σ 2 ). Therefore, Prob
√
p 1 + δ > 1 + λ = Prob max(¯ σ2 , 2 − σ2) > 1 + λ p ≤ Prob(¯ σ > 1 + λ) + Prob( 2 − σ 2 > 1 + λ).
Noting that σ ≥ 1 − λ implies Prob
p
√
(72) (73)
2 − σ 2 ≤ 1 + λ we conclude that
σ > 1 + λ) + Prob(σ < 1 − λ). 1 + δ > 1 + λ ≤ Prob(¯
(74)
We now bound each term in the right-hand-side of (74) using a result of Davidson and Szarek [36] about the concentration of the extreme singular values of a Gaussian matrix. It was proved in [36] that an m × n matrix X with n ≥ m satisfies p 2 Prob(σmax (X) > 1 + m/n + t) ≤ e−nt /2 p 2 Prob(σmin (X) < 1 − m/n − t) ≤ e−nt /2 .
(75) (76)
Applying a union bound leads to Prob σ ¯ >1+
r
kd +t n
!
≤ ≤
X
Prob σmax (DT ) > 1 +
X
e−nt
|T |=k,d 2
/2
r
kd +t n
!
(77) (78)
|T |=k,d
m −nt2 /2 = e . k
(79)
Using the well-known bound on the binomial coefficient (for sufficiently large m) m ≤ emH(k/m) , k
we conclude that Prob σ ¯ >1+
r
kd +t n
!
(80)
2
≤ emH(k/m) e−nt
/2
.
(81)
26
To utilize this result in (74) we rearrange 1 + λ = 1 + (1 + ǫ)f (r) r = 1 + (1 + ǫ)
≥1+
r
kd + n
(82) kd + n
r
r
!
2N H(r) n
(1 + ǫ)
2N H(r) n
r
r
(83) (84)
and obtain that Prob (¯ σ > 1 + λ) ≤ Prob σ ¯ >1+
kd + n
! 2N H(r) . (1 + ǫ) n
(85)
Using (81) leads to Prob (¯ σ > 1 + λ) ≤ emH(k/m) e−
n(1+ǫ)2N H(r) 2n
(86)
= eN H(r)−m(d−1)H(r)−(1+ǫ)N H(r)
(87)
≤ e−N H(r)ǫ e−m(d−1)H(r) .
(88)
Similar arguments are used to bound the second term in (74), completing the proof. The following corollary allows us to emphasize the asymptotic behavior of block-RIP constants per given number of samples. Corollary 3: Consider the setting of Proposition 3, and define g(r) = Prob
q
q
N n
√
r+
1 + δk|I > 1 + (1 + ǫ)g(r) ≤ 2e−mH(r)ǫ .
Proof: Let λ = (1 + ǫ)g(r). The result then follows by replacing (82)-(84) with r r kd 2N + (1 + ǫ) H(r), 1+λ≥1+ n nd
p
2H(r)d−1 . Then:
(89)
(90)
which leads to Prob(¯ σ > 1 + λ) ≤ e−mH(r)ǫ . To evaluate the asymptotic behavior of block-RIP we note that for every ǫ > 0 the right-hand-side of (89) goes to zero when N = md → ∞. Consequently, this means that for fixed d △
δk|I < ρ(r)= − 1 + [1 + g(r)]2 ,
(91)
with overwhelming probability. In Fig. 4 we compute ρ(r) for several problem dimensions and compare it with standard-RIP which is obtained when d = 1. Evidently, as the non-zero entries are forced to block structure, a wider range of sparsity ratios r satisfy the condition of Theorem 1.
27
Fig. 4. The upper bound ρ(r) on δk|I as a function of the sparsity ratio r, for three sampling rates n/N = 1/2, 2/3, 3/4, and three block structures d = 1, 5, 20. The horizontal threshold is fixed at ρ* = √2 − 1, representing the threshold for equivalence derived in Theorem 1.
Fig. 5. The standard and block-RIP constants δk|I for three different dimensions n, N (panels (a)-(c): D of size 12×24, 16×24, and 18×24), plotted against the total number of non-zeros for block sizes 1, 2, 3, 4, 6, 8, 12. Each graph represents an average over 10 instances of a random matrix D. Each instance of D is scaled by a factor such that (15) is satisfied with α + β = 2.
Although Fig. 4 shows an advantage for block-RIP, the absolute sparsity ratios predicted by the theory are pessimistic, as also noted in [19], [31] in the case of d = 1. To offer a more optimistic viewpoint, the RIP and block-RIP constants were computed by brute force for several instances of D from the Gaussian ensemble. Fig. 5 plots the results and qualitatively affirms that block-RIP constants are more "likely" to be smaller than their standard RIP counterpart, even when the dimensions n, N are relatively small. Proposition 3 holds as is for matrices D from the Bernoulli ensemble, namely [D]_{ik} = ±1/√n with equal probability. In fact, the proposition is true for any ensemble for which the concentration of extreme singular values holds. The proof of Proposition 3 can be adapted to the case in which the di are not equal. In this case, the notation |T| = k, d is replaced by |T| = k|I and has the following meaning: T indicates a column subset of D consisting of k blocks from I. Since I contains variable-length blocks, |T| is not constant and depends on the particular column
subset. Consequently, in order to apply the union bounds in (77) we need to consider the worst-case scenario
corresponding to the maximal block length in I. Proposition 3 thus holds for d = max(di). However, it is clear that the resulting probability bound will not be as stringent as in the case of equal di = d, especially when the ratio max(di)/min(di) is large.

VII. CONCLUSION

In this paper, we studied the problem of recovering an unknown signal x in an arbitrary Hilbert space H, from a given set of n samples which are modelled as inner products of x with sampling functions si, 1 ≤ i ≤ n. The signal x is known to lie in a union of subspaces, so that x ∈ Vi where each of the subspaces Vi is a sum of k subspaces Ai chosen from an ensemble of m possibilities. Thus, there are (m choose k) possible subspaces in which x can lie, and a priori we do not know which subspace is the true one.
We began by determining conditions on the subspaces and the sampling functions such that x can be uniquely recovered from the given samples. The conditions depend on the combined operator S ∗ A where S ∗ is the sampling operator and A is a set transformation corresponding to a basis for the sum of all Ai . We then showed that recovering x can be reduced to a sparsity problem in which the goal is to recover a block sparse vector c from measurements y = S ∗ Ac where the non-zero values in c are grouped into blocks. To determine c we suggested a mixed ℓ2 /ℓ1 convex optimization program that takes on the form of an SOCP. Relying on the notion of block-RIP, we developed sufficient conditions under which c can be perfectly recovered using the proposed algorithm. We also proved that under the same conditions, the unknown c can be stably approximated in the presence of noise. Furthermore, if c is not exactly block-sparse, then its best block-sparse approximation can be well approached using the proposed method. We then showed that when S is chosen at random, the recovery conditions are satisfied with high probability. Specializing the results to MMV systems, we proposed a new method for sampling in MMV problems. In this approach each measurement vector depends on all the unknown vectors. As we show, this can lead to better recovery rate. Furthermore, we established equivalence results for a class of MMV algorithms based on RIP. Throughout the paper, we assumed a finite union of subspaces as well as finite dimension of the underlying spaces. An interesting future direction to explore is the extension of the ideas developed herein to the more challenging problem of recovering x in a possibly infinite union of subspaces, which are not necessary finite-dimensional. Although at first sight this seems like a difficult problem as our algorithms are inherently finite-dimensional, recovery methods for sparse signals in infinite dimensions have been addressed in some of our previous work [23], [24]. In particular, we have shown that a signal lying in a union of shift-invariant subspaces can be recovered efficiently from certain sets of sampling functions. In our future work, we intend to combine these results with those in the current paper in order to develop a more general theory for recovery from a union of subspaces.
R EFERENCES [1] S. G. Mallat, “A theory of multiresolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 11, pp. 674–693, 1989. [2] M. Unser, “Sampling—50 years after Shannon,” IEEE Proc., vol. 88, pp. 569–587, Apr. 2000. [3] Y. C. Eldar and T. Michaeli, “Beyond bandlimited sampling: Nonlinearities and smoothness.” [4] I. Djokovic and P. P. Vaidyanathan, “Generalized sampling theorems in multiresolution subspaces,” IEEE Trans. Signal Processing, vol. 45, pp. 583–599, Mar. 1997. [5] Y. C. Eldar and T. Werther, “General framework for consistent sampling in Hilbert spaces,” International Journal of Wavelets, Multiresolution, and Information Processing, vol. 3, no. 3, pp. 347–359, Sep. 2005. [6] O. Christansen and Y. C. Eldar, “Oblique dual frames and shift-invariant spaces,” Applied and Computational Harmonic Analysis, vol. 17, no. 1, 2004. [7] Y. C. Eldar, “Sampling and reconstruction in arbitrary spaces and oblique dual frame vectors,” J. Fourier Analys. Appl., vol. 1, no. 9, pp. 77–96, Jan. 2003. [8] ——, “Sampling without input constraints: Consistent reconstruction in arbitrary spaces,” in Sampling, Wavelets and Tomography, A. I. Zayed and J. J. Benedetto, Eds. Boston, MA: Birkh¨auser, 2004, pp. 33–60. [9] Y. C. Eldar and O. Christansen, “Characterization of oblique dual frame pairs,” J. Applied Signal Processing, pp. 1–11, 2006, article ID 92674. [10] C. E. Shannon, “Communications in the presence of noise,” Proc. IRE, vol. 37, pp. 10–21, Jan 1949. [11] A. J. Jerri, “The Shannon sampling theorem–Its various extensions and applications: A tutorial review,” Proc. of the IEEE, vol. 65, no. 11, pp. 1565–1596, Nov. 1977. [12] H. Nyquist, “Certain topics in telegraph transmission theory,” EE Trans., vol. 47, pp. 617–644, Jan. 1928. [13] P. P. Vaidyanathan, “Generalizations of the sampling theorem: Seven decades after Nyquist,” IEEE Trans. Circuit Syst. I, vol. 48, no. 9, pp. 1094–1109, Sep. 2001. [14] Y. C. Eldar and T. G. Dvorkind, “A minimum squared-error framework for generalized sampling,” IEEE Trans. Signal Processing, vol. 54, no. 6, pp. 2155–2167, Jun. 2006. [15] D. L. Donoho, “Compressed sensing,” IEEE Trans. on Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr 2006. [16] E. Cand`es, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006. [17] J. A. Tropp, “Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit,” Signal Process. (Special Issue on Sparse Approximations in Signal and Image Processing), vol. 86, pp. 572–588, Apr. 2006. [18] ——, “Algorithms for simultaneous sparse approximation. Part II: Convex relaxation,” Signal Process. (Special Issue on Sparse Approximations in Signal and Image Processing), vol. 86, pp. 589–602, Apr. 2006. [19] E. Cand`es and T. Tao, “Decoding by linear programming,” IEEE Trans. Inform. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005. [20] E. J. Cand`es, J. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Comm. Pure and Appl. Math., vol. 59, no. 8, pp. 1207–1223, Mar. 2006. [21] E. Cand`es, “The restricted isometry property and its implications for compressed sensing,” C. R. Acad. Sci. Paris, Ser. I, vol. 346, pp. 589–592, 2008. [22] M. Mishali and Y. C. Eldar, “Blind multi-band signal reconstruction: Compressed sensing for analog signals,” CCIT Report no. 
639, EE Dept., Technion - Israel Institute of Technology; submitted to IEEE Trans. Signal Process., Sep. 2007. [23] ——, “Reduce and boost: Recovering arbitrary sets of jointly sparse vectors,” to appear in IEEE Trans. Signal Processing, 2008. [24] Y. C. Eldar, “Compressed sensing of analog signals,” submitted to IEEE Trans. Signal Processing, 2008. [25] M. V. Vetterli, P. Marziliano, and T. Blu, “Sampling signals with finite rate of innovation,” IEEE Trans. Signal Processing, vol. 50, pp. 1417–1428, June 2002. [26] P. Dragotti, M. Vetterli, and T. Blu, “Sampling moments and reconstructing signals of finite rate of innovation: Shannon meets StrangFix,” IEEE Trans. Sig. Proc., vol. 55, no. 5, pp. 1741–1757, May 2007. [27] Y. M. Lu and M. N. Do, “A theory for sampling signals from a union of subspaces,” IEEE Trans. Signal Process, submitted for publication, 2007. [28] M. Stojnic, F. Parvaresh, and B. Hassibi, “On the reconstruction of block-sparse signals with an optimal number of measurements,” arxiv 0804.0041, Mar. 2008. [29] H. Vikalo, F. Parvaresh, S. Misra, and B. Hassibi, “Recovering sparse signals using sparse measurement matrices in compressed DNA microarrays,” Asilomor conference, November, 2007. [30] F. Parvaresh, H. Vikalo, S. Misra, and B. Hassibi, “Recovering sparse signals using sparse measurement matrices in compressed dna microarrays,” Selected Topics in Signal Processing, IEEE Journal of, vol. 2, no. 3, pp. 275–285, June 2008. [31] E. Cand`es and T. Tao, “Near optimal signal recovery from random projections: Universal encoding strategies?” IEEE Trans. Inform. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006. [32] S. F. Cotter, B. D. Rao, K. Engan, and K. Kreutz-Delgado, “Sparse solutions to linear inverse problems with multiple measurement vectors,” IEEE Trans. Signal Processing, vol. 53, no. 7, pp. 2477–2488, July 2005. [33] J. Chen and X. Huo, “Theoretical results on sparse representations of multiple-measurement vectors,” IEEE Trans. Signal Processing, vol. 54, no. 12, pp. 4634–4643, Dec. 2006. [34] T. G. Dvorkind, Y. C. Eldar, and E. Matusiak, “Nonlinear and non-ideal sampling: Theory and methods,” submitted to IEEE Trans. Signal Processing. [35] D. Malioutov, M. Cetin, and A. S. Willsky, “A sparse signal reconstruction perspective for source localization with sensor arrays,” IEEE Trans. Signal Processing, vol. 53, no. 8, pp. 3010–3022, Aug. 2005. [36] S. J. Szarek, “Condition numbers of random matrices,” J. Complexity, vol. 7, no. 2, pp. 131–149, 1991.