DECENTRALIZED SUPPORT DETECTION OF MULTIPLE MEASUREMENT VECTORS WITH JOINT SPARSITY

Zhi Tian†    Qing Ling*

† Department of Electrical and Computer Engineering, Michigan Technological University, Houghton, Michigan, USA
* Department of Automation, University of Science and Technology of China, Hefei, Anhui, China

Qing Ling is supported in part by the NSFC grant #61004137. Zhi Tian is supported in part by the NSF grant #ECS-0925881.

ABSTRACT

This paper considers the problem of finding sparse solutions from multiple measurement vectors (MMVs) with joint sparsity. The solutions share the same sparsity structure, and the locations of the common nonzero support carry important information about signal features. When the measurement vectors are collected from spatially distributed users, the issue of decentralized support detection arises. This paper develops a decentralized row-based Lasso (DR-Lasso) algorithm for the distributed MMV problem. A penalty term on row-based total energy is introduced to enforce joint sparsity across the MMVs, and consensus constraints are formulated so that users can agree on the total energy, and hence on the common nonzero support, in a decentralized manner. As an illustrative example, the problem of cooperative spectrum occupancy detection is solved in the context of wideband cognitive radio networks.

Index Terms— multiple measurement vectors, joint sparsity, support detection, decentralized row-based Lasso

1. INTRODUCTION

The problem of finding sparse solutions from multiple measurement vectors (MMVs) with joint sparsity arises in many signal processing applications [1, 2, 3]. These measurement vectors share the same sparsity structure in the sense that the solution representing each measurement vector has a small number of nonzero entries with the same nonzero support as the other solutions, but the locations of the nonzero support are unknown and the amplitudes of the solutions differ. One of the early motivating applications of the MMV problem is magnetoencephalography (MEG) for brain imaging [2]. For MEG signals measured over multiple snapshots, the activation magnitudes of the brain change, but the locations of the activation sites do not. The MMV problem thus boils down to recovering multiple sparse signals with the same sparsity structure.

In recent years, distributed measurement systems have become prevalent, in which a network of users collaboratively detects and estimates signals with joint sparsity. In such a network, measurement vectors are



collected at distributed sites and hence are not necessarily centrally available for solving the inverse problem; this gives rise to the distributed MMV problem. Examples abound, such as distributed coding and decoding for compressed sensing in wireless sensor networks [3] and cooperative spectrum sensing for wideband cognitive radio networks [4].

This paper investigates the distributed MMV problem and develops a decentralized support detection algorithm. Most existing work focuses on the centralized MMV problem, relying on a fusion center where all measurement vectors are available [1, 2, 3]. In a large-scale distributed and networked measurement system where scalability in computation and communication is important, each user has to make its own decision and can only communicate with its neighbors within a one-hop transmission range. For such an infrastructureless network, this paper derives a decentralized and iterative solution that implements a joint optimization approach to simultaneous support detection and multiple sparse signal recovery. A total-energy-related quantity is introduced for decentralized support detection from the MMVs, and consensus-enforcing constraints are formulated such that users consent on this quantity (and hence on the common nonzero support) in a decentralized manner. The proposed decentralized algorithm can be shown to converge to the globally optimal solution.

2. SIGNAL MODEL

Consider the problem of sparse signal recovery in noisy linear systems. In an MMV problem, a set of N measurement vectors $y_n \in \mathbb{R}^K$, collected at N networked users, are generated from N unknown sparse signal vectors $s_n \in \mathbb{R}^M$, $n = 1, 2, \ldots, N$, as follows:

$$ \text{(MMV)}: \quad y_n = A_n s_n + w_n, \quad n = 1, \ldots, N, \qquad (1) $$

where $A_n \in \mathbb{R}^{K \times M}$ and $w_n \in \mathbb{R}^K$ are the corresponding sampling matrix and additive noise, respectively. When the $\{s_n\}_n$ share the same sparsity structure, the unknown signal matrix $S = [s_1, \ldots, s_N] \in \mathbb{R}^{M \times N}$ has a sparse representation: $S$ has a small number of rows that contain nonzero entries, which is measured by the mixed $\ell_p/\ell_q$-norm of $S$ for $p \in [0, 2)$ (in the following, $S_{[m,:]}$ denotes the m-th row of $S$ and $s_{mn}$ the (m, n)-th entry of $S$):

$$ \|S\|_{p,q} = \left\| \left[ \|S_{[1,:]}\|_q, \ldots, \|S_{[M,:]}\|_q \right]^T \right\|_p = \left( \sum_{m=1}^{M} \Big( \sum_{n=1}^{N} |s_{mn}|^q \Big)^{p/q} \right)^{1/p} . $$
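To make the mixed norm concrete, here is a minimal numpy sketch (our illustration; the function name `mixed_norm` is ours, not from the paper) that evaluates $\|S\|_{p,q}$ for the typical choice (p, q) = (1, 2):

```python
import numpy as np

def mixed_norm(S, p=1, q=2):
    """Mixed l_p/l_q-norm: the l_p-norm of the vector of row-wise l_q-norms."""
    row_norms = np.linalg.norm(S, ord=q, axis=1)  # ||S_[m,:]||_q for each row m
    return np.linalg.norm(row_norms, ord=p)       # l_p-norm across the M rows

# A row-sparse S (M = 5 rows, N = 3 users): only rows 1 and 3 are nonzero,
# so ||S||_{1,2} is the sum of two row-wise l_2-norms.
S = np.zeros((5, 3))
S[1, :] = [0.9, -1.1, 0.4]
S[3, :] = [2.0, 1.5, -0.7]
print(mixed_norm(S))
```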



A typical choice is (p, q) = (1, 2), which renders $\|S\|_{p,q}$ convex. When N = 1 and p = 1, the $\ell_p/\ell_q$-norm reduces to the $\ell_1$-norm $\|s\|_1$ of a vector s, and the MMV problem in (1) reduces to a single measurement vector (SMV) problem that has been extensively studied in the compressed sensing literature [5]. The least absolute shrinkage and selection operator (Lasso) can then be used to recover s by imposing $\ell_1$-norm regularization on the least-squares formulation [6]. Exploiting the joint sparsity of the rows of S, the MMV problem can be solved by a row-based Lasso (R-Lasso) formulation:

$$ \text{(R-Lasso)}: \quad \min_S \; \frac{1}{2} \sum_{n=1}^{N} \|y_n - A_n s_n\|_2^2 + N \lambda \|S\|_{p,q} . \qquad (2) $$

This formulation acts like the Lasso on a row-by-row basis: depending on λ, an entire row of S may be shrunk to zero; meanwhile, if a row is selected, all N entries in that row will be nonzero. Note that the R-Lasso formulation for the MMV problem differs from the group Lasso (G-Lasso) formulation in [7] for group selection. In the group Lasso, either a single or multiple measurement vectors are collected to measure a single signal source s. The vector s has a group sparsity structure: among the G (pre-defined) groups of entries in s, a small number of groups are nonzero, and the entries in each group are either all nonzero or all zero. Correspondingly, the penalty term for group sparsity is an $\ell_p/\ell_q$-norm on s, in which an $\ell_q$-norm is imposed within each group and an $\ell_p$-norm across groups [7]. When G = 1 and p = 1, the G-Lasso becomes the Lasso for the SMV problem.

Despite the similarity between the G-Lasso and the R-Lasso, their decentralized implementations are quite different. For the G-Lasso, a decentralized algorithm has been developed under the consensus optimization framework [8, 9]. Therein, each user n keeps a local copy $s^{(n)}$ of the common unknown signal s, and seeks to consent on s with its one-hop neighbors through consensus-enforcing constraints on $s^{(n)}$. Essentially, it solves an SMV problem, and can further be cast as a decentralized estimation problem when MMVs are collected. In the R-Lasso formulation, however, the signals $\{s_n\}_n$ to be recovered by individual users are not the same, which makes it impossible to directly impose consensus constraints on $\{s_n\}_n$. User n has to recover its own signal $s_n$, while collaboratively deciding which rows of $s_n$ are nonzero. As a result, the R-Lasso is an MMV problem that requires decentralized support detection.
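As a small companion to (2), the following sketch (ours; it assumes the `mixed_norm` helper defined above) evaluates the R-Lasso objective for given data, the quantity the algorithms below drive down:

```python
import numpy as np

def r_lasso_objective(ys, As, S, lam):
    """Objective of (2): 0.5 * sum_n ||y_n - A_n s_n||_2^2 + N*lambda*||S||_{1,2}."""
    N = S.shape[1]
    fit = 0.5 * sum(float(np.sum((ys[n] - As[n] @ S[:, n])**2)) for n in range(N))
    return fit + N * lam * mixed_norm(S, p=1, q=2)
```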


3. ALGORITHM DEVELOPMENT

This section solves the R-Lasso in (2) in a decentralized manner. Suppose that a large number N of users are spatially distributed over a large field. In the absence of a fusion center, each user n can only exchange information with its one-hop neighbors, denoted by the set $\mathcal{N}_n$.

3.1. Support Detection Recast as Energy Detection

Setting p = 1 and q = 2, the mixed-norm minimization problem in (2) becomes

$$ \min_S \; \frac{1}{2} \sum_{n=1}^{N} \|y_n - A_n s_n\|_2^2 + N \lambda \sum_{m=1}^{M} \sqrt{\sum_{n=1}^{N} s_{mn}^2} . \qquad (3) $$

Solving (3) in a decentralized manner is nontrivial, because the $\ell_p/\ell_q$-norm penalty term in (3) is not separable with respect to user n, which renders many decentralized optimization techniques inapplicable [10]. Meanwhile, the first least-squares term is completely separable, and alone offers no mechanism for cooperation among the users.

To address this issue, we introduce a new vector $r \in \mathbb{R}^M$, with its m-th element defined as $r_m = \sum_{n=1}^{N} s_{mn}^2$. Clearly, r is a function of S, and its elements are nonnegative. In fact, each $r_m$ is the total energy collected from all signals on their m-th rows. In this sense, the R-Lasso penalizes each row of all signals based on the total energy, and hence bears resemblance to a decentralized energy detection problem. Now (3) becomes

$$ \min \; \frac{1}{2} \sum_{n=1}^{N} \|y_n - A_n s_n\|_2^2 + N \lambda \mathbf{1}^T \sqrt{r} , \quad \text{s.t.} \;\; r_m = \sum_{n=1}^{N} s_{mn}^2 , \; \forall m , \qquad (4) $$

where $\mathbf{1}$ is the all-one vector of length M and $\sqrt{r}$ is taken elementwise. To make (4) solvable in a decentralized way, we let each user n keep a local copy of r, denoted $r^{(n)}$, and let the local copies of one-hop neighbors consent to the same value. If the network of users is connected, the following consensus optimization problem is equivalent to (4):

$$ \min \; \frac{1}{2} \sum_{n=1}^{N} \|y_n - A_n s_n\|_2^2 + \lambda \sum_{n=1}^{N} \mathbf{1}^T \sqrt{r^{(n)}} , \qquad (5a) $$
$$ \text{s.t.} \;\; r_m^{(n)} = \sum_{n'=1}^{N} s_{mn'}^2 , \quad \forall m, \; \forall n , \qquad (5b) $$
$$ r^{(n)} = r^{(n')} , \quad \forall n, \; \forall n' \in \mathcal{N}_n . \qquad (5c) $$
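As a quick sanity check on the energy-based reformulation (our own illustration, not from the paper), the sketch below computes r from S and verifies that the penalty $N \lambda \mathbf{1}^T \sqrt{r}$ in (4) coincides with the mixed-norm penalty $N \lambda \|S\|_{1,2}$ in (3):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, lam = 20, 6, 0.1

# Row-sparse S: rows 5 and 11 form the common nonzero support.
S = np.zeros((M, N))
S[[5, 11], :] = rng.normal(size=(2, N))

r = np.sum(S**2, axis=1)                  # r_m = sum_n s_mn^2, the row-wise total energy
penalty_r = N * lam * np.sum(np.sqrt(r))  # N*lambda*1^T sqrt(r), as in (4)
penalty_norm = N * lam * np.sum(np.linalg.norm(S, axis=1))  # N*lambda*||S||_{1,2}, as in (3)
assert np.isclose(penalty_r, penalty_norm)
```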

3.2. A Suboptimal Algorithm

Note that (5) is non-convex due to the nonlinear total energy constraints (5b). This paper proposes a suboptimal iterative algorithm. Each iteration contains two stages: one for updating $\{s_n\}_n$ and another for updating $\{r^{(n)}\}_n$.

In the first stage, to update $\{s_n\}_n$, consider the unconstrained optimization problem (5a) and treat each $r^{(n)}$ as a function of $s_n$ through (5b). For the m-th element $s_{mn}$ of $s_n$, the optimality condition is

$$ -a_{nm}^T \Big( y_n - \sum_{l=1}^{M} a_{nl} s_{ln} \Big) + \lambda \frac{\partial \sqrt{r_m^{(n)}}}{\partial s_{mn}} = 0 , \qquad (6) $$

where $a_{nm}$ is the m-th column of $A_n$, and $\partial \sqrt{r_m^{(n)}} / \partial s_{mn}$ is the subgradient of $\sqrt{r_m^{(n)}}$ with respect to $s_{mn}$.

From (5b), if $r_m^{(n)} = 0$, then

$$ \left| \frac{\partial \sqrt{r_m^{(n)}}}{\partial s_{mn}} \right| \le 1 . \qquad (7) $$

Otherwise, if $r_m^{(n)} \neq 0$, then

$$ \frac{\partial \sqrt{r_m^{(n)}}}{\partial s_{mn}} = \frac{s_{mn}}{\sqrt{r_m^{(n)}}} . \qquad (8) $$

Plugging (7) and (8) into (6), we can derive the optimality condition for $s_{mn}$, which falls into one of the following cases:

Case 1: if $r_m^{(n)} \neq 0$, then

$$ s_{mn} = \frac{a_{nm}^T \big( y_n - \sum_{l \neq m} a_{nl} s_{ln} \big)}{a_{nm}^T a_{nm} + \lambda / \sqrt{r_m^{(n)}}} ; \qquad (9) $$

Case 2: if $r_m^{(n)} = 0$ and $\big[ a_{nm}^T \big( y_n - \sum_{l \neq m} a_{nl} s_{ln} \big) \big]^2 < \lambda$, then

$$ s_{mn} = 0 ; \qquad (10) $$

Case 3: if $r_m^{(n)} = 0$ and $\big[ a_{nm}^T \big( y_n - \sum_{l \neq m} a_{nl} s_{ln} \big) \big]^2 \ge \lambda$, then

$$ s_{mn} = \frac{a_{nm}^T \big( y_n - \sum_{l \neq m} a_{nl} s_{ln} \big) + \gamma_{mn}}{a_{nm}^T a_{nm}} , \qquad (11) $$

where $\gamma_{mn}$ lies between $\pm \lambda$.

The optimality condition for $s_{mn}$ in (6) is inexact, since each $r_m^{(n')}$, $n' \neq n$, is also a function of $s_n$. However, in this stage each user n can treat any $r_m^{(n')}$, $n' \neq n$, as a constant after collecting it; the influence of this approximation is marginal. Therefore, each user n can iteratively apply (9), (10), and (11) in an inner loop to approximately solve for $s_n$; this is a coordinate descent subroutine.
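A minimal sketch of this inner loop for one user n is given below (our illustration; variable names are ours). It treats the row energies $r^{(n)}$ as constants collected before the sweep, and, since the paper leaves $\gamma_{mn}$ unspecified beyond $|\gamma_{mn}| \le \lambda$, it uses a lasso-style soft-threshold choice in Case 3 as an assumption:

```python
import numpy as np

def update_s_n(y, A, s, r, lam):
    """One coordinate-descent sweep over s_n per (9)-(11).

    y: (K,) local measurements; A: (K, M) local sampling matrix;
    s: (M,) current local estimate; r: (M,) local copy of the row-wise
    total energies, treated as constants during the sweep.
    """
    for m in range(A.shape[1]):
        a_m = A[:, m]
        # Correlation with the residual that excludes column m's own contribution.
        z = a_m @ (y - A @ s + a_m * s[m])
        if r[m] > 0:                       # Case 1, eq. (9)
            s[m] = z / (a_m @ a_m + lam / np.sqrt(r[m]))
        elif z**2 < lam:                   # Case 2, eq. (10)
            s[m] = 0.0
        else:                              # Case 3, eq. (11), with the assumed
            s[m] = (z - lam * np.sign(z)) / (a_m @ a_m)  # choice gamma = -lam*sign(z)
    return s
```

Running $T_1$ such sweeps constitutes Step 1 of the DR-Lasso iteration summarized in Table 1 below.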

In the second stage, after the users calculate $\{s_n\}_n$, the local copies $\{r^{(n)}\}_n$ should be updated according to the total energy constraints (5b) and the consensus constraints (5c). This constraint satisfaction problem can be recast as a consensus summation formulation:

$$ \min \; \frac{1}{2} \sum_{n=1}^{N} \big\| N s_n^2 - r^{(n)} \big\|_2^2 , \qquad (12a) $$
$$ \text{s.t.} \;\; r^{(n)} = r^{(n')} , \quad \forall n, \; \forall n' \in \mathcal{N}_n , \qquad (12b) $$

where $s_n^2$ denotes the elementwise square of $s_n$. At consensus, each $r^{(n)}$ equals the network-wide average of $\{N s_n^2\}_n$, which is exactly $\sum_{n=1}^{N} s_n^2$, as required by (5b).

We propose to apply the augmented Lagrangian method [8, 9, 10] to solve (12); this subroutine also contains an inner loop. In each of its iterations, each user n first updates $r^{(n)}$ using the $s_n$ it solved for previously and the neighboring local copies $r^{(n')}$, $n' \in \mathcal{N}_n$, and then updates a vector of Lagrange multipliers. The updating rule for $r^{(n)}$ is

$$ r^{(n)} \leftarrow \left[ \frac{c |\mathcal{N}_n| \big( r^{(n)} + \bar{r}^{(n)} \big) + N s_n^2 - |\mathcal{N}_n| \beta_n}{1 + 2 c |\mathcal{N}_n|} \right]_+ , \qquad (13) $$

where $[\cdot]_+ = \max(0, \cdot)$ denotes the nonnegative mapping, $|\mathcal{N}_n|$ denotes the cardinality of $\mathcal{N}_n$, $\bar{r}^{(n)} = \frac{1}{|\mathcal{N}_n|} \sum_{n' \in \mathcal{N}_n} r^{(n')}$, $\beta_n \in \mathbb{R}^M$ is the vector of Lagrange multipliers associated with the consensus constraints of user n, and c is a positive constant. The updating rule for $\beta_n$ is

$$ \beta_n \leftarrow \beta_n + c \big( r^{(n)} - \bar{r}^{(n)} \big) . \qquad (14) $$

Interested readers are referred to [8, 9, 10] for the derivation of this subroutine.

Table 1. DR-Lasso: Decentralized Row-based Lasso
Initialization: each user n collects its measurement vector $y_n$, sets the local copy of r as $r^{(n)} \leftarrow 0$, and sets the estimate of the signal vector as $s_n \leftarrow 0$, $\forall n$.
Iteration: at each iteration, each user n performs Steps 1-2.
  Step 1: update $s_n$ in an inner loop with $T_1$ iterations, according to (9), (10), and (11).
  Step 2: update $r^{(n)}$ in an inner loop with $T_2$ iterations, according to (13) and (14).
Decision: after T iterations, each user n locally makes a binary support decision by thresholding $r^{(n)}$; meanwhile, the estimate of its signal vector is $s_n$.

3.3. Decentralized Support Detection

The decentralized R-Lasso (DR-Lasso) algorithm for support detection is summarized in Table 1. The computation burden at each iteration is low, involving only the local measurement vector $y_n$ and the local sampling matrix $A_n$ for each user n. As for the communication burden of each user n, the inner loop in Step 1 requires only local information and hence incurs no communication. At each iteration of the inner loop in Step 2, only the local copy $r^{(n)}$ needs to be broadcast to the one-hop neighbors. Therefore, the overall communication burden for each user n is $M \times T \times T_2$. To further reduce the computation and communication burdens, the inner loops in Steps 1 and 2 can be run inexactly, namely with fewer iterations. Indeed, computing the total energy exactly at the beginning of the algorithm is unnecessary, since the signal estimates are still inaccurate at that point, and vice versa.
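A sketch of the Step-2 inner loop, under our own naming and with the neighbor broadcast emulated by direct array reads (the network topology is an assumption; the paper only requires connectivity), follows:

```python
import numpy as np

def consensus_energy_update(r, beta, s, neighbors, c=0.5, T2=20):
    """Inner loop of Step 2: updates (13) and (14) for all users.

    r, beta, s: (N, M) arrays; user n holds row n of each.
    neighbors: list of neighbor index lists, one per user.
    """
    N = r.shape[0]
    deg = np.array([len(nb) for nb in neighbors], dtype=float)[:, None]  # |N_n|
    for _ in range(T2):
        # r_bar^{(n)}: average of the copies broadcast by user n's neighbors.
        r_bar = np.array([r[nb].mean(axis=0) for nb in neighbors])
        # Primal update (13), projected onto the nonnegative orthant.
        r = np.maximum(0.0, (c * deg * (r + r_bar) + N * s**2 - deg * beta)
                            / (1.0 + 2.0 * c * deg))
        # Dual update (14), using the freshly broadcast copies.
        r_bar = np.array([r[nb].mean(axis=0) for nb in neighbors])
        beta = beta + c * (r - r_bar)
    return r, beta

# Example topology (an assumption): a ring, giving each user two one-hop neighbors.
# neighbors = [[(n - 1) % N, (n + 1) % N] for n in range(N)]
```

The outer DR-Lasso loop of Table 1 simply alternates $T_1$ sweeps of the earlier `update_s_n` sketch with this routine, and finally thresholds each $r^{(n)}$ for the binary support decision.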


[Figure 1: plot of Probability of Detection versus Probability of False Alarm for three schemes: Lasso + ED (two-step), R-Lasso (centralized), and DR-Lasso (decentralized).]

Fig. 1. ROC curves for support detection. The compression ratio in sampling is K/M = 50%.

[Figure 2: plot of the estimates of elements #5 and #11 versus iteration (0 to 100), for one cognitive radio.]

Fig. 2. Estimates of nonzero elements #5 and #11 converge, taking one cognitive radio as an example.

4. SIMULATIONS

Consider N = 6 distributed users. Let $s \in \mathbb{R}^M$ be a sparse vector with M = 20 and sparsity order I = 2. For each user n, the related quantities are generated as follows: $s_n = H_n s$, $A_n = Q_n F$, and $y_n = A_n s_n + w_n$, where $H_n \in \mathbb{R}^{M \times M}$ is a diagonal matrix whose diagonal elements are i.i.d. Rayleigh distributed, F is the M-point inverse discrete Fourier transform matrix, and $Q_n \in \mathbb{R}^{K \times M}$ is a random Gaussian matrix whose elements are i.i.d. with zero mean and unit variance. Due to the diagonal structure of $H_n$, all $\{s_n\}_n$ share the same nonzero support as the sparse vector s. When K < M, $Q_n$ acts as a random compressed-sensing matrix.

This simulation setup describes a cooperative spectrum sensing problem in a cognitive radio network [4]. The transmitted spectrum of the primary users is modeled by s, which is sparse due to low spectrum utilization. Without the channel knowledge $H_n$, cognitive radio n only seeks to recover its own received spectrum $s_n$. To do so, the MMVs $\{y_n\}_n$ are collected in the time domain through sub-Nyquist-rate sampling, reflected in $Q_n$. From $\{y_n\}_n$, the network of cognitive radios seeks to detect the common nonzero support of $\{s_n\}_n$, and hence the spectrum occupancy of the primary users; a nonzero row corresponds to an occupied frequency channel. In this regard, the figure of merit is the receiver operating characteristic (ROC) in detecting the nonzero support.

Fig. 1 depicts the ROC curves for K = 10 and a signal-to-noise ratio (SNR) of −5 dB. As a benchmark, a conventional two-step algorithm is also tested, in which each node first recovers $s_n$ locally from $y_n$ as an SMV problem using the Lasso, without cooperation; local estimates are then collected to compute the total energy and make a cooperative support decision via energy detection (ED). This benchmark is suboptimal, because the individually recovered sparse signals in the first step may have different nonzero supports. In contrast, our R-Lasso solutions perform support detection and sparse signal recovery jointly, resulting in an evident performance gain. The decentralized version DR-Lasso converges to near the centralized optimal solution. Taking one cognitive radio as an example, the convergence of its estimates is shown in Fig. 2: the estimates of the nonzero elements #5 and #11 converge to near their true values, while the estimates of the other elements converge to zero.
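The data generation in this setup can be sketched as follows (a minimal reproduction under the stated parameters; `noise_std`, the Rayleigh scale, and keeping the real part of $Q_n F$ so that $A_n$ stays real are our assumptions, the first being set in the paper by the −5 dB SNR instead):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K, I = 6, 20, 10, 2   # users, spectrum length, samples per user, sparsity order
noise_std = 0.1             # assumption; the paper specifies SNR = -5 dB instead

# Common sparse spectrum s of the primary users (I occupied channels).
s = np.zeros(M)
s[rng.choice(M, size=I, replace=False)] = rng.normal(size=I)

F = np.fft.ifft(np.eye(M))  # M-point inverse DFT matrix
ys, As, ss = [], [], []
for n in range(N):
    H = np.diag(rng.rayleigh(scale=1.0, size=M))  # i.i.d. Rayleigh channel gains
    Q = rng.normal(size=(K, M))                   # i.i.d. Gaussian compression, K < M
    s_n = H @ s                                   # received spectrum of user n
    A_n = np.real(Q @ F)                          # real part kept: the paper has A_n real
    y_n = A_n @ s_n + noise_std * rng.normal(size=K)
    ss.append(s_n); As.append(A_n); ys.append(y_n)
```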


5. REFERENCES

[1] J. Chen and X. Huo, "Theoretical results on sparse representations of multiple-measurement vectors," IEEE Trans. on Signal Processing, vol. 54, pp. 4634-4643, 2006.
[2] S. Cotter, B. Rao, K. Engan, and K. Kreutz-Delgado, "Sparse solutions to linear inverse problems with multiple measurement vectors," IEEE Trans. on Signal Processing, vol. 53, pp. 2477-2488, 2005.
[3] M. Duarte, S. Sarvotham, M. Wakin, D. Baron, and R. Baraniuk, "Sparsity models for distributed compressed sensing," Proc. of ICASSP, 2005.
[4] F. Zeng, C. Li, and Z. Tian, "Distributed compressive spectrum sensing in cooperative multi-hop wideband cognitive networks," IEEE Journal of Selected Topics in Signal Processing, to appear.
[5] D. Donoho, "Compressed sensing," IEEE Trans. on Information Theory, vol. 52, pp. 1289-1306, 2006.
[6] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B, vol. 58, pp. 267-288, 1996.
[7] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B, vol. 68, pp. 49-67, 2006.
[8] G. Mateos, J. Bazerque, and G. Giannakis, "Distributed sparse linear regression," IEEE Trans. on Signal Processing, vol. 58, pp. 5262-5276, 2010.
[9] J. Bazerque, G. Mateos, and G. Giannakis, "Group-lasso on splines for spectrum cartography," IEEE Trans. on Signal Processing, submitted. arXiv:1010.0274v1.
[10] D. Bertsekas and J. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, second edition, Athena Scientific, 1997.