COMPRESSIVE SENSING SIGNAL RECONSTRUCTION BY WEIGHTED MEDIAN REGRESSION ESTIMATES

Jose L. Paredes
Electrical Engineering Department, Universidad de Los Andes, Venezuela. [email protected]

Gonzalo R. Arce∗
Dept. of Electrical and Computer Eng., University of Delaware, USA. [email protected]

ABSTRACT

In this paper, we address the compressive sensing signal reconstruction problem by solving an ℓ0-regularized Least Absolute Deviation (LAD) regression problem. A coordinate descent algorithm is developed to solve this ℓ0-LAD optimization problem, leading to a two-stage operation for signal estimation and basis selection. In the first stage, an estimate of the sparse signal is found by a weighted median operator acting on a shifted-and-scaled version of the measurement samples, with weights taken from the entries of the projection matrix. The resulting estimate is then passed to the second stage, which decides whether the corresponding entry is relevant or not. This stage is implemented by a hard-threshold operator whose thresholding parameter is suitably tuned as the algorithm progresses.

Index Terms— Compressive sensing, weighted median, linear regression, sparse signal reconstruction.

1. INTRODUCTION

Consider an N-dimensional, K-sparse signal X that is remotely sensed by projecting it onto an M × N random basis A. Furthermore, assume that the measurements are noisy, as described by Y = AX + ξ, where Y is an M-dimensional measurement vector with K < M ≪ N, A is the measurement matrix with i.i.d. random entries following a Gaussian or a Bernoulli distribution, and ξ = [ξ1, ξ2, ..., ξM]^T is the noise vector with i.i.d. components obeying a common distribution. In this model, the noise vector comprises the noise introduced during the sensing and transmission processes, including, among other sources, thermal noise, quantization, the sensor system, and the communication channel. Furthermore, the noise may contain outliers and is therefore better characterized by a distribution with heavier-than-Gaussian tails.

The theory of compressive sensing (CS) [1] has shown that random projections of a sparse signal preserve its most salient information in just a few measurements, so that, with high probability, the sparse signal can be recovered from the low-dimensional representation Y by solving an inverse problem. Since the number of measurements is in general much smaller than the signal dimension, the inverse problem is underdetermined and has multiple solutions. Furthermore, since the measurement signal is contaminated by noise, the recovery process reduces to finding an approximation of the signal X such that a performance criterion is achieved under the constraint that the signal is sparse in some dictionary Ψ of basis functions or tight frames.

∗ This work was supported in part by the NSF under Grant ECCS-0725422, by ONR under Grant N00014-07-1-0393, and by the CDCHT-ULA.


A common criterion widely used in the CS literature is to reconstruct the sparse signal by minimizing the norm of a residual error subject to the constraint that the signal is sparse [1]. Formally [2],

$$
\hat{X} = \arg\min_{X} \|Y - AX\|_p + \tau \|X\|_0,
$$

where ‖·‖p denotes the ℓp-norm and τ is a regularization parameter that balances the conflicting goals of minimizing the data-fitting term while, at the same time, yielding a sparse solution for X through the sparsity-inducing term.

The choice of norm for the data-fitting term is closely related to the statistical characterization of the noise. If the underlying contamination follows a Gaussian model, the ℓ2-norm is the optimum choice under the maximum likelihood (ML) principle, leading to a regularized least squares (LS) regression problem for the recovery of the sparse signal X [1, 2, 3]. However, it is well known that LS-based estimators are highly sensitive to outliers in the measurement vector, resulting in poor performance when the noise no longer follows the Gaussian assumption but is instead better characterized by distributions with heavier-than-Gaussian tails.

In this paper, we propose to use the ℓ1-norm for the data-fitting term to mitigate the effect of impulsive noise in the compressive measurements, reducing the signal recovery to solving an ℓ0-regularized least absolute deviation (ℓ0-LAD) problem. Unlike LS-based estimators, LAD offers robustness to a broad class of noise, being optimum under the ML principle when the underlying contamination follows a Laplacian model. Solving an ℓ0-regularized LAD problem leads inevitably to an NP-hard optimization problem whose direct solution is infeasible even for modest-sized signals. To overcome this limitation, we use a coordinate descent approach in which each component of the sparse signal is estimated by a weighted median operator acting on a shifted-and-scaled version of the measurement samples, with weights taken from the entries of the measurement matrix. Once each component is estimated, a hard-thresholding operator induced by the sparsity term is applied to the estimated value to decide whether the component is relevant or not. This two-stage operation, a WM operator followed by a hard-threshold operator, adds the desired robustness to the estimation of the sparse signal and, at the same time, ensures the sparsity of the solution.

2. ℓ0-LAD PROBLEM: ONE-DIMENSIONAL CASE

Consider the following one-dimensional minimization problem

$$
\arg\min_{\alpha} \sum_{i=1}^{M} |Y_i - a_i \alpha| + \tau_0 |\alpha|_0, \qquad (1)
$$

where Yi is the i-th observation sample, a = [a1, a2, ..., aM]^T is a vector of known coefficients, and τ0 is a regularization parameter. We are interested in finding the value of α that minimizes the LAD cost subject to the constraint imposed by the ℓ0 term.


Theorem 1 (ℓ0-Least Absolute Deviation (ℓ0-LAD)). The solution to the minimization problem (1) is given by

$$
\hat{\alpha} =
\begin{cases}
\tilde{\alpha} = \operatorname{MEDIAN}\left( |a_i| \diamond \dfrac{Y_i}{a_i} \,\Big|_{i=1}^{M} \right) & \text{if } \|\mathbf{Y}\|_1 - \|\mathbf{Y} - \tilde{\alpha}\mathbf{a}\|_1 > \tau_0, \\[6pt]
0 & \text{otherwise,}
\end{cases}
$$

where the symbol $\diamond$ denotes the replication operator defined as $W_i \diamond Z_i = \underbrace{Z_i, Z_i, \ldots, Z_i}_{W_i \text{ times}}$ and $\mathbf{Y} = [Y_1, Y_2, \ldots, Y_M]^T$. For a proof see [4].

Interestingly, Theorem 1 states that the solution to the ℓ0-regularized LAD problem can be thought of as a two-stage operation. First, an estimate of the parameter that minimizes the regularized LAD cost is found by the weighted median operator and, then, a hard-thresholding operator is applied to the estimated value. Note that the estimation stage has been reformulated as a location-parameter estimation problem for which a closed-form solution exists. Note also that the hard-thresholding parameter is equal to the regularization parameter τ0. Thus, a large value of the regularization parameter forces the solution of (1) to zero, while a small value of τ0 yields the weighted median sample as the solution of the ℓ0-LAD minimization problem.
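As a concrete illustration of Theorem 1, a minimal Python/NumPy sketch of the two-stage operator might look as follows; the function names and the sorting-based weighted median are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def weighted_median(samples, weights):
    """Weighted median: the value beta minimizing sum_i weights[i] * |samples[i] - beta|.

    Conceptually this replicates each sample weights[i] times (the replication
    operator in Theorem 1) and takes an ordinary median; here it is computed
    by sorting and accumulating weights instead.
    """
    samples = np.asarray(samples, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(samples)
    s, w = samples[order], weights[order]
    cum = np.cumsum(w)
    # first sorted sample whose cumulative weight reaches half of the total weight
    return s[np.searchsorted(cum, 0.5 * cum[-1])]

def l0_lad_1d(Y, a, tau0):
    """One-dimensional l0-LAD solution (Theorem 1): WM estimate, then hard threshold."""
    Y, a = np.asarray(Y, dtype=float), np.asarray(a, dtype=float)
    keep = a != 0                     # terms with a_i = 0 do not depend on alpha
    alpha = weighted_median(Y[keep] / a[keep], np.abs(a[keep]))
    # keep the estimate only if it decreases the l1 cost by more than tau0
    gain = np.sum(np.abs(Y)) - np.sum(np.abs(Y - alpha * a))
    return alpha if gain > tau0 else 0.0
```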

3. CS SIGNAL RECONSTRUCTION BY SOLVING ℓ0-LAD: A COORDINATE DESCENT APPROACH

Consider the CS reconstruction problem in which the sparse signal is recovered by minimizing the sum of absolute deviations of the residuals (Y − AX)_i, i = 1, 2, ..., M, subject to the constraint imposed by the ℓ0 sparsity-inducing term. With the LAD as the data-fitting term, the resulting estimator is expected to be robust against noise and, at the same time, to admit a sparse representation, since it combines the LAD criterion with the sparsity-forcing regularization. To solve this optimization problem under the coordinate descent framework, we proceed as follows. Assume that we want to estimate the n-th entry of the sparse vector X while keeping the other entries fixed. The N-dimensional optimization problem then reduces to N single-variable minimization problems

$$
\hat{X}_n = \arg\min_{X_n} \sum_{i=1}^{M} |A_{in}| \left| \frac{Y_i - \sum_{j=1,\, j\neq n}^{N} A_{ij} X_j}{A_{in}} - X_n \right| + \tau_0 |X_n|_0 \qquad (2)
$$

for n = 1, 2, ..., N, provided that none of the entries in the n-th column of the measurement matrix A is zero. If one of these entries is zero, the corresponding summand in (2) can simply be dropped, since it becomes a constant independent of Xn.

Upon closer examination of Eq. (2), it can be noticed that the first term on the right-hand side is a sum of weighted absolute deviations, where $(Y_i - \sum_{j=1,\, j\neq n}^{N} A_{ij}X_j)/A_{in}$, for i = 1, 2, ..., M, are the data samples, $|A_{in}|$, i = 1, 2, ..., M, are the weights, and Xn plays the role of a location parameter. From Theorem 1, the solution to this optimization problem is found by

$$
\tilde{X}_n = \operatorname{MEDIAN}\left( |A_{in}| \diamond \frac{Y_i - \sum_{j=1,\, j\neq n}^{N} A_{ij} X_j}{A_{in}} \,\Big|_{i=1}^{M} \right)
$$

followed by the threshold operator

$$
\hat{X}_n =
\begin{cases}
\tilde{X}_n & \text{if } \|r_n\|_1 - \|r_n - A_n \tilde{X}_n\|_1 > \tau_0, \\
0 & \text{otherwise,}
\end{cases}
$$

where $r_n = Y - \sum_{j=1,\, j\neq n}^{N} A_j X_j$ is the n-th residual term that remains after subtracting from the measurement vector the contributions of all components but the n-th one, and $A_n$ denotes the n-th column of the measurement matrix A.

Note that the robustness inherent in the WM operation and the sparsity induced by the ℓ0 term are jointly combined into a single estimation and basis-selection task. More precisely, the detection of the few significant entries of the sparse vector X and the estimation of their values are jointly achieved by the WM operation followed by the hard-thresholding operator. The former adds the desired robustness to the estimation of the n-th entry, whereas the latter detects whether X̂n is one of the relevant components of X. Furthermore, the regularization parameter acts as a tuning parameter that controls which components are zeroed according to their magnitudes.
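Building on the previous sketch, a single coordinate update implied by the expressions above could be written as follows. This is again illustrative code with hypothetical names; it reuses the `weighted_median` helper defined earlier and drops the summands whose entries of the n-th column are zero, as discussed above.

```python
def coordinate_update(n, X, Y, A, threshold):
    """One l0-LAD coordinate-descent step for entry n: WM estimate + hard threshold."""
    A_n = A[:, n]
    r_n = Y - A @ X + A_n * X[n]            # residual excluding the n-th contribution
    mask = A_n != 0                          # summands with A_in = 0 drop out of (2)
    x_tilde = weighted_median(r_n[mask] / A_n[mask], np.abs(A_n[mask]))
    # keep the estimate only if it reduces the l1 norm of the residual by more than the threshold
    gain = np.sum(np.abs(r_n)) - np.sum(np.abs(r_n - A_n * x_tilde))
    return x_tilde if gain > threshold else 0.0
```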

3.1. Iterative WM Regression Estimate

To implement an iterative reconstruction algorithm, we need to estimate the regularization parameter from the measurement sample set {Yi}, i = 1, ..., M. This task has been shown to be critical in solving regularized linear regression problems [5, 6, 7], since this parameter governs the sparsity of the solution. In our approach, the regularization parameter also plays a vital role in the solution of the ℓ0-LAD problem, since it defines the threshold parameter of the two-stage operator. Large values of τ0 lead to identifying only the most relevant components of X, leaving out nonzero entries with small values. On the other hand, small values of τ0 may lead to a wrong selection of the support of X, producing spurious components in the reconstructed signal caused by the noise. The commonly used cross-validation and generalized cross-validation methods may be suitably adapted to determine the optimal regularization parameter, as was done in [7, 6] for ℓ1-LAD regression. However, since this parameter strongly influences the accuracy of the reconstructed signal [7, 2], special attention has to be given to its estimation, which may increase the computational complexity of the proposed algorithm. Therefore, the following option is considered here. Treat the regularization parameter (the threshold parameter of the hard-thresholding operation) as a tuning parameter whose value changes as the iterative algorithm progresses. More precisely, start with a relatively large value of Th, favoring sparsity during the first iterations, where Th denotes the threshold parameter. Then, as the iterative algorithm progresses, the hard-thresholding parameter is slowly reduced. By doing so, it is expected that only those entries of the sparse vector X with the most significant (highest-magnitude) values will be identified in the first iterations. Subsequently, as the hard-thresholding parameter is reduced, those entries that exceed the new threshold value are identified next. The algorithm continues until Th reaches a final value or until a total number of iterations is reached. We set the hard-thresholding parameter to follow an exponential decay with base β, i.e., Th = Thi β^k, where 0 < β < 1 is a tuning parameter that controls how fast Th decreases. This particular setting decreases the hard-thresholding parameter rapidly during the first few iterations, then more slowly as k increases, ultimately approaching zero as k → ∞.

Table 1 shows the iterative WM regression algorithm for sparse signal reconstruction. Note that the most recently estimated value of each entry is used for the estimation of subsequent entries within the same iteration. It turns out that replacing the previous estimates by the most recent ones in a recursive fashion increases the convergence speed of the reconstruction algorithm.

Table 1. Iterative WM regression algorithm.

Input:          Observation vector Y; measurement matrix A; initial hard-thresholding value Thi; number of iterations K0.
Initialization: Iteration counter k = 1; initial estimate X̂(1) = 0, X̂(1) ∈ R^N; hard-thresholding value Th = Thi.
Iteration:
  Step A: For the n-th entry of X̂, n = 1, 2, ..., N, compute
          X̃ = MEDIAN( |A_in| ⋄ (Y_i − Σ_{j=1, j≠n}^N A_ij X̂_j^(k)) / A_in, i = 1, ..., M ),
          followed by the threshold operator
          X̂_n^(k) = X̃ if ‖r_n‖_1 − ‖r_n − X̃ A_n‖_1 > Th, and X̂_n^(k) = 0 otherwise.
  Step B: Updating operations: Th = Thi β^k and X̂(k+1) = X̂(k).
  Step C: Check the stopping criterion: if k ≤ K0, set k = k + 1 and go to Step A; otherwise, end.
Output:         Recovered sparse signal X̂.

The overall operation of the proposed algorithm is as follows. The iterative algorithm aims to detect the nonzero entries of the sparse vector in order of descending signal strength. To achieve this, at the first iteration a rough estimate of the signal (the entries of the sparse vector) is obtained through the WM operator. An estimated entry is kept if its magnitude leads to a reduction of the ℓ1-norm of the corresponding residual signal that is larger than the initial threshold value; otherwise it is considered a non-significant entry and is set to zero. If the entry is kept, its influence is removed from the observation vector. The resulting residual signal is used in the estimation of subsequent entries. Thus, subsequent signal estimates are expected to be closer to the true values, since the contributions of the entries considered relevant are progressively removed from the observation vector as the algorithm proceeds. At the end of the first iteration, the estimate of the sparse vector contains nonzero values in those entries most likely related to the strongest entries of the sparse vector X. As the hard-thresholding parameter becomes smaller at the second iteration, the entries selected as significant in the previous iteration are most likely identified again as relevant. Moreover, the estimated values of those entries are refined by the WM operation, since the contributions of the larger values have been canceled out from the observation vector. Furthermore, new estimated entries that meet or exceed the new threshold value are added to the estimated sparse vector. This process continues until the maximum number of iterations is reached.
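Putting the pieces together, the procedure of Table 1 can be sketched as below. This is illustrative only; `Th_init`, `beta`, and `K0` play the roles of Thi, β, and K0 in the table, and `coordinate_update` is the helper sketched earlier.

```python
def wm_regression(Y, A, Th_init, beta, K0):
    """Iterative WM regression reconstruction (sketch of the procedure in Table 1)."""
    M, N = A.shape
    X = np.zeros(N)
    Th = Th_init
    for k in range(1, K0 + 1):
        # Step A: coordinate-wise WM estimate + hard threshold,
        # reusing the most recent estimates within the same sweep
        for n in range(N):
            X[n] = coordinate_update(n, X, Y, A, Th)
        # Step B: shrink the hard-thresholding parameter, Th = Thi * beta**k
        Th = Th_init * beta ** k
    return X
```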
Figure 1 shows an illustrative example depicting how the nonzero values of the sparse vector are detected and estimated iteratively by the proposed algorithm. See [4] for a convergence analysis of the proposed algorithm.

Fig. 1. (a) Test sparse signal. (b) Nonzero entries of the sparse vector as the iterative algorithm progresses (horizontal axis: iteration index). Dotted lines: true values. Solid lines: estimated values.

3.1.1. Computation Complexity per Iteration

The proposed reconstruction algorithm is simple to implement and requires minimum memory resources. It involves just matrix-vector multiplications, WM operators and comparisons. The per-iteration complexity is as follows. The data shifting operation is implemented at a complexity cost of O(N log(M)) provided that the projection matrix is equipped with fast matrix-vector multiplication routines like the one used for the FFT. Furthermore, the WM operation can be implemented at a complexity in the order of O(M) by extending the concepts used in the QuickSelect algorithm for the median operator to the weighted median operator. Thus, the overall computation complexity per iteration can be reduced to O(N(log M + M)). As will be shown in the simulations, the number of iterations required by the proposed algorithm is significantly smaller compared to those required by other iterative algorithms.
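As a rough sketch of how a QuickSelect-style weighted median with expected O(M) cost might look, the following generic illustration uses random pivoting and weight bookkeeping; it is written under our own conventions and is not the authors' implementation.

```python
def weighted_median_select(samples, weights, rng=None):
    """Weighted median in expected O(M) time via QuickSelect-style partitioning."""
    rng = rng or np.random.default_rng()
    s = np.asarray(samples, dtype=float)
    w = np.asarray(weights, dtype=float)
    target = 0.5 * w.sum()        # half of the total weight
    below_acc = 0.0               # weight already known to lie strictly below the current window
    while s.size > 1:
        pivot = s[rng.integers(s.size)]
        lo, hi = s < pivot, s > pivot
        w_lo, w_hi = w[lo].sum(), w[hi].sum()
        w_eq = w.sum() - w_lo - w_hi
        if below_acc + w_lo >= target:            # weighted median lies below the pivot
            s, w = s[lo], w[lo]
        elif below_acc + w_lo + w_eq >= target:   # the pivot itself is the weighted median
            return pivot
        else:                                     # weighted median lies above the pivot
            below_acc += w_lo + w_eq
            s, w = s[hi], w[hi]
    return s[0]
```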

4. SIMULATIONS

In the first simulation, the reconstruction capability of the proposed algorithm is tested in the recovery of a sparse signal from a reduced set of noiseless random projections and compared to that attained by the Matching Pursuit (MP) and linearized Bregman iterative (LBI) algorithms [3]. We are particularly interested in the number of iterations needed by the various iterative algorithms to reach a given residual energy. To this end, a set of sparse signals is generated as in [3]. We stop the algorithms as soon as the normalized residual energy falls below 10^-10. For this simulation, Thi and β are set to 1 and 0.75, respectively. Table 2 shows the results yielded by the various iterative algorithms, where each entry in the table reports the ensemble average over 20 trials rounded to the closest integer. As can be noticed in Table 2, the proposed algorithm reaches the performance target much faster than the others. Indeed, it takes nearly four-fold fewer iterations than the MP algorithm and about 43-fold fewer iterations than the LBI algorithm. Note also that the WM regression reconstruction algorithm outperforms the other two reconstruction algorithms in the NMSE sense. For this simulation, the parameters of the linearized Bregman algorithm are set to one.

Table 2. Number of iterations and NMSE.

  N      M     K   |  Iterations: WMR / MP / LBI  |  Normalized MSE (dB): WMR / MP / LBI
  300    75    6   |        15 /  27 / 723        |        -106 / -100 / -96
  600    150   12  |        16 /  62 / 302        |        -105 /  -99 / -96
  900    225   18  |        17 /  91 / 783        |        -105 /  -99 / -97
  1200   300   24  |        17 / 123 / 984        |        -102 /  -99 / -97

In order to test the robustness of the proposed algorithm to noise, a 25-sparse signal of length 512 is used as a test signal. It is generated by randomly locating the nonzero entries and assigning them random values drawn from a zero-mean, unit-variance Gaussian distribution. The projected signal is then contaminated with noise following different distributions. Figure 2 depicts the curves of normalized MSE versus SNR yielded by the various reconstruction algorithms for the different noise distributions.
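For orientation, this test setup can be mimicked roughly as follows. The sketch reuses `wm_regression` from above; the mixture parameters, the scaling of A, and the initial threshold are our assumptions, not values reported in the paper, apart from β = 0.95 and the 40 iterations mentioned below.

```python
rng = np.random.default_rng(0)

N, M, K = 512, 256, 25                          # signal length, measurements, sparsity
X_true = np.zeros(N)
support = rng.choice(N, size=K, replace=False)  # random support
X_true[support] = rng.standard_normal(K)        # zero-mean, unit-variance amplitudes

A = rng.standard_normal((M, N)) / np.sqrt(M)    # i.i.d. Gaussian measurement matrix (assumed scaling)

# epsilon-contaminated normal noise: mostly Gaussian with occasional large outliers (assumed parameters)
eps, sigma, kappa = 0.03, 0.05, 100.0
outliers = rng.random(M) < eps
xi = sigma * rng.standard_normal(M) * np.where(outliers, np.sqrt(kappa), 1.0)

Y = A @ X_true + xi
X_hat = wm_regression(Y, A, Th_init=1.0, beta=0.95, K0=40)   # Th_init is an assumption
```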

Fig. 2. Performance of the reconstruction algorithms (ℓ1-magic, ℓ1-ls, the Bregman algorithm, ROMP, CoSaMP, and WM regression; curves of normalized MSE in dB versus SNR in dB) for different noise distributions: (a) Gaussian noise, (b) Laplacian noise, and (c), (d) contaminated normal distributions with impulsive contamination ε = 1% and ε = 3%, respectively.

For the proposed algorithm, the hard-thresholding decay parameter β was set to 0.95 and the number of iterations to 40, while for the LBI algorithm the best results were obtained for μ = 140, δ = 0.1, and 1000 iterations. The parameters of the other reconstruction algorithms are set according to the suggestions given by their authors. For all the reconstruction algorithms, 256 measurements were used. The superior performance of the proposed approach becomes clearer in Fig. 2 as the heaviness of the noise's tail increases. As can be seen in Fig. 2(a), even when the underlying contamination is purely Gaussian, the proposed algorithm outperforms the other reconstruction algorithms. This improvement is due, in part, to the inherent robustness of the WM operator in the estimation of the nonzero entries of X and, in part, to the robust detection of their locations achieved by the adaptive threshold operator.

To illustrate this point, consider the 15-sparse, 512-long signal shown in Fig. 3(a) (top), which has been projected and contaminated with impulsive noise following the standard (0,1)-Cauchy distribution. The noise is scaled by a factor of 10^-2 and then added to the projected signal. Figure 3(a) (bottom) shows both the noiseless and noisy projections. Figures 3(b)-(f) show the signals recovered by the various reconstruction algorithms using 150 measurements. Note that the MP and ℓ1-ls reconstruction algorithms are very sensitive to the presence of outliers, introducing several spurious components in the reconstructed signal. As can be seen in Fig. 3(f), the proposed reconstruction algorithm not only correctly identifies the most significant nonzero entries of the sparse signal but also yields good estimates of their values.

Fig. 3. Reconstruction of a 15-sparse, 512-long signal from a set of 150 contaminated measurements. (a) Top: sparse signal; bottom: noiseless and noisy projections. Recovered signals yielded by (b) MP, (c) CoSaMP, (d) ℓ1-ls, (e) LBI, and (f) WM regression. ◦ denotes the original signal; • denotes the reconstructed signal.

5. CONCLUSIONS

In this paper, we present an algorithm to recover a sparse signal from a reduced set of random projections contaminated with impulsive noise. The proposed approach recasts the reconstruction problem as a location-parameter estimation in which each entry of the sparse signal is iteratively estimated by a WM operator followed by a hard-thresholding operation. This two-stage operation emerges naturally as the solution of the one-dimensional ℓ0-regularized LAD optimization problem, which, in turn, is the underlying problem to be solved when following a coordinate descent approach for the multidimensional ℓ0-LAD regression problem. The proposed approach is able to mitigate the influence of noise in the measurement vector, leading to a sparse reconstructed signal without spurious components.

6. REFERENCES

[1] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289-1306, Apr. 2006.
[2] J. Tropp and S. Wright, "Computational methods for sparse solution of linear inverse problems," ACM Technical Report 1, Caltech, 2009.
[3] S. Osher, Y. Mao, B. Dong, and W. Yin, "Fast linearized Bregman iteration for compressive sensing and sparse denoising," Tech. Rep. 08-37, UCLA CAM, 2008.
[4] J. Paredes and G. Arce, "Compressive sensing signal reconstruction by weighted median regression estimates," IEEE Trans. on Signal Processing, Feb. 2009, submitted.
[5] E. Larsson and Y. Selén, "Linear regression with a sparse parameter vector," IEEE Transactions on Signal Processing, vol. 55, no. 2, pp. 451-460, Feb. 2007.
[6] H. Wang, G. Li, and G. Jiang, "Robust regression shrinkage and consistent variable selection through the LAD-Lasso," Journal of Business & Economic Statistics, vol. 25, pp. 347-355, Jul. 2007.
[7] T. Wu and K. Lange, "Coordinate descent algorithms for lasso penalized regression," The Annals of Applied Statistics, vol. 2, no. 1, pp. 224-244, 2008.

4093

0

50

100

150

200

250

300

350

400

450

500

−3

0

50

100

150

200

(c) 2

2

1.5

1.5

1

1

0.5

0.5

0

0

−0.5

−0.5

−1

−1

−1.5

−1.5

−2

−2

−2.5

−2.5

−3

0

50

100

150

200

250

250

(d)

300

350

400

450

500

−3

0

50

100

150

200

250

(e) (f) Fig. 3. Reconstruction of a 15-sparse, 512-long signal from a set of 150 contaminated measurements.(a) Top: Sparse signal. Bottom: noiseless and noisy projections. Recovered signals yield by: (b) MP, (c) CoSaMP (d) 1 -ls, (e) LBI, and (f) WM regression algorithms. ◦ denotes original signal, • denotes reconstructed signal. sional 0 -LAD regression problem. The proposed approach is able to mitigate the influence of noise in the measurement vector leading to a sparse reconstructed signal without spurious components. 6. REFERENCES [1] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006. [2] J. Tropp and S. Wright, “Computational methods for sparse solution of linear inverse problems,” ACM TECHNICAL REPORT 1, CALTECH, 2009. [3] S. Osher, Y. Mao, B. Dong, and W. Yin, “Fast linearized bregman iteration for compressive sensing and sparse denoising,” Tech. Rep. 08-37, UCLA CAM, 2008. [4] J. Paredes and G. Arce, “Compressive sensing signal reconstruction by weighted median regression estimates,” IEEE Trans. on Signal Processing, Feb. 2009, Submitted. [5] E. Larsson and Y. Sel´en, “Linear regression with a sparse parameter vector,” IEEE Transactions on Signal Processing, vol. 55, no. 2, pp. 451–460, Feb. 2007. [6] H. Wang, G. Li, and G. Jiang, “Robust regression shrinkage and consistent variable selection through the LAD-Lasso,” J. of Business & Economic Statistic, vol. 25, pp. 347–355, Jul 2007. [7] T. Wu and K. Lange, “Coordinate descent algorithm for lasso penalized regression,” The Annual of Applied Statistics, vol. 2, no. 1, pp. 224–244, 2008.