Robust Sparse Recovery via Non-Convex Optimization

Laming Chen and Yuantao Gu
Department of Electronic Engineering, Tsinghua University
Homepage: http://gu.ee.tsinghua.edu.cn/
Email: [email protected]

August 23, 2014

Contents

1. Introduction
2. Preliminary
3. Main Contribution
4. Simulation
5. Summary


Introduction

Sparse Recovery Problem
The sparse recovery problem is expressed as

    argmin_x ‖x‖₀   subject to   y = Ax,

where A ∈ R^{M×N} is a "fat" (M < N) sensing matrix. This problem is computationally intractable.

Sparse Recovery Algorithms
- Greedy pursuits (OMP, CoSaMP, SP, etc.)
- Convex optimization based algorithms
- Non-convex optimization based algorithms
  - tend to deliver better recovery performance
  - still lack a complete convergence analysis

Introduction

Current Convergence Analysis
- For p-IRLS (0 < p < 1), convergence is guaranteed only in a sufficiently small neighborhood of the sparse signal.
- For the MM subspace algorithm, the generated sequence is shown to converge to a critical point.
- For SL0, a complete convergence analysis exists thanks to the "local convexity" of the penalties, but the algorithm must solve a sequence of optimization problems rather than a single one to guarantee convergence.

Objective
To provide theoretical convergence guarantees for a non-convex sparse recovery approach, from the initial solution all the way to the global optimum.


Preliminary

Problem Setup
The non-convex optimization problem is formulated as

    argmin_x J(x) = Σ_{i=1}^{N} F(x_i)   subject to   y = Ax,

where F(·) belongs to a class of sparseness measures (proposed by Rémi Gribonval et al.).

[Figure: the six sparseness measures No. 1–6 plotted on the interval [−2, 2].]

Preliminary

Null Space Property
Define γ(J, A, K) as the smallest quantity such that

    J(z_S) ≤ γ(J, A, K) · J(z_{S^c})

holds for every set S ⊂ {1, 2, …, N} with #S ≤ K and every z ∈ N(A).

Performance analysis with γ(J, A, K) (provided by Rémi Gribonval et al.):
- If γ(J, A, K) < 1, then for any K-sparse x* and y = Ax*, the optimization problem returns x*;
- If γ(J, A, K) > 1, then there exist a K-sparse x* and y = Ax* such that the problem returns a signal that differs from x*;
- γ(ℓ₀, A, K) ≤ γ(J, A, K) ≤ γ(ℓ₁, A, K).
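γ(J, A, K) is generally hard to evaluate exactly. As a rough illustration (not part of the slides), it can be lower-bounded by sampling vectors from N(A) and, for each sample, choosing the support S that maximizes J(z_S)/J(z_{S^c}); the function name and sampling scheme below are assumptions.

```python
import numpy as np

def gamma_lower_bound(F, A, K, num_samples=2000, seed=0):
    """Monte-Carlo lower bound on gamma(J, A, K).

    For a fixed z in N(A), the worst support S (#S <= K) holds the K largest
    values of F(z_i), so the ratio J(z_S)/J(z_{S^c}) has a closed form per
    sample.  The maximum over random samples only lower-bounds the true gamma.
    """
    rng = np.random.default_rng(seed)
    M, N = A.shape
    # Orthonormal basis of the null space N(A), assuming A has full row rank.
    _, _, Vt = np.linalg.svd(A)
    null_basis = Vt[M:]                      # (N - M) x N
    best = 0.0
    for _ in range(num_samples):
        z = rng.standard_normal(N - M) @ null_basis
        vals = np.sort(F(np.abs(z)))[::-1]   # F applied entrywise, descending
        ratio = vals[:K].sum() / max(vals[K:].sum(), 1e-12)
        best = max(best, ratio)
    return best
```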

Preliminary

Weak Convexity
F(·) is ρ-convex if and only if ρ is the largest quantity such that there exists a convex H(·) with F(t) = H(t) + ρt². F(·) is strongly convex if ρ > 0; convex if ρ = 0; weakly convex (also known as semi-convex) if ρ < 0.

The generalized gradient (a generalization of the subgradient of convex functions) of a weakly convex function is any element of the set ∂F(t) := ∂H(t) + 2ρt.
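As a small illustration (not from the slides), the snippet below implements one of the listed sparseness measures, the log penalty F(t) = ln(1 + σ|t|) with ρ = −σ²/2 from the supplementary table, together with one valid choice of its generalized gradient; the value of σ is an arbitrary assumption.

```python
import numpy as np

SIGMA = 3.0  # penalty parameter (illustrative value, not from the slides)

def F(t):
    """Log penalty F(t) = ln(1 + sigma*|t|), a weakly convex sparseness
    measure with rho = -sigma^2/2 (see the supplementary table)."""
    return np.log1p(SIGMA * np.abs(t))

def generalized_gradient(t):
    """One element of the generalized gradient set of F at t.

    Away from zero F is differentiable, so F'(t) is returned; at t = 0 the
    set is the interval [-sigma, sigma] and we simply pick 0.
    """
    return np.where(t == 0, 0.0, SIGMA * np.sign(t) / (1.0 + SIGMA * np.abs(t)))
```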


Main Contribution

Sparsity-inducing Penalty
- F(·) is a weakly convex sparseness measure;
- The vast majority of non-convex sparsity-inducing penalties are formed by weakly convex sparseness measures;
- There exists α such that F(t) ≤ α|t|.

Define −ρ/α as the non-convexity of the penalty (a measure of how quickly the generalized gradient of F(·) decreases).

[Figure: penalties with non-convexity −ρ/α = 1/3, 1/2, and 1, plotted on the interval [−2, 2].]
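As a concrete check of this definition (using values from the supplementary table), consider the log penalty F(t) = ln(1 + σ|t|): since ln(1 + u) ≤ u, one may take α = σ, and the table gives ρ = −σ²/2, so the non-convexity is −ρ/α = σ/2. Tuning σ therefore directly tunes how aggressively non-convex the penalty is.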

Main Contribution

Projected Generalized Gradient (PGG) Method
Initialization: calculate A† = Aᵀ(AAᵀ)⁻¹, set x(0) = A†y, n = 0;
Repeat:
    x̃(n+1) = x(n) − κ∇J(x(n));              (gradient update)
    x(n+1) = x̃(n+1) + A†(y − A x̃(n+1));     (projection onto {x : y = Ax})
    n = n + 1;
Until: stopping criterion satisfied.

[Figure: geometric illustration of one PGG iteration — a gradient update followed by projection back onto the affine set {x : y = Ax}, moving from x(0) toward x*.]
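Below is a minimal NumPy sketch of the PGG iteration described above. The step size, iteration limit, and stopping rule are illustrative assumptions, and grad_J stands for any routine returning an element of the generalized gradient of J (such as the log-penalty gradient sketched earlier); this is not the authors' reference implementation.

```python
import numpy as np

def pgg(A, y, grad_J, kappa=1e-3, max_iter=5000, tol=1e-8):
    """Projected generalized gradient (PGG) sketch.

    grad_J(x) returns one element of the generalized gradient of
    J(x) = sum_i F(x_i), applied entrywise.
    """
    A_pinv = A.T @ np.linalg.inv(A @ A.T)    # A^dagger = A^T (A A^T)^{-1}
    x = A_pinv @ y                           # x(0): feasible least-norm start
    for _ in range(max_iter):
        x_tilde = x - kappa * grad_J(x)                   # gradient update
        x_new = x_tilde + A_pinv @ (y - A @ x_tilde)      # project onto y = Ax
        if np.linalg.norm(x_new - x) <= tol:              # simple stopping rule
            return x_new
        x = x_new
    return x

# Hypothetical usage:
# x_hat = pgg(A, y, grad_J=generalized_gradient)
```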

Main Contribution

Performance of PGG
Theorem 1. Assume y = Ax* + e and ‖x*‖₀ ≤ K. For any tuple (J, A, K) with J(·) formed by a weakly convex sparseness measure F(·) and γ(J, A, K) < 1, and for any positive constant M₀, if the non-convexity of J(·) satisfies

    −ρ/α ≤ (1/M₀) · (1 − γ(J, A, K)) / (5 + 3γ(J, A, K)),

then the solution x̂ recovered by PGG satisfies

    ‖x̂ − x*‖₂ ≤ (4α²Nκ + 8C₂‖e‖₂) / C₁,

provided that ‖x(0) − x*‖₂ ≤ M₀.

Main Contribution

Approximate PGG (APGG) Method
To reduce the computational burden of calculating A† = Aᵀ(AAᵀ)⁻¹, assume we adopt AᵀB to approximate A†, and let ζ = ‖I − AAᵀB‖₂.

Theorem 2. Under the same assumptions as Theorem 1, if the non-convexity of J(·) satisfies

    −ρ/α ≤ (1/M₀) · (1 − γ(J, A, K)) / (5 + 3γ(J, A, K)),

and ζ < 1, then the solution x̂ recovered by APGG satisfies

    ‖x̂ − x*‖₂ ≤ 2C₃(ζ)κ + 2C₄(ζ)‖e‖₂,

provided that ‖x(0) − x*‖₂ ≤ M₀.
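Under the same caveats as the PGG sketch, the APGG iteration might look as follows: the exact pseudoinverse is replaced by AᵀB for a precomputed B ≈ (AAᵀ)⁻¹ (for example, from the iterative scheme sketched in the Simulation section), and ζ can be checked up front.

```python
import numpy as np

def apgg(A, y, grad_J, B, kappa=1e-3, max_iter=5000):
    """Approximate PGG: same iteration as PGG, with A^T B in place of A^dagger.

    B is any approximation of (A A^T)^{-1}; Theorem 2 requires
    zeta = ||I - A A^T B||_2 < 1 for the error bound to hold.
    """
    M = A.shape[0]
    zeta = np.linalg.norm(np.eye(M) - A @ A.T @ B, 2)
    assert zeta < 1, "approximate projection too crude (zeta >= 1)"
    approx_pinv = A.T @ B
    x = approx_pinv @ y                        # approximately feasible start
    for _ in range(max_iter):
        x_tilde = x - kappa * grad_J(x)                    # gradient update
        x = x_tilde + approx_pinv @ (y - A @ x_tilde)      # approximate projection
    return x
```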


Simulation

Parameter Setting
- Matrix A: 200 × 1000, with independent and identically distributed Gaussian entries;
- Vector x*: nonzero entries drawn independently from a Gaussian distribution, then normalized so that x* has unit ℓ₂ norm;
- The approximate A† is calculated using an iterative method (introduced by Ben-Israel et al.); a sketch of such an iteration is given below.
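The slides do not give the iteration explicitly; the following is a minimal sketch of a Ben-Israel–Cohen-style Newton iteration for B ≈ (AAᵀ)⁻¹ (so that AᵀB ≈ A†). The initialization and iteration count are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np

def approximate_gram_inverse(A, num_iter=30):
    """Ben-Israel-Cohen-style iteration for B ~= (A A^T)^{-1}.

    Uses the Newton update B <- B (2I - G B) with G = A A^T.  The scaled
    initialization below guarantees ||I - G B_0|| < 1 for full row-rank A,
    so the iterates converge (quadratically) to G^{-1}.
    """
    G = A @ A.T
    M = G.shape[0]
    B = G.T / (np.linalg.norm(G, 1) * np.linalg.norm(G, np.inf))  # safe start
    I = np.eye(M)
    for _ in range(num_iter):
        B = B @ (2 * I - G @ B)
    return B

# Fewer iterations give a cruder B; zeta = ||I - A A^T B||_2 can then be
# checked before running APGG.
```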

Simulation

First Experiment
The recovery performance of the PGG method is tested in the noiseless scenario with different sparsity-inducing penalties and different choices of non-convexity.

[Figure: maximum recoverable sparsity K_max (0–70) versus non-convexity −ρ/α (log scale, 10⁻¹ to 10²) for penalties No. 1–6.]

Simulation

Second Experiment
The recovery performance of the (A)PGG method is compared with some typical sparse recovery algorithms in the noiseless scenario.

[Figure: successful recovery probability versus sparsity K (20 to 100) for OMP, L1, RL1, ISL0, IRLS, PGG, and APGG.]

Simulation

Third Experiment
The recovery precision of the (A)PGG method is simulated under different settings of step size and measurement noise.

[Figure: average RSNR versus step size κ (10⁻² down to 10⁻⁶) for PGG and APGG, with MSNR = 20, 30, 40, 50 dB and the noiseless case.]


Summary

Conclusion
- A class of weakly convex sparseness measures is adopted to constitute the sparsity-inducing penalties;
- The convergence analysis of the (A)PGG method reveals that when the non-convexity is below a threshold, the recovery error is linear in both the step size and the noise term;
- For the APGG method, the influence of the approximate projection is reflected in the coefficients of the bound rather than in an additional error term.

Related Works

[1] Laming Chen and Yuantao Gu, "The Convergence Guarantees of a Non-convex Approach for Sparse Recovery," IEEE Transactions on Signal Processing, 62(15):3754-3767, 2014.
[2] Laming Chen and Yuantao Gu, "The Convergence Guarantees of a Non-convex Approach for Sparse Recovery Using Regularized Least Squares," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 3374-3378, May 4-9, 2014, Florence, Italy.
[3] Laming Chen and Yuantao Gu, "Robust Recovery of Low-Rank Matrices via Non-Convex Optimization," International Conference on Digital Signal Processing (DSP), pp. 355-360, Aug. 20-23, 2014, Hong Kong, China.

http://gu.ee.tsinghua.edu.cn/

Thanks!

Supplementary

Weakly Convex Sparseness Measures with Parameter ρ

No.   F(t)                                              ρ
1     |t|                                               0
2     |t| / (|t| + σ)^(1−p)                             (p − 1)σ^(p−2)
3     1 − e^(−σ|t|)                                     −σ²/2
4     ln(1 + σ|t|)                                      −σ²/2
5     atan(σ|t|)                                        −3√3σ²/16
6     (2σ|t| − σ²t²)·1{|t| ≤ 1/σ} + 1{|t| > 1/σ}        −σ²