Robust Sparse Recovery via Non-Convex Optimization
Laming Chen and Yuantao Gu
Department of Electronic Engineering, Tsinghua University
Homepage: http://gu.ee.tsinghua.edu.cn/
Email: [email protected]
August 23, 2014
Contents
1 Introduction
2 Preliminary
3 Main Contribution
4 Simulation
5 Summary
Introduction
Sparse Recovery Problem
The sparse recovery problem is expressed as
    argmin_x ‖x‖_0,  subject to  y = Ax,
where A ∈ R^(M×N) is a "fat" (M < N) sensing matrix.
The problem is computationally intractable.
Sparse Recovery Algorithms
Greedy pursuits (OMP, CoSaMP, SP, etc.)
Convex optimization based algorithms
Non-convex optimization based algorithms
  Tend to deliver better recovery performance
  Still lack a complete convergence analysis
Introduction
Current Convergence Analysis
For p-IRLS (0 < p < 1), convergence is guaranteed only in a sufficiently small neighborhood of the sparse signal.
For the MM subspace algorithm, the generated sequence is shown to converge to a critical point.
For SL0, a complete convergence analysis is available thanks to the "local convexity" of the penalties, but the algorithm must solve a sequence of optimization problems rather than a single one to guarantee convergence.
Objective
To provide theoretical convergence guarantees for a non-convex approach to sparse recovery, from the initial solution all the way to the global optimum.
Preliminary
Problem Setup
The non-convex optimization problem is formulated as
    argmin_x J(x) = Σ_{i=1}^{N} F(x_i),  subject to  y = Ax,
where F(·) belongs to a class of sparseness measures (proposed by Rémi Gribonval et al.).
[Figure: plots of the six example sparseness measures F(·), No. 1 through No. 6, listed in the supplementary table]
Preliminary
Null Space Property
Define γ(J, A, K) as the smallest quantity such that
    J(z_S) ≤ γ(J, A, K) · J(z_{S^c})
holds for any set S ⊂ {1, 2, ..., N} with #S ≤ K and for any z ∈ N(A).
Performance analysis with γ(J, A, K) (provided by Rémi Gribonval et al.):
If γ(J, A, K) < 1, then for any K-sparse x* and y = Ax*, the optimization problem returns x*.
If γ(J, A, K) > 1, then there exist a K-sparse x* and y = Ax* such that the problem returns a signal that differs from x*.
γ(ℓ_0, A, K) ≤ γ(J, A, K) ≤ γ(ℓ_1, A, K).
Preliminary
Weak Convexity
F(·) is ρ-convex if and only if ρ is the largest quantity such that there exists a convex H(·) with F(t) = H(t) + ρt².
F(·) is strongly convex if ρ > 0, convex if ρ = 0, and weakly convex (also known as semi-convex) if ρ < 0.
The generalized gradient (a generalization of the subgradient of convex functions) of a weakly convex function is any element of the set ∂F(t) := ∂H(t) + 2ρt.
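As a worked example (using only penalty No. 3 from the supplementary table, so nothing beyond these slides is assumed): for F(t) = 1 − e^(−σ|t|) one has F''(t) = −σ² e^(−σ|t|) ≥ −σ² for t ≠ 0, so H(t) := F(t) + (σ²/2)t² is convex and the largest admissible ρ is −σ²/2, matching the table. The generalized gradient is ∂F(t) = σ·sgn(t)·e^(−σ|t|) for t ≠ 0, and ∂F(0) = [−σ, σ].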
Main Contribution
Sparsity-inducing Penalty
F(·) is a weakly convex sparseness measure.
The vast majority of non-convex sparsity-inducing penalties are formed by weakly convex sparseness measures.
There exists α such that F(t) ≤ α|t|.
Define −ρ/α as the non-convexity of the penalty (a measure of how quickly the generalized gradient of F(·) decreases).
[Figure: example penalties with non-convexity −ρ/α = 1/3, 1/2, and 1]
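Continuing the worked example above (an illustration, not a claim taken from the slides): for F(t) = 1 − e^(−σ|t|), the tightest linear bound is F(t) ≤ σ|t|, so the smallest admissible α is σ. With ρ = −σ²/2, the non-convexity is −ρ/α = σ/2; a larger σ therefore gives a more non-convex penalty whose generalized gradient decays faster away from the origin.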
Main Contribution
Projected Generalized Gradient (PGG) Method
Initialization: calculate A†, set x(0) = A†y, n = 0.
Repeat:
    x̃(n+1) = x(n) − κ ∇J(x(n));   (generalized gradient update)
    x(n+1) = x̃(n+1) + A†(y − A x̃(n+1));   (projection onto {x : Ax = y})
    n = n + 1;
Until: stopping criterion is satisfied.
[Figure: illustration of the alternating gradient update and projection steps on the affine set y = Ax, starting from x(0) and moving toward x*]
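A minimal Python sketch of this iteration (my illustration, not the authors' reference implementation; the penalty — No. 3 from the supplementary table — and the values of σ, κ, the iteration budget, and the stopping rule are all assumptions made for the example):

```python
import numpy as np

def pgg(A, y, sigma=3.0, kappa=1e-3, max_iter=5000, tol=1e-8):
    """Sketch of the Projected Generalized Gradient (PGG) iteration.

    Penalty: J(x) = sum_i F(x_i) with F(t) = 1 - exp(-sigma*|t|),
    whose generalized gradient is sigma*sign(t)*exp(-sigma*|t|).
    """
    A_pinv = np.linalg.pinv(A)           # A† = A^T (A A^T)^{-1} for a fat, full-row-rank A
    x = A_pinv @ y                       # x(0): minimum l2-norm solution of y = Ax
    for _ in range(max_iter):
        grad_J = sigma * np.sign(x) * np.exp(-sigma * np.abs(x))  # generalized gradient of J
        x_tilde = x - kappa * grad_J                              # gradient update
        x_new = x_tilde + A_pinv @ (y - A @ x_tilde)              # project back onto {x : Ax = y}
        if np.linalg.norm(x_new - x) <= tol:                      # simple stopping criterion
            return x_new
        x = x_new
    return x

# Illustrative usage with a random Gaussian sensing matrix and a unit-norm sparse signal
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, N, K = 50, 200, 5
    A = rng.standard_normal((M, N)) / np.sqrt(M)
    x_true = np.zeros(N)
    support = rng.choice(N, K, replace=False)
    x_true[support] = rng.standard_normal(K)
    x_true /= np.linalg.norm(x_true)
    y = A @ x_true
    x_hat = pgg(A, y)
    print("recovery error:", np.linalg.norm(x_hat - x_true))
```

Note that the projection step keeps every iterate feasible for y = Ax, so only the gradient step depends on the chosen penalty.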
Main Contribution
Performance of PGG
Theorem 1. Assume y = Ax* + e and ‖x*‖_0 ≤ K. For any tuple (J, A, K) with J(·) formed by a weakly convex sparseness measure F(·) and γ(J, A, K) < 1, and for any positive constant M_0, if the non-convexity of J(·) satisfies
    −ρ/α ≤ (1/M_0) · (1 − γ(J, A, K)) / (5 + 3γ(J, A, K)),
then the solution x̂ recovered by PGG satisfies
    ‖x̂ − x*‖_2 ≤ (4α²Nκ + 8C_2‖e‖_2) / C_1,
provided that ‖x(0) − x*‖_2 ≤ M_0.
Main Contribution
Approximate PGG (APGG) Method
To reduce the computational burden of calculating A† = A^T(AA^T)^(−1), suppose A† is approximated by A^T B, and let ζ = ‖I − AA^T B‖_2.
Theorem 2. Under the same assumptions as in Theorem 1, if the non-convexity of J(·) satisfies
    −ρ/α ≤ (1/M_0) · (1 − γ(J, A, K)) / (5 + 3γ(J, A, K)),
and ζ < 1, then the solution x̂ recovered by APGG satisfies
    ‖x̂ − x*‖_2 ≤ 2C_3(ζ)κ + 2C_4(ζ)‖e‖_2,
provided that ‖x(0) − x*‖_2 ≤ M_0.
Simulation
Parameter Setting
Matrix A: 200 × 1000, with independent and identically distributed Gaussian entries.
Vector x*: nonzero entries drawn independently from a Gaussian distribution; normalized to have unit ℓ_2 norm.
The approximate A† is calculated using an iterative method (introduced by Ben-Israel et al.); a sketch of such an iteration is given below.
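A minimal sketch of the classical Ben-Israel–Cohen (Newton–Schulz) iteration for approximating A† (an assumption: the slides do not spell out the exact variant, scaling, or iteration count used, and they parameterize the approximation as A^T B, whereas this sketch approximates A† directly, which plays the same role in the APGG iteration):

```python
import numpy as np

def approx_pinv(A, num_iters=30):
    """Ben-Israel-Cohen (Newton-Schulz) iteration for an approximate pseudoinverse.

    X_{k+1} = X_k (2I - A X_k) converges to A† for a full-row-rank fat A when
    started from X_0 = A^T / ||A A^T||_2 (spectral-norm scaling keeps it stable).
    """
    X = A.T / np.linalg.norm(A @ A.T, 2)   # X_0: scaled transpose
    I = np.eye(A.shape[0])
    for _ in range(num_iters):
        X = X @ (2 * I - A @ X)            # quadratic convergence toward A†
    return X

# The quantity ζ = ||I - A A^T B||_2 from Theorem 2 corresponds here to
# zeta = np.linalg.norm(np.eye(A.shape[0]) - A @ approx_pinv(A), 2)
```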
Simulation
First Experiment
The recovery performance of the PGG method is tested in the noiseless scenario with different sparsity-inducing penalties and different choices of non-convexity.
[Figure: maximum recoverable sparsity K_max versus non-convexity (log scale) for penalties No. 1 through No. 6]
Simulation
Second Experiment
The recovery performance of the (A)PGG method is compared with some typical sparse recovery algorithms in the noiseless scenario.
[Figure: successful recovery probability versus sparsity level K for OMP, L1, RL1, ISL0, IRLS, PGG, and APGG]
Simulation
Third Experiment
The recovery precision of the (A)PGG method is simulated under different settings of step size and measurement noise.
[Figure: average RSNR versus step size κ for PGG and APGG at MSNR = 20, 30, 40, 50 dB and in the noiseless case]
Summary
Conclusion
A class of weakly convex sparseness measures is adopted to constitute the sparsity-inducing penalties.
The convergence analysis of the (A)PGG method reveals that when the non-convexity is below a threshold, the recovery error is linear in both the step size and the noise term.
For the APGG method, the influence of the approximate projection is reflected in the coefficients rather than in an additional error term.
Related Works
[1] Laming Chen and Yuantao Gu, "The Convergence Guarantees of a Non-convex Approach for Sparse Recovery," IEEE Transactions on Signal Processing, 62(15):3754-3767, 2014.
[2] Laming Chen and Yuantao Gu, "The Convergence Guarantees of a Non-convex Approach for Sparse Recovery Using Regularized Least Squares," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 3374-3378, May 4-9, 2014, Florence, Italy.
[3] Laming Chen and Yuantao Gu, "Robust Recovery of Low-Rank Matrices via Non-Convex Optimization," International Conference on Digital Signal Processing (DSP), pp. 355-360, Aug. 20-23, 2014, Hong Kong, China.
http://gu.ee.tsinghua.edu.cn/
Thanks!
Supplementary
Weakly Convex Sparseness Measures with Parameter ρ
No. 1: F(t) = |t|,  ρ = 0
No. 2: F(t) = |t| / (|t| + σ)^(1−p),  ρ = (p − 1)σ^(p−2)
No. 3: F(t) = 1 − e^(−σ|t|),  ρ = −σ²/2
No. 4: F(t) = ln(1 + σ|t|),  ρ = −σ²/2
No. 5: F(t) = atan(σ|t|),  ρ = −3√3 σ²/16
No. 6: F(t) = (2σ|t| − σ²t²) for |t| ≤ 1/σ, and 1 for |t| > 1/σ,  ρ = −σ²
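As a hedged numerical sanity check (my addition, not part of the slides): the ρ values above can be verified by confirming that H(t) = F(t) − ρt² has non-negative second differences on a grid away from the kink at t = 0.

```python
import numpy as np

# Numerical sanity check of several (F, rho) pairs from the table:
# H(t) = F(t) - rho*t^2 should be convex, i.e. have non-negative
# second differences away from t = 0.
sigma = 2.0
penalties = {
    "No.3: 1 - exp(-sigma|t|)": (lambda t: 1 - np.exp(-sigma * np.abs(t)), -sigma**2 / 2),
    "No.4: ln(1 + sigma|t|)":   (lambda t: np.log1p(sigma * np.abs(t)),   -sigma**2 / 2),
    "No.5: atan(sigma|t|)":     (lambda t: np.arctan(sigma * np.abs(t)),  -3 * np.sqrt(3) * sigma**2 / 16),
}
t = np.linspace(1e-3, 5.0, 20001)            # positive axis only (F is even, kink at 0)
for name, (F, rho) in penalties.items():
    H = F(t) - rho * t**2
    second_diff = np.diff(H, 2)              # discrete proxy for H''
    print(name, "min second difference:", second_diff.min())
```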