Expectation-Maximization Gaussian-Mixture Approximate Message Passing
Philip Schniter and Jeremy Vila (OSU)
Supported in part by NSF-I/UCRC grant IIP-0968910, by NSF grant CCF-1018368, and by DARPA/ONR grant N66001-10-1-4090.
CISS @ Princeton – 3/23/12
Compressive Sensing

Goal: recover a signal x ∈ R^N from noisy sub-Nyquist measurements

    y = Ax + w,    y, w ∈ R^M,    M < N,

where x is K-sparse with K < M, or compressible.

With sufficient sparsity and appropriate conditions on the mixing matrix A (e.g., RIP, nullspace), accurate recovery of x is possible using polynomial-complexity algorithms.

A common approach (LASSO) is to solve the convex problem

    min_x ‖y − Ax‖₂² + α‖x‖₁,

where α can be tuned in accordance with the sparsity and SNR.
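As a concrete illustration (not part of the original slides), here is a minimal MATLAB sketch of solving a LASSO-type problem by iterative soft thresholding (ISTA); the weight alpha, step size mu, iteration count, and problem sizes below are assumptions, hand-picked for the example.

% Minimal ISTA sketch for the LASSO problem (illustrative only).
% Objective: min_x 0.5*||y - A*x||_2^2 + alpha*||x||_1 (same family as above, up to scaling).
rng(0);
N = 1000; M = 400; K = 100;
A = randn(M,N)/sqrt(M);                                 % i.i.d. Gaussian mixing matrix
x = zeros(N,1); x(randperm(N,K)) = sign(randn(K,1));    % K-sparse +/-1 signal
y = A*x + 0.01*randn(M,1);                              % noisy sub-Nyquist measurements

alpha = 0.05;                                 % tuned per sparsity/SNR (hand-picked here)
mu    = 1/norm(A)^2;                          % step size <= 1/||A||_2^2
xhat  = zeros(N,1);
for t = 1:300
    r    = xhat + mu*(A'*(y - A*xhat));        % gradient step on the quadratic term
    xhat = sign(r).*max(abs(r) - mu*alpha, 0); % soft threshold = prox of alpha*||.||_1
end
fprintf('NMSE = %.2f dB\n', 20*log10(norm(xhat - x)/norm(x)));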
Phase Transition Curves (PTC)

The PTC identifies the ratios (M/N, K/M) for which perfect noiseless recovery of a K-sparse x occurs (as M, N, K → ∞ under i.i.d. Gaussian A).

Suppose {x_n} are drawn i.i.d. with

    p_X(x_n) = λ f(x_n) + (1−λ) δ(x_n),

with known λ ≜ K/N.

LASSO's PTC is invariant to f(·); thus, LASSO is robust in the face of unknown f(·). MMSE reconstruction's PTC is far better than LASSO's, but requires knowing f(·).
[Figure: empirical AMP phase transition for MMSE reconstruction vs. the theoretical LASSO PTC; K/M (sparsity) vs. M/N (undersampling).]
Wu and Verdú, "Optimal phase transitions in compressed sensing," arXiv, Nov. 2011.
Motivations

For practical compressive sensing. . .

Want minimal MSE:
– distributions are unknown ⇒ can't formulate the MMSE estimator
– but there is hope: various algorithms are seen to outperform LASSO for specific signal classes
– really, we want a universal algorithm: good for all signal classes

Want fast runtime:
– especially for large signal length N (i.e., scalable)

Want to avoid algorithmic tuning parameters:
– who has the patience to tweak yet another CS algorithm!
Proposed Approach: "EM-GM-GAMP"

Model the signal and noise using flexible distributions:
– i.i.d. Bernoulli Gaussian-mixture (GM) signal

    p(x_n) = λ Σ_{l=1}^{L} ω_l N(x_n; θ_l, φ_l) + (1−λ) δ(x_n),  ∀n

– i.i.d. Gaussian noise with variance ψ

Learn the prior parameters q ≜ {λ, {ω_l, θ_l, φ_l}_{l=1}^{L}, ψ}:
– treat them as deterministic and use expectation-maximization (EM)

Exploit the learned priors in near-MMSE signal reconstruction:
– use generalized approximate message passing (GAMP)
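To make the signal model concrete, here is a small MATLAB sketch (my own, not from the talk) that draws x_n i.i.d. from the Bernoulli Gaussian-mixture prior above and forms y = Ax + w; the particular values of λ, ω, θ, φ, ψ below are illustrative assumptions.

% Draw x_n i.i.d. from the Bernoulli Gaussian-mixture prior and form y = A*x + w.
% All parameter values below are illustrative assumptions, not the talk's settings.
rng(1);
N = 1000; M = 500; L = 3;
lambda = 0.1;                        % activity rate Pr{x_n ~= 0}
omega  = [0.5; 0.3; 0.2];            % GM weights (sum to 1)
theta  = [-1;  0;   2 ];             % GM means
phi    = [0.2; 1.0; 0.5];            % GM variances
psi    = 1e-3;                       % AWGN variance

active = rand(N,1) < lambda;                 % Bernoulli support indicators
cdf    = cumsum(omega)';                     % 1xL cumulative weights
comp   = sum(rand(N,1) > cdf, 2) + 1;        % mixture component of each coefficient
x      = active .* (theta(comp) + sqrt(phi(comp)).*randn(N,1));

A = randn(M,N)/sqrt(M);              % i.i.d. Gaussian mixing matrix
w = sqrt(psi)*randn(M,1);            % i.i.d. Gaussian noise with variance psi
y = A*x + w;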
Approximate Message Passing (AMP)

AMP methods infer x from y = Ax + w using loopy belief propagation with carefully constructed approximations.
– The original AMP [Donoho, Maleki, Montanari '09] solves the LASSO problem (i.e., Laplacian MAP) assuming an i.i.d. matrix A.
– The Bayesian AMP [Donoho, Maleki, Montanari '10] framework tackles MMSE inference under generic signal priors.
– The generalized AMP [Rangan '10] framework tackles MAP or MMSE inference under generic signal & noise priors and generic A.

AMP is a form of iterative thresholding, requiring only two applications of A per iteration and ≈ 25 iterations. Very fast!

Rigorous large-system analyses (under i.i.d. Gaussian A) have established that (G)AMP follows a state-evolution trajectory with optimal properties [Bayati, Montanari '10], [Rangan '10].
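A minimal MATLAB sketch (my own, not the talk's GAMP algorithm) of the soft-thresholding form of AMP, showing the two applications of A per iteration and the Onsager correction term; the threshold schedule and iteration count are simplifying assumptions.

% Minimal soft-thresholding AMP sketch (illustrative; the threshold schedule and
% iteration count are assumptions, not the talk's GAMP algorithm).
% Note: each iteration applies A twice (once as A*xhat, once as A'*z).
rng(2);
N = 1000; M = 500; K = 100;
A = randn(M,N)/sqrt(M);
x = zeros(N,1); x(randperm(N,K)) = randn(K,1);
y = A*x;                                      % noiseless measurements for simplicity

soft = @(u,t) sign(u).*max(abs(u)-t, 0);      % scalar soft-thresholding denoiser
xhat = zeros(N,1);
z    = y;                                     % Onsager-corrected residual
for it = 1:25
    tau  = norm(z)/sqrt(M);                   % threshold from residual energy (heuristic)
    xhat = soft(xhat + A'*z, tau);            % denoise the pseudo-data
    z    = y - A*xhat + (z/M)*nnz(xhat);      % residual + Onsager correction term
end
fprintf('NMSE = %.2f dB\n', 20*log10(norm(xhat - x)/norm(x)));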
AMP Heuristics (Sum-Product)

[Factor graph: prior factors p_X(x_1), …, p_X(x_N) attach to variable nodes x_1, …, x_N, which connect to measurement factors N(y_i; [Ax]_i, ψ) for i = 1, …, M; messages p_{i→j}(x_j) and p_{i←j}(x_j) flow along the edges.]

Message from the y_i node to the x_j node:

    p_{i→j}(x_j) ∝ ∫_{ {x_r}_{r≠j} } N(y_i; Σ_r a_{ir} x_r, ψ) Π_{r≠j} p_{i←r}(x_r)
                 ≈ ∫_{z_i} N(y_i; z_i, ψ) N(z_i; ẑ_i(x_j), ν_i^z(x_j)),

where z_i ≜ Σ_r a_{ir} x_r is approximately Gaussian via the CLT.

To compute ẑ_i(x_j) and ν_i^z(x_j), the means and variances of {p_{i←r}}_{r≠j} suffice, thus Gaussian message passing!

Remaining problem: we have 2MN messages to compute (too many!).

Exploiting the similarity among the messages {p_{i←j}}_{i=1}^{M}, AMP employs a Taylor-series approximation of their difference whose error vanishes as M → ∞ for dense A (and similarly for {p_{i→j}}_{j=1}^{N} as N → ∞). Finally, we need to compute only O(M + N) messages!
Expectation-Maximization

We use expectation-maximization (EM) to learn the signal and noise prior parameters q ≜ {λ, ω, θ, φ, ψ}.

The missing data is chosen to be the signal and noise vectors (x, w). The updates are performed coordinate-wise. For example, updating λ at the ith EM iteration involves

    (E-step)  Q(λ | q^i) = Σ_{n=1}^{N} E[ ln p(x_n; λ, ω^i, θ^i, φ^i) | y; q^i ]
    (M-step)  λ^{i+1} = arg max_{λ∈(0,1)} Q(λ | q^i).

The updates of (ω, θ, φ, ψ) are similar (details in the paper). All quantities needed for the EM updates are provided by GAMP!
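For intuition only (my sketch, not the authors' code): when the E-step expectation is formed from GAMP's marginal posteriors, the λ update reduces to averaging per-coefficient posterior activity probabilities. Assuming GAMP supplies pi_n = Pr{x_n ≠ 0 | y; q^i}, the update might look like:

% EM update of the sparsity rate lambda from GAMP posteriors (illustrative sketch).
% pi_n = Pr{x_n ~= 0 | y; q^i} is assumed to be returned by the GAMP step.
function lambda_new = em_update_lambda(pi_n)
    % Maximizing Q(lambda|q^i) over lambda in (0,1) yields the mean posterior activity.
    lambda_new = mean(pi_n);
    % Keep the estimate strictly inside (0,1) for numerical safety.
    lambda_new = min(max(lambda_new, 1e-6), 1 - 1e-6);
end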
Parameter Initialization

Initialization matters; EM can get stuck in a local maximum. We suggest. . .
– initializing the sparsity λ according to the theoretical LASSO PTC,
– initializing the noise and active-signal variances using the known energies ‖y‖₂², ‖A‖_F² and a user-supplied SNR⁰ (which defaults to 20 dB):

    ψ⁰ = ‖y‖₂² / ((SNR⁰ + 1) M),    (σ²)⁰ = (‖y‖₂² − M ψ⁰) / (λ⁰ ‖A‖_F²),

– fixing L (e.g., L = 3) and initializing the GM parameters (ω, θ, φ) as the best fit to a uniform distribution with variance σ².

We have also developed
– a "splitting" mode that adds one GM component at a time,
– a "heavy-tailed" mode that forces zero-mean GM components.
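A short MATLAB sketch (my own, under the defaults stated above) of these variance initializations; lambda0 is assumed to have been set from the theoretical LASSO PTC beforehand.

% Variance initialization from known energies (sketch of the rule above).
% SNR0 defaults to 20 dB; lambda0 is assumed set from the theoretical LASSO PTC.
function [psi0, sigma2_0] = init_variances(y, A, lambda0, SNR0_dB)
    if nargin < 4, SNR0_dB = 20; end
    SNR0     = 10^(SNR0_dB/10);
    M        = numel(y);
    psi0     = norm(y)^2 / ((SNR0 + 1)*M);                        % noise variance psi^0
    sigma2_0 = (norm(y)^2 - M*psi0) / (lambda0*norm(A,'fro')^2);  % active-signal variance (sigma^2)^0
end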
Examples of Learned Signal pdfs

The following shows the Gaussian-mixture pdf learned by EM-GM-GAMP when the true active-signal pdf was uniform (left) and ±1 (right):

[Figure: true and estimated signal pdfs p(x_n) vs. x_n, for a uniform active-signal pdf (left) and a ±1 active-signal pdf (right).]
Empirical PTCs: Bernoulli-Rademacher (±1) signals

We now evaluate noiseless reconstruction performance via phase-transition curves constructed using length-N = 1000 signals, i.i.d. Gaussian A, and 100 realizations.

We see EM-GM-GAMP performing significantly better than LASSO for this signal class. We also see EM-GM-GAMP performing nearly as well as GM-GAMP under genie-aided parameter settings.

[Figure: empirical noiseless Bernoulli-Rademacher PTCs, K/M vs. M/N, for genie GM-GAMP, EM-GM-GAMP, EM-BG-GAMP, Laplacian-AMP, and the theoretical LASSO curve.]
PTCs for Bernoulli-Gaussian and Bernoulli signals

[Figure: empirical noiseless Bernoulli-Gaussian PTCs (left) and Bernoulli PTCs (right), K/M vs. M/N, for genie GM-GAMP, EM-GM-GAMP, EM-BG-GAMP, Laplacian-AMP, and the theoretical LASSO curve.]

For these signals, we see EM-GM-GAMP performing. . .
– significantly better than LASSO,
– nearly as well as genie-aided GM-GAMP,
– on par with our previous "EM-BG-GAMP" algorithm.
Noisy Recovery: Bernoulli-Rademacher (±1) signals

We now compare the normalized MSE (NMSE) of EM-GM-GAMP to several state-of-the-art algorithms (SL0, T-MSBL, BCS, LASSO via SPGL1) for the task of noisy signal recovery under i.i.d. Gaussian A. For this, we fixed N = 1000, K = 100, SNR = 25 dB and varied M.

For these Bernoulli-Rademacher signals, we see EM-GM-GAMP outperforming the other algorithms for all undersampling ratios M/N. Notice that our previous EM-BG-GAMP algorithm cannot accurately model the Bernoulli-Rademacher prior.

[Figure: noisy Bernoulli-Rademacher recovery NMSE [dB] vs. M/N for SL0, BCS, genie Lasso, EM-BG-GAMP, T-MSBL, and EM-GM-GAMP.]
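For reference, the NMSE reported in these plots is presumably the usual normalized squared error; in MATLAB one might compute it as:

% Normalized MSE in dB, assuming the usual definition NMSE = ||xhat - x||^2 / ||x||^2.
nmse_dB = @(xhat, x) 10*log10(norm(xhat - x)^2 / norm(x)^2);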
Noisy Recovery: Bernoulli-Gaussian and Bernoulli signals

[Figure: noisy Bernoulli-Gaussian recovery NMSE (left) and noisy Bernoulli recovery NMSE (right), NMSE [dB] vs. M/N, for SL0, genie Lasso, BCS, T-MSBL, EM-BG-GAMP, and EM-GM-GAMP.]

For Bernoulli-Gaussian and Bernoulli signals, EM-GM-GAMP again dominates the other algorithms. We attribute the excellent performance of EM-GM-GAMP to its ability to learn and exploit the true signal prior.
Noisy Recovery of Heavy-tailed (Student's-t) signals

[Figure: noisy Student's-t recovery NMSE [dB] vs. M/N for BCS, T-MSBL, EM-BG-GAMP, SL0, genie Lasso, and EM-GM-GAMP.]

Algorithm rankings on heavy-tailed signals are often the reverse of those for sparse signals! In its "heavy-tailed" mode, EM-GM-GAMP performs on par with the best algorithms for all M/N.
Runtime versus signal length N

We fix M/N = 0.5, K/N = 0.1, SNR = 25 dB, and average over 50 trials.

[Figure: noisy Bernoulli-Rademacher recovery time (left) and recovery NMSE (right) vs. N, for SPGL1, SL0, BCS, T-MSBL, EM-BG-GAMP, EM-GM-GAMP, and the fast-operator variants SPGL1 fft, EM-BG-GAMP fft, and EM-GM-GAMP fft.]

For all N > 1000, EM-GM-GAMP has the fastest runtime! EM-GM-GAMP can also leverage fast operators for A (e.g., FFT); a sketch of such an operator follows.
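For example, here is a minimal MATLAB sketch (my own, not the EMGMAMP package's interface) of representing A implicitly by FFT-based function handles, as one might do with a randomly row-sampled DFT matrix:

% Represent A implicitly by fast function handles (illustrative sketch, not the
% EMGMAMP package interface). Here A keeps M random rows of the unitary N-point DFT.
N = 2^16; M = N/4;
rows = sort(randperm(N, M));                 % retained measurement indices
S    = sparse(1:M, rows, 1, M, N);           % row-selection operator

Afun  = @(x) S*(fft(x)/sqrt(N));             % y = A*x,  cost O(N log N)
Ahfun = @(z) sqrt(N)*ifft(S'*z);             % x = A'*z, cost O(N log N)

% Quick check of the adjoint pair on random vectors:
x = randn(N,1); z = randn(M,1);
fprintf('adjoint error = %.2e\n', abs(z'*Afun(x) - Ahfun(z)'*x));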
Extension to structured sparsity (Justin Ziniel)

Recovery of an audio signal sparsified via DCT Ψ and compressively sampled via i.i.d. Gaussian Φ (so that A = ΦΨ). Exploit persistence of support across time via discrete Markov chains and turbo AMP.

[Figure: factor graph coupling the per-timestep AMP blocks through Markov chains on the support variables s_n^(t) and amplitude variables θ_n^(t) across timesteps t = 0, …, T.]

Recovery NMSE and runtime:

algorithm          EM-GM-AMP              turbo EM-GM-AMP
M/N = 1/5          -9.04 dB  (8.77 s)     -12.34 dB (9.37 s)
M/N = 1/3          -12.72 dB (10.26 s)    -16.07 dB (11.05 s)
M/N = 1/2          -17.17 dB (11.92 s)    -20.94 dB (12.96 s)
Conclusions

We proposed a sparse reconstruction algorithm that uses EM to learn GM-signal and AWGN-noise priors, and that uses GAMP to exploit these priors for near-MMSE signal recovery.

Advantages of EM-GM-GAMP:
– State-of-the-art NMSE performance for all tested signal types.
– State-of-the-art complexity for signals of length N ≳ 1000.
– Minimal tuning: choose between "sparse" or "heavy-tailed" modes.

Ongoing related work:
– Theoretical performance guarantees for EM-GM-GAMP.
– Extension to non-Gaussian noise.
– Universal learning/exploitation of structured sparsity.
– Extensions to matrix completion, dictionary learning, and robust PCA.
Matlab code is available at http://ece.osu.edu/~vilaj/EMGMAMP/EMGMAMP.html
Thanks!