Compressed Sensing and Bayesian Experimental Design
Matthias W. Seeger and Hannes Nickisch
Max Planck Institute for Biological Cybernetics, Tübingen, Germany
July 8, 2008
Problem Statement
Measuring Natural Images
Reconstruct natural images u ∈ R^n from m ≪ n noisy linear measurements y = Xu + ε ∈ R^m.
Applications: digital photography, magnetic resonance imaging.
How to choose X? Compressed sensing theory says: random X. But image sparsity is highly structured, so random X should not do well.
Weiss et al., Snowbird 2007
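For concreteness, the measurement model can be sketched as follows. This is an illustrative snippet, not the authors' code: the i.i.d. Gaussian X stands in for the "random X" of compressed sensing theory, and the sizes and noise level are assumptions.

```python
import numpy as np

# Sketch of the measurement model y = X u + eps with a random design X.
# n, m and the noise variance are illustrative choices, not values from the talk.
rng = np.random.default_rng(0)
n, m, noise_var = 64 * 64, 300, 0.005          # pixels, measurements, sigma^2

u = rng.standard_normal(n)                     # placeholder for a vectorized image
X = rng.standard_normal((m, n)) / np.sqrt(n)   # random measurement filters (rows of X)
y = X @ u + np.sqrt(noise_var) * rng.standard_normal(m)   # noisy linear measurements
```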
Our Contributions
1. Large study on natural images
2. Bayesian method for optimizing X: learning compressed sensing
Our Approach
Compressed Sensing as Bayesian Design
Low-level statistics of natural images: images are sparse, so use a sparsity prior distribution.
Sequential measurement optimization: choose the next filter x_*^T along the direction of largest uncertainty. This requires the Bayesian posterior P(u | y).
Approximate inference drives optimization of X
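A minimal sketch of the sequential design step under a simplifying Gaussian-posterior assumption (the method on the slide uses a sparsity prior and Expectation Propagation instead; the prior precision tau, the noise level, and the eigenvector rule below are illustrative assumptions): the next filter is taken along the direction in which P(u | y) is most uncertain.

```python
import numpy as np

# Sketch: choose the next measurement filter x_* along the direction of largest
# posterior uncertainty. The posterior here is a plain Gaussian (prior tau*I),
# whereas the talk's method uses a sparsity prior handled by Expectation
# Propagation -- treat this as an illustration of the design rule only.
def next_filter(X, noise_var=0.005, tau=1.0):
    n = X.shape[1]
    # Gaussian-linear posterior covariance: Cov[u|y] = (tau*I + X^T X / noise_var)^{-1}
    precision = tau * np.eye(n) + X.T @ X / noise_var
    cov = np.linalg.inv(precision)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    return eigvecs[:, -1]                      # top eigenvector = most uncertain direction

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 64))              # a few existing measurement filters
x_star = next_filter(X)
X = np.vstack([X, x_star])                     # append x_* as the next row of X
```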
[Figure: log-scale density plot over [−1, 1] (vertical axis 10^−4 to 10^1); legend: Histogram, Sparse, Normal; label: Expectation Propagation.]
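The figure above contrasts an empirical coefficient histogram with a "sparse" density and a normal density. A brief sketch of that comparison, using a Laplace density as the sparse candidate (the Laplace form and its scale are assumptions for illustration; the slide does not state the exact prior):

```python
import numpy as np

# Compare a heavy-tailed "sparse" (Laplace) density with a Gaussian of equal
# variance over [-1, 1]; the scale tau is an arbitrary illustrative choice.
x = np.linspace(-1.0, 1.0, 201)
tau = 0.1
laplace = np.exp(-np.abs(x) / tau) / (2 * tau)          # sparse candidate
var = 2 * tau**2                                        # matching Gaussian variance
normal = np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# On a log scale the Laplace density is sharply peaked at zero with heavier
# tails than the Gaussian -- the qualitative shape of a sparsity prior.
print(np.log10(laplace)[::50])
print(np.log10(normal)[::50])
```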
Results
Sequential Algorithm Illustration
Results
Comparison of Different Methods
[Figure: reconstruction error vs. number of measurements (10 to 1024); curves: SBL (opt), LASSO (rand), L2 (heur), EP (opt).]
75 images, 64 × 64 = 4k pixels, σ² = 0.005
Results
A Very Simple Baseline
L2 (wavelet heuristic). I: fixed index set of wavelet coefficients, coarse → fine.
v_i ← y_i for i ∈ I, v_i ← 0 for i ∉ I; û ← W^T v.
Natural images: how you measure is more important than how you reconstruct.
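A minimal sketch of this baseline for a 1-D signal with an orthonormal Haar transform W (the 1-D setting, the Haar wavelet, and the specific coarse-to-fine ordering are assumptions made to keep the example short; the talk applies the idea to 2-D images):

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar wavelet matrix for n a power of two.
    Rows are ordered coarse -> fine (row 0 is the overall average)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    coarse = np.kron(h, [1.0, 1.0])               # scaling (coarse) rows
    fine = np.kron(np.eye(n // 2), [1.0, -1.0])   # detail (fine) rows
    return np.vstack([coarse, fine]) / np.sqrt(2.0)

# "L2 (wavelet heuristic)": measure the first m (coarsest) wavelet coefficients,
# zero-fill the rest, and invert the orthonormal transform.
rng = np.random.default_rng(0)
n, m, noise_var = 256, 64, 0.005
W = haar_matrix(n)
u = rng.standard_normal(n)                        # placeholder for an image (here 1-D)
I = np.arange(m)                                  # fixed index set, coarse -> fine
y = W[I] @ u + np.sqrt(noise_var) * rng.standard_normal(m)

v = np.zeros(n)
v[I] = y                                          # v_i <- y_i for i in I, 0 otherwise
u_hat = W.T @ v                                   # u_hat <- W^T v
```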
Results
The Same for Larger Images
[Left panel: 75 images, 64 × 64 = 4k pixels, σ² = 0.005; reconstruction error vs. number of measurements (10 to 1024); curves: LASSO (rand), L2 (heur), LASSO (heur).]
[Right panel: 75 images, 256 × 256 = 65k pixels, σ² = 0.001; reconstruction error vs. number of measurements (up to 15k); curves: LASSO (rand), L2 (heur), LASSO (heur).]
Compressed Sensing by Minimax Theory
y = Xu + ε ∈ R^m, u ∈ R^n sparse. What if u is an image?
[Figure: regions labeled "Sparse Signals" and "Natural Images", with natural images depicted as a small subset of the sparse signals.]
Upper bounds (Candès, Romberg & Tao, Thm. 1.3; Wainwright (a), Thm. 1): if m > s log n and P(X) ..., then for all s-sparse signals u, Lasso reconstruction is exact with probability ≈ 1 (over X). (A small numerical sketch follows at the end of this slide.)
Lower bounds (this is "tight" because ...):
"No recovery can be successful for all [s-sparse] signals using significantly fewer observations." (CRT, Sect. 1.4)
"We think of the underlying true vector [u] with its support [T] randomly chosen, ..." (Wainwright (b), Sect. 1.2)
Optimality of Lasso and of simple P(X) is minimax optimality: signals are sparse, all other things are random.
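The upper-bound regime can be illustrated numerically. This sketch uses scikit-learn's Lasso; the problem sizes, the regularization strength, and the support-recovery threshold are arbitrary illustrative choices, not values from the cited theorems.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Illustration of the upper-bound regime: random Gaussian X, s-sparse u,
# m on the order of s*log(n) measurements, Lasso reconstruction.
rng = np.random.default_rng(0)
n, s = 512, 10
m = int(4 * s * np.log(n))                        # constant 4 chosen arbitrarily

u = np.zeros(n)
support = rng.choice(n, size=s, replace=False)    # random support, random signs
u[support] = rng.choice([-1.0, 1.0], size=s)

X = rng.standard_normal((m, n)) / np.sqrt(m)      # random measurement design
y = X @ u + 0.01 * rng.standard_normal(m)         # small measurement noise

u_hat = Lasso(alpha=0.01, max_iter=10000).fit(X, y).coef_
recovered = set(np.flatnonzero(np.abs(u_hat) > 0.1))
print("support recovered:", recovered == set(support))
```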
Where is the Energy?
The World according to Minimax
The World according to Minimax
Nyquist.1 → Nyquist.2
Nyquist.1: signal band-limited, otherwise random: you cannot do better than X1.
Nyquist.2: signal band-limited, sparse, otherwise random: but now you really cannot do better than X2 < X1.
Natural images (real-world signals)
✓ are band-limited
✓ are approximately sparse
and have much exploitable structure beyond that!
You can do better than X2 on images, and you can show that this works by sound empirical evaluation.
Conclusions
L1/Lasso/Dantzig/... with minimally coherent P(X) meets minimax lower bounds for sparse signals (all other things random).
Natural image statistics: sparse, but much else is non-random ⇒ one can robustly choose much better filters X.
Our method uses the same prior knowledge as L1/Lasso; it does more with the posterior than just maximizing it.
You should optimize X for your domain of interest, and you can do that with little firm prior knowledge. This needs experimental design, not just uniform random sampling.
Bayesian experimental design can be scaled up with novel variational inference algorithms (Seeger et al., submitted).