Accelerating image reconstruction using variable splitting methods

Jeffrey A. Fessler
EECS Dept., BME Dept., Dept. of Radiology
University of Michigan
web.eecs.umich.edu/~fessler
UM CSP Seminar, May 23, 2013
Disclosure
• Research support from GE Healthcare
• Research support to GE Global Research
• Supported in part by NIH grants R01 HL-098686 and P01 CA-87634
• Equipment support from Intel
Statistical image reconstruction: a CT revolution
• A picture is worth 1000 words
• (and perhaps several 1000 seconds of computation?)
[Figure: three reconstructions from the same sinogram, hence all at the same dose: thin-slice FBP (seconds), ASIR (a bit longer), statistical (much longer).]
Outline
• Image denoising (review)
• Image restoration
  Antonios Matakos, Sathish Ramani, JF, IEEE T-IP, May 2013,
  "Accelerated edge-preserving image restoration without boundary artifacts"
• Low-dose X-ray CT image reconstruction
  Sathish Ramani & JF, IEEE T-MI, Mar. 2012,
  "A splitting-based iterative algorithm for accelerated statistical X-ray CT reconstruction"
• Model-based MR image reconstruction
  Sathish Ramani & JF, IEEE T-MI, Mar. 2011,
  "Parallel MR image reconstruction using augmented Lagrangian methods"
• Image in-painting (e.g., from cutset sampling) using sparsity
  (premiere! inspired by work of D. Neuhoff and M. Prelee)
Image denoising
Denoising using sparsity

Measurement model:
\[ \underbrace{y}_{\text{observed}} = \underbrace{x}_{\text{unknown}} + \underbrace{\varepsilon}_{\text{noise}} \]

Object model: assume \( Q x \) is sparse (compressible) for some orthogonal sparsifying transform \( Q \), such as an orthogonal wavelet transform (OWT).

Sparsity regularized estimator:
\[ \hat{x} = \arg\min_x \underbrace{\frac{1}{2} \|y - x\|_2^2}_{\text{data fit}} + \beta \underbrace{\|Q x\|_p}_{\text{sparsity}} \]
Regularization parameter β determines the trade-off.

Equivalently (because \( Q^{-1} = Q' \) for an orthonormal matrix):
\[ \hat{x} = Q' \hat{\theta}, \qquad \hat{\theta} = \arg\min_\theta \frac{1}{2} \|Q y - \theta\|_2^2 + \beta \|\theta\|_p = \operatorname{shrink}(Q y;\, \beta, p) \]
Non-iterative solution!
Orthogonal transform thresholding

Equation:
\[ \hat{x} = Q' \operatorname{shrink}(Q y;\, \beta, p) \]

Block diagram:
Noisy image \( y \) → Analysis transform \( Q \) → shrink(·; β, p) → \( \hat{\theta} \) → Synthesis transform \( Q' \) → Denoised image \( \hat{x} \)

(The shrink function for p = 1 and p = 0 is sketched below.)
But sparsity in orthogonal transforms often yields artifacts.
Cycle spinning...
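As a concrete version of that shrink step, here is a minimal NumPy sketch for p = 1 (soft thresholding) and p = 0 (hard thresholding); the function name and interface are illustrative choices, not code from the cited papers.

```python
import numpy as np

def shrink(t, beta, p):
    """Elementwise argmin over theta of 0.5*(t - theta)^2 + beta*|theta|^p."""
    if p == 1:   # soft threshold: shrink magnitudes toward zero by beta
        return np.sign(t) * np.maximum(np.abs(t) - beta, 0)
    if p == 0:   # hard threshold: theta = t beats theta = 0 iff 0.5*t^2 > beta
        return t * (np.abs(t) > np.sqrt(2 * beta))
    raise ValueError("only p = 0 and p = 1 are sketched here")
```

With an orthonormal Q, the whole denoiser is then the one-pass pipeline above: analyze, shrink, synthesize.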
Hard thresholding example

[Figure: noisy image (PSNR = 76.1 dB) and denoised image (PSNR = 89.7 dB), both roughly 200 × 184 pixels.]

p = 0, orthonormal Haar wavelets
Sparsity using shift-invariant models

Analysis form: assume \( R x \) is sparse for some sparsifying transform \( R \). Often \( R \) is a "tall" matrix, e.g., finite differences along horizontal and vertical directions, i.e., anisotropic total variation (TV). Often \( R \) is shift invariant: \( \|R x\|_p = \|R \operatorname{circshift}(x)\|_p \) and \( R'R \) is circulant.
\[ \hat{x} = \arg\min_x \frac{1}{2} \|y - x\|_2^2 + \beta \underbrace{\|R x\|_p}_{\text{transform sparsity}} \]

Synthesis form: assume \( x = S \theta \) where the coefficient vector \( \theta \) is sparse. Often \( S \) is a "fat" matrix (over-complete dictionary) and \( S'S \) is circulant.
\[ \hat{x} = S \hat{\theta}, \qquad \hat{\theta} = \arg\min_\theta \frac{1}{2} \|y - S \theta\|_2^2 + \beta \underbrace{\|\theta\|_p}_{\text{sparse coefficients}} \]

Analysis form preferable to synthesis form? (Elad et al., Inv. Prob., June 2007)
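A minimal sketch of such a shift-invariant R, assuming horizontal and vertical finite differences (anisotropic TV) with periodic boundary conditions so that R'R is circulant; the function names are illustrative.

```python
import numpy as np

def R(x):
    """'Tall' sparsifying transform: horizontal and vertical finite
    differences with periodic boundaries (2N outputs for N pixels)."""
    dh = x - np.roll(x, 1, axis=1)   # horizontal differences
    dv = x - np.roll(x, 1, axis=0)   # vertical differences
    return np.stack([dh, dv])

def Rt(d):
    """Adjoint R': each difference is R = I - S for a circular shift S,
    and S' = S^{-1}, so R' = I - S^{-1} in each direction."""
    dh, dv = d
    return (dh - np.roll(dh, -1, axis=1)) + (dv - np.roll(dv, -1, axis=0))
```

Because both differences are circular convolutions, R'R is diagonalized by the 2D DFT, which is what makes the later ADMM updates non-iterative.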
Constrained optimization

Unconstrained estimator (analysis form for illustration):
\[ \hat{x} = \arg\min_x \frac{1}{2} \|y - x\|_2^2 + \beta \|R x\|_p \]
(Nonnegativity constraint or box constraints easily added.)

Equivalent constrained optimization problem:
\[ \min_{x, v} \frac{1}{2} \|y - x\|_2^2 + \beta \|v\|_p \quad \text{sub. to } v = R x \]
(Y. Wang et al., SIAM J. Im. Sci., 2008)
(M. Afonso, J. Bioucas-Dias, M. Figueiredo, IEEE T-IP, Sep. 2010)

(The auxiliary variable v is discarded after optimization; keep only \( \hat{x} \).)

Penalty approach:
\[ \hat{x} = \arg\min_x \min_v \frac{1}{2} \|y - x\|_2^2 + \beta \|v\|_p + \frac{\mu}{2} \|v - R x\|_2^2 \]
Large µ better enforces the constraint v = R x, but can worsen conditioning. Preferable (?) approach: augmented Lagrangian.
Augmented Lagrangian method: V1

General linearly constrained optimization problem:
\[ \min_u \Psi(u) \quad \text{sub. to } C u = b \]

Form augmented Lagrangian:
\[ L(u, \gamma) \triangleq \Psi(u) + \gamma' (C u - b) + \frac{\rho}{2} \|C u - b\|_2^2 \]
where γ is the dual variable or Lagrange multiplier vector.

AL method alternates between minimizing over u and gradient ascent on γ:
\[ u^{(n+1)} = \arg\min_u L(u, \gamma^{(n)}) \]
\[ \gamma^{(n+1)} = \gamma^{(n)} + \rho \left( C u^{(n+1)} - b \right) \]

Desirable convergence properties. The AL penalty parameter ρ affects the convergence rate, not the solution!

Unfortunately, minimizing over u is impractical here: v = R x is equivalent to C u = b with
\[ C = \begin{bmatrix} R & -I \end{bmatrix}, \qquad u = \begin{bmatrix} x \\ v \end{bmatrix}, \qquad b = 0. \]
Augmented Lagrangian method: V2

General linearly constrained optimization problem:
\[ \min_u \Psi(u) \quad \text{sub. to } C u = b \]

Form (modified) augmented Lagrangian by completing the square:
\[ L(u, \eta) \triangleq \Psi(u) + \frac{\rho}{2} \|C u - \eta\|_2^2 + C_\eta, \]
where \( \eta \triangleq b - \frac{1}{\rho} \gamma \) is a modified dual variable or Lagrange multiplier vector, and \( C_\eta \) is a constant independent of u.

AL method alternates between minimizing over u and gradient ascent on η:
\[ u^{(n+1)} = \arg\min_u L(u, \eta^{(n)}) \]
\[ \eta^{(n+1)} = \eta^{(n)} - \left( C u^{(n+1)} - b \right) \]

Desirable convergence properties. The AL penalty parameter ρ affects the convergence rate, not the solution!

Unfortunately, minimizing over u is impractical here: v = R x is equivalent to C u = b with
\[ C = \begin{bmatrix} R & -I \end{bmatrix}, \qquad u = \begin{bmatrix} x \\ v \end{bmatrix}, \qquad b = 0. \]
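In code, the V2 iteration is just an inner minimization plus a dual step. A schematic sketch, where `min_over_u` is a placeholder for the problem-specific solver of \( \arg\min_u \Psi(u) + \frac{\rho}{2} \|C u - \eta\|_2^2 \):

```python
def al_method(min_over_u, C_apply, b, eta0, n_iter):
    """Modified augmented Lagrangian (V2): alternate an inner minimization
    over u with a gradient step on the modified dual variable eta."""
    eta = eta0
    for _ in range(n_iter):
        u = min_over_u(eta)           # inner solve (the AL parameter rho lives here)
        eta = eta - (C_apply(u) - b)  # dual update
    return u
```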
Alternating direction method of multipliers (ADMM)

When u has multiple component vectors, e.g., \( u = \begin{bmatrix} x \\ v \end{bmatrix} \), rewrite the (modified) augmented Lagrangian in terms of all component vectors:
\[ L(x, v; \eta) = \Psi(x, v) + \frac{\rho}{2} \|R x - v - \eta\|_2^2 = \frac{1}{2} \|y - x\|_2^2 + \beta \|v\|_p + \underbrace{\frac{\rho}{2} \|R x - v - \eta\|_2^2}_{\text{cf. penalty!}} \]
because here \( C u = R x - v \).

Alternate between minimizing over each component vector:
\[ x^{(n+1)} = \arg\min_x L(x, v^{(n)}, \eta^{(n)}) \]
\[ v^{(n+1)} = \arg\min_v L(x^{(n+1)}, v, \eta^{(n)}) \]
\[ \eta^{(n+1)} = \eta^{(n)} - \left( R x^{(n+1)} - v^{(n+1)} \right) \]

Reasonably desirable convergence properties (even with inexact inner minimizations!), under sufficient conditions on the matrix C.
(Eckstein & Bertsekas, Math. Prog., Apr. 1992)
(Douglas & Rachford, Tr. Am. Math. Soc., 1956, heat conduction problems)
ADMM for image denoising

Augmented Lagrangian:
\[ L(x, v; \eta) = \frac{1}{2} \|y - x\|_2^2 + \beta \|v\|_p + \frac{\rho}{2} \|R x - v - \eta\|_2^2 \]

Update of primal variable (unknown image):
\[ x^{(n+1)} = \arg\min_x L(x, v^{(n)}, \eta^{(n)}) = \underbrace{[I + \rho R' R]^{-1}}_{\text{Wiener filter}} \left( y + \rho R' \left( v^{(n)} + \eta^{(n)} \right) \right) \]

Update of auxiliary variable:
\[ v^{(n+1)} = \arg\min_v L(x^{(n+1)}, v, \eta^{(n)}) = \operatorname{shrink}\left( R x^{(n+1)} - \eta^{(n)};\, \beta/\rho, p \right) \]
(No "corner rounding" needed for ℓ1.)

Update of multiplier (sign follows the V2 dual update):
\[ \eta^{(n+1)} = \eta^{(n)} - \left( R x^{(n+1)} - v^{(n+1)} \right) \]

Equivalent to the "split Bregman" approach. (Goldstein & Osher, SIAM J. Im. Sci., 2009)

Each update is simple and exact (non-iterative) if \( [I + \rho R' R]^{-1} \) is easy.
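Putting the three updates together with the `R`, `Rt`, and `shrink` sketches from earlier gives a complete denoiser. With periodic boundaries, \( I + \rho R'R \) is circulant, so the "Wiener filter" step is exact via the 2D FFT; a sketch under those assumptions, not the authors' reference code:

```python
import numpy as np

def admm_denoise(y, beta, rho, p=1, n_iter=20):
    """ADMM (split Bregman) denoising: Wiener filter, shrink, dual update."""
    # Eigenvalues of the circulant filter I + rho*R'R via its impulse response.
    e = np.zeros_like(y)
    e[0, 0] = 1.0
    H = np.fft.fft2(e + rho * Rt(R(e)))
    x, v, eta = y.copy(), R(y), np.zeros((2,) + y.shape)
    for _ in range(n_iter):
        rhs = y + rho * Rt(v + eta)
        x = np.real(np.fft.ifft2(np.fft.fft2(rhs) / H))  # Wiener filter step
        v = shrink(R(x) - eta, beta / rho, p)            # shrinkage step
        eta = eta - (R(x) - v)                           # multiplier update
    return x
```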
ADMM image denoising example

[Figure: true, noisy (SNR = 24.8 dB), and denoised (SNR = 53.5 dB) 128 × 128 images, with a plot of SNR (dB) versus ADMM iteration (0 to 20), rising from 25 dB to about 53 dB.]

R: horizontal and vertical finite differences (anisotropic TV), p = 1 (i.e., ℓ1), β = 1/2, ρ = 1 (condition number of \( I + \rho R' R \) is 9)
ADMM image denoising iterates

[Movie: ADMM denoising iterates.]
Image restoration
Image restoration models

Unrealistic model:
\[ \underbrace{y}_{\text{observed}} = \underbrace{A}_{\text{blur}} \underbrace{x}_{\text{unknown}} + \underbrace{\varepsilon}_{\text{noise}} \]
The measured blurry image y and the unknown image x have the same size. A is a circulant matrix corresponding to a shift-invariant blur model.

Somewhat more realistic measurement model:
\[ y = T A x + \varepsilon \]
The measured blurry image y is smaller than the unknown image x. T is a (fat) "truncation" matrix, akin to \( [0 \; I \; 0] \). (S. Reeves, IEEE T-IP, Oct. 2005)
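A minimal sketch of this forward model, assuming a circulant blur specified by its 2D frequency response `psf_fft` and a truncation T that discards a border of width `pad` (both names are illustrative):

```python
import numpy as np

def blur_truncate(x, psf_fft, pad):
    """Forward model y = T A x: circulant (FFT-based) blur, then truncation,
    so the observed y is smaller than the unknown x."""
    Ax = np.real(np.fft.ifft2(psf_fft * np.fft.fft2(x)))  # A: shift-invariant blur
    return Ax[pad:-pad, pad:-pad]                         # T: truncation, akin to [0 I 0]
```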
Image restoration with sparsity regularization

Regularized estimator:
\[ \hat{x} = \arg\min_x \underbrace{\frac{1}{2} \|y - T A x\|_2^2}_{\text{data fit}} + \beta \underbrace{\|R x\|_p}_{\text{sparsity}} \]

Basic equivalent constrained optimization problem:
\[ \min_{x, v} \frac{1}{2} \|y - T A x\|_2^2 + \beta \|v\|_p \quad \text{sub. to } v = R x \]

Corresponding (modified) augmented Lagrangian (cf. "split Bregman"):
\[ L(x, v; \eta) = \frac{1}{2} \|y - T A x\|_2^2 + \beta \|v\|_p + \frac{\rho}{2} \|R x - v - \eta\|_2^2 \]

ADMM update of primal variable (unknown image):
\[ x^{(n+1)} = \arg\min_x L(x, v^{(n)}, \eta^{(n)}) = \left[ A' T' T A + \rho R' R \right]^{-1} \left( A' T' y + \rho R' \left( v^{(n)} + \eta^{(n)} \right) \right) \]

Simple if A'A and R'R are circulant and T = I (unrealistic). Otherwise an iterative inner (quadratic) minimization is needed: PCG.
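A sketch of that inner quadratic minimization with conjugate gradients via SciPy, reusing the `R`/`Rt` sketches from earlier; `TA` and `TAt` are hypothetical helpers that apply T A and its adjoint:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def x_update_cg(TA, TAt, y, v, eta, rho, shape, n_inner=10):
    """Inner x-update: solve (A'T'TA + rho R'R) x = A'T'y + rho R'(v + eta)."""
    n = int(np.prod(shape))
    def hess(xf):
        xi = xf.reshape(shape)
        return (TAt(TA(xi)) + rho * Rt(R(xi))).ravel()
    Hop = LinearOperator((n, n), matvec=hess)
    rhs = (TAt(y) + rho * Rt(v + eta)).ravel()
    xf, _ = cg(Hop, rhs, maxiter=n_inner)
    return xf.reshape(shape)
```

Supplying a preconditioner through `cg`'s `M` argument gives the PCG variant the slide mentions.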
Improved ADMM for image restoration

New equivalent constrained optimization problem:
\[ \min_{x, u, v} \frac{1}{2} \|y - T u\|_2^2 + \beta \|v\|_p \quad \text{sub. to } v = R x, \quad u = A x \]
(Antonios Matakos, Sathish Ramani, JF, IEEE T-IP, 2013, to appear)

Corresponding (modified) augmented Lagrangian:
\[ L(x, u, v; \eta_1, \eta_2) = \frac{1}{2} \|y - T u\|_2^2 + \beta \|v\|_p + \frac{\rho_1}{2} \|R x - v - \eta_1\|_2^2 + \frac{\rho_2}{2} \|A x - u - \eta_2\|_2^2 \]

ADMM update of primal variable (unknown image):
\[ \arg\min_x L(x, u, v, \eta_1, \eta_2) = \left[ \rho_2 A' A + \rho_1 R' R \right]^{-1} \left( \rho_1 R' (v + \eta_1) + \rho_2 A' (u + \eta_2) \right) \]
Simple if A'A and R'R are circulant. No inner iterations needed!

ADMM update of new auxiliary variable u:
\[ \arg\min_u L(x, u, v, \eta_1, \eta_2) = \underbrace{\left[ T' T + \rho_2 I \right]^{-1}}_{\text{diagonal}} \left( T' y + \rho_2 (A x - \eta_2) \right) \]
The v update is shrinkage again. Very easy to code!
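Both updates are closed-form under these assumptions. A sketch with A circulant (frequency response `psf_fft`), T modeled as a binary sampling mask so that T'T + ρ₂I is diagonal, `RtR_fft` holding the DFT eigenvalues of R'R (computable from an impulse, as in the denoising sketch), and `Rt` reused from earlier; all names illustrative:

```python
import numpy as np

def x_update(v, eta1, u, eta2, psf_fft, RtR_fft, rho1, rho2):
    """x = [rho2 A'A + rho1 R'R]^{-1} (rho1 R'(v + eta1) + rho2 A'(u + eta2)):
    exact via FFT because both Gram matrices are circulant."""
    Atz = np.real(np.fft.ifft2(np.conj(psf_fft) * np.fft.fft2(u + eta2)))
    rhs = rho1 * Rt(v + eta1) + rho2 * Atz
    denom = rho2 * np.abs(psf_fft) ** 2 + rho1 * RtR_fft
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / denom))

def u_update(x, eta2, Tty, mask, psf_fft, rho2):
    """u = [T'T + rho2 I]^{-1} (T'y + rho2 (A x - eta2)): with T a sampling
    mask, T'T is a 0/1 diagonal, so this is an elementwise division."""
    Ax = np.real(np.fft.ifft2(psf_fft * np.fft.fft2(x)))
    return (Tty + rho2 * (Ax - eta2)) / (mask + rho2)
```

Here `Tty` is the zero-padded measurement T'y.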
Image restoration results: quality

[Figure: measurement y; restoration using a circulant model with boundary preprocessing; restoration using ADMM with the Reeves model.]

\( \text{BSNR} = 10 \log_{10}\left( \operatorname{Var}\{T A x\} / \sigma^2 \right) = 50 \) dB, 15 × 15 pixel uniform blur, isotropic TV regularization, β = 2⁻¹⁷.
Qualitatively confirms that the Reeves model is preferable.
Image restoration results: iterations

[Figure: convergence comparison versus iteration.]