FAST MULTIRESOLUTION PHOTON-LIMITED IMAGE RECONSTRUCTION

Rebecca Willett and Robert Nowak∗

March 18, 2004
Abstract

The techniques described in this paper allow multiscale photon-limited image reconstruction methods to be implemented with significantly less computational complexity than previously possible. Methods such as multiscale Haar estimation, wedgelets, and platelets are all promising techniques in the context of Poisson data, but the computational burden they impose makes them impractical for many applications which involve iterative algorithms, such as deblurring and tomographic reconstruction. With the proposed implementation techniques, hereditary translation-invariant Haar wavelet-based estimates can be calculated in O(N log N) operations and wedgelet and platelet estimates can be computed in O(N^{7/6}) operations, where N is the number of pixels; these complexities are comparable to those of standard wavelet denoising (O(N)) and translation-invariant wavelet denoising (O(N log N)). Fast translation-invariant Haar denoising for Poisson data is accomplished by deriving the relationship between maximum penalized likelihood tree-pruning decisions and the undecimated wavelet transform coefficients. Fast wedgelet and platelet methods are accomplished with a coarse-to-fine technique which detects possible boundary locations before performing wedgelet or platelet fits.
∗ R. Nowak was supported by the National Science Foundation, grants CCR-0310889 and ANI-0099148, the Office of Naval Research, grant N00014-00-1-0390, and the State of Texas ATP, grant 003604-0064-2001. R. Willett and R. Nowak are with the Departments of Electrical and Computer Engineering at the University of Wisconsin-Madison and Rice University.

1 PHOTON-LIMITED IMAGING

Multiscale image estimation methods based on translation-invariant (TI) Haar wavelets, wedgelets, and platelets are near minimax optimal reconstruction techniques for photon-limited images [1, 2]. Platelets, developed in previous work, are localized atoms at various locations, scales, and orientations that can produce highly accurate, piecewise linear approximations to images consisting of smooth regions separated by smooth boundaries. These methods, while promising, can be computationally demanding, particularly in the context of deblurring or tomographic reconstruction. Expectation-Maximization reconstruction methods involve computing wedgelet, platelet, or TI-Haar estimates at each step of an iterative algorithm, making fast estimation techniques essential.

In this paper we propose methods for accelerating TI-Haar, wedgelet, and platelet estimation algorithms for photon-limited image reconstruction. These methods all have two primary advantages over traditional wavelet denoising. First, they all exhibit near minimax optimal error rates for both Poisson and Gaussian data. Second, these methods mitigate the effect of the dyadic partition underlying classical multiscale techniques and the associated rectangular artifacts. TI-Haar denoising reduces artifacts by averaging over all possible shifts of the underlying partition. Wedgelets and platelets are not restricted to traditional dyadic partitions because the wedgelet splits form a non-rectangular partition of the image. While wavelets do not have these advantages, they are very fast: only O(N) operations are required for wavelet denoising and only O(N log N) operations are required for TI wavelet denoising. TI-Haar for Poisson data and wedgelet and platelet methods, in contrast, involve redundant transforms which can require up to O(N² log N) operations.

TI-Haar estimation for Poisson data can be complicated because thresholding decisions are based not only on the magnitude of the coefficients (as in traditional wavelet denoising), but also on the coefficients of the node's descendants [3]. Previous studies (e.g., [4]) have demonstrated that consideration of parent-child relationships can result in improved signal and image estimation. Our experiments have revealed that wavelet coefficient thresholding can be more robust to noise when the thresholding rule has a hereditary constraint, i.e., when a coefficient can only be thresholded if all its descendants are also thresholded. In this paper we demonstrate that hereditary TI-Haar denoising of Poisson and Gaussian data can be performed with the same computational complexity as traditional TI wavelet thresholding techniques [5].

Wedgelet- and platelet-based methods for photon-limited image reconstruction are also promising, particularly for recovering boundaries in tomographic reconstruction, but these methods pose similar computational hurdles. Wedgelet and platelet approximations are computed using an exhaustive search over a large dictionary of possible model fits. In this paper we describe a coarse-to-fine approach to wedgelet and platelet estimation which significantly reduces the computational burden without reducing accuracy.

In the following sections, we review multiscale photon-limited image estimation and then detail the implementation techniques used to dramatically reduce the computational complexity of these methods. Finally, we demonstrate the effectiveness of these methods on the Shepp-Logan phantom using our MATLAB toolbox for Poisson estimation, available online at http://www.ece.rice.edu/∼willett/Research/software.html.
2 MULTISCALE ESTIMATION OF POISSON DATA
The TI-Haar, wedgelet, and platelet methods mentioned above calculate image estimates by determining the ideal partition of the domain of observations (assumed to be [0, 1]²) and using maximum likelihood estimation to fit a model to each square in the optimal partition, where the model may be a constant, wedgelet, or platelet fit [1]. The space of possible partitions is a nested hierarchy defined through a recursive dyadic partition (RDP) of [0, 1]², and the optimal partition is selected by optimally pruning a quad-tree representation of the observed data. (This quad-tree is referred to as a complete RDP.) This gives our estimators the capability of spatially varying the resolution to automatically increase the smoothing in very regular regions of the image and to preserve detailed structure in less regular regions. Pruning decisions are made using a penalized likelihood criterion. We tackle this problem using a minimum description length/coding-theoretic approach to regularization [3]. We assume that we observe √N × √N noisy samples of an underlying true image, f : [0, 1]² → [0, ∞)
(assumed to be smooth or piecewise smooth). The N samples, x, are contaminated by either additive white Gaussian noise, with variance σ², or Poisson noise resulting from counting photons hitting an array of photomultiplier tubes. Thus, in the Gaussian case, the likelihood of observing x, given the corresponding √N × √N samples of f, denoted f, is

$p(x \mid f) = (2\pi\sigma^2)^{-N/2} \exp\!\left( -\|x - f\|^2 / (2\sigma^2) \right),$

and in the Poisson case

$p(x \mid f) = \prod_{i,j=0}^{\sqrt{N}} \frac{e^{-f_{i,j}} \, f_{i,j}^{x_{i,j}}}{x_{i,j}!}.$
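To make these observation models concrete, the following sketch (a minimal Python illustration, not part of the MATLAB toolbox described later) draws Poisson and Gaussian observations of a toy piecewise-constant intensity and evaluates the corresponding log-likelihoods; the intensity values and noise level are arbitrary choices made for the example.

```python
import numpy as np
from scipy.stats import poisson, norm

rng = np.random.default_rng(0)

# Toy piecewise-constant intensity f on a sqrt(N) x sqrt(N) grid (photons per pixel).
f = np.full((64, 64), 5.0)
f[16:48, 16:48] = 20.0                 # a brighter square region
sigma = 2.0                            # Gaussian noise standard deviation (arbitrary)

x_poisson = rng.poisson(f)                          # photon-count observations
x_gauss = f + sigma * rng.normal(size=f.shape)      # AWGN observations

# Log-likelihoods log p(x | f) under the two models above.
loglik_poisson = poisson.logpmf(x_poisson, mu=f).sum()
loglik_gauss = norm.logpdf(x_gauss, loc=f, scale=sigma).sum()
print(loglik_poisson, loglik_gauss)
```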
In general, the RDP framework leads to a model selection problem that can be solved by a tree-pruning process. Each of the terminal squares in the pruned RDP could correspond to a region of homogeneous or smoothly varying intensity. Such a partition can be obtained by merging neighboring squares of a complete RDP (i.e., pruning it) to form a data-adaptive RDP P and fitting models to the data on the terminal squares of P. Thus the image estimate, f̂, is completely described by P. Model coefficients for each square are chosen to maximize p(x | f). This provides a very simple framework for penalized likelihood estimation, wherein the penalization is based on the complexity of the underlying partition. The goal here is to find the partition which minimizes the penalized likelihood function:

$\hat{P} \equiv \arg\min_{P} \left[ -\log p(x \mid f(P)) + \mathrm{pen}_m(P) \right], \qquad \hat{f} \equiv f(\hat{P})$    (1)

where p(x | f(P̂)) denotes the likelihood of observing x given the image estimate f(P̂), and pen_m(P̂) is the penalty associated with the estimate f(P̂) when using model m, where m is constant, wedgelet, or platelet. (We penalize the estimates according to the codelength required to uniquely describe each model with a prefix code; the penalties are discussed in detail in [1, 6, 2].) Given the partition P̂, f̂ can be calculated by finding the maximum likelihood model-m fit to the observations over each square in P̂. The resulting estimator f̂ is referred to as the penalized likelihood estimator (PLE).

This estimation can be performed optimally but not always efficiently. In previous work, we proposed an algorithm which begins at the leaf nodes in a quad-tree representation of the initial RDP and traverses upwards, performing a tree-pruning operation at each stage. For each node (i.e., dyadic square) at a particular scale, we determine the maximum likelihood model-m coefficient vector and then calculate the penalized log likelihoods for (a) the model fit and (b) the optimal fit calculated at the previous, finer scale of the quad-tree. We then select the model which maximizes the penalized likelihood for each square and proceed to the next coarser level. The partition underlying the PLE is pruned to a coarser scale (lower resolution) in areas with low SNR and where the data suggest that the image is fairly smooth. Thus the PLE provides higher resolution and detail in areas of the image where there are dominant edges or singularities with higher count levels (higher SNR). The computational complexity of the above techniques motivates our work here.
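The bottom-up recursion just described can be sketched compactly for the simplest case of constant model fits under the Poisson likelihood. The following is an illustrative Python sketch only: the penalty value is a log N-type placeholder rather than the prefix-code penalties of [1, 6, 2], and the x! terms of the likelihood are dropped since they do not depend on the model.

```python
import numpy as np

def neg_loglik_const(block):
    """-log p(block | ML constant fit), dropping the model-independent x! terms."""
    lam, total = block.mean(), block.sum()
    return lam * block.size - (total * np.log(lam) if total > 0 else 0.0)

def prune(x, penalty):
    """Bottom-up pruning of the complete RDP of a square image x with dyadic side
    length.  Returns (cost, leaves), where leaves is a list of (row, col, size)
    squares forming the optimal (pruned) partition."""
    def recurse(r, c, s):
        leaf_cost = neg_loglik_const(x[r:r + s, c:c + s]) + penalty
        if s == 1:
            return leaf_cost, [(r, c, s)]
        h = s // 2
        split_cost, split_leaves = 0.0, []
        for dr in (0, h):
            for dc in (0, h):
                cost, leaves = recurse(r + dr, c + dc, h)
                split_cost += cost
                split_leaves += leaves
        if leaf_cost <= split_cost:      # merging the four children is cheaper: prune
            return leaf_cost, [(r, c, s)]
        return split_cost, split_leaves
    return recurse(0, 0, x.shape[0])

rng = np.random.default_rng(0)
counts = rng.poisson(10.0, size=(64, 64))                  # flat intensity: heavy pruning expected
cost, leaves = prune(counts, penalty=np.log(counts.size))  # placeholder log(N) penalty
print(len(leaves), "leaves in the pruned partition")
```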
In the case of constant model fits, the optimization can be computed very quickly, but the resulting image may contain artifacts caused by the dependence of the estimator on the dyadic partition. This can be alleviated through a process called "averaging over shifts" or "cycle-spinning," which entails circularly shifting the raw data, denoising, shifting the estimate back to its original position, and repeating this procedure for each possible shift [5]; a naive implementation of this procedure is sketched at the end of this section. However, this process increases the computational complexity from O(N log N) to O(N² log N) operations. In the case of wedgelet or platelet model fits, the algorithms must exhaustively calculate a total of O(N^{4/3}) wedgelet or platelet fits, resulting in an overall computational complexity of O(N^{11/6}). Details can be found in [7].

In the following section, we derive the relationship between the above cycle-spinning multiscale constant estimator and the translation-invariant Haar wavelet coefficients. Using this relationship, we can compute the cycle-spinning multiscale constant estimator with only O(N log N) operations. Next we describe a new, coarse-to-fine wedgelet and platelet estimation technique which reduces the complexity of these estimators to O(N^{7/6}).
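For reference, the naive averaging-over-shifts procedure mentioned above can be written directly as follows. This is a Python sketch with the per-shift denoiser left abstract (any shift-dependent estimator, such as the tree pruning sketched earlier, could be plugged in); it is exactly the O(N² log N) baseline that the method of the next section avoids.

```python
import numpy as np

def cycle_spin(x, denoise):
    """Naive 'averaging over shifts': circularly shift, denoise, shift back, and
    average over every possible 2-D shift.  With N pixels and an O(N log N)
    denoiser, this costs O(N^2 log N) operations."""
    n_rows, n_cols = x.shape
    accum = np.zeros(x.shape, dtype=float)
    for dr in range(n_rows):
        for dc in range(n_cols):
            shifted = np.roll(x, shift=(dr, dc), axis=(0, 1))
            estimate = denoise(shifted)
            accum += np.roll(estimate, shift=(-dr, -dc), axis=(0, 1))
    return accum / (n_rows * n_cols)
```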
3 TRANSLATION-INVARIANT HAAR ESTIMATION
Cycle-spinning, as originally proposed, requires O(N log N) operations, but it was derived in the context of undecimated wavelet coefficient thresholding in the presence of Gaussian noise. The above multiscale tree-pruning method can be modified to produce the same effect by averaging over shifts, but the increase in quality comes at a high computational cost. The method proposed in this paper is based on the key observation that averaging over shifts involves unnecessarily computing the various penalized log likelihoods multiple times. This idea can be used to make TI-Haar Poisson estimation as computationally efficient as TI-Haar wavelet thresholding for Gaussian data.

For example, consider computing a cycle-spinning piecewise constant estimate of the length-4 signal [x_0, x_1, x_2, x_3], as shown in Figure 1. In this figure, internal nodes of the tree are marked as x_{ij} ≡ x_i + x_j. Computing the penalized likelihood estimate for Shift 0 involves calculating p([x_0, x_1] | [x_{01}/2, x_{01}/2]) and p([x_0, x_1] | [x_0, x_1]); note that the first likelihood depends on the scaling coefficients at two different scales, which is why this procedure is less straightforward than traditional coefficient thresholding. These same calculations must be conducted when computing the estimate at Shift 2. In fact, in total there are only 2N log N unique penalized likelihoods which need to be computed to form a cycle-spinning estimate. The factor of two comes from the fact that we need to calculate both the probability of pruning and the probability of not pruning for each unique node and choose the more likely of the two. Thus we are left with a total of N log N unique pruning decisions. These terms can then be used to perform non-linear estimation on the N log N TI-Haar wavelet coefficients, and then the inverse undecimated wavelet transform can be used to reconstruct the estimated signal in O(N log N) operations.

Figure 1: Four different trees used for producing a cycle-spinning estimate of a length-4 signal.

The relationship between the pruning decisions and the wavelet coefficients is delicate and does not correspond to traditional hard or soft thresholding schemes. In the case of multiscale penalized likelihood estimation, wavelet coefficients are scaled depending on their ancestors' pruning decisions. That is, each wavelet coefficient is weighted by the fraction of shifts in which the corresponding node was not pruned. For example, if neither node x_{0123} nor node x_{2301} is pruned, then the wavelet coefficient corresponding to node x_{01} remains untouched. If node x_{0123} is pruned but node x_{2301} is not, then the wavelet coefficient corresponding to node x_{01} is scaled by one half. If there were more levels in the tree, each of the descendants of node x_{0123} would lose 2^{-j} of its weight, where j is the number of levels between the descendant and the pruned node. Finally, if both nodes x_{0123} and x_{2301} were pruned, then the wavelet coefficient corresponding to node x_{01} would be set to zero, because each parent pruning causes the wavelet coefficient to lose half of its weight. For images, each node has four possible parents instead of two, and so weights are reduced by a quarter instead of a half for each pruned parent.

It is important to note that this scaling can be done one level at a time for increased efficiency. Specifically, once all N log N pruning decisions have been made, the image estimate can be calculated with a breadth-first traversal of the quad-tree associated with the wavelet coefficients. The coefficient weight is initialized to one for each node in the tree. Then, at the top of the quad-tree, each coefficient weight is multiplied by a zero or a one, depending on the pruning decision. Next, the four children coefficients have their weight reduced by a quarter of the total weight loss of the parent. This way, the grandchildren and other descendants will be appropriately weighted when the breadth-first traversal reaches their scales, and a nested loop for updating these weights is unnecessary. Using this method, only O(N log N) operations are required to calculate the translation-invariant multiscale estimate of an image. Unlike traditional wavelet thresholding techniques, this method is near minimax optimal for both Gaussian and Poisson noise distributions and is robust to noise due to the hereditary nature of the tree-pruning process.
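The level-by-level weighting described above is easy to state in code. The sketch below is for a length-n 1-D signal (so each node has two candidate parents and weights are halved rather than quartered); the layout of the pruning-decision arrays is an assumption made purely for this illustration and does not reflect the data structures of the authors' toolbox.

```python
import numpy as np

def propagate_weights(pruned):
    """Breadth-first propagation of pruning decisions into TI-Haar coefficient
    weights for a length-n signal.  pruned[j][k] is True if the unique node at
    scale j (0 = coarsest) whose support starts at circular position k was pruned.
    Returns weights[j][k]: the fraction of shifts in which that node survives."""
    n = len(pruned[0])
    # Coarsest scale: the weight is simply 0 or 1 according to the node's own decision.
    weights = [np.where(np.asarray(pruned[0]), 0.0, 1.0)]
    for j in range(1, len(pruned)):
        half = n >> j                  # support length of a node at scale j
        w = np.empty(n)
        for k in range(n):
            # Candidate parents: the scale-(j-1) node starting at k (this node is
            # its left half) and the one starting at k - half (its right half).
            parent = 0.5 * (weights[j - 1][k] + weights[j - 1][(k - half) % n])
            w[k] = 0.0 if pruned[j][k] else parent
        weights.append(w)
    return weights

# Length-4 example from Figure 1: pruning x_{0123} (and nothing else) halves the
# weights of its children x_{01} and x_{23}, while x_{12} and x_{30} are untouched.
pruned = [np.array([True, False, False, False]),    # scale 0: x_{0123}, x_{1230}, x_{2301}, x_{3012}
          np.array([False, False, False, False])]   # scale 1: x_{01}, x_{12}, x_{23}, x_{30}
print(propagate_weights(pruned)[1])                 # -> [0.5, 1.0, 0.5, 1.0]
```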
4 COARSE-TO-FINE WEDGELET ESTIMATION
Multiscale image estimation techniques such as wedgelets and platelets are promising because of their ability to accurately represent images consisting of smooth regions separated by smooth boundaries, and they are near minimax optimal for both Gaussian and Poisson noise distributions. However, the wedgelet and platelet dictionaries are overcomplete and can be quite computationally demanding to implement. This computational hurdle motivates our work here. We consider a sequential, coarse-to-fine image estimation strategy.
The basic idea is to first examine the data on a coarse grid, and then refine the analysis and approximation in regions of interest. By carefully examining the bias and variance trade-offs in each stage, we can show that images can be optimally recovered through a sequential process in almost linear time. In the smooth regions of the function, f, the piecewise constant or linear approximation estimates perform well. The task that drives the rates of convergence of the error is the estimation of f in the vicinity of boundaries. The main idea is therefore to perform the global estimation task in a sequential, coarse-to-fine fashion. In the first step (denoted the Preview step), a coarse estimation of the field is performed, using only (translation-dependent) constant fits to the leaves of a coarse-resolution RDP, as an attempt to identify the approximate location of the boundary. In the second step (denoted the Refinement step), we perform a piecewise linear boundary fit on the areas that were identified as possible boundary regions. If the Preview step is effective, then we perform the search over the dictionary of possible wedgelet splits (which is computationally demanding) only in the regions where it is needed, instead of using it throughout the entire domain. A detailed description of this technique, along with a complete error and computational complexity analysis, is available in [7].
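As a rough illustration of the Preview step, the Python sketch below fits a constant to each square of a coarse grid of Poisson counts and flags squares where splitting into quadrants improves the (unpenalized) log-likelihood by more than a threshold, a simple proxy for "possible boundary region." The flagging rule and threshold value are simplifications assumed for this example; the actual penalized-likelihood test and its analysis are given in [7]. Only flagged squares would then receive the expensive wedgelet or platelet search in the Refinement step.

```python
import numpy as np

def fit_cost(block):
    """-log p(block | ML constant fit) for Poisson counts, dropping the x! terms."""
    lam, total = block.mean(), block.sum()
    return lam * block.size - (total * np.log(lam) if total > 0 else 0.0)

def preview_flags(counts, block=8, threshold=None):
    """Simplified Preview step: True entries mark coarse squares that a single
    constant explains poorly, i.e. candidate boundary regions for refinement."""
    n = counts.shape[0]
    if threshold is None:
        threshold = np.log(counts.size)         # placeholder log(N)-type threshold
    flags = np.zeros((n // block, n // block), dtype=bool)
    h = block // 2
    for i in range(0, n, block):
        for j in range(0, n, block):
            square = counts[i:i + block, j:j + block]
            quad_cost = sum(fit_cost(square[r:r + h, c:c + h])
                            for r in (0, h) for c in (0, h))
            flags[i // block, j // block] = fit_cost(square) - quad_cost > threshold
    return flags

# Example: a 512 x 512 image yields a 64 x 64 grid of 8 x 8 preview squares, and
# only the flagged squares would be searched over the wedgelet dictionary.
```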
Figure 2: Shepp-Logan phantom. 512 × 512 Poisson measurements with an average of 12 photons per pixel, MSE = 0.0391.
5 SIMULATIONS
We demonstrate the effectiveness of the proposed methods using the Shepp-Logan phantom brain image, commonly used in medical imaging simulations. The 512 × 512 grid of Poisson measurements is displayed in Figure 2; in this example, there is an average of 8 photons counted per pixel. For these simulations, the penalty used for constant fits was (1/3) log N per leaf node and the penalty used for wedgelet fits was (2/3) log N per leaf node. Figure 3(a) shows the standard Haar estimate. The rectangular artifacts mentioned above are clearly visible in this image. The coarse-to-fine wedgelet estimate after the refinement stage is displayed in Figure 3(b). Note the sharp boundaries in this estimate, particularly in regions of high contrast, as compared to the standard Haar estimate.
Figure 3: Shepp-Logan simulation results. (a) Standard Haar estimate, MSE = 0.00548. (b) Shepp-Logan phantom estimate formed by fitting one wedgelet or constant to each of the unpruned squares from the preview stage; MSE = 0.00132. (c) Shepp-Logan phantom estimate formed using hereditary TI-Haar denoising; MSE = 0.00259.
In this case, the preview stage is initialized with an RDP of 4096 coarse-resolution 8 × 8 squares. After the preview-stage pruning, the initial coarse-resolution RDP has been pruned back to a nonuniform RDP with only 955 of the 8 × 8 squares remaining. Recall that this procedure requires O(N) operations, as opposed to the O(N^{11/6}) required for standard wedgelet estimation. Coarse-to-fine wedgelet and platelet estimation can be performed using the functions ctfWedgeletApprox and ctfPlateletApprox, respectively, in our online toolbox.

The TI-Haar estimate is displayed in Figure 3(c). While this estimate has a higher mean square error than the coarse-to-fine wedgelet estimate, it exhibits fewer rectangular artifacts, particularly in regions of low SNR. Recall that this procedure requires O(N log N) operations, the same complexity as standard TI wavelet denoising. Such estimates can be formed using the function haarTIApprox in our online toolbox.
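For readers without the MATLAB toolbox, a simulation of this kind can be set up along the following lines. This Python sketch assumes scikit-image's shepp_logan_phantom and a particular normalization convention (scaling the phantom so that the mean count matches a target photon level, then measuring MSE on the rescaled intensity), neither of which is specified in the paper.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import resize

rng = np.random.default_rng(0)

truth = resize(shepp_logan_phantom(), (512, 512), anti_aliasing=True)  # intensities in [0, 1]
mean_photons = 8.0                                   # target average photons per pixel
intensity = truth * mean_photons / truth.mean()      # scale so the mean count is mean_photons
counts = rng.poisson(intensity)                      # simulated photon-limited measurements

def mse(estimate, reference):
    return np.mean((estimate - reference) ** 2)

# MSE of the raw counts, rescaled back to the [0, 1] intensity scale of the phantom;
# a denoised estimate of `counts` would be evaluated the same way.
print(mse(counts * truth.mean() / mean_photons, truth))
```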
6 CONCLUSIONS
The multiscale methods discussed in this paper are near minimax optimal for images in certain smoothness classes [1, 2], and with the accelerated implementations presented here, the techniques become practical and effective for a variety of applications in photon-limited imaging, including Positron Emission Tomography (PET), Single Photon Emission Computed Tomography (SPECT), and Confocal Microscopy [8]. The denoising techniques detailed here are easily extended to Expectation-Maximization algorithms for deblurring and tomographic reconstruction. These algorithms have been implemented as fast MEX files in a MATLAB toolbox and are available online at http://www.ece.rice.edu/∼willett/Research/software.html.
References

[1] R. Willett and R. Nowak. Platelets: a multiscale approach for recovering edges and surfaces in photon-limited medical imaging. IEEE Transactions on Medical Imaging, 22(3), 2003.

[2] R. Willett and R. Nowak. Multiscale density estimation. Technical Report TREE0303, Rice University, 2003. Available at http://www.ece.rice.edu/∼willett/papers/WillettTREE0303.pdf.

[3] E. Kolaczyk and R. Nowak. Multiscale likelihood analysis and complexity penalized estimation. Submitted to Annals of Statistics, August 2001. Available at http://www.ece.wisc.edu/~nowak/pubs.html.

[4] M. Crouse, R. Nowak, and R. Baraniuk. Wavelet-based statistical signal processing using hidden Markov models. IEEE Transactions on Signal Processing, 46(4):886–902, 1998.

[5] M. Lang, H. Guo, J. E. Odegard, C. S. Burrus, and R. O. Wells. Noise reduction using an undecimated discrete wavelet transform. IEEE Signal Processing Letters, 3(1):10–12, 1996.

[6] R. Nowak, U. Mitra, and R. Willett. Estimating inhomogeneous fields using wireless sensor networks. Submitted to IEEE Journal on Selected Areas in Communications, 2003.

[7] R. Castro, R. Willett, and R. Nowak. Coarse-to-fine manifold learning. In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), 2004.

[8] D. Snyder and M. Miller. Random Point Processes in Time and Space. Springer-Verlag, New York, 1991.