IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 13, NO. 8, AUGUST 2004
Improved Poisson Intensity Estimation: Denoising Application Using Poisson Data H. Lu, Y. Kim, and John M. M. Anderson
Abstract—Recently, Timmermann and Nowak developed algorithms for estimating the means of independent Poisson random variables. The algorithms are based on a multiscale model where certain random variables are assumed to obey a beta-mixture density function. Timmermann and Nowak simplify the density estimation problem by assuming the beta parameters are known and only one mixture parameter is unknown. They use the observed data and the method of moments to estimate the unknown mixture parameter. Taking a different approach, we generate training data from the observed data and compute maximum likelihood estimates of all of the beta-mixture parameters. To assess the improved performance obtained by the proposed modification, we consider a denoising application using Poisson data.
I. INTRODUCTION
Many physical processes are modeled as Poisson processes, and, in many applications, an objective is to estimate the underlying intensity of a Poisson process given some observed data. The problem of estimating the intensity of a Poisson process arises in such fields as astronomy, medical imaging, and communications. Timmermann and Nowak's denoising algorithms [1] are based on a multiscale model in which certain random variables are assumed to obey a three-component beta-mixture density function. They simplify the estimation problem by assuming that the beta parameters are known and only one mixture parameter is unknown. The unknown mixture parameter is estimated in an ad hoc manner using the method of moments. Our main goal is to make Timmermann and Nowak's denoising algorithms more general. We do this by assuming that the mixture and beta parameters are all unknown and developing a method for obtaining their maximum-likelihood (ML) estimates. As desired, the mixture parameter estimates are nonnegative and sum to one, and the beta parameter estimates are nonnegative. This paper is organized as follows. We briefly review Timmermann and Nowak's one-dimensional (1-D) denoising algorithms and their two-dimensional (2-D) extensions in Section II. In Section III, we present an ML method for estimating the mixture and beta parameters. Results from applying the improved denoising algorithm to the problem of photon-limited imaging are discussed in Section IV. Finally, we provide conclusions in Section V.
Manuscript received March 5, 2003; revised August 20, 2003. This work was supported in part by NSF Grant MIP-9702856. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Mark R. Luettgen. The authors are with the Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611 USA (e-mail:
[email protected]). Digital Object Identifier 10.1109/TIP.2003.822606
II. POISSON DENOISING
Since Timmermann and Nowak's Poisson denoising algorithms may not be well known to many readers, we provide a brief review of their 1-D algorithms.
A. One-Dimensional Multiscale Multiplicative Innovation (MMI) Model-Based Algorithm
For $k = 0, 1, \ldots, N-1$, let $x_k$ be an observation of $X_k$, where $X_k$ is Poisson distributed with unknown mean $\lambda_k$. In addition, let the vectors $x = [x_0, x_1, \ldots, x_{N-1}]^T$ and $\lambda = [\lambda_0, \lambda_1, \ldots, \lambda_{N-1}]^T$ contain the data and unknown means, respectively. Note that $N$ is assumed to be a power of two, and let $J = \log_2 N$.
A multiscale representation of the data vector $x$ and unknown mean vector $\lambda$ can be obtained using the unnormalized Haar wavelet transform. Specifically, with the initialization $x_{J,k} = x_k$ and $\lambda_{J,k} = \lambda_k$, the scaling coefficients of $x$ and $\lambda$ at scale $j$ and translation $k$ are obtained by iterating

$x_{j,k} = x_{j+1,2k} + x_{j+1,2k+1}$ and $\lambda_{j,k} = \lambda_{j+1,2k} + \lambda_{j+1,2k+1}$   (1)

for $j = J-1, J-2, \ldots, 0$ and $k = 0, 1, \ldots, 2^j - 1$. In a similar manner, the wavelet coefficients of $x$ and $\lambda$ at scale $j$ and translation $k$ follow from the iterations

$\tilde{x}_{j,k} = x_{j+1,2k} - x_{j+1,2k+1}$ and $\tilde{\lambda}_{j,k} = \lambda_{j+1,2k} - \lambda_{j+1,2k+1}$.   (2)

We define the scaling coefficient vector and wavelet coefficient vector of $\lambda$ at scale $j$ as $\lambda_j = [\lambda_{j,0}, \ldots, \lambda_{j,2^j-1}]^T$ and $\tilde{\lambda}_j = [\tilde{\lambda}_{j,0}, \ldots, \tilde{\lambda}_{j,2^j-1}]^T$, respectively.
Taking a Bayesian viewpoint, the vector $\lambda$ is assumed to be an observation of an $N \times 1$ random vector $\Lambda$. Let $\Lambda_{j,k}$ and $\tilde{\Lambda}_{j,k}$ denote the scaling and wavelet coefficient of $\Lambda$ at scale $j$ and translation $k$, respectively. Then, $\lambda_{j,k}$ and $\tilde{\lambda}_{j,k}$ can be regarded as observations of $\Lambda_{j,k}$ and $\tilde{\Lambda}_{j,k}$, respectively. With this viewpoint, the problem of interest is to estimate $\Lambda$ given the data $x$. The minimum mean-square error (MMSE) estimate of $\Lambda$ can be obtained by computing the conditional mean of $\Lambda$ given the data $x$: $\hat{\lambda} = E[\Lambda \mid x]$. In order to compute this conditional mean, a probabilistic framework is needed. Toward this end, consider a collection of perturbation variables (PVs) that are obtained by dividing wavelet coefficients by their corresponding scaling coefficients

$\delta_{j,k} = \tilde{\Lambda}_{j,k} / \Lambda_{j,k}$, for $j = 0, 1, \ldots, J-1$ and $k = 0, 1, \ldots, 2^j - 1$.   (3)
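Under the conventions of (1)-(3), the unnormalized Haar decomposition and the PV observations (the ratios of the data's wavelet and scaling coefficients) can be sketched in Python as follows. This is a minimal sketch: the function names are ours, and the small-denominator guard for zero parent counts is an implementation detail, not part of the model.

```python
import numpy as np

def haar_multiscale(x):
    """Unnormalized Haar decomposition of a length 2**J array.

    Returns scaling coefficients scaling[j][k] and wavelet
    coefficients wavelet[j][k]: sums and differences of adjacent
    finer-scale scaling coefficients, as in (1) and (2).
    """
    J = int(np.log2(len(x)))
    assert len(x) == 2 ** J, "length must be a power of two"
    scaling = {J: np.asarray(x, dtype=float)}
    wavelet = {}
    for j in range(J - 1, -1, -1):
        fine = scaling[j + 1]
        scaling[j] = fine[0::2] + fine[1::2]   # (1)
        wavelet[j] = fine[0::2] - fine[1::2]   # (2)
    return scaling, wavelet

def perturbation_variables(scaling, wavelet):
    """PV observations as in (3): wavelet / scaling at each scale.

    A zero parent count implies a zero wavelet coefficient as well,
    so the epsilon guard simply yields a PV of zero in that case.
    """
    return {j: wavelet[j] / np.maximum(scaling[j], 1e-12)
            for j in wavelet}
```

For example, for the data vector [1, 2, 3, 4], the coarsest scaling coefficient is the total count 10 and the coarsest PV observation is (3 - 7)/10 = -0.4.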
It is straightforward to verify that for $j = 0, 1, \ldots, J-1$ and $k = 0, 1, \ldots, 2^j - 1$, scaling coefficients at adjacent scales obey the following relationships:

$\Lambda_{j+1,2k} = \tfrac{1}{2}(1 + \delta_{j,k})\Lambda_{j,k}$   (4)
$\Lambda_{j+1,2k+1} = \tfrac{1}{2}(1 - \delta_{j,k})\Lambda_{j,k}$   (5)

where $\delta_{j,k}$ is defined in (3). The collection of random variables $\{\delta_{j,k}\}$ are referred to as multiscale multiplicative innovations (MMIs). In their model, which we refer to as the MMI model, Timmermann and Nowak assume that the PVs (or, equivalently, the MMIs) are mutually independent and independent of the scaling coefficients $\{\Lambda_{j,k}\}$. Moreover, they assume that for each scale $j$, the random variables $\{\delta_{j,k}\}$ are identically distributed and obey a beta-mixture density function

$f(\delta) = \sum_{i=1}^{M} p_i f_i(\delta)$, where $f_i(\delta) = \dfrac{(1-\delta)^{s_i - 1}(1+\delta)^{s_i - 1}}{B(s_i, s_i)\, 2^{2 s_i - 1}}$ for $|\delta| \le 1$ and $f_i(\delta) = 0$ otherwise   (6)

where $B(\cdot\,,\cdot)$ is the Euler beta function, $p_i \ge 0$ is the weight of the $i$th beta density function with $\sum_{i=1}^{M} p_i = 1$, and $s_i > 0$ is the value for both parameters of the $i$th (symmetric) beta density function. We will discuss the estimation of the mixture parameters $p_1, \ldots, p_M$ and beta parameters $s_1, \ldots, s_M$ in Section III.
As shown in [1], the MMSE estimate of $\tilde{\Lambda}_{j,k}$ is

$\hat{\tilde{\lambda}}_{j,k} = \hat{\lambda}_{j,k}\, \dfrac{\sum_{i=1}^{M} p_i \frac{B(s_i + x_{j+1,2k},\; s_i + x_{j+1,2k+1})}{B(s_i, s_i)} \cdot \frac{x_{j+1,2k} - x_{j+1,2k+1}}{x_{j,k} + 2 s_i}}{\sum_{i=1}^{M} p_i \frac{B(s_i + x_{j+1,2k},\; s_i + x_{j+1,2k+1})}{B(s_i, s_i)}}.   (7)

Thus, with $\hat{\lambda}_{0,0} = x_{0,0}$ as an initialization, $\lambda$ can be estimated using the recursions

$\hat{\lambda}_{j+1,2k} = \tfrac{1}{2}(\hat{\lambda}_{j,k} + \hat{\tilde{\lambda}}_{j,k})$   (8)
$\hat{\lambda}_{j+1,2k+1} = \tfrac{1}{2}(\hat{\lambda}_{j,k} - \hat{\tilde{\lambda}}_{j,k})$   (9)

for $j = 0, 1, \ldots, J-1$ and $k = 0, 1, \ldots, 2^j - 1$. The desired MMSE estimate of $\lambda$ is $\hat{\lambda} = [\hat{\lambda}_{J,0}, \ldots, \hat{\lambda}_{J,N-1}]^T$. Importantly, the estimates are guaranteed to be nonnegative and their sum equals the sum of the data. We refer to the algorithm just described as the MMI algorithm.
As is well known, the Haar wavelet transform is shift variant. Consequently, the analysis and estimation procedure we have just discussed depends on the alignment between the Haar basis functions and the data. Our experiments have confirmed Timmermann and Nowak's observation that the denoised data are "blocky" when the MMI algorithm is used. To overcome this drawback, Timmermann and Nowak developed a shift-invariant version of the MMI model, which we denote by SI-MMI. Details on the SI-MMI model can be found in [1].
B. Two-Dimensional MMI Model
By using a quadtree structure, Timmermann and Nowak extended the 1-D MMI model to 2-D. We merely address the major differences between the 1-D and 2-D MMI models in this section. Details on the 2-D version of the 1-D MMI algorithm can be found in [1], [2].
The multiscale signal representation for the 2-D MMI model is based on groups of scaling coefficients, where each group size equals four. To make the discussion more concrete, we consider an example. Let a group of four scaling coefficients at some scale be given by $x_{00}$, $x_{01}$, $x_{10}$, and $x_{11}$. The 1-D unnormalized Haar wavelet decomposition is first applied to the upper half (i.e., $x_{00}$ and $x_{01}$) and lower half (i.e., $x_{10}$ and $x_{11}$) of the scaling coefficients. The result is two sets of horizontal scaling and horizontal wavelet coefficients. Let $h_u$ and $h_l$ denote the scaling coefficients for the upper and lower halves, respectively, and let $\tilde{h}_u$ and $\tilde{h}_l$ denote the wavelet coefficients for the upper and lower halves, respectively. Then, for this example, these coefficients are $h_u = x_{00} + x_{01}$, $\tilde{h}_u = x_{00} - x_{01}$, $h_l = x_{10} + x_{11}$, and $\tilde{h}_l = x_{10} - x_{11}$. Now, a vertical scaling coefficient $v$ and vertical wavelet coefficient $\tilde{v}$ are obtained by taking the sum and difference of the horizontal scaling coefficients. Thus, $v = h_u + h_l$ and $\tilde{v} = h_u - h_l$. Generalizing this example, it is evident that there are two (upper and lower) horizontal sets of MMIs and one set of vertical MMIs. An important advantage of the 2-D MMI model is that the mathematical machinery developed for 1-D data can be applied to 2-D data.
The 2-D SI-MMI algorithm has the same structure as the 1-D SI-MMI algorithm. The differences are that the 2-D MMI algorithm is used and the data are circularly shifted in both dimensions over a square region. The final estimate of the unknown means results from averaging the estimates due to each circular shift.
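A minimal 1-D sketch of the estimator defined by (7)-(9) follows, assuming the beta-mixture parameters are given. The function names and example parameters are ours, and the log-domain computation of the mixture weights anticipates the underflow issue discussed in Section IV.

```python
import numpy as np
from math import lgamma, exp

def log_beta(a, b):
    # log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def posterior_pv(x_left, x_right, p, s):
    """E[delta | children's counts] for one parent node, as in (7).

    Multiplying by the parent's estimated scaling coefficient gives
    the MMSE wavelet estimate.
    """
    x_sum = x_left + x_right
    # log of p_i * B(s_i + x_left, s_i + x_right) / B(s_i, s_i)
    logw = [np.log(pi) + log_beta(si + x_left, si + x_right)
            - log_beta(si, si) for pi, si in zip(p, s)]
    m = max(logw)
    w = [exp(l - m) for l in logw]            # underflow-safe weights
    r = [(x_left - x_right) / (x_sum + 2 * si) for si in s]
    return sum(wi * ri for wi, ri in zip(w, r)) / sum(w)

def mmi_denoise(x, p, s):
    """1-D MMI denoising via recursions (8)-(9), initialized with
    the total count x_{0,0}."""
    J = int(np.log2(len(x)))
    scaling = {J: np.asarray(x, dtype=float)}
    for j in range(J - 1, -1, -1):
        fine = scaling[j + 1]
        scaling[j] = fine[0::2] + fine[1::2]
    lam = np.array([scaling[0][0]])           # initialization
    for j in range(J):
        fine = scaling[j + 1]
        delta = np.array([posterior_pv(fine[2 * k], fine[2 * k + 1], p, s)
                          for k in range(2 ** j)])
        w_hat = delta * lam
        nxt = np.empty(2 ** (j + 1))
        nxt[0::2] = 0.5 * (lam + w_hat)       # (8)
        nxt[1::2] = 0.5 * (lam - w_hat)       # (9)
        lam = nxt
    return lam
```

Because each split in (8)-(9) preserves the parent value and the posterior-mean PV has magnitude less than one, the estimates are nonnegative and sum to the total count, as noted above.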
III. DETERMINATION OF MIXTURE AND BETA PARAMETERS
The expectation-maximization (EM) algorithm [3] has been used to compute ML estimates of the parameters of a mixture of probability density functions [4]. However, at this time, no ML methods are available for the case of a beta-mixture density function. In this section, we develop an algorithm for computing ML estimates of the mixture and beta parameters of a beta-mixture density function. In this discussion, we suppress the scale subscript for clarity. For the problem of interest, suppose PV observations are available for all scales. In Section IV, we describe a training procedure that "creates" PV observations for a denoising application. Let $\delta_1, \delta_2, \ldots, \delta_L$ denote the observations of the
PVs for the $l$th scale. The PV observations are the incomplete data. Since the PV random variables are assumed to be independent and obey a beta-mixture distribution, the log-likelihood function for the incomplete data is given by

$\log f(\delta_1, \ldots, \delta_L; \theta) = \sum_{n=1}^{L} \log \left( \sum_{i=1}^{M} p_i f_i(\delta_n; s_i) \right)$   (10)

where $\theta = [p_1, \ldots, p_M, s_1, \ldots, s_M]$, $f_i(\cdot\,; s_i)$ is the $i$th beta density function in (6), and $M$ is the number of mixture components.
To develop an EM algorithm, the PV observations are assumed to be generated in the following manner. A PV observation is generated by first choosing a population group from $\{1, 2, \ldots, M\}$, where the probability that the $i$th group is selected is $p_i$ and the probability density function associated with the $i$th group is $f_i$. Once a group is selected, an observation is drawn using the associated density function. Given this framework, it is convenient to introduce a set of $M \times 1$ independent and identically distributed random vectors $Z_1, Z_2, \ldots, Z_L$, where, for $n = 1, 2, \ldots, L$ and $i = 1, 2, \ldots, M$, the $i$th element of $Z_n$ is $z_{ni} = 1$ if $\delta_n$ is due to the density function $f_i$ and $z_{ni} = 0$ otherwise.
Our complete data will be the combination of the random vectors $Z_1, \ldots, Z_L$ and random variables $\Delta_1, \ldots, \Delta_L$, where $\Delta_n$ is the random variable underlying the observation $\delta_n$. The log-likelihood function of the complete data is the log of the joint probability density function of $Z_1, \ldots, Z_L$ and $\Delta_1, \ldots, \Delta_L$

$\log f(z, \delta; \theta) = \sum_{n=1}^{L} \sum_{i=1}^{M} z_{ni} \left[ \log p_i + \log f_i(\delta_n; s_i) \right].$   (11)

The E-step is to compute the conditional expectation of the log-likelihood function of the complete data given the observed data and the current estimate of the unknown parameters (i.e., $p_1^{(m)}, \ldots, p_M^{(m)}$ and $s_1^{(m)}, \ldots, s_M^{(m)}$)

$Q(\theta; \theta^{(m)}) = \sum_{n=1}^{L} \sum_{i=1}^{M} w_{ni}^{(m)} \left[ \log p_i + \log f_i(\delta_n; s_i) \right]$   (12)

where

$w_{ni}^{(m)} = \dfrac{p_i^{(m)} f_i(\delta_n; s_i^{(m)})}{\sum_{r=1}^{M} p_r^{(m)} f_r(\delta_n; s_r^{(m)})}.$   (13)

Also, $p_i^{(m)}$ and $s_i^{(m)}$ are the $m$th estimates of $p_i$ and $s_i$, respectively, and $\theta^{(m)} = [p_1^{(m)}, \ldots, p_M^{(m)}, s_1^{(m)}, \ldots, s_M^{(m)}]$.
In the M-step, new estimates for the mixture and beta parameters are obtained by maximizing $Q(\theta; \theta^{(m)})$. A straightforward computation for $p_i^{(m+1)}$ yields [4]

$p_i^{(m+1)} = \dfrac{1}{L} \sum_{n=1}^{L} w_{ni}^{(m)}$   (14)

for all $i = 1, 2, \ldots, M$. Observe that $p_i^{(m+1)} \ge 0$ and $\sum_{i=1}^{M} p_i^{(m+1)} = 1$. The computation for $s_i^{(m+1)}$ is a major contribution of this paper. In the Appendix, we show that $s_i^{(m+1)}$ is approximately equal to the positive root of the equation

$8 d_i s^2 + 4 c_i s + c_i = 0$   (15)

where $c_i = \sum_{n=1}^{L} w_{ni}^{(m)}$ and $d_i = \sum_{n=1}^{L} w_{ni}^{(m)} \log(1 - \delta_n^2)$. Note, since $c_i > 0$ and $d_i < 0$, the equation above will always have a positive and a negative root.
IV. DENOISING APPLICATION USING POISSON DATA
In this section, we consider the problem of denoising images where the pixel values of the images are observations of Poisson random variables. We will refer to the proposed modification and Timmermann and Nowak's method as the beta-EM and TN methods, respectively.
A. Numerical Issue
We first address an important numerical issue that arises when computing terms involving the beta function. To estimate the scaling coefficients, terms of the following form must be calculated [see (7)]:

$\dfrac{\sum_{i=1}^{3} p_i \frac{B(s_i + x_{j+1,2k},\; s_i + x_{j+1,2k+1})}{B(s_i, s_i)} \cdot \frac{x_{j+1,2k} - x_{j+1,2k+1}}{x_{j,k} + 2 s_i}}{\sum_{i=1}^{3} p_i \frac{B(s_i + x_{j+1,2k},\; s_i + x_{j+1,2k+1})}{B(s_i, s_i)}}$   (16)

where $p_1$, $p_2$, $p_3$, $s_1$, $s_2$, $s_3$, $x_{j+1,2k}$, and $x_{j+1,2k+1}$ are known constants. When the arguments of the beta function are large, the resulting value is extremely small. Consequently, measures must be taken to avoid underflow. One approach for addressing this problem is summarized as follows.
a) For each $i$, use the betalog function to compute $\ell_i = \log p_i + \log B(s_i + x_{j+1,2k},\, s_i + x_{j+1,2k+1}) - \log B(s_i, s_i)$.
b) Compute $\ell_{\max} = \max_i \ell_i$.
TABLE I AMSE RESULTS UNDER VARIOUS NOISE LEVELS FOR CAMERAMAN IMAGE
TABLE II AMSE RESULTS UNDER VARIOUS NOISE LEVELS FOR LENA IMAGE
c) Calculate the ratio in (16) using

$\dfrac{\sum_{i=1}^{3} e^{\ell_i - \ell_{\max}} \cdot \frac{x_{j+1,2k} - x_{j+1,2k+1}}{x_{j,k} + 2 s_i}}{\sum_{i=1}^{3} e^{\ell_i - \ell_{\max}}}$

where $\ell_i$ is the logarithm of the $i$th weighted beta-function term in (16) and $\ell_{\max} = \max_i \ell_i$.
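Steps a)-c) can be sketched in Python, with `math.lgamma` playing the role of MATLAB's betalog function. The parameter values below are illustrative only; for counts this large, the beta-function values themselves underflow in double precision, while the log-domain ratio remains computable.

```python
from math import lgamma, exp, log

def log_beta(a, b):
    # log B(a, b) = log Gamma(a) + log Gamma(b) - log Gamma(a + b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def stable_ratio(p, s, x1, x2):
    """Ratio of weighted beta-function terms as in (16), computed
    in the log domain to avoid underflow for large counts x1, x2."""
    # step a): log of each weighted term via the log-beta function
    ell = [log(pi) + log_beta(si + x1, si + x2) - log_beta(si, si)
           for pi, si in zip(p, s)]
    ell_max = max(ell)                        # step b)
    # step c): the differences ell_i - ell_max lie in (-inf, 0],
    # so exponentiating them cannot underflow to an all-zero sum
    w = [exp(e - ell_max) for e in ell]
    r = [(x1 - x2) / (x1 + x2 + 2 * si) for si in s]
    return sum(wi * ri for wi, ri in zip(w, r)) / sum(w)

# Hypothetical parameters: three components and large child counts.
val = stable_ratio(p=[0.4, 0.3, 0.3], s=[1.0, 10.0, 100.0],
                   x1=6000.0, x2=4000.0)
```

Here log B(6001, 4001) is roughly -6700, so B itself is far below the smallest representable double, yet the ratio is evaluated without difficulty.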
Note, in all of our experiments, the above exponential terms were numerically computable.
B. Methods
In the beta-EM method, all of the beta-mixture parameters were estimated in an ML fashion from training images. The training images were generated by applying a nonparametric wavelet denoising method [5]-[7] to the observed noisy image. The nonparametric method is available in MATLAB as the ddencmp and wpdencmp functions. By contrast, Timmermann and Nowak apply a moment-matching method directly to the observed noisy image. A summary of the beta-EM method follows.
Step 1) Create the training image: Let $v$ and $h$ denote the vertical and horizontal shift, respectively.
For $v = 0, 1, \ldots, 7$
  For $h = 0, 1, \ldots, 7$
    Circularly shift the observed image by $v$ and $h$.
    Perform the Anscombe transformation [8], [9] on the circularly shifted image.
    Denoise the resulting image using MATLAB's ddencmp and wpdencmp functions.
    Perform the inverse of the Anscombe transformation on the denoised image.
    Circularly shift the resulting image back to its original orientation.
  End
End
Get the training image by averaging the 64 denoised images.
Step 2) Compute PV observations of the training image and pool them according to scale.
Step 3) Estimate the beta-mixture parameters of the PVs at each scale using the PV observations from Step 2) and the EM algorithm in Section III.
Step 4) Denoise the observed image:
For $v = 0, 1, \ldots, 7$
  For $h = 0, 1, \ldots, 7$
    Circularly shift the observed image by $v$ and $h$.
Fig. 1. Denoising results for the cameraman image: (a) original image (maximum pixel value = 60), (b) noisy image (mean-squared pixel error = 27.94), (c) denoised image using the TN method (mean-squared pixel error = 7.81), and (d) denoised image using the beta-EM method (mean-squared pixel error = 5.33).
    Compute the unnormalized Haar wavelet transform of the circularly shifted image and, at each scale, compute the ratio of the wavelet and scaling coefficients. These values are viewed as observations of the PVs.
    Get the denoised image using the beta-mixture parameters from Step 3) and Timmermann and Nowak's 2-D denoising algorithm [1] [note: the 2-D version of (7)-(9) is used].
    Circularly shift the resulting image back to its original orientation.
  End
End
Get the desired image by averaging the 64 denoised images.
Timmermann and Nowak simplify the problem of estimating the beta-mixture parameters by fixing the three shape parameters $s_1$, $s_2$, and $s_3$ and one of the mixture probabilities at preset values. Since the mixture probabilities are constrained to sum to one, the problem that remains is to estimate a single mixture probability, which we denote by $p$. Since the PVs obey the beta-mixture density in (6), the second moment of a PV satisfies

$E[\delta^2] = \sum_{i=1}^{3} \dfrac{p_i}{2 s_i + 1}.$   (19)

Timmermann and Nowak estimate $p$ by replacing $E[\delta^2]$ in (19) with a data-based estimate.
The moment-matching method used by Timmermann and Nowak has the drawback that $\hat{p}$, the estimate of $p$, is not guaranteed to lie in the feasible region [0, 0.999]. In the case of the cameraman image, about 75% of the time the estimate did not lie in the feasible region. To make the moment-matching method practical, when $\hat{p} < 0$ and $\hat{p} > 0.999$, we replaced the moment-matching estimate with 0 and 0.999, respectively.
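As an alternative to moment matching, the full EM iteration of Section III, i.e., updates (13)-(15), can be sketched as follows. This is a minimal sketch: the initialization, iteration count, and any data passed in are illustrative, not the values used in our experiments.

```python
import numpy as np
from math import lgamma

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_f(delta, s):
    """Log of the symmetric beta density in (6) on (-1, 1)."""
    return ((s - 1) * np.log1p(-delta ** 2)
            - log_beta(s, s) - (2 * s - 1) * np.log(2))

def em_beta_mixture(delta, p, s, n_iter=25):
    """EM updates (13)-(15) for the mixture weights p_i and shape
    parameters s_i, given PV observations delta in (-1, 1)."""
    delta = np.asarray(delta, dtype=float)
    p, s = np.asarray(p, float), np.asarray(s, float)
    for _ in range(n_iter):
        # E-step (13): responsibilities w[n, i], in the log domain
        logw = np.log(p) + np.column_stack([log_f(delta, si) for si in s])
        logw -= logw.max(axis=1, keepdims=True)
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        # M-step (14): mixture weights
        p = w.mean(axis=0)
        # M-step (15): s_i is the positive root of
        # 8*d_i*s^2 + 4*c_i*s + c_i = 0, with c_i > 0 and d_i < 0
        c = w.sum(axis=0)
        d = (w * np.log1p(-delta ** 2)[:, None]).sum(axis=0)
        disc = np.sqrt(16 * c ** 2 - 32 * d * c)
        s = (-4 * c - disc) / (16 * d)        # the positive root
    return p, s
```

By construction, the returned weights are nonnegative and sum to one, and the shape estimates are positive, which is precisely the feasibility property that moment matching lacks.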
Fig. 2. Denoising results for the Lena image: (a) original image (maximum pixel value = 60), (b) noisy image (mean-squared pixel error = 29.30), (c) denoised image using the TN method (mean-squared pixel error = 6.50), and (d) denoised image using the beta-EM method (mean-squared pixel error = 5.29).
We now provide a possible explanation for the poor performance of the moment-matching method. Combining (19) with the constraint that the mixture probabilities sum to one yields a linear system, denoted (20), in the unknown mixture probabilities. Straightforward calculations show that the condition number ($\ell_2$ norm) of the matrix on the left-hand side of (20) is about 1726. Because the condition number is very large, a small perturbation of the data-based estimate of $E[\delta^2]$ from its true value may lead to poor estimates of the mixture probabilities. Hence, it is not unexpected that $\hat{p}$ oftentimes lies outside of the feasible region [0, 0.999].
C. Experimental Results
We assessed the performance of the denoising methods using the well-known cameraman and Lena images as test images. Four different noise levels were considered by scaling the 256 × 256 test images so that their maximum pixel values were 30, 60, 90, and 120. For each scaled image, ten noisy realizations were generated using MATLAB's random command to generate observations of Poisson random variables. The mean
values of the Poisson random variables were the pixel values of the scaled images. The beta-EM and TN methods were objectively compared by computing the average mean-square error (AMSE) for the cameraman and Lena test images at each noise level. The AMSE is defined to be the average of the mean-square errors for the ten realizations. The resulting AMSE values for the cameraman and Lena images are tabulated in Tables I and II, respectively. The tables demonstrate that in all cases the beta-EM method had lower AMSE than the TN method. Consider the test case where the maximum pixel value for the test images was 60. In the cameraman scenario, the AMSE was 27.92 for the noisy image, 8.40 for the nonparametric image, 7.74 for the TN image, and 5.37 for the beta-EM image. For the Lena image, the AMSE was 29.09 for the noisy image, 7.36 for the nonparametric image, 6.46 for the TN image, and 5.32 for the beta-EM image. The tables illustrate that the degree of improvement of the beta-EM method over the TN method increases as the noise level increases. For the cameraman image, the beta-EM method produced AMSE values that were 22.68%, 30.62%, 34.65%, and 37.03% lower than the TN AMSE values when the maximum
pixel value was 30, 60, 90, and 120, respectively. The corresponding percentages for the Lena image were 14.74%, 17.69%, 19.51%, and 21.58%.
We now compare the methods visually. Fig. 1 illustrates the results for the cameraman image with a maximum pixel value of 60. Comparing Fig. 1(c) with Fig. 1(d), it can be seen that the details in the cameraman image are more visible and the speckle noise is less pronounced in the beta-EM image than in the TN image. For example, the camera tripod in the beta-EM image [Fig. 1(d)] has better-defined edges and greater contrast than in the TN image [Fig. 1(c)]. Note also that the buildings in the beta-EM image are more sharply defined. Similarly, for the Lena images in Fig. 2, we find that the beta-EM image has more detail and contrast than the TN image.
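The evaluation protocol just described (scale a test image to a target peak, draw Poisson realizations, and average the per-realization mean-square errors) can be sketched as follows. The tiny test pattern and identity "denoiser" are stand-ins for the real images and methods.

```python
import numpy as np

def amse(clean, denoise, n_realizations=10, peak=60.0, seed=0):
    """Average mean-square error over Poisson-noise realizations,
    as used for Tables I and II: scale the test image so that its
    maximum pixel value is `peak`, draw Poisson counts with those
    means, denoise each realization, and average the MSEs."""
    rng = np.random.default_rng(seed)
    lam = clean * (peak / clean.max())
    errs = []
    for _ in range(n_realizations):
        noisy = rng.poisson(lam).astype(float)
        errs.append(np.mean((denoise(noisy) - lam) ** 2))
    return np.mean(errs)

# With the identity "denoiser", the AMSE is the average noise
# power, which for Poisson data is roughly the mean intensity.
img = np.ones((8, 8)) * 0.5
img[2:6, 2:6] = 1.0
noise_floor = amse(img, lambda x: x)
```

Any actual denoiser should drive the AMSE well below this noise floor, which is what Tables I and II quantify for the TN and beta-EM methods.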
V. CONCLUSION
We have improved Timmermann and Nowak's parametric method for estimating the intensity of Poisson processes. This improvement was made possible by our development of an expectation-maximization algorithm for computing ML estimates of the parameters of a beta-mixture distribution function. Timmermann and Nowak simplified the beta-mixture estimation problem to the point where only a single mixture parameter was unknown. The unknown mixture parameter was estimated using a moment-matching method. We provided a theoretical and experimental justification for avoiding the moment-matching method and instead using the ML method to estimate all of the beta-mixture parameters.
We investigated the performance of our improvement of Timmermann and Nowak's method, which we call the beta-EM method, in a Poisson denoising application. In the beta-EM method, the beta-mixture parameters were estimated using a training image and the proposed ML method. Given an observed image, a training image was generated by denoising the observed image with a nonparametric method. Using multiple realizations of images corrupted by varying levels of Poisson noise, we found that the beta-EM images always had lower mean-square error than the images produced by Timmermann and Nowak's method. We also found that as the noise increased, the relative improvement of the beta-EM method over Timmermann and Nowak's method increased. In the future, other applications where Timmermann and Nowak's method can be used will be considered to see if the beta-EM method still offers a measurable improvement.
APPENDIX
In this appendix, we show that $s_i^{(m+1)}$ is approximately equal to the positive root of the equation in (15). A similar approach was used by Hsiao et al. in the context of Bayesian estimation [10]. The main difference is that they were concerned with estimating the parameters of a gamma-mixture distribution.
Recall, from Section III, that the portion of $Q(\theta; \theta^{(m)})$ that depends on $s_i$ is given by

$Q_i(s) = \sum_{n=1}^{L} w_{ni}^{(m)} \left[ (s - 1) \log(1 - \delta_n^2) - \log B(s, s) - (2s - 1) \log 2 \right]$   (A1)

where $w_{ni}^{(m)}$ is defined in (13). Since $B(s, s) = \Gamma(s)\Gamma(s)/\Gamma(2s)$, the derivative of $Q_i$ with respect to $s$ equals

$\dfrac{dQ_i}{ds} = d_i + c_i \left[ \dfrac{d}{ds} \log \Gamma(2s) - 2 \dfrac{d}{ds} \log \Gamma(s) - 2 \log 2 \right]$   (A2)

where $c_i = \sum_{n=1}^{L} w_{ni}^{(m)}$ and $d_i = \sum_{n=1}^{L} w_{ni}^{(m)} \log(1 - \delta_n^2)$. We simplify the problem of computing the derivatives of the gamma function terms by approximating the gamma function with [11]

$\Gamma(x) \approx \sqrt{2\pi}\, x^{x - 1/2} e^{-x} e^{1/(12x)}.$   (A3)

For $s > 0$, it follows from this approximation that

$\dfrac{d}{ds} \log \Gamma(s) \approx \log s - \dfrac{1}{2s} - \dfrac{1}{12 s^2}$   (A4)
$\dfrac{d}{ds} \log \Gamma(2s) \approx 2 \log(2s) - \dfrac{1}{2s} - \dfrac{1}{24 s^2}.$   (A5)

Using (A4) and (A5) leads to an approximation for the derivative of $Q_i$ with respect to $s$

$\dfrac{dQ_i}{ds} \approx d_i + c_i \left( \dfrac{1}{2s} + \dfrac{1}{8 s^2} \right).$   (A6)

Now, setting the right-hand side of (A6) to zero and multiplying by $8 s^2$ yields the key equation

$8 d_i s^2 + 4 c_i s + c_i = 0.$   (A7)

Since the ML estimates of the beta parameters must be positive, $s_i^{(m+1)}$ is taken to be the positive root of the above equation, which has exactly one positive root because $c_i > 0$ and $d_i < 0$.
REFERENCES
[1] K. E. Timmermann and R. D. Nowak, "Multiscale modeling and estimation of Poisson processes with application to photon-limited imaging," IEEE Trans. Inform. Theory, vol. 45, pp. 846-862, July 1999.
[2] H. Lu, "Image reconstruction of PET images," M.S. thesis, Univ. Florida, Gainesville, FL, 2001.
[3] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Roy. Stat. Soc. Ser. B, vol. 39, pp. 1-38, 1977.
[4] G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions. New York: Wiley, 1996.
[5] R. R. Coifman and M. V. Wickerhauser, "Entropy-based algorithms for best basis selection," IEEE Trans. Inform. Theory, vol. 38, pp. 713-718, May 1992.
[6] D. L. Donoho, "De-noising by soft-thresholding," IEEE Trans. Inform. Theory, vol. 41, pp. 613-627, May 1995.
[7] D. L. Donoho and I. M. Johnstone, "Ideal spatial adaptation by wavelet shrinkage," Biometrika, vol. 81, pp. 425-455, 1994.
[8] F. Anscombe, "The transformation of Poisson, binomial and negative-binomial data," Biometrika, vol. 35, pp. 246-254, 1948.
[9] B. A. Mair, R. B. Carroll, and J. M. M. Anderson, "Filter banks and the EM algorithm," in Proc. IEEE Medical Imaging Conf., vol. 3, 1996, pp. 1747-1751.
[10] I. Hsiao, A. Rangarajan, and G. Gindi, "Joint MAP reconstruction/segmentation using mixture models as priors," in Proc. IEEE Symp. Nuclear Science, vol. 3, 1998, pp. 1689-1693.
[11] I. S. Gradshteyn and I. M. Ryzhik, Tables of Integrals, Series, and Products. New York: Academic, 1980.
H. Lu received the B.S. degree in electronics and electrical engineering from Jiao Tong University, Shanghai, China, in 1998, and the M.S. degree in electrical and computer engineering from the University of Florida, Gainesville, in 2001. Her M.S. research focused on image processing, medical image denoising, and image reconstruction. She was with the Statistical Signal Processing Laboratory, University of Florida, as a Research Assistant. In 2001, she joined Cadence Design Systems, Inc. Currently, she is with Cadence at the Chelmsford, MA, office. Her research interests include digital signal processing, image processing, and communications.
Y. Kim was born in Gwangjoo, Korea, in 1976. He received the B.S. degree from the School of Electrical and Computer Engineering, Seoul National University, Seoul, Korea, in 2002. He is currently pursuing the M.S. degree at the Department of Electrical and Computer Engineering, University of Florida, Gainesville. His research interests include image reconstruction applications for positron emission tomography and statistical signal processing.
John M. M. Anderson was born in Baltimore, MD, in 1963. He received the Sc.B. degree in electrical engineering from Brown University, Providence, RI, in 1985, the M.S.E.E. degree from the Georgia Institute of Technology, Atlanta, in 1987, and the Ph.D. degree in electrical engineering from the University of Virginia, Charlottesville, in 1992. He joined the Department of Electrical and Computer Engineering, University of Florida, in 1992 and was later promoted to the rank of Associate Professor. Since 2001, he has been a faculty member with the Department of Electrical and Computer Engineering, Howard University, Washington, DC. His general research interests lie in signal and image processing. In 1997, he participated in the NASA Faculty Fellowship Program at the Goddard Space Flight Center, Greenbelt, MD. He also participated in the 2002 U.S. Army Summer Faculty Research and Engineering Program at the Night Vision and Electronic Sensors Directorate, Fort Belvoir, VA. Dr. Anderson has served as an Associate Editor for the IEEE SIGNAL PROCESSING LETTERS (from 2001 to 2003) and was a member of the organizing committee for ICASSP 2002, Orlando, FL.