COMPLEXITY-REGULARIZED IMAGE RESTORATION
Juan Liu and Pierre Moulin
U. of Illinois, Beckman Inst. and ECE Dept.
405 N. Mathews Ave., Urbana, IL 61801
Email: {j-liu, [email protected]}

ABSTRACT

In this paper, we propose the use of complexity regularization in image restoration. This is a flexible estimation method which borrows from recent developments in nonparametric estimation theory. The regularized estimation problem is formulated in the wavelet domain and solved using a computationally efficient multiscale relaxation algorithm.
1. INTRODUCTION

In some applications, images are blurred and contaminated by additive noise. Image restoration aims at producing good estimates of the original image from the degraded observations. The restored image should be sharper and contain less noise than the observations. Due to the ill-posedness of the image restoration problem, application of the inverse blurring operator to the observations produces an unacceptably rough solution. A priori knowledge of the image can be used to regularize the solution. The classical choice of regularization penalty is a quadratic smoothness penalty, but this restoration technique lacks the capability to locally adapt the resolution of the estimates to the data. There are regularization penalties, such as $L_p$ penalties (with $p \le 1$), which have far better edge rendition properties as well as robustness properties [6, 7]. An even more flexible class of smoothness penalties is obtained by application of complexity regularization principles. Assuming that the complexity of the underlying true image is low in a compression sense, complexity measures can be used as regularization penalties. In contrast with traditional regularization methods, such penalties possess appealing universality properties, in the sense that they are guaranteed to deliver good performance over a very broad range of signals or images, under suitable conditions [1]. We have recently demonstrated the benefits of such penalties in image denoising applications and introduced a rate-distortion interpretation for denoising problems [3]. In this paper, we formulate universal complexity penalties in terms of the wavelet coefficients of the unknown image and solve the resulting optimization problem using numerical relaxation. Our multiscale relaxation algorithm is applicable to regularization techniques using any locally computable penalty, such as an $L_p$ penalty on the wavelet coefficients.
2. MODEL

The original, unknown image $x$ is blurred by a convolution operator $H$ (typically a low-pass filter), and corrupted by additive white Gaussian noise (AWGN) $w$ with zero mean and known variance $\sigma^2$. The goal is to restore the original image from the observation $y = Hx + w$. The wavelet representation of $x$ is often sparse, with few significant components. This sparsity can be exploited to reduce the computational expense. More importantly, discriminating between noise and image is easier in the wavelet domain than in the spatial domain, as white Gaussian noise affects all coefficients equally. We use an orthonormal wavelet transform $W$. The model in the wavelet domain takes the form $\tilde{y} = \tilde{H}\tilde{x} + \tilde{w}$, where $\tilde{y}$, $\tilde{x}$ and $\tilde{w}$ are the wavelet representations of $y$, $x$ and $w$ respectively, and $\tilde{H} = W H W^T$ is the 2-D transform of the spatial-domain operator $H$. The restoration problem in the wavelet domain consists of estimating $\tilde{x}$ from the observed coefficients $\tilde{y}$.
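The following Python sketch illustrates this degradation model and its wavelet-domain view. It is only an illustration under assumed tools (NumPy, SciPy, and PyWavelets); the filter taps and noise variance anticipate the experimental settings of Section 5, and the test image is a synthetic placeholder.

```python
# Sketch of the observation model y = Hx + w and the wavelet-domain view
# y~ = W y. Assumes numpy, scipy, and pywt are available.
import numpy as np
import pywt
from scipy.signal import convolve2d

rng = np.random.default_rng(0)

# Synthetic stand-in for the original image x (a bright square on black).
x = np.zeros((64, 64))
x[16:48, 16:48] = 1.0

# Separable low-pass blur H with taps [1/16, 1/4, 3/8, 1/4, 1/16] (Sec. 5).
h = np.array([1/16, 1/4, 3/8, 1/4, 1/16])
H = np.outer(h, h)

sigma = 7.0  # noise standard deviation, so sigma^2 = 49
y = convolve2d(x, H, mode="same", boundary="wrap") \
    + sigma * rng.standard_normal(x.shape)

# Orthonormal wavelet transform W (length-4 Daubechies, periodic extension),
# giving the observed coefficients y~ from which x~ is to be estimated.
y_tilde = pywt.wavedec2(y, "db2", mode="periodization", level=3)
```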
3. REGULARIZATION PENALTIES

The maximum-likelihood (ML) estimate $\hat{x}_{ML}$ maximizes
$$\ln p(y|x) = -\frac{N}{2}\ln(2\pi\sigma^2) - \frac{\|Hx - y\|^2}{2\sigma^2}$$
over $x$. The solution is the inverse-filtering solution, $\hat{x}_{ML} = H^{-1}y$. If $H$ is a low-pass operator, the estimate is unacceptably rough. A priori knowledge of the image is exploited via a regularization term $\Phi(x)$. The regularized estimate minimizes the penalized negative log-likelihood
$$E = -\ln p(y|x) + \lambda\,\Phi(x), \qquad (1)$$
where $\lambda$ is a regularization parameter, and $\Phi(x)$ penalizes unlikely images.
Quadratic smoothness penalties. The classical choice is the quadratic penalty $\Phi(x) = \|Cx\|^2$, where $C$ is a high-pass operator that penalizes large high-frequency fluctuations in $x$. For the equivalent problem in the wavelet domain, the cost function to be minimized is
$$E = -\ln p(\tilde{y}|\tilde{x}) + \lambda\|\tilde{C}\tilde{x}\|^2. \qquad (2)$$
In [11], the authors use a diagonal $\tilde{C}$ and minimize the quadratic cost function (2) by solving a linear system for each subband in a coarse-to-fine order. This technique works only for quadratic penalty functions, because in this case the estimates are linear functions of the observations.

Nonquadratic smoothness penalties. Generalized Gaussian distributions (GGD) with shape parameter $0 < p \le 1$ often provide a good statistical model for wavelet coefficients of natural images. For an iid GGD prior, the penalty function $\Phi(\tilde{x}) = -\ln p(\tilde{x})$ is additive over the components of $\tilde{x}$. The penalty assigned to each wavelet coefficient $\tilde{x}_l$ is equal to $b|\tilde{x}_l|^p$, where $b$ depends on $p$ and the signal variance $\sigma_x^2$.

Complexity penalties. As an alternative to the $L_p$ penalties above, we study the use of complexity penalties in (1) and assign to $\Phi(x)$ the length (in bits) of a codeword describing $x$. Estimates with high complexity in a data-compression sense are penalized. This is a very flexible way in which to penalize roughness. Complexity regularization includes MDL [8] as a special case, with the regularization parameter $\lambda = \ln 2$. The theoretical motivation for complexity regularization is explained in detail in [1]. As there exists a great variety of compression schemes, the complexity $\Phi(x)$ may take diverse forms. For the wavelet-domain representation, for instance, we can use the simple complexity penalty $\Phi(\tilde{x}) = \frac{3}{2} M \log_2 N$ [3], where $M$ is the number of nonzero wavelet coefficients, and $\frac{3}{2}\log_2 N$ is the cost assigned to each nonzero coefficient. This cost accounts for $\log_2 N$ bits for coding its location out of $N$ equally possible locations (implicitly assuming a uniform distribution of the location parameter), and $\frac{1}{2}\log_2 N$ bits for coding its amplitude (implicitly assuming uniform quantization and uniform distribution among the quantized values) [9]. The underlying model is simple, but crude and unrealistic for practical images. See [3] for far more sophisticated penalties associated with state-of-the-art image coders. In view of the weakness of the underlying statistical assumptions in the above model, we propose an alternative universal coding scheme for wavelet coefficients.
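As a concrete illustration of the three penalty families, the sketch below evaluates each on a flat array of wavelet coefficients. This is our own illustration, not code from the paper; `c_diag`, `b`, `p`, and `N` name the quantities defined above.

```python
# Sketch: the three penalty types evaluated on an array of coefficients.
import numpy as np

def quadratic_penalty(x_tilde, c_diag):
    # Phi(x~) = ||C~ x~||^2 for a diagonal high-pass weighting C~.
    return float(np.sum((c_diag * x_tilde) ** 2))

def ggd_penalty(x_tilde, b, p):
    # Phi(x~) = sum_l b |x~_l|^p, the iid GGD prior penalty (0 < p <= 1).
    return float(b * np.sum(np.abs(x_tilde) ** p))

def simple_complexity_penalty(x_tilde, N):
    # Phi(x~) = (3/2) M log2(N), with M the number of nonzero coefficients.
    M = np.count_nonzero(x_tilde)
    return 1.5 * M * np.log2(N)
```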
This coding scheme is based upon Rissanen's universal prior for integers [8]. The length of the codeword for a positive integer $j$ is
$$\log^* j = \log_2 j + \log_2 \log_2 j + \cdots + \log_2 c_0,$$
where the summation stops at the first negative term, and $c_0 \approx 2.865$ is computed so as to satisfy Kraft's inequality with equality, i.e., $\sum_j 2^{-\log^* j} = 1$. So $P(j) \triangleq 2^{-\log^* j}$ may be interpreted as a probability distribution over the strictly positive integers. Rissanen's coding technique is extended to encode all positive and negative integers, including zero. One needs to use a sign bit as well as a special codeword for zero. Define the length of the codewords as
$$L_q(j) = \begin{cases} -\log_2 q & : j = 0 \\ -\log_2(1-q) + 1 + \log^* |j| & : j \neq 0, \end{cases} \qquad (3)$$
where $q \in (0,1)$ determines the length of the zero codeword. It is possible to choose $q$ so that the extended function $L_q(j)$ does not exhibit an abrupt change at $j = 0$. The value used in our experiments is $q = 0.2$. The codebook for integers can now be applied to the reproduction levels of a uniform quantizer applied to the wavelet coefficients. The quantization step we chose is equal to $\sigma$.
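A direct implementation of these code lengths is straightforward; the sketch below is a minimal one, assuming the truncated value of $c_0$ given above and the default $q = 0.2$ used in our experiments.

```python
import math

C0 = 2.865  # Rissanen's normalizing constant c_0 (approximate value)

def log_star(j: int) -> float:
    """Codeword length log*(j) for a positive integer j:
    log2(j) + log2(log2(j)) + ..., stopping at the first
    negative term, plus log2(c_0)."""
    assert j >= 1
    length = math.log2(C0)
    term = math.log2(j)
    while term > 0:
        length += term
        term = math.log2(term)
    return length

def L_q(j: int, q: float = 0.2) -> float:
    """Extended code length (3): a special codeword for zero, plus
    one sign bit and log*(|j|) for nonzero integers."""
    if j == 0:
        return -math.log2(q)
    return -math.log2(1.0 - q) + 1.0 + log_star(abs(j))
```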
4. A RELAXATION ALGORITHM

Minimization of (1) is a very difficult high-dimensional optimization problem, so deriving a computationally efficient optimization algorithm is a major challenge. We propose a solution based on relaxation, where the original optimization problem is broken up into a succession of simpler local optimization problems. Computational requirements can be reduced through proper data management. The relaxation algorithm that we propose operates iteratively in a multiscale manner: we optimize (1) with respect to each wavelet subband, in a coarse-to-fine order. Such multiscale relaxation algorithms have produced good results in many applications and typically converge rapidly [5, 10]. In our problem, the cost function to be minimized is $E = \frac{1}{2\sigma^2}\|\tilde{H}\tilde{x} - \tilde{y}\|^2 + \lambda\Phi(\tilde{x})$. The algorithm is a modification of the algorithm in [5]. In each iteration cycle (called "sweep"), we start with an initial guess of the wavelet coefficient values, denoted $\tilde{x}^{(s)}$, and improve this guess by optimizing the cost function over the wavelet coefficients. We define the error term $\xi = \tilde{H}^T(\tilde{H}\tilde{x}^{(s)} - \tilde{y})$ and the correlation matrix $K = \tilde{H}^T\tilde{H}$. The cost function can be written as:
$$
E(\tilde{x}) = \frac{1}{2\sigma^2}\left\|(\tilde{H}\tilde{x} - \tilde{H}\tilde{x}^{(s)}) + (\tilde{H}\tilde{x}^{(s)} - \tilde{y})\right\|^2 + \lambda\Phi(\tilde{x})
= \frac{1}{2\sigma^2}\sum_k\sum_l K_{kl}\,(\tilde{x}_k - \tilde{x}_k^{(s)})(\tilde{x}_l - \tilde{x}_l^{(s)})
+ \frac{1}{\sigma^2}\sum_l (\tilde{x}_l - \tilde{x}_l^{(s)})\,\xi_l^{(s)} + \lambda\Phi(\tilde{x})
+ \frac{1}{2\sigma^2}\|\tilde{H}\tilde{x}^{(s)} - \tilde{y}\|^2 \quad \text{(unrelated to } \tilde{x}\text{)}.
$$

Table 1: 1-D Optimization Algorithm.

Initialization: Choose $\tilde{x}^{(0)}$, e.g., $\tilde{x}^{(0)} = \tilde{y}$; $\xi^{(0)} = \tilde{H}^T(\tilde{H}\tilde{x}^{(0)} - \tilde{y})$.
for $s = 1$ to $s_{\max}$ do
  for $j = J$ downto $0$ do
    1. for $0 \le l < N_j$:
       $\xi_j^{(s)}(l) = \begin{cases} \xi_j^{(s-1)}(l) + [K_{10,j}\, r_{j+1}^{(s)}](l) & : j < J \\ \xi_j^{(s-1)}(l) & : j = J \end{cases}$
       (note: $K_{10,j}$ is the lower-left submatrix of $K_j$; $K_J = K = \tilde{H}^T\tilde{H}$, and $K_{j-1}$ is the upper-left submatrix of $K_j$.)
    2. for $0 \le l < N_j$: update $\{\tilde{x}_j^{(s)}(l)\}$ using (4).
    3. $r_j^{(s)} = \begin{cases} [\tilde{x}_J^{(s)} - \tilde{x}_J^{(s-1)};\ \tilde{x}_{J-1}^{(s)} - \tilde{x}_{J-1}^{(s-1)};\ \ldots;\ \tilde{x}_j^{(s)} - \tilde{x}_j^{(s-1)}] & : j < J \\ \tilde{x}_j^{(s)} - \tilde{x}_j^{(s-1)} & : j = J \end{cases}$
  endfor
  $\xi^{(s)} = \tilde{H}^T(\tilde{H}\tilde{x}^{(s)} - \tilde{y})$
  Apply convergence test. Exit if algorithm has converged.
endfor
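The structure of one sweep can be conveyed by a simplified, single-band sketch: coordinate descent on $E$ over a flat coefficient vector, with the local least-squares value of (4) passed through a penalty-dependent nonlinearity `theta`. This is our own condensed illustration; the actual algorithm of Table 1 adds the multiscale bookkeeping ($K_j$, $r_j^{(s)}$, inter-scale feedback) that makes the computation local.

```python
# Simplified single-band relaxation sweep (illustration only).
# K = H~^T H~; theta(u, Kll) is the penalty-dependent nonlinearity.
import numpy as np

def relaxation_sweeps(K, y_tilde, H_tilde, theta, n_sweeps=10):
    x = y_tilde.copy()                      # initial guess x~(0) = y~
    for _ in range(n_sweeps):
        x_old = x.copy()
        # Error term at the start of the sweep: xi = H~^T (H~ x~ - y~).
        xi = H_tilde.T @ (H_tilde @ x_old - y_tilde)
        for l in range(len(x)):
            # Least-squares value of coefficient l given the current
            # values of the others, as in eq. (4).
            corr = K[l] @ (x - x_old) - K[l, l] * (x[l] - x_old[l])
            x_ls = x_old[l] - (corr + xi[l]) / K[l, l]
            x[l] = theta(x_ls, K[l, l])     # penalty-dependent update
        if np.max(np.abs(x - x_old)) < 1e-6:
            break                           # convergence test
    return x
```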
At each scale, we iteratively minimize $E(\tilde{x})$ with respect to individual wavelet coefficients in turn. Assume all coefficients are fixed except for $\tilde{x}_j(l)$, where $j$ and $l$ denote the scale and location parameters, respectively. Let $E_{j,l}(\tilde{x}_j(l))$ be the contribution of $\tilde{x}_j(l)$ to $E$. The updated value $\tilde{x}_j^{(s+1)}(l) = \arg\min_{\tilde{x}_j(l)} E_{j,l}(\tilde{x}_j(l))$ is chosen so as to minimize $E_{j,l}$. The next wavelet coefficient is then optimized in the same manner, and so on. While direct computation of $E_{j,l}(\tilde{x}_j(l))$ involves coefficients at all scales, efficient data management makes the computation local both in location and in scale, by using inter-scale feedback terms which are updated after work at each scale [5]. The optimization is conducted iteratively, in a coarse-to-fine order. For simplicity, we take the 2-level decomposition of a 1-D signal to illustrate the basic idea; see Table 1. The relaxation algorithm starts from $\xi^{(0)}$ and some initial value $\tilde{x}^{(0)} = \begin{bmatrix} \tilde{x}_0^{(0)} \\ \tilde{x}_1^{(0)} \end{bmatrix}$, where $\tilde{x}_0^{(0)}$ and $\tilde{x}_1^{(0)}$ are respectively the coarse and fine level coefficients. First we minimize the contribution of each coefficient in the coarse band, coefficient by coefficient, and get a new estimate $\tilde{x}_0^{(1)}$. It is improved over $\tilde{x}_0^{(0)}$ in the sense that the total cost $E$ is reduced. We then fix $\tilde{x}_0^{(1)}$ and optimize w.r.t. the fine-scale coefficients $\tilde{x}_1(l)$. This completes the sweep. Multi-level decomposition is a recursive extension of the 2-level case. This procedure runs iteratively for several sweeps, until a convergence criterion is satisfied.

In the case of separable blurring, the 2-D restoration algorithm is an extension of the 1-D algorithm above. The subbands are optimized in a coarse-to-fine order. There are three subbands (instead of just one in the 1-D case) per scale below the coarse scale. These three subbands are processed in a predetermined order, e.g., horizontal, vertical, and diagonal subbands.

The relaxation algorithm is greedy. At each step it finds the best value of the current wavelet coefficient. Each local optimization decreases the total cost $E$. Convergence of the algorithm is guaranteed, but convergence to a global minimum is not.

The relaxation algorithm may be used with various forms of penalty functions. For penalty functions that are additive in the wavelet coefficients, $\tilde{x}_j(l)$ is updated by applying a simple nonlinearity $\theta(\cdot)$ to $\tilde{x}_{LS,j}(l)$, the least-squares estimate minimizing $\|\tilde{y} - \tilde{H}\tilde{x}\|^2$ over $\tilde{x}_j(l)$:
$$
\tilde{x}_j^{(s+1)}(l) = \theta(\tilde{x}_{LS,j}(l)), \quad \text{where} \quad
\tilde{x}_{LS,j}(l) = \tilde{x}_j^{(s)}(l) - \frac{\sum_{k \neq l} K_{lk}\,(\tilde{x}_j(k) - \tilde{x}_j^{(s)}(k)) + \xi_j^{(s)}(l)}{K_{ll}}, \qquad (4)
$$
where $\tilde{x}_j(k)$ denotes the current value of coefficient $k$. The penalty function determines the specific form of the nonlinearity $\theta(\cdot)$.

Quadratic penalty function $\Phi(\tilde{x}) = \|\tilde{C}\tilde{x}\|^2$. The resulting estimate is linear in $\tilde{y}$. For diagonal $\tilde{C}$, the penalty is additive in the wavelet coefficients. The estimate is obtained by (4), with $\theta(\cdot)$ simply being a scaling factor $\frac{K_{ll}}{K_{ll} + 2\lambda\sigma^2\tilde{C}_{ll}^2}$.

Nonquadratic penalty $\Phi(\tilde{x}) = b\sum_{j,l}|\tilde{x}_j(l)|^p$. For the special case $p = 1$ (Laplacian prior), the nonlinearity $\theta(\cdot)$ is a soft-thresholding function with threshold $\frac{\sqrt{2}\,\lambda\sigma^2}{\sigma_x K_{ll}}$.

Complexity penalty function $\frac{3}{2}M\log_2 N$. In this case, optimization over $\tilde{x}_j(l)$ admits a closed-form solution at each relaxation step. The function $\theta(\cdot)$ in (4) is a hard-thresholding function with threshold $\sqrt{\frac{3\lambda\sigma^2}{K_{ll}}\log_2 N}$. The thresholds are typically very large at fine scales.

Complexity penalty using the universal prior for integers. In this case no closed-form solution for $\theta(\cdot)$ exists, and the optimal value of $\tilde{x}_j(l)$ is searched numerically. The nonlinearity $\theta(\cdot)$ also exhibits a hard-thresholding effect, but with a smaller threshold, approximately equal to $1.86\,\sigma$ [4].
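For concreteness, the three closed-form nonlinearities can be written compactly as below. This is our own sketch, not the paper's code; `lam`, `sigma2`, `Kll`, `C_ll`, `sigma_x`, and `N` name the quantities $\lambda$, $\sigma^2$, $K_{ll}$, $\tilde{C}_{ll}$, $\sigma_x$, and $N$ from the text.

```python
import numpy as np

def theta_quadratic(u, Kll, lam, sigma2, C_ll):
    # Linear shrinkage for the quadratic penalty ||C~ x~||^2 (diagonal C~).
    return Kll / (Kll + 2.0 * lam * sigma2 * C_ll**2) * u

def theta_laplacian(u, Kll, lam, sigma2, sigma_x):
    # Soft thresholding for the Laplacian prior (p = 1, b = sqrt(2)/sigma_x).
    t = np.sqrt(2.0) * lam * sigma2 / (sigma_x * Kll)
    return np.sign(u) * max(abs(u) - t, 0.0)

def theta_complexity(u, Kll, lam, sigma2, N):
    # Hard thresholding for the simple complexity penalty (3/2) M log2(N).
    t = np.sqrt(3.0 * lam * sigma2 * np.log2(N) / Kll)
    return u if abs(u) > t else 0.0
```

Bound to fixed parameters (e.g., via `lambda u, Kll: theta_complexity(u, Kll, lam, sigma2, N)`), any of these can serve as the `theta` argument of the sweep sketch above.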
5. EXPERIMENTAL RESULTS

We tested the regularization schemes discussed above on Lena. The original image was blurred by a separable symmetric $5 \times 5$ filter with filter taps $[\frac{1}{16}, \frac{1}{4}, \frac{3}{8}, \frac{1}{4}, \frac{1}{16}]$ in both horizontal and vertical directions, and contaminated with AWGN with zero mean and known variance $\sigma^2$. We used a 3-level wavelet decomposition with the length-4 Daubechies filters. Image boundaries were treated using periodic extensions. We compared restored images using different penalty functions: quadratic penalty, $L_p$ penalty, complexity penalty $\frac{3}{2}\log_2 N$ for each nonzero coefficient, and the complexity penalty based on the universal prior for integers. We evaluate quality using sharpness and other subjective criteria. We also give values for the "SNR improvement"
$$ISNR \triangleq 10\log_{10}\frac{\|y - x\|^2}{\|\hat{x} - x\|^2} \quad (\text{dB}).$$

For the quadratic penalty $\Phi(\tilde{x}) = \|\tilde{C}\tilde{x}\|^2$, we used a diagonal $\tilde{C}$, assigning higher penalties to fine-scale coefficients and lower penalties to coarse-scale coefficients, as in [11]. The penalty on coefficients in subband $B$ was $\sigma^2/\sigma_{x,B}^2$, where $\sigma_{x,B}^2$ denotes the signal variance in subband $B$. This is equivalent to MAP estimation assuming a Gaussian prior on signal coefficients. In practice, $\{\sigma_{x,B}\}$ are unknown. Since our goal in this preliminary study was to compare the merits of various penalties, we did not address the problem of estimating $\{\sigma_{x,B}\}$. Instead we used "true" values computed from the underlying image. For the nonquadratic penalty $\Phi(\tilde{x}_j(l)) = b|\tilde{x}_j(l)|^p$ based on GGD priors, we experimented with the special case $p = 1$ (Laplacian prior). For subband $B$, $b = \sqrt{2}/\sigma_{x,B}$. Again we used "true" values for $\{\sigma_{x,B}\}$.
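The ISNR figure of merit is simple to compute; a minimal sketch (our own, with hypothetical array arguments) follows.

```python
import numpy as np

def isnr_db(x, y, x_hat):
    """SNR improvement: 10 log10(||y - x||^2 / ||x_hat - x||^2), in dB.
    x: original image, y: degraded observation, x_hat: restored image."""
    return 10.0 * np.log10(np.sum((y - x) ** 2) / np.sum((x_hat - x) ** 2))
```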
We evaluated complexity penalties: $\frac{3}{2}\log_2 N$ for each nonzero coefficient, and complexity based on the universal prior coding scheme for integers. In both cases, we observed artifacts in the form of occasional isolated noise spikes (isolated wavelet coefficients) in the restored image. We conjecture the artifacts are due to the greedy nature of the relaxation algorithm and the discontinuity of the nonlinearity $\theta(\cdot)$ implied by the complexity penalties. We applied a simple post-processing technique that removes isolated high-amplitude wavelet coefficients, which are very unlikely to be signal components.

In Fig. 1, we show results for Lena, with $\sigma^2 = 49$. Fig. 1a shows the original image. Fig. 1b shows the blurred and noisy image. The restored image using the quadratic penalty is shown in Fig. 1c. The restored image lacks sharpness and is grainy. MAP estimation using the Laplacian prior produced the result shown in Fig. 1d. Visually the image is better, with sharp edges and smooth texture in the face and shoulder areas. Fig. 1e presents the restored image using the complexity penalty $\frac{3}{2}M\log_2 N$. While the overall sharpness of the image is improved (e.g., in the eye area), some textures are noticeably oversmoothed, and the overall image quality is very poor. This is not surprising given the crudeness of the model implied by this particular complexity measure. This penalty tends to overly penalize signal components and thus oversmooth the image. Fig. 1f shows the restoration result using the complexity penalty based on the universal prior for integers. The restored image is sharp and does not look grainy. Visually it is noticeably better than the results in Figs. 1c and 1e, and comparable to the result in Fig. 1d obtained with the Laplacian prior.
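The post-processing step can be realized in many ways; below is one possible sketch (our own, with an assumed amplitude threshold `t` and a 3×3 neighborhood test), which zeroes high-amplitude coefficients whose subband neighbors are all negligible.

```python
import numpy as np

def remove_isolated_spikes(band, t):
    """Zero out coefficients exceeding amplitude t whose 3x3 subband
    neighborhood contains no other coefficient above t."""
    out = band.copy()
    rows, cols = band.shape
    for i in range(rows):
        for j in range(cols):
            if abs(band[i, j]) > t:
                patch = band[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
                # The coefficient itself is the only large one in its patch.
                if np.sum(np.abs(patch) > t) == 1:
                    out[i, j] = 0.0
    return out
```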
Figure 1: a) The original image; b) observed image, blurred by a $5 \times 5$ low-pass filter and contaminated by AWGN with zero mean and variance $\sigma^2 = 49$; c) restored image using the quadratic penalty, ISNR = 1.078 dB; d) restored image using the Laplacian prior, ISNR = 0.863 dB; e) restored image using the complexity penalty $\frac{3}{2}\log_2 N$ for every nonzero coefficient, ISNR = −0.752 dB; f) restored image using the complexity penalty based on the universal prior on integers, ISNR = 0.584 dB.
6. REFERENCES

[1] A. R. Barron, "Complexity Regularization With Application to Artificial Neural Networks," pp. 561-576 in Nonparametric Function Estimation and Related Topics, Ed. G. Roussas, NATO ASI Series, Kluwer, Dordrecht, 1991.

[2] J. Besag, "On the statistical analysis of dirty pictures" (with discussion), J. Royal Statistical Soc. B, vol. 48, no. 3, pp. 259-279, 1986.

[3] J. Liu and P. Moulin, "Complexity-regularized image denoising," Proc. ICIP'97, vol. II, Santa Barbara, CA, Oct. 1997, pp. 370-373.

[4] J. Liu and P. Moulin, "A new complexity prior for multiresolution image denoising," to appear in IEEE-SP Int. Symp. on Time-Frequency and Time-Scale Analysis, Pittsburgh, Oct. 1998.

[5] P. Moulin, "A multiscale relaxation algorithm for SNR maximization in nonorthogonal subband coding," IEEE Trans. on Image Processing, vol. 4, no. 9, pp. 1269-1281, Sep. 1995.

[6] P. Moulin and J. Liu, "Analysis of multiresolution image denoising schemes using generalized-Gaussian priors," to appear in IEEE-SP Int. Symp. on Time-Frequency and Time-Scale Analysis, Pittsburgh, Oct. 1998.

[7] M. Nikolova, "Regularisation functions and estimators," Proc. ICIP'96, Lausanne, Switzerland, pp. I-457-460, 1996.

[8] J. Rissanen, Stochastic Complexity in Statistical Inquiry, World Scientific, Singapore, 1989.

[9] N. Saito, "Simultaneous Noise Suppression and Signal Compression Using a Library of Orthonormal Bases and the MDL Criterion," in Wavelets in Geophysics, Eds. E. Foufoula-Georgiou and P. Kumar, pp. 299-324, Academic Press, 1994.

[10] D. Terzopoulos, "Image analysis using multigrid relaxation methods," IEEE Trans. Pattern Anal. Machine Intell., vol. 8, pp. 129-139, 1986.

[11] G. Wang, J. Zhang, and G. Pan, "Solutions of inverse problems in image processing by wavelet expansion," IEEE Trans. on Image Processing, vol. 4, no. 5, 1995.