A signal denoising algorithm based on overcomplete wavelet representations and Gaussian models
Guang Deng†, David B. H. Tay† and Slaven Marusic‡

† Department of Electronic Engineering, La Trobe University, Victoria 3083, Australia. Phone: +61 3 9479 2036. Fax: +61 3 9471 0524. {d.deng,d.tay}@latrobe.edu.au
‡ School of Electrical Engineering and Telecommunications, The University of New South Wales, Australia. [email protected]

Corresponding author: Guang Deng
Abstract

In this paper, we propose a simple signal estimation algorithm based on multiple wavelet representations and Gaussian observation models. The proposed algorithm has two major steps: a joint-optimum estimation of the wavelet coefficients and an averaging of the denoised images. Experimental results show that the denoising performance of the proposed algorithm is comparable to that of the state-of-the-art.
Key words: wavelet transform, over-complete representations, denoising.
Preprint submitted to Elsevier
May 24, 2007
1 Introduction
Signal processing using over-complete representations has been an active research field in recent years [1–3]. The general observation is that the wavelet transform coefficients of real-world signals such as images have a sparse characteristic. Using over-complete representations (for example, multiple wavelet transforms), we aim to capture the important features of the signal using the smallest number of transform coefficients. In image denoising, excellent performance has been achieved by using over-complete representations with elaborate statistical models [4–7]. A simple and effective image denoising algorithm in the wavelet domain is one based on the maximum a posteriori (MAP) estimation principle and the Gaussian observation model [6, 8]; see also a classical paper on image filtering [9]. In this paper, we extend MAP-based signal estimation from using one wavelet to using two wavelets. This is a natural extension of the simple MAP-based algorithm using a single wavelet representation. Our goal is to develop a simple and effective image denoising algorithm. To keep the algorithm simple, we assume that the two sets of wavelet coefficients (of the same spatial index) from the two wavelet transforms follow a joint Gaussian distribution whose parameters are estimated locally. The localized Gaussian prior has been used successfully in sparse Bayesian learning and denoising [6, 10]. To demonstrate the performance of the algorithm, we first test the wavelet pair sym12 and coif4, which have similar properties. We then study the performance of the algorithm with a pair of wavelets that originates from one orthogonal wavelet. One way to form such a wavelet pair is to use an orthogonal wavelet and its double-shifted version; the other is to use an orthogonal wavelet and its reversed version. Next, we study an extension of the proposed algorithm that runs it with several different pairs of wavelets and takes the average of the denoised images. This added averaging step further improves the signal-to-noise ratio as long as the remaining noise in each of the denoised images is uncorrelated.
This paper is organized as follows. In Section 2, we derive an algorithm for signal estimation using two wavelet representations and Gaussian observation models; we also discuss the model assumptions and an extension of the proposed algorithm to handle more than two wavelet representations. In Section 3, we present experimental results using two images and show that the performance of the proposed algorithm is competitive with the state-of-the-art.
2 Signal estimation using two wavelets and Gaussian observation models
2.1 Problem formulation and a two-step solution
We consider the following observation model for the signal: x = s + e, where the elements of the additive noise vector e are modelled as i.i.d. Gaussian with zero mean and variance σe². Suppose we have two orthonormal wavelet transforms represented by two matrices W1 and W2. The transform-domain representations of the observation model are given by

yk = uk + rk,    (1)

where yk = Wk x, uk = Wk s, rk = Wk e and k = 1, 2. With these two representations, the optimum estimation of the signal s can be implemented in two steps. First, we make the optimum estimation of the transform coefficients u1 and u2; the results are denoted v1 and v2, and the associated estimates of the signal are given by s1 = W1ᵀ v1 and s2 = W2ᵀ v2, where the superscript ᵀ denotes matrix transpose. Then, we take the average to obtain the estimate of the signal:

s12 = (s1 + s2)/2.    (2)
The underlying assumption is that, after denoising, sk = s + dk (k = 1, 2), so that the averaging operation can further improve the signal-to-noise ratio. How much improvement can be achieved depends on the correlation between the remaining noise terms d1 and d2 after wavelet denoising.
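This assumption is easy to check numerically. The NumPy sketch below (with hypothetical residual-noise levels) averages two versions of a signal carrying independent residual noise, in the sense of equation (2); with uncorrelated residuals, the error variance of the average is roughly half that of either input.

```python
import numpy as np

rng = np.random.default_rng(0)
s = np.sin(np.linspace(0, 4 * np.pi, 4096))   # clean signal

# d1, d2: residual noise left by two hypothetical wavelet denoisers.
d1 = 0.1 * rng.standard_normal(s.shape)
d2 = 0.1 * rng.standard_normal(s.shape)
s1, s2 = s + d1, s + d2

s12 = (s1 + s2) / 2                           # equation (2)

mse = lambda a: float(np.mean((a - s) ** 2))
# With uncorrelated residuals, averaging roughly halves the error variance.
print(mse(s1), mse(s2), mse(s12))
```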
2.2 Optimum estimation of the transform coefficients
To simplify our presentation, we denote the mth element of the vector u1 as u1[m]. The same notation is used for the other vectors. Where there is no confusion, the location index is dropped; for example, a pair of coefficients (u1[m], u2[m]) will simply be denoted (u1, u2).
2.2.1 Model specifications and assumptions

We perform the optimum estimation following the MAP principle, under the following assumptions about the statistical model.

• Elements in each set of wavelet coefficients are uncorrelated. This is a reasonable assumption, as the wavelet transform decorrelates the signal.
• Each coefficient-pair (u1[m], u2[m]) is modelled as jointly Gaussian, p(u1[m], u2[m]) ~ N(0, Λs), where the covariance matrix is given by

Λs = σs² [1  ρs; ρs  1].    (3)

The same model is used for all coefficient-pairs. The signal variance σs² is estimated locally, while the correlation coefficient ρs, which depends on the two wavelet transforms and the signal, is assumed fixed. Although the joint-Gaussian model is not exact, it is a convenient tool for modelling the relationship between the two trees of wavelet coefficients; it is motivated by computational convenience and justified by the good performance of the proposed algorithm.
• Elements of each noise vector rk (k = 1, 2) are uncorrelated.
• Each noise-pair (r1[m], r2[m]) is modelled as jointly Gaussian, p(r1[m], r2[m]) ~ N(0, Λe), where the covariance matrix is given by

Λe = σe² [1  ρe; ρe  1].    (4)

The same model is used for all noise-pairs. The noise variance σe² and the correlation coefficient ρe are assumed fixed.
2.2.2 Simplifying the problem using the Walsh-Hadamard transform

With these assumptions, the posterior probability is given by

p(u1, u2 | y1, y2) = ∏m p(u1[m], u2[m] | y1[m], y2[m]).    (5)

Thus, the task of estimating the two coefficient vectors u1 and u2 reduces to estimating each coefficient-pair (u1[m], u2[m]). Dropping the location index and using Bayes' rule, the joint posterior is given by

p(u1, u2 | y1, y2) ∝ p(y1, y2 | u1, u2) p(u1, u2).

To simplify the calculation, we apply the Walsh-Hadamard transform (WHT) to decorrelate the two trees of wavelet coefficients. As a result, the joint estimation of (u1, u2) reduces to two individual MAP estimation problems, which are easy to solve. The WHT of the coefficient-pair, represented as a two-element vector u = [u1, u2]ᵀ, is given by

t = Q u,    (6)

where t = [t1, t2]ᵀ and

Q = (1/√2) [1  1; 1  −1]

is the WHT matrix. Similarly, the WHT of the observation data-pair, represented as a two-element vector y = [y1, y2]ᵀ, is given by

z = Q y,    (7)

where z = [z1, z2]ᵀ.
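The decorrelating effect of this 2-point transform is easy to verify numerically. The NumPy sketch below draws coefficient pairs from the joint-Gaussian model of equation (3), using illustrative values for σs² and ρs, and checks that after applying Q the empirical covariance is close to the diagonal matrix that appears later in equation (10).

```python
import numpy as np

Q = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)  # 2-point WHT matrix
assert np.allclose(Q @ Q.T, np.eye(2))                # orthonormal: IWHT is Q^T

rng = np.random.default_rng(1)
sigma_s2, rho_s = 4.0, 0.8                            # illustrative model values
cov = sigma_s2 * np.array([[1.0, rho_s], [rho_s, 1.0]])  # equation (3)
u = rng.multivariate_normal([0.0, 0.0], cov, size=100_000).T  # pairs as columns

t = Q @ u                                             # equation (6), pair-wise
emp = np.cov(t)
# Diagonal entries approach sigma_s2*(1 + rho_s) and sigma_s2*(1 - rho_s);
# the off-diagonal term approaches zero, as in equation (10).
print(np.round(emp, 2))
```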
Table 1. Probability distribution functions for the new variables.

p(z1 | t1) : N(t1, σe²(1 + ρe))        p(t1) : N(0, σs²(1 + ρs))
p(z2 | t2) : N(t2, σe²(1 − ρe))        p(t2) : N(0, σs²(1 − ρs))
Using the WHT, the MAP estimation of (u1, u2) is transformed into the MAP estimation of (t1, t2), which is determined by maximizing the posterior

p(t1, t2 | z1, z2) ∝ p(z1, z2 | t1, t2) p(t1, t2).    (8)

Once (t1, t2) is estimated, estimates of (u1, u2) are obtained by taking the inverse WHT (IWHT) of (t1, t2). We now show that the two variables t1 and t2 are independent:

p(u1, u2) ∝ exp[−uᵀ Λs⁻¹ u] = exp[−uᵀ Qᵀ Q Λs⁻¹ Qᵀ Q u] = exp[−tᵀ A⁻¹ t],    (9)

where

A = Q Λs Qᵀ = [σs²(1 + ρs)  0; 0  σs²(1 − ρs)].    (10)

Therefore p(t1, t2) ∝ exp[−tᵀ A⁻¹ t] = p(t1) p(t2), where p(t1) and p(t2) are the Gaussian distributions given in Table 1. In the same way, we can show that

p(z1, z2 | t1, t2) = p(z1 | t1) p(z2 | t2),    (11)

where the distribution functions are given in Table 1. Thus, the posterior for the new variables is given by

p(t1, t2 | z1, z2) = α p(z1, z2 | t1, t2) p(t1, t2) = α p(z1 | t1) p(t1) p(z2 | t2) p(t2) = β p(t1 | z1) p(t2 | z2),    (12)

where α and β are scaling constants. The joint posterior is now factorized into two individual posteriors p(t1 | z1) and p(t2 | z2), so the MAP estimation of the pair (t1, t2) reduces to two individual MAP estimation problems.
2.2.3 The MAP solution using locally estimated parameters

From Table 1, it is easy to see that the posterior p(t1 | z1) ∝ p(z1 | t1) p(t1) is also a Gaussian distribution. The MAP estimate of t1, denoted by q1, is the mean of this distribution:

q1 = [σs²(1 + ρs) / (σs²(1 + ρs) + σe²(1 + ρe))] z1.    (13)

In the same way, the MAP estimate of t2, denoted by q2, is given by the mean of p(t2 | z2):

q2 = [σs²(1 − ρs) / (σs²(1 − ρs) + σe²(1 − ρe))] z2.    (14)

In the actual implementation, we can avoid estimating all the parameters by rewriting equation (13) as

q1 = [σs1²[m] / (σs1²[m] + σe1²)] z1,    (15)

where the notation σs1²[m] = σs²(1 + ρs) indicates that this variance is estimated locally, and σe1² = σe²(1 + ρe) is the noise variance associated with the variable z1. To estimate σe1², we use the robust estimator

σe1 = median(|z1|)/0.6745.    (16)
For images, z1 is taken from the HH subband (after the WHT) of the first-level decomposition. A localized estimate of the signal variance is given by

σs1²[m] = E − σe1²  if E > σe1², and 0 otherwise,    (17)

where E = (1/N²) Σi z1²[m − i] is the average signal energy calculated over an N × N square window centred at location m (N² is the total number of pixels in the window). The same method is applied to calculate q2. Equation (15) therefore becomes

q1 = [σs1²[m] / (σs1²[m] + σe1²)] z1  if E > σe1², and 0 otherwise.    (18)
We can see that, by assuming a localized Gaussian model, the MAP principle leads to a combined shrinkage/hard-thresholding rule. The shrinkage part is an adaptive Wiener filter, while the hard-thresholding part plays an essential role in favouring a sparse solution. We note that this method of locally estimating the signal variance σs1²[m] and the signal q1 is essentially the same as that of the adaptive Wiener filter [9]. Our simulations show that the performance of the proposed algorithm in terms of PSNR is not very sensitive to moderate window-size settings such as 5 × 5 and 7 × 7. Although we could adaptively determine an optimal size based on certain statistical assumptions, we keep the size fixed for the whole image to reduce the computation time.
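A compact 1-D NumPy sketch of equations (16)–(18) follows. The window size and test signal are illustrative, and the median is taken over the input coefficients themselves rather than an image's HH subband as in the paper.

```python
import numpy as np

def map_shrink(z, window=7):
    """Local Wiener shrinkage with hard thresholding, cf. equations (16)-(18)."""
    # Robust noise-variance estimate, equation (16).
    sigma_e2 = (np.median(np.abs(z)) / 0.6745) ** 2
    # Local average energy E over a moving window, cf. equation (17).
    E = np.convolve(z ** 2, np.ones(window) / window, mode="same")
    sigma_s2 = np.maximum(E - sigma_e2, 0.0)        # local signal variance
    gain = np.where(E > sigma_e2, sigma_s2 / (sigma_s2 + sigma_e2), 0.0)
    return gain * z                                  # equation (18)

rng = np.random.default_rng(2)
clean = np.zeros(512)
clean[::64] = 5.0                                    # sparse "wavelet coefficients"
noisy = clean + rng.standard_normal(512)             # unit-variance noise
den = map_shrink(noisy)
print(np.mean((noisy - clean) ** 2), np.mean((den - clean) ** 2))
```

The rule passes large coefficients almost unchanged, shrinks moderate ones, and zeroes coefficients whose local energy falls below the noise floor.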
2.3 The proposed algorithm
2.3.1 The general algorithm

The proposed denoising algorithm, illustrated in Figure 1, involves the following steps.

Figure 1. The block diagram of the denoising algorithm.
(1) Perform a K-level wavelet transform of the noisy image x using the two wavelets. This produces two trees of wavelet coefficients, denoted (y1, y2).
(2) Decorrelate the two trees of wavelet coefficients using the Walsh-Hadamard transform: (z1, z2) = WHT(y1, y2).
(3) Perform the MAP estimation of (t1, t2), which results in (q1, q2).
(4) Perform the inverse WHT to obtain the denoised wavelet coefficients: (v1, v2) = IWHT(q1, q2).
(5) Perform the inverse wavelet transforms to obtain the two denoised images (s1, s2).
(6) Take the average of the two images (s1, s2) to obtain the final denoised image.
2.3.2 A special case
There is a special case in the proposed algorithm where the covariance matrices Λs and Λe are diagonal. The joint-posterior is factorized as
p(u1, u2 | y1, y2) ∝ p(y1 | u1) p(u1) p(y2 | u2) p(u2).
The WHT and IWHT are no longer needed; this corresponds to bypassing those two processing blocks in Figure 1. In this case, the proposed algorithm simply performs two individual MAP-based wavelet denoising processes and takes the average of the two denoised images as the final result. Equation (18) can still be applied to image denoising based on a single wavelet; in this case, z1 is the noisy wavelet coefficient and q1 is the estimated wavelet coefficient.
We note that in real applications the covariance matrix Λs is not diagonal, even though under certain conditions Λe is approximately diagonal. As such, the WHT is still useful for decorrelating the two trees of wavelet coefficients, and the general algorithm described in the previous section should be used. In Section 3.1, we demonstrate that the performance of the general algorithm is indeed better than that of this special case.
2.4 Discussion and extension
2.4.1 Model assumptions
We now briefly comment on the model assumptions which are required for the extension of MAP-based signal estimation from using one wavelet to using two wavelets. A major motivation is to derive an algorithm that is simple and mathematically tractable.
Although the wavelet coefficients of natural images generally follow a heavy-tailed distribution [5, 11–13], we assume that, from a localized point of view, the wavelet coefficients can be modelled as i.i.d. Gaussian with their own variances. This is justified in the sense that a mixture of Gaussian distributions can approximate a heavy-tailed distribution [14, 15]. The assumption is also motivated by the sparse nature of the wavelet coefficients [3, 16–18]. Indeed, we saw in Section 2.2 that these model assumptions naturally lead to a combination of shrinkage (adaptive Wiener filtering) and hard-thresholding of the noisy coefficients; both processing units are well established as effective signal processing tools. With the hard-thresholding unit, the algorithm clearly favours a sparse solution.
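This approximation is easy to illustrate: a Gaussian whose variance is itself random (a Gaussian scale mixture, here with an exponential variance prior as an illustrative stand-in for the locally estimated variance) has a kurtosis well above the Gaussian value of 3, i.e. heavy tails.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Gaussian scale mixture: draw a per-sample variance, then a Gaussian sample.
local_var = rng.exponential(1.0, n)                  # illustrative variance prior
gsm = rng.standard_normal(n) * np.sqrt(local_var)

kurt = lambda v: float(np.mean((v - v.mean()) ** 4) / np.var(v) ** 2)
# A plain Gaussian has kurtosis near 3; the mixture is markedly heavier-tailed.
print(kurt(rng.standard_normal(n)), kurt(gsm))
```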
In addition, as we adopt a model-based approach, the performance of the proposed algorithm depends on how well the model fits the signal and how accurately the parameters are estimated. The performance can be expected to improve further if a more accurate model is adopted; however, this usually comes at the cost of greatly increased computational complexity.
2.4.2 General covariance matrix

In the following discussion, we assume Λe is a diagonal matrix. We consider a general form of the covariance matrix for the signal:

Λs = [σ1²  ρs σ1 σ2; ρs σ1 σ2  σ2²].    (19)

With this covariance matrix, the two trees of wavelet coefficients (u1, u2) can be decorrelated in a similar way as with the WHT. The transform matrix is now given by

Q = [sin θ  cos θ; cos θ  −sin θ].    (20)

If the three parameters (ρs, σ1, σ2) are known, it can be shown that

θ = (1/2) tan⁻¹[2 ρs σ1 σ2 / (σ2² − σ1²)].    (21)

After the transformation, the MAP estimate for this general setting of the covariance can be obtained in exactly the same way as outlined in Section 2.2. In practical applications, it is not unusual that the three parameters (ρs, σ1, σ2) have to be estimated, which adds a substantial layer of complexity to the algorithm. In the proposed algorithm, on the other hand, it is assumed that σ2² = σ1², and the transform matrix reduces to the WHT matrix, for which θ = π/4. Thus, only one parameter (σs1²[m]) needs to be estimated in the proposed algorithm.
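The diagonalization can be checked numerically. The sketch below uses illustrative parameter values (and arctan2 rather than arctan, for quadrant safety) and also confirms that θ reduces to π/4, the WHT case, when σ1 = σ2.

```python
import numpy as np

sigma1, sigma2, rho_s = 1.5, 0.8, 0.6               # illustrative parameters
c = rho_s * sigma1 * sigma2
Lam_s = np.array([[sigma1**2, c], [c, sigma2**2]])  # equation (19)

# Equation (21); arctan2 keeps the correct quadrant when sigma2 < sigma1.
theta = 0.5 * np.arctan2(2.0 * c, sigma2**2 - sigma1**2)
Q = np.array([[np.sin(theta),  np.cos(theta)],
              [np.cos(theta), -np.sin(theta)]])     # equation (20)

D = Q @ Lam_s @ Q.T
print(np.round(D, 8))                               # off-diagonal entries vanish

# Equal variances: theta reduces to pi/4 and Q becomes the WHT matrix.
theta_eq = 0.5 * np.arctan2(2.0 * rho_s, 0.0)
print(theta_eq, np.pi / 4)
```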
2.4.3 Dealing with more than two wavelets

In principle, the algorithmic development in Section 2.2 can deal with more than two wavelets. For example, if Λs is known, then it is straightforward to find an orthogonal transform matrix Q such that Q Λs⁻¹ Qᵀ is a diagonal matrix. As such, the joint estimation problem reduces to a number of individual MAP estimation problems. If Λs is not known, the problem can be treated either with an empirical Bayes approach [19], where Λs is estimated from the noisy wavelet coefficients and then used in subsequent processing, or with a full Bayesian approach, where the elements of Λs are regarded as hyper-parameters and an iterative algorithm is required [3, 20]. Both approaches add substantial complexity to the algorithm. In this paper, we consider a simple extension of the proposed algorithm: if we want to use three wavelets, we group them into three pairs. Running the proposed algorithm on each pair yields three denoised images, whose average we take as the final output. Concretely, we first calculate the three wavelet transforms, then go through steps 2 to 6 of the proposed algorithm three times, each time using two trees of wavelet coefficients as the input. The computation time is about three times that required for the proposed algorithm using two wavelets.
3 Application in image denoising
Simulation experiments are performed using Matlab. The noisy image is generated by x = s + σe randn(size(s)), where σe² is the variance of the added noise. The performance of the denoising algorithm is measured by the peak signal-to-noise ratio (PSNR). To smooth out random fluctuations in the results, each experiment is run 100 times and the average PSNR is reported.
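For reference, the PSNR figure used throughout this section, for 8-bit images with peak value 255, can be sketched as follows (the random test image is a stand-in):

```python
import numpy as np

def psnr(clean, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between a clean and an estimated image."""
    mse = np.mean((np.asarray(clean, float) - np.asarray(estimate, float)) ** 2)
    return 10.0 * np.log10(peak**2 / mse)

rng = np.random.default_rng(5)
img = rng.integers(0, 256, (64, 64)).astype(float)    # stand-in "image"
noisy = img + 25.0 * rng.standard_normal(img.shape)   # sigma_e = 25, as in the tables
print(round(psnr(img, noisy), 2))                     # roughly 20 dB before denoising
```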
3.1 Using the (sym12,coif4) wavelet pair
The aim of this experiment is to investigate the performance of the proposed algorithm and compare it with the special case described in Section 2.3.2. We consider the wavelet pair (sym12, coif4). In our experiments, the resulting images are indicated by subscripts. Referring to Figure 1, s1 and s2 are the denoised images using the two wavelets sym12 and coif4, respectively, and s12 is the average of s1 and s2. We use the Barbara image to study the performance of the denoising algorithm under different noise levels. Table 2 shows the PSNR (dB) results. We can see that, as a result of using two wavelets and the final averaging step, the PSNR of s12 is about 0.4–0.5 dB higher than those of s1 and s2. Next, we consider the special case which takes the average of two individually MAP-denoised images, denoted b1 and b2, as the final result, denoted b12 (see Section 2.3.2 for a discussion of this special case). The results for b1, b2 and b12 were obtained by running the proposed algorithm with the WHT and IWHT bypassed. We can see from Table 2 that the PSNR of b12 is also about 0.4–0.5 dB higher than those of b1 and b2. The improvement in both cases confirms that the remaining noise in the two images (s1 and s2, b1 and b2) is to some degree uncorrelated and is thus reduced by averaging.

Table 2. Denoising results (dB) using wavelets sym12 and coif4 with the Barbara image under different noise levels.

σe    s1      s2      s12     b1      b2      b12
10    32.86   32.71   33.39   32.72   32.64   33.19
15    30.66   30.52   31.21   30.52   30.43   30.96
20    29.12   29.00   29.67   28.97   28.88   29.38
25    27.95   27.85   28.49   27.81   27.72   28.18
30    27.02   26.93   27.55   26.88   26.80   27.22
35    26.26   26.18   26.77   26.11   26.04   26.43
40    25.62   25.53   26.10   25.48   25.41   25.77

Comparing the results of s12 with those of b12, we can also see that the proposed algorithm, with joint denoising of the wavelet coefficients, performs about 0.2–0.5 dB better than treating the two trees of wavelet coefficients individually. This demonstrates the effectiveness of the joint-Gaussian model for the two trees of wavelet coefficients. Figure 2 shows typical image denoising results, with the noise level of the noisy image set to σe = 25. The two wavelets (sym12, coif4), whose filters have the same length, have similar properties: they are nearly symmetrical and have the highest number of vanishing moments for the given support width. These two wavelets were chosen because they provide better performance than some other wavelets, such as the Daubechies wavelets. However, the experimental results should be interpreted as an indication of the performance of the proposed algorithm with a particular pair of wavelets; other combinations of wavelets may lead to better results. Indeed, choosing a wavelet for an image denoising or compression application [21] is an important issue which is beyond the scope of this paper.

Figure 2. Results of the proposed joint-denoising algorithm (left column) and its special case (right column). The variance of the noisy image is σe = 25. Typical PSNR values for these images are shown in Table 2.
3.2 Using a pair of wavelets that originate from the same wavelet

In [6], excellent denoising results were obtained by creating an over-complete representation using one wavelet on the original image and a shifted version. An interesting question arises: what is the performance of the proposed algorithm if the two wavelet transforms originate from the same wavelet? For orthogonal wavelets, we can "create" another wavelet by either double-shifting or reversing the scaling filter. In the actual implementation, the former corresponds to double-shifting all filters¹, while the latter corresponds to using the reconstruction filters for decomposition and the decomposition filters for reconstruction. We have conducted experiments using the sym12 and coif4 wavelets; we call their double-shifted versions sym12d and coif4d, and their reversed versions sym12r and coif4r. The results from using five pairs of wavelets are shown in Table 3. We can see that the differences in PSNR become smaller as the noise level grows. We can also see that the results for each individual wavelet pair are very similar to those presented in Table 2. The general observation (stated in the previous subsection) about the performance of the proposed algorithm remains valid for a pair of wavelets originating from one wavelet.

Table 3. A comparison of the performance of the proposed denoising algorithm using five pairs of wavelets. Each column presents results from one pair: A = (sym12, coif4), B = (sym12, sym12d), C = (sym12, sym12r), D = (coif4, coif4d) and E = (coif4, coif4r).

σe    A       B       C       D       E
10    33.39   33.24   33.45   33.17   33.10
15    31.21   31.08   31.23   31.01   30.88
20    29.67   29.56   29.65   29.49   29.33
25    28.49   28.39   28.45   28.33   28.15
30    27.55   27.45   27.48   27.39   27.22
35    26.77   26.66   26.66   26.61   26.46
40    26.10   25.99   25.96   25.94   25.81

¹ For images, we note that the wavelet coefficients from the original wavelet and its double-shifted version are not related by a simple shift. However, through a somewhat elaborate data structure, certain savings in computation can indeed be achieved. For example, let the lowpass decomposition filter and the image signal in the z-transform domain be H(z) and S(z1, z2), respectively. Then the LL subbands are given by S1(z1, z2) = (↓2)H(z2)[(↓2)H(z1)S(z1, z2)] and S2(z1, z2) = (↓2)z2⁻²H(z2)[(↓2)z1⁻²H(z1)S(z1, z2)] = z2⁻¹(↓2)H(z2)[z1⁻¹(↓2)H(z1)S(z1, z2)].
3.3 Using three wavelets

We test an extension of the proposed algorithm to three wavelets, as discussed in Section 2.4.3. The three wavelets used in our experiments are sym12, coif4 and sym12d. The results are shown in the last column of Table 4. Comparing these results with those in Table 3, we can see that the PSNR has improved by about 0.2–0.3 dB as a result of using three wavelets and the added averaging step. We have also tested another group of wavelets, sym12, coif4 and sym12r; the PSNR results are about 0.1 dB lower than those using sym12, coif4 and sym12d.

Table 4. A comparison of denoising results (dB) using different algorithms. Two test images are used under different noise levels.

Lena
σe       [22]    [23]    [24]    [8]     [6]     [5]     [4]     Proposed
10       33.32   33.84   34.10   33.94   34.96   35.61   35.34   35.17
15       31.41   31.76   32.23   32.06   33.05   33.90   33.67   33.34
20       30.17   30.39   30.89   30.73   31.72   32.66   32.40   31.99
25       29.22   29.24   29.89   29.81   30.64   31.69   31.40   30.92
30       28.48   28.35   29.05   28.94   -       -       30.54   30.05

Barbara
σe       [22]    [23]    [24]    [8]     [6]     [5]     [4]     Proposed
10       30.86   31.36   31.99   31.13   33.35   34.03   33.35   33.67
15       28.51   29.23   29.60   28.71   31.10   31.86   31.31   31.51
20       27.13   27.80   27.94   27.25   29.44   30.32   29.80   29.98
25       26.01   25.99   26.75   25.97   28.23   29.13   28.61   28.80
30       25.16   25.11   25.80   25.21   -       -       27.65   27.85
Next, we compare our results to other published results. Since a very comprehensive comparison of the performance of the complex wavelet transform with bivariate shrinkage against those of other state-of-the-art algorithms [5, 6, 8, 22–24] has been reported in Tables I and II of [4], we reproduce most of those tables in Table 4 of this paper for easy reference. As we can see from this table, the results of our algorithm are comparable to those of the state-of-the-art algorithms.
Figure 3. Simulation results using the Lena image corrupted with zero-mean Gaussian noise with variance σe = 25. For easy comparison, only the face part of the image is shown. These images are presented in the order from left to right and from top to bottom: the original, noisy, the proposed algorithm, Li’s algorithm, Sendur’s algorithm and Portilla’s algorithm. Typical PSNR values for these images can be found in Table 4.
Figure 4. Simulation results using the Barbara image corrupted with zero-mean Gaussian noise with variance σe = 25. For easy comparison, only the face part of the image is shown. These images are presented in the order from left to right and from top to bottom: the original, noisy, the proposed algorithm, Li’s algorithm, Sendur’s algorithm and Portilla’s algorithm. Typical PSNR values for these images can be found in Table 4.
In Figures 3 and 4 we compare the denoised images produced by the algorithms² in [4–6] and by the proposed algorithm using three wavelets. Again, we can see that the denoised images are quite similar. We note that the computation times of these algorithms are quite different in our simulations, which use a PC with a Pentium-4 3 GHz processor. While the running time for the algorithm in [5] is more than 75 seconds, those for the proposed algorithm using two and three wavelets are about 2.7 and 7.6 seconds, respectively. The running time for the algorithm in [4] is about 3.6 seconds, and that for the algorithm in [6] is about 7.2 seconds. We should point out that the source codes are experimental and can be further optimized for faster implementation; as such, the running times can only be interpreted as rough estimates of the relative complexity of the various algorithms.
4 Conclusion
In this paper, we have studied signal estimation using multiple wavelet representations and Gaussian observation models. The proposed algorithm has two major steps: a joint-optimum estimation of the wavelet coefficients based on the MAP principle and an averaging of the denoised images. Each step contributes to improvements in denoising performance. We develop an algorithm for two wavelets and propose a simple extension of the algorithm to deal with more than two wavelets. The proposed algorithm, owing to the Gaussian assumption, is simple and easy to implement. We have tested the algorithm using a wavelet pair (sym12, coif4) whose members have similar properties. We have also tested the algorithm by forming a wavelet pair from a known wavelet and its double-shifted or reversed version. We then studied averaging the results of running the proposed algorithm on several different wavelet pairs. Results comparable to those of the state-of-the-art algorithms have been achieved. The performance of the proposed algorithm could be further enhanced by using a full Bayesian model averaging approach [25] to combine the denoised images. It would also be interesting to study how to select a pair of wavelets for a particular class of signals [26]. A further extension is to study an image restoration problem involving both de-blurring and denoising using overcomplete representations.

² Prof. Li kindly provided us with his source code. Matlab codes for the algorithms in [4, 5] are available from the internet addresses http://taco.poly.edu/WaveletSoftware and http://decsai.ugr.es/~javier/denoise/software, respectively. Default settings for the algorithms in [4, 6] are used, and the suggested settings for reproducing the results in [5] are also used.
Acknowledgements
This research is supported by an Expertise Grant, The Victoria Partnership for Advanced Computing (VPAC), Australia.
References

[1] M. Elad and A. M. Bruckstein, "A generalized uncertainty principle and sparse representation in pairs of bases," IEEE Trans. Information Theory, vol. 48, pp. 2558–2567, Sept. 2002.
[2] D. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization," Proc. National Academy of Sciences, vol. 100, pp. 2197–2202, 2003.
[3] M. A. T. Figueiredo, "Adaptive sparseness for supervised learning," vol. 25, pp. 1150–1159, Sept. 2003.
[4] L. Sendur and I. W. Selesnick, "Bivariate shrinkage with local variance estimation," IEEE Signal Processing Letters, vol. 9, no. 12, pp. 438–441, Dec. 2002.
[5] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, "Image denoising using scale mixtures of Gaussians in the wavelet domain," IEEE Trans. Image Processing, vol. 12, pp. 1338–1351, Nov. 2003.
[6] X. Li and M. T. Orchard, "Spatially adaptive image denoising under overcomplete expansion," in Proc. IEEE International Conference on Image Processing, vol. 3, Sept. 2000, pp. 300–303.
[7] P. Ishwar and P. Moulin, "Multiple-domain image modeling and restoration," in Proc. IEEE International Conference on Image Processing, vol. 2, Santa Barbara, California, USA, 1999, pp. 286–289.
[8] L. Sendur and I. W. Selesnick, "Bivariate shrinkage functions for wavelet-based denoising exploiting interscale dependency," IEEE Trans. Signal Processing, vol. 50, pp. 2744–2756, Nov. 2002.
[9] J. S. Lee, "Digital image enhancement and noise filtering by use of local statistics," vol. 2, pp. 165–168, Mar. 1980.
[10] D. J. C. MacKay, "Bayesian non-linear modelling for the energy prediction competition," ASHRAE Trans., vol. 100, pp. 1053–1062, 1994.
[11] P. Moulin and J. Liu, "Analysis of multiresolution image denoising schemes using generalized Gaussian and complexity priors," vol. 45, pp. 909–919, Apr. 1999.
[12] A. Antoniadis and J. Q. Fan, "Regularization of wavelet approximations," Journal of the American Statistical Association, vol. 96, pp. 939–955, Sept. 2001.
[13] J. Dias, "Bayesian wavelet-based image deconvolution: a GEM algorithm exploiting a class of heavy-tailed priors," IEEE Trans. Image Processing, 2005, to appear.
[14] B. W. Silverman, Density Estimation for Statistics and Data Analysis. London: Chapman and Hall, 1986.
[15] B. A. Olshausen and K. J. Millman, "Learning sparse codes with a mixture-of-Gaussians prior," in Advances in Neural Information Processing Systems, vol. 12. MIT Press, 2002, pp. 841–847.
[16] B. A. Olshausen and D. J. Field, "Emergence of simple-cell receptive field properties by learning a sparse code for natural images," Nature, vol. 381, pp. 607–609, 1996.
[17] A. Hyvarinen, P. Hoyer, and E. Oja, "Image denoising by sparse code shrinkage," in Intelligent Signal Processing, S. Haykin and B. Kosko, Eds. IEEE Press, 2000.
[18] E. Simoncelli and B. A. Olshausen, "Natural image statistics and neural representation," Annual Review of Neuroscience, vol. 24, pp. 1193–1216, 2001.
[19] A. Gelman, H. B. Carlin, H. S. Stern, and D. B. Rubin, Bayesian Data Analysis. Boca Raton, Florida, USA: Chapman & Hall/CRC, 2004.
[20] D. J. C. MacKay, "Bayesian interpolation," Neural Computation, vol. 4, no. 3, pp. 415–447, 1992.
[21] J. D. Villasenor, B. Belzer, and J. Y. Liao, "Wavelet filter evaluation for image compression," IEEE Trans. Image Processing, vol. 4, no. 8, pp. 1053–1060, Aug. 1995.
[22] S. G. Chang, B. Yu, and M. Vetterli, "Adaptive wavelet thresholding for image denoising and compression," vol. 9, pp. 1532–1546, Sept. 2000.
[23] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, "Wavelet-based signal processing using hidden Markov models," IEEE Trans. Signal Processing, vol. 46, pp. 886–902, 1998.
[24] M. K. Mihcak, I. Kozintsev, K. Ramchandran, and P. Moulin, "Low-complexity image denoising based on statistical modelling of wavelet coefficients," IEEE Signal Processing Letters, vol. 6, pp. 300–303, Dec. 1999.
[25] A. E. Raftery, D. Madigan, and J. A. Hoeting, "Bayesian model averaging for linear regression models," Journal of the American Statistical Association, vol. 92, pp. 179–191, 1997.
[26] G. Deng, "Signal estimation using multiple-wavelet representations and Gaussian models," in Proc. IEEE International Conference on Image Processing, CD-ROM, Genoa, Italy, 2005.