IEEE SIGNAL PROCESSING LETTERS, VOL. 18, NO. 4, APRIL 2011
Adaptive Denoising by Singular Value Decomposition

Yanmin He, Tao Gan, Wufan Chen, and Houjun Wang
Abstract—This letter presents an adaptive denoising method based on the singular value decomposition (SVD). By incorporating a global subspace analysis into the scheme of local basis selection, the problems of previous adaptive methods are effectively tackled. Experimental results show that the proposed method achieves outstanding preservation of image details, and at high noise levels it provides improvements in both objective and subjective quality of the denoised image when compared to state-of-the-art methods.

Index Terms—Basis selection, image denoising, singular value decomposition.
Manuscript received December 09, 2010; revised January 13, 2011; accepted January 19, 2011. Date of publication January 28, 2011; date of current version February 10, 2011. This work was supported by the National Basic Research Program of China under Grant 2010CB732501. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Ce Zhu. Y. He, W. Chen, and H. Wang are with the School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China (e-mail: [email protected]; [email protected]; [email protected]). T. Gan is with the School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China (e-mail: [email protected]). This letter contains downloadable supplemental material that can be accessed at http://ieeexplore.ieee.org. The size of the files is 10.2 MB. Digital Object Identifier 10.1109/LSP.2011.2109039

I. INTRODUCTION

Image denoising remains an important research problem in image processing. Over the past few decades, many denoising methods have been proposed. Non-local means (NLM) is a recent method [1] that has received much attention, and many schemes have been proposed to improve its performance, e.g., the combination of NLM with Stein's unbiased risk estimate (SURE) [2], the principal neighborhood dictionary (PND), which uses principal component analysis (PCA) in the neighborhood weight calculation [3], and various NLM speed-up schemes [4]. The most evolved version of the NLM framework is probably BM3D [5], which proves to be very effective and is among the best state-of-the-art denoising methods.

Another line of work is driven by finding sparse representations of signals. With the growing realization that a fixed basis such as the wavelet basis is inappropriate for handling all images, recent efforts have been directed towards building globally or locally adaptive bases for sparse image representation. Under the assumption that each image patch can be sparsely represented, Elad et al. [6] proposed the K-SVD algorithm, which learns a global dictionary for denoising grayscale images corrupted by additive white Gaussian noise. In parallel, Muresan et al. [7] proposed a spatially adaptive scheme that uses the PCA of image neighborhoods for noise removal. More recently, Zhang et al. [8] proposed the LPG-PCA algorithm, in
which the pixels with similar local spatial structures are grouped as training samples before the PCA transformation.

The above global and local adaptive methods have their own advantages and disadvantages. With an appropriate scale, a global dictionary can provide efficient sparse representations of global features. However, due to the limited number of dictionary atoms, image details such as edges or textures may not be accurately approximated. In contrast, methods with a locally adaptive basis or dictionary have the potential to capture local structures. Yet they have difficulty finding suitable training samples as well as choosing the optimal principal components for denoising. Consequently, locally adaptive methods face a much higher risk of overfitting than global methods do. Quite often, artifacts appear in smooth regions of the denoised image, which severely degrades the visual quality [3]. For these reasons, the denoising performance of global and local adaptive methods is inferior to that of BM3D.

In this letter, we propose a new adaptive method that tackles the problems of previous ones. While most previous research focuses on either local or global basis selection, our method combines the strengths of both approaches. It consists of three steps. In the first step, a global subspace analysis is carried out by performing a singular value decomposition (SVD) on the data matrix of image patches. The signal subspace and the noise subspace are then chosen automatically. In the second step, similar patches are grouped as training samples using distances in the signal subspace. An SVD is then performed on each group, and the leading singular vectors are selected as candidate vectors for representing the corresponding patch. In the final step, the local basis for each patch is obtained by excluding the candidates learned from noise-dominated samples as well as those belonging to the noise subspace.
The main contribution of this letter is to show that with a good basis selection strategy, an adaptive denoising method can outperform leading NLM methods such as BM3D. Experimental results demonstrate that the proposed adaptive SVD-based (ASVD) method achieves outstanding preservation of image details, and at high noise levels it provides improvements in both objective and subjective quality of the denoised image when compared to state-of-the-art methods.

II. ASVD DENOISING ALGORITHM

Suppose a clean image is contaminated by additive zero-mean white Gaussian noise with standard deviation $\sigma$. We divide the image into overlapping patches of a fixed size and model each patch as a vector variable. Denoising is then performed on the patch vectors one by one, and the whole image is restored by averaging over all denoised patches. We assume that natural images place most of their patches in a lower-dimensional manifold. Thus, our objective is to find a suitable basis for each patch, and we propose a three-step algorithm to
this end. In the following sections, we describe each step in detail.

A. Step 1: Global Subspace Analysis

For noise reduction, one essential problem is to find an effective discrimination between noise and signals. In our treatment, two subspaces are identified within the space of image patches: the signal subspace, which is associated with clean signals, and the noise subspace, which is dominated by noise. To perform the task of subspace division, the SVD is used for its efficiency.

Suppose the patch size is $n \times n$ and there are $N$ patches in total in the image. We define $X \in \mathbb{R}^{m \times N}$ (with $m = n^2$) as a data matrix in which each column represents one patch vector, and compute the SVD of $X$ as $X = U S V^T$, where the matrices $U$ and $V$ have orthonormal columns and $S$ is a diagonal matrix with diagonal elements $s_1 \ge s_2 \ge \cdots \ge s_m$. The values $s_i$ are called singular values and are sorted in decreasing order. The columns of $U$, denoted as singular vectors $u_1, \ldots, u_m$, form a basis for the space $\mathbb{R}^m$.

The remarkable property of the SVD is its statistical representation of $X$ in subspaces of decreasing importance. More precisely, one can reconstruct the closest rank-$k$ approximation to $X$ by retaining only the first $k$ singular values. Based on this fact, we determine the signal and the noise subspaces as follows. For the signal subspace, we estimate its dimension $k_1$ by solving the constrained optimization problem

$$k_1 = \min \Big\{ k : \frac{1}{mN}\sum_{i=k+1}^{m} s_i^2 \le \sigma^2 \Big\} \quad (1)$$

The basic idea behind this approach is that if we choose the right dimension for the subspace, the average squared residual between the noisy patches and the denoised patches should be very close to the noise variance. Note that in the SVD framework, the left-hand side of the constraint in (1) equals exactly this average squared residual. After $k_1$ is estimated, we obtain a set of singular vectors $\{u_1, \ldots, u_{k_1}\}$ which form a basis of the signal subspace. As we will see in Section II-B, this subspace can substitute for the full space in finding the local training samples.
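Step 1 can be sketched in a few lines of NumPy. The sketch below builds the patch data matrix, computes its SVD, and picks the signal dimension as the smallest rank whose average squared residual per pixel falls below the noise variance, following the reading of (1) above. This is our own minimal sketch, not the authors' Matlab code; all function and variable names are ours.

```python
import numpy as np

def patch_matrix(img, n):
    """Stack all overlapping n-by-n patches as columns of X (m x N, m = n^2)."""
    H, W = img.shape
    cols = [img[i:i + n, j:j + n].ravel()
            for i in range(H - n + 1)
            for j in range(W - n + 1)]
    return np.stack(cols, axis=1)

def global_subspace(X, sigma):
    """Return (U, s, k1): global basis, singular values, and the signal
    dimension chosen per constraint (1): the smallest k whose average
    squared residual per pixel is at most sigma^2."""
    m, N = X.shape
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    # tail[k] = sum of s_i^2 over the discarded components i > k
    tail = np.concatenate([np.cumsum((s ** 2)[::-1])[::-1], [0.0]])
    k1 = int(np.argmax(tail / (m * N) <= sigma ** 2))
    return U, s, k1
```

For a (numerically) rank-one data matrix with a small noise level, the procedure returns a signal dimension of one, as expected.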
For the noise subspace, we estimate the dimension $k_2$ of its complementary subspace as

$$k_2 = \max \{ k : s_k > \lambda \sigma \sqrt{N} \} \quad (2)$$

where $\lambda$ is a parameter. Then we obtain a set of singular vectors $\{u_{k_2+1}, \ldots, u_m\}$ which form a basis of the noise subspace. As will be shown in Section II-C, this subspace is used to exclude noise elements from the local basis. The parameter $\lambda$ thus controls the trade-off between noise reduction and estimation fidelity, and we choose

$$\lambda = 1 + c\,\sigma \quad (3)$$

where $c$ is a preset constant. It can be seen from (3) that larger values are set for higher noise cases. In doing so, more noise elements can be excluded so as to reduce the risk of overfitting. Note that the $\sigma$ used above is estimated by the robust median estimator [9] as

$$\hat\sigma = \frac{\mathrm{Median}(|c_{HH}|)}{0.6745} \quad (4)$$

where $c_{HH}$ are the finest-scale diagonal wavelet coefficients of the noisy image.
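The robust median estimator (4) needs only the finest-scale diagonal wavelet coefficients of the noisy image. A minimal NumPy sketch using a one-level orthonormal Haar HH subband (our own implementation, not the paper's code):

```python
import numpy as np

def estimate_sigma(img):
    """Estimate the noise standard deviation with the robust median
    estimator of Donoho and Johnstone [9]: the median absolute value of
    the finest-scale diagonal (HH) Haar coefficients divided by 0.6745."""
    x = img[:img.shape[0] // 2 * 2, :img.shape[1] // 2 * 2]  # crop to even size
    # one-level orthonormal Haar HH subband
    hh = (x[0::2, 0::2] - x[0::2, 1::2] - x[1::2, 0::2] + x[1::2, 1::2]) / 2.0
    return float(np.median(np.abs(hh)) / 0.6745)
```

For pure Gaussian noise of standard deviation 10, the estimate is close to 10 even on a modest 256 × 256 image.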
B. Step 2: Local Basis Training

In this step, we intend to learn the local basis of each patch. To do this, we first find the training samples for each patch through a patch grouping procedure and perform an SVD on the sample data matrix. After that, the leading singular vectors are selected as candidate vectors for representing the corresponding patch.

It is well known that images often contain self-similarity, and this property has been exploited in modern denoising methods, leading to very impressive results [3], [5]. Following the same idea, we find the training samples of a local patch by selecting a group of similar patches in its neighborhood. While the grouping can be realized by various techniques, we employ the block matching technique for its simplicity. In particular, in the patch grouping process, distances are computed in the signal subspace previously obtained rather than in the full space.

Let $x$ be the patch to be denoised. We look for similar patches within the search window centered on $x$. The block distance between $x_i$ and $x_j$ is defined as

$$d(x_i, x_j) = \| \tilde{x}_i - \tilde{x}_j \|^2 \quad (5)$$

where $\tilde{x} = U_1 U_1^T x$ is the projection of $x$ onto the signal subspace spanned by $U_1 = [u_1, \ldots, u_{k_1}]$. Using the above similarity metric, we select the patches whose distance from $x$ is smaller than a threshold $T$, which, as in [8], is set from the maximum distance for which two patches are considered similar. Suppose we select $n_s$ sample patches in total. To guarantee that there are enough samples for the local subspace analysis, $n_s$ cannot be too small. In practice, we use $n_t$ samples with $n_{\min} \le n_t \le n_{\max}$; that is to say, if $n_s < n_{\min}$ (or $n_s > n_{\max}$), we use the best $n_{\min}$ (or $n_{\max}$) matched samples in training.

After the patch grouping, we organize each group into a local training data matrix $X_L$ whose columns represent the training samples. We view these samples as i.i.d. observations of the clean patch contaminated by additive zero-mean white Gaussian noise with standard deviation $\sigma$. Now the problem of basis learning boils down to the rank estimation of $X_L$. In the present work, we use the rank estimation method proposed in [10]. In this method, $\sigma$ is first estimated using matrix perturbation theory, and the rank estimation is then carried out through a sequence of hypothesis tests based on recent results in random matrix theory (RMT) regarding the behavior of noise eigenvalues (see [10] for more details). Once the rank is estimated, the candidate vectors of the local basis are obtained by taking that number of leading singular vectors of $X_L$.

C. Step 3: Local Basis Screening

The learning method proposed in the previous section may substantially overestimate the number of basis vectors or patterns. The incorrect inclusion of these patterns leads to noticeable degradation in the visual quality of the restored image. The main reason for this deficiency is that the rank estimator we use assumes the samples are i.i.d. observations of the signal to be detected [10]. However, this does not hold in our case: the samples obtained by patch grouping may be quite different from each other, which adversely affects the estimation accuracy. The performance of the rank estimator
is also limited by the finite number of samples and the signal dimension. In the following, we present two methods to remedy the problem.

The first method, called null-basis approximation (NA), considers the restoration of patches that have no strong edge structures. For these smooth patches, the signal-to-noise ratio is usually low and it is difficult to find the correct samples for training the local basis. Therefore, for each patch $x$, before performing Step 2, we first check the average energy of its group with respect to the noise variance as

$$\frac{1}{m n_t} \| X_L - \bar{x}\,\mathbf{1}^T \|_F^2 \le \hat\sigma_L^2 \quad (6)$$

where $\bar{x}$ is the mean of the group. If (6) holds, we simply skip Step 2 and use $\bar{x}$ as the estimate of $x$. Since neighboring patches (with only one row difference) are very similar, in (6) we use the local noise estimate $\hat\sigma_L$ of the previous patch. The $\hat\sigma_L$ of each patch is updated as follows. If the patch is a normal one, i.e., (6) does not hold, its $\hat\sigma_L$ is calculated by the rank estimator in Step 2; otherwise, we set
TABLE I PSNR RESULTS FOR BARBARA IMAGE DENOISED WITH VARIOUS ALGORITHMS. THE BEST RESULTS ARE IN BOLD
$$\hat\sigma_L = \rho\, \hat\sigma_L' \quad (7)$$

where $\rho > 1$ is a constant and $\hat\sigma_L'$ is the estimate carried over from the previous patch. Since natural images contain large portions of homogeneous regions, if the current patch is smooth, it is very likely that its neighboring patches are also smooth. Based on this consideration, we introduce the parameter $\rho$ to slightly overestimate the $\hat\sigma_L$ that will be used for checking the next patch, so as to reduce the risk of overfitting.

Besides NA, we propose another method, called basis pruning (BP), to further improve the accuracy of basis selection. Recall that we have determined the global noise subspace in Step 1. Armed with this information, we can remove the candidate vectors that look more like noise than signal. Specifically, for each candidate vector $v$, we calculate its inner products with the global basis vectors $u_1, \ldots, u_m$ and find the global basis vector that maximizes the inner product. If this vector belongs to the noise subspace, we regard the current $v$ as learned from noise and exclude it from the final local basis.

D. Discussion

The novelty of this work is the way we select the local basis for denoising. In comparison with previous methods, the proposed ASVD algorithm has three main features that account for the performance improvement. 1) ASVD performs training sample selection in the signal subspace, while other methods like LPG-PCA perform this task in the full space; this makes the ASVD algorithm more robust to additive noise [3]. 2) Instead of using the popular PCA, ASVD employs the SVD to transform the sample dataset. Working on the column covariance matrix, PCA is almost identical to the SVD except that it requires the mean of the samples to be subtracted first. However, the estimated mean may deviate significantly from the true value due to the existence of unmatched samples, so the result of the PCA transform may not be as accurate as that of the SVD transform. (To reduce the computational complexity of calculating the SVD, ASVD applies the same eigen-decomposition procedure as used in PCA.) 3) Finally, in the transformed domain ASVD uses the latest RMT-based rank estimation technique followed by the basis screening methods to select the signal basis, while most previous methods apply coefficient shrinkage to attenuate the noise; for example, LPG-PCA employs linear minimum mean square-error estimation (LMMSE) in the PCA domain.

We refer to the part of our method corresponding to each feature above as a feature module. The three feature modules enhance the algorithm's capability of preserving local image details. As a consequence, ASVD needs to perform only one denoising stage, while other methods like LPG-PCA and BM3D resort to two stages to achieve good denoising performance.
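Step 3 can be sketched as two small routines: the NA test skips basis training for smooth groups, and BP drops candidates whose best-matching global singular vector lies in the noise subspace. This is a minimal sketch under our reading of (6) and the pruning rule; the function names and the exact form of the smoothness test are our own assumptions, not the paper's code.

```python
import numpy as np

def na_test(XL, sigma_local):
    """Null-basis approximation (NA): if the mean-removed average energy
    of the group does not exceed the local noise variance, treat the
    patch as smooth and return the group mean as its estimate."""
    mean = XL.mean(axis=1, keepdims=True)
    smooth = np.mean((XL - mean) ** 2) <= sigma_local ** 2
    return smooth, mean.ravel()

def prune_basis(cands, U, k2):
    """Basis pruning (BP): keep a candidate vector only if the global
    basis vector maximizing the inner product lies in the signal part
    (index < k2); otherwise it is regarded as learned from noise."""
    kept = [v for v in cands.T if np.argmax(np.abs(U.T @ v)) < k2]
    return np.stack(kept, axis=1) if kept else np.empty((U.shape[0], 0))
```

With the identity as the global basis and k2 = 2, a candidate aligned with the first axis survives pruning while one aligned with the third axis is discarded.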
III. RESULTS

The performance of ASVD is evaluated on four 512 × 512 grayscale images: Lena, Barbara, Goldhill, and Boat.¹ The noisy images are produced by adding zero-mean white Gaussian noise with standard deviation $\sigma$. Throughout this study, the size of the image patch is set to 11 × 11 and the local search window is fixed at 39 × 39; the values of the remaining parameters are fixed throughout.

A. Evaluation of Feature Modules

We start with the evaluation of the three feature modules of ASVD. To study their individual contributions to the overall performance, we devise the following three algorithms for testing the modules respectively: 1) the ASVD algorithm with the signal subspace replaced by the full space in module 1 (A1_FS); 2) the ASVD algorithm with the SVD replaced by PCA in module 2 (A2_PCA); 3) the ASVD algorithm with the basis selection scheme replaced by an LMMSE filter in module 3 (A3_LMMSE).

Table I lists the output PSNR results for the Barbara image denoised with the above algorithms. The result of the LPG-PCA algorithm with only one denoising stage is also provided for comparison. Compared to the original ASVD, A3_LMMSE shows the largest PSNR drop among the three modified algorithms. This indicates that the highest PSNR gain is achieved by replacing the LMMSE filter with our basis selection scheme. Furthermore, the scheme of subspace sample selection in module 1 also contributes to the performance gain, and the gain grows as the noise level increases. Finally, we can also see that replacing PCA by the SVD in the transform brings improvements at high noise levels.

B. Performance Comparison

We compare ASVD with LPG-PCA [8] and BM3D [5]. Table II shows the PSNR and SSIM results of these three algorithms. We can see that ASVD shows superior performance

¹Matlab code of our algorithm is available at http://ieeexplore.ieee.org.
TABLE II PSNR AND SSIM RESULTS OF LPG-PCA, BM3D, AND ASVD ALGORITHMS. THE BEST RESULTS ARE IN BOLD
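The PSNR figures reported in Tables I and II follow the standard definition for 8-bit images; a minimal reference implementation (ours, for checking, not from the paper):

```python
import numpy as np

def psnr(clean, denoised, peak=255.0):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    mse = np.mean((np.asarray(clean, float) - np.asarray(denoised, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```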
Fig. 1. Denoising of Lena corrupted by noise with $\sigma = 25$.
over LPG-PCA, except at the lowest noise level, and the denoising performances of ASVD and BM3D are comparable. In particular, at high noise levels, ASVD provides the best results among the three algorithms. As shown in [3] and [8], local adaptive methods perform well in capturing structured patterns. Not surprisingly, ASVD shows outstanding performance for images with rich textures. For example, for the Barbara image, ASVD outperforms the other two methods, and on average it is superior to LPG-PCA by 1.18 dB and to BM3D by 0.32 dB.

Fig. 1 shows the denoising results of the Lena image obtained by the BM3D and ASVD algorithms. We can see that ASVD provides better visual quality: some edges and textures that are blurred or lost by BM3D are better preserved by ASVD. The visual comparison is further illustrated in Fig. 2. We observe that the superiority of ASVD becomes more evident at a higher noise level. This demonstrates the effectiveness of ASVD in preserving image details such as edges and textures.

IV. CONCLUSION

This work has presented a framework for denoising by learning a suitable basis to represent each image patch. By exploiting the merits of global subspace analysis, the problems of previous methods are effectively tackled. The experimental results demonstrate that the proposed method achieves state-of-the-art denoising performance in terms of both PSNR and subjective visual quality at medium to high noise levels.
Fig. 2. Denoising of Barbara corrupted by noise with $\sigma = 50$.
REFERENCES

[1] A. Buades, B. Coll, and J.-M. Morel, "A non-local algorithm for image denoising," in Proc. IEEE Conf. Comput. Vision Pattern Recognition, 2005, pp. 60–65.
[2] D. Van De Ville and M. Kocher, "SURE-based non-local means," IEEE Signal Process. Lett., vol. 16, no. 11, pp. 973–976, Nov. 2009.
[3] T. Tasdizen, "Principal neighborhood dictionaries for nonlocal means image denoising," IEEE Trans. Image Process., vol. 18, no. 12, pp. 2649–2660, Dec. 2009.
[4] R. Vignesh, B. T. Oh, and C.-C. J. Kuo, "Fast non-local means (NLM) computation with probabilistic early termination," IEEE Signal Process. Lett., vol. 17, no. 3, pp. 277–280, Mar. 2010.
[5] K. Dabov, A. Foi, V. Katkovnik, and K. O. Egiazarian, "Image denoising by sparse 3-D transform-domain collaborative filtering," IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080–2095, Aug. 2007.
[6] M. Elad and M. Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Trans. Image Process., vol. 15, no. 12, pp. 3736–3745, Dec. 2006.
[7] D. D. Muresan and T. W. Parks, "Adaptive principal components and image denoising," in Proc. Int. Conf. Image Process., 2003, vol. 1, pp. 101–104.
[8] L. Zhang, W. Dong, D. Zhang, and G. Shi, "Two-stage image denoising by principal component analysis with local pixel grouping," Pattern Recognit., vol. 43, pp. 1531–1549, 2010.
[9] D. L. Donoho and I. M. Johnstone, "Ideal spatial adaptation by wavelet shrinkage," Biometrika, vol. 81, pp. 425–455, 1994.
[10] S. Kritchman and B. Nadler, "Non-parametric detection of the number of signals: Hypothesis testing and random matrix theory," IEEE Trans. Signal Process., vol. 57, no. 10, pp. 3930–3941, Oct. 2009.