Anisotropic Nonlocal Means Denoising

Arian Maleki, Manjari Narayan, Richard G. Baraniuk
Dept. of Electrical and Computer Engineering, Rice University, MS-380, 6100 Main Street, Houston, TX 77005, USA

Abstract

It has recently been proved that the popular nonlocal means (NLM) denoising algorithm does not optimally denoise images with sharp edges. Its weakness lies in the isotropic nature of the neighborhoods it uses to set its smoothing weights. In response, in this paper we introduce several theoretical and practical anisotropic nonlocal means (ANLM) algorithms and prove that they are near minimax optimal for edge-dominated images from the Horizon class. On real-world test images, an ANLM algorithm that adapts to the underlying image gradients outperforms NLM by a significant margin.

Keywords: denoising, nonlocal means, minimax risk, anisotropy
1. Introduction

Image denoising is a fundamental primitive in image processing and computer vision. Denoising algorithms have evolved from the classical linear and median filters to more modern schemes like total variation denoising [1],
Corresponding author: Arian Maleki. Principal corresponding author: Richard G. Baraniuk.
Email addresses: [email protected] (Arian Maleki), [email protected] (Manjari Narayan), [email protected] (Richard G. Baraniuk)
URLs: http://www.ece.rice.edu/~mam15/ (Arian Maleki), http://dsp.rice.edu/dspmember/12 (Manjari Narayan), http://web.ece.rice.edu/richb/ (Richard G. Baraniuk)
Preprint submitted to Applied and Computational Harmonic Analysis November 30, 2011
Figure 1: An example of a Horizon class image, featuring a smooth edge contour h(t1) that separates the white region from the black region.

wavelet thresholding [2], and bilateral filters [3–6]. A particularly successful denoising scheme is the nonlocal means (NLM) algorithm [7], which estimates each pixel value as a weighted average of other, similar noisy pixels. However, instead of using spatial adjacency or the noisy pixel values themselves as the similarity measure for setting the weights, NLM uses a more reliable notion of similarity based on the resemblance of the pixels' neighborhoods in a high-dimensional space. This unique feature benefits NLM in two ways. First, it provides more accurate weight estimates. Second, it enables NLM to exploit the contribution of all pixels in the image. In concert, these features enable NLM to provide state-of-the-art performance for a large class of image denoising problems.

Nevertheless, in a recent paper, we proved that NLM does not attain optimal performance on images with sharp edges from the so-called Horizon class (see Figure 1) [8]. Indeed, NLM's theoretical performance is essentially equivalent to that of wavelet thresholding, which was shown to be suboptimal in [9]. The core problem is that NLM (and wavelet thresholding) cannot exploit the smoothness of the edge contour that separates the white and black regions. Therefore, there is room for improvement.

In this paper, we introduce and study a new denoising framework and prove that it is near-optimal for Horizon class images with sharp edges. Anisotropic nonlocal means (ANLM) outperforms NLM, wavelet thresholding, and more classical techniques by using anisotropic neighborhoods that are elongated along and matched to the local edge orientation. Figure 2 compares the neighborhoods used in ANLM with those used in NLM. Anisotropic neighborhoods enable ANLM to distinguish between similar and dissimilar pixels more accurately.

Figure 2: Comparison of (left) the isotropic neighborhoods employed by nonlocal means (NLM) versus (right) the anisotropic neighborhoods employed by anisotropic NLM (ANLM).

We develop three different ANLM algorithms of increasing levels of practicality. Oracle anisotropic nonlocal means (OANLM) assumes perfect knowledge of the local orientation of the edge contour and is used primarily for our theoretical optimality analysis. Discrete-angle anisotropic nonlocal means (DANLM) optimizes the choice of the anisotropic neighborhood around each pixel in order to achieve near-optimal performance without any oracle information. Since it is more computationally demanding than NLM, we introduce an algorithmic simplification: gradient-based anisotropic nonlocal means (GANLM) uses image gradient information to estimate the edge orientation. Using simulations, we demonstrate that GANLM significantly outperforms NLM in practice on both Horizon class and real-world images.

The paper is organized as follows. Section 2 explains the minimax framework we use to analyze the denoising algorithms and reviews the necessary background. Section 3 introduces the OANLM and DANLM algorithms and presents the main theorems. Section 4 addresses some of the practical ANLM issues by introducing GANLM and summarizes the results of a range of simulations using synthetic and real-world imagery. Section 5 contains the proofs of the main theorems. Section 6 reviews the related work in the literature. Section 7 closes with a discussion of our current and potential future results.

2. Minimax analysis framework

In this section we introduce the minimax framework [2, 10] and the Horizon class image model considered in this paper. Note that, in order to streamline the proofs, we take a continuous-variable analysis approach in this paper, in contrast to our approach in [8]. The conclusions, however, are compatible with [8].

2.1. Risk

We are interested in estimating an image described by the function f: [0,1]² → [0,1] (f ∈ L²([0,1]²)) from its noisy observation

    dY(t1, t2) = f(t1, t2) dt1 dt2 + σ dW(t1, t2).    (1)
Without loss of generality, we consider only square images. Here W(t1, t2) is the Wiener sheet (the primitive of white noise), and σ is a constant that scales the noise. For a given function f and a given estimator f̂, define the risk as

    R(f, f̂) = E(‖f − f̂‖₂²),

where the expected value is over W. The risk can be decomposed into squared bias and variance terms:

    R(f, f̂) = ‖f − E f̂‖₂² + E‖f̂ − E f̂‖₂².

Let f belong to a class of functions F. The risk of the estimator f̂ on F is defined as the risk of the least-favorable function, i.e.,

    R(F, f̂) = sup_{f ∈ F} R(f, f̂).

The minimax risk over the class of functions F is then defined as

    R*(F) = inf_{f̂} sup_{f ∈ F} R(f, f̂).
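The squared-bias/variance decomposition above can be checked numerically for any concrete estimator. The sketch below is ours and purely illustrative (the moving-average "denoiser" and all names are assumptions, not from the paper); it verifies the identity for a 1-D step edge, using the empirical mean of the estimates in place of E f̂:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, trials = 256, 0.3, 2000

# Ground truth: a 1-D step "edge" sampled at n points.
f = (np.arange(n) >= n // 2).astype(float)

def box_filter(y, half=4):
    # A crude stand-in denoiser: moving average of width 2*half + 1.
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    return np.convolve(y, kernel, mode="same")

estimates = np.array([box_filter(f + sigma * rng.standard_normal(n))
                      for _ in range(trials)])

mean_est = estimates.mean(axis=0)                        # empirical E f_hat
risk = ((estimates - f) ** 2).sum(axis=1).mean()         # E ||f - f_hat||^2
bias2 = ((f - mean_est) ** 2).sum()                      # ||f - E f_hat||^2
var = ((estimates - mean_est) ** 2).sum(axis=1).mean()   # E ||f_hat - E f_hat||^2

print(f"risk = {risk:.4f},  bias^2 + variance = {bias2 + var:.4f}")
```

With the empirical mean in place of the true expectation, the identity risk = bias² + variance holds exactly (it is an algebraic identity), not just in expectation.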
R*(F) is a lower bound on the performance of any estimator on F. In this paper, we are interested in the asymptotic setting σ → 0. For all the estimators we consider, R(F, f̂) → 0 as σ → 0. Therefore, following [11, 12], we will consider the decay rate of the minimax risk as our measure of performance.

2.2. Horizon edge model

In our analysis, we consider the Horizon model, which contains piecewise constant images with sharp step edges lying along a smooth contour [9, 11, 13] (our analysis extends easily to piecewise smooth edges). Let Hölder^α(C) be the class of Hölder functions on R, defined in the following way: h ∈ Hölder^α(C) if and only if

    |h^{(k)}(t1) − h^{(k)}(t1′)| ≤ C|t1 − t1′|^{α−k},

where k = ⌊α⌋. Consider a transformation that maps each one-dimensional edge contour function h to a two-dimensional image f_h: [0,1]² → R via f_h(t1, t2) = 1{t2 ≤ h(t1)}; the Horizon class H^α(C) is the set of all images of this form with h ∈ Hölder^α(C). If α > 1, then the optimal neighborhood sizes are given by δs = Θ(σ^{2α/(α+1)}|log σ|^{2α/(α+1)}) and δℓ = Θ(σ^{2/(α+1)}|log σ|^{2/(α+1)}). In smooth regions, where the pixel neighborhoods do not intersect the edge, neither the shape nor the orientation of the neighborhood affects the denoising performance.
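For experimentation it is convenient to have a synthetic Horizon-class image on hand. The sketch below is our own illustrative code (the particular contour and the convention that the region below the contour is white are assumptions); it samples f_h = 1{t2 ≤ h(t1)} on an n × n grid with a smooth (C², hence α = 2) contour and adds Gaussian noise, mimicking the discrete observation model of Section 4:

```python
import numpy as np

def horizon_image(n=256, xi=0.25, seed=0):
    """Sample f_h(t1, t2) = 1{t2 <= h(t1)} on an n-by-n grid and add noise."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n)
    # A smooth (C^2) edge contour h : [0, 1] -> (0, 1).
    h = 0.5 + 0.15 * np.sin(2.0 * np.pi * t)
    # Rows index t2 (vertical axis), columns index t1.
    clean = (t[:, None] <= h[None, :]).astype(float)
    noisy = clean + xi * rng.standard_normal((n, n))
    return clean, noisy

clean, noisy = horizon_image()
```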
Figure 6: The OANLM neighborhoods satisfy the quadratic scaling δs = δℓ² and are aligned with the edge. In regions far from the edge, isotropic neighborhoods perform just as well.
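To get a feel for the quadratic scaling δs = δℓ², the snippet below (illustrative only; all constants are ignored) tabulates the two side lengths for decreasing noise levels together with the aspect ratio δℓ/δs, which grows without bound as σ → 0:

```python
import numpy as np

def oanlm_sides(sigma):
    """Side lengths of the oracle neighborhood, up to constants."""
    L = abs(np.log(sigma))
    d_long = sigma ** (2.0 / 3.0) * L ** (2.0 / 3.0)   # delta_ell
    d_short = d_long ** 2                              # delta_s = delta_ell^2
    return d_long, d_short

for sigma in (1e-1, 1e-2, 1e-3, 1e-4):
    d_long, d_short = oanlm_sides(sigma)
    print(f"sigma={sigma:.0e}  delta_ell={d_long:.3e}  "
          f"delta_s={d_short:.3e}  aspect={d_long / d_short:.1f}")
```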
Theorem 2. If f̂^O is the OANLM estimator with the threshold τσ = 2/√(|log σ|), then

    R(H^α(C), f̂^O) = O(σ^{4/3}|log σ|^{4/3}).

The proof of Theorem 2 can be simply modified to provide a bound for the risk of NLM as well. Let δ = √(δs δℓ), where δs and δℓ are as given in the OANLM algorithm.

Corollary 1. If f̂^N is the NLM estimator with the threshold τσ = 2/√(|log σ|), then

    R(H^α(C), f̂^N) = O(σ|log σ|).

In fact, under certain mild assumptions, the risk of NLM can be lower bounded by Ω(σ). The proof is similar to the proof of Theorem 5 in [8]. However, for the sake of completeness and since our framework differs from [8], we overview the main steps of the proof in Appendix B.

Theorem 2 is based on the strong oracle assumption that the edge direction is known exactly. Needless to say, such information is rarely available in
practical applications. Consider a weaker notion of OANLM that has access to an estimate θ̂γ of θγ that satisfies

    |θ̂γ − θγ| ≤ Θ(σ^β).    (8)

OANLM with exact edge orientation information corresponds to β = ∞ in this model. When β < ∞, the choices δℓ = σ^{2/3}|log σ|^{2/3} and δs = σ^{4/3}|log σ|^{4/3} are not necessarily optimal. The following theorem characterizes the performance of OANLM using θ̂.

Theorem 3. Let the OANLM estimator use the inaccurate edge orientation θ̂γ satisfying (8). Set the neighborhood sizes to δℓ = min(σ^{2/3}|log σ|^{2/3}, σ^{1−β/2}|log σ|) and δs = σ² log²σ / δℓ. Then the risk of the estimator with τσ = 2/√(|log σ|) satisfies

    R(H^α(C), f̂^O) = O(δs Poly(|log σ|)),

where Poly(|log σ|) is a polynomial of degree at most 2 in |log σ|.

If the edge estimate is exact, then the result simplifies to the result of Theorem 2. Moreover, this theorem confirms that the OANLM algorithm achieves the optimal rate even if there is an error in θ̂ of order O(σ^β) with β > 2/3.

Corollary 2. Let the OANLM estimator use the inaccurate edge orientation θ̂γ satisfying (8). Set the neighborhood sizes to δℓ = σ^{1−β/2}|log σ| and δs = σ^{1+β/2}|log σ|. Then, the risk of the estimator satisfies

    R(H^α(C), f̂^O) = O(σ^{1+β/2} Poly(|log σ|)).

This corollary suggests that, as long as we have an edge orientation estimate that improves as the number of pixels increases, OANLM outperforms NLM. Also note that, as β decreases, the neighborhoods become more isotropic.

3.3. Discrete angle ANLM

In this section, we introduce the discrete-angle ANLM (DANLM) algorithm, which achieves the minimax rate without oracle information. The key
Figure 7: DANLM neighborhoods around one pixel location for q = 4.
idea is to calculate the neighborhood distance over several directional neighborhoods and fuse them to obtain a similarity measure that works well for all directions. As for OANLM, set δs = σ^{4/3}|log σ|^{4/3} and δℓ = σ^{2/3}|log σ|^{2/3}, and let q = πσ^{−2/3}. Define the angles θ0 = 0, θ1 = σ^{2/3}, θ2 = 2σ^{2/3}, ..., θq = π − σ^{2/3}, and let Θ denote this set of angles. For a point (t1, t2) we consider all of the anisotropic, directional neighborhoods for θ ∈ Θ. Figure 7 displays four of these neighborhoods. Define the discrete-angle anisotropic distance between dY(t1, t2) and dY(s1, s2) as

    d²_A(dY(t1, t2), dY(s1, s2)) ≜ min_{θ ∈ Θ} d²_{θ,δs,δℓ}(dY(t1, t2), dY(s1, s2)),

where d²_{θ,δs,δℓ}(dY(t1, t2), dY(s1, s2)) is defined in (6). The DANLM estimate is then given by

    f̂^D(t1, t2) = ∫_{(s1,s2)∈S} w^{AN}_{t1,t2}(s1, s2) X(s1, s2) ds1 ds2 / ∫_{(s1,s2)∈S} w^{AN}_{t1,t2}(s1, s2) ds1 ds2,

where

    w^{AN}_{t1,t2}(s1, s2) = 1 if d²_A(dY(t1, t2), dY(s1, s2)) ≤ 2 ns nℓ σ²/(δs δℓ) + τσ, and 0 otherwise.    (9)

In summary, we note the following features of the DANLM algorithm. First, it uses the quadratic scaling δs = δℓ². Second, the optimal neighborhood
direction can change from pixel to pixel. The following theorem, proved in Section 5, shows that the risk of this algorithm is within a logarithmic factor of the minimax risk.

Theorem 4. The risk of DANLM satisfies

    R(H^α(C), f̂^D) ≤ O(σ^{4/3}|log σ|^{4/3}).

Figure 8 reprises the simulation experiment of Figure 4 using the four algorithms described thus far: NLM, OANLM with perfect knowledge of the edge orientation, OANLM with imperfect knowledge of the edge orientation, and DANLM with four angles. All three of the latter algorithms outperform the isotropic NLM.

4. Practical ANLM algorithms and experiments

In this section we introduce a practical gradient-based ANLM algorithm and then complement the above theoretical arguments with additional simulations and experiments on synthetic and real-world imagery.

4.1. Extension to discrete images

In practice the observations are noisy pixelated values of an image, and the objective is only to estimate those pixelated values. In this section we explain how the ideas of directional neighborhoods and ANLM extend to the discrete setting. Suppose we are interested in estimating an n × n image f(i/n, j/n) from the noisy observations o_{i,j} = f(i/n, j/n) + ζ_{i,j}, where ζ_{i,j} ~ iid N(0, ξ²) and ξ is the standard deviation of the noise. The extension of the anisotropic neighborhood to the discrete setting is straightforward. Let S̄ = {1/n, 2/n, ..., (n−1)/n, 1} × {1/n, 2/n, ..., (n−1)/n, 1} and S̃ = {1, 2, ..., n} × {1, 2, ..., n}. For a given set B ⊂ S we define B̄ = B ∩ S̄. The discrete (θ, δs, δℓ)-distance between two pixels o_{i,j} and o_{m,ℓ} is defined as

    d̄²_{θ,δs,δℓ}(o_{i,j}, o_{m,ℓ}) ≜ (1/|P|) Σ_{(r,q)∈P} (o_{i+r,j+q} − o_{m+r,ℓ+q})²,    (10)
Figure 8: Continuation of the experiment of Figure 4, comparing (a) isotropic NLM (PSNR = 26 dB), (b) OANLM with perfect knowledge of the edge direction (PSNR = 28.9 dB), (c) OANLM with 10% error in the knowledge of the edge direction (PSNR = 28.6 dB), and (d) DANLM with q = 4 angles (PSNR = 27.5 dB). The images are of size 256 × 256 pixels. In the ANLM algorithms, δs and δℓ are the same as in Figure 4. The neighborhood size of NLM is √(δs δℓ). The edge orientation is 135°. The standard deviation of the noise is ξ = 0.5.

where P = {(r, q) ∈ Z² | ((i+r)/n, (j+q)/n) ∈ Ī_{θ,δs,δℓ}(i/n, j/n)}. See Figure 9. The ANLM estimate at pixel f(i/n, j/n) is given by
Figure 9: The anisotropic neighborhood Ī_{θ,δs,δℓ}(i/n, j/n) in the discrete setting, for two different pixels of a very simple Horizon class image.
    f̂^{θ,δs,δℓ}_{i,j} = ( Σ_m Σ_ℓ w̄^{θ,δs,δℓ}_{i,j}(m, ℓ) o_{m,ℓ} ) / ( Σ_m Σ_ℓ w̄^{θ,δs,δℓ}_{i,j}(m, ℓ) ),

where the weights are obtained from the distances d̄²_{θ,δs,δℓ}(o_{i,j}, o_{m,ℓ}). For instance, the simple policy introduced in (7) corresponds to

    w̄^{θ,δs,δℓ}_{i,j}(m, ℓ) = 1 if d̄²_{θ,δs,δℓ}(o_{i,j}, o_{m,ℓ}) ≤ 2ξ² + τn, and 0 otherwise.    (11)

Using the discrete anisotropic neighborhood and distance, the extensions of OANLM and DANLM to the discrete setting are straightforward.

4.2. Gradient-based ANLM

While DANLM is somewhat practical and also theoretically optimal for the Horizon class of images, its computational complexity is higher than that of NLM and grows linearly in the number of directions q, where q = o(n^{2/3}). As a more practical alternative, we propose gradient-based ANLM (GANLM), which adjusts the orientation of an anisotropic ANLM neighborhood using an estimate of the edge orientation provided by the image gradients. Pseudocode is given in Algorithm 1. Note that GANLM reverts back to NLM in regions with low image gradients, since they will not be "edgy" enough to benefit from the special treatment.
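The ingredients above — a rotated anisotropic patch, the discrete distance (10), the thresholded weights (11), and a gradient-based choice of orientation in the spirit of GANLM — can be combined into a minimal pixel-domain sketch. All function names are ours (hypothetical), boundary handling is omitted (the pixel of interest must sit far enough from the image border), and no attempt is made at the efficiency of a production NLM implementation:

```python
import numpy as np

def aniso_offsets(theta, half_long=7, half_short=1):
    """Integer pixel offsets of a patch elongated along direction theta."""
    c, s = np.cos(theta), np.sin(theta)
    offs = set()
    for a in range(-half_long, half_long + 1):        # along the edge
        for b in range(-half_short, half_short + 1):  # across the edge
            offs.add((round(a * s + b * c), round(a * c - b * s)))
    return sorted(offs)

def aniso_distance(img, p, q, offs):
    """Discrete (theta, delta_s, delta_ell)-distance in the spirit of (10)."""
    return float(np.mean([(img[p[0] + r, p[1] + c] - img[q[0] + r, q[1] + c]) ** 2
                          for r, c in offs]))

def gradient_angle(img, p):
    """Edge direction at p: perpendicular to the central-difference gradient."""
    gv = (img[p[0] + 1, p[1]] - img[p[0] - 1, p[1]]) / 2.0
    gh = (img[p[0], p[1] + 1] - img[p[0], p[1] - 1]) / 2.0
    return float(np.arctan2(gv, gh)) + np.pi / 2.0

def anlm_pixel(img, p, theta, xi, tau, search=6):
    """ANLM estimate at p: average pixels whose patch distance passes (11)."""
    offs = aniso_offsets(theta)
    num = den = 0.0
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            q = (p[0] + dr, p[1] + dc)
            if aniso_distance(img, p, q, offs) <= 2.0 * xi ** 2 + tau:
                num += img[q]
                den += 1.0
    return num / den
```

Minimizing aniso_distance over a small grid of angles instead of a single gradient-based theta recovers a DANLM-style similarity, while falling back to a square patch when the gradient magnitude is below a threshold λ gives the isotropic branch of GANLM.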
Algorithm 1 Gradient-based ANLM (GANLM)
Input: noisy image y; neighborhood size δs × δℓ; threshold λ that determines isotropic/anisotropic neighborhood selection
Output: denoised image f̂
Estimate the image gradient (gh(i, j), gv(i, j)) at each pixel (i, j)
for every pixel (i, j) do
  g(i, j) = √(gh(i, j)² + gv(i, j)²)
  θ(i, j) = tan⁻¹(gv(i, j)/gh(i, j))
  if g(i, j) ≥ λ then
    perform ANLM at pixel y_{i,j} with the distance d̄_{θ(i,j),δs,δℓ}
  else
    perform NLM at pixel y_{i,j} with neighborhood size √(δs δℓ)
  end if
end for

There is a rich literature on robust image gradient estimation [23–25]. Most simply, if gh(i, j) and gv(i, j) are the estimated image derivatives at pixel (i, j), then we can estimate the local orientation of the edge by θ̂(i, j) = tan⁻¹(gv(i, j)/gh(i, j)). To allay any concerns that gradient-based adaptivity is not robust to noise and errors, we recall Theorem 3, which establishes the robustness of OANLM to edge angle estimation error. For extremely noisy images, numerous heuristics are possible, including estimating the image gradients for GANLM from an isotropic NLM pilot estimate. Figure 10 builds on Figure 8 with a slightly more realistic, curved edge and demonstrates that the pilot estimate approach to GANLM performs almost as well as an oracle GANLM that has access to the edge gradient.

Figure 10: Simulation experiment in the same vein as Figures 4 and 8, but with a slightly more realistic C² curved edge: (a) Horizon image; (b) noisy image, PSNR = 16.5 dB; (c) NLM, PSNR = 34.4 dB; (d) DANLM, PSNR = 36.2 dB; (e) oracle GANLM, PSNR = 38 dB; (f) GANLM, PSNR = 38 dB. The images are of size 512 × 512 pixels. In the ANLM algorithms, δs and δℓ follow the quadratic scaling of Theorem 2. The neighborhood size of NLM is √(δs δℓ). The standard deviation of the noise is ξ = 0.15.

Table 2 compares the performance of the algorithms introduced in this paper with that of NLM on the natural test images Barbara [26], Boats [27], and Wet Paint [28], submerged in noise at two different levels.

Table 2: Performance (PSNR in dB) of NLM and ANLM algorithms on three test images at two noise levels.

Test image               Algorithm      ξ = 0.25   ξ = 0.15
Barbara (512 × 512)      NLM            22.48      25.86
                         DANLM          23.15      26.27
                         Oracle GANLM   23.51      26.63
                         GANLM          23.50      26.60
Boats (512 × 512)        NLM            22.75      25.83
                         DANLM          23.45      26.45
                         Oracle GANLM   23.88      26.45
                         GANLM          23.75      26.35
Wet Paint (1024 × 1024)  NLM            27.66      30.49
                         DANLM          28.41      30.75
                         Oracle GANLM   29.02      31.18
                         GANLM          28.86      31.07

The table demonstrates two facts. First, the performance of the practical GANLM algorithm is very close to that of the oracle GANLM algorithm. Second, these two algorithms outperform DANLM in all but one case and significantly outperform standard NLM in all cases. We use 4 discrete angles in DANLM (experiments with larger q did not differ appreciably in PSNR at this image size) and attribute the superior performance of GANLM over DANLM to its more accurate selection of orientations.

5. Proofs of the main results

5.1. Preamble

We first introduce some notation. Define the following partition of the set S:

    S1 ≜ {(v, u) | u > h(v) + (1 + C/2)δs},
    S2 ≜ {(v, u) | h(v) − (1 + C/2)δs < u < h(v) + (1 + C/2)δs},
    S3 ≜ {(v, u) | u < h(v) − (1 + C/2)δs}.

It is important to note that, if (t1, t2) ∈ S1 and tan(θ) = h′(t1), then I_{θ,δs,δℓ}(t1, t2) does not overlap the edge contour. In other words, the correctly aligned neighborhood of (t1, t2) ∈ S1 is always above the edge. The points in S3 satisfy a similar property. This is clarified in Figure 11.
Figure 11: Regions S1, S2, and S3. The neighborhood of (u, v) ∈ S3 is aligned with the edge contour, and therefore it does not intersect the edge, while the neighborhood of (u′, v′) is not aligned and therefore may intersect the edge. The neighborhoods of the pixels in S2 may intersect the edge even when they are correctly aligned.
We further partition S1 into P1 and P2, and S3 into P3 and P4, such that

    P1 ≜ {(v, u) | h(v) + 2δℓ + (C/2)δs ≤ u},
    P2 ≜ {(v, u) | h(v) + (1 + C/2)δs ≤ u ≤ h(v) + 2δℓ + (C/2)δs},
    P3 ≜ {(v, u) | h(v) − (1 + C/2)δs ≥ u ≥ h(v) − 2δℓ − (C/2)δs},
    P4 ≜ {(v, u) | u ≤ h(v) − 2δℓ − (C/2)δs}.
Lemma 1. Any neighborhood of a pixel (v, u) ∈ P1 lies completely above the edge contour.

Proof. Let (t1, t2) ∈ P1. If (u, v) ∈ I_{θ,δs,δℓ}(t1, t2), then t2 − v < δℓ. On the other hand, for t1′ ∈ [t1 − δℓ, t1 + δℓ] we have

    h(t1′) = h(t1) + h′(t1)(t1′ − t1) + (1/2) h″(t1″)(t1′ − t1)²,    (12)
Figure 12: The regions P1, P2, P3, P4, with αs = (1 + C/2)δs and αℓ = 2δℓ − δs. Every neighborhood of (t1, t2) ∈ P1 lies completely above the edge contour. However, some of the neighborhoods of the pixels (t1, t2) ∈ P2 may intersect the edge. A similar property holds for the regions P3 and P4.
where t1″ is between t1 and t1′. Therefore,

    h(t1′) − h(t1) < δℓ + (C/2)δs.    (13)

Comparing (12) and (13) completes the proof.
In contrast to P1, some of the neighborhoods of the pixels (v, u) ∈ P2 may intersect the edge. Similar properties hold for P3 and P4, respectively. Figure 12 displays these regions. The following lemma, proved in [8], will be used in the proofs of the main theorems. For the sake of completeness, we sketch the proof here.

Lemma 2. Let Z1, Z2, ..., Zr be iid N(0, 1) random variables. The χ²_r random variable Σ_{i=1}^r Z_i² concentrates around its mean with high probability, i.e.,

    P( (1/r) Σ_{i=1}^r Z_i² − 1 > t ) ≤ e^{−(r/2)(t − ln(1+t))},    (14)
    P( (1/r) Σ_{i=1}^r Z_i² − 1 < −t ) ≤ e^{−(r/2)(t + ln(1−t))}.    (15)
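Before turning to the proof, the upper-tail bound (14) can be sanity-checked by a quick simulation (purely illustrative; not part of the argument):

```python
import numpy as np

rng = np.random.default_rng(0)
r, t, trials = 50, 0.5, 100_000

z = rng.standard_normal((trials, r))
emp = float(np.mean((z ** 2).mean(axis=1) - 1.0 > t))   # empirical tail
bound = float(np.exp(-(r / 2.0) * (t - np.log1p(t))))   # right side of (14)

print(f"empirical P = {emp:.4f}   bound = {bound:.4f}")
```

For r = 50 and t = 0.5 the empirical tail probability sits comfortably below the bound, as expected for a Chernoff-style estimate.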
Proof. Here we prove (14); the proof of (15) follows along very similar lines. From Markov's inequality, for any 0 < η < r/2 we have

    P( (1/r) Σ_{i=1}^r Z_i² − 1 > t ) ≤ e^{−η(t+1)} E( e^{(η/r) Σ_{i=1}^r Z_i²} ) = e^{−η(t+1)} ( E e^{(η/r) Z_1²} )^r = e^{−η(t+1)} (1 − 2η/r)^{−r/2}.    (16)

The last step follows from Lemma 3 in [8]. The upper bound above holds for any η < r/2. To obtain the tightest bound we minimize the right side of (16) over η; the minimizer is η* = rt/(2(t+1)). Plugging η* into (16) proves the lemma.

5.2. Proof of Theorem 2

In the setup above, we consider three different regions for the point (t1, t2). As we will see in the proof, the risk of all pixels in each region has the same upper bound. We calculate these upper bounds and then combine them to obtain a master upper bound for the risk of OANLM.

Case I: (t1, t2) ∈ S1. We know that if the anisotropic neighborhood I_{θ,δs,δℓ}(t1, t2) is aligned with the edge contour, i.e., tan(θ) = h′(t1), then it does not intersect the edge contour. To calculate the OANLM estimate we first calculate the weights. Define

    z^{θ,δs,δℓ}_{t1,t2}(j1, j2) = (ns nℓ/(δs δℓ)) ∫_{(s1,s2) ∈ I^{j1,j2}_{θ,δs,δℓ}} dW(s1, s2),    (17)

    Z(t1, t2) = (ns nℓ/(δs δℓ)) ∫∫_{(s1,s2) ∈ I_{θ, δs/ns, δℓ/nℓ}(t1,t2)} dW(s1, s2),

    F(t1, t2) = (ns nℓ/(δs δℓ)) ∫∫_{(s1,s2) ∈ I_{θ, δs/ns, δℓ/nℓ}(t1,t2)} f(s1, s2) ds1 ds2.

For notational simplicity we will write w(s1, s2) and z_{t1,t2}(j1, j2) instead of w^{θ,δs,δℓ}_{t1,t2}(s1, s2) and z^{θ,δs,δℓ}_{t1,t2}(j1, j2), respectively. Define

    A1 ≜ {(u, v) ∈ P1 | w(u, v) = 1},
    A2 ≜ {(u, v) ∈ P4 | w(u, v) = 0}.

Here, let λ(·) denote the Lebesgue measure of a set in R².

Lemma 3. Let δℓ = 2σ^{2/3}|log σ|^{2/3}, δs = 4σ^{4/3}|log σ|^{4/3}, ns = 2|log σ|^{2/3}, nℓ = |log σ|^{4/3}, and τσ = 2/√(|log σ|). Then,

    P(λ(P1) − λ(A1) > ε) = O(σ⁸/ε),
    P(λ(P4) − λ(A2) > ε) = O(σ⁸/ε).
Proof. Here we prove the first inequality; the proof of the second follows a similar route. Since z_{t1,t2}(j1, j2) is the integral of the Wiener sheet, we have z_{t1,t2}(j1, j2) ~ N(0, ns nℓ σ²/(δs δℓ)). Combining this with Lemma 2 we conclude that, for (u, v) ∈ P1,

    P( d²_{θ,δs,δℓ} > (1 + t) · 2 ns nℓ σ²/(δs δℓ) ) ≤ e^{−(ns nℓ/2)(t − ln(1+t))},

where t = τσ δs δℓ/(2 ns nℓ σ²); with the parameter choices of the lemma, the right side is O(σ⁸). Therefore,

    E(λ(A1)) = E ∫_{(u,v)∈P1} 1{(u, v) ∈ A1} du dv = ∫_{(u,v)∈P1} P((u, v) ∈ A1) du dv = λ(P1) − O(σ⁸).    (18)

An application of the Markov inequality then yields the result:

    P(λ(P1) − λ(A1) > ε) ≤ E(λ(P1) − λ(A1))/ε = O(σ⁸/ε).
Define the event E as

    E = {λ(P1) − λ(A1) < σ²} ∩ {λ(P4) − λ(A2) < σ²}.

For notational simplicity we also use the following shorthand:

    wX du dv ≜ w(u, v) X(u, v) du dv,
    X du dv ≜ X(u, v) du dv,
    w du dv ≜ w(u, v) du dv,
    Z du dv ≜ Z(u, v) du dv,
    wZ du dv ≜ w(u, v) Z(u, v) du dv.

The risk of the OANLM estimator at (t1, t2) ∈ S1 is then given by

    E( ( ∫_S wX du dv / ∫_S w du dv )² )

(recall that f(t1, t2) = 0 for (t1, t2) ∈ S1). Define U = ( ∫_S wX du dv / ∫_S w du dv )². Since the estimate is clipped to [0, 1], U ≤ 1, and we have

    E(U) = E(U | E) P(E) + E(U | E^c) P(E^c) ≤ E(U | E) P(E) + P(E^c).    (19)
Lemma 3 proves that P(E^c) = O(σ⁶). We have

    ∫_{P1} wX du dv − ∫_{P1} X du dv = ( ∫_{P1} wX du dv − ∫_{A1} wX du dv ) + ( ∫_{A1} X du dv − ∫_{P1} X du dv ) = O(λ(P1\A1)),    (20)

where the middle equality uses the fact that w = 1 on A1, so that ∫_{A1} wX du dv = ∫_{A1} X du dv. For the last equality we have assumed that the estimate is bounded over the region P1. This assumption can also be justified by calculating the probability that this condition does not hold and showing that this probability is negligible. Using arguments similar to (20), it is straightforward to prove that
    | ∫_{P4} wX du dv | = O(λ(P4\A2)),
    | ∫_{P1} w du dv − ∫_{P1} du dv | = O(λ(P1\A1)),    (21)
    | ∫_{P4} w du dv | = O(λ(P4\A2)).
Define P14 = P1 ∪ P4 and B = S\P14. We now calculate an upper bound for the first term of (19):

    E(U | E) P(E)
     = E( ( (∫_{P14} wX du dv + ∫_B wX du dv) / (∫_{P14} w du dv + ∫_B w du dv) )² | E ) P(E)
     = E( ( (∫_{P1} X du dv + ∫_B wX du dv) / (∫_{P1} du dv + ∫_B w du dv) )² | E ) P(E) + O(σ²)
     ≤ E( ( (∫_{P1} X du dv + ∫_B wX du dv) / (∫_{P1} du dv + ∫_B w du dv) )² ) + O(σ²)
     ≤ E( ( (∫_{P1} Z du dv + ∫_B wZ du dv) / (∫_{P1} du dv + ∫_B w du dv) )² )
       + E( ( ∫_B wF du dv / (∫_{P1} du dv + ∫_B w du dv) )² )
       + 2 √( E( ( (∫_{P1} Z du dv + ∫_B wZ du dv) / (∫_{P1} du dv + ∫_B w du dv) )² ) ) · √( E( ( (∫_{P1} F du dv + ∫_B wF du dv) / (∫_{P1} du dv + ∫_B w du dv) )² ) )
       + O(σ²),    (22)

where the second equality uses (20) and (21) on the event E, and the last step splits X = F + Z (note that F = 0 on P1, since the neighborhoods of pixels in P1 lie entirely above the edge).
Lemma 4. Let w(u, v) be the weights of OANLM with δℓ, δs, ns, nℓ, and τσ as specified in Lemma 3. Then

    E( ( ∫_B wF du dv / (∫_{P1} du dv + ∫_B w du dv) )² ) ≤ O(σ^{4/3}|log σ|^{4/3}).

Proof. We have

    E( ( ∫_B wF du dv / (∫_{P1} du dv + ∫_B w du dv) )² ) ≤ E( ( ∫_B du dv / ∫_{P1} du dv )² ) = ( λ(B)/λ(P1) )² = O(σ^{4/3}|log σ|^{4/3}),

where the inequality follows by maximizing the numerator and minimizing the denominator independently (0 ≤ w ≤ 1 and 0 ≤ F ≤ 1).

Lemma 5. Let w(u, v) be the weights of OANLM with δℓ, δs, ns, nℓ, and τσ as specified in Lemma 3. Then

    E( ( (∫_{P1} Z du dv + ∫_B wZ du dv) / (∫_{P1} du dv + ∫_B w du dv) )² ) ≤ O(σ^{4/3}|log σ|^{4/3}).

Proof. Since ∫_B w du dv ≥ 0 and we are interested in an upper bound of the risk, we can remove it from the denominator to obtain

    E( ( (∫_{P1} Z du dv + ∫_B wZ du dv) / (∫_{P1} du dv + ∫_B w du dv) )² ) ≤ E( ( (∫_{P1} Z du dv + ∫_B wZ du dv) / ∫_{P1} du dv )² )
     ≤ E( ( ∫_{P1} Z du dv / ∫_{P1} du dv )² ) + E( ( ∫_B wZ du dv / ∫_{P1} du dv )² ) + 2 √( E( ( ∫_{P1} Z du dv / ∫_{P1} du dv )² ) ) √( E( ( ∫_B wZ du dv / ∫_{P1} du dv )² ) )
     ≜ V1 + V2 + V3.
Now we obtain upper bounds for V1, V2, and V3 separately. First, we have

    V1 = E( ∫_{P1} Z du dv )² / ( ∫_{P1} du dv )²
     = ∫_{P1} ∫_{I_{θ,2δs/ns,2δℓ/nℓ}(u,v)} E( Z(u, v) Z(u′, v′) ) du′ dv′ du dv / ( ∫_{P1} du dv )²
     ≤ ∫_{P1} ∫_{I_{θ,2δs/ns,2δℓ/nℓ}(u,v)} ( ns nℓ σ²/(δs δℓ) ) du′ dv′ du dv / ( ∫_{P1} du dv )² = O(σ²),    (23)

where we used the facts that E(Z(u,v)Z(u′,v′)) = 0 unless the two cells overlap, i.e., (u′, v′) ∈ I_{θ,2δs/ns,2δℓ/nℓ}(u, v), and that |E(Z(u,v)Z(u′,v′))| ≤ ns nℓ σ²/(δs δℓ). To obtain an upper bound for V2, we first note that, by the Cauchy–Schwarz inequality,

    E( w(u,v)Z(u,v) w(u′,v′)Z(u′,v′) ) ≤ √( E(w(u,v)Z(u,v))² ) √( E(w(u′,v′)Z(u′,v′))² ) ≤ ns nℓ σ²/(δs δℓ),    (24)

and hence

    V2 = E( ∫_B wZ du dv )² / ( ∫_{P1} du dv )²
     = ∫_B ∫_B E( w(u,v)Z(u,v) w(u′,v′)Z(u′,v′) ) du dv du′ dv′ / ( ∫_{P1} du dv )²
     ≤ ∫_B ∫_B ( ns nℓ σ²/(δs δℓ) ) du dv du′ dv′ / ( ∫_{P1} du dv )² = O(δℓ²).    (25)

Using (23) and (25), it is straightforward to obtain an upper bound for V3. Combining the upper bounds for V1, V2, and V3 completes the proof.

Combining (22) with Lemmas 4 and 5 establishes the following upper bound for Case I, where (t1, t2) ∈ S1:

    E( ( f(t1, t2) − ∫_S w(u,v) X(u,v) du dv / ∫_S w(u,v) du dv )² ) = O(σ^{4/3}|log σ|^{4/3}).    (26)
Case II: (t1, t2) ∈ S2. In this case we simply bound the risk by 1, since the function f is bounded between 0 and 1; if the estimate falls outside this range, we map it to either 0 or 1.

Case III: (t1, t2) ∈ S3. This case is exactly the same as Case I, and hence we skip the proof.

Finally, combining our results for Cases I, II, and III, we can calculate an upper bound for the risk of OANLM as

    R(f, f̂^O) = E(‖f − f̂^O‖²) = ∫∫_{S1∪S3} E(f − f̂^O)² dt1 dt2 + ∫∫_{S2} E(f − f̂^O)² dt1 dt2
     = λ(S1 ∪ S3) · O(σ^{4/3}|log σ|^{4/3}) + λ(S2) · O(1) = O(σ^{4/3}|log σ|^{4/3}),

where the last step uses λ(S2) = O(δs) = O(σ^{4/3}|log σ|^{4/3}). This completes the proof of Theorem 2.

5.3. Proof of Corollary 1

The proof of this corollary follows exactly the same route as that of Theorem 2; we merely redefine the regions Sk and Pk. The new definition of the Sk regions for the NLM algorithm is

    S1^n = {(t1, t2) | t2 > h(t1) + 2δ},  S3^n = {(t1, t2) | t2 < h(t1) − 2δ},

and S2^n = S\(S1^n ∪ S3^n). Since the neighborhoods in NLM are isotropic, a single parameter describes the neighborhood size, δ = δs = δℓ. The main assumption is that (C/2)δ² < δ, which is clear since δ is assumed to go to zero as σ → 0. Therefore, in the isotropic case the neighborhoods of the pixels in S1^n and S3^n do not intersect the edge contour. Since the neighborhoods no longer have different anisotropic lengths, the size of the intersection of a neighborhood with the edge contour is identical in all directions, and there is no need to define the regions P1, ..., P4; equivalently, P1 = S1^n, P4 = S3^n, and P2 = P3 = ∅. With these definitions the proof is exactly the same as above.

5.4. Proof of Theorem 3

The proof of this theorem is very similar to the proof of Theorem 2. The only difference is in the definitions of the regions Sk and Pk. Since
there is a mismatch between the orientations of the neighborhood and the edge contour, the neighborhood of a pixel in S1 may intersect the edge. To fix this, we define new regions S_i^β and P_i^β. If the error in θ is upper bounded by c_β σ^β, then define

    S1^β = {(t1, t2) | t2 > h(t1) + c_β σ^β δℓ + δs + (C/2)δℓ²},
    S3^β = {(t1, t2) | t2 < h(t1) − c_β σ^β δℓ − δs − (C/2)δℓ²},

and S2^β = S\(S1^β ∪ S3^β). Furthermore, define P1^β = P1, P4^β = P4, P2^β = S1^β\P1, and P3^β = S3^β\P4. Using these new partitions, the proof follows exactly the same route as the proof of Theorem 2.

5.5. Proof of Theorem 4

Since the proof is mostly similar to that of Theorem 2, we shall focus on the major differences. The first difference is that we consider the regions P1–P4 instead of S1–S3. Previously the regions P1 and P2 were treated jointly under the region S1; we shall now consider P1 and P2 separately, and their differences will become progressively apparent.

Case I: (t1, t2) ∈ P1. We start with the calculation of the weights w(u, v) for (u, v) ∈ P1 ∪ P4. Define

    R1 ≜ {(u, v) ∈ P1 | w^D(u, v) = 1},
    R2 ≜ {(u, v) ∈ P4 | w^D(u, v) = 0}.

Lemma 6. Let δs, δℓ, ns, nℓ, and τσ be as defined in Lemma 3, and let q = πσ^{−2/3} in the DANLM algorithm. Then,

    P(λ(P1) − λ(R1) > ε) ≤ πσ^{22/3}/ε,
    P(λ(P4) − λ(R2) > ε) ≤ πσ^{22/3}/ε.
The proof is a simple application of Lemma 3 and the union bound. Note that since (t1, t2) ∈ P1, all of its directional neighborhoods are located above the edge and do not intersect the edge contour. Define the event F as

    F = {λ(P1) − λ(R1) < σ²} ∩ {λ(P4) − λ(R2) < σ²}.

Lemma 6 confirms that P(F^c) = O(σ^{16/3}). The rest of the proof of Case I is exactly the same as the proof of Case I in Theorem 2, and therefore we do not repeat it here.

Case II: (t1, t2) ∈ P2. In this case we again start by defining the following two sets:

    L1 ≜ {(u, v) ∈ P1 | w^D_{u,v} = 1},
    L2 ≜ {(u, v) ∈ P4 | w^D_{u,v} = 0}.
Lemma 7. Let δs, δℓ, ns, nℓ, and τσ be as defined in Lemma 3, and let q = πσ^{−2/3}. Then,

    P(λ(P1) − λ(L1) > ε) ≤ πσ^{22/3}/ε.
Proof. If we prove that, for some θ ∈ Θ, the neighborhood I_{θ,δs,δℓ}(t1, t2) does not intersect the edge, then the proof of the lemma is immediate. Suppose that θ* is such that tan(θ*) = h′(t1), and consider θ̂ = argmin_{θ∈Θ} |θ − θ*|. To ensure that the neighborhood I_{θ̂,δs,δℓ}(t1, t2) does not intersect the edge, we need to have

    t2 + tan(θ̂)(t1′ − t1) − δs > h(t1) + h′(t1)(t1′ − t1) + (1/2) h″(t1″)(t1′ − t1)²    (27)

for all t1′ ∈ [t1, t1 + δℓ), where t1″ ∈ [t1, t1′]. Clearly, tan(θ̂) = tan(θ*) + (1 + tan²(θ0))∆θ for some θ0 between θ* and θ̂, where ∆θ = θ̂ − θ*. Furthermore, |h″(t1″)| ≤ C and |∆θ| ≤ (π/2)δℓ. Therefore, (27) will be satisfied if

    t2 > h(t1) + (π/2)δℓ² + δs + (C/2)δℓ².
This constraint holds for all pixels in the region P2. Therefore, at least one of the neighborhoods lies completely in the region above h, which in turn proves the lemma.

Lemma 8. Let δs, δℓ, ns, nℓ, and τσ be as defined in Lemma 3, and let q = πσ^{−2/3}. Then,

    P(λ(P4) − λ(L2) > ε) ≤ πσ^{10/3}/ε.
Figure 13: Explanation of the neighborhood rotation in Lemma 8.
Proof. We first prove that at least half of the area of I_{θ,δs,δℓ}(u, v) is located below the edge. For notational simplicity we assume that h′(t1) = 0. Consider Figure 13, in which the neighborhood has an arbitrary orientation, and let w and v be as defined in the figure. We then have

    v = d/cos θ − Cw²/cos θ − δs tan θ = (d − Cw² − δs sin θ)/cos θ ≥ (d − Cw² − δs)/cos θ,    (28)

where d = t2 − h(t1). Suppose d is such that, at the correct angle, d − Cw² − δs > 0, i.e., d > (C + 1)σ^{4/3}. According to (28), v > 0, and hence more than half of the area of the neighborhood is in the white region. Therefore, the number of pixels in this region is 4 log²σ ± o(log²σ).
Case III: (t_1, t_2) ∈ S_2. The proof of this case is identical to that of Case II in the proof of Theorem 2. Case IV: (t_1, t_2) ∈ P_3 ∪ P_4. The proof of this case is similar to that for the regions P_1 and P_2 in Cases I and II and therefore is skipped here. Finally, combining the upper bounds obtained in Cases I, II, III, and IV completes the proof of the theorem.

6. Related work on anisotropic denoising

Anisotropy is a fundamental feature of real-world images. As a result, anisotropic image processing tools can be traced back at least as far as
the 1970s [29]. Here, we briefly compare and contrast some of the relevant anisotropic denoising schemes with ANLM. Anisotropic filtering methods use space-varying linear convolution filters to reduce blur along image edges. Toward this goal, [29] considers several different neighborhoods around a pixel and selects the most "homogeneous" neighborhood to smooth over. A more advanced version of this idea can be found in [30]. There are major differences between such algorithms and NLM/ANLM; in particular, these estimators are local and do not exploit global similarities. Anisotropic diffusion methods smooth a noisy image with a Gaussian kernel. As the standard deviation of the kernel increases, the smoothing process introduces larger bias at the edges. In [31, 32] the authors proved that the set of images derived by this approach can be viewed as the solution to the heat diffusion equation. Perona and Malik [33] noted the isotropy of the heat equation and introduced anisotropy: their anisotropic diffusion starts with the heat equation but at every iteration exploits the gradient information of the previous estimate to increase the conductance along the edges and decrease it across them. Efforts to theoretically analyze the risk of this algorithm have left many open questions remaining [14]. It is worth noting that the idea of applying an image denoising algorithm iteratively and guiding it at every iteration based on previous estimates goes back to Tukey's twicing [34]. Anisotropic transformations enable simple threshold-based denoising algorithms. While the standard separable wavelet transform cannot exploit the smoothness of the edge contour, a menagerie of anisotropic transforms has been developed, including ridgelets [15], curvelets [16], wedgelets [9], platelets [17, 35], shearlets [18], contourlets [19], bandelets [20], and directionlets [21].
As mentioned in the Introduction, among the above algorithms only wedgelets can attain the optimal minimax risk for the Horizon class; however, wedgelets are not suited to denoising textures. One promising avenue combining wavelets (for texture denoising) and wedgelets (for edge denoising) could follow the path of the image coder in [36]. Alternative anisotropic NLM algorithms have been proposed to address the inefficiency of using a fixed neighborhood. In [37, 38] the authors adapt the neighborhood size to the local image content but do not provide any optimality analysis. In [38] the authors consider different isotropic NLM
neighborhood sizes depending on how smoothly the image content varies. In [37] the authors do not change the neighborhood size (which is critical to achieving optimality) but rather use image gradient information to increase the NLM weights along edges and decrease them across edges. This method is equivalent to modifying the threshold parameter t_n to force NLM to assign higher weights to edge-like neighborhoods. Unfortunately, this technique does not reduce the bias that renders NLM suboptimal. Finally, data-driven optimality criteria have been considered in [39–41], where the authors derive lower bounds for the performance of denoising algorithms. However, the analyses provided in these papers are not fully rigorous and do not cover the performance of NLM for images with sharp edges.

7. Discussion and future directions

We have introduced and analyzed a framework for anisotropic nonlocal means (ANLM) denoising. Similar to NLM, ANLM exploits nonlocal information in estimating the pixel values. However, unlike NLM, ANLM uses anisotropic, oriented neighborhoods that can exploit the smoothness of edge contours. This enables ANLM to outperform NLM both theoretically and empirically. In fact, the performance of ANLM is within a logarithmic factor of optimal as measured by the minimax rate on the Horizon class of images. Numerous questions remain open for future research. From the theoretical perspective, the risk analysis of GANLM, the application to noise models beyond Gaussian, and the extension to three dimensions and beyond (for seismic, medical, and other data) pose interesting research challenges. From the practical perspective, the question of how to best tune ANLM to match the nuanced edges and textures of real-world images remains open, since we have considered only simple binary images here. Finally, while NLM is no longer the state-of-the-art denoising algorithm, it is a key building block in several top-performing algorithms.
It would be interesting to see whether anisotropy pays off as handily for those algorithms as it does for NLM.

8. Acknowledgements

This work was supported by the grants NSF CCF-0431150, CCF-0926127, and CCF-1117939; DARPA/ONR N66001-11-C-4092 and N66001-11-1-4090;
ONR N00014-08-1-1112, N00014-10-1-0989, and N00014-11-1-0714; AFOSR FA9550-09-1-0432; ARO MURI W911NF-07-1-0185 and W911NF-09-1-0383; and the TI Leadership University Program.

Appendix A. Minimax risk of the mean filter

In this appendix, we obtain the decay rate of the risk of the mean filter. Our proof is similar to the proof in [14]. Nevertheless, we repeat the proof here for the sake of completeness and since our continuous framework is slightly different from the discrete framework in [14]. See [8] for the generalization of this result. The classical mean filter estimates the image via
\[
\hat{f}^{MF}(t_1, t_2) \;=\; \frac{1}{\Delta^2} \int_{\tau_1 = -\Delta/2}^{\Delta/2} \int_{\tau_2 = -\Delta/2}^{\Delta/2} dY(t_1 - \tau_1,\, t_2 - \tau_2),
\]
where ∆ specifies the size of the window over which averaging takes place.

Theorem 5 ([14]). If f̂^{MF} is the estimate of the mean filter, then
\[
\inf_{\Delta} \sup_{f \in H^\alpha(C)} R(f, \hat{f}^{MF}) = \Theta(\sigma^{2/3}).
\]
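Before the proof, it may help to see the estimator in a discrete form. The sketch below replaces the continuous observation dY by a noisy pixel array and ∆ by an odd window width; the helper name `mean_filter` and the zero-padding boundary rule are our assumptions for illustration, not part of the paper.

```python
import numpy as np

def mean_filter(y, width):
    """Average y over a width-by-width window centered at each pixel.

    Discrete analogue of the continuous mean filter above: the window
    plays the role of the Delta x Delta averaging region. Zero padding
    at the boundary is an arbitrary choice for this sketch.
    """
    pad = width // 2
    yp = np.pad(y, pad)  # zero-pad so every pixel has a full window
    out = np.empty(y.shape, dtype=float)
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            out[i, j] = yp[i:i + width, j:j + width].mean()
    return out
```

Windows that stay on one side of an edge average a constant region, so they are unbiased and shrink the noise variance by the window area; windows straddling the edge incur the bias that drives the σ^{2/3} tradeoff analyzed next.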
Proof. We first derive a lower bound for R(f, f̂^{MF}). Consider a function f_h(t_1, t_2) with h(t_1) = 1/2 (recall Figure 2) and define Q_∆ = {(t_1, t_2) | 1/2 − ∆ < t_2 < 1/2 + ∆}. It is straightforward to confirm that if (t_1, t_2) ∈ Q_{∆/4}, then |f_h(t_1, t_2) − E f̂^{MF}(t_1, t_2)| > 1/4. Therefore, if Bias(f̂^{MF}) denotes the bias of the mean filter estimator, then we have
\[
\mathrm{Bias}^2(\hat{f}^{MF}) = \int_S |f_h(t_1,t_2) - \mathbb{E}\hat{f}^{MF}(t_1,t_2)|^2 \, dt_1\, dt_2
\;\ge\; \int_{Q_{\Delta/4}} |f_h(t_1,t_2) - \mathbb{E}\hat{f}^{MF}(t_1,t_2)|^2 \, dt_1\, dt_2 \;\ge\; \frac{1}{32}\Delta.
\tag{A.1}
\]
Now consider a point (t_1, t_2) ∈ S\Q_∆. We have
\[
\mathbb{E}\big(\hat{f}^{MF}(t_1,t_2) - \mathbb{E}\hat{f}^{MF}(t_1,t_2)\big)^2
= \int_{\tau_1=-\Delta/2}^{\Delta/2} \int_{\tau_2=-\Delta/2}^{\Delta/2} \int_{\tau_1'=-\Delta/2}^{\Delta/2} \int_{\tau_2'=-\Delta/2}^{\Delta/2}
\frac{\mathbb{E}\big(dW(t_1-\tau_1,\, t_2-\tau_2)\, dW(t_1-\tau_1',\, t_2-\tau_2')\big)}{\Delta^4}
= \frac{\sigma^2}{\Delta^2}.
\]
Therefore, if Var(f̂^{MF}) is the variance of the mean filter estimator, then we have
\[
\mathrm{Var}(\hat{f}^{MF}) \;\ge\; \int_{S \setminus Q_{4\Delta}} \mathbb{E}\big(\hat{f}^{MF}(t_1,t_2) - \mathbb{E}\hat{f}^{MF}(t_1,t_2)\big)^2 \, dt_1\, dt_2 \;=\; \frac{\sigma^2}{\Delta^2}(1 - 4\Delta).
\tag{A.2}
\]
By combining (A.1) and (A.2) we obtain the lower bound σ²/∆² + ∆/32 − 4σ²/∆ for the risk of the mean filter estimator. If we minimize this lower bound over ∆, then we obtain the lower bound Θ(σ^{2/3}) for the risk.
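The minimization over ∆ can be sanity-checked numerically. The sketch below minimizes the two dominant terms σ²/∆² + ∆/32 of the lower bound on a grid (dropping the lower-order −4σ²/∆ term); the first-order condition gives ∆³ = 64σ², i.e., ∆ = 4σ^{2/3}, matching the Θ(σ^{2/3}) rate:

```python
import numpy as np

def lower_bound(delta, sigma):
    # dominant terms of the mean-filter risk lower bound:
    # variance term sigma^2/Delta^2 plus squared-bias term Delta/32
    return sigma**2 / delta**2 + delta / 32.0

def best_delta(sigma, num=200_000):
    # brute-force minimizer over a log-spaced grid of window sizes
    deltas = np.logspace(-6, 0, num)
    return deltas[np.argmin(lower_bound(deltas, sigma))]

if __name__ == "__main__":
    for sigma in (1e-2, 1e-3, 1e-4):
        # the ratio best_delta / sigma^(2/3) should hover near 4
        print(sigma, best_delta(sigma) / sigma ** (2.0 / 3.0))
```

The grid minimizer tracks the analytic optimum ∆ = 4σ^{2/3} across two decades of noise levels, which is precisely the window-size scaling behind the σ^{2/3} minimax rate.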
Now, we derive an upper bound for the risk. Define the region R_∆ = {(t_1, t_2) | h(t_1) − ∆ ≤ t_2 ≤ h(t_1) + ∆}. It is straightforward to confirm that, if (t_1, t_2) ∈ S\R_∆, then the ∆-neighborhood of this point does not intersect the edge contour. Therefore, the bias of the estimator over this region is zero and the variance is σ²/∆². If we bound the risk of the points in the R_∆ region by 1, we obtain the following upper bound for the risk:
\[
R(f, \hat{f}^{MF}) \;\le\; \frac{\sigma^2}{\Delta^2} + 2\Delta.
\]
Again, by optimizing the upper bound over ∆ we obtain the upper bound Θ(σ^{2/3}). This completes the proof.

Appendix B. Minimax risk of NLM

Corollary 1 states the following upper bound for the risk of NLM:
\[
\sup_{f \in H^\alpha(C)} R(f, \hat{f}^N) = O(\sigma|\log\sigma|).
\]
In this section we prove that the risk of NLM is lower bounded by Ω(σ). The proof is similar to the proof of Theorem 5 in [8]; we consider the function f_h(t_1, t_2) for h(t_1) = 1/2 and prove that the bias of NLM on (t_1, t_2) ∈ Q_{δ/n} is Θ(1) for any choice of the threshold parameter. However, in the continuous setting considered here, the steps are more challenging. Following [8], we consider the semi-oracle NLM algorithm (SNLM). The semi-oracle δ-distance is defined as
\[
\tilde{d}^2_\delta\big(dY(t_1,t_2),\, dY(s_1,s_2)\big)
= \frac{1}{n^2-1}\Big( \|x^\delta_{t_1,t_2} - y^\delta_{s_1,s_2}\|_2^2 - |x^\delta_{t_1,t_2}(0,0) - y^\delta_{s_1,s_2}(0,0)|^2 \Big),
\]
where
\[
x^\delta_{t_1,t_2}(j_1,j_2) = \int_{(s_1,s_2) \in I^{\delta/n}_{j_1,j_2}} f(s_1,s_2)\, ds_1\, ds_2 .
\]
SNLM then estimates the weights according to
\[
w^S_{t_1,t_2}(s_1,s_2) = \begin{cases}
1 & \text{if } \tilde{d}^2_\delta\big(dY(t_1,t_2),\, dY(s_1,s_2)\big) \le \dfrac{n^2\sigma^2}{\delta^2} + \tau_\sigma, \\[4pt]
0 & \text{otherwise.}
\end{cases}
\]
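In a discrete setting, the weight rule can be sketched as follows. Here patches are n × n pixel arrays, the noiseless patch x stands in for the semi-oracle side, and the per-pixel noise variance σ² replaces the continuous normalization n²σ²/δ²; the function name `snlm_weight` and this normalization are our assumptions for illustration.

```python
import numpy as np

def snlm_weight(x_patch, y_patch, sigma, tau):
    """0/1 semi-oracle weight: threshold a normalized patch distance.

    x_patch: noiseless patch around (t1, t2) (the oracle side);
    y_patch: noisy patch around (s1, s2). As in the semi-oracle
    delta-distance, the center-pixel term is subtracted out.
    """
    n = x_patch.shape[0]
    c = n // 2  # index of the center pixel
    d2 = (np.sum((x_patch - y_patch) ** 2)
          - (x_patch[c, c] - y_patch[c, c]) ** 2) / (n ** 2 - 1)
    # keep the pixel if the distance is within the noise floor plus slack
    return 1.0 if d2 <= sigma ** 2 + tau else 0.0
```

For matching patches d2 concentrates near σ², so the threshold σ² + τ_σ accepts them with high probability, while patches taken across the edge have d2 near the squared contrast and are rejected.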
It is clear that the distance estimates of the SNLM algorithm are more accurate than those of the NLM algorithm. Hence SNLM outperforms NLM, and a lower bound that holds for SNLM will hold for NLM as well. We make the following mild assumptions on the parameters of SNLM.

A1: The window size δ → 0 as σ → 0.

A2: δ²/n² = Ω(σ²). Otherwise E d̃²_δ(dY(t_1,t_2), dY(s_1,s_2)) → ∞.

A3: n → ∞ as σ → 0. This ensures that, if two points (t_1,t_2) and (s_1,s_2) have the same neighborhoods, then w_{t_1,t_2}(u_1,u_2) = 1 with high probability.

A4: If d̃(f(t_1,t_2), f(s_1,s_2)) > 1/4, then P(w^S_{t_1,t_2}(s_1,s_2) = 1) = o(σ³).

Again, consider the function f_h(t_1,t_2) for h(t_1) = 1/2 (recall Figure 2). Let (t_1,t_2) ∈ Q_{δ/n}. For notational simplicity we use w(s_1,s_2) instead of w^S_{t_1,t_2}(s_1,s_2).
Lemma 9. If |s_1 − t_1| > δ/2 and |s_1' − t_1| > δ/2, then P(w^S_{t_1,t_2}(s_1,s_2) = 1) = P(w^S_{t_1,t_2}(s_1',s_2) = 1) for any t_1, t_2, s_1, s_2, and s_1'.

The proof is straightforward and hence skipped here. Now consider a point (t_1,t_2) ∈ G_{δ/n}.

Lemma 10. If |s − t_1| > δ/2, then for u < δ/4 we have
\[
\mathbb{P}\big(w^S_{t_1,t_2}(s,\, t_2 - u) = 1\big) = \mathbb{P}\big(w^S_{t_1,t_2}(s,\, t_2 + u) = 1\big).
\]
This lemma is a straightforward application of symmetry, and hence we skip the proof.

Lemma 11. Suppose that δ and τ_σ satisfy A1–A4. Then we have
\[
\mathbb{P}\left( \int_{S \setminus Q_{\delta/2}} w(s_1,s_2)\, ds_1\, ds_2 > \sigma^2 \right) = o(\sigma).
\]
Proof. Define B = {(s_1,s_2) ∈ S\Q_{δ/2} | w(s_1,s_2) = 1} and the event G = {λ(B) ≥ σ²}. Using the Markov inequality and Assumption A4, it is straightforward to show that
\[
\mathbb{P}(G) = \mathbb{P}\left( \int_{S \setminus Q_{\delta/2}} w(u,v)\, du\, dv > \sigma^2 \right) = o\big(\sigma\,\lambda(S \setminus Q_{\delta/2})\big) = o(\sigma).
\]
On the other hand, if λ(B) < σ², then
\[
\int_{S \setminus Q_{\delta/2}} w(u,v)\, du\, dv = O(\sigma^2).
\]
This completes the proof. 39
Proposition 1. Let (t_1,t_2), (s_1,s_2) ∈ Q_{δ/n}. Then there exists p_0 > 0 independent of σ such that for every t_n, δ we have P(w^n_{t_1,t_2}(u,v) = 1) > p_0.

Proof. We need to demonstrate that regions above and below the edge contour are included in the NLM algorithm with a constant nonzero probability p_0. First, we use the definition of the weights of the NLM algorithm to determine the probability that w(u,v) = 1:
\[
\mathbb{P}\big(w^n_{t_1,t_2}(s_1,s_2) = 1\big)
= \mathbb{P}\left( \tilde{d}^2_\delta\big(dY(t_1,t_2),\, dY(s_1,s_2)\big) < \frac{n^2\sigma^2}{\delta^2} + \tau_\sigma \right)
\]
\[
= \mathbb{P}\left( \frac{1}{n^2-1} \sum_\ell \sum_p s_{\ell,p}^2 - \frac{n^2\sigma^2}{\delta^2} - \frac{1}{n^2-1} \sum_\ell s_{\ell,0}^2 \;\le\; -\frac{1}{n} + \tau_\sigma \right)
\;\ge\; \mathbb{P}\left( \frac{1}{n^2-1} \sum_\ell \sum_p s_{\ell,p}^2 - \frac{n^2\sigma^2}{\delta^2} - \frac{1}{n^2-1} \sum_\ell s_{\ell,0}^2 \;\le\; -\frac{1}{n} \right),
\]
where s_{ℓ,p} = z^{0,δ,δ}_{s_1,s_2}(ℓ,p). Using the Berry–Esseen central limit theorem for independent non-identically distributed random variables [42], we can easily bound this probability away from zero. For more details, see Proposition 1 in [8]. We now consider the weights for the regions above and below the edge contour separately. Therefore, we define
\[
Q^1_\Delta = \{(t_1,t_2) \mid 1/2 < t_2 < 1/2 + \Delta/4\}, \qquad
Q^2_\Delta = \{(t_1,t_2) \mid 1/2 - \Delta/4 < t_2 < 1/2\},
\]
\[
\Omega^1_{\Delta,\delta} = \{(t_1,t_2) \mid 1/2 < t_2 < 1/2 + \Delta/4,\; 0 < t_1 < 2\delta\}, \qquad
\Omega^2_{\Delta,\delta} = \{(t_1,t_2) \mid 1/2 - \Delta/4 < t_2 < 1/2,\; 0 < t_1 < 2\delta\}.
\]
Let
\[
p_{u,\sigma} = \mathbb{P}\big(w^n_{(t_1,t_2)}(u,v) = 1\big).
\]
Note that according to Lemma 9 this probability does not depend on v. Lemma 12. If w(u, v) denotes the weights used in the NLM algorithm, then
we have
\[
\mathbb{P}\left( \left| \int_{Q^1_\Delta} w(u,v)\, du\, dv - \int_{Q^1_\Delta} p_{u,\sigma}\, du \right| > 2\epsilon + 2\sqrt{\frac{|\log\sigma|}{\delta}}\, \Delta\delta \right) = O\!\left(\frac{\sigma^4}{\epsilon}\right),
\]
\[
\mathbb{P}\left( \left| \int_{Q^2_\Delta} w(u,v)\, du\, dv - \int_{Q^2_\Delta} p_{u,\sigma}\, du \right| > 2\epsilon + 2\sqrt{\frac{|\log\sigma|}{\delta}}\, \Delta\delta \right) = O\!\left(\frac{\sigma^4}{\epsilon}\right).
\]
Proof. The weights w(u,v) over the pixelated neighborhoods can be rewritten in the following way:
\[
\int_{Q^1_\Delta} w(u,v)\, du\, dv = \sum_{k=1}^{1/(2\delta)} \int_{\Omega^1_\Delta} w(u + 2k\delta,\, v)\, du\, dv .
\]
Applying the Hoeffding inequality, it is straightforward to confirm that
\[
\mathbb{P}\left( \left| 2\delta \sum_{k=1}^{1/(2\delta)} w(u + 2k\delta,\, v) - p_{u,\sigma} \right| > t \right) \le 2e^{-2\delta t^2}.
\]
Now, defining
\[
\Gamma^1_\Delta = \left\{ (u,v) \in \Omega^1_\Delta \;\middle|\; \left| 2\delta \sum_{k=1}^{1/(2\delta)} w(u + 2k\delta,\, v) - p_{u,\sigma} \right| < t \right\},
\]
we see that
\[
\mathbb{P}\big(\lambda(\Omega^1_\Delta) - \lambda(\Gamma^1_\Delta) > \epsilon\big)
= \mathbb{P}\left( \int_{(u,v) \in \Omega^1_\Delta} \big(1 - \mathbb{I}((u,v) \in \Gamma^1_\Delta)\big)\, du\, dv > \epsilon \right)
\le \frac{2e^{-2\delta t^2}}{\epsilon},
\]
where the last step is due to Markov's inequality.
41
If λ(Ω²_∆) − λ(Γ²_∆) < ε, then we have that
\[
\left| \int_{Q^1_\Delta} w(u,v)\, du\, dv - \int_{Q^1_\Delta} p(u,\sigma)\, du\, dv \right|
\le \left| \int_{Q^1_\Delta} w(u,v)\, du\, dv - \int_{\Gamma^1_\Delta} w(u,v)\, du\, dv \right|
\]
\[
+ \left| \int_{\Gamma^1_\Delta} w(u,v)\, du\, dv - \int_{\Gamma^1_\Delta} p(u,\sigma)\, du\, dv \right|
+ \left| \int_{\Gamma^1_\Delta} p(u,\sigma)\, du\, dv - \int_{Q^1_\Delta} p(u,\sigma)\, du\, dv \right|
\le 2\epsilon + 2t\Delta\delta.
\]
Setting t = √(|log σ|/δ) completes the proof.

Theorem 6. Suppose that δ, τ_σ, and n satisfy Assumptions A1–A4. Then the risk of SNLM is
\[
\inf_{\delta_n, t_n} \sup_{f \in H^\alpha(C)} R(f, \hat{f}^S) = \Omega(\sigma).
\]
Proof. Let (t_1,t_2) ∈ Ω²_{δ/n}. We will calculate the bias of the NLM estimator at this point. We have
\[
\mathbb{E}\left( f(t_1,t_2) - \frac{\int_S w(u,v) X(u,v)\, du\, dv}{\int_S w(u,v)\, du\, dv} \right)^2
\ge \left( \mathbb{E}\left[ f(t_1,t_2) - \frac{\int_S w(u,v) X(u,v)\, du\, dv}{\int_S w(u,v)\, du\, dv} \right] \right)^2,
\]
which leads us toward the lower bound
\[
\mathbb{E}\left( \frac{\int_S w(u,v) X(u,v)\, du\, dv}{\int_S w(u,v)\, du\, dv} \right)
\ge \mathbb{E}\left( \frac{\int_S w(u,v) X(u,v)\, du\, dv}{\int_S w(u,v)\, du\, dv} \,\middle|\, F \cap G \right) \mathbb{P}(F \cap G)
\]
\[
\ge \mathbb{E}\left( \frac{\int_{Q_{\delta/4}} w(u,v) X(u,v)\, du\, dv}{\sigma^2 + \int_{Q_{\delta/4}} p(u,v)\, du\, dv + 2\epsilon + t\Delta\delta} \,\middle|\, F \cap G \right) \mathbb{P}(F \cap G)
\ge \mathbb{E}\left( \frac{\int_{Q_{\delta/4}} w(u,v) X(u,v)\, du\, dv}{\sigma^2 + \int_{Q_{\delta/4}} p(u,v)\, du\, dv + 2\epsilon + t\Delta\delta} \right) - \mathbb{P}(F^c \cup G^c)
\]
\[
= \mathbb{E}\left( \frac{\int_{Q_{\delta/4}} w(u,v) f(u,v)\, du\, dv}{\sigma^2 + \int_{Q_{\delta/4}} p(u,v)\, du\, dv + 2\epsilon + t\Delta\delta} \right) - \mathbb{P}(F^c \cup G^c)
= \frac{1}{2}\, \mathbb{E}\left( \frac{\int_{Q_{\delta/4}} w(u,v)\, du\, dv}{\sigma^2 + \int_{Q_{\delta/4}} p(u,v)\, du\, dv + 2\epsilon + t\Delta\delta} \right) - \mathbb{P}(F^c \cup G^c)
\]
\[
\ge \frac{1}{2} \cdot \frac{\int_{Q_{\delta/4}} p(u,v)\, du\, dv - 2\epsilon - t\Delta\delta}{\sigma^2 + \int_{Q_{\delta/4}} p(u,v)\, du\, dv + 2\epsilon + t\Delta\delta} - \mathbb{P}(F^c \cup G^c).
\tag{B.1}
\]
The minimum of the last line is achieved when ∫_{Q¹_{δ/4}} p(u,v) du dv is minimized. However, we have proved in Proposition 1 that p(u,v) > p_0 for (u,v) ∈ Q¹_{δ/n}. Therefore, this minimizing integral is Ω(σ). When we substitute this optimal value into the lower bound (B.1), we see that the risk over this region is Θ(1). Therefore, the bias of the NLM over the entire image is Ω(δ/n) or, equivalently, Ω(σ) according to Assumption A2.

Appendix C. Proof of Theorem 1

Here we focus on the case of α = 2 and use the standard technique of hypercube construction to establish the lower bound [12]. Let φ : [0,1] → R⁺ ∪ {0} be a twice-differentiable function with ‖d²φ/dt²‖_∞ = 1, φ(0) = φ(1) = 0, dφ/dt|₀ = dφ/dt|₁ = 0, and d²φ/dt²|₀ = d²φ/dt²|₁ = 0. Define
\[
\varphi_{i,m} \;\triangleq\; C m^{-2}\, \varphi(mt - i), \qquad i = 0, 1, \ldots, m-1.
\]
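The m^{−2} normalization is exactly what keeps each bump in the Hölder ball: by the chain rule (a one-line check, with C the Hölder constant),

```latex
\frac{d^2}{dt^2}\,\varphi_{i,m}(t) \;=\; C\,m^{-2}\cdot m^{2}\,\varphi''(mt-i) \;=\; C\,\varphi''(mt-i),
\qquad\text{so}\qquad
\left\lVert \frac{d^2}{dt^2}\,\varphi_{i,m} \right\rVert_{\infty} \;=\; C\,\lVert \varphi''\rVert_{\infty} \;=\; C,
```

using ‖d²φ/dt²‖_∞ = 1; the boundary conditions on φ make the bumps glue smoothly across the cells [i/m, (i+1)/m].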
Set f0 , 1{t2