DECONVOLUTING KERNEL DENSITY ESTIMATORS
by Leonard A. Stefanski¹, Department of Statistics, North Carolina State University, Raleigh, NC 27695-8203, and Raymond J. Carroll², Department of Statistics, Texas A&M University, College Station, TX 77843
¹ Research supported by the National Science Foundation Grant DMS-8613681.
² Research supported by the Air Force Office of Scientific Research Contract AFOSR-F-49620-85-C-0144.
SUMMARY

This paper considers estimation of a continuous bounded probability density when observations from the density are contaminated by additive measurement errors having a known distribution. Properties of the estimator obtained by deconvolving a kernel estimator of the observed data are investigated. When the kernel used is sufficiently smooth, the deconvolved estimator is shown to be pointwise consistent and bounds on its integrated mean squared error are derived. Very weak assumptions are made on the measurement-error density, thereby permitting a comparison of the effects of different types of measurement error on the deconvolved estimator.
Some key words: Bandwidth selection; Convolution; Deconvolution; Density estimation; Kernel; Measurement error.
1. INTRODUCTION

1.1 The deconvolution problem
Let U and Z be independent random variables with probability density functions g and h respectively. Then the random variable X = U + Z has the density $f = g * h$, where "$*$" denotes convolution. Assuming h is known, we consider estimating g from a set of independent observations $\{X_1, \ldots, X_n\}$ having the common density f. The problem arises whenever data are measured with nonnegligible error and knowledge of g is desired. In this case U represents a "true" value, Z is an error of measurement, and X is an observed value.
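As a purely illustrative rendering of this setup (not part of the paper), the following sketch simulates the contamination model; the choices of g (standard normal) and h (standard Laplace) are hypothetical, made only so the example runs.

```python
# Toy simulation of the measurement-error model X = U + Z.
import numpy as np

rng = np.random.default_rng(42)
n = 1000
U = rng.standard_normal(n)            # "true" values, U ~ g (here N(0,1))
Z = rng.laplace(scale=1.0, size=n)    # measurement errors, Z ~ h (known)
X = U + Z                             # observed data, with density f = g*h

# Convolution inflates spread: var(X) = var(U) + var(Z) = 1 + 2 = 3,
# so a density estimate built from X alone is a smeared-out version of g.
print(X.var())
```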
An application is discussed in Mendelsohn & Rice (1983); other applications and related work can be found in Eddington (1913), Trumpler & Weaver (1953), Kahn (1955), Gaffey (1959), Wise et al. (1977), and Devroye & Wise (1979). Also, the deconvolution problem can be cast in the format of an empirical Bayes problem wherein g represents the prior distribution for a sequence of location parameters. Estimation of prior distributions or mixing distributions has been studied by several authors, for example, Blum & Susarla (1977), Choi & Bulgren (1968), Deely & Kruse (1968), Maritz (1967) and Preston (1971).
The estimator proposed in Section 1.2 is a transformation of a kernel density estimator of f. Some papers in which a transformation of a density estimator is of primary interest include Taylor (1982) and an unpublished manuscript by P. Hall and R. L. Smith (1986) titled "Unfolding a Nonparametric Density Estimate." Although many of the estimators proposed in the literature have been shown to be consistent, less is known about their rates of convergence.
The estimator we propose has the advantage of being analytically and, in some cases, computationally no more complex than an ordinary kernel density estimator, thus facilitating a discussion of its convergence properties. However, a price is paid for the reduction in complexity in that the resulting estimator is not range preserving, i.e., it assumes negative values with positive probability. Throughout we make very weak assumptions on h, and this allows us to assess the effects of different types of measurement error.
A conclusion indicated by the analysis is that the difficulty of the deconvolution problem varies inversely with the smoothness of the measurement-error density. Thus, while normality has typically been the "assumption of convenience" for error distributions in many statistical models, it is an inconvenient assumption in the deconvolution problem. Some conditions on h are necessary to insure that g is identifiable.
We assume that h has a nonvanishing characteristic function, $\phi_h$, i.e.,

$$|\phi_h(t)| > 0 \quad \text{for all real } t. \qquad (1.1)$$

Although (1.1) is not the weakest assumption insuring identifiability of g, it holds in many cases of interest, and in particular at the normal model.

1.2 The estimator
Let K be a bounded even probability density function whose characteristic function, $\phi_K$, satisfies, for each fixed $\lambda > 0$,

$$\phi_K(t)/\phi_h(t/\lambda) \ \text{is integrable}, \qquad (1.2)$$

which in turn implies that $\phi_K$ is invertible, i.e.,

$$K(x) = (2\pi)^{-1} \int e^{-itx} \phi_K(t)\, dt. \qquad (1.3)$$
Let $\hat f$ be an ordinary kernel density estimator of f based on the kernel K, i.e.,

$$\hat f(x) = (n\lambda)^{-1} \sum_{j=1}^{n} K\!\left(\frac{X_j - x}{\lambda}\right). \qquad (1.4)$$
The characteristic function of $\hat f$ is denoted $\phi_{\hat f}$ and satisfies $\phi_{\hat f}(t) = \hat\phi(t)\,\phi_K(\lambda t)$, where $\hat\phi$ is the empirical characteristic function. Under (1.2), $\phi_{\hat f}/\phi_h$ is an integrable function and therefore possesses a Fourier transform.
The estimator we propose is $\hat g$, given by

$$\hat g(x) = (2\pi)^{-1} \int e^{-itx}\, \phi_{\hat f}(t)/\phi_h(t)\, dt. \qquad (1.5)$$

Define the function $K_\lambda^*$ as

$$K_\lambda^*(t) = (2\pi)^{-1} \int e^{-ist}\, \phi_K(s)/\phi_h(s/\lambda)\, ds; \qquad (1.6)$$

then $\hat g$ has the representation

$$\hat g(x) = (n\lambda)^{-1} \sum_{j=1}^{n} K_\lambda^*\!\left(\frac{X_j - x}{\lambda}\right). \qquad (1.7)$$

Properties of $\hat g$ are best understood in terms of the properties of $K_\lambda^*$ and the latent variables $(U_j, Z_j)$, $j = 1, \ldots, n$.
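To make (1.5)-(1.7) concrete, here is a minimal numerical sketch (not from the paper). It assumes standard Laplace measurement error, takes $\phi_K$ to be the triangular characteristic function (the m = 1 member of the family suggested at the end of this section), and evaluates the inversion integral by a Riemann sum over the support of $\phi_K$; the function names and tuning values are illustrative.

```python
# Sketch of the deconvoluting kernel density estimator (1.5)/(1.7).
import numpy as np

def phi_K(s):
    """Triangular characteristic function; vanishes off [-2, 2]."""
    return np.maximum(1.0 - np.abs(s) / 2.0, 0.0)

def phi_h(s):
    """Characteristic function of the standard Laplace error density."""
    return 1.0 / (1.0 + s**2)

def g_hat(x, X, lam, m=2001):
    """Evaluate (1.5) on a grid x: invert the empirical characteristic
    function, damped by phi_K(lam*t) and divided by phi_h(t)."""
    s = np.linspace(-2.0, 2.0, m)                # s = lam * t
    ds = s[1] - s[0]
    ecf = np.exp(1j * np.outer(s / lam, X)).mean(axis=1)   # phi_hat(s/lam)
    weight = ecf * phi_K(s) / phi_h(s / lam)
    vals = np.exp(-1j * np.outer(np.atleast_1d(x), s / lam)) * weight
    return np.real(vals.sum(axis=1)) * ds / (2.0 * np.pi * lam)

# Demo: recover a N(0,1) density from Laplace-contaminated observations.
rng = np.random.default_rng(0)
X = rng.standard_normal(500) + rng.laplace(scale=1.0, size=500)
print(g_hat(np.linspace(-3, 3, 7), X, lam=0.5).round(3))
```

One could equivalently tabulate $K_\lambda^*$ from (1.6) and use the representation (1.7); the empirical-characteristic-function form above simply avoids recomputing the integral for every pair $(x, X_j)$.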
Equation (1.2) implies that $|K_\lambda^*|$ is bounded; thus $|\hat g|$ is also bounded and its expectation necessarily exists. Furthermore, it can be established that

$$E\left\{K_\lambda^*\!\left(\frac{X - x}{\lambda}\right) \Bigm| U\right\} = K\!\left(\frac{U - x}{\lambda}\right). \qquad (1.8)$$

From (1.8) it follows that

$$E\{\hat g(x)\} = \int \lambda^{-1} K\!\left(\frac{u - x}{\lambda}\right) g(u)\, du. \qquad (1.9)$$

Thus $\hat g$ has the same bias as an ordinary kernel density estimator.
Formally, this is a consequence of the fact that the linear operations of expectation and Fourier transformation commute. Furthermore, if $\tilde g$ is the kernel "estimator"

$$\tilde g(x) = (n\lambda)^{-1} \sum_{j=1}^{n} K\!\left(\frac{U_j - x}{\lambda}\right),$$

then it follows from (1.8) that $E(\hat g(x) \mid U_1, \ldots, U_n) = \tilde g(x)$. Thus, conditionally, $\hat g$ can be viewed as an unbiased estimator of $\tilde g$.
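Property (1.8) is easy to check by simulation. The sketch below (illustrative, not from the paper) uses the closed form for standard Laplace errors given at the end of this section, $K_\lambda^*(t) = K(t) - \lambda^{-2}K''(t)$, with a Gaussian kernel K, and verifies that averaging the deconvoluting kernel over the error distribution recovers the ordinary kernel at the latent value; all numerical settings are hypothetical.

```python
# Monte Carlo check of (1.8): E{ K*((U + Z - x)/lam) | U } = K((U - x)/lam).
import numpy as np

def K(t):                        # standard normal kernel
    return np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

def K_star(t, lam):              # K - K''/lam^2, using K''(t) = (t^2 - 1) K(t)
    return K(t) * (1.0 - (t**2 - 1.0) / lam**2)

rng = np.random.default_rng(1)
lam, u, x = 0.5, 0.8, 0.5        # hypothetical bandwidth and points
Z = rng.laplace(scale=1.0, size=1_000_000)
lhs = K_star((u + Z - x) / lam, lam).mean()    # left side of (1.8), by MC
rhs = K((u - x) / lam)                         # right side of (1.8)
print(lhs, rhs)                  # the two values should nearly agree
```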
The fact that $\phi_K$ is even and real implies that $K_\lambda^*$ is real, and thus $\hat g$ is also. When h is even, $K_\lambda^*$ is even. If $\phi_K/\phi_h(\cdot/\lambda)$ possesses m continuous integrable derivatives, then it follows from the Riemann-Lebesgue Lemma that $K_\lambda^*(t) = o(|t|^{-m})$ as $|t|$ diverges; and for $m \ge 2$ this means that $K_\lambda^*$ and hence $\hat g$ are integrable. Furthermore, in this case the Fourier Inversion Formula indicates that

$$\phi_K(\lambda y)/\phi_h(-y) = \lambda^{-1} \int e^{-ity} K_\lambda^*(t/\lambda)\, dt, \qquad (1.10)$$

and thus $\int K_\lambda^*(t)\, dt = 1$, implying that $\int \hat g(x)\, dx = 1$.
Since $K_\lambda^*$ has many properties of an ordinary kernel we call it a deconvoluting kernel. The one property it lacks is nonnegativity; the left hand side of (1.10) is not positive definite, thus $K_\lambda^*$ cannot be a true probability density. Since, conditioned on $U_1, \ldots, U_n$, $\hat g$ is an unbiased estimator of $\tilde g$, this problem can be viewed as a failure of unbiased estimators to be range preserving. In summary, provided $\phi_K/\phi_h(\cdot/\lambda)$ is smooth and (1.2) holds, $\hat g$ is continuous, real-valued and integrable with $\int \hat g(x)\, dx = 1$.
Although the severity of (1.2) depends on $\phi_h$, it is always possible to satisfy these conditions when (1.1) holds by choosing $\phi_K$ so that it vanishes outside a finite interval. For example, we can take $\phi_K$ proportional to $U^{(2m)}$, where $U^{(2m)}$ is the 2m-fold convolution of the uniform density, $(1/2)I(|x| \le 1)$, with itself. The corresponding density is proportional to $\{\sin(t)/t\}^{2m}$. When $m \ge 2$, $U^{(2m)}$ has two continuous integrable derivatives and the smoothness conditions on $\phi_K/\phi_h(\cdot/\lambda)$ are obtained provided $\phi_h$ is sufficiently smooth.
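The point of this band-limited construction is that (1.2) then holds for any $\phi_h$ satisfying (1.1), normal errors included, since the integrand in (1.6) is confined to a finite interval. A small numerical sketch (illustrative choices throughout) computes $K_\lambda^*$ for standard normal errors with the triangular (m = 1) characteristic function:

```python
# Deconvoluting kernel for standard normal errors with a band-limited phi_K.
import numpy as np

def phi_K(s):
    """Triangular characteristic function; vanishes off [-2, 2]."""
    return np.maximum(1.0 - np.abs(s) / 2.0, 0.0)

def K_star_normal(t, lam, m=4001):
    s = np.linspace(-2.0, 2.0, m)
    ds = s[1] - s[0]
    ratio = phi_K(s) * np.exp((s / lam)**2 / 2.0)   # phi_K(s)/phi_h(s/lam)
    vals = np.cos(np.outer(t, s)) * ratio           # even integrand: cosine form
    return vals.sum(axis=1) * ds / (2.0 * np.pi)

t = np.linspace(-6.0, 6.0, 7)
for lam in (1.0, 0.7, 0.5):
    print(lam, K_star_normal(t, lam).round(3))
```

The factor $\exp\{(s/\lambda)^2/2\}$ in the integrand, and the growing negative side lobes of the computed kernel, foreshadow the $e^{B^2/\lambda^2}$ variance growth derived in Section 2.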
If $\phi_h$ is not smooth then $\hat g$ need not be integrable, although it will still be bounded and square integrable. For certain measurement-error distributions (1.6) has a closed-form expression. For example, when $h(x) = (1/2)\exp(-|x|)$, $K_\lambda^*(t) = K(t) - \lambda^{-2}K''(t)$; in fact the integral in (1.6) can be evaluated analytically whenever $1/\phi_h$ is a polynomial. Unfortunately, it does not seem possible to obtain $K_\lambda^*$ in closed form for the normal measurement-error model.
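To illustrate (a sketch rather than anything worked out in the paper): taking K to be the standard normal density, so that $K''(t) = (t^2 - 1)K(t)$ and (1.2) holds for the Laplace $\phi_h$, the representation (1.7) becomes fully explicit.

```python
# Closed-form deconvoluting KDE for standard Laplace measurement error,
# with a Gaussian kernel: K*(t) = K(t) * (1 - (t^2 - 1)/lam^2).
import numpy as np

def g_hat_laplace(x, X, lam):
    t = (X[None, :] - np.atleast_1d(x)[:, None]) / lam
    K = np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
    K_star = K * (1.0 - (t**2 - 1.0) / lam**2)
    return K_star.mean(axis=1) / lam             # representation (1.7)

rng = np.random.default_rng(0)
X = rng.standard_normal(1000) + rng.laplace(scale=1.0, size=1000)
grid = np.linspace(-3.0, 3.0, 7)
print(g_hat_laplace(grid, X, lam=0.6).round(3))  # compare with the N(0,1) density
```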
2. ASYMPTOTIC RESULTS
Throughout we work under the assumptions that g is continuous and bounded, and hence square integrable. Since $|\phi_K/\phi_h(\cdot/\lambda)|$ is square integrable, we have from (1.6) and Parseval's Identity

$$\int \{K_\lambda^*(t)\}^2\, dt = (2\pi)^{-1} \int \phi_K^2(t)/|\phi_h(t/\lambda)|^2\, dt. \qquad (2.1)$$
In light of (1.9) we can appeal to known results on kernel density estimators to claim that the bias of $\hat g(x)$ converges to zero as $\lambda$ goes to zero. Define

$$A(\lambda, a) = \int \{K_\lambda^*(x)\}^2 g(a + \lambda x)\, dx \left[\int \{K_\lambda^*(x)\}^2\, dx\right]^{-1}$$

and note that $A(\lambda, a)$ is bounded by $B_g$ defined as $B_g = \sup_x g(x)$. Now note that, after the change of variables $t = (z + u - x)/\lambda$ in the inner integral,

$$E\left\{K_\lambda^*\!\left(\frac{X - x}{\lambda}\right)\right\}^2 = \lambda \int A(\lambda, x - z)\, h(z)\, dz \int \{K_\lambda^*(t)\}^2\, dt \le \lambda B_g \int \{K_\lambda^*(t)\}^2\, dt. \qquad (2.2)$$
Now, since $n\lambda^2 \operatorname{var}\{\hat g(x)\}$ is bounded by the left hand side of (2.2), we find, upon appealing to (2.1), that

$$\operatorname{var}\{\hat g(x)\} \le n^{-1}\lambda^{-1} B_g \int \{K_\lambda^*(t)\}^2\, dt = (2\pi n\lambda)^{-1} B_g \int \phi_K^2(t)/|\phi_h(t/\lambda)|^2\, dt. \qquad (2.3)$$
Together, (1.9) and (2.3) can be used to obtain the following result.

THEOREM 2.1. If $\phi_K$ and $\phi_h$ are such that (1.1) and (1.2) hold and g is continuous and bounded, then $\hat g(x)$ defined by (1.5) is a consistent estimator of g(x) provided n increases and $\lambda$ decreases to zero in such a way that $(n\lambda)^{-1} \int \phi_K^2(t)/|\phi_h(t/\lambda)|^2\, dt$ converges to zero.
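A Monte Carlo illustration of the theorem (with hypothetical settings): for standard Laplace errors and the Gaussian-kernel closed form of Section 1.2, the estimator at x = 0 settles down as n grows when $\lambda \propto n^{-1/9}$, the bandwidth rate identified for this error law at the end of this section.

```python
# Consistency check for ghat(0) under Laplace errors (illustrative).
import numpy as np

def g_hat0(X, lam, x=0.0):
    t = (X - x) / lam
    K = np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)
    return (K * (1.0 - (t**2 - 1.0) / lam**2)).mean() / lam

rng = np.random.default_rng(3)
target = 1.0 / np.sqrt(2 * np.pi)      # g(0) for a N(0,1) truth
for n in (10**3, 10**4, 10**5):
    lam = n ** (-1.0 / 9.0)
    est = [g_hat0(rng.standard_normal(n) + rng.laplace(size=n), lam)
           for _ in range(50)]
    print(n, np.mean(est).round(3), np.std(est).round(3), "target", target.round(3))
```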
Next, using Parseval's Identity, (2.1) and the change of variables $t = (x - y)/\lambda$, we have as $n \to \infty$ and $\lambda \to 0$

$$\begin{aligned}
\int \operatorname{var}\{\hat g(y)\}\, dy &= n^{-1} \int \lambda^{-2} E\left\{K_\lambda^*\!\left(\frac{X - y}{\lambda}\right)\right\}^2 dy - n^{-1} \int \left[E\left\{\lambda^{-1} K_\lambda^*\!\left(\frac{X - y}{\lambda}\right)\right\}\right]^2 dy \\
&= (n\lambda)^{-1} \int \{K_\lambda^*(t)\}^2\, dt - (2\pi n)^{-1} \int |\phi_g(t)|^2 \phi_K^2(\lambda t)\, dt \\
&= (2\pi n\lambda)^{-1} \int \phi_K^2(t)/|\phi_h(t/\lambda)|^2\, dt + o\{(n\lambda)^{-1}\}.
\end{aligned} \qquad (2.4)$$
If, in addition to previous assumptions, g possesses two bounded integrable derivatives, then as $\lambda$ decreases to zero

$$\int \left[E\{\hat g(x)\} - g(x)\right]^2 dx \approx (\lambda^4/4)\, \mu_{K,2}^2 \int \{g''(x)\}^2\, dx \qquad (2.5)$$

when $\mu_{K,2} = \int y^2 K(y)\, dy$ is finite.
Combining (2.4) and (2.5), we have that to a first-order approximation

$$\int E\{\hat g(x) - g(x)\}^2\, dx = (2\pi n\lambda)^{-1} \int \phi_K^2(t)/|\phi_h(t/\lambda)|^2\, dt + (\lambda^4/4)\, \mu_{K,2}^2 \int \{g''(x)\}^2\, dx. \qquad (2.6)$$
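Since both terms of (2.6) are computable for given K, h and a hypothesized g, (2.6) suggests a simple bandwidth-selection device: minimize its right-hand side over $\lambda$. A sketch under illustrative assumptions (Gaussian kernel, standard Laplace errors, and a N(0,1) truth so that $g''$ is known):

```python
# Minimize the first-order MISE approximation (2.6) over the bandwidth.
import numpy as np

n = 1000
t = np.linspace(-8.0, 8.0, 4001); dt = t[1] - t[0]
x = np.linspace(-8.0, 8.0, 4001); dx = x[1] - x[0]

phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)     # hypothesized g = N(0,1)
bias_const = (((x**2 - 1) * phi)**2).sum() * dx  # integral of (g'')^2
mu2 = 1.0                                        # mu_{K,2} for the Gaussian K

lams = np.linspace(0.2, 1.5, 131)
mise = np.array([
    # phi_K^2(t) = exp(-t^2); 1/|phi_h(t/lam)|^2 = (1 + (t/lam)^2)^2
    (np.exp(-t**2) * (1 + (t / lam)**2)**2).sum() * dt / (2 * np.pi * n * lam)
    + lam**4 / 4 * mu2**2 * bias_const
    for lam in lams
])
print("approximately optimal lam:", lams[mise.argmin()].round(2),
      "approximate MISE:", mise.min().round(4))
```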
The first term in (2.6) can be much larger than the variance component of the integrated mean squared error of an ordinary kernel density estimator. This is the price paid for not measuring $U_1, \ldots, U_n$ precisely. The rate at which

$$V_{K,h}(\lambda) = \int \phi_K^2(t)/|\phi_h(t/\lambda)|^2\, dt \qquad (2.7)$$

diverges as $\lambda$ decreases is dictated by the tail behavior of $|\phi_h|$, which in turn is related to the smoothness of h. Suppose $\phi_K$ is strictly positive on $(-B, B)$ and vanishes off this interval.
Then, considering (2.7) when h is standard normal, Cauchy and double-exponential, we have respectively that for each $0 < \epsilon < B$ there exist positive constants $c_1, \ldots, c_5$ such that

$$c_1 e^{(B-\epsilon)^2/\lambda^2} \le V_{K,h}(\lambda) \le c_2 e^{B^2/\lambda^2} \quad \text{(normal)},$$
$$c_3 e^{2(B-\epsilon)/\lambda} \le V_{K,h}(\lambda) \le c_4 e^{2B/\lambda} \quad \text{(Cauchy)},$$
$$V_{K,h}(\lambda) \le c_5 \lambda^{-4} \quad \text{(double-exponential)}.$$
This means that in order for the variance term $(n\lambda)^{-1} V_{K,h}(\lambda)$ to converge to zero, $\lambda$ must approach zero at a rate no faster than $(\log n)^{-1/2}$ for the normal model and no faster than $(\log n)^{-1}$ for the Cauchy model, while for the double-exponential model it is sufficient to have $n\lambda^5 \to \infty$. The corresponding rates for the integrated mean squared errors are $(\log n)^{-2}$, $(\log n)^{-4}$ and $n^{-4/9}$ respectively.
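These rates are easy to observe numerically. The sketch below (illustrative) evaluates $V_{K,h}(\lambda)$ of (2.7) for the triangular $\phi_K$ (so B = 2) under the three error models; note the explosive growth in the normal case.

```python
# Growth of V_{K,h}(lam) as the bandwidth shrinks, for three error laws.
import numpy as np

s = np.linspace(-2.0, 2.0, 8001); ds = s[1] - s[0]
phi_K2 = np.maximum(1.0 - np.abs(s) / 2.0, 0.0)**2   # phi_K^2, B = 2

inv_h2 = {                        # 1/|phi_h(u)|^2 for each error model
    "normal":      lambda u: np.exp(u**2),
    "Cauchy":      lambda u: np.exp(2.0 * np.abs(u)),
    "double-exp":  lambda u: (1.0 + u**2)**2,
}
for lam in (1.0, 0.5, 0.25):
    V = {name: (phi_K2 * f(s / lam)).sum() * ds for name, f in inv_h2.items()}
    print(lam, {k: f"{v:.3g}" for k, v in V.items()})
```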
The normal, Cauchy and double-exponential densities share the same ordering with respect to their peakedness at the origin; the normal is least peaked and the double-exponential is most peaked.
The relationship between the variance of $\hat g$ and the peakedness of h is intuitively plausible if we think of peakedness as a measure of the closeness of h to a delta function, for which measurement error is identically zero and the deconvolution problem disappears. This analogy can be pushed a little further by considering a model wherein only 100p% (0
of g and the peakedness of h is intuitively plausible if we think of peakedness as a measure of the closeness of h to a delta function for which measurement error is identically zero and the deconvolution problem disappears. This analogy can be pushed a little further by considering a model wherein only 100p% (0