WEAK CONVERGENCE OF EMPIRICAL PROCESSES
T.G. Sun 1/ and R. Pyke 2/ University of Washington
ABSTRACT
In this paper, the weak convergence of empirical processes defined on a family $\mathcal{A}$ of subsets of the unit cube with smooth surfaces is obtained. The index family $\mathcal{A}$, closely related to one introduced by Dudley (1974), is studied, and it is shown in particular that the volumes of the $\varepsilon$-tubes about elements of $\mathcal{A}$ are uniformly $O(\varepsilon)$. The approach to weak convergence involves first the study of a smoothed version of the empirical processes, obtained by replacing each point mass by a uniform measure of equal mass on a small ball centered at each point. This process has continuous paths in $C(\mathcal{A})$ with respect to the Hausdorff metric. The remaining steps are to show the uniform closeness of the smoothed and unsmoothed versions and to obtain the necessary bounds on the modulus of continuity.
AMS 1970 Subject Classification. Primary 60B10; Secondary 60F05, 60G99.
Key words and phrases. Empirical processes, indexed by sets, Brownian measures, smoothed empirical processes, metric entropy, weak convergence.
FOOTNOTES 1/ Mr. Sun is presently employed with Farmers Insurance Company of Washington, Mercer Island, Washington. 2/ The research of both authors was supported in part by the National Science Foundation under Grants MCS-75-08557 and MCS-78-09858.
PREFACE
These remarks are written in order to describe the background of this paper and to explain the appearance in 1982 of results that should more properly be dated 7 years earlier. Mr. T.G. Sun, the first author of this report, was a graduate student of mine during 1974-77. During this time he worked on three problems involving multidimensionally indexed processes, namely (i) Skorokhod-type embedding results for partial-sum processes derived from random matrix arrays as in Pyke (1973), (ii) the construction of a Prokhorov-Billingsley-type metric for $D(\mathcal{A})$ that would make it complete and separable (cf. the discussion prior to Theorem 5 of this paper), and (iii) the weak convergence of empirical processes indexed by sets.
The first and third problems were discussed in preliminary form by me at the Conference on Asymptotic Methods in Statistics held in Oberwolfach, West Germany, November 10-16, 1974, and a preliminary outline of his results entitled "On the convergence of empirical processes parametrized by some class of sets with differentiable boundaries of $I^k$" was typed up by Mr. Sun in April, 1975. During the next 2 years, Mr. Sun continued his work on this and the other two problems, and on April 6, 1977 I received from him a preliminary typed draft of his proposed thesis. It consisted of 3 chapters, one each on the 3 topics mentioned above. The present paper is based as closely as possible upon Chapter I of that document, which in turn was an expanded version of his 1975 paper.
Although a few revisions still needed to be made, the final results would have represented very important and original contributions. Unfortunately, with respect to the completion of the thesis, it was at this time that Mr. Sun chose to pursue his interest in actuarial science and joined Farmers Insurance Company of Washington, Mercer Island, Washington, his present employer.
In order not to preclude prematurely the completion of the thesis, I did not make the necessary revisions until January, 1979, when I was presenting a series of lectures on Empirical Processes for the Department of Statistics at Colorado State University, whose support and facilities I gratefully acknowledge. Shortly after my return, Professor Dudley's (1978) paper arrived. Since this published paper contained a statement of Mr. Sun's result, together with an acknowledgement and a correct proof, I did not put the revision into a technical report at that time, though a preliminary typed version was prepared. The present paper is a belated completion of that typing, with only a couple of bibliographic updates inserted. The purpose for this completion is two-fold.
First, for personal and historical reasons I believe it is desirable for Mr. Sun's work to be made available in a form that is as close as possible to his own writings of 1975 and 1977. Secondly, the results and methods of the paper play a foundational role in most of my subsequent research, and since I find it helpful to reference them, it is desirable to have them available in a form that can be distributed.
Concerning the first purpose, I have endeavored to make as few changes as possible in Mr. Sun's 1977 paper. In the first two sections the changes were indeed primarily editorial. In Section 3 I have made some substantive changes, resulting in some abbreviation and clarification. The main addition is the proof of Theorem 3, which I have written out in considerable detail.
The original referred at this point to the method of Strassen and Dudley (1969) but did not provide a suitable proof.
In trying to preserve as much as possible the flavor and format of the original paper, it is inevitable that some of the changes I have made may have resulted in some unevenness in the paper's readability. It is clear that a total rewrite with 5 years of hindsight could obtain more succinctly a much more general result with essentially the same effort; e.g. more general families $\mathcal{A}$, non-identically distributed observations, weakly dependent observations, stronger metrics, among others. (I plan to prepare a general revision along these lines of generalization for publication.) However, I trust our original approach remains of value.
In it, we first smooth the empirical process by replacing the point mass of $1/n$ at each observation by a uniform measure (of the same total mass) over a small sphere centered there. The resulting smoothed process is in $C(\mathcal{A})$, to which existing Central Limit theorems were originally to have been applied. The final step would then be to obtain uniform bounds on the difference between the smoothed and original empirical processes. In 1975 and 1977 it was in order also to have a detailed treatment of the classes $I(M,\alpha,k)$, and this is given in Sections 1 and 2.
Ron Pyke
January 20, 1982
0. INTRODUCTION
Let $I^k = [0,1]^k$ be the closed unit cube in $\mathbb{R}^k$ and let $\mathcal{B}^k$ denote the family of Borel subsets of $I^k$. For $\mathcal{A} \subset \mathcal{B}^k$, let $Y_1, Y_2, \dots$ be a sequence of independent random variables on a probability space $(\Omega, \mathcal{F}, P)$ having distribution $F$ on $(I^k, \mathcal{B}^k)$. The empirical process based on $Y_1, Y_2, \dots, Y_n$ and $\mathcal{A} \subset \mathcal{B}^k$, denoted by $W_{n,\mathcal{A}}$, is defined on $\mathcal{A} \times \Omega$ by
(1.1) $W_{n,\mathcal{A}}(A,\omega) = n^{-1/2} \sum_{j=1}^{n} \{1_A(Y_j(\omega)) - F(A)\}, \quad A \in \mathcal{A},\ \omega \in \Omega,$
where $1_A$ is the indicator function of $A$. When one deals with empirical processes it is natural to consider them in this way as being indexed by sets. See Pyke [13] for a discussion of this.
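The definition (1.1) can be illustrated numerically. The following is only a minimal sketch under assumptions that are not part of the paper: $F$ is taken to be the uniform distribution on $I^2$, the index set $A$ is a closed disc, and the names `empirical_process` and `in_ball` are hypothetical helpers introduced for the illustration.

```python
import numpy as np

def empirical_process(points, indicator, prob):
    """Evaluate n^{-1/2} * sum_j (1_A(Y_j) - F(A)) as in (1.1).

    points    : array of shape (n, k), the observations Y_1, ..., Y_n
    indicator : function mapping an (n, k) array to a boolean array (1_A)
    prob      : F(A), the probability of A under the sampling distribution
    """
    n = len(points)
    hits = indicator(points)                  # True where Y_j lies in A
    return (hits.sum() - n * prob) / np.sqrt(n)

def in_ball(center, radius):
    """Indicator of the closed Euclidean ball (hypothetical example set A)."""
    center = np.asarray(center, dtype=float)
    def indicator(points):
        return np.linalg.norm(points - center, axis=1) <= radius
    return indicator

# Assumption: F is uniform on the unit square, so F(A) is the area of A.
rng = np.random.default_rng(0)
n, k = 10_000, 2
Y = rng.random((n, k))                        # Y_1, ..., Y_n i.i.d. uniform on [0,1]^2
radius = 0.25
A = in_ball([0.5, 0.5], radius)               # disc lies entirely inside the square
F_A = np.pi * radius ** 2
w = empirical_process(Y, A, F_A)              # one value W_n(A, omega)
```

For a fixed set $A$ the value `w` is a centered, scaled binomial count, with variance $F(A)(1 - F(A))$; the weak convergence studied in the paper concerns the behavior of such values jointly over the whole index family.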
The case usually considered in the literature views empirical processes as being indexed by points rather than sets. This case can of course be viewed as a special case of the above if $\mathcal{A}$ is chosen as the family of all $k$-dimensional intervals $[0,x] \subset I^k$. For this case, Dudley [4], Bickel and Wichura [2], Neuhaus [12], and Straf [15] have shown that $W_n$ converges weakly in appropriate senses to a tied-down Brownian sheet. The purpose of this paper is to prove an analogous limit theorem when the index class $\mathcal{A}$ of subsets is much larger.
In order to study questions of weak convergence, the sample paths of the processes must have a suitable structure.
In particular, it is helpful if the limiting process has continuous sample paths. Dudley [5] has shown that the sample paths of Brownian sheets may be assumed to be continuous when $\mathcal{A}$ is a fairly large family of subsets of $I^k$ with sufficiently smooth boundaries.
In this paper we show that weak convergence also holds for a class $\mathcal{A}$ which is similar to the class of Dudley.
The precise definitions of this class and of the convergence are given in the next section.
1. NOTATIONS AND DEFINITIONS
Let $(V, m)$ be a metric space. The family of all closed bounded subsets of $V$ can be given the Hausdorff metric defined by
$d(A,B) = \inf\{\varepsilon > 0 : A \subset B^{\varepsilon},\ B \subset A^{\varepsilon}\}$
where $A$, $B$ are closed bounded subsets of $V$, and $A^{\varepsilon}$ denotes the set of all points whose distance (with respect to the metric $m$) from the set $A$ is less than or equal to $\varepsilon$. (A useful summary of properties of this metric space is given in Debreu [3].) Let $C(V)$ denote the set of all continuous real-valued functions on $V$ metrized as usual by the supremum norm. Let $2^{I^k}$ be the metric space of all closed subsets of $I^k$ under the Hausdorff metric. Since $I^k$ is compact, so is $2^{I^k}$ (see theorem 1, page 45, and the theorem on page 47 of [10]).
Next we define a subclass of $2^{I^k}$ consisting of sets with "smooth boundaries", which is closely related to one defined by Dudley [6]. (The difference is primarily that all sets are closed with "1-1" boundaries.)
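The Hausdorff metric defined at the start of this section can be made concrete for finite point sets, where the infimum is attained and equals the larger of the two directed sup-inf distances. The sketch below is an illustration only; it assumes $V = \mathbb{R}^d$ with the Euclidean metric $m$, and the function name `hausdorff` is hypothetical.

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between two finite point sets in R^d.

    A, B : arrays of shape (n_points, d), assumed non-empty.
    Returns inf{eps > 0 : A subset of B^eps and B subset of A^eps},
    where S^eps is the closed eps-neighborhood of S.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # D[i, j] = Euclidean distance between A[i] and B[j]
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    a_to_b = D.min(axis=1).max()   # smallest eps with A contained in B^eps
    b_to_a = D.min(axis=0).max()   # smallest eps with B contained in A^eps
    return max(a_to_b, b_to_a)

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[0.0, 0.0]])
d_AB = hausdorff(A, B)   # B lies in A, but (1, 0) is at distance 1 from B
```

Both directed distances are needed: here $B \subset A$, so one of them is zero, yet the sets are at Hausdorff distance 1 because $A$ is not contained in any $B^{\varepsilon}$ with $\varepsilon < 1$.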
First we define a space of functions into $I^k$ with bounded derivatives of order $\alpha$; for $\alpha > 0$, let $b = [\alpha]$ where $[x]$ is the largest integer strictly less than $x$. Let $t = \alpha - b \in (0,1]$ (for example, $\alpha = 2$ gives $b = 1$ and $t = 1$). Let $U^m$ be the open unit ball in $\mathbb{R}^m$ and for $M > 0$ and $\alpha > 0$, let $F(M,\alpha,m)$ be the set of all real functions $f$ on $U^m$ such that:
a) the partial derivatives
$D^p f = \dfrac{\partial^{\,p_1 + \cdots + p_m} f}{\partial x_1^{p_1} \cdots \partial x_m^{p_m}}$
exist for all
b
f
$p = (p_1, \dots, p_m)$
0, let
6 =
~M
to obtain for any el.=:.l, whenever
If(x)-f(y)I<e:
Ix-yl