SAMPLE-BASED NON-UNIFORM RANDOM VARIATE GENERATION

L. Devroye

Proceedings of the 1986 Winter Simulation Conference
J. Wilson, J. Henriksen, S. Roberts (eds.)

ABSTRACT. A sample of n iid random variables with a given unknown density is given. We discuss several issues related to the problem of generating a new sample of iid random variables with almost the same density. In particular, we look at sample independence, consistency, sample indistinguishability, moment matching and generator efficiency. We also introduce the notion of a replacement number, the minimum number of observations in a given sample that have to be replaced to obtain a sample with a given density.


1. INTRODUCTION. Assume that we are given a sample X_1, ..., X_n of iid R^d-valued random vectors with common unknown density f, and that we are asked to generate a new sample Y_1, ..., Y_m of independent random vectors with the same density f. This is quite an impossible task, since f is usually not known. The purpose of this note is to discuss just what can be done, and how close we can come to generating a perfect sample. What one can do is construct a density estimate f_n(x) = f_n(x, X_1, ..., X_n) of f(x), and then generate a sample of size m from f_n. This procedure has several drawbacks: first of all, f_n is typically not equal to f. Also, the new sample depends upon the original sample. Yet, we have very few options available to us. Ideally, we would like the new sample to appear to be distributed as the original sample; this will be called sample indistinguishability. This and other issues will be discussed in the sections that follow. Some of this material appeared originally in Devroye and Gyorfi (1985, chapter 8) and Devroye (1986).

2. SAMPLE INDEPENDENCE. There is little that can be done about the dependence between X_1, ..., X_n and Y_1, ..., Y_m, except to hope that for n large enough some sort of asymptotic independence is obtained. In some applications, sample independence is not an issue at all. Since the Y_i's are conditionally independent given X_1, ..., X_n, we need only consider the dependence between Y_1 and X_1, ..., X_n. A measure of the dependence is

    D_n = sup_{A,B} | P(Y ∈ A, X ∈ B) − P(Y ∈ A) P(X ∈ B) | ,

where the supremum is with respect to all Borel sets A of R^d and all Borel sets B of R^{nd}, and where Y = Y_1 and X is our shorthand notation for (X_1, ..., X_n). We say that the samples are asymptotically independent when lim_{n→∞} D_n = 0. In situations in which X_1, ..., X_n is used to design or build a system, and Y_1, ..., Y_m is used to test it, the sample dependence often causes optimistic evaluations. Without asymptotic independence, we cannot even hope to diminish this optimistic bias by increasing n. The inequality in Theorem 1 below provides us with a sufficient condition for asymptotic independence. First, we need the following lemma.

Lemma 1 (Scheffé, 1947). For all densities f and g on R^d,

    ∫ |f − g| = 2 sup_B | ∫_B f − ∫_B g | ,

where the supremum is with respect to all Borel sets B of R^d.

Scheffé's lemma tells us that if we assign probabilities to sets (events) using two different densities, then the maximal difference between the probabilities over all sets is equal to one half of the L_1 distance between the densities. From Lemma 1, we obtain

Theorem 1. Let f_n be a density estimate which is itself a density. Then

    D_n ≤ E( ∫ |f_n − f| ) .

Proof of Theorem 1. See Devroye and Gyorfi (1985).
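Lemma 1 is easy to check numerically, since the supremum is attained on the set B = {f > g}. Here is a small illustration (our own, not from the paper; two normal densities are compared on an integration grid):

    import numpy as np

    def normal_pdf(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    x = np.linspace(-12.0, 12.0, 200001)          # integration grid (Riemann sum)
    dx = x[1] - x[0]
    f = normal_pdf(x, 0.0, 1.0)
    g = normal_pdf(x, 1.0, 1.5)

    l1 = np.sum(np.abs(f - g)) * dx               # integral of |f - g|
    sup_b = np.sum((f - g)[f > g]) * dx           # P_f(B) - P_g(B) on B = {f > g}
    print(l1, 2.0 * sup_b)                        # agree up to discretization error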


We see that for the sake of asymptotic sample independence, it suffices that the expected L_1 distance between f_n and f tends to zero with n. This is also called consistency. Asymptotic independence does not imply consistency: just let f_n be the uniform density in all cases, and observe that D_n = 0, yet ∫ |f_n − f| is a positive constant for all n and all nonuniform f.

3. CONSISTENCY OF DENSITY ESTIMATES. A density estimate f_n is consistent if, for all densities f,

    lim_{n→∞} E( ∫ |f_n − f| ) = 0 .

Consistency guarantees that the expected value of the maximal error committed by replacing probabilities defined with f by probabilities defined with f_n tends to 0. Many estimates are consistent; see e.g. Devroye and Gyorfi (1985). Parametric estimates, i.e. estimates in which the form of f_n is fixed up to a finite number of parameters which are estimated from the sample, cannot be consistent, because f_n is required to converge to f for all f, not for a small subclass. Perhaps the best known and most widely used consistent density estimate is the kernel estimate

    f_n(x) = (1/(n h^d)) Σ_{i=1}^n K( (x − X_i)/h ) ,

where K is a given density (or kernel), chosen by the user, and h > 0 is a smoothing parameter, which typically depends upon n or the data (Rosenblatt, 1956; Parzen, 1962). For consistency it is necessary and sufficient that h → 0 and n h^d → ∞ in probability as n → ∞ (Devroye and Gyorfi, 1985). How one should choose h as a function of n or the data is the subject of a lot of controversy. Usually, the choice is made based upon the approximate minimization of an error criterion. Sample independence (Theorem 1) and sample indistinguishability (next section) suggest that we try to minimize E( ∫ |f_n − f| ).
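To make the criterion concrete, here is a small sketch (our own, not from the paper: d = 1, Bartlett kernel, f standard normal, h = n^{-1/5}) that evaluates f_n on a grid and tracks the L_1 error as n grows:

    import numpy as np

    def bartlett(u):                              # K(u) = (3/4)(1 - u^2)+
        return 0.75 * np.maximum(1.0 - u * u, 0.0)

    def kernel_estimate(x, data, h):              # f_n on a grid x, summed in chunks
        total = np.zeros_like(x)
        for chunk in np.array_split(data, max(1, len(data) // 1000)):
            total += bartlett((x[:, None] - chunk[None, :]) / h).sum(axis=1)
        return total / (len(data) * h)

    rng = np.random.default_rng(0)
    x = np.linspace(-5.0, 5.0, 2001)
    dx = x[1] - x[0]
    true_f = np.exp(-0.5 * x * x) / np.sqrt(2.0 * np.pi)

    for n in (100, 1000, 10000):
        data = rng.normal(size=n)
        h = n ** (-0.2)                           # h = C n^(-1/5), with C = 1 for simplicity
        err = np.sum(np.abs(kernel_estimate(x, data, h) - true_f)) * dx
        print(n, round(err, 4))                   # the L1 error shrinks as n grows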

But even after narrowing down the error criterion, there are several strategies. One could minimize the supremum of the criterion, where the supremum is taken over a class of densities; this is called a minimax strategy. If f has compact support on the real line and possesses one absolutely continuous derivative and an absolutely integrable second derivative, then the best choices for individual f (i.e., not in the minimax sense) are

    h = C n^{-1/5} ,   K(x) = (3/4)(1 − x^2)_+ ,

where C is a constant depending upon f only. The optimal kernel coincides with the optimal kernel for L_2 criteria (Bartlett, 1963). The optimal formula for h depends upon the unknown density f, but can be estimated from the data. Alternatively, as suggested by Deheuvels (1977), one could compute the formula for a given parametric density, a rough guess of sorts, and then estimate the parameters from the data. For example, if this is done with the normal density as initial guess, we obtain the recommendation to take h equal to a kernel-dependent constant times σ̂ n^{-1/5}, where σ̂ is a robust estimate of the standard deviation of the normal density (Devroye and Gyorfi, 1985). A typical robust estimate is the so-called quick-and-dirty estimate

    σ̂ = ( X_((nq)) − X_((np)) ) / ( z_q − z_p ) ,

where z_p, z_q are the p-th and q-th quantiles of the standard normal density, and X_((np)), X_((nq)) are the p-th and q-th quantiles in the data, i.e. the (np)-th and (nq)-th order statistics.
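A sketch of this recipe (the quantile levels p = 0.25, q = 0.75 and the constant 1.06 are our own placeholders: the paper leaves the constant unspecified, and 1.06 is the classical L_2 normal-reference value):

    import numpy as np

    def quick_and_dirty_sigma(data, p=0.25, q=0.75):
        z_q_minus_z_p = 1.3489795                 # z_.75 - z_.25 for the standard normal
        x_p, x_q = np.quantile(data, [p, q])      # the p-th and q-th data quantiles
        return (x_q - x_p) / z_q_minus_z_p

    rng = np.random.default_rng(1)
    data = rng.normal(loc=3.0, scale=2.0, size=1000)
    sigma_hat = quick_and_dirty_sigma(data)       # robust: outliers barely move it
    h = 1.06 * sigma_hat * len(data) ** (-0.2)    # 1.06 stands in for the paper's constant
    print(sigma_hat, h)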

The construction given here with the kernel estimate is simple, and yields fast generators. Other constructions have been suggested in the literature with random variate generation in mind. Often, the explicit form of f_n is not given or needed. Constructions often start from an empirical distribution function based upon X_1, ..., X_n, and a smooth approximation of this distribution function (obtained by interpolation), which is directly useful in the inversion method. Guerra, Tapia and Thompson (1978) use Akima's (Akima, 1970) quasi-Hermite piecewise cubic interpolation to obtain a smooth monotone function coinciding with the empirical distribution function at the points X_i. Recall that the empirical distribution is the distribution which puts mass 1/n at each point X_i. Butler (1970), on the other hand, uses Lagrange's quadratic interpolation on the inverse empirical distribution function to speed random variate generation up even further.
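In the same spirit, a sketch of smoothed inversion (our own; SciPy's monotone PCHIP interpolant stands in for Akima's quasi-Hermite scheme, and the plotting positions i/(n+1) are our choice):

    import numpy as np
    from scipy.interpolate import PchipInterpolator   # monotone piecewise cubic

    rng = np.random.default_rng(2)
    data = np.sort(rng.exponential(size=500))     # order statistics X_(1) <= ... <= X_(n)
    n = len(data)
    p = np.arange(1, n + 1) / (n + 1.0)           # plotting positions in (0,1)
    inv_cdf = PchipInterpolator(p, data)          # smooth, monotone inverse CDF

    u = rng.uniform(p[0], p[-1], size=10000)      # stay inside the interpolated range
    y = inv_cdf(u)                                # new sample by inversion
    print(y.mean(), data.mean())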

4. SAMPLE INDISTINGUISHABILITY. THE REPLACEMENT NUMBER.

In simulations, one important qualitative measure of the goodness of a method is the indistinguishability of X_1, ..., X_m and Y_1, ..., Y_m for the given sample size m. Note that we have forced both sample sizes to be the same, although for the construction of f_n we keep on using n points. Let us try to quantify this notion by means of the following imbedding technique. Let A be a fixed Borel set of R^d, and let (Ω, F, P) be a probability space with the property that (Y_1(ω), ..., Y_m(ω)) and (Z_1(ω), ..., Z_m(ω)) are two sequences of iid R^d-valued random vectors with common density f_n and f respectively. The sequences are allowed to be dependent. For a fixed set A of R^d, let N_A and M_A be the cardinalities of A induced by the first and second sample respectively. An appropriate measure of closeness is the replacement number

    Δ = inf E( sup_A |N_A − M_A| ) ,

where the infimum is taken over all such joint constructions (embeddings), and the supremum is with respect to all Borel sets A of R^d.


Here E is the conditional expectation given X_1, ..., X_n. This is different from, and stronger than, the approach taken in Devroye and Gyorfi (1985). Indeed, Δ is small if the cardinalities N_A and M_A are nearly equal for all A. We can consider Δ as the (conditional) expected value of the minimum number of Y_i's that should be altered and replaced by other random variables to make the sample into one that can be considered as an iid sample drawn from f. The crucial result needed here is

Theorem 2. For any f and f_n, we have

    Δ = (m/2) ∫ |f_n − f| .

Proof of Theorem 2. First, we note that for any embedding,

    E( sup_A |N_A − M_A| ) ≥ sup_A E( |N_A − M_A| )
                           ≥ sup_A | E(N_A) − E(M_A) |          (Jensen's inequality)
                           = m sup_A | ∫_A f_n − ∫_A f |
                           = (m/2) ∫ |f_n − f|                  (Scheffé's theorem).

For the inequality in the other direction, we use an embedding argument. The object is to construct two dependent samples of size m each, one drawn from f and one drawn from f_n, such that E( sup_A |N_A − M_A| ) ≤ (m/2) ∫ |f_n − f|. Observe that there is no hope of obtaining this with two independent samples: for independent samples with densities, sup_A |N_A − M_A| = m almost surely (take A = {Y_1, ..., Y_m}; with probability one no Z_i falls in this finite set), even if the densities are identical. The construction of the samples can be done as follows (see Devroye, 1985). Define

    δ = (1/2) ∫ |f_n − f| .

Then define the following densities:

    g_min = min(f, f_n) / (1 − δ) ,
    g_0   = ( f − min(f, f_n) ) / δ ,
    g_1   = ( f_n − min(f, f_n) ) / δ .

Three independent samples of iid random vectors are considered:

    U_1, ..., U_m  drawn from g_min ,
    V_1, ..., V_m  drawn from g_0 ,
    W_1, ..., W_m  drawn from g_1 .

In addition, let N be binomial (m, δ), and let (σ_1, ..., σ_m) be a random permutation of (1, ..., m); both N and the random permutation are independent of the three samples. Then define

    (Z_{σ_1}, ..., Z_{σ_m}) = (U_1, ..., U_{m−N}, V_1, ..., V_N) ,
    (Y_{σ_1}, ..., Y_{σ_m}) = (U_1, ..., U_{m−N}, W_1, ..., W_N) .

We claim that Z_1, ..., Z_m is an iid sample drawn from f, and that Y_1, ..., Y_m is an iid sample drawn from f_n. This is based upon the mixture decompositions

    f = (1 − δ) g_min + δ g_0 ,   f_n = (1 − δ) g_min + δ g_1 .

What matters is that the Z_i's and the Y_i's agree except in N components, where N is binomial (m, δ). Let E be the expected value with respect to the probability measure defined above. Then

    E( sup_A |N_A − M_A| ) ≤ E(N) = m δ = (m/2) ∫ |f_n − f| ,

which completes the proof.

The fact that Δ is precisely equal to (m/2) ∫ |f_n − f| allows us to associate concrete numbers with Δ. It also shows the importance of taking a density estimate f_n which is close to f in the L_1 sense. This is why we have concentrated thus far on the kernel estimate, and not on its ancestor, the histogram estimate. It should be noted that the kernel estimate is very flexible, and can be adapted to many situations. However, there are certain limitations. To cite two typical (negative) results: no choice of smoothing parameter and integrable kernel K can make E( ∫ |f_n − f| ) tend to zero faster than the order n^{-1/2}, and for kernels K that are themselves densities,

    E( ∫ |f_n − f| ) ≥ (0.86 + o(1)) n^{-2/5} .

The difference between these results is that in the former case the infimum is over all integrable K, even kernels taking negative values, while in the second case the infimum is over all kernels that are densities. Both bounds however are valid for all f. This makes them very useful for determining whether the sample size is large enough for the kernel estimate.


As a rule of thumb, when K ≥ 0, we have

    E(Δ) ≥ 0.42 m n^{-2/5} .

This gives an idea of what kind of accuracy we can expect. A small table of approximative lower bounds for E(Δ) is provided below (rows: m; columns: n).

    m \ n      1        10       100      1000     10000    100000
    1          0.42     0.167    0.066    0.026    0.010    0.0042
    10         4.2      1.67     0.66     0.26     0.10     0.042
    100        42       16.7     6.6      2.6      1.0      0.42
    1000       420      167      66       26       10       4.2
    10000      4200     1670     660      260      100      42
    100000     42000    16700    6600     2600     1000     420
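The table can be regenerated from the rule of thumb in a couple of lines (our own snippet):

    for m in (1, 10, 100, 1000, 10000, 100000):
        row = ["%g" % (0.42 * m * n ** -0.4) for n in (1, 10, 100, 1000, 10000, 100000)]
        print(m, row)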

If we could attain this lower bound, then given an original sample of size n = 10000, we could generate m = 10000 Y_i's with the property that by altering about 100 of the Y_i values we would in fact obtain a sample that is exactly distributed. Most of the time, tables of this nature can be used to determine whether the lower bound for E(Δ) is acceptable. On the positive side, we should mention that for many densities, E( ∫ |f_n − f| ) = O(n^{-2/5}). This is true whenever f has a finite (1+t)-th moment for some t > 0, f and f' are absolutely continuous, and f'' is absolutely integrable. For precise information about the rates, consult Devroye and Gyorfi (1985).

We finally mention that Δ cannot oscillate much about its mean for any kernel estimate. For any boxed kernel (i.e., bounded kernel of compact support, integrating to one), we have

    Var(Δ) ≤ C m^2 / n

for some universal constant C, and

    P( |Δ − E(Δ)| ≥ m u ) ≤ e^{-C* n u^2}

for some constant C* depending upon K only, and all u > 0. Both results are valid uniformly over all densities f (Devroye, 1986). Together with (upper or lower) bounds for E(Δ), they can be used to derive distribution-free confidence intervals for Δ. They also imply that Δ/E(Δ) → 1 in probability for most f (at least those for which √n E( ∫ |f_n − f| ) → ∞).

5. MOMENT MATCHING. Some statisticians attach a great deal of importance to the moments of the densities f_n and f. For d = 1, the i-th moment mismatch is defined as

    M_{n,i} = ∫ x^i f_n(x) dx − ∫ x^i f(x) dx    (i = 1, 2, 3, ...).

Clearly, M_{n,i} is a random variable. Assume that we employ the kernel estimate with a kernel K having zero mean and finite variance σ^2. Then we have

    M_{n,1} = (1/n) Σ_{i=1}^n ( X_i − E(X_1) ) ,
    M_{n,2} = (1/n) Σ_{i=1}^n ( X_i^2 − E(X_1^2) ) + h^2 σ^2 .

This follows from the fact that f_n is an equiprobable mixture of rescaled densities K shifted to the X_i's, each contribution having variance h^2 σ^2 and zero mean about its center. It is interesting to note that the distribution of M_{n,1} is not influenced by h or K. By the weak law of large numbers, M_{n,1} tends to 0 in probability as n → ∞ when f has a finite first moment. The story is different for the second moment mismatch. Whereas E(M_{n,1}) = 0, we now have E(M_{n,2}) = h^2 σ^2, a positive bias. Fortunately, h is usually small enough so that this is not too big a bias. Note further that the variances of M_{n,1} and M_{n,2} are equal to Var(X_1)/n and Var(X_1^2)/n respectively. Thus, h and K only affect the bias of the second moment mismatch. Making the bias very small is not recommended, as it increases the expected L_1 error, and thus the sample dependence and distinguishability.
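A quick empirical check of the bias formula E(M_{n,2}) = h^2 σ^2, using the mixture generator of the next section (our own illustration; f and K are both standard normal, so σ^2 = 1 and E(X^2) = 1):

    import numpy as np

    rng = np.random.default_rng(3)
    n, h = 2000, 0.5
    data = rng.normal(size=n)                     # sample from f = N(0,1), E(X^2) = 1
    # exact mismatch from the formulas above:
    m2_formula = np.mean(data**2) - 1.0 + h**2    # (1/n) sum (X_i^2 - E X_1^2) + h^2 sigma^2
    # empirical mismatch, via the mixture generator of Section 6:
    y = data[rng.integers(0, n, size=400000)] + h * rng.normal(size=400000)
    print(np.mean(y**2) - 1.0, m2_formula)        # both close; the mean bias is h^2 = 0.25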

6. GENERATORS FOR f_n. For the kernel estimate, generators can be based upon the property that a random variate from f_n is distributed as an equiprobable mixture, as is seen from the following trivial algorithm:

    Generate a random integer Z uniformly distributed on {1, ..., n}.
    Generate a random variate W with density K.
    RETURN X ← X_Z + h W

Bartlett’s

kernel

K (z )=:(I-z*)+.

we suggest elther rejection or a method based upon propertles or order statlstlcs:

Generator based upon rejection for Bartlett's kernel:

    REPEAT
        Generate a uniform [−1,1] random variate X and an independent
        uniform [0,1] random variate U.
    UNTIL U ≤ 1 − X^2
    RETURN X


The order statistics method for Bartlett's kernel:

    Generate three iid uniform [−1,1] random variates V_1, V_2, V_3.
    IF |V_3| ≥ max(|V_1|, |V_2|)
        THEN RETURN X ← V_2
        ELSE RETURN X ← V_3
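Both Bartlett-kernel generators, transcribed into Python (our own transcription; the variance check at the end is ours):

    import numpy as np

    def bartlett_rejection(rng):
        # accept a uniform [-1,1] variate with probability 1 - X^2 (rate 2/3)
        while True:
            x = rng.uniform(-1.0, 1.0)
            if rng.uniform() <= 1.0 - x * x:
                return x

    def bartlett_order_stat(rng):
        # exactly three uniforms: return V2 if |V3| is the largest, else V3
        v1, v2, v3 = rng.uniform(-1.0, 1.0, size=3)
        return v2 if abs(v3) >= max(abs(v1), abs(v2)) else v3

    rng = np.random.default_rng(6)
    s = np.array([bartlett_order_stat(rng) for _ in range(100000)])
    print(s.var())                                # near 1/5, the variance of K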

In the rejection method, X is accepted with probability 2/3, so that the algorithm requires on average three independent uniform random variates. However, we also need some multiplications. The order statistics method always uses precisely three independent uniform random variates, but the multiplications are replaced by a few absolute value operations.

Sometimes, K takes negative values, but still integrates to one. The density estimate is again

    f_n(x) = (1/(nh)) Σ_{i=1}^n K( (x − X_i)/h ) .

Since

    f_n(x) ≤ g_n(x) = (1/(nh)) Σ_{i=1}^n K^+( (x − X_i)/h ) ,

where K^+ = max(K, 0), the function g_n can serve as a dominating curve; c = ∫ K^+ is a normalization constant, so that K^+/c is a density and ∫ g_n = c.

The rejection method can then be altered slightly.

Generator based upon the rejection method:

    REPEAT
        Generate a random integer Z uniformly distributed on {1, ..., n}.
        Generate a random variate W with density K^+ / ∫ K^+.
        Y ← X_Z + h W.
        Generate a uniform [0,1] random variate U.
    UNTIL U g_n(Y) ≤ f_n(Y)
    RETURN Y
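A sketch of this modified rejection scheme (our own toy kernel: K = 1.5 times the N(0,1) density minus 0.5 times the N(0,4) density, which integrates to one and goes negative in the tails; where f_n dips below zero the acceptance test simply never fires there):

    import numpy as np

    rng = np.random.default_rng(7)

    def phi(x, s=1.0):                            # normal density with scale s
        return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

    def K(x):                                     # integrates to one, negative in the tails
        return 1.5 * phi(x) - 0.5 * phi(x, 2.0)

    def K_plus(x):
        return np.maximum(K(x), 0.0)

    def sample_K_plus():                          # W ~ K+/int(K+), rejection from 1.5*phi
        while True:
            w = rng.normal()
            if rng.uniform() * 1.5 * phi(w) <= K_plus(w):
                return w

    def generate(data, h):
        n = len(data)
        f_n = lambda y: np.sum(K((y - data) / h)) / (n * h)
        g_n = lambda y: np.sum(K_plus((y - data) / h)) / (n * h)
        while True:
            y = data[rng.integers(0, n)] + h * sample_K_plus()
            if rng.uniform() * g_n(y) <= f_n(y):  # UNTIL U g_n(Y) <= f_n(Y)
                return y

    data = rng.normal(size=500)
    print([round(generate(data, 0.4), 3) for _ in range(5)])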

Generators for histogram-type estimates can be based upon the fact that the distribution function of every histogram is piecewise linear. The alias method can be used in general to obtain fast inversion algorithms (see Walker (1977), Chen and Asau (1974), Ahrens and Kohrt (1981), Kronmal and Peterson (1979), and Devroye and Gyorfi (1985)). Archer (1980) is mainly concerned with moment matching in his definition of a data-based histogram. Scott (1979) and others discuss the issue of choosing the bin width in equi-spaced histograms.

The difference between an ordinary histogram and a data-based histogram is related to the definition of the height of the histogram on each interval. For a data-based histogram with intervals A_j, the height on the interval A_j is

    ( number of points in A_j ) / ( n × length of A_j ) .

Generators for these kinds of histograms are easy to define. Associate with each data point X_i the endpoints (L_i, R_i) of the interval to which X_i belongs; thus, the storage is 2n. Then proceed as follows.

Generator for a data-based histogram:

    Generate a random integer Z uniformly distributed on {1, ..., n}.
    Generate a uniform [0,1] random variate U.
    RETURN L_Z + U (R_Z − L_Z)

The data points could be rearranged in a preprocessing step according to membership in the same intervals, e.g. by sorting; this can be used to reduce the storage. What is more important than storage and speed, however, is the consistency of the underlying density estimate. For example, if the bins are defined by the order statistics (so that each bin has precisely one data point), the estimate is not consistent for any f. The best one can hope for with a continuous density f is E( ∫ |f_n − f| ) = O(n^{-1/3}), which is worse than the best achievable rate with the kernel estimate. See e.g. Scott (1979) or Devroye and Gyorfi (1985).
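A sketch of the data-based histogram generator in Python (our own instantiation; the equi-spaced bins over the data range are purely for illustration, not a recommendation):

    import numpy as np

    def histogram_generator(data, n_bins, size, rng):
        edges = np.linspace(data.min(), data.max(), n_bins + 1)
        idx = np.clip(np.searchsorted(edges, data, side="right") - 1, 0, n_bins - 1)
        L, R = edges[idx], edges[idx + 1]         # (L_i, R_i) per data point: 2n storage
        z = rng.integers(0, len(data), size=size) # Z uniform on {1,...,n}
        return L[z] + rng.uniform(size=size) * (R[z] - L[z])

    rng = np.random.default_rng(8)
    data = rng.gamma(shape=2.0, size=1000)
    y = histogram_generator(data, n_bins=30, size=5000, rng=rng)
    print(y.mean(), data.mean())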