Calibrating Confidence Coefficients

WEI-YIN LOH*

Two approaches are proposed for dealing with the problem of poor coverage probabilities of certain standard confidence intervals. The first is a recommendation that the actual coverage be estimated directly from the data and its value reported in addition to the nominal level. This is achieved through a combination of computer simulation and density estimation. The asymptotic validity of the procedure is proved for a number of common situations. A classical example is the nonparametric estimation of the variance of a population using the normal-theory interval. Here it is shown that the estimated coverage probability consistently estimates the true coverage probability if the population distribution possesses a finite sixth moment. The second approach is more traditional. It is a procedure for modifying an interval to yield improved coverage properties. Given a confidence interval, its estimated coverage probability obtained in the first approach is used to alter the nominal level of the interval. The interval with this modified nominal level is called a calibrated interval. In the case that the given interval is the normal-theory interval for the estimation of variance, the calibrated interval is proved to be asymptotically robust as long as sixth moments exist. As another application, the method is used to modify a bootstrap interval procedure. This leads to the derivation of a new bootstrap interval.

KEY WORDS: Bootstrap; Confidence level; Interval estimation; Kernel density estimation.

* Wei-Yin Loh is Assistant Professor, Department of Statistics, University of Wisconsin, Madison, WI 53706. This work was supported in part by National Science Foundation Grants MCS 8300140 and DMS 8502303; the first grant also provided access to the CRAY supercomputer. The author is grateful to T. Hesterberg, E. L. Lehmann, C. F. J. Wu, the associate editor, and two referees for useful comments on various drafts of the manuscript, to B. Efron for pointing out an error in an early version of Table 1, to J. Gurland for helpful discussions on the mixture distribution in Section 2, and to Kathy Pedula for assistance with programming on the CRAY.

© 1987 American Statistical Association. Journal of the American Statistical Association, March 1987, Vol. 82, No. 397, Theory and Methods.

1. INTRODUCTION

Let $\theta = \theta(F)$ be a functional of a distribution $F$, and let $I_n$ be a nominal $100\gamma\%$ confidence interval (CI) for $\theta$ based on a sample of size $n$. The word "nominal" indicates that the true coverage probability of $I_n$, $\gamma_n$ say, may not be exactly $\gamma$. Usually, though, there is a class of $F$ for which $\gamma$ is a good approximation to $\gamma_n$, in the sense that $\gamma_n - \gamma \to 0$ as $n \to \infty$. For example, if $\theta$ is the mean of $F$ and $I_n$ the normal-theory $t$ interval, it is well known that $\gamma_n \to \gamma$ provided only that $F$ has a finite variance. Because the class of $F$ with finite variance is rather large, the $t$ interval is generally considered to be "robust." In contrast, the corresponding normal-theory interval for the variance $\sigma^2$ of $F$ is nonrobust. In that case, $\gamma_n - \gamma \to 0$ iff the kurtosis of $F$ is 3. For other $F$'s the limiting difference can be quite large; for example, if $F$ is the $t_5$ distribution, the nominal 90% CI for $\sigma^2$ has true confidence coefficient less than .60 in large samples (see Table 2 and Scheffé 1959, chap. 10).

The convergence of $\gamma_n - \gamma$ to zero is harder to ascertain for CI's constructed via more complicated procedures such as Efron's (1982) bootstrap method. In the latter, intervals are determined from bootstrap histograms of selected statistics and, except for certain classes of statistics (see, e.g., Abramovitch and Singh 1985; Beran 1982; Bickel and Freedman 1981; Singh 1981), the conditions under which $\gamma_n \to \gamma$ are not completely understood at the present time.

Even when $\gamma_n \to \gamma$ for each $F$ in a class $\Omega$, the convergence may not be uniform over $\Omega$. Therefore, given a fixed $n$, $I_n$ may be satisfactory for one $F$ but not for another in $\Omega$.

In view of these problems, I propose in this article a method of estimating $\gamma_n$ directly from the data. The effectiveness of this proposal is demonstrated in some examples in Section 2. The method rests on the following argument. Since if $F$ were known we would be able to say what $\gamma_n$ is (by brute-force computer simulation if necessary), why not estimate $F$, by an estimator $\hat F_n$ say, from the data and then find out the probability $\hat\gamma_n$ that intervals from $\hat F_n$ will contain $\hat\theta = \theta(\hat F_n)$? (See Appendix B for a working definition of $\hat\gamma_n$.) For example, when estimating the mean of $F$ using the $t$ interval, $\hat\gamma_n$ would be the proportion of those $t$ intervals generated by samples from $\hat F_n$ that contain its mean $\hat\theta$.

This idea is, strictly speaking, not new. It is just a new application of the bootstrap philosophy, in which the data is resampled for more information. What I hope to show is that it can lead to improvements over the use of $\gamma$ as an estimator of $\gamma_n$.

It is easy to see why $\hat\gamma_n$ should estimate $\gamma_n$ consistently. Suppose that $C(\gamma^*)$ is a class of distributions containing $F$ for which $\gamma_n \to \gamma^*$ as $n \to \infty$. If $\hat F_n$ eventually belongs to $C(\gamma^*)$ a.s., then we may also expect that $\hat\gamma_n \to \gamma^*$ a.s.; that is, $\hat\gamma_n$ is a strongly consistent estimator of $\gamma_n$. The argument will be rigorous if it can be shown that $\gamma_n \to \gamma^*$ uniformly over $C(\gamma^*)$. Note that it is not necessary that $\gamma^* = \gamma$. This advantage will be clear in Example 4, where I apply the method to the normal-theory interval for $\sigma^2$ and show that $\hat\gamma_n - \gamma_n \to 0$ a.s. provided only that $F$ has finite sixth moment.

Three more examples are given in Section 2. In Example 1, the estimation of the mean is considered. For the one-sided $t$ interval, it is proved that, under moment conditions on $F$, $\hat\gamma_n - \gamma_n = O(n^{-1})$ a.s., whereas $\gamma_n - \gamma = O(n^{-1/2})$. For some bootstrap intervals, it can be proved only that both $\gamma_n - \gamma$ and $\hat\gamma_n - \gamma_n$ converge to zero as $n \to \infty$. The simulation results suggest, however, that the convergence rate for $\hat\gamma_n - \gamma_n$ may be faster.

Examples 2 and 3 illustrate two situations where, for all $n$, $\hat\gamma_n$ estimates $\gamma_n$ without error for most reasonable $\hat F_n$. The interval in both cases is the bootstrap "percentile method" interval of Efron (1982). In Example 2, $\theta$ is the median. Here $\gamma$ is known to be an excellent approximation for $\gamma_n$. In Example 3, $\theta$ is an endpoint of the support of $F$. Here $\gamma_n = 0$ for all $n$, a totally unacceptable situation.

It is tempting to try to use this idea of "calibrating" $\gamma$ with $\hat\gamma_n$ to construct new intervals whose true coverage
probabilities come closer to the desired nominal value. This is done in Section 3, leading to the introduction of calibrated intervals with possibly improved properties. In particular, a new method of constructing an interval from the bootstrap histogram is proposed and shown via simulation to be quite effective. Section 4 presents an application of this method to a bivariate data set.

Throughout this article, when I refer to bootstrap resampling I mean random sampling from the empirical cdf. Similarly, whenever I mention the percentile, bias-corrected percentile, and bootstrap $t$ intervals, I mean the (unsmoothed) methods originally defined in Efron (1982).

2. FOUR EXAMPLES

2.1 Example 1: Estimating a Mean

Let $\theta$ be the mean of $F$ and $I_n$ the $100\gamma\%$ two-sided $t$ interval, $\bar X_n \pm t_{n-1,1-\alpha}\, s_n n^{-1/2}$, where $\bar X_n$ and $s_n^2$ are the sample mean and variance, $t_{n,\alpha}$ is the $100\alpha$-percentile of the $t$ distribution with $n$ degrees of freedom, and $\gamma = 1 - 2\alpha$. The following theorem shows that $\hat\gamma_n$ is a better estimator of $\gamma_n$ than $\gamma$ for $I_n$, as well as its one-sided counterpart.

Theorem 1. Assume that $F$ is continuous and has finite eighth moment. Let $\hat F_n$ be an estimator of $F$ such that its first eight moments converge to those of $F$ a.s. Then, for the one-sided $t$ interval, $\gamma_n - \gamma = O(n^{-1/2})$ and $\hat\gamma_n - \gamma_n = O(n^{-1})$ a.s. If $F$ has a finite tenth moment, then for the two-sided $t$ interval, $\gamma_n - \gamma = O(n^{-1})$ and $\hat\gamma_n - \gamma_n = o(n^{-1})$ a.s.

Proof. The result for the one-sided $t$ interval depends on the two-term Edgeworth expansion for the distribution of the $t$ statistic, as follows:

$$\Pr(t \le x) = \Phi(x) + (6\sigma^3)^{-1}\mu_3 n^{-1/2}(2x^2 + 1)\phi(x) + O(n^{-1}), \qquad (2.1)$$

where $\sigma^2$ and $\mu_3$ are the variance and third central moment of $F$, respectively, and $\Phi(\cdot)$ and $\phi(\cdot)$ are the standard normal cdf and density. [See, e.g., Hall (1983) or Abramovitch and Singh (1985). Chung (1946) demonstrated that the "$O(n^{-1})$" term is bounded by $Qn^{-1}h(x)$, where $h(x)$ is a function of $x$ and $Q$ is a constant depending only on the first eight moments of $F$.] Since $\mu_3(F) = 0$ when $F$ is normal, it follows that $\gamma_n - \gamma = O(n^{-1/2})$. Applying the same expansion to $\hat F_n$ instead of $F$, we see that the distribution of $t$ under $\hat F_n$ matches that under $F$ up to and including the term in $n^{-1/2}$. Therefore, $\hat\gamma_n - \gamma_n = O(n^{-1})$ a.s.

For the two-sided $t$ interval, the $n^{-1/2}$ term in the expansion for $\gamma_n$ is missing, because the second term on the right side of (2.1) is an even function of $x$. Therefore, $\gamma_n - \gamma = O(n^{-1})$ in this case. By resorting to a three-term Edgeworth expansion, however, the same argument as above shows that now $\hat\gamma_n - \gamma_n = o(n^{-1})$ a.s. [A more careful analysis using the results in Chung (1946) indicates that the rate is $O(n^{-4/3})$.]

There are several other ways of constructing a CI for the mean. The most interesting of these attempt to provide asymmetric intervals (about $\bar X_n$), so as to reflect any skewness in $F$. Johnson (1978) proposed the modified $t$ interval, $[\bar X_n + 6^{-1}\hat\mu_{3,n} s_n^{-2} n^{-1}] \pm t_{n-1,1-\alpha}\, s_n n^{-1/2}$, where $\hat\mu_{3,n}$ is the sample third central moment. A more recent and general technique is to construct CI's from a bootstrap histogram. In the case of the mean, this is a histogram of $\bar X^* = n^{-1}\sum_{i=1}^{n} X_i^*$, where $(X_1^*, X_2^*, \ldots, X_n^*)$ are iid observations from the empirical cdf that puts mass $n^{-1}$ on each observation $X_i$ $(i = 1, 2, \ldots, n)$. Efron (1982) gives a number of methods for setting CI's from this histogram. The percentile method prescribes as a nominal $100(1 - 2\alpha)\%$ CI the interval $[\hat\theta_L, \hat\theta_U]$, where $\hat\theta_L$ and $\hat\theta_U$ are the lower and upper $\alpha$ points of the histogram. The bias-corrected percentile method attempts to incorporate the skewness of $F$ better by redistributing the probability unequally in the two tails of the histogram [see Efron (1982) for details]. Viewing $\bar X_n - \theta$ as an asymptotic pivot, it is also natural to consider the interval $[2\bar X_n - \hat\theta_U,\ 2\bar X_n - \hat\theta_L]$, which is the reflection of $[\hat\theta_L, \hat\theta_U]$ about $\bar X_n$. I will call this the reflection method in this article [see Loh (1984) and Efron (1979a, remark D) for arguments for and against this].

The next theorem gives sufficient conditions for $\gamma_n - \gamma$ and $\hat\gamma_n - \gamma_n$ to converge to zero a.s. for these bootstrap intervals. The proof is presented in Appendix A. Note that it is not necessary for the moments of $\hat F_n$ to converge to those of $F$. The convergence of $\gamma_n - \gamma$ to zero only is proved in Beran (1984) under more general conditions.

Theorem 2. Let $F$ be any distribution with finite sixth moment, and let $\hat F_n$ be an estimator of $F$ such that its first six moments converge a.s. Suppose that $I_n$ is a bootstrap CI constructed from the percentile, bias-corrected percentile, or reflection methods. Then $\gamma_n - \gamma \to 0$ and $\hat\gamma_n - \gamma_n \to 0$ as $n \to \infty$ a.s.

Table 1 displays the results from a simulation experiment based on this example with $n = 10$ and $\gamma = .90$. Six interval procedures are compared: (a) two-sided $t$, (b) percentile method, (c) bias-corrected percentile method, (d) reflection method, (e) Johnson's $t$, and (f) bootstrap $t$. The bootstrap $t$ was originally proposed in Efron (1982, sec. 10.10). It consists of applying the percentile method to the studentized form of the statistic. [The arguments in Hinkley and Wei (1984) and Abramovitch and Singh (1985) can be used to show that typically for the bootstrap $t$ interval, $\gamma_n - \gamma = o(n^{-1/2})$ in the one-sided case and $\gamma_n - \gamma = o(n^{-1})$ in the two-sided case; the nominal level $\gamma$ for this interval, therefore, matches the performance of $\hat\gamma_n$ as an estimator of $\gamma_n$ for the $t$ interval in Theorem 1.]

The distributions selected for the simulation are (a) normal, (b) uniform, (c) normal mixture, and (d) exponential. The particular normal mixture used is $\pi N(\mu_1, \sigma_1^2) + (1 - \pi)N(\mu_2, \sigma_2^2)$, with $\pi = .5504$, $\mu_1 = .3342$, $\mu_2 = -.4091$, $\sigma_1 = .2385$, and $\sigma_2 = 1.3603$. [As usual, $N(\mu, \sigma^2)$ denotes a normal distribution with mean $\mu$ and variance $\sigma^2$.] Lee and Gurland (1977) showed that this distribution is quite unfavorable for the one-sample $t$ test when $n$ is small. The estimate $\hat F_n$ used here is a data-based kernel density estimate. Appendix B describes the whole procedure in greater detail.
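The estimate $\hat\gamma_n$ for the two-sided $t$ interval can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the article's program: the function names are hypothetical, the bandwidth uses a simple normal rule of thumb rather than the data-based algorithm cited in Appendix B, the critical value 1.833 is the tabulated 95th percentile of $t$ with 9 degrees of freedom (so the sketch is tied to $n = 10$, $\gamma = .90$), and the number of simulated samples is a free parameter.

```python
import random
import statistics

# Tabulated 95th percentile of the t distribution with 9 degrees of freedom.
# Hard-coded assumption: valid only for n = 10 and gamma = .90.
T_CRIT_9_95 = 1.833


def t_interval(sample, crit=T_CRIT_9_95):
    """Two-sided t interval: mean +/- crit * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    half = crit * statistics.stdev(sample) / n ** 0.5
    return mean - half, mean + half


def estimate_coverage(data, interval_fn=t_interval, reps=1000, rng=None):
    """Estimate gamma-hat_n: the fraction of intervals, built from samples
    drawn from a Gaussian-kernel estimate of F, that contain theta-hat."""
    rng = rng or random.Random(0)
    n = len(data)
    # Assumption: rule-of-thumb bandwidth, swapped in for the article's
    # iterative data-based choice.
    h = 1.06 * statistics.stdev(data) * n ** (-0.2)
    # The mean of a Gaussian-kernel estimate equals the sample mean.
    theta_hat = statistics.fmean(data)
    hits = 0
    for _ in range(reps):
        # Drawing from the kernel estimate: pick a data point, add kernel noise.
        sample = [rng.choice(data) + h * rng.gauss(0.0, 1.0) for _ in range(n)]
        lo, hi = interval_fn(sample)
        hits += lo <= theta_hat <= hi
    return hits / reps
```

Passing a different `interval_fn` gives the corresponding estimate for any other interval procedure for the mean; only the definition of `theta_hat` would change for other functionals.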
Table 1. Monte Carlo Estimates of $\gamma_n$, $E(\hat\gamma_n)$, and $sd(\hat\gamma_n)$ for Example 1 ($n = 10$, $\gamma = .90$)

Distribution   Method           gamma_n   E(gamma-hat_n)   sd(gamma-hat_n)
Normal         t interval       .89       .897             .030
               Percentile       .83       .840             .036
               Bias-corrected   .83       .836             .037
               Reflection       .83       .836             .036
               Johnson t        .89       .898             .030
               Bootstrap t      .90       .897             .033
Uniform        t interval       .89       .898             .031
               Percentile       .86       .843             .039
               Bias-corrected   .85       .838             .039
               Reflection       .84       .834             .038
               Johnson t        .89       .900             .031
               Bootstrap t      .93       .905             .035
Mixture        t interval       .89       .874             .051
               Percentile       .76       .813             .055
               Bias-corrected   .73       .803             .059
               Reflection       .82       .816             .052
               Johnson t        .87       .873             .051
               Bootstrap t      .76       .860             .060
Exponential    t interval       .86       .876             .052
               Percentile       .79       .822             .053
               Bias-corrected   .79       .819             .054
               Reflection       .78       .812             .051
               Johnson t        .86       .877             .052
               Bootstrap t      .87       .887             .052
Maximum SE                      ±.02      ±.002            ±.004

NOTE: "±" quantities are maxima of estimated standard errors (SE's).
Table 1 indicates that, apart from the $t$ and Johnson $t$ intervals, the coverage probabilities of all of the bootstrap intervals can be quite poor. On the other hand, the accuracy of $\hat\gamma_n$ in predicting $\gamma_n$ is quite good (except for the bootstrap $t$ in the mixture distribution case). Note that the table does not show the Johnson $t$ to be any better than the ordinary $t$ interval.

I now give two situations where $\hat\gamma_n$ estimates $\gamma_n$ without error.

2.2 Example 2: Estimating a Median

Let $\theta$ be the median of a continuous distribution $F$. Given the order statistics $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$, exact CI's for $\theta$ can be constructed by using the fact that, for any $1 \le k_1 < k_2 \le n$,

$$\Pr[X_{(k_1)} < \theta \le X_{(k_2)}] = \sum_{i=k_1}^{k_2-1} \binom{n}{i} 2^{-n} \qquad (2.2)$$

for all $F$. Efron (1982) used this to demonstrate that the percentile method can be quite effective in setting intervals. From the bootstrap histogram of the sample median for odd $n$, this method yields an interval of the form $[X_{(k_1)}, X_{(k_2)}]$ with nominal (bootstrap) confidence level $\gamma$ remarkably close to the $\gamma_n$ given in (2.2). For example, if $n = 13$, $k_1 = 4$, and $k_2 = 10$, one gets $\gamma = .914$ and $\gamma_n = .908$. Now suppose that one did not know about (2.2) but constructed the bootstrap interval by using the percentile method. Because (2.2) is distribution free, my procedure would give $\hat\gamma_n = \gamma_n$ regardless of which $\hat F_n$ is chosen, provided only that it is continuous.

2.3 Example 3: Estimating an Endpoint

Consider the estimation of the right endpoint $\theta$ of the support of a continuous distribution $F$, using the bootstrap percentile method. A natural quantity to bootstrap here is the largest order statistic $X_{(n)}$. Unfortunately, the bootstrap histogram of $X_{(n)}$ is of necessity to the left of $\theta$. Because the percentile interval lies within the support of this histogram, it can never contain $\theta$. Hence $\gamma_n = 0$ for all $\gamma$. The fact that the latter holds for all continuous $F$, however, implies that we must also have $\hat\gamma_n = 0$ if $\hat F_n$ is continuous. Thus $\hat\gamma_n = \gamma_n$. Note that the same conclusions apply to the bias-corrected percentile method as well. The reflection method gives more sensible intervals, but they too may not be asymptotically consistent (see Loh 1984).

2.4 Example 4: Estimating a Variance

Let $(X_1, \ldots, X_n)$ be a random sample from $F$ with variance $\sigma^2$. The $100(1 - 2\alpha)\%$ CI for $\sigma^2$ based on normal theory is

$$(n - 1)s_n^2/\chi^2_{n-1,1-\alpha} < \sigma^2 < (n - 1)s_n^2/\chi^2_{n-1,\alpha}, \qquad (2.3)$$

where $s_n^2$ is the unbiased estimate of variance and $\chi^2_{n,\alpha}$ is the $100\alpha$-percentile of the $\chi^2$ distribution with $n$ degrees of freedom. It is well known that this interval is sensitive to the kurtosis $\beta$ of $F$. In fact (see Scheffé 1959, chap. 10),

$$\sqrt{(n - 1)/2}\,\{s_n^2\sigma^{-2} - 1\} \to N(0, B^2) \quad \text{as } n \to \infty, \qquad (2.4)$$

where $B^2 = (\beta - 1)/2$. Hence the coverage $\gamma_n$ of (2.3) tends to $1 - 2\Phi(B^{-1}z_\alpha)$, which equals $\gamma = 1 - 2\alpha$ only if $\beta = 3$. (Throughout this article, $z_\alpha$ refers to the $100\alpha$-percentile of the standard normal distribution.) The following theorem shows that, despite this, $\hat\gamma_n - \gamma_n \to 0$ a.s.

Theorem 3. Suppose that $F$ has a finite sixth moment and $\hat F_n$ is an estimator of $F$ such that its first six moments converge a.s., with the first four converging to those of $F$. Then, for the interval (2.3), $\hat\gamma_n - \gamma_n \to 0$ as $n \to \infty$ a.s.

Proof. Let $\mu = EX_1$ and $Y_i = \sigma^{-2}(X_i - \mu)^2$ $(i = 1, 2, \ldots, n)$. Because the distribution of the left side of (2.4) is asymptotically equivalent to that of $(n/2)^{1/2}(\bar Y_n - 1)$, where $\bar Y_n$ denotes the mean of $\{Y_1, \ldots, Y_n\}$, it suffices to consider the limiting probability of the event $A(y) = \{n^{1/2}(\bar Y_n - 1) \le y\}$ under $F$ and $\hat F_n$. The Berry-Esseen theorem implies that

$$\sup_y |P_F\{A(y)\} - \Phi\{y(\beta - 1)^{-1/2}\}| \le K\rho(F)(\beta - 1)^{-3/2}n^{-1/2}, \qquad (2.5)$$

where $\rho(F) = E|Y_1 - 1|^3 = E_F|\sigma^{-2}(X_1 - \mu)^2 - 1|^3$ and $K$ is a universal constant. Applying the same result to $\hat F_n$ gives

$$\sup_y |P_{\hat F_n}\{A(y)\} - \Phi(y b_n^{-1/2})| \le K\rho(\hat F_n)b_n^{-3/2}n^{-1/2}, \qquad (2.6)$$

where $b_n = \mathrm{var}[\hat\sigma_n^{-2}(W - \hat\mu_n)^2]$, $W$ has distribution $\hat F_n$, and $\hat\mu_n$ and $\hat\sigma_n^2$ are the mean and variance of $W$. The assumptions stated imply that $b_n \to (\beta - 1)$ and $\rho(\hat F_n) \to \rho_\infty$ (say) as $n \to \infty$ a.s. Here $\rho_\infty$ may depend on the particular sequence $(X_1, X_2, \ldots)$. I conclude from (2.5) and (2.6) that $\hat\gamma_n - \gamma_n \to 0$ a.s.

Table 2. Monte Carlo Estimates of $\gamma_n$, $E(\hat\gamma_n)$, and $sd(\hat\gamma_n)$ for (2.3) ($\gamma = .90$)

                     n = 25                                       n = 50
Distribution         gamma_n  E(gamma-hat_n)  sd(gamma-hat_n)     gamma_n  E(gamma-hat_n)  sd(gamma-hat_n)   lim gamma_n
Normal               .90      .905            .053                .90      .905            .045              .900
Beta(.6825, 2)       .89      .891            .085                .91      .901            .061              .900
Uniform              .99      .953            .031                .99      .969            .024              .991
Exponential          .64      .765            .154                .64      .706            .149              .588
t5                   .71      .844            .108                .68      .811            .124              .588
Maximum SE           ±.02     ±.007           ±.006               ±.02     ±.007           ±.006

NOTE: "±" quantities refer to the maximum estimated SE's. The SE's for the uniform distribution are less than half the maximum in each case.
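Two coverage values quoted in Examples 2 and 4 have closed forms and can be checked numerically: the exact coverage (2.2) of the order-statistic interval for the median, and the limiting coverage $1 - 2\Phi(B^{-1}z_\alpha)$ of the normal-theory variance interval (2.3), which is the "lim $\gamma_n$" column of Table 2. The sketch below uses only the Python standard library; the function names are illustrative, and the kurtosis values passed in the usage note (1.8 for the uniform, 9 for the exponential and $t_5$) are standard facts rather than figures taken from the article.

```python
from math import comb
from statistics import NormalDist


def exact_median_coverage(n, k1, k2):
    """Pr[X_(k1) < median <= X_(k2)] for continuous F, as in (2.2):
    the number of observations below the median is Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k1, k2)) / 2 ** n


def limiting_nt_coverage(kurtosis, gamma):
    """Large-sample coverage of the normal-theory variance interval (2.3):
    1 - 2 * Phi(z_alpha / B), with B^2 = (kurtosis - 1) / 2 and
    alpha = (1 - gamma) / 2."""
    alpha = (1 - gamma) / 2
    z_alpha = NormalDist().inv_cdf(alpha)   # negative for alpha < 1/2
    b = ((kurtosis - 1) / 2) ** 0.5
    return 1 - 2 * NormalDist().cdf(z_alpha / b)
```

For instance, `exact_median_coverage(13, 4, 10)` reproduces the value .908 quoted in Example 2, and `limiting_nt_coverage(1.8, 0.90)` and `limiting_nt_coverage(9, 0.90)` agree with the uniform and exponential/$t_5$ limits in Table 2 to within rounding.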
Table 2 gives the results of a simulation experiment for five distributions, with $\gamma = .90$ and $n = 25$ and 50. The theoretical values of $\lim \gamma_n$ are also reported. The parameters of the beta distribution are chosen so that it has kurtosis equal to 3. The convergence of $\hat\gamma_n - \gamma_n$ to zero is seen to be quite good for the normal, beta, and uniform distributions, but slower for the exponential distribution. Note that the $t_5$ distribution is not covered by Theorem 3. Again, the details of the simulation can be found in Appendix B.

3. CALIBRATED INTERVALS

The preceding examples suggest that, given an interval procedure $I_n$ for estimating $\theta$, $\hat\gamma_n$ can be a more accurate estimate of $\gamma_n$ than its nominal level. When this is the case, it is natural to ask whether one can use the information in $\hat\gamma_n$ to construct a better interval, $I_n^*$ say, for $\theta$, that is, one for which $\gamma_n(I_n^*)$ is closer to the desired level than $\gamma_n(I_n)$. This section proposes two methods and applies them to the problem of estimating the variance of $F$.

3.1 Calibrated Normal-Theory Interval

Suppose that in Example 4, we want a 90% confidence interval for the variance $\sigma^2$. Further suppose that, upon using the CI (2.3) with $\alpha = .05$ (so that $\gamma = .90$), we find $\hat\gamma_n = .70$. It is tempting now to increase $\gamma$ (e.g., to .95) and recompute $\hat\gamma_n$ for the updated interval to see if $\hat\gamma_n$ is closer to .90. One might even imagine iterating this process (i.e., changing $\gamma$ continuously) until $\hat\gamma_n$ is exactly .90. The final value of $\gamma$ that results in this is then put back into (2.3) to obtain a modified interval. I will call this the calibrated normal-theory (CNT) interval.

The interesting question is what effect this process of calibration has on the coverage properties of the modified interval. The following theorem gives conditions on $F$ for which $\gamma_n(\mathrm{CNT})$ is consistent. It suffices to state and prove the result for the one-sided interval.

Theorem 4. Assume that $F$ is continuous and has finite sixth moment. Let $\hat F_n$ be a continuous estimator of $F$ such that its first six moments converge to those of $F$ a.s. Let $s_n^{*2}$ denote the sample variance based on a sample of size $n$ from $\hat F_n$, and let $\hat\sigma^2$ denote the variance of $\hat F_n$. Finally, let $I_n = [k s_n^2, \infty)$ be the calibrated interval for $\sigma^2$, where $k = k(X_1, X_2, \ldots, X_n)$ is chosen so that $\Pr_{\hat F_n}(k s_n^{*2} < \hat\sigma^2) = \gamma$. (In this expression, $k$ and $\hat\sigma^2$ are fixed given $\hat F_n$.) Then $\Pr_F(\sigma^2 \in I_n) \to \gamma$ as $n \to \infty$.

Proof. Let $\alpha_r(F) = E(X/\sigma)^r$ denote the standardized $r$th moment of $F$ and $\hat\alpha_r = \alpha_r(\hat F_n)$. Hsu (1945) showed that

$$\sup_x |\Pr_F\{n^{1/2}(s_n^2\sigma^{-2} - 1)(\alpha_4 - 1)^{-1/2} < x\} - \Phi(x)| \le A n^{-1/2}\alpha_6(\alpha_4 - 1)^{-3}$$

for some universal constant $A$. It follows that

$$\gamma = \Pr_{\hat F_n}(k s_n^{*2} < \hat\sigma^2) = \Pr_{\hat F_n}\{n^{1/2}(s_n^{*2}\hat\sigma^{-2} - 1)(\hat\alpha_4 - 1)^{-1/2} < n^{1/2}k^{-1}(1 - k)(\hat\alpha_4 - 1)^{-1/2}\} = \Phi\{n^{1/2}k^{-1}(1 - k)(\hat\alpha_4 - 1)^{-1/2}\} + O_p(n^{-1/2}) \text{ a.s.}$$

This implies that $k^{-1} = 1 + z_\gamma n^{-1/2}(\alpha_4 - 1)^{1/2} + o_p(n^{-1/2})$ a.s. Hence

$$\Pr_F(\sigma^2 \in I_n) = \Pr_F(k s_n^2 < \sigma^2) = \Pr_F\{\sigma^{-2}s_n^2 < k^{-1}\} \to \gamma \text{ a.s.}$$

In general, it would be impractical to iterate the calibration process until $\hat\gamma_n$ converges to the desired level. I have found that often a one-step calibration plus linear interpolation is enough. To illustrate, suppose that we want a CNT interval with desired coefficient $\gamma_0$. First find $\hat\gamma_n$ for the interval (2.3) with $\gamma = \gamma_0$. Then set

$$\gamma_1 = \begin{cases} \gamma_0^2\,\hat\gamma_n^{-1} & \text{if } \hat\gamma_n \ge \gamma_0, \\ \gamma_0 + (1 - \gamma_0)(\gamma_0 - \hat\gamma_n)(1 - \hat\gamma_n)^{-1} & \text{if } \hat\gamma_n < \gamma_0. \end{cases} \qquad (3.1)$$

That is, the point $(\gamma_1, \gamma_0)$ is gotten by linearly interpolating between $(\gamma_0, \hat\gamma_n)$ and either (0, 0) or (1, 1) depending on whether $\hat\gamma_n \ge$ or $< \gamma_0$. (For example, in the hypothetical case discussed in the beginning of this section, if $\gamma_0 = .90$
and $\hat\gamma_n = .70$, we will set $\gamma_1 = .9667$.) The CNT interval is then given by (2.3) with $\gamma = \gamma_1$.

It should be noted that although the CNT interval has been defined specifically for the estimation of variance, the basic definition is quite general. It includes, for example, the calibrated version of any interval of the form $\hat\theta \pm z_\alpha SE(\hat\theta)$, where $\hat\theta$ is any estimator of a parameter $\theta$ and $SE(\hat\theta)$ is any estimate of the standard error of $\hat\theta$ (such as a jackknife estimate).

3.2 A New Bootstrap Interval

Because our algorithm for constructing calibrated intervals is completely general, it can be applied to calibrate bootstrap intervals as well. For example, the undercoverage exhibited by the bootstrap intervals in Table 1 may, hopefully, be corrected via calibration. Since calibration is itself a form of bootstrapping, calibrated bootstrap intervals may also properly be called iterated bootstrap intervals.

Instead of examining the effect of calibrating any of the bootstrap intervals included in Table 1, I propose here a new bootstrap procedure designed to take full advantage of the calibration idea. Recall that, given a bootstrap histogram and a chosen value of $\gamma$, the reflection method prescribes the interval $I_n = [2\hat\theta - \hat\theta_U,\ 2\hat\theta - \hat\theta_L]$, where $\hat\theta_L$ and $\hat\theta_U$ are the lower and upper $(1 - \gamma)/2$-points of the histogram. The object here is to retain $100\gamma\%$ of the histogram mass. Unless the histogram is symmetric, there is no a priori reason for treating the tails symmetrically. Let $[\hat\theta_L^*, \hat\theta_U^*]$ be the shortest interval containing $100\gamma\%$ of the histogram. The corresponding reflection interval $I_n^* = [2\hat\theta - \hat\theta_U^*,\ 2\hat\theta - \hat\theta_L^*]$ would thus be shorter than $I_n$. If $I_n$ undercovers $\theta$, the interval $I_n^*$ will only make the problem worse. If we do not stop here, however, but calibrate $I_n^*$, we may be able to overcome the undercoverage somewhat and simultaneously obtain a relatively short CI. I propose therefore, as a new bootstrap interval, the result of calibrating $I_n^*$ and will refer to this as a calibrated shortest reflection (CSR) interval.

3.3 A Monte Carlo Study

To examine further the problem set out in Example 4, a Monte Carlo experiment was performed for $n = 20$. The nominal level chosen is $\gamma = .90$, and four distributions (all standardized so that $\sigma^2 = 1$) are used: (a) normal, (b) $t_5$, (c) uniform, and (d) exponential. The competing intervals are as follows:

NT: normal-theory interval (2.3)
CNT: calibrated NT interval
JK: jackknife interval based on $s^2$
JKL: jackknife interval based on $\log(s^2)$
PER: bootstrap percentile method
BCP: bias-corrected percentile method
BST: bootstrap $t$ based on $s^2$
BSTL: bootstrap $t$ based on $\log(s^2)$
PVT: Schenker's (1985) pivotal method
CSR: calibrated shortest reflection interval

The JK interval is $s_n^2 \pm t_{n-1,.95}\,SD$, where $SD$ is the jackknife estimate of standard error of $s_n^2$. The JKL interval is the jackknife interval for $\log(\sigma^2)$ based on $\log(s^2)$, subsequently exponentiated to recover the interval for $\sigma^2$. [Miller (1968) showed that jackknifing $\log(s^2)$ is both powerful and robust for testing variances in the two-sample problem.] The PVT interval has the form $[s_n^4/\hat\theta_U,\ s_n^4/\hat\theta_L]$, where $[\hat\theta_L, \hat\theta_U]$ is the PER interval. (The PVT interval is obtained by treating $s^2/\sigma^2$ as a pivotal quantity and bootstrapping it.) The CNT interval uses (3.1). BST and BSTL are bootstrap $t$ versions of JK and JKL, respectively.

The results are shown in Table 3. The values for $E(L)$ generally refer to estimates of the expected lengths of the intervals truncated at zero. The only exception is for the BSTL interval at the $t_5$ and exponential distributions. The BSTL interval seems to be extremely unstable in these two situations: estimates of $E(L)$ are many orders of magnitude larger than for the other methods, and the associated estimates of standard errors did not seem to decrease with increase in the number of Monte Carlo replications. I conjecture that $E(L)$ is infinite for the BSTL interval at these two distributions when $n = 20$. Therefore, instead of expected length, estimates of median length are reported (in parentheses) in the table. The JKL, PVT, and BST intervals also appear to be quite unstable for the sample size studied, though not as much as the BSTL. [The relative instability of jackknife intervals in other problems has also been observed in Efron (1982, p. 15) and Wu (in press).] On the other hand, the PER and BCP intervals tend to be too short and hence undercover $\sigma^2$.

There is some indication that the CNT interval is trying to set right the miscoverage of the NT interval, although not as much as one would like. Except for the exponential distribution case, in which no method appears satisfactory in terms of both coverage probability and interval length, the JK and CSR intervals appear quite reasonable for the other distributions. The high computational cost of the CSR method is somewhat offset by its ability to provide asymmetric intervals (a property that the JK method lacks).

Further details of the simulation are given in Appendix B.

Table 3. Estimates of $\gamma_n$ and $E(L)$ for Estimating $\sigma^2$ ($\gamma = .90$; $n = 20$)

Distribution            NT     CNT    JK     JKL    PER    BCP    PVT    BST    BSTL    CSR
Normal       gamma_n    .90    .89    .86    .90    .81    .80    .84    .88    .88     .83
             E(L)       1.25   1.22   1.07   1.26   .91    .93    1.27   1.46   1.71a   1.00
t5           gamma_n    .76    .78    .76    .85    .71    .72    .79    .85    .85     .87
             E(L)       1.25   1.41   1.38   2.4b   1.17   1.21   2.1c   3.2b   (1.4)   1.62
Uniform      gamma_n    .99    .96    .90    .91    .86    .85    .85    .88    .90     .87
             E(L)       1.25   1.08   .77    .82    .68    .67    .88    .82    .77     .74
Exponential  gamma_n    .64    .69    .72    .80    .68    .69    .72    .83    .84     .71
             E(L)       1.25   1.56   1.56   3.5b   1.33   1.37   3.2c   55d    (2.6)   1.43

NOTE: Median lengths are given in parentheses. Unless otherwise stated, maximum SE's for $\gamma_n$ = .01 and maximum SE's for $E(L)$ = .02. a SE = .06. b SE = .2. c SE = .1. d SE = .3.

4. A BIVARIATE EXAMPLE: THE LAW SCHOOL DATA

Table 4. 68% Confidence Intervals for $\rho$

Method                              Interval                                   Length
Normal-theory                       ($\hat\rho$ - .16, $\hat\rho$ + .09)       .25
Percentile                          ($\hat\rho$ - .12, $\hat\rho$ + .13)       .25
Bias-corrected percentile           ($\hat\rho$ - .17, $\hat\rho$ + .10)       .27
Bootstrap t ($\hat\rho$)            ($\hat\rho$ - .19, $\hat\rho$ + .15)       .34
Bootstrap t (arctanh $\hat\rho$)    ($\hat\rho$ - .42, $\hat\rho$ + .09)       .51
CSR                                 ($\hat\rho$ - .16, $\hat\rho$ + .11)       .27
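The adjusted nominal level behind the CSR entry of Table 4 rests on three small computations: the interpolation rule (3.1), a two-point version of it for the case where two nominal levels have been calibrated, and the shortest-interval reflection of Section 3.2. The following is a minimal sketch of those pieces. The function names are illustrative, and the rule for how many histogram points the shortest window must hold (the ceiling of $\gamma$ times the number of bootstrap values) is an assumption of this sketch, not something the article specifies.

```python
import math


def one_step_level(gamma0, gamma_hat):
    """Adjusted nominal level gamma_1 from rule (3.1)."""
    if gamma_hat >= gamma0:
        # Interpolate between (gamma0, gamma_hat) and (0, 0).
        return gamma0 ** 2 / gamma_hat
    # Interpolate between (gamma0, gamma_hat) and (1, 1).
    return gamma0 + (1 - gamma0) * (gamma0 - gamma_hat) / (1 - gamma_hat)


def two_point_level(target, point_a, point_b):
    """Linear interpolation between two calibrated points, each given as
    (nominal level, estimated coverage); returns the nominal level whose
    interpolated estimated coverage equals `target`."""
    (level_a, hat_a), (level_b, hat_b) = point_a, point_b
    slope = (level_b - level_a) / (hat_b - hat_a)
    return level_a + (target - hat_a) * slope


def shortest_interval(values, gamma):
    """Shortest window of sorted bootstrap values holding a fraction
    gamma of them (assumed rule: ceil(gamma * m) points, at least 2)."""
    xs = sorted(values)
    m = len(xs)
    k = min(m, max(2, math.ceil(gamma * m)))
    start = min(range(m - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def reflect(interval, theta_hat):
    """Reflect [lo, hi] about the point estimate theta_hat."""
    lo, hi = interval
    return 2 * theta_hat - hi, 2 * theta_hat - lo
```

With these definitions, `one_step_level(0.90, 0.70)` gives the value .9667 quoted in Section 3.1, and `two_point_level(0.68, (0.68, 0.615), (0.90, 0.772))` gives approximately .771, the adjusted level reported for the law school data.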
The preceding section demonstrated that the CSR method can produce intervals that are fairly short as well as have quite satisfactory coverage probabilities. One advantage of any bootstrap method is the potential for constructing asymmetric intervals (about the point estimate). We will examine this property of the CSR method by applying it to a real bivariate data set. The exercise will also illustrate how the method can be extended to multidimensional data.

The data, given in Efron (1979b, 1982), consist of the average LSAT and GPA scores for 15 American law schools. The problem is to construct a 68% CI for the correlation coefficient $\rho$. The sample correlation is $\hat\rho = .776$. To apply the CSR method, we use the variable kernel algorithm in Breiman, Meisel, and Purcell (1977) with a normal kernel to estimate first the true bivariate density. [See Devroye (1985) for some large sample properties of this density estimator.] Figure 1 shows a contour plot of the estimate superimposed on the 15 data points. The estimate is unimodal, has a little ridge running northeast-southwest, and has correlation coefficient .344. (The difference between this correlation and $\hat\rho$ is an indication of the amount of smoothing produced by the variable kernel estimate.)

Figure 1. Contour Plot of Density Estimate. (Horizontal axis: LSAT.)

Because only one set of data is involved, we can afford to be a little more elaborate in calibrating the shortest length interval. Instead of using just one calibration as in (3.1), two shortest length reflection intervals were calibrated, with nominal levels 68% and 90%, respectively. The calibration was carried out with 1,000 replicate samples drawn from the density estimate. For each replicate sample, a bootstrap histogram for the sample correlation was constructed, using another 1,000 bootstrap samples. The values of $\hat\gamma_n$ thus obtained were, respectively, .615 and .772. Linear interpolation gave $\gamma = .771$ as the adjusted nominal level.

The resulting CSR interval is shown in Table 4, together with the corresponding intervals based on normal theory and other bootstrap methods. The two bootstrap $t$ intervals are based on the $t$ statistics computed from $\hat\rho$ and Fisher's transformation $\mathrm{arctanh}(\hat\rho)$, respectively, with the corresponding jackknife estimates of standard error used for studentization. Efron (1982, p. 83) noted that for this data, the bias-corrected percentile interval is more similar to the normal-theory interval than the uncorrected percentile interval, the latter being too symmetric. In this respect the CSR interval is in qualitative agreement with the former two. Its length is also not much different. In contrast, both of the bootstrap $t$ intervals appear to be conservative. [Efron (1982, p. 88) observed that the bootstrap $t$ seems to be specific to translation problems and its application to the correlation coefficient gives poor results.]

5. CONCLUDING REMARKS

The ideal confidence interval is one for which (a) its true coverage probability $\gamma_n$ is close to the nominal level $\gamma$, and (b) this property holds uniformly for as many distributions as possible, at least for large enough $n$. These
twin goals may be called "accuracy" and "robustness of validity," respectively. Unfortunately, it is well known that, except for certain nonparametric problems admitting solutions (such as estimating the median), the two goals are often incompatible (Bahadur and Savage 1956).

To circumvent this difficulty, I propose in this article a somewhat new way of looking at the problem: namely, to estimate $\gamma_n$ directly from the data and report it in addition to $\gamma$. The potential value of this approach is demonstrated in the examples, where we see that, besides improving accuracy, it can sometimes correct a $\gamma$ that is totally wrong.

The proposed method is, of course, not foolproof. A counterexample is the estimation of an endpoint $\theta$ of $F$, where $F$ is completely unspecified. Here any asymptotically valid interval must depend on some knowledge of the density of $F$ near $\theta$. Unless our estimator $\hat{F}_n$ is told this, the procedure cannot be expected to give good results all the time. (This remark does not contradict Example 3, since the percentile interval considered there is not asymptotically valid for any $F$.)

If we give up the requirement of uniform convergence of $\gamma_n$ and ask only that $\gamma_n \to \gamma$ at each fixed $F$, then many methods are available for speeding up the convergence. Hall (1983), Hinkley and Wei (1984), and Abramovitch and Singh (1985), for example, gave methods based on inverting Edgeworth expansions. The calibrated intervals have the same aim. Unlike methods based on Edgeworth expansions, however, which require knowledge of the leading terms of the expansions and calculation of high-order moments (which may be unstable), the methods introduced in Section 3 are less demanding of mathematical expertise, since they are entirely based on simulation. Therefore, they might be easier to implement in practice, if a computer is available.

The calibrated intervals obviously require many more arithmetic operations to be performed than, say, the Hall intervals. In the case of the CSR interval, if $B$ sets of pseudorandom samples are used to construct the bootstrap histogram and $C$ sets of samples are used to calibrate each of these, then a total of $BC$ sets of samples need to be generated and processed. In other words, if it takes one unit of computer time to calculate a Hall interval and $B$ units to compute a percentile interval, then it would take $BC$ units to obtain a CSR interval. The calibrated version of a standard (nonbootstrap) interval, for example, the CNT interval, on the other hand, requires only $C$ units of computer time, because no bootstrap histogram is required. The appropriate values of $B$ and $C$ to use will depend on the problem, but with the greater availability of fast computers, the computational cost should be more affordable with time (see Efron 1979b).

The reader is referred to Loh (1985) for a discussion of similar issues in a hypothesis testing setting.

APPENDIX A: PROOF OF THEOREM 2

Only the proof for the percentile method is given, because similar proofs hold for the other two methods. The proof is broken into two lemmas.

Lemma 1. Let $(X_1, \ldots, X_n)$ be an iid sample from $F$ with finite third moment. Let $s_n^2$ and $r_n$ denote the sample variance and third absolute central moment, respectively. Then there is a continuous function $K(s_n, r_n)$ such that

$$\left| n^{1/2}(\theta_L - \bar{X}_n)/s_n - \Phi^{-1}(\alpha) \right| \le K(s_n, r_n)\, n^{-1/2} \quad \text{a.s. as } n \to \infty.$$

A similar result holds for $\theta_U$.

Proof. Let $\Pr^*$ denote probabilities under bootstrap resampling and $\bar{X}_n^*$ be a bootstrap mean. The Berry-Esseen theorem implies that

$$\left| \alpha - \Phi[n^{1/2} s_n^{-1}(\theta_L - \bar{X}_n)] \right| = \left| \Pr{}^*\{ n^{1/2} s_n^{-1}(\bar{X}_n^* - \bar{X}_n) \le n^{1/2} s_n^{-1}(\theta_L - \bar{X}_n) \} - \Phi[n^{1/2} s_n^{-1}(\theta_L - \bar{X}_n)] \right| \le K_1 r_n s_n^{-3} n^{-1/2} \quad \text{a.s.},$$

where $K_1$ is a universal constant. Inverting this gives the result.

Lemma 2. Let $I_n$ be a CI for the mean constructed from the percentile method, and suppose that $F$ has a finite sixth moment. Then, for every $\epsilon > 0$, there is a continuous function $C_F(\epsilon)$, depending only on $\epsilon$ and the first six moments of $F$, such that

$$|\gamma_n - \gamma| \le \epsilon + n^{-1/2} C_F(\epsilon).$$

Proof. From Lemma 1, we have

$$\Pr(\theta_L \ge \theta) \le \Pr\{ n^{1/2} s_n^{-1}(\bar{X}_n - \theta) \ge -\Phi^{-1}(\alpha) - n^{-1/2} K(s_n, r_n) \} = \Pr\{ n^{1/2}(\bar{X}_n - \theta)\sigma^{-1} \ge D(s_n, r_n) \},$$

where $D(s_n, r_n) = \sigma^{-1} s_n [-\Phi^{-1}(\alpha) - n^{-1/2} K(s_n, r_n)]$ and $\sigma^2$ is the variance of $F$. Let $\epsilon > 0$ and $K_1, \ldots, K_4$ denote constants depending only on $\epsilon$ and the first six moments of $F$. By Chebyshev's inequality, there exists $K_1$ such that the event $A = \{ n^{1/2}|s_n^2 - \sigma^2| > K_1 \text{ or } |r_n - \rho| > K_1 \}$ has probability less than $\epsilon$. Here $\rho$ denotes the third absolute central moment of $F$. By continuity, the minimum of $D(s_n, r_n)$ over the complement of $A$ is bounded below by $M = -\Phi^{-1}(\alpha)[1 + n^{-1/2} K_2(F)]$, for some $K_2$. Therefore,

$$\Pr(\theta_L \ge \theta) \le \epsilon + \Pr\{ n^{1/2}(\bar{X}_n - \theta)\sigma^{-1} \ge M \} \le \epsilon + \alpha + n^{-1/2} K_3,$$

by the Berry-Esseen theorem. This and a corresponding result for $\theta_U$ imply that $|\gamma_n - \gamma| \le 2\epsilon + n^{-1/2} K_4$.

Lemma 2 implies that $\gamma_n - \gamma \to 0$ as $n \to \infty$. The proof of Theorem 2 is completed by applying the same lemma to $\hat{F}_n$.

APPENDIX B: COMPUTATIONAL DETAILS

The experiments in Examples 1 and 4 were based on 500 replications each. For each replication, a kernel estimate of the underlying density of $F$ was first obtained. The normal kernel was used throughout, with bandwidth chosen via the data-based algorithm suggested in Scott, Tapia, and Thompson (1977) [see also Scott and Factor 1981, formulas (2.10) and (2.11)]. Starting with the sample range as the initial guess, 20 iterations of this algorithm were executed to arrive at the eventual bandwidth; I did this instead of using the Newton-Raphson procedure proposed by the original authors to avoid the possibility of convergence to zero. After the bandwidth was selected, $\theta = \theta(\hat{F}_n)$ was computed and 100 samples of size $n$ from $\hat{F}_n$ were drawn. The fraction of these samples for which the corresponding intervals contained $\theta$ gave an estimate $\hat{\gamma}_n$ of $\gamma_n$. The average and standard deviation of these values of $\hat{\gamma}_n$ over the 500 replications in the outermost layer of the Monte Carlo provided the estimates of $E(\hat{\gamma}_n)$ and $\mathrm{sd}(\hat{\gamma}_n)$ in Tables 1 and 2. For the intervals derived via bootstrap methods in Table 1, the bootstrap histograms of the sample mean were constructed from 100 bootstrap samples. This formed the third (innermost) layer of the Monte Carlo.
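The middle layer of this Monte Carlo (drawing 100 samples from the kernel estimate and counting how often the interval covers $\theta(\hat{F}_n)$) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used for the article: the function names are hypothetical, the interval is a simple normal-approximation interval for the mean standing in for whichever interval is being studied, and the bandwidth comes from a common rule of thumb rather than from the Scott, Tapia, and Thompson (1977) algorithm.

```python
import math
import random
import statistics


def z_interval(sample, level=0.90):
    """Normal-approximation interval for the mean (a stand-in interval procedure)."""
    n = len(sample)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * statistics.stdev(sample) / math.sqrt(n)
    centre = statistics.fmean(sample)
    return centre - half_width, centre + half_width


def rule_of_thumb_bandwidth(sample):
    """Assumed rule-of-thumb bandwidth; NOT the Scott-Tapia-Thompson algorithm."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)


def estimate_coverage(sample, interval=z_interval, n_draws=100, rng=None):
    """Estimate the coverage of `interval` by sampling from a normal-kernel estimate."""
    rng = rng or random.Random()
    n = len(sample)
    h = rule_of_thumb_bandwidth(sample)
    # For a normal kernel, the mean of the kernel estimate equals the sample mean,
    # so theta(F_n) for the mean is just the sample mean.
    theta_hat = statistics.fmean(sample)
    hits = 0
    for _ in range(n_draws):
        # A draw from the kernel estimate: pick a data point, add N(0, h^2) noise.
        pseudo = [rng.choice(sample) + rng.gauss(0.0, h) for _ in range(n)]
        lo, hi = interval(pseudo)
        hits += lo <= theta_hat <= hi
    return hits / n_draws


# Usage: one replication on a skewed sample of size 20.
rng = random.Random(0)
data = [rng.expovariate(1.0) for _ in range(20)]
gamma_hat = estimate_coverage(data, rng=rng)
```

Repeating the last three lines over many independently generated data sets and averaging `gamma_hat` would correspond to the outermost layer described above; for a bootstrap interval, `interval` itself would contain the innermost resampling loop.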
Journal of the American Statistical Association, March 1987
The results for Table 3 were obtained by using the following variance reduction technique. Let $\gamma_n^{NT}$ and $E(L^{NT})$ denote the true coverage probability and expected length of the NT interval. For each distribution and all other methods, Monte Carlo estimates of $\gamma_n - \gamma_n^{NT}$ and $E(L - L^{NT})$ were obtained, using the same 2,000 replicates. [For the CNT interval, e.g., let $i(\mathrm{NT})$ be 1 or 0 according to whether NT contains $\sigma^2$ or not, for each replicate sample. Define $i(\mathrm{CNT})$ similarly, and let $W = i(\mathrm{CNT}) - i(\mathrm{NT})$. Then $\gamma_n - \gamma_n^{NT}$ is estimated by averaging $W$ over the 2,000 replicates.] Because the NT interval is much quicker to compute, $\gamma_n^{NT}$ was estimated separately via another Monte Carlo run, using 50,000 replicates. The estimates of $\gamma_n$ reported in Table 3 are the sums of the estimates of $\gamma_n^{NT}$ and $\gamma_n - \gamma_n^{NT}$, with estimates of standard errors adjusted accordingly. Estimates of $E(L)$ are obtained similarly, although $E(L^{NT})$ is calculated exactly. [$E(L^{NT}) = 1.25$ for all of the distributions, since $\sigma^2 = 1$.] Quite substantial reductions in variances were achieved (as much as one-half of what would have been obtained had a direct Monte Carlo been used). In the case of the BSTL interval, median($L$) is estimated by the median of the Monte Carlo realizations of $L$.

All of the bootstrap intervals in Table 3 were based on 100 bootstrap replicates, and another 100 replicates were used for calibrating the CNT and CSR intervals. Again the Scott et al. (1977) algorithm was employed to estimate densities.

The results in Table 3 were computed on a CRAY supercomputer. The rest of the computations for this article were done on a VAX 11/750, using random number generators from the International Mathematical and Statistical Library.

[Received June 1984. Revised May 1986.]

REFERENCES

Abramovitch, L., and Singh, K. (1985), "Edgeworth Corrected Pivotal Statistics and the Bootstrap," The Annals of Statistics, 13, 116-132.

Bahadur, R. R., and Savage, L. J. (1956), "The Nonexistence of Certain Statistical Procedures in Nonparametric Problems," Annals of Mathematical Statistics, 27, 1115-1122.

Beran, R. (1982), "Estimated Sampling Distributions: The Bootstrap and Competitors," The Annals of Statistics, 10, 212-225.

Beran, R. (1984), "Bootstrap Methods in Statistics," Jber. d. Dt. Math. Verein, 86, 14-30.

Bickel, P. J., and Freedman, D. A. (1981), "Some Asymptotic Theory for the Bootstrap," The Annals of Statistics, 9, 1196-1217.

Breiman, L., Meisel, W., and Purcell, E. (1977), "Variable Kernel Estimates of Multivariate Densities," Technometrics, 19, 135-144.

Chung, K. L. (1946), "The Approximate Distribution of Student's Statistic," Annals of Mathematical Statistics, 17, 447-465.

Devroye, L. (1985), "A Note on the $L_1$ Consistency of Variable Kernel Estimates," The Annals of Statistics, 13, 1041-1049.

Efron, B. (1979a), "Bootstrap Methods: Another Look at the Jackknife," The Annals of Statistics, 7, 1-26.

Efron, B. (1979b), "Computers and the Theory of Statistics: Thinking the Unthinkable," SIAM Review, 21, 460-480.

Efron, B. (1982), The Jackknife, the Bootstrap and Other Resampling Plans, Philadelphia: Society for Industrial and Applied Mathematics.

Hall, P. (1983), "Inverting an Edgeworth Expansion," The Annals of Statistics, 11, 569-576.

Hinkley, D., and Wei, B.-C. (1984), "Improvements of Jackknife Confidence Limit Methods," Biometrika, 71, 331-340.

Hsu, P. L. (1945), "The Approximate Distributions of the Mean and Variance of a Sample of Independent Variables," Annals of Mathematical Statistics, 16, 1-29.

Johnson, N. J. (1978), "Modified t Tests and Confidence Intervals for Asymmetrical Populations," Journal of the American Statistical Association, 73, 536-544.

Lee, A. F. S., and Gurland, J. (1977), "One-Sample t-Test When Sampling From a Mixture of Normal Distributions," The Annals of Statistics, 5, 803-807.

Loh, Wei-Yin (1984), "Estimating an Endpoint of a Distribution With Resampling Methods," The Annals of Statistics, 12, 1543-1550.

Loh, Wei-Yin (1985), "A New Method for Testing Separate Families of Hypotheses," Journal of the American Statistical Association, 80, 362-368.

Miller, R. G. (1968), "Jackknifing Variances," Annals of Mathematical Statistics, 39, 567-582.

Scheffé, H. (1959), The Analysis of Variance, New York: John Wiley.

Schenker, N. (1985), "Qualms About Bootstrap Confidence Intervals," Journal of the American Statistical Association, 80, 360-361.

Scott, D. W., and Factor, L. E. (1981), "Monte Carlo Study of Three Data-Based Nonparametric Probability Density Estimators," Journal of the American Statistical Association, 76, 9-15.

Scott, D. W., Tapia, R. A., and Thompson, J. R. (1977), "Kernel Density Estimation Revisited," Journal of Nonlinear Analysis, 1, 339-372.

Singh, K. (1981), "On the Asymptotic Accuracy of Efron's Bootstrap," The Annals of Statistics, 9, 1187-1195.

Wu, C. F. J. (in press), "Jackknife, Bootstrap and Other Resampling Inference in Regression" (with discussion), The Annals of Statistics.