Selecting the Better Bernoulli Treatment Using a Matched Samples Design. Author(s): Ajit C. Tamhane. Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 42, No. 1 (1980), pp. 26-30. Published by: Blackwell Publishing for the Royal Statistical Society. Stable URL: http://www.jstor.org/stable/2984734
J. R. Statist. Soc. B (1980), 42, No. 1, pp. 26-30
Selecting the Better Bernoulli Treatment Using a Matched Samples Design

By AJIT C. TAMHANE
Northwestern University, Evanston, Illinois
[Received June 1978. Final revision February 1979]

SUMMARY
The problem of selecting the better Bernoulli treatment using a matched samples design is considered in the framework of the indifference-zone approach. A single-stage procedure is proposed and its properties are studied. Tables of sample sizes for implementing the proposed procedure are given. A comparison is made with an independent samples design and the associated Sobel-Huyett selection procedure.

Keywords: MATCHED SAMPLES DESIGN; INDEPENDENT SAMPLES DESIGN; BERNOULLI TREATMENTS; RANKING AND SELECTION; INDIFFERENCE-ZONE APPROACH

1. INTRODUCTION
In this paper we consider the problem of selecting the "better" of two Bernoulli populations (i.e. the one having the larger success probability) when a matched samples design is used. The corresponding problem when an independent samples design is used has been considered by Sobel and Huyett (1957). It should be noted that although a considerable literature exists on the problem of comparing matched proportions (see McNemar, 1947; Cochran, 1950; Bennet, 1967, 1968; Bhapkar, 1973; Bhapkar and Somes, 1977), it deals mostly with tests of homogeneity. However, in many practical situations the experimenter's goal is to select the "best" treatment; a test of homogeneity does not provide the information which the experimenter truly seeks in such situations. This paper provides an appropriate formulation of the selection problem and gives a procedure for attaining this goal.

2. ASSUMPTIONS, NOTATION AND PROBLEM FORMULATION

Consider two treatments $T_1$ and $T_2$ and let $\pi_{ij}$ denote the probability that a matched observation on $T_1$ and $T_2$ results in outcome $i$ with $T_1$ and outcome $j$ with $T_2$ ($i, j = 0, 1$), where 1 denotes success and 0 denotes failure. We have $\sum_i \sum_j \pi_{ij} = 1$. We assume that the $\pi_{ij}$ remain constant throughout the trial. Thus each matched observation can be thought of as a realization from a fixed multinomial distribution with four cells, (1, 1), (1, 0), (0, 1) and (0, 0); the corresponding probabilities are $\pi_{11}$, $\pi_{10}$, $\pi_{01}$ and $\pi_{00}$, respectively. Let $p_1 = \pi_{11} + \pi_{10}$ and $p_2 = \pi_{11} + \pi_{01}$ be the success probabilities of $T_1$ and $T_2$, respectively, and let $p_{[1]} \le p_{[2]}$ denote the ordered values of the $p_i$. We assume that the $\pi_{ij}$ are unknown, but that the experimenter is able to specify an upper limit $\pi^*$ ($0 < \pi^* \le 1$) on $\pi_{10} + \pi_{01} = \pi$ (say). (In general, if matching is properly done and $T_1$ and $T_2$ are comparable to each other, then $\pi$, the probability of different outcomes on $T_1$ and $T_2$ within the same matched observation, will be small; see Section 5 for further discussion.)
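To make the notation concrete, the following minimal Python sketch stores the four cell probabilities of one matched observation and derives $p_1$, $p_2$ and $\pi$ from them, together with the ratio $\pi_{10}/\pi$ that the paper later calls $\lambda$. The class and property names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchedPairCells:
    """Cell probabilities of one matched observation (hypothetical helper).

    pi_ij = P(outcome i on T1 and outcome j on T2), with 1 = success, 0 = failure.
    """
    pi11: float
    pi10: float
    pi01: float
    pi00: float

    def __post_init__(self) -> None:
        # The four cells of the multinomial must sum to 1.
        if abs(self.pi11 + self.pi10 + self.pi01 + self.pi00 - 1.0) > 1e-9:
            raise ValueError("cell probabilities must sum to 1")

    @property
    def p1(self) -> float:
        """Success probability of T1: cells (1,1) and (1,0)."""
        return self.pi11 + self.pi10

    @property
    def p2(self) -> float:
        """Success probability of T2: cells (1,1) and (0,1)."""
        return self.pi11 + self.pi01

    @property
    def pi(self) -> float:
        """Probability of discordant outcomes within a pair: pi10 + pi01."""
        return self.pi10 + self.pi01

    @property
    def lam(self) -> float:
        """Share of discordant pairs that favour T1: pi10 / pi."""
        return self.pi10 / self.pi


cells = MatchedPairCells(pi11=0.50, pi10=0.15, pi01=0.05, pi00=0.30)
print(cells.p1, cells.p2, cells.pi, cells.lam)
```

Note that $p_1 - p_2 = \pi_{10} - \pi_{01}$ whatever the values of $\pi_{11}$ and $\pi_{00}$, so only the discordant cells carry information about which treatment is better.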
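The experimenter's goal is to select the treatment associated with $p_{[2]}$; such a selection is referred to as a correct selection (CS) and the corresponding probability is denoted by PCS. The experimenter restricts consideration to procedures which guarantee the probability requirement

$$\mathrm{PCS} \ge P^* \quad \text{whenever} \quad p_{[2]} - p_{[1]} \ge \delta^* \quad \text{and} \quad \pi_{10} + \pi_{01} = \pi \le \pi^*, \tag{2.1}$$

where $\{\delta^*, \pi^*, P^*\}$ are constants specified before experimentation starts; $0 < \delta^* \le \pi^*$ and $\tfrac{1}{2} < P^* < 1$.

The proposed single-stage procedure takes $n$ matched observations; let $X_{ij}$ denote the number of them with outcome $(i, j)$. It selects $T_1$ if $X_{10} > X_{01}$, selects $T_2$ if $X_{10} < X_{01}$, and selects either treatment with probability 1/2 if $X_{10} = X_{01}$.

Without loss of generality assume that $T_1$ is the better treatment, i.e. $p_1 > p_2$, or equivalently $\pi_{10} > \pi_{01}$. Then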
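$$\begin{aligned} \mathrm{PCS} &= P\{X_{10} > X_{01}\} + \tfrac{1}{2} P\{X_{10} = X_{01}\} \\ &= \sum_{x=0}^{n} \left[ P\{X_{10} > X_{01} \mid X_{10} + X_{01} = x\} + \tfrac{1}{2} P\{X_{10} = X_{01} \mid X_{10} + X_{01} = x\} \right] \binom{n}{x} \pi^x (1 - \pi)^{n-x}. \end{aligned} \tag{3.1}$$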
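Note that the quantity inside the square brackets in (3.1) is just $P(Y > \tfrac{1}{2}x) + \tfrac{1}{2} P(Y = \tfrac{1}{2}x)$ if $x$ is even, and $P\{Y \ge \tfrac{1}{2}(x + 1)\}$ if $x$ is odd, where $Y$ has a binomial distribution with parameters $x$ and $\lambda = \pi_{10}/\pi$. Denoting it by $g(x, \lambda)$ we have

$$g(x, \lambda) = I_\lambda\{\tfrac{1}{2}(x + 1), \tfrac{1}{2}(x + 1)\} \quad \text{for odd } x \ge 1, \tag{3.2a}$$

$$g(x, \lambda) = I_\lambda(\tfrac{1}{2}x, \tfrac{1}{2}x) \quad \text{for even } x \ge 2, \tag{3.2b}$$

$$g(x, \lambda) = \tfrac{1}{2} \quad \text{for } x = 0, \tag{3.2c}$$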
where

$$I_\lambda(a, b) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)} \int_0^\lambda u^{a-1}(1 - u)^{b-1}\, du$$

denotes the usual incomplete beta function. To guarantee (2.1) it is necessary to find the infimum of the PCS over the region $p_1 - p_2 = \pi_{10} - \pi_{01} \ge \delta^*$, $\pi_{10} + \pi_{01} \le \pi^*$; the minimum value of $n$ which makes this infimum $\ge P^*$ will be the desired sample size. To find the infimum, write $\mathrm{PCS} = E_\pi\{g(X, \lambda)\}$, where $X$ represents a binomial random variable with parameters $n$ and $\pi$, and $E_\pi$ denotes the expectation evaluated at parameter value $\pi$. Intuitively, the infimum of the PCS over the specified region will occur when the $p_i$ are as close as possible and the matching is as ineffective as possible, i.e. when $p_1 - p_2 = \pi_{10} - \pi_{01} = \delta^*$ and $\pi_{10} + \pi_{01} = \pi^*$; this is referred to as the least favourable configuration (LFC). However, it is not completely obvious that the PCS decreases with increasing $\pi$, since this also corresponds to an increase in the number of "effective" observations $X_{10}$ and $X_{01}$. A formal proof of the LFC is thus needed and is given in the Appendix. Note that at the LFC, $\lambda = \pi_{10}/\pi^* = \tfrac{1}{2} + \delta^*/2\pi^*$, and we have
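$$\mathrm{PCS}_{\mathrm{LFC}} = E_{\pi^*}\left\{g\left(X, \tfrac{1}{2} + \frac{\delta^*}{2\pi^*}\right)\right\} = \sum_{x=0}^{n} g\left(x, \tfrac{1}{2} + \frac{\delta^*}{2\pi^*}\right) \binom{n}{x} (\pi^*)^x (1 - \pi^*)^{n-x}. \tag{3.3}$$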
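The following Python sketch evaluates $g(x, \lambda)$ directly from its definition as the bracketed term of (3.1) and checks it against (3.2a)-(3.2c). To stay dependency-free it evaluates $I_\lambda(a, b)$ for positive integers $a, b$ through the standard identity $I_\lambda(a, b) = P\{\mathrm{Bin}(a + b - 1, \lambda) \ge a\}$ rather than by numerical integration (with SciPy one could call `scipy.special.betainc(a, b, lam)` instead). All function names are hypothetical; `pcs_lfc` is the right-hand side of (3.3).

```python
from math import comb


def binom_pmf(k: int, m: int, p: float) -> float:
    """P(Binomial(m, p) = k)."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)


def g(x: int, lam: float) -> float:
    """Bracketed term of (3.1): P(Y > x/2) + 0.5 P(Y = x/2), Y ~ Binomial(x, lam)."""
    total = 0.0
    for k in range(x + 1):
        if 2 * k > x:
            total += binom_pmf(k, x, lam)
        elif 2 * k == x:
            total += 0.5 * binom_pmf(k, x, lam)
    return total


def inc_beta_int(a: int, b: int, lam: float) -> float:
    """Regularized incomplete beta I_lam(a, b) for positive integers a, b,
    via I_lam(a, b) = P(Binomial(a + b - 1, lam) >= a)."""
    m = a + b - 1
    return sum(binom_pmf(k, m, lam) for k in range(a, m + 1))


def g_closed_form(x: int, lam: float) -> float:
    """g(x, lam) as written in (3.2a)-(3.2c)."""
    if x == 0:
        return 0.5                                  # (3.2c)
    if x % 2 == 1:
        h = (x + 1) // 2
        return inc_beta_int(h, h, lam)              # (3.2a)
    return inc_beta_int(x // 2, x // 2, lam)        # (3.2b)


def pcs_lfc(n: int, delta_star: float, pi_star: float) -> float:
    """Right-hand side of (3.3) with X ~ Binomial(n, pi*)."""
    lam = 0.5 + delta_star / (2 * pi_star)
    return sum(g(x, lam) * binom_pmf(x, n, pi_star) for x in range(n + 1))


for x in range(8):
    print(x, round(g(x, 0.7), 6), round(g_closed_form(x, 0.7), 6))
```

The printout also makes visible that $g(x, \lambda) = g(x - 1, \lambda)$ for even $x$: an extra pair can only create or break a tie, and ties are split evenly.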
It is fairly straightforward to verify that the right-hand side of (3.3) is increasing in $n$ and tends to 1 as $n$ tends to $\infty$. Thus any desired value of $P^*$ can be attained by choosing $n$ large enough. It should be noted that when $\delta^* = \pi^*$, (3.3) simplifies to $1 - \tfrac{1}{2}(1 - \pi^*)^n$, and when $\pi^* = 1$, (3.3) simplifies to $g(n, \tfrac{1}{2} + \tfrac{1}{2}\delta^*)$.
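Because the right-hand side of (3.3) grows with $n$, the sample size for given $(\delta^*, \pi^*, P^*)$ can be found by a simple upward search. The sketch below (hypothetical function names; it repeats the direct evaluation of $g$ so that it runs on its own) returns the smallest $n$ with $\mathrm{PCS}_{\mathrm{LFC}} \ge P^*$, and its usage line covers the two special cases just noted.

```python
from math import comb


def g(x: int, lam: float) -> float:
    """P(Y > x/2) + 0.5 P(Y = x/2) for Y ~ Binomial(x, lam)."""
    total = 0.0
    for k in range(x + 1):
        pk = comb(x, k) * lam**k * (1 - lam) ** (x - k)
        total += pk if 2 * k > x else (0.5 * pk if 2 * k == x else 0.0)
    return total


def pcs_lfc(n: int, delta_star: float, pi_star: float) -> float:
    """Right-hand side of (3.3)."""
    lam = 0.5 + delta_star / (2 * pi_star)
    return sum(
        g(x, lam) * comb(n, x) * pi_star**x * (1 - pi_star) ** (n - x)
        for x in range(n + 1)
    )


def exact_sample_size(delta_star: float, pi_star: float, p_star: float, n_max: int = 200) -> int:
    """Smallest n with PCS_LFC(n) >= P*, found by an upward search."""
    for n in range(1, n_max + 1):
        if pcs_lfc(n, delta_star, pi_star) >= p_star:
            return n
    raise ValueError("no n <= n_max attains P*")


# delta* = pi*  ->  PCS_LFC = 1 - 0.5 (1 - pi*)^n
# pi* = 1       ->  PCS_LFC = g(n, 1/2 + delta*/2)
print(exact_sample_size(0.10, 0.10, 0.90), exact_sample_size(0.50, 1.0, 0.90))
```

When $\pi^* = 1$ the search always stops at an odd $n$, since $g(n, \lambda) = g(n - 1, \lambda)$ for even $n$.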
4. TABLESOF SAMPLESIZES The values of n whichguarantee(2.1) werefoundusing(3.3) forn( 35. For n> 35, the followingnormal approximationwas used. Note that since X1o, X0l are multinomial frequencies,for large n, (X1O-X0l) can be regardedas a normal random variable with mean = n(71o- 7T) and variance = n{i71o+ i7T1 - (7-70-ol)2}. LFC can be written as PCSLFCa@{8*4n/V(7T*-8*2)}
Therefore the PCS under the
where (D(.)
denotes
the standard
normal distribution function. From this we obtain
n L (v* 8*2){(D l(p*)}2
(4.1)
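Formula (4.1) translates directly into code. The sketch below uses hypothetical function names and relies on `statistics.NormalDist` from the Python standard library for $\Phi$ and $\Phi^{-1}$; it rounds upwards, as the paper does.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def pcs_lfc_normal(n: int, delta_star: float, pi_star: float) -> float:
    """Large-sample PCS at the LFC: Phi(delta* sqrt(n) / sqrt(pi* - delta*^2))."""
    return _STD_NORMAL.cdf(delta_star * sqrt(n) / sqrt(pi_star - delta_star**2))


def approx_sample_size(delta_star: float, pi_star: float, p_star: float) -> int:
    """(4.1) rounded upwards: n = (pi* - delta*^2) / delta*^2 * (Phi^{-1}(P*))^2."""
    z = _STD_NORMAL.inv_cdf(p_star)
    return ceil((pi_star - delta_star**2) / delta_star**2 * z**2)


print(approx_sample_size(0.05, 0.1, 0.90), approx_sample_size(0.05, 1.0, 0.95))
```

The returned $n$ is the smallest integer at which `pcs_lfc_normal` reaches $P^*$, which is just (4.1) read backwards.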
This approximation is useful when $P^*$ is large and/or $\delta^*$ is small and/or $\pi^*$ is large. The values of $n$ obtained from (4.1) were rounded upwards. The calculations for $P^* = 0.90$ and $0.95$ appear in Table 1; the values of $n$ for $P^* = 0.99$ are almost exactly double those for $P^* = 0.95$ (in (4.1), $\{\Phi^{-1}(0.99)\}^2/\{\Phi^{-1}(0.95)\}^2 \approx 2.00$) and hence are not given here.
TABLE 1
Values of n (rows: δ*; columns: π*; entries are omitted where δ* > π*)

P* = 0.90
δ* \ π*    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
0.05        65   130   196   262   327   393   459   524   590   656
0.10        16    32    48    65    81    97   114   130   147   163
0.15         -    14    21    29    35    43    50    57    65    72
0.20         -     8    12    16    20    24    28    32    36    40
0.25         -     -     7    10    13    15    18    21    23    25
0.30         -     -     5     7     9    10    12    14    16    17
0.35         -     -     -     5     6     8     9    10    12    13
0.40         -     -     -     4     5     6     7     8     9     9
0.45         -     -     -     -     4     4     5     6     7     7
0.50         -     -     -     -     3     3     4     5     5     7

P* = 0.95
δ* \ π*    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0
0.05       106   214   322   431   539   647   755   864   972  1080
0.10        23    52    79   106   133   160   187   214   241   268
0.15         -    21    34    46    58    70    82    94   106   118
0.20         -    11    18    25    32    38    45    52    59    65
0.25         -     -    11    16    20    25    29    34    37    41
0.30         -     -     7    10    14    17    20    23    26    29
0.35         -     -     -     7    10    12    14    17    19    21
0.40         -     -     -     5     7     9    11    13    14    17
0.45         -     -     -     -     5     7     8    10    11    13
0.50         -     -     -     -     4     5     6     8     9     9
To check the accuracy of the normal approximation we computed exact and approximate values of $n$ for $20 \le n \le 35$ and found that the approximate $n$ is always within $\pm 1$ of the exact $n$; the accuracy of the approximation improves with increasing $n$. Thus the normal approximation should be very good for $n > 35$.
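A comparison of this kind can be scripted by pairing the exact search over (3.3) with the rounded-up value from (4.1). The sketch below is illustrative only: the function names are hypothetical and the small grid of $(\delta^*, \pi^*)$ pairs is an arbitrary choice, not the grid used in the paper; it simply prints both values side by side.

```python
from math import ceil, comb
from statistics import NormalDist


def g(x: int, lam: float) -> float:
    """P(Y > x/2) + 0.5 P(Y = x/2), Y ~ Binomial(x, lam)."""
    total = 0.0
    for k in range(x + 1):
        pk = comb(x, k) * lam**k * (1 - lam) ** (x - k)
        total += pk if 2 * k > x else (0.5 * pk if 2 * k == x else 0.0)
    return total


def exact_n(delta_star: float, pi_star: float, p_star: float, n_max: int = 200) -> int:
    """Smallest n whose exact PCS_LFC from (3.3) reaches P*."""
    lam = 0.5 + delta_star / (2 * pi_star)
    for n in range(1, n_max + 1):
        pcs = sum(g(x, lam) * comb(n, x) * pi_star**x * (1 - pi_star) ** (n - x)
                  for x in range(n + 1))
        if pcs >= p_star:
            return n
    raise ValueError("n_max too small")


def approx_n(delta_star: float, pi_star: float, p_star: float) -> int:
    """(4.1) rounded upwards."""
    z = NormalDist().inv_cdf(p_star)
    return ceil((pi_star - delta_star**2) / delta_star**2 * z**2)


for delta_star, pi_star in [(0.05, 0.05), (0.15, 0.5), (0.20, 0.8)]:
    e, a = exact_n(delta_star, pi_star, 0.90), approx_n(delta_star, pi_star, 0.90)
    print(f"delta*={delta_star:.2f} pi*={pi_star:.2f}: exact n={e}, approx n={a}")
```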
5. COMPARISON OF MATCHED SAMPLES DESIGN WITH INDEPENDENT SAMPLES DESIGN
In the case of the independent samples design the goal of the experimenter is the same as before, namely to select the treatment having the larger success probability. However, since no $\pi_{ij}$ are present here, the probability requirement simply reads

$$\mathrm{PCS} \ge P^* \quad \text{whenever} \quad p_{[2]} - p_{[1]} \ge \delta^*, \tag{5.1}$$

where $\{\delta^*, P^*\}$ are constants specified before experimentation starts; $0 < \delta^* < 1$ and $\tfrac{1}{2} < P^* < 1$. Sobel and Huyett (1957) proposed a single-stage procedure which guarantees (5.1) and showed that the optimal sample size per population, in large samples, is

$$n \approx \frac{1 - \delta^{*2}}{2\delta^{*2}} \{\Phi^{-1}(P^*)\}^2. \tag{5.2}$$
The relative efficiency (RE) of the matched samples design, in terms of the ratio of the sample size (5.2) to the sample size (4.1), is

$$\mathrm{RE} = \frac{1 - \delta^{*2}}{2(\pi^* - \delta^{*2})}. \tag{5.3}$$
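The large-sample comparison in (4.1), (5.2) and (5.3) is easy to reproduce numerically. The sketch below (hypothetical function names, with the standard-library `statistics.NormalDist` supplying $\Phi^{-1}$) computes both unrounded sample sizes and shows that their ratio equals (5.3) whatever $P^*$ is.

```python
from statistics import NormalDist


def n_matched(delta_star: float, pi_star: float, p_star: float) -> float:
    """Unrounded (4.1): matched pairs needed."""
    z = NormalDist().inv_cdf(p_star)
    return (pi_star - delta_star**2) / delta_star**2 * z**2


def n_independent(delta_star: float, p_star: float) -> float:
    """Unrounded (5.2): Sobel-Huyett observations per population."""
    z = NormalDist().inv_cdf(p_star)
    return (1 - delta_star**2) / (2 * delta_star**2) * z**2


def relative_efficiency(delta_star: float, pi_star: float) -> float:
    """(5.3): n_independent / n_matched; the factor {Phi^{-1}(P*)}^2 cancels."""
    return (1 - delta_star**2) / (2 * (pi_star - delta_star**2))


for p_star in (0.90, 0.95, 0.99):
    ratio = n_independent(0.1, p_star) / n_matched(0.1, 0.3, p_star)
    print(p_star, round(ratio, 6), round(relative_efficiency(0.1, 0.3), 6))
```

For $\delta^* = 0.1$ the break-even point $\tfrac{1}{2}(1 + \delta^{*2})$ is $\pi^* = 0.505$, and with $\pi^* = 1$ the function returns 0.5.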
Note that we assume the values of $\delta^*$ and $P^*$ specified by (2.1) and (5.1) are the same, and that RE does not depend on $P^*$. Furthermore, $\mathrm{RE} > 1$ if $\pi^* < \tfrac{1}{2}(1 + \delta^{*2})$. Also, if $\pi^* = 1$ (i.e. $\pi = \pi_{10} + \pi_{01}$ is not constrained by any prior knowledge about its value), then $\mathrm{RE} = 0.5$. Thus for $\pi^* > \tfrac{1}{2}(1 + \delta^{*2})$ the matched samples design is less efficient in the large-sample case than the independent samples design. If the experimenter does not assume any prior knowledge concerning the value of $\pi$, then in the large-sample case the matched samples design requires twice as many observations as the independent samples design to guarantee the same probability requirement.

These results give a quantitative measure of how effective the matching must be (in our notation, how small $\pi$ must be) so that the matched samples design is more efficient than the independent samples design. The main conclusion to be drawn from this discussion is that, in the design of a matched samples experiment, a high level of matching should be ensured. This can be achieved by choosing the matching variables so that they are highly correlated with the outcome variables. If the matching is ineffective, then there can be a considerable loss in efficiency relative to the independent samples design.

The results obtained here are in broad agreement with similar work done for the testing problem in $2 \times 2$ tables by several authors; see, for example, Youkeles (1963), Worcester (1964) and Miettinen (1968). These authors also reach the conclusion that, for testing the homogeneity of two proportions, a matched samples design can be disadvantageous if the matching is not effective, and that the advantage is not substantial unless the matching is highly effective. For additional references and for some practical aspects of matching see McKinlay (1977).

ACKNOWLEDGEMENTS
The author is thankful to three referees for making many useful suggestions for improvement of the presentation. This research was supported by NSF Grant No. ENG77-06112.

REFERENCES
BECHHOFER, R. E. (1954). A single-sample multiple-decision procedure for ranking means of normal populations with known variances. Ann. Math. Statist., 25, 16-39.
BENNET, B. M. (1967). Tests of hypotheses concerning matched samples. J. R. Statist. Soc. B, 29, 468-474.
BENNET, B. M. (1968). Note on $\chi^2$ tests for matched samples. J. R. Statist. Soc. B, 30, 368-370.
BHAPKAR, V. P. (1973). On the comparison of proportions in matched samples. Sankhyā A, 35, 341-356.
BHAPKAR, V. P. and SOMES, G. W. (1977). Distribution of Q when testing equality of matched proportions. J. Amer. Statist. Ass., 72, 658-661.
COCHRAN, W. G. (1950). Comparison of percentages in matched samples. Biometrika, 37, 256-266.
GUPTA, S. S. and PANCHAPAKESAN, S. (1972). On a class of subset selection procedures. Ann. Math. Statist., 43, 814-822.
McKINLAY, S. M. (1977). Pair-matching: a reappraisal of a popular technique. Biometrics, 33, 725-735.
McNEMAR, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12, 153-157.
MIETTINEN, O. S. (1968). The matched pairs design in the case of all-or-none response. Biometrics, 24, 339-352.
SOBEL, M. and HUYETT, M. J. (1957). Selecting the best one of several binomial populations. Bell System Tech. J., 36, 537-576.
WORCESTER, J. (1964). Matched samples in epidemiologic studies. Biometrics, 20, 840-848.
YOUKELES, L. H. (1963). Loss of power through ineffective pairing of observations in small two-treatment all-or-none experiments. Biometrics, 19, 175-180.
APPENDIX
To prove the assertion regarding the LFC, we first keep $\pi$ fixed and regard the PCS as a function of $\lambda = \pi_{10}/\pi$. Since $\lambda = \tfrac{1}{2} + (\pi_{10} - \pi_{01})/2\pi$, to show that, subject to $\pi_{10} + \pi_{01} = \pi$ (fixed), the PCS is minimized at $\pi_{10} - \pi_{01} = \delta^*$, it suffices to show that $g$ is a non-decreasing function of $\lambda$ for each $x$. But this follows immediately from (3.2); in fact $g$ is a strictly increasing function of $\lambda \ge \tfrac{1}{2}$ for each $x \ge 1$. Our next task is to find the infimum over $\pi \le \pi^*$ of
$$\inf_{\pi_{10} - \pi_{01} \ge \delta^*} \mathrm{PCS} = E_\pi\{g(X, \lambda^*)\}, \tag{A.1}$$

where $\lambda^* = \tfrac{1}{2}(1 + \delta^*/\pi)$. To show that this infimum occurs at $\pi = \pi^*$ we prove the following.
Theorem. $E_\pi\{g(X, \lambda^*)\}$ is a decreasing function of $\pi$.
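Proof. Denote the binomial probability $\binom{n}{x} p^x (1 - p)^{n-x}$ by $b(x; n, p)$ and the corresponding distribution function by $B(x; n, p)$. A "discrete analog" of Theorem 2.1 of Gupta and Panchapakesan (1972) shows that the condition to be verified for the monotonicity of $E_\pi\{g(X, \lambda^*)\}$ relative to $\pi$ is

$$\left\{\frac{\partial}{\partial \pi} g(x, \lambda^*)\right\} b(x; n, \pi) - \{g(x, \lambda^*) - g(x - 1, \lambda^*)\} \frac{\partial}{\partial \pi} B(x - 1; n, \pi) \le 0 \tag{A.2}$$

for $1 \le x \le n$; for $x = 0$ the left-hand side of (A.2) is 0. Substitute $(\partial/\partial\pi) B(x - 1; n, \pi) = -(x/\pi)\, b(x; n, \pi)$ in (A.2) and find that the condition to be verified becomes

$$\frac{\partial}{\partial \pi} g(x, \lambda^*) + \frac{x}{\pi}\{g(x, \lambda^*) - g(x - 1, \lambda^*)\} \le 0. \tag{A.3}$$

With some algebraic manipulations it can be shown that, for $x$ odd,

$$g(x, \lambda^*) - g(x - 1, \lambda^*) = -\frac{\pi}{x} \frac{\partial}{\partial \pi} g(x, \lambda^*).$$

Thus we find that the left-hand side of (A.3) is 0 for $x$ odd. For $x$ even $\ge 2$ it follows from (3.2a) and (3.2b) that $g(x, \lambda^*) - g(x - 1, \lambda^*) = 0$. Furthermore, it can easily be verified that $(\partial/\partial\pi) g(x, \lambda^*) < 0$ for $x$ even $\ge 2$, so that (A.3) holds with strict inequality for such $x$. Since (A.3) therefore holds in all the cases, with strict inequality for even $x \ge 2$, it follows that $E_\pi\{g(X, \lambda^*)\}$ is a decreasing function of $\pi$, which proves the theorem.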