The Annals of Statistics
1989, Vol. 17, No. 3, 1325-1334
OPTIMAL-PARTITIONING INEQUALITIES IN CLASSIFICATION AND MULTI-HYPOTHESES TESTING

By Theodore P. Hill¹ and Y. L. Tong²
Georgia Institute of Technology

Optimal-partitioning and minimax risk inequalities are obtained for the classification and multi-hypotheses testing problems. Best possible bounds are derived for the minimax risk for location parameter families, based on the tail concentrations and Lévy concentrations of the distributions. Special attention is given to continuous distributions with the maximum likelihood ratio property and to symmetric unimodal continuous distributions. Bounds for general (including discontinuous) distributions are also obtained.
1. Preliminaries. The statistical classification problem, in its standard form, deals with optimal decision rules for classifying an observation into one of several specified populations. The problem is closely related to the following multi-hypotheses testing problem: For $n \ge 2$, let $F_1, \ldots, F_n$ be given (univariate) distributions. Let $X$ be a random variable with distribution $F$. In testing the hypotheses

$$H_i\colon F = F_i, \qquad i = 1, \ldots, n, \tag{1.1}$$
a decision rule corresponds to a measurable partition $\{A_i\}_{i=1}^n$ of the real line such that $H_i$ is accepted iff $X \in A_i$. The main purpose of this paper is to use optimal-partitioning results for densities with the monotone likelihood ratio (MLR) property together with convexity to derive some best-possible inequalities for the minimax risk, in terms of two probability-concentration parameters (the tail-$d$ concentration, Definition 2.1 below, and the Lévy concentration, Definition 2.4) of continuous distributions, for general location parameter families and for symmetric unimodal densities (Section 2). Analogous results for discontinuous distributions are then given (Section 3).

For the objective of minimizing the largest probability of misclassification, the standard classification problem is equivalent to many "fair-division" problems in which there are $n$ probability measures $\mu_1, \ldots, \mu_n$ defined on the same space, and the objective is to partition the space so as to maximize the minimum share, i.e., to find an ordered measurable partition $(A_1^*, \ldots, A_n^*)$ which attains or nearly attains

$$C^*(\mu) = \sup\Bigl\{\min_{i \le n} \mu_i(A_i)\colon (A_1, \ldots, A_n) \text{ is a measurable partition of } \Omega\Bigr\},$$

[...] with $\theta_i = (i-1)d$ for fixed $d > 0$. A more general result is then obtained under the additional assumption that $\{f(x - \theta_i)\}$ possesses the MLR property.

DEFINITION 2.1. The tail-$d$ concentration of $F$, $\rho(F, d)$, is defined by

$$\rho(F, d) = \max\bigl\{\mu((-\infty, \operatorname{ess\,inf} F + d]),\ \mu([\operatorname{ess\,sup} F - d, \infty))\bigr\}. \tag{2.1}$$
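As a rough illustration of Definition 2.1, the following sketch evaluates the tail-$d$ concentration from a distribution function. The helper names (`tail_concentration`, `exp_cdf`, `uniform_cdf`) are hypothetical, and the caller is assumed to supply the essential infimum and supremum of a continuous $F$; an infinite endpoint contributes zero mass to its tail term.

```python
import math

def tail_concentration(cdf, ess_inf, ess_sup, d):
    """Tail-d concentration rho(F, d) of a continuous distribution.

    cdf      -- distribution function F (assumed continuous)
    ess_inf  -- essential infimum of F (may be -inf)
    ess_sup  -- essential supremum of F (may be +inf)
    """
    # Mass of (-inf, ess inf F + d]; zero when the lower endpoint is infinite.
    lower = cdf(ess_inf + d) if math.isfinite(ess_inf) else 0.0
    # Mass of [ess sup F - d, inf); zero when the upper endpoint is infinite.
    upper = 1.0 - cdf(ess_sup - d) if math.isfinite(ess_sup) else 0.0
    return max(lower, upper)

def exp_cdf(x):
    """Standard negative exponential distribution function."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def uniform_cdf(x, M):
    """Uniform distribution function on [-M, M]."""
    return min(1.0, max(0.0, (x + M) / (2.0 * M)))

# Exponential: only the lower tail is finite, so rho = 1 - exp(-d).
rho_exp = tail_concentration(exp_cdf, 0.0, math.inf, 0.5)
# Uniform on [-5, 5]: both tails carry mass d / (2M).
rho_unif = tail_concentration(lambda x: uniform_cdf(x, 5.0), -5.0, 5.0, 1.0)
```

Under these assumptions the uniform case shrinks to zero as $M$ grows, which is the behaviour used in the sharpness argument for $q = 1$.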
Note that if $F_1$, $F_2$ are two continuous distributions such that $F_2(x) = F_1(x - d)$ and $a = \operatorname{ess\,inf} F_1$, $b = \operatorname{ess\,sup} F_1$, then $\mu_1((b, b + d]) = \mu_2([a, a + d)) = 0$. So under an optimal classification rule $A^* = (A_1^*, A_2^*)$ one has $(b, b + d] \subset A_2^*$ a.s. and $[a, a + d) \subset A_1^*$ a.s. Furthermore, note that $\rho(F, d) = 0$ if and only if $\operatorname{ess\,inf} F = -\infty$ and $\operatorname{ess\,sup} F = \infty$.

THEOREM 2.2. If $F$ is continuous and $F_i(x) = F(x - (i-1)d)$ for $i = 1, \ldots, n$, then

$$C^*(\mu) \ge \Bigl(1 + \sum_{j=1}^{n-1} q^j\Bigr)^{-1}, \tag{2.2}$$
where $q = 1 - \rho(F, d)$. Moreover, this bound is best possible and is attained for all $n$, all $d$ and all $q$ [...]

Suppose that $\rho(F, d) \in (0, 1)$. Since [...] implies $\operatorname{ess\,inf} F > -\infty$ or $\operatorname{ess\,sup} F$ [...] it may [...] $k \ge 1$ [...] $= ((-\infty, d], (d, 2d], \ldots, ((k-1)d, \infty), \emptyset, \ldots)$ [...]. Then

$$\{a_1, \ldots, a_n\} \subset PR(\mu), \tag{2.3}$$

where $a_k = \mu(A_k) = (\rho, \ldots, \rho, 1, 0, \ldots, 0)$ is the vector in $R^n$ with 1 in the $k$th coordinate and preceded by $k - 1$ entries of $\rho = \rho(F, d)$. Let $\beta_k = q^{n-k}/(1 + \sum_{j=1}^{n-1} q^j)$ for $k = 1, \ldots, n$. By (2.3) and Proposition 1.1(ii),

$$a = \sum_{k=1}^{n} \beta_k a_k \in PR(\mu)$$

and an easy calculation shows that each entry of $a$ is $(1 + \sum_{j=1}^{n-1} q^j)^{-1}$, which establishes (2.2).

To see that (2.2) is best possible for $q = 1$, let $F_1 = F$ be uniformly distributed on $[-M, M]$. Then as $M \to \infty$, $\rho(F, d) \to 0$ and $C^*(\mu) \to n^{-1}$. For $q = 0$, any distribution with support in $[0, d/2]$ attains the bound in (2.2). That (2.2) is attained for all $n$, all $d$ and all $q \in (0, 1)$ is shown by the next example. □
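The "easy calculation" in the proof of Theorem 2.2 can be checked with exact rational arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes that the repeated entry of each vector $a_k$ equals $1 - q = \rho(F, d)$, which is what the stated conclusion requires.

```python
from fractions import Fraction

def theorem_2_2_bound(q, n):
    """Right-hand side of (2.2): (1 + sum_{j=1}^{n-1} q^j)^(-1)."""
    return 1 / sum(q**j for j in range(n))  # the j = 0 term supplies the leading 1

def convex_combination(q, n):
    """Entries of a = sum_k beta_k a_k with beta_k = q^(n-k) / (1 + sum_j q^j)."""
    rho = 1 - q                       # assumed value of the repeated entry
    total = sum(q**j for j in range(n))
    betas = [q**(n - k) / total for k in range(1, n + 1)]
    # a_k has k-1 entries equal to rho, then a 1 in coordinate k, then zeros.
    vectors = [[rho] * (k - 1) + [1] + [0] * (n - k) for k in range(1, n + 1)]
    entries = [sum(b * v[i] for b, v in zip(betas, vectors)) for i in range(n)]
    return betas, entries

q_demo, n_demo = Fraction(1, 3), 5
betas_demo, entries_demo = convex_combination(q_demo, n_demo)
```

With exact fractions the weights sum to one and every coordinate of the combination equals the bound in (2.2), for any $q \in (0, 1)$ and $n \ge 2$ one cares to try.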
EXAMPLE 2.3. Let $F(x) = 1 - e^{-x}$ for $x > 0$ and for fixed $n > 1$ and $d > 0$ let $F_i(x) = F(x - (i-1)d)$ for $i = 1, \ldots, n$. Then the corresponding density functions are negative exponential with location parameters $\theta_i = (i-1)d$, i.e.,

$$f_i(x) = \exp(-(x - (i-1)d)) \quad \text{for } x \ge (i-1)d$$

and zero otherwise for $i = 1, \ldots, n$.
Clearly $\{f_i\}$ has the MLR property, so by Theorem 1.6 there exist positive constants $d_1^* < \cdots < d_{n-1}^*$ [...] $> \rho$, so $d_1^* > d$ and inductively $d_k^* > kd$ for all $k \ge 1$. This implies that

$$f_1 = q f_2 = q^2 f_3 = \cdots = q^{j-1} f_j \quad \text{on } (d_{j-1}^*, d_j^*) \tag{2.4}$$

for $j = 2, \ldots, n$ ($d_0^* = 0$, $d_n^* = \infty$). Together (2.4) and (1.5) imply

$$\int_{d_{j-1}^*}^{d_j^*} f_j = q^{-j+1} \int_{d_{j-1}^*}^{d_j^*} f_1 = C^*(\mu) \quad \text{for } j = 1, \ldots, n. \tag{2.5}$$

Since $\sum_{j=1}^{n} \int_{d_{j-1}^*}^{d_j^*} f_1 = 1$, it follows from (2.5) that $C^*(\mu) = (1 + \sum_{j=1}^{n-1} q^j)^{-1}$.

If $n = 2$, the location parameter classification problem is precisely the problem of testing a simple null hypothesis $H_1\colon \theta = \theta_1$ against a simple alternative
$H_2\colon \theta = \theta_2$, where $\theta_2 = \theta_1 + d$ for some $d > 0$. In this case a sharp bound for the minimax risk in terms of the Lévy-concentration function is given by the next theorem.

DEFINITION 2.4. The Lévy concentration for $F$, $d$ is

$$\lambda(F, d) = \sup_x \{F(x + d) - F(x)\} \in (0, 1].$$
THEOREM 2.5. Let $X$ have a continuous distribution $F(x - \theta)$ with Lévy concentration $\lambda = \lambda(F, d)$ and let $\theta_1, \theta_2$ be location parameters such that $\theta_2 - \theta_1 = d > 0$. Then there exists a test for testing

$$H_1\colon \theta = \theta_1 \quad \text{versus} \quad H_2\colon \theta = \theta_2 \tag{2.6}$$

which satisfies

$$\max\{\alpha, \beta\} \le (1 - \lambda)/(2 - \lambda), \tag{2.7}$$

where $\alpha, \beta$ are the type I and type II errors, respectively. Moreover, this bound is attained for all $d$ and all $\lambda$.

REMARK. We note that, by definition, $\lambda(F, d) \ge \rho(F, d)$ for all $F$ and all $d > 0$, and equality holds for monotone density functions. If $n = 2$, then $(1 + q)^{-1} = 1 - (1 - \rho)/(2 - \rho) \le 1 - (1 - \lambda)/(2 - \lambda)$ always holds. Thus the bound in (2.7) is sharper than that in (2.2).
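To make the comparison in the Remark concrete, the sketch below approximates the Lévy concentration of Definition 2.4 on a grid and compares the two risk bounds for a standard normal $F$, for which $\rho(F, d) = 0$ because both essential endpoints are infinite. The function names, the grid range and the grid size are illustrative assumptions; a grid maximum can only under-estimate the true supremum.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def levy_concentration(cdf, d, lo=-10.0, hi=10.0, steps=20001):
    """Grid approximation of lambda(F, d) = sup_x {F(x + d) - F(x)}."""
    h = (hi - lo) / (steps - 1)
    return max(cdf(lo + i * h + d) - cdf(lo + i * h) for i in range(steps))

def risk_bound_levy(lam):
    """Bound (2.7) on max{alpha, beta}."""
    return (1.0 - lam) / (2.0 - lam)

def risk_bound_tail(rho):
    """Risk bound implied by (2.2) with n = 2: 1 - (1 + q)^(-1), q = 1 - rho."""
    return (1.0 - rho) / (2.0 - rho)

d_demo = 1.0
lam_demo = levy_concentration(normal_cdf, d_demo)
# For a symmetric unimodal F the supremum sits at x = -d/2, giving 2*Phi(d/2) - 1.
lam_exact = math.erf(d_demo / (2.0 * math.sqrt(2.0)))
levy_bound = risk_bound_levy(lam_demo)
tail_bound = risk_bound_tail(0.0)   # rho = 0 for the normal distribution
```

In this case the tail-based bound is the trivial value 1/2, while the Lévy-based bound is strictly smaller, consistent with (2.7) being the sharper of the two.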
PROOF OF THEOREM 2.5. For notational convenience assume $\theta_1 = 0$. We show that there is a test with

$$C^*(\mu) = \min\{1 - \alpha, 1 - \beta\} \ge (2 - \lambda)^{-1}. \tag{2.8}$$

[...] unimodal, and if $F_i(x) = F(x - (i-1)d)$ for fixed $d > 0$ and $i = 1, \ldots, n$, then

$$C^*(\mu) \ge \Bigl(1 + 2\sum_{j=1}^{m-1} \tau^j + (k + 1)\tau^m\Bigr)^{-1}, \tag{2.10}$$

where $m$ is the largest integer less than or equal to $n/2$, $k = n - 2m$, $\tau = (1 - \lambda)/(1 + \lambda)$ and $\lambda = \lambda(F, d)$. Moreover, this bound is attained for all $n$, $d$ and $\lambda$.

PROOF. CASE 1. $n = 2m$ for some $m \ge 1$. Using the symmetry of $F$ and Definition 2.4, it is easy to see that

$$\{v_1, \ldots, v_m\} \subset PR(\mu), \tag{2.11}$$

where

$$v_1 = ((1 + \lambda)/2, \lambda, \ldots, \lambda, (1 + \lambda)/2),$$
$$v_2 = (0, (1 + \lambda)/2, \lambda, \ldots, \lambda, (1 + \lambda)/2, 0), \quad \ldots,$$
$$v_m = (0, \ldots, 0, (1 + \lambda)/2, (1 + \lambda)/2, 0, \ldots, 0).$$
[For example, $v_2 = (\mu_1(\emptyset), \mu_2((-\infty, b + 3d/2]), \mu_3((b + 3d/2, b + 5d/2]), \ldots, \mu_n(\emptyset))$.] For $\tau = (1 - \lambda)/(1 + \lambda)$ define

$$\beta_j = \tau^{j-1} \Bigl(\sum_{i=0}^{m-1} \tau^i\Bigr)^{-1} \quad \text{for } j = 1, \ldots, m.$$

Then $\beta_j \ge 0$ and $\sum_{j=1}^{m} \beta_j = 1$. It follows from (2.11) and Proposition 1.1(ii) that $\sum_{j=1}^{m} \beta_j v_j = (c, c, \ldots, c) \in PR(\mu)$, where $c = (1 + 2\sum_{j=1}^{m-1} \tau^j + \tau^m)^{-1}$.

CASE 2. $n = 2m + 1$ for some $m \ge 1$. Proceed as in Case 1 using the additional vector $v_{m+1} = (0, 0, \ldots, 0, 1, 0, \ldots, 0)$.

To see that these bounds are attained for all $n$, $d$ and $\lambda$, consider the continuous symmetric (about $d/2$) unimodal distribution $F$ with right-half
density given by

$$f(x) = \lambda \tau^j \quad \text{for } x \in [jd, (j + 1)d) \text{ for } j = 0, 1, 2, \ldots.$$

Letting $f_i(x) = f(x - (i-1)d)$ for $i = 1, \ldots, n$, note that $\{f_1, \ldots, f_n\}$ have common support and the MLR property. Then proceed as in Example 2.3, using Theorem 1.6 to show that the bound in (2.10) is attained. □

REMARK. The authors believe that, for all $n > 2$, the conclusion of Theorem 2.2 is true even if $q = 1 - \rho(F, d)$ is replaced by $q = 1 - \lambda(F, d)$, which is a stronger result since $\lambda(F, d) \ge \rho(F, d)$. The Lévy concentration $\lambda(F, d)$ is, as is the variance, some gauge of how spread out the distribution $F$ is, and analogous bounds for the minimax risk in terms of the variance of the distribution are also possible. Although the best possible bounds are not known to the authors, the bounds in Theorems 2.5 and 2.6 may be used to obtain corresponding minimax-risk inequalities in terms of the variance by applying inequalities of Lévy [e.g., Hengartner and Theodorescu (1973), pages 26-30] which give bounds on $\lambda$ in terms of the variance and vice versa.
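The convex-combination step in Case 1 of the proof of Theorem 2.6 can be verified exactly for any particular $\lambda$ and $m$. The following sketch is an illustration with hypothetical function names; it builds the vectors $v_1, \ldots, v_m$ of (2.11), applies the weights $\beta_j = \tau^{j-1}/\sum_{i=0}^{m-1} \tau^i$, and compares each coordinate with the right-hand side of (2.10) for $k = 0$.

```python
from fractions import Fraction

def bound_2_10(lam, n):
    """Right-hand side of (2.10) with m = floor(n/2), k = n - 2m."""
    m = n // 2
    k = n - 2 * m
    tau = (1 - lam) / (1 + lam)
    return 1 / (1 + 2 * sum(tau**j for j in range(1, m)) + (k + 1) * tau**m)

def case1_combination(lam, m):
    """Entries of sum_j beta_j v_j for n = 2m, using the vectors in (2.11)."""
    n = 2 * m
    tau = (1 - lam) / (1 + lam)
    edge = (1 + lam) / 2
    total = sum(tau**i for i in range(m))
    betas = [tau**(j - 1) / total for j in range(1, m + 1)]
    vectors = []
    for j in range(1, m + 1):
        inner = n - 2 * j  # number of lambda entries between the two edge entries
        vectors.append([0] * (j - 1) + [edge] + [lam] * inner + [edge] + [0] * (j - 1))
    entries = [sum(b * v[i] for b, v in zip(betas, vectors)) for i in range(n)]
    return betas, entries

lam_demo, m_demo = Fraction(2, 5), 3
betas_demo, entries_demo = case1_combination(lam_demo, m_demo)
c_demo = bound_2_10(lam_demo, 2 * m_demo)
```

Exact fractions avoid any rounding question: each of the $2m$ coordinates comes out equal to $c$, as the proof asserts.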
Thus far we have considered distributions with equally spaced location parameters, i.e., $\theta_i = (i-1)d$, $i = 1, \ldots, n$. In the following we extend the results given in Theorems 2.2 and 2.6 to yield lower bounds for the more general case. Toward this end we first observe a lemma concerning a monotonicity property of the optimal partitioning problem.

LEMMA 2.7. Let $\delta_i, \delta_i'$ for $i = 1, \ldots, n - 1$ be positive real numbers and define $\theta = (\theta_1, \ldots, \theta_n)$, $\theta' = (\theta_1', \ldots, \theta_n')$, where $\theta_1 = \theta_1' = 0$, $\theta_i = \sum_{j=1}^{i-1} \delta_j$, $\theta_i' = \sum_{j=1}^{i-1} \delta_j'$ for $i = 2, \ldots, n$. Let $F$ be a continuous distribution function and define $\mathbf{F}$ and $\mathbf{F}'$ by

$$\mathbf{F} = (F(x), F(x - \theta_2), \ldots, F(x - \theta_n)),$$
$$\mathbf{F}' = (F(x), F(x - \theta_2'), \ldots, F(x - \theta_n')).$$
Let $C^*(\mu_\theta)$ and $C^*(\mu_{\theta'})$ correspond to the minimax risks when the true distribution vector is $\mathbf{F}$ or $\mathbf{F}'$, respectively. If the density functions of $\mathbf{F}$ and $\mathbf{F}'$ have the MLR property and if $\delta_i \le \delta_i'$ for $i = 1, \ldots, n - 1$, then $C^*(\mu_\theta) \le C^*(\mu_{\theta'})$.

PROOF. By induction it suffices to show that the statement holds for $\delta_l$ [...] $\delta_l' - \delta_l$ [...] $A_i' = (d_{i-1}^* + \Delta, d_i^* + \Delta]$ for $i > l$.
Then

$$\int_{A_l'} dF(x - \theta_l') \ge \int_{A_l} dF(x - \theta_l),$$
$$\int_{A_i'} dF(x - \theta_i') = \int_{A_i} dF(x - \theta_i) \quad \text{for } i > l.$$
Since $A'$ is not necessarily an optimal partition when the true distribution vector is $\mathbf{F}'$, the proof is complete. □
Combining Lemma 2.7 with Theorems 2.2 and 2.6, one immediately obtains the following theorems which apply to all location parameter families of distributions when the densities possess the MLR property.

THEOREM 2.2'. Let $F$ be a continuous distribution, $F_i(x) = F(x - \theta_i)$ for [...] and let $d = \min_i$