A New Class of Minimum Power Divergence Estimators with ...

Comment

Report 3 Downloads 33 Views

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance Nirian Martín1 and Yi Li2;3 1 Dept. Statistics, Carlos III University of Madrid 2 Dept. Biostatistics & Computational Biology, Dana Farber Cancer Institute 3 Dept. Biostatistics, Harvard School of Public Health January 20, 2011

Abstract The Annual Percent Change (APC) has been adopted as a useful measure for analyzing the changing trends of cancer mortality and incidence rates by the NCI SEER program. Di¢ culties, however, arise when comparing the sample APCs between two overlapping regions because of the induced dependence (e.g., comparing the cancer mortality change rate of California with the national level). This paper deals with a new perspective of understanding the sample distribution of the test statistics for comparing the APCs between overlapping regions. Our proposal allows for computational readiness and easy interpretability. We further propose a more general family of estimators, namely, the so-called minimum power divergence estimators, including the maximum likelihood estimators as a special case. Our simulation experiments support the superiority of the proposed estimator to the conventional maximum likelihood estimator. The proposed method is illustrated by the analysis of the SEER cancer mortality rates observed from 1991 to 2006.

Keywords: Minimum power divergence estimators, Age-adjusted cancer rates, Annual percent change (APC), Trends, Poisson sampling.

1

Introduction

According to the World Health Statistics 2009, published by the World Health Organization, in 2004 the age-standardized mortality rate in high-income countries attributable to cancer deaths was 164 per 100,000. Cancer constituted the second highest cause of death after cardiovascular disease (its agestandardized mortality rate was equal to 408 per 100,000). For cancer prevention and control programs, such as the Surveillance, Epidemiology and End Results (SEER) in the United States (US), it is very important to rely on statistical tools to capture downward or upward trends of rates associated with each type of cancer and to measure their intensity accurately. These trends in cancer rates are de…ned within a speci…c spatial-temporal framework, that is, di¤erent geographic regions and time periods are considered. Let rki be the expected value of the cancer rate associated with region k and the i-th time point in a k sequence of ordered Ik time points ftki gIi=1 . We shall assume that Region 1 starts with the earliest time. Each point is representing an equally spaced period of time, for instance a year, and thus without any loss of generality t1i = i, i = 1; :::; I1 (any change in origin or scale with respect to the time should not a¤ect a measure of trend). The cancer rates are useful to evaluate either the risk of developing cancer Corresponding author, E-mail: [email protected].

1

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

2

(cancer incidence rates) or dying from cancer (cancer mortality rates) in a speci…c moment. Statistically, the trend in cancer rates is an average rate of change per year in a given relatively short period of time framework when constant change along the time has been assumed. The annual percent change (APC) is a suitable measure for comparing recent trends associated with age-adjusted expected cancer rates rki =

J P

!j rkji ;

(1)

j=1

PJ where J is the number of age-groups, f!j gJj=1 is the age-distribution of the Standard Population ( j=1 !j = 1, !j > 0, j = 1; :::; J) and frkji gJj=1 is the set of expected rates associated with the k-th region (k = 1; 2) at the time-point tki (i = 1; :::; Ik ), or the i-th year, in each of the age-groups (j = 1; :::; J). For example, the SEER Program applies as standard the US population of year 2000 with J = 19 age-groups [0; 1), [1; 5), [5; 10), [10; 15), ..., [80; 85), [85; ). The APC removes di¤erences in scale by considering the proportion (rk;i+1 rk;i )=rk;i = rk;i+1 =rk;i 1 under constant change assumption of the expected rates. Proportionality constant k = rk2 =rk1 = ::: = rkIk =rk;Ik 1 constitutes the basis for de…ning APCk = 100( k 1) as a percentage associated with the expected rates frkji gJj=1 of the k-th region. Since the models that deal with the APCs consider the logarithm of age-adjusted cancer rates, the previous formula is usually replaced by APCk = 100(exp(

1k )

1);

(2)

and we would like to make statistical inferences on parameter 1k . The data that are collected for modeling the APC associated with region k, are: dkji , the number of deaths (or incidences) in the k-th region, j-th age-group, at the time-point tki ; nkji the population at risk in the k-th region, j-th age-group, at the time-point tki ; so that the r.v.s that generate dkji , Dkji , are considered to be mutually independent. In a sampling PJ PJ Dkji , framework we can de…ne the empirical age-adjusted cancer rates as Rki = j=1 !j Rkji = j=1 !j nkji whose expected value is (1). Even though the assumption of “independence”associated with Dkji simplify the process of making statistical inference, it is in practice common to …nd situations in which the two APCs to be compared, APC1 and APC2 , share some data because there is an overlap between the two regions. For example, in Riddell and Pliska (2008) county-level data on 22 selected cancer sites during 1996-2005 are analyzed, so that the APC of each county is compared with the APC of Oregon state. It is not possible to assume independence between the data of counties (local level) and their state (global level). Moreover, the APC comparison between overlapping regions is more complicated when the APCs are not for the same period of time. For instance in the aforementioned study that appeared in Riddell and Pliska (2008), while Oregon APC was obtained for a period of time ending in 2005, the US APC was calculated for a period of time ending in 2004 because the US data of year 2005 were not available. Figure 1 represents the most complicated overlapping case for two regions, where f1; 6g f5; 8g is the set of points of the …rst region, f5; 9g f2; 6g is the set the points of the second region, f5; 6g f5; 6g is the set of points of the overlapping region (boxed points). Each of the two regions have a portion of space and period of time not contained in the other one (circular points for region 1 and diamond points for region 2). This paper is structured as follows. In Section 2 di¤erent models that establish the relationship between rki and 1k are reviewed and the two basic tools for making statistical inferences are presented, the estimators and test-statistics for equal APCs. Speci…cally, the Age-strati…ed Poisson Regression model, introduced for the …rst time in Li et al. (2008), is highlighted as the model that arises as an alternative to improve the previous ones. Based on Power-divergence measures, in Section 3 a family of estimators that generalize the maximum likelihood estimators (MLEs) are considered for the Agestrati…ed Poisson Regression model. In addition, a new point of view for computing the covariance

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

space

8

3

k=b=1

6

k=1 or 2 b=2

k=1,b=2

k=b=2

4

k=2,b=1

2

0 0

1

2

3

4

5

6

7

8

9

10

i = time point

Figure 1: Two overlapping regions not sharing the same period of time. between the MLEs of 1k is introduced inside the framework of this family of estimators and this is the key for substantially improving the Z-test statistic for testing the equality of APCs for the Agestrati…ed Poisson Regression model. In addition, such a methodology provides explicit and interpretable expressions of the covariance between the estimators of 1k . We evaluate the performance of the new proposed methodology in Section 4 through a simulation study and we also consider an application example to Breast and Thyroid cancer data from California (CA) and the US population, extracted from the SEER*STAT software of the SEER Program. Finally in Section 5 some concluding remarks are given.

2

Models associated with the Annual Percent Change (APC)

When non-overlapping regions are taken into account, there are basically two models which allow us to estimate the APC starting from slightly di¤erent assumptions, the Age-adjusted Cancer Rate Regression model and Age-strati…ed Poisson Regression model. The main di¤erence between them is based on the probability distribution of Dkji , number of deaths in the k-th region, j-th age-group, at the time-point tki : while the Age-adjusted Cancer Rate Regression model assumes normality for log Rki with Dkji having the same mean and variance, the Age-strati…ed Poisson Regression model assumes directly a Poisson random variable (r.v.) for Dkji . The Age-adjusted Cancer Rate Regression model establishes PJ PJ ind 2 2 log Rki = 0k + 1k tki + ki , where ki N (0; ki ) with ki = j=1 !j2 rkji =nkji = j=1 !j2 mkji =n2kji under E[Dkji ] = Var[Dkji ] = nkji rkji mkji ; (3) i.e. log Rki

ind

N (log rki ;

2 ki )

with rki = exp(

0k ) exp( 1k tki ):

According to the Age-strati…ed Poisson Regression model (Li et al. 2008), Dkji rkji it holds mkji log rkji = 0kj + 1k tki or log = 0kj + 1k tki : nkji

(4) ind

P(nkji rkji ) and for (5)

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

4

Observe that the parametrization of both models is essentially the same because the expected age-adjusted rate rki in terms of (5) is equal to (4), where exp(

0k )

=

J X !j exp(

0kj );

(6)

j=1

and thus for both models it holds that k

=

rkIk rk1

1 tkI k

tk1

= exp(

1k ):

(7)

The original estimators associated with the Age-adjusted Cancer Rate Regression model and Agestrati…ed Poisson Regression model are the Weighted Least Square estimators (WLSE) and Maximum Likelihood estimators (MLE) respectively. The hypothesis testing for comparing the equality of trends of two regions, H0 : APC1 = APC2 , is according to (2), equivalent to H0 : 11 12 = 0. Hence, the Z-test statistic for both models can be de…ned as b11 b12 ; (8) Z=q d b11 b12 ) Var(

d b11 b12 ) is the estimator where b1k , k = 1; 2 are the estimators of 1k associated with each region, Var( 2 2 of the variance of b11 b12 , Var( b11 b12 ). The expression of the variance is Var( b11 b12 ) = 11 + 12 , 2 b with 1k Var( 1k ), k = 1; 2, for non-overlapping regions. When overlapping regions are taken into account, the methodology for obtaining the estimators as well as Z-test statistic (8) remain valid, but the given expression for Var( b11 b12 ) is no longer valid. When the overlapping regions do not share the same period of time (t11 6= t21 or I1 6= I2 ), we must consider a new reference point for index i, denoted 1 by I, such that t1I represents the time point within ft1i gIi=1 where the time series associated with the I2 second region is about to start, i.e. we have ft2i gi=1 such that t21 = t1I + 1. In particular, if t1i = i, 1 1 I i = 1; :::; I1 , then t2i = I + i, i = 1; :::; I2 . Observe that ft1i gIi= , or equivalently ft2i gIi=1 , is the time I+1 series associated with the overlapping region (t1i = t2;i I , i = I + 1; :::; I1 ). In Figure 1 I1 = 6, I2 = 5, I = 4 and thus we can distinguish three subregions f5; 6g f1; :::; 4g, f5; 6g f5; 6g and f5; 6g f7; :::; 9g. Without any loss of generality each random variable Dkji can be decomposed into two summands (1)

(2)

Dkji = Dkji + Dkji

(9)

(1)

where Dkji , i 2 f1; :::; Ik g, is the number of deaths (or incidences) in the k-th region, j-th age-group, at the (2)

time-point tki for the subregion where there is no overlap in space; Dkji , i 2 f1; :::; Ik g, is the the number of deaths (or incidences) in the k-th region, j-th age-group, at the time-point tki for the subregion where (1) (2) (1) (2) there is overlap in space. Similarly, nkji = nkji + nkji and mkji ( k ) = mkji ( k ) + mkji ( k ). Observe (2)

(2)

that when i 2 fI + 1; :::; I1 g, r.v.s D1ji and D2j;i I are associated with the same overlapping subregion. Revisiting the example illustrated in Figure 1, it should be remarked that in the y-axis (space) there are (b) more points than those that represent one realization of all r.v.s Dkji in each time point, but grouping the points belonging to the same vertical line inside the portion marked in dash we are referring to one (1) (2) realization of them (for instance, for t11 = 1 we have two groups of points associated with D1j1 , D1j1 (1)

(2)

(2)

respectively, while for t1j5 = t2j1 = 5 we have three groups of points associated with D1j5 , D1j5 or D2j1 , (1)

D2j1 ). Grouping points symbolize di¤erent extension in regions. In Figure 1 there are 20 realizations of (b)

all r.v.s Dkji in total, 12 for region 1, 10 for region 2 and 2 r.v.s are shared for both regions.

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

5

(b)

It is important to understand r.v.s Dkji , b 2 f1; 2g as “homogeneous contributors” with respect (b)

(b)

(2)

(2)

1 1 I to Dkji , i.e. Dkji P(mkji ) such that (10) holds, and hence fm1ji ( 1 )gIi= and fm2ji ( 2 )gIi=1 I+1 are only equal when 11 = 12 (or equivalently, when 1 = 2 ). Now we can say thoroughly that b12 ) = Var( b11 ) + under 11 = 12 , the reason why Cov( b11 ; b12 ) = 0 is not true inside Var( b11 Var( b12 ) 2Cov( b11 ; b12 ) for overlapping regions is that fD1ji gi=1;:::;I1 ;j=1;::;J and fD2ji gi=1;:::;I2 ;j=1;::;J (2) are not independent, because both regions share the same the set of r.v.s fD1ji gi=I+1;:::;I1 ;j=1;::;J with

(2)

(2)

D1ji = D2j;i

I

.

(b) ind

Assumption 1 Dkji

(b)

(b)

P(mkji ), b 2 f1; 2g, where for nkji > 0 the following holds (b)

(b)

mkji =

nkji nkji

mkji ;

b 2 f1; 2g:

(b)

(10)

(b)

We accept the case where nkji = 0, for some b 2 f1; 2g, so that Dkji = 0 a.s. (degenerate r.v.) because (b) mkji

= 0.

Regarding the basic models considered in the papers dealing with overlapping regions, the Agestrati…ed Poisson regression model can be considered as the most realistic one, actually they have been constructed by successive improvements on the previous models so that initially normality assumptions were taken as approximations of underlying Poisson r.v.s. In the …rst paper concerned about trend comparisons across overlapping regions (Li and Tiwari (2008)), it is remarked that “... the derivation of Cov( b11 ; b12 ), ...., is nontrivial as it requires a careful consideration of the overlapping of two regions”. The assumption considered by them (which is based on Pickle and White (1995)) for the overlapping subregion is similar to the assumption considered herein in the sense that the overlapping subregion follows the same distribution considered for the whole region. A similar criterion was followed in Li et al. (2007) and Li et al. (2008).

3

Minimum Power Divergence Estimators for an Age-strati…ed Poisson Regression Model with Overlapping

Let ms be the expected value of the r.v. of deaths (or incidences) Ds associated with the s-th cell of a contingency table with Mk JIk cells (s = 1; :::; Mk ). In this section, we consider model (5) in matrix notation so that the triple indices are uni…ed in a single one by following a lexicographic order. Hence, the vector of cell means mk ( k ) = (m1 ( k ); :::; mMk ( k ))T = (mk11 ( k ); :::; mkJIk ( k ))T of the multidimensional r.v. of deaths (or incidences) D k = (D1 ; :::; DMk )T = (Dk11 ; :::; DkJIk )T , is related to T the vector of parameters k = ( 0k1 ; :::; 0kJ ; 1k ) 2 k = RJ+1 according to log Diag

1

(nk ) mk (

k)

= Xk

k

or mk (

k)

= Diag(nk ) exp(X k

k );

(11)

where Diag(nk ) is a diagonal matrix of individuals at risk nk = (n1 ; :::; nMk )T = (nk11 ; :::; nkJIk )T (ns > 0, s = 1; :::; Mk ) and 0 1 1Ik tk B .. C .. Xk = @ = (IJ 1Ik ; 1J tk ); . .A 1Ik tk JI (J+1) k

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

6

with tk (tk1 ; :::; tkIk )T , a full rank Mk (J + 1) design matrix. Based on the likelihood function of a Poisson sample D k the kernel of the log-likelihood function is given by `

k

(D k ) =

Mk X

Ds log ms (

k)

s=1

and thus the MLE of

k

is b = arg max ` k k2

k

k

M Pk

ms (

s=1

k );

(D k ) :

It is well known that there is a very close relationship between the likelihood theory and the KullbackLeibler divergence measure (Kullback and Leibler (1951)). Focussed on a multinomial contingency table it is intuitively understandable that a good estimator of the probabilities of the cells should be such that the discrepancy with respect to the empirical distribution or relative frequencies is small enough. The oldest discrepancy or distance measure we know is the Kullback divergence measure, actually the estimator which is built from the Kullback divergence measure is the MLE. By considering the unknown parameters of a Poisson contingency table, the expected values, rather than probabilities and the observed frequencies rather than relative frequencies, we are going to show how is it possible to carry out statistical inference for Poisson models through power divergence measures. According to the Kullback divergence measure, the discrepancy or distance between the Poisson sample D k and its vector of means mk ( k ) is given by Mk X Ds Ds + ms ( k ) : (12) dKull (D k ;mk ( k )) = Ds log ms ( k ) s=1 Observe that dKull (D k ;mk ( k )) = ` k (D k ) + Ck , where Ck does not depend on parameter a relationship allows us to de…ne the MLE of k as minimum Kullback divergence estimator b = arg min dKull (D k ;mk ( k k2

k.

Such

k ));

k

and the MLE of mk ( k ) functionally as mk ( b k ) due to the invariance property of the MLEs. The power divergence measures are a family of measures de…ned as M

k X 1 d (D k ;mk ( k )) = (1 + ) s=1

Ds +1 ms ( k )

Ds (1 + ) + ms (

k)

;

2 = f0; 1g:

(13)

such that from each possible value for subscript 2 R f0; 1g a di¤erent way to quantify the discrepancy between D k and mk ( k ) arises. In case of 2 f0; 1g, we de…ne d (D k ;mk ( k )) = lim`! d` (D k ;mk ( k )), and in this manner the Kullback divergence appears as special case of power divergence measures when = 0, d0 (D k ;mk ( k )) = dKull (D k ;mk ( k )) and on the other hand case = 1 is obtained by changing the order of the arguments for the Kullback divergence measure, d 1 (D k ;mk ( k )) = dKull (mk ( k ); D k ). The estimator of k obtained on the basis of (13) is the so-called minimum power divergence estimator (MPDE) and it is de…ned for each value of 2 R as b

k;

= arg min d (D k ;mk ( k2

k ));

(14)

k

and the MPDE of mk ( k ) functionally as mk ( b k; ) due to the invariance property of the MPDEs. Apart from the MLE ( b k or b k;0 ) there are other estimators that are members of this family of estimators: minimum modi…ed chi-square estimator, b k; 2 ; minimum modi…ed likelihood estimator, b k; 1 ; CressieRead estimator, b k;2=3 ; minimum chi-square estimator, b k;1 . These estimators were introduced and

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

7

analyzed for multinomial sampling by Cressie and Read (1988), but for Poisson sampling were applied for the …rst time in Pardo and Martín (2009). The so-called minimum -divergence estimators are a wider class of estimator that contains MPDEs as special case (see Pardo (2005) for more details) and this statistical problem could be easily extended for these estimators. Taking into account that the asymptotic distribution of all MPDEs tend to be “theoretically” the same, including the MLE, we are going to propose an alternative method for estimating Var( b11 b12 ) = Var( b11;0 b12;0 ) that covers a new element for overlapping regions, Cov( b11 ; b12 ) = Cov( b11;0 ; b12;0 ). We postulate that for not very large data sets, the MLEs, b11;0 b12;0 , might be likely improved by the estimation associated with = 1, b11;1 b12;1 , when overlapping regions are considered. [ k; = 100(exp( b1k; ) 1), we need to compute the estimator In order to obtain the MPDE of (2), APC of the parameter of interest by following the next result. Proposition 2 The MPDE of

1k ,

b1k; , is the solution of the nonlinear equation f ( b1k; ) =

with ki

=

J X j=1

mkji ( b ) ('kji

Ik X

tki

exp( b0kj; ) =

pkjs

+1 kjs

s=1

!

= 0;

1) ;

mkji ( b ) = nkji exp( b0kj; ) exp( b1ki; tki ) Ik X

ki

i=1

and

'kji =

1 +1

;

j = 1; :::; J;

nkjs exp( b1k; tks ) pkjs = PIk b h=1 nkjh exp( 1k; tkh )

and

kjs

=

Dkji mkji ( b )

!

+1

;

Dkjs : nkjs exp( b1k; tks )

b12; is asymptotically normal and to obtain an explicit expression Our aim is to show that b11; of the denominator of the Z-test statistic (8) with MPDEs b11; b12; Z =q ; b12; ) d b11; Var(

(15)

when the random vectors of observed frequencies of both regions, D 1 and D 2 , share some components (those belonging to the overlapping subregion). Since (15) is approximately standard normal for minfN1 ; N2 g large enough, we can test H0 : APC1 = APC2 ( 11 = 12 ) vs H1 : APC1 6= APC2 ( 11 6= 12 ), so that if the value of jZ j is greater than the quantile z1 2 (i.e., Pr(Z < z1 2 ) = 1 2 ), H0 is rejected with signi…cance level . The following result is the key result for estimating the variances and covariance of the estimators of interest, b1k; , k = 1; 2. It allows us to establish a linear relationship between the parameter of interest and the observed frequencies under Poisson sampling when the expected total mean Nk in each region (k = 1; 2) is large enough and the way that Nk increases is given in Assumption 3. Assumption 3 mkji ( 0k ) = mkji ( at the same rate as Nk .

0 k )=Nk

remains constant as Nk increases, that is, mkji (

0 k)

increases

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

Theorem 4 The MPDE of b1k;

1k , 0 1k

=

b1k; , k = 1; 2, can be expressed as 0 T 2 eT 1k tk ( k )X k (D k

mk (

0 k ))

+o

D k mk ( Nk

0 k)

8

;

where superscript 0 is denoting the true and unknown value of a parameter, o is denoting a little o function for a stochastic sequence (see Appendix in Bishop et al. (1975)) and 2 1k

etTk(

0 k)

T = etk (

0 0 0 T e k )X k Diag(mk ( k ))X k tk ( k )

e e tk1 ( 0k ) tkJ ( PIk 0 i=1 mkji ( k )tki e tkj ( 0k ) = P : Ik 0 i=1 mkji ( k ) =

Theorem 5 The MPDE of equal to (16).

1k ,

0 k)

1

=

Ik J P P

mkji (

j=1i=1

1 ;

e tkj (

0 k )(tki

0 2 k ))

!

1

;

(16)

(17)

b1k; , k = 1; 2, is asymptotically Normal, unbiased and with variance

p 0 2 Note that Theorem 5 would be more formally enunciated in terms of Nk ( b1k; 1k ), because 1k is not constant as Nk increases. We havePavoided that in order to focus directly on the estimator of interest. Ik Due to Assumption 3 and e tkj ( 0k ) = i=1 mkji ( 0k )tki , what is constant is p Var( Nk b1k; ) = Nk

2 1k

=

Ik J P P

mkji (

j=1i=1

0 k )(tki

e tkj (

0 2 k ))

!

1

:

Let N be the total expected value of the region constructed by joining regions 1 and 2. Note that N N1 + N2 , being only equal with non-overlapping regions. In order to establish the way that N increases with respect to Nk , we shall consider throughout the next assumption. Assumption 6 Nk = Nk .

Nk N

(k = 1; 2) is constant as N increases, that is N increases at the same rate as

0 0 Note that for overlapping regions, N1 + N2 > 1 holds and under the hypothesis that 11 = 12 , 0 0 we have a common true parameter vector (k = 1; 2). Hence, under the hypothesis that PJ PI1 I (2) k0 0 0 )=N is constant, the overlapping death fraction, 11 = 12 , since N1 + N2 = 1 + j=1 i=1 m2kj ( PJ PI1 I (2) 0 )=N , is also constant as N increases. j=1 i=1 m2kj (

Theorem 7 Under the hypothesis that as b11; X1 =

X2 = X3 = Y =o

0 11

=

0 12 ,

the MPDE of

11

b12; = X1 + X2 + X3 + Y;

(1) (1) 0 2 eT )X T1 (D 1 m1 ( 0 )); 11 t1 ( (1) (1) 0 2 eT )X T2 (D 2 m2 ( 0 )); 12 t2 ( T T (2) 0 0 2 eT 2 eT )X 1 )X 2 (D 11 t1 ( 12 t2 ( D 1 m1 ( N1

12 ,

0

)

+o

D 2 m2 ( N2

0

)

;

m(2) (

b11;

0

));

b12; , is decomposed

(18)

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

where X k is an ampli…ed J(I + I2 ) (J + 1) matrix of X k , 1 0 1k tk B .. C .. = (IJ Xk = @ . .A 1k tk J(I+I ) (J+1)

1k ; 1J

9

tk );

2

T 11

=

(1TI1 ; 0TI+I2 I1 )

T

t1 = (tT1 ; 0TI+I2 and D

(2)

(2)

joining D k

D

(2)

for k = 1; 2 and mk (

(2) 0

m(2) (

0

2 1k

2 1k

(2)

(2)

) = ((m111 (

Ik J P P

0

(2)

); :::; m1J I (

mkji (

Ik P

mkji (

i=1

12

0

)(tki

j=1i=1

with mkj =

=

(2)

(2) J IX 1 I X n2ji j=1 i=1

n2ji

0

), e tkj (

m2ji (

0 k)

0

e tkj (

0

(2)

); :::; m1J;I+I2 (

)(t2i

)(t22i

0

(2)

(2)

(2)

0 11

the asymptotic distribution of b11;

=

0 12 ,

b12; ) = 0

))

2

!

2 11

1

m2 (

2 12

+

Ik J P P

=

) = (m211 (

0

(2)

0

))T ;

0

(2)

)); m2 (

e t1j (

+e t1j (

1;12

1;12

))T are the vectors obtained

D 2 = (D211 ; :::; D2JI2 )T ;

mkji (

0

)t2ki

J P

mkj

j=1

0

); :::; m2JI2 (

e t2kj (

))T :

b12; is central

2 2 11 12 12

2

j=1i=1

0

0

))(t2i

)e t2j (

0

e t2j ( ))

0

0

)

!

1

;

(19)

))

= Cov( b11; ; b12; ) =

= Cor( b11; ; b12; ) =

(20) (2)

J I1PI n P 2ji m2ji ( n j=1 i=1 2ji

That is, the covariance between b11; and b12; is given by and the correlation by

0

is (17) and

(2)

J I1PI n P 2ji = m2ji ( n j=1 i=1 2ji

0

) = (m111 (

= ((D111 ; :::; D1J I ); (D 2 )T )T ;

is equal to

=

T

t2 = (0TI ; tT2 );

and 0

= (0TI ; 1TI2 );

) for k = 1; 2 respectively, i.e.

Theorem 8 Under the hypothesis that Normal with Var( b11; where

and

I1 )

= (D111 ; :::; D1J;I+I2 )T , m(2) (

T 12

2 2 11 12 12 ;

0

)t2i (e t1j (

0

)+e t2j (

0

)):

(21)

11 12 12 .

2 For the expression in the denominator of (15), we need to obtain the MPDEs of 1k , k = 1; 2 and 12 , 0 b , k = 1; 2 and 12; respectively. A way to proceed is based on replacing by the most e¢ cient MPDE ( 0 b ; if N1 N2 1; b0 : b 0 ; if N1 < N2 2 b1k;

2;

An important advantage of this new methodology is that the expression of the denominator of (15) is explicit, easy to compute and can be interpreted easily. The term (20) determines the sign of (21). The structure of (20) is similar to the covariance proposed in the model of Li et al. (2007) for WLSEs or as

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

10

well as for the estimators in the model of Li and Tiwari (2008). We can see that if there is no time-point b12; ) = b2 + b2 ; if there is no d b11; shared by the two regions, i.e. I I1 , then b1;12; = 0 and Var( 11; 12; 0 (2) b space overlap, then it holds m ( ) = 0 for all i and j belonging to the overlapping subregion and hence 2ji

d b11; b1;12; = 0 and Var(

b12; ) = b2 +b2 . On the other hand, when the two regions to be compared 11; 12; b12; ) = b2 + b2 d b11; share at least one time-point and there is space overlap, Var( 2b1;12; holds, 11; 12; with b1;12; 6= 0. Moreover, when the period of time not shared by the two regions is large (small), the 0 0 covariance tends to be negative (positive) because the average values, e t1j ( b 1; ) and e t2j ( b 2; ), are more separated from (closer to) the time-points associated with the overlapping subregion. We shall later analyze this behaviour through a simulation study, and we shall now investigate how is the structure of 12 when the two regions to be compared share the whole period of time. 0 11

Corollary 9 When I = 0 and I1 = I2 , under the hypothesis that 12

=

1 2 1(2)

+

J m(1) P 1j

j=1

m1j

(1)

m2j m2j

with 1 2 b1(2)

(b) e tkj ( 0 )

=

=

(b)

(2)

2 1(2)

(2)

m2ji (

0

(2) e t1j (

)

)(t2i

=

PIk

(b) 0 )tki i=1 mkji ( PIk (b) 0 ; ) i=1 mkji (

Ik P

mkji (

(b)

0

);

(2)

0

)=

m2ji (

i=1

(2) e t2j (

(1) ))(e t2j (

0

0

)

(2) e t2j (

0

));

(22)

))2 ;

(1)

(2)

mkj = mkj + mkj ;

i=1 I2 P

0

I1 P

(2)

m1ji (

0

):

i=1

represents the variance of b12; focussed on the overlapping subregion. In particular, if region 2 is

completely contained in region 1,

4

I2 J P P

0

j=1i=1

mkj = mj

(2) (1) mj (e t1j (

0 12

=

12

= 1=

2 1(2)

Var( b11;

= 1=

2 12 ,

(1)

m2j = 0 for all j = 1; :::; J, and hence

b12; ) =

2 12

2 11 :

(23)

Simulation Studies and Analysis of SEER Mortality Data

When dealing with asymptotic results, it is interesting to analyze the performance of the theoretical results in an empirical framework. Speci…cally, for Poisson sampling what is important to calibrate is the way that the total expected value of deaths (or incidences) Nk a¤ects the precision of the results. Other characteristics such as the percentage of overlapping regions “in space”or “in time”, as well as the suitable choice of values are also worth to be analyzed. As a preliminary study, before focussing on Nk , we have considered thyroid cancer mortality (rare cancer) in three regions, Western (W) US population (composed of Arizona, New Mexico and Texas), South Western (SW) US population (composed of Arizona, California and Nevada) and West Coast (WC) US population (composed of California, Oregon and Washington). APC comparison of W vs. SW (Arizona is shared) on one hand and SW vs. WC (California is shared) on the other hand are considered. We have taken di¤erent scenarios with di¤erent time periods, 1998-2007 for SW in all scenarios and 1986-1995, 1989-1998, 1992-2001, 1995-2004 and (1998-2007) for the other region (W or WC) in each of scenarios A0 , B 0 , C 0 , D0 and E 0 respectively. In Table 1 the percentage of expected

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

11

deaths in the regions to be compared with respect to the shared part (the percentages of overlapping) are shown, when 11 = 12 = 0:005 (APC1 = APC2 ' 0:5) for W vs. SW, and 13 = 14 = 0:02 for SW vs. WC (APC3 = APC4 ' 2:02). Observe that in the same scenario but di¤erent couple of comparisons, the change in overlapping percentage is due to the space overlapping (the overlapping percentages are greater for SW vs. WC, actually the shared part is a large state, California). In addition, we have chosen some values of , 2 f 0:5; 0; 32 ; 1; 1:5g, in order to compare the performance of minimum power divergence estimators. In Table 3 these results are shown for W vs. SW. From scenario B 0 to E 0 (i.e. when the overlapping percentage is increasing), the covariance is increasing, starts with negative values at B 0 (1 time point is shared), decreases at E 0 (4 time points are shared), later positive values but small are reached at F 0 (7 time points are shared) and …nally at E 0 (10 time points are shared) ends with positive and high values. It seems that more or less the sign of the covariance changes in the middle of time points considered for each of the regions. In scenario A0 the theoretical covariance is zero, actually the two regions do not share observations. By asterisk we have marked the variances and signi…cance levels obtained by simulation which are greater than its corresponding theoretical values, in order to visualize them as the worst cases. From the results it is concluded the minimum power divergence estimators with = 1, that is the minimum chi-squared estimators provide empirically e¢ cient estimators and their Z-test statistics have good performance with respect to the theoretical signi…cance level in the sense that tend to be much smaller. We have omitted the results for SW vs. WC because we have seen that the space overlapping by itself do not a¤ect much the covariances of bk1; . That is, there were no remarkable di¤erence among the covariances in case of choosing SW vs. WC rather than W vs. SW, because the sign of the covariances starts at the same scenario and it is just the value of the covariance what marks the di¤erence between both of them. The behaviour of minimum power divergence estimators is very similar too. Hence, in the simulation study that follows we are going to focus only on …xed overlapping percentages and one of them is going to be 100% and the focus of interest are going to be the MLEs and the MCSEs. Table 1: Overlapping percentages for W vs SW and SW vs WC in …ve scenarios. space n time W vs SW SW vs WC

sc A0 18.96%; 13.03% 81.80%;78.39%

sc B 0 12.66%; 9.12% 59.09%; 54.06%

sc C 0 6.94%; 5.24% 34.75%; 30.30%

sc D0 1.66%; 1.32% 8.93%; 7.40%

sc E 0 0%; 0% 0%; 0%

For studying the precision of the results when Nk changes, we have considered three proportionality 1 1 ; 300 g associated with Nk in each of the following scenarios for Regions 1 and 2, with constants 2 f1; 100 2 f0:02; 0:005; 0; 0:005g being equal for both (k = 1; 2) as it is required for the null hypothesis, i.e. 1k APC1 = APC2 ' 2:02, APC1 = APC2 ' 0:50, APC1 = APC2 ' 0, APC1 = APC2 ' 0:50: Scenario A: Low level overlapping regions, I1 = 6; I2 = 11; I1 I = 3. Scenario B: Medium level overlapping regions, I1 = 10; I2 = 11; I1 I = 7. Scenario C: High level overlapping regions, I1 = 8; I2 = 8; I1 I = 8. The values of nkji have been obtained from real data sets for female: Scenario A: Region 1 = United States (US) during 1993–1998, Region 2 = California (CA) 1996–2006. Scenario B: Region 1 = US during 1993–2002, Region 2 = CA during 1996–2006. Scenario C: Region 1 = US during 1999–2006, Region 2 = CA during 1999–2006. From the same data sets we have have taken 0kj = log( Dkj1 =nkj1 ) k1 tk1 , focussed on the Breast cancer for the …rst year of the time interval (i = 1). All these data were obtained from the SEER database and hence we are taking into account J = 19 age groups. Once the previous parameters have been established we can compute in a theoretical framework the individual variances of estimators bk1; , 2 b b12; ) = 2 + 2 2 1;12 . We can also compute the theoretical 11 12 1k , covariance 1;12 and Var( 11;

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

12

value of k Nk =(JIk ), the average expected value per cell, which is useful to see if the value of Nk is large enough, these values are in Table 2. Table 2: Average total expected means of deaths per cell. Scenario A 1k

1 1 1 1 1 100 1 100 1 100 1 100 1 300 1 300 1 300 1 300

0:020 0:005 0:000 0:0050 0:020 0:005 0:000 0:0050 0:020 0:005 0:000 0:0050

1

2538:24 2441:69 2410:67 2380:23 25:38 24:42 24:11 23:80 8:46 8:14 8:03 7:93

2

331:42 292:43 280:62 269:35 3:31 2:92 2:81 2:69 1:10 0:97 0:93 0:90

Scenario B 1

2741:10 2552:96 2494:19 2437:28 27:41 25:53 24:94 24:37 9:14 8:51 8:31 8:12

Scenario C

2

1

331:42 292:43 280:619 269:35 3:31 2:92 2:81 2:69 1:10 0:97 0:93 0:90

2493:85 2360:81 2318:67 2277:59 24:94 23:61 23:19 22:77 8:31 7:87 7:73 7:59

2

265:98 251:71 247:19 242:79 2:66 2:52 2:47 2:43 0:89 0:84 0:82 0:81

Since both regions share a common space, we have generated …rstly its death counts by simulation and thanks to the Poisson distribution’s reproductive property under summation, we have generated thereafter the death counts for each region by adding the complementary Poisson observations. In Tables 4, 6, 8 are summarized the theoretical results as well as those obtained by simulation for the MLEs and in Tables 5, 7, 9 for the MCSEs. The variances and covariances appear multiplied by 109 in all the tables. We have added tilde notation for those parameter that have been calculated by simulation with R = 22; 000 replications: R 1 P ( b1k; (r) R r=1 R 1 P = ( b11; (r) R r=1

2 ~1k; =

~1;12;

~ b1k; ])2 ; E[

R ~ b1k; ] = 1 P b1k; (r); E[ R r=1

~ b11; ])( b12; (r) E[

~ b12; ]): E[

It is important to remark that such a large quantity of replications have been chosen in order to reach a reliable precision in the simulation study (e.g., it was encountered that R = 10; 000 was not large enough). The last column is referred to the exact signi…cance level associated with the Z-test obtained by simulation when the nominal signi…cance level is given by = 0:05, e =

R 1 P I(jZ (r)j > z0:975 ); R r=1

where I() is an indicator function and z0:975 ' 1:96 the quantile of order 0:975 for the standard normal distribution. It can be seen as expected, that in Scenario 3 the covariance is positive in all the cases, while in b12; ) as well as for ~ Scenario 1 the covariance is negative. It is clear that the precision for Vg ar( b11; gets better as increases. While for large data sets ( = 1) there is no best choice regarding , for small b12; are more data sets ( = 1=300) the choice in favour of = 1 is clear because estimators b11; b b b b b b g g e¢ cient, in fact Var( 11;1 12;1 ) < Var( 11; 12; ) < Var( 11;0 12;0 ), and the exact signi…cance levels or estimated type I error is less than for = 0 in all the cases (~ 1 ~ 0 ). Since perhaps type II error could be better for MLEs, the power functions for both estimators have been studied. In particular,

sc A0 A0 A0 A0 A0 B0 B0 B0 B0 B0 C0 C0 C0 C0 C0 D0 D0 D0 D0 D0 E0 E0 E0 E0 E0

1 1:5

2 3

1 1:5 0:5 0

2 3

1 1:5 0:5 0

2 3

1 1:5 0:5 0

2 3

1 1:5 0:5 0

2 3

0:5 0

106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94 106106:94

2 11

2 ~11; 117206:91 106482:52 100968:49 99842:89 99346:15 117293:27 106707:01 101342:71 100261:70 99807:30 116056:24 105572:08 100178:76 99090:96 98646:02 115548:66 104872:37 99400:99 98300:76 97854:16 115740:28 105114:37 99710:57 98636:63 98219:30

88722:57 88722:57 88722:57 88722:57 88722:57 83850:45 83850:45 83850:45 83850:45 83850:45 79295:40 79295:40 79295:40 79295:40 79295:40 74971:59 74971:59 74971:59 74971:59 74971:59 70747:13 70747:13 70747:13 70747:13 70747:13

2 12

2 ~12; 96004:29 88399:60 84561:80 83793:94 83510:99 92311:97 85398:66 81753:09 80985:96 80649:82 84620:24 78400:39 75138:00 74470:54 74214:18 81107:54 75820:32 72923:02 72306:86 72044:77 75885:14 71094:62 68383:44 67789:30 67513:89

~1;12; 190:20 131:23 64:51 34:77 2:27 3833:28 3490:04 3229:76 3142:66 3047:64 5099:81 4630:90 4302:56 4199:62 4094:68 2271:85 2148:49 2060:12 2034:16 2011:33 17152:32 16123:83 15273:05 14953:01 14557:46

b12; ) V ar( b11; 194829:51 194829:51 194829:51 194829:51 194829:51 197997:72 197997:72 197997:72 197997:72 197997:72 197473:63 197473:63 197473:63 197473:63 197473:63 176489:89 176489:89 176489:89 176489:89 176489:89 145611:20 145611:20 145611:20 145611:20 145611:20

b12; ) Vg ar( b11; 213591:59 195144:57 185659:31 183706:37 182852:60 217271:80 199085:75 189555:32 187532:98 186552:39 210876:09 193234:25 183921:87 181960:74 181049:56 192112:50 176395:71 168203:77 166539:31 165876:27 157320:78 143961:33 137547:92 136519:90 136618:26

~ 0:056 0:050 0:047 0:047 0:049 0:058 0:051 0:049 0:049 0:052 0:055 0:048 0:046 0:046 0:049 0:057 0:050 0:048 0:050 0:052 0:055 0:048 0:047 0:047 0:049

2 f 0:5; 0; 32 ; 1; 1:5g for scenarios A0 , B 0 , C 0 , D0 and E 0 .

0:00 0:00 0:00 0:00 0:00 4020:16 4020:16 4020:16 4020:16 4020:16 6035:64 6035:64 6035:64 6035:64 6035:64 2294:32 2294:32 2294:32 2294:32 2294:32 15621:44 15621:44 15621:44 15621:44 15621:44

1;12

Table 3: Minimum Power Divergence Estimators with

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance 13

1 100 1 100 1 100 1 100 1 300 1 300 1 300 1 300

1 1 1 1

1 100 1 100 1 100 1 100 1 300 1 300 1 300 1 300

1 1 1 1

1188:50 1233:49 1248:91 1264:55 118849:86 123349:03 124891:15 126455:01 356549:59 370047:09 374673:44 379365:02

2 11

1k

0:020 0:005 0:000 0:005 0:020 0:005 0:000 0:005 0:020 0:005 0:000 0:005

1188:50 1233:49 1248:91 1264:55 118849:86 123349:03 124891:15 126455:01 356549:59 370047:09 374673:44 379365:02

2 11

0:020 0:005 0:000 0:005 0:020 0:005 0:000 0:005 0:020 0:005 0:000 0:005

1k

1468:14 1653:48 1720:50 1790:33 146813:66 165348:33 172050:41 179033:49 440440:97 496045:00 516151:22 537100:47

2 12

2 ~12; 1475:38 1644:16 1707:01 1801:21 146902:40 167479:53 173875:81 181356:96 451288:03 503332:51 532280:30 562780:79 1;12

152:70 166:41 171:20 176:11 15269:95 16640:50 17119:82 17610:72 45809:84 49921:51 51359:46 52832:16

~1;12; 153:71 159:88 156:44 185:35 16186:46 16374:79 17946:96 15914:04 53204:15 51558:77 50448:54 58143:46

b12; ) 2962:03 3219:78 3311:81 3407:10 296203:41 321978:37 331181:19 340709:94 888610:23 965935:10 993543:56 1022129:82

V ar( b11;

2 ~11; 1196:61 1245:63 1237:43 1276:42 118678:59 121351:67 122229:69 122628:20 342888:09 354377:57 356408:32 360799:63

1468:14 1653:48 1720:50 1790:33 146813:66 165348:33 172050:41 179033:49 440440:97 496045:00 516151:22 537100:47

2 12

2 ~12; 1474:56 1642:03 1704:17 1797:77 131711:15 148155:30 152728:68 158913:98 340354:35 369058:29 388239:25 403732:21

152:70 166:41 171:20 176:11 15269:95 16640:50 17119:82 17610:72 45809:84 49921:51 51359:46 52832:16

1;12

~1;12; 153:88 158:86 156:10 185:22 14717:95 14873:11 16259:50 14215:39 42220:99 41173:07 38265:08 47176:61

b12; ) 2962:03 3219:78 3311:81 3407:10 296203:41 321978:37 331181:19 340709:94 888610:23 965935:10 993543:56 1022129:82

V ar( b11;

Table 5: Scenario A: Minimum Chi-Square Estimators ( = 1).

2 ~11; 1196:88 1245:97 1237:81 1276:82 120545:66 123414:45 124618:15 125135:69 359581:84 373291:90 375119:77 380562:71

Table 4: Scenario A: Maximum Likelihood Estimators ( = 0).

b12; ) 2978:92 3205:38 3253:80 3444:64 279825:64 299253:18 307477:36 309972:96 767684:41 805782:01 821177:73 858885:06

Vg ar( b11;

b12; ) 2979:67 3209:89 3257:70 3448:73 299820:97 323643:55 334387:88 338320:73 917278:18 979741:96 1008297:13 1059630:42

Vg ar( b11;

~ 0:049 0:049 0:049 0:051 0:051 0:049 0:048 0:045 0:050 0:045 0:045 0:045

~ 0:049 0:050 0:049 0:052 0:052 0:052 0:050 0:047 0:052 0:050 0:051 0:054

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance 14

1 100 1 100 1 100 1 100 1 300 1 300 1 300 1 300

1 1 1 1

1 100 1 100 1 100 1 100 1 300 1 300 1 300 1 300

1 1 1 1

0:020 0:005 0:000 0:005 0:020 0:005 0:000 0:005 0:020 0:005 0:000 0:005

1k

0:020 0:005 0:000 0:005 0:020 0:005 0:000 0:005 0:020 0:005 0:000 0:005

1k

234:90 251:10 256:77 262:57 23489:78 25109:90 25676:71 26257:50 70469:35 75329:71 77030:13 78772:49

2 11

234:90 251:10 256:77 262:57 23489:78 25109:90 25676:71 26257:50 70469:35 75329:71 77030:13 78772:49

2 11

1468:14 1653:48 1720:50 1790:33 146813:66 165348:33 172050:41 179033:49 440440:97 496045:00 516151:22 537100:47

2 12

2 ~12; 1461:71 1648:47 1713:16 1792:15 147774:27 147273:99 171995:21 179024:69 442433:57 510147:59 521168:35 545463:20 1;12

12:72 13:91 14:35 14:83 1272:17 1390:58 1435:45 1483:35 3816:51 4171:74 4306:36 4450:04

~1;12; 6:74 10:97 7:81 17:78 181:93 1546:90 822:83 708:28 2392:77 3181:74 2781:06 5288:71

b12; ) 1677:59 1876:77 1948:56 2023:24 167759:10 187677:08 194856:21 202324:30 503277:31 563031:24 584568:62 606972:89

V ar( b11;

2 ~11; 234:36 252:87 255:00 261:91 23039:44 24424:06 25227:72 25713:71 68417:63 71630:87 73435:63 75952:38

1468:14 1653:48 1720:50 1790:33 146813:66 165348:33 172050:41 179033:49 440440:97 496045:00 516151:22 537100:47

2 12

2 ~12; 1459:11 1646:88 1710:41 1790:65 132848:97 147273:99 152123:09 157016:58 333558:97 375196:57 380384:11 394349:52

12:72 13:91 14:35 14:83 1272:17 1390:58 1435:45 1483:35 3816:51 4171:74 4306:36 4450:04

1;12

~1;12; 6:73 10:79 7:53 17:89 204:75 1546:90 950:07 608:74 2545:19 2568:93 1980:50 3665:47

b12; ) 1677:59 1876:77 1948:56 2023:24 167759:10 187677:08 194856:21 202324:30 503277:31 563031:24 584568:62 606972:89

V ar( b11;

Table 7: Scenario B: Minimum Chi-Square Estimators ( = 1).

2 ~11; 234:40 252:79 255:02 261:96 23328:11 24424:06 25666:21 26172:50 71112:09 74737:59 76849:11 79582:80

Table 6: Scenario B: Maximum Likelihood Estimators ( = 0).

b12; ) 1680:02 1878:16 1950:35 2016:78 155478:91 168604:26 175450:67 181512:81 396886:23 441689:58 449858:76 462970:97

Vg ar( b11;

b12; ) 1682:63 1879:32 1952:57 2018:56 170738:52 168604:26 196015:76 203780:65 508760:12 578521:71 592455:34 614468:57

Vg ar( b11;

~ 0:050 0:052 0:050 0:049 0:056 0:049 0:049 0:050 0:055 0:049 0:046 0:046

~ 0:050 0:052 0:050 0:049 0:053 0:049 0:052 0:051 0:052 0:053 0:050 0:050

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance 15

1 100 1 100 1 100 1 100 1 300 1 300 1 300 1 300

1 1 1 1

1 100 1 100 1 100 1 100 1 300 1 300 1 300 1 300

1 1 1 1

0:020 0:005 0:000 0:005 0:020 0:005 0:000 0:005 0:020 0:005 0:000 0:005

1k

0:020 0:005 0:000 0:005 0:020 0:005 0:000 0:005 0:020 0:005 0:000 0:005

1k

505:19 532:12 541:45 550:96 50518:62 53212:46 54145:29 55096:19 151555:86 159637:38 162435:88 165288:56

2 11

505:19 532:12 541:45 550:96 50518:62 53212:46 54145:29 55096:19 151555:86 159637:38 162435:88 165288:56

2 11

2 ~11; 502:28 529:39 543:50 550:04 49937:40 53092:20 52785:97 54621:08 141857:76 153101:35 154380:49 57146:31

4753:38 5006:55 5094:21 5183:56 475338:31 500654:98 509420:86 518356:01 1426014:92 1501964:94 1528262:58 1555068:02

2 12

2 ~12; 4766:57 4962:78 5129:48 5202:96 480772:05 500398:48 511073:61 521012:61 1461021:71 1529631:00 1534795:18 1599312:35 1;12

505:19 532:12 541:45 550:96 50518:62 53212:46 54145:29 55096:19 151555:86 159637:38 162435:88 165288:56

~1;12; 515:35 527:77 549:19 563:62 52893:29 53825:39 55232:68 56166:72 152950:11 160328:08 165625:27 168363:58

b12; ) 4248:20 4474:43 4552:76 4632:60 424819:68 447442:52 455275:57 463259:82 1274459:05 1342327:55 1365826:70 1389779:46

V ar( b11;

4753:38 5006:55 5094:21 5183:56 475338:31 500654:98 509420:86 518356:01 1426014:92 1501964:94 1528262:58 1555068:02

2 12

2 ~12; 4756:74 4956:27 5120:89 5194:95 417941:93 434030:70 441697:10 449926:19 1037025:40 1075673:38 1074400:76 1110194:55

505:19 532:12 541:45 550:96 50518:62 53212:46 54145:29 55096:19 151555:86 159637:38 162435:88 165288:56

1;12

~1;12; 514:11 527:17 549:07 563:79 47230:21 48517:38 49680:01 50651:05 119830:05 123845:16 128138:62 131114:59

b12; ) 4248:20 4474:43 4552:76 4632:60 424819:68 447442:52 455275:57 463259:82 1274459:05 1342327:55 1365826:70 1389779:46

V ar( b11;

Table 9: Scenario C: Minimum Chi-Square Estimators ( = 1).

2 ~11; 502:38 529:53 543:59 550:15 50823:72 53963:47 53655:97 55610:24 149118:50 161042:69 162828:12 165289:23

Table 8: Scenario C: Maximum Likelihood Estimators ( = 0).

b12; ) 4230:79 4431:32 4566:24 4617:41 373418:90 390088:14 395123:06 403245:16 939223:06 981084:41 972504:01 1005111:67

Vg ar( b11;

b12; ) 4238:26 4436:77 4574:69 4625:87 425809:19 446711:16 454264:23 464289:42 1304240:00 1370017:53 1366372:76 1427874:42

Vg ar( b11;

~ 0:051 0:049 0:050 0:051 0:050 0:048 0:046 0:047 0:048 0:046 0:043 0:044

~ 0:050 0:049 0:050 0:051 0:050 0:049 0:050 0:050 0:051 0:049 0:047 0:050

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance 16

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

17

for = 1=300 it was observed the same behaviour as appears in Figure 2: in equidistant di¤erences 0 regarding = 11 > 0 ( < 0) then 12 , when 11 is …xed, if error II is better for MLEs when error II is better for MCSEs when < 0 ( > 0). Hence, in overall terms we recommend using MCSE rather than MLEs for small data sets. This is the case of the study illustrated instance in Riddell and PJ Pfor Ik Pliska (2008) where there are a lot of cases such that the value of bk = j= i=1 dkji =(JIk ) is quite low (moreover, several cases such that bk < 12=19 appear without giving any estimation “due to instability of small numbers”).

Pr

1.0

0.8

0.6

with MCSEs

0.4

with MLEs 0.2

-0.008 -0.007 -0.006 -0.005 -0.004 -0.003 -0.002 -0.001 0.000 0.001

0.002 0.003

0.004

0.005 0.006

0.007 0.008

β Figure 2: Power fuction in terms of

=

11

12

when

0 11

= 0, for Scenario A and

= 1=300.

We have applied our proposed methodology to compare with real data the APC in the age-adjusted mortality rates of WC, WS and W (described at the beginning of this section) for di¤erent periods of time, 1969-1983, 1977-1991 and 1990-1999 respectively, with both estimators and for Thyroid cancer (rare cancer). The third one di¤ers from the rest in the sense that it considers a shorter period of time for its study. The rates are expressed per 100; 000 individuals at risk. In Figure 3 the …tted models are plotted and from them it seems at …rst sight that there is a decreasing trend for Thyroid cancer in WC and SW, and null or decreasing trend in W. The speci…c values for estimates and test-statistics Z , for = 0; 1, are summarized in Table 10. Apart from the appropriate test-statistic, we have included naive test-statistics Z~ , for = 0; 1 that are obtained by applying the methodology for non-overlapping regions. For Thyroid cancer there is no evidence for rejecting the hypothesis of equal APCs for WS and W but it is not clear WC and WS. Looking at the con…dence intervals for each region, observe that for WC and WS the test-statistic has more power to discriminate di¤erences than for WS and W, because the variability is less (the period of time considered for W is shorter). The hypothesis of equal APCs is rejected with 0:05 signi…cance level for WC and WS when using the naive test, and cannot be rejected when using the proper test-statistic for overlapping regions (anyway, its p-value is close to 0:05). When dealing with common cancer types the same value of APC di¤erences on the sample would probably lead to reject the null hypothesis.

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

18

Table 10: Thyroid cancer mortality trends comparison among WC, SW and W during 1969-1983, 19771991 and 1990-1999 respectively: Maximum Likelihood Estimators and Minimum Chi-Square Estimators.

Region WC

k 1 1 2 2 3 3

b1k;

b0k;

2 b1k; 2:923 2:785 3:044 2:888 13:064 12:421

2 b1;k;k+1; 77:292 10 73:429 10 32:915 10 3:1074 10

[ AP C k; 2:639 2:646 1:064 1:053 0:031 0:117

CIAP Ck ; (95%) ( 3:665; 1:601) 5 5 ( 3:648; 1:635) 5 5 SW ( 2:128; 0:011) 5 5 ( 2:089; 0:005) 5 W ( 2:184; 2:297) 5 ( 2:275; 2:088) Z-test statistics for WC vs. SW: Z12;0 = 1:85, Z~ 12;0 = 2:08; Z12;1 = 1:92, Z~12;1 = 2:16 Z-test statistics for SW vs. W: Z23;0 = 0:85, Z~23;0 = 0:87; Z23;1 = 0:75, Z~23;1 = 0:76 0 1 0 1 0 1

0:0267 0:0268 0:0107 0:0106 0:0003 0:0012

0:3680 0:3241 0:5404 0:4943 0:7939 0:7084

10 10 10 10 10 10

5

5

MCSE MLE

r 0.7

0.6

0.5

0.4

WC

SW

W

1968 1970 1972 1974 1976 1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000

t

Figure 3: MCSE and MLE for Thyroid cancer mortality trends in WC, SW and W.

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

5

19

Concluding Remarks

In this work, we have dealt with an important problem of comparing the changing trends of cancer mortality/incidence rates between two overlapping regions. Our new proposal allows us to correctly account for the correlation induced by the overlapping regions when drawing statistical inference. The better …nite sample performance of the minimum chi-square estimators, in comparison with the maximum likelihood estimators, suggests the practical utility of the proposed methods especially when comparing the APCs of rare cancers. Not only do our results verify the claim of Berkson (1980) that the e¢ ciency of the maximum likelihood estimator is questionable for the …nite sample size situations, they also encompass the Poisson models, for which the power-divergence based theoretical results (in particular for the minimum chi-square estimators) have remained elusive. In this paper, we have mainly focused on comparing two regions. Extending the methods to accommodate more than two regions simultaneously is certainly worthy of future investigations.

Technical Appendix Proof of Theorem 4 Mk

Let Mk be the set with all possible Mk -dimensional probability vectors and C Mk = (0; 1) (0; 1). The way in which N increases is so that Diag 1 (nk ) mk ( k ) does not change, hence ms ( k ) and ns increase at the same time (s = 1; :::; Mk ). This means that as Nk increases, parameter k does not su¤er any change and neither does the normalized mean vector of deaths, mk ( ) = N1k mk ( k ). Note that mk ( k ) 2 Mk C Mk . Let V RJ+1 be a neighbourhood of 0k and a function ( )

( )

( )

F Nk = (F1 ; :::; FJ+1 ) : C Mk ! RJ+1 ; so that ( )

Fi

(mk ;

k)

=

@d (Nk mk ;mk ( @ ki

T

k ))

;

i = 1; :::; J + 1;

T

with k = ( 0k1 ; :::; 0kJ ; 1k ) = ( k1 ; :::; kJ ; k;J+1 ) 2 V and mk = (m1 ; :::; mMk )T 2 Mk PMk More thoroughly, considering X k = (xsi )s=1;:::;Mk ;i=1;:::;J+1 and d (D k ;mk ( k )) = s=1 ms ( k ) where ( +1 x x (x 1) ; ( + 1) 6= 0; ( +1) (x) = lim ! (x); ( + 1) = 0; it holds ( )

Fi

(mk ;

k)

=

Mk X

N ms

ms ( )xsi

( ms (

k)

Nk ms ms ( k )

)

N ms

0

( ms (

( )

(mk (

k)

C Mk . s ), ( mD s( )

) :

s=1

It can be seen that replacing mk by mk ( 0k ), k by 1; :::; Mk . We shall now establish that Jacobian matrix ( )

@F Nk (mk ; @ k

k)

( )

=

@Fi

0 k,

(mk ; @ kj

it holds Fi

k)

!

i;j=1;:::;Mk +J+1

0 k );

0 k)

= 0, for all i =

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

is nonsingular when (mk ; ( )

@Fi

(mk ; @ kj

k)

= =

=

k)

0 k );

= (mk (

@ @d (N mk ;mk ( @ j @ ki Mk X

@ @

kj

ms (

0 k ).

For i; j = 1; :::; J + 1

k ))

Nk ms ms ( )

N m ( msk( s) ) k

k )xsi

N m

Nk ms ms ( )

( mks ( s) )

k )xsi xsj

0

N m

( mks ( s) ) +

s=1

and because

0

(1) =

!

N m ( msk( s) ) k

0

s=1

Mk X ms (

Mk X N m Nk ms xsi xsj mks ( s)

00 Nk ms ( ms ( ) );

s=1

00

(1) = 0, and ( )

@Fi

20

(mk ; @ kj

Hence, ( )

@F Nk (mk ; @ k

k)

(1) = 1 for all ,

k)

= Nk (mk ;

!

0 ); k

k )=(mk (

Mk X

0 k )xsi xsj :

ms (

s=1

0) k

1 0 k ))X k :

= Nk X Tk Diag(mk ( (mk ;

k )=(mk (

0 ); k

0) k

Applying the Implicit Function Theorem there exist: a neighbourhood Uk of (mk ( 0k ); 0k ) in C Mk RJ+1 such that @F ( ) (mk ; k )=@ k is nonsingular for every (mk ; k ) 2 Uk ; an open set Ak C Mk that contains mk ( 0k ); ( ) ( ) and a unique, continuously di¤erentiable function e k : Ak ! RJ+1 such that e k (mk ( 0k )) = 0k and ( ) ( ) f(m ; ) 2 Uk : F (m ; ) = 0g = f(m ; e (m )) : m 2 Ak g: k

k

k

Nk

k

k

k

k

k

Since

min d

mk (

mk 2Ak

it holds e(

k

that is

)

arg min d mk 2Ak

0 e( ) k );mk ( k (mk ))

= min d

0 e( ) k );mk ( k (mk ))

mk (

e ( ) (m ( k k

0 k ))

= arg min d k2

k2

mk (

0 k0 );mk ( k )

;

k

= arg min d k2

Nk mk (

mk (

0 k0 );mk ( k )

0 k );m( k )

:

(24)

k

( ) Furthermore, from the properties of power divergence measures and because e k (mk ( have

0=d

mk (

0 0 e( ) k );m( k (mk ( k )))

A11 = U T Diag(mk ( 0k )) U = Diag(fNkj gJj=1 ); > > > Ik Ik > P P < A12 = U T Diag(mk ( 0k ))v = ( mk1i ( k0 )tki ; :::; mkJi ( k0 )tki )T = AT21 ; i=1 i=1 > Ik J P > P > 0 0 T 2 > > mkji ( k )tki ; : A22 = v Diag(mk ( k )) v = j=1i=1

we can use formula

8 1 1 1 < B 11 = A11 + A11 A12 B 22 A21 A11 T 1 B 21 = B 12 = B 22 A21 A11 : : 1 1 B 22 = A22 A21 A11 A12

It follows that

0 1 k ))X k )

eTJ+1 (X Tk Diag(mk (

(26)

= B 21

B 22 =

B 22 A21 A111

= B 22

A21 A111

1 ;

B 22

where A21 A111 = (

Ik P

0 k )tki ; :::;

mk1i (

i=1

= (Nk11 and Ik J P P

B 22 =

mkji (

j=1i=1 Ik J P P

=

mkji (

j=1i=1 Ik J P P

=

mkji (

j=1i=1

because

PJ

j=1

Ik P

mk1i (

i=1

0 2 k )tki

PIk

0 i=1 mkji ( k )

Proof of Theorem 5

Ik P

mkji (

j=1

i=1

J P

Ik P

j=1

0 k )(tki

e t2kj (

i=1

e tkj ( 0 k)

mkji (

0 2 k ))

=

!

PJ

j=1

Reformulating Theorem 4 we obtain p p 0 T Nk b1k; Nk (D k 1k = ak with aTk

0 T 2 eT 1k tk ( k )X k .

1 J 0 k )tki )Diag(fNkj gj=1 )

mkJi (

i=1

Ik 1P 0 mkJi ( k0 )tki ) k )tki ; :::; NkJ i=1

J P

0 2 k )tki

Ik P

!

e t2kj ( 0k )

0 k)

e t2kj ( 0k )

0 k)

1

= (e tk1 (

0 0 e k ); :::; tkJ ( k ))

1

J P

j=1

Ik P

mkji (

i=1

0 k )tkj

e tkj (

!

1

0 k)

; PIk

0 i=1 mkji ( k )tkj

mk (

o k ))

+o

e tkj (

p Nk

0 k ).

Dk Nk

mk (

o k)

;

We would like to calculate the asymptotic distribution as a linear function of

p

Nk

Dk Nk

mk (

o k)

L

! N (0; Diag(mk (

Nk !1

0 k ))):

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

23

Since Var aTk

p

Nk (D k

mk (

o k ))

= aTk Var Nk

p Nk

Dk Nk 0 k ))ak

= Nk2 aTk Diag(mk (

it holds aTk p

p

Nk (D k

o k ))

mk (

L

Nk

! N (0; Nk

Nk !1

mk ( ok ) p Theorem, the asymptotic distribution of Nk b1k; ution of 27. Taking into account that o

Dk Nk

mk ( = Nk

o k)

ak

2 1k ;

2 1k ):

(27)

= o(OP (1)) = oP (1), according to the Slutsky’s 0 1k

must coincide with the asymptotic distrib-

Proof of Theorem 7 From Theorem 4 subtracting b12; ( b11;

0 11 )

( b12;

0 T 2 eT 12 t2 ( 2 )X 2

0 12 )

(1)

0 12

=

(1)

(D 2

m2 (

(2)

T

to b11;

0 T 2 eT 11 t1 ( 1 )X 1 0 1 ))

0 11 (1)

(D 1

(2)

we get (1)

m1 (

(2)

(D 2

0 1 ))

m2 (

0 1 ))

(2)

(2)

(D 1

0 1)

D 1 m1 ( N1

+o

0 1 ))

m1 (

(2)

o

D 2 m2 ( N2

0 2)

:

T

(2)

0 0 = 12 it holds X Tk mk ( 0k ) = X k m(2) ( 0 ), Observe that X Tk D k = X k D , k = 1; 2, and under 11 0 0 it holds k = 1; 2. In addition, o () function is not a¤ected by the negative sign and under 11 = 12 0 0 and thus we obtain (18). = 2 1

Proof of Theorem 8 We can consider the following decomposition p with

N b11;

p

b12;

p = (N aT1 ) N D1

NY = o

1 N1

D 1 m1 ( p N

m1 ( N

0

)

0

)

p + (N aT2 ) N D2

+o

1 N2

D 2 m2 ( p N

m2 ( N

0

)

0

)

+

p

N Y;

(28)

;

rather than (18). Note that from Assumptions 3 and 6 mk ( 0 )=N = Nk m ( 0 ) is constant as N p 0 0 increases and hence N Y = o D1 pmN1 ( ) + o D2 pmN2 ( ) = o (OP (1)) + o (OP (1)) = oP (1). We would like to calculate the asymptotic distribution as a linear function of p

N Dk

mk ( N

0

L

)

! N 0; Diag Nk m (

N !1

0

)

:

From (28) and by applying Slutsky’s theorem we can conclude that the asymptotic distribution of p b12; is central Normal. In order to calculate the variance we shall follow (18) so that N b11; p

N b11;

b12;

=

p

N X1 +

p

N X2 +

p

N X3 +

p

N Y;

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

with

p (1) (1) N X1 = aT1 N D 1 m1 ( p p (1) (1) N X2 = aT2 N D 2 m2 ( p p (2) N X3 = aT1 aT2 N D p N Y = oP (1) p

) ;

0

) ; 0

m(2) (

) ;

T 0 2 eT )X k , 1k tk (

where aTk

and X1 , X2 and X3 are independent random variables. Since p p (1) (1) Var N Xk = Var aTk N D k mk ( 0 ) (1)

0

= N aTk Diag(mk ( Var

0

24

p

aT1

N X3 = Var = N aT1

p

aT2

N D

(2) 0

aT2 Diag(m(2) (

aT1 Diag(m(2) ( 0 ))a1

=N

(2)

= N aT1 Diag(m1 (

0

0

m(2) (

+

)) (a1

))ak ;

k = 1; 2;

)

a2 )

aT2 Diag(m(2) ( 0 ))a2 (2)

))a1 + aT2 Diag(m2 (

0

2aT1 Diag(m(2) (

))a2

2

2 2 11 12 12

0

))a2

;

with T

12

it holds

p

Var

T

= et1 ( 0 )X 1 Diag(m(2) ( 0 ))X 2et2 ( 0 ) PJ PI1 I (2) 0 = j=1 i=1 m2ji ( )(t2i e t1j ( 0 ))(t2i e t2j ( 0 )) PJ PI1 I n(2) 0 2ji )(t2i e t1j ( 0 ))(t2i e t2j ( 0 )); = j=1 i=1 n2ji m2ji ( (1)

(1)

0

2 12

2

+ aT2 Diag(m2 ( 2 11

= N( that coincides with the asymptotic variance of

p

Proof of Corollary 9 Since 0

(2)

)=e tkj (

0

)+

mkj

mkj

formula (20) can be rewritten as =

J I1PI P

j=1 i=1

=

+

(2)

(2)

) + m2 (

N b11;

0

) + m1 ( 0

))a1

))a2

2 2 11 12 12 )

2

2 2 11 12 12 );

b12;

.

(2) e tkj (

0

(1)

e tkj (

12

0

N (X1 + X2 + X3 ) = N (aT1 Diag(m1 (

J I1PI P

j=1 i=1

m2ji (

(2)

0

) t2i

(2)

0

)(t2i

m2ji (

J I1PI 2 P P

k=1j=1 i=1

(2)

m2ji (

0

(2) e t1j (

(2) e t2j (

)(t2i

0

0

(1)

m1j m1j

)

))2 +

(2) e t2j (

(1) (e t1j (

J I1PI P

j=1 i=1 0

(1)

(1) (e tkj ( 0

)

0

)

(2) e t1j (

0

(2) e tkj (

0

(1)

(2)

m2ji (

mkj (1) )) mkj (e tkj (

0

)

)) (1)

0 m1j m2j ) m1j m2j

)):

));

t2i (1) (e t1j (

k = 1; 2;

(2) e t2j (

0

)

0

(1)

m2j m2j

)

(2) e t1j (

0

(1) (e t2j (

(1) ))(e t2j (

0

)

0

)

(2) e t2j (

(2) e t2j (

0

0

))

))

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

The last summand is canceled because J I1PI P

j=1 i=1

=

(2)

m2ji (

J m(1) P kj

j=1

mkj

0

(2) e t2j (

)(t2i

(1) (e tkj (

0

(2) e tkj (

)

0

0

(1)

mkj (1) )) mkj (e tkj (

))

I1PI i=1

(2)

m2ji (

0 2)

0

(2) e tkj (

)(t2i

and it follows (23).

b12; ) =

2 12

+

2 11

2

2 2 12 11 12

=

2 12

+

))

(2) e t2j (

0

2 11

2

PI1 I (2) 0 (2) and i=1 m2ji ( )(t2i e t2j ( 0 )) = 0. Hence, it holds (22). If region 2 is completely contained in region 1, 12 = 1= 12 , and therefore Var( b11;

0

))

2 11 ;

25

A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance

26

Acknowledgements The authors thank the associate editor and two referees for their valuable comments and suggestions. This work was carried out during the stay of the …rst author as Visiting Scientist at Harvard University and Dana Farber Cancer Institute, supported by the Real Colegio Complutense and grant MTM2009-10072.

References [1] Berkson, J. (1980). Minimum chi-square, not maximum likelihood! Annals of Statistics, 8, 457-487. [2] Bishop, Y.M.M., Fienberg, S.E. and Holland, P.W. (1995): Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge. [3] Fay, M., Tiwari, R., Feuer, E. and Zou, Z. (2006). Estimating average annual percent change for disease rates without assuming constant change. Biometrics, 62, 847-854. [4] Horner, M.J., Ries, L.A.G., Krapcho, M., Neyman, N., Aminou, R., Howlader, N., Altekruse, S.F., Feuer, E.J., Huang, L., Mariotto, A., Miller, B.A., Lewis, D.R., Eisner, M.P., Stinchcomb, D.G., Edwards, B.K. (eds). SEER Cancer Statistics Review,1975-2006, National Cancer Institute. Bethesda, MD, http://seer.cancer.gov/csr/1975_2006/ [5] Imrey, P.B. (2005). Power Divergence Methods. In: Encyclopedia of Biostatistics, Armitage P., Colton T., eds. John Wiley and Sons, New York. [6] Kullback, S. and Leibler, R.A. (1951). On information and su¢ ciency. The Annals of Mathematical Statistics. 22, 79–86. [7] Li, Y. and Tiwari, R.C. (2008). Comparing Trends in Cancer Rates Across Overlapping Regions. Biometrics. 64, 1280-1286. [8] Li, Y., Tiwari, R.C. and Zou, Z. (2008). An age-strati…ed model for comparing trends in cancer rates across overlapping regions. Biometrical Journal. 50, 608-619. [9] Pardo, L. (2006). Statistical Inference Based on Divergence Measures. Chapman & Hall / CRC (Statistics: Textbooks and Monographs), New York. [10] Pardo, L. and Martín, N. (2009). Homogeneity/heterogeneity hypotheses for standardized mortality ratios based on minimum power-divergence estimators. Biometrical Journal. 51, 819-836. [11] Pickle, L.W. and White, A.A. (1995): E¤ects of the choice of age-adjustement method on maps of death rates. Statistics in Medicine, 14, 615-627. [12] Cressie, N. and Read, T. R. C. (1988): Multinomial goodness-of-…t tests. Journal of the Royal Statistical Society B. 46, 440-464. [13] Riddell, C. and Pliska, J.M. (2008): Cancer in Oregon, 2005: Annual Report on Cancer Incidence and Mortality among Oregonians. Department of Human Services, Oregon Public Health Division, Oregon State Cancer Registry, Portland, Oregon. http://egov.oregon.gov/DHS/ph/oscar/arpt2005/ar2005.pdf [14] Tiwari, R.C., Clegg, L. and Zou, Z. (2006). E¢ cient interval estimation for age-adjusted cancer rates. Statistical Methods in Medical Research. 15, 547-569. [15] Walters, K.A., Li, Y., Tiwari, R.C. and Zou, Z. (2010). A Weighted-Least-Squares Estimation Approach to Comparing Trends in Age-Adjusted Cancer Rates Across Overlapping Regions. Journal of Data Science. 8, 631-644.