Computational Statistics and Data Analysis A flexible spatial scan test ...

Report 1 Downloads 114 Views
Computational Statistics and Data Analysis 53 (2009) 2843–2850

Contents lists available at ScienceDirect

Computational Statistics and Data Analysis journal homepage: www.elsevier.com/locate/csda

A flexible spatial scan test for case event data Lionel Cucala Institut de Mathématiques, Université Paul Sabatier, 118 Route de Narbonne, F-31062 Toulouse cedex 9, France

article

info

Article history: Available online 17 October 2008

a b s t r a c t A new method is proposed for identifying clusters in spatial point processes. It relies on a specific ordering of events and the definition of area spacings which have the same distribution as one-dimensional spacings. Then the spatial clusters are detected using a scan statistic adapted to the analysis of one-dimensional point processes. This flexible spatial scan test seems to be very powerful against any arbitrarily-shaped cluster alternative. These results have applications in epidemiological studies of rare diseases. © 2008 Elsevier B.V. All rights reserved.

1. Introduction Let X1 , . . . , Xn be random variables that denote the spatial coordinates of n events observed in A, a bounded subset of R2 . The objective of this work is to identify, if they exist, the zones in which the concentration of events is abnormally high, usually named clusters. The observed concentration is usually compared with the concentration observed under the null hypothesis H0 that the events are sampled independently from the underlying population density h(x), x ∈ A. This problem arises naturally in epidemiological applications when one wants to assess the outbreak of an unknown disease and to analyse for example whether it is infectious or dependent on any spatial environmental factor. In these cases, it is crucial to detect and to recover as precisely as possible the clusters (or clustering zone). Many procedures are designed for grouped data, that is when the observation domain is divided into subdomains and only the number of events within each subdomain is known (Lawson, 2001). These procedures can also be used when individual data are available but the loss of information may be important and the test result is then dependent on the arbitrarily chosen division. Moreover, as the monitoring tools get improved and the computational capacities increase, the methods for individual data become more and more applicable. Existing spatial methods are often derived from one-dimensional cluster detection methods, which are mainly applied to temporal point processes. For example, the techniques introduced by Kelsall and Diggle (1995a,b)) are based on a kernel intensity estimation of the events process. But, as in the temporal setting, the most popular is the scan statistic adapted to the spatial setting by Kulldorff (1997). It relies on the generalised likelihood-ratio test statistic of H0 against a piecewiseconstant density alternative. To apply this method, one needs to set the family of the possible clusters, for example all the discs centered on a point of a predefined grid. This is an important limitation since, in the real world, an excess of events may be recorded along a river for example. An alternative has been proposed recently: Kulldorff et al. (2006b) investigated a wide family of elliptic windows with predetermined shape, angle and centre. The ultimate solution would be to consider all the convex envelopes including any subset of the events locations. However, this becomes computationally infeasible when the number of events is large. The solution we propose in this article is totally different, since it considers completely data-based possible clusters, as suggested by Duczmal et al. (2007) with grouped data. Indeed, we show how an ordering of the events introduced by Dematteï et al. (2007) leads to transform the spatial data into a one-dimensional point process that can be analysed using any temporal cluster detection technique. The following section describes the data transformation process leading to the

E-mail address: [email protected]. 0167-9473/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.csda.2008.10.008

2844

L. Cucala / Computational Statistics and Data Analysis 53 (2009) 2843–2850

Fig. 1. The events ordering.

flexible spatial scan test. In a third section, the test is applied to real and simulated data sets. The paper is concluded by a discussion. 2. The flexible spatial scan test 2.1. The events ordering Let X1 , . . . , Xn be defined as in the Introduction. The starting point of the data transformation is the ordering of the events introduced by Dematteï et al. (2007). The first event, whose location is called X(1) , is the closest one to the boundary of the observation domain A, denoted by ∂ A, using the euclidian distance, denoted by d(., .). Thus D1 = d(X(1) , ∂ A) = min d(Xi , ∂ A). 1≤i≤n

Then the second event, whose location is called X(2) , is the closest one to X(1) among all the events that have not yet been ordered, such that D2 = d(X(2) , X(1) ) = min d(Xi , X(1) ). 1≤i≤n Xi 6=X(1)

All the events are iteratively ordered in the same way and Dj = d(X(j) , X(j−1) ) =

min

1≤i≤n Xi 6=X(k) ,∀k=1,...,j−1

d(Xi , X(j−1) ),

∀j = 3, . . . , n.

Fig. 1 represents the trajectory through all the ordered events for a simulated point pattern on [0, 1]2 , with n = 3. The method introduced by Dematteï et al. (2007) is based on the distances between successive events {D1 , . . . , Dn }. This seems quite natural since two successive events may belong to a cluster if the distance from one to another is small. However, even under the null hypothesis, these distances are not identically distributed and their dependance structure is quite complex. For these reasons we propose a different approach based on a specific distributional property. 2.2. The distributional property Let S1 , . . . , Sn+1 denote one-dimensional uniform spacings, that is spacings issued from a uniform [0, 1] n-sample (Pyke, 1965). The spacings are just the intervals between consecutive ordered points. The distribution of {S1 , . . . , Sn+1 } is the wellknown Dirichlet distribution. In the following we will prove that the ordering described earlier gives birth to a set of random variables whose null probability distribution is exactly the distribution of {S1 , . . . , Sn+1 }. Let A1 = {x ∈R A : d(x, ∂ A) < D1 } denote the subdomain of A containing all the points closest than D1 from the boundary and let B0 = B h(x)|dx|, ∀B ⊂ A, where |.| is the Lebesgue measure. Thus the variable A01 is the population proportion of the area we had to explore before finding the first event, when starting from the boundary of A. Actually, under H0 , the distribution of A01 is the same than the distribution of S1 and does not depend on the geometry of the domain A. Indeed, for all r > 0, we get P (A01 ≤ S10 ,r ) = 1 − P (A01 > S10 ,r )

= 1 − P (Xi ∈ S1,r , ∀i = 1, . . . , n) !n Z = 1− h(x)|dx| = 1 − (1 − S10 ,r )n , S1,r

L. Cucala / Computational Statistics and Data Analysis 53 (2009) 2843–2850

2845

where S1,r = {x ∈ A : d(x, ∂ A) ≤ r } is the subset of the points closest than r from the boundary and B¯ = {x ∈ A : x 6∈ B}, ∀B ⊂ A. The second equality comes from the definition of A1 : it is larger than S1,r only if all the events X1 , . . . , Xn are further than r from the boundary. The third equality comes from the null distribution of the Xi ’s. Since P (S1 < t ) = 1 − (1 − t )n , ∀t ∈ [0, 1], A01 and S1 follow the same distribution. Similarly, letting A2 = {x ∈ A1 : d(x, X(1) ) < D2 } and S2,r = {x ∈ A1 : d(x, X(1) ) ≤ r }, we get, for all r > 0,

   P A02 ≤ S20 ,r |A01 = 1 − P A02 > S20 ,r |A01 = 1 − P Xi ∈ S2,r |Xi ∈ A1 , ∀i ∈ {1 ≤ j ≤ n : Xj 6= X(1) } = 1−

= 1−

P Xi ∈ S2,r ∩ A1 , ∀i ∈ {1 ≤ j ≤ n : Xj 6= X(1) }



P Xi ∈ A1 , ∀i ∈ {1 ≤ j ≤ n : Xj 6= X(1) }



A1 ∪ S2,r A1

0

0

! n −1

 =1−

1 − A01 − S20 ,r 1 − A01

 n −1

.

The second equality comes from the definitions of A1 and A2 . The third one comes from the definition of the conditional distribution and the final one is a consequence of A1 ∩ S2,r = ∅. Since P (S2 < t |S1 = s) = 1 −((1 − s − t )/(1 − s))n−1 , ∀t ∈ [0, 1 − s], the conditional distribution of A02 given A01 is the same as the conditional distribution of S2 given S1 . Since the distributions of A01 and S1 are also the same, the marginal distribution of A02 is also identical to the distribution of S2 . The same method can be used recursively for all A0i , where

( Ai =

) [

x∈

Aj : d(x, X(i−1) ) < Di

,

2 ≤ i ≤ n,

j=1,...,i−1

and

[

An+1 =

Aj .

j=1,...,n

The areas A1 , . . . , An+1 are represented on Fig. 1. Finally we can set that, under H0 , the so-called area spacings {A01 , . . . , A0n+1 } have the same distribution as the uniform spacings {S1 , . . . , Sn+1 }. 2.3. The cluster detection 0 We now define the one-dimensional point process {T1 , . . . , Tn }, where Ti = j=1 Aj , 1 ≤ i ≤ n: this is the point process 0 0 whose spacings are the area spacings A1 , . . . , An+1 . From the distributional property of the area spacings, we can set that, under H0 , the events {T1 , . . . , Tn } are distributed as order statistics issued from a uniform n-sample on [0, 1]. Moreover, the one-dimensional distances between consecutive Ti ’s represent the spatial distances between consecutive X(i) ’s. Thus, a one-dimensional cluster detected on {T1 , . . . , Tn } corresponds to an abnormal sequence of small consecutive area spacings, that is to a spatial cluster. The initial spatial cluster detection problem has now been transformed into a one-dimensional cluster detection problem. The most popular of the one-dimensional cluster detection methods for individual data is the scan statistic with variable window, introduced by Nagarwalla (1996). It relies on the generalised likelihood-ratio test between the null uniform density hypothesis and a piecewise-constant density hypothesis. However, a hypothesis-free scan statistic adapted to multiple testing has recently been defined (Cucala, 2008) and gives better results. It consists of applying a concentration index based on the null distribution of the m-spacings issued from the point process. The significance of the maximal concentration is then estimated by a Monte-Carlo procedure, exactly as in the standard method. In the following, we will only use the latter method but any other choice could be made. Suppose the application of the one-dimensional cluster detection technique to the point process {T1 , . . . , Tn } leads to a number k of significative clusters denoted by Ii = [T(ai ) , T(bi ) ], 1 ≤ i ≤ k. Thus the events which are included in the significative spatial clustering zone are:

Pi

{X(j) : ∃i ∈ {1, . . . , k}, ai ≤ j ≤ bi }. We know which events are included in the clustering zone but we also need to define the shape of this clustering zone. Two possibilities are given by Dematteï et al. (2007). The first one consists in identifying a cluster with the union of the Voronoi cells associated to its events. The second one consists of defining an influence zone for each event: the circle centred on this event and whose area is the total observation area divided by the total number of events. The cluster is then the union of the influence zones of its events. Any of these solutions could be chosen. However, we should remember that each area spacing A0i corresponds to the population proportion contained in the area Ai . Thus it seems logical to associate the area spacings A0i included in the one-dimensional cluster with the areas Ai included in the spatial cluster. Then, the corresponding spatial clusters are the zones: Cˆ i =

bi [ j=ai +1

Ai ,

2846

L. Cucala / Computational Statistics and Data Analysis 53 (2009) 2843–2850

Fig. 2. Detection of cancer clusters.

that is the union of the areas corresponding to the area spacings included in the one-dimensional cluster. The total clustering zone is the union of all the clusters: Cˆ =

k [

Cˆ i .

i=1

Since the whole process, described in Section 2, relies on a one-dimensional scan statistic and leads to the detection of arbitrarily-shaped clusters, we call it the flexible spatial scan test. 3. Data analysis In this section we compare the results obtained by the flexible spatial scan test to the multiple test based on the elliptic spatial scan statistic (Zhang et al., 2007). The first one is denoted by ΛF , the latter one by ΛE . We could not compare these methods to the method introduced by Dematteï et al. (2007) since the software they provide in the R package spatclus does not allow one to set the nominal level. 3.1. An application to real data We first apply the methods to a data set describing the spatial distribution of the larynx cancers and the lung cancers recorded between 1973 and 1984 in the Chorley–Ribble area in Lancashire, UK (Diggle et al., 1990). There are 917 lung cancer cases and 57 larynx cancer cases and the associated residence location is known for each case. Since there is no location matching another, these are individual data. The larynx cancers are less frequent and they are suspected to aggregate around a disused industrial incinerator. Since the lung cancer cases seem to be less dependent on any environmental factor and may be distributed according to the underlying population, they are used as control data and their locations are denoted by {zj , j = 1, . . . , N }. The density integrals A0i , 1 ≤ i ≤ n + 1, introduced in Section 2, are thus estimated using the control data: for any B ⊂ A, the integral B h(x)|dx| is approximated by j=1 1(zj ∈ B)/N, the proportion of the control cases included in B. The flexible spatial scan test is written in R language and the programs refer to an existing package named spatstat. The elliptic spatial scan statistic is applied in a standard Bernoulli model by the SaTScan software, without compactness penalty function. In both methods, the p-values are estimates based on 999 Monte-Carlo replications. Fig. 2 shows the results given by the methods when the nominal level is set to α = 0.05. The controls are represented by smaller dots and the cases by larger dots. The triangle point down represents the disused incinerator. We observe that many larynx cancer cases are aggregated in small areas but the population at risk, represented by the lung cancer cases, is also more concentrated in these areas, except in a south central zone around the old incinerator containing five cases. The shaded area is the cluster estimated by the flexible spatial scan test: it matches this zone almost perfectly. Its significance value is 0.033. The most likely cluster computed by the elliptic scan test, represented by the ellipse containing the same five events, is very similar and its significance value is 0.012. Thus, the flexible method and the elliptic scan method clearly indicate an abnormal case concentration around the old incinerator and the cluster obtained is very close to the convex envelope of the five case locations surrounding the incinerator. Running on the same personal computer, the flexible scan test takes approximately 260 s and the elliptic scan test around 510 s. We have to mention that the circular

R

PN

L. Cucala / Computational Statistics and Data Analysis 53 (2009) 2843–2850

2847

Fig. 3. Detection of events clusters.

scan test, which is much quicker since the number of possible clusters is much smaller, leads to the conclusion that there is no significant spatial clustering of the larynx cancer cases. Indeed, contrary to the elliptic significative cluster, a circle containing the five case locations surrounding the incinerator also contains a large number of control locations. Remark also that this data set has previously been studied using different methods. Diggle (1990) and Diggle and Rowlingson (1994) first estimated the larynx cancer intensity in a parametric model but the choice of the model remained problematic. Then, Kelsall and Diggle (1995b) analysed the overall ratio between the cases density and the controls density. However, they obtained a non-significant result. We believe that this comes from using a statistic reflecting the global discrepancy between H0 and the data set in the whole observation domain. On the other side, all the scan statistics rely on a local discrepancy measure in the possible clusters and are thus more likely to detect small clusters. 3.2. An application to artificial data The same methods are applied to a data set containing n = 100 events simulated according to the density

 125/44   125/44 f 1 ( x, y ) =  25/44 0

if 0.45 < x < 0.55, else if 0.45 < y < 0.55, else if (x, y) ∈ [0, 1]2 , otherwise.

The true cluster is cross-shaped and hence quite difficult to recover, and we want to compare the efficiency of both methods in this example. First, we need to create an artificial control data set. For the flexible spatial scan test, the underlying population is given by a 1000 × 1000 regular grid on [0, 1]2 . Unfortunately, we could not use the elliptic spatial scan test with such a large number of control data since it would take more than a week, as indicated by the SaTScan software! We thus restricted it to a 20 × 20 regular grid on [0, 1]2 . This computational problem is a consequence of the generation of the possible clusters. Indeed, the SaTScan software takes into account all the events in order to build the possible clusters, whether these are case events or control events. Then, the programme counts how many case and control events are located in each possible cluster. So, increasing the number of control events increases both the number of possible clusters and the length of the counting process in each one. On the other hand, when one applies the flexible scan test, the computation of the areas Ai , 1 ≤ i ≤ n + 1, only depends on the case events. The control events are then counted in each of these areas to compute the area spacings A0i , 1 ≤ i ≤ n + 1. Thus, increasing the number of control events increases only the length of the counting process in the flexible scan test: it becomes less time-consuming when the control data increases. The result obtained from both methods is given in Fig. 3. Here again, the shaded area represents the cluster zone given by ΛF and the two ellipses are the significative clusters exhibited by ΛE . Both methods mimic quite well the true cluster, except in the right side of the window, where the number of cases is not large enough to be significative. Remark that all the cases included in the elliptic clusters, except one, are in the clustering zone of the flexible test. On the other hand, there are fifteen cases, all included in the true cluster, which are in the clustering zone of ΛF but not in the one obtained by ΛE . These results have to be confirmed by a simulation study. 3.3. A simulation study The same methods are now applied to simulated data sets containing n = 100 events.

2848

L. Cucala / Computational Statistics and Data Analysis 53 (2009) 2843–2850

Table 1 Tests applied to a simulated circular cluster. r

Empirical results of the following:

ΛF

ΛE

3

Power TP TN I TPC TNC IC

0.700 0.0308654 0.8195426 0.850408 0.10375 0.6285 0.73225

0.375 0.0376875 0.8448977 0.8825852 0.1155 0.65525 0.77075

5

Power TP TN I TPC TNC IC

0.975 0.08432245 0.8144343 0.8987567 0.33425 0.5115 0.84575

1.000 0.0979375 0.836125 0.9340625 0.377 0.5245 0.9015

First, the methods are tested against a circular clustering alternative. The events are simulated according to the density function  r  if (x − 0.5)2 + (y − 0.5)2 < 0.22 ,    1 + 0.04π (r − 1) 1 f2 (x, y) = else if (x, y) ∈ [0, 1]2 ,   1 + 0 . 04 π ( r − 1 )   0 otherwise, where the parameter r is the ratio between the maximum density and the minimum density on the observation domain A = [0, 1]2 . We compare the power of the global tests associated to each method. Moreover, we check whether the significative clusters exhibited by the different methods match the ‘‘true’’ cluster C , that is the subset in which the density is higher, the disc centred in O = (1/2, 1/2) with radius 0.2 in this example. A true positive index is given by TP = |C ∩ Cˆ |. A

¯

true negative index is given by TN = |Cˆ ∩ C¯ |. Finally, a correspondence index between the true and the estimated clustering zone is given by I = TP + TN. Here we consider that the type I and type II errors have the same weight but any other choice could be made, as considered by Takahashi and Tango (2006). All these indices are estimated on a 1000 × 1000 regular grid Pn ˆ on [0, 1]2 . Finally, we also check whether the methods correctly classify the simulated events. Let n1 = 1n i=1 1(Xi ∈ C ∩ C )

¯

ˆ ¯ and n2 = 1n i=1 1(Xi ∈ C ∩ C ). A true positive index on cases is given by TPC = n1 /n, a true negative index on cases by TNC = n2 /n and a correspondence index on cases by IC = TPC + TNC . The nominal level of the global test is set to 5% and 40 data sets are simulated. For both methods, the underlying population is the same as in the preceding subsection. Table 1 gives the results obtained. As we could expect, the elliptic spatial scan statistic ΛE recovers more precisely the circular shape of the true cluster than the flexible scan test. This sounds logical as the family of the potential clusters proposed by ΛE also contains circles which may be very similar to the true cluster. The concordance indices on cases lead to the same conclusion: the flexible method seems to be less efficient for classifying the events. However, the flexible scan test seems to be more powerful, even against these circular alternatives. This may be a consequence of the quite small number of control data we could generate for ΛE . Now we want to test whether our method is more efficient than the classical one against non-circular clusters, for example horizontal clusters. Events are simulated according to the density function Pn

f3 (x, y) =

 r     1 + 0.12(r − 1) 1

    1 + 0.12(r − 1) 0

if 0.2 < x < 0.8 and 0.4 < y < 0.6, else if (x, y) ∈ [0, 1]2 , otherwise.

Table 2 gives the results obtained. The results are very similar to the previous ones. Since the set of the ellipses which are tested as potential clusters is very large, there is always one of them that is quite similar to the true horizontal cluster. However, once again, the flexible scan test exhibits higher powers. 4. Discussion The flexible spatial scan test allows one to detect several clusters in a spatial point process without assuming anything about the clustering structure, and without setting up any parameter. On the other hand, the widely-used circular scan

L. Cucala / Computational Statistics and Data Analysis 53 (2009) 2843–2850

2849

Table 2 Tests applied to a simulated horizontal cluster. r

Empirical results of the following:

ΛF

ΛE

3

Power TP TN I TPC TNC IC

0.700 0.01746803 0.863304 0.8807721 0.06975 0.69425 0.764

0.325 0.02575 0.8693125 0.8950625 0.09025 0.70025 0.7905

5

Power TP TN I TPC TNC IC

0.950 0.07540475 0.7836604 0.8590651 0.308 0.486 0.794

0.975 0.08525 0.843125 0.928375 0.348 0.52775 0.87575

statistic is very restrictive concerning the shape of the possible clusters. The recently-introduced elliptic scan statistic is more flexible but is computationally very time-consuming when the control data are not aggregated. Moreover, contrary to the null quantiles of the elliptic spatial scan statistic, the quantiles used by the flexible spatial scan test do not depend on the observation domain and do not need to be recalculated when this changes or when the family of the possible clusters changes. The concordance indices obtained by the flexible spatial scan test are lower than expected. Maybe, identifying the estimated cluster as an union of Ai ’s is not the best choice since these areas Ai , 1 ≤ i ≤ n + 1, may contain quite large zones not containing any event. Once the events contained in the cluster zone are known, the choice of this cluster zone is always problematic. Dematteï et al. (2007) proposed two different possibilities but none of them was entirely satisfactory. Anyway, the concordance indices on cases are always better for ΛE so it seems that any choice for the shape of the clustering zone of ΛF leads to lower concordance indices than ΛE . This may be a consequence of the ordering of the events as mentioned by Dematteï et al. (2007): there is always the possibility that a non-clustered event is ordered between two subsets of clustered events and is then considered as belonging to the clustering zone. However, if we focus on the power, the flexible method ΛF seems to perform very well against any alternative. As noted by Kulldorff et al. (2006b), the main goal of a cluster detection technique is to generate an alarm so that the public health officials can investigate the details of this excess of events. In order to achieve this, the flexible spatial scan test is the best and fastest method when dealing with large individual data sets. As discussed in Section 2, the flexible spatial scan test adapts to inhomogeneity in the same way as the elliptic spatial scan test for case event data. This would also be true for any continuous covariate adjustment (Klassen et al., 2005). This could be done by modelling a risk function depending both on the underlying population and on the adjusted covariates. Finally, a possible extension of this method to the spatio-temporal setting could be imagined: the only need is to define an appropriate spatio-temporal distance between events. This might be an alternative to the spatio-temporal scan statistic which is computationally unfeasible when dealing with large point processes, since the number of cylindrical windows to explore becomes huge. Acknowledgements This research project is within the scope of my Ph.D. thesis work under the direction of Christine Thomas-Agnan. I gratefully thank her for her constant help and support. All programs are written in R language and are available on demand. References Cucala, L., 2008. A hypothesis-free multiple scan statistic with variable window. Biometrical Journal 50, 299–310. Dematteï, C., Molinari, N., Daurès, J.P., 2007. Arbitrarily shaped multiple spatial cluster detection for case event data. Computational Statistics and Data Analysis 51, 3931–3945. Diggle, P.J., 1990. A point process modelling approach to raised incidence of a rare phenomenon in the vicinity of a prespecified point. Journal of the Royal Statistical Society, Series A 153, 349–362. Diggle, P.J., Gatrell, A.C., Lovett, A.A., 1990. Modelling the prevalence of cancer of the larynx in part of Lancashire: A new methodology for spatial epidemiology. In: Thomas, R.W. (Ed.), Spatial Epidemiology, vol. 21. Pion, London. Diggle, P.J., Rowlingson, B.S., 1994. A conditional approach to point process modelling of elevated risk. Journal of the Royal Statistical Society, Series A 157, 433–440. Duczmal, L., Cançado, A., Takahashi, R., Bessegato, L., 2007. A genetic algorithm for irregularly shaped spatial scan statistics. Computational Statistics and Data Analysis 52, 43–52. Kelsall, J.E., Diggle, P.J., 1995a. Kernel estimation of relative risk. Bernoulli 1, 3–16. Kelsall, J.E., Diggle, P.J., 1995b. Non-parametric estimation of spatial variation in relative risk. Statistics in Medicine 14, 2335–2342.

2850

L. Cucala / Computational Statistics and Data Analysis 53 (2009) 2843–2850

Klassen, A., Kulldorff, M., Curriero, F., 2005. Geographical clustering of prostate cancer grade and stage at diagnosis, before and after adjustment for risk factors. International Journal of Health Geographics 4, 1. Kulldorff, M., 1997. A spatial scan statistic. Communications in Statistics. Theory and Methods 26, 1481–1496. Kulldorff, M., Huang, L., Pickle, L., Duczmal, L., 2006b. An elliptic spatial scan statistic. Statistics in Medicine 25, 3929–3943. Lawson, A.B., 2001. Statistical Methods in Spatial Epidemiology. Wiley, Chichester. Nagarwalla, N., 1996. A scan statistic with a variable window. Statistics in Medicine 15, 845–850. Pyke, R., 1965. Spacings (with discussion). Journal of the Royal Statistical Society, Series B 27, 395–449. Takahashi, K., Tango, T., 2006. An extended power of cluster detection tests. Statistics in Medicine 25, 841–852. Zhang, Z., Kulldorff, M., Assuncão, R., 2007. Spatial scan statistics adjusted for multiple clusters. Preprint.