Bayesian Semiparametric Intensity Estimation for ... - Semantic Scholar

Report 2 Downloads 248 Views
Biometrics 000, 000–000

DOI: 000

000 0000

Bayesian Semiparametric Intensity Estimation for Inhomogeneous Spatial Point Processes

Yu Ryan Yue∗ Baruch College, City University of New York, New York, NY 10010, USA *email: [email protected]

and Ji Meng Loh∗ AT&T Labs-Research, Florham Park, NJ 07932, USA *email: [email protected]

Summary:

In this work we propose a fully Bayesian semiparametric method to estimate the

intensity of an inhomogeneous spatial point process. The basic idea is to first convert intensity estimation into a Poisson regression setting via binning data points on a regular grid, and then model log intensity semiparametrically using an adaptive version of Gaussian Markov random fields to smooth the corresponding counts. The inference is carried by an efficient MCMC simulation algorithm. Compared to existing methods for intensity estimation, e.g., parametric modeling and kernel smoothing, the proposed estimator not only provides inference regarding the dependence of the intensity function on possible covariates, but also uses information from the data to adaptively determine the amount of smoothing at the local level. The effectiveness of using our method is demonstrated through simulation studies and an application to a rainforest dataset. Key words:

Adaptive spatial smoothing; Gaussian Markov random fields; Gibbs sampling; In-

tensity estimation; Spatial point process.

1

Bayesian Semiparametric Intensity Estimation

1

1. Introduction Spatial point data arise in a wide variety of applications including ecology, seismology, astronomy and epidemiology (e.g. Waagepetersen, 2007; Schoenberg, 2003; Stein et al., 2000; Diggle et al., 2000, respectively). Diggle (2003) and Stoyan et al. (1995) both contain comprehensive introductions to the analysis of spatial point patterns. While spatial point analyses range from relatively simple exploratory methods to complex Markov Chain Monte Carlo (MCMC) algorithms for model fitting (Møller and Waagepetersen, 2003), in any spatial point analysis, the first main interest is often the intensity function λ(s), s ∈ R2 , defined by  λ(s) =

lim

|ds|→0

E[N (ds)] |ds|

 ,

(1)

where ds represents an infinitesimal region around s, |ds| its area and N (ds) the number of points in ds. The quantity λ(s)|ds| can be thought of as the probability of observing one data point in the region ds. In the stationary case, λ(s) ≡ λ a constant and, given a spatial point pattern with n points observed in an observational region D ⊂ R2 , can be easily estimated ˆ = n/|D|. In the nonstationary or inhomogeneous case, however, λ(s) is not constant using λ but a function of s. Motivated by theoretical considerations of consistency, much of recent work has considered parametric estimation of the intensity, i.e. modeling the intensity as a function of one or more measured covariates, for an inhomogeneous spatial point process. Specifically, suppose X(s), s ∈ D, is a row vector of p spatial covariates. The point data are then modeled as a realization of a spatial point process with intensity function given by λ(s; α, β) = exp {α + X(s)β} .

(2)

This model can be generalized by replacing the exponential function by another positive strictly increasing differentiable function. Assuming that the point process is an inhomogeneous Poisson with intensity function given by (2), the log-likelihood for a point pattern

2

Biometrics, 000 0000

T = {t1 , . . . , tn } is n

1 X 1 L(α, β; T ) = log λ(ti ; α, β) − |D| i=1 |D|

Z λ(s; α, β) ds.

(3)

D

The quantity L(α, β; X) can be easily maximized using regular software for fitting generalized linear models (Baddeley and Turner, 2000) to obtain estimates. The spatstat R package (Baddeley and Turner, 2005) contains the function ppm that can be used to fit such model. Schoenberg (2004) showed that under some mild conditions on the spatial point process, the estimates can be consistent even if the process is not Poisson. Guan and Loh (2007), Waagepetersen and Guan (2009) and Guan (2009) used this framework in their work. Many important theoretical results for intensity function estimation using parametric models have been derived under some assumptions. One crucial assumption is that the model is correct, i.e. the correct covariates X(s) are used, and that the choice of the relationship between the intensity function and the covariates, usually the form shown in (2), is correct. However, spatial point patterns are often the results of complex interactions between physical, environmental and ecological processes. For example, a census of tree locations in a tropical forest includes many generations of trees whose growth and decay are affected by soil, rainfall, water runoff, etc. At best, this makes the choice of model very challenging. If the relevant covariates were not measured, an appropriate model may not even be possible. When there is no covariate involved or it is too difficult to incorporate covariate information, nonparametric estimation of the intensity function is often used. One way to estimate λ(s) nonparametrically is with kernel estimation (Diggle, 1985), usually with a single bandwidth h chosen by eye. Diggle (2003) gave an ad-hoc rule, h = 0.68n−0.2 for the quadratic Epaneˇcnikov kernel when the observation window is a square. He also provided a more sophisticated method where h can be obtained by minimizing an estimate M (h) of the mean squared error as a function of h. Stoyan et al. (1995) suggested a likelihood cross-validation method for selecting the bandwidth, which is an extension of the method in Silverman

Bayesian Semiparametric Intensity Estimation

3

(1986) to the context of spatial point patterns. Since the same amount smoothing is applied to the whole data area, the ordinary kernel smoother often fails to capture the local spatial variation of the intensity function. Diggle et al. (2005) therefore used various bandwidths hi to perform adaptive smoothing with the kernel estimator. Guan (2008) introduced an interesting version of kernel estimation, with the distance measure in the kernel based on covariate distance instead of geographic distance. This allowed consistent estimation of the intensity by pooling information over widely separated points that have similar covariate values. He addressed bandwidth selection and dimension reduction using a sliced inverse regression technique. While it is nonparametric and does not involve a specified form of the relationship between the intensity and the covariate(s), this method does assume that the correct covariate(s) are used. Furthermore, it is not entirely clear how covariate distances should be computed if two or more covariates are involved. Guan (2008) used the Euclidean norm on the standardized covariates. The standardization reduces the issues involved with using the Euclidean norm. However, with some covariates, e.g. altitude and soil pH, it is difficult to interpret the meaning of this distance, nor is it clear whether the Euclidean norm is the correct norm to use. To combine the strengths of both parametric and nonparametric methods, we propose modeling the intensity function in a semiparametric way, i.e., λ(s; α, β, z) = exp {α + X(s)β + z(s)} ,

(4)

where z(s) is a stochastic process accounting for spatial correlation among locations that cannot be explained by the covariates. We convert intensity estimation into a generalized spatial regression problem via binning the data and assume that the observed number of points in each grid cell follows a Poisson distribution with intensity of the form given in (4). A fully Bayesian hierarchical model is developed by taking appropriate prior distributions on α, β and z(s). In particular, the prior on z(s) is an adaptive version of a Gaussian Markov

4

Biometrics, 000 0000

random field (GMRF). Such an adaptive GMRF incorporates local spatial structure into the model while simultaneously accounting for spatial variation and uncertainty at a larger spatial scale. The inference is carried out by an efficient MCMC simulation algorithm using the sparse structure of the GMRF. The proposed method avoids the restrictive assumptions of the specific form of relationship between the intensity function and the set of possible covariates. Compared to a kernel smoother whose bandwidth(s) is (are) often chosen in an ad-hoc manner, the method uses information contained in the data to adaptively determine the amount of smoothing at the local level. There are some other Bayesian approaches that can be used to estimate intensity of an inhomogeneous spatial point process. Knorr-Held and Rue (2002) and Rue et al. (2004) developed a flexible Bayesian model for disease mapping based on MCMC inference. Rue et al. (2009) introduced a novel Bayesian inference method based on integrated Laplace approximations (INLA), which provides (much) faster and more reliable inference than using MCMC. Liang et al. (2009) introduced Bayesian wombling for spatial point processes. All the methods, however, modeled unknown spatial process with either non-adaptive GMRFs or conditionally autoregressive (CAR) models (Besag, 1974). Illian and Rue (2010) incorporated constructed covariates into their model to capture local spatial structure and also used GMRF to account for spatial variation at a larger scale. The rest of the paper is organized as follows. In Sections 2 and 3, we introduce our Bayesian hierarchical modeling method based on an adaptive GMRF prior and MCMC inference. In Section 4, we compare our method with kernel intensity estimation through simulation studies. In Section 5, we apply the method to four species of trees in the Barro Colorado Island tropical forest dataset.

Bayesian Semiparametric Intensity Estimation

5

2. Model and priors 2.1 Observation model Consider a regular grid In with n = n1 n2 cells and denote by Cjk the grid cell in the j-th row and k-th column. Given a spatial point pattern, we bin the observed points into the cells. Letting yjk be the number of points binned to Cjk , we use the model yjk |λjk ∼ Poisson (|Cjk |λjk )

and

log(λjk ) = α + X(sjk )β + z(sjk ),

(5)

for j = 1, . . . , n1 and k = 1, . . . , n2 , as a simple approximation to (3). The location point sjk is the center of Cjk , X(sjk ) represents p spatial covariates measured at each sjk , z(sjk ) is the realization of a spatial process at sjk and λjk can be considered the average intensity for each grid cell. We note that in an actual application, the covariates may not have been measured at the grid points. In such situations some preprocessing such as interpolation would be needed. Binning has been proposed by a number of authors in density estimation and nonparametric regression. In particular, Brown et al. (2009) considered binning data to convert the density estimation problem to a nonparametric regression problem in one dimension, using a wavelet block thresholding estimator (Cai, 1999) as the regression estimator. Kass et al. (2009) considered, for neuronal spike data, adaptive smoothing by binning the data and using Bayesian regression splines where the number of knots of the cubic splines is determined from data.

2.2 Spatially adaptive Gaussian Markov random fields We model the spatial process z(sjk ) in (5) using an adaptive version of the Gaussian Markov random field (GMRF) introduced by Yue and Speckman (2010). In particular, this GMRF is derived by approximating a two-dimensional Bayesian thin-plate spline (TPS) estimator (e.g. Wahba, 1990; Gu, 2002). We briefly describe the adaptive GMRF in this section and

6

Biometrics, 000 0000

refer the reader to Rue and Held (2005) and Yue and Speckman (2010) respectively for details on GMRF’s in general and on the movitation behind the TPS GMRF. The adaptive GMRF of Yue and Speckman (2010) is based on the following spatial Gaussian random walk model :   ∇2(1,0) + ∇2(0,1) zjk ∼ N 0, (δγjk )−1 ,

(6)

where zjk = z(sjk ), and ∇2(1,0) and ∇2(0,1) denote the second-order backward difference operators in the vertical and horizontal directions respectively, i.e., ∇2(1,0) zjk = zj+1,k −2zjk +zj−1,k and ∇2(0,1) zjk = zj,k+1 − 2zjk + zj,k−1 for 2 6 j 6 n1 − 1 and 2 6 j 6 n2 − 1 . The positive parameter δ is a global smoothing parameter accounting for large-scale spatial variation while the γjk (also positive) are the adaptive smoothing parameters that capture the local structure of the process z(s). The use of γjk is important for estimating the intensity function from an inhomogeneous point pattern. Heuristically, we need less smoothing (small γjk ) at dense sub-regions and relatively more smoothing (large γjk ) where very few or no points are observed. Standard smoothing techniques (e.g., kernel smoother with fixed width) involve a trade-off between increasing detectability for intensity and retaining information on the spatial extent of the areas. Using adaptive γjk reduces such loss of information. The need of adaptive smoothing for intensity estimation was also demonstrated in Diggle et al. (2005) and Illian and Rue (2010). Note that setting γjk ≡ 1 makes (6) a non-adaptive GMRF on lattice, which yields a Bayesian solution for thin-plate splines (see Rue and Held, 2005, section 3.4.2). Yue and Speckman (2010) showed that model (6) together with certain edge correction terms define an adaptive GMRF with matrix form given by π(z | γ, δ) ∝ δ

(n−1)/2

1/2 |Aγ |+

 δ T exp − z Aγ z , 2 

(7)

where z = (z11 , z21 , . . . , zn1 ,n2 )0 , γ = (γ21 , γ22 , . . . , γn1 ,n2 )0 , Aγ = B T diag(γ)B is an n × n structure matrix of rank n − 1, B is an (n − 1) × n full rank sparse matrix, and diag(γ)

Bayesian Semiparametric Intensity Estimation

7

is a diagonal matrix of γ. The quantity |Aγ |+ is the product of the non-zero eigenvalues Q from Aγ . Yue et al. (2008) and Yue and Speckman (2010) proved that |Aγ |+ = |BB T | ` γ` which is easily computed. The prior (7) is an intrinsic GMRF because its density is improper Gaussian (Aγ is singular) and it has an attractive Markov property making Aγ highly sparse. See Yue and Speckman (2010) for the detailed specification of Aγ . Additional priors need to be specified for γjk in (7). Brezger et al. (2007) suggested using iid gamma priors γjk ∼ Gamma(ν/2, ν/2). The marginal prior distribution of the increment in (6) turns out to be a heavy-tailed Student t distribution with ν degrees of freedom. The case ν = 1 (i.e. a Cauchy distribution) is of special interest as a prior for robust nonparametric regression (Carter and Kohn, 1996). It is intuitive to expect γjk to be spatially correlated. Accordingly, Yue and Speckman (2010) and Yue et al. (2009) used a second hierarchy with a first order spatial GMRF model for log(γjk ). They showed that this two-stage GMRF prior was quite flexible for estimating functions with differing curvature and had successful applications in spatial modeling of meteorological data and functional magnetic resonance imaging data. From our investigations, we find that the iid gamma prior approach has a better ability to capture sharp changes in the underlying spatial process compared to the two-stage GMRF approach. Such changes often occur in inhomogeneous point patterns. Furthermore, it is computationally much less demanding. We therefore decided to use iid gamma priors for γjk .

2.3 Other priors Fully Bayesian inference requires a hyperprior on the precision parameter δ in (6). For conjugacy, we choose a weakly informative gamma prior, i.e., δ ∼ Gamma(, ), where  is some small value, say 0.0001. Other priors, such as the Pareto prior (Yue and Speckman, 2010) and half-Cauchy prior (Gelman, 2006), are also good options. Since the posterior inference was found robust to the choice of the prior on δ, we use the diffuse gamma prior for

8

Biometrics, 000 0000

computational simplicity. Since the null space of Aγ is spanned by 1 = (1, . . . , 1)0 , we remove intercept α from model (5) for identifiability. It is equivalent to taking an implicit constant prior on α. For β, we assume a highly dispersed Gaussian prior π(β) ∼ Np (0, κ−1 Ip ), where κ is a small constant, say 0.0001. Such prior specification has been commonly used for the Bayesian inference for generalized linear mixed models (see, e.g., Fahrmeir and Lang, 2001).

3. Posterior inference via MCMC Based on the likelihood and prior distributions specified in Section 2, the joint posterior distribution of our Bayesian hierarchical model is π(β, z, γ, δ | y) ∝ L(y | β, z) π(z | γ, δ) π(γ) π(δ)π(β).

(8)

Although the GMRF prior on z is improper, the posterior distribution in (8) is proper (see Sun et al., 2001; Fahrmeir and Kneib, 2009). In order to derive an efficient Gibbs sampler, we let η = Xβ + z and obtain the joint posterior distribution of (β, η, γ, δ) proportional to   δ κ δ (n−1)/2+−1 exp − (η − Xβ)0 Aγ (η − Xβ) − β 0 β − δ 2 2 X  n−1  ν  Y ν/2−1 × exp yjk ηjk − |Cjk | exp(ηjk ) γ` exp − γ` . 2 j,k `=1

(9)

We sample β from the full conditional Np (µ, Σ), where µ = δΣX 0 Aγ η and Σ = (δX 0 Aγ X+ κIp )−1 . The full conditionals of δ and γ` are gamma distributions that can be easily derived. However, the full conditional of η does not correspond to any regular distribution. We thus employ the Metropolis-Hastings algorithm to update η as described below. 3.1 Block-move Metropolis-Hastings sampling algorithm From (9), the full conditional of η = (η11 , η21 , . . . , ηn1 ,n2 )0 is proportional to  δ  X exp − (η 0 Aγ η − 2θ 0 X 0 Aγ η) + yjk ηjk − |Cjk | exp(ηjk ) . 2 j,k

(10)

Knorr-Held and Rue (2002) and Rue and Held (2005) suggest updating η as a block using the Metropolis-Hastings algorithm. The proposal distribution is a GMRF approximation of (10).

Bayesian Semiparametric Intensity Estimation

9

One can efficiently evaluate the acceptance rate using the sparse structure of the GMRF. However, the acceptance rate can be low, especially when η is highly dimensional. Rue et al. (2004) introduced a more flexible non-Gaussian approximation method for sampling η, where the accuracy can be tuned by intuitive parameters to nearly any desired precision. This method is, however, much more computationally intensive. To improve the acceptance rate, we first partition η into several blocks and then update each block conditional on the other blocks. More specifically, letting φk = (η1k , . . . , ηn1 ,k )0 for k = 1, . . . , n2 , the conditional prior distribution of φk is then n δ o π(φk |φ−k , γ, θ, δ) ∝ δ n1 /2 |Ak |1/2 exp − (φk − µk )T Ak (φk − µk ) , 2

(11)

where φ−k is the vector of ηjk not included in φk , Ak is the submatrix of Aγ corresponding to φk , and µk is the mean vector depending on φ−k , γ and θ. Due to the sparsity of Ak , the vector µk can be efficiently computed. See Knorr-Held and Richardson (2003) and Yue and Speckman (2010) for a detailed specification of the prior distribution in (11). From (9) and (10), the full conditional of φk is proportional to n1 n δ o X T π(φk | φ−k , y) ∝ exp − (φk − µk ) Ak (φk − µk ) + yjk ηjk − |Cjk | exp(ηjk ) , 2 j=1

(12)

We now use a second-order Taylor expansion of exp(ηjk ) around φm k , the mode of the full conditional (12), to construct a suitable approximation of (12), n1  n δ X 1 2 o T T dj ηjk − cj ηjk π ˜ (φk | φ−k , y) ∝ exp − φk Ak φk + δµk Ak φk + 2 2 j=1 n 1 o ∝ exp − φTk (δAk + diag(c)) φk + (δAk µk + d)T φk , 2

(13)

where cj and dj are easily derived constants. To locate mode φm k , one may use the well-known (multivariate) Newton-Raphson method. It is straightforward to check that (13) is a GMRF with distribution 

−1

N (δAk + diag(c))

−1

(δAk µk + d) , (δAk + diag(c))



,

and will be used as the proposal distribution. Using the Metropolis-Hastings algorithm, for

10

Biometrics, 000 0000

˜ (φk |φ−k , y) and then accept it with probability k = 1, . . . , n2 we draw a candidate φ∗k from π   π(φ∗k |φ−k , y) π ˜ (φk |φ−k , y) α = min 1, . (14) π(φk |φ−k , y) π ˜ (φ∗k |φ−k , y) An important feature of (13) is that the GMRF approximation inherits the Markov property of the prior on φk , which makes the MCMC simulation efficient. Furthermore, such an approximation is fairly accurate and yields acceptance rates above 0.9 for all the examples that we present in the paper.

4. Simulations and comparisons with kernel methods We performed two simulated examples to compare the adaptive GMRF method for intensity estimation with both the fixed and adaptive bandwidth versions of the kernel estimator. Although we included covariates in our model in Section 2, for simplicity, we do not use any here. In the first example, the point processes are simulated by assuming the true intensity to be a (smooth) two-dimensional unimodal function. In the second example, we simulate realizations of inhomogeneous Poisson processes with log-intensities generated from Gaussian processes. For fixed bandwidths, we used the cross-validation likelihood method of Stoyan et al. (1995) and the M (h) minimization method of Diggle (2003) to select bandwidths. The implementation of the adaptive kernel method follows Diggle et al. (2005), except that instead of estimating intensities at the observed points, we estimate them at grid locations. The initial bandwidth h0 was chosen to be the bandwidth used in the fixed bandwidth kernel methods. We use the integrated squared error (ISE) between the true and estimated intensities as a performance criterion. Since the results depend on the bandwidth selectors, we only report the minimum ISE achieved by kernel methods. 4.1 Unimodal intensity function We first considered a unimodal intensity function on a 10×10 window centered at the origin:  λ(s) = 9 + 27 exp −2||s||2 .

(15)

Bayesian Semiparametric Intensity Estimation

11

We simulated 100 inhomogeneous Poisson spatial point patterns using the rpoispp function in the spatstat R package. We also used the inhomogeneous Neyman-Scott process (INSP) model introduced by Waagepetersen (2007), simulating 100 realizations using the intensity function (15). These are generated by first simulating a stationary Neyman-Scott realization and then thinning the sample using retension probabilities that depend on λ. We used the Thomas process, a Neyman-Scott process where the offspring points are distributed about the parent points according to a bivariate Gaussian density. We chose the constant parent intensity, the mean number of offspring points per parent and the standard deviation of the Gaussian density to be 10, 4 and 0.5 respectively. The functions in spatstat we used are rThomas and rthin. These point patterns have correlations and so are not Poisson processes. It allows us to examine the performance of the adaptive GMRF method when the underlying Poisson assumption is not valid. Both the Poisson and INSP point processes have roughly 1000 points in each data set. The intensities are estimated on 100×100 regular grids for the (non-adaptive) kernel method and on 30×30 grids for adaptive methods. We evaluated ISEs on three different grid dimensions: 30×30, 50×50 and 100×100. The results are summarized in Table 1. Compared to the kernel methods, the adaptive GMRF model works much better for Poisson point patterns. The Poisson assumption shows in the results for the INSP point patterns, where the ISEs are slightly larger than for the kernel methods. It is also interesting to see that the adaptive kernel estimator slightly outperforms the fixed bandwidth kernel for Poisson processes but performs similarly for INSP. [Table 1 about here.]

4.2 Stationary Gaussian processes Following Diggle et al. (2005), we also simulated realisations of inhomogeneous Poisson processes whose intensities are generated from λ(x) = exp {S(x)}, where S(x) is a stationary Gaussian process with covariance function Cov {S(x), S(x − u)} = σ 2 exp(−u/φ). For each

12

Biometrics, 000 0000

comparison, we simulated 100 samples, each consisting of 1000 points on a 89×89 region. From each simulated sample we computed the (minimum) integrated squared errors ISEF K , ISEAK and ISEAG achieved by the fixed bandwidth kernel, adaptive bandwidth kernel, and adaptive GMRF estimators respectively. All the ISEs are evaluated on 100×100 regular grids. To compare the performance of the three estimators it is more informative to look at the ratios of the ISE’s: RF K/AG = ISEF K /ISEAG and RAK/AG = ISEAK /ISEAG . Table 2 summarizes the results for each pair of the model parameters (σ 2 , φ). As we can see, the adaptive GMRF estimator outperforms both kernel estimators in almost every situation. Also, the adaptive GMRF model seems to work better for φ = 5, 7 than φ = 3. This is due to the GMRF prior that we constructed, which assumes second-order smoothness and is more suitable for modeling Gaussian processes with large φ (i.e. stronger correlation). This can also be seen from the first simulation example, where the true intensity function is a smooth unimodal function. Finally, note that the superiority of using adaptive methods is more pronounced at larger values of σ 2 , consistent with the fact that larger values of σ 2 are associated with more pronounced spatial heterogenity in the resulting point patterns. This agrees with the findings in Diggle et al. (2005). As highlighted by a referee, the performance should not be as good for, say, a Thomas Neyman-Scott model, since it violates the assumptions of model (5). Briefly, we expect better performance with weaker spatial correlation (i.e. closer to a Poisson process) and vice versa. A comprehensive study would require consideration of a wide range of point process models and model parameters, which is beyond the scope of this paper. [Table 2 about here.]

4.3 Some remarks The adaptive GMRF framework allows for reliable and efficient MCMC simulation. The Markov chains have quick convergence: 15,000 MCMC iterations with a burn-in of 5,000

Bayesian Semiparametric Intensity Estimation

13

proved to be sufficient for all the simulated examples. Due to the block-move sampling method described in Section 3, the MCMC computation is fast considering the large number of parameters to be estimated in the model. With n = 402 , 602 , 802 and 1002 , our FORTRAN program took 114, 255, 510 and 837 seconds, respectively, to estimate intensities from a simulated data set of 10,000 points. The computational times increase roughly at order n. All the computations were carried out on a Mac laptop with a 2.13 GHz Intel Core 2 Duo processor. Since the adaptive GMRF model is based on binning of the data, its performance depends on the choice of grid size. This is similar to bandwidth choice with kernel methods. For our simulation studies, we used grids of size 30×30, 40×40 and 50×50. The results in Table 2 are for the best performing grid size, specifically 30×30 for σ 2 =1, 40×40 for σ 2 =3 and 50×50 for σ 2 =7. Choice of grid size does not appear to be influenced by φ, at least, not for the values we considered. Note that for the kernel method with fixed bandwidth, we used optimal bandwidths selected using cross-validation (Stoyan et al., 1995) and the M (h) miminization method (Diggle, 2003). We believe the advantage of the adaptive GMRF estimator is the use of data information to choose the amount of smoothing. In contrast, the kernel smoothing methods depend on bandwidth selectors that have not been well studied in the context of spatial point analyses. In our simulations, we found that the two selectors we used, the cross-validation likelihood method and the M (h) minimization method, often selected significantly different bandwidths. Stoyan et al. (1995) indicates that the likelihood cross-validation method has a tendency to return small bandwidths. The quantity M (h) depends on the K function, a second-order statistic that will have to be estimated in practice. Estimates of the K function are known to be rather variable. Moreover, the estimation using adaptive kernel

14

Biometrics, 000 0000

method is sensitive to the initial bandwidth h0 . Therefore, the kernel estimators rarely yielded consistent performance in our simulation studies.

5. Application to rainforest data We here use the dataset from a census of a tropical forest in a 1000 by 500 meter plot in Barro Colorado Island in the Panama. See Hubbell and Foster (1983), Condit et al. (1996) and Condit (1998). The full dataset contains the locations of more than 300 species of trees, many of these numbering thousands of trees. We consider three particular species of trees: the Acalypha diversifolia (Acaldi, 528 trees), Capparis frondosa (Cappfr, 3299 trees) and Lonchocarpus heptaphyllus (Loncla, 836 trees) species. Measurements of the altitude, slope, various soil minerals, and soil pH are available on a grid of locations. Figure 1 shows plots of Potassium and Nitrogen levels in the soil as well as altitude and slope over the observation region. We obtained intensity estimates for these three species of trees using the parametric model of Waagepetersen and Guan (2009), the fixed and adaptive kernel methods, and the semiparametric model based on the adaptive GMRF. Figures 2, 3 and 4 show our results for the Acaldi, Cappfr and Loncla species respectively. Specifically, these figures contain plots of the actual data, the parametric, adaptive kernel and semi-parametric adaptive GMRF estimates (we exclude the plots of the fixed kernel estimates in the interest of conserving space but they are available upon request). The intensity plots are on the square-root scale to show the spatial structures more clearly. Waagepetersen and Guan (2009) found altitude and potassium to be significant for both Acaldi and Cappfr, and nitrogen and phosphorus for Loncla. However, the parametric intensity estimates tend to contain the features that are visible in the plot of covariate values but are not apparent from the actual point data. For example, in Figure 2(b) for Acaldi, the peak near the lower right corner does not correspond to an abundance of trees, while in

Bayesian Semiparametric Intensity Estimation

15

Figure 3(a) for Cappfr, the dense curved area from the lower left corner to location (200, 200) has a very low estimated intensity as shown in Figure 3(b). Also notice the similarity between the Acaldi and Cappfr plots. In Figure 4(b) for Loncla, where the coefficient for the nitrogen covariate is negative, we find that the pattern appears to roughly follow the inverse of the nitrogen pattern. See, for example, the two intensity peaks on the left edge and the low intensity area on the upper right. We also found that the non-adaptive kernel estimator (not shown) appears to over-smooth dense regions and under-smooth areas with few data points. Conversely, the adaptive kernel estimator seems to provide local smoothing for intensity estimation, but at the cost of losing global smoothness, resulting in a very granular appearance. Using the semiparametric adaptive GMRF model, the significant covariates at the 5% level were potassium for Acaldi, slope for Cappfr, and altitude for Loncla. For Acaldi, the estimate of the regression parameter for potassium was 0.0045 (0.0002, 0.0089) (approximate 95% Bayesian credible interval in parenthesis). For Cappfr, the estimate of slope was -2.51 (-4.45, -0.62), and for Loncla the estimate of elevation was 0.094 (0.043, 0.148). Note that the result on potassium for Acaldi is consistent with Waagepetersen and Guan (2009). We also found that our Bayesian point estimates are close to Waagepetersen and Guan’s estimates, but with more conservative 95% intervals. The reason may be that our estimates take into account the uncertainty from estimating the spatial features that are not due to the covariates. Compared to the kernel estimates, the proposed method, in our opinion, achieves a much better balance between smoothing the image and retaining the necessary details on spatial extents. Due to its data-driven nature, our model is able to apply adaptive smoothing when the point process has increasingly local variation and provides general smoothing where the process is more homogeneous. In particular, notice that although the intensity estimates contain some spatial features of the underlying covariate structure, these features do not

16

Biometrics, 000 0000

overwhelm the information contained in the data. For example, for the Cappfr species, the dense curved structure in the lower left portion of the observation region is retained. As suggested by a referee, we checked the plots of estimated γjk for all species, which show low estimates for clustered points and high estimates for more homogeneous points. Such plots indicate that it is necessary to consider the spatially adaptive approach for the rainforest data. [Figure 1 about here.] [Figure 2 about here.] [Figure 3 about here.] [Figure 4 about here.]

6. Discussion In this work, we introduced a data-driven adaptive method to obtain estimates of the intensity function from a spatial point pattern. This method involves binning the data and using adaptive Gaussian Markov random fields to model the underlying spatial process. The amount of smoothing at the local level is controlled by parameters in the prior, and these parameters are estimated using the data in the neighboring areas. The simulation study and rainforest data examples show that this method can achieve a nice balance between smoothing and retension of features of the intensity function that is difficult to attain with kernel estimation. Our method is versatile and works with or without covariates. The nonparametric version is useful for exploratory analyses and mapping purposes, or if information about covariates or form of the model is lacking. This is particularly important when the process producing the point pattern is complicated and/or not well understood. If appropriate covariates are available, inference regarding the dependence of the intensity function on these covariates can be obtained using the semiparametric version of our method.

Bayesian Semiparametric Intensity Estimation

17

To implement our method, one must first select an appropriate grid dimension for binning. Currently, this is a subjective decision to be made by the investigator. We examined the effects of choice of grid size in our simulation study and provided some initial guidelines. In particular, in the simulation study with the stationary Gaussian processes (Section 4.2), we found that performance is better with small cells (larger grid size) when σ is larger, i.e. when there is more heterogeneity in the point pattern. Thus we recommend a grid size so that each cell is roughly homogeneous. Also, a finer grid can be used if the data size is larger. We are planning a more comprehensive study to explore this topic further, especially in the context of differing correlation strengths. A theoretical study of this and of issues of consistency are also planned. Another area of further investigation is the use of these intensity estimates in estimates of the inhomogeneous K function (Baddeley et al., 2000). The correlation exhibited in point patterns is often of great interest. However, in the inhomogeneous case this requires good estimates of the intensity function. We will study how estimates of the inhomogeneous K function might be improved by the use of our intensity estimates.

Acknowledgements

Yu Yue’s research is supported by PSC-CUNY research award #60147-39 40.

References

Baddeley, A. and Turner, R. (2000). Practical maximum pseudolikelihood for spatial point patterns. Australian and New Zealand Journal of Statistics 42, 283–322. Baddeley, A. J., Møller, J., and Waagepetersen, R. (2000). Non- and semi-parametric estimation of interaction in inhomogeneous point patterns. Statistica Neerlandica 54, 329–350. Baddeley, A. J. and Turner, R. (2005). Spatstat: an R package for analyzing spatial point patterns. Journal of Statistical Software 12, 1–42. Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems (with discussion). Journal of the Royal Statistical Society, Series B: Methodological 36, 192– 236.

18

Biometrics, 000 0000

Brezger, A., Fahrmeir, L., and Hennerfeind, A. (2007). Adaptive Gaussian Markov random fields with applications in human brain mapping. Journal of the Royal Statistical Society: Series C (Applied Statistics) 56, 327–345. Brown, L., Cai, T., Zhang, R., Zhao, L., and Zhou, H. (2009). The root-unroot algorithm for density estimation as implemented via wavelet block thresholding. to appear in Probability Theory and Related Fields . Cai, T. (1999). Adaptive wavelet estimation: a block thresholding and oracle inequality approach. Annals of Statistics 27, 898–924. Carter, C. K. and Kohn, R. (1996). Markov chain Monte Carlo in conditionally Gaussian state space models. Biometrika 83, 589–601. Condit, R. (1998). Tropical forest census plots. Springer-Verlag and R. G. Landes Company, Berlin, Germany and Georgetown, Texas. Condit, R., Hubbell, S. P., and Foster, R. B. (1996). Changes in tree species abundance in a neotropical forest: impact of climate change. Journal of Tropical Ecology 12, 231–256. Diggle, P., Rowlingson, B., and Su, T. (2005). Point process methodology for on0line spatiotemporal disease surveillance. Environmetrics 16, 423–434. Diggle, P. J. (1985). A kernel method for smoothing point process data. Applied Statistics 34, 138–147. Diggle, P. J. (2003). Statistical Analysis of Spatial Point Patterns. Arnold, London, 2nd edition. Diggle, P. J., Morris, S. E., and Wakefield, J. C. (2000). Point-source modelling using matched case-control data. Biostatistics 1, 89–105. Fahrmeir, L. and Kneib, T. (2009). Propriety of posteriors in structured additive regression models: Theory and empirical evidence. Journal of Statistical Planning and Inference 139, 843–859. Fahrmeir, L. and Lang, S. (2001). Bayesian inference for generalized additive mixed models based on Markov random field priors. Journal of the Royal Statistical Society, Series C: Applied Statistics 50, 201–220. Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis 1, 515–533. Gu, C. (2002). Smoothing Spline ANOVA Models. Springer-Verlag Inc, New York. Guan, Y. (2008). On consistent nonparametric intensity estimation for inhomogeneous spatial point processes. Journal of the American Statistical Association 103, 1238–1247. Guan, Y. (2009). Fast block variance estimation procedure for inhomogeneous spatial point processes. Biometrika 96, 213–220. Guan, Y. and Loh, J. M. (2007). A thinned block bootstrap procedure for modeling inhomogeneous spatial point patterns. Journal of the American Statistical Association 102, 1377–1386. Hubbell, S. P. and Foster, R. B. (1983). Diversity of canopy trees in a neotropical forest and implications for conservation. In Sutton, S. L., Whitmore, T. C., and Chadwick, A. C., editors, Tropical Rain Forest: Ecology and Management, pages 25–41. Blackwell Scientific. Illian, J. B. and Rue, H. (2010). A toolbox for fitting complex spatial point process models using integrated Laplace transformation (INLA). Technical Report 6, Department of Mathematical Sciences, Norwegian University of Science and Technology.

Bayesian Semiparametric Intensity Estimation

19

Kass, R., Ventura, V., and Cai, C. (2009). Statistical smoothing of neuronal data. Biometrics 65, 1243–1253. Knorr-Held, L. and Richardson, S. (2003). A hierarchical model for space-time surveillance data on meningococcal disease incidence. Journal of the Royal Statistical Society, Series C: Applied Statistics 52, 169–183. Knorr-Held, L. and Rue, H. (2002). On block updating in Markov random field models for disease mapping. Scandinavian Journal of Statistics 29, 597–614. Liang, S., Banerjee, S., and Carlin, B. P. (2009). Bayesian wombling for spatial point processes. Biometrics 65, 1243–1253. Møller, J. and Waagepetersen, R. P. (2003). Statistical Inference and Simulation for Spatial Point Processes. Chapman and Hall, New York. Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications, volume 104 of Monographs on Statistics and Applied Probability. Chapman & Hall, London. Rue, H., Martino, S., and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models using integrated nested Laplace approximations (with discussion). Journal of the Royal Statistical Society, Series B 71, 319–392. Rue, H., Steinsland, I., and Erland, S. (2004). Approximating hidden Gaussain Markov random fields. Journal of the Royal Statistical Society, Series B: Statistical Methodology 66, 877–892. Schoenberg, F. P. (2003). Multi-dimensional residual analysis of point process models for earthquake occurrences. Journal of American Statistical Association 98, 789—795. Schoenberg, F. P. (2004). Consistent parametric estimation of the intensity of a spatialtemporal point process. Journal of Statistical Planning and Inference 128, 79–93. Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. CRC Press, New York. Stein, M. L., Quashnock, J. M., and Loh, J. M. (2000). Estimating the K function of a point process with an application to cosmology. Annals of Statistics 28, 1503–1532. Stoyan, D., Kendall, W. S., and Mecke, J. (1995). Stochastic Geometry and Its Applications, 2nd edition. John Wiley, New York. Sun, D., Tsutakawa, R. K., and He, Z. (2001). Propriety of posteriors with improper priors in hierarchical linear mixed models. Statistica Sinica 11, 77–95. Waagepetersen, R. (2007). An estimating function approach to inference for inhomogeneous Neyman-Scott processes. Biometrics 63, 252–258. Waagepetersen, R. and Guan, Y. (2009). Two-step estimation for inhomogeneous spatial point processes. Journal of the Royal Statistical Association Series B 71, 685–702. Wahba, G. (1990). Spline Models for Observational Data. SIAM [Society for Industrial and Applied Mathematics], Philadelphia. Yue, Y., Loh, J. M., and Lindquist, M. A. (2009). Adaptive spatial smoothing of fMRI images. Statistics and Its Interface (to appear). Yue, Y. and Speckman, P. L. (2010). Nonstationary spatial Gaussian Markov random fields. Journal of Computational and Graphical Statistics 19, 96–116. Yue, Y., Speckman, P. L., and Sun, D. (2008). Fully Bayesian adaptive spline smoothing. Manuscript (submitted), University of Missouri-Columbia, Department of Statistics.

350 300 250 200 150

0.20 0.15 0.10 0.05 0.00 -0.05

0

0

100

100 200 300 400 500

Biometrics, 000 0000

100 200 300 400 500

20

0

200

400

600

800

1000

0

200

400

20 10 0

0

-10 400

600

(c) Nitrogen

800

1000

100 200 300 400 500

30

200

1000

155 150 145 140 135 130 125

0

40

0

800

(b) Slope

100 200 300 400 500

(a) Potassium

600

0

200

400

600

800

1000

(d) Altitude

Figure 1. Plots of the measurements of soil Potassium and mineralized Nitrogen (in units of mg/kg of soil), as well as the altitude (in meters) and slope in the observation region in Barro Colorado Island.

21

0.045

400

100 200 300 400 500

500

Bayesian Semiparametric Intensity Estimation

300

0.040

200

0.035

100

0.030

0

0

0.025 0

200

400

600

800

0

1000

200

600

800

1000

0.14 0.12 0.10 0.08 0.06 0.04

100 200 300 400 500

(b) Parametric estimator

100 200 300 400 500

(a) Acaldi

400

0.12 0.10 0.08 0.06 0.04 0.02

0

0

0.02 0

200

400

600

800

1000

(c) Adaptive kernel estimator

0

200

400

600

800

1000

(d) Adaptive GMRF estimator

Figure 2. Plots of Acalypha diversifolia trees: (a) tree locations; (b) intensity estimates using the parametric method; (c) intensity estimates using the adaptive kernel method; (d) intensity estimates using the adaptive GMRF method. (The estimates are on a square-root scale.)

Biometrics, 000 0000

0.11

300

0.09

200

0.08

100

0.10

0.07 0.06

0

0

400

100 200 300 400 500

500

22

0

200

400

600

800

0

1000

200

600

800

1000

400

0.30

300

0.25

200

0.15

100

0.20

0.10

100 200 300 400 500

(b) Parametric estimator

500

(a) Cappfr

400

0.15

0.10

0.05

0

0

0.05 0

200

400

600

800

(c) Adaptive kernel estimator

1000

0

200

400

600

800

1000

(d) Adaptive GMRF estimator

Figure 3. Plots of Capparis frondosa trees: (a) tree locations; (b) intensity estimates using the parametric method; (c) intensity estimates using the adaptive kernel method; (d) intensity estimates using the adaptive GMRF method. (The estimates are on a square-root scale.)

23

0.055

400

100 200 300 400 500

500

Bayesian Semiparametric Intensity Estimation

300

0.050 0.045

200

0.040 0.035 0.025

0

0

100

0.030

0

200

400

600

800

0

1000

200

0.12 0.10 0.08 0.06 0.04

0

0.02 400

600

1000

800

1000

(c) Adaptive kernel estimator

100 200 300 400 500

0.14

200

800

0.12 0.10 0.08 0.06 0.04 0.02

0

0.16

0

600

(b) Parametric estimator

100 200 300 400 500

(a) Loncla

400

0

200

400

600

800

1000

(d) Adaptive GMRF estimator

Figure 4. Plots of Lonchocarpus heptaphyllus trees: (a) tree locations; (b) intensity estimates using the parametric method; (c) intensity estimates using the adaptive kernel method; (d) intensity estimates using the adaptive GMRF method. (The estimates are on a square-root scale.)

24

Biometrics, 000 0000

Table 1 The integrated squared errors obtained with different methods of intensity estimation, averaged over 100 simulated realizations from the inhomogeneous Poisson (POIS) and inhomogeneous Neymann-Scot (INSP) models, with unimodal intensity function in (15).

Method

n = 30

POIS n = 502

Fixed Kernel Adapt Kernel Adapt GMRF

512.71 502.99 422.53

512.78 503.25 437.02

2

2

n = 100 512.70 503.36 436.30

n = 30

INSP n = 502

n = 1002

706.64 705.27 720.21

706.86 705.53 726.83

706.55 705.69 743.59

2

Bayesian Semiparametric Intensity Estimation

25

Table 2 Summary results from simulation study to compare performance of the adaptive GMRF, fixed and adaptive band-width kernel estimators, for different values of the Gaussian process parameters σ 2 and φ. The values are the ratios of minimum integrated squared errors achieved by the kernel and adaptive GMRF estimators.

σ2 = 1 RF K/AG RAK/AG φ=3 φ=5 φ=7

.859 .949 .851

1.048 1.199 1.162

σ2 = 3 RF K/AG RAK/AG 1.148 1.496 1.311

1.016 1.659 1.605

σ2 = 5 RF K/AG RAK/AG 1.097 1.696 1.427

1.012 1.787 1.658

σ2 = 7 RF K/AG RAK/AG 1.188 1.493 1.164

1.069 1.116 0.963