A Class of Stochastic Volatility Models for Environmental Applications

Wenying Huang, Amylin Pharmaceuticals Inc.
Ke Wang, Pfizer Inc.
F. Jay Breidt, Colorado State University
Richard A. Davis, Columbia University

November 29, 2010

Abstract

Many environmental data sets have a continuous domain, in time and/or space, and complex features that may be poorly modeled with a stationary Gaussian process (GP). We adapt stochastic volatility modeling to this context, resulting in a stochastic heteroscedastic process (SHP), which is unconditionally stationary and non-Gaussian. Conditional on a latent GP, the SHP is a heteroscedastic GP with nonstationary covariance structure. The realizations from SHP are versatile and can represent spatial inhomogeneities. The unconditional correlation functions of SHP form a rich isotropic class that can allow for a smoothed nugget effect. We apply an importance sampling strategy to implement pseudo maximum likelihood parameter estimation for the SHP. To predict the process at unobserved locations, we develop a plug-in best predictor. We extend the single-realization SHP model to handle replicates across time of SHP realizations in space. Empirical results with simulated data show that SHP is nearly as efficient as GP in out-of-sample prediction when the true process is stationary GP, and outperforms GP substantially when the true process is SHP. The SHP methodology is applied to enhanced vegetation index data and U.S. NO3 deposition data for illustration.


Keywords: Covariance function; Gaussian process; Importance sampling; Latent process; Non-stationarity; Nugget effect; Stochastic volatility.

1 Introduction

As spatial data in the environmental and geophysical sciences have become more prevalent, spatial models have seen rapid development. Applications of spatial data can be found in diverse areas including meteorology, ecology, environmental health, environmental science, agriculture, disease modeling, and complex computer experiments. Stationary and even isotropic Gaussian processes (GP) often provide the starting point for modeling spatial data. Excellent references for spatial modeling that include discussions of GP models can be found in Ripley (1981), Cressie (1993), Stein (1999) and Banerjee et al. (2003). The GP model possesses a number of attractive features such as an easily computable likelihood function, simple formulae for computing predictors at all points in the domain, and a straightforward method for measuring uncertainty of predictors throughout the domain. On the other hand, the GP has some strong limitations for data that exhibit spatial inhomogeneities. For example, if the smoothness of the process varies over the domain, then it can be difficult for a GP model, stationary or nonstationary, to provide an adequate description of the spatial surface. The stochastic heteroscedastic process (SHP) that is introduced in Section 2 attempts to overcome these issues while retaining a relatively simple formulation.

1.1 Gaussian processes

The linear GP model for spatio-temporal data has the form

Y(s, t) = g(s, t)^T β + σ Z(s, t),    σ > 0,  s ∈ R^d,  t ∈ R,    (1)

where g(s, t) = (g_1(s, t), . . . , g_p(s, t))^T are known regression functions, β = (β_1, . . . , β_p)^T is a vector of unknown regression coefficients, and σZ(s, t) is a stationary Gaussian stochastic process with mean 0, variance σ^2 and stationary covariance function Cov(σZ(s, t), σZ(s', t')) = C(h, λ) = σ^2 ρ(h, λ), with h = s − s', λ = t − t'. The correlation function ρ(·, ·) characterizes the spatial and temporal dependence of the process Z(s, t). A major issue in modeling the GP noise is the selection of a suitable covariance function. Restricting attention for the moment to spatial data, one often allows for the possibility of measurement error or microscale variability, which is typically referred to as a "nugget" (Cressie (1993), Section 2.3). The spatial correlation function with relative nugget δ (0 ≤ δ ≤ 1) takes the form

ρ^δ(s, s') = 1 if ||s − s'|| = 0, and ρ^δ(s, s') = (1 − δ) Corr(Z(s), Z(s')) if ||s − s'|| > 0.    (2)

Spatio-temporal modeling considers spatial correlation, temporal correlation and possibly the interaction of space and time simultaneously. Gneiting et al. (2007) provides a thorough introduction and extensive references to the variety of spatio-temporal covariance functions that have been introduced in the literature, which we do not attempt to survey here. Of particular interest are separable models (e.g., Gneiting et al. (2007), Genton (2007), and the references below), constructed by taking the sum or product of two independent random fields, i.e., let Z(s, t) = S(s) + T(t) so that C(h, λ) = C_S(h) + C_T(λ), or let Z(s, t) = S(s)T(t) such that C(h, λ) = C_S(h)C_T(λ). The more recently developed product-sum model is motivated by C(h, λ) = lim_{n→∞} Σ_{i=1}^{n} C_{S,i}(h) C_{T,i}(λ). Examples of product-sum models can be found in de Cesare et al. (2002) and Ma (2002). A series of fully symmetric models of the form

C(h, λ) = (σ^2 / ψ(|λ|^2)^{d/2}) ϕ(||h||^2 / ψ(|λ|^2)),    (h, λ) ∈ R^d × R,    (3)

with ϕ(u) : [0, ∞) → R completely monotone, i.e., (−1)^n ϕ^(n)(u) ≥ 0 for all n ∈ {1, 2, . . .}, and ψ(u) : [0, ∞) → R positive with a completely monotone derivative, have been developed by Cressie and Huang (1999) and Gneiting (2002), and recently generalized by Schlather (2010).

1.2 Nonstationary models

Often, stationary Gaussian models are used as a starting point for modeling de-trended spatio-temporal data. When this working assumption fails, many authors consider nonstationary Gaussian models. For example, Sampson and Guttorp (1992) introduced an approach to model non-stationarity through deformation. They map the plane of geographical coordinates s of the observed locations via a non-linear transformation to a new "dispersion" plane, where stationarity (and, in fact, isotropy) holds. This approach has been applied and extended in a variety of ways, e.g., Mardia and Goodall (1993) and Smith (1996). A fundamental limitation of the deformation approach is that the implementation requires independent replications of the spatial process in order to get an estimated sample covariance matrix. Other methods of estimating nonstationary covariance functions by smoothing the sample covariance have been developed, including Nychka et al. (2002) by using multiresolution (wavelet) basis functions and Chang et al. (2010) by incorporating local basis functions and some stationary processes. Higdon et al. (1998) and Fuentes and Smith (2001) proposed two different approaches of kernel mixing to induce non-stationarity, which allows estimation of the nonstationary model based on a single realization observed at a number of locations. The former mixes a single process across all locations by allowing the kernel to vary spatially. The latter mixes an uncountable number of stationary spatial processes with spatially varying parameters at each location. Some early work on nonstationary models focused on nonstationary correlation functions while keeping the variance constant. In recent years, there have also been some geostatistical models developed in which the variance is allowed to vary. The simplest version of this is through scaling (Banerjee et al. (2003)). Suppose Z(s, t) is a mean 0, variance 1 stationary process with correlation function ρ and σ(s, t) is a pre-specified deterministic function. Then W(s, t) = σ(s, t)Z(s, t) is a nonstationary process. Alternatively, the variance may be modeled through a random process instead of a deterministic function. For spatial data with independent replications, Damian et al. (2001) and Schmidt and O'Hagan (2003) proposed to specify the location-dependent variance when formulating the deformation model (Sampson and Guttorp (1992)) in fully Bayesian frameworks. The former modeled the temporal variances as a random field by use of a GP prior. The latter assigned an iid inverse Gamma prior distribution to the variances. We return to this theme of a varying variance in the next subsection.

1.3 Stationary non-Gaussian models

An alternative to nonstationary models is to consider stationary non-Gaussian models. The motivation comes from the stochastic volatility model in time series analysis. Figure 1 panel (a) shows a stationary, zero-mean, non-Gaussian time series in which "nonstationary" features are clearly seen: values large in absolute value tend to be clustered, corresponding to high volatility, and values small in absolute value tend to be clustered, corresponding to low volatility. A stationary, zero-mean, Gaussian time series generated using the same unconditional covariance structure is shown in panel (b). Note its homogeneous features across the domain. Stochastic volatility (SV) is an important model in time series for capturing changing variance and covariance structure (Shephard (1996)). Some examples of conditional variance modeling in a variety of environmetric applications include Tol (1996), Graf-Jaccottet and Jaunin (1998), Cripps and Dunsmuir (2003), Campbell and Diebold (2005), and Gneiting et al. (2006). Yan (2007) applies the ideas of SV to model spatial lattice data by introducing a spatial stochastic volatility (SSV) component into the popular conditional autoregressive (CAR) model. Palacios and Steel (2006) extend the SV ideas to establish a non-Gaussian geostatistical model. They adopt Matérn correlation functions with identical parameterizations for the main GP and the latent GP (the SV component), and derive expressions for the moments of the resulting "Gaussian-log-Gaussian" (GLG) model. We extend the GLG model by allowing for general correlation functions and dropping the restriction of identical parameterizations.

1.4 Overview of the paper

The remainder of this paper is organized as follows. Section 2 gives the definition of SHP and describes some of the key properties that are useful for modeling. Specifically, we investigate the characteristics of the unconditional and conditional covariance structures and explore the associated limiting processes as the range parameter in the latent process approaches zero or infinity. Unconditionally, the SHP is a stationary non-Gaussian process, with stationary


Figure 1: Stationary non-Gaussian and stationary Gaussian time series plots. Panel (a) shows a time series simulated from a stationary non-Gaussian process y_t = ε_t exp(η_t/2), where both ε_t and η_t follow AR(1) processes. Panel (b) shows a time series simulated from a stationary GP. The two time series share the same unconditional covariance function exp(−1/(4(1 − φ^2)) + φ^|λ|/(4(1 − φ^2))) φ^|λ|, where φ = 0.8 and λ is the time lag.
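For readers who want to reproduce the qualitative behavior in panel (a), the following minimal sketch (Python with NumPy) simulates the stochastic volatility construction y_t = ε_t exp(η_t/2) with independent AR(1) processes; the AR(1) coefficient φ = 0.8 matches the caption, while the seed and series length are arbitrary choices of ours.

```python
import numpy as np

def ar1(n, phi, rng):
    """Simulate a stationary AR(1) series with unit innovation variance."""
    x = np.empty(n)
    x[0] = rng.standard_normal() / np.sqrt(1.0 - phi**2)  # stationary start
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

rng = np.random.default_rng(0)
n, phi = 500, 0.8
eps = ar1(n, phi, rng)            # observation-level AR(1) process
eta = ar1(n, phi, rng)            # latent log-volatility AR(1) process
y = eps * np.exp(eta / 2.0)       # stochastic volatility series, as in Figure 1(a)
```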


GP as a special case. Conditional on the latent process, the SHP is a nonstationary GP. The sample paths of SHP offer more modeling flexibility than those produced by a traditional GP, and can better reflect prediction uncertainty. A smoothed nugget effect is a unique property of the unconditional correlation function. We present a plug-in best predictor for process prediction. In Section 3, we propose to apply importance sampling in calculating an approximate likelihood and estimating the latent process. We derive an importance density which is multivariate normal. Some implementation details are given, including a robust estimation of the variance parameter and applying process convolution to approximate the latent process. In Section 4, we extend the single-realization model to a SHP model with replicates. The spatial replicates are modeled as independent realizations from a SHP model conditional on a common latent process. We discuss the estimation and prediction of the replicate model. In Section 5, we investigate the parameter estimation and process prediction performance of the proposed strategies through a simulation study. Also we show the improved outof-sample prediction performance of SHP modeling over a stationary GP fit through the simulation and two real data applications—enhanced vegetation index data and U.S. NO3 deposition data. Section 6 provides a brief conclusion and points out some future work.

2 The SHP model

As noted above, Palacios and Steel (2006) build a scale process σ(s, t) in W(s, t) = σ(s, t)Z(s, t) by using a GP with Matérn correlation, using the same parameters as those in the process Z(s, t). We extend their idea by considering general stationary covariance functions for the ln σ(s, t) and Z(s, t) processes, and by modeling and parameterizing these two processes independently.

2.1 Definition of SHP

We now represent the SHP model as follows:

Y(s, t) = g(s, t)^T β + W(s, t),    (s, t) ∈ R^d × R,
W(s, t) = σ exp(τ α(s, t)/2) Z(s, t),    σ > 0, τ > 0,    (4)

where α(s, t) and Z(s, t) are two independent stationary Gaussian processes with mean 0, variance 1 and correlation functions ρ_α and ρ_z respectively. Throughout this paper, we take ρ_α and ρ_z to be isotropic in space and stationary in time, in the sense that

Corr(α(s, t), α(s', t')) = ρ_α(||h||, |λ|),    Corr(Z(s, t), Z(s', t')) = ρ_z(||h||, |λ|),

for h = s − s', λ = t − t'. The overall trend is g(s, t)^T β and the correlated error process W(s, t) captures residual spatio-temporal dependence. Note that W(s, t) is stationary as

in the traditional GP model, but conditional on the α process, W (s, t) has a nonstationary covariance function. Also note that the W process has variance σ 2 exp(τ 2 /2) and kurtosis 3 exp(τ 2 ). Since the kurtosis is greater than 3, the W process has tails heavier than Gaussian. These heavy tails are a consequence of the latent process α(s, t), which models the clustering effect of volatility. This volatility clustering makes realizations generated from the SHP model better able to reflect spatial or temporal heterogeneities than those from Gaussian processes. That is, unlike GP realizations, SHP realizations can have sharp minima and maxima on parts of the input domain, with relatively flat spots on other parts of the input domain. See Figure 2 for examples of this behavior.
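As a concrete illustration of (4), the sketch below (Python with NumPy) simulates one SHP realization on a one-dimensional grid by drawing the latent process α, drawing Z, and combining them as W = σ exp(τα/2)Z. The Gaussian correlation forms and the parameter values here are illustrative choices of ours, not the paper's defaults.

```python
import numpy as np

def gaussian_corr(x, phi):
    """Gaussian (squared-exponential) correlation matrix for 1-d locations x."""
    d = np.subtract.outer(x, x)
    return np.exp(-phi * d**2)

def draw_gp(R, rng, jitter=1e-10):
    """Draw a zero-mean, unit-variance GP with correlation matrix R."""
    L = np.linalg.cholesky(R + jitter * np.eye(len(R)))
    return L @ rng.standard_normal(len(R))

rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0, 200)
sigma, tau, phi_alpha, phi_z = 1.0, np.sqrt(2.0), 8.0, 40.0   # illustrative values

alpha = draw_gp(gaussian_corr(x, phi_alpha), rng)   # latent log-volatility surface
z     = draw_gp(gaussian_corr(x, phi_z), rng)       # Gaussian process Z
w     = sigma * np.exp(tau * alpha / 2.0) * z       # SHP realization, model (4)
```

Plotting w against x for several seeds shows the mix of sharp and flat regions discussed above.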

2.2 SHP covariance function

Conditioning on the latent process α(s, t), the covariance function between two points is

Cov(W(s, t), W(s', t') | α) = σ^2 exp(τ α(s, t)/2) exp(τ α(s', t')/2) ρ_z(||s − s'||, |t − t'|).    (5)

Using the independence of the two Gaussian processes α and Z, it is easy to show that W has unconditional correlation given by

Corr(W(s, t), W(s', t')) = exp(−τ^2/4 + (τ^2/4) ρ_α(||s − s'||, |t − t'|)) ρ_z(||s − s'||, |t − t'|).    (6)

This unconditional correlation function flexibly combines the properties of the two correlation functions ρ_α and ρ_z. With t ≡ t', (6) is a pure spatial correlation function that can be used independently as a rich isotropic correlation class. We will discuss its unique "smoothed nugget effect" property in Section 2.3. Equations (5) and (6) indicate that the W process is conditionally heteroscedastic and unconditionally stationary. For a single realization of SHP, it is the covariance conditional on the latent process α that determines the heteroscedastic features. In Figure 2, we plot three GP realizations (panels (a)–(c)) and three SHP realizations (panels (d)–(f)). While the two models share exactly the same unconditional covariance functions, the realizations are remarkably different. The GP realizations have similar variability everywhere in the input domain. In contrast, the variability or "smoothness" of the SHP realizations varies dramatically over different parts of the region, with both sharp minima and maxima, and broad flat areas. This behavior is a reflection of the underlying realization of the α process, which in turn means that the behavior reflects the inhomogeneity of the conditional covariance function (5).

2.3 Smoothed nugget effect

Let φ_α and φ_z denote the spatial range parameters for the correlation functions ρ_α, ρ_z. We note that some authors define the range parameters as the reciprocals of φ_α and φ_z. In our parameterization, the correlation function is decreasing in the range parameter so that for a


Figure 2: GP and SHP 2-d simulation contour plots. Panels (a)–(c) are 3 realizations from the same GP. Panels (d)–(f) are 3 realizations from the same SHP model. The GP and SHP models have mean 0 and share the same unconditional covariance function with σ^2 = 1, τ^2 = 2 and Matérn correlation for both ρ_α and ρ_z, using ν = 2.5, φ_α = 8.5 and φ_z = 4.4.


fixed spatial lag, the correlation functions of the processes α and Z decrease as φ_α and φ_z go to infinity, respectively. We consider the limiting spatial processes (for fixed t) as φ_α → 0 and φ_α → ∞. When φ_α = 0, the α(·, t) process degenerates to a single N(0, 1) random variable and the unconditional correlation function for W(·, t) becomes ρ_z(·, 0). While the spatial process W(·, t) remains non-Gaussian, a single realization of W(·, t) is not distinguishable from a realization of a GP. On the other hand, as φ_α → ∞ for fixed t, the unconditional correlation function converges to

Corr(W(s, t), W(s', t)) = 1 if h = ||s − s'|| = 0, and Corr(W(s, t), W(s', t)) = exp(−τ^2/4) ρ_z(||s − s'||, 0) if h > 0.

This correlation function is simply ρz (·, 0) with the addition of a relative nugget δ = 1 − exp(−τ 2 /4). The effect of varying the correlation parameters on the correlation function can be seen from Figure 3. The correlation decreases with increasing φα or τ 2 for each value of h > 0. As φα approaches infinity, the sequence of continuous SHP correlation functions converges to the discontinuous correlation function with nugget of size 1 − exp(−τ 2 /4). Because we can approximate the correlation function with nugget arbitrarily closely, we say that the SHP correlations can display a “smoothed nugget effect”. In the other direction, the correlation function decays smoothly for small values of φα . When τ 2 = 0, the correlation in panel (b) reduces to a Gaussian correlation. With τ 2 increasing, the smoothed nugget effect becomes more pronounced and the correlation decreases more sharply. For the traditional GP model, a nugget is added to the covariance structure to model the measurement error and microscale variability. These effects are modeled with a single parameter and cannot be separated. The additive nugget makes the covariance discontinuous at the origin, which is undesirable for microscale variation that accounts for possible model misspecification at a very fine scale. The smoothed nugget of the SHP unconditional correlation function explains the microscale variation in a natural way, using a parameterization that is distinct from an additive measurement error.
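The smoothed nugget effect is easy to visualize numerically. A minimal sketch (Python with NumPy; Gaussian forms for ρ_α and ρ_z and the specific parameter values are illustrative, chosen to mirror Figure 3) evaluates the unconditional correlation (6) for increasing φ_α and compares it with the limiting correlation that has relative nugget 1 − exp(−τ^2/4):

```python
import numpy as np

def shp_corr(h, tau2, phi_alpha, phi_z):
    """Unconditional SHP correlation (6) with Gaussian rho_alpha and rho_z."""
    rho_a = np.exp(-phi_alpha * h**2)
    rho_z = np.exp(-phi_z * h**2)
    return np.exp(-tau2 / 4.0 + tau2 * rho_a / 4.0) * rho_z

h = np.linspace(0.0, 1.0, 101)
tau2, phi_z = 2.0, 5.0
curves = {pa: shp_corr(h, tau2, pa, phi_z) for pa in (0.0, 30.0, 300.0)}
limit = np.where(h == 0.0, 1.0, np.exp(-tau2 / 4.0) * np.exp(-phi_z * h**2))
# As phi_alpha grows, the curves approach `limit`, which has a genuine nugget
# of size 1 - exp(-tau2/4); for finite phi_alpha the drop near h = 0 stays smooth.
```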

2.4 Prediction

A critical issue in spatio-temporal data analysis is prediction. Let x = (s, t) and consider observations of Y(x) at a set of m distinct site-times {x_j}_{j=1}^m (note that a given site may be observed at more than one time, and a given time may have observations at more than one site). It is of interest to predict Y_0 = Y(x_0) for x_0 not in {x_j}_{j=1}^m; set α_0 = α(x_0). Let α := (α(x_1), . . . , α(x_m)) denote the vector of latent process values at the observed site-times, and Y := (Y(x_1), . . . , Y(x_m)) the vector of observations at those site-times. Let ψ := (θ, φ_α) denote the vector of model parameters, where θ := (σ^2, τ^2, φ_z, β). We seek expressions for the best predictor, E(Y_0 | Y), and its prediction error variance Var(Y_0 − E(Y_0 | Y) | Y) = Var(Y_0 | Y). Given the latent process and parameter vector, the joint distribution of Y_0 and Y is heteroscedastic Gaussian. The covariance matrix of (Y_0, Y) conditional on (α_0, α) is given


Figure 3: SHP unconditional correlation function plots. Panel (a) shows the effect of φα (φz = 5 and τ 2 = 2). Panel (b) shows the effect of τ 2 (φz = 5 and φα = 200). Both ρα and ρz are Gaussian correlations.


by

Cov(Y_0, Y | α_0, α) = σ^2 exp(τ α_0/2) r_z(x_0, x) diag{exp(τα/2)},
Cov(Y, Y | α) = σ^2 diag{exp(τα/2)} R_z diag{exp(τα/2)},
Cov(Y_0, Y_0 | α_0) = σ^2 exp(τ α_0),    (7)

where R_z is the m × m correlation matrix for Z := (Z(x_1), . . . , Z(x_m)) and r_z(x_0, x) is the 1 × m correlation vector between Z(x_0) and Z. For ease of notation, we will write r_z for r_z(x_0, x) and r_z^T for r_z(x, x_0). We will adopt similar notation (R_α and r_α) for the correlations of the α process. The term diag{exp(τα/2)} refers to the m × m diagonal matrix with exp(τα/2) as the diagonal elements. The conditional mean and variance of the predictive distribution of Y_0 can then be written as

E(Y_0 | Y, α, α_0) = g(x_0)^T β + exp(τ α_0/2) r_z R_z^{-1} diag{exp(−τα/2)} (Y − G(x)β),    (8)
Var(Y_0 | Y, α, α_0) = σ^2 exp(τ α_0) (1 − r_z R_z^{-1} r_z^T),    (9)

where G(x) = (g(x_1), . . . , g(x_m))^T. As such, the best predictor E(Y_0 | Y) can be obtained by integrating out (α_0, α) with respect to p(α_0, α | Y) in (8). Note that p(α_0 | α, Y) is equivalent to p(α_0 | α), since p(Y | α, α_0) = p(Y | α). Hence the conditional distribution of α_0 given Y, α is Gaussian with mean and variance given by

E(α_0 | α, Y) = E(α_0 | α) = r_α R_α^{-1} α,    (10)
Var(α_0 | α, Y) = Var(α_0 | α) = 1 − r_α R_α^{-1} r_α^T.    (11)

Using conditional log-normality, we then obtain

E(exp(τ α_0/2) | Y, α) = exp((τ/2) r_α R_α^{-1} α + (τ^2/8)(1 − r_α R_α^{-1} r_α^T)),    (12)
E(exp(τ α_0) | Y, α) = exp(τ r_α R_α^{-1} α + (τ^2/2)(1 − r_α R_α^{-1} r_α^T)).    (13)

Combining equations (8) and (12), the best predictor E(Y_0 | Y) is given by

E(Y_0 | Y) = E[E(Y_0 | Y, α, α_0) | Y] = E{E[E(Y_0 | Y, α, α_0) | Y, α] | Y}
           = E[(g(x_0)^T β + E(exp(τ α_0/2) | Y, α) r_z R_z^{-1} diag{exp(−τα/2)} (Y − G(x)β)) | Y].    (14)

The last expectation is taken with respect to the posterior density p(α | Y), for which we do not have a closed form, so we cannot calculate (14) explicitly. Instead, we will apply importance sampling and Monte Carlo integration strategies for the last step. The details of estimating such a posterior expectation for a function of α will be discussed in Section 3.3.

Similar computations lead to

Var(Y_0 | Y) = E(Var(Y_0 | Y, α, α_0) | Y) + Var(E(Y_0 | Y, α, α_0) | Y),
E(Var(Y_0 | Y, α, α_0) | Y) = E{[σ^2 E(exp(τ α_0) | Y, α)(1 − r_z R_z^{-1} r_z^T)] | Y},
Var(E(Y_0 | Y, α, α_0) | Y) = E{[E(exp(τ α_0) | Y, α)(r_z R_z^{-1} diag{exp(−τα/2)}(Y − G(x)β))^2] | Y}
                              − (E(Y_0 | Y) − g(x_0)^T β)^2,    (15)

where E(exp(τ α_0) | Y, α) is calculated by use of (13) and E(Y_0 | Y) is obtained by (14). The final expectations are taken with respect to p(α | Y), which will be approximated by importance sampling and Monte Carlo integration. Given parameter estimates ψ̂, we plug ψ̂ into (14) and (15) to get the plug-in best predictor (PBP) and the corresponding prediction variance. Note that the plug-in prediction variance calculated above depends on the observations Y. Recall the prediction variance for the GP model,

Var(Y_0 | Y) = σ_g^2 (1 − r_g R_g^{-1} r_g^T),    (16)

where the subscript g indicates parameters and correlation matrices for the GP model. Equation (16) does not depend on the observation vector Y. Unlike the GP prediction variance, the observation-dependent SHP prediction variance can account for spatial heterogeneity.
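To make the role of equations (8) and (9) concrete, the following sketch (Python with NumPy) computes the conditional predictor and its variance for a single value of the latent pair (α, α_0); all inputs are assumed given, and the full plug-in best predictor (14)–(15) would average these quantities over importance-sampling draws of α as described in Section 3.3.

```python
import numpy as np

def conditional_predictor(y, G, beta, Rz, rz, sigma2, tau, alpha, alpha0, g0):
    """Conditional mean (8) and variance (9) of Y0 given Y and a latent draw.

    y: observations (m,), G: regression matrix (m, p), rz: correlations
    between Z(x0) and Z (m,), Rz: (m, m) correlation matrix of Z, g0: g(x0).
    """
    resid = y - G @ beta
    scaled = np.exp(-tau * alpha / 2.0) * resid          # diag{exp(-tau*alpha/2)}(Y - G beta)
    w = np.linalg.solve(Rz, scaled)                      # Rz^{-1} applied to the scaled residuals
    mean = g0 @ beta + np.exp(tau * alpha0 / 2.0) * (rz @ w)
    var = sigma2 * np.exp(tau * alpha0) * (1.0 - rz @ np.linalg.solve(Rz, rz))
    return mean, var
```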

3 Parameter estimation

Palacios and Steel (2006) applied Bayesian methods for inference. To deal with the high dimensionality and correlation in the latent process, they partitioned the latent vector into blocks, and constructed the proposal distribution by use of log-normal distributions to approximate truncated normal distributions in the conditional posterior. In this paper, we use pseudo maximum likelihood estimation and an importance sampling strategy.

3.1 Likelihood approximation

The joint density of (Y, α) under the SHP model is given by

p(Y, α | ψ) = p(Y | α, θ) p(α | φ_α) = p(Y | α, θ) |R_α|^{-1/2} exp(−(1/2) α^T R_α^{-1} α) (2π)^{-m/2},    (17)

where

p(Y | α, θ) ∼ N(G(x)β, σ^2 diag{exp(τα/2)} R_z diag{exp(τα/2)}).

It follows that the likelihood of the observed data is given by the m-fold integral

L(ψ; Y) = ∫ p(Y, α | ψ) dα = ∫ p(Y | α, θ) p(α | φ_α) dα.    (18)

The likelihood (18) cannot be computed explicitly. A number of simulation-based procedures are available in the literature to approximate such integrals (e.g., Robert and Casella (1999)). One such method is importance sampling. Given an importance density p_a(α | Y, ψ), the integral in (18) can be rewritten and approximated via Monte Carlo integration as

L(ψ; Y) = ∫ [p(Y | α, θ) p(α | φ_α) / p_a(α | Y, ψ)] p_a(α | Y, ψ) dα
        = E_a[p(Y | α, θ) p(α | φ_α) / p_a(α | Y, ψ)]
        ≈ (1/N) Σ_{i=1}^{N} p(Y | α^{(i)}, θ) p(α^{(i)} | φ_α) / p_a(α^{(i)} | Y, ψ),    (19)

where α^{(1)}, . . . , α^{(N)} are drawn from p_a(α | Y, ψ). Some methodologies have been developed to find efficient importance densities, e.g., Danielsson and Richard (1993) and Durbin and Koopman (1997). As mentioned in Durbin and Koopman (1997), to achieve efficiency, the importance density p_a(α | Y, ψ) should be chosen to be as close as can be managed to p(α | Y, ψ), within the class of conditional densities that are feasible and efficient for drawing simulation samples. The reason for this choice is that, if p_a(α | Y, ψ) is exactly equal to p(α | Y, ψ), then a sample of only N = 1 is required for exact likelihood calculation, as is easily shown. In this paper, we follow the likelihood approximation method of Davis and Rodriguez-Yam (2005) and obtain a density p_a(α | Y, ψ) that approximates the posterior density p(α | Y, ψ). Their work was applied to state-space models, where recursive prediction algorithms such as the Kalman recursion or the innovations algorithm accelerate the calculation of the importance density. In Section 3.2, we derive p_a(α | Y, ψ) by modifying the method in Davis and Rodriguez-Yam (2005) to fit the SHP model framework.
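A minimal sketch of the Monte Carlo approximation (19) is given below (Python with NumPy). The density functions and the importance sampler are passed in as callables because their exact forms depend on the model pieces in (17) and (20); averaging on the log scale with a max-shift is a standard numerical-stability device and is our addition, not a step stated in the paper.

```python
import numpy as np

def is_loglik(y, draw_alpha, log_p_y_given_alpha, log_p_alpha, log_q, n_draws, rng):
    """Importance-sampling estimate of log L(psi; Y), cf. equation (19).

    draw_alpha(rng) returns one draw from the importance density p_a;
    log_q evaluates the log importance density at that draw.
    """
    logw = np.empty(n_draws)
    for i in range(n_draws):
        a = draw_alpha(rng)
        logw[i] = log_p_y_given_alpha(y, a) + log_p_alpha(a) - log_q(a)
    m = logw.max()                                   # shift for numerical stability
    return m + np.log(np.mean(np.exp(logw - m)))     # log of the average importance weight
```

In practice the same draws (common random numbers) would be reused across parameter values, as discussed in Section 3.2.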

3.2 Importance density

To find a good approximation of p(α | Y, ψ), we follow the method proposed by Davis and Rodriguez-Yam (2005). We generate an importance density by ignoring the remainder in the Taylor series expansion of log p(α | Y, ψ) in a neighborhood of the posterior mode α* of p(α | Y, ψ),

p_a(α | Y, ψ) ∼ N(α*, (K* + R_α^{-1})^{-1}),    (20)

which is equation (6) of Davis and Rodriguez-Yam (2005), with V replaced by R_α^{-1}. For the SHP model, K* is given by

K* = (τ^2 / (4σ^2)) (B + diag{c}),    (21)
B = diag{exp(−τα*/2)} diag{Y − G(x)β} R_z^{-1} diag{Y − G(x)β} diag{exp(−τα*/2)},
c = exp(−τα*/2)^T diag{Y − G(x)β} R_z^{-1} diag{Y − G(x)β} diag{exp(−τα*/2)}.

Thus we have created an importance density and can approximate the likelihood value by (19). Maximizing (19) with respect to ψ, we obtain the estimate ψ̂. When using (19) to calculate the likelihood, it is common practice to use "common random numbers" (CRNs) to generate α^{(1)}, . . . , α^{(N)} for different parameter values ψ. CRNs help improve the smoothness of the likelihood and facilitate the convergence of estimates. The performance of the importance density (20) depends heavily on the posterior mode α*, which is obtained by maximizing the log-density of (Y, α). Since the dimensionality of α is the same as the number of observations, it is difficult to find α*, especially for a large data set. To reduce the dimensionality in this optimization procedure, we adopt the process convolution method (Higdon (2002)) to approximate the latent process α. A GP α(x) over a domain D can be constructed by convolving a continuous white noise process ω(x) with a smoothing kernel k(x), i.e.,

α(x) = ∫_D k(u − x) ω(u) du,    for x ∈ D.    (22)

The resulting covariance function for α(x) is given by

γ(x, x') = Cov(α(x), α(x')) = ∫_D k(u − x) k(u − x') du = ∫_D k(u − d) k(u) du,    (23)

where d = x − x'. In the stationary, isotropic case, γ(x, x') depends only on ||h|| = ||s − s'|| and λ = |t − t'|. There is a one-to-one relationship between the smoothing kernel k(d) and the covariance γ(d) under mild conditions. The relationship is based on the convolution theorem for Fourier transforms. For example, the Gaussian kernel k(d) ∝ exp(−2φ||d||^2) corresponds to a Gaussian covariance function γ(d) ∝ exp(−φ||d||^2). More details about this relationship and various kernels and their induced covariance functions are given in Higdon (2002). We restrict the latent process ω(u) to be nonzero at site-times κ_1, . . . , κ_J. We take J far less than the dimensionality of the observations, distributing the knots in the input domain using stratification and unequal probabilities to achieve good performance. The

stratification distributes knots across the domain, and the unequal probability sampling, typically with probabilities proportional to the absolute values of the observations, tends to put additional knots in regions of high volatility. See the empirical examples for further details. The resulting continuous α process is then approximated by

α(x) = Σ_{j=1}^{J} ω_j k(x − κ_j).    (24)

We substitute α in (17) by the linear combination (24) and maximize the joint density in (17) with respect to ω = (ω_1, . . . , ω_J)^T, obtaining the mode ω* = (ω_1*, . . . , ω_J*)^T. The mode α* in (20) is then approximated by Σ_{j=1}^{J} ω_j* k(s − κ_j). Empirical results show substantial improvement from applying this low-dimensional approximation in finding the posterior mode. By simulation, we found that the likelihood is flat over a wide range of σ^2 values, which makes maximum likelihood estimation of σ^2 difficult. It is impossible to profile out σ^2 in equation (18). Therefore, we explore an alternative way to estimate σ^2. From the SHP model (4), the expected value of the sample variance is

E(s^2) = σ^2 exp(τ^2/2) (m^2 − Σ_i Σ_j ρ_Y(x_i, x_j)) / (m(m − 1)).

m(m − 1) exp(−τ 2 /2)s2 P P m2 − i j ρY (xi , xj )

(25)

if τ 2 and ρY were known. We propose to fix σ 2 at the sample variance and obtain estimates for τ 2 , φα , φz and β by maximizing the approximated log likelihood. Then by plugging estimates of the other parameters in equation (25), we get the final estimate for σ 2 . Because the data have a heavy-tailed distribution, we use a robust alternative to s2 in estimating σ 2 . We use the scaled Median Absolute Deviation (MAD) estimate, a robust estimate for σ developed by Johnstone and Silverman (1997), MAD =

median(|Y − median(Y )|) , 0.6745

(26)

where 0.6745 is the standard normal MAD. We replace s2 in equation (25) by the square of MAD in equation (26). Since the parameters are estimated by maximizing the approximated likelihood, using the plug-in estimate for σ 2 , the estimation strategy is referred to as pseudo maximum likelihood estimation (PMLE), following Gong and Samaniego (1981).
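The robust plug-in step for σ^2 is straightforward to code. A small sketch (Python with NumPy) follows (25)–(26); whether y holds raw observations or de-trended residuals is a modeling choice left open here.

```python
import numpy as np

def sigma2_pmle(y, tau2, rho_y):
    """Plug-in estimate of sigma^2 from (25)-(26).

    y: observation vector (m,); tau2: estimated tau^2;
    rho_y: (m, m) matrix of unconditional correlations rho_Y(x_i, x_j).
    """
    m = len(y)
    mad = np.median(np.abs(y - np.median(y))) / 0.6745   # scaled MAD, equation (26)
    s2_robust = mad**2                                   # robust replacement for s^2
    denom = m**2 - rho_y.sum()                           # m^2 - sum_i sum_j rho_Y(x_i, x_j)
    return m * (m - 1) * np.exp(-tau2 / 2.0) * s2_robust / denom
```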

3.3 Estimation of functions of volatility

If ψ were known, a function h(α) of the latent process at observed locations could be estimated by the conditional expectation E[h(α) | Y, ψ], given by

E[h(α) | Y, ψ] = E_a[h(α) p(Y | α, θ) p(α | φ_α) / p_a(α | Y, ψ)] / E_a[p(Y | α, θ) p(α | φ_α) / p_a(α | Y, ψ)].    (27)

Given parameter estimates ψ̂, we sample α^{(1)}, . . . , α^{(N)} from p_a(α | Y, ψ̂) and approximate equation (27) by Monte Carlo integration. We will typically be interested in estimating α itself and the conditional standard deviation exp(τα/2). The PBP (equation (14)) and its variance (equation (15)) will be computed using this scheme.
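In the same spirit, the ratio estimator (27) can be computed with self-normalized importance weights. A sketch (Python with NumPy; the callables mirror those in the likelihood sketch of Section 3.1 and are our own organizational assumption, not the paper's implementation):

```python
import numpy as np

def posterior_functional(h, y, draws, log_p_y_given_alpha, log_p_alpha, log_q):
    """Self-normalized importance-sampling estimate of E[h(alpha) | Y], cf. (27).

    draws: list of alpha vectors sampled from the importance density p_a.
    """
    logw = np.array([log_p_y_given_alpha(y, a) + log_p_alpha(a) - log_q(a) for a in draws])
    w = np.exp(logw - logw.max())                 # unnormalized weights, stabilized
    w /= w.sum()
    vals = np.array([h(a) for a in draws], dtype=float)
    return np.tensordot(w, vals, axes=1)          # weighted average of h over the draws
```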

4 SHP with replicates

In practice, some spatial processes are measured at a finite set of locations over a region of interest D at regular times. For example, in studies of air pollution, some measurements of interest are recorded at monitoring stations at regular time intervals (daily, monthly, yearly, etc.). Frequently, the aims are to estimate the process Y(s, t) and to predict the process at locations and times which are not being measured. One approach to reducing the complexity and computational burden of treating both time and space is to simplify the temporal aspect. For example, it is common to assume that the observations are independent in time, usually after detrending: see Sampson and Guttorp (1992), Schmidt and O'Hagan (2003), Damian et al. (2001) and Nychka et al. (2002). We also consider a simplified temporal structure in the case of "replicates" at T regular time intervals of n observations at a fixed set of spatial locations, {s_i}_{i=1}^n. Let Y_t = (Y(s_1, t), . . . , Y(s_n, t))^T for t = 1, . . . , T. We assume that Y_1, . . . , Y_T are conditionally independent realizations from (4), given a common realization of the α process; i.e.,

Y(s, t) = g(s, t)^T β + W(s, t),
W(s, t) = σ exp(τ α(s)/2) Z(s, t),    σ > 0, τ > 0,    (28)

where Z(s, t), t = 1, . . . , T, are independent stationary Gaussian processes with mean 0, variance 1 and correlation function ρ_z. The latent process α(s), independent of t, is used to model the spatially-dependent heteroscedasticity in the error process. The log density of (Y_1, . . . , Y_T, α) is given by

l(ψ; Y_1, . . . , Y_T, α) = −(n/2) log(2π) + (1/2) log|R_α|^{-1} + Σ_{t=1}^{T} log p(Y_t | θ, α) − (1/2) α^T R_α^{-1} α.    (29)

Comparing this log density with the log of equation (17), the only difference is that log p(Y | θ, α) is replaced by Σ_{t=1}^{T} log p(Y_t | θ, α). Therefore, we can derive the importance density, apply pseudo maximum likelihood estimation and predict the process following a strategy similar to that in Section 3. For more details, refer to Huang (2008).
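The replicate log-density (29) simply sums the conditional Gaussian log-likelihoods of the T replicate vectors around a single α value. A small sketch (Python with NumPy; log_p_y_t is assumed to evaluate log p(Y_t | θ, α) for the heteroscedastic Gaussian in (17)):

```python
import numpy as np

def replicate_logdensity(Y, alpha, R_alpha, log_p_y_t):
    """Log density (29) of (Y_1, ..., Y_T, alpha) for the replicate SHP model.

    Y: array of shape (T, n), one replicate per row; alpha: latent vector (n,).
    """
    n = len(alpha)
    _, logdet = np.linalg.slogdet(R_alpha)
    quad = alpha @ np.linalg.solve(R_alpha, alpha)
    out = -0.5 * n * np.log(2.0 * np.pi) - 0.5 * logdet - 0.5 * quad
    for y_t in Y:
        out += log_p_y_t(y_t, alpha)              # sum over t of log p(Y_t | theta, alpha)
    return out
```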


5 Empirical results

5.1 Simulation

To compare the performance of SHP and GP, we consider two sets of simulations: one set of 100 realizations from a one-dimensional SHP model, with parameters summarized in Table 1, and one set of 100 realizations from a one-dimensional GP model, with mean 0, variance 4 and Gaussian correlation function with range parameter equal to 100. For the SHP model, both ρα and ρz are taken to be Gaussian correlation functions. In each realization, the true sample path is generated at 200 points equally spaced on [0, 2]. In Figure 4, panel (a) shows two SHP realizations and panel (d) shows two GP realizations from the simulation. For each realization, we take observations at 30 points regularly spaced in the domain [0, 2]. We then fit both SHP and GP models using the 30 observations, estimate model parameters, predict Y at the remaining unobserved locations and calculate the out-of-sample mean square prediction errors (MSPE). For the SHP low-dimensional approximation, we used J = 10 with 5 knots evenly distributed within the domain and 5 knots randomly selected using probabilities proportional to the absolute values of the observations. For the GP model fitted to the SHP realizations, we tried Gaussian, exponential, spherical and Mat´ern correlation functions. It turns out that Gaussian correlation leads to the best prediction result, i.e., smallest errors (MSPE). Therefore, we only present the results for the GP model using a Gaussian correlation function. For SHP modeling, we implement pseudo maximum likelihood estimation and obtain PBP using Gaussian correlation functions for both ρα and ρz . Figure 4 panels (b) and (c) give boxplots, with and without outliers, of the 100 MSPE’s from the SHP realizations for each method. We can see that SHP outperforms the GP modeling substantially because it captures the spatial heterogeneities in the SHP realizations. Panels (e) and (f) give boxplots, with and without outliers, of the 100 MSPE’s from the GP realizations for each method. In this example, SHP model fitting to the GP realizations results in prediction performance nearly identical to GP model fitting to the GP realizations. We also consider performance of the PMLE’s of the parameter values when the true generating process is SHP. Table 1 lists the mean and standard deviation of parameter estimates for the SHP model based on 100 realizations from the SHP model. Overall, the pseudo-MLE parameter estimates are seen to be nearly unbiased. The large standard deviations, relative to the means, reflect not only the moderate sample size but also the flatness of the likelihood function in a neighborhood of the optimum. For example, in results not reported here, we simulated a one-dimensional SHP as above, with φz = 80, then predicted its values out-of-sample using φz = 60, 70, 80, 90, 100, 110, and 120. The MSPE’s change very slowly as a function of φz , with MSPE’s for φz = 70, 80 and 90 all very similar. We note also that this experiment suggests that though our plug-in procedure might tend to lead to underestimation of prediction error variance, the effect of parameter estimation uncertainty is small relative to the effect of prediction uncertainty.



Figure 4: One-dimensional simulations. Panel (a) shows two sample paths simulated from the 1-d SHP model. Panel (b) shows boxplots of 100 out-of-sample mean square prediction errors for empirical GP predictor and plug-in SHP predictor, each based on a model fitted to 30 observations. Panel (c) shows the same boxplots as (b) with extreme outliers omitted to facilitate comparison. Panel (d) shows two sample paths simulated from the 1-d GP model. Panel (e) shows boxplots of 100 out-of-sample mean square prediction errors for empirical GP predictor and plug-in SHP predictor, each based on a model fitted to 30 observations. Panel (f) shows the same boxplots as (e) with extreme outliers omitted to facilitate comparison.


Table 1: Performance of pseudo maximum likelihood estimates of parameters in the 1-dimensional SHP model. The means and standard deviations are based on 100 simulated realizations from the model.

        σ^2    τ^2    φ_α     φ_z     β
True    0.2    4      40      80      0
Mean    0.22   3.85   41.28   90.25   0.01
Stdev   0.23   2.41   20.80   22.09   0.34

5.2 Enhanced Vegetation Index (EVI) data analysis

The Enhanced Vegetation Index (EVI) is the most common index used to assess Earth's vegetation from space. It was developed by Huete et al. (2002) and uses remote sensing data collected with the MODIS (Moderate-resolution Imaging Spectroradiometer) sensor on NASA's Terra satellite. EVI describes the relative "greenness" of Earth's vegetation by comparing absorption and reflection of visible and near-infrared sunlight. It has been shown to be well correlated with leaf area index, biomass, canopy cover, and the fraction of absorbed photosynthetically active radiation (Gao et al. (2000)), and therefore is useful for monitoring seasonal, inter-annual, and long-term variation of the vegetation structure (Huete et al. (2002), Gurung et al. (2009)). The data for a given pixel are collected every 8 days due to the satellite's rotation, and cloud contamination and other sub-optimal atmospheric or illumination conditions can also result in missing data, so it is of interest to predict EVI at unsampled times (Gurung et al. (2009)). The data set contains EVI values recorded every 8 days for 6 full years (January 2000 to December 2005) in Iowa (longitude 91.91°W and latitude 42.84°N). EVI values range from 259 to 8034. We standardized the EVI values (subtracted the sample mean and divided by the sample standard deviation) before model fitting. From panel (a) of Figure 5, we see the strong periodic behavior of the EVI curve, and so we first regress on a set of sinusoidal basis functions,

f(t) = β_0 + Σ_{r=1}^{k} (β_{cr} cos(2rπt/365) + β_{sr} sin(2rπt/365)),    (30)

choosing k = 6 based on a preliminary analysis of fitting successively higher-order models and examining the residuals. (Residuals with similar characteristics are obtained for k = 3 through k = 7.) Figure 5 panel (b) shows the residuals from this fit. Since the residuals are obviously correlated, we consider GP and SHP models. The presence of inhomogeneous features suggests that SHP may be particularly useful. We compare the performance of GP and SHP modeling by examining the out-of-sample prediction performance. We randomly sample 6 time points per year and use these 36 observations to fit GP and SHP models, then predict the residuals at the remaining 216 time points. We repeat this procedure 100 times, independently sampling 36 points each


time. For GP modeling, we tried Gaussian, exponential and spherical correlation functions. The spherical correlation function leads to the smallest MSPE, and we report GP results for this case only. Figure 5 panel (c) provides the MSPE boxplots. For the SHP modeling, we chose J = 12 knots in the low-dimensional approximation, with 6 knots evenly distributed within the domain and an additional 6 knots (one knot per year) randomly selected using probabilities proportional to absolute value of the observations. The SHP model with plug-in best prediction outperforms GP considerably in this example.
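The seasonal detrending step in (30) is ordinary least squares on sinusoidal basis functions. A minimal sketch (Python with NumPy; t is taken to be the day index of each record and k = 6 as chosen above) computes the fitted curve and the residuals that are then modeled by GP or SHP:

```python
import numpy as np

def harmonic_design(t, k, period=365.0):
    """Design matrix [1, cos(2*pi*r*t/period), sin(2*pi*r*t/period)] for r = 1..k."""
    cols = [np.ones_like(t, dtype=float)]
    for r in range(1, k + 1):
        cols.append(np.cos(2.0 * np.pi * r * t / period))
        cols.append(np.sin(2.0 * np.pi * r * t / period))
    return np.column_stack(cols)

def detrend_evi(t, evi, k=6):
    """Fit (30) by least squares and return the fitted seasonal curve and residuals."""
    X = harmonic_design(np.asarray(t, dtype=float), k)
    beta, *_ = np.linalg.lstsq(X, evi, rcond=None)
    fitted = X @ beta
    return fitted, evi - fitted
```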


Figure 5: EVI data analysis. Panel (a) plots the original data and regression curve fitted using equation (30). Panel (b) shows the residuals after regression. The dashed lines separate different years. Panel (c) shows MSPE boxplots for 100 replicates of out-of-sample prediction of 216 days given 36 days selected by stratified simple random sampling, six days per year.

5.3 NO3 deposition data analysis

The National Atmospheric Deposition Program (NADP) monitors wet atmospheric deposition in the United States. In this section, we analyze the annual average of NO3 concentration (mg/L) from 1986 to 2005. The domain of interest covers 17 states in the western United States with longitude between 100° and 120° west and latitude between 30° and 50° north. There are a total of 79 monitoring sites, 52 of which have complete data for 20 years. The remaining 27 sites have an average of 43% complete data (232 records/(20 × 27)), with records varying from two years to 18 years. In practice, it is of interest to predict the un-monitored/un-recorded data values. In order to assess prediction performance, we fit the stationary GP model and SHP model using only the data from the 52 sites with complete records, then predict the data from the 27 sites with incomplete records. We calculate the out-of-sample MSPE using the 232 observations recorded irregularly across 20 years at the 27 sites. For the GP model, the observations

across years are regarded as iid replicates with common isotropic covariance function in space. For the SHP model, the observations across years are taken as independent replicates conditional on a common α process. Therefore we will fit a SHP model with replicates, i.e., model (28). For the low-dimensional approximation, we chose J = 12 knots from among the 52 monitoring sites. Sites were randomly selected using probabilities proportional to the variance (across years) at each location. For GP models, we tried Gaussian correlation functions with and without nugget and exponential correlation functions with and without nugget. It turns out that the exponential correlation function without nugget model yields the smallest overall MSPE (taken over 232 observations across 27 locations and 20 years), which equals 0.057. We fit the SHP model with replicates by use of exponential correlation functions for both α and Z processes. The overall MSPE for SHP model is 0.050. So the SHP model reduces the overall MSPE by about 12% compared to the best GP model. In addition to computing the overall MSPE, we summarize the out-of-sample prediction errors for the 27 locations (computing MSPE across all recorded years within each location) and for the 20 years (computing MSPE across all locations within each year). The top row of graphs in Figure 6 compare relative MSPE ratios over 27 locations (panel a) and across 20 years (panel b). For MSPE across the 27 locations, SHP is rarely much worse than GP (sites 3, 21 and 23), often comparable (12 locations), and often much better than GP (12 locations). For MSPE across the 20 years, SHP outperforms GP for all but four years, in which SHP has comparable MSPE. In Figure 6, the contours of the sample standard deviations and the estimated SHP conditional standard deviations are plotted separately in panels (c) and (d), respectively. Note that we observe smaller standard deviations around the Idaho (ID) and Nevada (NV) regions and higher standard deviations around the Colorado (CO) region in panel (c). The SHP conditional standard deviations depicted in panel (d) mimics these gross features in the sample standard deviations contour plot and depicts spatial inhomogeneous features that an isotropic covariance function is unable to capture. We observe that the three locations yielding largest relative MSPE ratios (locations 12, 13 and 25) are clustered in the low volatility areas near ID and NV, while locations 6, 7, 8, 9, 10, that produce the next largest MSPE ratios, are clustered in the high volatility areas near CO. This suggests that the SHP is reflecting some of the spatial inhomogeneities that it was designed to model.

6 Conclusions

In this paper, by analogy to temporal and spatial lattice stochastic volatility models, we extended the non-Gaussian geostatistical model proposed by Palacios and Steel (2006) to a stochastic heteroscedastic process (SHP). Conditional on a latent GP, the SHP is a GP with nonstationary covariance structure. Unconditionally, the SHP is a stationary non-Gaussian process. The realizations from SHP are versatile and can show obvious heterogeneous features. The unconditional correlation of SHP offers a rich class of correlation functions which can also allow for a smoothed nugget effect. This smoothed nugget explains the microscale


Figure 6: NO3 deposition data analysis. Panel (a) plots the relative MSPE ratios at each of 27 locations. Panel (b) shows the relative MSPE ratios across 20 years. We calculate the relative MSPE ratio as [MSPE(GP) − MSPE(SHP)]/mean[MSPE(GP), MSPE(SHP)]. Ratios greater than 0 favor the SHP model. Panel (c) is the contour plot of sample standard deviations using cubic spline interpolation based on 52 sites. Panel (d) shows the SHP conditional standard deviations, i.e., the estimates of σ exp(τα/2). The triangles are the 52 complete-data locations. Both contour plots are based on 50 × 50 grid points.

variation in a natural way. The unconditional correlation function can be used independently as a flexible correlation class. The SHP model provides for a richer and more varied array of sample paths than can be produced by a stationary GP model. However, this additional flexibility comes at the expense of a more complicated and difficult estimation procedure. We have proposed importance sampling and Monte Carlo integration to evaluate and maximize an approximation to the likelihood function. The importance density is based on a multivariate normal distribution with mean and covariance that are functions of the posterior mean of the α process given the data. As such, the performance of this importance density depends heavily on the quality of the posterior mode estimation. By using a low-dimensional approximation of the latent process in the optimization procedure, the posterior mode estimation can be improved dramatically. Our choices of the number and locations of knots in the approximation made our numerical experiments computationally feasible. While improving upon these choices would be an interesting direction for further research, our empirical results with these choices show improved out-of-sample prediction performance of SHP relative to the classical GP approach for simulated data, EVI data, and NO3 deposition data. Our prediction error variances are estimated by plugging in estimated parameter values, and so run the risk of underestimation of prediction uncertainty. In our examples, however, the greatest impact on the prediction error uncertainty comes from the prediction error (response minus model prediction at fixed parameter values), not from estimation error (model prediction at fixed parameters minus model prediction at estimated parameter values), so we do not expect the underestimation due to plug-in estimation to be substantial. Bootstrap corrections or Bayesian inferential methods could be explored to adjust for such underestimation, if necessary. The estimation procedure proposed here is computationally expensive. When the number of observations is large, it is difficult to estimate the posterior mode accurately and provide a reliable importance density. One possible solution would be to divide the domain into blocks, compute the log-likelihood within each block, then sum across blocks to provide a reasonable and computable criterion. This will be a subject of future work.

Acknowledgements

This research was supported by grants from the National Science Foundation (NSF Grant Nos. MSPA-CSE-0434354 and DMS-0743459), the NASA Applied Sciences Program (NASA Grant No. NNG05GL07G) and the USDA/CSREES Carbon Cycle Science Program (Agreement No. 2001-38700-11092).

References

Banerjee, S., Carlin, B. P., and Gelfand, A. E. (2003). Hierarchical Modeling and Analysis for Spatial Data. Chapman and Hall/CRC.


Campbell, S. D. and Diebold, F. X. (2005). Weather forecasting for weather derivatives. Journal of the American Statistical Association, 100:6–16. Chang, Y.-M., Hsu, N.-J., and Huang, H.-C. (2010). Semiparametric estimation of nonstationary spatial covariance function. Journal of Computational and Graphical Statistics, 19(1):117–139. Cressie, N. and Huang, H.-C. (1999). Classes of nonseparable, spatio-temporal stationary covariance functions. Journal of American Statistical Association, 94:1330–1340. Cressie, N. A. C. (1993). Statistics for Spatial Data. New York: Wiley, 2nd edition. Cripps, E. and Dunsmuir, W. T. M. (2003). Modeling the variability of Sydney Harbor wind measurements. Journal of Applied Meteorology, 42:1131–1138. Damian, D., Sampson, P. D., and Guttorp, P. (2001). Bayesian estimation of semiparametric non-stationary spatial covariance structures. Environmetrics, 12:161–178. Danielsson, J. and Richard, J.-F. (1993). Accelerated Gaussian importance sampler with applications to dynamic latent variable models. Journal of Applied Econometrics, 8:153– 173. Davis, R. A. and Rodriguez-Yam, G. (2005). Estimation for state-space models: an approximate likelihood approach. Statistica Sinica, 15:381–406. de Cesare, L., Myers, D., and Posa, D. (2002). Fortan programs for space-time modeling. Computational Geosciences, 28:205–212. Durbin, J. and Koopman, S. J. (1997). Monte Carlo maximum likelihood estimation for non-Gaussian state space models. Biometrika, 84:669–684. Fuentes, M. and Smith, R. L. (2001). A new class of nonstationary spatial models. Technical report, North Carolina State University, Department of Statistics. Gao, X., Huete, A., Ni, W., and Miura, T. (2000). Optical-biophysical relationships of vegetation spectra without background contamination. Remote Sensing of Environment, 74:609–620. Genton, M. G. (2007). Separable approximations of space-time covariance matrices. Environmetrics, 18 (7):681–696. Gneiting, T. (2002). Nonseparable, stationary covariance functions for space-time data. Journal of American Statistical Association, 97:590–600. Gneiting, T., Genton, M. G., and Guttorp, P. (2007). Geostatistical space-time models, stationarity, separability and full symmetry. In Statistical Methods for Spatio-Temporal Systems, pages 151–175. B. Finkenstadt, L. Held and V. Isham (Eds.).


Gneiting, T., Larson, K., Westrick, K., Genton, M. G., and Aldrich, E. M. (2006). Calibrated probabilistic forecasting at the stateline wind energy center: The regime-switching spacetime method. Journal of American Statistical Association, 101:968–979. Gong, G. and Samaniego, F. J. (1981). Pseudo maximum likelihood estimation: Theory and applications. Annals of Statistics, 9 (4):861–869. Graf-Jaccottet, M. and Jaunin, M.-H. (1998). Predictive models for ground ozone and nitrogen dioxide time series. Environmetrics, 9:393406. Gurung, R., Breidt, F., Dutin, A., and Ogle, S. (2009). Predicting enhanced vegetation index (EVI) curves for ecosystem modeling applications. Remote Sensing of Environment, 113:2186–2193. Higdon, D. (2002). Space and space-time modeling using process convolutions. In Quantitative Methods for Current Enviromental Issues, pages 37–56. London: Springer-Verlag. Higdon, D., Swall, J., and Kern, J. (1998). Non-stationary spatial modeling. In Bayesian Statistics 6, pages 761–768. Oxford: Oxford University Press. Bernardo, J. M., Berger, J. O., Dawid, A. P. and Smith, A. F. M. (Eds.). Huang, W. (2008). Spatial Processes with Stochastic Heteroscedasticity. PhD thesis, Colorado State University, Fort Collins, CO. Huete, A., Didan, K., Miura, T., Rodriguez, E., Gao, X., and Ferreira, L. (2002). Overview of the radiometric and biophysical performance of the modis vegetation indices. Remote Sensing of Environment, 83:195–213. Johnstone, I. M. and Silverman, B. W. (1997). Wavelet threshold estimators for data with correlated noise. Journal of the Royal Statistical Society: Series B, 59:319–351. Ma, C. (2002). Spatio-temporal covariance functions generated by mixtures. Journal of Mathematical Geology, 34:965–975. Mardia, K. V. and Goodall, C. R. (1993). Spatial-temporal statistical analysis of multivariate environmental monitoring data. In Multivariate Enviromental Statistics, pages 347–386. In Patil, G.P. and Rao, C.R. (Eds.). Nychka, D., Wikle, C., and Royle, J. A. (2002). Multiresolution models for nonstationary spatial covariance functions. Statistical Modelling, 2:315–331. Palacios, M. B. and Steel, M. F. J. (2006). Non-Gaussian Bayesian geostatistical modeling. Journal of the American Statistical Association, 101:604–618. Ripley, B. (1981). Spatial Statistics. John Wiley & Sons. Robert, C. P. and Casella, G. (1999). Monte Carlo Statistical Methods. Spinger-Verlag, New York, 2nd edition. 25

Sampson, P. D. and Guttorp, P. (1992). Nonparametric estimation of nonstationary spatial covariance structure. Journal of the American Statistical Association, 87:108–119. Schlather, M. (2010). Some covariance models based on normal scale mixtures. Bernoulli, 16:780–797. Schmidt, A. M. and O’Hagan, A. (2003). Bayesian inference for non-stationary spatial covariance structure via spatial deformations. Journal of the Royal Statistical Society: Series B, 65:743–758. Shephard, N. (1996). Statistical aspects of ARCH and stochastic volatility. In Time Series Models in Econometrics, Finance and Other Fields, pages 1–67. Chapman and Hall, London. In: Cox, D. R., Hinkley, D. V. and Barndork-Nielsen, O. E. (Eds.). Smith, R. L. (1996). Estimating nonstationary spatial correlations. Technical report, Cambridge University, UK. Stein, M. L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. New York: Spinger-Verlag. Tol, R. S. (1996). Autoregressive conditional heteroscedasticity in daily temperature measurements. Environmetrics, 7:6775. Yan, J. (2007). Spatial stochastic volatility for lattice data. Journal of Agricultural, Biological, and Environmental Statistics, 12(1):25–40.
