Kriging with Nonparametric Variance Function Estimation - CiteSeerX

Report 2 Downloads 127 Views
Kriging with Nonparametric Variance Function Estimation J. D. Opsomer D. Ruppert Iowa State University Cornell University M. P. Wand U. Holst Harvard University University of Lund O. Hossjer University of Lund 1 2 September 4, 1998

Research supported by the Swedish National Board for National and Technical Development Grant 91-02637P, National Science Foundation Grant #DMS9626762 and by the Center for Agricultural and Rural Development at Iowa State University. 2 The authors thank Wayne Fuller for the helpful suggestions in creating the bias-adjusted variogram. 1

Abstract A method for tting regression models to data that exhibit spatial correlation and heteroskedasticity is proposed. It is well-known that ignoring a nonconstant variance does not bias least-squares estimates of regression parameters, so data analysts are easily lead to the false belief that moderate heteroskedasticity can generally be ignored. Unfortunately, ignoring nonconstant variance when tting variograms can seriously bias estimated correlation functions. By modeling heteroskedasticity and standardizing by estimated standard deviations, our approach eliminates this bias in the correlations. A combination of parametric and nonparametric regression techniques is used to iteratively estimate the various components of the model. The approach is demonstrated on a large dataset of predicted nitrogen runo from agricultural lands in the Midwest and Northern Plains regions of the U.S. For this dataset, the model is comprised of three main components: (1) the mean function which includes farming practice variables, local soil and climate characteristics and the nitrogen application treatment, is assumed linear in the parameters and tted by generalized least squares, (2) the variance function, which contains a local as well as a spatial component whose shapes are left unspeci ed, is estimated by local linear regression, and (3) the spatial correlation function is estimated by tting a parametric variogram model to the standardized residuals, with the standardization adjusting the variogram for the presence of heteroskedasticity. The tting of these three components is iterated until convergence. The model provides an improved t to the data compared to a previous model that ignored the heteroskedasticity and the spatial correlation.

Key words and phrases: Heteroskedasticity, local linear estimation, metamodel, runo modeling, spatial correlation.

1 Introduction For many practical problems, the degree to which components of the statistical model can be speci ed in a parametric form varies dramatically. When the model is misspeci ed, the resulting model t can be biased and the possibility for making wrong inferences exists. On the other hand, when part of the model is amenable to parametric tting, it is useful to do so in order to have a more analytically tractable model and be able to use traditional inference techniques. Even in the most common form of nonparametric regression where the mean function is left unspeci ed, it is common to assume that the observations are uncorrelated, which can be viewed as a \parametric" assumption on the distribution of the errors. Violation of that assumption has a serious e ect on the optimal bandwidth for estimating that mean function (e.g. Opsomer (1997)). Most models for spatial data assume a stationary process implying a constant variance. When the data are heteroskedastic, naively assuming a constant variance when tting a variogram can lead to badly biased estimates of the correlation function. To appreciate this problem, one need only consider that the variance of the di erence between two observations depends not only on their correlation but also on their individual variances. In our experience, heteroskedasticity is common in spatial data but rarely can be t by a parametric model. In this article, we consider an application where it appears reasonable to accept a (roughly) linear relationship between dependent and independent variables and the observations clearly display spatial dependence, but where the shape of the spatial variance cannot be speci ed a priori. The proposed approach blends elements of parametric and nonparametric tting and is applicable a wide range of problems, in particular those that involve spatially distributed observations. We begin by describing the application that motivated this research. Economists at the Center for Agricultural and Rural Development at Iowa State University (CARD) are developing models to evaluate the impact of federal and state agricultural policies on the nitrogen water pollution in the Midwest and Northern Plains of the U.S. (Wu, Lakshminarayan and Babcock (1996)), at both the regional and the local level. Local prediction is achieved by using the 128,591 National Resources Inventory (NRI) points in the re1

gion of interest as the basis for the evaluation of pollution impact: the NRI database provides measurements on many landuse and soil variables of interest, as well as sampling weights allowing statistically valid area predictions based on the point predictions (Nusser and Goebel (1997)). Nitrogen pollution occurs via two primary pathways: by nitrogen runo into surface waters, and by leaching through the soil into the groundwater. In the current article, we will focus on the prediction of nitrogen runo . Table 1 shows the variables used in the model. They are further described in Wu et al. (1996). A map of the study region containing the locations of weather stations is given in Figure 1. The estimated variance function also displayed there will be discussed later. INSERT TABLE 1 HERE. Nitrogen runo from non-point sources such as agricultural practices is typically unobservable, especially at the scale of interest in this study. The Water Quality and Erosion Productivity Impact Calculator (EPIC-WQ, see Sharpley and Williams (1990)), a widely used deterministic biogeophysical process model, provides, at least conceptually, a convenient tool for predicting the nitrogen runo at the NRI points. Running the model for all NRI points would be very computation-intensive, and any change in any of the input variables would require re-running the EPIC-WQ model. It was therefore decided to estimate a statistical \metamodel" on a representative subset of 11,403 data points, and use this metamodel in place of EPIC-WQ to predict nitrogen runo at the remaining observation points. Another advantage of this approach is the estimation of coecients and accompanying con dence intervals for the covariate e ects, providing additional insights in the nature of the e ect of agricultural practices (represented by NRATE and the dummy variables in Table 1) on nitrogen pollution. The original approach of Wu et al. (1996) was to t the metamodel by OLS after transforming the dependent variable and adding a limited number of interaction terms. The model was: (Y N 03)

1=3:5

= +Z

1

+ NRATE  Z + X + iid errors; (1) 2

z1

z2

x

where Z contains the values for the covariates from Table 1 except the weather station location, Z the same as Z except for the removal of the covariate NRATE , and X = (LAT; LONG) the location of the nearest 1

2

1

2

weather station. We will let Z = [Z NRATE  Z X ] and for simplicity refer to Z as the covariates for this model, and let = [ 1 2 ] . The location and interaction terms were included to improve the t of the model, and the transformation was selected to remove some of the observed departures from the usual assumptions that the errors are homoscedastic and normally distributed. Nevertheless, the residuals still exhibited both severe heteroskedasticity, as well as spatial correlation. As noted in Carroll and Ruppert (1988), transformations of the dependent variable only remove heteroskedasticity when it depends on the mean. They are therefore not appropriate in cases where spatial location appears to cause most of the variance e ects. This was con rmed for this dataset: when using the proposed model, transformation of the dependent variable no longer had any noticeable e ect on the goodness of t of the model (see Section 4). In the current paper, we demonstrate how a combination of universal kriging and nonparametric variance function estimation can be used to develop an improved regression model for this problem, while maintaining the interpretability of mean function model (1). The choice of kriging is motivated by the fact that one of primary uses of this model is the prediction of Y N 03 at the large number of points not included in the regression observations, a situation for which kriging has well-known optimality properties (Cressie (1993)). Since the residuals of the OLS t of model (1) exhibited signi cant heteroskedasticity as well, the explicit inclusion of a spatial variance function is expected to further improve the t of both the mean and the correlation function. A generalization of the nonparametric variance estimation approach of Ruppert, Wand, Holst and Hossjer (1997) is used to estimate the variance function. Section 2 proposes a model that explicitly accounts for the heteroskedasticity and spatial correlation in the data, and Section 3 describes the approach used in estimating its various components. In Section 4, the model estimates are discussed. Section 5 addresses the use of universal kriging for predicting the nitrogen runo values at the remaining NRI points not included in the metamodel. 1

3

2

T

T

z

z

T T x

2 The Model The data consist of n scalar response measurements Y (the Y N 03 measurements from Section 1) and covariates Z recorded at N distinct geographic sites x (the weather stations from Section 1), and the total number of observations is denoted by n = P n . The model is i

ij

ij

i

N i=1

i

Y = Z + v (x ) " + v (x ) u T ij

ij

"

i

1=2

i

u

1=2

i

(2)

ij

for j = 1; : : : ; n , i = 1; : : : ; N . Here is a q  1 vector of parameters, v and v are bivariate variance functions. The errors u are independent and identically distributed with E (u ) = 0, var(u ) = 1 and the " are such that E (" ) = 0, var(" ) = 1 and cov(" ; " ) = (kx ? x k; ), where (; ) represents a parametric family of stationary, isotropic correlation functions indexed by the parameter . The fu g are independent of the f" g, and both are independent of the fZ g. Model (2) is typical of a variance components models where all within-site correlations are captured by the f" g so that the fu g are independent. However, the f" g are modeled by a spatial process to allow between-site correlations. This model is easily adapted to apply to other situations. The mean function Z can be replaced by any other parametric model, including Z   if ordinary kriging is used. Similarly, if there are no replicates at the geographic sites x (i.e. n = 1 for all i), the term v (x ) u can be subsumed into v (x ) " . As mentioned above, many points share the same \location" x , with n ranging from 1 to 221 for the N = 329 weather stations in our dataset. There is also a computational reason for working with these approximate locations instead of the actual point locations: only this reduction in the true dimension of the spatial variance-covariance matrix allows us to use \o -theshelf" packages to perform the computations. The remaining errors u at a given weather station location x were assumed to be independent, since the correlation is taken to be spatial. In the kriging context, the variance function associated with the u is referred to as the nugget e ect. If no replicates are available, the nugget e ect would be estimated from the spatial error process v (x ) " . i

"

u

ij

ij

i

ij

i

i

i0

i

i

i0

ij

i

ij

i

ij

i

T ij

T ij

i

i

"

i

1=2

u

i

1=2

ij

i

i

i

ij

i

ij

"

i

1=2

i

4

3 Estimation Procedure

3.1 Overview

Let Y be the n  1 vector of Y 's and Z be the n  q matrix with (i; j )th row equal to Z . Let  be the variance-covariance matrix of Y . Let p be a positive integer-valued tuning parameter. The role of p is to determine the minimum number of replicates at an x in order to use that location for estimating the variance functions. The choice of p is discussed later. ij

T ij

i

Step 0:

(Initialization step) Set b = I .

Step 1:

Obtain

Step 2:

b = (Z b ? Z )? Z b ? Y :

Set

1

1

T

r = Y ? Z b and r = n1 ij

Step 3:

1

T

T ij

ij

i

i j=1

r : ij

Obtain vb (x ) by local linear smoothing of fve (x ) : n  pg where i X ve (x ) = n 1? 1 (r ? r ) : u

u

i

n

u

Step 4:

ni X

i

i

ij

i

i

i

2

j=1

Obtain vb (x ) by local linear smoothing of fv~ (x ) : n  pg where v~ (x ) = (r ) ? v~ (x ) : "

"

i

"

i

i

u

2

n

i

i

i

i

De ne vb (x ) = vb (x ) + vb (x )=n and let "b = r =vb (x ) . Estimate  in correlation model (; ) by tting the variogram of the "b .

Step 5:

r

"

i

u

i

i

i

i

i

r

i

1=2

i

Step 6:

Obtain

where (b )

b = b + b "

= vb (x ) if i = i0, j = j 0 and 0 otherwise, and [b ] = vb (x ) vb (x0 ) (kx ? x k; b ):

u ij;i0 j 0

u

i

" ij;i0 j 0

Step 7:

u

Repeat Steps 1{6 R

"

iter

i

1=2

"

i

1=2

i

i0

times.

Of course, the local linear smoothing in Steps 3 and 4 could be replaced by higher degree local polynomial regression. After the estimation steps have been completed, predictions can be made as will be discussed in Section 5. 5

3.2 Details on the Implementation

3.2.1 Generalized Least Squares

In Step 1, computations involving the inverse of the 11; 403  11; 403 matrix d Y ) are avoided by noting that, because of the assumed model (2), b = cov(

 =  + K V K; T

u

"

where  is a diagonal matrix with repeating \blocks" of length n : i

u

 = block diagfv (x ) I i = 1; : : : ; N g; u

u

i

ni

with I i being the n by n identity matrix, V is the N  N covariance matrix of the " and K is an N  n matrix with (i; i0) entry equal to 1 for i0 = 1 + P ? n ; : : : ; P n and zero otherwise. The inverse of  is therefore equal to n

i

i

"

i

i 1 k=1

i k=1

k

k

? = ? ? ? K (V ? + K ? K )? K ? ; 1

u

1

u

1

T

"

1

u

1

T

1

u

1

(Horn and Johnson (1985), p.19), which can be rapidly computed since the largest non-diagonal matrix to invert is only N  N .

3.2.2 Variance Function Estimation by Local Linear Regression

The v~ (x ) in Step 3 are approximately independently distributed, heteroskedastic random variables, with variance equal to 2v (x ) =(n ? 1) (this would hold exactly if were known and the errors were normally distributed). We will therefore apply the theory developed in Ruppert et al. (1997) to construct an estimator for the function v . While p = 2 observations are sucient for computing v~ (x ) at a location, there is clearly much more information about v at the locations with more observations. Since n = 11; 403 and N = 329, we have n = 35 and it might make sense to use only locations where n is \not too small." We experimented with p = 2; 3; 4 and found that p = 3 gave the best estimates, in terms of speed of convergence of the algorithm, lack of boundary problems as well as avoiding negative variance estimates (see below). The number of locations where with n  3 is 290. The special structure between the estimator and its variance is used in the bandwidth selection of the EBBS algorithm (Ruppert (1997)). More specifically, let vb (x ; h) be the local linear estimator of v (x ). EBBS separately u

i

u

i

2

i

u

u

i

u

i

i

u

u

i

6

i

estimates the squared bias and variance of vb (x ; h). These quantities are added together and their sum is minimized over a grid of h-values to produce the EBBS bandwidth at x . The bias estimate is exactly as in Ruppert (1997). The estimate of var(vb (x ; h)) uses the relation u

i

i

u

i

var(vb (x ; h)) = s(x ; h) diagfvar(~v (x )gs(x ; h); u

i

T

i

u

i

i

where s(x ; h) is the N by 1 local polynomial \smoother vector" for a given value of h, such that vb (x ; h) = (~v (x ); : : :; v~ (x ))s(x ; h). EBBS estimates var(vb (x ; h)) by i

u

u

u

i

u

1

N

i

i

vd ar(vb (x ; h)) = s(x ; h) diagf2(vb (x ; h))=(n ? 1)gs(x ; h): u

i

T

i

u

i

i

i

) be denoted We let hb denote the EBBS bandwidth and let vb (x ; hb by vb (x ). In Step 4, we obtain vb by smoothing fv~ (x ) : n > pg again using EBBS to select the bandwidth. We will ignore the error in b so that r = v (x )  + v (x ) u ; and therefore r = v (x )  + v (x ) u : Then v~ (x ) is unbiased for v (x ), and therefore E (~v (x )) = E (r ) ? v (x ) = v (x ): EBBS

u

u

EBBS

i

i

"

"

i

i

ij

"

i

u

i

1=2

i

u

i

1=2

i

ij u

"

"

i

1=2

i

u

i

1=2

i

i

u

2

i

i

n

i

"

i

i

Therefore, when we smooth the fv~ (x )g there is no bias term involving v and EBBS will properly estimate the bias of our nal estimate of v . One might consider estimating v by smoothing the fr g and then subtracting o an estimate of v (x )=n ; however, in this case, the bandwidth optimal for smoothing the fr g will not be optimal for the nal estimate of v . The EBBS bandwidth for smoothing v~ (x ) requires an estimate of var(~v (x ; h)). Estimation of this variance is based upon the following results. Let H = Z (Z ? Z )? Z ? represent the \hat" matrix from the estimation of the mean. Let  be the N  n matrix with (i; j ) entry equal to 1=n for j = 1+ P ? n ; : : :; P n and zero otherwise. Finally, let A B denote the elementwise product of equi-sized matrices A and B. Result 1: Assuming normality of the Y 's, the covariance matrix of the random vector containing r ; i = 1; : : : ; N is given by "

i

u

"

"

u

i

i

i

i

"

"

T

i

1

i 1 k=1

1

k

"

i

T

1

i k=1

k

i

ij

2 i

2AA (AA + 2Amm T

T

7

T

A) T

(3)

where m = Z , A = (I ? H ). If the error due to estimation of the mean Z is ignored, then the above covariance matrix simpli es to 2(V + V

E) : (4) where V = diagfv (x ); i = 1; : : : ; N g and E = diagf1=n ; i = 1; : : : ; N g and A = A A. Result 2: If the error due to estimation of is ignored and the Y 's are "

u

u

u

[2]

1

i

i

1

[2]

ij

normally distributed, the covariance matrix of the random vector containing v~ (x ) = (r ) ? v~ (x ) i = 1; : : :; N "

i

i

is given by

u

2

n

i

i





 = 2 (V + V E ) + V E E ; where E = diag((n ? 1)? ). v ~"



1

(5)

2

1

i

2

u

[2] 1

[2] u

[2]

Let Vc and Vc be obtained from V and V be replacing v (x ) and v (x ) by vb and vb . Then let b " be given by (5) with V and V replaced by Vc and Vc . Suppose that "

"

u

u

i

"

"

e

u

u

v ~

"

i

u

u

vb = (~v (x ; : : :; v~ (x ))s(x ; h) where, as with vb , s(x ; h) is a smoother vector. Then, the estimate of var(vb (x )) used by EBBS is s (x ; h)b " s(x ; h): Our estimate of v does not use locations where n < p, but these are locations where there is relatively little information about v . Since the v~ (x ) e

u

"

"

"

1

n

i

i

i

T

i

v ~

i

"

i

"

"

i

smoothed in Step 4 are possibly negative, there is a positive probability that vb (x ) is negative. As n increases, the probability that v~ (x ) is negative decreases. While negative values for v~ (x ) are in principle not a problem, it is highly undesirable to have negative variance estimates vb , since they would result in a negative de nite covariance matrix Vc . For p = 3, only 9 locations had negative variance estimates, and all were located at the the North and North-West boundaries of estimation region, making it very likely that they are the result of \boundary e ects," a common nuisance in nonparametric regression similar to extrapolation problems in parametric regression. We therefore decided to add a local averaging step at each iteration of the algorithm to \correct" any negative estimates. Note that this step only changes the negative estimates and leaves all the other values unchanged. "

i

i

"

"

i

e

"

8

i

3.2.3 Estimation of the Correlation Function by Variogram Fitting In Step 5, the correlation function is estimated parametrically by variogram tting. Because heteroskedasticity is known to cause spurious patterns in variograms, it is important to remove that e ect before estimating the correlation function. Hence, the spatial residuals r have to be normalized. What the normalizing constants should be is a somewhat subtle issue. If we ignore the errors caused by the estimation of the mean and variance functions and use "~ = r =v (x) , the variogram will estimate i

i

i

1=2

"

2~ (x ? x ) := E(~" ? "~ ) = var(~") + var(~" ) ? 2cov(~"; "~ ) = vv ((xx )) + vv ((xx )) ? 2(x ? x ) i0

i

i0

i

2

i0

i

r

i

r

i0

"

i

"

i0

i

i0

i0

i

while if we use "b = r =v (x ) , then i

i

r

i

1=2

2~ 0(x ? x ) := E("b ? "b ) i

i0

i

i0

2

= 2 ? 2(x ? x

i0

i

v u u )t v"(

x ) v (x ) : v (x ) v (x ) r

i

"

i0

i

r

i0

Neither ~() nor ~0() are in general equal to () := 1 ? (), so they cannot be directly used to t the correlation function. However, it is easy to see that 0

(x ? x ) = 1 ? 1r? ~ x(x ?xx ) : (6) i

i

i0

v" (

i0

x

x

i0 ) i0 )

i ) v" ( i ) vr (

vr (

We can therefore construct a \bias-corrected" variogram based on (6). Let "b = r =v (x ) For a given distance t, let S (t) = f(i; i0) : kx ?x k 2 (t)g with  a given bin size and n(t) = jS (t)j. The  was chosen so that 200 equalsized bins were produced over the range of kx ? x k in the study region, corresponding to   0.09 degree. This represents a compromise between computational tractability and the need for sucient observations in each bin. Then, P 1? ("b ? "b )

b (t) = 1 ? P r b x b x : i

i

r

i

1=2

i

i0

i

1 2n(t)

1 n(t)

S(t)

i

S(t)

"r (

x

bvr (

i0

x

i ) v" ( i ) vr (

b

2

The following parametric model is used for ():

i0 ) i0 )

(t; ) = 1 ?  e? 1 ? (1 ?  )e? 2 3

 t

9

3

 t

i0

with  ;  > 0 and 0    1. This is a mixture of two exponential functions, which was chosen to guarantee the positive de niteness of the variance/covariance matrix estimate. Clearly, other parametric models, including mixtures of larger numbers of exponentials, could be selected as correlation functions for other datasets. The parameters  ;  ;  are estimated by weighted least squares minimization following Cressie (1993, p.96). The estimate of the spatial variance covariance matrix V is computed by setting q [Vc ] = (x ? x ; b ) vb (x )vb (x ): 1

2

3

1

2

3

"

" ii0

i

i0

"

i

"

i0

4 Results The model was run on the CARD dataset, and converges in 2-10 iterations, depending on the strictness of the convergence criterion and the choice of some of the tuning parameters. For p = 3 the model converges after 4 iterations, which takes approximately 10 minutes to run on a DEC 3000 Model 900 AXP workstation, with the bulk of the computing time taken by the GLS tting (Step 1 in Subsection 3.1). Figures 1 and 2 show the nonparametric estimates of the variance functions v () and v () at the weather station locations. Both estimates show a pattern of low values in the center. The estimated functions also display some interesting di erences: while the Great Lakes region exhibits high local and spatial variance, the spatial variance is also high in the South-most part of the study region, while the local variance is high at the Western edge. Most of the variability in the data is explained by the local variance v , with the mean value of vb (x ) equal to 4.789, while that for vb (x ) is 0.404. u

"

u

u

"

i

i

INSERT FIGURE 1 HERE. INSERT FIGURE 2 HERE. In Figure 3, the bias-adjusted variogram of the standardized residuals "b is displayed as well as the weighted least squares tted variogram function. The spatial correlation decreases rapidly as distance increases, and is only important for points at short distances of each other. i

INSERT FIGURE 3 HERE. 10

The goodness of t of a model such as this can be evaluated using data splitting techniques (e.g. Picard and Berk (1990)), and this approach was applied to a comparison between using the transformed and untransformed EPIC-WQ predicted nitrogen runo values as dependent variables. This analysis was performed by tting the model on 90% of the data and predicting on the remaining 10%, and did not show any signi cant di erence in average prediction error between the transformed and untransformed models. As mentioned in Section 1, this is not surprising since the heteroskedasticity was now explicitly accounted for in the model itself.

5 Model Predictions The purpose for developing this metamodel is to be able to predict the potential nitrogen runo at a set of 128,591 NRI points. As the prediction and estimation points use the same set of weather station locations, the spatial residuals " can be considered a lattice process (Cressie (1993)). The vector of spatial errors " = (" ; : : :; " ) can therefore be predicted by a \shrunk" version of the spatial residuals r :   "b = Vc Vc + Vc E ? r; (7) with r = (r ; : : :; r ) , by a straightforward application of conditional expectations (e.g. Bickel and Doksum (1977), p.26). Hence the spatial \correction" for an NRI point with closest weather station location x can be predicted by the corresponding element of the vector "b. Figure 5 of Opsomer et al. (1997) shows a plot of the values of the spatial corrections "b . i

1

N

T

i

1

"

1

N

"

u

1

T

i

i

6 Conclusions We have described a method for tting spatial data that combines parametrically speci ed mean and correlation functions with an unspeci ed spatial variance function. It can easily be generalized to other situations with different parametric models, or to situations without replication at the spatial locations. An iterative procedure to estimate the parameters and nonparametric regressions was explained in this article. However, no attempt was made to prove optimality or convergence properties for our algorithm, nor do more than sketch its theoretical properties under simplifying assumptions. These topics are open research questions at this point. 11

A Proofs Proof of Result 1: The vector of residuals, rij , can be written as r = (I ? H )Y . From the de nition of  we have the vector of ri values equalling (I ? H )Y . The stated result in (3) then follows directly from Lemma 1 of Ruppert et al. (1997) for the special case of normal Yi. If the bias due to estimation of Z is ignored, then (3) simpli es to

2( ) : T [2]

Expression (4) follows directly after noting that we are indexing the matrix  as  = cov(Y ; Y ) for j = 1; : : : ; n ; i = 1; : : : N . Hence, ij

ij;i0 j 0

( ) = n 1n T

ii0

i

i

i0 j 0

ni ni X X 0

i0 j=1 j 0 =1

x + v (x )



ij;i0 j 0

8 vu ( ) i > > < ni => q > :

"

i = i0

i

v (x )v (x )(kx ? x k; ) i 6= i0 "

i

"

i0

i

i0

Proof of Result 2: Recall that ni ni X X v~u(xi ) = n 1? 1 (rij ? ri)2 = (uij ? ui)2: i j=1 j=1

Because the fu g are iid normals, u is independent of v~ (x ). Therefore, r is independent of v~ (x ). Let  be the covariance matrix of the vector ((r ) ; : : :; (r ) ) ,  " the covariance matrix of (~v (x ); : : : ; v~ (x )) , and  u the covariance matrix of (~v (x ); : : : ; v~ (x ) ). Since v~ (x ) is v (x )=(n ? 1) times at  (n ? 1) random variable, we have ! ) 2 v ( x  u = diag n ? 1 = 2V E : i

ij

i

u

1

2

N

2 T

i

"

v ~

u

i

2

i

i

r

v ~

u

u

1

u

N

[2] u

2

"

1

T

N

u

T

i

i

2 u

v ~

i

i

By (4), and ignoring the error caused by using b in place of , we have 



 = 2(V + V E ) +  E = 2 (V + V E ) + V E E : (8) v ~"



u

1

[2]

v ~u

[2] 1



u

1

[2]

[2] u

[2] 1

2

References [1] P. J. Bickel and K. A. Doksum Mathematical Statistics. Holden-Day, Oakland, CA, 1977. 12

[2] R.J. Carroll and D. Ruppert. Transformation and Weighting in Regression. Chapman and Hall, New York, 1988. [3] N. A. C. Cressie. Statistics for Spatial Data. John Wiley & Sons, New York, 2 edition, 1993. [4] R. A. Horn and C. A. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, U.K., 1985. [5] S.M. Nusser and J.J. Goebel. The national resources inventory: a longterm multi-resource monitoring programme. Environmental and Ecological Statistics, 4:181{204, 1997. [6] J.-D. Opsomer. Nonparametric regression in the presence of correlated errors. In Modelling Longitudinal and Spatially Correlated Data: Methods, Applications and Future Directions, Springer-Verlag, 339{348, 1997. [7] J.-D. Opsomer, D. Ruppert, M.P. Wand, U. Holst, and O. Hossjer. An application of kriging with nonparametric variance function estimation. In 1997 Proceedings of the Biometrics Section, American Statistical Association, Alexandria, VA, 123{128, 1997. [8] D. Ruppert. Empirical-bias bandwidths for local polynomial nonparametric regression and density estimation. Journal of the American Statistical Association, 92:1049{1062, 1997. [9] D. Ruppert, M.P. Wand, U. Holst, and Ola Hossjer. Local polynomial variance function estimation. Technometrics, 39:262{273, 1997. [10] R.R. Picard and K.N. Berk. Data splitting. American Statistician, 44:140{147, 1990. [11] A.N. Sharpley and eds. J.R. Williams. Epic-erosion/productivity impact calculator: 1. model documentation. Tech. Bull. 1768, USDA, Washington, DC, 1990. [12] J. Wu, P.G. Lakshminarayan, and B.A. Babcock. Impacts of agricultural practices and policies on potential nitrate water pollution in the Midwest and the Northern Plains of the United States. Working Paper 96{WP 148, Center for Agricultural and Rural Development, Iowa State University, February 1996. 13

YN03 nitrogen runo (predicted by EPIC-WQ) NRATE nitrogen application rate Tillage, conservation and irrigation practice dummies (reference: conventional tillage): DRT reduced tillage DSTRIP DNT no till DTERRA DCONTR contouring DIRTYP Crop rotation dummies (reference: continuous alfalfa): DROT1 continuous corn DROT8 DROT2 continuous soybeans DROT9 DROT3 continuous wheat DROT10 DROT4 continuous sorghum DROT11 DROT5 corn-soybeans DROT12 DROT6 corn-corn-soybeans DROT14 DROT7 corn-soybeans-wheat Rainfall and soil properties: RAIN rainfall (mm) BD SLOPE eld slope PH CLAY clay percentage PERM OM organic matter (%) AWC Hydrology dummies (reference: DHYGA): DHYGB hydrologic group B DHYGD DHYGC hydrologic group C Location of closest weather station: LAT latitude LONG longitude Table 1: Model variables.

14

strip-cropping terracing irrigation soybeans-soybeans-corn wheat-fallow wheat-sorghun-fallow wheat-soybeans wheat-sorghum corn-corn-3 years alfalfa bulk density soil pH soil permeability available water capacity hydrologic group D

#

# # #

# # ##

#

# # ## # # # # # # # ## # # # # ## # # # # # # # # # # # # # # # # ## # # # # # # # # # # # # # # # # ## # # # # ## ## # # # # # # ## # # # # # # # # # # ## ## # # # # # # # # ## # ## # # # # ## # # # # # # # # ## # # # # # ## # # ## ## ## # # ## # ## # # ## # # # # # # # ## # ## # # # # # ## # # # # # # # # # # # # # ## # # # ## # ## # ## # # # # # # #

#

# #

# # # # # # ## # # # # # # # # ## # # # # # # ## # # # # # # # ## # # # # ## # ## # # #### ## ## # # # # ## # # # # # # ## # # # # # # # # # # Local variance # # # ## # # # # # 0.276 - 2.826 # # # 2.826 - 5.375 # # # # # # # 5.375 - 7.924 # # # # # # # 7.924 - 10.474 # # # 10.474 - 13.023 # # # # # # # # # # # # # # #

#

# # #

# #

Figure 1: Estimate of the variance function v () at the weatherstation locations. u

15

#

# # #

# # # ## # # # # # # # # # # # # # # # # #

## #

# #

#

#

#

# # # # # # # # # # # # ## # # # # # # # # ## # # # # # # # # # # # ## # # # # # # # # ## # # # # # ## # # # # # # # # # # ## # # # # # # # # # # # ## # # # # # # # # # # # # # # ## # # # # # # # ## # # # # # # # ## # # # # # # # # ## # # ## # # # # ## # # # # ## # # # ## # # # # # # # # # # # # # ## ## # # # # # # # ## # # # # # # # # # ## # # # # # # # # # # # # # # ## # # ## # # # # # # # # # # # # # # # ## # # # #

# #

#

## ## # # ## # ## # ###

# # # # # # # # # # ## # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

#

Spatial variance

#

#

#

#

#

#

0.006 - 0.376 0.376 - 0.747 0.747 - 1.117 1.117 - 1.487 1.487 - 1.858

# # # # ## # # # #

#

Figure 2: Estimate of the variance function v () at the weatherstation locations. "

16

1.4

1.2

Variogram

1

0.8

0.6

0.4

0.2

0

0

5

10

15

20

Distance

Figure 3: Variogram and estimated variogram function b(; b).

17

25