On a symmetrized simulation extrapolation estimator in linear errors-in-variables models

Jörg Polzehl
Weierstrass Institute, Mohrenstr. 39, 10117 Berlin, Germany
[email protected]

Silvelyn Zwanzig
Uppsala University, Department of Mathematics, Box 480, SE-751 06 Uppsala, Sweden
[email protected]

∗ This is the author to whom the correspondence should be sent.

1  Introduction

In [6] and [10] Cook and Stefanski introduced the simulation extrapolation estimator (SIMEX), a new type of estimator for linear errors-in-variables models (EVM). The estimator has become increasingly popular, see e.g. [1], [8], [9] or [12], and it is also used as a tool in nonparametric curve estimation, see e.g. [11]. In contrast to regression models, errors-in-variables models have measurement errors in the regressors, see [2], [3] or [7]. Naive use of regression estimators leads to a severe bias in this situation. The main idea of SIMEX is to eliminate the effect of measurement errors by extrapolation. New samples with arbitrarily larger variances are generated by adding simulated errors to the originally observed regressor variables. A model for the influence of the measurement error variance on the expectation of the naive regression estimate is fitted to regression estimates obtained from these samples. The SIMEX estimate is defined by extrapolating within this model to the case of zero measurement variance. Knowledge of the variance is essential to determine the length of the backward extrapolation step. Asymptotic results for SIMEX have only been obtained under the assumption of vanishing variances, see [4] and [5].

In this paper we propose a simulation extrapolation estimator which we call SYMEX. Here the length of the backward step is determined by symmetry arguments. Knowledge of the measurement variance is not needed; instead, the variance ratio is required to be known. This is a common and arguably more natural assumption in EVMs, see e.g. [7]. The name SYMEX is meant to underline the symmetric construction of this simulation extrapolation estimator.

Let us explain the main idea of this construction. EVMs are symmetric in all variables: there is no difference between the response and the p explanatory variables. Each variable can be considered as a response variable, just by reparametrization, without any change in the underlying relationship. For the same reason, p + 1 different naive regression estimators can be calculated. All of these estimators characterize the same relationship, and no particular estimate can be preferred. The idea of SYMEX consists in applying the first steps of a slightly modified SIMEX procedure to each naive estimator. The backward extrapolation is stopped at the point where all estimators are closest to each other. Our modification of SIMEX uses a different simulation scheme and a slightly different extrapolation model. With SYMEX all components of the parameter of interest are estimated simultaneously. The extrapolation model is linear in its parameter but nonlinear as a function of the error variance. For the simple linear relationship the SYMEX estimator turns out to be an arbitrarily exact approximation of the total least squares estimator (TLS). This implies the consistency of SYMEX also for increasing sample sizes and fixed variances. Simulations show preferable results for the SYMEX procedure compared with the naive estimators and SIMEX.

In Sections 2 and 3 we discuss the symmetry of the model. Sections 4 and 5 formally introduce the SIMEX and SYMEX procedures. The result for the simple linear relationship is given in Section 6. Section 7 reports simulation results and some conclusions.

2  Symmetry of errors-in-variables models

The explicit linear errors-in-variables model is usually given as

$$\eta = \xi^t \beta, \qquad (1)$$

where $\beta \in \mathbb{R}^p$ is the parameter of interest, $\xi \in \mathbb{R}^p$ is the vector of unknown explanatory variables, and $\eta \in \mathbb{R}$ is the expectation of the response variable. We assume all components of $\beta$ to be different from zero; otherwise these components could be eliminated from the model. Both $\xi \in \mathbb{R}^p$ and $\eta \in \mathbb{R}$ are observed through

$$Y = \eta + \varepsilon, \qquad X = \xi + \delta, \qquad (2)$$

where $E\varepsilon = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2$, $E\delta = 0_p$, $\mathrm{Cov}(\delta) = \sigma^2\,\mathrm{Diag}(\tau)$, and $\mathrm{Cov}(\varepsilon, \delta) = 0_p$. We assume the matrix $\mathrm{Diag}(\tau)$ to be known, while $\sigma^2$ may be unknown. Without loss of generality we set $\mathrm{Diag}(\tau) = I_p$; otherwise we may use transformed variables $\tilde{X} = \mathrm{Diag}(1/\sqrt{\tau})\,X$ and transformed parameters $\tilde{\beta} = \mathrm{Diag}(\sqrt{\tau})\,\beta$. The regression model is just an extremal special case of the EVM with $\mathrm{Cov}(\delta) = O_p$, i.e. $\tau = 0$. Note that in model (1) we assume a zero intercept parameter. This is not really restrictive, since estimation of the intercept in linear EVMs can easily be separated off by centering the observed data, see e.g. [7]. The model (1) can be reformulated as an implicit model:

$$0 = -\eta + \beta^t \xi = \alpha^t \zeta, \qquad (3)$$

with $\alpha^t = (-1, \beta_1, \ldots, \beta_p)$ and $\zeta^t = (\eta, \xi_1, \ldots, \xi_p)$. Instead of (2) we write

$$Z = \zeta + \varepsilon, \qquad (4)$$

with $Z^t = (Y, X^t)$ and $\mathrm{Cov}(\varepsilon) = \sigma^2 I_{p+1}$. The implicit form of the model in (3) and (4) underlines the symmetric structure of the errors-in-variables model: there is no distinction between the response variable $Y$ and the predictor variables $X$. We can formulate another implicit model just by dividing (3) by $-\beta_\ell$:

$$0 = \frac{-\eta + \beta^t \xi}{-\beta_\ell} = (\alpha^\ell)^t \zeta, \qquad (5)$$


with

$$\alpha^\ell = -\frac{\alpha}{\beta_\ell}, \quad \text{for } \ell = 1, \ldots, p, \qquad \alpha^0 = \alpha. \qquad (6)$$

The inverse parameter transformation is

$$\beta_j = -\frac{\alpha^\ell_j}{\alpha^\ell_0}, \quad \text{for } j = 1, \ldots, p. \qquad (7)$$

Let, for a vector $v = (v_0, v_1, \ldots, v_p)^t$, $v_{(-\ell)} = (v_0, v_1, \ldots, v_{\ell-1}, v_{\ell+1}, \ldots, v_p)^t$ denote the vector consisting of all components of $v$ except the $\ell$'th. In (6) we have $\alpha^\ell_\ell = -1$, thus (5) is

$$0 = \zeta_\ell - (\alpha^\ell_{(-\ell)})^t \zeta_{(-\ell)}.$$
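
The transformations (6) and (7) are easily verified numerically; a minimal sketch in Python (the value of $\beta$ is a hypothetical choice):

```python
import numpy as np

beta = np.array([1.5, -0.8])               # hypothetical parameter
alpha = np.concatenate([[-1.0], beta])     # alpha^t = (-1, beta_1, ..., beta_p)

for ell in range(1, beta.size + 1):
    alpha_ell = -alpha / beta[ell - 1]     # transformation (6); alpha_ell[ell] == -1
    beta_back = -alpha_ell[1:] / alpha_ell[0]   # inverse transformation (7)
    assert np.allclose(beta_back, beta)    # each parametrization encodes the same beta
```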

This means we can rewrite the errors-in-variables model (1) as $p + 1$ different explicit EVMs corresponding to (5):

$$Z_\ell = (\alpha^\ell_{(-\ell)})^t \zeta_{(-\ell)} + \varepsilon_\ell, \qquad Z_{(-\ell)} = \zeta_{(-\ell)} + \varepsilon_{(-\ell)}. \qquad (8)$$

Let our sample consist of observations $Z_i^t = (Y_i, X_i^t)$ at $\zeta_i^t = (\eta_i, \xi_{1i}, \ldots, \xi_{pi})$, $i = 1, \ldots, n$. Denote the observation matrix by $\mathbf{Z} = \boldsymbol{\zeta} + \boldsymbol{\varepsilon} = (Z_0, \ldots, Z_p)$, where the column $Z_\ell = (Z_{\ell 1}, \ldots, Z_{\ell n})^t$ collects the $n$ observations of the $\ell$'th variable, and let $\mathbf{Z}_{(-\ell)}$ denote the $n \times p$ matrix with the $\ell$'th column removed. The naive regression models related to (8) are

$$Z_\ell = \mathbf{Z}_{(-\ell)}\, \alpha^\ell_{(-\ell)} + \varepsilon_\ell, \quad \text{for } \ell = 0, \ldots, p. \qquad (9)$$

Here the $\ell$'th variable is always considered as the response, and the errors in the remaining predictor variables are ignored.
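
To make the setup concrete before turning to estimation, a minimal simulation sketch of model (1)-(2) with $\mathrm{Diag}(\tau) = I_p$ (sample size, $\beta$ and $\sigma$ are hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 2                                 # hypothetical sample size and dimension
beta = np.array([1.5, -0.8])                  # true parameter; all components nonzero
sigma = 0.3                                   # common error standard deviation

xi = rng.uniform(-2.0, 2.0, size=(n, p))      # unobserved true regressors
eta = xi @ beta                               # true responses, model (1)
Y = eta + sigma * rng.standard_normal(n)      # observed responses, model (2)
X = xi + sigma * rng.standard_normal((n, p))  # error-prone regressors, model (2)

# implicit form (3)-(4): alpha^t zeta = 0 holds for every observation
alpha = np.concatenate([[-1.0], beta])
zeta = np.column_stack([eta, xi])
print(np.allclose(zeta @ alpha, 0.0))         # True
```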

3  Naive estimators and total least squares

The naive least squares estimators $\hat{\alpha}^\ell$, $\ell = 0, \ldots, p$, in (9) are given by

$$\hat{\alpha}^\ell_\ell = -1, \qquad \hat{\alpha}^\ell_{(-\ell)} = (\mathbf{Z}_{(-\ell)}^t \mathbf{Z}_{(-\ell)})^{-1} \mathbf{Z}_{(-\ell)}^t Z_\ell, \qquad (10)$$

i.e. as the solution of $\min_{\alpha:\, \alpha_\ell = -1} \alpha^t \mathbf{Z}^t \mathbf{Z}\, \alpha$. The naive regression estimates for $\alpha^\ell_{(-\ell)}$ can be highly biased: with $\boldsymbol{\zeta}$ and $\boldsymbol{\varepsilon}$ denoting the matrices of true values and errors, inheriting their dimensions from $\mathbf{Z}$,

$$E\hat{\alpha}^\ell_{(-\ell)} = E\Big[\big((\boldsymbol{\zeta}_{(-\ell)} + \boldsymbol{\varepsilon}_{(-\ell)})^t (\boldsymbol{\zeta}_{(-\ell)} + \boldsymbol{\varepsilon}_{(-\ell)})\big)^{-1} (\boldsymbol{\zeta}_{(-\ell)} + \boldsymbol{\varepsilon}_{(-\ell)})^t \zeta_\ell\Big] \approx \big(\boldsymbol{\zeta}_{(-\ell)}^t \boldsymbol{\zeta}_{(-\ell)} + \sigma^2 I_p\big)^{-1} \boldsymbol{\zeta}_{(-\ell)}^t \boldsymbol{\zeta}_{(-\ell)}\, \alpha^\ell_{(-\ell)}. \qquad (11)$$
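
A sketch of (10), computing all $p + 1$ naive estimators at once (assuming the column convention above; `Z` is the $n \times (p+1)$ data matrix):

```python
import numpy as np

def naive_estimators(Z):
    """All p + 1 naive least squares estimators of (10).

    Z is the n x (p+1) observation matrix with columns Z_0 = Y, Z_1, ..., Z_p.
    Returns a list of vectors alpha-hat^ell with entry ell fixed at -1.
    """
    m = Z.shape[1]                                    # m = p + 1
    alphas = []
    for ell in range(m):
        idx = [j for j in range(m) if j != ell]       # columns of Z_(-ell)
        A = Z[:, idx]
        a = np.linalg.solve(A.T @ A, A.T @ Z[:, ell]) # (A^t A)^{-1} A^t Z_ell
        alpha = np.empty(m)
        alpha[ell] = -1.0
        alpha[idx] = a
        alphas.append(alpha)
    return alphas
```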


Using the original parametrization we have

$$\hat{\beta}_{naive,\ell} = \arg\min_\beta \frac{1}{\beta_\ell^2} \begin{pmatrix} -1 \\ \beta \end{pmatrix}^t \mathbf{Z}^t \mathbf{Z} \begin{pmatrix} -1 \\ \beta \end{pmatrix}. \qquad (12)$$

Let us denote the criterion function of the $\ell$'th naive regression by $Naive_\ell(\beta)$:

$$Naive_\ell(\beta) = \frac{1}{\beta_\ell^2} \begin{pmatrix} -1 \\ \beta \end{pmatrix}^t \mathbf{Z}^t \mathbf{Z} \begin{pmatrix} -1 \\ \beta \end{pmatrix}, \quad \text{for } \ell = 1, \ldots, p, \qquad (13)$$

and

$$Naive_0(\beta) = \begin{pmatrix} -1 \\ \beta \end{pmatrix}^t \mathbf{Z}^t \mathbf{Z} \begin{pmatrix} -1 \\ \beta \end{pmatrix}. \qquad (14)$$

The total least squares estimator $\hat{\beta}_{TLS}$ is the maximum likelihood estimator under a Gaussian error distribution in (2). It is defined as the solution of the minimization problem

$$\hat{\beta}_{TLS} = \arg\min_\beta \sum_{i=1}^n \min_{\xi \in \mathbb{R}^p} \Big( \big| Y_i - \xi^t \beta \big|^2 + \| X_i - \xi \|^2 \Big).$$

The minimization with respect to $\xi$ can be carried out explicitly for every $\beta$, see e.g. [7]. We obtain the total least squares criterion

$$TLS(\beta) = \frac{1}{1 + \|\beta\|^2} \begin{pmatrix} -1 \\ \beta \end{pmatrix}^t \mathbf{Z}^t \mathbf{Z} \begin{pmatrix} -1 \\ \beta \end{pmatrix}. \qquad (15)$$
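
Criterion (15) is a Rayleigh quotient in $a = (-1, \beta^t)^t$, so it is minimized by the eigenvector of $\mathbf{Z}^t\mathbf{Z}$ belonging to the smallest eigenvalue, rescaled so that its first entry equals $-1$. A sketch of this standard computation (not spelled out in the paper):

```python
import numpy as np

def tls_criterion(beta, Z):
    """Criterion (15); Z is the n x (p+1) data matrix with first column Y."""
    a = np.concatenate([[-1.0], beta])
    return a @ (Z.T @ Z) @ a / (1.0 + beta @ beta)

def tls_estimator(Z):
    """Minimizer of (15) via the eigenvector of Z^t Z with smallest eigenvalue."""
    w, V = np.linalg.eigh(Z.T @ Z)      # eigenvalues in ascending order
    v = V[:, 0]                         # direction of the minimizing a
    return -v[1:] / v[0]                # rescale so that a = (-1, beta^t)^t
```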

Comparing the criterion functions in (13) and (14) with (15), we obtain

$$\sum_{\ell=0}^p Naive_\ell(\beta)^{-1} = TLS(\beta)^{-1}$$

and

$$\sum_{\ell=0}^p \max_\beta Naive_\ell(\beta)^{-1} \ge \max_\beta TLS(\beta)^{-1}. \qquad (16)$$

If all naive estimators coincide, this implies that they also coincide with TLS. We use this property to determine the length of the backward step. In the simple linear relationship SYMEX delivers the same value for both naive approaches, which is an arbitrarily exact approximation of the total least squares estimator, see Theorem 6.1.
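
The identity in (16) is easy to verify numerically; a small check with hypothetical data and an arbitrary $\beta$:

```python
import numpy as np

def naive_crit(ell, beta, Z):
    """Criteria (13) and (14); Z is the n x (p+1) data matrix."""
    a = np.concatenate([[-1.0], beta])
    q = a @ (Z.T @ Z) @ a
    return q if ell == 0 else q / beta[ell - 1] ** 2

rng = np.random.default_rng(1)
Z = rng.standard_normal((50, 3))                 # hypothetical data, p = 2
beta = np.array([0.7, -1.2])

lhs = sum(1.0 / naive_crit(ell, beta, Z) for ell in range(3))
a = np.concatenate([[-1.0], beta])
rhs = (1.0 + beta @ beta) / (a @ (Z.T @ Z) @ a)  # TLS(beta)^{-1} from (15)
print(np.isclose(lhs, rhs))                      # True
```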

4  The SIMEX method

We briefly explain how SIMEX works by stating the procedure. Note that it differs slightly from [6] and [10], using a more efficient sampling scheme and a simpler, more stable model for extrapolation. Nevertheless the procedure follows the same arguments and ideas.

1. Simulation: For values $\lambda$ from a set $\{\lambda_k;\ k = 0, \ldots, K\}$, with $\lambda_0 = 0$, simulate samples with increased measurement variance as

$$X_{bij}(\lambda) = X_{ij} + \lambda^{1/2} \delta^*_{bij}, \qquad b = 1, \ldots, n_b,\ i = 1, \ldots, n,\ j = 1, \ldots, p,$$

with i.i.d. $\delta^*_{bij} \sim N(0, 1)$. Define the new observation matrix $\mathbf{Z}(\lambda)$ using $X_{bij}(\lambda)$ instead of $X_{ij}$.

2. Estimation: Calculate the naive regression estimate

$$\hat{\beta}(\lambda) = \big(\mathbf{Z}_{(-1)}(\lambda)^t \mathbf{Z}_{(-1)}(\lambda)\big)^{-1} \mathbf{Z}_{(-1)}(\lambda)^t Y$$

for all values of $\lambda$.

3. Model fit: The parametric model

$$\beta(\lambda, \theta) = \big(\mathbf{Z}_{(-1)}^t \mathbf{Z}_{(-1)} + \lambda I_p\big)^{-1} \theta \qquad (17)$$

with unknown parameter $\theta$ is used to describe the expectation of $\hat{\beta}(\lambda)$ as a function of $\lambda$, see (11) for a justification. Fit this model by weighted least squares,

$$\hat{\theta} = \arg\min_\theta \sum_{k=0}^K w_k \big\| \hat{\beta}(\lambda_k) - \beta(\lambda_k, \theta) \big\|^2, \qquad (18)$$

with $w_0 \gg 1$ and $w_k = 1$ for $k > 0$. A large value of $w_0$ can be used to force the model to fit the most informative estimate, i.e. $\lambda = 0$, almost perfectly.

4. Extrapolation: The model (17) is used to extrapolate to the case of zero measurement variance using $\lambda = -\sigma^2$. This leads to the SIMEX estimate

$$\hat{\beta}_{SIMEX} = \beta(-\sigma^2, \hat{\theta}). \qquad (19)$$
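
A compact sketch of this SIMEX variant (a hypothetical helper, not the authors' code; the $\lambda$ grid, $n_b$ and $w_0$ are arbitrary choices, and the extrapolation model is taken in the per-observation normalization $\frac{1}{n} X^t X + \lambda I_p$ used in (20) below, so that $\lambda = -\sigma^2$ removes the bias term from the Gram matrix):

```python
import numpy as np

def simex(X, Y, sigma2, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0), nb=20, w0=1e4, seed=0):
    """SIMEX sketch: p regressors, no intercept, known error variance sigma2."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    M = X.T @ X / n                     # per-observation analogue of Z_(-1)^t Z_(-1)
    design, resp = [], []
    for k, lam in enumerate(lambdas):
        # Step 1: n_b pseudo samples with measurement variance inflated by lam
        Xl = np.vstack([X + np.sqrt(lam) * rng.standard_normal((n, p))
                        for _ in range(nb)])
        Yl = np.tile(Y, nb)
        # Step 2: naive least squares estimate on the stacked sample
        bhat = np.linalg.solve(Xl.T @ Xl, Xl.T @ Yl)
        # Step 3: model (17) is linear in theta -> weighted LS rows for (18)
        w = np.sqrt(w0) if k == 0 else 1.0
        design.append(w * np.linalg.inv(M + lam * np.eye(p)))
        resp.append(w * bhat)
    theta = np.linalg.lstsq(np.vstack(design), np.concatenate(resp), rcond=None)[0]
    # Step 4: extrapolate back to zero measurement variance, lambda = -sigma^2
    return np.linalg.solve(M - sigma2 * np.eye(p), theta)
```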

5  The SYMEX procedure

We perform the following SIMEX-type procedure. Steps 1-3 are carried out for each of the models $\ell = 0, \ldots, p$; Steps 4 and 5 then determine the SYMEX estimator. A sketch of the complete procedure follows the description below.

1. Simulation: For values $\lambda$ from a set $\{\lambda_k;\ k = 0, \ldots, K\}$, with $\lambda_0 = 0$, simulate $p$-variate samples with increased measurement variance as

$$Z_{(-\ell)bij}(\lambda) = Z_{(-\ell)ij} + \lambda^{1/2} \varepsilon^*_{bij}, \qquad b = 1, \ldots, n_b,\ i = 1, \ldots, n,\ j = 1, \ldots, p,$$

with i.i.d. $\varepsilon^* \sim N_{n_b n p}(0, I)$, i.e. adding independent errors with variance $\lambda$ to each component of the resulting matrix. Define the observation matrix $\mathbf{Z}(\lambda)$ as the matrix containing the values $Z_{(-\ell)bij}(\lambda)$. Note that $\mathbf{Z}(\lambda_0)$ contains the $n_b$-times replicated original sample.

2. Estimation: For all simulated samples, i.e. for all values $\lambda_k$, calculate the naive estimates $\hat{\alpha}^\ell(\lambda_k)$ in model (9) by

$$\hat{\alpha}^\ell_{(-\ell)}(\lambda) = \big(\mathbf{Z}_{(-\ell)}(\lambda)^t \mathbf{Z}_{(-\ell)}(\lambda)\big)^{-1} \mathbf{Z}_{(-\ell)}(\lambda)^t Z_\ell, \qquad \hat{\alpha}^\ell_\ell(\lambda) = -1.$$

3. Model fit: Fit the model

$$\alpha^\ell_{(-\ell)}(\lambda, \theta) = \Big(\tfrac{1}{n} \mathbf{Z}_{(-\ell)}^t \mathbf{Z}_{(-\ell)} + \lambda I_p\Big)^{-1} \theta \quad \text{and} \quad \alpha^\ell_\ell(\lambda, \theta) = -1, \qquad (20)$$

with parameter $\theta \in \mathbb{R}^p$. The estimate of $\theta$ is given as

$$\hat{\theta}^\ell = \arg\min_\theta \sum_{k=0}^K w_k \big\| \hat{\alpha}^\ell(\lambda_k) - \alpha^\ell(\lambda_k, \theta) \big\|^2 \qquad (21)$$

with $w_0 \gg 1$ and $w_k = 1$ for $k > 0$. Using the inverse transform (7) we define $\hat{\beta}^\ell_j(\lambda) = -\alpha^\ell_j(\lambda, \hat{\theta}^\ell)/\alpha^\ell_0(\lambda, \hat{\theta}^\ell)$ for $j = 1, \ldots, p$ and arbitrary $\lambda$.

4. Extrapolation: Determine an optimal value $\lambda^*$ for extrapolation to the case of zero measurement variance as

$$\lambda^* = \arg\min_{\lambda \in [\lambda_{min}, 0]} \sum_{\ell=0}^p \sum_{m=\ell+1}^p \big\| \hat{\beta}^\ell(\lambda) - \hat{\beta}^m(\lambda) \big\|^2 \qquad (22)$$

under the constraint

$$\lambda_{min} = \min\Big\{\lambda \in \mathbb{R} :\ \tfrac{1}{n} \mathbf{Z}_{(-\ell)}^t \mathbf{Z}_{(-\ell)} + \lambda I_p \succ 0 \ \text{ for } \ell = 0, \ldots, p\Big\}.$$


5. SYMEX estimation: Estimate $\beta$ by

$$\hat{\beta}_{SYMEX} = \bar{\beta}(\lambda^*) = \frac{1}{p+1} \sum_{\ell=0}^p \hat{\beta}^\ell(\lambda^*). \qquad (23)$$

The philosophy of both SIMEX and SYMEX is to extrapolate to the case of zero variance, i.e. to use $\lambda = -\sigma^2$ in the extrapolation step. This means that the value $-\lambda^*$ can be interpreted as an estimate of the measurement error variance $\sigma^2$.
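
The whole procedure fits into a short sketch (a hypothetical helper, not the authors' implementation; the $\lambda$ grid for (22) is searched numerically, and $n_b$, $w_0$ and the grid resolution are arbitrary choices):

```python
import numpy as np

def symex(Z, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0), nb=20, w0=1e4, seed=0):
    """SYMEX sketch; Z is the n x (p+1) data matrix with columns (Y, X_1, ..., X_p)."""
    rng = np.random.default_rng(seed)
    n, m = Z.shape                                    # m = p + 1
    Ms, thetas = [], []
    for ell in range(m):                              # Steps 1-3 for every model ell
        idx = [j for j in range(m) if j != ell]
        M = Z[:, idx].T @ Z[:, idx] / n               # (1/n) Z_(-ell)^t Z_(-ell)
        design, resp = [], []
        for k, lam in enumerate(lambdas):
            A = np.vstack([Z[:, idx] + np.sqrt(lam) * rng.standard_normal((n, m - 1))
                           for _ in range(nb)])       # Step 1: simulated predictors
            a = np.linalg.solve(A.T @ A, A.T @ np.tile(Z[:, ell], nb))  # Step 2
            w = np.sqrt(w0) if k == 0 else 1.0        # Step 3: fit model (20) by (21)
            design.append(w * np.linalg.inv(M + lam * np.eye(m - 1)))
            resp.append(w * a)
        Ms.append(M)
        thetas.append(np.linalg.lstsq(np.vstack(design),
                                      np.concatenate(resp), rcond=None)[0])

    def betas(lam):
        """beta-hat^ell(lam) for all ell, via model (20) and inverse transform (7)."""
        out = []
        for ell in range(m):
            part = np.linalg.solve(Ms[ell] + lam * np.eye(m - 1), thetas[ell])
            alpha = np.insert(part, ell, -1.0)
            out.append(-alpha[1:] / alpha[0])
        return out

    def spread(lam):                                  # objective in (22)
        b = betas(lam)
        return sum(np.sum((b[i] - b[j]) ** 2)
                   for i in range(m) for j in range(i + 1, m))

    # Step 4: grid search over (lambda_min, 0]; Step 5: average, equation (23)
    lam_min = max(-np.linalg.eigvalsh(M_)[0] for M_ in Ms) + 1e-6
    lam_star = min(np.linspace(lam_min, 0.0, 201), key=spread)
    return np.mean(betas(lam_star), axis=0), lam_star
```

For $p = 1$ this grid search reproduces, up to grid resolution, the fixed point derived in closed form in the next section.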

6  The simple linear relationship

Let us illustrate the SYMEX idea for the simple linear relationship with $p = 1$. We introduce the usual notation

$$s_{xx} = \frac{1}{n}\sum_{i=1}^n X_i^2, \qquad s_{yy} = \frac{1}{n}\sum_{i=1}^n Y_i^2, \qquad s_{xy} = \frac{1}{n}\sum_{i=1}^n X_i Y_i.$$

The naive estimators in (10) are

$$\hat{\alpha}^0 = \begin{pmatrix} -1 \\ \frac{s_{xy}}{s_{xx}} \end{pmatrix}, \qquad \hat{\alpha}^1 = \begin{pmatrix} \frac{s_{xy}}{s_{yy}} \\ -1 \end{pmatrix}.$$

Within the original parametrization (7) they are given by

$$\hat{\beta}_{naive,0} = \frac{s_{xy}}{s_{xx}}, \qquad \hat{\beta}_{naive,1} = \frac{s_{yy}}{s_{xy}}.$$

The total least squares estimator is obtained as

$$\hat{\beta}_{TLS} = \frac{s_{yy} - s_{xx} + \sqrt{(s_{yy} - s_{xx})^2 + 4 s_{xy}^2}}{2 s_{xy}}. \qquad (24)$$
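
In code these closed forms are one-liners (a sketch; `x` and `y` are the observed samples):

```python
import numpy as np

def naive_and_tls(x, y):
    """Closed forms for p = 1: the two naive estimators and TLS (24)."""
    sxx, syy, sxy = np.mean(x * x), np.mean(y * y), np.mean(x * y)
    b_naive0 = sxy / sxx                 # naive regression of y on x
    b_naive1 = syy / sxy                 # naive inverse regression
    b_tls = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return b_naive0, b_naive1, b_tls
```

One can check that for $s_{xy} > 0$ the TLS estimate always lies between the two naive estimates.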

We simulate samples

$$x_{i,j}(\lambda_k) = x_i + \lambda_k^{1/2} \delta^*_{ijk}, \qquad y_{i,j}(\lambda_k) = y_i + \lambda_k^{1/2} \varepsilon^*_{ijk}, \qquad i = 1, \ldots, n,\ j = 1, \ldots, n_b,\ k = 0, \ldots, K.$$

Denote by $s_{x(\lambda)x(\lambda)}$ and $s_{x(\lambda)y}$ the quadratic forms

$$s_{x(\lambda)x(\lambda)} = \frac{1}{n}\sum_{i=1}^n \frac{1}{n_b}\sum_{j=1}^{n_b} x_{i,j}(\lambda)^2 \quad \text{and} \quad s_{x(\lambda)y} = \frac{1}{n}\sum_{i=1}^n \frac{1}{n_b}\sum_{j=1}^{n_b} x_{i,j}(\lambda)\, y_i \qquad (25)$$


and define $s_{xy(\lambda)}$ and $s_{y(\lambda)y(\lambda)}$ in the same way. The estimators in Step 2 are

$$\hat{\alpha}^0(\lambda) = \begin{pmatrix} -1 \\ \frac{s_{x(\lambda)y}}{s_{x(\lambda)x(\lambda)}} \end{pmatrix}, \qquad \hat{\alpha}^1(\lambda) = \begin{pmatrix} \frac{s_{xy(\lambda)}}{s_{y(\lambda)y(\lambda)}} \\ -1 \end{pmatrix}.$$

In Step 3 we apply the models

$$\alpha^0(\lambda, \theta) = \begin{pmatrix} -1 \\ \frac{\theta}{s_{xx} + \lambda} \end{pmatrix}, \qquad \alpha^1(\lambda, \theta) = \begin{pmatrix} \frac{\theta}{s_{yy} + \lambda} \\ -1 \end{pmatrix}, \qquad \theta \in \mathbb{R}.$$

The fit in (21) yields

$$\theta^0 = \frac{\sum_{k=0}^K \frac{1}{s_{xx} + \lambda_k} \frac{s_{x(\lambda_k)y}}{s_{x(\lambda_k)x(\lambda_k)}}}{\sum_{k=0}^K \Big(\frac{1}{s_{xx} + \lambda_k}\Big)^2}, \qquad \theta^1 = \frac{\sum_{k=0}^K \frac{1}{s_{yy} + \lambda_k} \frac{s_{xy(\lambda_k)}}{s_{y(\lambda_k)y(\lambda_k)}}}{\sum_{k=0}^K \Big(\frac{1}{s_{yy} + \lambda_k}\Big)^2}. \qquad (26)$$

Hence

$$\hat{\beta}^0_1(\lambda) = \alpha^0_1(\lambda, \theta^0) = \frac{\theta^0}{s_{xx} + \lambda}, \qquad \hat{\beta}^1_1(\lambda) = \frac{1}{\alpha^1_0(\lambda, \theta^1)} = \frac{s_{yy} + \lambda}{\theta^1}.$$
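
A sketch of the fit (26) and the two extrapolation curves (a hypothetical helper; $n_b$ and the $\lambda$ grid are arbitrary choices):

```python
import numpy as np

def fit_theta(x, y, lambdas, nb=50, seed=0):
    """Fit theta^0 and theta^1 as in (26) from simulated samples (25)."""
    rng = np.random.default_rng(seed)
    sxx, syy = np.mean(x * x), np.mean(y * y)
    num0 = den0 = num1 = den1 = 0.0
    for lam in lambdas:
        xl = x + np.sqrt(lam) * rng.standard_normal((nb, x.size))  # x_{i,j}(lam)
        yl = y + np.sqrt(lam) * rng.standard_normal((nb, y.size))  # y_{i,j}(lam)
        num0 += np.mean(xl * y) / np.mean(xl * xl) / (sxx + lam)
        den0 += (sxx + lam) ** -2
        num1 += np.mean(x * yl) / np.mean(yl * yl) / (syy + lam)
        den1 += (syy + lam) ** -2
    return num0 / den0, num1 / den1

def beta_curves(theta0, theta1, sxx, syy, lam):
    """The extrapolation curves beta^0_1(lam) and beta^1_1(lam)."""
    return theta0 / (sxx + lam), (syy + lam) / theta1
```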

In the extrapolation step we determine $\lambda^*$ as the fixed point $\hat{\beta}^0_1(\lambda^*) = \hat{\beta}^1_1(\lambda^*)$, with the two solutions

$$\lambda^*_{1,2} = -\frac{1}{2}(s_{yy} + s_{xx}) \pm \frac{1}{2}\sqrt{(s_{yy} - s_{xx})^2 + 4\theta^0\theta^1}.$$

We exclude one of the solutions by the following argument. A reasonable simulation in Step 1 should not change the direction of the dependence between $X$ and $Y$; we therefore suppose that $s_{xy(\lambda_k)}$ and $s_{x(\lambda_k)y}$ have the same sign for all $k = 0, \ldots, K$, so that $\theta^0\theta^1 \ge 0$. Then

$$\lambda^*_2 = -\frac{1}{2}(s_{yy} + s_{xx}) - \frac{1}{2}\sqrt{(s_{yy} - s_{xx})^2 + 4\theta^0\theta^1} \le -\frac{1}{2}(s_{yy} + s_{xx}) - \frac{1}{2}\sqrt{(s_{yy} - s_{xx})^2} \le -s_{yy}.$$

Since $s_{yy} = \beta^2 \frac{1}{n}\sum_i \xi_i^2 + \sigma^2 + o_P(1)$, the value $-\lambda^*_2$ would systematically overestimate $\sigma^2$; thus $\lambda^*_1$ is the relevant solution.
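
Putting the pieces together (reusing `fit_theta` from the sketch above; the data-generating values are hypothetical), the admissible root $\lambda^*_1$ yields an estimate close to the TLS estimate (24), and $-\lambda^*_1$ estimates $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(42)
n, beta, sigma = 500, 1.2, 0.5
xi = rng.uniform(-2.0, 2.0, n)
x = xi + sigma * rng.standard_normal(n)      # observed regressor
y = beta * xi + sigma * rng.standard_normal(n)

sxx, syy, sxy = np.mean(x * x), np.mean(y * y), np.mean(x * y)
theta0, theta1 = fit_theta(x, y, lambdas=np.linspace(0.0, 2.0, 9))

# admissible fixed point lambda*_1 of beta^0_1(lam) = beta^1_1(lam)
lam1 = -0.5 * (syy + sxx) + 0.5 * np.sqrt((syy - sxx) ** 2 + 4 * theta0 * theta1)
b_symex = theta0 / (sxx + lam1)              # equals (syy + lam1)/theta1 here
b_tls = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

print(b_symex, b_tls)                        # close to each other and to beta
print(-lam1, sigma ** 2)                     # -lambda*_1 estimates sigma^2
```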