Linearization Variance Estimators for Longitudinal Survey Data A. Demnati and J. N. K. Rao A. Demnati, Social Survey Methods Division, Statistics Canada, Ottawa, Ontario, Canada, K1A 0T6
[email protected] J. N. K. Rao, School of Mathematics and Statistics, Carleton University, Ottawa, Ontario, Canada, K1S 5B6
[email protected] Summary In survey sampling, Taylor linearization is often used to obtain variance estimators for nonlinear finite population parameters, such as ratios, regression and correlation coefficients, which can be expressed as smooth functions of totals. Taylor linearization is generally applicable to any sampling design, but it can lead to multiple variance estimators that are asymptotically design unbiased under repeated sampling. The choice among the variance estimators requires other considerations such as (i) approximate unbiasedness for the model variance of the estimator under an assumed model, (ii) validity under a conditional repeated sampling framework. Demnati and Rao (2001) proposed a new approach to deriving Taylor linearization variance estimators that leads directly to a unique variance estimator that satisfies the above considerations. In this paper, we extended the work of Demnati and Rao (2001) to deal with longitudinal surveys which lead to dependent observations and to multiple weights on the same unit. We consider a variety of longitudinal sampling designs, covering panel surveys, household panel surveys as well as rotating surveys. Key Words: Multiple weights; Repeated surveys; Taylor linearization. 1. Introduction Taylor linearization is a popular method of variance estimation for complex statistics such as ratio and regression estimators and logistic regression coefficient estimators. It is generally applicable to any sampling design that permits unbiased variance estimation for linear estimators, and it is computationally simpler than a resampling method such as the jackknife. However, it can lead to multiple variance estimators that are asymptotically design unbiased under repeated sampling. The choice among the variance estimators, therefore, requires other considerations such as (i) approximate unbiasedness for the model variance of the estimator under an assumed model, (ii) validity under a conditional repeated sampling framework. For example, in the context of simple random sampling and the ratio estimator, YˆR = ( y / x ) X , of the population total Y , Royall and Cumberland (1981) showed that a commonly used linearization variance estimator ϑ L = N 2 (n −1 − N −1 ) s z2 does not track the conditional variance of YˆR given x , unlike the jackknife variance estimator
ϑ J . Here y and x are the sample means, X is the known population total of an auxiliary variable x , s z2 is the sample variance of the residuals z i = y i − ( y / x ) x i and (n, N ) denote the sample and population sizes. By linearizing the jackknife variance estimator, ϑ J , we obtain a different linearization variance estimator, ϑ JL = ( X / x ) 2 ϑ L , which also tracks the conditional variance as well as the unconditional variance, where X = X / N is the mean of x . As a result, ϑ JL or ϑ J may be preferred over ϑ L . Valliant (1993) obtained ϑ JL for the post-stratified estimator and conducted a simulation study to demonstrate that both ϑ J and ϑ JL possess good conditional properties given the estimated post-strata counts. Särndal, Swensson and Wretman (1989) showed that ϑ JL is both asymptotically design unbiased and asymptotically model unbiased in the sense of E (ϑ ) = Var (Yˆ ) , where E denotes model expectation and Var (Yˆ ) is the model variance m
JL
m
R
m
m
R
of YˆR under a Αratio model≅: E m ( y i ) = β x i ; i = 1,..., N and the y i ’s are independent with model variance Varm ( y i ) = σ 2 xi , σ 2 > 0 . Thus, ϑ JL is a good choice from either the design-based or the model-based perspective. Demnati and Rao (2001) proposed a new approach to variance estimation that is theoretically justifiable and at the same time leads directly to a ϑ JL -type variance estimator for general designs. Afterward, Demnati and Rao (2002) extended their
1
Page 138
method to the case of missing responses when adjustment for complete nonresponse and imputation based on smooth functions of observed values, in particular ratio imputation, are used. While several methods have been proposed to correctly estimate the variance of an estimator under crosssectional data, methods of variance estimation from longitudinal data are limited, despite an increase in recent years in the number of longitudinal surveys. The dynamic nature of longitudinal populations such as births and deaths, the limitation of longitudinal samples to take into account such features of longitudinal populations, and multiple weights are one of the difficulties inherent to longitudinal data in addition to the customary cross-sectional difficulties such as complex designs, missing data and so on. This paper is a first attempt to extend the Demnati and Rao (2001,2002) method to the case of longitudinal survey data. Section 2 gives a brief account of the method for the case of cross-sectional data, and section 3 presents the extension to longitudinal survey data. 2. Cross-Sectional Data
We give a brief account of the Demnati and Rao (2001) method under full response. Suppose an estimator θˆ of a parameter θ can be expressed as a differentiable function g (Yˆ ) of estimated totals Yˆ = (Yˆ1 ,..., Yˆm ) T , where Yˆ j = ∑ i∈U d i ( s ) y ij is an estimator of the population Y j , j = 1,.., m , where d i ( s ) = 0 if the unit i is not in the sample s , U is the set of population units, and θ = g (Y ) with Y = (Y1 ,..., Ym ) T . We may write θˆ as θˆ = f (d ( s ), A y ) and
θ = f (1, A y ) , where A y is an m × N matrix with j th column y j = ( y1 j ,..., y mj ) T , j = 1,..., N , d ( s ) = (d1 ( s ),..., d N ( s )) T and 1 is the N -vector of 1's. For example, if θˆ denotes the ratio estimator YˆR = X (∑ i∈U d i ( s ) y i ) /( ∑ i∈U d i ( s ) x i ) , then m = 2 , y = y , y = x and f (1, A ) reduces to the total Y , noting that X (Y / X ) = Y . Note that Yˆ is a function of 1i
i
2i
i
R
y
d (s ) , y , x and the known total X , but we dropped X for simplicity and write YˆR = f (d ( s ), y , x ) . If the Horvitz-
Thompson weights are used, then d i ( s ) = 1 / π i for i ∈ s , where π i is the probability of selecting unit i in the sample s . Let f (b, A y ) = f (b) for arbitrary real numbers b = (b1 ,..., b N ) T . Demnati and Rao (2001) showed that the Taylor linearization of θˆ − θ , namely
θˆ − θ = g (Yˆ ) − g (Y ) ≈ (∂g (a ) / ∂a )T is equivalent to
a =Y
(Yˆ − Y ),
θˆ − θ ≈ ∑ kN=1 (∂f (b) / ∂bk ) b =1 (d k ( s) − 1)
(2.1)
=~ z T (d ( s ) − 1),
z = ( ~z1 ,..., ~z N ) T with ~ z k = ∂f (b) / ∂bk where ∂g (a ) / ∂a = (∂g (a ) / ∂a1 ,..., ∂g (a ) / ∂a m ) T and ~
b =1
. It follows from (2.1) that
a variance estimator of θˆ is approximately given by the variance estimator of the estimated total ∑ d i ( s ) ~z i = Yˆ ( ~z , d ( s )) ; that is, var(θˆ) ≈ ϑ ( ~z , d ( s )) , where ϑ ( y, d ( s )) denotes the variance estimator of Yˆ = Yˆ ( y, d ( s )) in operator notation using
the vector of design weights d (s ) . Now we replace ~z k by z k = ∂f (b) / ∂bk
b= d(s)
, since ~z k =s are unknown, to get a
linearization variance estimator
ϑ L (θˆ) = ϑ ( z, d ( s )) .
(2.2)
Note that ϑ L (θˆ) given by (2.2) is simply obtained from the formula ϑ ( y, d ( s )) for Yˆ = Yˆ ( y, d ( s )) by replacing y i by z i for i ∈ s . Note that we do not first evaluate the partial derivatives ∂f (b) / ∂bk at b = 1 to get ~z k and then substitute estimates for the unknown components of ~ z . Our method, therefore, is similar in spirit to Binder(1996)’s approach. The ˆ variance estimator ϑ (θ ) is valid because z is a consistent estimator of ~z . L
k
k
2
Page 139
Suppose θˆ is the ratio estimator YˆR = X (∑ d i ( s ) y i ) /( ∑ d i ( s ) x i ) , where ∑ denotes summation over i ∈U . Then f (b) = X (∑ b y ) /( ∑ b x ) = XYˆ ( y , b) / Yˆ ( x, b) and i
i
i i
z k = ∂f (b) / ∂bk
b =d(s)
=
(
)
X y k − Rˆ x k . Xˆ
For simple random sampling, ϑ L (YˆR ) = ϑ ( z , d ( s )) agrees with ϑ JL = ( X / x ) 2 ϑ L . The above derivation is simple and natural. On the other hand, in the standard linearization method, θˆ is first expressed in terms of elementary components Yˆ1 ,..., Yˆm as g (Yˆ ) and the partial derivatives ∂g (a ) / ∂a j are then evaluated at a = Y . It is interesting to note that all the components Yˆ use the same weights d ( s ) and our approach always takes i
j
partial derivatives of f (b) with respect to bk at b = d ( s ) . It is not necessary to first express θˆ in terms of elementary components. Demnati and Rao (2001) applied the method to a variety of problems, covering regression calibration estimators of a total Y and other estimators defined either explicitly or implicitly as solutions of estimating equations. They obtained a new variance estimator for a general class of calibration estimators that includes generalized raking ratio and generalized regression estimators. They also extended the method to two-phase sampling and obtained a variance estimator that makes fuller use of the first phase sample data compared to traditional linearization variance estimators. 3. Longitudinal Data
As in the usual finite population situation, we identify a population of size N during a fixed period of time by a set of indices P = {1,..., N } . Let P (t ) denote the cross-sectional population at time t and let J i(t ) denote the cross-sectional population membership indicator for unit i , i = 1,..., N , i.e. J i(t ) = 1 if i ∈ P (t ) and J i(t ) = 0 if i ∉ P (t ) with J +(t ) = ∑ i J i(t ) = N (t ) the size of the cross-sectional population P (t ) , where ∑ i denotes summation over the population units i . In this section, we consider variance estimation of a differentiable function of estimated cross-sectional totals under a series of longitudinal sampling designs assuming full response and no post-stratification. It is shown that a variance estimator can be obtained through variance estimation of a design-weighted estimator of total of a synthetic population. The case of estimators obtained as solution of estimating equations can be obtained along the lines of Demnati and Rao (2001), and the case of estimators obtained from survey data with missing responses can be obtained along the lines of Demnati and Rao (2002). In this version, details of these extensions are omitted for space reason.
To discuss multiple weights inherent in repeated surveys, suppose an estimator θˆ of a parameter θ can be expressed as a differentiable function θˆ = g (Yˆ ) of estimated cross-sectional totals Yˆ = (Yˆ (1) ,..., Yˆ (T ) ) T , where Yˆ (t ) is an estimator of the population total Y (t ) = ∑ i J i(t ) y i(t ) at time t , t = 1,..., T , and θ = g (Y ) with Y = (Y (1) ,..., Y (T ) ) T . We may (t ) ~ write Yˆ (t ) as Yˆ (t ) = ∑ Gg ∑ i d gi(t ) ( s g(t ) ) y i(t ) = 1GT ( t ) A (~t ) y (t ) with specified constants G (t ) , where d ~ ~ ~ A (~t ) = (d 1(t ) ( s1(t ) ),..., d (t()t ) ( s (t ()t ) )) T , d g(t ) ( s g(t ) ) are vectors of cross-sectional weights at time t , 1G (t ) is the G (t ) -vector of d
G
1’s and y
(t)
=
( y1(t ) ,...,
G
y N(t ) ) T
. In this section we show that the estimator θˆ can be written as θˆ = f (d (s )) under a wide
class of longitudinal survey designs. Therefore, it follows from Demnati and Rao (2001) that a linearization variance estimator of θˆ is given by
ϑ L (θˆ) = ϑ ( z, d ( s )) ,
(3.1)
d ( s ) = vech( AdT ) ,
(3.2)
where
Ad =
( Ad(1)T
,...,
Ad(T )T ) T
,
Ad(t )
=
(d 1(t ) ( s1(t ) ),..., d (t()t ) G
( s (t ()t ) G
T
)) ,
d g(t ) ( s )
are vectors of design weights, vech( A) denotes the
vector obtained by stacking distinct column of A underneath each other, in order from left to right, and
3
Page 140
z k = ∂f (b) ∂bk
b =d ( s )
.
(3.3)
3.1 Independent Samples
Under independent samples we have G (t ) = 1 , Yˆ (t ) = Yˆ ( y (t ) , d (t ) ( s (t ) )) = ∑ i y i(t ) d i(t ) ( s (t ) ) , and d ( s ) = vech(d (1) ( s (1) ),..., d (T ) ( s (T ) )) ,
(3.4)
where d (t ) ( s (t ) ) = (d 1(t ) ( s (t ) ),..., d N(t ) ( s (t ) )) T is the vector design weights at time t . If the Horvitz-Thompson weights are used then d i(t ) ( s (t ) ) = J i(t ) a i(t ) ( s ) / π i(t ) , ai(t ) = 1 if unit i belongs to the cross-sectional sample s (t ) and ai(t ) = 0 if not, and π i(t ) = Pr(i ∈ s (t ) / J i(t ) = 1) . 3.2 Panel Sample
G
(t )
Under panel sample situation, where observations are collected over time on the same sampled units, we have = 1 , Yˆ (t ) = Yˆ ( y (t ) , d ( 0) ( s (0) )) and d ( s ) = d ( 0) ( s ) ,
where t = 0 ).
d i(0) ( s ( 0) )
(3.5)
, i = 1,..., N , are the basic survey weights of the unique sample selected at the initial wave (say at time
3.3 Household Panel Sample
Under household panel sample situation, where observations are collected over time not only on the sampled units selected at the initial wave but also on non-sampled units who join households containing at least one sampled unit, we ~ have G (t ) = 1 , Yˆ (t ) = ∑ y (t ) d (t ) ( s (t ) ) , and i
i
d ( s ) = d ( 0) ( s ) ,
(3.6) ~ (t ) ( t ) (t ) (t ) ( 0) (t ) (t ) (t ) (t ) (t ) (t ) ( 0) ( 0) (0) where d i ( s ) = ∑ j J j I ji d j ( s )α ji , α ji = α i = 1 / ∑ k J k I ki a k ( s ) , I ji is an indicator variable indicating if the unit j lives in the same household as unit i at time t , and a k( 0) ( s ( 0) ) = d k( 0) ( s ( 0) ) E (a k( 0) ( s (0) )) . The value for α (jit ) is obtained through the multiplicity approach, Sirken (1970). 3.4 Multiple Samples
New entrants who have zero probability to join the initial population units are not covered, unless a supplementary sample is taken. That’s why multiple samples are used at the estimation stage at time t under household panel surveys. Under multiple samples situation each sample s g(t ) , g = 1,..., G (t ) can be viewed as a sample from the same cross-sectional population using a different sampling frame Fg(t ) . The frames are probably incomplete, but the union of the G (t ) frames ~ covers the entire cross-sectional population. The G (t ) frames overlap. Let J gi(t ) be the conditional frame membership ~ ~ indicator variable, i.e., J gi(t ) = 1 if i ∈ Fg(t ) | J i(t ) = 1 and J gi(t ) = 0 otherwise. When combining the G (t ) samples, the weights have to be adjusted to account for the multiplicity of the sampling frames: ~ d (t ) ( s ) = d (t ) ( s )α~ (t ) ,
(3.7) gi gi ~ where α~ gi(t ) are constants satisfying the constraints ∑ g J gi(t )α~ gi(t ) = 1 and d gi(t ) ( s ) are the cross-sectional weights resulting ~ ~ from frame Fg(t ) . A first choice for α~ gi(t ) is α~ gi(t ) = 1 / J +(ti ) , where J +(ti ) is the multiplicity of unit i at time t , i.e., the total gi
number of sampling frames reporting unit i
at time t , (Sirken, 1970). A second choice for α~ gi(t ) 4
Page 141
is
~
~
α~ gi(t ) = J gi(t ) f g(t ) / ∑ g J gi(t ) f g(t ) in which case one wants to take into account the sampling fraction f g(t ) of each sampling frame (Kalton and Anderson, 1986). Under multiple frames, the estimated total Yˆ (t ) can be expressed as before as 1GT ( t ) A (~t ) y (t ) . Hence, estimation of the variance can be done along the line of sub-section 3.1-3.3. Details are omitted for d
simplicity. 3.5 Rotating Samples
Rotating samples are used as a compromise between independent samples and panel surveys. Rotating samples control overlap between successive time periods and provide unbiased cross-sectional estimates while taking advantage of the correlation between two time periods. For example, in labor-force surveys which are conducted monthly in many countries, overlap is controlled by re-interviewing a high fraction of previous selected households (say G − 1 rotation groups) while new households are selected for a first interview(say one rotation group) for a total of G rotation groups or panel samples. Each household remains in the same rotation group for a predetermined number of months. Unlike multiple samples, rotating groups do not overlap. A simple example of a rotating group is given by the following two steps: (a)
Split the population at random into G groups U g ( g = 1,..., G ) each of size N / G .
(b)
Split each group at random into N / n samples each of size n / G .
The following table illustrates a rotation pattern with G = 2 , N = 20 , n = 4 and 50% overlap. Table 1 : Example of Rotating Scheme Group Sample U1
s11
Time 1 √
s12 … s15 U2
s 21
√
2
3
√
√
√ √
s 22 … s 25
Cross-sectional sample
…
s (1) = s11 ∪ s 21
s ( 2) = s12 ∪ s 21
s (3) = s12 ∪ s 22
Each cross-sectional sample is composed of 2 samples. The sample s11 appears in the first cross-sectional sample (time t = 1 ) , the samples s12 and s 21 appear in two time periods(time 2 and 3), while the sample s 22 appears only in the third cross-sectional sample (time t = 3 ). The first step consists on selecting G non-overlap samples from the population with a complete coverage of the union of the G samples. Let a g1i (U g ) be the group U g , g = 1,..., G , membership indicator variables with ∑ i a g1i (U g ) = N g , ∑ g ∑ i a g1i (U g ) = N , and a g1i (U g )a h1i (U h ) = 1( g = h) . The second step is a repetition of the first step within each
group. Let a gs 2i ( s gs ) be the conditional sample s gs membership indicator variables with ∑ i a gs 2i ( s gs ) = n gs , ∑ s ∑ i a gs 2i ( s gs ) = N g , and a gs 2i ( s gs )a gq 2i ( s gq ) = 1( s = q ) . We have G (t ) = G , Yˆ (t) = ∑ g Yˆ ( y (t ) , d g(t ) ( s g(t ) )) / G , and
d ( s ) = vech(d 1(1) ( s1(1) ),..., d G(1) ( s G(1) ),..., d 1(T ) ( s1(T ) ),..., d G(T ) ( s G(T ) )) ,
(3.8)
(t ) (t ) with d g(t ) ( s g(t ) ) = (d g(t1) ( s g(t ) ),..., d gN ( s g(t ) )) T , d gi ( s g(t ) ) = d g1i (U g )d g(t2)i ( s g(t ) ) , d g1i (U g ) = J i(1) a g1i (U g ) / E (a g1i (U g )) ,
5
Page 142
(t ) (t ) (t ) d g(t2)i ( s g(t ) ) = a gi ( s g(t ) ) / E (a gi ( s g(t ) )) , and a gi ( s g(t ) ) = a gs 2i ( s gs ) if s gs is the sample used at time t in group g .
Under rotating samples, the variance of Yˆ (t ) is given by Var (Yˆ (t ) ) = EVar (Yˆ (t ) ) + VarE (Yˆ (t ) ) . Now VarE (Yˆ (t ) ) = Var (Y (t ) ) = 0 , so an estimator of Var (θˆ) is given by (3.9) ϑ L (θˆ) = ∑ g ϑ 2 ( z g , d g 2 ( s g )) , where d g 2 ( s g ) = vech(d g(12) ( s g(1) ),..., d g(T2) ( s g(T ) )) ,
(3.10)
and ϑ 2 (.) denotes a conditional variance estimator of a total.
Composite estimation Under rotating samples, it is advantageous to use a composite estimator. Suppose the following simple composite estimator ~ Y (t ) = Yˆ (t ) − αˆ (t ) (Yˆ (t ) − Yˆ (t ) ) − βˆ (t ) (Yˆ (t −1) − Yˆ (t −1) ) . (3.11) u
m
m
where Yˆu(t ) is an estimator of the population total using the unmatched sample, Yˆm(t ) and Yˆm(t −1) are estimators of the population totals using matched samples, and αˆ (t ) and βˆ (t ) are assumed to be for simplicity regression parameters estimators as in the GREG approach. The case of optimal regression is covered in Demnati and Rao (2001). A variance ~ ~ estimator of the composite estimator, θ = g (Y ) , can be obtained using (3.9) with d g 2 ( s g ) = vech(d g( 02) ( s g( 0) ),..., d g(T2) ( s g(T ) )) .
(3.12)
The composite estimator (3.11) has three components as in the AK composite estimator, but it is not recursive and estimates for different domains add up unlike in the case of AK composite estimator. Concluding Remarks
We have presented a new approach to variance estimation under longitudinal survey data. A valid variance estimator is given under a variety of longitudinal sampling designs. Extension to estimators defined as solutions to survey weighted estimating equations is currently under investigation. References
Binder, D. (1996), ΑLinearization Methods for Single Phase and Two-Phase Samples: A Cookbook Approach≅, Survey Methodology, 22, 17-22. Demnati, A. and Rao, J. N. K. (2001),Α Linearization Variance Estimators for Survey Data≅, Methodology Branch Working Paper, SSMD-2001-010E. Statistics Canada. Demnati, A. and Rao, J. N. K. (2002), ΑLinearization Variance Estimators for Survey Data With Missing Responses≅, in Proceeding of the Section Survey Research Methods, American Statistical Association. Kalton, G. and Anderson, D. (1986), “Sampling Rare Populations”, Journal of the Royal Statistical Society, Series A, 149, Part 1, pp.65-82. Royall, R. M., and Cumberland, W. G. (1981), ΑAn Empirical Study of the Ratio Estimator and Estimators of its Variance≅, Journal of the American Statistical Association, 76, 66-77. Särndal, C.-E., Swensson, B., and Wretman, J.H. (1989),ΑThe Weighted Residual Technique for Estimating the Variance of the General Regression Estimator of the Finite Population Total≅, Biometrika, 76, 527-537. Sirken, M. G. (1970),”Household Surveys with Multiplicity”, Journal of the American Statistical Association, 65, 257-266. Valliant, R. (1993),ΑPostsratification and Conditional Variance Estimation≅, Journal of the American Statistical Association, 88, 89-96.
6
Page 143