Estimation of Panel Data Models with Parameter Heterogeneity when Group Membership Is Unknown

Chang-Ching Lin∗

Serena Ng†

April 25, 2011

Abstract This paper proposes two methods for estimating panel data models with group specific parameters when group membership is not known. The first method uses the individual level time series estimates of the parameters to form threshold variables. The problem of parameter heterogeneity is turned into estimation of a panel threshold model with an unknown threshold value. The second method modifies the K-means algorithm to perform conditional clustering. Units are clustered based on the deviations between the individual and the group conditional means. The two approaches are used to analyze growth across countries and housing market dynamics across the states in the U.S.

Keywords: Threshold Models, Cluster Analysis, Convergence Clubs, Regional Housing.



∗ Institute of Economics, Academia Sinica, 128 Academia Road, Sec 2, Taipei 115, Taiwan. Tel: 886-2-27822791 ext. 301. E-mail: [email protected].
† Department of Economics, Columbia University, 420 West 118 St., New York, NY 10027, USA. Tel: 212-854-5488. E-mail: [email protected]. The second author acknowledges financial support from the National Science Foundation (SES-0549978).

1 Introduction

This paper considers estimation of panel data models when the slope parameters are heterogeneous across groups, but group membership is not known to the econometrician. We consider two methods that let the data determine the grouping. The first method uses the time series estimates of the individual slope coefficients to form threshold variables. The problem of identifying group membership is turned into one of estimating a threshold panel regression with an unknown threshold value. The second method is a modification of the K-means algorithm. The units are clustered according to the deviations between the individual and the group conditional means. Both approaches are agnostic about the sources of parameter heterogeneity.

Panel data models often take parameter homogeneity as a maintained assumption even though evidence against it is not difficult to find. Using data on US manufacturing firms, Burnside (1996) rejects homogeneity of the parameters in the production function. Lee, Pesaran, and Smith (1997) find that the convergence rates of per capita output to the steady state level are heterogeneous across countries. Hsiao and Tahmiscioglu (1997) find heterogeneity in the parameters that describe investment dynamics and observe that such differences cannot be explained by commonly considered firm characteristics. As Browning and Carro (2007) point out, there is usually much more heterogeneity than empirical researchers allow. Robertson and Symons (1992) find using simulations that the Anderson and Hsiao (1982) estimator can be severely biased when parameter heterogeneity is omitted.1

While fixed effects estimation allows for time invariant type heterogeneity through the intercept, few methods are available to allow for heterogeneity in the slope parameters. A random coefficient model2 can estimate the mean of the coefficients but is uninformative about the response at a more disaggregated level, which is sometimes an object of interest. Maddala, Trost, Li, and Joutz (1997) suggest a Bayesian method that shrinks the individual estimates toward the estimator of the overall mean. Alvarez, Browning, and Ejrnæs (2010) parameterize the individual coefficients as a function of observed characteristics, but the results necessarily depend on the specifications used. While one can assume complete parameter heterogeneity, this would reduce the problem to time series estimation on a unit by unit basis, which does not take advantage of the panel structure of the data.

1 Song (2004) studies the convergence rate of estimating group parameters under cross-section dependence from the individual coefficient estimates and develops a measure of parameter heterogeneity. However, the ideas have not been illustrated using simulations or applications.
2 See, for example, Swamy (1970) and Hsiao and Pesaran (2004).


We are interested in precise modeling of the conditional mean function and not group membership per se. We consider two ways of clustering units with the objective of pooling observations to estimate the group specific parameters. In this way, units within a group have the same parameters, but the parameters are heterogeneous across groups. This provides a compromise between complete parameter heterogeneity and parameter homogeneity.

The remainder of this paper is organized as follows. After a review of related work in Section 2, Section 3 presents the pseudo threshold method, along with extensions to multiple groups and multiple covariates. The conditional K-means approach is presented in Section 4, and the choice of the number of groups is discussed in Section 5. Simulations are then presented in Section 6, where we also apply the methods to study economic growth across countries and regional housing dynamics in the U.S.

2 Related Literature and the Econometric Framework

The simplest way to form clusters from a set of heterogeneous observations on a scalar variable $y_{it}$ is to plot the unconditional mean of the ordered data $\hat\beta_{(i)} = \bar y_i = \frac{1}{T}\sum_{t=1}^T y_{it}$ for $i = 1, \ldots, N$, and then 'eyeball' to see when $\hat\beta_{(i)}$ abruptly shifts from one mean to another.3 Such a graphical approach is often a useful diagnostic, but does not permit formal statistical statements to be made. One can also use a priori information to organize units into groups, but the approach is not objective. A more systematic approach is model based clustering, which assumes that the data are generated by a mixture of distributions. That is, an observation drawn from group $g$ is assumed to have density $f_g(y_{it}, x_{it} \mid \beta_g)$, where $\beta_g$ are the group-specific parameters of the conditional mean function, and $x_{it}$ are the covariates. The likelihood is then $L(x; \theta) = \prod_{i=1}^N \prod_{t=1}^T f_g(y_{it}, x_{it}; \beta)$. As identification of mixture models is a delicate problem, the parameters are typically estimated using Bayesian methods. The exception is Sun (2005), who considers the problem from a frequentist perspective. These likelihood based analyses yield an estimate of the probability of the group to which a unit belongs, but can be computationally cumbersome if $N$ is large because we need to consider up to $2^N$ possible combinations of the data.

3 See, for example, Henderson and Russell (2005).

While the focus of cluster analysis is usually group membership, our interest is precise estimation of the group-specific parameters $\beta_g$. To this end, we consider a balanced panel of data with observations $(\tilde y_{it}, \tilde x_{it})$, $i = 1, \ldots, N$, $t = 1, \ldots, T$, and assume that $N$ and $T$ are large. Let $K$ be the number of regressors and $G$ be the number of clusters. Let $I^0 = (I_1^0, \ldots, I_G^0)$ be indicator variables for true group membership. Then for $g = 1, \ldots, G$, $N_g^0$ is the number of individuals in cluster $I_g^0$. The panel data model is

$$\tilde y_{it} = \alpha_i + \tilde x_{it}\beta_{(i)} + e_{it} = \alpha_i + \tilde x_{it}\beta_g + e_{it}, \qquad i \in I_g^0,$$

where $\tilde x_{it} = (\tilde x_{1it}, \ldots, \tilde x_{Kit})$ is a $1 \times K$ vector of regressors, $\alpha_i$ is unobserved heterogeneity, and $\beta_{(i)} = (\beta_{(i)1}, \ldots, \beta_{(i)K})'$ is a $K \times 1$ vector of slope coefficients for unit $i$. Group effects are modeled by letting $\beta_g = (\beta_{g1}, \ldots, \beta_{gK})'$ denote a $K \times 1$ vector of group-specific slope coefficients such that $\beta_{(i)}$ equals or is well approximated by $\beta_g$ for all $i$'s in $I_g^0$. Let $y_{it} = \tilde y_{it} - \frac{1}{T}\sum_{t=1}^T \tilde y_{it}$ and $x_{kit} = \tilde x_{kit} - \frac{1}{T}\sum_{t=1}^T \tilde x_{kit}$ for $k = 1, \ldots, K$. The model in terms of demeaned data becomes

$$y_{it} = x_{it}\beta_g + e_{it}, \qquad i \in I_g^0.$$

The econometric exercise is to estimate $\beta_g$ without knowing $I_g^0$.4 This requires a way to pool 'similar' observations for estimation. Two methods are considered. The first defines similarity in terms of the slope coefficients, and the second defines similarity in terms of the conditional mean.

Let $N_{gj}$ be the number of units assigned to group $j$ when they belong to group $g$. Then $N_s = \sum_{g \neq j} N_{gj}$ is the number of misclassified units. Let $I = (I_1, \ldots, I_G)$ denote arbitrary group membership. The size of group $j$ based on an arbitrary classification is $N_j = N_{jj} + \sum_{g \neq j} N_{gj}$. The following assumptions will be used.

Assumption 1: For all $i$ and $t$, (a) $e_{it} \sim (0, \sigma_i^2)$ has finite fourth moments and is cross-sectionally and serially independent; (b) $0 < \sigma_i^2 < \infty$; and (c) $N^{-1}\sum_{i=1}^N \alpha_i = O(1)$. Furthermore, $e_{it}$ is independent of $y_{i0}$ and uncorrelated with $\beta_g$ for all $g = 1, \ldots, G$.

Assumption 2: (a) $N_g^0/N > 0$ and $N_g^0/N \to \pi$ with $0 < \pi < 1$. (b) $0 \leq \lim N/T < \infty$ as $N$ and $T$ diverge jointly.

In the case of a dynamic panel model with $\tilde x_{1it} = \tilde y_{i,t-1}$, we also need the following.

Assumption 3: For $g = 1, \ldots, G$ and $i \in I_g^0$, $|\beta_{g1}| < 1$, $y_{i0} = \alpha_i/(1 - \beta_{g1}) + u_{i0}$, where $u_{i0} \sim (0, \sigma_{u,i}^2)$ and $0 < \sigma_{u,i}^2 < \infty$. Furthermore, $u_{i0}$ is cross-sectionally independent, has finite fourth moments, is independent of $e_{it}$, and is uncorrelated with $\beta_{g1}$.

4 A group specific intercept can be recovered once group membership is determined by regressing $\hat\alpha_i = \bar y_i - \hat\beta_g \bar x_i$ on a set of group dummies.

We assume cross-section independence of $e_{it}$ to simplify the presentation. Cross-section dependence can be entertained by explicitly controlling for the presence of common factors. Assumptions 2 and 3 are similar to the ones used in Hahn and Kuersteiner (2002), Alvarez and Arellano (2003), and Pesaran and Yamagata (2008) for dynamic panel models with fixed effects. The assumptions can be relaxed if the regressors are strictly exogenous.

3 A Two-Step Pseudo Threshold Approach

Goldfeld and Quandt (1973) were the first to use threshold variables, also referred to as transition variables, to form clusters. They considered a model in which the clusters are determined by a linear function of several transition variables and proposed a D-method to estimate the parameters in the transition function by maximum likelihood. The D-method assumes deterministic switching of regimes, and stands in contrast to the λ-method in which units are assigned to regimes in a random manner. A more popular idea, also due to Goldfeld and Quandt (1973), is to partition a data set based on a known threshold variable taking on an unknown threshold value. Threshold autoregressive and structural break models are variations of this approach.

To fix ideas, consider the case of $G = 2$ groups. The model expressed in demeaned data is

$$y_{it} = \begin{cases} x_{it}\beta_1 + e_{it} & i \in I_1^0 \\ x_{it}\beta_2 + e_{it} & i \in I_2^0. \end{cases} \qquad (1)$$

Suppose there exists a variable $q_i^0$ and a set of cut-off parameter values $\Gamma^0$ such that $i \in I_1^0$ if $q_i^0 \leq \gamma^0$ for any $\gamma^0 \in \Gamma^0$, and $i \in I_2^0$ otherwise. Then the pair of variables $(q_i^0, \gamma^0)$ provides perfect information about group membership, and (1) can be written as a panel threshold model:

$$y_{it} = \begin{cases} x_{it}\beta_1 + e_{it} & q_i^0 \leq \gamma^0 \\ x_{it}\beta_2 + e_{it} & q_i^0 > \gamma^0. \end{cases} \qquad (2)$$

Hansen (1999) considers threshold panel regressions where the sample is split according to whether an observed and possibly time varying threshold variable $q_{it}^0$ is less than some cutoff value $\gamma$. Unit $i$ can be in one group in period $t$ if $q_{it}^0 \geq \gamma^0$, but might be in another group in period $t+1$ if $q_{it+1}^0 < \gamma^0$. Our threshold variable $q_i^0$ in (1) is not observed, and the group structure does not change over time. Because of these differences, we call $q_i$ a 'pseudo threshold variable' and $\gamma$ the 'pseudo threshold parameter' to distinguish them from the definitions used in the literature.

If $q_i^0$ and $\Gamma^0$ were known, estimates of $\beta_1$ could be obtained by pooling observations with $q_i^0 \leq \gamma^0$ for any $\gamma^0 \in \Gamma^0$, while observations with $q_i^0 > \gamma^0$ would be pooled to estimate $\beta_2$. The problem, however, is that neither $q_i^0$ nor $\Gamma^0$ is observed. We propose to replace $q_i^0$ by some $\hat q_i$ that has the same information as $q_i^0$ in the sense that $\hat q_i \leq \gamma$ when $q_i^0 \leq \gamma$ as $T \to \infty$ for a given $\gamma$. Unit $i$ is then classified into group 1 if $\hat q_i$ is less than $\gamma$; otherwise unit $i$ is in group 2. More precisely, we propose to use $\hat q_i$ to order the data, which has the computational advantage that any unit $i'$ with $\hat q_{i'} < \hat q_i$ will also be classified in the same group as unit $i$. Thus, even though there are $2^N$ possible groupings of the data, we only need to consider at most $N - 1$ possible values of $\gamma$. Given $\hat q_i$, the problem of how to group units with similar coefficients is formulated as finding the value of the threshold parameter $\gamma$ that minimizes $S_{NT}(\gamma, \hat q)$:

$$\hat\gamma = \arg\min_{\gamma \in [\underline{\hat q}, \overline{\hat q}]} S_{NT}(\gamma, \hat q), \qquad (3)$$

where $S_{NT}(\gamma, \hat q)$ is the total squared residuals, obtained by summing over groups 1 and 2:

$$S_{NT}(\gamma, \hat q) = \sum_{i \mid \hat q_i \leq \gamma} \sum_{t=1}^T (y_{it} - x_{it}\hat\beta_1(\gamma))^2 + \sum_{i \mid \hat q_i > \gamma} \sum_{t=1}^T (y_{it} - x_{it}\hat\beta_2(\gamma))^2. \qquad (4)$$

As in the breakpoint literature, the threshold parameter cannot be too large or too small, as defined by $\underline{\hat q}$ and $\overline{\hat q}$. If the trial value of $\gamma$ is too low, $\hat\beta_2(\gamma)$ will be estimated with some observations from group 1 and will not be consistent for $\beta_2$. Similarly, at too high a value of $\gamma$, $\hat\beta_1(\gamma)$ will be estimated with observations from group 2, and hence will not precisely estimate $\beta_1$. The optimization problem yields $\hat I_1 = \{i \mid \hat q_i \leq \hat\gamma\}$ and $\hat I_2 = \{i \mid \hat q_i > \hat\gamma\}$. Since the data are ordered according to $\hat q_i$, $\hat I_1$ and $\hat I_2$ are the same if $\hat q_{i^*} = \tilde\gamma$ for any $\tilde\gamma \in [q_{i^*}, q_{i^*+1})$. We will refer to this procedure as PSEUDO(G,K).

It remains to be precise about the choice of $\hat q_i$. Now $\beta_{(i)} = \beta_1$ if $i \in I_1^0$, and $\beta_2 \neq \beta_1$ by assumption. Thus, $q_i = \beta_{(i)}$ along with any $\gamma^0 \in [\beta_1, \beta_2]$ completely summarizes group membership. For example, $\beta_\omega = \omega\beta_1 + (1-\omega)\beta_2$ for $\omega \in (0, 1)$ is a valid value of $\gamma^0$. This suggests letting $\hat q_i = \hat\beta_{(i)}$ where, for each $i$, $\hat\beta_{(i)}$ is the least squares estimate from a time series regression of $y_{it}$ on $x_{it}$.5 Since $T$ is large by assumption, $\hat\beta_{(i)}$ is a consistent estimate of $\beta_{(i)}$. To see that $\hat q_i = \hat\beta_{(i)}$ will separate the sample, suppose $\hat\gamma = \hat\beta_\omega$, where $\hat\beta_\omega$ is a consistent estimate of $\beta_\omega = \omega\beta_1 + (1-\omega)\beta_2$. Under Assumptions 1–3:

$$P(\hat\beta_{(i)} > \hat\beta_\omega \mid \beta_{(i)} = \beta_1) = P\left(\sqrt{T}(\hat\beta_{(i)} - \beta_1) > \sqrt{T}(\hat\beta_\omega - \beta_1) \,\Big|\, \beta_{(i)} = \beta_1\right)$$
$$= P\left(\sqrt{T}(\hat\beta_{(i)} - \beta_1) > \sqrt{T}(1-\omega)(\beta_2 - \beta_1) + O_p\Big(\tfrac{1}{\sqrt{T}}\Big) + O_p\Big(\tfrac{1}{\sqrt{NT}}\Big) \,\Big|\, \beta_{(i)} = \beta_1\right).$$

5 We also consider using $\hat q_i = \dfrac{\hat\beta_{(i)} - \hat\beta_\omega}{\hat\sigma_i \hat Q_i^{-1/2}}$, where $\hat Q_i = T^{-1}\sum_{t=1}^T x_{it}' x_{it}$. By standardizing $\hat\beta_{(i)}$, we control for heterogeneity due to $\sigma_i$.

Since $\beta_2 - \beta_1 > 0$ by assumption, the expected misclassification rate is $E(N_s(\hat\gamma)/N) = P(\hat\beta_{(i)} > \hat\beta_\omega \mid \beta_{(i)} = \beta_1) + P(\hat\beta_{(i)} < \hat\beta_\omega \mid \beta_{(i)} = \beta_2) = O(T^{-1})$. Instead of imposing $\hat\gamma = \hat\beta_\omega$, we let $\hat\gamma$ be determined by (3), but the argument is similar. Procedure PSEUDO(G,K) for $G = 2$, $K = 1$ can be summarized as follows.

Algorithm 1: PSEUDO(2,1)

1. For each $i$, regress $y_{it}$ on $x_{it}$ to obtain $\hat\beta_{(i)}$ and let $\hat q_i = \hat\beta_{(i)}$.
2. Use (3) to obtain $\hat\gamma$.
3. Assign unit $i$ to $\hat I_1$ if $\hat q_i \leq \hat\gamma$, and to $\hat I_2$ if $\hat q_i > \hat\gamma$.
4. For $g = 1, 2$, estimate $\beta_g$ using all units in $\hat I_g$.

It is shown in the Appendix that PSEUDO(2,1) yields a negligible misclassification rate in the sense that $N_s(\hat\gamma)/N \to 0$ as $(N, T) \to \infty$ jointly. In practice, we restrict the number of units in each group to be at least $\underline{N} = \max(10, 0.1N)$ and at most $\overline{N} = N - \underline{N}$. This avoids having groups with too few units. As $\hat q_i - q_i = O_p(T^{-1/2})$, the two-step procedure may be imprecise when $T$ is small.
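To make the mechanics concrete, the following is a minimal sketch of Algorithm 1 in Python for the simplest setting ($K = 1$, with demeaned $N \times T$ arrays as inputs). The function and variable names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def pseudo_2_1(y, x):
    """Minimal sketch of PSEUDO(2,1): y and x are N x T arrays of
    within-demeaned data; returns group labels and the two pooled slopes.
    Assumes N is at least twice the minimum group size."""
    N, T = y.shape
    # Step 1: unit-by-unit time series regressions give q_i = beta_(i)
    q = np.array([np.dot(x[i], y[i]) / np.dot(x[i], x[i]) for i in range(N)])
    order = np.argsort(q)
    n_min = max(10, int(0.1 * N))            # minimum group size

    def pooled_slope(idx):
        xs, ys = x[idx].ravel(), y[idx].ravel()
        return np.dot(xs, ys) / np.dot(xs, xs)

    # Step 2: search over the at most N-1 splits implied by the ordering
    best = (np.inf, None)
    for m in range(n_min, N - n_min + 1):
        g1, g2 = order[:m], order[m:]
        b1, b2 = pooled_slope(g1), pooled_slope(g2)
        ssr = np.sum((y[g1] - b1 * x[g1]) ** 2) + np.sum((y[g2] - b2 * x[g2]) ** 2)
        if ssr < best[0]:
            best = (ssr, m)
    m = best[1]
    # Steps 3-4: assign membership and re-estimate the group slopes
    labels = np.zeros(N, dtype=int)
    labels[order[m:]] = 1
    return labels, pooled_slope(order[:m]), pooled_slope(order[m:])
```

Because the units are ordered by $\hat q_i$, the search only compares adjacent splits of the ordered sample, which is what keeps the procedure to a single pass.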

3.1 Extension to Multiple Regressors

We now turn to the case when there are $K > 1$ regressors. There are two cases to consider. The first occurs if a subset, but not all, of the $K$ parameters are suspected of being different across groups. In such a case of partial parameter homogeneity, procedure PSEUDO(2,1) is still valid even when $K > 1$. For example, if the second slope coefficient varies between groups, we define $\hat q_i = \hat\beta_{(i)2}$ and apply Algorithm 1.

More difficult to handle is the second case of complete parameter heterogeneity, which arises when all $K$ coefficients are group specific. To see why this is more involved, suppose there are two regressors, $x_{1it}$ and $x_{2it}$, and there are $G = 2$ clusters. Let $\beta_g = (\beta_{g1}, \beta_{g2})'$, $g = 1, 2$, be the group specific slope parameters. Suppose first that $\beta_{11} > \beta_{21}$ and $\beta_{12} > \beta_{22}$. Since both parameters are strictly larger in one group, an obvious pseudo threshold variable is $\hat\beta_{(i)}^+ = \hat\beta_{(i)1} + \hat\beta_{(i)2}$. But this does not always work! In particular, if $\beta_{11} > \beta_{21}$ but $\beta_{12} < \beta_{22}$, the sum of the coefficients may not be a sufficient statistic for group membership. For example, suppose that $(\beta_{11}, \beta_{12}) = (0.8, 1)$ and $(\beta_{21}, \beta_{22}) = (1, 0.8)$. Since $\beta_{11} + \beta_{12} = \beta_{21} + \beta_{22}$, the appropriate pseudo threshold variable is no longer $\hat\beta_{(i)}^+$, but $\hat\beta_{(i)}^- = \hat\beta_{(i)1} - \hat\beta_{(i)2}$. Although $\hat\beta_{(i)}^+$ or $\hat\beta_{(i)}^-$ can be used as a threshold variable, we would

first need to determine the sign of the coefficients before we can classify the units. In an earlier version of this paper, we used the Goodman-Kruskal gamma statistic to measure the association between pairs of concordant (same sign) and discordant (opposite sign) data. Although the method works reasonably well in simulations, it is somewhat cumbersome. A simpler and more effective approach is to recognize that even in the case of complete parameter heterogeneity, we can still split the sample using one of the $\hat\beta_{(i)k}$ parameters, since each component of $\hat\beta_{(i)} = (\hat\beta_{(i)1}, \ldots, \hat\beta_{(i)K})'$ is informative about group membership. The only issue that remains is which $\hat\beta_{(i)k}$ to use. We let the data speak by considering each component as a possible candidate and choosing the one that minimizes the sum of squared residuals. More precisely, for given $k$ with $\hat q_{ik} = \hat\beta_{(i)k}$, let $\hat\gamma_k$ be estimated from (3). Define

$$S_{NT,k}(\hat\gamma_k) = \sum_{i \mid \hat q_{ik} \leq \hat\gamma_k} \sum_{t=1}^T (y_{it} - x_{it}\hat\beta_1(\hat\gamma_k))^2 + \sum_{i \mid \hat q_{ik} > \hat\gamma_k} \sum_{t=1}^T (y_{it} - x_{it}\hat\beta_2(\hat\gamma_k))^2,$$

where $x_{it}'$, $\hat\beta_1$, and $\hat\beta_2$ are $K \times 1$ vectors. The best threshold variable is $\hat\beta_{(i)k^*}$ where

$$k^* = \arg\min_k S_{NT,k}(\hat\gamma_k). \qquad (5)$$

The procedure for complete parameter heterogeneity when $K > 1$ is summarized as follows.

Algorithm 2: PSEUDO(2,K)

1. For each $i$, regress $y_{it}$ on $x_{it}$ to obtain $\hat\beta_{(i)}$.
2. For $k = 1, \ldots, K$, let $\hat q_{ik} = \hat\beta_{(i)k}$.
   (i) Use (3) to obtain $\hat\gamma_k$.
   (ii) Assign unit $i$ to $\hat I_1$ if $\hat q_{ik} \leq \hat\gamma_k$, and to $\hat I_2$ if $\hat q_{ik} > \hat\gamma_k$.
   (iii) For $g = 1, 2$, estimate $\beta_g$ using all units in $\hat I_g$ and record $S_{NT,k}(\hat\gamma_k)$.
3. Let $k^* = \arg\min_k S_{NT,k}(\hat\gamma_k)$, set $\hat q_i = \hat\beta_{(i)k^*}$, and apply Algorithm 1.

The appeal of this approach is its generality, since the procedure is the same for any $K$. Partial parameter heterogeneity is just a special case when Steps 1 and 2 can be skipped, and $\hat q_i$ is simply the parameter estimate associated with the variable whose effect on $y_{it}$ is group specific.
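As an illustration, a sketch of Algorithm 2 that loops over the $K$ candidate threshold variables and keeps the one with the smallest total sum of squared residuals; as before, the inputs (an $N \times T$ array `y` and an $N \times T \times K$ array `X` of demeaned data) and names are assumptions for the example only.

```python
import numpy as np

def pseudo_2_k(y, X):
    """Sketch of PSEUDO(2,K): try each coefficient as the pseudo threshold
    variable (equation (5)) and keep the split with the smallest SSR."""
    N, T, K = X.shape

    def ols(Xs, ys):                        # pooled OLS slopes, length K
        return np.linalg.lstsq(Xs.reshape(-1, K), ys.ravel(), rcond=None)[0]

    beta_i = np.array([ols(X[i:i+1], y[i:i+1]) for i in range(N)])
    n_min = max(10, int(0.1 * N))
    best = (np.inf, None)
    for k in range(K):                      # candidate threshold variable
        order = np.argsort(beta_i[:, k])
        for m in range(n_min, N - n_min + 1):
            g1, g2 = order[:m], order[m:]
            b1, b2 = ols(X[g1], y[g1]), ols(X[g2], y[g2])
            ssr = (np.sum((y[g1] - X[g1] @ b1) ** 2)
                   + np.sum((y[g2] - X[g2] @ b2) ** 2))
            if ssr < best[0]:
                best = (ssr, (k, order, m))
    k_star, order, m = best[1]
    labels = np.zeros(N, dtype=int)
    labels[order[m:]] = 1
    return labels, k_star
```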


Procedure PSEUDO(2,K) extends naturally to $G > 2$ groups, which would then have $G - 1$ threshold values. In the multiple structural breaks literature, Bai (1997) showed that a sequential approach can consistently estimate the break fractions without the need to search for all break dates simultaneously. This idea is adapted to our threshold problem. We estimate two subgroups from each of the groups identified in the preceding step, subject to the constraint that the size of each subgroup is not too small. For example, given $\hat I_1$ and $\hat I_2$, we first partition $\hat I_1$ into $\hat I_{11}$ and $\hat I_{12}$. If $\hat I_{11}$ and $\hat I_{12}$ both have a minimum number of observations, then they, together with $\hat I_2$, form one possible split of the sample into three groups. Similarly, $\hat I_2$ is partitioned into $\hat I_{21}$ and $\hat I_{22}$, which together with $\hat I_1$ form another possible sample split of three groups. We then decide which of the two possibilities for $G = 3$ to keep by comparing the sum of squared residuals.
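A stylized sketch of this sequential step, reusing the hypothetical `pseudo_2_k` function above; the minimum subgroup size constraint is omitted here for brevity, and all names are illustrative.

```python
import numpy as np

def total_ssr(y, X, labels):
    """Total SSR from pooled OLS within each labeled group (helper sketch)."""
    ssr, K = 0.0, X.shape[2]
    for g in np.unique(labels):
        idx = labels == g
        b = np.linalg.lstsq(X[idx].reshape(-1, K), y[idx].ravel(), rcond=None)[0]
        ssr += np.sum((y[idx] - X[idx] @ b) ** 2)
    return ssr

def sequential_split_g3(y, X, labels):
    """Sketch of the step from G = 2 to G = 3: split each existing group with
    pseudo_2_k and keep the partition with the smaller total SSR."""
    candidates = []
    for g in (0, 1):
        idx = np.where(labels == g)[0]
        sub_labels, _ = pseudo_2_k(y[idx], X[idx])   # split the parent group
        new = labels.copy()
        new[idx[sub_labels == 1]] = 2                # new subgroup gets label 2
        candidates.append(new)
    return min(candidates, key=lambda lab: total_ssr(y, X, lab))
```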

4 Conditional K-means Clustering

The K-means algorithm is a popular way of forming clusters from a single series with $N$ observations. The algorithm moves unit $i$ to an appropriate group so as to minimize the sum of squared deviations between the units and the centroids.6 Except for the gene-array analysis in Qin and Self (2006), the K-means method tends to be used to form clusters from observations on a scalar variable with no reference to covariates. We modify the K-means algorithm for use in regression analysis, which can be thought of as a form of conditional clustering. We refer to the procedure as CK-means(G,K).

Algorithm 3: CK-means(G,K)

1. Initialize $\{\ddot I_1, \ldots, \ddot I_G\}$ randomly and let $(\ddot\beta_1, \ldots, \ddot\beta_G)$ be the pooled estimates of $\beta_1, \ldots, \beta_G$.
2. Repeat for $i = 1, \ldots, N$ until no individual changes group:
   (a) Calculate $SSR_{ig} = \sum_{t=1}^T (y_{it} - x_{it}\ddot\beta_g)^2$, $g = 1, \ldots, G$.

   (b) For $g = 2, \ldots, G$ and $g' = 1, \ldots, g-1$, if $SSR_{ig'} \leq SSR_{ig}$, individual $i$ is re-assigned to group $g'$; otherwise, $i$ stays with group $g$.
   (c) Update $\{\ddot I_1, \ldots, \ddot I_G\}$ and re-estimate $(\ddot\beta_1, \ldots, \ddot\beta_G)$.

The unconditional K-means method is known to be sensitive to the initial choice of the centroids and is not guaranteed to find the global minimizer. Thus Steps 1 and 2 are repeated several times with different initial group assignments.7 Assuming i.i.d. data, Pollard (1981) uses empirical process arguments to obtain a strong consistency result, while Pollard (1982) shows that the centroids estimated by the algorithm are asymptotically normal. However, Pollard (1981) notes that his consistency result does not necessarily apply to algorithms used in practice, which involve multiple starting values. The asymptotic properties of the K-means algorithm used in practice are not known even in the absence of covariates.

While our pseudo threshold procedure minimizes the same objective function as CK-means, some differences are noteworthy. First, we only estimate the ordered regression once. The CK-means algorithm makes random initial guesses of the centroids and then evaluates, unit by unit, whether a move to a different group is desirable. This makes the CK-means method computationally costly when $N$ is large. Furthermore, when there are multiple alternatives and $N$ is large, convergence of the CK-means can be slow. Second, because we follow the structural break literature and search for the optimal threshold value in the subsample $[\underline{N}, \overline{N}]$, our approach is less sensitive to outliers. Simulations bear this out. Third, we locate the threshold values one at a time, starting with the largest. In contrast, the CK-means is a global procedure. Units found to be in Group 1 by the CK-means when $G = 2$ may well be in Group 2 when $G = 3$.

The CK-means method also has some advantages. First, the algorithm relies only on the pooled estimator $\hat\beta_g$, which is $\sqrt{N_g T}$ consistent, and does not require the individual estimates $\hat\beta_{(i)}$, which are $\sqrt{T}$ consistent. Thus the CK-means method should be more precise than PSEUDO when $N$ or $T$ is small. Second, the CK-means method considers moving every unit to a different group. Our pseudo threshold method moves all units with $\hat q_i$ above and below the threshold value simultaneously. The simultaneous move is fast, but can be inaccurate when the ordering of $\hat q_i$ does not agree with that of $q_i$, as may be the case when the sample size is small, or when $q_i$ does not provide complete information about the group structure. We can therefore expect a trade-off between precision and speed in the two methods. Third, the CK-means algorithm is easier to implement when $G > 3$ because it is a global procedure and considers moving every unit to a different group. In contrast, PSEUDO is a sequential procedure; the membership of the subgroups identified by PSEUDO always depends on the outcome of the preceding step.

6 There are many variations to the basic algorithm. Harmonic and fuzzy means have also been used instead of simple means. See, for example, Hartigan (1975) and Abraham, Cornillion, Matzner-Lober, and Molinari (2003).
7 Garcia-Escudero and Gordaliza (1999) pointed out that the algorithm can be sensitive to outliers.
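The following is a minimal sketch of the CK-means(G,K) iteration with random restarts, using the same hypothetical array inputs as the earlier sketches. Note that Algorithm 3 reassigns units one at a time, whereas this sketch reassigns all units in each sweep for brevity; it is not the authors' implementation.

```python
import numpy as np

def ck_means(y, X, G, n_restarts=10, max_iter=100, seed=0):
    """Sketch of CK-means(G,K): assign each unit to the group whose pooled
    slope vector gives it the smallest time series SSR, then re-estimate."""
    rng = np.random.default_rng(seed)
    N, T, K = X.shape

    def pooled(labels):
        betas = np.zeros((G, K))
        for g in range(G):
            idx = labels == g
            if idx.any():
                betas[g] = np.linalg.lstsq(X[idx].reshape(-1, K),
                                           y[idx].ravel(), rcond=None)[0]
        return betas

    def obj(labels, betas):
        return sum(np.sum((y[labels == g] - X[labels == g] @ betas[g]) ** 2)
                   for g in range(G))

    best_val, best_labels = np.inf, None
    for _ in range(n_restarts):                 # multiple random starts
        labels = rng.integers(0, G, size=N)
        for _ in range(max_iter):
            betas = pooled(labels)
            # SSR of each unit under every candidate group slope (N x G)
            ssr = np.stack([np.sum((y - X @ betas[g]) ** 2, axis=1)
                            for g in range(G)], axis=1)
            new_labels = ssr.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        val = obj(labels, pooled(labels))
        if val < best_val:
            best_val, best_labels = val, labels
    return best_labels, pooled(best_labels)
```

The random restarts reflect the sensitivity of K-means-type algorithms to the initial assignment discussed above.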


5 Determining G

Both the pseudo threshold and the conditional K-means algorithms require knowledge of the number of groups, $G$. An informal approach is to graph the value of the objective function $S_{NT}$ for a given $G$ against $G$ and then locate the 'knee point' at which the objective function starts to flatten. More formal procedures have been proposed for unconditional clustering of $y_{it}$. Milligan and Cooper (1985) consider 30 procedures and find that the global procedure of Calinski and Harabasz (1974) works best, while the local procedure of Duda and Hart (1973) is second. But as Sugar and James (2003) point out, most methods are aimed at clustering data with specific properties and no method works uniformly well. We experimented with many of these methods and found them to be accurate only when the parameters in different groups are very far apart.

The problem of determining the number of clusters is similar in many ways to determining the number of break points or thresholds. In breakpoint problems, we can use a sup-Wald type test for the null hypothesis of no break.8 However, there are three features that make the SupW test for parameter homogeneity infeasible here. First, $\hat\beta_1$ and $\hat\beta_2$ are estimated from two split samples ordered by $\hat\beta_{(i)}$. By construction, one sample will have the smaller values of $\hat\beta_{(i)}$ and the other will have the larger values. Thus, the pooled estimate will be biased if $\beta_1 = \beta_2$. Second, $\hat\beta_1$ and $\hat\beta_2$ are correlated when $\beta_1 = \beta_2$, making inference non-standard. Third, as $\hat q_i$ is ordered, bootstrap procedures valid for cross-sectionally independent data are now invalid.

We found two ways that determine $G$ quite accurately in our setup. The first uses a sequential test of parameter homogeneity to provide information about the number of groups. Specifically, if we reject parameter homogeneity in the pooled data, we partition the sample into two groups and then test if parameter homogeneity holds in each of the subgroups. If subsample homogeneity is rejected, the sample is split again until the null hypothesis of parameter homogeneity cannot be rejected for the subsamples. We use the dispersion $t_g$ test proposed by Pesaran and Yamagata (2008) to test parameter homogeneity:

$$t_g = \frac{\sqrt{N}\left(\xi_N/N - K\right)}{\sqrt{2KG}}, \qquad (6)$$

where $K$ denotes the number of regressors,

$$\xi_N = \sum_{i=1}^N \tilde\sigma_i^{-2} (\hat\beta_i - \tilde B_w)' \left(\sum_{t=1}^T x_{it}' x_{it}\right) (\hat\beta_i - \tilde B_w),$$

$\tilde B_w$ is the weighted pooled fixed effects estimator of Swamy (1970), and $\tilde\sigma_i^2$ is obtained under the null hypothesis of homogeneity.

8 See, for example, Davies (1977), Andrews and Ploberger (1994), Hansen (1996), Bai (1997), and Caner and Hansen (2004).
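To illustrate the mechanics of (6), a rough sketch of the dispersion statistic follows. The weighted pooled estimator of Swamy (1970) and the null-imposed variance estimates are replaced here by simple pooled analogues, so this is only an approximation of the test developed in Pesaran and Yamagata (2008), with hypothetical inputs.

```python
import numpy as np

def t_g(y, X, G=1):
    """Rough sketch of the dispersion statistic in (6): y is N x T, X is
    N x T x K of demeaned data.  B_w and the variances are simplified."""
    N, T, K = X.shape
    beta_i = np.array([np.linalg.lstsq(X[i], y[i], rcond=None)[0]
                       for i in range(N)])               # unit-level OLS
    B_w = np.linalg.lstsq(X.reshape(-1, K), y.ravel(), rcond=None)[0]
    sigma2 = np.array([np.mean((y[i] - X[i] @ B_w) ** 2) for i in range(N)])
    xi = sum((beta_i[i] - B_w) @ (X[i].T @ X[i]) @ (beta_i[i] - B_w) / sigma2[i]
             for i in range(N))
    return np.sqrt(N) * (xi / N - K) / np.sqrt(2 * K * G)
```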

The $t_g$ test allows for heteroskedasticity and non-normally distributed errors and is consistent as $N$ and $T$ go to infinity jointly such that $\sqrt{N}/T^2 \to 0$.

The second approach is to use a modified BIC criterion:

$$BIC(\tilde G) = \log\left(\tilde\Sigma_{NT}(\tilde G, \hat\gamma, \hat q)\right) + \tilde G K \cdot \frac{c_{NT}\log(NT)}{NT} + (\tilde G - 1)\frac{\log(N^2)}{N^2}, \qquad (7)$$

where

$$\tilde\Sigma_{NT}(\tilde G, \hat\gamma, \hat q) = \frac{1}{\tilde G} \sum_{g=1}^{\tilde G} \frac{1}{\tilde N_g T} \sum_{i \in \hat I_g} \sum_{t=1}^T (y_{it} - x_{it}\hat\beta_g(\hat\gamma))^2.$$

The goodness of fit component of the BIC is computed as the average over groups of the regression error variance. The penalty term $\log(NT)/NT$ is guided by the fact that $\hat\beta_g$ is $\sqrt{NT}$ consistent under the null, and thus the BIC should consistently select $G$ if (i) $\frac{c^*_{NT}}{NT} = \frac{c_{NT}\log(NT)}{NT} \to 0$, and (ii) $c^*_{NT} \to \infty$ as $N, T \to \infty$. When all regressors and $\gamma$ are observed, the BIC obtains with $c^*_{NT} = \log(NT)$ and $c_{NT} = 1$. We consider a heavier penalty because $\hat q_i$ and $\hat\gamma$ are themselves estimated. Based on extensive simulations, we let $c_{NT} = \sqrt{\min[N, T]}$. The required conditions for consistent model selection are satisfied because $c^*_{NT} = c_{NT}\log(NT) \to \infty$ and $\frac{c^*_{NT}}{NT} \to 0$. Furthermore, the breakpoint literature suggests $\hat\gamma$ is super-consistent with a convergence rate of $N^2$. We put a penalty of $\log(N^2)$ on each threshold variable, giving an overall penalty on $\hat\gamma$ of $(\tilde G - 1)\log(N^2)/N^2$. The idea of using the BIC or $t_g$ test to determine $G$ is simple, but the exposition of the complete algorithm is notationally involved. Details are given in an appendix available on request.
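For concreteness, a minimal sketch of the criterion in (7) for a given estimated grouping; the inputs (labels from PSEUDO or CK-means and demeaned arrays) and names are illustrative assumptions.

```python
import numpy as np

def modified_bic(y, X, labels):
    """Sketch of the modified BIC in (7): average within-group error variance
    plus penalties that grow with the number of groups and thresholds."""
    N, T, K = X.shape
    groups = np.unique(labels)
    G = len(groups)
    sigma = 0.0
    for g in groups:
        idx = labels == g
        b = np.linalg.lstsq(X[idx].reshape(-1, K), y[idx].ravel(), rcond=None)[0]
        sigma += np.sum((y[idx] - X[idx] @ b) ** 2) / (idx.sum() * T)
    sigma /= G
    c_nt = np.sqrt(min(N, T))                  # heavier penalty factor
    return (np.log(sigma)
            + G * K * c_nt * np.log(N * T) / (N * T)
            + (G - 1) * np.log(N ** 2) / N ** 2)
```

In the sequential procedure, this criterion would be evaluated for each candidate number of groups and the minimizing $\tilde G$ retained.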

6 Simulations and Applications

We now use Monte Carlo simulations to examine the finite sample properties of the methods considered. For $G = 2, 3$ and $K = 1, 2$, data are generated as

$$\tilde y_{it} = \alpha_i + \sum_{g=1}^G \sum_{i \in I_g^0} \tilde x_{it}'\beta_g + \tilde e_{it},$$

where $\beta_g$ is $K \times 1$, $\alpha_i \sim$ i.i.d. $N(1, 1)$, $\tilde x_{kit} \sim$ i.i.d. $N(1, 3)$ and independent of $\tilde e_{it}$, and $\tilde e_{it} \sim N(0, 1)$ is i.i.d. over $i$ and $t$. When $G = 2$, we randomly assign individuals into two groups $\{I_1^0, I_2^0\}$ with sizes $N_1^0 = \lfloor 2N/3 \rfloor$ and $N_2^0 = N - N_1^0$, where $\lfloor A \rfloor$ denotes the maximum integer that does not exceed the real number $A$. When $G = 3$, we randomly assign individuals into three groups $\{I_1^0, I_2^0, I_3^0\}$ of equal size $\lfloor N/3 \rfloor$. We let $N = (50, 100, 200, 500)$ and $T = (20, 50, 100, 200, 500)$.
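For concreteness, a sketch of how one replication of this static design might be generated; the reading of $N(1, 3)$ as mean 1 and variance 3, and the treatment of the remainder units when $N$ is not divisible by 3, are assumptions made for the illustration.

```python
import numpy as np

def simulate_static_panel(N, T, betas, rng):
    """Sketch of one replication of the static grouped DGP: betas is a list
    of K-vectors, one per group; group sizes follow the 2/3-1/3 (G = 2) or
    equal (G = 3) splits described in the text (remainder to last group)."""
    G, K = len(betas), len(betas[0])
    if G == 2:
        sizes = [2 * N // 3, N - 2 * N // 3]
    else:
        sizes = [N // 3] * (G - 1) + [N - (G - 1) * (N // 3)]
    labels = np.repeat(np.arange(G), sizes)
    rng.shuffle(labels)                               # random group assignment
    alpha = rng.normal(1.0, 1.0, size=N)              # fixed effects
    X = rng.normal(1.0, np.sqrt(3.0), size=(N, T, K)) # N(1, 3), variance 3
    e = rng.normal(0.0, 1.0, size=(N, T))
    B = np.array([betas[g] for g in labels])          # N x K unit-level slopes
    y = alpha[:, None] + np.einsum('ntk,nk->nt', X, B) + e
    return y, X, labels

# example: configuration (i) below, with (beta1, beta2) = (0.3, 0.9)
rng = np.random.default_rng(0)
y, X, labels = simulate_static_panel(N=100, T=50, betas=[[0.3], [0.9]], rng=rng)
```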

We consider the following configurations:

i $(G, K) = (2, 1)$: $(\beta_1, \beta_2) = (0.3, 0.9)$.
ii $(G, K) = (3, 1)$: $(\beta_1, \beta_2, \beta_3) = (0.3, 0.5, 0.8)$.
iii $(G, K) = (2, 2)$: $\beta_1 = (\beta_{11}, \beta_{12})' = (0.1, 0.3)'$ and $\beta_2 = (\beta_{21}, \beta_{22})' = (2/3, 0.6)'$.
iv $(G, K) = (3, 2)$: $\beta_1 = (\beta_{11}, \beta_{12})' = (0.3, -0.3)'$, $\beta_2 = (\beta_{21}, \beta_{22})' = (0.5, 0)'$, and $\beta_3 = (\beta_{31}, \beta_{32})' = (0.7, 0.3)'$.

These parameterizations give an $R^2$ of around 0.5. For given $N$ and $T$, group membership is held fixed in the $M = 1000$ replications. We determine $G$ by the BIC defined in (7). We impose the restriction that each subgroup must contain at least $\underline{N} = \max\{10, 0.1 N_p\}$ units, where $N_p$ denotes the number of units in the parent group. We evaluate the root mean squared error (RMSE) of the estimates, defined as

$$RMSE = \sqrt{\frac{1}{M} \sum_{m=1}^M \frac{1}{NK} \sum_{i=1}^N \left\| \hat\beta_{(i)}^{(m)}(\hat G) - \beta_{(i)}(G) \right\|^2},$$

where $\hat\beta_{(i)}^{(m)}(\hat G)$ is the pooled slope parameter in the $m$-th replication estimated for the $i$th unit based on $\hat G$, and $\beta_{(i)}(G)$ is the true slope coefficient for the unit. To assess the error in the estimates due to estimation of $G$, we also consider the RMSE (a) when $G$ and group membership are known, and (b) when $G$ is known but group membership is not. Group membership is determined by PSEUDO or the CK-means.

As can be seen from Table 1, the RMSEs of both PSEUDO and CK-means decrease as $N$ or $T$ increases. An increase in $T$ has a larger impact on the RMSE than an increase in $N$. When $G = 2$, PSEUDO and CK-means yield similar RMSEs when $T$ is large, but the CK-means has smaller errors when $T$ is small. This is to be expected since the pseudo threshold method requires $\sqrt{T}$ consistent estimation of the individual slope parameters. When $G = 3$ and assumed known, the CK-means tends to outperform PSEUDO, perhaps because the former can cluster units more flexibly. When $G$ is estimated, the RMSEs are similar to those when both $G$ and group membership are known, provided $T$ is large. This suggests that estimation of group membership has little impact on the estimated slope parameters when $T$ is large.

To check robustness, we consider smaller differences in the parameters between groups:

i' $(G, K) = (2, 1)$: $(\beta_1, \beta_2) = (0.55, 0.65)$.

ii’ (G, K) = (3, 1), (β1 , β2 , β3 ) = (0.4, 0.5, 0.6). iii’ (G, K) = (2, 2): β1 = (β11 , β12 )0 = (0.3, 0.4)0 and β2 = (β21 , β22 )0 = (0.4, 0.5)0 . iv’ (G, K) = (3, 2): β1 = (β11 , β12 )0 = (0.4, 0.2)0 , β2 = (β21 , β22 )0 = (0.5, 0.3)0 and β3 = (β31 , β32 )0 = (0.6, 0.4)0 . The RMSE results are reported in Table 2. Not surprisingly, PSEUDO and CK-means need a larger T to be precise. However, other features of the results are similar to those in Table 1. We also generate data from a dynamic panel model with group specific parameters: yeit = αi + ρg yei,t−1 + φg t + eeit ,

if i ∈ Ig0 .

(8)

We set $\alpha_i = 0$ for all $i$'s, $(\rho_1, \rho_2) = (0.3, 0.8)$, $(\phi_1, \phi_2) = (0, 0.03)$, and $\tilde e_{it} \sim N(0, 1)$ i.i.d. over $i$ and $t$. The results are presented in Table 3. As in the static DGP, the RMSE tends to decrease as $T$ increases. Furthermore, the RMSEs for PSEUDO and CK-means are similar to those with known membership when $T$ is large.

Overall, the results show that PSEUDO and CK-means have good properties, especially when $T$ is large. For small sample sizes, the CK-means is preferred. When the sample size is large, PSEUDO tends to be as effective as CK-means. PSEUDO has the distinct computational advantage that the number of regressions is of order $N$, much smaller than the $GN$ regressions under CK-means.

6.1 Example 1: Growth Regressions

The existence of "convergence clubs" has generated much research interest in the growth literature. A group of countries with a similar steady state that can be characterized by the same linear model are said to form a convergence club. Lee, Pesaran, and Smith (1997) took data for 69 countries over the sample 1965 to 2003 from PWT v6.2 provided by Heston, Summers, and Aten (2006).9 The regression model is

$$\tilde y_{it} = \alpha_i + \rho_g \tilde y_{it-1} + \phi_g t + \tilde e_{it}, \qquad i \in I_g, \quad g = 1, 2, \ldots, G, \qquad (9)$$

where $\tilde y_{it}$ is the log per-capita output, $\alpha_i$ denotes country-specific fixed effects, and $\rho_g$ and $\phi_g$, $g = 1, \ldots, G$, $G \in \{1, 2, \ldots, 5\}$, are group specific. While previous studies allow for differences in $\alpha_i$, we allow for the possibility that $\phi$ and $\rho$ are potentially heterogeneous.

The BIC suggests $G = 1$, but $t_g$ rejects parameter homogeneity, and the CK-means suggests $G = 4$. Results for $G = 4$ are reported in the left panel (Model A) of Table 4. Evidently, $\hat\phi_g$ is negative in groups 1 and 2, but positive in groups 3 and 4. Furthermore, $\hat\rho_g$ is much smaller in groups 1 and 3 than in groups 2 and 4. Thus, both $\rho_g$ and $\phi_g$ appear to be heterogeneous. Interestingly, the 21 OECD countries do not all belong to the same group, while the fast growing countries including Indonesia, Korea, Malaysia, and Thailand are in the same (non-OECD) group. A priori information would be unlikely to arrive at such a grouping.

Equation (9) assumes cross-section independence in $\tilde e_{it}$. Pesaran (2006) suggests controlling for cross-correlated errors by adding the cross-section averages of appropriate variables to the pooled regression. To avoid simultaneity bias, we add $\Delta\bar y_{t-1}$ and $\Delta\bar y_{t-2}$ to both the pooled and the individual regressions, where $\bar y_s = N^{-1}\sum_{i=1}^N \tilde y_{is}$ and $\Delta\bar y_s = \bar y_s - \bar y_{s-1}$, $s = 1, \ldots, T$. The result to highlight in the right panel of Table 4 (labeled Model B) is that countries differ in both the growth rate and in the speed of adjustment to equilibrium.

9 We started with 75 countries, as in Mankiw, Romer, and Weil (1992). From this, Germany is removed from the data set due to reunification. Due to limitations of the data, we also remove Bangladesh, Bolivia, Botswana, Haiti, and Myanmar.
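A stylized sketch of the Model B augmentation at the unit level, with two lags of the cross-section average growth rate added as regressors. The data handling (a balanced $N \times T$ array of log per-capita output) and the function name are hypothetical, and a unit-specific constant stands in for the country fixed effect.

```python
import numpy as np

def augmented_growth_regression(y):
    """Sketch of the Model B augmentation: regress y_it on y_it-1, a linear
    trend, and two lags of the cross-section average growth rate, country by
    country."""
    N, T = y.shape
    ybar = y.mean(axis=0)            # cross-section average of log output
    dybar = np.diff(ybar)            # Delta ybar_s = ybar_s - ybar_{s-1}
    coefs = []
    for i in range(N):
        lhs = y[i, 3:]               # usable sample t = 3, ..., T-1
        rhs = np.column_stack([
            np.ones(T - 3),          # unit-specific constant
            y[i, 2:-1],              # y_{i,t-1}
            np.arange(3, T),         # trend
            dybar[1:-1],             # Delta ybar_{t-1}
            dybar[:-2],              # Delta ybar_{t-2}
        ])
        coefs.append(np.linalg.lstsq(rhs, lhs, rcond=None)[0])
    return np.array(coefs)           # rows: (const, rho, phi, beta, gamma)
```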

6.2 Example 2: Housing Dynamics

Housing wealth is a large component of household wealth, and housing market activities are always closely watched by policy makers and business cycle analysts. Stock and Watson (2008) provide a new data set on state-level monthly seasonally adjusted building permits ($H_{it}$) from 1969:1 to 2007:4. They use the K-means algorithm to cluster the four quarter change in the idiosyncratic component of $\log(H_{it})$, where the common component is estimated by a dynamic factor model. The K-means algorithm finds $G = 5$ groups that roughly define contiguous geographical regions, even though no spatial structure was imposed in the estimation. Their results suggest that variations in housing permits consist of a national, a regional, and an idiosyncratic component.

We analyze the data to study whether the long run effects of the federal funds rate $R_t$ on $H_{it}$ are group specific. For each $i = 1, \ldots, 50$, an autoregressive distributed lag model in $H_{it}$ and $R_t$ can be reparameterized as

$$\Delta_4 \log(H_{it}) = \alpha_i + \beta_{(i)1}\log(H_{it-1}) + \beta_{(i)2} R_{t-1} + \sum_{j=1}^3 \gamma_{(i)1,j}\Delta\log(H_{it-j}) + \sum_{j=1}^3 \gamma_{(i)2,j}\Delta R_{t-j} + e_{it}.$$

The parameter $\beta_{(i)2}$ is the sum of the coefficients on $R_{t-1}$ and its lags, while $\beta_{(i)1}$ is the sum of the coefficients on the autoregressive terms. The long run effect $\frac{\beta_{(i)2}}{1 - \beta_{(i)1}}$ can differ across states if $\beta_{(i)1}$ and/or $\beta_{(i)2}$ vary across states. We allow the short-run dynamics $(\gamma_{(i)1,j}, \gamma_{(i)2,j})$ to be state specific. As there is no evidence to support the need to consider more than two groups, we report only results for $G = 1$ and $G = 2$. We consider four models, and the results are reported in Table 5:

(A) $\beta_{(i)1} = \beta_1$, $\beta_{(i)2} = \beta_2$ for all $i$;
(B) $\beta_{(i)1} = \beta_{g1}$ and $\beta_{(i)2} = \beta_{g2}$ for $i \in I_g$;
(C) $\beta_{(i)1} = \beta_1$ for all $i$, $\beta_{(i)2} = \beta_{g2}$ for $i \in I_g$;
(D) $\beta_{(i)1} = \beta_{g1}$ for $i \in I_g$, $\beta_{(i)2} = \beta_2$ for all $i$.

Model (A) is a pooled regression that imposes parameter homogeneity. Model (B) allows both $\beta_{(i)1}$ and $\beta_{(i)2}$ to be group specific. Model (C) allows $\beta_{(i)2}$ to be group specific but $\beta_{(i)1}$ is the same across $i$. Model (D) allows $\beta_{(i)1}$ to be group specific but $\beta_{(i)2}$ is the same across $i$. Not surprisingly, the pooled estimates of $\beta_1$ and $\beta_2$ are roughly the average of the group estimates, and the sum of squared residuals of model (A) is the largest. The group specific long run effects are always larger in absolute value than the one implied by the pooled regression.

Given the group-specific estimates $\hat\beta_1$ and/or $\hat\beta_2$, the $t_g$ statistic is used to test for parameter heterogeneity in the full panel, as well as within the identified groups. While parameter homogeneity in the full panel is always rejected, the parameters in the subgroups appear to be homogeneous. The $t_g$ statistic does not take into account that allowing for parameter heterogeneity increases the complexity of the model. When this is taken into account by the BIC, a group effect is found for $\beta_1$ and only when group membership is determined by the CK-means. Overall, the data find some evidence for parameter heterogeneity but do not strongly reject a pooled regression for the 50 states. This shows that the proposed method favors group effects only if the evidence is strong.
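As a small illustration of the reparameterization, the implied long run effect can be read off the two summed coefficients; the numbers below are purely illustrative and are not the estimates reported in Table 5.

```python
def long_run_effect(beta1, beta2):
    """Long run effect of the funds rate implied by the reparameterized ADL:
    beta2 / (1 - beta1)."""
    return beta2 / (1.0 - beta1)

# purely illustrative values
print(long_run_effect(-0.10, -0.02))   # -0.02 / 1.10, about -0.018
```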

7 Conclusion

We use time series estimates of the coefficients for each unit to form ‘pseudo threshold variables’. These are then used to partition the panel into groups. A conditional K-means algorithm is also considered. Both methods can be used to estimate group-specific parameters in panel data models when group membership is not known.


APPENDIX

In this appendix, we show that PSEUDO(2,1) yields a negligible misclassification rate in the sense that $N_s(\hat\gamma)/N \to 0$ as $(N, T) \to \infty$ jointly. Let $I^0 = (I_1^0, I_2^0)$ be the true group membership, let $N_g^0$ denote the number of individuals in cluster $I_g^0$ for $g = 1, 2$, and let $I = (I_1, I_2)$ denote group membership other than $(I_1^0, I_2^0)$. Suppose that the DGP is

$$\tilde y_{it} = \alpha_i + \tilde x_{it}\beta_1 + \tilde e_{it} \ \text{ for } i \in I_1^0, \qquad \tilde y_{it} = \alpha_i + \tilde x_{it}\beta_2 + \tilde e_{it} \ \text{ for } i \in I_2^0.$$

We will consider the general case where $\beta_2 - \beta_1 = \nu T^{-\alpha}$, $0 \leq \alpha < 1/2$, $\nu$ does not depend on $T$, and $\|\nu\| > 0$. Then $\alpha = 0$ corresponds to the case when $\beta_2 - \beta_1 = \nu \neq 0$. For $g, j = 1, 2$, let $N_{gj}$ be the number of individuals assigned to group $j$ by $I = (I_1, I_2)$ when individuals truly belong to group $g$. We also let $N_s = N_s(I, I^0) = N_{21} + N_{12}$ be the number of misclassified units. If $N_s/N \to 0$, then we have $\hat\beta_g(\hat q) \to \beta_g$, $g = 1, 2$. In the following, we will show $N_s/N \to 0$ under PSEUDO. Let $\Gamma^0$ be a set of threshold values that will achieve correct clustering. Let $\gamma_{\min}^0 = \min_\gamma\{\gamma : \gamma \in \Gamma^0\}$ and $\gamma_{\max}^0 = \max_\gamma\{\gamma : \gamma \in \Gamma^0\}$. Then for any $\gamma^0 \in [\gamma_{\min}^0, \gamma_{\max}^0]$,

$$F(\gamma^0) = P(q_i < \gamma^0) = \frac{\sum_{i=1}^N 1(q_i < \gamma_{\max}^0)}{N} = \frac{N_1^0}{N}.$$

Consider the following cases:

I. $\Gamma^0$ known, $q_i$ estimated: Suppose we know $\Gamma^0$ but not $q_i$. Let $\hat c_i/\sqrt{T} = \hat q_i - q_i$. Then

$$\frac{N_s}{N} = \frac{1}{N}\sum_{i \in I_1^0} 1\left(\hat q_i > \gamma_{\max}^0\right) + \frac{1}{N}\sum_{i \in I_2^0} 1\left(\hat q_i < \gamma_{\min}^0\right)$$
$$= \frac{1}{N}\sum_{i \in I_1^0} 1\left(\gamma_{\max}^0 < q_i + \frac{\hat c_i}{\sqrt{T}}\right) + \frac{1}{N}\sum_{i \in I_2^0} 1\left(\gamma_{\min}^0 > q_i + \frac{\hat c_i}{\sqrt{T}}\right)$$
$$= \frac{1}{N}\sum_{i \in I_1^0} 1\left(\frac{\hat c_i}{\sqrt{T}} > \gamma_{\max}^0 - q_i\right) + \frac{1}{N}\sum_{i \in I_2^0} 1\left(\frac{\hat c_i}{\sqrt{T}} < \gamma_{\min}^0 - q_i\right).$$

Note that $\hat c_i > 0$ if $\hat q_i > q_i$ and $i \in I_1^0$, and $\hat c_i < 0$ if $\hat q_i < q_i$ and $i \in I_2^0$. Now $\gamma_{\max}^0 = \beta_2$ and $\gamma_{\min}^0 = \beta_1$ with $q_i = \beta_1$ for those $i \in I_1^0$ and $q_i = \beta_2$ for those $i \in I_2^0$, so

$$\frac{N_s}{N} = \frac{1}{N}\sum_{i \in I_1^0} 1\left(\frac{\hat c_i}{\sqrt{T}} > \beta_2 - q_i\right) + \frac{1}{N}\sum_{i \in I_2^0} 1\left(\frac{\hat c_i}{\sqrt{T}} < \beta_1 - q_i\right)$$
$$= \frac{1}{N}\sum_{i \in I_1^0} 1\left(\frac{\hat c_i}{\sqrt{T}} > \beta_2 - \beta_1\right) + \frac{1}{N}\sum_{i \in I_2^0} 1\left(\frac{\hat c_i}{\sqrt{T}} < \beta_1 - \beta_2\right)$$
$$= \frac{1}{N}\sum_{i \in I_1^0} 1\left(\hat c_i > \nu T^{1/2 - \alpha}\right) + \frac{1}{N}\sum_{i \in I_2^0} 1\left(\hat c_i < -\nu T^{1/2 - \alpha}\right).$$

Since $0 \leq \alpha < 1/2$, $N_s/N$ tends to zero as $T \to \infty$.

II. $\Gamma^0$ and $q_i$ both unknown: Now turn to the case when $\Gamma^0$ and $q_i$ are both unknown. Under the assumption that $\hat\gamma_{\min} - \gamma_{\min}^0$ and $\hat\gamma_{\max} - \gamma_{\max}^0$ are of order $O_p(N^{-1}T^{-1/2})$, we can let $\frac{\hat d_{\max}}{N\sqrt{T}} = \gamma_{\max}^0 - \hat\gamma_{\max}$ and $\frac{\hat d_{\min}}{N\sqrt{T}} = \gamma_{\min}^0 - \hat\gamma_{\min}$. Note that $\hat d_{\max}$ and $\hat d_{\min}$ do not depend on $i$. Because $\hat q_i = q_i + \frac{\hat c_i}{\sqrt{T}}$, we have

$$\frac{N_s}{N} = \frac{1}{N}\sum_{i \in I_1^0} 1\left(\hat q_i > \hat\gamma_{\max}\right) + \frac{1}{N}\sum_{i \in I_2^0} 1\left(\hat q_i < \hat\gamma_{\min}\right)$$
$$= \frac{1}{N}\sum_{i \in I_1^0} 1\left(\frac{\hat c_i}{\sqrt{T}} + \frac{\hat d_{\max}}{N\sqrt{T}} > \gamma_{\max}^0 - q_i\right) + \frac{1}{N}\sum_{i \in I_2^0} 1\left(\frac{\hat c_i}{\sqrt{T}} + \frac{\hat d_{\min}}{N\sqrt{T}} < \gamma_{\min}^0 - q_i\right)$$
$$= \frac{1}{N}\sum_{i \in I_1^0} 1\left(\frac{\hat c_i}{\sqrt{T}} + \frac{\hat d_{\max}}{N\sqrt{T}} > \beta_2 - \beta_1\right) + \frac{1}{N}\sum_{i \in I_2^0} 1\left(\frac{\hat c_i}{\sqrt{T}} + \frac{\hat d_{\min}}{N\sqrt{T}} < \beta_1 - \beta_2\right)$$
$$= \frac{1}{N}\sum_{i \in I_1^0} 1\left(\frac{\hat c_i}{\sqrt{T}} + \frac{\hat d_{\max}}{N\sqrt{T}} > \nu T^{-\alpha}\right) + \frac{1}{N}\sum_{i \in I_2^0} 1\left(\frac{\hat c_i}{\sqrt{T}} + \frac{\hat d_{\min}}{N\sqrt{T}} < -\nu T^{-\alpha}\right)$$
$$= \frac{1}{N}\sum_{i \in I_1^0} 1\left(\hat c_i > \nu T^{-\alpha + 1/2} - \frac{\hat d_{\max}}{N}\right) + \frac{1}{N}\sum_{i \in I_2^0} 1\left(\hat c_i < -\nu T^{-\alpha + 1/2} - \frac{\hat d_{\min}}{N}\right).$$

Notice that

$$P\left(\hat q_i > \hat\gamma_{\max} \mid i \in I_1^0\right) = P\left(\hat c_i > \nu T^{-\alpha + 1/2} - \frac{\hat d_{\max}}{N} \,\Big|\, i \in I_1^0\right) \leq P\left(|\hat c_i| > \nu T^{-\alpha + 1/2} + O_p(N^{-1}) \,\Big|\, i \in I_1^0\right) = O(T^{2\alpha - 1}),$$

where the last equality comes from Chebyshev's inequality. Similarly, $P(\hat q_i < \hat\gamma_{\min} \mid i \in I_2^0) = O(T^{2\alpha - 1})$. Thus, $E(N_s/N) = O(T^{2\alpha - 1})$. Since $0 \leq \alpha < 1/2$, $N_s/N$ tends to zero as $T \to \infty$.




References

Abraham, C., P. Cornillion, E. Matzner-Lober, and N. Molinari (2003): "Unsupervised Curve Clustering Using B-Splines," Scandinavian Journal of Statistics, 30, 581–595.

Alvarez, J., and M. Arellano (2003): "The Time Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators," Econometrica, 71, 1121–1160.

Alvarez, J., M. Browning, and M. Ejrnæs (2010): "Modelling Income Processes with Lots of Heterogeneity," The Review of Economic Studies, 77, 1353–1381.

Anderson, T., and C. Hsiao (1982): "Formulation and Estimation of Dynamic Models Using Panel Data," Journal of Econometrics, 18, 47–82.

Andrews, D., and W. Ploberger (1994): "Optimal Tests When a Nuisance Parameter is Present only under the Alternative," Econometrica, 62, 1383–1414.

Bai, J. (1997): "Estimation of a Change Point in Multiple Regression Models," Review of Economics and Statistics, 79, 551–563.

Browning, M., and J. Carro (2007): "Heterogeneity and Microeconometric Modeling," Advances in Economics and Econometrics, 3, edited by Richard Blundell, Whitney Newey and Torsten Persson, Cambridge University Press.

Burnside, C. (1996): "Production Function Regressions, Returns to Scale and Externalities," Journal of Monetary Economics, 177–201.

Calinski, R., and J. Harabasz (1974): "A Dendrite Method for Cluster Analysis," Communication in Statistics, 3, 1–27.

Caner, M., and B. E. Hansen (2004): "Instrumental Variable Estimation of a Threshold Model," Econometric Theory, 20, 813–843.

Davies, R. B. (1977): "Hypothesis Testing when a Nuisance Parameter is Present only under the Alternative," Biometrika, 64, 247–254.

Duda, R., and P. Hart (1973): Pattern Classification and Scene Analysis. Wiley.

Durlauf, S. N., A. Kourtellos, and C. M. Tan (2008): "Empirics of Growth and Development," International Handbook of Development Economics, 1, edited by Amitava Dutt and Jaime Ros, Edward Elgar.

Garcia-Escudero, L., and A. Gordaliza (1999): "Robustness Properties of K Means and Trimmed K Means," Journal of the American Statistical Association, 94, 956–969.

Goldfeld, S., and R. Quandt (1973): "The Estimation of Structural Shifts by Switching Regressions," Annals of Economic and Social Measurement, 2, 475–485.

Hahn, J., and G. Kuersteiner (2002): "Asymptotically Unbiased Inference for a Dynamic Panel Model with Fixed Effects when Both N and T are Large," Econometrica, 70, 1639–1657.

Hansen, B. E. (1996): "Inference When a Nuisance Parameter Is Not Identified under the Null Hypothesis," Econometrica, 64, 413–430.

Hansen, B. E. (1999): "Threshold Effects in Non-dynamic Panels: Estimation, Testing, and Inference," Journal of Econometrics, 93, 345–368.

Hartigan, J. A. (1975): Clustering Algorithms. Wiley.

Henderson, D. J., and R. R. Russell (2005): "Human Capital and Convergence: A Production-Frontier Approach," International Economic Review, 46, 1167–1205.

Heston, A., R. Summers, and B. Aten (2006): Penn World Table Version 6.2. Center for International Comparisons of Production, Income and Prices at the University of Pennsylvania.

Hsiao, C., and M. H. Pesaran (2004): "Random Coefficient Panel Data Models," edited by L. Matyas and P. Sevestre, Third Edition, Springer Publishers, Ch. 6.

Hsiao, C., and A. K. Tahmiscioglu (1997): "A Panel Analysis of Liquidity Constraints and Firm Investment," Journal of the American Statistical Association, 92, 455–465.

Lee, K., M. H. Pesaran, and R. Smith (1997): "Growth and Convergence in a Multi-country Empirical Stochastic Solow Model," Journal of Applied Econometrics, 12, 357–392.

Maddala, G., R. Trost, H. Li, and F. Joutz (1997): "Estimation of Short Run and Long Run Elasticities of Energy Demand from Panel Data using Shrinkage Estimators," Journal of Business and Economic Statistics, 15, 90–100.

Mankiw, N. G., D. Romer, and D. N. Weil (1992): "A Contribution to the Empirics of Economic Growth," The Quarterly Journal of Economics, 107, 407–437.

Milligan, G., and W. Cooper (1985): "An Examination of Procedures for Determining the Number of Clusters in a Data Set," Psychometrika, 50, 159–179.

Pesaran, M. H. (2006): "Estimation and Inference in Large Heterogeneous Panels with a Multifactor Error Structure," Econometrica, 74(4), 967–1012.

Pesaran, M. H., and T. Yamagata (2008): "Testing Slope Homogeneity in Large Panels," Journal of Econometrics, 142, 50–93.

Pollard, D. (1981): "Strong Consistency of K-means Clustering," Annals of Statistics, 9, 135–140.

Pollard, D. (1982): "A Central Limit Theorem for K-Means Clustering," Annals of Probability, 10, 919–926.

Qin, L. X., and S. Self (2006): "The Clustering of Regression Models Methods with Applications in Gene Expression Data," Biometrics, 62, 526–533.

Robertson, D., and J. Symons (1992): "Some Strange Properties of Pooled Data Estimators," Journal of Applied Econometrics, 7, 175–189.

Song, K. (2004): "Large Panel Models with Limited Heterogeneity," mimeo.

Sugar, C., and G. James (2003): "Finding the Number of Clusters in a Dataset: An Information Theoretic Approach," Journal of the American Statistical Association, 98:463, 750–763.

Sun, Y. (2005): "Estimation and Inference in Panel Structure Models," Working Paper 2005-11, Department of Economics, University of California, San Diego.

Swamy, P. A. V. B. (1970): "Efficient Inference in a Random Coefficient Regression Model," Econometrica, 38, 311–323.


Table 1: RMSE: Static Panel Parameters (G,K)=(2,1) β1 = 0.3 β2 = 0.9

(G,K)=(2,2) β1 = (0.1, 0.3)0 β2 = (2/3, 0.6)0 (G,K)=(3,1) β1 = 0.3 β2 = 0.5 β3 = 0.8 (G,K)=(3,2) β1 = (0.3, −0.3)0 β2 = (0.5, 0)0 β3 = (0.7, 0.3)0

T 20 50 100 200 500 20 50 100 200 500 20 50 100 200 500 20 50 100 200 500

PSEUDO 100 200 10.86 12.12 1.64 5.51 0.82 1.25 0.57 0.40 0.37 0.26 6.51 8.76 1.34 1.02 0.82 0.59 0.59 0.41 0.36 0.27 13.78 13.60 9.14 8.51 5.57 5.30 2.28 2.21 0.53 0.36 12.17 11.98 7.07 6.22 3.82 3.41 1.79 1.58 0.52 0.41

N=50 8.60 1.79 1.14 0.82 0.53 6.62 1.72 1.17 0.81 0.52 14.02 10.73 6.39 2.60 0.67 13.03 9.87 4.54 2.02 0.63

500 12.16 6.64 4.20 2.68 0.16 9.82 4.00 2.27 0.26 0.16 13.57 8.49 5.46 2.16 0.27 12.25 5.96 3.23 1.36 0.30

50 8.95 1.81 1.14 0.82 0.53 6.06 1.72 1.17 0.81 0.52 14.01 10.44 4.65 1.81 0.66 12.55 7.24 1.76 0.99 0.63

CK-means 100 200 11.20 12.18 1.63 5.50 0.82 1.03 0.57 0.40 0.37 0.26 5.69 8.24 1.32 1.00 0.82 0.59 0.59 0.41 0.36 0.27 13.33 13.52 7.61 7.67 4.08 3.97 1.65 1.57 0.48 0.33 9.82 9.70 4.16 3.98 1.43 1.24 0.73 0.50 0.44 0.32

500 12.37 6.57 2.65 0.28 0.16 9.56 4.52 2.37 0.26 0.16 13.56 8.36 4.83 1.50 0.23 10.78 4.21 1.10 0.32 0.20

PSEUDO when G is known 50 100 200 500 8.44 8.04 8.03 7.86 1.79 1.49 1.38 1.10 1.14 0.82 0.56 0.37 0.82 0.57 0.40 0.27 0.53 0.37 0.26 0.16 6.62 6.47 6.46 6.75 1.72 1.34 1.01 0.84 1.17 0.82 0.59 0.36 0.81 0.59 0.41 0.26 0.52 0.36 0.27 0.16 13.75 13.74 13.77 13.87 9.05 9.11 9.09 9.06 5.74 5.61 5.38 5.35 2.60 2.28 2.21 2.16 0.67 0.53 0.36 0.27 11.96 12.00 12.16 12.38 7.26 7.44 7.74 8.02 4.39 4.19 3.97 3.82 2.02 1.79 1.58 1.36 0.63 0.52 0.41 0.30

50 2.70 1.56 1.14 0.82 0.53 2.60 1.65 1.16 0.81 0.52 3.30 1.99 1.40 1.01 0.64 3.23 2.04 1.43 0.99 0.63

I 0 is known 100 200 500 1.88 1.32 0.83 1.14 0.83 0.52 0.82 0.56 0.37 0.57 0.40 0.27 0.37 0.26 0.16 1.87 1.34 0.84 1.17 0.83 0.52 0.82 0.59 0.36 0.59 0.41 0.26 0.36 0.27 0.16 2.29 1.62 1.03 1.40 1.02 0.64 1.01 0.70 0.46 0.70 0.50 0.32 0.46 0.32 0.20 2.31 1.65 1.03 1.44 1.01 0.64 1.00 0.72 0.44 0.72 0.50 0.32 0.44 0.32 0.20

Data are generated from the following static panel model. For $G = 2, 3$, $K = 1, 2$, $t = 1, \ldots, T$, and $i = 1, \ldots, N$,
$$\tilde y_{it} = \alpha_i + \sum_{g=1}^G \left(\sum_{k=1}^K \tilde x_{kit}\beta_{gk}\right) 1(i \in I_g^0) + \tilde e_{it}.$$
The numbers reported above are the RMSEs ×100, where the RMSE is defined in the text. The number of units in each group is restricted to be at least max(10, 0.1N). Columns 1 and 2 assume that G and group membership are both unknown. Column 3 assumes that G is known.

Table 2: RMSE: Static Panel – Robustness Check Parameters (G,K)=(2,1) β1 = 0.55 β2 = 0.65

(G,K)=(2,2) β1 = (0.3, 0.4)0 β2 = (0.4, 0.5)0 (G,K)=(3,1) β1 = 0.4 β2 = 0.5 β3 = 0.6 (G,K)=(3,2) β1 = (0.4, 0.2)0 β2 = (0.5, 0.3)0 β3 = (0.6, 0.4)0

T 20 50 100 200 500 20 50 100 200 500 20 50 100 200 500 20 50 100 200 500

N=50 10.02 4.91 4.78 4.57 2.74 5.11 4.81 4.74 4.54 1.01 11.55 8.04 6.05 5.04 4.64 8.85 8.05 5.60 4.76 3.87

PSEUDO 100 200 10.97 11.27 7.05 7.12 5.10 5.16 3.64 3.60 1.73 1.70 8.46 9.84 4.88 5.83 4.73 4.39 3.12 3.05 1.13 1.25 11.56 12.26 7.73 7.74 5.94 5.92 5.01 5.00 4.13 3.06 9.76 11.06 6.86 6.92 5.44 5.50 4.79 4.78 2.21 2.01

500 12.87 7.84 5.27 3.59 1.68 10.97 6.96 4.77 3.12 1.38 13.02 8.10 5.79 4.20 2.45 11.43 7.31 5.18 3.55 2.05

50 9.97 4.91 4.78 4.58 2.78 5.25 4.81 4.74 4.44 0.85 11.49 8.04 6.05 5.05 4.71 8.85 7.82 5.25 4.78 3.32

CK-means 100 200 10.96 12.09 7.05 7.12 5.12 5.17 3.67 3.63 1.76 1.71 8.55 10.25 4.98 5.30 4.67 3.68 2.32 2.22 0.69 0.64 11.60 12.55 7.71 7.88 5.94 5.89 5.02 4.76 2.09 1.93 9.03 10.72 6.08 6.01 5.07 5.03 4.78 3.95 0.81 0.75

See footnote below Table 1.


500 12.76 7.68 5.46 3.80 1.69 11.05 6.59 4.07 2.16 0.60 13.01 8.00 5.75 4.00 1.92 11.35 7.00 4.06 2.43 0.68

PSEUDO when G is known 50 100 200 500 11.11 10.97 10.94 10.90 7.16 7.09 7.12 7.06 5.20 5.17 5.16 5.14 3.63 3.62 3.60 3.58 1.77 1.73 1.70 1.68 8.97 8.66 8.45 8.35 6.00 5.90 5.83 5.83 4.31 4.34 4.39 4.43 2.82 2.95 3.05 3.11 0.97 1.13 1.25 1.38 12.62 12.44 12.40 12.33 7.95 7.93 7.95 7.90 5.80 5.81 5.82 5.81 4.43 4.47 4.51 4.54 3.15 3.30 3.38 3.53 11.05 10.80 10.67 10.64 7.10 7.06 7.05 7.11 5.06 5.15 5.21 5.30 3.74 3.86 3.97 4.08 2.07 2.12 2.07 2.14

50 2.70 1.56 1.14 0.82 0.53 2.60 1.65 1.16 0.81 0.52 3.30 1.99 1.40 1.01 0.64 3.23 2.04 1.43 0.99 0.63

I 0 is known 100 200 500 1.88 1.32 0.83 1.14 0.83 0.52 0.82 0.56 0.37 0.57 0.40 0.27 0.37 0.26 0.16 1.87 1.34 0.84 1.17 0.83 0.52 0.82 0.59 0.36 0.59 0.41 0.26 0.36 0.27 0.16 2.29 1.62 1.03 1.40 1.02 0.64 1.01 0.70 0.46 0.70 0.50 0.32 0.46 0.32 0.20 2.31 1.65 1.03 1.44 1.01 0.64 1.00 0.72 0.44 0.72 0.50 0.32 0.44 0.32 0.20

Table 3: RMSE: Dynamic Panel Model.

T 20 50 100 200 500

N=50 15.41 5.84 1.60 0.99 0.56

PSEUDO 100 200 15.98 16.69 6.72 6.67 1.38 1.27 0.80 0.67 0.42 0.34

500 16.64 7.08 2.27 0.60 0.27

50 13.06 3.03 1.60 0.99 0.56

CK-means 100 200 13.35 15.46 2.67 2.53 1.38 1.24 0.80 0.67 0.42 0.34

500 15.90 5.28 2.84 0.60 0.27

PSEUDO when G is known 50 100 200 500 15.38 15.45 15.47 15.48 5.84 7.03 7.93 8.49 1.60 1.38 1.27 1.23 0.99 0.80 0.67 0.60 0.56 0.42 0.34 0.27

I 0 is known 6.55 2.77 1.60 0.99 0.56

6.12 2.51 1.38 0.80 0.42

5.86 2.33 1.24 0.67 0.34

5.70 2.24 1.14 0.60 0.27

Data are generated from the following dynamic panel model:
$$\tilde y_{it} = \alpha_i + \rho_g \tilde y_{i,t-1} + \phi_g t + \tilde e_{it}, \qquad \text{if } i \in I_g^0.$$
See footnote below Table 1.

Table 4: Application: Growth Regression

ρbp bp φ Group Ng ρbg bg φ tg test Group Ng ρbg bg φ tg test

1 9 0.8663 (35.8575) -0.0009 (-3.4227) 2.9625∗ 1 9 0.8394 (31.0516) -0.0009 (-3.0740) 0.5454

Model A 0.9683 (218.0212) 0.0001 (0.9487) PSEUDO 2 3 13 23 0.9693 0.9178 (125.5729) (79.3623) -0.0005 0.0011 (-2.3176) (4.4124) -0.2241 0.9209 CK-means 2 3 20 27 0.9727 0.8764 (166.9385) (73.1236) -0.0004 0.0018 (-2.5774) (7.3693) 1.1865 0.0757

4

1

24 0.9592 (123.8460) 0.0009 (3.8578) 6.3712∗

20 0.9606 (114.8522) -0.0002 (-1.0806) 2.8038∗

4

1

13 0.9519 (93.9476) 0.0018 (4.8926) 1.2584

11 0.9279 (57.2315) -0.0004 (-1.4769) 1.5780

Model B 0.9669 (208.6939) 0.0005 (4.4844) PSEUDO 2 3 9 19 0.8736 0.8866 (41.1522) (59.5900) 0.0011 0.0026 (3.1301) (7.3747) -0.3125 5.1246∗ CK-means 2 3 8 16 0.8771 0.916 (42.6843) (73.9328) 0.0009 0.0023 (2.1326) (7.0390) 0.3537 1.3110

4 21 0.9681 (131.6144) 0.0011 (4.6951) 3.2935∗ 4 34 0.9817 (206.5877) 0.0004 (2.6644) 5.2131∗

Note: Regression models for $g = 1, 2, \ldots, 4$:
Model A: $\tilde y_{it} = \alpha_i + \rho_g \tilde y_{it-1} + \phi_g t + \tilde e_{it}$, $i \in I_g$;
Model B: $\tilde y_{it} = \alpha_i + \rho_g \tilde y_{it-1} + \phi_g t + \beta_g \Delta\bar y_{t-1} + \gamma_g \Delta\bar y_{t-2} + \tilde e_{it}$, $i \in I_g$;
where $\alpha_i$ denotes country-specific fixed effects, $\rho_g$, $\phi_g$, $\beta_g$ and $\gamma_g$ are group specific, $\bar y_s = N^{-1}\sum_{i=1}^N \tilde y_{is}$, and $\Delta\bar y_s = \bar y_s - \bar y_{s-1}$, $s = 1, \ldots, T$. t values are in parentheses. $t_g$ tests the null hypothesis of parameter homogeneity and ∗ denotes significance at the 5% nominal level.




Table 5: Housing Dynamics in 50 States in the U.S.

$$\Delta_4 \log(H_{it}) = \alpha_i + \beta_{(i)1}\log(H_{it-1}) + \beta_{(i)2} R_{t-1} + \sum_{j=1}^3 \gamma_{(i)1,j}\Delta\log(H_{it-j}) + \sum_{j=1}^3 \gamma_{(i)2,j}\Delta R_{t-j} + e_{it}.$$

Regression Homogenous Group-specific βb1 βb2

(A) β·1 , β·2 — -0.117 (-15.887) -0.017 (-14.761)

βb11 βb12 βb21 βb22 b2 β b1 1−β

Ng b G(by BIC) tg : accept H0 tg : accept H00 SSR

(B) — β·1 , β·2 PSEUDO CK-means

-0.0193 50 — no — 60.656

-0.261 (-12.404) -0.024 (-10.378) -0.095 (-12.461) -0.017 (-12.294)

-0.277 (-12.024) -0.028 (-10.271) -0.095 (-12.728) -0.016 (-12.520)

-0.0325 -0.0177 (16, 34) 1 no no 59.679

-0.0387 -0.0242 (14, 36) 1 no yes 59.627


(C) β·1 β·2 PSEUDO CK-means -0.120 -0.120 (-18.498) (-18.498)

-0.024 (-17.719)

(D) β·2 β·1 PSEUDO CK-means

-0.019 (-18.403) -0.231 (-14.059)

-0.019 (-18.218) -0.271 (-15.487)

-0.102 (-14.502)

-0.105 (-13.056)

-0.0247 -0.0212 (14, 36) 1 no yes 59.874

-0.0261 -0.0212 (13, 37) 2 no yes 59.785

-0.024 (-17.719)

-0.010 (-5.921)

-0.010 (-5.921)

-0.0273 -0.0242 (31, 19) 1 no yes 60.009

-0.0273 -0.0242 (31, 19) 1 no yes 60.009

Note: $\alpha_i$, $\gamma_{(i)1,j}$ and $\gamma_{(i)2,j}$ are state-specific. $N_g$ denotes the size of the groups. t values are in parentheses. $t_g$ is used to test for parameter homogeneity for the full sample ($H_0$), and for the subsamples ($H_0'$) with group membership estimated by PSEUDO or CK-means.
