Supplementary Material Dynamic Linear Panel Regression Models ...

Report 2 Downloads 135 Views
Supplementary Material Dynamic Linear Panel Regression Models with Interactive Fixed Effects Hyungsik Roger Moon‡

Martin Weidner§

December 22, 2014

S.1

Proof of Identification (Theorem 2.1) 

Proof of Theorem 2.1. Let Q(β, λ, f ) ≡ E kY − β · X −

λ f 0 k2F

 0 0 λ , f , w , where β ∈

RK , λ ∈ RN ×R and f ∈ RT ×R . We have Q(β, λ, f ) o n   0 = E Tr (Y − β · X − λ f 0 ) (Y − β · X − λ f 0 ) λ0 , f 0 , w n h o 0 i = E Tr λ0 f 00 − λf 0 − (β − β 0 ) · X + e λ0 f 00 − λf 0 − (β − β 0 ) · X + e λ0 , f 0 , w i h 0 0 0 = E Tr (e e) λ , f , w n h o 0 i + E Tr λ0 f 00 − λf 0 − (β − β 0 ) · X λ0 f 00 − λf 0 − (β − β 0 ) · X λ0 , f 0 , w . | {z } ≡Q∗ (β,λ,f )

h i In the last step we used Assumption ID(ii). Since E Tr (e0 e) λ0 , f 0 , w is independent of β, λ, f , we find that minimizing Q(β, λ, f ) is equivalent to minimizing Q∗ (β, λ, f ). We decompose ‡

Department of Economics, University of Southern California, Los Angeles, CA 90089-0253.

Email:

[email protected]. Department of Economics, Yonsei University, Seoul, Korea. § Department of Economics, University College London, Gower Street, London WC1E 6BT, U.K., and CeMMaP. Email: [email protected].

1

Q∗ (β, λ, f ) as follows Q∗ (β, λ, f ) n h 0 0 00 i 0 0 o 0 00 0 0 0 0 = E Tr λ f − λf − (β − β ) · X λ f − λf − (β − β ) · X λ , f , w n h o 0 i = E Tr λ0 f 00 − λf 0 − (β − β 0 ) · X M(λ,λ0 ,w) λ0 f 00 − λf 0 − (β − β 0 ) · X λ0 , f 0 , w n h 0 i 0 0 o 0 00 0 0 00 0 0 0 + E Tr λ f − λf − (β − β ) · X P(λ,λ0 ,w) λ f − λf − (β − β ) · X λ , f , w n h o 0 i = E Tr (β high − β 0,high ) · Xhigh M(λ,λ0 ,w) (β high − β 0,high ) · Xhigh λ0 , f 0 , w {z } | ≡Qhigh (β high ,λ)

n h o 0 i + E Tr λ0 f 00 − λf 0 − (β − β 0 ) · X P(λ,λ0 ,w) λ0 f 00 − λf 0 − (β − β 0 ) · X λ0 , f 0 , w , {z } | ≡Qlow (β,λ,f )

where (β high − β 0,high ) · Xhigh =

PK

m=K1 +1 (β m

− β 0m )Xm . A lower bound on Qhigh (β high , λ) is

given by Qhigh (β high , λ) ≥

min e N ×(R+R+rank(w)) λ∈R

n h 0 i 0 0 o high 0,high E Tr (β high − β 0,high ) · Xhigh M(λ,λ,w) (β − β ) · X λ , f , w e high

min(N,T )

=

X

n h  0 0 0 io high 0,high high 0,high −β ) · Xhigh (β −β ) · Xhigh λ , f , w . µr E (β

r=R+R+rank(w)

(S.1.1) Since Q∗ (β, λ, f ), Qhigh (β high , λ), and Qlow (β, λ, f ), are expectations of traces of positive semi-definite matrices we have Q∗ (β, λ, f ) ≥ 0, Qhigh (β high , λ) ≥ 0, and Qlow (β, λ, f ) ≥ 0 for ¯ λ ¯ and f¯ be the parameter values that minimize Q(β, λ, f ), and thus also all β, λ, f . Let β, ¯ λ, ¯ f¯) = minβ,λ,f Q∗ (β, λ, f ) = 0. This Q∗ (β, λ, f ). Since Q∗ (β 0 , λ0 , f 0 ) = 0 we have Q∗ (β, high ¯ ¯ λ, ¯ f¯) = 0. Assumption ID(v), the lower bound (S.1.1), implies Qhigh (β¯ , λ) = 0 and Qlow (β, high high ¯ = 0 imply that β¯ and Qhigh (β¯ , λ) = β 0,high . Using this we find that ¯ λ, ¯ f¯) Qlow (β,    0   low low 0 0 0 00 0,low 0 00 0,low 0 0 ¯ ¯ ¯ ¯ ¯ ¯ = E Tr λ f − λf − (β − β ) · Xlow λ f − λf − (β − β ) · Xlow λ , f , w ,    0   low low 0 0 0 00 0,low 0 00 0,low 0 0 ¯ ¯ ¯ ¯ ≥ min E Tr λ f − λf − (β − β ) · Xlow λ f − λf − (β − β ) · Xlow λ , f , w f    0   low low 0 0 0 00 0,low 0 00 0,low ¯ ¯ = E Tr λ f − (β − β ) · Xlow Mλ¯ λ f − (β − β ) · Xlow λ , f , w , (S.1.2) 2

P 1 ¯ low 0 low ¯ ¯ ¯ (β, λ, f ) = 0 and the last expression where (β¯ − β 0,low ) · Xlow = K l=1 (β l − β l )Xl . Since Q in (S.1.2) is non-negative we must have    0   low low 0 00 0,low 0 0,low 0 00 0 E Tr λ f − (β¯ − β ) · Xlow Mλ¯ λ f − (β¯ − β ) · Xlow λ , f , w = 0. Using Mλ¯ = Mλ¯ Mλ¯ and the cyclicality of the trace we obtain from the last equality that   Tr Mλ¯ AMλ¯ = 0, where A = E



¯ low

0 00

λ f − (β

−β

0,low

  0 low 0 00 0,low 0 0 ) · Xlow λ f − (β¯ − β ) · Xlow λ , f , w . The

trace of a positive semi-definite matrix is only equal to zero if the matrix itself is equal to zero, so we find Mλ¯ AMλ¯ = 0, This together with the fact that A itself is positive semi definite implies that (note that A positive semi-definite implies A = CC 0 for some matrix C, and Mλ¯ AMλ¯ = 0 then implies Mλ¯ C = 0, i.e. C = Pλ¯ C) A = Pλ¯ APλ¯ , and therefore rank(A) ≤ rank(Pλ¯ ) ≤ R. We have thus shown that     0 low low 0 0 0 00 0,low 0 00 0,low ¯ ¯ rank E λ f − (β − β ) · Xlow λ f − (β − β ) · Xlow λ , f , w ≤ R. We furthermore find     0 low low 0 0,low 0 00 0,low 0 00 0 ) · Xlow λ , f , w ) · Xlow λ f − (β¯ − β R ≥ rank E λ f − (β¯ − β      0 low low 0 00 0,low 0 0,low 0 00 0 ) · Xlow Pf 0 λ f − (β¯ − β ) · Xlow Mw λ , f , w ≥ rank Mw E λ f − (β¯ − β      0 low low 0 0 0,low 0 00 0,low 0 00 ¯ ¯ ) · Xlow Mf 0 λ f − (β − β ) · Xlow Pw λ , f , w + rank Pw E λ f − (β − β   ≥ rank Mw λ0 f 00 f 0 λ00 Mw      0 low low 0 0 0,low 0,low ¯ ¯ + rank E (β − β ) · Xlow Mf 0 (β − β ) · Xlow λ , f , w .   Assumption ID(iv) guarantees that rank Mw λ0 f 00 f 0 λ00 Mw = rank λ0 f 00 f 0 λ00 = R, i.e. we must have  E

¯ low



−β

0,low

) · Xlow



  0 low 0,low 0 0 Mf 0 (β¯ − β ) · Xlow λ , f , w = 0.

low According to Assumption ID(iii) this implies β¯ = β 0,low , i.e. we have β¯ = β 0 . This also ¯ λ, ¯ f¯) = kλ0 f 00 − λ ¯ f¯0 k2 = 0, and thereofere λ ¯ f¯0 = λ0 f 00 . implies Q∗ (β, F

3

S.2

Examples of Error Distributions

p The following Lemma provides examples of error distributions that satisfy kek = Op ( max(N, T )) as N, T → ∞. Example (i) is particularly relevant for us, because those assumptions on eit are imposed in Assumption 5 in the main text, i.e. under those main text assumptions we indeed p have kek = Op ( max(N, T )). Lemma S.2.1. For each of the following distributional assumptions on the errors eit , i = p 1, . . . , N , t = 1, . . . , T , we have kek = Op ( max(N, T )). (i) The eit are independent across i and t, conditional on C, and satisfy E(eit |C) = 0, and E(e4it |C) is bounded uniformly by a non-random constant, uniformly over i, t and N, T . Here C can be any conditioning sigma-field, including the empty one (corresponding to unconditional expectations). (ii) The eit follow different MA(∞) process for each i, namely eit =

∞ X

ψ iτ ui,t−τ ,

for i = 1 . . . N, t = 1 . . . T ,

(S.2.1)

τ =0

where the uit , i = 1 . . . N , t = −∞ . . . T are independent random variables with Euit = 0 and Eu4it uniformly bounded across i, t and N, T . The coefficients ψ iτ satisfy ∞ X

∞ X

τ max ψ 2iτ < B ,

τ =0

i=1...N

τ =0

max |ψ iτ | < B ,

i=1...N

(S.2.2)

for a finite constant B which is independent of N and T . (iii) The error matrix e is generated as e = σ 1/2 u Σ1/2 , where u is an N × T matrix with independently distributed entries uit and Euit = 0, Eu2it = 1, and Eu4it is bounded uniformly across i, t and N, T . Here σ is the N × N cross-sectional covariance matrix, and Σ is the T × T time-serial covariance matrix, and they satisfy max

j=1...N

N X

|σ ij | < B ,

max

τ =1...T

i=1

T X

|Σtτ | < B ,

(S.2.3)

t=1

for some finite constant B which is independent of N and T . In this example we have Eeit ejτ = σ ij Σtτ .

4

Proof of Lemma S.2.1, Example (i). Latala (2005) showed that for a N × T matrix e with independent entries, conditional on C, we have  " #1/2 " #1/4  " #1/2   X X X     2 4 2 E eit C + E eit C E eit C + max , E kek C ≤ c max j   i t

i

i,t

where c is some universal constant. Since we assumed uniformly bounded 4’th conditional p √ √ moments for eit we thus have kek = OP ( T ) + OP ( N ) + OP ((T N )1/4 ) = Op ( max(N, T )). Example (ii). Let ψ j = (ψ 1j , . . . , ψ N j ) be an N × 1 vector for each j ≥ 0. Let U−j be an N × T sub-matrix of (uit ) consisting of uit , i = 1 . . . N , t = 1 − j, . . . , T − j. We can then write equation (S.2.1) in matrix notation as e=

∞ X

diag(ψ j ) U−j

j=0

=

T X

diag(ψ j ) U−j + rN T ,

j=0

where we cut the sum at T , which results in the remainder rN T =

P∞

j=T +1

diag(ψ j ) U−j . When

approximating an MA(∞) by a finite MA(T ) process we have for the remainder 2

E (krN T kF ) =

N X T X

E (rN T )2ij

i=1 t=1



σ 2u

N X T ∞ X X i=1 t=1 j=T +1 ∞ X

max ψ 2ij

≤ σ 2u N T ≤ σ 2u N

ψ 2ij

j=T +1 ∞ X



i

 j max ψ 2ij ,

j=T +1

i

where σ 2u is the variance of uit . Therefore, for T → ∞ we have ! (krN T kF )2 −→ 0 , E N √ which implies (krN T kF )2 = Op (N ), and therefore krN T k ≤ krN T kF = Op ( N ). Let V be the N × 2T matrix consisting of uit , i = 1 . . . N , t = 1 − T, . . . , T . For j = 0 . . . T the matrices U−j are sub-matrices of V , and therefore kU−j k ≤ kV k. From example (i) we know p  that kV k = Op ( max(N, 2T )). Furthermore, we know that k diag(ψ j )k ≤ maxi ψ ij .

5

Combining these results we find kek ≤

T X

k diag(ψ j )k kU−j k + krN T k

j=0



T X

√  max ψ ij kV k + op ( N ) i

j=0



"∞ X

# p √  max ψ ij Op ( max(N, 2T )) + op ( N )

j=0

i

p ≤ Op ( max(N, T )) . This is what we wanted to show. Example (iii). Since σ and Σ are positive definite, there exits a symmetric N ×N matrix φ and a symmetric T ×T matrix ψ such that σ = φ2 and Σ = ψ 2 . The error term can then be generated as e = φuψ, where u is an N ×T matrix with iid entries uit such that E(uit ) = 0 and E(u4it ) < ∞. Given this definition of e we immediately have Eeit = 0 and Eeit ejτ = σ ij Σtτ . What is left to p p show is that kek = Op ( max(N, T )). From example (i) we know that kuk = Op ( max(N, T )). p Using the inequality kσk ≤ kσk1 kσk∞ = kσk1 , where kσk1 = kσk∞ because σ is symmetric we find kσk ≤ kσk1 ≡ max

j=1...N

N X

|σ ij | < L ,

i=1

and analogously kΣk < L. Since kσk = kφk2 and kΣk = kψk2 , we thus find kek ≤ kφkkukkψk ≤ p p LOp ( max(N, T )), i.e. kek = Op ( max(N, T )).

S.3

Comments on assumption 4 on the regressors

b requires that the regressors not only satisfy the standard Consistency of the LS estimator β non-collinearity condition in assumption 4(i), but also the additional conditions on high- and low-rank regressors in assumption 4(ii). Bai (2009) considers the special cases of only highrank and only low-rank regressors. As low-rank regressors he considers only cross-sectional invariant and time-invariant regressors, and he shows that if only these two types of regressors are present, one can show consistency under the assumption plimN,T →∞ WN T > 0 on the regressors (instead of assumption 4), where WN T is the K × K matrix defined by WN T,k1 k2 = (N T )−1 Tr(Mf 0 Xk0 1 Mλ0 Xk2 ). This matrix appears as the approximate Hessian in the profile 6

objective expansion in theorem 4.1, i.e. the condition plimN,T →∞ WN T > 0 is very natural in the context of the interactive fixed effect models, and one may wonder whether also for the general case one can replace assumption 4 with this weaker condition and still obtain consistency of the LS estimator. Unfortunately, this is not the case, and below we present two simple counter examples that show this. (i) Let there only be one factor (R = 1) ft0 with corresponding factor loadings λ0i . Let there only be one regressor (K = 1) of the form Xit = wi vt + λ0i ft0 . Assume that the N × 1 vector w = (w1 , . . . , wN )0 , and the T × 1 vector v = (v1 , . . . , vN )0 are such that the N × 2 matrix Λ = (λ0 , w) and and the T × 2 matrix F = (f 0 , v) satisfy plimN,T →∞ (Λ0 Λ/N ) > 0, plimN,T →∞ (F 0 F/T ) > 0. In this case, we have WN T = (N T )−1 Tr(Mf 0 vw0 Mλ0 wv 0 ), and therefore plimN,T →∞ WN T = plimN,T →∞ (N T )−1 Tr(Mf 0 vw0 Mλ0 wv 0 ) > 0. However, β is not identified because β 0 X + λ0 f 00 = (β 0 + 1)X − wv 0 , i.e. it is not possible to distinguish (β, λ, f ) = (β 0 , λ0 , f 0 ) and (β, λ, f ) = (β 0 + 1, −w, v). This implies that the LS estimator is not consistent (both β 0 and β 0 + 1 could be the true parameter, but the LS estimator cannot be consistent for both). (ii) Let there only be one factor (R = 1) ft0 with corresponding factor loadings λ0i . Let the N × 1 vectors λ0 , w1 and w2 be such that Λ = (λ0 , w1 , w2 ) satisfies plimN,T →∞ (Λ0 Λ/N ) > 0. Let the T ×1 vectors f 0 , v1 and v2 be such that F = (f 0 , v1 , v2 ) satisfies plimN,T →∞ (F 0 F/T ) > 0. Let there be four regressors (K = 4) defined by X1 = w1 v10 , X2 = w2 v20 , X3 = (w1 + λ0 )(v2 + f 0 )0 , X4 = (w2 + λ0 )(v1 + f 0 )0 . In this case, one can easily check that P plimN,T →∞ WN T > 0. However, again β k is not identified, because 4k=1 β 0k Xk + λ0 f 00 = P4 0 0 00 0 k=1 (β k +1)Xk −(λ +w1 +w2 )(f +v1 +v2 ) , i.e. we cannot distinguish between the true parameters and (β, λ, f ) = (β 0 + 1, −λ0 − w1 − w2 , f 00 + v1 + v2 ). Again, as a consequence the LS estimator is not consistent in this case. In example (ii), there are only low-rank regressors with rank(Xl ) = 1. One can easily check that assumption 4 is not satisfied for this example. In example (i) the regressor is a low-rank regressor with rank(X) = 2. In our present version of assumption 4 we only consider low-rank regressors with rank rank(X) = 1, but (as already noted in a footnote in the main paper) it is straightforward to extend the assumption and the consistency proof to low-rank regressors with rank larger than one. Independent of whether we extend the assumption or not, the regressor X of example (i) fails to satisfy assumption 4. This justifies our formulation of assumption 4, because it shows that in general the assumption cannot be replaced by the weaker condition plimN,T →∞ WN T > 0. 7

S.4

Some Matrix Algebra (including Proof of Lemma A.1)

The following statements are true for real matrices (throughout the whole paper and supplementary material we never use complex numbers anywhere). Let A be an arbitrary n × m matrix. In addition to the operator (or spectral) norm kAk and to the Frobenius (or Hilbert-Schmidt) norm kAkF , it is also convenient to define the 1-norm, the ∞-norm, and the max-norm by kAk1 = max

n X

j=1...m

|Aij | ,

kAk∞ = max

i=1...n

i=1

m X

|Aij | ,

kAkmax = max max |Aij | .

j=1

i=1...n j=1...m

Lemma S.4.1 (Some useful Inequalities). Let A be an n × m matrix, B be an m × p matrix, and C and D be n × n matrices. Then we have: (i)

kAk ≤ kAkF ≤ kAk rank (A)1/2 ,

(ii)

kABk ≤ kAk kBk ,

(iii)

kABkF ≤ kAkF kBk ≤ kAkF kBkF ,

(iv)

|Tr(AB)| ≤ kAkF kBkF ,

(v)

|Tr (C)| ≤ kCk rank (C) ,

for n = p,

(vi)

kCk ≤ Tr (C) ,

(vii)

kAk2 ≤ kAk1 kAk∞ , √ kAkmax ≤ kAk ≤ nm kAkmax ,

(viii)

for C symmetric and C ≥ 0,

kA0 CAk ≤ kA0 DAk ,

(ix)

for C symmetric and C ≤ D.

For C, D symmetric, and i = 1, . . . , n we have: (x)

µi (C) + µn (D) ≤ µi (C + D) ≤ µi (C) + µ1 (D) ,

(xi)

µi (C) ≤ µi (C + D) ,

for D ≥ 0,

(xii)

µi (C) − kDk ≤ µi (C + D) ≤ µi (C) + kDk .

Proof. Here we use notation si (A) for the i’th largest singular value of a matrix A. P (i) We have kAk = s1 (A), and kAk2F = rank(A) (si (A))2 . The inequalities follow directly from i=1 this representation. (ii) This inequality is true for all unitarily invariant norms, see e.g. Bhatia (1997). (iii) can be shown as follows kABk2F = Tr(ABB 0 A0 ) = Tr[kBk2 AA0 − A(kBk2 I − BB 0 )A0 ] ≤ kBk2 Tr(AA0 ) = kBk2 kAk2F , 8

where we used that A(kBk2 I − BB 0 )A0 is positive definite. Relation (iv) is just the Cauchy Schwarz inequality. To show (v) we decompose C = U DO0 (singular value decomposition), where U and O are n × rank(C) that satisfy U 0 U = O0 O = I and D is a rank(C) × rank(C) diagonal matrix with entries si (C). We then have kOk = kU k = 1 and kDk = kCk and therefore |Tr(C)| = |Tr(U DO0 )| = |Tr(DO0 U )| rank(C) X 0 0 η i DO U η i = i=1 rank(C)



X

kDkkO0 kkU k = rank(C)kCk .

i=1

For (vi) let e1 be a vector that satisfied ke1 k = 1 and kCk = e01 Ce1 . Since C is symmetric such an e1 has to exist. Now choose ei , i = 2, . . . , n, such that ei , i = 1, . . . , n, becomes a orthonormal basis of the vector space of n × 1 vectors. Since C is positive semi definite we P then have Tr (C) = i e0i Cei ≥ e1 Ce1 = kCk, which is what we wanted to show. For (vii) we refer to Golub and van Loan (1996), p.15. For (viii) let e be the vector that satisfies kek = 1 and kA0 CAk = e0 A0 CAe. Since A0 CA is symmetric such an e has to exist. Since C ≤ D we then have kCk = (e0 A0 )C(Ae) ≤ (e0 A0 )D(Ae) ≤ kA0 DAk. This is what we wanted to show. For inequality (ix) let e1 be a vector that satisfied ke1 k = 1 and kA0 CAk = e01 A0 CAe1 . Then we have kA0 CAk = e01 A0 DAe1 − e01 A0 (D − C)Ae1 ≤ e01 A0 DAe1 ≤ kA0 DAk. Statement (x) is a special case of Weyl’s inequality, see e.g. Bhatia (1997). The Inequalities (xi) and (xii) follow directly from (ix) since µn (D) ≥ 0 for D ≥ 0, and since −kDk ≤ µi (D) ≤ kDk for i = 1, . . . , n. Definition S.4.2. Let A be an n × r1 matrix and B be an n × r2 matrix with rank(A) = r1 and rank(B) = r2 . The smallest principal angle θA,B ∈ [0, π/2] between the linear subspaces span(A) = {Aa| a ∈ Rr1 } and span(B) = {Bb| b ∈ Br2 } of Rn is defined by cos(θA,B ) = maxr 06=a∈R

1

maxr

06=b∈R

2

a0 A0 Bb . kAakkBbk

Lemma S.4.3. Let A be an n × r1 matrix and B be an n × r2 matrix with rank(A) = r1 and rank(B) = r2 . Then we have the following alternative characterizations of the smallest principal angle between span(A) and span(B) sin(θA,B ) = minr 06=a∈R

1

= minr 06=b∈R

9

2

kMB A ak kA ak kMA B bk . kB bk

Proof. Since kMB A ak2 + kPB A ak2 = kA ak2 and sin(θA,B )2 + cos(θA,B )2 = 1, we find that proving the theorem is equivalent to proving cos(θA,B ) = minr 06=a∈R

1

kPA B bk kPB A ak = minr . 06=b∈R 2 kA ak kA bk

This result is theorem 8 in Galantai, Hegedus (2006), and the proof can be found there. Proof of Lemma A.1. Let S1 (Z) = min Tr [(Z − λf 0 ) (Z 0 − f λ0 )] , f,λ

S2 (Z) = min Tr(Z Mf Z 0 ) , f

S3 (Z) = min Tr(Z 0 Mλ Z) , λ

S4 (Z) = min Tr(Mλe Z Mfe Z 0 ) , ˜ f˜ λ,

S5 (Z) =

T X

µi (Z 0 Z) ,

i=R+1

S6 (Z) =

N X

µi (ZZ 0 ) .

i=R+1

The theorem claims S1 (Z) = S2 (Z) = S3 (Z) = S4 (Z) = S5 (Z) = S6 (Z) . We find: (i) The non-zero eigenvalues of Z 0 Z and ZZ 0 are identical, so in the sums in S5 (Z) and in S6 (Z) we are summing over identical values, which shows S5 (Z) = S6 (Z). (ii) Starting with S1 (Z) and minimizing with respect to f we obtain the first order condition λ0 Z = λ0 λ f 0 . Putting this into the objective function we can integrate out f , namely   0 Tr (Z − λf 0 ) (Z − λf 0 ) = Tr (Z 0 Z − Z 0 λf 0 ) = Tr Z 0 Z − Z 0 λ(λ0 λ)−1 (λ0 λ)f 0



= Tr Z 0 Z − Z 0 λ(λ0 λ)−1 (λ0 λ)λ0 Z



= Tr (Z 0 Mλ Z) . This shows S1 (Z) = S3 (Z). Analogously, we can integrate out λ to obtain S1 (Z) = S2 (Z). 10

(iii) Let Mλb be the projector on the N − R eigenspaces corresponding to the N − R smallest eigenvalues1 of ZZ 0 , let Pλb = IN − Mλb , and let ω R be the R’th largest eigenvalue of ZZ 0 . We then know that the matrix Pλb [ZZ 0 − ω R IN ]Pλb − Mλb [ZZ 0 − ω R IN ]Mλb is positive semi-definite. Thus, for an arbitrary N × R matrix λ with corresponding projector Mλ we have  2 o Pλb [ZZ 0 − ω R IN ]Pλb − Mλb [ZZ 0 − ω R IN ]Mλb Mλ − Mλb    = Tr Pλb [ZZ 0 − ω R IN ]Pλb + Mλb [ZZ 0 − ω R IN ]Mλb Mλ − Mλb     = Tr [Z 0 Mλ Z] − Tr Z 0 Mλb Z + ω R rank(Mλ ) − rank(Mλb ) ,

0 ≤ Tr

n

and since rank(Mλb ) = N − R and rank(Mλ ) ≤ N − R we have   Tr Z 0 Mλb Z ≤ Tr [Z 0 Mλ Z] . This shows that Mλb is the optimal choice in the minimization problem of S3 (Z), i.e. the b is chosen such that the span of the N -dimensional vectors λ br (r = 1 . . . R) optimal λ = λ equals to the span of the R eigenvectors that correspond to the R largest eigenvalues of ZZ 0 . This shows that S3 (Z) = S6 (Z). Analogously one can show that S2 (Z) = S5 (Z). e such that the span of the N (iv) In the minimization problem in S4 (Z) we can choose λ er (r = 1 . . . R1 ) equals to the span of the R1 eigenvectors that dimensional vectors λ correspond to the R1 largest eigenvalues of ZZ 0 . In addition, we can choose fe such that the span of the T -dimensional vectors fer (r = 1 . . . R2 ) equals to the span of the R2 eigenvectors that correspond to the (R1 + 1)-largest up to the R-largest eigenvalue of Z 0 Z. e and fe we actually project out all the R largest eigenvalues of Z 0 Z With this choice of λ and ZZ 0 . This shows that S4 (Z) ≤ S5 (Z). (This result is actually best understood by using the singular value decomposition of Z.) e where We can write Mλe Z Mfe = Z − Z, Ze = Pλe Z Mfe + Z Pfe . Since rank(Z) ≤ rank(Pλe Z Mfe)+rank(Z Pfe) = R1 +R2 = R, we can always write Ze = λf 0 1

If an eigenvalue has multiplicity m, we count it m times when finding the N − R smallest eigenvalues. In

this terminology we always have exactly N eigenvalues of ZZ 0 , but some may appear multiple times.

11

for some appropriate N × R and T × R matrices λ and f . This shows that S4 (Z) = min Tr(Mλe Z Mfe Z 0 ) ¯ f¯ λ,



min e : rank(Z)≤R} e {Z

e e 0) Tr((Z − Z)(Z − Z)

= min Tr [(Z − λf 0 ) (Z 0 − f λ0 )] = S1 (Z) . f,λ

Thus we have shown here S1 (Z) ≤ S4 (Z) ≤ S5 (Z), and actually this holds with equality since S1 (Z) = S5 (Z) was already shown above.

S.5

Supplement to the Consistency Proof (Appendix A)

Lemma S.5.1. Under assumptions 1 and 4 there exists a constant B0 > 0 such that for the matrices w and v introduced in assumption 4 we have w0 Mλ0 w − B0 w0 w ≥ 0 ,

wpa1,

v 0 Mf 0 v − B0 v 0 v ≥ 0 ,

wpa1.

Proof. We can decompose w = w e w, ¯ where w e is an N ×rank(w) matrix and w¯ is a rank(w)×K1 matrix. Note that w e has full rank, and Mw = Mwe . By assumption 1(i) we know that λ00 λ0 /N has a probability limit, i.e there exists some B1 > 0 such that λ00 λ0 /N < B1 IR wpa1. Using this and assumption 4 we find that for any R × 1 vector a 6= 0 we have B kMv λ0 ak2 a0 λ00 Mv λ0 a = > , 0 00 0 B1 kλ ak2 a0 λ λ a

wpa1.

Applying Lemma S.4.3 we find min 06=b∈Rrank(w)

b0 w e0 Mλ0 w eb a0 λ00 Mw λ0 a B > = min , 00 0 0 0 0 R b w e w eb B1 06=a∈R a λ λ a

wpa1.

Therefore we find for every rank(w) × 1 vector b that b0 (w e0 Mλ0 w e − (B/B1 )w e0 w e ) b > 0, wpa1. Thus w e0 Mλ0 w e − (B/B1 ) w e0 w e > 0, wpa1. Multiplying from the left with w¯ 0 and from the right with w¯ we obtain w0 Mλ0 w − (B/B1 ) w0 w ≥ 0, wpa1. This is what we wanted do show. Analogously we can show the statement for v. As a consequence of the this lemma we obtain some properties of the low-rank regressors summarized in the following lemma. 12

Lemma S.5.2. Let the assumptions 1 and 4 be satisfied and let Xlow,α =

PK1

l=1

αl Xl be a linear

combination of the low-rank regressors. Then there exists some constant B > 0 such that

Xlow,α Mf 0 X 0 low,α >B, wpa1, min NT {α∈RK1 ,kαk=1}

Mλ0 Xlow,α Mf 0 X 0 0 M λ low,α min >B, wpa1. NT {α∈RK1 ,kαk=1}



0 0

, because kMλ0 k = 1, i.e. Proof. Note that Mλ0 Xlow,α Mf 0 Xlow,α Mλ0 ≤ Xlow,α Mf 0 Xlow,α if we can show the second inequality of the lemma we have also shown the first inequality. We can write Xlow,α = w diag(α0 ) v 0 . Using Lemma S.5.1 and part (v), (vi) and (ix) of Lemma S.4.1 we find

0

Mλ0 Xlow,α Mf 0 Xlow,α Mλ0 = kMλ0 w diag(α0 ) v 0 Mf 0 v diag(α0 ) w0 Mλ0 k ≥ B0 kMλ0 w diag(α0 ) v 0 v diag(α0 ) w0 Mλ0 k B0 ≥ Tr [Mλ0 w diag(α0 ) v 0 v diag(α0 ) w0 Mλ0 ] K1 B0 = Tr [v diag(α0 ) w0 Mλ0 w diag(α0 ) v 0 ] K1 B0 ≥ kv diag(α0 ) w0 Mλ0 w diag(α0 ) v 0 k K1 B02 ≥ kv diag(α0 ) w0 w diag(α0 ) v 0 k K1 B2 ≥ 02 Tr [v diag(α0 ) w0 w diag(α0 ) v 0 ] K1  B2  0 = 02 Tr Xlow,α Xlow,α . K1

0 Thus we have Mλ0 Xlow,α Mf 0 Xlow,α Mλ0 /(N T ) ≥ (B0 /K1 )2 α0 WNlow T α , where the K1 × K1  low −1 0 matrix WNlow T is defined by WN T,l1 l2 = (N T ) Tr Xl1 Xl2 , i.e. it is a submatrix of WN T . Since WN T and thus WNlow T converges to a positive definite matrix the lemma is proven by the inequality above. (2) Using the above lemmas we can now prove the lower bound on SeN T (β, f ) that was used in

the consistency proof. Remember that " ! K X 1 (2) SeN T (β, f ) = Tr λ0 f 00 + (β 0k − β k )Xk Mf NT k=1

13

λ0 f 00 +

K X k=1

!0 (β 0k − β k )Xk

# P(λ0 ,w)

.

We want to show that under the assumptions of theorem 3.1 there exist finite positive constants a0 , a1 , a2 , a3 and a4 such that

low

0,low 2

a − β 0 β (2) SeN T (β, f ) ≥



β low − β 0,low 2 + a1 β low − β 0,low + a2



− a3 β high − β 0,high − a4 β high − β 0,high β low − β 0,low ,

wpa1.

(2) Proof of the lower bound on SeN T (β, f ). Applying Lemma A.1 and part (xi) of Lemma S.4.1

we find that 1 (2) µ SeN T (β, f ) ≥ N T R+1

"

1 µ = N T R+1

"

λ0 f 00 +

K X

!0 (β 0k − β k )Xk

K X λ0 f 00 + (β 0k − β k )Xk

P(λ0 ,w)

k=1

λ0 f 00 +

K1 X

k=1

!0 (β 0l − β l )wl vl0

λ0 f 00 +

l=1

+

K1 X

K1 X

(β 0l − β l )wl vl0

+ 1 µ ≥ N T R+1

λ0 f 00 +

K1 X 0 00 λ f + (β 0l − β l )wl vl0

0 (β 0m − β m )Xm P(λ0 ,w)

0 (β 0m − β m )Xm P(λ0 ,w)

+

(β 0l − β l )wl vl0

λ0 f 00 +

00

K1 X

! (β 0l − β l )wl vl0

l=1 K1 X

!0 (β 0l



β l )wl vl0

K X

P(λ0 ,w)

(β 0m − β m )Xm

m=K1

0 (β 0m − β m )Xm P(λ0 ,w)

m=K1



(β 0m − β m )Xm

m=K1

λ f + K X

#

!0

l=1



K X

l=1

+

!

l=1

K1 X

0

(β 0m − β m )Xm

m=K1

m=K1

"

K X

P(λ0 ,w)

m=K1 K X

(β 0l − β l )wl vl0

!0

l=1

+

!

l=1

λ0 f 00 + K X

!#

K1 X 0 00 λ f + (β 0l − β l )wl vl0

!#

l=1

!0

K1 X

K1 X

!

1  λ0 f 00 + (β 0l − β l )wl vl0  µ (β 0l − β l )wl vl0 λ0 f 00 + N T R+1 l=1 l=1

high



− a3 β − β 0,high − a4 β high − β 0,high β low − β 0,low , wpa1,

where a3 > 0 and a4 > 0 are appropriate constants. For the last step we used part (xii) of

14

Lemma S.4.1 and the fact that

K ! K1

X X 1

0 00 0 0 0 0 0 (β − β m )Xm P(λ ,w) λ f + (β l − β l )wl vl

N T m=K m l=1 1





high

Xm λ0 f 00

low

wl vl0 0,high 0,low





.

√ ≤K β −β max √ +K β −β max √

m l NT NT NT √ √ Our assumptions guarantee that the operator norms of λ0 f 00 / N T and Xm / N T are bounded from above as N, T → ∞, which results in finite constants a3 and a4 . (2) We write the above result as SeN T (β, f ) ≥ µR+1 (A0 A)/(N T ) + terms containing β high , where P 1 0 0 we defined A = λ0 f 00 + K l=1 (β l −β l ) wl vl . Also write A = A1 +A2 +A3 , with A1 = Mw A Pf 0 = PK1 0 P 1 0 0 00 0 Mw λ0 f 00 , A2 = Pw A Mf 0 = K l=1 (β l − l=1 (β l − β l ) wl vl Mf 0 , A3 = Pw A Pf 0 = Pw λ f + β l ) wl vl0 Pf . We then find A0 A = A01 A1 + (A02 + A03 )(A2 + A3 ) and A0 A ≥ A0 A − (a1/2 A03 + a−1/2 A02 )(a1/2 A3 + a−1/2 A2 ) = [A01 A1 − (a − 1) A03 A3 ] + (1 − a−1 )A02 A2 , where ≥ for matrices refers to the difference being positive definite, and a is a positive number. We choose a = 1 + µR (A01 A1 )/(2 kA3 k2 ). The reason for this choice becomes clear below. Note that [A01 A1 − (a − 1) A03 A3 ] has at most rank R (asymptotically it has exactly rank R). The non-zero eigenvalues of A0 A are therefore given by the (at most) R non-zero eigenvalues of [A01 A1 − (a − 1) A03 A3 ] and the non-zero eigenvalues of (1 − a−1 )A02 A2 , the largest one of the latter being given given by the operator norm (1 − a−1 )kA2 k2 . We therefore find   1 1 µR+1 (A0 A) ≥ µR+1 (A01 A1 − (a − 1) A03 A3 ) + (1 − a−1 )A02 A2 NT NT  1 min (1 − a−1 )kA2 k2 , µR [A01 A1 − (a − 1) A03 A3 ] . ≥ NT Using Lemma S.4.1(xii) and our particular choice of a we find µR [A01 A1 − (a − 1) A03 A3 ] ≥ µR (A01 A1 ) − k(a − 1)A03 A3 k 1 = µR (A01 A1 ) . 2 Therefore   1 2 kA2 k2 1 0 0 µ (A A) ≥ µ (A A1 ) min 1 , N T R+1 2 NT R 1 2 kA3 k2 + µR (A01 A1 ) 1 kA2 k2 µR (A01 A1 ) ≥ , N T 2 kAk2 + µR (A01 A1 ) 15

where we used kAk ≥ kA3 k and kAk ≥ kA2 k. Our assumptions guarantee that there exist positive constants c0 , c1 , c2 and c3 such that K1

kwl v 0 k kAk kλ0 f 00 k X √ ≤ √ + |β 0l − β l | √ l ≤ c0 + c1 β low − β 0,low , NT NT NT l=1  µR f 0 λ00 Mw λ0 f 00 µR (A01 A1 ) = ≥ c2 , wpa1 , NT N T "K # K1 1 X X kA2 k2 = µ1 (β 0l1 − β l1 ) wl1 vl01 Mf 0 (β 0l2 − β l2 ) vl2 wl02 NT l1 =1 l2 =1

low 2 ≥ c3 β − β 0,low , wpa1 ,

wpa1 ,

were for the last inequality we used Lemma S.5.2. We thus have 1 µ (A0 A) ≥ N T R+1 1+ Defining a0 =

c2 c3 , 2c21

a1 =

2c0 c1

and a2 =

c2 2c21

2 c2

2 c3 β low − β 0,low

2 , c0 + c1 β low − β 0,low

wpa1 .

we thus obtain

2 a0 β low − β 0,low 1 0 , µ (A A) ≥

N T R+1

β low − β 0,low 2 + a1 β low − β 0,low + a2

wpa1 ,

(2) i.e. we have shown the desired bound on SeN T (β, f ).

S.6

Regarding the Proof of Corollary 4.2

As discussed in the main text, the proof of Corollary 4.2 is provided in Moon and Weidner (2013). All that is left to show here is that the matrix WN T = WN T (λ0 , f 0 , Xk ) does not become singular as N, T → ∞ under our assumptions. Proof. Remember that WN T =

1 Tr(Mf 0 Xk0 1 Mλ0 Xk2 ) . NT

16

The smallest eigenvalue of the symmetric matrix W (λ0 , f 0 , Xk ) is given my µK (WN T ) =

a0 W N T a kak2 {a∈RK , a6=0} min

" ! !# K K X X 1 = min Tr Mf 0 ak1 Xk0 1 Mλ0 ak2 Xk2 {a∈RK , a6=0} N T kak2 k1 =1 k2 =1    0 0 Tr Mf 0 Xlow,ϕ + Xhigh,α Mλ0 (Xlow,ϕ + Xhigh,α ) , = min N T (kαk2 + kϕk2 ) {α ∈ RK1 , ϕ ∈ RK2 α 6= 0, ϕ 6= 0}

where we decomposed a = (ϕ0 , α0 )0 , with ϕ and α being vectors of length K1 and K2 , respectively, and we defined linear combinations of high- and low-rank regressors2 Xlow,ϕ =

K1 X

ϕl Xl ,

Xhigh,α =

K X

α m Xm .

m=K1 +1

l=1

We have Mλ0 = M(λ0 ,w) + P(Mλ0 w) , where w is the N × K1 matrix defined in assumption 4, i.e. (λ0 , w) is an N × (R + K1 ) matrix, while Mλ0 w is also an N × K1 matrix. Using this we obtain µK (WN T ) =

min

{ϕ ∈ RK1 , α ∈ RK2

    1 0 0 Tr Mf 0 Xlow,ϕ + Xhigh,α M(λ0 ,w) (Xlow,ϕ + Xhigh,α ) 2 2 N T (kϕk + kαk )

ϕ 6= 0, α 6= 0}

=

min

{ϕ ∈ RK1 , α ∈ RK2

   0 0 + Tr Mf 0 Xlow,ϕ + Xhigh,α P(Mλ0 w) (Xlow,ϕ + Xhigh,α )    1 0 0 X 0 Tr M M X high,α f (λ ,w) high,α N T (kϕk2 + kαk2 )



ϕ 6= 0, α 6= 0}

   0 0 + Tr Mf 0 Xlow,ϕ + Xhigh,α P(Mλ0 w) (Xlow,ϕ + Xhigh,α )



(S.6.1) We note that there exists finite positive constants c1 , c2 , c3 such that   1 0 M(λ0 ,w) Xhigh,α ≥ c1 kαk2 , wpa1, Tr Mf 0 Xhigh,α NT    1 0 0 Tr Mf 0 Xlow,ϕ + Xhigh,α P(Mλ0 w) (Xlow,ϕ + Xhigh,α ) ≥ 0 , NT   1 0 P(Mλ0 w) Xlow,ϕ ≥ c2 kϕk2 , wpa1, Tr Mf 0 Xlow,ϕ NT   1 c3 0 Tr Mf 0 Xlow,ϕ P(Mλ0 w) Xhigh,α ≥ − kϕkkαk , wpa1, NT 2   1 0 Tr Mf 0 Xhigh,α P(Mλ0 w) Xhigh,α ≥ 0 , (S.6.2) NT 2

As in assumption 4 the components of α are denoted αK1 +1 , . . . , αK to simplify notation.

17

.

and we want to justify these inequalities now. The second and the last equation in (S.6.2) are     0 0 true because e.g. Tr Mf 0 Xhigh,α P(Mλ0 w) Xhigh,α = Tr Mf 0 Xhigh,α P(Mλ0 w) Xhigh,α Mf 0 , and the trace of a symmetric positive semi-definite matrix is non-negative. The first inequality in (S.6.2) is true because rank(f 0 ) + rank(λ0 , w) = 2R + K1 and using Lemma A.1 and assumption 4 we have     1 1 0 0 0 X 0 Tr M M X ≥ µ X X >b, high,α high,α f (λ ,w) 2R+K +1 high,α high,α 1 N T kαk2 N T kαk2

wpa1,

i.e. we can set c1 = b. The third inequality in (S.6.2) is true because according Lemma S.4.1(v) we have   K1 1 0 Tr Mf 0 Xlow,ϕ P(Mλ0 w) Xhigh,α ≥ − kXlow,ϕ k kXhigh,α k NT NT K1 ≥− kXlow,ϕ kF kXhigh,α kF NT



Xk 1

Xk 2

√ ≥ − K1 K1 K2 kϕk kαk max max √ k1 =1...K1 N T k2 =K1 +1...K N T F F c3 ≥ − kϕk kαk , 2



where we used that assumption 4 implies that Xk / N T < C holds wpa1 for some constant F

C as, and we set c3 = K1 K1 K2 C 2 . Finally, we have to argue that the third inequality in (S.6.2) 0 0 Mλ0 Xlow,ϕ , i.e. we need to show that holds. Note that Xlow,ϕ P(Mλ0 w) Xlow,ϕ = Xlow,ϕ

  1 0 Tr Mf 0 Xlow,ϕ Mλ0 Xlow,ϕ ≥ c2 kϕk2 . NT Using part (vi) or Lemma S.4.1 we find     1 1 0 0 Tr Mf 0 Xlow,ϕ Tr Mλ0 Xlow,ϕ Mf 0 Xlow,ϕ Mλ0 Xlow,ϕ = Mλ0 NT NT

1 0

Mλ0 Xlow,ϕ Mf 0 Xlow,ϕ ≥ Mλ0 , NT and according to Lemma S.5.2 this expression is bounded by some positive constant times kϕk2 (in the lemma we have kϕk = 1, but all expressions are homogeneous in kϕk). Using the inequalities (S.6.2) in equation (S.6.1) we obtain µK (WN T ) ≥

min

{ϕ ∈ RK1 , α ∈ RK2

   1 2 2 c kαk + max 0, c kϕk − c kϕkkαk 1 2 3 kϕk2 + kαk2

ϕ 6= 0, α 6= 0}

 ≥ min

c2 c1 c22 , 2 c22 + c23

 ,

wpa1.

Thus, the smallest eigenvalue of WN T is bounded from below by a positive constant as N, T → ∞, i.e. WN T is non-degenerate and invertible. 18

S.7

Proof of Examples for Assumption 5

Proof of Example 1. We want to show that the conditions of Assumption 5 are satisfied. Conditions (i)-(iii) immediately follow by the assumptions of the example. For condition (iv), notice that Cov (Xit , Xis |C) = E (Uit Uis ). Since |β 0 | < 1 and supit E(e2it ) < ∞, it follows that

N N T T 1 XX 1 XX |Cov (Xit , Xis |C)| = |E (Uit Uis )| N T i=1 t,s=1 N T i=1 t,s=1 N T ∞ 1 X X X 0 p+q (β ) E (eit−p eis−q ) < ∞. N T i=1 t,s=1 p,q=0

=

For condition (v), notice by the independence between the sigma field C and the error terms {eit } that we have for some finite constant M, N T   1 X X e e Cov e X , e X |C it is iu iv N T 2 i=1 t,s,u,v=1 N T 1 X X = |Cov (eit Uis , eiu Uiv )| N T 2 i=1 t,s,u,v=1 N T ∞   1 X X X 0 p+q 0 p 0 q = β E (e e e e ) − β E (e e ) β E (e e ) it is−p iu iv−q it is−p iu iv−q N T 2 i=1 t,s,u,v=1 p,q=0

M ≤ T2 =

=

M T2 M T

T X

∞ X 0 p+q β [I {t = u} I {s − p = v − q} + I {t = v − q} I {s − p = u}]

t,s,u,v=1 p,q=0 T X

s X



v X

0 s−k+v−l 1 β I {t = u} I {k = l} + M  T t,u,s,v=1 k=−∞ l=−∞ T min{s,v} X X s,v=1 k=−∞

 0 s+v−2k 1 β +M T

T X s,u=1 s−u≥0

 0 s−u   1 β   T

19

T X v,t=1 v−t≥0

T X s,u=1 s−u≥0

 0 s−u   1 β   T 

0 v−t  β  .

T X v,t=1 v−t≥0

 0 v−t  β 

Notice that T min{s,v} 1 X X 0 s+v−2k β T s,v=1 k=−∞ T s v T s 2 X X X 0 s−v+2(v−k) 2 X X 0 2(s−k) = β + β T s=2 v=1 k=−∞ T s=1 k=−∞ T s ∞ T ∞ 2 X X 0 s−v X 0 2l 2 X X 0 2l = β β + β T s=2 v=1 T s=1 l=0 l=0 T s 1 X X 0 s−v 2 2 β + = 0 2 2 1 − β T s=2 v=1 1 − β 0 ! T −1  X l  2 2 l 0 + = β 1− 0 2 2 T 1 − β 1 − β 0 l=1 = O (1) ,

and

  T −1 T T s l 1 X X 0 s−u X 0 l 1 X 0 s−u β β β = 1− = = O (1) . T s,u=1 T s=1 u=1 T l=0 s−u≥0

Therefore, we have the desired result that T N   1 X X e e Cov eit Xis , eiu Xiv |C = Op (1) . 2 N T i=1 t,s,u,v=1

Preliminaries for Proof of Example 2 • Although we observe Xit for 1 ≤ t ≤ T, here we treat that Zit = (eit , Xit ) has infinite past and future over time. Define Gτt (i) = C ∨ σ ({Xis : τ ≤ s ≤ t}) and Hτt (i) = C ∨ σ ({Zit : τ ≤ s ≤ t}) . Then, by definition, we have Gτt (i) , Hτt (i) ⊂ Fτt (i) for all τ , t, i. By Assumption (iv) of Example 2, the time series of {Xit : −∞ < t < ∞} and {Zit : −∞ < t < ∞} are conditional α− mixing conditioning on C uniformly in i. • Mixing inequality: The following inequality is a conditional version of the α-mixing inequality of Hall and Heyde (1980, p.  278). Suppose that Xit is a Ft -measurable random variable with E |Xit |max{p,q} |C < ∞, where p, q > 1 with 1/p + 1/q < 1. Denote 20

1/p

kXit kC,p = (E (|Xit |p |C))

. Then, for each i, we have 1− p1 − 1q

|Cov (Xit , Xit+m |C)| ≤ 8 kXit kC,p kXit+m kC,q αm

(i) .

(S.7.1)

Proof of Example 2. Again, we want to show that the conditions of Assumption 5 are satisfied. Conditions (i)-(iii) immediately follow by the assumptions of the example. For condition (iv), we apply the mixing inequality (S.7.1) with p = q > 4. Then, we have N T 1 XX |Cov (Xit , Xis |C)| N T i=1 t,s=1 N T T −t N T −1 T −m 2 XX X 2 XXX |Cov (Xit , Xit+m |C)| = |Cov (Xit , Xit+m |C)| ≤ N T i=1 t=1 m=0 N T i=1 m=0 t=1 N

T −1 T −m

p−2 16 X X X = kXit kC,p kXit+m kC,p αm (i) P N T i=1 m=0 t=1  X ∞ p−2 2 αmP ≤ 16 sup kXit kC,p

i,t

m=0

≤ Op (1) , where the last line holds since supi,t kXit k2C,p = Op (1) for some p > 4 as assumed in the example p−2 P P 4p −ζ p−2 P P (2), and ∞ = ∞ = O (1) due to ζ > 3 4p−1 and p > 4. m=0 αm m=0 m For condition (v), we need to show N T   1 X X e e Cov e X , e X |C = Op (1) . it is iu iv 2 N T i=1 t,s,u,v=1

Notice that N T   1 X X e e Cov eit Xis , eiu Xiv |C 2 N T i=1 t,s,u,v=1

=

N T      1 X X  e e e e E e X e X |C − E e X |C E e X |C it is iu iv it is iu iv N T 2 i=1 t,s,u,v=1

N T N  1 X X  e 1 X e ≤ E e X e X |C + it is iu iv N T 2 i=1 t,s,u,v=1 N i=1

T 1 X  e  E eit Xis |C T t,s=1

!2

= I + II, say. First, for term I, there are finite number of different orderings among the indices t, s, u, v. We consider the case t ≤ s ≤ u ≤ v and establish the desired result. The rest of the cases are the 21

same. Note that N T T −t T −k T −l  1 X X X X X  e e E eit Xit+k eit+k+l Xit+k+l+m |C 2 N T i=1 t=1 k=0 l=0 m=0 N T 1 X 1 X ≤ N i=1 T 2 t=1

    eit+k eit+k+l X eit+k+l+m |C E eit X

X 0≤l,m≤k 0≤k+l+m≤T −t

N T 1 X 1 X + N i=1 T 2 t=1

X 0≤k,m≤l 0≤k+l+m≤T −t

h   i eit+k eit+k+l X eit+k+l+m |C E eit X    e e −E eit Xit+k |C E eit+k+l Xit+k+l+m |C 

+

N T 1 X 1 X N i=1 T 2 t=1

N T 1 X 1 X + N i=1 T 2 t=1

    eit+k |C E eit+k+l X eit+k+l+m |C E eit X

X 0≤k,m≤l 0≤k+l+m≤T −t

h  i eit+k+l+m |C eit+k eit+k+l X E eit X

X 0≤p,l≤m 0≤k+l+m≤T −t

= I1 + I2 + I3 + I4 , say.     eit+k eit+k+l X eit+k+l+m |C with eit and By applying the mixing inequality (S.7.1) to E eit X eit+k eit+k+l X eit+k+l+m , we have X     eit+k eit+k+l X eit+k+l+m |C E eit X

1− 1 − 1

e

e ≤ 8 keit kC,p Xit+k eit+k+l Xit+k+l+m αk p q (i) C,q



1− 1 − 1

e

e

≤ 8 keit kC,p Xit+k keit+k+l kC,3q Xit+k+l+m αk p q (i) , C,3q

C,3q

where the last inequality follows by the generalized Holder’s inequality. Choose p = 3q > 4. Then, I1

N T 8 X 1 X ≤ N i=1 T 2 t=1

X



e keit kC,p Xit+k

C,p

0≤l,m≤k 0≤k+l+m≤T −t

  T

 1 X

e 2 2 ≤ 8 sup keit kC,p sup Xit+k T 2 t=1 C,p i,t i,t



e

keit+k+l kC,p Xit+k+l+m

X 0≤l,m≤k 0≤k+l+m≤T −t

  ∞

X 1− 1

e 2 2 ≤ 8 sup keit kC,p sup Xit+k k 2 αk 4p i,t

i,t

C,p

k=0

≤ Op (1) , 22

C,p

1 1− 4p

αk

1 1− 4p

αk

(i)

where the last line holds since we assume in the example (2) that Op (1) for some p > 4,, and

1 1− 4p 2 m α m m=0

P∞

=

P∞

m=0

2−ζ 4p−1 4p

m



supi,t keit k2C,p

 

e 2 = supi,t Xit+k C,p

4p = O (1) due to ζ > 3 4p−1 and

p > 4. By applying similar argument, we can also show that I2 , I3 , I4 = Op (1) .

S.8

Supplement to the Proof of Theorem 4.3

Notation EC and VarC and CovC : In the remainder of this supplementary file we write EC , VarC and CovC for the expectation, variance and covariance operators conditional on C, i.e. EC (A) = E(A|C), VarC (A) = Var(A|C) and CovC (A, B) = Cov(A, B|C). What is left to show to complete the proof of Theorem 4.3 is that Lemma B.1 and Lemma B.2 in the main text appendix hold. Before showing this, we first present two further intermediate lemmas. Lemma S.8.1. Under the assumptions of Theorem 4.3 we have for k = 1, . . . , K that (a) (b)

√ ek k = op ( N T ) , kPλ0 X √ ek Pf 0 k = op ( N T ) , kX

(c)

kPλ0 eXk0 k = op (N 3/2 ) ,

(d)

kPλ0 ePf 0 k = Op (1) .

Proof of Lemma S.8.1. # Part (a): We have ek k = kλ0 (λ00 λ0 )−1 λ00 X ek k kPλ0 X ek k ≤ kλ0 (λ00 λ0 )−1 kkλ00 X ek kF = Op (N −1/2 )kλ00 X ek kF , ≤ kλ0 kk(λ00 λ0 )−1 kkλ00 X

23

where we used part (i) and (ii) of Lemma S.4.1 and Assumption 1. We have   !2  R X T N  X n h io X 0 e 00 e 2   EC λir Xk,it E EC kλ Xk kF =E   r=1 t=1 i=1 ( R T N )   XXX 2 ek,it =E (λ0ir )2 EC X =

r=1 t=1 i=1 R T X N XX

  E (λ0ir )2 VarC (Xk,it )

r=1 t=1 i=1

= Op (N T ), ek,it is mean zero and independent across i, conditional on C, and our where we used that X √ ek kF = Op ( N T ) and the bounds on the moments of λ0ir and Xk,it . We therefore have kλ00 X √ √ ek k = Op ( T ) = op ( N T ). above inequality thus gives kPλ0 X ek Pf 0 k = kPf 0 X e0 k ≤ # The proof for part (b) is similar. As above we first obtain kX k

Op (T

−1/2

e 0 kF . Next, we have )kf X k 00

 !2  R X N T h i X X e 0 k2 = ek,it  EC kf 00 X EC  ftr0 X k F r=1 i=1

=

R X N X

t=1 T X

  0 ek,it X ek,is ftr0 fsr EC X

r=1 i=1 t,s=1

" ≤

R  X r=1

# N T 2 X X

max |ftr0 | t

|CovC (Xk,it , Xk,is )|

i=1 t,s=1

= Op (T 2/(4+) ) Op (N T ) = op (N T 2 ), where we used that uniformly bounded Ekft0 k4+ implies that maxt |ftr0 | = Op (T 1/(4+) ). We √ √ e 0 k2 = op (T N ) and therefore kX ek Pf 0 k = op ( N T ). thus have kf 00 X k F

# Next, we show part (c). First, we have   !2  R X N N X T   n h io X X  2 00 0 0   E EC kλ eXk kF = E EC λir eit Xk,jt   r=1 j=1 i=1 t=1 ( R N ) T X X X λ0ir λ0lr EC (eit els Xk,jt Xk,js ) =E r=1 i,j,l=1 t,s=1

=

R X N X T X   2 E (λ0ir )2 EC e2it Xk,jt = O(N 2 T ) , r=1 i,j=1 t=1

24

where we used that EC (eit els Xk,jt Xk,js ) is only non-zero if i = l (because of cross-sectional independence conditional on C) and t = s (because regressors are pre-determined). We can thus √ conclude that kλ0 0 eXk0 kF = Op (N T ). Using this we find kPλ0 eXk0 k = kλ0 (λ00 λ0 )−1 λ00 eXk0 k ≤ kλ0 (λ00 λ0 )−1 kkλ00 eXk0 k

√ √ ≤ kλ0 kk(λ00 λ0 )−1 kkλ00 eXk0 kF = Op (N −1/2 )Op (N T ) = Op ( N T ) .

This is what we wanted to show. # For part (d), we first find that

√1 NT

00 0

f eλ = Op (1), because F

   !2 

!  N X T  1 

f 00 eλ0 2  X F  √ = E EC  E EC  eit ft00 λ0i   NT   NT i=1 t=1 ) ( N N T T 1 XXXX 0 EC (eit ejs ) ft00 λ0i λ00 = E j fs N T i=1 j=1 t=1 s=1  

N T   1 XX  0 E EC e2it ft00 λ0i λ00 = i ft N T i=1 t=1

= O (1) , where we used that eit is independent across i and over t, conditional on C. Thus we obtain kPλ0 ePf 0 k = kλ0 (λ00 λ0 )−1 λ00 ef 0 (f 00 f 0 )−1 f 00 k



≤ kλ0 k (λ00 λ0 )−1 kλ00 ef 0 k (f 00 f 0 )−1 kf 00 k ≤ Op (N 1/2 )Op (N −1 )kλ00 ef 0 kF Op (T −1 )Op (T 1/2 ) = Op (1) , where we used part (i) and (ii) of Lemma S.4.1. Lemma S.8.2. Suppose that A and B are a T × T and an N × N matrices that are independent   of e, conditional on C, such that EC kAk2F = Op (N T ) and EC kBk2F = Op (N T ), and let Assumption 5 be satisfied. Then there exists a finite non-random constant c0 such that    2 (a) EC {Tr [(e0 e − EC (e0 e)) A]} ≤ c0 N EC kAk2F ,    2 (b) EC {Tr [(ee0 − EC (ee0 )) B]} ≤ c0 T EC kBk2F .

25

Proof. # Part (a): Denote Ats to be the (t, s)th element of A. We have Tr {(e0 e − EC (e0 e)) A} = =

T X T X

(e0 e − EC (e0 e))ts Ats

t=1 s=1 T X T X

N X

t=1 s=1

i=1

! (eit eis − EC (eit eis )) Ats .

Therefore EC (Tr {(e0 e − EC (e0 e)) A}) =

2

T X T X T X T X

" EC

t=1 s=1 p=1 q=1

N X

! (eit eis − EC (eit eis ))

i=1

N X

!# (ejp ejq − EC (ejp ejq ))

EC (AtsApq ) .

j=1

Let Σit = EC (e2it ). Then we find ! N !) ( N X X EC (eit eis − EC (eit eis )) (ejp ejq − EC (ejp ejq )) i=1

j=1

=

N X N X

{EC (eit eis ejp ejq ) − EC (eit eis ) EC (ejp ejq )}

i=1 j=1

  Σit Σis     Σ Σ it is =  EC (e4it ) − Σ2it     0

if (t = p) 6= (s = q) and (i = j) if (t = q) 6= (s = p) and (i = j) if (t = s = p = q) and (i = j) otherwise.

Therefore, 0

0

2

EC (Tr {(e e − EC (e e)) A}) ≤

T X T X N X

Σit Σis EC

A2ts





+ EC (Ats Ast ) +

t=1 s=1 i=1

T X N X

  EC e4it − Σ2it EC A2tt .

t=1 i=1

Define Σi = diag (Σi1 , ..., ΣiT ) . Then, we have T X T X N X

Σit Σis

N X

 EC A2ts = EC

t=1 s=1 i=1

!  Tr A0 Σi AΣi

i=1



N X

N X

i 2

i 2

Σ EC kAk2

EC AΣ F ≤ F

i=1

i=1   ≤ N sup Σ2it EC kAk2F . it

26

(S.8.1)

Also, T X T X N X

" Σit Σis EC (Ats Ast ) = EC

t=1 s=1 i=1

N X

#  Tr Σi AAΣi

i=1



N X

N X

i i

i 2



Σ EC kAk2 EC Σ A F AΣ F ≤ F

i=1

i=1

  2 ≤ N sup Σit EC kAk2F .

(S.8.2)

it

Finally, T X N X

EC

e4it





Σ2it



EC A2tt

   4 ≤ N sup EC eit EC kAk2F ,

(S.8.3)

it

t=1 i=1

and supit EC (e4it ) is assumed bounded by Assumption 5(vi). # Part (b): The proof is analogous to that of part (a). Proof of Lemma B.1. # For part (a) we have   1   1 0 0 e e √ N T Tr Pf 0 e Pλ0 Xk = √N T Tr Pf 0 e Pλ0 Pλ0 Xk Pf 0

R

e ≤√ kPλ0 e Pf 0 k Pλ0 Xk kPf 0 k NT √ 1 =√ Op (1) op ( N T ) Op (1) NT = op (1), where the the second last equality follows by Lemma S.8.1 (a) and (d). ek,jt . We then have # To show statement (b) we define ζ k,ijt = eit X " # N R T X   X 00 0 −1 X 1 λ λ 1 e0 = √ √ Tr Pλ0 e X λ0ir λ0jq ζ k,ijt . k N NT N N T t=1 i,j=1 r,q=1 rq {z } | ≡Ak,rq

 We only have EC ζ k,ijt ζ k,lms = 6 0 if t = s (because regressors are pre-determined) and i = l and j = m (because of cross-sectional independence). Therefore ( ) T N X X    1 λir λjq λlr λmq EC ζ k,ijt ζ k,lms E EC A2k,rq = E N 3 T t,s=1 i,j,l,m=1 =

T N  1 XX  2 2 2 E λ λ E ζ = O(1/N ) = op (1). C ir jq k,ijt N 3 T t=1 i,j=1

27

√ 1 Tr NT

We thus have Ak,rq = op (1) and therefore also



 e 0 = op (1). P λ0 e X k

ek,is − for statement (c) is similar to that of statement (b). Define ξ k,its = eit X # The proof  ek,is . We then have EC eit X " −1 # R N X T n h  io X X 1 f 0f 1 0 e 0 e √ √ Tr Pf 0 e Xk − EC e Xk = ftr fsq ξ k,its . T NT T N T r,q=1 t,s=1 i=1 rq {z } | ≡Bk,rq

Therefore N T  1 X X f f f f E ξ ξ tr sq ur vq C k,its k,juv T 3 N i,j=1 t,s,u,v=1  4 N T   1 X X ek,is , eju X ek,jv e X Cov ≤ max |fter | it C t,e r T 3 N i,j=1 t,s,u,v=1  4 N T   1 X X e e = max |fter | CovC eit Xk,is , eiu Xk,iv 3 t,e r T N i=1 t,s,u,v=1

 2 EC Bk,rq =

= Op (T 4/(4+) )Op (1/T ) = op (1), where we used that that uniformly bounded Ekft0 k4+ implies that maxt |ftr0 | = Op (T 1/(4+) ). # Part (d) and (e): We have kλ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 k = Op ((N T )−1/2 ), kek = Op (N 1/2 ), √ kXk k = Op ( N T ) and kPλ0 ePf 0 k = Op (1), which was shown in Lemma S.8.1. Therefore:  1 √ Tr ePf 0 e0 Mλ0 Xk f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 NT  1 =√ Tr Pλ0 ePf 0 e0 Mλ0 Xk f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 NT

R kPλ0 ePf 0 k kekkXk k f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 = Op (N −1/2 ) = op (1) . ≤√ NT which shows statement (d). The proof for part (e) is analogous. # To prove statement (f) we need to use additionally kPλ0 e Xk0 k = op (N 3/2 ), which was also

28

shown in Lemma S.8.1. We find  1 √ Tr e0 Mλ0 Xk Mf 0 e0 λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 NT  1 =√ Tr e0 Mλ0 Xk e0 Pλ0 λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 NT  1 −√ Tr e0 Mλ0 Xk Pf 0 e0 Pλ0 λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 NT R kekkPλ0 e Xk0 k kλ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 k ≤√ NT R −√ kekkXk kkPλ0 e Pf 0 kkλ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 k NT = op (1) . # Now we want to prove part (g) and (h) of the present lemma. For part (g) we have  1 √ Tr [ee0 − EC (ee0 )] Mλ0 Xk f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 NT  1 =√ Tr [ee0 − EC (ee0 )] Mλ0 X k f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 NT n o 1 ek Pf 0 f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 +√ Tr [ee0 − EC (ee0 )] Mλ0 X NT  1 Tr [ee0 − EC (ee0 )] Mλ0 X k f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 =√ NT

1

e

0 00 0 −1 00 0 −1 00 0 +√ P (λ λ ) λ kee0 − EC (ee0 )k X k f f (f f ) NT  1 Tr [ee0 − EC (ee0 )] Mλ0 X k f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 + op (1) =√ NT  Thus, what is left to prove is that √N1 T Tr [ee0 − EC (ee0 )] Mλ0 X k f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 = op (1). For this we define Bk = Mλ0 X k f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 . Using part (i) and (ii) of Lemma S.4.1 we find kBk kF ≤ R1/2 kBk k

≤ R1/2 kX k k f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00

≤ R1/2 kX k kF f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 . and therefore

2   EC kBk k2F ≤ R f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 EC kX k k2F = O(1) , 29

where we used EC kX k k2F



= O(N T ), which is true since we assumed uniformly bounded

moments of X k,it . Applying Lemma S.8.2 we therefore find  2  1 T 0 0 EC kBk k2F = o(1) , EC √ Tr {[ee − EC (ee )] Bk } ≤ c0 NT NT and thus √

1 Tr {[ee0 − EC (ee0 )] Bk } = op (1) , NT

which is what we wanted to show. The proof for part (h) is analogous. # Part (i): Conditional on C the expression e2it Xit X0it − EC (e2it Xit X0it ) is mean zero, and it is also uncorrelated across i. This together with the bounded moments that we assume implies that ( VarC

N T  1 X X 2 eit Xit X0it − EC e2it Xit X0it N T i=1 t=1

) = Op (1/N ) = op (1),

which shows the required result. # Part (j): Define the K × K matrix A =

1 NT

PN PT i=1

t=1

e2it (Xit + Xit ) (Xit − Xit )0 . Then

we have N T 1 1 XX 2 eit (Xit X0it − Xit Xit0 ) = (A + A0 ) . N T i=1 t=1 2

Let Bk be the N × T matrix with elements Bk,it = e2it (Xk,it + Xk,it ). We have kBk k ≤ kBk kF = √ Op ( N T ), because the moments of Bk,it are uniformly bounded. The components of A can be written as Alk =

1 Tr[Bl (Xk NT

− Xk )0 ]. We therefore have

1 rank(Xk − Xk )kBl k kXk − Xk k . NT e k P f 0 + Pλ 0 X ek Mf 0 . Therefore rank(Xk − Xk ) ≤ 2R and We have Xk − Xk = X

  2R

e

e 0 0 0 |Alk | ≤ + P X M kBl k X P

λ k f k f NT

  √ √ 2R 2R

e

e 0 0 ≤ kBl k X P + P X Op ( N T )op ( N T ) = op (1).

λ k = k f NT NT |Alk | ≤

where we used Lemma S.8.1. This shows the desired result. Proof of Lemma B.2. Let c be a K-vector such that kck = 1. The required result follows by the Cramer-Wold device, if we show that N T 1 XX √ eit X0it c ⇒ N (0, c0 Ωc) . N T i=1 t=1

30

For this, define ξ it = eit X0it c. Furthermore define ξ m = ξ M,m = ξ N T,it , with M = N T and m = T (i − 1) + t ∈ {1, . . . , M }. We then have the following: (i) Under Assumption 5(i), (ii), (iii) the sequence {ξ m , m = 1, . . . , M } is a martingale difference sequence under the filtration Fm = C ∨ σ({ξ n : n < m}). (ii) E(ξ 4it ) is uniformly bounded, since by Assumption 5(vi) EC e8it and EC (kXit k8+ ) are uniformly bounded by a non-random constant (applying Cauchy-Schwarz and the law of iterated expectations). (iii)

1 M

PM

m=1

ξ 2m = c0 Ωc + op (1). h

1 M

PM

2 m=1 ξ m

i2

EC (ξ 2m )



This is true, because firstly under our assumptions we have EC = − n P o 2 PM 2 2 2 EC M12 M = OP (1/M ) = oP (1), implying that we have M1 m=1 ξ m − EC (ξ m ) m=1 ξ m = PM PM PM 2 2 1 1 −1/2 m=1 ξ m ), m=1 EC (ξ m ) + op (1). We furthermore have M m=1 EC (ξ m ) = VarC (M M P M and using the result in equation (14) of the main text we find VarC (M −1/2 m=1 ξ m ) = P PT 0 VarC ((N T )−1/2 N i=1 t=1 ξ it ) = c Ωc + op (1). These three properties of {ξ m , m = 1, . . . , M } allow us to apply Corollary 5.26 in White (2001), P 0 which is based on Theorem 2.3 in Mcleish (1974), to obtain that √1M M m=1 ξ m →d N (0, c Ωc). P PN PT 0 √1 This concludes the proof, because √1M M m=1 ξ m = N T i=1 t=1 eit Xit c.

S.9

Expansions of Projectors and Residuals

b as well as the residuals eb enter into the asymptotic The incidental parameter estimators fb and λ b To describe the properties of fb, λ b and eb, it bias and variance estimators for the LS estimator β. is convenient to have asymptotic expansions of the projectors Mλb (β) and Mfb(β) that correspond b b to the minimizing parameters λ(β) and fb(β) in equation (4). Note that the minimizing λ(β) and b The corresponding fb(β) can be defined for all values of β, not only for the optimal value β = β. b residuals are eb(β) = Y − β · X − λ(β) fb0 (β).

31

Theorem S.9.1. Under Assumptions 1, 3, and 4(i) we have the following expansions (1)

(2)

K X

(2)

k=1 K X

Mλb (β) = Mλ0 + Mλ,e b − b + Mλ,e (1)

Mfb(β) = Mf 0 + Mfb,e + Mfb,e − eb(β) = Mλ0 e Mf 0 + eb(1) e −

 (1) (rem) (β) , β k − β 0k Mλ,k b + Mλ b  (1) (rem) β k − β 0k Mfb,k + Mfb (β) ,

k=1 K X

 (1) β k − β 0k ebk + eb(rem) (β) ,

k=1

where the spectral norms of the remainders satisfy for any series η N T → 0

(rem)

Mλb (β) sup = Op (1) , 0 2 −1/2 kek kβ − β 0 k + (N T )−3/2 kek3 {β:kβ−β 0 k≤η N T } kβ − β k + (N T )

(rem)

Mfb (β) sup = Op (1) , 0 2 −1/2 kek kβ − β 0 k + (N T )−3/2 kek3 {β:kβ−β 0 k≤η N T } kβ − β k + (N T )

(rem)

eb (β) sup = Op (1) , 1/2 kβ − β 0 k2 + kek kβ − β 0 k + (N T )−1 kek3 {β:kβ−β 0 k≤η N T } (N T ) and we have rank(b e(rem) (β)) ≤ 7R, and the expansion coefficients are given by (1)

0 00 0 −1 (λ00 λ0 )−1 λ00 − λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 e0 Mλ0 , Mλ,e b = − Mλ0 e f (f f ) (1)

0 00 0 −1 Mλ,k (λ00 λ0 )−1 λ00 − λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 Xk0 Mλ0 , b = − Mλ0 Xk f (f f ) (2)

0 00 0 −1 Mλ,e (λ00 λ0 )−1 λ00 e f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 b = Mλ0 e f (f f )

+ λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 e0 λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 e0 Mλ0 − Mλ0 e Mf 0 e0 λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 − λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 e Mf 0 e0 Mλ0 − Mλ0 e f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 e0 Mλ0 + λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 e0 Mλ0 e f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 ,

32

analogously (1)

Mfb,e = − Mf 0 e0 λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 − f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 e Mf 0 , (1)

Mfb,k = − Mf 0 Xk0 λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 − f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 Xk Mf 0 , (2)

Mfb,e = Mf 0 e0 λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 e0 λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 + f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 e f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 e Mf 0 − Mf 0 e0 Mλ0 e f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 − f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 e0 Mλ0 e Mf 0 − Mf 0 e0 λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 e Mf 0 + f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 e Mf 0 e0 λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 , and finally (1)

ebk = Mλ0 Xk Mf 0 , 00 0 −1 0 0 eb(1) (f 00 f 0 )−1 f 00 e = −Mλ0 e Mf 0 e λ (λ λ )

− λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 e0 Mλ0 e Mf 0 − Mλ0 e f 0 (f 00 f 0 )−1 (λ00 λ0 )−1 λ00 e Mf 0 . Proof. The general expansion of Mλb (β) is given Moon and Weidner (2013), and in the theorem we just make this expansion explicit up to a particular order. The result for Mfb(β) is just obtained by symmetry (N ↔ T , λ ↔ f , e ↔ e0 , Xk ↔ Xk0 ). For the residuals eb we have ! h   i X b − β 0 · X + λ0 f 00 , b Xk = M b e − β β eb = Mλb Y − k λ k=1

and plugging in the expansion of Mλb gives the expansion of eb. We have eb(β) = A0 + λ0 f 00 − b fb0 (β), where A0 = e − P (β k − β 0 )Xk . Therefore eb(rem) (β) = A1 + A2 + A3 with A1 = A0 − λ(β) k k (1) 0 00 0 b b ee . We find rank(A1 ) ≤ 2R, rank(A2 ) ≤ 2R, Mλ0 A0 Mf 0 , A2 = λ f − λ(β)f (β), and A3 = −b rank(A3 ) ≤ 3R, and thus rank(b e(rem) (β)) ≤ 7R, as stated in the theorem. Having expansions for Mλb (β) and Mfb(β) we also have expansions for Pλb (β) = IN −Mλb (β) and Pfb(β) = IT −Mfb(β). The reason why we give expansions of the projectors and not expansions of b λ(β) and fb(β) directly is that for the latter we would need to specify a normalization, while the b projectors are independent of any normalization choice. An expansion for λ(β) can for example b b be defined by λ(β) = Pb (β)λ0 , in which case the normalization of λ(β) is implicitly defined by λ

0

the normalization of λ . 33

S.10

Consistency Proof for Bias and Variance Estimators (Proof of Theorem 4.4)

It is convenient to introduce some alternative notation for the Definition 1 in Section 4.3 of the main text. Definition

Let Γ : R → R be the truncation kernel defined by Γ(x) = 1 for |x| ≤ 1, and

Γ(x) = 0 otherwise. Let M be a bandwidth parameter that depends on N and T . For an N × N matrix A with elements Aij and a T × T matrix B with elements Bts we define (i) the diagonal truncations AtruncD = diag[(Aii )i=1,...,N ] and B truncD = diag[(Btt )t=1,...,T ]. (ii) the right-sided Kernel truncation of B, which is a T × T matrix B truncR with elements  truncR truncR Bts = Γ s−t Bts for t < s, and Bts = 0 otherwise. M Here, we suppress the dependence of B truncR on the bandwidth parameter M . Using this notation we can represent the estimators for the bias in Definition 1 as follows: i h truncR 0 b1,k = 1 Tr P b (b B e X ) , k f N i h truncD b −1 λ b0 , b0 λ) b2,k = 1 Tr (b B e eb0 ) Mλb Xk fb(fb0 fb)−1 (λ T h i truncD b −1 (fb0 fb)−1 fb0 . b (λ b0 λ) b3,k = 1 Tr (b B e0 eb) Mfb Xk0 λ N Before proving Theorem 4.4 we establish some preliminary results.   √ b − β 0 = Op (1). Corollary S.10.1. Under the Assumptions of Theorem 4.3 we have N T β This corollary directly follows from Theorem 4.3. Corollary S.10.2. Under the Assumptions of Theorem 4.4 we have



Pb − Pλ0 = Mb − Mλ0 = Op (N −1/2 ) , λ λ





−1/2 0 0 P − P = M − M ).

fb

fb f f = Op (T Proof. Using kek = Op (N 1/2 ) and kXk k = Op (N ) we find that the expansion terms in Theorem S.9.1 satisfy

(1) −1/2 ),

Mλ,e b = Op (N



(2)

(1) −1 = O (N ) , M

Mλ,e

p b b = Op (1) . λ,k

Together with corollary S.10.1 the result for Mλb − Mλ0 immediately follows. In addition we have Pλb − Pλ0 = −Mλb + Mλ0 . The proof for Mfb and Pfb is analogous. 34

Lemma S.10.3. Under the Assumptions of Theorem 4.4 we have N T  1 XX 2  eit Xit Xit0 − Xbit Xbit0 = op (1) , A1 ≡ N T i=1 t=1

A2 ≡

N T  1 XX 2 eit − eb2it Xbit Xbit0 = op (1) . N T i=1 t=1

Lemma S.10.4. Let fb and f 0 be normalized as fb0 fb/T = IR and f 00 f 0 /T = IR . Then, under the assumptions of Theorem 4.4, there exists an R × R matrices H = HN T such that3



b

b 0 0 −1 0

λ − λ (H ) = Op (1) .

f − f H = Op (1) , Furthermore



b b0 b −1 b0 b −1 b0

λ (λ λ) (f f ) f − λ0 (λ00 λ0 )−1 (f 00 f 0 )−1 f 00 = Op N −3/2 . Lemma S.10.5. Under the Assumptions of Theorem 4.4 we have

truncR (i) N −1 EC (e0 Xk ) − (b e0 X k )

= op (1) ,

truncD −1 0 0 (ii) N EC (e e) − (b e eb)

= op (1) ,

truncD (iii) T −1 EC (ee0 ) − (b e eb0 )

= op (1) . Lemma S.10.6. Under the Assumptions of Theorem 4.4 we have

0 truncR (i) N −1 (b e Xk )

= Op (M T 1/8 ) ,

truncD −1 0 (ii) N (b e eb)

= Op (1) ,

0 truncD e eb ) (iii) T −1 (b

= Op (1) . The proof of the above lemmas is given in the supplementary material. Using these lemmas we can now prove Theorem 4.4. c = W + op (1). Proof of Theorem 4.4, Part I: show W 3

We consider a limit N, T → ∞ and for different N, T different H-matrices can be chosen, but we write H

instead of HN T to keep notation simple.

35

Using $|\mathrm{Tr}(C)| \le \|C\|\,\mathrm{rank}(C)$ and Corollary S.10.2 we find

$$\begin{aligned}
\big|\widehat W_{k_1k_2} - W_{NT,k_1k_2}\big|
&= \Big|(NT)^{-1}\mathrm{Tr}\big[\big(M_{\hat\lambda} - M_{\lambda^0}\big)X_{k_1}M_{\hat f}X_{k_2}'\big]
 + (NT)^{-1}\mathrm{Tr}\big[M_{\lambda^0}X_{k_1}\big(M_{\hat f} - M_{f^0}\big)X_{k_2}'\big]\Big|\\
&\le \frac{2R}{NT}\big\|M_{\hat\lambda} - M_{\lambda^0}\big\|\,\|X_{k_1}\|\,\|X_{k_2}\|
 + \frac{2R}{NT}\big\|M_{\hat f} - M_{f^0}\big\|\,\|X_{k_1}\|\,\|X_{k_2}\|\\
&= \frac{2R}{NT}O_p\big(N^{-1/2}\big)O_p(NT) + \frac{2R}{NT}O_p\big(T^{-1/2}\big)O_p(NT) = o_p(1).
\end{aligned}$$

Thus we have $\widehat W = W_{NT} + o_p(1) = W + o_p(1)$.

Proof of Theorem 4.4, Part II: show $\widehat\Omega = \Omega + o_p(1)$.

Let $\Omega_{NT} \equiv \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T e_{it}^2X_{it}X_{it}'$. We have $\widehat\Omega = \Omega_{NT} - A_1 - A_2 = \Omega_{NT} + o_p(1) = \Omega + o_p(1)$, where $A_1$ and $A_2$ are defined in Lemma S.10.3, and the lemma states that $A_1$ and $A_2$ are $o_p(1)$.

Proof of Theorem 4.4, Part III: show $\widehat B_1 = B_1 + o_p(1)$.

Let $B_{1,k,NT} = N^{-1}\mathrm{Tr}\big[P_{f^0}\,\mathbb{E}_C(e'X_k)\big]$. According to Assumption 6 we have $B_{1,k} = B_{1,k,NT} + o_p(1)$. What is left to show is that $B_{1,k,NT} = \widehat B_{1,k} + o_p(1)$. Using $|\mathrm{Tr}(C)| \le \|C\|\,\mathrm{rank}(C)$ we find

$$\begin{aligned}
\big|B_{1,k,NT} - \widehat B_{1,k}\big|
&= \frac{1}{N}\Big|\mathbb{E}_C\,\mathrm{Tr}\big(P_{f^0}e'X_k\big) - \mathrm{Tr}\big[P_{\hat f}\big(\hat e'X_k\big)^{truncR}\big]\Big|\\
&\le \frac{1}{N}\Big|\mathrm{Tr}\big[\big(P_{f^0} - P_{\hat f}\big)\big(\hat e'X_k\big)^{truncR}\big]\Big|
 + \frac{1}{N}\Big|\mathrm{Tr}\Big\{P_{f^0}\big[\mathbb{E}_C(e'X_k) - \big(\hat e'X_k\big)^{truncR}\big]\Big\}\Big|\\
&\le \frac{2R}{N}\big\|P_{f^0} - P_{\hat f}\big\|\,\big\|\big(\hat e'X_k\big)^{truncR}\big\|
 + \frac{R}{N}\big\|P_{f^0}\big\|\,\big\|\mathbb{E}_C(e'X_k) - \big(\hat e'X_k\big)^{truncR}\big\|.
\end{aligned}$$

We have $\|P_{f^0}\| = 1$. We now apply Lemmas S.10.5, S.10.2 and S.10.6 to find

$$\big|B_{1,k,NT} - \widehat B_{1,k}\big| = N^{-1}\Big[O_p\big(N^{-1/2}\big)\,O_p\big(MNT^{1/8}\big) + o_p(N)\Big] = o_p(1).$$

This is what we wanted to show.

Proof of Theorem 4.4, final part: show $\widehat B_2 = B_2 + o_p(1)$ and $\widehat B_3 = B_3 + o_p(1)$.

Define

$$B_{2,k,NT} = \frac{1}{T}\,\mathrm{Tr}\big[\mathbb{E}_C(ee')\,M_{\lambda^0}X_kf^0(f^{0\prime}f^0)^{-1}(\lambda^{0\prime}\lambda^0)^{-1}\lambda^{0\prime}\big].$$

According to Assumption 6 we have $B_{2,k} = B_{2,k,NT} + o_p(1)$. What is left to show is that $B_{2,k,NT} = \widehat B_{2,k} + o_p(1)$. We have

$$\begin{aligned}
B_{2,k} - \widehat B_{2,k}
&= \frac{1}{T}\,\mathrm{Tr}\big[\mathbb{E}_C(ee')\,M_{\lambda^0}X_kf^0(f^{0\prime}f^0)^{-1}(\lambda^{0\prime}\lambda^0)^{-1}\lambda^{0\prime}\big]
 - \frac{1}{T}\,\mathrm{Tr}\big[(\hat e\hat e')^{truncD}M_{\hat\lambda}X_k\hat f(\hat f'\hat f)^{-1}(\hat\lambda'\hat\lambda)^{-1}\hat\lambda'\big]\\
&= \frac{1}{T}\,\mathrm{Tr}\Big[(\hat e\hat e')^{truncD}M_{\hat\lambda}X_k\Big(f^0(f^{0\prime}f^0)^{-1}(\lambda^{0\prime}\lambda^0)^{-1}\lambda^{0\prime} - \hat f(\hat f'\hat f)^{-1}(\hat\lambda'\hat\lambda)^{-1}\hat\lambda'\Big)\Big]\\
&\quad + \frac{1}{T}\,\mathrm{Tr}\Big[(\hat e\hat e')^{truncD}\big(M_{\lambda^0} - M_{\hat\lambda}\big)X_kf^0(f^{0\prime}f^0)^{-1}(\lambda^{0\prime}\lambda^0)^{-1}\lambda^{0\prime}\Big]\\
&\quad + \frac{1}{T}\,\mathrm{Tr}\Big[\big(\mathbb{E}_C(ee') - (\hat e\hat e')^{truncD}\big)M_{\lambda^0}X_kf^0(f^{0\prime}f^0)^{-1}(\lambda^{0\prime}\lambda^0)^{-1}\lambda^{0\prime}\Big].
\end{aligned}$$

Using $|\mathrm{Tr}(C)| \le \|C\|\,\mathrm{rank}(C)$ (which is true for every square matrix $C$) we find

$$\begin{aligned}
\big|B_{2,k} - \widehat B_{2,k}\big|
&\le \frac{R}{T}\big\|(\hat e\hat e')^{truncD}\big\|\,\|X_k\|\,\Big\|f^0(f^{0\prime}f^0)^{-1}(\lambda^{0\prime}\lambda^0)^{-1}\lambda^{0\prime} - \hat f(\hat f'\hat f)^{-1}(\hat\lambda'\hat\lambda)^{-1}\hat\lambda'\Big\|\\
&\quad + \frac{R}{T}\big\|(\hat e\hat e')^{truncD}\big\|\,\big\|M_{\lambda^0} - M_{\hat\lambda}\big\|\,\|X_k\|\,\big\|f^0(f^{0\prime}f^0)^{-1}(\lambda^{0\prime}\lambda^0)^{-1}\lambda^{0\prime}\big\|\\
&\quad + \frac{R}{T}\big\|\mathbb{E}_C(ee') - (\hat e\hat e')^{truncD}\big\|\,\|X_k\|\,\big\|f^0(f^{0\prime}f^0)^{-1}(\lambda^{0\prime}\lambda^0)^{-1}\lambda^{0\prime}\big\|.
\end{aligned}$$

Here we used $\|M_{\lambda^0}\| = \|M_{\hat\lambda}\| = 1$. Using $\|X_k\| = O_p(\sqrt{NT})$, and applying Lemmas S.10.2, S.10.4, S.10.5 and S.10.6, we now find

$$\big|B_{2,k} - \widehat B_{2,k}\big| = T^{-1}\Big[O_p(T)\,O_p\big((NT)^{1/2}\big)\,O_p\big(N^{-3/2}\big)
+ O_p(T)\,O_p\big(N^{-1/2}\big)\,O_p\big((NT)^{1/2}\big)\,O_p\big((NT)^{-1/2}\big)
+ o_p(T)\,O_p\big((NT)^{1/2}\big)\,O_p\big((NT)^{-1/2}\big)\Big] = o_p(1).$$

This is what we wanted to show. The proof of $\widehat B_3 = B_3 + o_p(1)$ is analogous.
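Since every step of the proof above leans on the inequality $|\mathrm{Tr}(C)| \le \|C\|\,\mathrm{rank}(C)$, a quick numerical spot-check may be useful; the sizes and ranks below are arbitrary toy choices, not values from the paper.

```python
# Spot-check of |Tr(C)| <= ||C||_2 * rank(C) on random low-rank square matrices.
import numpy as np

rng = np.random.default_rng(1)
for r in (1, 3, 10):
    U = rng.normal(size=(100, r))
    V = rng.normal(size=(100, r))
    C = U @ V.T                                    # rank(C) <= r
    lhs = abs(np.trace(C))
    rhs = np.linalg.norm(C, 2) * np.linalg.matrix_rank(C)
    assert lhs <= rhs + 1e-8     # |sum of eigenvalues| <= rank * top singular value
    print(f"r = {r:2d}:  |Tr(C)| = {lhs:8.2f}  <=  {rhs:8.2f}")
```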

S.11 Proofs of Intermediate Lemmas

Here we provide the proofs of some intermediate lemmas that were stated and used in Section S.10 but not proved there, in order to keep that section more readable. The following lemma gives a useful bound on the maximum of (possibly correlated) random variables.

Lemma S.11.1. Let $Z_i$, $i = 1, 2, \ldots, n$, be $n$ real-valued random variables, and let $\gamma \ge 1$ and $B > 0$ be finite constants (independent of $n$). Assume $\max_i \mathbb{E}_C|Z_i|^\gamma \le B$, i.e. the $\gamma$'th moments of the $Z_i$ are finite and uniformly bounded. For $n \to \infty$ we then have

$$\max_i |Z_i| = O_p\big(n^{1/\gamma}\big). \qquad (S.11.1)$$

Proof. Using Jensen's inequality one obtains

$$\mathbb{E}_C\max_i|Z_i| \le \Big(\mathbb{E}_C\max_i|Z_i|^\gamma\Big)^{1/\gamma}
\le \Big(\mathbb{E}_C\sum_{i=1}^n|Z_i|^\gamma\Big)^{1/\gamma}
\le \Big(n\max_i\mathbb{E}_C|Z_i|^\gamma\Big)^{1/\gamma}
\le n^{1/\gamma}B^{1/\gamma}.$$

Markov's inequality then gives equation (S.11.1).

Lemma S.11.2. Let

$$\bar Z^{(1)}_{k,t\tau} = N^{-1/2}\sum_{i=1}^N\big[e_{it}X_{k,i\tau} - \mathbb{E}_C(e_{it}X_{k,i\tau})\big],\qquad
\bar Z^{(2)}_t = N^{-1/2}\sum_{i=1}^N\big[e_{it}^2 - \mathbb{E}_C\big(e_{it}^2\big)\big],\qquad
\bar Z^{(3)}_i = T^{-1/2}\sum_{t=1}^T\big[e_{it}^2 - \mathbb{E}_C\big(e_{it}^2\big)\big].$$

Under Assumption 5 we have

$$\mathbb{E}_C\big|\bar Z^{(1)}_{k,t\tau}\big|^4 \le B,\qquad
\mathbb{E}_C\big|\bar Z^{(2)}_t\big|^4 \le B,\qquad
\mathbb{E}_C\big|\bar Z^{(3)}_i\big|^4 \le B,$$

for some $B > 0$, i.e. the conditional fourth moments of $\bar Z^{(1)}_{k,t\tau}$, $\bar Z^{(2)}_t$, and $\bar Z^{(3)}_i$ are uniformly bounded over $k$, $t$, $\tau$, or $i$, respectively.

Proof. # We start with the proof for $\bar Z^{(1)}_{k,t\tau}$. Define $Z^{(1)}_{k,t\tau,i} = e_{it}X_{k,i\tau} - \mathbb{E}_C(e_{it}X_{k,i\tau})$. By assumption we have finite 8'th moments for $e_{it}$ and $X_{k,i\tau}$ uniformly across $k, i, t, \tau$, and thus (using the Cauchy-Schwarz inequality) we have finite 4th moments of $Z^{(1)}_{k,t\tau,i}$ uniformly across $k, i, t, \tau$. For ease of notation we now fix $k, t, \tau$ and write $Z_i = Z^{(1)}_{k,t\tau,i}$. We have $\mathbb{E}_C(Z_i) = 0$ and $\mathbb{E}_C(Z_iZ_jZ_kZ_l) = 0$ if $i \notin \{j, k, l\}$ (and the same holds for permutations of $i, j, k, l$). Using this we compute

$$\mathbb{E}_C\bigg(\sum_{i=1}^N Z_i\bigg)^4 = \sum_{i,j,k,l=1}^N\mathbb{E}_C(Z_iZ_jZ_kZ_l)
= 3\sum_{i\ne j}\mathbb{E}_C\big(Z_i^2Z_j^2\big) + \sum_i\mathbb{E}_C\big(Z_i^4\big)
= 3\sum_{i,j=1}^N\mathbb{E}_C\big(Z_i^2\big)\mathbb{E}_C\big(Z_j^2\big)
+ \sum_{i=1}^N\Big\{\mathbb{E}_C\big(Z_i^4\big) - 3\big[\mathbb{E}_C\big(Z_i^2\big)\big]^2\Big\}.$$

Since we argued that $\mathbb{E}_C(Z_i^4)$ is bounded uniformly, the last equation shows that the fourth moment of $\bar Z^{(1)}_{k,t\tau} = N^{-1/2}\sum_{i=1}^N Z^{(1)}_{k,t\tau,i}$ is bounded uniformly across $k, t, \tau$. This is what we wanted to show.

# The proofs for $\bar Z^{(2)}_t$ and $\bar Z^{(3)}_i$ are analogous.
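The combinatorial step above, that only pair terms and fourth powers survive when the $Z_i$ are (conditionally) independent with mean zero, can be checked by simulation; the uniform distribution below is an arbitrary mean-zero choice, not an assumption of the paper.

```python
# Monte Carlo check of
#   E( sum_i Z_i )^4 = 3 N (N-1) (E Z^2)^2 + N E Z^4
# for i.i.d. mean-zero Z_i; here Z_i ~ Uniform(-1, 1).
import numpy as np

rng = np.random.default_rng(3)
N, reps = 10, 200_000
Z = rng.uniform(-1.0, 1.0, size=(reps, N))
lhs = (Z.sum(axis=1) ** 4).mean()                  # simulated fourth moment
m2, m4 = 1 / 3, 1 / 5                              # exact moments of Uniform(-1,1)
rhs = 3 * N * (N - 1) * m2 ** 2 + N * m4           # = 32 for N = 10
print(lhs, rhs)                                    # the two should be close
```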

Lemma S.11.3. For a $T \times T$ matrix $A$ we have

$$\big\|A^{truncR}\big\| \le M\,\big\|A^{truncR}\big\|_{\max} \equiv M\max_t\max_\tau\big|A^{truncR}_{t\tau}\big|,$$

where the inner maximum effectively runs over $t < \tau \le t + M$, since all other entries of $A^{truncR}$ vanish.
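Lemma S.11.3 can also be verified numerically, re-stating the trunc_right helper from the earlier sketch; $T$, $M$ and the random matrix below are toy choices.

```python
# Check ||A^truncR||_2 <= M * max_{t,tau} |A^truncR_{t,tau}| on a random matrix.
import numpy as np

def trunc_right(B, M):
    T = B.shape[0]
    t = np.arange(T)[:, None]
    s = np.arange(T)[None, :]
    return np.where((s > t) & (s <= t + M), B, 0.0)

rng = np.random.default_rng(4)
T, M = 200, 7
At = trunc_right(rng.normal(size=(T, T)), M)
lhs = np.linalg.norm(At, 2)        # spectral norm
rhs = M * np.abs(At).max()
print(lhs, rhs, lhs <= rhs)        # bound holds: at most M nonzeros per row/column
```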

which implies $\|B_{2,k}\| = O_p(\sqrt{NT})$, and analogously we find $\|B_{3,k}\| = O_p(\sqrt{NT})$. Therefore

$$|A_{1,k_1k_2}| \le \frac{4R}{NT}\big(\|B_{1,k_1}\|\|B_{3,k_2}\| + \|B_{2,k_1}\|\|B_{1,k_2}\|\big)
= \frac{4R}{NT}\Big[O_p\big(N^{1/2}\big)O_p\big(\sqrt{NT}\big) + O_p\big(\sqrt{NT}\big)O_p\big(N^{1/2}\big)\Big] = o_p(1).$$

This is what we wanted to show.

# Finally, we want to show $A_2 \equiv (NT)^{-1}\sum_{i=1}^N\sum_{t=1}^T\big(e_{it}^2 - \hat e_{it}^2\big)\widehat X_{it}\widehat X_{it}' = o_p(1)$. According to Theorem S.9.1 we have $e - \hat e = C_1 + C_2$, where we defined $C_1 = -\sum_{k=1}^K\big(\hat\beta_k - \beta^0_k\big)X_k$, and $C_2 = \sum_{k=1}^K\big(\hat\beta_k - \beta^0_k\big)\big(P_{\lambda^0}X_kM_{f^0} + X_kP_{f^0}\big) + P_{\lambda^0}eM_{f^0} + eP_{f^0} - \hat e^{(1)} - \hat e^{(\mathrm{rem})}$, which satisfies $\|C_2\| = O_p(N^{1/2})$ and $\mathrm{rank}(C_2) \le 11R$ (actually, one can easily prove $\mathrm{rank}(C_2) \le 5R$, but this does not follow from Theorem S.9.1). Using this notation we have

$$A_2 = \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T\big(e_{it} + \hat e_{it}\big)\big(C_{1,it} + C_{2,it}\big)\widehat X_{it}\widehat X_{it}',$$

which can also be written as

$$A_{2,k_1k_2} = -\sum_{k_3=1}^K\big(\hat\beta_{k_3} - \beta^0_{k_3}\big)\big(C_{5,k_1k_2k_3} + C_{6,k_1k_2k_3}\big)
+ \frac{1}{NT}\,\mathrm{Tr}\big(C_2C_{3,k_1k_2}\big) + \frac{1}{NT}\,\mathrm{Tr}\big(C_2C_{4,k_1k_2}\big),$$

where we defined $C_{3,k_1k_2,it} = e_{it}\widehat X_{k_1,it}\widehat X_{k_2,it}$, $C_{4,k_1k_2,it} = \hat e_{it}\widehat X_{k_1,it}\widehat X_{k_2,it}$, and

$$C_{5,k_1k_2k_3} = \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T e_{it}\widehat X_{k_1,it}\widehat X_{k_2,it}X_{k_3,it},\qquad
C_{6,k_1k_2k_3} = \frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T \hat e_{it}\widehat X_{k_1,it}\widehat X_{k_2,it}X_{k_3,it}.$$

Again, since we have uniformly bounded 8'th moments for $e_{it}$ and $X_{k,it}$, we find

$$\|C_{3,k_1k_2}\|^4 \le \|C_{3,k_1k_2}\|_F^4
= \bigg(\sum_{i=1}^N\sum_{t=1}^T e_{it}^2\widehat X_{k_1,it}^2\widehat X_{k_2,it}^2\bigg)^2
\le \bigg(\sum_{i=1}^N\sum_{t=1}^T e_{it}^4\bigg)\bigg(\sum_{i=1}^N\sum_{t=1}^T \widehat X_{k_1,it}^4\widehat X_{k_2,it}^4\bigg)
= O_p\big(N^2T^2\big),$$

i.e. $\|C_{3,k_1k_2}\| = O_p(\sqrt{NT})$. Furthermore,

$$\|C_{4,k_1k_2}\|^2 \le \|C_{4,k_1k_2}\|_F^2
= \sum_{i=1}^N\sum_{t=1}^T \hat e_{it}^2\widehat X_{k_1,it}^2\widehat X_{k_2,it}^2
\le \bigg(\sum_{i=1}^N\sum_{t=1}^T e_{it}^2\bigg)\max_{i=1\ldots N}\max_{t=1\ldots T}\widehat X_{k_1,it}^2\widehat X_{k_2,it}^2
= O_p(NT)\,O_p\big((NT)^{4/(8+\epsilon)}\big) = o_p\big((NT)^{3/2}\big),$$

i.e. $\|C_{4,k_1k_2}\| = o_p\big((NT)^{3/4}\big)$. Here we used the assumption that $X_k$ has uniformly bounded moments of order $8+\epsilon$ for some $\epsilon > 0$. We also used $\sum_{i,t}\hat e_{it}^2 \le \sum_{i,t}e_{it}^2$. For $C_5$ we find

$$C_{5,k_1k_2k_3}^2 \le \bigg(\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T e_{it}^2\bigg)\bigg(\frac{1}{NT}\sum_{i=1}^N\sum_{t=1}^T \widehat X_{k_1,it}^2\widehat X_{k_2,it}^2X_{k_3,it}^2\bigg) = O_p(1),$$

i.e. $C_{5,k_1k_2k_3} = O_p(1)$, and analogously $C_{6,k_1k_2k_3} = O_p(1)$, since $\sum_{i,t}\hat e_{it}^2 \le \sum_{i,t}e_{it}^2$. Using these results we obtain

$$|A_{2,k_1k_2}| \le \sum_{k_3=1}^K\big|\hat\beta_{k_3} - \beta^0_{k_3}\big|\,\big|C_{5,k_1k_2k_3} + C_{6,k_1k_2k_3}\big|
+ \frac{11R}{NT}\|C_2\|\|C_{3,k_1k_2}\| + \frac{11R}{NT}\|C_2\|\|C_{4,k_1k_2}\|
= O_p\big((NT)^{-1/2}\big)O_p(1) + \frac{11R}{NT}O_p\big(N^{1/2}\big)O_p\big(\sqrt{NT}\big) + \frac{11R}{NT}O_p\big(N^{1/2}\big)\,o_p\big((NT)^{3/4}\big) = o_p(1).$$

This is what we wanted to show.

Remember that the truncation kernel $\Gamma(\cdot)$ is defined by $\Gamma(x) = 1$ for $|x| \le 1$ and $\Gamma(x) = 0$ otherwise. Without loss of generality we assume in the following that the bandwidth parameter $M$ is a positive integer (without this assumption, one needs to replace $M$ everywhere below by the largest integer contained in $M$, but nothing else changes).

Proof of Lemma S.10.4. By Lemma S.10.2 we know that asymptotically $P_{\hat f}$ is close to $P_{f^0}$, and therefore $\mathrm{rank}(P_{\hat f}P_{f^0}) = \mathrm{rank}(P_{f^0}P_{f^0}) = R$, i.e. $\mathrm{rank}(P_{\hat f}f^0) = R$ asymptotically. We can therefore write $\hat f = P_{\hat f}f^0H$, where $H = H_{NT}$ is a non-singular $R \times R$ matrix.

We now want to show $\|H\| = O_p(1)$ and $\|H^{-1}\| = O_p(1)$. Due to our normalization of $\hat f$ and $f^0$ we have $H = (\hat f'P_{\hat f}f^0/T)^{-1} = (\hat f'f^0/T)^{-1}$, and therefore $\|H^{-1}\| \le \|\hat f\|\|f^0\|/T = O_p(1)$. We also have $\hat f = f^0H + (P_{\hat f} - P_{f^0})f^0H$, and thus $H = f^{0\prime}\hat f/T - f^{0\prime}(P_{\hat f} - P_{f^0})f^0H/T$, i.e. $\|H\| \le O_p(1) + \|H\|\,O_p(T^{-1/2})$, which shows $\|H\| = O_p(1)$. Note that all the following results only require $\|H\| = O_p(1)$ and $\|H^{-1}\| = O_p(1)$, but apart from that are independent of the choice of normalization.

The advantage of expressing $\hat f$ in terms of $P_{\hat f}$ as above is that the result $\|P_{\hat f} - P_{f^0}\| = O_p(T^{-1/2})$ of Lemma S.10.2 immediately implies $\|\hat f - f^0H\| = O_p(1)$.

The first-order condition with respect to $\lambda$ in the minimization of the first line in equation (4) reads

$$\hat\lambda\,\hat f'\hat f = \bigg(Y - \sum_{k=1}^K\hat\beta_kX_k\bigg)\hat f, \qquad (S.11.2)$$

which yields

$$\begin{aligned}
\hat\lambda &= \bigg[\lambda^0f^{0\prime} - \sum_{k=1}^K\big(\hat\beta_k - \beta^0_k\big)X_k + e\bigg]\hat f\big(\hat f'\hat f\big)^{-1}\\
&= \bigg[\lambda^0f^{0\prime} + \sum_{k=1}^K\big(\beta^0_k - \hat\beta_k\big)X_k + e\bigg]P_{\hat f}f^0\big(f^{0\prime}P_{\hat f}f^0\big)^{-1}(H')^{-1}\\
&= \lambda^0(H')^{-1}
+ \lambda^0f^{0\prime}\big(P_{\hat f} - P_{f^0}\big)f^0\big(f^{0\prime}P_{\hat f}f^0\big)^{-1}(H')^{-1}
+ \lambda^0f^{0\prime}f^0\Big[\big(f^{0\prime}P_{\hat f}f^0\big)^{-1} - \big(f^{0\prime}f^0\big)^{-1}\Big](H')^{-1}\\
&\quad + \bigg[\sum_{k=1}^K\big(\beta^0_k - \hat\beta_k\big)X_k + e\bigg]P_{\hat f}f^0\big(f^{0\prime}P_{\hat f}f^0\big)^{-1}(H')^{-1}.
\end{aligned}$$

We have $\big\|\big(f^{0\prime}P_{\hat f}f^0/T\big)^{-1} - \big(f^{0\prime}f^0/T\big)^{-1}\big\| = O_p(T^{-1/2})$, because $\|P_{\hat f} - P_{f^0}\| = O_p(T^{-1/2})$ and $f^{0\prime}f^0/T$ by assumption is converging to a positive definite matrix (or, given our particular choice of normalization, is just the identity matrix $I_R$). In addition, we have $\|e\| = O_p(\sqrt T)$, $\|X_k\| = O_p(\sqrt{NT})$, and by Corollary S.10.1 also $\|\hat\beta - \beta^0\| = O_p(1/\sqrt{NT})$. Therefore

$$\big\|\hat\lambda - \lambda^0(H')^{-1}\big\| = O_p(1), \qquad (S.11.3)$$

which is what we wanted to prove.

Next, we want to show

$$\Bigg\|\bigg(\frac{\hat\lambda'\hat\lambda}{N}\bigg)^{-1} - \bigg(\frac{(H)^{-1}\lambda^{0\prime}\lambda^0(H')^{-1}}{N}\bigg)^{-1}\Bigg\| = O_p\big(N^{-1/2}\big),\qquad
\Bigg\|\bigg(\frac{\hat f'\hat f}{T}\bigg)^{-1} - \bigg(\frac{H'f^{0\prime}f^0H}{T}\bigg)^{-1}\Bigg\| = O_p\big(T^{-1/2}\big). \qquad (S.11.4)$$

Let $A = N^{-1}\hat\lambda'\hat\lambda$ and $B = N^{-1}(H)^{-1}\lambda^{0\prime}\lambda^0(H')^{-1}$. Using (S.11.3) we find

$$\|A - B\| = \frac{1}{2N}\Big\|\big[\hat\lambda - \lambda^0(H')^{-1}\big]'\big[\hat\lambda + \lambda^0(H')^{-1}\big] + \big[\hat\lambda + \lambda^0(H')^{-1}\big]'\big[\hat\lambda - \lambda^0(H')^{-1}\big]\Big\|
= N^{-1}O_p\big(N^{1/2}\big)\,O_p(1) = O_p\big(N^{-1/2}\big).$$

By Assumption 1 we know that $\big\|\big(\lambda^{0\prime}\lambda^0/N\big)^{-1}\big\| = O_p(1)$, and thus also $\|B^{-1}\| = O_p(1)$, and therefore $\|A^{-1}\| = O_p(1)$ (using $\|A - B\| = o_p(1)$ and applying Weyl's inequality to the smallest eigenvalue of $B$). Since $A^{-1} - B^{-1} = A^{-1}(B - A)B^{-1}$ we find

$$\big\|A^{-1} - B^{-1}\big\| \le \big\|A^{-1}\big\|\,\big\|B^{-1}\big\|\,\|A - B\| = O_p\big(N^{-1/2}\big).$$

Thus, we have shown the first statement of (S.11.4), and analogously one can show the second one. Combining (S.11.3), (S.11.2) and (S.11.4) we obtain

$$\Bigg\|\frac{\hat\lambda}{\sqrt N}\bigg(\frac{\hat\lambda'\hat\lambda}{N}\bigg)^{-1}\bigg(\frac{\hat f'\hat f}{T}\bigg)^{-1}\frac{\hat f'}{\sqrt T}
- \frac{\lambda^0(H')^{-1}}{\sqrt N}\bigg(\frac{(H)^{-1}\lambda^{0\prime}\lambda^0(H')^{-1}}{N}\bigg)^{-1}\bigg(\frac{H'f^{0\prime}f^0H}{T}\bigg)^{-1}\frac{H'f^{0\prime}}{\sqrt T}\Bigg\| = O_p\big(N^{-1/2}\big),$$

which, after dividing by $\sqrt{NT}$ and noting that all $H$-factors cancel in the second term, is equivalent to the statement in the lemma. Note also that $\hat\lambda(\hat\lambda'\hat\lambda)^{-1}(\hat f'\hat f)^{-1}\hat f'$ is independent of $H$, i.e. independent of the choice of normalization.
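To see the role of $H$ and the normalization-invariance just noted in a concrete computation, here is a small synthetic sketch: both factor matrices are normalized so that $f'f/T = I_R$, $H$ is computed as $(\hat f'f^0/T)^{-1}$ as in the proof, and the object $\hat\lambda(\hat\lambda'\hat\lambda)^{-1}(\hat f'\hat f)^{-1}\hat f'$ is checked to be unchanged by an arbitrary rotation $Q$. All matrices are synthetic placeholders, not estimates from a real model.

```python
# Sketch: the rotation H aligning fhat with f0, and rotation-invariance of
# lam (lam'lam)^{-1} (f'f)^{-1} f'.
import numpy as np

rng = np.random.default_rng(5)
T, N, R = 120, 100, 2

def normalize(F):
    """Rescale so that F'F / T = I_R (orthonormal columns times sqrt(T))."""
    Q, _ = np.linalg.qr(F)
    return np.sqrt(F.shape[0]) * Q

f0 = normalize(rng.normal(size=(T, R)))               # "true" factors
fhat = normalize(f0 + 0.1 * rng.normal(size=(T, R)))  # nearby "estimate"
H = np.linalg.inv(fhat.T @ f0 / T)                    # as in the proof of Lemma S.10.4
print(np.linalg.norm(fhat - f0 @ H))                  # small: the two spans nearly coincide

def product(lam_, f_):
    """lambda (lambda'lambda)^{-1} (f'f)^{-1} f', the rotation-free object."""
    return lam_ @ np.linalg.inv(lam_.T @ lam_) @ np.linalg.inv(f_.T @ f_) @ f_.T

lam = rng.normal(size=(N, R))
Q = rng.normal(size=(R, R))                           # arbitrary non-singular rotation
diff = product(lam @ np.linalg.inv(Q.T), fhat @ Q) - product(lam, fhat)
print(np.abs(diff).max())                             # ~ 0 up to floating point
```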


Proof of Lemma S.10.5. # Part A of the proof: We start by showing that

$$N^{-1}\Big\|\mathbb{E}_C\Big[e'X_k - \big(e'X_k\big)^{truncR}\Big]\Big\| = o_p(1). \qquad (S.11.5)$$

Let $A = e'X_k$ and $B = A - A^{truncR}$. By definition of the right-sided truncation (using the equal-weight kernel $\Gamma(\cdot)$) we have $B_{t\tau} = 0$ for $t < \tau \le t + M$ and $B_{t\tau} = A_{t\tau}$ otherwise. By Assumption 5 we have $\mathbb{E}_C(A_{t\tau}) = 0$ for $t \ge \tau$. For $t < \tau$ we have $\mathbb{E}_C(A_{t\tau}) = \sum_{i=1}^N\mathbb{E}_C(e_{it}X_{k,i\tau})$. We thus have $\mathbb{E}_C(B_{t\tau}) = 0$ for $\tau \le t + M$, and $\mathbb{E}_C(B_{t\tau}) = \sum_{i=1}^N\mathbb{E}_C(e_{it}X_{k,i\tau})$ for $\tau > t + M$. Therefore

$$\big\|\mathbb{E}_C(B)\big\|_1 = \max_{t=1\ldots T}\sum_{\tau=1}^T\big|\mathbb{E}_C(B_{t\tau})\big|
\le \max_{t=1\ldots T}\sum_{\tau=t+M+1}^T\Big|\sum_{i=1}^N\mathbb{E}_C(e_{it}X_{k,i\tau})\Big|
\le N\max_{t=1\ldots T}\sum_{\tau=t+M+1}^T c\,(\tau - t)^{-(1+\epsilon)} = o_p(N),$$

where we used $M \to \infty$. Analogously we can show $\|\mathbb{E}_C(B)\|_\infty = o_p(N)$. Using part (vii) of Lemma S.4.1 we therefore also find $\|\mathbb{E}_C(B)\| = o_p(N)$, which is equivalent to equation (S.11.5) that we wanted to show in this part of the proof. Analogously we can show that

$$N^{-1}\Big\|\mathbb{E}_C\Big[e'e - \big(e'e\big)^{truncD}\Big]\Big\| = o_p(1),\qquad
T^{-1}\Big\|\mathbb{E}_C\Big[ee' - \big(ee'\big)^{truncD}\Big]\Big\| = o_p(1).$$

# Part B of the proof: Next, we want to show that

$$N^{-1}\Big\|\big[e'X_k - \mathbb{E}_C(e'X_k)\big]^{truncR}\Big\| = o_p(1). \qquad (S.11.6)$$

Using Lemma S.11.3 we have

$$N^{-1}\Big\|\big[e'X_k - \mathbb{E}_C(e'X_k)\big]^{truncR}\Big\|
\le M\max_t\max_\tau N^{-1}\big|e_t'X_{k,\tau} - \mathbb{E}_C\big(e_t'X_{k,\tau}\big)\big|
\le M\max_t\max_\tau N^{-1}\Big|\sum_{i=1}^N\big[e_{it}X_{k,i\tau} - \mathbb{E}_C(e_{it}X_{k,i\tau})\big]\Big|$$