VARS WITH MIXED ROOTS NEAR UNITY
By Peter C.B. Phillips and Ji Hyung Lee
January 2012
COWLES FOUNDATION DISCUSSION PAPER NO. 1845
COWLES FOUNDATION FOR RESEARCH IN ECONOMICS
YALE UNIVERSITY
Box 208281
New Haven, Connecticut 06520-8281
http://cowles.econ.yale.edu/
VARs with Mixed Roots Near Unity

Peter C. B. Phillips
Yale University, University of Auckland, Singapore Management University & University of Southampton

Ji Hyung Lee
Yale University

January 8, 2012
Abstract

Limit theory is developed for nonstationary vector autoregression (VAR) with mixed roots in the vicinity of unity involving persistent and explosive components. Statistical tests for common roots are examined and model selection approaches for discriminating roots are explored. The results are useful in empirical testing for multiple manifestations of nonstationarity – in particular for distinguishing mildly explosive roots from roots that are local to unity and for testing commonality in persistence.

Keywords: Common roots, Local to unity, Mildly explosive, Mixed roots, Model selection, Persistence, Tests of common roots.

JEL classification: C22
1 Introduction
Aman Ullah's contributions cover a wide spectrum of econometrics with sustained scientific work over the last four decades in finite sample theory, nonparametric estimation, spatial econometrics, panel data modeling, financial econometrics, time series and applied econometrics. His advanced textbook on Nonparametric Econometrics (1999, with Adrian Pagan) has been particularly influential, helping to educate a generation of econometricians in nonparametric methods and providing an accessible reference for applied researchers. His monograph on Finite Sample Econometrics (2004) encapsulates many of his own contributions to this subject and touches some of the wider reaches of this difficult and vitally important field.

One field of econometrics that his work has less frequently touched is nonstationary time series and unit root limit theory. Since the mid 1980s, models with autoregressive roots in the vicinity of unity have attracted much attention. These models are particularly useful in empirical work with nonstationary series when it may be too restrictive to insist on the presence of roots precisely at unity, or where mildly integrated or mildly explosive behavior may be more relevant than unit roots. When multiple time series are considered, it may be useful to allow simultaneously for various types of behavior in the individual series: some roots that are local to unity and others that are mildly integrated or mildly explosive. Limit theory for regressors with roots local to unity developed early in the literature of this field (Phillips, 1987; Chan and Wei, 1987). More recent work has considered mildly integrated and mildly explosive cases (Phillips and Magdalinos, 2007a, 2007b; hereafter PM7). The latter theory has proved particularly relevant in studying data during periods of financial exuberance (Phillips, Wu and Yu, 2011; Phillips and Yu, 2011).

The present paper considers time series models with mixed and common roots in the vicinity of unity. To simplify exposition, we work with a bivariate model and analyze a case of primary interest where there is one local to unit root and one mildly explosive root. Models of this type may be anticipated when there are dual manifestations of nonstationarity with somewhat different individual characteristics. Or the behavior may be common across series – for instance in asset prices – arising from a single source of persistence or exuberance. We may be particularly interested empirically in testing commonality in persistence or long run behavior across series, which occurs when the autoregressive roots have the same value.

The remainder of the paper is organized as follows. Section 2 considers mixed VARs whose variates have mixed degrees of persistence that allow for a local to unit root and a mildly explosive root. Modified Wald statistics for testing commonality in long run behavior are developed and shown to produce consistent tests. Section 3 considers a model selection approach and shows that the BIC criterion can distinguish persistent and mildly explosive behavior. Section 4 concludes, and technical derivations are given in the Appendix.

*This paper is based on the first part of a Yale take home examination in 2010/2011. Phillips acknowledges support from the NSF under Grant No. SES-0956687.
2 Mixed Variate VARs
For simplicity of exposition, we consider the bivariate VAR(1) model

    X_t = R_n X_{t−1} + u_t,   t = 1, ..., n,    (2.1)

    R_n = diag(ρ_n, θ_n),   ρ_n = 1 + c/n,   θ_n = 1 + b/k_n,   b > 0,    (2.2)

which we write in component form as

    [X_{1t}; X_{2t}] = [ρ_n 0; 0 θ_n][X_{1,t−1}; X_{2,t−1}] + [u_{1t}; u_{2t}],   t = 1, ..., n,    (2.3)

with initialization X_0 = o_p(k_n^{1/2}) and martingale difference innovations u_t satisfying Assumption 1 below. Our results may be extended to systems with weakly dependent errors u_t under conditions like those in the linear process framework of Magdalinos and Phillips (2009), but all the key ideas follow as in the simpler VAR(1) model studied here so we do not provide details.

The coefficient ρ_n = 1 + c/n is local to unity, θ_n = 1 + b/k_n is a mildly explosive coefficient with b > 0, and the sequence k_n satisfies k_n → ∞ and k_n/n → 0 as n → ∞ (so both coefficients are in the vicinity of unity). Although θ_n is 'further' from unity than ρ_n, in the sense that ρ_n^n/θ_n^n → 0 for all finite c as n → ∞, statistical tests need to differentiate the mildly explosive behavior induced by θ_n from the persistence induced by ρ_n.

Assumption 1. The errors {u_t} in (2.1) form a martingale difference sequence with respect to the natural filtration F_t = σ(u_t, u_{t−1}, ...), with E_{F_{t−1}}(u_t u_t′) = Σ a.s. for all t for some positive definite matrix

    Σ = [Σ_11 Σ_12; Σ_21 Σ_22],    (2.4)

sup_t E‖u_t‖^4 < ∞, and

    max_{1 ≤ t ≤ n} E( ‖u_t‖² 1{‖u_t‖ > δ_n} ) → 0 as n → ∞    (2.5)

for any sequence (δ_n)_{n∈N} such that δ_n → ∞. Here ‖M‖ = max_i { λ_i^{1/2} : λ_i is an eigenvalue of M′M } is the spectral norm of the matrix M.

As expected from the differences between the coefficients ρ_n and θ_n in (2.3), the time series components X_{1t} and X_{2t} have different orders of magnitude as n → ∞. These differences translate into different rates of convergence of the sample moments of X_t and the least squares regression components. To accommodate these differences we employ the (asymptotically equivalent) normalizing matrices

    D_n := [n 0; 0 k_n θ_n^n]   and   F_n := [n 0; 0 θ_n^n (θ_n² − 1)^{−1}].
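To fix ideas, the data generating process (2.1)–(2.3) is easy to simulate. The sketch below is illustrative only: the choices k_n = n^0.7, c = −2, b = 1 and Gaussian innovations are hypothetical assumptions made so the example runs, not requirements of the theory.

```python
import numpy as np

def simulate_mixed_var(n, c=-2.0, b=1.0, alpha=0.7, seed=0):
    """Simulate the bivariate VAR(1) of (2.1)-(2.3): X_t = R_n X_{t-1} + u_t with
    rho_n = 1 + c/n (local to unity) and theta_n = 1 + b/k_n (mildly explosive),
    where k_n = n**alpha so that k_n -> infinity and k_n/n -> 0."""
    rng = np.random.default_rng(seed)
    sigma = np.array([[1.0, 0.3], [0.3, 1.0]])  # innovation variance, as in (2.4)
    kn = n ** alpha
    rho, theta = 1.0 + c / n, 1.0 + b / kn
    u = rng.multivariate_normal(np.zeros(2), sigma, size=n)
    X = np.zeros((n + 1, 2))  # X_0 = 0 meets the o_p(k_n^{1/2}) initialization
    for t in range(1, n + 1):
        X[t, 0] = rho * X[t - 1, 0] + u[t - 1, 0]    # persistent component
        X[t, 1] = theta * X[t - 1, 1] + u[t - 1, 1]  # mildly explosive component
    return X, rho, theta, kn

X, rho, theta, kn = simulate_mixed_var(n=400)
print(rho, theta, X[-1])
```

Both roots are in the vicinity of unity, but θ_n − 1 = b/k_n shrinks more slowly than |ρ_n − 1| = |c|/n, which is exactly what the tests developed below must detect.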
The unrestricted least squares regression estimate of R_n in (2.1) is written in standard notation as R̂_n = X′X_{−1}(X′_{−1}X_{−1})^{−1}. This estimate is consistent and has a limit distribution that is obtained from a combination of functional limit theory that applies to the persistent component and central limit theory that applies to the mildly explosive component, as detailed in the following result.

Theorem 2.1 As n → ∞,

    (R̂_n − R_n) F_n ⇒ [ ∫₀¹ J_{1c}(r) dB(r) / ∫₀¹ J_{1c}(r)² dr ,  Y(b)/X₂(b) ] =: Ξ,    (2.6)

where J_{1c}(r) = ∫₀^r e^{c(r−s)} dB₁(s), which is an Ornstein–Uhlenbeck (O–U) process, B(r) = (B₁(r), B₂(r))′ is bivariate Brownian motion with variance matrix Σ, X(b) = (X₁(b), X₂(b))′ ≡ N(0, Σ/2b), Y(b) =_d X(b), and X(b) and Y(b) are independent. The two column components ∫₀¹J_{1c}(r)dB(r)/∫₀¹J_{1c}(r)²dr and Y(b)/X₂(b) of the limiting matrix variate Ξ are independent.
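The two convergence rates in Theorem 2.1 can be seen numerically. The following sketch (hypothetical settings: n = 800, k_n = n^0.7, c = −2, b = 1, i.i.d. standard normal errors) computes the least squares estimate and contrasts the column errors.

```python
import numpy as np

# Unrestricted least squares for (2.1): R_hat = X'X_{-1} (X_{-1}'X_{-1})^{-1}.
rng = np.random.default_rng(1)
n, c, b = 800, -2.0, 1.0
kn = n ** 0.7
rho, theta = 1 + c / n, 1 + b / kn
Rn = np.diag([rho, theta])
u = rng.standard_normal((n, 2))
X = np.zeros((n + 1, 2))
for t in range(1, n + 1):
    X[t] = Rn @ X[t - 1] + u[t - 1]
Xlag, Xcur = X[:-1], X[1:]
Rhat = (Xcur.T @ Xlag) @ np.linalg.inv(Xlag.T @ Xlag)
err = Rhat - Rn
# Column 1 errors shrink at the O(1/n) near-integrated rate; column 2 errors
# shrink at the much faster mildly explosive rate O(1/(k_n * theta_n**n)).
print(err)
```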
Remarks

1. The two columns of R̂_n − R_n converge at different rates, the first at the usual O(n) rate for near integrated regressions and the second at the mildly explosive rate θ_n^n(θ_n² − 1)^{−1} = O(k_n θ_n^n) = O(k_n e^{bn/k_n}). In particular, writing R_n = (r_ij), we have

    n(r̂_11 − r_11) ⇒ ∫₀¹ J_{1c}(r) dB₁(r) / ∫₀¹ J_{1c}(r)² dr,    (2.7)

    n(r̂_21 − r_21) ⇒ ∫₀¹ J_{1c}(r) dB₂(r) / ∫₀¹ J_{1c}(r)² dr,    (2.8)

    θ_n^n(θ_n² − 1)^{−1}(r̂_22 − r_22) ⇒ Y₂(b)/X₂(b),   θ_n^n(θ_n² − 1)^{−1}(r̂_12 − r_12) ⇒ Y₁(b)/X₂(b).    (2.9)

2. The process J_{1c}(r) = ∫₀^r e^{c(r−s)} dB₁(s) that appears in the limit theory involves only the first component B₁(r) of B(r), so that the limit variate ∫₀¹J_{1c}(r)dB₁(r)/∫₀¹J_{1c}(r)²dr has a standard local unit root distribution that is independent of Σ_11 but is dependent on c.

3. The limit variate Y(b)/X₂(b) is independent of b and we can therefore write

    Y(b)/X₂(b) = (2b)^{1/2}Y(b) / ((2b)^{1/2}X₂(b)) =: Y/X₂,

where Y ≡ N(0, Σ), X = (X₁, X₂)′ ≡ N(0, Σ), and X and Y are independent.

As indicated earlier, we may be interested in testing commonality of persistence characteristics in the component series X_{1t} and X_{2t}. In the present case, setting R_n = (r_ij) and under a maintained hypothesis that R_n is diagonal with roots local to unity, commonality amounts to testing the hypothesis H₀: r_11 = r_22 = 1 + c/n for some finite c ∈ (−∞, ∞). The null can be written as H₀: a₁′ vec(R_n) = 0 where a₁′ = [1, 0, 0, −1] without explicitly specifying a common persistence parameter r_n = 1 + c/n. H₀ may also be subsumed in a block test of R_n = r_n I for some r_n = 1 + c/n, which we can write in the form H₀^A: A′ vec(R_n) = 0, where we use row vectorization in the vec operator and

    A′ = [1 0 0 −1; 0 1 0 0; 0 0 1 0] =: [a₁′; a₂′; a₃′].
The standard Wald test of H₀ uses the statistic

    W_n = (a₁′ vec(R̂_n))² / a₁′{Σ̂ ⊗ (X′_{−1}X_{−1})^{−1}}a₁,

and the corresponding block test of H₀^A has the form

    W_n^A = (A′vec(R̂_n))′ [ A′{Σ̂ ⊗ (X′_{−1}X_{−1})^{−1}}A ]^{−1} A′vec(R̂_n),

where Σ̂ = n^{−1}Σ_{t=1}^n û_t û_t′ is a consistent estimator of Σ based on the least squares residuals û_t = X_t − R̂_n X_{t−1}. Under (2.3) the coefficients r_11 = ρ_n and r_22 = θ_n, so that r_11 − r_22 = c/n − b/k_n = o(1), which is local to zero. Hence the model (2.2) actually corresponds to a local alternative to the null H₀.

Theorem 2.2 Under the null hypothesis H₀: R_n = r_n I with r_n = 1 + c/n, as n → ∞,

    W_n ⇒ (a₁′ξ)² / a₁′{Σ ⊗ (∫₀¹ J_c(r)J_c(r)′dr)^{−1}}a₁    (2.10)

and

    W_n^A ⇒ (A′ξ)′ [ A′{Σ ⊗ (∫₀¹ J_cJ_c′)^{−1}}A ]^{−1} A′ξ,    (2.11)

where J_c(r) = ∫₀^r e^{c(r−s)} dB(s), ξ = vec(Ξ_c) and Ξ_c = (∫₀¹ dB J_c′)(∫₀¹ J_cJ_c′)^{−1}. Under the alternative H₁: R_n = diag(ρ_n, θ_n),

    W_n, W_n^A = (nb/k_n)² Σ_11^{−1} ∫₀¹ J_{1c}(r)² dr {1 + o_p(1)} = O_p(n²/k_n²).    (2.12)
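The statistics W_n and W_n^A can be computed directly from a fitted VAR(1). The sketch below simulates data under a common local-to-unity null (hypothetical settings: n = 500, c = 1, i.i.d. normal errors); with row vectorization, the variance of vec(R̂_n) is estimated by Σ̂ ⊗ (X′_{−1}X_{−1})^{−1}.

```python
import numpy as np

# Wald statistics for commonality of roots, computed under a simulated common
# local-to-unity null R_n = (1 + c/n) I.
rng = np.random.default_rng(2)
n, c = 500, 1.0
rho = 1 + c / n
u = rng.standard_normal((n, 2))
X = np.zeros((n + 1, 2))
for t in range(1, n + 1):
    X[t] = rho * X[t - 1] + u[t - 1]
Xlag, Xcur = X[:-1], X[1:]
M = np.linalg.inv(Xlag.T @ Xlag)
Rhat = (Xcur.T @ Xlag) @ M
Uhat = Xcur - Xlag @ Rhat.T                 # least squares residuals
Sig = Uhat.T @ Uhat / n                     # Sigma_hat
V = np.kron(Sig, M)                         # var of vec(R_hat), row vectorization
r = Rhat.reshape(-1)                        # vec(R_hat) = (r11, r12, r21, r22)
a1 = np.array([1.0, 0.0, 0.0, -1.0])
Wn = (a1 @ r) ** 2 / (a1 @ V @ a1)          # tests r11 = r22
A = np.array([[1.0, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]])
WnA = (A @ r) @ np.linalg.inv(A @ V @ A.T) @ (A @ r)  # block test of R_n = r_n I
print(Wn, WnA)
```

Because a₁ is a row of A, the block statistic W_n^A is never smaller than W_n.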
Remarks

4. The null limit distributions (2.10) and (2.11) are parameter dependent. The dependence involves the localizing coefficient c and the variance matrix Σ. When c = 0,

    Ξ_c = (∫₀¹ dBB′)(∫₀¹ BB′)^{−1} = Σ^{1/2}(∫₀¹ dVV′)(∫₀¹ VV′)^{−1}Σ^{−1/2} =: Σ^{1/2} Ξ_V Σ^{−1/2},

where V ≡ BM(I₂) is standard vector Brownian motion. The limit distribution of the Wald statistic is then

    W_n ⇒ (b′ξ_V)² / b′{I ⊗ (∫₀¹ VV′)^{−1}}b,    (2.13)

where ξ_V = vec(Ξ_V) and

    b = (Σ^{1/2} ⊗ Σ^{−1/2})a₁ / (a₁′(Σ ⊗ Σ^{−1})a₁)^{1/2}

lies on the unit sphere b′b = 1. Thus, even in the case of a common unit root, the null limit distribution of the test depends on Σ, although this matrix is consistently estimable by the residual moment matrix Σ̂. In the general case, the limit distributions (2.10) and (2.11) both have nuisance parameters (c, Σ).

5. The parameter c is not consistently estimable and it is therefore not possible to construct a standard test of the composite H₀. However, modified tests are available to distinguish H₀ from alternatives that involve a mildly explosive component. For instance, for some (possibly slowly varying) sequence L_n → ∞, the statistic W_{L_n} = W_n/L_n →_p 0 under H₀ for all finite c. Then, under the alternative hypothesis H₁, W_{L_n} = O_p(n²/(k_n² L_n)), which diverges for all sequences L_n → ∞ such that k_n² L_n/n² → 0. In particular, if k_n = O(n^α) for some α ∈ (0, 1) and L_n is slowly varying at infinity, then W_{L_n} = O_p(n^{2(1−α)}/L_n) → ∞ as n → ∞, and tests based on the statistic W_{L_n} with any fixed critical value¹ are consistent and have zero size asymptotically. Similar remarks apply to the block test based on W_{L_n}^A = W_n^A/L_n.

6. In view of (2.12), W_n, W_n^A = O_p(n²/k_n²) and the Wald statistics diverge, as do the scaled statistics W_{L_n} and W_{L_n}^A. So there is discriminatory power under the local alternative H₁: r_11 = ρ_n = 1 + c/n, r_22 = θ_n = 1 + b/k_n.

¹For example, asymptotic critical values might be computed for the limit distribution (2.13) with Σ = I and b = a₁/(a₁′a₁)^{1/2}.
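The order assessments in Remarks 5 and 6 can be tabulated. Assuming k_n = n^α and the slowly varying choice L_n = log n (both hypothetical illustrations), the scaled statistic W_{L_n} is of order n^{2(1−α)}/L_n under H₁ and of order 1/L_n under H₀:

```python
import numpy as np

alpha = 0.7                                # assume k_n = n**alpha, alpha in (0, 1)
ns = [10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6]
Ln = [np.log(n) for n in ns]               # slowly varying normalizing sequence
# Order of W_{L_n} = W_n / L_n under H1: n^{2(1 - alpha)} / L_n, which diverges.
div_rates = [n ** (2 * (1 - alpha)) / l for n, l in zip(ns, Ln)]
# Order of W_{L_n} under H0: W_n = O_p(1), so W_{L_n} = O_p(1 / L_n) -> 0.
null_rates = [1.0 / l for l in Ln]
for n, d, z in zip(ns, div_rates, null_rates):
    print(n, round(d, 1), round(z, 3))
```

The divergence of the first column and the decay of the second is what delivers consistent tests with asymptotically zero size.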
3 Model Selection
Another approach to testing for common roots in (2.1) is to apply model selection methods. This involves estimating (2.1) in the restricted case under the null of a common root and under the alternative of unrestricted roots.

Estimating (2.1) under the restriction R_n = r_n I gives the pooled least squares estimator

    r̂_n = ( Σ_{t=1}^n X_{t−1}′X_{t−1} )^{−1} Σ_{t=1}^n X_{t−1}′X_t

of the common root r_n. We have the following limit theory for r̂_n under the null hypothesis and alternative.
Lemma 3.1 (i) Under the null R_n = r_n I with r_n = 1 + c/n, r̂_n has the limit distribution

    n(r̂_n − r_n) ⇒ ∫₀¹ J_c(r)′ dB(r) / ∫₀¹ J_c(r)′J_c(r) dr,    (3.1)

and the residual moment matrix Σ̃ = n^{−1}Σ_{t=1}^n ũ_t ũ_t′, where ũ_t = X_t − r̂_n X_{t−1}, has the form

    Σ̃ = n^{−1}Σ_{t=1}^n u_t u_t′ + O_p(n^{−1}).    (3.2)

(ii) Under the alternative hypothesis where R_n = diag(ρ_n, θ_n), r̂_n has the limit distribution

    (θ_n^n/k_n)(r̂_n − θ_n) ⇒ 2b Y₂(b)/X₂(b),    (3.3)

where Y₂(b) =_d X₂(b) ≡ N(0, Σ_22/2b) and Y₂(b) and X₂(b) are independent. The residual moment matrix Σ̃ of the restricted regression has the following asymptotic behavior under the alternative hypothesis:

    Σ̃ = n^{−1}Σ_{t=1}^n u_t u_t′ + (b²n/k_n²)[ n^{−2}Σ_{t=1}^n X²_{1t−1}  0 ; 0  0 ]{1 + o_p(1)}.    (3.4)

Since n^{−1}Σ_{t=1}^n u_t u_t′ →_p Σ, it follows from (3.2) that Σ̃ is consistent for Σ under the null. However, from (3.4) and the fact that n^{−2}Σ_{t=1}^n X²_{1t−1} ⇒ ∫₀¹ J_{1c}², it is apparent that Σ̃ is consistent for Σ when n = o(k_n²) but is inconsistent when k_n² = O(n) and, in particular, when k_n = o(n^{1/2}). These results enable us to determine conditions for the consistency of model selection criteria such as the Schwarz criterion (BIC).

For the model (2.1), the restricted regression and unrestricted regression BIC criteria are:

    BIC_r = log|Σ̃| + log n/n,   BIC_u = log|Σ̂| + 4 log n/n.

When the null holds and R_n = r_n I it is evident that

    BIC_r = log|Σ̃| + log n/n = log|n^{−1}Σ_{t=1}^n u_t u_t′| + log n/n + O_p(n^{−1}),    (3.5)

whereas for the unrestricted regression

    BIC_u = log|Σ̂| + 4 log n/n = log|n^{−1}Σ_{t=1}^n u_t u_t′| + 4 log n/n + O_p(n^{−1}),    (3.6)

since Σ̂ = n^{−1}Σ_{t=1}^n u_t u_t′ + O_p(n^{−1}), analogous to the proof of (3.2). In view of (3.5) and (3.6), BIC_r > BIC_u under the alternative as n → ∞ whenever

    (b²n/k_n²) n^{−2}Σ_{t=1}^n X²_{1t−1} / Σ_{11·2} > 3 log n/n,

where Σ_{11·2} = Σ_11 − Σ_12Σ_22^{−1}Σ_21. This inequality holds with probability approaching unity provided n²/(k_n² log n) → ∞ as n → ∞ because n^{−2}Σ_{t=1}^n X²_{1t−1} ⇒ ∫₀¹ J_{1c}² > 0 with probability one. Hence, under the alternative, the unrestricted model will be chosen with probability approaching unity as n → ∞ provided k_n goes to infinity slower than n/log n, that is provided k_n log n/n → 0.

It follows that model selection by BIC is consistent and as n → ∞ the criterion will successfully distinguish roots in the vicinity of unity provided one of the roots θ_n = 1 + b/k_n is mildly explosive and sufficiently different from local to unity in the sense that k_n → ∞ slower than O(n/log n). In this respect, the discriminatory capability of model selection is analogous to that of classical Wald testing.
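The BIC comparison of Section 3 can be sketched as follows. The settings (Gaussian errors, k_n = n^0.3, c = 0, b = 1) are hypothetical and chosen so that k_n log n/n → 0, the region in which BIC discriminates consistently.

```python
import numpy as np

def bic_pair(X):
    """Return (BIC_r, BIC_u): BIC for the restricted (common root) and
    unrestricted VAR(1) regressions of Section 3."""
    n = X.shape[0] - 1
    Xlag, Xcur = X[:-1], X[1:]
    rhat = np.sum(Xlag * Xcur) / np.sum(Xlag * Xlag)   # pooled root estimate
    Sr = (Xcur - rhat * Xlag).T @ (Xcur - rhat * Xlag) / n
    Rhat = (Xcur.T @ Xlag) @ np.linalg.inv(Xlag.T @ Xlag)
    Su = (Xcur - Xlag @ Rhat.T).T @ (Xcur - Xlag @ Rhat.T) / n
    logn = np.log(n)
    return (np.log(np.linalg.det(Sr)) + logn / n,
            np.log(np.linalg.det(Su)) + 4 * logn / n)

# Data generated under the alternative: one local-to-unity root and one mildly
# explosive root with k_n = n**0.3.
rng = np.random.default_rng(4)
n, c, b = 1000, 0.0, 1.0
roots = np.array([1 + c / n, 1 + b / n ** 0.3])
u = rng.standard_normal((n, 2))
X = np.zeros((n + 1, 2))
for t in range(1, n + 1):
    X[t] = roots * X[t - 1] + u[t - 1]
bic_r, bic_u = bic_pair(X)
print(bic_r, bic_u)  # with these settings BIC_u < BIC_r: unrestricted model chosen
```

The restricted fit inflates the (1,1) residual variance by the (3.4) term, and the resulting log-determinant gap dominates the extra penalty 3 log n/n of the unrestricted model.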
4 Conclusion
Model selection by BIC is well known to be blind to local alternatives in general (see Ploberger and Phillips, 2003; and Leeb and Pötscher, 2005). For instance, in the current set up, BIC cannot consistently distinguish between a model with a unit root (ρ_n = 1) and models with roots local to unity (ρ_n = 1 + c/n), just as localizing coefficients such as the parameter c are not consistently estimable. On the other hand, as shown here, BIC and classical tests can successfully distinguish roots in the immediate locality of unity like ρ_n from roots that are in the wider vicinity of unity like θ_n, which opens the door to distinguishing mildly explosive behavior in data. We expect these model selection results to be generalizable to models with weakly dependent innovations, analogous to the findings in Phillips (2008) on unit root discrimination and Cheng and Phillips (2009) on cointegrating rank determination.

Tests of this type will be useful in empirical work where it is of interest to differentiate between the behavioral time series character of financial data such as asset prices and the fundamentals that are believed to determine prices, like dividends and earnings. In such cases, the primary maintained hypothesis is that the series have roots that are local to unity (without being specific about the localizing coefficient) and the alternative is that one or other of the series may be mildly explosive, at least over subperiods of the data (see Phillips, Wu and Yu, 2011; Phillips and Yu, 2011). On the other hand, if the primary maintained hypothesis is that both series may be mildly explosive and the null hypothesis is commonality in the roots, then problems of bias and inconsistency may arise in testing and model selection. Recent work by Nielsen (2009) and Phillips and Magdalinos (2011) provides a limit theory for least squares regression in the case of purely explosive common roots and shows that least squares regression is inconsistent. That work may be extended to the case of common mildly explosive roots and will be explored in later work.
5 Appendix

5.1 Preliminary Lemmas
We start with some lemmas that assist in the asymptotic development. These results rely on existing limit theory so we only sketch the main details here for convenience. We repeatedly use the facts that k_n(θ_n² − 1) = 2b + O(1/k_n) and θ_n^{−n} = exp(−bn/k_n){1 + o(1)} = o(1). The first result is from PM7. See also Phillips and Magdalinos (2008) and Magdalinos and Phillips (2009) for related results on systems with explosive and mildly explosive processes.

Lemma 5.1 (PM7) Define

    X_n(b) = [X_{1n}(b); X_{2n}(b)] := k_n^{−1/2} Σ_{j=1}^n θ_n^{−j} u_j,

    Y_n(b) = [Y_{1n}(b); Y_{2n}(b)] := k_n^{−1/2} Σ_{j=1}^n θ_n^{−(n−j)−1} u_j.

Then, as n → ∞, X_n(b) ⇒ X(b) = (X₁(b), X₂(b))′ ≡ N(0, Σ/2b), and Y_n(b) ⇒ Y(b) = (Y₁(b), Y₂(b))′, where Y(b) =_d X(b), and X(b) and Y(b) are independent.

Lemma 5.2 Define S_n(r) := n^{−1/2} Σ_{j=1}^{⌊nr⌋} u_j and

    X^c_{1n}(r) = X_{1⌊nr⌋}/√n = n^{−1/2} Σ_{j=1}^{⌊nr⌋} ρ_n^{⌊nr⌋−j} u_{1j},

    X_{2n}(b) = X_{2n}/(√k_n θ_n^n) = k_n^{−1/2} Σ_{j=1}^n θ_n^{−j} u_{2j}.

Then, as n → ∞,

(i) S_n(r) = [n^{−1/2}Σ_{j=1}^{⌊nr⌋}u_{1j}; n^{−1/2}Σ_{j=1}^{⌊nr⌋}u_{2j}] ⇒ [B₁(r); B₂(r)] = B(r) ≡ BM(Σ);

(ii) X^c_{1n}(r) ⇒ J_{1c}(r) = ∫₀^r e^{c(r−s)} dB₁(s), and n^{−1} Σ_{t=1}^n X_{1t−1}u_t ⇒ ∫₀¹ J_{1c}(r)dB(r);

(iii) X_{2n}(b) ⇒ X₂(b), where X₂(b) ≡ N(0, Σ_22/2b);

(iv) J_{1c}(r) and X₂(b) are independent;

(v) for all s, r > 0 the following joint convergence applies:

    [ X_{1⌊nr⌋}/√n , X_{2⌊ns⌋}/(√k_n θ_n^{⌊ns⌋}) ] ⇒ [J_{1c}(r), X₂(b)], as n → ∞.
Proof. Result (i) is standard, (ii) is from Phillips (1987b), and (iii) is from Lemma 5.1. To prove (iv), it suffices to show that B₁(r) and X₂(b) are independent, since J_{1c}(r) is a functional of {B₁(s)}_{s ≤ r}. Note that the covariance

    E(S_{1n}(1)X_{2n}(b)) = E[ (n^{−1/2}Σ_{j=1}^n u_{1j})(k_n^{−1/2}Σ_{k=1}^n θ_n^{−k}u_{2k}) ]
    = (Σ_12/√(nk_n)) Σ_{k=1}^n θ_n^{−k}
    = (Σ_12/√(nk_n)) θ_n^{−1}(1 − θ_n^{−n})/(1 − θ_n^{−1})
    = (Σ_12/b)(k_n/n)^{1/2}{1 + o(1)} = o(1),

as n → ∞. Independence of the limit processes J_{1c}(r) and X₂(b) follows. To prove (v), first observe that for any (integer sequence) L_n → ∞ such that L_n/k_n → ∞, we have X_{2L_n}/(√k_n θ_n^{L_n}) ⇒ X₂(b). Note that X_{2n}(b) = X_{2L_n}/(√k_n θ_n^{L_n}) + k_n^{−1/2}Σ_{j=L_n+1}^n θ_n^{−j}u_{2j} and

    E| k_n^{−1/2}Σ_{j=L_n+1}^n θ_n^{−j}u_{2j} |² = (Σ_22/k_n) Σ_{j=L_n+1}^n θ_n^{−2j} = (Σ_22/k_n)(θ_n^{−2L_n−2} − θ_n^{−2n−2})/(1 − θ_n^{−2}) = O(θ_n^{−2L_n}) = o(1),

since θ_n^{−2L_n} = (1 + b/k_n)^{−2L_n} = exp(−2bL_n/k_n) + o(1) = o(1). Hence X_{2L_n}/(√k_n θ_n^{L_n}) ⇒ X₂(b) by Lemma 5.1. Now let L_n = ⌊ns⌋ for any s > 0 and then X_{2⌊ns⌋}/(√k_n θ_n^{⌊ns⌋}) ⇒ X₂(b). Joint convergence and (v) follow from marginal convergence and asymptotic independence of the components. ∎

Lemma 5.3 As n → ∞,

(i) (k_n² θ_n^{2n})^{−1} Σ_{t=1}^n X²_{2t−1} ⇒ X₂(b)²/2b;

(ii) n^{−2} Σ_{t=1}^n X²_{1t−1} ⇒ ∫₀¹ J_{1c}(r)² dr;

(iii) (n k_n θ_n^n)^{−1} Σ_{t=1}^n X_{1t−1}X_{2t−1} = o_p(1).
Proof. (i) follows from PM7 and (ii) is standard (Phillips, 1987a & b). For (iii), it is convenient to take a probability space in which [X_{1⌊nr⌋}/√n, X_{2⌊ns⌋}/(√k_n θ_n^{⌊ns⌋})] →_p [J_{1c}(r), X₂(b)]. Then, for any sequence L_n → ∞ such that L_n/n → 0, we have

    (n k_n θ_n^n)^{−1} Σ_{t=1}^n X_{1t−1}X_{2t−1}
    = (n k_n)^{−1/2} Σ_{t=1}^n θ_n^{t−1−n} (X_{1t−1}/√n)(X_{2t−1}/(√k_n θ_n^{t−1}))
    = (X₂(b)/√(nk_n)) Σ_{t=L_n+1}^n θ_n^{t−1−n} J_{1c}((t−1)/n) {1 + o_p(1)} + O_p(L_n θ_n^{L_n−n}/√(nk_n))
    = (X₂(b)/√(nk_n)) Σ_{t=L_n+1}^n θ_n^{t−1−n} J_{1c}((t−1)/n) {1 + o_p(1)} + o_p(1).

Now Σ_{t=L_n+1}^n θ_n^{t−1−n} J_{1c}((t−1)/n) has zero mean and variance

    E( Σ_t θ_n^{t−1−n} J_{1c}((t−1)/n) )² = Σ_t Σ_s E( J_{1c}((t−1)/n) J_{1c}((s−1)/n) ) θ_n^{t+s−2−2n} ≤ M Σ_t Σ_s θ_n^{t+s−2−2n} ≤ M′ k_n²/b²,

for some finite constants M and M′. It follows that

    Var( (nk_n)^{−1/2} Σ_{t=L_n+1}^n θ_n^{t−1−n} J_{1c}((t−1)/n) ) = O( k_n²/(nk_n) ) = O(k_n/n) = o(1),

leading to (nk_n)^{−1/2} Σ_{t=L_n+1}^n θ_n^{t−1−n} J_{1c}((t−1)/n) = o_p(1), which implies that (n k_n θ_n^n)^{−1} Σ_{t=1}^n X_{1t−1}X_{2t−1} = o_p(1); this also holds in the original probability space, giving the required result. ∎

Lemma 5.4 As n → ∞,

(i) D_n^{−1} X′_{−1}X_{−1} D_n^{−1} ⇒ [ ∫₀¹ J_{1c}(r)² dr  0 ; 0  X₂(b)²/2b ];

(ii) u′X_{−1} D_n^{−1} ⇒ [ ∫₀¹ J_{1c}(r) dB(r) ,  X₂(b)Y(b) ].
Proof. Using Lemma 5.3,

    D_n^{−1} X′_{−1}X_{−1} D_n^{−1} = D_n^{−1}( Σ_{t=1}^n X_{t−1}X_{t−1}′ )D_n^{−1}
    = [ n^{−2}ΣX²_{1t−1} , (nk_nθ_n^n)^{−1}ΣX_{1t−1}X_{2t−1} ; (nk_nθ_n^n)^{−1}ΣX_{2t−1}X_{1t−1} , (k_n²θ_n^{2n})^{−1}ΣX²_{2t−1} ]
    ⇒ [ ∫₀¹ J_{1c}(r)² dr , 0 ; 0 , X₂(b)²/2b ],

giving (i). Result (ii) follows directly from Lemmas 5.2 and 5.3 as

    u′X_{−1}D_n^{−1} = [ n^{−1}Σ_{t=1}^n X_{1t−1}u_t , (k_nθ_n^n)^{−1}Σ_{t=1}^n X_{2t−1}u_t ]
    = [ n^{−1}Σ X_{1t−1}u_t , k_n^{−1/2}Σ (X_{2t−1}/(√k_nθ_n^{t−1})) θ_n^{t−1−n} u_t ]
    ⇒ [ ∫₀¹ J_{1c}(r) dB(r) , X₂(b)Y(b) ].

Joint convergence follows from the independence between B(r) and (X₂(b), Y(b)). ∎
5.2 Proofs of the Main Results
Proof of Theorem 2.1. Using Lemma 5.4, continuous mapping and joint convergence, we have

    (R̂_n − R_n)D_n = u′X_{−1}D_n^{−1} (D_n^{−1}X′_{−1}X_{−1}D_n^{−1})^{−1} ⇒ [ ∫₀¹J_{1c}(r)dB(r)/∫₀¹J_{1c}(r)²dr , Y(b)/(X₂(b)/2b) ].

Since θ_n² − 1 = (2b/k_n)(1 + o(1)), the equivalent result

    (R̂_n − R_n)F_n ⇒ [ ∫₀¹J_{1c}(r)dB(r)/∫₀¹J_{1c}(r)²dr , Y(b)/X₂(b) ]

holds as stated. ∎

Proof of Theorem 2.2. We first prove (2.10) and (2.12) for the statistic W_n. Under the null we have by standard theory

    n(R̂_n − R_n) ⇒ (∫₀¹ dB J_c′)(∫₀¹ J_cJ_c′)^{−1} =: Ξ_c,   Σ̂ = n^{−1}Σ_{t=1}^n û_tû_t′ →_p Σ,   n^{−2}X′_{−1}X_{−1} ⇒ ∫₀¹ J_cJ_c′,    (5.1)

and (2.10) follows directly for W_n and (2.11) for W_n^A. Under the alternative, from Theorem 2.1 with correct centering we have

    a₁′vec{(R̂_n − R_n)F_n} = n(r̂_11 − r_11) − θ_n^n(θ_n² − 1)^{−1}(r̂_22 − r_22) ⇒ a₁′vec(Ξ),

whereas under (2.2) with b > 0 the null-centred linear combination behaves as

    a₁′vec(nR̂_n) = n(r̂_11 − r̂_22) = n(r̂_11 − r_11) − n(r̂_22 − r_22) + n(r_11 − r_22)
    = n(r̂_11 − r_11) − [n(θ_n² − 1)θ_n^{−n}] θ_n^n(θ_n² − 1)^{−1}(r̂_22 − r_22) + c − nb/k_n
    = n(r̂_11 − r_11) + O_p(n/k_n), which diverges as n → ∞,

in view of (2.7)–(2.9) and since n(θ_n² − 1)θ_n^{−n} = (n/k_n) k_n(θ_n² − 1)θ_n^{−n} = O((n/k_n)exp(−bn/k_n)) = o(1). Next, setting d_n = (Σ_{t=1}^n X²_{1t−1})(Σ_{t=1}^n X²_{2t−1}) − (Σ_{t=1}^n X_{1t−1}X_{2t−1})² and using Lemma 5.3, we find that

    d_n = (ΣX²_{1t−1})(ΣX²_{2t−1}){ 1 − [(nk_nθ_n^n)^{−1}ΣX_{1t−1}X_{2t−1}]² / [ (n^{−2}ΣX²_{1t−1})((k_n²θ_n^{2n})^{−1}ΣX²_{2t−1}) ] } = (ΣX²_{1t−1})(ΣX²_{2t−1}){1 − o_p(1)},    (5.2)

and

    d_n/(n²k_n²θ_n^{2n}) = (n^{−2}ΣX²_{1t−1})((k_n²θ_n^{2n})^{−1}ΣX²_{2t−1}){1 − o_p(1)} ⇒ (∫₀¹ J_{1c}(r)²dr)(X₂(b)²/2b).

It follows that

    n²(X′_{−1}X_{−1})^{−1} = (n²/d_n)[ ΣX²_{2t−1} , −ΣX_{1t−1}X_{2t−1} ; −ΣX_{1t−1}X_{2t−1} , ΣX²_{1t−1} ]
    = [ (n^{−2}ΣX²_{1t−1})^{−1} + o_p(1) , o_p(1) ; o_p(1) , o_p(1) ] ⇒ [ (∫₀¹J_{1c}(r)²dr)^{−1} , 0 ; 0 , 0 ].    (5.3)

Since Σ̂ →_p Σ, we have

    n² a₁′{Σ̂ ⊗ (X′_{−1}X_{−1})^{−1}}a₁ = a₁′{ (Σ + o_p(1)) ⊗ [ (n^{−2}ΣX²_{1t−1})^{−1} + o_p(1) , o_p(1) ; o_p(1) , o_p(1) ] }a₁ ⇒ a₁′{ Σ ⊗ [ (∫₀¹J_{1c}(r)²dr)^{−1} , 0 ; 0 , 0 ] }a₁ = Σ_11 (∫₀¹ J_{1c}(r)²dr)^{−1}.

It follows that

    W_n = (a₁′vec(nR̂_n))² / n²a₁′{Σ̂ ⊗ (X′_{−1}X_{−1})^{−1}}a₁ = { n(r̂_11 − r_11) + O_p(n/k_n) }² / { Σ_11(∫₀¹J_{1c}(r)²dr)^{−1} + o_p(1) } = O_p(n²/k_n²),

giving the stated result.
The proof of (2.12) for the statistic W_n^A under the alternative follows the same lines but involves more complex calculations to cope with the different orders of magnitude in the components. First consider the behavior of the centred elements under the alternative. By (2.7)–(2.9) we have

    A′vec{(R̂_n − R_n)F_n} = [ n(r̂_11 − r_11) − θ_n^n(θ_n² − 1)^{−1}(r̂_22 − r_22) ; θ_n^n(θ_n² − 1)^{−1}(r̂_12 − r_12) ; n(r̂_21 − r_21) ] ⇒ A′vec(Ξ).

On the other hand, under (2.2) with b > 0, the null-centred linear combinations behave as follows. First,

    a₁′vec(nR̂_n) = n(r̂_11 − r̂_22) = n(r̂_11 − r_11) + O_p(n/k_n), which diverges as n → ∞,

as for W_n. Second,

    a₂′vec(nR̂_n) = n r̂_12 = [n(θ_n² − 1)θ_n^{−n}] θ_n^n(θ_n² − 1)^{−1} r̂_12 = O_p( n/(k_n exp(bn/k_n)) ) = o_p(1),

and third,

    a₃′vec(nR̂_n) = n r̂_21 ⇒ a₃′vec(Ξ), as n → ∞.

Also, as in (5.3),

    (X′_{−1}X_{−1})^{−1} = [ (ΣX²_{1t−1})^{−1} , −ΣX_{1t−1}X_{2t−1}/((ΣX²_{1t−1})(ΣX²_{2t−1})) ; −ΣX_{1t−1}X_{2t−1}/((ΣX²_{1t−1})(ΣX²_{2t−1})) , (ΣX²_{2t−1})^{−1} ] {1 + o_p(1)}.

We now evaluate each of the components of the matrix A′{Σ̂ ⊗ (X′_{−1}X_{−1})^{−1}}A. Writing M = (X′_{−1}X_{−1})^{−1}, direct calculation using the elements above gives

    a₁′{Σ̂ ⊗ M}a₁ = { Σ̂_11(ΣX²_{1t−1})^{−1} + Σ̂_22(ΣX²_{2t−1})^{−1} + 2Σ̂_12 ΣX_{1t−1}X_{2t−1}/((ΣX²_{1t−1})(ΣX²_{2t−1})) }{1 + o_p(1)},

    a₂′{Σ̂ ⊗ M}a₂ = Σ̂_11(ΣX²_{2t−1})^{−1}{1 + o_p(1)},   a₃′{Σ̂ ⊗ M}a₃ = Σ̂_22(ΣX²_{1t−1})^{−1}{1 + o_p(1)},

with each off-diagonal component involving either (ΣX²_{2t−1})^{−1} or the ratio ΣX_{1t−1}X_{2t−1}/((ΣX²_{1t−1})(ΣX²_{2t−1})), both of smaller order. Set K_n = diag(n, k_nθ_n^n, n) and observe that

    K_n A′{Σ̂ ⊗ M}A K_n = diag( n²Σ̂_11(ΣX²_{1t−1})^{−1} , k_n²θ_n^{2n}Σ̂_11(ΣX²_{2t−1})^{−1} , n²Σ̂_22(ΣX²_{1t−1})^{−1} ){1 + o_p(1)},

since, for example, by Lemma 5.3(iii),

    nk_nθ_n^n ΣX_{1t−1}X_{2t−1}/((ΣX²_{1t−1})(ΣX²_{2t−1})) = [ (nk_nθ_n^n)^{−1}ΣX_{1t−1}X_{2t−1} ] / [ (n^{−2}ΣX²_{1t−1})((k_n²θ_n^{2n})^{−1}ΣX²_{2t−1}) ] = o_p(1),

and n²Σ̂_22(ΣX²_{2t−1})^{−1} = O_p(n²/(k_n²θ_n^{2n})) = o_p(1). We deduce that

    W_n^A = (A′vec(R̂_n))′{A′(Σ̂ ⊗ M)A}^{−1}A′vec(R̂_n) = (K_nA′vec(R̂_n))′{K_nA′(Σ̂ ⊗ M)AK_n}^{−1}K_nA′vec(R̂_n),    (5.4)

with, from Theorem 2.1 and (2.7)–(2.9),

    K_nA′vec(R̂_n) = [ n(r̂_11 − r̂_22) ; k_nθ_n^n r̂_12 ; n r̂_21 ] = [ −nb/k_n + O_p(1) ; O_p(1) ; O_p(1) ].    (5.5)

It now follows from (5.4) and (5.5) that the quadratic form is dominated by its first component, so that

    W_n^A = (nb/k_n)² [ Σ_11(∫₀¹J_{1c}(r)²dr)^{−1} ]^{−1}{1 + o_p(1)} = (nb/k_n)² Σ_11^{−1} ∫₀¹ J_{1c}(r)² dr {1 + o_p(1)},

giving the stated result.
Proof of Lemma 3.1. Part (i) follows by standard methods in view of Lemmas 5.2–5.4. Also ũ_t = X_t − r̂_nX_{t−1} = u_t − (r̂_n − r_n)X_{t−1}, and so we have

    Σ̃ = n^{−1}Σ_{t=1}^n ũ_tũ_t′ = n^{−1}Σ u_tu_t′ − (r̂_n − r_n) n^{−1}Σ (X_{t−1}u_t′ + u_tX_{t−1}′) + (r̂_n − r_n)² n^{−1}Σ X_{t−1}X_{t−1}′
    = n^{−1}Σ_{t=1}^n u_tu_t′ + O_p(n^{−1}),    (5.6)

as stated. For part (ii), to obtain the limit distribution under the alternative, write r̂_n as

    r̂_n = ( Σ X_{1t−1}X_{1t} + Σ X_{2t−1}X_{2t} ) / ( Σ X²_{1t−1} + Σ X²_{2t−1} )
    = [ ρ_n Σ X²_{1t−1} + θ_n Σ X²_{2t−1} + Σ u_{1t}X_{1t−1} + Σ u_{2t}X_{2t−1} ] / ( Σ X²_{1t−1} + Σ X²_{2t−1} ).

Then, using Lemma 5.3,

    (θ_n^n/k_n)(r̂_n − θ_n) = [ (k_nθ_n^n)^{−1}( (ρ_n − θ_n)ΣX²_{1t−1} + Σu_{1t}X_{1t−1} + Σu_{2t}X_{2t−1} ) ] / [ (k_n²θ_n^{2n})^{−1} ΣX²_{2t−1} ] {1 + o_p(1)}
    = [ (k_nθ_n^n)^{−1} Σ u_{2t}X_{2t−1} ] / [ (k_n²θ_n^{2n})^{−1} Σ X²_{2t−1} ] {1 + o_p(1)},

and in view of Lemmas 5.1–5.4,

    (θ_n^n/k_n)(r̂_n − θ_n) ⇒ X₂(b)Y₂(b) / (X₂(b)²/2b) = 2b Y₂(b)/X₂(b),

giving the stated result (3.3). To prove (3.4), first note that

    r̂_n − ρ_n = (r̂_n − θ_n) + (θ_n − ρ_n) = b/k_n − c/n + O_p(k_n/θ_n^n).

The restricted regression residuals are

    ũ_t = X_t − r̂_nX_{t−1} = u_t − (r̂_nI − R_n)X_{t−1} = u_t − (r̂_n − ρ_n)[X_{1t−1}; 0] − (r̂_n − θ_n)[0; X_{2t−1}]
    = u_t − (b/k_n)[X_{1t−1}; 0]{1 + o_p(1)} − (r̂_n − θ_n)[0; X_{2t−1}],

where the final term contributes only O_p(n^{−1}) terms to Σ̃ since r̂_n − θ_n = O_p(k_n/θ_n^n). Let Σ̄_n = n^{−1}Σ_{t=1}^n u_tu_t′; then Σ̄_n →_p Σ and

    Σ̃ = n^{−1}Σ_t { u_t − (b/k_n)[X_{1t−1}; 0] }{ u_t − (b/k_n)[X_{1t−1}; 0] }′ {1 + o_p(1)}
    = Σ̄_n − (b/k_n) n^{−1}Σ_t ( [X_{1t−1}; 0]u_t′ + u_t[X_{1t−1}; 0]′ ){1 + o_p(1)} + (b²/k_n²) n^{−1}Σ_t [ X²_{1t−1} 0 ; 0 0 ]{1 + o_p(1)}
    = Σ̄_n + (b²n/k_n²)[ n^{−2}Σ_t X²_{1t−1} , 0 ; 0 , 0 ]{1 + o_p(1)},

since n^{−1}Σ X_{1t−1}u_t = O_p(1) by Lemma 5.2(ii) and k_n/n → 0. ∎
References
Chan, N. H. and C. Z. Wei (1987). "Asymptotic Inference for Nearly Nonstationary AR(1) Processes," Annals of Statistics, 15, 1050–1063.

Cheng, X. and P. C. B. Phillips (2009). "Semiparametric cointegrating rank selection," The Econometrics Journal, 12, S83–S104.

Hall, P. and C. C. Heyde (1980). Martingale Limit Theory and its Application. Academic Press.

Lai, T. L. and C. Z. Wei (1982). "Least Squares Estimates in Stochastic Regression Models with Applications to Identification and Control of Dynamic Systems," Annals of Statistics, 10, 154–166.

Leeb, H. and B. Pötscher (2005). "Model selection and inference: facts and fiction," Econometric Theory, 21, 21–59.

Magdalinos, T. and P. C. B. Phillips (2009). "Limit Theory for Cointegrated Systems with Moderately Integrated and Moderately Explosive Regressors," Econometric Theory, 25, 482–526.

Nielsen, B. (2009). "Singular vector autoregressions with deterministic terms: Strong consistency and lag order determination," University of Oxford working paper.

Pagan, A. and A. Ullah (1999). Nonparametric Econometrics. Cambridge University Press.

Ploberger, W. and P. C. B. Phillips (2003). "Empirical limits for time series econometric models," Econometrica, 71, 627–673.

Phillips, P. C. B. (1987a). "Time Series Regression with a Unit Root," Econometrica, 55, 277–302.

Phillips, P. C. B. (1987b). "Towards a Unified Asymptotic Theory for Autoregression," Biometrika, 74, 535–547.

Phillips, P. C. B. (2008). "Unit root model selection," Journal of the Japan Statistical Society, 38, 65–74.

Phillips, P. C. B. and T. Magdalinos (2007a). "Limit Theory for Moderate Deviations from a Unit Root," Journal of Econometrics, 136, 115–130.

Phillips, P. C. B. and T. Magdalinos (2007b). "Limit Theory for Moderate Deviations from a Unit Root Under Weak Dependence," in G. D. A. Phillips and E. Tzavalis (Eds.), The Refinement of Econometric Estimation and Test Procedures: Finite Sample and Asymptotic Analysis. Cambridge: Cambridge University Press, 123–162.

Phillips, P. C. B. and T. Magdalinos (2008). "Limit theory for explosively cointegrated systems," Econometric Theory, 24, 865–887.

Phillips, P. C. B. and T. Magdalinos (2011). "Inconsistent VAR Regression with Common Explosive Roots," Working Paper, Yale University.

Phillips, P. C. B., Y. Wu and J. Yu (2011). "Explosive behavior in the 1990s Nasdaq: When did exuberance escalate asset values?," International Economic Review, 52, 201–226.

Phillips, P. C. B. and J. Yu (2011). "Dating the Timeline of Financial Bubbles during the Subprime Crisis," Quantitative Economics, 2, 455–491.

Ullah, A. (2004). Finite Sample Econometrics. Oxford: Oxford University Press.