Author's personal copy Journal of Multivariate Analysis 116 (2013) 422–439


Least squares estimators for discretely observed stochastic processes driven by small Lévy noises

Hongwei Long^a,*, Yasutaka Shimizu^b, Wei Sun^c

^a Department of Mathematical Sciences, Florida Atlantic University, Boca Raton, FL 33431-0991, USA
^b Graduate School of Engineering Science, Osaka University, Toyonaka, Osaka 560-8531, Japan
^c Department of Mathematics and Statistics, Concordia University, Montreal, Quebec H3G 1M8, Canada

* Corresponding author. E-mail address: [email protected] (H. Long).

Article history: Received 21 May 2012. Available online 25 January 2013.

Abstract

We study the problem of parameter estimation for discretely observed stochastic processes driven by additive small Lévy noises. We do not impose any moment condition on the driving Lévy process. Under certain regularity conditions on the drift function, we obtain consistency and the rate of convergence of the least squares estimator (LSE) of the drift parameter as a small dispersion coefficient ε → 0 and n → ∞ simultaneously. The asymptotic distribution of the LSE in our general setting is shown to be the convolution of a normal distribution and a distribution related to the jump part of the Lévy process. Moreover, we briefly remark that our methodology extends easily to the more general case of semi-martingale noises. © 2013 Elsevier Inc. All rights reserved.

AMS 2010 subject classifications: primary 62F12, 62M05; secondary 60G52, 60J75

Keywords: Asymptotic distribution of LSE; Consistency of LSE; Discrete observations; Least squares method; Stochastic processes; Parameter estimation; Small Lévy noises

1. Introduction

Let (Ω, F, P) be a basic probability space equipped with a right-continuous and increasing family of σ-algebras (F_t, t ≥ 0). Let (L_t, t ≥ 0) be an ℝ^d-valued Lévy process, given by

L_t = at + σB_t + ∫_0^t ∫_{|z|≤1} z Ñ(ds, dz) + ∫_0^t ∫_{|z|>1} z N(ds, dz),  (1.1)

where a = (a_1, . . . , a_d) ∈ ℝ^d, σ = (σ_{ij})_{d×r} is a d × r real-valued matrix, B_t = (B_t^1, . . . , B_t^r) is an r-dimensional standard Brownian motion, N(ds, dz) is an independent Poisson random measure on ℝ_+ × (ℝ^d \ {0}) with characteristic measure dt ν(dz), and Ñ(ds, dz) = N(ds, dz) − ν(dz)ds is a martingale measure. Here we assume that ν(dz) is a Lévy measure on ℝ^d \ {0} satisfying ∫_{ℝ^d\{0}} (|z|² ∧ 1)ν(dz) < ∞, with |z| = (Σ_{i=1}^d z_i²)^{1/2}. The stochastic process X = (X_t, t ≥ 0), starting from x_0 ∈ ℝ^d, is defined as the unique strong solution to the following stochastic differential equation (SDE)

dX_t = b(X_t, θ)dt + ε dL_t,  t ∈ [0, 1];  X_0 = x_0,  (1.2)

where θ ∈ Θ = Θ̄_0 (the closure of Θ_0), with Θ_0 being an open bounded convex subset of ℝ^p, and b = (b_1, . . . , b_d) : ℝ^d × Θ → ℝ^d is a known function. Without loss of generality, we assume that ε ∈ (0, 1]. The regularity conditions on b will

be provided in Section 2. Assume that this process is observed at regularly spaced time points {t_k = k/n, k = 1, 2, . . . , n}. The only unknown quantity in SDE (1.2) is the parameter θ. Let θ_0 ∈ Θ_0 be the true value of the parameter θ. The purpose of this paper is to study the least squares estimator of the true value θ_0 based on the sampling data (X_{t_k})_{k=1}^n with small dispersion ε and large sample size n.

In the case of diffusion processes driven by Brownian motion, a popular method is the maximum likelihood estimator (MLE) based on the Girsanov density when the processes can be observed continuously (see Prakasa Rao [31], Liptser and Shiryaev [19], Kutoyants [16], and Bishwal [2]). When a diffusion process is observed only at discrete times, in most cases the transition density, and hence the likelihood function of the observations, is not explicitly computable. In order to overcome this difficulty, some approximate likelihood methods have been proposed by Lo [20], Pedersen [27,28], Poulsen [29], and Aït-Sahalia [1]. For a comprehensive review of the MLE and other related methods, we refer to Sørensen [37]. The least squares estimator (LSE) is asymptotically equivalent to the MLE. For the LSE, convergence in probability was proved by Dorogovcev [5] and Le Breton [18], strong consistency was studied by Kasonga [12], and the asymptotic distribution was studied by Prakasa Rao [30]. For a more recent comprehensive discussion, we refer to Prakasa Rao [31], Kutoyants [16], Bishwal [2] and the references therein.

The parametric estimation problems for diffusion processes with jumps based on discrete observations have been studied by Shimizu and Yoshida [35] and Shimizu [33] via the quasi-maximum likelihood. They established consistency and asymptotic normality for the proposed estimators.
Moreover, Ogihara and Yoshida [26] obtained results stronger than those of Shimizu and Yoshida [35], and also investigated an adaptive Bayes-type estimator and its asymptotic properties. The driving jump processes considered in Shimizu and Yoshida [35], Shimizu [33] and Ogihara and Yoshida [26] include a large class of Lévy processes, such as compound Poisson processes, gamma, inverse Gaussian, variance gamma, normal inverse Gaussian and some generalized tempered stable processes. Masuda [24] dealt with the consistency and asymptotic normality of the TFE (trajectory-fitting estimator) and LSE when the driving process is a zero-mean adapted process (including Lévy processes) with finite moments. The parametric estimation for Lévy-driven Ornstein–Uhlenbeck processes was also studied by Brockwell et al. [3], Spiliopoulos [39], and Valdivieso et al. [46]. However, the aforementioned papers were unable to cover an important class of driving Lévy processes, namely α-stable Lévy motions with α ∈ (0, 2). Recently, Hu and Long [9,10] started the study of parameter estimation for Ornstein–Uhlenbeck processes driven by α-stable Lévy motions. They obtained some new asymptotic results on the proposed TFE and LSE under continuous or discrete observations, which differ from the classical cases where asymptotic distributions are normal. Fasen [6] extended the results of Hu and Long [10] to multivariate Ornstein–Uhlenbeck processes driven by α-stable Lévy motions. Masuda [25] proposed a self-weighted least absolute deviation estimator for discretely observed ergodic Ornstein–Uhlenbeck processes driven by symmetric Lévy processes.

The asymptotic theory of parametric estimation for diffusion processes with small white noise based on continuous-time observations has been well developed (see, e.g., Kutoyants [14,15], Yoshida [48,50], Uchida and Yoshida [44]).
There have been many applications of small noise asymptotics to mathematical finance; see, for example, Yoshida [49], Takahashi [40], Kunitomo and Takahashi [13], Takahashi and Yoshida [41], and Uchida and Yoshida [45]. From a practical point of view in parametric inference, it is more realistic and interesting to consider asymptotic estimation for diffusion processes with small noise based on discrete observations, and substantial progress has been made in this direction. Genon-Catalot [7] and Laredo [17] studied the efficient estimation of drift parameters of small diffusions from discrete observations when ε → 0 and n → ∞. Sørensen [36] used martingale estimating functions to establish consistency and asymptotic normality of the estimators of drift and diffusion coefficient parameters when ε → 0 and n is fixed. Sørensen and Uchida [38] and Gloter and Sørensen [8] used a contrast function to study the efficient estimation of unknown parameters in both drift and diffusion coefficient functions. Uchida [42,43] used the martingale estimating function approach to study the estimation of drift parameters for small diffusions under weaker conditions. Thus, in the cases of small diffusions, the asymptotic distributions of the estimators are normal under suitable conditions on ε and n.

Long [21] studied the parameter estimation problem for discretely observed one-dimensional Ornstein–Uhlenbeck processes with small Lévy noises. In that paper, the drift function is linear in both x and θ (b(x, θ) = −θx), and the driving Lévy process is L_t = aB_t + bZ_t, where a and b are known constants, (B_t, t ≥ 0) is a standard Brownian motion, and Z_t is an α-stable Lévy motion independent of (B_t, t ≥ 0). The consistency and rate of convergence of the least squares estimator are established there, and the asymptotic distribution of the LSE is shown to be the convolution of a normal distribution and a stable distribution.
In a similar framework, Long [22] discussed the statistical estimation of the drift parameter for a class of SDEs with the special drift function b(x, θ) = θb(x). Ma [23] extended the results of Long [21] to the case where the driving noise is a general Lévy process. However, all the drift functions discussed in Long [21,22] and Ma [23] are linear in θ, which restricts the applicability of their models and results.

In this paper, we allow the drift function b(x, θ) to be nonlinear in both x and θ, and the driving noise to be a general Lévy process. We are interested in estimating the drift parameter in SDE (1.2) based on discrete observations {X_{t_i}}_{i=1}^n as ε → 0 and n → ∞. We shall use the least squares method to obtain an asymptotically consistent estimator. Consider the following contrast function

Ψ_{n,ε}(θ) = Σ_{k=1}^n |X_{t_k} − X_{t_{k−1}} − b(X_{t_{k−1}}, θ)·Δt_{k−1}|² / (ε²Δt_{k−1}),


where Δt_{k−1} = t_k − t_{k−1} = 1/n. Then the LSE θ̂_{n,ε} is defined as

θ̂_{n,ε} := arg min_{θ∈Θ} Ψ_{n,ε}(θ).

Since minimizing Ψ_{n,ε}(θ) is equivalent to minimizing

Φ_{n,ε}(θ) := ε²(Ψ_{n,ε}(θ) − Ψ_{n,ε}(θ_0)),

we may write the LSE as

θ̂_{n,ε} = arg min_{θ∈Θ} Φ_{n,ε}(θ).
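To give a concrete feel for the contrast function, here is a minimal numerical sketch (our own illustration, not part of the paper): it simulates (1.2) by an Euler scheme with a Brownian driver and the scalar drift b(x, θ) = −θx, then minimizes Ψ_{n,ε} over a grid of θ values. All parameter values are illustrative assumptions.

```python
import numpy as np

# Euler simulation of dX_t = -theta * X_t dt + eps dL_t on [0, 1], with L a standard
# Brownian motion, followed by grid minimization of the contrast Psi_{n,eps}.
rng = np.random.default_rng(3)
theta_true, x0, eps, n = 1.5, 1.0, 0.02, 1000
dt = 1.0 / n

X = np.empty(n + 1)
X[0] = x0
for k in range(n):
    X[k + 1] = X[k] - theta_true * X[k] * dt + eps * np.sqrt(dt) * rng.standard_normal()

def psi(theta):
    # Psi_{n,eps}(theta) = sum_k |X_{t_k} - X_{t_{k-1}} - b(X_{t_{k-1}}, theta) dt|^2 / (eps^2 dt)
    resid = np.diff(X) + theta * X[:-1] * dt
    return np.sum(resid**2) / (eps**2 * dt)

grid = np.linspace(0.0, 3.0, 3001)
theta_hat = grid[np.argmin([psi(t) for t in grid])]
print(theta_hat)  # close to theta_true for small eps and large n
```

For this linear drift the minimizer is of course available in closed form; the grid search is only meant to mirror the generic definition of θ̂_{n,ε} as an arg min over Θ.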

We shall use this fact later for convenience of the proofs. In the nonlinear case, it is generally very difficult or impossible to obtain an explicit formula for the least squares estimator θˆn,ε . However, we can use some nice criteria in statistical inference (see Chapter 5 of van der Vaart [47] and Shimizu [34] for a more general criterion) to establish the consistency of the LSE as well as its asymptotic behaviors (asymptotic distribution and rate of convergence). In this paper, we consider the asymptotics of the LSE θˆn,ε with high frequency (n → ∞) and small

dispersion (ε → 0). Our goal is to prove that θ̂_{n,ε} → θ_0 in probability and to establish its rate of convergence and asymptotic distributions. We obtain some new asymptotic distributions for the LSE in our general setting, which are convolutions of a normal distribution and a distribution related to the jump part of the driving Lévy process. Some similar but more general results are also established when the driving Lévy process is replaced by a general semi-martingale.

The paper is organized as follows. In Section 2, we state our main results with some remarks and examples. We establish the consistency of the LSE θ̂_{n,ε} and give its asymptotic distribution, which is a natural extension of the classical small-diffusion cases. All the proofs are given in Section 3. In Section 4, we discuss the extension of the main results of Section 2 to the general case when the driving noise is a semi-martingale. Some simulation studies are provided in Section 5.

2. Main results

2.1. Notation and assumptions

Let X^0 = (X_t^0, t ≥ 0) be the solution to the underlying ordinary differential equation (ODE) under the true value of the drift parameter:

dX_t^0 = b(X_t^0, θ_0)dt,  X_0^0 = x_0.

For a multi-index m = (m_1, . . . , m_k), we define a derivative operator in z ∈ ℝ^k as ∂_z^m := ∂_{z_1}^{m_1} · · · ∂_{z_k}^{m_k}, where ∂_{z_i}^{m_i} := ∂^{m_i}/∂z_i^{m_i}. Let C^{k,l}(ℝ^d × Θ; ℝ^q) be the space of all functions f : ℝ^d × Θ → ℝ^q which are k and l times continuously differentiable with respect to x and θ, respectively. Moreover, C_↑^{k,l}(ℝ^d × Θ; ℝ^q) is the class of f ∈ C^{k,l}(ℝ^d × Θ; ℝ^q) satisfying sup_{θ∈Θ} |∂_θ^α ∂_x^β f(x, θ)| ≤ C(1 + |x|)^λ for universal positive constants C and λ, where α = (α_1, . . . , α_p) and β = (β_1, . . . , β_d) are multi-indices with 0 ≤ Σ_{i=1}^p α_i ≤ l and 0 ≤ Σ_{i=1}^d β_i ≤ k, respectively. We introduce the following set of assumptions.

(A1) There exists a constant K > 0 such that

|b(x, θ) − b(y, θ)| ≤ K|x − y|,  |b(x, θ)| ≤ K(1 + |x|)

for each x, y ∈ ℝ^d and θ ∈ Θ.

(A2) b(·, ·) ∈ C_↑^{2,3}(ℝ^d × Θ; ℝ^d).

(A3) θ ≠ θ_0 ⇔ b(X_t^0, θ) ≠ b(X_t^0, θ_0) for at least one value of t ∈ [0, 1].

(A4) I(θ_0) = (I^{ij}(θ_0))_{1≤i,j≤p} is positive definite, where

I^{ij}(θ) = ∫_0^1 (∂_{θ_i} b)^T(X_s^0, θ) ∂_{θ_j} b(X_s^0, θ)ds.

It is well known that SDE (1.2) has a unique strong solution under (A1). For convenience, we shall use C to denote a generic constant whose value may vary from place to place. For a matrix A, we define |A|² = tr(AA^T), where A^T is the transpose of A. In particular, |σ|² = Σ_{i=1}^d Σ_{j=1}^r σ_{ij}².

2.2. Asymptotic behavior of the LSE

The consistency of our estimator θ̂_{n,ε} is given as follows.

Theorem 2.1. Under conditions (A1)–(A3), we have

θ̂_{n,ε} → θ_0 in P_{θ_0}-probability

as ε → 0 and n → ∞.


The next theorem gives the asymptotic distribution of θ̂_{n,ε}. As is easily seen, our result includes the case of Sørensen and Uchida [38] as a special case.

Theorem 2.2. Under conditions (A1)–(A4), we have

ε^{−1}(θ̂_{n,ε} − θ_0) → I^{−1}(θ_0)S(θ_0) in P_{θ_0}-probability,  (2.1)

as ε → 0, n → ∞ and nε → ∞, where

S(θ_0) := (∫_0^1 (∂_{θ_1} b)^T(X_s^0, θ_0)dL_s, . . . , ∫_0^1 (∂_{θ_p} b)^T(X_s^0, θ_0)dL_s)^T.

Remark 2.3. One of our main contributions is that we no longer require any high-order moment condition on X as in, e.g., Sørensen and Uchida [38] and others, which makes our results applicable to many practical models.

Remark 2.4. In general, the limiting distribution on the right-hand side of (2.1) is a convolution of a normal distribution and a distribution related to the jump part of the Lévy process. In particular, if the driving Lévy process L is a linear combination of standard Brownian motion and α-stable motion, the limiting distribution becomes the convolution of a normal distribution and a stable distribution.

Remark 2.5. When d = 1 and b(x, θ) = −θx, i.e., SDE (1.2) is linear and driven by a general Lévy process, Theorem 2.2 reduces to Theorem 1.1 of Ma [23]. When the driving Lévy process is a linear combination of standard Brownian motion and α-stable motion, Theorem 2.2 was discussed in Long [21] and Ma [23].

Example 2.6. We consider a one-dimensional stochastic process in (1.2) with drift function b(x, θ) = θ_1 + θ_2 x. We assume that the true value θ_0 = (θ_{10}, θ_{20}) of θ = (θ_1, θ_2) belongs to Θ_0 = (c_1, c_2) × (c_3, c_4) ⊂ ℝ² with c_1 < c_2 and c_3 < c_4. Then X^0 satisfies the following ODE

dX_t^0 = (θ_{10} + θ_{20} X_t^0)dt,  X_0^0 = x_0.  (2.2)

The explicit solution is given by X_t^0 = e^{θ_{20} t} x_0 + θ_{10}(e^{θ_{20} t} − 1)/θ_{20} when θ_{20} ≠ 0, and X_t^0 = x_0 + θ_{10} t when θ_{20} = 0. The LSE θ̂_{n,ε} = (θ̂_{n,ε,1}, θ̂_{n,ε,2})^T of θ_0 is given by

θ̂_{n,ε,1} = (X_1 − X_0) − θ̂_{n,ε,2} · (1/n) Σ_{k=1}^n X_{t_{k−1}},

θ̂_{n,ε,2} = [Σ_{k=1}^n (X_{t_k} − X_{t_{k−1}}) X_{t_{k−1}} − (X_1 − X_0) · (1/n) Σ_{k=1}^n X_{t_{k−1}}] / [(1/n) Σ_{k=1}^n X_{t_{k−1}}² − ((1/n) Σ_{k=1}^n X_{t_{k−1}})²].

Note that ∂_{θ_1} b(x, θ) = 1 and ∂_{θ_2} b(x, θ) = x. In this case, the limiting random vector in Theorem 2.2 is I^{−1}(θ_0)(∫_0^1 dL_s, ∫_0^1 X_s^0 dL_s)^T, where

I(θ_0) = ( ∫_0^1 ds        ∫_0^1 X_s^0 ds
           ∫_0^1 X_s^0 ds   ∫_0^1 (X_s^0)² ds ).
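As a hedged numerical check of the closed-form LSE above (our illustration, using a pure Brownian driver as a special case of the Lévy noise; all parameter values are assumptions, not from the paper):

```python
import numpy as np

# Example 2.6: b(x, theta) = theta1 + theta2 * x. Simulate (1.2) by an Euler scheme
# with a Brownian driver and evaluate the closed-form LSE.
rng = np.random.default_rng(0)
theta10, theta20 = 1.0, -2.0    # assumed true (theta_1, theta_2)
x0, eps, n = 1.0, 0.01, 2000
dt = 1.0 / n

X = np.empty(n + 1)
X[0] = x0
for k in range(n):
    X[k + 1] = X[k] + (theta10 + theta20 * X[k]) * dt + eps * np.sqrt(dt) * rng.standard_normal()

Xprev = X[:-1]                  # X_{t_{k-1}}, k = 1, ..., n
dX = np.diff(X)                 # X_{t_k} - X_{t_{k-1}}
mX = Xprev.mean()
theta2_hat = (np.sum(dX * Xprev) - (X[-1] - X[0]) * mX) / (np.mean(Xprev**2) - mX**2)
theta1_hat = (X[-1] - X[0]) - theta2_hat * mX
print(theta1_hat, theta2_hat)   # approximately (theta10, theta20)
```

The two lines computing `theta2_hat` and `theta1_hat` are direct transcriptions of the displayed formulas, with Σ X_{t_{k−1}}/n written as `mX`.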



Example 2.7. We consider a one-dimensional stochastic process in (1.2) with drift function b(x, θ) = √(θ + x²). We assume that the true value θ_0 of θ belongs to Θ_0 = (c_1, c_2) ⊂ ℝ with 0 < c_1 < c_2 < ∞. Then X^0 satisfies the following ODE

dX_t^0 = √(θ_0 + (X_t^0)²) dt,  X_0^0 = x_0.

The explicit solution is given by

X_t^0 = [(x_0 + √(θ_0 + x_0²))² e^{2t} − θ_0] / [2(x_0 + √(θ_0 + x_0²)) e^t].

It is easy to verify that the LSE θ̂_{n,ε} of θ is a solution to the following nonlinear equation

Σ_{k=1}^n (X_{t_k} − X_{t_{k−1}}) / √(θ + X_{t_{k−1}}²) = 1.

Since it is impossible to get an explicit expression for θ̂_{n,ε}, we solve the above equation numerically (e.g. by Newton's method). Note that ∂_θ b(x, θ) = 1/(2√(θ + x²)). It is clear that the limiting random variable in Theorem 2.2 is

I^{−1}(θ_0) ∫_0^1 dL_s / (2√(θ_0 + (X_s^0)²)),  where  I(θ_0) = ∫_0^1 ds / (4(θ_0 + (X_s^0)²)).

In particular, assume that L_t = aB_t + σZ_t, where B_t is a standard Brownian motion and Z_t is a standard α-stable Lévy motion independent of B_t. Let us denote by N a random variable with the standard normal distribution and by U a random variable with the standard α-stable distribution S_α(1, β, 0), where α ∈ (0, 2) is the index of stability and β ∈ [−1, 1] is the skewness parameter. By self-similarity and a time change, we can easily show that the limiting random variable in Theorem 2.2 has distribution

a I^{−1/2}(θ_0) N + σ I^{−1}(θ_0) (∫_0^1 (1/(2√(θ_0 + (X_s^0)²)))^α ds)^{1/α} U.
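The Newton iteration mentioned above can be sketched as follows (our illustration; the Brownian driver, all numerical values, and the initial guess are assumptions; the simulated drift is b(x, θ) = √(θ + x²)):

```python
import numpy as np

# Example 2.7: solve sum_k (X_{t_k} - X_{t_{k-1}}) / sqrt(theta + X_{t_{k-1}}^2) = 1
# for theta by Newton's method, on data simulated with an Euler scheme.
rng = np.random.default_rng(1)
theta0, x0, eps, n = 2.0, 1.0, 0.01, 2000   # assumed true value and settings
dt = 1.0 / n

X = np.empty(n + 1)
X[0] = x0
for k in range(n):
    X[k + 1] = X[k] + np.sqrt(theta0 + X[k]**2) * dt + eps * np.sqrt(dt) * rng.standard_normal()

Xprev, dX = X[:-1], np.diff(X)

def g(theta):
    # LSE estimating equation: g(theta) = 0 at the LSE
    return np.sum(dX / np.sqrt(theta + Xprev**2)) - 1.0

def g_prime(theta):
    return -0.5 * np.sum(dX / (theta + Xprev**2)**1.5)

theta = 1.0   # initial guess
for _ in range(50):
    step = g(theta) / g_prime(theta)
    theta -= step
    if abs(step) < 1e-10:
        break
print(theta)  # approximately theta0 for small eps and large n
```

Since the increments are predominantly positive here, g is decreasing in θ and the iteration is well behaved; a safeguarded root finder would be prudent in less benign settings.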

Example 2.8. We consider a two-dimensional stochastic process in (1.2) with drift function b(x, θ) = C + Ax, where C = (c_1, c_2)^T, A = (A_{ij})_{1≤i,j≤2} and x = (x_1, x_2)^T. We assume that the eigenvalues of A have positive real parts. We want to estimate θ = (θ_1, . . . , θ_6)^T = (c_1, A_{11}, A_{12}, c_2, A_{21}, A_{22})^T ∈ Θ ⊂ ℝ⁶, whose true value is θ_0 = (c_1^0, A_{11}^0, A_{12}^0, c_2^0, A_{21}^0, A_{22}^0)^T. Then X_t^0 satisfies the following ODE

dX_t^0 = (C_0 + A_0 X_t^0)dt,  X_0^0 = x_0.

The explicit solution is given by X_t^0 = e^{A_0 t} x_0 + ∫_0^t e^{A_0(t−s)} C_0 ds. After some basic calculation, we find that the LSE θ̂_{n,ε} = (θ̂_{n,ε,i})_{1≤i≤6} is given by

(θ̂_{n,ε,1}, θ̂_{n,ε,2}, θ̂_{n,ε,3})^T = Λ_n^{−1} (n Σ_{k=1}^n Y_k^{(1)}, n Σ_{k=1}^n Y_k^{(1)} X_{t_{k−1}}^{(1)}, n Σ_{k=1}^n Y_k^{(1)} X_{t_{k−1}}^{(2)})^T

and

(θ̂_{n,ε,4}, θ̂_{n,ε,5}, θ̂_{n,ε,6})^T = Λ_n^{−1} (n Σ_{k=1}^n Y_k^{(2)}, n Σ_{k=1}^n Y_k^{(2)} X_{t_{k−1}}^{(1)}, n Σ_{k=1}^n Y_k^{(2)} X_{t_{k−1}}^{(2)})^T,

where X_{t_{k−1}}^{(i)} (i = 1, 2) are the components of X_{t_{k−1}}, Y_k^{(i)} (i = 1, 2) are the components of Y_k = X_{t_k} − X_{t_{k−1}}, and

Λ_n = ( n                          Σ_{k=1}^n X_{t_{k−1}}^{(1)}              Σ_{k=1}^n X_{t_{k−1}}^{(2)}
        Σ_{k=1}^n X_{t_{k−1}}^{(1)}   Σ_{k=1}^n (X_{t_{k−1}}^{(1)})²           Σ_{k=1}^n X_{t_{k−1}}^{(1)} X_{t_{k−1}}^{(2)}
        Σ_{k=1}^n X_{t_{k−1}}^{(2)}   Σ_{k=1}^n X_{t_{k−1}}^{(1)} X_{t_{k−1}}^{(2)}   Σ_{k=1}^n (X_{t_{k−1}}^{(2)})² ).
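The closed-form LSE of this example amounts to solving two 3 × 3 linear systems. A minimal sketch (our illustration; the matrices C_0, A_0 — chosen with eigenvalues 1 ± i, hence positive real parts — and the other numerical values are assumptions):

```python
import numpy as np

# Example 2.8: b(x, theta) = C + A x in dimension 2. The LSE solves two 3x3 linear
# systems Lambda_n * theta_row = n * (sums of increments times regressors).
rng = np.random.default_rng(2)
C0 = np.array([1.0, 0.0])
A0 = np.array([[1.0, -1.0], [1.0, 1.0]])    # eigenvalues 1 +/- i
x0 = np.array([1.0, 0.0])
eps, n = 0.001, 4000
dt = 1.0 / n

X = np.empty((n + 1, 2))
X[0] = x0
for k in range(n):
    X[k + 1] = X[k] + (C0 + A0 @ X[k]) * dt + eps * np.sqrt(dt) * rng.standard_normal(2)

Xp = X[:-1]                    # X_{t_{k-1}}
Y = np.diff(X, axis=0)         # Y_k = X_{t_k} - X_{t_{k-1}}
Lam = np.array([
    [float(n),         Xp[:, 0].sum(),              Xp[:, 1].sum()],
    [Xp[:, 0].sum(),  (Xp[:, 0]**2).sum(),          (Xp[:, 0] * Xp[:, 1]).sum()],
    [Xp[:, 1].sum(),  (Xp[:, 0] * Xp[:, 1]).sum(),  (Xp[:, 1]**2).sum()],
])

theta_hat = np.empty(6)
for i in range(2):             # row i of (C, A): (c_{i+1}, A_{i+1,1}, A_{i+1,2})
    rhs = n * np.array([Y[:, i].sum(),
                        (Y[:, i] * Xp[:, 0]).sum(),
                        (Y[:, i] * Xp[:, 1]).sum()])
    theta_hat[3 * i:3 * i + 3] = np.linalg.solve(Lam, rhs)
print(theta_hat)   # approximately (c1, A11, A12, c2, A21, A22) = (1, 1, -1, 0, 1, 1)
```

Note that the two systems share the same matrix Λ_n, so a single factorization of Λ_n could be reused for both right-hand sides.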

Since it is easy and straightforward to compute the partial derivatives ∂_{θ_i} b(x, θ), 1 ≤ i ≤ 6, and the limiting random vector in Theorem 2.2, we omit the details here.

3. Proofs

3.1. Proof of Theorem 2.1

We first establish some preliminary lemmas. In the sequel, we shall use the notation

Y_t^{n,ε} := X_{[nt]/n}

for the stochastic process X defined by (1.2), where [nt] denotes the integer part of nt.

Lemma 3.1. The sequence {Y_t^{n,ε}} converges to the deterministic process {X_t^0} uniformly on compacts in probability as ε → 0 and

Proof. Note that

X_t − X_t^0 = ∫_0^t (b(X_s, θ_0) − b(X_s^0, θ_0))ds + εL_t.

By the Lipschitz condition on b in (A1) and the Cauchy–Schwarz inequality, we find that

|X_t − X_t^0|² ≤ 2|∫_0^t (b(X_s, θ_0) − b(X_s^0, θ_0))ds|² + 2ε²|L_t|²
≤ 2t ∫_0^t |b(X_s, θ_0) − b(X_s^0, θ_0)|² ds + 2ε² sup_{0≤s≤t} |L_s|²
≤ 2K²t ∫_0^t |X_s − X_s^0|² ds + 2ε² sup_{0≤s≤t} |L_s|².

By Gronwall's inequality, it follows that

|X_t − X_t^0|² ≤ 2ε² e^{2K²t²} sup_{0≤s≤t} |L_s|²

and consequently

sup_{0≤t≤T} |X_t − X_t^0| ≤ √2 ε e^{K²T²} sup_{0≤t≤T} |L_t|,  (3.1)

which goes to zero in probability as ε → 0 for each T > 0. Since [nt]/n → t as n → ∞, we conclude that the statement holds. □

Lemma 3.2. Let τ_m^{n,ε} = inf{t ≥ 0 : |X_t^0| ≥ m or |Y_t^{n,ε}| ≥ m}. Then τ_m^{n,ε} → ∞ a.s. uniformly in n and ε as m → ∞.

Proof. Note that

X_t = x_0 + ∫_0^t b(X_s, θ_0)ds + εL_t.

By the linear growth condition on b and the Cauchy–Schwarz inequality, we get

|X_t|² ≤ 2(|x_0| + ε|L_t|)² + 2|∫_0^t b(X_s, θ_0)ds|²
≤ 2(|x_0| + ε sup_{0≤s≤t} |L_s|)² + 2t ∫_0^t |b(X_s, θ_0)|² ds
≤ 2(|x_0| + ε sup_{0≤s≤t} |L_s|)² + 2K²t ∫_0^t (1 + |X_s|)² ds
≤ 2(|x_0| + ε sup_{0≤s≤t} |L_s|)² + 4K²t² + 4K²t ∫_0^t |X_s|² ds.

Gronwall's inequality yields

|X_t|² ≤ [2(|x_0| + ε sup_{0≤s≤t} |L_s|)² + 4K²t²] e^{4K²t²}

and

|X_t| ≤ [√2(|x_0| + ε sup_{0≤s≤t} |L_s|) + 2Kt] e^{2K²t²}.

Thus, it follows that

|Y_t^{n,ε}| = |X_{[nt]/n}| ≤ [√2(|x_0| + sup_{0≤s≤t} |L_s|) + 2Kt] e^{2K²t²},

which is almost surely finite. Therefore the proof is complete. □



We shall use ∇_x f(x, θ) = (∂_{x_1} f(x, θ), . . . , ∂_{x_d} f(x, θ))^T to denote the gradient of f(x, θ) with respect to x.

Lemma 3.3. Let f ∈ C_↑^{1,1}(ℝ^d × Θ; ℝ). Assume (A1)–(A2). Then, we have

(1/n) Σ_{k=1}^n f(X_{t_{k−1}}, θ) → ∫_0^1 f(X_s^0, θ)ds in P_{θ_0}-probability

as ε → 0 and n → ∞, uniformly in θ ∈ Θ.


Proof. By the differentiability of f(x, θ) and Lemma 3.1, we find that

sup_{θ∈Θ} |(1/n) Σ_{k=1}^n f(X_{t_{k−1}}, θ) − ∫_0^1 f(X_s^0, θ)ds| = sup_{θ∈Θ} |∫_0^1 f(Y_s^{n,ε}, θ)ds − ∫_0^1 f(X_s^0, θ)ds|
≤ sup_{θ∈Θ} ∫_0^1 |f(Y_s^{n,ε}, θ) − f(X_s^0, θ)|ds
= sup_{θ∈Θ} ∫_0^1 |∫_0^1 (∇_x f)^T(X_s^0 + u(Y_s^{n,ε} − X_s^0), θ) · (Y_s^{n,ε} − X_s^0)du| ds
≤ sup_{θ∈Θ} ∫_0^1 (∫_0^1 |∇_x f(X_s^0 + u(Y_s^{n,ε} − X_s^0), θ)|du) |Y_s^{n,ε} − X_s^0| ds
≤ ∫_0^1 C(1 + |X_s^0| + |Y_s^{n,ε}|)^λ |Y_s^{n,ε} − X_s^0| ds
≤ C(1 + sup_{0≤s≤1} |X_s^0| + sup_{0≤s≤1} |X_s|)^λ sup_{0≤s≤1} |Y_s^{n,ε} − X_s^0|
→ 0 in P_{θ_0}-probability as ε → 0 and n → ∞. □



Lemma 3.4. Let f ∈ C_↑^{1,1}(ℝ^d × Θ; ℝ). Assume (A1)–(A2). Then, we have that for each 1 ≤ i ≤ d and each θ ∈ Θ,

Σ_{k=1}^n f(X_{t_{k−1}}, θ)(L_{t_k}^i − L_{t_{k−1}}^i) → ∫_0^1 f(X_s^0, θ)dL_s^i in P_{θ_0}-probability

as ε → 0 and n → ∞, where

L_t^i = a_i t + Σ_{j=1}^r σ_{ij} B_t^j + ∫_0^t ∫_{|z|≤1} z_i Ñ(ds, dz) + ∫_0^t ∫_{|z|>1} z_i N(ds, dz)

is the i-th component of L_t.

Proof. Note that

Σ_{k=1}^n f(X_{t_{k−1}}, θ)(L_{t_k}^i − L_{t_{k−1}}^i) = ∫_0^1 f(Y_s^{n,ε}, θ)dL_s^i.

Let L̃_t^i = L_t^i − ∫_0^t ∫_{|z|>1} z_i N(ds, dz). Then, we have the following decomposition

∫_0^1 f(Y_s^{n,ε}, θ)dL_s^i − ∫_0^1 f(X_s^0, θ)dL_s^i = ∫_0^1 ∫_{|z|>1} (f(Y_s^{n,ε}, θ) − f(X_s^0, θ))z_i N(ds, dz) + ∫_0^1 (f(Y_s^{n,ε}, θ) − f(X_s^0, θ))dL̃_s^i.

Similar to the proof of Lemma 3.3, we have

|∫_0^1 ∫_{|z|>1} (f(Y_s^{n,ε}, θ) − f(X_s^0, θ))z_i N(ds, dz)| ≤ ∫_0^1 ∫_{|z|>1} |f(Y_s^{n,ε}, θ) − f(X_s^0, θ)| |z_i| N(ds, dz)
≤ ∫_0^1 ∫_{|z|>1} C(1 + |X_s^0| + |Y_s^{n,ε}|)^λ |Y_s^{n,ε} − X_s^0| |z_i| N(ds, dz)
≤ C(1 + sup_{0≤s≤1} |X_s^0| + sup_{0≤s≤1} |X_s|)^λ sup_{0≤s≤1} |Y_s^{n,ε} − X_s^0| ∫_0^1 ∫_{|z|>1} |z_i| N(ds, dz),

which converges to zero in probability as ε → 0 and n → ∞ by Lemma 3.1. By using the stopping time τ_m^{n,ε}, Lemma 3.1, the Markov inequality and dominated convergence, we find that for any given η > 0 and some fixed m,

P(|∫_0^1 (f(Y_s^{n,ε}, θ) − f(X_s^0, θ)) 1_{s≤τ_m^{n,ε}} dL̃_s^i| > η)
≤ (|a_i|/η) ∫_0^1 E[|f(Y_s^{n,ε}, θ) − f(X_s^0, θ)| 1_{s≤τ_m^{n,ε}}]ds
+ ((Σ_{j=1}^r σ_{ij}²)^{1/2}/η) (∫_0^1 E[|f(Y_s^{n,ε}, θ) − f(X_s^0, θ)|² 1_{s≤τ_m^{n,ε}}]ds)^{1/2}
+ (1/η) (∫_0^1 E[|f(Y_s^{n,ε}, θ) − f(X_s^0, θ)|² 1_{s≤τ_m^{n,ε}}]ds · ∫_{|z|≤1} |z_i|² ν(dz))^{1/2},  (3.2)

which goes to zero as ε → 0 and n → ∞. Then, we have

P(|∫_0^1 (f(Y_s^{n,ε}, θ) − f(X_s^0, θ))dL̃_s^i| > η) ≤ P(τ_m^{n,ε} < 1) + P(|∫_0^1 (f(Y_s^{n,ε}, θ) − f(X_s^0, θ)) 1_{s≤τ_m^{n,ε}} dL̃_s^i| > η),

which converges to zero as ε → 0 and n → ∞ by Lemma 3.2 and (3.2). This completes the proof. □



Lemma 3.5. Let f ∈ C_↑^{1,1}(ℝ^d × Θ; ℝ). Assume (A1)–(A2). Then, we have that for 1 ≤ i ≤ d,

Σ_{k=1}^n f(X_{t_{k−1}}, θ)(X_{t_k}^i − X_{t_{k−1}}^i − b_i(X_{t_{k−1}}, θ_0)Δt_{k−1}) → 0 in P_{θ_0}-probability

as ε → 0 and n → ∞, uniformly in θ ∈ Θ, where X_t^i and b_i are the i-th components of X_t and b, respectively.

Proof. Note that

X_{t_k}^i = X_{t_{k−1}}^i + ∫_{t_{k−1}}^{t_k} b_i(X_s, θ_0)ds + ε(L_{t_k}^i − L_{t_{k−1}}^i).

It is easy to see that

Σ_{k=1}^n f(X_{t_{k−1}}, θ)(X_{t_k}^i − X_{t_{k−1}}^i − b_i(X_{t_{k−1}}, θ_0)Δt_{k−1})
= Σ_{k=1}^n ∫_{t_{k−1}}^{t_k} f(X_{t_{k−1}}, θ)(b_i(X_s, θ_0) − b_i(X_{t_{k−1}}, θ_0))ds + ε Σ_{k=1}^n f(X_{t_{k−1}}, θ)(L_{t_k}^i − L_{t_{k−1}}^i)
= ∫_0^1 f(Y_s^{n,ε}, θ)(b_i(X_s, θ_0) − b_i(Y_s^{n,ε}, θ_0))ds + ε ∫_0^1 f(Y_s^{n,ε}, θ)dL_s^i.

By the given condition on f and the Lipschitz condition on b, we have

sup_{θ∈Θ} |∫_0^1 f(Y_s^{n,ε}, θ)(b_i(X_s, θ_0) − b_i(Y_s^{n,ε}, θ_0))ds| ≤ ∫_0^1 sup_{θ∈Θ} |f(Y_s^{n,ε}, θ)| · K|X_s − Y_s^{n,ε}|ds
≤ KC ∫_0^1 (1 + |Y_s^{n,ε}|)^λ (|X_s − X_s^0| + |Y_s^{n,ε} − X_s^0|)ds
≤ KC (1 + sup_{0≤t≤1} |X_t|)^λ (sup_{0≤s≤1} |X_s − X_s^0| + sup_{0≤s≤1} |Y_s^{n,ε} − X_s^0|),

which converges to zero in probability as ε → 0 and n → ∞ by Lemma 3.1. Next, using the decomposition of L_t, we have

ε sup_{θ∈Θ} |∫_0^1 f(Y_s^{n,ε}, θ)dL_s^i| ≤ ε sup_{θ∈Θ} |a_i ∫_0^1 f(Y_s^{n,ε}, θ)ds| + ε sup_{θ∈Θ} |∫_0^1 f(Y_s^{n,ε}, θ) Σ_{j=1}^r σ_{ij} dB_s^j|
+ ε sup_{θ∈Θ} |∫_0^1 ∫_{|z|≤1} f(Y_s^{n,ε}, θ)z_i Ñ(ds, dz)| + ε sup_{θ∈Θ} |∫_0^1 ∫_{|z|>1} f(Y_s^{n,ε}, θ)z_i N(ds, dz)|.

It is clear that

ε sup_{θ∈Θ} |a_i ∫_0^1 f(Y_s^{n,ε}, θ)ds| ≤ ε|a_i|C ∫_0^1 (1 + |Y_s^{n,ε}|)^λ ds ≤ ε|a_i|C (1 + sup_{0≤s≤1} |X_s|)^λ,

which converges to zero in probability as ε → 0 and n → ∞, and

ε sup_{θ∈Θ} |∫_0^1 ∫_{|z|>1} f(Y_s^{n,ε}, θ)z_i N(ds, dz)| ≤ ε ∫_0^1 ∫_{|z|>1} sup_{θ∈Θ} |f(Y_s^{n,ε}, θ)| · |z_i| N(ds, dz)
≤ ε ∫_0^1 ∫_{|z|>1} C(1 + |Y_s^{n,ε}|)^λ |z_i| N(ds, dz)
≤ εC (1 + sup_{0≤s≤1} |X_s|)^λ ∫_0^1 ∫_{|z|>1} |z_i| N(ds, dz),

which converges to zero in probability. Note that

P(ε sup_{θ∈Θ} |∫_0^1 f(Y_s^{n,ε}, θ) Σ_{j=1}^r σ_{ij} dB_s^j| > η) ≤ P(τ_m^{n,ε} < 1) + P(ε sup_{θ∈Θ} |∫_0^1 f(Y_s^{n,ε}, θ) 1_{s≤τ_m^{n,ε}} Σ_{j=1}^r σ_{ij} dB_s^j| > η).  (3.3)

Let

u_{n,ε}^i(θ) = ε ∫_0^1 f(Y_s^{n,ε}, θ) 1_{s≤τ_m^{n,ε}} Σ_{j=1}^r σ_{ij} dB_s^j,  1 ≤ i ≤ d.

We want to prove that u_{n,ε}^i(θ) → 0 in probability as ε → 0 and n → ∞, uniformly in θ ∈ Θ. It suffices to show the pointwise convergence and the tightness of the sequence {u_{n,ε}^i(·)}. For the pointwise convergence, by the Chebyshev inequality and Itô's isometry, we have

P(|u_{n,ε}^i(θ)| > η) ≤ ε²η^{−2} E|∫_0^1 f(Y_s^{n,ε}, θ) 1_{s≤τ_m^{n,ε}} Σ_{j=1}^r σ_{ij} dB_s^j|²
≤ (Σ_{j=1}^r σ_{ij}²) ε²η^{−2} ∫_0^1 E[|f(Y_s^{n,ε}, θ)|² 1_{s≤τ_m^{n,ε}}]ds
≤ (Σ_{j=1}^r σ_{ij}²) ε²η^{−2} ∫_0^1 E[C²(1 + |Y_s^{n,ε}|)^{2λ} 1_{s≤τ_m^{n,ε}}]ds
≤ (Σ_{j=1}^r σ_{ij}²) ε²η^{−2} C²(1 + m)^{2λ},  (3.4)

which converges to zero as ε → 0 and n → ∞ with fixed m. For the tightness of {u_{n,ε}^i(·)}, by using Theorem 20 in Appendix I of Ibragimov and Has'minskii [11], it is enough to prove the following two inequalities:

E[|u_{n,ε}^i(θ)|^{2q}] ≤ C,  (3.5)

E[|u_{n,ε}^i(θ_2) − u_{n,ε}^i(θ_1)|^{2q}] ≤ C|θ_2 − θ_1|^{2q},  (3.6)

for θ, θ_1, θ_2 ∈ Θ, where 2q > p. The proof of (3.5) is very similar to the moment estimate in (3.4), with Itô's isometry replaced by the Burkholder–Davis–Gundy inequality, so we omit the details. For (3.6), by Taylor's formula and the Burkholder–Davis–Gundy inequality, we have

E[|u_{n,ε}^i(θ_2) − u_{n,ε}^i(θ_1)|^{2q}] ≤ ε^{2q} C_q (Σ_{j=1}^r σ_{ij}²)^q E[(∫_0^1 (f(Y_s^{n,ε}, θ_2) − f(Y_s^{n,ε}, θ_1))² 1_{s≤τ_m^{n,ε}} ds)^q]
≤ ε^{2q} C_q (Σ_{j=1}^r σ_{ij}²)^q E[(∫_0^1 ∫_0^1 |θ_2 − θ_1|² |∇_θ f(Y_s^{n,ε}, θ_1 + v(θ_2 − θ_1))|² 1_{s≤τ_m^{n,ε}} dv ds)^q]
≤ ε^{2q} C_q (Σ_{j=1}^r σ_{ij}²)^q C^{2q} |θ_2 − θ_1|^{2q} E[(∫_0^1 (1 + |Y_s^{n,ε}|)^{2λ} 1_{s≤τ_m^{n,ε}} ds)^q]
≤ ε^{2q} C_q (Σ_{j=1}^r σ_{ij}²)^q C^{2q} (1 + m)^{2λq} |θ_2 − θ_1|^{2q}.

Combining (3.3) and the above arguments, we conclude that ε sup_{θ∈Θ} |∫_0^1 f(Y_s^{n,ε}, θ) Σ_{j=1}^r σ_{ij} dB_s^j| converges to zero in probability as ε → 0 and n → ∞. Similarly, we can prove that ε sup_{θ∈Θ} |∫_0^1 ∫_{|z|≤1} f(Y_s^{n,ε}, θ)z_i Ñ(ds, dz)| converges to zero in probability as ε → 0 and n → ∞. Therefore, the proof is complete. □

Now we are in a position to prove Theorem 2.1.

Proof of Theorem 2.1. Note that

Φ_{n,ε}(θ) = −2 Σ_{k=1}^n (b(X_{t_{k−1}}, θ) − b(X_{t_{k−1}}, θ_0))^T (X_{t_k} − X_{t_{k−1}} − n^{−1} b(X_{t_{k−1}}, θ_0)) + (1/n) Σ_{k=1}^n |b(X_{t_{k−1}}, θ) − b(X_{t_{k−1}}, θ_0)|²
:= Φ_{n,ε}^{(1)}(θ) + Φ_{n,ε}^{(2)}(θ).

By applying Lemma 3.5 with f(x, θ) = b_i(x, θ) − b_i(x, θ_0) (1 ≤ i ≤ d), we have sup_{θ∈Θ} |Φ_{n,ε}^{(1)}(θ)| → 0 in P_{θ_0}-probability as ε → 0 and n → ∞. By applying Lemma 3.3 with f(x, θ) = |b(x, θ) − b(x, θ_0)|², we find that sup_{θ∈Θ} |Φ_{n,ε}^{(2)}(θ) − F(θ)| → 0 in P_{θ_0}-probability as ε → 0 and n → ∞, where F(θ) = ∫_0^1 |b(X_t^0, θ) − b(X_t^0, θ_0)|² dt. Thus, combining the previous arguments, we have

sup_{θ∈Θ} |Φ_{n,ε}(θ) − F(θ)| → 0 in P_{θ_0}-probability

as ε → 0 and n → ∞. Moreover, (A3) and the continuity of X^0 yield that

inf_{|θ−θ_0|>δ} F(θ) > F(θ_0) = 0

for each δ > 0. Therefore, by Theorem 5.9 of van der Vaart [47], we have the desired consistency, i.e., θ̂_{n,ε} → θ_0 in P_{θ_0}-probability as ε → 0 and n → ∞. This completes the proof. □

3.2. Proof of Theorem 2.2

Note that

∇_θ Φ_{n,ε}(θ) = −2 Σ_{k=1}^n (∇_θ b)^T(X_{t_{k−1}}, θ)(X_{t_k} − X_{t_{k−1}} − b(X_{t_{k−1}}, θ)Δt_{k−1}).

Let G_{n,ε}(θ) = (G_{n,ε}^1, . . . , G_{n,ε}^p)^T with

G_{n,ε}^i(θ) = Σ_{k=1}^n (∂_{θ_i} b)^T(X_{t_{k−1}}, θ)(X_{t_k} − X_{t_{k−1}} − b(X_{t_{k−1}}, θ)Δt_{k−1}),  i = 1, . . . , p,

and let K_{n,ε}(θ) = ∇_θ G_{n,ε}(θ), which is a p × p matrix with elements K_{n,ε}^{ij}(θ) = ∂_{θ_j} G_{n,ε}^i(θ), 1 ≤ i, j ≤ p. Moreover, we introduce the function

K^{ij}(θ) = ∫_0^1 (∂_{θ_j} ∂_{θ_i} b)^T(X_s^0, θ)(b(X_s^0, θ_0) − b(X_s^0, θ))ds − I^{ij}(θ),  1 ≤ i, j ≤ p,

and define the matrix function K(θ) = (K^{ij}(θ))_{1≤i,j≤p}. Before proving Theorem 2.2, we prepare some preliminary results.

Lemma 3.6. Assume (A1)–(A2). Then, we have that for each i = 1, . . . , p,

ε^{−1} G_{n,ε}^i(θ_0) → ∫_0^1 (∂_{θ_i} b)^T(X_s^0, θ_0)dL_s in P_{θ_0}-probability

as ε → 0, n → ∞ and nε → ∞.


Proof. Note that for 1 ≤ i ≤ p

ε −1 Gin,ε (θ0 ) = ε −1

n  (∂θi b)T (Xtk−1 , θ0 )(Xtk − Xtk−1 − b(Xtk−1 , θ0 )∆tk−1 ) k=1

 n  = ε −1 (∂θi b)T (Xtk−1 , θ0 ) k=1 (1) Hn,ε (θ0 )

:=

tk

(b(Xs , θ0 ) − b(Xtk−1 , θ0 ))ds +

tk−1

+

n  (∂θi b)T (Xtk−1 , θ0 )(Ltk − Ltk−1 ) k=1

Hn(2,ε) (θ0 ).

By using Lemma 3.4 and letting f (x, θ ) = ∂θi bj (x, θ ) (1 ≤ i ≤ p, 1 ≤ j ≤ d) with θ = θ0 , we have Hn(2,ε) (θ0 )

1



(∂θi b) ( T

=

Ysn,ε

Pθ 0

, θ0 )dLs −→



1

(∂θi b)T (Xs0 , θ0 )dLs

0

0

(1)

(1)

as ε → 0 and n → ∞. It suffices to prove that Hn,ε (θ0 ) converges to zero in probability. For Hn,ε (θ0 ), we need some delicate estimate for the process Xt . For s ∈ [tk−1 , tk ], we have s



(b(Xu , θ0 ) − b(Xtk−1 , θ0 ))du + b(Xtk−1 , θ0 )(s − tk−1 ) + ε(Ls − Ltk−1 ).

Xs − Xtk−1 = tk−1

By the Lipschitz condition on b and the Cauchy–Schwarz inequality, we find that

2    s  2   (b(Xu , θ0 ) − b(Xtk−1 , θ0 ))du + 2 |b(Xtk−1 , θ0 )|(s − tk−1 ) + ε|Ls − Ltk−1 | |Xs − Xtk−1 |2 ≤ 2    tk−1  2  s

|Xu − Xtk−1 |2 du + 2 n−1 |b(Xtk−1 , θ0 )| + ε

≤ 2K 2 n−1

tk−1

sup tk−1 ≤s≤tk

|Ls − Ltk−1 |

.

By Gronwall’s inequality, we get

2

 |Xs − Xtk−1 | ≤ 2 n |b(Xtk−1 , θ0 )| + ε −1

2

sup tk−1 ≤s≤tk

|Ls − Ltk−1 |

e2K

2 n−1 (s−t

k−1 )

.

It further follows that



√ sup tk−1 ≤s≤tk

|Xs − Xtk−1 | ≤

2 n

 −1

|b(Xtk−1 , θ0 )| + ε

sup tk−1 ≤s≤tk

|Ls − Ltk−1 | eK

2 /n2

.

(3.7)

Thus, by the Lipschitz condition on b and (3.7), we get

|H^{(1)}_{n,ε}(θ_0)| ≤ ε^{−1} Σ_{k=1}^n |∂_{θ_i} b(X_{t_{k−1}}, θ_0)| · |∫_{t_{k−1}}^{t_k} (b(X_s, θ_0) − b(X_{t_{k−1}}, θ_0)) ds|
≤ ε^{−1} Σ_{k=1}^n |∂_{θ_i} b(X_{t_{k−1}}, θ_0)| ∫_{t_{k−1}}^{t_k} |b(X_s, θ_0) − b(X_{t_{k−1}}, θ_0)| ds
≤ ε^{−1} Σ_{k=1}^n |∂_{θ_i} b(X_{t_{k−1}}, θ_0)| ∫_{t_{k−1}}^{t_k} K |X_s − X_{t_{k−1}}| ds
≤ ε^{−1} n^{−1} K Σ_{k=1}^n |∂_{θ_i} b(X_{t_{k−1}}, θ_0)| sup_{t_{k−1}≤s≤t_k} |X_s − X_{t_{k−1}}|
≤ (nε)^{−1} √2 K e^{K²/n²} · (1/n) Σ_{k=1}^n |∂_{θ_i} b(X_{t_{k−1}}, θ_0)| · |b(X_{t_{k−1}}, θ_0)|
  + (√2 K e^{K²/n²}/n) Σ_{k=1}^n |∂_{θ_i} b(X_{t_{k−1}}, θ_0)| sup_{t_{k−1}≤s≤t_k} |L_s − L_{t_{k−1}}|
=: H^{(1,1)}_{n,ε}(θ_0) + H^{(1,2)}_{n,ε}(θ_0).

It is easy to see that H^{(1,1)}_{n,ε}(θ_0) converges to zero in probability as nε → ∞ since

(1/n) Σ_{k=1}^n |∂_{θ_i} b(X_{t_{k−1}}, θ_0)| · |b(X_{t_{k−1}}, θ_0)| ≤ C_K (1 + sup_{0≤s≤1} |X_s|)^{λ+1} < ∞ a.s.


(cf. (3.1)). By using the basic fact that

(1/n) Σ_{k=1}^n sup_{t_{k−1}≤s≤t_k} |L_s − L_{t_{k−1}}| = o_P(1),

we find that

H^{(1,2)}_{n,ε}(θ_0) ≤ √2 K e^{K²/n²} C (1 + sup_{0≤s≤1} |X_s|)^λ · (1/n) Σ_{k=1}^n sup_{t_{k−1}≤s≤t_k} |L_s − L_{t_{k−1}}|,

which converges to zero in probability as ε → 0 and n → ∞. Therefore the proof is complete. □


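The "basic fact" invoked above, that the averaged block-wise suprema of the Lévy increments vanish, can be checked numerically for a concrete driving process. The following Python sketch is our illustration (not part of the paper, whose experiments were done in R); it takes Brownian motion as the Lévy process and approximates (1/n) Σ_k sup_{t_{k−1}≤s≤t_k} |L_s − L_{t_{k−1}}| on a fine grid:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_block_sup(n, fine=50):
    """Approximate (1/n) * sum_k sup_{s in [t_{k-1}, t_k]} |L_s - L_{t_{k-1}}|
    for a Brownian path L on [0, 1], using `fine` grid points per block of length 1/n."""
    dt = 1.0 / (n * fine)
    incs = rng.normal(0.0, np.sqrt(dt), size=(n, fine))
    dev = np.abs(np.cumsum(incs, axis=1))   # |L_s - L_{t_{k-1}}| within each block
    return dev.max(axis=1).mean()           # average of the block-wise suprema

# For Brownian motion the average shrinks like n^{-1/2}, hence is o_P(1) as n grows.
print(mean_block_sup(100), mean_block_sup(10000))
```

With the fixed seed, the second value is roughly an order of magnitude smaller than the first, matching the n^{−1/2} rate expected for Brownian motion.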

Lemma 3.7. Assume (A1)–(A4). Then, we have

sup_{θ∈Θ} |K_{n,ε}(θ) − K(θ)| →^{P_{θ_0}} 0

as ε → 0 and n → ∞.

Proof. It suffices to prove that for 1 ≤ i, j ≤ p,

sup_{θ∈Θ} |K^{ij}_{n,ε}(θ) − K^{ij}(θ)| →^{P_{θ_0}} 0

as ε → 0 and n → ∞. Note that

K^{ij}_{n,ε}(θ) = ∂_{θ_j} G^i_{n,ε}(θ)
= Σ_{k=1}^n (∂_{θ_j} ∂_{θ_i} b)^T(X_{t_{k−1}}, θ)(X_{t_k} − X_{t_{k−1}} − b(X_{t_{k−1}}, θ_0)Δt_{k−1})
  + (1/n) Σ_{k=1}^n [(∂_{θ_j} ∂_{θ_i} b)^T(X_{t_{k−1}}, θ)(b(X_{t_{k−1}}, θ_0) − b(X_{t_{k−1}}, θ)) − (∂_{θ_i} b)^T(X_{t_{k−1}}, θ) ∂_{θ_j} b(X_{t_{k−1}}, θ)]
=: K^{ij,(1)}_{n,ε}(θ) + K^{ij,(2)}_{n,ε}(θ).

By using Lemma 3.5 and letting f(x, θ) = ∂_{θ_j} ∂_{θ_i} b_l(x, θ) (1 ≤ i, j ≤ p, 1 ≤ l ≤ d), we have that sup_{θ∈Θ} |K^{ij,(1)}_{n,ε}(θ)| converges to zero in probability as ε → 0 and n → ∞. By using Lemma 3.3 and letting f(x, θ) = (∂_{θ_j} ∂_{θ_i} b)^T(x, θ)(b(x, θ_0) − b(x, θ)) − (∂_{θ_i} b)^T(x, θ) ∂_{θ_j} b(x, θ), it follows that sup_{θ∈Θ} |K^{ij,(2)}_{n,ε}(θ) − K^{ij}(θ)| converges to zero in probability as ε → 0 and n → ∞. Thus, the proof is complete. □

Finally, we are ready to prove Theorem 2.2.

Proof of Theorem 2.2. The proof ideas mainly follow Uchida [42]. Let B(θ_0; ρ) = {θ : |θ − θ_0| ≤ ρ} for ρ > 0. Then, by the consistency of θ̂_{n,ε}, there exists a sequence η_{n,ε} → 0 as ε → 0 and n → ∞ such that B(θ_0; η_{n,ε}) ⊂ Θ_0 and P_{θ_0}[θ̂_{n,ε} ∈ B(θ_0; η_{n,ε})] → 1. When θ̂_{n,ε} ∈ B(θ_0; η_{n,ε}), it follows by Taylor's formula that

D_{n,ε} S_{n,ε} = ε^{−1} G_{n,ε}(θ̂_{n,ε}) − ε^{−1} G_{n,ε}(θ_0),

where D_{n,ε} = ∫_0^1 K_{n,ε}(θ_0 + u(θ̂_{n,ε} − θ_0)) du and S_{n,ε} = ε^{−1}(θ̂_{n,ε} − θ_0), since B(θ_0; η_{n,ε}) is a convex subset of Θ_0. We have

|D_{n,ε} − K_{n,ε}(θ_0)| 1_{{θ̂_{n,ε} ∈ B(θ_0; η_{n,ε})}} ≤ sup_{θ∈B(θ_0; η_{n,ε})} |K_{n,ε}(θ) − K_{n,ε}(θ_0)|
≤ sup_{θ∈B(θ_0; η_{n,ε})} |K_{n,ε}(θ) − K(θ)| + sup_{θ∈B(θ_0; η_{n,ε})} |K(θ) − K(θ_0)| + |K_{n,ε}(θ_0) − K(θ_0)|.

Consequently, it follows from Lemma 3.7 that

D_{n,ε} →^{P_{θ_0}} K(θ_0),  ε → 0, n → ∞.

Note that K(θ) is continuous with respect to θ. Since −K(θ_0) = I(θ_0) is positive definite, there exists a positive constant δ > 0 such that inf_{|w|=1} |K(θ_0)w| > 2δ. For such a δ > 0, there exist ε(δ) > 0 and N(δ) ∈ ℕ such that for any ε ∈ (0, ε(δ)) and n > N(δ), we have B(θ_0; η_{n,ε}) ⊂ Θ_0 and |K(θ) − K(θ_0)| < δ/2 for θ ∈ B(θ_0; η_{n,ε}). For such δ > 0, let

Γ_{n,ε} = { sup_{|θ−θ_0|≤η_{n,ε}} … }.

Let T_m = inf{t ≥ 0 : [M, M]_t > m or ∫_0^t |dA_s| > m}. Then, we can modify the definition of τ^{n,ε}_m by

τ^{n,ε}_m = inf{t ≥ 0 : |X^0_t| ≥ m or |Y^{n,ε}_t| ≥ m} ∧ T_m.

Lemma 3.2 still holds, i.e. τ^{n,ε}_m → ∞ a.s., uniformly in n and ε, as m → ∞. When the proofs are based on pathwise arguments, they carry over to the semimartingale noise case easily. When the proofs are based on the Markov inequality (or the Chebyshev inequality), Itô's isometry and the Burkholder–Davis–Gundy inequality (cf. Theorem 54 and the remark on page 175 in Chapter IV of Protter [32]), we can apply the modified stopping times τ^{n,ε}_m to the stochastic integrals with respect to the local martingale M_t. Thus all the proofs remain valid under the modifications described above. We omit the details here.

5. Simulations

Consider a 2-dimensional model for (1.2) with

 

b(x, θ) = ( √(θ_1 + x_1² + x_2²), −θ_2 x_2/√(1 + x_1² + x_2²) )^T,  L_t = ( V^{δ,γ}_t + B_t, S^α_t )^T,   (5.1)

where B is a standard Brownian motion, S^α is a standard symmetric α-stable process S_α(1, 0, 0), and V^{δ,γ} is a variance gamma process with Lévy density

p_V(z) = (δ/|z|) e^{−γ|z|},  z ∈ ℝ, δ, γ > 0.

In this example, we find that our LSE of θ, say θ̂ = (θ̂_{n,ε,1}, θ̂_{n,ε,2}), satisfies

Σ_{k=1}^n (X^{(1)}_{t_k} − X^{(1)}_{t_{k−1}}) / √(θ̂_{n,ε,1} + (X^{(1)}_{t_{k−1}})² + (X^{(2)}_{t_{k−1}})²) = 1;

θ̂_{n,ε,2} = − [ Σ_{k=1}^n X^{(2)}_{t_{k−1}}(X^{(2)}_{t_k} − X^{(2)}_{t_{k−1}}) / √(1 + (X^{(1)}_{t_{k−1}})² + (X^{(2)}_{t_{k−1}})²) ] / [ n^{−1} Σ_{k=1}^n (X^{(2)}_{t_{k−1}})² / (1 + (X^{(1)}_{t_{k−1}})² + (X^{(2)}_{t_{k−1}})²) ].   (5.2)
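The equations (5.2) can be solved directly: θ̂_{n,ε,2} is explicit, and θ̂_{n,ε,1} is the root of a monotone scalar equation. The following Python sketch is our illustration (the paper's experiments used R's nlm); the function name lse, the bisection bracket, and the small Gaussian test data (in place of the Lévy noise of Model A) are illustrative choices, not the authors':

```python
import numpy as np

def lse(x, n):
    """Solve the estimating equations (5.2) from observations x[k] = X_{t_k}, k = 0..n."""
    x1, x2 = x[:-1, 0], x[:-1, 1]              # X_{t_{k-1}}
    dx1, dx2 = np.diff(x[:, 0]), np.diff(x[:, 1])
    r2 = x1**2 + x2**2
    # theta1 solves sum_k dX1_k / sqrt(theta1 + |X_{t_{k-1}}|^2) = 1; the left side is
    # decreasing in theta1 (increments are predominantly positive), so bisect.
    g = lambda t1: np.sum(dx1 / np.sqrt(t1 + r2)) - 1.0
    lo, hi = 1e-8, 1e4
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    theta1 = 0.5 * (lo + hi)
    # theta2 is explicit from the second equation in (5.2).
    theta2 = -np.sum(x2 * dx2 / np.sqrt(1 + r2)) / (np.sum(x2**2 / (1 + r2)) / n)
    return theta1, theta2

# Illustrative data: Euler path of (5.1)'s drift with small Gaussian noise (eps = 0.01).
rng = np.random.default_rng(2)
n, eps, th = 3000, 0.01, (2.0, 1.0)
x = np.empty((n + 1, 2)); x[0] = (1.0, 1.0)
for k in range(n):
    a, b = x[k]
    x[k + 1, 0] = a + np.sqrt(th[0] + a**2 + b**2) / n + eps * rng.normal(0, np.sqrt(1 / n))
    x[k + 1, 1] = b - th[1] * b / np.sqrt(1 + a**2 + b**2) / n + eps * rng.normal(0, np.sqrt(1 / n))

print(lse(x, n))  # close to the true (2.0, 1.0) for small eps
```

For small ε the recovered pair lies near the true (θ_1, θ_2) = (2, 1), consistent with the small-dispersion asymptotics.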

In the sequel, we set the parameter values as

(X^{(1)}_0, X^{(2)}_0) = (1, 1),  (θ_1, θ_2) = (2, 1),  (δ, γ, α) = (5, 3, 3/2).

Then both X^{(1)} and X^{(2)} are infinite-activity jump processes, but the jump part of X^{(1)} is of bounded variation while that of X^{(2)} is of unbounded variation. A sample path of X = (X^{(1)}_t, X^{(2)}_t)_{t∈[0,1]} with ε = 0.3 is given in Fig. 1.

In each experiment, we generate a discrete sample (X_{t_k})_{k=0,1,...,n} and compute θ̂ from the sample. This procedure is iterated 10,000 times, and the mean and the standard deviation of the 10,000 sampled estimators are computed in each case of (ε, n). To solve the nonlinear equation (5.2), we used the nlm function in R. On generating discrete samples of Lévy processes, see, e.g., Cont and Tankov [4], Section II.6 and references therein; alternatively, random number generators are available in the yuima package of R, a package for simulating SDEs with jumps; see https://r-forge.r-project.org/projects/yuima/. For example, we use rstable to generate random samples from α-stable distributions.

The results are shown in Tables 1–4. From those tables, we can observe that the consistency result holds true as ε → 0. We also note that the size of n is often less important in practice for estimating the drift parameter than the size of ε, although n → ∞ is necessary in theory. This is intuitively clear because the accuracy of drift estimation generally depends on the terminal time T of the observations: the larger T becomes, the more accurately θ is estimated. However, the terminal time T = 1 is fixed here. Note that, in the small noise model, letting ε → 0 corresponds to observing the process from a macroscopic point of view, which corresponds to the case T → ∞ in some sense. Therefore, increasing n under fixed ε does not improve the bias of the estimators, which is improved only as ε → 0. In general, a large n can decrease the standard error (or standard deviation) of the estimates, but the effect seems small in this example.

Comparing standard deviations between θ̂_{n,ε,1} and θ̂_{n,ε,2}, the former seems to be estimated more stably than the latter. This is because "big" jumps of X^{(1)} are less frequent than those of X^{(2)}. If ε is small enough, the path of X^{(1)} stays close to the deterministic curve (X^0)^{(1)} since "big" jumps occur infrequently. However, X^{(2)} can have more "big" jumps that are not negligible even if ε is "small", which makes the estimator fluctuate.

To observe the asymptotic distribution of θ̂, we shall compare the above example, say Model A (non-Gaussian noise), with a 2-dimensional process with the same drift b as in (5.1) but whose driving noise L is a 2-dimensional Brownian motion, say Model


Fig. 1. A sample path of Model (5.1) with (θ_1, θ_2, δ, γ, α) = (2, 1, 5, 3, 3/2) and ε = 0.3.

Table 1
Mean (upper) and standard deviation (parentheses) of estimates through 10,000 experiments in the case ε = 0.3 and (δ, γ, α) = (5, 3, 3/2).

ε = 0.3       n = 500            n = 1000           n = 3000           True
θ̂_{n,ε,1}    2.30885 (1.8770)   2.31618 (1.8248)   2.29381 (1.7926)   2.0
θ̂_{n,ε,2}    1.54087 (2.8493)   1.50664 (2.8685)   1.52753 (2.7667)   1.0
Table 2
Mean (upper) and standard deviation (parentheses) of estimates through 10,000 experiments in the case ε = 0.1 and (δ, γ, α) = (5, 3, 3/2).

ε = 0.1       n = 500            n = 1000           n = 3000           True
θ̂_{n,ε,1}    2.03134 (0.5829)   2.02699 (0.5836)   2.02389 (0.5833)   2.0
θ̂_{n,ε,2}    1.10165 (1.2024)   1.09839 (1.1212)   1.09709 (1.0971)   1.0
Table 3
Mean (upper) and standard deviation (parentheses) of estimates through 10,000 experiments in the case ε = 0.05 and (δ, γ, α) = (5, 3, 3/2).

ε = 0.05      n = 500            n = 1000           n = 3000           True
θ̂_{n,ε,1}    2.00583 (0.2961)   2.00599 (0.2951)   2.01071 (0.2913)   2.0
θ̂_{n,ε,2}    1.04883 (1.4364)   1.05963 (1.3026)   1.04438 (0.6773)   1.0
B (Gaussian noise). Figs. 2 and 3 respectively show (normal) QQ-plots for 10,000 iterated samples of ε −1 (θˆn,ε,i −θi ) (i = 1, 2) in Model A with (ε, n) = (0.01, 3000), and Figs. 4 and 5 are those for Model B with (ε, n) = (0.01, 3000). In Model B, (marginal) asymptotic distributions of ε −1 (θˆn,ε,i −θi ) (i = 1, 2) must theoretically be normal, which are supported by Figs. 4 and 5. On the other hand, tails of the corresponding distributions in Model A should be heavier than normal distributions due


Table 4
Mean (upper) and standard deviation (parentheses) of estimates through 10,000 experiments in the case ε = 0.01 and (δ, γ, α) = (5, 3, 3/2).

ε = 0.01      n = 500            n = 1000           n = 3000           True
θ̂_{n,ε,1}    2.00051 (0.0583)   2.00061 (0.0583)   2.00108 (0.0578)   2.0
θ̂_{n,ε,2}    1.00308 (0.2626)   1.00572 (0.1454)   0.99958 (0.1371)   1.0

Fig. 2. Normal QQ-plot for 10,000 iterated samples of ε −1 (θˆn,ε,1 − θ1 ) in Model A (non-Gaussian); (ε, n) = (0.01, 3000).

Fig. 3. Normal QQ-plot for 10,000 iterated samples of ε −1 (θˆn,ε,2 − θ2 ) in Model A (non-Gaussian); (ε, n) = (0.01, 3000).

to jump activities, and we can observe these facts from Figs. 2 and 3. We can also observe that the asymptotic distribution of ε^{−1}(θ̂_{n,ε,2} − θ_2) has much heavier tails than that of ε^{−1}(θ̂_{n,ε,1} − θ_1) because of the high frequency of jumps in X^{(2)}. These facts are consistent with the theory.
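The heavier tails can also be seen directly in the driving noise itself. A small Python check (ours, purely illustrative) compares upper quantiles of a standard normal sample with those of a symmetric 1.5-stable sample generated by the Chambers–Mallows–Stuck method:

```python
import numpy as np

rng = np.random.default_rng(3)

def stable_sym(alpha, size):
    """Chambers-Mallows-Stuck sampler for standard symmetric alpha-stable variates."""
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)
    w = rng.exponential(1.0, size)
    return (np.sin(alpha * u) / np.cos(u) ** (1 / alpha)
            * (np.cos((1 - alpha) * u) / w) ** ((1 - alpha) / alpha))

norm = rng.normal(size=100_000)
stab = stable_sym(1.5, 100_000)
# The 1.5-stable law has power-law tails while the normal has Gaussian tails,
# so high quantiles of the stable sample dwarf those of the normal sample.
for q in (0.99, 0.999):
    print(q, np.quantile(norm, q), np.quantile(stab, q))
```

The gap widens as q → 1, which is exactly the pattern the QQ-plots in Figs. 2 and 3 display relative to Figs. 4 and 5.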


Fig. 4. Normal QQ-plot for 10,000 iterated samples of ε −1 (θˆn,ε,1 − θ1 ) in Model B (Gaussian); (ε, n) = (0.01, 3000).

Fig. 5. Normal QQ-plot for 10,000 iterated samples of ε −1 (θˆn,ε,2 − θ2 ) in Model B (Gaussian); (ε, n) = (0.01, 3000).

Acknowledgments

The authors are grateful to the anonymous referees for suggesting the addition of the sections on semimartingale noise and simulations (Sections 4 and 5). This research was supported by JSPS KAKENHI Grant Number 24740061, Japan Science and Technology Agency, CREST (the second author), and NSERC Grant Number 311945-2008 (the third author).

References

[1] Y. Aït-Sahalia, Maximum likelihood estimation of discretely sampled diffusions: a closed-form approximation approach, Econometrica 70 (2002) 223–262.
[2] J.P.N. Bishwal, Parameter Estimation in Stochastic Differential Equations, in: Lecture Notes in Mathematics, vol. 1923, Springer-Verlag, Berlin, Heidelberg, New York, 2008.
[3] P.J. Brockwell, R.A. Davis, Y. Yang, Estimation for non-negative Lévy-driven Ornstein–Uhlenbeck processes, J. Appl. Probab. 44 (2007) 977–989.
[4] R. Cont, P. Tankov, Financial Modelling with Jump Processes, Chapman & Hall/CRC, Boca Raton, FL, 2004.
[5] A.Ja. Dorogovcev, The consistency of an estimate of a parameter of a stochastic differential equation, Theory Probab. Math. Stat. 10 (1976) 73–82.

[6] V. Fasen, Statistical estimation of multivariate Ornstein–Uhlenbeck processes and applications to co-integration, J. Econometrics 172 (2013) 325–337.
[7] V. Genon-Catalot, Maximum contrast estimation for diffusion processes from discrete observations, Statistics 21 (1990) 99–116.
[8] A. Gloter, M. Sørensen, Estimation for stochastic differential equations with a small diffusion coefficient, Stochastic Process. Appl. 119 (2009) 679–699.
[9] Y. Hu, H. Long, Parameter estimation for Ornstein–Uhlenbeck processes driven by α-stable Lévy motions, Commun. Stoch. Anal. 1 (2007) 175–192.
[10] Y. Hu, H. Long, Least squares estimator for Ornstein–Uhlenbeck processes driven by α-stable motions, Stochastic Process. Appl. 119 (2009) 2465–2480.
[11] I.A. Ibragimov, R.Z. Has'minskii, Statistical Estimation: Asymptotic Theory, Springer-Verlag, New York, Berlin, 1981.
[12] R.A. Kasonga, The consistency of a nonlinear least squares estimator for diffusion processes, Stochastic Process. Appl. 30 (1988) 263–275.
[13] N. Kunitomo, A. Takahashi, The asymptotic expansion approach to the valuation of interest rate contingent claims, Math. Finance 11 (2001) 117–151.
[14] Yu.A. Kutoyants, Parameter Estimation for Stochastic Processes, Heldermann, Berlin, 1984.
[15] Yu.A. Kutoyants, Identification of Dynamical Systems with Small Noise, Kluwer, Dordrecht, 1994.
[16] Yu.A. Kutoyants, Statistical Inference for Ergodic Diffusion Processes, Springer-Verlag, London, Berlin, Heidelberg, 2004.
[17] C.F. Laredo, A sufficient condition for asymptotic sufficiency of incomplete observations of a diffusion process, Ann. Statist. 18 (1990) 1158–1171.
[18] A. Le Breton, On continuous and discrete sampling for parameter estimation in diffusion type processes, Math. Program. Studies 5 (1976) 124–144.
[19] R.S. Liptser, A.N. Shiryaev, Statistics of Random Processes: II Applications, second ed., in: Applications of Mathematics, Springer-Verlag, Berlin, Heidelberg, New York, 2001.
[20] A.W. Lo, Maximum likelihood estimation of generalized Ito processes with discretely sampled data, Econometric Theory 4 (1988) 231–247.
[21] H. Long, Least squares estimator for discretely observed Ornstein–Uhlenbeck processes with small Lévy noises, Statist. Probab. Lett. 79 (2009) 2076–2085.
[22] H. Long, Parameter estimation for a class of stochastic differential equations driven by small stable noises from discrete observations, Acta Math. Scient. 30B (2010) 645–663.
[23] C. Ma, A note on "Least squares estimator for discretely observed Ornstein–Uhlenbeck processes with small Lévy noises", Statist. Probab. Lett. 80 (2010) 1528–1531.
[24] H. Masuda, Simple estimators for parametric Markovian trend of ergodic processes based on sampled data, J. Japan Statist. Soc. 35 (2005) 147–170.
[25] H. Masuda, Approximate self-weighted LAD estimation of discretely observed ergodic Ornstein–Uhlenbeck processes, Electron. J. Stat. 4 (2010) 525–565.
[26] T. Ogihara, N. Yoshida, Quasi-likelihood analysis for the stochastic differential equation with jumps, Stat. Inference Stoch. Process. 14 (2011) 189–229.
[27] A.R. Pedersen, A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations, Scand. J. Statist. 22 (1995) 55–71.
[28] A.R. Pedersen, Consistency and asymptotic normality of an approximate maximum likelihood estimator for discretely observed diffusion processes, Bernoulli 1 (1995) 257–279.
[29] R. Poulsen, Approximate maximum likelihood estimation of discretely observed diffusion processes, Tech. Report 29, Centre for Analytical Finance, University of Aarhus, 1999.
[30] B.L.S. Prakasa Rao, Asymptotic theory for nonlinear least squares estimator for diffusion processes, Math. Operationsforschung Statist., Ser. Statist. 14 (1983) 195–209.
[31] B.L.S. Prakasa Rao, Statistical Inference for Diffusion Type Processes, Oxford University Press, Arnold, London, New York, 1999.
[32] P. Protter, Stochastic Integration and Differential Equations, Springer-Verlag, Berlin, Heidelberg, New York, 1990.
[33] Y. Shimizu, M-estimation for discretely observed ergodic diffusion processes with infinite jumps, Stat. Inference Stoch. Process. 9 (2006) 179–225.
[34] Y. Shimizu, Quadratic type contrast functions for discretely observed non-ergodic diffusion processes, Research Report Series 09-04, Division of Mathematical Science, Osaka University, 2010.
[35] Y. Shimizu, N. Yoshida, Estimation of parameters for diffusion processes with jumps from discrete observations, Stat. Inference Stoch. Process. 9 (2006) 227–277.
[36] M. Sørensen, Small dispersion asymptotics for diffusion martingale estimating functions, Preprint No. 2000-2, Department of Statistics and Operations Research, University of Copenhagen, Copenhagen, 2000.
[37] H. Sørensen, Parameter inference for diffusion processes observed at discrete points in time: a survey, Internat. Statist. Rev. 72 (2004) 337–354.
[38] M. Sørensen, M. Uchida, Small diffusion asymptotics for discretely sampled stochastic differential equations, Bernoulli 9 (2003) 1051–1069.
[39] K. Spiliopoulos, Methods of moments estimation of Ornstein–Uhlenbeck processes driven by general Lévy process, Preprint, University of Maryland, 2008.
[40] A. Takahashi, An asymptotic expansion approach to pricing contingent claims, Asia-Pacific Financial Markets 6 (1999) 115–151.
[41] A. Takahashi, N. Yoshida, An asymptotic expansion scheme for optimal investment problems, Stat. Inference Stoch. Process. 7 (2004) 153–188.
[42] M. Uchida, Estimation for discretely observed small diffusions based on approximate martingale estimating functions, Scand. J. Statist. 31 (2004) 553–566.
[43] M. Uchida, Approximate martingale estimating functions for stochastic differential equations with small noises, Stochastic Process. Appl. 118 (2008) 1706–1721.
[44] M. Uchida, N. Yoshida, Information criteria for small diffusions via the theory of Malliavin–Watanabe, Stat. Inference Stoch. Process. 7 (2004) 35–67.
[45] M. Uchida, N. Yoshida, Asymptotic expansion for small diffusions applied to option pricing, Stat. Inference Stoch. Process. 7 (2004) 189–223.
[46] L. Valdivieso, W. Schoutens, F. Tuerlinckx, Maximum likelihood estimation in processes of Ornstein–Uhlenbeck type, Stat. Inference Stoch. Process. 12 (2009) 1–19.
[47] A.W. van der Vaart, Asymptotic Statistics, in: Cambridge Series in Statistical and Probabilistic Mathematics, vol. 3, Cambridge University Press, 1998.
[48] N. Yoshida, Asymptotic expansion of maximum likelihood estimators for small diffusions via the theory of Malliavin–Watanabe, Probab. Theory Relat. Fields 92 (1992) 275–311.
[49] N. Yoshida, Asymptotic expansion for statistics related to small diffusions, J. Japan Statist. Soc. 22 (1992) 139–159.
[50] N. Yoshida, Conditional expansions and their applications, Stochastic Process. Appl. 107 (2003) 53–81.