Please cite as J. R. Movellan (2011) Tutorial on Stochastic Differential Equations, MPLab Tutorials Version 06.1.

Tutorial on Stochastic Differential Equations

Javier R. Movellan

Copyright © 2003, 2004, 2005, 2006 Javier R. Movellan

This document is being reorganized. Expect redundancy, inconsistencies, disorganized presentation ...

1 Motivation

There is a wide range of interesting processes in robotics, control, and economics that can be described as differential equations with non-deterministic dynamics. Suppose the original process is described by the following differential equation

\frac{dX_t}{dt} = a(X_t)    (1)

with initial condition X_0, which could be random. We wish to construct a mathematical model of how the system may behave in the presence of noise. We want the noise source to be stationary and independent of the current state of the system, and we want the resulting paths to be continuous. As it turns out, building such a model is tricky. An elegant mathematical solution to this problem may be found by considering discrete time versions of the process and then taking limits in a meaningful way. Let \pi = \{0 = t_0 \le t_1 \le \cdots \le t_n = t\} be a partition of the interval [0, t]. Let \Delta_\pi t_k = t_{k+1} - t_k. For each partition \pi we can construct a continuous time process X^\pi defined as follows

X^\pi_{t_0} = X_0    (2)
X^\pi_{t_{k+1}} = X^\pi_{t_k} + a(X^\pi_{t_k})\,\Delta_\pi t_k + c(X^\pi_{t_k})\,(N_{t_{k+1}} - N_{t_k})    (3)

where N is a noise process whose properties remain to be determined and c is a function that allows the amount of noise to depend on time and on the state. To make the process continuous in time, we make it piecewise constant between the points of the partition, i.e.,

X^\pi_t = X^\pi_{t_k}  for t \in [t_k, t_{k+1})    (4)

We want the noise N_t to be continuous and the increments N_{t_{k+1}} - N_{t_k} to have zero mean and to be independently and identically distributed. It turns out that the only noise source satisfying these requirements is Brownian motion. Thus we get

X^\pi_t = X_0 + \sum_{k=0}^{n-1} a(X^\pi_{t_k})\,\Delta t_k + \sum_{k=0}^{n-1} c(X^\pi_{t_k})\,\Delta B_k    (5)

where \Delta t_k = t_{k+1} - t_k and \Delta B_k = B_{t_{k+1}} - B_{t_k}, with B a Brownian motion. Let \|\pi\| = \max_k \{\Delta t_k\} be the norm of the partition \pi. It can be shown that as \|\pi\| \to 0 the processes X^\pi converge in probability to a stochastic process X. It follows that

\lim_{\|\pi\|\to 0} \sum_{k=0}^{n-1} a(X^\pi_{t_k})\,\Delta t_k = \int_0^t a(X_s)\,ds    (6)

and that

\sum_{k=0}^{n-1} c(X^\pi_{t_k})\,\Delta B_k    (7)

converges to a process I_t,

I_t = \lim_{\|\pi\|\to 0} \sum_{k=0}^{n-1} c(X^\pi_{t_k})\,\Delta B_k    (8)

Note that I_t looks like an integral in which the integrand c(X_s) is a random variable and the integrator \Delta B_k is also a random variable. As we will see later, I_t turns out to be an Ito stochastic integral. We can now express the limit process X as a process satisfying the following equation

X_t = X_0 + \int_0^t a(X_s)\,ds + I_t    (9)

Sketch of Proof of Convergence: Construct a sequence of partitions \pi_1, \pi_2, \ldots, each one a refinement of the previous one. Show that the corresponding processes X^{\pi_i} form a Cauchy sequence in L^2 and therefore converge to a limit. Call that process X. In order to get a better understanding of the limit process X there are two things we need to do: (1) study the properties of Brownian motion, and (2) study the properties of the Ito stochastic integral.
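The discrete-time construction in equations (2)-(3) is straightforward to implement numerically. Below is a minimal Python sketch (not part of the original text) that builds X^\pi on a uniform partition using Gaussian Brownian increments; the function name simulate_X_pi and the example drift and dispersion functions are illustrative choices, not anything prescribed by the tutorial.

```python
import numpy as np

def simulate_X_pi(a, c, x0, t, n, rng=None):
    """Discrete-time construction of equations (2)-(3) on a uniform partition
    of [0, t] with n steps, driven by Brownian increments."""
    rng = np.random.default_rng() if rng is None else rng
    dt = t / n
    X = np.empty(n + 1)
    X[0] = x0
    for k in range(n):
        dB = rng.normal(0.0, np.sqrt(dt))   # N_{t_{k+1}} - N_{t_k} ~ N(0, dt)
        X[k + 1] = X[k] + a(X[k]) * dt + c(X[k]) * dB
    return X

# Example: a(x) = -x (mean reversion), c(x) = 0.5 (constant noise)
path = simulate_X_pi(a=lambda x: -x, c=lambda x: 0.5, x0=1.0, t=5.0, n=1000)
```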

2 Standard Brownian Motion

By Brownian motion we refer to a mathematical model of the random movement of particles suspended in a fluid. This type of motion was named after Robert Brown, who observed it in pollen grains suspended in water. The process was described mathematically by Norbert Wiener and is thus also called a Wiener process. Mathematically, a standard Brownian motion (or Wiener process) is defined by the following properties:

1. The process starts at zero with probability 1, i.e., P(B_0 = 0) = 1.
2. The probability that a randomly generated Brownian path is continuous is 1.
3. The path increments are independent, Gaussian, zero mean, with variance equal to the temporal extension of the increment. Specifically, for 0 \le s_1 \le t_1 \le s_2 \le t_2,

B_{t_1} - B_{s_1} \sim N(0, t_1 - s_1)    (10)
B_{t_2} - B_{s_2} \sim N(0, t_2 - s_2)    (11)

and B_{t_2} - B_{s_2} is independent of B_{t_1} - B_{s_1}.

Wiener showed that such a process exists, i.e., there is a stochastic process that does not violate the axioms of probability theory and that satisfies the three aforementioned properties.

2.1 Properties of Brownian Motion

2.1.1 Statistics

From the properties of Gaussian random variables,

E(B_t - B_s) = 0    (12)
\mathrm{Var}(B_t - B_s) = E[(B_t - B_s)^2] = t - s    (13)
E[(B_t - B_s)^4] = 3(t - s)^2    (14)
\mathrm{Var}[(B_t - B_s)^2] = E[(B_t - B_s)^4] - E[(B_t - B_s)^2]^2 = 2(t - s)^2    (15)
\mathrm{Cov}(B_s, B_t) = s,  for t > s    (16)
\mathrm{Corr}(B_s, B_t) = \sqrt{s/t},  for t > s    (17)

Proof: For the variance of (B_t - B_s)^2 we used the fact that for a standard Gaussian random variable Z,

E(Z^4) = 3    (18)

Note that

\mathrm{Var}(B_T) = \mathrm{Var}(B_T - B_0) = T    (19)

since P(B_0 = 0) = 1, and for all \Delta t \ge 0,

\mathrm{Var}(B_{t+\Delta t} - B_t) = \Delta t    (20)

Moreover,

\mathrm{Cov}(B_s, B_t) = \mathrm{Cov}(B_s, B_s + (B_t - B_s)) = \mathrm{Cov}(B_s, B_s) + \mathrm{Cov}(B_s, B_t - B_s) = \mathrm{Var}(B_s) = s    (21)

since B_s and B_t - B_s are uncorrelated.

2.1.2 Distributional Properties

Let B represent a standard Brownian motion (SBM) process.

• Self-similarity: For any c \ne 0, X_t = \frac{1}{\sqrt{c}} B_{ct} is SBM. We can use this property to simulate SBM in any given interval [0, T] if we know how to simulate it in the interval [0, 1]: if B is SBM in [0, 1] and c = 1/T, then X_t = \sqrt{T}\, B_{t/T} is SBM in [0, T].
• Time inversion: X_t = t B_{1/t} is SBM.
• Time reversal: X_t = B_T - B_{T-t} is SBM in the interval [0, T].
• Symmetry: X_t = -B_t is SBM.

2.1.3 Pathwise Properties

• Brownian motion sample paths are non-differentiable with probability 1.

This is the basic reason why we need to develop a generalization of ordinary calculus to handle stochastic differential equations. If we were to define such equations simply as

\frac{dX_t}{dt} = a(X_t) + c(X_t)\frac{dB_t}{dt}    (22)

we would have the obvious problem that the derivative of Brownian motion does not exist.

Proof: Let X be a real valued stochastic process. For a fixed t let \pi = \{0 = t_0 \le t_1 \le \cdots \le t_n = t\} be a partition of the interval [0, t]. Let \|\pi\| be the norm of the partition. The quadratic variation of X at t is a random variable, denoted \langle X, X\rangle_t, defined as follows

\langle X, X\rangle_t = \lim_{\|\pi\|\to 0} \sum_{k=1}^{n} |X_{t_k} - X_{t_{k-1}}|^2    (23)

We will show that the quadratic variation of SBM is larger than zero with probability one, and therefore that Brownian paths are non-differentiable with probability 1. Let B be a standard Brownian motion. For a partition \pi = \{0 = t_0 \le t_1 \le \cdots \le t_n = t\} let B^\pi_k be defined as follows

B^\pi_k = B_{t_k}    (24)

Let

S^\pi = \sum_{k=0}^{n-1} (\Delta B^\pi_k)^2    (25)

Note that

E(S^\pi) = \sum_{k=0}^{n-1} (t_{k+1} - t_k) = t    (26)

and

0 \le \mathrm{Var}(S^\pi) = \sum_{k=0}^{n-1} \mathrm{Var}[(\Delta B^\pi_k)^2] = 2\sum_{k=0}^{n-1} (t_{k+1} - t_k)^2 \le 2\|\pi\| \sum_{k=0}^{n-1} (t_{k+1} - t_k) = 2\|\pi\|\, t    (27)

Thus

\lim_{\|\pi\|\to 0} \mathrm{Var}(S^\pi) = \lim_{\|\pi\|\to 0} E\Big[\Big(\sum_{k=0}^{n-1} (\Delta B^\pi_k)^2 - t\Big)^2\Big] = 0    (28)

This shows mean square convergence of S^\pi to t, which implies convergence in probability. (I think) almost sure convergence can also be shown.

Comments:

• If we were to define the stochastic integral \int_0^t (dB_s)^2 as

\int_0^t (dB_s)^2 = \lim_{\|\pi\|\to 0} S^\pi    (29)

then

\int_0^t (dB_s)^2 = \int_0^t ds = t    (30)

• If a path X_t(\omega) were differentiable almost everywhere in the interval [0, T], then

\langle X, X\rangle_t(\omega) \le \lim_{\Delta t \to 0} \sum_{k=0}^{n-1} \big(\Delta t\, X'_{t_k}(\omega)\big)^2    (31)
\le \Big(\max_{t\in[0,T]} X'_t(\omega)^2\Big) \lim_{n\to\infty} \sum_k \Delta t^2    (32)
= \Big(\max_{t\in[0,T]} X'_t(\omega)^2\Big) \lim_{n\to\infty} n\,(T/n)^2 = 0    (33)

where X' = dX/dt. Since Brownian paths have non-zero quadratic variation with probability one, they are also non-differentiable with probability one.

2.2 Simulating Brownian Motion

Let \pi = \{0 = t_0 \le t_1 \le \cdots \le t_n = t\} be a partition of the interval [0, t]. Let \{Z_1, \ldots, Z_n\} be i.i.d. Gaussian random variables with E(Z_i) = 0 and \mathrm{Var}(Z_i) = 1. Define the stochastic process B^\pi as follows:

B^\pi_{t_0} = 0    (34)
B^\pi_{t_1} = B^\pi_{t_0} + \sqrt{t_1 - t_0}\, Z_1    (35)
\vdots    (36)
B^\pi_{t_k} = B^\pi_{t_{k-1}} + \sqrt{t_k - t_{k-1}}\, Z_k    (37)
B^\pi_t = B^\pi_{t_{k-1}}  for t \in [t_{k-1}, t_k)    (38)

For each partition \pi this defines a continuous time process. It can be shown that as \|\pi\| \to 0 the process B^\pi converges in distribution to standard Brownian motion.

2.2.1 Exercise

Simulate Brownian motion and verify numerically the following properties:

E(B_t) = 0    (39)
\mathrm{Var}(B_t) = t    (40)
\int_0^t (dB_s)^2 = \int_0^t ds = t    (41)
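As a companion to this exercise, here is a small Python/NumPy sketch (an illustration only; sample sizes and the random seed are arbitrary choices) that simulates many Brownian paths by the construction of Section 2.2 and checks properties (39)-(41) by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)
t, n, n_paths = 1.0, 1_000, 20_000
dt = t / n

# Brownian increments for many independent paths: each increment ~ N(0, dt).
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
B = np.cumsum(dB, axis=1)            # B along each path (B_0 = 0 implicitly)

print(B[:, -1].mean())               # ~ 0        (equation 39)
print(B[:, -1].var())                # ~ t = 1    (equation 40)
print((dB ** 2).sum(axis=1).mean())  # ~ t = 1    (equation 41)
```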

3 The Ito Stochastic Integral

We want to give meaning to the expression

\int_0^t Y_s\, dB_s    (42)

where B is standard Brownian motion and Y is a process that does not anticipate the future of Brownian motion. For example, Y_t = B_{t+2} would not be a valid integrand. A random process Y is simply a set of functions f(t, \cdot) from an outcome space \Omega to the real numbers, i.e., for each \omega \in \Omega

Y_t(\omega) = f(t, \omega)    (43)

We will first study the case in which f is piecewise constant. In that case there is a partition \pi = \{0 = t_0 \le t_1 \le \cdots \le t_n = t\} of the interval [0, t] such that

f_n(t, \omega) = \sum_{k=0}^{n-1} C_k(\omega)\, \xi_k(t)    (44)

where

\xi_k(t) = \begin{cases} 1 & \text{if } t \in [t_k, t_{k+1}) \\ 0 & \text{else} \end{cases}    (45)

and C_k is a non-anticipatory random variable, i.e., a function of X_0 and the Brownian noise up to time t_k. For such a piecewise constant process Y_t(\omega) = f_n(t, \omega) we define the stochastic integral as follows. For each outcome \omega \in \Omega

\int_0^t Y_s(\omega)\, dB_s(\omega) = \sum_{k=0}^{n-1} C_k(\omega)\,\big[B_{t_{k+1}}(\omega) - B_{t_k}(\omega)\big]    (46)

More succinctly,

\int_0^t Y_s\, dB_s = \sum_{k=0}^{n-1} C_k\,\big[B_{t_{k+1}} - B_{t_k}\big]    (47)

This leads us to the more general definition of the Ito integral.

Definition of the Ito Integral. Let f(t, \cdot) be a non-anticipatory function from an outcome space \Omega to the real numbers. Let \{f_1, f_2, \ldots\} be a sequence of elementary non-anticipatory functions such that

\lim_{n\to\infty} E\Big[\int_0^t \big(f(s, \omega) - f_n(s, \omega)\big)^2 ds\Big] = 0    (48)

Let the random process Y be defined as Y_t(\omega) = f(t, \omega). Then the Ito integral

\int_0^t Y_s\, dB_s    (49)

is a random variable defined as follows. For each outcome \omega \in \Omega

\int_0^t f(s, \omega)\, dB_s(\omega) = \lim_{n\to\infty} \int_0^t f_n(s, \omega)\, dB_s(\omega)    (50)

where the limit is in L^2(P). It can be shown that an approximating sequence f_1, f_2, \ldots satisfying (48) exists. Moreover, the limit in (50) also exists and is independent of the choice of the approximating sequence.

Comment: Strictly speaking we need f to be measurable, i.e., to induce a proper random variable. We also need f(t, \cdot) to be F_t adapted. This basically means that Y_t must be a function of Y_0 and the Brownian motion up to time t; it cannot be a function of future values of B. Moreover, we need E[\int_0^t f(s, \cdot)^2 ds] < \infty.

3.1 Properties of the Ito Integral

Here I_t = \int_0^t X_s\, dB_s for a non-anticipatory process X.

• E(I_t) = 0    (51)

• \mathrm{Var}(I_t) = E(I_t^2) = \int_0^t E(X_s^2)\, ds    (52)

• \int_0^t (X_s + Y_s)\, dB_s = \int_0^t X_s\, dB_s + \int_0^t Y_s\, dB_s    (53)

• \int_0^T X_s\, dB_s = \int_0^t X_s\, dB_s + \int_t^T X_s\, dB_s  for t \in (0, T)    (54)

• The Ito integral is a martingale process:

E(I_t \mid F_s) = I_s  for all t > s    (55)

where E(I_t \mid F_s) is the least squares prediction of I_t based on all the information available up to time s.
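To illustrate properties (51) and (52), the following Python sketch approximates I_t = \int_0^t B_s\, dB_s with left-endpoint (non-anticipatory) sums and checks that the sample mean is near 0 and the sample variance is near \int_0^t E(B_s^2)\, ds = t^2/2. The choice of integrand B_s, the seed, and the sample sizes are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(1)
t, n, n_paths = 1.0, 1_000, 50_000
dt = t / n

dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n))
B = np.cumsum(dB, axis=1)
B_left = np.hstack([np.zeros((n_paths, 1)), B[:, :-1]])   # left endpoints (non-anticipatory)

I = np.sum(B_left * dB, axis=1)      # I_t = ∫_0^t B_s dB_s as left-point sums

print(I.mean())                      # ~ 0                      (equation 51)
print(I.var())                       # ~ t^2 / 2 = 0.5          (equation 52 with X_s = B_s)
```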

4 Stochastic Differential Equations

In the introduction we defined a limit process X arising from a dynamical system expressed as a differential equation plus a Brownian noise perturbation of the system dynamics. The process was a solution to the following equation

X_t = X_0 + \int_0^t a(X_s)\, ds + I_t    (56)

where

I_t = \lim_{\|\pi\|\to 0} \sum_{k=0}^{n-1} c(X^\pi_{t_k})\, \Delta B_k    (57)

It should now be clear that I_t is in fact an Ito stochastic integral

I_t = \int_0^t c(X_s)\, dB_s    (58)

and thus X can be expressed as the solution of the following stochastic integral equation

X_t = X_0 + \int_0^t a(X_s)\, ds + \int_0^t c(X_s)\, dB_s    (59)

It is convenient to express the integral equation above using differential notation

dX_t = a(X_t)\, dt + c(X_t)\, dB_t    (60)

with given initial condition X_0. We call this an Ito stochastic differential equation (SDE). The differential notation is simply a pointer to, and thus acquires its meaning from, the corresponding integral equation.

4.1 Second Order Differentials

The following rules are useful:

\int_0^t X_s\, (ds)^2 = 0    (61)
\int_0^t X_s\, dB_s\, ds = 0    (62)
\int_0^t X_s\, dB_s\, dW_s = 0  if B, W are independent Brownian motions    (63)
\int_0^t X_s\, (dB_s)^2 = \int_0^t X_s\, ds    (64)

Symbolically this is commonly expressed as follows:

(dt)^2 = 0    (66)
dB_t\, dt = 0    (67)
dB_t\, dW_t = 0    (68)
(dB_t)^2 = dt    (69)

Sketch of proof: Let \pi = \{0 = t_0 \le t_1 \le \cdots \le t_n = t\} be a partition of [0, t] with equal intervals, i.e., t_{k+1} - t_k = \Delta t.

• Regarding (dt)^2 = 0, note

\lim_{\Delta t\to 0} \sum_{k=0}^{n-1} X_{t_k}\, \Delta t^2 = \lim_{\Delta t\to 0} \Delta t \int_0^t X_s\, ds = 0    (70)

• Regarding dB_t\, dt = 0, note

\lim_{\Delta t\to 0} \sum_{k=0}^{n-1} X_{t_k}\, \Delta t\, \Delta B_k = \lim_{\Delta t\to 0} \Delta t \int_0^t X_s\, dB_s = 0    (71)

• Regarding (dB_t)^2 = dt, note

E\Big[\Big(\sum_{k=0}^{n-1} X_{t_k}\, \Delta B_k^2 - \sum_{k=0}^{n-1} X_{t_k}\, \Delta t\Big)^2\Big] = E\Big[\Big(\sum_{k=0}^{n-1} X_{t_k}(\Delta B_k^2 - \Delta t)\Big)^2\Big] = \sum_{k=0}^{n-1}\sum_{k'=0}^{n-1} E\big[X_{t_k} X_{t_{k'}} (\Delta B_k^2 - \Delta t)(\Delta B_{k'}^2 - \Delta t)\big]    (72)

If k > k' then (\Delta B_k^2 - \Delta t) is independent of X_{t_k} X_{t_{k'}} (\Delta B_{k'}^2 - \Delta t), and therefore

E\big[X_{t_k} X_{t_{k'}} (\Delta B_k^2 - \Delta t)(\Delta B_{k'}^2 - \Delta t)\big] = E\big[X_{t_k} X_{t_{k'}} (\Delta B_{k'}^2 - \Delta t)\big]\, E\big[\Delta B_k^2 - \Delta t\big] = 0    (73)

Equivalently, if k' > k then (\Delta B_{k'}^2 - \Delta t) is independent of X_{t_k} X_{t_{k'}} (\Delta B_k^2 - \Delta t), and therefore

E\big[X_{t_k} X_{t_{k'}} (\Delta B_k^2 - \Delta t)(\Delta B_{k'}^2 - \Delta t)\big] = E\big[X_{t_k} X_{t_{k'}} (\Delta B_k^2 - \Delta t)\big]\, E\big[\Delta B_{k'}^2 - \Delta t\big] = 0    (74)

Thus

\sum_{k=0}^{n-1}\sum_{k'=0}^{n-1} E\big[X_{t_k} X_{t_{k'}} (\Delta B_k^2 - \Delta t)(\Delta B_{k'}^2 - \Delta t)\big] = \sum_{k=0}^{n-1} E\big[X_{t_k}^2 (\Delta B_k^2 - \Delta t)^2\big]    (75)

Note that since \Delta B_k is independent of X_{t_k},

E\big[X_{t_k}^2 (\Delta B_k^2 - \Delta t)^2\big] = E[X_{t_k}^2]\, E\big[(\Delta B_k^2 - \Delta t)^2\big] = E[X_{t_k}^2]\, \mathrm{Var}(\Delta B_k^2) = 2\, E[X_{t_k}^2]\, \Delta t^2    (76)

Thus

E\Big[\Big(\sum_{k=0}^{n-1} X_{t_k}\, \Delta B_k^2 - \sum_{k=0}^{n-1} X_{t_k}\, \Delta t\Big)^2\Big] = 2\sum_{k=0}^{n-1} E[X_{t_k}^2]\, \Delta t^2    (78)

which goes to zero as \Delta t \to 0. Thus, in the limit as \Delta t \to 0,

\lim_{\Delta t\to 0} \sum_{k=0}^{n-1} X_{t_k}\, \Delta B_k^2 = \lim_{\Delta t\to 0} \sum_{k=0}^{n-1} X_{t_k}\, \Delta t    (79)

where the limit is taken in the mean square sense. Thus

\int_0^t X_s\, (dB_s)^2 = \int_0^t X_s\, ds    (80)

• Regarding dB_t\, dW_t = 0, note

E\Big[\Big(\sum_{k=0}^{n-1} X_{t_k}\, \Delta B_k\, \Delta W_k\Big)^2\Big] = \sum_{k=0}^{n-1}\sum_{k'=0}^{n-1} E\big[X_{t_k} X_{t_{k'}}\, \Delta B_k \Delta W_k \Delta B_{k'} \Delta W_{k'}\big]    (81)

If k > k' then \Delta B_k, \Delta W_k are independent of X_{t_k} X_{t_{k'}} \Delta B_{k'} \Delta W_{k'}, and therefore

E\big[X_{t_k} X_{t_{k'}} \Delta B_k \Delta W_k \Delta B_{k'} \Delta W_{k'}\big] = E\big[X_{t_k} X_{t_{k'}} \Delta B_{k'} \Delta W_{k'}\big]\, E[\Delta B_k]\, E[\Delta W_k] = 0    (82)

Equivalently, if k' > k then \Delta B_{k'}, \Delta W_{k'} are independent of X_{t_k} X_{t_{k'}} \Delta B_k \Delta W_k, and therefore

E\big[X_{t_k} X_{t_{k'}} \Delta B_k \Delta W_k \Delta B_{k'} \Delta W_{k'}\big] = E\big[X_{t_k} X_{t_{k'}} \Delta B_k \Delta W_k\big]\, E[\Delta B_{k'}]\, E[\Delta W_{k'}] = 0    (83)

Finally, for k = k', \Delta B_k, \Delta W_k and X_{t_k} are independent, thus

E\big[X_{t_k}^2 \Delta B_k^2 \Delta W_k^2\big] = E[X_{t_k}^2]\, E[\Delta B_k^2]\, E[\Delta W_k^2] = E[X_{t_k}^2]\, \Delta t^2    (84)

Thus

E\Big[\Big(\sum_{k=0}^{n-1} X_{t_k}\, \Delta B_k\, \Delta W_k\Big)^2\Big] = \sum_{k=0}^{n-1} E[X_{t_k}^2]\, \Delta t^2    (85)

which converges to 0 as \Delta t \to 0. Thus

\int_0^t X_s\, dB_s\, dW_s = 0    (86)

4.2 Vector Stochastic Differential Equations

The form

dX_t = a(X_t)\, dt + c(X_t)\, dB_t    (87)

is also used to represent multivariate equations. In this case X_t represents an n-dimensional random vector, B_t an m-dimensional vector of m independent standard Brownian motions, and c(X_t) an n \times m matrix. a is commonly known as the drift vector and c the dispersion matrix.

5 Ito's Rule

The main point of Ito's calculus is that in the general case a differential carries quadratic as well as linear components. For example, suppose X_t is an Ito process and let

Y_t = f(t, X_t)    (88)

Then

dY_t = \nabla f(t, X_t)^T dX_t + \tfrac{1}{2}\, dX_t^T \nabla^2 f(t, X_t)\, dX_t    (89)

where \nabla, \nabla^2 are the gradient and Hessian with respect to (t, x). Basically this is the second order Taylor series expansion. In ordinary calculus the second order terms vanish, but in stochastic calculus, because these processes have non-zero quadratic variation, the quadratic terms do not go away. This is really all you need to remember about stochastic calculus; everything else derives from this basic fact. The most important consequence of this fact is Ito's rule. Let X_t be governed by an SDE

dX_t = a(X_t, t)\, dt + c(X_t, t)\, dB_t    (90)

Let Y_t = f(X_t, t). Ito's rule tells us that Y_t is governed by the following SDE

dY_t = \nabla_t f(X_t, t)\, dt + \nabla_x f(X_t, t)^T dX_t + \tfrac{1}{2}\, dX_t^T \nabla_x^2 f(X_t, t)\, dX_t    (91)

where, by definition,

dB_{i,t}\, dB_{j,t} \stackrel{def}{=} \delta(i, j)\, dt    (92)
dX_t\, dt \stackrel{def}{=} 0    (93)
(dt)^2 \stackrel{def}{=} 0    (94)

Equivalently,

dY_t = \nabla_t f(X_t, t)\, dt + \nabla_x f(X_t, t)^T a(X_t, t)\, dt + \nabla_x f(X_t, t)^T c(X_t, t)\, dB_t + \tfrac{1}{2}\, \mathrm{trace}\big[c(X_t, t)\, c(X_t, t)^T \nabla_x^2 f(X_t, t)\big]\, dt    (95)

where

\nabla_x f(x, t)^T a(x, t) = \sum_i \frac{\partial f(x, t)}{\partial x_i}\, a_i(x, t)    (96)

\mathrm{trace}\big[c(x, t)\, c(x, t)^T \nabla_x^2 f(x, t)\big] = \sum_i \sum_j (c(x, t)\, c(x, t)^T)_{ij}\, \frac{\partial^2 f(x, t)}{\partial x_i \partial x_j}    (97)

Note that c is a matrix.

Sketch of Proof: To second order,

\Delta Y_t = f(X_{t+\Delta t}, t + \Delta t) - f(X_t, t) = \nabla_t f(X_t, t)\, \Delta t + \nabla_x f(X_t, t)^T \Delta X_t + \tfrac{1}{2}\, \Delta t^2\, \nabla_t^2 f(X_t, t) + \tfrac{1}{2}\, \Delta X_t^T \nabla_x^2 f(X_t, t)\, \Delta X_t + \Delta t\, (\nabla_x \nabla_t f(X_t, t))^T \Delta X_t    (98)

where \nabla_t, \nabla_x are the gradients with respect to time and state, \nabla_t^2 is the second derivative with respect to time, \nabla_x^2 is the Hessian with respect to the state, and \nabla_x \nabla_t is the gradient with respect to the state of the gradient with respect to time. Integrating over time,

Y_t = Y_0 + \sum_{k=0}^{n-1} \Delta Y_{t_k}    (99)

and taking limits,

Y_t = Y_0 + \int_0^t dY_s = Y_0 + \int_0^t \nabla_t f(X_s, s)\, ds + \int_0^t \nabla_x f(X_s, s)^T dX_s + \tfrac{1}{2}\int_0^t \nabla_t^2 f(X_s, s)\, (ds)^2 + \tfrac{1}{2}\int_0^t dX_s^T \nabla_x^2 f(X_s, s)\, dX_s + \int_0^t (\nabla_x \nabla_t f(X_s, s))^T dX_s\, ds    (100)

In differential form,

dY_t = \nabla_t f(X_t, t)\, dt + \nabla_x f(X_t, t)^T dX_t + \tfrac{1}{2}\, \nabla_t^2 f(X_t, t)\, (dt)^2 + \tfrac{1}{2}\, dX_t^T \nabla_x^2 f(X_t, t)\, dX_t + (\nabla_x \nabla_t f(X_t, t))^T dX_t\, dt    (101)

Expanding dX_t,

(\nabla_x \nabla_t f(X_t, t))^T dX_t\, dt = (\nabla_x \nabla_t f(X_t, t))^T a(X_t, t)\, (dt)^2 + (\nabla_x \nabla_t f(X_t, t))^T c(X_t, t)\, dB_t\, dt = 0    (102)

where we used the standard rules for second order differentials

(dt)^2 = 0    (103)
dB_t\, dt = 0    (104)

Moreover,

dX_t^T \nabla_x^2 f(X_t, t)\, dX_t = (a(X_t, t)\, dt + c(X_t, t)\, dB_t)^T \nabla_x^2 f(X_t, t)\, (a(X_t, t)\, dt + c(X_t, t)\, dB_t)
= a(X_t, t)^T \nabla_x^2 f(X_t, t)\, a(X_t, t)\, (dt)^2 + 2\, a(X_t, t)^T \nabla_x^2 f(X_t, t)\, c(X_t, t)\, dB_t\, dt + dB_t^T c(X_t, t)^T \nabla_x^2 f(X_t, t)\, c(X_t, t)\, dB_t    (106)

Using the rules for second order differentials,

(dt)^2 = 0    (107)
dB_t\, dt = 0    (108)
dB_t^T K(X_t, t)\, dB_t = \sum_i \sum_j K_{i,j}(X_t, t)\, dB_{i,t}\, dB_{j,t} = \sum_i K_{i,i}\, dt    (109)

where

K(X_t, t) = c(X_t, t)^T \nabla_x^2 f(X_t, t)\, c(X_t, t)    (110)

Thus

dY_t = \nabla_t f(X_t, t)\, dt + \nabla_x f(X_t, t)^T a(X_t, t)\, dt + \nabla_x f(X_t, t)^T c(X_t, t)\, dB_t + \tfrac{1}{2}\, \mathrm{trace}\big[c(X_t, t)\, c(X_t, t)^T \nabla_x^2 f(X_t, t)\big]\, dt    (111)

where we used the fact that

\sum_i K_{ii}(X_t, t)\, dt = \mathrm{trace}(K)\, dt = \mathrm{trace}\big[c(X_t, t)^T \nabla_x^2 f(X_t, t)\, c(X_t, t)\big]\, dt = \mathrm{trace}\big[c(X_t, t)\, c(X_t, t)^T \nabla_x^2 f(X_t, t)\big]\, dt    (112)

5.1 Product Rule

Let X, Y be Ito processes. Then

d(X_t Y_t) = X_t\, dY_t + Y_t\, dX_t + dX_t\, dY_t    (113)

Proof: Consider (X, Y) as a joint Ito process and take f(x, y, t) = xy. Then

\partial f/\partial t = 0    (114)
\partial f/\partial x = y    (115)
\partial f/\partial y = x    (116)
\partial^2 f/\partial x \partial y = 1    (117)
\partial^2 f/\partial x^2 = \partial^2 f/\partial y^2 = 0    (118)

Applying Ito's rule, the product rule follows.

Exercise: Solve \int_0^T B_t\, dB_t symbolically. Let a(X_t, t) = 0, c(X_t, t) = 1, f(x, t) = x^2. Thus

dX_t = dB_t    (119)
X_t = B_t    (120)

and

\partial f(t, x)/\partial t = 0    (121)
\partial f(t, x)/\partial x = 2x    (122)
\partial^2 f(t, x)/\partial x^2 = 2    (123)

Applying Ito's rule,

df(X_t, t) = \frac{\partial f(X_t, t)}{\partial t}\, dt + \frac{\partial f(X_t, t)}{\partial x}\, a(X_t, t)\, dt + \frac{\partial f(X_t, t)}{\partial x}\, c(X_t, t)\, dB_t + \frac{1}{2}\, \mathrm{trace}\Big[c(X_t, t)\, c(X_t, t)^T \frac{\partial^2 f(X_t, t)}{\partial x^2}\Big]\, dt    (124)

we get

dB_t^2 = 2 B_t\, dB_t + dt    (125)

Equivalently,

\int_0^t dB_s^2 = 2\int_0^t B_s\, dB_s + \int_0^t ds    (126)
B_t^2 = 2\int_0^t B_s\, dB_s + t    (127)

Therefore

\int_0^t B_s\, dB_s = \tfrac{1}{2} B_t^2 - \tfrac{1}{2} t    (128)
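Equation (128) can be checked numerically. The sketch below (illustrative only; a single path with arbitrary step count and seed) compares a left-endpoint Riemann sum for \int_0^t B_s\, dB_s against (B_t^2 - t)/2.

```python
import numpy as np

rng = np.random.default_rng(2)
t, n = 1.0, 200_000
dt = t / n

dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.cumsum(dB)
B_left = np.concatenate([[0.0], B[:-1]])    # left (non-anticipatory) endpoints

lhs = np.sum(B_left * dB)                   # ∫_0^t B_s dB_s as a left-point sum
rhs = 0.5 * B[-1] ** 2 - 0.5 * t            # closed form from equation (128)
print(lhs, rhs)                             # the two values should be close
```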

NOTE: dB_t^2 (the differential of B_t^2) is different from (dB_t)^2.

Exercise: Compute E[e^{\beta B_t}].

Let a(X_t, t) = 0, c(X_t, t) = 1, i.e., dX_t = dB_t, and let Y_t = f(X_t, t) = e^{\beta B_t}. Using Ito's rule,

dY_t = \beta e^{\beta B_t}\, dB_t + \tfrac{1}{2}\beta^2 e^{\beta B_t}\, dt    (129)
Y_t = Y_0 + \beta \int_0^t e^{\beta B_s}\, dB_s + \frac{\beta^2}{2}\int_0^t e^{\beta B_s}\, ds    (130)

Taking expected values,

E[Y_t] = E[Y_0] + \frac{\beta^2}{2}\int_0^t E[Y_s]\, ds    (131)

where we used the fact that E[\int_0^t e^{\beta B_s}\, dB_s] = 0, because for any non-anticipatory random process Y we know that E[\int_0^t Y_s\, dB_s] = 0. Thus

\frac{dE[Y_t]}{dt} = \frac{\beta^2}{2}\, E[Y_t]    (132)

and since E[Y_0] = 1,

E[e^{\beta B_t}] = e^{\frac{\beta^2}{2} t}    (133)
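A quick Monte Carlo check of (133), using the fact that B_t \sim N(0, t); the values of \beta, t, the seed, and the sample size below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
beta, t, n_paths = 0.7, 2.0, 1_000_000

B_t = rng.normal(0.0, np.sqrt(t), size=n_paths)   # B_t ~ N(0, t)
print(np.exp(beta * B_t).mean())                  # Monte Carlo estimate of E[exp(beta * B_t)]
print(np.exp(0.5 * beta ** 2 * t))                # closed form from equation (133)
```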

Exercise: Solve the following SDE

dX_t = \alpha X_t\, dt + \beta X_t\, dB_t    (134)

In this case a(X_t, t) = \alpha X_t, c(X_t, t) = \beta X_t. Using Ito's formula for f(x, t) = \log(x),

\partial f(t, x)/\partial t = 0    (135)
\partial f(t, x)/\partial x = 1/x    (136)
\partial^2 f(t, x)/\partial x^2 = -1/x^2    (137)

Thus

d\log(X_t) = \frac{1}{X_t}\alpha X_t\, dt + \frac{1}{X_t}\beta X_t\, dB_t - \frac{1}{2 X_t^2}\beta^2 X_t^2\, dt = \Big(\alpha - \frac{\beta^2}{2}\Big) dt + \beta\, dB_t    (138)

Integrating over time,

\log(X_t) = \log(X_0) + \Big(\alpha - \frac{\beta^2}{2}\Big) t + \beta B_t    (139)

X_t = X_0 \exp\Big(\big(\alpha - \tfrac{\beta^2}{2}\big) t\Big) \exp(\beta B_t)    (140)

Note that

E[X_t] = E[X_0]\, e^{(\alpha - \beta^2/2) t}\, E[e^{\beta B_t}] = E[X_0]\, e^{\alpha t}    (141)

6 Moment Equations

Consider an SDE of the form

dX_t = a(X_t)\, dt + c(X_t)\, dB_t    (142)

Taking expected values we get the differential equation for the first order moments

\frac{dE[X_t]}{dt} = E[a(X_t)]    (143)

(It may seem weird that c has no effect; double check against the generator of an Ito diffusion result.) With respect to second order moments, let

Y_t = f(X_t) = X_{i,t} X_{j,t}    (144)

Using Ito's product rule,

dY_t = d(X_{i,t} X_{j,t}) = X_{i,t}\, dX_{j,t} + X_{j,t}\, dX_{i,t} + dX_{i,t}\, dX_{j,t}
= X_{i,t}\big(a_j(X_t)\, dt + (c(X_t)\, dB_t)_j\big) + X_{j,t}\big(a_i(X_t)\, dt + (c(X_t)\, dB_t)_i\big) + c_i(X_t)\, c_j(X_t)\, dt    (145)

where c_i denotes the i-th row of c. Taking expected values,

\frac{dE[X_{i,t} X_{j,t}]}{dt} = E[X_{i,t}\, a_j(X_t)] + E[X_{j,t}\, a_i(X_t)] + E[c_i(X_t)\, c_j(X_t)]    (146)

In matrix form,

\frac{dE[X_t X_t']}{dt} = E[X_t\, a(X_t)'] + E[a(X_t)\, X_t'] + E[c(X_t)\, c(X_t)']    (147)

The moment formulas are particularly useful when the drift is linear and the dispersion is constant in the state, i.e., a(x) = a\, x for a constant matrix a and c(x) = c for a constant matrix c. In that case

\frac{dE[X_t]}{dt} = a\, E[X_t]    (148)
\frac{dE[X_t X_t']}{dt} = a\, E[X_t X_t'] + E[X_t X_t']\, a' + c\, c'    (149)
\frac{d\mathrm{Var}[X_t]}{dt} = \frac{dE[X_t X_t']}{dt} - \frac{d(E[X_t] E[X_t]')}{dt} = a\, \mathrm{Var}[X_t] + \mathrm{Var}[X_t]\, a' + c\, c'    (150)

Example: Calculate the equilibrium mean and variance of the following process

dX_t = -X_t\, dt + c\, dB_t    (152)

The first and second moment equations are

\frac{dE[X_t]}{dt} = -E[X_t]    (153)
\frac{dE[X_t^2]}{dt} = -2\, E[X_t^2] + c^2    (154)

Thus

\lim_{t\to\infty} E[X_t] = 0    (155)
\lim_{t\to\infty} E[X_t^2] = \lim_{t\to\infty} \mathrm{Var}[X_t] = \frac{c^2}{2}    (156)
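The equilibrium values (155)-(156) can be checked by simulating the process with a simple Euler-Maruyama scheme (described later in the appendix). This is an illustrative sketch; the step size, horizon, number of paths, and value of c are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
c, dt, n_steps, n_paths = 1.5, 1e-3, 20_000, 5_000

X = np.zeros(n_paths)                      # start all paths at 0
for _ in range(n_steps):                   # Euler-Maruyama steps of dX = -X dt + c dB
    dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    X += -X * dt + c * dB

print(X.mean())          # ~ 0                    (equation 155)
print(X.var())           # ~ c^2 / 2 = 1.125      (equation 156)
```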

7 Generator of an Ito Diffusion

The generator G_t of the Ito diffusion

dX_t = a(X_t, t)\, dt + c(X_t, t)\, dB_t    (157)

is a second order partial differential operator. For any function f it provides the directional derivative of f averaged across the paths generated by the diffusion. In particular, given the function f, the function G_t[f] is defined as follows

G_t[f](x) = \frac{dE[f(X_t) \mid X_t = x]}{dt} = \lim_{\Delta t\to 0} \frac{E[f(X_{t+\Delta t}) \mid X_t = x] - f(x)}{\Delta t} = \frac{E[df(X_t) \mid X_t = x]}{dt}    (158)

Note that using Ito's rule,

df(X_t) = \nabla_x f(X_t)^T a(X_t, t)\, dt + \nabla_x f(X_t)^T c(X_t, t)\, dB_t + \tfrac{1}{2}\, \mathrm{trace}\big[c(X_t, t)\, c(X_t, t)^T \nabla_x^2 f(X_t)\big]\, dt    (159)

Taking expected values,

G_t[f](x) = \frac{E[df(X_t) \mid X_t = x]}{dt} = \nabla_x f(x)^T a(x, t) + \tfrac{1}{2}\, \mathrm{trace}\big[c(x, t)\, c(x, t)^T \nabla_x^2 f(x)\big]    (160)

In other words,

G_t[\cdot] = \sum_i a_i(x, t)\, \frac{\partial}{\partial x_i}[\cdot] + \frac{1}{2}\sum_i \sum_j (c(x, t)\, c(x, t)^T)_{i,j}\, \frac{\partial^2}{\partial x_i \partial x_j}[\cdot]    (161)

8 Adjoints

Every linear operator G on a Hilbert space H with inner product \langle \cdot, \cdot\rangle has a corresponding adjoint operator G^* such that

\langle G x, y\rangle = \langle x, G^* y\rangle  for all x, y \in H    (162)

In our case the elements of the Hilbert space are functions f, g and the inner product is of the form

\langle f, g\rangle = \int f(x)\, g(x)\, dx    (163)

Using integration by parts it can be shown that if

G[f](x) = \sum_i \frac{\partial f(x)}{\partial x_i}\, a_i(x, t) + \frac{1}{2}\, \mathrm{trace}\big[c(x, t)\, c(x, t)^T \nabla_x^2 f(x)\big]    (164)

then

G^*[f](x) = -\sum_i \frac{\partial}{\partial x_i}\big[f(x)\, a_i(x, t)\big] + \frac{1}{2}\sum_{i,j} \frac{\partial^2}{\partial x_i \partial x_j}\big[(c(x, t)\, c(x, t)^T)_{ij}\, f(x)\big]    (166)

9 The Feynman-Kac Formula (Terminal Condition Version)

Let X be an Ito diffusion

dX_t = a(X_t, t)\, dt + c(X_t, t)\, dB_t    (167)

with generator G_t,

G_t[v](x) = \sum_i a_i(x, t)\, \frac{\partial v(x, t)}{\partial x_i} + \frac{1}{2}\sum_i \sum_j (c(x, t)\, c(x, t)^T)_{i,j}\, \frac{\partial^2 v(x, t)}{\partial x_i \partial x_j}    (168)

Let v be the solution to the following PDE

-\frac{\partial v(x, t)}{\partial t} = G_t[v](x, t) - v(x, t)\, f(x, t)    (169)

with a known terminal condition v(x, T) and a known function f. It can be shown that the solution to the PDE (169) is as follows

v(x, s) = E\Big[v(X_T, T)\, \exp\Big(-\int_s^T f(X_t)\, dt\Big)\,\Big|\, X_s = x\Big]    (170)

We can think of v(X_T, T) as a terminal reward and of \int_s^T f(X_t)\, dt as a discount factor.

Informal Proof: Let s \le t \le T, let Y_t = v(X_t, t), Z_t = \exp(-\int_s^t f(X_\tau)\, d\tau), and U_t = Y_t Z_t. It can be shown (see the lemma below) that

dZ_t = -Z_t\, f(X_t)\, dt    (171)

Using Ito's product rule,

dU_t = d(Y_t Z_t) = Z_t\, dY_t + Y_t\, dZ_t + dY_t\, dZ_t    (172)

Since dZ_t has only a dt term, it follows that dY_t\, dZ_t = 0. Thus

dU_t = Z_t\, dv(X_t, t) - v(X_t, t)\, Z_t\, f(X_t)\, dt    (173)

Using Ito's rule on dv we get

dv(X_t, t) = \nabla_t v(X_t, t)\, dt + (\nabla_x v(X_t, t))^T a(X_t, t)\, dt + (\nabla_x v(X_t, t))^T c(X_t, t)\, dB_t + \tfrac{1}{2}\, \mathrm{trace}\big[c(X_t, t)\, c(X_t, t)^T \nabla_x^2 v(X_t, t)\big]\, dt    (174)

Thus

dU_t = Z_t\Big[\nabla_t v(X_t, t) + (\nabla_x v(X_t, t))^T a(X_t, t) + \tfrac{1}{2}\, \mathrm{trace}\big[c(X_t, t)\, c(X_t, t)^T \nabla_x^2 v(X_t, t)\big] - v(X_t, t)\, f(X_t)\Big] dt + Z_t\, (\nabla_x v(X_t, t))^T c(X_t, t)\, dB_t    (175)

and since v is the solution to (169), the bracketed term vanishes, so

dU_t = Z_t\, (\nabla_x v(X_t, t))^T c(X_t, t)\, dB_t    (176)

Integrating,

U_T - U_s = \int_s^T Z_t\, (\nabla_x v(X_t, t))^T c(X_t, t)\, dB_t    (177)

Taking expected values,

E[U_T \mid X_s = x] - E[U_s \mid X_s = x] = 0    (178)

where we used the fact that the expected value of integrals with respect to Brownian motion is zero. Thus, since U_s = Y_s Z_s = v(X_s, s),

E[U_T \mid X_s = x] = E[U_s \mid X_s = x] = v(x, s)    (179)

Using the definition of U_T we get

v(x, s) = E\big[v(X_T, T)\, e^{-\int_s^T f(X_t)\, dt} \mid X_s = x\big]    (180)

We end the proof by showing that

dZ_t = -Z_t\, f(X_t)\, dt    (181)

First let Y_t = \int_s^t f(X_\tau)\, d\tau and note

\Delta Y_t = \int_t^{t+\Delta t} f(X_\tau)\, d\tau \approx f(X_t)\, \Delta t    (182)
dY_t = f(X_t)\, dt    (183)

Let Z_t = \exp(-Y_t). Using Ito's rule,

dZ_t = \nabla e^{-Y_t}\, dY_t + \tfrac{1}{2}\nabla^2 e^{-Y_t}\, (dY_t)^2 = -e^{-Y_t} f(X_t)\, dt = -Z_t\, f(X_t)\, dt    (184)

where we used the fact that

(dY_t)^2 = f(X_t)^2\, (dt)^2 = 0    (185)

10 Kolmogorov Backward Equation

The Kolmogorov backward equation tells us, at time s, the probability that at a future time T the system will be in a target set A. We let \xi be the indicator function of A, i.e., \xi(x) = 1 if x \in A, otherwise it is zero. We want to know, for every state x at time s < T, the probability of ending up in the target set A at time T. This is called the hit probability. Let X be an Ito diffusion

dX_t = a(X_t, t)\, dt + c(X_t, t)\, dB_t    (186)
X_0 = x    (187)

The hit probability p(x, t) satisfies the Kolmogorov backward PDE

-\frac{\partial p(x, t)}{\partial t} = G_t[p](x, t)    (188)

i.e.,

-\frac{\partial p(x, t)}{\partial t} = \sum_i a_i(x, t)\, \frac{\partial p(x, t)}{\partial x_i} + \frac{1}{2}\sum_{i,j} (c(x, t)\, c(x, t)^T)_{ij}\, \frac{\partial^2 p(x, t)}{\partial x_i \partial x_j}    (189)

subject to the final condition p(x, T) = \xi(x). The equation can be derived from the Feynman-Kac formula, noting that the hit probability is an expected value over paths that originate at x at time t \le T, and setting f(x) = 0 and q(x) = \xi(x) for all x:

p(x, t) = P(X_T \in A \mid X_t = x) = E[\xi(X_T) \mid X_t = x] = E\big[q(X_T)\, e^{-\int_t^T f(X_s)\, ds} \mid X_t = x\big]    (190)

11 The Kolmogorov Forward Equation

Let X be an Ito diffusion

dX_t = a(X_t, t)\, dt + c(X_t, t)\, dB_t    (191)
X_0 = x_0    (192)

with generator G. Let p(x, t) represent the probability density of X_t evaluated at x given the initial state x_0. Then

\frac{\partial p(x, t)}{\partial t} = G^*[p](x, t)    (193)

where G^* is the adjoint of G, i.e.,

\frac{\partial p(x, t)}{\partial t} = -\sum_i \frac{\partial}{\partial x_i}\big[p(x, t)\, a_i(x, t)\big] + \frac{1}{2}\sum_{i,j} \frac{\partial^2}{\partial x_i \partial x_j}\big[(c(x, t)\, c(x, t)^T)_{ij}\, p(x, t)\big]    (194)

It is sometimes useful to express the equation in terms of the negative divergence (inflow) of a probability current J, caused by a probability velocity V:

\frac{\partial p(x, t)}{\partial t} = -\nabla \cdot J(x, t) = -\sum_i \frac{\partial J_i(x, t)}{\partial x_i}    (195)
J(x, t) = p(x, t)\, V(x, t)    (196)
V_i(x, t) = a_i(x, t) - \frac{1}{2}\sum_j k_{i,j}(x)\, \frac{\partial}{\partial x_j} \log\big(p(x, t)\, k_{i,j}(x)\big)    (197)
k(x) = c(x, t)\, c(x, t)^T    (198)

From this point of view the Kolmogorov forward equation is just a law of conservation of probability: the rate of accumulation of probability in a state x equals the inflow of probability due to the probability field V.

11.1 Example: Discretizing an SDE in State/Time

Consider the following SDE

dX_t = a(X_t)\, dt + c(X_t)\, dB_t    (199)

The Kolmogorov forward equation looks as follows

\frac{\partial p(x, t)}{\partial t} = -\frac{\partial [a(x)\, p(x, t)]}{\partial x} + \frac{1}{2}\frac{\partial^2 [c(x)^2\, p(x, t)]}{\partial x^2}    (200)

Discretizing in time and space, to first order,

\frac{\partial p(x, t)}{\partial t} \approx \frac{p(x, t + \Delta t) - p(x, t)}{\Delta t}    (201)

and

\frac{\partial [a(x)\, p(x, t)]}{\partial x} \approx \frac{a(x + \Delta x)\, p(x + \Delta x, t) - a(x - \Delta x)\, p(x - \Delta x, t)}{2\Delta x}
\approx \frac{1}{2\Delta x}\Big[\Big(a(x) + \Delta x\, \frac{\partial a(x)}{\partial x}\Big) p(x + \Delta x, t) - \Big(a(x) - \Delta x\, \frac{\partial a(x)}{\partial x}\Big) p(x - \Delta x, t)\Big]
= p(x + \Delta x, t)\Big(\frac{a(x)}{2\Delta x} + \frac{1}{2}\frac{\partial a(x)}{\partial x}\Big) - p(x - \Delta x, t)\Big(\frac{a(x)}{2\Delta x} - \frac{1}{2}\frac{\partial a(x)}{\partial x}\Big)    (202)

and

\frac{\partial^2 [c^2(x)\, p(x, t)]}{\partial x^2} \approx \frac{1}{\Delta x^2}\Big[c^2(x + \Delta x)\, p(x + \Delta x, t) + c^2(x - \Delta x)\, p(x - \Delta x, t) - 2 c^2(x)\, p(x, t)\Big]    (203)
\approx \frac{1}{\Delta x^2}\Big[\Big(c^2(x) + 2\Delta x\, c(x)\frac{\partial c(x)}{\partial x}\Big) p(x + \Delta x, t) + \Big(c^2(x) - 2\Delta x\, c(x)\frac{\partial c(x)}{\partial x}\Big) p(x - \Delta x, t) - 2 c^2(x)\, p(x, t)\Big]
= p(x + \Delta x, t)\Big[\Big(\frac{c(x)}{\Delta x}\Big)^2 + \frac{2 c(x)}{\Delta x}\frac{\partial c(x)}{\partial x}\Big] - 2 p(x, t)\Big(\frac{c(x)}{\Delta x}\Big)^2 + p(x - \Delta x, t)\Big[\Big(\frac{c(x)}{\Delta x}\Big)^2 - \frac{2 c(x)}{\Delta x}\frac{\partial c(x)}{\partial x}\Big]    (204)

Putting it together, the Kolmogorov forward equation can be approximated as follows

\frac{p(x, t + \Delta t) - p(x, t)}{\Delta t} = p(x - \Delta x, t)\Big(\frac{a(x)}{2\Delta x} - \frac{1}{2}\frac{\partial a(x)}{\partial x}\Big) - p(x + \Delta x, t)\Big(\frac{a(x)}{2\Delta x} + \frac{1}{2}\frac{\partial a(x)}{\partial x}\Big) + \frac{1}{2}\Big[p(x + \Delta x, t)\Big(\Big(\frac{c(x)}{\Delta x}\Big)^2 + \frac{2 c(x)}{\Delta x}\frac{\partial c(x)}{\partial x}\Big) - 2 p(x, t)\Big(\frac{c(x)}{\Delta x}\Big)^2 + p(x - \Delta x, t)\Big(\Big(\frac{c(x)}{\Delta x}\Big)^2 - \frac{2 c(x)}{\Delta x}\frac{\partial c(x)}{\partial x}\Big)\Big]    (205)

Rearranging terms,

p(x, t + \Delta t) = p(x, t)\Big(1 - \frac{\Delta t\, c^2(x)}{\Delta x^2}\Big) + p(x - \Delta x, t)\frac{\Delta t}{2\Delta x}\Big(\frac{c^2(x)}{\Delta x} - 2 c(x)\frac{\partial c(x)}{\partial x} + a(x) - \Delta x\frac{\partial a(x)}{\partial x}\Big) + p(x + \Delta x, t)\frac{\Delta t}{2\Delta x}\Big(\frac{c^2(x)}{\Delta x} + 2 c(x)\frac{\partial c(x)}{\partial x} - a(x) - \Delta x\frac{\partial a(x)}{\partial x}\Big)    (206)

Considering a discrete time, discrete state system,

p(X_{t+\Delta t} = x) = \sum_{x'} p(X_t = x')\, p(X_{t+\Delta t} = x \mid X_t = x')    (207)

we make the following discrete time / discrete state approximation:

p(x_{t+\Delta t} \mid x_t) = \begin{cases} \frac{\Delta t}{2\Delta x}\Big(\frac{c^2(x)}{\Delta x} - a(x) + 2 c(x)\frac{\partial c(x)}{\partial x} - \Delta x\frac{\partial a(x)}{\partial x}\Big) & \text{if } x_{t+\Delta t} = x_t - \Delta x \\ \frac{\Delta t}{2\Delta x}\Big(\frac{c^2(x)}{\Delta x} + a(x) - 2 c(x)\frac{\partial c(x)}{\partial x} - \Delta x\frac{\partial a(x)}{\partial x}\Big) & \text{if } x_{t+\Delta t} = x_t + \Delta x \\ 1 - \frac{\Delta t\, c^2(x)}{\Delta x^2} & \text{if } x_{t+\Delta t} = x_t \\ 0 & \text{else} \end{cases}    (208)

Note that if the derivative of the drift function is zero, i.e., \partial a(x)/\partial x = 0, the conditional probabilities add up to one. Not sure how to deal with the case in which the derivative is not zero.

11.2 Girsanov's Theorem (Version I)

Let (\Omega, F, P) be a probability space. Let B be a standard m-dimensional Brownian motion adapted to the filtration F_t. Let X, Y be defined by the following SDEs

dX_t = a(X_t)\, dt + c(X_t)\, dB_t    (209)
dY_t = \big(c(Y_t)\, U_t + a(Y_t)\big)\, dt + c(Y_t)\, dB_t    (210)
X_0 = Y_0 = x    (211)

where X_t \in R^n, B_t \in R^m, a, c satisfy the necessary conditions for the SDEs to be well defined, and U_t is an F_t adapted process such that P(\int_0^t \|c(X_s)\, U_s\|^2 ds < \infty) = 1. Let

Z_t = \int_0^t U_s'\, dB_s + \frac{1}{2}\int_0^t U_s' U_s\, ds    (212)
\Lambda_t = e^{-Z_t}    (213)

and

dQ_t = \Lambda_t\, dP    (214)

i.e., for all A \in F_t,

Q_t(A) = E^P[\Lambda_t\, I_A]    (215)

Then

W_t = \int_0^t U_s\, ds + B_t    (216)

is a standard Brownian motion with respect to Q_t.

Informal Proof: We'll provide a heuristic argument for the discrete time case. In discrete time the equation for W = (W_1, \ldots, W_n) would look as follows

\Delta W_k = U_k\, \Delta t + \sqrt{\Delta t}\, G_k    (217)

where G_1, G_2, \ldots are independent standard Gaussian vectors under P. Thus, under P the log-likelihood of W is as follows

\log p(W) = h(n, \Delta t) - \frac{1}{2\Delta t}\sum_{k=1}^{n-1} (\Delta W_k - U_k \Delta t)'(\Delta W_k - U_k \Delta t)    (218)

where h(n, \Delta t) is constant with respect to W, U. For W to behave as Brownian motion under Q we need the probability density of W under Q to be as follows

\log q(W) = h(n, \Delta t) - \frac{1}{2\Delta t}\sum_{k=1}^{n-1} \Delta W_k'\, \Delta W_k    (219)

Let the random variable Z be defined as follows

Z = \log\frac{p(W)}{q(W)}    (220)

where q, p represent the probability densities of W under Q and under P respectively. Thus

Z = \sum_{k=1}^{n-1} U_k'(\Delta W_k - U_k \Delta t) + \frac{1}{2}\sum_{k=1}^{n-1} U_k' U_k\, \Delta t    (221)
= \sum_{k=1}^{n-1} U_k'\, \sqrt{\Delta t}\, G_k + \frac{1}{2}\sum_{k=1}^{n-1} U_k' U_k\, \Delta t    (222)

Note that as \Delta t \to 0,

Z \to \int_0^t U_s'\, dB_s + \frac{1}{2}\int_0^t U_s' U_s\, ds    (224)
q(W) \to e^{-Z}\, p(W)    (225)

Remark 11.1.

dY_t = \big(c(Y_t)\, U_t + a(Y_t)\big)\, dt + c(Y_t)\, dB_t    (226)
= a(Y_t)\, dt + c(Y_t)\big(U_t\, dt + dB_t\big)    (227)
= a(Y_t)\, dt + c(Y_t)\, dW_t    (228)

Therefore the distribution of Y under Q_t is the same as the distribution of X under P, i.e., for all A \in F_t,

P(X \in A) = Q_t(Y \in A)    (229)

or more generally

E^P[f(X_{0:t})] = E^{Q_t}[f(Y_{0:t})] = E^P[f(Y_{0:t})\, \Lambda_t]    (230)

(229)

(230)

Remark 11.2. Radon Nykodim derivative Λt is the Radon Nykodim derivative of Qt with respect to P. This is typically represented as follows dQt Λt = e−Zt = (231) dP We can get the derivative of P with respect to Qt by inverting Λt , i.e., dP (232) = eZ t dQt Remark 11.3. Likelihood Ratio This tells us that Λt is the likelihood ratio between the process X0:t and the process Y0:t , i.e., the equivalent of pX (X)/pY (Y ) where pX , pY are the probability densities of X and Y . Remark 11.4. Importance Sampling Λt can be used in importance sampling schemes. Suppose (y [1] , λ[1] ), · · · , (y [n] , λ[n] ) are iid samples from (Y0:t , Λt ), then we can estimate E P [f (X0:t )] as follows E P [f (X0:t )] ≈ 11.2.1

n

1X f (y [i] )λ[i] n i=1

(233)

Girsanov Version II

Let dXt = c(Xt )dBt dYt = b(Yt )dt + c(Yt )dBt X0 = Y0 = x

(234) (235) (236)

Let Z Zt =

t

b(Ys )0 (c(Ys )−1 )0 dBs

0

Z 1 t b(Ys )0 k(Ys )b(Ys )ds + 2 0 dP = eZt dQt

(237) (238)

where Then under

Qt the process Y

k(Ys ) = (c(Ys )0 c(Ys ))−1

(239)

has the same distribution as the process X under

P.

Proof. We apply Girsanov’s version I with a(Xt ) = 0, Ut = c(Yt )−1 b(Xt ). Thus Z Z t 1 t 0 U Us ds Zt = Us0 dBs + 2 0 s 0 Z t b(Ys )0 (c(Ys )−1 )0 dBs = 0 Z 1 t + b(Ys )0 k(Ys )b(Ys )ds (240) 2 0 dP = eZt (241) dQt And under Qt the process Y looks like the process X, i.e., a process with zero drift.

11.2.2

Girsanov Version III

Let dXt = c(Xt )dBt dYt = b(Yt )dt + c(Yt )dBt X0 = Y0 = x

(242) (243) (244)

Let t

Z

b(Ys )0 k(Ys )dYt

Zt = 0



1 2

Z

t

b(Ys )0 k(Ys )b(Ys )ds

(245)

0

dP = eZt dQt

(246)

k(Ys ) = (c(Ys )0 c(Ys ))−1

(247)

where

Then under

Qt the process Y

has the same distribution as the process X under

Proof. We apply Girsanov’s version II Z t Zt = b(Ys )0 (c(Ys )−1 )0 dBs 0 Z 1 t b(Ys )0 (c(Ys )−1 )0 c(Ys )−1 b(Ys )ds + 2 0 Z t = b(Ys )0 (c(Ys )−1 )0 c(Ys )−1 (dYt − b(Ys )ds) 0 Z 1 t b(Ys )0 (c(Ys )−1 )0 c(Ys )−1 b(Ys )ds + 2 0 Z t = b(Ys )0 k(Ys )dYt 0 Z 1 t b(Ys )0 k(Ys )b(Ys )ds − 2 0 dP = eZt dQt

P.

(248)

(249)

(250) (251)

Informal Discrete Time Based Proof: For a given path x the ratio of the probability density of x under P and Q can be approximated as follows Y p(xtk+1 − xtk | xtk ) dP(x) ≈ (252) dQt (x) q(xtk+1 − xtk | xtk ) k

were π = {0 = t0 < t1 · · · < tn = T } is a partition of [0, T ] and p(xtk+1 | xtk ) = G(∆xtk | a(xtk )∆tk , ∆tk k(xt )−1 ) −1

q(xtk+1 | xtk ) = G(∆xtk | 0, ∆tk k(xt )

)

(253) (254)

where G(·, µ, σ) is the multivariate Gaussian distribution with mean µ and covariance matrix c. Thus X dP(x) 1  (∆xtk − a(xtk )∆tk )0 k(xt )(∆xtk − a(xtk )∆tk ) log ≈ − dQt (x) 2∆tk k=0  − ∆x0tk k(xt )∆xtk n−1

n−1 X

=

k=0

1 a(xtk )0 k(xt )∆xtk − a(xtk )0 k(xtk )a(xtk )∆tk 2

(255)

taking limits as |π| → 0 log

dP(x) = dQt (x)

Z

T

a(Xt )0 k(Xt ) dXt −

0

1 2

Z

T

a(Xt )0 k(Xt )a(Xt ) dt

(256)

0

Theorem 11.1. The dΛt differential Let Xt be an Ito process of the form dXt = a(t, Xt )dt + c(t, Xt )dBt

(257)

Let Λt = eZt Z Z t 1 t a(s, Xs )0 k(s, Xs )a(s, Xs )ds Zt = a(s, Xs )0 k(t, Xt )dXs − 2 0 0

(258) (259)

where k(t, x) = (c(t, x)c(t, x)0 )−1 .Then dΛt = Λt a(t, Xt )0 k(t, Xt )dXt

(260)

Proof. From Ito’s product rule d(Λt f (Xt )) = f (Xt )dΛt + Λt f (Xt ) + (dΛt )(df (Xt )

(261)

Note dXt0 k(t, X)a(t, Xt )a0 (t,k(t, Xt )dXt 1 = dΛt = (∇z Λt )0 dZt + dZt0 (∇2z Λt )dZt 2   1 = Λt dZt + dZt0 dZt 2

(262) (263) (264)

Moreover, from the definition of Zt 1 dZt = a(t, Xt )0 k(t, Xt )dXt − a(t, Xt )0 k(t, Xt )a(t, Xt )dt 2

(265)

Thus dZt0 dZt = dXt0 k(t, X)a(t, Xt )a0 (t, Xt )k(t, Xt )dXt =dBt0 c(t, Xt )0 k(t, X)a(t, Xt )a0 (t, Xt )k(t, Xt )c(t, Xt )dBt

(266) (267)

  = trace c(t, Xt )0 k(t, Xt )a(t, Xt )a0 (t, Xt )k(t, Xt )c(t, Xt ) dt   = trace c(t, Xt )c(t, Xt )0 k(t, Xt )a(t, Xt )a0 (t, Xt )k(t, Xt ) dt   = trace a(t, Xt )a0 (t, Xt )k(t, Xt ) dt   = trace a(t, Xt )a0 (t, Xt )k(t, Xt ) dt   = trace a0 (t, Xt )k(t, Xt )a(t, Xt ) dt

(268)

= a0 (t, Xt )k(t, Xt )a(t, Xt )dt

(273)

(269) (270) (271) (272)

Thus dΛt = Λt a(t, Xt )0 k(t, Xt )dXt

12

(274)

Zakai’s Equation

Let dXt = a(Xt )dt + c(Xt )dBt dYt = g(Xt )dt + h(Xt )dWt

(275) (276)

Λt = eZt Z Z t 1 t 0 g(Yt )0 k(Yt )g(Yt )dt Zt = g(Yt ) k(Yt )dYt + 2 0 0 k(Yt ) = (h(Yt )h(Yt )0 )−1

(277) (278) (279)

Using Ito’s product rule d(f (Xt )Λt ) = Λt df (Xt ) + f (Xt )dΛt + df (Xt )dΛt

(280)

  1 df (Xt ) = ∇x f (Xt )0 dXt + trace c(Xt )c0 (Xt )∇2x f (Xt ) dt 2 = Gt [f ](x) + ∇x f (Xt )c(Xt )dBt dΛt = Λt g(Yt )k(Yt )dYt

(281) (282)

where

Following the rules of Ito’s calculus we note dXt dYt0 is an n × m matrix of zeros. Thus   d(f (Xt )Λt ) = Λt Gt [f ](x) + ∇x f (Xt )c(Xt )dBt + g(Yt )k(Yt )dYt (283)

13

Solving Stochastic Differential Equations Let dXt = a(t, Xt )dt + c(t, Xt )dBt .

(284)

dBt dBt t Conceptually, this is related to dX dt = a(t, Xt ) + c(t, Xt ) dt where dt is white dBt noise. However, dt does not exist in the usual sense, since Brownian motion is nowhere differentiable with probability one.

We interpret solving for (284), as finding a process Xt that satisfies Zt Xt = M +

Zt a(s, Xs ) ds +

0

c(s, Xs ) dBs.

(285)

0

for a given standard Brownian process B. Here Xt is an Ito process with a(s, Xs ) = Ks and c(s, Xs ) = Hs . a(t, Xt ) is called the drift function. c(t, Xt ) is called the dispersion function (also called diffusion or volatility function). Setting b = 0 gives an ordinary differential equation.

Example 1: Geometric Brownian Motion dXt = aXt dt + bXt dBt X0 = ξ > 0

(286) (287)

Using Ito’s rule on log Xt we get

2  1 1 1 (dXt )2 d log Xt = − 2 dXt + Xt 2 Xt dXt 1 = − b2 dt Xt 2   1 = aXt − b2 dt + αdBt 2

(289)

  1 log Xt = log X0 + a − b2 t + bBt 2

(291)

1 2 Xt = X0 e(a− 2 b )t+bBt

(292)

Yt = Y0 eαt+βBt

(293)

(288)

(290)

Thus

and

Processes of the form

where α and β are constant, are called Geometric Brownian Motions. Geometric Brownian motion Xt is characterized by the fact that the log of the process is Brownian motion. Thus, at each point in time, the distribution of the process is log-normal. Let’s study the dynamics of the average path. First let Yt = ebBt

(294)

Using Ito’s rule 1 dYt = bebBt dBt + b2 ebBt (dBt )2 2 Z t Z 1 2 Yt = Y0 + b Ys dBs + b +0t Ys ds 2 0 Z 1 2 t E(Yt ) = E(Y0 ) + 2 b E(Ys )ds 0 1 2 dE(Yt ) = b E(Yt ) dt 2

E(Yt ) = E(Y0 )e Thus

E(Xt ) = E(X0 )e(a−

1 2 2b t

=e

1 2 2b t

(295) (296) (297) (298) (299)

)t E(Y ) = E(X )e(a− 21 b2 )t (300) t 0 Thus the average path has the same dynamics as the noiseless system. Note the result above is somewhat trivial considering 1 2 2b

E(dXt ) = dE(Xt ) = E(a(Xt ))dt + E(c(Xt )dBt ) = E(a(Xt ))dt + E(c(Xt ))E(dBt )

(301) (302) (303)

dE(Xt ) = E(a(Xt ))dt and in the linear case

E(a(Xt ))dt = E(at Xt + ut )dt = at E(Xt ) + ut )dt

(304)

These symbolic operations on differentials trace back to the corresponding integral operations they refer to.

14 14.1

Linear SDEs The Deterministic Case (Linear ODEs)

Let xt ∈ 0

(398)

Let ˜s = X Thus

Xs1−Ψ γ(1 − Ψ)

˜ s = a(X ˜ s )ds + dBs dX

(399) (400)

where a(x) =

Ψγ Ψ−1 (θ0 + θ1 x ˆ)ˆ x−Ψ − x ˆ γ 2

(401)

where

−1

x ˆ = (γ(1 − Ψ)x)(1−Ψ)

(402)

A special case is when Ψ = 0.5. In this case the process increments are known to have a non-central chi-squared distribution (Cox, Ingersoll, Ross, 1985) • Logistic Growth – Model I dXt = aXt (1 − Xt /k)dt + bXt dBt

(403)

 X0 exp (a − b2 /2)t + bBt Xt = Rt 1 + Xk0 a 0 exp {(a − b2 /2)s + bBs } ds

(404)

dXt = aXt (1 − Xt /k)dt + bXt (1 − Xt /k)dBt

(405)

dXt = rXt (1 − Xt /k)dt − srXt2 dBt

(406)

The solution is

– Model II

– Model III In all the models r is the Maltusian growth constant and k the carrying capacity of the environment. In model II, k is unattainable. In the other models Xt can have arbitrarily large values with nonzero probability. • Gompertz Growth  dXt = αXt dt rXt log

k Xt

 dt + bXt dBt

(407)

where r is the Maltusian growth constant and k the carrying capacity. For α = 0 we get the Skiadas version of the model, for r = 0 we get the lognormal model. Using Ito’s rule on Yt = eβt log Xt we can get expected value follows the following equation  2  n o  b E(Xt ) = exp log(x0 )e−rt exp γr (1 − e−rt ) exp 4r (1 − e−2rt (408) where b2 2 Something fishy in expected value formula. Try α = b = 0! γ =α −

16

(409)

Stratonovitch and Ito SDEs

Stochastic differential equations are convenient pointers to their corresponding stochastic integral equations. The two most popular stochastic integrals are the Ito and the Stratonovitch versions. The advantage of the Ito integral is that the integrand is independent of the integrator and thus the integral is a Martingale. The advantage of the Stratonovitch definition is that it does not require changing the rules of standard calculus. The Ito interpretation of dXt = f (t, Xt )dt +

m X j=1

gj (t, Xt )dBj,t

(410)

is equivalent to the Stratonovitch equation    m X m  m X X 1 ∂g i,j dXt = f (t, Xt )dt − gi j (t, Xt ) dt + gj (t, Xt )dBj,t 2 i=1 j=1 ∂xi j=1

(411)

and the Stratonovitch interpreation of dXt = f (t, Xt )dt +

m X

gj (t, Xt )dBj,t

(412)

j=1

is equivalent to the Ito equation   m  m X m X X ∂g 1 i,j gi j (t, Xt ) dt + gj (t, Xt )dBj,t dXt = f (t, Xt )dt + 2 i=1 j=1 ∂xi j=1 

17

(413)

SDEs and Diffusions • Diffusions are rocesses governed by the Fokker-Planck-Kolmogorov equation. • All Ito SDEs are diffusions, i.e., they follow the FPK equation. • There are diffusions that are not Ito diffusions, i.e., they cannot be described by an Ito SDE. Example: diffusions with reflection boundaries.

18

Appendix I: Numerical methods

18.1 18.1.1

Simulating Brownian Motion Infinitesimal “piecewise linear” path segments

Get n independent standard Gaussian variables {Z1 , · · · , Zn }; E(Zi ) = 0; Var(Zi ) = 1. Define the stochastic process Bˆ as follows, ˆt = 0 B 0 √ ˆt = B ˆt + t1 − t0 Z1 B 1 0 .. . p ˆ ˆt Bt = B + tk − tk−1 Zk k

k−1

(414) (415) (416) (417)

Moreover, ˆt = B ˆt B for t ∈ [tk−1 , tk ) (418) k−1 This defines a continuous time process that converges in distribution to Brownian motion as n → ∞. 18.1.2

Linear Interpolation

Same as above but linearly interpolating the starting points of path segments. ˆt = B ˆt ˆt ˆt )/(tk − tk−1 ) for t ∈ [tk−1 , tk ) B + (t − tk )(B −B (419) k−1 k−1 k Note this approach is non-causal, in that it looks into the future. I believe it is inconsistent with Ito’s interpretation and converges to Stratonovitch solutions

18.1.3

Fourier sine synthesis P ˆt (ω) = n−1 Zk (ω)φk (t) B k=0 where Zk (ω)  random variables as in previous approach, and φk (t) =  are same √ (2k+1)Rt 2 2T (2k+1)R sin 2T As n → ∞ B converges to BM in distribution. Note this approach is non-causal, in that it looks into the future. I believe it is inconsistent with Ito’s interpretation and converges to Stratonovitch solutions 18.2

Simulating SDEs

Our goal is to simulate dXt = a(Xt )dt + c(Xt )dBt ,

0 ≤ t ≤ T X0 = ξ

(420)

Order of Convergencce Let 0 = t1 < t2 · · · < tk = T A method is said to have strong oder of convergence α if there is aconstant K such that ˆ k < K(∆t )α sup E Xtk − X (421) k tk

A method is said to have week oder of convergence α if there is a constant K such that ˆ k ] < K(∆t )α sup E[Xtk ] − E[X (422) k tk

Euler-Maruyama Method ˆk = X ˆ k−1 + a(X ˆ k−1 )(tk − tk−1 ) + c(X ˆ k−1 )(Bk − Bk−1 ) X p Bk = Bk−1 + tk − tk−1 Zk

(423) (424)

where Z1 , · · · , Zn are independent standard Gaussian random variables. The Euler-Maruyama method has strong convergence of order α = 1/2, which is poorer of the convergence for the Euler method in the deterministic case, which is order α = 1. The Euler-Maruyama method has week convergence of order α = 1. Milstein’s Higher Order Method: It is based on a higher order truncation of the Ito-Taylor expansion ˆk = X ˆ k−1 + a(X ˆ k−1 )(tk − tk−1 ) + c(X ˆ k−1 )(Bk − Bk−1 ) X   1 + c(Xk−1 )∇x c(Xk−1 ) (Bk − Bk−1 )2 − (tk − tk−1 ) 2 p Bk = Bk−1 + tk − tk−1 Zk where Z1 , · · · , Zn are independent standard Gaussian random variables. method has strong convergence of order α = 1.

19

(425) (426) (427) This

History • The first version of this document, which was 17 pages long, was written by Javier R. Movellan in 1999.

• The document was made open source under the GNU Free Documentation License 1.1 on August 12 2002, as part of the Kolmogorov Project. • October 9, 2003. Javier R. Movellan changed the license to GFDL 1.2 and included an endorsement section. • March 8, 2004: Javier added Optimal Stochastic Control section based on Oksendals book. • September 18, 2005: Javier. Some cleaning up of the Ito Rule Section. Got rid of the solving symbolically section. • January 15, 2006: Javier added new stuff to the Ito rule. Added the Linear SDE sections. Added Boltzmann Machine Sections. • January/Feb 2011. Major reorganization of the tutorial.