Weak Dynamic Programming Principle for Viscosity Solutions∗

Bruno Bouchard† and Nizar Touzi‡

February 2009
Abstract. We prove a weak version of the dynamic programming principle for standard stochastic control problems and mixed control-stopping problems, which avoids the technical difficulties related to the measurable selection argument. In the Markov case, our result is tailor-made for the derivation of the dynamic programming equation in the sense of viscosity solutions.
Key words: Optimal control, dynamic programming, discontinuous viscosity solutions.
AMS 1991 subject classifications: Primary 49L25, 60J60; secondary 49L20, 35K55.
1  Introduction
Consider the standard class of stochastic control problems in the Mayer form
$$V(t,x) := \sup_{\nu \in \mathcal{U}} E\left[f(X_T^\nu) \,\middle|\, X_t^\nu = x\right],$$
where $\mathcal{U}$ is the controls set, $X^\nu$ is the controlled process, $f$ is some given function, $0 < T \le \infty$ is a given time horizon, $t \in [0,T)$ is the time origin, and $x \in \mathbb{R}^d$ is some given initial condition. This framework includes the general class of stochastic control problems under the so-called Bolza formulation, the corresponding singular versions, and optimal stopping problems.
∗ The authors are grateful to Nicole El Karoui for fruitful comments. This research is part of the Chair Financial Risks of the Risk Foundation sponsored by Société Générale, the Chair Derivatives of the Future sponsored by the Fédération Bancaire Française, the Chair Finance and Sustainable Development sponsored by EDF and Calyon, and the Chair Les particuliers face au risque sponsored by Groupama.
† CEREMADE, Université Paris Dauphine and CREST-ENSAE, [email protected]
‡ Ecole Polytechnique Paris, Centre de Mathématiques Appliquées, [email protected]
A key tool for the analysis of such problems is the so-called dynamic programming principle (DPP), which relates the time-$t$ value function $V(t,\cdot)$ to any later time-$\tau$ value $V(\tau,\cdot)$, for any stopping time $\tau \in [t,T)$ a.s. A formal statement of the DPP is:
$$V(t,x) = v(t,x) := \sup_{\nu \in \mathcal{U}} E\left[V(\tau, X_\tau^\nu) \,\middle|\, X_t^\nu = x\right]. \tag{1.1}$$
In particular, this result is routinely used in the case of controlled Markov jump-diffusions in order to derive the corresponding dynamic programming equation in the sense of viscosity solutions, see Lions [6, 7], Fleming and Soner [5], and Touzi [9].

The statement (1.1) of the DPP is very intuitive and can be easily proved in the deterministic framework, or in discrete time with a finite probability space. However, its proof is in general not trivial, and requires as a first step that $V$ be measurable. The inequality "$V \le v$" is the easy one, but it still requires that $V$ be measurable. Our weak formulation avoids this issue. Namely, under fairly general conditions on the controls set and the controlled process, it follows from an easy application of the tower property of conditional expectations that
$$V(t,x) \le \sup_{\nu \in \mathcal{U}} E\left[V^*(\tau, X_\tau^\nu) \,\middle|\, X_t^\nu = x\right],$$
where $V^*$ is the upper semicontinuous envelope of the function $V$.
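For a fixed control $\nu$, the tower-property argument behind this inequality can be sketched as follows; here we use, formally, that the conditional expectation of the terminal reward given $\mathcal{F}_\tau$ coincides with the criterion $J(\tau, X_\tau^\nu; \nu)$ evaluated at the current state (this is made precise by Assumption A4-a below):

```latex
E\big[f(X_T^{\nu}) \,\big|\, X_t^{\nu}=x\big]
  = E\Big[\, E\big[f(X_T^{\nu}) \,\big|\, \mathcal{F}_\tau\big] \,\Big|\, X_t^{\nu}=x\Big]
  % formally, E[ f(X_T^\nu) | F_\tau ] = J(\tau, X_\tau^\nu; \nu)
  %           \le V(\tau, X_\tau^\nu) \le V^*(\tau, X_\tau^\nu)
  \le E\big[ V^*(\tau, X_\tau^{\nu}) \,\big|\, X_t^{\nu}=x\big].
```

Taking the supremum over $\nu \in \mathcal{U}$ on both sides yields the stated upper bound. Note that $V^*$, being upper semicontinuous, is Borel measurable, so the right-hand side is well defined even when $V$ itself is not known to be measurable.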
The proof of the converse inequality "$V \ge v$" in a general probability space turns out to be difficult when the function $V$ is not known a priori to satisfy some continuity condition. See e.g. Bertsekas and Shreve [1], Borkar [2], and El Karoui [4].

Our weak version of the DPP avoids the non-trivial measurable selection argument needed to prove the inequality $V \ge v$ in (1.1). Namely, in the context of a general control problem presented in Section 2, we show in Section 3 that:
$$V(t,x) \ge \sup_{\nu \in \mathcal{U}} E\left[\varphi(\tau, X_\tau^\nu) \,\middle|\, X_t = x\right] \quad \text{for every upper-semicontinuous minorant } \varphi \text{ of } V.$$
We also show that an easy consequence of this result is that
$$V(t,x) \ge \sup_{\nu \in \mathcal{U}} E\left[V_*\left(\tau_n^\nu, X^\nu_{\tau_n^\nu}\right) \,\middle|\, X_t = x\right],$$
where $\tau_n^\nu := \tau \wedge \inf\{s > t : |X_s^\nu - x| > n\}$, and $V_*$ is the lower semicontinuous envelope of $V$. This result is weaker than the classical DPP (1.1). However, in the case of controlled Markov jump-diffusions, it turns out to be tailor-made for the derivation of the dynamic programming equation in the sense of viscosity solutions. Section 5 reports this derivation in the context of controlled diffusions. Finally, Section 4 provides an extension of our argument in order to obtain a weak dynamic programming principle for mixed control-stopping problems.
2  The stochastic control problem
Let $(\Omega, \mathcal{F}, P)$ be a probability space, $T > 0$ a finite time horizon, and $\mathbb{F} := \{\mathcal{F}_t,\ 0 \le t \le T\}$ a given filtration of $\mathcal{F}$ satisfying the usual assumptions. For every $t \ge 0$, we denote by $\mathbb{F}^t = (\mathcal{F}^t_s)_{s \ge 0}$ the right-continuous filtration generated by the $\mathbb{F}$-measurable processes that are independent of $\mathcal{F}_t$, $t \ge 0$.

We denote by $\mathcal{T}$ the collection of all $\mathbb{F}$-stopping times. For $\tau_1, \tau_2 \in \mathcal{T}$ with $\tau_1 \le \tau_2$ a.s., the subset $\mathcal{T}_{[\tau_1, \tau_2]}$ is the collection of all $\tau \in \mathcal{T}$ such that $\tau \in [\tau_1, \tau_2]$ a.s. When $\tau_1 = 0$, we simply write $\mathcal{T}_{\tau_2}$. We use the notations $\mathcal{T}^t_{[\tau_1, \tau_2]}$ and $\mathcal{T}^t_{\tau_2}$ to denote the corresponding sets of stopping times that are independent of $\mathcal{F}_t$.

For $\tau \in \mathcal{T}$ and a subset $A$ of a finite dimensional space, we denote by $L^0_\tau(A)$ the collection of all $\mathcal{F}_\tau$-measurable random variables with values in $A$. $\mathbb{H}^0(A)$ is the collection of all $\mathbb{F}$-progressively measurable processes with values in $A$, and $\mathbb{H}^0_{rcll}(A)$ is the subset of all processes in $\mathbb{H}^0(A)$ which are right-continuous with finite left limits. In the following, we denote by $B_r(z)$ (resp. $\partial B_r(z)$) the open ball (resp. its boundary) of radius $r > 0$ and center $z \in \mathbb{R}^\ell$, $\ell \in \mathbb{N}$.

Throughout this note, we fix an integer $d \in \mathbb{N}$, and we introduce the sets:
$$\mathbf{S} := [0,T] \times \mathbb{R}^d \quad \text{and} \quad \mathbf{S}_0 := \left\{(\tau, \xi) : \tau \in \mathcal{T}_T \text{ and } \xi \in L^0_\tau(\mathbb{R}^d)\right\}.$$
We also denote by $USC(\mathbf{S})$ (resp. $LSC(\mathbf{S})$) the collection of all upper-semicontinuous (resp. lower-semicontinuous) functions from $\mathbf{S}$ to $\mathbb{R}$.

The set of control processes is a given subset $\mathcal{U}_0$ of $\mathbb{H}^0(\mathbb{R}^k)$, for some integer $k \ge 1$, so that the controlled state process defined as the mapping
$$(\tau, \xi; \nu) \in \mathcal{S} \times \mathcal{U}_0 \longmapsto X^\nu_{\tau,\xi} \in \mathbb{H}^0_{rcll}(\mathbb{R}^d), \quad \text{for some set } \mathcal{S} \text{ with } \mathbf{S} \subset \mathcal{S} \subset \mathbf{S}_0,$$
is well-defined and satisfies:
$$\left(\theta, X^\nu_{\tau,\xi}(\theta)\right) \in \mathcal{S} \quad \text{for all } (\tau, \xi) \in \mathcal{S} \text{ and } \theta \in \mathcal{T}_{[\tau, T]}.$$
Given a Borel function $f : \mathbb{R}^d \longrightarrow \mathbb{R}$ and $(t,x) \in \mathbf{S}$, we introduce the reward function $J : \mathbf{S} \times \mathcal{U} \longrightarrow \mathbb{R}$:
$$J(t,x;\nu) := E\left[f\left(X^\nu_{t,x}(T)\right)\right], \tag{2.1}$$
which is well-defined for controls $\nu$ in
$$\mathcal{U} := \left\{\nu \in \mathcal{U}_0 : E\left|f\left(X^\nu_{t,x}(T)\right)\right| < \infty \ \text{for all } (t,x) \in \mathbf{S}\right\}. \tag{2.2}$$
We say that a control $\nu \in \mathcal{U}$ is $t$-admissible if it is independent of $\mathcal{F}_t$, and we denote by $\mathcal{U}_t$ the collection of such processes. The stochastic control problem is defined by:
$$V(t,x) := \sup_{\nu \in \mathcal{U}_t} J(t,x;\nu) \quad \text{for } (t,x) \in \mathbf{S}. \tag{2.3}$$
3  Dynamic programming for stochastic control problems
For the purpose of our weak dynamic programming principle, the following assumptions are crucial.

Assumption A. For all $(t,x) \in \mathbf{S}$ and $\nu \in \mathcal{U}_t$, the controlled state process satisfies:

A1 (Independence). The process $X^\nu_{t,x}$ is independent of $\mathcal{F}_t$.

A2 (Causality). For $\tilde\nu \in \mathcal{U}_t$, if $\nu = \tilde\nu$ on $A \in \mathcal{F}$, then $X^\nu_{t,x} = X^{\tilde\nu}_{t,x}$ on $A$.

A3 (Stability under concatenation). For every $\tilde\nu \in \mathcal{U}_t$ and $\theta \in \mathcal{T}^t_{[t,T]}$:
$$\nu 1_{[0,\theta]} + \tilde\nu 1_{(\theta,T]} \in \mathcal{U}_t.$$

A4 (Consistency with deterministic initial data). For all $\theta \in \mathcal{T}^t_{[t,T]}$, we have:

a. For P-a.e. $\omega \in \Omega$, there exists $\tilde\nu_\omega \in \mathcal{U}_{\theta(\omega)}$ such that
$$E\left[f\left(X^\nu_{t,x}(T)\right) \,\middle|\, \mathcal{F}_\theta\right](\omega) \le J\left(\theta(\omega), X^\nu_{t,x}(\theta)(\omega); \tilde\nu_\omega\right).$$

b. For $t \le s \le T$, $\theta \in \mathcal{T}^t_{[t,s]}$, $\tilde\nu \in \mathcal{U}_s$, and $\bar\nu := \nu 1_{[0,\theta]} + \tilde\nu 1_{(\theta,T]}$, we have:
$$E\left[f\left(X^{\bar\nu}_{t,x}(T)\right) \,\middle|\, \mathcal{F}_\theta\right](\omega) = J\left(\theta(\omega), X^\nu_{t,x}(\theta)(\omega); \tilde\nu\right) \quad \text{for P-a.e. } \omega \in \Omega.$$
Remark 3.1. Assumption A3 above implies the following property of the controls set, which will be needed later.

A5 (Stability under bifurcation). For $\nu_1, \nu_2 \in \mathcal{U}_t$, $\tau \in \mathcal{T}^t_{[t,T]}$ and $A \in \mathcal{F}^t_\tau$, we have:
$$\bar\nu := \nu_1 1_{[0,\tau]} + \left(\nu_1 1_A + \nu_2 1_{A^c}\right) 1_{(\tau,T]} \in \mathcal{U}_t.$$
To see this, observe that $\tau_A := T 1_A + \tau 1_{A^c}$ is a stopping time in $\mathcal{T}^t_{[t,T]}$, and $\bar\nu = \nu_1 1_{[0,\tau_A)} + \nu_2 1_{[\tau_A, T]}$ is the concatenation of $\nu_1$ and $\nu_2$ at the stopping time $\tau_A$.

Iterating the above property, we see that for $0 \le t \le s \le T$ and $\tau \in \mathcal{T}^t_{[t,T]}$, we have the following extension: for a finite sequence $(\nu_1, \ldots, \nu_n)$ of controls in $\mathcal{U}_t$ with $\nu_i = \nu_1$ on $[0,\tau)$, and for a partition $(A_i)_{1 \le i \le n}$ of $\Omega$ with $A_i \in \mathcal{F}^t_\tau$ for every $i \le n$:
$$\bar\nu := \nu_1 1_{[0,\tau)} + 1_{[\tau,T]} \sum_{i=1}^n \nu_i 1_{A_i} \in \mathcal{U}_t.$$
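The fact that $\tau_A$ is a stopping time, used implicitly above, follows from a one-line verification (recall that $A \in \mathcal{F}^t_\tau$ means $A \cap \{\tau \le s\} \in \mathcal{F}^t_s$ for all $s$):

```latex
\{\tau_A \le s\}
  = \big(A \cap \{T \le s\}\big) \,\cup\, \big(A^c \cap \{\tau \le s\}\big)
  % for s < T this reduces to
  %   \{\tau \le s\} \setminus (A \cap \{\tau \le s\}) \in \mathcal{F}^t_s,
  % while for s \ge T it equals \Omega,
  \in \mathcal{F}^t_s \quad \text{for every } s \ge 0.
```
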
Our main result is the following weak version of the dynamic programming principle, which uses the following notation:
$$V_*(t,x) := \liminf_{(t',x') \to (t,x)} V(t',x'), \qquad V^*(t,x) := \limsup_{(t',x') \to (t,x)} V(t',x'), \qquad (t,x) \in \mathbf{S}.$$
Theorem 3.1. Let Assumption A hold true. Then, for every $(t,x) \in \mathbf{S}$ and for every family of stopping times $\{\theta^\nu,\ \nu \in \mathcal{U}_t\} \subset \mathcal{T}^t_{[t,T]}$:
$$V(t,x) \le \sup_{\nu \in \mathcal{U}_t} E\left[V^*\left(\theta^\nu, X^\nu_{t,x}(\theta^\nu)\right)\right]. \tag{3.1}$$
Assume further that $J(\cdot;\nu) \in LSC(\mathbf{S})$ for every $\nu \in \mathcal{U}_0$. Then, for any function $\varphi : \mathbf{S} \longrightarrow \mathbb{R}$:
$$\varphi \in USC(\mathbf{S}) \text{ and } V \ge \varphi \quad \Longrightarrow \quad V(t,x) \ge \sup_{\nu \in \mathcal{U}_t^\varphi} E\left[\varphi\left(\theta^\nu, X^\nu_{t,x}(\theta^\nu)\right)\right], \tag{3.2}$$
where $\mathcal{U}_t^\varphi := \left\{\nu \in \mathcal{U}_t : E\left[\varphi\left(\theta^\nu, X^\nu_{t,x}(\theta^\nu)\right)^+\right] < \infty \text{ or } E\left[\varphi\left(\theta^\nu, X^\nu_{t,x}(\theta^\nu)\right)^-\right] < \infty\right\}$.

Before proceeding to the proof of this result, we report the following consequence.

Corollary 3.1. Let the conditions of Theorem 3.1 hold. For $(t,x) \in \mathbf{S}$, let $\{\theta^\nu,\ \nu \in \mathcal{U}_t\} \subset \mathcal{T}^t_{[t,T]}$ be a family of stopping times such that $X^\nu_{t,x} 1_{[t,\theta^\nu]}$ is $L^\infty$-bounded for all $\nu \in \mathcal{U}_t$. Then,
$$\sup_{\nu \in \mathcal{U}_t} E\left[V_*\left(\theta^\nu, X^\nu_{t,x}(\theta^\nu)\right)\right] \le V(t,x) \le \sup_{\nu \in \mathcal{U}_t} E\left[V^*\left(\theta^\nu, X^\nu_{t,x}(\theta^\nu)\right)\right]. \tag{3.3}$$
Proof. The right-hand side inequality is already provided in Theorem 3.1. For the left-hand side, let $r > 0$ be such that $X^\nu_{t,x} 1_{[t,\theta^\nu]}$ takes values in $B_r(0)$ for all $\nu \in \mathcal{U}_t$. It follows from standard arguments, see e.g. Lemma 3.5 in [8], that we can find a sequence of continuous functions $(\varphi_n)_n$ such that $\varphi_n \le V_* \le V$ for all $n \ge 1$ and such that $\varphi_n$ converges pointwise to $V_*$ on $[0,T] \times B_r(0)$. Set $\phi_N := \min_{n \ge N} \varphi_n$ for $N \ge 1$, and observe that the sequence $(\phi_N)_N$ is non-decreasing and converges pointwise to $V_*$ on $[0,T] \times B_r(0)$. Applying (3.2) of Theorem 3.1 and using the monotone convergence theorem, we then obtain:
$$V(t,x) \ge \lim_{N \to \infty} E\left[\phi_N\left(\theta^\nu, X^\nu_{t,x}(\theta^\nu)\right)\right] = E\left[V_*\left(\theta^\nu, X^\nu_{t,x}(\theta^\nu)\right)\right]$$
for every $\nu \in \mathcal{U}_t$. $\Box$
Proof of Theorem 3.1. 1. Let $\nu \in \mathcal{U}_t$ be arbitrary and set $\theta := \theta^\nu$. The first assertion is a direct consequence of Assumption A4-a. Indeed, it implies that, for P-almost all $\omega \in \Omega$, there exists $\tilde\nu_\omega \in \mathcal{U}_{\theta(\omega)}$ such that
$$E\left[f\left(X^\nu_{t,x}(T)\right) \,\middle|\, \mathcal{F}_\theta\right](\omega) \le J\left(\theta(\omega), X^\nu_{t,x}(\theta)(\omega); \tilde\nu_\omega\right).$$
Since, by definition, $J(\theta(\omega), X^\nu_{t,x}(\theta)(\omega); \tilde\nu_\omega) \le V^*(\theta(\omega), X^\nu_{t,x}(\theta)(\omega))$, it follows from the tower property of conditional expectations that
$$E\left[f\left(X^\nu_{t,x}(T)\right)\right] = E\left[E\left[f\left(X^\nu_{t,x}(T)\right) \,\middle|\, \mathcal{F}_\theta\right]\right] \le E\left[V^*\left(\theta, X^\nu_{t,x}(\theta)\right)\right].$$
2. Let $\{(t_i, x_i),\ i \ge 1\} := \mathbb{Q}^{d+1} \cap \mathbf{S}$, and let $\varepsilon > 0$ be given. Then there is a sequence $(\nu^{i,\varepsilon})_{i \ge 1} \subset \mathcal{U}_0$ such that:
$$\nu^{i,\varepsilon} \in \mathcal{U}_{t_i} \quad \text{and} \quad J(t_i, x_i; \nu^{i,\varepsilon}) \ge V(t_i, x_i) - \varepsilon, \quad \text{for every } i \ge 1. \tag{3.4}$$
By the lower semicontinuity of $J(\cdot; \nu^{i,\varepsilon})$, together with the upper semicontinuity of $\varphi$, we may find a sequence $(r_i)_{i \ge 1}$ of positive scalars so that
$$\varphi(t_i, x_i) - \varphi(t', x') \ge -\varepsilon \quad \text{and} \quad J(t_i, x_i; \nu^{i,\varepsilon}) - J(t', x'; \nu^{i,\varepsilon}) \le \varepsilon \quad \text{for } (t', x') \in B_i,\ i \ge 1,$$
where
$$B_i := \left\{(t', x') \in \mathbf{S} : t' \in [t_i - r_i, t_i],\ |x' - x_i| \le r_i\right\}, \quad i \ge 1.$$
Together with (3.4) and the fact that $V \ge \varphi$, this implies that:
$$J(t', x'; \nu^{i,\varepsilon}) \ge J(t_i, x_i; \nu^{i,\varepsilon}) - \varepsilon \ge V(t_i, x_i) - 2\varepsilon \ge \varphi(t_i, x_i) - 2\varepsilon \ge \varphi(t', x') - 3\varepsilon \quad \text{on } B_i. \tag{3.5}$$
Set $A_1 := B_1$, $C_0 := \emptyset$, and define the sequence $A_{i+1} := B_{i+1} \setminus C_i$, where $C_i := C_{i-1} \cup A_i$, $i \ge 1$. With this construction, it follows from (3.5) that the countable family $(A_i)_{i \ge 1}$ satisfies $\cup_i A_i = \mathbf{S}$, $A_i \cap A_j = \emptyset$ for $i \ne j$, and
$$J(t', x'; \nu^{i,\varepsilon}) \ge \varphi(t', x') - 3\varepsilon \quad \text{on } A_i \subset B_i. \tag{3.6}$$
3. We now prove (3.2). Set $A^n := \cup_{i \le n} A_i$, $n \ge 1$. Given $\nu \in \mathcal{U}_t$, we define
$$\nu^{\varepsilon,n}_s := 1_{[t,\theta]}(s)\,\nu_s + 1_{(\theta,T]}(s)\left(\nu_s\, 1_{(A^n)^c}\left(\theta, X^\nu_{t,x}(\theta)\right) + \sum_{i=1}^n 1_{A_i}\left(\theta, X^\nu_{t,x}(\theta)\right) \nu^{i,\varepsilon}_s\right), \quad s \in [t,T].$$
Notice that $\{(\theta, X^\nu_{t,x}(\theta)) \in A_i\} \in \mathcal{F}^t_\theta$ as a consequence of the independence Assumption A1. Then, it follows from the stability under concatenation Assumption A3 and Remark 3.1 that $\nu^{\varepsilon,n} \in \mathcal{U}_t$. Using Assumptions A4-b and A2, together with (3.6), we deduce that:
$$E\left[f\left(X^{\nu^{\varepsilon,n}}_{t,x}(T)\right) \,\middle|\, \mathcal{F}_\theta\right] 1_{A^n}\left(\theta, X^\nu_{t,x}(\theta)\right) = \sum_{i=1}^n J\left(\theta, X^\nu_{t,x}(\theta); \nu^{i,\varepsilon}\right) 1_{A_i}\left(\theta, X^\nu_{t,x}(\theta)\right)$$
$$\ge \sum_{i=1}^n \left(\varphi\left(\theta, X^\nu_{t,x}(\theta)\right) - 3\varepsilon\right) 1_{A_i}\left(\theta, X^\nu_{t,x}(\theta)\right) = \left(\varphi\left(\theta, X^\nu_{t,x}(\theta)\right) - 3\varepsilon\right) 1_{A^n}\left(\theta, X^\nu_{t,x}(\theta)\right),$$
which, by definition of $V$ and the tower property of conditional expectations, implies
$$V(t,x) \ge J(t,x;\nu^{\varepsilon,n}) = E\left[E\left[f\left(X^{\nu^{\varepsilon,n}}_{t,x}(T)\right) \,\middle|\, \mathcal{F}_\theta\right]\right]$$
$$\ge E\left[\left(\varphi\left(\theta, X^\nu_{t,x}(\theta)\right) - 3\varepsilon\right) 1_{A^n}\left(\theta, X^\nu_{t,x}(\theta)\right)\right] + E\left[f\left(X^\nu_{t,x}(T)\right) 1_{(A^n)^c}\left(\theta, X^\nu_{t,x}(\theta)\right)\right].$$
Since $f(X^\nu_{t,x}(T)) \in L^1$, it follows from the dominated convergence theorem that:
$$V(t,x) \ge -3\varepsilon + \liminf_{n \to \infty} E\left[\varphi\left(\theta, X^\nu_{t,x}(\theta)\right) 1_{A^n}\left(\theta, X^\nu_{t,x}(\theta)\right)\right]$$
$$= -3\varepsilon + \lim_{n \to \infty} E\left[\varphi\left(\theta, X^\nu_{t,x}(\theta)\right)^+ 1_{A^n}\left(\theta, X^\nu_{t,x}(\theta)\right)\right] - \lim_{n \to \infty} E\left[\varphi\left(\theta, X^\nu_{t,x}(\theta)\right)^- 1_{A^n}\left(\theta, X^\nu_{t,x}(\theta)\right)\right]$$
$$= -3\varepsilon + E\left[\varphi\left(\theta, X^\nu_{t,x}(\theta)\right)\right],$$
where the last equality follows from the monotone convergence theorem, due to the fact that either $E[\varphi(\theta, X^\nu_{t,x}(\theta))^+] < \infty$ or $E[\varphi(\theta, X^\nu_{t,x}(\theta))^-] < \infty$. The proof of (3.2) is completed by the arbitrariness of $\nu \in \mathcal{U}_t$ and $\varepsilon > 0$. $\Box$

Remark 3.2 (Lower-semicontinuity condition I). It is clear from the above proof that it suffices to prove the lower semicontinuity of $(t,x) \mapsto J(t,x;\nu)$ for $\nu$ in a subset $\tilde{\mathcal{U}}_0$ of $\mathcal{U}_0$ such that $\sup_{\nu \in \tilde{\mathcal{U}}_t} J(t,x;\nu) = V(t,x)$. In most applications, this allows one to reduce to the case where the controls are essentially bounded or satisfy a strong integrability condition.

Remark 3.3 (Lower-semicontinuity condition II). In the above proof, the lower-semicontinuity assumption is only used to construct the balls $B_i$ on which $J(t_i, x_i; \nu^{i,\varepsilon}) - J(\cdot; \nu^{i,\varepsilon}) \le \varepsilon$. Clearly, this assumption can be weakened: it suffices that the lower semicontinuity hold in time from the left, i.e.
$$\liminf_{(t',x') \to (t_i,x_i),\ t' \le t_i} J(t', x'; \nu^{i,\varepsilon}) \ge J(t_i, x_i; \nu^{i,\varepsilon}).$$
Remark 3.4 (The Bolza formulation). Consider the stochastic control problem under the so-called Lagrange formulation:
$$V(t,x) := \sup_{\nu \in \mathcal{U}_t} E\left[\int_t^T Y^\nu_{t,x,1}(s)\, g\left(s, X^\nu_{t,x}(s), \nu_s\right) ds + Y^\nu_{t,x,1}(T)\, f\left(X^\nu_{t,x}(T)\right)\right],$$
where
$$dY^\nu_{t,x,y}(s) = -Y^\nu_{t,x,y}(s)\, k\left(s, X^\nu_{t,x}(s), \nu_s\right) ds, \qquad Y^\nu_{t,x,y}(t) = y > 0.$$
Then, it is well known that this problem can be converted into the Mayer formulation (2.3) by augmenting the state process to $(X, Y, Z)$, where
$$dZ^\nu_{t,x,y,z}(s) = Y^\nu_{t,x,y}(s)\, g\left(s, X^\nu_{t,x}(s), \nu_s\right) ds, \qquad Z^\nu_{t,x,y,z}(t) = z \in \mathbb{R},$$
and considering the value function
$$\bar V(t,x,y,z) := \sup_{\nu \in \mathcal{U}_t} E\left[Z^\nu_{t,x,y,z}(T) + Y^\nu_{t,x,y}(T)\, f\left(X^\nu_{t,x}(T)\right)\right] = y V(t,x) + z.$$
In particular, $V(t,x) = \bar V(t,x,1,0)$. The first assertion of Theorem 3.1 implies
$$V(t,x) \le \sup_{\nu \in \mathcal{U}_t} E\left[Y^\nu_{t,x,1}(\theta^\nu)\, V^*\left(\theta^\nu, X^\nu_{t,x}(\theta^\nu)\right) + \int_t^{\theta^\nu} Y^\nu_{t,x,1}(s)\, g\left(s, X^\nu_{t,x}(s), \nu_s\right) ds\right]. \tag{3.7}$$
Given an upper-semicontinuous minorant $\varphi$ of $V$, the function $\bar\varphi$ defined by $\bar\varphi(t,x,y,z) := y\varphi(t,x) + z$ is an upper-semicontinuous minorant of $\bar V$. From the second assertion of Theorem 3.1, we see that for a family $\{\theta^\nu,\ \nu \in \mathcal{U}_t\} \subset \mathcal{T}^t_{[t,T]}$,
$$V(t,x) \ge \sup_{\nu \in \mathcal{U}_t^{\bar\varphi}} E\left[\bar\varphi\left(\theta^\nu, X^\nu_{t,x}(\theta^\nu), Y^\nu_{t,x,1}(\theta^\nu), Z^\nu_{t,x,1,0}(\theta^\nu)\right)\right]$$
$$= \sup_{\nu \in \mathcal{U}_t^{\bar\varphi}} E\left[Y^\nu_{t,x,1}(\theta^\nu)\, \varphi\left(\theta^\nu, X^\nu_{t,x}(\theta^\nu)\right) + \int_t^{\theta^\nu} Y^\nu_{t,x,1}(s)\, g\left(s, X^\nu_{t,x}(s), \nu_s\right) ds\right]. \tag{3.8}$$
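The identity $\bar V = yV + z$ used above can be checked by solving the linear dynamics of $Y$ and $Z$ explicitly (a formal computation, ignoring integrability issues):

```latex
Y^{\nu}_{t,x,y}(s)
  \;=\; y \exp\Big(-\int_t^s k\big(u, X^{\nu}_{t,x}(u), \nu_u\big)\,du\Big)
  \;=\; y\,Y^{\nu}_{t,x,1}(s),
% integrating the dynamics of Z then gives
Z^{\nu}_{t,x,y,z}(T)
  \;=\; z + y \int_t^T Y^{\nu}_{t,x,1}(s)\,
        g\big(s, X^{\nu}_{t,x}(s), \nu_s\big)\,ds.
```

Hence the Mayer-type reward $E[Z^\nu_{t,x,y,z}(T) + Y^\nu_{t,x,y}(T) f(X^\nu_{t,x}(T))]$ is affine in $(y,z)$: it equals $y$ times the Lagrange-form reward inside the supremum defining $V$, plus $z$. Taking the supremum over $\nu \in \mathcal{U}_t$, and using $y > 0$, yields $\bar V(t,x,y,z) = y V(t,x) + z$.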
Remark 3.5 (Infinite horizon). Infinite horizon problems can be handled similarly. Following the notations of Remark 3.4, we introduce the infinite horizon stochastic control problem:
$$V^\infty(t,x) := \sup_{\nu \in \mathcal{U}_t} E\left[\int_t^\infty Y^\nu_{t,x,1}(s)\, g\left(s, X^\nu_{t,x}(s), \nu_s\right) ds\right].$$
Then, it is immediately seen that $V^\infty$ satisfies the weak dynamic programming principle (3.7)-(3.8).
4  Dynamic programming for mixed control-stopping problems
In this section, we provide a direct extension of the dynamic programming principle of Theorem 3.1 to the larger class of mixed control and stopping problems. In the context of the previous section, we define, for a Borel function $f : \mathbb{R}^d \longrightarrow \mathbb{R}$ and $(t,x) \in \mathbf{S}$, the reward $\bar J : \mathbf{S} \times \bar{\mathcal{U}} \times \mathcal{T}_{[t,T]} \longrightarrow \mathbb{R}$:
$$\bar J(t,x;\nu,\tau) := E\left[f\left(X^\nu_{t,x}(\tau)\right)\right], \tag{4.1}$$
which is well-defined for every control $\nu$ in
$$\bar{\mathcal{U}} := \left\{\nu \in \mathcal{U}_0 : E\left[\sup_{t \le s \le T} \left|f\left(X^\nu_{t,x}(s)\right)\right|\right] < \infty \ \text{for all } (t,x) \in \mathbf{S}\right\}.$$
The mixed control-stopping problem is defined by:
$$\bar V(t,x) := \sup_{(\nu,\tau) \in \bar{\mathcal{U}}_t \times \mathcal{T}^t_{[t,T]}} \bar J(t,x;\nu,\tau), \tag{4.2}$$
where $\bar{\mathcal{U}}_t$ is the subset of elements of $\bar{\mathcal{U}}$ that are independent of $\mathcal{F}_t$. The key ingredient for the proof of (4.6) is the following property of the set of stopping times $\mathcal{T}^t_T$: for all $\theta, \tau_1 \in \mathcal{T}^t_T$ and $\tau_2 \in \mathcal{T}^t_{[\theta,T]}$, we have
$$\tau_1 1_{\{\tau_1 < \theta\}} + \tau_2 1_{\{\tau_1 \ge \theta\}} \in \mathcal{T}^t_T.$$

[...] $r > 0$ such that $t_0 + r < T$ and
$$\left(-\partial_t \varphi + H^u(\cdot, D\varphi, D^2\varphi)\right)(t,x) > 0 \quad \text{for every } u \in U \text{ and } (t,x) \in B_r(t_0, x_0). \tag{5.6}$$
Since $(t_0, x_0)$ is a strict maximizer of the difference $V^* - \varphi$, it follows that
$$-2\eta := \max_{\partial B_r(t_0,x_0)} \left(V^* - \varphi\right) < 0. \tag{5.7}$$
Let $(t_n, x_n)_n$ be a sequence in $B_r(t_0, x_0)$ such that $(t_n, x_n, V(t_n, x_n)) \to (t_0, x_0, V^*(t_0, x_0))$. For an arbitrary control $\nu^n \in \mathcal{U}_{t_n}$, let $X^n := X^{\nu^n}_{t_n, x_n}$ denote the solution of (5.1) with initial condition $X^n_{t_n} = x_n$, and set
$$\theta_n := \inf\left\{s \ge t_n : (s, X^n_s) \notin B_r(t_0, x_0)\right\}.$$
We may assume without loss of generality that
$$\left|(V - \varphi)(t_n, x_n)\right| \le \eta \quad \text{for all } n \ge 1. \tag{5.8}$$
Applying Itô's formula to $\varphi(\cdot, X^n)$ and using (5.6) together with (5.2) leads to
$$\varphi(t_n, x_n) = E\left[\varphi\left(\theta_n, X^n_{\theta_n}\right) - \int_{t_n}^{\theta_n} \left(\partial_t \varphi - H^{\nu^n}(\cdot, D\varphi, D^2\varphi)\right)(s, X^n_s)\, ds\right] \ge E\left[\varphi\left(\theta_n, X^n_{\theta_n}\right)\right].$$
Now observe that $\varphi \ge V^* + 2\eta$ on $\partial B_r(t_0, x_0)$ by (5.7). Hence, the above inequality implies that $\varphi(t_n, x_n) \ge E[V^*(\theta_n, X^n_{\theta_n})] + 2\eta$, which implies by (5.8) that:
$$V(t_n, x_n) \ge E\left[V^*\left(\theta_n, X^n_{\theta_n}\right)\right] + \eta \quad \text{for } n \ge 1.$$
Since $\nu^n \in \mathcal{U}_{t_n}$ is arbitrary, this contradicts (3.1) for $n \ge 1$ fixed. $\Box$
References

[1] Bertsekas D.P. and S.E. Shreve (1978), Stochastic Optimal Control: The Discrete Time Case, Mathematics in Science and Engineering 139, Academic Press.
[2] Borkar V.S. (1989), Optimal Control of Diffusion Processes, Pitman Research Notes in Mathematics 203, Longman Scientific and Technical, Harlow, UK.
[3] Crandall M.G., H. Ishii and P.-L. Lions (1992), User's guide to viscosity solutions of second order partial differential equations, Bull. Amer. Math. Soc. 27, 1-67.
[4] El Karoui N. (1981), Les aspects probabilistes du contrôle stochastique, Springer Lecture Notes in Mathematics 876, Springer-Verlag, New York.
[5] Fleming W.H. and H.M. Soner (2006), Controlled Markov Processes and Viscosity Solutions, Second Edition, Springer.
[6] Lions P.-L. (1983), Optimal control of diffusion processes and Hamilton-Jacobi-Bellman equations, Part I, Comm. PDE 8, 1101-1134.
[7] Lions P.-L. (1983), Optimal control of diffusion processes and Hamilton-Jacobi-Bellman equations, Part II: Viscosity solutions and uniqueness, Comm. PDE 8, 1229-1276.
[8] Reny P.J. (1999), On the existence of pure and mixed strategy Nash equilibria in discontinuous games, Econometrica 67(5), 1029-1056.
[9] Touzi N. (2002), Stochastic Control Problems, Viscosity Solutions, and Application to Finance, Quaderni, Edizioni della Scuola Normale Superiore, Pisa.