Numerical Methods for the Optimization of Nonlinear Stochastic Delay Systems, and an Application to Internet Regulation

Harold J. Kushner, Applied Math, Brown University, Providence RI 02912 USA, [email protected]

Abstract— The Markov chain approximation method is an effective and widely used approach for computing optimal values and controls for stochastic systems. It was extended to nonlinear (and possibly reflected) diffusions with delays in a recent book. The path, control, and reflection terms can all be delayed. In models of communications systems, reflection terms correspond to buffer overflows and to the adjustments needed to keep the buffers and rates nonnegative and bounded. The book concentrated on convergence proofs for general algorithms of promising forms. If the control and/or reflection terms are delayed, the memory requirements can make the problem intractable, since these terms have little or no regularity. For such models, the problem was recast in terms of a "wave equation," to get practical algorithms with much reduced computational needs. The overall approach to getting usable algorithms is outlined. The methods are applied to an idealized model of a communications system with transportation delay, and it is seen that nearly optimal controls for appropriately chosen criteria can improve the performance considerably and be robust in the face of changing operating conditions. The data illustrate the potential usefulness of numerical methods for nonlinear stochastic systems with delays, and open up new avenues of control for consideration.
Key words. Nonlinear stochastic delay systems, optimal controls, numerical methods, communications applications

I. INTRODUCTION
The set of algorithms known as the Markov chain approximation methods [9] are widely used techniques for the numerical approximation of optimal controls and values for general continuous-time nonlinear stochastic systems. They were extended to controlled general nonlinear delayed diffusion models in [6], [7], [8]. With this method, one first approximates the process by a controlled finite-state Markov chain that satisfies certain minimal local consistency properties (many ways of getting the approximations are in [9]), and then solves the associated Bellman equation. The proofs of convergence are probabilistic. They do not depend on analytical properties of the Bellman equation for the original model, and the approximations converge under virtually the weakest possible conditions. In the absence of delays, the probabilistic basis of the proofs is a key to the generality, robustness, and usefulness of the methods. With delays, it is essential, since then virtually nothing is known about the properties of the Bellman equation for the original model. The numerical problem is particularly difficult if the control or reflection terms are delayed, where the memory needs can be enormous, since naive approximations to the memory segments of these processes lead to very high-dimensional numerical models. For this case, we recast [8] the problem in terms of a stochastic "wave" or transportation equation, whose numerical solution requires much less
memory, and which yields the optimal costs and controls. The emphasis in [8] was on the theoretical foundations. Promising classes of algorithms were developed and convergence theorems were proved. It provided a foundation for the subject, but details of the adaptation of the ideal algorithms to concrete applications, and data illustrating the usefulness of the methods, were left to subsequent publications; e.g., [2], which was not concerned with delayed reflection terms, a major consideration in the application of interest here. The general model and assumptions are in Section 2. Our particular interest in this paper is in a special subclass consisting of simple models of internet regulation, where the model is a reflected diffusion and both the control and reflection terms are delayed; it is given in Section 3. This model will serve as a concrete illustration of the methods, and can be taken to be illustrative of the benefits to be gained by the use of numerical methods. It will be seen that controls that take the past (over the delay interval) into account can yield much improved performance. The controls to be computed are robust in that they perform well when the system parameters change. Note that the problems to be discussed would be intractable by any other method, say one that tried to use time discretizations of the memory segments over the delay interval. The results are also illustrative of the potential of numerical methods for nonlinear stochastic systems with delays. Section 4 deals with the stochastic wave equation transformation; the Markov chain approximation method is briefly summarized for the no-delay case in Section 5 and extended to the delay case in Section 6, in a way that is convenient for applications. Section 7 contains the numerical data and discussion. See [7], [8] for the mathematical proofs of convergence.

II. THE GENERAL MODEL

The process x(·) = {x_i(·), i ≤ r} ∈ IR^r (Euclidean r-space) is confined to G, a compact convex polyhedron (with an interior), by the boundary reflection process z(·). Let y_i(·) denote the component of z(·) that is due to the reflection from the ith face of G, with reflection direction d_i. Then z(t) = Σ_i d_i y_i(t). The general model is the reflected diffusion, where θ̄ = maximum delay and w(·) is a Wiener process:

dx(t) = dt ∫_{-θ̄}^{0} b(x(t+θ), u(t+θ), θ) dμ_a(θ) + σ(x(t)) dw(t) + c(x(t), u(t)) dt
        + dz(t) + dt ∫_{θ=-θ̄}^{0} p(x(t+θ), θ) d_θ y(t+θ).                    (2.1)
The last term models the delayed reflection. θ̄ is used for the maximum delay for all components; there is only a notational change if the maximum delay depends on the component. The initial conditions are the values over the delay interval: e.g., x̂ = {x(s), −θ̄ ≤ s ≤ 0}, û = {u(s), −θ̄ ≤ s ≤ 0}, and/or ŷ = {y(s), −θ̄ ≤ s ≤ 0}, whichever is delayed. If the process is stopped on reaching a target boundary (with no reflection), then drop all z(·), y(·).

Assumptions and Costs. The conditions on {d_i} are those in [9, Section 5.7], to which the reader is referred. They are standard in models with reflections and cover the usual applications. See [5, Chapter 3] for a discussion of reflected diffusions. The controls take values in a compact set U; b(·), c(·), p(·), σ(·), k(·) are bounded and continuous, and there is a unique weak-sense solution to (2.1) for each initial condition and control. All functions of θ are defined to have value zero for θ < −θ̄ and θ > 0. μ_a(·) is a finite measure on [−θ̄, 0] with μ_a([−t, 0]) → 0 as t → 0. To assure that x(·) is well defined, there is θ̄_0 ∈ (0, θ̄] such that p(x, θ) = 0 for θ ≥ −θ̄_0. With x̂, û, ŷ denoting the canonical values of the (path, control, reflection) segments on [−θ̄, 0], β > 0, vector q, and control u(·), we concentrate on the discounted cost function (where E^u_{x̂,û,ŷ} = expectation, given the initial data)
W(x̂, û, ŷ, u) = E^u_{x̂,û,ŷ} ∫_0^∞ e^{−βt} [k(x(t), u(t)) dt + q′ dy(t)].                    (2.2)
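For orientation, the following minimal sketch (not from the paper; all parameter choices are illustrative assumptions) simulates a one-dimensional instance of a reflected delay SDE of the form (2.1) with an Euler-Maruyama scheme, the state confined to G = [0, B] by projection and the reflection increments accumulated separately.

import numpy as np

# Illustrative Euler-Maruyama sketch of a 1-d reflected SDE with a point
# delay in the drift (a special case of the form (2.1)); all parameter
# values are assumptions chosen only for demonstration.
dt, T, theta_bar = 0.01, 20.0, 1.0
B, sigma = 2.0, 0.3                      # state confined to G = [0, B]
lag = int(round(theta_bar / dt))         # delay in steps
n = int(round(T / dt))
rng = np.random.default_rng(0)

x = np.zeros(n + 1); x[0] = 1.0
y_low = np.zeros(n + 1); y_up = np.zeros(n + 1)   # reflection terms at 0 and at B

def b(x_delayed):                        # delayed drift, illustrative only
    return 1.0 - x_delayed

for k in range(n):
    x_del = x[max(k - lag, 0)]           # x(t - theta_bar), constant initial segment
    x_free = x[k] + b(x_del) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    x[k + 1] = min(max(x_free, 0.0), B)  # project back onto G
    y_low[k + 1] = y_low[k] + max(0.0 - x_free, 0.0)   # increment only when pushed up at 0
    y_up[k + 1]  = y_up[k]  + max(x_free - B, 0.0)     # increment only when pushed down at B

print("final state %.3f, total lower/upper reflection %.3f / %.3f"
      % (x[-1], y_low[-1], y_up[-1]))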
III. A COMMUNICATIONS SYSTEMS MODEL
The numerical data will be for the model (3.1). It is a form of TCP, the standard AIMD (additive increase, multiplicative decrease) internet rate control model. This particular model originally arose as a heavy traffic limit in [1] of a system with many (aggregated) controllable sources together with many other sources with short transmissions. But the model captures the main features of other simple models that have been considered in the literature. In particular, there is a group of sources whose transmission rates can be controlled, and many sources with short transmissions and uncontrollable rates. After a delay, the packets are received by a "bottleneck router." Buffer overflow packets from the controlled sources are not acknowledged, causing a decrease in the transmission rate of the affected sources, after the transportation delay. Let x_2(·) = scaled controlled (and aggregated) source rate, x_1(·) = scaled content of the bounded buffer, θ̄ the roundtrip transportation delay, and q_0 the scaled router processing rate. The model is(1)

dx_1(t) = [x_2(t) − q_0] dt + σ dw(t) + dL_1(t) − dU_1(t),
dx_2(t) = q_1 dt + q_2 u(t − θ̄) dt − q_3 dU_1(t − θ̄) x_2(t − θ̄) + dL_2(t).           (3.1)

The process U_1(·) represents the cumulative scaled buffer overflow. The process L_1(·) can increase only when the buffer is empty, and (as is common in modeling queueing systems) is used to assure that it is non-negative. w(·) is a Wiener process representing the scaled centered uncontrollable disturbances.(2) The centering that yields the process w(·) in the limit is accounted for in the choice of the net processor rate q_0. The purpose of the control u(t) is to signal the source to reduce or increase the rate, as called for. More importantly, w(·) serves to represent the unpredictability of the demands on the system, as is commonly done in stochastic models. The reflection term L_2(·) assures that x_2(t) ≥ 0. The source rate is controlled by the constant increase rate q_1 > 0, the delayed overflow dU_1(t − θ̄), and the delayed control u(t − θ̄). The variables are measured and the control is determined at the router. Its values are sent to the sources and the result is seen at the router after the roundtrip delay θ̄. Cost functions would penalize buffer overflow and queue length, and reward rate. Systems like (3.1), if uncontrolled, oscillate and have large buffer overflows, as seen in Fig. 3.1, where the zig-zag line is the source rate, and the wild curve is the noise- and controllable-source-rate-influenced queue level. The controls that are used at present depend only on the current state at the router. The instability problem arises because such controls do not account for the control signals that were sent on the memory interval [t − θ̄, t), whose effect has not yet been seen at the router, which leads to overcontrol. The data in Section 7 show that the use of controls that take the past actions into account can improve the performance considerably. With naive algorithms, the memory requirements would be intractable. The controls are robust. E.g., the results remain good if the delay in the application is distributed on [−θ̄, 0]. Robustness of a usable control is critical, since conditions can change rapidly and the model is only an approximation. More detail is in Section 7. TCP-type algorithms use dU_1(t − θ̄) x_2(t − θ̄) as a primary mechanism for source rate control. It will be seen that this can be replaced by a suitable control that need not depend on unacknowledged packets, thus opening the door to wider possibilities for regulation.

(1) It is also a form of the congestion control method RED [3], [4], [10], where one marks packets for notifications to their sources with a probability depending on the system state at the router. This is related to the control u(·), which here can also depend on the control actions in the delay interval. The model is simple, but no simpler than a large part of the models that have been used to analyze various forms of TCP-type protocols, where the delay is often dropped, and nonlinearities are linearized. In the literature, the modeling often starts in terms of "windows," and is then approximated in terms of rates.
(2) The Wiener process in the original model in [1] is due to the central limit theorems used to get the approximation, arising from the limit of the scaled randomness in the uncontrollable arrival processes with the short transmissions.
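To make the AIMD dynamics of (3.1) concrete, the following sketch simulates the uncontrolled system (u ≡ 0) with a simple Euler-type discretization, reproducing qualitatively the oscillatory behavior shown in Fig. 3.1 below. The parameter values, bounds, and step size are illustrative assumptions, not the values used for the numerical data in Section 7.

import numpy as np

# Illustrative simulation of the uncontrolled model (3.1) (u = 0).
dt, T, theta_bar = 0.01, 60.0, 1.0
q0, q1, q2, q3, sigma = 1.0, 0.4, 1.0, 1.0, 0.2
B1, B2 = 1.0, 2.0                         # buffer bound and rate bound (assumed)
lag = int(round(theta_bar / dt))
n = int(round(T / dt))
rng = np.random.default_rng(1)

x1, x2 = 0.2, 1.0
dU1 = np.zeros(n)                         # per-step buffer-overflow increments
x2_hist = np.full(n + 1, x2)              # rate history for the delayed product

for k in range(n):
    # buffer: inflow x2, service q0, noise, reflection at 0 and at B1
    x1_free = x1 + (x2 - q0) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    dU1[k] = max(x1_free - B1, 0.0)       # overflow (upper reflection) increment
    x1 = min(max(x1_free, 0.0), B1)

    # rate: additive increase q1, multiplicative decrease from the delayed overflow
    dU1_del = dU1[k - lag] if k >= lag else 0.0
    x2_del = x2_hist[k - lag] if k >= lag else x2_hist[0]
    x2 = min(max(x2 + q1 * dt - q3 * dU1_del * x2_del, 0.0), B2)
    x2_hist[k + 1] = x2

print("mean rate %.3f, total overflow %.3f" % (x2_hist.mean(), dU1.sum()))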
[Fig. 3.1. No control, only basic AIMD. (Plot of source rate and queue level; graphic omitted.)]

IV. A WAVE EQUATION REPRESENTATION

Consider the following "wave" or "transportation" equation. For −θ̄ < θ ≤ 0, define χ^0(·) and χ^1(·) by

dχ^0(t) = χ^1(t, 0) dt + c(χ^0(t), u(t)) dt + σ(χ^0(t)) dw(t) + dz^0(t),                   (4.1)
d_t χ^1(t, θ) = −d_θ χ^1(t, θ) + p(χ^0(t), θ) dy^0(t) + b(χ^0(t), u(t), θ) [μ_a(θ + dt) − μ_a(θ)].   (4.2)

d_t, d_θ are differentials with respect to t, θ. The reflection term z^0(·) is for χ^0(·), taking values in G. (4.2) is formal, but will be well-defined by (4.5). Boundary and initial conditions are χ^1(t, −θ̄) = 0, χ^0(0) = x(0), and

χ^1(0, θ) = ∫_{−θ̄}^{θ} b(x(γ − θ), u(γ − θ), γ) dμ_a(γ) + ∫_{−θ̄}^{θ} p(χ^0(γ − θ), γ) d_γ y^0(γ − θ).   (4.3)

The cost function is (2.2) with χ^0(t) replacing x(·).

Comments on the representation. (4.1) contains the undelayed and (4.2) the delayed dynamics. The scalar "transportation" variable θ represents all delays. The dimension of χ^1(·) is the number of components of dx(·) whose terms have delays, and does not depend on the number of controls. If dx_i(·) has no terms with delays, then χ^0_i(·) = x_i(·), χ^1_i(·) = 0. There are no longer any delayed variables as such, and only χ^0(t) and χ^1(t, θ) need to be approximated, for a discrete set of θ-values. Thus all memory processes are represented by a process of lower dimension. For (3.1), χ^1(t, θ) is one-dimensional.

Representation in terms of a semigroup. Since (4.2) is linear in χ^1(·), it can be well-defined via the semigroup of the PDE d_t χ^1(t, θ) = −d_θ χ^1(t, θ). Define Φ(·), acting on functions of θ:

(Φ(t)f(·))(θ) = f(θ − t) for −θ̄ ≤ θ − t ≤ 0, and = 0 otherwise.                             (4.4)

Although the forms (4.1), (4.2) will guide the development, the solution to (4.2) is defined by

χ^1(t, θ) = Φ(t)χ^1(0, θ) + ∫_0^t Φ(t − s) p(χ^0(s), θ) dy^0(s)
            + ∫_0^t Φ(t − s) b(χ^0(s), u(s), θ) [μ_a(θ + ds) − μ_a(θ)].                      (4.5)

The representation (4.1), (4.2) is justified by:

Theorem 4.1. χ^0(·) = x(·). (The proof is in [7], [8].)

A discrete-θ-and-time approximation. The following approximation to (4.1), (4.2) will provide the intuition behind the development of the numerical algorithm. Define the approximations d_t χ^1(t, θ) ≈ [χ^1(t + ∆, θ) − χ^1(t, θ)] and d_θ χ^1(t, θ) ≈ [χ^1(t, θ) − χ^1(t, θ − δ)]. Convergence to (4.1), (4.5) requires that δ = ∆, which yields the discrete-time form (t = lδ, l = 0, 1, . . .)

χ^{0,δ}(t + δ) = χ^{0,δ}(t) + δ χ^{1,δ}(t, 0) + δ c(χ^{0,δ}(t), u^δ(t))
                 + σ(χ^{0,δ}(t)) [w(t + δ) − w(t)] + [z^{0,δ}(t + δ) − z^{0,δ}(t)],
χ^{1,δ}(t + δ, θ) = χ^{1,δ}(t, θ − δ) + b(χ^{0,δ}(t), u^δ(t), θ − δ) [μ_a(θ) − μ_a(θ − δ)]
                 + p(χ^{0,δ}(t), θ − δ) [y^{0,δ}(t + δ) − y^{0,δ}(t)].                       (4.6)

Define the operator Φ_δ, analogous to (4.4), by

(Φ_δ f(·))(t, θ) = f(t, θ − δ),  −θ̄ ≤ θ − δ ≤ 0,                                            (4.7)

with (Φ_δ f(·))(t, θ) = 0 otherwise. Using Φ_δ, (4.6) is

χ^{1,δ}(t + δ, θ) = Φ_δ χ^{1,δ}(t, θ) + b(χ^{0,δ}(t), u^δ(t), θ − δ) [μ_a(θ) − μ_a(θ − δ)]
                 + p(χ^{0,δ}(t), θ − δ) [y^{0,δ}(t + δ) − y^{0,δ}(t)].                       (4.8)

For later use, we refine (4.8). Let t = lδ, δ/δ_0 = n_0 = integer. For n < n_0, use (with ∆^0 y^{0,δ}(s) = [y^{0,δ}(s + δ_0) − y^{0,δ}(s)])

χ^{1,δ}(t + nδ_0, θ) = Φ_δ χ^{1,δ}(t, θ)
    + Σ_{i=0}^{n−1} (δ_0/δ) b(χ^{0,δ}(t + iδ_0), u^δ(t + iδ_0), θ − δ) [μ_a(θ) − μ_a(θ − δ)]
    + Σ_{i=0}^{n−1} p(χ^{0,δ}(t + iδ_0), θ − δ) ∆^0 y^{0,δ}(lδ + iδ_0).                      (4.9)

χ^{0,δ}(·) converges to x(·) [7] as δ → 0. Note the procedure in (4.9): first shift by constructing Φ_δ χ^{1,δ}(lδ, θ), then compute the right-hand dynamical terms until the next shift.
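As a concrete illustration of this shift-and-accumulate procedure in (4.8)-(4.9), the following sketch maintains the discretized memory state χ^{1,δ}(·, θ), θ ∈ {−θ̄+δ, . . . , 0}, as a shift register. The drift, reflection, and measure inputs are placeholders to be supplied by the caller, and the point-delay choice of μ_a in the usage example is an assumption made only for the demonstration (it anticipates the illustration in Section 6).

import numpy as np

def shift_memory(chi1):
    """Apply the operator Phi_delta of (4.7)-(4.8): shift the theta-grid of
    chi1 one slot toward theta = 0 and insert 0 at theta = -theta_bar + delta."""
    shifted = np.empty_like(chi1)
    shifted[0] = 0.0          # new slot at theta = -theta_bar + delta
    shifted[1:] = chi1[:-1]   # chi1(t+delta, theta) = chi1(t, theta - delta)
    return shifted

def accumulate_substep(chi1, b_term, p_term, dmu, dy, dt_sub, delta):
    """One delta_0 substep of (4.9): the b contribution is scaled by the
    fraction dt_sub/delta of the shift interval, while the reflection
    contribution uses the actual reflection increment dy.  b_term, p_term,
    and dmu may be arrays over the theta-grid."""
    return chi1 + b_term * dmu * (dt_sub / delta) + p_term * dy

# --- illustrative use (all numbers are assumptions) ---
delta, theta_bar = 0.25, 1.0
n_theta = int(round(theta_bar / delta))           # slots for theta = -theta_bar+delta, ..., 0
chi1 = np.zeros(n_theta)
dmu = np.zeros(n_theta); dmu[0] = 1.0             # point delay near -theta_bar
p_arr = np.zeros(n_theta); p_arr[0] = -1.0        # reflection acts only on that slot

for step in range(8):                             # substeps between two shifts
    chi1 = accumulate_substep(chi1, b_term=0.5, p_term=p_arr,
                              dmu=dmu, dy=0.0, dt_sub=delta / 8, delta=delta)
chi1 = shift_memory(chi1)                         # shift when the time variable advances
print(chi1)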
V. THE MARKOV CHAIN APPROXIMATION METHOD: REVIEW FOR THE NO-DELAY CASE

Only the facts about this well-known method, for the system dx = b(x, u)dt + σ(x)dw + dz, that will be needed for the delay case will be given. See [9, Chapter 5] for more.

Local consistency. For small h, S_h the h-grid in IR^r, and G_h = S_h ∩ G, construct a finite-state controlled Markov chain ξ_n^h on G_h that has the "local" properties of x(·) as follows. Let u_n^h = control at step n. Let E_{x,n}^{h,α} (resp., covar_{x,n}^{h,α}) be the expectation (resp., covariance) given the data to step n and ξ_n^h = x ∈ G_h, u_n^h = α. Then ξ_{n+1}^h must satisfy ||ξ_{n+1}^h − ξ_n^h|| = O(h), and there is ∆t^h(x, α) > 0 such that

E_{x,n}^{h,α} [ξ_{n+1}^h − x] = b(x, α) ∆t^h(x, α) + o(∆t^h(x, α)),
covar_{x,n}^{h,α} [ξ_{n+1}^h − x] = a(x) ∆t^h(x, α) + o(∆t^h(x, α)),  a(x) = σ(x)σ′(x),
lim_{h→0} sup_{x,α} ∆t^h(x, α) = 0.                                                         (5.1)
Define p^h(x, x̃|α) = P{ξ_1^h = x̃ | ξ_0^h = x, u_0^h = α} = transition probability. The methods in [9] obtain ∆t^h(·) while getting p^h(x, x̃|α). Call such approximations "explicit." States x ∉ G_h, to which the chain can move in one step from G_h, are called reflecting points. For x a reflecting state, move to the closest points in G_h, with the mean direction being a reflection direction at x. Now solve the optimization problem for a cost approximating (2.2). For x ∈ G_h, p^h(x, x̃|α) can usually be written as a ratio:

p^h(x, x̃|α) = N^h(x, x̃, α)/D^h(x, α),  ∆t^h(x, α) = h²/D^h(x, α),  inf_{x,α} D^h(x, α) > 0,   (5.2)

where N^h(·), D^h(·) have the form

N^h(x, x̃, α) = N̄(hb(x, α), σ(x), x̃),  D^h(x, α) = D̄(hb(x, α), σ(x)).                        (5.3)
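For concreteness, the following sketch shows one standard way (the one-dimensional version of a finite-difference-based construction of the type discussed in [9, Chapter 5]) of obtaining locally consistent transition probabilities and intervals of the ratio form (5.2)-(5.3); the drift and diffusion functions are illustrative placeholders.

def explicit_transitions(x, alpha, h, b, sigma):
    """One-dimensional 'explicit' Markov chain approximation.

    Returns ({next_state: probability}, Delta t^h) for a chain that is locally
    consistent with dx = b(x,alpha)dt + sigma(x)dw, using the usual upwind
    construction: numerator/denominator as in (5.2)-(5.3)."""
    bv, s2 = b(x, alpha), sigma(x) ** 2
    D = s2 + h * abs(bv)                       # D^h(x,alpha), bounded away from 0 if sigma > 0
    p_up = (s2 / 2.0 + h * max(bv, 0.0)) / D   # N^h(x, x+h, alpha) / D^h
    p_dn = (s2 / 2.0 + h * max(-bv, 0.0)) / D  # N^h(x, x-h, alpha) / D^h
    dt_h = h * h / D                           # Delta t^h(x, alpha) = h^2 / D^h
    return {x + h: p_up, x - h: p_dn}, dt_h

# Illustrative use with placeholder coefficients.
probs, dt_h = explicit_transitions(x=0.5, alpha=0.0, h=0.05,
                                   b=lambda x, a: 1.0 - x + a,
                                   sigma=lambda x: 0.3)
print(probs, dt_h, sum(probs.values()))        # the probabilities sum to 1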
These forms will be adapted for use with the delay case. Define ∆t_n^h = ∆t^h(ξ_n^h, u_n^h) and δz_n^h = [ξ_{n+1}^h − ξ_n^h] I_{{ξ_n^h ∉ G}} ≡ Σ_i d_i δy_{n,i}^h. Centering around the expectation, given the data to step n, yields the dynamical representation

ξ_{n+1}^h = ξ_n^h + ∆t_n^h b(ξ_n^h, u_n^h) + β_n^h + δz_n^h + o(∆t_n^h),                      (5.4)
where the martingale difference β_n^h has conditional covariance a(ξ_n^h, u_n^h) ∆t_n^h + o(∆t_n^h). One appropriate approximation to the cost (2.2) is

E_x^u Σ_{n=0}^{∞} e^{−βt_n^h} [k(ξ_n^h, u_n^h) ∆t_n^h + q′ δy_n^h],  with t_n^h = Σ_{i=0}^{n−1} ∆t_i^h.
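Given the approximating chain, the optimal value for a cost of this discounted form is obtained from the associated Bellman equation; the sketch below does this by simple value iteration over a finite control set. It is a generic illustration under assumed inputs (transition probabilities p^h, intervals ∆t^h, running cost k), not the particular solver used for the data in Section 7. The state set is assumed closed under the transitions, with the reflecting states already folded in.

import numpy as np

def value_iteration(states, controls, p_h, dt_h, k, beta, tol=1e-8, max_iter=10000):
    """Solve the Bellman equation for the approximating chain:
       V(x) = min_a { k(x,a)*dt_h(x,a) + exp(-beta*dt_h(x,a)) * sum_y p_h(x,a)[y] V(y) }.
    p_h(x, a) must return a dict {next_state: probability}."""
    V = {x: 0.0 for x in states}
    for _ in range(max_iter):
        V_new, policy = {}, {}
        for x in states:
            best = None
            for a in controls:
                dt = dt_h(x, a)
                q = k(x, a) * dt + np.exp(-beta * dt) * sum(
                    p * V[y] for y, p in p_h(x, a).items())
                if best is None or q < best[0]:
                    best = (q, a)
            V_new[x], policy[x] = best
        if max(abs(V_new[x] - V[x]) for x in states) < tol:
            return V_new, policy       # value function and minimizing control
        V = V_new
    return V, policy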
The "implicit" procedure. There is a procedure, called the "implicit" approximation, that is derived from the "explicit" method above [9, Chapter 12], and which will be essential in reducing the required memory. The time variable will be just another state variable. The state space of the chain is a discretization of (x, t)-space, and the component due to the time variable increases "randomly." Let δ = O(h) = discretization level for the time variable. Given p^h(·), ∆t^h(·) for the explicit method, it is straightforward to compute those for the implicit method [9, Section 12.4]. Suppose that, at the current step, the time variable does not advance; then the distribution of the next spatial state is just p^h(x, x̃|α). Thus we need only the conditional probability p^{h,δ}(x, nδ; x, nδ + δ|α) that the time variable advances. The transition probabilities p^{h,δ}(·) and interpolation interval ∆t^{h,δ}(·) for the implicit procedure are obtained from p^h(·), ∆t^h(·) by [9, Section 12.4]

p^h(x, x̃|α) = p^{h,δ}(x, nδ; x̃, nδ|α) / [1 − p^{h,δ}(x, nδ; x, nδ + δ|α)],
p^{h,δ}(x, nδ; x, nδ + δ|α) = ∆t^h(x, α) / [∆t^h(x, α) + δ],
∆t^{h,δ}(x, α) = δ ∆t^h(x, α) / [∆t^h(x, α) + δ].                                            (5.5)

Let ξ_n^{h,δ} and φ_n^{h,δ} be the spatial and temporal states at step n under (5.5). Define ∆t_n^{h,δ} = ∆t^{h,δ}(ξ_n^{h,δ}, u_n^{h,δ}). Under (5.5), the local consistency (5.1) holds for the process ξ_n^{h,δ}, with time interval ∆t^{h,δ}(·). If we construct a continuous-time interpolation of the optimal chain with intervals {∆t_n^{h,δ}}, then any weak-sense limit is an optimal process [9].

VI. THE DELAY SYSTEM: WITH THE IMPLICIT METHOD

For the problem with delays, it is particularly important that the numerical algorithms have reasonable memory and computational needs. The "explicit" approximation method, which would discretize (θ, χ^0(t), χ^1(t, θ)) so that (5.1) holds, yields ∆t_n^h = O(h²). Then, due to the connection between the advance in the time variable and θ as seen in the discussion connected with (4.6), the discretized θ would take O(1/h²) values, which is obviously impractical. The implicit method will greatly reduce this. For δ > 0, let θ̄/δ = integer, and let θ take values in T^δ = {−θ̄ + δ, . . . , −δ, 0}. Since χ^1(t, −θ̄) = 0, we do not need the value θ = −θ̄. Use h_0 for the approximation parameter for χ^0(t), and, for each θ ∈ T^δ, let ξ_n^{1,h,δ}(θ) take values in a regular h_1-grid. Redefine h = (h_0, h_1). The chain approximating (χ^0(t), χ^1(t, ·)) will be called (ξ_n^{0,h,δ}, ξ_n^{1,h,δ}(θ), θ ∈ T^δ). ξ_n^{0,h,δ} is immediately reflected back if it leaves G_{h_0}, just as for ξ_n^h. Let φ_n^{h,δ} denote the time variable, as in Section 5.

The transitions when the time variable advances. The transition probabilities depend on whether the time variable advances or not. Using (4.6)-(4.9) as a guide, the increments in θ and in the time variable in the implicit procedure should be the same; indeed, we would not have convergence otherwise. Because of the coordination between the advance in the time variable and the "shift" in θ associated with the operator Φ_δ, if the time variable advances at step n, then we simply shift:

ξ_n^{1,h,δ} ≡ {ξ^1(−θ̄ + δ), . . . , ξ^1(0)} → ξ_{n+1}^{1,h,δ} = {0, ξ^1(−θ̄ + δ), . . . , ξ^1(−δ)},
ξ_{n+1}^{0,h,δ} = ξ_n^{0,h,δ},  φ_{n+1}^{h,δ} = φ_n^{h,δ} + δ.                               (6.1)

Transitions of ξ_n^{0,h,δ} when the time variable does not advance. The forms (5.2), (5.3) will be adapted. This implies that, for ξ_n^{0,h,δ} = x^0 ∈ G_{h_0}, ξ_n^{1,h,δ}(0) = x^1, and control α, with the drift vector for χ^0(·) being x^1 + c(x^0, α), the probability that ξ_{n+1}^{0,h,δ} = x̃^0, conditioned on the time variable not advancing, is

p^h(x^0, x̃^0 | α, x^1) = N̄(h_0(x^1 + c(x^0, α)), σ(x^0), x̃^0) / D̄(h_0(x^1 + c(x^0, α)), σ(x^0)),   (6.2)

with ∆t^h(x^0, x^1, α) = h_0² / D̄(h_0(x^1 + c(x^0, α)), σ(x^0)). Any of the algorithms in [9, Chapter 5] can be used to get (6.2). By (5.5), the probability that the time variable advances is(3)

p^{h,δ}(x^0, nδ; x^0, nδ + δ | α, x^1) = ∆t^h(x^0, x^1, α) / [∆t^h(x^0, x^1, α) + δ],             (6.3)

and the implicit procedure interpolation interval is ∆t^{h,δ}(x^0, x^1, α) = δ ∆t^h(x^0, x^1, α) / [∆t^h(x^0, x^1, α) + δ].

(3) Although the number of steps between shifts for the numerical algorithm is random, in applications the shift is done at deterministic times δ, 2δ, · · ·.
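The following sketch puts (6.1)-(6.3) together as one step of the implicit-procedure chain for the delay case: with probability (6.3) the time variable advances and the memory vector is shifted as in (6.1); otherwise the spatial state makes an explicit-type transition as in (6.2), and the memory vector is updated as described below (see (6.5)-(6.6)). The functions supplying the spatial transitions and the memory increment are placeholders to be provided by the caller; this illustrates the control flow only and is not the exact code used for the numerical data.

import numpy as np

rng = np.random.default_rng(2)

def implicit_delay_step(x0, chi1, phi, alpha, delta,
                        spatial_transitions, memory_increment):
    """One step of the implicit-method chain (xi^{0,h,d}, xi^{1,h,d}(.), phi^{h,d}).

    spatial_transitions(x0, chi1[-1], alpha) -> ({x0_next: prob}, dt_h)   # cf. (6.2)
    memory_increment(x0, alpha, dt_h, dy)    -> array over the theta-grid # cf. (6.5)-(6.6)
    """
    probs, dt_h = spatial_transitions(x0, chi1[-1], alpha)   # chi1[-1] is the theta = 0 slot
    p_advance = dt_h / (dt_h + delta)                        # (6.3)
    if rng.random() < p_advance:
        # time variable advances: shift the memory vector, (6.1)
        chi1_next = np.concatenate(([0.0], chi1[:-1]))
        return x0, chi1_next, phi + delta
    # time variable does not advance: spatial transition per (6.2)
    nxt = list(probs)
    x0_next = nxt[rng.choice(len(nxt), p=[probs[y] for y in nxt])]
    dy = 0.0    # reflection increment; nonzero only at reflecting states (set by the caller's model)
    chi1_next = chi1 + memory_increment(x0, alpha, dt_h, dy)
    return x0_next, chi1_next, phi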
Let u_n^{h,δ} = control at step n, and ∆t_n^h = ∆t^h(ξ_n^{0,h,δ}, ξ_n^{1,h,δ}(0), u_n^{h,δ}). Set ∆z_n^{0,h,δ} = [ξ_{n+1}^{0,h,δ} − ξ_n^{0,h,δ}] I_{{ξ_n^{0,h,δ} ∉ G_{h_0}}}. Define the components ∆y_{i,n}^{0,h,δ} by ∆z_n^{0,h,δ} = Σ_i d_i ∆y_{i,n}^{0,h,δ}. Define ∆t_n^{h,δ} = ∆t^{h,δ}(ξ_n^{0,h,δ}, ξ_n^{1,h,δ}(0), u_n^{h,δ}). Let F_n^{h,δ} = data to step n, and I_n^{h,δ} = indicator that the time variable advances at step n. The construction of the transition probabilities for ξ_n^{0,h,δ} implies that, for ξ_n^{0,h,δ} ∈ G_{h_0},

E[ξ_{n+1}^{0,h,δ} − ξ_n^{0,h,δ} | F_n^{h,δ}] = ∆t_n^{h,δ} [ξ_n^{1,h,δ}(0) + c(ξ_n^{0,h,δ}, u_n^{h,δ})] + o(∆t_n^{h,δ}),
covar_n^{h,δ,α} [ξ_{n+1}^{0,h,δ} − ξ_n^{0,h,δ}] = a(ξ_n^{0,h,δ}) ∆t_n^{h,δ} + o(∆t_n^{h,δ}).

This is the local consistency (5.1). It follows that we can represent the increment in the dynamical form

ξ_{n+1}^{0,h,δ} = ξ_n^{0,h,δ} + ∆t_n^{h,δ} ξ_n^{1,h,δ}(0) + ∆t_n^{h,δ} c(ξ_n^{0,h,δ}, u_n^{h,δ}) + β_n^{0,h,δ} + ∆z_n^{0,h,δ} + o(∆t_n^h),   (6.4)

where β_n^{0,h,δ} is a martingale difference with conditional covariance a(ξ_n^{0,h,δ}) ∆t_n^{h,δ} + o(∆t_n^{h,δ}).

The transitions for ξ_n^{1,h,δ}(θ) when time does not advance. The form (6.2) was simply a consequence of (5.2), (5.3), since χ^0(t) is a diffusion. Getting appropriate transition probabilities for ξ_n^{1,h,δ}(θ) when time does not advance is a little more complicated, and we use the comments concerning (4.6)-(4.9) to guide us. In particular, the transitions aim to approximate the conditional expectation of the summands in (4.9), as follows. Replace the δ_0 in (4.9) by ∆t_n^h and let ξ_{n+1}^{1,h,δ}(θ) − ξ_n^{1,h,δ}(θ) satisfy

E[ξ_{n+1}^{1,h,δ}(θ) − ξ_n^{1,h,δ}(θ) | I_n^{h,δ} = 0, F_n^{h,δ}] = q_n^{h,δ}(θ),              (6.5)

where

q_n^{h,δ}(θ) = p(ξ_n^{0,h,δ}, θ) ∆y_n^{0,h,δ} + b(ξ_n^{0,h,δ}, u_n^{h,δ}, θ) [μ_a(θ) − μ_a(θ − δ)] ∆t_n^h / δ.   (6.6)
Only one of the terms in (6.6) can be nonzero at a time. If ∆y_n^{0,h,δ} ≠ 0, then we are at a reflection step for ξ_n^{0,h,δ}, and the change in ξ_n^{0,h,δ} is what is required to bring the path back to G_{h_0}. Then the transition probability for ξ_n^{1,h,δ}(θ) is just that needed to assure (6.5) with q_n^{h,δ}(θ) = p(ξ_n^{0,h,δ}, θ − δ) ∆y_n^{0,h,δ}. The non-zero components of ∆y_n^{0,h,δ} are ±h_0. To realize the conditional mean (6.5), randomize between the grid points that are closest to q_n^{h,δ}(θ). The randomization errors are asymptotically negligible [8, Chapter 9]. A suitable cost function is

E_{χ^0(0),χ^1(0)}^u Σ_{n=0}^{∞} e^{−βt_n^{h,δ}} [k(ξ_n^{0,h,δ}, u_n^{h,δ}) ∆t_n^{h,δ} + q′ ∆y_n^{0,h,δ}].   (6.7)
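The randomization mentioned above, which realizes the conditional mean (6.5) on the h_1-grid for ξ^{1,h,δ}(θ), can be implemented as in the following sketch. This is an assumption-level illustration: the argument q is the desired conditional-mean increment q_n^{h,δ}(θ) and h1 is the memory-grid spacing.

import numpy as np

rng = np.random.default_rng(3)

def randomized_memory_move(chi1_theta, q, h1):
    """Move xi^{1,h,delta}(theta) so that the expected increment equals q (cf. (6.5)):
    randomize between the two neighboring h1-grid points that bracket chi1_theta + q."""
    target = chi1_theta + q
    lower = np.floor(target / h1) * h1
    upper = lower + h1
    p_upper = (target - lower) / h1        # choosing 'upper' with this probability gives mean = target
    return upper if rng.random() < p_upper else lower

# Illustrative check that the conditional mean is realized.
samples = [randomized_memory_move(0.30, q=0.013, h1=0.05) for _ in range(200000)]
print(np.mean(samples))                    # close to 0.313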
Convergence. Let ψ^{i,h,δ}(·), u^{h,δ}(·) be the continuous-time interpolations of the optimal {ξ_n^{i,h,δ}, u_n^{h,δ}}, and z^{0,h,δ}(·) the interpolation of Σ_{i=0}^{n−1} ∆z_i^{0,h,δ}, all with intervals {∆t_n^{h,δ}}. Then we have (mod asymptotically negligible terms)

ψ^{0,h,δ}(t) = x(0) + ∫_0^t ψ^{1,h,δ}(s, 0) ds + ∫_0^t c(ψ^{0,h,δ}(s), u^{h,δ}(s)) ds + M^{h,δ}(t) + z^{0,h,δ}(t),

and the quadratic variation of the martingale M^{h,δ}(·) is ∫_0^t a(ψ^{0,h,δ}(s)) ds. The proofs in [7], [8] show that ψ^{0,h,δ}(·) → x(·), that the first integral on the right converges to the terms with delays in (2.1), and that the limit is optimal.

An illustration with (3.1). The construction in the last paragraph will be illustrated by the system (3.1). We assume that there is only a single point delay for both the control and reflection process, at −θ̄. (3.1) is two-dimensional. Relating it to (2.1), we have p_1(·) = b_1(·) = 0, since the delays are all in the second component. By the nature of the discretization, the procedure cannot distinguish between this point delay and one where the delay is distributed over [−θ̄, −θ̄ + δ] with density 1/δ. So we let p_2(·) pick out the buffer overflow component U(·) of z(·) and set p_2(x, θ) = −q_3 x_2/δ and b_2(x, u, θ) = q_2 u for θ ∈ [−θ̄, −θ̄ + δ], with b(x, u, θ) = p(x, θ) = 0 elsewhere, and μ_a(−θ̄ + δ) − μ_a(−θ̄) = 1. Then for θ = −θ̄ + δ, (6.6) is b(ξ_n^{0,h,δ}, u_n^{h,δ}, θ) ∆t_n^h/δ + p(ξ_n^{0,h,δ}, θ) ∆y_n^{0,h,δ}, and equal to zero for other values of θ. For specificity, let θ̄ = 4δ. Then we can write

ξ_n^{1,h,δ} = {ξ_n^{1,h,δ}(−3δ), ξ_n^{1,h,δ}(−2δ), ξ_n^{1,h,δ}(−δ), ξ_n^{1,h,δ}(0)},

with ξ_n^{1,h,δ}(−θ̄) = 0. For this example, between advances of the time variable, only ξ_n^{1,h,δ}(−3δ) is updated.

Computing u(·). Owing to the wave equation representation, the functional dependence of the computed controls has the form u(ξ_n^{0,h,δ}, ξ_n^{1,h,δ}(θ), θ ∈ T^δ). Since ψ^{0,h,δ}(·) → x(·), in an application one can use x(t) for ξ_n^{0,h,δ}. But ξ_n^{1,h,δ}(θ) and its limit χ^1(t, θ) must be approximated from the applications data, appropriately guided by (4.9) and (6.5). The following method worked well and will lead to a convergent procedure. When applying the computed controls to the physical system in the simulations, we used a discrete-time approximation of the original model, with small interval δ_1.
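The memory approximation just described can be organized as in the sketch below: the router keeps the recent history of its own control actions and observed overflow increments over [t − θ̄, t) and forms the discretized χ^1(t, θ) by the shift-and-accumulate recursion of (4.9)/(6.5); with θ̄ = 4δ, only the slot for θ = −3δ is updated between shifts, as in the example above. The class structure, variable names, and the update interval δ_1 are illustrative assumptions, not the implementation used for the data in Section 7.

import numpy as np

class MemoryEstimator:
    """Maintain an estimate of chi^1(t, theta), theta in {-3d, -2d, -d, 0},
    for the model (3.1) with theta_bar = 4*delta, from observed controls and
    overflow increments at the router (illustrative sketch only)."""

    def __init__(self, delta, q2, q3, delta1):
        self.delta, self.q2, self.q3, self.delta1 = delta, q2, q3, delta1
        self.chi1 = np.zeros(4)          # slots for theta = -3d, -2d, -d, 0
        self.clock = 0.0

    def observe(self, u_sent, dU1, x2):
        """Accumulate one delta_1 interval: only the theta = -3*delta slot changes
        between shifts (cf. (6.6) with the point delay near -theta_bar)."""
        self.chi1[0] += self.q2 * u_sent * (self.delta1 / self.delta) \
                        - self.q3 * (x2 / self.delta) * dU1
        self.clock += self.delta1
        if self.clock >= self.delta - 1e-12:   # time to shift, cf. (6.1)
            self.chi1 = np.concatenate(([0.0], self.chi1[:-1]))
            self.clock = 0.0

    def state(self):
        """Memory arguments for the computed feedback u(x(t), chi^1(t, theta))."""
        return self.chi1.copy()

# Illustrative use with assumed parameters.
est = MemoryEstimator(delta=0.25, q2=1.0, q3=1.0, delta1=0.05)
for _ in range(20):
    est.observe(u_sent=-0.1, dU1=0.0, x2=1.0)
print(est.state())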