Dynamic safety-stocks for asymptotic optimality in stochastic networks

Report 1 Downloads 43 Views
Dynamic safety-stocks for asymptotic optimality in stochastic networks∗ Sean Meyn† February 12, 2004

Abstract This paper concerns control of stochastic networks using state-dependent safetystocks. Three examples are considered: a pair of tandem queues; a simple routing model; and the Dai-Wang re-entrant line. In each case, a single policy is proposed that is independent of network load ρ• . The following conclusions are obtained for the controlled network, where the finite constant K0 is independent of load. (i) An optimal policy for a one-dimensional relaxation stores all inventory in a single buffer i∗ . The policy for the (un-relaxed) stochastic network maintains hX i   E k ≥ 0, Qi (k) ≤ K0 E log(1 + Qi∗ (k)) , i6=i∗

where Q(k) is the `-dimensional vector of buffer lengths at time k, initialized at Q(0) = 0. (ii) The policy is fluid-scale optimal, and approximately average-cost optimal: The steadystate cost η satisfies the bound η∗ ≤ η ≤ η∗ + K0 log(η∗ ),

0 < ρ• < 1,

where η∗ is the optimal steady-state cost. These results are based on the construction of an approximate solution to the average-cost dynamic programming equations via the one-dimensional relaxation and an associated fluid model. Keywords: Queueing Networks, Routing, Scheduling, Optimal Control. AMS subject classifications: Primary: 90B35, 68M20, 90B15; Secondary: 93E20, 60J20.

∗ This paper is based upon work supported by the National Science Foundation under Award Nos. ECS 02 17836 and DMI 00 85165. † Department of Electrical and Computer Engineering and the Coordinated Sciences Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801, U.S.A. ([email protected]).

1

1

Introduction

This paper concerns control of multi-dimensional stochastic networks based on optimization theory for stochastic and deterministic network models. In the deterministic or stochastic model there are ` ≥ 1 buffers in the network. The `-dimensional queue-length process is defined in (7) below in discrete-time for the stochastic model, but we may interpolate to obtain a piecewise linear function of t ∈ R+ . We let Q(t; x) denote the vector of bufferlengths at time t from an initial condition x ∈ Z`+ . An associated deterministic fluid model evolves on R`+ , with state denoted q(t; x) at time t from the same initial condition (see (8).) The following two control criteria for the stochastic model are based on a given linear cost function c : R`+ → R+ . The average-cost optimality criterion concerns the steady-state behavior of the network; and the fluid-scale optimality criterion is based on the transient behavior from large initial conditions. The latter view of optimization has received significant attention recently due to its tractability, and the stability properties that may be obtained for the controlled network [6, 13, 11, 19]. Optimality under each of these criteria is formally defined as follows: A policy is called average-cost optimal for the stochastic model if the average cost, η := lim sup E[c(Q(t; x))] t→∞

is minimized over all policies, for each initial condition x ∈ Z`+ . This is independent of x under the optimal policy [11]. Define for each n ≥ 1, x ∈ R`+ , the scaled process {q n (t; x) = n−1 Q(nt; [nx]) : t ∈ R+ }, where [y] ∈ Z`+ denotes the integer part of y ∈ R`+ . A policy is called fluid-scale optimal if lim sup Jn (x, T ) := lim sup Ex n→∞

n→∞

hZ

T 0

i c(q n (t; x)) dt ≤ J∗ (x),

where the minimal value function is defined for the fluid model by Z ∞ c(q(t; x)) dt, x ∈ R`+ , J∗ (x) := min

x ∈ Rn+ , T ≥ 0,

(1)

(2)

0

and the minimum is over all policies. The inequality in (1) is necessarily an equality for all T = T (x) sufficiently large. For the class of network models considered in this paper, it follows from [11, Theorem 7.2] that the average-cost optimal policy for the stochastic model is always fluid-scale optimal. A standard technique used for effective control of a stochastic network is through static thresholds, or safety-stocks (see e.g. [1, 3, 18].) Several recent papers explore the application of switching curves that grow logarithmically with increasing network congestion to define dynamic safety-stocks. A general approach of this form is proposed in [12, p. 194] based on heuristics and numerical results, and [14, p. 207] gives further motivation in a heavy-traffic analysis. A similar policy is analysed in depth in [6] for a pair of queues in tandem. It is found that the policy is stabilizing, and fluid-scale optimal. Moreover, a bound is obtained on the rate of convergence in (1). 2

α

x2 µ3

5 log(1 + x1 /5)

router

Optimal policy

µ1

µ2

x1

Figure 1: The optimal policy for the simple routing model. The grey region in the plot at right shows the optimal switching curve obtained using value iteration for the simple routing model. Within this region, arriving packets are routed to buffer 2. The network parameters are α = 9/19, µ1 = µ2 = 5/19, and the one step cost is c(x) = 2x1 +3x2 . The three concave curves are plots of the switching curve sγ (x1 ) = γ log(1+x1 /γ) for γ = 2, 5, 20. The best fit is when γ = 5. This closely matches the lower bound presented in Proposition 4.1, which in this instance is equal to 3β −1 = 3/ log(9/5) ≈ 5.1.

Consider for example the simple routing model shown at left in Figure 1. Customers arrive to the system at rate α, and are routed to one of the two servers based on a particular policy. Reference [7] develops structure of optimal policies for this and related models, and [10, 18] contains analyses of threshold policies for approximate optimality. Shown at right in Figure 1 is the optimal policy for a stochastic model, as well as several instances of the logarithmic switching curves considered in this paper (further details may be found in Section 4.) In this example it is apparent that the optimal policy may be closely approximated by a switching curve of this simple logarithmic form. This paper contains results that explain why one would expect such approximations to hold in general. The main idea is to modify the fluid value function to obtain an approximate solution to the average-cost dynamic programming equations for the stochastic network. This is also known as the relative value function, and is also a solution to Poisson’s equation for the optimal policy. The modifications are based upon analyses of low-dimensional workload relaxations. The approximations are accurate enough to establish approximate average-cost optimality as well as fluid-scale optimality in the examples considered. Some of the conclusions in this paper are generalizations of well-known structural properties of the following simple queueing model. For an initial condition Q(0) = x ∈ Z+ , the queue-length process evolves according to the recursion, Q(k + 1) = Q(k) − S(k + 1)U (k) + A(k + 1),

k ≥ 0.

(3)

It is assumed that the joint arrival-service process {A(k), S(k) : k ≥ 1} is i.i.d. with a finite second moment; the common marginal distribution of S = {S(k) : k ≥ 1} is binary; and the marginal distribution of A is supported on Z`+ . The allocation sequence U is defined as the non-idling policy (i.e. U (k) = 1 if Q(k) ≥ 1). The controlled process is a Markov chain on Z+ whose transition probability is denoted P , and expectation operator Ex when Q(0) = x ∈ Z+ . 3

The queueing model (3) is a version of the controlled random walk model that is considered in various multi-dimensional settings below. The associated fluid model is given by the ordinary differential equation (ODE) model, d+ q(t) = −µζ(t) + α , dt where the ‘+’ denotes right derivative, and α = E[A(1)], µ = E[S(1)]. The allocation process is again defined to be the non-idling policy, ζ(t) = 1 if q(t) > 0. It is well-known that upon sampling via uniformization, an M/M/1 queue may be represented as the solution to the recursion (3). In this special case, the arrival stream is also binary valued, with α = P(A(k) = 1) = 1 − P(S(k) = 1) = 1 − µ. It is sometimes convenient to assume that the processes A and S are mutually independent in (7). While this is violated for the standard sampled M/M/1 queue, a different sampling does yield independence. Indeed, consider an M/G/1 queue sampled at service completions. This is of the form (3) with S(k) ≡ 1 for all k ≥ 1, and the distribution of A(k) coincides with the number of Poisson arrivals during the kth service period. The following proposition is required in our treatment of one-dimensional workload relaxations, and is used to illustrate our main conclusions: Proposition 1.1 Consider the general one-dimensional queueing model (3) satisfying ρ = α/µ < 1, and σ 2 := ρm2 + (1 − ρ)m2A < ∞, where m2 = E[(S(1) − A(1))2 ] < ∞, m2A = E[A(1)2 ] < ∞. Then, (i) The fluid model satisfies, J∗ (x) :=

Z



q(t) dt =

0

1 1 x2 , 2µ−α

q(0) = x ∈ R+ .

(ii) There is a unique invariant probability distribution π on Z+ satisfying η := Eπ [Q(k)] =

1 σ2 , 2µ−α

k ≥ 0.

(iii) The function h∗ : Z+ → R+ defined as h∗ (x) = J∗ (x) +

1  m2 − m2A  x, 2µ µ−α

x ∈ Z+ ,

solves Poisson’s equation, P h∗ (x) = h∗ (x) − x + η. Proof

Let a = a(x) = I(Q(0) 6= 0). We have from the definitions, Ex [Q(1)] = x − aµ + α Ex [Q(1)2 ] = x2 − 2(µ − α)x + am2 + (1 − a)m2A ,

x ∈ Z+ .

The second identity implies a version of the drift inequality (V3) of [16]. It follows from the f -Norm Ergodic Theorem of [16] that a unique invariant probability measure exists with 4

finite mean. A solution to Poisson’s equation also exists, and it is unique up to an additive constant. It follows from the definitions and identities above that the function h∗ given in (iii) is the unique solution satisfying h∗ (0) = 0. u t The approximations obtained in this paper are based on consideration of various workload models. A one-dimensional workload process W corresponding to the most heavily loaded c , which is a version of the simple queueing model (3). station is compared to its relaxation W The two processes are defined using the same driving sequences A and S. For each k ≥ 0 c ∗ (k) holds with probability one, under any policy for Q, where the lower bound W (k) ≥ W ∗ c is controlled using the non-idling policy. the relaxation W The analysis is based on the construction of a Lyapunov function V : Z`+ → R for the stochastic network model, satisfying a drift criterion of the form, P V (x) ≤ V (x) − c(x) + ηˆ∗ + E(x),

x ∈ Z`+ ,

(4)

where ηˆ∗ is the optimal cost for the relaxation, and the error E has at most logarithmic growth. The Lyapunov function is expressed as the optimal value function J∗ plus a relatively small perturbation. The main results consist of extensions of Proposition 1.1 and results of [6, 14, 1] to three particular multi-dimensional examples. A fixed policy for the stochastic network is constructed based on a translation of the policy for the relaxation, and the following bounds are obtained for the a one-dimensional family of network models with increasing load. For some fixed constant K0 < ∞, Fluid-Scale Optimality The scaled process converges, q n (t; x) → q∗ (t; x) as n → ∞ for each x and t, and the following explicit bound is obtained, 0 ≤ lim sup n→∞

 kxk n  , Jn (x, T ) − J∗ (x) ≤ K0 log(n) 1 − ρ•

T = T◦ (x),

(5)

where T◦ (x) is the draining time for q ∗ , of order kxk(1 − ρ• )−1 . Heavy-Traffic Optimality With η equal to the steady state cost for the controlled network, and ηˆ∗ the average cost for the one-dimensional relaxation, ηˆ∗ ≤ η ≤ ηˆ∗ + K0 log(ˆ η∗ ),

ρ• → 1.

(6)

The remainder of the paper is organized as follows. The next section provides a description of the models considered. Sections 3–5 contain detailed analyses of three examples, based on general structure established in the Appendix. Conclusions and extensions are described in Section 6.

5

2

Controlled random walk model and its relaxations

We review in this section stochastic network models and various relaxations to describe a multi-class queueing network. In the controlled random walk model the queue length process evolves according to the recursion, Q(k + 1) = Q(k) + B(k + 1)U (k) + A(k + 1),

k ≥ 0, Q(0) = x .

(7)

The queue length process Q, the allocation sequence U , and the arrival sequence A are each `-dimensional, and B is an ` × ` matrix seqeunce. There are `r stations: the `r × ` constituency matrix, denoted C, has binary entries, with Cji = 1 if and only if buffer i resides at station j. The allocation vector U (k) ∈ Z`+ is assumed to have binary entries. If Ui (k) = 1, then buffer i receives priority at its respective station at time k. Routing is assumed deterministic, and service statistics are determined by the station, not the buffer. This is captured through the following specifications for the matrix sequence B: An associated service process is denoted S(k) = [S1 (k), . . . , S`r (k)]T , where Si (k) takes on binary values for i = 1, . . . , `r . We let R denote the ` × ` routing matrix defined as Rij = 1 if and only if jobs leaving buffer i then move to buffer j. It is assumed that B(k) = [I−RT ]M (k), where M denotes the the diagonal matrix sequence, M (k) = diag(S(k)T C),

k ≥ 1.

That is, Mii (k) = Sj (k) if and only if Cji = 1. The networks considered in this paper are assumed to be open, so that any customer entering the system may eventually exit. This is equivalent to the assumption that RN = 0 for some integer N ≥ 2. The following additional assumptions are imposed on the policy and parameters:  : k ≥ 1} is an i.i.d. sequence of (` + `r )-dimensional vectors. For (A1) The sequence { A(k) S(k) each k ≥ 1, 1 ≤ i ≤ `r , the distribution of Si (k) is binary. The distribution of A(k) is supported on Z`+ , and satisfies E[kA(k)k2 ] < ∞. (A2) The allocation sequence U is adapted to (Q, A, S), and satisfies the linear constraints, U (k) ≥ 0 ,

CU (k) ≤ 1,

k ≥ 0,

where {0, 1} denote the vectors consisting entirely of zeros, ones, respectively, and inequalities between vectors are interpreted component-wise. (A3) The queue length process Q is constrained to Z`+ . The allocation process is defined by a stationary policy in the three examples considered below. In this case, the controlled process Q is a Markov chain whose transition probability on Z`+ ×Z`+ is again denoted P , and the expectation operator is denoted Ex when Q(0) = x ∈ Z`+ . We denote the common mean of the arrival and service variables using the usual notation, E[Ai (k)] = αi ,

E[Sj (k)] = µj , 6

1 ≤ i ≤ `, 1 ≤ j ≤ `r ,

and we let B = E[B(k)]. The load vector may be expressed, ρ = −CB −1 α, and the system load is the maximum, ρ• = maxi ρi . The associated fluid model evolves in continuous time, and is defined using these averaged parameters: q(t) = x + Bz(t) + αt, t ≥ 0, (8) where x ∈ R` is the initial condition, and z is the cumulative allocation process, subject to the constraints z(t) − z(s) ≥ 0,

C[z(t) − z(s)] ≤ (t − s)1,

0 ≤ s ≤ t.

In this paper we consider one dimensional relaxations of the CRW and fluid models, based on the following `r × ` workload matrix, Ξ = C[I − RT ]−1 . The inverse exists as a power series since the routing matrix has spectral radius strictly less than unity. The workload process associated with the CRW model is defined as W (k) = C[I − RT ]−1 Q(k), k ≥ 0. This is a normalized version of workload, chosen so that its components have non-negative, integer values. For each 1 ≤ m ≤ `r we let ξ m ∈ Z`+ denote the mth row of Ξ. The mth component of the workload process may be expressed Wm (k) = hξ m , Q(k)i, and evolves according to the recursion, Wm (k + 1) = Wm (k) − Sm (k + 1) + Sm (k + 1)Im (k) + Lm (k + 1),

k ≥ 0,

where Lm (k) = hξ m , A(k)i, and the idleness process I m is defined by X Im (k) = 1 − Ui (k). i:Cmi =1

This satisfies 0 ≤ Im (k) ≤ 1 for each k. The sequence Lm is interpreted as the “effective arrival process” to station m. We let λm denote the common mean of {Lm (k) : k ≥ 1}, so that ρm = λm /µm , 1 ≤ m ≤ `r . The one-dimensional relaxation is defined on the same probability space with Q, and evolves as a controlled random walk, cm (k + 1) = W cm (k) − Sm (k + 1) + Sm (k + 1)Iˆm (k) + Lm (k + 1), W

k ≥ 0.

This is identical in form to the previous recursion, but in this relaxation we allow the idleness process to take any value in Z+ . It is assumed that the idleness process is adapted to A, S, c m is constrained to Z+ . with non-negative integer values, and that W A relaxation for the fluid model is defined in the same manner: For a given m ∈ {1, . . . , `r }, the workload process is respresented as the solution to the controlled ODE, d+ w ˆm (t) = −(µm − λm ) + ι(t), dt 7

where w ˆ m is constrained to R+ , and the idleness rate ι is also non-negative. One must define a cost function on the workload process in order to formulate an optimal control problem for the one dimensional relaxations. Fix m ≤ `r as above, and given any cost function c : R`+ → R+ , define the effective cost c : R+ → R as the solution to the non-linear program, c(w) = min c(x) s.t. hξ m , xi = w (9) x ∈ R`+ . For a given w ∈ R+ , the effective state x∗ (w) is defined to be any x ∈ R`+ that optimizes (9). In the following two typical situations the effective cost is easily computed: (i) Linear cost function If c(x) = cT x, x ∈ R`+ , then the effective state is given by  1 ∗ x∗ (w) = m ei w, w ∈ R+ , ξi∗  where the index i∗ is any solution to ci∗ /ξim∗ = min1≤i≤` ci /ξim . The effective cost is thus given by the linear function, c(w) = c(x∗ (w)) = (ci∗ /ξim∗ )w, w ∈ R+ .

(ii) Quadratic cost function If c(x) = 21 xT Dx, x ∈ R`+ , with D > 0 then the effective state is again linear in the workload value. We have the explicit expression,  −1 −1 m  w ∈ R+ , x∗ (w) = ξ m T D−1 ξ m D ξ w,

provided D −1 ξ m ∈ R`+ so that x∗ is feasible. In this case the effective cost is the onedimensional quadratic, −1 2 w , w ∈ R+ . c(w) = 21 ξ m T D −1 ξ m The feasibility constraint is always satisfied when D is a positive-definite diagonal matrix. In cases (i) and (ii) the effective cost is monotone. Consequently, the following lower bounds hold for the stochastic and fluid models, ∗ (k)), cm c(Q(k)) ≥ c(W

c(q(t)) ≥

∗ (t)), c(w ˆm

k ∈ Z+ , t ∈ R+ ,

∗ (0) = w ∗ (0) = hξ m , Q(0)i. The ‘star’ cm where each process has the common initialization, W ˆm is used in this notation to stress that the relaxed model is controlled using the non-idling policy.

When the cost function c is linear one may construct a solution for the fluid model that ∗ couples with w ˆ ∗ in the sense that q(t) = (ξim∗ )−1 ei hξ m , q(t)i after a transient period T1 that is bounded as ρm ↑ 1 (see [14, Theorem 4.1]). An analogous result holds for any monotone cost function in which the effective state lies on a fixed line in R`+ . Consequently, one has c(q(t)) = c(w ˆ∗ (t)) for t ≥ T1 . This one-dimensional state-space collapse is also seen in heavy-traffic limit theory of stochastic networks [8, 2, 20, 14]. Observe that in the case of quadratic cost with D > 0 diagonal, then x∗i > 0 for each buffer i at station m. This fact makes translation of a policy from the fluid model to its discretestochastic counterpart relatively transparent. This is not true when the cost is linear. In this 8

case, x∗i = 0 for all but one value of i when w 6= 0. If buffer i∗ does not reside at station m then one must use some form of safety-stock to avoid starvation at the bottleneck station. In each of the examples that follow we have by assumption i∗ = 1, so that all inventory lies at the first buffer in the relaxation. An approximate translation of this policy to the CRW model is obtained using the following switching curve, sγ (x1 ) := γ log(1 + x1 γ −1 ),

x1 ≥ 0,

(10)

where γ > 0 is fixed. When the total inventory at the bottleneck station m satisfies at time k, X Qi (k) ≤ sγ (Q1 (k)) i:Cmi =1

then there is risk of starvation at station m, and the policy is designed to move customers from buffer 1 to the down-stream station. In the following three sections we describe in detail the construction of a policy, and appropriate values for the parameter γ.

3

Tandem queues α1

µ1

µ2

Station 1

Station 2

Figure 2: Two queues in tandem.

The model shown in Figure 2 was considered recently in [6] in their analysis of value functions for networks. We assume as in that paper that µ1 c1 < µ2 c2 and µ1 > µ2 . For the fluid model there is a pathwise optimal policy that maintains ζ1 (t) = 0 until buffer two is empty. Thereafter, the second buffer remains empty, but ζ1 (t) = µ2 /µ1 and ζ2 (t) = 1 until the system is empty. We focus on the second workload process since ρ1 < ρ2 . This is given by the sum d+ w2 (t) = q1 (t) + q2 (t) in the fluid model, and satisfies the lower bound dt w2 (t) ≥ −(µ2 − α1 ) under any policy. The lower bound is achieved if, and only if ζ2 (t) = 1. Its relaxation is +

d defined by the controlled differential equation, dt w ˆ2 (t) = −(µ2 − α1 ) + ι(t), where ι(t) ≥ 0 for all t ≥ 0, and the workload process w ˆ 2 is non-negative. The effective cost is given by c(w) = min c(x), subject to x1 + x2 = w, or c(w) = c1 w for w ∈ R+ . The dynamics of qˆ are defined as follows. Given qˆ(0) = x, we have qˆ(0+) = (x1 + x2 )e1 . The process then moves exactly as the primary fluid model in a straight line towards the origin, so that qˆ(t) = x∗ (w(t)) ˆ for all t > 0. The value function for this relaxation is given by, (x1 + x2 )2 Jˆ∗ (x) = 21 c1 , x ∈ R2+ . µ2 − α1

9

The fluid value function for q is given by the following perturbation: J∗ (x) = Jˆ∗ (x) + 21 (c2 − c1 )

x22 , µ2

x ∈ R2+ .

(11)

The gap J∗ (x) − Jˆ∗ (x) is non-negative, and remains bounded as ρ2 ↑ 1 for fixed x ∈ R2+ . As for the stochastic network, we first consider a model obtained via uniformization, as in [6]. This is of the form (7) where the three-dimensional process {(A1 (k), S1 (k), S2 (k)) : k ≥ 1} is i.i.d. with marginal distribution given by, P{(A1 (k), S1 (k), S2 (k))T = ej } = µj−1 ,

j = 1, 2, 3,

where we have set µ0 := α1 = E[A1 (k)], and ej denotes the jth standard basis vector in R3 . The workload process corresponding to the second station is defined as W2 (k) = Q1 (k) + Q2 (k), k ≥ 0. Its relaxation is a version of the CRW queue given in (3): c2 (k + 1) = W c2 (k) + A1 (k + 1) − S2 (k + 1) + S2 (k + 1)Iˆ2 (k), W

k ≥ 0.

(12)

The idleness process Iˆ2 is adapted to {(A1 (k), S1 (k), S2 (k)) : k ≥ 1}, and is non-negative. This one-dimensional relaxation has mean drift α1 − µ2 , and second order parameters given by m2 = µ2 + α1 and m2A = α1 . Applying Proposition 1.1, it is evident that the steady-state mean of the effective cost is c2 (k))] = ηˆ∗ := lim E[c(W k→∞

α1 c1 . µ2 − α1

(13)

The relative value function for (12) is given by a simple quadratic, ˆ ∗ (w) = Jˆ∗ (w) + 1 c1 h 2

w w2 + w = 21 c1 , µ2 − α1 µ2 − α1

w ∈ Z+ .

We now define a policy for the CRW model using the switching curve sγ : R+ → R+ defined in (10). The policy at station 2 is non-idling, and at station 1 the policy is defined by U1 (k) = I(Q2 (k) ≤ sγ (Q1 (k)) and Q1 (k) ≥ 1), k ≥ 0. This is a minor modification of the policy proposed in [6] - the constant γ −1 within the logarithm in (10) is introduced to ensure a moderate slope at the origin. If γ > 0 is chosen sufficiently large than station 2 is rarely starved of work, and the policy ensures that the set {x ∈ Z2+ : x2 ≤ 2 + sγ (x1 )} is absorbing for the controlled process, so that the second queue is much smaller than the first whenever the system is congested. Consider the following candidate Lyapunov function, V (x) = J∗ (x) + 21 c1

c1 x1 + x2 x2 + 1 (c2 − c1 ) + b(x), µ2 − α1 2 µ2 µ2 − α1

x ∈ Z2+ ,

(14)

where J∗ is defined in (11), and the function b : Z2+ → R+ is defined as the correction term, b(x) =



 1 x1 e−βx2 , µ1 − µ2 10

x ∈ Z2+ ,

with β = log(µ1 /µ2 ). Asymptotic optimality for the policy (10) is established in the following result. Fluidscale optimality was obtained previously in [6] for this model using a similar policy. In Proposition 3.1 the service rate µ1 is fixed, and µ2 varies between µ1 and α1 . At the minimal value µ2 = α1 we have ρ2 = α1 /µ2 = 1, and ρ1 = ρ¯1 := 21 (1 − µ1 )/µ1 < 1. Proposition 3.1 Consider the tandem queue with µ1 > 31 fixed, and with γ fixed at some value satisfying γ ≥ 3(| log(¯ ρ1 )|)−1 . Then, the following bounds hold uniformly for this set of parameters, and uniformly over x ∈ Z2+ : For a fixed constant K0 depending only on µ1 , (i) Lyapunov drift inequality: The bound (4) holds, where  E(x) ≤ K0 log(1 + kxk) +

−2  1 1 + kxk , µ2 − α1

x ∈ Z2+ .

(ii) Fluid-scale optimality: The two dimensional controlled model satisfies the bound (5). (iii) Heavy-traffic optimality: The steady-state mean η for the two dimensional controlled model satisfies (6), with ηˆ∗ given in (13). Proof Given the result (i), the remaining conclusions follow from general results presented in the Appendix: Part (ii) follows directly from Proposition B.5, and Part (iii) is given in Proposition B.8. We now proceed with the most difficult part: establishing the drift inequality in (i). Consider first the transition kernel applied to the function without the correction term, h0 (x) := J∗ (x) + 12 c1

x1 + x2 x2 + 21 (c2 − c1 ) µ2 − α1 µ2

After some manipulations, an application of the transition kernel to this function gives the following identities for x ∈ Z2+ , U (0) = (a1 , a2 )T ∈ {0, 1}2 : P h0 (x) = h0 (x) − c(x) + ηˆ∗

    c2 −c1 c1 µ x + a µ (x + 1) . + (1 − a2 ) µ2 −α1 2 1 1 1 2 µ2

(15)

For x2 > sγ (x1 ), so that U1 (0) = a1 = 0, U2 (0) = a2 = 1, this satisfies the desired drift inequality exactly. However, there is trouble when x2 = 0, in which case a2 = 0, and the potentially large term c1 (µ2 − α1 )−1 µ2 x1 is introduced. The function b( · ) is included in the definition (14) to counteract this. Write b0 (x) = e−βx2 and b1 (x) = x1 b0 (x), x ∈ Z2+ . For any initial state and action we have, P b0 (x) = [α1 + µ1 e−βa1 + µ2 eβa2 ]b0 (x). The constant β is chosen so that the function b0 is locally harmonic near the lower boundary: If a1 = a2 = 1 then P b0 = b0 . In fact, a closer look at the policy yields the following identities

11

whenever x2 ≥ 1: P b0 (x) = b0 (x) = b0 (x) + µ1 (eβ − 1)

1 ≤ x2 ≤ sγ (x1 ); x2 > sγ (x1 ).

P b1 (x) = b1 (x) − (µ2 − α1 )b0 (x) = b1 (x) + α1 b0 (x) + µ1 (eβ − 1)b1 (x)

1 ≤ x2 ≤ sγ (x1 ); x2 > sγ (x1 ).

(16)

Moreover, we have the pair of bounds, b0 (x) ≤ e−βγ log(1+x1 /γ) ≤ (x1 /γ)−βγ ,

b1 (x) ≤ x1 (x1 /γ)−βγ ,

when x2 > sγ (x1 ). (17)

The requirement that γ ≥ 3β −1 is imposed to obtain strong bounds on these functions above the switching curve sγ . In particular, the mean increment P b1 − b1 is uniformly bounded over x ∈ Z2+ . We have, for x2 = 0, P b0 (x) = b0 (x) − (µ1 − µ2 ) = 1 − (µ1 − µ2 ); P b1 (x) = b1 (x) − (µ2 − α1 ) − (µ1 − µ2 )x1 .

(18)

The coefficient of b2 in the function b is chosen so that P b (x) = −c1 (µ2 −α1 )−1 µ2 x1 whenever x2 = 0, x1 6= 0. Combining the bounds (18) with (15) gives, P V (x) − [V (x) − c(x) + ηˆ∗ ]     c2 −c1 1 µ x + a µ (x + 1) = (1 − a2 ) µ2c−α 2 1 1 1 2 µ2 1    c1 1 + µ2 −α1 µ1 −µ2 −b1 (x) + P b1 (x)   c1 1 1 (x + 1) + (1 − a )(µ − µ )b (x) − (a µ − α )b (x) = a1 µ1 c2µ−c 2 1 1 2 1 1 2 1 0 µ2 −α1 µ1 −µ2 2

(19)

The first term on the right hand side is O(log(x1 )) since x2 ≤ sγ (x1 ) whenever a1 = 1. The second term above is always negative when a1 = 1. If a1 = 0, it follows that x2 ≥ sγ (x1 ), and from (17) and the lower bound imposed on γ one may conclude that b0 (x) + b1 (x) = u t O (1 + x1 + x2 )−2 . This completes the proof of (i). Note that the same method of proof provides similar bounds for general service and arrival processes satisfying (A1) when A and S are independent. The function b0 is defined as above, with β again chosen so that P b0 = b0 when a1 = a2 = 1. When x2 = 0, x1 ≥ 1, we have the following analog of (18):   P b1 (x) = E (x1 + A1 (1) − S1 (1))e−β(x2 −S1 (1))    = b1 (x) − E[eβS1 (1) ] − 1 x1 + E (A1 (1) − S1 (1))eβS1 (1)   = b1 (x) − µ1 eβ − 1 x1 − (1 − α1 )µ1 eβ − (1 − µ1 )α1 . The results above may be obtained with only superficial changes since β > 0. Details on this more general construction are included in the following two examples. 12

4

Routing model α

µ3

router

µ2

µ1

Figure 3: A simple routing model.

We now return to the routing model shown in Figure 3. To obtain a simplified model we set µ3 = ∞. On eliminating buffer 3 we arrive at the following two-dimensional stochastic network model, Q(k + 1) = Q(k) − S1 (k + 1)U1 (k)e1 − S2 (k + 1)U1 (k)e2 + A(k + 1)U3 (k)e1 + A(k + 1)U4 (k)e2 ,

k ≥ 0.

It is assumed that (A1)–(A3) are satisfied, and that A and S are independent processes. Moreover, it is assumed that S1 S2 = 0 a.s.. That is, at most one service is completed in each time slot. The allocation process is subject to the equality constraint that U3 (k) + U4 (k) = 1 for k ≥ 0. The arises from the constraint that the buffer at the router is always zero. The network load is given by ρ• = α1 /(µ1 + µ2 ), and the natural workload model is defined by W (k) = Q1 (k) + Q2 (k). Its relaxation is defined for k ≥ 0 by the recursion, c (k + 1) = W c (k) + A(k + 1) − (S1 (k + 1) + S2 (k + 1)) + (S1 (k + 1) + S2 (k + 1))I(k). ˆ W

The idleness process Iˆ satisfies the assumptions imposed in the previous example. It is nonc 2 . The relaxation for the fluid model is defined analogously. It negative, and adapted to W may be expressed as the solution to the differential equation, d+ w(t) ˆ = −(µ1 + µ2 − α1 ) + ι(t), dt

t ≥ 0.

We again assume that there is a linear cost function associated with the two buffers, and we take c1 ≤ c2 (without loss of generality). The optimal value function for the relaxation is (x1 + x2 )2 , Jˆ∗ (x) = 12 c1 µ1 + µ2 − α1

x ∈ R2+ ,

and the value function for q under the optimal policy is 2

x J∗ (x) = Jˆ∗ (x) + 21 (c2 − c1 ) 2 , µ2 13

x ∈ R2+ .

The relative value function and the optimal average-cost for the one-dimensional stochastic model are given by ˆ h∗ (w) = ηˆ∗ =

2 1 w + (1 − 2α1 )w , c 2 1 µ +µ −α 1 2 1 2 1 α1 (1 − 2α1 ) + mA c . 1 2 µ1 + µ2 − α1

w ∈ R+ ,

(20) (21)

This follows from Proposition 1.1 and the independence assumption on A, S. For optimization we consider two very different special cases: Case I If c1 < c2 then the optimal policy for the (unrelaxed) fluid model sends all arrivals to buffer 1, up until the first time that q2 (t) = 0. From this time up until the emptying time for the network, the policy maintains q2 (t) = 0, but sends material to this buffer at rate µ2 , so that ζ1 + ζ2 = 1 for all 0 ≤ t < w/(µ1 + µ2 − α1 ). This second requirement ensures that the policy is time-optimal. In the stochastic model with c1 < c2 we again consider the policy defined by the logarithmic switching curve (10). In this model the policy dictates that arriving packets are routed to buffer 2 if and only if x2 ≤ sγ (x1 ). Case II When the cost parameters are equal then any time-optimal policy is optimal for the fluid model, and we have J∗ (x) = Jˆ∗ (x) for all x. In the stochastic model we consider the linear switching curve s(x1 ) = $x1 , x1 ≥ 0, where the constant $ > 0 is fixed, but arbitrary. We see in Proposition 4.1 that the insensitivity found in the fluid model is reflected in the CRW model, in the sense that any $ ∈ (0, ∞) gives similar asymptotic performance. See [4] for related results. In either case, the Lyapunov function considered in this model is based on J∗ and the relative value function ˆ h∗ defined in (20): V (x) = J∗ (x) + 21 c1

(1 − 2α1 )(x1 + x2 ) + k1 x2 e−β1 x1 + k2 x1 e−β2 x2 , µ1 + µ2 − α1

x ∈ R2+ ,

where {βi > 0} solve, E[exp(βi (Si − A))] = 1,

i = 1, 2.

The constants are expressed    µi 1 ki = ci , µ1 + µ2 − α1 1 − E[e−βi A ]

(22)

i = 1, 2.

To formulate an analog of Proposition 3.1 we construct a family of models. Suppose that A is an arrival process satisfying E[A1 (k)] = µ1 + µ2 , and P{A1 (k) = 0} > 0. For each 0 ≤ ϑ ≤ 1, define the thinning of this arrival stream by Aϑ (k) = T (k)A1 (k), where T is an i.i.d. Bernoulli process that is indpendent of (A1 , S) with P{T (k) = 1} = ϑ. The system load is given by ρϑ• = ϑ for each ϑ ∈ [0, 1]. 1

14

Proposition 4.1 Suppose that γ ≥ 3/βi for each i ∈ {1, 2} and ϑ ∈ [0, 1), where {βi } solve (22). Then, under the conditions of this section, (i) The conclusions (i)–(iii) in Proposition 3.1 hold for the routing model, where ηˆ∗ is given in (21), and all of the bounds are uniform in the parameter 0 ≤ ϑ < 1. (ii) If c1 = c2 and the linear switching curve is used, then the last bound can be strengthened. In this case, there exists K0 < ∞ such that ηˆ∗ ≤ η ≤ ηˆ∗ + K0 ,

0 ≤ ϑ < 1.

Proof The first half of the proof is identical to the proof of Proposition 3.1. To see why the bound improves when c1 = c2 we note that the bound (19) obtained for the tandem queue has the following analog here: P V (x) = V (x) − c(x) + ηˆ∗ + E(x), where, for some constant K1 < ∞,   E(x) ≤ K1 (1 − ϑ)−1 x1 e−β2 x2 I(x2 ≥ $x1 ) + x2 e−β1 x1 I(x2 ≤ $x1 ) , x ∈ R2+ . It is easily shown that the steady state mean of this function is uniformly bounded in ϑ, for any fixed $ ∈ (0, ∞), and this is sufficient to obtain the desired bound. u t

We illustrate some of the conclusions of Proposition 4.1 through numerical results. A CRW model is obtained via uniformization of a continuous-time model with Poisson arrivals and exponential service, so that the common distribution of D(k) := (S1 (k), S2 (k), A1 (k))T on Z3+ is specified as follows: P{D(k) = e1 } = µ1 ;

P{D(k) = e2 } = µ2 ;

P{D(k) = e3 } = α1 ,

(23)

where ej denotes the jth standard basis vector in R3 . We take ρ• = 9/10, µ1 = µ2 , α + µ1 + µ2 = 1, and the one-step cost is assumed linear with c1 = 2 and c2 = 3. The nonlinear switching curve (10) is appropriate since c1 < c2 . The formula determining β2 in this model may be expressed, α1 e−β2 + µ1 + µ2 eβ2 = 1, which gives β2 = log(α1 /µ2 ) for α1 > µ2 . For the specific values α1 = 9/19, µ1 = µ2 = 5/19, this gives a value of 3β2−1 = 3/ log(9/5) ≈ 5.1 for the lower bound on γ to obtain fluid-scale optimality. The grey region shown in Figure 1 indicates the optimal policy, obtained using value iteration. This result is taken from [13]. The switching curve sγ with γ = 5 is also shown in this figure. It is a remarkably accurate approximation to the optimal policy.1 1

Note that numerical computation of the optimal policy for large values of x is difficult for two reasons. One is that the sensitivity of performance with respect to policy is small for large x away from the boundary of the state space. Secondly, for large x there are boundary effects due to the truncation of the infinite state space. Here the maximum buffer size was taken to be 100. Consequently, values of the policy for x1 beyond, say, 50 in Figure 1 must be viewed with some skepticism.

15

5

The Dai-Wang network

Station 2

α1

Station 1

Figure 4: The five-buffer model of Dai and Wang.

The model shown in Figure 4 was introduced in [5] to show that a central limit scaling may not lead to a useful limit as the system load approaches unity. We consider this model here to show how to control a network in which the majority of inventory is stored several steps away from the bottleneck. The CRW model is considered under Assumptions (A1)–(A3). It is assumed that the second station is a bottleneck, so that ρ1 < ρ2 = ρ• , with ρ1 = 3α1 /µ1 and ρ2 = 2α1 /µ2 . The two workload processes are defined by, W1 (k) = 3Q1 (k) + 2Q2 (k) + Q3 (k) + Q4 (k) + Q5 (k)  W2 (k) = 2 Q1 (k) + Q2 (k) + Q3 (k) + Q4 (k),

k ≥ 0.

(24)

Consider the relaxation of the second workload process. This is again defined as a CRW model with mean drift −(2α1 − µ2 ), and may be expressed recursively as, c2 (k + 1) = W c2 (k) + 2A1 (k + 1) − S2 (k + 1) + S2 (k + 1)Iˆ2 (k), W

k ≥ 0.

(25)

The one-dimensional relaxation of the fluid model is defined analogously. Given any linear cost function on buffer levels, the effective cost is given by c(w) = c∗ w, with c∗ = min{ 21 c1 , 12 c2 , 12 c3 , c4 }. We assume that the minimum is achieved uniquely by the first term, so that c1 < min{c2 , c3 , 2c4 }. The value function for the fluid relaxation qˆ is, 2

(2(x1 + x2 + x3 ) + x4 ) Jˆ∗ (x) = 12 c∗ , µ2 − 2α1

x ∈ R5+ .

(26)

Proposition 1.1 again implies that the relative value function for the one-dimensional model (25) is the simple quadratic, 2 ˆ ∗ (w) = 1 c∗ w + d∗ w , h 2 µ2 − 2α1

w ∈ R+ ,

(27)

2 2 2 2 2 2 where d∗ = µ−1 2 (m2 − mA2 ), with m2 = E[(S2 (1) − 2A1 (1)) ], and mA2 = 4E[A1 (1) ]. The optimal average cost for (25) is given by

ηˆ∗ = 21 c∗ σ22 (µ2 − 2α1 )−1 , 16

(28)

with c∗ = 21 c1 , and σ22 := ρ2 m22 + (1 − ρ2 )m2A2 . To mimic the optimal policy for the relaxation we wish to keep most inventory in buffer one. However, it is also necessary to feed station 2 when starvation is imminent. The following policy is constructed based upon these considerations. Note that this is the only instance of a randomized policy that is considered in this paper. The decision rule a2 = 2/5 in (i) means that P{U2 (0) = 1} = 1 − P{U2 (0) = 0} = 2/5. As in the previous examples, the switching curve sγ is defined in (10). (i) If x3 + x4 ≤ sγ (x1 ) and x2 6= 0, then a2 = 2/5. Otherwise, a2 = 0. (ii) If x2 + x3 + x4 ≤ sγ (x1 ) and x1 6= 0, then a1 = 3/5. Otherwise, a1 = 0. (iii) If x5 6= 0 then a5 = 1 − a1 − a2 . (iv) At station 2 the policy is defined as the non-idling, priority policy. Priority is given to buffer 3 if the cost parameters satisfy c3 > 2c4 , and to buffer 4 otherwise. There are two regions in which starvation avoidance is prioritized: P1 = {x ∈ Z5+ : x3 + x4 ≤ sγ (x1 ), x2 6= 0},

P2 = {x ∈ Z5+ : x2 + x3 + x4 ≤ sγ (x1 )}. (29)

When the condition Q(k) ∈ P1 holds over some period of time then the system remains in ‘panic mode’ in which the inventory at station 2 exhibits a positive drift. The second buffer exhibits a long-run positive drift, at rate 51 µ1 , during a time-period in which Q(k) ∈ P2 . Note that the region X := {x ∈ Z5+ : x2 + x3 + x4 ≤ sγ (x1 ) + 2} is absorbing for the controlled process. The construction of a Lyapunov function for Q is greatly simplified by restricting the process to this region. It is based on the fluid value function J∗ and the relative ˆ ∗ as in the previous examples. value function h The value function for the fluid model is complex, but it is C 1 , piecewise quadratic, and satisfies for fixed µ1 , and a finite constant K5.1 that is independent of {α1 , µ2 }, Jˆ∗ (x) ≤ J∗ (x) ≤ Jˆ∗ (x) +

K5.1 kxk2 , 2µ1 − 3µ2

ρ2 = ρ• → 1.

(30)

We now obtain a formula for J∗ on the restricted domain, R0 := {x ∈ R5+ : x2 + x3 + x4 = 0}. Define J∗0 : R2+ → R+ to be the piecewise quadratic given by J∗0 (x1 , x5 ) = J∗ (x1 , 0, 0, 0, x5 ) for (x1 , x5 ) ∈ R2+ . When q(t) ∈ R0 and q1 (t) > 0, q5 (t) > 0, we have under the optimal policy, d q(t) = − 12 (δ2 , 0, 0, 0, 2δ1 − 3δ2 ), dt where δ1 = µ1 − 3α1 and δ2 = µ2 − 2α1 . Note that 2δ1 − 3δ2 > 0 under the assumption that ρ1 < ρ2 .

17

Let Z1 = {z ∈ R2 : δ2−1 z1 ≤ (2δ1 −3δ2 )−1 z2 }, and set Z2 = R2 \Z1 . Then J∗0 (z) = 12 z T D i z for z ∈ Zi , i = 1, 2, with ! !  2δ2−1 c1 − 23 δ1−1 (2δ1 − 3δ2 )c5 3δ1−1 c5 δ2−1 c1 0 2 1 , D = . D =2 0 (2δ1 − 3δ2 )−1 c5 3δ1−1 c5 δ1−1 c5 Observe that J∗0 is C 1 on {z ∈ R2+ : z1 ≥ 0}, and that the bound (30) is indeed satisfied on R0 . Although the value function J∗ is only defined on R5+ , it will be convenient to extend the domain of J∗0 to {z ∈ R2 : z1 ≥ 0}. We now construct an approximation to J∗ using J∗0 . Let ξ 1 = (3, 2, 1, 1, 1)T and ξ 2 = (2, 2, 2, 1, 0)T denote the two workload vectors, set wi = hξ i , xi for i = 1, 2 and x ∈ R5+ , set z = ( 21 w2 , w1 − 32 w2 )T , and define, J0 (x) = J∗0 (z1 , z2 ),

x ∈ R5+ .

Letting S1 = {x ∈ R5+ : (2δ1 − 3δ2 )−1 w1 ≤ δ2−1 w2 }, S2 = R5+ \ S1 , we have, J0 (x) = 21 z T Di z,

x ∈ Si , i = 1, 2.

Note that (x1 , x5 ) = z T for x ∈ R0 , so that J0 (x) = J∗ (x) on this domain. The Lyapunov function constructed below is based on the expression (27). As a first step, we introduce a linear term as follows,  c∗ h0 (x) := J0 (x) + 12 d∗ x ∈ Z5+ , (31) 2(x1 + x2 + x3 ) + x4 , µ2 − 2α1

where d∗ is defined below (27). The following bound is established in Appendix C using straightforward calculations based on the form of the policy and (30): Lemma 5.1 For some constant K5.1 ≥ 0 that is independent of load, whenever x ∈ X, P h0 (x) ≤ h0 (x) − c1 z1 − c5 z2 + ηˆ∗ µ2  + K5.1 (sγ (x1 ) + 1) + (1 − a3 − a4 ) c1 µ2 −2α w2 , 1

x ∈ Z5+ .

(32)

When x ∈ X satisfies x3 + x4 ≥ 1, so that a3 + a4 = 1, we thus obtain P h0 (x) − h0 (x) ≈ −c(x) + ηˆ∗ , where the error grows logarithmically in x1 . Finally, the Lyapunov function used to obtain asymptotic optimality is defined as follows: V (x) = h0 (x) + b(x),

x ∈ R5+ ,

where the function b consists of two parts, b(x) = k1 (x1 + x2 )e−β1 (2x3 +x4 ) + k2 x1 e−β2 (x1 +2x3 +x4 ) . The first term is introduced to provide a negative drift of order (x1 + x2 )(1 − ρ• )−1 station 2 is empty, and the second provides a negative drift when buffer 2 is empty. leads to the following specifications: The exponents solve,     1 = 52 Ex e−β1 (−S2 (k)+2S1 (k)) + 35 Ex eβ1 S2 (k) ,   1 = Ex e−β2 (S1 (k)−S2 (k)) . 18

when This (33) (34)

and the constants are given by,  −1  µ2  k1 = 52 (1 − e2β1 ) c1 µ2 −2α . 1  −1  β S (k)  k2 = E[eβ1 S1 (k) ] − E[eβ1 (S2 (k)−S1 (k)) ] E[e 1 1 ] − 1 k1 .

(35) (36)

To formulate an analog of Proposition 3.1 we again construct a family of models: Assume that (A1 , S) is given, with ρ2 = 1, ρ1 < 1. Let T denote an i.i.d. Bernoulli process that is indpendent of (A1 , S) with P{T (k) = 1} = ϑ, ϑ ∈ [0, 1], k ≥ 1. We then define Aϑ (k) = T (k)A1 (k) for k ≥ 1, ϑ ∈ [0, 1]. Let ϑ0 ∈ (0, 1) be a fixed value such that ρ1 < ρ2 for ϑ ∈ [ϑ0 , 1]. Note that because we have restricted to the set X in our construction of V , it is necessary to restrict to R0 ⊂ R5+ in the statement of fluid-scale optimality. The proof of Proposition 5.2 is very similar to the proof of Proposition 3.1. Details are provided in Appendix C. Proposition 5.2 Suppose that V = h0 + b is constructed using the values {βi , ki } given above, and that the policy satisfies γ ≥ 3/βi for each i ∈ {1, 2} and ϑ ∈ [ϑ0 , 1). Then the following bounds all hold uniformly in the parameter ϑ0 ≤ ϑ < 1. (i) Lyapunov drift inequality: The bound (4) holds on X, where  −2  1 , 1 + kxk E(x) ≤ K0 log(1 + kxk) + µ2 − 2α1

x ∈ X.

(ii) Fluid-scale optimality: The two dimensional controlled model satisfies the bound (5) for x ∈ R0 . (iii) Heavy-traffic optimality: The steady-state mean η for the two dimensional controlled model satisfies (6), with ηˆ∗ given in (28). u t We now present results from some numerical experiments performed using this model, with linear cost function given by c(x) = x1 + 2(x2 + x3 + x4 + x5 ), x ∈ R`+ . The family of CRW models considered was parameterized by a ‘burstiness parameter’ n ≥ 1. For given {µi , αi } satisfying µ1 + µ2 + α1 = 1, and given n ≥ 1, the common distribution of D(k) := (S1 (k), S2 (k), A1 (k))T was specified as follows: P{D(k) = e1 } = µ1 ; P{D(k) = e2 } = µ2 ;

P{D(k) = ne3 } = α1 n−1 ; P{D(k) = 0} = α1 (1 − n−1 ).

(37)

In each of the experiments the load at the first station was fixed at ρ1 = 0.8. We restrict ρ2 to the range [0.9, 1]. The constants {βi : i = 1, 2} are independent of n, and equations (33) and (34) may be solved explicitly: r  1  µ1 µ1  µ1 µ1 2 β1 e = + 10 + ; eβ2 = . 5 µ2 µ2 µ2 µ2

We have β1 < β2 whenever µ1 > µ2 , which is the case under our assumption that ρ1 < ρ2 . 19

For heavy-traffic optimality we require γ ≥ 3 max(β1−1 , β2−1 ) = 3β1−1 for all ρ2 ∈ [0.9, 1]. The largest value occurs when ρ2 = 0.9, and in this case µ1 /µ2 = (3ρ2 )/(2ρ1 ) = 27/16. Consequently, the lower bound given in Proposition 5.2 for heavy-traffic optimality is γ ≥ 3β1−1 ≈ 14.7. Shown in Figure 5 are six plots obtained using ρ1 = 0.8, ρ2 = 0.9, n = 1, 3, 5, 7, 9, 11, and γ = 1, 3, 5, 7, . . . , 19. For each n, the service and arrival processes were held fixed in the experiments using these ten values of γ, and the queue was initially empty, Q(0) = 0. The vertical axis shows the average of c(Q(k)) for k ≤ T = 105 . In all but the first instance, the optimal value of γ is seen to lie between 3 and 5. A second set of simulations were obtained with ρ2 increased, and ρ1 held fixed at 0.8. Shown in Figure 6 are results for ρ2 = 0.95 and ρ2 = 1. The arrival and service processes were again defined using (37), with burstiness parameter fixed at n = 7. The vertical axis again shows the average of c(Q(k)), but the time horizon was increased to k ≤ T = 106 to account for the greater system load. Note that the shape of the plot is virtually unchanged as the load varies between the three values ρ2 = 0.9, 0.95, 1.

20

T −1 1 X c(Q(k)) T k=0

28

39

27.5

38.5

27

38

26.5

37.5

26

37

n =1 ηˆ∗ = 6.75

25.5 25

n =3 ηˆ∗ = 15.75

36.5 36

24.5

35.5

24

35

23.5

34.5

23

34 1

3

5

7

9

11

13

15

17

19

1

γ

49

3

5

7

9

11

13

15

17

19

γ

65

48.5 64 48 63 47.5

n =5 ηˆ∗ = 24.75

47 46.5

n =7 ηˆ∗ = 33.75

62

61

46 60 45.5 59 45 44.5

58 1

3

5

7

9

11

13

15

17

19

1

γ

76

3

5

7

9

11

13

15

17

19

γ

93 92

75

91 74 90 73

89

n =9 ηˆ∗ = 42.75

72

71

n = 11 ηˆ∗ = 51.75

88 87 86

70 85 69

84

68

83 1

3

5

7

9

11

13

15

17

19

1

γ

3

5

7

9

11

13

15

17

19

γ

Figure 5: Average cost for the model of Dai & Wang with T = 105 . For n ≥ 3 the best value of γ is smaller than the minimal value of 14.7 obtained in Proposition 5.2.

21

T −1 1 X c(Q(k)) T k=0

112

500

495

110

490 108 485 106 480 104

102

ηˆ∗ = 71.25

475

ηˆ∗ = ∞

ρ2 = 0.95

470

ρ2 = 1

100

465

98

460 1

3

5

7

9

11

13

15

17

19

1

γ

3

5

7

9

11

13

15

17

19

γ

Figure 6: In these experiments the burstiness-parameter was fixed at n = 7, and the time horizon was taken to be T = 106 . The load is ρ2 = 0.95 in the plot shown at left, and ρ2 = 1 in the plot shown at right. The policy is fixed throughout. It is precisely the same as used in the simulations shown in Figure 5.

6

Conclusions and extensions

We have focused on these three examples to clarify discussion and simplify technical arguments, but it is clear that an asymptotically optimal policy may be constructed in many situations. It is likely that a completely general statement can be made for a wide class of network models in which there is a single bottleneck. When there are multiple bottlenecks the story appears to be significantly more complex. Firstly, the optimal policy for a relaxation of dimension two or higher is not obvious in general. When the effective cost is not monotone then the policy is approximated by switching curves that are approximately affine [4, 15]. Secondly, in dimensions two or higher the optimal cost and the relative value function have no closed-form expression even when the effective cost is monotone. Nevertheless, it will be worthwhile to continue this investigation to obtain logarithmic bounds analogous to (6) even when there are multiple bottlenecks, and the effective cost is not monotone. The high sensitivity of cost with respect to safety-stock parameters suggests that on-line learning approaches may be successfully implemented for performance improvement [9, 17]. The analysis of these algorithms will likely rely on parameterized Lyapunov functions of the form constructed in this paper. Acknowledgment. This paper grew out of discussions with Professors Arie Hordijk, Peter Glynn, David McDonald and Ad Ridder at the Applied Probability Workshop held at the Mathematisches Forschungsinstitut Oberwolfach during the first week of December in 2003 (http://www.mfo.de/.) Prof. Jim Dai also provided useful comments and corrections based on an earlier draft of this manuscript. The author would like to express his sincere thanks to these colleagues for their inspiration. He also expresses his thanks to the organizers of the workshop and to MFO for creating this stimulating environment.

22

A

Appendix: Bounds for the simple queue

We begin with bounds for the single queue (3). In each result below we assume that m2 := E[(A(1) − S(1))2 ] < ∞ and ρ = α/µ < 1. The policy is assumed non-idling: U (k) = I(Q(k) ≥ 1). Lemma A.1 The Markov chain Q is positive recurrent, and its unique invariant probability distribution π satisfies Pπ {Q(0) = 0} = 1 − ρ, and Pπ {Q(0) = n} ≤ (µ − α)

Pπ {Q(0) > n} , E[(A(1) − S(1))+ ]

n ≥ 1.

Proof We have seen in Proposition 1.1 that π exists, with a finite, computable mean. Define for x, n ∈ Z+ , un (x) = (x − n)+ ,

g(n) = E[(A(1) − S(1) − n)+ ],

g0 (n) = E[(A(1) − n)+ ].

We have E[u0 (Q(k + 1)) | Q(k) = x] = u0 (x) − (µ − α) + µI0 (x). Stationarity of π then gives µπ{0} = (µ − α), or Pπ {Q(0) = 0} = 1 − ρ. We have for each x ∈ Z+ , k ≥ 0, n ≥ 1, E[un (Q(k + 1)) | Q(k) = x] = un (x) + g(n − x)I[1,n] (x) + g0 (n)I0 (x) − (µ − α)I[n+1,∞] (x), which together with stationarity of π gives the identitiy, π(0)g0 (n) +

n X

π(x)g(n − x) = (µ − α)

∞ X

π(x),

x=n+1

x=1

where π(x) = Pπ {Q(1) = x}, x ∈ Z+ . This implies the desired bound since the left-hand side is greater than π(n)g(0) for each n ≥ 1. u t Lemma A.2 The first bound below holds when Q is stationary, with Q(0) ∼ π. The second holds for any initial condition x ∈ Z+ , and any stopping time τ :  −2  ≤ Eπ 1 + Q(0)

Ex

τ −1 hX k=0

Proof

2(µ − α) ; E[(A(1) − S(1))+ ]

h i −2 i −1 1 + Q(k) ≤ Ex [τ ] + (µ − α)−1 Ex Q(τ ) + 1 + Q(τ ) − [x + (1 + x)−1 ] . Lemma A.1 provides the first bound: Eπ



−2  1 + Q(0) ≤ (µ − α)



X 1 (1 + k)−2 E[(A(1) − S(1))+ ] n=0

23

To see the second bound we use convexity of the function (1+ x)−1 , x ≥ 0, which provides the following lower bound, −1 1 + Q(k + 1) −1 −2  ≥ 1 + Q(k) − 1 + Q(k) Q(k + 1) − Q(k) −1 −2  = 1 + Q(k) − 1 + Q(k) A(k + 1) − S(k + 1) − I(Q(k) = 0)S(k + 1).

Moreover, the queue satisfies the recursion,

 Q(k + 1) = Q(k) + A(k + 1) − S(k + 1) + I(Q(k) = 0)S(k + 1),

k ≥ 0.

Combining this identity with the previous bound gives, h i h i −1 −1 −2   1+Q(k+1) +Q(k+1) ≥ 1+Q(k) +Q(k) + 1− 1+Q(k) A(k+1)−S(k+1) .

Summing over k from 0 to τ − 1 and taking expectations then gives,

τ −1 h i  hX  −1 −2 i − [x + (1 + x)−1 ] . (µ − α) Ex − Ex [τ ] ≤ Ex Q(τ ) + 1 + Q(τ ) 1 + Q(k) k=0

u t

Lemma A.3 The following bounds hold for any non-zero initial condition x ∈ Z+ , with τ0 equal to the first hitting time to the origin:     x x , Varx τ0 ≤ m2 , Ex τ0 = µ−α (µ − α)3 where m2 = E[(S(1) − A(1))2 ].

Proof

Consider the two test functions, V1 (x) =

x , µ−α

V2 (x) =

2 1 x 2 µ − α,

x ∈ Z+ .

We have for all x ≥ 1, P V1 (x) = V1 (x) − 1, which implies that M1 (k) := k ∧ τ0 + V1 (Q(k ∧ τ0 )), k ≥ 0, is a martingale. It is uniformly integrable, and this implies the formula for the mean hitting time to the origin. To see the variance bound we first note that, P V2 (x) = V2 (x) − x +

2 1 m 2 µ − α,

x ≥ 1.

The Comparison Theorem of [16] then gives, Ex

0 −1 hτX

k=0

i Q(k) ≤ V2 (x) +

2   1 m 2 µ − α Ex τ0 ,

24

x ≥ 1.

(38)

The left hand side has the following representation: i i hP hP τ0 −1 τ0 −1 Ex Q(k) = (µ − α)E V (Q(k)) x 1 k=0 k=0 hP i τ0 −1 = (µ − α)Ex E [τ ] Q(k) 0 k=0 hP i ∞ = (µ − α)Ex E [τ ]I(k < τ ) 0 0 Q(k) k=0 hP i ∞ = (µ − α)Ex (τ − k)I(k < τ ) 0 , k=0 0

where the last identity follows from the the smoothing property of the conditional expectation EQ(k) [τ0 ] = E[τ0 − k | Q(0), . . . , Q(k)],

k ≥ 0,

and the fact that τ0 is a stopping time. The right hand side may be further transformed as follows, hP hP i i τ0 −1 ∞ Ex Q(k) = (µ − α)E (τ − k)I(k < τ ) x 0 0 k=0 k=0 hP i τ0 −1 = (µ − α)Ex (τ − k) 0 k=0 hP i τ0 = (µ − α)Ex k=1 k   = 21 (µ − α)Ex τ02 + τ0 .

Combining this with (38) then gives

  m2 2 V2 (x) + 21 Ex τ0 , µ−α µ−α   and substituting the expressions for V2 (x) and Ex τ0 then gives,   Ex τ02 ≤

   Ex τ02 ≤

x 2 x . + m2 µ−α (µ − α)3

We finally arrive at the desired bound,

 2     Varx τ0 = Ex τ02 − Ex τ0 ≤ m2

B

x , (µ − α)3

x ≥ 1. u t

Appendix: Bounds for a parameterized family

To obtain asymptotic bounds we consider in this section a family of networks satisfying (A1)– (A3), parameterized by a scalar ϑ ∈ [0, 1] that represents load. It is assumed that, for some fixed ϑ0 ∈ (0, 1), there is a unique integer m ∈ {1, . . . , `r } satisfying ρm = ρ• = ϑ for each ϑ ∈ [ϑ0 , 1]. The cost function is assumed linear, and the effective cost is assumed given by c(w) = ∗ c(x (w)) = (ci∗ /ξim∗ )w, w ∈ R+ , with i∗ = 1. 25

We let Xϑ denote the maximally absorbing subset of Z`+ for the controlled network, equal to the set of states reachable from the origin: ∞ o n X ` P0 {Q(k) = x} > 0 . Xϑ = x ∈ Z+ : k=0

We let X ⊂ Z`+ denote a fixed set containing Xϑ for each ϑ ∈ [ϑ0 , 1]. We list here the remaining assumptions on the parameterized model: (a1) The routing matrix R and the policy are independent of ϑ ∈ [0, 1]. (a2) The random variables {Aϑ (k), S ϑ (k) : k ≥ 1, ϑ ∈ [0, 1]} are defined on a common probability space, and satisfy (A1) for fixed ϑ. Moreover, for fixed k ≥ 1 they are monotone in ϑ: S ϑ (k) ↓ S 1 (k),

Aϑ (k) ↑ A1 (k),

ϑ ↑ 1, a.s..

(a3) The randomized stationary policy gives rise to a uniformly irreducible Markov chain: for each N ≥ 1 we may find εN > 0, TN ≥ 1 such that for all kxk ≤ N , ϑ ∈ [ϑ0 , 1], Px {Q(TN ) = 0} ≥ εN . (a4) For each ϑ ∈ [ϑ0 , 1) there exists a Lyapunov function V = V ϑ satisfying (4) on X. Moreover, the Lyapunov functions and error constants satisfy the uniform bounds for x ∈ X and ϑ ∈ [ϑ0 , 1), J∗ (x) ≤ V (x) ≤ J∗ (x) + (1 − ϑ)−1 Ka4 (1 + kxk log(1 + kxk)), −2  , E(x) = Ka4 log(1 + c(x)) + (1 − ϑ)−1 1 + hξ m , xi

where Ka4 is a fixed constant.

Lemma B.1 Each of the three examples considered in this paper satisfies (a1)–(a4) provided γ satisfies the respective lower bounds given in Propositions 3.1–5.2. Proof The only part that requires proof is (a3). In each case we have for each N ≥ 1, 0 ≤ ϑ ≤ 1, P{Aϑ (k) = 0, 1 ≤ k ≤ N } ≥ P{A1 (k) = 0, 1 ≤ k ≤ N } > 0, which easily implies the desired bound.

u t

It is assumed throughout this section that the family of networks satisfies these four assumptions. In the results below we suppress dependency of W and Q on the parameter ϑ whenever possible to simplify notation. Lemma B.2 The optimal average-cost for the one-dimensional relaxation is given by ηˆ∗ =

1 1 2µ m

2 σm c1 , 1 − ϑ ξ1m

ϑ0 ≤ ϑ ≤ 1,

2 := ϑE[(S ϑ (1) − Lϑ (1))2 ] + (1 − ϑ)E[(Lϑ (1))2 ] < ∞. where σm m m m

26

Proof

c∗ . This follows directly from Proposition 1.1 and the identity λm /µm = ϑ for W m u t

For n ≥ 1, x ∈ R`+ , we define

hZ Jn (x, T ) := E

T 0

A lower bound is easily obtained.

i c(q n (t; x)) dt .

Lemma B.3 The lower bound holds, lim inf Jn (x, T ) ≥ J∗ (x),

T ≥ T◦ (x) ,

n→∞

where T◦ (x) denotes the draining time for the optimal fluid model with initial condition x ∈ R`+ . Proof The piece-wise linear trajectory in R2+ that interpolates {Ex [c(Q(k))] : k ≥ 0} is a feasible trajectory for the fluid model. Consequently, Z T hZ T i E c(q n (t; x)) dt ≥ inf c(q(t; y n )) dt, 0

0

where the infimum is over all feasible fluid trajectories, and y n = n−1 [nx]. The right hand side is equal to J∗ (y n ) for T ≥ T◦ (y n ). Continuity of J∗ completes the proof. u t Under (a1)–(a4) we have a version of the drift inequality (V3) of [16]:  P V (x) ≤ V (x) − 21 c(x) + ηˆ∗ + K1 IS (x) + (1 − ϑ)−1 (1 + hξ m , xi)−2 ,

x ∈ X,

(39)

where K1 is a fixed constant, and S ⊂ Z`+ is a fixed finite set, both independent of ϑ. We obtain the following crude upper bounds based on this drift inequality: Lemma B.4 The following bounds hold for some fixed KB.4 < ∞, and all x ∈ Z`+ , T > 0, ϑ ∈ [0, 1):   Jn (x, T ) ≤ KB.4 (1 − ϑ)−1 kxk2 + n−1 T ; η := Eπ [c(Q(k))] ≤ KB.4 (1 − ϑ)−1 .

Proof By the Comparison Theorem of [16] and the drift inequality (39) it follows that for all T > 0, x ∈ R` , n ≥ 1,  hX i −2 −1 −2 1 J (x, T ) ≤ n V ([nx]) + η ˆ T n + K E I (Q(k)) + (1 − ϑ) (1 + W (k)) , ∗ 1 nx S m 2 n k≤T n

and the first bound follows. The second bound also follows from the Comparison Theorem with KB.4 any fixed constant satisfying (1 − ϑ)−1 KB.4 ≥ 2ˆ η∗ + 2K1 (1 + (1 − ϑ)−1 ) for each ϑ ∈ [0, 1). u t The uniform bound on the average cost follows from these bounds: Proposition B.5 The following bounds hold for some fixed KB.5 < ∞: ηˆ∗ ≤ η ≤ ηˆ∗ + KB.5 (1 + | log(1 − ϑ)|), 27

ϑ0 ≤ ϑ < 1.

Proof. Recall that $\{W_m(k), \widehat W^*_m(k) : k \ge 0\}$ are defined on the same probability space, and have a jointly stationary version. We consider this stationary process in the following. We have from the Comparison Theorem and (a4),
\[
\eta \;\le\; \hat\eta_* + K_{a4}\, \mathsf{E}_\pi\Bigl[\log\bigl(1 + c(Q(k))\bigr) + (1-\vartheta)^{-1}\bigl(1 + W_m(k)\bigr)^{-2}\Bigr].
\tag{40}
\]
Jensen's inequality combined with Lemma B.4 provides a bound on the first term within the expectation,
\[
\mathsf{E}_\pi\bigl[\log(1 + c(Q(k)))\bigr] \;\le\; \log\bigl(1 + \mathsf{E}_\pi[c(Q(k))]\bigr) \;\le\; \log\bigl(1 + K_{B.4}(1-\vartheta)^{-1}\bigr).
\]
Minimality of the relaxation implies the following bound on the second term:
\[
\mathsf{E}_\pi\bigl[(1 + W_m(k))^{-2}\bigr] \;\le\; \mathsf{E}_\pi\bigl[(1 + \widehat W^*_m(k))^{-2}\bigr].
\]
Consequently, applying Lemma A.2 to $\widehat W^*_m$, it is apparent that (40) gives the desired bound on $\eta$. $\Box$
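For readability we record the elementary estimate behind the $|\log(1-\vartheta)|$ term; this is only an expansion of the step already used in the proof, under the convention $0 < \vartheta < 1$:
\[
\log\bigl(1 + K_{B.4}(1-\vartheta)^{-1}\bigr)
\;\le\; \log\bigl((1 + K_{B.4})(1-\vartheta)^{-1}\bigr)
\;=\; \log(1 + K_{B.4}) + |\log(1-\vartheta)| .
\]
Combined with the bound on the second term in (40), this yields $\eta \le \hat\eta_* + K_{B.5}\bigl(1 + |\log(1-\vartheta)|\bigr)$ for a suitable constant $K_{B.5}$.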

To refine the upper bound on $J_n$ we require the bounds obtained in the following two lemmas.

Lemma B.6 There exists $K_{B.6} < \infty$ such that for each non-zero $x \in \mathbb{Z}^\ell_+$ and $\vartheta_0 \le \vartheta < 1$,
\[
\mathsf{E}_x\Bigl[\sum_{k=0}^{\widehat T_*(x) - 1} \bigl(1 + W_m(k)\bigr)^{-2}\Bigr]
\;\le\;
\mathsf{E}_x\Bigl[\sum_{k=0}^{\widehat T_*(x) - 1} \bigl(1 + \widehat W^*_m(k)\bigr)^{-2}\Bigr]
\;\le\;
K_{B.6}\, \frac{\sqrt{\|x\|}}{(1-\vartheta)^{5/2}},
\]
where $\widehat T_*(x) := (\mu_m - \lambda_m)^{-1}\langle \xi^m, x\rangle$.

Proof. The first inequality follows from minimality of $\widehat W_m$. To see the second bound we apply Lemma A.2 to $\widehat W^*_m$ to obtain, for any $T \ge 0$,
\[
\mathsf{E}_x\Bigl[\sum_{k=0}^{T-1} \bigl(1 + \widehat W^*_m(k)\bigr)^{-2}\Bigr]
\;\le\; T + (\mu_m - \lambda_m)^{-1}\bigl(\mathsf{E}_x[\widehat W^*_m(T)] + 1 - w\bigr),
\tag{41}
\]
where $w = \widehat W^*_m(0) = \langle \xi^m, x\rangle$. For $T = \widehat T_*(x)$, the right-hand side becomes
\[
(\mu_m - \lambda_m)^{-1}\bigl(\mathsf{E}_x[\widehat W^*_m(T)] + 1\bigr).
\]
This is bounded as follows: for any $T \ge 0$,
\[
\mathsf{E}_x[\widehat W^*_m(T)] \;\le\; w - (\mu_m - \lambda_m) T + \mathsf{E}_x[(T - \hat\tau_0)_+],
\]
where $\hat\tau_0$ denotes the first hitting time of the origin for $\widehat W^*_m$. For $T = \widehat T_*(x)$ we apply Jensen's inequality and Lemma A.3 to obtain,
\[
\mathsf{E}_x[\widehat W^*_m(T)] \;\le\; \mathsf{E}_x[(T - \hat\tau_0)_+]
\;\le\; \sqrt{\mathsf{E}_x[(T - \hat\tau_0)^2]}
\;\le\; \hat m_2 \sqrt{\frac{w}{(\mu_m - \lambda_m)^3}},
\tag{42}
\]
where $\hat m_2$ is the variability parameter for $\widehat W^*_m$ defined in Lemma A.3.

Combining (41) and (42) with $T = \widehat T_*(x)$ gives the desired bound. $\Box$
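A minimal Monte-Carlo sketch of the quantity controlled by Lemma B.6 is given below, again using a generic reflected random walk as a stand-in for $\widehat W^*_m$. The Bernoulli rates, the initial workload $w_0$, and the number of sample paths are hypothetical choices made only for illustration; the printed "bound scale" is just $\sqrt{w_0}/(1-\vartheta)^{5/2}$ without the constant $K_{B.6}$.

```python
import numpy as np

rng = np.random.default_rng(1)

def expected_sum_inverse_sq(w0, lam, mu, n_paths=500):
    """Estimate E_{w0}[ sum_{k < T*} (1 + W(k))^{-2} ] for the reflected walk
    W(k+1) = max(W(k) + A(k) - S(k), 0), with draining time T* = w0/(mu - lam)."""
    horizon = int(np.ceil(w0 / (mu - lam)))
    total = 0.0
    for _ in range(n_paths):
        inc = rng.binomial(1, lam, horizon) - rng.binomial(1, mu, horizon)
        w = float(w0)
        for k in range(horizon):
            total += 1.0 / (1.0 + w) ** 2
            w = max(w + inc[k], 0.0)
    return total / n_paths

mu, w0 = 0.5, 40
for theta in (0.80, 0.90, 0.95):
    lam = theta * mu
    est = expected_sum_inverse_sq(w0, lam, mu)
    scale = np.sqrt(w0) / (1 - theta) ** 2.5
    print(f"theta={theta:4.2f}  estimate={est:8.3f}  bound scale={scale:10.1f}")
```

In this toy setting the estimate stays well below the bound scale, consistent with the (non-tight) upper bound of Lemma B.6.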

Lemma B.7 For any $T > 0$, $n \ge 1$, $x \in \mathbb{R}^\ell_+$,
\[
\mathsf{E}_x\Bigl[\sum_{k=0}^{nT-1} \log\bigl(1 + c(Q(k; [nx]))\bigr)\Bigr]
\;\le\; nT\, \log\bigl(1 + T^{-1} n\, J_n(x, T)\bigr).
\]

Proof. From Jensen's inequality we have,
\[
\frac{1}{nT}\, \mathsf{E}_x\Bigl[\sum_{k=0}^{nT-1} \log\bigl(1 + c(Q(k; [nx]))\bigr)\Bigr]
\;\le\;
\log\Bigl(1 + \frac{1}{nT}\, \mathsf{E}_x\Bigl[\sum_{k=0}^{nT-1} c(Q(k; [nx]))\Bigr]\Bigr).
\]
The result then follows from the definition of $J_n(x, T)$. $\Box$

Proposition B.8 For some $K_{B.8} < \infty$ we have the following uniform bound: Suppose that $x \in \mathbb{R}^\ell_+$ satisfies $[nx] \in \mathsf{X}$ for each $n \ge 1$. Then,
\[
0 \;\le\; \limsup_{n\to\infty} \frac{n}{\log(n)}\bigl(J_n(x, T_\circ) - J_*(x)\bigr)
\;\le\; K_{B.8}\, \frac{\|x\| + 1}{1 - \vartheta},
\qquad \vartheta \in [\vartheta_0, 1).
\]

Proof. The lower bound is given in Lemma B.3. For the upper bound we again apply the Comparison Theorem to (4), which gives the following bound with $T = T_\circ = O(\|x\|(1-\vartheta)^{-1})$, $x \in \mathsf{X}$, and $n \ge 1$:
\[
J_n(x, T) \;\le\; n^{-2} V([nx]) + n^{-1} T \hat\eta_*
+ n^{-2} K_{a4}\, \frac{1}{1-\vartheta}\, \mathsf{E}_x\Bigl[\sum_{k=0}^{nT-1} \bigl(1 + W_m(k)\bigr)^{-2}\Bigr]
+ n^{-2} K_{a4}\, \mathsf{E}_x\Bigl[\sum_{k=0}^{nT-1} \log\bigl(1 + c(Q(k; [nx]))\bigr)\Bigr].
\]
The two expectations on the right-hand side may be bounded using Lemma B.6 and Lemma B.7, respectively, yielding
\[
\limsup_{n\to\infty} \frac{n}{\log(n)}\bigl(J_n(x, T) - n^{-2} V([nx])\bigr)
\;\le\;
\limsup_{n\to\infty} \frac{n}{\log(n)}\, T n^{-1} \log\bigl(1 + T^{-1} n\, J_n(x, T)\bigr)
\;\le\; T,
\tag{43}
\]
where the final inequality follows from Lemma B.4. Note that one can establish the bound $T_\circ(x) - \widehat T_*(x) \le K\|x\|$, $x \in \mathbb{Z}^\ell_+$, where the constant $K$ is independent of load, which justifies the application of Lemma B.6 in the bound above. Moreover, it follows from (a4) and the fact that $J_*$ is piecewise quadratic that
\[
0 \;\le\; \limsup_{n\to\infty} \frac{n}{\log(n)}\bigl(n^{-2} V([nx]) - J_*(x)\bigr)
\;\le\; K_{a4}\, \frac{\|x\|}{1 - \vartheta},
\qquad \vartheta \in [\vartheta_0, 1).
\tag{44}
\]
In particular, we see that $\lim_{n\to\infty} n^{-2} V([nx]) = J_*(x)$.

Finally, we combine (43) and (44) to obtain the desired bound:
\[
\begin{aligned}
\limsup_{n\to\infty} \frac{n}{\log(n)}\bigl(J_n(x, T) - J_*(x)\bigr)
&\le
\limsup_{n\to\infty} \frac{n}{\log(n)}\bigl(n^{-2} V([nx]) - J_*(x)\bigr)
+ \limsup_{n\to\infty} \frac{n}{\log(n)}\bigl(J_n(x, T) - n^{-2} V([nx])\bigr) \\
&\le K_{a4}\, \frac{\|x\|}{1 - \vartheta} + T.
\end{aligned}
\]
This completes the proof, since we are taking $T = T_\circ = O(\|x\|(1-\vartheta)^{-1})$. $\Box$
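The fluid-scale convergence $J_n(x, T_\circ) \to J_*(x)$ underlying Proposition B.8 can be visualized with a toy experiment. The sketch below uses a single reflected queue, for which any non-idling policy is fluid-scale optimal; the Bernoulli arrival/service probabilities, the cost coefficient, and the number of sample paths are hypothetical. It illustrates only the scaling $q^n(t;x) = n^{-1}Q(nt;[nx])$, not a simulation of the paper's network models.

```python
import numpy as np

rng = np.random.default_rng(2)

lam, mu, c1 = 0.40, 0.50, 1.0            # hypothetical rates and linear cost coefficient
x0 = 1.0                                  # fluid initial condition
T0 = x0 / (mu - lam)                      # fluid draining time
J_star = c1 * x0**2 / (2 * (mu - lam))    # = ∫_0^{T0} c1 (x0 - (mu - lam) t) dt

def J_n(n, n_paths=300):
    """Estimate J_n(x0, T0) = E[∫_0^{T0} c(q^n(t)) dt] ≈ n^{-2} Σ_{k < nT0} E[c(Q(k))]."""
    steps = int(n * T0)
    total = 0.0
    for _ in range(n_paths):
        q = int(n * x0)
        acc = 0
        inc = rng.binomial(1, lam, steps) - rng.binomial(1, mu, steps)
        for k in range(steps):
            acc += q
            q = max(q + inc[k], 0)
        total += acc
    return c1 * total / (n_paths * n * n)

for n in (20, 50, 100, 200):
    print(f"n={n:4d}   J_n ~ {J_n(n):6.3f}   J_* = {J_star:6.3f}")
```

As $n$ grows the simulated $J_n$ approaches $J_*$ from above, with a gap of order $\log(n)/n$, which is the rate quantified in Proposition B.8.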

C Appendix: Bounds for the Dai & Wang model

Proof of Lemma 5.1. Consider the piecewise quadratic function defined on $\mathbb{R}^5_+$ by $G := J_0 - \hat J_*$, where $J_0$ is defined above Lemma 5.1, and $\hat J_*$ is defined in (26). The function $G$ is $C^1$ since $J^0_*$ is quadratic within each of the two regions,
\[
\{z \in \mathbb{R}^2 : z_1 \ge 0,\ \delta_2^{-1} z_1 > (2\delta_1 - 3\delta_2)^{-1} z_2\},
\qquad
\{z \in \mathbb{R}^2 : z_1 \ge 0,\ \delta_2^{-1} z_1 < (2\delta_1 - 3\delta_2)^{-1} z_2\},
\]
and along the line of adjacency, $\{r \delta^z : r \ge 0\}$, where $\delta^z := \tfrac12(\delta_2,\, 2\delta_1 - 3\delta_2)^T$, we have
\[
\langle \nabla J^0_*(z), \delta^z\rangle \;=\; z^T D^1 \delta^z \;=\; z^T D^2 \delta^z \;=\; \bigl\langle (c_1, c_5)^T, z\bigr\rangle .
\]
This shows that $\nabla J^0_*$ is continuous on $\{z \in \mathbb{R}^2 : z_1 \ge 0\}$, and hence $\nabla G$ is continuous on $\mathbb{R}^5$. Moreover, $G(x)$ is uniformly bounded in $\rho_2$ for fixed $x \in \mathbb{R}^5$; this follows from the bounds given in (30).

These observations imply the following uniform bounds. Fix $k \ge 0$ and set $x = Q(k)$, $y = Q(k+1)$. We then have by the definitions,
\[
J_0(y) - J_0(x) - \langle \nabla J_0(x), y - x\rangle
\;=\;
\hat J_*(y) - \hat J_*(x) - \langle \nabla \hat J_*(x), y - x\rangle
+ G(y) - G(x) - \langle \nabla G(x), y - x\rangle.
\tag{45}
\]
Since $\hat J_*$ is quadratic, the specific form (26) gives
\[
\hat J_*(Q(k+1)) - \hat J_*(Q(k)) - \bigl\langle \nabla \hat J_*(Q(k)),\, Q(k+1) - Q(k)\bigr\rangle
\;=\;
\frac12\, \frac{c_*}{\mu_2 - 2\alpha_1}\, \bigl(W_2(k+1) - W_2(k)\bigr)^2 .
\]
Also, since $G$ is $C^1$ we have,
\[
\mathsf{E}\bigl[\,|G(Q(k+1)) - G(Q(k)) - \langle \nabla G(Q(k)),\, Q(k+1) - Q(k)\rangle|\;\big|\; Q(k) = x\bigr] \;\le\; K_C, \qquad \text{a.s.},
\]
where the constant $K_C < \infty$ is independent of $x$ and $\rho_2$. Consequently, (45) provides the bound, for any $x \in \mathbb{Z}^5_+$,
\[
\mathsf{E}[J_0(Q(k+1)) - J_0(Q(k)) \mid Q(k) = x]
\;\le\;
\mathsf{E}[\langle \nabla J_0(Q(k)),\, Q(k+1) - Q(k)\rangle \mid Q(k) = x]
+ \frac12\, \frac{c_*}{\mu_2 - 2\alpha_1}\, \mathsf{E}\bigl[(W_2(k+1) - W_2(k))^2 \mid Q(k) = x\bigr] + K_C .
\]
Equivalently, writing $Z(k) = \bigl(\tfrac12 W_2(k),\; W_1(k) - \tfrac32 W_2(k)\bigr)^T$, and $\Delta_Z(x) = \mathsf{E}[Z(k+1) - Z(k) \mid Q(k) = x]$ for $k \ge 0$, we have from the definition of $J_0$,
\[
P J_0(x) - J_0(x) \;\le\; \langle \nabla J^0_*(z), \Delta_Z(x)\rangle
+ \frac12\, \frac{c_*}{\mu_2 - 2\alpha_1}\, \mathsf{E}_x\bigl[(W_2(1) - W_2(0))^2\bigr] + K_C,
\]
where, here and below, $z$ denotes the value of $Z(k)$ determined by $Q(k) = x$. It follows that the function $h_0$ satisfies,
\[
P h_0(x) - h_0(x) \;\le\; \langle \nabla J^0_*(z), \Delta_Z(x)\rangle + \hat\eta_* + K_C,
\qquad x \in \mathbb{Z}^5_+ .
\tag{46}
\]
To complete the proof we must consider in more detail the mean increment $\Delta_Z(x)$. We first restrict to the case where $x_5 \ge 1$. From the definition we obtain,
\[
\Delta_Z(x) \;=\; -\delta^z + \tfrac12 \mu_2 (1 - a_3 - a_4)(1, -3)^T,
\qquad x \in \mathbb{Z}^5_+,\ x_5 \ge 1.
\]
Since $J^0_*$ has the property that $\langle \nabla J^0_*(z), \delta^z\rangle = \langle (c_1, c_5)^T, z\rangle$ for any $z \in \mathbb{R}^2$, $z_1 \ge 0$, we obtain the bound,
\[
\langle \nabla J^0_*(z), \Delta_Z(x)\rangle \;\le\; -c_1 z_1 - c_5 z_2 + \tfrac12 \mu_2 (1 - a_3 - a_4)\, D^1_{11}\, z_1,
\qquad x \in \mathbb{Z}^5_+,\ x_5 \ge 1.
\]
Here we have used the fact that $\nabla J^0_*(z) = D^i z$ for some $i \in \{1, 2\}$, and considered the worst case over $D^1$, $D^2$. Hence the desired bound does hold whenever $x_5 \ge 1$.

The second component of $\Delta_Z(x)$ may not equal $-\delta^z_2$ when $x_5 = 0$. However, in this case we have,
\[
z_2 = -\bigl(x_2 + x_3 + \tfrac12 x_4\bigr) \le 0,
\qquad
|z_2| \le x_2 + x_3 + x_4 \le s_\gamma(x_1) + 2.
\]
We conclude that $J^0_*(z) = \tfrac12 z^T D^1 z$ whenever $x_5 = 0$, since $z_2$ is non-positive. This observation combined with the upper bound given above then implies that, for a possibly larger constant $K_C$,
\[
\langle \nabla J^0_*(z), \Delta_Z(x)\rangle \;\le\; -c_1 z_1 - c_5 z_2 + \tfrac12 \mu_2 (1 - a_3 - a_4)\, D^1_{11}\, z_1 + K_C\bigl(1 + s_\gamma(x_1)\bigr),
\qquad x \in \mathsf{X},\ x_5 = 0.
\]
We conclude that (46) does imply the desired bound in Lemma 5.1. $\Box$
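As an aside, the $C^1$ argument above reduces to checking that the two quadratic pieces of $J^0_*$ have matching gradients on the adjacency line, i.e. that $D^1\delta^z = D^2\delta^z$. The following is a minimal numerical sketch of that check; the matrices $D^1$, $D^2$ and the direction $\delta^z$ below are hypothetical stand-ins (they are not the matrices defined for the Dai–Wang model in the body of the paper), constructed only so that the matching condition holds.

```python
import numpy as np

# Hypothetical adjacency direction and quadratic pieces J_i(z) = 0.5 * z^T D_i z.
delta_z = np.array([1.0, 2.0])

D1 = np.array([[2.0, 0.5],
               [0.5, 1.0]])
# Perturb D1 by v v^T with v orthogonal to delta_z, so that D1 @ delta_z == D2 @ delta_z.
v = np.array([2.0, -1.0])                 # v is orthogonal to delta_z
assert abs(v @ delta_z) < 1e-12
D2 = D1 + np.outer(v, v)

def grad(D, z):
    """Gradient of z -> 0.5 * z^T D z for symmetric D."""
    return D @ z

# The gradients of the two pieces agree at every point of the line {r * delta_z : r >= 0},
# which is the continuity property used in the proof of Lemma 5.1.
for r in (0.0, 0.5, 1.0, 3.0):
    z = r * delta_z
    assert np.allclose(grad(D1, z), grad(D2, z))
print("gradients of the two quadratic pieces match on the adjacency line")
```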

Proof of Proposition 5.2. It is sufficient to show that (4) holds and satisfies the bounds presented in Proposition 5.2 (i). The remaining results are obtained as in the proof of Proposition 3.1.

Under the proposed policy we have,
\[
\mathsf{E}_x\bigl[2 Q_3(1) + Q_4(1)\bigr] - (2 x_3 + x_4) > 0, \qquad x \in P_1;
\]
\[
\mathsf{E}_x\bigl[Q_2(1) + 2 Q_3(1) + Q_4(1)\bigr] - (x_2 + 2 x_3 + x_4) > 0, \qquad x \in P_2.
\]
Due to this positive drift, it follows that there exist positive values $\{\beta_1, \beta_2\}$ solving the equations,
\[
\mathsf{E}_x\Bigl[\exp\Bigl(-\beta_1\bigl[(2 Q_3(1) + Q_4(1)) - (2 x_3 + x_4)\bigr]\Bigr)\Bigr] = 1, \qquad x \in P_1,
\]
\[
\mathsf{E}_x\Bigl[\exp\Bigl(-\beta_2\bigl[(Q_2(1) + 2 Q_3(1) + Q_4(1)) - (x_2 + 2 x_3 + x_4)\bigr]\Bigr)\Bigr] = 1, \qquad x \in P_2.
\]
These values coincide with the definitions given in (33) and (34).

As in the proof of Proposition 3.1, this implies that the two functions defined below are 'locally harmonic' within the respective 'panic regions',
\[
b^0_1(x) = e^{-\beta_1(2 x_3 + x_4)}, \qquad b^0_2(x) = e^{-\beta_2(x_2 + 2 x_3 + x_4)}.
\]
Writing $b_1(x) = (x_1 + x_2)\, b^0_1(x)$, $b_2(x) = x_1\, b^0_2(x)$, we have $b(x) = k_1 b_1(x) + k_2 b_2(x)$ for $x \in \mathbb{Z}^5_+$.

The constant $k_1$ is chosen to cancel the positive drift in $-h_0(x) + P h_0(x)$ that results when $x_3 + x_4 = 0$. We have,
\[
-b_1(x) + P b_1(x) \;\le\; -\tfrac52\bigl(1 - e^{2\beta_1}\bigr)(x_1 + x_2),
\qquad x \in P_1 \cap \{x_3 + x_4 = 0\},
\]
so that on considering (32) we obtain (35).

Given this value for $k_1$, the constant $k_2$ is used to compensate for a positive drift in $(-b_1 + P b_1)\, k_1$ when $x_2$ is zero. Again, straightforward calculations give,
\[
-b_1(x) + P b_1(x) \;\le\; \bigl(\mathsf{E}[e^{\beta_1 S_1(k)}] - 1\bigr)\, b_1(x),
\]
\[
-b_2(x) + P b_2(x) \;\le\; -\bigl(\mathsf{E}[e^{\beta_1 S_1(k)}] - \mathsf{E}[e^{\beta_1 (S_2(k) - S_1(k))}]\bigr)\, b_2(x),
\qquad x \in P_2 \cap \{x_2 = 0\}.
\]
Using the fact that $b_1(x) = b_2(x)$ when $x_2 = 0$, we arrive at the value (36) for the second coefficient. $\Box$
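The defining equations for $\beta_1, \beta_2$ are of the Lundberg type: given a one-step increment $D$ with positive mean, find $\beta > 0$ with $\mathsf{E}[e^{-\beta D}] = 1$. The sketch below solves such an equation numerically for a hypothetical discrete increment distribution; the support and probabilities are illustrative only and are not the transition law of the Dai–Wang model.

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical one-step increment D of the linear test function
# (e.g. 2Q3 + Q4) on the panic region, with E[D] > 0.
values = np.array([-1.0, 0.0, 1.0])
probs  = np.array([ 0.2, 0.3, 0.5])      # E[D] = 0.3 > 0

def mgf_minus_one(beta):
    """E[exp(-beta * D)] - 1.  A root beta > 0 exists because E[D] > 0
    while D takes negative values with positive probability."""
    return np.sum(probs * np.exp(-beta * values)) - 1.0

beta = brentq(mgf_minus_one, 1e-8, 10.0)
print(f"beta = {beta:.6f},  check E[exp(-beta D)] = "
      f"{np.sum(probs * np.exp(-beta * values)):.6f}")
```

For this toy distribution the root is $\beta = \log(5/2) \approx 0.9163$, and the same root-finding step applies to any increment law with positive drift.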

References

[1] S. L. Bell and R. J. Williams. Dynamic scheduling of a system with two parallel servers in heavy traffic with complete resource pooling: asymptotic optimality of a continuous review threshold policy. Adv. Appl. Probab., 11:608–649, 2001.

[2] M. Bramson. State space collapse with application to heavy traffic limits for multiclass queueing networks. Queueing Systems: Theory and Applications, 30:89–148, 1998.

[3] H. Chen and D. D. Yao. Fundamentals of Queueing Networks: Performance, Asymptotics, and Optimization. Stochastic Modelling and Applied Probability. Springer-Verlag, New York, 2001.

[4] M. Chen, C. Pandit, and S. P. Meyn. In search of sensitivity in network optimization. Queueing Systems: Theory and Applications, 44(4):313–363, 2003.

[5] J. G. Dai and Y. Wang. Nonexistence of Brownian models of certain multiclass queueing networks. Queueing Systems: Theory and Applications, 13:41–46, May 1993.

[6] A. Gajrat, A. Hordijk, and A. Ridder. Large deviations analysis of the fluid approximation for a controllable tandem queue. To appear in Annals of Applied Probability, 2004.

[7] B. Hajek. Optimal control of two interacting service stations. IEEE Trans. Automat. Control, AC-29:491–499, 1984.

[8] J. M. Harrison and J. A. Van Mieghem. Dynamic control of Brownian networks: state space collapse and equivalent workload formulations. Ann. Appl. Probab., 7(3):747–771, 1997.

[9] S. G. Henderson, S. P. Meyn, and V. Tadic. Performance evaluation and policy selection in multiclass networks. Discrete Event Dynamic Systems: Theory and Applications, 13:149–189, 2003. Special issue on learning and optimization methods.

[10] G. Koole. On the pathwise optimal Bernoulli routing policy for homogeneous parallel servers. Math. Oper. Res., 21:469–476, 1996.

[11] S. P. Meyn. The policy iteration algorithm for average reward Markov decision processes with general state space. IEEE Trans. Automat. Control, 42, 1997. Also presented at the 35th IEEE Conference on Decision and Control, Kobe, Japan, December 1996.

[12] S. P. Meyn. Stability and optimization of queueing networks and their fluid models. In Mathematics of Stochastic Manufacturing Systems (Williamsburg, VA, 1996), pages 175–199. Amer. Math. Soc., Providence, RI, 1997.

[13] S. P. Meyn. Sequencing and routing in multiclass queueing networks. Part I: Feedback regulation. SIAM J. Control Optim., 40(3):741–776, 2001.

[14] S. P. Meyn. Sequencing and routing in multiclass queueing networks. Part II: Workload relaxations. SIAM J. Control Optim., 42(1):178–217, 2003. Presented at the 2000 IEEE International Symposium on Information Theory, Sorrento, Italy, June 25–30, 2000.

[15] S. P. Meyn. Value functions, optimization, and performance evaluation in stochastic network models. Submitted for publication, 2003.

[16] S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Springer-Verlag, London, 1993.

[17] V. Tadic and S. P. Meyn. Adaptive Monte-Carlo algorithms using control-variates. 2003. Preliminary version published in the Proceedings of the American Control Conference, June 2003.

[18] Y.-C. Teh and A. R. Ward. Critical thresholds for dynamic routing in queueing networks. Queueing Systems: Theory and Applications, 42(3):297–316, 2002.

[19] G. Weiss. Optimal draining of a fluid re-entrant line. In Frank Kelly and Ruth Williams, editors, volume 71 of IMA Volumes in Mathematics and its Applications, pages 91–103. Springer-Verlag, New York, 1995.

[20] R. J. Williams. Diffusion approximations for open multiclass queueing networks: sufficient conditions involving state space collapse. Queueing Systems: Theory and Applications, 30(1–2):27–88, 1998.
