Estimation of Derivatives of Nonsmooth Performance Measures in Regenerative Systems Tito Homem-de-Mello Revised version August 25, 1998 School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA
Abstract
Estimation of derivatives and consequent optimization of stochastic systems are fields that have been growing considerably in recent years. Most of the work, however, has been done for differentiable systems. In this paper we investigate the problem of estimating directional derivatives of expected steady-state performance measures in parametric systems where those functions are not smooth. For the class of regenerative Markovian systems we provide conditions under which we can obtain consistent estimators of those directional derivatives. An example illustrates that the conditions imposed must be stricter than in the differentiable case. Besides yielding an estimation procedure for directional derivatives and subgradients of equilibrium quantities, the result allows us to derive necessary and sufficient conditions for differentiability of the expected steady-state function. We then analyze the process formed by the subdifferentials of the original process, and show that the subdifferential set of the expected steady-state function can be expressed as an average of integrals of multifunctions, which is the approach commonly found in the literature for integrals of sets. The latter result can also be viewed as a limit theorem for more general compact-convex multivalued processes.
Key words: Derivative estimation, nonsmooth optimization, regenerative processes, steady-state, multifunctions.
Supported by CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico), Brasília, Brazil, through a Doctoral Fellowship under grant 200595/93-8. This work was also supported, in part, by Grant DMI-9713878 from the National Science Foundation.
1 Introduction

In recent years a great deal of attention has been devoted to the computation of derivatives of performance measures in stochastic systems. The information provided by those quantities is essential to answer the important question: how much will the performance change if some parameters of the system are slightly changed? More formally, suppose we have a stochastic process, say $\{X_n(\theta)\}$, depending on a (vector-valued) parameter $\theta$, and assume that the process converges to a steady-state $X_\infty(\theta)$. We would like to compute the gradient of the expected value of the process in equilibrium, i.e. $\nabla \mathbb{E}[X_\infty(\theta)]$. A typical example is a G/G/1 queue where the distribution of the service times depends on a parameter $\theta$ (e.g. its mean); we may be interested in computing the sensitivity of the expected waiting time $\mathbb{E}[W_\infty(\theta)]$ with respect to the parameter $\theta$. Notice also that the computation (or estimation) of derivatives allows one to take a step further and develop optimization procedures for the underlying performance measure. Such effort obviously brings numerous benefits, and in fact there have been several papers in the literature dealing with that issue. See, for instance, Chong and Ramadge [4], L'Ecuyer and Glynn [23], Suri and Leung [39] and references therein for descriptions of methods and further applications. In general, however, closed-form expressions for the steady-state derivatives cannot be obtained, so one must resort to simulation methods like finite differences, perturbation analysis or likelihood ratios in order to estimate gradients (see e.g. Glasserman [9], Glynn [12], L'Ecuyer [22], Rubinstein and Shapiro [34], Suri [38] for discussions on that topic). In addition, it is necessary to show consistency of such estimators, since the steady-state performance measure of the system under scrutiny is a limiting quantity and hence so is its gradient.
Extra assumptions that guarantee some type of uniform convergence, such as convexity (see Hu [18], Shapiro and Wardi [36], Robinson [29]), are often imposed for that purpose. A particularly neat situation occurs when the underlying process possesses a regenerative structure, i.e. it "restarts" itself at some points in time (see for instance Asmussen [1]). In such cases one can estimate steady-state quantities based on the behavior of the process over regenerative cycles, thus avoiding "warm-up" periods that are typically necessary in simulation (see Bratley, Fox and Schrage [2]). This is expressed in the formula
$$\mathbb{E}[X_\infty(\theta)] = \frac{\mathbb{E}\left[\sum_{n=0}^{\tau(\theta)-1} X_n(\theta)\right]}{\mathbb{E}[\tau(\theta)]},$$
where $\tau(\theta)$ is the length of a cycle. Notice however that the cycle length often depends on the parameter $\theta$, thus making differentiation of those quantities a difficult task. In some cases this problem can be overcome by using the likelihood ratio technique (see e.g. Rubinstein and Shapiro [34], Glasserman and Glynn [11]). The situation is also remedied if the derivative process $\{\nabla X_n(\theta)\}$ regenerates at the same epochs as the original process; we then have that, under some additional assumptions,
$$\nabla \mathbb{E}[X_\infty(\theta)] = \frac{\mathbb{E}\left[\sum_{n=0}^{\tau(\theta)-1} \nabla X_n(\theta)\right]}{\mathbb{E}[\tau(\theta)]}. \qquad (1.1)$$
Glasserman [10] studies conditions under which (1.1) holds for Markov processes (in the one-dimensional case), thus providing a convenient way to estimate the desired derivatives. Most of the work found in the literature treats the case where the expected steady-state function is differentiable. Indeed, in many situations this is actually the case, for instance in queueing networks in which the distributions of arrival and service times have densities (see Suri [38], Wardi and Hu [40]). This differentiability property, however, does not hold in general; in fact, Shapiro and Wardi [36] show by a simple example that not only can the expected steady-state function be nondifferentiable, but also the set of nondifferentiable points can be a dense subset of the domain. In such cases one is concerned with estimating subgradients, or more generally, directional derivatives (see section 2 for definitions). An important contribution in that respect is the work by Robinson [29]: under the assumption that the functions $X_n(\cdot, \omega)$ are convex w.p.1, Robinson shows that subgradients of the expected steady-state function can be consistently estimated; however, no claim can be made regarding the whole subdifferential set, since in that setting one can only show that limits of subgradients are contained in the subdifferential set of the expected steady-state function. An interesting application of nonsmooth optimization to the maximization of steady-state throughput in a tandem production line, using the techniques of [29], can be found in Plambeck et al. [26]. In this paper we investigate the problem of estimating directional derivatives of nonsmooth expected steady-state performance measures of Markov processes. We provide conditions under which we can obtain consistent estimators of those directional derivatives.
The basic idea is to show that, under those conditions, the directional derivative function process $\{X_n'(\theta_0; \cdot)\}$ regenerates at the same points as the original process $\{X_n(\theta_0)\}$, and hence the directional derivatives of the expected value function can be expressed both as a long-run average and as a ratio formula. Here, regeneration plays an essential role: without this condition, consistency is unlikely to hold, since typically pointwise convergence of functions does not imply convergence of directional derivatives, unlike the differentiable case, where such convergence holds for example under convexity assumptions. An example given in section 3 shows that the analysis of regeneration of the directional derivative process is more complicated than in the differentiable case: we exhibit two random variables depending on $\theta$, say $Y_1(\theta)$ and $Y_2(\theta)$, such that $Y_1(\theta)$ and $Y_2(\theta)$ are independent and identically distributed for each $\theta$, but their subdifferential sets at $\theta = 0$ do not have the same distribution (the situation can be easily extended to regenerative processes in general). As a consequence, we give a necessary and sufficient condition for the differentiability of the expected steady-state function, thus extending a result by Shapiro and Wardi [36]. We then rewrite the results obtained for directional derivatives in terms of subdifferential sets. Although this reformulation is an immediate consequence of the correspondence between sublinear functions and compact convex sets (cf. Hiriart-Urruty and Lemaréchal [15]), the importance of subdifferentials in nonsmooth optimization theory, in terms of algorithms and optimality conditions (see Rockafellar [30], Ioffe and Tihomirov [20], Hiriart-Urruty and Lemaréchal [15, 16]), justifies per se the restatement of those results. This allows us to draw an important conclusion: the
subdifferential of the expected steady-state function $\partial \mathbb{E}[X_\infty(\theta_0)]$ can be expressed both as a limiting average of sets ($N^{-1} \sum_{n=0}^{N-1} \partial X_n(\theta_0)$), where the sum is understood as the Minkowski addition of sets $A + B = \{a + b : a \in A,\ b \in B\}$, and as an average of sets over a regenerative cycle ($\mathbb{E}[\sum_{n=0}^{\tau-1} \partial X_n(\theta_0)]/\mathbb{E}[\tau]$). The latter expression, which involves the integral of a set, can be computed by using the theory of integration of multifunctions found in the literature (see for instance Castaing and Valadier [3], Hiai and Umegaki [14], Rockafellar [31]). We show that this limiting result is valid in general for regenerative compact-convex multivalued processes. The paper is organized as follows: in section 2 we define the set-up for the problem and review concepts on regenerative processes, convex analysis and vector measures. In section 3 we illustrate with an example some difficulties of the problem and give conditions to ensure regeneration of the directional derivative functions. In section 4 we show that under some additional assumptions a ratio formula like (1.1) holds for the directional derivatives and then we exhibit consistent estimators of the derivatives of the expected steady-state function. As a consequence, we obtain a necessary and sufficient condition for the differentiability of that function. In section 5 we restrict ourselves to the case of subdifferentiable functions and apply the theory of integrals of multifunctions to the results obtained in the previous sections. Section 6 presents two examples of application of the results, and in section 7 we make some concluding remarks.
2 Background and basic assumptions

We first review some concepts on regenerative processes. Following Asmussen [1], we say that a vector-valued process $\{Y_n\}$ is regenerative if there exists a sequence of iid spacings $\{\tau_j\}$ such that, for each $j \ge 1$, the process
$$\{\tau_{j+1}, \tau_{j+2}, \ldots; \{Y_{\beta_j + m},\ m = 0, 1, \ldots\}\}$$
(where $\beta_j = \tau_1 + \cdots + \tau_j$) is independent of $\tau_1, \ldots, \tau_j$ and its distribution does not depend on $j$. Observe that this definition does not impose that $\{Y_{\beta_j + m},\ m = 0, 1, \ldots\}$ be independent of $\{Y_m,\ m = 0, \ldots, \beta_j - 1\}$. The importance of regenerative processes has both theoretical and practical aspects. On the theoretical side, it can be shown (see e.g. Asmussen [1]) that under mild assumptions (namely, that $\mathbb{E}\tau_1 < \infty$ and $\tau_1$ has a non-lattice distribution) the process $\{Y_n\}$ has a limiting distribution which is given by
$$\mathbb{E}[f(Y_\infty)] = \frac{\mathbb{E}\left[\sum_{n=0}^{\tau_1 - 1} f(Y_n)\right]}{\mathbb{E}[\tau_1]} \qquad (2.1)$$
for any bounded continuous function $f$. Furthermore, the same quantity is given by time-averages, that is,
$$\mathbb{E}[f(Y_\infty)] = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} f(Y_n) \quad \text{w.p.1.} \qquad (2.2)$$
On the practical side, formula (2.1) provides a way to estimate functions of the process in steady-state, yielding a procedure that is usually called regenerative simulation. Suppose we generate $m$ cycles of sizes $\tau_1, \ldots, \tau_m$. Then, $W = \mathbb{E}[f(Y_\infty)]$ can be estimated by
$$\hat{W} = \frac{\sum_{i=1}^{m} \sum_{n=\beta_{i-1}}^{\beta_i - 1} f(Y_n)}{\sum_{i=1}^{m} \tau_i}.$$
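In code, the ratio estimator $\hat{W}$ takes only a few lines. The following is a minimal sketch, not taken from the paper: the regenerative process is a hypothetical reflected random walk on $\{0, 1, 2, \ldots\}$ (up with probability $p$, down otherwise) whose visits to state 0 are regeneration epochs, and `simulate_cycles` and `ratio_estimator` are illustrative names.

```python
import random

def simulate_cycles(m, p=0.4, seed=0):
    """Simulate m regenerative cycles of a reflected random walk on
    {0, 1, 2, ...}: up with prob p, down with prob 1-p, reflected at 0.
    Each visit to state 0 is a regeneration epoch.  Returns a list of
    cycles, each cycle being the list of states visited (starting at 0)."""
    rng = random.Random(seed)
    cycles = []
    for _ in range(m):
        y, cycle = 0, []
        while True:
            cycle.append(y)
            y = y + 1 if rng.random() < p else max(y - 1, 0)
            if y == 0:          # back at the regeneration state
                break
        cycles.append(cycle)
    return cycles

def ratio_estimator(cycles, f):
    """Regenerative estimate of E[f(Y_inf)]: total of f over all cycles
    divided by the total number of steps, as in formulas (2.1) and W-hat."""
    num = sum(f(y) for cycle in cycles for y in cycle)
    den = sum(len(cycle) for cycle in cycles)
    return num / den

cycles = simulate_cycles(20000)
# Stationary mean of this birth-death chain is r/(1-r) with r = p/(1-p),
# i.e. 2 for p = 0.4, so the estimate should be near 2.
print(ratio_estimator(cycles, lambda y: y))
```

No warm-up period is discarded: every cycle starts in the regeneration state, which is exactly the point made in the text.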
The variance of $\hat{W}$ can be estimated from the same sample; see for instance Shedler [37] for details. Let us now define formally the underlying processes and make some basic assumptions. Consider a vector-valued stochastic process $X(\theta) := \{X_n(\theta)\} := \{(X_n^1(\theta), \ldots, X_n^K(\theta))\}$, $n = 0, 1, \ldots$, defined on a common probability space $(\Omega, \mathcal{F}, P)$, and depending on an $m$-dimensional parameter $\theta$ belonging to some open set $\Theta \subseteq \mathbb{R}^m$. The choice for vector-valued rather than scalar-valued processes is driven mainly by the fact that many applications fall into that category, like queueing networks for instance. As we shall see later, the analysis is basically the same for both cases; there are however some exceptions, as in the discussion at the end of section 3. Since we are mostly interested in studying the derivatives at some fixed point, we shall fix from now on some $\theta_0 \in \Theta$. Assume the following:
Assumption A1: $\{X_n(\theta)\}$ is a Markov process for each $\theta \in \Theta$ (possibly with continuous state space) and the process $\{X_n(\theta_0)\}$ is regenerative with regeneration epochs $\{\beta_m\}$, $m \ge 0$, and $\beta_0 = 0$.

Assumption A1 is the basic starting point, since our main goal is to find conditions under which the derivative process regenerates together with the original process at the points $\{\beta_m\}$. This assumption will be complemented later, as we shall see in section 3. Before proceeding further, let us review some concepts of convex analysis which will be used in the sequel (basic references are Rockafellar [30] and Hiriart-Urruty and Lemaréchal [15]). Let $f : \Theta \to \mathbb{R}$ be an arbitrary function. For any $\theta \in \Theta$, the directional derivative of $f$ at $\theta$ in the direction $d$, when it exists, is given by
$$f'(\theta; d) = \lim_{t \downarrow 0} \frac{f(\theta + td) - f(\theta)}{t}. \qquad (2.3)$$
Notice that $f'(\theta; \cdot)$ is positively homogeneous, i.e. $f'(\theta; 0) = 0$ and $f'(\theta; \alpha d) = \alpha f'(\theta; d)$ for all $d \in \mathbb{R}^m$ and all $\alpha > 0$.
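Definition (2.3) can be explored numerically by evaluating the one-sided difference quotient at a small $t$. A quick sketch follows (the function $f$ below is a hypothetical nonsmooth example, not one from the paper); note that at the kink the directional derivative is positively homogeneous but not linear in $d$, which is the hallmark of the nonsmooth case.

```python
def dir_deriv(f, theta, d, t=1e-8):
    """One-sided difference quotient approximating f'(theta; d) as in (2.3)."""
    return (f([x + t * y for x, y in zip(theta, d)]) - f(theta)) / t

f = lambda th: abs(th[0]) + abs(th[1])   # nonsmooth at the origin

# At theta_0 = (0, 0) one has f'(0; d) = |d1| + |d2|:
print(dir_deriv(f, [0, 0], [1, 0]))    # 1.0
print(dir_deriv(f, [0, 0], [-1, 0]))   # 1.0, not -1.0: f'(0; .) is nonlinear
print(dir_deriv(f, [0, 0], [1, 1]))    # 2.0
```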
Definition: A function $f : \mathbb{R}^m \to \mathbb{R}$ is said to be directionally differentiable at $\theta_0 \in \Theta$ if the directional derivative $f'(\theta_0; d)$ defined in (2.3) exists and is finite for all $d \in \mathbb{R}^m$, and the function $f'(\theta_0; \cdot)$ is continuous.

Definition: A function $f : \mathbb{R}^m \to \mathbb{R}$ is said to be subdifferentiable at $\theta_0 \in \Theta$ if it is directionally differentiable and, in addition, $f'(\theta_0; \cdot)$ is convex.
Notice that our concept of directional differentiability is stronger than the usual one found in the literature, since we assume the directional derivative function to be continuous. When $f$ is subdifferentiable, we can define the subdifferential of $f$ at $\theta$ as the set supported by the function $f'(\theta; \cdot)$, that is,
$$\partial f(\theta) = \{\lambda \in \mathbb{R}^m : \langle \lambda, d \rangle \le f'(\theta; d) \text{ for all } d \in \mathbb{R}^m\}. \qquad (2.4)$$
It follows that the subdifferential $\partial f(\theta)$ is a convex and compact set. Furthermore, it follows that in this case the directional derivative $f'(\theta; d)$ is a sublinear function of $d$ (we say that a function $g$ is sublinear if $g(\lambda_1 d_1 + \lambda_2 d_2) \le \lambda_1 g(d_1) + \lambda_2 g(d_2)$ for all $d_1, d_2 \in \mathbb{R}^m$ and all $\lambda_1, \lambda_2 > 0$; it can be shown, see [15, Proposition V.1.1.4], that $g$ is sublinear if and only if $g$ is positively homogeneous and convex). Notice that when $f$ is locally Lipschitz (as defined below) the concept of subdifferentiability includes that of regularity introduced by Clarke [6], that is, regular functions are subdifferentiable. Moreover, if $f$ is regular then the subdifferential coincides with the so-called generalized gradient (see [6]). If $f$ is convex, those two concepts also coincide with the standard subdifferential of convex analysis (see e.g. Rockafellar [30]). Relation (2.4) can actually be generalized to arbitrary finite-valued sublinear functions, resulting in an interesting connection between that class of functions and compact convex sets. Given a finite-valued sublinear function $\psi(\cdot)$ on $\mathbb{R}^m$ we can define a compact convex set $C \subseteq \mathbb{R}^m$ as
$$C = \{\lambda \in \mathbb{R}^m : \langle \lambda, d \rangle \le \psi(d) \text{ for all } d \in \mathbb{R}^m\}.$$
Conversely, given a compact convex set $C \subseteq \mathbb{R}^m$ we can construct a finite sublinear function $\sigma_C(\cdot)$ on $\mathbb{R}^m$ as
$$\sigma_C(d) = \max\{\langle \lambda, d \rangle : \lambda \in C\}$$
for all $d \in \mathbb{R}^m$. This correspondence can be shown to be an isometric isomorphism, and many results can be derived from this equivalence. In Hiriart-Urruty and Lemaréchal [15, section V.3], for instance, one can find a comprehensive discussion of that topic. An important situation occurs when the directional derivative function $f'(\theta_0; \cdot)$ is linear, i.e. there exists a vector $D_0 \in \mathbb{R}^m$ such that $f'(\theta_0; d) = \langle D_0, d \rangle$ for all $d \in \mathbb{R}^m$. In that case $f$ is said to be Gâteaux-differentiable at $\theta_0$. A stronger concept is that of Fréchet differentiability: $f$ is said to be Fréchet-differentiable at $\theta_0$ if there exists a vector $D_0 \in \mathbb{R}^m$ such that
$$\lim_{d \to 0} |f(\theta_0 + d) - f(\theta_0) - \langle D_0, d \rangle| / \|d\| = 0.$$
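The correspondence between compact convex sets and their support functions described above is easy to illustrate numerically: for a polytope $C$, the maximum defining $\sigma_C(d)$ is attained at an extreme point, so it suffices to maximize $\langle \lambda, d \rangle$ over the vertices. A small sketch with the hypothetical set $C = [-1, 1]^2$, whose support function is $\psi(d) = |d_1| + |d_2|$:

```python
def support(vertices, d):
    """Support function sigma_C(d) = max over lambda in C of <lambda, d>.
    For a polytope it suffices to maximize over the extreme points."""
    return max(sum(l * x for l, x in zip(lam, d)) for lam in vertices)

# C = [-1, 1]^2, given by its four vertices
C = [(1, 1), (1, -1), (-1, 1), (-1, -1)]

print(support(C, (1, 0)))   # 1 = |1| + |0|
print(support(C, (1, 1)))   # 2
print(support(C, (-3, 4)))  # 7 = |-3| + |4|
```

Note that the values agree with $\psi(d) = |d_1| + |d_2|$ and scale positively homogeneously in $d$, as a sublinear function must.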
It is easy to see that Fréchet-differentiability implies Gâteaux-differentiability. The converse is true if $f$ is locally Lipschitz, i.e. if there exists a positive constant $M$ such that
$$|f(\theta_1) - f(\theta_2)| \le M \|\theta_1 - \theta_2\|$$
for all $\theta_1, \theta_2$ in a neighborhood of $\theta_0$. If $f = (f^1, \ldots, f^K)$ is a mapping from $\Theta$ to $\mathbb{R}^K$ we say that $f$ is directionally (resp. Gâteaux, Fréchet) differentiable at $\theta_0$ if each $f^i$ is directionally (resp. Gâteaux, Fréchet) differentiable at $\theta_0$, $i = 1, \ldots, K$. For a comprehensive discussion of these concepts, including the generalization to infinite-dimensional spaces, see Shapiro [35] and references therein. It is clear from the above discussion that the set of directionally differentiable functions includes the Gâteaux-differentiable functions. Another important class included in that category is that of convex continuous functions. More generally, functions that can be written as a difference of convex continuous functions are directionally differentiable. Notice also that composite functions of the form $f(\theta) = g(A(\theta))$, where $g(\cdot)$ is convex continuous and $A : \mathbb{R}^m \to \mathbb{R}^k$ is a Gâteaux-differentiable mapping, are directionally differentiable. The ideas discussed so far suggest that a fairly general setting is obtained if we assume that the underlying functions are directionally differentiable. The goal is then to estimate the directional derivatives of the expected value function $W_\infty(\theta) = \mathbb{E}[X_\infty(\theta)]$ at $\theta_0$, by using the directional derivatives of the process $\{X_n(\theta)\}$. In section 5 we specialize our results to the subdifferentiable case. In studying estimation procedures for the directional derivatives $W_\infty'(\theta_0; d)$, it will be useful to consider the function $W_\infty'(\theta_0; \cdot)$ rather than specific values of $d$. We proceed now to define the directional derivative process $\{X_n'(\theta_0; \cdot)\}$ on an appropriate space. We can now state the second assumption:
Assumption B1: For each $n = 0, 1, 2, \ldots$ and $P$-almost all $\omega \in \Omega$, the sample-path functions $X_n(\cdot, \omega)$ are directionally differentiable at $\theta_0$.

Assumption B1 guarantees, of course, that we can actually take the directional derivatives pathwise. As pointed out before, a fairly large class of functions satisfies that property. Observe that the state space of each $X_n^{k\prime}(\theta_0; \cdot)$, $k = 1, \ldots, K$, is the space $H^+(\mathbb{R}^m)$ of positively homogeneous continuous functions defined on $\mathbb{R}^m$. This space is clearly isomorphic to the space $C(S^{m-1})$ of continuous functions whose domain is the unit sphere $S^{m-1}$ of $\mathbb{R}^m$ (i.e. the set $\{d \in \mathbb{R}^m : \|d\| = 1\}$), since we can always write $f(d) = f(\|d\| \frac{d}{\|d\|}) = \|d\| f(\frac{d}{\|d\|})$ for $d \ne 0$. Endow $C(S^{m-1})$ and $H^+(\mathbb{R}^m)$ with the sup-norm
$$\|\phi\| = \max_{d \in S^{m-1}} |\phi(d)|, \qquad (2.5)$$
and let $\mathcal{B}_H$ denote the corresponding collection of Borel sets in $H^+(\mathbb{R}^m)$. Now, $(H^+(\mathbb{R}^m), \mathcal{B}_H)$ is a measurable space and thus, in order to show that each $X_n^{k\prime}(\theta_0; \cdot)$ is well-defined as a random variable with range in $H^+(\mathbb{R}^m)$, we must show that $X_n^{k\prime}(\theta_0; \cdot)$ is $\mathcal{F}$-$\mathcal{B}_H$ measurable, i.e. that the inverse image of a set in $\mathcal{B}_H$ belongs to the $\sigma$-field $\mathcal{F}$. The lemma below shows that this is actually the case.

Lemma 2.1 Let $n \in \{0, 1, \ldots\}$ and $k \in \{1, \ldots, K\}$ be arbitrary. Then, under assumption B1, the function $\varphi : \Omega \to H^+(\mathbb{R}^m)$, defined by $\varphi(\omega)(\cdot) = X_n^{k\prime}(\theta_0; \cdot, \omega)$, is $\mathcal{F}$-$\mathcal{B}_H$ measurable.
Proof: Let $\varphi_\omega(\cdot)$ denote the function $\varphi(\omega)(\cdot)$. In order to show that $\varphi$ is $\mathcal{F}$-$\mathcal{B}_H$ measurable, it is sufficient to show that, given any function $\phi_0 \in H^+(\mathbb{R}^m)$ and any $\varepsilon > 0$, the set $A_{\phi_0} = \{\omega \in \Omega : \|\varphi_\omega - \phi_0\| \le \varepsilon\}$ is $\mathcal{F}$-measurable. Notice that
$$A_{\phi_0} = \{\omega \in \Omega : |\varphi_\omega(d) - \phi_0(d)| \le \varepsilon \text{ for all } d \in S^{m-1}\} = \bigcap_{d \in S^{m-1}} \{\omega \in \Omega : |\varphi_\omega(d) - \phi_0(d)| \le \varepsilon\}. \qquad (2.6)$$
Observe that, for a fixed $d \in S^{m-1}$, the set $\{\omega \in \Omega : |\varphi_\omega(d) - \phi_0(d)| \le \varepsilon\}$ is $\mathcal{F}$-measurable. To see this, notice that $\varphi_\omega(d)$ is defined by the limit of the difference of two measurable functions (see the definition (2.3) of the directional derivative) and thus $\varphi_\omega(d)$ is measurable in $\omega$. Next, let $D$ be a countable dense subset of $S^{m-1}$. Since $\varphi_\omega$ and $\phi_0$ are continuous, it follows that the intersection in (2.6) can be taken over the countable set $D$ instead of $S^{m-1}$, and hence we conclude that $A_{\phi_0}$ is $\mathcal{F}$-measurable.

Let $H_K^+(\mathbb{R}^m)$ be the cartesian product $H_K^+(\mathbb{R}^m) = H^+(\mathbb{R}^m) \times \cdots \times H^+(\mathbb{R}^m)$ ($K$ times). Let $\mathcal{B}_{H_K}$ denote the corresponding collection of Borel sets of $H_K^+(\mathbb{R}^m)$. Notice that, because of the isomorphism between the metric spaces $(C(S^{m-1}), \|\cdot\|)$ and $(H^+(\mathbb{R}^m), \|\cdot\|)$, it follows that $H^+(\mathbb{R}^m)$ is a separable Banach space since so is $C(S^{m-1})$ (see e.g. Royden [33]), and hence $H_K^+(\mathbb{R}^m)$ is also a separable Banach space. It makes sense then to study the regeneration (or more generally, Harris recurrence) of $\{X_n'(\theta_0; \cdot)\} := \{(X_n^{1\prime}(\theta_0; \cdot), \ldots, X_n^{K\prime}(\theta_0; \cdot))\}$ as a process on $(H_K^+(\mathbb{R}^m), \mathcal{B}_{H_K})$ (see Revuz [28] or Nummelin [25] for details on the latter topic). Note also that $\{X_n'(\theta_0; \cdot)\}$ is a Markov process with respect to the filtrations generated by $\{X_n(\theta)\}$. Consider now an arbitrary random variable $\varphi$ on $(H^+(\mathbb{R}^m), \mathcal{B}_H)$. As seen above, $\varphi$ takes on values in the separable Banach space $H^+(\mathbb{R}^m)$, so it is clear that the standard Lebesgue integral cannot be used to compute the expected value $\mathbb{E}[\varphi]$. Nevertheless we can resort to the so-called Bochner integral, which is in a sense an extension of the Lebesgue integral from the real line to a Banach space. Concepts and main results about the Bochner integral can be found for instance in Diestel and Uhl [8] or Kelley and Srinivasan [21]. For now we need the following definitions (cf. [8, pp. 41, 45]):

Definition: A function $\psi : \Omega \to H^+(\mathbb{R}^m)$ is called simple if there exist $f_1, \ldots, f_N \in H^+(\mathbb{R}^m)$ and $E_1, \ldots, E_N \in \mathcal{F}$ such that $\psi(\omega) = \sum_{i=1}^N f_i 1_{E_i}(\omega)$.

Definition: A function $\varphi : \Omega \to H^+(\mathbb{R}^m)$ is called strongly measurable if there exists a sequence of simple functions $\{\psi_n\}$ such that $\lim_{n \to \infty} \|\varphi(\omega) - \psi_n(\omega)\| = 0$ for $P$-almost all $\omega$.

Definition: A strongly measurable function $\varphi : \Omega \to H^+(\mathbb{R}^m)$ is called Bochner integrable if $\mathbb{E}[\|\varphi\|] < \infty$.
The importance of strongly measurable integrable functions lies in the fact that for this class of functions the Bochner integral is defined in a natural way. If $\psi = \sum_{i=1}^N f_i 1_{E_i}$ is a simple function, its Bochner integral is defined as $\int_\Omega \psi(\omega) P(d\omega) = \sum_{i=1}^N f_i P(E_i)$. Now, let $\varphi$ be a strongly measurable integrable function and let $\{\psi_n\}$ be a sequence of simple functions converging to $\varphi$. Then, $\lim_n \int \psi_n$ exists (see e.g. [8, p. 45]) and hence we can define
$$\int_\Omega \varphi(\omega) P(d\omega) := \lim_{n \to \infty} \int_\Omega \psi_n(\omega) P(d\omega). \qquad (2.7)$$
It is important to note that, because of the separability of $H^+(\mathbb{R}^m)$, $\mathcal{F}$-$\mathcal{B}_H$ measurable functions are strongly measurable; for a proof of this fact, see Kelley and Srinivasan [21, Lemma 10.18] (observe that those authors use a different definition for strong measurability; the result stated above is translated in terms of the nomenclature adopted here). Hence, by Lemma 2.1 we have that, given $\theta_0 \in \Theta$, each $X_n^{k\prime}(\theta_0; \cdot)$ is strongly measurable. The following assumption will then ensure that $\mathbb{E}[X_n^{k\prime}(\theta_0; \cdot)]$ exists:

Assumption B2: For each $n = 0, 1, 2, \ldots$ and each $k = 1, \ldots, K$, the $H^+(\mathbb{R}^m)$-valued random variables $X_n^{k\prime}(\theta_0; \cdot)$ are Bochner integrable.
Of course, definition (2.7) does not tell us how to evaluate the resulting function $\int \varphi$ pointwise unless we compute the approximating simple functions, which in general cannot be easily done. The following lemma shows that we can actually evaluate the integral function at a point by computing the integrals pointwise.
Lemma 2.2 Let $\varphi$ be a Bochner integrable random variable on $(H^+(\mathbb{R}^m), \mathcal{B}_H)$, and let $\Phi = \int_\Omega \varphi(\omega)\, P(d\omega)$ be the Bochner integral of $\varphi$. Then, for each $d \in \mathbb{R}^m$ we have that
$$\Phi(d) = \int_\Omega \varphi(\omega)(d)\, P(d\omega), \qquad (2.8)$$
where the integral on the right-hand side is the standard Lebesgue integral, and $\varphi(\omega)(d)$ denotes the value of $\varphi(\omega)$ at $d$.

Proof: Let $(H^+(\mathbb{R}^m))^*$ denote the dual space of $H^+(\mathbb{R}^m)$, i.e. the space of all bounded linear functionals on $H^+(\mathbb{R}^m)$. Let $F \in (H^+(\mathbb{R}^m))^*$. From Kelley and Srinivasan [21, Lemma 10.21] we know that $F(\varphi)$ is Lebesgue integrable and
$$F(\Phi) = F\left(\int_\Omega \varphi(\omega)\, P(d\omega)\right) = \int_\Omega F(\varphi(\omega))\, P(d\omega). \qquad (2.9)$$
Next, observe that because of the isomorphism between $H^+(\mathbb{R}^m)$ and $C(S^{m-1})$ we have that $(H^+(\mathbb{R}^m))^*$ is isomorphic to the space of Borel measures on $S^{m-1}$; see for instance Royden [33, p. 358]. Hence, given a Borel measure $\mu$ on $S^{m-1}$ there is a unique $F \in (H^+(\mathbb{R}^m))^*$ such that
$$F(\phi) = \int_{S^{m-1}} \phi\, d\mu$$
for all $\phi \in H^+(\mathbb{R}^m)$. For each $d \in S^{m-1}$, let $\mu_d$ denote the atom measure with mass one at $d$, and let $F_d$ denote the corresponding linear functional. Clearly, we have that, for any $\phi \in H^+(\mathbb{R}^m)$,
$$F_d(\phi) = \int_{S^{m-1}} \phi\, d\mu_d = \phi(d),$$
and by substituting $F_d$ into equation (2.9) it follows that
$$\Phi(d) = F_d(\Phi) = \int_\Omega F_d(\varphi(\omega))\, P(d\omega) = \int_\Omega \varphi(\omega)(d)\, P(d\omega),$$
as stated.
Notice that the above lemma also implies that if $\varphi$ is a Bochner integrable random variable on $(H^+(\mathbb{R}^m), \mathcal{B}_H)$, then each real-valued random variable $\varphi(\omega)(d)$, $d \in \mathbb{R}^m$, is (Lebesgue) integrable.
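Lemma 2.2 is what makes the function-valued expectation computable in practice: the Bochner integral can be evaluated one direction at a time as an ordinary (Lebesgue) average, e.g. by Monte Carlo. A minimal sketch with a hypothetical random positively homogeneous function $\varphi_\omega(d) = \max(A_\omega d_1, B_\omega d_2)$ on $\mathbb{R}^2$; the names and distributions below are illustrative assumptions only.

```python
import random

def sample_phi(rng):
    """Draw a random positively homogeneous continuous function on R^2:
    phi_omega(d) = max(A d1, B d2) with A ~ U(0,1), B ~ U(0,2)."""
    a, b = rng.uniform(0, 1), rng.uniform(0, 2)
    return lambda d: max(a * d[0], b * d[1])

def bochner_pointwise(n, d, seed=0):
    """By Lemma 2.2, (E[phi])(d) = E[phi(omega)(d)]: the function-valued
    expectation evaluated at d is a plain scalar average over omega."""
    rng = random.Random(seed)
    return sum(sample_phi(rng)(d) for _ in range(n)) / n

# Each evaluation of the integral function is an ordinary Monte Carlo mean:
for d in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]:
    print(d, bochner_pointwise(100_000, d))
```

For $d = (1, 0)$ the exact value is $\mathbb{E}[A] = 0.5$ and for $d = (0, 1)$ it is $\mathbb{E}[B] = 1.0$, so the printed averages should be close to those numbers.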
3 Local conditions for regeneration

As remarked in section 1, our objective is to study the regeneration of subdifferentials (or, more generally, directional derivatives) based on the regeneration of the original process. A much simpler but fundamental question hidden under that is: suppose $Y(\theta)$ is a random function of a one-dimensional parameter $\theta$. How does the distribution of $\partial Y(\theta_0)$ for some $\theta_0 \in \Theta$ relate to the distribution of the $Y(\theta)$'s? Let us consider initially the differentiable case. Suppose there is a set $A$ such that $P(A) = 1$ and $Y(\cdot, \omega)$ is differentiable at $\theta_0$ for all $\omega \in A$. For simplicity, suppose that $Y(\theta_0)$ is constant, i.e. $Y(\theta_0, \omega) = y_0$ for some $y_0$ and all $\omega$. Then we have that, for $\omega \in A$, $\partial Y(\theta_0) = \{Y'(\theta_0)\}$ and
$$Y'(\theta_0, \omega) = \lim_{h \to 0} \frac{Y(\theta_0 + h, \omega) - Y(\theta_0)}{h}.$$
Therefore, since convergence w.p.1 implies convergence in distribution, it follows that the distribution of $Y'(\theta_0)$ is determined if one knows the distribution of each $Y(\theta_0 + h)$ for $0 < h < \varepsilon$, where $\varepsilon$ is any positive fixed number. In the nondifferentiable case, however, more is needed, as can be verified from the following simple example. Let $\Omega = \{\omega_1, \omega_2, \omega_3, \omega_4\}$ with $P(\omega_i) = 1/4$, $i = 1, \ldots, 4$. Let $Y_1(\theta)$ and $Y_2(\theta)$ be random variables defined for each $\theta \in \Theta$ in the following way:
$$Y_1(\theta; \omega_1) = |\theta|, \quad Y_2(\theta; \omega_1) = \theta^+$$
$$Y_1(\theta; \omega_2) = |\theta|, \quad Y_2(\theta; \omega_2) = \theta^-$$
$$Y_1(\theta; \omega_3) = 0, \quad Y_2(\theta; \omega_3) = \theta^+$$
$$Y_1(\theta; \omega_4) = 0, \quad Y_2(\theta; \omega_4) = \theta^-$$
where $\theta^+ = \max(\theta, 0)$ and $\theta^- = \max(-\theta, 0)$. It is clear that $Y_1(\theta)$ and $Y_2(\theta)$ are independent for each $\theta$. Notice that at $\theta_0 := 0$ we have $Y_1(0) = Y_2(0) = 0$ w.p.1 and, at $\theta \ne 0$ and $a \in \mathbb{R}$, we have that
$$P(Y_1(\theta) = a) = P(Y_2(\theta) = a) = \begin{cases} 1/2 & \text{if } a = |\theta| \text{ or } a = 0 \\ 0 & \text{otherwise,} \end{cases}$$
so $Y_1(\theta)$ and $Y_2(\theta)$ have the same distribution for any fixed $\theta$. Furthermore, at any $\theta \ne 0$, $Y_1(\theta)$ and $Y_2(\theta)$ are differentiable w.p.1 and
$$P(Y_1'(\theta) = a) = P(Y_2'(\theta) = a) = \begin{cases} 1/2 & \text{if } a = \mathrm{sign}(\theta) \text{ or } a = 0 \\ 0 & \text{otherwise.} \end{cases}$$
However, at $\theta_0 = 0$ we have
$$P(\partial Y_1(0) = C) = \begin{cases} 1/2 & \text{if } C = [-1, 1] \text{ or } C = \{0\} \\ 0 & \text{otherwise} \end{cases}$$
and
$$P(\partial Y_2(0) = C) = \begin{cases} 1/2 & \text{if } C = [-1, 0] \text{ or } C = [0, 1] \\ 0 & \text{otherwise,} \end{cases}$$
so $\partial Y_1(0)$ and $\partial Y_2(0)$ do not have the same distribution. Note that $Y_1$ is nondifferentiable at zero with probability 1/2, whereas $Y_2$ is nondifferentiable at zero with probability 1. The above example illustrates the nature of the problem. At a point where the functions are differentiable, the subdifferential is just a singleton and hence it is determined by any of the one-sided derivatives (i.e. the directional derivative at $\theta_0$ in the direction $+1$ or $-1$). At the nondifferentiable point $\theta_0 = 0$, on the contrary, the subdifferential is given by the set whose support function is the directional derivative at $\theta_0$, in this case the convex hull of the left- and right-side derivatives, so its distribution depends on the joint distribution of $Y_i'(\theta_0; 1)$ and $Y_i'(\theta_0; -1)$. In summary, knowledge of the distribution of each $Y(\theta)$ is not sufficient to determine the distribution of $\partial Y(\theta_0)$. On the other hand, of course, the law of $\partial Y(\theta_0)$ can be computed if one knows the distribution of the random function $Y(\cdot)$, that is, all the finite-dimensional distributions $(Y(\theta_1), Y(\theta_2), \ldots, Y(\theta_l))$, where $l$ and $\theta_1, \ldots, \theta_l$ are arbitrary. The preceding discussion suggests that, in order to obtain conditions for regeneration of subdifferentials, one should impose conditions on the random functions $X_n(\cdot, \omega)$ rather than on each $X_n(\theta, \omega)$. This however appears to be quite restrictive, since regeneration of the whole functions is unlikely to occur in real situations. A more flexible alternative is to study the behavior of the process $\{X_n(\theta)\}$ on (random) neighborhoods of the fixed point $\theta_0$.
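Since $\Omega$ has only four points, the $Y_1$/$Y_2$ example above can be checked by direct enumeration. A small sketch follows; the subdifferentials at 0 are entered analytically ($\partial |\theta|(0) = [-1, 1]$, $\partial \theta^+(0) = [0, 1]$, $\partial \theta^-(0) = [-1, 0]$, and $\{0\}$ for the zero function), with intervals encoded as endpoint pairs:

```python
from collections import Counter

OMEGA = [1, 2, 3, 4]          # four equiprobable sample points

def Y1(theta, w):
    return abs(theta) if w in (1, 2) else 0.0

def Y2(theta, w):
    return max(theta, 0.0) if w in (1, 3) else max(-theta, 0.0)

# Same marginal distribution at every fixed theta ...
for theta in (0.7, -1.3):
    assert Counter(Y1(theta, w) for w in OMEGA) == \
           Counter(Y2(theta, w) for w in OMEGA)

# ... but different distributions of the subdifferential at theta_0 = 0.
# Subdifferentials at 0 per omega, read off analytically (as (lo, hi)):
sub1 = {1: (-1, 1), 2: (-1, 1), 3: (0, 0), 4: (0, 0)}   # for Y1
sub2 = {1: (0, 1), 2: (-1, 0), 3: (0, 1), 4: (-1, 0)}   # for Y2

print(Counter(sub1.values()))   # [-1,1] and {0}, each with probability 1/2
print(Counter(sub2.values()))   # [0,1] and [-1,0], each with probability 1/2
```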
The key (although obvious) observation here is that the knowledge of a directionally differentiable function $f$ on any neighborhood of $\theta_0$ is sufficient to compute the directional derivative function $f'(\theta_0; \cdot)$, as can be verified directly from definition (2.3). The lemma below establishes the connection between the distribution of the directional derivative and the distribution of the function on a fixed (i.e. independent of $\omega$) neighborhood of $\theta_0$.

Lemma 3.1 Let $V$ be a neighborhood of $\theta_0$, and let $Y(\cdot)$ be a real-valued directionally differentiable random function on $\Theta$. Then, the distribution of $Y'(\theta_0; \cdot)$ is determined by the distribution of $Y|_V(\cdot)$ (the restriction of $Y$ to $V$) on cylinder sets, that is, by the elements of the form
$$P(Y(\theta_1) \in B_1, \ldots, Y(\theta_k) \in B_k),$$
for all $k \ge 1$, all $\theta_1, \ldots, \theta_k \in V$ and all Borel sets $B_1, \ldots, B_k$ of the real line.
Proof: Notice first that the distribution of the random function $Y'(\theta_0; \cdot)$ is given by its distribution on cylinder sets, that is, by
$$P(Y'(\theta_0; d_1) \in A_1, Y'(\theta_0; d_2) \in A_2, \ldots, Y'(\theta_0; d_l) \in A_l)$$
for all $l$ and all $d_1, \ldots, d_l$, where $A_1, \ldots, A_l$ are Borel sets of the real line. Fix then $d_1, \ldots, d_l$ and, for each $t > 0$ and each $j = 1, \ldots, l$, let $W_{d_j}(t)$ denote the random variable
$$W_{d_j}(t) = \frac{Y(\theta_0 + t d_j) - Y(\theta_0)}{t}.$$
By assumption, the joint distribution of $Y(\theta_0 + t d_1), \ldots, Y(\theta_0 + t d_l)$ and $Y(\theta_0)$ is known for $t$ small enough. It follows that
$$P(W_{d_1}(t) \in A_1, \ldots, W_{d_l}(t) \in A_l)$$
can be computed for $t$ small enough. Moreover, it is clear that $W_{d_j}(t) \to Y'(\theta_0; d_j)$ w.p.1 as $t \to 0$ for each $j = 1, \ldots, l$. Therefore, we conclude that
$$P(Y'(\theta_0; d_1) \in A_1, \ldots, Y'(\theta_0; d_l) \in A_l) = \lim_{t \downarrow 0} P(W_{d_1}(t) \in A_1, \ldots, W_{d_l}(t) \in A_l),$$
so the left-hand side of the above equation can also be computed from the given data.

Clearly, the above lemma implies that, if there exists a neighborhood $V$ of $\theta_0$ such that the restricted functions $X_n|_V(\cdot)$ regenerate, then $X_n'(\theta_0; \cdot)$ regenerates at the same epochs. We would like however to have a less strict condition, since the sample paths $X_n(\cdot, \omega)$, $\omega \in \Omega$, can be quite different from each other, so it is unlikely that they have the same properties around a fixed neighborhood. Similarly, the functions $X_n(\cdot, \omega)$ and $X_m(\cdot, \omega)$ may have different behaviors for $n \ne m$. In other words, we want to allow the neighborhood $V$ to depend on $n$ and $\omega$, and still ensure some kind of regeneration. Assumption A2 below is a step in that direction. Recall that, by assumption A1, $\{\beta_m\}$ are regeneration points of $X_n(\theta_0)$. We now extend that assumption and state the main result.
Assumption A2: There exist $K$-dimensional vector-valued directionally differentiable random functions $\{F_n(\cdot)\}$, $n \ge 0$, which are identically distributed (i.e. have the same finite-dimensional distributions) and, for $P$-almost all $\omega \in \Omega$, neighborhoods $\{V_n(\omega) = (V_n^1(\omega), \ldots, V_n^K(\omega))\}$ of $\theta_0$ such that, for all $m, n \ge 0$, on the event $\{\beta_m = n\}$ the restricted functions $X_n|_{V_n(\omega)}(\cdot, \omega)$ and $F_n|_{V_n(\omega)}(\cdot, \omega)$ coincide.

Verification of assumption A2 must be done on a case-by-case basis. In section 6 we shall see some typical examples of ways to check this condition.
Theorem 3.1 Suppose that Assumptions A1, B1 and A2 hold. Then, the process $\{(X_n(\theta_0), X_n'(\theta_0; \cdot))\}$ regenerates at the epochs $\{\beta_m\}$, $m \ge 0$.
Proof: Let $m \ge 0$. By assumption A2, on the event $\{\beta_m = n\}$ we have $(X_n|_{V_n})'(\theta_0; \cdot) = (F_n|_{V_n})'(\theta_0; \cdot)$ and thus
$$X_n'(\theta_0; \cdot) = (X_n|_{V_n})'(\theta_0; \cdot) = (F_n|_{V_n})'(\theta_0; \cdot) = F_n'(\theta_0; \cdot),$$
where the above equalities are understood to hold for $P$-almost all $\omega \in \Omega$. But from Lemma 3.1 we have that $F_n'(\theta_0; \cdot) \overset{d}{=} F_1'(\theta_0; \cdot)$, since $F_n(\cdot) \overset{d}{=} F_1(\cdot)$. It follows that $(X_{\beta_m}(\theta_0), X_{\beta_m}'(\theta_0; \cdot))$ is independent of $\tau_1, \ldots, \tau_{m-1}$ and its distribution does not depend on $m$, so the process $\{(X_n(\theta_0), X_n'(\theta_0; \cdot))\}$ regenerates at $\beta_m$.

A particular situation occurs when the original process $X(\theta)$ satisfies a recursion of the type
$$X_{n+1}(\theta) = \phi(X_n(\theta), U_n(\theta)), \qquad (3.1)$$
where the $\{U_n(\theta)\}$, for any fixed $\theta$, are iid random vectors independent of $X_0(\theta)$. In those cases we can often obtain an explicit regenerative structure. In Glasserman [10], it is assumed that the original process $\{X_n(\theta_0)\}$ has the following structure: there is an integer $r \ge 1$, a recurrent set $B \subseteq \mathbb{R}^K$, subsets $A_1, \ldots, A_r$ of $\mathbb{R}^K$ and a function $h : \mathbb{R}^{rK} \to \mathbb{R}^K$ such that, if $X_n(\theta_0) \in B$ and $U_{n+i}(\theta_0) \in A_i$, $i = 1, \ldots, r$, then $X_{n+r}(\theta_0) = h(U_n(\theta_0), \ldots, U_{n+r}(\theta_0))$. This condition can be viewed as a more explicit version of the splitting condition for Harris recurrent chains (see e.g. Asmussen [1]), i.e. whenever the process reaches a set $B$ it regenerates $r$ steps later with probability $p > 0$; here, $p = \prod_{i=1}^r p_i$, where $p_i = P(U_1(\theta_0) \in A_i)$. Under this condition, assuming that $B$ and the $A_i$ are open and some other differentiability assumptions, Glasserman proves that the process $\{(X_n(\theta_0), X_n'(\theta_0))\}$ is regenerative. We refer again to [10] for details. Notice that the assumption that $B$ and the $A_i$ are open is actually equivalent to assuming pointwise regeneration of $X(\theta)$ on neighborhoods of $\theta_0$. By strengthening the assumption on the input functions $U_n(\cdot)$, Theorem 3.6 in [10] can be modified as follows:

Theorem 3.2 Assume A1 and B1, and suppose there is an integer $r \ge 1$, a recurrent set $B \subseteq \mathbb{R}^K$, subsets $A_1, \ldots, A_r$ of $\mathbb{R}^K$, a Fréchet-differentiable function $h : \mathbb{R}^{rK} \to \mathbb{R}^K$ and neighborhoods $\{V_n(\omega) = (V_n^1(\omega), \ldots, V_n^K(\omega))\}$ of $\theta_0$ such that if
$$X_n(\theta_0, \omega) \in B \quad \text{and} \quad U_{n+i}(\theta_0, \omega) \in A_i, \quad i = 1, \ldots, r, \qquad (3.2)$$
then $X_{n+r}(\cdot, \omega) = h(U_n(\cdot, \omega), \ldots, U_{n+r}(\cdot, \omega))$ on $V_{n+r}(\omega)$. Suppose further that the input functions $\{U_n(\cdot)\}$, $n \ge 0$, are independent and identically distributed, and pathwise directionally differentiable. Then, $\{(X_n(\theta_0), X_n'(\theta_0; \cdot))\}$ is regenerative.

Proof: Let $\{\beta_j, j \ge 1\}$ be the regeneration epochs
$$\beta_j = \inf\{q > \beta_{j-1} : X_{q-r}(\theta_0) \in B \text{ and } U_{q-r+i}(\theta_0) \in A_i,\ i = 1, \ldots, r\}.$$
On $\{\beta_j = q\}$ we have that $X_q(\cdot, \omega) = h(U_{q-r}(\cdot, \omega), \ldots, U_q(\cdot, \omega))$ on each neighborhood $V_q(\omega)$. Now, let $F_q$ be a random function from $\Theta$ to $\mathbb{R}^K$ defined by $F_q(\cdot) =$
h(Uq?r (); : : :; Uq ()). It can be shown that F is pathwise directionally dierentiable, since each Un() is directionally dierentiable and h is Frechet-dierentiable (see Shapiro [35, Prop. 3.6]). Moreover, the Fq (); q 0, are identically distributed. It follows that assumption A2 is satis ed and hence the result follows from Theorem 3.1.
4 Obtaining consistent derivative estimators

One of the most useful applications of regenerative processes is the estimation of steady-state quantities by ratio-type formulas like (2.1). Indeed, from a simulation standpoint, by using the regenerative structure one avoids the "warm-up" periods typically necessary when computing time-averages like (2.2); furthermore, variances, and consequently confidence intervals, can be constructed by standard application of the Central Limit Theorem. In many systems, however, the regenerative cycles can be extremely long, which makes the use of ratio formulas infeasible from a practical point of view. Nevertheless, even in such cases regeneration plays an important role, namely that of ensuring the existence of a steady-state under mild assumptions. In this section we discuss these issues with respect to the directional derivatives. As we shall see, the assumptions of the previous sections, together with some regularity conditions, guarantee that the directional derivatives of the expected value function can be expressed both as ratio-type and as time-average formulas.

Suppose that the processes $\{X_n(\theta)\}$ (for all $\theta$ in a neighborhood of $\theta_0$) and $\{X_n'(\theta_0, \cdot)\}$ are regenerative with the same regeneration epochs, and suppose that the iid cycle times $\{\tau_n\}$ are such that $\mathbb{E}\tau_1 < \infty$ and $\tau_1$ has a non-lattice distribution, so a limiting distribution exists. In order to simplify the discussion, we assume throughout this section that $K = 1$, i.e. $X_n(\theta)$ is a scalar, and that assumption B2 holds, so $X_n'(\theta_0, \cdot)$ is Bochner integrable. Let $W_n(\theta) = \mathbb{E}[X_n(\theta)]$, $0 \le n \le \infty$, where $X_\infty(\theta)$ denotes the weak limit of $X_n(\theta)$. Recall that our main goal is to estimate the steady-state directional derivative $W_\infty'(\theta_0, \cdot)$, provided it exists. A direct application of the ratio formula (2.1) gives us
\[ W_\infty(\theta) = \mathbb{E}[X_\infty(\theta)] = \frac{\mathbb{E}\left[\sum_{n=0}^{\tau_1 - 1} X_n(\theta)\right]}{\mathbb{E}\tau_1} \tag{4.1} \]
and
\[ \mathbb{E}[\Delta_0(\cdot)] = \frac{\mathbb{E}\left[\sum_{n=0}^{\tau_1 - 1} X_n'(\theta_0, \cdot)\right]}{\mathbb{E}\tau_1}, \tag{4.2} \]
where $\Delta_0(\cdot)$ is the weak limit of $X_n'(\theta_0, \cdot)$, and the expected value in (4.2) is understood as the Bochner integral (see section 2). Notice, however, that neither of the above formulas seems appropriate for computing $W_\infty'(\theta_0, \cdot)$: differentiation of (4.1) would require differentiating $\mathbb{E}\tau_1$ (which, despite the notation, is also a function of $\theta$), whereas in (4.2) it does not necessarily hold that $\Delta_0(\cdot) = X_\infty'(\theta_0, \cdot)$; and even when it does, we still have to be concerned about interchanging the expectation and the differentiation operators. Indeed, it is not even clear whether $X_\infty$ is directionally differentiable. These difficulties do not arise from the non-smoothness of the functions; they exist in the differentiable case as well, as pointed out by Glasserman [10, section 5]. In order to overcome these problems, we shall impose some stronger conditions. The goal of assumption A3 below is to ensure that the regeneration epochs are constant in a neighborhood of $\theta_0$ with some probability. Assumption A4, in turn, imposes independence of the original process between cycles; this assumption is sometimes called "classical regeneration" in the literature. Assumption B3 is used to ensure that the interchange of limits and expectations is valid.
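As an aside, the mechanics of a ratio-type formula such as (4.1) are easy to exercise numerically. The sketch below uses a toy regenerative Markov chain invented purely for illustration (the chain and its transition rule are not from the text) and checks that the regenerative ratio estimator agrees exactly with the time-average computed over the same complete cycles.

```python
import random

def ratio_estimator(costs, regen_epochs):
    """Regenerative ratio estimate of a steady-state mean, in the spirit of
    (2.1)/(4.1): (total cost over complete cycles) / (total length of cycles)."""
    start, end = regen_epochs[0], regen_epochs[-1]
    return sum(costs[start:end]) / (end - start)

random.seed(0)
# Toy regenerative chain on {0, 1, 2}: it regenerates whenever it hits state 0.
state, path = 0, []
for _ in range(10_000):
    path.append(state)
    state = 0 if random.random() < 0.3 else min(state + 1, 2)

epochs = [n for n, s in enumerate(path) if s == 0]   # regeneration epochs
est = ratio_estimator(path, epochs)

# The time-average (2.2) truncated at the last regeneration epoch coincides
# with the ratio estimator, since both use exactly the same complete cycles.
time_avg = sum(path[epochs[0]:epochs[-1]]) / (epochs[-1] - epochs[0])
assert abs(est - time_avg) < 1e-12
```

The same agreement is what fails for derivatives without further conditions, which is the point of the assumptions introduced next.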
Assumption A3: Assumption A2 holds and, furthermore, the neighborhoods $\{V_n(\omega)\}$ of $\theta_0$ are such that, for any $\epsilon > 0$, there exist a set $B_\epsilon \in \mathcal{F}$ with $P(B_\epsilon) > 1 - \epsilon$ and a neighborhood $V_\epsilon$ of $\theta_0$ such that $V_\epsilon \subset \bigcap_{\omega \in B_\epsilon} \bigcap_{n=0}^{\infty} V_n(\omega)$.

Assumption A4: For each $j \ge 1$, the post-$\beta_j$ process $\{X_{\beta_j + k}(\cdot), \; k = 0, 1, \ldots\}$ is independent of the pre-$\beta_j$ process.

Assumption B3: Either each sample-path function $X_n(\cdot, \omega)$, $n = 0, 1, 2, \ldots$, is convex for $P$-almost all $\omega \in \Omega$, or each $X_n(\cdot, \omega)$ is almost surely locally Lipschitz at $\theta_0$ with constant $M_n(\omega)$, that is,
\[ |X_n(\theta_1, \omega) - X_n(\theta_2, \omega)| \le M_n(\omega)\,|\theta_1 - \theta_2| \quad \text{for all } \theta_1, \theta_2 \in V_n(\omega), \]
for $P$-almost all $\omega \in \Omega$. In the latter case, $\{M_n\}$ is independent of $\{X_n\}$ and there exists a constant $\gamma$ such that $\mathbb{E}M_n \le \gamma$ for all $n$.

As with assumption A2, verification of assumption A3 may involve a thorough study of the system. However, as we shall see in section 6, typically that reduces to showing that the family of functions $X_n(\cdot, \omega)$ is equicontinuous for all $n$ and all $\omega$ in some set of arbitrarily high probability, which can be accomplished by exploiting the structure of the system. Assumption A4, on the other hand, is satisfied in most regenerative systems. Assumption B3 covers the strong stochastic convexity assumption usually found in the literature (see e.g. Hu [18], Robinson [29]) as well as an alternative condition in case convexity is not present.
Theorem 4.1 Suppose that assumptions B1-B3 and A3-A4 hold. Also, assume that the iid cycle times $\{\tau_n\}$ are such that $\mathbb{E}\tau_1 < \infty$ and $\tau_1$ has a non-lattice distribution. Then the expected value function $W_\infty(\theta) = \mathbb{E}[X_\infty(\theta)]$ is directionally differentiable at $\theta_0$ and
\[ W_\infty'(\theta_0, \cdot) = \mathbb{E}[\Delta_0(\cdot)] = \frac{\mathbb{E}\left[\sum_{n=0}^{\tau_1 - 1} X_n'(\theta_0, \cdot)\right]}{\mathbb{E}\tau_1}, \]
where, as above, $\Delta_0(\cdot)$ is the weak limit of $X_n'(\theta_0, \cdot)$, and the expectation corresponds to the Bochner integral.
Proof: Fix $\epsilon > 0$, and let $V_\epsilon$ be the neighborhood of $\theta_0$ given by assumption A3. We will show initially that, for any $\theta \in V_\epsilon$, $\{X_n(\theta)\}$ regenerates at a subset of the regeneration epochs of $\{X_n(\theta_0)\}$. Let $\kappa$ denote the random variable
\[ \kappa := \inf\{k \ge 1 : X_{\beta_k}|_{V_\epsilon} = F_{\beta_k}|_{V_\epsilon}\}, \]
that is, the $\kappa$-th regeneration point is the first one at which $X$ coincides with $F$ on $V_\epsilon$. Thus, by assumption A4 we have that $\kappa$ has a geometric distribution with parameter
\[ q_\epsilon = P(\kappa = 1) = P\left(X_{\beta_1}|_{V_\epsilon} = F_{\beta_1}|_{V_\epsilon}\right) \ge P(B_\epsilon) > 1 - \epsilon \tag{4.3} \]
(where $B_\epsilon$ is the set given in assumption A3), the inequalities arising from the fact that $\omega \in B_\epsilon \Rightarrow V_\epsilon \subset V_{\beta_1}(\omega) \Rightarrow X_{\beta_1}|_{V_\epsilon} = F_{\beta_1}|_{V_\epsilon}$, and $P(B_\epsilon) > 1 - \epsilon$ by assumption. Consider now the subset $\{\beta_{k_j}\}$ of the regeneration points such that $X_{\beta_{k_j}}|_{V_\epsilon} = F_{\beta_{k_j}}|_{V_\epsilon}$. Clearly, the length of these new cycles is distributed as
\[ \tilde{\tau} := \beta_{\kappa} = \sum_{i=1}^{\kappa} \tau_i \]
(we drop the subscript $\epsilon$ from $\kappa_\epsilon$ to ease the notation). By assumption A4 we can easily see that, for any $j \ge 1$, the event $\{\kappa \le j - 1\}$ is independent of the random variable $\tau_j$. Therefore we can apply Wald's identity (see e.g. Chung [5, p. 137]) to obtain
\[ \mathbb{E}\tilde{\tau} = \mathbb{E}[\kappa]\,\mathbb{E}[\tau_1] = \frac{\mathbb{E}\tau_1}{q_\epsilon} < \infty. \]
We conclude that, for any $\theta \in V_\epsilon$, the process $\{X_n(\theta)\}$ regenerates at a subset of the regeneration epochs of $\{X_n(\theta_0)\}$. Hence, for any sufficiently small $t > 0$ the ratio formula applies to $\theta_0 + td$ as well, and therefore
\[ \lim_{t\downarrow 0} \frac{W_\infty(\theta_0 + td) - W_\infty(\theta_0)}{t} = \lim_{t\downarrow 0} \frac{1}{t}\left( \frac{\mathbb{E}\left[\sum_{n=0}^{\tilde{\tau}-1} X_n(\theta_0 + td)\right]}{\mathbb{E}\tilde{\tau}} - \frac{\mathbb{E}\left[\sum_{n=0}^{\tilde{\tau}-1} X_n(\theta_0)\right]}{\mathbb{E}\tilde{\tau}} \right) = \frac{1}{\mathbb{E}\tilde{\tau}}\, \mathbb{E}\left[\lim_{t\downarrow 0} \sum_{n=0}^{\tilde{\tau}-1} \frac{X_n(\theta_0 + td) - X_n(\theta_0)}{t}\right]. \tag{4.4} \]
By assumption B3 (in case the $X_n$'s are Lipschitz), we have that $\frac{1}{t}|X_n(\theta_0 + td) - X_n(\theta_0)| \le M_n \|d\|$. Furthermore, since the $M_n$'s are independent of $\tilde{\tau}$, it follows that
\[ \mathbb{E}\left[\sum_{n=0}^{\tilde{\tau}-1} M_n\right] = \mathbb{E}\left[\mathbb{E}\left[\sum_{n=0}^{\tilde{\tau}-1} M_n \,\Big|\, \tilde{\tau}\right]\right] = \mathbb{E}\left[\sum_{n=0}^{\tilde{\tau}-1} \mathbb{E}M_n\right] \le \gamma\, \mathbb{E}[\tilde{\tau}] < \infty \]
and thus, by the Dominated Convergence Theorem, the limit and the expectation in (4.4) can be interchanged, that is,
\[ W_\infty'(\theta_0, d) = \lim_{t\downarrow 0} \frac{W_\infty(\theta_0 + td) - W_\infty(\theta_0)}{t} = \frac{1}{\mathbb{E}\tilde{\tau}}\, \mathbb{E}\left[\sum_{n=0}^{\tilde{\tau}-1} \lim_{t\downarrow 0} \frac{X_n(\theta_0 + td) - X_n(\theta_0)}{t}\right] = \frac{\mathbb{E}\left[\sum_{n=0}^{\tilde{\tau}-1} X_n'(\theta_0, d)\right]}{\mathbb{E}\tilde{\tau}} \tag{4.5} \]
\[ = \mathbb{E}[\Delta_0(d)], \tag{4.6} \]
the last equality following from the fact that, as seen before, $\{X_n'(\theta_0, d)\}$ also regenerates at the same points as $\{X_n(\theta_0)\}$. Observe that in case $X_n(\cdot, \omega)$ is convex w.p.1, clearly $\sum_{n=0}^{\tilde{\tau}-1} X_n$ is convex as well, and hence the interchange of the derivative and expectation operators follows from the Monotone Convergence Theorem (cf. Shapiro and Wardi [36, Proposition 2.1]). Next, notice that equations (4.5) and (4.6) hold for all $d \in \mathbb{R}^m$, and hence by Lemma 2.2 we conclude that $W_\infty'(\theta_0, \cdot)$ is the Bochner integral of the corresponding functions on the right-hand sides of those equations. Finally, from (4.3) we see that $P(\kappa = 1)$ goes to one as $\epsilon$ goes to zero, that is, $P(\tilde{\tau} = \tau_1) \to 1$ as $\epsilon \to 0$. The assertion of the theorem now follows.
Remark: A simple situation where assumption A3 is satisfied occurs when the "regeneration neighborhoods" do not depend on $n$ or $\omega$. Quite often, however, we cannot ensure that such a strong condition holds. Suppose for instance that $\Theta = \mathbb{R}_+$, $\theta_0 \in \mathbb{R}$ and $\tau_1(\theta_0) = 1$, the regeneration occurring for all $\theta \in [\theta_0 - 1/\omega, \theta_0 + 1/\omega]$, where $\omega$ has exponential distribution. Then, for any positive $t$, there exists no neighborhood of $\theta_0$ such that $\{X_n(\theta_0 + t)\}$ regenerates together with $\{X_n(\theta_0)\}$. Assumption A3, in turn, covers this case, thus allowing one to apply Theorem 4.1. As we shall see in section 6, there are systems satisfying assumption A3 for which we cannot ensure regeneration in a global neighborhood of $\theta_0$.
Corollary 4.1 Under the assumptions of Theorem 4.1, for any $d \in \mathbb{R}^m$ the random variables
\[ \frac{1}{N} \sum_{n=0}^{N-1} X_n'(\theta_0; d) \tag{4.7} \]
and
\[ \frac{\sum_{m=1}^{M} \sum_{n=\beta_{m-1}}^{\beta_m - 1} X_n'(\theta_0; d)}{\sum_{m=1}^{M} (\beta_m - \beta_{m-1})} = \frac{\sum_{n=0}^{\beta_M - 1} X_n'(\theta_0; d)}{\beta_M} \tag{4.8} \]
are consistent estimators of $W_\infty'(\theta_0; d)$ (respectively, as $N$ and $M$ go to infinity).
Proof: Since $X_n'(\theta_0, \cdot)$ regenerates at the points $\{\beta_m\}$, it follows that both expressions above converge to $\mathbb{E}[\Delta_0(d)]$. By Theorem 4.1, this quantity is equal to $W_\infty'(\theta_0; d)$.
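In code, the two estimators are one-liners. The sketch below (with synthetic derivative values and epochs, purely for illustration) also checks the identity stated in (4.8): when $\beta_0 = 0$ and the run length is $N = \beta_M$, the cycle-ratio form and the time-average coincide.

```python
def estimator_47(derivs, N):
    # (4.7): plain time-average of the pathwise directional derivatives
    return sum(derivs[:N]) / N

def estimator_48(derivs, epochs):
    # (4.8): ratio over the M complete cycles delimited by epochs = [beta_0, ..., beta_M]
    num = sum(sum(derivs[epochs[m - 1]:epochs[m]]) for m in range(1, len(epochs)))
    den = epochs[-1] - epochs[0]
    return num / den

# Synthetic data: 3 cycles of lengths 2, 3, 1 with made-up derivative values.
derivs = [1.0, 2.0, 0.5, 0.5, 1.0, 3.0]
epochs = [0, 2, 5, 6]

# With beta_0 = 0 and N = beta_M, the two estimators coincide.
assert estimator_48(derivs, epochs) == estimator_47(derivs, epochs[-1])
```

In practice the per-cycle sums would also be retained separately, so that the variance of the estimator can be obtained from the regenerative central limit theorem, as discussed below.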
Corollary 4.1 provides a way to estimate the directional derivative of $W_\infty$ at $\theta_0$ in any given direction $d$. Suppose we simulate the system for $M$ regenerative cycles, and recall that $\beta_0 = 0, \beta_1, \beta_2, \ldots$ are the regeneration epochs. Then (4.8) gives an estimator of $W_\infty'(\theta_0; d)$, and by using the regenerative property it is possible to estimate the variance of that estimator (for details, see for instance Shedler [37]). Alternatively, one can fix a run length $N$ and use the estimator (4.7); in that case, variances can be estimated by the batch means method (see e.g. Bratley et al. [2]).

The importance of the above result arises from the fact that, in general, (4.7) and (4.8) are not consistent estimators of $W_\infty'(\theta_0; d)$, the reason being that typically pointwise convergence does not imply convergence of directional derivatives. For instance, the functions $f_n(\theta) = \max(\theta - 1/n, 0)$ are all differentiable at $0$ with $f_n'(0) = 0$; however, $f_n$ converges (uniformly) to $f(\theta) = \max(\theta, 0)$, and $f$ is not differentiable at $0$: we have $f'(0; 1) = 1$, $f'(0; -1) = 0$. In this sense, the assumptions of Theorem 4.1 can be viewed as sufficient conditions for convergence of directional derivatives even in the deterministic case.

As another consequence of Theorem 4.1, we can derive necessary and sufficient conditions for nondifferentiability of the steady-state function in case all $X_n$ are subdifferentiable. This nondifferentiability phenomenon was observed by Shapiro and Wardi [36]. In their paper, they assume that each $X_n(\cdot)$ is subdifferentiable and that $\{X_n(\theta)\}$ and $\{\partial X_n(\theta)\}$ are regenerative for each $\theta$, and then give a sufficient condition for $\mathbb{E}[X_\infty(\theta)]$ not to be differentiable at some fixed point $\theta_0$, namely, that there exists a non-singleton convex compact set $C$ such that
\[ P\left(C \subset_a \partial X_k(\theta_0) \text{ for some } k \le \tau_1\right) > 0, \]
where $\subset_a$ means inclusion up to an additive constant and, as before, $\tau_1$ is the length of the first regenerative cycle; see [36, Proposition 2.2] for details. The present setting allows us to extend that result, as follows. Recall from section 2 that $\{X_n'(\theta_0, \cdot)\}$ is a process on $(H^+(\mathbb{R}^m), \mathcal{B}_H)$.
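The $f_n$ counterexample above can be checked directly with one-sided difference quotients (a small script for illustration only; the step size $t$ is an arbitrary choice):

```python
def f_n(theta, n):
    return max(theta - 1.0 / n, 0.0)

def f(theta):
    return max(theta, 0.0)

def dir_deriv(g, theta, d, t=1e-8):
    # one-sided difference quotient approximating g'(theta; d)
    return (g(theta + t * d) - g(theta)) / t

# Each f_n is differentiable at 0 with derivative 0 in both directions ...
for n in (1, 10, 1000):
    assert dir_deriv(lambda th: f_n(th, n), 0.0, 1.0) == 0.0
    assert dir_deriv(lambda th: f_n(th, n), 0.0, -1.0) == 0.0

# ... yet the uniform limit f has a kink at 0: f'(0; 1) = 1, f'(0; -1) = 0.
assert dir_deriv(f, 0.0, 1.0) == 1.0
assert dir_deriv(f, 0.0, -1.0) == 0.0
```

The kink of the limit function is invisible in every $f_n$, which is exactly why consistency of (4.7)-(4.8) requires the extra structure of Theorem 4.1.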
Let $A \subset H^+(\mathbb{R}^m)$ denote the set
\[ A = \{\varphi \in H^+(\mathbb{R}^m) : \varphi(\cdot) \text{ is not linear}\}. \tag{4.9} \]
The next lemma shows that $P(X_n'(\theta_0, \cdot) \in A)$ is well-defined.

Lemma 4.1 The set $A$ defined above is a $\mathcal{B}_H$-Borel set.

Proof: For any $r \in \mathbb{R}^m$, let $L_r(\cdot) \in H^+(\mathbb{R}^m)$ denote the linear function $\langle r, \cdot \rangle$. Let $D \subset H^+(\mathbb{R}^m)$ denote the set
\[ D = \bigcap_{n \ge 1} \bigcup_{r \in \mathbb{Q}^m} B(L_r, 1/n), \]
where $B(L_r, 1/n) \subset H^+(\mathbb{R}^m)$ is the closed ball $B(L_r, 1/n) = \{\varphi \in H^+(\mathbb{R}^m) : \|\varphi - L_r\| \le 1/n\}$. We claim that $A^c = D$. Indeed, let $\varphi \in A^c$, i.e. $\varphi = L_x$ for some $x \in \mathbb{R}^m$. Let $n \ge 1$ be arbitrary. Since $\mathbb{Q}^m$ is dense in $\mathbb{R}^m$, there exists an $r_n \in \mathbb{Q}^m$ such that $\|r_n - x\|_2 \le 1/n$, and therefore we have
\[ \|L_x - L_{r_n}\| = \sup_{d \in S^{m-1}} |\langle x - r_n, d \rangle| \le \sup_{d \in S^{m-1}} \|x - r_n\|_2 \|d\|_2 = \|x - r_n\|_2 \le 1/n, \]
so $\varphi \in B(L_{r_n}, 1/n)$. Thus $\varphi \in \bigcup_{r \in \mathbb{Q}^m} B(L_r, 1/n)$ for all $n \ge 1$ and hence $\varphi \in D$, so $A^c \subset D$. Conversely, let $\psi \in D$. Then we know that given any $n \ge 1$ there exists an $r_n \in \mathbb{Q}^m$ such that $\|\psi - L_{r_n}\| \le 1/n$. Fix now an arbitrary $N \ge 1$. Let $d_1, d_2$ be arbitrary points in $S^{m-1}$, and let $\lambda_1, \lambda_2$ be any real numbers. Let $n = n(\lambda_1, \lambda_2, N)$ be any integer greater than or equal to $2N(|\lambda_1| + |\lambda_2|)$. Then we have that
\[
\begin{aligned}
|\psi(\lambda_1 d_1 + \lambda_2 d_2) - \lambda_1 \psi(d_1) - \lambda_2 \psi(d_2)|
&= |\psi(\lambda_1 d_1 + \lambda_2 d_2) - L_{r_n}(\lambda_1 d_1 + \lambda_2 d_2) + \lambda_1 L_{r_n}(d_1) + \lambda_2 L_{r_n}(d_2) - \lambda_1 \psi(d_1) - \lambda_2 \psi(d_2)| \\
&\le |\psi(\lambda_1 d_1 + \lambda_2 d_2) - L_{r_n}(\lambda_1 d_1 + \lambda_2 d_2)| + |\lambda_1|\,|L_{r_n}(d_1) - \psi(d_1)| + |\lambda_2|\,|L_{r_n}(d_2) - \psi(d_2)| \\
&\le \|\lambda_1 d_1 + \lambda_2 d_2\|_2\, \|\psi - L_{r_n}\| + |\lambda_1|\, \|L_{r_n} - \psi\| + |\lambda_2|\, \|L_{r_n} - \psi\| \\
&\le \frac{1}{n}\left(\|\lambda_1 d_1 + \lambda_2 d_2\|_2 + |\lambda_1| + |\lambda_2|\right) \le \frac{1}{n}\left(2|\lambda_1| + 2|\lambda_2|\right) \le \frac{1}{N}.
\end{aligned}
\]
Thus we have that $|\psi(\lambda_1 d_1 + \lambda_2 d_2) - \lambda_1 \psi(d_1) - \lambda_2 \psi(d_2)| \le 1/N$ for any $N \ge 1$, so we conclude that $\psi(\lambda_1 d_1 + \lambda_2 d_2) = \lambda_1 \psi(d_1) + \lambda_2 \psi(d_2)$, i.e. $\psi$ is linear. Therefore $D \subset A^c$ and hence $D = A^c$. Since $D$ is clearly a $\mathcal{B}_H$-Borel set, we conclude that so are $A^c$ and $A$.
We can now state the result.

Theorem 4.2 Suppose the assumptions of Theorem 4.1 hold, and suppose furthermore that $X_n(\cdot)$ is subdifferentiable at $\theta_0 \in \Theta$ w.p.1 for all $n \ge 0$. Then the steady-state function $W_\infty(\theta) := \mathbb{E}[X_\infty(\theta)]$ is (Gâteaux, Fréchet) differentiable at $\theta_0$ if and only if
\[ P\left(\{\omega : \exists\, j, \; 0 \le j \le \tau_1(\omega), \text{ s.t. } X_j(\cdot, \omega) \text{ is not differentiable at } \theta_0\}\right) = 0. \tag{4.10} \]

For the proof, we shall need the following lemma:

Lemma 4.2 Let $\varphi(\cdot)$ be a random variable on $(H^+(\mathbb{R}^m), \mathcal{B}_H)$ such that $\varphi(\cdot)$ is sublinear, i.e. $\varphi(\lambda_1 d_1 + \lambda_2 d_2) \le \lambda_1 \varphi(d_1) + \lambda_2 \varphi(d_2)$ w.p.1 for all $d_1, d_2 \in \mathbb{R}^m$ and all $\lambda_1, \lambda_2 > 0$. Then $\mathbb{E}\varphi(\cdot)$ is linear if and only if $P(\{\omega : \varphi(\cdot) \text{ is not linear}\}) = 0$.

Proof: Let $A$ be defined as in (4.9). Suppose first that $\mathbb{E}[\varphi(\cdot)]$ is linear. Then, for any $d_1, d_2 \in \mathbb{R}^m$, we have that $\mathbb{E}\varphi(d_1 + d_2) = \mathbb{E}\varphi(d_1) + \mathbb{E}\varphi(d_2)$ (notice the implicit use of Lemma 2.2 here) and hence $\mathbb{E}[\varphi(d_1 + d_2) - \varphi(d_1) - \varphi(d_2)] = 0$. Since the integrand is almost surely nonpositive, it follows that $\varphi(d_1 + d_2) = \varphi(d_1) + \varphi(d_2)$ w.p.1. Moreover, for any $d \in \mathbb{R}^m$ we have that $0 = \varphi(0) = \varphi(d + (-d)) = \varphi(d) + \varphi(-d)$, and thus $\varphi(-d) = -\varphi(d)$; combined with positive homogeneity, this yields $\varphi((-\alpha)d) = (-\alpha)\varphi(d)$ for any $\alpha > 0$. We conclude that $\varphi$ is linear w.p.1, i.e. $P(\varphi(\cdot) \in A) = 0$. Conversely, suppose that $\varphi(\cdot)$ is linear w.p.1. Then, by linearity of the integral, we have that $\mathbb{E}[\varphi(\cdot)]$ is also linear, that is, $\mathbb{E}[\varphi(\cdot)] \notin A$.

Proof of Theorem 4.2: Let $A$ be defined as in (4.9). Note initially that $X_j(\cdot, \omega)$ is nondifferentiable at $\theta_0$ if and only if $X_j'(\theta_0, \cdot, \omega) \in A$. By Lemma 4.1, the set in (4.10) is measurable. Let $\varphi$ be a random variable on $(H^+(\mathbb{R}^m), \mathcal{B}_H)$ defined as $\varphi(\cdot) = \sum_{n=0}^{\tau_1 - 1} X_n'(\theta_0, \cdot)$. From Theorem 4.1 we know that
\[ W_\infty'(\theta_0, \cdot) = \frac{\mathbb{E}\left[\sum_{n=0}^{\tau_1 - 1} X_n'(\theta_0, \cdot)\right]}{\mathbb{E}\tau_1} = \frac{\mathbb{E}\varphi(\cdot)}{\mathbb{E}\tau_1}. \tag{4.11} \]
Notice that the assumption of subdifferentiability of $X_n$ implies that $X_n'(\theta_0, \cdot)$ is convex w.p.1 (see section 2) and hence so is $\varphi(\cdot)$. Furthermore, since $\varphi(\cdot)$ is positively homogeneous, it follows that $\varphi$ is sublinear. Let $E$ denote the set $\{\omega : \exists\, j \in [0, \tau_1(\omega)] \text{ s.t. } X_j'(\theta_0, \cdot, \omega) \in A\}$. We shall prove now that
\[ E = \{\omega : \varphi(\cdot, \omega) \in A\}. \tag{4.12} \]
Indeed, let $\omega \in E$. Then there exists some $j \le \tau_1$ such that $X_j'(\theta_0, \cdot, \omega) \in A$, and hence $\varphi(\cdot, \omega)$ is nonlinear: otherwise $\varphi(\cdot, \omega) - X_j'(\theta_0, \cdot, \omega)$ would be concave, while also being convex (it is the sum of the remaining convex terms), hence linear, which would force $X_j'(\theta_0, \cdot, \omega)$ to be linear as well, a contradiction. Thus $\omega \in \{\varphi(\cdot) \in A\}$. Conversely, suppose that $\omega \notin E$. This means that $X_j'(\theta_0, \cdot, \omega)$ is linear for all $j \le \tau_1$, and hence $\varphi(\cdot, \omega) = \sum_{n=0}^{\tau_1(\omega) - 1} X_n'(\theta_0, \cdot, \omega)$ is linear, i.e. $\omega \notin \{\varphi(\cdot) \in A\}$. Therefore (4.12) holds. Finally, from (4.11) we see immediately that $W_\infty'(\theta_0, \cdot)$ is linear (i.e. $W_\infty(\cdot)$ is Gâteaux-differentiable at $\theta_0$) if and only if $\mathbb{E}[\varphi(\cdot)]$ is linear. Furthermore, as seen in the proof of Theorem 4.1, the properties of assumption B3 carry over to $W_\infty(\cdot)$; hence Gâteaux-differentiability of $W_\infty(\cdot)$ is equivalent to Fréchet differentiability. The assertion of the theorem now follows from (4.12) and Lemma 4.2.
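In one dimension Lemma 4.2 is easy to visualize: a random finite sublinear function is the support function of a random interval, and its expectation is linear exactly when the interval is a singleton almost surely. A small numerical illustration (the sample intervals below are made up):

```python
def support_fn(interval, d):
    a, b = interval
    return max(a * d, b * d)   # support function of [a, b]; sublinear in d

def mean_support(samples, d):
    # empirical expectation of the random sublinear function
    return sum(support_fn(s, d) for s in samples) / len(samples)

def is_linear(phi):
    # a positively homogeneous function on R is linear iff phi(1) == -phi(-1)
    return abs(phi(1.0) + phi(-1.0)) < 1e-12

# phi nonlinear on a set of positive probability => E[phi] nonlinear ...
kinked = [(1.0, 1.0), (1.0, 2.0), (1.0, 1.0), (1.0, 2.0)]
assert not is_linear(lambda d: mean_support(kinked, d))

# ... while phi linear w.p.1 => E[phi] linear.
flat = [(1.0, 1.0)] * 4
assert is_linear(lambda d: mean_support(flat, d))
```

This is precisely the mechanism behind Theorem 4.2: a kink occurring anywhere within a cycle with positive probability survives the averaging in (4.11).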
5 The subdifferential process

The results presented in the previous sections are based on the assumption of directional differentiability of the functions $X_n(\cdot)$. As we have seen, this includes a fairly general class of functions. Suppose now that we restrict ourselves to subdifferentiable processes, that is, directionally differentiable processes whose directional derivative functions are convex (and hence sublinear). In dealing with optimization algorithms, one often encounters the problem of computing the subdifferential set (or at least obtaining an approximation by using some computed subgradients), in order to check optimality conditions or generate a next iterate. In this sense, it is important to formulate the results discussed in the previous sections in terms of the subdifferentials $\partial X_n(\theta_0)$, $n \ge 0$. The basic property that makes this possible is the one-to-one correspondence between compact convex sets and finite sublinear functions mentioned in section 2: every finite sublinear function is the support function of a unique compact convex set. In particular, the subdifferential $\partial X_n(\theta_0)$ (which is compact and convex) corresponds to the directional derivative function $X_n'(\theta_0, \cdot)$.

Consider the space $\mathcal{C}$ of all compact convex subsets of $\mathbb{R}^m$, endowed with the Hausdorff metric $\mathcal{H}$. It is known (see Debreu [7]) that $(\mathcal{C}, \mathcal{H})$ is a complete and separable metric space, so in principle we can construct the corresponding collection of Borel sets $\mathcal{B}_\mathcal{C}$ and hence, assuming measurability, we can consider $\{\partial X_n(\theta_0)\}$ as a process on $(\mathcal{C}, \mathcal{B}_\mathcal{C})$. Notice that this measurability assumption follows immediately from assumption B1 and Lemma 2.1. Indeed, the aforementioned correspondence between $\mathcal{C}$ and the space $S \subset H^+(\mathbb{R}^m)$ of finite sublinear functions implies that $\partial X_n(\theta_0)$ is $\mathcal{F}$-$\mathcal{B}_\mathcal{C}$ measurable if and only if $X_n'(\theta_0, \cdot)$ is $\mathcal{F}$-$\mathcal{B}_S$ measurable, where $\mathcal{B}_S$ are the Borel sets of $S$. But the sets of $\mathcal{B}_S$ are of the form $B \cap S$ with $B \in \mathcal{B}_H$. Since $X_n'(\theta_0, \cdot)$ is $\mathcal{F}$-$\mathcal{B}_H$ measurable by Lemma 2.1 and $X_n'(\theta_0, \cdot) \in S$, the conclusion follows. All results from section 3 can be easily reformulated in this new setting. Notice, however, that in principle we cannot directly apply the theory of the Bochner integral to compute the expected value of a random variable on $(\mathcal{C}, \mathcal{B}_\mathcal{C})$ as done in section 2, since that theory is constructed for Banach spaces, and $\mathcal{C}$ does not have a linear structure. Nevertheless, as pointed out by Hiai and Umegaki [14] (see also Rådström [27]), $\mathcal{C}$ can be embedded as a convex cone in a Banach space, and this suffices to allow the theory of Bochner integration to be applied to $\mathcal{C}$-valued functions.

As with the space $(H^+(\mathbb{R}^m), \mathcal{B}_H)$, we say that a function $\phi : \Omega \to \mathcal{C}$ is simple if there exist sets $C_1, \ldots, C_N \in \mathcal{C}$ and $E_1, \ldots, E_N \in \mathcal{F}$ such that $\phi(\omega) = \sum_{i=1}^{N} C_i \mathbf{1}_{E_i}(\omega)$. Also, we say that a function $\Gamma : \Omega \to \mathcal{C}$ is strongly measurable if there exists a sequence of simple functions $\{\phi_n\}$ such that $\lim_{n\to\infty} \mathcal{H}(\Gamma(\omega), \phi_n(\omega)) = 0$ for $P$-almost all $\omega$.
Analogously to the case of positively homogeneous functions studied in section 2, it follows from the separability of $\mathcal{C}$ that a random variable $\Gamma$ on $(\mathcal{C}, \mathcal{B}_\mathcal{C})$ is strongly measurable. Let then $\phi_1, \phi_2, \ldots$ (with $\phi_n = \sum_{i=1}^{N_n} C_i^n \mathbf{1}_{E_i^n}$) be $\mathcal{C}$-valued simple functions such that, for $P$-almost all $\omega \in \Omega$,
\[ \Gamma(\omega) = \lim_{n\to\infty} \phi_n(\omega) = \lim_{n\to\infty} \sum_{i=1}^{N_n} C_i^n \mathbf{1}_{E_i^n}(\omega), \tag{5.1} \]
where the limit is understood to be with respect to the Hausdorff metric. We define the Bochner integral of $\Gamma$ as
\[ \int_\Omega \Gamma \, dP := \lim_{n\to\infty} \int_\Omega \phi_n \, dP = \lim_{n\to\infty} \sum_{i=1}^{N_n} C_i^n P(E_i^n). \]
It is interesting to notice that the integral defined above for a "random set" $\Gamma$ differs from the standard construction of integrals of multifunctions found in the literature (see e.g. Castaing and Valadier [3], Hiai and Umegaki [14], Debreu [7], Rockafellar [31]). In that approach, $\Gamma$ is viewed as a multifunction $\Omega \rightrightarrows \mathbb{R}^m$ (we shall consider here only closed-valued multifunctions); $\Gamma$ is said to be measurable if for each closed subset $B$ of $\mathbb{R}^m$, the set
\[ \Gamma^{-1}(B) := \{\omega \in \Omega : \Gamma(\omega) \cap B \ne \emptyset\} \]
belongs to $\mathcal{F}$. A measurable selection of $\Gamma$ is a measurable function $v : \Omega \to \mathbb{R}^m$ such that $v(\omega) \in \Gamma(\omega)$ for all $\omega \in \Omega$. It can be shown that a multifunction $\Gamma$ is measurable if and only if there is a countable family $\{v_i : i \in I\}$ of measurable selections of $\Gamma$ such that $\Gamma(\omega) = \mathrm{cl}\,\{v_i(\omega) : i \in I\}$ for all $\omega \in \Omega$. The integral of $\Gamma$ is defined as the set of integrals of (integrable) measurable selections of $\Gamma$, that is,
\[ \int_\Omega \Gamma \, dP := \left\{ \int_\Omega v(\omega)\, P(d\omega) : v \in L^1(P), \; v \text{ is a measurable selection of } \Gamma \right\}, \tag{5.2} \]
where $L^1(P)$ is the set of integrable functions with respect to the measure $P$. Notice that definition (5.2) is general in that no assumptions on compactness or convexity of $\Gamma$ are imposed.

A third way to view the integral of a random compact convex set $\Gamma$ is by using the correspondence between that class of sets and sublinear functions discussed in section 2. Indeed, let $\sigma(\omega)(\cdot) \in H^+(\mathbb{R}^m)$ be the support function of $\Gamma(\omega)$ (so $\sigma$ is a random variable on $(H^+(\mathbb{R}^m), \mathcal{B}_H)$), and suppose that $\sigma$ is Bochner integrable. Let $\bar{\sigma} = \int_\Omega \sigma(\omega)\, P(d\omega)$ be the Bochner integral of $\sigma$. Then $\bar{\sigma}(\cdot)$ is finite-valued and sublinear, so there exists a compact convex set $\bar{\Gamma}$ whose support function is $\bar{\sigma}$. We can then take $\bar{\Gamma}$ to be the integral of $\Gamma$. Observe that, by Lemma 2.2, the computation of the Bochner integral $\int_\Omega \sigma(\omega)(\cdot)\, P(d\omega)$ reduces to the computation of the Lebesgue integrals $\int_\Omega \sigma(\omega)(d)\, P(d\omega)$. A particular advantage of this approach in the present case is that, since the support function of the subdifferential $\partial X_n(\theta_0)$ is the directional derivative function $X_n'(\theta_0, \cdot)$, we can easily reinterpret the results of the previous sections. A natural question, of course, is: do the integrals discussed above coincide? As it turns out, the answer is affirmative.
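For interval-valued random sets ($m = 1$) the agreement between the selection integral (5.2) and the support-function route can be checked by hand. The sketch below uses three equally likely intervals invented for the purpose:

```python
# Three equally likely realizations of a random interval (made-up numbers).
intervals = [(0.0, 1.0), (2.0, 5.0), (-1.0, 0.5)]

# (i) Integral via measurable selections, definition (5.2): for interval-valued
#     maps this reduces to the interval of the expected endpoints.
sel_lo = sum(a for a, _ in intervals) / len(intervals)
sel_hi = sum(b for _, b in intervals) / len(intervals)

# (ii) Integral via support functions: average the support functions, then read
#      off the set they support; in R^1 that set is [-sigma_bar(-1), sigma_bar(1)].
def support(interval, d):
    a, b = interval
    return max(a * d, b * d)

def sigma_bar(d):
    return sum(support(I, d) for I in intervals) / len(intervals)

assert abs(-sigma_bar(-1.0) - sel_lo) < 1e-12
assert abs(sigma_bar(1.0) - sel_hi) < 1e-12
```

Both routes produce the interval of expected endpoints, which is the one-dimensional instance of the equivalences cited below.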
The equivalence between Bochner and multifunction integrals is studied in Hiai and Umegaki [14, Theorem 4.5] and Debreu [7, (6.5)], whereas the correspondence between multifunction integrals and integrals of support functions can be found in Castaing and Valadier [3, Theorem V-14] and Hiai and Umegaki [14, Theorem 2.2]. It must be stressed that those equivalences hold for compact-convex-valued multifunctions (which is the present case), since otherwise the Bochner integral is not well-defined and the correspondence with support functions does not hold. We now re-state the results of section 4 in terms of subdifferentials. First we derive a result that, although well-known (see Rockafellar and Wets [32], Ioffe and Tihomirov [19]), illustrates an application of the equivalence between integrals discussed above:
Proposition 5.1 Suppose that, for $P$-almost all $\omega \in \Omega$, each $X_n(\cdot, \omega)$ is subdifferentiable at $\theta_0$. Suppose also that assumptions B2 and B3 hold. Then, for any finite $n \ge 0$,
\[ \partial\, \mathbb{E}[X_n(\theta_0)] = \mathbb{E}[\partial X_n(\theta_0)], \tag{5.3} \]
where the expected value is understood as the multifunction integral defined in (5.2).

Proof: Let $W_n(\theta) = \mathbb{E}[X_n(\theta)]$. By the equivalence of multifunction integrals with integrals of support functions, (5.3) holds if and only if $W_n'(\theta_0, \cdot) = \mathbb{E}[X_n'(\theta_0, \cdot)]$, i.e.
\[ W_n'(\theta_0, d) = \mathbb{E}[X_n'(\theta_0, d)] \quad \text{for all } d \in \mathbb{R}^m. \]
The above equation holds if we can interchange integrals and limits, which, as seen in the proof of Theorem 4.1, follows from assumption B3.
Another interesting consequence comes from Lemma 4.2:

Proposition 5.2 Suppose that the assumptions of Proposition 5.1 hold. Then, for any finite $n \ge 0$, $\mathbb{E}[X_n(\theta)]$ is differentiable at $\theta_0$ if and only if $X_n(\cdot)$ is differentiable at $\theta_0$ with probability one.

Proof: First notice that, under the assumptions of the proposition, $\mathbb{E}[X_n(\theta)]$ is differentiable at $\theta_0$ if and only if $\partial\, \mathbb{E}[X_n(\theta_0)]$ is a singleton, i.e. if the directional derivative of $\mathbb{E}[X_n(\theta)]$ at $\theta_0$ is a linear function. By Lemma 4.2, this occurs if and only if $X_n'(\theta_0, \cdot)$ is linear w.p.1, that is, if and only if $X_n(\cdot)$ is differentiable at $\theta_0$ w.p.1.

The following is an immediate consequence of Theorem 4.1 and Corollary 4.1.

Proposition 5.3 Suppose that the assumptions of Theorem 4.1 hold. Then the following equations hold:
\[ \partial\, \mathbb{E}[X_\infty(\theta_0)] = \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \partial X_n(\theta_0) \quad \text{w.p.1}, \tag{5.4} \]
\[ \partial\, \mathbb{E}[X_\infty(\theta_0)] = \frac{\mathbb{E}\left[\sum_{n=0}^{\tau_1 - 1} \partial X_n(\theta_0)\right]}{\mathbb{E}\tau_1}, \tag{5.5} \]
where the sums on the right-hand sides are understood as Minkowski addition of sets, the expected values are again understood as the multifunction integrals defined in (5.2), and the limit in (5.4) is understood in the Hausdorff sense.
As with the directional derivatives, equation (5.4) provides a way to estimate the subdifferential of $\mathbb{E}[X_\infty]$ at $\theta_0$. Again, suppose we simulate the system for $M$ regenerative cycles, and let $\beta_0 = 0, \beta_1, \beta_2, \ldots$ be the regeneration epochs. An estimator for $\partial\, \mathbb{E}[X_\infty(\theta_0)]$ is given by
\[ \hat{\Gamma}_0 = \frac{\sum_{m=1}^{M} \sum_{n=\beta_{m-1}}^{\beta_m - 1} \partial X_n(\theta_0)}{\sum_{m=1}^{M} (\beta_m - \beta_{m-1})}. \]
Clearly, from the above formula we see that the choice of one particular subgradient from each set $\partial X_n(\theta_0)$ yields an estimator for a subgradient of $\mathbb{E}[X_\infty]$ at $\theta_0$. Furthermore, as remarked before, we can also estimate the variance of the resulting estimators.

Observe that Proposition 5.3 can actually be stated in terms of a general $\mathcal{C}$-valued process $\{C_n\}$. Let $\sigma_n(\cdot)$ denote the support function of $C_n$. As seen in section 2, $\sigma_n(\cdot)$ is a finite-valued sublinear (i.e. convex and positively homogeneous) function. Moreover, it is easy to see that $\sigma_n(d) = \sigma_n'(0, d)$ for all $d \in \mathbb{R}^m$, whence we have that $C_n = \partial \sigma_n(0)$. It follows that if the process $\{\sigma_n(\cdot)\}$ satisfies assumptions A3-A4 and B1-B3 discussed in the previous sections (for $\theta_0 = 0$), then $\{C_n\}$ is regenerative and hence we can apply Proposition 5.3. Notice that assumption B2 is equivalent to assuming Bochner integrability of $\sigma_n(\cdot)$ (and therefore of $C_n$), and that assumptions B1 and B3 are automatically satisfied in this case since $\sigma_n(\cdot)$ is convex. Below we summarize this result.
Corollary 5.1 Let $\{C_n\}$ be a process on $(\mathcal{C}, \mathcal{B}_\mathcal{C})$ such that each $C_n$ is (Bochner) integrable, and let $\sigma_n(\cdot)$ denote the support function of $C_n$. Suppose that $\{\sigma_n(\cdot)\}$ satisfies assumptions A3 and A4 for $\theta_0 = 0$. Then $\{C_n\}$ is regenerative (so it has a weak limit $C_\infty$) and the following equations hold:
\[ \mathbb{E}[C_\infty] = \lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} C_n \quad \text{w.p.1}, \tag{5.6} \]
\[ \mathbb{E}[C_\infty] = \frac{\mathbb{E}\left[\sum_{n=0}^{\tau_1 - 1} C_n\right]}{\mathbb{E}\tau_1}. \tag{5.7} \]

The above corollary can also be viewed as a result for compact-convex-valued multifunction processes. Other types of limiting results have been studied in the literature for multivalued processes: in more general settings (i.e. without imposing convexity and compactness), Hiai [13] gives various forms of the Strong Law of Large Numbers when the $C_n$ are independent, whereas Hiai and Umegaki [14] study martingales formed by multivalued processes and provide some convergence theorems. We refer to those papers and references therein for details. To the best of our knowledge, however, there have been no results on regenerative multivalued processes as given by Corollary 5.1.
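For interval-valued processes ($m = 1$) the Minkowski operations in (5.6)-(5.7) reduce to endpoint arithmetic, so both formulas are easy to implement. The sketch below, with invented cycle data, checks that the cycle-ratio form and the Minkowski time-average over the same complete cycles produce the same interval, measured in the Hausdorff metric.

```python
def minkowski_avg(intervals):
    # Minkowski sum of intervals [a_i, b_i] is [sum a_i, sum b_i]; then scale by 1/N.
    n = len(intervals)
    return (sum(a for a, _ in intervals) / n, sum(b for _, b in intervals) / n)

def hausdorff(I, J):
    # Hausdorff distance between two intervals
    (a, b), (c, d) = I, J
    return max(abs(a - c), abs(b - d))

# Made-up interval-valued path C_0, ..., C_5 with regeneration epochs [0, 2, 5, 6].
C = [(1.0, 1.0), (1.0, 3.0), (1.0, 1.0), (1.0, 2.0), (1.0, 4.0), (1.0, 1.0)]
epochs = [0, 2, 5, 6]

# (5.7)-style cycle ratio over the complete cycles:
span = epochs[-1] - epochs[0]
lo = sum(a for a, _ in C[epochs[0]:epochs[-1]]) / span
hi = sum(b for _, b in C[epochs[0]:epochs[-1]]) / span

# (5.6)-style Minkowski time-average over the same N = 6 steps:
avg = minkowski_avg(C[:epochs[-1]])

assert hausdorff((lo, hi), avg) < 1e-12
```

The same endpoint bookkeeping underlies the subdifferential estimator $\hat{\Gamma}_0$ above when each $\partial X_n(\theta_0)$ is an interval.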
6 Examples

1. A G/D/1 queue

The following system was presented by Shapiro and Wardi [36] to illustrate the nondifferentiability of the mean steady-state function. Consider a G/D/1 queue where the distribution of the interarrival times $A_n$ has atoms at two points $b$ and $c$ with $b < c$. For simplicity, we take $P(A_n = b) = P(A_n = c) = 1/2$. Suppose that the deterministic service time is a parameter $\theta \in \Theta = (b, \frac{b+c}{2})$. Notice that the assumption $\theta < \frac{b+c}{2}$ guarantees that the queue is stable and hence regenerative. Denote by $T_n(\theta)$ the system time of customer $n$ (i.e. waiting time plus service time). Then, for $n \ge 1$, $T_n(\theta)$ satisfies the recursion
\[ T_n(\theta) = \theta + [T_{n-1}(\theta) - A_n]^+, \]
where $[x]^+ = \max\{x, 0\}$ (assume $T_0(\theta) \equiv \theta$). Notice that $T_n(\theta)$ is defined by the maximum of linear functions and is therefore convex. Given $\theta_0 \in \Theta$, define random variables $\{\beta_m\}$, $m = 0, 1, 2, \ldots$, with $\beta_0 = 0$ and
\[ \beta_m = \inf\{n > \beta_{m-1} : T_{n-1}(\theta_0) \le A_n\}. \]
Observe that $\{\beta_m\}$, $m = 0, 1, 2, \ldots$, are the epochs at which an arriving customer finds the queue empty; hence $\{T_n(\theta_0)\}$ regenerates at those points. Notice also that on the event $\{T_{n-1}(\theta_0) \le A_n\}$ we have that $T_n(\theta_0) = \theta_0$. Shapiro and Wardi [36] show that on $\{T_{n-1}(\theta_0) = A_n\}$ the function $T_n(\theta)$ is not differentiable at $\theta_0$ and, furthermore, the event $\{T_{n-1}(\theta_0) = A_n \text{ for some } n \ge 1\}$ has positive probability, which is their basic condition for nondifferentiability of the mean steady-state function $\mathbb{E}[T_\infty(\theta)]$ at $\theta_0$.

Consider now the event $E = \{\beta_m = n\}$, and observe that we can partition $E$ as $E_< \cup E_=$, where $E_< = E \cap \{T_{n-1}(\theta_0) < A_n\}$ and $E_= = E \cap \{T_{n-1}(\theta_0) = A_n\}$. Notice that the continuity of $T_n(\cdot)$ implies that for any $\omega \in E_<$ there is a neighborhood $V_n(\omega)$ of $\theta_0$ such that $T_{n-1}(\theta) < A_n$ for all $\theta \in V_n(\omega)$ and hence $T_n(\theta) = \theta$ on that neighborhood. On the other hand, on $E_=$ it might happen that $T_{n-1}(\theta) > A_n$ for $\theta$ arbitrarily close to $\theta_0$ and hence $T_n(\theta) = \theta + T_{n-1}(\theta) - A_n$. Since the event $\{T_{n-1}(\theta_0) = A_n \text{ for some } n \ge 1\}$ happens at some $\beta_k$ with probability one, it follows that we cannot ensure the existence of neighborhoods $\{V_n(\omega)\}$ satisfying assumption A2, and thus we cannot apply Theorem 3.1 directly. We can nevertheless overcome that problem by considering a subset of the regeneration points, as follows. Let $\varepsilon > 0$ be arbitrary, and let $\{\hat{\beta}_m\}$, $m = 0, 1, 2, \ldots$ (with $\hat{\beta}_0 = 0$), denote the epochs
\[ \hat{\beta}_m = \inf\{n > \hat{\beta}_{m-1} : T_{n-1}(\theta_0) < A_n - \varepsilon\}. \]
As argued before, by continuity of the functions $T_n$ there exist neighborhoods $V_n(\omega)$ of $\theta_0$, $\omega \in \Omega$, such that on the event $F = \{\hat{\beta}_m = n\}$ we have that $T_n(\theta) = \theta$ on $V_n(\omega)$. Therefore assumption A2 is satisfied, and hence from Theorem 3.1 it follows that $(T_n(\theta_0), \partial T_n(\theta_0))$ regenerates at each $\hat{\beta}_m$.

Let us show now that assumption A3 also holds in this case. Note initially that, since $\mathbb{E}\tau_1 < \infty$, for any $\epsilon > 0$ there exists an $N_\epsilon$ such that $P(\tau_1 > N_\epsilon) < \epsilon$. Let $B_\epsilon$ denote the event $\{\tau_1 \le N_\epsilon\}$, and fix $\omega \in B_\epsilon$. Then, for any $j < \tau_1(\omega)$ we can write
\[ T_j(\theta) = (j+1)\theta - \sum_{k=1}^{j} A_k. \]
Define the set
\[ V_\epsilon := \left\{\theta : |\theta - \theta_0| < \frac{\varepsilon}{N_\epsilon}\right\}. \]
We have that, for any $\theta \in V_\epsilon$ and any $\omega \in B_\epsilon$,
\[ |T_j(\theta) - T_j(\theta_0)| = (j+1)|\theta - \theta_0| \le \tau_1 |\theta - \theta_0| < \tau_1 \frac{\varepsilon}{N_\epsilon} \le \varepsilon \]
and therefore, for any $n = 1, 2, \ldots$, on $\{\hat{\beta}_1 = n\} \cap B_\epsilon$ we have
\[ \theta \in V_\epsilon \;\Rightarrow\; |T_{n-1}(\theta) - T_{n-1}(\theta_0)| < \varepsilon \;\Rightarrow\; T_{n-1}(\theta) < A_n \;\Rightarrow\; T_n(\theta) = \theta \;\Rightarrow\; \theta \in V_n(\omega), \]
that is, $V_\epsilon \subset V_n(\omega)$. We conclude that $V_\epsilon \subset \bigcap_{\omega \in B_\epsilon} \bigcap_{n=1}^{\infty} V_n(\omega)$ and, since $P(B_\epsilon) > 1 - \epsilon$ by construction, assumption A3 holds. Thus, by applying Proposition 5.3 we obtain
\[ \partial\, \mathbb{E}[T_\infty(\theta_0)] = \frac{\mathbb{E}\left[\sum_{n=0}^{\tau_1 - 1} \partial T_n(\theta_0)\right]}{\mathbb{E}\tau_1}. \]
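The recursion for $T_n(\theta_0)$ together with the interval subdifferentials can be traced in a few lines of code. The arrival sequence below is hand-picked (atoms $b = 1$, $c = 2$ and $\theta_0 = 1.25$, values chosen purely for illustration) so that the kink event $T_{n-1}(\theta_0) = A_n$ actually occurs along the path; each subdifferential is stored as an interval $[\mathrm{lo}, \mathrm{hi}]$.

```python
# Trace of the G/D/1 recursion T_n = theta + [T_{n-1} - A_n]^+ at theta0,
# together with interval subdifferentials dT_n = [lo, hi] updated case by case
# (kink / busy / empty). Illustrative values only: b = 1, c = 2, theta0 = 1.25.
theta0 = 1.25
arrivals = [1.0, 1.0, 1.0, 2.0, 2.0]

T = theta0                # T_0(theta0): job 0 arrives at an empty queue
lo, hi = 1.0, 1.0         # dT_0(theta0) = {1}
subdiffs = [(lo, hi)]
for A in arrivals:
    if T < A:             # queue found empty: T_n = theta, so dT_n = {1}
        T, lo, hi = theta0, 1.0, 1.0
    elif T == A:          # kink: dT_n = conv({1} u ({1} + dT_{n-1}))
        T, lo, hi = theta0, 1.0, 1.0 + hi
    else:                 # busy: T_n = theta + T_{n-1} - A_n, dT_n = {1} + dT_{n-1}
        T, lo, hi = theta0 + T - A, 1.0 + lo, 1.0 + hi
    subdiffs.append((lo, hi))

assert subdiffs[4] == (1.0, 5.0)   # non-singleton subdifferential at the kink
assert subdiffs[5] == (1.0, 1.0)   # regeneration: the derivative resets to 1
```

Averaging these intervals (Minkowski sums divided by cycle lengths) over complete cycles then gives an interval estimate of $\partial\, \mathbb{E}[T_\infty(\theta_0)]$, in the spirit of the estimator displayed below.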
Notice that $T_{\hat{\beta}_m}(\theta)$ is differentiable at $\theta_0$ for all $m \ge 0$ (indeed, $T_{\hat{\beta}_m}'(\theta_0) = 1$), but for the indices $n$ such that $\{T_{n-1}(\theta_0) = A_n\}$ occurs we have
\[ \partial T_n(\theta_0) = \mathrm{conv}\left(\{1\} \cup \left(\{1\} + \partial T_{n-1}(\theta_0)\right)\right) \tag{6.1} \]
(where $\mathrm{conv}(A)$ denotes the convex hull of $A$), and for all other indices $j$ such that $0 < j < \hat{\beta}_1$ and $\{T_{j-1}(\theta_0) > A_j\}$ occurs we have
\[ \partial T_j(\theta_0) = \{1\} + \partial T_{j-1}(\theta_0) \tag{6.2} \]
(while $\partial T_j(\theta_0) = \{1\}$ whenever $T_{j-1}(\theta_0) < A_j$). Observe that with positive probability (namely, the probability that the event $\{T_{n-1}(\theta_0) = A_n\}$ occurs before $\{T_{n-1}(\theta_0) < A_n\}$) there exists $n < \hat{\beta}_1$ such that $T_n(\theta)$ is nondifferentiable at $\theta_0$. Thus we see that the sum of the sets $\partial T_i(\theta_0)$ for $i$ within a cycle will be a non-singleton with positive probability, and hence by Theorem 4.2 it follows that $\mathbb{E}[T_\infty(\theta)]$ is not differentiable at $\theta_0$.

A consistent estimator of $\partial\, \mathbb{E}[T_\infty(\theta_0)]$ can now be computed as follows. Fix some $M > 0$, and simulate the system for $M$ cycles. Let $\hat{\beta}_0, \ldots, \hat{\beta}_M$ be the regeneration points (without loss of generality, assume we start with an empty system, i.e. $\hat{\beta}_0 = 0$). Then
\[ \frac{\sum_{i=1}^{M} \sum_{j=\hat{\beta}_{i-1}}^{\hat{\beta}_i - 1} \partial T_j(\theta_0)}{\sum_{i=1}^{M} (\hat{\beta}_i - \hat{\beta}_{i-1})} \]
is an estimator of $\partial\, \mathbb{E}[T_\infty(\theta_0)]$, where the $\partial T_j(\theta_0)$ are given by (6.1) and (6.2), and $\partial T_0(\theta_0) = \partial T_{\hat{\beta}_i}(\theta_0) = \{1\}$. Another consistent estimator is obtained by simulating the system for $N$ periods and computing
\[ \frac{1}{N} \sum_{i=0}^{N-1} \partial T_i(\theta_0). \]

2. A tandem queue model

Consider a series of $K$ single-server queues, with general distributions for the interarrival times $A_n$ and the service times $\{S_n^k(\theta)\}$, $n = 0, 1, \ldots$, $k = 1, \ldots, K$. Suppose that the service time vectors $S_n(\theta) := (S_n^1(\theta), \ldots, S_n^K(\theta))$ are iid, and that the network is stable. Such a system has been studied in the literature in the case where the $S_n(\theta)$ are differentiable: some papers deal with optimization issues and consequently the computation of derivatives of steady-state quantities, often using perturbation
analysis techniques (see for instance Wardi and Hu [40], Hu [18], Chong and Ramadge [4]), whereas Glasserman [10] shows that the waiting times Wn () and the derivative process Wn0 () regenerate at the same epochs. Here, we do not assume dierentiability; instead we assume that Snk () is only a subdierentiable function of for each n and each k, and that the functions Sn () := (Sn1 (); : : :; SnK ()) are iid. We also assume that the service times satisfy the following property: for any > 0 there exists a constant M > 0 such that, for any in a neighborhood of 0, P jSnk () ? Snk (0)j M k ? 0k > 1 ? (6.3) for all n and k. This condition is satis ed for instance if Snk () has the form Yk , where Yk is a random variable with nite expectation, which in turn includes the case when Snk () has exponential distribution with mean k . Let Tnk () denote the system time of job n upon its completion on server k. It is clear that one set of regeneration points for the process Tn() := (Tn1(); : : :; TnK ()) consists of the epochs at which an arriving customer nds the whole network empty. However, as pointed out by Nummelin [24], those points may never occur, even though the system is stable. Alternatively, with mild assumptions, Nummelin provides regeneration epochs for the waiting time process Wnk () and shows that the corresponding regeneration cycles have expected nite length. By using an argument similar to [24], it can be shown that under proper assumptions the system time process fTn()g is a regenerative Markov chain with regeneration points given by
\[ \tau_j(\theta) = \inf\big\{ n > \tau_{j-1}(\theta) : T_{n-1}^k - A_n < b_k(\theta) - \varepsilon,\ T_n^{k-1} > b_k(\theta) + \varepsilon,\ k = 1, \dots, K \big\}, \]

where $b(\theta) = (b_1(\theta), \dots, b_K(\theta))$ is a deterministic continuous function of $\theta$, and $\varepsilon$ is an arbitrary positive number. Notice that on $\{\tau_j(\theta) = n\}$ the total waiting time of job $n$ is zero, and hence we have that
\[ T_n^k(\theta) = S_n^1(\theta) + \dots + S_n^k(\theta), \quad k = 1, \dots, K. \]

For $\theta_0 \in \Theta$, let $B_{\theta_0} \subset \mathbb{R}^K \times \mathbb{R}$ denote the set
\[ B_{\theta_0} = \{ (x, a) : x_k - a < b_k(\theta_0) - \varepsilon,\ k = 1, \dots, K \} \]

and let $D_{\theta_0} \subset \mathbb{R}^K$ denote the set
\[ D_{\theta_0} = \{ y : y_1 + \dots + y_{k-1} > b_k(\theta_0) + \varepsilon,\ k = 1, \dots, K \}. \]

It is easy to see that on $\{(T_n(\theta_0), A_{n+1}) \in B_{\theta_0}\} \cap \{S_{n+1}(\theta_0) \in D_{\theta_0}\}$ we have that $T_n^k(\theta) - A_{n+1} < b_k(\theta) - \varepsilon$ and $T_{n+1}^{k-1} > b_k(\theta) + \varepsilon$ for all $k = 1, \dots, K$ and all $\theta$ in some neighborhood $V_n(\omega)$ of $\theta_0$, so it follows that
\[ T_{n+1}^k(\theta) = S_{n+1}^1(\theta) + \dots + S_{n+1}^k(\theta), \quad k = 1, \dots, K \]

on $V_n(\omega)$. Furthermore, by writing a recursive expression for $T_n^k(\theta)$ it is readily seen that $T_n^k(\theta)$ is subdifferentiable, since so are the service times. Therefore, the conditions
of Theorem 3.2 are satisfied, and hence we conclude that $\{T_n(\theta_0), \partial T_n(\theta_0)\}$ regenerates at the epochs $\{\tau_j(\theta_0)\}$.

Let us show now that assumption A3 is satisfied. It is possible to show that, given $\omega \in \Omega$, $\theta \in \Theta$, $k \le K$ and $n \ge 0$, there exist sets $J$ and $M$ (depending on $\omega$, $\theta$, $n$ and $k$) such that $J \subset \{1, \dots, K\}$, $M \subset \{\tau_{\ell-1}(\theta), \dots, \tau_\ell(\theta)\}$ for some $\ell > 0$, and we can write

\[ T_n^k(\theta, \omega) = \sum_{j \in J} \sum_{m \in M} S_m^j(\theta, \omega). \]
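To make this representation concrete, the following sketch (our own illustration, not code from [17]; the function and variable names are ours) computes the system times $T_n^k$ of a tandem queue via the standard max-plus recursion $T_n^k = \max(T_n^{k-1},\, T_{n-1}^k - A_n) + S_n^k$, and then checks by brute-force enumeration of monotone lattice paths that each $T_n^k$ is the maximal path value; the stations and jobs visited by a maximizing path play the role of the index sets $J$ and $M$ above.

```python
def system_times(S, A):
    """Max-plus recursion for system times in a tandem queue:
    T_n^k = max(T_n^{k-1}, T_{n-1}^k - A_n) + S_n^k, where the inner max
    is floored at 0 (binding only at the first station for positive S).
    S[n][k]: service time of job n at station k; A[n]: gap between
    arrivals n-1 and n (A[0] is unused)."""
    N, K = len(S), len(S[0])
    T = [[0.0] * K for _ in range(N)]
    for n in range(N):
        for k in range(K):
            left = T[n][k - 1] if k > 0 else 0.0       # same job, previous station
            up = T[n - 1][k] - A[n] if n > 0 else 0.0  # previous job, same station
            T[n][k] = max(left, up, 0.0) + S[n][k]
    return T

def path_values(S, A, n, k):
    """Values of all monotone lattice paths ending at node (n, k): a path
    starts at some (m, 0), moves by steps (m, j) -> (m, j+1) or
    (m, j) -> (m+1, j), collects S[m][j] at each node and pays -A[m+1] on
    each job-advance step.  T_n^k equals the maximum of these values."""
    vals = []
    if k == 0:
        vals.append(S[n][0])  # path starting fresh at (n, 0)
        if n > 0:
            vals += [v - A[n] + S[n][0] for v in path_values(S, A, n - 1, 0)]
    else:
        vals += [v + S[n][k] for v in path_values(S, A, n, k - 1)]
        if n > 0:
            vals += [v - A[n] + S[n][k] for v in path_values(S, A, n - 1, k)]
    return vals
```

The dynamic program and the path enumeration are two readings of the same longest-path problem: the recursion is the Bellman equation of the lattice graph, so both return identical values (up to floating-point rounding).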
The sets $J$ and $M$ correspond to the solution of a longest-path problem in a graph, as shown in Homem-de-Mello, Shapiro and Spearman [17]; we refer to that paper for details. Let $J_0$, $M_0$ be the sets corresponding to $T_n^k(\theta_0, \omega)$, i.e.

\[ T_n^k(\theta_0, \omega) = \sum_{j \in J_0} \sum_{m \in M_0} S_m^j(\theta_0, \omega). \]
Then, by the property of longest paths we have that

\[ \sum_{j \in J} \sum_{m \in M} S_m^j(\theta, \omega) \ \ge\ \sum_{j \in J_0} \sum_{m \in M_0} S_m^j(\theta, \omega), \qquad \sum_{j \in J_0} \sum_{m \in M_0} S_m^j(\theta_0, \omega) \ \ge\ \sum_{j \in J} \sum_{m \in M} S_m^j(\theta_0, \omega), \]

and thus

\[ \sum_{j \in J_0} \sum_{m \in M_0} \big[ S_m^j(\theta, \omega) - S_m^j(\theta_0, \omega) \big] \ \le\ T_n^k(\theta, \omega) - T_n^k(\theta_0, \omega) \ \le\ \sum_{j \in J} \sum_{m \in M} \big[ S_m^j(\theta, \omega) - S_m^j(\theta_0, \omega) \big]. \]
It follows that

\[ |T_n^k(\theta, \omega) - T_n^k(\theta_0, \omega)| \ \le\ \max\Big\{ \Big| \sum_{j \in J_0} \sum_{m \in M_0} \big[ S_m^j(\theta, \omega) - S_m^j(\theta_0, \omega) \big] \Big| ,\ \Big| \sum_{j \in J} \sum_{m \in M} \big[ S_m^j(\theta, \omega) - S_m^j(\theta_0, \omega) \big] \Big| \Big\}. \tag{6.4} \]

Next, notice that by the assumption (6.3) on the service times it follows from (6.4) that, for any $\delta > 0$,

\[ |T_n^k(\theta) - T_n^k(\theta_0)| \ \le\ \sum_{j \in J_1} \sum_{m \in M_1} M \|\theta - \theta_0\| \]
with probability at least $1 - \delta$ (where $J_1$ and $M_1$ correspond to the maximizer of the right-hand side of (6.4)). Thus, we have that

\[ P\big( |T_n^k(\theta) - T_n^k(\theta_0)| \le \mathrm{card}(J_1)\, \mathrm{card}(M_1)\, M \|\theta - \theta_0\| \big) > 1 - \delta. \tag{6.5} \]

Finally, since $J_1 \subset \{1, \dots, K\}$ and $M_1 \subset \{\tau_{\ell-1}, \dots, \tau_\ell\}$ for some $\ell > 0$, it follows that $\mathrm{card}(J_1) \le K$ and that $\mathrm{card}(M_1)$ is bounded by the length of the $\ell$th cycle. Since the expected cycle length is finite, we conclude that there exists a constant $Q > 0$ such that

\[ P\big( \mathrm{card}(M_1) \le Q \big) > 1 - \delta. \tag{6.6} \]
Together with (6.5), (6.6) implies (by the inequality $P(A \cap B) \ge P(A) + P(B) - 1$) that with probability at least $1 - 2\delta$ we have
\[ |T_n^k(\theta) - T_n^k(\theta_0)| \ \le\ K Q M \|\theta - \theta_0\| \]

for all $n$ and $k$. As in example 1, we conclude that there exists a neighborhood $V$ of $\theta_0$, namely $V := \{ \theta : \|\theta - \theta_0\| < \varepsilon / (K Q M) \}$, such that $V \subset \bigcap_{\omega \in B} \bigcap_{n=1}^{\infty} V_n(\omega)$ with $P(B) > 1 - 2\delta$. Therefore, assumption A3 holds, and thus, by applying Proposition 5.3, we obtain

\[ \partial\, \mathrm{IE}\big[ T_\infty^k(\theta_0) \big] = \frac{ \mathrm{IE}\big[ \sum_{n=0}^{\tau_1 - 1} \partial T_n^k(\theta_0) \big] }{ \mathrm{IE}[\tau_1] }. \]

We can now estimate an element of $\partial\, \mathrm{IE}[T_\infty^k(\theta_0)]$ as follows. Fix some $M > 0$ and simulate the system for $M$ cycles. Let $\tau_0, \dots, \tau_M$ be the regeneration points (without loss of generality, assume we start with an empty system, i.e. $\tau_0 = 0$). Then

\[ \frac{\sum_{i=1}^{M} \sum_{j=\tau_{i-1}}^{\tau_i - 1} \partial T_j^k(\theta_0)}{\sum_{i=1}^{M} (\tau_i - \tau_{i-1})} \]

is an estimator of $\partial\, \mathrm{IE}[T_\infty^k(\theta_0)]$. Notice that for any $j \ge 0$ we have
\[ T_{\tau_j}^k(\theta_0) = S_{\tau_j}^1(\theta_0) + \dots + S_{\tau_j}^k(\theta_0), \quad k = 1, \dots, K, \]

and hence
\[ \nabla T_{\tau_j}^k(\theta_0) = \nabla S_{\tau_j}^1(\theta_0) + \dots + \nabla S_{\tau_j}^k(\theta_0), \quad k = 1, \dots, K, \]
where $\nabla f$ denotes an arbitrary subgradient of $f$. For the indices $n$ such that $\tau_{j-1} < n < \tau_j$, a subgradient of $T_n^k$ at $\theta_0$ can be computed by using the corresponding solution of the longest-path problem mentioned above, together with subgradients of the service times; we refer again to [17] for details. Another consistent estimator is obtained by simulating the system for $N$ periods and computing

\[ \frac{1}{N} \sum_{i=0}^{N-1} \nabla T_i^k(\theta_0). \]
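These estimators can be sketched in code as follows. The assumptions here are ours, not taken verbatim from the text: we use service times of the product form $S_n^k(\theta) = \theta_k Y_n^k$ mentioned after (6.3), so that $\nabla S_n^k(\theta) = Y_n^k e_k$, and, for simplicity, we regenerate at arrivals that find the whole network empty rather than at the Nummelin-type epochs $\tau_j(\theta)$ defined above. One subgradient of $T_n^k$ is propagated along an argmax (longest) path of the max-plus recursion, and the ratio estimator divides the cycle sums of subgradients by the number of jobs in completed cycles.

```python
def tandem_system_times_with_grads(theta, A, Y):
    """System times T[n][k] of a K-station tandem queue via
    T_n^k = max(T_n^{k-1}, T_{n-1}^k - A_n) + S_n^k with
    S_n^k(theta) = theta[k] * Y[n][k].  Alongside each T[n][k] we propagate
    one subgradient G[n][k] in R^K by following an argmax branch (a longest
    path) and summing the service-time gradients Y[n][k] * e_k."""
    K, N = len(theta), len(Y)
    T, G = [], []
    for n in range(N):
        Tn = [0.0] * K
        Gn = [[0.0] * K for _ in range(K)]
        for k in range(K):
            left = Tn[k - 1] if k > 0 else 0.0
            g_left = Gn[k - 1][:] if k > 0 else [0.0] * K
            up = (T[n - 1][k] - A[n]) if n > 0 else float("-inf")
            if up > max(left, 0.0):
                base, g = up, G[n - 1][k][:]
            elif left > 0.0:
                base, g = left, g_left
            else:
                base, g = 0.0, [0.0] * K
            g[k] += Y[n][k]  # gradient of S_n^k(theta) is Y[n][k] * e_k
            Tn[k] = base + theta[k] * Y[n][k]
            Gn[k] = g
        T.append(Tn)
        G.append(Gn)
    return T, G

def ratio_estimator(theta, A, Y):
    """Ratio-type estimate of a subgradient of IE[T_inf^K(theta)]: the sum
    of the per-job subgradients at the last station over all completed
    regeneration cycles, divided by the number of jobs in those cycles."""
    K = len(theta)
    T, G = tandem_system_times_with_grads(theta, A, Y)
    tot, tot_jobs = [0.0] * K, 0
    cyc, cyc_jobs = [0.0] * K, 0
    for n in range(len(Y)):
        if n > 0 and all(T[n - 1][k] - A[n] <= 0.0 for k in range(K)):
            # job n finds the whole network empty: a cycle just completed
            tot = [tot[j] + cyc[j] for j in range(K)]
            tot_jobs += cyc_jobs
            cyc, cyc_jobs = [0.0] * K, 0
        cyc = [cyc[j] + G[n][K - 1][j] for j in range(K)]
        cyc_jobs += 1
    return [t / tot_jobs for t in tot] if tot_jobs else None
```

Dividing the same accumulated subgradients by the total number of simulated jobs $N$ instead gives the long-run-average estimator; over whole cycles the two coincide.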
7 Concluding remarks

In this paper we have provided results that will hopefully enlarge the scope of applications of sensitivity analysis and optimization in stochastic systems to include "nonsmooth" processes. In particular, we have shown that, under some assumptions, one can consistently estimate directional derivatives (and consequently subdifferential sets and subgradients) of expected steady-state functions both by ratio-type and by long-run-average formulas. Those formulas are convenient in that they allow the computation of the derivatives to be done simultaneously with the simulation of the original process. We have given some examples showing potential applications of these results.
From the more theoretical viewpoint, our contribution is twofold: firstly, we extended the result in Shapiro and Wardi [36] by exhibiting a necessary and sufficient condition for the differentiability of the expected steady-state function. Secondly, we showed a limit theorem for compact-convex-valued multifunctions by proving that, under proper assumptions, the averages of a sequence of regenerative random multifunctions converge with probability one. We hope this work will allow many results from the well-studied fields of nonsmooth analysis and multifunctions to be incorporated into the area of stochastic processes.
Acknowledgements

I am very grateful to Professor Alexander Shapiro for enlightening discussions during the preparation of this work, and for many suggestions that helped to improve the quality of the paper. I also thank two anonymous referees for their comments, and Professor David McDonald and Professor A. Ioffe for discussions on the Bochner integral.
References

[1] S. Asmussen, Applied Probability and Queues, Wiley, New York, 1987.
[2] P. Bratley, B.L. Fox and L.E. Schrage, A Guide to Simulation, 2nd ed., Springer-Verlag, New York, 1987.
[3] C. Castaing and M. Valadier, Convex Analysis and Measurable Multifunctions, Springer-Verlag, Berlin, 1977.
[4] E.K.P. Chong and P.J. Ramadge, "Stochastic optimization of regenerative systems using infinitesimal perturbation analysis", IEEE Trans. on Autom. Control, vol. 39, no. 7, 1400-1410, 1994.
[5] K.L. Chung, A Course in Probability Theory, 2nd ed., Academic Press, 1974.
[6] F.H. Clarke, Optimization and Nonsmooth Analysis, reprint, SIAM, 1990.
[7] G. Debreu, "Integration of correspondences", Proc. Fifth Berkeley Symposium on Math. Statist. and Probability, vol. II, part I, 351-372, 1966.
[8] J. Diestel and J.J. Uhl, Jr., Vector Measures, American Mathematical Society, Providence, 1977.
[9] P. Glasserman, Gradient Estimation via Perturbation Analysis, Kluwer, Norwell (MA), 1991.
[10] P. Glasserman, "Regenerative derivatives of regenerative sequences", Adv. Appl. Prob. 25, 116-139, 1993.
[11] P. Glasserman and P.W. Glynn, "Gradient estimation for regenerative processes", Proceedings of the 1992 Winter Simulation Conference, 280-288.
[12] P.W. Glynn, "Optimization of stochastic systems via simulation", Proceedings of the 1989 Winter Simulation Conference, IEEE Press, 90-105, 1989.
[13] F. Hiai, "Strong laws of large numbers for multivalued random variables", in Multifunctions and Integrands, eds. A. Dold and B. Eckmann, Springer-Verlag, Berlin, 1984.
[14] F. Hiai and H. Umegaki, "Integrals, conditional expectations and martingales of multivalued functions", Journal of Multivariate Analysis, 7, 149-182, 1977.
[15] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms I, Springer-Verlag, Berlin, 1993.
[16] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms II, Springer-Verlag, Berlin, 1993.
[17] T. Homem-de-Mello, A. Shapiro and M.L. Spearman, "Finding optimal material release times using simulation-based optimization", to appear in Management Science, 1998.
[18] J.Q. Hu, "Convexity of sample path performance and strong consistency of infinitesimal perturbation analysis", IEEE Trans. on Autom. Control, vol. 37, no. 2, 258-262, 1992.
[19] A.D. Ioffe and V.M. Tihomirov, "On the minimization of integral functionals", Functional Anal. Appl. 3, 218-227, 1969.
[20] A.D. Ioffe and V.M. Tihomirov, Theory of Extremal Problems, North-Holland, 1979.
[21] J.L. Kelley and T.P. Srinivasan, Measure and Integral, Springer-Verlag, New York, 1988.
[22] P. L'Ecuyer, "A unified view of the IPA, SF and LR gradient estimation techniques", Management Science 36, 11, 1364-1383, 1990.
[23] P. L'Ecuyer and P.W. Glynn, "Stochastic optimization by simulation: convergence proofs for the GI/G/1 queue in steady-state", Management Science 40, 11, 1562-1578, 1994.
[24] E. Nummelin, "Regeneration in tandem queues", Adv. Appl. Prob. 13, 221-230, 1981.
[25] E. Nummelin, General Irreducible Markov Chains and Non-negative Operators, Cambridge University Press, 1984.
[26] E.L. Plambeck, B.R. Fu, S.M. Robinson and R. Suri, "Sample-path optimization of convex stochastic performance functions", Mathematical Programming, Series B, 75, 137-176, 1996.
[27] H. Rådström, "An embedding theorem for spaces of convex sets", Proc. Amer. Math. Soc., 3, 165-169, 1952.
[28] D. Revuz, Markov Chains, North-Holland, 1984.
[29] S.M. Robinson, "Convergence of subdifferentials under strong stochastic convexity", Management Science, 41, 1397-1401, 1995.
[30] R.T. Rockafellar, Convex Analysis, Princeton University Press, 1970.
[31] R.T. Rockafellar, "Integral functionals, normal integrands and measurable selections", in Nonlinear Operators and the Calculus of Variations, eds. J.P. Gossez, E.J. Lami Dozo, J. Mawhin and L. Waelbroeck, Springer-Verlag, Berlin, 1976.
[32] R.T. Rockafellar and R.J.-B. Wets, "On the interchange of subdifferentiation and conditional expectation for convex functionals", Stochastics, vol. 7, 173-182, 1982.
[33] H. Royden, Real Analysis, Macmillan, New York, 1988.
[34] R.Y. Rubinstein and A. Shapiro, Discrete Event Systems: Sensitivity Analysis and Stochastic Optimization by the Score Function Method, John Wiley & Sons, New York, 1993.
[35] A. Shapiro, "On concepts of directional differentiability", Journal of Optimization Theory and Applications, vol. 66, no. 3, 477-487, 1990.
[36] A. Shapiro and Y. Wardi, "Nondifferentiability of the steady-state function in discrete event dynamical systems", IEEE Trans. on Autom. Control, vol. 39, no. 8, 1707-1711, 1994.
[37] G.S. Shedler, Regeneration and Networks of Queues, Springer-Verlag, New York, 1987.
[38] R. Suri, "Perturbation analysis: The state of the art and research issues explained via the GI/G/1 queue", Proceedings of the IEEE, vol. 77, 114-137, 1989.
[39] R. Suri and Y.T. Leung, "Single run optimization of discrete event simulations - an empirical study using the M/M/1 queue", IIE Transactions, 21, 1, 35-49, 1989.
[40] Y. Wardi and J.Q. Hu, "Strong consistency of infinitesimal perturbation analysis for tandem queueing networks", J. Discrete Event Dynamic Systems: Theory and Applications, vol. 1, 37-59, 1991.