A note on complexity of multistage stochastic programs M.M.C.R. Reaiche∗ IMPA, Rio de Janeiro, RJ, Brazil November 20, 2014

Abstract. In Shapiro [2006], estimates of the sample sizes required to solve a multistage stochastic programming problem with a given accuracy by the conditional sample average approximation method were derived. In this paper we construct an example in the multistage setting showing that these estimates cannot be significantly improved.

Keywords: Stochastic programming, Monte Carlo sampling, Sample average method, Complexity

1 Introduction

Consider the following T-stage stochastic programming problem, represented in nested form:

$$
\min_{x_1 \in X_1} F_1(x_1) + E_{|\xi_1}\left[ \inf_{x_2 \in X_2(x_1,\xi_2)} F_2(x_2,\xi_2) + E_{|\xi_{[2]}}\left[ \cdots + E_{|\xi_{[T-1]}}\left[ \inf_{x_T \in X_T(x_{T-1},\xi_T)} F_T(x_T,\xi_T) \right] \cdots \right] \right] \tag{1}
$$

driven by the random data process ξ1, ..., ξT. Here xt ∈ R^{nt}, t = 1, ..., T, are the decision variables, Ft : R^{nt} × R^{dt} → R are continuous functions and Xt : R^{nt−1} × R^{dt} ⇉ R^{nt}, t = 2, ..., T, are measurable multifunctions. The (continuous) function F1 : R^{n1} → R, the (nonempty) set X1 and the vector ξ1 are deterministic. In (1) we have made explicit the fact that the information available up to each stage, ξ[t] := (ξ1, ..., ξt), is used to compute the expected values. If the (conditional) distribution of ξt (given ξ[t−1]) is continuous, problem (1) cannot be addressed directly, except in some trivial cases. Indeed, the (conditional) expected value operators are multidimensional integrals on R^{dt}, which are typically impossible to evaluate with high accuracy even for moderate values of the dimension. Even when the support of each ξt is finite, but with a very large number of elements, it may not be possible to evaluate the integrals (summations) in (1) within a reasonable computational time. ∗ [email protected]


Hence, one usually approximates the random data of problem (1) by a discrete stochastic process, building a scenario tree. A classical idea is to construct the tree via Monte Carlo conditional sampling techniques. Given the scenario tree, one solves the sample average approximation (SAA) problem, that is, problem (1) with the discrete random data. This is the basic idea of the SAA method. Let us point out that the SAA method is not an algorithm; one still has to solve the SAA problem by an appropriate procedure (for example, the Stochastic Dual Dynamic Programming algorithm). One can study the complexity of the algorithms used to solve the SAA problem, although we will not discuss this in this paper. In general, even if we solve the SAA problem exactly, its first-stage optimal decision will not be optimal for the true problem. So there is an error that comes from approximating the true stochastic process by Monte Carlo sampling. Thus, it makes sense to define the complexity associated with the SAA method as the number of paths in the scenario tree, say N, that guarantees that any δ-optimal solution of the SAA problem is an ε-optimal solution of the true problem with probability at least 1 − α (here 0 ≤ δ ≤ ε and α ∈ (0, 1)). This notion of complexity was proposed and studied in [1, 2, 4]. Consider the (deterministic) first-stage objective function of the true problem:
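As an illustrative aside, the SAA idea can be sketched in a two-stage toy problem (a minimal sketch; the newsvendor-type recourse cost, the uniform distribution and all parameter values below are assumptions made for illustration only, not taken from the paper):

```python
import random

def saa_newsvendor(N, c=1.0, q=3.0, seed=0):
    """Approximate min_x c*x + E[q*max(xi - x, 0)] over x in [0, 10],
    with xi ~ U(0, 10), by replacing the expectation with an average
    over N sampled scenarios (toy two-stage problem for illustration)."""
    rng = random.Random(seed)
    sample = [rng.uniform(0.0, 10.0) for _ in range(N)]

    def f_hat(x):  # SAA objective: sample average replaces E[.]
        return c * x + sum(q * max(xi - x, 0.0) for xi in sample) / N

    # a crude grid search stands in for "an appropriate procedure"
    xs = [i / 100 for i in range(1001)]
    return min(xs, key=f_hat)

# the true optimum is the (1 - c/q)-quantile of U(0, 10), i.e. x* = 20/3
x_hat = saa_newsvendor(N=2000)
```

The first-stage SAA solution `x_hat` is close to, but in general not equal to, the true optimizer; the sample-size question studied in this paper is how large N must be for this gap to be at most ε with high probability.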

$$
f(x_1) = \begin{cases} F_1(x_1) + Q_2(x_1), & \text{if } x_1 \in X_1 \\ +\infty, & \text{otherwise} \end{cases} \tag{2}
$$

where

$$
Q_2(x_1) = E\left[ \inf_{x_2 \in X_2(x_1,\xi_2)} F_2(x_2,\xi_2) + E_{|\xi_{[2]}}\left[ \cdots + E_{|\xi_{[T-1]}}\left[ \inf_{x_T \in X_T(x_{T-1},\xi_T)} F_T(x_T,\xi_T) \right] \cdots \right] \right]
$$

is the true cost-to-go function. Denote by V∗ the optimal value of problem (1), i.e. V∗ := inf_{x1 ∈ X1} f(x1). We say that x1 ∈ X1 is a first-stage ε-optimal solution of problem (1) if f(x1) ≤ V∗ + ε. Given the scenario tree, we can also consider the set of first-stage δ-optimal solutions of the SAA problem. Since the scenario tree is constructed by Monte Carlo sampling, we can consider the event "any first-stage δ-optimal solution of the SAA problem is a first-stage ε-optimal solution of the true problem", for given 0 ≤ δ ≤ ε. The probability of this event is a function of the number of branches, say Nt, t = 2, ..., T, of each stage node in the tree. The total number of paths in the tree is just N = ∏_{t=2}^{T} Nt (remember that N1 = 1, since ξ1 is deterministic). When T = 2, the random sample consists of N2 = N realizations of ξ2. It is possible to show, under mild regularity conditions (cf. [2] or [3, p. 195]), that for ε > 0 and α ∈ (0, 1) the sample size

$$
N \geq \frac{O(1)\sigma^2}{\varepsilon^2}\left[ n_1 \log\left(\frac{O(1) D L}{\varepsilon}\right) + \log\left(\frac{1}{\alpha}\right) \right] \tag{3}
$$

guarantees that any (ε/2)-optimal solution of the SAA problem is an ε-optimal solution of the true problem with probability at least 1 − α (in this case, δ = ε/2). Here O(1) is a generic constant, D is the diameter of X1, L is a Lipschitz constant of f on X1 and σ² is a constant measuring the variability of Q2(x1, ξ2) (cf. [2]). In [2, Example 1] it was shown that this estimate cannot be significantly improved. Indeed, for a family of two-stage stochastic programs (indexed by k ∈ N), it was shown in that example that for ε ∈ (0, 1) and α ∈ (0, 0.3) the sample size should satisfy

$$
N > \frac{n_1 \sigma^2}{\varepsilon^{2/v_k}}, \tag{4}
$$

in order to guarantee that with probability 1 − α an exact optimal solution of the SAA problem is an ε-optimal solution of the true problem, where v_k := 2k/(2k − 1). Comparing (4) with the estimate (3), both quantities grow linearly in n1. In (3), N is proportional to σ²/ε², while in (4) N is proportional to σ²/ε^{2/v_k}. Observe that v_k ↘ 1 as k → +∞, establishing that (3) is tight.

In [2], the estimate (3) was extended to an arbitrary finite number of stages T ≥ 2. In order to avoid some technical difficulties, the analysis was conducted under the stagewise independence hypothesis. In this case, the scenario tree can be constructed in two ways:

(i) By conditional sampling: first, generate a random sample of ξ2, say ξ2^{i2}, i2 = 1, ..., N2, of size N2. For each second-stage node ξ2^{i2}, generate a random sample of ξ3 | ξ2^{i2}, say ξ3^{i2,i3}, i3 = 1, ..., N3, of size N3, such that the ξ3^{i2,i3} are independent of each other. This continues forward in stages: having generated the scenario tree up to stage t ≤ T − 1, for each t-th-stage node ξt^{i2,...,it}, generate a random sample of ξ_{t+1} | ξt^{i2,...,it}, say ξ_{t+1}^{i2,...,i_{t+1}}, i_{t+1} = 1, ..., N_{t+1}, of size N_{t+1}, such that the ξ_{t+1}^{i2,...,i_{t+1}} are independent of each other.

(ii) Taking advantage of the stagewise independence: since ξ_{t+1} | ξ_{[t]} has the same distribution as ξ_{t+1} for each ξ_{[t]}, we can take the same sample for each t-th-stage node, say ξ_{t+1}^{i_{t+1}}, i_{t+1} = 1, ..., N_{t+1}.

In both alternatives, the total number of paths in the scenario tree is equal to N = ∏_{t=2}^{T} Nt. Applying Theorem 2 in [2] (which requires some mild regularity assumptions) for ε > 0 and α ∈ (0, 1), if we take N2 = ... = NT, then for

$$
N_t \geq \frac{O(1)\sigma^2}{\varepsilon^2}\left[ (n_1 + \cdots + n_{T-1}) \log\left(\frac{O(1) D L}{\varepsilon}\right) + \log\left(\frac{1}{\alpha}\right) \right] \tag{5}
$$

any first-stage ε/2-optimal solution of the SAA problem is a first-stage ε-optimal solution of the true problem with probability at least 1 − α. By (5) we obtain an estimate for N that grows exponentially with the number of stages. As pointed out in [2], this result indicates that the SAA method could be practically inapplicable for solving multistage problems with a large number of stages.
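The exponential growth of N = N_t^{T−1} can be made concrete by evaluating the shape of estimate (5) numerically (a sketch only: the generic O(1) constants are unspecified in the estimate, so the placeholder value C = 1 and all parameter values below are assumptions):

```python
import math

def per_stage_size(T, eps=0.1, alpha=0.05, sigma=1.0, D=1.0, L=1.0, n=2, C=1.0):
    """Per-stage sample size N_t following the shape of estimate (5),
    with the generic O(1) constants replaced by the placeholder C
    (an assumption made only to illustrate the growth)."""
    dims = n * (T - 1)                      # n_1 + ... + n_{T-1} with n_t = n
    rhs = (C * sigma**2 / eps**2) * (dims * math.log(C * D * L / eps)
                                     + math.log(1.0 / alpha))
    return math.ceil(rhs)

def total_paths(T, **kw):
    """Total number of scenario-tree paths N = N_t^(T-1) when N_2 = ... = N_T."""
    return per_stage_size(T, **kw) ** (T - 1)
```

Each additional stage multiplies the total path count by roughly N_t, so even modest per-stage sizes produce astronomically large trees for moderate T.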

In this paper we consider the other side of the story, that is, we construct an example showing that the estimate obtained for N in the T-stage setting cannot be significantly improved, in the spirit of Example 1 in [2]. Indeed, we exhibit a family of T-stage stochastic programming problems (indexed by k ∈ N) such that the total number of paths in the scenario tree must satisfy

$$
N > \left(\frac{\sigma^2}{\varepsilon^{2/v_k}}\right)^{T-1} \left[ n_1 + \cdots + n_{T-1} \right]^{T-1}, \tag{6}
$$

in order to guarantee that with probability 1 − α an exact first-stage optimal solution of the SAA problem is an ε-optimal solution of the true problem. As before, v_k := 2k/(2k − 1), so v_k ↘ 1 as k → +∞. Let us point out that the condition N2 = ... = NT was assumed in [2] to derive (5) for each Nt, t = 2, ..., T, separately (cf. [2, equation 29]), but we will not assume it in order to derive (6). The rest of the paper is organized as follows: Section 2 is the core part, where we present the family of examples satisfying (6); Section 3 deals with the fact that, at first glance, our example seems to belong to a different problem class than problem (1); we show in Section 3 that this is not the case. We close the paper with a brief discussion in Section 4.

2 Example

For k ∈ N, let us consider the T-stage stochastic programming problem (cf. [3, p. 60, eq. 3.4]):

$$
\begin{aligned}
\min_{x(\cdot)} \;\; & E\left[ \|x_1\|^{2k} - 2k \langle \xi_2 + \cdots + \xi_T, x_1 \rangle + \|x_2(\xi_{[2]})\|^{2k} + \cdots + \|x_T(\xi_{[T]})\|^{2k} \right] \\
\text{s.t.} \;\; & x_1 \in B_n, \\
& x_t(\xi_{[t]}) \in B_n \;\; \text{w.p.1}, \; t = 2, ..., T,
\end{aligned} \tag{7}
$$

where x(ξ) = (x1, x2(ξ[2]), ..., xT(ξ[T])) is an implementable policy, X1 = Xt(x_{t−1}, ξt) = B_n := {x ∈ R^n : ||x|| ≤ 1} are the (measurable) multifunctions, ξt ∼ N(0, σ²I_n), t = 2, ..., T, are i.i.d. multivariate normal random vectors, and ξ1 = 0. Writing (7) in the nested form, we obtain:

$$
\min_{x_1 \in B_n} \|x_1\|^{2k} + E\left[ \inf_{x_2 \in B_n} \|x_2\|^{2k} - 2k\langle \xi_2, x_1 \rangle + E\left[ \cdots + E\left[ \inf_{x_T \in B_n} \|x_T\|^{2k} - 2k\langle \xi_T, x_1 \rangle \right] \cdots \right] \right] \tag{8}
$$

So each stage objective function is Ft(x1, xt, ξt) := ||xt||^{2k} − 2k⟨x1, ξt⟩, t = 1, ..., T (with ξ1 = 0 at the first stage). Hence, they also depend on the first-stage decision x1. It should be noted that our example is not exactly in the same class of problems as form (1). In order to keep the natural flow of the paper, we will address this issue in the next section. Since the random vectors are stagewise independent, we can construct the scenario tree as in Section 1 (ii), i.e., taking independent random samples of each stage's random vector and considering the product probability measure of the stagewise empirical distributions:

$$
S_{N_2,...,N_T} := \left\{ \xi_t^{i} \sim N(0, \sigma^2 I_n) : \; i = 1, ..., N_t, \; t = 2, ..., T \right\}. \tag{9}
$$

Let us derive the objective functions f(·) and f̂_N(·) of the true problem and of the SAA problem given S_{N2,...,NT}, respectively (it would be more precise to write f̂_{N2,...,NT}(·), but we prefer to simplify the notation; be aware, though, that this can be misleading). Let us begin with the T-th-stage (optimal) value function obtained from the dynamic programming equation:

$$
Q_T(x_1, ..., x_{T-1}, \xi_T) = \inf_{x_T \in B_n} F_T(x_1, ..., x_T, \xi_T) = \min_{x_T \in B_n} \left\{ \|x_T\|^{2k} - 2k\langle \xi_T, x_1 \rangle \right\} = -2k\langle \xi_T, x_1 \rangle \tag{10}
$$

(the minimum is attained at x_T = 0, since the term −2k⟨ξ_T, x_1⟩ does not depend on x_T).

The T-th-stage cost-to-go functions of the true problem and of the SAA problem are obtained, respectively, by taking the expected value of Q_T(x1, ..., x_{T−1}, ξT) with respect to the true distribution of ξT and the approximating distribution of ξT:

$$
Q_T(x_1, ..., x_{T-1}) = E\left[ -2k\langle \xi_T, x_1 \rangle \right] = 0, \qquad
\hat{Q}_T(x_1, ..., x_{T-1}) = \hat{E}\left[ -2k\langle \xi_T, x_1 \rangle \right] = -2k\langle \bar{\xi}_T, x_1 \rangle, \tag{11}
$$

where ξ̄_T = (1/N_T) ∑_{i=1}^{N_T} ξ_T^i. Continuing backward in stages, it is not difficult to verify that we obtain:

$$
Q_2(x_1) = 0, \qquad \hat{Q}_2(x_1) = -2k\langle \bar{\xi}_2 + \cdots + \bar{\xi}_T, x_1 \rangle, \tag{12}
$$

where ξ̄_t = (1/N_t) ∑_{i=1}^{N_t} ξ_t^i, t = 2, ..., T. Let us define η := ξ̄_2 + ... + ξ̄_T, so by (12) it follows that f(x_1) = ||x_1||^{2k} and f̂_N(x_1) = ||x_1||^{2k} − 2k⟨η, x_1⟩, for x_1 ∈ B_n. The (unique) first-stage optimal solution of the true problem is x̄_1 = 0, so its optimal value is V∗ = 0.
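The closed form of f̂_N can be cross-checked numerically on a small instance (a sketch under assumed toy sizes n = 1, k = 1, T = 3; the stage objectives decouple across stages because each one depends only on x1 and its own decision variable, which is why the nested expectations collapse to stagewise averages):

```python
import random

def saa_nested_value(x1, samples):
    """Evaluate the nested SAA objective of (8) for n = 1, k = 1 on the
    shared-sample tree of construction (ii), solving each inner inf over
    x_t in [-1, 1] by grid search."""
    grid = [i / 100 for i in range(-100, 101)]   # the ball B_1 = [-1, 1]
    val = x1 ** 2
    for stage in samples:                        # stages t = 2, ..., T
        inner = [min(xt ** 2 - 2.0 * xi * x1 for xt in grid) for xi in stage]
        val += sum(inner) / len(inner)
    return val

def closed_form(x1, samples):
    """f_hat_N(x1) = x1^2 - 2*eta*x1, with eta the sum of the stage sample
    means, as in (12) (written here for n = 1, k = 1)."""
    eta = sum(sum(s) / len(s) for s in samples)
    return x1 ** 2 - 2.0 * eta * x1

rng = random.Random(1)
samples = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2)]  # T = 3
```

The two evaluations agree, confirming the backward recursion leading to (12).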


Moreover, the (exact) first-stage optimal solution of the SAA problem is given by:

$$
\hat{x}_1 = \begin{cases}
0, & \text{if } \|\eta\| = 0 \\
\dfrac{1}{\|\eta\|^{\gamma}}\, \eta, & \text{if } 0 < \|\eta\| < 1 \\
\dfrac{1}{\|\eta\|}\, \eta, & \text{if } \|\eta\| \geq 1
\end{cases} \tag{13}
$$

where γ = γ_k = (2k − 2)/(2k − 1). Hence, given ε ∈ (0, 1), x̂_1 is an ε-optimal solution of the true problem if, and only if, ||η||^{2k(1−γ)} ≤ ε. Define v = v_k = 2k(1 − γ) = 2k/(2k − 1). By (9), η ∼ N(0, (∑_{t=2}^{T} σ²/N_t) I_n). Considering the harmonic mean, say hm, of the numbers N_2, ..., N_T,

$$
\frac{T-1}{hm} := \sum_{t=2}^{T} \frac{1}{N_t},
$$

it follows that:

$$
\eta \sim N\left( 0, \; \frac{\sigma^2 (T-1)}{hm}\, I_n \right). \tag{14}
$$
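The closed-form minimizer (13) can be verified numerically: since the objective ||x||^{2k} − 2k⟨η, x⟩ improves by aligning x with η, the minimizer lies on the ray through η, reducing the problem to a one-dimensional search (a sketch; the brute-force resolution is an arbitrary choice):

```python
import math

def x_hat_closed_form(eta, k):
    """Closed-form minimizer (13) of ||x||^(2k) - 2k<eta, x> over the unit ball."""
    norm = math.sqrt(sum(e * e for e in eta))
    if norm == 0.0:
        return [0.0] * len(eta)
    gamma = (2 * k - 2) / (2 * k - 1)
    scale = norm ** (-gamma) if norm < 1.0 else 1.0 / norm
    return [scale * e for e in eta]

def x_hat_numeric(eta, k, steps=200000):
    """Brute-force check: search t in [0, 1] minimizing t^(2k) - 2k*t*||eta||,
    then rescale the unit vector eta/||eta|| by the best t."""
    norm = math.sqrt(sum(e * e for e in eta))
    if norm == 0.0:
        return [0.0] * len(eta)
    best_t = min((i / steps for i in range(steps + 1)),
                 key=lambda t: t ** (2 * k) - 2 * k * t * norm)
    return [best_t * e / norm for e in eta]
```

Both the interior case ||η|| < 1 (where the first-order condition gives t = ||η||^{1/(2k−1)}) and the boundary case ||η|| ≥ 1 (projection onto the unit sphere) match the closed form.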

Consider the following statement:

(A) with probability at least 1 − α, the (exact) first-stage optimal solution of the SAA problem is an ε-optimal solution of problem (1).

Let us now derive a lower bound for N in order for (A) to be satisfied. Indeed, suppose that N_2, ..., N_T are such that (A) holds, that is:

$$
P\left[ \|\eta\|^{v} \leq \varepsilon \right] \geq 1 - \alpha. \tag{15}
$$

This is equivalent to α ≥ 1 − P[||η||^v ≤ ε] = P[||η||^v > ε]. It follows from (14) that (hm/(σ²(T − 1))) ||η||² ∼ χ²_n. Observe also that

$$
P\left[ \frac{hm}{\sigma^2 (T-1)}\, \|\eta\|^2 > \frac{\varepsilon^{2/v}\, hm}{\sigma^2 (T-1)} \right] = P\left[ \|\eta\|^{v} > \varepsilon \right].
$$

Since the sequence P[χ²_n > n] is monotone increasing in n and P[χ²_1 > 1] = 0.3173, if α ∈ (0, 0.3173) we must necessarily have ε^{2/v} hm/(σ²(T − 1)) > n, that is:

$$
hm > \frac{\sigma^2\, n (T-1)}{\varepsilon^{2/v}}. \tag{16}
$$
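The two chi-square facts used above can be checked directly: P[χ²_1 > 1] = 2(1 − Φ(1)) = erfc(1/√2) ≈ 0.3173, and for even degrees of freedom the survival function has an elementary closed form (a sketch; only the cases needed for the monotonicity check are implemented):

```python
import math

def chi2_survival(n, x):
    """P[chi^2_n > x].  For n = 1 use the normal tail, P[Z^2 > x] = erfc(sqrt(x/2));
    for even n use the closed form exp(-x/2) * sum_{j < n/2} (x/2)^j / j!."""
    if n == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if n % 2 == 0:
        m = n // 2
        return math.exp(-x / 2.0) * sum((x / 2.0) ** j / math.factorial(j)
                                        for j in range(m))
    raise NotImplementedError("odd n > 1 not needed for this check")
```

Evaluating P[χ²_n > n] for n = 1, 2, 4, 6 shows the increasing behavior invoked in the derivation of (16).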

It is a well-known result that the harmonic mean of (positive) real numbers is always less than or equal to their geometric mean, say

$$
gm := (N_2 \cdots N_T)^{1/(T-1)} = N^{1/(T-1)}. \tag{17}
$$

So we arrive at the following lower bound for N:

$$
N > \left( \frac{\sigma^2}{\varepsilon^{2/v}} \right)^{T-1} \left[ n (T-1) \right]^{T-1}
  = \left( \frac{\sigma^2}{\varepsilon^{2/v}} \right)^{T-1} \left[ n_1 + \cdots + n_{T-1} \right]^{T-1}. \tag{18}
$$
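The final step combines hm ≤ gm with (16): since N = gm^{T−1}, the bound on hm transfers to N. This bookkeeping can be sketched numerically (parameter values in the usage below are arbitrary assumptions):

```python
import math

def harmonic_mean(ns):
    """Harmonic mean of positive numbers, as used for the branch counts N_t."""
    return len(ns) / sum(1.0 / x for x in ns)

def geometric_mean(ns):
    """Geometric mean; note gm(N_2, ..., N_T) = N^(1/(T-1)) by (17)."""
    return math.prod(ns) ** (1.0 / len(ns))

def paths_lower_bound(T, n, sigma, eps, v):
    """Lower bound (18) on the total number of paths N, obtained by raising
    hm > sigma^2 * n * (T-1) / eps^(2/v) to the power T-1 via hm <= gm."""
    return (sigma ** 2 * n * (T - 1) / eps ** (2.0 / v)) ** (T - 1)
```

For instance, with T = 3, n = 1, σ = 1, ε = 0.1 and v close to 1, the bound already requires tens of thousands of paths.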

As v = 2k/(2k − 1), it can be taken arbitrarily close to 1 by making k sufficiently large.

3 Technical Issues

As anticipated in the previous section, we will show that, in a certain sense, our example is in the same class as problem (1). Consider the following T-stage stochastic programming problem:

$$
\min_{y_1 \in Y_1} G_1(y_1) + E_{|\xi_1}\left[ \inf_{y_2 \in Y_2(y_1,\xi_2)} G_2(y_1, y_2, \xi_2) + E_{|\xi_{[2]}}\left[ \cdots + E_{|\xi_{[T-1]}}\left[ \inf_{y_T \in Y_T(y_{T-1},\xi_T)} G_T(y_1, ..., y_T, \xi_T) \right] \cdots \right] \right] \tag{19}
$$

where y_t ∈ R^{nt}, t = 1, ..., T, are the decision variables, G_t : (∏_{s=1}^{t} R^{ns}) × R^{dt} → R are continuous functions and Y_t : R^{nt−1} × R^{dt} ⇉ R^{nt}, t = 2, ..., T, are measurable multifunctions. The (continuous) function G_1 : R^{n1} → R and the (nonempty) set Y_1 are deterministic. The example in the previous section belongs to this class of multistage problems. Also, if we take G_t(y_1, ..., y_t, ξ_t) := F_t(y_t, ξ_t), Y_t(y_{t−1}, ξ_t) := X_t(y_{t−1}, ξ_t), t = 2, ..., T, G_1 := F_1 and Y_1 := X_1, then every element of class (1) can be written in form (19). Let us show that the converse is also true. Just write x_t := (y_1, ..., y_t) and F_t(x_t, ξ_t) := G_t(x_t, ξ_t), t = 2, ..., T; also x_1 := y_1, F_1 := G_1 and X_1 := Y_1. This part is quite straightforward, but dealing with the remaining (measurable) multifunctions is trickier. For t = 2, ..., T, define the (linear) projection operators:

$$
\begin{aligned}
p_t : \left( \prod_{s=1}^{t-1} \mathbb{R}^{n_s} \right) \times \mathbb{R}^{d_t} &\to \prod_{s=1}^{t-1} \mathbb{R}^{n_s}, & (y_1, ..., y_{t-1}, \xi_t) &\mapsto (y_1, ..., y_{t-1}), \\
q_t : \left( \prod_{s=1}^{t-1} \mathbb{R}^{n_s} \right) \times \mathbb{R}^{d_t} &\to \mathbb{R}^{n_{t-1}}, & (y_1, ..., y_{t-1}, \xi_t) &\mapsto y_{t-1}.
\end{aligned} \tag{20}
$$

Let us define:

$$
X_t(x_{t-1}, \xi_t) := \{ p_t(x_{t-1}, \xi_t) \} \times Y_t(q_t(x_{t-1}, \xi_t), \xi_t) = \{ x_{t-1} \} \times Y_t(y_{t-1}, \xi_t).
$$

Since the multifunction Y_t(y_{t−1}, ξ_t) is measurable and q_t is continuous (hence inner semicontinuous), it follows by [5, Theorem 14.13(a)] that Y_t(q_t(x_{t−1}, ξ_t), ξ_t) is measurable. Moreover, p_t is measurable, so by [5, Proposition 14.11(d)] we conclude that X_t(x_{t−1}, ξ_t) is measurable. We conclude that the two classes of problems, (1) and (19), are equivalent, in the sense that each instance of one class can be transformed into an instance of the other.
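The state-augmentation device x_t := (y_1, ..., y_t) and the projections (20) can be illustrated concretely (a toy sketch in which tuples stand in for the product spaces; the multifunction `toy_Y` and all names are purely illustrative):

```python
def p(x_prev, xi):
    """Projection p_t: keep the whole augmented state (y_1, ..., y_{t-1})."""
    return x_prev

def q(x_prev, xi):
    """Projection q_t: keep only the most recent decision y_{t-1}."""
    return x_prev[-1]

def X(x_prev, xi, Y):
    """Feasible set X_t(x_{t-1}, xi_t) = {x_{t-1}} x Y_t(y_{t-1}, xi_t):
    each feasible augmented state repeats the history and appends one
    feasible new decision."""
    return [x_prev + (y,) for y in Y(q(x_prev, xi), xi)]

# toy multifunction: Y_t(y_{t-1}, xi_t) = {y_{t-1} - 1, y_{t-1} + xi_t}
toy_Y = lambda y_prev, xi: [y_prev - 1, y_prev + xi]
```

Every element of X(x_prev, xi, toy_Y) carries the full history, which is what makes the cost G_t(y_1, ..., y_t, ξ_t) expressible as a function F_t(x_t, ξ_t) of the augmented state alone.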

4 Discussion

In this work we have shown that the sample size estimates derived in [2], required to solve a multistage stochastic program with a given accuracy by the sample average approximation method, are tight. Our example shows that there are instances of problem (1) in which the required number of paths in the scenario tree grows exponentially in T. This confirms that the number of scenarios that should be generated in order to solve (1) by Monte Carlo (conditional) sampling methods with a given accuracy can grow very fast as the number of stages increases.

5 Acknowledgments

The author wishes to thank Prof. Alexander Shapiro for useful discussions, helpful comments and support in writing this paper. The author also wishes to thank Prof. Alfredo Iusem for his help in preparing the manuscript. Of course, the author is solely responsible for any errors this document may contain.

References

[1] A.J. Kleywegt, A. Shapiro, T. Homem-de-Mello, The sample average approximation method for stochastic discrete optimization, SIAM Journal on Optimization 12 (2001) 479-502.

[2] A. Shapiro, On complexity of multistage stochastic programs, Operations Research Letters 34 (2006) 1-8.

[3] A. Shapiro, D. Dentcheva, A. Ruszczyński, Lectures on Stochastic Programming: Modeling and Theory, SIAM, Philadelphia, 2014.

[4] A. Shapiro, A. Nemirovski, On complexity of stochastic programming problems, in: V. Jeyakumar, A.M. Rubinov (Eds.), Continuous Optimization: Current Trends and Applications, Springer-Verlag, Berlin, 2005, pp. 111-144.

[5] R.T. Rockafellar, R.J.-B. Wets, Variational Analysis, Springer, Berlin, 3rd printing, 2009.
