Operations Research Letters 36 (2008) 547–550


Operations Research Letters journal homepage: www.elsevier.com/locate/orl

A sample-path approach to the optimality of echelon order-up-to policies in serial inventory systems

Woonghee Tim Huh a,∗, Ganesh Janakiraman b

a Department of IEOR, Columbia University, United States
b Stern School of Business, New York University, United States

Article info

Article history: Received 22 January 2008; Accepted 8 May 2008; Available online 11 June 2008

Abstract

We present a new proof of the optimality of echelon order-up-to policies in serial inventory systems, first proved by Clark and Scarf. Our proof is based on a sample-path analysis as opposed to the original proof based on dynamic programming induction. © 2008 Elsevier B.V. All rights reserved.

Keywords: Inventory; Multi-echelon; Sample-path analysis

∗ Corresponding address: Industrial Engineering and Operations Research Department, Columbia University, 500 West 120th Street, New York, NY 10027, United States. E-mail address: [email protected] (W.T. Huh).
0167-6377/$ – see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.orl.2008.05.006

1. Introduction

In this paper, we provide an alternate proof of the optimality of echelon order-up-to policies in serial inventory systems, first proved in the seminal paper by Clark and Scarf [5]. Our proof is based on a sample-path analysis as opposed to the original proof, based on dynamic programming induction. The crux of our proof is the following claim: for every policy that is not of the echelon order-up-to type, we can construct a policy that dominates the original policy strictly.

We now highlight the weaknesses and strengths of our approach relative to the original. In our analysis, we assume just as Clark and Scarf do that the single period cost is a sum of N functions (where N is the number of echelons) of the individual echelon inventory positions. While Clark and Scarf require these functions to be weakly convex, we require strict convexity. (Although this is a stricter assumption than weak convexity, note that any convex function can be approximated arbitrarily closely by a strictly convex function.) Furthermore, their proof also provides an algorithm for computing the optimal order-up-to levels, whereas our analysis does not. In these aspects, our analysis is weaker than the original. On the other hand, we believe that our proof is simpler, especially for those who are not familiar with the technique of dynamic programming induction. Given that the optimality of echelon order-up-to policies is one of the most important results in inventory theory, we think that there is merit in developing new proofs that are fundamentally different from the original.

The only other alternate proofs of this result, that we are aware of, are due to [3,4,6]. While elegant and important in their own right, they require the cost function to be composed of linear holding and shortage costs. (Strictly speaking, the analysis of [6] allows cost functions that can depend on the waiting time of an individual customer and the time an inventory unit spends in an echelon. However, when we restrict attention to costs that are functions of echelon inventory positions alone, only linear holding and shortage costs are allowed.) Moreover, [4] requires demands to be independently and identically distributed, while [3] requires demands to be Markov-modulated; in contrast, we do not make any assumptions on the evolution of demands except that demands are unaffected by the ordering decisions. Additionally, these two papers consider the long run average cost criterion whereas our analysis holds for the finite horizon or the infinite horizon discounted cost criterion.

Recently, there have been some notable generalizations of these optimality results: for example, [2,7] study serial inventory systems in which each echelon has a specified replenishment frequency.

2. Problem definition

We consider an uncapacitated two-echelon serial inventory system under periodic review. Demand occurs at the lower stage (stage 1), which orders from the upper stage (stage 2); stage 2 orders from an outside source with ample supply. The replenishment lead times in the system are deterministic, and we assume that both lead times are one period in length. We assume


that demands in any two distinct periods are independent of each other. (These assumptions – that there are exactly two stages, that the lead times are one period in length, and that demands are independently distributed – are made only for the sake of exposition. Our proof extends readily to an arbitrary number of stages, arbitrary lead time lengths and any demand process whose evolution is unaffected by the policy.)

We consider a finite planning horizon $T$, and periods are indexed forward by $t$. Let $(D_t \mid t = 1, 2, \ldots, T)$ denote the sequence of nonnegative random variables representing demand. The sequence of events in each period $t$ is given as follows.

(i) Each echelon $j \in \{1, 2\}$ receives the delivery due to arrive in period $t$, which is the same as the quantity ordered in the previous period, $q_{t-1}^j$. Any backlogged demand at stage 1 is immediately satisfied to the extent possible.
(ii) The manager observes the amount of inventory in each echelon $j$. Let $x_t^j$ denote the net echelon inventory level for echelon $j$, which is the total amount of physical inventory at stages $j, \ldots, 1$ minus the amount of backordered demand at stage 1.
(iii) The manager makes the ordering decision $q_t^j \ge 0$ for each stage $j = 1, \ldots, N$. We require that $q_t^j$ cannot exceed the amount of inventory available in the immediately upstream stage, which is $x_t^{j+1} - x_t^j$. (For simplicity, let $x_t^3 = \infty$ for all $t$.) Let $y_t^j$ denote the after-ordering echelon inventory position for echelon $j$, i.e., $y_t^j = x_t^j + q_t^j$, which satisfies

$$ y_t^j \in [x_t^j, x_t^{j+1}]. \qquad (1) $$

(iv) Demand $D_t$ is realized and is satisfied to the extent possible. Any excess demand is backordered to the next period.

The dynamics of the system can be represented as follows: $x_{t+1}^j = y_t^j - D_t$. The expected single-period cost for period $t$, $\phi_t$, is given by a separable convex function of $(y_t^1, y_t^2)$, i.e.,

$$ \phi_t(y_t^1, y_t^2) = L_t^1(y_t^1) + L_t^2(y_t^2), \qquad (2) $$

where both $L_t^1$ and $L_t^2$ are strictly convex. We assume that each $L_t^j(y_t^j) \to \infty$ as $y_t^j \to \infty$ or $-\infty$. The objective is to minimize the $T$-horizon expected cost $E[\sum_{t=1}^{T} \phi_t(y_t^1, y_t^2)]$.

The above problem can be formulated as a dynamic program where the state and action pair in each period is given by $x_t = (x_t^1, x_t^2)$ and $y_t = (y_t^1, y_t^2)$, respectively.

An echelon order-up-to policy is an ordering policy characterized by a vector of order-up-to levels $S_t = (S_t^1, \ldots, S_t^N)$, which is defined for each period $t$. The ordering quantity at each stage is chosen to bring the echelon inventory level as close to the order-up-to level as possible. We also call this the echelon order-up-to $S$ policy. Thus, for $j \in \{1, 2\}$,

$$ q_t^j = \left[ \min\{S_t^j, x_t^{j+1}\} - x_t^j \right]^+, $$

or equivalently

$$ y_t^j = \max\{\min(S_t^j, x_t^{j+1}), x_t^j\}. $$
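The order-up-to rule above can be sketched in code. The following Python fragment is a minimal illustration and is not part of the original paper: the function names `order_up_to_action` and `step`, the list-based state layout, and the numbers in the usage lines are all assumptions chosen for exposition. It computes the after-ordering positions $y_t^j = \max\{\min(S_t^j, x_t^{j+1}), x_t^j\}$ with the convention that the echelon above the top one is unlimited, and then applies the dynamics $x_{t+1}^j = y_t^j - D_t$.

```python
import math

def order_up_to_action(x, S):
    """Return after-ordering echelon positions y for state x and levels S.

    x and S are lists indexed by echelon (index 0 is echelon 1).
    Hypothetical helper: y[j] = max(min(S[j], x[j+1]), x[j]), where the
    echelon above the top one is treated as an unlimited source (infinity).
    """
    upstream = list(x[1:]) + [math.inf]  # x^{j+1}; outside supplier is ample
    return [max(min(s, up), xj) for xj, s, up in zip(x, S, upstream)]

def step(y, demand):
    """Apply the dynamics x_{t+1}^j = y_t^j - D_t to every echelon."""
    return [yj - demand for yj in y]

# Usage with made-up numbers: echelon 1 is capped by echelon 2's stock.
x = [1, 4]
S = [6, 10]
y = order_up_to_action(x, S)            # echelon 1 can only be raised to x[1]
q = [yj - xj for yj, xj in zip(y, x)]   # implied order quantities, all >= 0
x_next = step(y, demand=3)
```

Because the rule clips $S_t^j$ to the interval $[x_t^j, x_t^{j+1}]$, the feasibility condition (1) holds by construction, and a state that already exceeds its order-up-to level simply orders nothing.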

3. Analysis

The following proposition provides a useful characterization of echelon order-up-to policies.

Proposition 1. For any $t \in \{1, \ldots, T\}$, the following statements are equivalent for the serial inventory system of Section 2:
(i) There exists an order-up-to vector $S_t = (S_t^1, S_t^2)$ such that the system follows an echelon order-up-to $S_t$ policy in period $t$.
(ii) For every pair of states $x_t^A = (x_t^{A1}, x_t^{A2})$ and $x_t^B = (x_t^{B1}, x_t^{B2})$, and the corresponding pair of actions $y_t^A = (y_t^{A1}, y_t^{A2})$ and $y_t^B = (y_t^{B1}, y_t^{B2})$, exactly one of the following three statements holds for each $j \in \{1, 2\}$:
◦ $y_t^{Aj} = y_t^{Bj}$;
◦ $y_t^{Aj} < y_t^{Bj}$ and either $y_t^{Bj} = x_t^{Bj}$ or $y_t^{Aj} = x_t^{A,j+1}$;
◦ $y_t^{Aj} > y_t^{Bj}$ and either $y_t^{Aj} = x_t^{Aj}$ or $y_t^{Bj} = x_t^{B,j+1}$.

Proof. The claim that (i) implies (ii) follows directly from the definition of an echelon order-up-to policy, $y_t^j = \max\{\min(S_t^j, x_t^{j+1}), x_t^j\}$. We will now prove that (ii) implies (i). Let $y_t(x_t) = (y_t^1(x_t), y_t^2(x_t))$ denote the action (inventory position after ordering) for the given state (inventory level before ordering) $x_t = (x_t^1, x_t^2)$. Fix $j \in \{1, 2\}$. Recall $y_t^j \in [x_t^j, x_t^{j+1}]$ from (1). We consider the following cases separately.

Case I. If $y_t^j(x_t) = x_t^j$ for all $x_t$, or $y_t^j(x_t) = x_t^{j+1}$ for all $x_t$, then echelon $j$ follows the order-up-to $S_t^j$ policy in period $t$, where $S_t^j = -\infty$ or $S_t^j = \infty$, respectively.

Case II. Suppose $y_t^j(x_t) \in \{x_t^j, x_t^{j+1}\}$ for all $x_t$. Then, define

$$ L^j = \{x_t : x_t^j < y_t^j(x_t) = x_t^{j+1}\}, $$
$$ R^j = \{x_t : x_t^j = y_t^j(x_t) < x_t^{j+1}\}, $$

and

$$ M^j = \{x_t : x_t^j = x_t^{j+1}\}. $$

Note that for every $j \in \{1, 2\}$, any $x_t$ belongs to exactly one of the sets $L^j$, $R^j$ and $M^j$. Then, we have $\tilde{x}_t^j \le \hat{x}_t^j$ for any $\tilde{x}_t \in L^j$ and $\hat{x}_t \in R^j$. (Otherwise, we must have $y_t^j(\tilde{x}_t) > \tilde{x}_t^j > \hat{x}_t^j = y_t^j(\hat{x}_t)$, which implies by (ii) that $y_t^j(\tilde{x}_t) = \tilde{x}_t^j$ or $y_t^j(\hat{x}_t) = \hat{x}_t^{j+1}$. In the first case, $\tilde{x}_t \in L^j$ implies $\tilde{x}_t^j = y_t^j(\tilde{x}_t) = \tilde{x}_t^{j+1}$, contradicting the definition of $L^j$. Similarly, in the other case, $\hat{x}_t \in R^j$ implies $\hat{x}_t^j = y_t^j(\hat{x}_t) = \hat{x}_t^{j+1}$, contradicting the definition of $R^j$.) Thus, there exists $S_t^j$ such that $\tilde{x}_t^j \le S_t^j \le \hat{x}_t^j$ for any $\tilde{x}_t \in L^j$ and $\hat{x}_t \in R^j$. It is now easy to verify that for every $x_t$ in $L^j$, $R^j$ and $M^j$, the policy followed by echelon $j$ satisfies the definition of the order-up-to $S^j$ policy.

Case III. We proceed by assuming otherwise, that is, there exists $\hat{x}_t$ such that $y_t^j(\hat{x}_t) \in (\hat{x}_t^j, \hat{x}_t^{j+1})$. Define $S_t^j = y_t^j(\hat{x}_t)$. For any state vector $\tilde{x}_t$, the condition of this proposition implies either (a) $y_t^j(\hat{x}_t) = \tilde{y}_t^j(\tilde{x}_t)$, (b) $y_t^j(\hat{x}_t) < \tilde{y}_t^j(\tilde{x}_t) = \tilde{x}_t^j$, or (c) $y_t^j(\hat{x}_t) > \tilde{y}_t^j(\tilde{x}_t) = \tilde{x}_t^{j+1}$. Thus, echelon $j$ follows the order-up-to $S_t^j$ policy in period $t$. □

The key idea for our sample-path approach is based on the following definition. Let A and B be any pair of systems starting from initial states $x_1^A = (x_1^{A1}, x_1^{A2})$ and $x_1^B = (x_1^{B1}, x_1^{B2})$, respectively. We say that a pair of systems (C, D) is the tightened pair of (A, B) if both of the following conditions are satisfied:
(i) The initial states for C and D are $x_1^A$ and $x_1^B$, respectively.
(ii) In each period $t$, $y_t^C = (y_t^{C1}, y_t^{C2})$ and $y_t^D = (y_t^{D1}, y_t^{D2})$ are feasible actions, which solve

$$ \min_{y_t^{Cj},\, y_t^{Dj}} \left\{ |y_t^{Cj} - y_t^{Dj}| \;:\; y_t^{Cj} + y_t^{Dj} = y_t^{Aj} + y_t^{Bj},\ x_t^{Cj} \le y_t^{Cj} \le x_t^{C,j+1},\ x_t^{Dj} \le y_t^{Dj} \le x_t^{D,j+1} \right\} \qquad (3) $$

for each $j \in \{1, 2\}$.

The first condition states that systems C and D have the same initial states as A and B, respectively. The second condition states that, in each period $t$, feasible actions are chosen such that the total number of units ordered in C and D is the same as the corresponding quantity in A and B, and the difference between $y_t^{Cj}$ and $y_t^{Dj}$ is as small as possible. The existence of the tightened pair (C, D) follows from a straightforward induction. Furthermore, the values of $y_t^{Cj}$ and $y_t^{Dj}$ in (3) can easily be shown to be unique. (Note the optimization problem in (3) can be reformulated as a single-variable problem.)
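As a hedged illustration of the single-variable reformulation of (3) (our own sketch, not taken from the paper), substitute $y_t^{Dj} = s - y_t^{Cj}$ with $s = y_t^{Aj} + y_t^{Bj}$. The objective becomes $|2 y_t^{Cj} - s|$ and the constraints reduce to one interval for $y_t^{Cj}$, so the minimizer is the point of that interval closest to $s/2$, which also makes the uniqueness claim plausible. The function name `tighten` and its argument names are assumptions introduced here.

```python
import math

def tighten(y_a, y_b, x_c, x_c_up, x_d, x_d_up):
    """Solve problem (3) for one echelon j and one period.

    y_a, y_b     : actions of systems A and B in this echelon
    x_c, x_c_up  : lower and upper bounds on C's action (x^{Cj}, x^{C,j+1})
    x_d, x_d_up  : lower and upper bounds on D's action (x^{Dj}, x^{D,j+1})
    Returns (y_c, y_d) with y_c + y_d == y_a + y_b and |y_c - y_d| minimal.
    """
    s = y_a + y_b
    # With y_d = s - y_c, D's bounds x_d <= y_d <= x_d_up become
    # s - x_d_up <= y_c <= s - x_d; intersect them with C's own bounds.
    lo = max(x_c, s - x_d_up)
    hi = min(x_c_up, s - x_d)
    if lo > hi:
        raise ValueError("no feasible pair of actions")
    y_c = min(max(s / 2, lo), hi)  # feasible point closest to s / 2
    return y_c, s - y_c

# Usage with made-up numbers for the top echelon (upper bounds are infinite).
free = tighten(2, 10, 0, math.inf, 0, math.inf)    # bounds do not bind
capped = tighten(2, 10, 0, math.inf, 8, math.inf)  # D cannot go below 8
```

In the second call system D's lower bound binds, so the pair cannot be equalized completely; it is only pulled as close together as feasibility allows.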


Proposition 2. Let A and B be a pair of systems starting from initial states $x_1^A$ and $x_1^B$, respectively, and let (C, D) be the tightened pair of (A, B). Then, for any $t \in \{1, \ldots, T\}$ and $j \in \{1, 2\}$,

$$ \min(y_t^{Aj}, y_t^{Bj}) \le \min(y_t^{Cj}, y_t^{Dj}) \le \max(y_t^{Cj}, y_t^{Dj}) \le \max(y_t^{Aj}, y_t^{Bj}). \qquad (4) $$

Proof. If $t = 1$, then $y_1^{Cj} = y_1^{Aj}$ and $y_1^{Dj} = y_1^{Bj}$ are feasible decisions in the optimization problem defined in (3). Thus, the pair of $y_1^{Cj}$ and $y_1^{Dj}$ optimizing (3) satisfies $|y_1^{Cj} - y_1^{Dj}| \le |y_1^{Aj} - y_1^{Bj}|$ for each $j$. From the fact that $y_1^{Cj}$ and $y_1^{Dj}$ are optimal and $y_1^{Cj} + y_1^{Dj} = y_1^{Aj} + y_1^{Bj}$, it follows that (4) holds for $t = 1$. We proceed by induction with a hypothesis that feasible $(y_{t-1}^{Cj}, y_{t-1}^{Dj})$ exists and satisfies (4) for $t - 1$, where $t > 1$. Since $x_t^j = y_{t-1}^j - D_{t-1}$, it follows from (3) that, for each $j$,

$$ x_t^{Aj} + x_t^{Bj} = x_t^{Cj} + x_t^{Dj} \qquad (5) $$

and

$$ \min(x_t^{Aj}, x_t^{Bj}) \le \min(x_t^{Cj}, x_t^{Dj}) \le \max(x_t^{Cj}, x_t^{Dj}) \le \max(x_t^{Aj}, x_t^{Bj}). \qquad (6) $$

Since $y_t^{Aj}$ is a feasible action in system A and $y_t^{Bj}$ is a feasible action in system B, it follows from (1) that $y_t^{Aj} \in [x_t^{Aj}, x_t^{A,j+1}]$ and $y_t^{Bj} \in [x_t^{Bj}, x_t^{B,j+1}]$. From this observation and (6), we obtain the following inequalities:

$$ \min(y_t^{Aj}, y_t^{Bj}) \le \min(x_t^{A,j+1}, x_t^{B,j+1}) \le \min(x_t^{C,j+1}, x_t^{D,j+1}), \qquad (7) $$

and

$$ \max(y_t^{Aj}, y_t^{Bj}) \ge \max(x_t^{Aj}, x_t^{Bj}) \ge \max(x_t^{Cj}, x_t^{Dj}). \qquad (8) $$

Note that, in the optimization problem (3), the upper bounds on $y_t^{Cj}$ and $y_t^{Dj}$ are given by $x_t^{C,j+1}$ and $x_t^{D,j+1}$, respectively, and the lower bounds are given by $x_t^{Cj}$ and $x_t^{Dj}$, respectively. Since (C, D) is the tightened pair of (A, B), it can now be verified using (5), (7) and (8) that $\min(y_t^{Cj}, y_t^{Dj}) \ge \min(y_t^{Aj}, y_t^{Bj})$ and $\max(y_t^{Cj}, y_t^{Dj}) \le \max(y_t^{Aj}, y_t^{Bj})$, completing the induction argument. □

If both systems A and B follow the echelon order-up-to policy with the same order-up-to vector, then it can easily be seen that the tightened pair of (A, B) is the same as (A, B) itself. Otherwise, we show in Proposition 3 that the tightened pair of these systems strictly improves the combined performance. More specifically, we show that a policy which is not of the echelon order-up-to type in period 1 can be dominated strictly by another policy from at least one starting state. Here, the choice of period 1 as the period of interest is without any loss of generality.

Proposition 3. Consider a policy that is not of the echelon order-up-to type in period 1. Then, there exists a pair of states $(x_1^A, x_1^B)$ such that the following statements are true, where A and B denote a pair of systems starting from $(x_1^A, x_1^B)$, and (C, D) is the tightened pair of (A, B):
(a) There exists $j \in \{1, 2\}$ such that

$$ \min(y_1^{Aj}, y_1^{Bj}) < \min(y_1^{Cj}, y_1^{Dj}) \le \max(y_1^{Cj}, y_1^{Dj}) < \max(y_1^{Aj}, y_1^{Bj}). $$

(b) For any $T^\circ \ge 1$,

$$ \sum_{t=1}^{T^\circ} \phi_t(y_t^{A1}, y_t^{A2}) + \sum_{t=1}^{T^\circ} \phi_t(y_t^{B1}, y_t^{B2}) > \sum_{t=1}^{T^\circ} \phi_t(y_t^{C1}, y_t^{C2}) + \sum_{t=1}^{T^\circ} \phi_t(y_t^{D1}, y_t^{D2}). \qquad (9) $$

Proof. Since systems A and B do not follow an echelon order-up-to policy in period 1, Proposition 1 implies the existence of two state-action pairs $(x_1^A, y_1^A)$ and $(x_1^B, y_1^B)$ that violate the conclusions of Proposition 1(ii), i.e., there exists some $j \in \{1, 2\}$ such that either

$$ y_1^{Aj} < y_1^{Bj}, \quad y_1^{Aj} \ne x_1^{A,j+1}, \quad \text{and} \quad y_1^{Bj} \ne x_1^{Bj}, \qquad (10) $$

or

$$ y_1^{Aj} > y_1^{Bj}, \quad y_1^{Bj} \ne x_1^{B,j+1}, \quad \text{and} \quad y_1^{Aj} \ne x_1^{Aj}. \qquad (11) $$

Since (10) and (11) are symmetric, we proceed by assuming (10). Then, by the feasibility of $y_1^A$ and $y_1^B$ in (1), we have $y_1^{Bj} > x_1^{Bj}$ and $y_1^{Aj} < x_1^{A,j+1}$. Since (C, D) is the tightened pair of (A, B), it follows from (3) that

$$ y_1^{Aj} < \min(y_1^{Cj}, y_1^{Dj}) \le \max(y_1^{Cj}, y_1^{Dj}) < y_1^{Bj}, $$

which establishes (a). By the strict convexity of $L_1^j$, we obtain

$$ L_1^j(y_1^{Aj}) + L_1^j(y_1^{Bj}) > L_1^j(y_1^{Cj}) + L_1^j(y_1^{Dj}). $$

Therefore, we establish the strict inequality in (9) from the following result:

$$ L_t^j(y_t^{Aj}) + L_t^j(y_t^{Bj}) \ge L_t^j(y_t^{Cj}) + L_t^j(y_t^{Dj}) \quad \text{for each } j \text{ and } t \ge 2, $$

which follows from $y_t^{Aj} + y_t^{Bj} = y_t^{Cj} + y_t^{Dj}$, Proposition 2 and the convexity of $L_t^j$. □

We immediately obtain the following result. In the statement of this result, we say a specific policy is suboptimal if there exists another policy that achieves a strictly lower expected cost in $[1, T]$ from at least one starting state.

Theorem 4. For the multi-echelon serial system described in Section 2, any policy outside the class of echelon order-up-to policies is suboptimal.

Proof. Inequality (9) implies either $\sum_{t=1}^{T^\circ} \phi_t(y_t^{A1}, y_t^{A2}) > \sum_{t=1}^{T^\circ} \phi_t(y_t^{C1}, y_t^{C2})$ or $\sum_{t=1}^{T^\circ} \phi_t(y_t^{B1}, y_t^{B2}) > \sum_{t=1}^{T^\circ} \phi_t(y_t^{D1}, y_t^{D2})$. The result now follows directly from the definition of a suboptimal policy. □
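The sample-path argument can be checked numerically on a small instance. The sketch below is our own illustration under stated assumptions, not the authors' code: the ordering rule `policy` (order a fixed amount whenever possible, which is not of the order-up-to type), the quadratic cost in `cost`, the starting states and the demand path are all hypothetical choices. It runs systems A and B under that rule, builds the tightened pair (C, D) period by period by solving (3) in each echelon, and records whether inequality (4) holds throughout together with the two sides of (9).

```python
import math

def tighten(y_a, y_b, x_c, x_c_up, x_d, x_d_up):
    """One-echelon solution of (3): closest feasible split of y_a + y_b."""
    s = y_a + y_b
    lo, hi = max(x_c, s - x_d_up), min(x_c_up, s - x_d)
    y_c = min(max(s / 2, lo), hi)
    return y_c, s - y_c

def policy(x):
    """Hypothetical non-order-up-to rule: order fixed amounts if feasible."""
    x1, x2 = x
    return (min(x1 + 1, x2), x2 + 2)

def cost(y):
    """Assumed strictly convex, separable cost L^1(y^1) + L^2(y^2)."""
    return (y[0] - 4) ** 2 + (y[1] - 8) ** 2

def simulate(x_a, x_b, demands):
    """Run A, B and their tightened pair (C, D) on one demand path."""
    x_c, x_d = x_a, x_b              # condition (i): same initial states
    cost_ab = cost_cd = 0.0
    nested = True                    # tracks inequality (4) in every period
    for d in demands:
        y_a, y_b = policy(x_a), policy(x_b)
        ups_c = (x_c[1], math.inf)   # upper bounds x^{C,j+1}, top one is inf
        ups_d = (x_d[1], math.inf)
        y_c, y_d = [], []
        for j in range(2):           # condition (ii): solve (3) per echelon
            c, dd = tighten(y_a[j], y_b[j], x_c[j], ups_c[j], x_d[j], ups_d[j])
            y_c.append(c)
            y_d.append(dd)
            nested = nested and (min(y_a[j], y_b[j]) <= min(c, dd)
                                 and max(c, dd) <= max(y_a[j], y_b[j]))
        cost_ab += cost(y_a) + cost(y_b)
        cost_cd += cost(y_c) + cost(y_d)
        # dynamics x_{t+1} = y_t - D_t for all four systems
        x_a, x_b = tuple(v - d for v in y_a), tuple(v - d for v in y_b)
        x_c, x_d = tuple(v - d for v in y_c), tuple(v - d for v in y_d)
    return cost_ab, cost_cd, nested

cost_ab, cost_cd, nested = simulate((0, 0), (3, 5), demands=[1, 3, 2])
print(cost_ab, cost_cd, nested)
```

For this instance the chosen starting states violate Proposition 1(ii) in echelon 2 in the first period, so the combined cost of (A, B) should come out strictly larger than that of (C, D), in line with (9). A single path only illustrates the inequality; it is not a substitute for the proof.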

Our analysis is valid for discounted cost models for both finite and infinite horizons and also when linear ordering costs are present. Theorem 4 implies the optimality of echelon order-up-to policies when an optimal policy exists; this existence is guaranteed, for example, when demands are continuous random variables, by Corollary 8.5.2 and Proposition 9.17 of [1].

Remark. In this paper, we have shown the suboptimality of any policy that is not of the echelon order-up-to type when the single period cost function is strictly convex, under the setting of either the finite horizon or the infinite horizon discounted cost criterion. We explain why we are unable to prove the existence of an optimal echelon order-up-to policy when our assumptions are relaxed.

First, when the assumption of strict convexity of the single period cost is replaced with the assumption of weak convexity, our technique of working with the tightened pair leads to a new system which has a cost that is less than or equal to (as opposed to strictly less than) that of the original system. Moreover, since the number of possible states is infinite, there could be an infinite number of after-ordering inventory levels, and there is no guarantee that this process of working with tightened pairs will eventually converge to an echelon order-up-to policy. (Note that each step of the tightened pair "corrects" the optimal action at each pair of states, and a state-action pair that is tightened in one iteration may be subject to further tightening in a future iteration.) While it is plausible that such a difficulty can be circumvented by generalizing our technique to tighten infinitely many states or a continuum of states all at once, the complexity of such a technical argument is beyond the scope of this paper.
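The loss of strictness under weak convexity can be seen in a two-line calculation. The following Python sketch is an illustration of ours with assumed numbers: the piecewise-linear cost `weak` (with hypothetical rates 1 and 3) and the action pairs are not from the paper. When both original actions lie on the same linear piece, tightening leaves the combined cost unchanged, whereas a strictly convex cost drops.

```python
def pair_cost(L, y1, y2):
    """Combined single-echelon cost of two systems under cost function L."""
    return L(y1) + L(y2)

def weak(y):
    """Piecewise-linear (weakly convex) cost; rates 1 and 3 are assumptions."""
    return 1.0 * max(y, 0.0) + 3.0 * max(-y, 0.0)

def strict(y):
    """Strictly convex cost used for comparison."""
    return y * y

before = (2.0, 10.0)  # actions of A and B
after = (6.0, 6.0)    # tightened actions with the same sum

weak_gap = pair_cost(weak, *before) - pair_cost(weak, *after)
strict_gap = pair_cost(strict, *before) - pair_cost(strict, *after)
```

Here `weak_gap` is zero while `strict_gap` is positive, which is the difference between a weak and a strict improvement described in the remark.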


We now consider the long-run average cost criterion under the assumption that the single period cost function is strictly convex. While our analysis shows that, for any finite $T$, the $T$-period average cost of any system that does not follow an echelon order-up-to policy has a strictly higher cost than the tightened system (see (9)), it is not clear if the strict inequality holds for the infinite horizon long run average costs. For reasons similar to those discussed in the previous paragraph, we do not know if the iterative process of tightening will eventually result in an echelon order-up-to policy.

Acknowledgement

We sincerely thank the review team for its valuable suggestions that have improved the exposition of the paper. The first author was supported partially by NSF grant DMS-0732169.

References

[1] D. Bertsekas, S. Shreve, Stochastic Optimal Control: The Discrete Time Case, Academic Press, New York, 1978.
[2] X. Chao, S.X. Zhou, Optimal policy for multi-echelon inventory systems with periodic batching and fixed replenishment intervals, Working Paper, The Chinese University of Hong Kong, 2005.
[3] F. Chen, J.S. Song, Optimal policies for multi-echelon inventory problems with Markov-modulated demand, Operations Research 49 (2) (2001) 226–234.
[4] F. Chen, Y.S. Zheng, Lower bounds for multi-echelon stochastic inventory systems, Management Science 40 (11) (1994) 1426–1443.
[5] A.J. Clark, H. Scarf, Optimal policies for a multiechelon inventory problem, Management Science 6 (1960) 475–490.
[6] A. Muharremoglu, J.N. Tsitsiklis, A single-unit decomposition approach to multiechelon inventory systems. http://web.mit.edu/jnt/www/publ.html, 2003.
[7] G.-J. van Houtum, A. Scheller-Wolf, J. Yi, Optimal control of serial inventory systems with fixed replenishment intervals, Operations Research 55 (4) (2007) 674–687.