Proceedings of the 38th Conference on Decision & Control
Phoenix, Arizona USA, December 1999
WeA02 09:50

A Markov Decision Process Model for Capacity Expansion and Allocation

Shalabh Bhatnagar, Emmanuel Fernández-Gaucherand¹, Michael C. Fu (mfu@isr.umd.edu), Ying He, Steven I. Marcus

Institute for Systems Research
University of Maryland
College Park, MD 20742
http://www.isr.umd.edu/IPDPM/
¹ On leave from the Systems & Industrial Engineering Department, The University of Arizona, Tucson.
0-7803-5250-5/99/$10.00 © 1999 IEEE

Abstract

We present a finite-horizon Markov decision process (MDP) model for providing decision support in semiconductor manufacturing on such critical operational issues as when to add capacity and when to convert from one type of production to another.

Keywords: Semiconductor Manufacturing, Markov Decision Processes, Capacity Planning.

1 Introduction

The planning and scheduling of a semiconductor fab is carried out according to a general hierarchical framework based on a temporal and/or physical decomposition of the system. Issues that must be addressed at the highest level include, for example, when to add capacity and when to convert from one type of production to another. We present a Markov decision process (MDP) model for supporting these types of decisions. In the remainder of this paper, the technical details of the problem formulation and MDP model are presented, including some details of the cost modeling, and some computational experiments are summarized.

2 The Model and Problem Formulation

Our approach at the highest level of the hierarchy is to formulate a Markov decision process (MDP) model that will yield decision support for operating the fab in each of the phases of its life cycle and include life cycle dynamics such as technology shrink. MDP models result in policies that use the available information in a way that provides a trade-off between immediate and future benefits and costs, and that exploit the fact that observations will be available in the future (see Puterman 1994 and Bertsekas 1995). The MDP employs an aggregate factory model for describing the state of the fab. Aggregation avoids excessive computational complexity, since a detailed factory model would have too many states. A policy specifies, for each possible factory state, the best actions to implement. Such actions include purchasing (or discarding) equipment, upgrading equipment/processes, and allocating equipment to product lines. Actions have costs that include the investment and operating cost and possible production shortages (relative to the production targets), and benefits that include increased capacity (maximum throughput). The costs, benefits, and the system state itself are subject to random uncontrollable events that are both exogenous and endogenous to the fab: equipment may be delivered late or may fail, the performance of newly installed equipment is uncertain, and the market for certain chips may collapse.

In particular, the aggregate factory state is summarized by a vector of capacities $X(t)$ at time epoch $t$, where the components $X_{(l,i),w}(t)$ represent the capacity (measured, for example, in wafer starts per day or number of machines) of type $w$ allocated to product $l$ and operation $i$ (this could be a type of sub-factory manufacturing a particular product or a type of process). Actions to be taken include decisions to (i) increase the capacity of type $w$ by $B_w(t)$ units, such as by the introduction of new technology; and/or (ii) switch over $V_w^{(l,i),(m,j)}(t)$ units of type $w$ capacity from product $l$ and operation $i$ to product $m$ and operation $j$, such as by qualifying tools for a different process. Randomness is explicitly modeled by the demand $d_l(t)$ for product $l$. The evaluation criteria include costs for excess capacity (overage) and capacity shortages (underage), cost of production, cost of converting capacity from one type of operation to another, and the cost of increasing capacity. A more precise description of the model is given below.

We denote by $T$ the duration of the planning horizon and by $t$ the periods, i.e., $t \in \{1, 2, \ldots, T\}$. Each product is characterized by a sequence of operations, where an operation is defined as a job to be performed on a wafer. There are typically many operations that are performed on a wafer before the final product emerges. Also, an operation may be performed several times on a wafer. For instance, for a product which requires operations $i$, $j$ and $k$ to be performed, the actual sequence of operations before the final product (say $P$) comes out could (in the order from left to right) be $iijjjkijkk$. In what follows, we shall not take into account the particular sequences of operations required for manufacturing the products. However, we shall take into account the number of operations of a particular type required for any given type of product. Thus in the example above, product $P$ requires three $i$, four $j$ and three $k$ operations to be performed on it. We shall measure capacity in terms of the number of machines or tools. Note that in general, machines or tools are capable of performing one or more types of operations. Thus a machine capable of performing operations $i$, $j$ and $k$ could be used for manufacturing all those products which require one or more of these operations (in any numbers and combinations).

Before we proceed further, we first develop some notation. Let $N_t$ represent the total number of operations associated with all the machines in the firm. For simplicity, we denote the operations by $1, 2, \ldots, N_t$. Note that $N_t$ depends on $t$, since in any period $t$ one could decide to purchase new equipment (machines) that allows a greater number of operations. Let $\mathcal{X}_t$ be the set of all operations in period $t$, i.e., $\mathcal{X}_t = \{1, \ldots, N_t\}$. Let 0 represent 'no operation'. Let $P_t$ be the set of products that the firm produces in period $t$, plus the 'no product' element 0, i.e., $0 \in P_t$. Note that $P_t$ also depends on $t$, so that product mixes may change over time. If a machine is idle, we say that it is manufacturing product 0 by performing operation 0. We define a word as any lexicographically ordered string of elements of $\mathcal{X}_t$ whose first letter is 0. Let $Z_t$ represent the set of all possible words. We call the capacity associated with word $w$ type $w$ capacity. Specifically, a nonzero capacity associated with a particular word $w$ means that there exists at least one machine in the facility that is capable of performing all operations in the word $w$. Later, we shall also define the set of all feasible words.

In the following, the various quantities are indexed by terms of the type $(l,i)$ and $w$. These indicate the corresponding quantities associated with product $l$ and operation $i$ on machines of type $w$. Note that in the tuple $(l,i)$, if either $l$ or $i$ is zero, the other quantity ($i$ or $l$, respectively) is automatically zero. Also, if a particular combination of $l$, $i$ and $w$ is infeasible, all the corresponding quantities containing $(l,i)$ and $w$ are assumed to take the value zero. The following notation will be used (with $t$ denoting the period):

● $T_w(t)$: total type $w$ capacity.
● $B_w(t)$: new type $w$ capacity bought.
● $D_w(t)$: old type $w$ capacity discarded.
● $U_w(t)$: available reserve capacity of type $w$.
● $X_{(l,i),w}(t)$: amount of available type $w$ capacity allocated to product $l$, for operations of type $i \in w$ ($l \ne 0$, $i \ne 0$).
● $K_w$: availability factor for type $w$ machines.
● $C_{(l,i),w}$: number of wafers per unit time of product $l$ produced after type $i$ operations are performed on machine $w$, constant for given $l$, $i$ and $w$; $C_{(0,0),w} = 0\ \forall\, w \in Z_t$.
● $F_{(l,i),w}$: number of type $i$ operations in product $l$ on machines of type $w$.
● $d_l(t)$: demand for product $l$.
● $I_l(t)$: inventory for product $l$.
● $V_w^{(l,i),(m,j)}(t)$: type $w$ capacity switched over from product $l$ and operations of type $i$ to product $m$ and operations of type $j$, $i, j \in w$, $l \ne 0$, $i \ne 0$.
● $V_w^{(0,0),(m,j)}(t)$: available reserve and/or newly bought capacity of type $w$ allocated to product $m$ and operations of type $j \in w$.
● $V_w^{(l,i),(0,0)}(t)$: portion of allocated type $w$ capacity for product $l$ and operations of type $i$, taken away (sent to reserve or permanently discarded).
● $C^b_w(x)$: cost of increasing type $w$ capacity by purchasing $x$ units of new capacity.
● $C^d_w(x)$: cost of decreasing type $w$ capacity by discarding $x$ units of old capacity.
● $C^s_w(x)$: cost of switching over $x$ units of type $w$ capacity from one type of production and/or operation to another.
● $C^I_l(y)$: inventory holding/backlogging cost for $y$ units of product $l$.
● $C^o_w(z)$: operating cost for $z$ units of type $w$ capacity.
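As a hedged illustration of the word construction defined earlier in this section, the following Python sketch enumerates every word for a given number of operations. The function name `enumerate_words` and the choice of four operations are assumptions made for illustration, as is the reading that a word must contain at least one real operation after the leading 0. Words are written as strings of single digits, which only works for fewer than ten operations.

```python
from itertools import combinations

def enumerate_words(num_operations: int) -> list[str]:
    """Return all words: '0' followed by a nonempty increasing set of operations."""
    if not 1 <= num_operations <= 9:
        raise ValueError("single-digit labels require 1..9 operations")
    ops = range(1, num_operations + 1)
    words = []
    for size in range(1, num_operations + 1):
        # combinations() yields subsets in lexicographic (increasing) order
        for subset in combinations(ops, size):
            words.append("0" + "".join(str(i) for i in subset))
    return words

words = enumerate_words(4)
print(len(words))   # number of candidate machine types
print(words[:6])
```

With four operations there is one word per nonempty subset of the operations, which is why the set of words grows exponentially and the model later restricts attention to feasible words.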
We now define the set of all feasible words $A_t \subseteq Z_t$ as follows. Let $A_1$ be the set of all words with nonzero total capacity associated with them in the first period of the planning horizon and such that each word is characterized by a set of equipment that can perform all the operations in it. For subsequent periods ($t \ge 1$), let $A_{t+1} = A_t \cup \{w \notin A_t \mid B_w(t+1) > 0\}$. In what follows, we restrict our attention to the set of all feasible words $A_t$ in period $t$, since it contains all words that currently have, or had in an earlier period, nonzero total capacity associated with them.

Remark: Note that $A_t$ for $t > 1$ may also contain words with zero total capacity associated with them in period $t$, as a result of discarding capacity in a previous period. Words of the type $0i \in A_t$ represent dedicated capacity for product $i$. The availability factor $K_w$ for type $w$ machines is required to take into account the effect of breakdowns, periodic maintenance, etc. Also, for $w \in A_t$, the available reserve capacity

$$U_w(t) = K_w T_w(t) - \sum_{\{(l,i) \in P_t \times w \mid l \ne 0,\ i \ne 0\}} X_{(l,i),w}(t)$$

is the capacity available in period $t$ after the various capacity allocations to the products and operations have been made.
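A minimal sketch of the reserve-capacity bookkeeping, assuming the reserve of type $w$ equals $K_w T_w(t)$ minus the capacity allocated to real (nonzero) products and operations. The function `reserve_capacity`, the dictionary layout keyed by (product, operation), and the numbers are hypothetical.

```python
def reserve_capacity(total: float, availability: float,
                     allocated: dict[tuple[str, int], float]) -> float:
    """U_w = K_w * T_w - sum of X_(l,i),w over real products and operations."""
    used = sum(x for (product, op), x in allocated.items()
               if product != "0" and op != 0)   # skip the idle pair (0, 0)
    return availability * total - used

# Two machines of one type at 90% availability, one of them allocated to (A, 1).
u = reserve_capacity(total=2, availability=0.9, allocated={("A", 1): 1.0})
print(u)
```

A negative result would signal that more capacity has been allocated than is effectively available, which the constraints of the model rule out.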
The state vector at time $t$ is given by $(T_w(t),\ X_{(l,i),w}(t),\ I_l(t),\ l \in P_t \setminus \{0\},\ i \in w \setminus \{0\},\ w \in A_t)^T$. The actions vector at time $t$ is $(B_w(t),\ D_w(t),\ V_w^{(l,i),(m,j)}(t),\ l, m \in P_t,\ i, j \in w,\ w \in A_t)^T$, where it is assumed that $V_w^{(l,i),(m,j)}(t) = 0$ if $(l,i) = (m,j)$. At the beginning of any period, the decision maker observes the state of the system and chooses an action. The total cost over the entire planning horizon that we want to minimize is

$$J = E\Bigg[\sum_{t=1}^{T}\Bigg(\sum_{w \in A_t}\Big(C^b_w(B_w(t)) + C^d_w(D_w(t))\Big) + \sum_{w \in A_t}\ \sum_{\{(l,i),(m,j) \mid l,m \in P_t,\ i,j \in w,\ (l,i) \ne (m,j)\}} C^s_w\big(V_w^{(l,i),(m,j)}(t)\big) + \sum_{l \in P_t \setminus \{0\}} C^I_l\big(I_l(t)\big) + \sum_{w \in A_t}\ \sum_{\{(l,i) \in P_t \times w \mid l \ne 0,\ i \ne 0\}} C^o_w\big(X_{(l,i),w}(t)\big)\Bigg)\Bigg].$$

We have the following state equations:

$$T_w(t+1) = T_w(t) + B_w(t) - D_w(t), \quad w \in A_t, \tag{1}$$

$$X_{(l,i),w}(t+1) = X_{(l,i),w}(t) + \sum_{\{(m,j) \in P_t \times w \mid (m,j) \ne (l,i)\}} V_w^{(m,j),(l,i)}(t) - \sum_{\{(m,j) \in P_t \times w \mid (m,j) \ne (l,i)\}} V_w^{(l,i),(m,j)}(t), \tag{2}$$

where $l \in P_t$, $i \in w$, $l \ne 0$, $i \ne 0$, $w \in A_t$, and

$$I_l(t+1) = I_l(t) + \min_i \Bigg\{\sum_{\{w \in A_t \mid F_{(l,i),w} > 0\}} \frac{1}{F_{(l,i),w}}\, C_{(l,i),w}\, X_{(l,i),w}(t)\Bigg\} - d_l(t), \quad l \in P_t \setminus \{0\}. \tag{3}$$

The constraints are as follows ($w \in A_t$ throughout):

$$\sum_{\{(m,j) \in P_t \times w \mid (m,j) \ne (l,i)\}} V_w^{(l,i),(m,j)}(t) \le X_{(l,i),w}(t), \quad i \in w,\ l, i \ne 0, \tag{4}$$

$$\sum_{\{(m,j) \in P_t \times w \mid m \ne 0,\ j \ne 0\}} V_w^{(0,0),(m,j)}(t) \le U_w(t), \tag{5}$$

$$V_w^{(l,i),(m,j)}(t) \ge 0, \quad l, m \in P_t,\ i, j \in w, \tag{6}$$

$$T_w(t),\ B_w(t) \ge 0, \tag{7}$$

$$T_w(t) \ge D_w(t) \ge 0. \tag{8}$$

The second term on the RHS of Equation (3) is the 'throughput term' in the inventory equation and gives the number of 'finished wafers' of product $l$ in period $t$; thus, the bottleneck operation for a particular product for the given state of allocated capacity is essentially the operation(s) yielding the minimum term (i.e., the arg min) in Equation (3). There is no machine work-in-process inventory explicitly considered (only finished product inventory), as this is meant to be a higher-level planning model, but the difference between reserve capacity and excess capacity at a non-bottleneck machine is still captured in the last term of the cost function, in the form of an operational charge for allocated capacity (versus no such charge for reserved capacity). In Equation (4), the cases $m = 0$, $j \ne 0$ and $m \ne 0$, $j = 0$ do not arise, since we have already mentioned that if either of $m$ or $j$ is 0, the other one is automatically 0. Also, note that Constraints (4) and (6) imply $X_{(l,i),w}(t) \ge 0$, $\forall\, l \in P_t$, $i \in w$, and similarly Constraints (5) and (6) imply $U_w(t) \ge 0$; hence, these two sets of constraints are not included explicitly as separate constraints in our formulation.

Cost Modeling

The cost structure for our model is given by $\{C^b_w(x),\ C^d_w(x),\ C^s_w(x),\ C^I_l(y),\ C^o_w(z)\}$, which we further discuss next. Modeling capital and operations costs in a semiconductor fab environment is a complex task. Not only consumables and operations have to be accounted for, but also, e.g., capital investments, depreciation, management, and the business cost for succeeding or failing to capture market dynamics. As costs continue to escalate rapidly (current wafer fabs approach $2 billion), equipment and operation efficiency and responsiveness are thus essential.
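To make the cost structure concrete, the sketch below evaluates the cost of a single period under the assumption that all cost functions are linear with hypothetical unit rates; the model itself leaves the functional forms general. Capacities are aggregated into single numbers rather than indexed by machine type, and the inventory charge is split into a holding rate and a backlog rate. All names and numbers are illustrative.

```python
from dataclasses import dataclass

@dataclass
class LinearCosts:
    """Hypothetical linear stand-ins for the cost functions of the model."""
    buy: float        # per unit of capacity purchased
    discard: float    # per unit of capacity discarded
    switch: float     # per unit of capacity switched over
    holding: float    # per unit of positive inventory
    backlog: float    # per unit of backlogged demand (negative inventory)
    operate: float    # per unit of allocated capacity

def period_cost(c, bought, discarded, switched, inventories, allocated):
    """Sum the purchase, discard, switching, inventory and operating charges."""
    inventory_cost = sum(c.holding * max(i, 0.0) + c.backlog * max(-i, 0.0)
                         for i in inventories)
    return (c.buy * bought + c.discard * discarded + c.switch * switched
            + inventory_cost + c.operate * allocated)

costs = LinearCosts(buy=10.0, discard=1.0, switch=0.3,
                    holding=1.0, backlog=2.0, operate=0.2)
# One unit switched, inventories of +1 and -0.5, four units of allocated capacity.
print(period_cost(costs, 0, 0, 1, [1.0, -0.5], 4))
```

Reserve capacity carries no operating charge in this sketch, mirroring the distinction between allocated and reserved capacity drawn in the discussion of the cost function.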
Cost modeling, as relevant to our model, has been treated quite sparingly in the literature. In Klutke et al. (1992), costs are simply specified as functions $g_k(x_k, u_k, w_k)$ of states, actions, and disturbances, within a dynamic programming framework following Bertsekas (1995). A linear programming approach was presented in Bermon et al. (1995), where the cost of an operation $j$ on a tool of type $i$ was modeled as being proportional to the time required to perform that operation on the tool. As a first approximation, the cost of the tool can be linearly depreciated over its lifetime; alternatively, specific fab data can be used, obtained via, e.g., activity-based cost analysis. Another model, based on a mathematical programming formulation, that tries to capture market dynamics in its cost structure is presented in Feigin et al. (1999).

In Wood (1997) a comprehensive survey of costs for different operations in semiconductor fabs is presented. In Duley et al. (1997) a fab (macro) level cost model is given, as a function of total wafer starts per month, specified by the triplet $(m, M, G)$, where the minimum cost per wafer, $C_{min}$, is given as

$$C_{min} = M + m\, TR_{max},$$

and $TR_{max}$ denotes the maximum fab throughput rate, $M$ denotes the capacity-independent costs, and $m$ denotes the capacity-dependent costs. In addition, for non-bottleneck tools, the (maximum) fraction of the tool that can be utilized contributes to $m$, and the rest to $M$: the granularity cost $G$. Costs can alternatively be categorized, e.g., as in Duley et al. (1997):

● Capital Costs: (bare) building, fitup (converting a building into a fab), tool costs.
● Operating Costs: building/fitup/tool depreciation, labor, engineering, administration, consumables, utilities.

The cost structure in our model is flexible enough to capture all of the above cost types, e.g., $C^b_w(x)$ for tool costs, clean room area, labor, and consumables, and thus impacts upon all quantities in the model $(m, M, G)$ (linear depreciation of the tool cost could also be used as a simple approximation). As a second example, $C^s_w(x)$ requires set-up time and labor changes, and hence impacts $m$ and $G$. Furthermore, market dynamics, e.g., rapid price decreases, lost sales due to demand not met, etc., can be accounted for in $C^I_l(y)$.

3 Numerical Example

In order to get a better understanding of how the model would be specified in practice, we provide here a simple example of a fab producing just two products that each have two operations (litho and etch). The example is meant to be illustrative of the notation in the MDP model, especially with regard to the factory state and capacity expansion/allocation actions. For actual-sized fabs, the notation would all be handled by a computer, as it would be practically infeasible to enumerate the various components of the model by hand. Specifically, the fab is characterized as follows. There are two products, "A" and "B"; two operations on each, "litho" and "etch", distinguished by product; machines (litho or etch) that could be flexible (able to do the respective operation on both products A and B) or dedicated (only able to do the respective operation on one of A or B); and operation times, which depend on the product and the machine.

We first define products, operations, and machines. Products $l$ are chosen from the set $P_t = \{0, A, B\}$, where 0 corresponds to 'no product'. Operations $i$ are chosen from the set $\mathcal{X}_t = \{1, 2, 3, 4\}$, $N_t = 4$, where 1 = lithoA, 2 = etchA, 3 = lithoB, 4 = etchB. Thus, operations 1 and 2 correspond to operations on product A, whereas operations 3 and 4 correspond to operations on product B. Machines are words $w$ chosen from the set $Z_t$:

01 = machine dedicated to lithoA operation,
02 = machine dedicated to etchA operation,
03 = machine dedicated to lithoB operation,
04 = machine dedicated to etchB operation,
013 = flexible litho machine,
024 = flexible etch machine,

$\cup\ \{012, 034, 014, 023, 0123, 0124, 0134, 0234, 01234\}$,

where 0 corresponds to 'no operation' and the last set's elements do not correspond to feasible machines in the real model. Other model input (system) parameters that must be defined are $C_{(l,i),w}$ and $F_{(l,i),w}$, which essentially specify the operation times for a particular product on a particular machine.

$$F_{(l,i),w} = \begin{cases} 1 & (A,1),013;\ (A,2),024;\ (B,3),013;\ (B,4),024;\ (A,1),01;\ (A,2),02;\ (B,3),03;\ (B,4),04; \\ 0 & \text{otherwise,} \end{cases}$$

i.e., all products require a single operation on the appropriate machine.

$$C_{(l,i),w} = \begin{cases} 1 & (A,1),013;\ (A,2),024; \\ 0.5 & (B,3),013;\ (B,4),024; \\ 1.2 & (A,1),01;\ (A,2),02; \\ 0.6 & (B,3),03;\ (B,4),04; \\ 0 & \text{otherwise,} \end{cases}$$

i.e., a flexible etch or litho machine completes one operation on product A in an hour, product B takes twice as long on both operations, and a flexible machine is 20% slower than a dedicated one (e.g., 60 minutes versus 50 minutes for product A). For simplicity, we take $K_w = 1$ for all $w$, i.e., availability is 100%.

We consider a specific case of two each of flexible etch and litho machines and no dedicated machines (from Bhatnagar et al. 1999): $T_{013}(1) = T_{024}(1) = 2$, $T_w(1) = 0$ otherwise; $A_1 = \{013, 024\}$. Henceforth, we drop the $t$ index for notational simplicity. The product throughputs are given by

$$TP_A = \min\{X_{(A,1),013},\ X_{(A,2),024}\}, \qquad TP_B = 0.5 \min\{X_{(B,3),013},\ X_{(B,4),024}\},$$

where the arg min gives the bottleneck operation for the product (1 or 2 for A; 3 or 4 for B).
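The throughput expressions can be checked numerically. The sketch below assumes that the throughput of a product is the minimum, over its operations, of the sum over machine types of $C_{(l,i),w} X_{(l,i),w} / F_{(l,i),w}$, and uses the rate and operation-count values of this example. The dictionary layout and the function name `throughput` are illustrative choices.

```python
# Wafers per hour, keyed by (product, operation, machine word); all others are 0.
C = {("A", 1, "013"): 1.0, ("A", 2, "024"): 1.0,
     ("B", 3, "013"): 0.5, ("B", 4, "024"): 0.5,
     ("A", 1, "01"): 1.2, ("A", 2, "02"): 1.2,
     ("B", 3, "03"): 0.6, ("B", 4, "04"): 0.6}
# One pass of each operation per wafer on every feasible machine.
F = {key: 1 for key in C}
OPERATIONS = {"A": (1, 2), "B": (3, 4)}

def throughput(product, X):
    """Finished wafers per period: rate of the slowest (bottleneck) operation."""
    rates = []
    for op in OPERATIONS[product]:
        rate = sum(C[key] * x / F[key]
                   for key, x in X.items()
                   if key[0] == product and key[1] == op and key in C)
        rates.append(rate)
    return min(rates)

# Allocation of the two flexible litho (013) and two flexible etch (024) machines.
X = {("A", 1, "013"): 1, ("A", 2, "024"): 2,
     ("B", 3, "013"): 1, ("B", 4, "024"): 0}
print(throughput("A", X), throughput("B", X))
```

In this allocation product A is limited by its single litho machine, and product B produces nothing because no etch capacity is allocated to it, even though it holds a litho machine.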
The remaining model parameters that need to be provided in order to have a fully specified model that could be numerically solved are the demand process, the cost parameters, and the decision variables: capacity amounts (present, bought, discarded) and allocation. To get a feeling for the computational demands of solving our model, as well as to gain some insight into the structure of optimal expansion and allocation policies, we worked through some numerical runs for the simple example to illustrate the application of the model for finding the optimal switching policy between two flexible capacities. In particular, we focused on the impact of demand and cost parameters on the optimal policy. The software laboratory SYSCODE, which includes various routines for solving dynamic programming models, is used here to obtain the optimal policy; see Fernández-Gaucherand et al. (1998).
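The optimal policy of a finite-horizon MDP is usually computed by backward induction. The following is a generic sketch of that recursion, not a reproduction of the SYSCODE routines; the function `backward_induction`, its argument layout, the zero terminal cost and the toy two-state problem are all assumptions for illustration.

```python
def backward_induction(states, actions, transition, cost, horizon):
    """Minimise expected total cost over `horizon` periods.

    actions(s)        -> iterable of admissible actions in state s
    transition(s, a)  -> dict {next_state: probability}
    cost(s, a)        -> one-period cost
    Returns (value, policy) with value[t][s] and policy[t][s], t = 0..horizon-1.
    """
    value = [dict() for _ in range(horizon + 1)]
    policy = [dict() for _ in range(horizon)]
    for s in states:
        value[horizon][s] = 0.0          # zero terminal cost (assumption)
    for t in range(horizon - 1, -1, -1):
        for s in states:
            best_action, best_value = None, float("inf")
            for a in actions(s):
                # immediate cost plus expected cost-to-go from the next period
                q = cost(s, a) + sum(p * value[t + 1][s2]
                                     for s2, p in transition(s, a).items())
                if q < best_value:
                    best_action, best_value = a, q
            value[t][s], policy[t][s] = best_value, best_action
    return value, policy

# Toy problem: being in "bad" costs 1 per period, switching costs 0.5 once.
states = ["good", "bad"]
actions = lambda s: ["stay", "switch"]
transition = lambda s, a: {s: 1.0} if a == "stay" else {"good": 1.0}
cost = lambda s, a: (1.0 if s == "bad" else 0.0) + (0.5 if a == "switch" else 0.0)
value, policy = backward_induction(states, actions, transition, cost, horizon=3)
print(policy[0]["bad"], value[0]["bad"])
```

In the toy problem switching pays off early in the horizon but not in the last period, when the one-time switching cost can no longer be recovered; the same trade-off between immediate and future costs drives the capacity switching decisions of the model.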
In these experiments, we also assume the following. During the decision horizon, no machine is purchased, discarded, or sent to reserve, and no maintenance is required. The only actions are to switch flexible machines between different products. For each type of machine (litho or etch), no more than one machine can be switched from one product and/or operation to another in a period. Products A and B are handled in whole units and half units, respectively. The inventory warehouses for products A and B have capacities of 1 and 0.5 units, respectively. There is a limit on backlogged demand of 1 and 0.5 units for products A and B, respectively. Demand exceeding the backlogging limits is lost. The demands for a given product are independent and identically distributed from period to period, and mutually independent between products.

For the specific example under these assumptions, the state vector of our MDP model takes the form

$$(X_{(A,1),013},\ X_{(A,2),024},\ I_A,\ I_B),$$

where the first and second components are, respectively, the litho and etch capacities allocated to product A, and the third and fourth components are, respectively, the inventory levels of products A and B. Note that the capacity allocated to product B is simply the remainder of total machine capacity for each tool type (litho or etch), because we have assumed for this simple example that no capacity is ever put into reserve, reducing the dimensionality of the state vector from six dimensions to four. Under the assumptions specified above, the components of the vector take values from the following sets:

$$X_{(A,1),013} = 2 - X_{(B,3),013} \in \{0, 1, 2\},$$
$$X_{(A,2),024} = 2 - X_{(B,4),024} \in \{0, 1, 2\},$$
$$I_A \in \{-1, 0, 1\},$$
$$I_B \in \{-0.5, 0, 0.5\},$$

and thus the total number of possible states is 81. We define a state group as the set of those states that have the same capacity allocation $X_{(A,1),013}$ and $X_{(A,2),024}$, i.e., they differ only in their product inventory levels.

Next, we see that the action vector is of the form

$$\big(V_{013}^{(A,1),(B,3)},\ V_{013}^{(B,3),(A,1)},\ V_{024}^{(A,2),(B,4)},\ V_{024}^{(B,4),(A,2)}\big).$$

The assumption limiting the switching of any particular machine type to one machine per period means that $V_{013}^{(A,1),(B,3)},\ V_{013}^{(B,3),(A,1)},\ V_{024}^{(A,2),(B,4)},\ V_{024}^{(B,4),(A,2)} \in \{0, 1\}$. Furthermore, since switching is a two-way interaction, we have

$$V_w^{(l,i),(m,j)} > 0 \ \Rightarrow\ V_w^{(m,j),(l,i)} = 0.$$

We label the resulting nine possible actions as follows: A1=(1,0,1,0), A2=(1,0,0,0), A3=(1,0,0,1), A4=(0,0,1,0), A5=(0,0,0,0), A6=(0,0,0,1), A7=(0,1,1,0), A8=(0,1,0,0), A9=(0,1,0,1). Thus, for example, A7 is the action that moves one unit of litho machine capacity from product B to A and one unit of etch machine capacity from A to B, whereas A5 is the action that does no switching. Note that for a given state group, not all actions are admissible; for example, the admissible actions for state group (0,0), in which all capacity is currently dedicated to product B, are those actions that move one or zero units of capacity from B to A, i.e., actions A5, A6, A8 and A9.

[Table 1: demand probability distributions for the stochastic cases S1 to S3 and the deterministic cases D1 to D3.]

[Table 2: unit inventory and backlog costs for products A and B, cost sets I1 to I3.]

In the design of the test problems, we considered both stochastic and deterministic demands. The demand distributions for the 6 sets of configurations are shown in Table 1, where S1 to S3 indicate the stochastic cases and D1 to D3 indicate the deterministic cases. We also considered different sets of inventory, backlogging, operating, and switchover costs, in order to investigate the relative impact of different costs on the optimal policy. The unit costs are shown in Tables 2 and 3. The different demand distributions and cost coefficients give rise to 72 test problems for our experiments (6 demand configurations, 3 inventory/backlog cost sets, and 4 operating/switching cost sets).

Experiment 1: Compare the optimal policies with different demand distributions. Fix inventory costs and backlogging costs as I2, fix operating costs and switch costs as O1, and change the demand distribution from S1 to D3.
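The state and action sets of the specific example are small enough to enumerate. The sketch below builds the 81 states and the nine switching actions, then filters the admissible actions of a state group. It assumes an action is admissible when, for each tool type, it does not move more capacity away from a product than is currently allocated to it, with a total of two machines per tool type. Names such as `admissible_actions` are illustrative.

```python
from itertools import product

LITHO_A = ETCH_A = (0, 1, 2)          # capacity allocated to product A
INV_A = (-1, 0, 1)                    # inventory levels of product A
INV_B = (-0.5, 0, 0.5)                # inventory levels of product B
STATES = list(product(LITHO_A, ETCH_A, INV_A, INV_B))

# Per tool type: (units moved A -> B, units moved B -> A), never both at once.
MOVES = [(1, 0), (0, 0), (0, 1)]
ACTIONS = {f"A{k + 1}": litho + etch
           for k, (litho, etch) in enumerate(product(MOVES, MOVES))}

def admissible_actions(litho_a, etch_a, total=2):
    """Actions that do not move more capacity than is currently allocated."""
    result = []
    for name, (la_to_b, lb_to_a, ea_to_b, eb_to_a) in ACTIONS.items():
        if (la_to_b <= litho_a and lb_to_a <= total - litho_a
                and ea_to_b <= etch_a and eb_to_a <= total - etch_a):
            result.append(name)
    return result

print(len(STATES), len(ACTIONS))
print(admissible_actions(0, 0))
```

For state group (0,0) the filter returns A5, A6, A8 and A9, matching the admissible set stated in the text, while every action is admissible in the balanced state group (1,1).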
Table 3: unit operating and switching costs

Cost set | operating, litho for A | operating, litho for B | operating, etch for A | operating, etch for B | switch, litho machine | switch, etch machine
O1       | 0.2                    | 0.2                    | 0.1                   | 0.1                   | 0.3                   | 0.3
O2       | (illegible)            | 0.2                    | 0.1                   | 0.1                   | 0.3                   | 0.3
O3       | 0.2                    | 0.2                    | 0.1                   | 0.1                   | 0.3                   | 0.9
O4       | 0.8                    | 0.2                    | 0.1                   | 0.1                   | 0.3                   | 0.9

Experiment 2: Compare the optimal policies with different inventory and backlogging cost pairs. Fix demand as S2 or D2, fix operating costs and switch costs as O1, and change inventory costs and backlogging costs from I1 to I3.

Experiment 3: Compare the optimal policies with different operating costs or switch costs. Fix demand as S2 or D2, fix inventory costs and backlog costs, and change operating costs and switch costs from O1 to O4.

Experiment 4: Compare the optimal policies for initial state (1,1,1,1), a balanced allocation of available capacity, with different configurations.

A brief summary of our preliminary findings is as follows. If one product, say A, is more likely to be short of stock, and it has a higher backlogging penalty, both litho machines and both etch machines would be allocated to produce product A to avoid high cost. To reach this "absorbing" machine allocation state group (2,2), usually the switching actions are taken in one of the following ways:

● Go to the absorbing state group directly if an admissible action exists.
● First go to state group (1,1), which allocates one of each type of machine to each product, and then go to the absorbing state group in the next period.
● First go to the state group that is closest to the absorbing state group, and then go to the absorbing state group.

The optimal policies obtained match intuition for various cases:

● If some operating cost is much higher than the others, it is likely that the action leading to this operation will not be performed, to avoid high cost.
● If some switching cost is much higher than the others, this switching action is not likely to be performed.
● For state (1,1,0,0), with one of each type of machine assigned to each product and zero inventory, when the demands are 1 and 0.5 units for products A and B, respectively, the optimal policy is the intuitively obvious one of not switching any capacity, i.e., continuing to run a "balanced" fab with respect to capacity.

In addition to continuing the numerical experiments, we are working on incorporating a hierarchical decomposition of the capacity expansion and allocation decisions, whereby the former decisions would be made on a slower time scale (i.e., less frequently, such as semiannually) than allocation decisions (which might be made weekly or monthly). Besides providing a clearer decision-making framework for differentiating between two different levels of decisions, this should also lead to increased computational efficiency in numerical solution of the model.

Acknowledgements

This work was supported by the National Science Foundation under Grant DMI-9713720 and by the Semiconductor Research Corporation under Grant 97-FJ-491.

References

[1] S. Bermon, G.E. Feigin and S. Hood, Capacity analysis of complex manufacturing facilities, in Proc. IEEE Conf. Decision & Control, New Orleans, LA, 1935-1940, 1995.
[2] D.P. Bertsekas, Dynamic Programming and Optimal Control, Vol. 1, Athena Scientific, Belmont, MA, 1995.
[3] S. Bhatnagar, M.C. Fu, S.I. Marcus, and Y. He, Markov decision processes for semiconductor fab-level decision making, Proceedings of the IFAC 14th Triennial World Congress, Beijing, China, 145-150, 1999.
[4] J.R. Duley, V. Varma and S.C. Wood, Sense and sensibility: the scaleable minifab, in Proc. Int'l. Symposium on Semiconductor Manufacturing, San Francisco, CA, October 1997.
[5] G.E. Feigin, K. Katircioglu and D.D. Yao, Capacity allocation in semiconductor fabrication, this proceedings, 1999.
[6] E. Fernández-Gaucherand, J. Choi, and D. Gerhart, SYSCODE: Stochastic Systems Control and Decision Algorithms Software Laboratory, FORTRAN & MATLAB Versions, Department of Systems and Industrial Engineering, The University of Arizona, (cimarron.sie.arizona.edu/modeling/modeling.html), 1998.
[7] G.-A. Klutke, M. Kammer-Kerwick and J. Fowler, Stochastic control of wafer fabrication processes in semiconductor manufacturing, in Proc. 1st Industrial Eng. Research Conference, Chicago, IL, 449-453, 1992.
[8] M.L. Puterman, Markov Decision Processes, John Wiley & Sons, New York, 1994.
[9] S.C. Wood, Cost and cycle time performance of fabs based on integrated single-wafer processing, IEEE Trans. Semiconductor Manufacturing, Vol. 10, no. 1, 1997.