
Decision Diagrams and Dynamic Programming
J. N. Hooker
Carnegie Mellon University
INFORMS 2013

Decision Diagrams & Dynamic Programming
● Binary/multivalued decision diagrams are related to dynamic programming.
– But there are important differences.
– Dynamic programming has state variables and state-dependent costs.

Decision Diagrams & Dynamic Programming
● We extend the theory of decision diagrams to accommodate state-dependent costs.
– We prove a uniqueness theorem for weighted DDs using canonical costs.
● We can now view the DP state transition graph as a decision diagram.
– And perhaps reduce the decision diagram to simplify the DP model.

Outline
● Dynamic programming example
● Weighted decision diagrams and canonical costs
● Application to the example
● Ongoing research

Dynamic Programming
• Dynamic programming (including the name) was introduced by Richard Bellman in the 1950s.
– It is a different concept from decision diagrams, caching, etc.
– But the DP state transition graph can be viewed as a weighted decision diagram.
• Illustration: a very basic inventory management problem.
– It has been in the literature for at least 50 years.

Inventory Management Example
• In each period $i$, we have:
– Demand $d_i$
– Unit production cost $c_i$
– Warehouse space $m$
– Unit holding cost $h_i$
• In each period, we decide:
– Production level $x_i$
– Stock level $s_i$
• Objective:
– Meet demand each period while minimizing production and holding costs.

State transition graph
[Figure: state transition graph. Period $i = 1$ has the single state 0, periods $i = 2, 3, 4$ each have states 0, 1, 2, and the final state is 0.]
• State $s_i$ = stock level
• Transition $x_i$ = units manufactured (the example arc shown has $x_i = 4$)
• Demand $d_i = 2$ in each period $i$
• Each path represents a solution

Transition costs
[Figure: the state transition graph with each arc labeled by its transition cost, written as holding cost + production cost (for example, 0+8).]
• Transition cost (immediate cost): $h_i s_i + c_i x_i$
• Unit holding cost = $h_i$, with $h_2 = 2$, $h_3 = 1$, $h_4 = 2$
• Unit manufacturing cost = $c_i$, with $c_1 = 2$, $c_2 = 3$, $c_3 = 5$, $c_4 = 6$
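The arc labels in the figure can be regenerated from the data on this slide. The sketch below is illustrative only: the names `states`, `arc_costs`, `c`, `h`, `d`, `m`, and `n` are ours, it assumes warehouse space $m = 2$ (the largest stock level in the graph) and a stock of 0 at both ends of the horizon, and it stores each label as the pair (holding cost, production cost).

```python
# Data from the slides. h_1 is not given; it is irrelevant because s_1 = 0.
c = {1: 2, 2: 3, 3: 5, 4: 6}      # unit production cost c_i
h = {1: 0, 2: 2, 3: 1, 4: 2}      # unit holding cost h_i (h_1 = 0 is a placeholder)
d = 2                              # demand in every period
m = 2                              # warehouse space (assumed from the graph)
n = 4                              # number of periods


def states(i):
    """Stock levels available in period i (0 at both ends of the horizon)."""
    return [0] if i in (1, n + 1) else list(range(m + 1))


def arc_costs():
    """Map (i, s_i, s_next) to the arc label (h_i * s_i, c_i * x_i)."""
    arcs = {}
    for i in range(1, n + 1):
        for s in states(i):
            for s_next in states(i + 1):
                x = s_next - s + d          # production needed to reach s_next
                if x >= 0:                  # production cannot be negative
                    arcs[(i, s, s_next)] = (h[i] * s, c[i] * x)
    return arcs


arcs = arc_costs()
for key in sorted(arcs):
    hold, prod = arcs[key]
    print(key, f"{hold}+{prod}")
```

Under these assumptions the graph has 24 arcs, and for instance the arc from stock level 1 in period 2 to stock level 2 in period 3 carries the label 2+9.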

Backward recursion
[Figure: the state transition graph with the cost to go written at each node, filled in from the last period backward.]
• Cost to go $g_i(s_i)$:
$$g_i(s_i) = \min_{x_i} \{ h_i s_i + c_i x_i + g_{i+1}(s_i + x_i - d_i) \}$$
• Node values in the figure, for stock levels 0, 1, 2: period 4 has 12, 8, 4; period 3 has 22, 18, 14; period 2 has 26, 25, 24. The starting node has $g_1(0) = 30$.

Optimal solution
[Figure: the state transition graph with the optimal path highlighted.]
• Trace forward to find the optimal path.
• The optimal path passes through stock levels 0, 0, 2, 0 in periods 1 to 4, where the costs to go are 30, 26, 14, 12.
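The backward pass and the forward trace can be sketched together in a few lines. This is an illustration under stated assumptions rather than the author's code: the names are ours, warehouse space is taken to be $m = 2$, the stock is assumed to be 0 at both ends of the horizon, and ties are broken toward the smaller next stock level.

```python
c = {1: 2, 2: 3, 3: 5, 4: 6}      # unit production cost c_i (from the slides)
h = {1: 0, 2: 2, 3: 1, 4: 2}      # unit holding cost h_i (h_1 = 0 is a placeholder)
d, m, n = 2, 2, 4                  # demand, warehouse space (assumed), periods


def states(i):
    """Stock levels available in period i (0 at both ends of the horizon)."""
    return [0] if i in (1, n + 1) else list(range(m + 1))


def solve():
    # g[i][s] is the cost to go from stock level s in period i.
    g = {n + 1: {0: 0}}
    best_next = {}
    for i in range(n, 0, -1):                    # move backward through the stages
        g[i] = {}
        for s in states(i):
            options = []
            for s_next in states(i + 1):
                x = s_next - s + d               # production that leads to s_next
                if x >= 0:
                    total = h[i] * s + c[i] * x + g[i + 1][s_next]
                    options.append((total, s_next))
            g[i][s], best_next[(i, s)] = min(options)
    # Trace forward from s_1 = 0, following the minimizing arc at each stage.
    path, production, s = [0], [], 0
    for i in range(1, n + 1):
        s_next = best_next[(i, s)]
        production.append(s_next - s + d)
        path.append(s_next)
        s = s_next
    return g, path, production


g, path, production = solve()
print(g[1][0], path, production)   # 30 [0, 0, 2, 0, 0] [2, 4, 0, 2]
```

The computed table reproduces the node values in the figure, and the trace gives production levels 2, 4, 0, 2 at total cost 30.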

Dynamic Programming Recursion
• In general, the state transition is
$$s_{i+1} = \phi_i(s_i, x_i), \quad i = 1, \ldots, n$$
• Cost is a function of state and control pairs
$$f(x) = \sum_{i=1}^{n} c_i(s_i, x_i)$$
• The recursion is
$$g_i(s_i) = \min_{x_i} \{ c_i(s_i, x_i) + g_{i+1}(\phi_i(s_i, x_i)) \}$$
– with boundary condition $g_{n+1}(s_{n+1}) = 0$ for all $s_{n+1}$
– and optimal value $g_1(s_1)$ for starting state $s_1$
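The general recursion translates almost directly into a memoized function. The sketch below is a minimal illustration, not a definitive implementation: `solve_dp` and its parameters `controls`, `phi`, and `cost` are hypothetical names, states are assumed to be hashable, and the inventory instance reuses the slide data with an assumed warehouse space of 2 and an assumed final stock of 0.

```python
from functools import lru_cache


def solve_dp(n, controls, phi, cost, s1):
    """Return g_1(s1) for g_i(s) = min_x { cost(i, s, x) + g_{i+1}(phi(i, s, x)) }."""

    @lru_cache(maxsize=None)
    def g(i, s):
        if i == n + 1:                       # boundary condition g_{n+1} = 0
            return 0
        values = [cost(i, s, x) + g(i + 1, phi(i, s, x)) for x in controls(i, s)]
        return min(values) if values else float("inf")

    return g(1, s1)


# Inventory instance from the slides (m = 2 and final stock 0 are assumptions).
c = {1: 2, 2: 3, 3: 5, 4: 6}
h = {1: 0, 2: 2, 3: 1, 4: 2}
d, m, n = 2, 2, 4


def controls(i, s):
    """Production levels that keep the next stock level within its bounds."""
    top = 0 if i == n else m
    return [x for x in range(d + m + 1) if 0 <= s + x - d <= top]


def phi(i, s, x):
    return s + x - d                         # next stock level


def cost(i, s, x):
    return h[i] * s + c[i] * x               # holding plus production cost


print(solve_dp(n, controls, phi, cost, 0))   # 30
```

Memoization here only avoids recomputing $g_i(s_i)$ for a state that is reached again. It does not merge distinct states, which is the distinction the following slides draw between DP and caching.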

Dynamic Programming Characteristics
• There are state variables in addition to decision variables.
• Costs are functions of state variables as well as decision variables.
• State transitions are Markovian.
– The current state determines the possible transitions and costs.
• The problem is solved recursively.
– Often by moving backward through stages.
• The art of dynamic programming:
– Find a small state description that is Markovian.

DP vs Caching
• Dynamic programming ≠ caching
– Yes, DP identifies equivalent subproblems.
– But not by identifying distinct states.
– All states are treated separately (except in approximate DP).
• The intelligence is in the state description.

DP vs Caching
• However, caching can be applied on top of DP.
– We will use the concept of a reduced decision diagram (reduced MDD) to identify equivalent states.
• Problem: how to deal with state-dependent costs.

Reducing the Transition Graph
[Figure: the state transition graph with its transition costs.]
Arcs leaving each node are very similar.
• They transition to the same states.
• They have the same costs, up to an offset.
Incidentally, there is also a bang-bang solution.

Reducing the Transition Graph
[Figure: the collapsed graph with one node per period. Node values from top to bottom: 30, 26, 20, 12, 0. A cost of 4 is attached to the top node. The arcs between successive nodes cost 6, 7, 8 (for $x_1 = 2, 3, 4$), then 10, 9, 8, then 12, 13, 14, then 0, 0, 0.]
• By rearranging the costs, we can collapse the states in each period.
• Now it is easier to compute the optimal solution.
• This looks like reduction of a decision diagram (MDD).
We will develop this idea in general.

Decision Diagrams
Set covering example
• Select a minimum-weight family of sets that together contain all 4 elements A, B, C, D.
• The four sets have weights 3, 5, 4, 6.
• $x_i = 1$ when we select set $i$.
[Figure: the sets and their weights.]

Decision Diagrams
Decision diagram
• Each path corresponds to a feasible solution.
[Figure: decision diagram whose arcs out of the root are labeled $x_1 = 0$ and $x_1 = 1$.]

Weighted Decision Diagrams
Separable cost function
• Just label arcs with weights.
• A shortest path corresponds to an optimal solution.
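The shortest-path idea for separable costs can be sketched as a layer-by-layer pass. The membership of the four sets is not recoverable from the slide text, so the `SETS` below are hypothetical; only the weights 3, 5, 4, 6 come from the slide. The sketch also simplifies the diagram: nodes are identified with the set of elements covered so far, infeasible paths are kept, and only the node where everything is covered is read off at the end.

```python
SETS = [{"A", "B"}, {"A", "C", "D"}, {"B", "D"}, {"C", "D"}]   # hypothetical sets
WEIGHTS = [3, 5, 4, 6]                                          # weights from the slide
UNIVERSE = frozenset("ABCD")


def shortest_path_in_dd():
    # Each layer maps a node (the elements covered so far) to a pair:
    # (length of the shortest path reaching it, choices x along that path).
    layer = {frozenset(): (0, [])}
    for subset, weight in zip(SETS, WEIGHTS):
        nxt = {}
        for covered, (dist, xs) in layer.items():
            for x in (0, 1):
                node = covered | subset if x else covered
                cand = (dist + weight * x, xs + [x])   # arc weight is w_i * x_i
                if node not in nxt or cand < nxt[node]:
                    nxt[node] = cand
        layer = nxt
    # Only paths that end with every element covered are feasible solutions.
    return layer[UNIVERSE]


print(shortest_path_in_dd())   # (8, [1, 1, 0, 0])
```

Because the cost is separable, each arc carries a fixed weight and paths that reach the same node can be compared directly. That comparison is exactly what stops working once costs depend on the state.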

Weighted Decision Diagrams
• State-dependent costs in dynamic programming imply a nonseparable cost function:
$$f(x) = \sum_{i=1}^{n} c_i(s_i, x_i)$$
where
$$s_{i+1} = \phi_i(s_i, x_i), \quad i = 1, \ldots, n$$
– We need a theory of decision diagrams that deals with nonseparable costs.

Weighted Decision Diagrams
Nonseparable cost function
• Now what?
• Put costs on leaves of the branching tree.
• But now we can't reduce the tree to an efficient decision diagram.
• We will rearrange costs to obtain canonical costs.
[Figure: branching tree with the leaf costs rearranged onto the arcs.]
• Now the tree can be reduced.
[Figure: the reduced weighted decision diagram.]
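One way to picture the rearrangement is the following sketch: put each leaf cost on the last arc of its path, then work upward, subtracting the minimum outgoing cost at each node and pushing it onto the incoming arc, and finally merge nodes of a layer whose outgoing arcs are identical. This is our illustration rather than the procedure from the talk: it enumerates the full binary tree (exponential in $n$), assumes every assignment is feasible, fixes the minimum outgoing cost at 0, and uses a hypothetical cost function `f`.

```python
from itertools import product


def canonical_reduced_dd(n, f):
    """Weighted DD of f over {0,1}^n with canonical costs (minimum outgoing cost 0)."""
    unique = {}      # (layer, arcs) -> node id, so identical nodes are merged
    arcs_of = {}     # node id -> tuple of (x, canonical cost, child id)

    def build(prefix):
        # Returns (offset, node id); the offset is pushed onto the incoming arc.
        if len(prefix) == n:
            return f(prefix), "T"            # leaf: the whole cost sits on the last arc
        arcs = []
        for x in (0, 1):
            offset, child = build(prefix + (x,))
            arcs.append((x, offset, child))
        mu = min(cost for _, cost, _ in arcs)
        key = (len(prefix), tuple((x, cost - mu, child) for x, cost, child in arcs))
        node = unique.setdefault(key, len(unique))
        arcs_of[node] = key[1]
        return mu, node

    root_offset, root = build(())
    return root_offset, root, arcs_of


def path_cost(dd, x):
    """Cost of assignment x: the root offset plus the arc costs along its path."""
    total, node, arcs_of = dd
    for xi in x:
        _, cost, node = arcs_of[node][xi]
        total += cost
    return total


def f(x):
    return 2 + 3 * x[0] + x[1] * x[2]        # hypothetical nonseparable cost


dd = canonical_reduced_dd(3, f)
print(dd[0], len(dd[2]))                     # 2 4
print(all(path_cost(dd, x) == f(x) for x in product((0, 1), repeat=3)))   # True
```

For this `f` the seven internal nodes of the tree collapse to four, and the offset that reaches the root equals the optimal value because every node has a zero-cost outgoing arc.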

Weighted Decision Diagrams Nonseparable cost function Note that the DD is larger than the reduced unweighted DD, but still compact.

Weighted Decision Diagrams Nonseparable cost function We can represent any discrete optimization problem with such a decision diagram… even if the costs are nonseparable.

Weighted Decision Diagrams Nonseparable cost function We know that without weights, there is a unique reduced decision diagram for a given variable ordering. Is this true for decision diagrams with canonical weights? Yes.

Weighted Decision Diagrams
Definition. Costs on a decision diagram are canonical if for every node in layer $i$, the costs $c_{ij}$ leaving that node satisfy
$$\min_j c_{ij} = \alpha_i$$
for fixed $\alpha_i$ (e.g., 0).

Theorem. Any given discrete optimization problem is uniquely represented by a weighted decision diagram with canonical costs, for a given variable ordering.
• A similar result was proved for Affine Algebraic Decision Diagrams (AADDs) by Sanner and McAllester (IJCAI 2005).
– Their definition of canonical is somewhat different.

Weighted Decision Diagrams • Converting to canonical costs does not destroy the benefits of separability. Definition. A decision diagram is separable when arc costs represent terms of a separable cost function.

Theorem. A separable decision diagram that is reduced when costs are ignored is also reduced when costs are converted to canonical costs.

Weighted Decision Diagrams Example

Reduced unweighted DD

Add separable costs

Reduced weighted DD with canonical costs has same shape

Application to Inventory Problem
[Figure: the state transition graph with total arc costs, first with the first-period arcs labeled by the control $x_1 = 2, 3, 4$, then relabeled by $x_1' = 0, 1, 2$.]
$$g_i(s_i) = \min_{x_i} \{ h_i s_i + c_i x_i + g_{i+1}(s_i + x_i - d_i) \}$$
To equalize controls, let
$$x_i' = s_i + x_i - d_i$$
be the stock level in the next period.
New recursion:
$$g_i(s_i) = \min_{x_i'} \{ h_i s_i + c_i (x_i' - s_i + d_i) + g_{i+1}(x_i') \}$$

Application to Inventory Problem
[Figure: the state transition graph after the arc costs are shifted.]
$$g_i(s_i) = \min_{x_i'} \{ h_i s_i + c_i (x_i' - s_i + d_i) + g_{i+1}(x_i') \}$$
To obtain canonical costs, subtract $c_i(m - s_i) + h_i s_i$ from the cost on each arc $(s_i, s_{i+1})$. Add these offsets to the incoming arcs.
• Now outgoing arcs look alike.
• And all arcs into state $s_{i+1}$ have the same cost
$$\bar{c}_i(s_{i+1}) = s_{i+1} h_{i+1} + c_i (d_i + s_{i+1} - m) + c_{i+1} (m - s_{i+1})$$
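The claim that all arcs into a state end up with the same cost can be checked numerically on the example. The sketch below is a hedged illustration with our own names (`offset`, `incoming_costs`), assuming warehouse space $m = 2$, stock 0 at both ends of the horizon, and no offset at the terminal state.

```python
c = {1: 2, 2: 3, 3: 5, 4: 6}      # unit production cost c_i (from the slides)
h = {1: 0, 2: 2, 3: 1, 4: 2}      # unit holding cost h_i (h_1 = 0 is a placeholder)
d, m, n = 2, 2, 4                  # demand, warehouse space (assumed), periods


def states(i):
    """Stock levels available in period i (0 at both ends of the horizon)."""
    return [0] if i in (1, n + 1) else list(range(m + 1))


def offset(i, s):
    """Amount subtracted from every arc leaving state s of period i."""
    if i > n:
        return 0                   # assumed: nothing to shift at the terminal state
    return c[i] * (m - s) + h[i] * s


def incoming_costs():
    """For each period i, the common shifted cost of the arcs into each s_{i+1}."""
    result = {}
    for i in range(1, n + 1):
        result[i] = []
        for s_next in states(i + 1):
            shifted = set()
            for s in states(i):
                original = h[i] * s + c[i] * (s_next - s + d)
                shifted.add(original - offset(i, s) + offset(i + 1, s_next))
            assert len(shifted) == 1, "arcs into a state should share one cost"
            result[i].append(shifted.pop())
    return result


print(offset(1, 0), incoming_costs())
# 4 {1: [6, 7, 8], 2: [10, 9, 8], 3: [12, 13, 14], 4: [0]}
```

The offset of the starting state, 4, is not absorbed by any incoming arc, so it has to be carried along as a constant when path costs are compared with the original graph.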

Application to Inventory Problem
[Figure: the state transition graph with shifted arc costs.]
These are canonical costs with
$$\alpha_i = \min_{s_{i+1}} \bar{c}_i(s_{i+1})$$
where $\bar{c}_i(s_{i+1})$ is the common cost of the arcs into state $s_{i+1}$.
New recursion:
$$g_i = \min_{x_i'} \{ h_{i+1} x_i' + c_i (x_i' - m + d_i) + c_{i+1} (m - x_i') + g_{i+1} \}$$

Application to Inventory Problem
[Figure: the collapsed graph with one node per period. Node values from top to bottom: 30, 26, 20, 12, 0.]
New recursion:
$$g_i = \min_{x_i'} \{ h_{i+1} x_i' + c_i (x_i' - m + d_i) + c_{i+1} (m - x_i') + g_{i+1} \}$$
• Now there is only one state per period.
• Note that computational tests are not necessary. We immediately see the speedup from the reduction in the state space.
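With one state per period, the recursion reduces to choosing the cheapest arc in each layer. The sketch below illustrates this on the example under our own assumptions: $m = 2$, a final stock of 0, placeholder values $c_5 = h_5 = 0$ beyond the horizon, and the starting offset $c_1 m = 4$ (the cost attached to the top node of the figure) added back at the end.

```python
c = {1: 2, 2: 3, 3: 5, 4: 6, 5: 0}   # c_5 = 0 is a placeholder beyond the horizon
h = {2: 2, 3: 1, 4: 2, 5: 0}         # h_5 = 0 is a placeholder beyond the horizon
d, m, n = 2, 2, 4                     # demand, warehouse space (assumed), periods


def next_stock_levels(i):
    """Values allowed for the control x_i' (the stock level in period i + 1)."""
    return [0] if i == n else list(range(m + 1))


def arc_cost(i, x_next):
    """Canonical arc cost h_{i+1} x' + c_i (x' - m + d) + c_{i+1} (m - x')."""
    return h[i + 1] * x_next + c[i] * (x_next - m + d) + c[i + 1] * (m - x_next)


def solve_collapsed():
    g = 0                             # boundary condition g_{n+1} = 0
    plan = []
    for i in range(n, 0, -1):
        # g_{i+1} does not depend on x', so the minimum is taken over arc costs alone.
        cost, x_next = min((arc_cost(i, x), x) for x in next_stock_levels(i))
        g += cost
        plan.append(x_next)
    plan.reverse()
    root_offset = c[1] * m            # offset h_1 s_1 + c_1 (m - s_1) at s_1 = 0
    return root_offset + g, plan


print(solve_collapsed())              # (30, [0, 2, 0, 0])
```

The result matches the original recursion: total cost 30, with stock levels 0, 2, 0, 0 in periods 2 through 5. Each stage now needs one minimization instead of one per stock level.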

Ongoing Research
• DP model simplification
– Go through the classical DP models and see under what conditions they can be simplified.
• DP models for optimization based on decision diagrams
– Use the DP model as a basis for building a relaxed decision diagram.
– The relaxed decision diagram provides bounds and a branching mechanism.
