Reachability in MDPs: Refining Convergence of Value Iteration
Serge Haddad (LSV, ENS Cachan, CNRS & Inria) and Benjamin Monmege (ULB)
RP 2014, Oxford
Markov Decision Processes

• What?
  ✦ Stochastic process with non-deterministic choices
  ✦ Non-determinism resolved by policies/strategies

• Where?
  ✦ Optimization
  ✦ Program verification: reachability as the basis of PCTL model-checking
  ✦ Game theory: 1½-player games
MDPs: definition and objective

[Figure: example MDP with actions a–e and transition probabilities ½, ⅓, ⅔]

• Finite number of states
• Probabilistic states
• Actions to be selected by the policy
• Reachability objective: reach the target state s+

M = (S, α, δ) with δ : S × α → Dist(S)
Policy σ : (S·α)*·S → Dist(α)
Probability to reach the target: Pr_s^σ(F s+)
Maximal probability to reach the target: Pr_s^max(F s+) = sup_σ Pr_s^σ(F s+)
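The definition above can be encoded directly: δ maps each state and enabled action to a distribution over successors, and a memoryless policy picks one action per state. A minimal sketch, where the state names, action names, and probabilities are hypothetical (loosely inspired by the example MDP, whose exact shape did not survive extraction):

```python
import random

# Hypothetical encoding of M = (S, alpha, delta): for each state s and each
# action a enabled in s, delta[s][a] maps successor states to probabilities.
delta = {
    "s0": {"a": {"s1": 0.5, "s2": 0.5}},
    "s1": {"b": {"s+": 2 / 3, "s-": 1 / 3}, "c": {"s0": 1.0}},
    "s2": {"d": {"s+": 0.5, "s-": 0.5}},
    "s+": {"e": {"s+": 1.0}},  # target is absorbing
    "s-": {"e": {"s-": 1.0}},  # sink is absorbing
}

# Every delta(s, a) must be a probability distribution:
assert all(abs(sum(d.values()) - 1) < 1e-9
           for acts in delta.values() for d in acts.values())

def simulate(delta, policy, start, target, steps):
    """Follow a memoryless policy (state -> action) for `steps` steps
    and report whether the target was reached."""
    s = start
    for _ in range(steps):
        if s == target:
            return True
        dist = delta[s][policy[s]]
        r, acc = random.random(), 0.0
        for succ, p in dist.items():
            acc += p
            if r <= acc:
                s = succ
                break
    return s == target
```

A general policy may depend on the whole history (S·α)*·S; for reachability objectives memoryless policies suffice, which is why the sketch only stores one action per state.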
Optimal reachability probabilities of MDPs

• How?
  ✦ Linear programming
  ✦ Policy iteration
  ✦ Value iteration: numerical scheme that scales well and works in practice
    — used in the numerical PRISM model checker [Kwiatkowska, Norman, Parker, 2011]
Value iteration

x_s^(0) = 1 if s = s+, 0 otherwise
x_s^(n+1) = max_{a∈α} Σ_{s'∈S} δ(s,a)(s') × x_{s'}^(n)

[Figure: the example MDP] Successive iterates on its four non-absorbing states (optimal action in parentheses):

n = 0 | 1       | 2       | 3       | 4         | … |            |
0     | 0       | 1/3     | 1/2     | 7/12      | … | 0.7969     | 0.7978
0     | 2/3 (b) | 2/3 (b) | 2/3 (b) | 13/18 (b) | … | 0.7988 (b) | 0.7992 (b)
0     | 0       | 0       | 1/6     | 1/4       | … | 0.3977     | 0.3984
0     | 0       | 0       | 0       | 0         | … | 0          | 0

The difference between the last two iterates is ≤ 0.001: the usual stopping criterion would stop here.
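The iteration above translates into a few lines of code. A minimal sketch (the example MDP `chain` below is hypothetical, chosen so the answer is easy to check by hand):

```python
def value_iteration(delta, target, n_iters):
    """Iterate x^(n+1)_s = max_a sum_{s'} delta(s,a)(s') * x^(n)_{s'},
    starting from the indicator vector of the target."""
    x = {s: (1.0 if s == target else 0.0) for s in delta}
    for _ in range(n_iters):
        new_x = {}
        for s, acts in delta.items():
            if s == target:
                new_x[s] = 1.0  # the target is absorbing
                continue
            new_x[s] = max(sum(p * x[t] for t, p in dist.items())
                           for dist in acts.values())
        x = new_x
    return x

# Single-action example (a Markov chain): from s, reach t with probability 2/3.
chain = {
    "s": {"a": {"t": 2 / 3, "z": 1 / 3}},
    "t": {"a": {"t": 1.0}},
    "z": {"a": {"z": 1.0}},
}
print(value_iteration(chain, "t", 10)["s"])  # 0.6666... (= 2/3)

# With a second action available in s, the max picks the better one:
chain["s"]["b"] = {"t": 0.9, "z": 0.1}
print(value_iteration(chain, "t", 10)["s"])  # 0.9
```

On this acyclic example the iterates stabilise after one step; the deck's point is precisely that on general MDPs they may stabilise only in the limit.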
Value iteration: which guarantees?

[Figure: chain MDP on states 0, 1, …, 2k, where state 0 is the target and state 2k the sink; each intermediate state moves to its two neighbours with probability ½ each. Real value at state k: 1/2 (by symmetry).]

State | Step 1 | Step 2 | Step 3 | Step 4 | … | Step k      | Step k+1
0     | 1      | 1      | 1      | 1      | … | 1           | 1
1     | 0      | 1/2    | 1/2    | 1/2    | … | 1/2         | 1/2
2     | 0      | 0      | 1/4    | 1/4    | … | 1/4         | 1/4
3     | 0      | 0      | 0      | 1/8    | … | 1/8         | 1/8
…     | …      | …      | …      | …      | … | …           | …
k−1   | 0      | 0      | 0      | 0      | … | 1/2^(k−1)   | 1/2^(k−1)
k     | 0      | 0      | 0      | 0      | … | 0           | 1/2^k
k+1   | 0      | 0      | 0      | 0      | … | 0           | 0
…     | …      | …      | …      | …      | … | …           | …
2k    | 0      | 0      | 0      | 0      | … | 0           | 0

At step k+1 the iterates change by at most 1/2^k, so the usual stopping criterion fires, even though the real value at state k is 1/2 (by symmetry).
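The table above can be reproduced numerically. A short sketch of the chain example (single action per state, so the max disappears), showing that after k iterations the estimate at the middle state is only 2^(−k) while the true value is 1/2:

```python
def chain_values(k, n_iters):
    """x_i^(n): probability to reach state 0 from state i within n steps in the
    symmetric random walk on {0, ..., 2k} with absorbing endpoints 0 and 2k."""
    x = [1.0] + [0.0] * (2 * k)  # x^(0): indicator of the target state 0
    for _ in range(n_iters):
        new = x[:]
        for i in range(1, 2 * k):
            new[i] = 0.5 * x[i - 1] + 0.5 * x[i + 1]
        x = new
    return x

k = 20
x = chain_values(k, k)
print(x[k])  # 2**-20: essentially zero, although the real value is 1/2
```

Since the iterates at state k move by at most 2^(−k) per step from then on, any absolute stopping threshold larger than 2^(−k) terminates with an answer that is off by almost 1/2.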
Contributions

1. Enhanced value iteration algorithm with strong guarantees
   • performs two value iterations in parallel
   • keeps an interval of possible optimal values
   • uses the interval for the stopping criterion
2. Study of the speed of convergence
   • also applies to classical value iteration
3. Improved rounding procedure for exact computation
Interval iteration

x_s^(0) = 1 if s = s+, 0 otherwise      x^(n+1) = f_max(x^(n))
y_s^(0) = 0 if s = s−, 1 otherwise      y^(n+1) = f_max(y^(n))

f_max(x)_s = max_{a∈α} Σ_{s'∈S} δ(s,a)(s') × x_{s'}

[Plot: on the chain example, x^(n) increases and y^(n) decreases with the number of steps. The usual stopping criterion compares two successive x iterates; the NEW stopping criterion compares y^(n) with x^(n).]
Fixed point characterization

(Pr_s^max(F s+))_{s∈S} is the smallest fixed point of f_max.

• x^(n) always converges towards (Pr_s^max(F s+))_{s∈S}
• y^(n): not always…!

[Figure: counterexample MDP on states a–g with transition probabilities ½, 0.2, 0.3. The slide annotates it with two distinct fixed points of f_max: one assigning values 1, 1, 0.7, 1 to the non-absorbing states (and 0 to the sinks), the other 0.5, 1, 0.45, 0.5 (and 0 to the sinks). Started from y^(0), the iteration can converge to a fixed point strictly larger than the smallest one.]
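The failure of the upper sequence is easy to reproduce. A minimal sketch (not the deck's counterexample, whose diagram did not survive extraction, but a smaller MDP built on the same idea): a state u inside a non-trivial end component, where a "stay" self-loop lets y keep its initial value 1 forever, even though the true maximal reachability probability from u is 0.5.

```python
delta = {
    "u": {"stay": {"u": 1.0}, "leave": {"t": 0.5, "z": 0.5}},
    "t": {"loop": {"t": 1.0}},  # target (absorbing)
    "z": {"loop": {"z": 1.0}},  # sink (absorbing)
}

def iterate_fmax(x0, n):
    """Apply f_max(x)_s = max_a sum_{s'} delta(s,a)(s') * x_{s'} n times."""
    x = dict(x0)
    for _ in range(n):
        x = {s: max(sum(p * x[t] for t, p in dist.items())
                    for dist in acts.values())
             for s, acts in delta.items()}
    return x

lower = iterate_fmax({"u": 0.0, "t": 1.0, "z": 0.0}, 50)  # x^(0): indicator of t
upper = iterate_fmax({"u": 1.0, "t": 1.0, "z": 0.0}, 50)  # y^(0): 0 only on z
print(lower["u"], upper["u"])  # 0.5 1.0
```

Both limits are fixed points of f_max: the x sequence reaches the smallest one (the true probabilities), while the y sequence is stuck at a larger one because "stay" perpetually justifies the value 1 at u.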
Solution: ensure uniqueness!

Usual techniques for MDPs do not apply: even after fixing the states with Pr_s^max(F s+) = 1 and Pr_s^max(F s+) = 0, there are still multiple fixed points!

[Figure: the counterexample MDP decomposed into its end components: two bottom maximal end components (around s+ and s−), one non-trivial maximal end component, and trivial maximal end components elsewhere.]

NEW! Use Maximal End Components (computable in polynomial time [de Alfaro, 1997]) and trivialize them. In the resulting max-reduced MDP, the fixed point of f_max is unique.
An even smaller MDP for minimal probabilities

Non-trivial (and non-accepting) MECs have null minimal probability! They can therefore be collapsed directly, yielding an even smaller min-reduced MDP.

[Figure: the same decomposition into bottom, non-trivial, and trivial maximal end components, and the resulting min-reduced MDP.]
Interval iteration algorithm in reduced MDPs

Algorithm 1: Interval iteration algorithm for minimum reachability
  Input: min-reduced MDP M = (S, α_M, δ_M), convergence threshold ε
  Output: under- and over-approximation of Pr_M^min(F s+)

  x_{s+} := 1; x_{s−} := 0; y_{s+} := 1; y_{s−} := 0
  foreach s ∈ S \ {s+, s−} do x_s := 0; y_s := 1
  repeat
    foreach s ∈ S \ {s+, s−} do
      x'_s := min_{a∈A(s)} Σ_{s'∈S} δ_M(s,a)(s') · x_{s'}
      y'_s := min_{a∈A(s)} Σ_{s'∈S} δ_M(s,a)(s') · y_{s'}
    Δ := max_{s∈S} (y'_s − x'_s)
    foreach s ∈ S \ {s+, s−} do x_s := x'_s; y_s := y'_s
  until Δ ≤ ε
  return (x_s)_{s∈S}, (y_s)_{s∈S}

Sequences x and y converge towards the minimal probability to reach s+. Hence the algorithm terminates, returning for each state an interval of length at most ε containing Pr_s^min(F s+).

Possible speed-up: only check the size of the interval for a given state…
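A direct transcription of Algorithm 1, assuming (as the algorithm requires) that the input MDP is already min-reduced; the example `mdp` below is a hypothetical one-choice instance where the minimal probability is 0.5 by inspection:

```python
def interval_iteration_min(delta, s_plus, s_minus, eps):
    """Sketch of Algorithm 1: under- and over-approximate Pr^min(F s+)
    on a min-reduced MDP. delta[s][a] maps successors to probabilities."""
    x = {s: 0.0 for s in delta}; x[s_plus] = 1.0
    y = {s: 1.0 for s in delta}; y[s_minus] = 0.0
    while True:
        nx, ny = {}, {}
        for s in delta:
            if s in (s_plus, s_minus):
                nx[s], ny[s] = x[s], y[s]  # absorbing states keep their values
                continue
            nx[s] = min(sum(p * x[t] for t, p in dist.items())
                        for dist in delta[s].values())
            ny[s] = min(sum(p * y[t] for t, p in dist.items())
                        for dist in delta[s].values())
        gap = max(ny[s] - nx[s] for s in delta)  # Delta in Algorithm 1
        x, y = nx, ny
        if gap <= eps:
            return x, y  # intervals [x_s, y_s] of length at most eps

mdp = {
    "s":  {"a": {"s+": 0.5, "s-": 0.5}, "b": {"s+": 0.9, "s-": 0.1}},
    "s+": {"loop": {"s+": 1.0}},
    "s-": {"loop": {"s-": 1.0}},
}
x, y = interval_iteration_min(mdp, "s+", "s-", 1e-6)
print(x["s"], y["s"])  # both 0.5: the minimizing policy picks action a
```

The min-reduction is what makes the y sequence trustworthy here: on a non-reduced MDP, an end component could keep y_s at 1 forever and the loop would never terminate.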
Rate of convergence

[Figure: the chain example again, with 2 BMECs and only trivial MECs]

• attractor decomposition: length I
• smallest positive probability: η

x stores reachability probabilities and y stores safety probabilities, i.e., after n iterations:

x_s^(n) = Pr_s^min(F^{≤n} s+)      y_s^(n) = Pr_s^min(G^{≤n} ¬s−)

Leaking property: ∀n ∈ ℕ, Pr_s^max(G^{≤nI} ¬(s+ ∨ s−)) ≤ (1 − η^I)^n

Hence, for suitable optimal policies σ and σ′:

y_s^(nI) − x_s^(nI) = Pr_s^σ(G^{≤nI} ¬s−) − Pr_s^σ′(F^{≤nI} s+)
                    ≤ Pr_s^σ′(G^{≤nI} ¬s−) − Pr_s^σ′(F^{≤nI} s+)
                    = Pr_s^σ′(G^{≤nI} ¬(s+ ∨ s−)) ≤ (1 − η^I)^n

since G^{≤n} ¬s− ≡ G^{≤n} ¬(s+ ∨ s−) ⊕ F^{≤n} s+ (a disjoint union of events, s+ and s− being absorbing).

The interval iteration algorithm converges in at most I ⌈log ε / log(1 − η^I)⌉ steps.
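The bound at the end of the slide is directly computable. A small sketch evaluating I ⌈log ε / log(1 − η^I)⌉ for illustrative (hypothetical) parameter values:

```python
import math

def max_steps(I, eta, eps):
    """Upper bound I * ceil(log(eps) / log(1 - eta**I)) on the number of
    interval-iteration steps, where I is the attractor-decomposition length
    and eta the smallest positive transition probability."""
    return I * math.ceil(math.log(eps) / math.log(1 - eta ** I))

# e.g. with attractor length I = 1 and eta = 1/2, the gap halves at each step,
# so reaching eps = 10^-3 takes ceil(log2(1000)) = 10 steps:
print(max_steps(1, 0.5, 1e-3))  # 10
```

Note how the bound degrades as η^I shrinks: for a long attractor decomposition or small probabilities, log(1 − η^I) ≈ −η^I, so the number of steps grows roughly like I · (1/η)^I · log(1/ε).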
Stopping criterion for exact computation

MDPs with rational probabilities:
• d: the largest denominator of the transition probabilities
• N: the number of states
• M: the number of transitions with non-zero probability

[Chatterjee, Henzinger 2008] claim that exact computation is possible after d^{8M} iterations of value iteration.

Optimal probabilities and policies can be computed by the interval iteration algorithm in at most O((1/η)^N N^3 log d) steps.

Improvement, since 1/η ≤ d and N ≤ M!

Sketch of proof:
• use ε = 1/(2α) as threshold (with α derived from the gcd of the optimal probabilities)
• upper bound on α based on matrix properties of Markov chains: α = O(N^N d^{3N²})
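To see the size of the improvement, compare the two bounds numerically on a small hypothetical instance (parameter values chosen only for illustration; constants hidden by the O(·) are ignored):

```python
import math

# Hypothetical small instance: N states, M transitions, largest denominator d.
# In the worst case eta = 1/d, since every positive probability is >= 1/d.
N, M, d = 5, 8, 3
eta = 1 / d

old_bound = d ** (8 * M)                           # [Chatterjee, Henzinger 2008]
new_bound = (1 / eta) ** N * N ** 3 * math.log(d)  # interval iteration, up to constants

print(old_bound)  # 3**64, about 3.4e30
print(new_bound)  # about 3.3e4
```

Even on this tiny instance the gap is roughly 26 orders of magnitude, which is exactly what 1/η ≤ d and N ≤ M predict: the new exponent is N instead of 8M.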
Conclusion and related work

• Framework providing guarantees for the value iteration algorithm
• General results on the convergence rate
• Criterion for computation of exact values
• Future work: test our preliminary implementation on real instances

• [Katoen, Zapreev, 2006] On-the-fly detection of steady-state in the transient analysis of continuous-time Markov chains
• [Kattenbelt, Kwiatkowska, Norman, Parker, 2010] CEGAR-based approach for stochastic games
• [Brázdil, Chatterjee, Chmelík, Forejt, Křetínský, Kwiatkowska, Parker, Ujma, 2014] (to be published at ATVA 2014) same techniques in a machine-learning framework, with almost-sure convergence and on-the-fly computation of non-trivial end components