Reachability in MDPs: Refining Convergence of Value Iteration
Serge Haddad (LSV, ENS Cachan, CNRS & Inria) and Benjamin Monmege (ULB)
RP 2014, Oxford
Markov Decision Processes

• What?
  ✦ Stochastic process with non-deterministic choices
  ✦ Non-determinism resolved by policies/strategies

• Where?
  ✦ Optimization
  ✦ Program verification: reachability as the basis of PCTL model-checking
  ✦ Game theory: 1½-player games
MDPs: definition and objective

[Figure: example MDP with actions a–e and transition probabilities ½, ⅓, ⅔]

• Finite number of states
• Probabilistic states
• Actions to be selected by the policy
• Reachability objective: reach the target state s+

M = (S, α, δ) with δ : S × α → Dist(S)
Policy σ : (S·α)*·S → Dist(α)
Probability to reach the target: Pr_s^σ(F s+)
Maximal probability to reach the target: Pr_s^max(F s+) = sup_σ Pr_s^σ(F s+)
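The definition above can be encoded directly: δ maps each state and enabled action to a distribution over successors, and a memoryless policy picks one action per state. A minimal sketch, where the state names, action names, and probabilities are hypothetical (loosely inspired by the example MDP, whose exact shape did not survive extraction):

```python
import random

# Hypothetical encoding of M = (S, alpha, delta): for each state s and each
# action a enabled in s, delta[s][a] maps successor states to probabilities.
delta = {
    "s0": {"a": {"s1": 0.5, "s2": 0.5}},
    "s1": {"b": {"s+": 2 / 3, "s-": 1 / 3}, "c": {"s0": 1.0}},
    "s2": {"d": {"s+": 0.5, "s-": 0.5}},
    "s+": {"e": {"s+": 1.0}},  # target is absorbing
    "s-": {"e": {"s-": 1.0}},  # sink is absorbing
}

# Every delta(s, a) must be a probability distribution:
assert all(abs(sum(d.values()) - 1) < 1e-9
           for acts in delta.values() for d in acts.values())

def simulate(delta, policy, start, target, steps):
    """Follow a memoryless policy (state -> action) for `steps` steps
    and report whether the target was reached."""
    s = start
    for _ in range(steps):
        if s == target:
            return True
        dist = delta[s][policy[s]]
        r, acc = random.random(), 0.0
        for succ, p in dist.items():
            acc += p
            if r <= acc:
                s = succ
                break
    return s == target
```

A general policy may depend on the whole history (S·α)*·S; for reachability objectives memoryless policies suffice, which is why the sketch only stores one action per state.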
Optimal reachability probabilities of MDPs

• How?
  ✦ Linear programming
  ✦ Policy iteration
  ✦ Value iteration: numerical scheme that scales well and works in practice
    — used in the numerical PRISM model checker [Kwiatkowska, Norman, Parker, 2011]
Value iteration

x_s^(0) = 1 if s = s+, 0 otherwise
x_s^(n+1) = max_{a∈α} Σ_{s'∈S} δ(s,a)(s') × x_{s'}^(n)

[Figure: the example MDP] Successive iterates on its four non-absorbing states (optimal action in parentheses):

n = 0 | 1       | 2       | 3       | 4         | … |            |
0     | 0       | 1/3     | 1/2     | 7/12      | … | 0.7969     | 0.7978
0     | 2/3 (b) | 2/3 (b) | 2/3 (b) | 13/18 (b) | … | 0.7988 (b) | 0.7992 (b)
0     | 0       | 0       | 1/6     | 1/4       | … | 0.3977     | 0.3984
0     | 0       | 0       | 0       | 0         | … | 0          | 0

The difference between the last two iterates is ≤ 0.001: the usual stopping criterion would stop here.
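The iteration above translates into a few lines of code. A minimal sketch (the example MDP `chain` below is hypothetical, chosen so the answer is easy to check by hand):

```python
def value_iteration(delta, target, n_iters):
    """Iterate x^(n+1)_s = max_a sum_{s'} delta(s,a)(s') * x^(n)_{s'},
    starting from the indicator vector of the target."""
    x = {s: (1.0 if s == target else 0.0) for s in delta}
    for _ in range(n_iters):
        new_x = {}
        for s, acts in delta.items():
            if s == target:
                new_x[s] = 1.0  # the target is absorbing
                continue
            new_x[s] = max(sum(p * x[t] for t, p in dist.items())
                           for dist in acts.values())
        x = new_x
    return x

# Single-action example (a Markov chain): from s, reach t with probability 2/3.
chain = {
    "s": {"a": {"t": 2 / 3, "z": 1 / 3}},
    "t": {"a": {"t": 1.0}},
    "z": {"a": {"z": 1.0}},
}
print(value_iteration(chain, "t", 10)["s"])  # 0.6666... (= 2/3)

# With a second action available in s, the max picks the better one:
chain["s"]["b"] = {"t": 0.9, "z": 0.1}
print(value_iteration(chain, "t", 10)["s"])  # 0.9
```

On this acyclic example the iterates stabilise after one step; the deck's point is precisely that on general MDPs they may stabilise only in the limit.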
Value iteration: which guarantees?

[Figure: chain MDP on states 0, 1, …, 2k, where state 0 is the target and state 2k the sink; each intermediate state moves to its two neighbours with probability ½ each. Real value at state k: 1/2 (by symmetry).]

State | Step 1 | Step 2 | Step 3 | Step 4 | … | Step k      | Step k+1
0     | 1      | 1      | 1      | 1      | … | 1           | 1
1     | 0      | 1/2    | 1/2    | 1/2    | … | 1/2         | 1/2
2     | 0      | 0      | 1/4    | 1/4    | … | 1/4         | 1/4
3     | 0      | 0      | 0      | 1/8    | … | 1/8         | 1/8
…     | …      | …      | …      | …      | … | …           | …
k−1   | 0      | 0      | 0      | 0      | … | 1/2^(k−1)   | 1/2^(k−1)
k     | 0      | 0      | 0      | 0      | … | 0           | 1/2^k
k+1   | 0      | 0      | 0      | 0      | … | 0           | 0
…     | …      | …      | …      | …      | … | …           | …
2k    | 0      | 0      | 0      | 0      | … | 0           | 0

At step k+1 the iterates change by at most 1/2^k, so the usual stopping criterion fires, even though the real value at state k is 1/2 (by symmetry).
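The table above can be reproduced numerically. A short sketch of the chain example (single action per state, so the max disappears), showing that after k iterations the estimate at the middle state is only 2^(−k) while the true value is 1/2:

```python
def chain_values(k, n_iters):
    """x_i^(n): probability to reach state 0 from state i within n steps in the
    symmetric random walk on {0, ..., 2k} with absorbing endpoints 0 and 2k."""
    x = [1.0] + [0.0] * (2 * k)  # x^(0): indicator of the target state 0
    for _ in range(n_iters):
        new = x[:]
        for i in range(1, 2 * k):
            new[i] = 0.5 * x[i - 1] + 0.5 * x[i + 1]
        x = new
    return x

k = 20
x = chain_values(k, k)
print(x[k])  # 2**-20: essentially zero, although the real value is 1/2
```

Since the iterates at state k move by at most 2^(−k) per step from then on, any absolute stopping threshold larger than 2^(−k) terminates with an answer that is off by almost 1/2.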
Contributions

1. Enhanced value iteration algorithm with strong guarantees
   • performs two value iterations in parallel
   • keeps an interval of possible optimal values
   • uses the interval for the stopping criterion
2. Study of the speed of convergence
   • also applies to classical value iteration
3. Improved rounding procedure for exact computation
Interval iteration

x_s^(0) = 1 if s = s+, 0 otherwise      x^(n+1) = f_max(x^(n))
y_s^(0) = 0 if s = s−, 1 otherwise      y^(n+1) = f_max(y^(n))

f_max(x)_s = max_{a∈α} Σ_{s'∈S} δ(s,a)(s') × x_{s'}

[Plot: on the chain example, x^(n) increases and y^(n) decreases with the number of steps. The usual stopping criterion compares two successive x iterates; the NEW stopping criterion compares y^(n) with x^(n).]
Fixed point characterization

(Pr_s^max(F s+))_{s∈S} is the smallest fixed point of f_max.

• x^(n) always converges towards (Pr_s^max(F s+))_{s∈S}
• y^(n): not always…!

[Figure: counterexample MDP on states a–g with transition probabilities ½, 0.2, 0.3. The slide annotates it with two distinct fixed points of f_max: one assigning values 1, 1, 0.7, 1 to the non-absorbing states (and 0 to the sinks), the other 0.5, 1, 0.45, 0.5 (and 0 to the sinks). Started from y^(0), the iteration can converge to a fixed point strictly larger than the smallest one.]
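The failure of the upper sequence is easy to reproduce. A minimal sketch (not the deck's counterexample, whose diagram did not survive extraction, but a smaller MDP built on the same idea): a state u inside a non-trivial end component, where a "stay" self-loop lets y keep its initial value 1 forever, even though the true maximal reachability probability from u is 0.5.

```python
delta = {
    "u": {"stay": {"u": 1.0}, "leave": {"t": 0.5, "z": 0.5}},
    "t": {"loop": {"t": 1.0}},  # target (absorbing)
    "z": {"loop": {"z": 1.0}},  # sink (absorbing)
}

def iterate_fmax(x0, n):
    """Apply f_max(x)_s = max_a sum_{s'} delta(s,a)(s') * x_{s'} n times."""
    x = dict(x0)
    for _ in range(n):
        x = {s: max(sum(p * x[t] for t, p in dist.items())
                    for dist in acts.values())
             for s, acts in delta.items()}
    return x

lower = iterate_fmax({"u": 0.0, "t": 1.0, "z": 0.0}, 50)  # x^(0): indicator of t
upper = iterate_fmax({"u": 1.0, "t": 1.0, "z": 0.0}, 50)  # y^(0): 0 only on z
print(lower["u"], upper["u"])  # 0.5 1.0
```

Both limits are fixed points of f_max: the x sequence reaches the smallest one (the true probabilities), while the y sequence is stuck at a larger one because "stay" perpetually justifies the value 1 at u.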
Solution: ensure uniqueness!

Usual techniques for MDPs do not apply: even after fixing the states with Pr_s^max(F s+) = 1 and Pr_s^max(F s+) = 0, there are still multiple fixed points!

[Figure: the counterexample MDP decomposed into its end components: two bottom maximal end components (around s+ and s−), one non-trivial maximal end component, and trivial maximal end components elsewhere.]

NEW! Use Maximal End Components (computable in polynomial time [de Alfaro, 1997]) and trivialize them. In the resulting max-reduced MDP, the fixed point of f_max is unique.
An even smaller MDP for minimal probabilities

Non-trivial (and non-accepting) MECs have null minimal probability! They can therefore be collapsed directly, yielding an even smaller min-reduced MDP.

[Figure: the same decomposition into bottom, non-trivial, and trivial maximal end components, and the resulting min-reduced MDP.]
Interval iteration algorithm in reduced MDPs

Algorithm 1: Interval iteration algorithm for minimum reachability
  Input: min-reduced MDP M = (S, α_M, δ_M), convergence threshold ε
  Output: under- and over-approximation of Pr_M^min(F s+)

  x_{s+} := 1; x_{s−} := 0; y_{s+} := 1; y_{s−} := 0
  foreach s ∈ S \ {s+, s−} do x_s := 0; y_s := 1
  repeat
    foreach s ∈ S \ {s+, s−} do
      x'_s := min_{a∈A(s)} Σ_{s'∈S} δ_M(s,a)(s') · x_{s'}
      y'_s := min_{a∈A(s)} Σ_{s'∈S} δ_M(s,a)(s') · y_{s'}
    Δ := max_{s∈S} (y'_s − x'_s)
    foreach s ∈ S \ {s+, s−} do x_s := x'_s; y_s := y'_s
  until Δ ≤ ε
  return (x_s)_{s∈S}, (y_s)_{s∈S}

Sequences x and y converge towards the minimal probability to reach s+. Hence the algorithm terminates, returning for each state an interval of length at most ε containing Pr_s^min(F s+).

Possible speed-up: only check the size of the interval for a given state…
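A direct transcription of Algorithm 1, assuming (as the algorithm requires) that the input MDP is already min-reduced; the example `mdp` below is a hypothetical one-choice instance where the minimal probability is 0.5 by inspection:

```python
def interval_iteration_min(delta, s_plus, s_minus, eps):
    """Sketch of Algorithm 1: under- and over-approximate Pr^min(F s+)
    on a min-reduced MDP. delta[s][a] maps successors to probabilities."""
    x = {s: 0.0 for s in delta}; x[s_plus] = 1.0
    y = {s: 1.0 for s in delta}; y[s_minus] = 0.0
    while True:
        nx, ny = {}, {}
        for s in delta:
            if s in (s_plus, s_minus):
                nx[s], ny[s] = x[s], y[s]  # absorbing states keep their values
                continue
            nx[s] = min(sum(p * x[t] for t, p in dist.items())
                        for dist in delta[s].values())
            ny[s] = min(sum(p * y[t] for t, p in dist.items())
                        for dist in delta[s].values())
        gap = max(ny[s] - nx[s] for s in delta)  # Delta in Algorithm 1
        x, y = nx, ny
        if gap <= eps:
            return x, y  # intervals [x_s, y_s] of length at most eps

mdp = {
    "s":  {"a": {"s+": 0.5, "s-": 0.5}, "b": {"s+": 0.9, "s-": 0.1}},
    "s+": {"loop": {"s+": 1.0}},
    "s-": {"loop": {"s-": 1.0}},
}
x, y = interval_iteration_min(mdp, "s+", "s-", 1e-6)
print(x["s"], y["s"])  # both 0.5: the minimizing policy picks action a
```

The min-reduction is what makes the y sequence trustworthy here: on a non-reduced MDP, an end component could keep y_s at 1 forever and the loop would never terminate.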
Rate of convergence

[Figure: the chain example again, with 2 BMECs and only trivial MECs]

• attractor decomposition: length I
• smallest positive probability: η

x stores reachability probabilities and y stores safety probabilities, i.e., after n iterations:

x_s^(n) = Pr_s^min(F^{≤n} s+)      y_s^(n) = Pr_s^min(G^{≤n} ¬s−)

Leaking property: ∀n ∈ ℕ, Pr_s^max(G^{≤nI} ¬(s+ ∨ s−)) ≤ (1 − η^I)^n

Hence, for suitable optimal policies σ and σ′:

y_s^(nI) − x_s^(nI) = Pr_s^σ(G^{≤nI} ¬s−) − Pr_s^σ′(F^{≤nI} s+)
                    ≤ Pr_s^σ′(G^{≤nI} ¬s−) − Pr_s^σ′(F^{≤nI} s+)
                    = Pr_s^σ′(G^{≤nI} ¬(s+ ∨ s−)) ≤ (1 − η^I)^n

since G^{≤n} ¬s− ≡ G^{≤n} ¬(s+ ∨ s−) ⊕ F^{≤n} s+ (a disjoint union of events, s+ and s− being absorbing).

The interval iteration algorithm converges in at most I ⌈log ε / log(1 − η^I)⌉ steps.
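The bound at the end of the slide is directly computable. A small sketch evaluating I ⌈log ε / log(1 − η^I)⌉ for illustrative (hypothetical) parameter values:

```python
import math

def max_steps(I, eta, eps):
    """Upper bound I * ceil(log(eps) / log(1 - eta**I)) on the number of
    interval-iteration steps, where I is the attractor-decomposition length
    and eta the smallest positive transition probability."""
    return I * math.ceil(math.log(eps) / math.log(1 - eta ** I))

# e.g. with attractor length I = 1 and eta = 1/2, the gap halves at each step,
# so reaching eps = 10^-3 takes ceil(log2(1000)) = 10 steps:
print(max_steps(1, 0.5, 1e-3))  # 10
```

Note how the bound degrades as η^I shrinks: for a long attractor decomposition or small probabilities, log(1 − η^I) ≈ −η^I, so the number of steps grows roughly like I · (1/η)^I · log(1/ε).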
Stopping criterion for exact computation

MDPs with rational probabilities:
• d: the largest denominator of the transition probabilities
• N: the number of states
• M: the number of transitions with non-zero probability

[Chatterjee, Henzinger 2008] claim that exact computation is possible after d^{8M} iterations of value iteration.

Optimal probabilities and policies can be computed by the interval iteration algorithm in at most O((1/η)^N N^3 log d) steps.

Improvement, since 1/η ≤ d and N ≤ M!

Sketch of proof:
• use ε = 1/(2α) as threshold (with α derived from the gcd of the optimal probabilities)
• upper bound on α based on matrix properties of Markov chains: α = O(N^N d^{3N²})
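To see the size of the improvement, compare the two bounds numerically on a small hypothetical instance (parameter values chosen only for illustration; constants hidden by the O(·) are ignored):

```python
import math

# Hypothetical small instance: N states, M transitions, largest denominator d.
# In the worst case eta = 1/d, since every positive probability is >= 1/d.
N, M, d = 5, 8, 3
eta = 1 / d

old_bound = d ** (8 * M)                           # [Chatterjee, Henzinger 2008]
new_bound = (1 / eta) ** N * N ** 3 * math.log(d)  # interval iteration, up to constants

print(old_bound)  # 3**64, about 3.4e30
print(new_bound)  # about 3.3e4
```

Even on this tiny instance the gap is roughly 26 orders of magnitude, which is exactly what 1/η ≤ d and N ≤ M predict: the new exponent is N instead of 8M.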
Conclusion and related work

• Framework providing guarantees for the value iteration algorithm
• General results on the convergence rate
• Criterion for computation of exact values
• Future work: test our preliminary implementation on real instances

• [Katoen, Zapreev, 2006] On-the-fly detection of steady-state in the transient analysis of continuous-time Markov chains
• [Kattenbelt, Kwiatkowska, Norman, Parker, 2010] CEGAR-based approach for stochastic games
• [Brázdil, Chatterjee, Chmelík, Forejt, Křetínský, Kwiatkowska, Parker, Ujma, 2014] (to be published at ATVA 2014) same techniques in a machine-learning framework, with almost-sure convergence and on-the-fly computation of non-trivial end components