Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments
Gábor Bartók, Dávid Pál, Csaba Szepesvári (University of Alberta)
COLT 2011, Budapest
Finite Stochastic Partial-Monitoring Games

A repeated game between a learner and an environment, mediated by a referee:
- In round t, the learner chooses an action I_t and the environment chooses an outcome J_t.
- The referee evaluates the loss \ell_t = L(I_t, J_t) and the feedback h_t = H(I_t, J_t); only the feedback h_t is revealed to the learner.
- The matrices L, H \in \mathbb{R}^{N \times M} are publicly known.
- Finitely many actions and outcomes; stochastic environment.
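To make the protocol concrete, here is a minimal Python sketch of one run of such a game. This is not from the talk; the particular L, H, the outcome distribution p, and the uniformly random learner are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Publicly known loss and feedback matrices (N actions x M outcomes); values are illustrative.
L = np.array([[0.0, 1.0],
              [1.0, 0.0]])
H = np.array([["a", "b"],
              ["c", "c"]])

p = np.array([0.3, 0.7])   # environment's (hidden) outcome distribution
N, M = L.shape
T = 5

total_loss = 0.0
for t in range(T):
    I_t = rng.integers(N)        # learner picks an action (here: uniformly at random)
    J_t = rng.choice(M, p=p)     # environment draws an outcome i.i.d. from p
    loss = L[I_t, J_t]           # suffered, but NOT revealed to the learner
    feedback = H[I_t, J_t]       # the only information the learner sees
    total_loss += loss
    print(f"t={t}: action={I_t}, feedback={feedback!r}")

print("cumulative loss:", total_loss)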
Examples

Bandits: L = H (the feedback is the suffered loss).

Full information: every row of H equals (1, 2, ..., M), so the feedback reveals the outcome, e.g.

H = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \end{pmatrix}, \qquad L = H.

Dynamic pricing: the seller (learner) posts price i, the buyer (environment) has maximum price j, and

L(i, j) = c \, \mathbb{I}_{i > j} + (j - i) \, \mathbb{I}_{i \le j}, \qquad H(i, j) = \mathbb{I}_{i \le j},

that is,

L = \begin{pmatrix} 0 & 1 & \cdots & N-1 \\ c & 0 & \cdots & N-2 \\ \vdots & & \ddots & \vdots \\ c & \cdots & c & 0 \end{pmatrix}, \qquad
H = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 0 & 1 & \cdots & 1 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & 1 \end{pmatrix}.
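For concreteness, a small numpy sketch (my own, not from the talk) that builds the dynamic-pricing matrices directly from the formulas above; the function name and the test values N = 4, c = 2 are assumptions.

import numpy as np

def dynamic_pricing_game(N, c):
    """Loss and feedback matrices for dynamic pricing with N prices/outcomes."""
    i = np.arange(1, N + 1)[:, None]   # seller's posted price (rows)
    j = np.arange(1, N + 1)[None, :]   # buyer's maximum price (columns)
    L = np.where(i > j, c, j - i)      # c when i > j (no sale), j - i when i <= j
    H = (i <= j).astype(int)           # feedback: did the buyer buy?
    return L, H

L, H = dynamic_pricing_game(N=4, c=2)
print(L)   # first row 0 1 2 3; the constant c below the diagonal
print(H)   # upper-triangular matrix of ones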
Goal

Performance measure: the expected regret

R_T(\mathcal{A}) = \mathbb{E}\Bigl[\sum_{t=1}^{T} L(I_t, J_t)\Bigr] - \min_i \mathbb{E}\Bigl[\sum_{t=1}^{T} L(i, J_t)\Bigr]

The problem: given (L, H), determine the minimax expected regret \hat{R}_T.

A typical result: \hat{R}_T = O(T^{\alpha}) for some 0 \le \alpha \le 1.
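Against an i.i.d. opponent with outcome distribution p, a fixed action i satisfies E[sum_t L(i, J_t)] = T * <l_i, p>, so the comparator is T * min_i <l_i, p>. A small illustrative helper (my own notation, not from the paper) that evaluates this expected regret for a learner described by its expected action counts:

import numpy as np

def expected_regret(L, p, action_counts):
    """Expected regret when action i is played action_counts[i] times in expectation
    and outcomes are drawn i.i.d. from p (illustrative helper)."""
    expected_losses = L @ p                    # <l_i, p> for every action i
    T = action_counts.sum()
    learner = float(action_counts @ expected_losses)
    best_fixed = float(T * expected_losses.min())
    return learner - best_fixed

L = np.array([[0.0, 1.0],
              [1.0, 0.0]])
p = np.array([0.3, 0.7])
print(expected_regret(L, p, np.array([60, 40])))   # 54 - 30 = 24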
Previous work

[Figure: scale of minimax regret rates, 0 (trivial), T^{1/2} (easy), T^{2/3} (hard), T (hopeless), with a "?" region between T^{1/2} and T^{2/3}. Full-info and bandit games are marked as easy; dynamic pricing and label-efficient prediction (l.e.p.) are marked near the hard/unknown region.]

- Full-info, bandits: [Littlestone and Warmuth, 1994; Auer et al., 2002]
- Algorithm for non-hopeless games: \tilde{O}(T^{3/4}) [Piccolboni and Schindelhauer, 2001]
- Regret reduced to O(T^{2/3}) [Cesa-Bianchi et al., 2006], plus a lower bound for l.e.p.
- Non-trivial \Rightarrow \Omega(\sqrt{T}) [Antos et al., 2011]
- These are non-stochastic results; they apply to the stochastic setting as well.
Our contribution

[Figure: the same regret-rate scale, with the grey "?" area between T^{1/2} and T^{2/3}.]

- What about the grey area? \Omega(\sqrt{T}) and O(T^{2/3}).
- What is in between? Is there a game with, say, \Theta(T^{3/5})?
- No! We eliminate the grey area.
- Dynamic pricing is hard!

Main Theorem. The minimax regret of any finite partial-monitoring game against a stochastic opponent is either 0 (trivial), \tilde{\Theta}(\sqrt{T}) (easy), \Theta(T^{2/3}) (hard), or \Theta(T) (hopeless).
Main tools 1: using L

Cell decomposition of the probability simplex (the space of outcome distributions): each action i owns the cell of distributions p for which it is optimal, i.e. \ell_i^{\top} p \le \ell_j^{\top} p for all j. For example,

L = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ \vdots & \vdots & \vdots \end{pmatrix}

[Figure: the simplex over three outcomes, with vertices (1,0,0), (0,1,0), (0,0,1), partitioned into the actions' cells; a point p marks one outcome distribution.]

Boundary between the cells of actions i and j: contained in the hyperplane (\ell_i - \ell_j)^{\perp}.
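A brute-force sketch of this decomposition for the example L above (the sampling approach and function names are mine, purely for illustration):

import numpy as np

L = np.array([[0, 1, 2],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

def optimal_action(L, p):
    """Index of the action whose cell contains the outcome distribution p."""
    return int(np.argmin(L @ p))

# Sample the simplex to see which actions own a cell with non-empty interior.
rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(L.shape[1]), size=20000)
owners = {optimal_action(L, p) for p in samples}
print("actions that are optimal somewhere:", sorted(owners))

# Cell boundaries lie in the hyperplanes orthogonal to the loss differences.
for i in range(L.shape[0]):
    for j in range(i + 1, L.shape[0]):
        print(f"cells {i},{j}: boundary normal l_{i} - l_{j} =", L[i] - L[j])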
Main tools 2: using H

Row of action i in H: (a, b, a, c). Given opponent strategy p, what is the probability of observing a, b, c?

\begin{pmatrix} q_a \\ q_b \\ q_c \end{pmatrix}
= \begin{pmatrix} p_1 + p_3 \\ p_2 \\ p_4 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \end{pmatrix}

- The indicator rows form the signal matrix S_i.
- For a pair of actions, stack the signal matrices: S_{i,i'} = \begin{pmatrix} S_i \\ S_{i'} \end{pmatrix}.
- If S_{i,i'} p = S_{i,i'} p', there is no way we can distinguish p from p'.
- The nullspace of S_{i,i'} is therefore "dangerous".
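A sketch of the signal-matrix construction described above (the function name is my own): each row of S_i is the indicator of one feedback symbol within row i of H, so q = S_i p is the distribution over observed symbols.

import numpy as np

def signal_matrix(H_row):
    """One indicator row per distinct feedback symbol appearing in this row of H."""
    symbols = sorted(set(H_row))
    S = np.array([[1 if h == s else 0 for h in H_row] for s in symbols])
    return S, symbols

H_row_i = ["a", "b", "a", "c"]        # the example row from the slide
S_i, symbols = signal_matrix(H_row_i)
print(symbols)                         # ['a', 'b', 'c']
print(S_i)                             # [[1 0 1 0], [0 1 0 0], [0 0 0 1]]

p = np.array([0.1, 0.2, 0.3, 0.4])    # opponent strategy (illustrative)
print(S_i @ p)                         # observation probabilities (q_a, q_b, q_c)

# For a pair of actions, S_{i,i'} is the two signal matrices stacked:
S_j, _ = signal_matrix(["a", "a", "b", "b"])
S_ij = np.vstack([S_i, S_j])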
What makes a game easy?

"Local observability": given two neighbouring actions, decide which one is better without using other actions.

The condition (local observability): for every neighbouring action pair i, i', the loss difference \ell_i - \ell_{i'} is in the row space of S_{i,i'}.

Why? It yields an unbiased estimate of \langle \ell_i - \ell_{i'}, p^* \rangle, i.e. of "which action is better?".
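One way to test the condition numerically (a sketch; the paper works with exact linear algebra): \ell_i - \ell_{i'} lies in the row space of S_{i,i'} exactly when appending it to S_{i,i'} does not increase the rank. The example matrices below are made up for illustration.

import numpy as np

def locally_observable(S_ij, loss_diff, tol=1e-9):
    """True iff loss_diff = l_i - l_i' lies in the row space of the stacked signal matrix."""
    base_rank = np.linalg.matrix_rank(S_ij, tol=tol)
    augmented = np.vstack([S_ij, loss_diff])
    return np.linalg.matrix_rank(augmented, tol=tol) == base_rank

S_ij = np.array([[1, 0, 1, 0],
                 [0, 1, 0, 1]], dtype=float)
print(locally_observable(S_ij, np.array([1.0, -1.0, 1.0, -1.0])))  # True: in the row space
print(locally_observable(S_ij, np.array([1.0, -1.0, 0.0, 0.0])))   # False: not observable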
Algorithm outline

- Maintain a set of "alive" actions.
- In every "round", choose each alive action once.
- Update the estimates of the loss differences.
- If a loss difference is significant (Bernstein stopping), eliminate the suboptimal halfspace of the outcome simplex.
- Repeat until only one action remains, or time step T is reached.
- Achieves \tilde{O}(\sqrt{T}) regret under local observability.

[Figure: the simplex with the true outcome distribution p^*; eliminated halfspaces shrink the region that can still contain p^*.]
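A simplified, self-contained Python sketch of such an elimination loop. This is my own toy version under several assumptions, not the authors' actual algorithm: a small made-up game, comparisons over all alive pairs rather than only neighbouring cells, and a crude Hoeffding-style stopping rule instead of a Bernstein one.

import numpy as np

rng = np.random.default_rng(1)

def signal_matrix(H_row):
    symbols = sorted(set(H_row))
    S = np.array([[1.0 if h == s else 0.0 for h in H_row] for s in symbols])
    return S, symbols

def estimator_weights(S_i, S_j, loss_diff):
    # Least-squares w with w^T [S_i; S_j] = l_i - l_j (exists under local observability).
    S = np.vstack([S_i, S_j])
    w, *_ = np.linalg.lstsq(S.T, loss_diff, rcond=None)
    return w[: S_i.shape[0]], w[S_i.shape[0]:]

# Toy game (my own example): 3 actions, 2 outcomes; action 2's feedback hides the outcome.
L = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [0.4, 0.6]])
H = [["a", "b"], ["c", "d"], ["e", "e"]]
p_true = np.array([0.3, 0.7])            # hidden outcome distribution (action 1 has the smallest expected loss)
N, M = L.shape
S = [signal_matrix(row) for row in H]
weights = {(i, j): estimator_weights(S[i][0], S[j][0], L[i] - L[j])
           for i in range(N) for j in range(N) if i < j}

def play(i):
    # Play action i once; return the index of the observed feedback symbol.
    J = rng.choice(M, p=p_true)
    return S[i][1].index(H[i][J])

alive = set(range(N))
sums = np.zeros((N, N))                  # running sums of pairwise loss-difference estimates
rounds, max_rounds = 0, 3000

while len(alive) > 1 and rounds < max_rounds:
    rounds += 1
    obs = {i: play(i) for i in alive}    # one "round": every alive action is chosen once
    for i in alive:
        for j in alive:
            if i < j:
                w_i, w_j = weights[(i, j)]
                est = w_i[obs[i]] + w_j[obs[j]]   # unbiased estimate of <l_i - l_j, p_true>
                sums[i, j] += est
                sums[j, i] -= est
    radius = 2.0 * np.sqrt(np.log(max_rounds) / rounds)   # crude Hoeffding-style radius
    for i in list(alive):
        if any(sums[i, j] / rounds > radius for j in alive if j != i):
            alive.discard(i)             # some alive action looks significantly better: drop i

print("rounds used:", rounds, " surviving actions:", sorted(alive))
print("true expected losses:", L @ p_true)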
Lower bound for hard games

- Actions i and j alone do not give enough feedback:
- a "dangerous line" (a direction in Ker S_{i,j}) crosses the boundary between their cells;
- a third action is needed to tell the two sides apart, but it is costly.
- When does this line exist? Coincidence! Exactly when local observability fails:

(\ell_i - \ell_j) \notin \operatorname{Im} S_{i,j}^{\top} \;\Longleftrightarrow\; \operatorname{Ker} S_{i,j} \not\subseteq (\ell_i - \ell_j)^{\perp}
\quad \text{(unobservable)} \qquad\qquad \text{(the line crosses)}

- This gives an \Omega(T^{2/3}) lower bound.

[Figure: the simplex with the cells of actions i and j; the dangerous line in Ker S_{i,j} crosses their common boundary.]
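The equivalence is the standard duality Im S^T = (Ker S)^\perp in disguise; a short derivation in my own wording:

\[
\ell_i - \ell_j \in \operatorname{Im} S_{i,j}^{\top} = \bigl(\operatorname{Ker} S_{i,j}\bigr)^{\perp}
\;\Longleftrightarrow\;
\langle \ell_i - \ell_j, x \rangle = 0 \ \text{for all } x \in \operatorname{Ker} S_{i,j}
\;\Longleftrightarrow\;
\operatorname{Ker} S_{i,j} \subseteq (\ell_i - \ell_j)^{\perp}.
\]

Negating both sides gives the displayed equivalence: the pair is unobservable precisely when some direction in Ker S_{i,j} leaves the boundary hyperplane, i.e. the dangerous line crosses between the two cells.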
Discussion

- Finite stochastic partial monitoring fully classified: trivial, easy, hard, hopeless.
- Key condition separating easy and hard: local observability.
- New algorithm achieves the minimax regret rate for easy games.
- Computational efficiency: verifying the condition; the algorithm itself.
- Scaling with the number of actions? Lower bound: does not scale; upper bound: O(N^{3/2}).
- Scaling with the number of outcomes? Nope!
- Non-stochastic opponent? Conjecture: the classification holds. An algorithm for easy games is wanted.
Thank you!
Questions?
References

Antos, A., Bartók, G., Pál, D., and Szepesvári, C. (2011). Toward a classification of finite partial-monitoring games. http://arxiv.org/abs/1102.2041.

Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R. E. (2002). The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77.

Cesa-Bianchi, N., Lugosi, G., and Stoltz, G. (2006). Regret minimization under partial monitoring. Mathematics of Operations Research, 31(3):562–580.

Littlestone, N. and Warmuth, M. K. (1994). The weighted majority algorithm. Information and Computation, 108:212–261.

Piccolboni, A. and Schindelhauer, C. (2001). Discrete prediction games with arbitrary feedback and loss. In Proceedings of the 14th Annual Conference on Computational Learning Theory (COLT 2001), pages 208–223. Springer-Verlag.