Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments
Gábor Bartók, Dávid Pál, Csaba Szepesvári

COLT 2011, Budapest


Finite Stochastic Partial-Monitoring Games

A repeated game between a learner and an environment, scored by a referee. In round t:
- the learner chooses an action I_t, the environment chooses an outcome J_t;
- the referee computes the loss ℓ_t = L(I_t, J_t) and the feedback h_t = H(I_t, J_t);
- the learner receives only the feedback h_t (the loss remains hidden).

L, H ∈ ℝ^{N×M} are publicly known. Finitely many actions and outcomes; stochastic environment.
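The protocol above can be sketched in a few lines of code. This is a minimal illustration of the interaction, not the paper's algorithm: the 3×2 matrices `L` and `H` below are made up for the example, and the environment draws outcomes i.i.d. from a fixed distribution `p`, as in the stochastic setting.

```python
import random

# Hypothetical 3-action, 2-outcome game; both matrices are publicly known.
L = [[0.0, 1.0],
     [1.0, 0.0],
     [0.5, 0.5]]
H = [["a", "b"],
     ["c", "c"],
     ["d", "e"]]

def play(learner, p, T, rng):
    """Run T rounds: the learner picks I_t from the feedback history only,
    the stochastic environment draws J_t ~ p, and the referee computes the
    loss L[I_t][J_t] (kept hidden) and feedback H[I_t][J_t] (revealed)."""
    history, total_loss = [], 0.0
    for _ in range(T):
        i = learner(history)                 # action I_t
        j = 0 if rng.random() < p[0] else 1  # outcome J_t ~ p
        total_loss += L[i][j]                # loss, NOT shown to the learner
        history.append((i, H[i][j]))         # feedback, shown to the learner
    return total_loss, history

rng = random.Random(0)
# A toy "learner" that just cycles through the actions:
loss, hist = play(lambda h: len(h) % 3, [0.3, 0.7], 100, rng)
```

Note that the learner callback sees only the feedback history, never the losses; that information asymmetry is what makes partial monitoring harder than bandits.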

Examples

Bandits: L = H (the feedback is the incurred loss itself).

Full information: every row of H lists the outcomes, so the feedback reveals J_t, e.g.
H = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \\ 1 & 2 & 3 & 4 \end{pmatrix}

Dynamic pricing: L(i, j) = c·𝕀{i > j} + (j − i)·𝕀{i ≤ j}, H(i, j) = 𝕀{i ≤ j}, i.e.
L = \begin{pmatrix} 0 & 1 & \cdots & N-1 \\ c & 0 & \cdots & N-2 \\ \vdots & \ddots & \ddots & \vdots \\ c & \cdots & c & 0 \end{pmatrix}, \qquad
H = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 0 & 1 & \cdots & 1 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 \end{pmatrix}
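The dynamic-pricing matrices can be generated directly from the formulas above. A small sketch (indexing prices and valuations 1..N is an assumption of this illustration):

```python
# L(i, j) = c * I{i > j} + (j - i) * I{i <= j},  H(i, j) = I{i <= j}:
# seller posts price i, buyer has valuation j; no sale costs c, a sale
# loses the unrealized revenue j - i, and the seller only observes
# whether the item sold.
def dynamic_pricing(N, c):
    L = [[c if i > j else j - i for j in range(1, N + 1)]
         for i in range(1, N + 1)]
    H = [[1 if i <= j else 0 for j in range(1, N + 1)]
         for i in range(1, N + 1)]
    return L, H

L, H = dynamic_pricing(4, 2)
# L's first row is (0, 1, ..., N-1), every entry below the diagonal is c,
# and H is the upper-triangular matrix of ones, matching the slide.
```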

Goal

Performance measure: the expected regret of algorithm A,
R_T(A) = E\left[\sum_{t=1}^T L(I_t, J_t)\right] − \min_i E\left[\sum_{t=1}^T L(i, J_t)\right]

The problem: given (L, H), determine the minimax expected regret R̂_T.

A typical result: R̂_T = O(T^α) for some 0 ≤ α ≤ 1.
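As a worked instance of the regret formula: against a stochastic opponent with fixed outcome distribution p we have E[L(i, J_t)] = (Lp)_i, so a learner playing a fixed action distribution w has expected regret T(⟨w, Lp⟩ − min_i (Lp)_i). A toy computation (the 3×2 loss matrix is made up for illustration):

```python
L = [[0.0, 1.0],
     [1.0, 0.0],
     [0.4, 0.4]]
p = [0.3, 0.7]                       # outcome distribution
T = 1000

# Expected per-round loss of each action: (L p)_i
expected_loss = [sum(L[i][j] * p[j] for j in range(len(p)))
                 for i in range(len(L))]
# expected_loss = [0.7, 0.3, 0.4]: the best fixed action is i = 1.

w = [1 / 3, 1 / 3, 1 / 3]            # uniform play, for concreteness
regret = T * (sum(wi * li for wi, li in zip(w, expected_loss))
              - min(expected_loss))
# regret = 1000 * (1.4/3 - 0.3) = 500/3, about 166.7
```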

Previous work

[Figure: the regret-rate axis 0 — T^{1/2} — T^{2/3} — T, split into regions trivial / easy / ? (grey area) / hard / hopeless; full-info and bandit games sit at T^{1/2}, l.e.p. at T^{2/3}, and dynamic pricing falls in the grey area.]

- Full-info, bandits: regret of order √T [Littlestone and Warmuth, 1994, Auer et al., 2002]
- An algorithm for every non-hopeless game with Õ(T^{3/4}) regret [Piccolboni and Schindelhauer, 2001]
- Regret reduced to O(T^{2/3}), plus a matching Ω(T^{2/3}) lower bound for l.e.p. (label efficient prediction) [Cesa-Bianchi et al., 2006]
- Every non-trivial game has regret Ω(√T) [Antos et al., 2011]

These are results for the non-stochastic setting; they apply to the stochastic setting as well.

Our contribution

[Figure: the same regret-rate axis as on the previous slide, highlighting the grey "?" area between T^{1/2} and T^{2/3}.]

What about the grey area between Ω(√T) and O(T^{2/3})? What is in between? Is there a game with, say, Θ(T^{3/5})?

No! We eliminate the grey area. Dynamic pricing is hard!

Main Theorem: The minimax regret of any finite partial-monitoring game against a stochastic opponent is either 0 (trivial), Θ̃(√T) (easy), Θ(T^{2/3}) (hard), or Θ(T) (hopeless).

Main tools 1: using L

Cell decomposition of the probability simplex (the space of outcome distributions): the cell of action i is the set of distributions p for which i has minimal expected loss ⟨ℓ_i, p⟩. E.g., with

L = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ \vdots & \vdots & \vdots \end{pmatrix}

the simplex with vertices (1, 0, 0), (0, 1, 0), (0, 0, 1) is partitioned into polytopal cells, one per action.

The boundary between the cells of actions i and j satisfies: Boundary ⊆ (ℓ_i − ℓ_j)^⊥.
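The cell decomposition can be probed numerically: the cell of action i is simply the set of p where row i of L minimizes ⟨ℓ_i, p⟩. A sketch using the 3×3 fragment of L shown above:

```python
# Cell membership test: which action is optimal for a given outcome
# distribution p?  (L is the 3x3 fragment from the slide.)
L = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]

def optimal_action(p):
    """Index of the action whose expected loss <l_i, p> is smallest."""
    losses = [sum(li * pi for li, pi in zip(row, p)) for row in L]
    return min(range(len(L)), key=lambda i: losses[i])

# Near each vertex of the simplex a different cell is active:
a0 = optimal_action([0.9, 0.05, 0.05])
a1 = optimal_action([0.05, 0.9, 0.05])
a2 = optimal_action([0.05, 0.05, 0.9])
```

Ties occur exactly on the boundary hyperplanes ⟨ℓ_i − ℓ_j, p⟩ = 0, in line with the slide's Boundary ⊆ (ℓ_i − ℓ_j)^⊥.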

Main tools 2: using H

Row of action i in H: (a, b, a, c) — the entries are the feedback symbols.

Given opponent strategy p, what is the probability of observing a, b, or c?

\begin{pmatrix} q_a \\ q_b \\ q_c \end{pmatrix} = \begin{pmatrix} p_1 + p_3 \\ p_2 \\ p_4 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \end{pmatrix}

The matrix of indicator rows is the signal matrix S_i. For a pair of actions i, i′, stack them: S_{i,i'} = \begin{pmatrix} S_i \\ S_{i'} \end{pmatrix}

If S_{i,i'} p = S_{i,i'} p′, there is no way we can distinguish p from p′ using actions i and i′ alone: the nullspace of S_{i,i'} is "dangerous".
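Building the signal matrix from a feedback row is mechanical. A sketch reproducing the slide's example row (a, b, a, c):

```python
# The rows of S_i are indicators of the feedback symbols, so the vector
# of observation probabilities is q = S_i p.
def signal_matrix(h_row):
    symbols = sorted(set(h_row))
    S = [[1 if h == s else 0 for h in h_row] for s in symbols]
    return symbols, S

symbols, S = signal_matrix(["a", "b", "a", "c"])
# S = [[1, 0, 1, 0],   indicator row of symbol "a"
#      [0, 1, 0, 0],   indicator row of symbol "b"
#      [0, 0, 0, 1]]   indicator row of symbol "c"

p = [0.1, 0.2, 0.3, 0.4]                  # opponent strategy
q = [sum(s * pj for s, pj in zip(row, p)) for row in S]
# q = (p1 + p3, p2, p4) = (0.4, 0.2, 0.4)
```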

What makes a game easy?

"Local observability": given two neighboring actions (actions whose cells share a boundary), which is better? We want to decide without using other actions.

The condition (local observability): for every neighboring action pair i, i′, ℓ_i − ℓ_{i'} lies in the row space of S_{i,i'}.

Why? If ℓ_i − ℓ_{i'} = S_{i,i'}^⊤ v for some v, then ⟨ℓ_i − ℓ_{i'}, p*⟩ = ⟨v, S_{i,i'} p*⟩, so the observation probabilities under i and i′ give an unbiased estimate of ⟨ℓ_i − ℓ_{i'}, p*⟩: "Which action is better?"
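The local-observability condition is a row-space membership test, which can be checked with a least-squares residual. A sketch (the matrix `S` and the loss-difference vectors below are toy examples, not taken from a particular game):

```python
import numpy as np

# The pair (i, i') is locally observable iff l_i - l_i' lies in the row
# space of the stacked signal matrix S_{i,i'}, i.e. iff S^T v = l_i - l_i'
# has a solution v.  Checked here via the least-squares residual.
def locally_observable(S, loss_diff, tol=1e-9):
    S = np.asarray(S, dtype=float)
    d = np.asarray(loss_diff, dtype=float)
    v, *_ = np.linalg.lstsq(S.T, d, rcond=None)
    return bool(np.allclose(S.T @ v, d, atol=tol))

S = [[1, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1]]
# (1, -1, 1, 0) = row1 - row2, so it is in the row space...
observable = locally_observable(S, [1, -1, 1, 0])
# ...but (1, 0, 0, 0) would require separating p1 from p3:
not_observable = locally_observable(S, [1, 0, 0, 0])
```

When the test succeeds, the coefficient vector `v` is exactly what turns observed feedback frequencies into the unbiased loss-difference estimate of the previous slide.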

Algorithm outline

[Figure: the simplex with the true outcome distribution p*; suboptimal halfspaces are cut away as actions are eliminated.]

- Maintain a set of "alive" actions.
- In every "round", choose each alive action once.
- Update the estimates of the loss differences.
- If a loss difference is significant (Bernstein stopping), eliminate the suboptimal halfspace (and every action whose cell lies in it).
- Repeat until only one action remains, or until time step T.

Achieves Õ(√T) regret under local observability.
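A heavily simplified sketch of the elimination loop, under strong assumptions: losses are observed directly (the bandit special case L = H), and the Bernstein stopping rule is replaced by a cruder Hoeffding-style confidence radius. The actual algorithm instead estimates loss differences through the signal matrices:

```python
import math
import random

def eliminate(sample_loss, n_actions, T, delta=0.05, rng=None):
    rng = rng or random.Random(0)
    alive = set(range(n_actions))
    sums = [0.0] * n_actions          # running loss sums
    counts = [0] * n_actions          # number of plays per action
    t = 0
    while t < T and len(alive) > 1:
        for i in list(alive):         # one "round": play every alive action
            sums[i] += sample_loss(i, rng)
            counts[i] += 1
            t += 1
        means = {i: sums[i] / counts[i] for i in alive}
        n_min = min(counts[i] for i in alive)
        # Hoeffding-style radius standing in for the Bernstein bound:
        radius = math.sqrt(math.log(2 * n_actions * T / delta) / (2 * n_min))
        best = min(means.values())
        # keep only actions whose loss difference is not yet significant
        alive = {i for i in alive if means[i] - best <= 2 * radius}
    return alive

# Bernoulli losses with means (0.1, 0.5, 0.9): action 0 should survive.
survivors = eliminate(
    lambda i, rng: float(rng.random() < [0.1, 0.5, 0.9][i]),
    n_actions=3, T=10_000)
```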

Lower bound for hard games

[Figure: the simplex with the cells of actions i and j; a "dangerous line" crosses their common boundary.]

- Actions i and j alone do not give enough feedback: a "dangerous line" (a direction in Ker S_{i,j}) crosses their common boundary.
- A third action is needed to tell such opponents apart, but playing it is costly.
- When does this line exist? Exactly when local observability fails:
  ℓ_i − ℓ_j ∉ Im S_{i,j}^⊤ (unobservable) ⟺ Ker S_{i,j} ⊄ (ℓ_i − ℓ_j)^⊥ (line crosses).
- This gives the Ω(T^{2/3}) lower bound.

Discussion

- Finite stochastic partial monitoring is fully classified: trivial, easy, hard, hopeless.
- Key condition separating easy and hard: local observability.
- A new algorithm achieves the minimax regret rate for easy games.
- Computational efficiency: of verifying the condition, and of the algorithm itself.
- Scaling with the number of actions N? The lower bound does not scale; the upper bound scales as O(N^{3/2}).
- Scaling with the number of outcomes? None!
- Non-stochastic opponent? Conjecture: the classification still holds. An algorithm for easy games is wanted.

Thank you!

Questions?


References

Antos, A., Bartók, G., Pál, D., and Szepesvári, C. (2011). Toward a classification of finite partial-monitoring games. http://arxiv.org/abs/1102.2041.

Auer, P., Cesa-Bianchi, N., Freund, Y., and Schapire, R. E. (2002). The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77.

Cesa-Bianchi, N., Lugosi, G., and Stoltz, G. (2006). Regret minimization under partial monitoring. Mathematics of Operations Research, 31(3):562–580.

Littlestone, N. and Warmuth, M. K. (1994). The weighted majority algorithm. Information and Computation, 108:212–261.

Piccolboni, A. and Schindelhauer, C. (2001). Discrete prediction games with arbitrary feedback and loss. In Proceedings of the 14th Annual Conference on Computational Learning Theory (COLT 2001), pages 208–223. Springer-Verlag.