Online Learning : Beyond Regret - VideoLectures.NET

Online Learning: Beyond Regret

Alexander Rakhlin (Wharton School, University of Pennsylvania)
Karthik Sridharan (Toyota Technological Institute at Chicago)
Ambuj Tewari (University of Texas at Austin)

Online Learning: External Regret

For t = 1, …, T:
  Player picks f_t ∈ F
  Adversary picks x_t ∈ X
  Player suffers loss ℓ(f_t, x_t)

Performance measure — external regret:

Reg_T := (1/T) Σ_{t=1}^T ℓ(f_t, x_t) − inf_{f∈F} (1/T) Σ_{t=1}^T ℓ(f, x_t)
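The external-regret protocol above can be simulated directly. Below is a minimal sketch (not from the talk) that runs the exponential-weights algorithm, a standard learner for finite F with losses in [0, 1], and reports its average external regret; the function name and the learning rate η are illustrative choices.

```python
import math

def exp_weights_regret(losses, eta=0.5):
    """Run exponential weights on a T x |F| loss matrix and return the
    average external regret: (1/T) * (player's expected cumulative loss
    minus the cumulative loss of the best fixed expert)."""
    T, n = len(losses), len(losses[0])
    w = [1.0] * n
    total = 0.0
    for row in losses:
        s = sum(w)
        # expected loss of the randomized player this round
        total += sum(wi * li for wi, li in zip(w, row)) / s
        # multiplicative weight update
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, row)]
    best = min(sum(row[f] for row in losses) for f in range(n))
    return (total - best) / T

# a loss sequence over two experts where expert 0 is best overall
losses = [[0.0, 1.0] if t % 3 else [1.0, 0.0] for t in range(30)]
print(round(exp_weights_regret(losses), 3))
```

For a fixed loss sequence, the average regret of exponential weights decays on the order of √(log|F| / T), matching the classical bound.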

Online Learning: Beyond Regret

Performance measure:

Reg_T := B(ℓ(f_1, x_1), …, ℓ(f_T, x_T)) − inf_{φ∈Φ_T} B(ℓ^{φ_1}(f_1, x_1), …, ℓ^{φ_T}(f_T, x_T))

• ℓ : F × X → H is an H-valued loss (payoff) function, H a subset of a Banach space
• B : H^T → R is a form of cumulative cost
• Φ_T is a class of payoff transformations φ = (φ_1, …, φ_T), where each φ_t : H^{F×X} → H^{F×X} transforms the payoff function ℓ → ℓ^{φ_t}

Goal: Reg_T → 0

Examples

• Φ-Regret:

  Reg_T = (1/T) Σ_{t=1}^T ℓ(f_t, x_t) − inf_{φ∈Φ} (1/T) Σ_{t=1}^T ℓ(φ(f_t), x_t)

• Blackwell's Approachability:

  Reg_T = inf_{c∈S} ‖ (1/T) Σ_{t=1}^T ℓ(f_t, x_t) − c ‖

• Learning with Global Costs [Even-Dar et al. '09]:

  Reg_T = ‖ (1/T) Σ_{t=1}^T ℓ(f_t, x_t) ‖ − inf_{f∈F} ‖ (1/T) Σ_{t=1}^T ℓ(f, x_t) ‖

• Calibration, adaptive regret, tracking the best expert, slowly varying experts, …
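For intuition on Φ-regret, consider internal regret: the special case where each φ swaps a single action i for another action j and leaves every other action fixed. It can be computed directly from a play history; the sketch below assumes a finite action set, and the helper name is hypothetical.

```python
def internal_regret(plays, losses):
    """Average internal regret: the largest per-round improvement
    obtainable by replacing every play of action i with action j."""
    T = len(plays)
    n = len(losses[0])
    actual = sum(losses[t][plays[t]] for t in range(T))
    best_gain = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # cumulative loss if every round where i was played had used j
            swapped = sum(
                losses[t][j] if plays[t] == i else losses[t][plays[t]]
                for t in range(T)
            )
            best_gain = max(best_gain, actual - swapped)
    return best_gain / T

# playing the worse action every round yields large internal regret
print(internal_regret([0, 1, 0, 1], [[1, 0], [0, 1], [1, 0], [0, 1]]))
```

Swap regret is the same computation with φ ranging over all maps F → F rather than single-pair swaps.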

Value of the Game

Regret of the optimal learner against the optimal adversary:

V_T(ℓ, Φ_T) := inf_{q_1∈∆(F)} sup_{x_1∈X} E_{f_1∼q_1} ⋯ inf_{q_T∈∆(F)} sup_{x_T∈X} E_{f_T∼q_T} [ Reg_T(f_{1:T}, x_{1:T}) ]

There exists a randomized online learning algorithm whose expected regret is bounded by V_T(ℓ, Φ_T). No algorithm can guarantee regret better than V_T(ℓ, Φ_T).

Some comments:
• A high-probability version is possible
• T is fixed in advance (a.s. convergence: calibration, Blackwell approachability, Φ-regret)
• Non-constructive
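Although the general value is non-constructive, for a one-round binary game it can be found by brute force. A toy sketch (my example, not from the talk): matching-pennies losses ℓ(f, x) = 1{f = x}, with a grid search over the player's mixed strategy q ∈ ∆(F).

```python
def value_one_round(loss, grid=1000):
    """Grid-search inf_q sup_x E_{f~q}[ loss(f, x) - min_f' loss(f', x) ]
    for a two-action, two-outcome game given as a 2x2 loss matrix."""
    best = float("inf")
    for k in range(grid + 1):
        q = k / grid  # probability of playing action 0
        worst = max(
            q * (loss[0][x] - min(loss[0][x], loss[1][x]))
            + (1 - q) * (loss[1][x] - min(loss[0][x], loss[1][x]))
            for x in (0, 1)
        )
        best = min(best, worst)
    return best

pennies = [[1.0, 0.0], [0.0, 1.0]]  # loss(f, x) = 1 if f == x
print(value_one_round(pennies))  # 0.5
```

The uniform strategy q = (1/2, 1/2) is optimal here, so the one-round value is 1/2; over T rounds the averaged value shrinks, which is what the triplex inequality below quantifies.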

Triplex Inequality

V_T(ℓ, Φ_T) := inf_{q_1∈∆(F)} sup_{x_1∈X} E_{f_1∼q_1} ⋯ inf_{q_T∈∆(F)} sup_{x_T∈X} E_{f_T∼q_T} [ Reg_T(f_{1:T}, x_{1:T}) ]

  ≤ Martingale convergence term  (1)
  + Best response term  (2)
  + Problem complexity term (uniform martingale convergence over Φ_T)  (3)

Rademacher Complexity on Steroids

Sequential complexity:

R_T(ℓ, Φ_T, B) = sup_{f,x} E_ε sup_{φ∈Φ_T} B(ε_1 ℓ^{φ_1}(f_1(ε), x_1(ε)), …, ε_T ℓ^{φ_T}(f_T(ε), x_T(ε)))

where ε = (ε_1, …, ε_T) ∼ Unif{±1}^T, and f, x range over trees of depth T whose node at level t determines f_t(ε), x_t(ε) from the signs ε_1, …, ε_{t−1}.

[Figure: a binary tree of depth 3 with nodes x_1, …, x_7; edges labeled −1 and +1 select the path x_t(ε).]

Example: for ε = (+1, −1, −1),

B(ε_1 ℓ^{φ_1}(f_1(ε), x_1(ε)), ε_2 ℓ^{φ_2}(f_2(ε), x_2(ε)), ε_3 ℓ^{φ_3}(f_3(ε), x_3(ε)))
  = B(+ℓ^{φ_1}(f_1, x_1), −ℓ^{φ_2}(f_3, x_3), −ℓ^{φ_3}(f_6, x_6))
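The tree notation x_t(ε) can be made concrete in a few lines: store the depth-3 tree in level order and walk it along a sign path. This sketches only the indexing convention, assuming +1 selects the right child (which matches the example above).

```python
def tree_path(tree, eps):
    """Walk a complete binary tree stored in level order (1-indexed:
    node i has left child 2i for sign -1 and right child 2i+1 for +1)
    and return the values x_1(eps), x_2(eps), ... visited on the path."""
    i, path = 1, []
    for e in eps:
        path.append(tree[i - 1])
        i = 2 * i + (1 if e == +1 else 0)
    return path

tree = ["x1", "x2", "x3", "x4", "x5", "x6", "x7"]
print(tree_path(tree, (+1, -1, -1)))  # ['x1', 'x3', 'x6']
```

Note that x_t(ε) depends only on ε_1, …, ε_{t−1}: the sign ε_t is revealed after the node is reached, which is exactly what makes the suprema over f and x "sequential".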

Rademacher Complexity on Steroids (cont.)

If B is sub-additive, the problem complexity term satisfies:

(3) ≤ 2 R_T(ℓ, Φ_T, B)

Bounding Sequential Complexity

Take B(z_1, …, z_T) = G( (1/T) Σ_{t=1}^T z_t ), where G is 1-Lipschitz w.r.t. a smooth norm ‖·‖ (e.g. G linear, or G(·) = ‖·‖).

If for all f ∈ F, x ∈ X, φ ∈ Φ_T, t ∈ [T] we have ‖ℓ^{φ_t}(f, x)‖ ≤ 1, then:

R_T(ℓ, Φ_T, B) ≤ √( γ log(2 |Φ_T|) / T )

What about infinite Φ_T?
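As a quick numerical sanity check (not from the talk), the finite-class bound above can be evaluated directly; γ, the smoothness constant of the norm, is set to 1 here purely for illustration.

```python
import math

def finite_class_bound(T, n_phi, gamma=1.0):
    """Evaluate the finite-class bound sqrt(gamma * log(2 * |Phi_T|) / T)."""
    return math.sqrt(gamma * math.log(2 * n_phi) / T)

# the bound shrinks at rate 1/sqrt(T) and only logarithmically in |Phi_T|
for T in (100, 10_000, 1_000_000):
    print(T, round(finite_class_bound(T, n_phi=1000), 4))
```

The logarithmic dependence on |Φ_T| is what makes the bound useful for very large transformation classes; the infinite case is handled next via covering numbers.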

Covering Numbers

Example: given trees f and x, a small set of covering trees approximates every φ ∈ Φ on (f, x).

[Figure: three depth-3 trees on nodes x_1, …, x_7 whose nodes carry the H-valued images under maps φ, e.g. (0,0), (0,1), (1,0), together with the covering trees that α-cover them.]

N(α, Φ_T, (f, x)) = size of the smallest α-cover on (f, x).

If for all f ∈ F, x ∈ X, φ ∈ Φ_T, t ∈ [T] we have ‖ℓ^{φ_t}(f, x)‖ ≤ 1, then:

R_T(ℓ, Φ_T, B) ≤ 4 inf_{α>0} { α + 6 √(γ/T) ∫_α^1 √( log N(β, Φ_T, T) ) dβ }
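The covering-number bound can be evaluated numerically for a hypothetical entropy model. The sketch below assumes log N(β, Φ_T, T) = β^(−p), approximates the entropy integral with a trapezoidal rule, and takes γ = 1 as an illustrative choice; all constants follow the displayed bound.

```python
import math

def dudley_bound(T, p, gamma=1.0, grid=500):
    """Numerically evaluate 4 * inf_alpha [ alpha + 6*sqrt(gamma/T) *
    integral_alpha^1 sqrt(log N(beta)) dbeta ] under the hypothetical
    entropy model log N(beta) = beta**(-p)."""
    def entropy_integral(alpha):
        # trapezoidal rule for integral of beta^(-p/2) over [alpha, 1]
        xs = [alpha + (1 - alpha) * k / grid for k in range(grid + 1)]
        ys = [x ** (-p / 2) for x in xs]
        h = (1 - alpha) / grid
        return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
    best = float("inf")
    for k in range(1, grid):
        alpha = k / grid
        best = min(best, alpha + 6 * math.sqrt(gamma / T) * entropy_integral(alpha))
    return 4 * best

print(round(dudley_bound(T=10_000, p=1.0), 4))
```

For p < 2 the integral converges as α → 0 and the bound decays like 1/√T; for p > 2 the optimal α must stay bounded away from zero, which slows the rate.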



Φ-Regret

[Foster, Vohra '97; Greenwald, Jafari '03; Stoltz, Lugosi '07; Hazan, Kale '07; Gordon et al. '08]

Reg_T = (1/T) Σ_{t=1}^T ℓ(f_t, x_t) − inf_{φ∈Φ} (1/T) Σ_{t=1}^T ℓ(φ(f_t), x_t)

V_T(ℓ, Φ_T, B) ≤ 2 R_T(ℓ, Φ_T, B)

Examples:

• Swap regret: O( √( |F| log |F| / T ) )
• Internal regret: O( √( log |F| / T ) )
• Linear Φ — online convex optimization with a 1-Lipschitz objective, F = unit ball of a Hilbert space, Φ = {M linear : ‖M‖ ≤ R}:

  V_T(ℓ, Φ_T, B) ≤ O( √( R / T ) )

• Metric entropy case, log N_metric(Φ, α) = Θ(α^{−p}):

  V_T(Φ) ≤ 1/√T for p < 2 (compared to the previous T^{−1/(p+2)} [Stoltz, Lugosi '07]), and 1/T^{1/p} for p > 2

Φ-Regret (cont.)

• Sequential covering numbers instead of metric entropy
• Bounds for settings beyond OCO
• Faster convergence to Φ-correlated equilibria

Blackwell Approachability

[Blackwell '56; Lehrer '03]

Reg_T = inf_{c∈S} ‖ (1/T) Σ_{t=1}^T ℓ(f_t, x_t) − c ‖    (general Banach space)

For any one-shot approachable game ℓ:

V_T(ℓ, Φ_T, B) ≤ 4 sup_M E ‖ (1/T) Σ_{t=1}^T d_t ‖

where the supremum is over absconv(H)-valued martingale difference sequences. Conversely, for every symmetric convex H there exists a one-shot approachable game such that:

V_T(ℓ, Φ_T, B) ≥ (1/2) sup_M E ‖ (1/T) Σ_{t=1}^T d_t ‖

Examples: ℓ_2 case Θ(√(1/T)); ℓ_1^d case Θ(√(d/T)); ℓ_∞^d case Θ(√(log d / T)).

Other examples: Hilbert spaces, matrix norms, … A tight characterization of the rates.
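The martingale-averaging quantity that governs these rates can be estimated by Monte Carlo (my own sanity check, not part of the talk): draw a bounded martingale difference sequence with i.i.d. ±1 coordinates, which lies in the ℓ_∞ unit ball, and estimate E‖(1/T) Σ d_t‖ in a chosen norm. Under the √(log d / T) scaling, quadrupling T should roughly halve the ℓ_∞ estimate.

```python
import random

def avg_norm(T, d, norm="linf", trials=300, seed=0):
    """Monte Carlo estimate of E || (1/T) * sum_{t<=T} d_t ||, where each
    d_t in R^d has i.i.d. uniform ±1 coordinates (a bounded martingale
    difference sequence taking values in the l_inf unit ball)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s = [0] * d
        for _ in range(T):
            for i in range(d):
                s[i] += rng.choice((-1, 1))
        avg = [abs(si) / T for si in s]
        if norm == "l2":
            total += sum(a * a for a in avg) ** 0.5
        elif norm == "l1":
            total += sum(avg)
        else:  # l_inf
            total += max(avg)
    return total / trials

# quadrupling T should roughly halve the l_inf estimate
print(avg_norm(64, 8), avg_norm(256, 8))
```

The same experiment with the ℓ_1 or ℓ_2 norm (and payoffs bounded in that norm) exhibits the √(d/T) and √(1/T) scalings listed above.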

Other Applications

• Calibration
• Global cost learning [Even-Dar et al. '09]
• Tracking the best expert (k time-invariant intervals), including infinite F [Herbster, Warmuth '98]
• Slowly varying transformations beyond OCO
• Adaptive regret [Hazan, Seshadhri '09]

Future Directions

• Sequential-complexity-based generic algorithm?
• Stochastic/constrained adversaries? (external regret: see the arXiv version)
• Generic analysis for partial-information games?

[Figure: a cube spanned by three axes — "Adversary power" (i.i.d. → adversarial), "Information" (full information → bandit), and "Performance measure" (external regret → beyond).]

Thanks! Köszönöm! (Hungarian: thank you!)