Online Learning: Beyond Regret

Alexander Rakhlin (Wharton School, University of Pennsylvania)
Karthik Sridharan (Toyota Technological Institute at Chicago)
Ambuj Tewari (University of Texas at Austin)
Online Learning: External Regret

For t = 1, ..., T:
  - Player picks f_t \in \mathcal{F}
  - Adversary picks x_t \in \mathcal{X}
  - Player suffers loss \ell(f_t, x_t)
Performance measure, external regret:

  Reg_T := \frac{1}{T} \sum_{t=1}^T \ell(f_t, x_t) - \inf_{f \in \mathcal{F}} \frac{1}{T} \sum_{t=1}^T \ell(f, x_t)
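As a small numeric illustration (not from the talk; the loss table and names below are hypothetical), external regret for a finite class \mathcal{F} can be computed directly from a table of losses:

```python
# Sketch: average external regret for a finite class F, given a
# T x |F| table loss[t][f] and the indices the player actually chose.

def external_regret(loss, played):
    """Average loss of the player minus average loss of the best fixed f."""
    T = len(loss)
    player = sum(loss[t][played[t]] for t in range(T)) / T
    best_fixed = min(sum(loss[t][f] for t in range(T)) / T
                     for f in range(len(loss[0])))
    return player - best_fixed

# Player alternates between two actions while action 0 is always better.
loss = [[0.0, 1.0]] * 4
played = [0, 1, 0, 1]
print(external_regret(loss, played))  # 0.5
```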
Online Learning: Beyond Regret

Performance measure:

  Reg_T := B(\ell(f_1, x_1), \ldots, \ell(f_T, x_T)) - \inf_{\phi \in \Phi_T} B(\ell^{\phi_1}(f_1, x_1), \ldots, \ell^{\phi_T}(f_T, x_T))

where
  - \ell : \mathcal{F} \times \mathcal{X} \to \mathcal{H} is an \mathcal{H}-valued loss (payoff) function, \mathcal{H} a subset of a Banach space;
  - B : \mathcal{H}^T \to \mathbb{R} is a form of cumulative cost;
  - \Phi_T is a class of payoff transformations \phi = (\phi_1, \ldots, \phi_T), each \phi_t : \mathcal{H}^{\mathcal{F} \times \mathcal{X}} \to \mathcal{H}^{\mathcal{F} \times \mathcal{X}} transforming the payoff function \ell \mapsto \ell^{\phi_t}.

Goal: Reg_T \to 0
Examples

  - \Phi-regret:

      Reg_T = \frac{1}{T} \sum_{t=1}^T \ell(f_t, x_t) - \inf_{\phi \in \Phi} \frac{1}{T} \sum_{t=1}^T \ell(\phi(f_t), x_t)
  - Blackwell's approachability:

      Reg_T = \inf_{c \in S} \left\| \frac{1}{T} \sum_{t=1}^T \ell(f_t, x_t) - c \right\|
  - Learning with global costs [Even-Dar et al. '09]:

      Reg_T = \left\| \frac{1}{T} \sum_{t=1}^T \ell(f_t, x_t) \right\| - \inf_{f \in \mathcal{F}} \left\| \frac{1}{T} \sum_{t=1}^T \ell(f, x_t) \right\|
  - Calibration, adaptive regret, tracking the best expert, slowly varying experts, ...
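The norm-based measures above can be evaluated numerically. A toy sketch (illustrative, not from the talk): Blackwell regret with vector payoffs in R^2 and target set S taken, as an assumption, to be the nonpositive orthant, whose \ell_2 distance has the closed form \|\max(v, 0)\|_2:

```python
import math

# Sketch (toy example): Blackwell regret = l2 distance of the average
# vector payoff to the target set S.  Here S is assumed to be the
# nonpositive orthant, so dist(v, S) = || max(v, 0) ||_2.

def average(payoffs):
    T = len(payoffs)
    return [sum(p[i] for p in payoffs) / T for i in range(len(payoffs[0]))]

def blackwell_regret(payoffs):
    avg = average(payoffs)
    return math.sqrt(sum(max(v, 0.0) ** 2 for v in avg))

payoffs = [(1.0, -1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, -1.0)]
print(blackwell_regret(payoffs))  # average is (0, -1), already in S: 0.0
```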
Value of the Game

Regret of the optimal learner against the optimal adversary:

  V_T(\ell, \Phi_T) := \inf_{q_1 \in \Delta(\mathcal{F})} \sup_{x_1 \in \mathcal{X}} \mathbb{E}_{f_1 \sim q_1} \cdots \inf_{q_T \in \Delta(\mathcal{F})} \sup_{x_T \in \mathcal{X}} \mathbb{E}_{f_T \sim q_T} \; Reg_T(f_{1:T}, x_{1:T})
There exists a randomized online learning algorithm whose expected regret is bounded by V_T(\ell, \Phi_T). No algorithm can guarantee regret better than V_T(\ell, \Phi_T).
Some comments:
  - A high-probability version is possible
  - T is fixed in advance (a.s. convergence: calibration, Blackwell approachability, \Phi-regret)
  - Non-constructive
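For intuition about the inf-sup structure, a minimal sketch (hypothetical toy, T = 1): the one-round value \inf_q \sup_x \mathbb{E}_{f \sim q} \ell(f, x) of a game with two player actions and two adversary moves, found by grid search over the player's mixing probability:

```python
# Sketch (toy, T = 1): the value inf_q sup_x E_{f~q} loss(f, x) of a
# one-round game; loss[f][x], player mixes with probability q on f = 0.

def one_round_value(loss, grid=10001):
    best = float("inf")
    for i in range(grid):
        q = i / (grid - 1)
        worst = max(q * loss[0][x] + (1 - q) * loss[1][x] for x in (0, 1))
        best = min(best, worst)
    return best

# Matching pennies: the optimal mix is q = 0.5, value 0.5.
print(one_round_value([[1.0, 0.0], [0.0, 1.0]]))  # 0.5
```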
Triplex Inequality

  V_T(\ell, \Phi_T) := \inf_{q_1 \in \Delta(\mathcal{F})} \sup_{x_1 \in \mathcal{X}} \mathbb{E}_{f_1 \sim q_1} \cdots \inf_{q_T \in \Delta(\mathcal{F})} \sup_{x_T \in \mathcal{X}} \mathbb{E}_{f_T \sim q_T} \; Reg_T(f_{1:T}, x_{1:T})

    \le  (1) martingale convergence term
       + (2) best-response term
       + (3) problem complexity term (uniform martingale convergence over \Phi_T)
Rademacher Complexity on Steroids

Sequential complexity:

  \mathcal{R}_T(\ell, \Phi_T, B) = \sup_{\mathbf{f}, \mathbf{x}} \mathbb{E}_\epsilon \sup_{\phi \in \Phi_T} B(\epsilon_1 \ell^{\phi_1}(f_1(\epsilon), x_1(\epsilon)), \ldots, \epsilon_T \ell^{\phi_T}(f_T(\epsilon), x_T(\epsilon)))

where \epsilon = (\epsilon_1, \ldots, \epsilon_T) \sim \text{Unif}\{\pm 1\}^T and the outer supremum is over \mathcal{F}- and \mathcal{X}-valued trees \mathbf{f}, \mathbf{x} of depth T.
[Figure: a complete binary \mathcal{X}-valued tree of depth 3 with nodes x_1, ..., x_7 and edges labeled -1 / +1.]

Example: for \epsilon = (+1, -1, -1), the sign sequence selects the path (x_1, x_3, x_6), so

  B(\epsilon_1 \ell^{\phi_1}(f_1(\epsilon), x_1(\epsilon)), \epsilon_2 \ell^{\phi_2}(f_2(\epsilon), x_2(\epsilon)), \epsilon_3 \ell^{\phi_3}(f_3(\epsilon), x_3(\epsilon)))
    = B(+\ell^{\phi_1}(f_1, x_1), -\ell^{\phi_2}(f_3, x_3), -\ell^{\phi_3}(f_6, x_6))
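The tree notation x_t(\epsilon) can be made concrete. A minimal sketch (the heap-style indexing is an assumption chosen to match the slide's example, where \epsilon = (+1, -1, -1) visits x_1, x_3, x_6):

```python
# Sketch: an X-valued tree of depth T stored heap-style (1-indexed:
# node i has left child 2i, right child 2i+1).  The sign sequence eps
# selects a root-to-leaf path: -1 goes left, +1 goes right, and
# x_t(eps) is the node reached after eps_1, ..., eps_{t-1}.

def tree_path(depth, eps):
    """Return the node indices x_t(eps) visited for t = 1..depth."""
    node, path = 1, []
    for t in range(depth):
        path.append(node)
        if t < depth - 1:
            node = 2 * node + (1 if eps[t] == +1 else 0)
    return path

print(tree_path(3, (+1, -1, -1)))  # [1, 3, 6] -> (x1, x3, x6) as on the slide
```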
If B is sub-additive, then term (3) of the triplex inequality satisfies:

  (3) \le 2 \mathcal{R}_T(\ell, \Phi_T, B)
Bounding Sequential Complexity

Suppose B(z_1, \ldots, z_T) = G\left( \frac{1}{T} \sum_{t=1}^T z_t \right), where G is 1-Lipschitz w.r.t. a smooth norm \| \cdot \| (e.g. G linear, or G(\cdot) = \| \cdot \|^2).

If for all f \in \mathcal{F}, x \in \mathcal{X}, \phi \in \Phi_T, t \in [T] we have \| \ell^{\phi_t}(f, x) \| \le 1, then:

  \mathcal{R}_T(\ell, \Phi_T, B) \le \sqrt{ \frac{\gamma \log(2 |\Phi_T|)}{T} }

What about infinite \Phi_T?
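The finite-class \sqrt{\log(2|\Phi_T|)/T} rate can be sanity-checked with a seeded Monte Carlo. The setup below is illustrative, not from the talk: B is the average, G the identity, and the constant \gamma = 2 is an assumption (a Massart-type bound) rather than the slide's \gamma:

```python
import math
import random

# Sketch (illustrative): Monte Carlo estimate of
#     E max_phi | (1/T) sum_t eps_t z_t^phi |
# for n sign-valued payoff sequences, compared against the
# Massart-type bound sqrt(2 log(2n) / T)  (gamma = 2 assumed).

def finite_class_rad(Z, trials=2000, seed=0):
    rng = random.Random(seed)
    T = len(Z[0])
    total = 0.0
    for _ in range(trials):
        eps = [rng.choice((-1, 1)) for _ in range(T)]
        total += max(abs(sum(e * z for e, z in zip(eps, row))) / T
                     for row in Z)
    return total / trials

T, n = 200, 8
rng_z = random.Random(42)
Z = [[rng_z.choice((-1, 1)) for _ in range(T)] for _ in range(n)]
est = finite_class_rad(Z)
bound = math.sqrt(2 * math.log(2 * n) / T)
print(round(est, 3), "<=", round(bound, 3))
```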
Covering Numbers

Example: given trees \mathbf{f} and \mathbf{x}:

[Figure: a depth-3 tree x_1, ..., x_7 whose nodes carry \phi-payoffs in \{0,1\}^2, together with covering trees whose node labels, e.g. (0,0), (0,1), (1,0), approximate those payoffs along every path.]

N(\alpha, \Phi_T, (\mathbf{f}, \mathbf{x})) = size of the smallest \alpha-cover on (\mathbf{f}, \mathbf{x}).

If for all f \in \mathcal{F}, x \in \mathcal{X}, \phi \in \Phi_T, t \in [T] we have \| \ell^{\phi_t}(f, x) \| \le 1, then:

  \mathcal{R}_T(\ell, \Phi_T, B) \le 4 \inf_{\alpha > 0} \left( \alpha + 6 \sqrt{\frac{\gamma}{T}} \int_\alpha^1 \sqrt{ \log N(\beta, \Phi_T, T) } \, d\beta \right)
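For intuition about covering numbers themselves, a minimal sketch under a simplifying assumption: a finite, non-sequential set of sequences under the sup norm (the slide's covers are trees adapted to (\mathbf{f}, \mathbf{x}); this only illustrates the \alpha-cover idea):

```python
# Sketch (simplification): a greedy alpha-cover of a finite set of
# sequences under the sup norm.  Greedy gives an upper bound on the
# smallest cover size N(alpha, .).

def dist(a, b):
    return max(abs(u - v) for u, v in zip(a, b))

def greedy_cover_size(points, alpha):
    """Size of a (not necessarily smallest) alpha-cover built greedily."""
    cover = []
    for p in points:
        if all(dist(p, c) > alpha for c in cover):
            cover.append(p)
    return len(cover)

pts = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (0.0, 0.95)]
print(greedy_cover_size(pts, 0.1))  # 3: the first two points share a center
```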
\Phi-Regret
[Foster, Vohra '97; Greenwald, Jafari '03; Stoltz, Lugosi '07; Hazan, Kale '07; Gordon et al. '08]

  Reg_T = \frac{1}{T} \sum_{t=1}^T \ell(f_t, x_t) - \inf_{\phi \in \Phi} \frac{1}{T} \sum_{t=1}^T \ell(\phi(f_t), x_t)

  V_T(\ell, \Phi_T, B) \le 2 \mathcal{R}_T(\ell, \Phi_T, B)

Examples:
  - Swap regret: O\left( \sqrt{ \frac{|\mathcal{F}| \log |\mathcal{F}|}{T} } \right)
  - Internal regret: O\left( \sqrt{ \frac{\log |\mathcal{F}|}{T} } \right)
  - Linear \Phi (online convex optimization with a 1-Lipschitz objective), \mathcal{F} = unit ball of a Hilbert space, \Phi = \{ M \text{ linear} : \|M\| \le R \}:

      V_T(\ell, \Phi_T, B) \le O\left( \sqrt{ \frac{R}{T} } \right)

  - Metric entropy case, \log N_{\text{metric}}(\Phi, \alpha) = \Theta(\alpha^{-p}):

      V_T(\Phi) \le \begin{cases} T^{-1/2}, & p < 2 \quad (\text{compared to the previous } T^{-1/(p+2)} \text{ [Stoltz, Lugosi '07]}) \\ T^{-1/p}, & p > 2 \end{cases}
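For a finite \mathcal{F}, swap regret has a direct brute-force form. A hypothetical sketch (exponential in |\mathcal{F}|, only for tiny classes): \Phi is the set of all swap maps \phi : \mathcal{F} \to \mathcal{F} applied to the player's own plays; internal regret would restrict \Phi to maps changing a single action.

```python
from itertools import product

# Sketch: swap regret over a finite class F = {0, ..., n-1}, with Phi
# the set of all n^n swap maps phi: F -> F applied to the player's plays.

def swap_regret(loss, played, n):
    T = len(loss)
    base = sum(loss[t][played[t]] for t in range(T)) / T
    best = min(sum(loss[t][phi[played[t]]] for t in range(T)) / T
               for phi in product(range(n), repeat=n))
    return base - best

loss = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
played = [0, 1, 0, 1]   # always the worse action
print(swap_regret(loss, played, 2))  # 1.0: the swap phi = (1, 0) fixes every round
```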
\Phi-Regret

  - Sequential covering number instead of metric entropy
  - Bounds for settings beyond OCO
  - Faster convergence to \Phi-correlated equilibria
Blackwell Approachability
[Blackwell '56; Lehrer '03]

  Reg_T = \inf_{c \in S} \left\| \frac{1}{T} \sum_{t=1}^T \ell(f_t, x_t) - c \right\|   (general Banach space)

For any one-shot approachable game \ell:

  V_T(\ell, \Phi_T, B) \le 4 \sup_{M} \mathbb{E} \left\| \frac{1}{T} \sum_{t=1}^T d_t \right\|

where the supremum is over \text{absconv}(\mathcal{H})-valued martingale difference sequences. For every symmetric convex \mathcal{H}, there exists a one-shot approachable game such that:

  V_T(\ell, \Phi_T, B) \ge \frac{1}{2} \sup_{M} \mathbb{E} \left\| \frac{1}{T} \sum_{t=1}^T d_t \right\|

Examples: \ell_2 case \Theta(\sqrt{1/T}); \ell_1^d case \Theta(\sqrt{d/T}); \ell_\infty^d case \Theta(\sqrt{\log d / T}).

Other examples: Hilbert spaces, matrix norms, ... A tight characterization of rates.
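The \ell_2 rate \Theta(\sqrt{1/T}) can be sanity-checked with a seeded Monte Carlo (illustrative, not from the talk): i.i.d. uniform signs are one particular martingale difference sequence, and already exhibit \mathbb{E} \left| \frac{1}{T} \sum_t d_t \right| \approx \sqrt{2/\pi} \cdot T^{-1/2}:

```python
import math
import random

# Sketch (illustrative): for i.i.d. uniform signs d_t in {-1, +1},
# estimate E | (1/T) sum_t d_t |; rescaled by sqrt(T), the estimate
# should be roughly constant (near sqrt(2/pi) ~ 0.80), matching the
# Theta(sqrt(1/T)) rate on the slide.

def avg_norm(T, trials=4000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s = sum(rng.choice((-1, 1)) for _ in range(T))
        total += abs(s) / T
    return total / trials

for T in (100, 400):
    print(T, round(avg_norm(T) * math.sqrt(T), 2))
```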
Other Applications
  - Calibration
  - Global cost learning [Even-Dar et al. '09]
  - Tracking the best expert (k time-invariant intervals), infinite \mathcal{F} [Herbster, Warmuth '98]
  - Slowly varying transformations beyond OCO
  - Adaptive regret [Hazan, Seshadri '09]
Future Directions
  - Sequential-complexity-based generic algorithm?
  - Stochastic/constrained adversary? (external regret: see the arXiv version)
  - Generic analysis for partial-information games?

[Figure: a cube of online learning settings with three axes: adversary power (iid to fully adversarial), information (full information to bandit), and performance measure (external regret and beyond).]
Thanks! Köszönöm!