4. Opponent Forecasting in Repeated Games
Julian and Mohamed, Online calibrated forecasts: Memory efficiency versus universality for learning in games
Learning in Games
- The literature examines the limiting behavior of interacting players.
- One approach is to have players compute forecasts of their opponents and best respond. Types of forecasting and convergence results using calibrated forecasts:
  - Calibrated forecasts ⇒ correlated equilibria.
  - Calibrated forecasts of joint actions ⇒ convex hull of Nash equilibria.
- These calibration algorithms are computationally intensive.
- The authors propose tracking forecasts.
Static Game

- Two players, $P_1$ and $P_2$.
- $m$ moves per player.
- Strategy $p_i \in \Delta$, the probability simplex.
- Player $i$'s action $a_i = \mathrm{rand}[p_i] \in \mathrm{vert}[\Delta]$.
- Utility $U_i(a_i, a_{-i}; p_i) = a_i^T M_i a_{-i} + \tau H(p_i)$, where $H(p) = -p^T \log(p)$ is the entropy of the distribution $p$ (sketched below).
- $H$ is maximized when $p$ is uniform and minimized when all weight is on one alternative.
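A minimal numerical sketch of the action sampling and the entropy-regularized utility; the function names and the numpy dependency are illustrative, not taken from the slides:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -p^T log p; maximal for uniform p, zero at a vertex."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

def rand_action(p_i, rng):
    """rand[p_i]: sample a vertex of the simplex (a one-hot action) from the mixed strategy p_i."""
    m = len(p_i)
    return np.eye(m)[rng.choice(m, p=p_i)]

def utility(a_i, a_minus_i, p_i, M_i, tau):
    """U_i(a_i, a_-i; p_i) = a_i^T M_i a_-i + tau * H(p_i)."""
    return float(a_i @ M_i @ a_minus_i) + tau * entropy(p_i)
```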
Static Game

- Best response: $\beta_i(p_{-i}) = \arg\max_{p_i \in \Delta} U_i(p_i, p_{-i})$.
- Logit function: $(\sigma(x))_i = \dfrac{e^{x_i}}{e^{x_1} + \cdots + e^{x_n}}$.
- The best response for $\tau > 0$ is $\beta_i(p_{-i}) = \sigma(M_i p_{-i}/\tau)$ (sketched below).
- Nash equilibrium: $U_i(p_i, p_{-i}^*) \le U_i(p_i^*, p_{-i}^*)$ for all $p_i \in \Delta$, $i \in \{1, 2\}$, or equivalently, $p_i^* = \beta_i(p_{-i}^*)$, $i \in \{1, 2\}$.
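A small sketch of the logit (softmax) best response, assuming $\tau > 0$; names are illustrative:

```python
import numpy as np

def softmax(x):
    """(sigma(x))_i = exp(x_i) / (exp(x_1) + ... + exp(x_n)), computed stably."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def best_response(M_i, p_minus_i, tau):
    """Smoothed best response beta_i(p_-i) = sigma(M_i p_-i / tau); requires tau > 0."""
    return softmax(M_i @ p_minus_i / tau)
```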
Repeated Game

- Stages $k = 0, 1, 2, \ldots$.
- Player $i$'s strategy $p_i(k)$ yields action $a_i(k)$.
- Utility $U_i(a_i(k), a_{-i}(k); p_i(k))$.
- At each stage, players update $p_i(k)$ with the information available.
- Players can observe the other player's action after each period.
- Players can update the empirical frequency as follows (sketched below):
  $$q_i(k+1) = q_i(k) + \frac{1}{k+1}\big(a_i(k) - q_i(k)\big).$$
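As a sketch, the empirical-frequency update is a running average of the observed one-hot actions; the helper name is mine:

```python
def empirical_frequency_update(q_i, a_i, k):
    """q_i(k+1) = q_i(k) + (a_i(k) - q_i(k)) / (k + 1): after stages 0..k this
    equals the average of the one-hot actions a_i(0), ..., a_i(k)."""
    return q_i + (a_i - q_i) / (k + 1)
```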
Repeated Game Variations

- Smooth fictitious play ($\tau > 0$): player $i$ updates
  $$a_i(k) = \mathrm{rand}[p_i(k)], \qquad p_i(k) = \beta_i(q_{-i}(k)).$$
- Gradient play ($\tau = 0$): player $i$ updates
  $$a_i(k) = \mathrm{rand}[p_i(k)], \qquad p_i(k) = \Pi_\Delta\big[q_i(k) + M_i q_{-i}(k)\big].$$
  This comes from $\nabla_{p_i} U_i(p_i, p_{-i}) = M_i p_{-i}$ (both update rules are sketched below).
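A sketch of the two strategy maps, reusing the `best_response` helper from the earlier sketch; the simplex-projection routine is the standard sort-based algorithm, not something from the slides:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection Pi_Delta[v] onto the probability simplex (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def sfp_strategy(M_i, q_minus_i, tau):
    """Smooth fictitious play (tau > 0): p_i(k) = beta_i(q_-i(k))."""
    return best_response(M_i, q_minus_i, tau)  # logit best response from the earlier sketch

def gradient_play_strategy(M_i, q_i, q_minus_i):
    """Gradient play (tau = 0): p_i(k) = Pi_Delta[q_i(k) + M_i q_-i(k)],
    using grad_{p_i} U_i(p_i, p_-i) = M_i p_-i."""
    return project_simplex(q_i + M_i @ q_minus_i)
```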
Repeated Game Variations

- Exponential regret matching: players update according to
  $$a_i(k) = \mathrm{rand}[p_i(k)], \qquad p_i(k) = \sigma(r_i(k)/\tau),$$
  $$r_i(k+1) = r_i(k) + \frac{1}{k+1}\Big(M_i a_{-i}(k) - \big(a_i(k)^T M_i a_{-i}(k)\big)\mathbf{1}\Big).$$
- Smooth fictitious play with tracking forecasts: players update according to
  $$a_i(k) = \mathrm{rand}[p_i(k)], \qquad p_i(k) = \beta_i(f_{-i}(k)),$$
  $$f_{-i}(k+1) = f_{-i}(k) + \frac{1}{(k+1)^{\rho}}\big(a_{-i}(k) - f_{-i}(k)\big),$$
  where $\rho \in [0, 1]$. Both updates are sketched below.
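A sketch of both update rules as plain functions; the names are mine and the regret update follows the formula reconstructed above:

```python
import numpy as np

def erm_update(r_i, a_i, a_minus_i, M_i, k):
    """Exponential regret matching regret update:
    r_i(k+1) = r_i(k) + ( M_i a_-i(k) - (a_i(k)^T M_i a_-i(k)) * 1 ) / (k + 1).
    The stage-k strategy is p_i(k) = sigma(r_i(k) / tau)."""
    ones = np.ones_like(r_i)
    inst_regret = M_i @ a_minus_i - float(a_i @ M_i @ a_minus_i) * ones
    return r_i + inst_regret / (k + 1)

def tracking_forecast_update(f_minus_i, a_minus_i, k, rho):
    """Tracking forecast: f_-i(k+1) = f_-i(k) + (a_-i(k) - f_-i(k)) / (k + 1)^rho,
    with rho in [0, 1]; rho < 1 makes the forecast track faster than the empirical frequency."""
    return f_minus_i + (a_minus_i - f_minus_i) / (k + 1) ** rho
```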
Theorem 4.1

Suppose $P_1$ plays smooth fictitious play with tracking forecasts with $1/2 < \rho < 1$. If the outcome sequence of $P_2$ is a
- bounded rate sequence, or a
- relatively bounded rate sequence with $\rho < \eta < 1$,
then almost surely
$$\lim_{k\to\infty} \big(f_2(k) - p_2(k)\big) = 0,$$
which implies that
$$\lim_{k\to\infty} \big(p_1(k) - \beta_1(p_2(k))\big) = 0.$$

- $P_2$ can play SFP, GP, ERM, or SFPTF with $\eta > \rho$ (more like SFP).
Equally Fast Players

- If both players use smooth fictitious play with tracking forecasts with equal $\rho$, then the players' forecasts are not necessarily weakly calibrated.
- Can we get convergence with equally fast players?
- What if both players use smooth fictitious play?
- The asymptotic behavior can be analyzed with the set of ODEs (sketched below):
  $$\dot{q}_1(t) = -q_1(t) + \beta_1(q_2(t)), \qquad \dot{q}_2(t) = -q_2(t) + \beta_2(q_1(t)).$$
- Linearize and find the eigenvalues of the Jacobian.
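A sketch of that stability check: build the ODE right-hand side, take a finite-difference Jacobian at a rest point, and inspect its eigenvalues. This reuses the `best_response` helper from the earlier sketch; a closed-form Jacobian could be used instead:

```python
import numpy as np

def br_dynamics_rhs(q, M1, M2, tau):
    """Right-hand side of qdot_1 = -q_1 + beta_1(q_2), qdot_2 = -q_2 + beta_2(q_1),
    with q = (q_1, q_2) stacked into one vector."""
    m = len(q) // 2
    q1, q2 = q[:m], q[m:]
    return np.concatenate([-q1 + best_response(M1, q2, tau),
                           -q2 + best_response(M2, q1, tau)])

def jacobian_eigenvalues(q_star, M1, M2, tau, eps=1e-6):
    """Finite-difference Jacobian of the dynamics at q_star, then its eigenvalues;
    an eigenvalue with positive real part signals an unstable equilibrium."""
    n = len(q_star)
    f0 = br_dynamics_rhs(q_star, M1, M2, tau)
    J = np.zeros((n, n))
    for j in range(n):
        dq = np.zeros(n)
        dq[j] = eps
        J[:, j] = (br_dynamics_rhs(q_star + dq, M1, M2, tau) - f0) / eps
    return np.linalg.eigvals(J)
```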
Non-Convergence of Smooth Fictitious Play

Theorem (Benaim and Hirsch (1999)). Consider smooth fictitious play with a Nash equilibrium $(q_1^*, q_2^*)$. If any eigenvalue of $J$ has positive real part, then
$$\lim_{k\to\infty} q_i(k) = q_i^*, \quad i \in \{1, 2\}$$
occurs with zero probability.

- Can we adjust the learning algorithm to enable convergence to Nash equilibrium?
- Yes: the modified tracking forecast.
Modified Tracking Forecast

- For the outcome sequence $x(k)$, the modified tracking forecast is defined by
  $$f(k+1) = f(k) + \frac{\lambda}{k+1}\big(x(k) - f(k)\big)$$
  for some fixed $\lambda \gg 1$.
- Weakly calibrated for $\varepsilon \approx 1/\lambda$.
- Smooth fictitious play with combined forecasts: players update according to (sketched below)
  $$a_i(k) = \mathrm{rand}[p_i(k)], \qquad p_i(k) = \beta_i\big((1-\gamma)\, q_{-i}(k) + \gamma f_{-i}(k)\big),$$
  $$f_{-i}(k+1) = f_{-i}(k) + \frac{\lambda}{k+1}\big(a_{-i}(k) - f_{-i}(k)\big).$$
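A sketch of the combined-forecast strategy and the modified tracking update, again reusing the `best_response` helper; the gain `lam` stands in for the fixed $\lambda \gg 1$:

```python
def combined_forecast_strategy(M_i, q_minus_i, f_minus_i, gamma, tau):
    """p_i(k) = beta_i((1 - gamma) * q_-i(k) + gamma * f_-i(k)):
    blend the slow empirical frequency with the fast modified tracking forecast."""
    return best_response(M_i, (1 - gamma) * q_minus_i + gamma * f_minus_i, tau)

def modified_tracking_update(f_minus_i, a_minus_i, k, lam):
    """Modified tracking forecast: f_-i(k+1) = f_-i(k) + lam * (a_-i(k) - f_-i(k)) / (k + 1),
    for a fixed gain lam much larger than 1; weakly calibrated for eps ~ 1/lam."""
    return f_minus_i + lam * (a_minus_i - f_minus_i) / (k + 1)
```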
Theorem 4.2

Suppose both players play smooth fictitious play with combined forecasts, with Nash equilibrium $(q_1^*, q_2^*)$, and let the eigenvalues of $J$ for standard smooth fictitious play be $a_i + j b_i$. Then
$$\lim_{k\to\infty} q_i(k) = q_i^*, \quad i \in \{1, 2\}$$
occurs with strictly positive probability for sufficiently large $\lambda$ if and only if
- $0 \le \gamma \le 1$, if $\max_i a_i < 0$
- $\max_i \dfrac{a_i}{a_i^2 + b_i^2}$