4. Opponent Forecasting in Repeated Games
Julian and Mohamed, Online calibrated forecasts: Memory efficiency versus universality for learning in games
Learning in Games
- The literature examines the limiting behavior of interacting players.
- One approach is to have players compute forecasts of their opponents and best respond. Types of forecasting and convergence results using calibrated forecasts:
  - Calibrated forecasts ⇒ correlated equilibria.
  - Calibrated forecasts of joint actions ⇒ convex hull of Nash equilibria.
- These calibration algorithms are computationally intensive.
- The authors propose tracking forecasts.
Static Game

- Two players, $P_1$ and $P_2$.
- $m$ moves per player.
- Strategy $p_i \in \Delta$, the probability simplex.
- Player $i$'s action $a_i = \mathrm{rand}[p_i] \in \mathrm{vert}[\Delta]$.
- Utility $U_i(a_i, a_{-i}; p_i) = a_i^T M_i a_{-i} + \tau H(p_i)$, where $H(p) = -p^T \log(p)$ is the entropy of the distribution $p$ (sketched below).
- $H$ is maximized when $p$ is uniform and minimized when all weight is on one alternative.
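A minimal numerical sketch of the action sampling and the entropy-regularized utility; the function names and the numpy dependency are illustrative, not taken from the slides:

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -p^T log p; maximal for uniform p, zero at a vertex."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

def rand_action(p_i, rng):
    """rand[p_i]: sample a vertex of the simplex (a one-hot action) from the mixed strategy p_i."""
    m = len(p_i)
    return np.eye(m)[rng.choice(m, p=p_i)]

def utility(a_i, a_minus_i, p_i, M_i, tau):
    """U_i(a_i, a_-i; p_i) = a_i^T M_i a_-i + tau * H(p_i)."""
    return float(a_i @ M_i @ a_minus_i) + tau * entropy(p_i)
```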
Static Game

- Best response: $\beta_i(p_{-i}) = \arg\max_{p_i \in \Delta} U_i(p_i, p_{-i})$.
- Logit function: $(\sigma(x))_i = \dfrac{e^{x_i}}{e^{x_1} + \cdots + e^{x_n}}$.
- The best response for $\tau > 0$ is $\beta_i(p_{-i}) = \sigma(M_i p_{-i}/\tau)$ (sketched below).
- Nash equilibrium: $U_i(p_i, p_{-i}^*) \le U_i(p_i^*, p_{-i}^*)$ for all $p_i \in \Delta$, $i \in \{1, 2\}$, or equivalently, $p_i^* = \beta_i(p_{-i}^*)$, $i \in \{1, 2\}$.
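A small sketch of the logit (softmax) best response, assuming $\tau > 0$; names are illustrative:

```python
import numpy as np

def softmax(x):
    """(sigma(x))_i = exp(x_i) / (exp(x_1) + ... + exp(x_n)), computed stably."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def best_response(M_i, p_minus_i, tau):
    """Smoothed best response beta_i(p_-i) = sigma(M_i p_-i / tau); requires tau > 0."""
    return softmax(M_i @ p_minus_i / tau)
```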
Repeated Game

- Stages $k = 0, 1, 2, \ldots$.
- Player $i$'s strategy $p_i(k)$ yields action $a_i(k)$.
- Utility $U_i(a_i(k), a_{-i}(k); p_i(k))$.
- At each stage, players update $p_i(k)$ with the information available.
- Players can observe the other player's action after each period.
- Players can update the empirical frequency as follows (sketched below):
  $$q_i(k+1) = q_i(k) + \frac{1}{k+1}\big(a_i(k) - q_i(k)\big).$$
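As a sketch, the empirical-frequency update is a running average of the observed one-hot actions; the helper name is mine:

```python
def empirical_frequency_update(q_i, a_i, k):
    """q_i(k+1) = q_i(k) + (a_i(k) - q_i(k)) / (k + 1): after stages 0..k this
    equals the average of the one-hot actions a_i(0), ..., a_i(k)."""
    return q_i + (a_i - q_i) / (k + 1)
```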
Repeated Game Variations

- Smooth fictitious play ($\tau > 0$): player $i$ updates
  $$a_i(k) = \mathrm{rand}[p_i(k)], \qquad p_i(k) = \beta_i(q_{-i}(k)).$$
- Gradient play ($\tau = 0$): player $i$ updates
  $$a_i(k) = \mathrm{rand}[p_i(k)], \qquad p_i(k) = \Pi_\Delta\big[q_i(k) + M_i q_{-i}(k)\big].$$
  This comes from $\nabla_{p_i} U_i(p_i, p_{-i}) = M_i p_{-i}$ (both update rules are sketched below).
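A sketch of the two strategy maps, reusing the `best_response` helper from the earlier sketch; the simplex-projection routine is the standard sort-based algorithm, not something from the slides:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection Pi_Delta[v] onto the probability simplex (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def sfp_strategy(M_i, q_minus_i, tau):
    """Smooth fictitious play (tau > 0): p_i(k) = beta_i(q_-i(k))."""
    return best_response(M_i, q_minus_i, tau)  # logit best response from the earlier sketch

def gradient_play_strategy(M_i, q_i, q_minus_i):
    """Gradient play (tau = 0): p_i(k) = Pi_Delta[q_i(k) + M_i q_-i(k)],
    using grad_{p_i} U_i(p_i, p_-i) = M_i p_-i."""
    return project_simplex(q_i + M_i @ q_minus_i)
```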
Repeated Game Variations

- Exponential regret matching: players update according to
  $$a_i(k) = \mathrm{rand}[p_i(k)], \qquad p_i(k) = \sigma(r_i(k)/\tau),$$
  $$r_i(k+1) = r_i(k) + \frac{1}{k+1}\Big(M_i a_{-i}(k) - \big(a_i(k)^T M_i a_{-i}(k)\big)\mathbf{1}\Big).$$
- Smooth fictitious play with tracking forecasts: players update according to
  $$a_i(k) = \mathrm{rand}[p_i(k)], \qquad p_i(k) = \beta_i(f_{-i}(k)),$$
  $$f_{-i}(k+1) = f_{-i}(k) + \frac{1}{(k+1)^{\rho}}\big(a_{-i}(k) - f_{-i}(k)\big),$$
  where $\rho \in [0, 1]$. Both updates are sketched below.
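A sketch of both update rules as plain functions; the names are mine and the regret update follows the formula reconstructed above:

```python
import numpy as np

def erm_update(r_i, a_i, a_minus_i, M_i, k):
    """Exponential regret matching regret update:
    r_i(k+1) = r_i(k) + ( M_i a_-i(k) - (a_i(k)^T M_i a_-i(k)) * 1 ) / (k + 1).
    The stage-k strategy is p_i(k) = sigma(r_i(k) / tau)."""
    ones = np.ones_like(r_i)
    inst_regret = M_i @ a_minus_i - float(a_i @ M_i @ a_minus_i) * ones
    return r_i + inst_regret / (k + 1)

def tracking_forecast_update(f_minus_i, a_minus_i, k, rho):
    """Tracking forecast: f_-i(k+1) = f_-i(k) + (a_-i(k) - f_-i(k)) / (k + 1)^rho,
    with rho in [0, 1]; rho < 1 makes the forecast track faster than the empirical frequency."""
    return f_minus_i + (a_minus_i - f_minus_i) / (k + 1) ** rho
```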
Theorem 4.1

Suppose $P_1$ plays smooth fictitious play with tracking forecasts with $1/2 < \rho < 1$. If the outcome sequence of $P_2$ is a
- bounded rate sequence, or a
- relatively bounded rate sequence with $\rho < \eta < 1$,
then almost surely
$$\lim_{k\to\infty} \big(f_2(k) - p_2(k)\big) = 0,$$
which implies that
$$\lim_{k\to\infty} \big(p_1(k) - \beta_1(p_2(k))\big) = 0.$$

- $P_2$ can play SFP, GP, ERM, or SFPTF with $\eta > \rho$ (more like SFP).
Equally Fast Players

- If both players use smooth fictitious play with tracking forecasts with equal $\rho$, then the players' forecasts are not necessarily weakly calibrated.
- Can we get convergence with equally fast players?
- What if both players use smooth fictitious play?
- The asymptotic behavior can be analyzed with the set of ODEs (sketched below):
  $$\dot{q}_1(t) = -q_1(t) + \beta_1(q_2(t)), \qquad \dot{q}_2(t) = -q_2(t) + \beta_2(q_1(t)).$$
- Linearize and find the eigenvalues of the Jacobian.
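A sketch of that stability check: build the ODE right-hand side, take a finite-difference Jacobian at a rest point, and inspect its eigenvalues. This reuses the `best_response` helper from the earlier sketch; a closed-form Jacobian could be used instead:

```python
import numpy as np

def br_dynamics_rhs(q, M1, M2, tau):
    """Right-hand side of qdot_1 = -q_1 + beta_1(q_2), qdot_2 = -q_2 + beta_2(q_1),
    with q = (q_1, q_2) stacked into one vector."""
    m = len(q) // 2
    q1, q2 = q[:m], q[m:]
    return np.concatenate([-q1 + best_response(M1, q2, tau),
                           -q2 + best_response(M2, q1, tau)])

def jacobian_eigenvalues(q_star, M1, M2, tau, eps=1e-6):
    """Finite-difference Jacobian of the dynamics at q_star, then its eigenvalues;
    an eigenvalue with positive real part signals an unstable equilibrium."""
    n = len(q_star)
    f0 = br_dynamics_rhs(q_star, M1, M2, tau)
    J = np.zeros((n, n))
    for j in range(n):
        dq = np.zeros(n)
        dq[j] = eps
        J[:, j] = (br_dynamics_rhs(q_star + dq, M1, M2, tau) - f0) / eps
    return np.linalg.eigvals(J)
```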
Non-Convergence of Smooth Fictitious Play

Theorem (Benaim and Hirsch (1999)). Consider smooth fictitious play with a Nash equilibrium $(q_1^*, q_2^*)$. If any eigenvalue of $J$ has positive real part, then
$$\lim_{k\to\infty} q_i(k) = q_i^*, \quad i \in \{1, 2\}$$
occurs with zero probability.

- Can we adjust the learning algorithm to enable convergence to Nash equilibrium?
- Yes: the modified tracking forecast.
Modified Tracking Forecast

- For the outcome sequence $x(k)$, the modified tracking forecast is defined by
  $$f(k+1) = f(k) + \frac{\lambda}{k+1}\big(x(k) - f(k)\big)$$
  for some fixed $\lambda \gg 1$.
- Weakly calibrated for $\varepsilon \approx 1/\lambda$.
- Smooth fictitious play with combined forecasts: players update according to (sketched below)
  $$a_i(k) = \mathrm{rand}[p_i(k)], \qquad p_i(k) = \beta_i\big((1-\gamma)\, q_{-i}(k) + \gamma f_{-i}(k)\big),$$
  $$f_{-i}(k+1) = f_{-i}(k) + \frac{\lambda}{k+1}\big(a_{-i}(k) - f_{-i}(k)\big).$$
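A sketch of the combined-forecast strategy and the modified tracking update, again reusing the `best_response` helper; the gain `lam` stands in for the fixed $\lambda \gg 1$:

```python
def combined_forecast_strategy(M_i, q_minus_i, f_minus_i, gamma, tau):
    """p_i(k) = beta_i((1 - gamma) * q_-i(k) + gamma * f_-i(k)):
    blend the slow empirical frequency with the fast modified tracking forecast."""
    return best_response(M_i, (1 - gamma) * q_minus_i + gamma * f_minus_i, tau)

def modified_tracking_update(f_minus_i, a_minus_i, k, lam):
    """Modified tracking forecast: f_-i(k+1) = f_-i(k) + lam * (a_-i(k) - f_-i(k)) / (k + 1),
    for a fixed gain lam much larger than 1; weakly calibrated for eps ~ 1/lam."""
    return f_minus_i + lam * (a_minus_i - f_minus_i) / (k + 1)
```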
Theorem 4.2

Suppose both players play smooth fictitious play with combined forecasts, with Nash equilibrium $(q_1^*, q_2^*)$, and let the eigenvalues of $J$ for standard smooth fictitious play be $a_i + j b_i$. Then
$$\lim_{k\to\infty} q_i(k) = q_i^*, \quad i \in \{1, 2\}$$
occurs with strictly positive probability for sufficiently large $\lambda$ if and only if
- $0 \le \gamma \le 1$, if $\max_i a_i < 0$
- $\max_i \dfrac{a_i}{a_i^2 + b_i^2}$