From pointwise convergence of evolutionary dynamics to average case analysis of decentralized algorithms

arXiv:1403.3885v4 [cs.GT] 4 Apr 2014

Ioannis Panageas Georgia Institute of Technology [email protected]

Georgios Piliouras Georgia Institute of Technology [email protected]

Abstract. Nash's proof of equilibrium existence has offered a tool of seminal importance to economic theory; however, a remaining, nagging hurdle is the possibility of multiple equilibria, perhaps with quite different properties. A wide range of approaches has been employed in an effort to reduce the study of games down to analyzing a single behavioral point (e.g., best or worst equilibria with respect to social welfare). We move away from the search for a single, correct solution concept. We propose a quantitative framework where the predictions of different equilibria are weighted by their probability of arising under evolutionary dynamics given random initial conditions. This average case analysis is shown to offer the possibility of a significantly tighter understanding of classic game theoretic settings, including quantifying the risk dominance of payoff dominated equilibria in stag-hunt games and reducing prediction uncertainty due to large gaps between price of stability and price of anarchy in coordination and congestion games. This approach rests upon new, deep topological results about evolutionary dynamics, including point-wise convergence results for replicator dynamics in linear congestion games and explicit computation of stable and unstable manifolds of equilibrium fixed points in asymmetric games. We conclude by introducing a concurrent convergent deterministic discrete-time dynamic for general linear congestion games and performing stability analysis on it.

1 Introduction

Nash's theorem [24] on the existence of fixed points in game theoretic dynamics ushered in an exciting new era in the study of economics. At a high level, the inception of the (Nash) equilibrium concept allowed, to a large degree, the disentanglement between the study of complex behavioral dynamics and the study of games. Equilibria could be concisely described, independently of the dynamics that gave rise to them, as solutions of algebraic equations. Crucially, their definition was simple, intuitive, analytically tractable in many practical instances of small games, and arguably instructive about real life behavior. The notion of a solution to (general) games, introduced by the work of von Neumann in the special case of zero-sum games [34], would be solidified as a key landmark of economic thought. This mapping from games to their solutions, i.e., the set of equilibria, grounded economic theory on a solid foundation and allowed for a whole new class of questions regarding numerous properties of these sets, including their geometry, computability, and the resulting agent utilities.

Unfortunately, unlike von Neumann's essentially unique behavioral solution to zero-sum games, it became immediately clear that the Nash equilibrium fell short of its role as a universal solution concept in a crucial way: it is non-unique. It is straightforward to find games with a constant number of agents and strategies and uncountably many distinct equilibria with different properties in terms of support sizes, symmetries, efficiency, and practically any other conceivable attribute of interest.1 This raises a rather natural question: how should we analyze games with multiple Nash equilibria? The centrality of the equilibrium selection problem can hardly be overestimated. Indeed, according to Ariel Rubinstein, "No other task may be more significant within game theory. A successful theory of this type may change all of economic theory." Accordingly, a wide range of radically different approaches has been explored by economists and computer scientists alike. Despite their differing points of view, they share a common high level goal: to reduce the number of admissible equilibria and, if possible, effectively pinpoint a single one as the target for analytical inquiry. This way, the multi-valued equilibrium correspondence becomes a simple function and prediction uncertainty vanishes. Although no single approach stands out as providing the definitive answer, each has allowed for significant headway in specific classes of interesting games, and some have sprung forth standalone lines of inquiry. Next, we focus on two approaches that have inspired our work: risk dominance and price of anarchy analysis.

Risk dominance is an equilibrium refinement process that centers around uncertainty about opponent behavior. A Nash equilibrium is considered risk dominant if it has the largest basin of attraction: the more uncertainty agents have about the actions of others, the more likely it is that they will choose the corresponding strategy. The benchmark example is the Stag Hunt game, shown in figure 1(a). In such symmetric 2x2 coordination games a strategy is risk dominant if it is a best response to the uniformly random strategy of the opponent. Although risk dominance [9] was originally introduced as a hypothetical model of the method by which perfectly rational players select their actions, it may also be interpreted [23] as the result of evolutionary processes. Experimental evidence seems supportive of its practical relevance [29]. However, a critical shortcoming of this approach is that it is largely qualitative in nature. An equilibrium whose basin of attraction covers 99.9% of the state space has significantly more predictive power than one with 50.1%. More distressingly, as we move towards classes of games with exponentially or uncountably many equilibria, such arguments become moot, since all equilibria may have basins of attraction of vanishingly small size.

Price of anarchy [16] follows a much more quantitative approach. The point of view here is that of optimization, and the focus is on extremal equilibria. The price of anarchy, defined as the ratio between the social welfare of the worst equilibrium and that of the optimum, tries to capture the loss in efficiency due to the lack of a centralized authority. A plethora of similar concepts, based on normalized ratios, has been defined (e.g., price of stability [2] focuses on best case equilibria). Tight bounds on these quantities have been established for large classes of games [4, 28]. However, these bounds do not necessarily reflect the whole picture. They usually correspond to highly artificial instances. Even in these bad instances, there typically exist sizable gaps between the price of anarchy and the price of stability, allowing for the possibility of significantly tighter analysis of system performance. More to the point, worst case equilibria may be unlikely in themselves, by having a negligible basin of attraction [15].

1. An example of such a game can be found in section 6.1.


Our approach. We do not aim to solve the equilibrium selection problem, but to circumvent it. The high level intuition is as follows: Each agent chooses a randomized strategy uniformly at random. We use this profile of randomized strategies as an initial condition for the replicator, a deterministic evolutionary dynamic. As long as we can prove point-wise convergence, i.e., that given any initial condition the corresponding trajectory has a uniquely defined limit point, the whole system can be viewed as a decentralized algorithm that maps initial conditions to their corresponding equilibria. Each equilibrium has a corresponding basin of attraction, defined as the set of points that converge to it. We assign to each equilibrium a likelihood that is proportional to the volume of its respective basin of attraction.

Our results. In this paper we focus on potential games, which are known to be isomorphic to congestion games [22]. We start by establishing the first, to our knowledge, point-wise convergence result for replicator dynamics in linear congestion games.2 The proof is based on local, information theoretically inspired Lyapunov functions around each equilibrium, and not on the typical global potential functions used in establishing the standard set-wise convergence to equilibrium sets. This result, which is of independent interest, allows us to properly define the notion of average case system performance, where the social welfare of each equilibrium is weighted by the size of its basin.

Next, we distinguish between equilibria whose region of attraction has positive/zero measure. All equilibria with no unstable eigenvectors must satisfy a necessary game theoretic condition known as weak stability [15]. An equilibrium is weakly stable if, given any two randomizing agents, fixing one of the agents to choose one of his strategies with probability one leaves the other agent indifferent between the strategies in his support. Given pointwise convergence of the replicator, all but a zero measure subset of initial conditions converge to weakly stable equilibria. Two key technical hurdles in translating eigenvalue results regarding local stability/instability into global measure theoretic statements about basins of attraction are the possibility of uncountably many equilibria, which excludes naive union bound treatments, and the fact that the replicator in its usual form evolves over a simplex that defines a zero measure set in its native space. We circumvent the case of uncountably many equilibria by producing countable open covers of the equilibrium set via Lindelöf's lemma, producing zero measure statements over each open set in the cover, and then applying the union bound (the proof can be found in the appendix).

Our approach can be thought of as a detailed, quantitative analysis of risk dominance. Specifically, instead of merely arguing that a specific equilibrium has a basin of attraction of size at least 1/2, we aim to explicitly compute the sizes of the basins of attraction of each equilibrium. Thus, a natural starting point is the study of the Stag Hunt game itself. We show that the size of the region of attraction of the risk dominant equilibrium is $\frac{1}{27}(9 + 2\sqrt{3}\pi) \approx 0.7364$. Our analysis builds upon a combination of ideas. First, we apply weak stability to establish that the only equilibria with a non-negligible region of attraction are the pure ones. Next, we construct an algebraic description of the stable and unstable manifolds of the mixed Nash equilibrium by building on information theoretic invariant properties of the replicator system. We solve these systems to provide an exact explicit description of these objects.

Next, we move to a class of two agent coordination games that generalizes Stag Hunt games/dynamics. Although we can still produce implicit parametric descriptions of the stable/unstable manifolds of the interior Nash equilibrium, the computation of explicit parametric closed form solutions appears hopeless, since the resulting systems contain parametric uncertainty in the exponent. Instead, we focus on computing the volume of each basin approximately. We exploit these parametric descriptions to extract geometric properties of these manifolds that remain valid for the whole parameter space. We use this understanding to produce tight coverings of the attracting basins via parametric families of polytopes. We compute both upper and lower bounds for the volume of these attractors. The average price of anarchy of this class of games, where the social welfare of each equilibrium is weighted by the size of its basin, is shown to lie somewhere between 1.15 and 1.21. In contrast, the price of anarchy is unbounded.

Although distinctions between equilibria with zero and non-zero basins of attraction do not allow for as detailed a performance analysis as the one presented in the case of two agent coordination games, one can still apply these results to get significant improvements over price of anarchy analysis. For example, we show that in the case of unweighted n-balls, n-bins games (singleton congestion games) with cost functions equal to the bins' latency, the average price of anarchy both in terms of social cost as well as makespan is equal to one. In contrast, the price of anarchy is $\Omega(\log(n)/\log\log(n))$. Finally, we show that the average price of anarchy of any linear congestion game is strictly less than 5/2.

We conclude our analysis by introducing a replicator inspired discrete-time dynamic. This dynamic is both deterministic and concurrent,3 and we establish set-wise convergence to equilibria for it in linear congestion games. The convergence result here is significantly more technical than that of the replicator, its continuous-time counterpart. To our knowledge, this is the first result of its kind, since prior results on network congestion games are either based on agents moving one at a time or are based on probabilistic arguments [1]. We also perform local stability analysis, which, similarly to the replicator, pinpoints weakly stable equilibria.

2. To our knowledge this is the first result of deterministic pointwise convergence to equilibria in linear congestion games for any concurrent dynamic.
3. Concurrent means that all agents (or even an arbitrary number of agents) get to move simultaneously.

2 Related Work

Pointwise convergence in gradient/gradient-like systems: Pointwise convergence is much stronger than mere convergence to a set. Gradient systems $\dot{x} = -\nabla V(x)$ (or gradient-like systems such as replicator dynamics) have the property of converging to sets of equilibria because of the existence of the Lyapunov-type function V. Łojasiewicz [18] proved an important inequality for analytic (hence $C^\infty$) functions V, a consequence of which is pointwise convergence; roughly, using this inequality one can show that the length of any trajectory is bounded. These results can be extended to so-called gradient-like systems $\dot{x} = f(x)$ [17], where there exists a V such that $\nabla V \cdot \dot{x} \le 0$ (with equality only at fixed points). It can be shown that under the angle assumption ($\nabla V$ and $\dot{x}$ have an angle bounded away from 90 degrees, i.e., $\frac{\nabla V \cdot \dot{x}}{\|\dot{x}\|\,\|\nabla V\|}$ stays bounded away from zero) we have pointwise convergence. However, for the replicator dynamics this is not the case: even though one can see that V (the potential function) is analytic, the angle assumption cannot generally be satisfied (an example is 2 balls, 2 bins), i.e., $\frac{\nabla V \cdot \dot{x}}{\|\dot{x}\|\,\|\nabla V\|} \to 0$ as one approaches a fixed point. In general there are examples (the "Mexican hat") where the set of limit points is, for example, a circle. Losert and Akin [19] proved pointwise convergence of replicator dynamics for a specific population model using relative entropy. Our proof of pointwise convergence in the continuous case generalizes these techniques. We also address how much harder it is to prove similar results in the discrete case.

Learning as a refinement mechanism: A Nash equilibrium is evolutionarily stable if, once it is fixed in a population, natural selection alone is sufficient to prevent alternative (mutant) strategies from invading successfully. Such fixed points are referred to as evolutionarily stable equilibria or evolutionarily stable strategies [20]. A related concept is that of an evolutionarily stable state [21]. This is a definition that arises in mathematical biology and explicitly in the study of the replicator. A population is said to be in an evolutionarily stable state if its genetic composition is restored by selection after a disturbance, provided the disturbance is not too large. A stochastically stable equilibrium [7] is a further refinement of the evolutionarily stable states. An evolutionarily stable state is also stochastically stable if, under vanishing noise, the probability that the population is in the vicinity of the state does not go to zero.

Replicator dynamics and system performance: This paper builds upon recent results that show favorable performance guarantees for the replicator dynamics (and discrete-time variants) in settings of interest. The key reference is [15], where many key ideas, including the fact that replicator dynamics can significantly outperform worst case equilibria, were introduced. In fact, in some games the replicator dynamic can be shown to outperform even best case equilibria by converging to optimal cyclic attractors [13]. Finally, the use of information theoretic arguments as (weak) Lyapunov functions has been explored before (e.g., [13, 14]).

3 Preliminaries
3.1 Congestion Games

Congestion games [27] are non-cooperative games in which the utility of each agent depends only on the agent's strategy and the number of other agents that either choose the same strategy, or some strategy that intersects/overlaps it. Formally, a congestion game is defined by the tuple $(N; E; (S_i)_{i\in N}; (c_e)_{e\in E})$, where N is the set of agents, E is a set of resources (also known as edges or bins or facilities), and each player i has a set $S_i$ of subsets of E ($S_i \subseteq 2^E$) with $|S_i| \ge 2$. Each strategy $s_i \in S_i$ is a set of edges (a path), and $c_e$ is a cost (negative utility) function associated with facility e. We will also use small Greek characters like $\gamma, \delta$ to denote different strategies/paths. For a strategy profile $s = (s_1, s_2, \ldots, s_N)$, the cost of player i is given by $c_i(s) = \sum_{e\in s_i} c_e(\ell_e(s))$, where $\ell_e(s)$ is the number of players using e in s (the load of edge e). Congestion games admit a potential function $\Phi(s) = \sum_{e\in E}\sum_{j=1}^{\ell_e(s)} c_e(j)$, which captures each player's incentive to change his strategy [27]. Specifically, given a strategy profile $s = (s_1, s_2, \ldots, s_N)$ and a strategy $s_i'$ of player i, we have $c_i(s_i', s_{-i}) - c_i(s) = \Phi(s_i', s_{-i}) - \Phi(s)$. As a result, starting from any strategy profile, any sequence of improving moves is bound to terminate. Such stable states $s \in S$, where for each agent i and $s_i' \in S_i$ we have $c_i(s_i', s_{-i}) \ge c_i(s)$, are called Nash equilibria. The set of Nash equilibria corresponds to the set of local optima of $\Phi$. In linear congestion games the latency functions are of the form $c_e(x) = a_e x + b_e$, where $a_e, b_e \ge 0$. Furthermore, the social cost corresponds to the sum of the costs of all the agents, $sc(s) = \sum_i c_i(s)$. The price of anarchy is defined as
$$\mathrm{PoA}(G) = \frac{\max_{\sigma\in NE} \text{Social Cost}(\sigma)}{\min_{\sigma^*\in\times_i S_i} \text{Social Cost}(\sigma^*)}.$$
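To make these definitions concrete, the following minimal Python sketch (the game instance and all names are illustrative, not from the paper) encodes a small linear congestion game and checks the potential identity $c_i(s_i', s_{-i}) - c_i(s) = \Phi(s_i', s_{-i}) - \Phi(s)$ on every unilateral deviation.

```python
import itertools
from collections import Counter

# A tiny linear congestion game: the cost of edge e under load x is a[e]*x + b[e].
a = {"e1": 1.0, "e2": 2.0, "e3": 0.5}
b = {"e1": 0.0, "e2": 1.0, "e3": 0.0}
strategies = [
    [frozenset({"e1", "e2"}), frozenset({"e3"})],  # S_1: two paths
    [frozenset({"e2"}), frozenset({"e1", "e3"})],  # S_2: two paths
]

def loads(profile):
    """l_e(s): the number of players whose strategy contains edge e."""
    return Counter(e for s_i in profile for e in s_i)

def player_cost(i, profile):
    """c_i(s) = sum over e in s_i of c_e(l_e(s))."""
    l = loads(profile)
    return sum(a[e] * l[e] + b[e] for e in profile[i])

def potential(profile):
    """Rosenthal's potential Phi(s) = sum_e sum_{j=1..l_e(s)} c_e(j)."""
    l = loads(profile)
    return sum(a[e] * j + b[e] for e in l for j in range(1, l[e] + 1))

# Verify c_i(s_i', s_-i) - c_i(s) = Phi(s_i', s_-i) - Phi(s) for all deviations.
for profile in itertools.product(*strategies):
    for i in range(len(strategies)):
        for alt in strategies[i]:
            deviated = list(profile)
            deviated[i] = alt
            lhs = player_cost(i, deviated) - player_cost(i, profile)
            rhs = potential(deviated) - potential(profile)
            assert abs(lhs - rhs) < 1e-9
print("potential identity verified on all profiles and deviations")
```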

We will denote $\Delta(S_i) = \{p : \sum_\gamma p_{i\gamma} = 1\}$, $\Delta = \times_i \Delta(S_i)$ and $M = \sum_i |S_i|$. Since $\Delta \subset \mathbb{R}^M$, we sometimes use the continuous function g that is the natural projection of the points $p \in \mathbb{R}^M$ to $\mathbb{R}^{M-N}$, obtained by excluding a specific (the "first") variable for each player (we know that the probabilities must sum up to one for each player). We denote the projection of $\Delta$ by $g(\Delta)$. Hence for a point $p \in \Delta$ we will denote $x = g(p) \in g(\Delta) \subset \mathbb{R}^{M-N}$ (for example $(p_{1,a}, p_{1,b}, p_{1,c}, p_{2,a'}, p_{2,b'}) \mapsto_g (p_{1,b}, p_{1,c}, p_{2,b'})$, where $p_{1,a} + p_{1,b} + p_{1,c} = 1$ and $p_{2,a'} + p_{2,b'} = 1$).

3.2 Average PoA in pointwise convergent systems

Definition 1. Given two manifolds M, N, a continuously differentiable map $f : M \to N$ is called a $C^1$-diffeomorphism if it is a bijection and the inverse map $f^{-1}$ is continuously differentiable as well.

Let $\phi(t, p)$ be the flow of a differential equation $\dot{x} = f(x)$ where $f \in C^1(\mathbb{R}^n, \mathbb{R}^n)$. Some properties of $\phi(t, p)$ are the following:
• $\phi(t, p)$ is $C^1$ and a local diffeomorphism (around fixed points).
• $\phi(t, \phi(s, p)) = \phi(t + s, p)$.

We assume that the social cost is continuous and defined on $g(\Delta)$. Assume that the dynamical system converges pointwise and let $\psi(x) = g(\lim_{t\to\infty}\phi_t(g^{-1}(x)))$, i.e., $\psi$ returns the (projected) limit point, which always exists by assumption, of $p = g^{-1}(x)$ in $g(\Delta)$. We prove the following important lemma:

Lemma 2. $\psi(x)$ is Lebesgue measurable.

Proof. See appendix.

We define a new performance measure that we call Average PoA. Essentially it compares the weighted average of the social cost over all equilibrium points against the optimal solution. The respective weight of each equilibrium is just the measure of the set of points that converge to it. Formally:

Definition 3 (Average PoA). The Average PoA is the following ratio of a Lebesgue integral over the optimal social cost ($\mu$ is the Lebesgue measure in $\mathbb{R}^{M-N}$, sc is continuous):
$$\text{Average PoA} = \frac{\frac{1}{\mu(g(\Delta))}\int_{g(\Delta)} sc \circ \psi \, d\mu}{\min \text{Social Cost}}.$$
The integral above is well-defined since $\psi(x)$ is a Lebesgue measurable function and sc is a continuous function, hence $sc \circ \psi$ is Lebesgue measurable.

4. In the case of games where players want to maximize their utilities, we use the inverse fraction so that Average PoA ≥ 1.
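Closed-form basin volumes are rarely available, so the integral above is naturally estimated by Monte Carlo: sample initial conditions uniformly from $g(\Delta)$, run the (pointwise convergent) dynamic until it settles, and average the social cost of the limits. A minimal sketch follows; `sample_initial`, `advance`, `social_cost` and `opt_cost` are hypothetical placeholders the caller must supply.

```python
def estimate_average_poa(sample_initial, advance, social_cost, opt_cost,
                         n_samples=10000):
    """Monte Carlo estimate of Definition 3.

    Each uniform sample lands in the basin of some equilibrium, so every
    equilibrium is automatically weighted by the volume of its basin.
    """
    total = 0.0
    for _ in range(n_samples):
        p = sample_initial()   # uniform draw from g(Delta)
        p = advance(p)         # follow the dynamic to (near) its limit point
        total += social_cost(p)
    return (total / n_samples) / opt_cost
```

The estimate is only meaningful under the pointwise convergence assumption of this section; without it, `advance` has no well-defined limit point to approximate.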


3.3 Replicator Dynamics
3.3.1 Continuous Replicator Dynamics

The replicator equation [33, 30] is among the basic tools in mathematical ecology, genetics, and the mathematical theory of selection and evolution. In its classic continuous form, it is described by the following differential equation:
$$\frac{dp_i(t)}{dt} = \dot{p}_i = p_i[u_i(p) - \hat{u}(p)], \qquad \hat{u}(p) = \sum_{i=1}^{m} p_i u_i(p), \tag{1}$$

where $p_i$ is the proportion of type i in the population, $p = (p_1, \ldots, p_m)$ is the vector of the distribution of types in the population, $u_i(p)$ is the fitness of type i, and $\hat{u}(p)$ is the average population fitness. The state vector p can also be interpreted as a randomized strategy of an adaptive agent that learns to optimize over its m possible actions, given an online stream of payoff vectors. As a result, it can be employed in any distributed optimization setting. An interior point of the state space is a fixed point of the replicator if and only if it is a fully mixed Nash equilibrium of the game. The interior (as well as the boundary) of the state space $\Delta = \times_i \Delta(S_i)$ is invariant for the replicator. We typically analyze the behavior of the replicator from a generic interior point, since points of the boundary can be captured as interior points of smaller dimensional systems. Summing all this up, our model is captured by the following system:
$$\frac{dp_{i\gamma}}{dt} = p_{i\gamma}\left(\hat{c}_i - c_{i\gamma}\right) \tag{2}$$
for each $i \in N$, $\gamma \in S_i$, where $c_{i\gamma} = \mathbb{E}_{s_{-i}\sim p_{-i}}\, c_i(\gamma, s_{-i})$ and $\hat{c}_i = \sum_\delta p_{i\delta} c_{i\delta}$. The replicator dynamic enjoys numerous desirable properties such as universal consistency (no-regret) [8, 11], connections to the principle of minimum discrimination information (Occam's razor, Bayesian updating), disequilibrium thermodynamics [12, 31], classic models of ecological growth (e.g., the Lotka-Volterra equations [10]), as well as several well studied discrete time learning algorithms (e.g., the Multiplicative Weights algorithm [15, 3]).

Remark: A fixed point of a flow is a point where the vector field is equal to zero. An interesting observation about the replicator is that its fixed points are exactly the set of randomized strategies such that each agent experiences equal costs across all strategies he employs with positive probability. This is a generalization of the notion of Nash equilibrium, since equilibria furthermore require that any strategy that is played with zero probability must have expected cost at least as high as those strategies which are played with positive probability.
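As an illustration of system (2), the sketch below integrates the replicator with a crude Euler scheme on the 2-balls/2-bins instance mentioned later in the paper; the step size, horizon, and helper names are ours, not the paper's.

```python
import numpy as np

def expected_costs_2x2(i, p):
    """c_{i,gamma} for 2 balls / 2 bins with c_e(x) = x: the expected cost of
    bin gamma for player i is 1 + (probability the other player picks gamma)."""
    other = p[1 - i]
    return np.array([1.0 + other[0], 1.0 + other[1]])

def replicator_step(p, dt=0.01):
    """One Euler step of dp_{i,gamma}/dt = p_{i,gamma} (c_hat_i - c_{i,gamma})."""
    new = []
    for i, p_i in enumerate(p):
        c = expected_costs_2x2(i, p)
        new.append(p_i + dt * p_i * ((p_i @ c) - c))
    return new

p = [np.array([0.9, 0.1]), np.array([0.6, 0.4])]  # interior initial condition
for _ in range(5000):
    p = replicator_step(p)
print(p)  # approaches the pure equilibrium where the players split the bins
```

Note that the Euler step preserves the simplex exactly, since $\sum_\gamma p_{i\gamma}(\hat{c}_i - c_{i\gamma}) = 0$.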

3.3.2 Discrete Replicator Dynamics

The model of discrete replicator dynamics which is commonly used when we deal with utilities is the following:
$$p_{i\gamma}(t+1) = p_{i\gamma}(t)\,\frac{u_{i\gamma}(t)}{\hat{u}_i(t)}.$$
Observe that this dynamic has the same fixed points as the continuous replicator dynamic. We could not find in the literature a discrete replicator dynamics model for the case of congestion games, where we have cost functions. The model we consider is slightly different:
$$p_{i\gamma}(t+1) = p_{i\gamma}(t)\,\frac{x - c_{i\gamma}(t)}{x - \hat{c}_i(t)},$$
where we take x to be the following constant:
$$x = \sum_e c_e(N).$$
x corresponds to the cost when all players use all edges with probability one, and we choose it in order to ensure that both numerator and denominator are positive. Observe that $\Delta$ is invariant under the discrete dynamics defined above: if $p_{i\gamma} = 0$ it remains zero, if it is positive it remains positive (both numerator and denominator are positive), and $\sum_\gamma p_{i\gamma} = 1$. The fixed points are the same as in the continuous replicator setting, i.e., if $p_{i\gamma} > 0$ then $c_{i\gamma} = \hat{c}_i$.
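In code, the discrete update is a one-liner per player. The sketch below reuses the 2-balls/2-bins instance from before, for which $x = \sum_e c_e(N) = 2 + 2 = 4$ (helper names are ours):

```python
import numpy as np

def expected_costs_2x2(i, p):
    other = p[1 - i]
    return np.array([1.0 + other[0], 1.0 + other[1]])

def discrete_replicator_step(p, x=4.0):
    """p_{i,g}(t+1) = p_{i,g}(t) (x - c_{i,g}(t)) / (x - c_hat_i(t)).
    With x = 4 both numerator and denominator stay positive here."""
    new = []
    for i, p_i in enumerate(p):
        c = expected_costs_2x2(i, p)
        new.append(p_i * (x - c) / (x - p_i @ c))
    return new

p = [np.array([0.9, 0.1]), np.array([0.6, 0.4])]
for _ in range(300):
    p = discrete_replicator_step(p)
print(p)  # again approaches the pure equilibrium splitting the two bins
```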

4 Continuous case
4.1 Pointwise convergence

Theorem 4. For any linear congestion game, continuous replicator dynamics converges to a fixed point (pointwise convergence).

Proof. First of all, we observe that
$$\Psi(p) = \sum_i \hat{c}_i + \sum_{i,\gamma}\sum_{e\in\gamma}(b_e + a_e)\, p_{i\gamma}$$
is a Lyapunov function for our game, since
$$\frac{\partial\Psi}{\partial p_{i\gamma}} = c_{i\gamma} + \sum_{j\ne i}\sum_{\gamma'} p_{j\gamma'}\frac{\partial c_{j\gamma'}}{\partial p_{i\gamma}} + \sum_{e\in\gamma}(b_e + a_e) = c_{i\gamma} + \underbrace{\sum_{j\ne i}\sum_{\gamma'}\sum_{e\in\gamma\cap\gamma'} a_e\, p_{j\gamma'} + \sum_{e\in\gamma}(b_e + a_e)}_{c_{i\gamma}} = 2c_{i\gamma},$$
and hence
$$\frac{d\Psi}{dt} = \sum_{i,\gamma}\frac{\partial\Psi}{\partial p_{i\gamma}}\frac{dp_{i\gamma}}{dt} = -\sum_{i,\gamma,\gamma'} p_{i\gamma}\, p_{i\gamma'}(c_{i\gamma} - c_{i\gamma'})^2 \le 0,$$
with equality at fixed points. Hence (as in [15]) we have convergence to equilibria sets (compact connected sets consisting of fixed points). Note that this does not suffice for pointwise convergence; to be exact, it suffices only in the case where the equilibria are isolated, which is not the case for linear congestion games (see lemma 14).
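The Lyapunov property can be sanity-checked numerically: along any (discretized) trajectory, Ψ should be non-increasing. A small sketch on the 2-balls/2-bins instance, where $a_e = 1$, $b_e = 0$ and the second summand of Ψ reduces to the total probability mass, 2 (helper names are ours):

```python
import numpy as np

def expected_costs_2x2(i, p):
    other = p[1 - i]
    return np.array([1.0 + other[0], 1.0 + other[1]])

def psi(p):
    """Psi(p) = sum_i c_hat_i + sum_{i,gamma} sum_{e in gamma}(b_e + a_e) p_{i,gamma}.
    Here a_e = 1, b_e = 0 and each strategy is a single bin, so the second
    summand is the constant 2."""
    return sum(p[i] @ expected_costs_2x2(i, p) for i in range(2)) + 2.0

def replicator_step(p, dt=0.01):
    out = []
    for i, p_i in enumerate(p):
        c = expected_costs_2x2(i, p)
        out.append(p_i + dt * p_i * ((p_i @ c) - c))
    return out

p = [np.array([0.9, 0.1]), np.array([0.6, 0.4])]
prev = psi(p)
for _ in range(3000):
    p = replicator_step(p)
    cur = psi(p)
    assert cur <= prev + 1e-9  # Psi never increases along the trajectory
    prev = cur
print("Psi decreased monotonically to", prev)
```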


Let q be a limit point of the trajectory p(t), where p(t) is in the interior of $\Delta$ for all $t \in \mathbb{R}$ (since we started from an initial condition inside $\Delta$); then we have that $\Psi(q) < \Psi(p(t))$. We define the relative entropy
$$I(p) = -\sum_i \sum_{\gamma:\, q_{i\gamma} > 0} q_{i\gamma}\ln(p_{i\gamma}/q_{i\gamma}) \ge 0 \quad \text{(Jensen's ineq.)}$$
and $I(p) = 0$ iff $p = q$. We get that
$$\begin{aligned}
\frac{dI}{dt} &= -\sum_i \sum_{\gamma:\, q_{i\gamma} > 0} q_{i\gamma}(\hat{c}_i - c_{i\gamma}) = -\sum_i \hat{c}_i + \sum_{i,\gamma} q_{i\gamma}\, c_{i\gamma} \\
&= -\sum_i \hat{c}_i + \sum_{i,\gamma}\sum_{e\in\gamma}(b_e + a_e)\, q_{i\gamma} + \sum_{i,\gamma}\sum_{j\ne i}\sum_{\gamma'}\sum_{e\in\gamma\cap\gamma'} a_e\, q_{i\gamma}\, p_{j\gamma'} \\
&= -\sum_i \hat{c}_i + \sum_{i,\gamma}\sum_{e\in\gamma}(b_e + a_e)\, q_{i\gamma} + \sum_{i,\gamma} p_{i\gamma}\, d_{i\gamma} - \sum_{i,\gamma}\sum_{e\in\gamma}(b_e + a_e)\, p_{i\gamma} \\
&= -\sum_i \hat{c}_i + \sum_{i,\gamma}\sum_{e\in\gamma}(b_e + a_e)\, q_{i\gamma} - \sum_{i,\gamma}\sum_{e\in\gamma}(b_e + a_e)\, p_{i\gamma} + \sum_i \hat{d}_i - \sum_{i,\gamma} p_{i\gamma}(\hat{d}_i - d_{i\gamma}) \\
&= \Psi(q) - \Psi(p) - \sum_{i,\gamma} p_{i\gamma}(\hat{d}_i - d_{i\gamma}),
\end{aligned}$$
where the third equality swaps the roles of $(i,\gamma)$ and $(j,\gamma')$ in the quadruple sum and uses the linearity of the cost functions,

and where $d_{i\gamma}$, $\hat{d}_i$ correspond to the cost of player i if he chooses strategy $\gamma$ and to his expected cost, respectively, at the point q. The rest of the proof follows in a similar way to Losert and Akin. We break the term $\sum_{i,\gamma} p_{i\gamma}(\hat{d}_i - d_{i\gamma})$ into positive and negative terms (zero terms do not matter), i.e.,
$$\sum_{i,\gamma} p_{i\gamma}(\hat{d}_i - d_{i\gamma}) = \sum_{i,\gamma:\, \hat{d}_i > d_{i\gamma}} p_{i\gamma}(\hat{d}_i - d_{i\gamma}) + \sum_{i,\gamma:\, \hat{d}_i < d_{i\gamma}} p_{i\gamma}(\hat{d}_i - d_{i\gamma}),$$
and we choose $\epsilon > 0$ so that the function
$$Z(p) = I(p) + 2\sum_{i,\gamma:\, \hat{d}_i < d_{i\gamma}} p_{i\gamma}$$
has $\frac{dZ}{dt} < 0$ for $|p - q| < \epsilon$ and $\Psi(q) < \Psi(p)$. Indeed, by continuity, for a suitable $\epsilon > 0$ and $|p - q| < \epsilon$, we have that $\hat{c}_i - c_{i\gamma} \le \frac{3}{4}(\hat{d}_i - d_{i\gamma})$ for the terms with $\hat{d}_i - d_{i\gamma} < 0$. Therefore
$$\frac{dZ}{dt} = \Psi(q) - \Psi(p) - \sum_{i,\gamma:\, \hat{d}_i > d_{i\gamma}} p_{i\gamma}(\hat{d}_i - d_{i\gamma}) - \sum_{i,\gamma:\, \hat{d}_i < d_{i\gamma}} p_{i\gamma}(\hat{d}_i - d_{i\gamma}) + 2\sum_{i,\gamma:\, \hat{d}_i < d_{i\gamma}} p_{i\gamma}(\hat{c}_i - c_{i\gamma})$$
$$\le \Psi(q) - \Psi(p) - \sum_{i,\gamma:\, \hat{d}_i > d_{i\gamma}} p_{i\gamma}(\hat{d}_i - d_{i\gamma}) + \underbrace{\frac{1}{2}\sum_{i,\gamma:\, \hat{d}_i < d_{i\gamma}} p_{i\gamma}(\hat{d}_i - d_{i\gamma})}_{\le 0} \le \Psi(q) - \Psi(p) < 0.$$

Since $E_i \subset E$, we have that $E_i$ has measure zero; hence we can find a countable cover of open balls $C_1, C_2, \ldots$ for $E_i$, namely $E_i \subset \cup_{j=1}^{\infty} C_j$, so that $C_j \subset B_i$ for all j and also $\sum_{j=1}^{\infty}\mu(C_j) < \frac{\epsilon}{K_i^n}$. Since $E_i \subset \cup_{j=1}^{\infty} C_j$, we get that $g(E_i) \subset \cup_{j=1}^{\infty} g(C_j)$, namely $g(C_1), g(C_2), \ldots$ cover $g(E_i)$, and also $g(C_j) \subset g(B_i)$ for all j. Assuming that the ball $C_j \equiv B(x, r)$ (center x and radius r), it is clear that $g(C_j) \subset B(g(x), K_i r)$ (g maps the center x to g(x) and the radius r to $K_i r$ because of the Lipschitz assumption). But $\mu(B(g(x), K_i r)) = K_i^n\,\mu(B(x, r)) = K_i^n\,\mu(C_j)$, therefore $\mu(g(C_j)) \le K_i^n\,\mu(C_j)$, and so we conclude that
$$\mu(g(E_i)) \le \sum_{j=1}^{\infty}\mu(g(C_j)) \le K_i^n\sum_{j=1}^{\infty}\mu(C_j) < \epsilon.$$
Since $\epsilon$ was arbitrary, it follows that $\mu(g(E_i)) = 0$. To finish the proof, observe that $g(E) = \cup_{i=1}^{\infty} g(E_i)$, therefore $\mu(g(E)) \le \sum_{i=1}^{\infty}\mu(g(E_i)) = 0$.

8. Here we used the fact that the eigenvalues of $e^A$ with absolute value less than one, equal to one, and greater than one correspond to eigenvalues of A with negative real part, zero real part, and positive real part, respectively.
9. We used "silently" that $\mu(g(\Delta)) = \mu(\operatorname{int} g(\Delta))$.


A.1.3 Proof of theorem 7

Proof. Observe that $x = N\sum_e a_e + \sum_e b_e = \sum_{j\ne i}\sum_{\delta} p_{j\delta}\sum_e a_e + \sum_e(a_e + b_e)$, and hence
$$x - c_{i\gamma} = \sum_{j\ne i}\sum_{\delta} p_{j\delta}\sum_{e\notin\gamma\cap\delta} a_e + \sum_{e\notin\gamma}(a_e + b_e).$$
Therefore we get
$$\sum_i (x - \hat{c}_i) = \sum_{i,\gamma} p_{i\gamma}(x - c_{i\gamma}) = 2Q - \sum_{i,\gamma} p_{i\gamma}\sum_{e\notin\gamma}(a_e + b_e),$$
where Q is as defined below. Thus $2Q + \Psi = Nx + N\sum_e(a_e + b_e)$, which is constant. It suffices to prove that $Q(p(t+1)) \ge Q(p(t))$ with $Q(p(t+1)) = Q(p(t))$ only at fixed points, i.e., that Q is increasing, with equality at fixed points.

$$Q(p(t)) = \frac{1}{2}\sum_{i,\gamma}\sum_{j\ne i}\sum_{\delta}\sum_{e\notin\gamma\cap\delta} a_e\, p_{i\gamma}\, p_{j\delta} + \sum_{i,\gamma}\sum_{e\notin\gamma}(a_e + b_e)\, p_{i\gamma}$$
Writing each summand as a product of a 1/3-power and a 2/3-power,
$$Q(p(t)) = \sum_{i,\gamma}\sum_{j\ne i}\sum_{\delta}\sum_{e\notin\gamma\cap\delta}\left(\frac{1}{2} a_e p_{i\gamma} p_{j\delta}\,\frac{x - c_{i\gamma}}{x - \hat{c}_i}\frac{x - c_{j\delta}}{x - \hat{c}_j}\right)^{1/3}\left(\frac{1}{2} a_e p_{i\gamma} p_{j\delta}\sqrt{\frac{x - \hat{c}_i}{x - c_{i\gamma}}\frac{x - \hat{c}_j}{x - c_{j\delta}}}\right)^{2/3} + \sum_{i,\gamma}\left(\sum_{e\notin\gamma}(a_e + b_e)\, p_{i\gamma}\,\frac{x - c_{i\gamma}}{x - \hat{c}_i}\right)^{1/3}\left(\sum_{e\notin\gamma}(a_e + b_e)\, p_{i\gamma}\sqrt{\frac{x - \hat{c}_i}{x - c_{i\gamma}}}\right)^{2/3}$$
$$\le \underbrace{\left(\sum_{i,\gamma}\sum_{j\ne i}\sum_{\delta}\sum_{e\notin\gamma\cap\delta}\frac{1}{2} a_e p_{i\gamma} p_{j\delta}\,\frac{x - c_{i\gamma}}{x - \hat{c}_i}\frac{x - c_{j\delta}}{x - \hat{c}_j} + \sum_{i,\gamma}\sum_{e\notin\gamma}(a_e + b_e)\, p_{i\gamma}\,\frac{x - c_{i\gamma}}{x - \hat{c}_i}\right)^{1/3}}_{Q(p(t+1))^{1/3}}\times\left(\sum_{i,\gamma}\sum_{j\ne i}\sum_{\delta}\sum_{e\notin\gamma\cap\delta}\frac{1}{2} a_e p_{i\gamma} p_{j\delta}\sqrt{\frac{x - \hat{c}_i}{x - c_{i\gamma}}\frac{x - \hat{c}_j}{x - c_{j\delta}}} + \sum_{i,\gamma}\sum_{e\notin\gamma}(a_e + b_e)\, p_{i\gamma}\sqrt{\frac{x - \hat{c}_i}{x - c_{i\gamma}}}\right)^{2/3},$$
where we used Hölder's inequality in the following form:
$$\sum_i x_i^{1/3} y_i^{2/3} \le \left(\sum_i x_i\right)^{1/3}\left(\sum_i y_i\right)^{2/3}.$$
We will show that the sum inside the second factor of the last product is at most $Q(p(t))$. Using the fact that $\sqrt{ab} \le \frac{a+b}{2}$ (with equality iff $a = b$) we get that
$$\begin{aligned}
&\sum_{i,\gamma}\sum_{j\ne i}\sum_{\delta}\sum_{e\notin\gamma\cap\delta}\frac{1}{2} a_e p_{i\gamma} p_{j\delta}\sqrt{\frac{x - \hat{c}_i}{x - c_{i\gamma}}\frac{x - \hat{c}_j}{x - c_{j\delta}}} + \sum_{i,\gamma}\sum_{e\notin\gamma}(a_e + b_e)\, p_{i\gamma}\sqrt{\frac{x - \hat{c}_i}{x - c_{i\gamma}}} \\
&\le \frac{1}{2}\sum_{i,\gamma}\sum_{j\ne i}\sum_{\delta}\sum_{e\notin\gamma\cap\delta}\frac{1}{2} a_e p_{i\gamma} p_{j\delta}\left(\frac{x - \hat{c}_i}{x - c_{i\gamma}} + \frac{x - \hat{c}_j}{x - c_{j\delta}}\right) + \frac{1}{2}\sum_{i,\gamma}\sum_{e\notin\gamma}(a_e + b_e)\, p_{i\gamma}\left(\frac{x - \hat{c}_i}{x - c_{i\gamma}} + 1\right) \\
&= \frac{1}{2}\sum_{i,\gamma} p_{i\gamma}\,\frac{x - \hat{c}_i}{x - c_{i\gamma}}\left(\sum_{j\ne i}\sum_{\delta} p_{j\delta}\sum_{e\notin\gamma\cap\delta} a_e + \sum_{e\notin\gamma}(a_e + b_e)\right) + \frac{1}{2}\sum_{i,\gamma} p_{i\gamma}\sum_{e\notin\gamma}(a_e + b_e) \\
&= \frac{1}{2}\sum_{i,\gamma} p_{i\gamma}\left[(x - \hat{c}_i) + \sum_{e\notin\gamma}(a_e + b_e)\right] \qquad\text{(since the parenthesis equals } x - c_{i\gamma}\text{)} \\
&= \frac{1}{2}\sum_i (x - \hat{c}_i) + \frac{1}{2}\sum_{i,\gamma} p_{i\gamma}\sum_{e\notin\gamma}(a_e + b_e) = Q(p(t)).
\end{aligned}$$
The proof is complete since $Q(p(t)) \le Q(p(t+1))^{1/3} Q(p(t))^{2/3}$, i.e., $Q(p(t)) \le Q(p(t+1))$. Equality holds iff $p_{i\gamma} = 0$ or $\hat{c}_i = c_{i\gamma}$ for all $i, \gamma$ (namely, at fixed points).
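Both facts established in this proof, that $2Q + \Psi$ is conserved and that Q is non-decreasing, can be sanity-checked numerically. A minimal sketch for the 2-balls/2-bins instance ($a_e = 1$, $b_e = 0$), for which Q and Ψ reduce to the closed forms noted in the comments (the instance and helper names are ours, not the paper's):

```python
import numpy as np

def expected_costs_2x2(i, p):
    other = p[1 - i]
    return np.array([1.0 + other[0], 1.0 + other[1]])

def discrete_step(p, x=4.0):
    return [p_i * (x - expected_costs_2x2(i, p)) / (x - p_i @ expected_costs_2x2(i, p))
            for i, p_i in enumerate(p)]

# For 2 balls / 2 bins (a_e = 1, b_e = 0) the definitions specialize to
#   Q(p)   = 4 - p1 . p2   and   Psi(p) = 4 + 2 (p1 . p2),
# so 2Q + Psi = 12 = N x + N sum_e (a_e + b_e), as the proof asserts.
Q = lambda p: 4.0 - p[0] @ p[1]
Psi = lambda p: 4.0 + 2.0 * (p[0] @ p[1])

p = [np.array([0.9, 0.1]), np.array([0.6, 0.4])]
for _ in range(200):
    q_before = Q(p)
    p = discrete_step(p)
    assert Q(p) >= q_before - 1e-12              # Q is non-decreasing
    assert abs(2 * Q(p) + Psi(p) - 12.0) < 1e-9  # 2Q + Psi stays constant
print("verified: Q monotone and 2Q + Psi = 12 along the orbit")
```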

A.1.4 Equations of the $J^q$ at section 5.1

$$J^q_{(i,\gamma),(i,\gamma)} = \frac{x - c_{i\gamma}}{x - \hat{c}_i} + q_{i\gamma}\frac{(x - c_{i\gamma})\, c_{i\gamma}}{(x - \hat{c}_i)^2} - q_{i\gamma}\frac{x - c_{i\gamma}}{(x - \hat{c}_i)^2}\, c_{i,h_q(i)}$$
$$J^q_{(i,\gamma),(i,\delta)} = q_{i\gamma}\frac{x - c_{i\gamma}}{(x - \hat{c}_i)^2}\, c_{i\delta} - q_{i\gamma}\frac{x - c_{i\gamma}}{(x - \hat{c}_i)^2}\, c_{i,h_q(i)}$$
$$J^q_{(i,\gamma),(j,\delta)} = q_{i\gamma}\frac{(\hat{c}_i - x)\sum_{e\in\gamma\cap\delta} a_e - (c_{i\gamma} - x)\sum_{\gamma'} q_{i\gamma'}\sum_{e\in\gamma'\cap\delta} a_e}{(x - \hat{c}_i)^2} - q_{i\gamma}\frac{(\hat{c}_i - x)\sum_{e\in\gamma\cap h_q(j)} a_e - (c_{i\gamma} - x)\sum_{\gamma'} q_{i\gamma'}\sum_{e\in\gamma'\cap h_q(j)} a_e}{(x - \hat{c}_i)^2}$$

A.1.5 Proof of theorem 11

Proof. Since $J^q_{i\gamma,i\delta} = 0$ for $\gamma \ne \delta$ and $J^q_{i\gamma,i\gamma} = 1$, we get that
$$\operatorname{tr}\big((J^q)^2\big) = t + \sum_i \sum_{j\ne i}\sum_{\gamma\in S_i}\sum_{\delta\in S_j} J^q_{i\gamma,j\delta}\, J^q_{j\delta,i\gamma}.$$
We consider the following cases:
• Let $i < j$, $\gamma < \gamma'$ with $\gamma, \gamma' \ne h_q(i)$, and $\delta < \delta'$ with $\delta, \delta' \ne h_q(j)$, and examine the term $\frac{1}{(x - \hat{c}_i)(x - \hat{c}_j)}\, p_{i\gamma}\, p_{i\gamma'}\, p_{j\delta}\, p_{j\delta'}$; in the sum it appears with coefficient
$$[M^{\gamma',\gamma,\delta,h_q(j)}]\times[M^{\gamma,h_q(i),\delta',\delta}] + [M^{\gamma,\gamma',\delta,h_q(j)}]\times[M^{\gamma',h_q(i),\delta',\delta}] + [M^{\gamma',\gamma,\delta',h_q(j)}]\times[M^{\gamma,h_q(i),\delta,\delta'}] + [M^{\gamma,\gamma',\delta',h_q(j)}]\times[M^{\gamma',h_q(i),\delta,\delta'}] = (M^{\gamma,\gamma',\delta,\delta'})^2.$$
• Let $i < j$, $\gamma \ne h_q(i)$ and $\delta \ne h_q(j)$. The term $\frac{1}{(x - \hat{c}_i)(x - \hat{c}_j)}\, p_{i\gamma}\, p_{i,h_q(i)}\, p_{j\delta}\, p_{j,h_q(j)}$ appears in the sum with coefficient
$$[M^{h_q(i),\gamma,\delta,h_q(j)}]\times[M^{\gamma,h_q(i),h_q(j),\delta}] = (M^{\gamma,h_q(i),\delta,h_q(j)})^2.$$
• Let $\gamma < \gamma'$ with $\gamma, \gamma' \ne h_q(i)$ and $\delta \ne h_q(j)$. The term $\frac{1}{(x - \hat{c}_i)(x - \hat{c}_j)}\, p_{i\gamma}\, p_{i\gamma'}\, p_{j\delta}\, p_{j,h_q(j)}$ appears with coefficient
$$2[M^{\gamma',\gamma,\delta,h_q(j)}]\times[M^{\gamma,h_q(i),h_q(j),\delta}] + 2[M^{\gamma,\gamma',\delta,h_q(j)}]\times[M^{\gamma',h_q(i),h_q(j),\delta}] = 2(M^{\gamma,\gamma',\delta,h_q(j)})^2.$$

Corollary 23. Every stable fixed point is a weakly stable Nash equilibrium (discrete case).

Proof. If the trace of $(J^q)^2$ (a matrix of size $t \times t$) is larger than t, then there exists an eigenvalue of absolute value greater than one. Hence, for a stable fixed point we must have $M^{\gamma,\gamma',\delta,\delta'} = 0$. The rest follows from the same argument as [15], theorem 3.8.

A.1.6 Proof of theorem 15

Proof. Since Stag Hunt is payoff equivalent to a coordination game and has a fully mixed Nash equilibrium,
$$\frac{d}{dt}\left[\tfrac{2}{3}\ln(\phi_{1s}(p,t)) + \tfrac{1}{3}\ln(\phi_{1h}(p,t)) - \tfrac{2}{3}\ln(\phi_{2s}(p,t)) - \tfrac{1}{3}\ln(\phi_{2h}(p,t))\right] = 0,$$
where $\phi_{i\gamma}(p,t)$ corresponds to the probability that agent i assigns to strategy $\gamma$ at time t given initial condition p (from corollary 27). We will use this invariant to identify the stable and unstable manifolds corresponding to the interior Nash equilibrium q. Given any point p of the stable manifold of q, we have by definition that $\lim_{t\to\infty}\phi(p,t) = q$. Similarly, for the unstable manifold we have that $\lim_{t\to -\infty}\phi(p,t) = q$. The time-invariant property implies that for all such points (belonging to the stable or unstable manifold),
$$\tfrac{2}{3}\ln(p_{1s}) + \tfrac{1}{3}\ln(1 - p_{1s}) - \tfrac{2}{3}\ln(p_{2s}) - \tfrac{1}{3}\ln(1 - p_{2s}) = \tfrac{2}{3}\ln(q_{1s}) + \tfrac{1}{3}\ln(1 - q_{1s}) - \tfrac{2}{3}\ln(q_{2s}) - \tfrac{1}{3}\ln(1 - q_{2s}) = 0,$$
since the fully mixed Nash equilibrium is symmetric. This condition is equivalent to $p_{1s}^2(1 - p_{1s}) = p_{2s}^2(1 - p_{2s})$, where $0 < p_{1s}, p_{2s} < 1$. It is straightforward to verify that this algebraic equation is satisfied by the following two distinct solutions: the diagonal line $p_{2s} = p_{1s}$, and
$$p_{2s} = \tfrac{1}{2}\left(1 - p_{1s} + \sqrt{1 + 2p_{1s} - 3p_{1s}^2}\right).$$
Below, we show that these manifolds indeed correspond to the stable and unstable manifolds of the mixed Nash equilibrium, by showing that the Nash equilibrium satisfies those equations and by establishing that the vector field is tangent everywhere along them. The case of the diagonal is trivial and follows from the symmetric nature of the game. We verify the claims about $p_{2s} = \frac{1}{2}(1 - p_{1s} + \sqrt{1 + 2p_{1s} - 3p_{1s}^2})$. Indeed, the mixed equilibrium point, in which $p_{1s} = p_{2s} = 2/3$, satisfies the above equation. We establish that the vector field is tangent to this manifold by showing in lemma 24 that
$$\frac{\partial p_{2s}}{\partial p_{1s}} = \frac{\zeta_{2s}}{\zeta_{1s}} = \frac{p_{2s}\big(u_2(s) - (p_{2s}u_2(s) + (1 - p_{2s})u_2(h))\big)}{p_{1s}\big(u_1(s) - (p_{1s}u_1(s) + (1 - p_{1s})u_1(h))\big)}.$$
Finally, this manifold is indeed attracting to the equilibrium. Since the function $p_{2s} = y(p_{1s}) = \frac{1}{2}(1 - p_{1s} + \sqrt{1 + 2p_{1s} - 3p_{1s}^2})$ is a strictly decreasing function of $p_{1s}$ in [0,1] and satisfies $y(2/3) = 2/3$, its graph is contained in the subspace $\big(0 < p_{1s} < 2/3 \cap 2/3 < p_{2s} < 1\big) \cup \big(2/3 < p_{1s} < 1 \cap 0 < p_{2s} < 2/3\big)$. In each of these subsets, the replicator vector field coordinates have fixed signs that "push" $p_{1s}, p_{2s}$ towards their respective equilibrium values.

The unstable manifold partitions the set $0 < p_{1s}, p_{2s} < 1$ into two subsets, each of which is flow invariant, since the unstable manifold itself is flow invariant. Our convergence analysis for the generalized replicator flow implies that in each subset all but a measure zero set of initial conditions must converge to its respective pure equilibrium. The size of the lower region of attraction is equal to the following definite integral:
$$\int_0^1 \tfrac{1}{2}\left(1 - p_{1s} + \sqrt{1 + 2p_{1s} - 3p_{1s}^2}\right)dp_{1s} = \frac{1}{2}\left[p_{1s} - \frac{p_{1s}^2}{2} + \left(-\frac{1}{6} + \frac{p_{1s}}{2}\right)\sqrt{1 + 2p_{1s} - 3p_{1s}^2} - \frac{2\arcsin\left(\frac{1}{2}(1 - 3p_{1s})\right)}{3\sqrt{3}}\right]_0^1 = \frac{1}{27}\left(9 + 2\sqrt{3}\pi\right) \approx 0.7364,$$
and the theorem follows.

We conclude by providing the proof of the following technical lemma.

Lemma 24. For any $0 < p_{1s}, p_{2s} < 1$ with $p_{2s} = \frac{1}{2}(1 - p_{1s} + \sqrt{1 + 2p_{1s} - 3p_{1s}^2})$, we have that
$$\frac{\partial p_{2s}}{\partial p_{1s}} = \frac{\zeta_{2s}}{\zeta_{1s}} = \frac{p_{2s}\big(u_2(s) - (p_{2s}u_2(s) + (1 - p_{2s})u_2(h))\big)}{p_{1s}\big(u_1(s) - (p_{1s}u_1(s) + (1 - p_{1s})u_1(h))\big)}.$$

Proof. By substitution of the Stag Hunt game utilities, we have that
$$\frac{\zeta_{2s}}{\zeta_{1s}} = \frac{p_{2s}\big(u_2(s) - (p_{2s}u_2(s) + (1 - p_{2s})u_2(h))\big)}{p_{1s}\big(u_1(s) - (p_{1s}u_1(s) + (1 - p_{1s})u_1(h))\big)} = \frac{p_{2s}(1 - p_{2s})(3p_{1s} - 2)}{p_{1s}(1 - p_{1s})(3p_{2s} - 2)}. \tag{7}$$
However, $p_{2s}(1 - p_{2s}) = \frac{1}{2}p_{1s}\big(p_{1s} - 1 + \sqrt{1 + 2p_{1s} - 3p_{1s}^2}\big)$. Combining this with (7),
$$\frac{\zeta_{2s}}{\zeta_{1s}} = \frac{1}{2}\frac{\big(p_{1s} - 1 + \sqrt{1 + 2p_{1s} - 3p_{1s}^2}\big)(3p_{1s} - 2)}{(1 - p_{1s})(3p_{2s} - 2)} = \frac{1}{2}\frac{\big(\sqrt{1 + 3p_{1s}} - \sqrt{1 - p_{1s}}\big)(3p_{1s} - 2)}{\sqrt{1 - p_{1s}}\,(3p_{2s} - 2)}. \tag{8}$$
Similarly, we have that $3p_{2s} - 2 = \frac{1}{2}\sqrt{1 + 3p_{1s}}\,\big(3\sqrt{1 - p_{1s}} - \sqrt{1 + 3p_{1s}}\big)$. By multiplying and dividing equation (8) with $\big(\sqrt{1 + 3p_{1s}} + 3\sqrt{1 - p_{1s}}\big)$ we get
$$\frac{\zeta_{2s}}{\zeta_{1s}} = \frac{1}{2}\frac{\big(\sqrt{1 + 3p_{1s}} + 3\sqrt{1 - p_{1s}}\big)\big(\sqrt{1 + 3p_{1s}} - \sqrt{1 - p_{1s}}\big)(3p_{1s} - 2)}{2\sqrt{1 - p_{1s}}\,\sqrt{1 + 3p_{1s}}\,(2 - 3p_{1s})} = -\frac{1}{4}\frac{\big(\sqrt{1 + 3p_{1s}} + 3\sqrt{1 - p_{1s}}\big)\big(\sqrt{1 + 3p_{1s}} - \sqrt{1 - p_{1s}}\big)}{\sqrt{1 + 2p_{1s} - 3p_{1s}^2}}$$
$$= \frac{1}{2}\left(-1 + \frac{1 - 3p_{1s}}{\sqrt{1 + 2p_{1s} - 3p_{1s}^2}}\right) = \frac{\partial\left(\frac{1}{2}\big(1 - p_{1s} + \sqrt{1 + 2p_{1s} - 3p_{1s}^2}\big)\right)}{\partial p_{1s}} = \frac{\partial p_{2s}}{\partial p_{1s}}.$$
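The closed form of the definite integral above is easy to double-check numerically, e.g. with scipy:

```python
import numpy as np
from scipy.integrate import quad

# Area under the unstable-manifold curve y(p) = (1 - p + sqrt(1 + 2p - 3p^2)) / 2.
val, _ = quad(lambda p: 0.5 * (1 - p + np.sqrt(1 + 2 * p - 3 * p * p)), 0.0, 1.0)
closed_form = (9 + 2 * np.sqrt(3) * np.pi) / 27
print(val, closed_form)  # both are approximately 0.7364
assert abs(val - closed_form) < 1e-8
```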

A.1.7 Proof of theorem 16

We denote by $\psi$, $\zeta$ the projected flow and vector field, respectively.

Lemma 25. All but a zero measure set of initial conditions in the polytope $(P_{Hare})$:
$$p_{2s} \le -wp_{1s} + w, \qquad p_{2s} \le -\tfrac{1}{w}p_{1s} + 1, \qquad 0 \le p_{1s}, p_{2s} \le 1$$
converges to the (Hare, Hare) equilibrium.10 All but a zero measure set of initial conditions in the polytope $(P_{Stag})$:
$$p_{2s} \ge -p_{1s} + \tfrac{2w}{w+1}, \qquad 0 \le p_{1s}, p_{2s} \le 1$$
converges to the (Stag, Stag) equilibrium.

10. This corresponds to the risk dominant equilibrium (Hare, Hare).

[Figure 2: Vector field of replicator dynamics in Stag Hunt.]

Proof. First, we will prove the claimed property for the polytope $(P_{Hare})$. Since the game is symmetric, the replicator dynamics are similarly symmetric, with $p_{2s} = p_{1s}$ the axis of symmetry. Therefore it suffices to prove the property for the polytope $P'_{Hare} = P_{Hare} \cap \{p_{2s} \le p_{1s}\} = \{p_{2s} \le p_{1s}\} \cap \{p_{2s} \le -wp_{1s} + w\} \cap \{0 \le p_{1s} \le 1\} \cap \{0 \le p_{2s} \le 1\}$. We will argue that this polytope is forward flow invariant, i.e., if we start from an initial condition $x \in P'_{Hare}$ then $\psi(t, x) \in P'_{Hare}$ for all $t > 0$. On the $(p_{1s}, p_{2s})$ subspace, $P'_{Hare}$ defines a triangle with vertices $A = (0,0)$, $B = (1,0)$ and $C = (\frac{w}{w+1}, \frac{w}{w+1})$ (see figure 2). The line segments AB, AC are trivially flow invariant. Hence, in order to argue that the ABC triangle is forward flow invariant, it suffices to show that everywhere along the line segment BC the vector field does not point "outwards" of the ABC triangle. Specifically, we need to show that for every point p on the line segment BC (except the Nash equilibrium C), $\frac{|\zeta_{1s}(p)|}{|\zeta_{2s}(p)|} \ge \frac{1}{w}$:
$$\frac{|\zeta_{1s}(p)|}{|\zeta_{2s}(p)|} = \frac{p_{1s}\,|p_{2s} - (p_{1s}p_{2s} + w(1 - p_{1s})(1 - p_{2s}))|}{p_{2s}\,|p_{1s} - (p_{1s}p_{2s} + w(1 - p_{1s})(1 - p_{2s}))|} = \frac{p_{1s}(1 - p_{1s})(w - (w+1)p_{2s})}{p_{2s}(1 - p_{2s})(-w + (w+1)p_{1s})}.$$
However, the points of the line passing through B, C satisfy $p_{2s} = w(1 - p_{1s})$, so
$$\frac{|\zeta_{1s}(p)|}{|\zeta_{2s}(p)|} = \frac{wp_{1s}(1 - p_{1s})(1 - (w+1)(1 - p_{1s}))}{w(1 - p_{1s})(1 - w(1 - p_{1s}))(-w + (w+1)p_{1s})} = \frac{p_{1s}(-w + (w+1)p_{1s})}{(1 - w + wp_{1s})(-w + (w+1)p_{1s})} = \frac{p_{1s}}{1 - w + wp_{1s}} \ge \frac{p_{1s}}{wp_{1s}} = \frac{1}{w}.$$

We have established that the ABC triangle is forward flow invariant. Since G(w) is a potential game, all but a zero measure set of initial conditions converge to one of the two pure equilibria. Since ABC is forward invariant, all but a zero measure of initial conditions in it converge to (Hare, Hare). A symmetric argument holds for the triangle $AB'C$ with $B' = (0,1)$. The union of ABC and $AB'C$ is equal to the polygon $P_{Hare}$, which implies the first part of the lemma.

Next, we will prove the claimed property for the polytope $(P_{Stag})$. Again, due to symmetry, it suffices to prove the property for the polytope $P'_{Stag} = P_{Stag} \cap \{p_{2s} \le p_{1s}\} = \{p_{2s} \le p_{1s}\} \cap \{p_{2s} \ge -p_{1s} + \frac{2w}{w+1}\} \cap \{0 \le p_{1s} \le 1\} \cap \{0 \le p_{2s} \le 1\}$. We will argue that this polytope is forward flow invariant. On the $(p_{1s}, p_{2s})$ subspace, $P'_{Stag}$ defines a triangle with vertices $D = (1, \frac{w-1}{w+1})$, $E = (1,1)$ and $C = (\frac{w}{w+1}, \frac{w}{w+1})$. The line segments CE, DE are trivially forward flow invariant. Hence, in order to argue that the CDE triangle is forward flow invariant, it suffices to show that everywhere along the line segment CD the vector field does not point "outwards" of the CDE triangle (see figure 2). Specifically, we need to show that for every point p on the line segment CD (except the Nash equilibrium C), $\frac{|\zeta_{1s}(p)|}{|\zeta_{2s}(p)|} \le 1$:
$$\frac{|\zeta_{1s}(p)|}{|\zeta_{2s}(p)|} = \frac{p_{1s}\,|p_{2s} - (p_{1s}p_{2s} + w(1 - p_{1s})(1 - p_{2s}))|}{p_{2s}\,|p_{1s} - (p_{1s}p_{2s} + w(1 - p_{1s})(1 - p_{2s}))|} = \frac{p_{1s}(1 - p_{1s})(w - (w+1)p_{2s})}{p_{2s}(1 - p_{2s})(-w + (w+1)p_{1s})}.$$
However, the points of the line passing through C, D satisfy $p_{2s} = -p_{1s} + \frac{2w}{w+1}$, so
$$\frac{|\zeta_{1s}(p)|}{|\zeta_{2s}(p)|} = \frac{p_{1s}(1 - p_{1s})(-w + (w+1)p_{1s})}{\big(-p_{1s} + \frac{2w}{w+1}\big)\big(-\frac{w-1}{w+1} + p_{1s}\big)(-w + (w+1)p_{1s})} = \frac{p_{1s}(1 - p_{1s})}{\big(-p_{1s} + \frac{2w}{w+1}\big)\big(-\frac{w-1}{w+1} + p_{1s}\big)} = \frac{p_{1s}(1 - p_{1s})}{\frac{2(w-1)}{w+1}\big(-\frac{w}{w+1} + p_{1s}\big) + p_{1s}(1 - p_{1s})} \le 1.$$
We have established that the CDE triangle is forward flow invariant. Since G(w) is a potential game, all but a zero measure set of initial conditions converge to one of the two pure equilibria. Since CDE is forward invariant, all but a zero measure of initial conditions in it converge to (Stag, Stag). A symmetric argument holds for the triangle $CD'E$ with $D' = (\frac{w-1}{w+1}, 1)$. The union of CDE and $CD'E$ is equal to the polygon $P_{Stag}$, which implies the second part of the lemma.

Proof (of theorem 16). The measure/size of $\mu(P_{Hare}) = 2|ABC| = \frac{w}{w+1}$, and similarly the measure of $\mu(P_{Stag}) = 2|CDE| = \frac{2}{(w+1)^2}$. In terms of the average limit performance of the replicator dynamics,
$$\int_{g(\Delta)} sw(\psi(x))\, d\mu \ge 2w\cdot\mu(P_{Hare}) + 2\big(1 - \mu(P_{Hare})\big) = 2\,\frac{w^2 + 1}{w + 1}.$$
Furthermore,
$$\int_{g(\Delta)} sw(\psi(x))\, d\mu \le 2w\big(1 - \mu(P_{Stag})\big) + 2\cdot\mu(P_{Stag}) = 2w\left(1 - \frac{2}{(w+1)^2}\right) + \frac{4}{(w+1)^2} = 2w - 4\,\frac{w - 1}{(w+1)^2}.$$
This implies that
$$\frac{w(w+1)^2}{w(w+1)^2 - 2w + 2} \le APoA \le \frac{w^2 + w}{w^2 + 1}.$$

A.1.8 Proof of lemma 13

Proof. Assume we have a weakly stable Nash equilibrium p. We have the following facts:
• Fact 1: For every bin $\gamma$, if a player i chooses $\gamma$ with probability $1 > p_{i\gamma} > 0$, he must be the only player that chooses that bin with nonzero probability. Indeed, let i, j be two players that choose bin $\gamma$ with nonzero probabilities and also $p_{i\gamma}, p_{j\gamma} < 1$. Clearly, if player i changes his strategy and chooses bin $\gamma$ with probability one, then player j does not stay indifferent (his cost $c_{j\gamma}$ increases).
• Fact 2: If player i chooses bin $\gamma$ with probability one, then he is the only player that chooses bin $\gamma$ with nonzero probability. This is true because every player $j \ne i$ can find a bin with load less than 1 to choose.
From Facts 1 and 2, and since the number of balls is equal to the number of bins, we get that p must be pure.

A.1.9 Proof of lemma 14

Proof. We will prove it for n = 4; the generalization is then easy, i.e., if n > 4 then the first 4 players play as shown below in the first 4 bins and each of the remaining n − 4 players chooses a distinct remaining bin. Below we give the matrix A where $A_{i\gamma} = p_{i\gamma}$. Observe that for any $x \in [\frac{1}{4}, \frac{3}{4}]$ we have a Nash equilibrium:
$$A = \begin{pmatrix} x & 1-x & 0 & 0 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & x & 1-x \end{pmatrix}$$
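This one-parameter family of equilibria can be verified numerically; the sketch below (helper name is ours; each bin costs its load) checks that A is a Nash equilibrium exactly when $x \in [1/4, 3/4]$:

```python
import numpy as np

def is_nash(A, tol=1e-9):
    """A[i, g] = probability that player i picks bin g; bin cost = its load."""
    for i in range(A.shape[0]):
        costs = 1.0 + A.sum(axis=0) - A[i]  # expected cost of each bin for player i
        if np.any((A[i] > tol) & (costs > costs.min() + tol)):
            return False  # some bin in the support is not a best response
    return True

for x in np.linspace(0.0, 1.0, 101):
    A = np.array([[x, 1 - x, 0, 0],
                  [0.5, 0, 0.5, 0],
                  [0, 0.5, 0, 0.5],
                  [0, 0, x, 1 - x]])
    assert is_nash(A) == (0.25 - 1e-6 <= x <= 0.75 + 1e-6)
print("A is a Nash equilibrium exactly for x in [1/4, 3/4]")
```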

A.2 Information Theory

Entropy is a measure of the uncertainty of a random variable and captures the expected information value from a measurement of the random variable. The entropy H of a discrete random variable X with possible values $\{1, \ldots, n\}$ and probability mass function p(X) is defined as $H(X) = -\sum_{i=1}^n p(i)\ln p(i)$.

Given two probability distributions p and q of a discrete random variable, their K-L divergence (relative entropy) is defined as
$$D_{KL}(p\|q) = \sum_i \ln\left(\frac{p(i)}{q(i)}\right) p(i).$$
It is the average of the logarithmic difference between the probabilities p and q, where the average is taken using the probabilities p. The K-L divergence is only defined if $q(i) = 0$ implies $p(i) = 0$ for all i.11 The K-L divergence is a "pseudo-metric" in the sense that it is always non-negative and is equal to zero if and only if the two distributions are equal (almost everywhere). Other useful properties of the K-L divergence are that it is additive for independent distributions and that it is jointly convex in both of its arguments; that is, if $(p_1, q_1)$ and $(p_2, q_2)$ are two pairs of distributions, then for any $0 \le \lambda \le 1$:
$$D_{KL}\big(\lambda p_1 + (1-\lambda)p_2\,\|\,\lambda q_1 + (1-\lambda)q_2\big) \le \lambda D_{KL}(p_1\|q_1) + (1-\lambda) D_{KL}(p_2\|q_2).$$
A closely related concept is that of the cross entropy between two probability distributions, which measures the average number of bits needed to identify an event from a set of possibilities, if a coding scheme is used based on a given probability distribution q, rather than the "true" distribution p. Formally, the cross entropy for two distributions p and q over the same probability space is defined as follows:
$$H(p, q) = -\sum_{i=1}^n p(i)\ln q(i) = H(p) + D_{KL}(p\|q).$$
For more details and proofs of these basic facts the reader should refer to the classic text by Cover and Thomas [5].

11. The quantity $0\ln 0$ is interpreted as zero because $\lim_{x\to 0} x\ln(x) = 0$.
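These definitions translate directly into code; a small sketch verifying non-negativity of the K-L divergence and the identity $H(p,q) = H(p) + D_{KL}(p\|q)$ on random distributions:

```python
import numpy as np

def entropy(p):
    nz = p[p > 0]  # 0 ln 0 is treated as 0
    return -np.sum(nz * np.log(nz))

def kl_divergence(p, q):
    nz = p > 0     # defined only when q(i) = 0 implies p(i) = 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

def cross_entropy(p, q):
    nz = p > 0
    return -np.sum(p[nz] * np.log(q[nz]))

rng = np.random.default_rng(0)
p, q = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))
assert kl_divergence(p, q) >= 0.0
assert abs(cross_entropy(p, q) - (entropy(p) + kl_divergence(p, q))) < 1e-12
print("H(p, q) = H(p) + D_KL(p || q) verified")
```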


We will start by arguing a simple technical lemma about an information theoretic invariant property of (bipartite networks of) coordination games. A coordination game is a two agent game in which, in each outcome, both agents receive the same utility. A graphical polymatrix game is defined via an undirected graph G = (V, E), where V corresponds to the set of agents of the game and where every edge corresponds to a bimatrix game between its two endpoints/agents. We denote by $S_i$ the set of strategies of agent i. We denote the bimatrix game on edge $(i,k) \in E$ via a pair of payoff matrices: $A^{i,k}$ of dimension $|S_i| \times |S_k|$ and $A^{k,i}$ of dimension $|S_k| \times |S_i|$. Let $s \in \times_i S_i$ be a strategy profile of the game; then we denote by $s_i \in S_i$ the respective strategy of agent i. Similarly, we denote by $s_{-i} \in \times_{j\in V\setminus i} S_j$ the strategies of the other agents. The payoff of agent $i \in V$ in strategy profile s is equal to the sum of the payoffs that agent i receives from all the bimatrix games she participates in. Specifically, $u_i(s) = \sum_{(i,k)\in E} A^{i,k}_{s_i, s_k}$. In the case of a bipartite network we denote by $V_{left}, V_{right}$ the respective vertices of each side of the bipartite graph.

We will show that the cross entropy difference between a fully mixed Nash equilibrium q and an evolving interior state,
$$\sum_{i\in V_{left}}\sum_{\gamma\in S_i} q_{i\gamma}\cdot\ln(p_{i\gamma}) - \sum_{i\in V_{right}}\sum_{\gamma\in S_i} q_{i\gamma}\cdot\ln(p_{i\gamma}),$$
is an invariant of the dynamics. This result follows in the line of similar invariants [26, 10]. When $x, y \in \times_i \Delta(S_i)$ we will use $H(x, y)$, $D_{KL}(x\|y)$ to denote respectively $\sum_i H(x_i, y_i)$, $\sum_i D_{KL}(x_i\|y_i)$.

Lemma 26. Let $\phi$ denote the flow of the replicator dynamic when applied to a bipartite network of coordination games that has a fully mixed Nash equilibrium q. Then, given any starting point $p \in \times_i \operatorname{int}(\Delta(S_i))$,
$$\sum_{i\in V_{left}}\frac{dH(q_i, \phi_i(p,t))}{dt} = \sum_{i\in V_{right}}\frac{dH(q_i, \phi_i(p,t))}{dt}.$$

Proof. The derivative of $\sum_{i\in V_{left}}\sum_{\gamma\in S_i} q_{i\gamma}\cdot\ln(p_{i\gamma}) - \sum_{i\in V_{right}}\sum_{\gamma\in S_i} q_{i\gamma}\cdot\ln(p_{i\gamma})$ is as follows:
$$\begin{aligned}
&\sum_{i\in V_{left}}\sum_{\gamma\in S_i} q_{i\gamma}\frac{d\ln(p_{i\gamma})}{dt} - \sum_{i\in V_{right}}\sum_{\gamma\in S_i} q_{i\gamma}\frac{d\ln(p_{i\gamma})}{dt} = \sum_{i\in V_{left}}\sum_{\gamma\in S_i} q_{i\gamma}\frac{\dot{p}_{i\gamma}}{p_{i\gamma}} - \sum_{i\in V_{right}}\sum_{\gamma\in S_i} q_{i\gamma}\frac{\dot{p}_{i\gamma}}{p_{i\gamma}} \\
&= \sum_{i\in V_{left}}\sum_{(i,j)\in E}\big(q_i^T A^{i,j} p_j - p_i^T A^{i,j} p_j\big) - \sum_{i\in V_{right}}\sum_{(i,j)\in E}\big(q_i^T A^{i,j} p_j - p_i^T A^{i,j} p_j\big) \\
&= \sum_{i\in V_{left}}\sum_{(i,j)\in E}\big(q_i^T - p_i^T\big) A^{i,j} p_j - \sum_{i\in V_{right}}\sum_{(i,j)\in E}\big(q_i^T - p_i^T\big) A^{i,j} p_j \\
&= \sum_{i\in V_{left}}\sum_{(i,j)\in E}\big(q_i^T - p_i^T\big) A^{i,j}(p_j - q_j) - \sum_{i\in V_{right}}\sum_{(i,j)\in E}\big(q_i^T - p_i^T\big) A^{i,j}(p_j - q_j) \\
&= -\sum_{(i,j)\in E,\, i\in V_{left},\, j\in V_{right}}\Big[\big(q_i^T - p_i^T\big) A^{i,j}(q_j - p_j) - \big(q_j^T - p_j^T\big) A^{j,i}(q_i - p_i)\Big] = 0.
\end{aligned}$$
Here the fourth equality uses that q is a fully mixed Nash equilibrium: for each agent i, $\sum_{(i,j)\in E} A^{i,j} q_j$ is a constant vector across i's strategies, and $q_i - p_i$ sums to zero, so $\sum_{(i,j)\in E}(q_i^T - p_i^T) A^{i,j} q_j = 0$. The final expression vanishes term by term because each edge carries a coordination game, i.e., $A^{i,j} = (A^{j,i})^T$.

The cross entropy between the Nash equilibrium q and the state of the system, however, is equal to the sum of the K-L divergence between these two distributions and the entropy of q. Since the entropy of q is constant, we derive the following corollary:

Corollary 27. Let $\phi$ denote the flow of the replicator dynamic when applied to a bipartite network of coordination games that has a fully mixed Nash equilibrium q. Then, given any (interior) starting point $p \in \times_i \Delta(S_i)$, the quantity $\sum_{i\in V_{left}} D_{KL}(q_i\|\phi_i(p,t)) - \sum_{i\in V_{right}} D_{KL}(q_i\|\phi_i(p,t))$ is constant, i.e., it does not depend on t.
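Lemma 26's conservation law is easy to check numerically on the smallest bipartite network, a single coordination game. The sketch below uses an illustrative diagonal payoff matrix (not from the paper) whose fully mixed equilibrium is q = (2/3, 1/3), and integrates the replicator with a crude Euler scheme:

```python
import numpy as np

A = np.diag([1.0, 2.0])        # one coordination-game edge (illustrative payoffs)
q = np.array([2 / 3, 1 / 3])   # fully mixed equilibrium: A @ q is a constant vector

def euler_step(p1, p2, dt=1e-4):
    u1, u2 = A @ p2, A.T @ p1              # payoff vectors of the two agents
    p1 = p1 + dt * p1 * (u1 - p1 @ u1)     # replicator updates for both sides
    p2 = p2 + dt * p2 * (u2 - p2 @ u2)
    return p1, p2

def invariant(p1, p2):
    # cross-entropy difference (up to sign) between the two bipartite sides
    return np.sum(q * np.log(p1)) - np.sum(q * np.log(p2))

p1, p2 = np.array([0.9, 0.1]), np.array([0.3, 0.7])
v0 = invariant(p1, p2)
for _ in range(50000):
    p1, p2 = euler_step(p1, p2)
assert abs(invariant(p1, p2) - v0) < 1e-2  # conserved up to Euler integration error
print(invariant(p1, p2), "vs", v0)
```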
