Evolutionary Coalitional Games for Random Access Control

Xin Luo and Hamidou Tembine
Supélec, École Supérieure d'Électricité, France. Emails: {luo.xin, hamidou.tembine}@supelec.fr
Abstract: We consider a random access system in which each user is in one of two modes of operation, depending on whether or not it has a packet to send, and the users that have a packet access a shared medium. We propose an evolving coalitional game framework to analyze the system outcomes. Unlike classical coalitional approaches, which assume that coalitional structures are fixed and formed at no cost, we explain how coalitions can be formed in a fully distributed manner using evolutionary dynamics and coalitional combined fully distributed payoff and strategy (CODIPAS) learning. We introduce the concept of an evolutionarily stable coalitional structure (ESCS): a coalitional structure that, once formed, is resilient to small perturbations of strategies. We show that (i) the formation and the stability of coalitions depend mainly on the cost of making a coalition compared with the benefit of cooperation, and (ii) the grand coalition can be unstable, in which case a localized coalitional structure forms as an evolutionarily stable coalitional structure. When the core is empty, the coalitional CODIPAS scheme selects one of the stable sets. Finally, we discuss the convergence and complexity of the proposed coalitional CODIPAS learning in access control with different user activities.
I. INTRODUCTION

In 1961, Claude Shannon established the foundation for the discipline now known as “multi-user information theory” in his pioneering paper “Two-way Communication Channels” [8], and one decade later Norman Abramson [1] introduced the concept of multiple access over a shared common channel in his paper “The Aloha System: Another Alternative for Computer Communications”. Over the following four decades, numerous elegant theories and algorithms have been developed for multiple access protocols, including pure Aloha, p-persistent Aloha, slotted Aloha, carrier-sense multiple access with collision avoidance or collision detection, and time division multiple access.

Despite decades of intensive research in the field, the theory of random medium access is still far from complete. Most approaches (such as game-theoretic payoff or information-theoretic capacity) are based on bounds on the noise, interference and collision probability caused by simultaneous transmissions, but ignore random data arrivals at the users. In particular, in the presence of non-saturated queues (users may not always have data to send), these approaches may fail. At the same time, several approaches developed in queueing theory and networking take into account the evolution of the queue size (buffer, backoff) but ignore the uncertainty of the channel, the noise, the mobility of the users and the interaction between the users' queues. As the performance of a network depends on these cross-layer interactions, one needs to take both issues into consideration.

In a 2011 IEEE Spectrum feature on cognitive radio and game theory [7], the author pointed out the emergence of cooperative behaviors and stated the following: “As our radios get smarter, they'll be competing for overcrowded airwaves. Game theory can make them cooperate.”

In this paper we aim to understand how cooperation and coalitional structures form in random medium access control. Inside each coalition we use a specific allocation procedure. As an illustrative example, consider a scenario with two users. User 1 can get 0.06, i.e., 6% of successful transmissions in the long run, and user 2 can get 0.36, i.e., 36% of successful transmissions. If they form a team (a coalition), they can together get 0.46, i.e., 46%. How should they split the 46%? A standard and well-investigated answer to that question is the solution concept corresponding to the equal split of the surplus principle due to Shapley (see footnote 1). Here the surplus is 46 − 6 − 36 = 4%. Thus, user 1 gets 6 + 2 = 8% and user 2 gets 36 + 2 = 38% of successful transmissions. To achieve these percentages in the random access control problem, the coalition can use an agreement policy that gives the slot to each user in equal proportion whenever both users have a packet to send.

Classical approaches to network and coalition formation analyze stable coalitional structures. Stable coalitional structures correspond to the “equilibrium states” of some solution concept, in which players have no incentive to leave already established or fixed coalitions. These approaches fail to specify how the players arrive at a specific coalitional structure and topology of the system. In practice, it is not trivial to form a coalition. Thus, we address three basic questions:

• Q1: Why and how should users form a coalition?
• Q2: Is it possible to form a coalition by means of distributed strategic learning?
• Q3: What is the final coalitional structure of the network?
1 Note that one can use a proportional split of the surplus instead of the equal split, i.e., a fraction α of the surplus goes to user 1 and a fraction 1 − α to user 2, for some α ∈ (0, 1).
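The arithmetic of the two-user example can be checked with a short sketch. It assumes the permutation-based definition of the Shapley value given in Section II-A and uses the success rates 0.06, 0.36 and 0.46 from the example; the function names `shapley_value` and `split_surplus` are illustrative and do not come from the paper.

```python
from itertools import permutations


def shapley_value(players, v):
    """Average each player's marginal contribution over all orderings."""
    orderings = list(permutations(players))
    totals = {k: 0.0 for k in players}
    for order in orderings:
        predecessors = frozenset()
        for k in order:
            # Marginal contribution of k when it joins its predecessors.
            totals[k] += v[predecessors | {k}] - v[predecessors]
            predecessors = predecessors | {k}
    return {k: totals[k] / len(orderings) for k in players}


def split_surplus(v1, v2, v12, alpha=0.5):
    """Give a fraction alpha of the surplus to user 1 and 1 - alpha to user 2."""
    surplus = v12 - v1 - v2
    return v1 + alpha * surplus, v2 + (1.0 - alpha) * surplus


# Long-run success rates from the two-user example in the text.
v = {
    frozenset(): 0.0,
    frozenset({1}): 0.06,
    frozenset({2}): 0.36,
    frozenset({1, 2}): 0.46,
}

shares = shapley_value([1, 2], v)
equal_split = split_surplus(0.06, 0.36, 0.46)
print(shares, equal_split)
```

For two users the two computations coincide, which is why the equal split of the surplus can be attributed to Shapley; passing a different `alpha` gives the proportional split of footnote 1.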
To answer these questions we propose an evolutionary coalitional game framework. In evolutionary coalitional games [6], the players learn and interact with their environment and use their experience to choose or avoid certain coalitions based on their consequences (past measurements). There are strategic interactions between the coalitions. Coalitions that led to higher payoffs in a certain situation tend to be repeated whenever the same situation recurs, whereas choices that led to comparatively lower payoffs tend to be avoided. In addition, the analysis incorporates the cost of making a new coalition. The evolutionary coalitional game approach goes beyond the notions of cooperative and non-cooperative games: it combines both. For example, both situations can be observed in the transient phase. Within this framework, partially cooperative users, spiteful users and altruistic users can be studied.

The main solution concepts of coalitional games are geared toward payoff division among agents in ways that guarantee some form of stability of the coalitional structure. Most solution concepts focus on the final solution (steady state) and usually do not address the dynamic process that leads to the stationary solution (if any). Dynamic coalitional approaches have been widely studied in the game theory literature. One of the first approaches to coalition formation is the dynamic split-and-merge procedure proposed in 1974 by Aumann and Drèze [2]. Since then the “merges and splits” idea has been used by many authors in different areas of application, including economics, computer science and engineering.

Our contributions can be summarized as follows. Most previous works claim that designing self-interested autonomous agents that can negotiate rationally in stable coalitions can dramatically benefit end users. However, these studies implicitly assume that coalition formation is cost-free, which is not realistic in the network topologies of interest because of the communication complexity and the difficulty of reaching a stable consensus. In this paper, we show that these results depend on the cost of communication, information exchange and cooperation. Depending on these costs, a coalition may not be formed, or the grand coalition may be unstable. We explicitly model these costs in the evolutionary dynamics and in the (evolving) value of coalitions. We propose a combined fully distributed payoff and strategy (CODIPAS) learning scheme for coalition formation based on imitative learning. The asymptotic trajectory of the CODIPAS scheme is related to the so-called replicator dynamics. Based on this dynamical system, we answer questions Q1-Q3 in Propositions 1 and 2. We explicitly provide the exact convergence time (not a bound) of the coalition formation dynamics in Proposition 4. We introduce the concept of an evolutionarily stable coalitional structure (ESCS): a coalitional structure that, once formed, is resilient to small perturbations of strategies. The ESCS is the analogue of the ESS (evolutionarily stable state or strategy) in evolutionary games, but here we use it for finite coalitional structures. Depending on the cost of cooperation, the grand coalition may not be an ESCS and localized coalitional structures may emerge. Using
coalitional CODIPAS we determine the stable set even when the core is empty.

The remainder of this paper is structured as follows. In the next section we introduce basic notions of evolutionary coalitional games and provide a simple learning scheme. In Section III we present the random access control model. Analytical results are given for different coalitional structures. Section IV concludes the paper. The proofs and the extensions to more than three users are omitted. We summarize some of the notation in Table I.

TABLE I
SUMMARY OF NOTATIONS

Symbol                                  Meaning
$\mathcal{N}$                           set of users
$C$                                     coalition (subset of $\mathcal{N}$)
$v_C$                                   value of coalition $C$
$\epsilon_C$                            cost of making the coalition $C$
$r^j_{C,t} \in \mathbb{R}$              perceived payoff by user $j$ in coalition $C$ at $t$
$\hat{r}^j_{C,t}$                       estimated payoff vector of user $j$ in coalition $C$ at $t$
$2^{\mathcal{N}}$                       set of all the subsets of $\mathcal{N}$
II. PRELIMINARY

Cooperative game theory is concerned primarily with coalitions, that is, groups of players who coordinate their actions and pool their payoffs. Consequently, one of the problems here is how to divide the payoff among the members of the formed coalition. The basis of this theory was laid by John von Neumann and Oskar Morgenstern [12] with coalitional games in characteristic function form, also known as transferable utility games (TU-games). The theory has been extended to non-transferable utility games (NTU-games).

Let $\mathcal{N}$ be a non-empty finite set of players who consider different cooperation possibilities. Each subset $C \subseteq \mathcal{N}$ is referred to as a crisp coalition. The set $\mathcal{N}$ is called the grand coalition and $\emptyset$ is the empty coalition.

Definition 1. A cooperative game in characteristic function form is an ordered pair $(\mathcal{N}, v)$ consisting of the player set $\mathcal{N}$ and the characteristic function $v : 2^{\mathcal{N}} \to \mathbb{R}$ with $v(\emptyset) = 0$, where $2^{\mathcal{N}}$ is the set of subsets of $\mathcal{N}$.

The real number $v_C = v(C)$ can be interpreted as the maximal payoff that the members of $C$ can obtain when they cooperate. We now address one of the basic questions in the theory of cooperative games: if the grand coalition forms, how should the value $v_{\mathcal{N}}$ be divided? This question is approached using different solution concepts in cooperative game theory, such as the core, stable sets, bargaining sets, the Shapley value and the nucleolus. A solution concept gives an answer to the question of how the payoff obtained when all players in $\mathcal{N}$ cooperate should be distributed among the individual players while taking account of the potential utility of all different coalitions of players. Hence, a solution concept assigns to
a coalitional game at least one payoff vector $r = (r_k)_{k \in \mathcal{N}}$, where $r_k$ is the payoff allocated to player $k \in \mathcal{N}$.

Definition 2. A payoff vector $r \in \mathbb{R}^n$ is an imputation for the game $v \in G^{\mathcal{N}}$ if it is efficient and individually rational, i.e.,
\[ \sum_{j \in \mathcal{N}} r_j = v(\mathcal{N}), \qquad r_j \geq v(\{j\}) \quad \forall\, j \in \mathcal{N}. \]

We denote by $I(v)$ the set of imputations of $v \in G^{\mathcal{N}}$. Clearly, $I(v)$ is empty if and only if $v(\mathcal{N}) < \sum_{j \in \mathcal{N}} v(\{j\})$.

We now present a set-valued solution concept called the core [3], [9], [5], [4]. The core was first proposed by Francis Ysidro Edgeworth in 1881, and later defined in game-theoretic terms by Shapley in 1952 and by Gillies in 1953.

Definition 3 (Core, [3], [9], [5], [4]). The core $\mathcal{C}(v)$ of a TU-game $v \in G^{\mathcal{N}}$ is the set
\[ \mathcal{C}(v) = \Big\{ r \in I(v) \;:\; \sum_{j' \in \mathcal{N}'} r_{j'} \geq v(\mathcal{N}'), \ \forall\, \mathcal{N}' \in 2^{\mathcal{N}} \setminus \{\emptyset\} \Big\}. \]
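Definitions 2 and 3 translate directly into a membership test. The sketch below is only an illustration: the helper names are hypothetical, the two-user values are those of the example in the Introduction, and the three-player majority game is a standard textbook game with an empty core that does not appear in this paper.

```python
from itertools import combinations


def nonempty_coalitions(players):
    """Yield every non-empty subset of the player set as a frozenset."""
    for size in range(1, len(players) + 1):
        for members in combinations(players, size):
            yield frozenset(members)


def is_imputation(r, players, v, tol=1e-9):
    """Definition 2: efficiency and individual rationality."""
    efficient = abs(sum(r[j] for j in players) - v[frozenset(players)]) <= tol
    individually_rational = all(r[j] >= v[frozenset({j})] - tol for j in players)
    return efficient and individually_rational


def in_core(r, players, v, tol=1e-9):
    """Definition 3: an imputation that no coalition can improve upon."""
    if not is_imputation(r, players, v, tol):
        return False
    return all(
        sum(r[j] for j in c) >= v[c] - tol for c in nonempty_coalitions(players)
    )


# Two-user example from the Introduction.
players2 = [1, 2]
v2 = {frozenset({1}): 0.06, frozenset({2}): 0.36, frozenset({1, 2}): 0.46}
equal_split_in_core = in_core({1: 0.08, 2: 0.38}, players2, v2)
unfair_split_in_core = in_core({1: 0.05, 2: 0.41}, players2, v2)

# Three-player majority game (textbook example): any pair can secure 1.
players3 = [1, 2, 3]
v3 = {c: (1.0 if len(c) >= 2 else 0.0) for c in nonempty_coalitions(players3)}
symmetric_in_core = in_core({1: 1 / 3, 2: 1 / 3, 3: 1 / 3}, players3, v3)

print(equal_split_in_core, unfair_split_in_core, symmetric_in_core)
```

The enumeration visits all $2^n - 1$ non-empty coalitions, so this check is only practical for small player sets.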
If $r \in \mathcal{C}(v)$, then no coalition $\mathcal{N}'$ has an incentive to split off when $r$ is the proposed payoff allocation in $\mathcal{N}$, because the total amount $\sum_{j' \in \mathcal{N}'} r_{j'}$ allocated to $\mathcal{N}'$ is not smaller than the amount $v(\mathcal{N}')$ that the players can obtain by forming the sub-coalition. If the core is non-empty, then it is a polytope, because it is bounded and defined by a finite system of linear inequalities.

A. Shapley value

We now focus on the Shapley value. The Shapley value uses the marginal vectors of a cooperative game.

Definition 4 (Shapley value). Let $S_n$ be the set of permutations of $\mathcal{N}$ and let $\pi \in S_n$. The set $p_{\pi,k} = \{k' \in \mathcal{N} \mid \pi^{-1}(k') < \pi^{-1}(k)\}$ consists of all predecessors of $k$ with respect to the permutation $\pi$. The marginal contribution vector $m_{\pi}(v) \in \mathbb{R}^n$ with respect to $\pi$ and $v$ has $k$-th coordinate
\[ m_{\pi,k}(v) = v(p_{\pi,k} \cup \{k\}) - v(p_{\pi,k}), \qquad k \in \mathcal{N}. \]
The Shapley value $Sh$ of a game $v$ is the average of the marginal vectors of the game:
\[ Sh(v) = \frac{1}{n!} \sum_{\pi \in S_n} m_{\pi}(v). \]
The Shapley value satisfies in particular efficiency and anonymity.

B. Coalitions and distributed strategic learning

We introduce learning algorithms for coalition formation. Due to the randomness and variability of the network, coalitions are not static and may evolve over time. We propose adaptive coalitions in the long term. The formation of coalitions is carried out via a learning process.

Formation of coalitions and coalition learning: We introduce coalitional combined fully distributed payoff and strategy (CODIPAS) learning. The coalitional CODIPAS makes it possible to learn simultaneously which coalition will be formed, the payoffs and the strategies. We refer to [10] for more details on CODIPAS and its variants. Here we use the imitative coalitional CODIPAS, which has interesting properties in terms of pure strategy selection (pure coalition).
Imitative CODIPAS is based on the previous strategy and the payoffs estimated by a user. However, the imitative CODIPAS developed in [10] differs slightly from the coalitional imitative CODIPAS. The main difference concerns the payoffs. Depending on the coalition and on the allocation procedure inside each coalition, the perceived payoff will be different. Thus, there is an intra-coalition interaction and a cross-coalition interaction aspect that is missing in classical strategic learning. Inside each coalition, the users can run a learning process to get a better outcome for the coalition. Hence, coalitional learning adds another dimension to the selection issue. We present the coalitional learning and explain how coalitions will be formed. The algorithm has two main components: average payoff estimation and optimal strategy learning. The payoff estimation is given by
\[ \hat{r}^j_{C,t+1} = \hat{r}^j_{C,t} + \mu_{j,t}\, \mathbb{1}_{\{a_{j,t} = C\}} \big( r^j_{C,t} - \hat{r}^j_{C,t} \big), \tag{1} \]
where $\hat{r}^j_{C,t}$ is the estimated payoff at time $t$ in coalition $C$, $r^j_{C,t}$ is the perceived (experienced) payoff at time $t$ after choosing coalition $C$, and $\mu_{j,t} \in (0, 1)$ is a learning rate for payoff estimation. $\mathbb{1}_{\{a_{j,t} = C\}}$ denotes the indicator function: it is equal to 1 if user $j$ is with coalition $C$ and 0 otherwise. Note that, using (1), only one component is updated at a time, which considerably reduces the complexity per iteration. In order to know the payoffs, the user needs to try out other coalitions (exploration). The question is then how to explore. A relatively simple way to learn from the actions tried by others is to imitate them. Imitative learning in games is a learning process whereby a player observes or measures another player's behavior and replicates it in a proportional way. The “imitative” process consists of performing an act after seeing it done (experience). This means that the probability that a player adopts the most successful of the already tried actions is higher than, and proportional to, the previous probability. In that context it is natural to ask how the next strategy will be constructed from the current strategy. This is what is behind the
strategy-learning equation:
\[ x^j_{C,t+1} = \frac{x^j_{C,t}\,(1 + \lambda_{j,t})^{\hat{r}^j_{C,t}}}{\sum_{C'} x^j_{C',t}\,(1 + \lambda_{j,t})^{\hat{r}^j_{C',t}}}, \]
where $\lambda_{j,t} > 0$ is a learning rate for strategies. The number $x^j_{C,t}$ is non-negative and can be seen as the investment of user $j$ in coalition $C$ at step $t$. Note that $x^j_{C,t} = 0$ if $j$ is not in coalition $C$; the sum in the denominator runs over the coalitions $C'$ that contain $j$. The strategy learning can be interpreted as the cost of learning with a moving cost equal to the relative entropy. By changing the function $\lambda_{j,t}$ one can cover the rationality level of player $j$. For a small rationality level one gets the replicator equation, and for a high rationality level one gets the best reply dynamics. Interestingly, both extreme cases are reasonable and may guarantee the formation of stable coalitional structures. Note that the convergence of the learning process of each player depends on the others. That is, the coalition formation process depends on the learning process of all the possible
members.

C. Evolutionary coalitional games

We consider an evolutionary game with a finite number of players. Each member of the population can participate in some coalition or be alone (act non-cooperatively). The members of the same coalition make their decisions jointly and share and exchange information inside that coalition. They also try to learn their payoff (called the value of the coalition) through an evolutionary dynamics. Inside each coalition, an allocation rule is applied. The coalitions evolve according to an evolutionary process based on individual payoffs and coalition experimentation.

1) ESCS: We introduce the concept of an evolutionarily stable coalitional structure (ESCS): a coalitional structure that, once formed, is resilient to small perturbations of strategies. The notion of ESCS is the analogue of the ESS (evolutionarily stable state or strategy) in evolutionary games, but here we use it for a finite number of users.

Definition 5 (ESCS). A strategy profile $y = (y^j, y^{-j})$ is an ESCS if, for all $y' \neq y$, there exists $\epsilon_{y'} > 0$ (called the coalitional invasion barrier) such that for all $\epsilon \in (0, \epsilon_{y'})$,
\[ r^j\big(y^j, (1 - \epsilon)\, y^{-j} + \epsilon\, y'^{-j}\big) > r^j\big(y'^j, (1 - \epsilon)\, y^{-j} + \epsilon\, y'^{-j}\big), \]
where
\[ y^j = (y^j_C)_{C \ni j}, \qquad y^{-j} = (y^{j'})_{j' \neq j}. \]
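The learning scheme of Section II-B, whose rest points are the candidates for the ESCS just defined, can be sketched for a single user as follows. This is a minimal illustration and not the authors' implementation: the learning rates are held constant, the options "alone" and "pair" and their mean payoffs are hypothetical, and the perceived payoff is modeled as a fixed mean plus uniform noise instead of being generated by the other users' choices.

```python
import random


def codipas_step(x, r_hat, chosen, payoff, mu, lam):
    """One iteration of payoff estimation (1) and the imitative strategy update."""
    # Strategy update: reweight by (1 + lam) ** estimate, using estimates at time t.
    weights = {c: x[c] * (1.0 + lam) ** r_hat[c] for c in x}
    total = sum(weights.values())
    new_x = {c: w / total for c, w in weights.items()}
    # Payoff estimation: only the component of the chosen coalition is updated.
    new_r_hat = dict(r_hat)
    new_r_hat[chosen] += mu * (payoff - r_hat[chosen])
    return new_x, new_r_hat


def run_codipas(mean_payoff, steps=5000, mu=0.1, lam=0.05, noise=0.05, seed=0):
    """Simulate one learner facing noisy but stationary payoffs (an assumption)."""
    rng = random.Random(seed)
    options = list(mean_payoff)
    x = {c: 1.0 / len(options) for c in options}   # interior starting point
    r_hat = {c: 0.0 for c in options}
    for _ in range(steps):
        chosen = rng.choices(options, weights=[x[c] for c in options])[0]
        payoff = mean_payoff[chosen] + rng.uniform(-noise, noise)
        x, r_hat = codipas_step(x, r_hat, chosen, payoff, mu, lam)
    return x, r_hat


# Hypothetical payoffs: joining the pair pays more than staying alone.
x, r_hat = run_codipas({"alone": 0.30, "pair": 0.40})
print(x, r_hat)
```

Because the update multiplies the current weight by a factor that grows with the estimated payoff, the mass concentrates on the option with the higher estimate, which illustrates the pure-coalition selection property mentioned in Section II-B.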
ESCS is an important notion since it determines stability, robustness and local efficiency. The coalitional CODIPAS learning scheme converges to an ESCS starting from an interior point. The solution of the evolutionary coalitional game is the coalitional structure, which can exhibit several stability notions. An ESCS satisfies both internal and external stability. Internal stability means that, given a coalition, no user in this coalition has an incentive to leave it and act non-cooperatively as a singleton, since the payoff any player receives in the coalition is higher than that received when acting non-cooperatively. External stability means that, in a given partition, no player can improve its payoff by leaving its current coalition and joining another one. Note that the ESCS is well defined even if the core is empty. In order to find the stable coalitional structures, we either compute them from payoff inequalities or run the dynamics starting from an interior point. The vector field of the dynamics determines all the ESCS.

III. RANDOM ACCESS CONTROL MODEL

Consider a random access network composed of power-limited users communicating with a common access point in a coordinated way. Each user can be in one of two states: active or dormant. In the active state, a user attempts a communication. In the dormant state, a user shuts off in order to maximize its battery lifetime. Transmitted packets are successful only if there is a single active user in range; otherwise a collision occurs. In order to reduce collisions, users
are allowed to form coalitions, and the users that are in the same coalition can share their activity plans. In coalition $C$ we choose the Shapley value as the allocation procedure, i.e., the payoff of player $j$ is $r^j_{C,t} = Sh^j_{C,t}$, the Shapley value of $j$ in coalition $C$. The value of coalition $C$ is
\[ v_C = \Big( 1 - \prod_{j \in C} (1 - p_j) \Big) \prod_{j' \in \mathcal{N} \setminus C} (1 - p_{j'}), \]
where $\mathcal{N} = \{1, 2, \ldots, n\}$, $n \geq 2$, is the grand coalition, $C \subseteq \mathcal{N}$, and $p_j$ is the probability for user $j$ to be active. In addition, there is a cost $\epsilon_C$ for making the coalition $C$. The cost $\epsilon_C$ captures the cost of communication and information exchange needed to reach a local consensus inside the coalition. The pair $(\mathcal{N}, v)$ defines a cooperative game in the sense of von Neumann and Morgenstern (1944) [12].

Evolutionary coalitional structure: two users

Consider two users. Then the set of coalitional structures is given by $\{(\{1\}, \{2\}), (\{12\})\}$. User $j$ has a probability $p_j \in (0, 1)$ of having a packet to send (or being in active mode) and a probability $x_j$ of transmitting if she is in a singleton coalition. The probability that user 1 or user 2 has a packet to send is $p_1 + p_2 - p_1 p_2 = 1 - (1 - p_1)(1 - p_2)$, i.e., the complement of the probability that no user has a packet. Thus, the value of coalition $C = \{12\}$, which also corresponds to full cooperation, is $r_C = p_1 + p_2 - p_1 p_2$.

In order to know whether users have an incentive to form a coalition, we examine the payoff when the users are outside the coalition, i.e., in a non-cooperative situation. Denote by $r^j_{\{j\}}$ the payoff of user $j$ in the non-cooperative game. Then user 1 has $r^1_{\{1\}} = p_1 x_1 (1 - p_2 x_2)$. Similarly, $r^2_{\{2\}} = p_2 x_2 (1 - p_1 x_1)$. It is clear that $x_1 = x_2 = 1$ is one of the equilibria of the non-cooperative game, and the payoff of user 1 at this equilibrium is $p_1 (1 - p_2)$. Similarly, user 2 gets $p_2 (1 - p_1)$. The users share the value inside the coalition $C$ according to the Shapley value procedure, i.e., user 1 gets
\[ r^1_{\{1\}} + \frac{r_C - r^1_{\{1\}} - r^2_{\{2\}}}{2} \]
and user 2 gets
\[ r^2_{\{2\}} + \frac{r_C - r^1_{\{1\}} - r^2_{\{2\}}}{2}. \]
We verify that the surplus is positive:
\[ r_C - r^1_{\{1\}} - r^2_{\{2\}} = p_1 + p_2 - p_1 p_2 - p_1 (1 - p_2) - p_2 (1 - p_1) = p_1 p_2 > 0. \]
Thus, the cooperation cost $\epsilon_C$ is to be compared with the gain $p_1 p_2$.

Proposition 1 (Shapley value for two users). The Shapley value vector is given by
\[ Sh^1_C = p_1 \Big( 1 - \frac{p_2}{2} \Big) - \frac{\epsilon_C}{2} \qquad \text{and} \qquad Sh^2_C = p_2 \Big( 1 - \frac{p_1}{2} \Big) - \frac{\epsilon_C}{2}. \]

This proposition answers the first part of Q1 for the case where $\epsilon_C < p_1 p_2$. We propose a very simple procedure for the evolution of coalitions with symmetric strategies. It is based on the replicator equation, which is the scaled limit of our
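imitative CODIPAS learning algorithm:
\[ \dot{y}^1_C(t) = y^1_C(t)\,\big(1 - y^1_C(t)\big)\, y^1_C(t)\, \frac{p_1 p_2 - \epsilon_C}{2}, \]
where $\dot{y}^1_C(t)$ denotes the time derivative of $y^1_C(t)$ and $\epsilon_C$ is the cost of making the coalition. Thus, if the initial distribution is not a pure strategy, then
\[ y^1_C(t) \longrightarrow \begin{cases} 1 & \text{if } p_1 p_2 > \epsilon_C, \\ 0 & \text{if } p_1 p_2 < \epsilon_C, \\ y^1_C(0) & \text{if } p_1 p_2 = \epsilon_C. \end{cases} \]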
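The three limits above can be checked numerically. The sketch below is a hedged illustration: it assumes that the coalition value of Section III reads as the probability that at least one member is active times the probability that every outsider is silent (a reading that reproduces the two-user payoffs in the text), it uses hypothetical activity probabilities and costs, and it integrates the symmetric replicator equation with a plain forward-Euler scheme.

```python
def coalition_value(p, coalition):
    """Assumed reading: P(some member of C is active) * P(all outsiders silent)."""
    all_members_silent = 1.0
    outsiders_silent = 1.0
    for j, pj in p.items():
        if j in coalition:
            all_members_silent *= 1.0 - pj
        else:
            outsiders_silent *= 1.0 - pj
    return (1.0 - all_members_silent) * outsiders_silent


def replicator_rhs(y, gain, cost):
    """Right-hand side y * (1 - y) * y * (gain - cost) / 2 of the symmetric dynamics."""
    return y * (1.0 - y) * y * (gain - cost) / 2.0


def integrate(y0, gain, cost, t_end=400.0, dt=0.01):
    """Forward-Euler integration of the replicator equation up to time t_end."""
    y = y0
    for _ in range(int(round(t_end / dt))):
        y += dt * replicator_rhs(y, gain, cost)
    return y


p = {1: 0.6, 2: 0.5}  # hypothetical activity probabilities p1, p2
# Surplus of cooperation; for two users it equals p1 * p2.
gain = coalition_value(p, {1, 2}) - coalition_value(p, {1}) - coalition_value(p, {2})

y_cheap = integrate(0.2, gain, cost=0.1)    # cost below p1 * p2: y grows toward 1
y_costly = integrate(0.8, gain, cost=0.5)   # cost above p1 * p2: y decays toward 0
y_tie = integrate(0.5, gain, cost=gain)     # cost equal to p1 * p2: y does not move
print(gain, y_cheap, y_costly, y_tie)
```

In this run the decay toward 0 in the costly case is noticeably slower than the approach to 1 in the cheap case, because the factor $y^2$ in the vector field vanishes quadratically near 0.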
This means that the two users will form a coalition in the long term only when the cost of coalition formation $\epsilon_C$ is less than the gain from cooperation $p_1 p_2$. Otherwise the system remains non-cooperative. A partial (fuzzy or quantum) coalition may arise if $\epsilon_C = p_1 p_2$. For non-symmetric strategies, the system becomes
\[ \dot{y}^1_C(t) = y^1_C(t)\,\big(1 - y^1_C(t)\big)\, y^2_C(t)\, \frac{p_1 p_2 - \epsilon_C}{2}, \]
\[ \dot{y}^2_C(t) = y^2_C(t)\,\big(1 - y^2_C(t)\big)\, y^1_C(t)\, \frac{p_1 p_2 - \epsilon_C}{2}. \]

Proposition 2 (Convergence issue).
• If the cooperation cost is less than the benefit of cooperation compared to non-cooperation, i.e., $\epsilon_C < p_1 p_2$, then the two components $y^1_C(t)$ and $y^2_C(t)$ go to 1 as $t$ goes to infinity, starting from non-zero probabilities.
• If the cooperation cost is greater than the gain from cooperation compared to non-cooperation, i.e., $\epsilon_C > p_1 p_2$, then the product of the two components $y^1_C(t)\, y^2_C(t)$ goes to 0 as $t$ goes to infinity, starting from $y^j_C(0) < 1$.

Proposition 2 answers question Q1 when the condition $\epsilon_C < p_1 p_2$ is met. It answers question Q2 by using the imitative dynamics [11] and the CODIPAS scheme. It answers Q3 for all initial conditions. The final topology depends on the starting point and on the cost of coalition formation $\epsilon_C$.

Remark 1. Interestingly, the coalition between the two users will be formed only if the cost of making the coalition is not larger than $p_1 p_2$. In particular, forming a coalition is not always the better outcome. In other words, the coalitional structure $(\{12\})$ is unstable (in the sense of Lyapunov) for $\epsilon_C > p_1 p_2$.

Corollary 1. If $\epsilon_C < p_1 p_2$, then the coalitional structure $(\{12\})$ is the unique ESCS. If $\epsilon_C > p_1 p_2$, the coalitional structure $(\{1\}, \{2\})$ is the unique ESCS and the core is empty.

Since we have shown the convergence of the dynamics in Proposition 2, we now quantify how long it takes to reach a neighborhood of the steady state for a fixed tolerance parameter $\eta$.

Proposition 3 (Explicit solution).
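The explicit solution of the ODE starting from $0 < y(0) < 1$ is given by
\[ y(t) = F^{-1}\Big( e^{\kappa_0}\, e^{\frac{p_1 p_2 - \epsilon_C}{2}\, t} \Big), \]
where
\[ F(y) = \frac{y}{1 - y}\, e^{-\frac{1}{y}}, \qquad \kappa_0 = \ln \frac{y(0)}{1 - y(0)} - \frac{1}{y(0)}. \]

Proposition 4 (Convergence time). Suppose that $\epsilon_C < p_1 p_2$. Let $T_\eta = \inf\{t \mid |1 - y(t)| \leq \eta\}$ be the convergence time to within a range of size $\eta$ of 1. Then the following hold:
• The convergence time $T_\eta$ is finite for any $\eta \in (0, 1)$.
• The convergence time $T_\eta$ to be within an $\eta$ range of the steady state 1 is exactly
\[ T_\eta = \frac{2}{p_1 p_2 - \epsilon_C} \Big[ \ln\Big( \frac{1 - \eta}{\eta} \Big) + \frac{1}{\eta - 1} - \kappa_0 \Big]. \tag{2} \]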
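As a sanity check, the closed form (2) can be compared with the first hitting time of a numerically integrated trajectory. The sketch below assumes the symmetric dynamics $\dot{y} = y^2 (1 - y)(p_1 p_2 - \epsilon_C)/2$ and a starting point $y(0) < 1 - \eta$; the parameter values and function names are hypothetical, and the fourth-order Runge-Kutta integrator is an illustrative choice, not part of the paper.

```python
import math


def log_odds_minus_inverse(y):
    """G(y) = ln(y / (1 - y)) - 1 / y, i.e. ln F(y); it grows linearly along a solution."""
    return math.log(y / (1.0 - y)) - 1.0 / y


def convergence_time(y0, eta, p1, p2, cost):
    """Closed-form T_eta from equation (2); assumes cost < p1 * p2 and y0 < 1 - eta."""
    rate = p1 * p2 - cost
    if rate <= 0.0:
        raise ValueError("equation (2) requires cost < p1 * p2")
    kappa0 = log_odds_minus_inverse(y0)
    return (2.0 / rate) * (math.log((1.0 - eta) / eta) + 1.0 / (eta - 1.0) - kappa0)


def simulated_time(y0, eta, p1, p2, cost, dt=0.01):
    """First time the RK4-integrated trajectory enters the eta-neighborhood of 1."""
    a = (p1 * p2 - cost) / 2.0
    if a <= 0.0:
        raise ValueError("the trajectory reaches 1 only when cost < p1 * p2")

    def f(y):
        return a * y * y * (1.0 - y)

    y, t = y0, 0.0
    while 1.0 - y > eta:
        k1 = f(y)
        k2 = f(y + 0.5 * dt * k1)
        k3 = f(y + 0.5 * dt * k2)
        k4 = f(y + dt * k3)
        y += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
    return t


# Hypothetical parameters: p1 = 0.6, p2 = 0.5, cost 0.1, y(0) = 0.2, eta = 0.01.
t_formula = convergence_time(0.2, 0.01, 0.6, 0.5, 0.1)
t_numeric = simulated_time(0.2, 0.01, 0.6, 0.5, 0.1)
print(t_formula, t_numeric)
```

The two values should agree up to roughly one integration step, and evaluating `convergence_time` with a larger tolerance `eta` returns a smaller time, in line with the monotonicity stated after Proposition 4.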
The result of Proposition 4 is important since it provides the exact convergence time (not a bound) of the coalition formation dynamics. The convergence time $T_\eta$ decreases with $p_1 p_2 - \epsilon_C > 0$ and with $\eta$.

IV. CONCLUDING REMARKS

We have examined evolutionary coalitional games for random access control in wireless networks. We have introduced a cost of cooperation due to information exchange, mini-slot losses, time delays and negotiation frames. Our results clarify that coalition formation is not cost-free and that coalitions will not always be adopted. We have shown that, in some range of parameters, coalitions are formed in the long run. In other ranges of parameters, however, the grand coalition is often unstable for a multi-user wireless network. This approach shows how coalitions can be formed by means of distributed strategic learning, and it finds an evolutionarily stable coalitional structure even when the core is empty.

REFERENCES

[1] N. Abramson. The Aloha system: another alternative for computer communications. Proceedings of the November 17-19, 1970, Fall Joint Computer Conference, pages 281–285, 1970.
[2] R. J. Aumann and J. H. Drèze. Cooperative games with coalition structures. International Journal of Game Theory, 3:217–237, 1974.
[3] F. Y. Edgeworth. Mathematical Psychics. Kegan Paul Publishers, London, 1881. Reprinted in P. Newman (ed.): F. Y. Edgeworth's Mathematical Psychics and Further Papers on Political Economy. Oxford University Press, 2003.
[4] D. B. Gillies. Solutions to general non-zero-sum games. In A. W. Tucker and R. D. Luce (eds.): Contributions to the Theory of Games IV. Princeton University Press, pages 47–85, 1959.
[5] D. B. Gillies. Some theorems on n-person games. Ph.D. thesis, Department of Mathematics, Princeton University, 1953.
[6] M. A. Khan, H. Tembine, and A. V. Vasilakos. Evolutionary coalitional games: design and challenges in wireless networks. IEEE Wireless Communications Magazine, 19(2):50–56, 2012.
[7] K. J. R. Liu. Cognitive radio games. IEEE Spectrum, 48(4):40–56, April 2011.
[8] C. E. Shannon. Two-way communication channels. In Proc. 4th Berkeley Symp. Math. Statist. Probab., 1:611–644, 1961.
[9] L. S. Shapley. Notes on the n-person game III: Some variants of the von Neumann-Morgenstern definition of solution. The Rand Corporation RM-817, 1952.
[10] H. Tembine. Distributed Strategic Learning for Wireless Engineers. CRC Press, 2012.
[11] H. Tembine, E. Altman, R. ElAzouzi, and Y. Hayel. Evolutionary games in wireless networks. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 40(3):634–646, 2010.
[12] J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, Princeton, 1944; 2nd edn. 1947; 3rd edn. 1953.