Collaborative Learning in Strategic Environments

From: AAAI Technical Report SS-02-02. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.

Akira Namatame, Noriko Tanoura, Hiroshi Sato
Dept. of Computer Science, National Defense Academy
Yokosuka, 239-8686, JAPAN
E-mail: {nama, hsato}@cc.nda.ac.jp

Abstract

There is no presumption that the collective behavior of interacting agents leads to collectively satisfactory results. How well agents can adapt to their social environment is different from how satisfactory a social environment they collectively create. In this paper, we attempt to probe a deeper understanding of this issue by specifying how agents interact by adapting their behavior. We consider the problems of asymmetric coordination, which are formulated as minority games, and we address the following question: how do interacting agents realize an efficient coordination without any central authority, through self-organizing macroscopic orders from the bottom up? We investigate several types of learning methodologies, including a new model, give-and-take learning, in which agents yield to others if they gain and randomize their actions if they lose or do not gain. We show that evolutionary learning is the most efficient in asymmetric strategic environments.

Keywords: asymmetric coordination, social efficiency, evolutionary learning, give-and-take learning

1. Introduction

There are many situations where interacting agents can benefit from coordinating their actions. Social interactions pose many coordination problems for individuals. Individuals face problems of sharing and distributing limited resources in an efficient way. Consider a competitive routing problem of networks, in which the paths from sources to destinations have to be established by multiple agents. In the context of traffic networks, for instance, agents have to determine their routes independently, and in telecommunication networks, they have to decide what fraction of their traffic to send on each link of the network. Coordination implies that increased effort by some agents leads the remaining agents to follow suit, which gives rise to multiplier effects. We classify this type of coordination as symmetric coordination [3]. Coordination is also necessary to ensure that agents' individual actions are carried out with little conflict. We classify this type of coordination as asymmetric coordination [7].

Consider the following situation: a collection of agents have to travel using one of the routes A or B. If each agent gains a payoff when he chooses the same route as the majority, this type of coordination is classified as symmetric coordination. If, on the other hand, each agent gains a payoff when he chooses the opposite route to the majority, this type of coordination is classified as asymmetric coordination. Coordination problems are characterized by many equilibria, and agents often face the problem of coordination failure resulting from their independent inductive processes [1][4]. An interesting problem is then under what circumstances a collection of agents realizes a particular stable situation, and whether they satisfy the conditions of social efficiency. In recent years, this issue has been addressed by formulating minority games (MG) [2][10]. However, the growing literature on MG treats agents as automata, merely responding to changing environments without deliberating about individuals' decisions [13]. There is no presumption that the self-interested behavior of agents should usually lead to collectively satisfactory results [8][9]. How well each agent does in adapting to its social environment is not the same thing as how satisfactory a social environment they collectively create for themselves. An interesting problem is then under what circumstances a society of rational agents will realize social efficiency.

Solutions to these problems invoke the intervention of an authority who finds the social optimum and imposes the optimal behavior on agents. While such an optimal solution may be easy to find, it may be difficult to enforce in practical situations. Self-enforcing solutions, where agents achieve an optimal allocation of resources while pursuing their self-interests without any explicit agreement with others, are of great practical importance. We are interested in the bottom-up approach for leading to more efficient coordination with the power of more effective learning at the individual level [11]. Within the scope of our models, agents make deliberate decisions by applying rational learning procedures. We explore the mechanism by which interacting agents become stuck at an inefficient equilibrium. While agents understand that the outcome is inefficient, each agent acting independently is powerless to manage decisions which reflect collective activity.

Agents also may not know what to do or how to make a decision. The design of efficient collective action is crucial in many fields. In collective activity, two types of activities may be necessary: each agent behaves as a member of society, while at the same time it behaves independently by adjusting its view and action. At the individual level, each agent learns to improve its action based on its own observations and experiences. At the same level, agents put forward their learnt knowledge for consideration by others. An important aspect of this coordination is the learning rule adopted by individuals.

2. Formalism of Asymmetric Coordination and Minority Games

The El Farol bar problem and its variants provide a clean and simple example of asymmetric coordination problems among heterogeneous agents. There is a bar called El Farol in the downtown of Santa Fe, and there are agents interested in going to the bar each night. All agents have identical preferences. Each of them will enjoy the night at El Farol very much if there are no more than the threshold number of agents in the bar; however, each of them will suffer miserably if there are more than the threshold number of agents. In Arthur's example, the total number of agents is N=100, and the threshold number is set to 60. The only information available to agents is the number of visitors to the bar on previous nights. What makes this problem particularly interesting is that it is impossible for each agent to be perfectly rational, in the sense of correctly predicting the attendance on any given night. This is because if most agents predict that the attendance will be low (and therefore decide to attend), the attendance will actually be high, while if they predict the attendance to be high (and therefore decide not to attend), the attendance will be low. Arthur investigated the number of agents attending the bar over time by using a diverse population of simple rules. One interesting result was that, over time, the average attendance of the bar is about 60. Agents make their choices by predicting ahead of time whether the attendance on the current night will exceed the capacity, and then taking the appropriate course of action. Arthur examined the dynamic driving force behind this equilibrium.

Arthur's "El Farol" model has been extended in the form of Minority Games (MG), which showed for the first time how equilibrium can be reached using inductive learning [2]. The MG is played by a collection of rational agents G = {A_i : 1 <= i <= N}. Without loss of generality, we can assume N is an odd number. On each period of the stage game, each agent must choose privately and independently between two strategies S = {S1, S2}. We represent the action of agent A_i at the time period t by a_i(t) = 1 if he chooses S1, and a_i(t) = 0 if he chooses S2. Given the actions of all agents, the payoff of agent A_i is given by

    u_i(t) = 1   if a_i(t) = 1 and (1/N) Σ_{1<=j<=N} a_j(t) < 0.5,
             or if a_i(t) = 0 and (1/N) Σ_{1<=j<=N} a_j(t) > 0.5;
    u_i(t) = 0   otherwise.                                          (2.1)
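As a concrete illustration, the minority-reward payoff rule (2.1) can be sketched as follows (a minimal sketch; the function and variable names are ours, not from the paper):

```python
import random

def minority_payoffs(actions):
    """Payoff rule (2.1): an agent earns a unitary payoff exactly when
    the side it chose is the minority side.  With an odd number of
    agents there is always a strict minority."""
    n = len(actions)
    frac_s1 = sum(actions) / n           # fraction choosing S1, i.e. p(t)
    minority_side = 1 if frac_s1 < 0.5 else 0
    return [1 if a == minority_side else 0 for a in actions]

# One round with N = 101 agents acting uniformly at random
random.seed(0)
actions = [random.randint(0, 1) for _ in range(101)]
payoffs = minority_payoffs(actions)
print(sum(payoffs))   # size of the minority side, at most (N - 1) // 2 = 50
```

Because N is odd, the fraction on either side is never exactly 0.5, so the minority side is always well defined.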

Each agent first receives aggregate information p(t), which represents the fraction of agents who choose S1 at the time period t, and then he decides whether to choose S1 or S2. Each agent is rewarded with a unitary payoff whenever the side he chooses happens to be chosen by the minority of the agents, while agents on the majority side get nothing. All agents have access to public information on the record of past histories p(τ), τ < t. The past history available at the time period t is represented by μ(t). How do agents choose actions under the common information μ(t)? Agents may behave differently because of their personal beliefs on the outcome of the next time period p(t+1), which depends only on what agents do at the next time period t+1; the past history μ(t) has no direct impact on it.

We analyze the structure of the MG to see what we should expect. Social efficiency can be measured from the average payoff of one agent over a long time period. Consider the extreme case where only one agent takes one side, and all the others take the other side at each time period. The lucky agent gets a reward, the others get nothing, and the average
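The bookkeeping of the public information can be sketched as follows (the class name and the fixed memory length are our assumptions, not from the paper): each round records p(t), and μ(t) is taken here to be the last few recorded values.

```python
from collections import deque

class PublicRecord:
    """Bookkeeping for the aggregate information p(t) (the fraction of
    agents choosing S1) and the recent history mu(t) that agents
    condition on.  Name and memory length are illustrative assumptions."""
    def __init__(self, memory=3):
        self.history = []                   # full record p(tau), tau < t
        self.window = deque(maxlen=memory)  # recent history, i.e. mu(t)

    def record(self, actions):
        p_t = sum(actions) / len(actions)
        self.history.append(p_t)
        self.window.append(p_t)

    def mu(self):
        """History available to every agent at the current period."""
        return list(self.window)

record = PublicRecord(memory=3)
for actions in ([1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 1, 1]):
    record.record(actions)
print(record.mu())   # the last three values of p(t)
```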

payoff per agent is 1/N. An equally extreme situation is when (N-1)/2 agents are on one side and (N+1)/2 agents are on the other side, where the average payoff is about 0.5. From the society's point of view, the latter situation is preferable.

The MG is characterized by many solutions. It is easy to see that this game has (N-1)/2 asymmetric Nash equilibria in pure strategies in the case where exactly (N-1)/2 agents choose either one of the two sides. The game also presents a unique symmetric mixed-strategy Nash equilibrium in which each agent selects the two sides with equal probability. With this mixed strategy, each agent can expect the payoff 0.5 on each time period, and the society payoff follows a binomial distribution with the mean equal to N/2 and the variance N/4. The variance is also a measure of the degree of social efficiency: the higher the variance, the higher the magnitude of the fluctuations around N/2, and the higher the corresponding aggregate welfare loss. Several learning rules have been found to lead to an efficient outcome when agents learn from each other [2][15].

How exactly does an agent's utility depend on the number of total participants? We now show that the MG can be represented as a 2x2 game in which an agent plays with the aggregate of the society with the payoff matrix in Table 1. Suppose each agent plays with all other agents individually with the payoff matrix in Table 1. The average payoffs of agent A_i from playing S1 and S2 against one agent are given by

    U_i(S1) = 1 - (1/N) Σ_{1<=j<=N} a_j(t)
    U_i(S2) = (1/N) Σ_{1<=j<=N} a_j(t)                               (2.2)

Table 1  The payoff matrix of the minority game

                   S1 (go)    S2 (stay)
    S1 (go)          0            1
    S2 (stay)        1            0
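The statistics quoted for the symmetric mixed-strategy equilibrium can be checked with a short simulation (a sketch; the choices of N and the number of rounds are ours):

```python
import random
import statistics

# Under the symmetric mixed-strategy equilibrium, each agent chooses S1
# with probability 1/2, so the number of agents on the S1 side follows a
# Binomial(N, 1/2) distribution: mean N/2 and variance N/4.
random.seed(1)
N, T = 101, 20000
counts = [sum(random.random() < 0.5 for _ in range(N)) for _ in range(T)]

print(statistics.mean(counts))       # close to N/2 = 50.5
print(statistics.pvariance(counts))  # close to N/4 = 25.25
```

The sample variance measures exactly the fluctuation around N/2 that the text identifies with aggregate welfare loss.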
