2005 IEEE International Conference on Systems, Man and Cybernetics Waikoloa, Hawaii October 10-12, 2005

Decision-Theoretic Proactive Communication in Multi-Agent Teamwork

Yu Zhang
Department of Computer Science, Trinity University, San Antonio, TX 77812, USA
[email protected]

Thomas R. Ioerger and Richard A. Volz
Department of Computer Science, Texas A&M University, College Station, TX 77843-3112, USA
{ioerger, volz}@cs.tamu.edu

Abstract - Despite significant progress on multi-agent communication decisions, existing research does not address two important issues: 1) how to optimize the timing and currency of communication when dealing with dynamically changing information, and 2) what kinds of communication decision interdependency exist in a team and how this interdependency impacts a single agent's communication decision. To provide a more comprehensive solution to the communication problem, we present a new model, DTPC (Decision-Theoretic Proactive Communication), which uses statistical analysis of the information production and needs of team members and a dynamic decision-theoretic determination of communication policies. We show experimentally how DTPC decreases communication load while improving the performance of distributed teamwork.

Keywords: Multi-agent systems, agent communication, teamwork, decision theory

1 Introduction

Teamwork is a cooperative effort by a team of agents to achieve a joint goal [7]. Recent research in agent communication has provided a range of successful models that trade off communication complexity and efficiency. These models vary from making domain-dependent assumptions [1] or relying on heuristics [9, 3], to finding approximate communication decision solutions without domain-dependent assumptions or heuristics [4, 6, 10, 7]. However, two important issues in communication have not been addressed. First, how to optimize the timing and currency of communication when dealing with changing information. Information changes dynamically in the environment, and the degree to which it is used may differ as well. For some information, agents must consume every change (e.g., a newly identified enemy target); for other information, agents do not have to process each change (e.g., the current location of a friendly aircraft). Agents need to check every production of the first type of information, whereas checking the second type depends on agents' needs [5, 10]. Second, what kinds of decision interdependencies exist in a team and how they impact a single teammate's communication decision. Therefore, a

0-7803-9298-1/05/$20.00©2005 IEEE

more comprehensive solution that supports effective communication during teamwork is of particular importance. This paper introduces a new model called DTPC (Decision-Theoretic Proactive Communication). Proactivity is the ability to take initiative by exhibiting goal-directed behavior [6]. An effective team can often respond to external stimuli in a timely way, and it can also knowingly prepare for unexpected future events [5, 10]. Hence, the ability to anticipate the information needs of teammates and assist them proactively is highly desirable. While an agent can anticipate certain information needs of teammates, it may not always be able to predict all of their needs, especially if the team interacts with a dynamic environment. Therefore, when an agent needs some information, it is also necessary to anticipate the information production of teammates and ask for the information actively. Proactive communication allows agents to tell others proactively about a piece of information when producing it, or to ask actively for a piece of information when needing it. Proactive communication increases communication effectiveness in three ways. First, messages are conveyed to agents when they need an information item, rather than always being sent to them. Second, proactive telling can partially eliminate the need to ask. Third, in the absence of proactive telling, active asking can eliminate multiple asks, i.e., an agent asks only one provider per need.

2 Basics

Communication generally involves two parties: a needer and a provider. For clarity, we will assume that the system consists of two agents: a, a provider, and b, a needer. The ideas can be extended readily to larger numbers of agents if desired. A needer must decide what to do when it needs or receives an information item I, while a provider must decide what to do when it produces I or receives a request for I. Table 1 delineates the different situations that each needer and provider will face when making decisions, and the policies they might employ in these situations [11]. In order to make their communication decisions, agents need to consider the relationship between the time at which information is needed and the time at


which it is produced. The various policies involve using information produced at different times or satisfying needs at different times. Thus, to adequately describe the range of possibilities encompassed by the different policies, several different time points must be defined.

Table 1. Situations and Policies
Situation PA - provider produces I. Policies: ProactiveTell: a proactively provides I; Silence: a does not provide I.
Situation PB - provider receives a request for I. Policies: Reply: a provides the most recent value for I; WaitUntilNext: a waits until the next production of I and then provides I.
Situation NA - needer has a need for I arise. Policies: ActiveAsk: b actively asks for I; Silence: b does not ask for I and uses the most recent value it has; Wait: b waits to be told I proactively.
Situation NB - needer receives I. Policy: Accept: b accepts I (but will not notify a).
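The situation/policy pairs of Table 1 can be captured as a small lookup table. The following sketch (the names are ours, not the paper's) simply enumerates the candidate policies available to an agent in each situation:

```python
# Candidate policies per situation, following Table 1.
# Situation codes: PA/PB are faced by the provider, NA/NB by the needer.
CANDIDATE_POLICIES = {
    "PA": ["ProactiveTell", "Silence"],      # provider produces I
    "PB": ["Reply", "WaitUntilNext"],        # provider receives a request for I
    "NA": ["ActiveAsk", "Silence", "Wait"],  # needer has a need for I arise
    "NB": ["Accept"],                        # needer receives I (deterministic)
}

def candidate_policies(situation):
    """Return the list of policies an agent may choose in the given situation."""
    return CANDIDATE_POLICIES[situation]
```

Note that only situation NB has a single entry, which is why Section 3 treats the needer's choice there as deterministic.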

3 DTPC Model

We need to generalize the notion of situation slightly to include the specific item of information I with respect to which it occurs. Let t be the time of occurrence of a situation. We denote a situation by St = (SU, I) for SU ∈ {PA, PB, NA, NB} and I ∈ the set of information items. The policy, denoted δ, used to respond to a situation St is also relevant. There are two time points closely related to information need and production: tn, the time at which a need for I occurs, and tp, the production time of the value for I that is provided for the need at tn. Because obtaining I may involve sending messages, these messages are also part of our model. Then, letting E denote a finite set of states, we define our model DTPC as the tuple (E, {St}, {δ}, {tn}, {tp}, M, U), where:
- E = {e} is a finite set of states. Each state e = (εa, εb), where εi ∈ Ei, i ∈ {a, b}, are the local states of the corresponding agents.
- {St} is the set of possible situations occurring at t.
- {δ} is a finite set of policies.
- {tn} = N (the natural numbers) is the set of times at which a need may occur.
- {tp} = N is the set of production times of I which are provided for the needs at {tn}.
- M is the set of messages. A special message in M is the null message, denoted ϕ, which is chosen by an agent that does not want to communicate with the other agents.
- U is a utility function assigning a value to the use of a specific policy δ.

Table 2 defines three sets of relevant points in time, for situations PA, PB and NA, respectively. Situation NB is not included because in this case agent b's choice is deterministic: it Accepts I. However, agent b may or may not use I, depending on the time at which b needs I. This uncertainty makes it difficult for a to estimate the time of the need.

Table 2. Relevant Time Points and Their Relations
PA:
- T^0_{a,P}: the time at which a produces a value for I.
- {T^1_{a,P}, T^2_{a,P}, …}: the (ordered) set of times at which a will produce I in the future.
- T^ls_{a,P}: the time of the last value for I a sent to b.
- T_{b,N}: the time of a need after T^ls_{a,P}.
Relations: T^ls_{a,P} < T^0_{a,P} < T^1_{a,P} < T^2_{a,P} < …; T^ls_{a,P} < T_{b,N}.
PB:
- T_{b,q}: the time at which b requests I.
- T^q0_{a,P}: the latest production time before T_{b,q}.
- T^q1_{a,P}: the time at which a produces I subsequent to T_{b,q}.
Relations: T^q0_{a,P} ≤ T_{b,q} < T^q1_{a,P}.
NA:
- T^0_{b,N}: the time at which b's most recent need for I arises.
- T^a0_{a,P}: the time at which a most recently produced I.
- T^a1_{a,P}: the time at which a produces I next.
- T_{b,r}: the time at which b most recently received a value for I.
Relations: T_{b,r} ≤ T^a0_{a,P} ≤ T^0_{b,N} < T^a1_{a,P}.

4 Utility

The utility function is a mapping from an agent's internal states, the current situation, communication policies, the time points tn and tp, and messages in M to a real number:

U: e × St × δ × tn × tp × {m} → R,

where {m} is the set of messages used by the policy δ.

4.1 Modeling the Utility Function

A combination of situation and policy brings about different time points defined in Table 2 to be used for tp and tn. Some of these time points depend on a sequence of decision interactions that has taken place or will take place in the future. The difficulty is that one has to provide values for tp and tn at time t without knowing this sequence. To provide an approximation for the utility function, we use the expected value of the utility function with respect to tp and tn, using


probability mass functions obtained via the Empirical Distribution Function (EDF) [2] process. The distributions of tp and tn can be arbitrarily complex. If we have no other information, the entire sample is the best estimate of the population, as long as the current samples are randomly generated according to the actual distribution. This feature allows us to take advantage of previous data and to use the EDF to estimate the distributions. In order to make the interval lists known, each request by a needer is accompanied by the change in the need list since the last request, and each tell of the information is accompanied by the change in the production list since the last tell. For a given policy δ, {m} is known. Thus, for a given decision point for St, δ, and e, U becomes a function of two variables, tp and tn. The expected value of U may then be computed as:

E(U) = ∫_{tps}^{∞} ∫_{tns}^{∞} Pr(tp, tn) × U(tp, tn) dtn dtp,

where, for each possible policy, tp and tn are replaced by the variables of Table 2, and tps and tns are the lower limits of these variables, based on the time relationships described in Table 2. Since we are only able to determine a discrete approximation to the probability using the EDF, we use a discrete approximation to this integral. The details of this approximation, however, are beyond the scope of this paper.
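A discrete version of this expectation can be sketched as follows. For simplicity, this sketch assumes (our assumption, not stated in the paper) that tp and tn are independent, so the joint mass Pr(tp, tn) factors into two empirical marginals built from observed production and need times:

```python
from collections import Counter

def empirical_pmf(samples):
    """Turn a list of observed times into an empirical probability mass function."""
    counts = Counter(samples)
    n = len(samples)
    return {t: c / n for t, c in counts.items()}

def expected_utility(prod_times, need_times, utility):
    """Discrete approximation of E(U) = sum over (tp, tn) of Pr(tp, tn) * U(tp, tn),
    assuming tp and tn are independent (a simplification of this sketch)."""
    pmf_p = empirical_pmf(prod_times)
    pmf_n = empirical_pmf(need_times)
    return sum(pp * pn * utility(tp, tn)
               for tp, pp in pmf_p.items()
               for tn, pn in pmf_n.items())
```

For example, with production samples [1, 1, 3], a single need sample [2], and U(tp, tn) = tp, the empirical marginal for tp is {1: 2/3, 3: 1/3} and the expectation is 5/3.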

4.2 Quantifying the Utility Function

Three factors generally need to be included in the form of the utility function: 1) the value V gained from the information delivered; 2) the cost C of sending a message, caused by resource constraints (network bandwidth constraints, etc.); and 3) the risk R due to unwanted revelation of the information, such as being overheard by an enemy on a battlefield. We define utility as the difference between V and the sum of C and R. R is generally situation-dependent (see Section 7), so defining C and V is our immediate task. The cost of sending messages {m} is assumed to be:

C({m}) = 0 if {m} = ϕ, and C({m}) = k0 + k1 × len({m}) otherwise,

where len({m}) is the length of the messages {m}, and k0 and k1 are coefficients. We measure the value gained by having I by two factors: the currency of the information to the need and the timeliness of the fulfillment of the need:

V(St, δ, tn, tp) = Ts(tn, tp) × P(St, δ, tn, tp) + Tf(tn, tp) × (1 − P(St, δ, tn, tp)),

where Ts denotes the reward, in terms of timeliness, for successfully using the most recent I, Tf denotes the reward for using old I, and P denotes the probability of using the most recent I.
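The cost and value terms above can be written down directly. In this sketch Ts, Tf, and P are passed in as functions, since their forms are domain dependent; the dependence of V and P on St and δ is omitted for brevity, and k0 and k1 are illustrative defaults:

```python
NULL_MESSAGE = None  # stands in for the null message phi

def message_cost(messages, k0=1.0, k1=0.1):
    """C({m}) = 0 if {m} is the null message, else k0 + k1 * len({m})."""
    if messages is NULL_MESSAGE:
        return 0.0
    return k0 + k1 * len(messages)

def information_value(tn, tp, Ts, Tf, P):
    """V = Ts * P + Tf * (1 - P): the reward for the most recent value, weighted
    by the probability P that it is still current, plus the reward for old I."""
    p = P(tn, tp)
    return Ts(tn, tp) * p + Tf(tn, tp) * (1.0 - p)
```

The null-message branch is what makes Silence policies free of communication cost; only V and R then distinguish them from the telling policies.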

We assume that the timeliness Ts can be represented by a function fs of the time difference between tp and tn, and we consider fs to be a decreasing function of the time difference. First, we define a function d(tp, tn) = max(0, tp − tn). We then define a non-increasing function fs and use fs(d(tp, tn)) as the timeliness function. fs may have various forms. For example, it might decrease exponentially, or it might be constant for a length of time and zero thereafter, indicating that the information must be consumed within a finite length of time or it is useless. Tf is domain dependent. In some circumstances, old information still has value. For example, if an item (say, an enemy troop location) has not been processed, it still has value, though this case might reduce to a more simplistic decision algorithm (always send I to a needer). Thus, at the highest level, we represent Tf by a function Tf = ff() that expresses the pertinent factors. There are many forms that ff could take for different types of I; this provides flexibility in defining the currency based on the various focuses of different domains. Finally, we consider the currency of the information. The general idea we use for developing a model of currency is that the value of I produced at time tp should not change between the time it is produced and the time it is used to satisfy the need at tn. Let tu be the time at which I is used by the needer for the need at time tn; then tu = max(tn, tp). We consider the probability that the value does not change in the time interval (tp, tu] and use that as the basis for defining the currency function P:

P(St, δ, tn, tp) = Pr(¬∃τ ∈ Int(tp, tu] ∋ IP(τ) | St ∧ δ),

where Int(tp, tu] denotes the interval between the two time points, noting that the time order is unspecified, and IP(τ) denotes the production of a value for I at τ.
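One simple instantiation of the currency function P (our assumption, not the paper's) models value changes as a memoryless per-step event with change probability q; the probability that I does not change in (tp, tu], with tu = max(tn, tp), is then geometric:

```python
def currency_probability(tn, tp, q):
    """P = probability that no new value of I is produced in (tp, tu],
    assuming (our assumption) an independent per-step change probability q
    and integer time steps, with tu = max(tn, tp)."""
    tu = max(tn, tp)
    return (1.0 - q) ** (tu - tp)
```

When tp ≥ tn the interval is empty and P = 1, matching the intuition that a value produced at or after the need is certainly current.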

5 Multi-Agent Communication Processes

Fig. 1 shows finite state diagrams representing the communication processes of getting and telling an information item, respectively. Each node represents a decision point. As one proceeds through the graph, the nodes represent alternating decisions by the needer and the provider. The nodes marked "e" are special in the sense that they represent the receipt of the information. For example, in situation PA, a may receive an ActiveAsk from b while deciding to keep Silence toward b. In such a case, the state transfers to the start state of situation PB, the situation where a receives a request from b. In this case, a needs to update its data about b's need time and decide whether to Reply to b right away or to WaitUntilNext production. With either decision that a makes, b is able to receive the information item.


[Fig. 1 state diagram, situation PA panel: Provider produces a new piece of information. Edges a-b: ProactiveTell / Silence; b-a: ActiveAsk / Wait / Accept / Silence. Legend: a: provider; b: needer; e: end; t: transfer.]

Fig. 3 and 4 show algorithms for providing a piece of information to a needer and getting a piece of information from a provider, respectively. Generally, agents select the policy that has the maximum expected utility and act according to it and their counterpart's response. We set a time cutoff To to guarantee that the system does not enter a state of waiting forever. Thus, we need secondary decisions when the delay has expired. The algorithms simply loop back to the policy-selection point in such cases, but with the additional information that the timeout occurred. Since time has passed and data has been updated, each policy may generate a different utility.

/* Executed when the provider is in situation PA at time t. Let pendWUNList be
   a list of needers whose requests will be replied to with the WaitUntilNext
   production. */
provideNeededInfo(provider, needers, I, t){
    updateSelfData(provider, I, t);
    if (pendWUNList != null){
        reply I to A0;  // the first needer on pendWUNList
        updateOtherData(A0, I, t);
        remove A0 from pendWUNList;
        exit;
    }
    δi = selectPolicy(provider, needers, I);
    switch(δi){
        case ProactiveTell:
            ProactiveTell needersi;
            updateOtherData(needersi, I, t);
            break;
        case Silence:
            Silence;
            break;
    }
}
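The timeout-and-reselect behavior described above can be sketched as a small driver loop. The function names here (select_policy, execute) are stand-ins for the paper's algorithms, and max_rounds is our own safeguard, not part of the model:

```python
def decide_with_timeout(select_policy, execute, timeout_steps, max_rounds=10):
    """Re-run policy selection whenever a wait times out, feeding the timeout
    back in as extra information, as the IAPIE algorithms do.

    select_policy(timed_out) -> a policy; its utilities may differ after a timeout.
    execute(policy, timeout_steps) -> (obtained, waited): whether I was obtained
    and whether the policy left the agent waiting past the cutoff."""
    timed_out = False
    for _ in range(max_rounds):
        policy = select_policy(timed_out)
        obtained, waited = execute(policy, timeout_steps)
        if obtained or not waited:
            return obtained
        timed_out = True  # loop back to the policy-selection point
    return False
```

For instance, an agent that first chooses Wait, times out, and then reselects ActiveAsk ends up obtaining the information on the second round.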

[Fig. 1 state diagrams, continued: Situation PB: Provider receives a request for a piece of information (a-b: Reply / WaitUntilNext / Silence; b-a: ActiveAsk / Wait); Situation NA: Needer needs a piece of information (b-a: ActiveAsk / Silence / Wait; a-b: ProactiveTell); Situation NB: Needer receives a piece of information (b-a: Accept).]

Fig. 1. Decision-making processes of provider and needer

/* Executed when the provider is in situation PB at time t. */
receiveRequest(provider, needer, I, t){
    δi = selectPolicy(provider, needer, I);
    switch(δi){
        case Reply:
            Reply needer;
            updateOtherData(needer, I, t);
            break;
        case WaitUntilNext:
            add needer to pendWUNList;
            break;
    }
}

6 IAPIE

IAPIE (Inter-Agent Proactive Information Exchange) is the overall process for managing communication. It has three parts: an algorithm for selecting a policy, algorithms for getting needed information, and algorithms for providing information. Fig. 2 shows the algorithm for selecting a policy.

/* Self is the agent making the decision; counterparts is the set of agents
   about whom the decision is made. */
selectPolicy(self, counterparts, I){
    policyList = null;
    ∀ Agi ∈ counterparts {
        ∀ policy δi {
            identify(self, Agi, δi, I);
            U(δi) = evaluate(self, Agi, δi, I);
        }
        select one δi with maximum U;
        add δi to policyList;
    }
    select one δi with maximum U from policyList;
    return δi;
}

Fig. 2. A strategy selection algorithm

Fig. 3. Algorithms for providing needed information

7 Experiments

In the experiments introduced in this section, we used the Multi-Agent Wumpus World domain [11]. The world is 20 by 20 and has 4 wumpuses. Each wumpus is assigned an 8 by 8 hearing region when it is generated. All wumpuses have a 10% probability of hearing sounds (messages) within their region. A wumpus does not always hear messages because it does not always focus on hearing (e.g., it sleeps at times). However, once a wumpus overhears a request message from a fighter to a carrier, it will be alerted and will


focus on the coming reply from the carrier. Once a wumpus is alerted by a message, it can sense the adjacent fighter and start to fight with it. The wumpus has a 10% chance to win the fight. Therefore, if more information has been sent, the chances are greater that fighters may be killed, and consequently fewer wumpuses will be killed in the limited length of time. Moreover, the game may be forced to end before the time limit if there are no fighters left.

In the domain utility function, k is the number of wumpuses a fighter can kill per step, t is the number of steps passed, Prh is the probability that the wumpus whose location is sent by the carrier hears the message, and Prf is the probability that the wumpus can win against the fighter who receives the message. To give a number to k for initial tests, we ran the system in a trial mode, and k was estimated from the data collected in previous test runs. We define the form of the timeliness function fs as:

fs = 0 if tp < tn; fs = k(tp − tn) otherwise.

The rationale for this form of timeliness is that the further in the future the used value will be produced, the more the needer loses timeliness while waiting. Hence this form is a non-decreasing function of tp − tn. ff is domain dependent and can be determined on many different bases. In this domain, the wumpuses periodically jump to some other random location, and the duration of their stay in one place is also random. Therefore, once a wumpus jumps (meaning the information changes), there is no gain for the fighter in chasing that wumpus. Hence the value of communicating the old wumpus location is ff = 0. The probability of currency P in this implementation is not exactly the same, because of the unique characteristics of the domain; however, it rests on the same idea we worked out in Section 4.2. In this domain, it is important for a fighter to arrive at a found wumpus' location before the wumpus moves. A found wumpus' location is relevant if the fighter is going to kill the wumpus before the wumpus jumps. Therefore, in this domain, tu represents the time at which the fighter arrives at the wumpus' location. We take P to be the probability that the wumpus does not jump in the interval (tp, tu].

/* Executed when the needer is in situation NA at time t. */
getNeededInfo(needer, providers, I, t){
    set time cutoff To; waitTime = 0;
    boolean obtained = FALSE, waiting = FALSE;
    updateSelfData(needer, I, t);  // update self need time
    δi = selectPolicy(needer, providers, I);
    switch(δi){
        case Silence:
            Silence;  // use the most recent value it has
            break;
        case ActiveAsk:
            ActiveAsk providersi;
            if providersi sends Reply
                receiveInfo(providersi, I, t);
            else  // WaitUntilNext
                Wait; waiting = TRUE;
            break;
        case Wait:
            Wait; waiting = TRUE;
            break;
    }
    if (waiting)
        while ((!obtained) && (waitTime
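The domain instantiation of timeliness and currency described above can be sketched directly. The constant k and the per-step jump probability are illustrative values, and the per-step jump model is our assumption for the random jump behavior:

```python
def timeliness_fs(tp, tn, k):
    """Domain timeliness: 0 if the value is produced before the need,
    else k * (tp - tn), growing with how far in the future it is produced."""
    if tp < tn:
        return 0.0
    return k * (tp - tn)

def wumpus_currency(tp, t_arrive, jump_prob):
    """Probability that the wumpus does not jump in (tp, t_arrive], assuming
    (our assumption) an independent per-step jump probability."""
    return (1.0 - jump_prob) ** max(0, t_arrive - tp)
```

If the fighter arrives two steps after the location was produced and the wumpus jumps with probability 0.5 per step, the location is still valid with probability 0.25; the old-information value ff = 0 means a stale location contributes nothing.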