Learning to Communicate in Decentralized Systems


M. Allen¹, C. V. Goldman², S. Zilberstein¹

¹ Department of Computer Science, University of Massachusetts, Amherst
² Caesarea Rothschild Institute, University of Haifa, Israel

Workshop on Multiagent Learning, Twentieth National Conference on Artificial Intelligence (AAAI-05)


Outline

Framing the Problem
- Multiagent Communication
- Decentralized MDPs with Communication
- Communication and Translation

Solving the Problem: Approaches and Results
- The Bayesian Approach
- Reduction to Multi-Agent MDPs
- Solving Suitable Problems
- Empirical Results


Motivation

- Decentralized decision-making is a provably hard problem, but one that arises in many common contexts.
- Automated systems whose components can communicate can radically simplify such tasks.
- Systems may need to cooperate in situations where a common language is not initially available.
- Robust systems must be able to recover from specification errors or unforeseen situations by learning new understandings of one another's communications.


An Overview of the Problem

Decision-making in decentralized and multiagent systems depends upon a number of factors:

- How system states affect what agents observe.
- How agents' actions affect system states.
- The information that agents share by communicating.
- Agents' beliefs about their environment.
- Agents' beliefs about the meanings of communications.


[Figure: Belief-state update. For each agent, at each time-step t1, t2, t3 (states s1, s2, s3), the policy of communication selects a message m from the observation o, the policy of action selects an action a, and the agent's belief-state β is updated through its translation τ.]

The Dec-MDP-Com Model

An n-agent decentralized MDP with direct communication, M = ⟨S, A, P, R, Σ, C_Σ, Ω, O, T⟩:

- S is a finite set of states, with initial state s0.
- A = {A_i | A_i is a finite set of actions a_i for agent α_i}.
- P is a transition function for state-pairs and joint actions.
- R is a global reward function for state-action transitions.
- Σ = {Σ_i | Σ_i is a finite set of messages σ_i for α_i}.
- C_Σ gives the cost of each transmitted message.
- Ω = {Ω_i | Ω_i is a finite set of observations o_i for α_i}.
- O is an observation function over agents, observations, and state-action transitions.
- T is the time-horizon (finite or infinite) of the problem.
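As a rough illustration, the tuple can be written down as a container class. This is a minimal sketch, not the authors' formulation: the field names and the toy instance below are invented for the example.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class DecMDPCom:
    """The Dec-MDP-Com tuple M = <S, A, P, R, Sigma, C_Sigma, Omega, O, T>."""
    states: List[str]                                  # S, with initial state states[0]
    actions: Dict[str, List[str]]                      # A: per-agent finite action sets
    P: Callable[[str, Tuple[str, ...], str], float]    # P(s, joint_action, s'): transition prob.
    R: Callable[[str, Tuple[str, ...], str], float]    # global reward for the transition
    messages: Dict[str, List[str]]                     # Sigma: per-agent message sets
    msg_cost: Callable[[str], float]                   # C_Sigma: cost of sending a message
    observations: Dict[str, List[str]]                 # Omega: per-agent observation sets
    O: Callable[..., float]                            # observation function
    horizon: float                                     # T (may be math.inf)

# A tiny two-agent instance with trivial dynamics, for illustration only:
toy = DecMDPCom(
    states=["s0", "s1"],
    actions={"a1": ["wait", "pump"], "a2": ["wait", "open"]},
    P=lambda s, ja, s2: 1.0 if s2 == "s1" else 0.0,
    R=lambda s, ja, s2: 1.0 if s2 == "s1" else 0.0,
    messages={"a1": ["m_pump"], "a2": ["m_open"]},
    msg_cost=lambda sigma: 0.0,                        # freely-describable: all messages cost 0
    observations={"a1": ["o_low"], "a2": ["o_high"]},
    O=lambda *args: 1.0,
    horizon=math.inf,
)
print(toy.msg_cost("m_pump"))  # 0.0
```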


An Important Constraint

- Dec-MDP-Coms (unlike Dec-POMDP-Coms) restrict the observation function O, so that they are jointly fully observable.
- Agents' combined observations fix the global state.
- That is, if the observation probability O(o1, …, on | s, a1, …, an, s′) ≠ 0, then the state s′ is determined by the observations (o1, …, on).
- Each agent alone does not observe the entire state; rather, the observations of all agents taken together determine it.


An Example: The Pumps Domain

- 2 agents, each controlling n pumps and m flow-valves.
- Each agent separately observes fluid entering along different inflow ducts, along with its own pumps and valves.
- Task: maximize flow out of the system while minimizing the number of outflow ducts used.
- Probabilistic effects: pumps and valves are susceptible to variations in throughput.


The Pumps Domain as Dec-MDP-Com

M = ⟨S, A, P, R, Σ, C_Σ, Ω, O, T⟩ has elements as follows:

- S: the state set is described by flow through the two inflow ducts, two sets of pumps, and two sets of valves.
- A: each agent chooses an action to control its pumps (on, off, forward, back) or valves (open, shut).
- P: the transition function directs flow according to the actions taken; pumps and valves may fail to respond, probabilistically.
- R: the total reward is proportional to outflow relative to inflow, discounted by the number of ducts used.
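One way to read the reward description is as a simple function of flow and duct usage. This is a sketch only: the slides do not specify the discounting scheme, so the linear duct penalty and its weight below are assumptions.

```python
def pump_reward(outflow: float, inflow: float, ducts_used: int,
                duct_penalty: float = 0.1) -> float:
    """Reward proportional to outflow relative to inflow, discounted by the
    number of outflow ducts used. The penalty form and weight are assumed."""
    if inflow <= 0:
        return 0.0
    return outflow / inflow - duct_penalty * ducts_used

print(pump_reward(8.0, 10.0, 2))  # 0.6
```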


The Pumps Domain as Dec-MDP-Com (II)

- Σ: each agent possesses messages corresponding to each possible action, each pump/valve in the system, and observed units of inflow.
- C_Σ: the cost of each message is zero (0).
- Ω, O: each agent observes just its own inflow duct, and all pumps and valves it controls.
- T: the problem has an infinite time-horizon.


Learning to Communicate

- Where agents understand one another, optimal linguistic action involves deciding what and when to communicate, based on cost-benefit analysis.
- Where agents do not fully understand one another, simple message-passing is not enough.
- Agents need to learn how to respond to messages.
- This raises a limited, but interesting, question of what messages mean.


Translations

We represent the degree to which agent α_i understands agent α_j by way of a probabilistic correspondence between messages sent by α_j and those that α_i might itself send.

Definition (Translation). Let Σ and Σ′ be sets of messages. A translation τ between Σ and Σ′ is a probability function over message-pairs: for any messages σ, σ′, τ(σ, σ′) is the probability that σ and σ′ mean the same thing. τ⁺_{Σ,Σ′} is the set of all translations between Σ and Σ′.
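Concretely, a translation over finite message sets can be held as a table of pairwise probabilities. A minimal sketch; the message names below are invented for illustration:

```python
# A translation tau between Sigma = {"red", "blue"} and Sigma' = {"rouge", "bleu"}.
# tau[(sigma, sigma_prime)] is the probability that the two messages mean the same.
tau = {
    ("red", "rouge"): 0.9,
    ("red", "bleu"): 0.1,
    ("blue", "rouge"): 0.2,
    ("blue", "bleu"): 0.8,
}

def translate(tau: dict, sigma: str, candidates: list) -> str:
    """Pick the most likely interpretation of a received message sigma."""
    return max(candidates, key=lambda s2: tau.get((sigma, s2), 0.0))

print(translate(tau, "red", ["rouge", "bleu"]))  # rouge
```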


Belief-States

Agents need to consider multiple possible translations between messages, and possess beliefs regarding which translation might be correct.

Definition (Belief-state). Let agents α1 and α2 utilize message-sets Σ1 and Σ2, respectively. A belief-state for agent α_i is a probability function β_i over the set of translations τ⁺_{Σ_i,Σ_j} (i ≠ j). That is, for any translation τ between Σ_i and Σ_j, β_i(τ) is the probability that τ is correct.


Belief-States (II)

We can thus talk about an agent's beliefs about meanings:

- Translations (τ): distributions over message-pairs.
- Belief-states (β_i): distributions over translations.
- Given any pair of messages, agent α_i assigns that pair a likelihood, β⁺_i, of having the same meaning:

  β⁺_i(σ_i, σ_j) = Σ_{τ ∈ τ⁺_{Σ_i,Σ_j}} β_i(τ) · τ(σ_i, σ_j)

- Learning to communicate is the process of updating belief-states about translations.
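The marginal likelihood β⁺_i can be computed directly by summing over the candidate translations. A sketch; the two candidate translations and their belief weights are invented:

```python
# Two candidate translations over one message pair, weighted by belief.
tau_a = {("red", "rouge"): 1.0}   # translation that says "red" = "rouge"
tau_b = {("red", "rouge"): 0.0}   # translation that says they differ
belief = {"tau_a": 0.7, "tau_b": 0.3}
translations = {"tau_a": tau_a, "tau_b": tau_b}

def beta_plus(belief: dict, translations: dict, sigma_i: str, sigma_j: str) -> float:
    """beta_i^+(sigma_i, sigma_j) = sum over tau of beta_i(tau) * tau(sigma_i, sigma_j)."""
    return sum(belief[name] * tau.get((sigma_i, sigma_j), 0.0)
               for name, tau in translations.items())

print(beta_plus(belief, translations, "red", "rouge"))  # 0.7
```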



Bayesian Updates for Communication Learning

Agents base their translations upon the outcomes of actions taken in response to messages.

- Actions are chosen in response to local observations, messages received, and current belief-states.
- In the predictive phase, agents generate possible next belief-states, based upon their current beliefs and the action they have chosen.
- The actions chosen by all agents together lead to state-transitions and new observations.
- In the retrospective phase, following action, agents adjust belief-states in response to observed outcomes.
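The retrospective phase can be sketched as a Bayes update over candidate translations. This is a simplified illustration: the likelihood function and its values below are assumptions, not taken from the slides.

```python
def bayes_update(belief: dict, likelihood: dict) -> dict:
    """Retrospective phase (sketch): reweight each candidate translation tau by
    the likelihood of the observed outcome under tau, then renormalize."""
    posterior = {tau: belief[tau] * likelihood[tau] for tau in belief}
    z = sum(posterior.values())
    return {tau: p / z for tau, p in posterior.items()}

# Prior belief over two candidate translations, and the (assumed) probability
# each assigns to the outcome the agents just observed.
prior = {"tau_a": 0.5, "tau_b": 0.5}
outcome_likelihood = {"tau_a": 0.9, "tau_b": 0.3}

posterior = bayes_update(prior, outcome_likelihood)
print(posterior["tau_a"])  # 0.75
```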


Special Properties of Dec-MDP-Coms

Under certain conditions, Dec-MDP-Coms can be reduced to somewhat simpler problems.

Definition (Fully-describable). A Dec-MDP-Com is fully-describable if each agent possesses a language sufficient to communicate both (a) any observation it makes, and (b) any action it takes.

Definition (Freely-describable). A Dec-MDP-Com is freely-describable if the cost of communicating any message σ is 0.


Reduction to Multi-Agent MDPs

Claim (Reduction). A Dec-MDP-Com is equivalent to a multi-agent MDP if (a) it is fully- and freely-describable, and (b) all agents share a common language.

- Such a Dec-MDP-Com is reducible to an MDP in which each agent can fully observe the entire state-space.
- This is a simpler problem, with polynomial-time solution algorithms.
- It can be solved off-line.
- There still remains the problem of learning the common language.


Suitable Problem Instances

In order for communication-learning to be possible, agents need to be able to update belief-states meaningfully. In a suitable Dec-MDP-Com, we have:

- Free and full communication.
- Agents communicate observations and intended actions.
- Each translator can choose a most likely interpretation of any received message, such that:
  - If it is incorrect, then after taking action and observing the outcome, the probability assigned to the correct interpretation is strictly greater than the one originally chosen.
- An example: meanings observed after some time delay.



The Elementary Action Protocol

Agents can learn to communicate in suitable problem instances by way of the following protocol:

1. All agents share their observations.
2. Each agent calculates the most likely observation-sequence, based on observations, messages, and current translations, and thus a most likely state.
3. Proceeding in turn, each agent chooses an action by:
   3.1 Calculating the most likely action sub-sequence, consisting of actions that might be taken by prior agents in the order.
   3.2 Choosing an action that maximizes value.
   3.3 Communicating this action to the others.
4. Agents take action only after all agents complete the prior step.
5. Belief-states are adjusted based on the next state-transition.
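The turn-taking structure of the protocol can be skeletonized as follows. This is a heavily simplified sketch: the state-estimation, value-maximization, and belief-update steps are stubbed out as caller-supplied placeholders, and the agent representation is invented.

```python
from types import SimpleNamespace

def elementary_action_step(agents, estimate_state, best_action, update_belief):
    """One round of the elementary action protocol (simplified sketch).
    estimate_state, best_action, and update_belief stand in for the real
    inference steps described on the slide."""
    # 1. All agents share their observations.
    shared = {a.name: a.observation for a in agents}
    # 2. Each agent infers a most likely global state from the shared info.
    states = {a.name: estimate_state(a, shared) for a in agents}
    # 3. In turn, each agent picks a value-maximizing action given the
    #    actions already announced by prior agents, and announces it.
    announced = []
    for a in agents:
        act = best_action(a, states[a.name], announced)
        announced.append((a.name, act))
    # 4. All agents act only after every announcement is complete.
    # 5. Belief-states are adjusted based on the observed transition.
    for a in agents:
        update_belief(a, announced)
    return announced

# Trivial usage with stubbed-in inference steps:
agents = [SimpleNamespace(name="a1", observation="o1"),
          SimpleNamespace(name="a2", observation="o2")]
result = elementary_action_step(
    agents,
    estimate_state=lambda a, shared: tuple(sorted(shared.values())),
    best_action=lambda a, state, prior: f"act_{a.name}",
    update_belief=lambda a, announced: None,
)
print(result)  # [('a1', 'act_a1'), ('a2', 'act_a2')]
```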


The Elementary Action Protocol (II)

- Such a straightforward procedure is not necessarily optimal, nor even necessarily close to optimal.
- Computing optimal policies for genuinely decentralized problems is generally intractable.
- Agents who follow the protocol can eventually come to communicate clearly, and so act properly.

Claim (Convergence). Given an infinite time-horizon, agents acting according to the elementary action protocol in a suitable Dec-MDP-Com will eventually converge upon a joint policy that is optimal for the states they encounter from then on.



Empirical Results

We tested our methods on a variety of instances of the pump-world problem domain already described.

- We varied the numbers of pumps and valves, and thus the size of the associated vocabularies.
- Agents proceeded until each had full confidence in its translation of the other agent.
- Results showed that convergence to a shared language proceeds quite steadily, with time an apparently polynomial function of problem/language complexity.
- The rate of reward-accumulation grows very quickly to near its maximum, suggesting the possibility of approximation.


[Figure: Language and Total Reward Over Time. Percent learning progress (% of language learned and % of reward earned, 0–100) vs. time-steps in the learning process (0–12000).]


[Figure: Rate of Reward Accumulation Over Time. Average reward per timestep (0–1) vs. timesteps (0–12000).]


Summary

- Learning to communicate allows complex decentralized problems to be solved more efficiently.
- This learning problem is itself highly complicated.
- We are interested in further exploration of:
  - Properties of decentralized problems that make learning easier (or even possible).
  - Effective techniques for updating beliefs about communication.
  - Approximate techniques and their relation to communication.


Appendix: For Further Reading

Bernstein, D.; Givan, R.; Immerman, N.; and Zilberstein, S. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27:819–840, 2002.

Boutilier, C. Sequential optimality and coordination in multiagent systems. In Proc. 16th Intl. Joint Conf. on AI, pages 478–485, 1999.


For Further Reading II Goldman, C. V. and Zilberstein, S. Optimizing information exchange in cooperative multi-agent systems. In Proc. 2nd Intl. Conf. on Autonomous Agents and Multiagent Systems, 137–140, 2003. Goldman, C. V., and Zilberstein, S. Decentralized control of cooperative systems: Categorization and complexity analysis. Journal of Artificial Intelligence Research, 22:143–174, 2004.


Goldman, C. V.; Allen, M.; and Zilberstein, S. Decentralized language learning through acting. In Proc. 3rd Intl. Conf. on Autonomous Agents and Multiagent Systems, pages 1006–1013, 2004.
