Framing the Problem Solving the Problem: Approaches and Results Summary
Learning to Communicate in Decentralized Systems
M. Allen¹, C. V. Goldman², S. Zilberstein¹
¹ Department of Computer Science, University of Massachusetts, Amherst
² Caesarea Rothschild Institute, University of Haifa, Israel
Workshop on Multiagent Learning, Twentieth National Conference on Artificial Intelligence (AAAI-05)
Outline
Framing the Problem
  Multiagent Communication
  Decentralized MDPs with Communication
  Communication and Translation
Solving the Problem: Approaches and Results
  The Bayesian Approach
  Reduction to Multi-Agent MDPs
  Solving Suitable Problems
  Empirical Results
Motivation
- Decentralized decision-making is a provably hard problem, but one that arises in many common contexts.
- Automated systems whose components can communicate can radically simplify such tasks.
- Systems may need to cooperate in situations where a common language is not initially available.
- Robust systems must be able to recover from specification errors or unforeseen situations by learning new understandings of one another's communications.
An Overview of the Problem
Decision-making in decentralized and multiagent systems depends upon a number of factors:
- How system states affect what agents observe.
- How agents' actions affect system states.
- The information that agents share by communicating.
- Agents' beliefs about their environment.
- Agents' beliefs about the meanings of communications.
[Figure: Belief-state update. A two-agent timeline over steps t1, t2, t3: at each step, agent i's observation oᵢ, received message mⱼ, and chosen action aᵢ feed a translation update that produces the next belief-state βᵢ; the policy of communication and the policy of action jointly drive the state sequence s1, s2, s3.]
The Dec-MDP-Com Model
An n-agent decentralized MDP with direct communication is a tuple M = ⟨S, A, P, R, Σ, CΣ, Ω, O, T⟩, where:
- S is a finite set of states, with initial state s0.
- A = {Aᵢ}, where Aᵢ is a finite set of actions aᵢ for agent αᵢ.
- P is a transition function over state pairs and joint actions.
- R is a global reward function over state-action transitions.
- Σ = {Σᵢ}, where Σᵢ is a finite set of messages σᵢ for αᵢ.
- CΣ gives the cost of each transmitted message.
- Ω = {Ωᵢ}, where Ωᵢ is a finite set of observations oᵢ for αᵢ.
- O is an observation function over agents, observations, and state-action transitions.
- T is the time horizon (finite or infinite) of the problem.
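The tuple can be sketched as a plain container; this is an illustrative rendering, not the paper's code, and all field names are our own:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class DecMDPCom:
    """Container for the Dec-MDP-Com tuple M = <S, A, P, R, Sigma, C, Omega, O, T>."""
    states: List[str]                    # S, with states[0] serving as s0
    actions: Dict[str, List[str]]        # A: agent name -> its action set A_i
    transition: Callable[..., float]     # P(s' | s, joint action)
    reward: Callable[..., float]         # R(s, joint action, s')
    messages: Dict[str, List[str]]       # Sigma: agent name -> message set
    msg_cost: Callable[[str], float]     # C_Sigma(sigma)
    observations: Dict[str, List[str]]   # Omega: agent name -> observation set
    obs_fn: Callable[..., float]         # O(o_1..o_n | s, a_1..a_n, s')
    horizon: float                       # T; float("inf") for infinite horizon
```

A freely-describable instance, for example, would supply `msg_cost=lambda sigma: 0.0`.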
An Important Constraint
- Dec-MDP-Coms (unlike Dec-POMDP-Coms) restrict the observation function O, so that they are jointly fully observable.
- The agents' combined observations fix the global state.
- That is, if the observation probability O(o1, …, on | s, a1, …, an, s′) ≠ 0, then the resulting state s′ is determined uniquely by the observations (o1, …, on).
- No single agent observes the entire state; rather, the observations of all agents taken together determine it.
An Example: The Pumps Domain
- Two agents, each controlling n pumps and m flow-valves.
- Each agent separately observes fluid entering along different inflow ducts, along with its own pumps and valves.
- Task: maximize flow out of the system while minimizing the number of outflow ducts used.
- Probabilistic effects: pumps and valves are susceptible to variations in throughput.
The Pumps Domain as a Dec-MDP-Com
M = ⟨S, A, P, R, Σ, CΣ, Ω, O, T⟩ has elements as follows:
- S: the state set is described by the flow through the two inflow ducts, two sets of pumps, and two sets of valves.
- A: each agent chooses an action to control its pumps (on, off, forward, back) or valves (open, shut).
- P: the transition function directs flow according to the actions taken; pumps and valves may fail to respond, probabilistically.
- R: the total reward is proportional to outflow relative to inflow, discounted by the number of ducts used.
The Pumps Domain as Dec-MDP-Com (II)
- Σ: each agent possesses messages corresponding to each possible action, each pump/valve in the system, and each observed unit of inflow.
- CΣ: the cost of each message is zero.
- Ω, O: each agent observes just its own inflow duct, and all pumps and valves it controls.
- T: the problem has an infinite time horizon.
Learning to Communicate
- Where agents understand one another, optimal linguistic action involves deciding what and when to communicate, based on cost-benefit analysis.
- Where agents do not fully understand one another, simple message-passing is not enough.
- Agents need to learn how to respond to messages.
- This poses a limited, but interesting, question of what messages mean.
Translations
We represent the degree to which agent αᵢ understands agent αⱼ by way of a probabilistic correspondence between messages sent by αⱼ and those that αᵢ might itself send.

Definition (Translation)
Let Σ and Σ′ be sets of messages. A translation τ between Σ and Σ′ is a probability function over message pairs: for any messages σ, σ′, τ(σ, σ′) is the probability that σ and σ′ mean the same. τ⁺(Σ,Σ′) is the set of all translations between Σ and Σ′.
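Concretely, a translation can be stored as a table of probabilities over message pairs. A minimal sketch, with illustrative message names (the uniform initializer is our own convenience, not part of the model):

```python
def uniform_translation(sigma, sigma_prime):
    """Build an initial translation tau between message sets Sigma and Sigma'.

    Represented as a nested dict: tau[s][s'] is the probability that
    messages s and s' mean the same. With no prior knowledge, each
    foreign message is equally likely to match.
    """
    p = 1.0 / len(sigma_prime)
    return {s: {sp: p for sp in sigma_prime} for s in sigma}

# Illustrative: translating a two-message valve vocabulary.
tau = uniform_translation(["open", "shut"], ["ouvrir", "fermer"])
```

Each row of the table is a distribution over the other agent's possible meanings for one of our messages.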
Belief-States
Agents need to consider multiple possible translations between messages, and possess beliefs regarding which translation might be correct.

Definition (Belief-state)
Let agents α1 and α2 use message sets Σ1 and Σ2, respectively. A belief-state for agent αᵢ is a probability function βᵢ over the set of translations τ⁺(Σᵢ,Σⱼ) (i ≠ j). That is, for any translation τ between Σᵢ and Σⱼ, βᵢ(τ) is the probability that τ is correct.
Belief-States (II)
We can thus talk about an agent's beliefs about meanings.
- Translations (τ): distributions over message pairs.
- Belief-states (βᵢ): distributions over translations.
- Given any pair of messages, agent αᵢ assigns that pair a likelihood, βᵢ⁺, of having the same meaning:

  βᵢ⁺(σᵢ, σⱼ) = Σ_{τ ∈ τ⁺(Σᵢ,Σⱼ)} βᵢ(τ) · τ(σᵢ, σⱼ)

- Learning to communicate is the process of updating belief-states about translations.
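The likelihood βᵢ⁺ is a direct sum over candidate translations. A minimal sketch, with the belief and the translation set given as plain dictionaries (the names and the two-candidate example are illustrative):

```python
def pair_likelihood(belief, translations, sigma_i, sigma_j):
    """Compute beta_i^+(sigma_i, sigma_j) = sum over tau of beta_i(tau) * tau(sigma_i, sigma_j).

    belief:       dict mapping translation name -> probability beta_i(tau)
    translations: dict mapping translation name -> nested dict tau[s_i][s_j]
    """
    return sum(belief[name] * tau[sigma_i][sigma_j]
               for name, tau in translations.items())

# Two candidate translations of a one-word foreign vocabulary:
translations = {
    "t1": {"open": {"ouvrir": 1.0}},   # under t1, 'ouvrir' means 'open'
    "t2": {"open": {"ouvrir": 0.0}},   # under t2, it means something else
}
belief = {"t1": 0.7, "t2": 0.3}
likelihood = pair_likelihood(belief, translations, "open", "ouvrir")
```

Here the agent assigns probability 0.7 · 1.0 + 0.3 · 0.0 = 0.7 to the two messages meaning the same thing.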
Bayesian Updates for Communication Learning
Agents base their translations upon the outcomes of actions taken in response to messages.
- Actions are chosen in response to local observations, messages received, and current belief-states.
- In the predictive phase, agents generate possible next belief-states, based upon their current beliefs and the action they have chosen.
- The actions chosen by all agents together lead to state transitions and new observations.
- In the retrospective phase, following action, agents adjust belief-states in response to observed outcomes.
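The retrospective phase amounts to a Bayes update over candidate translations, reweighting each by how well it predicted the observed outcome. A sketch under that reading, with the per-translation outcome likelihoods supplied by the caller (a stand-in for the model's observation and transition probabilities):

```python
def retrospective_update(belief, outcome_likelihood):
    """Bayes update: beta'(tau) is proportional to beta(tau) * P(outcome | tau).

    belief:             dict mapping translation name -> prior probability
    outcome_likelihood: dict mapping translation name -> P(observed outcome | tau)
    """
    posterior = {t: belief[t] * outcome_likelihood[t] for t in belief}
    z = sum(posterior.values())
    if z == 0.0:
        return dict(belief)  # outcome impossible under all candidates; keep prior
    return {t: p / z for t, p in posterior.items()}

belief = {"t1": 0.5, "t2": 0.5}
# The observed transition was twice as likely under t1 as under t2:
posterior = retrospective_update(belief, {"t1": 0.8, "t2": 0.4})
```

Translations that explain the observed transition gain mass; those that do not lose it, which is exactly the "learning" in learning to communicate.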
Special Properties of Dec-MDP-Coms
Under certain conditions, Dec-MDP-Coms can be reduced to somewhat simpler problems.

Definition (Fully describable)
A Dec-MDP-Com is fully describable if each agent possesses a language sufficient to communicate both (a) any observation it makes, and (b) any action it takes.

Definition (Freely describable)
A Dec-MDP-Com is freely describable if the cost of communicating any message σ is 0.
Reduction to Multi-Agent MDPs
Claim (Reduction)
A Dec-MDP-Com is equivalent to a multi-agent MDP if (a) it is fully and freely describable, and (b) all agents share a common language.
- Such a Dec-MDP-Com is reducible to an MDP in which each agent can fully observe the entire state.
- Simpler, with polynomial-time solution algorithms.
- Can be solved off-line.
- The problem of learning the common language still remains.
Suitable Problem Instances
For communication-learning to be possible, agents need to be able to update belief-states meaningfully. In a suitable Dec-MDP-Com, we have:
- Free and full communication.
- Agents communicate observations and intended actions.
- Each translator can choose a most likely interpretation of any received message, such that: if that interpretation is incorrect, then after taking action and observing the outcome, the probability assigned to the correct interpretation is strictly greater than that assigned to the one originally chosen.
- An example: meanings observed after some time delay.
The Elementary Action Protocol
Agents can learn to communicate in suitable problem instances by way of the following protocol.
1. All agents share their observations.
2. Each agent calculates a most likely observation sequence, based on observations, messages, and current translations, and thus a most likely state.
3. Proceeding in turn, each agent chooses an action by:
   3.1 Calculating the most likely action sub-sequence, consisting of actions that might be taken by prior agents in the order.
   3.2 Choosing an action that maximizes value.
   3.3 Communicating this action to the others.
4. Agents act only after all agents have completed the prior step.
5. Belief-states are adjusted based on the next state transition.
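The turn-taking structure of steps 1-5 can be sketched as a single learning round. The agent methods used here (`observe`, `most_likely_state`, `best_action`, `update_belief`) are hypothetical stand-ins for the estimation and update procedures of the protocol, not an API from the paper:

```python
def elementary_action_step(agents, env):
    """One round of the elementary action protocol (illustrative skeleton).

    Each agent object is assumed to provide: observe(),
    most_likely_state(shared_obs), best_action(state, prior_actions),
    and update_belief(transition).
    """
    # Step 1: all agents share their observations.
    shared_obs = [agent.observe() for agent in agents]

    # Steps 2-3: proceeding in turn, each agent infers a most likely
    # state and announces its chosen action to the others.
    announced = []
    for agent in agents:
        state = agent.most_likely_state(shared_obs)
        action = agent.best_action(state, announced)  # steps 3.1-3.2
        announced.append(action)                      # step 3.3

    # Step 4: agents act only once every announcement has been made.
    transition = env.step(announced)

    # Step 5: belief-states adjusted from the observed transition.
    for agent in agents:
        agent.update_belief(transition)
    return transition
```

Looping this step under an infinite horizon is what the convergence claim below concerns: repeated rounds drive the translations, and hence the joint policy, toward agreement.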
The Elementary Action Protocol (II)
- Such a straightforward procedure is not necessarily optimal, nor even necessarily close to optimal.
- Computing optimal policies for genuinely decentralized problems is generally intractable.
- Agents who follow the protocol can eventually come to communicate clearly, and so act properly.

Claim (Convergence)
Given an infinite time horizon, agents acting according to the elementary action protocol in a suitable Dec-MDP-Com will eventually converge upon a joint policy that is optimal for the states they encounter from then on.
Empirical Results
We tested our methods on a variety of instances of the pump-world problem domain described above.
- Varied the numbers of pumps and valves, and thus the size of the associated vocabularies.
- Agents proceeded until each had full confidence in its translation of the other agent.
- Results showed that convergence to a shared language proceeds quite steadily, with time an apparently polynomial function of problem/language complexity.
- The rate of reward accumulation grows very quickly to near its maximum, suggesting the possibility of approximation.
[Figure: Language and total reward over time. Learning progress (% of reward earned and % of language learned, 0-100) plotted against time-steps in the learning process (0-12000).]
[Figure: Rate of reward accumulation over time. Average reward per time-step (0-1) plotted against time-steps (0-12000).]
Summary
- Learning to communicate allows complex decentralized problems to be solved more efficiently.
- This learning problem is itself highly complicated.
- We are interested in further exploration of:
  - Properties of decentralized problems that make learning easier (or even possible).
  - Effective techniques for updating beliefs about communication.
  - Approximate techniques and their relation to communication.
Appendix: For Further Reading

Bernstein, D.; Givan, R.; Immerman, N.; and Zilberstein, S. The complexity of decentralized control of Markov decision processes. Mathematics of Operations Research, 27:819-840, 2002.

Boutilier, C. Sequential optimality and coordination in multiagent systems. In Proc. 16th Intl. Joint Conf. on AI, pages 478-485, 1999.

Goldman, C. V., and Zilberstein, S. Optimizing information exchange in cooperative multi-agent systems. In Proc. 2nd Intl. Conf. on Autonomous Agents and Multiagent Systems, pages 137-140, 2003.

Goldman, C. V., and Zilberstein, S. Decentralized control of cooperative systems: Categorization and complexity analysis. Journal of Artificial Intelligence Research, 22:143-174, 2004.

Goldman, C. V.; Allen, M.; and Zilberstein, S. Decentralized language learning through acting. In Proc. 3rd Intl. Conf. on Autonomous Agents and Multiagent Systems, pages 1006-1013, 2004.