Sensing and Decision Making with Social Learning
Vikram Krishnamurthy, Cornell University, ECE Department
From a statistical signal processing / stochastic control viewpoint
SOCIAL SENSORS

Social sensor: provides information about the environment to a social network after interacting with other agents.

Statistical signal processing: extract the signal from noise.
Sensor-adaptive signal processing: dynamically manage sensor resources through feedback (stochastic control).

[Figure: block diagram of sensor-adaptive signal processing. Signal plus noise enter a sensor / signal-processing block that produces an estimate, with a feedback (stochastic control) loop managing the sensor; the social-sensor version is labeled with a local utility and an ordinal decision.]

1. Social sensors influence each other over a network.
2. Social sensors have dynamics: they learn from their past decisions and from the decisions of others.
3. Social sensors are rationally inattentive.

How can sensors autonomously manage their behavior?
Social learning results in herding. Suppose we close the loop:
Q1. How do local and global agents interact in decision making?
Q2. How to optimize social learning to delay herding?
Q3. How to price a product?
[Figure: POMDP feedback loop. A Markov chain generates observations $y_k$; a Bayesian filter computes the belief $\pi_k$; a local controller applies a policy $\mu^*(\pi_k)$ to choose the action $u_k \in \{1 = \text{stop}, 2 = \text{continue}\}$, which feeds back to the sensor. The policy may be an unstructured policy or a monotone (threshold) policy in $\pi$.]
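To make the feedback loop in the figure concrete, here is a minimal sketch, assuming a two-state Markov chain with an illustrative transition matrix `P`, observation likelihoods `B`, and a hand-picked threshold (none of these values come from the talk): a Bayesian (HMM) filter tracks the belief $\pi_k$, and the belief is fed back to a monotone threshold controller.

```python
# Minimal sketch (illustrative values): a Bayesian filter tracking a two-state
# Markov chain, with the belief fed back to a monotone (threshold) controller,
# i.e. the "sensor signal processing + feedback (stochastic control)" loop above.
import numpy as np

P = np.array([[0.95, 0.05],      # assumed transition matrix; second state is absorbing
              [0.00, 1.00]])
B = np.array([[0.8, 0.2],        # assumed observation likelihoods B[x, y] = P(y | x)
              [0.3, 0.7]])

def bayesian_filter(pi, y):
    """One filter step: predict with P, then correct with the likelihood of y."""
    unnorm = B[:, y] * (P.T @ pi)
    return unnorm / unnorm.sum()

def controller(pi, threshold=0.5):
    """Monotone policy on the belief: u = 1 (stop) once the belief in the second state exceeds the threshold."""
    return 1 if pi[1] > threshold else 2

pi = np.array([1.0, 0.0])        # prior belief: start in the first state
for y in [0, 0, 1, 1, 1]:        # a hypothetical observation sequence
    pi = bayesian_filter(pi, y)
    print(np.round(pi, 3), "action u =", controller(pi))
```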
Q1. HOW DO LOCAL & GLOBAL DECISION MAKERS INTERACT?

Example: Multiagent Quickest Change Detection
Bayesian Quickest-Time Change Detection (classical setup):

Observations
$$ y_k \sim \begin{cases} B_1(\cdot), & k \le \tau^0 \\ B_2(\cdot), & k > \tau^0, \end{cases} \qquad \text{where } \tau^0 = \text{change time (usually geometric)}. $$

Aim: compute the time $\tau$ at which to announce the change so as to minimize
$$ \mathbb{E}^{\mu}_{\pi_0}\Big\{ \underbrace{d\,|\tau - \tau^0|^{+}}_{\text{delay}} \;+\; \underbrace{f\, I(\tau < \tau^0)}_{\text{false alarm}} \Big\}. $$

Classical result [Shiryaev, 1960s]: given the posterior $\pi_k = P(\tau^0 \le k \mid y_1, \dots, y_k)$, the optimal decision policy (declare change versus no change) is a threshold in $\pi_k$.

[Figure: sample path of the posterior probability of change; the change is declared once the posterior crosses the threshold, otherwise no change is declared.]
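A minimal numerical sketch of the classical recursion, under assumed pre- and post-change densities $B_1 = N(0,1)$ and $B_2 = N(1,1)$, an assumed geometric change-time parameter, and a threshold chosen by hand rather than from the delay/false-alarm trade-off (all values are illustrative, not from the talk):

```python
# Shiryaev posterior recursion pi_k = P(tau0 <= k | y_1..y_k) with a geometric
# change time, followed by the threshold stopping rule. All numbers are assumed.
import numpy as np

rho = 0.05                                           # assumed geometric change-time parameter
B1 = lambda y: np.exp(-0.5 * y**2) / np.sqrt(2 * np.pi)        # pre-change density  N(0, 1)
B2 = lambda y: np.exp(-0.5 * (y - 1)**2) / np.sqrt(2 * np.pi)  # post-change density N(1, 1)

def shiryaev_update(pi, y):
    """One Bayesian update of the posterior probability that the change has occurred."""
    pred = pi + (1.0 - pi) * rho                     # predict: the change may occur at this step
    num = B2(y) * pred
    return num / (num + B1(y) * (1.0 - pred))

pi, threshold = 0.0, 0.9                             # hand-picked threshold, stand-in for the optimal one
rng = np.random.default_rng(0)
tau0 = 30                                            # true change time (unknown to the detector)
for k in range(1, 200):
    y = rng.normal(1.0 if k > tau0 else 0.0, 1.0)
    pi = shiryaev_update(pi, y)
    if pi > threshold:                               # classical result: a threshold policy is optimal
        print("declare change at time", k)
        break
```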
Multiagent version. Aim: agents $k = 1, 2, \dots$ act sequentially to estimate the state $x \sim \pi_0$.
• Agent $k$ receives observation $y_k$.
• Updates the posterior $\pi_k = P(\tau^0 \le k \mid y_1, \dots, y_{k-1}, y_k)$.
• Broadcasts $\pi_k$ (or $y_k$).
At what time $\tau$ is it optimal to declare the change?

A planner optimizing the cumulative cost of all agents versus agents choosing greedy myopic actions:
$$ \mu^*(\pi_{k-1}, y_k) = \arg\min_{\mu}\; \mathbb{E}\Big\{ \sum_{k=1}^{\tau} c\big(x_k, \mu(\pi_{k-1}, y_k)\big) \Big\} \quad (\text{"socialistic"}), $$
$$ a_k = \mu(\pi_{k-1}, y_k) = \arg\min_{a}\; \mathbb{E}_{\pi_{k-1}, y_k}\{ c(x, a) \} \quad (\text{"capitalistic"}). $$

Social learning protocol: given the public belief $\pi_{k-1} = P(\text{change} \mid a_1, \dots, a_{k-1})$,
• Agent $k$ observes $y_k \sim P(y \mid x)$.
• Broadcasts the greedy (myopic) local decision $a_k = \arg\min_a \mathbb{E}\{ c(I(\tau^0 \le k), a) \mid a_1, \dots, a_{k-1}, y_k \}$.
• The other agents update the public belief with the social learning filter (a numerical sketch follows below):
$$ \pi_k = P(\text{change} \mid a_1, \dots, a_k) \;\propto\; \sum_y P(a_k \mid y, \pi_{k-1})\, P(y \mid x) \times (\text{predictor}), $$
a Bayesian predictor/update recursion driven by the broadcast actions rather than the raw observations.

When should the global decision-maker declare the change?
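A minimal sketch of one step of this protocol, for a static state (the geometric change dynamics are dropped for simplicity) and with illustrative likelihoods `B` and cost matrix `c` that are assumptions, not values from the talk. Each agent acts greedily on its private observation and broadcasts only the action; everyone then updates the public belief with the social learning filter. Running it illustrates herding: once the public belief is informative enough, every observation maps to the same action and the belief freezes.

```python
# One step of the social learning protocol above (static state, assumed numbers).
import numpy as np

B = np.array([[0.8, 0.2],        # B[x, y] = P(y | x), x in {0: no change, 1: change}
              [0.3, 0.7]])
c = np.array([[0.0, 1.0],        # c[x, a]: cost of action a in state x
              [2.0, 0.0]])       # a = 0: "no change", a = 1: "declare change"

def greedy_action(pi, y):
    """Agent's myopic decision: a_k = argmin_a E{c(x, a) | public belief, private y}."""
    private = B[:, y] * pi
    private /= private.sum()
    return int(np.argmin(private @ c))

def social_learning_filter(pi, a):
    """Public belief update: pi_k(x) ∝ sum_y P(a_k | y, pi_{k-1}) P(y | x) pi_{k-1}(x)."""
    lik = np.array([sum(B[x, y] for y in range(B.shape[1]) if greedy_action(pi, y) == a)
                    for x in range(B.shape[0])])
    post = lik * pi
    return post / post.sum() if post.sum() > 0 else pi

pi = np.array([0.5, 0.5])                   # public belief
rng = np.random.default_rng(1)
x = 1                                       # true (unknown) state: change has occurred
for k in range(10):
    y = rng.choice(2, p=B[x])               # agent k's private observation
    a = greedy_action(pi, y)                # greedy local decision, broadcast to everyone
    pi = social_learning_filter(pi, a)      # all agents update the public belief
    print(k, "action:", a, "public belief:", np.round(pi, 3))
# Once pi is informative enough, every y maps to the same action: an information
# cascade (herding) forms and the public belief freezes.
```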
Result: the optimal global decision policy is a multi-threshold policy in the posterior probability of change, and the stopping ("declare change") set is non-convex.

[Figure: posterior probability of change versus the global decision policy, showing the multi-threshold structure of the declare-change and no-change regions.]

Summary: global decision making using local decisions is non-monotone!
REFERENCES
• Krishnamurthy, Namvar and Hamdi, Interactive Sensing and Decision Making in Social Networks, Foundations and Trends in Signal Processing, 2014 (monograph).
• Krishnamurthy and Hoiles, Social Learning, Data Incest and Revealed Preferences, IEEE Journal Computational Social Systems, 2015.
• Krishnamurthy, Quickest Detection POMDPs with Social Learning, IEEE Transactions on Information Theory, 2012.
• Krishnamurthy and Yin, Tracking the Degree Distribution of Random Graphs, IEEE Transactions on Information Theory, 2014.
• Krishnamurthy and Bhatt, Sequential Detection of Market Shocks with Risk-Averse CVaR Social Sensors, IEEE Journal of Selected Topics in Signal Processing, 2016.