Sensing and Decision Making with Social Learning
Vikram Krishnamurthy, Cornell University, ECE Department
From a statistical signal processing / stochastic control viewpoint
SOCIAL SENSORS

Social sensor: provides information about the environment to a social network after interacting with other agents.

Statistical signal processing: extract the signal from noise.
Sensor-adaptive signal processing: dynamically manage sensor resources through feedback (stochastic control).

[Figure: block diagram of sensor-adaptive signal processing. Signal plus noise enter a sensor / signal-processing block that produces an estimate, with a feedback (stochastic control) loop managing the sensor; the social-sensor version is labeled with a local utility and an ordinal decision.]

1. Social sensors influence each other over a network.
2. Social sensors have dynamics: they learn from their past decisions and from the decisions of others.
3. Social sensors are rationally inattentive.

How can sensors autonomously manage their behavior?
Social learning results in herding. Suppose we close the loop:
Q1. How do local and global agents interact in decision making?
Q2. How to optimize social learning to delay herding?
Q3. How to price a product?
[Figure: POMDP feedback loop. A Markov chain generates observations $y_k$; a Bayesian filter computes the belief $\pi_k$; a local controller applies a policy $\mu^*(\pi_k)$ to choose the action $u_k \in \{1 = \text{stop}, 2 = \text{continue}\}$, which feeds back to the sensor. The policy may be an unstructured policy or a monotone (threshold) policy in $\pi$.]
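To make the feedback loop in the figure concrete, here is a minimal sketch, assuming a two-state Markov chain with an illustrative transition matrix `P`, observation likelihoods `B`, and a hand-picked threshold (none of these values come from the talk): a Bayesian (HMM) filter tracks the belief $\pi_k$, and the belief is fed back to a monotone threshold controller.

```python
# Minimal sketch (illustrative values): a Bayesian filter tracking a two-state
# Markov chain, with the belief fed back to a monotone (threshold) controller,
# i.e. the "sensor signal processing + feedback (stochastic control)" loop above.
import numpy as np

P = np.array([[0.95, 0.05],      # assumed transition matrix; second state is absorbing
              [0.00, 1.00]])
B = np.array([[0.8, 0.2],        # assumed observation likelihoods B[x, y] = P(y | x)
              [0.3, 0.7]])

def bayesian_filter(pi, y):
    """One filter step: predict with P, then correct with the likelihood of y."""
    unnorm = B[:, y] * (P.T @ pi)
    return unnorm / unnorm.sum()

def controller(pi, threshold=0.5):
    """Monotone policy on the belief: u = 1 (stop) once the belief in the second state exceeds the threshold."""
    return 1 if pi[1] > threshold else 2

pi = np.array([1.0, 0.0])        # prior belief: start in the first state
for y in [0, 0, 1, 1, 1]:        # a hypothetical observation sequence
    pi = bayesian_filter(pi, y)
    print(np.round(pi, 3), "action u =", controller(pi))
```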
Q1. HOW DO LOCAL & GLOBAL DECISION MAKERS INTERACT?

Example: Multiagent Quickest Change Detection
Bayesian Quickest-Time Change Detection (classical setup):

Observations
$$ y_k \sim \begin{cases} B_1(\cdot), & k \le \tau^0 \\ B_2(\cdot), & k > \tau^0, \end{cases} \qquad \text{where } \tau^0 = \text{change time (usually geometric)}. $$

Aim: compute the time $\tau$ at which to announce the change so as to minimize
$$ \mathbb{E}^{\mu}_{\pi_0}\Big\{ \underbrace{d\,|\tau - \tau^0|^{+}}_{\text{delay}} \;+\; \underbrace{f\, I(\tau < \tau^0)}_{\text{false alarm}} \Big\}. $$

Classical result [Shiryaev, 1960s]: given the posterior $\pi_k = P(\tau^0 \le k \mid y_1, \dots, y_k)$, the optimal decision policy (declare change versus no change) is a threshold in $\pi_k$.

[Figure: sample path of the posterior probability of change; the change is declared once the posterior crosses the threshold, otherwise no change is declared.]
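A minimal numerical sketch of the classical recursion, under assumed pre- and post-change densities $B_1 = N(0,1)$ and $B_2 = N(1,1)$, an assumed geometric change-time parameter, and a threshold chosen by hand rather than from the delay/false-alarm trade-off (all values are illustrative, not from the talk):

```python
# Shiryaev posterior recursion pi_k = P(tau0 <= k | y_1..y_k) with a geometric
# change time, followed by the threshold stopping rule. All numbers are assumed.
import numpy as np

rho = 0.05                                           # assumed geometric change-time parameter
B1 = lambda y: np.exp(-0.5 * y**2) / np.sqrt(2 * np.pi)        # pre-change density  N(0, 1)
B2 = lambda y: np.exp(-0.5 * (y - 1)**2) / np.sqrt(2 * np.pi)  # post-change density N(1, 1)

def shiryaev_update(pi, y):
    """One Bayesian update of the posterior probability that the change has occurred."""
    pred = pi + (1.0 - pi) * rho                     # predict: the change may occur at this step
    num = B2(y) * pred
    return num / (num + B1(y) * (1.0 - pred))

pi, threshold = 0.0, 0.9                             # hand-picked threshold, stand-in for the optimal one
rng = np.random.default_rng(0)
tau0 = 30                                            # true change time (unknown to the detector)
for k in range(1, 200):
    y = rng.normal(1.0 if k > tau0 else 0.0, 1.0)
    pi = shiryaev_update(pi, y)
    if pi > threshold:                               # classical result: a threshold policy is optimal
        print("declare change at time", k)
        break
```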
Multiagent version. Aim: agents $k = 1, 2, \dots$ act sequentially to estimate the state $x \sim \pi_0$.
• Agent $k$ receives observation $y_k$.
• Updates the posterior $\pi_k = P(\tau^0 \le k \mid y_1, \dots, y_{k-1}, y_k)$.
• Broadcasts $\pi_k$ (or $y_k$).
At what time $\tau$ is it optimal to declare the change?

A planner optimizing the cumulative cost of all agents versus agents choosing greedy myopic actions:
$$ \mu^*(\pi_{k-1}, y_k) = \arg\min_{\mu}\; \mathbb{E}\Big\{ \sum_{k=1}^{\tau} c\big(x_k, \mu(\pi_{k-1}, y_k)\big) \Big\} \quad (\text{"socialistic"}), $$
$$ a_k = \mu(\pi_{k-1}, y_k) = \arg\min_{a}\; \mathbb{E}_{\pi_{k-1}, y_k}\{ c(x, a) \} \quad (\text{"capitalistic"}). $$

Social learning protocol: given the public belief $\pi_{k-1} = P(\text{change} \mid a_1, \dots, a_{k-1})$,
• Agent $k$ observes $y_k \sim P(y \mid x)$.
• Broadcasts the greedy (myopic) local decision $a_k = \arg\min_a \mathbb{E}\{ c(I(\tau^0 \le k), a) \mid a_1, \dots, a_{k-1}, y_k \}$.
• The other agents update the public belief with the social learning filter (a numerical sketch follows below):
$$ \pi_k = P(\text{change} \mid a_1, \dots, a_k) \;\propto\; \sum_y P(a_k \mid y, \pi_{k-1})\, P(y \mid x) \times (\text{predictor}), $$
a Bayesian predictor/update recursion driven by the broadcast actions rather than the raw observations.

When should the global decision-maker declare the change?
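A minimal sketch of one step of this protocol, for a static state (the geometric change dynamics are dropped for simplicity) and with illustrative likelihoods `B` and cost matrix `c` that are assumptions, not values from the talk. Each agent acts greedily on its private observation and broadcasts only the action; everyone then updates the public belief with the social learning filter. Running it illustrates herding: once the public belief is informative enough, every observation maps to the same action and the belief freezes.

```python
# One step of the social learning protocol above (static state, assumed numbers).
import numpy as np

B = np.array([[0.8, 0.2],        # B[x, y] = P(y | x), x in {0: no change, 1: change}
              [0.3, 0.7]])
c = np.array([[0.0, 1.0],        # c[x, a]: cost of action a in state x
              [2.0, 0.0]])       # a = 0: "no change", a = 1: "declare change"

def greedy_action(pi, y):
    """Agent's myopic decision: a_k = argmin_a E{c(x, a) | public belief, private y}."""
    private = B[:, y] * pi
    private /= private.sum()
    return int(np.argmin(private @ c))

def social_learning_filter(pi, a):
    """Public belief update: pi_k(x) ∝ sum_y P(a_k | y, pi_{k-1}) P(y | x) pi_{k-1}(x)."""
    lik = np.array([sum(B[x, y] for y in range(B.shape[1]) if greedy_action(pi, y) == a)
                    for x in range(B.shape[0])])
    post = lik * pi
    return post / post.sum() if post.sum() > 0 else pi

pi = np.array([0.5, 0.5])                   # public belief
rng = np.random.default_rng(1)
x = 1                                       # true (unknown) state: change has occurred
for k in range(10):
    y = rng.choice(2, p=B[x])               # agent k's private observation
    a = greedy_action(pi, y)                # greedy local decision, broadcast to everyone
    pi = social_learning_filter(pi, a)      # all agents update the public belief
    print(k, "action:", a, "public belief:", np.round(pi, 3))
# Once pi is informative enough, every y maps to the same action: an information
# cascade (herding) forms and the public belief freezes.
```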
Result: the optimal global decision policy is a multi-threshold policy in the posterior probability of change, and the stopping ("declare change") set is non-convex.

[Figure: posterior probability of change versus the global decision policy, showing the multi-threshold structure of the declare-change and no-change regions.]

Summary: global decision making using local decisions is non-monotone!
REFERENCES
• Krishnamurthy, Namvar and Hamdi, Interactive Sensing and Decision Making in Social Networks, Foundations and Trends in Signal Processing, 2014 (monograph).
• Krishnamurthy and Hoiles, Social Learning, Data Incest and Revealed Preferences, IEEE Journal Computational Social Systems, 2015.
• Krishnamurthy, Quickest Detection POMDPs with Social Learning, IEEE Transactions on Information Theory, 2012.
• Krishnamurthy and Yin, Tracking the Degree Distribution of Random Graphs, IEEE Transactions on Information Theory, 2014.
• Krishnamurthy and Bhatt, Sequential Detection of Market Shocks with Risk-Averse CVaR Social Sensors, IEEE Journal of Selected Topics in Signal Processing, 2016.