Announcements CS 188: Artificial Intelligence Probability Recap ...


Announcements
§  Project 3: MDPs and Reinforcement Learning
    §  Due Friday 3/6 at 5pm
§  Midterm 1
    §  Monday 3/9, 6:00-9:00pm
    §  [A-H] 155 Dwinelle
    §  [I-V] 150 Wheeler
    §  [W-Z] 145 Dwinelle
§  Preparation page up
    §  Topics: Lectures 1 through 11 (inclusive)
    §  Past exams
    §  Special midterm 1 office hours
§  Practice Midterm 1
    §  Optional
    §  One point of EC on Midterm 1 for completing
    §  Due: Saturday 3/7 at 11:59pm (submit into Gradescope)

CS 188: Artificial Intelligence

Bayes’ Nets: Independence

Instructors: Pieter Abbeel, University of California, Berkeley
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Probability Recap
§  Conditional probability: P(x | y) = P(x, y) / P(y)
§  Product rule: P(x, y) = P(x | y) P(y)
§  Chain rule: P(x_1, x_2, ..., x_n) = ∏_i P(x_i | x_1, ..., x_{i-1})
§  X, Y independent if and only if: ∀x, y: P(x, y) = P(x) P(y)
§  X and Y are conditionally independent given Z if and only if: ∀x, y, z: P(x, y | z) = P(x | z) P(y | z)

Bayes’ Nets
§  A Bayes’ net is an efficient encoding of a probabilistic model of a domain
§  Questions we can ask:
    §  Inference: given a fixed BN, what is P(X | e)?
    §  Representation: given a BN graph, what kinds of distributions can it encode?
    §  Modeling: what BN is most appropriate for a given domain?

Bayes’ Net Semantics
§  A directed, acyclic graph, one node per random variable
§  A conditional probability table (CPT) for each node
    §  A collection of distributions over X, one for each combination of parents’ values
§  Bayes’ nets implicitly encode joint distributions
    §  As a product of local conditional distributions
    §  To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together: P(x_1, x_2, ..., x_n) = ∏_i P(x_i | parents(X_i))

Example: Alarm Network
[Figure: Bayes’ net over nodes B, E, A, J, M with arcs B → A, E → A, A → J, A → M]

B     P(B)
+b    0.001
-b    0.999

E     P(E)
+e    0.002
-e    0.998

B     E     A     P(A|B,E)
+b    +e    +a    0.95
+b    +e    -a    0.05
+b    -e    +a    0.94
+b    -e    -a    0.06
-b    +e    +a    0.29
-b    +e    -a    0.71
-b    -e    +a    0.001
-b    -e    -a    0.999

A     J     P(J|A)
+a    +j    0.9
+a    -j    0.1
-a    +j    0.05
-a    -j    0.95

A     M     P(M|A)
+a    +m    0.7
+a    -m    0.3
-a    +m    0.01
-a    -m    0.99
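The "multiply all the relevant conditionals together" rule can be sketched in a few lines of Python. This is only an illustration: the CPT numbers are the ones in the alarm network tables, while the dictionary layout and the helper names `bern` and `joint` are assumptions made for this sketch.

```python
from itertools import product

# CPT entries from the alarm network tables; True stands for "+", False for "-".
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A_GIVEN_BE = {  # P(+a | b, e)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J_GIVEN_A = {True: 0.9, False: 0.05}  # P(+j | a)
P_M_GIVEN_A = {True: 0.7, False: 0.01}  # P(+m | a)


def bern(p_true, value):
    """Probability of a Boolean value when P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of one CPT entry per node."""
    return (P_B[b] * P_E[e]
            * bern(P_A_GIVEN_BE[(b, e)], a)
            * bern(P_J_GIVEN_A[a], j)
            * bern(P_M_GIVEN_A[a], m))


# P(+b, -e, +a, +j, +m) = 0.001 * 0.998 * 0.94 * 0.9 * 0.7
p = joint(True, False, True, True, True)

# The 32 products form a proper joint distribution, so they sum to 1.
total = sum(joint(*values) for values in product([True, False], repeat=5))
print(p, total)
```

Only the "+" entry of each Boolean CPT row is stored here, since the "-" entry is its complement.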

Size of a Bayes’ Net
§  How big is a joint distribution over N Boolean variables?
    2^N
§  How big is an N-node net if nodes have up to k parents?
    O(N * 2^(k+1))
§  Both give you the power to calculate P(X_1, X_2, ..., X_N)
§  BNs: Huge space savings!
§  Also easier to elicit local CPTs
§  Also faster to answer queries (coming)

Bayes’ Nets
§  Representation
§  Conditional Independences
§  Probabilistic Inference
§  Learning Bayes’ Nets from Data

Conditional Independence
§  X and Y are independent if ∀x, y: P(x, y) = P(x) P(y)
§  X and Y are conditionally independent given Z if ∀x, y, z: P(x, y | z) = P(x | z) P(y | z)
§  (Conditional) independence is a property of a distribution
§  Example:
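To make the two definitions concrete, here is a small numerical sketch. It is not the example from the original slide: the model (Z as a common parent of X and Y) and every number in it are hypothetical, chosen only so that X and Y come out dependent on their own but independent once Z is given.

```python
from itertools import product

# Hypothetical CPTs: Z is a common parent of X and Y (numbers are made up).
P_Z = {True: 0.3, False: 0.7}
P_X_GIVEN_Z = {True: 0.9, False: 0.2}  # P(+x | z)
P_Y_GIVEN_Z = {True: 0.8, False: 0.1}  # P(+y | z)


def bern(p_true, value):
    """Probability of a Boolean value when P(True) = p_true."""
    return p_true if value else 1.0 - p_true


# Full joint table P(x, y, z), keyed by (x, y, z).
JOINT = {
    (x, y, z): P_Z[z] * bern(P_X_GIVEN_Z[z], x) * bern(P_Y_GIVEN_Z[z], y)
    for x, y, z in product([True, False], repeat=3)
}


def marginal(joint, keep):
    """Sum out every variable whose position is not listed in keep."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out


def independent(joint, tol=1e-9):
    """Check P(x, y) = P(x) P(y) for all x, y."""
    p_xy = marginal(joint, (0, 1))
    p_x, p_y = marginal(joint, (0,)), marginal(joint, (1,))
    return all(abs(p - p_x[(x,)] * p_y[(y,)]) <= tol
               for (x, y), p in p_xy.items())


def conditionally_independent(joint, tol=1e-9):
    """Check P(x, y | z) = P(x | z) P(y | z) for all x, y, z.

    Written as P(x, y, z) P(z) = P(x, z) P(y, z) to avoid dividing by zero.
    """
    p_z = marginal(joint, (2,))
    p_xz, p_yz = marginal(joint, (0, 2)), marginal(joint, (1, 2))
    return all(abs(p * p_z[(z,)] - p_xz[(x, z)] * p_yz[(y, z)]) <= tol
               for (x, y, z), p in joint.items())


print(independent(JOINT), conditionally_independent(JOINT))
```

Both checks operate on the joint table alone, which matches the point that (conditional) independence is a property of a distribution.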

Bayes Nets: Assumptions
§  Assumptions we are required to make to define the Bayes net when given the graph: P(x_i | x_1, ..., x_{i-1}) = P(x_i | parents(X_i))
§  Beyond above “chain rule → Bayes net” conditional independence assumptions
    §  Often additional conditional independences
    §  They can be read off the graph
§  Important for modeling: understand assumptions made when choosing a Bayes net graph

Example
[Figure: Bayes’ net over nodes X, Y, Z, W]
§  Conditional independence assumptions directly from simplifications in chain rule:
§  Additional implied conditional independence assumptions?
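As a worked version of this example, assume the graph is the chain X → Y → Z → W. The arrows did not survive in the text, so this structure is an assumption. Writing the chain rule next to the Bayes' net factorization shows which terms were simplified:

```latex
\begin{align*}
P(x, y, z, w) &= P(x)\, P(y \mid x)\, P(z \mid x, y)\, P(w \mid x, y, z)
  && \text{(chain rule, always true)} \\
P(x, y, z, w) &= P(x)\, P(y \mid x)\, P(z \mid y)\, P(w \mid z)
  && \text{(Bayes' net for the assumed chain)}
\end{align*}
```

Matching the two lines term by term gives the assumptions made directly: P(z | x, y) = P(z | y), i.e. Z ⊥ X | Y, and P(w | x, y, z) = P(w | z), i.e. W ⊥ {X, Y} | Z. Under the same assumed structure, one additional implied independence is W ⊥ X | Y, since the only path from X to W passes through Y.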

Independence in a BN
§  Important question about a BN:
    §  Are two nodes independent given certain evidence?
    §  If yes, can prove using algebra (tedious in general)
    §  If no, can prove with a counter example
    §  Example: [Figure: X → Y → Z]
§  Question: are X and Z necessarily independent?
    §  Answer: no. Example: low pressure causes rain, which causes traffic.
    §  X can influence Z, Z can influence X (via Y)
    §  Addendum: they could be independent: how?

D-separation: Outline
§  Study independence properties for triples
§  Analyze complex cases in terms of member triples
§  D-separation: a condition / algorithm for answering such queries

Causal Chains
§  This configuration is a “causal chain”
    X: Low pressure    Y: Rain    Z: Traffic
§  Guaranteed X independent of Z? No!
    §  One example set of CPTs for which X is not independent of Z is sufficient to show this independence is not guaranteed.
    §  Example:
        §  Low pressure causes rain causes traffic, high pressure causes no rain causes no traffic
        §  In numbers: P( +y | +x ) = 1, P( -y | -x ) = 1, P( +z | +y ) = 1, P( -z | -y ) = 1
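The counterexample can be checked numerically with a short sketch. The four conditional probabilities are the ones given "in numbers" above; the prior P(+x) = 0.5 is an assumption added here, because the slide does not give one.

```python
# Causal chain X -> Y -> Z with the deterministic CPTs from the slide.
P_LOW_PRESSURE = 0.5  # assumed prior P(+x); not given on the slide


def joint(x, y, z):
    """P(x, y, z) = P(x) P(y | x) P(z | y) for the chain."""
    p_x = P_LOW_PRESSURE if x else 1.0 - P_LOW_PRESSURE
    p_y_given_x = 1.0 if y == x else 0.0  # P(+y | +x) = 1, P(-y | -x) = 1
    p_z_given_y = 1.0 if z == y else 0.0  # P(+z | +y) = 1, P(-z | -y) = 1
    return p_x * p_y_given_x * p_z_given_y


VALUES = (True, False)

# Marginal probability of traffic, P(+z).
p_z = sum(joint(x, y, True) for x in VALUES for y in VALUES)

# Probability of traffic given low pressure, P(+z | +x).
p_x = sum(joint(True, y, z) for y in VALUES for z in VALUES)
p_z_given_x = sum(joint(True, y, True) for y in VALUES) / p_x

print(p_z, p_z_given_x)  # 0.5 1.0
```

Since P(+z | +x) differs from P(+z), X and Z are not independent under these CPTs, which is all the counterexample needs.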

Causal Chains
§  This configuration is a “causal chain”
    X: Low pressure    Y: Rain    Z: Traffic
§  Guaranteed X independent of Z given Y? Yes!
    §  P(z | x, y) = P(x, y, z) / P(x, y) = P(x) P(y | x) P(z | y) / (P(x) P(y | x)) = P(z | y)
§  Evidence along the chain “blocks” the influence

Common Cause
§  This configuration is a “common cause”
    Y: Project due    X: Forums busy    Z: Lab full
§  Guaranteed X independent of Z? No!
    §  One example set of CPTs for which X is not independent of Z is sufficient to show this independence is not guaranteed.
    §  Example:
        §  Project due causes both forums busy and lab full
        §  In numbers: P( +x | +y ) = 1, P( -x | -y ) = 1, P( +z | +y ) = 1, P( -z | -y ) = 1
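Plugging the common-cause numbers into the factorization P(x, y, z) = P(y) P(x | y) P(z | y) makes the dependence explicit. The prior P(+y) = 1/2 is an assumption for illustration, since the slide does not give one; P(+x | -y) = 0 and P(+z | -y) = 0 follow from the stated P(-x | -y) = 1 and P(-z | -y) = 1.

```latex
\begin{align*}
P(+x, +z) &= \sum_{y} P(y)\, P(+x \mid y)\, P(+z \mid y)
           = \tfrac{1}{2} \cdot 1 \cdot 1 + \tfrac{1}{2} \cdot 0 \cdot 0 = \tfrac{1}{2} \\
P(+x) &= \sum_{y} P(y)\, P(+x \mid y) = \tfrac{1}{2}, \qquad
P(+z) = \sum_{y} P(y)\, P(+z \mid y) = \tfrac{1}{2} \\
P(+x)\, P(+z) &= \tfrac{1}{4} \neq \tfrac{1}{2} = P(+x, +z)
\end{align*}
```

So under these CPTs, busy forums and a full lab are not independent when nothing is known about whether a project is due.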

Common Cause
§  This configuration is a “common cause”
    Y: Project due    X: Forums busy    Z: Lab full
§  Guaranteed X and Z independent given Y? Yes!
    §  P(z | x, y) = P(x, y, z) / P(x, y) = P(y) P(x | y) P(z | y) / (P(y) P(x | y)) = P(z | y)
§  Observing the cause blocks influence between effects.

Common Effect
§  Last configuration: two causes of one effect (v-structures)
    X: Raining    Y: Ballgame    Z: Traffic
§  Are X and Y independent?
    §  Yes: the ballgame and the rain cause traffic, but they are not correlated
    §  Still need to prove they must be (try it!)
§  Are X and Y independent given Z?
    §  No: seeing traffic puts the rain and the ballgame in competition as explanation.
§  This is backwards from the other cases
    §  Observing an effect activates influence between possible causes.
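The "competition as explanation" effect can be seen in a small enumeration sketch of the v-structure X → Z ← Y. The slides give no CPTs for this example, so every number below is hypothetical, and `prob` is a helper name introduced only for this illustration.

```python
from itertools import product

# Hypothetical CPTs for Raining (x) -> Traffic (z) <- Ballgame (y).
P_RAIN = 0.3      # P(+x), made up
P_BALLGAME = 0.2  # P(+y), made up
P_TRAFFIC = {     # P(+z | x, y), made up
    (True, True): 0.95, (True, False): 0.8,
    (False, True): 0.7, (False, False): 0.1,
}


def bern(p_true, value):
    """Probability of a Boolean value when P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(x, y, z):
    """P(x, y, z) = P(x) P(y) P(z | x, y) for the v-structure."""
    return bern(P_RAIN, x) * bern(P_BALLGAME, y) * bern(P_TRAFFIC[(x, y)], z)


def prob(query, evidence=None):
    """P(query | evidence) by enumeration; both are dicts over 'x', 'y', 'z'."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for x, y, z in product([True, False], repeat=3):
        world = {"x": x, "y": y, "z": z}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(x, y, z)
            denominator += p
            if all(world[k] == v for k, v in query.items()):
                numerator += p
    return numerator / denominator


rain_prior = prob({"x": True})
rain_given_game = prob({"x": True}, {"y": True})
rain_given_traffic = prob({"x": True}, {"z": True})
rain_given_traffic_and_game = prob({"x": True}, {"z": True, "y": True})

print(rain_prior, rain_given_game)
print(rain_given_traffic, rain_given_traffic_and_game)
```

With no evidence on traffic, learning about the ballgame leaves the probability of rain unchanged. Once traffic is observed, also learning that there is a ballgame lowers the probability of rain, because the ballgame already explains the traffic.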

The General Case
§  General question: in a given BN, are two variables independent (given evidence)?
§  Solution: analyze the graph
§  Any complex example can be broken into repetitions of the three canonical cases

Reachability
§  Recipe: shade evidence nodes, look for paths in the resulting graph
§  Attempt 1: if every undirected path between two nodes is blocked by a shaded node, they are conditionally independent
§  Almost works, but not quite
    §  Where does it break?
    §  Answer: the v-structure at T doesn’t count as a link in a path unless “active”
[Figure: Bayes’ net over nodes L, R, B, D, T]

Active / Inactive Paths
§  Question: Are X and Y conditionally independent given evidence variables {Z}?
    §  Yes, if X and Y “d-separated” by Z
    §  Consider all (undirected) paths from X to Y
    §  No active paths = independence!
§  A path is active if each triple is active:
    §  Causal chain A → B → C where B is unobserved (either direction)
    §  Common cause A ← B → C where B is unobserved
    §  Common effect (aka v-structure) A → B ← C where B or one of its descendants is observed
§  All it takes to block a path is a single inactive segment
[Figure: Active Triples / Inactive Triples]
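The active-path rules above can be turned into a direct, if naive, check: walk every simple undirected path from X to Y and stop extending a path as soon as one of its triples is inactive. The sketch below does that for small graphs. The function name `d_separated`, the edge-list representation, and the example graph (arcs L → R, R → D, R → T, B → T, T → T') are assumptions made for illustration; the arrows of the slide's figure did not survive in the text.

```python
def d_separated(edges, x, y, evidence):
    """Return True if x and y are d-separated given the evidence set.

    edges is a list of (parent, child) pairs; x and y must not be evidence.
    """
    evidence = set(evidence)
    parents, children = {}, {}
    for parent, child in edges:
        children.setdefault(parent, set()).add(child)
        parents.setdefault(child, set()).add(parent)

    def descendants(node):
        seen, stack = set(), [node]
        while stack:
            for child in children.get(stack.pop(), ()):
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen

    def triple_active(a, b, c):
        if a in parents.get(b, ()) and c in parents.get(b, ()):
            # Common effect a -> b <- c: active if b or a descendant is observed.
            return b in evidence or bool(descendants(b) & evidence)
        # Causal chain (either direction) or common cause: active if b is unobserved.
        return b not in evidence

    def active_path_from(path):
        last = path[-1]
        if last == y:
            return True
        for nxt in parents.get(last, set()) | children.get(last, set()):
            if nxt in path:
                continue  # only simple paths
            if len(path) >= 2 and not triple_active(path[-2], last, nxt):
                continue  # a single inactive triple blocks this path
            if active_path_from(path + [nxt]):
                return True
        return False

    return not active_path_from([x])


# Assumed example graph (hypothetical arcs, see the note above).
EDGES = [("L", "R"), ("R", "D"), ("R", "T"), ("B", "T"), ("T", "T'")]

print(d_separated(EDGES, "L", "T'", {"T"}))     # chain blocked at T
print(d_separated(EDGES, "L", "B", set()))      # v-structure at T inactive
print(d_separated(EDGES, "L", "B", {"T"}))      # observing T activates it
print(d_separated(EDGES, "L", "B", {"T'"}))     # so does a descendant of T
print(d_separated(EDGES, "L", "B", {"T", "R"})) # observing R blocks the chain
```

Enumerating simple paths is exponential in the worst case, so this is meant for reading alongside the rules rather than for large networks.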

D-Separation
§  Query: X_i ⊥ X_j | {X_k1, ..., X_kn} ?
§  Check all (undirected!) paths between X_i and X_j
    §  If one or more active, then independence not guaranteed
    §  Otherwise (i.e. if all paths are inactive), then independence is guaranteed: X_i ⊥ X_j | {X_k1, ..., X_kn}

Example
[Figure: Bayes’ net over nodes R, B, T, T’ with d-separation queries; recoverable answer: Yes]

Example
[Figure: Bayes’ net over nodes L, R, D, B, T, T’ with d-separation queries; recoverable answers: Yes, Yes, Yes]

Example
§  Variables:
    §  R: Raining
    §  T: Traffic
    §  D: Roof drips
    §  S: I’m sad
§  Questions:
[Figure: Bayes’ net over nodes R, T, D, S with d-separation queries; recoverable answer: Yes]

Structure Implications
§  Given a Bayes net structure, can run d-separation algorithm to build a complete list of conditional independences that are necessarily true of the form X_i ⊥ X_j | {X_k1, ..., X_kn}
§  This list determines the set of probability distributions that can be represented

Computing All Independences
[Figure: four three-node graphs over X, Y, Z]
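Building the complete list for a small graph can be sketched as a brute-force loop over all pairs and all evidence sets. For variety, the d-separation test below uses the equivalent "moralized ancestral graph" criterion rather than the path-by-path check described on the slides: keep only the queried nodes and their ancestors, connect co-parents, drop arc directions, remove the evidence, and test reachability. The function names and the three example graphs are assumptions for illustration.

```python
from itertools import combinations


def d_separated(parents, x, y, given):
    """d-separation via the moralized ancestral graph.

    parents maps every node to a tuple of its parents.
    """
    # 1. Keep x, y, the evidence, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents[node])
    # 2. Moralize: link each node to its parents and its parents to each other.
    adjacent = {node: set() for node in keep}
    for node in keep:
        for p in parents[node]:
            adjacent[node].add(p)
            adjacent[p].add(node)
        for p, q in combinations(parents[node], 2):
            adjacent[p].add(q)
            adjacent[q].add(p)
    # 3. Remove evidence nodes and test whether y is reachable from x.
    seen, stack = {x}, [x]
    while stack:
        for nxt in adjacent[stack.pop()]:
            if nxt not in seen and nxt not in given:
                seen.add(nxt)
                stack.append(nxt)
    return y not in seen


def all_independences(parents):
    """List every (a, b, evidence) with a and b d-separated by evidence."""
    nodes = sorted(parents)
    found = []
    for a, b in combinations(nodes, 2):
        rest = [n for n in nodes if n not in (a, b)]
        for size in range(len(rest) + 1):
            for given in combinations(rest, size):
                if d_separated(parents, a, b, given):
                    found.append((a, b, given))
    return found


# Hypothetical three-node structures.
CHAIN = {"X": (), "Y": ("X",), "Z": ("Y",)}        # X -> Y -> Z
COLLIDER = {"X": (), "Z": (), "Y": ("X", "Z")}     # X -> Y <- Z
NO_ARCS = {"X": (), "Y": (), "Z": ()}

print(all_independences(CHAIN))
print(all_independences(COLLIDER))
print(len(all_independences(NO_ARCS)))
```

The loop visits every subset of the remaining nodes as evidence, so it is only practical for very small graphs.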

Topology Limits Distributions
§  Given some graph topology G, only certain joint distributions can be encoded
§  The graph structure guarantees certain (conditional) independences
§  (There might be more independence)
§  Adding arcs increases the set of distributions, but has several costs
§  Full conditioning can encode any distribution
[Figure: three-node graphs over X, Y, Z, grouped by the independences they guarantee: {X ⊥ Y, X ⊥ Z, Y ⊥ Z, X ⊥ Z | Y, X ⊥ Y | Z, Y ⊥ Z | X}, {X ⊥ Z | Y}, {}]

Bayes Nets Representation Summary
§  Bayes nets compactly encode joint distributions
§  Guaranteed independencies of distributions can be deduced from BN graph structure
§  D-separation gives precise conditional independence guarantees from graph alone
§  A Bayes’ net’s joint distribution may have further (conditional) independence that is not detectable until you inspect its specific distribution

Bayes’ Nets
§  Representation
§  Conditional Independences
§  Probabilistic Inference
    §  Enumeration (exact, exponential complexity)
    §  Variable elimination (exact, worst-case exponential complexity, often better)
    §  Probabilistic inference is NP-complete
    §  Sampling (approximate)
§  Learning Bayes’ Nets from Data
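As a preview of enumeration, the sketch below answers one query on the alarm network by summing the joint over every assignment consistent with the evidence. The CPT numbers come from the alarm network tables earlier in these slides; the query P(+b | +j, +m), the variable ordering, and the function names are choices made for this illustration.

```python
from itertools import product

# Alarm network CPTs (True stands for "+", False for "-").
P_B = 0.001  # P(+b)
P_E = 0.002  # P(+e)
P_A = {      # P(+a | b, e)
    (True, True): 0.95, (True, False): 0.94,
    (False, True): 0.29, (False, False): 0.001,
}
P_J = {True: 0.9, False: 0.05}  # P(+j | a)
P_M = {True: 0.7, False: 0.01}  # P(+m | a)

VARIABLES = ("B", "E", "A", "J", "M")


def bern(p_true, value):
    """Probability of a Boolean value when P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(w):
    """Joint probability of a full assignment w (dict from variable to bool)."""
    return (bern(P_B, w["B"]) * bern(P_E, w["E"])
            * bern(P_A[(w["B"], w["E"])], w["A"])
            * bern(P_J[w["A"]], w["J"]) * bern(P_M[w["A"]], w["M"]))


def enumerate_query(query_var, evidence):
    """P(query_var = True | evidence), summing over all 2^5 assignments."""
    totals = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(VARIABLES)):
        world = dict(zip(VARIABLES, values))
        if all(world[k] == v for k, v in evidence.items()):
            totals[world[query_var]] += joint(world)
    return totals[True] / (totals[True] + totals[False])


p_burglary = enumerate_query("B", {"J": True, "M": True})
print(round(p_burglary, 3))  # 0.284
```

The loop touches every row of the full joint, which is where the exponential cost of enumeration comes from.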

Good luck on MT1 on Monday!