CS 188: Artificial Intelligence
Bayes' Nets: Independence
Instructors: Pieter Abbeel --- University of California, Berkeley
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Announcements
§ Project 3: MDPs and Reinforcement Learning § Due Friday 3/6 at 5pm
§ Midterm 1 § Monday 3/9, 6:00-9:00pm § [A-H] 155 Dwinelle § [I-V] 150 Wheeler § [W-Z] 145 Dwinelle
§ Preparation page up § Topics: Lectures 1 through 11 (inclusive) § Past exams § Special midterm 1 office hours
§ Practice Midterm 1 § Optional § One point of EC on Midterm 1 for completing § Due: Saturday 3/7 at 11:59pm (submit into Gradescope)
Probability Recap
§ Conditional probability: P(x | y) = P(x, y) / P(y)
§ Product rule: P(x, y) = P(x | y) P(y)
§ Chain rule: P(x1, x2, ..., xn) = P(x1) P(x2 | x1) P(x3 | x1, x2) ... = ∏_i P(xi | x1, ..., x(i-1))
§ X, Y independent if and only if: ∀x, y: P(x, y) = P(x) P(y)
§ X and Y are conditionally independent given Z if and only if: ∀x, y, z: P(x, y | z) = P(x | z) P(y | z)

Bayes' Nets
§ A Bayes' net is an efficient encoding of a probabilistic model of a domain
§ Questions we can ask:
  § Inference: given a fixed BN, what is P(X | e)?
  § Representation: given a BN graph, what kinds of distributions can it encode?
  § Modeling: what BN is most appropriate for a given domain?
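The independence definition above can be checked mechanically on a small joint table. A minimal sketch (the distribution here is made up for illustration, chosen so that X and Y happen to be independent):

```python
from itertools import product

# Hypothetical joint distribution over two binary variables X and Y,
# built as a product of marginals so independence holds by construction.
P_X = {"+x": 0.3, "-x": 0.7}
P_Y = {"+y": 0.6, "-y": 0.4}
joint = {(x, y): P_X[x] * P_Y[y] for x, y in product(P_X, P_Y)}

def independent(joint, xs, ys, tol=1e-9):
    """Check X ⊥ Y by testing P(x, y) = P(x) P(y) for every (x, y)."""
    px = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    py = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(abs(joint[(x, y)] - px[x] * py[y]) < tol
               for x, y in product(xs, ys))

print(independent(joint, list(P_X), list(P_Y)))  # True by construction
```

The same test with any joint where X and Y co-vary returns False, since some P(x, y) then differs from P(x) P(y).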
Bayes' Net Semantics
§ A directed, acyclic graph, one node per random variable
§ A conditional probability table (CPT) for each node
  § A collection of distributions over X, one for each combination of parents' values
§ Bayes' nets implicitly encode joint distributions
  § As a product of local conditional distributions
  § To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:
    P(x1, x2, ..., xn) = ∏_i P(xi | parents(Xi))

Example: Alarm Network
§ Graph: B → A ← E, A → J, A → M

  B    P(B)         E    P(E)
  +b   0.001        +e   0.002
  -b   0.999        -e   0.998

  B    E    A    P(A|B,E)
  +b   +e   +a   0.95
  +b   +e   -a   0.05
  +b   -e   +a   0.94
  +b   -e   -a   0.06
  -b   +e   +a   0.29
  -b   +e   -a   0.71
  -b   -e   +a   0.001
  -b   -e   -a   0.999

  A    J    P(J|A)       A    M    P(M|A)
  +a   +j   0.9          +a   +m   0.7
  +a   -j   0.1          +a   -m   0.3
  -a   +j   0.05         -a   +m   0.01
  -a   -j   0.95         -a   -m   0.99
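The semantics can be sanity-checked directly: multiply the relevant conditionals from the alarm network's CPTs to get the probability of a full assignment. A minimal sketch in Python:

```python
# CPTs from the alarm network (B, E are parents of A; A is parent of J and M)
P_B = {"+b": 0.001, "-b": 0.999}
P_E = {"+e": 0.002, "-e": 0.998}
P_A = {("+b", "+e"): 0.95, ("+b", "-e"): 0.94,
       ("-b", "+e"): 0.29, ("-b", "-e"): 0.001}   # P(+a | b, e)
P_J = {"+a": 0.9, "-a": 0.05}                      # P(+j | a)
P_M = {"+a": 0.7, "-a": 0.01}                      # P(+m | a)

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of the local conditionals."""
    p = P_B[b] * P_E[e]
    pa = P_A[(b, e)]
    p *= pa if a == "+a" else 1 - pa
    pj = P_J[a]
    p *= pj if j == "+j" else 1 - pj
    pm = P_M[a]
    p *= pm if m == "+m" else 1 - pm
    return p

# e.g. P(+b, -e, +a, +j, +m) = 0.001 * 0.998 * 0.94 * 0.9 * 0.7
print(joint("+b", "-e", "+a", "+j", "+m"))
```

Summing `joint` over all 32 assignments gives 1, as it must for a valid distribution.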
Size of a Bayes' Net
§ How big is a joint distribution over N Boolean variables? 2^N
§ How big is an N-node Bayes' net if nodes have up to k parents? O(N * 2^(k+1))
§ Both give you the power to calculate P(X1, X2, ..., XN)
§ BNs: Huge space savings!
§ Also easier to elicit local CPTs
§ Also faster to answer queries (coming)

Bayes' Nets
§ Representation
§ Conditional Independences
§ Probabilistic Inference
§ Learning Bayes' Nets from Data

Conditional Independence
§ X and Y are independent if ∀x, y: P(x, y) = P(x) P(y)
§ X and Y are conditionally independent given Z if ∀x, y, z: P(x, y | z) = P(x | z) P(y | z)
§ (Conditional) independence is a property of a distribution
§ Example:
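The conditional independence definition can also be tested mechanically against an explicit joint table. A minimal sketch, with a hypothetical joint built as P(z) P(x|z) P(y|z) so that X ⊥ Y | Z holds by construction:

```python
from itertools import product

def cond_independent(joint, tol=1e-9):
    """Check X ⊥ Y | Z for a joint table P(x, y, z) keyed by (x, y, z)."""
    xs = {x for x, _, _ in joint}
    ys = {y for _, y, _ in joint}
    zs = {z for _, _, z in joint}
    for z in zs:
        pz = sum(joint[(x, y, z)] for x in xs for y in ys)
        for x, y in product(xs, ys):
            pxz = sum(joint[(x, yy, z)] for yy in ys)   # P(x, z)
            pyz = sum(joint[(xx, y, z)] for xx in xs)   # P(y, z)
            # Test P(x, y | z) = P(x | z) P(y | z)
            if abs(joint[(x, y, z)] / pz - (pxz / pz) * (pyz / pz)) > tol:
                return False
    return True

# Hypothetical CPT numbers chosen for illustration:
pz = {"+z": 0.5, "-z": 0.5}
px = {"+z": 0.8, "-z": 0.2}   # P(+x | z)
py = {"+z": 0.9, "-z": 0.1}   # P(+y | z)
joint = {(x, y, z): pz[z]
         * (px[z] if x == "+x" else 1 - px[z])
         * (py[z] if y == "+y" else 1 - py[z])
         for x, y, z in product(("+x", "-x"), ("+y", "-y"), ("+z", "-z"))}
print(cond_independent(joint))  # True by construction
```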
Bayes Nets: Assumptions
§ Assumptions we are required to make to define the Bayes net when given the graph:
    P(xi | x1 ··· x(i-1)) = P(xi | parents(Xi))
§ Beyond the above "chain rule → Bayes net" conditional independence assumptions:
  § Often additional conditional independences
  § They can be read off the graph
§ Important for modeling: understand assumptions made when choosing a Bayes net graph

Example: chain X → Y → Z → W
§ Conditional independence assumptions directly from simplifications in chain rule:
§ Additional implied conditional independence assumptions?
Independence in a BN
§ Important question about a BN:
  § Are two nodes independent given certain evidence?
  § If yes, can prove using algebra (tedious in general)
  § If no, can prove with a counter example
  § Example: X → Y → Z
    § Question: are X and Z necessarily independent?
    § Answer: no. Example: low pressure causes rain, which causes traffic.
    § X can influence Z, Z can influence X (via Y)
    § Addendum: they could be independent: how?

D-separation: Outline
§ Study independence properties for triples
§ Analyze complex cases in terms of member triples
§ D-separation: a condition / algorithm for answering such queries

Causal Chains
§ This configuration is a "causal chain": X → Y → Z
  § X: Low pressure, Y: Rain, Z: Traffic
§ Guaranteed X independent of Z? No!
  § One example set of CPTs for which X is not independent of Z is sufficient to show this independence is not guaranteed.
  § Example: low pressure causes rain causes traffic; high pressure causes no rain causes no traffic
  § In numbers: P(+y | +x) = 1, P(-y | -x) = 1, P(+z | +y) = 1, P(-z | -y) = 1
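The counterexample above can be worked out numerically. This sketch uses the slide's deterministic CPTs; the prior P(+x) = 0.5 is an extra assumption needed to complete the example:

```python
# Chain X -> Y -> Z with the slide's deterministic CPTs.
# Assumed prior for the demo: P(+x) = 0.5.
p_x_pos = 0.5
P_y = {"+x": 1.0, "-x": 0.0}   # P(+y | x)
P_z = {"+y": 1.0, "-y": 0.0}   # P(+z | y)

def p_joint(x, y, z):
    """P(x, y, z) = P(x) P(y | x) P(z | y)."""
    px = p_x_pos if x == "+x" else 1 - p_x_pos
    py = P_y[x] if y == "+y" else 1 - P_y[x]
    pz = P_z[y] if z == "+z" else 1 - P_z[y]
    return px * py * pz

vals = ("+", "-")
p_xz = sum(p_joint("+x", y + "y", "+z") for y in vals)                  # P(+x, +z)
p_z = sum(p_joint(x + "x", y + "y", "+z") for x in vals for y in vals)  # P(+z)
print(p_xz, p_x_pos * p_z)   # 0.5 vs 0.25: X and Z are NOT independent
```

Since P(+x, +z) = 0.5 but P(+x) P(+z) = 0.25, this single set of CPTs shows the chain does not guarantee X ⊥ Z.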
Causal Chains
§ This configuration is a "causal chain": X → Y → Z
  § X: Low pressure, Y: Rain, Z: Traffic
§ Guaranteed X independent of Z given Y? Yes!
  § P(z | x, y) = P(x) P(y | x) P(z | y) / (P(x) P(y | x)) = P(z | y)
  § Evidence along the chain "blocks" the influence

Common Cause
§ This configuration is a "common cause": X ← Y → Z
  § Y: Project due, X: Forums busy, Z: Lab full
§ Guaranteed X independent of Z? No!
  § One example set of CPTs for which X is not independent of Z is sufficient to show this independence is not guaranteed.
  § Example: project due causes both forums busy and lab full
  § In numbers: P(+x | +y) = 1, P(-x | -y) = 1, P(+z | +y) = 1, P(-z | -y) = 1
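The common-cause pattern can also be checked numerically. The CPT numbers below are assumptions made up for the demo (noisy rather than the slide's deterministic ones, so the contrast between marginal dependence and conditional independence is visible):

```python
# Common cause X <- Y -> Z. All numbers here are assumed for illustration.
P_y = 0.5                              # P(+y)
P_x_given = {"+y": 0.8, "-y": 0.2}     # P(+x | y)
P_z_given = {"+y": 0.9, "-y": 0.1}     # P(+z | y)

def p(x, y, z):
    """P(x, y, z) = P(y) P(x | y) P(z | y)."""
    py = P_y if y == "+y" else 1 - P_y
    px = P_x_given[y] if x == "+x" else 1 - P_x_given[y]
    pz = P_z_given[y] if z == "+z" else 1 - P_z_given[y]
    return py * px * pz

# Marginally, X and Z are dependent:
p_xz = sum(p("+x", y, "+z") for y in ("+y", "-y"))                       # 0.37
p_x = sum(p("+x", y, z) for y in ("+y", "-y") for z in ("+z", "-z"))     # 0.5
p_z = sum(p(x, y, "+z") for x in ("+x", "-x") for y in ("+y", "-y"))     # 0.5
print(p_xz, p_x * p_z)       # 0.37 vs 0.25: dependent

# Given Y, they become independent: P(+x, +z | +y) = P(+x | +y) P(+z | +y)
p_xz_y = p("+x", "+y", "+z") / P_y
print(p_xz_y, 0.8 * 0.9)     # both 0.72: observing the cause blocks the influence
```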
Common Cause
§ This configuration is a "common cause": X ← Y → Z
  § Y: Project due, X: Forums busy, Z: Lab full
§ Guaranteed X and Z independent given Y? Yes!
  § Observing the cause blocks influence between effects.

Common Effect
§ Last configuration: two causes of one effect (v-structures): X → Z ← Y
  § X: Raining, Y: Ballgame, Z: Traffic
§ Are X and Y independent?
  § Yes: the ballgame and the rain cause traffic, but they are not correlated
  § Still need to prove they must be (try it!)
§ Are X and Y independent given Z?
  § No: seeing traffic puts the rain and the ballgame in competition as explanation.
  § This is backwards from the other cases
  § Observing an effect activates influence between possible causes.
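The "explaining away" effect can be made concrete with numbers. All CPT values below are assumptions chosen for this demo (the slides give none for this example):

```python
# Common effect X -> Z <- Y (rain and ballgame both cause traffic).
# Priors and CPT numbers are assumed for illustration; X and Y are
# independent a priori.
P_x, P_y = 0.1, 0.1                              # P(+x), P(+y)
P_z = {("+x", "+y"): 0.95, ("+x", "-y"): 0.9,
       ("-x", "+y"): 0.8,  ("-x", "-y"): 0.1}    # P(+z | x, y)

def p(x, y, z):
    """P(x, y, z) = P(x) P(y) P(z | x, y)."""
    px = P_x if x == "+x" else 1 - P_x
    py = P_y if y == "+y" else 1 - P_y
    pz = P_z[(x, y)] if z == "+z" else 1 - P_z[(x, y)]
    return px * py * pz

xs = ("+x", "-x")
p_z_pos = sum(p(x, y, "+z") for x in xs for y in ("+y", "-y"))
p_x_given_z = sum(p("+x", y, "+z") for y in ("+y", "-y")) / p_z_pos
p_x_given_zy = p("+x", "+y", "+z") / sum(p(x, "+y", "+z") for x in xs)

print(round(p_x_given_z, 3))    # ~0.372: seeing traffic raises belief in rain
print(round(p_x_given_zy, 3))   # ~0.117: the ballgame "explains away" the traffic
```

Observing the effect (traffic) makes the two causes dependent: once the ballgame is also observed, belief in rain drops back down.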
The General Case
§ General question: in a given BN, are two variables independent (given evidence)?
§ Solution: analyze the graph
§ Any complex example can be broken into repetitions of the three canonical cases

Reachability
§ Recipe: shade evidence nodes, look for paths in the resulting graph
  [figure: example net over nodes R, B, D, L, T]
§ Attempt 1: if two nodes are connected by an undirected path not blocked by a shaded node, they are conditionally independent
  § Almost works, but not quite
  § Where does it break?
  § Answer: the v-structure at T doesn't count as a link in a path unless "active"

Active / Inactive Paths
§ Question: Are X and Y conditionally independent given evidence variables {Z}?
  § Yes, if X and Y are "d-separated" by Z
  § Consider all (undirected) paths from X to Y
  § No active paths = independence!
§ A path is active if each triple is active:
  § Causal chain A → B → C where B is unobserved (either direction)
  § Common cause A ← B → C where B is unobserved
  § Common effect (aka v-structure) A → B ← C where B or one of its descendants is observed
§ All it takes to block a path is a single inactive segment
  [figure: active triples vs. inactive triples]
D-Separation
§ Query: Xi ⊥⊥ Xj | {Xk1, ..., Xkn} ?
§ Check all (undirected!) paths between Xi and Xj
  § If one or more active, then independence is not guaranteed: Xi ⊥̸⊥ Xj | {Xk1, ..., Xkn}
  § Otherwise (i.e. if all paths are inactive), then independence is guaranteed: Xi ⊥⊥ Xj | {Xk1, ..., Xkn}

Example
  [figure: net over R, B, T, T' with d-separation queries; answer "Yes" for the d-separated query]

Example
  [figure: net over L, R, B, D, T, T' with d-separation queries; answers "Yes" for the three d-separated queries]

Example
§ Variables:
  § R: Raining
  § T: Traffic
  § D: Roof drips
  § S: I'm sad
§ Questions:
  [figure: net over R, T, D, S with queries; answer "Yes" for the d-separated query]

Structure Implications
§ Given a Bayes net structure, can run the d-separation algorithm to build a complete list of conditional independences that are necessarily true, of the form
    Xi ⊥⊥ Xj | {Xk1, ..., Xkn}
§ This list determines the set of probability distributions that can be represented

Computing All Independences
  [figure: the three-node networks over X, Y, Z, each with its list of implied conditional independences]
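The d-separation procedure above can be sketched directly: enumerate the simple undirected paths between two nodes and test every triple along each path against the three canonical cases. This is adequate for small teaching examples like the ones here (real systems use Bayes-ball-style reachability instead of path enumeration):

```python
def d_separated(edges, x, y, evidence):
    """Decide whether x and y are d-separated by `evidence` (a set) in the
    DAG given by `edges` (a set of directed (parent, child) pairs)."""
    nodes = {n for edge in edges for n in edge}
    children = {n: {c for p, c in edges if p == n} for n in nodes}
    parents = {n: {p for p, c in edges if c == n} for n in nodes}
    neighbors = {n: children[n] | parents[n] for n in nodes}

    def descendants(n):
        seen, stack = set(), [n]
        while stack:
            for c in children[stack.pop()]:
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return seen

    def triple_active(a, b, c):
        if b in children[a] and c in children[b]:    # causal chain a -> b -> c
            return b not in evidence
        if b in children[c] and a in children[b]:    # causal chain a <- b <- c
            return b not in evidence
        if a in children[b] and c in children[b]:    # common cause a <- b -> c
            return b not in evidence
        # common effect (v-structure) a -> b <- c:
        # active iff b or one of its descendants is observed
        return b in evidence or bool(descendants(b) & evidence)

    def active_path(path):
        return all(triple_active(path[i], path[i + 1], path[i + 2])
                   for i in range(len(path) - 2))

    def dfs(current, path):
        if current == y:
            return active_path(path)
        return any(dfs(n, path + [n])
                   for n in neighbors[current] if n not in path)

    return not dfs(x, [x])

# The example net from the slides: R -> T <- B, T -> T'
edges = {("R", "T"), ("B", "T"), ("T", "T'")}
print(d_separated(edges, "R", "B", set()))     # True: v-structure at T is inactive
print(d_separated(edges, "R", "B", {"T'"}))    # False: T's descendant is observed
```

Running the query over all pairs and evidence sets builds exactly the "complete list of conditional independences" described above.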
Topology Limits Distributions
§ Given some graph topology G, only certain joint distributions can be encoded
§ The graph structure guarantees certain (conditional) independences
§ (There might be more independence)
§ Adding arcs increases the set of distributions, but has several costs
§ Full conditioning can encode any distribution
  [figure: graphs over X, Y, Z with their guaranteed independence sets, e.g. {X ⊥⊥ Z | Y}, {X ⊥⊥ Y, X ⊥⊥ Z, Y ⊥⊥ Z, X ⊥⊥ Z | Y, X ⊥⊥ Y | Z, Y ⊥⊥ Z | X}, and {} for the fully connected graphs]

Bayes Nets Representation Summary
§ Bayes nets compactly encode joint distributions
§ Guaranteed independencies of distributions can be deduced from BN graph structure
§ D-separation gives precise conditional independence guarantees from graph alone
§ A Bayes' net's joint distribution may have further (conditional) independence that is not detectable until you inspect its specific distribution

Bayes' Nets
§ Representation
§ Conditional Independences
§ Probabilistic Inference
  § Enumeration (exact, exponential complexity)
  § Variable elimination (exact, worst-case exponential complexity, often better)
  § Probabilistic inference is NP-complete
  § Sampling (approximate)
§ Learning Bayes' Nets from Data
Good luck on MT1 on Monday!