CS 188: Artificial Intelligence
Bayes' Nets: Inference

Instructors: Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Announcements
§ Midterm 1
  § Solutions posted on Piazza
  § Grades available on Gradescope
  § Regrade request window: today/Thursday 11:59pm – Sunday 3/15 11:59pm
§ Homework 6
  § Due: Monday at 11:59pm
§ Project 4 – NEW!
  § Due: Friday 3/20 at 5pm
Bayes' Net Representation
§ A directed, acyclic graph, one node per random variable
§ A conditional probability table (CPT) for each node
  § A collection of distributions over X, one for each combination of parents' values
§ Bayes' nets implicitly encode joint distributions
  § As a product of local conditional distributions
  § To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:
    P(x1, x2, …, xn) = ∏_i P(xi | parents(Xi))

Example: Alarm Network
Nodes: Burglary (B), Earthquake (E), Alarm (A), John calls (J), Mary calls (M)

  B  P(B)         E  P(E)
  +b 0.001        +e 0.002
  -b 0.999        -e 0.998

  B  E  A  P(A|B,E)
  +b +e +a 0.95
  +b +e -a 0.05
  +b -e +a 0.94
  +b -e -a 0.06
  -b +e +a 0.29
  -b +e -a 0.71
  -b -e +a 0.001
  -b -e -a 0.999

  A  J  P(J|A)     A  M  P(M|A)
  +a +j 0.9        +a +m 0.7
  +a -j 0.1        +a -m 0.3
  -a +j 0.05       -a +m 0.01
  -a -j 0.95       -a -m 0.99
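The product rule on this slide is easy to sanity-check numerically. Here is a minimal sketch (our own illustration, not part of the slides) that multiplies the relevant CPT entries from the tables above for one full assignment:

```python
# P(+b, -e, +a, +j, +m) = P(+b) P(-e) P(+a|+b,-e) P(+j|+a) P(+m|+a),
# with each number read off the alarm-network CPTs above.
print(0.001 * 0.998 * 0.94 * 0.9 * 0.7)  # ~0.000591
```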
Video of Demo BN Applet
P4 Bayes' Net
§ Nodes: Time, Temperature, Laser, Blast, Belt, Speed, Size

P4 Demo Video
Bayes' Nets
§ Representation
§ Conditional Independences
§ Probabilistic Inference
  § Enumeration (exact, exponential complexity)
  § Variable elimination (exact, worst-case exponential complexity, often better)
  § Inference is NP-complete
  § Sampling (approximate)
§ Learning Bayes' Nets from Data
Inference
§ Inference: calculating some useful quantity from a joint probability distribution
§ Examples:
  § Posterior probability: P(Q | E1 = e1, …, Ek = ek)
  § Most likely explanation: argmax_q P(Q = q | E1 = e1, …, Ek = ek)
Inference by Enumeration
§ General case:
  § Evidence variables: E1 … Ek = e1 … ek
  § Query* variable: Q
  § Hidden variables: H1 … Hr
  (together: all variables X1, X2, …, Xn)
§ We want: P(Q | e1 … ek)
§ Step 1: Select the entries consistent with the evidence
§ Step 2: Sum out H to get joint of Query and evidence:
  P(Q, e1 … ek) = Σ_{h1 … hr} P(Q, h1 … hr, e1 … ek)
§ Step 3: Normalize:
  Z = Σ_q P(q, e1 … ek),  P(Q | e1 … ek) = (1/Z) × P(Q, e1 … ek)

* Works fine with multiple query variables, too
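The three steps translate almost directly into code. Below is a minimal sketch (our own code, with illustrative names; not from the slides) that runs inference by enumeration over an explicit joint distribution stored as a dict:

```python
def enumerate_inference(joint, variables, query, evidence):
    """joint: {assignment tuple: probability}; variables: ordered variable names;
    query: one variable name; evidence: {variable name: observed value}."""
    qi = variables.index(query)
    # Step 1: select the entries consistent with the evidence.
    selected = {a: p for a, p in joint.items()
                if all(a[variables.index(v)] == x for v, x in evidence.items())}
    # Step 2: sum out the hidden variables to get P(Q, e1 ... ek).
    unnormalized = {}
    for a, p in selected.items():
        unnormalized[a[qi]] = unnormalized.get(a[qi], 0.0) + p
    # Step 3: normalize by Z = sum_q P(q, e1 ... ek).
    z = sum(unnormalized.values())
    return {q: p / z for q, p in unnormalized.items()}

# On the (T, W) joint used in the Factor Zoo slides below:
joint = {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
         ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}
print(enumerate_inference(joint, ['T', 'W'], 'W', {'T': 'cold'}))
# {'sun': 0.4, 'rain': 0.6} -- the single conditional P(W | cold)
```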
Inference by Enumeration in Bayes' Net
§ Given unlimited time, inference in BNs is easy
§ Reminder of inference by enumeration by example:

P(B | +j, +m) ∝ P(B, +j, +m)
  = Σ_{e,a} P(B, e, a, +j, +m)
  = Σ_{e,a} P(B) P(e) P(a|B,e) P(+j|a) P(+m|a)
  = P(B) P(+e) P(+a|B,+e) P(+j|+a) P(+m|+a) + P(B) P(+e) P(-a|B,+e) P(+j|-a) P(+m|-a)
  + P(B) P(-e) P(+a|B,-e) P(+j|+a) P(+m|+a) + P(B) P(-e) P(-a|B,-e) P(+j|-a) P(+m|-a)

Inference by Enumeration?
P(Antilock | observed variables) = ?
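The four-term expansion can be checked mechanically. Here is a small sketch (our own code, not course-provided) that evaluates the sum over e and a using the alarm-network CPTs from the earlier slide, then normalizes:

```python
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e', '+a'): 0.95,  ('+b', '+e', '-a'): 0.05,
       ('+b', '-e', '+a'): 0.94,  ('+b', '-e', '-a'): 0.06,
       ('-b', '+e', '+a'): 0.29,  ('-b', '+e', '-a'): 0.71,
       ('-b', '-e', '+a'): 0.001, ('-b', '-e', '-a'): 0.999}
P_J = {('+a', '+j'): 0.9, ('+a', '-j'): 0.1, ('-a', '+j'): 0.05, ('-a', '-j'): 0.95}
P_M = {('+a', '+m'): 0.7, ('+a', '-m'): 0.3, ('-a', '+m'): 0.01, ('-a', '-m'): 0.99}

# Unnormalized P(B, +j, +m): sum out the hidden variables e and a.
unnorm = {b: sum(P_B[b] * P_E[e] * P_A[(b, e, a)] * P_J[(a, '+j')] * P_M[(a, '+m')]
                 for e in P_E for a in ('+a', '-a'))
          for b in P_B}
z = sum(unnorm.values())
print({b: p / z for b, p in unnorm.items()})  # about {'+b': 0.284, '-b': 0.716}
```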
Inference by Enumeration vs. Variable Elimination
§ Why is inference by enumeration so slow?
  § You join up the whole joint distribution before you sum out the hidden variables
§ Idea: interleave joining and marginalizing!
  § Called "Variable Elimination"
  § Still NP-hard, but usually much faster than inference by enumeration
§ First we'll need some new notation: factors

Factor Zoo
Factor Zoo I
§ Joint distribution: P(X,Y)
  § Entries P(x,y) for all x, y
  § Sums to 1

  T    W    P
  hot  sun  0.4
  hot  rain 0.1
  cold sun  0.2
  cold rain 0.3

§ Selected joint: P(x,Y)
  § A slice of the joint distribution
  § Entries P(x,y) for fixed x, all y
  § Sums to P(x)

  T    W    P
  cold sun  0.2
  cold rain 0.3

§ Number of capitals = dimensionality of the table

Factor Zoo II
§ Single conditional: P(Y | x)
  § Entries P(y | x) for fixed x, all y
  § Sums to 1

  T    W    P
  cold sun  0.4
  cold rain 0.6

§ Family of conditionals: P(X | Y)
  § Multiple conditionals
  § Entries P(x | y) for all x, y
  § Sums to |Y|

  T    W    P
  hot  sun  0.8
  hot  rain 0.2
  cold sun  0.4
  cold rain 0.6
Factor Zoo III
§ Specified family: P(y | X)
  § Entries P(y | x) for fixed y, but for all x
  § Sums to … who knows!

  T    W    P
  hot  rain 0.2
  cold rain 0.6

Factor Zoo Summary
§ In general, when we write P(Y1 … YN | X1 … XM)
  § It is a "factor," a multi-dimensional array
  § Its values are P(y1 … yN | x1 … xM)
  § Any assigned (=lower-case) X or Y is a dimension missing (selected) from the array
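One concrete way to realize this "multi-dimensional array" view is a table keyed by assignments plus the list of unassigned (capital) variables; selecting a value then just drops a dimension. This is a sketch under our own naming, not course-provided code:

```python
class Factor:
    def __init__(self, variables, table):
        self.variables = list(variables)  # the capital variables (dimensions)
        self.table = dict(table)          # {assignment tuple: value}

def select(factor, var, value):
    """Assign var = value: that dimension goes missing, as described above."""
    i = factor.variables.index(var)
    rest = factor.variables[:i] + factor.variables[i + 1:]
    return Factor(rest, {a[:i] + a[i + 1:]: p
                         for a, p in factor.table.items() if a[i] == value})

# The family of conditionals P(W|T) from Factor Zoo II as a factor over (T, W):
family = Factor(['T', 'W'], {('hot', 'sun'): 0.8, ('hot', 'rain'): 0.2,
                             ('cold', 'sun'): 0.4, ('cold', 'rain'): 0.6})
# Selecting T = cold leaves the single conditional P(W | cold):
print(select(family, 'T', 'cold').table)  # {('sun',): 0.4, ('rain',): 0.6}
```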
Example: Traffic Domain
§ Random Variables
  § R: Raining
  § T: Traffic
  § L: Late for class!

R → T → L

P(L) = ? = Σ_{r,t} P(r, t, L) = Σ_{r,t} P(r) P(t|r) P(L|t)

  P(R)          P(T|R)            P(L|T)
  +r 0.1        +r +t 0.8         +t +l 0.3
  -r 0.9        +r -t 0.2         +t -l 0.7
                -r +t 0.1         -t +l 0.1
                -r -t 0.9         -t -l 0.9

Variable Elimination (VE)
Inference by Enumeration: Procedural Outline
§ Track objects called factors
§ Initial factors are local CPTs (one per node)

  P(R)          P(T|R)            P(L|T)
  +r 0.1        +r +t 0.8         +t +l 0.3
  -r 0.9        +r -t 0.2         +t -l 0.7
                -r +t 0.1         -t +l 0.1
                -r -t 0.9         -t -l 0.9

§ Any known values are selected
§ E.g. if we know L = +l, the initial factors are

  P(R)          P(T|R)            P(+l|T)
  +r 0.1        +r +t 0.8         +t +l 0.3
  -r 0.9        +r -t 0.2         -t +l 0.1
                -r +t 0.1
                -r -t 0.9

§ Procedure: Join all factors, then eliminate all hidden variables
Operation 1: Join Factors
§ First basic operation: joining factors
§ Combining factors:
  § Just like a database join
  § Get all factors over the joining variable
  § Build a new factor over the union of the variables involved
§ Example: Join on R
  P(R) × P(T|R) → P(R,T)

  P(R)          P(T|R)            P(R,T)
  +r 0.1        +r +t 0.8         +r +t 0.08
  -r 0.9        +r -t 0.2         +r -t 0.02
                -r +t 0.1         -r +t 0.09
                -r -t 0.9         -r -t 0.81

§ Computation for each entry: pointwise products, e.g. P(r,t) = P(r) · P(t|r)
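As a sketch of the join on the dict-based factors introduced above (our own helper, not course code): take the union of the variables, then fill each entry of the new factor with the pointwise product.

```python
from itertools import product

def join(f1, f2, domains):
    """domains: {variable name: list of values}. Returns a factor over the
    union of f1's and f2's variables, filled with pointwise products."""
    variables = f1.variables + [v for v in f2.variables if v not in f1.variables]
    table = {}
    for assignment in product(*(domains[v] for v in variables)):
        row = dict(zip(variables, assignment))
        p1 = f1.table[tuple(row[v] for v in f1.variables)]
        p2 = f2.table[tuple(row[v] for v in f2.variables)]
        table[assignment] = p1 * p2
    return Factor(variables, table)

# Join on R in the traffic domain: P(R) x P(T|R) = P(R,T).
domains = {'R': ['+r', '-r'], 'T': ['+t', '-t']}
P_R = Factor(['R'], {('+r',): 0.1, ('-r',): 0.9})
P_T_given_R = Factor(['R', 'T'], {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                                  ('-r', '+t'): 0.1, ('-r', '-t'): 0.9})
print(join(P_R, P_T_given_R, domains).table)
# {('+r','+t'): 0.08, ('+r','-t'): 0.02, ('-r','+t'): 0.09, ('-r','-t'): 0.81}
```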
Example: Multiple Joins

R → T → L

§ Example: start with the local CPTs

  P(R)          P(T|R)            P(L|T)
  +r 0.1        +r +t 0.8         +t +l 0.3
  -r 0.9        +r -t 0.2         +t -l 0.7
                -r +t 0.1         -t +l 0.1
                -r -t 0.9         -t -l 0.9

§ Join R: gives a factor over R, T (P(L|T) is unchanged)

  P(R,T)
  +r +t 0.08
  +r -t 0.02
  -r +t 0.09
  -r -t 0.81

§ Join T: gives a factor over R, T, L

  P(R,T,L)
  +r +t +l 0.024
  +r +t -l 0.056
  +r -t +l 0.002
  +r -t -l 0.018
  -r +t +l 0.027
  -r +t -l 0.063
  -r -t +l 0.081
  -r -t -l 0.729
Operation 2: Eliminate
§ Take a factor and sum out a variable
§ Second basic operation: marginalization
  § Shrinks a factor to a smaller one
  § A projection operation
§ Example: sum out R from P(R,T)

  P(R,T)            P(T)
  +r +t 0.08        +t 0.17
  +r -t 0.02        -t 0.83
  -r +t 0.09
  -r -t 0.81

Multiple Elimination
§ Sum out R, then sum out T:

  P(R,T,L)              P(T,L)            P(L)
  +r +t +l 0.024        +t +l 0.051       +l 0.134
  +r +t -l 0.056        +t -l 0.119       -l 0.866
  +r -t +l 0.002        -t +l 0.083
  +r -t -l 0.018        -t -l 0.747
  -r +t +l 0.027
  -r +t -l 0.063
  -r -t +l 0.081
  -r -t -l 0.729
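The matching sketch for elimination on the same dict-based factors (again our own helper): group entries that agree on everything except the summed-out variable, which is exactly the projection noted above.

```python
def eliminate(factor, var):
    """Sum out var, shrinking the factor by one dimension."""
    i = factor.variables.index(var)
    rest = factor.variables[:i] + factor.variables[i + 1:]
    table = {}
    for a, p in factor.table.items():
        key = a[:i] + a[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return Factor(rest, table)

# Sum out R from the joined factor P(R,T) computed earlier:
P_RT = Factor(['R', 'T'], {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
                           ('-r', '+t'): 0.09, ('-r', '-t'): 0.81})
print(eliminate(P_RT, 'R').table)  # {('+t',): 0.17, ('-t',): 0.83}
```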
Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)

Marginalizing Early (= Variable Elimination)

Traffic Domain

R → T → L    P(L) = ?

§ Inference by Enumeration
  P(L) = Σ_t Σ_r P(L|t) P(r) P(t|r)
  (join on r, join on t, then eliminate r, eliminate t)
§ Variable Elimination
  P(L) = Σ_t P(L|t) Σ_r P(r) P(t|r)
  (join on r and eliminate r first, then join on t and eliminate t)

Marginalizing Early! (aka VE)

§ Join R:
  P(R,T): +r +t 0.08, +r -t 0.02, -r +t 0.09, -r -t 0.81
§ Sum out R:
  P(T): +t 0.17, -t 0.83
§ Join T:
  P(T,L): +t +l 0.051, +t -l 0.119, -t +l 0.083, -t -l 0.747
§ Sum out T:
  P(L): +l 0.134, -l 0.866
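Both orders are a couple of lines each with the join/eliminate helpers sketched earlier (illustrative code, not from the slides); the only difference is that marginalizing early never builds the 8-entry joint:

```python
domains = {'R': ['+r', '-r'], 'T': ['+t', '-t'], 'L': ['+l', '-l']}
P_L_given_T = Factor(['T', 'L'], {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
                                  ('-t', '+l'): 0.1, ('-t', '-l'): 0.9})

# Inference by enumeration: join everything (8 entries), then sum out R and T.
full = join(join(P_R, P_T_given_R, domains), P_L_given_T, domains)
late = eliminate(eliminate(full, 'R'), 'T')

# Variable elimination: sum out R as soon as it has been joined.
early = eliminate(join(eliminate(join(P_R, P_T_given_R, domains), 'R'),
                       P_L_given_T, domains), 'T')

print(late.table, early.table)  # both: {('+l',): 0.134, ('-l',): 0.866}
```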
Evidence
§ If evidence, start with factors that select that evidence
§ No evidence uses these initial factors:

  P(R)          P(T|R)            P(L|T)
  +r 0.1        +r +t 0.8         +t +l 0.3
  -r 0.9        +r -t 0.2         +t -l 0.7
                -r +t 0.1         -t +l 0.1
                -r -t 0.9         -t -l 0.9

§ Computing P(L | +r), the initial factors become:

  P(+r)         P(T|+r)           P(L|T)
  +r 0.1        +r +t 0.8         +t +l 0.3
                +r -t 0.2         +t -l 0.7
                                  -t +l 0.1
                                  -t -l 0.9

§ We eliminate all vars other than query + evidence

Evidence II
§ Result will be a selected joint of query and evidence
§ E.g. for P(L | +r), we would end up with:

  P(+r, L)                         P(L | +r)
  +r +l 0.026     Normalize →      +l 0.26
  +r -l 0.074                      -l 0.74

§ To get our answer, just normalize this!
§ That's it!
General Variable Elimination
§ Query: P(Q | E1 = e1, …, Ek = ek)
§ Start with initial factors:
  § Local CPTs (but instantiated by evidence)
§ While there are still hidden variables (not Q or evidence):
  § Pick a hidden variable H
  § Join all factors mentioning H
  § Eliminate (sum out) H
§ Join all remaining factors and normalize

(If a variable appears in front of the conditioning bar in any of the factors participating in the join, it'll be in front of the conditioning bar in the resulting factor. Otherwise it'll end up behind the conditioning bar. A variable can never appear in front of the conditioning bar in more than one factor.)

Example
§ The traffic query P(L) once more, as one VE run: join R, eliminate R, join T, eliminate T:

  P(T,L)            P(L)
  +t +l 0.051       +l 0.134
  +t -l 0.119       -l 0.866
  -t +l 0.083
  -t -l 0.747
Example
§ Query: P(B | +j, +m)
§ Start with factors: P(B), P(E), P(A|B,E), P(+j|A), P(+m|A)
§ Choose A: join P(A|B,E), P(+j|A), P(+m|A) and sum out A, giving f1(+j, +m | B, E)
§ Choose E: join P(E), f1(+j, +m | B, E) and sum out E, giving f2(+j, +m | B)
§ Finish with B: join P(B), f2(+j, +m | B), giving P(+j, +m, B)
§ Normalize to obtain P(B | +j, +m)

Example 2: P(B | +a)

§ Start / Select:

  B  P           B  A  P
  +b 0.1         +b +a 0.8
  ¬b 0.9         +b ¬a 0.2
                 ¬b +a 0.1
                 ¬b ¬a 0.9

§ Join on B (with +a selected):

  A  B  P
  +a +b 0.08
  +a ¬b 0.09

§ Finish with B / Normalize:

  A  B  P
  +a +b 8/17
  +a ¬b 9/17

Same Example in Equations
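The same helpers reproduce Example 2 in a few lines (a sketch; ¬ is written as '-' in the code):

```python
domains_ba = {'B': ['+b', '-b'], 'A': ['+a', '-a']}
P_B2 = Factor(['B'], {('+b',): 0.1, ('-b',): 0.9})
P_A_given_B = Factor(['B', 'A'], {('+b', '+a'): 0.8, ('+b', '-a'): 0.2,
                                  ('-b', '+a'): 0.1, ('-b', '-a'): 0.9})

joined = join(P_B2, P_A_given_B, domains_ba)  # P(B, A)
selected = select(joined, 'A', '+a')          # P(B, +a): {+b: 0.08, -b: 0.09}
z = sum(selected.table.values())              # 0.17
print({b: p / z for b, p in selected.table.items()})
# {('+b',): 0.470..., ('-b',): 0.529...}, i.e. 8/17 and 9/17
```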
P(B | j, m) ∝ P(B, j, m)
  = Σ_{e,a} P(B, j, m, e, a)          [marginal can be obtained from joint by summing out]
  = Σ_{e,a} P(B) P(e) P(a|B,e) P(j|a) P(m|a)   [use Bayes' net joint distribution expression]
  = Σ_e P(B) P(e) Σ_a P(a|B,e) P(j|a) P(m|a)   [use x(y+z) = xy + xz]
  = Σ_e P(B) P(e) f1(j, m | B, e)     [joining on a, and then summing out, gives f1]
  = P(B) Σ_e P(e) f1(j, m | B, e)     [use x(y+z) = xy + xz]
  = P(B) f2(j, m | B)                 [joining on e, and then summing out, gives f2]

All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz = (u+v)(w+x)(y+z) to improve computational efficiency!

Another Variable Elimination Example

Query: P(X3 | y1, y2, y3). Start by inserting evidence, which gives the following initial factors:
  P(Z), P(X1|Z), P(X2|Z), P(X3|Z), P(y1|X1), P(y2|X2), P(y3|X3)

Eliminate X1; this introduces the factor f1(y1|Z) = Σ_{x1} P(x1|Z) P(y1|x1), and we are left with:
  P(Z), P(X2|Z), P(X3|Z), P(y2|X2), P(y3|X3), f1(y1|Z)

Eliminate X2; this introduces the factor f2(y2|Z) = Σ_{x2} P(x2|Z) P(y2|x2), and we are left with:
  P(Z), P(X3|Z), P(y3|X3), f1(y1|Z), f2(y2|Z)

Eliminate Z; this introduces the factor f3(y1, y2, X3) = Σ_z P(z) P(X3|z) f1(y1|z) f2(y2|z), and we are left with:
  P(y3|X3), f3(y1, y2, X3)

No hidden variables left. Join the remaining factors to get:
  f4(y1, y2, y3, X3) = P(y3|X3) · f3(y1, y2, X3)

Normalizing over X3 gives:
  P(X3 | y1, y2, y3) = f4(y1, y2, y3, X3) / Σ_{x3} f4(y1, y2, y3, x3)

Computational complexity critically depends on the largest factor being generated in this process. Size of factor = number of entries in table. In the example above (assuming binary variables) all factors generated are of size 2, as they each have only one unassigned variable (Z, Z, and X3 respectively).

Variable Elimination Ordering
§ For the query P(Xn | y1, …, yn), work through the following two different orderings as done on the previous slide: Z, X1, …, Xn-1 and X1, …, Xn-1, Z. What is the size of the maximum factor generated for each of the orderings?
§ Answer: 2^(n+1) versus 2^2 (assuming binary)
§ In general: the ordering can greatly affect efficiency.
VE: Computational and Space Complexity
§ The computational and space complexity of variable elimination is determined by the largest factor
§ The elimination ordering can greatly affect the size of the largest factor.
  § E.g., previous slide's example: 2^n vs. 2
§ Does there always exist an ordering that only results in small factors?
  § No!
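A rough way to see the 2^(n+1) vs. 2^2 gap is to track only which variables each factor mentions and record the largest factor built. This sketch (our own code) assumes the star network Z → X1 … Xn from the previous slides, with the evidence factors on the Yi already absorbed, since they never enlarge any factor:

```python
def max_factor_size(n, order):
    """Largest factor (2^#variables) produced by eliminating in the given order."""
    factors = [{'Z'}] + [{'Z', f'X{i}'} for i in range(1, n + 1)]
    largest = 0
    for h in order:
        joined = set().union(*(f for f in factors if h in f))
        factors = [f for f in factors if h not in f] + [joined - {h}]
        largest = max(largest, 2 ** len(joined))
    return largest

n = 8
print(max_factor_size(n, ['Z'] + [f'X{i}' for i in range(1, n)]))  # 512 = 2**(n+1)
print(max_factor_size(n, [f'X{i}' for i in range(1, n)] + ['Z']))  # 4   = 2**2
```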
Worst Case Complexity?
§ CSP: encode a 3-SAT instance as a Bayes' net, with one node per variable, one node per clause, and a final node z that is the AND of all the clauses
§ If we can answer whether P(z) is equal to zero or not, we have answered whether the 3-SAT problem has a solution.
§ Hence inference in Bayes' nets is NP-hard. No known efficient probabilistic inference in general.

"Easy" Structures: Polytrees
§ A polytree is a directed graph with no undirected cycles
§ For poly-trees you can always find an ordering that is efficient
  § Try it!!
§ Cut-set conditioning for Bayes' net inference
  § Choose a set of variables such that if removed only a polytree remains
  § Exercise: Think about how the specifics would work out!
Bayes' Nets
§ Representation
§ Conditional Independences
§ Probabilistic Inference
  § Enumeration (exact, exponential complexity)
  § Variable elimination (exact, worst-case exponential complexity, often better)
  § Inference is NP-complete
  § Sampling (approximate)
§ Learning Bayes' Nets from Data