CS 188: Artificial Intelligence
Bayes' Nets: Inference

Instructors: Pieter Abbeel --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Announcements
§ Midterm 1
  § Solutions posted on Piazza
  § Grades available on Gradescope
  § Regrade request window: today/Thursday 11:59pm – Sunday 3/15 11:59pm
§ Homework 6
  § Due: Monday at 11:59pm
§ Project 4 – NEW!
  § Due: Friday 3/20 at 5pm
Bayes' Net Representation
§ A directed, acyclic graph, one node per random variable
§ A conditional probability table (CPT) for each node
  § A collection of distributions over X, one for each combination of parents' values
§ Bayes' nets implicitly encode joint distributions
  § As a product of local conditional distributions
  § To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:
    P(x1, x2, …, xn) = ∏_i P(xi | parents(Xi))

Example: Alarm Network
Nodes: Burglary (B), Earthquake (E), Alarm (A), John calls (J), Mary calls (M)

  B  P(B)         E  P(E)
  +b 0.001        +e 0.002
  -b 0.999        -e 0.998

  B  E  A  P(A|B,E)
  +b +e +a 0.95
  +b +e -a 0.05
  +b -e +a 0.94
  +b -e -a 0.06
  -b +e +a 0.29
  -b +e -a 0.71
  -b -e +a 0.001
  -b -e -a 0.999

  A  J  P(J|A)     A  M  P(M|A)
  +a +j 0.9        +a +m 0.7
  +a -j 0.1        +a -m 0.3
  -a +j 0.05       -a +m 0.01
  -a -j 0.95       -a -m 0.99
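The product rule on this slide is easy to sanity-check numerically. Here is a minimal sketch (our own illustration, not part of the slides) that multiplies the relevant CPT entries from the tables above for one full assignment:

```python
# P(+b, -e, +a, +j, +m) = P(+b) P(-e) P(+a|+b,-e) P(+j|+a) P(+m|+a),
# with each number read off the alarm-network CPTs above.
print(0.001 * 0.998 * 0.94 * 0.9 * 0.7)  # ~0.000591
```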
Video of Demo BN Applet
P4 Bayes' Net
§ Nodes: Time, Temperature, Laser, Blast, Belt, Speed, Size

P4 Demo Video
Bayes' Nets
§ Representation
§ Conditional Independences
§ Probabilistic Inference
  § Enumeration (exact, exponential complexity)
  § Variable elimination (exact, worst-case exponential complexity, often better)
  § Inference is NP-complete
  § Sampling (approximate)
§ Learning Bayes' Nets from Data
Inference
§ Inference: calculating some useful quantity from a joint probability distribution
§ Examples:
  § Posterior probability: P(Q | E1 = e1, …, Ek = ek)
  § Most likely explanation: argmax_q P(Q = q | E1 = e1, …, Ek = ek)
Inference by Enumeration
§ General case:
  § Evidence variables: E1 … Ek = e1 … ek
  § Query* variable: Q
  § Hidden variables: H1 … Hr
  (together: all variables X1, X2, …, Xn)
§ We want: P(Q | e1 … ek)
§ Step 1: Select the entries consistent with the evidence
§ Step 2: Sum out H to get joint of Query and evidence:
  P(Q, e1 … ek) = Σ_{h1 … hr} P(Q, h1 … hr, e1 … ek)
§ Step 3: Normalize:
  Z = Σ_q P(q, e1 … ek),  P(Q | e1 … ek) = (1/Z) × P(Q, e1 … ek)

* Works fine with multiple query variables, too
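The three steps translate almost directly into code. Below is a minimal sketch (our own code, with illustrative names; not from the slides) that runs inference by enumeration over an explicit joint distribution stored as a dict:

```python
def enumerate_inference(joint, variables, query, evidence):
    """joint: {assignment tuple: probability}; variables: ordered variable names;
    query: one variable name; evidence: {variable name: observed value}."""
    qi = variables.index(query)
    # Step 1: select the entries consistent with the evidence.
    selected = {a: p for a, p in joint.items()
                if all(a[variables.index(v)] == x for v, x in evidence.items())}
    # Step 2: sum out the hidden variables to get P(Q, e1 ... ek).
    unnormalized = {}
    for a, p in selected.items():
        unnormalized[a[qi]] = unnormalized.get(a[qi], 0.0) + p
    # Step 3: normalize by Z = sum_q P(q, e1 ... ek).
    z = sum(unnormalized.values())
    return {q: p / z for q, p in unnormalized.items()}

# On the (T, W) joint used in the Factor Zoo slides below:
joint = {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
         ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}
print(enumerate_inference(joint, ['T', 'W'], 'W', {'T': 'cold'}))
# {'sun': 0.4, 'rain': 0.6} -- the single conditional P(W | cold)
```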
Inference by Enumeration in Bayes' Net
§ Given unlimited time, inference in BNs is easy
§ Reminder of inference by enumeration by example:

P(B | +j, +m) ∝ P(B, +j, +m)
  = Σ_{e,a} P(B, e, a, +j, +m)
  = Σ_{e,a} P(B) P(e) P(a|B,e) P(+j|a) P(+m|a)
  = P(B) P(+e) P(+a|B,+e) P(+j|+a) P(+m|+a) + P(B) P(+e) P(-a|B,+e) P(+j|-a) P(+m|-a)
  + P(B) P(-e) P(+a|B,-e) P(+j|+a) P(+m|+a) + P(B) P(-e) P(-a|B,-e) P(+j|-a) P(+m|-a)

Inference by Enumeration?
P(Antilock | observed variables) = ?
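The four-term expansion can be checked mechanically. Here is a small sketch (our own code, not course-provided) that evaluates the sum over e and a using the alarm-network CPTs from the earlier slide, then normalizes:

```python
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e', '+a'): 0.95,  ('+b', '+e', '-a'): 0.05,
       ('+b', '-e', '+a'): 0.94,  ('+b', '-e', '-a'): 0.06,
       ('-b', '+e', '+a'): 0.29,  ('-b', '+e', '-a'): 0.71,
       ('-b', '-e', '+a'): 0.001, ('-b', '-e', '-a'): 0.999}
P_J = {('+a', '+j'): 0.9, ('+a', '-j'): 0.1, ('-a', '+j'): 0.05, ('-a', '-j'): 0.95}
P_M = {('+a', '+m'): 0.7, ('+a', '-m'): 0.3, ('-a', '+m'): 0.01, ('-a', '-m'): 0.99}

# Unnormalized P(B, +j, +m): sum out the hidden variables e and a.
unnorm = {b: sum(P_B[b] * P_E[e] * P_A[(b, e, a)] * P_J[(a, '+j')] * P_M[(a, '+m')]
                 for e in P_E for a in ('+a', '-a'))
          for b in P_B}
z = sum(unnorm.values())
print({b: p / z for b, p in unnorm.items()})  # about {'+b': 0.284, '-b': 0.716}
```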
Inference by Enumeration vs. Variable Elimination
§ Why is inference by enumeration so slow?
  § You join up the whole joint distribution before you sum out the hidden variables
§ Idea: interleave joining and marginalizing!
  § Called "Variable Elimination"
  § Still NP-hard, but usually much faster than inference by enumeration
§ First we'll need some new notation: factors

Factor Zoo
Factor Zoo I
§ Joint distribution: P(X,Y)
  § Entries P(x,y) for all x, y
  § Sums to 1

  T    W    P
  hot  sun  0.4
  hot  rain 0.1
  cold sun  0.2
  cold rain 0.3

§ Selected joint: P(x,Y)
  § A slice of the joint distribution
  § Entries P(x,y) for fixed x, all y
  § Sums to P(x)

  T    W    P
  cold sun  0.2
  cold rain 0.3

§ Number of capitals = dimensionality of the table

Factor Zoo II
§ Single conditional: P(Y | x)
  § Entries P(y | x) for fixed x, all y
  § Sums to 1

  T    W    P
  cold sun  0.4
  cold rain 0.6

§ Family of conditionals: P(X | Y)
  § Multiple conditionals
  § Entries P(x | y) for all x, y
  § Sums to |Y|

  T    W    P
  hot  sun  0.8
  hot  rain 0.2
  cold sun  0.4
  cold rain 0.6
Factor Zoo III
§ Specified family: P(y | X)
  § Entries P(y | x) for fixed y, but for all x
  § Sums to … who knows!

  T    W    P
  hot  rain 0.2
  cold rain 0.6

Factor Zoo Summary
§ In general, when we write P(Y1 … YN | X1 … XM)
  § It is a "factor," a multi-dimensional array
  § Its values are P(y1 … yN | x1 … xM)
  § Any assigned (=lower-case) X or Y is a dimension missing (selected) from the array
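One concrete way to realize this "multi-dimensional array" view is a table keyed by assignments plus the list of unassigned (capital) variables; selecting a value then just drops a dimension. This is a sketch under our own naming, not course-provided code:

```python
class Factor:
    def __init__(self, variables, table):
        self.variables = list(variables)  # the capital variables (dimensions)
        self.table = dict(table)          # {assignment tuple: value}

def select(factor, var, value):
    """Assign var = value: that dimension goes missing, as described above."""
    i = factor.variables.index(var)
    rest = factor.variables[:i] + factor.variables[i + 1:]
    return Factor(rest, {a[:i] + a[i + 1:]: p
                         for a, p in factor.table.items() if a[i] == value})

# The family of conditionals P(W|T) from Factor Zoo II as a factor over (T, W):
family = Factor(['T', 'W'], {('hot', 'sun'): 0.8, ('hot', 'rain'): 0.2,
                             ('cold', 'sun'): 0.4, ('cold', 'rain'): 0.6})
# Selecting T = cold leaves the single conditional P(W | cold):
print(select(family, 'T', 'cold').table)  # {('sun',): 0.4, ('rain',): 0.6}
```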
Example: Traffic Domain
§ Random Variables
  § R: Raining
  § T: Traffic
  § L: Late for class!

R → T → L

P(L) = ? = Σ_{r,t} P(r, t, L) = Σ_{r,t} P(r) P(t|r) P(L|t)

  P(R)          P(T|R)            P(L|T)
  +r 0.1        +r +t 0.8         +t +l 0.3
  -r 0.9        +r -t 0.2         +t -l 0.7
                -r +t 0.1         -t +l 0.1
                -r -t 0.9         -t -l 0.9

Variable Elimination (VE)
Inference by Enumeration: Procedural Outline
§ Track objects called factors
§ Initial factors are local CPTs (one per node)

  P(R)          P(T|R)            P(L|T)
  +r 0.1        +r +t 0.8         +t +l 0.3
  -r 0.9        +r -t 0.2         +t -l 0.7
                -r +t 0.1         -t +l 0.1
                -r -t 0.9         -t -l 0.9

§ Any known values are selected
§ E.g. if we know L = +l, the initial factors are

  P(R)          P(T|R)            P(+l|T)
  +r 0.1        +r +t 0.8         +t +l 0.3
  -r 0.9        +r -t 0.2         -t +l 0.1
                -r +t 0.1
                -r -t 0.9

§ Procedure: Join all factors, then eliminate all hidden variables
Operation 1: Join Factors
§ First basic operation: joining factors
§ Combining factors:
  § Just like a database join
  § Get all factors over the joining variable
  § Build a new factor over the union of the variables involved
§ Example: Join on R
  P(R) × P(T|R) → P(R,T)

  P(R)          P(T|R)            P(R,T)
  +r 0.1        +r +t 0.8         +r +t 0.08
  -r 0.9        +r -t 0.2         +r -t 0.02
                -r +t 0.1         -r +t 0.09
                -r -t 0.9         -r -t 0.81

§ Computation for each entry: pointwise products, e.g. P(r,t) = P(r) · P(t|r)
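As a sketch of the join on the dict-based factors introduced above (our own helper, not course code): take the union of the variables, then fill each entry of the new factor with the pointwise product.

```python
from itertools import product

def join(f1, f2, domains):
    """domains: {variable name: list of values}. Returns a factor over the
    union of f1's and f2's variables, filled with pointwise products."""
    variables = f1.variables + [v for v in f2.variables if v not in f1.variables]
    table = {}
    for assignment in product(*(domains[v] for v in variables)):
        row = dict(zip(variables, assignment))
        p1 = f1.table[tuple(row[v] for v in f1.variables)]
        p2 = f2.table[tuple(row[v] for v in f2.variables)]
        table[assignment] = p1 * p2
    return Factor(variables, table)

# Join on R in the traffic domain: P(R) x P(T|R) = P(R,T).
domains = {'R': ['+r', '-r'], 'T': ['+t', '-t']}
P_R = Factor(['R'], {('+r',): 0.1, ('-r',): 0.9})
P_T_given_R = Factor(['R', 'T'], {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                                  ('-r', '+t'): 0.1, ('-r', '-t'): 0.9})
print(join(P_R, P_T_given_R, domains).table)
# {('+r','+t'): 0.08, ('+r','-t'): 0.02, ('-r','+t'): 0.09, ('-r','-t'): 0.81}
```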
Example: Multiple Joins

R → T → L

§ Example: start with the local CPTs

  P(R)          P(T|R)            P(L|T)
  +r 0.1        +r +t 0.8         +t +l 0.3
  -r 0.9        +r -t 0.2         +t -l 0.7
                -r +t 0.1         -t +l 0.1
                -r -t 0.9         -t -l 0.9

§ Join R: gives a factor over R, T (P(L|T) is unchanged)

  P(R,T)
  +r +t 0.08
  +r -t 0.02
  -r +t 0.09
  -r -t 0.81

§ Join T: gives a factor over R, T, L

  P(R,T,L)
  +r +t +l 0.024
  +r +t -l 0.056
  +r -t +l 0.002
  +r -t -l 0.018
  -r +t +l 0.027
  -r +t -l 0.063
  -r -t +l 0.081
  -r -t -l 0.729
Operation 2: Eliminate
§ Take a factor and sum out a variable
§ Second basic operation: marginalization
  § Shrinks a factor to a smaller one
  § A projection operation
§ Example: sum out R from P(R,T)

  P(R,T)            P(T)
  +r +t 0.08        +t 0.17
  +r -t 0.02        -t 0.83
  -r +t 0.09
  -r -t 0.81

Multiple Elimination
§ Sum out R, then sum out T:

  P(R,T,L)              P(T,L)            P(L)
  +r +t +l 0.024        +t +l 0.051       +l 0.134
  +r +t -l 0.056        +t -l 0.119       -l 0.866
  +r -t +l 0.002        -t +l 0.083
  +r -t -l 0.018        -t -l 0.747
  -r +t +l 0.027
  -r +t -l 0.063
  -r -t +l 0.081
  -r -t -l 0.729
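The matching sketch for elimination on the same dict-based factors (again our own helper): group entries that agree on everything except the summed-out variable, which is exactly the projection noted above.

```python
def eliminate(factor, var):
    """Sum out var, shrinking the factor by one dimension."""
    i = factor.variables.index(var)
    rest = factor.variables[:i] + factor.variables[i + 1:]
    table = {}
    for a, p in factor.table.items():
        key = a[:i] + a[i + 1:]
        table[key] = table.get(key, 0.0) + p
    return Factor(rest, table)

# Sum out R from the joined factor P(R,T) computed earlier:
P_RT = Factor(['R', 'T'], {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
                           ('-r', '+t'): 0.09, ('-r', '-t'): 0.81})
print(eliminate(P_RT, 'R').table)  # {('+t',): 0.17, ('-t',): 0.83}
```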
Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)

Marginalizing Early (= Variable Elimination)

Traffic Domain

R → T → L    P(L) = ?

§ Inference by Enumeration
  P(L) = Σ_t Σ_r P(L|t) P(r) P(t|r)
  (join on r, join on t, then eliminate r, eliminate t)
§ Variable Elimination
  P(L) = Σ_t P(L|t) Σ_r P(r) P(t|r)
  (join on r and eliminate r first, then join on t and eliminate t)

Marginalizing Early! (aka VE)

§ Join R:
  P(R,T): +r +t 0.08, +r -t 0.02, -r +t 0.09, -r -t 0.81
§ Sum out R:
  P(T): +t 0.17, -t 0.83
§ Join T:
  P(T,L): +t +l 0.051, +t -l 0.119, -t +l 0.083, -t -l 0.747
§ Sum out T:
  P(L): +l 0.134, -l 0.866
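Both orders are a couple of lines each with the join/eliminate helpers sketched earlier (illustrative code, not from the slides); the only difference is that marginalizing early never builds the 8-entry joint:

```python
domains = {'R': ['+r', '-r'], 'T': ['+t', '-t'], 'L': ['+l', '-l']}
P_L_given_T = Factor(['T', 'L'], {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
                                  ('-t', '+l'): 0.1, ('-t', '-l'): 0.9})

# Inference by enumeration: join everything (8 entries), then sum out R and T.
full = join(join(P_R, P_T_given_R, domains), P_L_given_T, domains)
late = eliminate(eliminate(full, 'R'), 'T')

# Variable elimination: sum out R as soon as it has been joined.
early = eliminate(join(eliminate(join(P_R, P_T_given_R, domains), 'R'),
                       P_L_given_T, domains), 'T')

print(late.table, early.table)  # both: {('+l',): 0.134, ('-l',): 0.866}
```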
Evidence
§ If evidence, start with factors that select that evidence
§ No evidence uses these initial factors:

  P(R)          P(T|R)            P(L|T)
  +r 0.1        +r +t 0.8         +t +l 0.3
  -r 0.9        +r -t 0.2         +t -l 0.7
                -r +t 0.1         -t +l 0.1
                -r -t 0.9         -t -l 0.9

§ Computing P(L | +r), the initial factors become:

  P(+r)         P(T|+r)           P(L|T)
  +r 0.1        +r +t 0.8         +t +l 0.3
                +r -t 0.2         +t -l 0.7
                                  -t +l 0.1
                                  -t -l 0.9

§ We eliminate all vars other than query + evidence

Evidence II
§ Result will be a selected joint of query and evidence
§ E.g. for P(L | +r), we would end up with:

  P(+r, L)                         P(L | +r)
  +r +l 0.026     Normalize →      +l 0.26
  +r -l 0.074                      -l 0.74

§ To get our answer, just normalize this!
§ That's it!
General Variable Elimination
§ Query: P(Q | E1 = e1, …, Ek = ek)
§ Start with initial factors:
  § Local CPTs (but instantiated by evidence)
§ While there are still hidden variables (not Q or evidence):
  § Pick a hidden variable H
  § Join all factors mentioning H
  § Eliminate (sum out) H
§ Join all remaining factors and normalize

(If a variable appears in front of the conditioning bar in any of the factors participating in the join, it'll be in front of the conditioning bar in the resulting factor. Otherwise it'll end up behind the conditioning bar. A variable can never appear in front of the conditioning bar in more than one factor.)

Example
§ The traffic query P(L) once more, as one VE run: join R, eliminate R, join T, eliminate T:

  P(T,L)            P(L)
  +t +l 0.051       +l 0.134
  +t -l 0.119       -l 0.866
  -t +l 0.083
  -t -l 0.747
Example
§ Query: P(B | +j, +m)
§ Start with factors: P(B), P(E), P(A|B,E), P(+j|A), P(+m|A)
§ Choose A: join P(A|B,E), P(+j|A), P(+m|A) and sum out A, giving f1(+j, +m | B, E)
§ Choose E: join P(E), f1(+j, +m | B, E) and sum out E, giving f2(+j, +m | B)
§ Finish with B: join P(B), f2(+j, +m | B), giving P(+j, +m, B)
§ Normalize to obtain P(B | +j, +m)

Example 2: P(B | +a)

§ Start / Select:

  B  P           B  A  P
  +b 0.1         +b +a 0.8
  ¬b 0.9         +b ¬a 0.2
                 ¬b +a 0.1
                 ¬b ¬a 0.9

§ Join on B (with +a selected):

  A  B  P
  +a +b 0.08
  +a ¬b 0.09

§ Finish with B / Normalize:

  A  B  P
  +a +b 8/17
  +a ¬b 9/17

Same Example in Equations
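The same helpers reproduce Example 2 in a few lines (a sketch; ¬ is written as '-' in the code):

```python
domains_ba = {'B': ['+b', '-b'], 'A': ['+a', '-a']}
P_B2 = Factor(['B'], {('+b',): 0.1, ('-b',): 0.9})
P_A_given_B = Factor(['B', 'A'], {('+b', '+a'): 0.8, ('+b', '-a'): 0.2,
                                  ('-b', '+a'): 0.1, ('-b', '-a'): 0.9})

joined = join(P_B2, P_A_given_B, domains_ba)  # P(B, A)
selected = select(joined, 'A', '+a')          # P(B, +a): {+b: 0.08, -b: 0.09}
z = sum(selected.table.values())              # 0.17
print({b: p / z for b, p in selected.table.items()})
# {('+b',): 0.470..., ('-b',): 0.529...}, i.e. 8/17 and 9/17
```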
P(B | j, m) ∝ P(B, j, m)
  = Σ_{e,a} P(B, j, m, e, a)          [marginal can be obtained from joint by summing out]
  = Σ_{e,a} P(B) P(e) P(a|B,e) P(j|a) P(m|a)   [use Bayes' net joint distribution expression]
  = Σ_e P(B) P(e) Σ_a P(a|B,e) P(j|a) P(m|a)   [use x(y+z) = xy + xz]
  = Σ_e P(B) P(e) f1(j, m | B, e)     [joining on a, and then summing out, gives f1]
  = P(B) Σ_e P(e) f1(j, m | B, e)     [use x(y+z) = xy + xz]
  = P(B) f2(j, m | B)                 [joining on e, and then summing out, gives f2]

All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz = (u+v)(w+x)(y+z) to improve computational efficiency!

Another Variable Elimination Example

Query: P(X3 | y1, y2, y3). Start by inserting evidence, which gives the following initial factors:
  P(Z), P(X1|Z), P(X2|Z), P(X3|Z), P(y1|X1), P(y2|X2), P(y3|X3)

Eliminate X1; this introduces the factor f1(y1|Z) = Σ_{x1} P(x1|Z) P(y1|x1), and we are left with:
  P(Z), P(X2|Z), P(X3|Z), P(y2|X2), P(y3|X3), f1(y1|Z)

Eliminate X2; this introduces the factor f2(y2|Z) = Σ_{x2} P(x2|Z) P(y2|x2), and we are left with:
  P(Z), P(X3|Z), P(y3|X3), f1(y1|Z), f2(y2|Z)

Eliminate Z; this introduces the factor f3(y1, y2, X3) = Σ_z P(z) P(X3|z) f1(y1|z) f2(y2|z), and we are left with:
  P(y3|X3), f3(y1, y2, X3)

No hidden variables left. Join the remaining factors to get:
  f4(y1, y2, y3, X3) = P(y3|X3) · f3(y1, y2, X3)

Normalizing over X3 gives:
  P(X3 | y1, y2, y3) = f4(y1, y2, y3, X3) / Σ_{x3} f4(y1, y2, y3, x3)

Computational complexity critically depends on the largest factor being generated in this process. Size of factor = number of entries in table. In the example above (assuming binary variables) all factors generated are of size 2, as they each have only one unassigned variable (Z, Z, and X3 respectively).

Variable Elimination Ordering
§ For the query P(Xn | y1, …, yn), work through the following two different orderings as done on the previous slide: Z, X1, …, Xn-1 and X1, …, Xn-1, Z. What is the size of the maximum factor generated for each of the orderings?
§ Answer: 2^(n+1) versus 2^2 (assuming binary)
§ In general: the ordering can greatly affect efficiency.
VE: Computational and Space Complexity
§ The computational and space complexity of variable elimination is determined by the largest factor
§ The elimination ordering can greatly affect the size of the largest factor.
  § E.g., previous slide's example: 2^n vs. 2
§ Does there always exist an ordering that only results in small factors?
  § No!
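A rough way to see the 2^(n+1) vs. 2^2 gap is to track only which variables each factor mentions and record the largest factor built. This sketch (our own code) assumes the star network Z → X1 … Xn from the previous slides, with the evidence factors on the Yi already absorbed, since they never enlarge any factor:

```python
def max_factor_size(n, order):
    """Largest factor (2^#variables) produced by eliminating in the given order."""
    factors = [{'Z'}] + [{'Z', f'X{i}'} for i in range(1, n + 1)]
    largest = 0
    for h in order:
        joined = set().union(*(f for f in factors if h in f))
        factors = [f for f in factors if h not in f] + [joined - {h}]
        largest = max(largest, 2 ** len(joined))
    return largest

n = 8
print(max_factor_size(n, ['Z'] + [f'X{i}' for i in range(1, n)]))  # 512 = 2**(n+1)
print(max_factor_size(n, [f'X{i}' for i in range(1, n)] + ['Z']))  # 4   = 2**2
```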
Worst Case Complexity?
§ CSP: encode a 3-SAT instance as a Bayes' net, with one node per variable, one node per clause, and a final node z that is the AND of all the clauses
§ If we can answer whether P(z) is equal to zero or not, we have answered whether the 3-SAT problem has a solution.
§ Hence inference in Bayes' nets is NP-hard. No known efficient probabilistic inference in general.

"Easy" Structures: Polytrees
§ A polytree is a directed graph with no undirected cycles
§ For poly-trees you can always find an ordering that is efficient
  § Try it!!
§ Cut-set conditioning for Bayes' net inference
  § Choose a set of variables such that if removed only a polytree remains
  § Exercise: Think about how the specifics would work out!
Bayes' Nets
§ Representation
§ Conditional Independences
§ Probabilistic Inference
  § Enumeration (exact, exponential complexity)
  § Variable elimination (exact, worst-case exponential complexity, often better)
  § Inference is NP-complete
  § Sampling (approximate)
§ Learning Bayes' Nets from Data