CS 188: Artificial Intelligence
Bayes' Nets: Inference

Instructors: Pieter Abbeel --- University of California, Berkeley
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

Announcements

§ Midterm 1
  § Solutions posted on Piazza
  § Grades available on Gradescope
  § Regrade request window: today/Thursday 11:59pm – Sunday 3/15 11:59pm
§ Homework 6
  § Due: Monday at 11:59pm
§ Project 4 – NEW!
  § Due: Friday 3/20 at 5pm

Bayes' Net Representation

§ A directed, acyclic graph, one node per random variable
§ A conditional probability table (CPT) for each node
  § A collection of distributions over X, one for each combination of parents' values
§ Bayes' nets implicitly encode joint distributions
  § As a product of local conditional distributions
  § To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:

$$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i))$$

Example: Alarm Network

Network: Burglary (B) → Alarm (A) ← Earthquake (E), with Alarm → John calls (J) and Alarm → Mary calls (M).

P(B)
  +b  0.001
  -b  0.999

P(E)
  +e  0.002
  -e  0.998

P(A|B,E)
  +b  +e  +a  0.95
  +b  +e  -a  0.05
  +b  -e  +a  0.94
  +b  -e  -a  0.06
  -b  +e  +a  0.29
  -b  +e  -a  0.71
  -b  -e  +a  0.001
  -b  -e  -a  0.999

P(J|A)
  +a  +j  0.9
  +a  -j  0.1
  -a  +j  0.05
  -a  -j  0.95

P(M|A)
  +a  +m  0.7
  +a  -m  0.3
  -a  +m  0.01
  -a  -m  0.99
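As a concrete check, here is a minimal Python sketch of that product, with the CPTs above stored as plain dictionaries (the names P_B, P_A, joint, etc. are this sketch's own, not the course's project code):

```python
# Alarm-network CPTs as dictionaries keyed by value tuples.
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e', '+a'): 0.95,  ('+b', '+e', '-a'): 0.05,
       ('+b', '-e', '+a'): 0.94,  ('+b', '-e', '-a'): 0.06,
       ('-b', '+e', '+a'): 0.29,  ('-b', '+e', '-a'): 0.71,
       ('-b', '-e', '+a'): 0.001, ('-b', '-e', '-a'): 0.999}
P_J = {('+a', '+j'): 0.9, ('+a', '-j'): 0.1, ('-a', '+j'): 0.05, ('-a', '-j'): 0.95}
P_M = {('+a', '+m'): 0.7, ('+a', '-m'): 0.3, ('-a', '+m'): 0.01, ('-a', '-m'): 0.99}

def joint(b, e, a, j, m):
    """P(b, e, a, j, m) = P(b) P(e) P(a|b,e) P(j|a) P(m|a)."""
    return P_B[b] * P_E[e] * P_A[(b, e, a)] * P_J[(a, j)] * P_M[(a, m)]

# e.g. burglary, no earthquake, alarm sounds, both neighbors call:
print(joint('+b', '-e', '+a', '+j', '+m'))  # 0.001*0.998*0.94*0.9*0.7 ≈ 5.9e-4
```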

Video of Demo BN Applet

Example: Alarm Network

[Same network and CPTs as above.]

Example: Alarm Network

[Same network and CPTs as above.]

P4 Bayes' Net

[Network diagram over the project's variables, including: Time, Temperature, Speed, Size, Laser, Blast, Belt.]

P4 Demo Video

Bayes' Nets

§ Representation
§ Conditional Independences
§ Probabilistic Inference
  § Enumeration (exact, exponential complexity)
  § Variable elimination (exact, worst-case exponential complexity, often better)
  § Inference is NP-complete
  § Sampling (approximate)
§ Learning Bayes' Nets from Data

Inference

§ Inference: calculating some useful quantity from a joint probability distribution
§ Examples:
  § Posterior probability: $P(Q \mid E_1 = e_1, \ldots, E_k = e_k)$
  § Most likely explanation: $\operatorname*{argmax}_q P(Q = q \mid E_1 = e_1, \ldots, E_k = e_k)$

Inference by Enumeration

§ General case:
  § Evidence variables: $E_1, \ldots, E_k = e_1, \ldots, e_k$
  § Query* variable: $Q$
  § Hidden variables: $H_1, \ldots, H_r$
  (together: all variables $X_1, \ldots, X_n$)
§ We want: $P(Q \mid e_1, \ldots, e_k)$

§ Step 1: Select the entries consistent with the evidence
§ Step 2: Sum out H to get joint of Query and evidence:
$$P(Q, e_1, \ldots, e_k) = \sum_{h_1, \ldots, h_r} P(Q, h_1, \ldots, h_r, e_1, \ldots, e_k)$$
§ Step 3: Normalize:
$$Z = \sum_q P(q, e_1, \ldots, e_k) \qquad P(Q \mid e_1, \ldots, e_k) = \tfrac{1}{Z}\, P(Q, e_1, \ldots, e_k)$$

* Works fine with multiple query variables, too
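A minimal sketch of the three steps in Python, assuming the joint is small enough to store explicitly; the representation (assignments as tuples of (variable, value) pairs) and the helper name enumerate_query are this sketch's own:

```python
def enumerate_query(joint, query_var, evidence):
    # Step 1: select the entries consistent with the evidence
    selected = {a: p for a, p in joint.items()
                if all(dict(a)[v] == val for v, val in evidence.items())}
    # Step 2: sum out the hidden variables, keeping only the query variable
    summed = {}
    for a, p in selected.items():
        q = dict(a)[query_var]
        summed[q] = summed.get(q, 0.0) + p
    # Step 3: normalize by Z, the sum of the remaining entries
    Z = sum(summed.values())
    return {q: p / Z for q, p in summed.items()}

# Toy joint P(R, T) (the traffic numbers that appear later in these slides):
joint = {(('R', '+r'), ('T', '+t')): 0.08, (('R', '+r'), ('T', '-t')): 0.02,
         (('R', '-r'), ('T', '+t')): 0.09, (('R', '-r'), ('T', '-t')): 0.81}
print(enumerate_query(joint, 'R', {'T': '+t'}))  # {'+r': ≈0.47, '-r': ≈0.53}
```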

Inference by Enumeration in Bayes' Net

§ Given unlimited time, inference in BNs is easy
§ Reminder of inference by enumeration by example (alarm network: $B \to A \leftarrow E$, $A \to J$, $A \to M$):

$$P(B \mid +j, +m) \;\propto\; P(B, +j, +m) \;=\; \sum_{e,a} P(B, e, a, +j, +m) \;=\; \sum_{e,a} P(B)\,P(e)\,P(a \mid B, e)\,P(+j \mid a)\,P(+m \mid a)$$

$$\begin{aligned}
=\;& P(B)P(+e)P(+a \mid B,+e)P(+j \mid +a)P(+m \mid +a) \;+\; P(B)P(+e)P(-a \mid B,+e)P(+j \mid -a)P(+m \mid -a)\\
+\;& P(B)P(-e)P(+a \mid B,-e)P(+j \mid +a)P(+m \mid +a) \;+\; P(B)P(-e)P(-a \mid B,-e)P(+j \mid -a)P(+m \mid -a)
\end{aligned}$$

Inference by Enumeration?

$$P(\textit{Antilock} \mid \textit{observed variables}) = \;?$$
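As a sanity check of the enumeration above for $P(B \mid +j, +m)$, the same sum carried out numerically, reusing the CPT dictionaries and joint from the earlier alarm-network sketch (posterior_B is a name invented here):

```python
def posterior_B(j='+j', m='+m'):
    # Unnormalized P(B, j, m): enumerate the hidden variables e and a.
    unnorm = {b: sum(joint(b, e, a, j, m)
                     for e in ['+e', '-e'] for a in ['+a', '-a'])
              for b in ['+b', '-b']}
    Z = sum(unnorm.values())
    return {b: p / Z for b, p in unnorm.items()}

print(posterior_B())  # P(+b | +j, +m) ≈ 0.284, the classic result
```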

Inference by Enumeration vs. Variable Elimination

§ Why is inference by enumeration so slow?
  § You join up the whole joint distribution before you sum out the hidden variables
§ Idea: interleave joining and marginalizing!
  § Called "Variable Elimination"
  § Still NP-hard, but usually much faster than inference by enumeration
§ First we'll need some new notation: factors

Factor Zoo

Factor Zoo I

§ Joint distribution: P(X,Y)
  § Entries P(x,y) for all x, y
  § Sums to 1

P(T, W)
  T     W     P
  hot   sun   0.4
  hot   rain  0.1
  cold  sun   0.2
  cold  rain  0.3

Factor Zoo II

§ Single conditional: P(Y | x)
  § Entries P(y | x) for fixed x, all y
  § Sums to 1

P(W | T = cold)
  T     W     P
  cold  sun   0.4
  cold  rain  0.6

§ Selected joint: P(x, Y)
  § A slice of the joint distribution
  § Entries P(x, y) for fixed x, all y
  § Sums to P(x)

P(T = cold, W)
  T     W     P
  cold  sun   0.2
  cold  rain  0.3

§ Number of capitals = dimensionality of the table

§ Family of conditionals: P(X | Y)
  § Multiple conditionals
  § Entries P(x | y) for all x, y
  § Sums to |Y|

P(W | T)
  T     W     P
  hot   sun   0.8
  hot   rain  0.2
  cold  sun   0.4
  cold  rain  0.6

Factor Zoo III

§ Specified family: P(y | X)
  § Entries P(y | x) for fixed y, but for all x
  § Sums to … who knows!

P(rain | T)
  T     W     P
  hot   rain  0.2
  cold  rain  0.6

Factor Zoo Summary

§ In general, when we write P(Y1 … YN | X1 … XM)
  § It is a "factor," a multi-dimensional array
  § Its values are P(y1 … yN | x1 … xM)
  § Any assigned (= lower-case) X or Y is a dimension missing (selected) from the array
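Factors are easy to prototype. Here is one minimal representation, used in the sketches below; the Factor class and its layout are this sketch's assumptions, not the course's project API:

```python
class Factor:
    """A factor = list of unassigned variables plus a table over assignments."""
    def __init__(self, variables, table):
        self.variables = list(variables)   # e.g. ['T', 'W']
        self.table = dict(table)           # e.g. {('hot', 'sun'): 0.4, ...}

# The joint P(T, W) from Factor Zoo I:
f_TW = Factor(['T', 'W'], {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
                           ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3})

# "Number of capitals = dimensionality": here len(f_TW.variables) == 2,
# and a joint distribution's entries sum to 1.
print(len(f_TW.variables), sum(f_TW.table.values()))  # 2 1.0
```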

Variable Elimination (VE)

Example: Traffic Domain

§ Random Variables:
  § R: Raining
  § T: Traffic
  § L: Late for class!

Network: R → T → L

$$P(L) = \;? \;=\; \sum_{r,t} P(r, t, L) \;=\; \sum_{r,t} P(r)\,P(t \mid r)\,P(L \mid t)$$

P(R)
  +r  0.1
  -r  0.9

P(T|R)
  +r  +t  0.8
  +r  -t  0.2
  -r  +t  0.1
  -r  -t  0.9

P(L|T)
  +t  +l  0.3
  +t  -l  0.7
  -t  +l  0.1
  -t  -l  0.9
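A quick numeric check of that sum, with the three CPTs above as dictionaries (a self-contained sketch; the names are illustrative):

```python
P_R = {'+r': 0.1, '-r': 0.9}
P_T = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2, ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
P_L = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7, ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

# P(L) = sum over r, t of P(r) P(t|r) P(L|t)
P_of_L = {l: sum(P_R[r] * P_T[(r, t)] * P_L[(t, l)]
                 for r in ['+r', '-r'] for t in ['+t', '-t'])
          for l in ['+l', '-l']}
print(P_of_L)  # {'+l': 0.134, '-l': 0.866}
```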

Inference by Enumeration: Procedural Outline

§ Track objects called factors
§ Initial factors are local CPTs (one per node): P(R), P(T|R), P(L|T) as above
§ Any known values are selected
§ E.g. if we know L = +l, the initial factors are P(R), P(T|R), and the selected

P(+l | T)
  +t  +l  0.3
  -t  +l  0.1

§ Procedure: Join all factors, then eliminate all hidden variables

Operation 1: Join Factors

§ First basic operation: joining factors
§ Combining factors:
  § Just like a database join
  § Get all factors over the joining variable
  § Build a new factor over the union of the variables involved
§ Example: Join on R, combining P(R) and P(T|R):

P(R, T)
  +r  +t  0.08
  +r  -t  0.02
  -r  +t  0.09
  -r  -t  0.81

§ Computation for each entry: pointwise products, e.g. $P(r, t) = P(r) \cdot P(t \mid r)$
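A minimal join over the Factor class sketched earlier (domains, join, and the factor names below are this sketch's own):

```python
from itertools import product

def join(f1, f2, domains):
    """Build a factor over the union of variables, entries = pointwise products."""
    variables = f1.variables + [v for v in f2.variables if v not in f1.variables]
    table = {}
    for assignment in product(*(domains[v] for v in variables)):
        a = dict(zip(variables, assignment))
        key1 = tuple(a[v] for v in f1.variables)
        key2 = tuple(a[v] for v in f2.variables)
        table[assignment] = f1.table[key1] * f2.table[key2]
    return Factor(variables, table)

domains = {'R': ['+r', '-r'], 'T': ['+t', '-t'], 'L': ['+l', '-l']}
f_R   = Factor(['R'], {('+r',): 0.1, ('-r',): 0.9})
f_TgR = Factor(['R', 'T'], {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                            ('-r', '+t'): 0.1, ('-r', '-t'): 0.9})
f_RT = join(f_R, f_TgR, domains)
print(f_RT.table)  # {('+r','+t'): 0.08, ('+r','-t'): 0.02, ('-r','+t'): 0.09, ('-r','-t'): 0.81}
```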

Example: Multiple Joins

Network: R → T → L, with initial factors P(R), P(T|R), P(L|T) (tables above).

§ Join R: P(R) × P(T|R) → P(R, T)

P(R, T)
  +r  +t  0.08
  +r  -t  0.02
  -r  +t  0.09
  -r  -t  0.81

§ Join T: P(R, T) × P(L|T) → P(R, T, L)

P(R, T, L)
  +r  +t  +l  0.024
  +r  +t  -l  0.056
  +r  -t  +l  0.002
  +r  -t  -l  0.018
  -r  +t  +l  0.027
  -r  +t  -l  0.063
  -r  -t  +l  0.081
  -r  -t  -l  0.729

Operation 2: Eliminate

§ Second basic operation: marginalization
§ Take a factor and sum out a variable
§ Shrinks a factor to a smaller one
§ A projection operation
§ Example: sum out R from P(R, T):

P(R, T)            P(T)
  +r  +t  0.08       +t  0.17
  +r  -t  0.02   →   -t  0.83
  -r  +t  0.09
  -r  -t  0.81

Multiple Elimination

Starting from the joint P(R, T, L) built above:

Sum out R → P(T, L):
  +t  +l  0.051
  +t  -l  0.119
  -t  +l  0.083
  -t  -l  0.747

Sum out T → P(L):
  +l  0.134
  -l  0.866
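The matching sum-out, again on the sketch's Factor class (eliminate is this sketch's name), checked against the P(T) table above:

```python
def eliminate(f, var):
    """Sum out var: drop that dimension and add entries that agree elsewhere."""
    keep = [v for v in f.variables if v != var]
    idx = [f.variables.index(v) for v in keep]
    table = {}
    for assignment, p in f.table.items():
        key = tuple(assignment[i] for i in idx)
        table[key] = table.get(key, 0.0) + p
    return Factor(keep, table)

f_T = eliminate(f_RT, 'R')
print(f_T.table)  # {('+t',): 0.17, ('-t',): 0.83}
```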

Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)

Marginalizing Early (= Variable Elimination)

Traffic Domain

Network: R → T → L.  Query: $P(L) = \;?$

§ Inference by Enumeration:
$$P(L) = \sum_t \sum_r P(L \mid t)\, P(r)\, P(t \mid r)$$
Join on r, join on t; then eliminate r, eliminate t.

§ Variable Elimination:
$$P(L) = \sum_t P(L \mid t) \sum_r P(r)\, P(t \mid r)$$
Join on r, eliminate r; then join on t, eliminate t.

Marginalizing Early! (aka VE)

Initial factors: P(R), P(T|R), P(L|T) (tables above).

Join R → P(R, T):
  +r  +t  0.08
  +r  -t  0.02
  -r  +t  0.09
  -r  -t  0.81

Sum out R → P(T):
  +t  0.17
  -t  0.83

Join T → P(T, L):
  +t  +l  0.051
  +t  -l  0.119
  -t  +l  0.083
  -t  -l  0.747

Sum out T → P(L):
  +l  0.134
  -l  0.866
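With the join and eliminate helpers sketched above, both orders can be checked against each other; f_LgT below is the P(L|T) table, and the factor names remain this sketch's own:

```python
f_LgT = Factor(['T', 'L'], {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
                            ('-t', '+l'): 0.1, ('-t', '-l'): 0.9})

# Inference by enumeration: join everything, then sum out R and T.
late = eliminate(eliminate(join(join(f_R, f_TgR, domains), f_LgT, domains), 'R'), 'T')

# Variable elimination: sum out R as soon as it has been joined, then T.
early = eliminate(join(eliminate(join(f_R, f_TgR, domains), 'R'), f_LgT, domains), 'T')

print(late.table, early.table)  # both: {('+l',): 0.134, ('-l',): 0.866}
```

Same answer either way; the difference is that the early version never builds the full 8-entry joint.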

Evidence

§ If evidence, start with factors that select that evidence
§ No evidence uses these initial factors:

P(R)
  +r  0.1
  -r  0.9

P(T|R)
  +r  +t  0.8
  +r  -t  0.2
  -r  +t  0.1
  -r  -t  0.9

P(L|T)
  +t  +l  0.3
  +t  -l  0.7
  -t  +l  0.1
  -t  -l  0.9

§ Computing P(L | +r), the initial factors become:

P(+r)
  +r  0.1

P(T | +r)
  +r  +t  0.8
  +r  -t  0.2

P(L|T)
  (unchanged, as above)

§ We eliminate all vars other than query + evidence

Evidence II

§ Result will be a selected joint of query and evidence
§ E.g. for P(L | +r), we would end up with:

P(+r, L)           Normalize → P(L | +r)
  +r  +l  0.026                  +l  0.26
  +r  -l  0.074                  -l  0.74

§ To get our answer, just normalize this!
§ That's it!
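A sketch of evidence selection with the same helpers; select is this sketch's name, and it instantiates a variable and drops that dimension from the factor:

```python
def select(f, var, value):
    """Instantiate var = value: keep matching rows, drop that dimension."""
    if var not in f.variables:
        return f
    i = f.variables.index(var)
    keep = [v for v in f.variables if v != var]
    table = {k[:i] + k[i+1:]: p for k, p in f.table.items() if k[i] == value}
    return Factor(keep, table)

# P(L | +r): select the evidence, eliminate the hidden variable T, normalize.
g1 = select(f_R, 'R', '+r')                   # scalar factor {(): 0.1}
g2 = select(f_TgR, 'R', '+r')                 # P(T | +r)
p_rL = eliminate(join(join(g1, g2, domains), f_LgT, domains), 'T')
print(p_rL.table)                             # {('+l',): 0.026, ('-l',): 0.074}
Z = sum(p_rL.table.values())
print({k: v / Z for k, v in p_rL.table.items()})  # {('+l',): 0.26, ('-l',): 0.74}
```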

General Variable Elimination

§ Query: $P(Q \mid E_1 = e_1, \ldots, E_k = e_k)$
§ Start with initial factors:
  § Local CPTs (but instantiated by evidence)
§ While there are still hidden variables (not Q or evidence):
  § Pick a hidden variable H
  § Join all factors mentioning H
  § Eliminate (sum out) H
§ Join all remaining factors and normalize

If a variable appears in front of the conditioning bar in any of the factors participating in the join, it'll be in front of the conditioning bar in the resulting factor. Otherwise it'll end up behind the conditioning bar. A variable can never appear in front of the conditioning bar in more than one factor.

Example

§ Query: P(B | +j, +m). Start with initial factors P(B), P(E), P(A|B,E), P(+j|A), P(+m|A)
§ Choose A: join P(A|B,E), P(+j|A), P(+m|A) and sum out A, giving $f_1(+j, +m \mid B, E)$
§ Choose E: join P(E) and $f_1$, then sum out E, giving $f_2(+j, +m \mid B)$
§ Finish with B: join P(B) and $f_2$, then normalize to get P(B | +j, +m)
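The whole loop, as a sketch on top of the helpers above (variable_elimination is illustrative, not the course's project API), demonstrated on the traffic factors from the earlier sketches:

```python
def variable_elimination(factors, query, evidence, domains):
    # Instantiate evidence in every factor.
    for var, val in evidence.items():
        factors = [select(f, var, val) for f in factors]
    # Repeatedly pick a hidden variable, join everything mentioning it, sum it out.
    hidden = {v for f in factors for v in f.variables} - {query}
    for h in hidden:
        mentioning = [f for f in factors if h in f.variables]
        rest = [f for f in factors if h not in f.variables]
        joined = mentioning[0]
        for f in mentioning[1:]:
            joined = join(joined, f, domains)
        factors = rest + [eliminate(joined, h)]
    # Join all remaining factors and normalize.
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f, domains)
    Z = sum(result.table.values())
    return Factor(result.variables, {k: p / Z for k, p in result.table.items()})

print(variable_elimination([f_R, f_TgR, f_LgT], 'L', {}, domains).table)
# {('+l',): 0.134, ('-l',): 0.866}
print(variable_elimination([f_R, f_TgR, f_LgT], 'L', {'R': '+r'}, domains).table)
# {('+l',): 0.26, ('-l',): 0.74}
```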

Example 2: P(B|a)

§ Start / Select: the initial factors, with the evidence a to be selected:

P(B)
  B   P
  +b  0.1
  ¬b  0.9

P(A|B)
  B   A   P
  +b  +a  0.8
  +b  ¬a  0.2
  ¬b  +a  0.1
  ¬b  ¬a  0.9

§ Join on B (no hidden variables, so go straight to the selected joint):

P(a, B)
  A   B   P
  +a  +b  0.08
  +a  ¬b  0.09

§ Normalize → Finish with B:

P(B | a)
  A   B   P
  +a  +b  8/17
  +a  ¬b  9/17

Same Example in Equations

$$\begin{aligned}
P(B \mid j, m) \propto P(B, j, m)
&= \sum_{e,a} P(B, j, m, e, a) && \text{marginal can be obtained from joint by summing out}\\
&= \sum_{e,a} P(B)\,P(e)\,P(a \mid B, e)\,P(j \mid a)\,P(m \mid a) && \text{use Bayes' net joint distribution expression}\\
&= \sum_e P(B)\,P(e) \sum_a P(a \mid B, e)\,P(j \mid a)\,P(m \mid a) && \text{use } x(y+z) = xy + xz\\
&= \sum_e P(B)\,P(e)\, f_1(j, m \mid B, e) && \text{joining on } a \text{, and then summing out, gives } f_1\\
&= P(B) \sum_e P(e)\, f_1(j, m \mid B, e) && \text{use } x(y+z) = xy + xz\\
&= P(B)\, f_2(j, m \mid B) && \text{joining on } e \text{, and then summing out, gives } f_2
\end{aligned}$$

All we are doing is exploiting $uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz = (u+v)(w+x)(y+z)$ to improve computational efficiency!

Another Variable Elimination Example

Query: $P(X_3 \mid y_1, y_2, y_3)$. Start by inserting evidence, which gives the following initial factors:

$$P(Z),\ P(X_1 \mid Z),\ P(X_2 \mid Z),\ P(X_3 \mid Z),\ P(y_1 \mid X_1),\ P(y_2 \mid X_2),\ P(y_3 \mid X_3)$$

Eliminate $X_1$; this introduces the factor $f_1(y_1 \mid Z) = \sum_{x_1} P(x_1 \mid Z)\, P(y_1 \mid x_1)$, and we are left with:

$$P(Z),\ P(X_2 \mid Z),\ P(X_3 \mid Z),\ P(y_2 \mid X_2),\ P(y_3 \mid X_3),\ f_1(y_1 \mid Z)$$

Eliminate $X_2$; this introduces the factor $f_2(y_2 \mid Z) = \sum_{x_2} P(x_2 \mid Z)\, P(y_2 \mid x_2)$, and we are left with:

$$P(Z),\ P(X_3 \mid Z),\ P(y_3 \mid X_3),\ f_1(y_1 \mid Z),\ f_2(y_2 \mid Z)$$

Eliminate $Z$; this introduces the factor $f_3(y_1, y_2, X_3) = \sum_z P(z)\, P(X_3 \mid z)\, f_1(y_1 \mid z)\, f_2(y_2 \mid z)$, and we are left with:

$$P(y_3 \mid X_3),\ f_3(y_1, y_2, X_3)$$

No hidden variables left. Join the remaining factors to get:

$$f_4(y_1, y_2, y_3, X_3) = P(y_3 \mid X_3) \cdot f_3(y_1, y_2, X_3)$$

Normalizing over $X_3$ gives $P(X_3 \mid y_1, y_2, y_3) = f_4(y_1, y_2, y_3, X_3) \,/\, \sum_{x_3} f_4(y_1, y_2, y_3, x_3)$.

Computational complexity critically depends on the largest factor being generated in this process. Size of factor = number of entries in table. In the example above (assuming binary variables) all factors generated are of size 2 --- as they all only have one free variable ($Z$, $Z$, and $X_3$ respectively).

Variable Elimination Ordering

§ For the query $P(X_n \mid y_1, \ldots, y_n)$, work through the following two different orderings as done in the previous slides: $Z, X_1, \ldots, X_{n-1}$ and $X_1, \ldots, X_{n-1}, Z$. What is the size of the maximum factor generated for each of the orderings? (Network: $Z$ is the parent of each $X_i$, and each $X_i$ has an observed child $y_i$.)
§ Answer: $2^{n+1}$ versus $2^2$ (assuming binary variables)
§ In general: the ordering can greatly affect efficiency.
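A small self-contained sketch that compares the two orderings by tracking only factor scopes (sets of unobserved variables); it measures each factor produced by a join, before summing out, which is the count the answer above refers to. All names here are illustrative:

```python
def max_factor_size(scopes, order):
    """Largest joined-factor size (entries, assuming binary variables)."""
    largest = 0
    scopes = [set(s) for s in scopes]
    for h in order:
        touching = [s for s in scopes if h in s]
        rest = [s for s in scopes if h not in s]
        merged = set().union(*touching)        # scope of the joined factor
        largest = max(largest, 2 ** len(merged))
        scopes = rest + [merged - {h}]         # then h is summed out
    return largest

n = 3
# Initial scopes: the y's are observed, so only Z and the X's remain.
scopes = [{'Z'}] \
       + [{'Z', f'X{i}'} for i in range(1, n + 1)] \
       + [{f'X{i}'} for i in range(1, n + 1)]   # P(y_i | X_i) with y_i selected

bad  = ['Z'] + [f'X{i}' for i in range(1, n)]   # Z, X1, ..., X_{n-1}
good = [f'X{i}' for i in range(1, n)] + ['Z']   # X1, ..., X_{n-1}, Z
print(max_factor_size(scopes, bad))    # 16 = 2**(n+1)
print(max_factor_size(scopes, good))   # 4  = 2**2
```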

VE: Computational and Space Complexity

§ The computational and space complexity of variable elimination is determined by the largest factor
§ The elimination ordering can greatly affect the size of the largest factor
  § E.g., previous slide's example: $2^n$ vs. $2$
§ Does there always exist an ordering that only results in small factors?
  § No!

Worst Case Complexity?

§ CSP: [a 3-SAT instance encoded as a Bayes' net; diagram]
§ If we can answer whether P(z) is equal to zero or not, we have answered whether the 3-SAT problem has a solution.
§ Hence inference in Bayes' nets is NP-hard. No known efficient probabilistic inference in general.

"Easy" Structures: Polytrees

§ A polytree is a directed graph with no undirected cycles
§ For polytrees you can always find an ordering that is efficient
  § Try it!!
§ Cut-set conditioning for Bayes' net inference
  § Choose a set of variables such that if removed only a polytree remains
  § Exercise: Think about how the specifics would work out!

Bayes' Nets

§ Representation
§ Conditional Independences
§ Probabilistic Inference
  § Enumeration (exact, exponential complexity)
  § Variable elimination (exact, worst-case exponential complexity, often better)
  § Inference is NP-complete
  § Sampling (approximate)
§ Learning Bayes' Nets from Data