Probabilistic Graphical Models

Inference: Variable Elimination

Variable Elimination Algorithm

Daphne Koller

Elimination in Chains

[Figure: chain network A → B → C → D → E]

P̃(E) ∝ Σ_D Σ_C Σ_B Σ_A P(A, B, C, D, E)

     = Σ_D Σ_C Σ_B Σ_A φ_1(A,B)·φ_2(B,C)·φ_3(C,D)·φ_4(D,E)

     = Σ_D Σ_C Σ_B φ_2(B,C)·φ_3(C,D)·φ_4(D,E) Σ_A φ_1(A,B)

     = Σ_D Σ_C Σ_B φ_2(B,C)·φ_3(C,D)·φ_4(D,E)·τ_1(B)

Elimination in Chains

[Figure: chain network A → B → C → D → E]

P(E) ∝ Σ_D Σ_C Σ_B φ_2(B,C)·φ_3(C,D)·φ_4(D,E)·τ_1(B)

     = Σ_D Σ_C φ_3(C,D)·φ_4(D,E) [ Σ_B φ_2(B,C)·τ_1(B) ]

     = Σ_D Σ_C φ_3(C,D)·φ_4(D,E)·τ_2(C)
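The chain derivation above can be checked numerically. A minimal sketch, assuming binary variables and arbitrary (randomly chosen, illustrative) factor values: pushing each sum inward gives the same answer as summing the full joint.

```python
import numpy as np

# Hypothetical chain A - B - C - D - E with binary variables.
# Factor values are illustrative, not from the lecture.
rng = np.random.default_rng(0)
phi1 = rng.random((2, 2))  # phi1(A, B)
phi2 = rng.random((2, 2))  # phi2(B, C)
phi3 = rng.random((2, 2))  # phi3(C, D)
phi4 = rng.random((2, 2))  # phi4(D, E)

# Brute force: build the full joint, then sum out A, B, C, D.
joint = np.einsum('ab,bc,cd,de->abcde', phi1, phi2, phi3, phi4)
p_e_brute = joint.sum(axis=(0, 1, 2, 3))

# Variable elimination: push each sum inward, one variable at a time.
tau1 = phi1.sum(axis=0)  # tau1(B) = sum_A phi1(A,B)
tau2 = tau1 @ phi2       # tau2(C) = sum_B tau1(B) phi2(B,C)
tau3 = tau2 @ phi3       # tau3(D) = sum_C tau2(C) phi3(C,D)
p_e = tau3 @ phi4        # sum_D tau3(D) phi4(D,E)

assert np.allclose(p_e, p_e_brute)
p_e /= p_e.sum()         # renormalize to get P(E)
```

The brute-force joint has 2^5 entries, while each elimination step touches at most a 2×2 table — the same savings the summation reordering above expresses symbolically.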

Variable Elimination
•  Goal: P(J)
•  Eliminate: C, D, I, H, G, S, L

Σ_{L,S,G,H,I,D,C} φ_J(J,L,S)·φ_L(L,G)·φ_S(S,I)·φ_G(G,I,D)·φ_H(H,G,J)·φ_I(I)·φ_D(C,D)·φ_C(C)

= Σ_{L,S,G,H,I,D} φ_J(J,L,S)·φ_L(L,G)·φ_S(S,I)·φ_G(G,I,D)·φ_H(H,G,J)·φ_I(I) Σ_C φ_D(C,D)·φ_C(C)

Compute τ_1(D) = Σ_C φ_C(C)·φ_D(C,D), giving

= Σ_{L,S,G,H,I,D} φ_J(J,L,S)·φ_L(L,G)·φ_S(S,I)·φ_G(G,I,D)·φ_H(H,G,J)·φ_I(I)·τ_1(D)

[Figure: the student network over C, D, I, G, S, L, H, J]

Variable Elimination
•  Goal: P(J)
•  Eliminate: D, I, H, G, S, L

Σ_{L,S,G,H,I,D} φ_J(J,L,S)·φ_L(L,G)·φ_S(S,I)·φ_G(G,I,D)·φ_H(H,G,J)·φ_I(I)·τ_1(D)

= Σ_{L,S,G,H,I} φ_J(J,L,S)·φ_L(L,G)·φ_S(S,I)·φ_H(H,G,J)·φ_I(I) Σ_D φ_G(G,I,D)·τ_1(D)

Compute τ_2(G,I) = Σ_D φ_G(G,I,D)·τ_1(D), giving

= Σ_{L,S,G,H,I} φ_J(J,L,S)·φ_L(L,G)·φ_S(S,I)·φ_H(H,G,J)·φ_I(I)·τ_2(G,I)

[Figure: the student network over C, D, I, G, S, L, H, J]

Variable Elimination
•  Goal: P(J)
•  Eliminate: I, H, G, S, L

Σ_{L,S,G,H,I} φ_J(J,L,S)·φ_L(L,G)·φ_S(S,I)·φ_H(H,G,J)·φ_I(I)·τ_2(G,I)

= Σ_{L,S,G,H} φ_J(J,L,S)·φ_L(L,G)·φ_H(H,G,J) Σ_I φ_S(S,I)·φ_I(I)·τ_2(G,I)

Compute τ_3(S,G) = Σ_I φ_S(S,I)·φ_I(I)·τ_2(G,I), giving

= Σ_{L,S,G,H} φ_J(J,L,S)·φ_L(L,G)·φ_H(H,G,J)·τ_3(S,G)

[Figure: the student network over C, D, I, G, S, L, H, J]

Variable Elimination
•  Goal: P(J)
•  Eliminate: H, G, S, L

Σ_{L,S,G,H} φ_J(J,L,S)·φ_L(L,G)·φ_H(H,G,J)·τ_3(S,G)

= Σ_{L,S,G} φ_J(J,L,S)·φ_L(L,G)·τ_3(S,G) Σ_H φ_H(H,G,J)

Compute τ_4(G,J) = Σ_H φ_H(H,G,J), giving

= Σ_{L,S,G} φ_J(J,L,S)·φ_L(L,G)·τ_3(S,G)·τ_4(G,J)

[Figure: the student network over C, D, I, G, S, L, H, J]

Variable Elimination
•  Goal: P(J)
•  Eliminate: G, S, L

Σ_{L,S,G} φ_J(J,L,S)·φ_L(L,G)·τ_3(S,G)·τ_4(G,J)

= Σ_{L,S} φ_J(J,L,S) Σ_G φ_L(L,G)·τ_3(S,G)·τ_4(G,J)

Compute τ_5(J,L,S) = Σ_G φ_L(L,G)·τ_3(S,G)·τ_4(G,J) (note that S and J remain in the scope, since only G is summed out), giving

= Σ_{L,S} φ_J(J,L,S)·τ_5(J,L,S)

[Figure: the student network over C, D, I, G, S, L, H, J]

Variable Elimination
•  Goal: P(J)
•  Eliminate: S, L

Σ_{L,S} φ_J(J,L,S)·τ_5(J,L,S)

Summing out L and S leaves a factor over J alone; renormalizing it gives P(J).

[Figure: the student network over C, D, I, G, S, L, H, J]

Variable Elimination with Evidence
•  Goal: P(J, I=i, H=h)
•  Eliminate: C, D, G, S, L

Σ_{L,S,G,D,C} φ_J(J,L,S)·φ_L(L,G)·φ'_S(S)·φ'_G(G,D)·φ'_H(G,J)·φ'_I·φ_D(C,D)·φ_C(C)

where each φ' is the original factor reduced to the context I=i, H=h (φ'_I is a constant, since I is fully observed).

How do we get P(J | I=i, H=h)? Renormalize: divide the result by Σ_J P(J, I=i, H=h).

[Figure: the student network over C, D, I, G, S, L, H, J]

Variable Elimination in MNs
•  Goal: P(D)
•  Eliminate: A, B, C

Σ_{A,B,C} φ_1(A,B)·φ_2(B,C)·φ_3(C,D)·φ_4(A,D)

= Σ_{B,C} φ_2(B,C)·φ_3(C,D) Σ_A φ_1(A,B)·φ_4(A,D)

= Σ_{B,C} φ_2(B,C)·φ_3(C,D)·τ_1(B,D)

At the end of elimination we get τ_3(D).

[Figure: pairwise Markov network over the loop A – B – C – D – A]

Eliminate-Var Z from Φ
•  Φ' = {φ_i ∈ Φ : Z ∈ Scope[φ_i]}
•  ψ = Π_{φ_i ∈ Φ'} φ_i
•  τ = Σ_Z ψ
•  Φ := (Φ − Φ') ∪ {τ}

VE Algorithm Summary
•  Reduce all factors by evidence
   –  Get a set of factors Φ
•  For each non-query variable Z
   –  Eliminate-Var Z from Φ
•  Multiply all remaining factors
•  Renormalize to get a distribution
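The summary above can be sketched as executable code. This is a minimal dictionary-based implementation, not the course's reference code; factor representation, helper names, and the binary-cardinality assumption inside `factor_product` are my own illustrative choices.

```python
from functools import reduce
from itertools import product

# A factor is (scope, table): a tuple of variable names plus a dict
# mapping assignments (tuples of values) to non-negative reals.

def factor_product(f1, f2):
    (s1, t1), (s2, t2) = f1, f2
    scope = s1 + tuple(v for v in s2 if v not in s1)
    card = {v: 2 for v in scope}  # assumption: all variables binary
    table = {}
    for assign in product(*[range(card[v]) for v in scope]):
        a = dict(zip(scope, assign))
        table[assign] = (t1[tuple(a[v] for v in s1)] *
                         t2[tuple(a[v] for v in s2)])
    return scope, table

def sum_out(var, factor):
    scope, table = factor
    i = scope.index(var)
    new_table = {}
    for assign, val in table.items():
        key = assign[:i] + assign[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + val
    return scope[:i] + scope[i + 1:], new_table

def eliminate_var(var, factors):
    # Multiply all factors mentioning var, sum var out, keep the rest.
    touched = [f for f in factors if var in f[0]]
    rest = [f for f in factors if var not in f[0]]
    return rest + [sum_out(var, reduce(factor_product, touched))]

def variable_elimination(factors, order):
    for z in order:
        factors = eliminate_var(z, factors)
    scope, table = reduce(factor_product, factors)
    total = sum(table.values())  # renormalize
    return scope, {k: v / total for k, v in table.items()}

# Small worked example (illustrative): a two-variable network A -> B.
phiA = (('A',), {(0,): 0.6, (1,): 0.4})
phiAB = (('A', 'B'), {(0, 0): 0.9, (0, 1): 0.1,
                      (1, 0): 0.2, (1, 1): 0.8})
scope, dist = variable_elimination([phiA, phiAB], order=['A'])
# P(B=0) = 0.6*0.9 + 0.4*0.2 = 0.62
```

Eliminating every non-query variable and renormalizing the product of what remains is exactly the four-step summary above.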

Summary
•  Simple algorithm
•  Works for both BNs and MNs
•  Factor product and summation steps can be done in any order, subject to:
   –  when Z is eliminated, all factors involving Z have been multiplied in

Probabilistic Graphical Models

Inference: Variable Elimination

Complexity Analysis

Eliminating Z

Reminder: Factor Product

N_k = |Val(X_k)|

φ_1(A,B):            φ_2(B,C):
  a1 b1  0.5           b1 c1  0.5
  a1 b2  0.8           b1 c2  0.7
  a2 b1  0.1           b2 c1  0.1
  a2 b2  0             b2 c2  0.2
  a3 b1  0.3
  a3 b2  0.9

Product ψ(A,B,C) = φ_1(A,B)·φ_2(B,C):
  a1 b1 c1  0.5·0.5 = 0.25
  a1 b1 c2  0.5·0.7 = 0.35
  a1 b2 c1  0.8·0.1 = 0.08
  a1 b2 c2  0.8·0.2 = 0.16
  a2 b1 c1  0.1·0.5 = 0.05
  a2 b1 c2  0.1·0.7 = 0.07
  a2 b2 c1  0·0.1 = 0
  a2 b2 c2  0·0.2 = 0
  a3 b1 c1  0.3·0.5 = 0.15
  a3 b1 c2  0.3·0.7 = 0.21
  a3 b2 c1  0.9·0.1 = 0.09
  a3 b2 c2  0.9·0.2 = 0.18

Cost: (m_k − 1)·N_k multiplications
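The factor product above reduces to elementwise multiplication with broadcasting once the factors are stored as arrays. A small NumPy sketch using the exact numbers from the table:

```python
import numpy as np

# The two factors from the table above: phi1(A,B) with |Val(A)| = 3,
# |Val(B)| = 2, and phi2(B,C) with |Val(C)| = 2.
phi1 = np.array([[0.5, 0.8],
                 [0.1, 0.0],
                 [0.3, 0.9]])   # phi1[a, b]
phi2 = np.array([[0.5, 0.7],
                 [0.1, 0.2]])   # phi2[b, c]

# Factor product: psi(A,B,C) = phi1(A,B) * phi2(B,C),
# aligned on the shared variable B via broadcasting.
psi = phi1[:, :, None] * phi2[None, :, :]

assert np.isclose(psi[0, 0, 0], 0.25)  # a1,b1,c1: 0.5 * 0.5
assert np.isclose(psi[2, 1, 1], 0.18)  # a3,b2,c2: 0.9 * 0.2
# N_k = |Val(A)| * |Val(B)| * |Val(C)| = 12 entries; with m_k = 2
# factors, each entry costs (m_k - 1) = 1 multiplication.
assert psi.size == 12
```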

Reminder: Factor Marginalization

N_k = |Val(X_k)|

ψ(A,B,C):              τ(A,C) = Σ_B ψ(A,B,C):
  a1 b1 c1  0.25         a1 c1  0.33
  a1 b1 c2  0.35         a1 c2  0.51
  a1 b2 c1  0.08         a2 c1  0.05
  a1 b2 c2  0.16         a2 c2  0.07
  a2 b1 c1  0.05         a3 c1  0.24
  a2 b1 c2  0.07         a3 c2  0.39
  a2 b2 c1  0
  a2 b2 c2  0
  a3 b1 c1  0.15
  a3 b1 c2  0.21
  a3 b2 c1  0.09
  a3 b2 c2  0.18

Cost: ~N_k additions
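Marginalization is a single axis-sum in the array representation. A sketch reproducing the table above (values taken from the factor-product slide):

```python
import numpy as np

# psi(A,B,C) from the factor-product slide; psi[a, b, c].
psi = np.array([[[0.25, 0.35], [0.08, 0.16]],
                [[0.05, 0.07], [0.00, 0.00]],
                [[0.15, 0.21], [0.09, 0.18]]])

# Sum out B: ~N_k additions, one pass over the table.
tau = psi.sum(axis=1)  # tau[a, c]

assert np.isclose(tau[0, 0], 0.33)  # a1,c1: 0.25 + 0.08
assert np.isclose(tau[2, 1], 0.39)  # a3,c2: 0.21 + 0.18
```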

Complexity of Variable Elimination
•  Start with m factors
   –  m ≤ n for Bayesian networks
   –  can be larger for Markov networks
•  At each elimination step we generate one new factor τ_k
•  At most n elimination steps
•  Total number of factors: m* ≤ m + n

Complexity of Variable Elimination
•  N = max_k(N_k) = size of the largest factor
•  Product operations: Σ_k (m_k − 1)·N_k
•  Sum operations: Σ_k N_k
•  Total work is linear in N and m*

Complexity of Variable Elimination
•  Total work is linear in N and m*
•  N_k = |Val(X_k)| = O(d^{r_k}), where
   –  d = max_i |Val(X_i)|
   –  r_k = |X_k| = cardinality of the scope of the k-th factor
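A quick back-of-the-envelope calculation using the formulas above; the numbers are illustrative, not from the lecture:

```python
# Operation counts for one elimination step.
d = 2    # max cardinality of any variable
r_k = 3  # scope size of the k-th generated factor
m_k = 2  # number of factors multiplied at step k

N_k = d ** r_k              # factor size: 2^3 = 8 entries
products = (m_k - 1) * N_k  # multiplications at this step
sums = N_k                  # ~N_k additions to sum out one variable

assert N_k == 8 and products == 8 and sums == 8
```

The exponential dependence on r_k is why the largest scope generated, not the number of steps, dominates the cost.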

Complexity Example

τ_1(D) = Σ_C φ_C(C)·φ_D(C,D)
τ_2(G,I) = Σ_D φ_G(G,I,D)·τ_1(D)
τ_3(S,G) = Σ_I φ_S(S,I)·φ_I(I)·τ_2(G,I)
τ_4(G,J) = Σ_H φ_H(H,G,J)
τ_5(J,L,S) = Σ_G φ_L(L,G)·τ_3(S,G)·τ_4(G,J)
τ_6(J) = Σ_{L,S} φ_J(J,L,S)·τ_5(J,L,S)

[Figure: the student network over C, D, I, G, S, L, H, J]

Complexity and Elimination Order

Σ_{L,S,G,H,I,D,C} φ_J(J,L,S)·φ_L(L,G)·φ_S(S,I)·φ_G(G,I,D)·φ_H(H,G,J)·φ_I(I)·φ_D(C,D)·φ_C(C)

•  Eliminate G first:

Σ_G φ_L(L,G)·φ_G(G,I,D)·φ_H(H,G,J)

This produces a factor over the five variables L, I, D, H, J — much larger than any factor generated by the previous ordering.

[Figure: the student network over C, D, I, G, S, L, H, J]

Complexity and Elimination Order

[Figure: star network — A connected to B_1, B_2, B_3, …, B_k; each B_i also connected to C]

•  Eliminate A first: all k factors over A are multiplied together, producing a factor τ(B_1, …, B_k) whose size is exponential in k
•  Eliminate the B_i's first: each step touches only two factors and produces a small factor τ(A, C)

Summary
•  Complexity of variable elimination is linear in
   –  the size of the model (# factors, # variables)
   –  the size of the largest factor generated
•  The size of a factor is exponential in its scope
•  Complexity of the algorithm depends heavily on the elimination ordering

Probabilistic Graphical Models

Inference: Variable Elimination

Graph-Based Perspective

Initial Graph

φ_J(J,L,S)·φ_L(L,G)·φ_S(S,I)·φ_G(G,I,D)·φ_H(H,G,J)·φ_I(I)·φ_D(C,D)·φ_C(C)

[Figure: the Bayesian network over C, D, I, G, S, L, H, J and its moralized undirected graph]

Elimination as Graph Operation

φ_J(J,L,S)·φ_L(L,G)·φ_S(S,I)·φ_G(G,I,D)·φ_H(H,G,J)·φ_I(I)·φ_D(C,D)·φ_C(C)

•  Eliminate C:  τ_1(D) = Σ_C φ_C(C)·φ_D(C,D)

[Figure: induced Markov network for the current set of factors]

Elimination as Graph Operation

φ_J(J,L,S)·φ_L(L,G)·φ_S(S,I)·φ_G(G,I,D)·φ_H(H,G,J)·φ_I(I)·τ_1(D)

•  Eliminate D:  τ_2(G,I) = Σ_D φ_G(G,I,D)·τ_1(D)

[Figure: induced Markov network for the current set of factors]

Elimination as Graph Operation

φ_J(J,L,S)·φ_L(L,G)·φ_S(S,I)·φ_I(I)·φ_H(H,G,J)·τ_2(G,I)

•  Eliminate I:  τ_3(S,G) = Σ_I φ_S(S,I)·φ_I(I)·τ_2(G,I)

This step connects S and G, a fill edge not present in the original graph.

[Figure: induced Markov network for the current set of factors]

Elimination as Graph Operation

φ_J(J,L,S)·φ_L(L,G)·φ_H(H,G,J)·τ_3(S,G)

•  Eliminate H:  τ_4(G,J) = Σ_H φ_H(H,G,J)

[Figure: induced Markov network for the current set of factors]

Elimination as Graph Operation

φ_J(J,L,S)·φ_L(L,G)·τ_3(S,G)·τ_4(G,J)

•  Eliminate G:  τ_5(J,L,S) = Σ_G φ_L(L,G)·τ_3(S,G)·τ_4(G,J)

[Figure: induced Markov network for the current set of factors]

Elimination as Graph Operation

φ_J(J,L,S)·τ_5(J,L,S)

•  Eliminate: L, S

[Figure: induced Markov network for the current set of factors]


Induced Graph
•  The induced graph I_{Φ,α} over factors Φ and ordering α:
   –  Undirected graph
   –  X_i and X_j are connected if they appear together in some factor during a run of the VE algorithm using α as the ordering

[Figure: induced graph for the student network]
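The definition above can be turned into a short symbolic procedure: run VE on factor scopes only, and record an edge whenever two variables share a factor. A sketch (the helper name and scope encoding are my own; the scopes are the student-network factors from the slides):

```python
from itertools import combinations

def induced_graph(scopes, order):
    """Edges of the induced graph I_{Phi,alpha} as a set of frozensets."""
    scopes = [set(s) for s in scopes]
    edges = set()
    # Variables sharing an original factor are connected.
    for s in scopes:
        edges |= {frozenset(p) for p in combinations(s, 2)}
    # Symbolic VE: eliminating z creates a factor over z's neighbors.
    for z in order:
        touched = [s for s in scopes if z in s]
        tau = set().union(*touched) - {z}
        edges |= {frozenset(p) for p in combinations(tau, 2)}
        scopes = [s for s in scopes if z not in s] + [tau]
    return edges

# Student-network factor scopes and the ordering used in the slides.
scopes = [('J', 'L', 'S'), ('L', 'G'), ('S', 'I'), ('G', 'I', 'D'),
          ('H', 'G', 'J'), ('I',), ('C', 'D'), ('C',)]
order = ['C', 'D', 'I', 'H', 'G', 'S', 'L']
E = induced_graph(scopes, order)
# Eliminating I creates the fill edge S-G seen in the slides.
assert frozenset({'S', 'G'}) in E
```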

Cliques in the Induced Graph
•  Theorem: Every factor produced during VE is a clique in the induced graph

τ_1(D) = Σ_C φ_C(C)·φ_D(C,D)
τ_2(G,I) = Σ_D φ_G(G,I,D)·τ_1(D)
τ_3(S,G) = Σ_I φ_S(S,I)·φ_I(I)·τ_2(G,I)
τ_4(G,J) = Σ_H φ_H(H,G,J)
τ_5(J,L,S) = Σ_G φ_L(L,G)·τ_3(S,G)·τ_4(G,J)
τ_6(J) = Σ_{L,S} φ_J(J,L,S)·τ_5(J,L,S)

[Figure: induced graph for the student network]

Cliques in the Induced Graph
•  Theorem: Every (maximal) clique in the induced graph is a factor produced during VE

τ_1(D) = Σ_C φ_C(C)·φ_D(C,D)
τ_2(G,I) = Σ_D φ_G(G,I,D)·τ_1(D)
τ_3(S,G) = Σ_I φ_S(S,I)·φ_I(I)·τ_2(G,I)
τ_4(G,J) = Σ_H φ_H(H,G,J)
τ_5(J,L,S) = Σ_G φ_L(L,G)·τ_3(S,G)·τ_4(G,J)
τ_6(J) = Σ_{L,S} φ_J(J,L,S)·τ_5(J,L,S)

[Figure: induced graph for the student network]

Cliques in the Induced Graph
•  Theorem: Every (maximal) clique in the induced graph is a factor produced during VE

[Figure: induced graph for the student network with its maximal cliques marked]

Induced Width
•  The width of an induced graph is the number of nodes in its largest clique, minus 1
•  The minimal induced width of a graph K is min_α width(I_{K,α})
•  Provides a lower bound on the best achievable performance of VE for a model that factorizes over K

Summary
•  Variable elimination can be viewed as a sequence of transformations on an undirected graph
   –  Eliminating a node connects all of its current neighbors
•  Cliques in the resulting induced graph directly correspond to the algorithm's complexity

Probabilistic Graphical Models

Inference: Variable Elimination

Finding Elimination Orderings

Finding Elimination Orderings
•  Theorem: For a graph H, determining whether there exists an elimination ordering for H with induced width ≤ K is NP-complete
•  Note: this NP-hardness result is distinct from the NP-hardness of inference
   –  Even given the optimal ordering, inference may still be exponential

Finding Elimination Orderings
•  Greedy search using a heuristic cost function
   –  At each point, eliminate the node with the smallest cost
•  Possible cost functions:
   –  min-neighbors: # of neighbors in the current graph
   –  min-weight: weight (# of values) of the factor formed
   –  min-fill: number of new fill edges
   –  weighted min-fill: total weight of the new fill edges (edge weight = product of the weights of its two endpoints)
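The greedy scheme above is easy to sketch. This implements the min-fill heuristic on an adjacency-dict graph; the function names and the moralized student-network adjacency used as an example are my own choices, not code from the course.

```python
from itertools import combinations

def fill_edges(adj, z):
    """New edges needed to make z's neighborhood a clique."""
    return [(u, v) for u, v in combinations(sorted(adj[z]), 2)
            if v not in adj[u]]

def min_fill_order(adj):
    """Greedy elimination ordering: fewest fill edges first."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    order = []
    while adj:
        # Break ties alphabetically for determinism.
        z = min(adj, key=lambda v: (len(fill_edges(adj, v)), v))
        for u, v in fill_edges(adj, z):   # add fill edges
            adj[u].add(v); adj[v].add(u)
        for n in adj[z]:                  # remove z from the graph
            adj[n].discard(z)
        del adj[z]
        order.append(z)
    return order

# Moralized student network from the earlier slides.
student = {
    'C': {'D'}, 'D': {'C', 'G', 'I'}, 'I': {'D', 'G', 'S'},
    'G': {'D', 'I', 'L', 'H', 'J'}, 'S': {'I', 'L', 'J'},
    'L': {'G', 'S', 'J'}, 'H': {'G', 'J'}, 'J': {'G', 'H', 'L', 'S'},
}
order = min_fill_order(student)
# C has a single neighbor, so it creates zero fill edges and goes first.
assert order[0] == 'C'
```

Swapping the key function gives min-neighbors (`len(adj[v])`) or min-weight; the greedy skeleton stays the same.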

Finding Elimination Orderings
•  Theorem: The induced graph is triangulated
   –  No loops of length > 3 without a "bridge" (chord)

[Figure: a 4-cycle A – B – C – D with a chord]

•  Can find an elimination ordering by finding a low-width triangulation of the original graph H_Φ

Robot Localization & Mapping

[Figure: SLAM network — a chain of robot poses x0, x1, x2, x3, x4, …; sensor observations z1, z2, z3, z4; landmarks L1, L2, L3. Here x_t is the robot pose and z_t the sensor observation at time t, which depends on the current pose and nearby landmarks.]

Square Root SAM, F. Dellaert and M. Kaess, IJRR, 2006

Robot Localization & Mapping

[Figure: a larger localization-and-mapping network]

Eliminate Poses then Landmarks

[Figure: resulting induced graph and fill-in pattern]

Eliminate Landmarks then Poses

[Figure: resulting induced graph and fill-in pattern]

Summary
•  Finding the optimal elimination ordering is NP-hard
•  Simple heuristics that try to keep the induced graph small often provide reasonable performance