Probabilistic Graphical Models
Inference: Variable Elimination
Variable Elimination Algorithm
Daphne Koller
Elimination in Chains

[Figure: chain network A — B — C — D — E]

P̃(E) ∝ Σ_D Σ_C Σ_B Σ_A P(A, B, C, D, E)
     = Σ_D Σ_C Σ_B Σ_A φ1(A,B) φ2(B,C) φ3(C,D) φ4(D,E)
     = Σ_D Σ_C Σ_B φ2(B,C) φ3(C,D) φ4(D,E) Σ_A φ1(A,B)
     = Σ_D Σ_C Σ_B φ2(B,C) φ3(C,D) φ4(D,E) τ1(B)
Elimination in Chains

[Figure: chain network A — B — C — D — E, with A already eliminated]

P(E) ∝ Σ_D Σ_C Σ_B φ2(B,C) φ3(C,D) φ4(D,E) τ1(B)
     = Σ_D Σ_C φ3(C,D) φ4(D,E) [ Σ_B φ2(B,C) τ1(B) ]
     = Σ_D Σ_C φ3(C,D) φ4(D,E) τ2(C)
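As a numerical sanity check of the derivation above, here is a small NumPy sketch (mine, not the lecture's) comparing brute-force summation of the joint against step-by-step elimination on the chain; the binary cardinalities and random potentials are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
phi1 = rng.random((2, 2))   # phi1(A,B)
phi2 = rng.random((2, 2))   # phi2(B,C)
phi3 = rng.random((2, 2))   # phi3(C,D)
phi4 = rng.random((2, 2))   # phi4(D,E)

# Brute force: sum the full product over A, B, C, D at once.
brute = np.einsum('ab,bc,cd,de->e', phi1, phi2, phi3, phi4)

# Elimination: push each sum inward, one variable at a time.
tau1 = phi1.sum(axis=0)   # tau1(B) = sum_A phi1(A,B)
tau2 = tau1 @ phi2        # tau2(C) = sum_B phi2(B,C) tau1(B)
tau3 = tau2 @ phi3        # tau3(D) = sum_C phi3(C,D) tau2(C)
tau4 = tau3 @ phi4        # unnormalized P(E) = sum_D phi4(D,E) tau3(D)

assert np.allclose(brute, tau4)
print(tau4 / tau4.sum())  # renormalize to get P(E)
```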
Variable Elimination
• Goal: P(J)
• Eliminate: C, D, I, H, G, S, L

[Figure: student network over C, D, I, G, S, L, J, H]

Σ_{L,S,G,H,I,D,C} φJ(J,L,S) φL(L,G) φS(S,I) φG(G,I,D) φH(H,G,J) φI(I) φD(C,D) φC(C)
  = Σ_{L,S,G,H,I,D} φJ(J,L,S) φL(L,G) φS(S,I) φG(G,I,D) φH(H,G,J) φI(I) Σ_C φD(C,D) φC(C)

Compute τ1(D) = Σ_C φC(C) φD(C,D)

  = Σ_{L,S,G,H,I,D} φJ(J,L,S) φL(L,G) φS(S,I) φG(G,I,D) φH(H,G,J) φI(I) τ1(D)
Variable Elimination
• Goal: P(J)
• Eliminate: D, I, H, G, S, L

Σ_{L,S,G,H,I,D} φJ(J,L,S) φL(L,G) φS(S,I) φG(G,I,D) φH(H,G,J) φI(I) τ1(D)
  = Σ_{L,S,G,H,I} φJ(J,L,S) φL(L,G) φS(S,I) φH(H,G,J) φI(I) Σ_D φG(G,I,D) τ1(D)

Compute τ2(G,I) = Σ_D φG(G,I,D) τ1(D)

  = Σ_{L,S,G,H,I} φJ(J,L,S) φL(L,G) φS(S,I) φH(H,G,J) φI(I) τ2(G,I)
Variable Elimination
• Goal: P(J)
• Eliminate: I, H, G, S, L

Σ_{L,S,G,H,I} φJ(J,L,S) φL(L,G) φS(S,I) φH(H,G,J) φI(I) τ2(G,I)
  = Σ_{L,S,G,H} φJ(J,L,S) φL(L,G) φH(H,G,J) Σ_I φS(S,I) φI(I) τ2(G,I)

Compute τ3(S,G) = Σ_I φS(S,I) φI(I) τ2(G,I)

  = Σ_{L,S,G,H} φJ(J,L,S) φL(L,G) φH(H,G,J) τ3(S,G)
Variable Elimination
• Goal: P(J)
• Eliminate: H, G, S, L

Σ_{L,S,G,H} φJ(J,L,S) φL(L,G) φH(H,G,J) τ3(S,G)
  = Σ_{L,S,G} φJ(J,L,S) φL(L,G) τ3(S,G) Σ_H φH(H,G,J)

Compute τ4(G,J) = Σ_H φH(H,G,J)

  = Σ_{L,S,G} φJ(J,L,S) φL(L,G) τ3(S,G) τ4(G,J)
Variable Elimination
• Goal: P(J)
• Eliminate: G, S, L

Σ_{L,S,G} φJ(J,L,S) φL(L,G) τ3(S,G) τ4(G,J)
  = Σ_{L,S} φJ(J,L,S) Σ_G φL(L,G) τ3(S,G) τ4(G,J)

Compute τ5(J,L,S) = Σ_G φL(L,G) τ3(S,G) τ4(G,J)
(note the scope: τ3 brings in S and τ4 brings in J, so τ5 is over J, L, and S)

  = Σ_{L,S} φJ(J,L,S) τ5(J,L,S)
Variable Elimination
• Goal: P(J)
• Eliminate: S, L

Σ_{L,S} φJ(J,L,S) τ5(J,L,S)
Variable Elimination with evidence
• Goal: P(J, I=i, H=h)
• Eliminate: C, D, G, S, L

Σ_{L,S,G,D,C} φJ(J,L,S) φL(L,G) φ'S(S) φ'G(G,D) φ'H(G,J) φ'I() φD(C,D) φC(C)

Here each φ' is the original factor reduced by the evidence, e.g. φ'G(G,D) = φG(G, I=i, D), and φ'I() = φI(i) is a constant.

How do we get P(J | I=i, H=h)? Renormalize: divide the result by Σ_J P(J, I=i, H=h).
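One way to realize the reduction step is to slice each factor's table at the observed values. The sketch below is an illustrative assumption, not the lecture's code; reduce_evidence is a hypothetical helper using the same (scope, table) factor representation as the ve() sketch given after the algorithm summary below.

```python
def reduce_evidence(factors, evidence):
    """Slice each factor's table at the observed values.
    factors: list of (scope, table); scope is a tuple of variable names,
    table a NumPy array with one axis per variable."""
    out = []
    for scope, table in factors:
        # Integer-index observed axes (dropping them); keep the rest.
        idx = tuple(evidence.get(v, slice(None)) for v in scope)
        new_scope = tuple(v for v in scope if v not in evidence)
        out.append((new_scope, table[idx]))
    return out

# P(J | I=i, H=h): run VE on reduce_evidence(factors, {'I': i, 'H': h})
# to get unnormalized P(J, I=i, H=h), then divide by its sum over J.
```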
Variable Elimination in MNs
• Goal: P(D)
• Eliminate: A, B, C

[Figure: Markov network on the 4-cycle A — B — C — D — A]

Σ_{A,B,C} φ1(A,B) φ2(B,C) φ3(C,D) φ4(A,D)
  = Σ_{B,C} φ2(B,C) φ3(C,D) Σ_A φ1(A,B) φ4(A,D)
  = Σ_{B,C} φ2(B,C) φ3(C,D) τ1(B,D)

At the end of elimination we get τ3(D).
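A quick NumPy check of this Markov-network example, mirroring the slide's τ1, τ2, τ3; binary variables and random potentials are assumptions, not the lecture's data.

```python
import numpy as np

rng = np.random.default_rng(1)
p1, p2, p3, p4 = (rng.random((2, 2)) for _ in range(4))  # phi1..phi4

tau1 = np.einsum('ab,ad->bd', p1, p4)    # eliminate A: tau1(B,D)
tau2 = np.einsum('bc,bd->cd', p2, tau1)  # eliminate B: tau2(C,D)
tau3 = np.einsum('cd,cd->d', p3, tau2)   # eliminate C: tau3(D)

brute = np.einsum('ab,bc,cd,ad->d', p1, p2, p3, p4)
assert np.allclose(tau3, brute)
print(tau3 / tau3.sum())                 # P(D)
```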
Eliminate-Var Z from Φ
• Φ' = { φi ∈ Φ : Z ∈ Scope[φi] }
• ψ = Π_{φi ∈ Φ'} φi
• τ = Σ_Z ψ
• Φ := (Φ − Φ') ∪ {τ}
VE Algorithm Summary
• Reduce all factors by evidence
  – Get a set of factors Φ
• For each non-query variable Z
  – Eliminate-Var Z from Φ
• Multiply all remaining factors
• Renormalize to get a distribution
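Putting the summary together, here is a compact sketch of sum-product VE over discrete factors. It is one possible implementation under stated assumptions — a factor is a (scope, table) pair, variable names are mapped to einsum letters — not the course's reference code.

```python
import string
import numpy as np

def ve(factors, elim_order, all_vars):
    """Sum-product variable elimination (a sketch, not reference code).
    factors: list of (scope, table); scope is a tuple of variable names,
    table a NumPy array with one axis per scope variable."""
    letter = {v: string.ascii_letters[i] for i, v in enumerate(all_vars)}

    def multiply(fs):
        # Factor product via einsum over the union of the scopes.
        scope = tuple(dict.fromkeys(v for s, _ in fs for v in s))
        spec = ','.join(''.join(letter[v] for v in s) for s, _ in fs)
        out = ''.join(letter[v] for v in scope)
        return scope, np.einsum(spec + '->' + out, *[t for _, t in fs])

    for z in elim_order:                      # Eliminate-Var Z from Phi:
        hit = [f for f in factors if z in f[0]]
        scope, psi = multiply(hit)            #   psi = product of Phi'
        tau = psi.sum(axis=scope.index(z))    #   tau = sum_Z psi
        factors = [f for f in factors if z not in f[0]]
        factors.append((tuple(v for v in scope if v != z), tau))

    scope, table = multiply(factors)          # multiply remaining factors
    return scope, table / table.sum()         # renormalize

# Usage on the chain A-B-C-D-E with random binary potentials:
rng = np.random.default_rng(0)
chain = [(('A', 'B'), rng.random((2, 2))), (('B', 'C'), rng.random((2, 2))),
         (('C', 'D'), rng.random((2, 2))), (('D', 'E'), rng.random((2, 2)))]
print(ve(chain, ['A', 'B', 'C', 'D'], 'ABCDE'))
```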
Summary
• Simple algorithm
• Works for both BNs and MNs
• Factor product and summation steps can be done in any order, subject to:
  – when Z is eliminated, all factors involving Z have been multiplied in
Probabilistic Graphical Models
Inference: Variable Elimination
Complexity Analysis
Daphne Koller
Eliminating Z
• Multiply the m_k factors that mention Z into a single factor ψ_k(X_k)
• Sum out Z to get τ_k = Σ_Z ψ_k, with scope X_k − {Z}
• N_k = |Val(X_k)| is the number of entries in ψ_k
Reminder: Factor Product
N_k = |Val(X_k)|

φ1(A,B):          φ2(B,C):
  a1 b1  0.5        b1 c1  0.5
  a1 b2  0.8        b1 c2  0.7
  a2 b1  0.1        b2 c1  0.1
  a2 b2  0          b2 c2  0.2
  a3 b1  0.3
  a3 b2  0.9

ψ(A,B,C) = φ1(A,B) · φ2(B,C):
  a1 b1 c1  0.5·0.5 = 0.25
  a1 b1 c2  0.5·0.7 = 0.35
  a1 b2 c1  0.8·0.1 = 0.08
  a1 b2 c2  0.8·0.2 = 0.16
  a2 b1 c1  0.1·0.5 = 0.05
  a2 b1 c2  0.1·0.7 = 0.07
  a2 b2 c1  0·0.1 = 0
  a2 b2 c2  0·0.2 = 0
  a3 b1 c1  0.3·0.5 = 0.15
  a3 b1 c2  0.3·0.7 = 0.21
  a3 b2 c1  0.9·0.1 = 0.09
  a3 b2 c2  0.9·0.2 = 0.18

Cost: (m_k − 1)·N_k multiplications
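The same table can be reproduced with NumPy broadcasting. A small sketch; the array layout (rows a1..a3 over columns b1,b2, and rows b1,b2 over columns c1,c2) is my assumption about how the factors are stored.

```python
import numpy as np

phi1 = np.array([[0.5, 0.8],    # phi1(A,B)
                 [0.1, 0.0],
                 [0.3, 0.9]])
phi2 = np.array([[0.5, 0.7],    # phi2(B,C)
                 [0.1, 0.2]])

# psi(A,B,C) = phi1(A,B) * phi2(B,C): insert broadcast axes and multiply.
psi = phi1[:, :, None] * phi2[None, :, :]
print(psi[0, 0, 1])  # a1, b1, c2 -> 0.5 * 0.7 = 0.35, matching the table
# N_k = 3 * 2 * 2 = 12 entries, each costing m_k - 1 = 1 multiplication.
```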
Reminder: Factor Marginalization
N_k = |Val(X_k)|

τ(A,C) = Σ_B ψ(A,B,C):

  ψ(A,B,C):            τ(A,C):
  a1 b1 c1  0.25         a1 c1  0.33
  a1 b1 c2  0.35         a1 c2  0.51
  a1 b2 c1  0.08         a2 c1  0.05
  a1 b2 c2  0.16         a2 c2  0.07
  a2 b1 c1  0.05         a3 c1  0.24
  a2 b1 c2  0.07         a3 c2  0.39
  a2 b2 c1  0
  a2 b2 c2  0
  a3 b1 c1  0.15
  a3 b1 c2  0.21
  a3 b2 c1  0.09
  a3 b2 c2  0.18

Cost: ~N_k additions
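Continuing the broadcasting sketch from the previous slide, marginalizing B out of ψ(A,B,C) is a single sum over one axis; the layout assumptions are the same as before.

```python
import numpy as np

phi1 = np.array([[0.5, 0.8], [0.1, 0.0], [0.3, 0.9]])  # phi1(A,B)
phi2 = np.array([[0.5, 0.7], [0.1, 0.2]])              # phi2(B,C)
psi = phi1[:, :, None] * phi2[None, :, :]              # psi(A,B,C)

tau = psi.sum(axis=1)   # tau(A,C) = sum_B psi(A,B,C)
print(tau)              # [[0.33 0.51] [0.05 0.07] [0.24 0.39]]
# e.g. tau[0,0] = 0.25 + 0.08 = 0.33; cost is ~N_k additions.
```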
Complexity of Variable Elimination
• Start with m factors
  – m ≤ n for Bayesian networks
  – can be larger for Markov networks
• At each elimination step we generate at most one new factor τ_k
• At most n elimination steps
• Total number of factors: m* ≤ m + n
Complexity of Variable Elimination
• N = max_k(N_k) = size of the largest factor
• Product operations: Σ_k (m_k − 1)·N_k ≤ m*·N, since each factor is multiplied in at most once
• Sum operations: Σ_k N_k ≤ n·N
• Total work is linear in N and m*
Complexity of Variable Elimination
• Total work is linear in N and m*
• N_k = |Val(X_k)| = O(d^{r_k}) where
  – d = max_i(|Val(X_i)|)
  – r_k = |X_k| = cardinality of the scope of the kth factor
• So factor size is exponential in scope: even with binary variables (d = 2), a factor over r_k = 20 variables already has 2^20 ≈ 10^6 entries
Complexity Example
τ1(D)     = Σ_C φC(C) φD(C,D)
τ2(G,I)   = Σ_D φG(G,I,D) τ1(D)
τ3(S,G)   = Σ_I φS(S,I) φI(I) τ2(G,I)
τ4(G,J)   = Σ_H φH(H,G,J)
τ5(J,L,S) = Σ_G φL(L,G) τ3(S,G) τ4(G,J)
τ6(J)     = Σ_{L,S} φJ(J,L,S) τ5(J,L,S)

The largest factor generated is ψ5, the product over scope {G, J, L, S}.
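A hedged back-of-envelope for this run, assuming every variable is binary (d = 2); the (m_k, r_k) pairs are read off the equations above.

```python
steps = [
    (2, 2),  # tau1: phi_C * phi_D           over {C,D}
    (2, 3),  # tau2: phi_G * tau1            over {G,I,D}
    (3, 3),  # tau3: phi_S * phi_I * tau2    over {S,I,G}
    (1, 3),  # tau4: phi_H alone             over {H,G,J}
    (3, 4),  # tau5: phi_L * tau3 * tau4     over {G,J,L,S}
    (2, 3),  # tau6: phi_J * tau5            over {J,L,S}
]
mults = sum((m - 1) * 2 ** r for m, r in steps)  # sum_k (m_k - 1) N_k
adds = sum(2 ** r for m, r in steps)             # sum_k N_k
print(mults, adds)  # 68 multiplications, ~52 additions
```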
Complexity and Elimination Order

Σ_{L,S,G,H,I,D,C} φJ(J,L,S) φL(L,G) φS(S,I) φG(G,I,D) φH(H,G,J) φI(I) φD(C,D) φC(C)

• Eliminate G first:

Σ_G φL(L,G) φG(G,I,D) φH(H,G,J)

This single step already produces a factor over {L, I, D, H, J}: a poor ordering can create much larger factors than the run above.
Complexity and Elimination Order

[Figure: network with A connected to B1, B2, B3, …, Bk, and each Bi connected to C]

• Eliminate A first: produces a factor over B1, …, Bk, of size exponential in k
• Eliminate the Bi's first: each step produces only a small factor τi(A, C)
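The contrast can be checked by simulating elimination on the graph alone. A sketch assuming an adjacency-set representation; max_factor_size is a hypothetical helper that tracks the largest factor scope created.

```python
def max_factor_size(adj, order):
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    worst = 0
    for z in order:
        nbrs = adj.pop(z)
        worst = max(worst, len(nbrs) + 1)         # factor over {z} | nbrs
        for u in nbrs:                            # connect z's neighbours
            adj[u] |= nbrs - {u}
            adj[u].discard(z)
    return worst

k = 5
bs = [f'B{i}' for i in range(k)]
adj = {'A': set(bs), 'C': set(bs), **{b: {'A', 'C'} for b in bs}}
print(max_factor_size(adj, ['A'] + bs + ['C']))  # 6 = k + 1: blow-up
print(max_factor_size(adj, bs + ['A', 'C']))     # 3: small factors
```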
Summary
• Complexity of variable elimination is linear in
  – the size of the model (# factors, # variables)
  – the size of the largest factor generated
• Size of a factor is exponential in its scope
• Complexity of the algorithm depends heavily on the elimination ordering
Probabilistic Graphical Models
Inference: Variable Elimination
Graph-Based Perspective
Daphne Koller
Initial Graph
φJ(J,L,S) φL(L,G) φS(S,I) φG(G,I,D) φH(H,G,J) φI(I) φD(C,D) φC(C)

[Figure: the student Bayesian network over C, D, I, G, S, L, J, H, and the corresponding moralized Markov network]
Elimination as Graph Operation
φJ(J,L,S) φL(L,G) φS(S,I) φG(G,I,D) φH(H,G,J) φI(I) φD(C,D) φC(C)

• Eliminate C:  τ1(D) = Σ_C φC(C) φD(C,D)

[Figure: induced Markov network for the current set of factors]
Elimination as Graph Operation
φJ(J,L,S) φL(L,G) φS(S,I) φG(G,I,D) φH(H,G,J) φI(I) τ1(D)

• Eliminate D:  τ2(G,I) = Σ_D φG(G,I,D) τ1(D)

[Figure: induced Markov network for the current set of factors]
Elimination as Graph Operation
φJ(J,L,S) φL(L,G) φS(S,I) φI(I) φH(H,G,J) τ2(G,I)

• Eliminate I:  τ3(S,G) = Σ_I φS(S,I) φI(I) τ2(G,I)

This step multiplies factors mentioning both S and G, adding a fill edge S — G.

[Figure: induced Markov network for the current set of factors]
Elimination as Graph Operation
φJ(J,L,S) φL(L,G) φH(H,G,J) τ3(S,G)

• Eliminate H:  τ4(G,J) = Σ_H φH(H,G,J)

[Figure: induced Markov network for the current set of factors]
Elimination as Graph Operation
φJ(J,L,S) φL(L,G) τ3(S,G) τ4(G,J)

• Eliminate G:  τ5(J,L,S) = Σ_G φL(L,G) τ3(S,G) τ4(G,J)

This step connects G's remaining neighbors, adding fill edges among L, S, and J.

[Figure: induced Markov network for the current set of factors]
Elimination as Graph Operation
φJ(J,L,S) τ5(J,L,S)

• Eliminate L, S

[Figure: induced Markov network for the current set of factors]
Induced Graph
• The induced graph IΦ,α over factors Φ and ordering α:
  – Undirected graph
  – Xi and Xj are connected if they appeared in the same factor in a run of the VE algorithm using α as the ordering

[Figure: induced graph for the student network, including the fill edges]
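This definition translates directly into code: record every pair of variables that co-occurs in some factor during a run of elimination. A sketch; induced_graph is a hypothetical helper, and scopes are plain Python sets.

```python
from itertools import combinations

def induced_graph(scopes, order):
    scopes = [set(s) for s in scopes]
    edges = set()
    def connect(scope):
        edges.update(frozenset(p) for p in combinations(scope, 2))
    for s in scopes:
        connect(s)                     # edges of the original factors
    for z in order:
        hit = [s for s in scopes if z in s]
        psi = set().union(*hit)        # scope of the product factor,
        connect(psi)                   # ... which becomes a clique
        scopes = [s for s in scopes if z not in s] + [psi - {z}]
    return edges

student = [{'J','L','S'}, {'L','G'}, {'S','I'}, {'G','I','D'},
           {'H','G','J'}, {'I'}, {'C','D'}, {'C'}]
order = ['C', 'D', 'I', 'H', 'G', 'S', 'L']
print(sorted(tuple(sorted(e)) for e in induced_graph(student, order)))
```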
Cliques in the Induced Graph
• Theorem: Every factor produced during VE is a clique in the induced graph

τ1(D)     = Σ_C φC(C) φD(C,D)
τ2(G,I)   = Σ_D φG(G,I,D) τ1(D)
τ3(S,G)   = Σ_I φS(S,I) φI(I) τ2(G,I)
τ4(G,J)   = Σ_H φH(H,G,J)
τ5(J,L,S) = Σ_G φL(L,G) τ3(S,G) τ4(G,J)
τ6(J)     = Σ_{L,S} φJ(J,L,S) τ5(J,L,S)

[Figure: induced graph with the corresponding cliques highlighted]
Cliques in the Induced Graph
• Theorem: Every (maximal) clique in the induced graph is a factor produced during VE
(illustrated with the same factors τ1, …, τ6 as on the previous slide)
Cliques in the Induced Graph
• Theorem: Every (maximal) clique in the induced graph is a factor produced during VE

[Figure: induced graph over C, D, I, G, S, L, J, H with its maximal cliques marked]
Induced Width
• The width of an induced graph is the number of nodes in its largest clique, minus 1
• The minimal induced width of a graph K is min_α(width(IK,α))
• Provides a lower bound on the best achievable performance of VE for a model that factorizes over K
Summary
• Variable elimination can be viewed as a sequence of transformations on an undirected graph
  – Eliminating a node connects all of its current neighbors
• Cliques in the resulting induced graph directly correspond to the algorithm's complexity
Probabilistic Graphical Models
Inference: Variable Elimination
Finding Elimination Orderings
Daphne Koller
Finding Elimination Orderings
• Theorem: For a graph H, determining whether there exists an elimination ordering for H with induced width ≤ K is NP-complete
• Note: This NP-hardness result is distinct from the NP-hardness of inference itself
  – Even given the optimal ordering, inference may still be exponential
Finding Elimination Orderings
• Greedy search using a heuristic cost function
  – At each point, eliminate the node with the smallest cost
• Possible cost functions (see the sketch below):
  – min-neighbors: # neighbors in the current graph
  – min-weight: weight (# values) of the factor formed
  – min-fill: number of new fill edges
  – weighted min-fill: total weight of the new fill edges (edge weight = product of the weights of the two nodes)
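A sketch of greedy ordering under the min-fill heuristic named above, assuming the same adjacency-set graph representation as earlier; the helper names are hypothetical, and the other cost functions would slot in the same way.

```python
def fill_edges(adj, z):
    """New edges needed to make z's neighbourhood a clique."""
    nbrs = list(adj[z])
    return sum(1 for i, u in enumerate(nbrs) for v in nbrs[i + 1:]
               if v not in adj[u])

def min_fill_order(adj):
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    order = []
    while adj:
        z = min(adj, key=lambda v: fill_edges(adj, v))
        nbrs = adj.pop(z)
        for u in nbrs:                            # add the fill edges
            adj[u] |= nbrs - {u}
            adj[u].discard(z)
        order.append(z)
    return order

# e.g. min_fill_order({'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B'}})
```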
Finding Elimination Orderings
• Theorem: The induced graph is triangulated (chordal)
  – No loops of length > 3 without a "bridge" (chord)

[Figure: 4-cycle over A, B, C, D with a chord]

• Can find an elimination ordering by finding a low-width triangulation of the original graph HΦ
Robot Localization & Mapping

[Figure: SLAM network with robot poses x0, x1, x2, x3, x4, sensor observations z1, z2, z3, z4, and landmarks L1, L2, L3]

xt: robot pose
zt: sensor observation
Square Root SAM, F. Dellaert and M. Kaess, IJRR, 2006

Robot Localization & Mapping
[Figure: localization and mapping results from Square Root SAM]

Eliminate Poses then Landmarks
[Figure: result of this elimination ordering, from Square Root SAM]

Eliminate Landmarks then Poses
[Figure: result of this elimination ordering, from Square Root SAM]
Summary
• Finding the optimal elimination ordering is NP-hard
• Simple heuristics that try to keep the induced graph small often provide reasonable performance