Dynamic Tree Block Coordinate Ascent - Semantic Scholar


Dynamic Tree Block Coordinate Ascent

Daniel Tarlow1, Dhruv Batra2, Pushmeet Kohli3, Vladimir Kolmogorov4
1: University of Toronto    2: TTI Chicago    3: Microsoft Research Cambridge    4: University College London

International Conference on Machine Learning (ICML), 2011

MAP in Large Discrete Models
• Many important problems can be expressed as a discrete Random Field (MRF, CRF)
• MAP inference is a fundamental problem

$$\min_{x \in X} E(x) = \min_{x \in X} \sum_{i \in V} \theta_i(x_i) + \sum_{(i,j) \in E} \theta_{ij}(x_i, x_j)$$

[Figure panels: Inpainting; Stereo; Object Class Labeling; Protein Design / Side Chain Prediction]

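Primal and Dual

$$\underbrace{\min_{x} \sum_{A \in V \cup E} \theta_A(x_A)}_{\text{Primal}} \;\ge\; \underbrace{\sum_{A \in V \cup E} \min_{x_A} \tilde\theta_A(x_A)}_{\text{Dual}} \;=\; \sum_{A \in V \cup E} h^*_A$$

- Dual is a lower bound: less constrained version of primal
- $\tilde\theta$ is a reparameterization, determined by messages
- $h^*_A$ is height of unary or pairwise potential
- Definition of reparameterization:

$$\sum_{A \in V \cup E} \theta_A(x_A) = \sum_{A \in V \cup E} \tilde\theta_A(x_A) \quad \forall \{x_A\}$$

LP-based message passing: find reparameterization to maximize dual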
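The bound and the reparameterization property can be checked numerically on a toy model. The sketch below is only an illustration: the two-variable model, its numbers, and the helper names (`energy`, `dual_bound`, `send_message`) are assumptions, not taken from the slides. It moves the min-marginals of one edge into a node, confirms that every labeling keeps the same energy, and shows the dual bound rising.

```python
import itertools

# Hypothetical model: 2 binary variables, 1 edge (all values are made up).
unary = {0: [0.0, 0.0], 1: [0.5, 0.0]}
pairwise = {(0, 1): [[0.0, 1.0], [1.0, 1.0]]}


def energy(x, unary, pairwise):
    """E(x) = sum_i theta_i(x_i) + sum_(i,j) theta_ij(x_i, x_j)."""
    total = sum(unary[i][x[i]] for i in unary)
    total += sum(pairwise[i, j][x[i]][x[j]] for (i, j) in pairwise)
    return total


def dual_bound(unary, pairwise):
    """Sum of the independent minima h*_A over all nodes and edges."""
    total = sum(min(theta) for theta in unary.values())
    total += sum(min(min(row) for row in theta) for theta in pairwise.values())
    return total


def send_message(unary, pairwise, edge, delta):
    """Move delta[x_j] from edge (i, j) into node j (a reparameterization)."""
    i, j = edge
    new_unary = {k: list(v) for k, v in unary.items()}
    new_pairwise = {k: [list(r) for r in v] for k, v in pairwise.items()}
    for xj, d in enumerate(delta):
        new_unary[j][xj] += d
        for xi in range(len(new_pairwise[edge])):
            new_pairwise[edge][xi][xj] -= d
    return new_unary, new_pairwise


labelings = list(itertools.product(range(2), repeat=2))
primal = min(energy(x, unary, pairwise) for x in labelings)
before = dual_bound(unary, pairwise)

# Message: min-marginals of the edge with respect to x_1.
delta = [min(pairwise[0, 1][xi][xj] for xi in range(2)) for xj in range(2)]
new_unary, new_pairwise = send_message(unary, pairwise, (0, 1), delta)
after = dual_bound(new_unary, new_pairwise)

print(before, after, primal)  # 0.0 0.5 0.5
```

In this toy case a single message closes the gap; in general many messages are needed and the bound may stay below the primal optimum.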

Standard Linear Program-based Message Passing
• Max Product Linear Programming (MPLP)
  – Update edges in fixed order
• Sequential Tree-Reweighted Max Product (TRW-S)
  – Sequentially iterate over variables in fixed order
• Tree Block Coordinate Ascent (TBCA) [Sontag & Jaakkola, 2009]
  – Update trees in fixed order

Key: these are all energy oblivious. Can we do better by being energy aware?

Example
TBCA with Static Schedule: 630 messages needed
TBCA with Dynamic Schedule: 276 messages needed

Benefit of Energy Awareness
Static settings
  – Not all graph regions are equally difficult
  – Repeating computation on easy parts is wasteful
  [Figure labels: Harder region, Easy region]

Dynamic settings (e.g., learning, search)
  – Small region of graph changes
  – Computation on unchanged part is wasteful
  [Figure labels: Unchanged, Changed; panels: Image, Previous Optimum, Change Mask]

References and Related Work
• [Elidan et al., 2006], [Sutton & McCallum, 2007]
  – Residual Belief Propagation. Pass most different messages first.

• [Chandrasekaran et al., 2007]
  – Works only on continuous variables. Very different formulation.

• [Batra et al., 2011]
  – Local Primal Dual Gap for Tightening LP relaxations.

• [Kolmogorov, 2006]
  – Weak Tree Agreement in relation to TRW-S.

• [Sontag & Jaakkola, 2009]
  – Tree Block Coordinate Descent.

Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)
"Good" local settings (can assume "good" has cost 0, otherwise cost 1):
  x1: G    (x1, x2): G-G, B-B
  x2: G    (x2, x3): G-G, B-B
  x3: G    (x3, x4): G-B, B-B
  x4: B

Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)
"Good" local settings:
  x1: G    (x1, x2): G-G, B-B    messages: "Don't be R or B", "Don't be R or B"
  x2: G    (x2, x3): G-G, B-B    messages: "Don't be R or B", "Don't be R or B"
  x3: G    (x3, x4): G-B, B-B    messages: "Don't be R or G", "Don't be R"
  x4: B
Hypothetical messages that e.g. residual max-product would send.

Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)
"Good" local settings:
  x1: G    (x1, x2): G-G, B-B
  x2: G    (x2, x3): G-G, B-B
  x3: G    (x3, x4): G-B, B-B
  x4: B
But we don't need to send any messages. We are at the global optimum. Our scores (see later slides) are 0, so we wouldn't send any messages here.

Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)
"Good" local settings:
  x1: B    (x1, x2): G-G, B-B
  x2: B    (x2, x3): G-G, B-B
  x3: B    (x3, x4): G-B, B-B
  x4: B
Change unary potentials (e.g., during learning or search)

Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)
"Good" local settings:
  x1: B    (x1, x2): G-G, B-B
  x2: B    (x2, x3): G-G, B-B
  x3: B    (x3, x4): G-B, B-B
  x4: B
Locally, the best assignment for some variables changes.

Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)
"Good" local settings:
  x1: B    (x1, x2): G-G, B-B    messages: "Don't be R or G", "Don't be R or G"
  x2: B    (x2, x3): G-G, B-B    messages: "Don't be R or G", "Don't be R or G"
  x3: B    (x3, x4): G-B, B-B    messages: "Don't be R or G", "Don't be R"
  x4: B
Hypothetical messages that e.g. residual max-product would send.

Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)
"Good" local settings:
  x1: B    (x1, x2): G-G, B-B
  x2: B    (x2, x3): G-G, B-B
  x3: B    (x3, x4): G-B, B-B
  x4: B
But we don't need to send any messages. We are at the global optimum. Our scores (see later slides) are 0, so we wouldn't send any messages here.

Visualization of reparameterized energy
"Good" local settings:
  x1: B    (x1, x2): G-G, B-B
  x2: B    (x2, x3): G-G, B-B
  x3: B    (x3, x4): G-B, B-B
  x4: B
Possible fix: look at how much sending messages on edge would improve dual.
• Would work in above case, but incorrectly ignores e.g. the subgraph below:
"Good" local settings:
  x1: B       (x1, x2): B-B
  x2: G, B    (x2, x3): G-G
  x3: R, G    (x3, x4): R-R
  x4: R

Key Slide
"Good" local settings:
  x1: B       (x1, x2): B-B
  x2: G, B    (x2, x3): B-B
  x3: R, G    (x3, x4): R-R
  x4: R
Locally, everything looks optimal

Key Slide
"Good" local settings:
  x1: B       (x1, x2): B-B
  x2: G, B    (x2, x3): B-B
  x3: R, G    (x3, x4): R-R
  x4: R
Try assigning a value to each variable

Key Slide: Our main contribution
Use primal (and dual) information to choose regions on which to pass messages
"Good" local settings:
  x1: B       (x1, x2): B-B
  x2: G, B    (x2, x3): B-B
  x3: R, G    (x3, x4): R-R
  x4: R
Try assigning a value to each variable
[Figure: two parts of the chain are marked "Suboptimal"]

Our Formulation
• Measure primal-dual local agreement at edges and variables
  – Local Primal Dual Gap (LPDG).
  – Weak Tree Agreement (WTA).
• Choose forest with maximum disagreement
  – Kruskal's algorithm, possibly terminated early
• Apply TBCA update on maximal trees

Important! Minimize overhead. Use quantities that are already computed during inference, and carefully cache computations.
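The forest-selection step can be sketched with a Kruskal-style greedy pass over edges sorted by decreasing disagreement score. This is a minimal sketch under assumptions: the function name, the `min_score` threshold used to model "terminated early", and the example scores are all hypothetical and do not come from the slides.

```python
def max_disagreement_forest(num_nodes, edge_scores, min_score=0.0):
    """Greedily pick high-score edges that do not close a cycle.

    edge_scores maps (i, j) -> disagreement score. The loop stops at the
    first edge scoring at or below min_score (assumed early-termination rule).
    """
    parent = list(range(num_nodes))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    forest = []
    for (i, j), score in sorted(edge_scores.items(), key=lambda kv: -kv[1]):
        if score <= min_score:
            break  # all remaining edges score no higher
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two different trees, so no cycle
            parent[ri] = rj
            forest.append((i, j))
    return forest


# Hypothetical scores on a 4-node graph.
scores = {(0, 1): 3.0, (1, 2): 2.0, (0, 2): 1.0, (2, 3): 0.0}
forest = max_disagreement_forest(4, scores)
print(forest)  # [(0, 1), (1, 2)]
```

Edge (0, 2) is skipped because it would close a cycle, and edge (2, 3) is never considered because its score is zero, so no messages would be spent on that part of the graph.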

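Local Primal-Dual Gap (LPDG) Score
• Difference between primal and dual objectives
  – Given primal assignment $x^p$ and dual variables (messages) defining $\tilde\theta$, primal-dual gap is

$$\underbrace{\sum_{A \in V \cup E} \theta_A(x^p_A)}_{\text{primal}} - \underbrace{\sum_{A \in V \cup E} \min_{x_A} \tilde\theta_A(x_A)}_{\text{dual}} = \sum_{A \in V \cup E} \Big[ \underbrace{\tilde\theta_A(x^p_A)}_{\text{primal cost of node/edge}} - \underbrace{\min_{x_A} \tilde\theta_A(x_A)}_{\text{dual bound at node/edge}} \Big] = \sum_{A \in V \cup E} \mathrm{LPDG}(A)$$

e: "local disagreement" measure: $e_A = \mathrm{LPDG}(A)$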
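As a rough illustration of the decomposition, the sketch below computes LPDG(A) for every node and edge and checks that the local terms sum to the primal-dual gap. The function name `lpdg_scores` and the toy potentials are assumptions made for this example only.

```python
def lpdg_scores(rep_unary, rep_pairwise, x_p):
    """LPDG(A) = theta~_A(x^p_A) - min theta~_A, for each node and edge A."""
    scores = {}
    for i, theta in rep_unary.items():
        scores[i] = theta[x_p[i]] - min(theta)
    for (i, j), theta in rep_pairwise.items():
        scores[i, j] = theta[x_p[i]][x_p[j]] - min(min(row) for row in theta)
    return scores


# Hypothetical reparameterized potentials and primal assignment.
rep_unary = {0: [0.0, 1.0], 1: [0.0, 0.5]}
rep_pairwise = {(0, 1): [[1.0, 0.0], [0.0, 1.0]]}
x_p = [0, 0]

scores = lpdg_scores(rep_unary, rep_pairwise, x_p)

primal = (sum(rep_unary[i][x_p[i]] for i in rep_unary)
          + sum(rep_pairwise[i, j][x_p[i]][x_p[j]] for (i, j) in rep_pairwise))
dual = (sum(min(t) for t in rep_unary.values())
        + sum(min(min(r) for r in t) for t in rep_pairwise.values()))

print(scores)         # {0: 0.0, 1: 0.0, (0, 1): 1.0}
print(primal - dual)  # 1.0
```

Here both nodes agree with the primal assignment, so the whole gap is attributed to the edge.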

Shortcoming of LPDG Score: Loose Relaxations

LPDG > 0, but dual optimal

[Figure: label diagram with a legend for filled circles and black edges]

Weak Tree Agreement (WTA) [Kolmogorov 2006]
Reparameterized potentials $\tilde\theta$ are said to satisfy WTA if there exist non-empty subsets $D_i \subseteq X_i$ for each node $i$ such that

$$\tilde\theta_i(x_i) = h^*_i \quad \forall x_i \in D_i$$
$$\min_{x_j \in D_j} \tilde\theta_{ij}(x_i, x_j) = h^*_{ij} \quad \forall x_i \in D_i,\ (i,j) \in E$$

[Figure: two label diagrams with a legend for black edges and filled circles.
  At Weak Tree Agreement: D1={0}, D2={0,2}, D2={0,2}, D3={0}
  Not at Weak Tree Agreement: D1={0}, D2={2}, D2={0,2}, D3={0}]
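The two WTA conditions can be tested directly for a given choice of label sets. The following is a sketch under stated assumptions: the function name, the tolerance parameter, and the toy potentials are hypothetical, and applying the edge condition in both directions is an assumption of this sketch, since the slide writes it for one direction only.

```python
def satisfies_wta(rep_unary, rep_pairwise, D, tol=1e-9):
    """Check the WTA conditions for candidate non-empty label sets D[i]."""
    for i, theta in rep_unary.items():
        h = min(theta)  # h*_i
        if not D[i] or any(theta[x] > h + tol for x in D[i]):
            return False
    for (i, j), theta in rep_pairwise.items():
        h = min(min(row) for row in theta)  # h*_ij
        # Every label in D[i] needs a minimum-cost partner in D[j] ...
        if any(min(theta[xi][xj] for xj in D[j]) > h + tol for xi in D[i]):
            return False
        # ... and (assumed here) every label in D[j] needs one in D[i].
        if any(min(theta[xi][xj] for xi in D[i]) > h + tol for xj in D[j]):
            return False
    return True


# Hypothetical potentials: flat unaries, an edge that prefers equal labels.
rep_unary = {0: [0.0, 0.0], 1: [0.0, 0.0]}
rep_pairwise = {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}

agree = satisfies_wta(rep_unary, rep_pairwise, {0: {0, 1}, 1: {0, 1}})
disagree = satisfies_wta(rep_unary, rep_pairwise, {0: {0}, 1: {1}})
print(agree, disagree)  # True False
```

Note that this only verifies a proposed set of D_i; finding sets that satisfy the conditions is a separate problem.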

WTA Score
e: "local disagreement" measure

D2={0,2}, D3={0,2}

Costs: solid – low, dotted – medium, else – high

$$e_{23} = \max(\min(a, \text{high}), \min(b, c)) - c = \max(a, c) - c = a - c$$

[Figure: label diagram for nodes 2 and 3 with a legend for filled circles and black edges]

WTA Score
e: "local disagreement" measure: node measure

$$e_i = \max_{x_i \in D_i} \tilde\theta_i(x_i) - \min_{x_i} \tilde\theta_i(x_i)$$

Single Formulation of LPDG and WTA
• Set a max history size parameter R.
• Store most recent R labelings of variable i in label set $D_i$
R=1: LPDG score.    R>1: WTA score.
Combine scores into undirected edge score:
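A bounded history of recent labels is enough to switch between the two node scores. The sketch below is illustrative only: the class name `LabelHistory`, its methods, and the numbers are hypothetical. It uses the node measure e_i = max over D_i of the reparameterized unary minus its minimum, and covers only the node score, not the combined edge score.

```python
from collections import deque


class LabelHistory:
    """Keeps the most recent R labels of one variable (the label set D_i)."""

    def __init__(self, R):
        self.labels = deque(maxlen=R)  # old labels drop out automatically

    def record(self, label):
        self.labels.append(label)

    def node_score(self, rep_theta):
        """e_i = max_{x in D_i} theta~_i(x) - min_x theta~_i(x).

        Requires at least one recorded label.
        """
        return max(rep_theta[x] for x in self.labels) - min(rep_theta)


rep_theta = [0.0, 2.0, 0.5]  # hypothetical reparameterized unary

lpdg_like = LabelHistory(R=1)  # only the current primal label is kept
wta_like = LabelHistory(R=2)   # the last two labels are kept
for label in (1, 2):
    lpdg_like.record(label)
    wta_like.record(label)

print(lpdg_like.node_score(rep_theta))  # 0.5
print(wta_like.node_score(rep_theta))   # 2.0
```

With R=1 the score reduces to the LPDG term for the current label; with R=2 the earlier label 1 is still in the set, so the score reflects the larger disagreement.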

Properties of LPDG/WTA Scores
• LPDG measure gives upper bound on possible dual improvement from passing messages on forest
• LPDG may overestimate "usefulness" of an edge, e.g., on non-tight relaxations. [Figure label: LPDG > 0, WTA = 0]
• WTA measure addresses overestimate problem: is zero shortly after normal message passing would converge.
• Both only change when messages are passed on nearby region of graph.

Experiments
Computer Vision:
•    Stereo
•    Image Segmentation
•    Dynamic Image Segmentation
Protein Design:
•    Static problem
•    Correlation between measure and dual improvement
•    Dynamic search application
Algorithms
•    TBCA: Static Schedule, LPDG Schedule, WTA Schedule
•    MPLP [Sontag and Globerson implementation]
•    TRW-S [Kolmogorov implementation]

Experiments: Stereo 

383x434 pixels, 16 labels. Potts potentials.

Experiments: Image Segmentation

[Figure: result plots]

375x500 pixels, 21 labels. General potentials based on label co-occurrence.

Experiments: Dynamic Image Segmentation

[Figure: result plot and image panels labeled Sheep, Previous Opt, Modify White Unaries]
