Dynamic Tree Block Coordinate Ascent
Daniel Tarlow (1), Dhruv Batra (2), Pushmeet Kohli (3), Vladimir Kolmogorov (4)
1: University of Toronto
2: TTI Chicago
3: Microsoft Research Cambridge
4: University College London
International Conference on Machine Learning (ICML), 2011
MAP in Large Discrete Models
• Many important problems can be expressed as a discrete Random Field (MRF, CRF)
• MAP inference is a fundamental problem
$$\min_{x \in \mathcal{X}} E(x) = \min_{x \in \mathcal{X}} \sum_{i \in V} \theta_i(x_i) + \sum_{(i,j) \in E} \theta_{ij}(x_i, x_j)$$
[Figure panels: Inpainting; Stereo; Object Class Labeling; Protein Design / Side Chain Prediction]
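Primal and Dual
$$\underbrace{\min_{x} \sum_{A \in V \cup E} \theta_A(x_A)}_{\text{Primal}} \;\ge\; \underbrace{\sum_{A \in V \cup E} \min_{x_A} \tilde{\theta}_A(x_A)}_{\text{Dual}} \;=\; \sum_{A \in V \cup E} h^*_A$$
- Dual is a lower bound: less constrained version of primal
- $\tilde{\theta}$ is a reparameterization, determined by messages
- $h^*_A$ is height of unary or pairwise potential
- Definition of reparameterization:
$$\sum_{A \in V \cup E} \theta_A(x_A) = \sum_{A \in V \cup E} \tilde{\theta}_A(x_A) \quad \forall \{x_A\}$$
LP-based message passing: find reparameterization to maximize dual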
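A minimal sketch of the primal/dual relationship on a toy chain may help here. The three-variable chain, its numbers, and the helper names (`energy`, `dual_bound`, `shift`) are assumptions made for illustration; they are not from the talk. The sketch shows that a reparameterization leaves every labeling's energy unchanged while the sum of heights (the dual) can move closer to the primal optimum.

```python
import itertools

# Toy chain x0 - x1 - x2 with two labels each; all numbers are made up.
unary = {0: [0.0, 1.0], 1: [0.5, 0.0], 2: [1.0, 0.0]}
pairwise = {(0, 1): [[0.0, 1.0], [1.0, 0.0]],
            (1, 2): [[0.0, 1.0], [1.0, 0.0]]}

def energy(x, unary, pairwise):
    """Primal objective E(x): unary plus pairwise potentials at labeling x."""
    total = sum(t[x[i]] for i, t in unary.items())
    total += sum(t[x[i]][x[j]] for (i, j), t in pairwise.items())
    return total

def dual_bound(unary, pairwise):
    """Dual objective: sum of heights, each potential minimized on its own."""
    heights = [min(t) for t in unary.values()]
    heights += [min(min(row) for row in t) for t in pairwise.values()]
    return sum(heights)

def shift(unary, pairwise, edge, node, delta):
    """Reparameterize: move delta[label] from `node` into the incident `edge`."""
    i, _ = edge
    new_unary = {k: list(t) for k, t in unary.items()}
    new_pair = {k: [list(row) for row in t] for k, t in pairwise.items()}
    new_unary[node] = [v - d for v, d in zip(unary[node], delta)]
    for a, row in enumerate(new_pair[edge]):
        for b in range(len(row)):
            row[b] += delta[a if node == i else b]
    return new_unary, new_pair

labelings = list(itertools.product(range(2), repeat=3))
primal_opt = min(energy(x, unary, pairwise) for x in labelings)

# Two hand-picked messages into edge (0, 1), one from each endpoint.
u1, p1 = shift(unary, pairwise, (0, 1), 0, [0.0, 1.0])
u2, p2 = shift(u1, p1, (0, 1), 1, [0.5, 0.0])

print(dual_bound(unary, pairwise), dual_bound(u2, p2), primal_opt)
```

The messages here are chosen by hand; an LP-based message passing algorithm would choose them to maximize the dual.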
Standard Linear Program-based Message Passing
• Max Product Linear Programming (MPLP)
– Update edges in fixed order
• Sequential Tree-Reweighted Max Product (TRW-S)
– Sequentially iterate over variables in fixed order
• Tree Block Coordinate Ascent (TBCA) [Sontag & Jaakkola, 2009]
– Update trees in fixed order
Key: these are all energy oblivious
Can we do better by being energy aware?
Example
TBCA with Static Schedule: 630 messages needed
TBCA with Dynamic Schedule: 276 messages needed
Benefit of Energy Awareness
Static settings
– Not all graph regions are equally difficult
– Repeating computation on easy parts is wasteful
[Figure labels: Harder region; Easy region]
Dynamic settings (e.g., learning, search)
– Small region of graph changes.
– Computation on unchanged part is wasteful
[Figure labels: Unchanged; Changed. Panels: Image; Previous Optimum; Change Mask]
References and Related Work
• [Elidan et al., 2006], [Sutton & McCallum, 2007]
– Residual Belief Propagation. Pass most different messages first.
• [Chandrasekaran et al., 2007]
– Works only on continuous variables. Very different formulation.
• [Batra et al., 2011]
– Local Primal Dual Gap for Tightening LP relaxations.
• [Kolmogorov, 2006]
– Weak Tree Agreement in relation to TRW-S.
• [Sontag & Jaakkola, 2009]
– Tree Block Coordinate Descent.
Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)
"Good" local settings (can assume "good" has cost 0, otherwise cost 1):
– Variables: x1 = G, x2 = G, x3 = G, x4 = B
– Edges: (x1, x2): G-G, B-B; (x2, x3): G-G, B-B; (x3, x4): G-B, B-B
Hypothetical messages that, e.g., residual max-product would send:
– Edge (x1, x2): "Don't be R or B" (twice)
– Edge (x2, x3): "Don't be R or B" (twice)
– Edge (x3, x4): "Don't be R or G" and "Don't be R"
But we don't need to send any messages. We are at the global optimum. Our scores (see later slides) are 0, so we wouldn't send any messages here.
Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)
Change unary potentials (e.g., during learning or search).
"Good" local settings:
– Variables: x1 = B, x2 = B, x3 = B, x4 = B
– Edges: (x1, x2): G-G, B-B; (x2, x3): G-G, B-B; (x3, x4): G-B, B-B
Locally, the best assignment for some variables changes.
Hypothetical messages that, e.g., residual max-product would send:
– Edge (x1, x2): "Don't be R or G" (twice)
– Edge (x2, x3): "Don't be R or G" (twice)
– Edge (x3, x4): "Don't be R or G" and "Don't be R"
But we don't need to send any messages. We are at the global optimum. Our scores (see later slides) are 0, so we wouldn't send any messages here.
Visualization of reparameterized energy
"Good" local settings:
– Variables: x1 = B, x2 = B, x3 = B, x4 = B
– Edges: (x1, x2): G-G, B-B; (x2, x3): G-G, B-B; (x3, x4): G-B, B-B
Possible fix: look at how much sending messages on edge would improve dual.
• Would work in above case, but incorrectly ignores, e.g., the subgraph below:
"Good" local settings:
– Variables: x1 = B, x2 ∈ {G, B}, x3 ∈ {R, G}, x4 = R
– Edges: (x1, x2): B-B; (x2, x3): G-G; (x3, x4): R-R
Key Slide
– Variables: x1 = B, x2 ∈ {G, B}, x3 ∈ {R, G}, x4 = R
– Edges: (x1, x2): B-B; (x2, x3): B-B; (x3, x4): R-R
Locally, everything looks optimal.
Try assigning a value to each variable. [Figure labels: Suboptimal (twice)]
Our main contribution: use primal (and dual) information to choose regions on which to pass messages.
Our Formulation
• Measure primal-dual local agreement at edges and variables
– Local Primal Dual Gap (LPDG).
– Weak Tree Agreement (WTA).
• Choose forest with maximum disagreement
– Kruskal's algorithm, possibly terminated early
• Apply TBCA update on maximal trees
Important! Minimize overhead. Use quantities that are already computed during inference, and carefully cache computations.
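The forest selection step can be sketched as a maximum-weight variant of Kruskal's algorithm. This is only an illustration under assumptions: the function name `max_disagreement_forest`, the `(score, i, j)` edge format, and the `min_score` stopping threshold are hypothetical and not taken from the talk.

```python
def max_disagreement_forest(num_nodes, scored_edges, min_score=0.0):
    """Kruskal's algorithm over edges sorted by decreasing disagreement score.

    scored_edges: list of (score, i, j) tuples.
    min_score: hypothetical early-termination threshold; edges scoring at or
    below it are never added.
    """
    parent = list(range(num_nodes))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    forest = []
    for score, i, j in sorted(scored_edges, reverse=True):
        if score <= min_score:
            break  # early termination: remaining edges score too low
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding (i, j) keeps the edge set acyclic
            parent[root_i] = root_j
            forest.append((i, j))
    return forest


# A 4-cycle plus a chord; scores are made up.
edges = [(3.0, 0, 1), (2.0, 1, 2), (1.0, 2, 3), (0.5, 3, 0), (0.0, 0, 2)]
print(max_disagreement_forest(4, edges))
print(max_disagreement_forest(4, edges, min_score=1.5))
```

Raising `min_score` yields smaller trees, which is one way to read "possibly terminated early".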
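Local Primal-Dual Gap (LPDG) Score
• Difference between primal and dual objectives
– Given primal assignment $x^p$ and dual variables (messages) defining $\tilde{\theta}$, primal-dual gap is
$$\underbrace{\sum_{A \in V \cup E} \theta_A(x^p_A)}_{\text{primal}} - \underbrace{\sum_{A \in V \cup E} \min_{x_A} \tilde{\theta}_A(x_A)}_{\text{dual}} = \sum_{A \in V \cup E} \Big( \underbrace{\tilde{\theta}_A(x^p_A)}_{\text{primal cost of node/edge}} - \underbrace{\min_{x_A} \tilde{\theta}_A(x_A)}_{\text{dual bound at node/edge}} \Big) = \sum_{A \in V \cup E} \mathrm{LPDG}(A)$$
e: "local disagreement" measure: $e_A = \mathrm{LPDG}(A)$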
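As a rough illustration, the LPDG terms can be computed on the four-variable R/G/B chain from the visualization slides, using 0/1 costs ("good" costs 0, otherwise 1). The encoding, the helper names, and the choice of evaluating the old labeling (G, G, G, B) after the unaries change to prefer B are assumptions made for this sketch.

```python
R, G, B = 0, 1, 2  # label indices (an arbitrary encoding)

def node_costs(good):
    """Unary potential: cost 0 for labels in `good`, cost 1 otherwise."""
    return [0 if s in good else 1 for s in (R, G, B)]

def edge_costs(good_pairs):
    """Pairwise potential: cost 0 for pairs in `good_pairs`, cost 1 otherwise."""
    return [[0 if (a, b) in good_pairs else 1 for b in (R, G, B)]
            for a in (R, G, B)]

def lpdg_scores(unary, pairwise, x):
    """Per-node and per-edge LPDG: potential at x minus the potential's minimum.

    unary and pairwise are treated as the reparameterized potentials.
    """
    node = {i: t[x[i]] - min(t) for i, t in unary.items()}
    edge = {(i, j): t[x[i]][x[j]] - min(min(row) for row in t)
            for (i, j), t in pairwise.items()}
    return node, edge

# After the change, every variable prefers B.
unary = {i: node_costs({B}) for i in range(4)}
pairwise = {(0, 1): edge_costs({(G, G), (B, B)}),
            (1, 2): edge_costs({(G, G), (B, B)}),
            (2, 3): edge_costs({(G, B), (B, B)})}

x_prev = [G, G, G, B]  # previous optimum, now stale
node, edge = lpdg_scores(unary, pairwise, x_prev)
gap = sum(node.values()) + sum(edge.values())
print(node, edge, gap)
```

In this toy case the disagreement sits on the first three variables, and the all-B labeling has zero LPDG everywhere.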
Shortcoming of LPDG Score: Loose Relaxations
LPDG > 0, but dual optimal
[Figure legend: filled circle; black edge]
Weak Tree Agreement (WTA) [Kolmogorov 2006]
Reparameterized potentials are said to satisfy WTA if there exist non-empty subsets $D_i \subseteq \mathcal{X}_i$ for each node i such that
$$\tilde{\theta}_i(x_i) = h^*_i \quad \forall x_i \in D_i$$
$$\min_{x_j \in D_j} \tilde{\theta}_{ij}(x_i, x_j) = h^*_{ij} \quad \forall x_i \in D_i,\ (i,j) \in E$$
[Figure: two diagrams over labels; legend: black edge; filled circle]
At Weak Tree Agreement: D1={0}, D2={0,2}, D2={0,2}, D3={0}
Not at Weak Tree Agreement: D1={0}, D2={2}, D2={0,2}, D3={0}
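The two WTA conditions can be checked directly for a candidate choice of label sets. The sketch below makes two assumptions: the edge condition is applied in both directions of every edge, and the example potentials are a 0/1 encoding of the subgraph from the visualization slides (x1 = B, x2 in {G, B}, x3 in {R, G}, x4 = R, with good pairs B-B, G-G, R-R). The function and helper names are hypothetical.

```python
R, G, B = 0, 1, 2  # label indices (an arbitrary encoding)

def node_costs(good):
    return [0 if s in good else 1 for s in (R, G, B)]

def edge_costs(good_pairs):
    return [[0 if (a, b) in good_pairs else 1 for b in (R, G, B)]
            for a in (R, G, B)]

def satisfies_wta(unary, pairwise, D, tol=1e-9):
    """Check the WTA conditions for label sets D (dict: node -> set of labels)."""
    for i, t in unary.items():
        # Every label in D_i must attain the unary height h*_i.
        if not D[i] or any(t[a] - min(t) > tol for a in D[i]):
            return False
    for (i, j), t in pairwise.items():
        h = min(min(row) for row in t)  # pairwise height h*_ij
        # Assumption: the condition is checked in both edge directions.
        ok_ij = all(min(t[a][b] for b in D[j]) - h <= tol for a in D[i])
        ok_ji = all(min(t[a][b] for a in D[i]) - h <= tol for b in D[j])
        if not (ok_ij and ok_ji):
            return False
    return True

# Chain where local optima are globally consistent: G, G, G, B.
unary_a = {0: node_costs({G}), 1: node_costs({G}),
           2: node_costs({G}), 3: node_costs({B})}
pair_a = {(0, 1): edge_costs({(G, G), (B, B)}),
          (1, 2): edge_costs({(G, G), (B, B)}),
          (2, 3): edge_costs({(G, B), (B, B)})}
D_a = {0: {G}, 1: {G}, 2: {G}, 3: {B}}

# Subgraph where each potential looks fine locally but the pieces conflict.
unary_b = {0: node_costs({B}), 1: node_costs({G, B}),
           2: node_costs({R, G}), 3: node_costs({R})}
pair_b = {(0, 1): edge_costs({(B, B)}),
          (1, 2): edge_costs({(G, G)}),
          (2, 3): edge_costs({(R, R)})}
D_b = {0: {B}, 1: {G, B}, 2: {R, G}, 3: {R}}  # sets of local minimizers

print(satisfies_wta(unary_a, pair_a, D_a), satisfies_wta(unary_b, pair_b, D_b))
```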
WTA Score
e: "local disagreement" measure
D2={0,2}, D3={0,2}
Costs: solid – low; dotted – medium; else – high
$$e_{23} = \max(\min(a, \text{high}), \min(b, c)) - c = \max(a, c) - c = a - c$$
[Figure legend: filled circle; black edge]
WTA Score
e: "local disagreement" measure: node measure
$$e_i = \max_{x_i \in D_i} \tilde{\theta}_i(x_i) - \min_{x_i} \tilde{\theta}_i(x_i)$$
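The node measure is small enough to write out directly. This is a sketch with a hypothetical function name and made-up potential values; `theta_i` stands for the reparameterized unary potential of one node.

```python
def wta_node_score(theta_i, D_i):
    """e_i: worst reparameterized cost over labels in D_i, minus the minimum cost."""
    return max(theta_i[a] for a in D_i) - min(theta_i)


theta = [0.0, 2.0, 0.5]  # made-up reparameterized unary potential
print(wta_node_score(theta, {0}))
print(wta_node_score(theta, {0, 2}))
print(wta_node_score(theta, {1}))
```

The score is zero exactly when every label in `D_i` attains the minimum of the potential.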
Single Formulation of LPDG and WTA
• Set a max history size parameter R.
• Store most recent R labelings of variable i in label set Di
R=1: LPDG score.
R>1: WTA score.
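One way to keep the most recent R labelings per variable is a bounded queue. The class below is a sketch under assumptions: the name `LabelHistory` and its methods are hypothetical, and it only shows how the label sets Di could be maintained, not how the talk's implementation caches them.

```python
from collections import deque

class LabelHistory:
    """Keeps the most recent R labels of each variable; D_i is their set."""

    def __init__(self, num_vars, R):
        # deque(maxlen=R) silently drops the oldest label once R are stored.
        self.recent = [deque(maxlen=R) for _ in range(num_vars)]

    def record(self, labeling):
        for i, label in enumerate(labeling):
            self.recent[i].append(label)

    def label_set(self, i):
        return set(self.recent[i])


short = LabelHistory(num_vars=2, R=1)
longer = LabelHistory(num_vars=2, R=2)
for labeling in ([0, 1], [2, 1], [0, 1]):
    short.record(labeling)
    longer.record(labeling)

print(short.label_set(0), longer.label_set(0), longer.label_set(1))
```

With R=1 each Di holds only the current primal label, so the node measure reduces to the LPDG term for that node.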
Combine scores into undirected edge score:
Properties of LPDG/WTA Scores
• LPDG measure gives upper bound on possible dual improvement from passing messages on forest
• LPDG may overestimate "usefulness" of an edge, e.g., on non-tight relaxations. [Figure labels: LPDG > 0; WTA = 0]
• WTA measure addresses overestimate problem: is zero shortly after normal message passing would converge.
• Both only change when messages are passed on nearby region of graph.
Experiments
Computer Vision:
• Stereo
• Image Segmentation
• Dynamic Image Segmentation
Protein Design:
• Static problem
• Correlation between measure and dual improvement
• Dynamic search application
Algorithms
• TBCA: Static Schedule, LPDG Schedule, WTA Schedule
• MPLP [Sontag and Globerson implementation]
• TRW-S [Kolmogorov implementation]
Experiments: Stereo
383x434 pixels, 16 labels. Potts potentials.
Experiments: Image Segmentation
[Figure: results plot]
375x500 pixels, 21 labels. General potentials based on label co-occurrence.
Experiments: Dynamic Image Segmentation
[Figure: results plot. Panel labels: Sheep; Previous Opt; Modify White Unaries]