Dynamic Tree Block Coordinate Ascent

Daniel Tarlow (1), Dhruv Batra (2), Pushmeet Kohli (3), Vladimir Kolmogorov (4)

1: University of Toronto
2: TTI Chicago
3: Microsoft Research Cambridge
4: University College London

International Conference on Machine Learning (ICML), 2011


MAP in Large Discrete Models
 •  Many important problems can be expressed as a discrete Random Field (MRF, CRF)
 •  MAP inference is a fundamental problem

$$\min_{x \in \mathcal{X}} E(x) = \min_{x \in \mathcal{X}} \sum_{i \in V} \theta_i(x_i) + \sum_{(i,j) \in E} \theta_{ij}(x_i, x_j)$$

[Figure: example applications: Inpainting; Stereo; Object Class Labeling; Protein Design / Side Chain Prediction]
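As a minimal sketch of the objective above, the following evaluates E(x) for a small pairwise model and finds the MAP labeling by exhaustive search. The data layout (`unary` as a list of per-variable cost lists, `pairwise` as a dict keyed by edge) and the three-variable Potts chain are illustrative assumptions, not taken from the talk; brute force is only practical for tiny models.

```python
import itertools

def energy(x, unary, pairwise):
    """E(x) = sum_i theta_i(x_i) + sum_{(i,j) in E} theta_ij(x_i, x_j)."""
    node_cost = sum(unary[i][x[i]] for i in range(len(unary)))
    edge_cost = sum(table[x[i]][x[j]] for (i, j), table in pairwise.items())
    return node_cost + edge_cost

def brute_force_map(unary, pairwise):
    """Exhaustive minimization over all labelings (tiny models only)."""
    label_ranges = [range(len(u)) for u in unary]
    return min(itertools.product(*label_ranges),
               key=lambda x: energy(x, unary, pairwise))

# Hypothetical 3-variable chain with 2 labels and Potts edge costs.
potts = [[0.0, 1.0], [1.0, 0.0]]
unary = [[0.0, 2.0], [1.0, 0.0], [0.0, 2.0]]
pairwise = {(0, 1): potts, (1, 2): potts}

x_map = brute_force_map(unary, pairwise)
print(x_map, energy(x_map, unary, pairwise))
```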


Primal and Dual

Primal:
$$\min_x \sum_{A \in V \cup E} \theta_A(x_A)$$

Dual:
$$\sum_{A \in V \cup E} \min_{x_A} \tilde\theta_A(x_A) = \sum_{A \in V \cup E} h^*_A$$

 -  Dual is a lower bound: less constrained version of primal
 -  $\tilde\theta$ is a reparameterization, determined by messages
 -  $h^*_A$ is height of unary or pairwise potential
 -  Definition of reparameterization:
$$\sum_{A \in V \cup E} \theta_A(x_A) = \sum_{A \in V \cup E} \tilde\theta_A(x_A) \quad \forall \{x_A\}$$

LP-based message passing: find reparameterization to maximize dual
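The relationship between reparameterization and the dual bound can be illustrated with a small sketch. The function `pass_message` below is a hypothetical helper that performs one min-sum style reparameterization along a single edge; it is not the TBCA, MPLP, or TRW-S update from the talk, only an assumption-laden illustration that moving cost between factors leaves E(x) unchanged while the sum of per-factor minima (the dual value) can increase.

```python
import itertools

def energy(x, unary, pairwise):
    """Total energy of labeling x under the given potentials."""
    return (sum(unary[i][x[i]] for i in range(len(unary)))
            + sum(t[x[i]][x[j]] for (i, j), t in pairwise.items()))

def dual_bound(unary, pairwise):
    """Sum over all factors A of min_{x_A} theta~_A(x_A)."""
    return (sum(min(u) for u in unary)
            + sum(min(min(row) for row in t) for t in pairwise.values()))

def pass_message(unary, pairwise, i, j):
    """Reparameterize along edge (i, j): absorb node i into the edge,
    then move the edge's min-marginal over x_j into node j."""
    unary = [list(u) for u in unary]
    pairwise = {e: [list(r) for r in t] for e, t in pairwise.items()}
    t = pairwise[(i, j)]
    for a in range(len(t)):
        for b in range(len(t[a])):
            t[a][b] += unary[i][a]
    unary[i] = [0.0] * len(unary[i])
    msg = [min(t[a][b] for a in range(len(t))) for b in range(len(t[0]))]
    for b, m in enumerate(msg):
        unary[j][b] += m
        for a in range(len(t)):
            t[a][b] -= m
    return unary, pairwise

# Hypothetical 3-variable Potts chain.
potts = [[0.0, 1.0], [1.0, 0.0]]
unary = [[0.0, 2.0], [1.0, 0.0], [0.0, 2.0]]
pairwise = {(0, 1): potts, (1, 2): potts}

before = dual_bound(unary, pairwise)
unary2, pairwise2 = pass_message(unary, pairwise, 0, 1)
after = dual_bound(unary2, pairwise2)
print(before, after)
```

In this toy chain the bound rises after one message while every labeling keeps its original energy, which is the sense in which message passing searches over reparameterizations.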


Standard Linear Program-based Message Passing
 •  Max Product Linear Programming (MPLP)
    –  Update edges in fixed order
 •  Sequential Tree-Reweighted Max Product (TRW-S)
    –  Sequentially iterate over variables in fixed order
 •  Tree Block Coordinate Ascent (TBCA) [Sontag & Jaakkola, 2009]
    –  Update trees in fixed order

Key: these are all energy oblivious
Can we do better by being energy aware?


Example
 •  TBCA with Static Schedule: 630 messages needed
 •  TBCA with Dynamic Schedule: 276 messages needed


Benefit of Energy Awareness
Static settings
 –  Not all graph regions are equally difficult
 –  Repeating computation on easy parts is wasteful
[Figure: graph with a "Harder region" and an "Easy region"]

Dynamic settings (e.g., learning, search)
 –  Small region of graph changes.
 –  Computation on unchanged part is wasteful
[Figure: "Unchanged" and "Changed" regions; panels "Image", "Previous Optimum", "Change Mask"]


References and Related Work
 •  [Elidan et al., 2006], [Sutton & McCallum, 2007]
    –  Residual Belief Propagation. Pass most different messages first.
 •  [Chandrasekaran et al., 2007]
    –  Works only on continuous variables. Very different formulation.
 •  [Batra et al., 2011]
    –  Local Primal Dual Gap for Tightening LP relaxations.
 •  [Kolmogorov, 2006]
    –  Weak Tree Agreement in relation to TRW-S.
 •  [Sontag et al., 2009]
    –  Tree Block Coordinate Descent.


Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)
(can assume "good" has cost 0, otherwise cost 1)

"Good" local settings on the chain x1 - x2 - x3 - x4:
 •  Variables: x1: G; x2: G; x3: G; x4: B
 •  Edges: (x1, x2): G-G, B-B; (x2, x3): G-G, B-B; (x3, x4): G-B, B-B


Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)

"Good" local settings on the chain x1 - x2 - x3 - x4:
 •  Variables: x1: G; x2: G; x3: G; x4: B
 •  Edges: (x1, x2): G-G, B-B; (x2, x3): G-G, B-B; (x3, x4): G-B, B-B

Hypothetical messages that e.g. residual max-product would send:
 •  On (x1, x2): "Don't be R or B", "Don't be R or B"
 •  On (x2, x3): "Don't be R or B", "Don't be R or B"
 •  On (x3, x4): "Don't be R or G", "Don't be R"


Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)

"Good" local settings on the chain x1 - x2 - x3 - x4:
 •  Variables: x1: G; x2: G; x3: G; x4: B
 •  Edges: (x1, x2): G-G, B-B; (x2, x3): G-G, B-B; (x3, x4): G-B, B-B

But we don't need to send any messages. We are at the global optimum.
Our scores (see later slides) are 0, so we wouldn't send any messages here.


Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)

"Good" local settings on the chain x1 - x2 - x3 - x4:
 •  Variables: x1: B; x2: B; x3: B; x4: B
 •  Edges: (x1, x2): G-G, B-B; (x2, x3): G-G, B-B; (x3, x4): G-B, B-B

Change unary potentials (e.g., during learning or search)


Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)

"Good" local settings on the chain x1 - x2 - x3 - x4:
 •  Variables: x1: B; x2: B; x3: B; x4: B
 •  Edges: (x1, x2): G-G, B-B; (x2, x3): G-G, B-B; (x3, x4): G-B, B-B

Locally, the best assignment for some variables changes.


Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)

"Good" local settings on the chain x1 - x2 - x3 - x4:
 •  Variables: x1: B; x2: B; x3: B; x4: B
 •  Edges: (x1, x2): G-G, B-B; (x2, x3): G-G, B-B; (x3, x4): G-B, B-B

Hypothetical messages that e.g. residual max-product would send:
 •  On (x1, x2): "Don't be R or G", "Don't be R or G"
 •  On (x2, x3): "Don't be R or G", "Don't be R or G"
 •  On (x3, x4): "Don't be R or G", "Don't be R"


Visualization of reparameterized energy
States for each variable: red (R), green (G), or blue (B)

"Good" local settings on the chain x1 - x2 - x3 - x4:
 •  Variables: x1: B; x2: B; x3: B; x4: B
 •  Edges: (x1, x2): G-G, B-B; (x2, x3): G-G, B-B; (x3, x4): G-B, B-B

But we don't need to send any messages. We are at the global optimum.
Our scores (see later slides) are 0, so we wouldn't send any messages here.


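Visualization of reparameterized energy

"Good" local settings on the chain x1 - x2 - x3 - x4:
 •  Variables: x1: B; x2: B; x3: B; x4: B
 •  Edges: (x1, x2): G-G, B-B; (x2, x3): G-G, B-B; (x3, x4): G-B, B-B

Possible fix: look at how much sending messages on edge would improve dual.
 •  Would work in above case, but incorrectly ignores e.g. the subgraph below:

"Good" local settings on the chain x1 - x2 - x3 - x4:
 •  Variables: x1: B; x2: G, B; x3: R, G; x4: R
 •  Edges: (x1, x2): B-B; (x2, x3): G-G; (x3, x4): R-R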


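Key Slide

"Good" local settings on the chain x1 - x2 - x3 - x4:
 •  Variables: x1: B; x2: G, B; x3: R, G; x4: R
 •  Edges: (x1, x2): B-B; (x2, x3): B-B; (x3, x4): R-R

Locally, everything looks optimal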


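Key Slide

"Good" local settings on the chain x1 - x2 - x3 - x4:
 •  Variables: x1: B; x2: G, B; x3: R, G; x4: R
 •  Edges: (x1, x2): B-B; (x2, x3): B-B; (x3, x4): R-R

Try assigning a value to each variable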


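Key Slide
Our main contribution: use primal (and dual) information to choose regions on which to pass messages

"Good" local settings on the chain x1 - x2 - x3 - x4:
 •  Variables: x1: B; x2: G, B; x3: R, G; x4: R
 •  Edges: (x1, x2): B-B; (x2, x3): B-B; (x3, x4): R-R

Try assigning a value to each variable
[Figure: two regions of the chain are marked "Suboptimal"]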


Our Formulation
 •  Measure primal-dual local agreement at edges and variables
    –  Local Primal Dual Gap (LPDG).
    –  Weak Tree Agreement (WTA).
 •  Choose forest with maximum disagreement
    –  Kruskal's algorithm, possibly terminated early
 •  Apply TBCA update on maximal trees

Important! Minimize overhead.
Use quantities that are already computed during inference, and carefully cache computations
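The forest-selection step can be sketched with a standard Kruskal-style pass over edges sorted by decreasing score, using union-find to reject edges that would close a cycle. The function name `max_disagreement_forest`, the `edge_scores` dict, and the `min_score` threshold used for early termination are assumptions made for illustration; the talk does not specify the exact stopping rule.

```python
def max_disagreement_forest(n_nodes, edge_scores, min_score=0.0):
    """Greedy maximum-weight forest (Kruskal) over local disagreement scores.

    edge_scores: dict mapping (i, j) -> score. Edges are visited in
    decreasing score order; the loop stops early once scores drop to
    min_score or below (an assumed early-termination rule).
    """
    parent = list(range(n_nodes))

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    forest = []
    for (i, j), score in sorted(edge_scores.items(), key=lambda kv: -kv[1]):
        if score <= min_score:
            break  # remaining edges show no disagreement worth updating
        ri, rj = find(i), find(j)
        if ri != rj:  # adding the edge keeps the selection acyclic
            parent[ri] = rj
            forest.append((i, j))
    return forest

# Hypothetical scores on a 4-node graph.
scores = {(0, 1): 3.0, (1, 2): 2.0, (2, 3): 0.0, (0, 3): 1.0, (0, 2): 2.5}
forest = max_disagreement_forest(4, scores)
print(forest)
```

Edges with zero score are never selected, so regions that already agree locally receive no tree updates.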




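Local Primal-Dual Gap (LPDG) Score
 •  Difference between primal and dual objectives
    –  Given primal assignment $x^p$ and dual variables (messages) defining $\tilde\theta$, primal-dual gap is

$$\underbrace{\sum_{A \in V \cup E} \theta_A(x^p_A)}_{\text{primal}} - \underbrace{\sum_{A \in V \cup E} \min_{x_A} \tilde\theta_A(x_A)}_{\text{dual}} = \sum_{A \in V \cup E} \Big[ \underbrace{\tilde\theta_A(x^p_A)}_{\text{primal cost of node/edge}} - \underbrace{\min_{x_A} \tilde\theta_A(x_A)}_{\text{dual bound at node/edge}} \Big] = \sum_{A \in V \cup E} \mathrm{LPDG}(A)$$

e: "local disagreement" measure: $e_A = \mathrm{LPDG}(A)$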
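A minimal sketch of the per-factor LPDG computation follows, assuming the reparameterized potentials are stored as plain Python lists (`unary_rep`) and a dict of tables keyed by edge (`pairwise_rep`); these names and the toy chain are hypothetical. The check at the end confirms that the local scores sum to the global primal-dual gap.

```python
def lpdg_scores(x_p, unary_rep, pairwise_rep):
    """LPDG(A) = theta~_A(x^p_A) - min_{x_A} theta~_A(x_A) for every node and edge."""
    node = [u[x_p[i]] - min(u) for i, u in enumerate(unary_rep)]
    edge = {(i, j): t[x_p[i]][x_p[j]] - min(min(row) for row in t)
            for (i, j), t in pairwise_rep.items()}
    return node, edge

# Hypothetical 3-variable Potts chain (no messages passed yet, so theta~ = theta).
potts = [[0.0, 1.0], [1.0, 0.0]]
unary_rep = [[0.0, 2.0], [1.0, 0.0], [0.0, 2.0]]
pairwise_rep = {(0, 1): potts, (1, 2): potts}
x_p = (0, 1, 0)  # locally best label for every node

node_gap, edge_gap = lpdg_scores(x_p, unary_rep, pairwise_rep)

# Global primal-dual gap computed directly, for comparison.
primal = (sum(unary_rep[i][x_p[i]] for i in range(3))
          + sum(t[x_p[i]][x_p[j]] for (i, j), t in pairwise_rep.items()))
dual = (sum(min(u) for u in unary_rep)
        + sum(min(min(r) for r in t) for t in pairwise_rep.values()))
print(node_gap, edge_gap, primal - dual)
```

Here all node scores are zero and both edges carry positive scores, so the disagreement is localized on the edges.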

Shortcoming of LPDG Score: Loose Relaxations

[Figure: example with LPDG > 0, but dual optimal]


Weak Tree Agreement (WTA) [Kolmogorov 2006]

Reparameterized potentials $\tilde\theta$ are said to satisfy WTA if there exist non-empty subsets $D_i \subseteq X_i$ for each node $i$ such that

$$\tilde\theta_i(x_i) = h^*_i \quad \forall x_i \in D_i$$
$$\min_{x_j \in D_j} \tilde\theta_{ij}(x_i, x_j) = h^*_{ij} \quad \forall x_i \in D_i,\ (i,j) \in E$$

[Figure: two examples, "At Weak Tree Agreement" and "Not at Weak Tree Agreement"]


Weak Tree Agreement (WTA) [Kolmogorov 2006]

[Figure: label sets for the two examples]
 •  At Weak Tree Agreement: D1={0}, D2={0,2}, D2={0,2}, D3={0}
 •  Not at Weak Tree Agreement: D1={0}, D2={2}, D2={0,2}, D3={0}




WTA Score
e: "local disagreement" measure

D2={0,2}, D3={0,2}
Costs: solid – low; dotted – medium; else – high

$$e_{23} = \max(\min(a, \text{high}), \min(b, c)) - c$$




WTA Score
e: "local disagreement" measure

D2={0,2}, D3={0,2}
Costs: solid – low; dotted – medium; else – high

$$e_{23} = \max(a, c) - c = a - c$$




WTA Score
e: "local disagreement" measure: node measure

$$e_i = \max_{x_i \in D_i} \tilde\theta_i(x_i) - \min_{x_i} \tilde\theta_i(x_i)$$
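The node measure above translates directly into code. The edge measure in the sketch below is an assumption: it generalizes the worked $e_{23}$ example from the earlier WTA Score slide as a maximum over labels in $D_i$ of the best compatible cost in $D_j$, minus the edge minimum. The numeric values for a, b, c and "high" are hypothetical, chosen only so the example matches the pattern a - c.

```python
def wta_node_score(theta_i, D_i):
    """e_i = max_{x_i in D_i} theta~_i(x_i) - min_{x_i} theta~_i(x_i)."""
    return max(theta_i[x] for x in D_i) - min(theta_i)

def wta_edge_score(theta_ij, D_i, D_j):
    """Assumed directed edge measure:
    max_{x_i in D_i} min_{x_j in D_j} theta~_ij(x_i, x_j) - min theta~_ij."""
    worst_best = max(min(theta_ij[a][b] for b in D_j) for a in D_i)
    return worst_best - min(min(row) for row in theta_ij)

# Hypothetical costs: c = low, a = b = medium, everything else high.
a, b, c, high = 1.0, 1.0, 0.0, 5.0
theta_23 = [[a,    high, high],
            [high, high, high],
            [b,    high, c]]
D2, D3 = {0, 2}, {0, 2}

e_23 = wta_edge_score(theta_23, D2, D3)   # max(min(a, high), min(b, c)) - c
e_node = wta_node_score([0.0, 3.0, 1.0], {0, 2})
print(e_23, e_node)
```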

Single Formulation of LPDG and WTA
 •  Set a max history size parameter R.
 •  Store most recent R labelings of variable i in label set $D_i$
    –  R=1: LPDG score.
    –  R>1: WTA score.
 •  Combine scores into undirected edge score.
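The history-based label sets can be sketched with a bounded queue per variable. The class name `LabelHistory` and its methods are hypothetical; the sketch only illustrates that with R=1 the set $D_i$ holds just the current primal label, so the node measure reduces to the LPDG node score, while R>1 keeps several recent labels.

```python
from collections import deque

class LabelHistory:
    """Keeps the most recent R labels of each variable as its label set D_i."""

    def __init__(self, n_vars, R):
        self.history = [deque(maxlen=R) for _ in range(n_vars)]

    def record(self, labeling):
        # Append the newest label of every variable; old ones fall off the left.
        for h, label in zip(self.history, labeling):
            h.append(label)

    def label_set(self, i):
        return set(self.history[i])

def node_score(theta_i, D_i):
    """max over D_i of theta~_i minus the unrestricted minimum."""
    return max(theta_i[x] for x in D_i) - min(theta_i)

theta_0 = [0.0, 3.0, 1.0]           # hypothetical reparameterized unary
labelings = [(0, 1), (2, 1)]        # two successive primal labelings

short, longer = LabelHistory(2, R=1), LabelHistory(2, R=2)
for x in labelings:
    short.record(x)
    longer.record(x)

print(short.label_set(0), node_score(theta_0, short.label_set(0)))
print(longer.label_set(0), node_score(theta_0, longer.label_set(0)))
```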



Properties of LPDG/WTA Scores
 •  LPDG measure gives upper bound on possible dual improvement from passing messages on forest
 •  LPDG may overestimate "usefulness" of an edge, e.g., on non-tight relaxations.
    [Figure: example with LPDG > 0, WTA = 0]
 •  WTA measure addresses overestimate problem: is zero shortly after normal message passing would converge.
 •  Both only change when messages are passed on nearby region of graph.




Experiments
Computer Vision:
 •  Stereo
 •  Image Segmentation
 •  Dynamic Image Segmentation
Protein Design:
 •  Static problem
 •  Correlation between measure and dual improvement
 •  Dynamic search application
Algorithms
 •  TBCA: Static Schedule, LPDG Schedule, WTA Schedule
 •  MPLP [Sontag and Globerson implementation]
 •  TRW-S [Kolmogorov implementation]


Experiments: Stereo

383x434 pixels, 16 labels. Potts potentials.


Experiments: Image Segmentation

[Figure: results plots]

375x500 pixels, 21 labels. General potentials based on label co-occurrence.


Experiments: Dynamic Image Segmentation

[Figure: results plot; image panels "Sheep", "Previous Opt", "Modify White Unaries"]