Shared Components Topic Models

Shared Components Topic Models
Matthew R. Gormley, Mark Dredze, Benjamin Van Durme, Jason Eisner
Center for Language and Speech Processing
Human Language Technology Center of Excellence
Johns Hopkins University
NAACL 2012, June 6, 2012

Contrast of LDA Extensions

Topic Model = Distributions over topics (docs) + Distributions over words (topics)

Most extensions to LDA focus on the distributions over topics (docs); our model focuses on the distributions over words (topics).

LDA for Topic Modeling (Blei, Ng, & Jordan, 2003)

[Figure: six bar charts, one per topic ϕ1…ϕ6, each plotting probability against the words of the vocabulary]

• Each topic is defined as a Multinomial distribution over the vocabulary, parameterized by ϕk.
• A topic is visualized as its high-probability words, e.g. the {hockey} topic: team, season, hockey, player, penguins, ice, canadiens, puck, montreal, stanley, cup.
• A pedagogical label is used to identify the topic: ϕ1 {Canadian gov.}, ϕ2 {government}, ϕ3 {hockey}, ϕ4 {U.S. gov.}, ϕ5 {baseball}, ϕ6 {Japan}.

LDA for Topic Modeling

[Figure: the six topics ϕ1 {Canadian gov.}, ϕ2 {government}, ϕ3 {hockey}, ϕ4 {U.S. gov.}, ϕ5 {baseball}, ϕ6 {Japan}, each drawn from Dirichlet(β), and one bar chart of topic proportions θm per document, each drawn from Dirichlet(α)]

θ1 = topic proportions for: "The 54/40' boundary dispute is still unresolved, and Canadian and US Coast Guard vessels regularly if infrequently detain each other's fish boats in the disputed waters off Dixon…"

θ2 = topic proportions for: "In the year before Lemieux came, Pittsburgh finished with 38 points. Following his arrival, the Pens finished…"

θ3 = topic proportions for: "The Orioles' pitching staff again is having a fine exhibition season. Four shutouts, low team ERA, (Well, I haven't gotten any baseball…"

The ϕk are the distributions over words (topics); the θm are the distributions over topics (docs).
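As a concrete illustration of this generative story, here is a minimal NumPy sketch of LDA's sampling process; the dimensions and hyperparameter values are made up for the example and are not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, M, N = 1000, 6, 3, 50    # vocabulary size, topics, documents, words per document
beta, alpha = 0.01, 0.1        # symmetric Dirichlet hyperparameters (illustrative values)

# Each topic phi_k is a Multinomial distribution over the vocabulary, drawn from Dirichlet(beta).
phi = rng.dirichlet(np.full(V, beta), size=K)          # shape (K, V)

docs = []
for m in range(M):
    theta_m = rng.dirichlet(np.full(K, alpha))         # document's distribution over topics
    z = rng.choice(K, size=N, p=theta_m)               # a topic assignment for every token
    words = [rng.choice(V, p=phi[k]) for k in z]       # each word drawn from its assigned topic
    docs.append(words)
```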

LDA for Topic Modeling

Two problems with the LDA generative story for topics:
1. Independently generate each topic
2. For each topic, store a parameter per word in the vocabulary

We're not the first to notice this…

Our Model

Shared Components Topic Model (SCTM):
– Generate a pool of "components" (proto-topics)
– Assemble each topic from some of the components
  • Multiply and renormalize ("product of experts")
– Documents are mixtures of topics (just like LDA)

1. So the wordlists of two topics are not generated independently!
2. Fewer parameters

SCTM: Motivating Example

Components are distributions over words. How do we combine components into topics?

Top words per component, in decreasing probability:
  ϕ1 {sports}:     player, team, hockey, baseball, Orioles, Canucks, season
  ϕ2 {Canada}:     canada, Quebec, parliament, snow, Hansard's, Elizabeth II, hockey
  ϕ3 {government}: democracy, socialism, voted, election, Obama, Putin, parliament

We can imagine a component as a set of words (i.e. all the non-zero probabilities are identical).

To create a {Canadian government} topic we could take the union of {government} and {Canada}.

SCTM: Motivating Example

Better yet, to create a {Canadian government} topic we could take the intersection of {government} and {Canada}.

[Venn diagram: ϕ1 {sports}, ϕ2 {Canada}, ϕ3 {government}; {Canadian gov.} lies in the overlap of {Canada} and {government}, and {hockey}? in the overlap of {sports} and {Canada}]

SCTM: Motivating Example

More complex intersections might be more realistic:

Components: ϕ1 {Canada}, ϕ2 {government}, ϕ3 {sports}, ϕ4 {U.S.}, ϕ5 {Japan}
Topics: b1 {Canadian gov.}, b2 {government}, b3 {hockey}, b4 {U.S. gov.}, b5 {baseball}, b6 {Japan}

[Figure: each topic bk is formed from a subset of the components, e.g. {Canadian gov.} from {Canada} and {government}, and so on for the other topics]

Soft Intersection and Union

• We don't want topics to be sets of words; we want probability distributions over words
• In probability space…
  – Union: Mixture
  – Intersection: Normalized Product

Product of Experts

Product of Experts (PoE) model (Hinton, 2002): another name for a normalized product.

For a subset of components 𝒞, define the model as

p(x | φ1, …, φC) = ∏_{c ∈ 𝒞} φ_{c,x} / Σ_{v=1}^{V} ∏_{c ∈ 𝒞} φ_{c,v}

where the summation in the denominator is over all possible word types.

Intersection (in probability space) = Normalized Product (PoE)
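To make the normalized product concrete, here is a small sketch (illustrative names, not code from the paper): given component distributions and a binary vector selecting a subset of them, the topic's word distribution is the elementwise product of the selected components, renormalized over the vocabulary.

```python
import numpy as np

def poe_topic(phi, b):
    """Normalized product of the components selected by the 0/1 vector b.

    phi: (C, V) array, each row a component distribution over the vocabulary.
    b:   (C,)   array of 0/1 selections for one topic.
    """
    log_prod = b @ np.log(phi)       # sum_c b_c * log phi_{c,v}, one value per word type
    log_prod -= log_prod.max()       # shift for numerical stability before exponentiating
    w = np.exp(log_prod)
    return w / w.sum()               # renormalize over the V word types
```

Words that any selected component assigns low probability are pushed toward zero, which is exactly the soft-intersection behaviour of the previous slide.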

Our Model

Shared Components Topic Model (SCTM):
– Generate a pool of "components" (proto-topics)
– Assemble each topic from some of the components
  • Multiply and renormalize ("product of experts")
– Documents are mixtures of topics (just like LDA)

1. So topics are not independent!
2. Fewer parameters

Learning the Structure of Topics

How do we decide which subset of components combine to form a single topic?

[Figure: components ϕ1 {Canada}, ϕ2 {government}, ϕ3 {sports}, ϕ4 {U.S.}, ϕ5 {Japan}, each with an inclusion probability πc; the binary vector b1 selects a subset of them to form the {Canadian gov.} topic, b2 forms {government}, and so on through b6 {Japan}]

• Each topic k has a binary vector bk over the components, with bkc ~ Bernoulli(πc).
• The inclusion probabilities themselves are drawn as πc ~ Beta(γ/C, 1).

Beta-Bernoulli model:
– The finite version of the Indian Buffet Process (Griffiths & Ghahramani, 2006)
– A prior over K × C binary matrices
– We can stack the binary vectors bk to form a matrix (rows b1…b6 over columns ϕ1…ϕ5)
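A minimal sketch of this Beta-Bernoulli (finite IBP) prior over the K × C binary matrix, with illustrative sizes and concentration parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
K, C, gamma = 6, 5, 2.0                        # topics, components, concentration (illustrative)

pi = rng.beta(gamma / C, 1.0, size=C)          # pi_c ~ Beta(gamma/C, 1)
B = (rng.random((K, C)) < pi).astype(int)      # b_kc ~ Bernoulli(pi_c)

# Row k of B is the binary vector b_k saying which components are combined to form topic k.
print(B)
```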

Our Model (recap)

SCTM: generate a pool of components, assemble each topic from some of them by multiplying and renormalizing, and let documents be mixtures of topics just like LDA. So topics are not independent, and there are fewer parameters.

Our Model (SCTM)

[Figure: components ϕ1 {Canada}, ϕ2 {government}, ϕ3 {sports}, ϕ4 {U.S.}, ϕ5 {Japan} and topics b1 {Canadian gov.}, b2 {government}, b3 {hockey}, b4 {U.S. gov.}, b5 {baseball}, b6 {Japan}]

How do we generate the components? Each component ϕc is drawn from Dirichlet(β).

As in LDA, each document's topic proportions θm are drawn from Dirichlet(α):
θ1 = topic proportions for the 54/40' boundary-dispute document
θ2 = topic proportions for the Lemieux/Pittsburgh document
θ3 = topic proportions for the Orioles document

SCTM

[Figure: the full SCTM generative model]
– πc ~ Beta(γ/C, 1), and bkc ~ Bernoulli(πc) selects which components form each topic
– Components ϕ1 {Canada} … ϕ5 {Japan} are drawn from Dirichlet(β); the resulting topics b1 {Canadian gov.} … b6 {Japan} are the distributions over words (topics)
– Document topic proportions θ1, θ2, θ3 are drawn from Dirichlet(α); these are the distributions over topics (docs)
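Putting the pieces together, a compact sketch of the full generative story on this slide (illustrative dimensions and hyperparameters; the guard against empty topics is an implementation convenience, not part of the slide):

```python
import numpy as np

rng = np.random.default_rng(0)
V, C, K, M, N = 1000, 5, 6, 3, 50              # vocab, components, topics, docs, words per doc
gamma, beta, alpha = 2.0, 0.01, 0.1            # illustrative hyperparameters

phi = rng.dirichlet(np.full(V, beta), size=C)  # components ~ Dirichlet(beta)
pi = rng.beta(gamma / C, 1.0, size=C)          # pi_c ~ Beta(gamma/C, 1)
B = (rng.random((K, C)) < pi).astype(int)      # b_kc ~ Bernoulli(pi_c)
B[B.sum(axis=1) == 0, 0] = 1                   # guard: give an empty topic at least one component

def poe_topic(phi, b):                         # normalized product of the selected components
    log_w = b @ np.log(phi)
    log_w -= log_w.max()                       # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

topics = np.array([poe_topic(phi, B[k]) for k in range(K)])   # (K, V) distributions over words

for m in range(M):                             # documents are mixtures of topics, just like LDA
    theta_m = rng.dirichlet(np.full(K, alpha)) # distribution over topics for this document
    z = rng.choice(K, size=N, p=theta_m)
    words = [rng.choice(V, p=topics[k]) for k in z]
```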

Contrast of LDA Extensions

Topic Model = Distributions over topics (docs) + Distributions over words (topics)

Extensions to the distributions over topics (docs):
• Hierarchical LDA (hLDA) (Blei et al., 2004)
• Author-Topic Model (Rosen-Zvi et al., 2004)
• HDP mixture model (Teh et al., 2004)
• Correlated Topic Models (CTM) (Blei & Lafferty, 2006)
• Pachinko Allocation Model (PAM) (Li & McCallum, 2006)
• Hierarchical PAM (hPAM) (Mimno et al., 2007)
• Syntactic Topic Models (Boyd-Graber & Blei, 2009)
• Focused Topic Models (Williamson et al., 2010)
• 2D Topic-Aspect Model (Paul & Girju, 2010)
• DILN for mixed-membership modeling (Paisley et al., 2011)
• Doubly Correlated Nonparametric TM (Kim & Sudderth, 2011)

Correlated Topics

• Correlated topic approaches:
  – Correlated Topic Models (CTM)
  – Pachinko Allocation Model (PAM)
  – Hierarchical LDA (hLDA)
  – Hierarchical PAM (hPAM)

• Key difference from SCTM: their correlation is limited to topics that appear together in the same document
  – Example: the {hockey} and {baseball} topics share many words in common, but never appear in the same document

• The spirit of learning relationships between topics is very similar!

[Recap figure: in SCTM, the {hockey} (b3) and {baseball} (b5) topics can still share the {sports} component (ϕ3) even if they never co-occur in a document]

Contrast of LDA Extensions

Topic Model = Distributions over topics (docs) + Distributions over words (topics)

Extensions to the distributions over words (topics):
• Asymmetric Dirichlet prior (Wallach et al., 2009)
• Spherical Topic Models (Reisinger et al., 2010)
• Sparse Topic Models (Wang & Blei, 2009)
• SAGE for topic modeling (Eisenstein et al., 2011)
• Shared Components Topic Models (this work)

Comparison of a few Topic Models

(Models compared on whether topics are dependently generated and whether the model has fewer parameters.)

• LDA (Blei et al., 2003)
• Asymmetric Dirichlet Prior (Wallach et al., 2009): all topics drawn from a language-specific base distribution
• Spherical Topic Model (Reisinger et al., 2010)
• SparseTM (Wang & Blei, 2009): each topic is sparse
• SAGE (Eisenstein et al., 2011)
• SCTM (this paper): topics are products of a shared pool of components

Parameter Estimation

• Goal: infer values for the model parameters: the components ϕc, the inclusion probabilities πc, and the document topic proportions θm

• Monte Carlo EM (MCEM) algorithm, where the M-step minimizes a Contrastive Divergence (CD) objective

Parameter Estimation

[Figure: the full SCTM graphical model, with πc ~ Beta(γ/C, 1), components ~ Dirichlet(β), and document proportions ~ Dirichlet(α)]

• The π (component inclusion probabilities) and the θ (document topic proportions) are integrated out.

Parameter Estimation

[Figure: components ϕ1…ϕ5, topic vectors b1…b6, and the per-token topic assignments zmn for the three example documents]

• Model parameters: the components ϕc
• Latent variables: the topic assignments zmn and the topic-component indicators bkc

Parameter Estimation

• Standard M-step: maximize the likelihood of ϕc conditioned on zmn and bkc
• Standard E-step: compute expectations of zmn and bkc conditioned on ϕc
• Monte-Carlo E-step (used here): sample zmn and bkc conditioned on ϕc
• CD M-step (used here): minimize a contrastive divergence objective for ϕc conditioned on zmn and bkc

Parameter Estimation

Algorithm 1: SCTM Training (following Hinton, 2002, for the CD M-step)
  Initialize parameters ξc, bkc, zi
  while not converged do
    {E-step:}
    for j = 1 to J do
      {Draw the jth sample {Z, B}(j):}
      for i = 1 to N do: sample zi
      for k = 1 to K, c = 1 to C do: sample bkc
    {M-step:}
    for c = 1 to C, v = 1 to V do:
      single gradient step over ξ:  φcv(t+1) = φcv(t) − η · dCD({Z, B})/dφcv

Here each word is drawn as xmn ~ p(· | zmn, b_zmn, φ), where

p(x | z, bz, φ) = ∏_{c=1}^{C} φ_{c,x}^{b_{z,c}} / Σ_{v=1}^{V} ∏_{c=1}^{C} φ_{c,v}^{b_{z,c}}

and the Contrastive Divergence objective CD({Z, B}) treats Z and B as fixed.
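Algorithm 1's control flow can be sketched as below. The Gibbs conditionals for z and b and the CD gradient are not spelled out on the slide, so they are passed in as callables rather than invented here; the final projection back onto the simplex is likewise an illustrative stand-in for the paper's parameterization via ξ.

```python
import numpy as np

def train_sctm(docs, phi, B, sample_z, sample_b, cd_grad,
               n_iters=100, n_samples=5, lr=0.01):
    """MCEM skeleton for SCTM: Monte-Carlo E-step, contrastive-divergence M-step."""
    for _ in range(n_iters):
        samples = []
        for _ in range(n_samples):                       # E-step: draw J samples of {Z, B}
            Z = [sample_z(doc, phi, B) for doc in docs]  # resample each token's topic assignment
            B = sample_b(Z, phi, B)                      # resample the topic-component indicators
            samples.append((Z, B.copy()))
        # M-step: one gradient step on the component parameters, treating Z and B as fixed
        grad = np.mean([cd_grad(phi, Z_s, B_s, docs) for Z_s, B_s in samples], axis=0)
        phi = phi - lr * grad                            # phi^(t+1) = phi^(t) - eta * dCD/dphi
        phi = np.clip(phi, 1e-12, None)
        phi /= phi.sum(axis=1, keepdims=True)            # illustrative projection back to distributions
    return phi, B
```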

Parameter Estimation (recap)

Goal: infer the model parameters ϕc, πc, and θm via Monte Carlo EM, where the M-step minimizes a Contrastive Divergence objective.

Experiments: Topic Modeling

• Experiments:
  – Can SCTM combine a fixed number of components (multinomials) into topics to achieve lower perplexity?
  – Does SCTM achieve lower perplexity than LDA with a more compact model?

• Analysis:
  – What are the learned topics like?
  – What are the learned components like?
  – What topic structure is learned?

Experiments: Topic Modeling

Experimental Setup:
– Datasets:
  • 1,000 random articles from 20 Newsgroups
  • 1,617 NIPS abstracts
– Evaluation:
  • left-to-right average perplexity on held-out data
– Models:
  • LDA trained with a collapsed Gibbs sampler
    – In LDA, components and topics are in a one-to-one relationship (i.e. LDA is a special case of SCTM where each topic is comprised of only its corresponding component); see the sketch after this list
  • SCTM with parameter estimation as described
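A tiny check of that special-case claim, reusing the product-of-experts helper sketched earlier: with the binary matrix set to the identity, each topic's normalized product is just its own component, i.e. SCTM collapses to LDA's topics.

```python
import numpy as np

def poe_topic(phi, b):
    w = np.exp(b @ np.log(phi))
    return w / w.sum()

rng = np.random.default_rng(0)
C, V = 4, 100
phi = rng.dirichlet(np.full(V, 0.01), size=C)   # C components, one per topic
B = np.eye(C, dtype=int)                        # one-to-one: topic k uses only component k
topics = np.array([poe_topic(phi, B[k]) for k in range(C)])
assert np.allclose(topics, phi)                 # the PoE reduces to the component itself
```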

Experiments: Topic Modeling

First question: can SCTM combine a fixed number of components (multinomials) into topics to achieve lower perplexity?

Experiments: Topic Modeling (20News)

[Figure: held-out perplexity (y-axis) vs. number of components (x-axis). LDA is plotted as a curve; SCTM runs are plotted with labels showing the number of topics, from runs with # components = # topics up to runs with 100 components combined into as many as 500 topics.]

Experiments: Topic Modeling (NIPS)

[Figure: the corresponding perplexity vs. number of components plot for the NIPS abstracts, with the same LDA and SCTM series.]

Experiments: Topic Modeling

Second question: does SCTM achieve lower perplexity than LDA with a more compact model?

Experiments: Topic Modeling (20News)

[Figure: held-out perplexity (y-axis) vs. number of model parameters in thousands (x-axis). Labels for LDA show the number of topics; labels for SCTM show the number of components and the number of topics.]

Experiments: Topic Modeling (NIPS)

[Figure: the corresponding perplexity vs. number of model parameters plot for the NIPS abstracts.]

Experiments: Topic Modeling

Analysis: what are the learned topics like? What are the learned components like? What topic structure is learned?

What does SCTM learn? (20News)

Figure 2: SCTM binary matrix and topics from 3599 training documents of 20NEWS; shaded squares of the binary matrix are "on" (equal to 1).

 k   αk     Top words for topic
 1   0.306  subject organization israel return define law org
 2   0.031  encryption chip clipper keys des escrow security law
 3   0.025  turkish armenian armenians war turkey turks armenia
 4   0.102  drive card disk scsi hard controller mac drives
 5   0.071  image jpeg window display code gif color mit
 6   0.018  jews israeli jewish arab peace land war arabs
 7   0.074  org money back question years thing things point
 8   0.106  christian bible church question christ christians life
 9   0.011  administration president year market money senior
10   0.055  health medical center research information april
11   0.063  gun law state guns control bill rights states
12   0.160  world organization system israel state usa cwru reply
13   0.042  space nasa gov launch power wire ground air
14   0.038  space nasa gov launch power wire ground air
15   0.079  team game year play games season players hockey
16   0.158  car lines dod bike good uiuc sun cars
17   0.136  windows file government key jesus system program
18   0.122  article writes center page harvard virginia research
19   0.017  max output access digex int entry col line
20   0.380  lines people don university posting host nntp time

Experiments: Topic Modeling

Analysis continued: what topic structure is learned?

SCTM: Hasse Diagram over Topics (NIPS)

Figure 4: Hasse diagram on NIPS for C = 10, K = 20, showing the top words for topics and unrepresented components (in the shaded box). Notice that some topics consist of only a single component.

Components (top words):
 c=9  visual image images cells cortex scene support spatial feature vision cues stimulus statistics
 c=4  paper units output layer networks patterns unit pattern set rule network rules weights training
 c=2  network networks data learning optimal linear vector independent binary natural algorithms pca
 c=1  model information parameters kalman robust matrices likelihood experimentally

Topics (top words):
 k=1   αk=0.11  model learning system information parameters networks robust kalman rules estimation
 k=2   αk=0.13  network input information time recurrent back propagation units architecture forward layer
 k=3   αk=0.06  object recognition system objects information visual matching problem based classification
 k=4   αk=0.12  bayesian results show estimation method based parameters likelihood methods models
 k=5   αk=0.04  object recognition system objects information visual matching problem based classification
 k=6   αk=0.23  neural network paper recognition speech systems based results performance artificial
 k=7   αk=0.08  data paper networks network output feature features patterns set train introduced unit functions
 k=8   αk=0.23  algorithm training error function method performance input classification classifier
 k=9   αk=0.02  vector feature classification support vectors kernel regression weight inputs dimensionality
 k=10  αk=0.09  neural neurons analog synaptic neuron networks memory time capacity model associative noise dynamics
 k=11  αk=0.08  learning networks system recognition time network describes hand context views classification
 k=12  αk=0.13  problem state control reinforcement problems models time based decision markov systems function
 k=13  αk=0.05  networks network learning distributed system weight vectors property binary point optimal real
 k=14  αk=0.07  models images image problem structure analysis mixture clustering approach show computational
 k=15  αk=0.12  cells neurons visual cortex motion response processing spatial cell properties patterns spike
 k=16  αk=0.11  training units paper hidden number output problem rule set order unit show present method weights task
 k=17  αk=0.10  number functions weights function layer generalization error results loss linear size
 k=18  αk=0.07  information analysis component rules signal independent representations noise basis
 k=19  αk=0.03  system networks set neurons visual phase feature processing features output associative
 k=20  αk=0.02  time network weights activation delay current chaotic connected discrete connections

Experiments: Topic Modeling

• Experiments:
  – For the same number of components (multinomials), SCTM achieves lower perplexity than LDA
  – Non-square SCTM achieves lower perplexity than LDA with a more compact model

• Analysis:
  – SCTM learns diverse LDA-like topics
  – Components are usually only interpretable when they also appear as a topic
  – SCTM learns an implicit Hasse diagram defining subsumption relationships between topics (see the sketch below)
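One way to read such a diagram off the learned binary matrix, assuming (as in Figure 4) that topics are partially ordered by the subset relation on the components they select; this is an illustrative reading, not code from the paper.

```python
import numpy as np

def subsumption_edges(B):
    """Pairs (j, k) where topic j's component set is a strict subset of topic k's."""
    sets = [frozenset(np.flatnonzero(row)) for row in B]
    return [(j, k) for j, sj in enumerate(sets)
                   for k, sk in enumerate(sets)
                   if j != k and sj < sk]

# Toy example: topic 0 uses components {0, 1}, topic 1 uses only component {1}.
B = np.array([[1, 1, 0],
              [0, 1, 0]])
print(subsumption_edges(B))   # [(1, 0)]: topic 1 is subsumed by topic 0
```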

Summary

Shared Components Topic Model (SCTM):
1. Generate a pool of "components" (proto-topics)
2. Assemble each topic from some of the components
   • Multiply and renormalize ("product of experts")
3. Documents are mixtures of topics (just like LDA)
– So the wordlists of two topics are not generated independently!
– Fewer parameters

Future Work

• Improve inference for SCTM
• Topics as products of components in other applications
  – Selectional preference: components could correspond to semantic features that intersect to define semantic classes
  – Vision: topics are classes of objects; the components could be features of those objects

Thank  you!  

Questions, comments?