Shared Components Topic Models
Matthew R. Gormley, Mark Dredze, Benjamin Van Durme, Jason Eisner
Center for Language and Speech Processing, Human Language Technology Center of Excellence, Johns Hopkins University
NAACL 2012, June 6, 2012
Contrast of LDA Extensions
Topic Model = Distributions over topics (docs) + Distributions over words (topics)
2
Contrast of LDA Extensions
Topic Model = Distributions over topics (docs) + Distributions over words (topics)
Most extensions to LDA modify the distributions over topics; our model modifies the distributions over words (topics).
3
LDA for Topic Modeling (Blei, Ng, & Jordan, 2003)
[Figure: bar charts of per-word probabilities for several topics]
• Each topic is defined as a Multinomial distribution over the vocabulary, parameterized by ϕk.
4
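To make the slide concrete, here is a minimal sketch (not from the paper) of what one such topic looks like in code: a normalized vector of word probabilities over a toy vocabulary, from which words can be sampled. The vocabulary and probabilities are invented for illustration.

```python
import numpy as np

# Toy vocabulary and one topic phi_k: a Multinomial (categorical) distribution
# over the whole vocabulary.  Values are made up for illustration.
vocab = ["team", "season", "hockey", "player", "parliament", "election"]
phi_k = np.array([0.30, 0.25, 0.20, 0.15, 0.06, 0.04])
assert np.isclose(phi_k.sum(), 1.0)

rng = np.random.default_rng(0)
# Drawing a few words shows what "a distribution over the vocabulary" means.
print(rng.choice(vocab, size=5, p=phi_k))
```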
LDA for Topic Modeling (Blei, Ng, & Jordan, 2003)
[Figure: bar charts of the six topic distributions ϕ1–ϕ6 over the vocabulary]
• Each topic is defined as a Multinomial distribution over the vocabulary, parameterized by ϕk.
5
LDA for Topic Modeling (Blei, Ng, & Jordan, 2003)
[Figure: bar charts of topics ϕ1–ϕ6; one topic's top words are called out:]
team, season, hockey, player, penguins, ice, canadiens, puck, montreal, stanley, cup
• A topic is visualized as its high probability words.
6
LDA for Topic Modeling (Blei, Ng, & Jordan, 2003)
[Figure: bar charts of topics ϕ1–ϕ6; the topic above is labeled {hockey}]
team, season, hockey, player, penguins, ice, canadiens, puck, montreal, stanley, cup
• A topic is visualized as its high probability words.
• A pedagogical label is used to identify the topic.
7
LDA for Topic Modeling (Blei, Ng, & Jordan, 2003)
Topics: ϕ1 {Canadian gov.}, ϕ2 {government}, ϕ3 {hockey}, ϕ4 {U.S. gov.}, ϕ5 {baseball}, ϕ6 {Japan}
• A topic is visualized as its high probability words.
• A pedagogical label is used to identify the topic.
8
LDA for Topic Modeling
Topics: ϕ1 {Canadian gov.}, ϕ2 {government}, ϕ3 {hockey}, ϕ4 {U.S. gov.}, ϕ5 {baseball}, ϕ6 {Japan}
Dirichlet(α) → θ1 (document 1's distribution over topics)
9
LDA for Topic Modeling
(Topics ϕ1–ϕ6 as above.)
Dirichlet(α) → θ1 = "The 54/40' boundary dispute is still unresolved, and Canadian and US …"
10
LDA for Topic Modeling
Dirichlet(α) → θ1 = "The 54/40' boundary dispute is still unresolved, and Canadian and US …"
11
LDA for Topic Modeling
Dirichlet(α) → θ1 = "The 54/40' boundary dispute is still unresolved, and Canadian and US Coast Guard …"
12
LDA for Topic Modeling
Dirichlet(α) → θ1 = "The 54/40' boundary dispute is still unresolved, and Canadian and US Coast Guard vessels regularly if infrequently detain each other's fish boats in the disputed waters off Dixon…"
13
LDA for Topic Modeling
Topics: ϕ1 {Canadian gov.}, ϕ2 {government}, ϕ3 {hockey}, ϕ4 {U.S. gov.}, ϕ5 {baseball}, ϕ6 {Japan}
Dirichlet(α) →
θ1 = "The 54/40' boundary dispute is still unresolved, and Canadian and US Coast Guard vessels regularly if infrequently detain each other's fish boats in the disputed waters off Dixon…"
θ2 = "In the year before Lemieux came, Pittsburgh finished with 38 points. Following his arrival, the Pens finished…"
θ3 = "The Orioles' pitching staff again is having a fine exhibition season. Four shutouts, low team ERA, (Well, I haven't gotten any baseball…"
14
LDA for Topic Modeling
Dirichlet(β) → topics ϕ1 {Canadian gov.}, ϕ2 {government}, ϕ3 {hockey}, ϕ4 {U.S. gov.}, ϕ5 {baseball}, ϕ6 {Japan}
Dirichlet(α) → θ1, θ2, θ3 (the three example documents above)
15
LDA for Topic Modeling
Dirichlet(β) → ϕ1…ϕ6: the distributions over words (topics)
Dirichlet(α) → θ1, θ2, θ3: the distributions over topics (docs)
16
LDA for Topic Modeling
Dirichlet(β) → ϕ1…ϕ6: the distributions over words (topics)
Dirichlet(α) → θ1, θ2, θ3: the distributions over topics (docs)
17
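Slides 9–17 walk through the LDA generative story pictorially. Written as code, it is the standard LDA simulation sketched below (toy sizes and hyperparameters, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, M, N = 1000, 6, 3, 50      # vocabulary size, topics, documents, words per document
alpha, beta = 0.1, 0.01          # symmetric Dirichlet hyperparameters

# Dirichlet(beta): each topic phi_k is a distribution over words.
phi = rng.dirichlet(np.full(V, beta), size=K)          # K x V

docs = []
for m in range(M):
    # Dirichlet(alpha): each document theta_m is a distribution over topics.
    theta_m = rng.dirichlet(np.full(K, alpha))
    words = []
    for n in range(N):
        z = rng.choice(K, p=theta_m)                   # topic assignment z_mn
        words.append(rng.choice(V, p=phi[z]))          # word w_mn ~ phi_z
    docs.append(words)
```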
LDA for Topic Modeling
Two problems with the LDA generative story for topics:
1. Each topic is generated independently.
2. For each topic, a parameter is stored for every word in the vocabulary.
18
LDA for Topic Modeling
Two problems with the LDA generative story for topics:
1. Each topic is generated independently.
2. For each topic, a parameter is stored for every word in the vocabulary.
We're not the first to notice this…
19
Our Model
Shared Components Topic Model (SCTM):
– Generate a pool of "components" (proto-topics)
– Assemble each topic from some of the components
  • Multiply and renormalize ("product of experts")
– Documents are mixtures of topics (just like LDA)
1. So the wordlists of two topics are not generated independently!
2. Fewer parameters
20
SCTM: Motivating Example
Components are distributions over words. How do we combine components into topics?
(Words listed in decreasing probability.)
ϕ1 {sports}   ϕ2 {Canada}    ϕ3 {government}
player        canada         democracy
team          Quebec         socialism
hockey        parliament     voted
baseball      snow           election
Orioles       Hansard's      Obama
Canucks       Elizabeth II   Putin
season        hockey         parliament
21
SCTM: Motivating Example
We can imagine a component as a set of words (i.e. all the non-zero probabilities are identical):
(same component word lists as above)
22
SCTM: Motivating Example
To create a {Canadian government} topic we could take the union of {government} and {Canada}.
(same component word lists as above)
23
SCTM: Motivating Example
Better yet, to create a {Canadian government} topic we could take the intersection of {government} and {Canada}.
[Figure: Venn diagram of ϕ1 {sports}, ϕ2 {Canada}, ϕ3 {government}; the {Canada} ∩ {government} region is the {Canadian gov.} topic]
24
SCTM: Motivating Example
Better yet, to create a {Canadian government} topic we could take the intersection of {government} and {Canada}.
[Figure: the same Venn diagram; the {sports} ∩ {Canada} region suggests a {hockey}? topic]
25
SCTM: Motivating Example
More complex intersections might be more realistic:
Components ϕ1 {Canada}, ϕ2 {government}, ϕ3 {sports}; topic b1 = {Canadian gov.}
26
SCTM: Motivating Example
More complex intersections might be more realistic:
Components ϕ1 {Canada}, ϕ2 {government}, ϕ3 {sports}, ϕ4 {U.S.}; topics b1 = {Canadian gov.}, b4 = {U.S. gov.}
27
SCTM: Motivating Example
More complex intersections might be more realistic:
Components ϕ1 {Canada}, ϕ2 {government}, ϕ3 {sports}, ϕ4 {U.S.}; topics b1 = {Canadian gov.}, b3 = {hockey}, b4 = {U.S. gov.}
28
SCTM: Motivating Example
More complex intersections might be more realistic:
Components ϕ1 {Canada}, ϕ2 {government}, ϕ3 {sports}, ϕ4 {U.S.}, ϕ5 {Japan}
Topics b1 = {Canadian gov.}, b2 = {government}, b3 = {hockey}, b4 = {U.S. gov.}, b5 = {baseball}, b6 = {Japan}
29
SCTM: Motivating Example
More complex intersections might be more realistic:
Components: ϕ1 {Canada}, ϕ2 {government}, ϕ3 {sports}, ϕ4 {U.S.}, ϕ5 {Japan}
Topics: b1 = {Canadian gov.}, b2 = {government}, b3 = {hockey}, b4 = {U.S. gov.}, b5 = {baseball}, b6 = {Japan}
30
Soft Intersection and Union
• We don't want topics to be sets of words; we want probability distributions over words.
• In probability space…
  Union → Mixture
  Intersection → Normalized Product
31
Product of Experts
• Product of Experts (PoE) model (Hinton, 2002):
  p(x | ϕ1, …, ϕC) = ∏_{c=1..C} ϕ_{c,x} / Σ_{v=1..V} ∏_{c=1..C} ϕ_{c,v}
  i.e. a normalized product, where the summation in the denominator is over all word types in the vocabulary.
• For a subset of components C (those with b_c = 1), define the feature model as:
  p(x | ϕ1, …, ϕC, b) = ∏_{c∈C} ϕ_{c,x} / Σ_{v=1..V} ∏_{c∈C} ϕ_{c,v}
• Intersection → Normalized Product (PoE)
32
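A small numerical illustration (my own, with made-up components, not the paper's code) of the two combinations from these slides: a mixture behaves like a soft union, while the normalized product (PoE) behaves like a soft intersection.

```python
import numpy as np

# Two toy "components" over a six-word vocabulary (values invented for illustration).
phi = np.array([[0.05, 0.05, 0.40, 0.40, 0.05, 0.05],   # high on words 2 and 3
                [0.40, 0.05, 0.40, 0.05, 0.05, 0.05]])  # high on words 0 and 2

# Union-like combination: a mixture simply averages the distributions.
mixture = phi.mean(axis=0)

# Intersection-like combination: the normalized product (PoE) of the active
# components, here both of them (b = [1, 1]).
b = np.array([1, 1])
prod = np.prod(phi[b.astype(bool)], axis=0)
poe = prod / prod.sum()                # denominator sums over all word types v

print(mixture.round(3))   # spreads mass over every word either component likes
print(poe.round(3))       # concentrates mass on word 2, which both components like
```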
Our Model
Shared Components Topic Model (SCTM):
– Generate a pool of "components" (proto-topics)
– Assemble each topic from some of the components
  • Multiply and renormalize ("product of experts")
– Documents are mixtures of topics (just like LDA)
1. So topics are not independent!
2. Fewer parameters
33
Our Model
(Recap of the SCTM generative story from the previous slide.)
34
Learning the Structure of Topics
How do we decide which subset of components combine to form a single topic?
Components ϕ1 {Canada}, ϕ2 {government}, ϕ3 {sports}, ϕ4 {U.S.}, ϕ5 {Japan}; binary vector b1 for the first topic.
35
Learning the Structure of Topics
How do we decide which subset of components combine to form a single topic?
Each component c gets an inclusion probability πc: π1…π5 over ϕ1 {Canada}, ϕ2 {government}, ϕ3 {sports}, ϕ4 {U.S.}, ϕ5 {Japan}; binary vector b1.
36
Learning the Structure of Topics
How do we decide which subset of components combine to form a single topic?
π1…π5 over ϕ1–ϕ5; b1c ~ Bernoulli(πc)
37
Learning the Structure of Topics
(Slides 38–41 repeat the picture above while the entries b1c of the first topic's binary vector are drawn, b1c ~ Bernoulli(πc).)
38–41
Learning the Structure of Topics
How do we decide which subset of components combine to form a single topic?
The sampled vector b1 selects the {Canada} and {government} components, giving topic b1 = {Canadian gov.}; b1c ~ Bernoulli(πc)
42
Learning the Structure of Topics
How do we decide which subset of components combine to form a single topic?
b1 = {Canadian gov.}, b2 = {government}; in general, bkc ~ Bernoulli(πc)
43
Learning the Structure of Topics
How do we decide which subset of components combine to form a single topic?
Topics: b1 = {Canadian gov.}, b2 = {government}, b3 = {hockey}, b4 = {U.S. gov.}, b5 = {baseball}, b6 = {Japan}; bkc ~ Bernoulli(πc)
44
Learning the Structure of Topics
How do we decide which subset of components combine to form a single topic?
πc ~ Beta(γ/C, 1) and bkc ~ Bernoulli(πc)
Topics: b1 = {Canadian gov.}, b2 = {government}, b3 = {hockey}, b4 = {U.S. gov.}, b5 = {baseball}, b6 = {Japan}
45
Learning the Structure of Topics
How do we decide which subset of components combine to form a single topic? Beta-Bernoulli model:
– The finite version of the Indian Buffet Process (Griffiths & Ghahramani, 2006)
– Prior over K x C binary matrices
46
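A minimal sketch of drawing the K x C topic-structure matrix from this finite Beta-Bernoulli (finite IBP) prior; the sizes and γ below are arbitrary toy values, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
K, C, gamma = 6, 5, 2.0                    # topics, components, concentration

# pi_c ~ Beta(gamma / C, 1): per-component inclusion probability.
pi = rng.beta(gamma / C, 1.0, size=C)

# b_kc ~ Bernoulli(pi_c): which components are active in each topic.
B = (rng.random((K, C)) < pi).astype(int)  # K x C binary matrix
print(pi.round(2))
print(B)
```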
Learning the Structure of Topics
How do we decide which subset of components combine to form a single topic? Beta-Bernoulli model:
– The finite version of the Indian Buffet Process (Griffiths & Ghahramani, 2006)
– Prior over K x C binary matrices
– We can stack the binary vectors to form a matrix: rows b1 {Canadian gov.}, b2 {government}, b3 {hockey}, b4 {U.S. gov.}, b5 {baseball}, b6 {Japan}; columns ϕ1…ϕ5
47
Our Model
(Recap of the SCTM generative story: components, topics as products of components, documents as mixtures of topics.)
48
Our Model (SCTM)
Components ϕ1 {Canada}, ϕ2 {government}, ϕ3 {sports}, ϕ4 {U.S.}, ϕ5 {Japan}; topics b1–b6 as before.
How do we generate the components?
49
Our Model (SCTM)
Dirichlet(β) → components ϕ1 {Canada}, ϕ2 {government}, ϕ3 {sports}, ϕ4 {U.S.}, ϕ5 {Japan}; topics b1–b6 as before.
How do we generate the components?
50
Our Model (SCTM)
Dirichlet(β) → components ϕ1…ϕ5; topics b1–b6 as before.
51
Our Model (SCTM)
Dirichlet(β) → components ϕ1…ϕ5; topics b1 = {Canadian gov.}, b2 = {government}, b3 = {hockey}, b4 = {U.S. gov.}, b5 = {baseball}, b6 = {Japan}
Dirichlet(α) → document-topic distributions:
θ1 = "The 54/40' boundary dispute is still unresolved, and Canadian and US Coast Guard vessels regularly if infrequently detain…"
θ2 = "In the year before Lemieux came, Pittsburgh finished with 38 points. Following his arrival, the…"
θ3 = "The Orioles' pitching staff again is having a fine exhibition season. Four shutouts, low team ERA,…"
52
SCTM
Beta(γ/C, 1) → π1…π5; bkc ~ Bernoulli(πc)
Dirichlet(β) → components ϕ1…ϕ5
Topics b1 = {Canadian gov.}, b2 = {government}, b3 = {hockey}, b4 = {U.S. gov.}, b5 = {baseball}, b6 = {Japan}
Dirichlet(α) → document-topic distributions θ1, θ2, θ3 (documents as above)
53
SCTM
Beta(γ/C, 1) → π and Dirichlet(β) → ϕ: together these give the distributions over words (topics).
Dirichlet(α) → θ: the distributions over topics (docs).
(Same components, topics, and example documents as the previous slide.)
54
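Putting the pieces of this graphical model together, a compact end-to-end simulation of the SCTM generative story might look like the sketch below (toy sizes and hyperparameters, not the authors' implementation). It reuses the normalized-product construction from the Product of Experts slide.

```python
import numpy as np

rng = np.random.default_rng(0)
V, C, K, M, N = 1000, 5, 6, 3, 50        # vocab, components, topics, docs, words per doc
gamma, beta, alpha = 2.0, 0.1, 0.1       # toy hyperparameters

# Topic structure: pi_c ~ Beta(gamma/C, 1), b_kc ~ Bernoulli(pi_c).
pi = rng.beta(gamma / C, 1.0, size=C)
B = (rng.random((K, C)) < pi).astype(int)
for k in range(K):                       # toy guard: give an empty topic one component
    if B[k].sum() == 0:
        B[k, rng.integers(C)] = 1

# Components: phi_c ~ Dirichlet(beta).
phi = rng.dirichlet(np.full(V, beta), size=C)              # C x V

# Topics: normalized products (PoE) of each topic's active components.
log_topics = B @ np.log(np.clip(phi, 1e-12, None))         # K x V; clip guards log(0)
log_topics -= log_topics.max(axis=1, keepdims=True)
topics = np.exp(log_topics)
topics /= topics.sum(axis=1, keepdims=True)

# Documents: theta_m ~ Dirichlet(alpha), then words as in LDA.
docs = []
for m in range(M):
    theta = rng.dirichlet(np.full(K, alpha))
    z = rng.choice(K, size=N, p=theta)
    docs.append([rng.choice(V, p=topics[zi]) for zi in z])
```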
Contrast of LDA Extensions
Topic Model = Distributions over topics (docs) + Distributions over words (topics)
• (Blei et al., 2004)
• (Rosen-Zvi et al., 2004)
• (Teh et al., 2004)
• (Blei & Lafferty, 2006)
• (Li & McCallum, 2006)
• (Mimno et al., 2007)
• (Boyd-Graber & Blei, 2009)
• (Williamson et al., 2010)
• (Paul & Girju, 2010)
• (Paisley et al., 2011)
• (Kim & Sudderth, 2011)
55
Contrast of LDA Extensions
Topic Model = Distributions over topics (docs) + Distributions over words (topics)
These extensions modify the distributions over topics:
• Hierarchical LDA (hLDA) (Blei et al., 2004)
• Author-Topic Model (Rosen-Zvi et al., 2004)
• HDP mixture model (Teh et al., 2004)
• Correlated Topic Models (CTM) (Blei & Lafferty, 2006)
• Pachinko Allocation Model (PAM) (Li & McCallum, 2006)
• Hierarchical PAM (hPAM) (Mimno et al., 2007)
• Syntactic Topic Models (Boyd-Graber & Blei, 2009)
• Focused Topic Models (Williamson et al., 2010)
• 2D Topic-Aspect Model (Paul & Girju, 2010)
• DILN for mixed-membership modeling (Paisley et al., 2011)
• Doubly Correlated Nonparametric TM (Kim & Sudderth, 2011)
56
Correlated Topics
• Correlated Topics:
  – Correlated Topic Models (CTM)
  – Pachinko Allocation Model (PAM)
  – Hierarchical LDA (hLDA)
  – Hierarchical PAM (hPAM)
• Key difference from SCTM: correlation is limited to topics that appear together in the same document
  – Example: the {hockey} and {baseball} topics share many words in common, but never appear in the same document
• The spirit of learning relationships between topics is very similar!
57
Our Model (SCTM)
Components ϕ1 {Canada}, ϕ2 {government}, ϕ3 {sports}, ϕ4 {U.S.}, ϕ5 {Japan}
Topics b1 = {Canadian gov.}, b2 = {government}, b3 = {hockey}, b4 = {U.S. gov.}, b5 = {baseball}, b6 = {Japan}
Documents θ1, θ2, θ3 as before.
58
Correlated Topics
[Figure: the {sports} component ϕ3 is shared by the {hockey} topic b3 and the {baseball} topic b5]
• Correlated Topics: CTM, PAM, hLDA, hPAM
• Key difference from SCTM: correlation is limited to topics that appear together in the same document
  – Example: the {hockey} and {baseball} topics share many words in common, but never appear in the same document
• The spirit of learning relationships between topics is very similar!
59
Contrast of LDA Extensions
Topic Model = Distributions over topics (docs) + Distributions over words (topics)
Distributions over topics: hLDA, Author-Topic Model, HDP mixture model, CTM, PAM, hPAM, Syntactic Topic Models, Focused Topic Models, 2D Topic-Aspect Model, DILN, Doubly Correlated Nonparametric TM
Distributions over words:
• (Wallach et al., 2009)
• (Reisinger et al., 2010)
• (Wang & Blei, 2009)
• (Eisenstein et al., 2011)
60
Contrast of LDA Extensions
Topic Model = Distributions over topics (docs) + Distributions over words (topics)
Distributions over topics: (as above)
Distributions over words:
• Asymmetric Dirichlet prior (Wallach et al., 2009)
• Spherical Topic Models (Reisinger et al., 2010)
• Sparse Topic Models (Wang & Blei, 2009)
• SAGE for topic modeling (Eisenstein et al., 2011)
61
Contrast of LDA Extensions
Topic Model = Distributions over topics (docs) + Distributions over words (topics)
Distributions over topics: (as above)
Distributions over words:
• Asymmetric Dirichlet prior
• Spherical Topic Models
• Sparse Topic Models
• SAGE for topic modeling
• Shared Components Topic Models (this work)
62
Contrast of LDA Extensions
Topic Model = Distributions over topics (docs) + Distributions over words (topics)
Extensions that modify the distributions over words:
• Asymmetric Dirichlet prior
• Spherical Topic Models
• Sparse Topic Models
• SAGE for topic modeling
• Shared Components Topic Models (this work)
63
Comparison of a few Topic Models
(Models compared on whether topics are dependently generated and whether the model has fewer parameters than LDA.)
• LDA (Blei et al., 2003)
• Asymmetric Dirichlet Prior (Wallach et al., 2009) — all topics drawn from a language-specific base distribution
• Spherical Topic Model (Reisinger et al., 2010)
• SparseTM (Wang & Blei, 2009) — each topic is sparse
• SAGE (Eisenstein et al., 2011)
64
Comparison of a few Topic Models
(Models compared on whether topics are dependently generated and whether the model has fewer parameters than LDA.)
• LDA (Blei et al., 2003)
• Asymmetric Dirichlet Prior (Wallach et al., 2009) — all topics drawn from a language-specific base distribution
• Spherical Topic Model (Reisinger et al., 2010)
• SparseTM (Wang & Blei, 2009) — each topic is sparse
• SAGE (Eisenstein et al., 2011)
• SCTM (this paper) — topics are products of a shared pool of components
65
Parameter Estimation
• Goal: infer values for the model parameters ϕc, πc, and θm.
• Monte Carlo EM (MCEM) algorithm, where the M-step minimizes a Contrastive Divergence (CD) objective.
66
Parameter Estimation
[Model diagram: Beta(γ) → π1…π5; Dirichlet(β) → components ϕ1…ϕ5; topics b1–b6; Dirichlet(α) → θ1, θ2, θ3 for the example documents]
67
Parameter Estimation
[Same diagram; the π variables and the document-topic distributions θ are integrated out]
68
Parameter Estimation
[Same diagram: components ϕ1…ϕ5, topics b1–b6, and the three example documents]
69
Parameter Estimation
[Same diagram; each word n of document m has a topic assignment zmn (z11…z16, z21…z24, z31…z34)]
70
Parameter Estimation
Model parameters: the components ϕc.
Latent variables: the topic assignments zmn and the binary vectors bk.
71
Parameter Estimation
Standard M-step: maximize the likelihood of ϕc conditioned on zmn and bkc.
Standard E-step: compute expectations of zmn and bkc conditioned on ϕc.
72
Parameter Estimation
Standard M-step: maximize the likelihood of ϕc conditioned on zmn and bkc.
Monte Carlo E-step: sample zmn and bkc conditioned on ϕc.
73
Parameter Estimation
CD M-step: minimize the contrastive divergence of ϕc conditioned on zmn and bkc.
Monte Carlo E-step: sample zmn and bkc conditioned on ϕc.
74
Parameter Estimation
Each word with topic assignment z is drawn from the normalized product of its topic's active components:
  p(x | z, b_z, ϕ) = ∏_{c=1..C} ϕ_{c,x}^{b_zc} / Σ_{v=1..V} ∏_{c=1..C} ϕ_{c,v}^{b_zc}

Algorithm 1 SCTM Training
  Initialize parameters: ξc, bkc, zi
  while not converged do
    {E-step:}
    for j = 1 to J do                      {draw the j-th sample {Z, B}(j)}
      for i = 1 to N do: sample zi
      for k = 1 to K do: for c = 1 to C do: sample bkc
    {M-step:}
    for c = 1 to C do: for v = 1 to V do   {single gradient step over ξ}
      ϕcv(t+1) = ϕcv(t) − η · d CD({Z, B}) / dϕcv

CD M-step: we follow Hinton (2002), drawing reconstructions xmn ~ p(· | zmn, b_zmn, ϕ). The Contrastive Divergence objective treats Z and B as fixed, and its gradient with respect to ξcv is d CD({Z, B}) / dξcv ≈ − …
75
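As a rough sketch of the shape of this M-step (my own illustration, not the paper's implementation), the update below assumes the components are parameterized by log-parameters ξ with ϕ_cv ∝ exp ξ_cv, holds the sampled Z and B fixed, and uses a single sampled reconstruction as the "negative" term of a contrastive-divergence-style gradient. The function name and sizes are invented.

```python
import numpy as np

def cd_step(xi, B, words, z, lr=0.1, rng=None):
    """One stochastic gradient step on the log-parameters xi (C x V) of the
    shared components, with topic assignments z and binary matrix B (K x C)
    held fixed.  A CD-style sketch, not the authors' code."""
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(xi)
    for w, k in zip(words, z):
        b = B[k]                              # active components for this word's topic
        logits = b @ xi                       # log of the unnormalized product over components
        p = np.exp(logits - logits.max())
        p /= p.sum()                          # normalized-product (PoE) distribution over words
        w_tilde = rng.choice(len(p), p=p)     # sampled reconstruction ("negative" word)
        grad[:, w] += b                       # positive phase: observed word
        grad[:, w_tilde] -= b                 # negative phase: reconstruction
    return xi + lr * grad / len(words)

# Toy usage with made-up sizes; z and B would come from the Monte Carlo E-step.
rng = np.random.default_rng(0)
C, V, K = 5, 50, 6
xi = np.zeros((C, V))
B = rng.integers(0, 2, size=(K, C))
words = rng.integers(V, size=30)
z = rng.integers(K, size=30)
xi = cd_step(xi, B, words, z, rng=rng)
```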
Parameter Estimation
• Goal: infer values for the model parameters ϕc, πc, and θm.
• Monte Carlo EM (MCEM) algorithm, where the M-step minimizes a Contrastive Divergence (CD) objective.
76
Experiments: Topic Modeling
• Experiments:
  – Can SCTM combine a fixed number of components (multinomials) into topics to achieve lower perplexity?
  – Does SCTM achieve lower perplexity than LDA with a more compact model?
• Analysis:
  – What are the learned topics like?
  – What are the learned components like?
  – What topic structure is learned?
77
Experiments: Topic Modeling
Experimental Setup:
– Datasets:
  • 1,000 random articles from 20 Newsgroups
  • 1,617 NIPS abstracts
– Evaluation:
  • left-to-right average perplexity on held-out data
– Models:
  • LDA trained with a collapsed Gibbs sampler
    – In LDA, components and topics are in a one-to-one relationship (i.e. a special case of the SCTM where each topic is comprised of only its corresponding component)
  • SCTM with parameter estimation as described
78
Experiments: Topic Modeling
(The experiment and analysis questions above, repeated to frame the next results.)
79
Experiments: Topic Modeling
[Plot: 20News held-out perplexity (y: 800–1800) vs. number of components (x: 0–100) for LDA]
80
Experiments: Topic Modeling
[Plot: as above, adding SCTM with # components = # topics; labels show # topics]
81
Experiments: Topic Modeling
[Plot: as above, adding SCTM configurations with twice as many topics as components]
82
Experiments: Topic Modeling
[Plot: as above, adding SCTM configurations with three times as many topics as components]
83
Experiments: Topic Modeling
[Plot: as above, adding SCTM configurations with four times as many topics as components]
84
Experiments: Topic Modeling
[Plot: 20News perplexity vs. number of components for LDA and for SCTM with K topics built from C components, K ranging from C up to 5C; for the same number of components, SCTM reaches lower perplexity than LDA]
85
Experiments: Topic Modeling
[Plot: NIPS held-out perplexity (y: 300–700) vs. number of components (x: 0–100) for LDA]
86
Experiments: Topic Modeling
[Plot: NIPS perplexity vs. number of components for LDA and SCTM (labels show # topics); as on 20News, SCTM achieves lower perplexity than LDA for the same number of components]
87
Experiments: Topic Modeling
(The experiment and analysis questions above, repeated to frame the next results.)
88
Experiments: Topic Modeling
[Plot: 20News perplexity vs. number of model parameters (thousands) for LDA; labels show # topics]
89
Experiments: Topic Modeling
[Plot: 20News perplexity vs. number of model parameters for LDA and SCTM; labels for LDA show # topics, labels for SCTM show # components, # topics. SCTM reaches lower perplexity with a more compact model.]
90
Experiments: Topic Modeling
[Plot: NIPS perplexity vs. number of model parameters (thousands) for LDA; labels show # topics]
91
Experiments: Topic Modeling
[Plot: NIPS perplexity vs. number of model parameters for LDA and SCTM; labels for LDA show # topics, labels for SCTM show # components, # topics. Again SCTM reaches lower perplexity with fewer parameters.]
92
Experiments: Topic Modeling
(The experiment and analysis questions above, repeated to frame the next results.)
93
What does SCTM learn?
Figure 2: SCTM binary matrix and topics from 3599 training documents of 20NEWS for C = …; filled squares are "on" (equal to 1).
 k   αk     Top words for topic
 1   0.306  subject organization israel return define law org
 2   0.031  encryption chip clipper keys des escrow security law
 3   0.025  turkish armenian armenians war turkey turks armenia
 4   0.102  drive card disk scsi hard controller mac drives
 5   0.071  image jpeg window display code gif color mit
 6   0.018  jews israeli jewish arab peace land war arabs
 7   0.074  org money back question years thing things point
 8   0.106  christian bible church question christ christians life
 9   0.011  administration president year market money senior
 10  0.055  health medical center research information april
 11  0.063  gun law state guns control bill rights states
 12  0.160  world organization system israel state usa cwru reply
 13  0.042  space nasa gov launch power wire ground air
 14  0.038  space nasa gov launch power wire ground air
 15  0.079  team game year play games season players hockey
 16  0.158  car lines dod bike good uiuc sun cars
 17  0.136  windows file government key jesus system program
 18  0.122  article writes center page harvard virginia research
 19  0.017  max output access digex int entry col line
 20  0.380  lines people don university posting host nntp time
94
What does SCTM learn?
(Figure 2 repeated from the previous slide.)
95
What does SCTM learn?
(Figure 2 repeated: the full table of topics k = 1…20 with their weights αk and top words.)
96
Experiments: Topic Modeling
(The experiment and analysis questions above, repeated to frame the next results.)
97
SCTM: Hasse Diagram over Topics (NIPS)
Components:
  c=1: model information parameters kalman robust matrices likelihood experimentally
  c=2: network networks data learning optimal linear vector independent binary natural algorithms pca
  c=4: paper units output layer networks patterns unit pattern set rule network rules weights training
  c=9: visual image images cells cortex scene support spatial feature vision cues stimulus statistics
Topics (k, αk, top words):
  k=1 (0.11): model learning system information parameters networks robust kalman rules estimation
  k=2 (0.13): network input information time recurrent back propagation units architecture forward layer
  k=3 (0.06): object recognition system objects information visual matching problem based classification
  k=4 (0.12): bayesian results show estimation method based parameters likelihood methods models
  k=5 (0.04): object recognition system objects information visual matching problem based classification
  k=6 (0.23): neural network paper recognition speech systems based results performance artificial
  k=7 (0.08): data paper networks network output feature features patterns set train introduced unit functions
  k=8 (0.23): algorithm training error function method performance input classification classifier
  k=9 (0.02): vector feature classification support vectors kernel regression weight inputs dimensionality
  k=10 (0.09): neural neurons analog synaptic neuron networks memory time capacity model associative noise dynamics
  k=11 (0.08): learning networks system recognition time network describes hand context views classification
  k=12 (0.13): problem state control reinforcement problems models time based decision markov systems function
  k=13 (0.05): networks network learning distributed system weight vectors property binary point optimal real
  k=14 (0.07): models images image problem structure analysis mixture clustering approach show computational
  k=15 (0.12): cells neurons visual cortex motion response processing spatial cell properties patterns spike
  k=16 (0.11): training units paper hidden number output problem rule set order unit show present method weights task
  k=17 (0.10): number functions weights function layer generalization error results loss linear size
  k=18 (0.07): information analysis component rules signal independent representations noise basis
  k=19 (0.03): system networks set neurons visual phase feature processing features output associative
  k=20 (0.02): time network weights activation delay current chaotic connected discrete connections
Figure 4: Hasse diagram on NIPS for C = 10, K = 20 showing the top words for topics and unrepresented components (in shaded box). Notice that some topics only consist of a single component. The shaded box contains the …
98
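The Hasse diagram orders topics by subsumption of their active component sets. As a sketch of how such an order could be read off a learned binary matrix B (my own illustration, not the paper's code):

```python
import numpy as np

def subsumption_edges(B):
    """Given the K x C binary matrix B, list pairs (j, k) where topic j's active
    component set is a strict subset of topic k's.  (A Hasse diagram would
    further drop the transitively implied edges.)"""
    K = B.shape[0]
    sets = [set(np.flatnonzero(row)) for row in B]
    return [(j, k) for j in range(K) for k in range(K)
            if j != k and sets[j] < sets[k]]         # strict subset

# Toy matrix: topic 0 uses component {0}; topic 1 uses {0, 1}; topic 2 uses {2}.
B = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 0, 1]])
print(subsumption_edges(B))   # [(0, 1)]: topic 0 is subsumed by topic 1
```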
SCTM: Hasse Diagram over Topics
(Figure 4 repeated from the previous slide.)
99
SCTM: Hasse Diagram over Topics
(Figure 4 repeated from the previous slide.)
100
Experiments: Topic Modeling
• Experiments:
  – For the same number of components (multinomials), SCTM achieves lower perplexity than LDA
  – Non-square SCTM achieves lower perplexity than LDA with a more compact model
• Analysis:
  – SCTM learns diverse LDA-like topics
  – Components are usually only interpretable when they also appear as a topic
  – SCTM learns an implicit Hasse diagram defining subsumption relationships between topics
101
Summary
Shared Components Topic Model (SCTM):
1. Generate a pool of "components" (proto-topics)
2. Assemble each topic from some of the components
   • Multiply and renormalize ("product of experts")
3. Documents are mixtures of topics (just like LDA)
– So the wordlists of two topics are not generated independently!
– Fewer parameters
102
Future Work
• Improve inference for SCTM
• Topics as products of components in other applications
  – Selectional preference: components could correspond to semantic features that intersect to define semantic classes
  – Vision: topics are classes of objects; the components could be features of those objects
103
Thank you!
Questions, comments?
104