Modeling and Learning Semantic Co-Compositionality
through Prototype Projections and Neural Networks
Masashi Tsubaki, Kevin Duh, Masashi Shimbo, Yuji Matsumoto
Nara Institute of Science and Technology (NAIST), Japan
Contributions
Two contributions in our work:
① A new model of compositionality in word vector space
② An unsupervised word vector re-training algorithm that considers compositionality
Masashi Tsubaki
1
Introduction
Modeling compositionality in word vector space: from word representations to phrase representations with matrix-vector operations [Mitchell and Lapata 08], [Baroni and Zamparelli 10], [Socher+ 12], [Van de Cruys+ 13]
run marathon = race
run company = operate
(Figure: word vectors for run, marathon, and company)
A new model inspired by co-compositionality
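Our Model
Main idea: co-compositionality [Pustejovsky 1995]
Co-compositionality: the verb and the object are allowed to modify each other's meanings, and together they generate the overall semantics.
Rather than composing the raw words, $f(\mathrm{run}, \mathrm{company}) = \mathrm{operate}$, each word is first modified by the other:
$f(\mathrm{run}_{\mathrm{company}}, \mathrm{company}_{\mathrm{run}}) = \mathrm{operate}$
$f(\mathrm{run}_{\mathrm{marathon}}, \mathrm{marathon}_{\mathrm{run}}) = \mathrm{race}$
Question: how do we implement co-compositionality in vector space?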
Our Model
Prototype projection for co-compositionality
Prototype projection: a matrix-vector operation that implements co-compositionality
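Our Model
Co-compositionality with prototype projections
Prototype verbs of “company” (VerbOf): start, build, ・・・, buy. The matrix of their vectors is approximated by a low-rank factorization whose factor V spans the latent subspace formed by the prototype verb vectors of company.
$P_{\mathrm{company}} = V^{\top} V$ (orthogonal projection matrix onto V)
Prototype objects of “run” (ObjectOf): firm, bank, hotel, ・・・. Likewise, the factor O spans their latent subspace, and $P_{\mathrm{run}} = O^{\top} O$.
Prototype projection:
$\mathrm{run}_{\mathrm{company}} = P_{\mathrm{company}}\, \mathrm{run}$
$\mathrm{company}_{\mathrm{run}} = P_{\mathrm{run}}\, \mathrm{company}$
$\mathrm{operate} = \mathrm{run}_{\mathrm{company}} + \mathrm{company}_{\mathrm{run}}$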
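The projections above can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' implementation: it assumes each word's prototype vectors are stacked as the rows of a matrix, that V is taken from the top-k right singular vectors of that matrix, and it uses random stand-in vectors. `prototype_projection`, `co_compose`, and the sizes are hypothetical.

```python
import numpy as np


def prototype_projection(prototypes: np.ndarray, k: int) -> np.ndarray:
    """Return P = V^T V, the orthogonal projection onto the span of the
    top-k right singular vectors of `prototypes` (one prototype per row)."""
    # Thin SVD: prototypes = U @ diag(s) @ vt; the rows of vt are orthonormal.
    _, _, vt = np.linalg.svd(prototypes, full_matrices=False)
    V = vt[:k]          # k x dim basis of the latent subspace
    return V.T @ V      # dim x dim, symmetric and idempotent


def co_compose(verb: np.ndarray, obj: np.ndarray,
               verb_prototypes_of_obj: np.ndarray,
               obj_prototypes_of_verb: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper computing run_company + company_run."""
    p_obj = prototype_projection(verb_prototypes_of_obj, k)    # e.g. P_company
    p_verb = prototype_projection(obj_prototypes_of_verb, k)   # e.g. P_run
    return p_obj @ verb + p_verb @ obj


# Random stand-ins for 50-dim word vectors and 20 prototypes per word.
rng = np.random.default_rng(0)
dim, n_proto, k = 50, 20, 10
run, company = rng.normal(size=dim), rng.normal(size=dim)
verbs_of_company = rng.normal(size=(n_proto, dim))   # VerbOf(company)
objects_of_run = rng.normal(size=(n_proto, dim))     # ObjectOf(run)

P_company = prototype_projection(verbs_of_company, k)
phrase = co_compose(run, company, verbs_of_company, objects_of_run, k)
```

Because the rows of V are orthonormal, $P^{\top} = P$ and $P^2 = P$, which is what makes the operation an orthogonal projection rather than an arbitrary linear map.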
Our Model
Intuitive image of prototype projection
Assume that the various senses of “run” are aggregated in one vector.
(Figure: projecting run onto the prototype verbs of “company”, e.g. buy, build, start, gives operate; projecting it onto the prototype verbs of “marathon”, e.g. finish, gives race.)
Projection onto a latent subspace teases the proper semantics out of the aggregate representation.
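The intuition can be checked on a toy example. The sketch below assumes a 6-dimensional space in which the “operate” and “race” senses are orthogonal directions and “run” is their sum; the prototype vectors are hypothetical and chosen only so that each subspace contains one of the two senses.

```python
import numpy as np


def projection(prototypes: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Orthogonal projection onto the span of the prototype rows."""
    _, s, vt = np.linalg.svd(prototypes, full_matrices=False)
    basis = vt[s > tol * s[0]]   # drop directions with ~zero singular value
    return basis.T @ basis


e = np.eye(6)                    # toy space with orthogonal "feature" axes
operate, race = e[0], e[1]
run = operate + race             # both senses aggregated in one vector

# Hypothetical prototype verbs; only their directions matter here.
verbs_of_company = np.stack([e[0] + e[2],    # "start"
                             e[0] + e[3],    # "build"
                             e[2]])          # "buy"
verbs_of_marathon = np.stack([e[1] + e[4],   # "finish"
                              e[4]])         # another hypothetical verb

run_company = projection(verbs_of_company) @ run    # approximately operate
run_marathon = projection(verbs_of_marathon) @ run  # approximately race
```

The “race” direction is orthogonal to every prototype verb of company, so the projection removes it and keeps only the “operate” component; the marathon subspace does the opposite.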
Experiment
Evaluation: verb disambiguation in subject-verb-object triples
Evaluation dataset [Grefenstette and Sadrzadeh 11]: 200 subject-verb-object triples judged by 25 participants

Subj-Verb-Obj        Landmark verb   Human similarity judgment
People-run-company   operate         7
People-run-company   move            2

Final co-compositional vector for a subject-verb-object triple: subj + co-composed(verb, obj)
Models are evaluated by Spearman's rank correlation between the similarities computed from the vectors and the human judgments.
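The evaluation protocol can be sketched as follows. It assumes that the landmark is scored by substituting it for the verb in the same triple and comparing the two composed vectors by cosine similarity; `compose` stands for any composition function (co-compositional or a baseline), and the helper names are hypothetical. Spearman's ρ is computed as the Pearson correlation of the rank vectors, with tied values given their average rank.

```python
import numpy as np
from typing import Callable, Dict, List, Tuple


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def average_ranks(x) -> np.ndarray:
    """1-based ranks; tied values share the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y) -> float:
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


Item = Tuple[str, str, str, str, float]   # (subj, verb, obj, landmark, human score)


def evaluate(items: List[Item], vectors: Dict[str, np.ndarray],
             compose: Callable[[str, str], np.ndarray]) -> float:
    predicted, human = [], []
    for subj, verb, obj, landmark, score in items:
        # Final vector: subj + co-composed(verb, obj)
        a = vectors[subj] + compose(verb, obj)
        b = vectors[subj] + compose(landmark, obj)
        predicted.append(cosine(a, b))
        human.append(score)
    return spearman(predicted, human)
```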
Experiment
Implementation details
Prototype words: extracted 20 prototype words from the ukWaC corpus, e.g. VerbOf(company): start, build, ・・・, buy and ObjectOf(run): firm, bank, hotel, ・・・; selected for both high frequency and high similarity, rather than for high frequency alone
Latent subspaces V and O: keep 80% of the top singular values
Word representations [Blacoe and Lapata 12]: ① distributional vectors (2000 dim) ② neural vectors (50 dim)
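The prototype selection and the truncation can be sketched as follows, under assumptions that the slide does not pin down: candidates are first filtered by co-occurrence frequency and then ranked by cosine similarity to a reference vector, and “80% of the top singular values” is read as keeping the fewest singular values whose sum reaches 80% of the total (another reading would keep 80% of their count). `n_frequent`, the reference vector, and the function names are hypothetical.

```python
import numpy as np
from typing import Dict, List


def select_prototypes(counts: Dict[str, int], vectors: Dict[str, np.ndarray],
                      reference: np.ndarray, n_frequent: int = 100,
                      n_keep: int = 20) -> List[str]:
    """Keep the n_keep most similar words among the n_frequent most frequent."""
    frequent = sorted(counts, key=counts.get, reverse=True)[:n_frequent]

    def similarity(word: str) -> float:
        v = vectors[word]
        return float(v @ reference / (np.linalg.norm(v) * np.linalg.norm(reference)))

    return sorted(frequent, key=similarity, reverse=True)[:n_keep]


def truncated_basis(prototype_matrix: np.ndarray, ratio: float = 0.8) -> np.ndarray:
    """Rows of V: the fewest right singular vectors whose singular values
    sum to at least `ratio` of the total."""
    _, s, vt = np.linalg.svd(prototype_matrix, full_matrices=False)
    cumulative = np.cumsum(s) / s.sum()
    k = int(np.searchsorted(cumulative, ratio)) + 1
    return vt[:k]
```

For singular values 4, 3, 2, 1 the cumulative shares are 0.4, 0.7, 0.9, 1.0, so three basis vectors are kept.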
Experiment
Baselines: models compared to ours
Add [Mitchell and Lapata 08]: sbj + verb + obj
Multiply [Mitchell and Lapata 08]: sbj × verb × obj (element-wise)
Grefenstette and Sadrzadeh 11: mathematical model based on an abstract categorical framework
Van de Cruys+ 13: multi-way interaction model based on non-negative matrix factorization
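The two Mitchell and Lapata baselines are one-liners; here is a sketch with made-up 2-dimensional vectors (the function names are hypothetical):

```python
import numpy as np


def add_model(sbj: np.ndarray, verb: np.ndarray, obj: np.ndarray) -> np.ndarray:
    """Additive composition: sbj + verb + obj."""
    return sbj + verb + obj


def multiply_model(sbj: np.ndarray, verb: np.ndarray, obj: np.ndarray) -> np.ndarray:
    """Multiplicative composition: element-wise product sbj * verb * obj."""
    return sbj * verb * obj


sbj, verb, obj = np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])
print(add_model(sbj, verb, obj))       # [ 9. 12.]
print(multiply_model(sbj, verb, obj))  # [15. 48.]
```

In Add, the verb contributes the same vector whatever its object is, whereas the prototype projection changes the verb vector depending on the object.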
Result and Discussion
Correlation with human judgment (distributional vectors): our model achieves high performance (ρ = 0.41)
(Figure: bar chart of correlation ρ for Add, Multiply, Grefenstette and Sadrzadeh 11, Van de Cruys+ 13, and our model)
Result and Discussion
Correlation with human judgment (neural vectors): state-of-the-art performance (ρ = 0.44)
(Figure: bar chart of correlation ρ for Add, Multiply, Grefenstette and Sadrzadeh 11, Van de Cruys+ 13, and our model)
Co-compositionality is useful for word sense disambiguation.
Prototype projection is an effective implementation of co-compositionality.
Introduction
Two contributions in our work:
① A new model of compositionality in word vector space
② An unsupervised word vector re-training algorithm that considers compositionality
Our Model
Compositional Neural Language Model
Re-training word representations by decomposing the phrase vector
① Compute the score of the correct phrase (e.g. run company): $s = u^{\top} z$, where $z = v + o$
② Compute the score of a corrupted, incorrect phrase (e.g. eat company): $s_c = u^{\top} z_c$, where $z_c = v_c + o$
③ Minimize the cost function $J = \max(0,\, 1 - s + s_c)$ by SGD so that the correct score exceeds the incorrect score: $u \to u_{\mathrm{new}}$, $z \to z_{\mathrm{new}}$
④ The new verb vector is $v_{\mathrm{new}} = z_{\mathrm{new}} - o$
New word representations considering compositionality
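One SGD step of the Compositional NLM can be sketched as below, together with a training loop that uses the learning rate (0.01) and iteration count (20) listed in the experiment settings. This is a sketch under assumptions: a single scoring vector u is shared by all phrases, the corrupted verb is drawn uniformly from the verb vocabulary, and only u and z are updated, so the object vector o stays fixed. `sgd_step` and `train` are hypothetical names.

```python
import numpy as np
from typing import Dict, List, Tuple


def sgd_step(u, v, o, v_corrupt, lr=0.01):
    """One update of the ranking loss J = max(0, 1 - s + s_c).

    Returns the new scoring vector u, the new verb vector v_new = z_new - o,
    and the loss before the update. The object vector o is left unchanged."""
    z = v + o               # correct phrase, e.g. run + company
    z_c = v_corrupt + o     # corrupted phrase, e.g. eat + company
    s, s_c = float(u @ z), float(u @ z_c)
    loss = max(0.0, 1.0 - s + s_c)
    if loss > 0.0:
        grad_u = z_c - z    # dJ/du
        grad_z = -u         # dJ/dz, computed with the old u
        u = u - lr * grad_u
        z = z - lr * grad_z
    return u, z - o, loss


def train(u, verbs: Dict[str, np.ndarray], objects: Dict[str, np.ndarray],
          pairs: List[Tuple[str, str]], epochs=20, lr=0.01, seed=0):
    """Hypothetical loop: corrupt each pair with a random other verb."""
    rng = np.random.default_rng(seed)
    vocab = sorted(verbs)
    for _ in range(epochs):              # one epoch = one pass over the pairs
        for verb, obj in pairs:
            corrupt = vocab[rng.integers(len(vocab))]
            if corrupt == verb:
                continue
            u, verbs[verb], _ = sgd_step(u, verbs[verb], objects[obj],
                                         verbs[corrupt], lr)
    return u, verbs
```

Updating z and then subtracting the fixed object vector is what lets the phrase-level training signal flow back into the verb representation.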
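Our Model
Co-Compositional Neural Language Model: the Compositional Neural Language Model with prototype projection
$s = u^{\top} z$, where $z = x + y$, $x = P_{\mathrm{obj}}\, v$, and $y = P_{\mathrm{verb}}\, o$ (e.g. v = run, o = company)
① Apply prototype projection to both the verb and the object
② Optimize the parameters with the same method as the Compositional NLM: $u \to u_{\mathrm{new}}$, $z \to z_{\mathrm{new}}$, giving $x_{\mathrm{new}}$
③ Obtain the new verb vector by minimizing
$v_{\mathrm{new}} = \arg\min_{v} \left( \lVert x_{\mathrm{new}} - P_{\mathrm{obj}}\, v \rVert^{2} + \lambda \lVert v \rVert^{2} \right)$
New word representations considering co-compositionality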
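Step ③ is a ridge regression, so it has a closed-form solution. The sketch below assumes the objective in step ③ reads $\min_v \lVert x_{\mathrm{new}} - P_{\mathrm{obj}} v \rVert^2 + \lambda \lVert v \rVert^2$ and solves its normal equations with NumPy; `recover_verb`, `co_compositional_score`, and λ = 0.5 are illustrative choices, not values from the experiments. Because an orthogonal projection satisfies $P^{\top} P = P$, the minimizer reduces to $P_{\mathrm{obj}}\, x_{\mathrm{new}} / (1 + \lambda)$, so the recovered verb vector lies in the prototype subspace.

```python
import numpy as np


def co_compositional_score(u, v, o, p_obj, p_verb) -> float:
    """s = u^T z with z = x + y, x = P_obj v, y = P_verb o."""
    return float(u @ (p_obj @ v + p_verb @ o))


def recover_verb(x_new: np.ndarray, p_obj: np.ndarray, lam: float) -> np.ndarray:
    """Solve min_v ||x_new - P_obj v||^2 + lam ||v||^2 (ridge regression)."""
    dim = p_obj.shape[1]
    # Normal equations: (P^T P + lam I) v = P^T x_new
    lhs = p_obj.T @ p_obj + lam * np.eye(dim)
    return np.linalg.solve(lhs, p_obj.T @ x_new)


# Random orthogonal projection onto a 3-dim subspace of R^6, for illustration.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(6, 3)))
P_obj = q @ q.T
x_new = rng.normal(size=6)
v_new = recover_verb(x_new, P_obj, lam=0.5)
```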
Experiment
Evaluation: verb disambiguation [Grefenstette and Sadrzadeh 11]
Original neural vectors [Blacoe and Lapata 12] vs. neural vectors re-trained with our learning models
Training data: 5000 verb-object pairs extracted from the ukWaC corpus
Hyper-parameters: learning rate 0.01, regularization 10^4
20 iterations (one iteration is one pass through the training data)
Result and Discussion
Correlation with human judgment (re-trained neural vectors): new state-of-the-art performance (ρ = 0.47)
(Figure: bar chart of correlation ρ with original and re-trained vectors for the Compositional NLM and the Co-Compositional NLM)
Re-trained word representations give higher performance.
Summary
Conclusion
① New model of compositionality in word vector space: co-compositionality with prototype projection
② Unsupervised word vector re-training algorithm considering compositionality: Compositional & Co-Compositional Neural Language Models
Achieve state-of-the-art performance on the verb disambiguation task
Appendix
Examples
Results of the different compositionality models
The number of prototype words
Variations in model configuration
Composition operator and parameter