Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks

Masashi Tsubaki, Kevin Duh, Masashi Shimbo, Yuji Matsumoto

Nara Institute of Science and Technology (NAIST), Japan

Contributions

Two contributions in our work:

① A new model of compositionality in word vector space
② An unsupervised word vector re-training algorithm that takes compositionality into account

Masashi Tsubaki

Introduction

Modeling of compositionality in word vector space: from word to phrase representations via matrix-vector operations [Mitchell and Lapata 08], [Baroni and Zamparelli 10], [Socher+ 12], [Van de Cruys+ 13]

run marathon = race
run company = operate

[Figure: word vectors for run, marathon, and company]

New model inspired by Co-Compositionality

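Our Model

Main Idea: Co-Compositionality [Pustejovsky 1995]

Co-compositionality: the verb and the object are allowed to modify each other's meanings and together generate the overall semantics.

$f(\text{run}_{\text{company}}, \text{company}_{\text{run}}) = \text{operate}$
$f(\text{run}_{\text{marathon}}, \text{marathon}_{\text{run}}) = \text{race}$

Here $\text{run}_{\text{company}}$ is "run" as modified by "company", and $\text{company}_{\text{run}}$ is "company" as modified by "run".

Question: How do we implement co-compositionality in vector space?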

Our Model

Prototype Projection for Co-Compositionality

Prototype projection: a matrix-vector operation as an implementation of co-compositionality

Our Model

Co-Compositionality with Prototype Projections

Prototype verbs of "company" (VerbOf): start, build, ..., buy. Their vectors form the rows of a matrix $V$, which spans the latent subspace formed by the prototype verb vectors of "company".
Prototype objects of "run" (ObjectOf): firm, bank, hotel, .... Their vectors likewise form the rows of a matrix $O$.

$\text{run}_{\text{company}} = P_{\text{company}}\, \text{run}$, where $P_{\text{company}} = V^{\top} V$ is the orthogonal projection matrix onto $V$
$\text{company}_{\text{run}} = P_{\text{run}}\, \text{company}$, where $P_{\text{run}} = O^{\top} O$
$\text{operate} = \text{run}_{\text{company}} + \text{company}_{\text{run}}$

$V^{\top} V$ is an orthogonal projection when the rows of $V$ are orthonormal: then $V V^{\top} = I$, so $(V^{\top} V)^2 = V^{\top} (V V^{\top}) V = V^{\top} V$, and $V^{\top} V$ is symmetric.
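The prototype projection above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy 4-dimensional vectors and all function names are hypothetical, and the orthonormal basis for $V$ is built with Gram-Schmidt instead of the SVD mentioned in the implementation details (Gram-Schmidt keeps the full span of the prototypes, with no truncation to top singular values).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(prototypes, tol=1e-10):
    """Gram-Schmidt orthonormal basis of the span of the prototype vectors.

    Stand-in (assumption) for the SVD step used to build V or O.
    """
    basis = []
    for p in prototypes:
        w = list(p)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # drop prototypes already in the span
            basis.append([wi / norm for wi in w])
    return basis

def projection_matrix(prototypes):
    """P = V^T V, where the rows of V are an orthonormal basis of the prototype span."""
    V = orthonormal_basis(prototypes)
    d = len(prototypes[0])
    return [[sum(row[i] * row[j] for row in V) for j in range(d)] for i in range(d)]

def matvec(P, x):
    return [dot(row, x) for row in P]

def co_compose(verb, obj, verb_prototypes_of_obj, obj_prototypes_of_verb):
    """verb_obj + obj_verb: each word is projected onto the other's prototype subspace."""
    verb_given_obj = matvec(projection_matrix(verb_prototypes_of_obj), verb)  # e.g. run_company
    obj_given_verb = matvec(projection_matrix(obj_prototypes_of_verb), obj)   # e.g. company_run
    return [a + b for a, b in zip(verb_given_obj, obj_given_verb)]

if __name__ == "__main__":
    # Hypothetical toy vectors: dim 0 ~ an "operate"-like sense, dim 1 ~ a "race"-like sense.
    run = [1.0, 1.0, 0.0, 0.0]
    company = [0.0, 0.0, 1.0, 0.0]
    verbs_of_company = [[1.0, 0.0, 0.2, 0.0], [0.9, 0.0, 0.0, 0.1]]  # e.g. start, build
    objects_of_run = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.8, 0.2]]    # e.g. firm, bank
    print(co_compose(run, company, verbs_of_company, objects_of_run))
```

In this toy setup, no prototype verb of "company" has weight on the race-like coordinate, so the projection removes that component of "run" entirely (index 1 of the result is 0.0), which is the sense-selection effect the slide describes.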

Our Model

Intuitive image of prototype projection

Assume the various senses of "run" are aggregated in one vector.

[Figure: the vector for run among the prototype verbs of "company" and of "marathon" (buy, build, start, finish), with operate and race as the target senses]

Prototype projection teases out the proper semantics from this aggregate representation by projecting it onto the latent subspace.

Experiment

Evaluation: verb disambiguation in subject-verb-object triples
Evaluation dataset [Grefenstette and Sadrzadeh 11]: 200 subject-verb-object triples judged by 25 participants

Subj-Verb-Obj         Landmark verb   Human similarity judgment
People-run-company    operate         7
People-run-company    move            2

Final co-compositional vector for a subject-verb-object triple:
subj + co-compose(verb, obj)

Models are evaluated by Spearman's rank correlation between the similarities computed from the vectors and the human judgments.
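The evaluation protocol above can be sketched as follows. This is a hedged illustration: it assumes (as is common for this dataset, though the slide does not spell it out) that each item is scored by the cosine between the sentence vector built with the original verb and the one built with the landmark verb. The function names are hypothetical, and ties, which are frequent in 1 to 7 human ratings, get averaged ranks.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def sentence_vector(subj, composed_verb_obj):
    """subj + co-compose(verb, obj), summed element-wise."""
    return [s + c for s, c in zip(subj, composed_verb_obj)]

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

For each triple, one would compute `cosine(sentence_vector(subj, vo), sentence_vector(subj, vo_landmark))`, with the verb-object parts composed by whichever model is under test, collect these scores, and pass them together with the human ratings to `spearman_rho`.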

Experiment

Implementation details

Prototype words: 20 prototype words extracted from the ukWaC corpus, e.g. VerbOf (company): start, build, ..., buy; ObjectOf (run): firm, bank, hotel, .... Prototype words are chosen for both high frequency and high similarity.
The matrices $V$ and $O$ keep 80% of the top singular values.

Word representations [Blacoe and Lapata 12]:
① Distributional vector (2000 dim)
② Neural vector (50 dim)
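"80% of the top singular values" admits more than one reading. The sketch below assumes one of them: keep the smallest number of leading singular vectors whose singular values sum to at least 80% of the total (sum of singular values, not of their squares). Another reading would simply keep a fixed 80% of the 20 components. The function name is hypothetical.

```python
def num_components(singular_values, fraction=0.8):
    """Smallest k whose top-k singular values reach `fraction` of their total sum.

    Assumed reading of "80% of the top singular values"; not confirmed by the source.
    """
    s = sorted(singular_values, reverse=True)
    total = sum(s)
    running = 0.0
    for k, value in enumerate(s, start=1):
        running += value
        if running >= fraction * total:
            return k
    return len(s)
```

The first k right singular vectors of the prototype matrix would then form the orthonormal rows of $V$ (or $O$).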

Experiment

Baselines: models compared to ours

Add [Mitchell and Lapata 08]: sbj + verb + obj
Multiply [Mitchell and Lapata 08]: sbj × verb × obj (element-wise)
Grefenstette and Sadrzadeh 11: mathematical model based on an abstract categorical framework
Van de Cruys+ 13: multi-way interaction model based on non-negative matrix factorization
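The two Mitchell and Lapata baselines are simple element-wise operations. A minimal sketch (function names are illustrative):

```python
def add_compose(*vectors):
    """Additive model: element-wise sum, e.g. sbj + verb + obj."""
    return [sum(components) for components in zip(*vectors)]

def multiply_compose(*vectors):
    """Multiplicative model: element-wise (Hadamard) product, e.g. sbj × verb × obj."""
    result = [1.0] * len(vectors[0])
    for v in vectors:
        result = [r * x for r, x in zip(result, v)]
    return result
```

The multiplicative model zeroes any dimension on which one of the words is zero, so it behaves like a soft intersection of the word vectors, while the additive model keeps every dimension of every word.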

Result and Discussion

Correlation with human judgment (distributional vectors): our model achieves high performance (ρ = 0.41)

[Figure: bar chart of Spearman's ρ for Add, Multiply, Grefenstette and Sadrzadeh 11, Van de Cruys+ 13, and our model; the baselines range from 0.21 to 0.37]

Result and Discussion

Correlation with human judgment (neural vectors): state-of-the-art performance (ρ = 0.44)

[Figure: bar chart of Spearman's ρ for Add, Multiply, Grefenstette and Sadrzadeh 11, Van de Cruys+ 13, and our model; the baselines range from 0.21 to 0.37]

Co-Compositionality is useful for word sense disambiguation.
Prototype projection is an effective implementation of Co-Compositionality.

Our Model

Compositional Neural Language Model: re-training word representations by decomposing the phrase vector

① Compute the score $s$ of the correct phrase (e.g., run company): $s = u^{\top} z$, with $z = v + o$, where $v$ is the verb vector (run) and $o$ the object vector (company)
② Compute the score $s_c$ of a corrupted, incorrect phrase (e.g., eat company): $s_c = u^{\top} z_c$, with $z_c = v_c + o$
③ Minimize the cost function $J = \max(0, 1 - s + s_c)$ by SGD, updating $u \to u_{new}$ and $z \to z_{new}$; $J$ reaches zero only when the correct score exceeds the incorrect score by a margin of 1
④ The new verb vector is $v_{new} = z_{new} - o$

This yields new word representations that take compositionality into account.
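Steps ① to ④ can be sketched as a single SGD update. This is a hedged sketch, not the authors' code: the function name is hypothetical, the learning rate 0.01 is taken from the experiment settings, the regularization term is omitted, and only $u$ and $z$ are updated, as the slide describes. The gradients follow from $J = 1 - u^{\top} z + u^{\top} z_c$ when the hinge is active: $\partial J / \partial u = -(z - z_c)$ and $\partial J / \partial z = -u$.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sgd_step(u, v, v_corrupt, o, lr=0.01):
    """One hinge-loss SGD update for a (verb, object) pair; returns (u_new, v_new, loss)."""
    z = [a + b for a, b in zip(v, o)]            # step 1: correct phrase, e.g. run company
    z_c = [a + b for a, b in zip(v_corrupt, o)]  # step 2: corrupted phrase, e.g. eat company
    s, s_c = dot(u, z), dot(u, z_c)
    loss = max(0.0, 1.0 - s + s_c)               # step 3: J = max(0, 1 - s + s_c)
    if loss > 0.0:
        # Gradient descent with the old u and z: u += lr (z - z_c), z += lr u
        u_new = [ui + lr * (zi - zci) for ui, zi, zci in zip(u, z, z_c)]
        z_new = [zi + lr * ui for zi, ui in zip(z, u)]
    else:
        u_new, z_new = list(u), z                # hinge inactive: nothing to update
    v_new = [zi - oi for zi, oi in zip(z_new, o)]  # step 4: v_new = z_new - o
    return u_new, v_new, loss
```

Because the object vector $o$ is subtracted back out, the whole update to the phrase vector is attributed to the verb, which is how the decomposition re-trains the verb representation.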

Our Model

Co-Compositional Neural Language Model: the Compositional Neural Language Model with prototype projection

$s = u^{\top} z$, with $z = x + y$

① Prototype projection for both verb and object: $x = P_{obj}\, v$ and $y = P_{verb}\, o$, where $v$ is the verb vector (run), $o$ is the object vector (company), $P_{obj}$ is the prototype projection of the object, and $P_{verb}$ is that of the verb
② Optimize the parameters with the same method as the Compositional NLM: $u \to u_{new}$, $z \to z_{new}$, giving $x_{new}$
③ Obtain the new verb vector by minimizing
$v_{new} = \arg\min_{v} \|x_{new} - P_{obj}\, v\|^2 + \lambda \|v\|^2$

This yields new word representations that take co-compositionality into account.
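Step ③ has a closed form when $P_{obj}$ is an orthogonal projection. Setting the gradient of $\|x_{new} - P v\|^2 + \lambda \|v\|^2$ to zero gives $(P^{\top} P + \lambda I) v = P^{\top} x_{new}$; since $P$ is symmetric and idempotent this becomes $(P + \lambda I) v = P x_{new}$, which is solved by $v = P x_{new} / (1 + \lambda)$. The sketch below combines this with the hinge-loss update. It rests on several assumptions not stated on the slide: $y$ is held fixed so that $x_{new} = z_{new} - y$, the corrupted phrase uses $P_{obj} v_c + y$, the projections come from Gram-Schmidt rather than SVD, and `lam=0.1` is an arbitrary placeholder. All names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def projection_matrix(prototypes, tol=1e-10):
    """P = B^T B, with B an orthonormal (Gram-Schmidt) basis of the prototype span."""
    basis = []
    for p in prototypes:
        w = list(p)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    d = len(prototypes[0])
    return [[sum(b[i] * b[j] for b in basis) for j in range(d)] for i in range(d)]

def matvec(P, x):
    return [dot(row, x) for row in P]

def solve_verb(P_obj, x_new, lam):
    """Closed form of argmin_v ||x_new - P_obj v||^2 + lam ||v||^2 for a projection P_obj."""
    return [xi / (1.0 + lam) for xi in matvec(P_obj, x_new)]

def co_comp_step(u, v, v_corrupt, o, P_obj, P_verb, lr=0.01, lam=0.1):
    """One hinge-loss SGD step of the co-compositional model; returns (u_new, v_new, loss)."""
    y = matvec(P_verb, o)                                       # y = P_verb o
    z = [a + b for a, b in zip(matvec(P_obj, v), y)]            # z = P_obj v + y
    z_c = [a + b for a, b in zip(matvec(P_obj, v_corrupt), y)]  # corrupted phrase (assumed form)
    loss = max(0.0, 1.0 - dot(u, z) + dot(u, z_c))
    if loss > 0.0:
        u_new = [ui + lr * (zi - zci) for ui, zi, zci in zip(u, z, z_c)]
        z_new = [zi + lr * ui for zi, ui in zip(z, u)]
    else:
        u_new, z_new = list(u), z
    x_new = [zi - yi for zi, yi in zip(z_new, y)]  # assumption: y is held fixed
    return u_new, solve_verb(P_obj, x_new, lam), loss
```

A consequence worth noting: with this closed form, $v_{new}$ lies entirely inside the object's prototype subspace, scaled down by $1 + \lambda$.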

Experiment

Evaluation: verb disambiguation [Grefenstette and Sadrzadeh 11]
Original neural vectors [Blacoe and Lapata 12] vs. neural vectors re-trained with our learning models

Training data: 5000 verb-object pairs extracted from the ukWaC corpus
Hyper-parameters: learning rate 0.01, regularization 10^4; 20 iterations (one iteration is one pass through the training data)

Result and Discussion

Correlation with human judgment (re-trained neural vectors): new state-of-the-art performance (ρ = 0.47)

[Figure: bar chart of Spearman's ρ with the original vs. re-trained neural vectors for the Compositional NLM and the Co-Compositional NLM; the re-trained Co-Compositional NLM reaches ρ = 0.47]

Higher performance with re-trained word representations

Summary

Conclusion

① A new model of compositionality in word vector space: Co-Compositionality with Prototype Projection
② An unsupervised word vector re-training algorithm that takes compositionality into account: Compositional and Co-Compositional Neural Language Models

Achieves state-of-the-art performance on the verb disambiguation task

Appendix

Examples
Results of the different compositionality models
The number of prototype words
Variations in model configuration
Composition operator and parameter