Transfer learning in language

Hal Daumé III (ダウメ)
Computer Science, University of Maryland
[email protected]

IWSML, Kyoto, Japan
31 Mar 2012

With: Piyush Rai (Utah), Avishek Saha (Utah), Abhishek Kumar (UMD),
Jagadeesh Jagarlamudi (UMD), Suresh Venkatasubramanian (Utah)
Linguistic ambiguity

➢ Teacher Strikes Idle Kids
➢ Enraged Cow Injures Farmer With Ax
➢ I saw the Grand Canyon flying to New York
➢ Dog collar vs. flea collar
➢ Plastic cat food can cover
➢ The BUG in the room …
  ➢ … flew out the window
  ➢ … was planted by spies
➢ Everyone on the island speaks two languages
Typical NLP pipeline

[Figure: the classic analysis/generation pipeline. Analysis climbs Source Words → Source Morphology → Source Syntax → Source Shallow Semantics → Source Semantics → Interlingua; Generation descends the mirror-image target side down to Target Words. The stages correspond to morphology, tagging, parsing, role labeling, and interpretation, illustrated on "The man ate a sandwich": "The man eat+past a sandwich" → DT NN VB DT NN → a parse (S → NP VP) with Agent/Theme roles → ∃a ∃t ∃e. man(a) & sandwich(t) & eat(e,a,t) & past(e).]
Typical NLP pipeline (cont'd)

[Same pipeline figure, with a callout spanning the source and target columns: these tasks are highly related!]
Pipeline models break down (sorta)

➢ Tagging + Parsing:             +0% / +3%
➢ Parsing + Named Entities:      +0.5% / +4%
➢ Parsing + Role Identification: +0% / -0.3%  (upper bound: +13%)
➢ Named Entities + Coreference:  +0.3% / +1.3%  (upper bound: +8%)

Why? Maybe the simpler model already has a lot of the fancier information.

[Finkel & Manning; ACL 2009]  [Sutton & McCallum; NAACL 2007]  [Daumé III & Marcu; EMNLP 2006]  Many others...
This talk is about...

1. Joint Parsing and Entity Recognition
2. Transfer via Multilinguality
3. Transfer from unlabeled data

[Running example: "Mark Gales spoke at IWSML", POS-tagged NNP NNP VB IN NNP, with a parse (S containing NP, VP, PP) and entity spans Person ("Mark Gales") and Event ("IWSML").]
Part 1: Joint Parsing and Entity Recognition
Agreement-based transfer

[Example: "George Bush spoke to Congress yesterday", POS-tagged NNP NNP VBD TO NNP NN, chunked as [NP George Bush] [VP spoke] [PP to] [NP Congress] [NP yesterday], with entity spans [Person George Bush] and [Org Congress].]

➢ Entities are subsequences of NPs
➢ NNPs are subsequences of entities

(A small check of these constraints is sketched below.)
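To make the two agreement constraints concrete, here is a minimal sketch (not from the paper; the span representation is an assumption) that checks them on the example sentence above:

    # Hypothetical span-based check of the two agreement constraints.
    # Spans are (start, end) token indices, end-exclusive; this encoding is an assumption.

    tokens = ["George", "Bush", "spoke", "to", "Congress", "yesterday"]
    pos    = ["NNP", "NNP", "VBD", "TO", "NNP", "NN"]
    nps    = [(0, 2), (4, 5), (5, 6)]                    # NP chunks from the parser
    entities = {"Person": (0, 2), "Org": (4, 5)}         # spans from the entity recognizer

    def inside(inner, outer):
        return outer[0] <= inner[0] and inner[1] <= outer[1]

    # Constraint 1: every entity is a subsequence of some NP.
    entities_ok = all(any(inside(e, np) for np in nps) for e in entities.values())

    # Constraint 2: every NNP token is inside some entity.
    nnps_ok = all(
        any(e[0] <= i < e[1] for e in entities.values())
        for i, tag in enumerate(pos) if tag == "NNP"
    )

    print(entities_ok and nnps_ok)   # True: the two outputs agree on this sentence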
Lots of approaches in 2008

➢ Semi-supervised learning with constraints
  ➢ Force outputs to obey constraints and do self-training
    [Chang, Ratinov, Rizzolo, Roth; AAAI 2008]
➢ Co-regularization
  ➢ Encourage learned models to have similar structure
    [Ganchev, Graça, Blitzer, Taskar; UAI 2008]
➢ Cross-task co-training
  ➢ Do self-training only on outputs that obey constraints
    [Daumé III; EMNLP 2008]
Simple black-box algorithm

➢ Learn a parser on labeled data
➢ Learn an entity recognizer on labeled data
➢ Run both on unlabeled data
➢ Assume both outputs are correct for any data point whose outputs obey the constraints
➢ Retrain the models on the original data plus the new data
➢ Rinse and repeat

(We love black-box algorithms!)

If the constraints are:
➢ Correct (true outputs always agree)
➢ Discriminating (the probability of agreement is at most 1 / [4 (|Y| - 1)^2])
then this algorithm "works" (in a PAC sense).  [Daumé III; EMNLP 2008]

(A sketch of the loop follows below.)
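As a rough illustration only (the model interfaces, the agree predicate, and the retraining schedule are assumptions, not the EMNLP 2008 implementation), the loop might look like this:

    # Minimal sketch of the agreement-based self-training loop. `parser` and `ner` are
    # assumed to be black-box models exposing train(examples) and predict(sentence);
    # `agree(parse, entities)` checks the cross-task constraints (e.g. the span check above).

    def black_box_transfer(parser, ner, parses, ents, unlabeled, agree, rounds=5):
        extra_parses, extra_ents = [], []
        for _ in range(rounds):
            # Retrain each black-box model on its original data plus the self-labeled data.
            parser.train(parses + extra_parses)
            ner.train(ents + extra_ents)
            extra_parses, extra_ents = [], []
            for sent in unlabeled:
                p, e = parser.predict(sent), ner.predict(sent)
                # Treat both outputs as correct whenever they jointly obey the constraints.
                if agree(p, e):
                    extra_parses.append((sent, p))
                    extra_ents.append((sent, e))
        return parser, ner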
Black-box results

[Results figure.]  [Daumé III; EMNLP 2008]
Part 2: Transfer via Multilinguality
Multilinguality as a source of x-fer

[Figure: PCFGs for English ("The man ate a tasty sandwich", D N V D J N), French ("Le+homme a mangé un sandwich savoureux", D N A V D N J), Spanish ("El hombre se comió un bocadillo sabroso", D N A V D N J), and Chinese, with each language's grammar parameters tied to a shared prior ϴ. Coupling the grammars this way gives a 21% average improvement over 8 languages: English, Dutch, Danish, Swedish, Portuguese, Spanish, Slovene, Chinese.]

See also: [Berg-Kirkpatrick & Klein; ACL 2010], [Iwata, Mochihashi & Sawada; ACL 2010], Snyder, Barzilay et al. ...

(A toy sketch of this kind of shared-prior coupling follows below.)
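As a purely illustrative sketch (not the model behind the 21% number), one common way to couple grammars across languages is to regularize each language's rule weights toward a shared prior mean ϴ:

    # Toy illustration of multilingual parameter coupling via a shared prior.
    # Each language has a weight vector over (aligned) grammar rules; instead of estimating
    # them independently, every vector is pulled toward a shared mean theta. These updates
    # are just the pieces of an L2 "shared prior" penalty, not the actual grammar-induction
    # model from the slide.
    import numpy as np

    def coupled_update(per_language_grads, weights, theta, lr=0.1, lam=1.0):
        """One step per language on loglik_l(w_l) - lam * ||w_l - theta||^2 / 2,
        followed by the closed-form update of the shared mean theta."""
        new_weights = []
        for grad_l, w_l in zip(per_language_grads, weights):
            # Likelihood gradient plus the pull toward the shared prior mean.
            new_weights.append(w_l + lr * (grad_l - lam * (w_l - theta)))
        theta = np.mean(new_weights, axis=0)   # shared mean = average of language weights
        return new_weights, theta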
Implicational universals

Verb-Object (VO), Prepositional (PreP):
  English:  I eat dinner in restaurants.
  French:   je mange le dîner dans les restaurants
            (I eat the dinner in the restaurants)

Object-Verb (OV), Postpositional (PostP):
  Japanese: boku-wa bangohan-o resutoran-ni taberu
            (I-topic dinner-obj restaurants-in eat)
  Hindi:    main raat ka khaana restra mein khaata hoon
            (I night-of-meal restaurants in eat am)

Universals:  VO ⊃ PreP        PostP ⊃ OV

[Daumé III & Campbell; ACL 2007]

(A tiny sketch of checking such implications against a typology table follows below.)
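The Daumé & Campbell paper discovers implications like these from typological databases; as a minimal, made-up illustration of what such an implication means operationally, one can count its violations in a feature table (the table below is invented, not real typological data):

    # Count violations of a candidate implicational universal (antecedent ⊃ consequent)
    # in a toy typological feature table. The feature values here are illustrative only.

    languages = {
        "English":  {"VO": True,  "PreP": True},
        "French":   {"VO": True,  "PreP": True},
        "Japanese": {"VO": False, "PreP": False},
        "Hindi":    {"VO": False, "PreP": False},
    }

    def violations(table, antecedent, consequent):
        """Languages where the antecedent holds but the consequent does not."""
        return [lang for lang, feats in table.items()
                if feats.get(antecedent) and not feats.get(consequent)]

    print(violations(languages, "VO", "PreP"))   # []: no counterexamples in this toy table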
Typological map: VO

[Figure: map of languages by the VO (verb-object order) feature.]  [Daumé III & Campbell; ACL 2007]
Typological map: PreP

[Figure: map of languages by the PreP (prepositional vs. postpositional) feature.]  [Daumé III & Campbell; ACL 2007]
Unsupervised part-of-speech tagging

➢ Seeds (frequent words for each tag)
  ➢ N: membro, milhões, obras
  ➢ D: as [the, 2f], o [the, 1m], os [the, 2m]
  ➢ V: afectar, gasta, juntar
  ➢ P: com, como, de, em
➢ Typological rules:
  ➢ Art ← Noun
  ➢ Prp → Noun
➢ Tag knowledge:
  ➢ Open class
  ➢ Closed class

[Teichert & Daumé III; NIPS workshop 2009]

(A sketch of wiring such knowledge into a tagger follows below.)
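As a minimal sketch only (the tag set, the dictionary format, the HMM-style parameterization, and the way the word-order rules become transition preferences are all assumptions, not the Teichert & Daumé model), this kind of knowledge can be turned into initial emission and transition tables for an unsupervised tagger:

    # Build seed-based emission priors and typology-based transition priors for an
    # HMM-style unsupervised tagger. Everything here is illustrative; the smoothing and
    # boost constants are assumptions.
    import numpy as np

    TAGS = ["N", "D", "V", "P"]
    SEEDS = {
        "N": ["membro", "milhões", "obras"],
        "D": ["as", "o", "os"],
        "V": ["afectar", "gasta", "juntar"],
        "P": ["com", "como", "de", "em"],
    }

    def seed_emissions(vocab, boost=10.0, smooth=1.0):
        """Emission pseudo-counts: seed words get a strong boost for their seeded tag."""
        word_index = {w: i for i, w in enumerate(vocab)}
        counts = np.full((len(TAGS), len(vocab)), smooth)
        for t, words in SEEDS.items():
            for w in words:
                if w in word_index:
                    counts[TAGS.index(t), word_index[w]] += boost
        return counts / counts.sum(axis=1, keepdims=True)

    def typology_transition_prior(boost=5.0, smooth=1.0):
        """Transition pseudo-counts encoding word-order preferences: determiners and
        prepositions tend to immediately precede nouns (one reading of the rules above)."""
        counts = np.full((len(TAGS), len(TAGS)), smooth)
        counts[TAGS.index("D"), TAGS.index("N")] += boost
        counts[TAGS.index("P"), TAGS.index("N")] += boost
        return counts / counts.sum(axis=1, keepdims=True)

These tables could then seed the EM iterations of an HMM tagger, so that the unsupervised learner starts from, rather than has to rediscover, the seed and typology knowledge.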
Does typology help?

[Bar charts (y-axis roughly 20-60) under SEEDS and NO SEEDS conditions, comparing runs with and without the ArtN typological rule. Callout: typological knowledge can also transfer across languages, even typologically distinct ones!]

[Teichert & Daumé III; NIPS workshop 2009; Sanders & Daumé III; EMNLP 2012 (submitted)]
Part 3: Transfer from unlabeled data
Spectral clustering

➢ Represent data points as the vertices V of a graph G.
➢ All pairs of vertices are connected by an edge E.
➢ Edges have weights W.
➢ Large weights mean that the adjacent vertices are very similar; small weights imply dissimilarity.
Graph partitioning

➢ Clustering on a graph is equivalent to partitioning the vertices of the graph.
➢ A loss function for a partition of V into sets A and B: the total weight of the edges cut by the partition.
➢ In a good partition, vertices in different partitions will be dissimilar.
➢ Mincut criterion: find the partition (A, B) that minimizes this cut.
Graph partitioning (cont'd)

➢ The mincut criterion ignores the size of the subgraphs formed.
➢ The normalized cut criterion favors balanced partitions.
➢ Minimizing the normalized cut criterion exactly is NP-hard.

(The standard definitions are spelled out below.)
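The formulas on these two slides were images in the original; for reference, the standard cut and normalized-cut definitions (Shi & Malik) that the bullets appear to describe are:

    \mathrm{cut}(A,B)   = \sum_{i \in A,\; j \in B} w_{ij}
    \mathrm{assoc}(A,V) = \sum_{i \in A,\; j \in V} w_{ij}
    \mathrm{Ncut}(A,B)  = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}

Mincut minimizes cut(A, B) alone, which can be driven to near zero by splitting off a tiny subgraph; Ncut divides by each side's total association, so very unbalanced partitions are penalized.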
Spectral clustering (cont'd)

➢ One way of approximately optimizing the normalized cut criterion leads to spectral clustering.
➢ Spectral clustering:
  ➢ Find a new representation of the original data points.
  ➢ Cluster the points in this representation using any clustering scheme (say, 2-means).
➢ The representation is the row-normalized matrix formed from the largest 2 eigenvectors of the (normalized) similarity matrix.

(A sketch is given below.)
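A minimal sketch of this procedure, assuming the standard Ng-Jordan-Weiss normalization D^(-1/2) W D^(-1/2) (the slide's matrix was an image, so that particular choice is an assumption):

    # Two-way spectral clustering sketch: embed via top eigenvectors of the normalized
    # similarity matrix, row-normalize, then run k-means in the embedded space.
    import numpy as np
    from sklearn.cluster import KMeans

    def spectral_clusters(W, k=2):
        """W: symmetric (n x n) similarity matrix with nonnegative weights."""
        d = W.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
        L = D_inv_sqrt @ W @ D_inv_sqrt                 # normalized similarity matrix
        eigvals, eigvecs = np.linalg.eigh(L)            # eigenvalues in ascending order
        U = eigvecs[:, -k:]                             # largest k eigenvectors as columns
        U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)  # row-normalize
        return KMeans(n_clusters=k, n_init=10).fit_predict(U)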
Example: 2-means

[Figure: 2-means clustering on a toy dataset.]
Example: Spectral clustering

[Figure: spectral clustering on the same toy dataset.]
Multiview spectral clustering

[Figure: the same documents seen in two views, English (En) and Japanese (日本語). Each view has its own similarity matrix (We, Wj) and spectral embedding (Ue, Uj); each view's similarity matrix is projected through the other view's embedding: Ue Ue^T Wj and Uj Uj^T We.]
Multiview spectral clustering (cont'd)

Algorithm:
1. Run SVD on each view.
2. Project each view onto the subspace spanned by the other view's top left singular vectors (Ue Ue^T Wj and Uj Uj^T We).
3. Go to 1 unless converged.

Look ma: no hyperparameters!

(A rough sketch of the loop follows below.)
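A rough, illustrative sketch of that loop (the use of a symmetric eigendecomposition in place of a general SVD, the symmetrization step, the fixed iteration count, and the final clustering step are all assumptions, not the exact published algorithm):

    # Co-trained multi-view spectral clustering sketch: each view's similarity matrix is
    # repeatedly re-expressed in the subspace spanned by the other view's top vectors.
    import numpy as np
    from sklearn.cluster import KMeans

    def top_vectors(W, k):
        """Top-k eigenvectors (as columns) of a symmetric similarity matrix."""
        _, vecs = np.linalg.eigh(W)
        return vecs[:, -k:]

    def co_spectral(We, Wj, k=2, iters=10):
        for _ in range(iters):
            Ue, Uj = top_vectors(We, k), top_vectors(Wj, k)
            # Project each view's similarities through the other view's subspace.
            We_new = Uj @ (Uj.T @ We)
            Wj_new = Ue @ (Ue.T @ Wj)
            # Keep the matrices symmetric so the eigendecomposition stays applicable.
            We = 0.5 * (We_new + We_new.T)
            Wj = 0.5 * (Wj_new + Wj_new.T)
        # Cluster using the final embeddings of both views, row-normalized.
        U = np.hstack([top_vectors(We, k), top_vectors(Wj, k)])
        U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
        return KMeans(n_clusters=k, n_init=10).fit_predict(U)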
Multiview spectral clustering: results (Reuters)

               F-score    Norm. MI
    Best view  0.342      0.287
    Concat     0.368      0.298
    SofA       0.381      0.342
    Co-Spec    0.412      0.388
This talk is about... (summary)

1. Joint Parsing and Entity Recognition: simple algorithms can achieve great transfer.
2. Transfer via Multilinguality: plentiful multilingual data + knowledge = strong models.
3. Transfer from unlabeled data: unlabeled (paired) data can be exploited efficiently.

Open questions:
➢ When will transfer help? Has transfer helped?
➢ How to incorporate knowledge?
➢ Scaling to billions of examples?

Thanks to Piyush, Avishek, Abhishek, Jags, and Suresh.

Thank you! Questions? (ありがとうございます! 質問は?)