COMPOSITIONAL MORPHOLOGY FOR WORD REPRESENTATIONS AND LANGUAGE MODELLING
Jan Botha, Phil Blunsom

ICML 2014, Beijing

MOTIVATION | PROPOSED METHOD | EXPERIMENTS

MOTIVATING EXAMPLE

WHAT WE SEE
The king finally abdicated after years of unkingly conduct.
Wait, what – unkingly?
unkingly /ʌnˈkɪŋli/: a word you have probably never seen, but still understand
⇒ compositional morphology in action

WHAT OUR MODELS SEE (MOSTLY)
10 2 95 529 11 88 21 50 74 239

MOTIVATING EXAMPLE 2
Other languages display still more variation.

CZECH CONJUGATION
čistit (to clean): čistím, čistíš, čistí, čistíme, čistíte, čistil, čištěn, čisti, čistěte, čistěme

TURKISH PRODUCTIVE DERIVATION
Avrupa (Europe)
Avrupalı (of Europe)
Avrupalılaş (become of Europe)
Avrupalılaştır (to Europeanise)
Avrupalılaştırama (be unable to Europeanise)
Avrupalılaştıramadık (we were unable to Europeanise)
...

⇒ we should model morphemes!

REPRESENTING WORDS
- Discrete set? {a, aardvark, ..., account, accounted, accounting, ...}
- Vector space? [Figure: the words a, aardvark, account and accounted plotted as points in a plane with axes x1 and x2]

EXTRACT FROM COLLOBERT & WESTON EMBEDDINGS

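MORPHEME VECTORS
Existing word vectors already capture some morphology.
- $\overrightarrow{\text{banks}} - \overrightarrow{\text{bank}} \approx \overrightarrow{\text{kings}} - \overrightarrow{\text{king}} \approx \overrightarrow{\text{queens}} - \overrightarrow{\text{queen}}$ (Mikolov et al. 2013)

Logical extension:
- $\overrightarrow{\text{kings}} \approx \overrightarrow{\text{king}} + \overrightarrow{\text{-s}}$
- $\overrightarrow{\text{unkingly}} \approx \overrightarrow{\text{un-}} + \overrightarrow{\text{king}} + \overrightarrow{\text{-ly}}$

HOW TO ...
- obtain morpheme vectors
- compose morpheme vectors
- do it all within a language model usable in an MT decoder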

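MORPHOLOGICAL COMPOSITION AS ADDITION
Literally, word = sum of its parts?

Problems:
- bag of morphemes: $\overrightarrow{\text{hang}} + \overrightarrow{\text{over}} \neq \overrightarrow{\text{over}} + \overrightarrow{\text{hang}}$ (morpheme order matters, but a plain sum ignores it)
- non-compositionality: $\overrightarrow{\text{greenhouse}} \neq \overrightarrow{\text{green}} + \overrightarrow{\text{house}}$

PRAGMATIC SOLUTION
Include word identity as a component too:
$\overrightarrow{\text{greenhouse}} \equiv \overrightarrow{\text{greenhouse}}_{\text{id}} + \overrightarrow{\text{green}}_{\text{stem}} + \overrightarrow{\text{house}}_{\text{stem}}$
$\overrightarrow{\text{unkingly}} \equiv \overrightarrow{\text{unkingly}}_{\text{id}} + \overrightarrow{\text{un}}_{\text{pre}} + \overrightarrow{\text{king}}_{\text{stem}} + \overrightarrow{\text{ly}}_{\text{suf}}$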
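The additive composition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the two-dimensional vectors, the `"form|role"` key scheme, and the helper names `add_vectors` and `compose` are all assumptions made for the example.

```python
from typing import Dict, List

Vector = List[float]

def add_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(components) for components in zip(*vectors)]

def compose(word: str, factors: List[str], table: Dict[str, Vector]) -> Vector:
    """Word vector = word-identity vector + sum of its morpheme vectors."""
    keys = [f"{word}|id"] + factors
    return add_vectors([table[key] for key in keys])

# Hypothetical toy vectors; the values carry no linguistic meaning.
table: Dict[str, Vector] = {
    "unkingly|id": [0.25, 0.0],
    "un|pre": [-1.0, 0.5],
    "king|stem": [2.0, 1.0],
    "ly|suf": [0.0, -0.5],
    "hangover|id": [0.5, 0.0],
    "overhang|id": [0.0, 0.5],
    "hang|stem": [1.0, 1.0],
    "over|stem": [0.25, 0.75],
}

unkingly = compose("unkingly", ["un|pre", "king|stem", "ly|suf"], table)

# A pure bag of morphemes cannot tell these two words apart ...
bag_hangover = add_vectors([table["hang|stem"], table["over|stem"]])
bag_overhang = add_vectors([table["over|stem"], table["hang|stem"]])

# ... but adding the word-identity vector separates them again.
hangover = compose("hangover", ["hang|stem", "over|stem"], table)
overhang = compose("overhang", ["over|stem", "hang|stem"], table)

print(unkingly, bag_hangover == bag_overhang, hangover != overhang)
```

The identity vector is what lets a non-compositional word such as greenhouse drift away from the sum of its stems, while rare words can still lean on their shared morpheme vectors.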

SIMPLEST VECTOR-BASED PROBABILISTIC LM
LBL (log-bilinear model) (Mnih & Hinton, 2007; Mnih & Teh, 2012)

“colorless green ideas sleep furiously .”

ADD MORPHEME VECTORS INSIDE LM
LBL++

“colorless green ideas sleep furiously .”
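For orientation, the usual log-bilinear scoring rule and one way to plug additive morpheme vectors into it can be written as below. The notation is an assumption of this sketch and does not come from the slides: $\mathbf{q}$ and $\mathbf{r}$ are context and target word vectors, $C_j$ is a position-specific matrix, $b$ is a bias, and $\mu(w)$ is the set of factors (word identity plus morphemes) of $w$.

```latex
\begin{align}
  \mathbf{p} &= \sum_{j=1}^{n-1} C_j \,\mathbf{q}_{w_j}
    && \text{predicted vector from the } n-1 \text{ context words} \\
  P(w \mid w_1, \dots, w_{n-1}) &=
    \frac{\exp\left(\mathbf{p} \cdot \mathbf{r}_{w} + b_{w}\right)}
         {\sum_{v} \exp\left(\mathbf{p} \cdot \mathbf{r}_{v} + b_{v}\right)}
    && \text{softmax over the vocabulary} \\
  \tilde{\mathbf{q}}_{w} &= \sum_{f \in \mu(w)} \mathbf{q}_{f},
  \qquad
  \tilde{\mathbf{r}}_{w} = \sum_{f \in \mu(w)} \mathbf{r}_{f}
    && \text{composed vectors used in place of } \mathbf{q}_w, \mathbf{r}_w
\end{align}
```

With 4-gram models, $n = 4$, so three context words contribute to $\mathbf{p}$.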

COMPUTATIONAL EFFICIENCY
Problem: Each probability query requires normalisation over the vocabulary.
- O(vocab size)
- rich morphology ⇒ large vocabulary

SOLUTION: DECOMPOSE MODEL USING WORD CLASSES
$P(\text{word} \mid \text{history}) = P(\text{class}(\text{word}) \mid \text{history}) \times P(\text{word} \mid \text{class}(\text{word}), \text{history})$
- use unsupervised Brown clustering
- each LM query becomes $2 \times O(\sqrt{\text{vocab size}})$ ⇒ fast enough for MT decoding
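The class decomposition can be illustrated with a toy example. This is a sketch under stated assumptions: the class assignments are hand-made rather than produced by Brown clustering, the scores are fixed numbers standing in for model outputs for a single history, and every name (`members`, `class_scores`, `word_scores`, `class_factored_prob`, `normalisation_terms`) is hypothetical.

```python
import math
from typing import Dict, List

def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical hand-made classes (the slides use Brown clustering instead).
members: Dict[str, List[str]] = {
    "c0": ["king", "queen"],
    "c1": ["abdicated", "slept"],
}
word_class = {w: c for c, words in members.items() for w in words}

# Toy unnormalised scores for one fixed history.
class_scores = {"c0": 1.0, "c1": 0.5}
word_scores = {"king": 2.0, "queen": 1.0, "abdicated": 0.0, "slept": -1.0}

def class_factored_prob(word: str) -> float:
    """P(word | h) = P(class(word) | h) * P(word | class(word), h)."""
    cls = word_class[word]
    classes = list(members)
    # First normalisation: over classes only.
    p_class = softmax([class_scores[c] for c in classes])[classes.index(cls)]
    # Second normalisation: over the words inside the chosen class only.
    siblings = members[cls]
    p_word = softmax([word_scores[w] for w in siblings])[siblings.index(word)]
    return p_class * p_word

def normalisation_terms(vocab_size: int) -> int:
    """Terms summed per query, assuming sqrt(V) classes of sqrt(V) words each."""
    root = math.isqrt(vocab_size)
    return 2 * root

total = sum(class_factored_prob(w) for w in word_class)
print(total, normalisation_terms(10_000))
```

The factored probabilities still sum to one over the whole vocabulary, yet each query touches only the class scores plus the scores of one class. With balanced classes, a 10,000-word vocabulary needs about 200 terms per query instead of 10,000.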

EVALUATION OVERVIEW
Setup:
- 4-gram models
- Czech, English, French, German, Spanish, Russian
- train on 20–50m tokens
- large vocabularies (exclude 5% of singletons)

Three evaluation contexts:
- Perplexity on test data
- Word similarity rating
- Machine translation

PERPLEXITY IMPROVEMENTS BY LANGUAGE
CLBL→CLBL++
[Bar chart: perplexity improvement in % (axis 0 to 6) for CS, DE, EN, ES, FR, RU. Bar labels as extracted: 683→643, 422→404, 313→300, 281→273, 207→203, 232→227.]
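The percentage axis is consistent with relative perplexity reduction, which can be recomputed from the before and after values on the chart. This is a sketch that assumes the plotted quantity is (before − after) / before. The pairs are listed in extraction order and are not matched to languages here.

```python
from typing import List, Tuple

# CLBL -> CLBL++ perplexities as they appear on the slide (extraction order).
pairs: List[Tuple[int, int]] = [
    (683, 643), (422, 404), (313, 300), (281, 273), (207, 203), (232, 227),
]

def relative_reduction(before: float, after: float) -> float:
    """Relative perplexity reduction in percent."""
    return 100.0 * (before - after) / before

reductions = [relative_reduction(before, after) for before, after in pairs]

for (before, after), reduction in zip(pairs, reductions):
    print(f"{before}->{after}: {reduction:.1f}%")
```

All six values fall between roughly 2% and 6%, which matches the range of the chart's vertical axis.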

PERPLEXITY IMPROVEMENTS ON GERMAN (BREAK-DOWN BY TOKEN FREQUENCY)
CLBL→CLBL++
[Bar chart: perplexity improvement in % (axis 0 to 20) by token frequency.]