Dynamic Topic Adaptation for SMT Using Distributional Profiles
Eva Hasler, Barry Haddow, Philipp Koehn
School of Informatics, University of Edinburgh

WMT 2014

Outline
• Introduction
• Model description
• Experimental results
• Conclusions

Adaptation means ...
• taking contextual information into account, for example the domain of a text

Limitations of domain adaptation
• assumption: known, homogeneous target domain
• domains defined in terms of corpus boundaries

Context dependence in standard phrase-based models
• translation model: source context within same phrase
• language model context
• no wider sentence or document context


Task: Translation of a diverse set of documents
• typical scenario in web translation
• online MT system has to translate diverse documents from different topics & genres
• domain adaptation techniques not easily applicable
→ adapt dynamically


[Figure: contextual information available to the translation system, built up in four stages: no contextual information; sentence-level context information; document-level context information; both types of contextual information]


Related work
Using source side context
• integrate WSD classifier with local word and POS features into MT systems [Carpuat and Wu, 2007, Chan et al., 2007]

Using topic models
• learn document-level topics to adapt translation features [Eidelman et al., 2012, Hasler et al., 2014]
• learn sentence-level topics, compute similarity of training and test sentences [Banchs and Costa-jussà, 2011]

Cross-domain adaptation
• define vector profiles of phrase pairs over training corpora, adapt to an in-domain development set [Chen et al., 2013]


Proposed approach
Capture semantics of translation units
• unit: phrase pair
• abstract from lexical forms in training contexts
• encode contextual information independent of corpus boundaries: lower-dimensional topic mixture

Test time: compute context similarity
• when translating a document, first measure its semantic content
• during decoding, favour semantically similar translation units (dynamic adaptation)


Phrase Pair Topic Model
How to learn semantic representations?
• represent each phrase pair as distributional profile: “document” containing all context words
• collect all source context words in local training contexts of a phrase pair
• learn latent representation θpp for each phrase pair
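To make this concrete, here is a minimal sketch of how such profiles could be collected (the input format and helper names are assumptions for illustration, not the authors' implementation):

```python
from collections import defaultdict

def build_profiles(phrase_occurrences):
    """Build one distributional profile ("document") per phrase pair.

    phrase_occurrences yields (phrase_pair, source_tokens, (start, end))
    triples, one per occurrence of the phrase pair in the training data.
    """
    profiles = defaultdict(list)
    for phrase_pair, tokens, (start, end) in phrase_occurrences:
        # The source words surrounding this occurrence are its local
        # context; accumulated over all occurrences they form the profile.
        profiles[phrase_pair].extend(tokens[:start] + tokens[end:])
    return profiles
```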


Model for training
For each of P phrase pairs ppi in the collection:
1. Draw a topic distribution from an asymmetric Dirichlet prior, θp ∼ Dirichlet(α0, α, ..., α).
2. For each position c in the distributional profile of ppi, draw a topic from that distribution, zp,c ∼ Multinomial(θp).
3. Conditioned on topic zp,c, choose a context word wp,c ∼ Multinomial(ψzp,c).
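Read procedurally, the generative story looks as follows (a minimal numpy sketch; the topic-word distributions ψ and the Dirichlet parameters are assumed given):

```python
import numpy as np

def generate_profile(psi, alpha, length, rng=None):
    """Sample one distributional profile under the model.

    psi    : (K, V) array of topic-word distributions, one row per topic
    alpha  : length-K asymmetric Dirichlet parameters (alpha_0, alpha, ..., alpha)
    length : number of context-word positions in the profile
    """
    rng = rng or np.random.default_rng()
    theta = rng.dirichlet(alpha)                 # 1. topic mixture theta_p
    profile = []
    for _ in range(length):
        z = rng.choice(len(psi), p=theta)        # 2. topic z_{p,c}
        w = rng.choice(psi.shape[1], p=psi[z])   # 3. context word w_{p,c}
        profile.append(w)
    return theta, profile
```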


Model for testing
For each of L test sentences (local) in the collection:
1. Draw a topic distribution from an asymmetric Dirichlet prior, θl ∼ Dirichlet(α0, α, ..., α).
2. For each position c in the test sentence, draw a topic from that distribution, zl,c ∼ Multinomial(θl).
3. Conditioned on topic zl,c, choose a context word wl,c ∼ Multinomial(ψzl,c).
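At test time the topic-word distributions ψ from training stay fixed and only the sentence mixture θl is inferred. A minimal collapsed Gibbs sketch (hyperparameters and iteration count are illustrative):

```python
import numpy as np

def infer_theta(word_ids, psi, alpha, iters=200, seed=0):
    """Infer the topic mixture for one test sentence, with psi held fixed.

    word_ids : vocabulary indices of the sentence's words
    psi      : (K, V) topic-word distributions from training
    alpha    : length-K Dirichlet parameters (alpha_0, alpha, ..., alpha)
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    K = psi.shape[0]
    z = rng.integers(K, size=len(word_ids))        # random initial assignments
    counts = np.bincount(z, minlength=K).astype(float)
    for _ in range(iters):
        for c, w in enumerate(word_ids):
            counts[z[c]] -= 1                      # remove current assignment
            p = (counts + alpha) * psi[:, w]       # P(z_c = k | rest), unnormalised
            z[c] = rng.choice(K, p=p / p.sum())    # resample topic for position c
            counts[z[c]] += 1
    return (counts + alpha) / (counts + alpha).sum()
```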


Learned topic representations

[Figure: inferred topic mixtures for noyau → kernel, noyau → nucleus, noyau → core]

• some ambiguity remains: both kernel and core occur in IT contexts as translations of noyau


Applying the model to unseen test documents


Types of similarity features
• local: similarity feature using local context
• global: similarity feature using global context
• +: log-linear combination
• ⊕: additive combination of topic vectors
• ⊗: multiplicative combination of topic vectors
• ~: weighted combination that depends on sentence length
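A minimal sketch of the vector combinations (cosine similarity against a phrase pair's topic vector and the length-dependent weight are illustrative assumptions, not the exact formulation):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def combine(local_vec, global_vec, mode, sent_len=0, max_len=50):
    """Combine sentence-level (local) and document-level (global) topic vectors."""
    if mode == "add":                       # ⊕: additive combination
        v = local_vec + global_vec
    elif mode == "mult":                    # ⊗: multiplicative combination
        v = local_vec * global_vec
    elif mode == "weighted":                # ~: weight depends on sentence length,
        lam = min(sent_len / max_len, 1.0)  #    trusting local context for longer sentences
        v = lam * local_vec + (1 - lam) * global_vec
    else:
        raise ValueError(mode)
    return v / v.sum()

# '+' (log-linear combination) keeps two separate decoder features instead:
# one similarity score from local_vec, one from global_vec, weighted during tuning.
```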


Training data (French-English)

Data          Train        Dev        Test
Mixed         354K (6450)  2453 (39)  5664 (112)
Commoncrawl   110K         818        1892
NewsCom       103K         817        1878
Ted           140K         818        1894

Baselines (trained with Moses toolkit)
• Concatenation baseline (no adaptation)
• Domain adaptation baselines (train and test domains known), one model per domain:
  • LIN-TM: linear phrase table interpolation [Sennrich, 2012]
  • FILLUP: phrase table fill-up [Bisazza et al., 2011]

Topic-adapted model
• add topic-adapted features per sentence-level phrase table (see the sketch below)
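As a rough illustration of adding such features, a sketch that appends one extra score to each rule of a Moses-style phrase table (the `|||`-separated layout is the standard Moses format; the scoring function is a stand-in for the topic similarity computation):

```python
def add_similarity_feature(in_path, out_path, score_fn):
    """Append one extra feature score to every rule of a Moses-style phrase table."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            fields = line.rstrip("\n").split(" ||| ")
            src, tgt = fields[0], fields[1]
            feat = score_fn(src, tgt)              # e.g. topic similarity for this sentence
            fields[2] = f"{fields[2]} {feat:.6f}"  # extend the feature score column
            fout.write(" ||| ".join(fields) + "\n")
```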


Combining local and global context (BLEU; model with 50 latent topics)

Model   Baseline  global  global+local  global⊕local  global⊗local  global~local
Mixed   26.86     27.27   *27.43        *27.49        27.34         *27.45
Cc      19.61     20.12   20.18         20.30         20.24         20.22
Nc      29.42     29.48   29.65         29.66         29.61         29.51
Ted     31.88     32.55   32.79         32.76         32.50         32.79

Gains of global⊕local over the baseline: Mixed +0.63, Cc +0.69, Nc +0.24, Ted +0.88.


Examples of ambiguous source phrases

Source: Le noyau contient de nombreux pilotes, afin de fonctionner chez la plupart des utilisateurs.
Reference: The precompiled kernel includes a lot of drivers, in order to work for most users.

Source: Il est prudent de consulter les pages de manuel ou les faq spécifiques à votre os.
Reference: It's best to consult the man pages or faqs for your os.

Source: Nous fournissons nano (un petit éditeur), vim (vi amélioré), qemacs (clone de emacs), elvis, joe.
Reference: Nano (a lightweight editor), vim (vi improved), qemacs (emacs clone), elvis and joe.

Source: Elle a introduit des politiques [..] à côté des relations de gouvernement à gouvernement traditionnelles.
Reference: She has introduced policies [..] alongside traditional government-to-government relations.


Examples of ambiguous source phrases (continued)

Translations chosen by each model variant:

Phrase         global        local      global⊕local
noyau →        kernel        nucleus    kernel
os →           os            bones      os
elvis →        the king      elvis      the king
relations →    relationship  relations  relations


Comparison with domain adaptation (BLEU)

        Domain-adapted     Topic-adapted
Model   LIN-TM   FILLUP    global⊕local   >LIN-TM   >FILLUP
Mixed   27.24    27.12     *27.49         +0.25     +0.37
Cc      19.61    19.36     20.30          +0.69     +0.94
Nc      29.87    29.78     29.66          -0.21     -0.12
Ted     32.73    32.71     32.76          +0.03     +0.05

• Commoncrawl documents are the most diverse set
• News Commentary documents are the least diverse set


Combination with a document similarity feature
• similar to [Banchs and Costa-jussà, 2011]
• compute max sim score for each applicable phrase pair: noyau → kernel, noyau → nucleus, noyau → core
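A minimal sketch of the max-similarity score (cosine over topic vectors is an assumption for illustration):

```python
import numpy as np

def max_sim(doc_vec, candidate_vecs):
    """Best similarity between the test document vector and any
    translation option of a source phrase, e.g. noyau -> kernel/nucleus/core."""
    return max(
        float(doc_vec @ v / (np.linalg.norm(doc_vec) * np.linalg.norm(v)))
        for v in candidate_vecs
    )
```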


Combination with a document similarity feature (BLEU)

Model   Baseline  +docSim  +global⊕local  +global~local
Mixed   26.86     27.22    *27.58         *27.60
Cc      19.61     20.11    20.34          20.35
Nc      29.42     29.63    29.71          29.70
Ted     31.88     32.40    32.96          33.03

Gains of +global~local over the baseline: Mixed +0.74, Cc +0.74, Nc +0.28, Ted +1.15.


Conclusions

• introduced Phrase Pair Topic model:
  • learns semantic representations for translation units
  • provides compact way of storing contextual information
• translation model is dynamically adapted to local/global test context
• adaptation with similarity features → efficient at test time
• combining information from different scopes and topic granularity performs better than each feature separately


Thank you!


References

Banchs, R. E. and Costa-jussà, M. R. (2011). A Semantic Feature for Statistical Machine Translation. In Proceedings of SSST-5, Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation.

Bisazza, A., Ruiz, N., and Federico, M. (2011). Fill-up versus Interpolation Methods for Phrase-based SMT Adaptation. In Proceedings of IWSLT.

Carpuat, M. and Wu, D. (2007). Improving Statistical Machine Translation using Word Sense Disambiguation. In Proceedings of EMNLP, pages 61–72.

Chan, Y. S., Ng, H. T., and Chiang, D. (2007). Word Sense Disambiguation Improves Statistical Machine Translation. In Proceedings of ACL.


Chen, B., Kuhn, R., and Foster, G. (2013). Vector Space Model for Adaptation in Statistical Machine Translation. In Proceedings of ACL, pages 1285–1293.

Eidelman, V., Boyd-Graber, J., and Resnik, P. (2012). Topic Models for Dynamic Translation Model Adaptation. In Proceedings of ACL.

Hasler, E., Blunsom, P., Koehn, P., and Haddow, B. (2014). Dynamic Topic Adaptation for Phrase-based MT. In Proceedings of EACL, Gothenburg, Sweden.

Sennrich, R. (2012). Perplexity Minimization for Translation Model Domain Adaptation in Statistical Machine Translation. In Proceedings of EACL.