Dynamic Topic Adaptation for SMT Using Distributional Profiles
Eva Hasler, Barry Haddow, Philipp Koehn
School of Informatics, University of Edinburgh
WMT 2014
Introduction
Model description
Experimental results
Conclusions
Adaptation means ...
• taking contextual information into account, for example the domain of a text
Limitations of domain adaptation
• assumption: known, homogeneous target domain
• domains defined in terms of corpus boundaries
Context dependence in standard phrase-based models
• translation model: source context within the same phrase
• language model context
• no wider sentence or document context
Task: Translation of a diverse set of documents
• typical scenario in web translation
• an online MT system has to translate diverse documents from different topics & genres
• domain adaptation techniques are not easily applicable
→ adapt dynamically
Illustration (figure shown in four stages):
• No contextual information
• Sentence-level context information
• Document-level context information
• Both types of contextual information
Related work
Using source side context
• integrate WSD classifiers with local word and POS features into MT systems [Carpuat and Wu, 2007, Chan et al., 2007]
Using topic models
• learn document-level topics to adapt translation features [Eidelman et al., 2012, Hasler et al., 2014]
• learn sentence-level topics, compute similarity of training and test sentences [Banchs and Costa-jussà, 2011]
Cross-domain adaptation
• define vector profiles of phrase pairs over training corpora, adapt to an in-domain development set [Chen et al., 2013]
Proposed approach
Capture semantics of translation units
• unit: phrase pair
• abstract from lexical forms in training contexts
• encode contextual information independent of corpus boundaries: lower-dimensional topic mixture
Test time: compute context similarity
• when translating a document, first measure its semantic content
• during decoding, favour semantically similar translation units (dynamic adaptation)
Phrase Pair Topic Model
How to learn semantic representations?
• represent each phrase pair as a distributional profile: a “document” containing all of its context words
• collect all source context words in the local training contexts of a phrase pair
• learn a latent representation θpp for each phrase pair
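The profile construction above can be sketched as follows. The window size and the occurrence test (string match on both sides, instead of using the word alignment as a real extractor would) are simplifying assumptions:

```python
from collections import defaultdict

def build_profiles(bitext, phrase_pairs, window=2):
    """Collect a distributional profile per phrase pair: the source words
    surrounding each training occurrence. Simplification: a phrase pair
    "occurs" in a sentence pair when both sides match as strings."""
    profiles = defaultdict(list)
    for src, tgt in bitext:
        toks = src.split()
        for sp, tp in phrase_pairs:
            if tp not in tgt:
                continue
            p = sp.split()
            n = len(p)
            for i in range(len(toks) - n + 1):
                if toks[i:i + n] == p:
                    left = toks[max(0, i - window):i]
                    right = toks[i + n:i + n + window]
                    profiles[(sp, tp)].extend(left + right)
    return profiles

bitext = [("le noyau contient des pilotes", "the kernel includes drivers"),
          ("le noyau de la cellule", "the nucleus of the cell")]
pairs = [("noyau", "kernel"), ("noyau", "nucleus")]
profiles = build_profiles(bitext, pairs)
# profiles[("noyau", "kernel")] == ["le", "contient", "des"]
```

Each profile then serves as the "document" on which the topic model is trained.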
Model for training
For each of P phrase pairs pp_i in the collection:
1. Draw a topic distribution from an asymmetric Dirichlet prior, θ_p ∼ Dirichlet(α_0, α, ..., α).
2. For each position c in the distributional profile of pp_i, draw a topic from that distribution, z_{p,c} ∼ Multinomial(θ_p).
3. Conditioned on topic z_{p,c}, choose a context word w_{p,c} ∼ Multinomial(ψ_{z_{p,c}}).
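The three-step generative story can be sketched with numpy; the topic count, vocabulary size, and prior values below are illustrative, not the settings used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_profile(alpha, psi, profile_len):
    """Generative story for one phrase pair's distributional profile:
    draw theta_p from the asymmetric Dirichlet prior, then for each
    position draw a topic z and a context word w from that topic."""
    theta = rng.dirichlet(alpha)                # step 1: topic mixture theta_p
    words = []
    for _ in range(profile_len):
        z = rng.choice(len(psi), p=theta)       # step 2: z_{p,c} ~ Mult(theta_p)
        w = rng.choice(psi.shape[1], p=psi[z])  # step 3: w_{p,c} ~ Mult(psi_z)
        words.append(int(w))
    return theta, words

K, V = 3, 10                                    # illustrative sizes
alpha = np.array([0.5] + [0.1] * (K - 1))       # asymmetric: alpha_0 > alpha
psi = rng.dirichlet(np.ones(V), size=K)         # toy topic-word distributions
theta, words = sample_profile(alpha, psi, profile_len=20)
```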
Model for testing
For each of L test sentences (local) in the collection:
1. Draw a topic distribution from an asymmetric Dirichlet prior, θ_l ∼ Dirichlet(α_0, α, ..., α).
2. For each position c in the test sentence, draw a topic from that distribution, z_{l,c} ∼ Multinomial(θ_l).
3. Conditioned on topic z_{l,c}, choose a context word w_{l,c} ∼ Multinomial(ψ_{z_{l,c}}).
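At test time θ_l must be inferred for unseen sentences with the topic-word distributions ψ held fixed. The slides do not spell out the inference procedure; a collapsed-Gibbs fold-in, one standard choice, could look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def infer_theta(words, psi, alpha, iters=50):
    """Estimate theta_l for a test sentence with psi held fixed, by
    resampling each token's topic assignment (collapsed-Gibbs fold-in)."""
    K = psi.shape[0]
    z = rng.integers(K, size=len(words))
    counts = np.bincount(z, minlength=K).astype(float)
    for _ in range(iters):
        for i, w in enumerate(words):
            counts[z[i]] -= 1                  # remove token i's assignment
            p = (counts + alpha) * psi[:, w]   # P(z_i | rest), up to a constant
            z[i] = rng.choice(K, p=p / p.sum())
            counts[z[i]] += 1
    return (counts + alpha) / (counts + alpha).sum()

K, V = 3, 12                                   # illustrative sizes
psi = rng.dirichlet(np.ones(V), size=K)        # stand-in for learned topics
alpha = np.array([0.5, 0.1, 0.1])              # asymmetric prior
theta_l = infer_theta([0, 4, 4, 7, 1], psi, alpha)
```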
Learned topic representations
noyau → kernel
noyau → nucleus
noyau → core
• some ambiguity remains: both kernel and core occur in IT contexts as translations of noyau
Applying the model to unseen test documents
Types of similarity features
• local: similarity feature using local context
• global: similarity feature using global context
• +: log-linear combination
• ⊕: additive combination of topic vectors
• ⊗: multiplicative combination of topic vectors
• ~: weighted combination that depends on sentence length
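A minimal sketch of how such features could be computed. The cosine measure and the linear length-dependent weighting in `~` (with a hypothetical cap `max_len`) are assumptions, not details taken from the slides:

```python
import numpy as np

def cosine(a, b):
    """Similarity between a phrase pair's topic vector and a context vector."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def combine(theta_global, theta_local, mode, sent_len=None, max_len=40):
    """Combine document-level and sentence-level topic vectors:
    'add' for ⊕, 'mul' for ⊗, 'len' for the ~ length-weighted mix."""
    if mode == 'add':
        v = theta_global + theta_local
    elif mode == 'mul':
        v = theta_global * theta_local
    elif mode == 'len':
        lam = min(sent_len / max_len, 1.0)   # longer sentence -> trust local more
        v = (1 - lam) * theta_global + lam * theta_local
    return v / v.sum()                       # renormalise to a topic mixture

theta_pp = np.array([0.7, 0.2, 0.1])         # phrase pair representation
theta_doc = np.array([0.6, 0.3, 0.1])        # global (document) context
theta_sent = np.array([0.2, 0.7, 0.1])       # local (sentence) context
ctx = combine(theta_doc, theta_sent, 'add')  # global⊕local
feature = cosine(theta_pp, ctx)              # similarity feature for the decoder
```

The feature value is then added to the log-linear model alongside the standard phrase table scores.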
Training data (French-English)

Data          Train         Dev         Test
Mixed         354K (6450)   2453 (39)   5664 (112)
Commoncrawl   110K          818         1892
NewsCom       103K          817         1878
Ted           140K          818         1894

Baselines (trained with the Moses toolkit)
• Concatenation baseline (no adaptation)
• Domain adaptation baselines (train and test domains known):
  • LIN-TM: linear phrase table interpolation [Sennrich, 2012]
  • FILLUP: phrase table fill-up [Bisazza et al., 2011]
  • one model per domain

Topic-adapted model
• add topic-adapted features per sentence-level phrase table
Combining local and global context

Model   Baseline   +global   +local    global⊕local   global⊗local   global~local   Δ(⊕ vs. baseline)
Mixed   -26.86     -27.27    *27.43    *27.49         -27.34         *27.45         +0.63
Cc      19.61      20.12     20.18     20.30          20.24          20.22          +0.69
Nc      29.42      29.48     29.65     29.66          29.61          29.51          +0.24
Ted     31.88      32.55     32.79     32.76          32.50          32.79          +0.88

• model with 50 latent topics
Examples of ambiguous source phrases

Source: Le noyau contient de nombreux pilotes, afin de fonctionner chez la plupart des utilisateurs.
Reference: The precompiled kernel includes a lot of drivers, in order to work for most users.

Source: Il est prudent de consulter les pages de manuel ou les faq spécifiques à votre os.
Reference: It’s best to consult the man pages or faqs for your os.

Source: Nous fournissons nano (un petit éditeur), vim (vi amélioré), qemacs (clone de emacs), elvis, joe.
Reference: Nano (a lightweight editor), vim (vi improved), qemacs (emacs clone), elvis and joe.

Source: Elle a introduit des politiques [..] à côté des relations de gouvernement à gouvernement traditionnelles.
Reference: She has introduced policies [..] alongside traditional government-to-government relations.
Examples of ambiguous source phrases

Model         global         local       global⊕local
noyau →       kernel         nucleus     kernel
os →          os             bones       os
elvis →       the king       elvis       the king
relations →   relationship   relations   relations
Comparison with domain adaptation

          Domain-adapted        Topic-adapted
Model     LIN-TM     FILLUP     global⊕local    >LIN-TM   >FILLUP
Mixed     -27.24     -27.12     *27.49          +0.25     +0.37
Cc        19.61      19.36      20.30           +0.69     +0.94
Nc        29.87      29.78      29.66           -0.21     -0.12
Ted       32.73      32.71      32.76           +0.03     +0.05

• Commoncrawl documents are the most diverse set
• News Commentary documents are the least diverse set
Combination with a document similarity feature
• similar to [Banchs and Costa-jussà, 2011]
• compute max sim score for each applicable phrase pair: noyau → kernel, noyau → nucleus, noyau → core
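One reading of this max-sim feature, sketched under the assumption that similarity is cosine over topic vectors (the toy vectors and the three-topic space are illustrative):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def doc_sim(doc_theta, phrase_pair_thetas):
    """Document similarity feature: for a source phrase, take the maximum
    similarity between the test document's topic vector and the topic
    vector of any applicable phrase pair."""
    return max(cosine(doc_theta, t) for t in phrase_pair_thetas.values())

doc = np.array([0.8, 0.1, 0.1])                      # IT-heavy test document
candidates = {
    ("noyau", "kernel"):  np.array([0.9, 0.05, 0.05]),
    ("noyau", "nucleus"): np.array([0.1, 0.8, 0.1]),
    ("noyau", "core"):    np.array([0.6, 0.1, 0.3]),
}
score = doc_sim(doc, candidates)                     # dominated by noyau → kernel
```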
Combination with a document similarity feature

Model   Baseline   +docSim   +global⊕local   +global~local   Δ(~ vs. baseline)
Mixed   -26.86     -27.22    *27.58          *27.60          +0.74
Cc      19.61      20.11     20.34           20.35           +0.74
Nc      29.42      29.63     29.71           29.70           +0.28
Ted     31.88      32.40     32.96           33.03           +1.15
Conclusions
• introduced the Phrase Pair Topic model:
  • learns semantic representations for translation units
  • provides a compact way of storing contextual information
• translation model is dynamically adapted to local/global test context
• adaptation with similarity features → efficient at test time
• combining information from different scopes and topic granularities performs better than each feature separately
Thank you!
Banchs, R. E. and Costa-jussà, M. R. (2011). A Semantic Feature for Statistical Machine Translation. In Proceedings of SSST-5, the Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation.
Bisazza, A., Ruiz, N., and Federico, M. (2011). Fill-up versus Interpolation Methods for Phrase-based SMT Adaptation. In Proceedings of IWSLT.
Carpuat, M. and Wu, D. (2007). Improving Statistical Machine Translation using Word Sense Disambiguation. In Proceedings of EMNLP, pages 61–72.
Chan, Y. S., Ng, H. T., and Chiang, D. (2007). Word Sense Disambiguation Improves Statistical Machine Translation. In Proceedings of ACL.
Chen, B., Kuhn, R., and Foster, G. (2013). Vector Space Model for Adaptation in Statistical Machine Translation. In Proceedings of ACL, pages 1285–1293.
Eidelman, V., Boyd-Graber, J., and Resnik, P. (2012). Topic Models for Dynamic Translation Model Adaptation. In Proceedings of ACL.
Hasler, E., Blunsom, P., Koehn, P., and Haddow, B. (2014). Dynamic Topic Adaptation for Phrase-based MT. In Proceedings of EACL, Gothenburg, Sweden.
Sennrich, R. (2012). Perplexity Minimization for Translation Model Domain Adaptation in Statistical Machine Translation. In Proceedings of EACL.