SAS STATISTICAL MACHINE TRANSLATION SYSTEM FOR WAT 2014
Rui Wang, Xu Yang and Yan Gao
SAS Institute Inc, Beijing, China
Introduction
This paper describes the machine translation system employed by SAS Institute Inc in the 1st Workshop on Asian Translation. We participate in two subtasks in this year’s WAT: Chinese to Japanese and English to Japanese. The sentence structure of Japanese differs from that of English and Chinese: Japanese is typically a Subject-Object-Verb (SOV) language, while Chinese and English are Subject-Verb-Object (SVO) languages, as illustrated in the following example.
Japanese: 私は 日本語 を好きです 。
Gloss: (I) (Japanese) (like) (.)
Order: S O V
Baseline (phrase-based model provided by the organizer):
Japanese: Juman segmentation tool
Chinese: Stanford Word Segmenter
SAS segmentation: SAS segmentation tool of SAS® Text Miner for Chinese and Japanese.
Reordering rule: Head Finalization [Isozaki 2010]
Move syntactic heads to the end of the corresponding syntactic constituents.
Use dependency parser: ENJU Parser (developed by the University of Tokyo)
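The head-final idea can be sketched on a toy tree. This is a minimal illustration, not the system's implementation: the `Node` class, the `head` index annotation, and the hand-built tree for "I love the children" are all assumptions made for the example, and no real parser output is used.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy constituent node (hypothetical structure, not real parser output)."""
    label: str
    word: Optional[str] = None                    # set for leaves only
    children: List["Node"] = field(default_factory=list)
    head: int = 0                                 # index of the head child (assumed annotation)


def head_finalize(node: Node) -> Node:
    """Return a copy of the tree with every head child moved to the end."""
    if not node.children:
        return node
    kids = [head_finalize(c) for c in node.children]
    head_child = kids.pop(node.head)
    # After the move, the head is the last child.
    return Node(node.label, children=kids + [head_child], head=len(kids))


def leaves(node: Node) -> List[str]:
    """Read the words of the tree from left to right."""
    if not node.children:
        return [node.word]
    return [w for c in node.children for w in leaves(c)]


# Hand-built toy tree for "I love the children".
tree = Node("S", head=1, children=[
    Node("NP", word="I"),
    Node("VP", head=0, children=[
        Node("V", word="love"),
        Node("NP", head=1, children=[
            Node("DT", word="the"),
            Node("NN", word="children"),
        ]),
    ]),
])

reordered = leaves(head_finalize(tree))
print(" ".join(reordered))
```

In this toy tree only the verb phrase changes, because its head (the verb) is the only head that is not already the last child of its constituent.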
Statistical machine translation between Japanese and SVO languages is particularly difficult because of the long-distance differences in word order. We propose a simple syntactic reordering approach that transforms Chinese/English sentences into an SOV-like word order. In addition, we apply the segmentation tool in SAS® Text Miner to the corpus and obtain an improvement in translation.
PP-rule: PP (P (XXX)) → PP ((XXX) P), which moves P to the end of the PP.
[Figure (a): parse tree of a Chinese example containing a PP headed by 根据 (based on), with the words 大型 (large), 零售 (retail), 商店 (shop), 的, 选址法 (locating method), 进行 (process) and 方针 (policy), shown alongside the Japanese text 時点 で 検出されて いる 影響 および 今後 予測さ れる 影響 を まとめた 。]
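The PP-rule can be sketched in a few lines. The tuple-based tree encoding, the function names, and the three-word toy tree are assumptions for illustration only. The toy tree reuses words from the poster's example but is not the parse shown in the figure.

```python
def is_preterminal(tree):
    """A preterminal is encoded as (tag, word)."""
    return len(tree) == 2 and isinstance(tree[1], str)


def move_p_to_end(tree):
    """Apply the PP-rule recursively: in every PP that starts with P, move P to the end."""
    if is_preterminal(tree):
        return tree
    label, *kids = tree
    kids = [move_p_to_end(k) for k in kids]
    if label == "PP" and kids and kids[0][0] == "P":
        kids = kids[1:] + kids[:1]
    return (label, *kids)


def words(tree):
    """Read the words of the tree from left to right."""
    if is_preterminal(tree):
        return [tree[1]]
    return [w for k in tree[1:] for w in words(k)]


# Toy tree (hypothetical): a PP headed by 根据 (based on) followed by a verb.
toy = ("VP",
       ("PP", ("P", "根据"), ("NP", ("NN", "方针"))),
       ("VP", ("VV", "进行")))

print(" ".join(words(move_p_to_end(toy))))
```

After the rule is applied, the preposition follows its noun phrase, which mirrors how Japanese postpositions follow the phrase they mark.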
Background
Experiments
Syntactic Reordering Approaches
Introduce the system architecture of SAS at WAT 2014; Describe the reordering approaches in detail; Show experimental results to illustrate the effect of our system.
[Figure: reordering example on a parse tree with constituent nodes C3, C5, C6 and terminals T1 (I), T2 (love), T3 (the), T4 (children). (a) Original sentence: I love the children. (b) Reordered sentence: I the children love, aligned with the Japanese 私は 子供 を 愛しています.]
Future work:
Consider Japanese case markers in the translation; Add more reordering rules for Chinese-to-Japanese translation; Extend the work to English-to-Chinese translation.