Syntactic reordering in statistical machine translation

OPEN ARCHIVE

dc.contributor.author Elming, Jakob
dc.date.accessioned 2009-09-29T10:54:15Z
dc.date.available 2009-09-29T10:54:15Z
dc.date.issued 2009-09-29
dc.identifier.isbn 9788759383698
dc.identifier.uri http://hdl.handle.net/10398/7922
dc.description.abstract Reordering has been an important topic in statistical machine translation (SMT) for as long as SMT has existed. State-of-the-art SMT systems such as Pharaoh (Koehn, 2004a) still employ a simplistic model of the reordering process to handle non-local reordering. This model penalizes every reordering regardless of the words involved, so a reordering is selected only if it leads to a translation that looks like a much better sentence than the alternative. Recent developments have, however, seen improvements in translation quality following from syntax-based reordering. One such development is the pre-translation approach, which adjusts the source sentence to resemble target-language word order prior to translation. This is done using rules that are either manually created or automatically learned from word-aligned parallel corpora. We introduce a novel approach to syntactic reordering. This approach makes better use of the information in the reordering rules and eliminates problematic biases of previous approaches. Although the approach is examined within a pre-translation reordering framework, it easily extends to other frameworks. Our approach significantly outperforms a state-of-the-art phrase-based SMT system and previous approaches to pre-translation reordering, including Li et al. (2007), Zhang et al. (2007b), and Crego & Mariño (2007). This holds both for a very close language pair, English-Danish, and a very distant language pair, English-Arabic. We also propose automatic reordering rule learning based on a rich set of linguistic information. In contrast to most previous approaches, which extract a large set of rules, our approach produces a small set of predominantly general rules. These provide a good reflection of the main reordering issues of a given language pair. We examine several parameters that may influence the quality of the learned rules.
Finally, we provide a new approach for improving automatic word alignment. This word alignment is used in the above task of automatically learning reordering rules. Our approach learns from hand-aligned data how to combine several automatic word alignments into one superior word alignment. The automatic word alignments are created from the same data preprocessed with different tokenization schemes, thus exploiting the different strengths that different tokenization schemes exhibit in word alignment. We achieve a 38% error reduction for the automatic word alignment. en_US
dc.format.extent 209 pp. en_US
dc.language eng en_US
dc.subject.other PhD theses en_US
dc.title Syntactic reordering in statistical machine translation en_US
dc.type phd en_US
dc.accessionstatus modt09sep29 jobrmo en_US
dc.contributor.corporation Copenhagen Business School. CBS en_US
dc.contributor.corporationshort LIMAC en_US
dc.contributor.department Institut for Internationale Sprogstudier og Vidensteknologi en_US
dc.contributor.departmentshort ISV en_US
dc.contributor.departmentuk Department of International Language Studies and Computational Linguistics en_US
dc.contributor.departmentukshort ISV en_US
dc.idnumber x656602987 en_US
dc.publisher.city Frederiksberg en_US
dc.publisher.year 2008 en_US


This work is licensed under a Creative Commons License.

Files Size Format
jakob_elming.pdf 1009 KB PDF
