Comparing spoken and written translation with post-editing in the English → Japanese Translation Corpus

Title: Comparing spoken and written translation with post-editing in the English → Japanese Translation Corpus
Author: Carl, Michael; Lacruz, Isabel; Yamada, Masaru; Aizawa, Akiko
Abstract: Spoken language applications are becoming increasingly operational and are used in many computer applications today. Translation dictation is a mode of translation in which a translator reads a source text and speaks its translation instead of typing it. It is thus situated between interpretation, where the interpreter hears a text and speaks the translation (e.g., during conference interpreting), and conventional translation, in which a written source text is translated mainly using the keyboard. It is close to sight translation. Translation dictation was a technique used in some translation bureaus in the 1960s and 1970s (Gingold, 1978), but it has been used less frequently since the mid-80s, as professional translators started using micro-computers (Zapata and Kirkedal, 2015). The ALPAC report (Pierce et al., 1966) already mentioned that “productivity of human translators might be as much as four times higher when dictating” as compared to writing, and with today's increasing quality of voice recognition this mode of translation is experiencing a comeback. Automatic Speech Recognition (ASR) systems provide an efficient means of producing texts, and our experiments suggest that for some translators and some types of text, translation dictation might become even more efficient than post-editing of machine translation. In this paper we describe the ENJA15 translation study and corpus. The ENJA15 corpus is a collection of translation process data that was collected in a collaborative effort by CRITT and NII. The ENJA15 data is part of a bigger data set which will enable us to compare human translation production processes across different languages and different translation modes, including from-scratch translation, machine translation post-editing and translation dictation.
URI: http://hdl.handle.net/10398/9281
Date: 2016-02-25
Notes: Paper presented at The 22nd Annual Meeting of the Association for Natural Language Processing (NLP2016). Tohoku University, Japan, March 2016

This work is licensed under a Creative Commons License.

Files: Michael Cral_2016_03.pdf (258.9Kb, PDF), Conference paper