Browsing Department of International Language Studies and Computational Linguistics (ISV) by Title
Now showing items 27-31 of 31
Carl, Michael (2008)
Abstract: The paper introduces a new research strategy for the investigation of human translation behavior. While conventional cognitive research methods make use of think-aloud protocols (TAP), we introduce and investigate User-Activity Data (UAD). UAD consists of the translator’s recorded keystroke and eye-movement behavior, which makes it possible to replay a translation session and to register the subjects’ comments on their own behavior during a retrospective interview. UAD has the advantage of being objective and reproducible and, in contrast to TAP, does not interfere with the translation process. The paper gives the background of this technique and an example of an English-to-Danish translation. Our goal is to elaborate and investigate cognitively grounded basic translation concepts which are materialized and traceable in the UAD and which, at a later stage, will provide the basis for appropriate and targeted help for the translator at a given moment. URI: http://hdl.handle.net/10398/8044 Files in this item: 1
UAD-3.pdf (408.4Kb)
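Replaying a session is possible because a keystroke log is a time-ordered sequence of edits. The Python sketch below reconstructs a target text step by step from such a log. The event format (`KeyEvent` with `"ins"`/`"del"` kinds and character offsets) is a hypothetical simplification for illustration, not the log format of any particular tool, and eye-movement data is ignored.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class KeyEvent:
    """Hypothetical keystroke record (not a real tool's log format)."""
    time_ms: int  # time since session start
    kind: str     # "ins" for insertion, "del" for deletion (hypothetical labels)
    pos: int      # character offset in the target text at the time of the edit
    text: str     # characters inserted, or characters removed


def replay(events: List[KeyEvent]) -> Iterator[Tuple[int, str]]:
    """Apply events in time order, yielding (timestamp, text) after each edit."""
    buf = ""
    for ev in sorted(events, key=lambda e: e.time_ms):
        if ev.kind == "ins":
            buf = buf[:ev.pos] + ev.text + buf[ev.pos:]
        elif ev.kind == "del":
            # Sanity check: the logged deletion must match the current text.
            if buf[ev.pos:ev.pos + len(ev.text)] != ev.text:
                raise ValueError(f"deletion at {ev.pos} does not match text")
            buf = buf[:ev.pos] + buf[ev.pos + len(ev.text):]
        else:
            raise ValueError(f"unknown event kind: {ev.kind!r}")
        yield ev.time_ms, buf


# Toy session: the translator types "Hun løb", deletes "løb", and types "gik".
session = [
    KeyEvent(0, "ins", 0, "Hun"),
    KeyEvent(400, "ins", 3, " løb"),
    KeyEvent(900, "del", 4, "løb"),
    KeyEvent(1200, "ins", 4, "gik"),
]
snapshots = list(replay(session))
for t, text in snapshots:
    print(t, repr(text))
```

Each snapshot corresponds to one frame of a replay. A real logging tool would also record cursor movements, pauses, and gaze data, all of which this sketch omits.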
Elming, Jakob (Frederiksberg, 2008)
Abstract: Reordering has been an important topic in statistical machine translation (SMT) for as long as SMT has been around. State-of-the-art SMT systems such as Pharaoh (Koehn, 2004a) still employ a simplistic model of the reordering process to do non-local reordering. This model penalizes any reordering regardless of the words involved. A reordering is only selected if it leads to a translation that looks like a much better sentence than the alternative. Recent developments have, however, seen improvements in translation quality following from syntax-based reordering. One such development is the pre-translation approach, which adjusts the source sentence to resemble target-language word order prior to translation. This is done based on rules that are either manually created or automatically learned from word-aligned parallel corpora. We introduce a novel approach to syntactic reordering. This approach makes better use of the information in the reordering rules and eliminates problematic biases of previous approaches. Although the approach is examined within a pre-translation reordering framework, it easily extends to other frameworks. Our approach significantly outperforms a state-of-the-art phrase-based SMT system and previous approaches to pre-translation reordering, including (Li et al., 2007; Zhang et al., 2007b; Crego & Mariño, 2007). This holds both for a very close language pair, English-Danish, and a very distant language pair, English-Arabic. We also propose automatic reordering rule learning based on a rich set of linguistic information. As opposed to most previous approaches, which extract a large set of rules, our approach produces a small set of predominantly general rules. These provide a good reflection of the main reordering issues of a given language pair. We examine the influence of several parameters that may affect the quality of the rules learned. Finally, we provide a new approach for improving automatic word alignment.
This word alignment is used in the above task of automatically learning reordering rules. Our approach learns from hand-aligned data how to combine several automatic word alignments into one superior word alignment. The automatic word alignments are created from the same data preprocessed with different tokenization schemes, thus exploiting the different strengths that different tokenization schemes exhibit in word alignment. We achieve a 38% error reduction for the automatic word alignment. URI: http://hdl.handle.net/10398/7922 Files in this item: 1
jakob_elming.pdf (1.033Mb)
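The pre-translation idea can be illustrated with a toy rule that permutes a matched sequence of part-of-speech tags before the sentence is passed to the translation system. The rule format below (a tag pattern plus a permutation) and the single English-to-Danish rule are hypothetical illustrations; they are not the rule formalism, rule-learning procedure, or rule set of the work summarized above.

```python
from typing import List, Tuple

# Hypothetical rule format: (POS-tag pattern, permutation of pattern positions).
Rule = Tuple[Tuple[str, ...], Tuple[int, ...]]


def apply_rule(words: List[str], tags: List[str], rule: Rule) -> List[str]:
    """Reorder every non-overlapping left-to-right match of the rule's tag pattern."""
    pattern, perm = rule
    if not pattern:
        raise ValueError("pattern must not be empty")
    if sorted(perm) != list(range(len(pattern))):
        raise ValueError("permutation must reorder exactly the pattern positions")
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    n = len(pattern)
    out: List[str] = []
    i = 0
    while i < len(words):
        if tuple(tags[i:i + n]) == pattern:
            # Emit the matched span in the order given by the permutation.
            out.extend(words[i + j] for j in perm)
            i += n
        else:
            out.append(words[i])
            i += 1
    return out


# Toy rule imitating Danish verb-second order after a fronted adverbial:
# "Yesterday he saw the film" -> "Yesterday saw he the film" (cf. "I går så han filmen").
v2_rule: Rule = (("ADV", "PRON", "VERB"), (0, 2, 1))
words = ["Yesterday", "he", "saw", "the", "film"]
tags = ["ADV", "PRON", "VERB", "DET", "NOUN"]
reordered = apply_rule(words, tags, v2_rule)
print(" ".join(reordered))
```

The reordered English sentence, whose word order now resembles Danish, would then be handed to the SMT system. A practical system would condition rules on richer syntactic information than flat tag sequences and would need a policy for overlapping or conflicting rules.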
A Program for Recording User Activity Data for Empirical Reading and Writing Research
Carl, Michael (Frederiksberg, 2012)
Abstract: This paper presents a novel implementation of Translog-II. Translog-II is a Windows-oriented program for recording and studying reading and writing processes on a computer. In our research, it is an instrument for acquiring objective, digital data on human translation processes. Like its predecessors, Translog 2000 and Translog 2006, Translog-II consists of two main components, Translog-II Supervisor and Translog-II User, which are used to create a project file, to run text production experiments (in which a user reads, writes, or translates a text), and to replay the session. Translog produces a log file which contains all user activity data from the reading, writing, or translation session and which can be evaluated by external tools. While there is a large body of translation process research based on Translog, this paper gives an overview of the Translog-II functions and its data visualization options. URI: http://hdl.handle.net/10398/8435 Files in this item: 1
Michael_Carl_2012.pdf (824.8Kb)
Quantifying alignment units with keystroke data
Carl, Michael (2009)
Abstract: The paper discusses a method to triangulate process and product data. We suggest converting Translog data into a relational format which contains both process and product data. We outline how this representation allows us to retrieve and correlate the various dimensions of the data more easily. The concept of Alignment Unit (AU) is introduced and contrasted with that of Translation Unit (TU). While AUs refer to translation equivalences in the source and target texts of the product data, TUs refer to cognitive entities that can be observed in the process data. With an (almost) exhaustive fragmentation of the source and target texts into AUs, we are able to distribute and allocate the entire set of keystroke data to appropriate AUs. Using the properties of the keystroke data, AUs are quantified in a novel way which enables us to visualise and investigate the structure of translation production on a fine-grained scale. URI: http://hdl.handle.net/10398/8040 Files in this item: 1
keystrokes.pdf (940.0Kb)
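Allocating keystrokes to AUs can be sketched as an interval lookup: each AU covers a span of the target text, and each keystroke is assigned to the AU whose span contains its position. This Python sketch makes a strong simplifying assumption, also noted in the code: keystroke positions are treated as offsets in the final target text, whereas real logs record positions in a text that changes while it is being edited. The `AlignmentUnit` type and the toy English-Danish alignment are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AlignmentUnit:
    """Hypothetical AU: a source segment aligned to a span of the target text."""
    au_id: str
    source: str
    tgt_start: int  # inclusive character offset in the final target text
    tgt_end: int    # exclusive character offset


def keystrokes_per_au(positions: List[int], aus: List[AlignmentUnit]) -> Dict[str, int]:
    """Count keystrokes falling inside each AU's target span.

    Simplifying assumption: positions are offsets in the FINAL target text and
    AU spans do not overlap. Keystrokes outside every span are counted under
    "unaligned", since the fragmentation into AUs need not be exhaustive.
    """
    ordered = sorted(aus, key=lambda a: a.tgt_start)
    starts = [a.tgt_start for a in ordered]
    counts: Counter = Counter()
    for pos in positions:
        k = bisect_right(starts, pos) - 1  # last AU starting at or before pos
        if k >= 0 and pos < ordered[k].tgt_end:
            counts[ordered[k].au_id] += 1
        else:
            counts["unaligned"] += 1
    return dict(counts)


# Toy alignment of "She went home" with the target text "Hun gik hjem".
aus = [
    AlignmentUnit("au1", "She", 0, 3),    # "Hun"
    AlignmentUnit("au2", "went", 4, 7),   # "gik"
    AlignmentUnit("au3", "home", 8, 12),  # "hjem"
]
# One keystroke per character typed, including the two spaces.
positions = list(range(len("Hun gik hjem")))
counts = keystrokes_per_au(positions, aus)
print(counts)
```

Counting events is only one possible quantification; the same lookup could attach timestamps or deletion flags from each keystroke to its AU.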
Korzen, Iørn; Gylling, Morten (Hamburg, 2011)
Abstract: This paper argues that translators can greatly benefit from contrastive studies of discourse structure. Cross-linguistic studies of Italian and Danish point to significant typological differences in information packaging in the two languages, especially in their use of deverbalisation. Italian sentences tend to include a larger number of Elementary Discourse Units (EDUs), especially propositions, than Danish sentences. A higher percentage of these is rhetorically backgrounded by means of non-finite and nominalised predicates. Danish text structure, on the other hand, is more informationally linear and characterised by a higher number of finite verbs and topic shifts. These typological differences are translated into three simple translation rules concerning (1) the number of EDUs, (2) the rhetorical structure, and (3) the textualisation of rhetorical satellites. URI: http://hdl.handle.net/10398/8416 Files in this item: 1
Korzen_Gylling.pdf (513.0Kb)