Research documents by author "Carl, Michael"
Showing 1-20 of 29
Mesa-Lao, Bartolomé; Carl, Michael (Edinburgh, 2012)
Abstract: In this work package, we evaluate the CASMACAT workbench in field trials to study the use of the workbench in a real-world environment. We will also integrate the workbench into community translation platforms and collect user activity data from both field trials and volunteer translators. This Deliverable covers Tasks 6.1 and 6.2. Task 6.1: Field trials at a translation agency. Three annual field trials to evaluate the CASMACAT workbench in a real-world professional translation environment. Task 6.2: Analysis of translator feedback and activity data. Collect feedback on translators' self-estimation through retrospective interviews and correlate this with the activity data. URI: http://hdl.handle.net/10398/9067 Files in this item: 1
Michael Carl_d6.1.pdf (481.5Kb)
Iglesias, Eva Marcos; Pellegrino, Massimiliano; Carl, Michael; García-Martínez, Mercedes; Mesa-Lao, Bartolomé; Underwood, Nancy (Edinburgh, 2013)
Abstract: In this work package, we evaluate the CasMaCat workbench in field trials to study the use of the workbench in a real-world environment. We will also integrate the workbench into community translation platforms and collect user activity data from both field trials and volunteer translators. This Deliverable covers Tasks 6.1 and 6.2. Task 6.1: Three field trials at a translation agency (Celer Soluciones SL) to evaluate the CasMaCat workbench in a real-world professional translation environment. Task 6.2: Analysis of translator feedback and activity data. Collection of feedback on translators' self-estimation through retrospective interviews. URI: http://hdl.handle.net/10398/9061 Files in this item: 1
Michael Carl_d6.2.pdf (653.9Kb)
Alabau, Vicent; Carl, Michael; Martínez, García; González-Rubio, Jesús; Mesa-Lao, Bartolomé; Ortiz-Martínez, Daniel; Rodrigues, Sofia; Schaeffer, Moritz (Edinburgh, 2014)
Abstract: In this work package, we evaluate the CasMaCat workbench in field trials to study the use of the workbench in a real-world environment. We have also integrated the workbench into community translation platforms and collected user activity data from both field trials and volunteer translators interacting with the workbench. This Deliverable covers Tasks 6.1 and 6.2. Task 6.1: Third field trial at a translation agency (Celer Soluciones SL in Madrid) to evaluate the CasMaCat workbench in a real-world professional translation environment. Task 6.2: Analysis of translator feedback and activity data. Collection of feedback on translators' self-estimation through questionnaires and retrospective interviews. In addition to the originally planned third field trial for 2014, we have also conducted an additional longitudinal study between April and May 2014 (as discussed in the last review meeting, December 2013). URI: http://hdl.handle.net/10398/9054 Files in this item: 1
Michael carl_d6.3.pdf (661.1Kb)
Bonk, Ragnar; Alabau, Vicent; Carl, Michael; Koehn, Philipp (Edinburgh, 2013)
Abstract: This document contains details about the implementation of the 2nd prototype of the CasMaCat workbench and the Translation Process Research Database (TPR-DB). It outlines the major components of the workbench and their usage (Sections 1, 2, 3 and 6), as well as the structure and features of the TPR-DB (Section 7). Since gaze information is the most valuable source for tracking translator effort in text understanding, and due to the noise inherent in current head-free eye-tracking technology, Sections 4 and 5 report attempts to implement solutions for obtaining better gaze-to-word mapping accuracy. At the time of this writing, an installation guide has been written and made available to a select group of alpha testers (researchers from universities and research laboratories) to prepare a wider release of the prototype. URI: http://hdl.handle.net/10398/9062 Files in this item: 1
Michael Carl_d5.3.pdf (3.599Mb)
Carl, Michael; Lacruz, Isabel; Yamada, Masaru; Aizawa, Akiko (Frederiksberg, 2016)
Abstract: Spoken language applications are becoming increasingly operational and are used in many computer applications today. Translation dictation is a mode of translation by which a translator reads a source text and speaks out its translation, instead of typing it. Translation dictation is thus a method of translation situated in between interpretation, where the interpreter hears a text and speaks out the translation (e.g., during conference interpreting) and conventional translation by which a written source text is translated mainly using the keyboard. It is close to sight translation. Translation dictation was a technique used in some translation bureaus in the 1960s and 1970s (Gingold, 1978) but it has been used less frequently since the mid-80s, as professional translators started using micro-computers (Zapata and Kirkedal, 2015). Already, the ALPAC report (Pierce et al., 1966) mentioned that “productivity of human translators might be as much as four times higher when dictating” as compared to writing, and with today’s increasing quality of voice recognition this mode of translation is experiencing a come-back. The usage of Automatic Speech Recognition (ASR) systems provides an efficient means to produce texts, and our experiments suggest that for some translators and types of text translations it might become even more efficient than post-editing of machine translation. In this paper we describe the ENJA15 translation study and corpus. The ENJA15 corpus is a collection of translation process data that was collected in a collaborative effort by CRITT and NII. The ENJA15 data is part of a bigger data set which will enable us to compare human translation production processes across different languages and different translation modes, including from-scratch translation, machine translation post-editing and translation dictation. URI: http://hdl.handle.net/10398/9281 Files in this item: 1
Michael Cral_2016_03.pdf (265.1Kb)
Carl, Michael; Mesa-Lao, Bartolomé; Schaeffer, Moritz; García-Martínez, Mercedes (Edinburgh, 2014)
Abstract: This deliverable describes the experimental data gathered in Tasks 1.1, 1.2, 1.4 and 1.5; it is related to deliverable D6.5. Numerous translation and post-editing experiments have been conducted during the CasMaCat project and many of them have been assembled in a Translation Process Database (TPR-DB) which is hosted at the CRITT. The current TPR-DB version 2.0 is an extension of the TPR-DB version 1.0 which was described in deliverable D1.1, Appendix 4.5. This deliverable gives an overview of the data collected in TPR-DB version 2.0. A more detailed description of the TPR-DB can be found on the TPR-DB website. A description of the structure and the features is provided in a document on the same site at http://bridge.cbs.dk/resources/tpr-db/TPR-DB1.4.pdf. URI: http://hdl.handle.net/10398/9058 Files in this item: 1
Michael Carl_d1.4.pdf (462.9Kb)
Carl, Michael; Doherty, Stephen; O’Brien, Sharon (Preprint, 2010)
Abstract: Eye tracking has been used successfully as a technique for measuring cognitive load in reading, psycholinguistics, writing, language acquisition etc. for some time now. Its application as a technique for automatically measuring the reading ease of MT output has not yet, to our knowledge, been tested. We report here on a preliminary study testing the use and validity of an eye tracking methodology as a means of semi- and/or automatically evaluating machine translation output. 50 French machine translated sentences, 25 rated as excellent and 25 rated as poor in an earlier human evaluation, were selected. 10 native speakers of French were instructed to read the MT sentences for comprehensibility. Their eye gaze data were recorded non-invasively using a Tobii 1750 eye tracker. The average gaze time and fixation count were found to be higher for the “bad” sentences, while average fixation duration and pupil dilations were not found to be substantially different between output rated as good or bad. Comparisons between BLEU scores and eye gaze data were also made and found to correlate well with gaze time and fixation count, and to a lesser extent with pupil dilation and fixation duration. We conclude that the eye tracking data, in particular gaze time and fixation count, correlate reasonably well with human evaluation of MT output but fixation duration and pupil dilation may be less reliable indicators of reading difficulty for MT output. We also conclude that eye tracking has promise as an automatic MT Evaluation technique. URI: http://hdl.handle.net/10398/8045 Files in this item: 1
SubmissionforMT_dohertyobriencarl.pdf (226.2Kb)
García-Martínez, Mercedes; Cheung Petersen, Dan; Tsoukala, Chara; Alabau, Vicent; Ortíz-Martínez, Daniel; Koehn, Philipp; Carl, Michael (Edinburgh, 2014)
Abstract: This document contains details about the implementation of the 3rd prototype of the CasMaCat workbench as well as the CRITT Translation Process Research Database (TPR-DB). It outlines the improvements of the workbench with respect to the previous Deliverable 5.3. This deliverable will be updated in month 36 of the project with further improvements. URI: http://hdl.handle.net/10398/9056 Files in this item: 1
Michael Carl_d5.4.pdf (1.965Mb)
Addendum
García-Martínez, Mercedes; Carl, Michael; Mesa-Lao, Bartolomé; Alabau, Vicent; Ortíz-Martínez, Daniel; Koehn, Philipp (Edinburgh, 2014)
Abstract: This document is an extension of D5.4 as suggested in the second review report. It contains details about the implementation of the final prototype of the CasMaCat workbench and outlines the improvements of the workbench with respect to the previous deliverable 5.4. The objective of WP5 is to integrate the translation system and user interface and to develop the CasMaCat workbench. This deliverable shows the functional components of the workbench and describes their interaction possibilities in the last CasMaCat prototype. It also describes the most recent additions to the workbench. URI: http://hdl.handle.net/10398/9057 Files in this item: 1
Michael Carl_d5.5.pdf (1.100Mb)
Carl, Michael; García-Martínez, Mercedes; Hill, Robin; Keller, Frank; Mesa-Lao, Bartolomé; Schaeffer, Moritz (Edinburgh, 2014)
Abstract: D1.3 marks the final CASMACAT report on user interface studies, cognitive and user modelling covering the completion of tasks T1.5 (Cognitive Modelling) and T1.6 (User Modelling) as part of Work Package 1. Within tasks T1.1 to T1.4, a series of experiments have established a solid understanding of human behaviour in computer-aided translation, focusing on the use of visualization options, different translation modalities, individual differences in translation production, translator types and translation/post-editing styles. Additionally, the bulk of this experimental data has been released as a publicly available database under a Creative Commons license and further details on this can be found in D1.4. In parallel to these more holistic studies, a second set of experiments aimed to examine some of these factors in a constrained laboratory setting. These focused on the underlying psycholinguistic processing and cognitive modelling of translators’ activity to capture reading difficulty, verification and perplexity during translation and post-editing. This deliverable combines these earlier empirical findings with experiments conducted in Year 3 of the project and grounds translation within a broader theoretical framework associated with human sentence processing and communication. As well as broadening our general understanding of bilingual cognitive processing, there were two major objectives behind the experimental investigations in Year 3. The first was to evaluate the utility of providing translators with Source-Target word alignment information through spatially-direct visual cues. The second was to determine what, if any, differences arise from expertise by comparing the results between a group of bilinguals and a group of professionally trained translators on the same translation-related tasks. URI: http://hdl.handle.net/10398/9059 Files in this item: 1
Michael Carl_d1.3.pdf (2.686Mb)
Carl, Michael (2008)
Abstract: One of the aims of the Eye-to-IT project is to investigate the possibility of using eye-tracking devices for detecting situations of targeted help for human translators. A prerequisite for automated assistance in human translation is the understanding and modelling of reading behaviour, the ability to follow human eye movements and to map gaze sample points (the output of eye-tracking devices) onto the words and symbols fixated. Within the Eye-to-IT project we currently use a so-called “Gaze-to-Word Mapping” (GWM) device (Špakov 2008) that first computes possible fixations from sequences of gaze sample coordinates and then maps the fixations onto the words which are likely to be fixated. This paper suggests an alternative framework of a probabilistic gaze mapping model for reading, in which fixations on textual objects are directly computed from the gaze sample points. The framework integrates various knowledge sources with the aim of computing the most likely fixations on words and symbols on the basis of the available data. URI: http://hdl.handle.net/10398/8043 Files in this item: 1
CLS.pdf (186.2Kb)
Carl, Michael; Hill, Robin (Edinburgh, 2012)
Abstract: This WP lays the empirical foundations for the development of the CASMACAT workbench. A series of experiments will establish basic facts about translator behaviour in computer-aided translation, focusing on the use of visualisation options and input modalities. Another series of studies will deal with individual differences in translation, in particular translator types and translation styles. The initial report deals with translation types and styles, text types and a reading model adapted for machine translated texts. It covers the first period of Tasks 1.3, 1.4, and 1.5. The deliverable is structured into three sections which briefly summarize the work and an appendix which contains more detailed information about the produced material and a number of papers. An experimental setup (see section 2.1) and a questionnaire (see section 1.1) were designed to obtain consistent data from various translators in different languages under similar conditions. Translation data was collected in several locations (section 2.2) and assembled into a TPR database, as described in section 1.2. Preliminary studies were conducted to investigate post-editing and translation styles (section 1.3). Translation data was also collected in the first CASMACAT field trial. The assessment is provided in Deliverable d6.1. Section 3 describes the first Edinburgh Eyetracking experiment while the Appendix contains further material. URI: http://hdl.handle.net/10398/9064 Files in this item: 1
Michael Carl_d1.1.pdf (2.163Mb)
Alabau, Vicent; González-Rubio, Jesús; Ortiz-Martínez, Daniel; Sanchis-Trilles, Germán; García-Martínez, Mercedes; Mesa-Lao, Bartolomé; Cheung Pedersen, Dan; Dragsted, Barbara; Carl, Michael (Frederiksberg, 2014)
Abstract: This paper describes a pilot study with a computer-assisted translation workbench aiming at testing the integration of online learning (OL) and active learning (AL) features. We investigate the effect of these features on translation productivity, using interactive translation prediction (ITP) as a baseline. User activity data were collected from five beta testers using key-logging and eye-tracking. User feedback was also collected at the end of the experiments in the form of retrospective think-aloud protocols. We found that OL performs better than ITP, especially in terms of translation speed. In addition, AL provides better translation quality than ITP for the same levels of user effort. We plan to incorporate these features in the final version of the workbench. URI: http://hdl.handle.net/10398/9070 Files in this item: 1
Michael Carl_ AMTA2014Proceedings_1.pdf (295.8Kb)
Carl, Michael; Kay, Martin; Jensen, Kristian T. H. (Preprint, 2010)
Abstract: This paper investigates properties of translation processes, as observed in the translation behaviour of student and professional translators. The translation process can be divided into a gisting, drafting and post-editing phase. We find that student translators have longer gisting phases whereas professional translators have longer post-editing phases. Long-distance revisions, which would typically be expected during post-editing, occur to the same extent during drafting as during post-editing. Further, both groups of translators seem to face the same translation problems. We suggest how those findings might be taken into account in the design of computer-assisted translation tools. URI: http://hdl.handle.net/10398/8046 Files in this item: 1
LonDistRevision.pdf (651.7Kb)
Carl, Michael; Lacruz, Isabel; Yamada, Masaru; Aizawa, Akiko (Frederiksberg, 2016)
Abstract: By the mid-1980s it was clear that unrestricted high quality machine translation would not be achievable in the foreseeable future and alternative directions to the then dominant rule-based paradigm were proposed. The appropriate level of linguistic representation was difficult to determine and hard to compute; and translation relations were incomplete, error prone and time consuming to formalize within the current-state rule-based translation formalisms. At the same time, with the upcoming availability of Personal Computers (PCs), more translations were produced in electronic form. As translators produce translations daily, implicitly solving those translation problems that are so hard to formalize, Isabelle (1992) said that “Existing translations contain more solutions to more translation problems than any other existing resource.” New horizons for using MT were thus sought, which led to a number of different paradigms, some of which are briefly described. URI: http://hdl.handle.net/10398/9280 Files in this item: 1
Michael Cral_2016_02.pdf (261.8Kb)
Low Resources Machine Translation
Carl, Michael; Maite, Melero; Badia, Toni; Vandeghinste, Vincent; Dirix, Peter; Schuurman, Ineke; Markantonatou, Stella; Sofianopoulos, Sokratis; Vassiliou, Marina; Yannoutsou, Olga (2008)
Abstract: METIS-II was an EU-FET MT project running from October 2004 to September 2007, which aimed at translating free text input without resorting to parallel corpora. The idea was to use ‘basic’ linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The METIS-II project had four partners, translating from their ‘home’ languages Greek, Dutch, German, and Spanish into English. The paper outlines the basic ideas of the project, their implementation, the resources used, and the results obtained. It also gives examples of how METIS-II has continued beyond its lifetime and the original scope of the project. On the basis of the results and experiences obtained, we believe that the approach is promising and offers the potential for development in various directions. URI: http://hdl.handle.net/10398/8037 Files in this item: 1
METIS-II.pdf (503.5Kb)
Schaeffer, Moritz; Carl, Michael (Nagoya, 2017)
Abstract: This study investigates the coordination of reading (input) and writing (output) activities in from-scratch translation and post-editing. We segment logged eye movements and keylogging data into minimal units of reading and writing activity and model the process of post-editing and from-scratch translation as a Markov model. We show that the time translators and post-editors spend on source or target text reading predicts with a high degree of accuracy how likely it is that they engage in successive typing. We further show that the typing probability is also conditioned by the degree to which source and target text share semantic and syntactic properties. The minimal cognitive Markov model describes very basic factors which play a role in the processes occurring between input (reading) and output (writing) during translation. URI: http://hdl.handle.net/10398/9528 Files in this item: 1
Schaeffer_Carl.pdf (280.0Kb)
Carl, Michael; Lykke Jakobsen, Arnt; Jensen, Kristian T. H. (2009)
Abstract: One of the aims of the Eye-to-IT project (FP6 IST 517590) is to integrate keyboard logging and eye-tracking data to study and anticipate the behaviour of human translators. This so-called User-Activity Data (UAD) would make it possible to empirically ground cognitive models and to validate hypotheses of human processing concepts in the data. In order to thoroughly ground a cognitive model of the user in empirical observation, two conditions must be met as a minimum. First, all UAD must be fully synchronised so that data relate to a common construct. Second, data must be represented in a queryable form so that large volumes of data can be analysed electronically. Two programs have evolved in the Eye-to-IT project: TRANSLOG is designed to register and replay keyboard logging data, while GWM is a tool to record and replay eye-movement data. This paper reports on an attempt to synchronise and integrate the representations of both software components so that sequences of keyboard and eye-movement data can be retrieved and their interaction studied. The outcome of this effort would be the possibility to correlate eye- and keyboard activities of translators (the user model) with properties of the source and target texts and thus to uncover dependencies in the UAD. URI: http://hdl.handle.net/10398/8041 Files in this item: 1
NLPCS09.pdf (481.2Kb)
Lacruz, Isabel; Carl, Michael; Yamada, Masaru; Aizawa, Akiko (Frederiksberg, 2016)
Abstract: Traditionally, attempts to measure Machine Translation (MT) quality have focused on how close output is to a “gold standard” translation. TER (Translation Error Rate) is one standard measure that can be generated automatically. It is the normalized length of the shortest path (smallest number of edits per word) needed to convert MT output to an average of “ideal” translations (Snover et al., 2006). MT quality has now improved so much that post-edited (or in some cases, raw) MT output is routinely used in many applications in place of from-scratch translations. Despite the translators’ continued resistance to post-editing, there is increasing evidence that productivity is greater when translators post-edit rather than translate from scratch (e.g., Green et al., 2013). Machine-assisted alternatives to post-editing, such as Interactive Translation Prediction (see for example Sanchis-Trilles et al., 2014) are also making rapid advances. Because of these changing paradigms, alternative ways of measuring MT quality are being developed. Under many circumstances, perfect accuracy is not necessary: it is enough for MT output to be “good enough.” The end-user of the raw product should be able to use it with little effort, and the post-editor should easily be able to produce a satisfactory product. MT utility is determined by the effect the MT output has on the actual effort expended by the user, while MT adequacy is determined by the anticipated demand the MT output places on the user. Adequacy has been measured by human judgments along Likert scales, as well as by automatic metrics such as TER. In the context of post-editing, TER is modified to HTER, to measure the discrepancy between MT output and the final post-edited product. Thus, HTER measures the smallest number of necessary edits per word during post-editing. URI: http://hdl.handle.net/10398/9279 Files in this item: 1
Michael Cral_2016_01.pdf (467.9Kb)
Singla, Karan; Orrego-Carmona, David; Gonzales, Ashleigh Rhea; Carl, Michael; Bangalore, Srinivas (Frederiksberg, 2014)
Abstract: The purpose of the current investigation is to predict post-editor profiles based on user behaviour and demographics using machine learning techniques to gain a better understanding of post-editor styles. Our study extracts process unit features from the CasMaCat LS14 database from the CRITT Translation Process Research Database (TPR-DB). The analysis has two main research goals: We create n-gram models based on user activity and part-of-speech sequences to automatically cluster post-editors, and we use discriminative classifier models to characterize post-editors based on a diverse range of translation process features. The classification and clustering of participants resulting from our study suggest this type of exploration could be used as a tool to develop new translation tool features or customization possibilities. URI: http://hdl.handle.net/10398/9071 Files in this item: 1
Michael Carl_ AMTA2014Proceedings_2.pdf (197.8Kb)