Browsing Department of International Language Studies and Computational Linguistics (ISV) by Year Published
Now showing items 21-31 of 31
-
Juel Henrichsen, Peter (2009)
Abstract: This working paper presents the CBS text-to-speech tool colloquially known as the TtT (Tekst-til-Tale). The tool is intended for training university-level students, especially linguists training for a degree in speech technology, and visiting foreign students wanting to improve their spoken Danish. The TtT is operated through a simple www-based user interface. Using the TtT requires basic skills in formal grammar writing, but no knowledge of other aspects of artificial voice development such as phonetic-acoustic quantification, prosodic modelling, and signal generation. The paper includes a user manual. URI: http://hdl.handle.net/10398/7763 Files in this item: 1
2009-1.pdf (363.0Kb) -
A Study of CWA Raters' Decision-Making Behaviours
Lindhardsen, Vivian (Frederiksberg, 2009)
Abstract: The present study maps the decision-making behaviors of experienced raters in a well-established Communal Writing Assessment (CWA) context, tracing their behaviors all the way from the independent rating sessions, where the initial images and judgments are formed, to the communal rating sessions, where the final scores are assigned on the basis of collaboration between two raters. Results from think-aloud protocols, recorded discussions, retrospective reports and reported scores from 20 raters rating 15 ESL essays show that when moving from the independent ratings to the communal ratings, there is little, if any, increase in rater agreement levels, and the raters' attention to the textual features corresponding to the official criteria becomes more evenly distributed. However, rather than consulting the scale descriptors directly in resolving insecurities about score assignment, the raters seemed to rely heavily on each other's expertise, thereby reducing the importance of the scale and emphasizing the value of the community of raters. In validating their scores in the communal rating discussions, the raters appeared to be critically and equally engaged in the discussions, and through deliberating and refining their assessments the raters came to believe that CWA practices produce more accurate scores than independent ratings and lead to professional development. These interpretations support a hermeneutic rather than a psychometric approach to establishing the validity of the present CWA practices. URI: http://hdl.handle.net/10398/7743 Files in this item: 1
Vivian_Lindhardsen.pdf (8.523Mb) -
En valensgrammatisk undersøgelse [A valency-grammatical study]
Skot-Hansen, Annemette (Frederiksberg, 2009)
Abstract: The purpose of this dissertation is to map the valency structure of a subset of the French adverbs. More specifically, the dissertation seeks to answer the following questions: What valency structure follows from the lexical content of the adverbs investigated? What is the nature of the semantic relation established? What is the status of the valents relative to the adverb and relative to other valents? The empirical object of investigation is adverbs derived from adjectives which take prepositional phrases headed by the preposition à as their complement. In addition, the delimitation chosen for this dissertation is a class of adverbs which share the feature that they carry the suffix -ment, which developed from the Latin noun mens, meaning “spirit/thought/mood/tenor”. It is argued that the fusion of an adjective and mens establishes the general meaning [in an adjective spirit/thought/mood/tenor], i.e. the adverb retains the general quality denoted by the adjective, but the meaning targets the verb situation (at clause level) or the quality (at phrasal level) which saturates the argument of the adverb. Following tradition, the analysis adopted here takes the verb situation to be realised by the predicate, and the quality to be realised by an adjective phrase, which may be realised by a past participle or, in rare cases, by another adverb. Since the valent is required by the lexical content of the adverb, it is assumed, following Herslund and Sørensen, that the valent is a fundamental valent. Another important feature of the adverbs which are analysed in this dissertation is that they establish a relation between two entities. This means that in addition to its fundamental valent, the adverb takes a further valent which it links with the fundamental valent. This further valent is referred to as the second valent of the adverb. The two valents are analysed as two relata in a relation.
Unlike the fundamental valent, the second valent is always at phrasal level. When the adverb functions at clausal level, the second valent is realised as the prepositional object of the preposition phrase headed by à. This realisation is, however, not possible when the adverb functions at phrasal level. It is argued that this is a consequence of the fact that it is impossible to insert other constituents between the adverb and the adjective, adverb or participle which is modified by the adverb. The result is that where the second valent is realised, the adverb moves from preposition to postposition relative to its fundamental valent. In the data investigated the second valent denotes very different entities such as situations denoted by verbs and qualities, but also objects and abstract entities. The individual adverbs which are investigated here each determine their valency. In general there are different sources that allow us to uncover the core meaning of a word. The sources chosen in this dissertation are: the semantic roles assigned by the adverbs, their symmetry, elements of shared semantics or partial synonymy, their morphology and etymological roots. In order to bring together these different sources, the dissertation postulates a denotation design for each adverb. The etymology of the adverbs has been particularly helpful in determining the relation and valency they establish. In addition to adverb and adjective suffixes, the majority of the adverbs investigated have a preposition in their synchronic morphological make-up which denotes a relation between two entities: some adverbs contain both a preposition and a morpheme from another word class, e.g. comparativement and subséquemment, while others contain only a preposition, e.g. antérieurement and postérieurement.
A very small subset does not contain a preposition, but only a single adverb morpheme which denotes the relation in question, so, for instance, the adjectives par and similis, which have formed pareillement and semblablement, denote a relation between two relata. From an etymological perspective, a few adverbs, such as latéralement, do not denote a relation – so it is only through the formal realisation of the preposition phrase that the relation is established. The dissertation maps the etymological and morphological structure of the adverb and the range of functions that the adverb and its valents can have at clausal and phrasal level. The function of the adverb is relevant to the extent that the function affects its semantics and its valency structure. The effect of function is seen in some adverbs when they operate on clausal or on phrasal level and in other adverbs when they modify entire clauses or just the verb. URI: http://hdl.handle.net/10398/7944 Files in this item: 1
Annemette_Skot-Hansen.pdf (2.974Mb) -
Carl, Michael; Lykke Jakobsen, Arnt; Jensen, Kristian T. H. (2009)
Abstract: One of the aims of the Eye-to-IT project (FP6 IST 517590) is to integrate keyboard logging and eye-tracking data to study and anticipate the behaviour of human translators. This so-called User-Activity Data (UAD) would make it possible to empirically ground cognitive models and to validate hypotheses of human processing concepts in the data. In order to thoroughly ground a cognitive model of the user in empirical observation, two conditions must be met as a minimum. First, all UAD must be fully synchronised so that the data relate to a common construct. Second, the data must be represented in a queryable form so that large volumes of data can be analysed electronically. Two programs have evolved in the Eye-to-IT project: TRANSLOG is designed to register and replay keyboard logging data, while GWM is a tool to record and replay eye-movement data. This paper reports on an attempt to synchronise and integrate the representations of both software components so that sequences of keyboard and eye-movement data can be retrieved and their interaction studied. The outcome of this effort would be the possibility to correlate eye and keyboard activities of translators (the user model) with properties of the source and target texts and thus to uncover dependencies in the UAD. URI: http://hdl.handle.net/10398/8041 Files in this item: 1
NLPCS09.pdf (481.2Kb) -
Quantifying alignment units with keystroke data
Carl, Michael (2009)
Abstract: The paper discusses a method to triangulate process and product data. We suggest converting Translog data into a relational format which contains both process and product data. We outline how this representation allows us to retrieve and correlate the various dimensions of the data more easily. The concept of Alignment Unit (AU) is introduced and contrasted with that of Translation Unit (TU). While AUs refer to translation equivalences in the source and target texts of the product data, TUs refer to cognitive entities that can be observed in the process data. With an (almost) exhaustive fragmentation of the source and target texts into AUs, we are able to distribute and allocate the entire set of keystroke data to appropriate AUs. Using the properties of the keystroke data, AUs are quantified in a novel way which enables us to visualise and investigate the structure of translation production on a fine-grained scale. URI: http://hdl.handle.net/10398/8040 Files in this item: 1
keystrokes.pdf (940.0Kb) -
Carl, Michael (2008)
Abstract: The paper introduces a new research strategy for the investigation of human translation behavior. While conventional cognitive research methods make use of think-aloud protocols (TAP), we introduce and investigate User-Activity Data (UAD). UAD consists of the translator’s recorded keystroke and eye-movement behavior, which makes it possible to replay a translation session and to register the subjects’ comments on their own behavior during a retrospective interview. UAD has the advantage of being objective and reproducible, and, in contrast to TAP, does not interfere with the translation process. The paper gives the background of this technique and an example of an English-to-Danish translation. Our goal is to elaborate and investigate cognitively grounded basic translation concepts which are materialized and traceable in the UAD and which, at a later stage, will provide the basis for appropriate and targeted help for the translator at a given moment. URI: http://hdl.handle.net/10398/8044 Files in this item: 1
UAD-3.pdf (408.4Kb) -
Carl, Michael (2008)
Abstract: One of the aims of the Eye-to-IT project is to investigate the possibility of using eye-tracking devices to detect situations that call for targeted help for human translators. A prerequisite for automated assistance in human translation is the understanding and modelling of reading behaviour, the ability to follow human eye movements and to map gaze sample points (the output of eye-tracking devices) onto the words and symbols fixated. Within the Eye-to-IT project we currently use a so-called “Gaze-to-Word Mapping” (GWM) device (Špakov 2008) that first computes possible fixations from sequences of gaze sample coordinates and then maps the fixations onto the words which are likely to be fixated. This paper suggests an alternative framework of a probabilistic gaze mapping model for reading, in which fixations on textual objects are directly computed from the gaze sample points. The framework integrates various knowledge sources with the aim of computing the most likely fixations on words and symbols on the basis of the available data. URI: http://hdl.handle.net/10398/8043 Files in this item: 1
CLS.pdf (186.2Kb) -
Low Resources Machine Translation
Carl, Michael; Maite, Melero; Badia, Toni; Vandeghinste, Vincent; Dirix, Peter; Schuurman, Ineke; Markantonatou, Stella; Sofianopoulos, Sokratis; Vassiliou, Marina; Yannoutsou, Olga (2008)
Abstract: METIS-II was an EU-FET MT project running from October 2004 to September 2007, which aimed at translating free text input without resorting to parallel corpora. The idea was to use ‘basic’ linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The METIS-II project had four partners, translating from their ‘home’ languages Greek, Dutch, German, and Spanish into English. The paper outlines the basic ideas of the project, their implementation, the resources used, and the results obtained. It also gives examples of how METIS-II has continued beyond its lifetime and the original scope of the project. On the basis of the results and experiences obtained, we believe that the approach is promising and offers the potential for development in various directions. URI: http://hdl.handle.net/10398/8037 Files in this item: 1
METIS-II.pdf (503.5Kb) -
Elming, Jakob (Frederiksberg, 2008)
Abstract: Reordering has been an important topic in statistical machine translation (SMT) for as long as SMT has been around. State-of-the-art SMT systems such as Pharaoh (Koehn, 2004a) still employ a simplistic model of the reordering process to do non-local reordering. This model penalizes any reordering regardless of the words involved. A reordering is only selected if it leads to a translation that looks like a much better sentence than the alternative. Recent developments have, however, seen improvements in translation quality following from syntax-based reordering. One such development is the pre-translation approach that adjusts the source sentence to resemble target language word order prior to translation. This is done based on rules that are either manually created or automatically learned from word-aligned parallel corpora. We introduce a novel approach to syntactic reordering. This approach provides better exploitation of the information in the reordering rules and eliminates problematic biases of previous approaches. Although the approach is examined within a pre-translation reordering framework, it easily extends to other frameworks. Our approach significantly outperforms a state-of-the-art phrase-based SMT system and previous approaches to pre-translation reordering, including (Li et al., 2007; Zhang et al., 2007b; Crego & Mariño, 2007). This is consistent both for a very close language pair, English-Danish, and a very distant language pair, English-Arabic. We also propose automatic reordering rule learning based on a rich set of linguistic information. As opposed to most previous approaches that extract a large set of rules, our approach produces a small set of predominantly general rules. These provide a good reflection of the main reordering issues of a given language pair. We examine several parameters that may influence the quality of the rules learned. Finally, we provide a new approach for improving automatic word alignment.
This word alignment is used in the above task of automatically learning reordering rules. Our approach learns from hand-aligned data how to combine several automatic word alignments into one superior word alignment. The automatic word alignments are created from the same data preprocessed with different tokenization schemes, thus utilizing the different strengths that different tokenization schemes exhibit in word alignment. We achieve a 38% error reduction for the automatic word alignment. URI: http://hdl.handle.net/10398/7922 Files in this item: 1
jakob_elming.pdf (1.033Mb) -
A white paper
Buch-Kromann, Matthias (København, 2007)
Abstract: In this white paper, we review the theoretical evidence about the computational efficiency of dependency parsing and machine translation without the widely used but linguistically questionable assumptions about projectivity and edge-factoring. On the basis of the heuristic local optimality parser proposed by Buch-Kromann (2006), we propose a common architecture for monolingual parsing, parallel parsing, and translation that does not make these assumptions. Finally, we describe the elementary repair operations in the model, and argue that the model is potentially interesting as a model of human translation. URI: http://hdl.handle.net/10398/6846 Files in this item: 1
2007-1.pdf (355.9Kb) -
Med udgangspunkt i støtteverbers leksikaliseringsmønstre i dansk og fransk [Based on the lexicalization patterns of support verbs in Danish and French]
Hein, Birgitte (Frederiksberg, 2003)
Abstract: Any translator working between a Germanic language such as Danish and a Romance language such as French knows that it is often particular linguistic constructions that cause problems. One of these constructions consists of a support verb and an object that together form a semantic unit. Since this construction occurs frequently, especially in legal and administrative texts, it can be of both practical and theoretical value to obtain a clearer picture of how the constructions are idiomatically built and used in the two languages. The study seeks to place itself in a context that concerns both translation and linguistic description, out of a wish that a comparative description should give a translator knowledge that he can use in his practical work. Most people who have used computer-aided translation will agree that qualified human translation is still necessary if one is to obtain an idiomatically correct and usable result. Admittedly, computer-aided “raw translations” are possible today. Sometimes these translations can serve, for example, to give an internet user a quick impression of the content of a web page in a language that he does not master.... URI: http://hdl.handle.net/10398/8623 Files in this item: 1
Birgitte_Hein.pdf (776.8Kb)