<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns="http://www.w3.org/2005/Atom">
<title>Working Papers (ISV)</title>
<link href="http://hdl.handle.net/10398/83" rel="alternate"/>
<subtitle/>
<id>http://hdl.handle.net/10398/83</id>
<updated>2013-06-19T10:59:01Z</updated>
<dc:date>2013-06-19T10:59:01Z</dc:date>
<entry>
<title>Hierarchy-based Partition Models: Using Classification Hierarchies to</title>
<link href="http://hdl.handle.net/10398/8221" rel="alternate"/>
<author>
<name>Buch-Kromann, Matthias</name>
</author>
<author>
<name>Haulrich, Martin</name>
</author>
<id>http://hdl.handle.net/10398/8221</id>
<updated>2011-09-09T00:03:33Z</updated>
<published>2010-12-14T00:00:00Z</published>
<summary type="text">Hierarchy-based Partition Models: Using Classification Hierarchies to
Buch-Kromann, Matthias; Haulrich, Martin
We propose a novel machine learning
technique that can be used to estimate
probability distributions for categorical
random variables that are equipped with
a natural set of classification hierarchies,
such as words equipped with word class
hierarchies, wordnet hierarchies, and suffix
and affix hierarchies. We evaluate the
estimator on bigram language modelling
with a hierarchy based on word suffixes,
using English, Danish, and Finnish data
from the Europarl corpus with training sets
of up to 1–1.5 million words. The results
show that the proposed estimator outperforms
modified Kneser-Ney smoothing in
terms of perplexity on unseen data. This
suggests that important information is hidden
in the classification hierarchies that we
routinely use in computational linguistics,
but that we are unable to utilize this information
fully because our current statistical
techniques are either based on simple
counting models or designed for sample
spaces with a distance metric, rather than
sample spaces with a non-metric topology
given by a classification hierarchy.
Keywords: machine learning; categorical
variables; classification hierarchies; language
modelling; statistical estimation
</summary>
<dc:date>2010-12-14T00:00:00Z</dc:date>
</entry>
<entry>
<title>The DTAG treebank tool. Annotating and querying treebanks and</title>
<link href="http://hdl.handle.net/10398/8222" rel="alternate"/>
<author>
<name>Buch-Kromann, Matthias</name>
</author>
<id>http://hdl.handle.net/10398/8222</id>
<updated>2012-01-09T09:46:33Z</updated>
<published>2010-12-14T00:00:00Z</published>
<summary type="text">The DTAG treebank tool. Annotating and querying treebanks and
Buch-Kromann, Matthias
DTAG is a versatile annotation tool that
supports manual and semi-automatic annotation
of a wide range of linguistic phenomena,
including the annotation of syntax,
discourse, coreference, morphology,
and word alignments. It includes commands
for editing general labeled graphs
and graph alignments, comparing annotations,
managing annotation tasks, and interfacing
with a revision control system.
Its visualization component can display
graphs and alignments for entire texts in a
compact format, with a highly flexible and
configurable formatting scheme. It also
provides a powerful search-replace mechanism
with queries based on full first-order
logic, which can be used to search for
linguistic constructions and automatically
apply graph transformations to collections
of annotated graphs.
</summary>
<dc:date>2010-12-14T00:00:00Z</dc:date>
</entry>
<entry>
<title>A Danish Nonsense Syllable Speech Material</title>
<link href="http://hdl.handle.net/10398/8218" rel="alternate"/>
<author>
<name>Christiansen, Thomas U.</name>
</author>
<id>http://hdl.handle.net/10398/8218</id>
<updated>2010-12-09T09:04:30Z</updated>
<published>2010-12-09T00:00:00Z</published>
<summary type="text">A Danish Nonsense Syllable Speech Material
Christiansen, Thomas U.
Nonsense syllable speech materials are often used when investigating speech perception in quiet
and under adverse conditions. The main advantage of using nonsense syllables over words and
sentences is that the acoustic as well as linguistic context is minimal. This paper describes the
considerations involved in producing three anechoic recordings of 14 male and 14 female native
talkers of Danish each speaking 65 nonsense syllables repeated three times with falling F0 (total of
16380 syllables).
</summary>
<dc:date>2010-12-09T00:00:00Z</dc:date>
</entry>
<entry>
<title>The CBS Text-to-Speech Workbench</title>
<link href="http://hdl.handle.net/10398/7763" rel="alternate"/>
<author>
<name>Juel Henrichsen, Peter</name>
</author>
<id>http://hdl.handle.net/10398/7763</id>
<updated>2009-08-05T09:28:50Z</updated>
<published>2009-04-07T13:28:51Z</published>
<summary type="text">The CBS Text-to-Speech Workbench
Juel Henrichsen, Peter
This working paper presents the CBS text-to-speech tool colloquially
known as the TtT (Tekst-til-Tale). The tool is intended for training
university-level students, especially linguists training for a degree in
speech technology, and visiting foreign students wanting to improve
their spoken Danish. The TtT is operated through a simple www-based
user interface. Using the TtT requires basic skills in formal
grammar-writing, but no knowledge of other aspects of artificial voice
development such as phonetic-acoustic quantification, prosodic
modelling, and signal generation. The paper includes a user manual.
</summary>
<dc:date>2009-04-07T13:28:51Z</dc:date>
</entry>
</feed>
