Browsing Conference papers by Author "Juel Henrichsen, Peter"
Now showing items 1-7 of 7
-
Uneson, Marcus; Juel Henrichsen, Peter (Jachranka, 2011)
-
Juel Henrichsen, Peter (2011)
Abstract: Modern hearing aids use a variety of advanced digital signal processing methods in order to improve speech intelligibility. These methods are based on knowledge about the acoustics outside the ear as well as psychoacoustics. We present a novel observation based on the fact that acoustic prominence is not equal to information prominence for time intervals at the syllabic and sub-syllabic levels. The idea is that speech elements with a high degree of information can be robustly identified based on basic acoustic properties. We evaluated the correlation of (information-rich) content words in the DanPASS corpus with fundamental frequency (F0) and spectral tilt across four frequency bands. Our results show a correlation between certain band-level differences and the presence of content words. Similarly, but to a lesser extent, a correlation between F0 and the presence of content words was found. The principle described here has the potential to improve the “information-to-noise” ratio in hearing aids. In addition, this concept may also be applicable in automatic speech recognition systems.
URI: http://hdl.handle.net/10398/8411
Files in this item: 1
Peter_Juel_Henrichsen_ISAAR2011.pdf (296.9Kb)
-
Data-driven prosodic feature assignment for diphone synthesis
Juel Henrichsen, Peter (Frederiksberg, 2)
Abstract: Today's synthetic voices are largely based on diphone synthesis (DiSyn) and unit selection synthesis (UnitSyn). In most DiSyn systems, prosodic envelopes are generated with formal models, while UnitSyn systems refer to extensive, highly indexed sound databases. Each approach has its drawbacks, such as low naturalness (DiSyn) and dependence on huge amounts of background data (UnitSyn). We present a hybrid model based on high-level speech data. As preliminary tests show, prosodic models combining DiSyn style at the phone level with UnitSyn style at the supra-segmental levels may approach UnitSyn quality on a DiSyn footprint. Our test data are Danish, but our algorithm is language neutral.
URI: http://hdl.handle.net/10398/8595
Files in this item: 1
Henrichsen.pdf (158.2Kb)
-
Christiansen, Thomas U.; Juel Henrichsen, Peter (Aalborg, 2011)
Abstract: Nonsense syllable speech materials are often used when investigating speech perception in quiet and under adverse conditions. The main advantage of using nonsense syllables over words and sentences is that the acoustic as well as the linguistic context is minimal. This paper presents three anechoic recordings of 13 male and 13 female native talkers of Danish, each speaking 65 nonsense syllables repeated three times with the neutral intonation contour for Danish (in total 15210 syllables). The authors compared and ranked groups of three recordings. The three recordings in each group had the same talker and identical phonetic content. The syllables were ranked according to general “appropriateness” and consistency, i.e., prototypical production of the consonant-vowel (CV) with respect to applicability in speech perceptual studies. The results were compared to results of an automatic method based on acoustic measures. The two novel ideas are 1) to devise an automated method for evaluating “appropriateness” of CVs and 2) to develop a Danish CV material annotated with an objective measure of “appropriateness” for each recorded CV. The latter would potentially render more CVs appropriate for perceptual studies. Moreover, objective evaluation would make it possible to examine any perceptual effects of variability in CV production (for example, how susceptible different renderings of CVs by the same talker are to background noise). To the knowledge of the authors, no such material has yet been published for any language.
URI: http://hdl.handle.net/10398/8412
Files in this item: 1
Peter_Juel_Henrichsen_2.pdf (427.2Kb)
-
A dual-layer Danish speech corpus for perception studies
Christiansen, Thomas Ulrich; Juel Henrichsen, Peter (Frederiksberg, 2012)
Abstract: In this paper, we present the newly established Danish speech corpus PiTu. The corpus consists of recordings of 28 native Danish talkers (14 female and 14 male), each reproducing (i) a series of nonsense syllables, and (ii) a set of authentic natural language sentences. The speech corpus is tailored for investigating the relationship between early stages of the speech perceptual process and later stages. We present our considerations involved in preparing the experimental set-up, producing the anechoic recordings, compiling the data, and exploring the materials in linguistic research. We report on a small pilot experiment demonstrating how PiTu and similar speech corpora can be used in studies of prosody as a function of semantic content. The experiment addresses the issue of whether the governing principles of Danish prosody assignment are mainly talker-specific or mainly content-typical (under the specific experimental conditions). The corpus is available at http://amtoolbox.sourceforge.net/pitu/.
URI: http://hdl.handle.net/10398/8619
Files in this item: 1
Peter_Juel_Henrichsen_2012_3.pdf (105.4Kb)
-
A Multi-lingual Speech Corpus for Cognitive Research
Juel Henrichsen, Peter; Uneson, Marcus (Frederiksberg, 2012)
Abstract: We present the speech corpus SMALLWorlds (Spoken Multi-lingual Accounts of Logically Limited Worlds), newly established and still growing. SMALLWorlds contains monologic descriptions of scenes or worlds which are simple enough to be formally describable. The descriptions are instances of content-controlled monologue: semantically “pre-specified” but still bearing most hallmarks of spontaneous speech (hesitations and filled pauses, relaxed syntax, repetitions, self-corrections, incomplete constituents, irrelevant or redundant information, etc.) as well as idiosyncratic speaker traits. In the paper, we discuss the pros and cons of data so elicited. Following that, we present a typical SMALLWorlds task: the description of a simple drawing with differently coloured circles, squares, and triangles, with no hints given as to which description strategy or language style to use. We conclude with an example of how SMALLWorlds may be used: unsupervised lexical learning from phonetic transcription. At the time of writing, SMALLWorlds consists of more than 250 recordings in a wide range of typologically diverse languages from many parts of the world, some unwritten and endangered.
URI: http://hdl.handle.net/10398/8618
Files in this item: 1
Peter_Juel_Henrichsen_2012_2.pdf (172.0Kb)
-
Christiansen, Thomas Ulrich; Juel Henrichsen, Peter (Frederiksberg, 2012)
Abstract: Digital hearing aids use a variety of advanced digital signal processing methods in order to improve speech intelligibility. These methods are based on knowledge about the acoustics outside the ear as well as psychoacoustics. This paper investigates the recent observation that speech elements with a high degree of information can be robustly identified based on basic acoustic properties, i.e., function words have greater spectral tilt than content words for each of the 18 Danish talkers investigated. In this paper we examine these spectral tilt differences as a function of time, based on speech material six times the duration of that used in previous investigations. Our results show that the correlation of spectral tilt with information content is relatively constant across time, even if averaged across talkers. This indicates that it is possible to devise a robust method for estimating information density in the speech signal based on computationally simple short-term band-level differences. The principle described here has the potential to improve speech transduction in hearing aids and cochlear implants. In addition, the concept of information-based speech transduction may also be applicable in automatic speech recognition systems.
URI: http://hdl.handle.net/10398/8617
Files in this item: 1
Peter_Juel_Henrichsen_1.pdf (478.2Kb)