ISCA Archive Interspeech 2009 Sessions Booklet

Interspeech 2009

Brighton, United Kingdom
6-10 September 2009

General Chair: Roger Moore
doi: 10.21437/Interspeech.2009




Speech Analysis and Processing I-III


Nearly perfect detection of continuous F0 contour and frame classification for TTS synthesis
Thomas Ewender, Sarah Hoffmann, Beat Pfister

AM-FM estimation for speech based on a time-varying sinusoidal model
Yannis Pantazis, Olivier Rosec, Yannis Stylianou

Voice source waveform analysis and synthesis using principal component analysis and Gaussian mixture modelling
Jon Gudnason, Mark R. P. Thomas, Patrick A. Naylor, Dan P. W. Ellis

Model-based estimation of instantaneous pitch in noisy speech
Jung Ook Hong, Patrick J. Wolfe

Complex cepstrum-based decomposition of speech for glottal source estimation
Thomas Drugman, Baris Bozkurt, Thierry Dutoit

Approximate intrinsic Fourier analysis of speech
Frank Tompkins, Patrick J. Wolfe

Spectral and temporal modulation features for phonetic recognition
Stephen A. Zahorian, Hongbing Hu, Zhengqing Chen, Jiang Wu

Use of harmonic phase information for polarity detection in speech signals
Ibon Saratxaga, Daniel Erro, Inmaculada Hernáez, Iñaki Sainz, Eva Navas

Finite mixture spectrogram modeling for multipitch tracking using a factorial hidden Markov model
Michael Wohlmayr, Franz Pernkopf

Group-delay-deviation based spectral analysis of speech
Anthony Stark, Kuldip Paliwal

Speaker dependent mapping for low bit rate coding of throat microphone speech
Joseph M. Anand, B. Yegnanarayana, Sanjeev Gupta, M. R. Kesheorey

Analysis of Lombard speech using excitation source information
G. Bapineedu, B. Avinash, Suryakanth V. Gangashetty, B. Yegnanarayana

A comparison of linear and nonlinear dimensionality reduction methods applied to synthetic speech
Andrew Errity, John McKenna

ZZT-domain immiscibility of the opening and closing phases of the LF GFM under frame length variations
C. F. Pedersen, O. Andersen, P. Dalsgaard

Dimension reducing of LSF parameters based on radial basis function neural network
Hongjun Sun, Jianhua Tao, Huibin Jia

Characterizing speaker variability using spectral envelopes of vowel sounds
A. N. Harish, D. R. Sanand, S. Umesh

Analysis of band structures for speaker-specific information in FM feature extraction
Tharmarajah Thiruvaran, Eliathamby Ambikairajah, Julien Epps

Artificial nasalization of speech sounds based on pole-zero models of spectral relations between mouth and nose signals
Karl Schnell, Arild Lacroix

Error metrics for impaired auditory nerve responses of different phoneme groups
Andrew Hines, Naomi Harte

Model-based automatic evaluation of L2 learner's English timing
Chatchawarn Hansakunbuntheung, Hiroaki Kato, Yoshinori Sagisaka

A Bayesian approach to non-intrusive quality assessment of speech
Petko N. Petkov, Iman S. Mossavat, W. Bastiaan Kleijn

Precision of phoneme boundaries derived using hidden Markov models
Ladan Baghai-Ravary, Greg Kochanski, John Coleman

A novel method for epoch extraction from speech signals
Lakshmish Kaushik, Douglas O'Shaughnessy

LS regularization of group delay features for speaker recognition
Jia Min Karen Kua, Julien Epps, Eliathamby Ambikairajah, Eric Choi

Glottal closure and opening instant detection from speech signals
Thomas Drugman, Thierry Dutoit


Speech Perception I, II


Relative importance of formant and whole-spectral cues for vowel perception
Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano

Influences of vowel duration on speaker-size estimation and discrimination
Chihiro Takeshima, Minoru Tsuzaki, Toshio Irino

High front vowels in Czech: a contrast in quantity or quality?
Václav Jonáš Podlipský, Radek Skarnitzl, Jan Volín

Effect of contralateral noise on energetic and informational masking on speech-in-speech intelligibility
Marjorie Dole, Michel Hoen, Fanny Meunier

Using location cues to track speaker changes from mobile, binaural microphones
Heidi Christensen, Jon Barker

A perceptual investigation of speech transcription errors involving frequent near-homophones in French and American English
Ioana Vasilescu, Martine Adda-Decker, Lori Lamel, Pierre Hallé

The role of glottal pulse rate and vocal tract length in the perception of speaker identity
Etienne Gaudrain, Su Li, Vin Shen Ban, Roy D. Patterson

Development of voicing categorization in deaf children with cochlear implant
Victoria Medina, Willy Serniclaes

Processing liaison-initial words in native and non-native French: evidence from eye movements
Annie Tremblay

Estimating the potential of signal and interlocutor-track information for language modeling
Nigel G. Ward, Benjamin H. Walker

Effect of r-resonance information on intelligibility
Antje Heinrich, Sarah Hawkins

Perception of temporal cues at discourse boundaries
Hsin-Yi Lin, Janice Fon

Human audio-visual consonant recognition analyzed with three bimodal integration models
Zhanyu Ma, Arne Leijon

Effects of tempo in radio commercials on young and elderly listeners
Hanny den Ouden, Hugo Quené

Self-voice recognition in 4 to 5-year-old children
Sofia Strömbergsson

Are real tongue movements easier to speech read than synthesized?
Olov Engwall, Preben Wik

Eliciting a hierarchical structure of human consonant perception task errors using formal concept analysis
Carmen Peláez-Moreno, Ana I. García-Moral, Francisco J. Valverde-Albacete

Acoustic and perceptual effects of vocal training in amateur male singing
Takeshi Saitou, Masataka Goto




Spoken Dialogue Systems


Enabling a user to specify an item at any time during system enumeration - item identification for barge-in-able conversational dialogue systems
Kyoko Matsuyama, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

System request detection in human conversation based on multi-resolution Gabor wavelet features
Tomoyuki Yamagata, Tetsuya Takiguchi, Yasuo Ariki

Using graphical models for mixed-initiative dialog management systems with realtime policies
Stefan Schwärzler, Stefan Maier, Joachim Schenk, Frank Wallhoff, Gerhard Rigoll

Conversation robot participating in and activating a group communication
Shinya Fujie, Yoichi Matsuyama, Hikaru Taniyama, Tetsunori Kobayashi

Recent advances in WFST-based dialog system
Chiori Hori, Kiyonori Ohtake, Teruhisa Misu, Hideki Kashioka, Satoshi Nakamura

A statistical dialog manager for the LUNA project
David Griol, Giuseppe Riccardi, Emilio Sanchis

A policy-switching learning approach for adaptive spoken dialogue agents
Heriberto Cuayáhuitl, Juventino Montiel-Hernández

Strategies for accelerating the design of dialogue applications using heuristic information from the backend database
L. F. D'Haro, R. Cordoba, R. San-Segundo, J. Macias-Guarasa, J. M. Pardo

Feature-based summary space for stochastic dialogue modeling with hierarchical semantic frames
Florian Pinault, Fabrice Lefèvre, Renato De Mori

Language modeling and dialog management for address recognition
Rajesh Balchandran, Leonid Rachevsky, Larry Sansone

A framework for rapid development of conversational natural language call routing systems for call centers
Ea-Ee Jan, Hong-Kwang Kuo, Osamuyimen Stewart, David Lubensky

The MonAMI reminder: a spoken dialogue system for face-to-face interaction
Jonas Beskow, Jens Edlund, Björn Granström, Joakim Gustafson, Gabriel Skantze, Helena Tobiasson

Influence of training on direct and indirect measures for the evaluation of multimodal systems
Julia Seebode, Stefan Schaffer, Ina Wechsung, Florian Metze

Talking heads for interacting with spoken dialog smart-home systems
Christine Kühnel, Benjamin Weiss, Sebastian Möller

Speech generation from hand gestures based on space mapping
Aki Kunikoshi, Yu Qiao, Nobuaki Minematsu, Keikichi Hirose



Automatic Speech Recognition: Language Models I, II


Back-off language model compression
Boulos Harb, Ciprian Chelba, Jeffrey Dean, Sanjay Ghemawat

Improving broadcast news transcription with a precision grammar and discriminative reranking
Tobias Kaufmann, Thomas Ewender, Beat Pfister

Use of contexts in language model interpolation and adaptation
X. Liu, M. J. F. Gales, P. C. Woodland

Exploiting Chinese character models to improve speech recognition performance
J. L. Hieronymus, X. Liu, M. J. F. Gales, P. C. Woodland

Constraint selection for topic-based MDI adaptation of language models
Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot

Nonstationary latent Dirichlet allocation for speech recognition
Chuang-Hua Chueh, Jen-Tzung Chien

Multiple text segmentation for statistical language modeling
Sopheap Seng, Laurent Besacier, Brigitte Bigi, Eric Castelli

Measuring tagging performance of a joint language model
Denis Filimonov, Mary Harper

Improved language modelling using bag of word pairs
Langzhou Chen, K. K. Chin, Kate Knill

Morphological analysis and decomposition for Arabic speech-to-text systems
F. Diehl, M. J. F. Gales, M. Tomalin, P. C. Woodland

Investigating the use of morphological decomposition and diacritization for improving Arabic LVCSR
Amr El-Desoky, Christian Gollan, David Rybach, Ralf Schlüter, Hermann Ney

Topic dependent language model based on topic voting on noun history
Welly Naptali, Masatoshi Tsuchiya, Seiichi Nakagawa

Investigation of morph-based speech recognition improvements across speech genres
Péter Mihajlik, Balázs Tarján, Zoltán Tüske, Tibor Fegyó

Effective use of pause information in language modelling for speech recognition
Kengo Ohta, Masatoshi Tsuchiya, Seiichi Nakagawa

A parallel training algorithm for hierarchical Pitman-Yor process language models
Songfang Huang, Steve Renals

Probabilistic and possibilistic language models based on the world wide web
Stanislas Oger, Vladimir Popescu, Georges Linarès



Statistical Parametric Synthesis I, II


Autoregressive HMMs for speech synthesis
Matt Shannon, William Byrne

Asynchronous F0 and spectrum modeling for HMM-based speech synthesis
Cheng-Cheng Wang, Zhen-Hua Ling, Li-Rong Dai

A minimum v/u error approach to F0 generation in HMM-based TTS
Yao Qian, Frank K. Soong, Miaomiao Wang, Zhizheng Wu

Voiced/unvoiced decision algorithm for HMM-based speech synthesis
Shiyin Kang, Zhiwei Shuang, Quansheng Duan, Yong Qin, Lianhong Cai

Local minimum generation error criterion for hybrid HMM speech synthesis
Xavi Gonzalvo, Alexander Gutkin, Joan Claudi Socoró, Ignasi Iriondo, Paul Taylor

Thousands of voices for HMM-based speech synthesis
Junichi Yamagishi, Bela Usabaev, Simon King, Oliver Watts, John Dines, Jilei Tian, Rile Hu, Yong Guan, Keiichiro Oura, Keiichi Tokuda, Reima Karhila, Mikko Kurimo

A Bayesian approach to Hidden Semi-Markov Model based speech synthesis
Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

Rich context modeling for high quality HMM-based TTS
Zhi-Jie Yan, Yao Qian, Frank K. Soong

Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems
Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

The HMM synthesis algorithm of an embedded unified speech recognizer and synthesizer
Guntram Strecha, Matthias Wolff, Frank Duckhorn, Sören Wittenberg, Constanze Tschöpe

Syllable HMM based Mandarin TTS and comparison with concatenative TTS
Zhiwei Shuang, Shiyin Kang, Qin Shi, Yong Qin, Lianhong Cai

Pulse density representation of spectrum for statistical speech processing
Yoshinori Shiga

Parameterization of vocal fry in HMM-based speech synthesis
Hanna Silén, Elina Helander, Jani Nurminen, Moncef Gabbouj

A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis
Thomas Drugman, Geoffrey Wilfart, Thierry Dutoit

A decision tree-based clustering approach to state definition in an excitation modeling framework for HMM-based speech synthesis
Ranniery Maia, Tomoki Toda, Keiichi Tokuda, Shinsuke Sakai, Satoshi Nakamura

An improved minimum generation error based model adaptation for HMM-based speech synthesis
Yi-Jian Wu, Long Qin, Keiichi Tokuda

Two-pass decision tree construction for unsupervised adaptation of HMM-based synthesis models
Matthew Gibson

Speaker adaptation using a parallel phone set pronunciation dictionary for Thai-English bilingual TTS
Anocha Rugchatjaroen, Nattanun Thatphithakkul, Ananlada Chotimongkol, Ausdang Thangthai, Chai Wutiwiwatchai

HMM-based automatic eye-blink synthesis from speech
Michal Dziemianko, Gregor Hofer, Hiroshi Shimodaira



Human Speech Production I, II


Probabilistic effects on French [t] duration
Francisco Torreira, Mirjam Ernestus

On the production of sandhi phenomena in French: psycholinguistic and acoustic data
Odile Bagou, Violaine Michel, Marina Laganaro

Extreme reductions: contraction of disyllables into monosyllables in Taiwan Mandarin
Chierh Cheng, Yi Xu

Annotation and features of non-native Mandarin tone quality
Mitchell Peabody, Stephanie Seneff

On-line formant shifting as a function of F0
Kateřina Chládková, Paul Boersma, Václav Jonáš Podlipský

Production boundary between fricative and affricate in Japanese and Korean speakers
Kimiko Yamakawa, Shigeaki Amano, Shuichi Itahashi

Aerodynamics of fricative production in European Portuguese
Cátia M. R. Pinho, Luis M. T. Jesus, Anna Barney

Contextual effects on protrusion and lip opening for /i,y/
Anne Bonneau, Julie Buquet, Brigitte Wrobel-Dautcourt

Speech rate effects on European Portuguese nasal vowels
Catarina Oliveira, Paula Martins, António Teixeira

Relation of formants and subglottal resonances in Hungarian vowels
Tamás Gábor Csapó, Zsuzsanna Bárkányi, Tekla Etelka Gráczi, Tamás Bőhm, Steven M. Lulich

Simple physical models of the vocal tract for education in speech science
Takayuki Arai

Auto-meshing algorithm for acoustic analysis of vocal tract
Kyohei Hayashi, Nobuhiro Miki

Voice production model employing an interactive boundary-layer analysis of glottal flow
Tokihiko Kaburagi, Katsunori Daimo, Shogo Nakamura

Characteristics of two-dimensional finite difference techniques for vocal tract analysis and voice synthesis
Matt Speed, Damian Murphy, David M. Howard

Adaptation of a predictive model of tongue shapes
Chao Qin, Miguel Á. Carreira-Perpiñán

Using sensor orientation information for computational head stabilisation in 3D electromagnetic articulography (EMA)
Christian Kroos

Collision threshold pressure before and after vocal loading
Laura Enflo, Johan Sundberg, Friedemann Pabst

Gender differences in the realization of vowel-initial glottalization
Elke Philburn

Stability and composition of functional synergies for speech movements in children and adults
Hayo Terband, Frits van Brenk, Pascal van Lieshout, Lian Nijland, Ben Maassen

An analysis of speech rate strategies in aging
Frits van Brenk, Hayo Terband, Pascal van Lieshout, Anja Lowit, Ben Maassen

Variability and stability in collaborative dialogues: turn-taking and filled pauses
Štefan Beňuš

Speaking in the presence of a competing talker
Youyi Lu, Martin Cooke


Prosody, Text Analysis, and Multilingual Models


Polyglot speech prosody control
Harald Romsdorfer

Weighted neural network ensemble models for speech prosody control
Harald Romsdorfer

Cross-language F0 modeling for under-resourced tonal languages: a case study on Thai-Mandarin
Vataya Boonpiam, Anocha Rugchatjaroen, Chai Wutiwiwatchai

Prosodic issues in synthesising Thadou, a Tibeto-Burman tone language
Dafydd Gibbon, Pramod Pandey, D. Mary Kim Haokip, Jolanta Bachan

Advanced unsupervised joint prosody labeling and modeling for Mandarin speech and its application to prosody generation for TTS
Chen-Yu Chiang, Sin-Horng Chen, Yih-Ru Wang

Optimization of t-tilt F0 modeling
Ausdang Thangthai, Anocha Rugchatjaroen, Nattanun Thatphithakkul, Ananlada Chotimongkol, Chai Wutiwiwatchai

A multi-level context-dependent prosodic model applied to durational modeling
Nicolas Obin, Xavier Rodet, Anne Lacheret-Dujour

Sentiment classification in English from sentence-level annotations of emotions regarding models of affect
Alexandre Trilla, Francesc Alías

Identification of contrast and its emphatic realization in HMM based speech synthesis
Leonardo Badino, J. Sebastian Andersson, Junichi Yamagishi, Robert A. J. Clark

How to improve TTS systems for emotional expressivity
Antonio Rui Ferreira Rebordao, Mostafa Al Masum Shaikh, Keikichi Hirose, Nobuaki Minematsu

State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis
Yi-Jian Wu, Yoshihiko Nankaku, Keiichi Tokuda

Real voice and TTS accent effects on intelligibility and comprehension for Indian speakers of English as a second language
Frederick Weber, Kalika Bali

Improving consistence of phonetic transcription for text-to-speech
Pablo Daniel Agüero, Antonio Bonafonte, Juan Carlos Tulli


Automatic Speech Recognition: Adaptation I, II


On the development of matched and mismatched Italian children's speech recognition systems
Piero Cosi

Combination of acoustic and lexical speaker adaptation for disordered speech recognition
Oscar Saz, Eduardo Lleida, Antonio Miguel

Bilinear transformation space-based maximum likelihood linear regression frameworks
Hwa Jeon Song, Yongwon Jeong, Hyung Soon Kim

Speaking style adaptation for spontaneous speech recognition using multiple-regression HMM
Yusuke Ijima, Takeshi Matsubara, Takashi Nose, Takao Kobayashi

Acoustic class specific VTLN-warping using regression class trees
S. P. Rath, S. Umesh

Speaker normalization for template based speech recognition
Sébastien Demange, Dirk Van Compernolle

Improving the robustness with multiple sets of HMMs
Hans-Günter Hirsch, Andreas Kitzig

On the use of pitch normalization for improving children's speech recognition
Rohit Sinha, Shweta Ghai

Using VTLN matrices for rapid and computationally-efficient speaker adaptation with robustness to first-pass transcription errors
S. P. Rath, S. Umesh, A. K. Sarkar

Speaker adaptation based on two-step active learning
Koichi Shinoda, Hiroko Murakami, Sadaoki Furui

Tree-based estimation of speaker characteristics for speech recognition
Mats Blomberg, Daniel Elenius

A study on the influence of covariance adaptation on Jacobian compensation in vocal tract length normalization
D. R. Sanand, S. P. Rath, S. Umesh

On the estimation and the use of confusion-matrices for improving ASR accuracy
Omar Caballero Morales, Stephen J. Cox

A study on soft margin estimation of linear regression parameters for speaker adaptation
Shigeki Matsuda, Yu Tsao, Jinyu Li, Satoshi Nakamura, Chin-Hui Lee

Exploring the role of spectral smoothing in context of children's speech recognition
Shweta Ghai, Rohit Sinha

Unsupervised lattice-based acoustic model adaptation for speaker-dependent conversational telephone speech transcription
K. Thambiratnam, F. Seide

Rapid unsupervised adaptation using frame independent output probabilities of gender and context independent phoneme models
Satoshi Kobashikawa, Atsunori Ogawa, Yoshikazu Yamaguchi, Satoshi Takahashi

Bark-shift based nonlinear speaker normalization using the second subglottal resonance
Shizhen Wang, Yi-Hui Lee, Abeer Alwan


Applications in Learning and Other Areas


Designing spoken tutorial dialogue with children to elicit predictable but educationally valuable responses
Gregory Aist, Jack Mostow

Optimizing non-native speech recognition for CALL applications
Joost van Doremalen, Helmer Strik, Catia Cucchiarini

Evaluation of English intonation based on combination of multiple evaluation scores
Akinori Ito, Tomoaki Konno, Masashi Ito, Shozo Makino

A language-independent feature set for the automatic evaluation of prosody
Andreas Maier, F. Hönig, V. Zeissler, Anton Batliner, E. Körner, N. Yamanaka, P. Ackermann, Elmar Nöth

Adapting the acoustic model of a speech recognizer for varied proficiency non-native spontaneous speech using read speech with language-specific pronunciation difficulty
Klaus Zechner, Derrick Higgins, René Lawless, Yoko Futagi, Sarah Ohls, George Ivanov

Analysis and utilization of MLLR speaker adaptation technique for learners' pronunciation evaluation
Dean Luo, Yu Qiao, Nobuaki Minematsu, Yutaka Yamauchi, Keikichi Hirose

Control of human generating force by use of acoustic information - study on onomatopoeic utterances for controlling small lifting-force
Miki Iimura, Taichi Sato, Kihachiro Tanaka

Mi-DJ: a multi-source intelligent DJ service
Ching-Hsien Lee, Hsu-Chih Wu

Human voice or prompt generation? Can they co-exist in an application?
Géza Németh, Csaba Zainkó, Mátyás Bartalis, Gábor Olaszy, Géza Kiss

Automatic vs. human question answering over multimedia meeting recordings
Quoc Anh Le, Andrei Popescu-Belis







Speech and Audio Segmentation and Classification


Wavelet-based speaker change detection in single channel speech data
Michael Wiesenegger, Franz Pernkopf

An adaptive threshold computation for unsupervised speaker segmentation
Laura Docio-Fernandez, Paula Lopez-Otero, Carmen Garcia-Mateo

A data-driven approach for estimating the time-frequency binary mask
Gibak Kim, Philipos C. Loizou

A semi-supervised version of heteroscedastic linear discriminant analysis
Haolang Zhou, Damianos Karakos, Andreas G. Andreou

Self-learning vector quantization for pattern discovery from speech
Okko Johannes Räsänen, Unto Kalervo Laine, Toomas Altosaar

Monaural segregation of voiced speech using discriminative random fields
Rohit Prabhavalkar, Zhaozhang Jin, Eric Fosler-Lussier

Advancements in whisper-island detection within normally phonated audio streams
Chi Zhang, John H. L. Hansen

Joint segmentation and classification of dialog acts using conditional random fields
Matthias Zimmermann

Exploring complex vowels as phrase break correlates in a corpus of English speech with proPOSEL, a prosody and POS English lexicon
Claire Brierley, Eric Atwell

Automatic topic detection of recorded voice messages
Caroline Clemens, Stefan Feldes, Karlheinz Schuhmacher, Joachim Stegmann

Identification and automatic detection of parasitic speech sounds
Jindřich Matoušek, Radek Skarnitzl, Pavel Machač, Jan Trmal

Phonetic alignment for speech synthesis in under-resourced languages
D. R. van Niekerk, Etienne Barnard

Improving initial boundary estimation for HMM-based automatic phonetic segmentation
Kalu U. Ogbureke, Julie Carson-Berndsen



Special Session: Advanced Voice Function Assessment


Acoustic and high-speed digital imaging based analysis of pathological voice contributes to better understanding and differential diagnosis of neurological dysphonias and of mimicking phonatory disorders
Krzysztof Izdebski, Yuling Yan, Melda Kunduk

Normalized modulation spectral features for cross-database voice pathology detection
Maria Markaki, Yannis Stylianou

Speech sample salience analysis for speech cycle detection
C. Mertens, Francis Grenez, Jean Schoentgen

The use of telephone speech recordings for assessment and monitoring of cognitive function in elderly people
Viliam Rapcan, Shona D'Arcy, Nils Penard, Ian H. Robertson, Richard B. Reilly

Optimized feature set to assess acoustic perturbations in dysarthric speech
Sunil Nagaraja, Eduardo Castillo-Guerra

A microphone-independent visualization technique for speech disorders
Andreas Maier, Stefan Wenhardt, Tino Haderlein, Maria Schuster, Elmar Nöth

Evaluation of the effect of the GSM full rate codec on the automatic detection of laryngeal pathologies based on cepstral analysis
Rubén Fraile, Carmelo Sánchez, Juan I. Godino-Llorente, Nicolás Sáenz-Lechón, Víctor Osma-Ruiz, Juana M. Gutiérrez

Cepstral analysis of vocal dysperiodicities in disordered connected speech
A. Alpan, Jean Schoentgen, Y. Maryn, Francis Grenez, P. Murphy

Standard information from patients: the usefulness of self-evaluation (measured with the French version of the VHI)
Lise Crevier-Buchman, Stephanie Borel, Stéphane Hans, Madeleine Menard, Jacqueline Vaissiere

Intelligibility assessment in children with cleft lip and palate in Italian and German
Marcello Scipioni, Matteo Gerosa, Diego Giuliani, Elmar Nöth, Andreas Maier

Universidade de Aveiro's voice evaluation protocol
Luis M. T. Jesus, Anna Barney, Ricardo Santos, Janine Caetano, Juliana Jorge, Pedro Sá Couto



Prosody: Production I, II


Did you say a BLUE banana? The prosody of contrast and abnormality in Bulgarian and Dutch
Diana V. Dimitrova, Gisela Redeker, John C. J. Hoeks

A quantitative study of F0 peak alignment and sentence modality
Hansjörg Mixdorff, Hartmut R. Pfitzinger

Closely related languages, different ways of realizing focus
Szu-wei Chen, Bei Wang, Yi Xu

Cross-variety rhythm typology in Portuguese
Plínio A. Barbosa, M. Céu Viana, Isabel Trancoso

Pitch adaptation in different age groups: boundary tones versus global pitch
Marie Nilsenová, Marc Swerts, Véronique Houtepen, Heleen Dittrich

Backchannel-inviting cues in task-oriented dialogue
Agustín Gravano, Julia Hirschberg

Perception and production of boundary tones in whispered Dutch
W. Heeren, V. J. Van Heuven

Pitch accents and information status in a German radio news corpus
Katrin Schweitzer, Arndt Riester, Michael Walsh, Grzegorz Dogil

Analysis of voice fundamental frequency contours of continuing and terminating prosodic phrases in four Swiss German dialects
Adrian Leemann, Keikichi Hirose, Hiroya Fujisaki

Intonational features for identifying regional accents of Italian
Michelina Savino

Analysis and recognition of accentual patterns
Agnieszka Wagner

Using responsive prosodic variation to acknowledge the user's current state
Nigel G. Ward, Rafael Escalante-Ruiz

Intonation segments and segmental intonation
Oliver Niebuhr

The phrase-final accent in Kammu: effects of tone, focus and engagement
David House, Anastasia Karlsson, Jan-Olof Svantesson, Damrong Tayanin

Tonal alignment in three varieties of Hiberno-English
Raya Kalaldeh, Amelie Dorn, Ailbhe Ní Chasaide

Determining intonational boundaries from the acoustic signal
Lourdes Aguilar, Antonio Bonafonte, Francisco Campillo, David Escudero

Compression and truncation revisited
Claudia K. Ohl, Hartmut R. Pfitzinger

Comparison of Fujisaki-model extractors and F0 stylizers
Hartmut R. Pfitzinger, Hansjörg Mixdorff, Jan Schwarz

Is tonal alignment interpretation independent of methodology?
Caterina Petrone, Mariapaola D'Imperio

Modeling the intonation of topic structure: two approaches
Margaret Zellers, Brechtje Post, Mariapaola D'Imperio




Speech Processing with Audio or Audiovisual Input


Application of differential microphone array for IS-127 EVRC rate determination algorithm
Henry Widjaja, Suryoadhi Wibowo

Estimating the position and orientation of an acoustic source with a microphone array network
Alberto Yoshihiro Nakano, Seiichi Nakagawa, Kazumasa Yamamoto

Singing voice detection in polyphonic music using predominant pitch
Vishweshwara Rao, S. Ramakrishnan, Preeti Rao

Word stress assessment for computer aided language learning
Juan Pablo Arias, Nestor Becerra Yoma, Hiram Vivanco

A non-intrusive signal-based model for speech quality evaluation using automatic classification of background noises
Adrien Leman, Julien Faure, Etienne Parizet

Acoustic event detection for spotting “hot spots” in podcasts
Kouhei Sumi, Tatsuya Kawahara, Jun Ogata, Masataka Goto

Improving detection of acoustic events using audiovisual data and feature level fusion
T. Butko, C. Canton-Ferrer, C. Segura, X. Giró, C. Nadeu, J. Hernando, J. R. Casas

Detecting audio events for semantic video search
M. Bugalho, J. Portêlo, Isabel Trancoso, T. Pellegrini, Alberto Abad

Factor analysis for audio-based video genre classification
Mickael Rouvier, Driss Matrouf, Georges Linarès

Robust audio-based classification of video genre
Mickael Rouvier, Georges Linarès, Driss Matrouf

Fusing audio and video information for online speaker diarization
Joerg Schmalenstroeer, Martin Kelling, Volker Leutnant, Reinhold Haeb-Umbach

Multimodal speaker verification using ancillary known speaker characteristics such as gender or age
Girija Chetty, Michael Wagner

Discovering keywords from cross-modal input: ecological vs. engineering methods for enhancing acoustic repetitions
Guillaume Aimetti, Roger K. Moore, L. ten Bosch, Okko Johannes Räsänen, Unto Kalervo Laine



Robust Automatic Speech Recognition I-III


Optimization of dereverberation parameters based on likelihood of speech recognizer
Randy Gomez, Tatsuya Kawahara

Application of noise robust MDT speech recognition on the SPEECON and SpeechDat-Car databases
J. F. Gemmeke, Y. Wang, Maarten Van Segbroeck, B. Cranen, Hugo Van hamme

Model based feature enhancement for automatic speech recognition in reverberant environments
Alexander Krueger, Reinhold Haeb-Umbach

A study of mutual front-end processing method based on statistical model for noise robust speech recognition
Masakiyo Fujimoto, Kentaro Ishizuka, Tomohiro Nakatani

Integrating codebook and utterance information in cepstral statistics normalization techniques for robust speech recognition
Guan-min He, Jeih-weih Hung

Reduced complexity equalization of Lombard effect for speech recognition in noisy adverse environments
Hynek Bořil, John H. L. Hansen

Unsupervised training scheme with non-stereo data for empirical feature vector compensation
L. Buera, Antonio Miguel, Alfonso Ortega, Eduardo Lleida, Richard M. Stern

Incremental adaptation with VTS and joint adaptively trained systems
F. Flego, M. J. F. Gales

Target speech GMM-based spectral compensation for noise robust speech recognition
Takahiro Shinozaki, Sadaoki Furui

Noise-robust feature extraction based on forward masking
Sheng-Chiuan Chiou, Chia-Ping Chen

Noisy speech recognition by using output combination of discrete-mixture HMMs and continuous-mixture HMMs
Tetsuo Kosaka, You Saito, Masaharu Kato

Adaptive training with noisy constrained maximum likelihood linear regression for noise robust speech recognition
D. K. Kim, M. J. F. Gales

Performance comparisons of the integrated parallel model combination approaches with front-end noise reduction
Guanghu Shen, Soo-Young Suk, Hyun-Yeol Chung

Tuning support vector machines for robust phoneme classification with acoustic waveforms
Jibran Yousafzai, Zoran Cvetković, Peter Sollich

An analytic derivation of a phase-sensitive observation model for noise robust speech recognition
Volker Leutnant, Reinhold Haeb-Umbach

Variational model composition for robust speech recognition with time-varying background noise
Wooil Kim, John H. L. Hansen

Comparison of estimation techniques in joint uncertainty decoding for noise robust speech recognition
Haitian Xu, K. K. Chin

Replacing uncertainty decoding with subband re-estimation for large vocabulary speech recognition in noise
Jianhua Lu, Ji Ming, Roger Woods

Accounting for the uncertainty of speech estimates in the complex domain for minimum mean square error speech enhancement
Ramón Fernandez Astudillo, Dorothea Kolossa, Reinhold Orglmeister

Signal separation for robust speech recognition based on phase difference information obtained in the frequency domain
Chanwoo Kim, Kshitiz Kumar, Bhiksha Raj, Richard M. Stern

Transforming features to compensate speech recogniser models for noise
R. C. van Dalen, F. Flego, M. J. F. Gales

Subband temporal modulation spectrum normalization for automatic speech recognition in reverberant environments
Xugang Lu, Masashi Unoki, Satoshi Nakamura

Robust in-car spelling recognition - a tandem BLSTM-HMM approach
Martin Wöllmer, Florian Eyben, Björn Schuller, Yang Sun, Tobias Moosmayr, Nhu Nguyen-Thien

Applying non-negative matrix factorization on time-frequency reassignment spectra for missing data mask estimation
Maarten Van Segbroeck, Hugo Van hamme


Speaker Verification and Identification I-III


Investigation into variants of joint factor analysis for speaker recognition
Lukáš Burget, Pavel Matějka, Valiantsina Hubeika, Jan Černocký

Improved GMM-based speaker verification using SVM-driven impostor dataset selection
Mitchell McLaren, Robbie Vogt, Brendan Baker, Sridha Sridharan

Adaptive individual background model for speaker verification
Yossi Bar-Yosef, Yuval Bistritz

Optimization of discriminative kernels in SVM speaker verification
Shi-Xiong Zhang, Man-Wai Mak

UBM-based sequence kernel for speaker recognition
Zhenchun Lei

GMM kernel by Taylor series for speaker verification
Minqiang Xu, Xi Zhou, Beiqian Dai, Thomas S. Huang

Does session variability compensation in speaker recognition model intrinsic variation under mismatched conditions?
Elizabeth Shriberg, Sachin Kajarekar, Nicolas Scheffer

Variability compensated support vector machines applied to speaker verification
Zahi N. Karam, W. M. Campbell

Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification
Najim Dehak, Réda Dehak, Patrick Kenny, Niko Brümmer, Pierre Ouellet, Pierre Dumouchel

Within-session variability modelling for factor analysis speaker verification
Robbie Vogt, Jason Pelecanos, Nicolas Scheffer, Sachin Kajarekar, Sridha Sridharan

Speaker recognition by Gaussian information bottleneck
Ron M. Hecht, Elad Noor, Naftali Tishby

Variational dynamic kernels for speaker verification
C. Longworth, R. C. van Dalen, M. J. F. Gales

Mel, linear, and antimel frequency cepstral coefficients in broad phonetic regions for telephone speaker recognition
Howard Lei, Eduardo Lopez

Fast GMM computation for speaker verification using scalar quantization and discrete densities
Guoli Ye, Brian Mak, Man-Wai Mak

Text-independent speaker identification using vocal tract length normalization for building universal background model
A. K. Sarkar, S. Umesh, S. P. Rath

BUT system for NIST 2008 speaker recognition evaluation
Lukáš Burget, Michal Fapšo, Valiantsina Hubeika, Ondřej Glembek, Martin Karafiát, Marcel Kockmann, Pavel Matějka, Petr Schwarz, Jan Černocký

Selection of the best set of shifted delta cepstral features in speaker verification using mutual information
José R. Calvo, Rafael Fernández, Gabriel Hernández

Forensic speaker recognition using traditional features comparing automatic and human-in-the-loop formant tracking
Alberto de Castro, Daniel Ramos, Joaquin Gonzalez-Rodriguez

Open-set speaker identification under mismatch conditions
S. G. Pillay, A. Ariyaeeinia, P. Sivakumaran, M. Pawlewski

Minivectors: an improved GMM-SVM approach for speaker verification
Xavier Anguera

Robustness of phase based features for speaker recognition
R. Padmanabhan, Sree Hari Krishnan Parthasarathi, Hema A. Murthy

The MIT Lincoln Laboratory 2008 speaker recognition system
D. E. Sturim, W. M. Campbell, Zahi N. Karam, Douglas Reynolds, F. S. Richardson

Speaker recognition on lossy compressed speech using the Speex codec
A. R. Stauffer, A. D. Lawson

Text-independent speaker verification using rank threshold in large number of speaker models
Haruka Okamoto, Satoru Tsuge, Amira Abdelwahab, Masafumi Nishida, Yasuo Horiuchi, Shingo Kuroiwa

The role of age in factor analysis for speaker identification
Yun Lei, John H. L. Hansen

Do humans and speaker verification system use the same information to differentiate voices?
Juliette Kahn, Solange Rossato



Single- and Multichannel Speech Enhancement


Watermark recovery from speech using inverse filtering and sign correlation
Robert Morris, Ralph Johnson, Vladimir Goncharoff, Joseph DiVita

Weighted linear prediction for speech analysis in noisy conditions
Jouni Pohjalainen, Heikki Kallasjoki, Kalle J. Palomäki, Mikko Kurimo, Paavo Alku

Log-spectral magnitude MMSE estimators under super-Gaussian densities
Richard C. Hendriks, Richard Heusdens, Jesper Jensen

Speech enhancement in a 2-dimensional area based on power spectrum estimation of multiple areas with investigation of existence of active sources
Yusuke Hioka, Ken'ichi Furuya, Yoichi Haneda, Akitoshi Kataoka

Modulation domain spectral subtraction for speech enhancement
Kuldip Paliwal, Belinda Schwerin, Kamil Wójcicki

Variational loopy belief propagation for multi-talker speech recognition
Steven J. Rennie, John R. Hershey, Peder A. Olsen

Enhancement of binaural speech using codebook constrained iterative binaural Wiener filter
Nadir Cazi, T. V. Sreenivas

A semi-blind source separation method with a less amount of computation suitable for tiny DSP modules
Kazunobu Kondo, Makoto Yamada, Hideki Kenmochi

Model-based speech separation: identifying transcription using orthogonality
S. W. Lee, Frank K. Soong, Tan Lee

Enhanced minimum statistics technique incorporating soft decision for noise suppression
Yun-Sik Park, Ji-Hyun Song, Jae-Hun Choi, Joon-Hyuk Chang

Effect of noise reduction on reaction time to speech in noise
Mark Huckvale, Jayne Leak

Joint noise reduction and dereverberation of speech using hybrid TF-GSC and adaptive MMSE estimator
Behdad Dashtbozorg, Hamid Reza Abutalebi

A study on multiple sound source localization with a distributed microphone system
Kook Cho, Takanobu Nishiura, Yoichi Yamashita

Robust minimal variance distortionless speech power spectra enhancement using order statistic filter for microphone array
Tao Yu, John H. L. Hansen

Speech enhancement minimizing generalized Euclidean distortion using supergaussian priors
Amit Das, John H. L. Hansen

STFT-based speech enhancement by reconstructing the harmonics
Iman Haji Abolhassani, Sid-Ahmed Selouani, Douglas O'Shaughnessy

Joint speech enhancement and speaker identification using Monte Carlo methods
Ciira wa Maina, John MacLaren Walsh



Assistive Speech Technology


Personalizing synthetic voices for people with progressive speech disorders: judging voice similarity
S. M. Creer, S. P. Cunningham, P. D. Green, K. Fatema

Electrolaryngeal speech enhancement based on statistical voice conversion
Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

Age recognition for spoken dialogue systems: do we need it?
Maria Wolters, Ravichander Vipperla, Steve Renals

Speech-based and multimodal media center for different user groups
Markku Turunen, Jaakko Hakulinen, Aleksi Melto, Juho Hella, Juha-Pekka Rajaniemi, Erno Mäkinen, Jussi Rantala, Tomi Heimonen, Tuuli Laivo, Hannu Soronen, Mervi Hansen, Pellervo Valkama, Toni Miettinen, Roope Raisamo

Virtual speech reading support for hard of hearing in a domestic multi-media setting
Samer Al Moubayed, Jonas Beskow, Ann-Marie Öster, Giampiero Salvi, Björn Granström, Nic van Son, Ellen Ormel

Real-time correction of closed-captions
Patrick Cardinal, Gilles Boulianne

Universal access: speech recognition for talkers with spastic dysarthria
Harsh Vardhan Sharma, Mark Hasegawa-Johnson

Exploring speech therapy games with children on the autism spectrum
Mohammed E. Hoque, Joseph K. Lane, Rana el Kaliouby, Matthew Goodwin, Rosalind W. Picard

Analyzing GMMs to characterize resonance anomalies in speakers suffering from apnoea
José Luis Blanco, Rubén Fernández, David Pardo, Álvaro Sigüenza, Luis A. Hernández, José Alcázar

On the mutual information between source and filter contributions for voice pathology detection
Thomas Drugman, Thomas Dubuisson, Thierry Dutoit

A system for detecting miscues in dyslexic read speech
Morten Højfeldt Rasmussen, Zheng-Hua Tan, Børge Lindberg, Søren Holdt Jensen


Topics in Spoken Language Processing


Techniques for rapid and robust topic identification of conversational telephone speech
Jonathan Wintrode, Scott Kulp

Localization of speech recognition in spoken dialog systems: how machine translation can make our lives easier
David Suendermann, Jackson Liscombe, Krishna Dayanidhi, Roberto Pieraccini

Algorithms for speech indexing in Microsoft Recite
Kunal Mukerjee, Shankar Regunathan, Jeffrey Cole

Parallelized Viterbi processor for 5,000-word large-vocabulary real-time continuous speech recognition FPGA system
Tsuyoshi Fujinaga, Kazuo Miura, Hiroki Noguchi, Hiroshi Kawaguchi, Masahiko Yoshimoto

SplaSH (spoken language search hawk): integrating time-aligned with text-aligned annotations
Sara Romano, Elvio Cecere, Francesco Cutugno

Podcastle: collaborative training of acoustic models on the basis of wisdom of crowds for podcast transcription
Jun Ogata, Masataka Goto

A WFST-based log-linear framework for speaking-style transformation
Graham Neubig, Shinsuke Mori, Tatsuya Kawahara

Clusterrank: a graph based method for meeting summarization
Nikhil Garg, Benoit Favre, Korbinian Riedhammer, Dilek Hakkani-Tür

Leveraging sentence weights in a concept-based optimization framework for extractive meeting summarization
Shasha Xie, Benoit Favre, Dilek Hakkani-Tür, Yang Liu

Hybrids of supervised and unsupervised models for extractive speech summarization
Shih-Hsiang Lin, Yueng-Tien Lo, Yao-Ming Yeh, Berlin Chen

Automatic detection of audio advertisements
I. Dan Melamed, Yeon-Jun Kim

Named entity network based on Wikipedia
Sameer Maskey, Wisam Dakka



Emotion and Expression I, II


Emotion dimensions and formant position
Martijn Goudbeek, Jean Philippe Goldman, Klaus R. Scherer

Identifying uncertain words within an utterance via prosodic features
Heather Pon-Barry, Stuart Shieber

Evaluating evaluators: a case study in understanding the benefits and pitfalls of multi-evaluator modeling
Emily Mower, Maja J. Matarić, Shrikanth S. Narayanan

Responding to user emotional state by adding emotional coloring to utterances
Jaime C. Acosta, Nigel G. Ward

Analysis of laugh signals for detecting in continuous speech
K. Sudheer Kumar, M. Sri Harish Reddy, K. Sri Rama Murty, B. Yegnanarayana

Data-driven clustering in emotional space for affect recognition using discriminatively trained LSTM networks
Martin Wöllmer, Florian Eyben, Björn Schuller, Ellen Douglas-Cowie, Roddy Cowie

Perceiving surprise on cue words: prosody and semantics interact on right and really
Catherine Lai

Emotion recognition using linear transformations in combination with video
Rok Gajšek, Vitomir Štruc, Simon Dobrišek, France Mihelič

Speaker dependent emotion recognition using prosodic supervectors
Ignacio Lopez-Moreno, Carlos Ortego-Resa, Joaquin Gonzalez-Rodriguez, Daniel Ramos

Physiologically-inspired feature extraction for emotion recognition
Yu Zhou, Yanqing Sun, Junfeng Li, Jianping Zhang, Yonghong Yan

Perceived loudness and voice quality in affect cueing
Irena Yanushevskaya, Christer Gobl, Ailbhe Ní Chasaide

Modeling mutual influence of interlocutor emotion states in dyadic spoken interactions
Chi-Chun Lee, Carlos Busso, Sungbok Lee, Shrikanth S. Narayanan

A detailed study of word-position effects on emotion expression in speech
Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan

CMAC for speech emotion profiling
Norhaslinda Kamaruddin, Abdul Wahab

On the relevance of high-level features for speaker independent emotion recognition of spontaneous speech
Marko Lugger, Bin Yang

Recognising interest in conversational speech - comparing bag of frames and supra-segmental features
Björn Schuller, Gerhard Rigoll


Voice Transformation I, II


Many-to-many eigenvoice conversion with reference voice
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

Alleviating the one-to-many mapping problem in voice conversion with context-dependent modeling
Elizabeth Godoy, Olivier Rosec, Thierry Chonavel

Efficient modeling of temporal structure of speech for applications in voice transformation
Binh Phu Nguyen, Masato Akagi

Cross-language voice conversion based on eigenvoices
Malorie Charlier, Yamato Ohtani, Tomoki Toda, Alexis Moinet, Thierry Dutoit

Voice conversion using k-histograms and frame selection
Alejandro José Uriz, Pablo Daniel Agüero, Antonio Bonafonte, Juan Carlos Tulli

Online model adaptation for voice conversion using model-based speech synthesis techniques
Dalei Wu, Baojie Li, Hui Jiang, Qian-Jie Fu

HMM adaptation and voice conversion for the synthesis of child speech: a comparison
Oliver Watts, Junichi Yamagishi, Simon King, Kay Berkling

HMM-based speaker characteristics emphasis using average voice model
Takashi Nose, Junichi Adada, Takao Kobayashi

An evaluation methodology for prosody transformation systems based on chirp signals
Damien Lolive, Nelly Barbot, Olivier Boeffard

Voice morphing based on interpolation of vocal tract area functions using AR-HMM analysis of speech
Yoshiki Nambu, Masahiko Mikawa, Kazuyo Tanaka

A novel model-based pitch conversion method for Mandarin speech
Hsin-Te Hwang, Chen-Yu Chiang, Po-Yi Sung, Sin-Horng Chen

Observation of empirical cumulative distribution of vowel spectral distances and its application to vowel based voice conversion
Hideki Kawahara, Masanori Morise, Toru Takahashi, Hideki Banno, Ryuichi Nisimura, Toshio Irino

Japanese pitch conversion for voice morphing based on differential modeling
Ryuki Tachibana, Zhiwei Shuang, Masafumi Nishimura

A novel technique for voice conversion based on style and content decomposition with bilinear models
Victor Popa, Jani Nurminen, Moncef Gabbouj

Rule-based voice quality variation with formant synthesis
Felix Burkhardt



Prosody Perception and Language Acquisition


Perception of English compound vs. phrasal stress: natural vs. synthetic speech
Irene Vogel, Arild Hestvik, H. Timothy Bunnell, Laura Spinu

New method for delexicalization and its application to prosodic tagging for text-to-speech synthesis
Martti Vainio, Antti Suni, Tuomo Raitio, Jani Nurminen, Juhani Järvikivi, Paavo Alku

Speech rate and pauses in non-native Finnish
Minnaleena Toivola, Mietta Lennes, Eija Aho

Modelling similarity perception of intonation
Uwe D. Reichel, Felicitas Kleber, Raphael Winkelmann

Studying L2 suprasegmental features in Asian Englishes: a position paper
Helen Meng, Chiu-yu Tseng, Mariko Kondo, Alissa Harrison, Tanya Viscelgia

Classification of disfluent phenomena as fluent communicative devices in specific prosodic contexts
Helena Moniz, Isabel Trancoso, Ana Isabel Mata

Cross-cultural perception of discourse phenomena
Rolf Carlson, Julia Hirschberg

Modelling vocabulary growth from birth to young adulthood
Roger K. Moore, L. ten Bosch

Adaptive non-negative matrix factorization in a computational model of language acquisition
Joris Driesen, L. ten Bosch, Hugo Van hamme

Classifying clear and conversational speech based on acoustic features
Akiko Amano-Kusumoto, John-Paul Hosom, Izhak Shafran

The acoustic characteristics of Russian vowels in children of 6 and 7 years of age
Elena E. Lyakso, Olga V. Frolova, Aleks S. Grigoriev

Japanese children's acquisition of prosodic politeness expressions
Takaaki Shochi, Donna Erickson, Kaoru Sekiyama, Albert Rilliard, Véronique Aubergé

Perceptual training of singleton and geminate stops in Japanese language by Korean learners
Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka


Resources, Annotation and Evaluation


Resources for speech research: present and future infrastructure needs
Lou Boves, Rolf Carlson, Erhard Hinrichs, David House, Steven Krauwer, Lothar Lemnitzer, Martti Vainio, Peter Wittenburg

Speech recordings via the internet: an overview of the VOYS project in Scotland
Catherine Dickie, Felix Schaeffler, Christoph Draxler, Klaus Jänsch

The multi-session audio research project (MARP) corpus: goals, design and initial findings
A. D. Lawson, A. R. Stauffer, E. J. Cupples, S. J. Wenndt, W. P. Bray, J. J. Grieco

Structure and annotation of Polish LVCSR speech database
Katarzyna Klessa, Grażyna Demenko

Balanced corpus of informal spoken Czech: compilation, design and findings
Martina Waclawičová, Michal Křen, Lucie Válková

JTrans: an open-source software for semi-automatic text-to-speech alignment
C. Cerisara, O. Mella, D. Fohr

Predicting the quality of multimodal systems based on judgments of single modalities
Ina Wechsung, Klaus-Peter Engelbrecht, Anja B. Naumann, Stefan Schaffer, Julia Seebode, Florian Metze, Sebastian Möller

Auto-checking speech transcriptions by multiple template constrained posterior
Lijuan Wang, Shenghao Qin, Frank K. Soong

Subjective experiments on influence of response timing in spoken dialogues
Toshihiko Itoh, Norihide Kitaoka, Ryota Nishimura

Usability study of VUI consistent with GUI focusing on age-groups
Jun Okamoto, Tomoyuki Kato, Makoto Shozakai

Annotating communicative function and semantic content in dialogue act for construction of consulting dialogue systems
Teruhisa Misu, Kiyonori Ohtake, Chiori Hori, Hideki Kashioka, Satoshi Nakamura

Improved speech summarization with multiple-hypothesis representations and Kullback-Leibler divergence measures
Shih-Hsiang Lin, Berlin Chen

An improved speech segmentation quality measure: the r-value
Okko Johannes Räsänen, Unto Kalervo Laine, Toomas Altosaar

No sooner said than done? testing incrementality of semantic interpretations of spontaneous speech
Michaela Atterer, Timo Baumann, David Schlangen





ASR: New Paradigms I, II


The semi-supervised switchboard transcription project
Amarnag Subramanya, Jeff Bilmes

Maximum mutual information multi-phone units in direct modeling
Geoffrey Zweig, Patrick Nguyen

Profiling large-vocabulary continuous speech recognition on embedded devices: a hardware resource sensitivity analysis
Kai Yu, Rob A. Rutenbar

Continuous speech recognition using attention shift decoding with soft decision
Ozlem Kalinli, Shrikanth S. Narayanan

Towards using hybrid word and fragment units for vocabulary independent LVCSR systems
Ariya Rastrow, Abhinav Sethy, Bhuvana Ramabhadran, Frederick Jelinek

Unsupervised training of an HMM-based speech recognizer for topic classification
Herbert Gish, Man-hung Siu, Arthur Chan, Bill Belfield

The case for case-based automatic speech recognition
Viktoria Maier, Roger K. Moore

A self-labeling speech corpus: collecting spoken words with an online educational game
Ian McGraw, Alexander Gruenstein, Andrew Sutherland

A noise robust method for pattern discovery in quantized time series: the concept matrix approach
Okko Johannes Räsänen, Unto Kalervo Laine, Toomas Altosaar

Using parallel architectures in speech recognition
Patrick Cardinal, Pierre Dumouchel, Gilles Boulianne

Example-based speech recognition using formulaic phrases
Christopher J. Watkins, Stephen J. Cox

Parallel fast likelihood computation for LVCSR using mixture decomposition
Naveen Parihar, Ralf Schlüter, David Rybach, Eric A. Hansen

An indexing weight for voice-to-text search
Chen Liu

On invariant structural representation for speech recognition: theoretical validation and experimental improvement
Yu Qiao, Nobuaki Minematsu, Keikichi Hirose

Articulatory feature asynchrony analysis and compensation in detection-based ASR
I-Fan Chen, Hsin-Min Wang

CRANDEM: conditional random fields for word recognition
Jeremy Morris, Eric Fosler-Lussier

HEAR: an hybrid episodic-abstract speech recognizer
Sébastien Demange, Dirk Van Compernolle





LVCSR Systems and Spoken Term Detection


Real-time live broadcast news subtitling system for Spanish
Alfonso Ortega, Jose Enrique Garcia, Antonio Miguel, Eduardo Lleida

Development of the 2008 SRI Mandarin speech-to-text system for broadcast news and conversation
Xin Lei, Wei Wu, Wen Wang, Arindam Mandal, Andreas Stolcke

Multifactor adaptation for Mandarin broadcast news and conversation speech recognition
Wen Wang, Arindam Mandal, Xin Lei, Andreas Stolcke, Jing Zheng

Development of the GALE 2008 Mandarin LVCSR system
C. Plahl, Björn Hoffmeister, Georg Heigold, Jonas Lööf, Ralf Schlüter, Hermann Ney

The RWTH Aachen University open source speech recognition system
David Rybach, Christian Gollan, Georg Heigold, Björn Hoffmeister, Jonas Lööf, Ralf Schlüter, Hermann Ney

Online detecting end times of spoken utterances for synchronization of live speech and its transcripts
Jie Gao, Qingwei Zhao, Yonghong Yan

Real-time ASR from meetings
Philip N. Garner, John Dines, Thomas Hain, Asmaa El Hannani, Martin Karafiát, Danil Korchagin, Mike Lincoln, Vincent Wan, Le Zhang

Improvements to the LIUM French ASR system based on CMU Sphinx: what helps to significantly reduce the word error rate?
Paul Deléglise, Yannick Estève, Sylvain Meignier, Teva Merlin

Merging search spaces for subword spoken term detection
Timo Mertens, Daniel Schneider, Joachim Köhler

A posterior probability-based system hybridisation and combination for spoken term detection
Javier Tejedor, Dong Wang, Simon King, Joe Frankel, José Colás

Stochastic pronunciation modelling for spoken term detection
Dong Wang, Simon King, Joe Frankel

Term-dependent confidence for out-of-vocabulary term detection
Dong Wang, Simon King, Joe Frankel, Peter Bell

A comparison of query-by-example methods for spoken term detection
Wade Shen, Christopher M. White, Timothy J. Hazen

Fast keyword detection using suffix array
Kouichi Katsurada, Shigeki Teshima, Tsuneo Nitta







Phonetics


How similar are clusters resulting from schwa deletion in French to identical underlying clusters?
Audrey Bürki, Cécile Fougeron, Christophe Veaux, Ulrich H. Frauenfelder

Word-final [t]-deletion: an analysis on the segmental and sub-segmental level
Barbara Schuppler, Wim van Dommelen, Jacques Koreman, Mirjam Ernestus

Rarefaction gestures and coarticulation in Mangetti Dune !Xung clicks
Amanda Miller, Abigail Scott, Bonny Sands, Sheena Shah

The acoustics of Mangetti Dune !Xung clicks
Amanda Miller, Sheena Shah

Acoustic characteristics of ejectives in Amharic
Hussien Seid, S. Rajendran, B. Yegnanarayana

Sentence-final particles in Hong Kong Cantonese: are they tonal or intonational?
Wing Li Wu

Same tone, different category: linguistic-tonetic variation in the areal tone acoustics of Chuqu Wu
William Steed, Phil Rose

Why would aspiration lower the pitch of the following vowel? observations from Leng-shui-jiang Chinese
Caicai Zhang

Dialectal characteristics of Osaka and Tokyo Japanese: analyses of phonologically identical words
Kanae Amino, Takayuki Arai

Categories and gradience in intonation: evidence from linguistics and neurobiology
Brechtje Post, Francis Nolan, Emmanuel Stamatakis, Toby Hudson

Exploring vocalization of /l/ in English: an EPG and EMA study
Mitsuhiro Nakamura

The monophthongs and diphthongs of north-eastern Welsh: an acoustic study
Robert Mayr, Hannah Davies

Voicing profile of Polish sonorants: [r] in obstruent clusters
J. Sieczkowska, Bernd Möbius, Antje Schweitzer, Michael Walsh, Grzegorz Dogil







Systems for Spoken Language Understanding


Classification-based strategies for combining multiple 5-w question answering systems
Sibel Yaman, Dilek Hakkani-Tür, Gokhan Tur, Ralph Grishman, Mary Harper, Kathleen R. McKeown, Adam Meyers, Kartavya Sharma

Combining semantic and syntactic information sources for 5-w question answering
Sibel Yaman, Dilek Hakkani-Tür, Gokhan Tur

Phrase and word level strategies for detecting appositions in speech
Benoit Favre, Dilek Hakkani-Tür

Error correction of proportions in spoken opinion surveys
Nathalie Camelin, Renato De Mori, Frederic Bechet, Géraldine Damnati

Transformation-based learning for semantic parsing
F. Jurčíček, M. Gašić, S. Keizer, F. Mairesse, B. Thomson, K. Yu, S. Young

Large-scale Polish SLU
Patrick Lehnen, Stefan Hahn, Hermann Ney, Agnieszka Mykowiecka

Optimizing CRFs for SLU tasks in various languages using modified training criteria
Stefan Hahn, Patrick Lehnen, Georg Heigold, Hermann Ney

Learning lexicons from spoken utterances based on statistical model selection
Ryo Taguchi, Naoto Iwahashi, Takashi Nose, Kotaro Funakoshi, Mikio Nakano

Improving speech understanding accuracy with limited training data using multiple language models and multiple understanding models
Masaki Katsumaru, Mikio Nakano, Kazunori Komatani, Kotaro Funakoshi, Tetsuya Ogata, Hiroshi G. Okuno

Low-cost call type classification for contact center calls using partial transcripts
Youngja Park, Wilfried Teiken, Stephen C. Gates

A new quality measure for topic segmentation of text and speech
Mehryar Mohri, Pedro Moreno, Eugene Weinstein

Concept segmentation and labeling for conversational speech
Marco Dinarelli, Alessandro Moschitti, Giuseppe Riccardi







Speaker and Speech Variability, Paralinguistic and Nonlinguistic Cues


A novel codebook search technique for estimating the open quotient
Yen-Liang Shue, Jody Kreiman, Abeer Alwan

Long term examination of intra-session and inter-session speaker variability
A. D. Lawson, A. R. Stauffer, B. Y. Smolenski, B. B. Pokines, M. Leonard, E. J. Cupples

Distorted visual information influences audiovisual perception of voicing
Ragnhild Eg, Dawn Behne

Perceived naturalness of a synthesizer of disordered voices
Samia Fraj, Francis Grenez, Jean Schoentgen

Audio-visual speech asynchrony modeling in a talking head
Alexey Karpov, Liliya Tsirulnik, Zdeněk Krňoul, Andrey Ronzhin, Boris Lobanov, Miloš Železný

The effects of fundamental frequency and formant space on speaker discrimination through bone-conducted ultrasonic hearing
Takayuki Kagomiya, Seiji Nakagawa

Automatic detection and prediction of topic changes through automatic detection of register variations and pause duration
Céline De Looze, Stéphane Rauzy

Analyzing features for automatic age estimation on cross-sectional data
Werner Spiegl, Georg Stemmer, Eva Lasarcyk, Varada Kolhatkar, Andrew Cassidy, Blaise Potard, Stephen Shum, Young Chol Song, Puyang Xu, Peter Beyerlein, James Harnsberger, Elmar Nöth

Intercultural differences in evaluation of pathological voice quality: perceptual and acoustical comparisons between RASATI and GRBASI scales
Emi Juliana Yamauchi, Satoshi Imaizumi, Hagino Maruyama, Tomoyuki Haji

F0 cues for the discourse functions of “hã” in Hindi
Kalika Bali

Audio spatialisation strategies for multitasking during teleconferences
Stuart N. Wrigley, Simon Tucker, Guy J. Brown, Steve Whittaker

Speech rate effects on linguistic change
Alexsandro R. Meireles, Plínio A. Barbosa

Mandarin spontaneous narrative planning - prosodic evidence from National Taiwan University lecture corpus
Chiu-yu Tseng, Zhao-yu Su, Lin-shan Lee




