ISCA Archive Interspeech 2005 Sessions Booklet

Interspeech 2005

Lisbon, Portugal
4-8 September 2005

General Chair: Isabel Trancoso
doi: 10.21437/Interspeech.2005

Speech Recognition - Language Modelling I-III


Dynamic language model adaptation using variational Bayes inference
Yik-Cheung Tam, Tanja Schultz

The hidden vector state language model
Vidura Seneviratne, Steve Young

Class-based variable memory length Markov model
Shinsuke Mori, Gakuto Kurata

Context-sensitive statistical language modeling
Alexander Gruenstein, Chao Wang, Stephanie Seneff

Language model data filtering via user simulation and dialogue resynthesis
Chao Wang, Stephanie Seneff, Grace Chung

Bayesian learning for latent semantic analysis
Jen-Tzung Chien, Meng-Sung Wu, Chia-Sheng Wu

Discriminative maximum entropy language model for speech recognition
Chuang-Hua Chueh, To-Chang Chien, Jen-Tzung Chien

Open vocabulary speech recognition with flat hybrid models
Maximilian Bisani, Hermann Ney

An error-corrective language-model adaptation for automatic speech recognition
Minwoo Jeong, Jihyun Eun, Sangkeun Jung, Gary Geunbae Lee

Discriminative training of finite state decoding graphs
Shiuan-Sung Lin, François Yvon

Building continuous space language models for transcribing European languages
Holger Schwenk, Jean-Luc Gauvain

Using random forest language models in the IBM RT-04 CTS system
Peng Xu, Lidia Mangu

Minimum word error based discriminative training of language models
Jen-Wei Kuo, Berlin Chen

On the use of morphological constraints in n-gram statistical language model
A. Ghaoui, François Yvon, C. Mokbel, Gérard Chollet

A posteriori multiple word-domain language model
Elvira I. Sicilia-Garcia, Ji Ming, F. Jack Smith

Effective topic-tree based language model adaptation
Javier Dieguez-Tirado, Carmen García Mateo, Antonio Cardenal-Lopez

Building topic specific language models from webdata using competitive models
Abhinav Sethy, Panayiotis G. Georgiou, Shrikanth Narayanan

Trigger-based language model adaptation for automatic meeting transcription
Carlos Troncoso, Tatsuya Kawahara

Statistical language models for large vocabulary spontaneous speech recognition in Dutch
Jacques Duchateau, Dong Hoon Van Uytsel, Hugo Van Hamme, Patrick Wambacq

Diachronic vocabulary adaptation for broadcast news transcription
Alexandre Allauzen, Jean-Luc Gauvain

Growing an n-gram language model
Vesa Siivola, Bryan L. Pellom

Embedding grammars into statistical language models
Harald Hüning, Manuel Kirschner, Fritz Class, Andre Berton, Udo Haiber

Methods for combining language models in speech recognition
Simo Broman, Mikko Kurimo

Review of statistical modeling of highly inflected Lithuanian using very large vocabulary
Airenas Vaiciunas, Gailius Raskinis

Generalized Hebbian algorithm for incremental latent semantic analysis
Genevieve Gorrell, Brandyn Webb

Language model adaptation for resource deficient languages using translated data
Arnar Thor Jensson, Edward W. D. Whittaker, Koji Iwano, Sadaoki Furui

POS-based language models for large vocabulary speech recognition on embedded systems
Petra Witschel, Sergey Astrov, Gabriele Bakenecker, Josef G. Bauer, Harald Höge


Prosody in Language Performance I, II


The effect of stress and boundaries on segmental duration in a corpus of authentic speech (British English)
Daniel Hirst, Caroline Bouzon

Investigation of the relationship between turn-taking and prosodic features in spontaneous dialogue
Tomoko Ohsuga, Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa

Filled pauses as cues to the complexity of following phrases
Michiko Watanabe, Keikichi Hirose, Yasuharu Den, Nobuaki Minematsu

Perceptual magnet effect in German boundary tones
Katrin Schneider, Bernd Möbius

Constraints on the acquisition of simplex and complex words in German
Angela Grimm, Jochen Trommer

Whistled speech: a natural phonetic description of languages adapted to human perception and to the acoustical environment
Julien Meyer

The stress foot as a unit of planned timing: evidence from shortening in the prosodic phrase
Heejin Kim, Jennifer Cole

Segmental "anchorage" and the French late rise
Pauline Welby, Hélène Loevenbruck

Prosodic cues for syntactically-motivated junctures
Ivan Chow

A glimpse of the time-course of intonation processing in European Portuguese
Isabel Falé, Isabel Hub Faria

Great expectations - introspective vs. perceptual prominence ratings and their acoustic correlates
Petra Wagner

Choosing a scale for measuring perceived prominence
Christian Jensen, John Tøndering

The effects of prosodic features on the interpretation of clarification ellipses
Jens Edlund, David House, Gabriel Skantze

Exploration of different types of intonational deviations in foreign-accented and synthesized speech
Matthias Jilka

A rhythmic-prosodic model of poetic speech
Jörg Bröggelwirth

Fine-tuning speech registers: a comparison of the prosodic features of child-directed and foreigner-directed speech
Sonja Biersack, Vera Kempe, Lorna Knapton

An analysis of the intonational structure of stuttered speech
Timothy Arbisi-Kelm

Voice quality dimensions of pitch accents
Britta Lintfert, Wolfgang Wokurek

Audiovisual production and perception of contrastive focus in French: a multispeaker study
Marion Dohen, Hélène Loevenbruck

Predicting end of utterance in multimodal and unimodal conditions
Pashiera Barkhuysen, Emiel Krahmer, Marc Swerts

Production of prominence in Japanese sign language
Saori Tanaka, Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa


Spoken Language Extraction / Retrieval I, II


Fast vocabulary-independent audio search using path-based graph indexing
Olivier Siohan, Michiel Bacchiani

The effects of speech recognition and punctuation on information extraction performance
John Makhoul, Alex Baron, Ivan Bulyko, Long Nguyen, Lance Ramshaw, David Stallard, Richard Schwartz, Bing Xiang

Indexing uncertainty for spoken document search
Ciprian Chelba, Alex Acero

Exploiting passage retrieval for n-best rescoring of spoken questions
Tomoyosi Akiba, Hiroyuki Abe

Multi-stage compaction approach to broadcast news summarisation
BalaKrishna Kolluru, Heidi Christensen, Yoshihiko Gotoh

Audio-video summarization of TV news using speech recognition and shot change detection
Chien-Lin Huang, Chia-Hsin Hsieh, Chung-Hsien Wu

Discrimination of speech, musical instruments and singing voices using the temporal patterns of sinusoidal segments in audio signals
Toru Taniguchi, Akishige Adachi, Shigeki Okawa, Masaaki Honda, Katsuhiko Shirai

Extractive summarization of meeting recordings
Gabriel Murray, Steve Renals, Jean Carletta

IR-based classification of customer-agent phone calls
Arjan van Hessen, Jaap Hinke

Mining broadcast news data: robust information extraction from word lattices
Benoît Favre, Frédéric Béchet, Pascal Nocéra

To recover from speech recognition errors in spoken document retrieval
Mikko Kurimo, Ville Turunen

Unsupervised clustering of spontaneous speech documents
Edgar Gonzàlez, Jordi Turmo

Spectral cross-correlation features for audio indexing of broadcast news and meetings
Masahide Yamaguchi, Masaru Yamashita, Shoichi Matsunaga

Spontaneous speech consolidation for spoken language applications
Chiori Hori, Alex Waibel

Comparing lexical, acoustic/prosodic, structural and discourse features for speech summarization
Sameer Maskey, Julia Hirschberg

Hierarchical topic organization and visual presentation of spoken documents using probabilistic latent semantic analysis (PLSA) for efficient retrieval/browsing applications
Te-Hsuan Li, Ming-Han Lee, Berlin Chen, Lin-Shan Lee

The COST278 broadcast news segmentation and speaker clustering evaluation - overview, methodology, systems, results
Janez Zibert, France Mihelic, Jean-Pierre Martens, Hugo Meinedo, Joao Neto, Laura Docio, Carmen Garcia Mateo, Petr David, Jindrich Zdansky, Matus Pleva, Anton Cizmar, Andrej Zgank, Zdravko Kacic, Csaba Teleki, Klara Vicsi

Comparison of keyword spotting approaches for informal continuous speech
Igor Szoke, Petr Schwarz, Pavel Matejka, Lukas Burget, Martin Karafiat, Michal Fapso, Jan Cernocky

Dialogue strategy to clarify user's queries for document retrieval system with speech interface
Teruhisa Misu, Tatsuya Kawahara

Comparison of different phone-based spoken document retrieval methods with text and spoken queries
Nicolas Moreau, Shan Jin, Thomas Sikora


New Applications


Speech retrieval of Mandarin broadcast news via mobile devices
Berlin Chen, Yi-Ting Chen, Chih-Hao Chang, Hung-Bin Chen

State estimation of meetings by information fusion using Bayesian network
Michiaki Katoh, Kiyoshi Yamamoto, Jun Ogata, Takashi Yoshimura, Futoshi Asano, Hideki Asoh, Nobuhiko Kitawaki

Results from a survey of attendees at ASRU 1997 and 2003
Roger K. Moore

Speech processing in the networked home environment - a view on the Amigo project
Reinhold Haeb-Umbach, Basilis Kladis, Joerg Schmalenstroeer

Fixed distortion segmentation in efficient sound segment searching
Masahide Sugiyama

Identifying singers of popular songs
Tin Lay Nwe, Haizhou Li

Speech repair: quick error correction just by using selection operation for speech input interfaces
Jun Ogata, Masataka Goto

Steerable highly directional audio beam loudspeaker
Dirk Olszewski, Fransiskus Prasetyo, Klaus Linhard

Automatic music genre classification using second-order statistical measures for the prescriptive approach
Hassan Ezzaidi, Jean Rouat

Effect of head orientation on the speaker localization performance in smart-room environment
Alberto Abad, Dusan Macho, Carlos Segura, Javier Hernando, Climent Nadeu

Application of automatic speaker recognition techniques to pathological voice assessment (dysphonia)
Corinne Fredouille, G. Pouchoulin, Jean-François Bonastre, M. Azzarello, A. Giovanni, A. Ghio

Adaptive speech analytics: system, infrastructure, and behavior
Upendra V. Chaudhari, Ganesh N. Ramaswamy, Eddie Epstein, Sasha P. Caskey, Mohamed Kamal Omar


Acoustic Processing for ASR I-III


Frame based model order selection of spectral envelopes
Matthias Wölfel

On variable-scale piecewise stationary spectral analysis of speech signals for ASR
Vivek Tyagi, Christian Wellekens, Hervé Bourlard

Efficient pitch-based estimation of VTLN warp factors
Arlo Faria, David Gelbart

Accent detection and speech recognition for Shanghai-accented Mandarin
Yanli Zheng, Richard Sproat, Liang Gu, Izhak Shafran, Haolang Zhou, Yi Su, Daniel Jurafsky, Rebecca Starr, Su-Youn Yoon

Variability of automatic speech recognition systems using different features
Loic Barrault, Renato de Mori, Roberto Gemello, Franco Mana, Driss Matrouf

Crosslingual and bilingual speech recognition with Slovak and Czech SpeechDat-E databases
Slavomir Lihan, Jozef Juhar, Anton Cizmar

Automatic data selection for MLP-based feature extraction for ASR
Carmen Pelaez-Moreno, Qifeng Zhu, Barry Y. Chen, Nelson Morgan

Rapid porting of ASR-systems to mobile devices
Thilo W. Kohler, Christian Fugen, Sebastian Stüker, Alex Waibel

A stream-based audio segmentation, classification and clustering pre-processing system for broadcast news using ANN models
Hugo Meinedo, Joao Neto

Speech activity detection fusing acoustic phonetic and energy features
Etienne Marcheret, Karthik Visweswariah, Gerasimos Potamianos

Robust voice activity detection based on the entropy of noise-suppressed spectrum
Zoltan Tuske, Peter Mihajlik, Zoltan Tobler, Tibor Fegyo

Multiple moving speaker tracking by microphone array on mobile robot
Masamitsu Murase, Shunichi Yamamoto, Jean-Marc Valin, Kazuhiro Nakadai, Kentaro Yamada, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Learning statistically characterized resonance targets in a hidden trajectory model of speech coarticulation and reduction
Li Deng, Dong Yu, Alex Acero

Articulatory motivated acoustic features for speech recognition
Daniil Kocharov, András Zolnay, Ralf Schlüter, Hermann Ney

Effects of Bayesian predictive classification using variational Bayesian posteriors for sparse training data in speech recognition
Shinji Watanabe, Atsushi Nakamura

A study on separation between acoustic models and its applications
Yu Tsao, Jinyu Li, Chin-Hui Lee

Extended Baum-Welch reestimation of Gaussian mixture models based on reverse Jensen inequality
Mohamed Afify

Hidden conditional random fields for phone classification
Asela Gunawardana, Milind Mahajan, Alex Acero, John C. Platt

Hierarchical clustering of mixture tying using a partially observable Markov decision process
Michael Jonas, James G. Schmolze

Flavors of Gaussian warping
Pierre Ouellet, Gilles Boulianne, Patrick Kenny

Phoneme alignment based on discriminative learning
Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer, Dan Chazan

Comparison of low footprint acoustic modeling techniques for embedded ASR systems
Jussi Leppänen, Imre Kiss

Factors in classification of stop consonant place of articulation
Atiwong Suchato, Proadpran Punyabukkana

Cross-speaker articulatory position data for phonetic feature prediction
Arthur R. Toth, Alan W. Black

Improvements to fMPE for discriminative training of features
Daniel Povey

Incorporating tone-related MLP posteriors in the feature representation for Mandarin ASR
Xin Lei, Mei-Yuh Hwang, Mari Ostendorf

Speech trajectory clustering for improved speech recognition
Yan Han, Johan de Veth, Louis Boves

Selection of features and combination of classifiers using a fuzzy approach for acoustic event classification
Andrey Temko, Dusan Macho, Climent Nadeu

Multi-task learning strategies for a recurrent neural net in a hybrid tied-posteriors acoustic model
Jan Stadermann, Wolfram Koska, Gerhard Rigoll

Revising Perceptual Linear Prediction (PLP)
Florian Hönig, Georg Stemmer, Christian Hacker, Fabio Brugnara

Confidence measures in speech recognition based on probability distribution of likelihoods
Joel Pinto, R. N. V. Sitaram

Continuous local codebook features for multi- and cross-lingual acoustic phonetic modelling
Frank Diehl, Asuncion Moreno, Enric Monte

Augmented state space acoustic decoding for modeling local variability in speech
Antonio Miguel, Eduardo Lleida, Richard Rose, Luis Buera, Alfonso Ortega

Auditory Teager energy cepstrum coefficients for robust speech recognition
Dimitrios Dimitriadis, Petros Maragos, Alexandros Potamianos

A hybrid Maxent/HMM based ASR system
Yasser Hifny, Steve Renals, Neil D. Lawrence

Regularizing linear discriminant analysis for speech recognition
Hakan Erdogan

Comprehensive modulation representation for automatic speech recognition
Yadong Wang, Steven Greenberg, Jayaganesh Swaminathan, Ramdas Kumaresan, David Poeppel

Segment-based phonetic class detection using minimum verification error (MVE) training
Qiang Fu, Biing-Hwang Juang

Acoustic and phonetic confusions in accented speech recognition
Yi Liu, Pascale Fung

Auditory image model features for automatic speech recognition
Mario E. Munich, Qiguang Lin

Applications of NAM microphones in speech recognition for privacy in human-machine communication
Panikos Heracleous, Tomomi Kaino, Hiroshi Saruwatari, Kiyohiro Shikano

A hybrid ANN/DBN approach to articulatory feature recognition
Joe Frankel, Simon King


Speech Recognition - Adaptation I, II


A speaker biased SI recognizer for embedded mobile applications
Yaxin Zhang, Bian Wu, Xiaolin Ren, Xin He

Fast unsupervised speaker adaptation through a discriminative eigen-MLLR algorithm
Bart Bakker, Carsten Meyer, Xavier Aubert

Incremental largest margin linear regression and MAP adaptation for speech separation in telemedicine applications
Rusheng Hu, Jian Xue, Yunxin Zhao

Applying vocal tract length normalization to meeting recordings
Giulia Garau, Steve Renals, Thomas Hain

Implementing frequency-warping and VTLN through linear transformation of conventional MFCC
S. Umesh, András Zolnay, Hermann Ney

MLLR-like speaker adaptation based on linearization of VTLN with MFCC features
Xiaodong Cui, Abeer Alwan

Model adaptation by state splitting of HMM for long reverberation
Chandra Kant Raut, Takuya Nishimoto, Shigeki Sagayama

Online speaker adaptation and tracking for real-time speech recognition
Daben Liu, Daniel Kiecza, Amit Srivastava, Francis Kubala

Automatic speech recognition based on adaptation and clustering using temporal-difference learning
Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa

Improving the speech recognition performance of beginners in spoken conversational interaction for language learning
Hui Ye, Steve Young

Rapid unsupervised speaker adaptation based on multi-template HMM sufficient statistics in noisy environments
Randy Gomez, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano

Rapid speaker adaptation for continuous speech recognition using merging eigenvoices
Dong-jin Choi, Yung-Hwan Oh

Feature adaptation using projection of Gaussian posteriors
Karthik Visweswariah, Peder Olsen

Maximum margin learning and adaptation of MLP classifiers
Xiao Li, Jeff Bilmes, Jonathan Malkin

Leveraging speaker-dependent variation of adaptation
Arindam Mandal, Mari Ostendorf, Andreas Stolcke

A comparative study of two kernel eigenspace-based speaker adaptation methods on large vocabulary continuous speech recognition
Roger Hsiao, Brian Mak

Environmental compensation using ASR model adaptation by a Bayesian parametric representation method
Xuechuan Wang, Douglas O'Shaughnessy

Discriminative speaker adaptation with eigenvoices
Jun Luo, Zhijian Ou, Zuoying Wang


Signal Analysis, Processing and Feature Estimation I-III


Real-time pitch tracking based on combined SMDSF
Jian Liu, Thomas Fang Zheng, Jing Deng, Wenhu Wu

Fundamental frequency estimation by least-squares harmonic model fitting
András Bánhalmi, Kornél Kovács, András Kocsor, László Tóth

Harmonic filtering for joint estimation of pitch and voiced source with single-microphone input
S. W. Lee, Frank K. Soong, P. C. Ching

High-resolution noise-robust spectral-based pitch estimation
Marián Képesi, Luis Weruaga

F0 estimation for adult and children's speech
John-Paul Hosom

Fundamental frequency and voicing prediction from MFCCs for speech reconstruction from unconstrained speech
Ben Milner, Xu Shao, Jonathan Darch

F0 stylisation with a free-knot B-spline model and simulated-annealing optimization
N. Barbot, Olivier Boëffard, D. Lolive

Voiced excitation as entrained primary response of a reconstructed glottal master oscillator
F. R. Drepper

Estimation of LF glottal source parameters based on an ARX model
Damien Vincent, Olivier Rosec, Thierry Chonavel

Some experiments on iterative reconstruction of speech from STFT phase and magnitude spectra
Leigh D. Alsteris, Kuldip K. Paliwal

Statistical properties of the warped discrete cosine transform cepstrum compared with MFCC
R. Muralishankar, Abhijeet Sangwan, Douglas O'Shaughnessy

New signal features for robust identification of isolated vowels
Aníbal J. S. Ferreira

Amplitude modulation of frication noise by voicing saturates
Jonathan Pincas, Philip J. B. Jackson

Extraction of relevant speech features using the information bottleneck method
Ron M. Hecht, Naftali Tishby

Comparing several models for perceptual long-term modeling of amplitude and phase trajectories of sinusoidal speech
Mohammad Firouzmand, Laurent Girin, Sylvain Marchand

Multi-resolution RASTA filtering for TANDEM-based ASR
Hynek Hermansky, Petr Fousek

A category-dependent feature selection method for speech signals
Woojay Jeon, Biing-Hwang Juang

Voicing features for robust speech detection
Trausti Kristjansson, Sabine Deligne, Peder Olsen

PCA of perturbation parameters in voice pathology detection
Pedro Gomez, Francisco Diaz, Agustin Alvarez, Rafael Martinez, Victoria Rodellar, Roberto Fernandez-Baillo, Alberto Nieto, Francisco J. Fernandez

Dynamic programming based segmentation approach to LSF matrix reconstruction
Anindya Sarkar, T. V. Sreenivas

Explicit segmentation of speech based on frequency-domain AR modeling
T. Nagarajan, Douglas O'Shaughnessy

Non-parametric speaker turn segmentation of meeting data
Petr Motlícek, Lukás Burget, Jan Cernocký

Unsupervised segmentation of continuous speech using vector autoregressive time-frequency modeling errors
Petri Korhonen, Unto K. Laine

The analysis on band-limited hypernasal speech using group delay based formant extraction technique
P. Vijayalakshmi, M. RamasubbaReddy

Detection of acoustic change-points in audio records via global BIC maximization and dynamic programming
Jindrich Zdánský, Jan Nouza

Multi-band approach of audio source discrimination with empirical mode decomposition
Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu

Application of auditory image model for speech event detection
Minoru Tsuzaki, Satomi Tanaka, Hiroaki Kato, Yoshinori Sagisaka

Unsupervised identification of speech segments using kernel methods for clustering
José Anibal Arias

Speech event detection using multiband modulation energy
Georgios Evangelopoulos, Petros Maragos

Measuring unsupervised acoustic clustering through phoneme pair merge-and-split tests
John Kominek, Alan W. Black

Variational Bayesian speaker change detection
Fabio Valente, Christian Wellekens

Distinctive feature based SVM discriminant features for improvements to phone recognition on telephone band speech
Sarah Borys, Mark Hasegawa-Johnson

Detection of hypernasality using statistical pattern classifiers
P. Vijayalakshmi, M. RamasubbaReddy

Self-organizing chirp-sensitive artificial auditory cortical model
Luis Weruaga, Marián Képesi

On the use of a decimative spectral estimation method based on eigenanalysis and SVD for formant and bandwidth tracking of speech signals
Sotiris Karabetsos, Pirros Tsiakoulis, Stavroula-Evita Fotinea, Ioannis Dologlou

Frequency-domain auditory suppression modelling (FASM) - a WDFT-based anthropomorphic noise-robust feature extraction algorithm for speech recognition
Alexei V. Ivanov, Marek Parfieniuk, Alexander A. Petrovsky

Asymptotically exact AM-FM decomposition based on iterated Hilbert transform
Francesco Gianfelici, Giorgio Biagetti, Paolo Crippa, Claudio Turchetti

Advances in statistical estimation and tracking of AM-FM speech components
Athanassios Katsamanis, Petros Maragos

Formant frequency prediction from MFCC vectors in noisy environments
Jonathan Darch, Ben Milner, Saeed Vaseghi

Detection of vowel onset point events using excitation information
S. R. Mahadeva Prasanna, B. Yegnanarayana

Pitch-synchronous time-scaling for prosodic and voice quality transformations
João P. Cabral, Luís C. Oliveira

Discrimination between singing and speaking voices
Yasunori Ohishi, Masataka Goto, Katunobu Itou, Kazuya Takeda


Robust Speech Recognition I-IV


Joint Bayesian predictive classification and parallel model combination for robust speech recognition
Svein G. Pettersen, Magne H. Johnsen, Tor A. Myrvoll

Gaussian elimination algorithm for HMM complexity reduction in continuous speech recognition systems
Glauco F. G. Yared, Fábio Violaro, Lívio C. Sousa

Robust speech recognition in cars using phoneme dependent multi-environment linear normalization
Luis Buera, Eduardo Lleida, Antonio Miguel, Alfonso Ortega

Energy-based frame selection for reliable feature normalization and transformation in robust speech recognition
Yi Chen, Lin-Shan Lee

Remodeling of the sensor for non-audible murmur (NAM)
Yoshitaka Nakajima, Hideki Kashioka, Kiyohiro Shikano, Nick Campbell

Focused word segmentation for ASR
Amarnag Subramanya, Jeff Bilmes, Chia-Ping Chen

A comparison of particle filtering variants for speech feature enhancement
Reinhold Haeb-Umbach, Joerg Schmalenstroeer

Enhancement of mel log-power spectrum of speech using particle filtering
Ilyas Potamitis, Nikolaos Fakotakis

Improving robustness of speech recognition performance to aggregate of noises by two-dimensional visualization
Makoto Shozakai, Goshu Nagino

Feature compensation based on switching linear dynamic model and soft decision
Woohyung Lim, Bong Kyoung Kim, Nam Soo Kim

Using output probability distribution for improving speech recognition in adverse environment
Shilei Huang, Xiang Xie, Jingming Kuang

A generalized framework for compensation of mel-filterbank outputs in feature extraction for robust ASR
Eric H. C. Choi

Robust automatic speech recognition using a perceptually-based optimal spectral amplitude estimator speech enhancement algorithm in various low-SNR environments
Hesham Tolba, Zili Li, Douglas O'Shaughnessy

Improved noise-robustness in distributed speech recognition via perceptually-weighted vector quantisation of filterbank energies
Stephen So, Kuldip K. Paliwal

Sub-band weighted projection measure for robust sub-band speech recognition
Babak Nasersharif, Ahmad Akbari

Noise compensation using interacting multiple Kalman filters
Jianping Deng, Martin Bouchard, Tet Hin Yeap

Kalman and unscented Kalman filter feature enhancement for noise robust ASR
Veronique Stouten, Hugo Van Hamme, Patrick Wambacq

Histogram-based quantization (HQ) for robust and scalable distributed speech recognition
Chia-yu Wan, Lin-Shan Lee

A data-driven approach for the model parameter compensation in noisy speech recognition
Yong-Joo Chung

Rapid response and robust speech recognition by preliminary model adaptation for additive and convolutional noise
Satoshi Kobashikawa, Satoshi Takahashi, Yoshikazu Yamaguchi, Atsunori Ogawa

Nonlinear and linear transformations of speech features to compensate for channel and noise effects
Saurabh Prasad, Stephen A. Zahorian

Construction method of acoustic models dealing with various background noises based on combination of HMMs
Motoyuki Suzuki, Yusuke Kato, Akinori Ito, Shozo Makino

Robust speech recognition based on noise and SNR classification - a multiple-model framework
Haitian Xu, Zheng-Hua Tan, Paul Dalsgaard, Børge Lindberg

Eigen-environment based noise compensation method for robust speech recognition
Hwa Jeon Song, Hyung Soon Kim

Robust feature compensation in nonstationary and multiple noise environments
Martin Graciarena, Horacio Franco, Greg Myers, Victor Abrash

Maximum mutual information SPLICE transform for seen and unseen conditions
Jasha Droppo, Alex Acero

Speech recognition with support vector machines in a hybrid system
Sven E. Krüger, Martin Schafföner, Marcel Katz, Edin Andelic, Andreas Wendemuth

Experiments on speaker profile portability
Vincent Barreaud, Douglas O'Shaughnessy, Jean-Guy Dahan

A confidence measure invariant to language and grammar
Daniele Colibro, Luciano Fissore, Claudio Vair, Emanuele Dalmasso, Pietro Laface

Robust detection of sonorant landmarks
Ken Schutte, James Glass

Context-dependent word duration modelling for robust speech recognition
Ning Ma, Phil Green

An energy search approach to variable frame rate front-end processing for robust ASR
Julien Epps, Eric H. C. Choi

Non-linear estimation of voice activity to improve automatic recognition of noisy speech
Roberto Gemello, Franco Mana, Renato de Mori

Voice activity detection based on optimally weighted combination of multiple features
Yusuke Kida, Tatsuya Kawahara

Soft decision strategy and adaptive compensation for robust speech recognition against impulsive noise
Pei Ding

Statistical class-based MFCC enhancement of filtered and band-limited speech for robust ASR
Nicolás Morales, Doroteo Torre Toledano, John H. L. Hansen, José Colás, Javier Garrido

Spectral entropy feature in full-combination multi-stream for robust ASR
Hemant Misra, Hervé Bourlard

Environment-independent mask estimation for missing-feature reconstruction
Wooil Kim, Richard M. Stern, Hanseok Ko

Soft harmonic masks for recognising speech in the presence of a competing speaker
André Coy, Jon Barker

Comb filter decomposition for robust ASR
Lech Szymanski, Martin Bouchard

Investigating the role of the Lombard reflex in non-audible murmur (NAM) recognition
Panikos Heracleous, Tomomi Kaino, Hiroshi Saruwatari, Kiyohiro Shikano

Improved "TEO" feature-based automatic stress detection using physiological and acoustic speech sensors
Evan Ruzanski, John H. L. Hansen, Don Finan, James Meyerhoff, William Norris, Terry Wollert

Spectral subtraction using elliptic integral for multiplication factor
Takeshi S. Kobayakawa

Robust distant speech recognition based on position dependent CMN using a novel multiple microphone processing technique
Longbiao Wang, Norihide Kitaoka, Seiichi Nakagawa

Data collection and evaluation of speech recognition for motorbike riders
H. Tanaka, H. Fujimura, C. Miyajima, T. Nishino, Katunobu Itou, Kazuya Takeda

Application of a first-order differential microphone for efficient voice activity detection in a car platform
Agustín Álvarez, Pedro Gómez, V. Nieto, Rafael Martínez, Victoria Rodellar

Robust speech recognition for mobile devices in car noise
Panji Setiawan, Suhadi Suhadi, Tim Fingscheidt, Sorel Stan

Evaluation and optimization of noise robust front-end technologies for the automatic recognition of Hungarian telephone speech
Péter Mihajlik, Zoltán Tobler, Zoltán Tüske, Géza Gordos

A performance investigation of noisy voice recognition over IP telephony networks
Gang Chen, Douglas O'Shaughnessy, Hesham Tolba

Internal noise suppression for speech recognition by small robots
Akinori Ito, Takashi Kanayama, Motoyuki Suzuki, Shozo Makino

Temporal ICA for classification of acoustic events in a kitchen environment
Florian Kraft, Robert Malkin, Thomas Schaaf, Alex Waibel

Hello - is anybody at home? - about the minimum word accuracy of a smart home spoken dialogue system
Jan Felix Krebber

The simulation of realistic acoustic input scenarios for speech recognition systems
H. Gunter Hirsch, Harald Finster

An agent-based framework for speech investigation
Michael Walsh, Gregory M. P. O'Hare, Julie Carson-Berndsen

Joint uncertainty decoding for noise robust speech recognition
H. Liao, M. J. F. Gales

Confidence scoring and rejection using multi-pass speech recognition
Vincent Vanhoucke

Memory-enhanced MMSE-based channel error mitigation for distributed speech recognition
Cheng-Lung Lee, Wen-Whei Chang

Designing multiple distinctive phonetic feature extractors for canonicalization by using clustering technique
Takashi Fukuda, Muhammad Ghulam, Tsuneo Nitta

Efficient blind dereverberation framework for automatic speech recognition
Keisuke Kinoshita, Tomohiro Nakatani, Masato Miyoshi

Combining multi-source far distance speech recognition strategies: beamforming, blind channel and confusion network combination
Matthias Wölfel, John McDonough


Speech Perception I, II


Lexical tone perception in musicians and non-musicians
Jennifer A. Alexander, Patrick C. M. Wong, Ann R. Bradlow

Contextual effect on perception of lexical tones in Cantonese
Joan K.-Y. Ma, Valter Ciocca, Tara Whitehill

Visual cues in Mandarin tone perception
Hansjörg Mixdorff, Yu Hu, Denis Burnham

Cross-language perception of word stress
Hansjörg Mixdorff, Yu Hu

The lexical statistics of word recognition problems caused by L2 phonetic confusion
Anne Cutler

A multi-layer fuzzy logical model for emotional speech perception
Chun-Fang Huang, Masato Akagi

Influence of F0 on Vietnamese syllable perception
Do Dat Tran, Eric Castelli, Jean-François Serignat, Van Loan Trinh, Xuan Hung Le

Lexical tone and pitch perception in tone and non-tone language speakers
Barbara Schwanhäußer, Denis Burnham

Intonational contrasts in EP: a categorical perception approach
Isabel Falé, Isabel Hub Faria

Does narrow focus activate alternative referents?
Bettina Braun, Andrea Weber, Matthew Crocker

Audiovisual interaction on the perception of frequency glide of linear sweep tones
Kiyoaki Aikawa, Hayato Hashimoto

Audiovisual integration in dichotic listening
Kei Omata, Ken Mogi

Perception experiment combining a parametric loudspeaker and a synthetic talking head
Gunilla Svanfeldt, Dirk Olszewski

Multidimensional scaling of listener responses to synthetic speech
Catherine Mayo, Robert A. J. Clark, Simon King

A timbre space for speech
Hiroko Terasawa, Malcolm Slaney, Jonathan Berger

Voice quality assessment by means of comparative judgments of speech tokens
A. Kacha, Francis Grenez, Jean Schoentgen

Speech intelligibility derived from time-frequency and source smearing
Toshio Irino, Satoru Satou, Shunsuke Nomura, Hideki Banno, Hideki Kawahara

Steady-state pre-processing for improving speech intelligibility in reverberant environments: evaluation in a hall with an electrical reverberator
Nahoko Hayashi, Takayuki Arai, Nao Hodoshima, Yusuke Miyauchi, Kiyohiro Kurisu

Neural bases of listening to speech in noise
Patrick C. M. Wong, Kiara M. Lee, Todd B. Parrish

The intelligibility of tracheoesophageal speech: first results
P. Jongmans, F. J. M. Hilgers, Louis C. W. Pols, C. J. van As-Brooks

A computational model of the speech reception threshold for laterally separated speech and noise
Guy J. Brown, Kalle J. Palomäki

Lexical inhibition effects in time-compressed speech
Esther Janse

Perception of time-compressed rapid acoustic cues in French CV syllables
Caroline Jacquier, Fanny Meunier

Reversed speech comprehension depends on the auditory efferent system functionality
C. Grataloup, M. Hoen, F. Pellegrino, E. Veuillet, L. Collet, Fanny Meunier

Perceptual space of English fricatives for Japanese learners
Won Tokuma, Shinichi Tokuma

Perceptual salience of language-specific acoustic differences in autonomous fillers across eight languages
Ioana Vasilescu, Maria Candea, Martine Adda-Decker

Effects of cortical and subcortical brain damage on the processing of emotional prosody
Marc D. Pell


Spoken Language Understanding I, II


Utterance verification incorporating in-domain confidence and discourse coherence measures
Ian R. Lane, Tatsuya Kawahara

Using symbolic prominence to help design feature subsets for topic classification and clustering of natural human-human conversations
Constantinos Boulis, Mari Ostendorf

Tightly integrated spoken language understanding using word-to-concept translation
Katsuhito Sudoh, Hajime Tsukada

Exploiting unlabeled data using multiple classifiers for improved natural language call-routing
Ruhi Sarikaya, Hong-Kwang Jeff Kuo, Vaibhava Goel, Yuqing Gao

Active learning with minimum expected error for spoken language understanding
Hong-Kwang Jeff Kuo, Vaibhava Goel

Lexical out-of-vocabulary models for one-stage speech interpretation
Matthias Thomae, Tibor Fabian, Robert Lieb, Günther Ruske

Hierarchical language models for one-stage speech interpretation
Matthias Thomae, Tibor Fabian, Robert Lieb, Günther Ruske

Spoken language understanding using layered n-gram modeling
Nick J. C. Wang

Named entity recognition from spontaneous open-domain speech
Mihai Surdeanu, Jordi Turmo, Eli Comelles

Discriminative training and support vector machine for natural language call routing
Imed Zitouni, Hui Jiang, Qiru Zhou

A multiple classifier-based concept-spotting approach for robust spoken language understanding
Jihyun Eun, Minwoo Jeong, Gary Geunbae Lee

A flexible and integrated interface between speech recognition, speech interpretation and dialog management
Robert Lieb, Matthias Thomae, Günther Ruske, Daniel Bobbert, Frank Althoff

Incremental dependency parsing of Japanese spoken monologue based on clause boundaries
Tomohiro Ohno, Shigeki Matsubara, Hideki Kashioka, Naoto Kato, Yasuyoshi Inagaki

Situation based speech recognition for structuring baseball live games
Atsushi Sako, Tetsuya Takiguchi, Yasuo Ariki

Semantic annotation of the French media dialog corpus
H. Bonneau-Maynard, Sophie Rosset, C. Ayache, A. Kuhn, Djamel Mostefa

Robust and efficient semantic parsing of free word order languages in spoken dialogue systems
Ralf Engel

Conceptual language model design for spoken language understanding
Catherine Kobus, Géraldine Damnati, Lionel Delphin-Poulat, Renato de Mori

From robust spoken language understanding to knowledge acquisition and management
Luís Seabra Lopes, António J. S. Teixeira, Marcelo Quinderé, Mário Rodrigues

Improving end-to-end performance of call classification through data confusion reduction and model tolerance enhancement
Cheng Wu, Xiang Li, Hong-Kwang Jeff Kuo, E. E. Jan, Vaibhava Goel, David Lubensky


Paralinguistic and Nonlinguistic Information in Speech


No laughing matter
Nick Campbell, Hideki Kashioka, Ryo Ohara

A study on the automatic detection and characterization of emotion in a voice service context
C. Blouin, V. Maffiolo

Classical and novel discriminant features for affect recognition from speech
Raul Fernandez, Rosalind W. Picard

Low-dimensional feature space derivation for emotion recognition
Jaroslaw Cichosz, Krzysztof Slot

Proposal of acoustic measures for automatic detection of vocal fry
Carlos Toshinori Ishi, Hiroshi Ishiguro, Norihiro Hagita

Automatic detection of laughter
Khiet P. Truong, David A. van Leeuwen

Tales of tuning - prototyping for automatic classification of emotional user states
Anton Batliner, Stefan Steidl, Christian Hacker, Elmar Nöth, Heinrich Niemann

Automatic emotion recognition using prosodic parameters
Iker Luengo, Eva Navas, Inmaculada Hernáez, Jon Sánchez

An articulatory study of emotional speech production
Sungbok Lee, Serdar Yildirim, Abe Kazemzadeh, Shrikanth Narayanan

Informed blending of databases for emotional speech synthesis
Gregor O. Hofer, Korin Richmond, Robert A. J. Clark

Emotional FESTIVAL-MBROLA TTS synthesis
Fabio Tesser, Piero Cosi, Carlo Drioli, Graziano Tisato

Emofilt: the simulation of emotional speech by prosody-transformation
Felix Burkhardt

Acoustic/prosodic and lexical correlates of charismatic speech
Andrew Rosenberg, Julia Hirschberg

Communicative speech synthesis using constituent word attributes
Yoko Greenberg, Minoru Tsuzaki, Hiroaki Kato, Yoshinori Sagisaka

Emotions in dubbed speech: an intercultural approach with respect to F0
Angelika Braun, Matthias Katerbow

The prosodic dimensions of emotion in speech: the relative weights of parameters
Nicolas Audibert, Véronique Aubergé, Albert Rilliard

Stimulus duration and type in perception of female and male speaker age
Susanne Schötz

Perceptions of emotions in expressive storytelling
Cecilia Ovesdotter Alm, Richard Sproat

Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT
Hideki Kawahara, Alain de Cheveigné, Hideki Banno, Toru Takahashi, Toshio Irino

Gradually changing expression of singing voice based on morphing
Tomoko Yonezawa, Noriko Suzuki, Kenji Mase, Kiyoshi Kogure




Multi-modal / Multi-media Processing I, II


Non-verbal speech processing for a communicative agent
Nick Campbell

Physiologically motivated audio-visual localisation and tracking
Stuart N. Wrigley, Guy J. Brown

Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition
Jing Huang, Daniel Povey

INTERFACE: a new tool for building emotive/expressive talking heads
Graziano Tisato, Piero Cosi, Carlo Drioli, Fabio Tesser

Variance reduction by using separate genuine-impostor statistics in multimodal biometrics
P. Ejarque, Javier Hernando

The dialog application metalanguage GDialogXML
Volker Schubert, Stefan W. Hamerich

Myoelectric signals for multimodal speech recognition
Raghunandan S. Kumaran, Karthik Narayanan, John N. Gowdy

Is color information really useful for lip-reading? (or what is lost when color is not used)
Philippe Daubias

A system for audio-visual speech recognition
I. Shdaifat, R.-R. Grigat

Multimodal interface for organization name input based on combination of isolated word recognition and continuous base-word recognition
Norihide Kitaoka, Hironori Oshikawa, Seiichi Nakagawa

Recognition of (3) party conversation using prosody and gaze
Yosuke Matsusaka

Combining voiceprint and face biometrics for speaker identification using SDWS
Dongdong Li, Yingchun Yang, Zhaohui Wu

Using the focus of visual attention to improve spontaneous speech recognition
Neil Cooke, Martin Russell

Real-time outer lip contour tracking for HCI applications
Sabri Gurbuz

Improving lip-reading with feature space transforms for multi-stream audio-visual speech recognition
Jing Huang, Karthik Visweswariah

Are there facial correlates of Thai syllabic tones?
Hansjörg Mixdorff, Denis Burnham, Guillaume Vignali, Patavee Charnvivit

A new posterior based audio-visual integration method for robust speech recognition
Rowan Seymour, Ji Ming, Darryl Stewart



Spoken / Multi-modal Dialogue Systems I, II


Learning of stochastic dialog models through a dialog simulation technique
Francisco Torres, Emilio Sanchis, Encarna Segarra

Evaluating the DI@L-log system on a cohort of elderly, diabetic patients: results from a preliminary study
Lesley-Ann Black, Michael McTear, Norman Black, Roy Harper, Michelle Lemon

Combination of classifiers for automatic recognition of dialog acts
Pavel Král, Christophe Cerisara, Jana Klecková

Rapidly developing spoken Chinese dialogue systems with the d-ear SDS SDK
Xiaojun Wu, Thomas Fang Zheng, Michael Brasser, Zhanjiang Song

Robust algorithms and interaction strategies for voice spelling
Daniela Oria, Akos Vetek

Modality integration and dialog management for a robotic assistant
Ioannis Toptsis, Axel Haasch, Sonja Hüwel, Jannik Fritsch, Gernot A. Fink

An integration framework for a mobile multimodal dialogue system accessing the semantic web
Norbert Reithinger, Daniel Sonntag

Operating a public spoken guidance system in real environment
Ryuichi Nisimura, Akinobu Lee, Masashi Yamada, Kiyohiro Shikano

Distributed dialogue management for smart terminal devices
Esa-Pekka Salonen, Markku Turunen, Jaakko Hakulinen, Leena Helin, Perttu Prusi, Anssi Kainulainen

Visualization of spoken dialogue systems for demonstration, debugging and tutoring
Jaakko Hakulinen, Markku Turunen, Esa-Pekka Salonen

Development and evaluation of a spoken dialog system to access a newspaper web site
César González-Ferreras, Valentín Cardeñoso-Payo

Comparing ASR modeling methods for spoken dialogue simulation and optimal strategy learning
Olivier Pietquin, Richard Beaufort

An approach to multi-strategy dialogue management
Shiu-Wah Chu, Ian O'Neill, Philip Hanna, Michael McTear

Towards user modelling in conversational dialogue systems: a qualitative study of the dynamics of dialogue parameters
Anna Hjalmarsson

Reducing the description amount in authoring MMI applications
Kouichi Katsurada, Kazumine Aoki, Hirobumi Yamada, Tsuneo Nitta

Contextual constraints based on dialogue models in database search task for spoken dialogue systems
Kazunori Komatani, Naoyuki Kanda, Tetsuya Ogata, Hiroshi G. Okuno

Using word-level pitch features to better predict student emotions during spoken tutoring dialogues
Mihai Rotaru, Diane J. Litman

Let's go public! taking a spoken dialog system to the real world
Antoine Raux, Brian Langner, Dan Bohus, Alan W. Black, Maxine Eskenazi

Back-channel feedback generation using linguistic and nonlinguistic information and its application to spoken dialogue system
Shinya Fujie, Kenta Fukushima, Tetsunori Kobayashi

Learning user simulations for information state update dialogue systems
Kallirroi Georgila, James Henderson, Oliver Lemon

Design of a voice-enabled interface for real-time access to stock exchange from a PDA through GPRS
Darío Martín-Iglesias, Yago Pereiro-Estevan, Ana I. García-Moral, Ascensión Gallardo-Antolín, Fernando Díaz-de-María

Integrating denotational meaning into a DBN language model
William Schuler, Tim Miller

Improving out-of-coverage language modelling in a multimodal dialogue system using small training sets
Louis ten Bosch

Ritel: an open-domain, human-computer dialog system
Olivier Galibert, Gabriel Illouz, Sophie Rosset

User evaluation of conversational agent H. C. Andersen
Niels Ole Bernsen, Laila Dybkjaer

Integrated development and on-the-fly simulation of multimodal dialogs
Silke Goronzy, Nicole Beringer

Interactions between speech recognition problems and user emotions
Mihai Rotaru, Diane J. Litman, Katherine Forbes-Riley

Webtalk: mining websites for interactively answering questions
Junlan Feng, Srihari Reddy, Murat Saraçlar

Towards generic quality prediction models for spoken dialogue systems - a case study
Sebastian Möller

Robust access to large structured data using voice form-filling
S. Parthasarathy, Cyril Allauzen, R. Munkong


Speech Production I


The labial-coronal effect and CVCV stability during reiterant speech production: an acoustic analysis
Amélie Rochet-Capellan, Jean-Luc Schwartz

The labial-coronal effect and CVCV stability during reiterant speech production: an articulatory analysis
Amélie Rochet-Capellan, Jean-Luc Schwartz

Articulatory constraints and coronal stops: an EPG study
Mitsuhiro Nakamura

Strategies of labial coarticulation
Vincent Robert, Brigitte Wrobel-Dautcourt, Yves Laprie, Anne Bonneau

Investigation and modeling of coarticulation during speech
Jianwu Dang, Jianguo Wei, Takeharu Suzuki, Pascal Perrier

Tongue kinematics in diphthong production in Ningbo Chinese
Fang Hu

Comparing tongue positions of vowels in oral and nasal contexts
Takayuki Arai

Can we retrieve vocal tract dynamics that produced speech? toward a speaker articulatory strategy model
Slim Ouni

Modeling the production of VCV sequences via the inversion of a biomechanical model of the tongue
Pascal Perrier, Liang Ma, Yohan Payan

Estimation of the acoustic properties of the nasal tract during the production of nasalized vowels
Xiaochuan Niu, Alexander Kain, Jan P. H. van Santen

A web-based articulatory speech synthesis system for distance education
Kohichi Ogata

Group delay function as a means to assess quality of glottal inverse filtering
Paavo Alku, Matti Airas, Tom Bäckström, Hannu Pulakka

Subglottal pressure and NAQ variation in voice production of classically trained baritone singers
Eva Björkner, Johan Sundberg, Paavo Alku

Covariation of subglottal pressure, F0 and intensity
Gunnar Fant, Anita Kruckenberg

Automatic voice-source parameterization of natural speech
Javier Pérez, Antonio Bonafonte

Physiological study of whispered speech in Moroccan Arabic
Chakir Zeroual, John H. Esling, Lise Crevier-Buchman

Voice quality in Down syndrome children treated with rapid maxillary expansion
C. P. Moura, D. Andrade, L. M. Cunha, M. J. Cunha, H. Vilarinho, H. Barros, Diamantino Freitas, M. Pais-Clemente

Synthesis of disordered speech
Julien Hanquinet, Francis Grenez, Jean Schoentgen

Quasi-automatic extraction of tongue movement from a large existing speech cineradiographic database
Julie Fontecave, Frédéric Berthommier

The working memory token test (WMTT): preliminary findings in young adults with and without dyslexia
Shimon Sapir, Ravit Cohen Mimran

Reducing the corpus-based TTS signal degradation due to speaker's word pronunciations
Sérgio Paulo, Luís C. Oliveira

A phonetic study of the "er-hua" rimes in Beijing Mandarin
Wai-Sum Lee

A toolkit for voice inverse filtering and parametrisation
Matti Airas, Hannu Pulakka, Tom Bäckström, Paavo Alku

Stylization of glottal-flow spectra produced by a mechanical vocal-fold model
Denisse Sciamarella, Christophe d'Alessandro

Numerical glottal sound source model as coupled problem between vocal cord vibration and glottal flow
Hideyuki Nomura, Tetsuo Funada

A tagged-cine MRI investigation of German vowels
Marianne Pouplier, Maureen Stone

A three-dimensional linear articulatory model of velum based on MRI data
Antoine Serrurier, Pierre Badin

On the relationship between intra-oral pressure and speech sonority
Anne Cros, Didier Demolin, Ana Georgina Flesia, Antonio Galves


Spoken Language Resources and Technology Evaluation I, II


Two experiments comparing reading with listening for human processing of conversational telephone speech
Douglas Jones, Wade Shen, Elizabeth Shriberg, Andreas Stolcke, Teresa Kamm, Douglas Reynolds

The ESTER phase II evaluation campaign for the rich transcription of French broadcast news
Sylvain Galliano, Edouard Geoffrois, Djamel Mostefa, Khalid Choukri, Jean-François Bonastre, Guillaume Gravier

A method of multi-layered speech segmentation tailored for speech synthesis
Takashi Saito

Generation of word alternative pronunciations using weighted finite state transducers
Sérgio Paulo, Luís C. Oliveira

Multiword expressions in spontaneous speech: do we really speak like that?
Helmer Strik, Diana Binnenpoorte, Catia Cucchiarini

Czech spontaneous speech corpus with structural metadata
Jáchym Kolár, Jan Svec, Stephanie Strassel, Christopher Walker, Dagmar Kozlíková, Josef Psutka

A database of German emotional speech
Felix Burkhardt, A. Paeschke, M. Rolfes, Walter F. Sendlmeier, Benjamin Weiss

Evaluating the pronunciation of proper names by four French grapheme-to-phoneme converters
Philippe Boula de Mareuil, Christophe d'Alessandro, Gerard Bailly, Frederic Bechet, Marie-Neige Garcia, Michel Morel, Romain Prudon, Jean Veronis

A human-human train timetable dialogue corpus
Filip Jurcicek, Jiri Zahradil, Libor Jelinek

A Portuguese spoken and multi-modal dialog corpora
Gloria Branco, Luís Almeida, Rui Gomes, Nuno Beires

Development of a Cantonese-English code-mixing speech corpus
Joyce Y. C. Chan, P. C. Ching, Tan Lee

BNSI Slovenian broadcast news database - speech and text corpus
Andrej Zgank, Darinka Verdonik, Aleksandra Zögling Markus, Zdravko Kacic

Confronting HMM-based phone labelling with human evaluation of speech production
Jan Volín, Radek Skarnitzl, Petr Pollák

Structural metadata annotation: moving beyond English
Stephanie Strassel, Jáchym Kolár, Zhiyi Song, Leila Barclay, Meghan Glenn

Neologos: an optimized database for the development of new speech processing algorithms
Delphine Charlet, Sacha Krstulovic, Frédéric Bimbot, Olivier Boëffard, Dominique Fohr, Odile Mella, Filip Korkmazsky, Djamel Mostefa, Khalid Choukri, Arnaud Vallée

A hybrid approach to automatic segmentation and labeling for Mandarin Chinese speech corpus
Cheng-Yuan Lin, Kuan-Ting Chen, J.-S. Roger Jang

The multiple pronunciations in Taiwanese and the automatic transcription of Buddhist sutra with augmented read speech
Yuang-Chin Chiang, Min-Siong Liang, Hong-Yi Lin, Ren-Yuan Lyu

Bootstrapping pronunciation dictionaries: practical issues
Marelie Davel, Etienne Barnard

Root causes of lost time and user stress in a simple dialog system
Nigel G. Ward, Anais G. Rivera, Karen Ward, David G. Novick

Evaluating communication effectiveness in team collaboration
Julie A. Parisi, Douglas S. Brungart

Bilingual aligned corpora for speech to speech translation for Spanish, English and Catalan
David Conejero, Alan Lounds, Carmen García Mateo, Leandro Rodriguez-Linares, Raquel Mochales, Asuncion Moreno

Design and collection of Czech Lombard speech database
Hynek Boril, Petr Pollák

TBALL data collection: the making of a young children's speech corpus
Abe Kazemzadeh, Hong You, Markus Iseli, Barbara Jones, Xiaodong Cui, Margaret Heritage, Patti Price, Elaine Anderson, Shrikanth Narayanan, Abeer Alwan

Construction and utilization of bilingual speech corpus for simultaneous machine interpretation research
Hitomi Tohyama, Shigeki Matsubara, Nobuo Kawaguchi, Yasuyoshi Inagaki

Meeting acts: a labeling system for group interaction in meetings
Rebecca Bates, Patrick Menning, Elizabeth Willingham, Chad Kuyper

A new evaluation criteria for keyword spotting techniques and a new algorithm
Marius C. Silaghi, Rachna Vargiya

Phattsessionz: recording 1000 adolescent speakers in schools in Germany
Christoph Draxler, Alexander Steffen

An Amharic speech corpus for large vocabulary continuous speech recognition
Solomon Teferra Abate, Wolfgang Menzel, Bairu Tafila

The FASiL speech and multimodal corpora
Hans Dolfing, David Reitter, Luís Almeida, Nuno Beires, Michael Cody, Rui Gomes, Kerry Robinson, Roman Zielinski

Revealing phonological similarities between German and Dutch
Karin Müller





Prosodic Structure


The focus prosody: more than a simple binary function
Véronique Aubergé, Albert Rilliard

Peak timing in two dialects of Connaught Irish
Martha Dalton, Ailbhe Ní Chasaide

Compound rises and "uptalk" in spoken English
Janet Fletcher

Duration and the temporal structure of Mandarin discourse
Li-chiung Yang

Prosodic realization of split noun phrases in Mandarin Chinese compared in topic and focus contexts
Bei Wang

Downstep effect on disyllabic words of citation forms in standard Chinese
Ziyu Xiong

Estimation of intonation variation with constrained tone transformations
Jinfu Ni, Hisashi Kawai, Keikichi Hirose

Voice quality of falling tones in Taiwan Min
Ho-hsien Pan

Duration, intensity and pause predictions in relation to prosody organization
Chiu-yu Tseng, Bau-Ling Fu

Pitch accent prediction: effects of genre and speaker
Jiahong Yuan, Jason M. Brenier, Daniel Jurafsky

Analysis and modeling of fundamental frequency contours of Hindi utterances
Hiroya Fujisaki, Sumio Ohno

Fundamental frequency and tone in isiZulu: initial experiments
Natasha Govender, Etienne Barnard, Marelie Davel

Intonational sequences in Tuscan Italian
Judith Bishop, Marc Peake, Dmitry Sityaev

Effects of raddoppiamento sintattico on tonal alignment in Italian
Caterina Petrone

Acoustic analysis of Czech stress: intonation, duration and intensity revisited
Tomás Dubeda, Jan Votrubec

Variability of F0 peak alignment in Moroccan Arabic accentual focus
Mohamed Yeou

Phonological analysis of schwa and liaison within the PFC project (phonologie du français contemporain): how determinant are the prosodic factors?
Anne Lacheret, Ch. Lyche, Michel Morel

Abstractness in speech-metronome synchronisation: P-centres as cyclic attractors
Plínio A. Barbosa, Pablo Arantes, Alexsandro R. Meireles, Jussara M. Vieira





Large Vocabulary Speech Recognition Systems


Development of a conversational telephone speech recognizer for Levantine Arabic
Dimitra Vergyri, Katrin Kirchhoff, R. Gadde, Andreas Stolcke, Jing Zheng

Exploiting large quantities of spontaneous speech for unsupervised training of acoustic models
Bhuvana Ramabhadran

Improved spontaneous Mandarin speech recognition by disfluency interruption point (IP) detection using prosodic features
Che-Kuang Lin, Lin-Shan Lee

Improvements to the BBN RT04 Mandarin conversational telephone speech recognition system
Jeff Z. Ma, Spyros Matsoukas

Incorporating a Bayesian wide phonetic context model for acoustic rescoring
Sakriani Sakti, Satoshi Nakamura, Konstantin Markov

Modeling vowels for Arabic BN transcription
Abdel Messaoudi, Lori Lamel, Jean-Luc Gauvain

Recent progress in Arabic broadcast news transcription at BBN
Mohamed Afify, Long Nguyen, Bing Xiang, Sherif Abdou, John Makhoul

The 2004 BBN 1xRT recognition systems for English broadcast news and conversational telephone speech
Spyros Matsoukas, Rohit Prasad, Srinivas Laxminarayan, Bing Xiang, Long Nguyen, Richard Schwartz

The 2004 BBN/LIMSI 20xRT English conversational telephone speech recognition system
Rohit Prasad, Spyros Matsoukas, C.-L. Kao, Jeff Z. Ma, D.-X. Xu, T. Colthurst, O. Kimball, Richard Schwartz, Jean-Luc Gauvain, Lori Lamel, Holger Schwenk, G. Adda, F. Lefevre

The BBN Mandarin broadcast news transcription system
Bing Xiang, Long Nguyen, Xuefeng Guo, Dongxin Xu

The LIUM speech transcription system: a CMU Sphinx III-based system for French broadcast news
Paul Deléglise, Yannick Estève, Sylvain Meignier, Teva Merlin

Transcribing lectures and seminars
Lori Lamel, G. Adda, E. Bilinski, Jean-Luc Gauvain

Transcription of conference room meetings: an investigation
Thomas Hain, John Dines, Giulia Garau, Martin Karafiát, Darren Moore, Vincent Wan, Roeland Ordelman, Steve Renals

Where are we in transcribing French broadcast news?
Jean-Luc Gauvain, G. Adda, Martine Adda-Decker, Alexandre Allauzen, V. Gendner, Lori Lamel, Holger Schwenk

Two-pass strategy for handling OOVs in a large vocabulary recognition task
Odette Scharenborg, Stephanie Seneff

The BBN RT04 English broadcast news transcription system
Long Nguyen, Bing Xiang, Mohamed Afify, Sherif Abdou, Spyros Matsoukas, Richard Schwartz, John Makhoul

Investigations on ensemble based semi-supervised acoustic model training
Rong Zhang, Ziad Al Bawab, Arthur Chan, Ananlada Chotimongkol, David Huggins-Daines, Alexander I. Rudnicky

Fully automated system for Czech spoken broadcast transcription with very large (300k+) lexicon
Jan Nouza, Jindrich Zdánský, Petr David, Petr Cerva, Jan Kolorenc, Dana Nejedlová

Experiments with probabilistic principal component analysis in LVCSR
Mike Schuster, Takaaki Hori, Atsushi Nakamura

Vietnamese large vocabulary continuous speech recognition
Thang Tat Vu, Dung Tien Nguyen, Mai Chi Luong, John-Paul Hosom

Data sampling for improved speech recognizer training
Takahiro Shinozaki, Mari Ostendorf, Les Atlas


Prosody Modelling and Speech Technology I, II


Context in multi-lingual tone and pitch accent recognition
Gina-Anne Levow

Automatic prominence identification and prosodic typology
Fabio Tamburini

Influence of syntax on prosodic boundary prediction
Tommy Ingulfsen, Tina Burrows, Sabine Buchholz

Using prosodic information for disambiguation purposes
Roberto Gretter, Dino Seppi

Analysis of the effects of word emphasis and echo question on F0 contours of Cantonese utterances
Wentao Gu, Keikichi Hirose, Hiroya Fujisaki

Combining models of prosodic phrasing and pausing
Tina Burrows, Peter Jackson, Katherine Knill, Dmitry Sityaev

Analysis by synthesis of speech prosody: the Prozed environment
Daniel Hirst, Cyril Auran

A discriminative approach to phrase break modelling
Stephen Cox

Stochastic and syntactic techniques for predicting phrase breaks
Ian Read, Stephen Cox

Tree-based prediction of prosodic phrase breaks on top of shallow textual features
Gerasimos Xydas, Panagiotis Zervas, Georgios Kouroupetroglou, Nikolaos Fakotakis, George Kokkinakis

Chinese prosodic phrasing with a constraint-based approach
Honghui Dong, Jianhua Tao, Bo Xu

A probabilistic approach to prosodic word prediction for Mandarin Chinese TTS
Minghui Dong, Kim-Teng Lua, Haizhou Li

Evaluation of a system for F0 contour prediction for European Portuguese
João Paulo Teixeira, Diamantino Freitas, Hiroya Fujisaki

Analysis on command sequences of a F0 generation model for Mandarin speech and its application to their automatic extraction
Ke Li, Yoshinori Sagisaka

Corpus-based extraction of F0 contour generation process model parameters
Keikichi Hirose, Yusuke Furuyama, Nobuaki Minematsu

Optimized selection of intonation dictionaries in corpus based intonation modelling
David Escudero, Valentín Cardeñoso-Payo

Generation of fundamental frequency contours for Mandarin speech synthesis based on tone nucleus model
Qinghua Sun, Keikichi Hirose, Wentao Gu, Nobuaki Minematsu

On the inter-syllable coarticulation effect of pitch modeling for Mandarin speech
Chen-Yu Chiang, Yih-Ru Wang, Sin-Horng Chen

Training the tilt intonation model using the JEMA methodology
Matej Rojc, Pablo Daniel Aguero, Antonio Bonafonte, Zdravko Kacic

Piecewise linear stylization of pitch via wavelet analysis
Dagen Wang, Shrikanth Narayanan

Phonetic labeling and segmentation of mixed-lingual prosody databases
Harald Romsdorfer, Beat Pfister

Exploratory analysis of linguistic data based on genetic algorithm for robust modeling of the segmental duration of speech
Edmilson Morais, Fábio Violaro

Annotation-mining for rhythm model comparison in Brazilian Portuguese
Dafydd Gibbon, Flaviane Romani Fernandes

A stochastic approach to phoneme and accent estimation
Tohru Nagano, Shinsuke Mori, Masafumi Nishimura

The detection of emphatic words using acoustic and lexical features
Jason M. Brenier, Daniel M. Cer, Daniel Jurafsky

Tone recognition in Mandarin using focus
Dinoj Surendran, Gina-Anne Levow, Yi Xu

An automatic intonation recognizer for the Polish language based on machine learning and expert knowledge
Mikolaj Wypych

Generalized envelope matching technique for time-scale modification of speech (GEM-TSM)
Atsuhiro Sakurai




Text-to-Speech I, II


Learning to personalize spoken generation for dialogue systems
François Mairesse, Marilyn Walker

Optimization of text-to-speech phonetic transcriptions using a-posteriori signal comparison
S. Revelin, D. Cadic, C. Waast-Richard

Voice transformation using principle component analysis based LSF quantization and dynamic programming approach
Özgül Salor, Mübeccel Demirekler

Adapt Mandarin TTS system to Chinese dialect TTS systems
Hai Ping Li, Wei Zhang

Grapheme-to-phoneme conversion based on TBL algorithm in Mandarin TTS system
Min Zheng, Qin Shi, Wei Zhang, Lianhong Cai

An automaton-based machine learning technique for automatic phonetic transcription
Paolo Massimino, Alberto Pacchiotti

Comparative objective and subjective evaluation of three data-driven techniques for proper name pronunciation
Tasanawan Soonklang, Robert I. Damper, Yannick Marchand

Articulatory synthesis using corpus-based estimation of line spectrum pairs
Olov Engwall

Effects of pitch accent type on interpreting information status in synthetic speech
Aoju Chen, Els den Os

Towards generic spatial object model and route guidance grammar for speech-based systems
Perttu Prusi, Anssi Kainulainen, Jaakko Hakulinen, Markku Turunen, Esa-Pekka Salonen, Leena Helin

Duration-embedded bi-HMM for expressive voice conversion
Chi-Chun Hsia, Chung-Hsien Wu, Te-Hsien Liu

Analysis of major factors of naturalness degradation in concatenative synthesis
Toshio Hirai, Hisashi Kawai, Minoru Tsuzaki, Nobuyuki Nishizawa

Duration modeling and memory optimization in a Mandarin TTS system
Jilei Tian, Jani Nurminen, Imre Kiss

A bi-lingual Mandarin-to-Taiwanese text-to-speech system
Min-Siong Liang, Ke-Chun Chuang, Rhuei-Cheng Yang, Yuang-Chin Chiang, Ren-Yuan Lyu

Using morphology and phoneme history to improve grapheme-to-phoneme conversion
Uwe D. Reichel, Florian Schiel

Predicting consonant duration with Bayesian belief networks
Olga Goubanova, Simon King

Inducing decision tree pronunciation variation models from annotated speech data
Per-Anders Jande

Phonetic transcription verification with generalized posterior probability
Lijuan Wang, Yong Zhao, Min Chu, Frank K. Soong, Zhigang Cao

Training a maximum entropy model for surface realization
Hua Cheng, Fuliang Weng, Niti Hantaweepant, Lawrence Cavedon, Stanley Peters

NAM-to-speech conversion with Gaussian mixture models
Tomoki Toda, Kiyohiro Shikano

Which Italian do current systems speak? a first step towards pronunciation modelling of Italian varieties
Michelina Savino, Mario Refice, Massimo Mitaritonna

Modelling pitch accent types for Polish speech synthesis
Dominika Oliver, Robert A. J. Clark

Learning methods and features for corpus-based phrase break prediction on Thai
C. Hansakunbuntheung, Ausdang Thangthai, Chai Wutiwiwatchai, Rungkarn Siricharoenchai

Hidden Markov models for grapheme to phoneme conversion
Paul Taylor

Pitch-effects in diphone recording: are logatomes inappropriate?
Ulrich Reubold, Alexander Steffen

Speech parameter generation algorithm considering global variance for HMM-based speech synthesis
Tomoki Toda, Keiichi Tokuda

Performance evaluation of style adaptation for hidden semi-Markov model based speech synthesis
Makoto Tachibana, Junichi Yamagishi, Takashi Masuko, Takao Kobayashi

A comparison of methods for speaker-dependent pronunciation tuning for text-to-speech synthesis
Gabriel Webster, Tina Burrows, Katherine Knill

Perceptually-based data-driven join costs: comparing join types
Ann K. Syrdal, Alistair D. Conkie

Discontinuity detection in concatenated speech synthesis based on nonlinear speech analysis
Yannis Pantazis, Yannis Stylianou, Esther Klabbers


Speaker Characterization and Recognition I-IV


Robust distant speaker recognition based on position dependent cepstral mean normalization
Longbiao Wang, Norihide Kitaoka, Seiichi Nakagawa

Speaker adaptation in the NIST speaker recognition evaluation 2004
David A. van Leeuwen

A distance measure between GMMs based on the unscented transform and its application to speaker recognition
Jacob Goldberger, Hagai Aronowitz

Estimation of speaker's height and vocal tract length from speech signal
Sorin Dusan

On the relationship between phonetic modeling precision and phonetic speaker recognition accuracy
Doroteo Torre Toledano, Carlos Fombella, Joaquin Gonzalez Rodriguez, Luis Hernandez Gomez

Open-set speaker identification using adapted Gaussian mixture models
J. Fortuna, P. Sivakumaran, A. Ariyaeeinia, A. Malegaonkar

Speaker verification in noisy conditions using correlated subband features
James McAuley, Ji Ming, Pat Corr

Probabilistic anchor models approach for speaker verification
Mikaël Collet, Yassine Mam, Delphine Charlet, Frédéric Bimbot

A Bayesian network approach combining pitch and spectral envelope features to reduce channel mismatch in speaker verification and forensic speaker recognition
Mijail Arcienega, Anil Alexander, Philipp Zimmermann, Andrzej Drygajlo

Channel robust speaker verification via Bayesian blind stochastic feature transformation
Kwok-Kwong Yiu, Man-Wai Mak, Sun-Yuan Kung

dPLRM-based speaker identification with log power spectrum
Tomoko Matsui, Kunio Tanabe

Speaker verification using Gaussian mixture models within changing real car environments
Xianxian Zhang, John H. L. Hansen, Pongtep Angkititrakul, Kazuya Takeda

The correspondences between the perception of the speaker individualities contained in speech sounds and their acoustic properties
Kanae Amino, Tsutomu Sugawara, Takayuki Arai

A noise-robust pitch synchronous feature extraction algorithm for speaker recognition systems
Samuel Kim, Sungwan Yoon, Thomas Eriksson, Hong-Goo Kang, Dae Hee Youn

Modeling high-level information by using Gaussian mixture correlation for GMM-UBM based speaker recognition
Jing Deng, Thomas Fang Zheng, Zhanjiang Song, Jian Liu

In-set/out-of-set speaker identification based on discriminative speech frame selection
Xianxian Zhang, John H. L. Hansen

Mixture of support vector machines for text-independent speaker recognition
Zhenchun Lei, Yingchun Yang, Zhaohui Wu

Optimal model order selection based on regression tree in speaker identification
Shilei Zhang, Junmei Bai, Shuwu Zhang, Bo Xu

Speaker verification improvement using blind inversion of distortions
Marcos Faúndez-Zanuy, Jordi Solé-Casals

Maximum conditional mutual information modeling for speaker verification
Mohamed Kamal Omar, Jiri Navrátil, Ganesh N. Ramaswamy

Class-dependent score combination for speaker recognition
Luciana Ferrer, Kemal Sönmez, Sachin Kajarekar

Modeling intra-speaker variability for speaker recognition
Hagai Aronowitz, Dror Irony, David Burshtein

Liveness detection using cross-modal correlations in face-voice person authentication
Girija Chetty, Michael Wagner

Stream-weight optimization by LDA and AdaBoost for multi-stream speaker verification
Taichi Asami, Koji Iwano, Sadaoki Furui

Considering speech quality in speaker verification fusion
Yosef A. Solewicz, Moshe Koppel

MLLR transforms as features in speaker recognition
Andreas Stolcke, Luciana Ferrer, Sachin Kajarekar, Elizabeth Shriberg, Anand Venkataraman

Gaussian mixture modelling of broad phonetic and syllabic events for text-independent speaker verification
Brendan Baker, Robbie Vogt, Sridha Sridharan

Efficient speaker identification and retrieval
Hagai Aronowitz, David Burshtein

The Cambridge University March 2005 speaker diarisation system
R. Sinha, S. E. Tranter, M. J. F. Gales, P. C. Woodland

Combining speaker identification and BIC for speaker diarization
Xuan Zhu, Claude Barras, Sylvain Meignier, Jean-Luc Gauvain

Broadcast news speaker tracking for ESTER 2005 campaign
Dan Istrate, Nicolas Scheffer, Corinne Fredouille, Jean-François Bonastre

Experiments on speaker tracking and segmentation in radio broadcast news
Daniel Moraru, Mathieu Ben, Guillaume Gravier

Unsupervised segmentation and verification of multi-speaker conversational speech
Emanuele Dalmasso, Pietro Laface, Daniele Colibro, Claudio Vair

Focal speakers: a speaker selection method able to deal with heterogeneous similarity criteria
Sacha Krstulovic, Frédéric Bimbot, Delphine Charlet, Olivier Boëffard

A model space framework for efficient speaker detection
Mathieu Ben, Guillaume Gravier, Frédéric Bimbot

Speaker detection using acoustic event sequences
Nicolas Scheffer, Jean-François Bonastre

Speaker clustering of unknown utterances based on maximum purity estimation
Wei-Ho Tsai, Hsin-Min Wang

Modified DISTBIC algorithm for speaker change detection
Petra Zochová, Vlasta Radová

Decision trees with improved efficiency for fast speaker verification
Gilles Gonon, Rémi Gribonval, Frédéric Bimbot

A speaker independent "liveness" test for audio-visual biometrics
Nicolas Eveno, Laurent Besacier

Distributed speaker recognition using speaker-dependent VQ codebook and earth mover's distance
Shingo Kuroiwa, Yoshiyuki Umeda, Satoru Tsuge, Fuji Ren

Speaker verification via articulatory feature-based conditional pronunciation modeling with vowel and consonant mixture models
Ka-Yee Leung, Man-Wai Mak, Manhung Siu, Sun-Yuan Kung

Prosodic features based on wavelet analysis for speaker verification
Jixu Chen, Beiqian Dai, Jun Sun

Relevant information extraction for discriminative training applied to speaker identification
M. Mihoubi, Douglas O'Shaughnessy, P. Dumouchel

Conceiving a new sequence kernel and applying it to SVM speaker verification
Jérôme Louradour, Khalid Daoudi

The predictive differential amplitude spectrum for robust speaker recognition in stationary noises
Jing Deng, Thomas Fang Zheng, Jian Liu, Wenhu Wu

Data-driven clustering for blind feature mapping in speaker verification
Michael Mason, Robbie Vogt, Brendan Baker, Sridha Sridharan

Improved covariance modeling for GMM in speaker identification
Xi Zhou, Zhi-qiang Yao, Beiqian Dai

Modelling session variability in text-independent speaker verification
Robbie Vogt, Brendan Baker, Sridha Sridharan

Overlapping wavelet packet features for speaker verification
Mihalis Siafarikas, Todor Ganchev, Nikolaos Fakotakis, George Kokkinakis

Using Hadamard ECOC in multi-class problems based on SVM
An-rong Yin, Xiang Xie, Jingming Kuang


Single-channel Speech Enhancement


Supergaussian GARCH models for speech signals
Israel Cohen

A spectral conversion approach to feature denoising and speech enhancement
A. Mouchtaris, J. Van der Spiegel, P. Mueller, P. Tsakalides

Acoustic feedback cancellation in speech reinforcement systems for vehicles
Alfonso Ortega, Eduardo Lleida, Enrique Masgrau, Luis Buera, Antonio Miguel

Implicit control of noise canceller for speech enhancement
Julien Bourgeois, Jürgen Freudenberger, Guillaume Lathoud

Speech enhancement using Markov model of speech segments
T. M. Sunil Kumar, T. V. Sreenivas

A wavelet based noise reduction algorithm for speech signal corrupted by coloured noise
Vladimir Braquet, Takao Kobayashi

Speech enhancement in temporal DFT trajectories using Kalman filters
Esfandiar Zavarehei, Saeed Vaseghi

Formant-tracking linear prediction models for speech processing in noisy environments
Qin Yan, Saeed Vaseghi, Esfandiar Zavarehei, Ben Milner

Statistical noise compensation for cochlear implant processing
Hui Jiang, Qian-Jie Fu

WPD-based noise suppression using nonlinearly weighted threshold quantile estimation and optimal wavelet shrinking
Tuan Van Pham, Gernot Kubin

Subjective and objective quality assessment of regression-enhanced speech in real car environments
Weifeng Li, Katunobu Itou, Kazuya Takeda, Fumitada Itakura

A model for selective segregation of a target instrument sound from the mixed sound of various instruments
Masashi Unoki, Masaaki Kubo, Atsushi Haniu, Masato Akagi

Improved decision directed approach for speech enhancement using an adaptive time segmentation
Richard C. Hendriks, Richard Heusdens, Jesper Jensen

Generalized filter-bank equalizer for noise reduction with reduced signal delay
Heinrich W. Lollmann, Peter Vary

A pitch-based model for separation of reverberant speech
Nicoleta Roman, DeLiang Wang

On noise gain estimation for HMM-based speech enhancement
David Y. Zhao, W. Bastiaan Kleijn

Speech enhancement using auditory phase opponency model
Om Deshmukh, Carol Espy-Wilson



Gender and Age Issues in Speech and Language Research I, II


Speaker adaptive acoustic modeling with mixture of adult and children's speech
Matteo Gerosa, Diego Giuliani, Fabio Brugnara

A comparison of human and computer recognition accuracy for children's speech
Shona D'Arcy, Martin Russell

Italian children's speech recognition for advanced interactive literacy tutors
Piero Cosi, Bryan L. Pellom

Do speech recognizers prefer female speakers?
Martine Adda-Decker, Lori Lamel

Detecting politeness and frustration state of a child in a conversational computer game
Serdar Yildirim, Chul Min Lee, Sungbok Lee, Alexandros Potamianos, Shrikanth Narayanan

Gender in everyday speech and language: a corpus-based study
Diana Binnenpoorte, Christophe Van Bael, Els den Os, Louis Boves

Adaptation and normalization experiments in speech recognition for 4 to 8 year old children
Daniel Elenius, Mats Blomberg

PROSPECT features and their application to missing data techniques for vocal tract length normalization
Wim Jansen, Hugo Van Hamme

Data driven subword unit modeling for speech recognition and its application to interactive reading tutors
Andreas Hagen, Bryan L. Pellom

The PF_STAR children's speech corpus
Anton Batliner, Mats Blomberg, Shona D'Arcy, Daniel Elenius, Diego Giuliani, Matteo Gerosa, Christian Hacker, Martin Russell, Stefan Steidl, Michael Wong

The Swedish NICE corpus - spoken dialogues between children and embodied characters in a computer game scenario
Linda Bell, Johan Boye, Joakim Gustafson, Mattias Heldner, Anders Lindström, Mats Wirén

A preprocessing technique for improving speech intelligibility in reverberant environments: the effect of steady-state suppression on elderly people
Yusuke Miyauchi, Nao Hodoshima, Keiichi Yasu, Nahoko Hayashi, Takayuki Arai, Mitsuko Shindo



Spoken Language Translation I, II


Document driven machine translation enhanced ASR
M. Paulik, Christian Fügen, Sebastian Stüker, Tanja Schultz, Thomas Schaaf, Alex Waibel

Automatic text dictation in computer-assisted translation
Shahram Khadivi, András Zolnay, Hermann Ney

On the use of speech recognition in computer assisted translation
L. Rodríguez, J. Civera, E. Vidal, Francisco Casacuberta, C. Martínez

Speech translation for low-resource languages: the case of Pashto
Andreas Kathol, Kristin Precoda, Dimitra Vergyri, Wen Wang, Susanne Riehemann

Finite-state transducer inference for a speech-input Portuguese-to-English machine translation system
David Picó, Jorge González, Francisco Casacuberta, Diamantino Caseiro, Isabel Trancoso

Quantitative evaluation of effects of speech recognition errors on speech translation quality
Kenko Ohta, Keiji Yasuda, Genichiro Kikui, Masuzo Yanagida

On the integration of speech recognition and statistical machine translation
E. Matusov, S. Kanthak, Hermann Ney

Integrated n-best re-ranking for spoken language translation
V. H. Quan, M. Federico, M. Cettolo

An n-gram-based statistical machine translation decoder
Josep M. Crego, José B. Mariño, Adrià de Gispert

Use of maximum entropy in natural word generation for statistical concept-based speech-to-speech translation
Liang Gu, Yuqing Gao

Improving statistical machine translation by classifying and generalizing inflected verb forms
Adrià de Gispert, José B. Mariño, Josep M. Crego

Improved speech recognition word lattice translation by confidence measure
Abdulvohid Bozarov, Yoshinori Sagisaka, Ruiqiang Zhang, Genichiro Kikui


Multi-channel Speech Enhancement


A stereo input-output superdirective beamformer for dual channel noise reduction
Thomas Lotter, Bastian Sauert, Peter Vary

Kalman filters for time delay of arrival-based source localization
Ulrich Klee, Tobias Gehrig, John McDonough

Simultaneous adaptation of echo cancellation and spectral subtraction for in-car speech recognition
Osamu Ichikawa, Masafumi Nishimura

Variable step size adaptive decorrelation filtering for competing speech separation
Rong Hu, Yunxin Zhao

Speech extraction in a car interior using frequency-domain ICA with rapid filter adaptations
Daisuke Saitoh, Atsunobu Kaminuma, Hiroshi Saruwatari, Tsuyoki Nishikawa, Akinobu Lee

Speech enhancement using non-acoustic sensors
Rongqiang Hu, Sunil D. Kamath, David V. Anderson

Improved blind dereverberation performance by using spatial information
Marc Delcroix, Takafumi Hikichi, Masato Miyoshi

A hybrid microphone array post-filter in a diffuse noise field
Junfeng Li, Masato Akagi

A framework for estimation of clean speech by fusion of outputs from multiple speech enhancement systems
Venkatesh Krishnan, Phil S. Whitehead, David V. Anderson, Mark A. Clements

A study of weighted CSP analysis with average speech spectrum for noise robust talker localization
Yuki Denda, Takanobu Nishiura, Yoichi Yamashita

Sound segregation based on binaural zero-crossings
Young-Ik Kim, Sung Jun An, Rhee Man Kil, Hyung-Min Park

A two-microphone diversity system and its application for hands-free car kits
Jürgen Freudenberger, Klaus Linhard

Directionally constrained minimization of power algorithm for speech signals
Takahiro Murakami, Kiyoshi Kurihara, Yoshihisa Ishida

Oriented global coherence field for the estimation of the head orientation in smart rooms equipped with distributed microphone arrays
Alessio Brutti, Maurizio Omologo, Piergiorgio Svaizer

Robust speaker localization through adaptive weighted pair TDOA (AWEPAT) estimation
Nilesh Madhu, Rainer Martin

A spectrogram model for enhanced source localization and noise-robust ASR
Guillaume Lathoud, Mathew Magimai-Doss, Bertrand Mesot

Denoising through source separation and minimum tracking
Sriram Srinivasan, Mattias Nilsson, W. Bastiaan Kleijn

Collaborative voice activity detection for hearing aids
Louisa Busca Grisoni, John H. L. Hansen

Using inter-frequency decorrelation to reduce the permutation inconsistency problem in blind source separation
Enrique Robledo-Arnuncio, Biing-Hwang Juang

A graphical model for multi-sensory speech processing in air-and-bone conductive microphones
Amarnag Subramanya, Zhengyou Zhang, Zicheng Liu, Jasha Droppo, Alex Acero


Phonetics and Phonology I, II


On the nature of acoustic information in identification of coarticulated vowels
Sorin Dusan

Impact of duration on F1/F2 formant values of oral vowels: an automatic analysis of large broadcast news corpora in French and German
Cédric Gendrot, Martine Adda-Decker

Modeling of between-speaker and within-speaker variation in spontaneous speech tempo
Hugo Quene

Vowel devoicing vs. mora-timed rhythm in spontaneous Japanese - inspection of phonetic labels of OGI_TS
Masahiko Komatsu, Makiko Aoyagi

Does vowel space size depend on language vowel inventories? evidence from two Arabic dialects and French
Jalal-Eddin Al-Tamimi, Emmanuel Ferragne

Understanding phonology by phonetic implementation
Chilin Shih

The feature [sonorant] in lexical access
Danny R. Moates, Zinny S. Bond, Russell Fox, Verna Stockmal

Voice and aspiration in German and East Bengali stops: a cross-language study
Simone Mikuteit

Polder Dutch: aspects of the /ei/-lowering in standard Dutch
Irene Jacobi, Louis C. W. Pols, Jan Stroop

Production and perception of Vietnamese vowels
Eric Castelli, René Carré

Using open quotient for the characterisation of Vietnamese glottalised tones
Tuan Vu Ngoc, Christophe d'Alessandro, Alexis Michaud

On the acoustic characterization of ejective stops in Waima'a
John Hajek, Mary Stevens

Spirantization of /p t k/ in Sienese Italian and so-called semi-fricatives
Mary Stevens, John Hajek

Italian geminates under speech rate and focalization changes: kinematic, acoustic, and perception data
Barbara Gili Fivela, Claudio Zmarich

Durational characteristics of Korean Lombard speech
Sunhee Kim

A cross-linguistic study of vowel quantity in different word structures: Japanese, Finnish and Czech
Toshiko Isei-Jaakkola, Satoshi Asakawa

Acoustic properties of foreign accent: VOT variations in Moroccan-accented Italian
Laura Mori, Melissa Barkat-Defradas

The interrelation between the perception and production of English vowels by native speakers of Brazilian Portuguese
Andréia S. Rauber, Paola Escudero, Ricardo A. H. Bion, Barbara O. Baptista

Recognition of German obstruents
Julia Hoelterhoff

Czech voiced labiodental continuant discrimination from basic acoustic data
Radek Skarnitzl, Jan Volín

An elitist approach for extracting automatically well-realized speech sounds with high confidence
Jean-Baptiste Maj, Anne Bonneau, Dominique Fohr, Yves Laprie

Applying multiple regression models for predicting word duration in a corpus of spontaneous speech
Na'im R. Tyson

On European Portuguese automatic syllabification
Catarina Oliveira, Lurdes Castro Moutinho, António J. S. Teixeira

Rule-based grapheme-to-phoneme method for the Greek
A. Chalamandaris, S. Raptis, Pirros Tsiakoulis

Assimilation and deletion phenomena involving word-final /n/ and word-initial /p, t, k/ in Modern Greek: a codification of the observed variation intended for use in TTS synthesis
Constandinos Kalimeris, George Mikros, Stelios Bakamidis

A German viseme-set for automatic transcription of input text used for audio-visual speech synthesis
Christian Weiss, Bianca Aschenberner

Visual perception of anticipatory rounding gestures in French
Johanna-Pascale Roy



TTS Inventory


Synthesising hyperarticulation in unit selection TTS
Matthew P. Aylett

Symbolic prosody driven unit selection for highly natural synthetic speech
Daniel Tihelka

Hybrid syllable/triphone speech synthesis
Jindrich Matousek, Zdenek Hanzlícek, Daniel Tihelka

A neural network approach for the design of the target cost function in unit-selection speech synthesis
Francisco Campillo Díaz, José Luis Alba, Eduardo Rodríguez Banga

FSM and k-nearest-neighbor for corpus based video-realistic audio-visual synthesis
Christian Weiss

An embedded and concatenative approach to TTS of multiple languages
Gui-Lin Chen, Ke-Song Han, Zhen-Li Yu, Dong-Jian Yue, Yi-Qing Zu

Morphing spectral envelopes using audio flow
Tony Ezzat, Ethan Meyers, James Glass, Tomaso Poggio

Linguistic features weighting for a text-to-speech system without prosody model
Vincent Colotte, Richard Beaufort

Unit selection synthesis database development using utterance verification
Ingunn Amdal, Torbjørn Svendsen

Refining phoneme segmentations using speaker-adaptive context dependent boundary models
Yong Zhao, Lijuan Wang, Min Chu, Frank K. Soong, Zhigang Cao

Customizing base unit set with speech database in TTS systems
Yining Chen, Yong Zhao, Min Chu

Unit selection for speech synthesis based on a new acoustic target cost
Soufiane Rouibia, Olivier Rosec

Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling
Dan Chazan, Ron Hoory, Zvi Kons, Ariel Sagi, Slava Shechtman, Alexander Sorin

High quality Spanish restricted-domain TTS oriented to a weather forecast application
Francesc Alías, Ignasi Iriondo, Lluís Formiga, Xavier Gonzalvo, Carlos Monzo, Xavier Sevillano

Comparing spectral distance measures for join cost optimization in concatenative speech synthesis
Ingmund Bjørkan, Torbjørn Svendsen, Snorre Farner

HMM-based European Portuguese TTS system
Maria João Barros, Ranniery Maia, Keiichi Tokuda, Fernando Gil Resende, Diamantino Freitas

Combining the flexibility of speech synthesis with the naturalness of pre-recorded audio: a comparison of two approaches to phrase-splicing TTS
Wael Hamza, John F. Pitrelli

Codec integrated voice conversion for embedded speech synthesis
Guntram Strecha, Oliver Jokisch, Matthias Eichner, Rüdiger Hoffmann

Evaluation of VTLN-based voice conversion for embedded speech synthesis
David Sundermann, Guntram Strecha, Antonio Bonafonte, Harald Höge, Hermann Ney

Model adaptation and adaptive training using ESAT algorithm for HMM-based speech synthesis
Juri Isogai, Junichi Yamagishi, Takao Kobayashi

Embedded Cantonese TTS for multi-device access to web content
Tien-Ying Fung, Yuk-Chi Li, Eddie Sio, Icarus Lee, Helen Meng, P. C. Ching

Model based analysis of a diphone database for improved unit concatenation
Karl Schnell, Arild Lacroix



Discourse and Dialogue I, II


Synchronizing dialogue contributions of human users and virtual characters in a virtual reality environment
Norbert Pfleger, Markus Löckelt

Does active learning help automatic dialog act tagging in meeting data?
Anand Venkataraman, Yang Liu, Elizabeth Shriberg, Andreas Stolcke

A principled approach for rejection threshold optimization in spoken dialog systems
Dan Bohus, Alexander I. Rudnicky

Application of confidence measures for dialogue systems through the use of parallel speech recognizers
David Pérez-Piñar López, Carmen García Mateo

Multi-level information and automatic dialog acts detection in human-human spoken dialogs
Sophie Rosset, Delphine Tribout

From question answering to spoken dialogue: towards an information search assistant for interactive multimodal information extraction
Rieks op den Akker, Harry Bunt, Simon Keizer, Boris van Schooten

Timing of experimentally elicited minimal responses as quantitative evidence for the use of intonation in projecting TRPs
Wieneke Wesseling, Rob J. J. H. van Son

Linguistic and acoustic features depending on different situations - the experiments considering speech recognition rate
Shinya Yamada, Toshihiko Itoh, Kenji Araki

Towards VoiceXML compilation for portable embedded applications in ubiquitous environments
Dirk Bühler, Stefan W. Hamerich

Prosody in public speech: analyses of a news announcement and a political interview
Eva Strangert

Characterising dialogue call-flows for pervasive environments
Amit Anil Nanavati, Nitendra Rajput

An architecture for pluggable disambiguation mechanism for RDC based voice applications
Tanveer Faruquie, Pankaj Kankar, Nitendra Rajput, Abhishek Verma

Adapting dialog call-flows for pervasive devices
Nitendra Rajput, Amit Anil Nanavati, Abhishek Kumar, Neeraj Chaudhary

Clarification questions to improve dialogue flow and speech recognition in spoken dialogue systems
Ulf Krum, Hartwig Holzapfel, Alex Waibel

Speech interface for controlling an hi-fi audio system based on a Bayesian belief networks approach for dialog modeling
Fernando Fernández, Javier Ferreiros, Valentín Sama, Juan Manuel Montero, Rubén San Segundo, Javier Macías-Guarasa, Rafael García





Topics in Speech Recognition


Comparing HMM, maximum entropy, and conditional random fields for disfluency detection
Yang Liu, Elizabeth Shriberg, Andreas Stolcke, Mary Harper

Recognizing speech from simultaneous speakers
Bhiksha Raj, Rita Singh, Paris Smaragdis

Polynomial dynamic time warping kernel support vector machines for dysarthric speech recognition with sparse training data
Vincent Wan, James Carmichael

Flavoured acoustic model and combined spelling to sound for asymmetrical bilingual environment
R. Lejeune, J. Baude, C. Tchong, H. Crepy, C. Waast-Richard

Genetic triangulation of graphical models for speech and language processing
Chris Bartels, Kevin Duh, Jeff Bilmes, Katrin Kirchhoff, Simon King

Improving speech recognition using a data-driven approach
Guillermo Aradilla, Jithendra Vepa, Hervé Bourlard

Outlier detection for acoustic model training using robust statistics
Shigeki Matsuda, Wolfgang Herbordt, Satoshi Nakamura

Optimization methods for discriminative training
Jonathan Le Roux, Erik McDermott

Segmentation of recordings based on partial transcriptions
Patrick Cardinal, Gilles Boulianne, Michel Comeau

A speaker independent continuous speech recognizer for Amharic
Hussien Seid, Björn Gambäck

Optimizing the structure of partly-hidden Markov models using weighted likelihood-ratio maximization criterion
Tetsuji Ogawa, Tetsunori Kobayashi

Multilingual speech recognition: a unified approach
C. Santhosh Kumar, V. P. Mohandas, Haizhou Li

Detection of recognition errors based on classifiers trained on artificially created data
Tomás Bartos, Ludek Müller

On designing and evaluating speech event detectors
Jinyu Li, Chin-Hui Lee

Local word confidence measure using word graph and n-best list
Joseph Razik, Odile Mella, Dominique Fohr, Jean-Paul Haton

Mandarin/English mixed-lingual name recognition for mobile phone
Xiaolin Ren, Xin He, Yaxin Zhang

New word-level and sentence-level confidence scoring using graph theory calculus and its evaluation on speech understanding
Javier Ferreiros, Rubén San Segundo, Fernando Fernández, Luis-Fernando D'Haro, Valentín Sama, Roberto Barra, Pedro Mellén

Analysis of spectral space reduction in spontaneous speech and its effects on speech recognition performances
Masanobu Nakamura, Koji Iwano, Sadaoki Furui

SVitchboard 1: small vocabulary tasks from Switchboard
Simon King, Chris Bartels, Jeff Bilmes

