ISCA Archive Interspeech 2010 Sessions Booklet

Interspeech 2010

Makuhari, Chiba, Japan
26-30 September 2010

General Chair: Keikichi Hirose
doi: 10.21437/Interspeech.2010

ASR: Acoustic Models I-III

A discriminative splitting criterion for phonetic decision trees
Simon Wiesler, Georg Heigold, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney

Canonical state models for automatic speech recognition
Mark J. F. Gales, Kai Yu

Restructuring exponential family mixture models
Pierre L. Dognin, John R. Hershey, Vaibhava Goel, Peder Olsen

Unsupervised discovery and training of maximally dissimilar cluster models
Françoise Beaufays, Vincent Vanhoucke, Brian Strope

Probabilistic state clustering using conditional random field for context-dependent acoustic modelling
Khe Chai Sim

Integrate template matching and statistical modeling for speech recognition
Xie Sun, Yunxin Zhao

Boosting systems for LVCSR
George Saon, Hagen Soltau

Incorporating sparse representation phone identification features in automatic speech recognition using exponential families
Vaibhava Goel, Tara N. Sainath, Bhuvana Ramabhadran, Peder Olsen, David Nahamoo, Dimitri Kanevsky

Integrating MLP features and discriminative training in data sampling based ensemble acoustic modeling
Xin Chen, Yunxin Zhao

Semi-supervised training of Gaussian mixture models by conditional entropy minimization
Jui-Ting Huang, Mark Hasegawa-Johnson

A study of irrelevant variability normalization based training and unsupervised online adaptation for LVCSR
Guangchuan Shi, Yu Shi, Qiang Huo

Improvements to generalized discriminative feature transformation for speech recognition
Roger Hsiao, Florian Metze, Tanja Schultz

Parallel training of neural networks for speech recognition
Karel Veselý, Lukáš Burget, František Grézl

The use of sense in unsupervised training of acoustic models for ASR systems
Rita Singh, Benjamin Lambert, Bhiksha Raj

Boosted mixture learning of Gaussian mixture HMMs for speech recognition
Jun Du, Yu Hu, Hui Jiang

On the exploitation of hidden Markov models and linear dynamic models in a hybrid decoder architecture for continuous speech recognition
Volker Leutnant, Reinhold Haeb-Umbach

Context dependent modelling approaches for hybrid speech recognizers
Alberto Abad, Thomas Pellegrini, Isabel Trancoso, João Neto

A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination
Yotaro Kubo, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi

Decision tree state clustering with word and syllable features
Hank Liao, Chris Alberti, Michiel Bacchiani, Olivier Siohan

A duration modeling technique with incremental speech rate normalization
Hiroshi Fujimura, Takashi Masuko, Mitsuyoshi Tachimori

Long short-term memory networks for noise robust speech recognition
Martin Wöllmer, Yang Sun, Florian Eyben, Björn Schuller

One-model speech recognition and synthesis based on articulatory movement HMMs
Tsuneo Nitta, Takayuki Onoda, Masashi Kimura, Yurie Iribe, Kouichi Katsurada

Acoustic modeling with bootstrap and restructuring for low-resourced languages
Xiaodong Cui, Jian Xue, Pierre L. Dognin, Upendra V. Chaudhari, Bowen Zhou

Lecture speech recognition by combining word graphs of various acoustic models
Tetsuo Kosaka, Keisuke Goto, Takashi Ito, Masaharu Kato

Semi-parametric trajectory modelling using temporally varying feature mapping for speech recognition
Khe Chai Sim, Shilin Liu

Deep-structured hidden conditional random fields for phonetic recognition
Dong Yu, L. Deng

Semi-supervised learning for improved expression of uncertainty in discriminative classifiers
Jonathan Malkin, Jeff Bilmes

Modeling posterior probabilities using the linear exponential family
Peder Olsen, Vaibhava Goel, Charles Micchelli, John R. Hershey

Spoken Dialogue Systems II

New technique to enhance the performance of spoken dialogue systems based on dialogue states-dependent language models and grammatical rules
Ramón López-Cózar, David Griol

A stochastic finite-state transducer approach to spoken dialog management
Lluís-F. Hurtado, Joaquin Planells, Encarna Segarra, Emilio Sanchis, David Griol

Enhanced monitoring tools and online dialogue optimisation merged into a new spoken dialogue system design experience
Romain Laroche, Philippe Bretier, Ghislain Putois

Optimising a handcrafted dialogue system design
Romain Laroche, Ghislain Putois, Philippe Bretier

Utterance selection for speech acts in a cognitive tourguide scenario
Felix Putze, Tanja Schultz

Lexical entrainment of real users in the Let's Go spoken dialog system
Gabriel Parent, Maxine Eskenazi

Combining user intention and error modeling for statistical dialog simulators
Silvia Quarteroni, Meritxell González, Giuseppe Riccardi, Sebastian Varges

Parallel processing of interruptions and feedback in companions affective dialogue system
Jaakko Hakulinen, Markku Turunen, Raúl Santos de la Camara, Nigel Crook

Dynamic language modeling using Bayesian networks for spoken dialog systems
Antoine Raux, Neville Mehta, Deepak Ramachandran, Rakesh Gupta

Automatic detection of task-incompleted dialog for spoken dialog system based on dialog act n-gram
Sunao Hara, Norihide Kitaoka, Kazuya Takeda

Dialogue act detection in error-prone spoken dialogue systems using partial sentence tree and latent dialogue act matrix
Wei-Bin Liang, Chung-Hsien Wu, Yu-Cheng Hsiao

Detection of hot spots in poster conversations based on reactive tokens of audience
Tatsuya Kawahara, Kouhei Sumi, Zhi-Qiang Chang, Katsuya Takanashi

Psychological evaluation of a group communication activation robot in a party game
Yoichi Matsuyama, Shinya Fujie, Hikaru Taniyama, Tetsunori Kobayashi

Analyzing user utterances in barge-in-able spoken dialogue system for improving identification accuracy
Kyoko Matsuyama, Kazunori Komatani, Ryu Takeda, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

Pitch similarity in the vicinity of backchannels
Mattias Heldner, Jens Edlund, Julia Hirschberg

A rule-based backchannel prediction model using pitch and pause information
Khiet P. Truong, Ronald Poppe, Dirk Heylen

Speech Synthesis: Unit Selection and Others

A classifier-based target cost for unit selection speech synthesis trained on perceptual data
Volker Strom, Simon King

Applying scalable phonetic context similarity in unit selection of concatenative text-to-speech
Wei Zhang, Xiaodong Cui

Speech database reduction method for corpus-based TTS system
Mitsuaki Isogai, Hideyuki Mizuno

Automatic error detection for unit selection speech synthesis using log likelihood ratio based SVM classifier
Heng Lu, Zhen-Hua Ling, Si Wei, Lirong Dai, Ren-Hua Wang

Using robust Viterbi algorithm and HMM-modeling in unit selection TTS to replace units of poor quality
Hanna Silén, Elina Helander, Jani Nurminen, Konsta Koppinen, Moncef Gabbouj

Automatic detection of abnormal stress patterns in unit selection synthesis
Yeon-Jun Kim, Mark C. Beutnagel

Enhancements of Viterbi search for fast unit selection synthesis
Daniel Tihelka, Jiří Kala, Jindřich Matoušek

Accurate pitch marking for prosodic modification of speech segments
Thomas Ewender, Beat Pfister

A novel hybrid approach for Mandarin speech synthesis
Shifeng Pan, Meng Zhang, Jianhua Tao

Modeling liaison in French by using decision trees
Josafá de Jesus Aguiar Pontes, Sadaoki Furui

Improvement on plural unit selection and fusion
Jian Luan, Jian Li

Improving speech synthesis of machine translation output
Alok Parlikar, Alan W. Black, Stephan Vogel

Paraphrase generation to improve text-to-speech synthesis
Ghislain Putois, Jonathan Chevelu, Cédric Boidin

ASR: Search, Decoding and Confidence Measures I, II

Phone mismatch penalty matrices for two-stage keyword spotting via multi-pass phone recognizer
Chang Woo Han, Shin Jae Kang, Chul Min Lee, Nam Soo Kim

English spoken term detection in multilingual recordings
Petr Motlicek, Fabio Valente, Philip N. Garner

A hybrid approach to robust word lattice generation via acoustic-based word detection
Icksang Han, Chiyoun Park, Jeongmi Cho, Jeongsu Kim

Direct observation of pruning errors (DOPE): a search analysis tool
Volker Steinbiss, Martin Sundermeyer, Hermann Ney

Direct construction of compact context-dependency transducers from data
David Rybach, Michael Riley

Incremental composition of static decoding graphs with label pushing
Miroslav Novák

A novel path extension framework using steady segment detection for Mandarin speech recognition
Zhanlei Yang, Wenju Liu

On the relation of Bayes risk, word error, and word posteriors in ASR
Ralf Schlüter, Markus Nußbaum-Thom, Hermann Ney

Time conditioned search in automatic speech recognition reconsidered
D. Nolden, Hermann Ney, Ralf Schlüter

Efficient data selection for speech recognition based on prior confidence estimation using speech and context independent models
Satoshi Kobashikawa, Taichi Asami, Yoshikazu Yamaguchi, Hirokazu Masataki, Satoshi Takahashi

A novel confidence measure based on marginalization of jointly estimated error cause probabilities
Atsunori Ogawa, Atsushi Nakamura

CRF-based combination of contextual features to improve a posteriori word-level confidence measures
Julien Fayolle, Fabienne Moreau, Christian Raymond, Guillaume Gravier, Patrick Gros

Recognition of spontaneous conversational speech using long short-term memory phoneme predictions
Martin Wöllmer, Florian Eyben, Björn Schuller, Gerhard Rigoll

Improving ASR error detection with non-decoder based features
Thomas Pellegrini, Isabel Trancoso

Phoneme classification and lattice rescoring based on a k-NN approach
Ladan Golipour, Douglas O'Shaughnessy

Online adaptive learning for speech recognition decoding
Jeff Bilmes, Hui Lin

Improvements of search error risk minimization in Viterbi beam search for speech recognition
Takaaki Hori, Shinji Watanabe, Atsushi Nakamura

Special-Purpose Speech Applications

Evaluation of a silent speech interface based on magnetic sensing
Robin Hofe, Stephen R. Ell, Michael J. Fagan, James M. Gilbert, Phil D. Green, Roger K. Moore, Sergey I. Rybchenko

Advanced speech communication system for deaf people
Rubén San-Segundo, Verónica López, Raquel Martín, Syaheerah Lufti, Javier Ferreiros, Ricardo Córdoba, José Manuel Pardo

Unsupervised acoustic model adaptation for multi-origin non native ASR
Sethserey Sam, Eric Castelli, Laurent Besacier

Speech-based automated cognitive status assessment
Dilek Hakkani-Tür, Dimitra Vergyri, Gokhan Tur

Speech recognition with a seamlessly updated language model for real-time closed-captioning
Toru Imai, Shinichi Homma, Akio Kobayashi, Takahiro Oku, Shoei Sato

The comparison between the deletion-based methods and the mixing-based methods for audio CAPTCHA systems
Takuya Nishimoto, Takayuki Watanabe

Comparing mono- & multilingual acoustic seed models for a low e-resourced language: a case-study of Luxembourgish
Martine Adda-Decker, Lori Lamel, Natalie D. Snoeren

Manipulating tracheoesophageal speech
Rob J. J. H. van Son, Irene Jacobi, Frans Hilgers

Towards mixed language speech recognition systems
David Imseng, Hervé Bourlard, Mathew Magimai Doss

Voice search for development
Etienne Barnard, Johan Schalkwyk, Charl van Heerden, Pedro J. Moreno

Cross-cultural investigation of prosody in verbal feedback in interactional rapport
Gina-Anne Levow, Susan Duncan, Edward T. King

Multimodal speaker diarization using oriented optical flow histograms
Mary Tai Knox, Gerald Friedland

Towards an ASR-free objective analysis of pathological speech
Catherine Middag, Yvan Saeys, Jean-Pierre Martens

Speaker Characterization and Recognition I-IV

Simple and efficient speaker comparison using approximate KL divergence
William M. Campbell, Zahi N. Karam

The IIR NIST SRE 2008 and 2010 summed channel speaker recognition systems
Hanwu Sun, Bin Ma, Chien-Lin Huang, Trung Hieu Nguyen, Haizhou Li

Speaker characterization using long-term and temporal information
Chien-Lin Huang, Hanwu Sun, Bin Ma, Haizhou Li

Score-level compensation of extreme speech duration variability in speaker verification
Sergio Perez-Gomez, Daniel Ramos, Javier Gonzalez-Dominguez, Joaquin Gonzalez-Rodriguez

Speaker recognition experiments using connectionist transformation network features
Alberto Abad, Isabel Trancoso

Speaker recognition using supervised probabilistic principal component analysis
Yun Lei, John H. L. Hansen

Looking for relevant features for speaker role recognition
Benjamin Bigot, Julien Pinquier, Isabelle Ferrané, Régine André-Obrecht

Prosodic speaker verification using subspace multinomial models with intersession compensation
Marcel Kockmann, Lukáš Burget, Ondřej Glembek, Luciana Ferrer, Jan Černocký

The estimation and kernel metric of spectral correlation for text-independent speaker verification
Eryu Wang, Kong Aik Lee, Bin Ma, Haizhou Li, Wu Guo, Lirong Dai

Improving monaural speaker identification by double-talk detection
Rahim Saeidi, Pejman Mowlaee, Tomi Kinnunen, Zheng-Hua Tan, Mads Græsbøll Christensen, Søren Holdt Jensen, Pasi Fränti

Exploring subsegmental and suprasegmental features for a text-dependent speaker verification in distant speech signals
B. Avinash, S. Guruprasad, Bayya Yegnanarayana

A fast implementation of factor analysis for speaker verification
Qingsong Liu, Wei Huang, Dongxing Xu, Hongbin Cai, Beiqian Dai

An investigation into direct scoring methods without SVM training in speaker verification
Ce Zhang, Rong Zheng, Bo Xu

Large margin Gaussian mixture models for speaker identification
Reda Jourani, Khalid Daoudi, Régine André-Obrecht, Driss Aboutajdine

On the use of Gaussian component information in the generative likelihood ratio estimation for speaker verification
Rong Zheng, Bo Xu

Acoustic vector resampling for GMMSVM-based speaker verification
Man-Wai Mak, Wei Rao

A fast speaker indexing using vector quantization and second order statistics with adaptive threshold computation
Konstantin Biatov

Using phoneme recognition and text-dependent speaker verification to improve speaker segmentation for Chinese speech
Gang Wang, Xiaojun Wu, Thomas Fang Zheng

On enhancing feature sequence filtering with filter-bank energy transformation in speaker verification with telephone speech
Claudio Garretón, Néstor Becerra Yoma

MAP estimation of subspace transform for speaker recognition
Donglai Zhu, Bin Ma, Kong Aik Lee, Cheung-Chi Leung, Haizhou Li

A longest matching segment approach for text-independent speaker recognition
Ayeh Jafari, Ramji Srinivasan, Danny Crookes, Ji Ming

Approaching human listener accuracy with modern speaker verification
Ville Hautamäki, Tomi Kinnunen, Mohaddeseh Nosratighods, Kong Aik Lee, Bin Ma, Haizhou Li

Extended weighted linear prediction (XLP) analysis of speech and its application to speaker verification in adverse conditions
Jouni Pohjalainen, Rahim Saeidi, Tomi Kinnunen, Paavo Alku

The use of subvector quantization and discrete densities for fast GMM computation for speaker verification
Guoli Ye, Brian Mak

Transcript-dependent speaker recognition using Mixer 1 and 2
Fred S. Richardson, Joseph P. Campbell

On the potential of glottal signatures for speaker recognition
Thomas Drugman, Thierry Dutoit

Acoustic feature diversity and speaker verification
R. Padmanabhan, Hema A. Murthy

A discriminative performance metric for GMM-UBM speaker identification
Omid Dehzangi, Bin Ma, Eng Siong Chng, Haizhou Li

A novel speaker binary key derived from anchor models
Xavier Anguera, Jean-François Bonastre

Variant time-frequency cepstral features for speaker recognition
Wei-Qiang Zhang, Yan Deng, Liang He, Jia Liu

Exploitation of phase information for speaker recognition
Ning Wang, P. C. Ching, Tan Lee

Effects of the phonological relevance in speaker verification
Yanhua Long, Lirong Dai, Bin Ma, Wu Guo

Topological representation of speech for speaker recognition
Gabriel H. Sierra, Jean-François Bonastre, Driss Matrouf, Jose R. Calvo

Assessment of single-channel speech enhancement techniques for speaker identification under mismatched conditions
Seyed Omid Sadjadi, John H. L. Hansen

Speaker recognition using the resynthesized speech via spectrum modeling
Xiang Zhang, Chuan Cao, Lin Yang, Hongbin Suo, Jianping Zhang, Yonghong Yan

Speech Synthesis: HMM-Based Speech Synthesis I, II

Speaker and language adaptive training for HMM-based polyglot speech synthesis
Heiga Zen

Context adaptive training with factorized decision trees for HMM-based speech synthesis
Kai Yu, Heiga Zen, François Mairesse, Steve Young

Roles of the average voice in speaker-adaptive HMM-based speech synthesis
Junichi Yamagishi, Oliver Watts, Simon King, Bela Usabaev

An HMM trajectory tiling (HTT) approach to high quality TTS
Yao Qian, Zhi-Jie Yan, Yijian Wu, Frank K. Soong, Xin Zhuang, Shengyi Kong

A perceptual study of acceleration parameters in HMM-based TTS
Yi-Ning Chen, Zhi-Jie Yan, Frank K. Soong

Evaluation of prosodic contextual factors for HMM-based speech synthesis
Shuji Yokomizo, Takashi Nose, Takao Kobayashi

Sinusoidal model parameterization for HMM-based TTS system
Slava Shechtman, Alex Sorin

Improved training of excitation for HMM-based parametric speech synthesis
Yoshinori Shiga, Tomoki Toda, Shinsuke Sakai, Hisashi Kawai

Excitation modeling based on waveform interpolation for HMM-based speech synthesis
June Sig Sung, Doo Hwa Hong, Kyung Hwan Oh, Nam Soo Kim

Formant-based frequency warping for improving speaker adaptation in HMM TTS
Xin Zhuang, Yao Qian, Frank K. Soong, Yijian Wu, Bo Zhang

Improved modelling of speech dynamics using non-linear formant trajectories for HMM-based speech synthesis
Hongwei Hu, Martin J. Russell

Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis
Zhen-Hua Ling, Yu Hu, Lirong Dai

Autoregressive clustering for HMM speech synthesis
Matt Shannon, William Byrne

An implementation of decision tree-based context clustering on graphics processing units
Nicholas Pilkington, Heiga Zen

Quantized HMMs for low footprint text-to-speech synthesis
Alexander Gutkin, Xavi Gonzalvo, Stefan Breuer, Paul Taylor

The role of higher-level linguistic features in HMM-based speech synthesis
Oliver Watts, Junichi Yamagishi, Simon King

HMM-based singing voice synthesis system using pitch-shifted pseudo training data
Ayami Mase, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

An unsupervised approach to creating web audio contents-based HMM voices
Jinfu Ni, Hisashi Kawai

Conversational spontaneous speech synthesis using average voice model
Tomoki Koriyama, Takashi Nose, Takao Kobayashi


Rhythm and formant features for automatic alcohol detection
Florian Schiel, Christian Heinrich, Veronika Neumeyer

An exploration of voice source correlates of focus
Irena Yanushevskaya, Christer Gobl, John Kane, Ailbhe Ní Chasaide

Modeling perceived vocal age in American English
James D. Harnsberger, Rahul Shrivastav, W. S. Brown Jr.

Multivariate analysis of vocal fatigue in continuous reading
Marie-José Caraty, Claude Montacié

Frequency-domain delexicalization using surrogate vowels
Alexander Kain, Jan P. H. van Santen

Emotion recognition using imperfect speech recognition
Florian Metze, Anton Batliner, Florian Eyben, Tim Polzehl, Björn Schuller, Stefan Steidl

A novel feature extraction strategy for multi-stream robust emotion identification
Gang Liu, Yun Lei, John H. L. Hansen

Setup for acoustic-visual speech synthesis by concatenating bimodal units
Asterios Toutios, Utpala Musti, Slim Ouni, Vincent Colotte, Brigitte Wrobel-Dautcourt, Marie-Odile Berger

Towards affective state modeling in narrative and conversational settings
Bart Jochems, Martha Larson, Roeland Ordelman, Ronald Poppe, Khiet P. Truong

Detection of anger emotion in dialog speech using prosody feature and temporal relation of utterances
Narichika Nomoto, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi

Gesture and speech coordination: the influence of the relationship between manual gesture and speech
Benjamin Roustan, Marion Dohen

Analysis and detection of cognitive load and frustration in drivers' speech
Hynek Bořil, Seyed Omid Sadjadi, Tristan Kleinschmidt, John H. L. Hansen

Acoustic-based recognition of head gestures accompanying speech
Akira Sasou, Yasuharu Hashimoto, Katsuhiko Sakaue

Multimodal dialog in the car: combining speech and turn-and-push dial to control comfort functions
Sandro Castronovo, Angela Mahr, Margarita Pentcheva, Christian Müller

Hands free audio analysis from home entertainment
Danil Korchagin, Philip N. Garner, Petr Motlicek

Affective story teller: a TTS system for emotional expressivity
Mostafa Al Masum Shaikh, Antonio Rui Ferreira Rebordão, Keikichi Hirose

ASR: Speaker Adaptation, Robustness Against Reverberation

Enhancing children's speech recognition under mismatched condition by explicit acoustic normalization
Shweta Ghai, Rohit Sinha

Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems
Bo Li, Khe Chai Sim

Augmentation of adaptation data
Ravichander Vipperla, Steve Renals, Joe Frankel

Discriminative adaptation based on fast combination of DMAP and dfMLLR
Lukáš Machlica, Zbyněk Zajíc, Luděk Müller

Revisiting VTLN using linear transformation on conventional MFCC
Doddipatla Rama Sanand, Ralf Schlüter, Hermann Ney

Speaker adaptation based on nonlinear spectral transform for speech recognition
Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Speaker adaptation based on system combination using speaker-class models
Tetsuo Kosaka, Takashi Ito, Masaharu Kato, Masaki Kohda

Speaker adaptation in transformation space using two-dimensional PCA
Yongwon Jeong, Young Rok Song, Hyung Soon Kim

On speaker adaptive training of artificial neural networks
Jan Trmal, Jan Zelinka, Luděk Müller

Model synthesis for band-limited speech recognition
Yongjun He, Jiqing Han

Performance estimation of reverberant speech recognition based on reverberant criteria RSR-dn with acoustic parameters
Takahiro Fukumori, Masanori Morise, Takanobu Nishiura

A novel approach for matched reverberant training of HMMs using data pairs
Armin Sehr, Christian Hofmann, Roland Maas, Walter Kellermann

An auditory based modulation spectral feature for reverberant speech recognition
Hari Krishna Maganti, Marco Matassoni

On the potential of channel selection for recognition of reverberated speech with multiple microphones
Martin Wolf, Climent Nadeu

An improved wavelet-based dereverberation for robust automatic speech recognition
Randy Gomez, Tatsuya Kawahara

Methods for robust speech recognition in reverberant environments: a comparison
Rico Petrick, Thomas Fehér, Masashi Unoki, Rüdiger Hoffmann

Language Learning, TTS, and Other Applications

Integration of multilayer regression analysis with structure-based pronunciation assessment
Masayuki Suzuki, Yu Qiao, Nobuaki Minematsu, Keikichi Hirose

Using non-native error patterns to improve pronunciation verification
Joost van Doremalen, Catia Cucchiarini, Helmer Strik

Regularized-MLLR speaker adaptation for computer-assisted language learning system
Dean Luo, Yu Qiao, Nobuaki Minematsu, Yutaka Yamauchi, Keikichi Hirose

Automatic evaluation of English pronunciation by Japanese speakers using various acoustic features and pattern recognition techniques
Kuniaki Hirabayashi, Seiichi Nakagawa

Decision tree based tone modeling with corrective feedbacks for automatic Mandarin tone assessment
Hsien-Cheng Liao, Jiang-Chun Chen, Sen-Chia Chang, Ying-Hua Guan, Chin-Hui Lee

CASTLE: a computer-assisted stress teaching and learning environment for learners of English as a second language
Jingli Lu, Ruili Wang, Liyanage C. De Silva, Yang Gao, Jia Liu

Automatic reference independent evaluation of prosody quality using multiple knowledge fusions
Shen Huang, Hongyan Li, Shijin Wang, Jiaen Liang, Bo Xu

Landmark-based automated pronunciation error detection
Su-Youn Yoon, Mark Hasegawa-Johnson, Richard Sproat

HMM based TTS for mixed language text
Zhiwei Shuang, Shiyin Kang, Yong Qin, Lirong Dai, Lianhong Cai

An analysis of language mismatch in HMM state mapping-based cross-lingual speaker adaptation
Hui Liang, John Dines

Classroom note-taking system for hearing impaired students using automatic speech recognition adapted to lectures
Tatsuya Kawahara, Norihiro Katsumaru, Yuya Akita, Shinsuke Mori

Exploring web-browser based runtimes engines for creating ubiquitous speech interfaces
Paul R. Dixon, Sadaoki Furui

Pitch and Glottal-Waveform Estimation and Modeling I, II

Efficient three-stage pitch estimation for packet loss concealment
Xuejing Sun, Sameer Gadre

On evaluation of the F0 estimation based on time-varying complex speech analysis
Keiichi Funaki

Pitch estimation in noisy speech based on temporal accumulation of spectrum peaks
Feng Huang, Tan Lee

Multi-pitch estimation by a joint 2-d representation of pitch and pitch dynamics
Tianyu T. Wang, Thomas F. Quatieri

On the effect of fundamental frequency on amplitude and frequency modulation patterns in speech resonances
Pirros Tsiakoulis, Alexandros Potamianos

Pitch determination using autocorrelation function in spectral domain
M. Shahidur Rahman, Tetsuya Shimamura

Chirp complex cepstrum-based decomposition for asynchronous glottal analysis
Thomas Drugman, Thierry Dutoit

Exploiting glottal formant parameters for glottal inverse filtering and parameterization
Alan Ó Cinnéide, David Dorran, Mikel Gainza, Eugene Coyle

Glottal parameters estimation on speech using the zeros of the z-transform
Nicolas Sturmel, Christophe d'Alessandro, Boris Doval

Significance of pitch synchronous analysis for speaker recognition using AANN models
Sri Harish Reddy Mallidi, Kishore Prahallad, Suryakanth V. Gangashetty, Bayya Yegnanarayana

On using voice source measures in automatic gender classification of children's speech
Gang Chen, Xue Feng, Yen-Liang Shue, Abeer Alwan

SAFE: a statistical algorithm for F0 estimation for both clean and noisy speech
Wei Chu, Abeer Alwan

Robust and efficient pitch estimation using an iterative ARMA technique
Jung Ook Hong, Patrick J. Wolfe

Statistical modeling of F0 dynamics in singing voices based on Gaussian processes with multiple oscillation bases
Yasunori Ohishi, Hirokazu Kameoka, Daichi Mochihashi, Hidehisa Nagano, Kunio Kashino

Applying geometric source separation for improved pitch extraction in human-robot interaction
Martin Heckmann, Claudius Gläser, Frank Joublin, Kazuhiro Nakadai

A spectral LF model based approach to voice source parameterisation
John Kane, Mark Kane, Christer Gobl

Glottal-based analysis of the Lombard effect
Thomas Drugman, Thierry Dutoit

New Paradigms in ASR I, II

Mandarin digit recognition assisted by selective tone distinction
Xiao-Dong Wang, Kunihiko Owa, Makoto Shozakai

Brazilian Portuguese acoustic model training based on data borrowing from other language
Kazuhiko Abe, Sakriani Sakti, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura

Rapid bootstrapping of five Eastern European languages using the Rapid Language Adaptation Toolkit
Ngoc Thang Vu, Tim Schlippe, Franziska Kraus, Tanja Schultz

Cross-lingual speaker adaptation via Gaussian component mapping
Houwei Cao, Tan Lee, P. C. Ching

Cross-lingual acoustic modeling for dialectal Arabic speech recognition
Mohamed Elmahdy, Rainer Gruhn, Wolfgang Minker, Slim Abdennadher

Cross-lingual and multi-stream posterior features for low resource LVCSR systems
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky

Latent perceptual mapping: a new acoustic modeling framework for speech recognition
Shiva Sundaram, Jerome R. Bellegarda

Unsupervised model adaptation on targeted speech segments for LVCSR system combination
Richard Dufour, Fethi Bougares, Yannick Estève, Paul Deléglise

Incremental word learning using large-margin discriminative training and variance floor estimation
Irene Ayllón Clemente, Martin Heckmann, Alexander Denecke, Britta Wrede, Christian Goerick

State-based labelling for a sparse representation of speech and its application to robust speech recognition
Tuomas Virtanen, Jort F. Gemmeke, Antti Hurmalainen

Similarity scoring for recognizing repeated out-of-vocabulary words
Mirko Hannemann, Stefan Kombrink, Martin Karafiát, Lukáš Burget

Data pruning for template-based automatic speech recognition
Dino Seppi, Dirk Van Compernolle

Improved topic classification and keyword discovery using an HMM-based speech recognizer trained without supervision
Man-Hung Siu, Herbert Gish, Arthur Chan, William Belfield

An analysis of sparseness and regularization in exemplar-based methods for speech classification
Dimitri Kanevsky, Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo

Investigation of full-sequence training of deep belief networks for speech recognition
Abdel-rahman Mohamed, Dong Yu, L. Deng

Mandarin tone recognition using affine-invariant prosodic features and tone posteriorgram
Yow-Bang Wang, Lin-shan Lee

Continuous speech recognition with a TF-IDF acoustic model
Geoffrey Zweig, Patrick Nguyen, Jasha Droppo, Alex Acero

SCARF: a segmental conditional random field toolkit for speech recognition
Geoffrey Zweig, Patrick Nguyen

Speech Production: Various Approaches

Speaking style dependency of formant targets
Akiko Amano-Kusumoto, John-Paul Hosom, Alexander Kain

Similarity of effects of emotions on the speech organ configuration with and without speaking
Tatsuya Kitamura

A study of intra-speaker and inter-speaker affective variability using electroglottograph and inverse filtered glottal waveforms
Daniel Bone, Samuel Kim, Sungbok Lee, Shrikanth S. Narayanan

Modal analysis of vocal fold vibrations using laryngotopography
Ken-Ichi Sakakibara, Hiroshi Imagawa, Miwako Kimura, Hisayuki Yokonishi, Niro Tayama

Laryngeal voice quality in the expression of focus
Martti Vainio, Matti Airas, Juhani Järvikivi, Paavo Alku

Laryngeal characteristics during the production of geminate consonants
Masako Fujimoto, Kikuo Maekawa, Seiya Funatsu

Numerical study of turbulent flow-induced sound production in presence of a tooth-shaped obstacle: towards sibilant [s] physical modeling
Julien Cisonni, Kazunori Nozaki, Annemie Van Hirtum, Shigeo Wada

Morphological and predictability effects on schwa reduction: the case of Dutch word-initial syllables
Iris Hanique, Barbara Schuppler, Mirjam Ernestus

Acoustic-to-articulatory inversion based on local regression
Samer Al Moubayed, G. Ananthakrishnan

Korean lenis, fortis, and aspirated stops: effect of place of articulation on acoustic realization
Mirjam Broersma

Speech synthesis by modeling harmonics structure with multiple function
Toru Nakashika, Ryuki Tachibana, Masafumi Nishimura, Tetsuya Takiguchi, Yasuo Ariki

Physics of body-conducted silent speech - production, propagation and representation of non-audible murmur
Makoto Otani, Tatsuya Hirahara

Speech Enhancement

Multichannel noise reduction using low order RTF estimate
Subhojit Chakladar, Nam Soo Kim, Yu Gwang Jin, Tae Gyoon Kang

Reinforced blocking matrix with cross channel projection for speech enhancement
Inho Lee, Jongsung Yoon, Yoonjae Lee, Hanseok Ko

Masking property based microphone array post-filter design
Ning Cheng, Wenju Liu, Lan Wang

Reduction of broadband noise in speech signals by multilinear subspace analysis
Yusuke Sato, Tetsuya Hoya, Hovagim Bakardjian, Andrzej Cichocki

Novel probabilistic control of noise reduction for improved microphone array beamforming
Jungpyo Hong, Seungho Han, Sangbae Jeong, Minsoo Hahn

Speech enhancement using improved generalized sidelobe canceller in frequency domain with multi-channel postfiltering
Kai Li, Qiang Fu, Yonghong Yan

Close speaker cancellation for suppression of non-stationary background noise for hands-free speech interface
Jani Even, Carlos Ishi, Hiroshi Saruwatari, Norihiro Hagita

Multi-channel iterative dereverberation based on codebook constrained iterative multi-channel Wiener filter
Ajay Srinivasamurthy, Thippur V. Sreenivas

Speaker-dependent mapping of source and system features for enhancement of throat microphone speech
Anand Joseph Xavier Medabalimi, Sri Harish Reddy Mallidi, Bayya Yegnanarayana

An analytic modeling approach to enhancing throat microphone speech commands for keyword spotting
Jun Cai, Stefano Marini, Pierre Malarme, Francis Grenez, Jean Schoentgen

Single-channel speech enhancement using Kalman filtering in the modulation domain
Stephen So, Kamil K. Wójcicki, Kuldip K. Paliwal

Integrated feedback and noise reduction algorithm in digital hearing aids via oscillation detection
Miao Yao, Weiqian Liang

A blind signal-to-noise ratio estimator for high noise speech recordings
Charles Mercier, Roch Lefebvre

ASR: Feature Extraction I, II

Improved phoneme recognition by integrating evidence from spectro-temporal and cepstral features
Shang-wen Li, Liang-che Sun, Lin-shan Lee

Using spectro-temporal features to improve AFE feature extraction for ASR
Suman V. Ravuri, Nelson Morgan

Using harmonic phase information to improve ASR rate
Ibon Saratxaga, Inma Hernáez, Igor Odriozola, Eva Navas, Iker Luengo, Daniel Erro

Speech recognition using long-term phase information
Kazumasa Yamamoto, Eiichi Sueyoshi, Seiichi Nakagawa

Low-dimensional space transforms of posteriors in speech recognition
Jan Zelinka, Jan Trmal, Luděk Müller

Hierarchical bottle neck features for LVCSR
Christian Plahl, Ralf Schlüter, Hermann Ney

Hierarchical neural net architectures for feature extraction in ASR
František Grézl, Martin Karafiát

Mutual information analysis for feature and sensor subset selection in surface electromyography based speech recognition
Vivek Kumar Rangarajan Sridhar, Rohit Prasad, Prem Natarajan

Learning from human errors: prediction of phoneme confusions based on modified ASR training
Bernd T. Meyer, Birger Kollmeier

Hidden logistic linear regression for support vector machine based phone verification
Bo Li, Khe Chai Sim

Jointly optimized discriminative features for speech recognition
Tim Ng, Bing Zhang, Long Nguyen

Invariant integration features combined with speaker-adaptation methods
Florian Müller, Alfred Mertins

Multi resolution discriminative models for subvocalic speech recognition
Mark Raugas, Vivek Kumar Rangarajan Sridhar, Rohit Prasad, Prem Natarajan

A comparative large scale study of MLP features for Mandarin ASR
Fabio Valente, Mathew Magimai Doss, Christian Plahl, Suman V. Ravuri, Wen Wang

Recognizing cochlear implant-like spectrally reduced speech with HMM-based ASR: experiments with MFCCs and PLP coefficients
Cong-Thanh Do, Dominique Pastor, Gaël Le Lan, André Goalic

SLP Systems

Comparison of methods for topic classification in a speech-oriented guidance system
Rafael Torres, Shota Takeuchi, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano

Using dependency parsing and machine learning for factoid question answering on spoken documents
Pere R. Comas, Jordi Turmo, Lluís Màrquez

A spoken term detection framework for recovering out-of-vocabulary words using the web
Carolina Parada, Abhinav Sethy, Mark Dredze, Frederick Jelinek

Improved spoken term detection by discriminative training of acoustic models based on user relevance feedback
Hung-yi Lee, Chia-ping Chen, Ching-feng Yeh, Lin-shan Lee

A lightweight keyword and tag-cloud retrieval algorithm for automatic speech recognition transcripts
Sebastian Tschöpel, Daniel Schneider

Lecture subtopic retrieval by retrieval keyword expansion using subordinate concept
Noboru Kanedera, Tetsuo Funada, Seiichi Nakagawa

Spoken document retrieval for oral presentations integrating global document similarities into local document similarities
Hiroaki Nanjo, Yusuke Iyonaga, Takehiko Yoshimi

Combining word-based features, statistical language models, and parsing for named entity recognition
Joseph Polifroni, Stephanie Seneff

Efficient combined approach for named entity recognition in spoken language
Azeddine Zidouni, Sophie Rosset, Hervé Glotin

Prominence based scoring of speech segments for automatic speech-to-speech summarization
Sree Harsha Yella, Vasudeva Varma, Kishore Prahallad

Maximum lexical cohesion for fine-grained news story segmentation
Zihan Liu, Lei Xie, Wei Feng

Phoneme lattice based texttiling towards multilingual story segmentation
Xiaoxuan Wang, Lei Xie, Bin Ma, Eng Siong Chng, Haizhou Li


When is indexical information about speech activated? evidence from a cross-modal priming experiment
Benjamin Munson, Renata Solum

The influence of actual and perceived sexual orientation on diadochokinetic rate in women and men
Benjamin Munson

Laryngealization and features for Chinese tonal recognition
Kristine M. Yu

Production and perception of Vietnamese short vowels in V1V2 context
Viet Son Nguyen, Eric Castelli, René Carré

Measuring basic tempo across languages and some implications for speech rhythm
Gertraud Fenk-Oczlon, August Fenk

Durational structure of Japanese single/geminate stops in three- and four-mora words spoken at varied rates
Yukari Hirata, Shigeaki Amano

Distribution and trichotomic realization of voiced velars in Japanese - an experimental study
Shin-ichiro Sano, Tomohiko Ooigawa

Specification in context - devoicing processes in Polish, French, American English and German sonorants
Jagoda Sieczkowska, Bernd Möbius, Grzegorz Dogil

Phonetic imitation of Japanese vowel devoicing
Kuniko Nielsen

Post-aspiration in standard Italian: some first cross-regional acoustic evidence
Mary Stevens, John Hajek

Articulatory grounding of southern Salentino harmony processes
Mirko Grimaldi, Andrea Calabrese, Francesco Sigona, Luigina Garrapa, Bianca Sisinni

Effects of accent typicality and phonotactic frequency on nonword immediate serial recall performance in Japanese
Yuuki Tanida, Taiji Ueno, Satoru Saito, Matthew A. Lambon Ralph

How abstract is phonetics?
Osamu Fujimura

Speech Production: Vocal Tract Modeling and Imaging

Data-driven analysis of realtime vocal tract MRI using correlated image regions
Adam C. Lammert, Michael I. Proctor, Shrikanth S. Narayanan

Rapid semi-automatic segmentation of real-time magnetic resonance images for parametric vocal tract analysis
Michael I. Proctor, Daniel Bone, Athanasios Katsamanis, Shrikanth S. Narayanan

Improved real-time MRI of oral-velar coordination using a golden-ratio spiral view order
Yoon-Chul Kim, Shrikanth S. Narayanan, Krishna S. Nayak

Statistical multi-stream modeling of real-time MRI articulatory speech data
Erik Bresch, Athanasios Katsamanis, Louis Goldstein, Shrikanth S. Narayanan

Predicting unseen articulations from multi-speaker articulatory models
G. Ananthakrishnan, Pierre Badin, Julián Andrés Valdés Vargas, Olov Engwall

Estimating missing data sequences in x-ray microbeam recordings
Chao Qin, Miguel Á. Carreira-Perpiñán

Adaptation of a tongue shape model by local feature transformations
Chao Qin, Miguel Á. Carreira-Perpiñán, Mohsen Farhadloo

Vocal tract contour analysis of emotional speech by the functional data curve representation
Sungbok Lee, Shrikanth S. Narayanan

Locally-weighted regression for estimating the forward kinematics of a geometric vocal tract model
Adam C. Lammert, Louis Goldstein, Khalil Iskarous

Identifying articulatory goals from kinematic data using principal differential analysis
Michael Reimer, Frank Rudzicz

Estimation of speech lip features from discrete cosinus transform
Zuheng Ming, Denis Beautemps, Gang Feng, Sébastien Schmerber

Autoregressive modelling for linear prediction of ultrasonic speech
Farzaneh Ahmadi, Ian V. McLoughlin, Hamid R. Sharifzadeh

Prosody: Language-Specific Models

Influence of lexical tones on intonation in Kammu
Anastasia Karlsson, David House, Jan-Olof Svantesson, Damrong Tayanin

Phonetic realization of second occurrence focus in Japanese
Satoshi Nambu, Yong-cheol Lee

Prosodic grouping and relative clause disambiguation in Mandarin
Jianjing Kuang

Text-based unstressed syllable prediction in Mandarin
Ya Li, Jianhua Tao, Meng Zhang, Shifeng Pan, Xiaoying Xu

Flat pitch accents in Czech
Tomáš Duběda

Positional variability of pitch accents in Czech
Tomáš Duběda

Modeling of sentence-medial pauses in Bangla readout speech: occurrence and duration
Shyamal Das Mandal, Arup Saha, Tulika Basu, Keikichi Hirose, Hiroya Fujisaki

Declarative sentence intonation patterns in 8 Swiss German dialects
Adrian Leemann, Lucy Zuberbühler

Syllable-level prominence detection with acoustic evidence
Je Hun Jeon, Yang Liu

Prosody cues for classification of the discourse particle "hã" in Hindi
Sankalan Prasad, Kalika Bali

Interaction of syntax-marked focus and wh-question induced focus in standard Chinese
Yuan Jia, Aijun Li

Prominence detection in Swedish using syllable correlates
Samer Al Moubayed, Jonas Beskow

Automatic analysis of the intonation of a tone language. Applying the Momel algorithm to spontaneous standard Chinese (Beijing)
Na Zhi, Daniel Hirst, Pier Marco Bertinetto

Towards long-range prosodic attribute modeling for language recognition
Raymond W. M. Ng, Cheung-Chi Leung, Ville Hautamäki, Tan Lee, Bin Ma, Haizhou Li

A modified parameterization of the Fujisaki model
Robert Schubert, Oliver Jokisch, Diane Hirschfeld

ASR: Language Modeling and Speech Understanding I

Within and across sentence boundary language model
Saeedeh Momtazi, Friedrich Faubel, Dietrich Klakow

Impact of word classing on shrinkage-based language models
Ruhi Sarikaya, Stanley F. Chen, Abhinav Sethy, Bhuvana Ramabhadran

Combination of probabilistic and possibilistic language models
Stanislas Oger, Vladimir Popescu, Georges Linarès

On-demand language model interpolation for mobile speech input
Brandon Ballinger, Cyril Allauzen, Alexander Gruenstein, Johan Schalkwyk

Text normalization based on statistical machine translation and internet user support
Tim Schlippe, Chenfei Zhu, Jan Gebhardt, Tanja Schultz

Efficient estimation of maximum entropy language models with n-gram features: an SRILM extension
Tanel Alumäe, Mikko Kurimo

Similar n-gram language model
Christian Gillot, Christophe Cerisara, David Langlois, Jean-Paul Haton

Topic and style-adapted language modeling for Thai broadcast news ASR
Markpong Jongtaveesataporn, Sadaoki Furui

Augmented context features for Arabic speech recognition
Ahmad Emami, Hong-Kwang J. Kuo, Imed Zitouni, Lidia Mangu

A statistical segment-based approach for spoken language understanding
Lucía Ortega, Isabel Galiano, Lluís-F. Hurtado, Emilio Sanchis, Encarna Segarra

Improving back-off models with bag of words and hollow-grams
Benjamin Lecouteux, Raphaël Rubino, Georges Linarès

Study on interaction between entropy pruning and Kneser-Ney smoothing
Ciprian Chelba, Thorsten Brants, Will Neveitt, Peng Xu

Dynamic language model adaptation using keyword category classification
Hitoshi Yamamoto, Ken Hanazawa, Kiyokazu Miki, Koichi Shinoda

Integration of cache-based model and topic dependent class model with soft clustering and soft voting
Welly Naptali, Masatoshi Tsuchiya, Seiichi Nakagawa

Conditional models for detecting lambda-functions in a spoken language understanding system
Fréderic Duvert, Renato De Mori

Novel weighting scheme for unsupervised language model adaptation using latent Dirichlet allocation
Md. Akmal Haidar, Douglas O'Shaughnessy

Automatic speech recognition system channel modeling
Qun Feng Tan, Kartik Audhkhasi, Panayiotis G. Georgiou, Emil Ettelaie, Shrikanth S. Narayanan

Round-robin discrimination model for reranking ASR hypotheses
Takanobu Oba, Takaaki Hori, Atsushi Nakamura

On-the-fly lattice rescoring for real-time automatic speech recognition
Haşim Sak, Murat Saraçlar, Tunga Güngör

First and Second Language Acquisition

Cantonese tone word learning by tone and non-tone language speakers
Angela Cooper, Yue Wang

Validation of a training method for L2 continuous-speech segmentation
Anne Cutler, Janise Shanley

Linguistic rhythm in foreign accent
Jiahong Yuan

The effect of a word embedded in a sentence and speaking rate variation on the perceptual training of geminate and singleton consonant distinction
Mee Sonu, Keiichi Tajima, Hiroaki Kato, Yoshinori Sagisaka

Foreign accent matters most when timing is wrong
Chiharu Tsurutani

Effects of Korean learners' consonant cluster reduction strategies on English speech recognition performance
Hyejin Hong, Jina Kim, Minhwa Chung

The effects of EMA-based augmented visual feedback on the English speakers' acquisition of the Japanese flap: a perceptual study
June S. Levitt, William F. Katz

Perception of voiceless fricatives by Japanese listeners of advanced and intermediate level English proficiency
Hinako Masuda, Takayuki Arai

Perception of Estonian vowel categories by native and non-native speakers
Lya Meister, Einar Meister

Spoken English assessment system for non-native speakers using acoustic and prosodic features
Qin Shi, Kun Li, ShiLei Zhang, Stephen M. Chu, Ji Xiao, ZhiJian Ou

Russian infants and children's sounds and speech corpuses for language acquisition studies
Elena E. Lyakso, Olga V. Frolova, Anna V. Kurazhova, Julia S. Gaikova

Language-specific influence on phoneme development: French and Drehu data
Julia Monnin, Hélène Lœvenbruck

Did you say susi or shushi? measuring the emergence of robust fricative contrasts in English- and Japanese-acquiring children
Jeffrey J. Holliday, Mary E. Beckman, Chanelle Mays

Spoken Language Resources, Systems and Evaluation I, II

An empirical comparison of the T3, Juicer, HDecode and Sphinx3 decoders
Josef R. Novak, Paul R. Dixon, Sadaoki Furui

Tracter: a lightweight dataflow framework
Philip N. Garner, John Dines

Verifying pronunciation dictionaries using conflict analysis
Marelie H. Davel, Febe de Wet

Automatic estimation of transcription accuracy and difficulty
Brandon C. Roy, Soroush Vosoughi, Deb Roy

Creating a linguistic plausibility dataset with non-expert annotators
Benjamin Lambert, Rita Singh, Bhiksha Raj

Construction and evaluations of an annotated Chinese conversational corpus in travel domain for the language model of speech recognition
Xinhui Hu, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura

Building transcribed speech corpora quickly and cheaply for many languages
Thad Hughes, Kaisuke Nakajima, Linne Ha, Atul Vasu, Pedro J. Moreno, Mike LeBeau

The CHiME corpus: a resource and a challenge for computational hearing in multisource environments
Heidi Christensen, Jon Barker, Ning Ma, Phil D. Green

Developing a Chinese L2 speech database of Japanese learners with narrow-phonetic labels for computer assisted pronunciation training
Wen Cao, Dongning Wang, Jinsong Zhang, Ziyu Xiong

How children acquire situation understanding skills?: a developmental analysis utilizing multimodal speech behavior corpus
Shogo Ishikawa, Shinya Kiriyama, Yoichi Takebayashi, Shigeyoshi Kitazawa

The influence of expertise and efficiency on modality selection strategies and perceived mental effort
Ina Wechsung, Stefan Schaffer, Robert Schleicher, Anja Naumann, Sebastian Möller

Parameters describing multimodal interaction - definitions and three usage scenarios
Christine Kühnel, Benjamin Weiss, Sebastian Möller

Repair strategies on trial: which error recovery do users like best?
Alexander Zgorzelski, Alexander Schmitt, Tobias Heinroth, Wolfgang Minker

Say what? why users choose to speak their web queries
Maryam Kamvar, Doug Beeferman

The effect of audience familiarity on the perception of modified accent
Jonathan Teutenberg, Catherine I. Watson

On generating Combilex pronunciations via morphological analysis
Korin Richmond, Robert A. J. Clark, Sue Fitt

Say it as you mean it - analyzing free user comments in the VOICE awards corpus
Florian Gödde, Sebastian Möller

A new multichannel multi modal dyadic interaction database
Viktor Rozgić, Bo Xiao, Athanasios Katsamanis, Brian R. Baucom, Panayiotis G. Georgiou, Shrikanth S. Narayanan

SEAME: a Mandarin-English code-switching speech corpus in South-East Asia
Dau-Cheng Lyu, Tien-Ping Tan, Eng Siong Chng, Haizhou Li

Robust ASR Against Noise

Robust word recognition using articulatory trajectories and gestures
Vikramjit Mitra, Hosung Nam, Carol Espy-Wilson, Elliot Saltzman, Louis Goldstein

Performance estimation of noisy speech recognition considering recognition task complexity
Takeshi Yamada, Tomohiro Nakajima, Nobuhiko Kitawaki, Shoji Makino

Estimating noise from noisy speech features with a Monte Carlo variant of the expectation maximization algorithm
Friedrich Faubel, Dietrich Klakow

Template-based spectral estimation using microphone array for speech recognition
Satoshi Tamura, Eriko Hishikawa, Wataru Taguchi, Satoru Hayamizu

A particle filter feature compensation approach to robust speech recognition
Aleem Mushtaq, Yu Tsao, Chin-Hui Lee

Nonlinear enhancement of onset for robust speech recognition
Chanwoo Kim, Richard M. Stern

Mask estimation in non-stationary noise environments for missing feature based robust speech recognition
Shirin Badiezadegan, Richard C. Rose

Robust automatic speech recognition with decoder oriented ideal binary mask estimation
Lae-Hoon Kim, Kyung-Tae Kim, Mark Hasegawa-Johnson

A robust speech recognition system against the ego noise of a robot
Gökhan Ince, Kazuhiro Nakadai, Tobias Rodemann, Hiroshi Tsujino, Jun-ichi Imura

Empirical mode decomposition for noise-robust automatic speech recognition
Kuo-Hao Wu, Chia-Ping Chen

An effective feature compensation scheme tightly matched with speech recognizer employing SVM-based GMM generation
Wooil Kim, Jun-Won Suh, John H. L. Hansen

Artificial and online acquired noise dictionaries for noise robust ASR
Jort F. Gemmeke, Tuomas Virtanen

Voice activity detection based on conditional random fields using multiple features
Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

A comparative study of noise estimation algorithms for VTS-based robust speech recognition
Yong Zhao, Biing-Hwang Juang

On using missing-feature theory with cepstral features - approximations to the multivariate integral
Frank Seide, Pei Zhao

Using a DBN to integrate sparse classification and GMM-based ASR
Yang Sun, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves

Voice Conversion and Speech Synthesis

Shape-invariant speech transformation with the phase vocoder
Axel Röbel

A phonetic alternative to cross-language voice conversion in a text-dependent context: evaluation of speaker identity
Kayoko Yanagisawa, Mark Huckvale

Evaluation of speaker mimic technology for personalizing SGD voices
Esther Klabbers, Alexander Kain, Jan P. H. van Santen

Adaptive voice-quality control based on one-to-many eigenvoice conversion
Kumi Ohta, Tomoki Toda, Yamato Ohtani, Hiroshi Saruwatari, Kiyohiro Shikano

Applying voice conversion to concatenative singing-voice synthesis
Fernando Villavicencio, Jordi Bonada

Improved generation of fundamental frequency in HMM-based speech synthesis using generation process model
Miaomiao Wang, Miaomiao Wen, Keikichi Hirose, Nobuaki Minematsu

A hierarchical F0 modeling method for HMM-based speech synthesis
Ming Lei, Yijian Wu, Frank K. Soong, Zhen-Hua Ling, Lirong Dai

Training a parametric-based logF0 model with the minimum generation error criterion
Javier Latorre, Mark J. F. Gales, Heiga Zen

Improving Mandarin segmental duration prediction with automatically extracted syntax features
Miaomiao Wen, Miaomiao Wang, Keikichi Hirose, Nobuaki Minematsu

An intonation model for TTS in Sepedi
Daniel R. van Niekerk, Etienne Barnard

Synthesis of fast speech with interpolation of adapted HSMMs and its evaluation by blind and sighted listeners
Michael Pucher, Dietmar Schabus, Junichi Yamagishi

A comparison of pronunciation modeling approaches for HMM-TTS
Gabriel Webster, Sacha Krstulović, Kate Knill

HMM-based text-to-articulatory-movement prediction and analysis of critical articulators
Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi

Detection, Classification, and Segmentation

Audio-based sports highlight detection by Fourier local auto-correlations
Jiaxing Ye, Takumi Kobayashi, Tetsuya Higuchi

Automatic excitement-level detection for sports highlights generation
Hynek Bořil, Abhijeet Sangwan, Taufiq Hasan, John H. L. Hansen

Detecting novel objects in acoustic scenes through classifier incongruence
Jörg-Hendrik Bach, Jörn Anemüller

A multidomain approach for automatic home environmental sound classification
Stavros Ntalampiras, Ilyas Potamitis, Nikos Fakotakis

Content-based advertisement detection
Patrick Cardinal, Vishwa Gupta, Gilles Boulianne

Identification of abnormal audio events based on probabilistic novelty detection
Stavros Ntalampiras, Ilyas Potamitis, Nikos Fakotakis

Lightly supervised recognition for automatic alignment of large coherent speech recordings
Norbert Braunschweiler, Mark J. F. Gales, Sabine Buchholz

Incremental diarization of telephone conversations
Oshry Ben-Harush, Itshak Lapidot, Hugo Guterman

Audio analytics by template modeling and 1-pass DP based decoding
Srikanth Cherla, V. Ramasubramanian

Perceptual wavelet decomposition for speech segmentation
Mariusz Ziółko, Jakub Gałka, Bartosz Ziółko, Tomasz Drwięga

A comparative study of constrained and unconstrained approaches for segmentation of speech signal
Venkatesh Keri, Kishore Prahallad

Automatic discriminative measurement of voice onset time
Morgan Sonderegger, Joseph Keshet

Selective gammatone filterbank feature for robust sound event recognition
Yi Ren Leng, Huy Dat Tran, Norihide Kitaoka, Haizhou Li

Speech Coding, Modeling, and Transmission

Modelling speech line spectral frequencies with Dirichlet mixture models
Zhanyu Ma, Arne Leijon

PDF-optimized LSF vector quantization based on beta mixture models
Zhanyu Ma, Arne Leijon

Non-linear predictive vector quantization of feature vectors for distributed speech recognition
Jose Enrique Garcia, Alfonso Ortega, Antonio Miguel, Eduardo Lleida

Superwideband extension of G.718 and G.729.1 speech codecs
Lasse Laaksonen, Mikko Tammi, Vladimir Malenovsky, Tommy Vaillancourt, Mi Suk Lee, Tomofumi Yamanashi, Masahiro Oshikiri, Claude Lamblin, Balazs Kovesi, Lei Miao, Deming Zhang, Jon Gibbs, Holly Francois

A multipulse FEC scheme based on amplitude estimation for CELP codecs over packet networks
José L. Carmona, Angel M. Gómez, Antonio M. Peinado, José L. Pérez-Córdoba, José A. González

Voice quality evaluation of recent open source codecs
Anssi Rämö, Henri Toukomaa

Efficient HMM-based estimation of missing features, with applications to packet loss concealment
Bengt J. Borgström, Per H. Borgström, Abeer Alwan

Speech inventory based discriminative training for joint speech enhancement and low-rate speech coding
Xiaoqiang Xiao, Robert M. Nickel

Quality-based playout buffering with FEC for conversational VoIP
Qipeng Gong, Peter Kabal

Sub-band basis spectrum model for pitch-synchronous log-spectrum and phase based on approximation of sparse coding
Masatsune Tamura, Takehiko Kagoshima, Masami Akamine

A multimodal density function estimation approach to formant tracking
Sundar Harshavardhan, Chandra Sekhar Seelamantula, Thippur V. Sreenivas

Estimation studies of vocal tract shape trajectory using a variable length and lossy Kelly-Lochbaum model
Heikki Rasilo, Unto K. Laine, Okko Johannes Räsänen

Speech Perception: Processing and Intelligibility

A feature extraction method for automatic speech recognition based on the cochlear nucleus
Serajul Haque, Roberto Togneri

A phoneme recognition framework based on auditory spectro-temporal receptive fields
Samuel Thomas, Kailash Patil, Sriram Ganapathy, Nima Mesgarani, Hynek Hermansky

Perceptual compensation for effects of reverberation in speech identification: a computer model based on auditory efferent processing
Amy V. Beeston, Guy J. Brown

Predicting human perception and ASR classification of word-final [t] by its acoustic sub-segmental properties
Barbara Schuppler, Mirjam Ernestus, Wim van Dommelen, Jacques Koreman

A speech-in-noise test based on spoken digits: comparison of normal and impaired listeners using a computer model
Matthew Robertson, Guy J. Brown, Wendy Lecluyse, Manasa Panda, Christine M. Tan

Evaluation of bone-conducted ultrasonic hearing-aid regarding transmission of paralinguistic information: a comparison with cochlear implant simulator
Takayuki Kagomiya, Seiji Nakagawa

Challenging the speech intelligibility index: macroscopic vs. microscopic prediction of sentence recognition in normal and hearing-impaired listeners
Tim Jürgens, Stefan Fredelake, Ralf M. Meyer, Birger Kollmeier, Thomas Brand

Does sentence complexity interfere with intelligibility in noise? evaluation of the Oldenburg linguistically and audiologically controlled sentence test (OLACS)
Verena N. Uslar, Thomas Brand, Mirko Hanke, Rebecca Carroll, Esther Ruigendijk, Cornelia Hamann, Birger Kollmeier

Intelligibility predictions for speech against fluctuating masker
Juan-Pablo Ramirez, Hamed Ketabdar, Alexander Raake

An effect of formant amplitude in vowel perception
Masashi Ito, Keiji Ohara, Akinori Ito, Masafumi Yano

Functional imaging of brain regions sensitive to communication sounds in primates
Christopher I. Petkov, Benjamin Wilson

Spoken Language Understanding and Spoken Language Translation I, II

Strategies for statistical spoken language understanding with small amount of data - an empirical study
Ye-Yi Wang

Investigating multiple approaches for SLU portability to a new language
Bassam Jabaian, Laurent Besacier, Fabrice Lefèvre

Learning naturally spoken commands for a robot
Anja Austermann, Seiji Yamada, Kotaro Funakoshi, Mikio Nakano

A semi-supervised cluster-and-label approach for utterance classification
Amparo Albalate, Aparna Suchindranath, David Suendermann, Wolfgang Minker

Classifying dialog acts in human-human and human-machine spoken conversations
Silvia Quarteroni, Giuseppe Riccardi

Exploring speaker characteristics for meeting summarization
Fei Liu, Yang Liu

Semi-supervised extractive speech summarization via co-training algorithm
Shasha Xie, Hui Lin, Yang Liu

Extractive summarization using a latent variable model
Asli Celikyilmaz, Dilek Hakkani-Tür

Hierarchical classification for speech-to-speech translation
Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth S. Narayanan

Rapid development of speech translation using consecutive interpretation
Matthias Paulik, Alex Waibel

Combining many alignments for speech to speech translation
Sameer R. Maskey, Steven J. Rennie, Bowen Zhou

Online SLU model adaptation with a partial oracle
Pierre Gotab, Geraldine Damnati, Frederic Bechet, Lionel Delphin-Poulat

Role of language models in spoken fluency evaluation
Om D. Deshmukh, Harish Doddala, Ashish Verma, Karthik Visweswariah

Social role discovery from spoken language using dynamic Bayesian networks
Sibel Yaman, Dilek Hakkani-Tür, Gokhan Tur

Domain adaptation and compensation for emotion detection
Michelle Hewlett Sanchez, Gokhan Tur, Luciana Ferrer, Dilek Hakkani-Tür

Phrase alignment confidence for statistical machine translation
Sankaranarayanan Ananthakrishnan, Rohit Prasad, Prem Natarajan

Named-entity projection and data-driven morphological decomposition for field maintainable speech-to-speech translation systems
Ian R. Lane, Alex Waibel

Speaker and Language Recognition

Improved n-gram phonotactic models for language recognition
Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel

A study of term weighting in phonotactic approach to spoken language recognition
Sirinoot Boonsuk, Donglai Zhu, Bin Ma, Atiwong Suchato, Proadpran Punyabukkana, Nattanun Thatphithakkul, Chai Wutiwiwatchai

Exploiting context-dependency and acoustic resolution of universal speech attribute models in spoken language recognition
Sabato Marco Siniscalchi, Jeremy Reed, Torbjørn Svendsen, Chin-Hui Lee

Hierarchical multilayer perceptron based language identification
David Imseng, Mathew Magimai Doss, Hervé Bourlard

The NIST 2010 speaker recognition evaluation
Alvin F. Martin, Craig S. Greenberg

Bayesian speaker recognition using Gaussian mixture model and Laplace approximation
Shih-Sian Cheng, I-Fan Chen, Hsin-Min Wang

What else is new than the Hamming window? robust MFCCs for speaker recognition via multitapering
Tomi Kinnunen, Rahim Saeidi, Johan Sandberg, Maria Hansson-Sandsten

Fast computation of speaker characterization vector using MLLR and sufficient statistics in anchor model framework
Achintya Kumar Sarkar, S. Umesh

Graph-embedding for speaker recognition
Zahi N. Karam, William M. Campbell

A hybrid modeling strategy for GMM-SVM speaker recognition with adaptive relevance factor
Chang Huai You, Haizhou Li, Kong Aik Lee

Robust mixture modeling using t-distribution: application to speaker ID
Sundar Harshavardhan, Thippur V. Sreenivas

A variable frame length and rate algorithm based on the spectral kurtosis measure for speaker verification
Chi-Sang Jung, Kyu J. Han, Hyunson Seo, Shrikanth S. Narayanan, Hong-Goo Kang

INTERSPEECH 2010 Paralinguistic Challenge (Special Session)

The INTERSPEECH 2010 paralinguistic challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Felix Burkhardt, Laurence Devillers, Christian Müller, Shrikanth S. Narayanan

Age and gender classification from speech using decision level fusion and ensemble based techniques
Florian Lingenfelser, Johannes Wagner, Thurid Vogt, Jonghwa Kim, Elisabeth André

Level of interest sensing in spoken dialog using multi-level fusion of acoustic and lexical evidence
Je Hun Jeon, Rui Xia, Yang Liu

Fuzzy support vector machines for age and gender classification
Phuoc Nguyen, Trung Le, Dat Tran, Xu Huang, Dharmendra Sharma

Gender and affect recognition based on GMM and GMM-UBM modeling with relevance MAP estimation
Rok Gajšek, Janez Žibert, Tadej Justin, Vitomir Štruc, Boštjan Vesnicer, France Mihelič

Age recognition based on speech signals using weights supervector
Royi Porat, Dan Lange, Yaniv Zigel

Age and gender classification using fusion of acoustic and prosodic features
Hugo Meinedo, Isabel Trancoso

Brno University of Technology system for INTERSPEECH 2010 paralinguistic challenge
Marcel Kockmann, Lukáš Burget, Jan Černocký

Combining five acoustic level modeling methods for automatic speaker age and gender recognition
Ming Li, Chi-Sang Jung, Kyu J. Han

Age and gender recognition based on multiple systems - early vs. late fusion
Tobias Bocklet, Georg Stemmer, Viktor Zeissler, Elmar Nöth

Automatic speaker age and gender recognition in the car for tailoring dialog and mobile services
Michael Feld, Felix Burkhardt, Christian Müller

Voice Activity and Turn Detection

Toward detecting voice activity employing soft decision in second-order conditional MAP
Sang-Kyun Kim, Jae-Hun Choi, Sang-Ick Kang, Ji-Hyun Song, Joon-Hyuk Chang

Voice activity detection in a regularized reproducing kernel Hilbert space
Xugang Lu, Masashi Unoki, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura

A new VAD framework using statistical model and human knowledge based empirical rule
Ji Wu, Xiao-lei Zhang, Wei Li

Adaptive high accuracy approaches to speech activity detection in noisy and hostile audio environments
Mark Huggins, Brett Smolenski, Aaron Lawson

Robust voice activity detection in stereo recording with crosstalk
Prasanta Kumar Ghosh, Andreas Tsiartas, Panayiotis G. Georgiou, Shrikanth S. Narayanan

Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization
Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani

Spectral entropy-based voice activity detector for videoconferencing systems
Bowon Lee, Debargha Mukherjee

The QUT-NOISE-TIMIT corpus for the evaluation of voice activity detection algorithms
David Dean, Sridha Sridharan, Robert Vogt, Michael Mason

A Bayesian approach to voice activity detection using multiple statistical models and discriminative training
Tao Yu, John H. L. Hansen

Noise robust voice activity detection using features extracted from the time-domain autocorrelation function
Houman Ghaemmaghami, Brendan Baker, Robert Vogt, Sridha Sridharan

VAD-measure-embedded decoder with online model adaptation
Tasuku Oonishi, Koji Iwano, Sadaoki Furui

Robust statistical voice activity detection using a likelihood ratio sign test
Shiwen Deng, Jiqing Han

Automatic turn segmentation in spoken conversations
Alexei V. Ivanov, Giuseppe Riccardi

Turn taking-based conversation detection by using DOA estimation
Yohei Kawaguchi, Masahito Togami, Yasunari Obuchi



Special Session: Models of Speech - In Search of Better Representations

ASR: Acoustic Models I-III

Spoken Dialogue Systems I, II

Spoken Dialogue Systems II

Speech Perception: Factors Influencing Perception

Prosody: Models

Speech Synthesis: Unit Selection and Others

ASR: Search, Decoding and Confidence Measures I, II

Special-Purpose Speech Applications

Speech Analysis

Systems for LVCSR

Speaker Characterization and Recognition I-IV

Source Separation

Speech Synthesis: HMM-Based Speech Synthesis I, II

Multi-Modal Signal Processing


ASR: Speaker Adaptation, Robustness Against Reverberation

Language Learning, TTS, and Other Applications

Pitch and Glottal-Waveform Estimation and Modeling I, II

Open Vocabulary Spoken Document Retrieval (Special Session)

Robust ASR

Language and Dialect Identification

Technologies for Learning and Education

Emotional Speech

New Paradigms in ASR I, II

Speech Production: Various Approaches

Speech Enhancement

Fact and Replica of Speech Production (Special Session)

ASR: Language Modeling

Single-Channel Speech Enhancement

Speech Synthesis: Miscellaneous Topics

Prosody: Basics & Applications

ASR: Feature Extraction I, II

Speech Perception: Cross Language and Age

SLP Systems

Quality of Experiencing Speech Services (Special Session)

Language Processing

Speech and Audio Segmentation

Prosody: Analysis

Systems for LVCSR and Rich Transcription


Speech Production: Vocal Tract Modeling and Imaging

Speech Intelligibility Enhancement for All Ages, Health Conditions and Environments (Special Session)

ASR: Acoustic Model Adaptation

SLP Systems for Information Extraction/Retrieval

Speech Representation

Voice Conversion

Prosody: Language-Specific Models

ASR: Language Modeling and Speech Understanding I

First and Second Language Acquisition

Spoken Language Resources, Systems and Evaluation I, II

Speech Production: Analysis

Paralanguage & Cognition

Robust ASR Against Noise

Voice Conversion and Speech Synthesis

Detection, Classification, and Segmentation

Compressive Sensing for Speech and Language Processing (Special Session)

ASR: Lexical and Pronunciation Modeling

Speaker Recognition and Diarization

Speech and Audio Classification

Emotion Recognition

Speech Coding, Modeling, and Transmission

Speech Perception: Processing and Intelligibility

Spoken Language Understanding and Spoken Language Translation I, II

Social Signals in Speech (Special Session)

Physiology and Pathology of Spoken Language

Speaker Diarization

Multi-Modal ASR, Including Audio-Visual ASR

Speaker and Language Recognition

Source Localization and Separation

INTERSPEECH 2010 Paralinguistic Challenge (Special Session)

Signal Processing for Music and Song

Modeling First Language Acquisition

Discourse and Dialogue

Voice Activity and Turn Detection