ISCA Archive Interspeech 2007 Sessions Booklet

Interspeech 2007

Antwerp, Belgium
27-31 August 2007

General Chairs: Dirk Van Compernolle, Lou Boves
doi: 10.21437/Interspeech.2007


Speech Production I, II


An articulatory and acoustic study of "retroflex" and "bunched" American English rhotic sound based on MRI
Xinhui Zhou, Carol Y. Espy-Wilson, Mark Tiede, Suzanne Boyce

An MRI study of European Portuguese nasals
Paula Martins, Inês Carbone, Augusto Silva, António J. S. Teixeira

A four-cube FEM model of the extrinsic and intrinsic tongue muscles to simulate the production of vowel /i/
Sayoko Takano, Hiroki Matsuzaki, Kunitoshi Motoki

Performance evaluation of glottal quality measures from the perspective of vocal tract filter consistency
Juan Torres, Elliot Moore

Statistical identification of critical, dependent and redundant articulators
Veena D. Singampalli, Philip J. B. Jackson

An empirical investigation of the nonuniqueness in the acoustic-to-articulatory mapping
Chao Qin, Miguel Á. Carreira-Perpiñán

Vocal tract length during speech production
Sorin Dusan

Approximation method of subglottal system using ARMA filter
Nobuhiro Miki, Kyohei Hayashi

Enhancing acoustic-to-EPG mapping with lip position information
Asterios Toutios, Konstantinos Margaritis

A model of glottal flow incorporating viscous-inviscid interaction
Tokihiko Kaburagi, Yosuke Tanabe

Thinking outside the cube: modeling language processing tasks in a multiple resource paradigm
Kilian G. Seeber

Experimental validation of direct and inverse glottal flow models for unsteady flow conditions
Julien Cisonni, Annemie Van Hirtum, Jan Willems, Xavier Pelorson

Effect of unsteady glottal flow on the speech production process
Hideyuki Nomura, Tetsuo Funada

Word stress correlates in spontaneous child-directed speech in German
Katrin Schneider, Bernd Möbius

Acquisition and synchronization of multimodal articulatory data
Michael Aron, Nicolas Ferveur, Erwan Kerrien, Marie-Odile Berger, Yves Laprie

A phonetic concatenative approach of labial coarticulation
Vincent Robert, Yves Laprie, Anne Bonneau

Visual analysis of lip coarticulation in VCV utterances
Aseel Turkmani, Adrian Hilton, Philip J. B. Jackson, James Edge

Comparison of multiple voice source parameters in different phonation types
Matti Airas, Paavo Alku

Acoustic and affective comparisons of natural and imaginary infant-, foreigner- and adult-directed speech
Monja Knoll, Lisa Scharrer

Vowel production in two occlusal classes
André Araújo, Luis M. T. Jesus, Isabel M. Costa

Nepalese retroflex stops: a static palatography study of inter- and intra-speaker variability
Rajesh Khatiwada

Effects of testosterone levels on temporal and intonational aspects of speech: more exploratory data
Charles A. Lamoureux, Victor J. Boucher


Phonetic Segmentation and Classification I, II


Fixed-size kernel logistic regression for phoneme classification
Peter Karsmakers, Kristiaan Pelckmans, Johan Suykens, Hugo Van hamme

A multiple-model based framework for automatic speech segmentation
Seung Seop Park, Jong Won Shin, Jong Kyu Kim, Nam Soo Kim

Semi-supervised learning of speech sounds
Aren Jansen, Partha Niyogi

Evaluation of syllable stress using single class classifier
Abhinav Parate, Ashish Verma, Jayanta Basak

Distinctive phonetic feature (DPF) based phone segmentation using hybrid neural networks
Mohammad Nurul Huda, Ghulam Muhammad, Junsei Horikawa, Tsuneo Nitta

A methodology for the automatic detection of perceived prominent syllables in spoken French
J.-Ph. Goldman, M. Avanzi, A.-C. Simon, Anne Lacheret, A. Auchlin

Dual-channel acoustic detection of nasalization states
Xiaochuan Niu, Jan P. H. van Santen

Acoustic parameters for the automatic detection of vowel nasalization
Tarun Pruthi, Carol Y. Espy-Wilson

On the use of time-delay neural networks for highly accurate classification of stop consonants
Jun Hou, Lawrence R. Rabiner, Sorin Dusan

A new approach for phoneme segmentation of speech signals
Ladan Golipour, Douglas O'Shaughnessy

Automatically learning the units of speech by non-negative matrix factorisation
Veronique Stouten, Kris Demuynck, Hugo Van hamme

A saliency-based auditory attention model with applications to unsupervised prominent syllable detection in speech
Ozlem Kalinli, Shrikanth S. Narayanan

Zero-crossing-based ratio masking for sound segregation
Sung Jun An, Young-Ik Kim, Rhee Man Kil

Event detection of speech signals based on auditory processing with a dynamic compressive gammachirp filterbank
Satomi Tanaka, Minoru Tsuzaki, Hiroaki Kato, Yoshinori Sagisaka

Segmentation of speech: child's play?
Odette Scharenborg, Mirjam Ernestus, Vincent Wan

Dimensionality reduction methods applied to both magnitude and phase derived features
Andrew Errity, John McKenna, Barry Kirkpatrick


Spoken Dialog Systems I, II


Utilizing online content as domain knowledge in a multi-domain dynamic dialogue system
Craig Wootton, Michael McTear, Terry Anderson

Handling speech input in the Ritel QA dialogue system
Boris van Schooten, Sophie Rosset, Olivier Galibert, Aurélien Max, Rieks op den Akker, Gabriel Illouz

Online call quality monitoring for automating agent-based call centers
Woosung Kim

Analysis of communication failures for spoken dialogue systems
Sebastian Möller, Klaus-Peter Engelbrecht, Antti Oulasvirta

How to access audio files of large data bases using in-car speech dialogue systems
Sandra Mann, André Berton, Ute Ehrlich

Analyzing temporal transition of real user's behaviors in a spoken dialogue system
Kazunori Komatani, Tatsuya Kawahara, Hiroshi G. Okuno

Voicepedia: towards speech-based access to unstructured information
J. Sherwani, Dong Yu, Tim Paek, Mary Czerwinski, Yun-Cheng Ju, Alex Acero

Exploiting prosodic features for dialog act tagging in a discriminative modeling framework
Vivek Rangarajan, Srinivas Bangalore, Shrikanth S. Narayanan

Using information state to improve dialogue move identification in a spoken dialogue system
Hua Ai, Antonio Roque, Anton Leuski, David Traum

Using multiple strategies to manage spoken dialogue
Shiu-Wah Chu, Ian O'Neill, Philip Hanna

An information state based dialogue manager for a mobile robot
Marcelo Quinderé, Luís Seabra Lopes, António J. S. Teixeira

Automated directory assistance system - from theory to practice
Dong Yu, Yun-Cheng Ju, Ye-Yi Wang, Geoffrey Zweig, Alex Acero

The voice-rate dialog system for consumer ratings
Geoffrey Zweig, Patrick Nguyen, Yun-Cheng Ju, Ye-Yi Wang, Dong Yu, Alex Acero

The influence of user tailoring and cognitive load on user performance in spoken dialogue systems
Andi Winterboer, Jiang Hu, Johanna D. Moore, Clifford Nass

Confidence measures for voice search applications
Ye-Yi Wang, Dong Yu, Yun-Cheng Ju, Geoffrey Zweig, Alex Acero

Effects of quiz-style information presentation on user understanding
Ryuichiro Higashinaka, Kohji Dohsaka, Shigeaki Amano, Hideki Isozaki

A data visualization and analysis method for natural language call routing system design
Hong-Kwang Jeff Kuo, Vaibhava Goel


Accent and Language Identification I, II


Discriminative optimization of language adapted HMMs for a language identification system based on parallel phoneme recognizers
Josef G. Bauer, Bernt Andrassy, Ekaterina Timoshenko

Fusion of contrastive acoustic models for parallel phonotactic spoken language identification
Khe Chai Sim, Haizhou Li

Multi-layer Kohonen self-organizing feature map for language identification
Liang Wang, Eliathamby Ambikairajah, Eric H. C. Choi

Hierarchical language identification based on automatic language clustering
Bo Yin, Eliathamby Ambikairajah, Fang Chen

Using speech rhythm for acoustic language identification
Ekaterina Timoshenko, Harald Höge

A model-based estimation of phonotactic language verification performance
Ka-keung Wong, Man-hung Siu, Brian Mak

A tagging algorithm for mixed language identification in a noisy domain
Mike Rosner, Paulseph-John Farrugia

Improved language recognition using better phonetic decoders and fusion with MFCC and SDC features
Doroteo T. Toledano, Javier Gonzalez-Dominguez, Alejandro Abejon-Gonzalez, Danilo Spada, Ismael Mateos-Garcia, Joaquin Gonzalez-Rodriguez

An open-set detection evaluation methodology applied to language and emotion recognition
David A. van Leeuwen, Khiet P. Truong

Boosting with anti-models for automatic language identification
Xi Yang, Man-hung Siu, Herbert Gish, Brian Mak

Acoustic language identification using fast discriminative training
Fabio Castaldo, Daniele Colibro, Emanuele Dalmasso, Pietro Laface, Claudio Vair

Spoken language identification using score vector modeling and support vector machine
Ming Li, Hongbin Suo, Xiao Wu, Ping Lu, Yonghong Yan

Language identification based on n-gram frequency ranking
R. Cordoba, L. F. D'Haro, F. Fernandez-Martinez, J. Macias-Guarasa, J. Ferreiros

Improving phonotactic language recognition with acoustic adaptation
Wade Shen, Douglas Reynolds


Robust ASR I, II


Noise-robust hands-free voice activity detection with adaptive zero crossing detection using talker direction estimation
Yuki Denda, Takamasa Tanaka, Masato Nakayama, Takanobu Nishiura, Yoichi Yamashita

A robust mel-scale subband voice activity detector for a car platform
A. Álvarez, R. Martínez, P. Gómez, V. Nieto, V. Rodellar

Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio
Kentaro Ishizuka, Tomohiro Nakatani, Masakiyo Fujimoto, Noboru Miyazaki

Feature and distribution normalization schemes for statistical mismatch reduction in reverberant speech recognition
A. M. Toh, Roberto Togneri, Sven Nordholm

Temporal masking for unsupervised minimum Bayes risk speaker adaptation
Matthew Gibson, Thomas Hain

Speech feature compensation based on pseudo stereo codebooks for robust speech recognition in additive noise environments
Tsung-hsueh Hsieh, Jeih-weih Hung

Multiband, multisensor robust features for noisy speech recognition
Dimitrios Dimitriadis, Petros Maragos, Stamatios Lefkimmiatis

Noise robust speech recognition for voice driven wheelchair
Akira Sasou, Hiroaki Kojima

Irrelevant variability normalization based HMM training using VTS approximation of an explicit model of environmental distortions
Yu Hu, Qiang Huo

On the jointly unsupervised feature vector normalization and acoustic model compensation for robust speech recognition
Luis Buera, Antonio Miguel, Eduardo Lleida, Óscar Saz, Alfonso Ortega

An ensemble modeling approach to joint characterization of speaker and speaking environments
Yu Tsao, Chin-Hui Lee

Cluster-based polynomial-fit histogram equalization (CPHEQ) for robust speech recognition
Shih-Hsiang Lin, Yao-Ming Yeh, Berlin Chen

Robust distributed speech recognition using histogram equalization and correlation information
Pedro M. Martinez, Jose C. Segura, Luz Garcia

Predictive minimum Bayes risk classification for robust speech recognition
Jen-Tzung Chien, Koichi Shinoda, Sadaoki Furui

Applying word duration constraints by using unrolled HMMs
Ning Ma, Jon Barker, Phil Green

Evaluating the temporal structure normalisation technique on the Aurora-4 task
Xiong Xiao, Eng Siong Chng, Haizhou Li

Two-stage system for robust neutral/Lombard speech recognition
Hynek Bořil, Petr Fousek, Harald Höge

Noise suppression using search strategy with multi-model compositions
Takatoshi Jitsuhiro, Tomoji Toriyama, Kiyoshi Kogure

Investigations into early and late reflections on distant-talking speech recognition toward suitable reverberation criteria
Takanobu Nishiura, Yoshiki Hirano, Yuki Denda, Masato Nakayama

An approach to iterative speech feature enhancement and recognition
Stefan Windmann, Reinhold Haeb-Umbach

Optimization of temporal filters in the modulation frequency domain for constructing robust features in speech recognition
Jeih-weih Hung

The harming part of room acoustics in automatic speech recognition
Rico Petrick, Kevin Lohde, Matthias Wolff, Rüdiger Hoffmann

A reference model weighting-based method for robust speech recognition
Yuan Fu Liao, Yh-Her Yang, Chi-Hui Hsu, Cheng-Chang Lee, Jing-Teng Zeng

Mel sub-band filtering and compression for robust speech recognition
Babak Nasersharif, Ahmad Akbari, Mohammad Mehdi Homayounpour


Adaptation in ASR I, II


Clustered maximum likelihood linear basis for rapid speaker adaptation
Yun Tang, Richard Rose

Rapid speaker adaptation by reference model interpolation
Wenxuan Teng, Guillaume Gravier, Frédéric Bimbot, Frédéric Soufflet

Rapid unsupervised speaker adaptation using single utterance based on MLLR and speaker selection
Randy Gomez, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

Robustness of several kernel-based fast adaptation methods on noisy LVCSR
Brian Mak, Roger Hsiao

Estimating VTLN warping factors by distribution matching
Janne Pylkkönen

Frequency domain correspondence for speaker normalization
Ming Liu, Xi Zhou, Mark Hasegawa-Johnson, Thomas S. Huang, Zhengyou Zhang

Unsupervised training of adaptation rate using Q-learning in large vocabulary continuous speech recognition
Masafumi Nishida, Yasuo Horiuchi, Akira Ichikawa

Application of CMLLR in narrow band wide band adapted systems
Martin Karafiát, Lukáš Burget, Jan Černocký, Thomas Hain

Fast adaptation of GMM-based compact models
Christophe Lévy, Georges Linarès, Jean-François Bonastre

Efficient estimation of speaker-specific projecting feature transforms
Jonas Lööf, Ralf Schlüter, Hermann Ney

Regularized feature-based maximum likelihood linear regression for speech recognition
Mohamed Kamal Omar

Modelling confusion matrices to improve speech recognition accuracy, with an application to dysarthric speech
Omar Caballero Morales, Stephen Cox

An active approach to speaker and task adaptation based on automatic analysis of vocabulary confusability
Qiang Huo, Wei Li

fMPE-MAP: improved discriminative adaptation for modeling new domains
Jing Zheng, Andreas Stolcke

Discriminative MCE-based speaker adaptation of acoustic models for a spoken lecture processing task
Timothy J. Hazen, Erik McDermott


Speaker Verification & Identification I-IV


A new kernel for SVM MLLR based speaker recognition
Zahi N. Karam, William M. Campbell

A GMM-based probabilistic sequence kernel for speaker verification
Kong-Aik Lee, Changhuai You, Haizhou Li, Tomi Kinnunen

Speaker recognition using kernel-PCA and intersession variability modeling
Hagai Aronowitz

Linear and non linear kernel GMM supervector machines for speaker verification
Réda Dehak, Najim Dehak, Patrick Kenny, Pierre Dumouchel

Support vector regression for speaker verification
Ignacio Lopez-Moreno, Ismael Mateos-Garcia, Daniel Ramos, Joaquin Gonzalez-Rodriguez

Derivative and parametric kernels for speaker verification
C. Longworth, M. J. F. Gales

Application of shifted delta cepstral features in speaker verification
Jose R. Calvo, Rafael Fernández, Gabriel Hernández

A smoothing kernel for spatially related features and its application to speaker verification
Luciana Ferrer, Kemal Sönmez, Elizabeth Shriberg

VZ-norm: an extension of z-norm to the multivariate case for anchor model based speaker verification
D. Charlet, M. Collet, Frédéric Bimbot

Word-conditioned HMM supervectors for speaker recognition
Howard Lei, Nikki Mirghafori

Speaker clustering using direct maximization of a BIC-based score
Wei-Ho Tsai

Confidence measure based unsupervised target model adaptation for speaker verification
A. Preti, Jean-François Bonastre, Driss Matrouf, F. Capman, B. Ravera

Emotion attribute projection for speaker recognition on emotional speech
Huanjun Bao, Ming-Xing Xu, Thomas Fang Zheng

High-level feature-based speaker verification via articulatory phonetic-class pronunciation modeling
Shi-Xiong Zhang, Man-Wai Mak, Helen Meng

Direct acoustic feature using iterative EM algorithm and spectral energy for classifying suicidal speech
T. Yingthawornsuk, H. Kaymaz Keskinpala, D. M. Wilkes, R. G. Shiavi, R. M. Salomon

On comparing and combining intra-speaker variability compensation and unsupervised model adaptation in speaker verification
Claudio Garreton, Nestor Becerra Yoma, Fernando Huenupán, Carlos Molina

Comparison of two kinds of speaker location representation for SVM-based speaker verification
Xianyu Zhao, Yuan Dong, Hao Yang, Jian Zhao, Liang Lu, Haila Wang

Jitter and shimmer measurements for speaker recognition
Mireia Farrús, Javier Hernando, Pascual Ejarque

Natural-emotion GMM transformation algorithm for emotional speaker recognition
Zhenyu Shan, Yingchun Yang, Ruizhi Ye

Optimized one-bit quantization for adapted GMM-based speaker verification
Ivy H. Tseng, Olivier Verscheure, Deepak S. Turaga, Upendra V. Chaudhari

A comparison of session variability compensation techniques for SVM-based speaker recognition
Mitchell McLaren, Robbie Vogt, Brendan Baker, Sridha Sridharan

Influence of task duration in text-independent speaker verification
Benoît Fauve, Nicholas Evans, Neil Pearson, Jean-François Bonastre, John Mason

A text-constrained prosodic system for speaker verification
Elizabeth Shriberg, Luciana Ferrer

Fusing acoustic, phonetic and data-driven systems for text-independent speaker verification
Asmaa El Hannani, Dijana Petrovska-Delacrétaz

Continuous prosodic features and formant modeling with joint factor analysis for speaker verification
Najim Dehak, Patrick Kenny, Pierre Dumouchel

Loquendo - Politecnico di Torino's 2006 NIST speaker recognition evaluation system
Claudio Vair, Daniele Colibro, Fabio Castaldo, Emanuele Dalmasso, Pietro Laface

A straightforward and efficient implementation of the factor analysis model for speaker verification
Driss Matrouf, Nicolas Scheffer, Benoît Fauve, Jean-François Bonastre

Multi-modal user authentication from video for mobile or variable-environment applications
Timothy J. Hazen, Daniel Schultz

Quasi text-independent speaker-verification based on pattern matching
Michael Gerber, René Beutler, Beat Pfister

Virtual fusion for speaker recognition
Yosef A. Solewicz, Moshe Koppel

Evolutionary minimum verification error learning of the alternative hypothesis model for LLR-based speaker verification
Yi-Hsiang Chao, Wei-Ho Tsai, Shih-Sian Cheng, Hsin-Min Wang, Ruei-Chuan Chang

Speaker recognition by combining MFCC and phase information
Seiichi Nakagawa, Kouhei Asakawa, Longbiao Wang

A semi-automatic approach for speaker mining of tapped telephone conversations
Sandeep Manocha, Carol Y. Espy-Wilson

Cluster adaptive training weights as features in SVM-based speaker verification
Hao Yang, Yuan Dong, Xianyu Zhao, Jian Zhao, Liang Lu, Haila Wang

Study on speaker verification with non-audible murmur segments
Hideki Okamoto, Mariko Kojima, Tomoko Matsui, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano

Dimension reduction for speaker identification based on mutual information
Xugang Lu, Jianwu Dang

Robustness of long time measures of fundamental frequency
Jonas Lindh, Anders Eriksson

Score distribution scaling for speaker recognition
Vinod Prakash, John H. L. Hansen

Global features for rapid identity verification with dynamic biometric data
A. C. Morris, J. Koreman, B. Ly-Van, H. Sellahewa, S. Jassim, R. Llarena Gómez

Robust voice activity detection for narrow-bandwidth speaker verification under adverse environments
Tuan Van Pham, Michael Neffe, Gernot Kubin

Speaker verification with multiple classifier fusion using Bayes based confidence measure
Fernando Huenupán, Nestor Becerra Yoma, Carlos Molina, Claudio Garreton

Audiovisual speaker identity verification based on lip motion features
Girija Chetty, Michael Wagner

Duration and pronunciation conditioned lexical modeling for speaker verification
Gokhan Tur, Elizabeth Shriberg, Andreas Stolcke, Sachin Kajarekar

Artificial impostor voice transformation effects on false acceptance rates
Jean-François Bonastre, Driss Matrouf, Corinne Fredouille


Spoken Data Retrieval I, II


Rapid and accurate spoken term detection
David R. H. Miller, Michael Kleber, Chia-Lin Kao, Owen Kimball, Thomas Colthurst, Stephen A. Lowe, Richard M. Schwartz, Herbert Gish

Subword-based position specific posterior lattices (s-PSPL) for indexing speech information
Yi-cheng Pan, Hung-lin Chang, Berlin Chen, Lin-shan Lee

Improved methods for language model based question classification
Andreas Merkel, Dietrich Klakow

Error-tolerant question answering for spoken documents
Tomoyosi Akiba, Hirofumi Tsujimura

Exploiting information extraction annotations for document retrieval in distillation tasks
Dilek Hakkani-Tür, Gokhan Tur, Michael Levit

Learning spoken document similarity and recommendation using supervised probabilistic latent semantic analysis
K. Thambiratnam, F. Seide

A phonetic search approach to the 2006 NIST spoken term detection evaluation
Roy Wallace, Robbie Vogt, Sridha Sridharan

An integration method of retrieval results using plural subword models for vocabulary-free spoken document retrieval
Yoshiaki Itoh, Kohei Iwata, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee

The SRI/OGI 2006 spoken term detection system
Dimitra Vergyri, Izhak Shafran, Andreas Stolcke, Ramana R. Gadde, Murat Akbacak, Brian Roark, Wen Wang

PodCastle: a Web 2.0 approach to speech recognition research
Masataka Goto, Jun Ogata, Kouichirou Eto

Speech mining in noisy audio message corpus
Nathalie Camelin, Frédéric Béchet, Géraldine Damnati, Renato De Mori

A fast fuzzy keyword spotting algorithm based on syllable confusion network
Jian Shao, Qingwei Zhao, Pengyuan Zhang, Zhaojie Liu, Yonghong Yan

Advances in SpeechFind: transcript reliability estimation employing confidence measure based on discriminative sub-word model for SDR
Wooil Kim, John H. L. Hansen

An interactive timeline for speech database browsing
Benoit Favre, Jean-François Bonastre, Patrice Bellot


Speech Perception I, II


Spoken word recognition of Chinese homophones: a further investigation
Michael C. W. Yip

The role of outer hair cell function in the perception of synthetic versus natural speech
Maria Wolters, Pauline Campbell, Christine DePlacido, Amy Liddell, David Owens

Hybridizing conversational and clear speech
Akiko Kusumoto, Alexander B. Kain, John-Paul Hosom, Jan P. H. van Santen

Neighborhood density and neighborhood frequency effects in French spoken word recognition
Sophie Dufour, Ulrich Hans Frauenfelder

Discrimination and recognition of scaled word sounds
Toshio Irino, Yoshie Aoki, Yoshie Hayashi, Hideki Kawahara, Roy D. Patterson

Benchmarking human performance on the acoustic and linguistic subtasks of ASR systems
László Tóth

Contributions of temporal fine structure cues to Chinese speech recognition in cochlear implant simulation
Lin Yang, Jianping Zhang, Yonghong Yan

Effect of number of masking talkers on speech-on-speech masking in Chinese
Xihong Wu, Jing Chen, Zhigang Yang, Qiang Huang, Mengyuan Wang, Liang Li

Do different boundary types induce subtle acoustic cues to which French listeners are sensitive?
Odile Bagou, Sophie Dufour, Cécile Fougeron, Alain Content, Ulrich Hans Frauenfelder

An information theoretic approach to predict speech intelligibility for listeners with normal and impaired hearing
Svante Stadler, Arne Leijon, Björn Hagerman

Speaking rate effects in a landmark-based phonetic exemplar model
Travis Wade, Bernd Möbius

Acoustic correlates of intelligibility enhancements in clearly produced fricatives
Kazumi Maniwa, Allard Jongman, Travis Wade

Modelling the human-machine gap in speech reception: microscopic speech intelligibility prediction for normal-hearing subjects with an auditory model
Tim Jürgens, Thomas Brand, Birger Kollmeier

Lombard speech impact on perceptual speaker recognition
Ayako Ikeno, John H. L. Hansen

Effect of within- and between-talker variability on word identification in noise by younger and older adults
Huiwen Goy, Kathleen Pichora-Fuller, Pascal van Lieshout, Gurjit Singh, Bruce Schneider

Speech perception in children with speech sound disorder
H. Timothy Bunnell, N. Carolyn Schanen, Linda D. Vallino, Thierry G. Morlet, James B. Polikoff, Jennette D. Driscoll, James T. Mantell

Speech coding and information processing by auditory neurons
Huan Wang, Werner Hemmert

What do listeners attend to in hearing prosodic structures? Investigating the human speech-parser using short-term recall
Annie C. Gilbert, Victor J. Boucher

Time-compressed speech perception with speech and noise maskers
Douglas S. Brungart, Nandini Iyer

L2 consonant identification in noise: cross-language comparisons
Anne Cutler, Martin Cooke, Maria Luisa Garcia Lecumberri, Dennis Pasveer

Effects of non-native dialects on spoken word recognition
Jennifer T. Le, Catherine T. Best, Michael D. Tyler, Christian Kroos

Identification of natural whistled vowels by non-whistlers
Julien Meyer, Fanny Meunier, Laure Dentel

Prelexical adjustments to speaker idiosyncrasies: are they position-specific?
Alexandra Jesse, James M. McQueen

Top-down effects on compensation for coarticulation are not replicable
Holger Mitterer


Prosodic Modeling I, II


Modeling incompletion phenomenon in Mandarin dialog prosody
Jian Yu, Lixing Huang, Jianhua Tao, Xia Wang

Accent assignment algorithm in Hungarian, based on syntactic analysis
Anne Tamm, Kálmán Abari, Gábor Olaszy

An effective initial/final duration prediction method for corpus-based singing voice synthesis of Mandarin Chinese
Cheng-Yuan Lin, Pei-Chi Jao, J.-S. Roger Jang

Increasing prosodic variability of text-to-speech synthesizers
Géza Németh, Márk Fék, Tamás Gábor Csapó

Unsupervised HMM classification of F0 curves
Damien Lolive, Nelly Barbot, Olivier Boeffard

Automatic pitch accent prediction for text-to-speech synthesis
Ian Read, Stephen Cox

An unsupervised approach to automatic prosodic annotation
Xinqiang Ni, Yining Chen, Frank K. Soong, Min Chu, Ping Zhang

A system for transforming the emotion in speech: combining data-driven conversion techniques for prosody and voice quality
Zeynep Inanoglu, Steve Young

An automatic prosody labeling method for Mandarin speech
Chen-Yu Chiang, Hsiu-Min Yu, Yih-Ru Wang, Sin-Horng Chen

Corpus-based generation of prosodic features from text based on generation process model
Keikichi Hirose, Keiko Ochi, Nobuaki Minematsu

Novel eigenpitch-based prosody model for text-to-speech synthesis
Jilei Tian, Jani Nurminen, Imre Kiss

Modelling prominence and emphasis improves unit-selection synthesis
Volker Strom, Ani Nenkova, Robert Clark, Yolanda Vazquez-Alvarez, Jason Brenier, Simon King, Dan Jurafsky

A framework of reply speech generation for concept-to-speech conversion in spoken dialogue systems
Seiya Takada, Yuji Yagi, Keikichi Hirose, Nobuaki Minematsu

Synthesis of prosodic attitudinal variants in German backchannel ja
Thorsten Stocksmeier, Stefan Kopp, Dafydd Gibbon

Inter-language prosodic style modification experiment using word impression vector for communicative speech generation
Ke Li, Yoko Greenberg, Yoshinori Sagisaka



Language Modeling I, II


Large-scale random forest language models for speech recognition
Yi Su, Frederick Jelinek, Sanjeev Khudanpur

PLSA-based topic detection in meetings for adaptation of lexicon and language model
Yuya Akita, Yusuke Nemoto, Tatsuya Kawahara

Language modeling using PLSA-based topic HMM
Atsushi Sako, Tetsuya Takiguchi, Yasuo Ariki

Lexicon adaptation with reduced character error (LARCE) - a new direction in Chinese language modeling
Yi-cheng Pan, Lin-shan Lee

Minimum rank error training for language modeling
Meng-Sung Wu, Jen-Tzung Chien

Integrating MAP, marginals, and unsupervised language model adaptation
Wen Wang, Andreas Stolcke

Dynamic language model adaptation using presentation slides for lecture speech recognition
Hiroki Yamazaki, Koji Iwano, Koichi Shinoda, Sadaoki Furui, Haruo Yokota

Web-based language modelling for automatic lecture transcription
Cosmin Munteanu, Gerald Penn, Ron Baecker

LSA-based language model adaptation for highly inflected languages
Tanel Alumäe, Toomas Kirt

Language model adaptation using latent Dirichlet allocation and an efficient topic inference algorithm
Aaron Heidel, Hung-an Chang, Lin-shan Lee

Structural Bayesian language modeling and adaptation
Sibel Yaman, Jen-Tzung Chien, Chin-Hui Lee

Vocabulary selection for a broadcast news transcription system using a morpho-syntactic approach
Ciro Martins, António J. S. Teixeira, João Neto

Handling OOV words in Arabic ASR via flexible morphological constraints
Nguyen Bach, Mohamed Noamany, Ian Lane, Tanja Schultz

Phrases in category-based language models for Spanish and Basque ASR
Raquel Justo, M. Inés Torres

Language modeling for automatic Turkish broadcast news transcription
Ebru Arısoy, Haşim Sak, Murat Saraçlar



Speech Enhancement


The effect of the additivity assumption on time and frequency domain Wiener filtering for speech enhancement
Kamil K. Wójcicki, Stephen So, Kuldip K. Paliwal

Noise reduction based on adaptive β-order generalized spectral subtraction for speech enhancement
Junfeng Li, Shuichi Sakamoto, Satoshi Hongo, Masato Akagi, Yôiti Suzuki

Class constrained ROVER based speech enhancement
Amit Das, John H. L. Hansen

EMD based soft-thresholding for speech enhancement
Erhan Deger, Md. Khademul Islam Molla, Keikichi Hirose, Nobuaki Minematsu, Md. Kamrul Hasan

An approximate solution for perceptually constrained signal subspace speech enhancement method
Adam Borowicz, Alexander Petrovsky

Quality assessment of speech enhancement systems by separation of enhanced speech, noise, and echo
Tim Fingscheidt, Suhadi Suhadi

Perceptual musical noise reduction using critical bands tonality coefficients and masking thresholds
Anis Ben Aicha, Sofia Ben Jebara

On optimal estimation of compressed speech for hearing aids
Dirk Mauler, Anil M. Nagathil, Rainer Martin

DFT domain subspace based noise tracking for speech enhancement
Richard C. Hendriks, Jesper Jensen, Richard Heusdens

Noise tracking for speech systems in adverse environments
Nitish Krishnamurthy, John H. L. Hansen

Speech enhancement using multi-reference noise reduction in a vehicle environment
Abderrahman Essebbar, Tristan Poinsard

Blind adaptive principal eigenvector beamforming for acoustical source separation
Ernst Warsitz, Reinhold Haeb-Umbach, Dang Hai Tran Vu

Time-domain blind audio source separation using advanced ICA methods
Zbyněk Koldovský, Petr Tichavský

Model-based speech separation with single-microphone input
S. W. Lee, Frank K. Soong, P. C. Ching

Multi-step linear prediction based speech dereverberation in noisy reverberant environment
Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Masato Miyoshi

A statistical model based post-filtering algorithm for residual echo suppression
Seung Yeol Lee, Jong Won Shin, Hwan Sik Yun, Nam Soo Kim

An optimal speech enhancement under speech uncertainty probability and masking property of auditory system
Xiaoshan Huang, Xiaoqun Zhao



Phonetics and Phonology


The phonetics and phonology of high and low tones in two falling f0-contours in standard German
Tamara Rathcke, Jonathan Harrington

Temporal alignment of creaky voice in neutralised realisations of an underlying, post-nasal voicing contrast in German
Tina John, Jonathan Harrington

The duration of speech pauses in a multilingual environment
Mike Demol, Werner Verhelst, Piet Verhoeve

Syllable timing patterns in Polish: results from annotation mining
Dafydd Gibbon, Jolanta Bachan, Grażyna Demenko

Minimal pairs and functional loads of sound contrasts obtained from a list of Modern Greek words
Constandinos Kalimeris, Stelios Bakamidis

More on acoustic correlates of stress
Daan Wissing

Comparing Praat and Snack formant measurements on two large corpora of northern and southern French
Cécile Woehrling, Philippe Boula de Mareüil

The phonetic exponency of phrasal accentuation in French and German
William Barry, Bistra Andreeva, Ingmar Steiner

Phonetic geminates in Cypriot Greek: the case of voiceless plosives
Christiana Christodoulou

Predicting vowel duration in spontaneous Canadian French speech
Darcie Williams, François Poiré

Rhotic variation and schwa epenthesis in Windsor French
Ivan Chow, François Poiré

On the categorical nature of the process involved in schwa elision in French
Audrey Bürki, Cécile Fougeron, Cédric Gendrot

Exploring tonal variations via context-dependent tone models
Yue-Ning Hu, Min Chu, Chao Huang, Yan-Ning Zhang

Acoustic analysis of the neutral tone in Mandarin
Philippe Martin, Jun Li

F0 analysis of perceptual distance among Cantonese level tones
Rerrario Shui-Ching Ho, Yoshinori Sagisaka


Features for ASR


Extended powered cepstral normalization (p-CN) with range equalization for robust features in speech recognition
Chang-wen Hsu, Lin-shan Lee

Selection of optimal dimensionality reduction method using Chernoff bound for segmental unit input HMM
Makoto Sakai, Norihide Kitaoka, Seiichi Nakagawa

Fepstrum: an improved modulation spectrum for ASR
Vivek Tyagi

Narrowband to wideband feature expansion for robust multilingual ASR
Dušan Macho

Non-linear spectral contrast stretching for in-car speech recognition
Weifeng Li, Hervé Bourlard

Clustering-based two-dimensional linear discriminant analysis for speech recognition
Xiao-Bing Li, Douglas O'Shaughnessy

A study on temporal features derived by analytic signal
Yotaro Kubo, Shigeki Okawa, Akira Kurematsu, Katsuhiko Shirai

Dimensionality reduction of speech features using nonlinear principal components analysis
Stephen A. Zahorian, Tara Singh, Hongbing Hu

Linear transformation approach to VTLN using dynamic frequency warping
D. R. Sanand, D. Dinesh Kumar, S. Umesh

Features interpolation domain for distributed speech recognition and performance for ITU-T G.723.1 CODEC
Vladimir Fabregas Surigué de Alencar, Abraham Alcaim

Dynamic integration of multiple feature streams for robust real-time LVCSR
Shoei Sato, Kazuo Onoe, Akio Kobayashi, Shinich Homma, Toru Imai, Tohru Takagi, Tetsunori Kobayashi

PCA-based feature extraction for fluctuation in speaking style of articulation disorders
Hironori Matsumasa, Tetsuya Takiguchi, Yasuo Ariki, Ichao Li, Toshitaka Nakabayashi

Multi-stream features combination based on Dempster-Shafer rule for LVCSR system
Fabio Valente, Jithendra Vepa, Hynek Hermansky

Dimensionality reduction for speech recognition using neighborhood components analysis
Natasha Singh-Miller, Michael Collins, Timothy J. Hazen

Probabilistic latent speaker analysis for large vocabulary speech recognition
Dan Su, Xihong Wu, Huisheng Chi

MRASTA and PLP in automatic speech recognition
S. R. Mahadeva Prasanna, Hynek Hermansky


Objective Assessment of Voice and Speech Quality


Women's vocal aging: a longitudinal approach
Markus Brückl

Effect of intensive voice therapy on vocal tremor for Parkinson speakers
Laurence Cnockaert, Jean Schoentgen, Canan Ozsancak, Pascal Auzou, Francis Grenez

Assessment of vocal dysperiodicities in connected disordered speech
A. Alpan, A. Kacha, Francis Grenez, Jean Schoentgen

Effects of FE modelled consequences of tonsillectomy on perceptual evaluation of voice
Anne-Maria Laukkanen, Jaromír Horáček, Pavel Švancara, Elina Lehtinen

Speech quality after major surgery of the oral cavity and oropharynx with microvascular soft tissue reconstruction
Irma M. Verdonck-de Leeuw, Louis ten Bosch, Li Ying Chao, Rico N. P. M. Rinkel, Pepijn A. Borggreven, Lou Boves, C. René Leemans

Voice fatigue and use of speech recognition: a study of voice quality ratings
Christel de Bruijn, Sandra Whiteside

Complementary approaches for voice disorder assessment
Jean-François Bonastre, Corinne Fredouille, A. Ghio, A. Giovanni, G. Pouchoulin, J. Révis, B. Teston, P. Yu

Frequency study for the characterization of the dysphonic voices
G. Pouchoulin, Corinne Fredouille, Jean-François Bonastre, A. Ghio, A. Giovanni

Acoustic correlates of laryngeal-muscle fatigue: findings for a phonometric prevention of acquired voice pathologies
Victor J. Boucher

Automatic scoring of the intelligibility in patients with cancer of the oral cavity
Andreas Maier, Maria Schuster, Anton Batliner, Elmar Nöth, Emeka Nkenke

Automatic assessment of children's reading level
Jacques Duchateau, Leen Cleuren, Hugo Van hamme, Pol Ghesquière

Using waveform matching techniques in the measurement of shimmer in voiced signals
Carlos Ferrer, María E. Hernández-Díaz, Eduardo González

Analysis of the impact of analogue telephone channel on MFCC parameters for voice pathology detection
R. Fraile, J. I. Godino-Llorente, N. Sáenz-Lechón, V. Osma-Ruiz, P. Gómez-Vilda

Objective parameters from videokymographic images: a user-friendly interface
C. Manfredi, L. Bocchi, G. Cantarella, G. Peretti, G. Guidi, V. Mezzatesta



Resource Acquisition and Preparation; Resource and System Evaluation


JAAE: the Java abstract annotation editor
Ivan Habernal, Miloslav Konopík

How to judge reusability of existing speech corpora for target task by utilizing statistical multidimensional scaling
Goshu Nagino, Makoto Shozakai, Kiyohiro Shikano

Feasibility of constructing an expressive speech corpus from television soap opera dialogue
Peter Rutten

Collection of empirical data for standardization of generic vocabularies in speech driven ICT devices and services
Rosemary Orr, Bernat González i Llinares, Françoise Petersen, Helge Hüttenrauch, Martin Böcker, Michael Tate

Acoustic-phonetic features for refining the explicit speech segmentation
Antonio Marcos Selmini, Fábio Violaro

Text island spotting in large speech databases
B. Lecouteux, Georges Linarès, Frédéric Beaugendre, Pascal Nocera

People watcher: a game for eliciting human-transcribed data for automated directory assistance
Tim Paek, Yun-Cheng Ju, Christopher Meek

The effect of speech interface accuracy on driving performance
Andrew Kun, Tim Paek, Zeljko Medenica

Context constrained-generalized posterior probability for verifying phone transcriptions
Hua Zhang, Lijuan Wang, Frank K. Soong, Wenju Liu

Getting start with UTDrive: driver-behavior modeling and assessment of distraction for in-vehicle speech systems
Pongtep Angkititrakul, DongGu Kwak, SangJo Choi, JeongHee Kim, Anh PhucPhan, Amardeep Sathyanarayana, John H. L. Hansen

Relative evaluation of informativeness in machine generated summaries
BalaKrishna Kolluru, Yoshihiko Gotoh

A method for evaluating task-oriented spoken dialog translation systems based on communication efficiency
Toshiyuki Takezawa, Masahide Mizushima, Tohru Shimizu, Genichiro Kikui

Using eye movements for online evaluation of speech synthesis
Charlotte van Hooijdonk, Edwin Commandeur, Reinier Cozijn, Emiel Krahmer, Erwin Marsi

Sentence level intelligibility evaluation for Mandarin text-to-speech systems using semantically unpredictable sentences
Jian Li, Dmitry Sityaev, Jie Hao

N-best: the Northern- and Southern-Dutch benchmark evaluation of speech recognition technology
Judith Kessens, David A. van Leeuwen

A MAP based approach to adaptive speech intelligibility measurements
Trym Holter, Svein Sørsdal

Phone boundary detection using selective refinements and context-dependent acoustic features
Sirinoot Boonsuk, Proadpran Punyabukkana, Atiwong Suchato


ASR: New Paradigms


Modeling context and language variation for non-native speech recognition
Tien-Ping Tan, Laurent Besacier

An evaluation of cross-language adaptation and native speech training for rapid HMM construction based on very limited training data
Xufang Zhao, Douglas O'Shaughnessy

Never-ending learning with dynamic hidden Markov network
Konstantin Markov, Satoshi Nakamura

Building multiple complementary systems using directed decision trees
C. Breslin, M. J. F. Gales

Automatic speech recognition framework for multilingual audio contents
Hiroaki Nanjo, Yuichi Oku, Takehiko Yoshimi

Combined acoustic and pronunciation modelling for non-native speech recognition
G. Bouselmi, Dominique Fohr, I. Illina

Automatic estimation of scaling factors among probabilistic models in speech recognition
Tadashi Emori, Yoshifumi Onishi, Koichi Shinoda

Memory efficient modeling of polyphone context with weighted finite-state transducers
Emilian Stoimenov, John McDonough

Extra large vocabulary continuous speech recognition algorithm based on information retrieval
Valeriy Pylypenko

PocketSUMMIT: small-footprint continuous speech recognition
I. Lee Hetherington

Development of preschool children subsystem for ASR and Q&A in a real-environment speech-oriented guidance task
Tobias Cincarek, Izumi Shindo, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

A study on word detector design and knowledge-based pruning and rescoring
Chengyuan Ma, Chin-Hui Lee

Parameter tuning for fast speech recognition
Thomas Colthurst, Tresi Arvizo, Chia-Lin Kao, Owen Kimball, Stephen A. Lowe, David R. H. Miller, Jim Van Sciver

A computational model for unsupervised word discovery
Louis ten Bosch, Bert Cranen

Phoneme confusions in human and automatic speech recognition
Bernd T. Meyer, Matthias Wächter, Thomas Brand, Birger Kollmeier

Construction of spoken language model including fillers using filler prediction model
Kengo Ohta, Masatoshi Tsuchiya, Seiichi Nakagawa

Attention shift decoding for conversational speech recognition
Raghunandan Kumaran, Jeff Bilmes, Katrin Kirchhoff


Speech and Language Technology for Less-resourced Languages


A morpho-graphemic approach for the recognition of spontaneous speech in agglutinative languages - like Hungarian
Péter Mihajlik, Tibor Fegyó, Zoltán Tüske, Pavel Ircing

A semi-supervised learning approach for morpheme segmentation for an Arabic dialect
Mei Yang, Jing Zheng, Andreas Kathol

Accelerating the annotation of lexical data for less-resourced languages
Gerhard B. van Huyssteen, Martin J. Puttkammer

On web-based creation of speech resources for less-resourced languages
Christoph Draxler

Building an information retrieval system for Serbian - challenges and solutions
Miroslav Martinović, Srdjan Vesić, Goran Rakić

Bootstrapping morphological analysis of Gĩkũyũ using unsupervised maximum entropy learning
Guy De Pauw, Peter Waiganjo Wagacha

The voiceTRAN machine translation system
Jerneja Žganec Gros, Stanislav Gruden

MuLAS: a framework for automatically building multi-tier corpora
Sérgio Paulo, Luís C. Oliveira

Creating multimedia dictionaries of endangered languages using LEXUS
Jacquelijn Ringersma, Marc Kemps-Snijders

IceNLP: a natural language processing toolkit for Icelandic
Hrafn Loftsson, Eiríkur Rögnvaldsson

Phonotactic spoken language identification with limited training data
Marius Peche, Marelie Davel, Etienne Barnard

Automatic speech recognition for an under-resourced language - Amharic
Solomon Teferra Abate, Wolfgang Menzel

Information retrieval strategies for accessing African audio corpora
Abdillahi Nimaan, Pascal Nocera, Frédéric Béchet, Jean-François Bonastre

Morfessor and variKN machine learning tools for speech and language technology
Vesa Siivola, Mathias Creutz, Mikko Kurimo

Towards better language modeling for Thai LVCSR
Markpong Jongtaveesataporn, Issara Thienlikit, Chai Wutiwiwatchai, Sadaoki Furui




Speech Coding and Transmission


Normalized two stage SVQ for minimum complexity wide-band LSF quantization
Saikat Chatterjee, T. V. Sreenivas

A novel 2kb/s waveform interpolation speech coder based on non-negative matrix factorization
Peng Zhang, Chang-chun Bao

A novel energy distribution comparison approach for robust speech spectrum vector quantization
Ahmed Ismail, Yasser Dakroury, Hazem Abbas

Novel low-band phase representation for low bit-rate speech coding
Ahmed Ismail, Yasser Dakroury, Hazem Abbas

Perceptual-based playout mechanisms for multi-stream voice over IP networks
Chun-Feng Wu, Cheng-Lung Lee, Wen-Whei Chang

Time-warping and re-phasing in packet loss concealment
Robert Zopf, Jes Thyssen, Juin-Hwey Chen

The harmonic model codec (HMC) framework for VoIP
Yannis Agiomyrgiannakis, Yannis Stylianou

Bit-erasure channel decoding for GMM-based multiple description coding
Yannis Agiomyrgiannakis, Yannis Stylianou

Degradation-classification assisted single-ended quality measurement of speech
Hua Yuan, Tiago H. Falk, Wai-Yip Chan

Concept and evaluation of a downward-compatible system for spatial teleconferencing using automatic speaker clustering
Alexander Raake, Sascha Spors, Jens Ahrens, Jitendra Ajmera

Speech quality estimation using packet loss effects in CELP-type speech coders
Min-Ki Lee, Kyung-Tae Kim, Hong-Goo Kang, Dae Hee Youn

An 8-32 kbit/s scalable wideband coder extended with MDCT-based bandwidth extension on top of a 6.8 kbit/s narrowband CELP coder
Masahiro Oshikiri, Hiroyuki Ehara, Toshiyuki Morii, Tomofumi Yamanashi, Kaoru Satoh, Koji Yoshida









Speech Synthesis I, II


An HMM-based speech synthesis system applied to German and its adaptation to a limited set of expressive football announcements
Sacha Krstulović, Anna Hunecke, Marc Schröder

Statistical vowelization of Arabic text for speech synthesis in speech-to-speech translation systems
Liang Gu, Wei Zhang, Lazkin Tahir, Yuqing Gao

A pair-based language model for the robust lexical analysis in Chinese text-to-speech synthesis
Wu Liu, Dezhi Huang, Yuan Dong, Xinnian Mao, Haila Wang

A trainable excitation model for HMM-based speech synthesis
R. Maia, Tomoki Toda, Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda

Cross-language phonemisation in German text-to-speech synthesis
Jochen Steigner, Marc Schröder

Preliminary experiments toward automatic generation of new TTS voices from recorded speech alone
Ryuki Tachibana, Tohru Nagano, Gakuto Kurata, Masafumi Nishimura, Noboru Babaguchi

Implementation and evaluation of an HMM-based Thai speech synthesis system
Suphattharachai Chomphan, Takao Kobayashi

Speech synthesis enhancement in noisy environments
Davide Bonardo, Enrico Zovato

Tagging syllable boundaries with joint n-gram models
Helmut Schmid, Bernd Möbius, Julia Weidenkaff

Hierarchical non-uniform unit selection based on prosodic structure
Jun Xu, Dezhi Huang, Yongxin Wang, Yuan Dong, Lianhong Cai, Haila Wang

Control of an articulatory speech synthesizer based on dynamic approximation of spatial articulatory targets
Peter Birkholz

A preselection method based on cost degradation from the optimal sequence for concatenative speech synthesis
Nobuyuki Nishizawa, Hisashi Kawai

Line cepstral quefrencies and their use for acoustic inventory coding
Guntram Strecha, Matthias Eichner, Rüdiger Hoffmann

Articulatory acoustic feature applications in speech synthesis
Peter Cahill, Daniel Aioanei, Julie Carson-Berndsen

Approaches for adaptive database reduction for text-to-speech synthesis
Aleksandra Krul, Géraldine Damnati, François Yvon, Cédric Boidin, Thierry Moudenc

Exploiting unlabeled internal data in conditional random fields to reduce word segmentation errors for Chinese texts
Richard Tzong-Han Tsai, Hsi-Chuan Hung, Hong-Jie Dai, Wen-Lian Hsu

On the role of spectral dynamics in unit selection speech synthesis
Barry Kirkpatrick, Darragh O'Brien, Ronán Scaife, Andrew Errity

ugloss: a framework for improving spoken language generation understandability
Brian Langner, Alan W. Black

Combination of LSF and pole based parameter interpolation for model-based diphone concatenation
Karl Schnell, Arild Lacroix

Automatic building of synthetic voices from large multi-paragraph speech databases
Kishore Prahallad, Arthur R. Toth, Alan W. Black

Automatic phonetic segmentation of Spanish emotional speech
A. Gallardo-Antolín, R. Barra, Marc Schröder, Sacha Krstulović, J. M. Montero

Iterative unit selection with unnatural prosody detection
Dacheng Lin, Yong Zhao, Frank K. Soong, Min Chu, Jieyu Zhao



Improved Acoustic Modeling for ASR


Improved HMM/SVM methods for automatic phoneme segmentation
Jen-Wei Kuo, Hung-Yi Lo, Hsin-Min Wang

Gaussian mixture optimization for HMM based on efficient cross-validation
Takahiro Shinozaki, Tatsuya Kawahara

Model-space MLLR for trajectory HMMs
Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda

In-context phone posteriors as complementary features for tandem ASR
Hamed Ketabdar, Hervé Bourlard

Phone-discriminating minimum classification error (p-MCE) training for phonetic recognition
Qian Qian, Xiaodong He, Li Deng

Improved acoustic modeling for transcribing Arabic broadcast data
Lori Lamel, Abdel. Messaoudi, Jean-Luc Gauvain

String and lattice based discriminative training for the corpus of spontaneous Japanese lecture transcription task
Erik McDermott, Atsushi Nakamura

Discriminative noise adaptive training approach for an environment migration
Byung-Ok Kang, Ho-Young Jung, Yun-Keun Lee

Word confusability - measuring hidden Markov model similarity
Jia-Yu Chen, Peder A. Olsen, John R. Hershey

Speech recognition with state-based nearest neighbour classifiers
Thomas Deselaers, Georg Heigold, Hermann Ney

HMM-based speech recognition using decision trees instead of GMMs
Remco Teunen, Masami Akamine

An improved method for unsupervised training of LVCSR systems
Christian Gollan, Stefan Hahn, Ralf Schlüter, Hermann Ney

A variational approach to robust maximum likelihood estimation for speech recognition
Mohamed Kamal Omar

Generating small, accurate acoustic models with a modified Bayesian information criterion
Kai Yu, Rob A. Rutenbar

Sparse Gaussian graphical models for speech recognition
Peter Bell, Simon King

An HMM acoustic model incorporating various additional knowledge sources
Sakriani Sakti, Konstantin Markov, Satoshi Nakamura

Comparison of subspace methods for Gaussian mixture models in speech recognition
Matti Varjokallio, Mikko Kurimo



Systems for LVCSR and Rich Transcription I, II


The RWTH 2007 TC-STAR evaluation system for European English and Spanish
Jonas Lööf, Christian Gollan, Stefan Hahn, Georg Heigold, B. Hoffmeister, Christian Plahl, David Rybach, Ralf Schlüter, Hermann Ney

Using direction of arrival estimate and acoustic feature information in speaker diarization
Eugene Chin Wei Koh, Hanwu Sun, Tin Lay Nwe, Trung Hieu Nguyen, Bin Ma, Eng Siong Chng, Haizhou Li, Susanto Rahardja

Recovering punctuation marks for automatic speech recognition
Fernando Batista, Diamantino Caseiro, Nuno Mamede, Isabel Trancoso

Disfluency correction of spontaneous speech using conditional random fields with variable-length features
Jui-Feng Yeh, Chung-Hsien Wu, Wei-Yen Wu

Detection, diarization, and transcription of far-field lecture speech
Jing Huang, Etienne Marcheret, Karthik Visweswariah, Vit Libal, Gerasimos Potamianos

Speech-based annotation and retrieval of digital photographs
Timothy J. Hazen, Brennan Sherry, Mark Adler

Co-training using prosodic and lexical information for sentence segmentation
Umit Guz, Sébastien Cuendet, Dilek Hakkani-Tür, Gokhan Tur

Extracting true speaker identities from transcriptions
Yannick Estève, Sylvain Meignier, Paul Deléglise, Julie Mauclair

An improved speaker diarization system
Rong Fu, Ian D. Benest

The ISL 2007 English speech transcription system for European Parliament speeches
Sebastian Stüker, Christian Fügen, Florian Kraft, Matthias Wölfel

Advances in Mandarin broadcast speech recognition
Mei-Yuh Hwang, Wen Wang, Xin Lei, Jing Zheng, Ozgur Cetin, Gang Peng

Automatic transcription for a Web 2.0 service to search podcasts
Jun Ogata, Masataka Goto, Kouichirou Eto






First Language, Second Language, Cross-language


Perception and production of word-final alveolar stops by Brazilian Portuguese learners of English
Melissa Bettoni-Techio, Andréia S. Rauber, Rosana Denise Koerich

The relationship between the perception and production of English nasal codas by Brazilian learners of English
Denise Cristina Kluge, Andréia S. Rauber, Mara Silvia Reis, Ricardo A. Hoffmann Bion

CALL courseware for learning reactive tokens in face-to-face dialogs
Takafumi Utashiro, Goh Kawai

The developmental analysis of demonstrative expression skills utilizing a multimodal infant behavior corpus
Shinya Kiriyama, Ryo Tsuji, Tomohiko Kasami, Shogo Ishikawa, Naofumi Otani, Hiroaki Horiuchi, Yoichi Takebayashi, Shigeyoshi Kitazawa

Russian vowels system acoustic features development in ontogenesis
Elena E. Lyakso, Olga V. Frolova

The role of metrical stress in comprehension and production in Dutch children at-risk of dyslexia
Petra van Alphen, Elise de Bree, Paula Fikkert, Frank Wijnen

A statistical method of evaluating pronunciation proficiency for presentation in English
Seiichi Nakagawa, Kei Ohta

The intelligibility and its relations to acoustic characteristics of English /s/ and /esh/ produced by native speakers of Japanese
Akiyo Joto, Yoshiki Nagase, Seiya Funatsu

The limits of multidimensional category learning
Martijn Goudbeek, Daniel Swingley, Keith R. Kluender

Mobile adaptive CALL (MAC): a lightweight speech-based intervention for mobile language learners
Maria Uther, James Uther, Panos Athanasopoulos, Pushpendra Singh, Reiko Akahane-Yamada

English and French speakers' perception of voicing distinctions in non-native lateral consonant syllable onsets
Catherine T. Best, Pierre A. Hallé, Jennifer S. Pardo

Predicting the consequences of vocalizations in early infancy
Francisco Lacerda, Lisa Gustavsson

Learning tone distinctions for Mandarin Chinese
David Weenink, Guangqin Chen, Zongyan Chen, Stefan de Konink, Dennis Vierkant, Eveline van Hagen, R. J. J. H. van Son

Perception of disfluency: language differences and listener bias
Catherine Lai, Kyle Gorman, Jiahong Yuan, Mark Liberman



Systems for Spoken Language Translation I, II


Improved machine translation of speech-to-text outputs
Daniel Déchelotte, Holger Schwenk, Gilles Adda, Jean-Luc Gauvain

Improvements in machine translation for English/Iraqi speech translation
S. Saleem, K. Subramanian, R. Prasad, David Stallard, Chia-Lin Kao, P. Natarajan, R. Suleiman

Improving speech translation with automatic boundary prediction
Evgeny Matusov, Dustin Hillard, Mathew Magimai-Doss, Dilek Hakkani-Tür, Mari Ostendorf, Hermann Ney

Punctuating confusion networks for speech translation
Roldano Cattoni, Nicola Bertoldi, Marcello Federico

Integration of ASR and machine translation models in a document translation task
Aarthi Reddy, Richard Rose, Alain Désilets

Bilingual LSA-based translation lexicon adaptation for spoken language translation
Yik-Cheung Tam, Tanja Schultz

The BBN 2007 displayless English/Iraqi speech-to-speech translation system
David Stallard, Fred Choi, Chia-Lin Kao, Kriste Krstovski, P. Natarajan, R. Prasad, S. Saleem, K. Subramanian

Context dependent word modeling for statistical machine translation using part-of-speech tags
Ruhi Sarikaya, Yonggang Deng, Yuqing Gao

Translating conversational speech to standard linguistic form
Darren Scott Appling, Nick Campbell

Using inter-lingual triggers for machine translation
Caroline Lavecchia, Kamel Smaïli, David Langlois, Jean-Paul Haton

The IRST English-Spanish translation system for European Parliament speeches
Daniele Falavigna, Nicola Bertoldi, Fabio Brugnara, Roldano Cattoni, Mauro Cettolo, Boxing Chen, Marcello Federico, Diego Giuliani, Roberto Gretter, Deepa Gupta, Dino Seppi

The influence of utterance chunking on machine translation performance
Christian Fügen, Muntsin Kolss

IraqComm: a next generation translation system
Kristin Precoda, Jing Zheng, Dimitra Vergyri, Horacio Franco, Colleen Richey, Andreas Kathol, Sachin Kajarekar

Optimizing sentence segmentation for spoken language translation
Sharath Rao, Ian Lane, Tanja Schultz












Voice Activity Detection and Sound Classification


Speech-nonspeech discrimination using the information bottleneck method and spectro-temporal modulation index
Maria Markaki, Michael Wohlmayr, Yannis Stylianou

A uniformly most powerful test for statistical model-based voice activity detection
Keun Won Jang, Dong Kook Kim, Joon-Hyuk Chang

Direct optimisation of a multilayer perceptron for the estimation of cepstral mean and variance statistics
John Dines, Jithendra Vepa

Filtering the unknown: speech activity detection in heterogeneous video collections
Marijn Huijbregts, Chuck Wooters, Roeland Ordelman

Environmentally aware voice activity detector
Abhijeet Sangwan, Nitish Krishnamurthy, John H. L. Hansen

Noise robust voice activity detection based on switching Kalman filter
Masakiyo Fujimoto, Kentaro Ishizuka

Voice activity detection based on support vector machine using effective feature vectors
Q-Haing Jo, Yun-Sik Park, Kye-Hwan Lee, Ji-Hyun Song, Joon-Hyuk Chang

Voice activity detection in degraded speech using excitation source information
K Sri Rama Murty, B Yegnanarayana, S Guruprasad

Evaluation of real-time voice activity detection based on high order statistics
David Cournapeau, Tatsuya Kawahara

Robust voice activity detection based on adaptive sub-band energy sequence analysis and harmonic detection
Yanmeng Guo, Qian Qian, Yonghong Yan

The influence of speech activity detection and overlap on speaker diarization for meeting room recordings
Corinne Fredouille, Nicholas Evans

Voice activity detection using the phase vector in microphone array
Gibak Kim, Nam Ik Cho

Adaptive weighting of microphone arrays for distant-talking F0 and voiced/unvoiced estimation
Federico Flego, Christian Zieger, Maurizio Omologo

Robust and high-resolution voiced/unvoiced classification in noisy speech using a signal smoothness criterion
A. Sreenivasa Murthy, S. Chandra Sekhar, T. V. Sreenivas

Audio classification using extended Baum-Welch transformations
Tara N. Sainath, Victor Zue, Dimitri Kanevsky

Automatic laughter detection using neural networks
Mary Tai Knox, Nikki Mirghafori

Automatic acoustic segmentation for speech recognition on broadcast recordings
Gang Peng, Mei-Yuh Hwang, Mari Ostendorf


