ISCA Archive Eurospeech 2003 Sessions Booklet

8th European Conference on Speech Communication and Technology

Geneva, Switzerland
1-4 September 2003

General Chair: Hervé Bourlard



Speech Signal Processing 1-4


Speech analysis with the short-time chirp transform
Luis Weruaga, Marian Kepesi

Glottal spectrum based inverse filtering
Ixone Arroabarren, Alfonso Carlosena

A novel method of analysing and comparing responses of hearing aid algorithms using auditory time-frequency representation
G.V. Kiran, T.V. Sreenivas

Frequency-related representation of speech
Kuldip K. Paliwal, Bishnu S. Atal

Tracking a moving speaker using excitation source information
Vikas C. Raykar, Ramani Duraiswami, B. Yegnanarayana, S.R. Mahadeva Prasanna

Tracking vocal tract resonances using an analytical nonlinear predictor and a target-guided temporal constraint
Li Deng, Issam Bazzi, Alex Acero

Optimization of the CELP model in the LSP domain
Khosrow Lashkari, Toshio Miki

Transforming voice quality
Ben Gillett, Simon King

DOA estimation of speech signal using equilateral-triangular microphone array
Yusuke Hioka, Nozomu Hamada

Multi-array fusion for beamforming and localization of moving speakers
Ilyas Potamitis, George Tremoulis, Nikos Fakotakis, George Kokkinakis

Integrated pitch and MFCC extraction for speech reconstruction and speech recognition applications
Xu Shao, Ben P. Milner, Stephen J. Cox

Exploiting time warping in AMR-NB and AMR-WB speech coders
Lasse Laaksonen, Sakari Himanen, Ari Heikkinen, Jani Nurminen

A new approach to voice activity detection based on self-organizing maps
Stephan Grashey

Estimating the spectral envelope of voiced speech using multi-frame analysis
Yoshinori Shiga, Simon King

Adaptive noise estimation using second generation and perceptual wavelet transforms
Essa Jafer, Abdulhussain E. Mahdi

A clustering approach to on-line audio source separation
Julien Bourgeois

Estimation of voice source and vocal tract characteristics based on multi-frame analysis
Yoshinori Shiga, Simon King

A new method for pitch prediction from spectral envelope and its application in voice conversion
Taoufik En-Najjary, Olivier Rosec, Thierry Chonavel

Maximum likelihood endpoint detection with time-domain features
Marco Orlandi, Alfiero Santarelli, Daniele Falavigna

Unified analysis of glottal source spectrum
Ixone Arroabarren, Alfonso Carlosena

Local regularity analysis at glottal opening and closure instants in electroglottogram signal using wavelet transform modulus maxima
Aicha Bouzid, Noureddine Ellouze

Improved robustness of automatic speech recognition using a new class definition in linear discriminant analysis
M. Schaffoner, M. Katz, S.E. Kruger, A. Wendemuth

Voice conversion methods for vocal tract and pitch contour modification
Oytun Turk, Levent M. Arslan

Modulation spectrum for pitch and speech pause detection
Olaf Schreiner

Robust energy demodulation based on continuous models with application to speech recognition
Dimitrios Dimitriadis, Petros Maragos

A robust and sensitive word boundary decision algorithm
Jong Uk Kim, SangGyun Kim, Chang D. Yoo

A novel transcoding algorithm for SMV and G.723.1 speech coders via direct parameter transformation
Seongho Seo, Dalwon Jang, Sunil Lee, Chang D. Yoo

A novel rate selection algorithm for transcoding CELP-type codec and SMV
Dalwon Jang, Seongho Seo, Sunil Lee, Chang D. Yoo

Subband-based acoustic shock limiting algorithm on a low-resource DSP system
G. Choy, D. Hermann, R.L. Brennan, T. Schneider, H. Sheikhzadeh, E. Cornu

Pitch estimation using phase locked loops
Patricia A. Pelle, Matias L. Capeletto

Performance evaluation of IFAS-based fundamental frequency estimator in noisy environment
Dhany Arifianto, Takao Kobayashi

Estimation of the parameters of the quantitative intonation model with continuous wavelet analysis
Hans Kruschke, Michael Lenz

Morphological filtering of speech spectrograms in the context of additive noise
Francisco Romero Rodriguez, Wei M. Liu, Nicholas W.D. Evans, John S.D. Mason

Segmenting multiple concurrent speakers using microphone arrays
Guillaume Lathoud, Iain A. McCowan, Darren C. Moore

Segmentation of speech into syllable-like units
T. Nagarajan, Hema A. Murthy, Rajesh M. Hegde

A syllable segmentation algorithm for English and Italian
Massimo Petrillo, Francesco Cutugno

Modeling speaking rate for voice fonts
Ashish Verma, Arun Kumar

A new HMM-based approach to broad phonetic classification of speech
Jouni Pohjalainen

Acoustic change detection and segment clustering of two-way telephone conversations
Xin Zhong, Mark A. Clements, Sung Lim

Blind normalization of speech from different channels
David N. Levin

Speech watermarking by parametric embedding with an l_∞ fidelity criterion
A.R. Gurijala, J.R. Deller Jr.


Phonology and Phonetics I


Features of contracted syllables of spontaneous Mandarin
Shu-Chuan Tseng

Durational characteristics of Hindi stop consonants
K. Samudravijaya

Quantity comparison of Japanese and Finnish in various word structures
Toshiko Isei-Jaakkola

Broad focus across sentence types in Greek
Mary Baltazani

Analysis and modeling of syllable duration for Thai speech synthesis
Chatchawarn Hansakunbuntheung, Virongrong Tesprasit, Rungkarn Siricharoenchai, Yoshinori Sagisaka

Reaction time as an indicator of discrete intonational contrasts in English
Aoju Chen

Corpus-based syntax-prosody tree matching
Dafydd Gibbon

A new approach to segment and detect syllables from high-speed speech
D.W. Ying, W. Gao, W.Q. Wang

Information structure and efficiency in speech production
R.J.J.H. van Son, Louis C.W. Pols

Learning rule ranking by dynamic construction of context-free grammars using AND/OR graphs
Anna Corazza, Louis ten Bosch

The effect of surrounding phrase lengths on pause duration
Elena Zvonik, Fred Cummins

Statistical estimation of phoneme's most stable point based on universal constraint
Shigeki Okawa, Katsuhiko Shirai

Independent automatic segmentation by self-learning categorial pronunciation rules
N. Beringer

Prosodic correlates of contrastive and non-contrastive themes in German
Bettina Braun, D. Robert Ladd

Accentual lengthening in standard Chinese: evidence from four-syllable constituents
Yiya Chen

Syllable structure based phonetic units for context-dependent continuous Thai speech recognition
Supphanat Kanokphara

An acoustic phonetic analysis of diphthongs in Ningbo Chinese
Fang Hu

Latent ability to manipulate phonemes by Japanese preliterates in Roman alphabet
Takashi Otake, Yoko Sakamoto

The /i/-/a/-/u/-ness of spoken vowels
Hartmut R. Pfitzinger


Topics in Prosody and Emotional Speech


Transforming F0 contours
Ben Gillett, Simon King

Evaluation of the affect of speech intonation using a model of the perception of interval dissonance and harmonic tension
Norman D. Cook, Takeshi Fujisawa, Kazuaki Takami

A new pitch modeling approach for Mandarin speech
Wen-Hsing Lai, Yih-Ru Wang, Sin-Horng Chen

Bayesian induction of intonational phrase breaks
P. Zervas, M. Maragoudakis, Nikos Fakotakis, George Kokkinakis

Predicting the perceptive judgment of voices in a telecom context: selection of acoustic parameters
T. Ehrette, N. Chateau, Christophe d'Alessandro, V. Maffiolo

Stress-based speech segmentation revisited
Sven L. Mattys

Emotion recognition by speech signals
Oh-Wook Kwon, Kwokleung Chan, Jiucang Hao, Te-Won Lee

Automatic prosodic prominence detection in speech using acoustic features: an unsupervised system
Fabio Tamburini

Improved emotion recognition with large set of statistical features
Vladimir Hozjan, Zdravko Kacic

Recognition of intonation patterns in Thai utterance
Patavee Charnvivit, Nuttakorn Thubthong, Ekkarit Maneenoi, Sudaporn Luksaneeyanawin, Somchai Jitapunkul

Use of linguistic information for automatic extraction of f_0 contour generation process model parameters
Keikichi Hirose, Yusuke Furuyama, Shuichi Narusawa, Nobuaki Minematsu, Hiroya Fujisaki

Potential audiovisual correlates of contrastive focus in French
Marion Dohen, Hélène Loevenbruck, Marie-Agnes Cathiard, Jean-Luc Schwartz

How does human segment the speech by prosody?
Toshie Hatano, Yasuo Horiuchi, Akira Ichikawa

Language-reconfigurable universal phone recognition
B.D. Walker, B.C. Lackey, J.S. Muller, P.J. Schone

Emotion recognition using a data-driven fuzzy inference system
Chul Min Lee, Shrikanth Narayanan

Effects of voice prosody by computers on human behaviors
Noriko Suzuki, Yohei Yabuta, Yugo Takeuchi, Yasuhiro Katagiri

An investigation of intensity patterns for German
Oliver Jokisch, Marco Kuhne

Segmental durations predicted with a neural network
Joao Paulo Teixeira, Diamantino Freitas

Generation and perception of f_0 markedness in conversational speech with adverbs expressing degrees
Takumi Yamashita, Yoshinori Sagisaka

Quantitative analysis and synthesis of syllabic tones in Vietnamese
Hansjorg Mixdorff, Nguyen Hung Bach, Hiroya Fujisaki, Mai Chi Luong

Japanese prosodic labeling support system utilizing linguistic information
Shinya Kiriyama, Yoshifumi Mitsuta, Yuta Hosokawa, Yoshikazu Hashimoto, Toshihiko Ito, Shigeyoshi Kitazawa

Why and how to control the authentic emotional speech corpora
Veronique Auberge, Nicolas Audibert, Albert Rilliard

Prosodic cues for emotion characterization in real-life spoken dialogs
Laurence Devillers, Ioana Vasilescu


Language Modeling, Discourse and Dialog


Towards the automatic generation of mixed-initiative dialogue systems from web content
Joseph Polifroni, Grace Chung, Stephanie Seneff

A context resolution server for the Galaxy conversational systems
Edward Filisko, Stephanie Seneff

Semantic and dialogic annotation for automated multilingual customer service
Hilda Hardy, Kirk Baker, Hélène Bonneau-Maynard, Laurence Devillers, Sophie Rosset, Tomek Strzalkowski

Disfluency under feedback and time-pressure
H.B.M. Nicholson, E.G. Bard, A.H. Anderson, M.L. Flecha-Garcia, D. Kenicer, L. Smallwood, J. Mullin, R.J. Lickley, Y. Chen

Control in task-oriented dialogues
Peter A. Heeman, Fan Yang, Susan E. Strayer

The 300k LIMSI German broadcast news transcription system
Kevin McTait, Martine Adda-Decker

Weighted entropy training for the decision tree based text-to-phoneme mapping
Jilei Tian, Janne Suontausta, Juha Hakkinen

Word class modeling for speech recognition with out-of-task words using a hierarchical language model
Yoshihiko Ogawa, Hirofumi Yamamoto, Yoshinori Sagisaka, Genichiro Kikui

Compound decomposition in Dutch large vocabulary speech recognition
Roeland Ordelman, Arjan van Hessen, Franciska de Jong

Designing for errors: similarities and differences of disfluency rates and prosodic characteristics across domains
Guergana Savova, Joan Bachenko

Syllable classification using articulatory-acoustic features
Mirjam Wester

Hierarchical class n-gram language models: towards better estimation of unseen events in speech recognition
Imed Zitouni, Olivier Siohan, Chin-Hui Lee

Incremental and iterative monolingual clustering algorithms
Sergio Barrachina, Juan Miguel Vilar

Techniques for effective vocabulary selection
Anand Venkataraman, Wen Wang

Recognition of out-of-vocabulary words with sub-lexical language models
Lucian Galescu

A semantic representation for spoken dialogs
Hélène Bonneau-Maynard, Sophie Rosset

A corpus-based decompounding algorithm for German lexical modeling in LVCSR
Martine Adda-Decker

Modeling cross-morpheme pronunciation variations for Korean large vocabulary continuous speech recognition
Kyong-Nim Lee, Minhwa Chung


Speech Synthesis: Unit Selection 1, 2


Unit selection based on voice recognition
Yi Zhou, Yiqing Zu

On unit analysis for Cantonese corpus-based TTS
Jun Xu, Thomas Choy, Minghui Dong, Cuntai Guan, Haizhou Li

Unit selection in concatenative TTS synthesis systems based on mel filter bank amplitudes and phonetic context
T. Lambert, Andrew P. Breen, Barry Eggleton, Stephen J. Cox, Ben P. Milner

Text design for TTS speech corpus building using a modified greedy selection
Baris Bozkurt, Ozlem Ozturk, Thierry Dutoit

Discriminative weight training for unit-selection based speech synthesis
Seung Seop Park, Chong Kyu Kim, Nam Soo Kim

The application of interactive speech unit selection in TTS systems
Peter Rutten, Justin Fackrell

On the design of cost functions for unit-selection speech synthesis
Francisco Campillo Diaz, Eduardo R. Banga

Kalman-filter based join cost for unit-selection speech synthesis
Jithendra Vepa, Simon King

Optimizing integrated cost function for segment selection in concatenative speech synthesis based on perceptual evaluations
Tomoki Toda, Hisashi Kawai, Minoru Tsuzaki

Automatic segmentation for Czech concatenative speech synthesis using statistical approach with boundary-specific correction
Jindrich Matousek, Daniel Tihelka, Josef Psutka

Automatic speech segmentation and verification for concatenative synthesis
Chih-Chung Kuo, Chi-Shiang Kuo, Jau-Hung Chen, Sen-Chia Chang

DTW-based phonetic alignment using multiple acoustic features
Sergio Paulo, Luis C. Oliveira

Evaluating and correcting phoneme segmentation for unit selection synthesis
John Kominek, Christina L. Bennett, Alan W. Black

Control and prediction of the impact of pitch modification on synthetic speech quality
Esther Klabbers, Jan P.H. van Santen

My voice, your prosody: sharing a speaker specific prosody model across speakers in unit selection TTS
Matthew Aylett, Justin Fackrell, Peter Rutten

Learning phrase break detection in Thai text-to-speech
Virongrong Tesprasit, Paisarn Charoenpornsawat, Virach Sornlertlamvanich

A speech model of acoustic inventories based on asynchronous interpolation
Alexander B. Kain, Jan P.H. van Santen

Corpus-based synthesis of fundamental frequency contours of Japanese using automatically-generated prosodic corpus and generation process model
Keikichi Hirose, Takayuki Ono, Nobuaki Minematsu

Unit size in unit selection speech synthesis
S.P. Kishore, Alan W. Black

Restricted unlimited domain synthesis
Antje Schweitzer, Norbert Braunschweiler, Tanja Klankert, Bernd Möbius, Bettina Sauberlich

Evaluation of units selection criteria in corpus-based speech synthesis
Hélène Francois, Olivier Boeffard

Combining non-uniform unit selection with diphone based synthesis
Michael Pucher, Friedrich Neubarth, Erhard Rank, Georg Niklfeld, Qi Guan

Evolutionary weight tuning based on diphone pairs for unit selection speech synthesis
Francesc Alias, Xavier Llora

Keeping rare events rare
Ove Andersen, Charles Hoequist






Speech Modeling and Features 1-4


Linear predictive method with low-frequency emphasis
Paavo Alku, Tom Backstrom

Beyond a single critical-band in TRAP based ASR
Pratibha Jain, Hynek Hermansky

Variational Bayesian GMM for speech recognition
Fabio Valente, Christian Wellekens

Time alignment for scenario and sounds with voice, music and BGM
Yamato Wada, Masahide Sugiyama

Efficient quantization of speech excitation parameters using temporal decomposition
Phu Chien Nguyen, Masato Akagi

Distributed genetic algorithm to discover a wavelet packet best basis for speech recognition
Robert van Kommer, Beat Hirsbrunner

New model-based HMM distances with applications to run-time ASR error estimation and model tuning
Chao-Shih Huang, Chin-Hui Lee, Hsiao-Chuan Wang

Analysis of voice source characteristics using a constrained polynomial model
Tokihiko Kaburagi, Koji Kawai

Tone pattern discrimination combining parametric modeling and maximum likelihood estimation
Jinfu Ni, Hisashi Kawai

Feature selection for the classification of crosstalk in multi-channel audio
Stuart N. Wrigley, Guy J. Brown, Vincent Wan, Steve Renals

A DTW-based DAG technique for speech and speaker feature analysis
Jingwei Liu

Feature transformations and combinations for improving ASR performance
Panu Somervuo, Barry Chen, Qifeng Zhu

On the role of intonation in the organization of Mandarin Chinese speech prosody
Chiu-yu Tseng

An optimized multi-duration HMM for spontaneous speech recognition
Yuichi Ohkawa, Akihiro Yoshida, Motoyuki Suzuki, Akinori Ito, Shozo Makino

Speaker recognition using MPEG-7 descriptors
Hyoung-Gook Kim, Edgar Berdahl, Nicolas Moreau, Thomas Sikora

A comparative study on maximum entropy and discriminative training for acoustic modeling in automatic speech recognition
Wolfgang Macherey, Hermann Ney

Extraction methods of voicing feature for robust speech recognition
Andras Zolnay, Ralf Schluter, Hermann Ney

Use of a CSP-based voice activity detector for distant-talking ASR
Luca Armani, Marco Matassoni, Maurizio Omologo, Piergiorgio Svaizer

Maximum conditional mutual information projection for speech recognition
Mohamed Kamal Omar, Mark Hasegawa-Johnson

A computational model of arm gestures in conversation
Dafydd Gibbon, Ulrike Gut, Benjamin Hell, Karin Looks, Alexandra Thies, Thorsten Trippel

Nonlinear analysis of speech signals: generalized dimensions and Lyapunov exponents
Vassilis Pitsikalis, Iasonas Kokkinos, Petros Maragos

Time-domain based temporal processing with application of orthogonal transformations
Petr Motlicek, Jan Cernocký

Recognition of phoneme strings using TRAP technique
Petr Schwarz, Pavel Matejka, Jan Cernocký

Comparative study on Hungarian acoustic model sets and training methods
Tibor Fegyo, Peter Mihajlik, Peter Tatai

F_0 estimation of one or several voices
Alain de Cheveigne, Alexis Baskind

In search of target class definition in tandem feature extraction
Sunil Sivadas, Hynek Hermansky

Segmentation of speech for speaker and language recognition
Andre G. Adami, Hynek Hermansky

Feature generation based on maximum classification probability for improved speech recognition
Xiang Li, Richard M. Stern

Speech recognition with a generative factor analyzed hidden Markov model
Kaisheng Yao, Kuldip K. Paliwal, Te-Won Lee

Learning discriminative temporal patterns in speech: development of novel TRAPS-like classifiers
Barry Chen, Shuangyu Chang, Sunil Sivadas

Using mutual information to design class-specific phone recognizers
Patricia Scanlon, Daniel P.W. Ellis, Richard Reilly

Estimation of GMM in voice conversion including unaligned data
Helenca Duxans, Antonio Bonafonte

Trajectory modeling based on HMMs with the explicit relationship between static and dynamic features
Keiichi Tokuda, Heiga Zen, Tadashi Kitamura

On the advantage of frequency-filtering features for speech recognition with variable sampling frequencies. Experiments with SpeechDat-Car databases
Hermann Bauerecker, Climent Nadeu, Jaume Padrell

Towards the automatic extraction of Fujisaki model parameters for Mandarin
Hansjorg Mixdorff, Hiroya Fujisaki, Gao Peng Chen, Yu Hu

Product of Gaussians as a distributed representation for speech recognition
S.S. Airey, M.J.F. Gales

Harmonic weighting for all-pole modeling of the voiced speech
Davor Petrinovic

Estimation of resonant characteristics based on AR-HMM modeling and spectral envelope conversion of vowel sounds
Nobuyuki Nishizawa, Keikichi Hirose, Nobuaki Minematsu

Band-independent speech-event categories for TRAP based ASR
Hynek Hermansky, Pratibha Jain

Local averaging and differentiating of spectral plane for TRAP-based ASR
Frantisek Grezl, Hynek Hermansky

Minimum variance distortionless response on a warped frequency scale
Matthias Wolfel, John McDonough, Alex Waibel

Improving the efficiency of automatic speech recognition by feature transformation and dimensionality reduction
Xuechuan Wang, Douglas O'Shaughnessy

Distributed speech recognition on the WSJ task
Jan Stadermann, Gerhard Rigoll

Integrating multilingual articulatory features into speech recognition
Sebastian Stuker, Florian Metze, Tanja Schultz, Alex Waibel

Locus equations determination using the SpeechDat(II)
Bojan Petek

A memory-based approach to Cantonese tone recognition
Michael Emonts, Deryle Lonsdale

Experimental evaluation of the relevance of prosodic features in Spanish using machine learning techniques
David Escudero, Valentin Cardenoso, Antonio Bonafonte

Dominance spectrum based V/UV classification and f_0 estimation
Tomohiro Nakatani, Toshio Irino, Parham Zolfaghari

Analysis and modeling of f_0 contours of Portuguese utterances based on the command-response model
Hiroya Fujisaki, Shuichi Narusawa, Sumio Ohno, Diamantino Freitas

Covariation and weighting of harmonically decomposed streams for ASR
Philip J.B. Jackson, David M. Moreno, Martin J. Russell, Javier Hernando


Speech Enhancement 1, 2


A semi-blind source separation method for hands-free speech recognition of multiple talkers
Panikos Heracleous, Satoshi Nakamura, Kiyohiro Shikano

Influence of the waveguide propagation on the antenna performance in a car cabin
Leonid Krasny, Ali Khayrallah

Multi-speaker DOA tracking using interactive multiple models and probabilistic data association
Ilyas Potamitis, George Tremoulis, Nikos Fakotakis

Speech enhancement using weighting function based on the variance of wavelet coefficients
Ching-Ta Lu, Hsiao-Chuan Wang

Microphone array voice activity detection and noise suppression using wideband generalized likelihood ratio
Ilyas Potamitis, Eran Fishler

Adaptive beamforming in room with reverberation
Zoran Saric, Slobodan Jovicic

Perceptually-constrained generalized singular value decomposition-based approach for enhancing speech corrupted by colored noise
Gwo-hwa Ju, Lin-shan Lee

Blind separation and deconvolution for convolutive mixture of speech using SIMO-model-based ICA and multichannel inverse filtering
Hiroaki Yamajo, Hiroshi Saruwatari, Tomoya Takatani, Tsuyoki Nishikawa, Kiyohiro Shikano

Quality enhancement of CELP coded speech by using an MFCC based Gaussian mixture model
D.G. Raza, C.F. Chan

Enhancement of noisy speech for noise robust front-end and speech reconstruction at back-end of DSR system
Hyoung-Gook Kim, Markus Schwab, Nicolas Moreau, Thomas Sikora

Improved Kalman filter-based speech enhancement
Jianqiang Wei, Limin Du, Zhaoli Yan, Hui Zeng

Speech segregation based on fundamental event information using an auditory vocoder
Toshio Irino, Roy D. Patterson, Hideki Kawahara

Time delay estimation based on hearing characteristic
Zhaoli Yan, Limin Du, Jianqiang Wei, Hui Zeng

Parametric multi-band automatic gain control for noisy speech enhancement
M. Stolbov, S. Koval, M. Khitrov

Neural networks versus codebooks in an application for bandwidth extension of speech signals
Bernd Iser, Gerhard Schmidt

Wavelet-based perceptual speech enhancement using adaptive threshold estimation
Essa Jafer, Abdulhussain E. Mahdi

A trainable speech enhancement technique based on mixture models for speech and noise
Ilyas Potamitis, Nikos Fakotakis, George Kokkinakis

Perceptual wavelet adaptive denoising of speech
Qiang Fu, Eric A. Wan

Enhancement of speech in multispeaker environment
B. Yegnanarayana, S.R. Mahadeva Prasanna, Mathew Magimai Doss

Noise reduction using paired-microphones on non-equally-spaced microphone arrangement
Mitsunori Mizumachi, Satoshi Nakamura

Improving speech intelligibility by steady-state suppression as pre-processing in small to medium sized halls
Nao Hodoshima, Takayuki Arai, Tsuyoshi Inoue, Keisuke Kinoshita, Akiko Kusumoto

Enhancement of hearing-impaired Mandarin speech
Chen-Long Lee, Ya-Ru Yang, Wen-Whei Chang, Yuan-Chuan Chiang

Speech enhancement for a car environment using LP residual signal and spectral subtraction
A. Alvarez, V. Nieto, P. Gomez, R. Martinez

Speech enhancement and improved recognition accuracy by integrating wavelet transform and spectral subtraction algorithm
Gwo-hwa Ju, Lin-shan Lee

Multi-referenced correction of the voice timbre distortions in telephone networks
Gael Mahe, Andre Gilloire

Efficient speech enhancement based on left-right HMM with state sequence detection using LRT
J.J. Lee, J.H. Lee, K.Y. Lee

Introduction of the CELP structure of the GSM coder in the acoustic echo canceller for the GSM network
H. Gnaba, M. Turki-Hadj Alouane, M. Jaidane-Saidane, P. Scalart

Extracting an AV speech source from a mixture of signals
David Sodoyer, Laurent Girin, Christian Jutten, Jean-Luc Schwartz

Speech enhancement for hands-free car phones by adaptive compensation of harmonic engine noise components
Henning Puder

Enhance low-frequency suppression of GSC beamforming
Zhaorong Hou, Ying Jia

Speech enhancement using a-priori information
Sriram Srinivasan, Jonas Samuelsson, W. Bastiaan Kleijn

Blind inversion of multidimensional functions for speech enhancement
John Hogden, Patrick Valdez, Shigeru Katagiri, Erik McDermott

Convergence improvement for oversampled subband adaptive noise and echo cancellation
H.R. Abutalebi, H. Sheikhzadeh, R.L. Brennan, G.H. Freeman

A speech dereverberation method based on the MTF concept
Masashi Unoki, Keigo Sakata, Masato Akagi

Accuracy improved double-talk detector based on state transition diagram
SangGyun Kim, Jong Uk Kim, Chang D. Yoo

Perceptual based speech enhancement for normal-hearing and hearing-impaired individuals
Ajay Natarajan, John H.L. Hansen, Kathryn Arehart, Jessica A. Rossi-Katz

Residual echo power estimation for speech reinforcement systems in vehicles
Alfonso Ortega, Eduardo Lleida, Enrique Masgrau

Dual-mode wideband speech recovery from narrowband speech
Yasheng Qian, Peter Kabal

A robust noise and echo canceller
Khaldoon Al-Naimi, Christian Sturt, Ahmet Kondoz

Computational auditory scene analysis by using statistics of high-dimensional speech dynamics and sound source direction
Johannes Nix, Michael Kleinschmidt, Volker Hohmann


Spoken Dialog Systems 1, 2


Two studies of open vs. directed dialog strategies in spoken dialog systems
Silke M. Witt, Jason D. Williams

The Queen's Communicator: an object-oriented dialogue manager
Ian O'Neill, Philip Hanna, Xingkun Liu, Michael McTear

RavenClaw: dialog management using hierarchical task decomposition and an expectation agenda
Dan Bohus, Alexander I. Rudnicky

Features for tree based dialogue course management
Klaus Macherey, Hermann Ney

Development of a stochastic dialog manager driven by semantics
Francisco Torres, Emilio Sanchis, Encarna Segarra

Generation of natural response timing using decision tree based on prosodic and linguistic information
Masashi Takeuchi, Norihide Kitaoka, Seiichi Nakagawa

Child and adult speaker adaptation during error resolution in a publicly available spoken dialogue system
Linda Bell, Joakim Gustafson

Conceptual decoding for spoken dialog systems
Yannick Esteve, Christian Raymond, Frédéric Bechet, Renato De Mori

Sentence verification in spoken dialogue system
Huei-Ming Wang, Yi-Chung Lin

Detection and recognition of correction utterance in spontaneously spoken dialog
Norihide Kitaoka, Naoko Kakutani, Seiichi Nakagawa

Topic-specific parser design in an air travel natural language understanding application
Chaitanya J.K. Ekanadham, Juan M. Huerta

The use of confidence measures in vector based call-routing
Stephen J. Cox, Gavin Cawley

Multi-channel sentence classification for spoken dialogue language modeling
Frédéric Bechet, Giuseppe Riccardi, Dilek Z. Hakkani-Tur

Automatic induction of n-gram language models from a natural language grammar
Stephanie Seneff, Chao Wang, Timothy J. Hazen

Connectionist classification and specific stochastic models in the understanding process of a dialogue system
David Vilar, Maria Jose Castro, Emilio Sanchis

Robust parsing of utterances in negotiative dialogue
Johan Boye, Mats Wiren

Flexible speech act identification of spontaneous speech with disfluency
Chung-Hsien Wu, Gwo-Lang Yan

Efficient spoken dialogue control depending on the speech recognition rate and system's database
Kohji Dohsaka, Norihito Yasuda, Kiyoaki Aikawa

Robust speech understanding based on expected discourse plan
Shin-ya Takahashi, Tsuyoshi Morimoto, Sakashi Maeda, Naoyuki Tsuruta

A study on domain recognition of spoken dialogue systems
T. Isobe, S. Hayakawa, H. Murao, T. Mizutani, Kazuya Takeda, Fumitada Itakura

Domain adaptation augmented by state-dependence in spoken dialog systems
Wei He, Honglian Li, Baozong Yuan

SmartKom-Home - an advanced multi-modal interface to home entertainment
Thomas Portele, Silke Goronzy, Martin Emele, Andreas Kellner, Sunna Torge, Jurgen te Vrugt

Methods to improve its portability of a spoken dialog system both on task domains and languages
Yunbiao Xu, Fengying Di, Masahiro Araki, Yasuhisa Niimi

Voxenter^TM - intelligent voice enabled call center for Hungarian
Tibor Fegyo, Peter Mihajlik, Mate Szarvas, Peter Tatai, Gabor Tatai

Automatic call-routing without transcriptions
Qiang Huang, Stephen J. Cox

Jaspis^2 - an architecture for supporting distributed spoken dialogues
Markku Turunen, Jaakko Hakulinen

Development of a bilingual spoken dialog system for weather information retrieval
Janez Zibert, Sanda Martincic-Ipsic, Melita Hajdinjak, Ivo Ipsic, France Mihelic

Improving "How May I Help You?" systems using the output of recognition lattices
James Allen, David Attwater, Peter Durston, Mark Farrell

Incremental learning of new user formulations in automatic directory assistance
M. Andorno, L. Fissore, P. Laface, M. Nigra, C. Popovici, F. Ravera, C. Vair

Dialog systems for automotive environments
Julie A. Baca, Feng Zheng, Hualin Gao, Joseph Picone

The development of a multi-purpose spoken dialogue system
Joao P. Neto, Nuno J. Mamede, Renato Cassaca, Luis C. Oliveira

The dynamic, multi-lingual lexicon in SmartKom
Silke Goronzy, Zica Valsan, Martin Emele, Juergen Schimanowski

Evaluating discourse understanding in spoken dialogue systems
Ryuichiro Higashinaka, Noboru Miyazaki, Mikio Nakano, Kiyoaki Aikawa

Assessment of spoken dialogue system usability - what are we really measuring?
Lars Bo Larsen

Evaluation of a speech-driven telephone information service using the PARADISE framework: a closer look at subjective measures
Paula M.T. Smeele, Juliette A.J.S. Waals

Quantifying the impact of system characteristics on perceived quality dimensions of a spoken dialogue service
Sebastian Moller, Janto Skowronek

A programmable policy manager for conversational biometrics
Ganesh N. Ramaswamy, Ran D. Zilca, Oleg Alecksandrovich

Integration of speaker recognition into conversational spoken dialogue systems
Timothy J. Hazen, Douglas A. Jones, Alex Park, Linda C. Kukolich, Douglas A. Reynolds






Topics in Speech Recognition and Segmentation


Utterance verification under distributed detection and fusion framework
Taeyoon Kim, Hanseok Ko

Joint estimation of thresholds in a bi-threshold verification problem
Simon Ho, Brian Mak

Confidence measures for phonetic segmentation of continuous speech
Samir Nefti, Olivier Boeffard, Thierry Moudenc

Using confidence measures and domain knowledge to improve speech recognition
Pascal Wiggers, Leon J.M. Rothkrantz

Isolated word verification using cohort word-level verification
K. Thambiratnam, Sridha Sridharan

A new approach to minimize utterance verification error rate for a specific operating point
Wing-Hei Au, Man-Hung Siu

Continuous speech recognition and verification based on a combination score
Binfeng Yan, Rui Guo, Xiaoyan Zhu

Impact of word graph density on the quality of posterior probability based confidence measures
Tibor Fabian, Robert Lieb, Gunther Ruske, Matthias Thomae

An efficient keyword spotting technique using a complementary language for filler models training
Panikos Heracleous, Tohru Shimizu

Context-sensitive evaluation and correction of phone recognition output
Michael Levit, Hiyan Alshawi, Allen Gorin, Elmar Nöth

Estimating speech recognition error rate without acoustic test data
Yonggang Deng, Milind Mahajan, Alex Acero

Multigram-based grapheme-to-phoneme conversion for LVCSR
M. Bisani, Hermann Ney

Integrating statistical and rule-based knowledge for continuous German speech recognition
Rene Beutler, Beat Pfister

A fast, accurate and stream-based speaker segmentation and clustering algorithm
An Vandecatseye, Jean-Pierre Martens

A sequential metric-based audio segmentation method via the Bayesian information criterion
Shi-sian Cheng, Hsin-Min Wang

Sentence boundary detection in Arabic speech
Amit Srivastava, Francis Kubala

Automated transcription and topic segmentation of large spoken archives
Martin Franz, Bhuvana Ramabhadran, Todd Ward, Michael Picheny

Automatic disfluency identification in conversational speech using multiple knowledge sources
Yang Liu, Elizabeth Shriberg, Andreas Stolcke

Topic segmentation and retrieval system for lecture videos based on spontaneous speech recognition
Natsuo Yamamoto, Jun Ogata, Yasuo Ariki





Speech Coding and Transmission


Optimization of window and LSF interpolation factor for the ITU-T G.729 speech coding standard
Wai C. Chu, Toshio Miki

Likelihood ratio test with complex Laplacian model for voice activity detection
Joon-Hyuk Chang, Jong-Won Shin, Nam Soo Kim

Multi-mode quantization of adjacent speech parameters using a low-complexity prediction scheme
Jani Nurminen

Multi-mode matrix quantizer for low bit rate LSF quantization
Ulpu Sinervo, Jani Nurminen, Ari Heikkinen, Jukka Saarinen

Voicing controlled frame loss concealment for adaptive multi-rate (AMR) speech frames in voice-over-IP
Frank Mertz, Herve Taddei, Imre Varga, Peter Vary

Perceptual irrelevancy removal in narrowband speech coding
Marja Lahdekorpi, Jani Nurminen, Ari Heikkinen, Jukka Saarinen

Very-low-rate speech compression by indexation of polyphones
Charles du Jeu, Maurice Charbit, Gérard Chollet

Entropy-optimized channel error mitigation with application to speech recognition over wireless
Victoria Sanchez, Antonio M. Peinado, Angel M. Gomez, Jose L. Perez-Cordoba

Robust jointly optimized multistage vector quantization for speech coding
Venkatesh Krishnan, David V. Anderson

Polar quantization of sinusoids from speech signal blocks
Harald Pobloth, Renat Vafin, W. Bastiaan Kleijn

Transcoding algorithm for G.723.1 and AMR speech coders: for interoperability between VoIP and mobile networks
Sung-Wan Yoon, Jin-Kyu Choi, Hong-Goo Kang, Dae-Hee Youn

Quality-complexity trade-off in predictive LSF quantization
Davorka Petrinovic, Davor Petrinovic

Variable bit rate control with trellis diagram approximation
Kei Kikuiri, Nobuhiko Naka, Tomoyuki Ohya

Towards optimal encoding for classification with applications to distributed speech recognition
Naveen Srinivasamurthy, Antonio Ortega, Shrikanth Narayanan

Multi-rate extension of the scalable to lossless PSPIHT audio coder
Mohammed Raad, Ian Burnett, Alfred Mertins

Entropy constrained quantization of LSP parameters
Turaj Zakizadeh Shabestary, Per Hedelin, Fredrik Norden


Speech Recognition - Search and Lexicon Modeling


Named entity extraction from Japanese broadcast news
Akio Kobayashi, Franz J. Och, Hermann Ney

Morpheme-based lexical modeling for Korean broadcast news transcription
Young-Hee Park, Dong-Hoon Ahn, Minhwa Chung

Data driven example based continuous speech recognition
Mathias De Wachter, Kris Demuynck, Dirk van Compernolle, Patrick Wambacq

Large vocabulary speaker independent isolated word recognition for embedded systems
Sergey Astrov, Bernt Andrassy

Low-latency incremental speech transcription in the Synface project
Alexander Seward

Multilingual acoustic modeling using graphemes
S. Kanthak, Hermann Ney

A cross-media retrieval system for lecture videos
Atsushi Fujii, Katunobu Itou, Tomoyosi Akiba, Tetsuya Ishikawa

Building a test collection for speech-driven web retrieval
Atsushi Fujii, Katunobu Itou

Confidence measure driven scalable two-pass recognition strategy for large list grammars
Miroslav Novak, Diego Ruiz

An efficient, fast matching approach using posterior probability estimates in speech recognition
Sherif Abdou, Michael S. Scordilis

On lexicon creation for Turkish LVCSR
Kadri Hacioglu, Bryan Pellom, Tolga Ciloglu, Ozlem Ozturk, Mikko Kurimo, Mathias Creutz

Compiling large-context phonetic decision trees into finite-state transducers
Stanley F. Chen

Automatic summarization of broadcast news using structural features
Sameer Raj Maskey, Julia Hirschberg

A dynamic cross-reference pruning strategy for multiple feature fusion at decoder run time
Yonghong Yan, Chengyi Zheng, Jianping Zhang, Jielin Pan, Jiang Han, Jian Liu

Design of the CMU Sphinx-4 decoder
Paul Lamere, Philip Kwok, William Walker, Evandro Gouvea, Rita Singh, Bhiksha Raj, Peter Wolf

A new decoder design for large vocabulary Turkish speech recognition
Onur Cilingir, Mubeccel Demirekler


Speech Technology Applications


Automatic speech recognition with sparse training data for dysarthric speakers
Phil Green, James Carmichael, Athanassios Hatzis, Pam Enderby, Mark Hawley, Mark Parker

Prediction of sentence importance for speech summarization using prosodic parameters
Akira Inoue, Takayoshi Mikami, Yoichi Yamashita

An automatic singing transcription system with multilingual singing lyric recognizer and robust melody tracker
Chong-kai Wang, Ren-Yuan Lyu, Yuang-Chin Chiang

Speech shift: direct speech-input-mode switching through intentional control of voice pitch
Masataka Goto, Yukihiro Omoto, Katunobu Itou, Tetsunori Kobayashi

Evaluating multiple LVCSR model combination in NTCIR-3 speech-driven web retrieval task
Masahiko Matsushita, Hiromitsu Nishizaki, Takehito Utsuro, Yasuhiro Kodama, Seiichi Nakagawa

Semantic object synchronous understanding in SALT for highly interactive user interface
Kuansan Wang

Information retrieval based call classification
Jan Kneissler, Anne K. Kienappel, Dietrich Klakow

Using syllable-based indexing features and language models to improve German spoken document retrieval
Martha Larson, Stefan Eickeler

An empirical text transformation method for spontaneous speech synthesizers
Shiva Sundaram, Shrikanth Narayanan

A new approach to reducing alarm noise in speech
Yilmaz Gul, Aladdin M. Ariyaeeinia, Oliver Dewhirst

Improved name recognition with user modeling
Dong Yu, Kuansan Wang, Milind Mahajan, Peter Mau, Alex Acero

Speech recognition over Bluetooth wireless channels
Ziad Al Bawab, Ivo Locher, Jianxia Xue, Abeer Alwan

Speech starter: noise-robust endpoint detection by using filled pauses
Koji Kitayama, Masataka Goto, Katunobu Itou, Tetsunori Kobayashi

Automatic segmentation of film dialogues into phonemes and graphemes
Gilles Boulianne, Jean-Francois Beaumont, Patrick Cardinal, Michel Comeau, Pierre Ouellet, Pierre Dumouchel

Automated closed-captioning of live TV broadcast news in French
Julie Brousseau, Jean-Francois Beaumont, Gilles Boulianne, Patrick Cardinal, Claude Chapdelaine, Michel Comeau, Frédéric Osterrath, Pierre Ouellet

Automatic construction of unique signatures and confusable sets for natural language directory assistance applications
E.E. Jan, Benoit Maison, Lidia Mangu, Geoffrey Zweig

Recent enhancements in CU VOCAL for Chinese TTS-enabled applications
Helen M. Meng, Yuk-Chi Li, Tien-Ying Fung, Man-Cheuk Ho, Chi-Kin Keung, Tin-Hang Lo, Wai-Kit Lo, P.C. Ching

Evaluation of an alert system for selective dissemination of broadcast news
Isabel Trancoso, Joao P. Neto, Hugo Meinedo, Rui Amaral

Low complexity joint optimization of excitation parameters in analysis-by-synthesis speech coding
U. Mittal, J.P. Ashley, E.M. Cruz-Zeno

Named entity extraction from word lattices
James Horlock, Simon King

A topic classification system based on parametric trajectory mixture models
William Belfield, Herbert Gish





Speech Recognition - Adaptation 1, 2


Vocal tract normalization as linear transformation of MFCC
Michael Pitz, Hermann Ney

Non-native spontaneous speech recognition through polyphone decision tree specialization
Zhirong Wang, Tanja Schultz

Live speech recognition in sports games by adaptation of acoustic model and language model
Yasuo Ariki, Takeru Shigemori, Tsuyoshi Kaneko, Jun Ogata, Masakiyo Fujimoto

Speaker adaptation using regression classes generated by phonetic decision tree-based successive state splitting
Se-Jin Oh, Kwang-Dong Kim, Duk-Gyoo Roh, Woo-Chang Sung, Hyun-Yeol Chung

Reduction of dimension of HMM parameters using ICA and PCA in MLLR framework for speaker adaptation
Jiun Kim, Jaeho Chung

Geometric constrained maximum likelihood linear regression on Mandarin dialect adaptation
Huayun Zhang, Bo Xu

Adapting language models for frequent fixed phrases by emphasizing n-gram subsets
Tomoyosi Akiba, Katunobu Itou, Atsushi Fujii

Learning intra-speaker model parameter correlations from many short speaker segments
Anne K. Kienappel

Modeling Cantonese pronunciation variation by acoustic model refinement
Patgi Kam, Tan Lee, Frank K. Soong

Performance improvement of rapid speaker adaptation based on eigenvoice and bias compensation
Jong Se Park, Hwa Jeon Song, Hyung Soon Kim

Training data optimization for language model adaptation
Xiaoshan Fang, Jianfeng Gao, Jianfeng Li, Huanye Sheng

Approaches to foreign-accented speaker-independent speech recognition
Stefanie Aalburg, Harald Hoege

Unsupervised speaker adaptation based on HMM sufficient statistics in various noisy environments
Shingo Yamade, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano

Using genetic algorithms for rapid speaker adaptation
Fabrice Lauri, Irina Illina, Dominique Fohr, Filipp Korkmazsky

Structural state-based frame synchronous compensation
Vincent Barreaud, Irina Illina, Dominique Fohr, Filipp Korkmazsky

Effect of foreign accent on speech recognition in the NATO N-4 corpus
Aaron D. Lawson, David M. Harris, John J. Grieco

Duration normalization and hypothesis combination for improved spontaneous speech recognition
Jon P. Nedel, Richard M. Stern

Maximum a posteriori linear regression (MAPLR) variance adaptation for continuous density HMMs
Wu Chou, Xiaodong He

On divergence based clustering of normal distributions and its application to HMM adaptation
Tor Andre Myrvoll, Frank K. Soong

Fast incremental adaptation using maximum likelihood regression and stochastic gradient descent
Sreeram V. Balakrishnan

Large vocabulary conversational speech recognition with a subspace constraint on inverse covariance matrices
Scott Axelrod, Vaibhava Goel, Brian Kingsbury, Karthik Visweswariah, Ramesh Gopinath

Speaker adaptation based on confidence-weighted training
Gyucheol Jang, Minho Jin, Chang D. Yoo

Jacobian adaptation based on the frequency-filtered spectral energies
Alberto Abad, Climent Nadeu, Javier Hernando, Jaume Padrell

Structural linear model-space transformations for speaker adaptation
Driss Matrouf, Olivier Bellot, Pascal Nocera, Georges Linares, Jean-Francois Bonastre

Minimum classification error (MCE) model adaptation of continuous density HMMs
Xiaodong He, Wu Chou

Adapting acoustic models to new domains and conditions using untranscribed data
Asela Gunawardana, Alex Acero


Speech Resources and Standards


Tfarsdat - the telephone Farsi speech database
Mahmood Bijankhan, Javad Sheykhzadegan, Mahmood R. Roohani, Rahman Zarrintare, Seyyed Z. Ghasemi, Mohammad E. Ghasedi

Large lexica for speech-to-speech translation: from specification to creation
Elviira Hartikainen, Giulio Maltese, Asunción Moreno, Shaunie Shammass, Ute Ziegenhain

A pronunciation lexicon for Turkish based on two-level morphology
Kemal Oflazer, Sharon Inkelas

Using both global and local hidden Markov models for automatic speech unit segmentation
Hong Zheng, Yiqing Lu

Quality control of language resources at ELRA
Henk van den Heuvel, Khalid Choukri, Harald Hoge, Bente Maegaard, Jan Odijk, Valerie Mapelli

Validation of phonetic transcriptions based on recognition performance
Christophe van Bael, Diana Binnenpoorte, Helmer Strik, Henk van den Heuvel

The Basque speech_dat (II) database: a description and first test recognition results
I. Hernaez, I. Luengo, E. Navas, M. Zubizarreta, I. Gaminde, J. Sanchez

Towards an evaluation standard for speech control concepts in real-world scenarios
Jens Maase, Diane Hirschfeld, Uwe Koloska, Timo Westfeld, Jorg Helbig

Orientel: recording telephone speech of Turkish speakers in Germany
Chr. Draxler

Spanish broadcast news transcription
Gerhard Backfried, Roser Jaquemot Caldes

Large vocabulary continuous speech recognition in Greek: corpus and an automatic dictation system
Vassilios Digalakis, Dimitrios Oikonomidis, D. Pratsolis, N. Tsourakis, C. Vosnidis, N. Chatzichrisafis, V. Diakoloukas

The LIUM-AVS database : a corpus to test lip segmentation and speechreading systems in natural conditions
Philippe Daubias, Paul Deleglise

Implementation and evaluation of a text-to-speech synthesis system for Turkish
Ozgul Salor, Bryan Pellom, Mubeccel Demirekler

The Czech speech and prosody database both for ASR and TTS purposes
Jachym Kolar, Jan Romportl, Josef Psutka

Construction of an advanced in-car spoken dialogue corpus and its characteristic analysis
Itsuki Kishida, Yuki Irie, Yukiko Yamaguchi, Shigeki Matsubara, Nobuo Kawaguchi, Yasuyoshi Inagaki

Measuring the readability of automatic speech-to-text transcripts
Douglas A. Jones, Florian Wolf, Edward Gibson, Elliott Williams, Evelina Fedorenko, Douglas A. Reynolds, Marc Zissman

The NESPOLE! VoIP multilingual corpora in tourism and medical domains
Nadia Mana, Susanne Burger, Roldano Cattoni, Laurent Besacier, Victoria MacLaren, John McDonough, Florian Metze

Lexica and corpora for speech-to-speech translation: a trilingual approach
David Conejero, Jesus Gimenez, Victoria Arranz, Antonio Bonafonte, Neus Pascual, Nuria Castell, Asunción Moreno

From Switchboard to Fisher: telephone collection protocols, their uses and yields
Christopher Cieri, David Miller, Kevin Walker

Development of the Estonian SpeechDat-like database
Einar Meister, Jurgen Lasn, Lya Meister

Towards a repository of digital talking books
Antonio Serralheiro, Isabel Trancoso, Diamantino Caseiro, Teresa Chambel, Luis Carrico, Nuno Guimaraes

Shared resources for robust speech-to-text technology
Stephanie Strassel, David Miller, Kevin Walker, Christopher Cieri





Robust Speech Recognition 1-4


A hidden Markov model-based missing data imputation approach
Yu Luo, Limin Du

Integration of noise reduction algorithms for Aurora2 task
Takeshi Yamada, Jiro Okada, Kazuya Takeda, Norihide Kitaoka, Masakiyo Fujimoto, Shingo Kuroiwa, Kazumasa Yamamoto, Takanobu Nishiura, Mitsunori Mizumachi, Satoshi Nakamura

Classification with free energy at raised temperatures
Rita Singh, Manfred K. Warmuth, Bhiksha Raj, Paul Lamere

Flooring the observation probability for robust ASR in impulsive noise
Pei Ding, Bertram E. Shi, Pascale Fung, Zhigang Cao

Combination of temporal domain SVD based speech enhancement and GMM based speech estimation for ASR in noise - evaluation on the AURORA2 task -
Masakiyo Fujimoto, Yasuo Ariki

Additive noise and channel distortion-robust parametrization tool - performance evaluation on Aurora 2 & 3
Petr Fousek, Petr Pollak

Robust feature extraction and acoustic modeling at Multitel: experiments on the Aurora databases
Stephane Dupont, Christophe Ris

Noise robust speech parameterization based on joint wavelet packet decomposition and autoregressive modeling
Bojan Kotnik, Zdravko Kacic, Bogomir Horvat

Database adaptation for ASR in cross-environmental conditions in the SPEECON project
Christophe Couvreur, Oren Gedge, Klaus Linhard, Shaunie Shammass, Johan Vantieghem

Autoregressive modeling based feature extraction for Aurora3 DSR task
Petr Motlicek, Jan Cernocký

Evaluation on the Aurora 2 database of acoustic models that are less noise-sensitive
Edmondo Trentin, Marco Matassoni, Marco Gori

Revisiting scenarios and methods for variable frame rate analysis in automatic speech recognition
J. Macias-Guarasa, J. Ordonez, J.M. Montero, J. Ferreiros, R. Cordoba, L.F. D'Haro

Multitask learning in connectionist robust ASR using recurrent neural networks
Shahla Parveen, Phil Green

Confusion matrix based entropy correction in multi-stream combination
Hemant Misra, Andrew Morris

Dynamic channel compensation based on maximum a posteriori estimation
Huayun Zhang, Zhaobing Han, Bo Xu

Far-field ASR on inexpensive microphones
Laura Docio-Fernandez, David Gelbart, Nelson Morgan

Evaluation of ETSI advanced DSR front-end and bias removal method on the Japanese newspaper article sentences speech corpus
Satoru Tsuge, Shingo Kuroiwa, Kenji Kita

Environment adaptive control of noise reduction parameters for improved robustness of ASR
Chng Chin Soon, Bernt Andrassy, Josef Bauer, Gunther Ruske

Speech enhancement with microphone array and Fourier/wavelet spectral subtraction in real noisy environments
Yuki Denda, Takanobu Nishiura, Hideki Kawahara

Environmental sound source identification based on hidden Markov model for robust speech recognition
Takanobu Nishiura, Satoshi Nakamura, Kazuhiro Miki, Kiyohiro Shikano

High-likelihood model based on reliability statistics for robust combination of features: application to noisy speech recognition
Peter Jancovic, Munevver Kokuer, Fionn Murtagh

Noise robust digit recognition with missing frames
Cenk Demiroglu, David V. Anderson

A noise-robust ASR back-end technique based on weighted Viterbi recognition
Xiaodong Cui, Alexis Bernard, Abeer Alwan

Voice quality normalization in an utterance for robust ASR
Muhammad Ghulam, Takashi Fukuda, Tsuneo Nitta

Environmental sniffing: robust digit recognition for an in-vehicle environment
Murat Akbacak, John H.L. Hansen

Energy contour extraction for in-car speech recognition
Tai-Hwei Hwang

Noise-robust ASR by using distinctive phonetic features approximated with logarithmic normal distribution of HMM
Takashi Fukuda, Tsuneo Nitta

Noise-robust automatic speech recognition using orthogonalized distinctive phonetic feature vectors
Takashi Fukuda, Tsuneo Nitta

Language model accuracy and uncertainty in noise cancelling in the stochastic weighted Viterbi algorithm
Nestor Becerra Yoma, Ivan Brito, Jorge Silva

Assessment of dereverberation algorithms for large vocabulary speech recognition systems
Koen Eneman, Jacques Duchateau, Marc Moonen, Dirk van Compernolle, Hugo van Hamme

Analysis and compensation of packet loss in distributed speech recognition using interleaving
Ben P. Milner, A.B. James

Non-linear compression of feature vectors using transform coding and non-uniform bit allocation
Ben P. Milner

Predictive hidden Markov model selection for decision tree state tying
Jen-Tzung Chien, Sadaoki Furui

Three simultaneous speech recognition by integration of active audition and face recognition for humanoid
Kazuhiro Nakadai, Daisuke Matsuura, Hiroshi G. Okuno, Hiroshi Tsujino

Mis-recognized utterance detection using multiple language models generated by clustered sentences
Katsuhisa Fujinaga, Hiroaki Kokubo, Hirofumi Yamamoto, Genichiro Kikui, Hiroshi Shimodaira

Using word confidence measure for OOV words detection in a spontaneous spoken dialog system
Hui Sun, Guoliang Zhang, Fang Zheng, Mingxing Xu

Speech recognition using EMG; mime speech recognition
Hiroyuki Manabe, Akira Hiraiwa, Toshiaki Sugimura

Automatic generation of non-uniform context-dependent HMM topologies based on the MDL criterion
Takatoshi Jitsuhiro, Tomoko Matsui, Satoshi Nakamura

Comparison of effects of acoustic and language knowledge on spontaneous speech perception/recognition between human and automatic speech recognizer
Norihide Kitaoka, Masahisa Shingu, Seiichi Nakagawa

Using statistical language modelling to identify new vocabulary in a grammar-based speech recognition system
Genevieve Gorrell

A source model mitigation technique for distributed speech recognition over lossy packet channels
Angel M. Gomez, Antonio M. Peinado, Victoria Sanchez, Antonio J. Rubio

The effect of an intermediate articulatory layer on the performance of a segmental HMM
Martin J. Russell, Philip J.B. Jackson

Automatic phone set extension with confidence measure for spontaneous speech
Yi Liu, Pascale Fung

Utterance verification using an optimized k-nearest neighbour classifier
R. Paredes, A. Sanchis, E. Vidal, A. Juan

A segment-based algorithm of speech enhancement for robust speech recognition
Guokang Fu, Ta-Hsin Li

Robust multiple resolution analysis for automatic speech recognition
Roberto Gemello, Franco Mana, Dario Albesano, Renato De Mori

An accurate noise compensation algorithm in the log-spectral domain for robust speech recognition
Mohamed Afify

A new adaptive long-term spectral estimation voice activity detector
Javier Ramirez, Jose C. Segura, Carmen Benitez, Angel de la Torre, Antonio J. Rubio

Robust speech recognition using non-linear spectral smoothing
Michael J. Carey

A novel use of residual noise model for modified PMC
Cailian Miao, Yangsheng Wang

Robust speech recognition to non-stationary noise based on model-driven approaches
Christophe Cerisara, Irina Illina

Towards missing data recognition with cepstral features
Christophe Cerisara

On-line parametric histogram equalization techniques for noise robust embedded speech recognition
Hemmo Haverinen, Imre Kiss

Compensation of channel distortion in line spectrum frequency domain
An-Tze Yu, Hsiao-Chuan Wang

Voicing parameter and energy based speech/non-speech detection for speech recognition in adverse conditions
Arnaud Martin, Laurent Mauuary

Two correction models for likelihoods in robust speech recognition using missing feature theory
Hugo van Hamme

Spectral maxima representation for robust automatic speech recognition
J. Sujatha, K.R. Prasanna Kumar, K.R. Ramakrishnan, N. Balakrishnan

Missing feature theory applied to robust speech recognition over IP network
Toshiki Endo, Shingo Kuroiwa, Satoshi Nakamura

Comparative experiments to evaluate the use of auditory-based acoustic distinctive features and formant cues for robust automatic speech recognition in low-SNR car environments
Hesham Tolba, Sid-Ahmed Selouani, Douglas O'Shaughnessy

Robust speech recognition using missing feature theory in the cepstral or LDA domain
Hugo van Hamme

Bandwidth mismatch compensation for robust speech recognition
Yuan-Fu Liao, Jeng-Shien Lin, Wei-Ho Tsai

Markov chain Monte Carlo methods for noise robust feature extraction using the autoregressive model
Robert W. Morris, Jon A. Arrowood, Mark A. Clements

A comparative study of some discriminative feature reduction algorithms on the AURORA 2000 and the DaimlerChrysler in-car ASR tasks
Joan Mari Hilario, Fritz Class


Speech Recognition - Large Vocabulary 1, 2


Large vocabulary ASR for spontaneous Czech in the MALACH project
Josef Psutka, Pavel Ircing, J.V. Psutka, Vlasta Radova, William J. Byrne, Jan Hajic, Jiri Mirovsky, Samuel Gustman

Active and unsupervised learning for automatic speech recognition
Giuseppe Riccardi, Dilek Z. Hakkani-Tur

Perceptual MVDR-based cepstral coefficients (PMCCs) for high accuracy speech recognition
Umit H. Yapanel, Satya Dharanipragada, John H.L. Hansen

A discriminative decision tree learning approach to acoustic modeling
Sheng Gao, Chin-Hui Lee

Large corpus experiments for broadcast news recognition
Patrick Nguyen, Luca Rigazio, Jean-Claude Junqua

Performance evaluation of phonotactic and contextual onset-rhyme models for speech recognition of Thai language
Somchai Jitapunkul, Ekkarit Maneenoi, Visarut Ahkuputra, Sudaporn Luksaneeyanawin

Overlapped di-tone modeling for tone recognition in continuous Cantonese speech
Yao Qian, Tan Lee, Yujia Li

Speaker model selection using Bayesian information criterion for speaker indexing and speaker adaptation
Masafumi Nishida, Tatsuya Kawahara

Automatic transcription of football commentaries in the MUMIS project
Janienke Sturm, Judith M. Kessens, Mirjam Wester, Febe de Wet, Eric Sanders, Helmer Strik

On the limits of cluster-based acoustic modeling
S. Douglas Peters

Large vocabulary Taiwanese (Min-nan) speech recognition using tone features and statistical pronunciation modeling
Dau-Cheng Lyu, Min-Siong Liang, Yuang-Chin Chiang, Chun-Nan Hsu, Ren-Yuan Lyu

A new spectral transformation for speaker normalization
Pierre L. Dognin, Amro El-Jaroudi

Enhanced tree clustering with single pronunciation dictionary for conversational speech recognition
Hua Yu, Tanja Schultz

Fitting class-based language models into weighted finite-state transducer framework
Pavel Ircing, Josef Psutka

Multi-source training and adaptation for generic speech recognition
Fabrice Lefevre, Jean-Luc Gauvain, Lori Lamel

Toward domain-independent conversational speech recognition
Brian Kingsbury, Lidia Mangu, George Saon, Geoffrey Zweig, Scott Axelrod, Vaibhava Goel, Karthik Visweswariah, Michael Picheny

Comparative study of boosting and non-boosting training for constructing ensembles of acoustic models
Rong Zhang, Alexander I. Rudnicky

Discriminative optimization of large vocabulary Mandarin conversational speech recognition system
Peng Ding, Zhenbiao Chen, Sheng Hu, Shuwu Zhang, Bo Xu

Speech recognition with dynamic grammars using finite-state transducers
Johan Schalkwyk, Lee Hetherington, Ezra Story

FLaVoR: a flexible architecture for LVCSR
Kris Demuynck, Tom Laureys, Dirk van Compernolle, Hugo van Hamme

An architecture for rapid decoding of large vocabulary conversational speech
George Saon, Geoffrey Zweig, Brian Kingsbury, Lidia Mangu, Upendra Chaudhari

MMI-MAP and MPE-MAP for acoustic model adaptation
D. Povey, M.J.F. Gales, D.Y. Kim, P.C. Woodland

Lattice segmentation and minimum Bayes risk discriminative training
Vlasios Doumpiotis, Stavros Tsakalidis, William J. Byrne





Speech Perception


Schema-based modeling of phonemic restoration
Soundararajan Srinivasan, DeLiang Wang

Perception of voice-individuality for distortions of resonance/source characteristics and waveforms
Hisao Kuwabara

The perceptual cues of a high level pitch-accent pattern in Japanese: pitch-accent patterns and duration
Tsutomu Sato

Illusory continuity of intermittent pure tone in binaural listening and its dependency on interaural time difference
Mamoru Iwaki, Norio Nakamura

CART-based factor analysis of intelligibility reduction in Japanese English
Nobuaki Minematsu, Changchen Guo, Keikichi Hirose

Harmonic alternatives to sine-wave speech
Laszlo Toth, Andras Kocsor

Non-intrusive assessment of perceptual speech quality using a self-organising map
Dorel Picovici, Abdulhussain E. Mahdi

Inhibitory priming effect in auditory word recognition: the role of the phonological mismatch length between primes and targets
Sophie Dufour, Ronald Peereman

Recognising 'real-life' speech with SpeM: a speech-based computational model of human speech recognition
Odette Scharenborg, Louis ten Bosch, Lou Boves

The effect of speech rate and noise on bilinguals' speech perception: the case of native speakers of Arabic in Israel
Judith Rosenhouse, Liat Kishon-Rabin

Subjective evaluations for perception of speaker identity through acoustic feature transplantations
Oytun Turk, Levent M. Arslan

Modelling human speech recognition using automatic speech recognition paradigms in SpeM
Odette Scharenborg, James M. McQueen, Louis ten Bosch, Dennis Norris

The effect of amplitude compression on wide band telephone speech for hearing-impaired elderly people
Mutsumi Saito, Kimio Shiraishi, Kimitoshi Fukudome

Word activation model by Japanese school children without knowledge of Roman alphabet
Takashi Otake, Miki Komatsu

Multi-resolution auditory scene analysis: robust speech recognition using pattern-matching from a noisy signal
Sue Harding, Georg Meyer

Investigation of emotionally morphed speech perception and its structure using a high quality speech manipulation system
Hisami Matsui, Hideki Kawahara

Usefulness of phase spectrum in human speech perception
Kuldip K. Paliwal, Leigh Alsteris

Perception of English lexical stress by English and Japanese speakers: effect of duration and "realistic" intensity change
Shinichi Tokuma

French intonational rises and their role in speech segmentation
Pauline Welby

Physical and perceptual configurations of Japanese fricatives from multidimensional scaling analyses
Won Tokuma

An acquisition model of speech perception with considerations of temporal information
Ching-Pong Au


Multi-Modal Processing and Speech Interface Design


An integrated system for smart-home control of appliances based on remote speech interaction
Ilyas Potamitis, K. Georgila, Nikos Fakotakis, George Kokkinakis

A spoken language interface to an electronic programme guide
Jianhong Jin, Martin J. Russell, Michael J. Carey, James Chapman, Harvey Lloyd-Thomas, Graham Tattersall

Towards a personal robot with language interface
L. Seabra Lopes, Antonio Teixeira, M. Rodrigues, D. Gomes, C. Teixeira, L. Ferreira, P. Soares, J. Girao, N. Senica

Preference, perception, and task completion of open, menu-based, and directed prompts for call routing: a case study
Jason D. Williams, Andrew T. Shaw, Lawrence Piano, Michael Abt

An integrated toolkit deploying speech technology for computer based speech training with application to dysarthric speakers
Athanassios Hatzis, Phil Green, James Carmichael, Stuart Cunningham, Rebecca Palmer, Mark Parker, Peter O'Neill

Towards best practices for speech user interface design
Bernhard Suhm

Design and evaluation of a limited two-way speech translator
David Stallard, John Makhoul, Frederick Choi, Ehry Macrostie, Premkumar Natarajan, Richard Schwartz, Bushra Zawaydeh

Multimodal interaction on PDA's integrating speech and pen inputs
Sorin Dusan, Gregory J. Gadbois, James Flanagan

Towards multimodal interaction with an intelligent room
Petra Gieselmann, Matthias Denecke

A multimodal conversational interface for a concept vehicle
Roberto Pieraccini, Krishna Dayanidhi, Jonathan Bloom, Jean-Gui Dahan, Michael Phillips, Bryan R. Goodman, K. Venkatesh Prasad

Context awareness using environmental noise classification
L. Ma, D.J. Smith, Ben P. Milner

Simple designing methods of corpus-based visual speech synthesis
Tatsuya Shiraishi, Tomoki Toda, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano

Comparing the usability of a user driven and a mixed initiative multimodal dialogue system for train timetable information
Janienke Sturm, Ilse Bakx, Bert Cranen, Jacques Terken

Read my tongue movements: bimodal learning to perceive and produce non-native speech /r/ and /l/
Dominic W. Massaro, Joanna Light

Low resource lip finding and tracking algorithm for embedded devices
Jesus F. Guitarte Perez, Klaus Lukas, Alejandro F. Frangi

Detection and separation of speech segment using audio and video information fusion
Futoshi Asano, Yoichi Motomura, Hideki Asoh, Takashi Yoshimura, Naoyuki Ichimura, Kiyoshi Yamamoto, Nobuhiko Kitawaki, Satoshi Nakamura

Resynthesis of 3D tongue movements from facial data
Olov Engwall, Jonas Beskow

Acquiring lexical information from multilevel temporal annotations
Thorsten Trippel, Felix Sasaki, Benjamin Hell, Dafydd Gibbon

LUCIA: a new Italian talking-head based on a modified Cohen-Massaro's labial coarticulation model
Piero Cosi, Andrea Fusaro, Graziano Tisato

A visual context-aware multimodal system for spoken language processing
Niloy Mukherjee, Deb Roy





Speech Synthesis: Voice Conversion and Miscellaneous Topics


GMM-based voice conversion applied to emotional speech synthesis
Hiromichi Kawanami, Yohei Iwami, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano

Probability models of formant parameters for voice conversion
Dimitrios Rentzos, Saeed Vaseghi, Qin Yan, Ching-Hsiang Ho, Emir Turajlic

Perceptually weighted linear transformations for voice conversion
Hui Ye, Steve Young

Voice conversion with smoothed GMM and MAP adaptation
Yining Chen, Min Chu, Eric Chang, Jia Liu, Runsheng Liu

A system for voice conversion based on adaptive filtering and line spectral frequency distance optimization for text-to-speech synthesis
Ozgul Salor, Mubeccel Demirekler, Bryan Pellom

Speaker conversion in ARX-based source-formant type speech synthesis
Hiroki Mori, Hideki Kasuya

Implementing an SSML compliant concatenative TTS system
Andrew P. Breen, Steve Minnis, Barry Eggleton

Acoustic variations of focused disyllabic words in Mandarin Chinese: analysis, synthesis and perception
Zhenglai Gu, Hiroki Mori, Hideki Kasuya

An approach to common acoustical pole and zero modeling of consecutive periods of voiced speech
Pedro Quintana-Morales, Juan L. Navarro-Mesa

Estimating the vocal-tract area function and the derivative of the glottal wave from a speech signal
Huiqun Deng, Michael Beddoes, Rabab Ward, Murray Hodgson

Glottal closure instant synchronous sinusoidal model for high quality speech analysis/synthesis
Parham Zolfaghari, Tomohiro Nakatani, Toshio Irino, Hideki Kawahara, Fumitada Itakura

Mixed physical modeling techniques applied to speech production
Matti Karjalainen

An expandable web-based audiovisual text-to-speech synthesis system
Sascha Fagel, Walter F. Sendlmeier

A reconstruction of Farkas Kempelen's speaking machine
P. Nikleczy, G. Olaszy

Acoustic model selection and voice quality assessment for HMM-based Mandarin speech synthesis
Wentao Gu, Keikichi Hirose

Modeling of various speaking styles and emotions for HMM-based speech synthesis
Junichi Yamagishi, Koji Onishi, Takashi Masuko, Takao Kobayashi

Towards the development of a Brazilian Portuguese text-to-speech system based on HMM
R. da S. Maia, Heiga Zen, Keiichi Tokuda, Tadashi Kitamura, F.G.V. Resende Jr.

Grapheme to phoneme conversion and dictionary verification using graphonemes
Paul Vozila, Jeff Adams, Yuliya Lobacheva, Ryan Thomas

Improving the accuracy of pronunciation prediction for unit selection TTS
Justin Fackrell, Wojciech Skut, Kathrine Hammervold

Detection of list-type sentences
Taniya Mishra, Esther Klabbers, Jan P.H. van Santen


Acoustic Modelling 1, 2


A new pitch synchronous time domain phoneme recognizer using component analysis and pitch clustering
Ramon Prieto, Jing Jiang, Chi-Ho Choi

Mixed-lingual spoken word recognition by using VQ codebook sequences of variable length segments
Hiroaki Kojima, Kazuyo Tanaka

Low memory acoustic models for HMM based speech recognition
Tommi Lahti, Olli Viikki, Marcel Vasilache

Nearest-neighbor search algorithms based on subcodebook selection and its application to speech recognition
Jose A.R. Fonollosa

Non-linear maximum likelihood feature transformation for speech recognition
Mohamed Kamal Omar, Mark Hasegawa-Johnson

Automatic generation of context-independent variable parameter models using successive state and mixture splitting
Soo-Young Suk, Ho-Youl Jung, Hyun-Yeol Chung

Data driven generation of broad classes for decision tree construction in acoustic modeling
Andrej Zgank, Zdravko Kacic, Bogomir Horvat

An efficient integrated gender detection scheme and time mediated averaging of gender dependent acoustic models
Peder A. Olsen, Satya Dharanipragada

Syllable-based acoustic modeling for Japanese spontaneous speech recognition
Jun Ogata, Yasuo Ariki

Cross-stream observation dependencies for multi-stream speech recognition
Ozgur Cetin, Mari Ostendorf

Pruning transitions in a hidden Markov model with optimal brain surgeon
Brian Mak, Kin-Wah Chan

Using pitch frequency information in speech recognition
Mathew Magimai-Doss, Todd A. Stephenson, Hervé Bourlard

Hidden feature models for speech recognition using dynamic Bayesian networks
Karen Livescu, James Glass, Jeff Bilmes

An efficient Viterbi algorithm on DBNs
Wei Hu, Yimin Zhang, Qian Diao, Shan Huang

Speech recognition based on syllable recovery
Li Zhang, William Edmondson

HARTFEX: a multi-dimensional system of HMM based recognisers for articulatory features extraction
Tarek Abu-Amer, Julie Carson-Berndsen

Automatic baseform generation from acoustic data
Benoit Maison

Data-driven pronunciation modeling for ASR using acoustic subword units
Thurid Spiess, Britta Wrede, Gernot A. Fink, Franz Kummert

Variable length mixtures of inverse covariances
Vincent Vanhoucke, Ananth Sankar

Semi-tied full deviation matrices for Laplacian density models
Christoph Neukirchen

Acoustic modeling with mixtures of subspace constrained exponential models
Karthik Visweswariah, Scott Axelrod, Ramesh Gopinath

Discriminative estimation of subspace precision and mean (SPAM) models
Vaibhava Goel, Scott Axelrod, Ramesh Gopinath, Peder A. Olsen, Karthik Visweswariah

Model-integration rapid training based on maximum likelihood for speech recognition
Shinichi Yoshizawa, Kiyohiro Shikano

On the use of kernel PCA for feature extraction in speech recognition
Amaro Lima, Heiga Zen, Yoshihiko Nankaku, Chiyomi Miyajima, Keiichi Tokuda, Tadashi Kitamura




Speaker and Language Recognition


Speaker modeling from selected neighbors applied to speaker recognition
Yassine Mami, Delphine Charlet

Who knows Carl Bildt? - and what if you don't?
Elisabeth Zetterholm, Kirk P.H. Sullivan, James Green, Erik Eriksson, Jan van Doorn, Peter E. Czigler

Improving the competitiveness of discriminant neural networks in speaker verification
C. Vivaracho-Pascual, J. Ortega-Garcia, L. Alonso-Romero, Q. Moro-Sancho

On the fusion of dissimilarity-based classifiers for speaker identification
Tomi Kinnunen, Ville Hautamaki, Pasi Franti

Robust speaker identification using posterior union models
Ji Ming, Darryl Stewart, Philip Hanna, Pat Corr, Jack Smith, Saeed Vaseghi

syncpitch: a pseudo pitch synchronous algorithm for speaker recognition
Ran D. Zilca, Jiri Navratil, Ganesh N. Ramaswamy

A method for on-line speaker indexing using generic reference models
Soonil Kwon, Shrikanth Narayanan

Discriminative training and maximum likelihood detector for speaker identification
M. Mihoubi, Gilles Boulianne, Pierre Dumouchel

Novel approaches for one- and two-speaker detection
Sachin S. Kajarekar, Andre G. Adami, Hynek Hermansky

Fusing high- and low-level features for speaker recognition
Joseph P. Campbell, Douglas A. Reynolds, Robert B. Dunn

Score normalisation applied to open-set, text-independent speaker identification
P. Sivakumaran, J. Fortuna, Aladdin M. Ariyaeeinia

On the number of Gaussian components in a mixture: an application to speaker verification tasks
Mijail Arcienega, Andrzej Drygajlo

Using accent information in ASR models for Swedish
Giampiero Salvi

Estimating Japanese word accent from syllable sequence using support vector machine
Hideharu Nakajima, Masaaki Nagata, Hisako Asano, Masanobu Abe

PPRLM optimization for language identification in air traffic control tasks
R. Cordoba, G. Prime, J. Macias-Guarasa, J.M. Montero, J. Ferreiros, J.M. Pardo


Spoken Language Understanding and Translation


Spoken cross-language access to image collection via captions
Hsin-Hsi Chen

Understanding process for speech recognition
Salma Jamoussi, Kamel Smaili, Jean-Paul Haton

Collecting machine-translation-aided bilingual dialogues for corpus-based speech translation
Toshiyuki Takezawa, Genichiro Kikui

Combination of finite state automata and neural network for spoken language understanding
Chai Wutiwiwatchai, Sadaoki Furui

Discriminative methods for improving named entity extraction on speech data
James Horlock, Simon King

Improving statistical natural concept generation in interlingua-based speech-to-speech translation
Liang Gu, Yuqing Gao, Michael Picheny

How NLP techniques can improve speech understanding: ROMUS - a robust chunk based message understanding system using link grammars
Jerome Goulian, Jean-Yves Antoine, Franck Poirier

Discriminative training of n-gram classifiers for speech and text routing
Ciprian Chelba, Alex Acero

Correction of disfluencies in spontaneous speech using a noisy-channel approach
Matthias Honal, Tanja Schultz

Multi-class extractive voicemail summarization
Konstantinos Koumpis, Steve Renals

Active labeling for spoken language understanding
Gokhan Tur, Mazin Rahim, Dilek Z. Hakkani-Tur

Exploiting unlabeled utterances for spoken language understanding
Gokhan Tur, Dilek Z. Hakkani-Tur

Noise robustness in speech to speech translation
Fu-Hua Liu, Yuqing Gao, Liang Gu, Michael Picheny

Example-based bi-directional Chinese-English machine translation with semi-automatically induced grammars
K.C. Siu, Helen M. Meng, C.C. Wong

Spotting "hot spots" in meetings: human judgments and prosodic cues
Britta Wrede, Elizabeth Shriberg

Combination of CFG and n-gram modeling in semantic grammar learning
Ye-Yi Wang, Alex Acero

Automatic title generation for Chinese spoken documents using an adaptive k nearest-neighbor approach
Shun-Chuan Chen, Lin-shan Lee

Speech summarization using weighted finite-state transducers
Takaaki Hori, Chiori Hori, Yasuhiro Minami

Cross domain Chinese speech understanding and answering based on named-entity extraction
Yun-Tien Lee, Shun-Chuan Chen, Lin-shan Lee

Evaluation method for automatic speech summarization
Chiori Hori, Takaaki Hori, Sadaoki Furui

An information theoretic approach for using word cluster information in natural language call routing
Li Li, Feng Liu, Wu Chou

Unsupervised topic discovery applied to segmentation of news transcriptions
Sreenivasa Sista, Amit Srivastava, Francis Kubala, Richard Schwartz



Speaker Recognition and Verification


New MAP estimators for speaker recognition
P. Kenny, M. Mihoubi, Pierre Dumouchel

A new SVM approach to speaker identification and verification using probabilistic distance kernels
Pedro J. Moreno, Purdy P. Ho

Adaptive decision fusion for multi-sample speaker verification over GSM networks
Ming-Cheung Cheung, Man-Wai Mak, Sun-Yuan Kung

Environment adaptation for robust speaker verification
Kwok-Kwong Yiu, Man-Wai Mak, Sun-Yuan Kung

On cohort selection for speaker verification
Yaniv Zigel, Arnon Cohen

Speaker characterization using principal component analysis and wavelet transform for speaker verification
C. Tadj, A. Benlahouar

Unsupervised speaker indexing using anchor models and automatic transcription of discussions
Yuya Akita, Tatsuya Kawahara

A statistical approach to assessing speech and voice variability in speaker verification
Klaus R. Scherer, D. Grandjean, T. Johnstone, G. Klasmeyer, Tanja Banziger

Automatic singer identification of popular music recordings via estimation and modeling of solo vocal signal
Wei-Ho Tsai, Hsin-Min Wang, Dwight Rodgers

A DP algorithm for speaker change detection
Michele Vescovi, Mauro Cettolo, Romeo Rizzi

SOM as likelihood estimator for speaker clustering
Itshak Lapidot

Automatic estimation of perceptual age using speaker modeling techniques
Nobuaki Minematsu, Keita Yamauchi, Keikichi Hirose

Speaker recognition using local models
Ryan Rifkin

Dependence of GMM adaptation on feature post-processing for speaker recognition
Robbie Vogt, Jason Pelecanos, Sridha Sridharan

Text-independent speaker recognition by speaker-specific GMM and speaker adapted syllable-based HMM
Seiichi Nakagawa, Wei Zhang

On the amount of speech data necessary for successful speaker identification
Ales Padrta, Vlasta Radova

Speaker verification based on the German VeriDat database
Ulrich Turk, Florian Schiel



Interdisciplinary


Learning Chinese tones
Valery A. Petrushin

A pronunciation training system for Japanese lexical accents with corrective feedback in learner's voice
Keikichi Hirose, Frédéric Gendrin, Nobuaki Minematsu

Considerations on vowel durations for Japanese CALL system
Taro Mouri, Keikichi Hirose, Nobuaki Minematsu

Influence of recording equipment on the identification of second language phoneme contrasts
Hiroaki Kato, Masumi Nukina, Hideki Kawahara, Reiko Akahane-Yamada

Training a confidence measure for a reading tutor that listens
Yik-Cheung Tam, Jack Mostow, Joseph E. Beck, Satanjeev Banerjee

Evaluating the effect of predicting oral reading miscues
Satanjeev Banerjee, Joseph E. Beck, Jack Mostow

VISPER II - enhanced version of the educational software for speech processing courses
Miroslav Holada, Jan Nouza

The use of multiple pause information in dependency structure analysis of spoken Japanese sentences
Meirong Lu, Kazuyuki Takagi, Kazuhiko Ozeki

A neural network approach to dependency analysis of Japanese sentences using prosodic information
Kazuyuki Takagi, Mamiko Okimoto, Yoshio Ogawa, Kazuhiko Ozeki

Say-as classification for alphabetic words in Japanese texts
Hisako Asano, Masaaki Nagata, Masanobu Abe

Automatic transformation of environmental sounds into sound-imitation words based on Japanese syllable structure
Kazushi Ishihara, Yasushi Tsubota, Hiroshi G. Okuno

Decision tree-based simultaneous clustering of phonetic contexts, dimensions, and state positions for acoustic modeling
Heiga Zen, Keiichi Tokuda, Tadashi Kitamura

A statistical method of evaluating pronunciation proficiency for English words spoken by Japanese
Seiichi Nakagawa, Kazumasa Mori, Naoki Nakamura


