doi: 10.21437/Interspeech.2012
An information-extraction approach to speech analysis and processing
Chin-Hui Lee
Music understanding and the future of music performance
Roger B. Dannenberg
Weighted transducers in speech and language processing
Michael Riley
Finding meaning in rodent ultrasonic vocalizations
Garet Lahvis
Large vocabulary speech recognition using deep tensor neural networks
Dong Yu, Li Deng, Frank Seide
Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization
Brian Kingsbury, Tara N. Sainath, Hagen Soltau
Discriminative feature-space transforms using deep neural networks
George Saon, Brian Kingsbury
Context-dependent MLPs for LVCSR: TANDEM, hybrid or both?
Zoltán Tüske, Martin Sundermeyer, Ralf Schlüter, Hermann Ney
Recurrent neural networks for noise reduction in robust ASR
Andrew L. Maas, Quoc V. Le, Tyler M. O'Neil, Oriol Vinyals, Patrick Nguyen, Andrew Y. Ng
Pipelined back-propagation for context-dependent deep neural networks
Xie Chen, Adam Eversole, Gang Li, Dong Yu, Frank Seide
Are sparse representations rich enough for acoustic modeling?
Oriol Vinyals, Li Deng
An initial attempt on task-specific adaptation for deep neural network-based large vocabulary continuous speech recognition
Yeming Xiao, Zhen Zhang, Shang Cai, Jielin Pan, Yonghong Yan
Application of pretrained deep neural networks to large vocabulary speech recognition
Navdeep Jaitly, Patrick Nguyen, Andrew Senior, Vincent Vanhoucke
Cross-lingual and ensemble MLPs strategies for low-resource speech recognition
Yanmin Qian, Jia Liu
Initialization schemes for multilayer perceptron training and their impact on ASR performance using multilingual data
Ngoc Thang Vu, Wojtek Breiter, Florian Metze, Tanja Schultz
Hermitian based hidden activation functions for adaptation of hybrid HMM/ANN models
Sabato Marco Siniscalchi, Jinyu Li, Chin-Hui Lee
Integrating deep neural networks into structural classification approach based on weighted finite-state transducers
Yotaro Kubo, Takaaki Hori, Atsushi Nakamura
Parallel training for deep stacking networks
Li Deng, Brian Hutchinson, Dong Yu
Articulatory feature based multilingual MLPs for low-resource speech recognition
Yanmin Qian, Jia Liu
Uncertainty-driven compensation of multi-stream MLP acoustic models for robust ASR
Ramón Fernandez Astudillo, Alberto Abad, João Paulo da Silva Neto
Arabic dialect identification - "is the secret in the silence?" and other observations
Hynek Bořil, Abhijeet Sangwan, John H. L. Hansen
The 2011 NIST language recognition evaluation
Craig S. Greenberg, Alvin F. Martin, Mark A. Przybocki
The BLZ submission to the NIST 2011 LRE: data collection, system development and performance
Luis Javier Rodríguez-Fuentes, Mikel Penagarikano, Amparo Varona, Mireia Diez, Germán Bordel, Alberto Abad, David Martínez, Jesus Villalba, Alfonso Ortega, Eduardo Lleida
Phonotactic language recognition using i-vectors and phoneme posteriogram counts
Luis Fernando D'Haro, Ondřej Glembek, Oldřich Plchot, Pavel Matějka, Mehdi Soufifar, Ricardo Cordoba, Jan Černocký
Supervector LDA: a new approach to reduced-complexity i-vector language recognition
Alan McCree, Bengt Borgström
Patrol team language identification system for DARPA RATS P1 evaluation
Pavel Matějka, Oldřich Plchot, Mehdi Soufifar, Ondřej Glembek, Luis Fernando D'Haro, Karel Veselý, František Grézl, Jeff Ma, Spyros Matsoukas, Najim Dehak
Articulatory strategies in obstruent production in Mandarin esophageal speech
Fang Hu, Yungang Wu, Wen Xu, Demin Han
Consonantal space area in children with a cleft palate: an acoustic study
Marion Béchet, Fabrice Hirsch, Camille Fauth, Rudolph Sock
Automated dysarthria severity classification for improved objective intelligibility assessment of spastic dysarthric speech
Milton Sarria Paja, Tiago H. Falk
Assessment of disordered voices using empirical mode decomposition in the log-spectral domain
Abdellah Kacha, Francis Grenez, Jean Schoentgen
Learning an artificial F0-contour for ALT speech
Anna Katharina Fuchs, Martin Hagmüller
Ultrax: an animated midsagittal vocal tract display for speech therapy
Korin Richmond, Steve Renals
A study of mutual information for GMM-based spectral conversion
Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Yih-Ru Wang, Sin-Horng Chen
Bayesian mixture of probabilistic linear regressions for voice conversion
Na Li, Yu Qiao
Iterative MMSE estimation of vocal tract length normalization factors for voice transformation
Daniel Erro, Eva Navas, Inma Hernáez
An HMM approach to residual estimation for high resolution voice conversion
Winston Percybrooks, Elliot Moore
Implementation of computationally efficient real-time voice conversion
Tomoki Toda, Takashi Muramatsu, Hideki Banno
Effects of speaker adaptive training on tensor-based arbitrary speaker conversion
Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose
Discrimination of linguistic and non-linguistic vocalizations in spontaneous speech: intra- and inter-corpus perspectives
Felix Weninger, Björn Schuller
Accentual transfer from Swiss-German to French. a study of "francais federal"
Mathieu Avanzi, Pauline Dubosson, Sandra Schwab, Nicolas Obin
Phonology & the interpretation of fine phonetic detail in Berlin German
Stefanie Jannedy, Melanie Weirich
Evaluation of a formant-based speech-driven lip motion generation
Carlos T. Ishi, Chaoran Liu, Hiroshi Ishiguro, Norihiro Hagita
Using spectral measures to differentiate Mandarin and Korean sibilant fricatives
Jeffrey Kallay, Jeffrey Holliday
EFL conversational triads: foreigner-directed speech and hyperarticulation
Hua-Li Jian, Richard Konopka
Syllable perception depends on tone perception
Iris Chuoying Ouyang, Khalil Iskarous
Assessing agreement level between forced alignment models with data from endangered language documentation corpora
Christian T. DiCanio, Hosung Nam, Douglas H. Whalen, H. Timothy Bunnell, Jonathan D. Amith, Rey Castillo Garcia
How consonants, dialect and speech rate affect vowel devoicing?
Masako Fujimoto, Seiya Funatsu, Ichiro Fujimoto
Effects of stress and speech rate on vowel quality in Catalan and Spanish
Marianna Nadeu
Predictability affects vowel dispersion and dynamics in the Buckeye corpus
Michael McAuliffe, Molly Babel
Dialectal and generational variations in vowels in spontaneous speech
Robert Allen Fox, Ewa Jacewicz
Perceiving listener-directed speech: effects of authenticity and lexical neighborhood density
Rebecca Scarborough, Georgia Zellou
Acoustic cues of vowel quality to coda nasal perception in southern Min
Ying Chen, Vsevolod Kapatsinski, Susan Guion-Anderson
Lenition of /d/ in spontaneous Spanish and Catalan
Miquel Simonet, José I. Hualde, Marianna Nadeu
Distance-dependent noise reduction for two-channel microphones
Thomas Fehér, Dietmar Richter, Oliver Jokisch, Rüdiger Hoffmann
Direction of arrival estimation based on subband weighting for noisy conditions
Wei Xue, Wenju Liu
Binaural noise reduction using frequency-warped FIR filters
Jorge I. Marin-Hurtado, David V. Anderson
Exploring off time nature for speech enhancement
Meng Yu, Jack Xin
Model-based single-channel dereverberation in noisy acoustical environments
Xulei Bao, Jie Zhu
An auditory inspired multimodal framework for speech enhancement
Majid Mirbagheri, Sahar Akram, Shihab Shamma
Binary mask estimation for improved speech intelligibility in reverberant environments
Oldooz Hazrati, Jaewook Lee, Philipos C. Loizou
Enhancing subjective speech intelligibility using a statistical model of speech
Petko N. Petkov, W. Bastiaan Kleijn, Gustav Eje Henter
Morpheme level feature-based language models for German LVCSR
Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney
Tied-state mixture language model for WFST-based speech recognition
Hitoshi Yamamoto, Paul R. Dixon, Shigeki Matsuda, Chiori Hori, Hideki Kashioka
Maximum entropy language model adaptation for mobile speech input
Tanel Alumäe, Kaarel Kaljurand
Supervised and unsupervised web-based language model domain adaptation
Gwénolé Lecorvé, John Dines, Thomas Hain, Petr Motlicek
A hierarchical Bayesian approach for semi-supervised discriminative language modeling
Yik-Cheung Tam, Paul Vozila
Leveraging social annotation for topic language model adaptation
Youzheng Wu, Kazuhiko Abe, Paul R. Dixon, Chiori Hori, Hideki Kashioka
LSTM neural networks for language modeling
Martin Sundermeyer, Ralf Schlüter, Hermann Ney
Phrasal cohort based unsupervised discriminative language modeling
Puyang Xu, Brian Roark, Sanjeev Khudanpur
Deriving conversation-based features from unlabeled speech for discriminative language modeling
Damianos Karakos, Brian Roark, Izhak Shafran, Kenji Sagae, Maider Lehr, Emily Prud'hommeaux, Puyang Xu, Nathan Glenn, Sanjeev Khudanpur, Murat Saraclar, Dan Bikel, Mark Dredze, Chris Callison-Burch, Yuan Cao, Keith Hall, Eva Hasler, Philip Koehn, Adam Lopez, Matt Post, Darcey Riley
Performance comparison of training algorithms for semi-supervised discriminative language modeling
Erinç Dikici, Arda Çelebi, Murat Saraçlar
On-the-fly topic adaptation for YouTube video transcription
Kapil Thadani, Fadi Biadsy, Dan Bikel
Portability of semantic annotations for fast development of dialogue corpora
Bassam Jabaian, Fabrice Lefèvre, Laurent Besacier
Optimization of dialog strategies using automatic dialog simulation and statistical dialog management techniques
David Griol, Zoraida Callejas, Ramón López-Cózar
Preference-learning based inverse reinforcement learning for dialog control
Hiroaki Sugiyama, Toyomi Meguro, Yasuhiro Minami
A data-driven approach to understanding spoken route directions in human-robot dialogue
Raveesh Meena, Gabriel Skantze, Joakim Gustafson
Detecting system-directed utterances using dialogue-level features
Kazunori Komatani, Akira Hirano, Mikio Nakano
An online generated transducer to increase dialog manager coverage
Joaquin Planells, Lluís-F. Hurtado, Emilio Sanchis, Encarna Segarra
A sequential Bayesian dialog agent for computational ethnography
Abe Kazemzadeh, James Gibson, Juanchen Li, Sungbok Lee, Panayiotis G. Georgiou, Shrikanth Narayanan
Clippyscript: a programming language for multi-domain dialogue systems
Frank Seide, Sean McDirmid
Correlation between model-based approximations of grounding-related cognition and user judgments
Klaus-Peter Engelbrecht, Sebastian Möller
Assessment of user simulators for spoken dialogue systems by means of subspace multidimensional clustering
Zoraida Callejas, David Griol, Klaus-Peter Engelbrecht
“help me, I need more user tests!” user simulations as supportive tool in the development process of spoken dialogue systems
Florian Kretzschmar, Sebastian Möller
Caller response timing patterns in spoken dialog systems
Silke M. Witt
A discriminative classification-based approach to information state updates for a multi-domain dialog system
Dilek Hakkani-Tür, Gokhan Tur, Larry Heck, Ashley Fidler, Asli Celikyilmaz
Learning when to listen: detecting system-addressed speech in human-human-computer dialog
Elizabeth Shriberg, Andreas Stolcke, Dilek Hakkani-Tür, Larry Heck
Exploiting the semantic web for unsupervised natural language semantic parsing
Gokhan Tur, Minwoo Jeong, Ye-Yi Wang, Dilek Hakkani-Tür, Larry Heck
Prosodic entrainment in an information-driven dialog system
Andrew Fandrianto, Maxine Eskenazi
The INTERSPEECH 2012 speaker trait challenge
Björn Schuller, Stefan Steidl, Anton Batliner, Elmar Nöth, Alessandro Vinciarelli, Felix Burkhardt, Rob van Son, Felix Weninger, Florian Eyben, Tobias Bocklet, Gelareh Mohammadi, Benjamin Weiss
On speaker-independent personality perception and prediction from speech
Tim Polzehl, Katrin Schoenenberg, Sebastian Möller, Florian Metze, Gelareh Mohammadi, Alessandro Vinciarelli
Speaker personality classification using systems based on acoustic-lexical cues and an optimal tree-structured Bayesian network
Kartik Audhkhasi, Angeliki Metallinou, Ming Li, Shrikanth S. Narayanan
Personality traits detection using a parallelized modified SFFS algorithm
Clément Chastagnol, Laurence Devillers
Feature selection for speaker traits
Jouni Pohjalainen, Serdar Kadioglu, Okko Räsänen
A frame pruning approach for paralinguistic recognition tasks
Johannes Wagner, Florian Lingenfelser, Elisabeth André
Modulation spectrum analysis for speaker personality trait recognition
Alexei Ivanov, Xin Chen
A comparison of classification paradigms for speaker likeability determination
Nicholas Cummins, Julien Epps, Jia Min Karen Kua
Predicting likability of speakers with Gaussian processes
Dingchao Lu, Fei Sha
Likability classification - a not so deep neural network approach
Raymond Brueckner, Björn Schuller
Genetic algorithm based feature selection for speaker trait classification
Dongrui Wu
Is 'not bad' good enough? aspects of unknown voices' likability
Benjamin Weiss, Felix Burkhardt
Multi-system fusion of extended context prosodic and cepstral features for paralinguistic speaker trait classification
Michelle Hewlett Sanchez, Aaron Lawson, Dimitra Vergyri, Harry Bratt
The log-Gabor method: speech classification using spectrogram image analysis
Harm Buisman, Eric Postma
Anchor models and WCCN normalization for speaker trait classification
Yazid Attabi, Pierre Dumouchel
Pitch and intonation contribution to speakers' traits classification
Claude Montacié, Marie-José Caraty
Text-dependent pathological voice detection
Gopala Krishna Anumanchipalli, Hugo Meinedo, Miguel Bugalho, Isabel Trancoso, Luís C. Oliveira, Alan W. Black
Intelligibility classification of pathological speech using fusion of multiple high level descriptors
Jangwon Kim, Naveen Kumar, Andreas Tsiartas, Ming Li, Shrikanth Narayanan
Interspeech pathology challenge: investigations into speaker and sentence specific effects
Anthony Stark, Alireza Bayestehtashk, Meysam Asgari, Izhak Shafran
Automatic intelligibility assessment of pathologic speech in head and neck cancer based on auditory-inspired spectro-temporal modulations
Xinhui Zhou, Daniel Garcia-Romero, Nima Mesgarani, Maureen Stone, Carol Espy-Wilson, Shihab Shamma
Detecting intelligibility by linear dimensionality reduction and normalized voice quality hierarchical features
Dong-Yan Huang, Yongwei Zhu, Dajun Wu, Rongshan Yu
Microphone array post-filter based on spatially-correlated noise measurements for distant speech recognition
Kenichi Kumatani, Bhiksha Raj, Rita Singh, John McDonough
Combining bottleneck-BLSTM and semi-supervised sparse NMF for recognition of conversational speech in highly instationary noise
Felix Weninger, Martin Wöllmer, Björn Schuller
Noise compensation for subspace Gaussian mixture models
Liang Lu, K. K. Chin, Arnab Ghoshal, Steve Renals
Combination of sparse classification and multilayer perceptron for noise-robust ASR
Yang Sun, Mathew M. Doss, Jort F. Gemmeke, Bert Cranen, Louis ten Bosch, Lou Boves
Sub-band based log-energy and its dynamic range stretching for robust in-car speech recognition
Weifeng Li, Hervé Bourlard
Noise compensation for speech recognition using subspace Gaussian mixture models
Mohamed Bouallegue, Mickael Rouvier, Driss Matrouf, Georges Linarès
Novel metrics of speech rhythm for the assessment of emotion
Fabien Ringeval, Mohamed Chetouani, Björn Schuller
Temporal and situational context modeling for improved dominance recognition in meetings
Martin Wöllmer, Florian Eyben, Björn Schuller, Gerhard Rigoll
Audiovisual correlates of basic emotions in blind and sighted people
Marc Swerts, Kitty Leuverink, Madelene Munnik, Vera Nijveld
Combining ranking and classification to improve emotion recognition in spontaneous speech
Houwei Cao, Ragini Verma, Ani Nenkova
Active learning by sparse instance tracking and classifier confidence in acoustic emotion recognition
Zixing Zhang, Björn Schuller
Emotion recognition using acoustic and lexical features
Viktor Rozgić, Sankaranarayanan Ananthakrishnan, Shirin Saleem, Rohit Kumar, Aravind Namandi Vembu, Rohit Prasad
Improving recognition of speaker states and traits by cumulative evidence: intoxication, sleepiness, age and gender
Felix Weninger, Erik Marchi, Björn Schuller
Speaker clustering in emotion recognition
Ni Ding, Julien Epps
Automatic detection of conflict escalation in spoken conversations
Samuel Kim, Sree Harsha Yella, Fabio Valente
The entropy of intoxicated speech - lexical creativity and heavy tongues
Uwe D. Reichel, Thomas Kisler
A robust unsupervised arousal rating framework using prosody with cross-corpora evaluation
Daniel Bone, Chi-Chun Lee, Shrikanth S. Narayanan
Unveiling the acoustic properties that describe the valence dimension
Carlos Busso, Tauhidur Rahman
Annotation and recognition of personality traits in spoken conversations from the AMI meetings corpus
Fabio Valente, Samuel Kim, Petr Motlicek
The effects of lexical tones and nasal coda /-n/ to sadness in Taiwan Hakka
Shao-ren Lyu
Confidence measures in speech emotion recognition based on semi-supervised learning
Jun Deng, Björn Schuller
Using i-vector space model for emotion recognition
Rui Xia, Yang Liu
Cries and whispers - classification of vocal effort in expressive speech
Nicolas Obin
Emotional speech: a spectral analysis
Pouria Fewzee, Fakhri Karray
Classifying skewed data: importance weighting to optimize average recall
Andrew Rosenberg
Gaze patterns in turn-taking
Catharine Oertel, Marcin Włodarczak, Jens Edlund, Petra Wagner, Joakim Gustafson
The "audio-visual face cover corpus": investigations into audio-visual speech and speaker recognition when the speaker's face is occluded by facewear
Natalie Fecher
A case study: detecting counselor reflections in psychotherapy for addictions using linguistic features
Doğan Can, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan
Synthetic speech discrimination using pitch pattern statistics derived from image analysis
Phillip L. De Leon, Bryan Stewart, Junichi Yamagishi
Pitch-scaled analysis based residual reconstruction for speech analysis and synthesis
Zhengqi Wen, Hideki Kawahara, Jianhua Tao
Robust pitch estimation using l1-regularized maximum likelihood estimation
Feng Huang, Tan Lee
A full-band adaptive harmonic representation of speech
Gilles Degottex, Yannis Stylianou
Deviation measure of waveform symmetry and its application to high-speed and temporally-fine F0 extraction for vocal sound texture manipulation
Hideki Kawahara, Masanori Morise, Ryuichi Nisimura, Toshio Irino
Hidden Markov convolutive mixture model for pitch contour analysis of speech
Kota Yoshizato, Hirokazu Kameoka, Daisuke Saito, Shigeki Sagayama
Extrinsic normalization for vocal tracts depends on the signal, not on attention
Matthias Sjerps, James M. McQueen, Holger Mitterer
Perceptual learning of /f/-/s/ by older listeners
Odette Scharenborg, Esther Janse, Andrea Weber
Correlation between vocal tract length, body height, formant frequencies, and pitch frequency for the five Japanese vowels uttered by fifteen male speakers
Hiroaki Hatano, Tatsuya Kitamura, Hironori Takemoto, Parham Mokhtari, Kiyoshi Honda, Shinobu Masaki
Detection of transition segments in VCV utterances for estimation of the place of closure of oral stops for speech training
J. Jagbandhu, K. S. Nataraj, Prem C. Pandey
Audiovisual discrimination of CV syllables: a simultaneous fMRI-EEG study
Cyril Dubois, Rudolph Sock
Contribution of spectral shapes to tone perception
Natthawut Kertkeidkachorn, Surapol Vorapatratorn, Sirinart Tangruamsub, Proadpran Punyabukkana, Atiwong Suchato
Methodological issues in assessing perceptual representation of consonant sounds in Thai
Charturong Tantibundhit, Chutamanee Onsuwan, P. Phienphanich, Chai Wutiwiwatchai
Pitch and phonological perception of tone in the Suruí language of Rondônia (Brazil): identification task of LHL and LHH tonal patterns
Julien Meyer
The role of creaky voice in Mandarin tone 2 and tone 3 perception
Rui Cao, Ratree Wayland, Edith Kaan
Can litheners retune native categories acroth a thoneme boundary?
Michael D. Tyler, Mona M. Faris
Synthetic F0 can effectively convey speaker ID in delexicalized speech
Eric Morley, Esther Klabbers, Jan P. H. van Santen, Alexander Kain, Seyed Hamidreza Mohammadi
Evaluating prosodic processing for incremental speech synthesis
Timo Baumann, David Schlangen
Expressing speaker's intentions through sentence-final intonations for Japanese conversational speech synthesis
Kazuhiko Iwata, Tetsunori Kobayashi
Modeling pause-duration for style-specific speech synthesis
Alok Parlikar, Alan W. Black
Enumerating differences between various communicative functions for purposes of Czech expressive speech synthesis in limited domain
Martin Gruber
Quality analysis of macroprosodic F0 dynamics in text-to-speech signals
Christoph R. Norrenbrock, Florian Hinterleitner, Ulrich Heute, Sebastian Möller
Improved automatic extraction of generation process model commands and its use for generating fundamental frequency contours for training HMM-based speech synthesis
Hiroya Hashimoto, Keikichi Hirose, Nobuaki Minematsu
Discontinuous observation HMM for prosodic-event-based F0 generation
Tomoki Koriyama, Takashi Nose, Takao Kobayashi
Hierarchical English emphatic speech synthesis based on HMM with limited training data
Fanbo Meng, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai
Employing sentence structure: syntax trees as prosody generators
Sarah Hoffmann, Beat Pfister
A stochastic model of singing voice F0 contours for characterizing expressive dynamic components
Yasunori Ohishi, Hirokazu Kameoka, Daichi Mochihashi, Kunio Kashino
Study on integration of speaker diarization with speaker adaptive speech recognition for broadcast transcription
Jan Silovsky, Petr Cerva, Jindrich Zdansky, Jan Nouza
On the use of spectral and iterative methods for speaker diarization
Stephen Shum, Najim Dehak, James Glass
Where did I go wrong?: identifying troublesome segments for speaker diarization systems
Mary Tai Knox, Nikki Mirghafori, Gerald Friedland
Speaker diarization of overlapping speech based on silence distribution in meeting recordings
Sree Harsha Yella, Fabio Valente
Phone adaptive training for speaker diarization
Simon Bozonnet, Ravichander Vipperla, Nicholas Evans
Compensating for ageing and quality variation in speaker verification
Finnian Kelly, Andrzej Drygajlo, Naomi Harte
Calibration of probabilistic age recognition
David van Leeuwen, Mohamad Hasan Bahari
Age estimation from telephone speech using i-vectors
Mohamad Hasan Bahari, Mitchell McLaren, Hugo Van hamme, David van Leeuwen
A factorized representation of FMLLR transform based on QR-decomposition
Shakti P. Rath, Martin Karafiát, Ondřej Glembek, Jan Černocký
A correlational discriminant approach to feature extraction for robust speech recognition
Vikrant Singh Tomar, Richard C. Rose
Discriminative training using non-uniform criteria for keyword spotting on spontaneous speech
Chao Weng, Biing-Hwang (Fred) Juang, Daniel Povey
Discriminative reranking for LVCSR leveraging invariant structure
Masayuki Suzuki, Gakuto Kurata, Masafumi Nishimura, Nobuaki Minematsu
Discriminative fuzzy clustering maximum a posterior linear regression for speaker adaptation
Ting-yao Hu, Yu Tsao, Lin-shan Lee
Simultaneous discriminative training and mixture splitting of HMMs for speech recognition
Muhammad Ali Tahir, Markus Nussbaum-Thom, Ralf Schlüter, Hermann Ney
Low-SNR, speaker-dependent speech enhancement using GMMs and MFCCs
Laura Boucheron, Phillip L. De Leon
Can modified casual speech reach the intelligibility of clear speech?
Maria Koutsogiannaki, Michelle Pettinato, Cassie Mayo, Varvara Kandia, Yannis Stylianou
Speech enhancement using sparse convolutive non-negative matrix factorization with basis adaptation
Michael A. Carlin, Nicolas Malyska, Thomas F. Quatieri
Inventory-based audio-visual speech enhancement
Dorothea Kolossa, Robert Nickel, Steffen Zeiler, Rainer Martin
Utilization of the Lombard effect in post-filtering for intelligibility enhancement of telephone speech
Emma Jokinen, Paavo Alku, Martti Vainio
Speech enhancement by online non-negative spectrogram decomposition in nonstationary noise environments
Zhiyao Duan, Gautham J. Mysore, Paris Smaragdis
Phoneme resistance during speech-in-speech comprehension
Léo Varnet, Julien Meyer, Michel Hoen, Fanny Meunier
smile with a smile
Hugo Quené, Will Schuerman
Interactions between turn-taking gaps, disfluencies and social obligation
Rebecca Lunsford, Peter A. Heeman, Jan P. H. van Santen
Effect of being seen on the production of visible speech cues. a pilot study on Lombard speech
Maëva Garnier, Lucie Ménard, Gabrielle Richard
Temporal entrainment in overlapped speech: cross-linguistic study
Marcin Włodarczak, Juraj Šimko, Petra Wagner
Based on isolated saliency or causal integration? toward a better understanding of human annotation process using multiple instance learning and sequential probability ratio test
Chi-Chun Lee, Athanasios Katsamanis, Panayiotis G. Georgiou, Shrikanth S. Narayanan
Contrasting cues to verbal and non-verbal backchannels in multi-lingual dyadic rapport
Gina-Anne Levow, Susan Duncan
Prosodic measurements and question types in the Spontal corpus of Swedish dialogues
Sofia Strömbergsson, Jens Edlund, David House
Measuring prosodic alignment in cooperative task-based conversations
Khiet P. Truong, Dirk Heylen
On the dynamics of overlap in multi-party conversation
Kornel Laskowski, Mattias Heldner, Jens Edlund
On the acoustics of overlapping laughter in conversational speech
Khiet P. Truong, Jürgen Trouvain
A corpus-based study of interruptions in spoken dialogue
Agustín Gravano, Julia Hirschberg
Text-to-speech intelligibility across speech rates
Ann K. Syrdal, H. Timothy Bunnell, Susan R. Hertz, Taniya Mishra, Murray Spiegel, Corine Bickley, Deborah Rekart, Matthew J. Makashay
Objective intelligibility assessment of text-to-speech system using template constrained generalized posterior probability
Linfang Wang, Lijuan Wang, Yan Teng, Zhe Geng, Frank K. Soong
Mel cepstral coefficient modification based on the glimpse proportion measure for improving the intelligibility of HMM-generated synthetic speech in noise
Cassia Valentini-Botinhao, Junichi Yamagishi, Simon King
Speech-in-noise intelligibility improvement based on spectral shaping and dynamic range compression
Tudor-Catalin Zorila, Varvara Kandia, Yannis Stylianou
Implementation of simple spectral techniques to enhance the intelligibility of speech using a harmonic model
Daniel Erro, Yannis Stylianou, Eva Navas, Inma Hernáez
Making conversational vowels more clear
Seyed Hamidreza Mohammadi, Alexander Kain, Jan P. H. van Santen
Naturalness judgement of prosodic variation of Japanese utterances with prosody modified stimuli
Chiharu Tsurutani, Shunichi Ishihara
Effects of dialectal origin on articulation rate in French
Mathieu Avanzi, Pauline Dubosson, Sandra Schwab
A new approach of speaking rate modeling for Mandarin speech prosody
Chiao-Hua Hsieh, Chen-Yu Chiang, Yih-Ru Wang, Hsiu-Min Yu, Sin-Horng Chen
Modelling pause duration as a function of contextual length
David Doukhan, Albert Rilliard, Sophie Rosset, Christophe D'Alessandro
Production and perception of focus in PFC and non-PFC languages: comparing Beijing Mandarin and Hainan Tsat
Bei Wang, Chenxia Li, Qian Wu, Xiaxia Zhang, Baofeng Wang, Yi Xu
Prosodic realization of focus in statement and question in Tibetan (Lhasa dialect)
Xiaxia Zhang, Bei Wang, Qian Wu, Yi Xu
Effect of noise type and level on focus related fundamental frequency changes
Martti Vainio, Daniel Aalto, Antti Suni, Anja Arnhold, Tuomo Raitio, Henri Seijo, Juhani Järvikivi, Paavo Alku
Role of prosody in automatic modality recognition of Bangla speech
Anal Warsi, Tulika Basu, Debasis Mazumdar
Where to associate stressed additive particles? evidence from speech prosody
Bettina Braun
From PVI to perception: a return to the roots of rhythm in broadcast news
Matthew Benton
A methodology for the study of rhythm in drummed forms of languages: application to Bora Manguare of Amazon
Julien Meyer, Laure Dentel, Frank Seifart
Perception of pitch contours among native tone listeners
Ratree Wayland, Donruethai Laphasradakul, Edith Kaan, Rui Cao
Pitch range control of Japanese boundary pitch movements
Yosuke Igarashi, Hanae Koiso
Perceived prosodic boundaries in Taiwanese and their acoustic correlates
Grace Kuo
Phonetic foreignization of Mandarin for dubbing in imported western movies
Laying Hon, Yuan Jia, Aijun Li
Prosodic context-based analysis of disfluencies
Helena Moniz, Fernando Batista, Isabel Trancoso, Ana Isabel Mata
Describing the development of intonational categories using a target-oriented parametric approach
Britta Lintfert, Bernd Möbius
Automatic detection of high vocal effort in telephone speech
Jouni Pohjalainen, Tuomo Raitio, Hannu Pulakka, Paavo Alku
Analysis of mimicry speech
D. Gomathi, Sathya Adithya Thati, Karthik Venkat Sridaran, Bayya Yegnanarayana
Estimation of the vocal tract shape of nasals using a Bayesian scheme
Christian H. Kasess, Wolfgang Kreuzer, Ewald Enzinger, Nadja Kerschhofer-Puhalo
Advances in combined electro-optical palatography
Peter Birkholz, Philippe Dächert, Christiane Neuschaefer-Rube
Noise robust pitch tracking by subband autocorrelation classification
Byung Suk Lee, Daniel P. W. Ellis
Inference of critical articulator position for fricative consonants
Alexander Sepulveda, Rodrigo Capobianco-Guido, German Castellanos-Dominguez
Vocal tremor measurement based on autocorrelation of contours
Markus Brückl
Model-based duration-difference approach on accent evaluation of L2 learner
Chatchawarn Hansakunbuntheung, Ananlada Chotimongkol, Sumonmas Thatphithakkul, Patcharika Chootrakool
Continuous articulatory-to-acoustic mapping using phone-based trajectory HMM for a silent speech interface
Thomas Hueber, Gérard Bailly, Bruce Denby
Prediction of turn-taking by combining prosodic and eye-gaze information in poster conversations
Tatsuya Kawahara, Takuma Iwatate, Katsuya Takanashi
Using quality ratings to predict modality choice in multimodal systems
Ina Wechsung, Klaus-Peter Engelbrecht, Sebastian Möller
HMM based continuous EOG recognition for eye-input speech interface
Fuming Fang, Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa, Sadaoki Furui, Toshimitsu Musha
A random, semantically appropriate sentence generator for speaker verification
Jason Lilley, Amanda Stent, Ilija Zeljkovic
Coherent topic transition in a conversational agent
Daniel Macias-Galindo, Wilson Wong, Lawrence Cavedon, John Thangarajah
Using reinforcement learning for dialogue management policies: towards understanding MDP violations and convergence
Peter A. Heeman, Jordan Fryer, Rebecca Lunsford, Andrew Rueckert, Ethan Selfridge
Enhancing speech understanding in spoken dialogue systems by means of a new frame-correction technique
Ramón López-Cózar, Zoraida Callejas, David Griol
Prosodic cues to disengagement and uncertainty in physics tutorial dialogues
Diane Litman, Heather Friedberg, Kate Forbes-Riley
Spoken dialogs with a virtual science tutor
Wayne H. Ward, Daniel Bolanos, Ronald A. Cole
Real-time lecture transcription using ASR for Czech hearing impaired or deaf students
Petr Cerva, Jan Silovsky, Jindrich Zdansky, Jan Nouza, Jiri Malek
Application of structural events detected on ASR outputs for automated speaking assessment
Lei Chen, Su-Youn Yoon
Addressing confusions in spoken language in ESL pronunciation tutors
Oscar Saz, Maxine Eskenazi
The use of DBN-HMMs for mispronunciation detection and diagnosis in L2 English to support computer-aided pronunciation training
Xiaojun Qian, Helen Meng, Frank K. Soong
Practice and feedback in L2 speaking: an evaluation of the DISCO CALL system
Catia Cucchiarini, Joost van Doremalen, Helmer Strik
Cross-speaker acoustic-to-articulatory inversion using phone-based trajectory HMM for pronunciation training
Thomas Hueber, Atef Ben-Youssef, Gérard Bailly, Pierre Badin, Frédéric Elisei
MAP estimation of whole-word acoustic models with dictionary priors
Keith Kintzley, Aren Jansen, Hynek Hermansky
Data-driven posterior features for low resource speech recognition applications
Samuel Thomas, Sriram Ganapathy, Aren Jansen, Hynek Hermansky
Sparse Bayesian factor analysis for stereo-based stochastic mapping
Xiaodong Cui, Mohamed Afify, George Saon, Vaibhava Goel
Word discovery with beta process factor analysis
Niklas Vanhainen, Giampiero Salvi
Speaker adaptation using variational Bayesian linear regression in normalized feature space
Seong-Jun Hahm, Atsunori Ogawa, Masakiyo Fujimoto, Takaaki Hori, Atsushi Nakamura
Bayesian feature enhancement for ASR of noisy reverberant real-world data
Alexander Krueger, Oliver Walter, Volker Leutnant, Reinhold Haeb-Umbach
Robust tracking for automatic reading tutors
Emre Yilmaz, Dirk van Compernolle, Hugo Van hamme
Maximum F1-score discriminative training for automatic mispronunciation detection in computer-assisted language learning
Hao Huang, Jianming Wang, Halidan Abudureyimu
Error pattern detection integrating generative and discriminative learning for computer-aided pronunciation training
Yow-Bang Wang, Lin-Shan Lee
The automatic assessment of non-native prosody: combining classical prosodic analysis with acoustic modelling
Florian Hönig, Tobias Bocklet, Korbinian Riedhammer, Anton Batliner, Elmar Nöth
Improving L1-specific phonological error diagnosis in computer assisted pronunciation training
Theban Stanley, Kadri Hacioglu
A self-learning assistive vocal interface based on vocabulary learning and grammar induction
Jort F. Gemmeke, Janneke van de Loo, Guy de Pauw, Joris Driesen, Hugo Van hamme, Walter Daelemans
Real-time visualization of English pronunciation on an IPA chart based on articulatory feature extraction
Yurie Iribe, Takurou Mori, Kouichi Katsurada, Goh Kawai, Tsuneo Nitta
Acoustic feature-based non-scorable response detection for an automated speaking proficiency assessment
Je Hun Jeon, Su-Youn Yoon
Pronunciation quality evaluation of sentences by combining word based scores
Jorge Wuth, Néstor Becerra Yoma, Leopoldo Benavides, Hiram Vivanco
Designing a spoken language interface for a tutorial dialogue system
Peter Bell, Myroslava Dzikovska, Amy Isard
Automatic pronunciation error detection based on extended pronunciation space using the unsupervised clustering of pronunciation errors
Long Zhang, Haifeng Li, Lin Ma
Less errors with TTS? a dictation experiment with foreign language learners
Thomas Pellegrini, Ângela Costa, Isabel Trancoso
Improvement in automatic pronunciation scoring using additional basic scores and learning to rank
Liang-Yu Chen, Jyh-Shing Roger Jang
Automatic tone assessment of non-native Mandarin speakers
Jian Cheng
On the modeling of voiceless stop sounds of speech using adaptive quasi-harmonic models
George P. Kafentzis, Olivier Rosec, Yannis Stylianou
An alignment matching method to explore pseudosyllable properties across different corpora
Raymond W. M. Ng, Thomas Hain, Keikichi Hirose
Deep architectures for articulatory inversion
Benigno Uria, Iain Murray, Steve Renals, Korin Richmond
Automatic measurement of positive and negative voice onset time
Katharine Henry, Morgan Sonderegger, Joseph Keshet
Efficient multipulse approximation of speech excitation using the most singular manifold
Vahid Khanagha, Khalid Daoudi
Intrinsic spectral analysis for zero and high resource speech recognition
Aren Jansen, Samuel Thomas, Hynek Hermansky
Computational modelling of the recognition of foreign-accented speech
Odette Scharenborg, Marijt Witteman, Andrea Weber
The production and perception of Estonian quantity degrees by native and non-native speakers
Lya Meister, Einar Meister
Perception of the moraic obstruent /q/: a cross-linguistic study
Makiko Sadakata, Mizuki Shingai, Alex Brandmeyer, Kaoru Sekiyama
Comparative analysis of intensity between native speakers and Japanese speakers of English
Tomoko Nariai, Kazuyo Tanaka, Tatsuya Kawahara
Auditory and dynamic modeling paradigms to detect L2 mispronunciations
Christos Koniaris, Olov Engwall, Giampiero Salvi
Cross linguistic comparison of Mandarin and English EMA articulatory data
Sheng Li, Lan Wang
Physiological and acoustic study of word initial post-lexical gemination in Moroccan Arabic
Chakir Zeroual, Diamantis Gafos, Phil Hoole, John Esling
Perceptual assimilation of Arabic voiceless fricatives by English monolinguals
Michael D. Tyler, Sarah Fenwick
Non-auditory cognitive capabilities in computational modeling of early language acquisition
Okko Räsänen
Modeling spoken language acquisition with a generic cognitive architecture for associative learning
Okko Räsänen, Heikki Rasilo, Unto K. Laine
Pitch estimation based on long frame harmonic model and short frame average correlation coefficient
Dongmei Wang, Philipos C. Loizou
Diagnostic prediction of transmitted speech quality: a new framework for signal-based and parametric models
Sebastian Möller, Marcel Wältermann, Nicolas Côté
Enumerative algebraic coding for ACELP
Tom Bäckström
Speech enhancement with bivariate gamma model
Atanu Saha, Tetsuya Shimamura
Improvements of the beta-order minimum mean-square error (MMSE) spectral amplitude estimator using chi priors
Marek Trawicki, Michael Johnson
Enhancing speech by reconstruction from robust acoustic features
Philip Harding, Ben Milner
Joint pitch-analysis formant-synthesis framework for CS recovery of speech
Srikanth Raj Chetupally, Thippur V. Sreenivas
A new noise-tracking algorithm for generalizing binary time-frequency (t-f) masking to ratio masking
Shan Liang, Wei Jiang, Wenju Liu
Optimised spectral weightings for noise-dependent speech intelligibility enhancement
Yan Tang, Martin Cooke
Exploring rich expressive information from audiobook data using cluster adaptive training
Langzhou Chen, Mark J. F. Gales, Vincent Wan, Javier Latorre, Masami Akamine
Turning a monolingual speaker into multilingual for a mixed-language TTS
Ji He, Yao Qian, Frank K. Soong, Sheng Zhao
Using HMM-based speech synthesis to reconstruct the voice of individuals with degenerative speech disorders
Christophe Veaux, Junichi Yamagishi, Simon King
Speech factorization for HMM-TTS based on cluster adaptive training
Javier Latorre, Vincent Wan, Mark J. F. Gales, Langzhou Chen, K. K. Chin, Kate Knill, Masami Akamine
Factored MLLR adaptation algorithm for HMM-based expressive TTS
June Sig Sung, Doo Hwa Hong, Hyun Woo Koo, Nam Soo Kim
Speaker-adaptive visual speech synthesis in the HMM-framework
Dietmar Schabus, Michael Pucher, Gregor Hofer
Cross-lingual speaker adaptation for HMM-based speech synthesis based on perceptual characteristics and speaker interpolation
Viviane de Franca Oliveira, Sayaka Shiota, Yoshihiko Nankaku, Keiichi Tokuda
C2H: a computational model of H&H-based phonetic contrast in synthetic speech
Mauro Nicolao, Javier Latorre, Roger K. Moore
Vowel creation by articulatory control in HMM-based parametric speech synthesis
Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi
Analysis of speaker clustering strategies for HMM-based speech synthesis
Rasmus Dall, Christophe Veaux, Junichi Yamagishi, Simon King
Word relevance modeling for speech recognition
Kuan-Yu Chen, Hao-Chin Chang, Berlin Chen, Hsin-Min Wang
Using context-free grammars for embedded speech recognition with weighted finite-state transducers
Frank Duckhorn, Rüdiger Hoffmann
Automatic transcription error recovery for person name recognition
Richard Dufour, Géraldine Damnati, Delphine Charlet, Frédéric Béchet
Efficient beam width control to suppress excessive speech recognition computation time based on prior score range normalization
Satoshi Kobashikawa, Takaaki Hori, Yoshikazu Yamaguchi, Taichi Asami, Hirokazu Masataki, Satoshi Takahashi
Search space pruning based on anticipated path recombination in LVCSR
David Nolden, Ralf Schlüter, Hermann Ney
Estimating word-stability during incremental speech recognition
Ian McGraw, Alexander Gruenstein
Using broad phonetic classes to guide search in automatic speech recognition
Stefan Ziegler, Bogdan Ludusan, Guillaume Gravier
Parallel combination of multilingual speech streams for improved ASR
João Miranda, João Paulo da Silva Neto, Alan W. Black
Low latency combination of parallelized single-pass LVCSR systems
Fethi Bougares, Mickael Rouvier, Yannick Estève, Georges Linarès
Efficient on-the-fly hypothesis rescoring in a hybrid GPU/CPU-based large vocabulary continuous speech recognition engine
Jungsuk Kim, Jike Chong, Ian Lane
Fully automated neuropsychological assessment for detecting mild cognitive impairment
Maider Lehr, Emily Prud'hommeaux, Izhak Shafran, Brian Roark
Spontaneous-speech acoustic-prosodic features of children with autism and the interacting psychologist
Daniel Bone, Matthew P. Black, Chi-Chun Lee, Marian E. Williams, Pat Levitt, Sungbok Lee, Shrikanth Narayanan
Contrastive intonation in autism: the effect of speaker- and listener-perspective
Constantijn Kaland, Emiel Krahmer, Marc Swerts
Characterizing covert articulation in apraxic speech using real-time MRI
Christina Hagedorn, Michael Proctor, Louis Goldstein, Maria Luisa Gorno Tempini, Shrikanth S. Narayanan
Automatic word naming recognition for treatment and assessment of aphasia
Alberto Abad, Anna Pompili, Angela Costa, Isabel Trancoso
Vocal-source biomarkers for depression: a link to psychomotor activity
Thomas F. Quatieri, Nicolas Malyska
Audio and contact microphones for cough detection
Thomas Drugman, Jerome Urbain, Nathalie Bauwens, Ricardo Chessini, Anne-Sophie Aubriot, Patrick Lebecque, Thierry Dutoit
Analyzing and interpreting automatically learned rules across dialects
Nancy F. Chen, Wade Shen, Joseph P. Campbell
The effect of use of drugs on speaker's fundamental frequency and formants
Andrey Raev, Yuri Matveev, Tatiana Goloshchapova
On the assessment of audiovisual cues to speaker confidence by preteens with typical development (TD) and a-typical development (AD)
Marc Swerts, Cees de Bie
Interplay between verbal response latency and physiology of children with autism during ECA interactions
Theodora Chaspari, Chi-Chun Lee, Shrikanth Narayanan
Combination of multiple speech dimensions for automatic assessment of dysarthric speech intelligibility
Myung Jong Kim, Hoirin Kim
Whole-word recognition from articulatory movements for silent speech interfaces
Jun Wang, Ashok Samal, Jordan R. Green, Frank Rudzicz
Verifying session level pronunciation accuracy in a speech therapy application
Shou-Chun Yin, Richard C. Rose, Yun Tang
Duration of ambulatory monitoring needed to accurately estimate voice use
Daryush D. Mehta, Rebecca Woodbury Listfield, Harold A. Cheyne II, James T. Heaton, Shengran W. Feng, Matías Zañartu, Robert E. Hillman
Evaluating NLP features for automatic prediction of language impairment using child speech transcripts
Khairun-nisa Hassanali, Yang Liu, Thamar Solorio
Quantitative analysis of pitch in speech of children with neurodevelopmental disorders
Géza Kiss, Jan P. H. van Santen, Emily Prud'hommeaux, Lois M. Black
Discriminatively learning factorized finite state pronunciation models from dynamic Bayesian networks
Preethi Jyothi, Eric Fosler-Lussier, Karen Livescu
Joint decoding for speech recognition and semantic tagging
Anoop Deoras, Ruhi Sarikaya, Gokhan Tur, Dilek Hakkani-Tür
Investigation of maximum entropy hybrid language models for open vocabulary German and Polish LVCSR
M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney
A specialized WFST approach for class models and dynamic vocabulary
Paul R. Dixon, Chiori Hori, Hideki Kashioka
Dynamic grammars with lookahead composition for WFST-based speech recognition
Josef R. Novak, Nobuaki Minematsu, Keikichi Hirose
Knowledge-based word lattice rescoring in a dynamic context
Todd Shore, Friedrich Faubel, Hartmut Helmke, Dietrich Klakow
Mixture component clustering for efficient speaker verification
Richard D. McClanahan, Phillip L. De Leon
Front-end channel compensation using mixture-dependent feature transformations for i-vector speaker recognition
Taufiq Hasan, John H. L. Hansen
Query-by-example using speaker content graphs
William M. Campbell, Elliot Singer
Unsupervised NAP training data design for speaker recognition
Hanwu Sun, Bin Ma
The role of score calibration in speaker recognition
George Doddington
A Bayesian approach to speaker recognition based on GMMs using multiple model structures
Takafumi Hattori, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
Residual phase cepstrum coefficients with application to cross-lingual speaker verification
Jianglin Wang, Michael Johnson
Speaker verification using neighborhood preserving embedding
Chunyan Liang, Jinchao Yang, Lin Yang, Yonghong Yan
Discriminative decision function based scoring method in joint factor analysis for speaker verification
Chunyan Liang, Xiang Zhang, Lin Yang, Yonghong Yan
Integrated feature normalization and enhancement for robust speaker recognition using acoustic factor analysis
Taufiq Hasan, John H. L. Hansen
Factor analysis and nuisance attribute projection revisited
Lukáš Machlica, Zbyněk Zajic
Compensation of intrinsic variability with factor analysis modeling for robust speaker verification
Sheng Chen, Mingxing Xu
RSR2015: database for text-dependent speaker verification using multiple pass-phrases
Anthony Larcher, Kong Aik Lee, Bin Ma, Haizhou Li
Speaker idiosyncratic rhythmic features in the speech signal
Volker Dellwo, Adrian Leemann, Marie-José Kolly
Bilinear factor analysis for i-vector based speaker verification
Yun Lei, Lukáš Burget, Nicolas Scheffer
Unsupervised speaker identification using overlaid texts in TV broadcast
Johann Poignant, Hervé Bredin, Viet Bac Le, Laurent Besacier, Claude Barras, Georges Quénot
Mask estimation and refinement for MFT-based robust speaker verification
Yali Zhao, Lie Xie, Zhonghua Fu
Sparse probabilistic linear discriminant analysis for speaker verification
Hai Yang, Chunyan Liang, Yunfei Xu, Lin Yang, Yonghong Yan
Study of the effect of i-vector modeling on short and mismatch utterance duration for speaker verification
Achintya Kumar Sarkar, Driss Matrouf, Pierre Michel Bousquet, Jean-François Bonastre
Ensemble classifiers using unsupervised data selection for speaker recognition
Chien-Lin Huang, Chiori Hori, Hideki Kashioka, Bin Ma
A method of speaker identification based on phoneme mean F-Ratio contribution
Songgun Hyon, Hongcui Wang, Chen Zhao, Jianguo Wei, Jianwu Dang
Mitigating effects of recording condition mismatch in speaker recognition using partial least squares
Jeremiah J. Remus, Jenniffer M. Estrada, Stephanie A. C. Schuckers
Similarities in fundamental frequency in infant speech segmentation models
Ellen Marklund, Francisco Lacerda, Iris-Corinna Schwarz, Ulla Sundberg
Phonological complexity and vocabulary size in 30-month-old Swedish children
Ulrika Marklund, Ulla Sundberg, Iris-Corinna Schwarz, Francisco Lacerda
Auditory-visual speech to infants and adults: signals and correlations
Jeesun Kim, Chris Davis, Christine Kitamura
Objective child vocal development measurement with naturalistic daylong audio recording
Dongxin Xu, Jill Gilkerson, Jeffery A. Richards
Speech production-perception relationships in children with speech delay
Kyoko Nagao, Mark Paullin, Vilena Livinsky, James B. Polikoff, Linda D. Vallino, Thierry G. Morlet, N. Carolyn Schanen, H. Timothy Bunnell
Synthetic correction of deviant speech – children's perception of phonologically modified recordings of their own speech
Sofia Strömbergsson
Combining multiple high quality corpora for improving HMM-TTS
Vincent Wan, Javier Latorre, K. K. Chin, Langzhou Chen, Mark J. F. Gales, Heiga Zen, Kate Knill, Masami Akamine
An evaluation of parameter generation methods with rich context models in HMM-based speech synthesis
Shinnosuke Takamichi, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, Sakriani Sakti, Satoshi Nakamura
Using Bayesian networks to find relevant context features for HMM-based speech synthesis
Heng Lu, Simon King
Considering global variance of the log power spectrum derived from mel-cepstrum in HMM-based parametric speech synthesis
Xiang Yin, Zhen-Hua Ling, Ming Lei, Lirong Dai
A speech parameter generation algorithm using local variance for HMM-based speech synthesis
Vataya Chunwijitra, Takashi Nose, Takao Kobayashi
Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP
Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima, Masami Akamine
Wideband parametric speech synthesis using warped linear prediction
Tuomo Raitio, Antti Suni, Martti Vainio, Paavo Alku
Modeling the creaky excitation for parametric speech synthesis
Thomas Drugman, John Kane, Christer Gobl
Amplitude spectrum based excitation model for HMM-based speech synthesis
Zhengqi Wen, Jianhua Tao
Speech synthesis using a non-maximally decimated filter bank for embedded systems
Nobuyuki Nishizawa, Tsuneo Kato
Ways to implement global variance in statistical speech synthesis
Hanna Silén, Elina Helander, Jani Nurminen, Moncef Gabbouj
HMM-based speech synthesis using sub-band basis spectrum model
Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Takehiko Kagoshima, Masami Akamine
Comparing different acoustic modeling techniques for multilingual boosting
David Imseng, John Dines, Petr Motlicek, Philip N. Garner, Hervé Bourlard
Model-based approaches to adaptive training in reverberant environments
Yongqiang Wang, Mark J. F. Gales
Model-based approaches for degraded channel modelling in robust ASR
Mark J. F. Gales, Federico Flego
Improved model selection for the ASR-driven binary mask
William Hartmann, Eric Fosler-Lussier
Accelerated batch learning of convex log-linear models for LVCSR
Simon Wiesler, Ralf Schlüter, Hermann Ney
Improving discriminative training for robust acoustic models in large vocabulary continuous speech recognition
Janne Pylkkönen, Mikko Kurimo
Semi-supervised methods for improving keyword search of unseen terms
Scott Novotney, Ivan Bulyko, Richard Schwartz, Sanjeev Khudanpur, Owen Kimball
Probabilistic speaker-class based acoustic modeling for large vocabulary continuous speech recognition
Xiangang Li, Dan Su, Zaihu Pang, Xihong Wu
Classification of stressed speech using physical parameters derived from two-mass model
Xiao Yao, Takatoshi Jitsuhiro, Chiyomi Miyajima, Norihide Kitaoka, Kazuya Takeda
IVN-based joint training of GMM and HMMs using an improved VTS-based feature compensation for noisy speech recognition
Jun Du, Qiang Huo
Amplitude modulation filters as feature sets for robust ASR: constant absolute or relative bandwidth?
Niko Moritz, Jörn Anemüller, Birger Kollmeier
Effect of speech priors in single-channel speech-music separation for ASR
Cemil Demir, A. Taylan Cemgil, Murat Saraçlar
On the role of binary mask pattern in automatic speech recognition
Arun Narayanan, DeLiang Wang
Dereverberation based on wavelet packet filtering for robust automatic speech recognition
Randy Gomez, Tatsuya Kawahara
Spectral intersections for non-stationary signal separation
Trausti Kristjansson, Thad Hughes
Speech recognition by denoising and dereverberation based on spectral subtraction in a real noisy reverberant environment
Kyohei Odani, Longbiao Wang, Atsuhiko Kai
Q-Gaussian based spectral subtraction for robust speech recognition
Hilman F. Pardede, Koichi Shinoda, Koji Iwano
Hooking up spectro-temporal filters with auditory-inspired representations for robust automatic speech recognition
Bernd T. Meyer, Constantin Spille, Birger Kollmeier, Nelson Morgan
Feature extraction based on hearing system signal processing for robust large vocabulary speech recognition
Qi Peter Li, Xie Sun
Automatic estimation of the first two subglottal resonances in children's speech with application to speaker normalization in limited-data conditions
Harish Arsikere, Gary K. F. Leung, Steven M. Lulich, Abeer Alwan
Robust phoneme recognition based on biomimetic speech contours
Michael A. Carlin, Kailash Patil, Sridhar Krishna Nemala, Mounya Elhilali
A feature space transformation method for personalization using generalized i-vector clustering
Kaisheng Yao, Yifan Gong, Chaojun Liu
Longer features: they do a speech detector good
T. J. Tsai, Nelson Morgan
Robust feature extraction for speech recognition by enhancing auditory spectrum
Md Jahangir Alam, Patrick Kenny, Douglas O'Shaughnessy
Enhancing vocal tract length normalization with elastic registration for automatic speech recognition
Florian Müller, Alfred Mertins
Beamforming using uniform circular arrays for distant speech recognition in reverberant environments and double talk scenarios
Hannes Pessentheiner, Stefan Petrik, Harald Romsdorfer
Novel approach to live captioning through re-speaking: tailoring speech recognition to re-speaker's needs
Aleš Pražák, Zdeněk Loos, Jan Trmal, Josef V. Psutka, Josef Psutka
Development and evaluation of automatic punctuation for French and English speech-to-text
Jáchym Kolář, Lori Lamel
Spoken document clustering using word confusion networks
Shajith Ikbal, Sachindra Joshi, Ashish Verma, Om D. Deshmukh
Dynamic conditional random fields for joint sentence boundary and punctuation prediction
Xuancong Wang, Hwee Tou Ng, Khe Chai Sim
Analysis of the characteristics of talk-show TV programs
Fabio Brugnara, Daniele Falavigna, Diego Giuliani, Roberto Gretter
Rethinking the corpus: moving towards dynamic linguistic resources
Andrew Rosenberg
Speaker recognition for children's speech
Saeid Safavi, Maryam Najafian, Abualsoud Hanani, Martin Russell, Peter Jančovič, Michael Carey
A simple and efficient method to align very long speech signals to acoustically imperfect transcriptions
Germán Bordel, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, Amparo Varona
Estimation of talker's head orientation based on discrimination of the shape of cross-power spectrum phase coefficients
Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki
Sentence detection using multiple annotations
Ann Lee, James Glass
A speaker-role based approach for detecting politicians in TV broadcast news
Delphine Charlet, Géraldine Damnati
Relative importance of temporal envelope and fine structure cues in low- and high-order harmonic regions for Mandarin lexical-tone recognition
Guangting Mai
Real-time implementation of multi-band frequency compression for listeners with moderate sensorineural impairment
Nitya Tiwari, Prem C. Pandey, Pandurangarao N. Kulkarni
Word prominence detection using robust yet simple prosodic features
Taniya Mishra, Vivek Rangarajan Sridhar, Alistair Conkie
Online story segmentation of multilingual streaming broadcast news
Amit Srivastava, Saurabh Khanwalkar, Gretchen Markiewicz, Guruprasad Saikumar
Average spectrotemporal structure of continuous speech matches with the frequency resolution of human hearing
Okko Räsänen
Perceptual importance of the phase related information in speech
Ibon Saratxaga, Inma Hernáez, Michael Pucher, Eva Navas, Iñaki Sainz
Improving the entropy estimate of neuronal firings of modeled cochlear nucleus neurons
Andrea Grigorescu, Marek Rudnicki, Michael Isik, Werner Hemmert, Stefano Rini
Perception of synthetic speech in adult users of cochlear implants
Kyoko Nagao, Mark Paullin, James B. Polikoff, Jason Lilley, H. Timothy Bunnell
Hearing loss and the use of acoustic cues in phonetic categorisation of fricatives
Odette Scharenborg, Esther Janse
Intelligibility of speech spoken in noise/reverberation for older adults in reverberant environments
Nao Hodoshima, Takayuki Arai, Kiyohiro Kurisu
Improved speech intelligibility with a chimaera hearing aid algorithm
Andrew Hines, Naomi Harte
Unsupervised acoustic analyses of normal and Lombard speech, with spectral envelope transformation to improve intelligibility
Elizabeth Godoy, Yannis Stylianou
The effect of dichotic processing on the perception of binaural cues
Akiko Amano-Kusumoto, Justin M. Aronoff, Motokuni Itoh, Sigfrid D. Soli
Speech and speaker separation in human auditory cortex
Nima Mesgarani, Edward Chang
On the effect of the acoustic environment on the accuracy of perception of speaker orientation from auditory cues alone
Jens Edlund, Mattias Heldner, Joakim Gustafson
Sibilant speech detection in noise
Sira Gonzalez, Mike Brookes
Voice activity detection using speech recognizer feedback
Kit Thambiratnam, Weiwu Zhu, Frank Seide
Descriptive vocabulary development for degraded speech
Dushyant Sharma, Gaston Hilkhuysen, Patrick A. Naylor, Nikolay D. Gaubitch, Mark Huckvale, Mike Brookes
Overlapped speech detection in meeting using cross-channel spectral subtraction and spectrum similarity
Ryo Yokoyama, Yu Nasu, Koichi Shinoda, Koji Iwano
Speech restoration based on deep learning autoencoder with layer-wised pretraining
Xugang Lu, Shigeki Matsuda, Chiori Hori, Hideki Kashioka
Detection and positioning of overlapped sounds in a room environment
Rupayan Chakraborty, Climent Nadeu, Taras Butko
Foreground speech segmentation using zero frequency filtered signal
K. T. Deepak, Biswajit Dev Sarma, S. R. Mahadeva Prasanna
The effect of spectral estimator on common spectral measures for sibilant fricatives
Patrick Reidy, Mary Beckman
Gaussian mixture gain priors for regularized nonnegative matrix factorization in single-channel source separation
Emad M. Grais, Hakan Erdogan
Speaker independent single channel source separation using sinusoidal features
Shivesh Ranjan, Karen L. Payton, Pejman Mowlaee
Boosting classification based speech separation using temporal dynamics
Yuxuan Wang, DeLiang Wang
Acoustic features for classification based speech separation
Yuxuan Wang, Kun Han, DeLiang Wang
Hidden Markov models as priors for regularized nonnegative matrix factorization in single-channel source separation
Emad M. Grais, Hakan Erdogan
Unconstrained speech separation by composition of longest segments
Ming Ji, Ramji Srinivasan, Danny Crookes
Modulation domain blind source separation for noisy speech mixture
Yi Zhang, Yunxin Zhao
Phase estimation for signal reconstruction in single-channel source separation
Pejman Mowlaee, Rahim Saeidi, Rainer Martin
Bayesian group sparse learning for nonnegative matrix factorization
Jen-Tzung Chien, Hsin-Lung Hsieh
Resonator-based creaky voice detection
Thomas Drugman, John Kane, Christer Gobl
Effect of tongue tip trilling on the glottal excitation source
V. K. Mittal, N. Dhananjaya, Bayya Yegnanarayana
Estimating the voice source in noise
Gang Chen, Yen-Liang Shue, Jody Kreiman, Abeer Alwan
Voice source analysis using biomechanical modeling and glottal inverse filtering
Alan Pinheiro, Tuomo Raitio, Danyane Gomes, Paavo Alku
Speech modeling and processing by low-dimensional dynamic glottal models
Carlo Drioli, Andrea Calanca
Improved formant frequency estimation from high-pitched vowels by downgrading the contribution of the glottal source with weighted linear prediction
Paavo Alku, Jouni Pohjalainen, Martti Vainio, Anne-Maria Laukkanen, Brad Story
Automatic topology generation of glottal source HMM
Akira Sasou
Towards glottal source controllability in expressive speech synthesis
Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Tuomo Raitio, Nicolas Obin, Paavo Alku, Junichi Yamagishi, Juan M. Montero
Combining temporal and cepstral features for the automatic perceptual categorization of disordered connected speech
Ali Alpan, Jean Schoentgen, Francis Grenez
A preliminary study on cross-databases emotion recognition using the glottal features in speech
Rui Sun, Elliot Moore II
Analysis on the importance of short-term speech parameterizations for emotional statistical parametric speech synthesis
Ranniery Maia, Masami Akamine
Analysis of vocal tremor and jitter by empirical mode decomposition of glottal cycle length time series
Christophe Mertens, Francis Grenez, Jean Schoentgen
Utilizing Markov chain Monte Carlo (MCMC) method for improved glottal inverse filtering
Harri Auvinen, Tuomo Raitio, Samuli Siltanen, Paavo Alku
Glottal source shape parameter estimation using phase minimization variants
Stefan Huber, Axel Roebel, Gilles Degottex
Glottal waveform analysis of physical task stress speech
Keith W. Godin, Taufiq Hasan, John H. L. Hansen
Speaker discrimination ability of glottal waveform features
Juan Félix Torres, Elliot Moore
Paraphrastic language models
Xunying Liu, Mark J. F. Gales, Phillip C. Woodland
Efficient structured language modeling for speech recognition
Ariya Rastrow, Mark Dredze, Sanjeev Khudanpur
Towards recurrent neural networks language models with linguistic and contextual features
Yangyang Shi, Pascal Wiggers, Catholijn M. Jonker
Conversion of recurrent neural network language models to weighted finite state transducers for automatic speech recognition
Gwénolé Lecorvé, Petr Motlicek
Large scale hierarchical neural network language models
Hong-Kwang Kuo, Ebru Arısoy, Ahmad Emami, Paul Vozila
A sparse plus low rank maximum entropy language model
Brian Hutchinson, Mari Ostendorf, Maryam Fazel
PLDA modeling in i-vector and supervector space for speaker verification
Ye Jiang, Kong Aik Lee, Zhenmin Tang, Bin Ma, Anthony Larcher, Haizhou Li
Supervized mixture of PLDA models for cross-channel speaker verification
Konstantin Simonchik, Timur Pekhovsky, Andrey Shulipa, Anton Afanasyev
Spoofing countermeasures for the protection of automatic speaker recognition systems against attacks with artificial signals
Federico Alegre, Ravichander Vipperla, Nicholas Evans
PLDA using Gaussian restricted Boltzmann machines with application to speaker verification
Themos Stafylakis, Patrick Kenny, Mohammed Senoussaoui, Pierre Dumouchel
Mean Hilbert envelope coefficients (MHEC) for robust speaker recognition
Seyed Omid Sadjadi, Taufiq Hasan, John H. L. Hansen
Detecting converted speech and natural speech for anti-spoofing attack in speaker recognition
Zhizheng Wu, Eng Siong Chng, Haizhou Li
Maximising objective speech intelligibility by local F0 modulation
Julián Villegas, Martin Cooke
Effect of prosodic changes on speech intelligibility
Catherine Mayo, Vincent Aubanel, Martin Cooke
Effects of visual speech information on native listener judgments of L2 consonant intelligibility
Saya Kawase, Yue Wang
Perceptual compensation for the effects of reverberation on consonant identification: a comparison of human and machine performance
Guy J. Brown, Amy V. Beeston, Kalle J. Palomäki
The intelligibility of Lombard speech: communicative setting matters
Michael Fitzpatrick, Jeesun Kim, Chris Davis
Performance comparison of intrusive objective speech intelligibility and quality metrics for cochlear implant users
João Felipe Santos, Stefano Cosentino, Oldooz Hazrati, Philipos C. Loizou, Tiago H. Falk
Exploiting temporal sequence structure for semantic analysis of multimedia
Sourish Chaudhuri, Rita Singh, Bhiksha Raj
Time delay estimation for speech signal based on FOC-spectrum
Hong Liu, Xiaofei Li
Low-rank audio signal classification under soft margin and trace norm constraints
Ziqiang Shi, Tieran Zheng, Jiqing Han, Shiwen Deng
GCC-PHAT based head orientation estimation
Carlos Segura, Javier Hernando
Plagiarism detection in polyphonic music using monaural signal separation
Soham De, Indradyumna Roy, Tarunima Prabhakar, Kriti Suneja, Sourish Chaudhuri, Rita Singh, Bhiksha Raj
TDOA estimation for multiple speakers in underdetermined case
Mariem Bouafif, Zied Lachiri
Local-feature-map integration using convolutional neural networks for music genre classification
Toru Nakashika, Christophe Garcia, Tetsuya Takiguchi
Training deep nets with imbalanced and unlabeled data
Jeff Berry, Ian Fasel, Luciano Fadiga, Diana Archangeli
Speech data clustering based on phoneme error trend for unsupervised acoustic model adaptation
Taichi Asami, Satoshi Kobashikawa, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi
Gaussian map based acoustic model adaptation using untranscribed data for speech recognition in severely adverse environments
Wooil Kim, John H. L. Hansen
Investigating performance of the discriminative methods for long-term speaker adaptation
Danning Jiang, Dimitri Kanevsky, Vaibhava Goel, Yong Qin
A two-stage speaker adaptation approach for subspace Gaussian mixture model based nonnative speech recognition
Bo Li, Khe Chai Sim
A comparative study of adaptive, automatic recognition of disordered speech
Heidi Christensen, Stuart Cunningham, Charles Fox, Phil Green, Thomas Hain
Phoneme class based adaptation for mismatch acoustic modeling of distant noisy speech
Seçkin Uluskan, John H. L. Hansen
Rapid nonlinear speaker adaptation for large-vocabulary continuous speech recognition
Zoi Roupakia, Anton Ragni, Mark J. F. Gales
A study on using word-level HMMs to improve ASR performance over state-of-the-art phone-level acoustic modeling for LVCSR
I-Fan Chen, Chin-Hui Lee
Factored adaptation using a combination of feature-space and model-space transforms
Michael Seltzer, Alex Acero
Exploring discriminative speech trajectory structures
Heyun Huang, Louis ten Bosch, Bert Cranen, Lou Boves
Estimating classifier performance in unknown noise
Ehsan Variani, Hynek Hermansky
Continuous digit recognition in noise: reservoirs can do an excellent job!
Azarakhsh Jalalvand, Fabian Triefenbach, Jean-Pierre Martens
Optimization-based control for the extended Baum-Welch algorithm
Janne Pylkkönen, Mikko Kurimo
Normalization of spectro-temporal Gabor filter bank features for improved robust automatic speech recognition systems
Marc René Schädler, Birger Kollmeier
Phone recognition in critical bands using sub-band temporal modulations
Feipeng Li, Sri Harish Mallidi, Hynek Hermansky
Combining acoustic data driven G2P and letter-to-sound rules for under resource lexicon generation
Ramya Rasipuram, Mathew M. Doss
CRF-based diacritisation of colloquial Arabic for automatic speech recognition
Sarah Al-Shareef, Thomas Hain
Analysis of temporal resolution in frequency domain linear prediction
Sriram Ganapathy, Hynek Hermansky
White listing and score normalization for keyword spotting of noisy speech
Bing Zhang, Richard Schwartz, Stavros Tsakalidis, Long Nguyen, Spyros Matsoukas
Complementary phone error training
Frank Diehl, Phillip C. Woodland
Posterior-scaled MPE: novel discriminative training criteria
Markus Nussbaum-Thom, Zoltán Tüske, Georg Heigold, Ralf Schlüter, Hermann Ney
Improve the implementation of pitch features for Mandarin digit string recognition task
Pei Ding, Liqiang He
Exploring joint equalization of spatial-temporal contextual statistics of speech features for robust speech recognition
Hsin-Ju Hsieh, Jeih-weih Hung, Berlin Chen
Speaker-dependent voice activity detection robust to background speech noise
Shigeki Matsuda, Naoya Ito, Kosuke Tsujino, Hideki Kashioka, Shigeki Sagayama
Log-spectral feature reconstruction based on an occlusion model for noise robust speech recognition
Jose A. González, Antonio M. Peinado, Angel M. Gómez, Ning Ma
Decoding of uncertain features using the posterior distribution of the clean data for robust speech recognition
Ahmed Hussen Abdelaziz, Dorothea Kolossa
Coupling identification and reconstruction of missing features for noise-robust automatic speech recognition
Ning Ma, Jon Barker
Integrating stress information in large vocabulary continuous speech recognition
Bogdan Ludusan, Stefan Ziegler, Guillaume Gravier
Group sparse hidden Markov models for speech recognition
Jen-Tzung Chien, Cheng-Chun Chiang
The speech recognition virtual kitchen: an initial prototype
Florian Metze, Eric Fosler-Lussier
Perma and Balloon: tools for string alignment and text processing
Uwe D. Reichel
Visartico: a visualization tool for articulatory data
Slim Ouni, Loïc Mangeonjean, Ingmar Steiner
Towards automated annotation of audio and video recordings by application of advanced web-services
Przemyslaw Lenkiewicz, Dieter van Uytvanck, Peter Wittenburg, Sebastian Drude
A rule based pronunciation generator and regional accent databank for Portuguese
Simone Ashby, Sílvia Barbosa, Silvia Brandão, José Pedro Ferreira, Maarten Janssen, Catarina Silva, Mário Eduardo Viaro
Speech enhancement for Android (SEA): a speech processing demonstration tool for Android based smart phones and tablets
Roger Chappel, Kuldip Paliwal
ProTK: an improved prosody toolkit
Jacob Okamoto, Serguei Pakhomov, Elizabeth Shriberg, Andreas Stolcke
Speechmark: landmark detection tool for speech analysis
Suzanne Boyce, Harriet Fell, Joel MacAuslan
A tutorial dialogue system with unrestricted spoken input
Peter Bell, Myroslava Dzikovska, Amy Isard
Integrating adaptive beam-forming and auditory features for robust large vocabulary speech recognition
Xie Sun, Qi Peter Li, Manli Zhu, Qiru Zhou
A natural in-car speech interface to internet services using hybrid ASR
Hansjörg Hofmann, Ute Ehrlich, Klaus Bader, Ilona Nothelfer, André Berton
How Marni helps English language learners acquire oral reading fluency
Ronald A. Cole, Daniel Bolanos, Wayne H. Ward, J. T. Carmer, Eric Borts, Edward Svirsky
Demonstration of advanced multi-modal, network-centric communication management suite
Victor Finomore Jr, John Stewart, Rita Singh, Bhiksha Raj, Ron Dallman
Dutch automatic speech recognition on the web: towards a general purpose system
Joris Pelemans, Kris Demuynck, Patrick Wambacq
An on-line, cloud-based Spanish-Spanish sign language translation system
Javier Tejedor, Fernando López-Colino, Jordi Porta, José Colás
Efficient segmental conditional random fields for one-pass phone recognition
Yanzhang He, Eric Fosler-Lussier
Enhanced polyphone decision tree adaptation for accented speech recognition
Udhyakumar Nallasamy, Florian Metze, Tanja Schultz
Efficient VTS adaptation using Jacobian approximation
Jinyu Li, Michael L. Seltzer, Yifan Gong
Robust triphone mapping for acoustic modeling
Miloš Cerňak, David Imseng, Hervé Bourlard
Sparse banded precision matrices for low resource speech recognition
Weibin Zhang, Pascale Fung
Semi-blind model adaptation using piece-wise energy decay curve for large reverberant environments
Abdul Waheed Mohammed, Marco Matassoni, Harikrishna Maganti, Maurizio Omologo
Developments of a hybrid pre-processor based on frequency shifting for stereophonic acoustic echo cancellation
Bruno C. Bispo, Diamantino S. Freitas
Example-based speech enhancement with joint utilization of spatial, spectral & temporal cues of speech and noise
Keisuke Kinoshita, Marc Delcroix, Mehrez Souden, Tomohiro Nakatani
A fast-converging adaptive frequency-domain MVDR beamformer for speech enhancement
Shengkui Zhao, Douglas L. Jones
A signal-separation-based array postfilter for distant speech recognition
Rita Singh, Kenichi Kumatani, John McDonough, Chen Liu
Constrained multichannel speech dereverberation
Meng Yu, Frank K. Soong
A triple-microphone real-time speech enhancement algorithm based on approximate array analytical solutions
Ryan Ritch, Meng Yu, Jack Xin
Developing a speech activity detection system for the DARPA RATS program
Tim Ng, Bing Zhang, Long Nguyen, Spyros Matsoukas, Xinhui Zhou, Nima Mesgarani, Karel Veselý, Pavel Matějka
Speech activity detection for noisy data using adaptation techniques
Mohamed Kamal Omar
Speech/nonspeech segmentation in web videos
Ananya Misra
On the use of machine learning methods for speech and voicing classification
Philip Harding, Ben Milner
Acoustic and data-driven features for robust speech activity detection
Samuel Thomas, Sri Harish Mallidi, Thomas Janu, Hynek Hermansky, Nima Mesgarani, Xinhui Zhou, Shihab Shamma, Tim Ng, Bing Zhang, Long Nguyen, Spyros Matsoukas
A two-step NMF based algorithm for single channel speech separation
Shuo Wang, Wenjun Wu
Meaning inhibition and sentence processing in Chinese: evidence from negative priming
Michael C. W. Yip
Similar speaker selection technique based on distance metric learning with perceptual voice quality similarity
Yusuke Ijima, Mitsuaki Isogai, Hideyuki Mizuno
Gendered sound symbolism and masking effects in speech processing
Molly Babel, Grant McGuire
Modeling cue trading in human word recognition
Louis ten Bosch, Odette Scharenborg
Accounting for speech rate in spoken word recognition
David Cheng-Huan Li, Elsi Kaiser
The processes underlying two frequent casual speech phenomena in Dutch: a production experiment
Iris Hanique, Mirjam Ernestus
Intrinsic velocity differences of lip and jaw movements: preliminary results
Peter Birkholz, Phil Hoole
Co-occurrence of reduced word forms in natural speech
Malte C. Viebahn, Mirjam Ernestus, James M. McQueen
Voice production mechanisms of vibrato in Noh
Ikuyo Yoshinaga, Jiangping Kong
Automatic detection of hypernasal speech signals using nonlinear and entropy measurements
Juan Rafael Orozco-Arroyave, Julian David Arias-Londoño, Jesús Francisco Vargas-Bonilla, Elmar Nöth
Effects of the availability of visual information and presence of competing conversations on speech production
Vincent Aubanel, Martin Cooke, Emma Foster, Maria Luisa Garcia Lecumberri, Catherine Mayo
Constrained maximum mutual information dimensionality reduction for language identification
Shuai Huang, Glen A. Coppersmith, Damianos Karakos
Phonotactic language recognition using MLP features
Mohamed Faouzi BenZeghiba, Jean-Luc Gauvain, Lori Lamel
The EHU systems for the NIST 2011 language recognition evaluation
Mikel Penagarikano, Amparo Varona, Luis Javier Rodriguez-Fuentes, Mireia Diez, German Bordel
Study of different backends in a state-of-the-art language recognition system
Mikel Penagarikano, Amparo Varona, Mireia Diez, Luis Javier Rodriguez-Fuentes, German Bordel
On the use of non-linear polynomial kernel SVMs in language recognition
Sibel Yaman, Jason Pelecanos, Mohamed Kamal Omar
Exemplar-based sparse representation for language recognition on i-vectors
Bing Jiang, Yan Song, Wu Guo, Lirong Dai
Subspace-based feature representation and learning for language recognition
Yu-Chin Shih, Hung-Shin Lee, Hsin-Min Wang, Shyh-Kang Jeng
Effect of relevance factor of maximum a posteriori adaptation for GMM-SVM in speaker and language recognition
Changhuai You, Haizhou Li, Bin Ma, Kong Aik Lee
Using time-synchronous phone co-occurrences in a SVM-phonotactic dialect recognition system
Amparo Varona, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, German Bordel, Mireia Diez
Nativeness classification with suprasegmental features on the accent group level
Mahnoosh Mehrabani, Joseph Tepperman, Emily Nava
Open-vocabulary retrieval of spoken content with shorter/longer queries considering word/subword-based acoustic feature similarity
Hung-yi Lee, Po-wei Chou, Lin-shan Lee
Consumer-level multimedia event detection through unsupervised audio signal modeling
Byungki Byun, Ilseo Kim, Sabato Marco Siniscalchi, Chin-Hui Lee
Event-based video retrieval using audio
Qin Jin, Peter Schulam, Shourabh Rawat, Susanne Burger, Duo Ding, Florian Metze
Compact audio representation for event detection in consumer media
Xiaodan Zhuang, Stavros Tsakalidis, Shuang Wu, Pradeep Natarajan, Rohit Prasad, Prem Natarajan
N-gram FST indexing for spoken term detection
Chao Liu, Dong Wang, Javier Tejedor
Spoken inquiry discrimination using bag-of-words for speech-oriented guidance system
Haruka Majima, Rafael Torres, Yoko Fujita, Hiromichi Kawanami, Tomoko Matsui, Hiroshi Saruwatari, Kiyohiro Shikano
Robust event detection from spoken content in consumer domain videos
Stavros Tsakalidis, Xiaodan Zhuang, Roger Hsiao, Shuang Wu, Pradeep Natarajan, Rohit Prasad, Prem Natarajan
Bag-of-audio-words approach for multimedia event classification
Stephanie Pancoast, Murat Akbacak
Improvements in Japanese voice search
Ken-ichi Iso, Edward Whittaker, Tadashi Emori, Junpei Miyake
A conversational movie search system based on conditional random fields
Jingjing Liu, Scott Cyphers, Panupong Pasupat, Ian McGraw, James Glass
Interactive spoken content retrieval with different types of actions optimized by a Markov decision process
Tsung-Hsien Wen, Hung-Yi Lee, Lin-Shan Lee
Voice query refinement
Cyril Allauzen, Edward Benson, Ciprian Chelba, Michael Riley, Johan Schalkwyk
Indexing raw acoustic features for scalable zero resource search
Aren Jansen, Benjamin Van Durme
Lexical-phonetic automata for spoken utterance indexing and retrieval
Julien Fayolle, Murat Saraçlar, Fabienne Moreau, Christian Raymond, Guillaume Gravier
Automating crowd-supervised learning for spoken language systems
Ian McGraw, Scott Cyphers, Panupong Pasupat, Jingjing Liu, James Glass
Enhancing exemplar-based posteriors for speech recognition tasks
Tara N. Sainath, David Nahamoo, Dimitri Kanevsky, Bhuvana Ramabhadran
Advances in noise robust digit recognition using hybrid exemplar-based techniques
Jort F. Gemmeke, Hugo Van hamme
Group sparsity for speaker identity discrimination in factorisation-based speech recognition
Antti Hurmalainen, Rahim Saeidi, Tuomas Virtanen
Using sparse classification outputs as feature observations for noise-robust ASR
Yang Sun, Bert Cranen, Jort F. Gemmeke, Lou Boves, Louis ten Bosch, Mathew M. Doss
Synthetic references for template-based ASR using posterior features
Serena Soldo, Mathew Magimai-Doss, Hervé Bourlard
Heterogeneous convolutive non-negative sparse coding
Dong Wang, Javier Tejedor
Convolutive non-negative sparse coding and new features for speech overlap handling in speaker diarization
Jürgen T. Geiger, Ravichander Vipperla, Simon Bozonnet, Nicholas Evans, Björn Schuller, Gerhard Rigoll
Selection of TDOA parameters for MDM speaker diarization
Beatriz Martínez-González, José M. Pardo, Julián D. Echeverry-Correa, José A. Vallejo-Pinto, Roberto Barra-Chicote
Confidence for speaker diarization using PCA spectral ratio
Orith Toledo-Ronen, Hagai Aronowitz
Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model
Naohiro Tawara, Tetsuji Ogawa, Shinji Watanabe, Atsushi Nakamura, Tetsunori Kobayashi
Diartk: an open source toolkit for research in multistream speaker diarization and its application to meetings recordings
Deepu Vijayasenan, Fabio Valente
I-vectors and ILP clustering adapted to cross-show speaker diarization
Grégor Dupuy, Mickael Rouvier, Sylvain Meignier, Yannick Estève
Emphatic segments and emphasis spread in Lebanese Arabic: a real-time magnetic resonance imaging study
Assaf Israel, Michael Proctor, Louis Goldstein, Khalil Iskarous, Shrikanth Narayanan
Using magnetic resonance to image the pharynx during Arabic speech: static and dynamic aspects
Ryan K. Shosted, Bradley P. Sutton, Abbas Benmamoun
Articulatory speaker normalisation based on MRI-data using three-way linear decomposition methods
Julián Andrés Valdés Vargas, Pierre Badin, Laurent Lamalle
Vowels produced by sliding three-tube model with different lengths
Takayuki Arai
Estimating the vocal-tract area function from formants using a sensitivity function and least square
Tokihiko Kaburagi, Tetsuro Takano, Yuki Sakamoto
Modeling source-tract interaction in speech production: voicing onset vs. vowel height after a voiceless obstruent
Jorge C. Lucero, Laura L. Koenig, Susanne Fuchs
Modelling a noisy-channel for voice conversion using articulatory features
Bajibabu Bollepalli, Alan W. Black, Kishore Prahallad
Asymmetries in the perception of synthesized speech
Anna C. Janska, Erich Schröger, Thomas Jacobsen, Robert A. J. Clark
Predicting character-appropriate voices for a TTS-based storyteller system
Erica Greene, Taniya Mishra, Patrick Haffner, Alistair Conkie
Psychoacoustic segment scoring for multi-form speech synthesis
Alexander Sorin, Slava Shechtman, Vincent Pollet
Pauses and respiratory markers of the structure of book reading
Gérard Bailly, Cécilia Gouvernayre
Proper name splicing in computer games with TTS
Blaise Potard, Matthew P. Aylett, Christopher J. Pidcock
Speaker clustering for a mixture of singing and reading
Mahnoosh Mehrabani, John H. L. Hansen
Automatic speech segmentation using probabilistic latent component modeling
Sayan Ghosh, Thippur V. Sreenivas
Overlapping sound event recognition using local spectrogram features with the generalised Hough transform
Jonathan Dennis, Huy Dat Tran, Eng Siong Chng
Automatic phoneme segmentation using auditory attention features
Ozlem Kalinli
A non-uniform filterbank for speaker recognition
Jia Min Karen Kua, Tharmarajah Thiruvaran, Eliathamby Ambikairajah
Towards an unsupervised speaking style voice building framework: multi-style speaker diarization
Jaime Lorenzo-Trueba, Beatriz Martínez-González, Verónica López-Ludeña, Roberto Barra-Chicote, Javier Ferreiros, Junichi Yamagishi, Juan M. Montero
KNNDIST: a non-parametric distance measure for speaker segmentation
Seyed Hamidreza Mohammadi, Hossein Sameti, Mahsa Sadat Elyasi Langarani, Amirhossein Tavanaei
Lexical story co-segmentation of Chinese broadcast news
Wei Feng, Xuecheng Nie, Liang Wan, Lei Xie, Jianmin Jiang
Toward an optimum feature set and HMM model parameters for automatic phonetic alignment of spontaneous speech
Montri Karnjanadecha, Stephen A. Zahorian
Spelling as a complementary strategy for speech recognition
Keith Vertanen, Per Ola Kristensson
Automatic error recovery for pronunciation dictionaries
Tim Schlippe, Sebastian Ochs, Ngoc Thang Vu, Tanja Schultz
Confidence measure for speech indexing based on latent Dirichlet allocation
Grégory Senay, Georges Linarès
Mixed probabilistic and deterministic dependency parsing
Christophe Cerisara, Alejandra Lorenzo
Automatic vocabulary adaptation based on semantic similarity and speech recognition confidence measure
Shoko Yamahata, Yoshikazu Yamaguchi, Atsunori Ogawa, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi
Towards empirical dialog-state modeling and its use in language modeling
Nigel G. Ward, Alejandro Vega
Evaluation of many-to-many alignment algorithm by automatic pronunciation annotation using web text mining
Keigo Kubo, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano
Applying multiview learning algorithms to human-human conversation classification
Sokol Koço, Cécile Capponi, Frédéric Béchet
Automatic transcription of lecture speech using language model based on speaking-style transformation of proceeding texts
Yuya Akita, Makoto Watanabe, Tatsuya Kawahara
Normalization of text messages using character- and phone-based machine translation approaches
Chen Li, Yang Liu
A weighted combination of speech with text-based models for Arabic diacritization
Aisha S. Azim, Xiaoxuan Wang, Khe Chai Sim
Using sub-word-level information for confidence estimation with conditional random field models
Matthew S. Seigel, Phillip C. Woodland
Supervised spoken document summarization jointly considering utterance importance and redundancy by structured support vector machine
Hung-yi Lee, Yu-yu Chou, Yow-Bang Wang, Lin-shan Lee
Integrating intra-speaker topic modeling and temporal-based inter-speaker topic modeling in random walk for improved multi-party meeting summarization
Yun-Nung Chen, Florian Metze
Language modeling for voice-enabled social TV using tweets
Junlan Feng, Bernard Renger
Detecting OOV named-entities in conversational speech
Rohit Kumar, Rohit Prasad, Sankaranarayanan Ananthakrishnan, Aravind Namandi Vembu, Dave Stallard, Stavros Tsakalidis, Prem Natarajan
Unsupervised deep belief features for speech translation
Sameer Maskey, Bowen Zhou
Euskoparl: a speech and text Spanish-Basque parallel corpus
Alicia Pérez, José M. Alcaide, María-Inés Torres
Comparing transcription agreement on non-native English speech corpus between native and non-native annotators
Hyuksu Ryu, Sunhee Kim, Minhwa Chung
Podcastle: collaborative training of language models on the basis of wisdom of crowds
Jun Ogata, Masataka Goto
Speech pattern discovery using audio-visual fusion and canonical correlation analysis
Lei Xie, Yinqing Xu, Lilei Zheng, Qiang Huang, Bingfeng Li
Power mean pyramid scores for summarization evaluation
Sameer Maskey, Andrew Rosenberg
Visualizing tool for evaluating inter-label similarity in prosodic labeling experiments
David Escudero-Mancebo, Eva Estebas-Vilaplana
Objective, subjective and linguistic roads to perceptual prominence - how are they compared and why?
Petra Wagner, Fabio Tamburini, Andreas Windmann
Audio-visual evaluation and detection of word prominence in a human-machine interaction scenario
Martin Heckmann
Obtaining prominence judgments from naive listeners - influence of rating scales, linguistic levels and normalisation
Denis Arnold, Petra Wagner, Bernd Möbius
Towards hierarchical prosodic prominence generation in TTS synthesis
Leonardo Badino, Robert A. J. Clark, Mirjam Wester
Investigating syllabic prominence with conditional random fields and latent-dynamic conditional random fields
Francesco Cutugno, Enrico Leone, Bogdan Ludusan, Antonio Origlia
Disentangling lexical, morphological, syntactic and semantic influences on German prominence - evidence from a production study
Barbara Samlowski, Petra Wagner, Bernd Möbius
Using prominence and phrasing predictions to improve weighted dictionary pronunciation models
Andrew Rosenberg
A continuous prominence score based on acoustic features
Jean-Philippe Goldman, Mathieu Avanzi, Antoine Auchlin, Anne Catherine Simon
More on the normalization of syllable prominence ratings
Christopher Sappok, Denis Arnold
F0 and the perception of prominence
Tim Mahrt, Jennifer Cole, Margaret Fleck, Mark Hasegawa-Johnson
Language differences in the perceptual weight of prominence-lending properties
Bistra Andreeva, William Barry, Magdalena Wolska
A novel confidence measure based on context consistency for spoken term detection
Haiyang Li, Jiqing Han, Tieran Zheng, Guibin Zheng
Discriminatively trained phoneme confusion model for keyword spotting
Panagiota Karanasou, Lukas Burget, Dimitra Vergyri, Murat Akbacak, Arindam Mandal
Inverting the point process model for fast phonetic keyword search
Keith Kintzley, Aren Jansen, Kenneth Church, Hynek Hermansky
Exploiting discriminative point process models for spoken term detection
Atta Norouzian, Aren Jansen, Richard C. Rose, Samuel Thomas
Subword speech recognition for detection of unseen words
Ivan Bulyko, José Herrero, Chris Mihelich, Owen Kimball
OOV word detection using hybrid models with mixed types of fragments
Long Qin, Alexander Rudnicky
An automatic child-directed speech detector for the study of child language development
Soroush Vosoughi, Deb Roy
Aligning manifolds to model the earliest phonological abstraction in infant-caretaker vocal imitation
Andrew R. Plummer
The F0 fall delay of lexical pitch accent in Japanese infant-directed speech
Yoko Saikachi, Mafuyu Kitahara, Ken'ya Nishikawa, Ai Kanato, Reiko Mazuka
Children's productions of multi-syllabic lexical stress patterns in different prosodic positions
Irina A. Shport
Prosodic marking of continuation versus completion in children's narratives
Melissa A. Redford, Laura C. Dilley, Jessica L. Gamache, Elizabeth A. Wieland
Judging temporal onset differences for concurrent vowels: results for young, middle-aged, and older adults
Daniel Fogerty, Diane Kewley-Port, Larry E. Humes
Combining frame and segment based models for environmental sound classification
Pengfei Hu, Wenju Liu, Wei Jiang
Using blob detection in missing feature linear-frequency cepstral coefficients for robust sound event recognition
Yi Ren Leng, Huy Dat Tran
Goal-oriented auditory scene recognition
Kailash Patil, Mounya Elhilali
Prof-life-log: audio environment detection for naturalistic audio streams
Ali Ziaei, Abhijeet Sangwan, John H. L. Hansen
Pooling robust shift-invariant sparse representations of acoustic signals
Po-Sen Huang, Jianchao Yang, Mark Hasegawa-Johnson, Feng Liang, Thomas S. Huang
Evaluation of a sparse representation-based classifier for bird phrase classification under limited data conditions
Lee Ngee Tan, Kantapon Kaewtip, Martin L. Cody, Charles E. Taylor, Abeer Alwan
Improving WFST-based G2P conversion with alignment constraints and RNNLM n-best rescoring
Josef R. Novak, Paul R. Dixon, Nobuaki Minematsu, Keikichi Hirose, Chiori Hori, Hideki Kashioka
Expand CRF to model long distance dependencies in prosodic break prediction
Jian Luan, Bolei He, Hairong Xia, Linfang Wang, Daniela Braga, Sheng Zhao
Perceptual foundations for naturalistic variability in the prosody of synthetic speech
Nanette Veilleux, Jonathan Barnes, Alejna Brugos, Stefanie Shattuck-Hufnagel
Comparison of grapheme-to-phoneme methods on large pronunciation dictionaries and LVCSR tasks
Stefan Hahn, Paul Vozila, Maximilian Bisani
A simple hybrid acoustic/morphologically-constrained technique for the synthesis of stop consonants in various vocalic contexts
Frédéric Berthommier, Laurent Girin, Louis-Jean Boë
The IIIT-H Indic speech databases
Kishore Prahallad, E. Naresh Kumar, Venkatesh Keri, S. Rajendran, Alan W. Black
Detecting acronyms from capital letter sequences in Spanish
Rubén San-Segundo, Juan M. Montero, Verónica López-Ludeña, Simon King
Hidden conditional random fields with M-to-N alignments for grapheme-to-phoneme conversion
Patrick Lehnen, Stefan Hahn, Vlad-Andrei Guta, Hermann Ney
Phrase boundary assignment from text in multiple domains
Andrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran
Improved prediction of Japanese word accent sandhi using CRF
Nobuaki Minematsu, Shumpei Kobayashi, Shinya Shimizu, Keikichi Hirose
Articulatory VCV synthesis from EMA data
Asterios Toutios, Shinji Maeda
Nasality from Moroccan Arabic nasal and pharyngeal consonants: patterns of airflow and nasalance
Georgia Zellou
Inter-gestural timing in French nasal vowels: a comparative study of (Liège, Tournai) northern French vs. (Marseille, Toulouse) southern French
Véronique Delvaux, Kathy Huet, Myriam Piccaluga, Bernard Harmegnies
Nasal coarticulation and contrastive stress
Georgia Zellou, Rebecca Scarborough
An MRI study of the oral articulation of European Portuguese nasal vowels
Catarina Oliveira, Paula Martins, Samuel Silva, António Teixeira
Acoustic and perceptual similarity in coarticulatorily nasalized vowels
Rebecca Scarborough, Georgia Zellou
Articulatory differences between oral and nasal vowels based on the simulation of a speaker-adaptive articulatory model
Panying Rong, Ryan K. Shosted, David Kuehn