doi: 10.21437/Interspeech.2013
My adventures with speech
Hynek Hermansky
On the interaction of social and linguistic factors in phonetic variation in typical and atypical speakers
Benjamin Munson
Are cortical oscillations a useful ingredient of speech perception?
Anne-Lise Giraud
Verbal communication through brain computer interfaces
Maureen Clerc
Information retrieval-based dynamic time warping
Xavier Anguera
On the computation of document frequency statistics from spoken corpora using factor automata
Doğan Can, Shrikanth Narayanan
Acceleration of spoken term detection using a suffix array by assigning optimal threshold values to sub-keywords
Kouichi Katsurada, Seiichi Miura, Kheang Seng, Yurie Iribe, Tsuneo Nitta
Strategies for high accuracy keyword detection in noisy channels
Arindam Mandal, Julien van Hout, Yik-Cheung Tam, Vikramjit Mitra, Yun Lei, Jing Zheng, Dimitra Vergyri, Luciana Ferrer, Martin Graciarena, Andreas Kathol, Horacio Franco
On the calibration and fusion of heterogeneous spoken term detection systems
Alberto Abad, Luis Javier Rodríguez-Fuentes, Mikel Penagarikano, Amparo Varona, Germán Bordel
Intensive acoustic models constructed by integrating low-occurrence models for spoken term detection
Shiro Narumi, Kazuma Konno, Takuya Nakano, Yoshiaki Itoh, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka, Shi-wook Lee
Using phonetic feature extraction to determine optimal speech regions for maximising the effectiveness of glottal source analysis
John Kane, Irena Yanushevskaya, John Dalton, Christer Gobl, Ailbhe Ní Chasaide
Beyond bandlimited sampling of speech spectral envelope imposed by the harmonic structure of voiced sounds
Hideki Kawahara, Masanori Morise, Tomoki Toda, Ryuichi Nisimura, Toshio Irino
A source-filter based adaptive harmonic model and its application to speech prosody modification
JeeSok Lee, Frank K. Soong, Hong-Goo Kang
Detection of glottal opening instants using Hilbert envelope
K. Ramesh, S. R. M. Prasanna, D. Govind
Robust formant detection using group delay function and stabilized weighted linear prediction
Dhananjaya Gowda, Jouni Pohjalainen, Mikko Kurimo, Paavo Alku
A source-filter separation algorithm for voiced sounds based on an exact anticausal/causal pole decomposition for the class of periodic signals
Thomas Hézard, Thomas Hélie, Boris Doval
Assessing the intelligibility impact of vowel space expansion via clear speech-inspired frequency warping
Elizabeth Godoy, M. Koutsogiannaki, Yannis Stylianou
Prediction of intelligibility of noisy and time-frequency weighted speech based on mutual information between amplitude envelopes
Jesper Jensen, Cees H. Taal
Frequency-adaptive post-filtering for intelligibility enhancement of narrowband telephone speech
Emma Jokinen, Marko Takanen, Paavo Alku
Comparative investigation of objective speech intelligibility prediction measures for noise-reduced signals in Mandarin and Japanese
Junfeng Li, Fei Chen, Masato Akagi, Yonghong Yan
Monitoring the effects of temporal clipping on VoIP speech quality
Andrew Hines, Jan Skoglund, Anil Kokaram, Naomi Harte
The spectral dynamics of vowels in Mandarin Chinese
Jiahong Yuan
Pitch-gesture modeling using subband autocorrelation change detection
Malcolm Slaney, Elizabeth Shriberg, Jui-Ting Huang
Analysis of emotional speech at subsegmental level
P. Gangamohan, Sudarsana Reddy Kadiri, B. Yegnanarayana
Periodicity extraction for voiced sounds with multiple periodicity
Masanori Morise, Hideki Kawahara, Kenji Ozawa
Modelling and estimation of the fundamental frequency of speech using a hidden Markov model
John H. Taylor, Ben Milner
Extended weighted linear prediction using the autocorrelation snapshot — a robust speech analysis method and its application to recognition of vocal emotions
Jouni Pohjalainen, Paavo Alku
Improving the accuracy and the robustness of harmonic model for pitch estimation
Meysam Asgari, Izhak Shafran
A comparative study of glottal open quotient estimation techniques
John Kane, Stefan Scherer, Louis-Philippe Morency, Christer Gobl
Estimation of multiple-branch vocal tract models: the influence of prior assumptions
Christian H. Kasess, Wolfgang Kreuzer
Detecting overlapping speech with long short-term memory recurrent neural networks
Jürgen T. Geiger, Florian Eyben, Björn Schuller, Gerhard Rigoll
Evaluation of fundamental validity in applying AR-HMM with automatic topology generation to pathology voice analysis
Akira Sasou
Significance of instants of significant excitation for source modeling
Nagaraj Adiga, S. R. M. Prasanna
Significance of variable height-bandwidth group delay filters in the spectral reconstruction of speech
Devanshu Arya, Anant Raj, Rajesh M. Hegde
Nonlinear prediction of speech signal using Volterra-Wiener series
Hemant A. Patil, Tanvina B. Patel
Evaluation of speech-based protocol for detection of early-stage dementia
Aharon Satt, Alexander Sorin, Orith Toledo-Ronen, Oren Barkan, Ioannis Kompatsiaris, Athina Kokonozi, Magda Tsolaki
Instantaneous harmonic representation of speech using multicomponent sinusoidal excitation
Elias Azarov, Maxim Vashkevich, Alexander Petrovsky
A quantitative comparison of glottal closure instant estimation algorithms on a large variety of singing sounds
Onur Babacan, Thomas Drugman, Nicolas d'Alessandro, Nathalie Henrich, Thierry Dutoit
Automatic gender recognition in normal and pathological speech
J. A. Gómez-García, Juan Ignacio Godino-Llorente, G. Castellanos-Domínguez
Unsupervised vocal-tract length estimation through model-based acoustic-to-articulatory inversion
Shanqing Cai, H. Timothy Bunnell, Rupal Patel
Model order estimation using Bayesian NMF for discovering phone patterns in spoken utterances
Sayeh Mirzaei, Hugo Van hamme, Yaser Norouzi
Parallel absolute-relative feature based phonotactic language recognition
Weiwei Liu, Wei-Qiang Zhang, Zhiyi Li, Jia Liu
Dimensionality reduction of phone log-likelihood ratio features for spoken language recognition
Mireia Diez, Amparo Varona, Mikel Penagarikano, Luis Javier Rodríguez-Fuentes, Germán Bordel
Improvements in language identification on the RATS noisy speech corpus
Jeff Ma, Bing Zhang, Spyros Matsoukas, Sri Harish Mallidi, Feipeng Li, Hynek Hermansky
Regularized subspace n-gram model for phonotactic iVector extraction
Mehdi Soufifar, Lukáš Burget, Oldřich Plchot, Sandro Cumani, Jan Černocký
Foreign accent detection from spoken Finnish using i-vectors
Hamid Behravan, Ville Hautamäki, Tomi Kinnunen
Adaptive Gaussian backend for robust language identification
Mitchell McLaren, Aaron Lawson, Yun Lei, Nicolas Scheffer
Lattice-based training of bottleneck feature extraction neural networks
Matthias Paulik
Modular combination of deep neural networks for acoustic modeling
Jonas Gehring, Wonkyum Lee, Kevin Kilgour, Ian Lane, Yajie Miao, Alex Waibel
Informative spectro-temporal bottleneck features for noise-robust speech recognition
Shuo-Yiin Chang, Nelson Morgan
A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR
Zhi-Jie Yan, Qiang Huo, Jian Xu
Improved feature processing for deep neural networks
Shakti P. Rath, Daniel Povey, Karel Veselý, Jan Černocký
Deep vs. wide: depth on a budget for robust speech recognition
Oriol Vinyals, Nelson Morgan
An early case of “VOT”
Angelika Braun
Pitch pattern variations in three regional varieties of American English
Robert Allen Fox, Ewa Jacewicz, Jessica Hart
Fine-grain voice strength estimation from vowel spectral cues
Jean-Sylvain Liénard, Claude Barras
Linking loudness increases in normal and Lombard speech to decreasing vowel formant separation
Elizabeth Godoy, Catherine Mayo, Yannis Stylianou
Three-dimensional rectangular vocal-tract model with asymmetric wall impedances
Kunitoshi Motoki
Quasi closed phase analysis for glottal inverse filtering
Manu Airaksinen, Brad Story, Paavo Alku
The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism
Björn Schuller, Stefan Steidl, Anton Batliner, Alessandro Vinciarelli, Klaus Scherer, Fabien Ringeval, Mohamed Chetouani, Felix Weninger, Florian Eyben, Erik Marchi, Marcello Mortillaro, Hugues Salamin, Anna Polychroniou, Fabio Valente, Samuel Kim
Non-linguistic vocalisation recognition based on hybrid GMM-SVM approach
Artur Janicki
Characteristic contours of syllabic-level units in laughter
Jieun Oh, Eunjoon Cho, Malcolm Slaney
Detection of nonverbal vocalizations using Gaussian mixture models: looking for fillers and laughter in conversational speech
Teun F. Krikke, Khiet P. Truong
Using phonetic patterns for detecting social cues in natural conversations
Johannes Wagner, Florian Lingenfelser, Elisabeth André
Paralinguistic event detection from speech using probabilistic time-series smoothing and masking
Rahul Gupta, Kartik Audhkhasi, Sungbok Lee, Shrikanth Narayanan
Detecting laughter and filled pauses using syllable-based features
Gouzhen An, David Guy Brizan, Andrew Rosenberg
Classifying language-related developmental disorders from speech cues: the promise and the potential confounds
Daniel Bone, Theodora Chaspari, Kartik Audhkhasi, James Gibson, Andreas Tsiartas, Maarten Van Segbroeck, Ming Li, Sungbok Lee, Shrikanth Narayanan
Classification of developmental disorders from speech signals using submodular feature selection
Katrin Kirchhoff, Yuzong Liu, Jeff Bilmes
Robust and accurate features for detecting and diagnosing autism spectrum disorders
Meysam Asgari, Alireza Bayestehtashk, Izhak Shafran
Suprasegmental information modelling for autism disorder spectrum and specific language impairment classification
David Martínez, Dayana Ribas, Eduardo Lleida, Alfonso Ortega, Antonio Miguel
Let me finish: automatic conflict detection using speaker overlap
Félix Grèzes, Justin Richards, Andrew Rosenberg
GMM based speaker variability compensated system for INTERSPEECH 2013 ComParE emotion challenge
Vidhyasaharan Sethu, Julien Epps, Eliathamby Ambikairajah, Haizhou Li
Random subset feature selection in automatic recognition of developmental disorders, affective states, and level of conflict from speech
Okko Räsänen, Jouni Pohjalainen
Ensemble of machine learning and acoustic segment model techniques for speech emotion and autism spectrum disorders recognition
Hung-yi Lee, Ting-yao Hu, How Jing, Yun-Fan Chang, Yu Tsao, Yu-Cheng Kao, Tsang-Long Pao
Detecting autism, emotions and social signals using AdaBoost
Gábor Gosztolya, Róbert Busa-Fekete, László Tóth
Resistance is futile — the intonation between continuation rise and calling contour in German
Oliver Niebuhr
The influence of F0 contour continuity on prominence perception
Hansjörg Mixdorff, Oliver Niebuhr
Native English listeners' perceptions of prosody in L1 and L2 reading
Caroline L. Smith, Paul Edmunds
Naturalness judgement of L2 Mandarin Chinese — does timing matter?
Chiharu Tsurutani, Dean Luo
Language background affects the strength of the pitch bias in a duration discrimination task
Daniel Aalto, Juraj Šimko, Martti Vainio
Pitch and lengthening as cues to turn transition in Swedish
Margaret Zellers
Perception of glottalization in varying pitch contexts across languages
Maria Paola Bissiri, Margaret Zellers
Exemplar-based pitch accent categorisation using the generalized context model
Michael Walsh, Katrin Schweitzer, Nadja Schauffler
Double contrast is signalled by prenuclear and nuclear accent types alone, not by f0-plateaux
Bettina Braun, Yuki Asano
Word stress perception in European Portuguese
Susana Correia, Sónia Frota, Joseph Butler, Marina Vigário
Using generalized additive models and random forests to model prosodic prominence in German
Denis Arnold, Petra Wagner, R. Harald Baayen
Perceiving speech rate differences between natural and time-scale modified utterances
Hartmut R. Pfitzinger, Hansjörg Mixdorff
On the robustness of some acoustic parameters for signalling word stress across styles in Brazilian Portuguese
Plínio A. Barbosa, Anders Eriksson, Joel Åkesson
Reexamine the sandhi rules and the merging tones in Hakka language
Shao-ren Lyu, Ho-hsien Pan
A preliminary spectral analysis of palatal and velar stop bursts in Pitjantjatjara
Marija Tabain, Richard Beare, Andrew Butcher
Presentational focus realisation in Nalbaria variety of Assamese
Shakuntala Mahanta, A. I. Twaha
On the relation between intonational phrasing and pitch accent distribution. Evidence from European Portuguese varieties
Marisa Cruz, Sónia Frota
How are word-final schwas different in the north and south of France?
Rena Nemoto, Martine Adda-Decker
Modeling postcolonial language varieties: challenges and lessons learned from Mozambican Portuguese
Simone Ashby, Sílvia Barbosa, Catarina Silva, Paulino Fumo, José Pedro Ferreira
Prosody of contrastive focus in Estonian
Heete Sahkai, Mari-Liis Kalvik, Meelis Mihkla
Exploring the connection of acoustic and distinctive features
Thomas Kisler, Uwe D. Reichel
A physiological analysis of the tense/lax vowel contrast in two varieties of German
Conceição Cunha, Jonathan Harrington, Phil Hoole
Production of Estonian quantity contrasts by native speakers of Finnish
Einar Meister, Lya Meister
Aerodynamic and durational cues of phonological voicing in whisper
Yohann Meynadier, Yulia Gaydina
Information theoretic syllable structure and its relation to the c-center effect
Uwe D. Reichel
The Bulgarian stressed and unstressed vowel system. A corpus study
Bistra Andreeva, William Barry, Jacques Koreman
Training an articulatory synthesizer with continuous acoustic data
Santitham Prom-on, Peter Birkholz, Yi Xu
Estimating speaker-specific intonation patterns using the linear alignment model
Géza Kiss, Jan P. H. van Santen
Factored maximum likelihood kernelized regression for HMM-based singing voice synthesis
June Sig Sung, Doo Hwa Hong, Hyun Woo Koo, Nam Soo Kim
Improvements to HMM-based speech synthesis based on parameter generation with rich context models
Shinnosuke Takamichi, Tomoki Toda, Yoshinori Shiga, Sakriani Sakti, Graham Neubig, Satoshi Nakamura
Voice conversion in high-order eigen space using deep belief nets
Toru Nakashika, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki
Voice conversion for non-parallel datasets using dynamic kernel partial least squares regression
Hanna Silén, Jani Nurminen, Elina Helander, Moncef Gabbouj
A style control technique for singing voice synthesis based on multiple-regression HSMM
Takashi Nose, Misa Kanemoto, Tomoki Koriyama, Takao Kobayashi
Predicting the quality of text-to-speech systems from a large-scale feature set
Florian Hinterleitner, Christoph R. Norrenbrock, Sebastian Möller, Ulrich Heute
Speaker-specific retraining for enhanced compression of unit selection text-to-speech databases
Jani Nurminen, Hanna Silén, Moncef Gabbouj
Avatar therapy: an audio-visual dialogue system for treating auditory hallucinations
Mark Huckvale, Julian Leff, Geoff Williams
Optimizations and fitting procedures for the Liljencrants-Fant model for statistical parametric speech synthesis
Prasanna Kumar Muthukumar, Alan W. Black, H. Timothy Bunnell
Analysis and modeling of “focus” in context
Dirk Hovy, Gopala Krishna Anumanchipalli, Alok Parlikar, Caroline Vaughn, Adam Lammert, Eduard Hovy, Alan W. Black
Probabilistic speech F0 contour model incorporating statistical vocabulary model of phrase-accent command sequence
Tatsuma Ishihara, Hirokazu Kameoka, Kota Yoshizato, Daisuke Saito, Shigeki Sagayama
Reconstruction of continuous voiced speech from whispers
Ian Vince McLoughlin, Jingjie Li, Yan Song
Generating fundamental frequency contours for speech synthesis in Yorùbá
Daniel R. van Niekerk, Etienne Barnard
Real-time voice conversion using artificial neural networks with rectified linear units
Elias Azarov, Maxim Vashkevich, Denis Likhachov, Alexander Petrovsky
Generation of fundamental frequency contours for Thai speech synthesis using tone nucleus model
Oraphan Krityakien, Keikichi Hirose, Nobuaki Minematsu
Unsupervised speaker and expression factorization for multi-speaker expressive synthesis of ebooks
Langzhou Chen, Norbert Braunschweiler
Which resemblance is useful to predict phrase boundary rise labels for Japanese expressive text-to-speech synthesis, numerically-expressed stylistic or distribution-based semantic?
Hideharu Nakajima, Hideyuki Mizuno, Osamu Yoshioka, Satoshi Takahashi
A targets-based superpositional model of fundamental frequency contours applied to HMM-based speech synthesis
Jinfu Ni, Yoshinori Shiga, Chiori Hori, Yutaka Kidawara
An investigation of acoustic features for singing voice conversion based on perceptual age
Kazuhiro Kobayashi, Hironori Doi, Tomoki Toda, Tomoyasu Nakano, Masataka Goto, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
Effect of MPEG audio compression on HMM-based speech synthesis
Bajibabu Bollepalli, Tuomo Raitio, Paavo Alku
Evaluation of a singing voice conversion method based on many-to-many eigenvoice conversion
Hironori Doi, Tomoki Toda, Tomoyasu Nakano, Masataka Goto, Satoshi Nakamura
Statistical nonparametric speech synthesis using sparse Gaussian processes
Tomoki Koriyama, Takashi Nose, Takao Kobayashi
Hybrid nearest-neighbor/cluster adaptive training for rapid speaker adaptation in statistical speech synthesis systems
Amir Mohammadi, Cenk Demiroglu
Uniform concatenative excitation model for synthesising speech without voiced/unvoiced classification
João P. Cabral
Production and perception of pseudo-V1CV2 outside the vowel triangle: speech illusion effects
Thi Anh Xuan Tran, Viet Son Nguyen, Eric Castelli, René Carré
Recent evolution of non-standard consonantal variants in French broadcast news
Maria Candea, Martine Adda-Decker, Lori Lamel
Architekt or archtekt? Perception of devoiced vowels produced by Japanese speakers of German
Frank Zimmerer, Rei Yasuda, Henning Reetz
Comparing vowel category response surfaces over age-varying maximal vowel spaces within and across language communities
Andrew R. Plummer, Lucie Ménard, Benjamin Munson, Mary E. Beckman
Perceived vocal attractiveness across dialects is similar but not uniform
Molly Babel, Grant McGuire
Mutual intelligibility of American, Chinese and Dutch-accented speakers of English tested by SUS and SPIN sentences
Hongyan Wang, Vincent J. van Heuven
Speech enhancement based on deep denoising autoencoder
Xugang Lu, Yu Tsao, Shigeki Matsuda, Chiori Hori
Musical noise analysis for Bayesian minimum mean-square error speech amplitude estimators based on higher-order statistics
Hiroshi Saruwatari, Suzumi Kanehara, Ryoichi Miyazaki, Kiyohiro Shikano, Kazunobu Kondo
Non-negative matrix factorization with linear constraints for single-channel speech enhancement
Nikolay Lyubimov, Mikhail Kotov
A single channel speech enhancement approach by combining statistical criterion and multi-frame sparse dictionary learning
Hung-Wei Tseng, Srikanth Vishnubhotla, Mingyi Hong, Xiangfeng Wang, Jinjun Xiao, Zhi-Quan Luo, Tao Zhang
Speech enhancement using convolutive nonnegative matrix factorization with cosparsity regularization
Majid Mirbagheri, Yanbo Xu, Sahar Akram, Shihab Shamma
Joint stochastic-deterministic Wiener filtering with recursive Bayesian estimation of deterministic speech
Matthew McCallum, Bernard Guillemin
Automatic self-supervised learning of associations between speech and text
Juho Knuuttila, Okko Räsänen, Unto K. Laine
Particle swarm optimisation of spoken dialogue system strategies
Lucie Daubigney, Matthieu Geist, Olivier Pietquin
Model-based Bayesian reinforcement learning for dialogue management
Pierre Lison
Evaluating spoken dialogue models under the interactive pattern recognition framework
Fabrizio Ghigi, María Inés Torres, Raquel Justo, José-Miguel Benedí
Multi-layer mutually reinforced random walk with hidden parameters for improved multi-party meeting summarization
Yun-Nung Chen, Florian Metze
A recursive dialogue game framework with optimal policy offering personalized computer-assisted language learning
Pei-hao Su, Yow-Bang Wang, Tsung-Hsien Wen, Tien-han Yu, Lin-shan Lee
Improving LVCSR with hidden conditional random fields for grapheme-to-phoneme conversion
Stefan Hahn, Patrick Lehnen, Simon Wiesler, Ralf Schlüter, Hermann Ney
Context-dependent phone mapping for LVCSR of under-resourced languages
Van Hai Do, Xiong Xiao, Eng Siong Chng, Haizhou Li
Improving grapheme-based ASR by probabilistic lexical modeling approach
Ramya Rasipuram, Mathew Magimai-Doss
Crosslingual tandem-SGMM: exploiting out-of-language data for acoustic model and feature level adaptation
Petr Motlicek, David Imseng, Philip N. Garner
Multilingual multilayer perceptron for rapid language adaptation between and across language families
Ngoc Thang Vu, Tanja Schultz
Modeling prosodic sequences with k-means and Dirichlet process GMMs
Andrew Rosenberg
Convergence of articulation rate in spontaneous speech
Antje Schweitzer, Natalie Lewandowski
Phonetic convergence in shadowed speech: a comparison of perceptual and acoustic measures
Jennifer S. Pardo
Pitch and duration as a basis for entrainment of overlapped speech onsets
Marcin Włodarczak, Juraj Šimko, Petra Wagner
Investigating fine temporal dynamics of prosodic and lexical accommodation
Francesca Bonin, Céline De Looze, Sucheta Ghosh, Emer Gilmartin, Carl Vogel, Anna Polychroniou, Hugues Salamin, Alessandro Vinciarelli, Nick Campbell
Spontaneous and explicit speech imitation
Jeesun Kim, Ruben Demirdjian, Chris Davis
Imitation interacts with one's second-language phonology but it does not operate cross-linguistically
Václav Jonáš Podlipský, Šárka Šimáčková, Kateřina Chládková
Prosodic markings of semantic predictability in Taiwan Mandarin
Po-jen Hsieh
How did it work? Historic phonetic devices explained by coeval photographs
Rüdiger Hoffmann, Dieter Mehnert, Rolf Dietzel
Eliciting speech with sentence lists — a critical evaluation with special emphasis on segmental anchoring
Lea S. Kohtz, Oliver Niebuhr
An MRI-based acoustic study of Mandarin vowels
Yuguang Wang, Jianwu Dang, Xi Chen, Jianguo Wei, Hongcui Wang, Kiyoshi Honda
Melody metrics for prosodic typology: comparing English, French and Chinese
Daniel Hirst
Velic coordination in French nasals: a real-time magnetic resonance imaging study
Michael Proctor, Louis Goldstein, Adam Lammert, Dani Byrd, Asterios Toutios, Shrikanth Narayanan
Learning to imitate adult speech with the KLAIR virtual infant
Mark Huckvale, Amrita Sharma
Physics-based synthesis of disordered voices
Jorge C. Lucero, Jean Schoentgen, Mara Behlau
Place assimilation and articulatory strategies: the case of sibilant sequences in French as L1 and L2
Sonia d'Apolito, Barbara Gili Fivela
Effects of lexical class and lemma frequency on German homographs
Barbara Samlowski, Petra Wagner, Bernd Möbius
Measuring laryngealization in running speech: interaction with contrastive tones in Yalálag Zapotec
Leonardo Lancia, Heriberto Avelino, Daniel Voigt
A neural oscillator model of speech timing and rhythm
Erin Rusaw
Observations of perseverative coarticulation in lateral approximants using MRI
Nicole Wong, Maojing Fu, Zhi-Pei Liang, Ryan K. Shosted, Bradley P. Sutton
Timing differences in articulation between voiced and voiceless stop consonants: an analysis of cine-MRI data
Masako Fujimoto, Tatsuya Kitamura, Hiroaki Hatano, Ichiro Fujimoto
Vocal tract cross-distance estimation from real-time MRI using region-of-interest analysis
Adam Lammert, Vikram Ramanarayanan, Michael Proctor, Shrikanth Narayanan
Syllable nuclei detection using perceptually significant features
Apoorv Reddy Arrabothu, Nivedita Chennupati, B. Yegnanarayana
Truncation of pharyngeal gesture in English diphthong [aɪ]
Fang-Ying Hsieh, Louis Goldstein, Dani Byrd, Shrikanth Narayanan
The effect of word frequency and lexical class on articulatory-acoustic coupling
Zhaojun Yang, Vikram Ramanarayanan, Dani Byrd, Shrikanth Narayanan
Discrimination between fricative and affricate in Japanese using time and spectral domain variables
Kimiko Yamakawa, Shigeaki Amano
L2 syntax acquisition: the effect of oral and written computer assisted practice
Polina Drozdova, Catia Cucchiarini, Helmer Strik
The physiological use of the charismatic voice in political speech
Rosario Signorello, Didier Demolin
Crosslinguistic corpus of hesitation phenomena: a corpus for investigating first and second language speech performance
Ralph L. Rose
Real-time control of a 2D animation model of the vocal tract using optopalatography
Simon Preuß, Christiane Neuschaefer-Rube, Peter Birkholz
The influence of accentuation and polysyllabicity on compensatory shortening in German
Jessica Siddins, Jonathan Harrington, Felicitas Kleber, Ulrich Reubold
An investigation of vowel epenthesis in Chinese learners' production of German consonants
Hongwei Ding, Rüdiger Hoffmann
On the evaluation of inversion mapping performance in the acoustic domain
Korin Richmond, Zhen-Hua Ling, Junichi Yamagishi, Benigno Uría
Comparing computation in Gaussian mixture and neural network based large-vocabulary speech recognition
Vishwa Gupta, Gilles Boulianne
Simultaneous perturbation stochastic approximation for automatic speech recognition
Daniel Stein, Jochen Schwenninger, Michael Stadtschnitzer
Hardware/software codesign for mobile speech recognition
David Sheffield, Michael Anderson, Yunsup Lee, Kurt Keutzer
Exploiting the succeeding words in recurrent neural network language models
Yangyang Shi, Martha Larson, Pascal Wiggers, Catholijn M. Jonker
Speech acoustic unit segmentation using hierarchical Dirichlet processes
Amir Hossein Harati Nejad Torbati, Joseph Picone, Marc Sobel
Transducer-based speech recognition with dynamic language models
Munir Georges, Stephan Kanthak, Dietrich Klakow
A method for structure estimation of weighted finite-state transducers and its application to grapheme-to-phoneme conversion
Yotaro Kubo, Takaaki Hori, Atsushi Nakamura
Combining forward-based and backward-based decoders for improved speech recognition performance
Denis Jouvet, Dominique Fohr
iVector-based acoustic data selection
Olivier Siohan, Michiel Bacchiani
Accurate and compact large vocabulary speech recognition on mobile devices
Xin Lei, Andrew Senior, Alexander Gruenstein, Jeffrey Sorensen
Pre-initialized composition for large-vocabulary speech recognition
Cyril Allauzen, Michael Riley
Speaker dependent activation keyword detector based on GMM-UBM
Evelyn Kurniawati, Sapna George
Written-domain language modeling for automatic speech recognition
Haşim Sak, Yun-hsuan Sung, Françoise Beaufays, Cyril Allauzen
Detecting words in speech using linear separability in a bag-of-events vector space
Maarten Versteegh, Louis ten Bosch
On the improvement of multimodal voice activity detection
Matt Burlick, Dimitrios Dimitriadis, Eric Zavesky
Using linguistic information to detect overlapping speech
Jürgen T. Geiger, Florian Eyben, Nicholas Evans, Björn Schuller, Gerhard Rigoll
Incremental acoustic subspace learning for voice activity detection using harmonicity-based features
Jiaxing Ye, Takumi Kobayashi, Masahiro Murakawa, Tetsuya Higuchi
Endpoint detection using weighted finite state transducer
Hoon Chung, SungJoo Lee, YunKeun Lee
A robust frontend for VAD: exploiting contextual, discriminative and spectral cues of human voice
Maarten Van Segbroeck, Andreas Tsiartas, Shrikanth Narayanan
All for one: feature combination for highly channel-degraded speech activity detection
Martin Graciarena, Abeer Alwan, Dan Ellis, Horacio Franco, Luciana Ferrer, John H. L. Hansen, Adam Janin, Byung-Suk Lee, Yun Lei, Vikramjit Mitra, Nelson Morgan, Seyed Omid Sadjadi, T. J. Tsai, Nicolas Scheffer, Lee Ngee Tan, Benjamin Williams
Superposed speech localisation using frequency tracking
Maxime Le Coz, Julien Pinquier, Régine André-Obrecht
Multi-band long-term signal variability features for robust voice activity detection
Andreas Tsiartas, Theodora Chaspari, Nassos Katsamanis, Prasanta Kumar Ghosh, Ming Li, Maarten Van Segbroeck, Alexandros Potamianos, Shrikanth Narayanan
A low-complexity voice activity detector for smart hearing protection of hyperacusic persons
Narimene Lezzoum, Ghyslain Gagnon, Jérémie Voix
Speech activity detection on YouTube using deep neural networks
Neville Ryant, Mark Liberman, Jiahong Yuan
Speaker and noise independent voice activity detection
François G. Germain, Dennis L. Sun, Gautham J. Mysore
Confidence-based scoring: a useful diagnostic tool for detection tasks
T. J. Tsai, Adam Janin
Concurrent processing of voice activity detection and noise reduction using empirical mode decomposition and modulation spectrum analysis
Yasuaki Kanai, Shota Morita, Masashi Unoki
The Furhat social companion talking head
Samer Al Moubayed, Jonas Beskow, Gabriel Skantze
Audition: the most important sense for humanoid robots?
Rodolphe Gelin, Gabriele Barbieri
Ultraspeech-player: intuitive visualization of ultrasound articulatory data for speech therapy and pronunciation training
Thomas Hueber
Laughter modulation: from speech to speech-laugh
Jieun Oh, Ge Wang
Refr: an open-source reranker framework
Daniel M. Bikel, Keith B. Hall
Embedding speech recognition to control lights
Alessandro Sosi, Fabio Brugnara, Luca Cristoforetti, Marco Matassoni, Mirco Ravanelli, Maurizio Omologo
The MUTE silent speech recognition system
Geoffrey S. Meltzner, James T. Heaton, Yunbin Deng
The Edinburgh speech production facility DoubleTalk corpus
James M. Scobbie, Alice Turk, Christian Geng, Simon King, Robin Lickley, Korin Richmond
Lexee: a cloud-based platform for building and deploying voice-enabled mobile applications
Dmitry Sityaev, Jonathan Hotz, Vadim Snitkovsky
Visualizing articulatory data with VisArtico
Slim Ouni
A tool to elicit and collect multicultural and multimodal laughter
Mariette Soury, Clément Gossart, Martine Adda-Decker, Laurence Devillers
Design of a mobile app for Interspeech conferences: towards an open tool for the spoken language community
Robert Schleicher, Tilo Westermann, Jinjin Li, Moritz Lawitschka, Benjamin Mateev, Ralf Reichmuth, Sebastian Möller
The speech recognition virtual kitchen
Florian Metze, Eric Fosler-Lussier, Rebecca Bates
Multilingual web conferencing using speech-to-speech translation
John Chen, Shufei Wen, Vivek Kumar Rangarajan Sridhar, Srinivas Bangalore
ROCme! software for the recording and management of speech corpora
Emmanuel Ferragne, Sébastien Flavier, Christian Fressard
Voice search in mobile applications with the RootVole framework
Felix Burkhardt
On-line audio dilation for human interaction
John S. Novak, Jason Archer, Valeriy Shafiro, Robert Kenyon, Jason Leigh
Phase-aware single-channel speech enhancement
Pejman Mowlaee, Mario Kaoru Watanabe, R. Saeidi
A free online accent and intonation dictionary for teachers and learners of Japanese
Hiroko Hirano, Ibuki Nakamura, Nobuaki Minematsu, Masayuki Suzuki, Chieko Nakagawa, Noriko Nakamura, Yukinori Tagawa, Keikichi Hirose, Hiroya Hashimoto
Reactive accent interpolation through an interactive map application
Maria Astrinaki, Junichi Yamagishi, Simon King, Nicolas d'Alessandro, Thierry Dutoit
A non-experts user interface for obtaining automatic diagnostic spelling evaluations for learners of the German writing system
Kay Berkling
Simple4All
Robert A. J. Clark
On-line learning of lexical items and grammatical constructions via speech, gaze and action-based human-robot interaction
Grégoire Pointeau, Maxime Petit, Xavier Hinaut, Guillaume Gibert, Peter Ford Dominey
Development of a pronunciation training system based on auditory-visual elements
Haruko Miyakoda
Real-time and non-real-time voice conversion systems with web interfaces
Elias Azarov, Maxim Vashkevich, Denis Likhachov, Alexander Petrovsky
Application of the NAO humanoid robot in the treatment of bone marrow-transplanted children (demo)
E. Csala, G. Németh, Cs. Zainkó
Photo-realistic expressive text to talking head synthesis
Vincent Wan, Robert Anderson, Art Blokland, Norbert Braunschweiler, Langzhou Chen, BalaKrishna Kolluru, Javier Latorre, Ranniery Maia, Björn Stenger, Kayoko Yanagisawa, Yannis Stylianou, Masami Akamine, M. J. F. Gales, Roberto Cipolla
Demonstration of LAPSyD: Lyon-Albuquerque phonological systems database
Ian Maddieson, Sébastien Flavier, Egidio Marsico, François Pellegrino
Speechmark acoustic landmark tool: application to voice pathology
Suzanne Boyce, Marisha Speights, Keiko Ishikawa, Joel MacAuslan
MODIS: an audio motif discovery software
Laurence Catanese, Nathan Souviraà-Labastie, Bingqing Qu, Sebastien Campion, Guillaume Gravier, Emmanuel Vincent, Frédéric Bimbot
The acoustics of word stress in Swedish: a function of stress level, speaking style and word accent
Anders Eriksson, Plínio A. Barbosa, Joel Åkesson
Intonational contrasts encode speaker's certainty in neutral vs. incredulity declarative questions in French
Amandine Michelas, Cristel Portes, Maud Champagne-Lavau
Prosodic changes pre-announcing a syntactic completion point in Japanese utterance
Yuichi Ishimoto, Mika Enomoto, Hitoshi Iida
Prosodic encoding of declarative, interrogative and imperative sentences in Jaminjung, a language of Australia
Candide Simard
Crosslinguistic priming in interactive reference: evidence for conceptual alignment in speech production
Anne Vullinghs, Martijn Goudbeek, Emiel Krahmer
A cross-linguistic study on turn-taking and temporal alignment in verbal interaction
Spyros Kousidis, David Schlangen, Stavros Skopeteas
Discriminative nonnegative dictionary learning using cross-coherence penalties for single channel source separation
Emad M. Grais, Hakan Erdogan
Monaural speech segregation based on pitch track correction using an ensemble Kalman filter
Han-Gyu Kim, Gil-Jin Jang, Jeong-Sik Park, Yung-Hwan Oh
Voice activity classification for automatic bi-speaker adaptive beamforming in speech separation
Thuy N. Tran, William Cowley, André Pollok
Blind source separation using spatially distributed microphones based on microphone-location dependent source activities
Keisuke Kinoshita, Mehrez Souden, Tomohiro Nakatani
Non-negative tensor factorisation of modulation spectrograms for monaural sound source separation
Tom Barker, Tuomas Virtanen
Iterative sinusoidal-based partial phase reconstruction in single-channel source separation
Mario Kaoru Watanabe, Pejman Mowlaee
Classification of speech under stress by modeling the aerodynamics of the laryngeal ventricle
Xiao Yao, Takatoshi Jitsuhiro, Chiyomi Miyajima, Norihide Kitaoka, Kazuya Takeda
“Sure, I did the right thing”: a system for sarcasm detection in speech
Rachel Rakov, Andrew Rosenberg
Investigating voice quality as a speaker-independent indicator of depression and PTSD
Stefan Scherer, Giota Stratou, Jonathan Gratch, Louis-Philippe Morency
A corpus-based study of elderly and young speakers of European Portuguese: acoustic correlates and their impact on speech recognition performance
Thomas Pellegrini, Annika Hämäläinen, Philippe Boula de Mareüil, Michael Tjalve, Isabel Trancoso, Sara Candeias, Miguel Sales Dias, Daniela Braga
Modeling spectral variability for the classification of depressed speech
Nicholas Cummins, Julien Epps, Vidhyasaharan Sethu, Michael Breakspear, Roland Goecke
Sentiment analysis of online spoken reviews
Verónica Pérez-Rosas, Rada Mihalcea
Demographic recommendation by means of group profile elicitation using speaker age and gender recognition
Sven Ewan Shepstone, Zheng-Hua Tan, Søren Holdt Jensen
Affective classification of generic audio clips using regression models
Nikolaos Malandrakis, Shiva Sundaram, Alexandros Potamianos
A preliminary study of cross-lingual emotion recognition from speech: automatic classification versus human perception
Je Hun Jeon, Duc Le, Rui Xia, Yang Liu
Active learning for dimensional speech emotion recognition
Wenjing Han, Haifeng Li, Huabin Ruan, Lin Ma, Jiayin Sun, Björn Schuller
Auditory detectability of vocal ageing and its effect on forensic automatic speaker recognition
Finnian Kelly, Naomi Harte
Comparative study of speaker personality traits recognition in conversational and broadcast news speech
Firoj Alam, G. Riccardi
Active learning by label uncertainty for acoustic emotion recognition
Zixing Zhang, Jun Deng, Erik Marchi, Björn Schuller
Modeling therapist empathy and vocal entrainment in drug addiction counseling
Bo Xiao, Panayiotis G. Georgiou, Zac E. Imel, David C. Atkins, Shrikanth Narayanan
Estimating callers' levels of knowledge in call center dialogues
Chiaki Miyazaki, Ryuichiro Higashinaka, Toshiro Makino, Yoshihiro Matsuo
Energy and F0 contour modeling with functional data analysis for emotional speech detection
Juan Pablo Arias, Carlos Busso, Néstor Becerra Yoma
Incremental emotion recognition
Taniya Mishra, Dimitrios Dimitriadis
Comparison of spectrum estimators in speaker verification: mismatch conditions induced by vocal effort
Cemal Hanilçi, Tomi Kinnunen, Padmanabhan Rajan, Jouni Pohjalainen, Paavo Alku, Figen Ertaş
Using denoising autoencoder for emotion recognition
Rui Xia, Yang Liu
Using twin-HMM-based audio-visual speech enhancement as a front-end for robust audio-visual speech recognition
Ahmed Hussen Abdelaziz, Steffen Zeiler, Dorothea Kolossa
Spectro-temporal directional derivative features for automatic speech recognition
James Gibson, Maarten Van Segbroeck, Antonio Ortega, Panayiotis G. Georgiou, Shrikanth Narayanan
Attribute-based histogram equalization (HEQ) and its adaptation for robust speech recognition
Xiong Xiao, Eng Siong Chng, Haizhou Li
Modified cepstral mean normalization — transforming to utterance specific non-zero mean
Vikas Joshi, N. Vishnu Prasad, S. Umesh
Damped oscillator cepstral coefficients for robust speech recognition
Vikramjit Mitra, Horacio Franco, Martin Graciarena
Regularized MVDR spectrum estimation-based robust feature extractors for speech recognition
Md. Jahangir Alam, Patrick Kenny, Douglas O'Shaughnessy
Noise adaptive training for subspace Gaussian mixture models
Liang Lu, Arnab Ghoshal, Steve Renals
The IBM speech activity detection system for the DARPA RATS program
George Saon, Samuel Thomas, Hagen Soltau, Sriram Ganapathy, Brian Kingsbury
Conditional emission densities for combining speech enhancement and recognition systems
Armin Sehr, Takuya Yoshioka, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Roland Maas, Walter Kellermann
Channel selection using n-best hypothesis for multi-microphone ASR
Martin Wolf, Climent Nadeu
Reverberant speech recognition based on denoising autoencoder
Takaaki Ishii, Hiroki Komiyama, Takahiro Shinozaki, Yasuo Horiuchi, Shingo Kuroiwa
Adaptive stereo-based stochastic mapping
Shay Maymon, Pierre Dognin, Xiaodong Cui, Vaibhava Goel
Distribution-based feature normalization for robust speech recognition leveraging context and dynamics cues
Yu-Chen Kao, Berlin Chen
An investigation of temporally varying weight regression for noise robust speech recognition
Shilin Liu, Khe Chai Sim
Feature space generalized variable parameter HMMs for noise robust recognition
Yang Li, Xunying Liu, Lan Wang
Bidirectional truncated recurrent neural networks for efficient speech denoising
Philémon Brakel, Dirk Stroobandt, Benjamin Schrauwen
Multi-stream recognition of noisy speech with performance monitoring
Ehsan Variani, Feipeng Li, Hynek Hermansky
Model-based noise suppression using unsupervised estimation of hidden Markov model for non-stationary noise
Masakiyo Fujimoto, Tomohiro Nakatani
Joint noise cancellation and dereverberation using multi-channel linearly constrained minimum variance filter
Karan Nathwani, Rajesh M. Hegde
Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?
Marc Delcroix, Yotaro Kubo, Tomohiro Nakatani, Atsushi Nakamura
Histogram equalization of real and imaginary modulation spectra for noise-robust speech recognition
Hsin-Ju Hsieh, Berlin Chen, Jeih-weih Hung
An investigation of spectral restoration algorithms for deep neural networks based noise robust speech recognition
Bo Li, Yu Tsao, Khe Chai Sim
Bounded conditional mean imputation with an approximate posterior
Ulpu Remes
Mixtures of Bayesian joint factor analyzers for noise robust automatic speech recognition
Xiaodong Cui, Vaibhava Goel, Brian Kingsbury
Robust speech enhancement techniques for ASR in non-stationary noise and dynamic environments
Gang Liu, Dimitrios Dimitriadis, Enrico Bocchieri
Optimization of sigmoidal rate-level function based on acoustic features
Víctor Poblete, Néstor Becerra Yoma, Richard M. Stern
Composing auditory ERPs: cross-linguistic comparison of auditory change complex for Japanese fricative consonants
Makiko Sadakata, Loukianos Spyrou, Mizuki Shingai, Kaoru Sekiyama
How voicing, place and manner of articulation differently modulate event-related potentials associated with response inhibition
Nathalie Bedoin, Jennifer Krzonowski, Emmanuel Ferragne
Categorization of speech in early auditory evoked responses
Ludovic Bellier, Michel Mazzuca, Hung Thai-Van, Anne Caclin, Rafael Laboissière
Perception and production of Italian vowels: an ERP study
Anna Dora Manca, Mirko Grimaldi
Implicit learning leads to familiarity effects for intonation but not for voice
Ann-Kathrin Grohe, Bettina Braun
Spoofing and countermeasures for automatic speaker verification
Nicholas Evans, Tomi Kinnunen, Junichi Yamagishi
I-vectors meet imitators: on vulnerability of speaker verification systems against voice mimicry
Rosa González Hautamäki, Tomi Kinnunen, Ville Hautamäki, Timo Leino, Anne-Maria Laukkanen
Security evaluation of i-vector based speaker verification systems against hill-climbing attacks
Marta Gomez-Barrero, Javier Gonzalez-Dominguez, Javier Galbally, Joaquin Gonzalez-Rodriguez
A new speaker verification spoofing countermeasure based on local binary patterns
Federico Alegre, Ravichander Vipperla, Asmaa Amehraye, Nicholas Evans
Voice transformation-based spoofing of text-dependent speaker verification systems
Zvi Kons, Hagai Aronowitz
Vulnerability evaluation of speaker verification under voice conversion spoofing: the effect of text constraints
Zhizheng Wu, Anthony Larcher, Kong Aik Lee, Eng Siong Chng, Tomi Kinnunen, Haizhou Li
Efficient speech transcription through respeaking
Matthias Sperber, Graham Neubig, Christian Fügen, Satoshi Nakamura, Alex Waibel
Annotation and classification of political advertisements
Samuel Kim, Panayiotis G. Georgiou, Shrikanth Narayanan
Using role play for collecting question-answer pairs for dialogue agents
Ryuichiro Higashinaka, Kohji Dohsaka, Hideki Isozaki
Individual differences of emotional expression in speaker's behavioral and autonomic responses
Yoshiko Arimoto, Kazuo Okanoya
Development and validation of the conversational agents scale (CAS)
Ina Wechsung, Benjamin Weiss, Christine Kühnel, Patrick Ehrenbrink, Sebastian Möller
Motivational feedback in crowdsourcing: a case study in speech transcription
G. Riccardi, A. Ghosh, S. A. Chowdhury, Ali Orkan Bayer
The Sheffield Wargames Corpus
Charles Fox, Yulan Liu, Erich Zwyssig, Thomas Hain
Formalizing expert knowledge for developing accurate speech recognizers
Anuj Kumar, Florian Metze, Wenyi Wang, Matthew Kam
Analysis of gaze and speech patterns in three-party quiz game interaction
Samer Al Moubayed, Jens Edlund, Joakim Gustafson
Methodologies for the evaluation of speaker diarization and automatic speech recognition in the presence of overlapping speech
Olivier Galibert
'Houston, we have a solution': using NASA Apollo program to advance speech and language processing technology
Abhijeet Sangwan, Lakshmish Kaushik, Chengzhu Yu, John H. L. Hansen, Douglas W. Oard
Annotation errors detection in TTS corpora
Jindřich Matoušek, Daniel Tihelka
Technique for automatic sentence level alignment of long speech and transcripts
Imran Ahmed, Sunil Kumar Kopparapu
Text-to-speech alignment of long recordings using universal phone models
Sarah Hoffmann, Beat Pfister
Lightly supervised discriminative training of grapheme models for improved sentence-level alignment of speech and text data
Adriana Stan, Peter Bell, Junichi Yamagishi, Simon King
Automatic social role recognition in professional meetings using conditional random fields
Ashtosh Sapru, Hervé Bourlard
Same same but different — an acoustical comparison of the automatic segmentation of high quality and mobile telephone speech
Christoph Draxler, Hanna S. Feiser
Performance of the MVOCA silent speech interface across multiple speakers
Robin Hofe, Jie Bai, Lam A. Cheah, Stephen R. Ell, James M. Gilbert, Roger K. Moore, Phil D. Green
Automatic glottal tracking from high-speed digital images using a continuous normalized cross correlation
Gustavo Andrade-Miranda, Juan Ignacio Godino-Llorente
Automatic evaluation of Parkinson's speech — acoustic, prosodic and voice related cues
Tobias Bocklet, Stefan Steidl, Elmar Nöth, Sabine Skodda
Comparison of approaches for an efficient phonetic decoding
Luiza Orosanu, Denis Jouvet
Learning speaker-specific pronunciations of disordered speech
H. Christensen, Phil D. Green, Thomas Hain
Adapting a speech into sign language translation system to a new domain
V. López-Ludeña, R. San-Segundo, C. González-Morcillo, J. C. López, E. Ferreiro
Language-universal speech audiometry with automated scoring
Bart Vaerenberg, Louis ten Bosch, Wojtek Kowalczyk, Martine Coene, Herwig De Smet, Paul J. Govaerts
Balancing word lists in speech audiometry through large spoken language corpora
Annemiek Hammer, Bart Vaerenberg, Wojtek Kowalczyk, Louis ten Bosch, Martine Coene, Paul J. Govaerts
Developing an information system for deaf
V. López-Ludeña, R. San-Segundo, J. Ferreiros, J. M. Pardo, E. Ferreiro
Dysarthric speech recognition using dysarthria-severity-dependent and speaker-adaptive models
Myung Jong Kim, Joohong Yoo, Hoirin Kim
Voice pathology detection and classification using MPEG-7 audio low-level features
Ghulam Muhammad, Moutasem Melhem
Empirical mode decomposition-based spectral acoustic cues for disordered voices analysis
Abdellah Kacha, Francis Grenez, Jean Schoentgen
Exemplar-based individuality-preserving voice conversion for articulation disorders in noisy environments
Ryo Aihara, Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki
Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech
H. Christensen, M. B. Aniol, Peter Bell, Phil D. Green, Thomas Hain, Simon King, Pawel Swietojanski
Effects of envelope filter cutoff frequency on the intelligibility of Mandarin noise-vocoded speech in babble noise: implications for cochlear implants
Guangting Mai, James W. Minett, William S.-Y. Wang
CSLM — a modular open-source continuous space language modeling toolkit
Holger Schwenk
Speed up of recurrent neural network language models with sentence independent subsampling stochastic gradient descent
Yangyang Shi, Mei-Yuh Hwang, Kaisheng Yao, Martha Larson
Improving unsupervised language model adaptation with discriminative data filtering
Shuangyu Chang, Michael Levit, Partha Parthasarathy, Benoit Dumoulin
Lightly supervised training for risk-based discriminative language models
Akio Kobayashi, Takahiro Oku, Yuya Fujita, Shoei Sato
Investigation of MT-based ASR confusion models for semi-supervised discriminative language modeling
Erinç Dikici, Emily Prud'hommeaux, Brian Roark, Murat Saraçlar
Unsupervised discriminative language modeling using error rate estimator
Takanobu Oba, Atsunori Ogawa, Takaaki Hori, Hirokazu Masataki, Atsushi Nakamura
A region-specific feature-space transformation for speaker adaptation and singularity analysis of Jacobian matrix
Shakti P. Rath, Lukáš Burget, Martin Karafiát, Ondřej Glembek, Jan Černocký
An explicit independence constraint for factorised adaptation in speech recognition
Y.-Q. Wang, M. J. F. Gales
Asynchronous factorisation of speaker and background with feature transforms in speech recognition
Oscar Saz, Thomas Hain
Cluster adaptive training with factorized decision trees for speech recognition
Kai Yu, Hainan Xu
Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition
Ossama Abdel-Hamid, Hui Jiang
Text-to-speech inspired duration modeling for improved whole-word acoustic models
Keith Kintzley, Aren Jansen, Hynek Hermansky
Duration of early vocalisations
Adele Gregory, Marija Tabain, Michael Robb
Acoustic development of vowel production in American English children
Jing Yang, Robert Allen Fox
The role of intrinsic motivations in learning sensorimotor vocal mappings: a developmental robotics study
Clément Moulin-Frier, Pierre-Yves Oudeyer
Children's timing and repair strategies for communication in adverse listening conditions
Valerie Hazan, Michèle Pettinato
Speech planning as an index of speech motor control maturity
Guillaume Barbier, Pascal Perrier, Lucie Ménard, Yohan Payan, Mark K. Tiede, Joseph S. Perkell
The relationship between gender-differentiated productions of /s/ and gender role behaviour in young children
Melissa Kinsman, Fangfang Li
Data-driven design of a sentence list for an articulatory speech corpus
Jeffrey Berry, Luciano Fadiga
Faster 3D vocal tract real-time MRI using constrained reconstruction
Yinghua Zhu, Asterios Toutios, Shrikanth Narayanan, Krishna Nayak
Relevance-weighted-reconstruction of articulatory features in deep-neural-network-based acoustic-to-articulatory mapping
Claudia Canevari, Leonardo Badino, Luciano Fadiga, Giorgio Metta
Word frequency, vowel length and vowel quality in speech production: an EMA study of the importance of experience
Fabian Tomaschek, Martijn Wieling, Denis Arnold, R. Harald Baayen
Towards a systematic and quantitative analysis of vocal tract data
Samuel Silva, António Teixeira, Catarina Oliveira, Paula Martins
A two-step technique for MRI audio enhancement using dictionary learning and wavelet packet analysis
Colin Vaz, Vikram Ramanarayanan, Shrikanth Narayanan
Electromagnetic articulography with AG500 and AG501
Massimo Stella, Antonio Stella, Francesco Sigona, Paolo Bernardini, Mirko Grimaldi, Barbara Gili Fivela
Development and implementation of fiducial markers for vocal tract MRI imaging and speech articulatory modelling
Pierre Badin, Julián Andrés Valdés Vargas, Arielle Koncki, Laurent Lamalle, Christophe Savariaux
Functional data analysis of tongue articulation in palatal vowels: Gothenburg and Malmöhus Swedish /iː, yː, ̟ʉː/
Susanne Schötz, Johan Frid, Lars Gustafsson, Anders Löfqvist
SMASH: a tool for articulatory data processing and analysis
Jordan R. Green, Jun Wang, David L. Wilson
Emotion recognition of conversational affective speech using temporal course modeling
Jen-Chun Lin, Chung-Hsien Wu, Wen-Li Wei
The role of empathy in the recognition of vocal emotions
Rene Altrov, Hille Pajupuu, Jaan Pajupuu
Electrophysiological evidence for benefits of imitation during the processing of spoken words embedded in sentential contexts
Angèle Brunellière, Sophie Dufour
Compensatory speech response to time-scale altered auditory feedback
Rintaro Ogane, Masaaki Honda
Bhattacharyya distance based emotional dissimilarity measure in multi-dimensional space for emotion classification
Tin Lay Nwe, Trung Hieu Nguyen, Dilip Kumar Limbu
On the enhancement of dereverberation algorithms based on a perceptual evaluation criterion
Thiago de M. Prego, Amaro A. de Lima, Sergio L. Netto
Revisiting pitch slope and height effects on perceived duration
Carlos Gussenhoven, Wencui Zhou
Adaptation to natural fast speech and time-compressed speech in children
Hélène Guiraud, Emmanuel Ferragne, Nathalie Bedoin, Véronique Boulenger
Modeling durational incompressibility
Andreas Windmann, Juraj Šimko, Britta Wrede, Petra Wagner
Perceived prosodic correlates of smiled speech in spontaneous data
Caroline Émond, Lucie Ménard, Marty Laforest
Predicting speech quality based on interactivity and delay
Alexander Raake, Katrin Schoenenberg, Janto Skowronek, Sebastian Egger
Perceptual, acoustic and electroglottographic correlates of 3 aggressive attitudes in French: a pilot study
Charlotte Kouklia, Nicolas Audibert
Theme identification in telephone service conversations using quaternions of speech features
Mohamed Morchid, Georges Linarès, Marc El-Beze, Renato De Mori
Detection of laughter in children's speech using spectral and prosodic acoustic features
Hrishikesh Rao, Jonathan C. Kim, Agata Rozga, Mark A. Clements
Classification of cooperative and competitive overlaps in speech using cues from the context, overlapper, and overlappee
Khiet P. Truong
Annotation and detection of conflict escalation in political debates
Samuel Kim, Fabio Valente, Alessandro Vinciarelli
Machine learning of probabilistic phonological pronunciation rules from the Italian CLIPS corpus
Florian Schiel, Mary Stevens, Uwe D. Reichel, Francesco Cutugno
Human perception of alcoholic intoxication in speech
Barbara Baumeister, Florian Schiel
Phonetic manifestation and influence of zero anaphora in Chinese reading texts
Luying Hou, Yuan Jia, Aijun Li
Diacritics restoration for Arabic dialect texts
S. Harrat, M. Abbas, K. Meftouh, K. Smaili
Effects of talk-spurt silence boundary thresholds on distribution of gaps and overlaps
Marcin Włodarczak, Petra Wagner
Final lengthening in Russian: a corpus-based study
Tatiana Kachkovskaia, Nina Volskaya, Pavel Skrelin
From segmentation bootstrapping to transcription-to-word conversion
Uwe D. Reichel
Manual and automatic tone annotation: the case of an endangered language from North Vietnam “Mo Piu”
Geneviève Caelen-Haumont, Katarina Bartkova
Non-canonical syntactic structures in discourse: tonality, tonicity and tones in English (semi-)spontaneous speech
Laetitia Leonarduzzi, Sophie Herment
Prediction of strategy and outcome as negotiation unfolds by using basic verbal and behavioral features
Elnaz Nouri, Sunghyun Park, Stefan Scherer, Jonathan Gratch, Peter Carnevale, Louis-Philippe Morency, David Traum
Unsupervised naming of speakers in broadcast TV: using written names, pronounced names or both?
Johann Poignant, Laurent Besacier, Viet Bac Le, Sophie Rosset, Georges Quénot
Integer linear programming for speaker diarization and cross-modal identification in TV broadcast
Hervé Bredin, Johann Poignant
Native accent classification via i-vectors and speaker compensation fusion
Andrea DeMarco, Stephen J. Cox
An open-source state-of-the-art toolbox for broadcast news diarization
Mickael Rouvier, Grégor Dupuy, Paul Gay, Elie Khoury, Teva Merlin, Sylvain Meignier
Audio event classification using deep neural networks
Zvi Kons, Orith Toledo-Ronen
Code-switching event detection based on delta-BIC using phonetic eigenvoice models
Wei-Bin Liang, Chung-Hsien Wu, Chun-Shan Hsu
Automatic estimation of dialect mixing ratio for dialect speech recognition
Naoki Hirayama, Koichiro Yoshino, Katsutoshi Itoyama, Shinsuke Mori, Hiroshi G. Okuno
The Albayzin 2012 language recognition evaluation
Luis Javier Rodríguez-Fuentes, Niko Brümmer, Mikel Penagarikano, Amparo Varona, Germán Bordel, Mireia Diez
TRAP language identification system for RATS phase II evaluation
Kyu J. Han, Sriram Ganapathy, Ming Li, Mohamed K. Omar, Shrikanth Narayanan
Improving language identification robustness to highly channel-degraded speech through multiple system fusion
Aaron Lawson, Mitchell McLaren, Yun Lei, Vikramjit Mitra, Nicolas Scheffer, Luciana Ferrer, Martin Graciarena
Multi-centroidal duration generation algorithm for HMM-based TTS
Yongguo Kang, Jian Li, Yan Deng, Miaomiao Wang
Analysis and synthesis of shouted speech
Tuomo Raitio, Antti Suni, Jouni Pohjalainen, Manu Airaksinen, Martti Vainio, Paavo Alku
Robust estimation of multiple-regression HMM parameters for dimension-based expressive dialogue speech synthesis
Tomohiro Nagata, Hiroki Mori, Takashi Nose
A new prosody annotation protocol for live sports commentaries
Sandrine Brognaux, Benjamin Picart, Thomas Drugman
Unsupervised prominence prediction for speech synthesis
Mahnoosh Mehrabani, Taniya Mishra, Alistair Conkie
Expressive speech synthesis in MARY TTS using audiobook data and EmotionML
Marcela Charfuelan, Ingmar Steiner
Using dialog-activity similarity for spoken information retrieval
Nigel G. Ward, Steven D. Werner
A hybrid HMM/DNN approach to keyword spotting of short words
I-Fan Chen, Chin-Hui Lee
Leveraging locality for topic identification of conversational speech
Jonathan Wintrode
Person name spotting by combining acoustic matching and LDA topic models
Grégory Senay, Benjamin Bigot, Richard Dufour, Georges Linarès, Corinne Fredouille
Using phonological phrase segmentation to improve automatic keyword spotting for the highly agglutinating Hungarian language
György Szaszák, András Beke
Leveraging knowledge graphs for web-scale unsupervised semantic parsing
Larry Heck, Dilek Hakkani-Tür, Gokhan Tur
Fast and memory effective i-vector extraction using a factorized sub-space
Sandro Cumani, Pietro Laface
Effective estimation of a multi-session speaker model using information on signal parameters
Konstantin Simonchik, Andrey Shulipa, Timur Pekhovsky
Automatic regularization of cross-entropy cost for speaker recognition fusion
Ville Hautamäki, Kong Aik Lee, David A. van Leeuwen, R. Saeidi, Anthony Larcher, Tomi Kinnunen, Taufiq Hasan, Seyed Omid Sadjadi, Gang Liu, Hynek Bořil, John H. L. Hansen, Benoit Fauve
Speaker verification based on fusion of acoustic and articulatory information
Ming Li, Jangwon Kim, Prasanta Kumar Ghosh, Vikram Ramanarayanan, Shrikanth Narayanan
The distribution of calibrated likelihood-ratios in speaker recognition
David A. van Leeuwen, Niko Brümmer
Eigenageing compensation for speaker verification
Finnian Kelly, Niko Brümmer, Naomi Harte
Anchor and UBM-based multi-class MLLR m-vector system for speaker verification
A. K. Sarkar, Claude Barras
Ensemble approach in speaker verification
Leibny Paola Garcia Perera, Bhiksha Raj, Juan Arturo Nolazco-Flores
Sequential model adaptation for speaker verification
Jun Wang, Dong Wang, Xiaojun Wu, Thomas Fang Zheng, Javier Tejedor
Improving short utterance based i-vector speaker recognition using source and utterance-duration normalization techniques
A. Kanagasundaram, D. Dean, Javier Gonzalez-Dominguez, S. Sridharan, D. Ramos, Joaquin Gonzalez-Rodriguez
On leveraging conversational data for building a text dependent speaker verification system
Hagai Aronowitz, Oren Barkan
THU-EE system fusion for the NIST 2012 speaker recognition evaluation
Wei-Qiang Zhang, Zhiyi Li, Weiwei Liu, Jia Liu
Subspace-constrained supervector PLDA for speaker verification
Daniel Garcia-Romero, Alan McCree
Augmenting short-term cepstral features with long-term discriminative features for speaker verification of telephone data
Cong-Thanh Do, Claude Barras, Viet Bac Le, A. K. Sarkar
Using group delay functions from all-pole models for speaker recognition
Padmanabhan Rajan, Tomi Kinnunen, Cemal Hanilçi, Jouni Pohjalainen, Paavo Alku
Secure binary embeddings of front-end factor analysis for privacy preserving speaker verification
José Portêlo, Alberto Abad, Bhiksha Raj, Isabel Trancoso
On von Mises-Fisher mixture model in text-independent speaker identification
Jalil Taghia, Zhanyu Ma, Arne Leijon
Using phone log-likelihood ratios as features for speaker recognition
Mireia Diez, Amparo Varona, Mikel Penagarikano, Luis Javier Rodríguez-Fuentes, Germán Bordel
Handling recordings acquired simultaneously over multiple channels with PLDA
Jesús Villalba, Mireia Diez, Amparo Varona, Eduardo Lleida
Bayesian distance metric learning on i-vector for speaker verification
Xiao Fang, Najim Dehak, James Glass
Merging human and automatic system decisions to improve speaker recognition performance
Rosa González Hautamäki, Ville Hautamäki, Padmanabhan Rajan, Tomi Kinnunen
Effects of mouth-only and whole-face displays on audio-visual speech perception in noise: is the vision of a talker's full face truly the most efficient solution?
Grozdana Erjavec, Denis Legros
Acoustic and visual phonetic features in the McGurk effect — an audiovisual speech illusion
Kaisa Tiippana, Mikko Tiainen, Lari Vainio, Martti Vainio
The effect of visual speech timing and form cues on the processing of speech and nonspeech
Chris Davis, Jeesun Kim
Effect of context, rebinding and noise, on audiovisual speech fusion
Ganesh Attigodu Chandrashekara, Frédéric Berthommier, Olha Nahorna, Jean-Luc Schwartz
Social face to face communication — American English attitudinal prosody
Albert Rilliard, Donna Erickson, Takaaki Shochi, João Antônio de Moraes
Adaptation of respiratory patterns in collaborative reading
Gérard Bailly, Amélie Rochet-Capellan, Coriandre Vilain
Convolutional deep rectifier neural nets for phone recognition
László Tóth
Pitch synchronous spectral analysis for a pitch dependent recognition of voiced phonemes — PISAR
Hans-Günter Hirsch
New parameters for automatic speech recognition based on the mammalian cochlea model using resonance analysis
José Luis Oropeza Rodríguez
Using an autoencoder with deformable templates to discover features for automated speech recognition
Navdeep Jaitly, Geoffrey E. Hinton
Speaking rate normalization with lattice-based context-dependent phoneme duration modeling for personalized speech recognizers on mobile devices
Ching-Feng Yeh, Hung-yi Lee, Lin-shan Lee
Subspace models for bottleneck features
Jun Qi, Dong Wang, Javier Tejedor
Bottleneck features based on gammatone frequency cepstral coefficients
Jun Qi, Dong Wang, Ji Xu, Javier Tejedor
Cross-entropy vs. squared error training: a theoretical and experimental comparison
Pavel Golik, Patrick Doetsch, Hermann Ney
Acoustic features for detection of phonemic aspiration in voiced plosives
Vaishali Patil, Preeti Rao
Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks
Dimitri Palaz, Ronan Collobert, Mathew Magimai-Doss
Hierarchical models based on a continuous acoustic space to identify phonological features
Javier Mikel Olaso, María Inés Torres
Locality sensitive hashing for fast computation of correlational manifold learning based feature space transformations
Vikrant Singh Tomar, Richard C. Rose
Evaluating speech features with the minimal-pair ABX task: analysis of the classical MFC/PLP pipeline
Thomas Schatz, Vijayaditya Peddinti, Francis Bach, Aren Jansen, Hynek Hermansky, Emmanuel Dupoux
Knowledge integration for improving performance in LVCSR
Chen-Yu Chiang, Sabato Marco Siniscalchi, Sin-Horng Chen, Chin-Hui Lee
Inter-speaker variability in audio-visual classification of word prominence
Martin Heckmann
Parameter clustering for temporally varying weight regression for automatic speech recognition
Shilin Liu, Khe Chai Sim
Phone duration modeling using clustering of rich contexts
Tanel Alumäe, Rena Nemoto
Human mouth state detection using low frequency ultrasound
Farzaneh Ahmadi, Mousa Ahmadi, Ian Vince McLoughlin
Lexical stress detection for L2 English speech using deep belief networks
Kun Li, Xiaojun Qian, Shiyin Kang, Helen Meng
MLP-HMM two-stage unsupervised training for low-resource languages on conversational telephone speech recognition
Yanmin Qian, Jia Liu
Failure transitions for joint n-gram models and G2P conversion
Josef R. Novak, Nobuaki Minematsu, Keikichi Hirose
Generative modeling of speech F0 contours
Hirokazu Kameoka, Kota Yoshizato, Tatsuma Ishihara, Yasunori Ohishi, Kunio Kashino, Shigeki Sagayama
G2P variant prediction techniques for ASR and STD
Marelie H. Davel, Charl van Heerden, Etienne Barnard
Rhythm analysis of second-language speech through low-frequency auditory features
Jin Jin, Joseph Tepperman
Graph-based semi-supervised learning for phone and segment classification
Yuzong Liu, Katrin Kirchhoff
Selective use of gaze information to improve ASR performance in noisy environments by cache-based class language model adaptation
Ao Shen, Neil Cooke, Martin Russell
Deep segmental neural networks for speech recognition
Ossama Abdel-Hamid, Li Deng, Dong Yu, Hui Jiang
Quantifying cross-linguistic variation in grapheme-to-phoneme mapping
Martine Coene, Annemiek Hammer, Wojtek Kowalczyk, Louis ten Bosch, Bart Vaerenberg, Paul J. Govaerts
Estimation of interest and comprehension level of audience through multi-modal behaviors in poster conversations
Tatsuya Kawahara, Soichiro Hayashi, Katsuya Takanashi
A new DNN-based high quality pronunciation evaluation for computer-aided language learning (CALL)
Wenping Hu, Yao Qian, Frank K. Soong
A multi-domain dialog system to integrate heterogeneous spoken dialog systems
Joaquin Planells, Lluís-F. Hurtado, Encarna Segarra, Emilio Sanchis
Development and evaluation of spoken dialog systems with one or two agents
Yuki Todo, Ryota Nishimura, Kazumasa Yamamoto, Seiichi Nakagawa
User feedback in human-robot interaction: prosody, gaze and timing
Gabriel Skantze, Catharine Oertel, Anna Hjalmarsson
KPCatcher — a keyphrase extraction system for enterprise videos
Yongxin Taylor Xi, Matthias Paulik, Venkata Ramana Gadde, Ananth Sankar
Discriminative pronunciation modeling based on minimum phone error training
Meixu Song, Qingqing Zhang, Jielin Pan, Yonghong Yan
Grapheme-to-phoneme conversion based on adaptive regularization of weight vectors
Keigo Kubo, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura
An efficient method to estimate pronunciation from multiple utterances
Tofigh Naghibi, Sarah Hoffmann, Beat Pfister
Category-based phoneme-to-grapheme transliteration
Willem D. Basson, Marelie H. Davel
Discriminative training of WFST factors with application to pronunciation modeling
Preethi Jyothi, Eric Fosler-Lussier, Karen Livescu
Discriminative training of a phoneme confusion model for a dynamic lexicon in ASR
Penny Karanasou, François Yvon, Thomas Lavergne, Lori Lamel
The 2012 NIST speaker recognition evaluation
Craig S. Greenberg, Vincent M. Stanford, Alvin F. Martin, Meghana Yadagiri, George R. Doddington, John J. Godfrey, Jaime Hernandez-Cordero
Likelihood-ratio calibration using prior-weighted proper scoring rules
Niko Brümmer, George R. Doddington
A noise-robust system for NIST 2012 speaker recognition evaluation
Luciana Ferrer, Mitchell McLaren, Nicolas Scheffer, Yun Lei, Martin Graciarena, Vikramjit Mitra
I4U submission to NIST SRE 2012: a large-scale collaborative effort for noise-robust speaker verification
Haizhou Li, John H. L. Hansen, Jean-Francois Bonastre, S. Marcel, John S. D. Mason, Eliathamby Ambikairajah
Improved unsupervised NAP training dataset design for speaker recognition
Hanwu Sun, Bin Ma
Nuance - Politecnico di Torino's 2012 NIST speaker recognition evaluation system
Daniele Colibro, Claudio Vair, Kevin Farrell, Nir Krause, Gennady Karvitsky, Sandro Cumani, Pietro Laface
A perceptually and physiologically motivated voice source model
Gang Chen, Marc Garellek, Jody Kreiman, Bruce R. Gerratt, Abeer Alwan
Stable articulatory tasks and their variable formation: Tamil retroflex consonants
Caitlin Smith, Michael Proctor, Khalil Iskarous, Louis Goldstein, Shrikanth Narayanan
Articulatory settings facilitate mechanically advantageous motor control of vocal tract articulators
Vikram Ramanarayanan, Adam Lammert, Louis Goldstein, Shrikanth Narayanan
The interplay of linguistic structure and breathing in German spontaneous speech
Amélie Rochet-Capellan, Susanne Fuchs
Physical models of the vocal tract with a flapping tongue for flap and liquid sounds
Takayuki Arai
Articulatory copy synthesis from cine X-ray films
Yves Laprie, Matthieu Loosvelt, Shinji Maeda, Rudolph Sock, Fabrice Hirsch
Large-scale personal assistant technology deployment: the Siri experience
Jerome R. Bellegarda
Evaluating an adaptive dialog system for the public
Benjamin Weiss, Simon Willkomm, Sebastian Möller
Self-taught assistive vocal interfaces: an overview of the ALADIN project
Jort F. Gemmeke, Bart Ons, Netsanet Tessema, Hugo Van hamme, Janneke van de Loo, Guy De Pauw, Walter Daelemans, Jonathan Huyghe, Jan Derboven, Lode Vuegen, Bert Van Den Broeck, Peter Karsmakers, Bart Vanrumste
Affect recognition in real-life acoustic conditions — a new perspective on feature selection
Florian Eyben, Felix Weninger, Björn Schuller
A distributed system for recognizing home automation commands and distress calls in the Italian language
Emanuele Principi, Stefano Squartini, Francesco Piazza, Danilo Fuselli, Maurizio Bonifazi
Probabilistic trainable segmenter for call center audio using multiple features
Nina Zinovieva, Xiaodan Zhuang, Pat Peterson, Joe Alwan, Rohit Prasad
Voice search in mobile applications and the use of linked open data
Felix Burkhardt, Hans Ulrich Nägeli
Evaluation of a real-time voice order recognition system from multiple audio channels in a home
Michel Vacher, Benjamin Lecouteux, Dan Istrate, Thierry Joubert, François Portet, Mohamed Sehili, Pedro Chahuara
In-home detection of distress calls: the case of aged users
Frédéric Aman, Michel Vacher, Solange Rossato, François Portet
Data driven methods for utterance semantic tagging
Ding Liu, Anthea Cheung, Anna Margolis, Patrick Redmond, Jun-won Suh, Chao Wang
The AT&T speech API: a study on practical challenges for customized speech to text service
E. Gouvêa, A. Moreno-Daniel, A. Reddy, R. Chengalvarayan, D. Thomson, A. Ljolje
In-vehicle destination entry by voice: practical aspects
Bart D'hoore, Alfred Wiesen
Intelligibility at a multilingual cocktail party: effect of concurrent language knowledge
Aurore Gautreau, Michel Hoen, Fanny Meunier
Regional accents affect speech intelligibility in a multitalker environment
Ewa Jacewicz, Robert Allen Fox
Perception of English minimal pairs in noise by Japanese listeners: does clear speech for L2 listeners help?
Shinichi Tokuma, Won Tokuma
Salento Italian listeners' perception of American English vowels
Bianca Sisinni, Paola Escudero, Mirko Grimaldi
TP 3.1 software: a tool for designing audio, visual, and audiovisual perceptual training tasks and perception tests
Andréia Schurt Rauber, Anabela Rato, Denise Cristina Kluge, Giane Rodrigues dos Santos
Effect of linguistic masker on the intelligibility of Mandarin sentences
Fei Chen, Junfeng Li, Lena L. N. Wong, Yonghong Yan
The learning and generalization of contrasts consistent or inconsistent with native biases
Kyuwon Moon, Meghan Sumner
L2 English learners' recognition of words spoken in familiar versus unfamiliar English accents
Jia Ying, Jason A. Shaw, Catherine T. Best
The effects of perceptual and/or productive training on the perception and production of English vowels /ɪ/ and /iː/ by Cantonese ESL learners
Janice Wing Sze Wong
On the role of L1 speech production in L2 perception: evidence from Spanish learners of French
Natalia Kartushina, Ulrich Hans Frauenfelder
Looking for lexical feedback effects in /tl/→/kl/ repairs
Pierre Hallé, Natalia Kartushina, Juan Segui, Ulrich Hans Frauenfelder
Recognizing words across regional accents: the role of perceptual assimilation in lexical competition
Catherine T. Best, Jason A. Shaw, Elizabeth Clancy
Dysarthria intelligibility assessment in a factor analysis total variability space
David Martínez, Phil D. Green, H. Christensen
Perceptual interference between regional accent and voice/speech disorders
Alain Ghio, Médéric Gasquet-Cyrus, Juliette Roquel, Antoine Giovanni
Linguistic disfluency in narrative speech: evidence from story-telling in 6-year olds
Ingrida Balčiūnienė
Assessing the utility of judgments of children's speech production made by untrained listeners in uncontrolled listening environments
Benjamin Munson
Consonant distortions in dysarthria due to Parkinson's disease, amyotrophic lateral sclerosis and cerebellar ataxia
Tanja Kocjančič Antolík, Cécile Fougeron
Study of coarticulation and F2 transitions in French and Italian adult stutterers
Marine Verdurand, Solange Rossato, Lionel Granjon, Daria Balbo, Claudio Zmarich
Automatic tracheoesophageal voice typing using acoustic parameters
Renee P. Clapham, Corina J. Van As-Brooks, Michiel W. M. Van den Brekel, Frans J. M. Hilgers, Rob J. J. H. Van Son
Burst-based features for the classification of pathological voices
Julie Mauclair, Lionel Koenig, Marina Robert, Peggy Gatignol
Classification of depression state based on articulatory precision
Brian S. Helfer, Thomas F. Quatieri, James R. Williamson, Daryush D. Mehta, Rachelle Horwitz, Bea Yu
Using text and acoustic features to diagnose progressive aphasia and its subtypes
Kathleen C. Fraser, Frank Rudzicz, Elizabeth Rochon
Multi-domain neural network language model
Tanel Alumäe
Improving lightly supervised training for broadcast transcription
Y. Long, M. J. F. Gales, P. Lanchantin, X. Liu, M. S. Seigel, P. C. Woodland
Weakly supervised parsing with rules
C. Cerisara, A. Lorenzo, P. Kral
Relative error bounds for statistical classifiers based on the f-divergence
Markus Nussbaum-Thom, Eugen Beck, Tamer Alkhouli, Ralf Schlüter, Hermann Ney
Experiments towards a better LVCSR system for Tamil
Melvin Jose Johnson Premkumar, Ngoc Thang Vu, Tanja Schultz
A hybrid language model for open-vocabulary Thai LVCSR
Kwanchiva Thangthai, Ananlada Chotimongkol, Chai Wutiwiwatchai
Hierarchical Pitman-Yor and Dirichlet process for language model
Jen-Tzung Chien, Ying-Lan Chang
Unsupervised confidence calibration using examples of recognized words and their contexts
Taichi Asami, Satoshi Kobashikawa, Hirokazu Masataki, Osamu Yoshioka, Satoshi Takahashi
Multilingual hierarchical MRASTA features for ASR
Zoltán Tüske, Ralf Schlüter, Hermann Ney
Heuristic selection of training sentences from historical TV guide for semi-supervised LM adaptation
Harry M. Chang
Combination of random indexing based language model and n-gram language model for speech recognition
Dominique Fohr, Odile Mella
Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training
Yajie Miao, Florian Metze
Finding recurrent out-of-vocabulary words
Long Qin, Alexander Rudnicky
Using conversational word bursts in spoken term detection
Justin Chiu, Alexander Rudnicky
Brain activations in speech recovery process after intra-oral surgery: an fMRI study
Audrey Acher, Marc Sato, Laurent Lamalle, Coriandre Vilain, Arnaud Attye, Alexandre Krainik, Georges Bettega, Christian Adrien Righini, Brice Carlot, Muriel Brix, Pascal Perrier
Acoustic and perceptual analysis of vocal tremor
Christophe Mertens, Jean Schoentgen, Francis Grenez, Sabine Skodda
Lexical tone perception in Thai normal-hearing adults and those using hearing aids: a case study
C. Tantibundhit, C. Onsuwan, N. Klangpornkun, P. Phienphanich, T. Saimai, N. Saimai, P. Pitathawatchai, Chai Wutiwiwatchai
Evaluation of a bone-conducted ultrasonic hearing aid in vocal emotion transmission
Takayuki Kagomiya, Seiji Nakagawa
Processing of /i/ and /u/ in Italian cochlear-implant children: a behavioral and neurophysiologic study
Luigia Garrapa, Davide Bottari, Mirko Grimaldi, Francesco Pavani, Andrea Calabrese, Michele De Benedetto, Silvano Vitale
Predicting the bilateral advantage in cochlear implantees using a non-intrusive speech intelligibility measure
Stefano Cosentino, Tiago H. Falk, David McAlpine
A blind segmentation approach to acoustic event detection based on i-vector
Zhen Huang, You-Chi Cheng, Kehuang Li, Ville Hautamäki, Chin-Hui Lee
A dynamic programming framework for neural network-based automatic speech segmentation
Van Zyl van Vuuren, Louis ten Bosch, Thomas Niesler
Acoustic segmentation of speech using zero time liftering (ZTL)
RaviShankar Prasad, B. Yegnanarayana
Unsupervised mining of acoustic subword units with segment-level Gaussian posteriorgrams
Haipeng Wang, Tan Lee, Cheung-Chi Leung, Bin Ma, Haizhou Li
Combination of auditory attention features with phone posteriors for better automatic phoneme segmentation
Ozlem Kalinli
Automatic phonetic segmentation using boundary models
Jiahong Yuan, Neville Ryant, Mark Liberman, Andreas Stolcke, Vikramjit Mitra, Wen Wang
HMM-based TTS for Hanoi Vietnamese: issues in design and evaluation
Thi Thu Trang Nguyen, Christophe D'Alessandro, Albert Rilliard, Do Dat Tran
HMM-based synthesis of creaky voice
Tuomo Raitio, John Kane, Thomas Drugman, Christer Gobl
Integrating conditional random fields and joint multi-gram model with syllabic features for grapheme-to-phone conversion
Xiaoxuan Wang, Khe Chai Sim
Structure learning in hidden conditional random fields for grapheme-to-phoneme conversion
Patrick Lehnen, Alexandre Allauzen, Thomas Lavergne, François Yvon, Stefan Hahn, Hermann Ney
TUNDRA: a multilingual corpus of found data for TTS research created with light supervision
Adriana Stan, O. Watts, Y. Mamiya, M. Giurgiu, Robert A. J. Clark, Junichi Yamagishi, Simon King
Minimum mean squared error based warped complex cepstrum analysis for statistical parametric speech synthesis
Ranniery Maia, M. J. F. Gales, Yannis Stylianou, Masami Akamine
Augmented conditional random fields modeling based on discriminatively trained features
Yasser Hifny
Sequence-discriminative training of deep neural networks
Karel Veselý, Arnab Ghoshal, Lukáš Burget, Daniel Povey
Discriminatively trained sparse inverse covariance matrices for low resource acoustic modeling
Weibin Zhang, Pascale Fung
Discriminative training of acoustic models for system combination
Yuuki Tachioka, Shinji Watanabe
Semi-supervised GMM and DNN acoustic model training with multi-system combination and confidence re-calibration
Yan Huang, Dong Yu, Yifan Gong, Chaojun Liu
Restructuring of deep neural network acoustic models with singular value decomposition
Jian Xue, Jinyu Li, Yifan Gong
Large-scale characterization of Mandarin pronunciation errors made by native speakers of European languages
Nancy F. Chen, Vivaek Shivakumar, Mahesh Harikumar, Bin Ma, Haizhou Li
Production training in second language acquisition: a comparison between objective measures and subjective judgments
Véronique Delvaux, Kathy Huet, Myriam Piccaluga, Bernard Harmegnies
The production and perception of voice onset time in English-speaking children enrolled in a French immersion program
Nicole Netelenbos, Fangfang Li
Pronunciation errors by Spanish learners of Dutch: a data-driven study for ASR-based pronunciation training
Pepi Burgos, Catia Cucchiarini, Roeland van Hout, Helmer Strik
Realisation of tonal alignment in the English of Japanese-English late bilinguals
Calbert Graham, Brechtje Post
The influence of language and speech task upon creaky voice use among six young American women learning French
Agathe Benoist-Lucy, Claire Pillot-Loiseau
Acoustic-prosodic, turn-taking, and language cues in child-psychologist interactions for varying social demand
Daniel Bone, Chi-Chun Lee, Theodora Chaspari, Matthew P. Black, Marian E. Williams, Sungbok Lee, Pat Levitt, Shrikanth Narayanan
A preliminary study of child vocalization on a parallel corpus of US and Shanghainese toddlers
Hynek Bořil, Qian Zhang, Pongtep Angkititrakul, John H. L. Hansen, Dongxin Xu, Jill Gilkerson, Jeffrey A. Richards
A survey about databases of children's speech
Felix Claus, Hamurabi Gamboa Rosales, Rico Petrick, Horst-Udo Hain, Rüdiger Hoffmann
Affective evaluation of multimodal dialogue games for preschoolers using physiological signals
Vassiliki Kouloumenta, Manolis Perakakis, Alexandros Potamianos
Amplitude modulation features for emotion recognition from speech
Md. Jahangir Alam, Yazid Attabi, Pierre Dumouchel, Patrick Kenny, Douglas O'Shaughnessy
Analyzing eye-voice coordination in rapid automatized naming
Daniel Bone, Chi-Chun Lee, Vikram Ramanarayanan, Shrikanth Narayanan, Renske S. Hoedemaker, Peter C. Gordon
Analyzing the structure of parent-moderated narratives from children with ASD using an entity-based approach
Theodora Chaspari, Emily Mower Provost, Shrikanth Narayanan
Automated speech scoring for non-native middle school students with multiple task types
Keelan Evanini, Xinhao Wang
Identification of gender from children's speech by computers and humans
Saeid Safavi, Peter Jančovič, Martin Russell, Michael Carey
On why Japanese /r/ sounds are difficult for children to acquire
Takayuki Arai
Recurrent neural networks for language understanding
Kaisheng Yao, Geoffrey Zweig, Mei-Yuh Hwang, Yangyang Shi, Dong Yu
A study on LVCSR and keyword search for Tagalog
Korbinian Riedhammer, Van Hai Do, James Hieronymus
Characterising depressed speech for classification
Sharifa Alghowinem, Roland Goecke, Michael Wagner, Julien Epps, Gordon Parker, Michael Breakspear
Combining acoustic name spotting and continuous context models to improve spoken person name recognition in speech
Benjamin Bigot, Grégory Senay, Georges Linarès, Corinne Fredouille, Richard Dufour
A resource-dependent approach to word modeling for keyword spotting
I-Fan Chen, Chin-Hui Lee
Markers of confidence and correctness in spoken medical narratives
Kathryn Womack, Cecilia Ovesdotter Alm, Cara Calvelli, Jeff B. Pelz, Pengcheng Shi, Anne Haake
Development of a web framework for teaching and learning Japanese prosody: OJAD (online Japanese accent dictionary)
Ibuki Nakamura, Nobuaki Minematsu, Masayuki Suzuki, Hiroko Hirano, Chieko Nakagawa, Noriko Nakamura, Yukinori Tagawa, Keikichi Hirose, Hiroya Hashimoto
Addressee detection for dialog systems using temporal and spectral dimensions of speaking style
Elizabeth Shriberg, Andreas Stolcke, Suman Ravuri
Analysis of factors involved in the choice of rising or non-rising intonation in question utterances appearing in conversational speech
Hiroaki Hatano, Miyako Kiso, Carlos T. Ishi
IsNL? a discriminative approach to detect natural language like queries for conversational understanding
Asli Celikyilmaz, Gokhan Tur, Dilek Hakkani-Tür
Automatic accent quantification of Indian speakers of English
Jian Cheng, Nikhil Bojja, Xin Chen
Semantic parsing using word confusion networks with conditional random fields
Gokhan Tur, Anoop Deoras, Dilek Hakkani-Tür
Timing responses to questions in dialogue
Sofia Strömbergsson, Anna Hjalmarsson, Jens Edlund, David House
BUT BABEL system for spontaneous Cantonese
Martin Karafiát, František Grézl, Mirko Hannemann, Karel Veselý, Jan Černocký
Semi-supervised manifold learning approaches for spoken term verification
Atta Norouzian, Richard C. Rose, Aren Jansen
Language modeling for mixed language speech recognition using weighted phrase extraction
Ying Li, Pascale Fung
Correlates to intelligibility in deviant child speech — comparing clinical evaluations to audience response system-based evaluations by untrained listeners
Sofia Strömbergsson, Christina Tånnander
Using linguistic analysis to characterize conceptual units of thought in spoken medical narratives
Kathryn Womack, Cecilia Ovesdotter Alm, Cara Calvelli, Jeff B. Pelz, Pengcheng Shi, Anne Haake
Interacting with robots via speech and gestures, an integrated architecture
Francesco Cutugno, Alberto Finzi, Michelangelo Fiore, Enrico Leone, Silvia Rossi
Incorporating named entity recognition into the speech transcription process
Mohamed Hatmi, Christine Jacquin, Emmanuel Morin, Sylvain Meignier
DTW-distance-ordered spoken term detection
Teppei Ohno, Tomoyosi Akiba
Refining sentence similarity with discourse information in dialog system
Sangkeun Jung, Seung-Hoon Na
Two-step correction of speech recognition errors based on n-gram and long contextual information
Ryohei Nakatani, Tetsuya Takiguchi, Yasuo Ariki
Inferring actor communities from videos
Sumit Negi, Ramnath Balasubramanyan, Santanu Chaudhury
Multiple topic identification in telephone conversations
Xavier Bost, Marc El-Beze, Renato De Mori
Variable-span out-of-vocabulary named entity detection
Wei Chen, Sankaranarayanan Ananthakrishnan, Rohit Prasad, Prem Natarajan
On the feasibility of using pupil diameter to estimate cognitive load changes for in-vehicle spoken dialogues
Andrew L. Kun, Oskar Palinko, Zeljko Medenica, Peter A. Heeman
Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding
Grégoire Mesnil, Xiaodong He, Li Deng, Yoshua Bengio
Paraphrase features to improve natural language understanding
Xiaohu Liu, Ruhi Sarikaya, Chris Brockett, Chris Quirk, William B. Dolan
A weakly-supervised approach for discovering new user intents from search query logs
Dilek Hakkani-Tür, Asli Celikyilmaz, Larry Heck, Gokhan Tur
Exploiting shared information for multi-intent natural language sentence classification
Puyang Xu, Ruhi Sarikaya
Quality assessment of asymmetric multiparty telephone conferences: a systematic method from technical degradations to perceived impairments
Janto Skowronek, Julian Herlinghaus, Alexander Raake
User activity estimation method based on probabilistic generative model of acoustic event sequence with user activity and its subordinate categories
Keisuke Imoto, Suehiro Shimauchi, Hisashi Uematsu, Hitoshi Ohmuro
Generalizing continuous-space translation of paralinguistic information
Takatomo Kano, Shinnosuke Takamichi, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura
An empirical comparison of joint optimization techniques for speech translation
Masaya Ohgushi, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
A sequential repetition model for improved disfluency detection
Mari Ostendorf, Sangyun Hahn
Disfluency detection based on prosodic features for university lectures
Henrique Medeiros, Helena Moniz, Fernando Batista, Isabel Trancoso, Luis Nunes
What's the difference? comparing humans and machines on the Aurora 2 speech recognition task
Bernd T. Meyer
Calibration of distance measures for unsupervised query-by-example
Michele Gubian, Lou Boves, Maarten Versteegh
Indexing multimedia documents with acoustic concept recognition lattices
Diego Castan, Murat Akbacak
MINT.tools: tools and adaptors supporting acquisition, annotation and analysis of multimodal corpora
Spyros Kousidis, Thies Pfeiffer, David Schlangen
Automatic human utility evaluation of ASR systems: does WER really predict performance?
Benoit Favre, Kyla Cheung, Siavash Kazemian, Adam Lee, Yang Liu, Cosmin Munteanu, Ani Nenkova, Dennis Ochei, Gerald Penn, Stephen Tratz, Clare Voss, Frauke Zeller
Corpus analysis of simultaneous interpretation data for improving real time speech translation
Vivek Kumar Rangarajan Sridhar, John Chen, Srinivas Bangalore
A real-world system for simultaneous translation of German lectures
Eunah Cho, Christian Fügen, Teresa Hermann, Kevin Kilgour, Mohammed Mediani, Christian Mohr, Jan Niehues, Kay Rottmann, Christian Saam, Sebastian Stüker, Alex Waibel
Freestyle: a challenge-response system for hip hop lyrics via unsupervised induction of stochastic transduction grammars
Dekai Wu, Karteek Addanki, Markus Saers
Toward transfer of acoustic cues of emphasis across languages
Andreas Tsiartas, Panayiotis G. Georgiou, Shrikanth Narayanan
Simple, lexicalized choice of translation timing for simultaneous speech translation
Tomoki Fujita, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
Fitting long-range information using interpolated distanced n-grams and cache models into a latent Dirichlet language model for speech recognition
Md. Akmal Haidar, Douglas O'Shaughnessy
Incorporating proximity information for relevance language modeling in speech recognition
Yi-Wen Chen, Bo-Han Hao, Kuan-Yu Chen, Berlin Chen
Instance-based on-line language model adaptation
Ali Orkan Bayer, G. Riccardi
Unsupervised topic adaptation for morph-based speech recognition
André Mansikkaniemi, Mikko Kurimo
Unsupervised language model adaptation for automatic speech recognition of broadcast news using Web 2.0
Tim Schlippe, Lukasz Gren, Ngoc Thang Vu, Tanja Schultz
Recurrent neural network based language model personalization by social network crowdsourcing
Tsung-Hsien Wen, Aaron Heidel, Hung-yi Lee, Yu Tsao, Lin-shan Lee
Language-independent call routing using the large margin estimation principle
Moataz El Ayadi, Mohamed Afify
Deep belief network based semantic taggers for spoken language understanding
Anoop Deoras, Ruhi Sarikaya
Error-corrective discriminative joint decoding of automatic spoken language transcription and understanding
Bassam Jabaian, Fabrice Lefèvre
Detecting summarization hot spots in meetings using group level involvement and turn-taking features
Catherine Lai, Jean Carletta, Steve Renals
Supervised spoken document summarization based on structured support vector machine with utterance clusters as hidden variables
Sz-Rung Shiang, Hung-yi Lee, Lin-shan Lee
Web data harvesting for speech understanding grammar induction
Ioannis Klasinas, Alexandros Potamianos, Elias Iosif, Spiros Georgiladakis, Gianluca Mameli
Articulatory synthesis of French connected speech from EMA data
Asterios Toutios, Shrikanth Narayanan
A new language independent, photo-realistic talking head driven by voice only
Xinjian Zhang, Lijuan Wang, Gang Li, Frank Seide, Frank K. Soong
Binocular photometric stereo acquisition and reconstruction for 3D talking head applications
Chaoyang Wang, Lijuan Wang, Yasuyuki Matsushita, Bojun Huang, Magnetro Chen, Frank K. Soong
Speaker adaptation of an acoustic-articulatory inversion model using cascaded Gaussian mixture regressions
Thomas Hueber, Gérard Bailly, Pierre Badin, Frédéric Elisei
Articulatory features for speech-driven head motion synthesis
Atef Ben-Youssef, Hiroshi Shimodaira, David Adam Braude
Template-warping based speech driven head motion synthesis
David Adam Braude, Hiroshi Shimodaira, Atef Ben-Youssef
ALIZE 3.0 — open source toolkit for state-of-the-art speaker recognition
Anthony Larcher, Jean-Francois Bonastre, Benoit Fauve, Kong Aik Lee, Christophe Lévy, Haizhou Li, John S. D. Mason, Jean-Yves Parfait
New cosine similarity scorings to implement gender-independent speaker verification
Mohammed Senoussaoui, Patrick Kenny, Pierre Dumouchel, Najim Dehak
Improving speaker identification in TV-shows using person name detection in overlaid text and speech
Delphine Charlet, Corinne Fredouille, Géraldine Damnati, Grégory Senay
Exploring methods of improving speaker accuracy for speaker diarization
Mary Tai Knox, Nikki Mirghafori, Gerald Friedland
Combining deep speaker specific representations with GMM-SVM for speaker verification
Ryan Price, Sangeeta Biswas, Koichi Shinoda
Using spectral moments as a speaker specific feature in nasals and fricatives
Carola Schindler, Christoph Draxler
A computational model of perceptuo-motor processing in speech perception: learning to imitate and categorize synthetic CV syllables
Raphaël Laurent, Jean-Luc Schwartz, Pierre Bessière, Julien Diard
Talker-specific perceptual processing: influences on internal category structure
Rachel M. Theodore
Elicitation and analysis of a corpus of robust noise-induced word misperceptions in Spanish
Maria Luisa García Lecumberri, Attila Máté Tóth, Yan Tang, Martin Cooke
Vocabulary structure and spoken-word recognition: evidence from French reveals the source of embedding asymmetry
Anne Cutler, Laurence Bruggeman
How do multiple sublexical cues converge in lexical segmentation? an artificial language learning study
Odile Bagou, Ulrich Hans Frauenfelder
Towards an end-to-end computational model of speech comprehension: simulating a lexical decision task
Louis ten Bosch, Lou Boves, Mirjam Ernestus
A phase-modified approach for TDE-based acoustic localization
Georgios Athanasopoulos, Werner Verhelst
Interference robust DOA estimation of human speech by exploiting historical information and temporal correlation
Wei Xue, Shan Liang, Wenju Liu
Identifying new bird species from differences in birdsong
Naomi Harte, Sadhbh Murphy, David J. Kelly, Nicola M. Marples
Controlling “shout” expression in a Japanese POP singing performance: analysis and suppression study
Yuri Nishigaki, Ken-Ichi Sakakibara, Masanori Morise, Ryuichi Nisimura, Toshio Irino, Hideki Kawahara
Dimensionality analysis of singing speech based on locality preserving projections
Mahnoosh Mehrabani, John H. L. Hansen
Audio classification using dominant spatial patterns in time-frequency space
Md. Khademul Islam Molla, Keikichi Hirose
Spectro-temporal modulation based singing detection combined with pitch-based grouping for singing voice separation
Tse-En Lin, Chung-Chien Hsu, Yi-Cheng Chen, Jian-Hueng Chen, Tai-Shih Chi
NMF-based temporal feature integration for acoustic event classification
Jimmy Ludeña-Choez, Ascensión Gallardo-Antolín
Robust audio-codebooks for large-scale event detection in consumer videos
Shourabh Rawat, Peter F. Schulam, Susanne Burger, Duo Ding, Yipei Wang, Florian Metze
Person identification using biometric markers from footsteps sound
M. Umair Bin Altaf, Taras Butko, Biing-Hwang Juang
Learning binaural spectrogram features for azimuthal speaker localization
Wiktor Młynarski
An unsupervised Bayesian classifier for multiple speaker detection and localization
Youssef Oualil, Friedrich Faubel, Dietrich Klakow
Joint recognition and direction-of-arrival estimation of simultaneous meeting-room acoustic events
Rupayan Chakraborty, Climent Nadeu
Audio self organized units for high-level event detection
Xiaodan Zhuang, Shuang Wu, Pradeep Natarajan, Rohit Prasad, Prem Natarajan
LAPSyD: Lyon-Albuquerque phonological systems database
Ian Maddieson, Sébastien Flavier, Egidio Marsico, Christophe Coupé, François Pellegrino
The duration compensation issue revisited
Plínio A. Barbosa
Cross-language comparison of functional load for vowels, consonants, and tones
Yoon Mi Oh, François Pellegrino, Christophe Coupé, Egidio Marsico
Notes on so-called inter-speaker difference in spontaneous speech: the case of Japanese voiced obstruent
Kikuo Maekawa
The role of the pharynx and tongue in enhancement of vowel nasalization: a real-time MRI investigation of French nasal vowels
Christopher Carignan, Ryan K. Shosted, Maojing Fu, Zhi-Pei Liang, Bradley P. Sutton
Assimilation of word-final nasals to following word-initial place of articulation in UK English
Margaret E. L. Renwick, Ladan Baghai-Ravary, Rosalind Temple, John S. Coleman
Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion
Ling-Hui Chen, Zhen-Hua Ling, Yan Song, Li-Rong Dai
Exemplar-based unit selection for voice conversion utilizing temporal information
Zhizheng Wu, Tuomas Virtanen, Tomi Kinnunen, Eng Siong Chng, Haizhou Li
Alleviating the over-smoothing problem in GMM-based voice conversion with discriminative training
Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Yih-Ru Wang, Sin-Horng Chen
A hybrid approach to electrolaryngeal speech enhancement based on spectral subtraction and statistical voice conversion
Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
A digital signal processor implementation of silent/electrolaryngeal speech enhancement based on real-time statistical voice conversion
Takuto Moriguchi, Tomoki Toda, Motoaki Sano, Hiroshi Sato, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
Foreign accent conversion through voice morphing
Sandesh Aryal, Daniel Felps, Ricardo Gutierrez-Osuna
Empirical link between hypothesis diversity and fusion performance in an ensemble of automatic speech recognition systems
Kartik Audhkhasi, Andreas M. Zavou, Panayiotis G. Georgiou, Shrikanth Narayanan
A lecture transcription system combining neural network acoustic and language models
Peter Bell, Hitoshi Yamamoto, Pawel Swietojanski, Youzheng Wu, Fergus McInnes, Chiori Hori, Steve Renals
Neural network acoustic models for the DARPA RATS program
Hagen Soltau, Hong-Kwang Kuo, Lidia Mangu, George Saon, Tomas Beran
Improved models for automatic punctuation prediction for spoken and written text
Nicola Ueffing, Maximilian Bisani, Paul Vozila
Some issues affecting the transcription of Hungarian broadcast audio
Anindya Roy, Lori Lamel, Thiago Fraga-Silva, Jean-Luc Gauvain, Ilya Oparin
Development of the RWTH transcription system for Slovenian
Pavel Golik, Zoltán Tüske, Ralf Schlüter, Hermann Ney
Noise robust speaker verification with delta cepstrum normalization
Naoyuki Kanda, Ryu Takeda, Yasunari Obuchi
R-norm: improving inter-speaker variability modelling at the score level via regression score normalisation
David Vandyke, Michael Wagner, Roland Goecke
Frequency warping and robust speaker verification: a comparison of alternative mel-scale representations
Tomi Kinnunen, Md. Jahangir Alam, Pavel Matějka, Patrick Kenny, Jan Černocký, Douglas O'Shaughnessy
Acoustic factor analysis based universal background model for robust speaker verification in noise
Taufiq Hasan, John H. L. Hansen
A new Bayesian network to assess the reliability of speaker verification decisions
Jesús Villalba, Eduardo Lleida, Alfonso Ortega, Antonio Miguel
The IBM RATS phase II speaker recognition system: overview and analysis
Weizhong Zhu, Sibel Yaman, Jason Pelecanos
Multi-session PLDA scoring of i-vector for partially open-set speaker detection
Kong Aik Lee, Anthony Larcher, Chang Huai You, Bin Ma, Haizhou Li
Impact of noise reduction and spectrum estimation on noise robust speaker identification
Keith W. Godin, Seyed Omid Sadjadi, John H. L. Hansen
Improvement of distant-talking speaker identification using bottleneck features of DNN
Takanori Yamada, Longbiao Wang, Atsuhiko Kai
Geometric contamination for GMM/UBM speaker verification in reverberant environments
Alessio Brutti, Maurizio Omologo
Towards a more efficient SVM supervector speaker verification system using Gaussian reduction and a tree-structured hash
Richard D. McClanahan, Phillip L. De Leon
Improving the PLDA based speaker verification in limited microphone data conditions
A. Kanagasundaram, D. Dean, Javier Gonzalez-Dominguez, S. Sridharan, D. Ramos, Joaquin Gonzalez-Rodriguez
The I3A speaker recognition system for NIST SRE12: post-evaluation analysis
Jesús Villalba, Eduardo Lleida, Alfonso Ortega, Antonio Miguel
Text-dependent speaker recognition using PLDA with uncertainty propagation
T. Stafylakis, Patrick Kenny, P. Ouellet, J. Perez, M. Kockmann, Pierre Dumouchel
Robust speaker recognition using spectro-temporal autoregressive models
Sri Harish Mallidi, Sriram Ganapathy, Hynek Hermansky
Effect of multicondition training on i-vector PLDA configurations for speaker recognition
Padmanabhan Rajan, Tomi Kinnunen, Ville Hautamäki
Improving robustness to compressed speech in speaker recognition
Mitchell McLaren, Victor Abrash, Martin Graciarena, Yun Lei, Jan Pešán
Modulation features for noise robust speaker identification
Vikramjit Mitra, Mitchell McLaren, Horacio Franco, Martin Graciarena, Nicolas Scheffer
Minimax i-vector extractor for short duration speaker verification
Ville Hautamäki, You-Chi Cheng, Padmanabhan Rajan, Chin-Hui Lee
Standoff speaker recognition: effects of recording distance mismatch on speaker recognition system performance
Mike Fowler, Mark McCurry, Jonathan Bramsen, Kehinde Dunsin, Jeremiah Remus
Vowel identity conditions the time course of tone recognition
Jason A. Shaw, Michael D. Tyler, Benjawan Kasisopa, Yuan Ma, Michael Proctor, Chong Han, Donald Derrick, Denis Burnham
Changes in the role of intensity as a cue for fricative categorisation
Odette Scharenborg, Esther Janse
Weighting of acoustic cues shifts to frication duration in identification of fricatives/affricates when auditory properties are degraded due to aging
Keiichi Yasu, Takayuki Arai, Kei Kobayashi, Mitsuko Shindo
Duration as a secondary cue for perception of voicing and tone in Shanghai Chinese
Jiayin Gao, Pierre Hallé
Development of central auditory processes and their links with language skills in typically developing children
Marie Dekerle, Fanny Meunier, Marie-Ange N'Guyen, Estelle Gillet-Perret, Delphine Lassus-Sangosse, Sophie Donnadieu
Show me what you listen to! Auditory classification images can reveal the processing of fine acoustic cues during speech categorization
Léo Varnet, Kenneth Knoblauch, Fanny Meunier, Michel Hoen
The organ stop “vox humana” as a model for a vowel synthesiser
Fabian Brackhane, Jürgen Trouvain
Information theoretic acoustic feature selection for acoustic-to-articulatory inversion
Prasanta Kumar Ghosh, Shrikanth Narayanan
Formant contours in Czech vowels: speaker-discriminating potential
Dita Fejlová, David Lukeš, Radek Skarnitzl
An anisotropic diffusion filter based on multidirectional separability
Shen Liu, Jianguo Wei, Xin Wang, Wenhuan Lu, Qiang Fang, Jianwu Dang
The phonological voicing contrast in Czech: an EPG study of phonated and whispered fricatives
Radek Skarnitzl, Pavel Šturm, Pavel Machač
Vowel and prosodic factor dependent variations of vocal-tract length
Shinji Maeda, Yves Laprie
Word identification using phonetic features: towards a method to support multivariate fMRI speech decoding
Tijl Grootswagers, Karen Dijkstra, Louis ten Bosch, Alex Brandmeyer, Makiko Sadakata
Analysis of breathy, modal and pressed phonation based on low frequency spectral density
Dhananjaya Gowda, Mikko Kurimo
Is the vowel length contrast in Japanese exaggerated in infant-directed speech?
Keiichi Tajima, Kuniyoshi Tanaka, Andrew Martin, Reiko Mazuka
Investigating the relationship between glottal area waveform shape and harmonic magnitudes through computational modeling and laryngeal high-speed videoendoscopy
Gang Chen, Robin A. Samlan, Jody Kreiman, Abeer Alwan
Formant frequency tracking using Gaussian mixtures with maximum a posteriori adaptation
Jonathan C. Kim, Hrishikesh Rao, Mark A. Clements
Devoicing of vowels in German, a comparison of Japanese and German speakers
Rei Yasuda, Frank Zimmerer
Identifying consonantal tasks via measures of tongue shaping: a real-time MRI investigation of the production of vocalized syllabic /l/ in American English
Caitlin Smith, Adam Lammert
A speech enhancement method by coupling speech detection and spectral amplitude estimation
Feng Deng, Chang-chun Bao, Feng Bao
Late reverberation suppression using MMSE modulation spectral estimation
Chenxi Zheng, Wai-Yip Chan
A new statistical excitation mapping for enhancement of throat microphone recordings
M. A. Tuğtekin Turan, Engin Erzin
Classification based binaural dereverberation
Nicoleta Roman, Michael I. Mandel
Target-to-non-target directional ratio estimation based on dual-microphone phase differences for target-directional speech enhancement
Seon Man Kim, Hong Kook Kim
Speech spectrum restoration based on conditional restricted Boltzmann machine
Xugang Lu, Shigeki Matsuda, Chiori Hori
Speaker separation using visual speech features and single-channel audio
Faheem Khan, Ben Milner
Spectral modulation sensitivity based perceptual acoustic echo cancellation
Wei-Lun Chuang, Kah-Meng Cheong, Chung-Chien Hsu, Tai-Shih Chi
Speech enhancement using compressed sensing
Vinayak Abrol, Pulkit Sharma, Anil Kumar Sao
Spectro-temporal post-enhancement using MMSE estimation in NMF based single-channel source separation
Emad M. Grais, Hakan Erdogan
A pitch-based spectral enhancement technique for robust speech processing
Kantapon Kaewtip, Lee Ngee Tan, Abeer Alwan
Stochastic-deterministic signal modelling for the tracking of pitch in noise and speech mixtures using factorial HMMs
Matthew McCallum, Bernard Guillemin
Restoration of clipped signals with application to speech recognition
Shay Maymon, Etienne Marcheret, Vaibhava Goel
On the robustness of distributed EM based BSS in asynchronous distributed microphone array scenarios
Yasufumi Uezu, Keisuke Kinoshita, Mehrez Souden, Tomohiro Nakatani
Infinite support vector machines in speech recognition
Jingzhou Yang, Rogier C. van Dalen, M. J. F. Gales
An on-line incremental speaker adaptation technique for audio stream transcription
Diego Giuliani, Fabio Brugnara
Accent- and speaker-specific polyphone decision trees for non-native speech recognition
Dominic Telaar, Mark C. Fuhs
Investigations on Hessian-free optimization for cross-entropy training of deep neural networks
Simon Wiesler, Jinyu Li, Jian Xue
Cross-lingual acoustic model adaptation based on transfer vector field smoothing with MAP
Masahiro Saiko, Shigeki Matsuda, Ken Hanazawa, Ryosuke Isotani, Chiori Hori
N-best rescoring by phoneme classifiers using subclass AdaBoost algorithm
Hiroshi Fujimura, Yusuke Shinohara, Takashi Masuko
Stream selection and integration in multistream ASR using GMM-based performance monitoring
Tetsuji Ogawa, Feipeng Li, Hynek Hermansky
VTLN based on the linear interpolation of contiguous mel filter-bank energies
Néstor Becerra Yoma, Claudio Garretón, Fernando Huenupán, Ignacio Catalán, Jorge Wuth
Context-dependent modeling and speaker normalization applied to reservoir-based phone recognition
Fabian Triefenbach, Azarakhsh Jalalvand, Kris Demuynck, Jean-Pierre Martens
Interpolation of acoustic models for speech recognition
Thiago Fraga-Silva, Jean-Luc Gauvain, Lori Lamel
Training log-linear acoustic models in higher-order polynomial feature space for speech recognition
M. Tahir, H. Huang, Ralf Schlüter, Hermann Ney, Louis ten Bosch, Bert Cranen, Lou Boves
Comparison of spectral analysis methods for automatic speech recognition
Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian
Synthetic speaker models using VTLN to improve the performance of children in mismatched speaker conditions for ASR
D. R. Sanand, T. Svendsen
Exploring convolutional neural network structures and optimization techniques for speech recognition
Ossama Abdel-Hamid, Li Deng, Dong Yu
Rediscovering 25 years of discoveries in spoken language processing: a preliminary ISCA archive analysis
Joseph Mariani, Patrick Paroubek, Gil Francopoulo, Marine Delaborde
An inter- and cross-disciplinary perspective of spoken language processing
Hiroya Fujisaki
Progress and prospects for speech technology: what ordinary people think
Roger K. Moore
Feature-rich sub-lexical language models using a maximum entropy approach for German LVCSR
M. Ali Basha Shaik, Amr El-Desoky Mousa, Ralf Schlüter, Hermann Ney
Morpheme level hierarchical Pitman-Yor class-based language models for LVCSR of morphologically rich languages
Amr El-Desoky Mousa, M. Ali Basha Shaik, Ralf Schlüter, Hermann Ney
Discriminatively trained dependency language modeling for conversational speech recognition
Benjamin Lambert, Bhiksha Raj, Rita Singh
Prefix tree based n-best list re-scoring for recurrent neural network language model used in speech recognition system
Yujing Si, Qingqing Zhang, Ta Li, Jielin Pan, Yonghong Yan
Cross-domain paraphrasing for improving language modelling using out-of-domain data
X. Liu, M. J. F. Gales, P. C. Woodland
Viterbi decoding for latent words language models using Gibbs sampling
Ryo Masumura, Hirokazu Masataki, Takanobu Oba, Osamu Yoshioka, Satoshi Takahashi
Computationally efficient objective function for algebraic codebook optimization in ACELP
Tom Bäckström
Speech quality prediction for artificial bandwidth extension algorithms
Sebastian Möller, Emilia Kelaidi, Friedemann Köster, Nicolas Côté, Patrick Bauer, Tim Fingscheidt, Thomas Schlien, Hannu Pulakka, Paavo Alku
Speech enhancement with weighted denoising auto-encoder
Bing-yin Xia, Chang-chun Bao
Syllable-based pitch encoding for low bit rate speech coding with recognition/synthesis architecture
Milos Cernak, Xingyu Na, Philip N. Garner
Artificial bandwidth extension based on regularized piecewise linear mapping with discriminative region weighting and long-span features
Nguyen Duc Duy, Masayuki Suzuki, Nobuaki Minematsu, Keikichi Hirose
Enhanced muting method in packet loss concealment of ITU-T G.722 employing optimized sigmoid function
Bong-Ki Lee, Chungsoo Lim, Jihwan Park, Joon-Hyuk Chang
The interplay of intonation and complex lexical tones: how speaker attitudes affect the realization of glottalization on Vietnamese sentence-final particles
Thi-Lan Nguyen, Alexis Michaud, Do Dat Tran, Dang-Khoa Mac
The voice prominence hypothesis: the interplay of F0 and voice source features in accentuation
Ailbhe Ní Chasaide, Irena Yanushevskaya, John Kane, Christer Gobl
Mora-based pre-low raising in Japanese pitch accent
Albert Lee, Yi Xu, Santitham Prom-on
Prosodic cues of sarcastic speech in French: slower, higher, wider
Hélène Lœvenbruck, Mohamed Ameur Ben Jannet, Mariapaola D'Imperio, Mathilde Spini, Maud Champagne-Lavau
Correlates of contrastive focus in congenitally blind adults and sighted adults
Lucie Ménard, Annie Leclerc, Mark K. Tiede, Amélie Prémont, Christine Turgeon, Paméla Trudeau-Fisette, Dominique Côté
Is protrusion of French rounded vowels affected by prosodic positions?
Laurianne Georgeton, Nicolas Audibert
Intelligibility-enhancing speech modifications: the Hurricane Challenge
Martin Cooke, Catherine Mayo, Cassia Valentini-Botinhao
Statistical synthesizer with embedded prosodic and spectral modifications to generate highly intelligible speech in noise
D. Erro, T. C. Zorilă, Yannis Stylianou, E. Navas, I. Hernaez
Lombard modified text-to-speech synthesis for improved intelligibility: submission for the Hurricane Challenge 2013
Antti Suni, Reima Karhila, Tuomo Raitio, Mikko Kurimo, Martti Vainio, Paavo Alku
Combining perceptually-motivated spectral shaping with loudness and duration modification for intelligibility enhancement of HMM-based synthetic speech in noise
Cassia Valentini-Botinhao, Junichi Yamagishi, Simon King, Yannis Stylianou
Increasing speech intelligibility via spectral shaping with frequency warping and dynamic range compression plus transient enhancement
Elizabeth Godoy, Yannis Stylianou
Improving speech intelligibility in noise by SII-dependent preprocessing using frequency-dependent amplification and dynamic range compression
Henning Schepker, Jan Rennies, Simon Doclo
SII-based speech preprocessing for intelligibility improvement in noise
Cees H. Taal, Jesper Jensen
Rephrasing-based speech intelligibility enhancement
Mengqiu Zhang, Petko N. Petkov, W. Bastiaan Kleijn
Information-preserving temporal reallocation of speech in the presence of fluctuating maskers
Vincent Aubanel, Martin Cooke
Preservation of speech spectral dynamics enhances intelligibility
Petko N. Petkov, W. Bastiaan Kleijn
An overview of the VUB entry for the 2013 Hurricane Challenge
Henk Brouckxon, Werner Verhelst
Improvement of speech intelligibility by reallocation of spectral energy
Reiko Takou, Nobumasa Seiyama, Atsushi Imai