doi: 10.21437/Interspeech.2015
The emergence of compositional structure in language evolution and development
Mary E. Beckman
The technology powering personal digital assistants
Ruhi Sarikaya
The HBP-atlas — concept, perspectives, and application for language and speech research
Katrin Amunts
Voices of power, passion, and personality
Klaus Scherer
Learning the speech front-end with raw waveform CLDNNs
Tara N. Sainath, Ron J. Weiss, Andrew Senior, Kevin W. Wilson, Oriol Vinyals
Architectures for deep neural network based acoustic models defined over windowed speech waveforms
Mayank Bhargava, Richard Rose
Analysis of CNN-based speech recognition system using raw speech as input
Dimitri Palaz, Mathew Magimai-Doss, Ronan Collobert
Bilinear map of filter-bank outputs for DNN-based speech recognition
Tetsuji Ogawa, Kenshiro Ueda, Kouichi Katsurada, Tetsunori Kobayashi, Tsuneo Nitta
Speech recognition with temporal neural networks
Payton Lin, Dau-Cheng Lyu, Yun-Fan Chang, Yu Tsao
Convolutional neural networks for acoustic modeling of raw time signal in LVCSR
Pavel Golik, Zoltán Tüske, Ralf Schlüter, Hermann Ney
Stable and unstable intervals as a basic segmentation procedure of the speech signal
Ulrike Glavitsch, Lei He, Volker Dellwo
Polysyllabic shortening and word-final lengthening in English
Andreas Windmann, Juraj Šimko, Petra Wagner
The acoustics of word stress in English as a function of stress level and speaking style
Anders Eriksson, Mattias Heldner
Pitch accent distribution in German infant-directed speech
Katharina Zahner, Muna Pohl, Bettina Braun
Acoustic correlates of perceived syllable prominence in German
Hansjörg Mixdorff, Christian Cossio-Mercado, Angelika Hönemann, Jorge Gurlekian, Diego Evin, Humberto Torres
Cross-modality matching of linguistic and emotional prosody
Simone Simonetti, Jeesun Kim, Chris Davis
Pitch scaling as a perceptual cue for questions in German
Jan Michalsky
Parameterization of prosodic headedness
Uwe D. Reichel, Katalin Mády, Štefan Beňuš
Detection of Mizo tones
Biswajit Dev Sarma, Priyankoo Sarmah, Wendy Lalhminghlui, S. R. Mahadeva Prasanna
The intonation of echo wh-questions
Sophie Repp, Lena Rosin
Immediately postverbal questions in Urdu
Farhat Jabeen, Tina Bögel, Miriam Butt
Prosodic (non-)realisation of broad, narrow and contrastive focus in Hungarian: a production and a perception study
Katalin Mády
F0 discontinuity as a marker of prosodic boundary strength in Lombard speech
Štefan Beňuš, Uwe D. Reichel, Juraj Šimko
Comparing journalistic and spontaneous speech: prosodic and spectral analysis
Cédric Gendrot, Martine Adda-Decker, Yaru Wu
Rhythm influences the tonal realisation of focus
Nadja Schauffler, Katrin Schweitzer
Linguistic measures of pitch range in Slavic and Germanic languages
Bistra Andreeva, Bernd Möbius, Grazyna Demenko, Frank Zimmerer, Jeanin Jügler
The effect of stress on vowel space in Daxi Hakka Chinese
Chunan Qiu, Jie Liang
Declination, peak height and pitch level in declaratives and questions of South Connaught Irish
Maria O'Reilly, Ailbhe Ní Chasaide
Contextual variation of tones in Mizo
Priyankoo Sarmah, Leena Dihingia, Wendy Lalhminghlui
The prosodic marking of rhetorical questions in German
Daniela Wochner, Jana Schlegel, Nicole Dehé, Bettina Braun
A fast algorithm for improved intelligibility of speech-in-noise based on frequency and time domain energy reallocation
Tudor-Cătălin Zorilă, Yannis Stylianou
Intelligibility enhancement of casual speech for reverberant environments inspired by clear speech properties
Maria Koutsogiannaki, Petko N. Petkov, Yannis Stylianou
Intelligibility enhancement of vocal announcements for public address systems: a design for all through a presbycusis pre-compensation filter
A. Ben Jemaa, N. Mechergui, G. Courtois, A. Mudry, S. Djaziri-Larbi, M. Turki, H. Lissek, M. Jaidane
Model-based integration of reverberation for noise-adaptive near-end listening enhancement
Henning Schepker, David Hülsmeier, Jan Rennies, Simon Doclo
Online Lombard adaptation in incremental speech synthesis
Sebastian Rottschäfer, Hendrik Buschmeier, Herwin van Welbergen, Stefan Kopp
Comparison of Gaussian process regression and Gaussian mixture models in spectral tilt modelling for intelligibility enhancement of telephone speech
Emma Jokinen, Ulpu Remes, Paavo Alku
A discriminative reliability-aware classification model with applications to intelligibility classification in pathological speech
Naveen Kumar, Shrikanth S. Narayanan
Voiced/unvoiced transitions in speech as a potential bio-marker to detect Parkinson's disease
J. R. Orozco-Arroyave, Florian Hönig, J. D. Arias-Londoño, J. F. Vargas-Bonilla, Sabine Skodda, J. Rusz, Elmar Nöth
Low-frequency components analysis in running speech for the automatic detection of Parkinson's disease
T. Villa-Cañas, J. D. Arias-Londoño, J. R. Orozco-Arroyave, J. F. Vargas-Bonilla, Elmar Nöth
Automatic detection of Parkinson's disease from continuous speech recorded in non-controlled noise conditions
J. C. Vásquez-Correa, T. Arias-Vergara, J. R. Orozco-Arroyave, J. F. Vargas-Bonilla, J. D. Arias-Londoño, Elmar Nöth
Relevance vector machine for depression prediction
Nicholas Cummins, Vidhyasaharan Sethu, Julien Epps, Jarek Krajewski
Typicality and emotion in the voice of children with autism spectrum condition: evidence across three languages
Erik Marchi, Björn Schuller, Simon Baron-Cohen, Ofer Golan, Sven Bölte, Prerna Arora, Reinhold Häb-Umbach
Deep contextual language understanding in spoken dialogue systems
Chunxi Liu, Puyang Xu, Ruhi Sarikaya
RNN-based labeled data generation for spoken language understanding
Yik-Cheung Tam, Yangyang Shi, Hunk Chen, Mei-Yuh Hwang
Is it time to switch to word embedding and recurrent neural networks for spoken language understanding?
Vedran Vukotic, Christian Raymond, Guillaume Gravier
Recurrent neural network and LSTM models for lexical utterance classification
Suman Ravuri, Andreas Stolcke
Semantic retrieval of personal photos using a deep autoencoder fusing visual features with speech annotations represented as word/paragraph vectors
Hung-tsung Lu, Yuan-ming Liou, Hung-yi Lee, Lin-shan Lee
A comparison of normalization techniques applied to latent space representations for speech analytics
Mohamed Morchid, Richard Dufour, Driss Matrouf
Study of entity-topic models for OOV proper name retrieval
Imran Sheikh, Irina Illina, Dominique Fohr
Audio quotation marks for natural language understanding
Simon Boutin, Réal Tremblay, Patrick Cardinal, Doug Peters, Pierre Dumouchel
Using word confusion networks for slot filling in spoken language understanding
Xiaohao Yang, Jia Liu
Distributed representation-based spoken word sense induction
Justin Chiu, Yajie Miao, Alan W. Black, Alexander I. Rudnicky
Structuring lectures in massive open online courses (MOOCs) for efficient learning by linking similar sections and predicting prerequisites
Sheng-syun Shen, Hung-yi Lee, Shang-wen Li, Victor Zue, Lin-shan Lee
News talk-show chaptering with journalistic genres
Delphine Charlet, Géraldine Damnati, Jérémy Trione
An analysis of time-aggregated and time-series features for scoring different aspects of multimodal presentation data
Vikram Ramanarayanan, Lei Chen, Chee Wee Leong, Gary Feng, David Suendermann-Oeft
Incorporating prosodic prominence evidence into term weights for spoken content retrieval
David N. Racca, Gareth J. F. Jones
Leveraging word embeddings for spoken document summarization
Kuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang, Berlin Chen, Hsin-Hsi Chen
Mutually exclusive grounding for weakly supervised non-negative matrix factorisation
Vincent Renkens, Hugo Van hamme
Using semantic maps for robust natural language interaction with robots
Emanuele Bastianelli, Danilo Croce, Roberto Basili, Daniele Nardi
Efficient learning for spoken language understanding tasks with word embedding based pre-training
Yi Luan, Shinji Watanabe, Bret Harsham
Zero-shot semantic parser for spoken language understanding
Emmanuel Ferreira, Bassam Jabaian, Fabrice Lefèvre
Adapting lexical representation and OOV handling from written to spoken language with word embedding
Jeremie Tafforeau, Thierry Artieres, Benoit Favre, Frederic Bechet
Dialog state tracking using long short-term memory neural networks
Xiaohao Yang, Jia Liu
Detecting repetitions in spoken dialogue systems using phonetic distances
José Lopes, Giampiero Salvi, Gabriel Skantze, Alberto Abad, Joakim Gustafson, Fernando Batista, Raveesh Meena, Isabel Trancoso
Multi-language hypotheses ranking and domain tracking for open domain dialogue systems
Paul A. Crook, Jean-Philippe Robichaud, Ruhi Sarikaya
Measuring mimicry in task-oriented conversations: degree of mimicry is related to task difficulty
Vijay Solanki, Alessandro Vinciarelli, Jane Stuart-Smith, Rachel Smith
Auto-imputing radial basis functions for neural-network turn-taking models
Kornel Laskowski
Effect of gender and call duration on customer satisfaction in call center big data
Quim Llimona, Jordi Luque, Xavier Anguera, Zoraida Hidalgo, Souneil Park, Nuria Oliver
Using profile similarity to measure agreement in personality perception
Zoraida Callejas, David Griol
Relieving mental stress of speakers using a tele-operated robot in foreign language speech education
Shizuka Nakamura, Miki Watanabe, Yuichiro Yoshikawa, Kohei Ogawa, Hiroshi Ishiguro
Backward mimicry and forward influence in prosodic contour choice in standard American English
Agustín Gravano, Štefan Beňuš, Rivka Levitan, Julia Hirschberg
The role of speakers and context in classifying competition in overlapping speech
Shammur Absar Chowdhury, Morena Danieli, Giuseppe Riccardi
Automatic detection and annotation of disfluencies in spoken French corpora
George Christodoulides, Mathieu Avanzi
Clustering novel intents in a conversational interaction system with semantic parsing
Dilek Hakkani-Tür, Yun-Cheng Ju, Geoffrey Zweig, Gokhan Tur
Semantic analysis of spoken input using Markov logic networks
Vladimir Despotovic, Oliver Walter, Reinhold Haeb-Umbach
Hierarchical discriminative model for spoken language understanding based on convolutional neural network
Jan Švec, Adam Chýlek, Luboš Šmídl
Learning semantic hierarchy with distributed representations for unsupervised spoken language understanding
Yun-Nung Chen, William Yang Wang, Alexander I. Rudnicky
The effect of soft, modal and loud voice levels on entrainment in noisy conditions
Éva Székely, Mark T. Keane, Julie Carson-Berndsen
Does voice anthropomorphism affect lexical alignment in speech-based human-computer dialogue?
Benjamin R. Cowan, Holly P. Branigan
Exploiting top-down source models to improve binaural localisation of multiple sources in reverberant environments
Ning Ma, Guy J. Brown, Jose A. Gonzalez
Binaural sound source localisation and tracking using a dynamic spherical head model
Christopher Schymura, Fiete Winter, Dorothea Kolossa, Sascha Spors
The role of temporal resolution in modulation-based speech segregation
Tobias May, Thomas Bentsen, Torsten Dau
Improving automatic speech recognition in spatially-aware hearing aids
Hendrik Kayser, Constantin Spille, Daniel Marquardt, Bernd T. Meyer
Dereverberation for active human-robot communication robust to speaker's face orientation
Randy Gomez, Levko Ivanchuk, Keisuke Nakamura, Takeshi Mizumoto, Kazuhiro Nakadai
Multi-task learning for text-dependent speaker verification
Nanxin Chen, Yanmin Qian, Kai Yu
JFA for speaker recognition with random digit strings
Themos Stafylakis, Patrick Kenny, Md. Jahangir Alam, Marcel Kockmann
Structured prediction for speaker identification in TV series
Elena Knyazeva, Guillaume Wisniewski, Hervé Bredin, François Yvon
Speaker recognition by means of acoustic and phonetically informed GMMs
Sandro Cumani, Pietro Laface, Farzana Kulsoom
A fast approach to psychoacoustic model compensation for robust speaker recognition in additive noise
Ashish Panda
Blind score normalization method for PLDA based speaker recognition
Danila Doroshin, Nikolay Lubimov, Marina Nastasenko, Mikhail Kotov
Non-linear PLDA for i-vector speaker verification
Sergey Novoselov, Timur Pekhovsky, Oleg Kudashev, Valentin S. Mendelev, Alexey Prudnikov
On the need of template protection for voice authentication
Carlos Vaquero, Patricia Rodríguez
Evaluation and calibration of short-term aging effects in speaker verification
Finnian Kelly, John H. L. Hansen
Phone-centric local variability vector for text-constrained speaker verification
Liping Chen, Kong Aik Lee, Bin Ma, Wu Guo, Haizhou Li, Li-Rong Dai
Cosine distance features for robust speaker verification
Kuruvachan K. George, C. Santhosh Kumar, K. I. Ramachandran, Ashish Panda
Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification
Sayaka Shiota, Fernando Villavicencio, Junichi Yamagishi, Nobutaka Ono, Isao Echizen, Tomoko Matsui
Noise robust speaker recognition with convolutive sparse coding
Antti Hurmalainen, Rahim Saeidi, Tuomas Virtanen
Combining amplitude and phase-based features for speaker verification with short duration utterances
Md. Jahangir Alam, Patrick Kenny, Themos Stafylakis
The RedDots data collection for speaker recognition
Kong Aik Lee, Anthony Larcher, Guangsen Wang, Patrick Kenny, Niko Brümmer, David van Leeuwen, Hagai Aronowitz, Marcel Kockmann, Carlos Vaquero, Bin Ma, Haizhou Li, Themos Stafylakis, Md. Jahangir Alam, Albert Swart, Javier Perez
Noise-robust speaker recognition based on morphological component analysis
Yongjun He, Chen Chen, Jiqing Han
Analysis of mutual duration and noise effects in speaker recognition: benefits of condition-matched cohort selection in score normalization
Andreas Nautsch, Rahim Saeidi, Christian Rathgeb, Christoph Busch
Robustness to additive noise of locally-normalized cepstral coefficients in speaker verification
Josué Fredes, José Novoa, Victor Poblete, Simon King, Richard M. Stern, Néstor Becerra Yoma
Probabilistic linear discriminant analysis for robust speaker identification in co-channel speech
Navid Shokouhi, John H. L. Hansen
Community detection with manifold learning on speaker i-vector space for Chinese
Hongcui Wang, Di Jin, Lantian Li, Jianwu Dang
A comparison of neural network feature transforms for speaker diarization
Sree Harsha Yella, Andreas Stolcke
Clustering short push-to-talk segments
Ilya Shapiro, Neta Rabin, Irit Opher, Itshak Lapidot
Exploring ANN back-ends for i-vector based speaker age estimation
Anna Fedorova, Ondřej Glembek, Tomi Kinnunen, Pavel Matějka
Analysis of the second phase of the 2013-2014 i-vector machine learning challenge
Désiré Bansé, George R. Doddington, Daniel Garcia-Romero, John J. Godfrey, Craig S. Greenberg, Jaime Hernández-Cordero, John M. Howard, Alvin F. Martin, Lisa P. Mason, Alan McCree, Douglas A. Reynolds
NIST language recognition evaluation — plans for 2015
Alvin F. Martin, Craig S. Greenberg, John M. Howard, Désiré Bansé, George R. Doddington, Jaime Hernández-Cordero, Lisa P. Mason
Factor analysis for speaker segmentation and improved speaker diarization
Brecht Desplanques, Kris Demuynck, Jean-Pierre Martens
Enhanced speaker diarization with detection of backchannels using eye-gaze information in poster conversations
Koji Inoue, Yukoh Wakabayashi, Hiromasa Yoshimoto, Katsuya Takanashi, Tatsuya Kawahara
Novel clustering selection criterion for fast binary key speaker diarization
Héctor Delgado, Xavier Anguera, Corinne Fredouille, Javier Serrano
Speaker diarization with i-vectors from DNN senone posteriors
Gregory Sell, Daniel Garcia-Romero, Alan McCree
Using voice-quality measurements with prosodic and spectral features for speaker diarization
Abraham Woubie, Jordi Luque, Javier Hernando
Integrating online i-vector extractor with information bottleneck based speaker diarization system
Srikanth Madikeri, Ivan Himawan, Petr Motlicek, Marc Ferras
Phase perception of the glottal excitation of vocoded speech
Tuomo Raitio, Lauri Juvela, Antti Suni, Martti Vainio, Paavo Alku
Using acoustics to improve pronunciation for synthesis of low resource languages
Sunayana Sitaram, Serena Jeblee, Alan W. Black
Sub-band text-to-speech combining sample-based spectrum with statistically generated spectrum
Tadashi Inai, Sunao Hara, Masanobu Abe, Yusuke Ijima, Noboru Miyazaki, Hideyuki Mizuno
Pruning redundant synthesis units based on static and delta unit appearance frequency
Heng Lu, Wei Zhang, Xu Shao, Quan Zhou, Wenhui Lei, Hongbin Zhou, Andrew Breen
Emotional transplant in statistical speech synthesis based on emotion additive model
Yamato Ohtani, Yu Nasu, Masahiro Morita, Masami Akamine
Generalized variable parameter HMMs based acoustic-to-articulatory inversion
Xurong Xie, Xunying Liu, Lan Wang, Rongfeng Su
Semi-supervised training of a voice conversion mapping function using a joint-autoencoder
Seyed Hamidreza Mohammadi, Alexander Kain
On glottal source shape parameter transformation using a novel deterministic and stochastic speech analysis and synthesis system
Stefan Huber, Axel Roebel
Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation
Yi-Chin Huang, Chung-Hsien Wu, Ming-Ge Shie
Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics
Yuji Oshima, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
Evaluation of state mapping based foreign accent conversion
Markus Toman, Michael Pucher
Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features
Zhizheng Wu, Simon King
Combining extreme learning machine and decision tree for duration prediction in HMM based speech synthesis
Yang Wang, Minghao Yang, Zhengqi Wen, Jianhua Tao
F0 parameterization of glottalized tones for HMM-based Vietnamese TTS
Duy Khanh Ninh, Yoichi Yamashita
Deep neural network context embeddings for model selection in rich-context HMM synthesis
Thomas Merritt, Junichi Yamagishi, Zhizheng Wu, Oliver Watts, Simon King
An investigation of context clustering for statistical speech synthesis with deep neural network
Bo Chen, Zhehuai Chen, Jiachen Xu, Kai Yu
Sentence-level control vectors for deep neural network speech synthesis
Oliver Watts, Zhizheng Wu, Simon King
Micro-structure of disfluencies: basics for conversational speech synthesis
Simon Betz, Petra Wagner, David Schlangen
Using automatic stress extraction from audio for improved prosody modelling in speech synthesis
György Szaszák, András Beke, Gábor Olaszy, Bálint Pál Tóth
Reconstructing voices within the multiple-average-voice-model framework
Pierre Lanchantin, Christophe Veaux, Mark J. F. Gales, Simon King, Junichi Yamagishi
HMM based Myanmar text to speech system
Ye Kyaw Thu, Win Pa Pa, Jinfu Ni, Yoshinori Shiga, Andrew Finch, Chiori Hori, Hisashi Kawai, Eiichiro Sumita
Multiple feed-forward deep neural networks for statistical parametric speech synthesis
Shinji Takaki, SangJin Kim, Junichi Yamagishi, JongJin Kim
Sequence-to-sequence neural net models for grapheme-to-phoneme conversion
Kaisheng Yao, Geoffrey Zweig
Knowledge versus data in TTS: evaluation of a continuum of synthesis systems
Rosie Kay, Oliver Watts, Roberto Barra Chicote, Cassie Mayo
Improving G2P from Wiktionary and other (web) resources
Steffen Eger
BLSTM neural networks for speech driven head motion synthesis
Chuang Ding, Pengcheng Zhu, Lei Xie
Articulatory controllable speech modification based on Gaussian mixture models with direct waveform modification using spectrum differential
Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
Reconstructing intelligible audio speech from visual speech features
Thomas Le Cornu, Ben Milner
Universal grapheme-based speech synthesis
Sunayana Sitaram, Alok Parlikar, Gopala Krishna Anumanchipalli, Alan W. Black
Artificial personality and disfluency
Mirjam Wester, Matthew Aylett, Marcus Tomalin, Rasmus Dall
Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis
Marc Evrard, Samuel Delalez, Christophe d'Alessandro, Albert Rilliard
A multi-layer F0 model for singing voice synthesis using a B-spline representation with intuitive controls
Luc Ardaillon, Gilles Degottex, Axel Roebel
Creating expressive synthetic voices by unsupervised clustering of audiobooks
Igor Jauk, Antonio Bonafonte, Paula Lopez-Otero, Laura Docio-Fernandez
Articulatory-based conversion of foreign accents with deep neural networks
Sandesh Aryal, Ricardo Gutierrez-Osuna
Anomaly-based annotation errors detection in TTS corpora
Jindřich Matoušek, Daniel Tihelka
Analysing automatic descriptions of intonation with ICARUS
Katrin Schweitzer, Markus Gärtner, Arndt Riester, Ina Rösiger, Kerstin Eckart, Jonas Kuhn, Grzegorz Dogil
iCALL corpus: Mandarin Chinese spoken by non-native speakers of European descent
Nancy F. Chen, Rong Tong, Darren Wee, Peixuan Lee, Bin Ma, Haizhou Li
Development of a Cantonese dysarthric speech corpus
Ka Ho Wong, Yu Ting Yeung, Edwin H. Y. Chan, Patrick C. M. Wong, Gina-Anne Levow, Helen Meng
Stylex: a corpus of educational videos for research on speaking styles and their impact on engagement and learning
Harish Arsikere, Sonal Patil, Ranjeet Kumar, Kundan Shrivastava, Om Deshmukh
A dialog act tagging approach to behavioral coding: a case study of addiction counseling conversations
Doğan Can, David C. Atkins, Shrikanth S. Narayanan
Analysing rhythm in ritual discourse in Yucatec Maya using automatic speech alignment
Valentina Vapnarsky, Claude Barras, Cédric Becquey, David Doukhan, Martine Adda-Decker, Lori Lamel
Noise-matched training of CRF based sentence end detection models
Madina Hasan, Rama Doddipatla, Thomas Hain
The effect of spectral slope on pitch perception
Jianjing Kuang, Mark Liberman
Combined cine- and tagged-MRI for tracking landmarks on the tongue surface
Honghao Bao, Wenhuan Lu, Kiyoshi Honda, Jianguo Wei, Qiang Fang, Jianwu Dang
Human vocal tract growth: a longitudinal study of the development of various anatomical structures
Guillaume Barbier, Louis-Jean Boë, Guillaume Captier, Rafael Laboissière
Analysis of coarticulated speech using estimated articulatory trajectories
Ganesh Sivaraman, Vikramjit Mitra, Mark K. Tiede, Elliot Saltzman, Louis Goldstein, Carol Espy-Wilson
Speech planning in 4-year-old children versus adults: acoustic and articulatory analyses
Guillaume Barbier, Pascal Perrier, Lucie Ménard, Yohan Payan, Mark K. Tiede, Joseph S. Perkell
Morphological and acoustic analysis of the vocal tract using a multi-speaker volumetric MRI dataset
Tokihiko Kaburagi
Experimental assessment of the tongue incompressibility hypothesis during speech production
Zisis Iason Skordilis, Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan
Multilingual bottleneck features for language recognition
Radek Fér, Pavel Matějka, František Grézl, Oldřich Plchot, Jan Černocký
DNN senone MAP multinomial i-vectors for phonotactic language recognition
Alan McCree, Daniel Garcia-Romero
Deep bottleneck network based i-vector representation for language identification
Yan Song, Xinhai Hong, Bing Jiang, Ruilian Cui, Ian McLoughlin, Li-Rong Dai
An end-to-end approach to language identification in short utterances using convolutional neural networks
Alicia Lozano-Diez, Ruben Zazo-Candil, Javier Gonzalez-Dominguez, Doroteo T. Toledano, Joaquin Gonzalez-Rodriguez
Boosting universal speech attributes classification with deep neural network for foreign accent characterization
Ville Hautamäki, Sabato Marco Siniscalchi, Hamid Behravan, Valerio Mario Salerno, Ivan Kukanov
Multilingual tandem bottleneck feature for language identification
Wang Geng, Jie Li, Shanshan Zhang, Xinyuan Cai, Bo Xu
On compressibility of neural network phonological features for low bit rate speech coding
Afsaneh Asaei, Milos Cernak, Hervé Bourlard
Robust and accurate LSF location with Laguerre method
Michał Lenarczyk
Interactivity-aware playout adaptation
Jochen Issing, Nikolaus Färber, Reinhard German
Advanced time shrinking using a drop classifier based on codec features
Jochen Issing, Nikolaus Färber, Reinhard German
Measuring and monitoring speech quality for voice over IP with POLQA, ViSQOL and P.563
Andrew Hines, Eoin Gillen, Naomi Harte
Towards the prediction of human speaker identification performance from measured speech quality
Laura Fernández Gallardo, Sebastian Möller
Personalization of word-phrase-entity language models
M. Levit, Andreas Stolcke, R. Subba, S. Parthasarathy, S. Chang, S. Xie, T. Anastasakos, Benoit Dumoulin
Discriminative bilinear language modeling for broadcast transcriptions
Akio Kobayashi, Manon Ichiki, Takahiro Oku, Kazuo Onoe, Shoei Sato
Recognize foreign low-frequency words with similar pairs
Xi Ma, Xiaoxi Wang, Dong Wang, Zhiyong Zhang
Combinations of various language model technologies including data expansion and adaptation in spontaneous speech recognition
Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito
Bringing contextual information to Google speech recognition
Petar Aleksic, Mohammadreza Ghodsi, Assaf Michaely, Cyril Allauzen, Keith Hall, Brian Roark, David Rybach, Pedro Moreno
Sequence-based class tagging for robust transcription in ASR
Lucy Vasserman, Vlad Schogol, Keith Hall
The INTERSPEECH 2015 computational paralinguistics challenge: nativeness, Parkinson's & eating condition
Björn Schuller, Stefan Steidl, Anton Batliner, Simone Hantke, Florian Hönig, J. R. Orozco-Arroyave, Elmar Nöth, Yue Zhang, Felix Weninger
The degree of nativeness sub-challenge: the data
Florian Hönig
Phrase accentuation verification and phonetic variation measurement for the degree of nativeness sub-challenge
Claude Montacié, Marie-José Caraty
Combining multiple approaches to predict the degree of nativeness
Eugénio Ribeiro, Jaime Ferreira, Julia Olcoz, Alberto Abad, Helena Moniz, Fernando Batista, Isabel Trancoso
Automated evaluation of non-native English pronunciation quality: combining knowledge- and data-driven features at multiple time scales
Matthew P. Black, Daniel Bone, Zisis Iason Skordilis, Rahul Gupta, Wei Xia, Pavlos Papadopoulos, Sandeep Nallan Chakravarthula, Bo Xiao, Maarten Van Segbroeck, Jangwon Kim, Panayiotis G. Georgiou, Shrikanth S. Narayanan
The Parkinson's condition sub-challenge: the data
J. R. Orozco-Arroyave
Estimating the severity of Parkinson's disease from speech using linear regression and database partitioning
Dávid Sztahó, Gábor Kiss, Klára Vicsi
Random forest-based prediction of Parkinson's disease progression using acoustic, ASR and intelligibility features
Alexander Zlotnik, Juan M. Montero, Rubén San-Segundo, Ascensión Gallardo-Antolín
Automatic recognition of unified Parkinson's disease rating from speech with acoustic, i-vector and phonotactic features
Guozhen An, David Guy Brizan, Min Ma, Michelle Morales, Ali Raza Syed, Andrew Rosenberg
Parkinson's condition estimation using speech acoustic and inversely mapped articulatory data
Seongjun Hahm, Jun Wang
Segment-dependent dynamics in predicting Parkinson's disease
James R. Williamson, Thomas F. Quatieri, Brian S. Helfer, Joseph Perricone, Satrajit S. Ghosh, Gregory Ciccarelli, Daryush D. Mehta
The eating condition sub-challenge: the data
Anton Batliner
Automatic classification of eating conditions from speech using acoustic feature selection and a set of hierarchical support vector machine classifiers
Abhay Prasad, Prasanta Kumar Ghosh
Combining hierarchical classification with frequency weighting for the recognition of eating conditions
Johannes Wagner, Andreas Seiderer, Florian Lingenfelser, Elisabeth André
Acoustic group feature selection using wrapper method for automatic eating condition recognition
Dara Pir, Theodore Brown
Comparing SVM, softmax, and shallow neural networks for eating condition classification
Thomas Pellegrini
Using representation learning and out-of-domain data for a paralinguistic speech task
Benjamin Milde, Chris Biemann
Fisher vectors with cascaded normalization for paralinguistic analysis
Heysem Kaya, Alexey A. Karpov, Albert Ali Salah
Automatic estimation of Parkinson's disease severity from diverse speech tasks
Jangwon Kim, Md. Nasir, Rahul Gupta, Maarten Van Segbroeck, Daniel Bone, Matthew P. Black, Zisis Iason Skordilis, Zhaojun Yang, Panayiotis G. Georgiou, Shrikanth S. Narayanan
Assessing the degree of nativeness and Parkinson's condition using Gaussian processes and deep rectifier neural networks
Tamás Grósz, Róbert Busa-Fekete, Gábor Gosztolya, László Tóth
The INTERSPEECH 2015 computational paralinguistics challenge: a summary of results
Stefan Steidl
Wrapping up: the story of the ComParE challenges, what we learned and where to go
Anton Batliner
Recognition of voiced sounds with a continuous state HMM
S. M. Houghton, Colin J. Champion, Philip Weber
Learning speech rate in speech recognition
Xiangyu Zeng, Shi Yin, Dong Wang
Pronunciation and silence probability modeling for ASR
Guoguo Chen, Hainan Xu, Minhua Wu, Daniel Povey, Sanjeev Khudanpur
Exploring minimal pronunciation modeling for low resource languages
Marelie Davel, Etienne Barnard, Charl van Heerden, William Hartmann, Damianos Karakos, Richard Schwartz, Stavros Tsakalidis
Attribute knowledge integration for speech recognition based on multi-task learning neural networks
Hao Zheng, Zhanlei Yang, Liwei Qiao, Jianping Li, Wenju Liu
Detecting audio-visual synchrony using deep neural networks
Etienne Marcheret, Gerasimos Potamianos, Josef Vopicka, Vaibhava Goel
Cross database training of audio-visual hidden Markov models for phone recognition
Shahram Kalantari, David Dean, Houman Ghaemmaghami, Sridha Sridharan, Clinton Fookes
Incorporating visual information for spoken term detection
Shahram Kalantari, David Dean, Sridha Sridharan
Integration of deep bottleneck features for audio-visual speech recognition
Hiroshi Ninomiya, Norihide Kitaoka, Satoshi Tamura, Yurie Iribe, Kazuya Takeda
Automatic detection of sentence prominence in speech using predictability of word-level acoustic features
Sofoklis Kakouros, Okko Räsänen
An empirical model of emphatic word detection
Milos Cernak, Pierre-Edouard Honnet
Using tilt for automatic emphasis detection with Bayesian networks
Yishuang Ning, Zhiyong Wu, Xiaoyan Lou, Helen Meng, Jia Jia, Lianhong Cai
Analysis of a low-dimensional bottleneck neural network representation of speech for modelling speech dynamics
Linxue Bai, Peter Jančovič, Martin Russell, Philip Weber
Statistical acoustic-to-articulatory mapping unified with speaker normalization based on voice conversion
Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose
Analysis of features from analytic representation of speech using MP-ABX measures
Raghavendra Reddy Pappagari, Karthika Vijayan, K. Sri Rama Murty
Source-filter separation of speech signal in the phase domain
Erfan Loweimi, Jon Barker, Thomas Hain
A maximum likelihood approach to the detection of moments of maximum excitation and its application to high-quality speech parameterization
Ranniery Maia, Yannis Stylianou, Masami Akamine
SABR: sparse, anchor-based representation of the speech signal
Christopher Liberatore, Sandesh Aryal, Zelun Wang, Seth Polsley, Ricardo Gutierrez-Osuna
Automatic transformation of irregular to regular voice by residual analysis and synthesis
Tamás Gábor Csapó, Géza Németh
Optical sensor calibration for electro-optical stomatography
Simon Preuß, Peter Birkholz
From text to formants — indirect model for trajectory prediction based on a multi-speaker parallel speech database
Kálmán Abari, Tamás Gábor Csapó, Bálint Pál Tóth, Gábor Olaszy
Layered nonnegative matrix factorization for speech separation
Chung-Chien Hsu, Jen-Tzung Chien, Tai-Shih Chi
Robust tongue tracking in ultrasound images: a multi-hypothesis approach
Catherine Laporte, Lucie Ménard
Objective measures for predicting the intelligibility of spectrally smoothed speech with artificial excitation
Danny Websdale, Thomas Le Cornu, Ben Milner
Vocal tremor analysis via AM-FM decomposition of empirical modes of the glottal cycle length time series
Christophe Mertens, Francis Grenez, François Viallet, Alain Ghio, Sabine Skodda, Jean Schoentgen
Estimating lower vocal tract features with closed-open phase spectral analyses
Elizabeth Godoy, Nicolas Malyska, Thomas F. Quatieri
Inductive implementation of segmental HMMs as CS-HMMs
S. M. Houghton, Colin J. Champion
A discriminative analysis within and across voiced and unvoiced consonants in neutral and whispered speech in multiple Indian languages
G. Nisha Meenakshi, Prasanta Kumar Ghosh
Aligning meeting recordings via adaptive fingerprinting
T. J. Tsai, Andreas Stolcke
On representation learning for artificial bandwidth extension
Matthias Zöhrer, Robert Peharz, Franz Pernkopf
AM-FM based filter bank analysis for estimation of spectro-temporal envelopes and its application for speaker recognition in noisy reverberant environments
Dhananjaya Gowda, Rahim Saeidi, Paavo Alku
Fast and accurate phase unwrapping
Thomas Drugman, Yannis Stylianou
Sparse representation with temporal max-smoothing for acoustic event detection
Xugang Lu, Peng Shen, Yu Tsao, Chiori Hori, Hisashi Kawai
Estimation of glottal closure instants from telephone speech using a group delay-based approach that considers speech signal as a spectrum
G Anushiya Rachel, P Vijayalakshmi, T Nagarajan
The role of prosody and voice quality in text-dependent categories of storytelling across languages
Raúl Montaño, Francesc Alías
Neuromorphic based oscillatory device for incremental syllable boundary detection
Alexandre Hyafil, Milos Cernak
Mispronunciation detection without nonnative training data
Ann Lee, James Glass
Automatic accentedness evaluation of non-native speech using phonetic and sub-phonetic posterior probabilities
Ramya Rasipuram, Milos Cernak, Alexandre Nachen, Mathew Magimai-Doss
Using F0 contours to assess nativeness in a sentence repeat task
Min Ma, Keelan Evanini, Anastassia Loukina, Xinhao Wang, Klaus Zechner
Using linguistic indicators of difficulty to identify mild cognitive impairment
Rebecca Lunsford, Peter A. Heeman
Automatic intelligibility measures applied to speech signals simulating age-related hearing loss
Lionel Fontan, Jérôme Farinas, Isabelle Ferrané, Julien Pinquier, Xavier Aumont
Assessing empathy using static and dynamic behavior models based on therapist's language in addiction counseling
Sandeep Nallan Chakravarthula, Bo Xiao, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou
SVitchboard II and FiSVer I: high-quality limited-complexity corpora of conversational English speech
Yuzong Liu, Rishabh Iyer, Katrin Kirchhoff, Jeff Bilmes
Fully unsupervised small-vocabulary speech recognition using a segmental Bayesian model
Herman Kamper, Aren Jansen, Sharon Goldwater
LSTM for punctuation restoration in speech transcripts
Ottokar Tilk, Tanel Alumäe
Noise robust exemplar matching for speech enhancement: applications to automatic speech recognition
Emre Yılmaz, Deepak Baby, Hugo Van hamme
A study on robust detection of pronunciation erroneous tendency based on deep neural network
Yingming Gao, Yanlu Xie, Wen Cao, Jinsong Zhang
Vowel mispronunciation detection using DNN acoustic models with cross-lingual training
Shrikant Joshi, Nachiket Deo, Preeti Rao
Confidence-features and confidence-scores for ASR applications in arbitration and DNN speaker adaptation
Kshitiz Kumar, Ziad Al Bawab, Yong Zhao, Chaojun Liu, Benoit Dumoulin, Yifan Gong
Topic modeling for conference analytics
Pengfei Liu, Shoaib Jameel, Wai Lam, Bin Ma, Helen Meng
Sparse coding based features for speech units classification
Pulkit Sharma, Vinayak Abrol, A. D. Dileep, Anil Kumar Sao
Smarter driving with IDA, the intelligent driving assistant for Singapore
Andreea I. Niculescu, Ngoc Thuy Huong Thai, Chongjia Ni, Boon Pang Lim, Kheng Hui Yeo, Rafael E. Banchs
Talk it out: adding speech interaction to support informational and transactional applications on public touch-screen kiosks
Kheng Hui Yeo, Rafael E. Banchs
Conversational agent and management tools for conference and tourism domain
Luis Fernando D'Haro, Seokhwan Kim, Rafael E. Banchs
Latvian speech-to-text transcription service
Askars Salimbajevs, Jevgenijs Strigins
System supporting speaker identification in emergency call center
Jakub Gałka, Joanna Grzybowska, Magdalena Igras, Paweł Jaciów, Kamil Wajda, Marcin Witkowski, Mariusz Ziółko
QAT2 — the QCRI advanced transcription and translation system
Ahmed Abdelali, Ahmed Ali, Francisco Guzmán, Felix Stahlberg, Stephan Vogel, Yifan Zhang
Implementation of a live dialectal media subtitling system
Michael Stadtschnitzer, Christoph Schmidt
A system for automatic broadcast news summarisation, geolocation and translation
Peter Bell, Catherine Lai, Clare Llewellyn, Alexandra Birch, Mark Sinclair
Media monitoring system for Latvian radio and TV broadcasts
Artūrs Znotiņš, Kaspars Polis, Roberts Darģis
Meeting assistant application
Michel Assayag, Jonathan Huang, Jonathan Mamou, Oren Pereg, Saurav Sahay, Oren Shamir, Georg Stemmer, Moshe Wasserblat
SARMATA 2.0 automatic Polish language speech recognition system
Bartosz Ziółko, Tomasz Jadczyk, Dawid Skurzok, Piotr Żelasko, Jakub Gałka, Tomasz Pędzimąż, Ireneusz Gawlik, Szymon Pałka
Remeeting — get more out of meetings
Arlo Faria, Korbinian Riedhammer
Web application system for pronunciation practice by children with disabilities and to support cooperation of teachers and medical workers
Ikuyo Masuda-Katsuse
PATSY — it's all about pronunciation!
Caroline Kaufhold, Vadim Gamidov, Andreas Kiessling, Klaus Reinhard, Elmar Nöth
Real-time pitch modification system for speech and singing voice
Elias Azarov, Maxim Vashkevich, Denis Likhachov, Alexander Petrovsky
Nao is doing humour in the CHIST-ERA JOKER project
Guillaume Dubuisson Duplessis, Lucile Béchade, Mohamed A. Sehili, Agnès Delaborde, Vincent Letard, Anne-Laure Ligozat, Paul Deléglise, Yannick Estève, Sophie Rosset, Laurence Devillers
ABIMS — auditory bewildered interaction measurement system
Lisa Lange, Bartholomäus Pfeiffer, Daniel Duran
Phontasia — a game for training German orthography
Kay Berkling, Nadine Pflaumer, Alexei Coyplove
E-commu-book: an assistive technology for users with speech impairments
Ka Ho Wong, Wai Kim Leung, Helen Meng
Swiss GraphoGame: concept and design presentation of a computerised reading intervention for children with high risk for poor reading outcomes
Martina Röthlisberger, Iliana I. Karipidis, Georgette Pleisch, Volker Dellwo, Ulla Richardson, Silvia Brem
Neolexon — a therapy app for patients with aphasia
Jakob Pfab, Hanna Jakob, Mona Späth, Christoph Draxler
Acoustic stress detection for improved navigation of educational videos
Sonal Patil, Harish Arsikere, Om Deshmukh
Multimodal read-aloud ebooks for language learning
Xavier Anguera
Speech technologies for African languages: example of a multilingual calculator for education
Laurent Besacier, Elodie Gauthier, Mathieu Mangeot, Philippe Bretier, Paul Bagshaw, Olivier Rosec, Thierry Moudenc, François Pellegrino, Sylvie Voisin, Egidio Marsico, Pascal Nocera
The RedDots platform for mobile crowd-sourcing of speech data
Kong Aik Lee, Guangsen Wang, Kam Pheng Ng, Hanwu Sun, Trung Hieu Nguyen, Ngoc Thuy Huong Thai, Bin Ma, Haizhou Li
Two extensions of Umeda and Teranishi's physical models of the human vocal tract
Takayuki Arai
Collaborative annotation for person identification in TV shows
Matheuz Budnik, Laurent Besacier, Johann Poignant, Hervé Bredin, Claude Barras, Mickael Stefas, Pierrick Bruneau, Thomas Tamisier
Phonetic/linguistic web services at BAS
Thomas Kisler, Florian Schiel, Uwe D. Reichel, Christoph Draxler
Managing speech databases with emuR and the EMU-webApp
Raphael Winkelmann
Visual comparison of speaker groups
Sebastian Wankerl, Florian Hönig, Anton Batliner, J. R. Orozco-Arroyave, Elmar Nöth
Tools for rapid customization of S2S systems for emergent domains
Rohit Kumar, Matthew E. Roy, Sanjika Hewavitharana, Dennis N. Mehay, Nina Zinovieva
The speech recognition virtual kitchen turns one
Florian Metze, Eric Riebling, Eric Fosler-Lussier, Andrew Plummer, Rebecca Bates
Model-based adaptive pre-processing of speech for enhanced intelligibility in noise and reverberation
Jan Rennies, Andreas Volgenandt, Henning Schepker, Simon Doclo
Experiences with and new application ideas for the Interspeech app
Sebastian Möller, Tilo Westermann
Traditional IVR and visual IVR — killing two birds with one stone
Dmitry Sityaev, Praphul Kumar, Rajesh Ramchander
Bayesian integration of sound source separation and speech recognition: a new approach to simultaneous speech recognition
Kousuke Itakura, Izaya Nishimuta, Yoshiaki Bando, Katsutoshi Itoyama, Kazuyoshi Yoshii
Channel selection in the short-time modulation domain for distant speech recognition
Ivan Himawan, Petr Motlicek, Sridha Sridharan, David Dean, Dian Tjondronegoro
A multi-channel speech enhancement framework for robust NMF-based speech recognition for speech-impaired users
Gert Dekkers, Toon van Waterschoot, Bart Vanrumste, Bert Van Den Broeck, Jort F. Gemmeke, Hugo Van hamme, Peter Karsmakers
Sound source separation algorithm using phase difference and angle distribution modeling near the target
Chanwoo Kim, Kean K. Chin
Contaminated speech training methods for robust DNN-HMM distant speech recognition
Mirco Ravanelli, Maurizio Omologo
Distance-aware DNNs for robust speech recognition
Yajie Miao, Florian Metze
Perception and production of vowel contrasts in German learners of English
Helena Levy
Goodness of tone (GOT) for non-native Mandarin tone recognition
Rong Tong, Nancy F. Chen, Bin Ma, Haizhou Li
The effect of high-variability training on the perception and production of French stops by German native speakers
Jeanin Jügler, Frank Zimmerer, Bernd Möbius, Christoph Draxler
Perception of Mandarin tones by native Tibetan speakers
Wenfu Bao, Hui Feng, Jianwu Dang, Zhilei Liu, Yang Yu, Siyu Wang
Study of acoustic correlates of English lexical stress produced by native (L1) Bengali speakers compared to native (L1) English speakers
Shambhu Nath Saha, Shyamal Kr. Das Mandal
Prosodic phrasing unique to the acquisition of L2 intonation — an analysis of L2 Japanese intonation by L1 Swedish learners
Yasuko Nagano-Madsen
Fusion of LVCSR and posteriorgram based keyword search
Leda Sarı, Batuhan Gündoğdu, Murat Saraçlar
Improving speech recognition and keyword search for low resource languages using web data
Gideon Mendels, Erica Cooper, Victor Soto, Julia Hirschberg, Mark J. F. Gales, Kate M. Knill, Anton Ragni, Haipeng Wang
Two-step spoken term detection using SVM classifier trained with pre-indexed keywords based on ASR result
Kentaro Domoto, Takehito Utsuro, Naoki Sawada, Hiromitsu Nishizaki
Enhancing low resource keyword spotting with automatically retrieved web documents
Le Zhang, Damianos Karakos, William Hartmann, Roger Hsiao, Richard Schwartz, Stavros Tsakalidis
A comparison between a DNN and a CRF disfluency detection and reconstruction system
Dario Bertero, Linlin Wang, Ho Yin Chan, Pascale Fung
Recurrent neural networks for incremental disfluency detection
Julian Hough, David Schlangen
Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning
Qiong Hu, Zhizheng Wu, Korin Richmond, Junichi Yamagishi, Yannis Stylianou, Ranniery Maia
An investigation of recurrent neural network architectures for statistical parametric speech synthesis
Sivanand Achanta, Tejas Godambe, Suryakanth V. Gangashetty
Sequence generation error (SGE) minimization based deep neural networks training for text-to-speech synthesis
Yuchen Fan, Yao Qian, Frank K. Soong, Lei He
Towards minimum perceptual error training for DNN-based speech synthesis
Cassia Valentini-Botinhao, Zhizheng Wu, Simon King
Deep neural network-based statistical parametric speech synthesis system using improved time-frequency trajectory excitation model
Eunwoo Song, Hong-Goo Kang
A study of speaker adaptation for DNN-based speech synthesis
Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, Simon King
High-resolution acoustic modeling and compact language modeling of language-universal speech attributes for spoken language identification
Yannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee
Phonemes frequency based PLLR dimensionality reduction for language recognition
Saad Irtza, Vidhyasaharan Sethu, Phu Ngoc Le, Eliathamby Ambikairajah, Haizhou Li
Exploiting i-vector posterior covariances for short-duration language recognition
Sandro Cumani, Oldřich Plchot, Radek Fér
Using the beat histogram for speech rhythm description and language identification
Athanasios Lykartsis, Stefan Weinzierl
Speaker recognition for speech under face cover
Rahim Saeidi, Tuija Niemi, Hanna Karppelin, Jouni Pohjalainen, Tomi Kinnunen, Paavo Alku
Dataset-invariant covariance normalization for out-domain PLDA speaker verification
Md. Hafizur Rahman, Ahilan Kanagasundaram, David Dean, Sridha Sridharan
Sparse coding of total variability matrix
Longting Xu, Kong Aik Lee, Haizhou Li, Zhen Yang
Duration dependent covariance regularization in PLDA modeling for speaker verification
Weicheng Cai, Ming Li, Lin Li, QingYang Hong
Exploiting supervector structure for speaker recognition trained on a small development set
Hagai Aronowitz
Modified-prior PLDA and score calibration for duration mismatch compensation in speaker recognition system
QingYang Hong, Lin Li, Ming Li, Ling Huang, Lihong Wan, Jun Zhang
Speaker verification using Gaussian posteriorgrams on fixed phrase short utterances
Sarfaraz Jelil, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna
Importance of intelligible phonemes for human speaker recognition in different channel bandwidths
Laura Fernández Gallardo, Sebastian Möller, Michael Wagner
Denoising autoencoder-based speaker feature restoration for utterances of short duration
Hitoshi Yamamoto, Takafumi Koshinaka
Full multicondition training for robust i-vector based speaker recognition
Dayana Ribas, Emmanuel Vincent, José Ramón Calvo
Maximum a posteriori adaptation of network parameters in deep models
Zhen Huang, Sabato Marco Siniscalchi, I-Fan Chen, Jinyu Li, Jiadong Wu, Chin-Hui Lee
Regularized sequence-level deep neural network model adaptation
Yan Huang, Yifan Gong
Modeling speaker variability using long short-term memory networks for speech recognition
Xiangang Li, Xihong Wu
Intermediate-layer DNN adaptation for offline and session-based iterative speaker adaptation
Kshitiz Kumar, Chaojun Liu, Kaisheng Yao, Yifan Gong
Speaker adaptation of convolutional neural network using speaker specific subspace vectors of SGMM
Murali Karthick B., Prateek Kolhar, S. Umesh
On speaker adaptation of long short-term memory recurrent neural networks
Yajie Miao, Florian Metze
Automatic identification of received language in MEG
Emilio Parisotto, Youness A. Ghassabeh, Matt J. MacDonald, Adelina Cozma, Elizabeth W. Pang, Frank Rudzicz
Detection of cardiovascular reactivity in speech
Laurens van der Werff, Jón Guðnason, Kamilla Rún Jóhannsdóttir
Lateralization in emotional speech perception following transcranial direct current stimulation
Alex Francois-Nienaber, Jed A. Meltzer, Frank Rudzicz
Speech reconstruction from human auditory cortex with deep neural networks
Minda Yang, Sameer A. Sheth, Catherine A. Schevon, Guy M. McKhann, Nima Mesgarani
Temporal dynamics of the speech readiness potential, and its use in a neural decoder of speech-motor intention
Jonathan S. Brumberg, Nichol Castro, Akshatha Rao
Continuous speech recognition from ECoG
Dominic Heger, Christian Herff, Adriana de Pesters, Dominic Telaar, Peter Brunner, Gerwin Schalk, Tanja Schultz
Locally-connected and convolutional neural networks for small footprint speaker recognition
Yu-hsin Chen, Ignacio Lopez-Moreno, Tara N. Sainath, Mirkó Visontai, Raziel Alvarez, Carolina Parada
Insights into deep neural networks for speaker recognition
Daniel Garcia-Romero, Alan McCree
A unified deep neural network for speaker and language recognition
Fred Richardson, Douglas A. Reynolds, Najim Dehak
Investigation of bottleneck features and multilingual deep neural networks for speaker verification
Yao Tian, Meng Cai, Liang He, Jia Liu
Frequency offset correction in single sideband (SSB) speech by deep neural network for speaker verification
Hua Xing, Gang Liu, John H. L. Hansen
Exploring robustness of DNN/RNN for extracting speaker Baum-Welch statistics in mismatched conditions
Hao Zheng, Shanshan Zhang, Wenju Liu
Simultaneous optimization of multiple tree structures for factor analyzed HMM-based speech synthesis
Takenori Yoshimura, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda
HMM training strategy for incremental speech synthesis
Maël Pouget, Thomas Hueber, Gérard Bailly, Timo Baumann
Modulation spectrum-constrained trajectory training algorithm for HMM-based speech synthesis
Shinnosuke Takamichi, Tomoki Toda, Alan W. Black, Satoshi Nakamura
Random forests for statistical speech synthesis
Alan W. Black, Prasanna Kumar Muthukumar
Speaker adaptation using relevance vector regression for HMM-based expressive TTS
Doo Hwa Hong, Joun Yeop Lee, Se Young Jang, Nam Soo Kim
Towards a linear dynamical model based speech synthesizer
Vassilis Tsiaras, Ranniery Maia, Vassilis Diakoloukas, Yannis Stylianou, Vassilis Digalakis
Providing objective metrics of team communication skills via interpersonal coordination mechanisms
Céline De Looze, Brian Vaughan, Finnian Kelly, Alison Kay
Dialog act modeling for virtual personal assistant applications using a small volume of labeled data and domain knowledge
Donghyeon Lee, Jinsik Lee, Eun-Kyoung Kim, Jaewon Lee
A polyglot domain optimised text-to-speech system for railway station announcements
Csaba Zainkó, Mátyás Bartalis, Géza Németh, Gábor Olaszy
Development of Hindi speech recognition system of agricultural commodities using deep neural network
Partho Mandal, Shalini Jain, Gaurav Ojha, Anupam Shukla
Real-time audio signal enhancement for hands-free speech applications
Thomas Fehér, Michael Freitag, Christian Gruber
Personalized synthetic voices for speaking impaired: website and app
D. Erro, Inma Hernaez, Agustin Alonso, D. García-Lorenzo, Eva Navas, J. Ye, H. Arzelus, Igor Jauk, N. Q. Hy, C. Magariños, R. Pérez-Ramón, M. Sulír, Xiaohai Tian, X. Wang
Under-resourced speech recognition based on the speech manifold
Reza Sahraeian, Dirk Van Compernolle, Febe de Wet
Multilingual features based keyword search for very low-resource languages
Pavel Golik, Zoltán Tüske, Ralf Schlüter, Hermann Ney
Second language speech recognition using multiple-pass decoding with lexicon represented by multiple reduced phoneme sets
Xiaoyun Wang, Seiichi Yamamoto
Using resources from a closely-related language to develop ASR for a very under-resourced language: a case study for Iban
Sarah Samson Juan, Laurent Besacier, Benjamin Lecouteux, Mohamed Dyab
Prediction of speech recognition accuracy for utterance classification
Maxim L. Korenevsky, Andrey B. Smirnov, Valentin S. Mendelev
Error bounds for context reduction and feature omission
Eugen Beck, Ralf Schlüter, Hermann Ney
A metric for evaluating speech recognizer output based on human-perception model
Nobuyasu Itoh, Gakuto Kurata, Ryuki Tachibana, Masafumi Nishimura
How to evaluate ASR output for named entity recognition?
Mohamed Ameur Ben Jannet, Olivier Galibert, Martine Adda-Decker, Sophie Rosset
Acoustic-prosodic analysis of attitudinal expressions in German
Hansjörg Mixdorff, Angelika Hönemann, Albert Rilliard
Continuous emotion tracking using total variability space
Hossein Khaki, Engin Erzin
An analysis of the relationship between signal-derived vocal arousal score and human emotion production and perception
Chi-Chun Lee, Daniel Bone, Shrikanth S. Narayanan
Morphology of vocal affect bursts: exploring expressive interjections in Japanese conversation
Hiroki Mori
Emotion clustering based on probabilistic linear discriminant analysis
Mahnoosh Mehrabani, Ozlem Kalinli, Ruxin Chen
Objective study of the performance degradation in emotion recognition through the AMR-WB+ codec
Aaron Albin, Elliot Moore
Analysis of excitation source features of speech for emotion recognition
Sudarsana Reddy Kadiri, P. Gangamohan, Suryakanth V. Gangashetty, B. Yegnanarayana
An investigation of emotion change detection from speech
Zhaocheng Huang, Julien Epps, Eliathamby Ambikairajah
Crosslinguistic comparison on the perception of Mandarin attitudinal speech
Wentao Gu, Ping Tang, Keikichi Hirose, Véronique Aubergé
Conflict intensity estimation from speech using greedy forward-backward feature selection
Gábor Gosztolya
Exploring acoustic differences between Cantonese (tonal) and English (non-tonal) spoken expressions of emotions
Chee Seng Chong, Jeesun Kim, Chris Davis
Valence, arousal and dominance estimation for English, German, Greek, Portuguese and Spanish lexica using semantic models
Elisavet Palogiannidi, Elias Iosif, Polychronis Koutsakis, Alexandros Potamianos
Dimensionality reduction for speech emotion features by multiscale kernels
Xinzhou Xu, Jun Deng, Wenming Zheng, Li Zhao, Björn Schuller
High-level feature representation using recurrent neural network for speech emotion recognition
Jinkyu Lee, Ivan Tashev
Speech emotion classification using tree-structured sparse logistic regression
Myung Jong Kim, Joohong Yoo, Younggwan Kim, Hoirin Kim
Annotators' agreement and spontaneous emotion classification performance
Bogdan Vlasenko, Andreas Wendemuth
Multi-stream long short-term memory neural network language model
Ebru Arisoy, Murat Saraçlar
Composition-based on-the-fly rescoring for salient n-gram biasing
Keith Hall, Eunjoon Cho, Cyril Allauzen, Françoise Beaufays, Noah Coccaro, Kaisuke Nakajima, Michael Riley, Brian Roark, David Rybach, Linda Zhang
Learning phrase patterns for ASR name error detection using semantic similarity
Alex Marin, Mari Ostendorf, Ji He
Sparse non-negative matrix language modeling for skip-grams
Noam Shazeer, Joris Pelemans, Ciprian Chelba
Pruning sparse non-negative matrix n-gram language models
Joris Pelemans, Noam Shazeer, Ciprian Chelba
Geo-location for voice search language modeling
Ciprian Chelba, Xuedong Zhang, Keith Hall
On efficient training of word classes and their application to recurrent neural network language models
Rami Botros, Kazuki Irie, Martin Sundermeyer, Hermann Ney
Deep semantic encodings for language modeling
Ali Orkan Bayer, Giuseppe Riccardi
Learning OOV through semantic relatedness in spoken dialog systems
Ming Sun, Yun-Nung Chen, Alexander I. Rudnicky
TDTO language modeling with feedforward neural networks
Tze Yuang Chong, Rafael E. Banchs, Eng Siong Chng, Haizhou Li
Improvements to the pruning behavior of DNN acoustic models
Matthias Paulik
Fast and accurate recurrent neural network acoustic models for speech recognition
Haşim Sak, Andrew Senior, Kanishka Rao, Françoise Beaufays
Compressing deep neural networks using a rank-constrained topology
Preetum Nakkiran, Raziel Alvarez, Rohit Prabhavalkar, Carolina Parada
Convolutional neural networks for small-footprint keyword spotting
Tara N. Sainath, Carolina Parada
Efficient GPU implementation of convolutional neural networks for speech recognition
Ewout van den Berg, Daniel Brand, Rajesh Bordawekar, Leonid Rachevsky, Bhuvana Ramabhadran
Scalable distributed DNN training using commodity GPU cloud computing
Nikko Strom
Joint source localization and separation in spherical harmonic domain using a sparsity based method
Sachin N. Kalkur, Sandeep Reddy C., Rajesh M. Hegde
Regularized non-negative matrix factorization using alternating direction method of multipliers and its application to source separation
Shaofei Zhang, Dong-Yan Huang, Lei Xie, Eng Siong Chng, Haizhou Li, Minghui Dong
Two-stage multi-target joint learning for monaural speech separation
Shuai Nie, Shan Liang, Wei Xue, Xueliang Zhang, Wenju Liu, Like Dong, Hong Yang
Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement
Yong Xu, Jun Du, Zhen Huang, Li-Rong Dai, Chin-Hui Lee
Discriminative nonnegative matrix factorization using cross-reconstruction error for source separation
Kisoo Kwon, Jong Won Shin, Hyung Yong Kim, Nam Soo Kim
Using audio and visual information for single channel speaker separation
Faheem Khan, Ben Milner
On the nature of the features generated in the human auditory pathway for phone recognition
Harald Höge
How the slope of the speech spectrum affects the perception of speaker size
Kodai Yamamoto, Toshio Irino, Ryuichi Nisimura, Hideki Kawahara, Roy D. Patterson
Weakly-supervised word learning is improved by an active online algorithm
Heikki Rasilo, Okko Räsänen
The effect of cochlear implant processing on speaker intelligibility: a perceptual study and computer model
Lin Lin, Jon Barker, Guy J. Brown
Phonetic-phonological feature emerges by associating phonetic with semantic information — a GSOM-based modeling study
Mengxue Cao, Aijun Li, Qiang Fang, Bernd J. Kröger
DIANA: towards computational modeling reaction times in lexical decision in North American English
L. ten Bosch, L. Boves, B. Tucker, M. Ernestus
Automatic phrase boundary labeling of speech synthesis database using context-dependent HMMs and n-gram prior distributions
Qian Chen, Zhen-Hua Ling, Chen-Yu Yang, Li-Rong Dai
A perceptual investigation of wavelet-based decomposition of f0 for text-to-speech synthesis
Manuel Sam Ribeiro, Junichi Yamagishi, Robert A. J. Clark
Duration prediction using multi-level model for GPR-based speech synthesis
Decha Moungsri, Tomoki Koriyama, Takao Kobayashi
Data-driven foot-based intonation generator for text-to-speech synthesis
Mahsa Sadat Elyasi Langarani, Jan van Santen, Seyed Hamidreza Mohammadi, Alexander Kain
Weighted correlation based atom decomposition intonation modelling
Branislav Gerazov, Pierre-Edouard Honnet, Aleksandar Gjoreski, Philip N. Garner
Using deep bidirectional recurrent neural networks for prosodic-target prediction in a unit-selection text-to-speech system
Raul Fernandez, Asaf Rendel, Bhuvana Ramabhadran, Ron Hoory
Large vocabulary automatic speech recognition for children
Hank Liao, Golan Pundak, Olivier Siohan, Melissa K. Carroll, Noah Coccaro, Qi-Ming Jiang, Tara N. Sainath, Andrew Senior, Françoise Beaufays, Michiel Bacchiani
Acoustic-prosodic correlates of 'awkward' prosody in story retellings from adolescents with autism
Daniel Bone, Matthew P. Black, Anil Ramakrishna, Ruth Grossman, Shrikanth S. Narayanan
Evidence of phonological processes in automatic recognition of children's speech
Eva Fringi, Jill Fain Lehman, Martin Russell
Influence of speaker familiarity on blind and visually impaired children's perception of synthetic voices in audio games
Michael Pucher, Markus Toman, Dietmar Schabus, Cassia Valentini-Botinhao, Junichi Yamagishi, Bettina Zillinger, Erich Schmid
Low-memory fast on-line adaptation for acoustically mismatched children's speech recognition
S. Shahnawazuddin, Rohit Sinha
Large vocabulary children's speech recognition with DNN-HMM and SGMM acoustic modeling
Diego Giuliani, Bagher BabaAli
HMM adaptation for child speech synthesis
Avashna Govender, Febe de Wet, Jules-Raymond Tapamo
Vocal turn-taking patterns in groups of children performing collaborative tasks: an exploratory study
Jaebok Kim, Khiet P. Truong, Vicky Charisi, Cristina Zaga, Manja Lohse, Dirk Heylen, Vanessa Evers
Towards an automated screening tool for pediatric speech delay
Roozbeh Sadeghian, Stephen A. Zahorian
Children's reading aloud performance: a database and automatic detection of disfluencies
Jorge Proença, Dirce Celorico, Sara Candeias, Carla Lopes, Fernando Perdigão
Keyword spotting in multi-player voice driven games for children
Harshavardhan Sundar, Jill Fain Lehman, Rita Singh
Age-dependent height estimation and speaker normalization for children's speech using the first three subglottal resonances
Jinxi Guo, Rohit Paturi, Gary Yeung, Steven M. Lulich, Harish Arsikere, Abeer Alwan
The effect of speakers' regional varieties on listeners' decision-making
Adrian Leemann, Camilla Bernardasci, Francis Nolan
Word-initial glottal stop insertion, hiatus resolution and linking in British English
Robert Fuchs
Acoustic analysis of Mandarin affricates
Shanpeng Li, Wentao Gu
Homophonous phonotactic and morphonotactic consonant clusters in word-final position
Hannah Leykum, Sylvia Moosmüller, Wolfgang U. Dressler
Consonant duration and VOT as a function of syllable complexity and voicing in a sub-set of Spanish clusters
Mark Gibson, Ana María Fernández Planas, Adamantios Gafos, Emily Remirez
Hands-on tool producing front vowels for phonetic education: aiming for pronunciation training with tactile sensation
Takayuki Arai
Acoustics of articulatory constraints: vowel classification and nasalization
Indranil Dutta, Ayushi Pandey
Voice-conditioned allophones of MOUTH and PRICE in Bahamian Creole
Janina Kraus
Analysis of spatial variation with app-based crowdsourced audio data
Marie-José Kolly, Adrian Leemann, Florian Matter
Confusability in L2 vowels: analyzing the role of different features
Mátyás Jani, Catia Cucchiarini, Roeland van Hout, Helmer Strik
Perception of French speakers' German vowels
Frank Zimmerer, Jürgen Trouvain
Unintuitive phonetic behavior in Tswana post-nasal stops
Jagoda Bruni, Daniel Duran, Grzegorz Dogil
Classification of place-of-articulation of stop consonants using temporal analysis
A. P. Prathosh, A. G. Ramakrishnan, T. V. Ananthapadmanabha
The emergence of nasal velar codas in Brazilian Portuguese: an rt-MRI study
Marissa Barlaz, Maojing Fu, Zhi-Pei Liang, Ryan Shosted, Brad Sutton
Salient dimensions in implicit phonotactic learning
Elise Michon, Emmanuel Dupoux, Alejandrina Cristia
An acoustic examination of the three-way sibilant contrast in Lower Sorbian
Phil Howson
Investigating consonant reduction in Mandarin Chinese with improved forced alignment
Jiahong Yuan, Mark Liberman
Durational characteristics and timing patterns of Russian onset clusters at two speaking rates
Marianne Pouplier, Stefania Marin, Alexei Kochetov
Modeling temporal dependency for robust estimation of LP model parameters in speech enhancement
Chun Hoy Wong, Tan Lee, Yu Ting Yeung, P. C. Ching
Learning a speech manifold for signal subspace speech denoising
Colin Vaz, Shrikanth S. Narayanan
An iterative speech model-based a priori SNR estimator
Samy Elshamy, Nilesh Madhu, Wouter Tirry, Tim Fingscheidt
Multi-resolution stacking for speech separation based on boosted DNN
Xiao-Lei Zhang, DeLiang Wang
Least squares estimate of the initial phases in STFT based speech enhancement
Sidsel Marie Nørholm, Martin Krawczyk-Becker, Timo Gerkmann, Steven van de Par, Jesper Rindom Jensen, Mads Græsbøll Christensen
Enhancement of non-stationary speech using harmonic chirp filters
Sidsel Marie Nørholm, Jesper Rindom Jensen, Mads Græsbøll Christensen
Text-informed speech enhancement with deep neural networks
Keisuke Kinoshita, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani
Complex tensor factorization in modulation frequency domain for single-channel speech enhancement
Shogo Masaya, Masashi Unoki
Systematic integration of acoustic echo canceller and noise reduction modules for voice communication systems
Hyeonjoo Kang, JeeSok Lee, Soonho Baek, Hong-Goo Kang
DNN-based residual echo suppression
Chul Min Lee, Jong Won Shin, Nam Soo Kim
Codebook-based speech enhancement using Markov process and speech-presence probability
Qi He, Changchun Bao, Feng Bao
On optimal smoothing in minimum statistics based noise tracking
Aleksej Chinaev, Reinhold Haeb-Umbach
A data-driven speech enhancement method based on modeled long-range temporal dynamics
Yue Hao, Changchun Bao, Feng Bao, Feng Deng
Improved phase reconstruction in single-channel speech separation
Florian Mayer, Pejman Mowlaee
Time-frequency kernel-based CNN for speech recognition
Tuo Zhao, Yunxin Zhao, Xin Chen
Consonant recognition with continuous-state hidden Markov models and perceptually-motivated features
Philip Weber, Colin J. Champion, S. M. Houghton, Peter Jančovič, Martin Russell
Investigating factor analysis features for deep neural networks in noisy speech recognition
Sriram Ganapathy, Samuel Thomas, Dimitrios Dimitriadis, Steven Rennie
Ensemble of Gaussian mixture localized neural networks with application to phone recognition
Ruchir Travadi, Shrikanth S. Narayanan
DNN derived filters for processing of modulation spectrum of speech
Jan Pešán, Lukáš Burget, Hynek Hermansky, Karel Veselý
Exploring how deep neural networks form phonemic categories
Tasha Nagamine, Michael L. Seltzer, Nima Mesgarani
Pronunciation accuracy and intelligibility of non-native speech
Anastassia Loukina, Melissa Lopez, Keelan Evanini, David Suendermann-Oeft, Alexei V. Ivanov, Klaus Zechner
Productions of /h/ in German: French vs. German speakers
Frank Zimmerer, Jürgen Trouvain
German non-native realizations of French voiced fricatives in final position of a group of words
Anne Bonneau, Martine Cadot
From Newcastle MOUTH to Aussie ears: Australians' perceptual assimilation and adaptation for Newcastle UK vowels
Catherine T. Best, Jason A. Shaw, Gerard Docherty, Bronwen G. Evans, Paul Foulkes, Jennifer Hay, Jalal Al-Tamimi, Katharine Mair, Karen E. Mulak, Sophie Wood
Wubuy coronal stop perception by speakers of three dialects of Bangla
Rikke Louise Bundgaard-Nielsen, Brett Baker, Olga Maxwell, Janet Fletcher
Using melody metrics to compare English speech read by native speakers and by L2 Chinese speakers from Shanghai
Daniel Hirst, Hongwei Ding
Predicting therapist empathy in motivational interviews using language features inspired by psycholinguistic norms
James Gibson, Nikolaos Malandrakis, Francisco Romero, David C. Atkins, Shrikanth S. Narayanan
Therapy language analysis using automatically generated psycholinguistic norms
Nikolaos Malandrakis, Shrikanth S. Narayanan
A dynamic model for behavioral analysis of couple interactions using acoustic features
Wei Xia, James Gibson, Bo Xiao, Brian Baucom, Panayiotis G. Georgiou
Analysis and modeling of the role of laughter in motivational interviewing based psychotherapy conversations
Rahul Gupta, Theodora Chaspari, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan
The discourse value of social signals at topic change moments
Francesca Bonin, Nick Campbell, Carl Vogel
Automatic detection of uncertainty in spontaneous German dialogue
Tobias Schrank, Barbara Schuppler
Face reading from speech — predicting facial action units from audio cues
Fabien Ringeval, Erik Marchi, Marc Mehu, Klaus Scherer, Björn Schuller
A new front-end for classification of non-speech sounds: a study on human whistle
Mahesh Kumar Nandwana, Hynek Bořil, John H. L. Hansen
Robust features for sonorant segmentation in continuous speech
Sri Harsha Dumpala, Bhanu Teja Nellore, Raghu Ram Nevali, Suryakanth V. Gangashetty, B. Yegnanarayana
Reduction of reverberation effects in the MFCC modulation spectrum for improved classification of acoustic signals
Sebastian Gergen, Anil Nagathil, Rainer Martin
Spiking neural networks and the generalised Hough transform for speech pattern detection
Jonathan Dennis, Huy Dat Tran, Haizhou Li
Acoustic event recognition using dominant spectral basis vectors
Woohyun Choi, Sangwook Park, David K. Han, Hanseok Ko
A statistical model-based voice activity detection using multiple DNNs and noise awareness
Inyoung Hwang, Jaeseong Sim, Sang-Hyeon Kim, Kwang-Sub Song, Joon-Hyuk Chang
A universal VAD based on jointly trained deep neural networks
Qing Wang, Jun Du, Xiao Bao, Zi-Rui Wang, Li-Rong Dai, Chin-Hui Lee
Spectrographic speech mask estimation using the time-frequency correlation of speech presence
Ge Zhan, Zhaoqiong Huang, Dongwen Ying, Jielin Pan, Yonghong Yan
Complete-linkage clustering for voice activity detection in audio and visual speech
Houman Ghaemmaghami, David Dean, Shahram Kalantari, Sridha Sridharan, Clinton Fookes
A model based voice activity detector for noisy environments
Kaavya Sriskandaraja, Vidhyasaharan Sethu, Phu Ngoc Le, Eliathamby Ambikairajah
An unsupervised visual-only voice activity detection approach using temporal orofacial features
Fei Tao, John H. L. Hansen, Carlos Busso
Automatic detection of equipment alarms in a neonatal intensive care unit environment: a knowledge-based approach
Ganna Raboshchuk, Peter Jančovič, Climent Nadeu, Alex Peiró Lilja, Münevver Köküer, Blanca Muñoz Mahamud, Ana Riverola de Veciana
“Multilingual” deep neural network for music genre classification
Jia Dai, Wenju Liu, Chongjia Ni, Like Dong, Hong Yang
Accurate endpointing with expected pause duration
Baiyang Liu, Bjorn Hoffmeister, Ariya Rastrow
Locality constrained transitive distance clustering on speech data
Wenbo Liu, Zhiding Yu, Bhiksha Raj, Ming Li
Feature extraction strategies in deep learning based acoustic event detection
Miquel Espi, Masakiyo Fujimoto, Keisuke Kinoshita, Tomohiro Nakatani
An acoustic event detection framework and evaluation metric for surveillance in cars
Peter Transfeld, Simon Receveur, Tim Fingscheidt
Diachronic semantic cohesion for topic segmentation of TV broadcast news
Abdessalam Bouchekif, Géraldine Damnati, Yannick Estève, Delphine Charlet, Nathalie Camelin
Comparison of forced-alignment speech recognition and humans for generating reference VAD
Ivan Kraljevski, Zheng-Hua Tan, Maria Paola Bissiri
Improving voice activity detection in movies
Bernhard Lehner, Gerhard Widmer, Reinhard Sonnleitner
Learning from real users: rating dialogue success with neural networks for reinforcement learning in spoken dialogue systems
Pei-Hao Su, David Vandyke, Milica Gašić, Dongho Kim, Nikola Mrkšić, Tsung-Hsien Wen, Steve Young
A framework to develop context-aware adaptive dialogue system
David Griol, Zoraida Callejas, Ramón López-Cózar
A proposal to develop domain and subtask-adaptive dialog management models
David Griol, Zoraida Callejas
Hypotheses ranking and state tracking for a multi-domain dialog system using multiple ASR alternates
Omar Zia Khan, Jean-Philippe Robichaud, Paul A. Crook, Ruhi Sarikaya
An entropy minimization framework for goal-driven dialogue management
Ji Wu, Miao Li, Chin-Hui Lee
Context-dependent error correction of spoken referring expressions
Ingrid Zukerman, Andisheh Partovi, Su Nam Kim
Automatic speaker verification spoofing and countermeasures (ASVspoof 2015): introductory talk by the organizers
Zhizheng Wu, Tomi Kinnunen
ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge
Zhizheng Wu, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Cemal Hanilçi, Md. Sahidullah, Aleksandr Sizov
The AHOLAB RPS SSD spoofing challenge 2015 submission
Jon Sanchez, Ibon Saratxaga, Inma Hernaez, Eva Navas, D. Erro
Human vs machine spoofing detection on wideband and narrowband data
Mirjam Wester, Zhizheng Wu, Junichi Yamagishi
Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge
Xiong Xiao, Xiaohai Tian, Steven Du, Haihua Xu, Eng Siong Chng, Haizhou Li
Classifiers for synthetic speech detection: a comparison
Cemal Hanilçi, Tomi Kinnunen, Md. Sahidullah, Aleksandr Sizov
Combining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs. spoofed speech
Tanvina B. Patel, Hemant A. Patil
Spoofing detection with DNN and one-class SVM for the ASVspoof 2015 challenge
Jesús Villalba, Antonio Miguel, Alfonso Ortega, Eduardo Lleida
Development of CRIM system for the automatic speaker verification spoofing and countermeasures challenge 2015
Md. Jahangir Alam, Patrick Kenny, Gautam Bhattacharya, Themos Stafylakis
Spoofing countermeasure based on analysis of linear prediction error
Artur Janicki
Simultaneous utilization of spectral magnitude and phase information to extract supervectors for speaker verification anti-spoofing
Yi Liu, Yao Tian, Liang He, Jia Liu, Michael T. Johnson
A comparison of features for synthetic speech detection
Md. Sahidullah, Tomi Kinnunen, Cemal Hanilçi
Relative phase information for detecting human speech and spoofed speech
Longbiao Wang, Yohei Yoshida, Yuta Kawakami, Seiichi Nakagawa
Robust deep feature for spoofing detection — the SJTU system for ASVspoof 2015 challenge
Nanxin Chen, Yanmin Qian, Heinrich Dinkel, Bo Chen, Kai Yu
Automatic speaker verification spoofing and countermeasures (ASVspoof 2015): open discussion and future plans
Junichi Yamagishi, Nicholas Evans
Applying GPGPU to recurrent neural network language model based fast network search in the real-time LVCSR
Kyungmin Lee, Chiyoun Park, Ilhwan Kim, Namhoon Kim, Jaewon Lee
Real-time integration of dynamic context information for improving automatic speech recognition
Youssef Oualil, Marc Schulder, Hartmut Helmke, Anna Schmidt, Dietrich Klakow
Rapid vocabulary addition to context-dependent decoder graphs
Cyril Allauzen, Michael Riley
Modeling phonetic context with non-random forests for speech recognition
Hainan Xu, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur
Ant colony algorithm applied to automatic speech recognition graph decoding
Benjamin Lecouteux, Didier Schwab
Garbage modeling for on-device speech recognition
Christophe Van Gysel, Leonid Velikovich, Ian McGraw, Françoise Beaufays
A comparative study of BNF and DNN multilingual training on cross-lingual low-resource speech recognition
Haihua Xu, Van Hai Do, Xiong Xiao, Eng Siong Chng
Neural higher-order factors in conditional random fields for phoneme classification
Martin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf
Stacked auto-encoder for ASR error detection and word error rate prediction
Shahab Jalalvand, Daniele Falavigna
Estimation of the air-tissue boundaries of the vocal tract in the mid-sagittal plane from electromagnetic articulograph data
Satyabrata Parida, Ashok Kumar Pattem, Prasanta Kumar Ghosh
A new Italian dataset of parallel acoustic and articulatory data
Claudia Canevari, Leonardo Badino, Luciano Fadiga
Error analysis of extracted tongue contours from 2D ultrasound images
Tamás Gábor Csapó, Steven M. Lulich
Accuracy of a markerless acquisition technique for studying speech articulators
Andrea Bandini, Slim Ouni, Piero Cosi, Silvia Orlandi, Claudia Manfredi
Measuring oral and nasal airflow in production of Chinese plosive
Yujie Chi, Kiyoshi Honda, Jianguo Wei, Hui Feng, Jianwu Dang
Enhanced videokymographic data analysis based on vocal folds dynamics modeling
Carlo Drioli, Gian Luca Foresti
Interpolation of tongue fleshpoint kinematics from combined EMA position and orientation data
Andrew J. Kolb, Michael T. Johnson, Jeffrey Berry
A new technique for assessing glottal dynamics in speech and singing by means of optical-flow computation
Gustavo Andrade-Miranda, Nathalie Henrich Bernardoni, Juan Ignacio Godino-Llorente
On the incompatibility of trilling and palatalization: a single-subject study of sustained apical and uvular trills
Alexei Kochetov, Phil Howson
Articulatory movement prediction using deep bidirectional long short-term memory based recurrent neural networks and word/phone embeddings
Pengcheng Zhu, Lei Xie, Yunlin Chen
Adapting machine translation models toward misrecognized speech with text-to-speech pronunciation rules and acoustic confusability
Nicholas Ruiz, Qin Gao, William Lewis, Marcello Federico
“Speech is silver, but silence is golden”: improving speech-to-speech translation performance by slashing users input
Frederic Bechet, Benoit Favre, Mickael Rouvier
A study on the stability and effectiveness of features in quality estimation for spoken language translation
Raymond W. M. Ng, Kashif Shah, Lucia Specia, Thomas Hain
Efficient language model adaptation for automatic speech recognition of spoken translations
Joris Pelemans, Tom Vanallemeersch, Kris Demuynck, Hugo Van hamme, Patrick Wambacq
Speed or accuracy? a study in evaluation of simultaneous speech translation
Takashi Mieno, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
Large scale speech-to-text translation with out-of-domain corpora using better context-based models and domain adaptation
Marcin Junczys-Dowmunt, Paweł Przybysz, Arleta Staszuk, Eun-Kyoung Kim, Jaewon Lee
An i-vector backend for speaker verification
Patrick Kenny, Themos Stafylakis, Md. Jahangir Alam, Marcel Kockmann
Multi-channel speaker verification based on total variability modelling
Joana Correia, Alessio Brutti, Alberto Abad
SNR-invariant PLDA modeling for robust speaker verification
Na Li, Man-Wai Mak
Investigating in-domain data requirements for PLDA training
Md. Hafizur Rahman, David Dean, Ahilan Kanagasundaram, Sridha Sridharan
Migrating i-vectors between speaker recognition systems using regression neural networks
Ondřej Glembek, Pavel Matějka, Oldřich Plchot, Jan Pešán, Lukáš Burget, Petr Schwarz
Improving PLDA speaker verification using WMFD and linear-weighted approaches in limited microphone data conditions
Ahilan Kanagasundaram, David Dean, Sridha Sridharan
The relationship between voice source parameters and the maxima dispersion quotient (MDQ)
Christer Gobl, Irena Yanushevskaya, Ailbhe Ní Chasaide
Glottal inverse filtering based on quadratic programming
Manu Airaksinen, Tom Bäckström, Paavo Alku
Automatic detection of creaky voice using epoch parameters
N. P. Narendra, K. Sreenivasa Rao
Perception of voicing in the absence of native voicing experience
Rikke Louise Bundgaard-Nielsen, Brett Baker
The relationship between acoustic and perceived intraspeaker variability in voice quality
Jody Kreiman, Soo Jin Park, Patricia A. Keating, Abeer Alwan
Perceptual cues of whispered tones: are they really special?
Li Jiao, Qiuwu Ma, Ting Wang, Yi Xu
Multiscale recurrent neural network based language model
Tsuyoshi Morioka, Tomoharu Iwata, Takaaki Hori, Tetsunori Kobayashi
Bag-of-words input for long history representation in neural network-based language models for speech recognition
Kazuki Irie, Ralf Schlüter, Hermann Ney
Efficient machine translation decoding with slow language models
Ahmad Emami
Latent words recurrent neural network language models
Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi, Akinori Ito
Combining multiple-type input units using recurrent neural network for LVCSR language modeling
Vataya Chunwijitra, Ananlada Chotimongkol, Chai Wutiwiwatchai
Prosodically-enhanced recurrent neural network language models
Siva Reddy Gangireddy, Steve Renals, Yoshihiko Nankaku, Akinobu Lee
Biosignal-based spoken communication: welcome and introduction
Matthias Janke, Michael Wand
A comprehensive 3D biomechanically-driven vocal tract model including inverse dynamics for speech research
Peter Anderson, Negar M. Harandi, Scott Moisik, Ian Stavness, Sidney Fels
Low frequency ultrasonic voice activity detection using convolutional neural networks
Ian McLoughlin, Yan Song
Real-time control of a DNN-based articulatory synthesizer for silent speech conversion: a pilot study
Florent Bocquelet, Thomas Hueber, Laurent Girin, Christophe Savariaux, Blaise Yvert
Tongue tracking in ultrasound images using eigentongue decomposition and artificial neural networks
Diandra Fabre, Thomas Hueber, Florent Bocquelet, Pierre Badin
Speaker-independent silent speech recognition with across-speaker articulatory normalization and speaker adaptive training
Jun Wang, Seongjun Hahm
Codebook clustering for unit selection based EMG-to-speech conversion
Lorenz Diener, Matthias Janke, Tanja Schultz
Flexible tracking of auditory attention
Majid Mirbagheri, Bradley Ekin, Les Atlas, Adrian K. C. Lee
Biosignal-based spoken communication: panel and discussion
Matthias Janke, Michael Wand
A study on deep neural network acoustic model adaptation for robust far-field speech recognition
Seyedmahdad Mirsamadi, John H. L. Hansen
Speech dereverberation using long short-term memory
Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Reverberation robust acoustic modeling using i-vectors with time delay neural networks
Vijayaditya Peddinti, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur
Delta-melspectra features for noise robustness to DNN-based ASR systems
Kshitiz Kumar, Chaojun Liu, Yifan Gong
Combating reverberation in large vocabulary continuous speech recognition
Vikramjit Mitra, Julien Van Hout, Mitchell McLaren, Wen Wang, Martin Graciarena, Dimitra Vergyri, Horacio Franco
Three ways to adapt a CTS recognizer to unseen reverberated speech in BUT system for the ASpIRE challenge
Martin Karafiát, František Grézl, Lukáš Burget, Igor Szöke, Jan Černocký
Robust parameter estimation for audio declipping in noise
Mark J. Harvilla, Richard M. Stern
Multi-task learning deep neural networks for speech feature denoising
Bin Huang, Dengfeng Ke, Hao Zheng, Bo Xu, Yanyan Xu, Kaile Su
Time-frequency masking for large scale robust speech recognition
Yuxuan Wang, Ananya Misra, Kean K. Chin
Efficient use of DNN bottleneck features in generalized variable parameter HMMs for noise robust speech recognition
Rongfeng Su, Xurong Xie, Xunying Liu, Lan Wang
Investigating modulation spectrogram features for deep neural network-based automatic speech recognition
Deepak Baby, Hugo Van hamme
Deep neural network based spectral feature mapping for robust speech recognition
Kun Han, Yanzhang He, Deblin Bagchi, Eric Fosler-Lussier, DeLiang Wang
Analyzing speech rate entrainment and its relation to therapist empathy in drug addiction counseling
Bo Xiao, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. Narayanan
Agreement and disagreement utterance detection in conversational speech by extracting and integrating local features
Atsushi Ando, Taichi Asami, Manabu Okamoto, Hirokazu Masataki, Sumitaka Sakauchi
Still together?: the role of acoustic features in predicting marital outcome
Md. Nasir, Wei Xia, Bo Xiao, Brian Baucom, Shrikanth S. Narayanan, Panayiotis G. Georgiou
On evaluation metrics for social signal detection
Gábor Gosztolya
Laughter and filler detection in naturalistic audio
Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen
Automatic formatted transcripts for videos
Aasish Pappu, Amanda Stent
Does my speech rock? automatic assessment of public speaking skills
Lucas Azaïs, Adrien Payan, Tianjiao Sun, Guillaume Vidal, Tina Zhang, Eduardo Coutinho, Florian Eyben, Björn Schuller
Verbal intelligence identification based on text classification
Roman Sergienko, Alexander Schmitt
A multimodal approach for automatic assessment of school principals' oral presentation during pre-service training program
Shan-Wen Hsiao, Hung-Ching Sun, Ming-Chuan Hsieh, Ming-Hsueh Tsai, Hsin-Chih Lin, Chi-Chun Lee
Are you TED talk material? comparing prosody in professors and TED speakers
T. J. Tsai
Detection of cognitive states and their correlation to speech recognition performance in speech-to-speech machine translation systems
Hayakawa Akira, Fasih Haider, Loredana Cerrato, Nick Campbell, Saturnino Luz
Perceptual speech quality dimensions in a conversational situation
Friedemann Köster, Sebastian Möller
Multidimensional evaluation and predicting overall speech quality
Jens Berger, Anna Llagostera
On speech intelligibility estimation of phase-aware single-channel speech enhancement
Andreas Gaich, Pejman Mowlaee
A framework for the evaluation of microscopic intelligibility models
Ricard Marxer, Martin Cooke, Jon Barker
A binaural short time objective intelligibility measure for noisy and enhanced speech
Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen
A glimpse-based approach for predicting binaural intelligibility with single and multiple maskers in anechoic conditions
Yan Tang, Martin Cooke, Bruno M. Fazenda, Trevor J. Cox
Improving the prediction power of the speech transmission index to account for non-linear distortions introduced by noise-reduction algorithms
Fei Chen
DNN-based speech bandwidth expansion and its application to adding high-frequency missing features for automatic speech recognition of narrowband speech
Kehuang Li, Zhen Huang, Yong Xu, Chin-Hui Lee
Speech quality evaluation of artificial bandwidth extension: comparing subjective judgments and instrumental predictions
Hannu Pulakka, Ville Myllylä, Anssi Rämö, Paavo Alku
Synchronous overlap and add of spectra for enhancement of excitation in artificial bandwidth extension of speech
M. A. Tuğtekin Turan, Engin Erzin
Speech bandwidth expansion based on deep neural networks
Yingxue Wang, Shenghui Zhao, Wenbo Liu, Ming Li, Jingming Kuang
A novel method of artificial bandwidth extension using deep architecture
Bin Liu, Jianhua Tao, Zhengqi Wen, Ya Li, Danish Bukhari
Annotating large lattices with the exact word error
Rogier C. van Dalen, Mark J. F. Gales
Semi-supervised maximum mutual information training of deep neural network acoustic models
Vimal Manohar, Daniel Povey, Sanjeev Khudanpur
Rectified linear neural networks with tied-scalar regularization for LVCSR
Shiliang Zhang, Hui Jiang, Si Wei, Li-Rong Dai
Segmental conditional random fields with deep neural networks as acoustic models for first-pass word recognition
Yanzhang He, Eric Fosler-Lussier
Distinct triphone acoustic modeling using deep neural networks
Dongpeng Chen, Brian Mak
Minimum word error training of RNN-based voice activity detection
Gregory Gelly, Jean-Luc Gauvain
Vocal biomarkers to discriminate cognitive load in a working memory task
Thomas F. Quatieri, James R. Williamson, Christopher J. Smalt, Tejash Patel, Joseph Perricone, Daryush D. Mehta, Brian S. Helfer, Gregory Ciccarelli, Darrell Ricke, Nicolas Malyska, Jeff Palmer, Kristin Heaton, Marianna Eddy, Joseph Moran
I-vector based physical task stress detection with different fusion strategies
Chunlei Zhang, Gang Liu, Chengzhu Yu, John H. L. Hansen
Automatic detection of mild cognitive impairment from spontaneous speech using ASR
László Tóth, Gábor Gosztolya, Veronika Vincze, Ildikó Hoffmann, Gréta Szatlóczki, Edit Biró, Fruzsina Zsura, Magdolna Pákáski, János Kálmán
Contemporary stochastic feature selection algorithms for speech-based emotion recognition
Maxim Sidorov, Christina Brester, Alexander Schmitt
Effect of different jitter-induced glottal pulse shape changes in periodicity perturbation measures
Carlos A. Ferrer, Diana Torres, Eduardo González, José Ramón Calvo, Eduardo Castillo
Automatic audio sentiment extraction using keyword spotting
Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen
Unsupervised relation detection using automatic alignment of query patterns extracted from knowledge graphs and query click logs
Panupong Pasupat, Dilek Hakkani-Tür
A latent variable model for joint pause prediction and dependency parsing
The Tung Nguyen, Graham Neubig, Hiroyuki Shindo, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
Extractive meeting summarization through speaker zone detection
Mohammad Hadi Bokaei, Hossein Sameti, Yang Liu
Positional language modeling for extractive broadcast news speech summarization
Shih-Hung Liu, Kuan-Yu Chen, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, Wen-Lian Hsu
Speech-based location estimation of first responders in a simulated search and rescue scenario
Saeid Mokaram, Roger K. Moore
Constructive feedback, thinking process and cooperation: assessing the quality of classroom interaction
Tahir Sousa, Lucie Flekova, Margot Mieskes, Iryna Gurevych
A real-time variable-Q non-stationary Gabor transform for pitch shifting
Dong-Yan Huang, Minghui Dong, Haizhou Li
Many-to-many voice conversion based on multiple non-negative matrix factorization
Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki
Statistical singing voice conversion based on direct waveform modification with global variance
Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
System fusion for high-performance voice conversion
Xiaohai Tian, Zhizheng Wu, Siu Wa Lee, Quy Hy Nguyen, Minghui Dong, Eng Siong Chng
Speaker adaptation using only vocalic segments via frequency warping
Agustin Alonso, D. Erro, Eva Navas, Inma Hernaez
Non-audible murmur enhancement based on statistical conversion using air- and body-conductive microphones in noisy environments
Yusuke Tajiri, Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura
Advanced crowdsourcing for speech and beyond: introduction by the organizers
Tim Polzehl, Gina-Anne Levow
Transcribing continuous speech using mismatched crowdsourcing
Preethi Jyothi, Mark Hasegawa-Johnson
Selection and aggregation techniques for crowdsourced semantic annotation task
Shammur Absar Chowdhury, Marcos Calvo, Arindam Ghosh, Evgeny A. Stepanov, Ali Orkan Bayer, Giuseppe Riccardi, Fernando García, Emilio Sanchis
Controlling quality and handling fraud in large scale crowdsourcing speech data collections
Spencer Rothwell, Ahmad Elshenawy, Steele Carter, Daniela Braga, Faraz Romani, Michael Kennewick, Bob Kennewick
Data collection and annotation for state-of-the-art NER using unmanaged crowds
Spencer Rothwell, Steele Carter, Ahmad Elshenawy, Vladislavs Dovgalecs, Safiyyah Saleem, Daniela Braga, Bob Kennewick
Robustness in speech quality assessment and temporal training expiry in mobile crowdsourcing environments
Tim Polzehl, Babak Naderi, Friedemann Köster, Sebastian Möller
Effect of trapping questions on the reliability of speech quality judgments in a crowdsourcing paradigm
Babak Naderi, Tim Polzehl, Ina Wechsung, Friedemann Köster, Sebastian Möller
Voice Äpp: a mobile app for crowdsourcing Swiss German dialect data
Adrian Leemann, Marie-José Kolly, Jean-Philippe Goldman, Volker Dellwo, Ingrid Hove, Ibrahim Almajai, Sarah Grimm, Sylvain Robert, Daniel Wanitsch
Expert and crowdsourced annotation of pronunciation errors for automatic scoring systems
Anastassia Loukina, Melissa Lopez, Keelan Evanini, David Suendermann-Oeft, Klaus Zechner
Capcap: an output-agreement game for video captioning
Hernisa Kacorri, Kaoru Shinkawa, Shin Saito
Auris populi: crowdsourced native transcriptions of Dutch vowels spoken by adult Spanish learners
Pepi Burgos, Eric Sanders, Catia Cucchiarini, Roeland van Hout, Helmer Strik
Crowdsource a little to label a lot: labeling a speech corpus of dialectal Arabic
Samantha Wray, Ahmed Ali
Using keyword spotting to help humans correct captioning faster
Yashesh Gaur, Florian Metze, Yajie Miao, Jeffrey P. Bigham
Validating and optimizing a crowdsourced method for gradient measures of child speech
Tara McAllister Byun, Elaine Hitchcock, Daphna Harel
Joint training of speech separation, filterbank and acoustic model for robust automatic speech recognition
Zhong-Qiu Wang, DeLiang Wang
Joint environment and speaker normalization using factored front-end CMLLR
Shakti Rath, Sunil Sivadas, Bin Ma
Robust speech recognition using DNN-HMM acoustic model combining noise-aware training with spectral subtraction
Akihiro Abe, Kazumasa Yamamoto, Seiichi Nakagawa
Robust i-vector extraction for neural network adaptation in noisy environment
Chengzhu Yu, Atsunori Ogawa, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, John H. L. Hansen
Spectrally selective dithering for distorted speech recognition
Michal Borsky, Petr Mizera, Petr Pollak
Feature-space speaker adaptation for probabilistic linear discriminant analysis acoustic models
Liang Lu, Steve Renals
Speaker adaptation using the i-vector technique for bottleneck features
Patrick Cardinal, Najim Dehak, Yu Zhang, James Glass
I-vector estimation using informative priors for adaptation of deep neural networks
Penny Karanasou, Mark J. F. Gales, Philip C. Woodland
Robust i-vector based adaptation of DNN acoustic model for speech recognition
Sri Garimella, Arindam Mandal, Nikko Strom, Bjorn Hoffmeister, Spyros Matsoukas, Sree Hari Krishnan Parthasarathi
GMM-derived features for effective unsupervised adaptation of deep neural network acoustic models
Natalia Tomashenko, Yuri Khokhlov
Unsupervised adaptation for deep neural network using linear least square method
Roger Hsiao, Tim Ng, Stavros Tsakalidis, Long Nguyen, Richard Schwartz
Ensemble speaker modeling using speaker adaptive training deep neural network for speaker adaptation
Sheng Li, Xugang Lu, Yuya Akita, Tatsuya Kawahara
Data-selective transfer learning for multi-domain speech recognition
Mortaza Doulaty, Oscar Saz, Thomas Hain
Language-independent method for analysis of German stuttering recordings
Tomas Lustyk, Petr Bergl, Tino Haderlein, Elmar Nöth, Roman Cmejla
An investigation of MDVP parameters for voice pathology detection on three different databases
Ahmed Al-nasheri, Zulfiqar Ali, Ghulam Muhammad, Mansour Alsulaiman
Energy distribution analysis and nonlinear dynamical analysis of adductor spasmodic dysphonia
Jiantao Wu, Ping Yu, Nan Yan, Lan Wang, Xiaohui Yang, Manwa L. Ng
Auditory-visual tone perception in hearing impaired Thai listeners
Benjawan Kasisopa, Nittayapa Klangpornkun, Denis Burnham
Speech intelligibility decline in individuals with fast and slow rates of ALS progression
Panying Rong, Yana Yunusova, Jordan R. Green
Latency analysis of speech shadowing reveals processing differences in Japanese adults who do and do not stutter
Rong Na A, Koichi Mori, Naomi Sakai
A syllable-based analysis of speech temporal organization: a comparison between speaking styles in dysarthric and healthy populations
Brigitte Bigi, Katarzyna Klessa, Laurianne Georgeton, Christine Meunier
Autonomous measurement of speech intelligibility utilizing automatic speech recognition
Bernd T. Meyer, Birger Kollmeier, Jasper Ooster
Can you hear me? acoustic modifications in speech directed to foreigners and hearing-impaired people
Monja Angelika Knoll, Melissa Johnstone, Charlene Blakely
Improving automatic forced alignment for dysarthric speech transcription
Yu Ting Yeung, Ka Ho Wong, Helen Meng
Communicative needs and respiratory constraints
Marcin Włodarczak, Mattias Heldner, Jens Edlund
Analysis and classification of cooperative and competitive dialogs
Uwe D. Reichel, Nina Pörner, Dianne Nowack, Jennifer Cole
Towards automatic detection of reported speech in dialogue using prosodic cues
Alessandra Cervone, Catherine Lai, Silvia Pareti, Peter Bell
Modeling phrasing and prominence using deep recurrent learning
Andrew Rosenberg, Raul Fernandez, Bhuvana Ramabhadran
Pitch declination and reset as a function of utterance duration in conversational speech data
Céline De Looze, Irena Yanushevskaya, Andy Murphy, Eoghan O'Connor, Christer Gobl
Investigating the role of 'yeah' in stance-dense conversation
Valerie Freeman, Gina-Anne Levow, Richard Wright, Mari Ostendorf
Enhanced processing of a lost language: linguistic knowledge or linguistic skill?
Jiyoun Choi, Mirjam Broersma, Anne Cutler
Production inconsistencies delay adaptation to foreign accents
Ann-Kathrin Grohe, Gregory J. Poarch, Adriana Hanulíková, Andrea Weber
Acquisition of English speech rhythm by monolingual children
Mikhail Ordin, Leona Polyanskaya
Durational information in word-initial lexical embeddings in spoken Dutch
Odette Scharenborg
The development of categorical perception of lexical tones in Mandarin-speaking preschoolers
Fei Chen, Nan Yan, Lan Wang, Tao Yang, Jiantao Wu, Han Zhao, Gang Peng
Perception of Italian liquids by Japanese listeners: comparisons to Spanish liquids
Tomohiko Ooigawa
The IBM 2015 English conversational telephone speech recognition system
George Saon, Hong-Kwang J. Kuo, Steven Rennie, Michael Picheny
The Cambridge University 2014 BOLT conversational telephone Mandarin Chinese LVCSR system for speech translation
Xunying Liu, Federico Flego, Linlin Wang, C. Zhang, Mark J. F. Gales, Philip C. Woodland
The IBM BOLT speech transcription system
Samuel Thomas, George Saon, Hong-Kwang J. Kuo, Lidia Mangu
Improvements in RWTH LVCSR evaluation systems for Polish, Portuguese, English, Urdu, and Arabic
M. Ali Basha Shaik, Zoltán Tüske, M. Ali Tahir, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney
Active learning based data selection for limited resource STT and KWS
Thiago Fraga-Silva, Jean-Luc Gauvain, Lori Lamel, Antoine Laurent, Viet-Bac Le, Abdel Messaoudi
Improved Hindi broadcast ASR by adapting the language model and pronunciation model using a priori syntactic and morphophonemic knowledge
Preethi Jyothi, Mark Hasegawa-Johnson
The Zero Resource Speech Challenge 2015
Maarten Versteegh, Roland Thiollière, Thomas Schatz, Xuan Nga Cao, Xavier Anguera, Aren Jansen, Emmanuel Dupoux
Discovering discrete subword units with binarized autoencoders and hidden-Markov-model encoders
Leonardo Badino, Alessio Mereta, Lorenzo Rosasco
A hybrid dynamic time warping-deep neural network architecture for unsupervised acoustic modeling
Roland Thiollière, Ewan Dunbar, Gabriel Synnaeve, Maarten Versteegh, Emmanuel Dupoux
Automatic segmentation and clustering of speech using sparse coding and metaheuristic search
Wiehan Agenbag, Thomas Niesler
Parallel inference of Dirichlet process Gaussian mixture models for unsupervised acoustic modeling: a feasibility study
Hongjie Chen, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li
Using articulatory features and inferred phonological segments in zero resource speech processing
Pallavi Baljekar, Sunayana Sitaram, Prasanna Kumar Muthukumar, Alan W. Black
A comparison of neural network methods for unsupervised representation learning on the Zero Resource Speech Challenge
Daniel Renshaw, Herman Kamper, Aren Jansen, Sharon Goldwater
Unsupervised word discovery from speech using automatic segmentation into syllable-like units
Okko Räsänen, Gabriel Doyle, Michael C. Frank
An evaluation of graph clustering methods for unsupervised term discovery
Vince Lyzinski, Gregory Sell, Aren Jansen
A time delay neural network architecture for efficient modeling of long temporal contexts
Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur
Long short-term memory based convolutional recurrent neural networks for large vocabulary speech recognition
Xiangang Li, Xihong Wu
Parameterised sigmoid and ReLU hidden activation functions for DNN acoustic modelling
C. Zhang, Philip C. Woodland
Discriminative template learning in group-convolutional networks for invariant speech representations
Chiyuan Zhang, Stephen Voinea, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso Poggio
Investigation of parametric rectified linear units for noise robust speech recognition
Sunil Sivadas, Zhenzhou Wu, Ma Bin
Multi-softmax deep neural network for semi-supervised training
Hang Su, Haihua Xu
A multi-region deep neural network model in speech recognition
Jia Cui, George Saon, Bhuvana Ramabhadran, Brian Kingsbury
A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition
Liang Lu, Xingxing Zhang, Kyunghyun Cho, Steve Renals
Gaussian free cluster tree construction using deep neural network
Linchen Zhu, Kevin Kilgour, Sebastian Stüker, Alex Waibel
Very deep convolutional neural networks for LVCSR
Mengxiao Bi, Yanmin Qian, Kai Yu
Transferring knowledge from a RNN to a DNN
William Chan, Nan Rosemary Ke, Ian Lane
SVD-based universal DNN modeling for multiple scenarios
Changliang Liu, Jinyu Li, Yifan Gong
Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks
Zhuo Chen, Shinji Watanabe, Hakan Erdogan, John R. Hershey
Speaker-dependent multipitch tracking using deep neural networks
Yuzhou Liu, DeLiang Wang
An error correction scheme for GCI detection algorithms using pitch smoothness criterion
Sujith P., A. P. Prathosh, A. G. Ramakrishnan, Prasanta Kumar Ghosh
Robust pitch estimation in noisy speech using ZTW and group delay function
RaviShankar Prasad, B. Yegnanarayana
Robust localization of single sound source based on phase difference regression
Zhaoqiong Huang, Ge Zhan, Dongwen Ying, Yonghong Yan
Frequency map selection using a RBFN-based classifier in the MVDR beamformer for speaker localization in reverberant rooms
Daniele Salvati, Carlo Drioli, Gian Luca Foresti
Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions
Ning Ma, Guy J. Brown, Tobias May
Joint optimization of recurrent networks exploiting source auto-regression for source separation
Shuai Nie, Wei Xue, Shan Liang, Xueliang Zhang, Wenju Liu, Liwei Qiao, Jianping Li
Real-time audio-to-score alignment of singing voice based on melody and lyric information
Rong Gong, Philippe Cuvillier, Nicolas Obin, Arshia Cont
Vocal separation from monaural music using adaptive auditory filtering based on kernel back-fitting
Jun-Yong Lee, Hye-Seung Cho, Hyoung-Gook Kim
A two-stage singing voice separation algorithm using spectro-temporal modulation features
Frederick Z. Yen, Mao-Chang Huang, Tai-Shih Chi
Robust sound event classification using LBP-HOG based bag-of-audio-words feature representation
Hyungjun Lim, Myung Jong Kim, Hoirin Kim
Action planning and congruency effect between articulation and grasping
Mikko Tiainen, Lari Vainio, Kaisa Tiippana, Naeem Komeilipoor, Martti Vainio
Cognitive workload and vocabulary sparseness: theory and practice
Ron M. Hecht, Aharon Bar-Hillel, Stas Tiomkin, Hadar Levi, Omer Tsimhoni, Naftali Tishby
Counting competing speakers in a timeframe — human versus computer
Valentin Andrei, Horia Cucu, Andi Buzo, Corneliu Burileanu
Segmental contribution to the intelligibility of ideal binary-masked sentences
Fei Chen, Alexander Siu Tai Kwok
Perception of an existing and non-existing L2 English phoneme behind noise by Japanese native speakers
Mako Ishida, Takayuki Arai
Viseme comparison based on phonetic cues for varying speech accents
Chitralekha Bhat, Sunil Kopparapu
Quantifying difference in vocalizations of bird populations
Colm O'Reilly, Nicola M. Marples, David J. Kelly, Naomi Harte
Reverberation-robust acoustic indoor localization
Jae Choi, Jeunghun Kim, Shin Jae Kang, Nam Soo Kim
An alternating optimization approach for phase retrieval
Huaiping Ming, Dong-Yan Huang, Lei Xie, Haizhou Li, Minghui Dong
Learning to estimate reverberation time in noisy and reverberant rooms
Xiong Xiao, Shengkui Zhao, Xionghu Zhong, Douglas L. Jones, Eng Siong Chng, Haizhou Li
Direction of arrival estimation based on reverberation weighting and noise error estimator
Cheng Pang, Jie Zhang, Hong Liu
Representing nonspeech audio signals through speech classification models
Huy Phan, Lars Hertel, Marco Maass, Radoslaw Mazur, Alfred Mertins
Mitigating the effects of non-stationary unseen noises on language recognition performance
Luciana Ferrer, Mitchell McLaren, Aaron Lawson, Martin Graciarena
An information theory based data-homogeneity measure for voice comparison
Moez Ajili, Jean-François Bonastre, Solange Rossato, Juliette Kahn, Itshak Lapidot
The QUT-NOISE-SRE protocol for the evaluation of noisy speaker recognition
David Dean, Ahilan Kanagasundaram, Houman Ghaemmaghami, Md. Hafizur Rahman, Sridha Sridharan
Score stabilization for speaker recognition trained on a small development set
Hagai Aronowitz
Anti-spoofing system: an investigation of measures to detect synthetic and human speech
Abhinav Misra, Shivesh Ranjan, Chunlei Zhang, John H. L. Hansen
A likelihood ratio-based forensic voice comparison in microphone vs. mobile mismatched conditions using Japanese /ai/
Michael J. Carne
Are we using enough listeners? No! — an empirically-supported critique of Interspeech 2014 TTS evaluations
Mirjam Wester, Cassia Valentini-Botinhao, Gustav Eje Henter
How to compare TTS systems: a new subjective evaluation methodology focused on differences
Jonathan Chevelu, Damien Lolive, Sébastien Le Maguer, David Guennec
Double-ended prediction of the naturalness ratings of the Blizzard Challenge 2008-2013
Lukas Latacz, Werner Verhelst
Entropy-based sentence selection for speech synthesis using phonetic and prosodic contexts
Takashi Nose, Yusuke Arao, Takao Kobayashi, Komei Sugiura, Yoshinori Shiga, Akinori Ito
A comparison of speech synthesis systems based on GPR, HMM, and DNN with a small amount of training data
Tomoki Koriyama, Takao Kobayashi
Objective intelligibility assessment of text-to-speech systems through utterance verification
Raphael Ullmann, Ramya Rasipuram, Mathew Magimai-Doss, Hervé Bourlard
Continuous word representation using neural networks for proper name retrieval from diachronic documents
Dominique Fohr, Irina Illina
Recurrent neural network language model adaptation for multi-genre broadcast speech recognition
X. Chen, T. Tan, Xunying Liu, Pierre Lanchantin, M. Wan, Mark J. F. Gales, Philip C. Woodland
Paragraph vector based topic model for language model adaptation
Wengong Jin, Tianxing He, Yanmin Qian, Kai Yu
Personalized speech recognizer with keyword-based personalized lexicon and language model using word vector representations
Ching-Feng Yeh, Yuan-ming Liou, Hung-yi Lee, Lin-shan Lee
Discriminative data selection for lightly supervised training of acoustic model using closed caption texts
Sheng Li, Yuya Akita, Tatsuya Kawahara
Cross-lingual transfer learning during supervised training in low resource scenarios
Amit Das, Mark Hasegawa-Johnson
Robust speech processing using observation uncertainty and uncertainty propagation: session and paper overview
Ramón F. Astudillo, Shinji Watanabe, Ahmed Hussen Abdelaziz, Dorothea Kolossa
Uncertainty propagation for noise robust speaker recognition: the case of NIST-SRE
Dayana Ribas, Emmanuel Vincent, José Ramón Calvo
Uncertainty training and decoding methods of deep neural networks based on stochastic representation of enhanced features
Yuuki Tachioka, Shinji Watanabe
Accounting for uncertainty of i-vectors in speaker recognition using uncertainty propagation and modified imputation
Rahim Saeidi, Paavo Alku
Autoencoder based multi-stream combination for noise robust speech recognition
Sri Harish Mallidi, Tetsuji Ogawa, Karel Veselý, Phani S. Nidadavolu, Hynek Hermansky
Uncertainty decoding for DNN-HMM hybrid systems based on numerical sampling
Christian Huemmer, Roland Maas, Andreas Schwarz, Ramón F. Astudillo, Walter Kellermann
Uncertainty propagation through deep neural networks
Ahmed Hussen Abdelaziz, Shinji Watanabe, John R. Hershey, Emmanuel Vincent, Dorothea Kolossa
Handling derivative filterbank features in bounded-marginalization-based missing data automatic speech recognition
Marco Kühne
Large-scale, sequence-discriminative, joint adaptive training for masking-based robust ASR
Arun Narayanan, Ananya Misra, Kean K. Chin
Integration of DNN based speech enhancement and ASR
Ramón F. Astudillo, Joana Correia, Isabel Trancoso
A general artificial neural network extension for HTK
C. Zhang, Philip C. Woodland
Audio augmentation for speech recognition
Tom Ko, Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur
A diversity-penalizing ensemble training method for deep learning
Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur
Deep neural network training emphasizing central frames
Gakuto Kurata, Daniel Willett
Training deep bidirectional LSTM acoustic model for LVCSR by a context-sensitive-chunk BPTT approach
Kai Chen, Zhi-Jie Yan, Qiang Huo
Structured output layer with auxiliary targets for context-dependent acoustic modelling
Pawel Swietojanski, Peter Bell, Steve Renals
Complementary tasks for context-dependent deep neural network acoustic models
Peter Bell, Steve Renals
Towards end-to-end speech recognition for Chinese Mandarin using long short-term memory recurrent neural networks
Jie Li, Heng Zhang, Xinyuan Cai, Bo Xu
Improving deep neural networks based multi-accent Mandarin speech recognition using i-vectors and accent-specific top layer
Mingming Chen, Zhanlei Yang, Jizhong Liang, Yanpeng Li, Wenju Liu
Rapid adaptation for deep neural networks through multi-task learning
Zhen Huang, Jinyu Li, Sabato Marco Siniscalchi, I-Fan Chen, Ji Wu, Chin-Hui Lee
fMLLR based feature-space speaker adaptation of DNN acoustic models
Sree Hari Krishnan Parthasarathi, Bjorn Hoffmeister, Spyros Matsoukas, Arindam Mandal, Nikko Strom, Sri Garimella
I-vector dependent feature space transformations for adaptive speech recognition
Xiangang Li, Xihong Wu
Unsupervised domain discovery using latent Dirichlet allocation for acoustic modelling in speech recognition
Mortaza Doulaty, Oscar Saz, Thomas Hain
Training data selection for acoustic modeling via submodular optimization of joint Kullback-Leibler divergence
Taichi Asami, Ryo Masumura, Hirokazu Masataki, Manabu Okamoto, Sumitaka Sakauchi
Combination of NN and CRF models for joint detection of punctuation and disfluencies
Eunah Cho, Kevin Kilgour, Jan Niehues, Alex Waibel
Tunable keyword-aware language modeling and context dependent fillers for LVCSR-based spoken keyword search
Tze Siong Lau, I-Fan Chen, Chin-Hui Lee
Joint decoding of tandem and hybrid systems for improved keyword spotting on low resource languages
Haipeng Wang, Anton Ragni, Mark J. F. Gales, Kate M. Knill, Philip C. Woodland, C. Zhang
Preserving word-level emphasis in speech-to-speech translation using linear regression HSMMs
Quoc Truong Do, Shinnosuke Takamichi, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura
Phonology-augmented statistical transliteration for low-resource languages
Hoang Gia Ngo, Nancy F. Chen, Binh Minh Nguyen, Bin Ma, Haizhou Li
Evaluation of re-ranking by prioritizing highly ranked documents in spoken term detection
Kazuki Oouchi, Ryota Konno, Takahiro Akyu, Kazuma Konno, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh
Distinctive feature based representation of speech for query-by-example spoken term detection
Abhijeet Saxena, B. Yegnanarayana
Combination of diverse subword units in spoken term detection
Shi-wook Lee, Kazuyo Tanaka, Yoshiaki Itoh
Sparse modeling of posterior exemplars for keyword detection
Dhananjay Ram, Afsaneh Asaei, Pranay Dighe, Hervé Bourlard
Stress level detection using double-layer subband filter
Tin Lay Nwe, Qianli Xu, Cuntai Guan, Bin Ma
Prosodic characteristics of read speech before and after treadmill running
Jürgen Trouvain, Khiet P. Truong
A database for analysis of speech under physical stress: detection of exercise intensity while running and talking
Khiet P. Truong, Arne Nieuwenhuys, Peter Beek, Vanessa Evers
Stressed out: what speech tells us about stress
Will Paul, Cecilia Ovesdotter Alm, Reynold Bailey, Joe Geigel, Linwei Wang
Prediction of heart rate changes from speech features during interaction with a misbehaving dialog system
Andreas Tsiartas, Andreas Kathol, Elizabeth Shriberg, Massimiliano de Zambotti, Adrian Willoughby
Acoustic correlates for perceived effort levels in expressive speech
Mary Pietrowicz, Mark Hasegawa-Johnson, Karrie Karahalios
Pitch-based speech perturbation measures using a novel GCI detection algorithm: application to pathological voice classification
Khalid Daoudi, Ashwini Jaya Kumar
Speech-based assessment of PTSD in a military population using diverse feature classes
Dimitra Vergyri, Bruce Knoth, Elizabeth Shriberg, Vikramjit Mitra, Mitchell McLaren, Luciana Ferrer, Pablo Garcia, Charles Marmar
Cognitive impairment prediction in the elderly based on vocal biomarkers
Bea Yu, Thomas F. Quatieri, James R. Williamson, James C. Mundt
Automatic age detection in normal and pathological voice
J.-A. Gómez-García, L. Moro-Velázquez, Juan Ignacio Godino-Llorente, G. Castellanos-Domínguez