Prosody 1-3

Stable and unstable intervals as a basic segmentation procedure of the speech signal
Ulrike Glavitsch, Lei He, Volker Dellwo

Polysyllabic shortening and word-final lengthening in English
Andreas Windmann, Juraj Šimko, Petra Wagner

The acoustics of word stress in English as a function of stress level and speaking style
Anders Eriksson, Mattias Heldner

Pitch accent distribution in German infant-directed speech
Katharina Zahner, Muna Pohl, Bettina Braun

Acoustic correlates of perceived syllable prominence in German
Hansjörg Mixdorff, Christian Cossio-Mercado, Angelika Hönemann, Jorge Gurlekian, Diego Evin, Humberto Torres

Cross-modality matching of linguistic and emotional prosody
Simone Simonetti, Jeesun Kim, Chris Davis

Pitch scaling as a perceptual cue for questions in German
Jan Michalsky

Parameterization of prosodic headedness
Uwe D. Reichel, Katalin Mády, Štefan Beňuš

Detection of Mizo tones
Biswajit Dev Sarma, Priyankoo Sarmah, Wendy Lalhminghlui, S. R. Mahadeva Prasanna

The intonation of echo wh-questions
Sophie Repp, Lena Rosin

Immediately postverbal questions in Urdu
Farhat Jabeen, Tina Bögel, Miriam Butt

Prosodic (non-)realisation of broad, narrow and contrastive focus in Hungarian: a production and a perception study
Katalin Mády

F0 discontinuity as a marker of prosodic boundary strength in Lombard speech
Štefan Beňuš, Uwe D. Reichel, Juraj Šimko

Comparing journalistic and spontaneous speech: prosodic and spectral analysis
Cédric Gendrot, Martine Adda-Decker, Yaru Wu

Rhythm influences the tonal realisation of focus
Nadja Schauffler, Katrin Schweitzer

Linguistic measures of pitch range in Slavic and Germanic languages
Bistra Andreeva, Bernd Möbius, Grazyna Demenko, Frank Zimmerer, Jeanin Jügler

The effect of stress on vowel space in Daxi Hakka Chinese
Chunan Qiu, Jie Liang

Declination, peak height and pitch level in declaratives and questions of South Connaught Irish
Maria O'Reilly, Ailbhe Ní Chasaide

Contextual variation of tones in Mizo
Priyankoo Sarmah, Leena Dihingia, Wendy Lalhminghlui

The prosodic marking of rhetorical questions in German
Daniela Wochner, Jana Schlegel, Nicole Dehé, Bettina Braun

Spoken Language Understanding 1-3

Deep contextual language understanding in spoken dialogue systems
Chunxi Liu, Puyang Xu, Ruhi Sarikaya

RNN-based labeled data generation for spoken language understanding
Yik-Cheung Tam, Yangyang Shi, Hunk Chen, Mei-Yuh Hwang

Is it time to switch to word embedding and recurrent neural networks for spoken language understanding?
Vedran Vukotic, Christian Raymond, Guillaume Gravier

Recurrent neural network and LSTM models for lexical utterance classification
Suman Ravuri, Andreas Stolcke

Semantic retrieval of personal photos using a deep autoencoder fusing visual features with speech annotations represented as word/paragraph vectors
Hung-tsung Lu, Yuan-ming Liou, Hung-yi Lee, Lin-shan Lee

A comparison of normalization techniques applied to latent space representations for speech analytics
Mohamed Morchid, Richard Dufour, Driss Matrouf

Study of entity-topic models for OOV proper name retrieval
Imran Sheikh, Irina Illina, Dominique Fohr

Audio quotation marks for natural language understanding
Simon Boutin, Réal Tremblay, Patrick Cardinal, Doug Peters, Pierre Dumouchel

Using word confusion networks for slot filling in spoken language understanding
Xiaohao Yang, Jia Liu

Distributed representation-based spoken word sense induction
Justin Chiu, Yajie Miao, Alan W. Black, Alexander I. Rudnicky

Structuring lectures in massive open online courses (MOOCs) for efficient learning by linking similar sections and predicting prerequisites
Sheng-syun Shen, Hung-yi Lee, Shang-wen Li, Victor Zue, Lin-shan Lee

News talk-show chaptering with journalistic genres
Delphine Charlet, Géraldine Damnati, Jérémy Trione

An analysis of time-aggregated and time-series features for scoring different aspects of multimodal presentation data
Vikram Ramanarayanan, Lei Chen, Chee Wee Leong, Gary Feng, David Suendermann-Oeft

Incorporating prosodic prominence evidence into term weights for spoken content retrieval
David N. Racca, Gareth J. F. Jones

Leveraging word embeddings for spoken document summarization
Kuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang, Berlin Chen, Hsin-Hsi Chen

Mutually exclusive grounding for weakly supervised non-negative matrix factorisation
Vincent Renkens, Hugo Van hamme

Using semantic maps for robust natural language interaction with robots
Emanuele Bastianelli, Danilo Croce, Roberto Basili, Daniele Nardi

Efficient learning for spoken language understanding tasks with word embedding based pre-training
Yi Luan, Shinji Watanabe, Bret Harsham

Zero-shot semantic parser for spoken language understanding
Emmanuel Ferreira, Bassam Jabaian, Fabrice Lefèvre

Adapting lexical representation and OOV handling from written to spoken language with word embedding
Jeremie Tafforeau, Thierry Artieres, Benoit Favre, Frederic Bechet

Dialog state tracking using long short-term memory neural networks
Xiaohao Yang, Jia Liu

Detecting repetitions in spoken dialogue systems using phonetic distances
José Lopes, Giampiero Salvi, Gabriel Skantze, Alberto Abad, Joakim Gustafson, Fernando Batista, Raveesh Meena, Isabel Trancoso

Multi-language hypotheses ranking and domain tracking for open domain dialogue systems
Paul A. Crook, Jean-Philippe Robichaud, Ruhi Sarikaya

Measuring mimicry in task-oriented conversations: degree of mimicry is related to task difficulty
Vijay Solanki, Alessandro Vinciarelli, Jane Stuart-Smith, Rachel Smith

Auto-imputing radial basis functions for neural-network turn-taking models
Kornel Laskowski

Effect of gender and call duration on customer satisfaction in call center big data
Quim Llimona, Jordi Luque, Xavier Anguera, Zoraida Hidalgo, Souneil Park, Nuria Oliver

Using profile similarity to measure agreement in personality perception
Zoraida Callejas, David Griol

Relieving mental stress of speakers using a tele-operated robot in foreign language speech education
Shizuka Nakamura, Miki Watanabe, Yuichiro Yoshikawa, Kohei Ogawa, Hiroshi Ishiguro

Backward mimicry and forward influence in prosodic contour choice in Standard American English
Agustín Gravano, Štefan Beňuš, Rivka Levitan, Julia Hirschberg

The role of speakers and context in classifying competition in overlapping speech
Shammur Absar Chowdhury, Morena Danieli, Giuseppe Riccardi

Automatic detection and annotation of disfluencies in spoken French corpora
George Christodoulides, Mathieu Avanzi

Clustering novel intents in a conversational interaction system with semantic parsing
Dilek Hakkani-Tür, Yun-Cheng Ju, Geoffrey Zweig, Gokhan Tur

Semantic analysis of spoken input using Markov logic networks
Vladimir Despotovic, Oliver Walter, Reinhold Haeb-Umbach

Hierarchical discriminative model for spoken language understanding based on convolutional neural network
Jan Švec, Adam Chýlek, Luboš Šmídl

Learning semantic hierarchy with distributed representations for unsupervised spoken language understanding
Yun-Nung Chen, William Yang Wang, Alexander I. Rudnicky

Speaker Recognition and Diarization 1-3

Multi-task learning for text-dependent speaker verification
Nanxin Chen, Yanmin Qian, Kai Yu

JFA for speaker recognition with random digit strings
Themos Stafylakis, Patrick Kenny, Md. Jahangir Alam, Marcel Kockmann

Structured prediction for speaker identification in TV series
Elena Knyazeva, Guillaume Wisniewski, Hervé Bredin, François Yvon

Speaker recognition by means of acoustic and phonetically informed GMMs
Sandro Cumani, Pietro Laface, Farzana Kulsoom

A fast approach to psychoacoustic model compensation for robust speaker recognition in additive noise
Ashish Panda

Blind score normalization method for PLDA based speaker recognition
Danila Doroshin, Nikolay Lubimov, Marina Nastasenko, Mikhail Kotov

Non-linear PLDA for i-vector speaker verification
Sergey Novoselov, Timur Pekhovsky, Oleg Kudashev, Valentin S. Mendelev, Alexey Prudnikov

On the need of template protection for voice authentication
Carlos Vaquero, Patricia Rodríguez

Evaluation and calibration of short-term aging effects in speaker verification
Finnian Kelly, John H. L. Hansen

Phone-centric local variability vector for text-constrained speaker verification
Liping Chen, Kong Aik Lee, Bin Ma, Wu Guo, Haizhou Li, Li-Rong Dai

Cosine distance features for robust speaker verification
Kuruvachan K. George, C. Santhosh Kumar, K I Ramachandran, Ashish Panda

Voice liveness detection algorithms based on pop noise caused by human breath for automatic speaker verification
Sayaka Shiota, Fernando Villavicencio, Junichi Yamagishi, Nobutaka Ono, Isao Echizen, Tomoko Matsui

Noise robust speaker recognition with convolutive sparse coding
Antti Hurmalainen, Rahim Saeidi, Tuomas Virtanen

Combining amplitude and phase-based features for speaker verification with short duration utterances
Md. Jahangir Alam, Patrick Kenny, Themos Stafylakis

The RedDots data collection for speaker recognition
Kong Aik Lee, Anthony Larcher, Guangsen Wang, Patrick Kenny, Niko Brümmer, David van Leeuwen, Hagai Aronowitz, Marcel Kockmann, Carlos Vaquero, Bin Ma, Haizhou Li, Themos Stafylakis, Md. Jahangir Alam, Albert Swart, Javier Perez

Noise-robust speaker recognition based on morphological component analysis
Yongjun He, Chen Chen, Jiqing Han

Analysis of mutual duration and noise effects in speaker recognition: benefits of condition-matched cohort selection in score normalization
Andreas Nautsch, Rahim Saeidi, Christian Rathgeb, Christoph Busch

Robustness to additive noise of locally-normalized cepstral coefficients in speaker verification
Josué Fredes, José Novoa, Victor Poblete, Simon King, Richard M. Stern, Néstor Becerra Yoma

Probabilistic linear discriminant analysis for robust speaker identification in co-channel speech
Navid Shokouhi, John H. L. Hansen

Community detection with manifold learning on speaker i-vector space for Chinese
Hongcui Wang, Di Jin, Lantian Li, Jianwu Dang

A comparison of neural network feature transforms for speaker diarization
Sree Harsha Yella, Andreas Stolcke

Clustering short push-to-talk segments
Ilya Shapiro, Neta Rabin, Irit Opher, Itshak Lapidot

Exploring ANN back-ends for i-vector based speaker age estimation
Anna Fedorova, Ondřej Glembek, Tomi Kinnunen, Pavel Matějka

Analysis of the second phase of the 2013-2014 i-vector machine learning challenge
Désiré Bansé, George R. Doddington, Daniel Garcia-Romero, John J. Godfrey, Craig S. Greenberg, Jaime Hernández-Cordero, John M. Howard, Alvin F. Martin, Lisa P. Mason, Alan McCree, Douglas A. Reynolds

NIST language recognition evaluation — plans for 2015
Alvin F. Martin, Craig S. Greenberg, John M. Howard, Désiré Bansé, George R. Doddington, Jaime Hernández-Cordero, Lisa P. Mason

Factor analysis for speaker segmentation and improved speaker diarization
Brecht Desplanques, Kris Demuynck, Jean-Pierre Martens

Enhanced speaker diarization with detection of backchannels using eye-gaze information in poster conversations
Koji Inoue, Yukoh Wakabayashi, Hiromasa Yoshimoto, Katsuya Takanashi, Tatsuya Kawahara

Novel clustering selection criterion for fast binary key speaker diarization
Héctor Delgado, Xavier Anguera, Corinne Fredouille, Javier Serrano

Speaker diarization with i-vectors from DNN senone posteriors
Gregory Sell, Daniel Garcia-Romero, Alan McCree

Using voice-quality measurements with prosodic and spectral features for speaker diarization
Abraham Woubie, Jordi Luque, Javier Hernando

Integrating online i-vector extractor with information bottleneck based speaker diarization system
Srikanth Madikeri, Ivan Himawan, Petr Motlicek, Marc Ferras

Speech Synthesis 1-3

Phase perception of the glottal excitation of vocoded speech
Tuomo Raitio, Lauri Juvela, Antti Suni, Martti Vainio, Paavo Alku

Using acoustics to improve pronunciation for synthesis of low resource languages
Sunayana Sitaram, Serena Jeblee, Alan W. Black

Sub-band text-to-speech combining sample-based spectrum with statistically generated spectrum
Tadashi Inai, Sunao Hara, Masanobu Abe, Yusuke Ijima, Noboru Miyazaki, Hideyuki Mizuno

Pruning redundant synthesis units based on static and delta unit appearance frequency
Heng Lu, Wei Zhang, Xu Shao, Quan Zhou, Wenhui Lei, Hongbin Zhou, Andrew Breen

Emotional transplant in statistical speech synthesis based on emotion additive model
Yamato Ohtani, Yu Nasu, Masahiro Morita, Masami Akamine

Generalized variable parameter HMMs based acoustic-to-articulatory inversion
Xurong Xie, Xunying Liu, Lan Wang, Rongfeng Su

Semi-supervised training of a voice conversion mapping function using a joint-autoencoder
Seyed Hamidreza Mohammadi, Alexander Kain

On glottal source shape parameter transformation using a novel deterministic and stochastic speech analysis and synthesis system
Stefan Huber, Axel Roebel

Fluent personalized speech synthesis with prosodic word-level spontaneous speech generation
Yi-Chin Huang, Chung-Hsien Wu, Ming-Ge Shie

Non-native speech synthesis preserving speaker individuality based on partial correction of prosodic and phonetic characteristics
Yuji Oshima, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura

Evaluation of state mapping based foreign accent conversion
Markus Toman, Michael Pucher

Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features
Zhizheng Wu, Simon King

Combining extreme learning machine and decision tree for duration prediction in HMM based speech synthesis
Yang Wang, Minghao Yang, Zhengqi Wen, Jianhua Tao

F0 parameterization of glottalized tones for HMM-based Vietnamese TTS
Duy Khanh Ninh, Yoichi Yamashita

Deep neural network context embeddings for model selection in rich-context HMM synthesis
Thomas Merritt, Junichi Yamagishi, Zhizheng Wu, Oliver Watts, Simon King

An investigation of context clustering for statistical speech synthesis with deep neural network
Bo Chen, Zhehuai Chen, Jiachen Xu, Kai Yu

Sentence-level control vectors for deep neural network speech synthesis
Oliver Watts, Zhizheng Wu, Simon King

Micro-structure of disfluencies: basics for conversational speech synthesis
Simon Betz, Petra Wagner, David Schlangen

Using automatic stress extraction from audio for improved prosody modelling in speech synthesis
György Szaszák, András Beke, Gábor Olaszy, Bálint Pál Tóth

Reconstructing voices within the multiple-average-voice-model framework
Pierre Lanchantin, Christophe Veaux, Mark J. F. Gales, Simon King, Junichi Yamagishi

HMM based Myanmar text to speech system
Ye Kyaw Thu, Win Pa Pa, Jinfu Ni, Yoshinori Shiga, Andrew Finch, Chiori Hori, Hisashi Kawai, Eiichiro Sumita

Multiple feed-forward deep neural networks for statistical parametric speech synthesis
Shinji Takaki, SangJin Kim, Junichi Yamagishi, JongJin Kim

Sequence-to-sequence neural net models for grapheme-to-phoneme conversion
Kaisheng Yao, Geoffrey Zweig

Knowledge versus data in TTS: evaluation of a continuum of synthesis systems
Rosie Kay, Oliver Watts, Roberto Barra Chicote, Cassie Mayo

Improving G2P from Wiktionary and other (web) resources
Steffen Eger

BLSTM neural networks for speech driven head motion synthesis
Chuang Ding, Pengcheng Zhu, Lei Xie

Articulatory controllable speech modification based on Gaussian mixture models with direct waveform modification using spectrum differential
Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura

Reconstructing intelligible audio speech from visual speech features
Thomas Le Cornu, Ben Milner

Universal grapheme-based speech synthesis
Sunayana Sitaram, Alok Parlikar, Gopala Krishna Anumanchipalli, Alan W. Black

Artificial personality and disfluency
Mirjam Wester, Matthew Aylett, Marcus Tomalin, Rasmus Dall

Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis
Marc Evrard, Samuel Delalez, Christophe d'Alessandro, Albert Rilliard

A multi-layer F0 model for singing voice synthesis using a B-spline representation with intuitive controls
Luc Ardaillon, Gilles Degottex, Axel Roebel

Creating expressive synthetic voices by unsupervised clustering of audiobooks
Igor Jauk, Antonio Bonafonte, Paula Lopez-Otero, Laura Docio-Fernandez

Articulatory-based conversion of foreign accents with deep neural networks
Sandesh Aryal, Ricardo Gutierrez-Osuna

Interspeech 2015 Computational Paralinguistics ChallengE (ComParE): Degree of Nativeness, Parkinson's & Eating Condition (Special Session)

The INTERSPEECH 2015 computational paralinguistics challenge: nativeness, Parkinson's & eating condition
Björn Schuller, Stefan Steidl, Anton Batliner, Simone Hantke, Florian Hönig, J. R. Orozco-Arroyave, Elmar Nöth, Yue Zhang, Felix Weninger

The degree of nativeness sub-challenge: the data
Florian Hönig

Phrase accentuation verification and phonetic variation measurement for the degree of nativeness sub-challenge
Claude Montacié, Marie-José Caraty

Combining multiple approaches to predict the degree of nativeness
Eugénio Ribeiro, Jaime Ferreira, Julia Olcoz, Alberto Abad, Helena Moniz, Fernando Batista, Isabel Trancoso

Automated evaluation of non-native English pronunciation quality: combining knowledge- and data-driven features at multiple time scales
Matthew P. Black, Daniel Bone, Zisis Iason Skordilis, Rahul Gupta, Wei Xia, Pavlos Papadopoulos, Sandeep Nallan Chakravarthula, Bo Xiao, Maarten Van Segbroeck, Jangwon Kim, Panayiotis G. Georgiou, Shrikanth S. Narayanan

The Parkinson's condition sub-challenge: the data
J. R. Orozco-Arroyave

Estimating the severity of Parkinson's disease from speech using linear regression and database partitioning
Dávid Sztahó, Gábor Kiss, Klára Vicsi

Random forest-based prediction of Parkinson's disease progression using acoustic, ASR and intelligibility features
Alexander Zlotnik, Juan M. Montero, Rubén San-Segundo, Ascensión Gallardo-Antolín

Automatic recognition of unified Parkinson's disease rating from speech with acoustic, i-vector and phonotactic features
Guozhen An, David Guy Brizan, Min Ma, Michelle Morales, Ali Raza Syed, Andrew Rosenberg

Parkinson's condition estimation using speech acoustic and inversely mapped articulatory data
Seongjun Hahm, Jun Wang

Segment-dependent dynamics in predicting Parkinson's disease
James R. Williamson, Thomas F. Quatieri, Brian S. Helfer, Joseph Perricone, Satrajit S. Ghosh, Gregory Ciccarelli, Daryush D. Mehta

The eating condition sub-challenge: the data
Anton Batliner

Automatic classification of eating conditions from speech using acoustic feature selection and a set of hierarchical support vector machine classifiers
Abhay Prasad, Prasanta Kumar Ghosh

Combining hierarchical classification with frequency weighting for the recognition of eating conditions
Johannes Wagner, Andreas Seiderer, Florian Lingenfelser, Elisabeth André

Acoustic group feature selection using wrapper method for automatic eating condition recognition
Dara Pir, Theodore Brown

Comparing SVM, softmax, and shallow neural networks for eating condition classification
Thomas Pellegrini

Using representation learning and out-of-domain data for a paralinguistic speech task
Benjamin Milde, Chris Biemann

Fisher vectors with cascaded normalization for paralinguistic analysis
Heysem Kaya, Alexey A. Karpov, Albert Ali Salah

Automatic estimation of Parkinson's disease severity from diverse speech tasks
Jangwon Kim, Md. Nasir, Rahul Gupta, Maarten Van Segbroeck, Daniel Bone, Matthew P. Black, Zisis Iason Skordilis, Zhaojun Yang, Panayiotis G. Georgiou, Shrikanth S. Narayanan

Assessing the degree of nativeness and Parkinson's condition using Gaussian processes and deep rectifier neural networks
Tamás Grósz, Róbert Busa-Fekete, Gábor Gosztolya, László Tóth

The INTERSPEECH 2015 computational paralinguistics challenge: a summary of results
Stefan Steidl

Wrapping up: the story of the ComParE challenges, what we learned and where to go
Anton Batliner

Pronunciation, Prosody and Audiovisual Features and Models

Recognition of voiced sounds with a continuous state HMM
S. M. Houghton, Colin J. Champion, Philip Weber

Learning speech rate in speech recognition
Xiangyu Zeng, Shi Yin, Dong Wang

Pronunciation and silence probability modeling for ASR
Guoguo Chen, Hainan Xu, Minhua Wu, Daniel Povey, Sanjeev Khudanpur

Exploring minimal pronunciation modeling for low resource languages
Marelie Davel, Etienne Barnard, Charl van Heerden, William Hartmann, Damianos Karakos, Richard Schwartz, Stavros Tsakalidis

Attribute knowledge integration for speech recognition based on multi-task learning neural networks
Hao Zheng, Zhanlei Yang, Liwei Qiao, Jianping Li, Wenju Liu

Detecting audio-visual synchrony using deep neural networks
Etienne Marcheret, Gerasimos Potamianos, Josef Vopicka, Vaibhava Goel

Cross database training of audio-visual hidden Markov models for phone recognition
Shahram Kalantari, David Dean, Houman Ghaemmaghami, Sridha Sridharan, Clinton Fookes

Incorporating visual information for spoken term detection
Shahram Kalantari, David Dean, Sridha Sridharan

Integration of deep bottleneck features for audio-visual speech recognition
Hiroshi Ninomiya, Norihide Kitaoka, Satoshi Tamura, Yurie Iribe, Kazuya Takeda

Automatic detection of sentence prominence in speech using predictability of word-level acoustic features
Sofoklis Kakouros, Okko Räsänen

An empirical model of emphatic word detection
Milos Cernak, Pierre-Edouard Honnet

Using tilt for automatic emphasis detection with Bayesian networks
Yishuang Ning, Zhiyong Wu, Xiaoyan Lou, Helen Meng, Jia Jia, Lianhong Cai

Speech Analysis and Representation 1-3

Analysis of a low-dimensional bottleneck neural network representation of speech for modelling speech dynamics
Linxue Bai, Peter Jančovič, Martin Russell, Philip Weber

Statistical acoustic-to-articulatory mapping unified with speaker normalization based on voice conversion
Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu, Keikichi Hirose

Analysis of features from analytic representation of speech using MP-ABX measures
Raghavendra Reddy Pappagari, Karthika Vijayan, K. Sri Rama Murty

Source-filter separation of speech signal in the phase domain
Erfan Loweimi, Jon Barker, Thomas Hain

A maximum likelihood approach to the detection of moments of maximum excitation and its application to high-quality speech parameterization
Ranniery Maia, Yannis Stylianou, Masami Akamine

SABR: sparse, anchor-based representation of the speech signal
Christopher Liberatore, Sandesh Aryal, Zelun Wang, Seth Polsley, Ricardo Gutierrez-Osuna

Automatic transformation of irregular to regular voice by residual analysis and synthesis
Tamás Gábor Csapó, Géza Németh

Optical sensor calibration for electro-optical stomatography
Simon Preuß, Peter Birkholz

From text to formants — indirect model for trajectory prediction based on a multi-speaker parallel speech database
Kálmán Abari, Tamás Gábor Csapó, Bálint Pál Tóth, Gábor Olaszy

Layered nonnegative matrix factorization for speech separation
Chung-Chien Hsu, Jen-Tzung Chien, Tai-Shih Chi

Robust tongue tracking in ultrasound images: a multi-hypothesis approach
Catherine Laporte, Lucie Ménard

Objective measures for predicting the intelligibility of spectrally smoothed speech with artificial excitation
Danny Websdale, Thomas Le Cornu, Ben Milner

Vocal tremor analysis via AM-FM decomposition of empirical modes of the glottal cycle length time series
Christophe Mertens, Francis Grenez, François Viallet, Alain Ghio, Sabine Skodda, Jean Schoentgen

Estimating lower vocal tract features with closed-open phase spectral analyses
Elizabeth Godoy, Nicolas Malyska, Thomas F. Quatieri

Inductive implementation of segmental HMMs as CS-HMMs
S. M. Houghton, Colin J. Champion

A discriminative analysis within and across voiced and unvoiced consonants in neutral and whispered speech in multiple Indian languages
G. Nisha Meenakshi, Prasanta Kumar Ghosh

Aligning meeting recordings via adaptive fingerprinting
T. J. Tsai, Andreas Stolcke

On representation learning for artificial bandwidth extension
Matthias Zöhrer, Robert Peharz, Franz Pernkopf

AM-FM based filter bank analysis for estimation of spectro-temporal envelopes and its application for speaker recognition in noisy reverberant environments
Dhananjaya Gowda, Rahim Saeidi, Paavo Alku

Fast and accurate phase unwrapping
Thomas Drugman, Yannis Stylianou

Sparse representation with temporal max-smoothing for acoustic event detection
Xugang Lu, Peng Shen, Yu Tsao, Chiori Hori, Hisashi Kawai

Estimation of glottal closure instants from telephone speech using a group delay-based approach that considers speech signal as a spectrum
G Anushiya Rachel, P Vijayalakshmi, T Nagarajan

The role of prosody and voice quality in text-dependent categories of storytelling across languages
Raúl Montaño, Francesc Alías

Neuromorphic based oscillatory device for incremental syllable boundary detection
Alexandre Hyafil, Milos Cernak

Speech Recognition — Technologies and Systems for New Applications

Mispronunciation detection without nonnative training data
Ann Lee, James Glass

Automatic accentedness evaluation of non-native speech using phonetic and sub-phonetic posterior probabilities
Ramya Rasipuram, Milos Cernak, Alexandre Nachen, Mathew Magimai-Doss

Using F0 contours to assess nativeness in a sentence repeat task
Min Ma, Keelan Evanini, Anastassia Loukina, Xinhao Wang, Klaus Zechner

Using linguistic indicators of difficulty to identify mild cognitive impairment
Rebecca Lunsford, Peter A. Heeman

Automatic intelligibility measures applied to speech signals simulating age-related hearing loss
Lionel Fontan, Jérôme Farinas, Isabelle Ferrané, Julien Pinquier, Xavier Aumont

Assessing empathy using static and dynamic behavior models based on therapist's language in addiction counseling
Sandeep Nallan Chakravarthula, Bo Xiao, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou

SVitchboard II and FiSVer I: high-quality limited-complexity corpora of conversational English speech
Yuzong Liu, Rishabh Iyer, Katrin Kirchhoff, Jeff Bilmes

Fully unsupervised small-vocabulary speech recognition using a segmental Bayesian model
Herman Kamper, Aren Jansen, Sharon Goldwater

LSTM for punctuation restoration in speech transcripts
Ottokar Tilk, Tanel Alumäe

Noise robust exemplar matching for speech enhancement: applications to automatic speech recognition
Emre Yılmaz, Deepak Baby, Hugo Van hamme

A study on robust detection of pronunciation erroneous tendency based on deep neural network
Yingming Gao, Yanlu Xie, Wen Cao, Jinsong Zhang

Vowel mispronunciation detection using DNN acoustic models with cross-lingual training
Shrikant Joshi, Nachiket Deo, Preeti Rao

Confidence-features and confidence-scores for ASR applications in arbitration and DNN speaker adaptation
Kshitiz Kumar, Ziad Al Bawab, Yong Zhao, Chaojun Liu, Benoit Dumoulin, Yifan Gong

Topic modeling for conference analytics
Pengfei Liu, Shoaib Jameel, Wai Lam, Bin Ma, Helen Meng

Sparse coding based features for speech units classification
Pulkit Sharma, Vinayak Abrol, A. D. Dileep, Anil Kumar Sao

Show and Tell Session 1-4 (Special Session)

Smarter driving with IDA, the intelligent driving assistant for Singapore
Andreea I. Niculescu, Ngoc Thuy Huong Thai, Chongjia Ni, Boon Pang Lim, Kheng Hui Yeo, Rafael E. Banchs

Talk it out: adding speech interaction to support informational and transactional applications on public touch-screen kiosks
Kheng Hui Yeo, Rafael E. Banchs

Conversational agent and management tools for conference and tourism domain
Luis Fernando D'Haro, Seokhwan Kim, Rafael E. Banchs

Latvian speech-to-text transcription service
Askars Salimbajevs, Jevgenijs Strigins

System supporting speaker identification in emergency call center
Jakub Gałka, Joanna Grzybowska, Magdalena Igras, Paweł Jaciów, Kamil Wajda, Marcin Witkowski, Mariusz Ziółko

QAT2 — the QCRI advanced transcription and translation system
Ahmed Abdelali, Ahmed Ali, Francisco Guzmán, Felix Stahlberg, Stephan Vogel, Yifan Zhang

Implementation of a live dialectal media subtitling system
Michael Stadtschnitzer, Christoph Schmidt

A system for automatic broadcast news summarisation, geolocation and translation
Peter Bell, Catherine Lai, Clare Llewellyn, Alexandra Birch, Mark Sinclair

Media monitoring system for Latvian radio and TV broadcasts
Artūrs Znotiņš, Kaspars Polis, Roberts Darģis

Meeting assistant application
Michel Assayag, Jonathan Huang, Jonathan Mamou, Oren Pereg, Saurav Sahay, Oren Shamir, Georg Stemmer, Moshe Wasserblat

SARMATA 2.0 automatic Polish language speech recognition system
Bartosz Ziółko, Tomasz Jadczyk, Dawid Skurzok, Piotr Żelasko, Jakub Gałka, Tomasz Pȩdzimąż, Ireneusz Gawlik, Szymon Pałka

Remeeting — get more out of meetings
Arlo Faria, Korbinian Riedhammer

Web application system for pronunciation practice by children with disabilities and to support cooperation of teachers and medical workers
Ikuyo Masuda-Katsuse

PATSY — it's all about pronunciation!
Caroline Kaufhold, Vadim Gamidov, Andreas Kiessling, Klaus Reinhard, Elmar Nöth

Real-time pitch modification system for speech and singing voice
Elias Azarov, Maxim Vashkevich, Denis Likhachov, Alexander Petrovsky

Nao is doing humour in the CHIST-ERA JOKER project
Guillaume Dubuisson Duplessis, Lucile Béchade, Mohamed A. Sehili, Agnès Delaborde, Vincent Letard, Anne-Laure Ligozat, Paul Deléglise, Yannick Estève, Sophie Rosset, Laurence Devillers

ABIMS — auditory bewildered interaction measurement system
Lisa Lange, Bartholomäus Pfeiffer, Daniel Duran

Phontasia — a game for training German orthography
Kay Berkling, Nadine Pflaumer, Alexei Coyplove

E-commu-book: an assistive technology for users with speech impairments
Ka Ho Wong, Wai Kim Leung, Helen Meng

Swiss GraphoGame: concept and design presentation of a computerised reading intervention for children with high risk for poor reading outcomes
Martina Röthlisberger, Iliana I. Karipidis, Georgette Pleisch, Volker Dellwo, Ulla Richardson, Silvia Brem

Neolexon — a therapy app for patients with aphasia
Jakob Pfab, Hanna Jakob, Mona Späth, Christoph Draxler

Acoustic stress detection for improved navigation of educational videos
Sonal Patil, Harish Arsikere, Om Deshmukh

Multimodal read-aloud ebooks for language learning
Xavier Anguera

Speech technologies for African languages: example of a multilingual calculator for education
Laurent Besacier, Elodie Gauthier, Mathieu Mangeot, Philippe Bretier, Paul Bagshaw, Olivier Rosec, Thierry Moudenc, François Pellegrino, Sylvie Voisin, Egidio Marsico, Pascal Nocera

The RedDots platform for mobile crowd-sourcing of speech data
Kong Aik Lee, Guangsen Wang, Kam Pheng Ng, Hanwu Sun, Trung Hieu Nguyen, Ngoc Thuy Huong Thai, Bin Ma, Haizhou Li

Two extensions of Umeda and Teranishi's physical models of the human vocal tract
Takayuki Arai

Collaborative annotation for person identification in TV shows
Matheuz Budnik, Laurent Besacier, Johann Poignant, Hervé Bredin, Claude Barras, Mickael Stefas, Pierrick Bruneau, Thomas Tamisier

Phonetic/linguistic web services at BAS
Thomas Kisler, Florian Schiel, Uwe D. Reichel, Christoph Draxler

Managing speech databases with emuR and the EMU-webApp
Raphael Winkelmann

Visual comparison of speaker groups
Sebastian Wankerl, Florian Hönig, Anton Batliner, J. R. Orozco-Arroyave, Elmar Nöth

Tools for rapid customization of S2S systems for emergent domains
Rohit Kumar, Matthew E. Roy, Sanjika Hewavitharana, Dennis N. Mehay, Nina Zinovieva

The speech recognition virtual kitchen turns one
Florian Metze, Eric Riebling, Eric Fosler-Lussier, Andrew Plummer, Rebecca Bates

Model-based adaptive pre-processing of speech for enhanced intelligibility in noise and reverberation
Jan Rennies, Andreas Volgenandt, Henning Schepker, Simon Doclo

Experiences with and new application ideas for the Interspeech app
Sebastian Möller, Tilo Westermann

Traditional IVR and visual IVR — killing two birds with one stone
Dmitry Sityaev, Praphul Kumar, Rajesh Ramchander

Speaker and Language Recognition

High-resolution acoustic modeling and compact language modeling of language-universal speech attributes for spoken language identification
Yannan Wang, Jun Du, Li-Rong Dai, Chin-Hui Lee

Phonemes frequency based PLLR dimensionality reduction for language recognition
Saad Irtza, Vidhyasaharan Sethu, Phu Ngoc Le, Eliathamby Ambikairajah, Haizhou Li

Exploiting i-vector posterior covariances for short-duration language recognition
Sandro Cumani, Oldřich Plchot, Radek Fér

Using the beat histogram for speech rhythm description and language identification
Athanasios Lykartsis, Stefan Weinzierl

Speaker recognition for speech under face cover
Rahim Saeidi, Tuija Niemi, Hanna Karppelin, Jouni Pohjalainen, Tomi Kinnunen, Paavo Alku

Dataset-invariant covariance normalization for out-domain PLDA speaker verification
Md. Hafizur Rahman, Ahilan Kanagasundaram, David Dean, Sridha Sridharan

Sparse coding of total variability matrix
Longting Xu, Kong Aik Lee, Haizhou Li, Zhen Yang

Duration dependent covariance regularization in PLDA modeling for speaker verification
Weicheng Cai, Ming Li, Lin Li, QingYang Hong

Exploiting supervector structure for speaker recognition trained on a small development set
Hagai Aronowitz

Modified-prior PLDA and score calibration for duration mismatch compensation in speaker recognition system
QingYang Hong, Lin Li, Ming Li, Ling Huang, Lihong Wan, Jun Zhang

Speaker verification using Gaussian posteriorgrams on fixed phrase short utterances
Sarfaraz Jelil, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna

Importance of intelligible phonemes for human speaker recognition in different channel bandwidths
Laura Fernández Gallardo, Sebastian Möller, Michael Wagner

Denoising autoencoder-based speaker feature restoration for utterances of short duration
Hitoshi Yamamoto, Takafumi Koshinaka

Full multicondition training for robust i-vector based speaker recognition
Dayana Ribas, Emmanuel Vincent, José Ramón Calvo

Emotion 1, 2

Acoustic-prosodic analysis of attitudinal expressions in German
Hansjörg Mixdorff, Angelika Hönemann, Albert Rilliard

Continuous emotion tracking using total variability space
Hossein Khaki, Engin Erzin

An analysis of the relationship between signal-derived vocal arousal score and human emotion production and perception
Chi-Chun Lee, Daniel Bone, Shrikanth S. Narayanan

Morphology of vocal affect bursts: exploring expressive interjections in Japanese conversation
Hiroki Mori

Emotion clustering based on probabilistic linear discriminant analysis
Mahnoosh Mehrabani, Ozlem Kalinli, Ruxin Chen

Objective study of the performance degradation in emotion recognition through the AMR-WB+ codec
Aaron Albin, Elliot Moore

Analysis of excitation source features of speech for emotion recognition
Sudarsana Reddy Kadiri, P. Gangamohan, Suryakanth V. Gangashetty, B. Yegnanarayana

An investigation of emotion change detection from speech
Zhaocheng Huang, Julien Epps, Eliathamby Ambikairajah

Crosslinguistic comparison on the perception of Mandarin attitudinal speech
Wentao Gu, Ping Tang, Keikichi Hirose, Véronique Aubergé

Conflict intensity estimation from speech using greedy forward-backward feature selection
Gábor Gosztolya

Exploring acoustic differences between Cantonese (tonal) and English (non-tonal) spoken expressions of emotions
Chee Seng Chong, Jeesun Kim, Chris Davis

Valence, arousal and dominance estimation for English, German, Greek, Portuguese and Spanish lexica using semantic models
Elisavet Palogiannidi, Elias Iosif, Polychronis Koutsakis, Alexandros Potamianos

Dimensionality reduction for speech emotion features by multiscale kernels
Xinzhou Xu, Jun Deng, Wenming Zheng, Li Zhao, Björn Schuller

High-level feature representation using recurrent neural network for speech emotion recognition
Jinkyu Lee, Ivan Tashev

Speech emotion classification using tree-structured sparse logistic regression
Myung Jong Kim, Joohong Yoo, Younggwan Kim, Hoirin Kim

Annotators' agreement and spontaneous emotion classification performance
Bogdan Vlasenko, Andreas Wendemuth

Speech and Language Processing of Children's Speech (Special Session)

Large vocabulary automatic speech recognition for children
Hank Liao, Golan Pundak, Olivier Siohan, Melissa K. Carroll, Noah Coccaro, Qi-Ming Jiang, Tara N. Sainath, Andrew Senior, Françoise Beaufays, Michiel Bacchiani

Acoustic-prosodic correlates of 'awkward' prosody in story retellings from adolescents with autism
Daniel Bone, Matthew P. Black, Anil Ramakrishna, Ruth Grossman, Shrikanth S. Narayanan

Evidence of phonological processes in automatic recognition of children's speech
Eva Fringi, Jill Fain Lehman, Martin Russell

Influence of speaker familiarity on blind and visually impaired children's perception of synthetic voices in audio games
Michael Pucher, Markus Toman, Dietmar Schabus, Cassia Valentini-Botinhao, Junichi Yamagishi, Bettina Zillinger, Erich Schmid

Low-memory fast on-line adaptation for acoustically mismatched children's speech recognition
S. Shahnawazuddin, Rohit Sinha

Large vocabulary children's speech recognition with DNN-HMM and SGMM acoustic modeling
Diego Giuliani, Bagher BabaAli

HMM adaptation for child speech synthesis
Avashna Govender, Febe de Wet, Jules-Raymond Tapamo

Vocal turn-taking patterns in groups of children performing collaborative tasks: an exploratory study
Jaebok Kim, Khiet P. Truong, Vicky Charisi, Cristina Zaga, Manja Lohse, Dirk Heylen, Vanessa Evers

Towards an automated screening tool for pediatric speech delay
Roozbeh Sadeghian, Stephen A. Zahorian

Children's reading aloud performance: a database and automatic detection of disfluencies
Jorge Proença, Dirce Celorico, Sara Candeias, Carla Lopes, Fernando Perdigão

Keyword spotting in multi-player voice driven games for children
Harshavardhan Sundar, Jill Fain Lehman, Rita Singh

Age-dependent height estimation and speaker normalization for children's speech using the first three subglottal resonances
Jinxi Guo, Rohit Paturi, Gary Yeung, Steven M. Lulich, Harish Arsikere, Abeer Alwan

Syllables and Segments 1, 2

The effect of speakers' regional varieties on listeners' decision-making
Adrian Leemann, Camilla Bernardasci, Francis Nolan

Word-initial glottal stop insertion, hiatus resolution and linking in British English
Robert Fuchs

Acoustic analysis of Mandarin affricates
Shanpeng Li, Wentao Gu

Homophonous phonotactic and morphonotactic consonant clusters in word-final position
Hannah Leykum, Sylvia Moosmüller, Wolfgang U. Dressler

Consonant duration and VOT as a function of syllable complexity and voicing in a sub-set of Spanish clusters
Mark Gibson, Ana María Fernández Planas, Adamantios Gafos, Emily Remirez

Hands-on tool producing front vowels for phonetic education: aiming for pronunciation training with tactile sensation
Takayuki Arai

Acoustics of articulatory constraints: vowel classification and nasalization
Indranil Dutta, Ayushi Pandey

Voice-conditioned allophones of MOUTH and PRICE in Bahamian Creole
Janina Kraus

Analysis of spatial variation with app-based crowdsourced audio data
Marie-José Kolly, Adrian Leemann, Florian Matter

Confusability in L2 vowels: analyzing the role of different features
Mátyás Jani, Catia Cucchiarini, Roeland van Hout, Helmer Strik

Perception of French speakers' German vowels
Frank Zimmerer, Jürgen Trouvain

Unintuitive phonetic behavior in Tswana post-nasal stops
Jagoda Bruni, Daniel Duran, Grzegorz Dogil

Classification of place-of-articulation of stop consonants using temporal analysis
A. P. Prathosh, A. G. Ramakrishnan, T. V. Ananthapadmanabha

The emergence of nasal velar codas in Brazilian Portuguese: an rt-MRI study
Marissa Barlaz, Maojing Fu, Zhi-Pei Liang, Ryan Shosted, Brad Sutton

Salient dimensions in implicit phonotactic learning
Elise Michon, Emmanuel Dupoux, Alejandrina Cristia

An acoustic examination of the three-way sibilant contrast in Lower Sorbian
Phil Howson

Investigating consonant reduction in Mandarin Chinese with improved forced alignment
Jiahong Yuan, Mark Liberman

Durational characteristics and timing patterns of Russian onset clusters at two speaking rates
Marianne Pouplier, Stefania Marin, Alexei Kochetov

Speech Enhancement

Modeling temporal dependency for robust estimation of LP model parameters in speech enhancement
Chun Hoy Wong, Tan Lee, Yu Ting Yeung, P. C. Ching

Learning a speech manifold for signal subspace speech denoising
Colin Vaz, Shrikanth S. Narayanan

An iterative speech model-based a priori SNR estimator
Samy Elshamy, Nilesh Madhu, Wouter Tirry, Tim Fingscheidt

Multi-resolution stacking for speech separation based on boosted DNN
Xiao-Lei Zhang, DeLiang Wang

Least squares estimate of the initial phases in STFT based speech enhancement
Sidsel Marie Nørholm, Martin Krawczyk-Becker, Timo Gerkmann, Steven van de Par, Jesper Rindom Jensen, Mads Græsbøll Christensen

Enhancement of non-stationary speech using harmonic chirp filters
Sidsel Marie Nørholm, Jesper Rindom Jensen, Mads Græsbøll Christensen

Text-informed speech enhancement with deep neural networks
Keisuke Kinoshita, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani

Complex tensor factorization in modulation frequency domain for single-channel speech enhancement
Shogo Masaya, Masashi Unoki

Systematic integration of acoustic echo canceller and noise reduction modules for voice communication systems
Hyeonjoo Kang, JeeSok Lee, Soonho Baek, Hong-Goo Kang

DNN-based residual echo suppression
Chul Min Lee, Jong Won Shin, Nam Soo Kim

Codebook-based speech enhancement using Markov process and speech-presence probability
Qi He, Changchun Bao, Feng Bao

On optimal smoothing in minimum statistics based noise tracking
Aleksej Chinaev, Reinhold Haeb-Umbach

A data-driven speech enhancement method based on modeled long-range temporal dynamics
Yue Hao, Changchun Bao, Feng Bao, Feng Deng

Improved phase reconstruction in single-channel speech separation
Florian Mayer, Pejman Mowlaee

Speech and Audio Segmentation and Classification; Voice Activity Detection 1-3

Face reading from speech — predicting facial action units from audio cues
Fabien Ringeval, Erik Marchi, Marc Mehu, Klaus Scherer, Björn Schuller

A new front-end for classification of non-speech sounds: a study on human whistle
Mahesh Kumar Nandwana, Hynek Bořil, John H. L. Hansen

Robust features for sonorant segmentation in continuous speech
Sri Harsha Dumpala, Bhanu Teja Nellore, Raghu Ram Nevali, Suryakanth V. Gangashetty, B. Yegnanarayana

Reduction of reverberation effects in the MFCC modulation spectrum for improved classification of acoustic signals
Sebastian Gergen, Anil Nagathil, Rainer Martin

Spiking neural networks and the generalised Hough transform for speech pattern detection
Jonathan Dennis, Huy Dat Tran, Haizhou Li

Acoustic event recognition using dominant spectral basis vectors
Woohyun Choi, Sangwook Park, David K. Han, Hanseok Ko

A statistical model-based voice activity detection using multiple DNNs and noise awareness
Inyoung Hwang, Jaeseong Sim, Sang-Hyeon Kim, Kwang-Sub Song, Joon-Hyuk Chang

A universal VAD based on jointly trained deep neural networks
Qing Wang, Jun Du, Xiao Bao, Zi-Rui Wang, Li-Rong Dai, Chin-Hui Lee

Spectrographic speech mask estimation using the time-frequency correlation of speech presence
Ge Zhan, Zhaoqiong Huang, Dongwen Ying, Jielin Pan, Yonghong Yan

Complete-linkage clustering for voice activity detection in audio and visual speech
Houman Ghaemmaghami, David Dean, Shahram Kalantari, Sridha Sridharan, Clinton Fookes

A model based voice activity detector for noisy environments
Kaavya Sriskandaraja, Vidhyasaharan Sethu, Phu Ngoc Le, Eliathamby Ambikairajah

An unsupervised visual-only voice activity detection approach using temporal orofacial features
Fei Tao, John H. L. Hansen, Carlos Busso

Automatic detection of equipment alarms in a neonatal intensive care unit environment: a knowledge-based approach
Ganna Raboshchuk, Peter Jančovič, Climent Nadeu, Alex Peiró Lilja, Münevver Köküer, Blanca Muñoz Mahamud, Ana Riverola de Veciana

“Multilingual” deep neural network for music genre classification
Jia Dai, Wenju Liu, Chongjia Ni, Like Dong, Hong Yang

Accurate endpointing with expected pause duration
Baiyang Liu, Bjorn Hoffmeister, Ariya Rastrow

Locality constrained transitive distance clustering on speech data
Wenbo Liu, Zhiding Yu, Bhiksha Raj, Ming Li

Feature extraction strategies in deep learning based acoustic event detection
Miquel Espi, Masakiyo Fujimoto, Keisuke Kinoshita, Tomohiro Nakatani

An acoustic event detection framework and evaluation metric for surveillance in cars
Peter Transfeld, Simon Receveur, Tim Fingscheidt

Diachronic semantic cohesion for topic segmentation of TV broadcast news
Abdessalam Bouchekif, Géraldine Damnati, Yannick Estève, Delphine Charlet, Nathalie Camelin

Comparison of forced-alignment speech recognition and humans for generating reference VAD
Ivan Kraljevski, Zheng-Hua Tan, Maria Paola Bissiri

Improving voice activity detection in movies
Bernhard Lehner, Gerhard Widmer, Reinhard Sonnleitner

Automatic Speaker Verification Spoofing and Countermeasures (ASVspoof 2015) (Special Session)

Automatic speaker verification spoofing and countermeasures (ASVspoof 2015): introductory talk by the organizers
Zhizheng Wu, Tomi Kinnunen

ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge
Zhizheng Wu, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Cemal Hanilçi, Md. Sahidullah, Aleksandr Sizov

The AHOLAB RPS SSD spoofing challenge 2015 submission
Jon Sanchez, Ibon Saratxaga, Inma Hernaez, Eva Navas, D. Erro

Human vs machine spoofing detection on wideband and narrowband data
Mirjam Wester, Zhizheng Wu, Junichi Yamagishi

Spoofing speech detection using high dimensional magnitude and phase features: the NTU approach for ASVspoof 2015 challenge
Xiong Xiao, Xiaohai Tian, Steven Du, Haihua Xu, Eng Siong Chng, Haizhou Li

Classifiers for synthetic speech detection: a comparison
Cemal Hanilçi, Tomi Kinnunen, Md. Sahidullah, Aleksandr Sizov

Combining evidences from mel cepstral, cochlear filter cepstral and instantaneous frequency features for detection of natural vs. spoofed speech
Tanvina B. Patel, Hemant A. Patil

Spoofing detection with DNN and one-class SVM for the ASVspoof 2015 challenge
Jesús Villalba, Antonio Miguel, Alfonso Ortega, Eduardo Lleida

Development of CRIM system for the automatic speaker verification spoofing and countermeasures challenge 2015
Md. Jahangir Alam, Patrick Kenny, Gautam Bhattacharya, Themos Stafylakis

Spoofing countermeasure based on analysis of linear prediction error
Artur Janicki

Simultaneous utilization of spectral magnitude and phase information to extract supervectors for speaker verification anti-spoofing
Yi Liu, Yao Tian, Liang He, Jia Liu, Michael T. Johnson

A comparison of features for synthetic speech detection
Md. Sahidullah, Tomi Kinnunen, Cemal Hanilçi

Relative phase information for detecting human speech and spoofed speech
Longbiao Wang, Yohei Yoshida, Yuta Kawakami, Seiichi Nakagawa

Robust deep feature for spoofing detection — the SJTU system for ASVspoof 2015 challenge
Nanxin Chen, Yanmin Qian, Heinrich Dinkel, Bo Chen, Kai Yu

Automatic speaker verification spoofing and countermeasures (ASVspoof 2015): open discussion and future plans
Junichi Yamagishi, Nicholas Evans

Robust Speech Recognition: Features, Far-field and Reverberation

A study on deep neural network acoustic model adaptation for robust far-field speech recognition
Seyedmahdad Mirsamadi, John H. L. Hansen

Speech dereverberation using long short-term memory
Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Reverberation robust acoustic modeling using i-vectors with time delay neural networks
Vijayaditya Peddinti, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur

Delta-melspectra features for noise robustness to DNN-based ASR systems
Kshitiz Kumar, Chaojun Liu, Yifan Gong

Combating reverberation in large vocabulary continuous speech recognition
Vikramjit Mitra, Julien Van Hout, Mitchell McLaren, Wen Wang, Martin Graciarena, Dimitra Vergyri, Horacio Franco

Three ways to adapt a CTS recognizer to unseen reverberated speech in BUT system for the ASpIRE challenge
Martin Karafiát, František Grézl, Lukáš Burget, Igor Szöke, Jan Černocký

Robust parameter estimation for audio declipping in noise
Mark J. Harvilla, Richard M. Stern

Multi-task learning deep neural networks for speech feature denoising
Bin Huang, Dengfeng Ke, Hao Zheng, Bo Xu, Yanyan Xu, Kaile Su

Time-frequency masking for large scale robust speech recognition
Yuxuan Wang, Ananya Misra, Kean K. Chin

Efficient use of DNN bottleneck features in generalized variable parameter HMMs for noise robust speech recognition
Rongfeng Su, Xurong Xie, Xunying Liu, Lan Wang

Investigating modulation spectrogram features for deep neural network-based automatic speech recognition
Deepak Baby, Hugo Van hamme

Deep neural network based spectral feature mapping for robust speech recognition
Kun Han, Yanzhang He, Deblin Bagchi, Eric Fosler-Lussier, DeLiang Wang

Social Signals, Assessment and Paralinguistics

Analyzing speech rate entrainment and its relation to therapist empathy in drug addiction counseling
Bo Xiao, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. Narayanan

Agreement and disagreement utterance detection in conversational speech by extracting and integrating local features
Atsushi Ando, Taichi Asami, Manabu Okamoto, Hirokazu Masataki, Sumitaka Sakauchi

Still together?: the role of acoustic features in predicting marital outcome
Md. Nasir, Wei Xia, Bo Xiao, Brian Baucom, Shrikanth S. Narayanan, Panayiotis G. Georgiou

On evaluation metrics for social signal detection
Gábor Gosztolya

Laughter and filler detection in naturalistic audio
Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen

Automatic formatted transcripts for videos
Aasish Pappu, Amanda Stent

Does my speech rock? automatic assessment of public speaking skills
Lucas Azaïs, Adrien Payan, Tianjiao Sun, Guillaume Vidal, Tina Zhang, Eduardo Coutinho, Florian Eyben, Björn Schuller

Verbal intelligence identification based on text classification
Roman Sergienko, Alexander Schmitt

A multimodal approach for automatic assessment of school principals' oral presentation during pre-service training program
Shan-Wen Hsiao, Hung-Ching Sun, Ming-Chuan Hsieh, Ming-Hsueh Tsai, Hsin-Chih Lin, Chi-Chun Lee

Are you TED talk material? comparing prosody in professors and TED speakers
T. J. Tsai

Detection of cognitive states and their correlation to speech recognition performance in speech-to-speech machine translation systems
Hayakawa Akira, Fasih Haider, Loredana Cerrato, Nick Campbell, Saturnino Luz

Bandwidth Extension, Quality and Intelligibility Measures

Perceptual speech quality dimensions in a conversational situation
Friedemann Köster, Sebastian Möller

Multidimensional evaluation and predicting overall speech quality
Jens Berger, Anna Llagostera

On speech intelligibility estimation of phase-aware single-channel speech enhancement
Andreas Gaich, Pejman Mowlaee

A framework for the evaluation of microscopic intelligibility models
Ricard Marxer, Martin Cooke, Jon Barker

A binaural short time objective intelligibility measure for noisy and enhanced speech
Asger Heidemann Andersen, Jan Mark de Haan, Zheng-Hua Tan, Jesper Jensen

A glimpse-based approach for predicting binaural intelligibility with single and multiple maskers in anechoic conditions
Yan Tang, Martin Cooke, Bruno M. Fazenda, Trevor J. Cox

Improving the prediction power of the speech transmission index to account for non-linear distortions introduced by noise-reduction algorithms
Fei Chen

DNN-based speech bandwidth expansion and its application to adding high-frequency missing features for automatic speech recognition of narrowband speech
Kehuang Li, Zhen Huang, Yong Xu, Chin-Hui Lee

Speech quality evaluation of artificial bandwidth extension: comparing subjective judgments and instrumental predictions
Hannu Pulakka, Ville Myllylä, Anssi Rämö, Paavo Alku

Synchronous overlap and add of spectra for enhancement of excitation in artificial bandwidth extension of speech
M. A. Tuğtekin Turan, Engin Erzin

Speech bandwidth expansion based on deep neural networks
Yingxue Wang, Shenghui Zhao, Wenbo Liu, Ming Li, Jingming Kuang

A novel method of artificial bandwidth extension using deep architecture
Bin Liu, Jianhua Tao, Zhengqi Wen, Ya Li, Danish Bukhari

Advanced Crowdsourcing for Speech and Beyond (Special Session)

Advanced crowdsourcing for speech and beyond: introduction by the organizers
Tim Polzehl, Gina-Anne Levow

Transcribing continuous speech using mismatched crowdsourcing
Preethi Jyothi, Mark Hasegawa-Johnson

Selection and aggregation techniques for crowdsourced semantic annotation task
Shammur Absar Chowdhury, Marcos Calvo, Arindam Ghosh, Evgeny A. Stepanov, Ali Orkan Bayer, Giuseppe Riccardi, Fernando García, Emilio Sanchis

Controlling quality and handling fraud in large scale crowdsourcing speech data collections
Spencer Rothwell, Ahmad Elshenawy, Steele Carter, Daniela Braga, Faraz Romani, Michael Kennewick, Bob Kennewick

Data collection and annotation for state-of-the-art NER using unmanaged crowds
Spencer Rothwell, Steele Carter, Ahmad Elshenawy, Vladislavs Dovgalecs, Safiyyah Saleem, Daniela Braga, Bob Kennewick

Robustness in speech quality assessment and temporal training expiry in mobile crowdsourcing environments
Tim Polzehl, Babak Naderi, Friedemann Köster, Sebastian Möller

Effect of trapping questions on the reliability of speech quality judgments in a crowdsourcing paradigm
Babak Naderi, Tim Polzehl, Ina Wechsung, Friedemann Köster, Sebastian Möller

Voice Äpp: a mobile app for crowdsourcing Swiss German dialect data
Adrian Leemann, Marie-José Kolly, Jean-Philippe Goldman, Volker Dellwo, Ingrid Hove, Ibrahim Almajai, Sarah Grimm, Sylvain Robert, Daniel Wanitsch

Expert and crowdsourced annotation of pronunciation errors for automatic scoring systems
Anastassia Loukina, Melissa Lopez, Keelan Evanini, David Suendermann-Oeft, Klaus Zechner

Capcap: an output-agreement game for video captioning
Hernisa Kacorri, Kaoru Shinkawa, Shin Saito

Auris populi: crowdsourced native transcriptions of Dutch vowels spoken by adult Spanish learners
Pepi Burgos, Eric Sanders, Catia Cucchiarini, Roeland van Hout, Helmer Strik

Crowdsource a little to label a lot: labeling a speech corpus of dialectal Arabic
Samantha Wray, Ahmed Ali

Using keyword spotting to help humans correct captioning faster
Yashesh Gaur, Florian Metze, Yajie Miao, Jeffrey P. Bigham

Validating and optimizing a crowdsourced method for gradient measures of child speech
Tara McAllister Byun, Elaine Hitchcock, Daphna Harel

Robust Speech Recognition: Adaptation

Joint training of speech separation, filterbank and acoustic model for robust automatic speech recognition
Zhong-Qiu Wang, DeLiang Wang

Joint environment and speaker normalization using factored front-end CMLLR
Shakti Rath, Sunil Sivadas, Bin Ma

Robust speech recognition using DNN-HMM acoustic model combining noise-aware training with spectral subtraction
Akihiro Abe, Kazumasa Yamamoto, Seiichi Nakagawa

Robust i-vector extraction for neural network adaptation in noisy environment
Chengzhu Yu, Atsunori Ogawa, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, John H. L. Hansen

Spectrally selective dithering for distorted speech recognition
Michal Borsky, Petr Mizera, Petr Pollak

Feature-space speaker adaptation for probabilistic linear discriminant analysis acoustic models
Liang Lu, Steve Renals

Speaker adaptation using the i-vector technique for bottleneck features
Patrick Cardinal, Najim Dehak, Yu Zhang, James Glass

I-vector estimation using informative priors for adaptation of deep neural networks
Penny Karanasou, Mark J. F. Gales, Philip C. Woodland

Robust i-vector based adaptation of DNN acoustic model for speech recognition
Sri Garimella, Arindam Mandal, Nikko Strom, Bjorn Hoffmeister, Spyros Matsoukas, Sree Hari Krishnan Parthasarathi

GMM-derived features for effective unsupervised adaptation of deep neural network acoustic models
Natalia Tomashenko, Yuri Khokhlov

Unsupervised adaptation for deep neural network using linear least square method
Roger Hsiao, Tim Ng, Stavros Tsakalidis, Long Nguyen, Richard Schwartz

Ensemble speaker modeling using speaker adaptive training deep neural network for speaker adaptation
Sheng Li, Xugang Lu, Yuya Akita, Tatsuya Kawahara

Data-selective transfer learning for multi-domain speech recognition
Mortaza Doulaty, Oscar Saz, Thomas Hain

Speech and Hearing Disorders

Language-independent method for analysis of German stuttering recordings
Tomas Lustyk, Petr Bergl, Tino Haderlein, Elmar Nöth, Roman Cmejla

An investigation of MDVP parameters for voice pathology detection on three different databases
Ahmed Al-nasheri, Zulfiqar Ali, Ghulam Muhammad, Mansour Alsulaiman

Energy distribution analysis and nonlinear dynamical analysis of adductor spasmodic dysphonia
Jiantao Wu, Ping Yu, Nan Yan, Lan Wang, Xiaohui Yang, Manwa L. Ng

Auditory-visual tone perception in hearing impaired Thai listeners
Benjawan Kasisopa, Nittayapa Klangpornkun, Denis Burnham

Speech intelligibility decline in individuals with fast and slow rates of ALS progression
Panying Rong, Yana Yunusova, Jordan R. Green

Latency analysis of speech shadowing reveals processing differences in Japanese adults who do and do not stutter
Rong Na A, Koichi Mori, Naomi Sakai

A syllable-based analysis of speech temporal organization: a comparison between speaking styles in dysarthric and healthy populations
Brigitte Bigi, Katarzyna Klessa, Laurianne Georgeton, Christine Meunier

Autonomous measurement of speech intelligibility utilizing automatic speech recognition
Bernd T. Meyer, Birger Kollmeier, Jasper Ooster

Can you hear me? acoustic modifications in speech directed to foreigners and hearing-impaired people
Monja Angelika Knoll, Melissa Johnstone, Charlene Blakely

Improving automatic forced alignment for dysarthric speech transcription
Yu Ting Yeung, Ka Ho Wong, Helen Meng

Neural Networks: Novel Architectures for LVCSR

A time delay neural network architecture for efficient modeling of long temporal contexts
Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur

Long short-term memory based convolutional recurrent neural networks for large vocabulary speech recognition
Xiangang Li, Xihong Wu

Parameterised sigmoid and ReLU hidden activation functions for DNN acoustic modelling
C. Zhang, Philip C. Woodland

Discriminative template learning in group-convolutional networks for invariant speech representations
Chiyuan Zhang, Stephen Voinea, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso Poggio

Investigation of parametric rectified linear units for noise robust speech recognition
Sunil Sivadas, Zhenzhou Wu, Ma Bin

Multi-softmax deep neural network for semi-supervised training
Hang Su, Haihua Xu

A multi-region deep neural network model in speech recognition
Jia Cui, George Saon, Bhuvana Ramabhadran, Brian Kingsbury

A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition
Liang Lu, Xingxing Zhang, Kyunghyun Cho, Steve Renals

Gaussian free cluster tree construction using deep neural network
Linchen Zhu, Kevin Kilgour, Sebastian Stüker, Alex Waibel

Very deep convolutional neural networks for LVCSR
Mengxiao Bi, Yanmin Qian, Kai Yu

Transferring knowledge from a RNN to a DNN
William Chan, Nan Rosemary Ke, Ian Lane

SVD-based universal DNN modeling for multiple scenarios
Changliang Liu, Jinyu Li, Yifan Gong

Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks
Zhuo Chen, Shinji Watanabe, Hakan Erdogan, John R. Hershey

Speech and Music Analysis

Speaker-dependent multipitch tracking using deep neural networks
Yuzhou Liu, DeLiang Wang

An error correction scheme for GCI detection algorithms using pitch smoothness criterion
Sujith P., A. P. Prathosh, A. G. Ramakrishnan, Prasanta Kumar Ghosh

Robust pitch estimation in noisy speech using ZTW and group delay function
RaviShankar Prasad, B. Yegnanarayana

Robust localization of single sound source based on phase difference regression
Zhaoqiong Huang, Ge Zhan, Dongwen Ying, Yonghong Yan

Frequency map selection using a RBFN-based classifier in the MVDR beamformer for speaker localization in reverberant rooms
Daniele Salvati, Carlo Drioli, Gian Luca Foresti

Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions
Ning Ma, Guy J. Brown, Tobias May

Joint optimization of recurrent networks exploiting source auto-regression for source separation
Shuai Nie, Wei Xue, Shan Liang, Xueliang Zhang, Wenju Liu, Liwei Qiao, Jianping Li

Real-time audio-to-score alignment of singing voice based on melody and lyric information
Rong Gong, Philippe Cuvillier, Nicolas Obin, Arshia Cont

Vocal separation from monaural music using adaptive auditory filtering based on kernel back-fitting
Jun-Yong Lee, Hye-Seung Cho, Hyoung-Gook Kim

A two-stage singing voice separation algorithm using spectro-temporal modulation features
Frederick Z. Yen, Mao-Chang Huang, Tai-Shih Chi

Robust sound event classification using LBP-HOG based bag-of-audio-words feature representation
Hyungjun Lim, Myung Jong Kim, Hoirin Kim

Robust Speech Processing Using Observation Uncertainty and Uncertainty Propagation (Special Session)

Robust speech processing using observation uncertainty and uncertainty propagation: session and paper overview
Ramón F. Astudillo, Shinji Watanabe, Ahmed Hussen Abdelaziz, Dorothea Kolossa

Uncertainty propagation for noise robust speaker recognition: the case of NIST-SRE
Dayana Ribas, Emmanuel Vincent, José Ramón Calvo

Uncertainty training and decoding methods of deep neural networks based on stochastic representation of enhanced features
Yuuki Tachioka, Shinji Watanabe

Accounting for uncertainty of i-vectors in speaker recognition using uncertainty propagation and modified imputation
Rahim Saeidi, Paavo Alku

Autoencoder based multi-stream combination for noise robust speech recognition
Sri Harish Mallidi, Tetsuji Ogawa, Karel Veselý, Phani S. Nidadavolu, Hynek Hermansky

Uncertainty decoding for DNN-HMM hybrid systems based on numerical sampling
Christian Huemmer, Roland Maas, Andreas Schwarz, Ramón F. Astudillo, Walter Kellermann

Uncertainty propagation through deep neural networks
Ahmed Hussen Abdelaziz, Shinji Watanabe, John R. Hershey, Emmanuel Vincent, Dorothea Kolossa

Handling derivative filterbank features in bounded-marginalization-based missing data automatic speech recognition
Marco Kühne

Large-scale, sequence-discriminative, joint adaptive training for masking-based robust ASR
Arun Narayanan, Ananya Misra, Kean K. Chin

Integration of DNN based speech enhancement and ASR
Ramón F. Astudillo, Joana Correia, Isabel Trancoso

Acoustic Model Adaptation and Training

A general artificial neural network extension for HTK
C. Zhang, Philip C. Woodland

Audio augmentation for speech recognition
Tom Ko, Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur

A diversity-penalizing ensemble training method for deep learning
Xiaohui Zhang, Daniel Povey, Sanjeev Khudanpur

Deep neural network training emphasizing central frames
Gakuto Kurata, Daniel Willett

Training deep bidirectional LSTM acoustic model for LVCSR by a context-sensitive-chunk BPTT approach
Kai Chen, Zhi-Jie Yan, Qiang Huo

Structured output layer with auxiliary targets for context-dependent acoustic modelling
Pawel Swietojanski, Peter Bell, Steve Renals

Complementary tasks for context-dependent deep neural network acoustic models
Peter Bell, Steve Renals

Towards end-to-end speech recognition for Chinese Mandarin using long short-term memory recurrent neural networks
Jie Li, Heng Zhang, Xinyuan Cai, Bo Xu

Improving deep neural networks based multi-accent Mandarin speech recognition using i-vectors and accent-specific top layer
Mingming Chen, Zhanlei Yang, Jizhong Liang, Yanpeng Li, Wenju Liu

Rapid adaptation for deep neural networks through multi-task learning
Zhen Huang, Jinyu Li, Sabato Marco Siniscalchi, I-Fan Chen, Ji Wu, Chin-Hui Lee

fMLLR based feature-space speaker adaptation of DNN acoustic models
Sree Hari Krishnan Parthasarathi, Bjorn Hoffmeister, Spyros Matsoukas, Arindam Mandal, Nikko Strom, Sri Garimella

I-vector dependent feature space transformations for adaptive speech recognition
Xiangang Li, Xihong Wu

Unsupervised domain discovery using latent Dirichlet allocation for acoustic modelling in speech recognition
Mortaza Doulaty, Oscar Saz, Thomas Hain

Training data selection for acoustic modeling via submodular optimization of joint Kullback-Leibler divergence
Taichi Asami, Ryo Masumura, Hirokazu Masataki, Manabu Okamoto, Sumitaka Sakauchi

Stress, Load, and Pathologies

Stress level detection using double-layer subband filter
Tin Lay Nwe, Qianli Xu, Cuntai Guan, Bin Ma

Prosodic characteristics of read speech before and after treadmill running
Jürgen Trouvain, Khiet P. Truong

A database for analysis of speech under physical stress: detection of exercise intensity while running and talking
Khiet P. Truong, Arne Nieuwenhuys, Peter Beek, Vanessa Evers

Stressed out: what speech tells us about stress
Will Paul, Cecilia Ovesdotter Alm, Reynold Bailey, Joe Geigel, Linwei Wang

Prediction of heart rate changes from speech features during interaction with a misbehaving dialog system
Andreas Tsiartas, Andreas Kathol, Elizabeth Shriberg, Massimiliano de Zambotti, Adrian Willoughby

Acoustic correlates for perceived effort levels in expressive speech
Mary Pietrowicz, Mark Hasegawa-Johnson, Karrie Karahalios

Pitch-based speech perturbation measures using a novel GCI detection algorithm: application to pathological voice classification
Khalid Daoudi, Ashwini Jaya Kumar

Speech-based assessment of PTSD in a military population using diverse feature classes
Dimitra Vergyri, Bruce Knoth, Elizabeth Shriberg, Vikramjit Mitra, Mitchell McLaren, Luciana Ferrer, Pablo Garcia, Charles Marmar

Cognitive impairment prediction in the elderly based on vocal biomarkers
Bea Yu, Thomas F. Quatieri, James R. Williamson, James C. Mundt

Automatic age detection in normal and pathological voice
J.-A. Gómez-García, L. Moro-Velázquez, Juan Ignacio Godino-Llorente, G. Castellanos-Domínguez



