Interspeech 2016

8-12 Sep 2016, San Francisco

General Chair: Nelson Morgan

ISSN: 1990-9772  DOI: 10.21437/Interspeech.2016

Keynote 1: ISCA Medalist: John Makhoul


A 50-Year Retrospective on Speech and Language Processing
John Makhoul


Neural Networks in Speech Recognition


Improving English Conversational Telephone Speech Recognition
Ivan Medennikov, Alexey Prudnikov, Alexander Zatvornitskiy

The IBM 2016 English Conversational Telephone Speech Recognition System
George Saon, Tom Sercu, Steven Rennie, Hong-Kwang J. Kuo

Small-Footprint Deep Neural Networks with Highway Connections for Speech Recognition
Liang Lu, Steve Renals

Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention
Dong Yu, Wayne Xiong, Jasha Droppo, Andreas Stolcke, Guoli Ye, Jinyu Li, Geoffrey Zweig

Lower Frame Rate Neural Network Acoustic Models
Golan Pundak, Tara N. Sainath

Improved Neural Network Initialization by Grouping Context-Dependent Targets for Acoustic Modeling
Gakuto Kurata, Brian Kingsbury


Special Session: Auditory-Visual Expressive Speech and Gesture in Humans and Machines


Automatic Scoring of Monologue Video Interviews Using Multimodal Cues
Lei Chen, Gary Feng, Michelle Martin-Raugh, Chee Wee Leong, Christopher Kitchen, Su-Youn Yoon, Blair Lehman, Harrison Kell, Chong Min Lee

The Sound of Disgust: How Facial Expression May Influence Speech Production
Chee Seng Chong, Jeesun Kim, Chris Davis

Analyzing Temporal Dynamics of Dyadic Synchrony in Affective Interactions
Zhaojun Yang, Shrikanth S. Narayanan

Audiovisual Speech Scene Analysis in the Context of Competing Sources
Attigodu C. Ganesh, Frédéric Berthommier, Jean-Luc Schwartz

Head Motion Generation with Synthetic Speech: A Data Driven Approach
Najmeh Sadoughi, Carlos Busso

The Consistency and Stability of Acoustic and Visual Cues for Different Prosodic Attitudes
Jeesun Kim, Chris Davis

Introduction to Poster Presentation of Part II
Jeesun Kim, Gérard Bailly


Speech and Language Processing for Clinical Health Applications


Toward Development and Evaluation of Pain Level-Rating Scale for Emergency Triage based on Vocal Characteristics and Facial Expressions
Fu-Sheng Tsai, Ya-Ling Hsu, Wei-Chen Chen, Yi-Ming Weng, Chip-Jin Ng, Chi-Chun Lee

Predicting Severity of Voice Disorder from DNN-HMM Acoustic Posteriors
Tan Lee, Yuanyuan Liu, Yu Ting Yeung, Thomas K.T. Law, Kathy Y.S. Lee

Long-Term Stability of Tracheoesophageal Voices
Klaske E. van Sluis, Michiel W.M. van den Brekel, Frans J.M. Hilgers, Rob J.J.H. van Son

Detecting Mild Cognitive Impairment from Spontaneous Speech by Correlation-Based Phonetic Feature Selection
Gábor Gosztolya, László Tóth, Tamás Grósz, Veronika Vincze, Ildikó Hoffmann, Gréta Szatlóczki, Magdolna Pákáski, János Kálmán

Towards an Automated Screening Tool for Developmental Speech and Language Impairments
Jen J. Gong, Maryann Gong, Dina Levy-Lambert, Jordan R. Green, Tiffany P. Hogan, John V. Guttag

Spectral Enhancement of Cleft Lip and Palate Speech
Vikram C.M., Nagaraj Adiga, S.R. Mahadeva Prasanna


Speech Analysis


Automatic Classification of Phonation Modes in Singing Voice: Towards Singing Style Characterisation and Application to Ethnomusicological Recordings
Jean-Luc Rouas, Leonidas Ioannidis

Novel Nonlinear Prediction Based Features for Spoofed Speech Detection
Himanshu N. Bhavsar, Tanvina B. Patel, Hemant A. Patil

Robust Vowel Landmark Detection Using Epoch-Based Features
Sri Harsha Dumpala, Bhanu Teja Nellore, Raghu Ram Nevali, Suryakanth V. Gangashetty, B. Yegnanarayana

Sensitivity of Quantitative RT-MRI Metrics of Vocal Tract Dynamics to Image Reconstruction Settings
Johannes Töger, Yongwan Lim, Sajan Goud Lingala, Shrikanth S. Narayanan, Krishna S. Nayak

Sound Pattern Matching for Automatic Prosodic Event Detection
Milos Cernak, Afsaneh Asaei, Pierre-Edouard Honnet, Philip N. Garner, Hervé Bourlard

Automatic Classification of Lexical Stress in English and Arabic Languages Using Deep Learning
Mostafa Shahin, Julien Epps, Beena Ahmed


Speech and Hearing Disorders & Perception


Auditory-Visual Perception of VCVs Produced by People with Down Syndrome: Preliminary Results
Alexandre Hennequin, Amélie Rochet-Capellan, Marion Dohen

Combining Non-Pathological Data of Different Language Varieties to Improve DNN-HMM Performance on Pathological Speech
Emre Yılmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik

Evaluation of a Phone-Based Anomaly Detection Approach for Dysarthric Speech
Imed Laaridh, Corinne Fredouille, Christine Meunier

Recognition of Dysarthric Speech Using Voice Parameters for Speaker Adaptation and Multi-Taper Spectral Estimation
Chitralekha Bhat, Bhavik Vachhani, Sunil Kopparapu

Impaired Categorical Perception of Mandarin Tones and its Relationship to Language Ability in Autism Spectrum Disorders
Fei Chen, Nan Yan, Xiaojie Pan, Feng Yang, Zhuanzhuan Ji, Lan Wang, Gang Peng

Perceived Naturalness of Electrolaryngeal Speech Produced Using sEMG-Controlled vs. Manual Pitch Modulation
K.F. Nagle, J.T. Heaton

Identifying Hearing Loss from Learned Speech Kernels
Shamima Najnin, Bonny Banerjee, Lisa Lucks Mendel, Masoumeh Heidari Kapourchali, Jayanta Kumar Dutta, Sungmin Lee, Chhayakanta Patro, Monique Pousson

Differential Effects of Velopharyngeal Dysfunction on Speech Intelligibility During Early and Late Stages of Amyotrophic Lateral Sclerosis
Panying Rong, Yana Yunusova, Jordan R. Green

The Production of Intervocalic Glides in Non Dysarthric Parkinsonian Speech
V. Delvaux, V. Roland, K. Huet, M. Piccaluga, M.C. Haelewyck, B. Harmegnies

Auditory Processing Impairments Under Background Noise in Children with Non-Syndromic Cleft Lip and/or Palate
Yang Feng, Zhang Lu

Modulation Spectral Features for Predicting Vocal Emotion Recognition by Simulated Cochlear Implants
Zhi Zhu, Ryota Miyauchi, Yukiko Araki, Masashi Unoki

Automatic Discrimination of Soft Voice Onset Using Acoustic Features of Breathy Voicing
Keiko Ochi, Koichi Mori, Naomi Sakai, Nobutaka Ono

Effect of Noise on Lexical Tone Perception in Cantonese-Speaking Amusics
Jing Shao, Caicai Zhang, Gang Peng, Yike Yang, William S.-Y. Wang

Audio-Visual Speech Recognition Using Bimodal-Trained Bottleneck Features for a Person with Severe Hearing Loss
Yuki Takashima, Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki, Nobuyuki Mitani, Kiyohiro Omori, Kaoru Nakazono

Perception of Tone in Whispered Mandarin Sentences: The Case for Singapore Mandarin
Yuling Gu, Boon Pang Lim, Nancy F. Chen


Speech Synthesis Poster


A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences
Feng-Long Xie, Frank K. Soong, Haifeng Li

Parallel Dictionary Learning for Voice Conversion Using Discriminative Graph-Embedded Non-Negative Matrix Factorization
Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki

Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks
Yu Gu, Zhen-Hua Ling, Li-Rong Dai

Voice Conversion Based on Matrix Variate Gaussian Mixture Model Using Multiple Frame Features
Yi Yang, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu

Voice Conversion Based on Trajectory Model Training of Neural Networks Considering Global Variance
Naoki Hosaka, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

Comparing Articulatory and Acoustic Strategies for Reducing Non-Native Accents
Sandesh Aryal, Ricardo Gutierrez-Osuna

Cross-Lingual Speaker Adaptation for Statistical Speech Synthesis Using Limited Data
Seyyed Saeed Sarfjoo, Cenk Demiroglu

Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams
Lifa Sun, Hao Wang, Shiyin Kang, Kun Li, Helen Meng

Acoustic Analysis of Syllables Across Indian Languages
Anusha Prakash, Jeena J. Prakash, Hema A. Murthy

Objective Evaluation Methods for Chinese Text-To-Speech Systems
Teng Zhang, Zhipeng Chen, Ji Wu, Sam Lai, Wenhui Lei, Carsten Isert

Objective Evaluation Using Association Between Dimensions Within Spectral Features for Statistical Parametric Speech Synthesis
Yusuke Ijima, Taichi Asami, Hideyuki Mizuno

A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks
Takenori Yoshimura, Gustav Eje Henter, Oliver Watts, Mirjam Wester, Junichi Yamagishi, Keiichi Tokuda

Text-to-Speech for Individuals with Vision Loss: A User Study
Monika Podsiadło, Shweta Chahar

Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural Networks
Cassia Valentini-Botinhao, Xin Wang, Shinji Takaki, Junichi Yamagishi

Data Selection and Adaptation for Naturalness in HMM-Based Speech Synthesis
Erica Cooper, Alison Chang, Yocheved Levitan, Julia Hirschberg


Topics in Speech Processing


A Portable Automatic PA-TA-KA Syllable Detection System to Derive Biomarkers for Neurological Disorders
Fei Tao, Louis Daudet, Christian Poellabauer, Sandra L. Schneider, Carlos Busso

Deep Neural Networks for i-Vector Language Identification of Short Utterances in Cars
Omid Ghahabi, Antonio Bonafonte, Javier Hernando, Asunción Moreno

Improving i-Vector and PLDA Based Speaker Clustering with Long-Term Features
Abraham Woubie, Jordi Luque, Javier Hernando


Show & Tell Session 1


Open Language Interface for Voice Exploitation (OLIVE)
Aaron Lawson, Mitchell McLaren, Harry Bratt, Martin Graciarena, Horacio Franco, Christopher George, Allen Stauffer, Chris Bartels, Julien VanHout

A Multimodal Dialogue System for Air Traffic Control Trainees Based on Discrete-Event Simulation
Luboš Šmídl, Adam Chýlek, Jan Švec

Lig-Aikuma: A Mobile App to Collect Parallel Speech for Under-Resourced Language Studies
Elodie Gauthier, David Blachon, Laurent Besacier, Guy-Noël Kouarata, Martine Adda-Decker, Annie Rialland, Gilles Adda, Grégoire Bachman

ARET — Automatic Reading of Educational Texts for Visually Impaired Students
Martin Grůber, Jindřich Matoušek, Zdeněk Hanzlíček, Zdeněk Krňoul, Zbyněk Zajíc


New Trends in Neural Networks for Speech Recognition


Segmental Recurrent Neural Networks for End-to-End Speech Recognition
Liang Lu, Lingpeng Kong, Chris Dyer, Noah A. Smith, Steve Renals

Acoustic Modeling Using Bidirectional Gated Recurrent Convolutional Units
Markus Nussbaum-Thom, Jia Cui, Bhuvana Ramabhadran, Vaibhava Goel

Exploiting Depth and Highway Connections in Convolutional Recurrent Deep Neural Networks for Speech Recognition
Wei-Ning Hsu, Yu Zhang, Ann Lee, James Glass

Stimulated Deep Neural Network for Speech Recognition
Chunyang Wu, Penny Karanasou, Mark J.F. Gales, Khe Chai Sim

Phonetic Context Embeddings for DNN-HMM Phone Recognition
Leonardo Badino

Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
Ying Zhang, Mohammad Pezeshki, Philémon Brakel, Saizheng Zhang, César Laurent, Yoshua Bengio, Aaron Courville


Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances


Joint Speaker and Lexical Modeling for Short-Term Characterization of Speaker
Guangsen Wang, Kong Aik Lee, Trung Hieu Nguyen, Hanwu Sun, Bin Ma

Tandem Features for Text-Dependent Speaker Verification on the RedDots Corpus
Md Jahangir Alam, Patrick Kenny, Vishwa Gupta

Text Dependent Speaker Verification Using Un-Supervised HMM-UBM and Temporal GMM-UBM
Achintya Kr. Sarkar, Zheng-Hua Tan

Utterance Verification for Text-Dependent Speaker Recognition: A Comparative Assessment Using the RedDots Corpus
Tomi Kinnunen, Md. Sahidullah, Ivan Kukanov, Héctor Delgado, Massimiliano Todisco, Achintya Kr. Sarkar, Nicolai Bæk Thomsen, Ville Hautamäki, Nicholas Evans, Zheng-Hua Tan

Parallel Speaker and Content Modelling for Text-Dependent Speaker Verification
Jianbo Ma, Saad Irtza, Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah

i-Vector/HMM Based Text-Dependent Speaker Verification System for RedDots Challenge
Hossein Zeinali, Hossein Sameti, Lukáš Burget, Jan Černocký, Nooshin Maghsoodi, Pavel Matějka

Exploring Session Variability and Template Aging in Speaker Verification for Fixed Phrase Short Utterances
Rohan Kumar Das, Sarfaraz Jelil, S.R. Mahadeva Prasanna


Articulatory Measurements and Analysis


Prediction of the Articulatory Movements of Unseen Phonemes of a Speaker Using the Speech Structure of Another Speaker
Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu

Vocal Tract Length Normalization for Speaker Independent Acoustic-to-Articulatory Speech Inversion
Ganesh Sivaraman, Vikramjit Mitra, Hosung Nam, Mark Tiede, Carol Espy-Wilson

Investigation of Speed-Accuracy Tradeoffs in Speech Production Using Real-Time Magnetic Resonance Imaging
Adam C. Lammert, Christine H. Shadle, Shrikanth S. Narayanan, Thomas F. Quatieri

Characterizing Vocal Tract Dynamics Across Speakers Using Real-Time MRI
Tanner Sorensen, Asterios Toutios, Louis Goldstein, Shrikanth S. Narayanan

Tracking Contours of Orofacial Articulators from Real-Time MRI of Speech
Mathieu Labrunie, Pierre Badin, Dirk Voit, Arun A. Joseph, Laurent Lamalle, Coriandre Vilain, Louis-Jean Boë, Jens Frahm

State-of-the-Art MRI Protocol for Comprehensive Assessment of Vocal Tract Structure and Function
Sajan Goud Lingala, Asterios Toutios, Johannes Töger, Yongwan Lim, Yinghua Zhu, Yoon-Chul Kim, Colin Vaz, Shrikanth S. Narayanan, Krishna S. Nayak


Automatic Assessment of Emotions


DBN-ivector Framework for Acoustic Emotion Recognition
Rui Xia, Yang Liu

An Investigation of Emotional Speech in Depression Classification
Brian Stasak, Julien Epps, Nicholas Cummins, Roland Goecke

Retrieving Categorical Emotions Using a Probabilistic Framework to Define Preference Learning Samples
Reza Lotfian, Carlos Busso

At the Border of Acoustics and Linguistics: Bag-of-Audio-Words for the Recognition of Emotions in Speech
Maximilian Schmitt, Fabien Ringeval, Björn Schuller

Speech Emotion Recognition Using Affective Saliency
Arodami Chorianopoulou, Polychronis Koutsakis, Alexandros Potamianos

Laughter Valence Prediction in Motivational Interviewing Based on Lexical and Acoustic Cues
Rahul Gupta, Nishant Nath, Taruna Agrawal, Panayiotis Georgiou, David C. Atkins, Shrikanth S. Narayanan


Special Session: Auditory-Visual Expressive Speech and Gesture in Humans and Machines


Generating Natural Video Descriptions via Multimodal Processing
Qin Jin, Junwei Liang, Xiaozhu Lin

Feature-Level Decision Fusion for Audio-Visual Word Prominence Detection
Martin Heckmann

Acoustic and Visual Analysis of Expressive Speech: A Case Study of French Acted Speech
Slim Ouni, Vincent Colotte, Sara Dahmani, Soumaya Azzi

Characterization of Audiovisual Dramatic Attitudes
Adela Barbulescu, Rémi Ronfard, Gérard Bailly

Conversational Engagement Recognition Using Auditory and Visual Cues
Yuyun Huang, Emer Gilmartin, Nick Campbell

An Acoustic Analysis of Child-Child and Child-Robot Interactions for Understanding Engagement during Speech-Controlled Computer Games
Theodora Chaspari, Jill Fain Lehman

Auditory-Visual Lexical Tone Perception in Thai Elderly Listeners with and without Hearing Impairment
Benjawan Kasisopa, Chutamanee Onsuwan, Charturong Tantibundhit, Nittayapa Klangpornkun, Suparak Techacharoenrungrueang, Sudaporn Luksaneeyanawin, Denis Burnham

Use of Agreement/Disagreement Classification in Dyadic Interactions for Continuous Emotion Recognition
Hossein Khaki, Engin Erzin


Special Session: Intelligibility Under the Microscope


Microscopic Multilingual Matrix Test Predictions Using an ASR-Based Speech Recognition Model
Marc René Schädler, David Hülsmeier, Anna Warzybok, Sabine Hochmuth, Birger Kollmeier

DNN-Based Automatic Speech Recognition as a Model for Human Phoneme Perception
Mats Exter, Bernd T. Meyer

Undoing Misperceptions: A Microscopic Analysis of Consistent Confusions Through Signal Modifications
Attila Máté Tóth, Martin Cooke

Blind Non-Intrusive Speech Intelligibility Prediction Using Twin-HMMs
Mahdie Karbasi, Ahmed Hussen Abdelaziz, Hendrik Meutzner, Dorothea Kolossa

Misperceptions Arising from Speech-in-Babble Interactions
Attila Máté Tóth, Martin Cooke, Jon Barker

Introducing Temporal Rate Coding for Speech in Cochlear Implants: A Microscopic Evaluation in Humans and Models
Anja Eichenauer, Mathias Dietz, Bernd T. Meyer, Tim Jürgens

Language Effects in Noise-Induced Word Misperceptions
Maria Luisa Garcia Lecumberri, Jon Barker, Ricard Marxer, Martin Cooke

Speech Reductions Cause a De-Weighting of Secondary Acoustic Cues
Léo Varnet, Fanny Meunier, Michel Hoen

Using Phonologically Weighted Levenshtein Distances for the Prediction of Microscopic Intelligibility
Lionel Fontan, Isabelle Ferrané, Jérôme Farinas, Julien Pinquier, Xavier Aumont

The Impact of Manner of Articulation on the Intelligibility of Voicing Contrast in Noise: Cross-Linguistic Implications
Mayuki Matsui

Directly Comparing the Listening Strategies of Humans and Machines
Michael I. Mandel


Spoken Documents, Spoken Understanding and Semantic Analysis


LSTM-Based NeuroCRFs for Named Entity Recognition
Marc-Antoine Rondeau, Yi Su

Exploring Word Mover’s Distance and Semantic-Aware Embedding Techniques for Extractive Broadcast News Summarization
Shih-Hung Liu, Kuan-Yu Chen, Yu-Lun Hsieh, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, Wen-Lian Hsu

Improved Neural Bag-of-Words Model to Retrieve Out-of-Vocabulary Words in Speech Recognition
Imran Sheikh, Irina Illina, Dominique Fohr, Georges Linarès

Beyond Utterance Extraction: Summary Recombination for Speech Summarization
Jérémy Trione, Benoit Favre, Frederic Bechet

Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling
Bing Liu, Ian Lane

Domain Adaptation of Recurrent Neural Networks for Natural Language Understanding
Aaron Jaech, Larry Heck, Mari Ostendorf

LatticeRnn: Recurrent Neural Networks Over Lattices
Faisal Ladhak, Ankur Gandhe, Markus Dreyer, Lambert Mathias, Ariya Rastrow, Björn Hoffmeister

Learning Document Representations Using Subspace Multinomial Model
Santosh Kesiraju, Lukáš Burget, Igor Szőke, Jan Černocký

Attention-Based Convolutional Neural Networks for Sentence Classification
Zhiwei Zhao, Youzheng Wu

Spoken Language Understanding in a Latent Topic-Based Subspace
Mohamed Morchid, Mohamed Bouaziz, Waad Ben Kheder, Killian Janod, Pierre-Michel Bousquet, Richard Dufour, Georges Linarès

Multi-Domain Joint Semantic Frame Parsing Using Bi-Directional RNN-LSTM
Dilek Hakkani-Tür, Gokhan Tur, Asli Celikyilmaz, Yun-Nung Chen, Jianfeng Gao, Li Deng, Ye-Yi Wang

Deep Stacked Autoencoders for Spoken Language Understanding
Killian Janod, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori

Labeled Data Generation with Encoder-Decoder LSTM for Semantic Slot Filling
Gakuto Kurata, Bing Xiang, Bowen Zhou

Exploring the Correlation of Pitch Accents and Semantic Slots for Spoken Language Understanding
Sabrina Stehwien, Ngoc Thang Vu

Analysis on Gated Recurrent Unit Based Question Detection Approach
Yaodong Tang, Zhiyong Wu, Helen Meng, Mingxing Xu, Lianhong Cai


Spoken Term Detection


Combining State-Level Spotting and Posterior-Based Acoustic Match for Improved Query-by-Example Spoken Term Detection
Shuji Oishi, Tatsuya Matsuba, Mitsuaki Makino, Atsuhiko Kai

A Novel Discriminative Score Calibration Method for Keyword Search
Zhiqiang Lv, Meng Cai, Wei-Qiang Zhang, Jia Liu

Segmented Dynamic Time Warping for Spoken Query-by-Example Search
Jorge Proença, Fernando Perdigão

Generating Complementary Acoustic Model Spaces in DNN-Based Sequence-to-Frame DTW Scheme for Out-of-Vocabulary Spoken Term Detection
Shi-wook Lee, Kazuyo Tanaka, Yoshiaki Itoh

Multi-Task Learning and Weighted Cross-Entropy for DNN-Based Keyword Spotting
Sankaran Panchapagesan, Ming Sun, Aparna Khare, Spyros Matsoukas, Arindam Mandal, Björn Hoffmeister, Shiv Vitaladevuni

Audio Word2Vec: Unsupervised Learning of Audio Segment Representations Using Sequence-to-Sequence Autoencoder
Yu-An Chung, Chao-Chung Wu, Chia-Hao Shen, Hung-Yi Lee, Lin-Shan Lee

Non-Uniform Boosted MCE Training of Deep Neural Networks for Keyword Spotting
Zhong Meng, Biing-Hwang Juang

Language Model Data Augmentation for Keyword Spotting in Low-Resourced Training Conditions
Arseniy Gorin, Rasa Lileikytė, Guangpu Huang, Lori Lamel, Jean-Luc Gauvain, Antoine Laurent


Show & Tell Session 2


STON: Efficient Subtitling in Dutch Using State-of-the-Art Tools
Lyan Verwimp, Brecht Desplanques, Kris Demuynck, Joris Pelemans, Marieke Lycke, Patrick Wambacq

An Automatic Training Tool for Air Traffic Control Training
Petr Stanislav, Luboš Šmídl, Jan Švec

Digitala: An Augmented Test and Review Process Prototype for High-Stakes Spoken Foreign Language Examination
Reima Karhila, Aku Rouhe, Peter Smit, André Mansikkaniemi, Heini Kallio, Erik Lindroos, Raili Hildén, Martti Vainio, Mikko Kurimo

Exploring Collections of Multimedia Archives Through Innovative Interfaces in the Context of Digital Humanities
Géraldine Damnati, Delphine Charlet, Marc Denjean


Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge


The Speakers in the Wild (SITW) Speaker Recognition Database
Mitchell McLaren, Luciana Ferrer, Diego Castan, Aaron Lawson

The 2016 Speakers in the Wild Speaker Recognition Evaluation
Mitchell McLaren, Luciana Ferrer, Diego Castan, Aaron Lawson

Analysis of Speaker Recognition Systems in Realistic Scenarios of the SITW 2016 Challenge
Ondřej Novotný, Pavel Matějka, Oldřich Plchot, Ondřej Glembek, Lukáš Burget, Jan Černocký

A Speaker Recognition System for the SITW Challenge
Oleg Kudashev, Sergey Novoselov, Konstantin Simonchik, Alexandr Kozlov

Speakers In The Wild (SITW): The QUT Speaker Recognition System
H. Ghaemmaghami, M.H. Rahman, Ivan Himawan, David Dean, Ahilan Kanagasundaram, Sridha Sridharan, Clinton Fookes

AUT System for SITW Speaker Recognition Challenge
Abbas Khosravani, Mohammad Mehdi Homayounpour

LIA System for the SITW Speaker Recognition Challenge
Waad Ben Kheder, Moez Ajili, Pierre-Michel Bousquet, Driss Matrouf, Jean-François Bonastre

Investigating Various Diarization Algorithms for Speaker in the Wild (SITW) Speaker Recognition Challenge
Yi Liu, Yao Tian, Liang He, Jia Liu


Behavioral Signal Processing and Speaker State and Traits Analytics


Privacy-Preserving Speech Analytics for Automatic Assessment of Student Collaboration
Nikoletta Bassiou, Andreas Tsiartas, Jennifer Smith, Harry Bratt, Colleen Richey, Elizabeth Shriberg, Cynthia D’Angelo, Nonye Alozie

Complexity in Prosody: A Nonlinear Dynamical Systems Approach for Dyadic Conversations; Behavior and Outcomes in Couples Therapy
Md. Nasir, Brian Baucom, Shrikanth S. Narayanan, Panayiotis Georgiou

Couples Behavior Modeling and Annotation Using Low-Resource LSTM Language Models
Shao-Yen Tseng, Sandeep Nallan Chakravarthula, Brian Baucom, Panayiotis Georgiou

Speech Likability and Personality-Based Social Relations: A Round-Robin Analysis over Communication Channels
Laura Fernández Gallardo, Benjamin Weiss

Behavioral Coding of Therapist Language in Addiction Counseling Using Recurrent Neural Networks
Bo Xiao, Doğan Can, James Gibson, Zac E. Imel, David C. Atkins, Panayiotis Georgiou, Shrikanth S. Narayanan

Factor Analysis Based Speaker Normalisation for Continuous Emotion Prediction
Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah


Spoken Term Detection


Subspace Detection of DNN Posterior Probabilities via Sparse Representation for Query by Example Spoken Term Detection
Dhananjay Ram, Afsaneh Asaei, Hervé Bourlard

Unsupervised Bottleneck Features for Low-Resource Query-by-Example Spoken Term Detection
Hongjie Chen, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li

A Nonparametric Bayesian Approach for Spoken Term Detection by Example Query
Amir Hossein Harati Nejad Torbati, Joseph Picone

Rescoring Hypothesized Detections of Out-of-Vocabulary Keywords Using Subword Samples
Van Tung Pham, Haihua Xu, Xiong Xiao, Nancy F. Chen, Eng Siong Chng, Haizhou Li

Unrestricted Vocabulary Keyword Spotting Using LSTM-CTC
Yimeng Zhuang, Xuankai Chang, Yanmin Qian, Kai Yu

Interactive Spoken Content Retrieval by Deep Reinforcement Learning
Yen-Chen Wu, Tzu-Hsiang Lin, Yang-De Chen, Hung-Yi Lee, Lin-Shan Lee


Acoustic and Articulatory Phonetics


Vowels and Diphthongs in Cangnan Southern Min Chinese Dialect
Fang Hu, Chunyu Ge

Diphthongization of Nuclear Vowels and the Emergence of a Tetraphthong in Hetang Cantonese
Wenqi Hu, Fang Hu, Jian Jin

PhonVoc: A Phonetic and Phonological Vocoding Toolkit
Milos Cernak, Philip N. Garner

Vowels and Diphthongs in the Taiyuan Jin Chinese Dialect
Liping Xia, Fang Hu

The Effects of Prosody on French V-to-V Coarticulation: A Corpus-Based Study
Giuseppina Turco, Cécile Fougeron, Nicolas Audibert

An Acoustic Analysis of /r/ in Tyrolean
Vincenzo Galatà, Lorenzo Spreafico, Alessandro Vietti, Constantijn Kaland

Hyperarticulated Production of Korean Glides by Age Group
Seung-Eun Chang, Minsook Kim

Coda Stop and Taiwan Min Checked Tone Sound Changes
Ho-hsien Pan, Hsiao-tung Huang, Shao-ren Lyu


Prosody, Phonation and Voice Quality


The Influence of Modality and Speaking Style on the Assimilation Type and Categorization Consistency of Non-Native Speech
Sarah E. Fenwick, Catherine T. Best, Chris Davis, Michael D. Tyler

Prosodic Convergence with Spoken Stimuli in Laboratory Data
Margaret Zellers

Effects of Stress on Fricatives: Evidence from Standard Modern Greek
Charalambos Themistocleous, Angelandria Savva, Andrie Aristodemou

Analysis of Chinese Syllable Durations in Running Speech of Japanese L2 Learners
Yue Sun, Shudon Hsiao, Yoshinori Sagisaka, Jinsong Zhang

Automatic Paragraph Segmentation with Lexical and Prosodic Features
Catherine Lai, Mireia Farrús, Johanna D. Moore

Automatic Glottal Inverse Filtering with Non-Negative Matrix Factorization
Manu Airaksinen, Lauri Juvela, Tom Bäckström, Paavo Alku

Speaker Identity and Voice Quality: Modeling Human Responses and Automatic Speaker Recognition
Soo Jin Park, Caroline Sigouin, Jody Kreiman, Patricia Keating, Jinxi Guo, Gary Yeung, Fang-Yu Kuo, Abeer Alwan

Analysis of Glottal Stop in Assam Sora Language
Sishir Kalita, Luke Horo, Priyankoo Sarmah, S.R. Mahadeva Prasanna, S. Dandapat

Acoustic Differences Between English /t/ Glottalization and Phrasal Creak
Marc Garellek, Scott Seyfarth

The Acoustics of Lexical Stress in Italian as a Function of Stress Level and Speaking Style
Anders Eriksson, Pier Marco Bertinetto, Mattias Heldner, Rosalba Nodari, Giovanna Lenoci

Cross-Gender and Cross-Dialect Tone Recognition for Vietnamese
Antje Schweitzer, Ngoc Thang Vu

Prosody Modification Using Allpass Residual of Speech Signals
Karthika Vijayan, K. Sri Rama Murty

Analyzing the Contribution of Top-Down Lexical and Bottom-Up Acoustic Cues in the Detection of Sentence Prominence
Sofoklis Kakouros, Joris Pelemans, Lyan Verwimp, Patrick Wambacq, Okko Räsänen

A Longitudinal Study of Children’s Intonation in Narrative Speech
Jeffrey Kallay, Melissa A. Redford


Speech Production Analysis and Modeling


Velum Control for Oral Sounds
Reed Blaylock, Louis Goldstein, Shrikanth S. Narayanan

F0 Development in Acquiring Korean Stop Distinction
Gayeon Son

Phonetic Reduction Can Lead to Lengthening, and Enhancement Can Lead to Shortening
Clara Cohen, Matt Carlson

Mechanical Production of [b], [m] and [w] Using Controlled Labial and Velopharyngeal Gestures
Takayuki Arai

An Improved 3D Geometric Tongue Model
Qiang Fang, Yun Chen, Haibo Wang, Jianguo Wei, Jianrong Wang, Xiyu Wu, Aijun Li

Congruency Effect Between Articulation and Grasping in Native English Speakers
Mikko Tiainen, Fatima M. Felisberti, Kaisa Tiippana, Martti Vainio, Juraj Simko, Jiri Lukavsky, Lari Vainio

Emergence of Vocal Developmental Sequences in a Predictive Coding Model of Speech Acquisition
Shamima Najnin, Bonny Banerjee

Categorization of Natural Spanish Whistled Vowels by Naïve Spanish Listeners
Julien Meyer, Laure Dentel, Fanny Meunier

Between- and Within-Speaker Effects of Bilingualism on F0 Variation
Rob Voigt, Dan Jurafsky, Meghan Sumner

Vowel Characteristics in the Assessment of L2 English Pronunciation
Calbert Graham, Paula Buttery, Francis Nolan

Kulning (Swedish Cattle Calls): Acoustic, EGG, Stroboscopic and High-Speed Video Analyses of an Unusual Singing Style
Ahmed Geneid, Anne-Maria Laukkanen, Anita McAllister, Robert Eklund

Glottal Squeaks in VC Sequences
Míša Hejná, Pertti Palo, Scott Moisik

Automatic Pronunciation Generation by Utilizing a Semi-Supervised Deep Neural Networks
Naoya Takahashi, Tofigh Naghibi, Beat Pfister


Spoken Dialogue Systems


Personalized Natural Language Understanding
Xiaohu Liu, Ruhi Sarikaya, Liang Zhao, Yong Ni, Yi-Cheng Pan

A Sequence-to-Sequence Model for User Simulation in Spoken Dialogue Systems
Layla El Asri, Jing He, Kaheer Suleman

Root Cause Analysis of Miscommunication Hotspots in Spoken Dialogue Systems
Spiros Georgiladakis, Georgia Athanasopoulou, Raveesh Meena, José Lopes, Arodami Chorianopoulou, Elisavet Palogiannidi, Elias Iosif, Gabriel Skantze, Alexandros Potamianos

Making Personal Digital Assistants Aware of What They Do Not Know
Omar Zia Khan, Ruhi Sarikaya

Implementing Acoustic-Prosodic Entrainment in a Conversational Avatar
Rivka Levitan, Štefan Beňuš, Ramiro H. Gálvez, Agustín Gravano, Florencia Savoretti, Marian Trnka, Andreas Weise, Julia Hirschberg

Perceived Usability and Cognitive Demand of Secondary Tasks in Spoken Versus Visual-Manual Automotive Interaction
Annika Silvervarg, Sofia Lindvall, Jonatan Andersson, Ida Esberg, Christian Jernberg, Filip Frumerie, Arne Jönsson


Show & Tell Session 3


Zara: An Empathetic Interactive Virtual Agent
Pascale Fung, Anik Dey, Farhad Bin Siddique, Ruixi Lin, Yang Yang, Wan Yan, Ricky Ho Yin Chan

Measuring Pronunciation Improvement in Users of CAPT Tool TipTopTalk!
Cristian Tejedor-García, David Escudero-Mancebo, Enrique Cámara-Arenas, César González-Ferreras, Valentín Cardeñoso-Payo

SparkNG: Interactive MATLAB Tools for Introduction to Speech Production, Perception and Processing Fundamentals and Application of the Aliasing-Free L-F Model Component
Hideki Kawahara

Real-Time Tracking of Speakers’ Emotions, States, and Traits on Mobile Platforms
Erik Marchi, Florian Eyben, Gerhard Hagerer, Björn Schuller


Special Event: Mindfulness


Mindfulness Special Event
Nikki Mirghafori


Keynote 2: Edward Chang


The Human Speech Cortex
Edward Chang


Special Event: Speaker Comparison for Forensic and Investigative Applications II


Speaker Comparison for Forensic and Investigative Applications II
Jean-François Bonastre, Joseph P. Campbell, Anders Eriksson, Hiro Nakasone, Reva Schwartz


Special Session: Clinical and Neuroscience-Inspired Vocal Biomarkers of Neurological and Psychiatric Disorders


Acoustic-Prosodic and Turn-Taking Features in Interactions with Children with Neurodevelopmental Disorders
Daniel Bone, Somer Bishop, Rahul Gupta, Sungbok Lee, Shrikanth S. Narayanan

Automatic Detection of Parkinson’s Disease Based on Modulated Vowels
Daria Hemmerling, Juan Rafael Orozco-Arroyave, Andrzej Skalski, Janusz Gajda, Elmar Nöth

Towards Automatic Detection of Amyotrophic Lateral Sclerosis from Speech Acoustic and Articulatory Samples
Jun Wang, Prasanna V. Kothalkar, Beiming Cao, Daragh Heitzman

Neurophysiological Vocal Source Modeling for Biomarkers of Disease
Gregory Ciccarelli, Thomas F. Quatieri, Satrajit S. Ghosh

Relation of Automatically Extracted Formant Trajectories with Intelligibility Loss and Speaking Rate Decline in Amyotrophic Lateral Sclerosis
Rachelle L. Horwitz-Martin, Thomas F. Quatieri, Adam C. Lammert, James R. Williamson, Yana Yunusova, Elizabeth Godoy, Daryush D. Mehta, Jordan R. Green

Automatic Analysis of Typical and Atypical Encoding of Spontaneous Emotion in the Voice of Children
Fabien Ringeval, Erik Marchi, Charline Grossard, Jean Xavier, Mohamed Chetouani, David Cohen, Björn Schuller

Recognition of Depression in Bipolar Disorder: Leveraging Cohort and Person-Specific Knowledge
Soheil Khorram, John Gideon, Melvin McInnis, Emily Mower Provost

Diagnosing People with Dementia Using Automatic Conversation Analysis
Bahman Mirheidari, Daniel Blackburn, Markus Reuber, Traci Walker, Heidi Christensen


Special Session: Singing Synthesis Challenge: Fill-In the Gap


SERAPHIM: A Wavetable Synthesis System with 3D Lip Animation for Real-Time Speech and Singing Applications on Mobile Platforms
Paul Yaozhu Chan, Minghui Dong, Grace Xue Hui Ho, Haizhou Li

Expressive Singing Synthesis Based on Unit Selection for the Singing Synthesis Challenge 2016
Jordi Bonada, Martí Umbert, Merlijn Blaauw

Vocal Effort Modification for Singing Synthesis
Olivier Perrotin, Christophe d’Alessandro

Bertsokantari: A TTS Based Singing Synthesis System
Eder del Blanco, Inma Hernaez, Eva Navas, Xabier Sarasola, D. Erro

Evaluation of Singing Synthesis: Methodology and Case Study with Concatenative and Performative Systems
Lionel Feugère, Christophe d’Alessandro, Samuel Delalez, Luc Ardaillon, Axel Roebel

Expressive Control of Singing Voice Synthesis Using Musical Contexts and a Parametric F0 Model
Luc Ardaillon, Celine Chabot-Canet, Axel Roebel

Optimal Unit Stitching in a Unit Selection Singing Synthesis System
Marius Cotescu



Automatic Learning of Representations


Inferring Phonemic Classes from CNN Activation Maps Using Clustering Techniques
Thomas Pellegrini, Sandrine Mouysset

Joint Learning of Speaker and Phonetic Similarities with Siamese Networks
Neil Zeghidour, Gabriel Synnaeve, Nicolas Usunier, Emmanuel Dupoux

Unsupervised Learning of Acoustic Units Using Autoencoders and Kohonen Nets
Vikramjit Mitra, Dimitra Vergyri, Horacio Franco

Learning Multiscale Features Directly from Waveforms
Zhenyao Zhu, Jesse H. Engel, Awni Hannun

Supervised Learning of Acoustic Models in a Zero Resource Setting to Improve DPGMM Clustering
Michael Heck, Sakriani Sakti, Satoshi Nakamura

Semi-Supervised and Cross-Lingual Knowledge Transfer Learnings for DNN Hybrid Acoustic Models Under Low-Resource Conditions
Haihua Xu, Hang Su, Chongjia Ni, Xiong Xiao, Hao Huang, Eng Siong Chng, Haizhou Li


Language Modeling for Conversational Speech and Confidence Measures


Recurrent Out-of-Vocabulary Word Detection Using Distribution of Features
Taichi Asami, Ryo Masumura, Yushi Aono, Koichi Shinoda

Investigation of Semi-Supervised Acoustic Model Training Based on the Committee of Heterogeneous Neural Networks
Naoyuki Kanda, Shoji Harada, Xugang Lu, Hisashi Kawai

Acoustic Word Embeddings for ASR Error Detection
Sahar Ghannay, Yannick Estève, Nathalie Camelin, Paul Deléglise

Combining Semantic Word Classes and Sub-Word Unit Speech Recognition for Robust OOV Detection
Axel Horndasch, Anton Batliner, Caroline Kaufhold, Elmar Nöth

Web Data Selection Based on Word Embedding for Low-Resource Speech Recognition
Chuandong Xie, Wu Guo, Guoping Hu, Junhua Liu

Colloquialising Modern Standard Arabic Text for Improved Speech Recognition
Sarah Al-Shareef, Thomas Hain



Behavioral Signal Processing and Speaker State and Traits Analytics


Attention Assisted Discovery of Sub-Utterance Structure in Speech Emotion Recognition
Che-Wei Huang, Shrikanth S. Narayanan

Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition
Linchuan Li, Zhiyong Wu, Mingxing Xu, Helen Meng, Lianhong Cai

Inter-Speech Clicks in an Interspeech Keynote
Jürgen Trouvain, Zofia Malisz

Speaker Age Classification and Regression Using i-Vectors
Joanna Grzybowska, Stanisław Kacprzak

Sparsely Connected and Disjointly Trained Deep Neural Networks for Low Resource Behavioral Annotation: Acoustic Classification in Couples’ Therapy
Haoqi Li, Brian Baucom, Panayiotis Georgiou

Automatically Classifying Self-Rated Personality Scores from Speech
Guozhen An, Sarah Ita Levitan, Rivka Levitan, Andrew Rosenberg, Michelle Levine, Julia Hirschberg

Estimation of Children’s Physical Characteristics from Their Voices
Jill Fain Lehman, Rita Singh

Talking to a System and Talking to a Human: A Study from a Speech-to-Speech, Machine Translation Mediated Map Task
Hayakawa Akira, Saturnino Luz, Nick Campbell

Predicting Affective Dimensions Based on Self Assessed Depression Severity
Rahul Gupta, Shrikanth S. Narayanan

Enhancement of Automatic Oral Presentation Assessment System Using Latent N-Grams Word Representation and Part-of-Speech Information
Wen-Yu Huang, Shan-Wen Hsiao, Hung-Ching Sun, Ming-Chuan Hsieh, Ming-Hsueh Tsai, Chi-Chun Lee

Use of Vowels in Discriminating Speech-Laugh from Laughter and Neutral Speech
Sri Harsha Dumpala, P. Gangamohan, Suryakanth V. Gangashetty, B. Yegnanarayana

A Convex Model for Linguistic Influence in Group Conversations
Kan Kawabata, Visar Berisha, Anna Scaglione, Amy LaCross

A Deep Learning Approach to Modeling Empathy in Addiction Counseling
James Gibson, Doğan Can, Bo Xiao, Zac E. Imel, David C. Atkins, Panayiotis Georgiou, Shrikanth S. Narayanan

Unipolar Depression vs. Bipolar Disorder: An Elicitation-Based Approach to Short-Term Detection of Mood Disorder
Kun-Yi Huang, Chung-Hsien Wu, Yu-Ting Kuo, Fong-Lin Jang


Speech Synthesis Poster


Conditional Random Fields for the Tunisian Dialect Grapheme-to-Phoneme Conversion
Abir Masmoudi, Mariem Ellouze, Fethi Bougares, Yannick Estève, Lamia Belguith

Efficient Thai Grapheme-to-Phoneme Conversion Using CRF-Based Joint Sequence Modeling
Sittipong Saychum, Sarawoot Kongyoung, Anocha Rugchatjaroen, Patcharika Chootrakool, Sawit Kasuriya, Chai Wutiwiwatchai

An Articulatory-Based Singing Voice Synthesis Using Tongue and Lips Imaging
Aurore Jaumard-Hakoun, Kele Xu, Clémence Leboullenger, Pierre Roussel-Ragot, Bruce Denby

Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis
Xu Li, Zhiyong Wu, Helen Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai

Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data
Xu Li, Zhiyong Wu, Helen Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai

Audio-to-Visual Speech Conversion Using Deep Neural Networks
Sarah Taylor, Akihiro Kato, Iain Matthews, Ben Milner

Generative Acoustic-Phonemic-Speaker Model Based on Three-Way Restricted Boltzmann Machine
Toru Nakashika, Yasuhiro Minami

Articulatory Synthesis Based on Real-Time Magnetic Resonance Imaging Data
Asterios Toutios, Tanner Sorensen, Krishna Somandepalli, Rachel Alexander, Shrikanth S. Narayanan

Deep Neural Network Based Acoustic-to-Articulatory Inversion Using Phone Sequence Information
Xurong Xie, Xunying Liu, Lan Wang

Articulatory-to-Acoustic Conversion with Cascaded Prediction of Spectral and Excitation Features Using Neural Networks
Zheng-Chen Liu, Zhen-Hua Ling, Li-Rong Dai

Generating Gestural Scores from Acoustics Through a Sparse Anchor-Based Representation of Speech
Christopher Liberatore, Ricardo Gutierrez-Osuna

On the Suitability of Vocalic Sandwiches in a Corpus-Based TTS Engine
David Guennec, Damien Lolive

Unsupervised Stress Information Labeling Using Gaussian Process Latent Variable Model for Statistical Speech Synthesis
Decha Moungsri, Tomoki Koriyama, Takao Kobayashi

Using Zero-Frequency Resonator to Extract Multilingual Intonation Structure
Jinfu Ni, Yoshinori Shiga, Hisashi Kawai


Resources and Annotation of Resources


A DNN-HMM Approach to Story Segmentation
Jia Yu, Xiong Xiao, Lei Xie, Eng Siong Chng, Haizhou Li

The SIWIS Database: A Multilingual Speech Database with Acted Emphasis
Jean-Philippe Goldman, Pierre-Edouard Honnet, Rob Clark, Philip N. Garner, Maria Ivanova, Alexandros Lazaridis, Hui Liang, Tiago Macedo, Beat Pfister, Manuel Sam Ribeiro, Eric Wehrli, Junichi Yamagishi

Open Source Speech and Language Resources for Frisian
Emre Yılmaz, Henk van den Heuvel, Jelske Dijkstra, Hans Van de Velde, Frederik Kampstra, Jouke Algra, David Van Leeuwen

The SRI CLEO Speaker-State Corpus
Andreas Kathol, Elizabeth Shriberg, Massimiliano de Zambotti

SingaKids-Mandarin: Speech Corpus of Singaporean Children Speaking Mandarin Chinese
Nancy F. Chen, Rong Tong, Darren Wee, Peixuan Lee, Bin Ma, Haizhou Li

The SRI Speech-Based Collaborative Learning Corpus
Colleen Richey, Cynthia D’Angelo, Nonye Alozie, Harry Bratt, Elizabeth Shriberg

An Expectation Maximization Approach to Joint Modeling of Multidimensional Ratings Derived from Multiple Annotators
Anil Ramakrishna, Rahul Gupta, Ruth B. Grossman, Shrikanth S. Narayanan

Voting Detector: A Combination of Anomaly Detectors to Reveal Annotation Errors in TTS Corpora
Jindřich Matoušek, Daniel Tihelka


Show & Tell Session 4


The Magic Stone: A Video Game to Improve Communication Skills of People with Intellectual Disabilities
Mario Corrales-Astorgano, David Escudero-Mancebo, César González-Ferreras, Yurena Gutiérrez-González, Valle Flores-Lucas, Valentín Cardeñoso-Payo, Lourdes Aguilar-Cuevas

Identifying Perceptually Similar Voices with a Speaker Recognition System Using Auto-Phonetic Features
Finnian Kelly, Anil Alexander, Oscar Forth, Samuel Kent, Jonas Lindh, Joel Åkesson

A Real-Time Framework for Visual Feedback of Articulatory Data Using Statistical Shape Models
Kristy James, Alexander Hewer, Ingmar Steiner, Stefanie Wuhrer

Flexible, Rapid Authoring of Goal-Orientated, Multi-Turn Dialogues Using the Task Completion Platform
Alex Marin, Paul Crook, Omar Zia Khan, Vasiliy Radostev, Khushboo Aggarwal, Ruhi Sarikaya


Acoustic Model Adaptation


Context Adaptive Neural Network for Rapid Adaptation of Deep CNN Based Acoustic Models
Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Takuya Yoshioka, Dung T. Tran, Tomohiro Nakatani

Transfer Learning with Bottleneck Feature Networks for Whispered Speech Recognition
Boon Pang Lim, Faith Wong, Yuyao Li, Jia Wei Bay

Adaptation of Neural Networks Constrained by Prior Statistics of Node Co-Activations
Tasha Nagamine, Zhuo Chen, Nima Mesgarani

Domain Adaptation of CNN Based Acoustic Models Under Limited Resource Settings
Masayuki Suzuki, Ryuki Tachibana, Samuel Thomas, Bhuvana Ramabhadran, George Saon

Subspace LHUC for Fast Adaptation of Deep Neural Network Acoustic Models
Lahiru Samarakoon, Khe Chai Sim

Improving Children’s Speech Recognition Through Out-of-Domain Data Augmentation
Joachim Fainberg, Peter Bell, Mike Lincoln, Steve Renals


Special Session: Sharing Research and Education Resources for Understanding Speech Processing


Virtual Machines and Containers as a Platform for Experimentation
Florian Metze, Eric Riebling, Anne S. Warlaumont, Elika Bergelson

CloudCAST — Remote Speech Technology for Speech Professionals
Phil Green, Ricard Marxer, Stuart Cunningham, Heidi Christensen, Frank Rudzicz, Maria Yancheva, André Coy, Massimiliano Malavasi, Lorenzo Desideri, Fabio Tamburini

webASR 2 — Improved Cloud Based Speech Technology
Thomas Hain, Jeremy Christian, Oscar Saz, Salil Deena, Madina Hasan, Raymond W.M. Ng, Rosanna Milner, Mortaza Doulaty, Yulan Liu

Sharing Speech Synthesis Software for Research and Education Within Low-Tech and Low-Resource Communities
Andrew R. Plummer, Mary E. Beckman

The Berkeley Phonetics Machine
Ronald L. Sprouse, Keith Johnson

Experiences with Shared Resources for Research and Education in Speech and Language Processing
Rebecca Bates, Eric Fosler-Lussier, Florian Metze, Martha Larson, Gina-Anne Levow, Emily Mower Provost


Special Session: Voice Conversion Challenge


The Voice Conversion Challenge 2016
Tomoki Toda, Ling-Hui Chen, Daisuke Saito, Fernando Villavicencio, Mirjam Wester, Zhizheng Wu, Junichi Yamagishi

Analysis of the Voice Conversion Challenge 2016 Evaluation Results
Mirjam Wester, Zhizheng Wu, Junichi Yamagishi

The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion
Ling-Hui Chen, Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Li-Rong Dai

A Voice Conversion Mapping Function Based on a Stacked Joint-Autoencoder
Seyed Hamidreza Mohammadi, Alexander Kain

Locally Linear Embedding for Exemplar-Based Spectral Conversion
Yi-Chiao Wu, Hsin-Te Hwang, Chin-Cheng Hsu, Yu Tsao, Hsin-Min Wang

Applying Spectral Normalisation and Efficient Envelope Estimation and Statistical Transformation for the Voice Conversion Challenge 2016
Fernando Villavicencio, Junichi Yamagishi, Jordi Bonada, Felipe Espic

ML Parameter Generation with a Reformulated MGE Training Criterion — Participation in the Voice Conversion Challenge 2016
D. Erro, A. Alonso, L. Serrano, D. Tavarez, I. Odriozola, Xabier Sarasola, Eder del Blanco, J. Sanchez, I. Saratxaga, Eva Navas, Inma Hernaez

The NU-NAIST Voice Conversion System for the Voice Conversion Challenge 2016
Kazuhiro Kobayashi, Shinnosuke Takamichi, Satoshi Nakamura, Tomoki Toda



Robust Speaker Recognition and Anti-Spoofing


Integrated Spoofing Countermeasures and Automatic Speaker Verification: An Evaluation on ASVspoof 2015
Md. Sahidullah, Héctor Delgado, Massimiliano Todisco, Hong Yu, Tomi Kinnunen, Nicholas Evans, Zheng-Hua Tan

Cross-Database Evaluation of Audio-Based Spoofing Detection Systems
Pavel Korshunov, Sébastien Marcel

Investigation of Sub-Band Discriminative Information Between Spoofed and Genuine Speech
Kaavya Sriskandaraja, Vidhyasaharan Sethu, Phu Ngoc Le, Eliathamby Ambikairajah

An Investigation of Spoofing Speech Detection Under Additive Noise and Reverberant Conditions
Xiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li

Robust Speaker Recognition with Combined Use of Acoustic and Throat Microphone Speech
Md. Sahidullah, Rosa Gonzalez Hautamäki, Dennis Alexander Lehmann Thomsen, Tomi Kinnunen, Zheng-Hua Tan, Ville Hautamäki, Robert Parts, Martti Pitkänen

Statistical Modeling of Speaker’s Voice with Temporal Co-Location for Active Voice Authentication
Zhong Meng, Biing-Hwang Juang


Speech Enhancement and Applications


Joint Enhancement and Coding of Speech by Incorporating Wiener Filtering in a CELP Codec
Johannes Fischer, Tom Bäckström

Multi-Channel Linear Prediction Based on Binaural Coherence for Speech Dereverberation
Hong Liu, Xiuling Wang, Miao Sun, Cheng Pang

Single-Channel Speech Enhancement Using Double Spectrum
Martin Blass, Pejman Mowlaee, W. Bastiaan Kleijn

On the Appropriateness of Complex-Valued Neural Networks for Speech Enhancement
Lukas Drude, Bhiksha Raj, Reinhold Haeb-Umbach

Introducing the Turbo-Twin-HMM for Audio-Visual Speech Enhancement
Steffen Zeiler, Hendrik Meutzner, Ahmed Hussen Abdelaziz, Dorothea Kolossa

Assessing Speech Quality in Speech-Aware Hearing Aids Based on Phoneme Posteriorgrams
Constantin Spille, Hendrik Kayser, Hynek Hermansky, Bernd T. Meyer



Speaker Recognition


Analysis of Face Mask Effect on Speaker Recognition
Rahim Saeidi, Ilkka Huhtakallio, Paavo Alku

Data Selection for Within-Class Covariance Estimation
Elliot Singer, Tyler Campbell, Douglas Reynolds

Inter-Task System Fusion for Speaker Recognition
M. Ferras, Srikanth Madikeri, S. Dey, Petr Motlicek, Hervé Bourlard

Mahalanobis Metric Scoring Learned from Weighted Pairwise Constraints in I-Vector Speaker Recognition System
Zhenchun Lei, Yanhong Wan, Jian Luo, Yingen Yang

Novel Subband Autoencoder Features for Detection of Spoofed Speech
Meet H. Soni, Tanvina B. Patel, Hemant A. Patil

On the Issue of Calibration in DNN-Based Speaker Recognition Systems
Mitchell McLaren, Diego Castan, Luciana Ferrer, Aaron Lawson

Probabilistic Approach Using Joint Long and Short Session i-Vectors Modeling to Deal with Short Utterances for Speaker Recognition
Waad Ben Kheder, Driss Matrouf, Moez Ajili, Jean-François Bonastre

Short Utterance Variance Modelling and Utterance Partitioning for PLDA Speaker Verification
Ahilan Kanagasundaram, David Dean, Sridha Sridharan, Clinton Fookes, Ivan Himawan

Speaker-Dependent Dictionary-Based Speech Enhancement for Text-Dependent Speaker Verification
Nicolai Bæk Thomsen, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, Børge Lindberg, Søren Holdt Jensen

Text-Available Speaker Recognition System for Forensic Applications
Chengzhu Yu, Chunlei Zhang, Finnian Kelly, Abhijeet Sangwan, John H.L. Hansen

Transfer Learning for Speaker Verification on Short Utterances
Qingyang Hong, Lin Li, Lihong Wan, Jun Zhang, Feng Tong

Twin Model G-PLDA for Duration Mismatch Compensation in Text-Independent Speaker Verification
Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong Aik Lee

Universal Background Sparse Coding and Multilayer Bootstrap Network for Speaker Clustering
Xiao-Lei Zhang

Improving Deep Neural Networks Based Speaker Verification Using Unlabeled Data
Yao Tian, Meng Cai, Liang He, Wei-Qiang Zhang, Jia Liu


Decoding, System Combination


Maximum a posteriori Based Decoding for CTC Acoustic Models
Naoyuki Kanda, Xugang Lu, Hisashi Kawai

Phonetic and Phonological Posterior Search Space Hashing Exploiting Class-Specific Sparsity Structures
Afsaneh Asaei, Gil Luyet, Milos Cernak, Hervé Bourlard

Model Compression Applied to Small-Footprint Keyword Spotting
George Tucker, Minhua Wu, Ming Sun, Sankaran Panchapagesan, Gengshen Fu, Shiv Vitaladevuni

Why do ASR Systems Despite Neural Nets Still Depend on Robust Features
Angel Mario Castro Martinez, Marc René Schädler

An Adaptive Multi-Band System for Low Power Voice Command Recognition
Qing He, Gregory W. Wornell, Wei Ma

Memory-Efficient Modeling and Search Techniques for Hardware ASR Decoders
Michael Price, Anantha Chandrakasan, James Glass

Log-Linear System Combination Using Structured Support Vector Machines
J. Yang, Anton Ragni, Mark J.F. Gales, Kate M. Knill

Efficient Segmental Cascades for Speech Recognition
Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu

A WFST Framework for Single-Pass Multi-Stream Decoding
Sirui Xu, Eric Fosler-Lussier

Comparison of Multiple System Combination Techniques for Keyword Spotting
William Hartmann, Le Zhang, Kerri Barnes, Roger Hsiao, Stavros Tsakalidis, Richard Schwartz

Rescoring by Combination of Posteriorgram Score and Subword-Matching Score for Use in Query-by-Example
Masato Obara, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh

Phone Synchronous Decoding with CTC Lattice
Zhehuai Chen, Wei Deng, Tao Xu, Kai Yu


Special Session: Clinical and Neuroscience-Inspired Vocal Biomarkers of Neurological and Psychiatric Disorders


Speech Features for Depression Detection
Saurabh Sahu, Carol Espy-Wilson

Parkinson’s Disease Progression Assessment from Speech Using GMM-UBM
T. Arias-Vergara, J.C. Vasquez-Correa, Juan Rafael Orozco-Arroyave, J.F. Vargas-Bonilla, Elmar Nöth

Speech-Based Detection of Alzheimer’s Disease in Conversational German
Jochen Weiner, Christian Herff, Tanja Schultz

Cross-Cultural Depression Recognition from Vocal Biomarkers
Sharifa Alghowinem, Roland Goecke, Julien Epps, Michael Wagner, Jeffrey Cohn

Speech Recognition in Alzheimer’s Disease and in its Assessment
Luke Zhou, Kathleen C. Fraser, Frank Rudzicz

Does She Speak RTT? Towards an Earlier Identification of Rett Syndrome Through Intelligent Pre-Linguistic Vocalisation Analysis
Florian B. Pokorny, Peter B. Marschik, Christa Einspieler, Björn Schuller

Speech Rhythm in Parkinson’s Disease: A Study on Italian
Massimo Pettorino, Maria Grazia Busà, Elisa Pellegrino


Show & Tell Session 5


English Language Speech Assistant
Xavier Anguera, Vu Van

Remeeting — Deep Insights to Conversations
Allen Guo, Arlo Faria, Korbinian Riedhammer

SERAPHIM Live! — Singing Synthesis for the Performer, the Composer, and the 3D Game Developer
Paul Yaozhu Chan, Minghui Dong, Grace Xue Hui Ho, Haizhou Li

My-Own-Voice: A Web Service That Allows You to Create a Text-to-Speech Voice From Your Own Voice
Fabrice Malfrere, Olivier Deroo, Emmanuelle Franques, Jonathan Hourez, Nicolas Mazars, Vincent Pagel, Geoffrey Wilfart



Far-Field Speech Processing


Reducing the Computational Complexity of Multimicrophone Acoustic Models with Integrated Feature Extraction
Tara N. Sainath, Arun Narayanan, Ron J. Weiss, Ehsan Variani, Kevin W. Wilson, Michiel Bacchiani, Izhak Shafran

Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition
Bo Li, Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Michiel Bacchiani

Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks
Hakan Erdogan, John R. Hershey, Shinji Watanabe, Michael I. Mandel, Jonathan Le Roux

Channel Selection for Distant Speech Recognition Exploiting Cepstral Distance
Cristina Guerrero, Georgina Tryfou, Maurizio Omologo

Multichannel Spatial Clustering for Robust Far-Field Automatic Speech Recognition in Mismatched Conditions
Michael I. Mandel, Jon Barker

Far-Field ASR Without Parallel Data
Vijayaditya Peddinti, Vimal Manohar, Yiming Wang, Daniel Povey, Sanjeev Khudanpur


Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception, Sincerity & Native Language


The INTERSPEECH 2016 Computational Paralinguistics Challenge: Deception, Sincerity & Native Language
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini

The Deception Sub-Challenge: The Data
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini

Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection
Sarah Ita Levitan, Guozhen An, Min Ma, Rivka Levitan, Andrew Rosenberg, Julia Hirschberg

Is Deception Emotional? An Emotion-Driven Predictive Approach
Shahin Amiriparian, Jouni Pohjalainen, Erik Marchi, Sergey Pugachevskiy, Björn Schuller

Prosodic Cues and Answer Type Detection for the Deception Sub-Challenge
Claude Montacié, Marie-José Caraty

The Sincerity Sub-Challenge: The Data
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini

Automatic Estimation of Perceived Sincerity from Spoken Language
Brandon M. Booth, Rahul Gupta, Pavlos Papadopoulos, Ruchir Travadi, Shrikanth S. Narayanan

Estimating the Sincerity of Apologies in Speech by DNN Rank Learning and Prosodic Analysis
Gábor Gosztolya, Tamás Grósz, György Szaszák, László Tóth

Minimization of Regression and Ranking Losses with Shallow Neural Networks on Automatic Sincerity Evaluation
Hung-Shin Lee, Yu Tsao, Chi-Chun Lee, Hsin-Min Wang, Wei-Cheng Lin, Wei-Chen Chen, Shan-Wen Hsiao, Shyh-Kang Jeng

Prediction of Deception and Sincerity from Speech Using Automatic Phone Recognition-Based Features
Robert Herms

Sincerity and Deception in Speech: Two Sides of the Same Coin? A Transfer- and Multi-Task Learning Perspective
Yue Zhang, Felix Weninger, Zhao Ren, Björn Schuller

Fusing Acoustic Feature Representations for Computational Paralinguistics Tasks
Heysem Kaya, Alexey A. Karpov


Special Session: Speech, Audio, and Language Processing Techniques Applied to Bird and Animal Vocalizations


Introduction
Naomi Harte, Peter Jančovič, Karl-L. Schuchmann

Poster Overview Presentations
Naomi Harte, Peter Jančovič, Karl-L. Schuchmann

Discussion
Naomi Harte, Peter Jančovič, Karl-L. Schuchmann

Closing Remarks
Naomi Harte, Peter Jančovič, Karl-L. Schuchmann


Dialogue Systems and Analysis of Dialogue


A Stochastic Model for Computer-Aided Human-Human Dialogue
Merwan Barlier, Romain Laroche, Olivier Pietquin

Highlighting Psychological Features for Predicting Child Interjections During Story Telling
Gaël Lejeune, François Rioult, Bruno Crémilleux

Hybrid Dialogue State Tracking for Real World Human-to-Human Dialogues
Kai Sun, Su Zhu, Lu Chen, Siqiu Yao, Xueyang Wu, Kai Yu

Automatic Recognition of Social Roles Using Long Term Role Transitions in Small Group Interactions
Gaurav Fotedar, Aditya Gaonkar P., Saikat Chatterjee, Prasanta Kumar Ghosh

On the Influence of Gender on Interruptions in Multiparty Dialogue
Paul Van Eecke, Raquel Fernández

Detection of User Escalation in Human-Computer Interactions
Ian Beaver, Cynthia Freeman


Interaction Between Speech Production and Perception


Assessing Idiosyncrasies in a Bayesian Model of Speech Communication
Marie-Lou Barnaud, Julien Diard, Pierre Bessière, Jean-Luc Schwartz

Prosodic and Linguistic Analysis of Semantic Fluency Data: A Window into Speech Production and Cognition
Maria K. Wolters, Najoung Kim, Jung-Ho Kim, Sarah E. MacPherson, Jong C. Park

Sensorimotor Response to Visual Imagery of Tongue Displacement
William F. Katz, Divya Prabhakaran

Does Auditory-Motor Learning of Speech Transfer from the CV Syllable to the CVCV Word?
Tiphaine Caudrelier, Pascal Perrier, Jean-Luc Schwartz, Amélie Rochet-Capellan

Exemplar Dynamics in Phonetic Convergence of Speech Rate
Antje Schweitzer, Michael Walsh

Articulation Rate in Adverse Listening Conditions in Younger and Older Adults
Outi Tuomainen, Valerie Hazan




Speaker Diarization and Recognition


Speaker Linking and Applications Using Non-Parametric Hashing Methods
Douglas E. Sturim, William M. Campbell

Iterative PLDA Adaptation for Speaker Diarization
Gaël Le Lan, Delphine Charlet, Anthony Larcher, Sylvain Meignier

A Speaker Diarization System for Studying Peer-Led Team Learning Groups
Harishchandra Dubey, Lakshmish Kaushik, Abhijeet Sangwan, John H.L. Hansen

DNN-Based Speaker Clustering for Speaker Diarisation
Rosanna Milner, Thomas Hain

On the Importance of Efficient Transition Modeling for Speaker Diarization
Itshak Lapidot, Jean-François Bonastre

Priors for Speaker Counting and Diarization with AHC
Gregory Sell, Alan McCree, Daniel Garcia-Romero

Two-Pass IB Based Speaker Diarization System Using Meeting-Specific ANN Based Features
Nauman Dawalatabad, Srikanth Madikeri, Chandra Sekhar C., Hema A. Murthy

DNN-Based Amplitude and Phase Feature Enhancement for Noise Robust Speaker Identification
Zeyan Oo, Yuta Kawakami, Longbiao Wang, Seiichi Nakagawa, Xiong Xiao, Masahiro Iwahashi

Unit-Selection Attack Detection Based on Unfiltered Frequency-Domain Features
Ulrich Scherhag, Andreas Nautsch, Christian Rathgeb, Christoph Busch

Investigating the Impact of Dialect Prestige on Lexical Decision
Mairym Lloréns Monteserín, Jason Zevin

Speaker Verification Using Short Utterances with DNN-Based Estimation of Subglottal Acoustic Features
Jinxi Guo, Gary Yeung, Deepak Muralidharan, Harish Arsikere, Amber Afshan, Abeer Alwan

Factor Analysis Based Speaker Verification Using ASR
Hang Su, Steven Wegmann

Joint Sound Source Separation and Speaker Recognition
Jeroen Zegers, Hugo Van hamme

Robust Multichannel Gender Classification from Speech in Movie Audio
Naveen Kumar, Md. Nasir, Panayiotis Georgiou, Shrikanth S. Narayanan


Speech Synthesis Poster


Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer
Xavi Gonzalvo, Siamak Tazari, Chun-an Chan, Markus Becker, Alexander Gutkin, Hanna Silen

First Step Towards End-to-End Parametric TTS Synthesis: Generating Spectral Parameters with Neural Attention
Wenfu Wang, Shuang Xu, Bo Xu

The Parameterized Phoneme Identity Feature as a Continuous Real-Valued Vector for Neural Network Based Speech Synthesis
Zhengqi Wen, Ya Li, Jianhua Tao

Improved Time-Frequency Trajectory Excitation Vocoder for DNN-Based Speech Synthesis
Eunwoo Song, Frank K. Soong, Hong-Goo Kang

Voice Quality Control Using Perceptual Expressions for Statistical Parametric Speech Synthesis Based on Cluster Adaptive Training
Yamato Ohtani, Koichiro Mori, Masahiro Morita

Waveform Generation Based on Signal Reshaping for Statistical Parametric Speech Synthesis
Felipe Espic, Cassia Valentini-Botinhao, Zhizheng Wu, Simon King

Speaker Representations for Speaker Adaptation in Multiple Speakers’ BLSTM-RNN-Based Speech Synthesis
Yi Zhao, Daisuke Saito, Nobuaki Minematsu

Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices
Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, Przemysław Szczepaniak

An Investigation of DNN-Based Speech Synthesis Using Speaker Codes
Nobukatsu Hojo, Yusuke Ijima, Hideyuki Mizuno

Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural Networks
Lauri Juvela, Xin Wang, Shinji Takaki, Manu Airaksinen, Junichi Yamagishi, Paavo Alku

Model Integration for HMM- and DNN-Based Speech Synthesis Using Product-of-Experts Framework
Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai

Idlak Tangle: An Open Source Kaldi Based Parametric Speech Synthesiser Based on DNN
Blaise Potard, Matthew P. Aylett, David A. Baude, Petr Motlicek

Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody
Alexandros Lazaridis, Milos Cernak, Philip N. Garner

On Smoothing and Enhancing Dynamics of Pitch Contours Represented by Discrete Orthogonal Polynomials for Prosody Generation
Chen-Yu Chiang

An Investigation of Recurrent Neural Network Architectures Using Word Embeddings for Phrase Break Prediction
Anandaswarup Vadapalli, Suryakanth V. Gangashetty

Model-Based Parametric Prosody Synthesis with Deep Neural Network
Hao Liu, Heng Lu, Xu Shao, Yi Xu


Language Model Adaptation


Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models
Thomas Drugman, Janne Pylkkönen, Reinhard Kneser

Learning N-Gram Language Models from Uncertain Data
Vitaly Kuznetsov, Hank Liao, Mehryar Mohri, Michael Riley, Brian Roark

Entropy Based Pruning for Non-Negative Matrix Based Language Models with Contextual Features
Barlas Oğuz, Issac Alphonso, Shuangyu Chang

Unsupervised Adaptation of Recurrent Neural Network Language Models
Siva Reddy Gangireddy, Pawel Swietojanski, Peter Bell, Steve Renals

Contextual Prediction Models for Speech Recognition
Yoni Halpern, Keith Hall, Vlad Schogol, Michael Riley, Brian Roark, Gleb Skobeltsyn, Martin Bäuml

Combining Feature and Model-Based Adaptation of RNNLMs for Multi-Genre Broadcast Speech Recognition
Salil Deena, Madina Hasan, Mortaza Doulaty, Oscar Saz, Thomas Hain



Robustness in Speech Processing


Data Selection by Sequence Summarizing Neural Network in Mismatch Condition Training
Kateřina Žmolíková, Martin Karafiát, Karel Veselý, Marc Delcroix, Shinji Watanabe, Lukáš Burget, Jan Černocký

Incorporating a Generative Front-End Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition
Souvik Kundu, Khe Chai Sim, Mark J.F. Gales

Robust Speech Recognition Using Generalized Distillation Framework
Konstantin Markov, Tomoko Matsui

Adversarial Multi-Task Learning of Deep Neural Networks for Robust Speech Recognition
Yusuke Shinohara

The Use of Locally Normalized Cepstral Coefficients (LNCC) to Improve Speaker Recognition Accuracy in Highly Reverberant Rooms
Víctor Poblete, Juan Pablo Escudero, Josué Fredes, José Novoa, Richard M. Stern, Simon King, Néstor Becerra Yoma

Two-Stage Data Augmentation for Low-Resourced Speech Recognition
William Hartmann, Tim Ng, Roger Hsiao, Stavros Tsakalidis, Richard Schwartz


Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception, Sincerity & Native Language


The Native Language Sub-Challenge: The Data
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini

Native Language Identification Using Spectral and Source-Based Features
Avni Rajpal, Tanvina B. Patel, Hardik B. Sailor, Maulik C. Madhavi, Hemant A. Patil, Hiroya Fujisaki

Accent Identification by Combining Deep Neural Networks and Recurrent Neural Networks Trained on Long and Short Term Features
Yishan Jiao, Ming Tu, Visar Berisha, Julie Liss

Convolutional Neural Networks with Data Augmentation for Classifying Speakers’ Native Language
Gil Keren, Jun Deng, Jouni Pohjalainen, Björn Schuller

Native Language Detection Using the I-Vector Framework
Mohammed Senoussaoui, Patrick Cardinal, Najim Dehak, Alessandro L. Koerich

Within-Speaker Features for Native Language Recognition in the Interspeech 2016 Computational Paralinguistics Challenge
Mark Huckvale

Multimodal Fusion of Multirate Acoustic, Prosodic, and Lexical Speaker Characteristics for Native Language Identification
Prashanth Gurunath Shivakumar, Sandeep Nallan Chakravarthula, Panayiotis Georgiou

Exploiting Phone Log-Likelihood Ratio Features for the Detection of the Native Language of Non-Native English Speakers
Alberto Abad, Eugénio Ribeiro, Fábio Kepler, Ramon Astudillo, Isabel Trancoso

Determining Native Language and Deception Using Phonetic Features and Classifier Combination
Gábor Gosztolya, Tamás Grósz, Róbert Busa-Fekete, László Tóth

The INTERSPEECH 2016 Computational Paralinguistics Challenge: A Summary of Results
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini

Discussion
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini


Acoustic and Articulatory Phonetics


A Preliminary Ultrasound Study of Nasal and Lateral Coronals in Arrernte
Marija Tabain, Richard Beare

Illustrating the Production of the International Phonetic Alphabet Sounds Using Fast Real-Time Magnetic Resonance Imaging
Asterios Toutios, Sajan Goud Lingala, Colin Vaz, Jangwon Kim, John Esling, Patricia Keating, Matthew Gordon, Dani Byrd, Louis Goldstein, Krishna S. Nayak, Shrikanth S. Narayanan

Marginal Contrast Among Romanian Vowels: Evidence from ASR and Functional Load
Margaret E.L. Renwick, Ioana Vasilescu, Camille Dutrey, Lori Lamel, Bianca Vieru

Effects of Subglottal-Coupling and Interdental-Space on Formant Trajectories During Front-to-Back Vowel Transitions in Chinese
Shuanglin Fan, Kiyoshi Honda, Jianwu Dang, Hui Feng

Perceptual Lateralization of Coda Rhotic Production in Puerto Rican Spanish
Mairym Lloréns Monteserín, Shrikanth S. Narayanan, Louis Goldstein

Interaction Between Lexical Tone and Intonation: An EMA Study
Hao Yi, Sam Tilsen


Speech Synthesis Oral I: Neural Networks


Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion
Huaiping Ming, Dongyan Huang, Lei Xie, Jie Wu, Minghui Dong, Haizhou Li

Visual Speech Synthesis Using Dynamic Visemes, Contextual Features and DNNs
Ausdang Thangthai, Ben Milner, Sarah Taylor

A Template-Based Approach for Speech Synthesis Intonation Generation Using LSTMs
Srikanth Ronanki, Gustav Eje Henter, Zhizheng Wu, Simon King

Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis
Bo Li, Heiga Zen

GlottDNN — A Full-Band Glottal Vocoder for Statistical Parametric Speech Synthesis
Manu Airaksinen, Bajibabu Bollepalli, Lauri Juvela, Zhizheng Wu, Simon King, Paavo Alku

Singing Voice Synthesis Based on Deep Neural Networks
Masanari Nishimura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda



Speech Translation and Metadata for Linguistic/Discourse Structure


Dynamic Transcription for Low-Latency Speech Translation
Jan Niehues, Thai Son Nguyen, Eunah Cho, Thanh-Le Ha, Kevin Kilgour, Markus Müller, Matthias Sperber, Sebastian Stüker, Alex Waibel

Learning a Translation Model from Word Lattices
Oliver Adams, Graham Neubig, Trevor Cohn, Steven Bird

Disfluency Detection Using a Bidirectional LSTM
Vicky Zayats, Mari Ostendorf, Hannaneh Hajishirzi

Sentence Boundary Detection Based on Parallel Lexical and Acoustic Models
Xiaoyin Che, Sheng Luo, Haojin Yang, Christoph Meinel

Transferring Emphasis in Speech Translation Using Hard-Attentional Neural Network Models
Quoc Truong Do, Sakriani Sakti, Graham Neubig, Satoshi Nakamura

Better Evaluation of ASR in Speech Translation Context Using Word Embeddings
Ngoc-Tien Le, Christophe Servan, Benjamin Lecouteux, Laurent Besacier



Special Session: Speech, Audio, and Language Processing Techniques Applied to Bird and Animal Vocalizations


Bird Song Synthesis Based on Hidden Markov Models
Jordi Bonada, Robert Lachlan, Merlijn Blaauw

Noise-Robust Hidden Markov Models for Limited Training Data for Within-Species Bird Phrase Classification
Kantapon Kaewtip, Charles Taylor, Abeer Alwan

A Framework for Automated Marmoset Vocalization Detection and Classification
Alan Wisler, Laura J. Brattain, Rogier Landman, Thomas F. Quatieri

Call Alternation Between Specific Pairs of Male Frogs Revealed by a Sound-Imaging Method in Their Natural Habitat
Ikkyu Aihara, Takeshi Mizumoto, Hiromitsu Awano, Hiroshi G. Okuno

Sinusoidal Modelling for Ecoacoustics
Patrice Guyot, Alice Eldridge, Ying Chen Eyre-Walker, Alison Johnston, Thomas Pellegrini, Mika Peck

Individual Identity in Songbirds: Signal Representations and Metric Learning for Locating the Information in Complex Corvid Calls
Dan Stowell, Veronica Morfi, Lisa F. Gill

Recognition of Multiple Bird Species Based on Penalised Maximum Likelihood and HMM-Based Modelling of Individual Vocalisation Elements
Peter Jančovič, Münevver Köküer

Cost Effective Acoustic Monitoring of Bird Species
Ciira wa Maina

Feature Learning and Automatic Segmentation for Dolphin Communication Analysis
Daniel Kohlsdorf, Denise Herzing, Thad Starner

Localizing Bird Songs Using an Open Source Robot Audition System with a Microphone Array
Reiji Suzuki, Shiho Matsubayashi, Kazuhiro Nakadai, Hiroshi G. Okuno

Robust Detection of Multiple Bioacoustic Events with Repetitive Structures
Frank Kurth

A Real-Time Parametric General-Purpose Mammalian Vocal Synthesiser
Roger K. Moore

YIN-Bird: Improved Pitch Tracking for Bird Vocalisations
Colm O’Reilly, Nicola M. Marples, David J. Kelly, Naomi Harte


Learning, Education and Different Speech


Mispronunciation Detection Leveraging Maximum Performance Criterion Training of Acoustic Models and Decision Functions
Yao-Chi Hsu, Ming-Han Yang, Hsiao-Tsung Hung, Berlin Chen

Using Clinician Annotations to Improve Automatic Speech Recognition of Stuttered Speech
Peter A. Heeman, Rebecca Lunsford, Andy McMillin, J. Scott Yaruss

Deep Neural Networks for Voice Quality Assessment Based on the GRBAS Scale
Simin Xie, Nan Yan, Ping Yu, Manwa L. Ng, Lan Wang, Zhuanzhuan Ji

Automated Screening of Speech Development Issues in Children by Identifying Phonological Error Patterns
Lauren Ward, Alessandro Stefani, Daniel Smith, Andreas Duenser, Jill Freyne, Barbara Dodd, Angela Morgan

Automatic Pronunciation Evaluation of Non-Native Mandarin Tone by Using Multi-Level Confidence Measures
Ju Lin, Yanlu Xie, Jinsong Zhang

Dysarthric Speech Recognition Using Kullback-Leibler Divergence-Based Hidden Markov Model
Myungjong Kim, Jun Wang, Hoirin Kim

Detection of Total Syllables and Canonical Syllables in Infant Vocalizations
Anne S. Warlaumont, Heather L. Ramsdell-Hudock

Improving Automatic Recognition of Aphasic Speech with AphasiaBank
Duc Le, Emily Mower Provost

Pronunciation Assessment of Japanese Learners of French with GOP Scores and Phonetic Information
Vincent Laborde, Thomas Pellegrini, Lionel Fontan, Julie Mauclair, Halima Sahraoui, Jérôme Farinas

Pronunciation Error Detection for New Language Learners
Sean Robertson, Cosmin Munteanu, Gerald Penn

L2 English Rhythm in Read Speech by Chinese Students
Hongwei Ding, Xinping Xu



Topics in Speech Recognition


How Neural Network Depth Compensates for HMM Conditional Independence Assumptions in DNN-HMM Acoustic Models
Suman Ravuri, Steven Wegmann

Jointly Learning to Locate and Classify Words Using Convolutional Networks
Dimitri Palaz, Gabriel Synnaeve, Ronan Collobert

On the Efficient Representation and Execution of Deep Acoustic Models
Raziel Alvarez, Rohit Prabhavalkar, Anton Bakhtin

Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI
Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur

Virtual Adversarial Training Applied to Neural Higher-Order Factors for Phone Classification
Martin Ratajczak, Sebastian Tschiatschek, Franz Pernkopf

Sequence Student-Teacher Training of Deep Neural Networks
Jeremy H.M. Wong, Mark J.F. Gales


Special Session: Realism in Robust Speech Processing


Robustness in Speech, Speaker, and Language Recognition: “You’ve Got to Know Your Limitations”
John H.L. Hansen, Hynek Bořil

The Use of Read versus Conversational Lombard Speech in Spectral Tilt Modeling for Intelligibility Enhancement in Near-End Noise Conditions
Emma Jokinen, Ulpu Remes, Paavo Alku

Corpora for the Evaluation of Robust Speaker Recognition Systems
Douglas E. Sturim, Pedro A. Torres-Carrasquillo, Joseph P. Campbell

A French Corpus for Distant-Microphone Speech Processing in Real Homes
Nancy Bertin, Ewen Camberlein, Emmanuel Vincent, Romain Lebarbenchon, Stéphane Peillon, Éric Lamande, Sunit Sivasankaran, Frédéric Bimbot, Irina Illina, Ariane Tom, Sylvain Fleury, Éric Jamet

Realistic Multi-Microphone Data Simulation for Distant Speech Recognition
Mirco Ravanelli, Piergiorgio Svaizer, Maurizio Omologo

Synthesis of Device-Independent Noise Corpora for Realistic ASR Evaluation
Hannes Gamper, Mark R.P. Thomas, Lyle Corbin, Ivan Tashev

Speaker Recognition Using Real vs Synthetic Parallel Data for DNN Channel Compensation
Fred Richardson, Michael Brandstein, Jennifer Melot, Douglas Reynolds

Discussion
Dayana Ribas, Emmanuel Vincent, John H.L. Hansen, Emma Jokinen, Mirco Ravanelli, Hannes Gamper, Fred Richardson


Dialogue: Backchannels and Turntaking


Prediction and Generation of Backchannel Form for Attentive Listening Systems
Tatsuya Kawahara, Takashi Yamaguchi, Koji Inoue, Katsuya Takanashi, Nigel Ward

Measuring Turn-Taking Offsets in Human-Human Dialogues
Rebecca Lunsford, Peter A. Heeman, Emma Rennie

Using Past Speaker Behavior to Better Predict Turn Transitions
Tomer Meshorer, Peter A. Heeman

Quantitative Analysis of Backchannels Uttered by an Interviewer During Neuropsychological Tests
Gérard Bailly, Frédéric Elisei, Alexandra Juphard, Olivier Moreaud

Predicting User Satisfaction from Turn-Taking in Spoken Conversations
Shammur Absar Chowdhury, Evgeny A. Stepanov, Giuseppe Riccardi

Towards Building an Attentive Artificial Listener: On the Perception of Attentiveness in Feedback Utterances
Catharine Oertel, Joakim Gustafson, Alan W. Black


Language Recognition


Language Recognition via Sparse Coding
Youngjune L. Gwon, William M. Campbell, Douglas E. Sturim, H.T. Kung

A Feature Normalisation Technique for PLLR Based Language Identification Systems
Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah

An Investigation of Deep Neural Network Architectures for Language Recognition in Indian Languages
Mounika K.V., Sivanand Achanta, Lakshmi H.R., Suryakanth V. Gangashetty, Anil Kumar Vuppala

Automatic Dialect Detection in Arabic Broadcast Speech
Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, Steve Renals

Combining Weak Tokenisers for Phonotactic Language Recognition in a Resource-Constrained Setting
Raymond W.M. Ng, Bhusan Chettri, Thomas Hain

End-to-End Language Identification Using Attention-Based Recurrent Neural Networks
Wang Geng, Wenfu Wang, Yuanyuan Zhao, Xinyuan Cai, Bo Xu

Enhancing Multilingual Recognition of Emotion in Speech by Language Identification
Hesam Sagha, Pavel Matějka, Maryna Gavryukova, Filip Povolny, Erik Marchi, Björn Schuller


Speech and Audio Segmentation and Classification


Deep Neural Network Bottleneck Features for Acoustic Event Recognition
Seongkyu Mun, Suwon Shon, Wooil Kim, Hanseok Ko

Combining Energy and Cross-Entropy Analysis for Nuclear Segments Detection
Antonio Origlia, Francesco Cutugno

Anchored Speech Detection
Roland Maas, Sree Hari Krishnan Parthasarathi, Brian King, Ruitong Huang, Björn Hoffmeister

Towards Smart-Cars That Can Listen: Abnormal Acoustic Event Detection on the Road
Mahesh Kumar Nandwana, Taufiq Hasan

Hierarchical Classification of Speaker and Background Noise and Estimation of SNR Using Sparse Representation
K.V. Vijay Girish, A.G. Ramakrishnan, T.V. Ananthapadmanabha

Robust Sound Event Detection in Continuous Audio Environments
Haomin Zhang, Ian McLoughlin, Yan Song

Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Recognition
Naoya Takahashi, Michael Gygli, Beat Pfister, Luc Van Gool

Artificial Neural Network-Based Feature Combination for Spatial Voice Activity Detection
Stefan Meier, Walter Kellermann

HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors
Tomi Kinnunen, Alexey Sholokhov, Elie Khoury, Dennis Alexander Lehmann Thomsen, Md. Sahidullah, Zheng-Hua Tan

Manual versus Automated: The Challenging Routine of Infant Vocalisation Segmentation in Home Videos to Study Neuro(mal)development
Florian B. Pokorny, Robert Peharz, Wolfgang Roth, Matthias Zöhrer, Franz Pernkopf, Peter B. Marschik, Björn Schuller

Minimizing Annotation Effort for Adaptation of Speech-Activity Detection Systems
Luciana Ferrer, Martin Graciarena


New Products and Services


Progress and Prospects for Spoken Language Technology: What Ordinary People Think
Roger K. Moore, Hui Li, Shih-Hao Liao

Progress and Prospects for Spoken Language Technology: Results from Four Sexennial Surveys
Roger K. Moore, Ricard Marxer

On Employing a Highly Mismatched Crowd for Speech Transcription
Purushotam Radadia, Rahul Kumar, Kanika Kalra, Shirish Karande, Sachin Lodha

Sage: The New BBN Speech Processing Platform
Roger Hsiao, Ralf Meermeier, Tim Ng, Zhongqiang Huang, Maxwell Jordan, Enoch Kan, Tanel Alumäe, Jan Silovsky, William Hartmann, Francis Keith, Omer Lang, Manhung Siu, Owen Kimball

DNN-Based Feature Enhancement Using Joint Training Framework for Robust Multichannel Speech Recognition
Kang Hyun Lee, Tae Gyoon Kang, Woo Hyun Kang, Nam Soo Kim

Deep Neural Network Frontend for Continuous EMG-Based Speech Recognition
Michael Wand, Jürgen Schmidhuber

Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages
Basil Abraham, S. Umesh, Neethu Mariam Joy

Multi-Language Neural Network Language Models
Anton Ragni, Edgar Dakin, Xie Chen, Mark J.F. Gales, Kate M. Knill

Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration
Ottokar Tilk, Tanel Alumäe

TheanoLM — An Extensible Toolkit for Neural Network Language Modeling
Seppo Enarvi, Mikko Kurimo

Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition Systems
P. Lanchantin, Mark J.F. Gales, Penny Karanasou, X. Liu, Y. Qian, L. Wang, P.C. Woodland, C. Zhang

Manipulating Word Lattices to Incorporate Human Corrections
Yashesh Gaur, Florian Metze, Jeffrey P. Bigham

Context-Aware Restaurant Recommendation for Natural Language Queries: A Formative User Study in the Automotive Domain
Philipp Fischer, Cornelius Styp von Rekowski, Andreas Nürnberger

Teaming Up: Making the Most of Diverse Representations for a Novel Personalized Speech Retrieval Application
Stephanie Pancoast, Murat Akbacak

Automatic Speech Transcription for Low-Resource Languages — The Case of Yoloxóchitl Mixtec (Mexico)
Vikramjit Mitra, Andreas Kathol, Jonathan D. Amith, Rey Castillo García

Real-Time Presentation Tracking Using Semantic Keyword Spotting
Reza Asadi, Harriet J. Fell, Timothy Bickmore, Ha Trinh


Low Resource Speech Recognition


Deriving Phonetic Transcriptions and Discovering Word Segmentations for Speech-to-Speech Translation in Low-Resource Settings
Andrew Wilkinson, Tiancheng Zhao, Alan W. Black

Unsupervised Joint Estimation of Grapheme-to-Phoneme Conversion Systems and Acoustic Model Adaptation for Non-Native Speech Recognition
Satoshi Tsujioka, Sakriani Sakti, Koichiro Yoshino, Graham Neubig, Satoshi Nakamura

Learning Personalized Pronunciations for Contact Name Recognition
Antoine Bruguier, Fuchun Peng, Françoise Beaufays

Generation and Pruning of Pronunciation Variants to Improve ASR Accuracy
Zhenhao Ge, Aravind Ganapathiraju, Ananth N. Iyer, Scott A. Randal, Felix I. Wyss

Optimizing Speech Recognition Evaluation Using Stratified Sampling
Janne Pylkkönen, Thomas Drugman, Max Bisani



Special Event: Speech Ventures


Speech Ventures
Nicolas Scheffer, Korbinian Riedhammer, Alexandre Lebrun, David Suendermann-Oeft


Special Session: Speech and Language Technologies for Human-Machine Conversation-Based Language Education


Context Aware Mispronunciation Detection for Mandarin Pronunciation Training
Rong Tong, Nancy F. Chen, Bin Ma, Haizhou Li

DNN Online with iVectors Acoustic Modeling and Doc2Vec Distributed Representations for Improving Automated Speech Scoring
Jidong Tao, Lei Chen, Chong Min Lee

Self-Adaptive DNN for Improving Spoken Language Proficiency Assessment
Yao Qian, Xinhao Wang, Keelan Evanini, David Suendermann-Oeft

Detecting Mispronunciations of L2 Learners and Providing Corrective Feedback Using Knowledge-Guided and Data-Driven Decision Trees
Wei Li, Kehuang Li, Sabato Marco Siniscalchi, Nancy F. Chen, Chin-Hui Lee

Phoneme Set Design Considering Integrated Acoustic and Linguistic Features of Second Language Speech
Xiaoyun Wang, Tsuneo Kato, Seiichi Yamamoto

HMM-Based Non-Native Accent Assessment Using Posterior Features
Ramya Rasipuram, Milos Cernak, Mathew Magimai-Doss

Automatic Assessment and Error Detection of Shadowing Speech: Case of English Spoken by Japanese Learners
Shuju Shi, Yosuke Kashiwagi, Shohei Toyama, Junwei Yue, Yutaka Yamauchi, Daisuke Saito, Nobuaki Minematsu


Language Recognition


Results of The 2015 NIST Language Recognition Evaluation
Hui Zhao, Désiré Bansé, George Doddington, Craig Greenberg, Jaime Hernández-Cordero, John Howard, Lisa Mason, Alvin Martin, Douglas Reynolds, Elliot Singer, Audrey Tong

The 2015 NIST Language Recognition Evaluation: The Shared View of I2R, Fantastic4 and SingaMS
Kong Aik Lee, Haizhou Li, Li Deng, Ville Hautamäki, Wei Rao, Xiong Xiao, Anthony Larcher, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Aleksandr Sizov, Jianshu Chen, Ivan Kukanov, Amir Hossein Poorjam, Trung Ngo Trong, Cheng-Lin Xu, Haihua Xu, Bin Ma, Eng Siong Chng, Sylvain Meignier

Pair-Wise Distance Metric Learning of Neural Network Model for Spoken Language Identification
Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

Non-Iterative Parameter Estimation for Total Variability Model Using Randomized Singular Value Decomposition
Ruchir Travadi, Shrikanth S. Narayanan

Stacked Long-Term TDNN for Spoken Language Recognition
Daniel Garcia-Romero, Alan McCree

A Divide-and-Conquer Approach for Language Identification Based on Recurrent Neural Networks
G. Gelly, Jean-Luc Gauvain, V.B. Le, A. Messaoudi



Language Recognition


Exploiting Hidden-Layer Responses of Deep Neural Networks for Language Recognition
Ruizhi Li, Sri Harish Mallidi, Lukáš Burget, Oldřich Plchot, Najim Dehak

Out of Set Language Modelling in Hierarchical Language Identification
Saad Irtza, Vidhyasaharan Sethu, Sarith Fernando, Eliathamby Ambikairajah, Haizhou Li

Language Identification Based on Generative Modeling of Posteriorgram Sequences Extracted from Frame-by-Frame DNNs and LSTM-RNNs
Ryo Masumura, Taichi Asami, Hirokazu Masataki, Yushi Aono, Sumitaka Sakauchi

Gating Recurrent Enhanced Memory Neural Networks on Language Identification
Wang Geng, Yuanyuan Zhao, Wenfu Wang, Xinyuan Cai, Bo Xu

Sequence Summarizing Neural Networks for Spoken Language Recognition
Jan Pešán, Lukáš Burget, Jan Černocký

The Role of Spectral Resolution in Foreign-Accented Speech Perception
Michelle R. Kapolowicz, Vahid Montazeri, Peter F. Assmann

THU-EE System Description for NIST LRE 2015
Liang He, Yao Tian, Yi Liu, Jiaming Xu, Weiwei Liu, Cai Meng, Jia Liu

Variation in Spoken North Sami Language
Kristiina Jokinen, Trung Ngo Trong, Ville Hautamäki


Music, Audio, and Source Separation


Improved Music Genre Classification with Convolutional Neural Networks
Weibin Zhang, Wenkang Lei, Xiangmin Xu, Xiaofeng Xing

Enhanced Harmonic Content and Vocal Note Based Predominant Melody Extraction from Vocal Polyphonic Music Signals
Gurunath Reddy M., K. Sreenivasa Rao

Long Short-Term Memory for Speaker Generalization in Supervised Speech Separation
Jitong Chen, DeLiang Wang

Phonotactic Language Identification for Singing
Anna M. Kruspe

Comparing the Influence of Spectro-Temporal Integration in Computational Speech Segregation
Thomas Bentsen, Tobias May, Abigail A. Kressner, Torsten Dau

Blind Speech Separation with GCC-NMF
Sean U.N. Wood, Jean Rouat

Effects of Cochlear Hearing Loss on the Benefits of Ideal Binary Masking
Vahid Montazeri, Shaikat Hossain, Peter F. Assmann

Combining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural Networks
Emad M. Grais, Gerard Roma, Andrew J.R. Simpson, Mark D. Plumbley

Monaural Source Separation Using a Random Forest Classifier
Cosimo Riday, Saurabh Bhargava, Richard H.R. Hahnloser, Shih-Chii Liu

Adaptive Group Sparsity for Non-Negative Matrix Factorization with Application to Unsupervised Source Separation
Xu Li, Ziteng Wang, Xiaofei Wang, Qiang Fu, Yonghong Yan

A Robust Dual-Microphone Speech Source Localization Algorithm for Reverberant Environments
Yanmeng Guo, Xiaofei Wang, Chao Wu, Qiang Fu, Ning Ma, Guy J. Brown

Speech Localisation in a Multitalker Mixture by Humans and Machines
Ning Ma, Guy J. Brown

Reverberation-Robust One-Bit TDOA Based Moving Source Localization for Automatic Camera Steering
Harshavardhan Sundar, Gokul Deepak Manavalan, T.V. Sreenivas, Chandra Sekhar Seelamantula

Multi-Talker Speech Recognition Based on Blind Source Separation with ad hoc Microphone Array Using Smartphones and Cloud Storage
Keiko Ochi, Nobutaka Ono, Shigeki Miyabe, Shoji Makino


Acoustic Modeling with Neural Networks


Phase-Aware Signal Processing for Automatic Speech Recognition
Johannes Fahringer, Tobias Schrank, Johannes Stahl, Pejman Mowlaee, Franz Pernkopf

Unsupervised Deep Auditory Model Using Stack of Convolutional RBMs for Speech Recognition
Hardik B. Sailor, Hemant A. Patil

Interpretation of Low Dimensional Neural Network Bottleneck Features in Terms of Human Perception and Production
Philip Weber, Linxue Bai, Martin Russell, Peter Jančovič, Stephen Houghton

Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition
Shiliang Zhang, Hui Jiang, Shifu Xiong, Si Wei, Li-Rong Dai

Future Context Attention for Unidirectional LSTM Based Acoustic Model
Jian Tang, Shiliang Zhang, Si Wei, Li-Rong Dai

Hybrid Accelerated Optimization for Speech Recognition
Jen-Tzung Chien, Pei-Wen Huang, Tan Lee

On Online Attention-Based Speech Recognition and Joint Mandarin Character-Pinyin Training
William Chan, Ian Lane

GMM-Free Flat Start Sequence-Discriminative DNN Training
Gábor Gosztolya, Tamás Grósz, László Tóth

Open-Domain Audio-Visual Speech Recognition: A Deep Learning Approach
Yajie Miao, Florian Metze

Multidimensional Residual Learning Based on Recurrent Neural Networks for Acoustic Modeling
Yuanyuan Zhao, Shuang Xu, Bo Xu

Towards Online-Recognition with Deep Bidirectional LSTM Acoustic Models
Albert Zeyer, Ralf Schlüter, Hermann Ney

Advances in Very Deep Convolutional Neural Networks for LVCSR
Tom Sercu, Vaibhava Goel

Acoustic Modelling from the Signal Domain Using CNNs
Pegah Ghahremani, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur

Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition
Yevgen Chebotar, Austin Waters

Triphone State-Tying via Deep Canonical Correlation Analysis
Weiran Wang, Hao Tang, Karen Livescu

Low-Rank Representation of Nearest Neighbor Posterior Probabilities to Enhance DNN Based Acoustic Modeling
Gil Luyet, Pranay Dighe, Afsaneh Asaei, Hervé Bourlard


Robustness and Adaptation


Improving Large Vocabulary Accented Mandarin Speech Recognition with Attribute-Based I-Vectors
Hao Zheng, Shanshan Zhang, Liwei Qiao, Jianping Li, Wenju Liu

Pitch-Adaptive Front-End Features for Robust Children’s ASR
S. Shahnawazuddin, Abhishek Dey, Rohit Sinha

ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks
Miguel Ángel del-Agua, Santiago Piqueras, Adrià Giménez, Alberto Sanchis, Jorge Civera, Alfons Juan

Automatic Correction of ASR Outputs by Using Machine Translation
Luis Fernando D’Haro, Rafael E. Banchs

A Framework for Practical Multistream ASR
Sri Harish Mallidi, Hynek Hermansky

DNNs for Unsupervised Extraction of Pseudo FMLLR Features Without Explicit Adaptation Data
Neethu Mariam Joy, Murali Karthick Baskar, S. Umesh, Basil Abraham

Multi-Attribute Factorized Hidden Layer Adaptation for DNN Acoustic Models
Lahiru Samarakoon, Khe Chai Sim

Speaker Normalization Through Feature Shifting of Linearly Transformed i-Vector
Jahyun Goo, Younggwan Kim, Hyungjun Lim, Hoirin Kim


Special Event: Computational Approaches to Linguistic Code Switching


Computational Approaches to Linguistic Code Switching
Mona Diab, Pascale Fung, Julia Hirschberg, Thamar Solorio


Neural Networks for Language Modeling


Compositional Neural Network Language Models for Agglutinative Languages
Ebru Arisoy, Murat Saraclar

NN-Grams: Unifying Neural Network and n-Gram Language Models for Speech Recognition
Babak Damavandi, Shankar Kumar, Noam Shazeer, Antoine Bruguier

Recurrent Neural Network Language Model with Incremental Updated Context Information Generated Using Bag-of-Words Representation
Md. Akmal Haidar, Mikko Kurimo

Sequential Recurrent Neural Networks for Language Modeling
Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow

Word-Phrase-Entity Recurrent Neural Networks for Language Modeling
Michael Levit, Sarangarajan Parthasarathy, Shuangyu Chang

LSTM, GRU, Highway and a Bit of Attention: An Empirical Overview for Language Modeling in Speech Recognition
Kazuki Irie, Zoltán Tüske, Tamer Alkhouli, Ralf Schlüter, Hermann Ney


Special Session: Sub-Saharan African Languages: From Speech Fundamentals to Applications


Automatic Speech Recognition Using Probabilistic Transcriptions in Swahili, Amharic, and Dinka
Amit Das, Preethi Jyothi, Mark Hasegawa-Johnson

Speed Perturbation and Vowel Duration Modeling for ASR in Hausa and Wolof Languages
Elodie Gauthier, Laurent Besacier, Sylvie Voisin

Improving the Lwazi ASR Baseline
Charl van Heerden, Neil Kleynhans, Marelie Davel

Preliminary Experiments on Unsupervised Word Discovery in Mboshi
Pierre Godard, Gilles Adda, Martine Adda-Decker, Alexandre Allauzen, Laurent Besacier, Hélène Bonneau-Maynard, Guy-Noël Kouarata, Kevin Löser, Annie Rialland, François Yvon

Unsupervised Phoneme Segmentation of Previously Unseen Languages
Marco Vetter, Markus Müller, Fatima Hamlaoui, Graham Neubig, Satoshi Nakamura, Sebastian Stüker, Alex Waibel

CNN-Based Phone Segmentation Experiments in a Less-Represented Language
Céline Manenti, Thomas Pellegrini, Julien Pinquier

Part-of-Speech Tagging and Chunking in Text-to-Speech Synthesis for South African Languages
Georg I. Schlünz, Nkosikhona Dlamini, Rynhardt P. Kruger

The Effect of Postlexical Deletion on Automatic Speech Recognition in Fast Spontaneously Spoken Zulu
Ewald van der Westhuizen, Thomas Niesler


Speech Production Models


A New Model of Speech Motor Control Based on Task Dynamics and State Feedback
Vikram Ramanarayanan, Benjamin Parrell, Louis Goldstein, Srikantan Nagarajan, John Houde

Using a Biomechanical Model and Articulatory Data for the Numerical Production of Vowels
Saeed Dabbaghchian, Marc Arnela, Olov Engwall, Oriol Guasch, Ian Stavness, Pierre Badin

A New Model for Acoustic Wave Propagation and Scattering in the Vocal Tract
Jianguo Wei, Wendan Guan, Darcy Q. Hou, Dingyi Pan, Wenhuan Lu, Jianwu Dang

Uncontrolled Manifolds in Vowel Production: Assessment with a Biomechanical Model of the Tongue
Andrew Szabados, Pascal Perrier

Experimental Validation of Sound Generated from Flow in Simplified Vocal Tract Model of Sibilant /s/
Tsukasa Yoshinaga, Kazunori Nozaki, Shigeo Wada

Bayesian Modeling in Speech Motor Control: A Principled Structure for the Integration of Various Constraints
Jean-François Patri, Pascal Perrier, Julien Diard



Speaker Recognition


On the Influence of Text Content on Pass-Phrase Strength for Short-Duration Text-Dependent Automatic Speaker Authentication
Giacomo Valenti, Adrien Daniel, Nicholas Evans

Articulation Rate Filtering of CQCC Features for Automatic Speaker Verification
Massimiliano Todisco, Héctor Delgado, Nicholas Evans

The IBM Speaker Recognition System: Recent Advances and Error Analysis
Seyed Omid Sadjadi, Jason W. Pelecanos, Sriram Ganapathy

Probabilistic Approach Using Joint Clean and Noisy i-Vectors Modeling for Speaker Recognition
Waad Ben Kheder, Driss Matrouf, Moez Ajili, Jean-François Bonastre

Generalized Discriminant Analysis (GDA) for Improved i-Vector Based Speaker Recognition
Fahimeh Bahmaninezhad, John H.L. Hansen

Noise and Metadata Sensitive Bottleneck Features for Improving Speaker Recognition with Non-Native Speech Input
Yao Qian, Jidong Tao, David Suendermann-Oeft, Keelan Evanini, Alexei V. Ivanov, Vikram Ramanarayanan


VAD and Audio Events


Robust Audio Event Recognition with 1-Max Pooling Convolutional Neural Networks
Huy Phan, Lars Hertel, Marco Maass, Alfred Mertins

Audio-Based Distributional Representations of Meaning Using a Fusion of Feature Encodings
Giannis Karamanolakis, Elias Iosif, Athanasia Zlatintsi, Aggelos Pikrakis, Alexandros Potamianos

Robust DNN-Based VAD Augmented with Phone Entropy Based Rejection of Background Speech
Yuya Fujita, Ken-ichi Iso

Feature Learning with Raw-Waveform CLDNNs for Voice Activity Detection
Ruben Zazo, Tara N. Sainath, Gabor Simko, Carolina Parada

The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation
Martin Graciarena, Luciana Ferrer, Vikramjit Mitra

Model Adaptation and Active Learning in the BBN Speech Activity Detection System for the DARPA RATS Program
Damianos Karakos, Scott Novotney, Le Zhang, Richard Schwartz


Spoken Term Detection


Fusion Strategies for Robust Speech Recognition and Keyword Spotting for Channel- and Noise-Degraded Speech
Vikramjit Mitra, Julien VanHout, Wen Wang, Chris Bartels, Horacio Franco, Dimitra Vergyri, Abeer Alwan, Adam Janin, John H.L. Hansen, Richard M. Stern, Abhijeet Sangwan, Nelson Morgan

Recurrent Neural Network-Based Phoneme Sequence Estimation Using Multiple ASR Systems’ Outputs for Spoken Term Detection
Naoki Sawada, Hiromitsu Nishizaki

Enhancing Data-Driven Phone Confusions Using Restricted Recognition
Mark Kane, Julie Carson-Berndsen

Rapid Update of Multilingual Deep Neural Network for Low-Resource Keyword Search
Chongjia Ni, Lei Wang, Cheung-Chi Leung, Feng Rao, Li Lu, Bin Ma, Haizhou Li

Toward High-Performance Language-Independent Query-by-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis
Cheung-Chi Leung, Lei Wang, Haihua Xu, Jingyong Hou, Van Tung Pham, Hang Lv, Lei Xie, Xiong Xiao, Chongjia Ni, Bin Ma, Eng Siong Chng, Haizhou Li


Speech Enhancement and Noise Reduction


Novel Subband Autoencoder Features for Non-Intrusive Quality Assessment of Noise Suppressed Speech
Meet H. Soni, Hemant A. Patil

SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement
Tian Gao, Jun Du, Li-Rong Dai, Chin-Hui Lee

A Novel Risk-Estimation-Theoretic Framework for Speech Enhancement in Nonstationary and Non-Gaussian Noise Conditions
Jishnu Sadasivan, Chandra Sekhar Seelamantula

Two-Stage Temporal Processing for Single-Channel Speech Enhancement
Suman Samui, Indrajit Chakrabarti, Soumya Kanti Ghosh

A Class-Specific Speech Enhancement for Phoneme Recognition: A Dictionary Learning Approach
Nazreen P.M., A.G. Ramakrishnan, Prasanta Kumar Ghosh

Robust Example Search Using Bottleneck Features for Example-Based Speech Enhancement
Atsunori Ogawa, Shogo Seki, Keisuke Kinoshita, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, Kazuya Takeda

Speech Enhancement in Multiple-Noise Conditions Using Deep Neural Networks
Anurag Kumar, Dinei Florencio

Perception Optimized Deep Denoising AutoEncoders for Speech Enhancement
Prashanth Gurunath Shivakumar, Panayiotis Georgiou

HMM-Based Speech Enhancement Using Sub-Word Models and Noise Adaptation
Akihiro Kato, Ben Milner

Semi-Supervised Joint Enhancement of Spectral and Cepstral Sequences of Noisy Speech
Li Li, Hirokazu Kameoka, Takuya Higuchi, Hiroshi Saruwatari

A priori SNR Estimation Using a Generalized Decision Directed Approach
Aleksej Chinaev, Reinhold Haeb-Umbach

A DNN-HMM Approach to Non-Negative Matrix Factorization Based Speech Enhancement
Ziteng Wang, Xu Li, Xiaofei Wang, Qiang Fu, Yonghong Yan

SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement
Szu-Wei Fu, Yu Tsao, Xugang Lu

An Iterative Phase Recovery Framework with Phase Mask for Spectral Mapping with an Application to Speech Enhancement
Kehuang Li, Bo Wu, Chin-Hui Lee

A Novel Research to Artificial Bandwidth Extension Based on Deep BLSTM Recurrent Neural Networks and Exemplar-Based Sparse Representation
Bin Liu, Jianhua Tao


Far-Field, Robustness and Adaptation


Coping with Unseen Data Conditions: Investigating Neural Net Architectures, Robust Features, and Information Fusion for Robust Speech Recognition
Vikramjit Mitra, Horacio Franco

On the Use of Gaussian Mixture Model Framework to Improve Speaker Adaptation of Deep Neural Network Acoustic Models
Natalia Tomashenko, Yuri Khokhlov, Yannick Estève

Analytical Assessment of Dual-Stream Merging for Noise-Robust ASR
Louis ten Bosch, Bert Cranen, Yang Sun

Use of Generalised Nonlinearity in Vector Taylor Series Noise Compensation for Robust Speech Recognition
Erfan Loweimi, Jon Barker, Thomas Hain

Joint Optimization of Denoising Autoencoder and DNN Acoustic Model Based on Multi-Target Learning for Noisy Speech Recognition
Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Optimization of Speech Enhancement Front-End with Speech Recognition-Level Criterion
Takuya Higuchi, Takuya Yoshioka, Tomohiro Nakatani

Factorized Linear Input Network for Acoustic Model Adaptation in Noisy Conditions
Dung T. Tran, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani

Data Augmentation Using Multi-Input Multi-Output Source Separation for Deep Neural Network Based Acoustic Modeling
Yusuke Fujita, Ryoichi Takashima, Takeshi Homma, Masahito Togami

Microphone Distance Adaptation Using Cluster Adaptive Training for Robust Far Field Speech Recognition
Animesh Prasad, Khe Chai Sim

An Investigation on the Use of i-Vectors for Robust ASR
Dimitrios Dimitriadis, Samuel Thomas, Sriram Ganapathy

The Sheffield Wargame Corpus — Day Two and Day Three
Yulan Liu, Charles Fox, Madina Hasan, Thomas Hain

Recurrent Models for Auditory Attention in Multi-Microphone Distant Speech Recognition
Suyoun Kim, Ian Lane

Semi-Supervised Speaker Adaptation for In-Vehicle Speech Recognition with Deep Neural Networks
Wonkyum Lee, Kyu J. Han, Ian Lane


Low Resource Speech Recognition


Semi-Supervised Training in Deep Learning Acoustic Model
Yan Huang, Yongqiang Wang, Yifan Gong

Multilingual Data Selection for Low Resource Speech Recognition
Samuel Thomas, Kartik Audhkhasi, Jia Cui, Brian Kingsbury, Bhuvana Ramabhadran

An Investigation on Training Deep Neural Networks Using Probabilistic Transcriptions
Amit Das, Mark Hasegawa-Johnson

Analysis of Mismatched Transcriptions Generated by Humans and Machines for Under-Resourced Languages
Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Hasegawa-Johnson

ASR for South Slavic Languages Developed in Almost Automated Way
Jan Nouza, Radek Safarik, Petr Cerva

Improving Under-Resourced Language ASR Through Latent Subword Unit Space Discovery
Marzieh Razavi, Mathew Magimai-Doss

Language Adaptive DNNs for Improved Low Resource Speech Recognition
Markus Müller, Sebastian Stüker, Alex Waibel

Improved Multilingual Training of Stacked Neural Network Acoustic Models for Low Resource Languages
Tanel Alumäe, Stavros Tsakalidis, Richard Schwartz

