Interspeech 2016

San Francisco, USA
8-12 September 2016

Chair: Nelson Morgan
doi: 10.21437/Interspeech.2016

Speech and Hearing Disorders & Perception

Auditory-Visual Perception of VCVs Produced by People with Down Syndrome: Preliminary Results
Alexandre Hennequin, Amélie Rochet-Capellan, Marion Dohen

Combining Non-Pathological Data of Different Language Varieties to Improve DNN-HMM Performance on Pathological Speech
Emre Yılmaz, Mario Ganzeboom, Catia Cucchiarini, Helmer Strik

Evaluation of a Phone-Based Anomaly Detection Approach for Dysarthric Speech
Imed Laaridh, Corinne Fredouille, Christine Meunier

Recognition of Dysarthric Speech Using Voice Parameters for Speaker Adaptation and Multi-Taper Spectral Estimation
Chitralekha Bhat, Bhavik Vachhani, Sunil Kopparapu

Impaired Categorical Perception of Mandarin Tones and its Relationship to Language Ability in Autism Spectrum Disorders
Fei Chen, Nan Yan, Xiaojie Pan, Feng Yang, Zhuanzhuan Ji, Lan Wang, Gang Peng

Perceived Naturalness of Electrolaryngeal Speech Produced Using sEMG-Controlled vs. Manual Pitch Modulation
K.F. Nagle, J.T. Heaton

Identifying Hearing Loss from Learned Speech Kernels
Shamima Najnin, Bonny Banerjee, Lisa Lucks Mendel, Masoumeh Heidari Kapourchali, Jayanta Kumar Dutta, Sungmin Lee, Chhayakanta Patro, Monique Pousson

Differential Effects of Velopharyngeal Dysfunction on Speech Intelligibility During Early and Late Stages of Amyotrophic Lateral Sclerosis
Panying Rong, Yana Yunusova, Jordan R. Green

The Production of Intervocalic Glides in Non Dysarthric Parkinsonian Speech
V. Delvaux, V. Roland, K. Huet, M. Piccaluga, M.C. Haelewyck, B. Harmegnies

Auditory Processing Impairments Under Background Noise in Children with Non-Syndromic Cleft Lip and/or Palate
Yang Feng, Zhang Lu

Modulation Spectral Features for Predicting Vocal Emotion Recognition by Simulated Cochlear Implants
Zhi Zhu, Ryota Miyauchi, Yukiko Araki, Masashi Unoki

Automatic Discrimination of Soft Voice Onset Using Acoustic Features of Breathy Voicing
Keiko Ochi, Koichi Mori, Naomi Sakai, Nobutaka Ono

Effect of Noise on Lexical Tone Perception in Cantonese-Speaking Amusics
Jing Shao, Caicai Zhang, Gang Peng, Yike Yang, William S.-Y. Wang

Audio-Visual Speech Recognition Using Bimodal-Trained Bottleneck Features for a Person with Severe Hearing Loss
Yuki Takashima, Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki, Nobuyuki Mitani, Kiyohiro Omori, Kaoru Nakazono

Perception of Tone in Whispered Mandarin Sentences: The Case for Singapore Mandarin
Yuling Gu, Boon Pang Lim, Nancy F. Chen

Speech Synthesis Poster

A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences
Feng-Long Xie, Frank K. Soong, Haifeng Li

Parallel Dictionary Learning for Voice Conversion Using Discriminative Graph-Embedded Non-Negative Matrix Factorization
Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki

Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks
Yu Gu, Zhen-Hua Ling, Li-Rong Dai

Voice Conversion Based on Matrix Variate Gaussian Mixture Model Using Multiple Frame Features
Yi Yang, Hidetsugu Uchida, Daisuke Saito, Nobuaki Minematsu

Voice Conversion Based on Trajectory Model Training of Neural Networks Considering Global Variance
Naoki Hosaka, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

Comparing Articulatory and Acoustic Strategies for Reducing Non-Native Accents
Sandesh Aryal, Ricardo Gutierrez-Osuna

Cross-Lingual Speaker Adaptation for Statistical Speech Synthesis Using Limited Data
Seyyed Saeed Sarfjoo, Cenk Demiroglu

Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams
Lifa Sun, Hao Wang, Shiyin Kang, Kun Li, Helen Meng

Acoustic Analysis of Syllables Across Indian Languages
Anusha Prakash, Jeena J. Prakash, Hema A. Murthy

Objective Evaluation Methods for Chinese Text-To-Speech Systems
Teng Zhang, Zhipeng Chen, Ji Wu, Sam Lai, Wenhui Lei, Carsten Isert

Objective Evaluation Using Association Between Dimensions Within Spectral Features for Statistical Parametric Speech Synthesis
Yusuke Ijima, Taichi Asami, Hideyuki Mizuno

A Hierarchical Predictor of Synthetic Speech Naturalness Using Neural Networks
Takenori Yoshimura, Gustav Eje Henter, Oliver Watts, Mirjam Wester, Junichi Yamagishi, Keiichi Tokuda

Text-to-Speech for Individuals with Vision Loss: A User Study
Monika Podsiadło, Shweta Chahar

Speech Enhancement for a Noise-Robust Text-to-Speech Synthesis System Using Deep Recurrent Neural Networks
Cassia Valentini-Botinhao, Xin Wang, Shinji Takaki, Junichi Yamagishi

Data Selection and Adaptation for Naturalness in HMM-Based Speech Synthesis
Erica Cooper, Alison Chang, Yocheved Levitan, Julia Hirschberg

Special Session: Intelligibility Under the Microscope

Microscopic Multilingual Matrix Test Predictions Using an ASR-Based Speech Recognition Model
Marc René Schädler, David Hülsmeier, Anna Warzybok, Sabine Hochmuth, Birger Kollmeier

DNN-Based Automatic Speech Recognition as a Model for Human Phoneme Perception
Mats Exter, Bernd T. Meyer

Undoing Misperceptions: A Microscopic Analysis of Consistent Confusions Through Signal Modifications
Attila Máté Tóth, Martin Cooke

Blind Non-Intrusive Speech Intelligibility Prediction Using Twin-HMMs
Mahdie Karbasi, Ahmed Hussen Abdelaziz, Hendrik Meutzner, Dorothea Kolossa

Misperceptions Arising from Speech-in-Babble Interactions
Attila Máté Tóth, Martin Cooke, Jon Barker

Introducing Temporal Rate Coding for Speech in Cochlear Implants: A Microscopic Evaluation in Humans and Models
Anja Eichenauer, Mathias Dietz, Bernd T. Meyer, Tim Jürgens

Language Effects in Noise-Induced Word Misperceptions
Maria Luisa Garcia Lecumberri, Jon Barker, Ricard Marxer, Martin Cooke

Speech Reductions Cause a De-Weighting of Secondary Acoustic Cues
Léo Varnet, Fanny Meunier, Michel Hoen

Using Phonologically Weighted Levenshtein Distances for the Prediction of Microscopic Intelligibility
Lionel Fontan, Isabelle Ferrané, Jérôme Farinas, Julien Pinquier, Xavier Aumont

The Impact of Manner of Articulation on the Intelligibility of Voicing Contrast in Noise: Cross-Linguistic Implications
Mayuki Matsui

Directly Comparing the Listening Strategies of Humans and Machines
Michael I. Mandel

Spoken Documents, Spoken Understanding and Semantic Analysis

LSTM-Based NeuroCRFs for Named Entity Recognition
Marc-Antoine Rondeau, Yi Su

Exploring Word Mover’s Distance and Semantic-Aware Embedding Techniques for Extractive Broadcast News Summarization
Shih-Hung Liu, Kuan-Yu Chen, Yu-Lun Hsieh, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, Wen-Lian Hsu

Improved Neural Bag-of-Words Model to Retrieve Out-of-Vocabulary Words in Speech Recognition
Imran Sheikh, Irina Illina, Dominique Fohr, Georges Linarès

Beyond Utterance Extraction: Summary Recombination for Speech Summarization
Jérémy Trione, Benoit Favre, Frederic Bechet

Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling
Bing Liu, Ian Lane

Domain Adaptation of Recurrent Neural Networks for Natural Language Understanding
Aaron Jaech, Larry Heck, Mari Ostendorf

LatticeRnn: Recurrent Neural Networks Over Lattices
Faisal Ladhak, Ankur Gandhe, Markus Dreyer, Lambert Mathias, Ariya Rastrow, Björn Hoffmeister

Learning Document Representations Using Subspace Multinomial Model
Santosh Kesiraju, Lukáš Burget, Igor Szőke, Jan Černocký

Attention-Based Convolutional Neural Networks for Sentence Classification
Zhiwei Zhao, Youzheng Wu

Spoken Language Understanding in a Latent Topic-Based Subspace
Mohamed Morchid, Mohamed Bouaziz, Waad Ben Kheder, Killian Janod, Pierre-Michel Bousquet, Richard Dufour, Georges Linarès

Multi-Domain Joint Semantic Frame Parsing Using Bi-Directional RNN-LSTM
Dilek Hakkani-Tür, Gokhan Tur, Asli Celikyilmaz, Yun-Nung Chen, Jianfeng Gao, Li Deng, Ye-Yi Wang

Deep Stacked Autoencoders for Spoken Language Understanding
Killian Janod, Mohamed Morchid, Richard Dufour, Georges Linarès, Renato De Mori

Labeled Data Generation with Encoder-Decoder LSTM for Semantic Slot Filling
Gakuto Kurata, Bing Xiang, Bowen Zhou

Exploring the Correlation of Pitch Accents and Semantic Slots for Spoken Language Understanding
Sabrina Stehwien, Ngoc Thang Vu

Analysis on Gated Recurrent Unit Based Question Detection Approach
Yaodong Tang, Zhiyong Wu, Helen Meng, Mingxing Xu, Lianhong Cai

Prosody, Phonation and Voice Quality

The Influence of Modality and Speaking Style on the Assimilation Type and Categorization Consistency of Non-Native Speech
Sarah E. Fenwick, Catherine T. Best, Chris Davis, Michael D. Tyler

Prosodic Convergence with Spoken Stimuli in Laboratory Data
Margaret Zellers

Effects of Stress on Fricatives: Evidence from Standard Modern Greek
Charalambos Themistocleous, Angelandria Savva, Andrie Aristodemou

Analysis of Chinese Syllable Durations in Running Speech of Japanese L2 Learners
Yue Sun, Shudon Hsiao, Yoshinori Sagisaka, Jinsong Zhang

Automatic Paragraph Segmentation with Lexical and Prosodic Features
Catherine Lai, Mireia Farrús, Johanna D. Moore

Automatic Glottal Inverse Filtering with Non-Negative Matrix Factorization
Manu Airaksinen, Lauri Juvela, Tom Bäckström, Paavo Alku

Speaker Identity and Voice Quality: Modeling Human Responses and Automatic Speaker Recognition
Soo Jin Park, Caroline Sigouin, Jody Kreiman, Patricia Keating, Jinxi Guo, Gary Yeung, Fang-Yu Kuo, Abeer Alwan

Analysis of Glottal Stop in Assam Sora Language
Sishir Kalita, Luke Horo, Priyankoo Sarmah, S.R. Mahadeva Prasanna, S. Dandapat

Acoustic Differences Between English /t/ Glottalization and Phrasal Creak
Marc Garellek, Scott Seyfarth

The Acoustics of Lexical Stress in Italian as a Function of Stress Level and Speaking Style
Anders Eriksson, Pier Marco Bertinetto, Mattias Heldner, Rosalba Nodari, Giovanna Lenoci

Cross-Gender and Cross-Dialect Tone Recognition for Vietnamese
Antje Schweitzer, Ngoc Thang Vu

Prosody Modification Using Allpass Residual of Speech Signals
Karthika Vijayan, K. Sri Rama Murty

Analyzing the Contribution of Top-Down Lexical and Bottom-Up Acoustic Cues in the Detection of Sentence Prominence
Sofoklis Kakouros, Joris Pelemans, Lyan Verwimp, Patrick Wambacq, Okko Räsänen

A Longitudinal Study of Children’s Intonation in Narrative Speech
Jeffrey Kallay, Melissa A. Redford

Speech Production Analysis and Modeling

Velum Control for Oral Sounds
Reed Blaylock, Louis Goldstein, Shrikanth S. Narayanan

F0 Development in Acquiring Korean Stop Distinction
Gayeon Son

Phonetic Reduction Can Lead to Lengthening, and Enhancement Can Lead to Shortening
Clara Cohen, Matt Carlson

Mechanical Production of [b], [m] and [w] Using Controlled Labial and Velopharyngeal Gestures
Takayuki Arai

An Improved 3D Geometric Tongue Model
Qiang Fang, Yun Chen, Haibo Wang, Jianguo Wei, Jianrong Wang, Xiyu Wu, Aijun Li

Congruency Effect Between Articulation and Grasping in Native English Speakers
Mikko Tiainen, Fatima M. Felisberti, Kaisa Tiippana, Martti Vainio, Juraj Simko, Jiri Lukavsky, Lari Vainio

Emergence of Vocal Developmental Sequences in a Predictive Coding Model of Speech Acquisition
Shamima Najnin, Bonny Banerjee

Categorization of Natural Spanish Whistled Vowels by Naïve Spanish Listeners
Julien Meyer, Laure Dentel, Fanny Meunier

Between- and Within-Speaker Effects of Bilingualism on F0 Variation
Rob Voigt, Dan Jurafsky, Meghan Sumner

Vowel Characteristics in the Assessment of L2 English Pronunciation
Calbert Graham, Paula Buttery, Francis Nolan

Kulning (Swedish Cattle Calls): Acoustic, EGG, Stroboscopic and High-Speed Video Analyses of an Unusual Singing Style
Ahmed Geneid, Anne-Maria Laukkanen, Anita McAllister, Robert Eklund

Glottal Squeaks in VC Sequences
Míša Hejná, Pertti Palo, Scott Moisik

Automatic Pronunciation Generation by Utilizing a Semi-Supervised Deep Neural Networks
Naoya Takahashi, Tofigh Naghibi, Beat Pfister

Special Session: Clinical and Neuroscience-Inspired Vocal Biomarkers of Neurological and Psychiatric Disorders

Acoustic-Prosodic and Turn-Taking Features in Interactions with Children with Neurodevelopmental Disorders
Daniel Bone, Somer Bishop, Rahul Gupta, Sungbok Lee, Shrikanth S. Narayanan

Automatic Detection of Parkinson’s Disease Based on Modulated Vowels
Daria Hemmerling, Juan Rafael Orozco-Arroyave, Andrzej Skalski, Janusz Gajda, Elmar Nöth

Towards Automatic Detection of Amyotrophic Lateral Sclerosis from Speech Acoustic and Articulatory Samples
Jun Wang, Prasanna V. Kothalkar, Beiming Cao, Daragh Heitzman

Neurophysiological Vocal Source Modeling for Biomarkers of Disease
Gregory Ciccarelli, Thomas F. Quatieri, Satrajit S. Ghosh

Relation of Automatically Extracted Formant Trajectories with Intelligibility Loss and Speaking Rate Decline in Amyotrophic Lateral Sclerosis
Rachelle L. Horwitz-Martin, Thomas F. Quatieri, Adam C. Lammert, James R. Williamson, Yana Yunusova, Elizabeth Godoy, Daryush D. Mehta, Jordan R. Green

Automatic Analysis of Typical and Atypical Encoding of Spontaneous Emotion in the Voice of Children
Fabien Ringeval, Erik Marchi, Charline Grossard, Jean Xavier, Mohamed Chetouani, David Cohen, Björn Schuller

Recognition of Depression in Bipolar Disorder: Leveraging Cohort and Person-Specific Knowledge
Soheil Khorram, John Gideon, Melvin McInnis, Emily Mower Provost

Diagnosing People with Dementia Using Automatic Conversation Analysis
Bahman Mirheidari, Daniel Blackburn, Markus Reuber, Traci Walker, Heidi Christensen

Behavioral Signal Processing and Speaker State and Traits Analytics

Attention Assisted Discovery of Sub-Utterance Structure in Speech Emotion Recognition
Che-Wei Huang, Shrikanth S. Narayanan

Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition
Linchuan Li, Zhiyong Wu, Mingxing Xu, Helen Meng, Lianhong Cai

Inter-Speech Clicks in an Interspeech Keynote
Jürgen Trouvain, Zofia Malisz

Speaker Age Classification and Regression Using i-Vectors
Joanna Grzybowska, Stanisław Kacprzak

Sparsely Connected and Disjointly Trained Deep Neural Networks for Low Resource Behavioral Annotation: Acoustic Classification in Couples’ Therapy
Haoqi Li, Brian Baucom, Panayiotis Georgiou

Automatically Classifying Self-Rated Personality Scores from Speech
Guozhen An, Sarah Ita Levitan, Rivka Levitan, Andrew Rosenberg, Michelle Levine, Julia Hirschberg

Estimation of Children’s Physical Characteristics from Their Voices
Jill Fain Lehman, Rita Singh

Talking to a System and Talking to a Human: A Study from a Speech-to-Speech, Machine Translation Mediated Map Task
Hayakawa Akira, Saturnino Luz, Nick Campbell

Predicting Affective Dimensions Based on Self Assessed Depression Severity
Rahul Gupta, Shrikanth S. Narayanan

Enhancement of Automatic Oral Presentation Assessment System Using Latent N-Grams Word Representation and Part-of-Speech Information
Wen-Yu Huang, Shan-Wen Hsiao, Hung-Ching Sun, Ming-Chuan Hsieh, Ming-Hsueh Tsai, Chi-Chun Lee

Use of Vowels in Discriminating Speech-Laugh from Laughter and Neutral Speech
Sri Harsha Dumpala, P. Gangamohan, Suryakanth V. Gangashetty, B. Yegnanarayana

A Convex Model for Linguistic Influence in Group Conversations
Kan Kawabata, Visar Berisha, Anna Scaglione, Amy LaCross

A Deep Learning Approach to Modeling Empathy in Addiction Counseling
James Gibson, Doğan Can, Bo Xiao, Zac E. Imel, David C. Atkins, Panayiotis Georgiou, Shrikanth S. Narayanan

Unipolar Depression vs. Bipolar Disorder: An Elicitation-Based Approach to Short-Term Detection of Mood Disorder
Kun-Yi Huang, Chung-Hsien Wu, Yu-Ting Kuo, Fong-Lin Jang

Speech Synthesis Poster

Conditional Random Fields for the Tunisian Dialect Grapheme-to-Phoneme Conversion
Abir Masmoudi, Mariem Ellouze, Fethi Bougares, Yannick Estève, Lamia Belguith

Efficient Thai Grapheme-to-Phoneme Conversion Using CRF-Based Joint Sequence Modeling
Sittipong Saychum, Sarawoot Kongyoung, Anocha Rugchatjaroen, Patcharika Chootrakool, Sawit Kasuriya, Chai Wutiwiwatchai

An Articulatory-Based Singing Voice Synthesis Using Tongue and Lips Imaging
Aurore Jaumard-Hakoun, Kele Xu, Clémence Leboullenger, Pierre Roussel-Ragot, Bruce Denby

Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis
Xu Li, Zhiyong Wu, Helen Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai

Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data
Xu Li, Zhiyong Wu, Helen Meng, Jia Jia, Xiaoyan Lou, Lianhong Cai

Audio-to-Visual Speech Conversion Using Deep Neural Networks
Sarah Taylor, Akihiro Kato, Iain Matthews, Ben Milner

Generative Acoustic-Phonemic-Speaker Model Based on Three-Way Restricted Boltzmann Machine
Toru Nakashika, Yasuhiro Minami

Articulatory Synthesis Based on Real-Time Magnetic Resonance Imaging Data
Asterios Toutios, Tanner Sorensen, Krishna Somandepalli, Rachel Alexander, Shrikanth S. Narayanan

Deep Neural Network Based Acoustic-to-Articulatory Inversion Using Phone Sequence Information
Xurong Xie, Xunying Liu, Lan Wang

Articulatory-to-Acoustic Conversion with Cascaded Prediction of Spectral and Excitation Features Using Neural Networks
Zheng-Chen Liu, Zhen-Hua Ling, Li-Rong Dai

Generating Gestural Scores from Acoustics Through a Sparse Anchor-Based Representation of Speech
Christopher Liberatore, Ricardo Gutierrez-Osuna

On the Suitability of Vocalic Sandwiches in a Corpus-Based TTS Engine
David Guennec, Damien Lolive

Unsupervised Stress Information Labeling Using Gaussian Process Latent Variable Model for Statistical Speech Synthesis
Decha Moungsri, Tomoki Koriyama, Takao Kobayashi

Using Zero-Frequency Resonator to Extract Multilingual Intonation Structure
Jinfu Ni, Yoshinori Shiga, Hisashi Kawai

Speaker Recognition

Analysis of Face Mask Effect on Speaker Recognition
Rahim Saeidi, Ilkka Huhtakallio, Paavo Alku

Data Selection for Within-Class Covariance Estimation
Elliot Singer, Tyler Campbell, Douglas Reynolds

Inter-Task System Fusion for Speaker Recognition
M. Ferras, Srikanth Madikeri, S. Dey, Petr Motlicek, Hervé Bourlard

Mahalanobis Metric Scoring Learned from Weighted Pairwise Constraints in I-Vector Speaker Recognition System
Zhenchun Lei, Yanhong Wan, Jian Luo, Yingen Yang

Novel Subband Autoencoder Features for Detection of Spoofed Speech
Meet H. Soni, Tanvina B. Patel, Hemant A. Patil

On the Issue of Calibration in DNN-Based Speaker Recognition Systems
Mitchell McLaren, Diego Castan, Luciana Ferrer, Aaron Lawson

Probabilistic Approach Using Joint Long and Short Session i-Vectors Modeling to Deal with Short Utterances for Speaker Recognition
Waad Ben Kheder, Driss Matrouf, Moez Ajili, Jean-François Bonastre

Short Utterance Variance Modelling and Utterance Partitioning for PLDA Speaker Verification
Ahilan Kanagasundaram, David Dean, Sridha Sridharan, Clinton Fookes, Ivan Himawan

Speaker-Dependent Dictionary-Based Speech Enhancement for Text-Dependent Speaker Verification
Nicolai Bæk Thomsen, Dennis Alexander Lehmann Thomsen, Zheng-Hua Tan, Børge Lindberg, Søren Holdt Jensen

Text-Available Speaker Recognition System for Forensic Applications
Chengzhu Yu, Chunlei Zhang, Finnian Kelly, Abhijeet Sangwan, John H.L. Hansen

Transfer Learning for Speaker Verification on Short Utterances
Qingyang Hong, Lin Li, Lihong Wan, Jun Zhang, Feng Tong

Twin Model G-PLDA for Duration Mismatch Compensation in Text-Independent Speaker Verification
Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong Aik Lee

Universal Background Sparse Coding and Multilayer Bootstrap Network for Speaker Clustering
Xiao-Lei Zhang

Improving Deep Neural Networks Based Speaker Verification Using Unlabeled Data
Yao Tian, Meng Cai, Liang He, Wei-Qiang Zhang, Jia Liu

Decoding, System Combination

Maximum a posteriori Based Decoding for CTC Acoustic Models
Naoyuki Kanda, Xugang Lu, Hisashi Kawai

Phonetic and Phonological Posterior Search Space Hashing Exploiting Class-Specific Sparsity Structures
Afsaneh Asaei, Gil Luyet, Milos Cernak, Hervé Bourlard

Model Compression Applied to Small-Footprint Keyword Spotting
George Tucker, Minhua Wu, Ming Sun, Sankaran Panchapagesan, Gengshen Fu, Shiv Vitaladevuni

Why do ASR Systems Despite Neural Nets Still Depend on Robust Features
Angel Mario Castro Martinez, Marc René Schädler

An Adaptive Multi-Band System for Low Power Voice Command Recognition
Qing He, Gregory W. Wornell, Wei Ma

Memory-Efficient Modeling and Search Techniques for Hardware ASR Decoders
Michael Price, Anantha Chandrakasan, James Glass

Log-Linear System Combination Using Structured Support Vector Machines
J. Yang, Anton Ragni, Mark J.F. Gales, Kate M. Knill

Efficient Segmental Cascades for Speech Recognition
Hao Tang, Weiran Wang, Kevin Gimpel, Karen Livescu

A WFST Framework for Single-Pass Multi-Stream Decoding
Sirui Xu, Eric Fosler-Lussier

Comparison of Multiple System Combination Techniques for Keyword Spotting
William Hartmann, Le Zhang, Kerri Barnes, Roger Hsiao, Stavros Tsakalidis, Richard Schwartz

Rescoring by Combination of Posteriorgram Score and Subword-Matching Score for Use in Query-by-Example
Masato Obara, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh

Phone Synchronous Decoding with CTC Lattice
Zhehuai Chen, Wei Deng, Tao Xu, Kai Yu

Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception, Sincerity & Native Language

The INTERSPEECH 2016 Computational Paralinguistics Challenge: Deception, Sincerity & Native Language
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini

The Deception Sub-Challenge: The Data
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini

Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection
Sarah Ita Levitan, Guozhen An, Min Ma, Rivka Levitan, Andrew Rosenberg, Julia Hirschberg

Is Deception Emotional? An Emotion-Driven Predictive Approach
Shahin Amiriparian, Jouni Pohjalainen, Erik Marchi, Sergey Pugachevskiy, Björn Schuller

Prosodic Cues and Answer Type Detection for the Deception Sub-Challenge
Claude Montacié, Marie-José Caraty

The Sincerity Sub-Challenge: The Data
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini

Automatic Estimation of Perceived Sincerity from Spoken Language
Brandon M. Booth, Rahul Gupta, Pavlos Papadopoulos, Ruchir Travadi, Shrikanth S. Narayanan

Estimating the Sincerity of Apologies in Speech by DNN Rank Learning and Prosodic Analysis
Gábor Gosztolya, Tamás Grósz, György Szaszák, László Tóth

Minimization of Regression and Ranking Losses with Shallow Neural Networks on Automatic Sincerity Evaluation
Hung-Shin Lee, Yu Tsao, Chi-Chun Lee, Hsin-Min Wang, Wei-Cheng Lin, Wei-Chen Chen, Shan-Wen Hsiao, Shyh-Kang Jeng

Prediction of Deception and Sincerity from Speech Using Automatic Phone Recognition-Based Features
Robert Herms

Sincerity and Deception in Speech: Two Sides of the Same Coin? A Transfer- and Multi-Task Learning Perspective
Yue Zhang, Felix Weninger, Zhao Ren, Björn Schuller

Fusing Acoustic Feature Representations for Computational Paralinguistics Tasks
Heysem Kaya, Alexey A. Karpov

Speaker Diarization and Recognition

Speaker Linking and Applications Using Non-Parametric Hashing Methods
Douglas E. Sturim, William M. Campbell

Iterative PLDA Adaptation for Speaker Diarization
Gaël Le Lan, Delphine Charlet, Anthony Larcher, Sylvain Meignier

A Speaker Diarization System for Studying Peer-Led Team Learning Groups
Harishchandra Dubey, Lakshmish Kaushik, Abhijeet Sangwan, John H.L. Hansen

DNN-Based Speaker Clustering for Speaker Diarisation
Rosanna Milner, Thomas Hain

On the Importance of Efficient Transition Modeling for Speaker Diarization
Itshak Lapidot, Jean-François Bonastre

Priors for Speaker Counting and Diarization with AHC
Gregory Sell, Alan McCree, Daniel Garcia-Romero

Two-Pass IB Based Speaker Diarization System Using Meeting-Specific ANN Based Features
Nauman Dawalatabad, Srikanth Madikeri, C Chandra Sekhar, Hema A. Murthy

DNN-Based Amplitude and Phase Feature Enhancement for Noise Robust Speaker Identification
Zeyan Oo, Yuta Kawakami, Longbiao Wang, Seiichi Nakagawa, Xiong Xiao, Masahiro Iwahashi

Unit-Selection Attack Detection Based on Unfiltered Frequency-Domain Features
Ulrich Scherhag, Andreas Nautsch, Christian Rathgeb, Christoph Busch

Investigating the Impact of Dialect Prestige on Lexical Decision
Mairym Lloréns Monteserín, Jason Zevin

Speaker Verification Using Short Utterances with DNN-Based Estimation of Subglottal Acoustic Features
Jinxi Guo, Gary Yeung, Deepak Muralidharan, Harish Arsikere, Amber Afshan, Abeer Alwan

Factor Analysis Based Speaker Verification Using ASR
Hang Su, Steven Wegmann

Joint Sound Source Separation and Speaker Recognition
Jeroen Zegers, Hugo Van hamme

Robust Multichannel Gender Classification from Speech in Movie Audio
Naveen Kumar, Md. Nasir, Panayiotis Georgiou, Shrikanth S. Narayanan

Speech Synthesis Poster

Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer
Xavi Gonzalvo, Siamak Tazari, Chun-an Chan, Markus Becker, Alexander Gutkin, Hanna Silen

First Step Towards End-to-End Parametric TTS Synthesis: Generating Spectral Parameters with Neural Attention
Wenfu Wang, Shuang Xu, Bo Xu

The Parameterized Phoneme Identity Feature as a Continuous Real-Valued Vector for Neural Network Based Speech Synthesis
Zhengqi Wen, Ya Li, Jianhua Tao

Improved Time-Frequency Trajectory Excitation Vocoder for DNN-Based Speech Synthesis
Eunwoo Song, Frank K. Soong, Hong-Goo Kang

Voice Quality Control Using Perceptual Expressions for Statistical Parametric Speech Synthesis Based on Cluster Adaptive Training
Yamato Ohtani, Koichiro Mori, Masahiro Morita

Waveform Generation Based on Signal Reshaping for Statistical Parametric Speech Synthesis
Felipe Espic, Cassia Valentini-Botinhao, Zhizheng Wu, Simon King

Speaker Representations for Speaker Adaptation in Multiple Speakers’ BLSTM-RNN-Based Speech Synthesis
Yi Zhao, Daisuke Saito, Nobuaki Minematsu

Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices
Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, Przemysław Szczepaniak

An Investigation of DNN-Based Speech Synthesis Using Speaker Codes
Nobukatsu Hojo, Yusuke Ijima, Hideyuki Mizuno

Using Text and Acoustic Features in Predicting Glottal Excitation Waveforms for Parametric Speech Synthesis with Recurrent Neural Networks
Lauri Juvela, Xin Wang, Shinji Takaki, Manu Airaksinen, Junichi Yamagishi, Paavo Alku

Model Integration for HMM- and DNN-Based Speech Synthesis Using Product-of-Experts Framework
Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai

Idlak Tangle: An Open Source Kaldi Based Parametric Speech Synthesiser Based on DNN
Blaise Potard, Matthew P. Aylett, David A. Baude, Petr Motlicek

Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody
Alexandros Lazaridis, Milos Cernak, Philip N. Garner

On Smoothing and Enhancing Dynamics of Pitch Contours Represented by Discrete Orthogonal Polynomials for Prosody Generation
Chen-Yu Chiang

An Investigation of Recurrent Neural Network Architectures Using Word Embeddings for Phrase Break Prediction
Anandaswarup Vadapalli, Suryakanth V. Gangashetty

Model-Based Parametric Prosody Synthesis with Deep Neural Network
Hao Liu, Heng Lu, Xu Shao, Yi Xu

Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception, Sincerity & Native Language

The Native Language Sub-Challenge: The Data
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini

Native Language Identification Using Spectral and Source-Based Features
Avni Rajpal, Tanvina B. Patel, Hardik B. Sailor, Maulik C. Madhavi, Hemant A. Patil, Hiroya Fujisaki

Accent Identification by Combining Deep Neural Networks and Recurrent Neural Networks Trained on Long and Short Term Features
Yishan Jiao, Ming Tu, Visar Berisha, Julie Liss

Convolutional Neural Networks with Data Augmentation for Classifying Speakers’ Native Language
Gil Keren, Jun Deng, Jouni Pohjalainen, Björn Schuller

Native Language Detection Using the I-Vector Framework
Mohammed Senoussaoui, Patrick Cardinal, Najim Dehak, Alessandro L. Koerich

Within-Speaker Features for Native Language Recognition in the Interspeech 2016 Computational Paralinguistics Challenge
Mark Huckvale

Multimodal Fusion of Multirate Acoustic, Prosodic, and Lexical Speaker Characteristics for Native Language Identification
Prashanth Gurunath Shivakumar, Sandeep Nallan Chakravarthula, Panayiotis Georgiou

Exploiting Phone Log-Likelihood Ratio Features for the Detection of the Native Language of Non-Native English Speakers
Alberto Abad, Eugénio Ribeiro, Fábio Kepler, Ramon Astudillo, Isabel Trancoso

Determining Native Language and Deception Using Phonetic Features and Classifier Combination
Gábor Gosztolya, Tamás Grósz, Róbert Busa-Fekete, László Tóth

The INTERSPEECH 2016 Computational Paralinguistics Challenge: A Summary of Results
Björn Schuller, Stefan Steidl, Anton Batliner, Julia Hirschberg, Judee K. Burgoon, Alice Baird, Aaron Elkins, Yue Zhang, Eduardo Coutinho, Keelan Evanini

Special Session: Speech, Audio, and Language Processing Techniques Applied to Bird and Animal Vocalizations

Bird Song Synthesis Based on Hidden Markov Models
Jordi Bonada, Robert Lachlan, Merlijn Blaauw

Noise-Robust Hidden Markov Models for Limited Training Data for Within-Species Bird Phrase Classification
Kantapon Kaewtip, Charles Taylor, Abeer Alwan

A Framework for Automated Marmoset Vocalization Detection and Classification
Alan Wisler, Laura J. Brattain, Rogier Landman, Thomas F. Quatieri

Call Alternation Between Specific Pairs of Male Frogs Revealed by a Sound-Imaging Method in Their Natural Habitat
Ikkyu Aihara, Takeshi Mizumoto, Hiromitsu Awano, Hiroshi G. Okuno

Sinusoidal Modelling for Ecoacoustics
Patrice Guyot, Alice Eldridge, Ying Chen Eyre-Walker, Alison Johnston, Thomas Pellegrini, Mika Peck

Individual Identity in Songbirds: Signal Representations and Metric Learning for Locating the Information in Complex Corvid Calls
Dan Stowell, Veronica Morfi, Lisa F. Gill

Recognition of Multiple Bird Species Based on Penalised Maximum Likelihood and HMM-Based Modelling of Individual Vocalisation Elements
Peter Jančovič, Münevver Köküer

Cost Effective Acoustic Monitoring of Bird Species
Ciira wa Maina

Feature Learning and Automatic Segmentation for Dolphin Communication Analysis
Daniel Kohlsdorf, Denise Herzing, Thad Starner

Localizing Bird Songs Using an Open Source Robot Audition System with a Microphone Array
Reiji Suzuki, Shiho Matsubayashi, Kazuhiro Nakadai, Hiroshi G. Okuno

Robust Detection of Multiple Bioacoustic Events with Repetitive Structures
Frank Kurth

A Real-Time Parametric General-Purpose Mammalian Vocal Synthesiser
Roger K. Moore

YIN-Bird: Improved Pitch Tracking for Bird Vocalisations
Colm O’Reilly, Nicola M. Marples, David J. Kelly, Naomi Harte

Learning, Education and Different Speech

Mispronunciation Detection Leveraging Maximum Performance Criterion Training of Acoustic Models and Decision Functions
Yao-Chi Hsu, Ming-Han Yang, Hsiao-Tsung Hung, Berlin Chen

Using Clinician Annotations to Improve Automatic Speech Recognition of Stuttered Speech
Peter A. Heeman, Rebecca Lunsford, Andy McMillin, J. Scott Yaruss

Deep Neural Networks for Voice Quality Assessment Based on the GRBAS Scale
Simin Xie, Nan Yan, Ping Yu, Manwa L. Ng, Lan Wang, Zhuanzhuan Ji

Automated Screening of Speech Development Issues in Children by Identifying Phonological Error Patterns
Lauren Ward, Alessandro Stefani, Daniel Smith, Andreas Duenser, Jill Freyne, Barbara Dodd, Angela Morgan

Automatic Pronunciation Evaluation of Non-Native Mandarin Tone by Using Multi-Level Confidence Measures
Ju Lin, Yanlu Xie, Jinsong Zhang

Dysarthric Speech Recognition Using Kullback-Leibler Divergence-Based Hidden Markov Model
Myungjong Kim, Jun Wang, Hoirin Kim

Detection of Total Syllables and Canonical Syllables in Infant Vocalizations
Anne S. Warlaumont, Heather L. Ramsdell-Hudock

Improving Automatic Recognition of Aphasic Speech with AphasiaBank
Duc Le, Emily Mower Provost

Pronunciation Assessment of Japanese Learners of French with GOP Scores and Phonetic Information
Vincent Laborde, Thomas Pellegrini, Lionel Fontan, Julie Mauclair, Halima Sahraoui, Jérôme Farinas

Pronunciation Error Detection for New Language Learners
Sean Robertson, Cosmin Munteanu, Gerald Penn

L2 English Rhythm in Read Speech by Chinese Students
Hongwei Ding, Xinping Xu

Speech and Audio Segmentation and Classification

Deep Neural Network Bottleneck Features for Acoustic Event Recognition
Seongkyu Mun, Suwon Shon, Wooil Kim, Hanseok Ko

Combining Energy and Cross-Entropy Analysis for Nuclear Segments Detection
Antonio Origlia, Francesco Cutugno

Anchored Speech Detection
Roland Maas, Sree Hari Krishnan Parthasarathi, Brian King, Ruitong Huang, Björn Hoffmeister

Towards Smart-Cars That Can Listen: Abnormal Acoustic Event Detection on the Road
Mahesh Kumar Nandwana, Taufiq Hasan

Hierarchical Classification of Speaker and Background Noise and Estimation of SNR Using Sparse Representation
K.V. Vijay Girish, A.G. Ramakrishnan, T.V. Ananthapadmanabha

Robust Sound Event Detection in Continuous Audio Environments
Haomin Zhang, Ian McLoughlin, Yan Song

Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Recognition
Naoya Takahashi, Michael Gygli, Beat Pfister, Luc Van Gool

Artificial Neural Network-Based Feature Combination for Spatial Voice Activity Detection
Stefan Meier, Walter Kellermann

HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors
Tomi Kinnunen, Alexey Sholokhov, Elie Khoury, Dennis Alexander Lehmann Thomsen, Md. Sahidullah, Zheng-Hua Tan

Manual versus Automated: The Challenging Routine of Infant Vocalisation Segmentation in Home Videos to Study Neuro(mal)development
Florian B. Pokorny, Robert Peharz, Wolfgang Roth, Matthias Zöhrer, Franz Pernkopf, Peter B. Marschik, Björn Schuller

Minimizing Annotation Effort for Adaptation of Speech-Activity Detection Systems
Luciana Ferrer, Martin Graciarena

New Products and Services

Progress and Prospects for Spoken Language Technology: What Ordinary People Think
Roger K. Moore, Hui Li, Shih-Hao Liao

Progress and Prospects for Spoken Language Technology: Results from Four Sexennial Surveys
Roger K. Moore, Ricard Marxer

On Employing a Highly Mismatched Crowd for Speech Transcription
Purushotam Radadia, Rahul Kumar, Kanika Kalra, Shirish Karande, Sachin Lodha

Sage: The New BBN Speech Processing Platform
Roger Hsiao, Ralf Meermeier, Tim Ng, Zhongqiang Huang, Maxwell Jordan, Enoch Kan, Tanel Alumäe, Jan Silovsky, William Hartmann, Francis Keith, Omer Lang, Manhung Siu, Owen Kimball

DNN-Based Feature Enhancement Using Joint Training Framework for Robust Multichannel Speech Recognition
Kang Hyun Lee, Tae Gyoon Kang, Woo Hyun Kang, Nam Soo Kim

Deep Neural Network Frontend for Continuous EMG-Based Speech Recognition
Michael Wand, Jürgen Schmidhuber

Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages
Basil Abraham, S. Umesh, Neethu Mariam Joy

Multi-Language Neural Network Language Models
Anton Ragni, Edgar Dakin, Xie Chen, Mark J.F. Gales, Kate M. Knill

Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration
Ottokar Tilk, Tanel Alumäe

TheanoLM — An Extensible Toolkit for Neural Network Language Modeling
Seppo Enarvi, Mikko Kurimo

Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition Systems
P. Lanchantin, Mark J.F. Gales, Penny Karanasou, X. Liu, Y. Qian, L. Wang, P.C. Woodland, C. Zhang

Manipulating Word Lattices to Incorporate Human Corrections
Yashesh Gaur, Florian Metze, Jeffrey P. Bigham

Context-Aware Restaurant Recommendation for Natural Language Queries: A Formative User Study in the Automotive Domain
Philipp Fischer, Cornelius Styp von Rekowski, Andreas Nürnberger

Teaming Up: Making the Most of Diverse Representations for a Novel Personalized Speech Retrieval Application
Stephanie Pancoast, Murat Akbacak

Automatic Speech Transcription for Low-Resource Languages — The Case of Yoloxóchitl Mixtec (Mexico)
Vikramjit Mitra, Andreas Kathol, Jonathan D. Amith, Rey Castillo García

Real-Time Presentation Tracking Using Semantic Keyword Spotting
Reza Asadi, Harriet J. Fell, Timothy Bickmore, Ha Trinh

Music, Audio, and Source Separation

Improved Music Genre Classification with Convolutional Neural Networks
Weibin Zhang, Wenkang Lei, Xiangmin Xu, Xiaofeng Xing

Enhanced Harmonic Content and Vocal Note Based Predominant Melody Extraction from Vocal Polyphonic Music Signals
Gurunath Reddy M., K. Sreenivasa Rao

Long Short-Term Memory for Speaker Generalization in Supervised Speech Separation
Jitong Chen, DeLiang Wang

Phonotactic Language Identification for Singing
Anna M. Kruspe

Comparing the Influence of Spectro-Temporal Integration in Computational Speech Segregation
Thomas Bentsen, Tobias May, Abigail A. Kressner, Torsten Dau

Blind Speech Separation with GCC-NMF
Sean U.N. Wood, Jean Rouat

Effects of Cochlear Hearing Loss on the Benefits of Ideal Binary Masking
Vahid Montazeri, Shaikat Hossain, Peter F. Assmann

Combining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural Networks
Emad M. Grais, Gerard Roma, Andrew J.R. Simpson, Mark D. Plumbley

Monaural Source Separation Using a Random Forest Classifier
Cosimo Riday, Saurabh Bhargava, Richard H.R. Hahnloser, Shih-Chii Liu

Adaptive Group Sparsity for Non-Negative Matrix Factorization with Application to Unsupervised Source Separation
Xu Li, Ziteng Wang, Xiaofei Wang, Qiang Fu, Yonghong Yan

A Robust Dual-Microphone Speech Source Localization Algorithm for Reverberant Environments
Yanmeng Guo, Xiaofei Wang, Chao Wu, Qiang Fu, Ning Ma, Guy J. Brown

Speech Localisation in a Multitalker Mixture by Humans and Machines
Ning Ma, Guy J. Brown

Reverberation-Robust One-Bit TDOA Based Moving Source Localization for Automatic Camera Steering
Harshavardhan Sundar, Gokul Deepak Manavalan, T.V. Sreenivas, Chandra Sekhar Seelamantula

Multi-Talker Speech Recognition Based on Blind Source Separation with ad hoc Microphone Array Using Smartphones and Cloud Storage
Keiko Ochi, Nobutaka Ono, Shigeki Miyabe, Shoji Makino

Acoustic Modeling with Neural Networks

Phase-Aware Signal Processing for Automatic Speech Recognition
Johannes Fahringer, Tobias Schrank, Johannes Stahl, Pejman Mowlaee, Franz Pernkopf

Unsupervised Deep Auditory Model Using Stack of Convolutional RBMs for Speech Recognition
Hardik B. Sailor, Hemant A. Patil

Interpretation of Low Dimensional Neural Network Bottleneck Features in Terms of Human Perception and Production
Philip Weber, Linxue Bai, Martin Russell, Peter Jančovič, Stephen Houghton

Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition
Shiliang Zhang, Hui Jiang, Shifu Xiong, Si Wei, Li-Rong Dai

Future Context Attention for Unidirectional LSTM Based Acoustic Model
Jian Tang, Shiliang Zhang, Si Wei, Li-Rong Dai

Hybrid Accelerated Optimization for Speech Recognition
Jen-Tzung Chien, Pei-Wen Huang, Tan Lee

On Online Attention-Based Speech Recognition and Joint Mandarin Character-Pinyin Training
William Chan, Ian Lane

GMM-Free Flat Start Sequence-Discriminative DNN Training
Gábor Gosztolya, Tamás Grósz, László Tóth

Open-Domain Audio-Visual Speech Recognition: A Deep Learning Approach
Yajie Miao, Florian Metze

Multidimensional Residual Learning Based on Recurrent Neural Networks for Acoustic Modeling
Yuanyuan Zhao, Shuang Xu, Bo Xu

Towards Online-Recognition with Deep Bidirectional LSTM Acoustic Models
Albert Zeyer, Ralf Schlüter, Hermann Ney

Advances in Very Deep Convolutional Neural Networks for LVCSR
Tom Sercu, Vaibhava Goel

Acoustic Modelling from the Signal Domain Using CNNs
Pegah Ghahremani, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur

Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition
Yevgen Chebotar, Austin Waters

Triphone State-Tying via Deep Canonical Correlation Analysis
Weiran Wang, Hao Tang, Karen Livescu

Low-Rank Representation of Nearest Neighbor Posterior Probabilities to Enhance DNN Based Acoustic Modeling
Gil Luyet, Pranay Dighe, Afsaneh Asaei, Hervé Bourlard

Speech Enhancement and Noise Reduction

Novel Subband Autoencoder Features for Non-Intrusive Quality Assessment of Noise Suppressed Speech
Meet H. Soni, Hemant A. Patil

SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement
Tian Gao, Jun Du, Li-Rong Dai, Chin-Hui Lee

A Novel Risk-Estimation-Theoretic Framework for Speech Enhancement in Nonstationary and Non-Gaussian Noise Conditions
Jishnu Sadasivan, Chandra Sekhar Seelamantula

Two-Stage Temporal Processing for Single-Channel Speech Enhancement
Suman Samui, Indrajit Chakrabarti, Soumya Kanti Ghosh

A Class-Specific Speech Enhancement for Phoneme Recognition: A Dictionary Learning Approach
Nazreen P.M., A.G. Ramakrishnan, Prasanta Kumar Ghosh

Robust Example Search Using Bottleneck Features for Example-Based Speech Enhancement
Atsunori Ogawa, Shogo Seki, Keisuke Kinoshita, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, Kazuya Takeda

Speech Enhancement in Multiple-Noise Conditions Using Deep Neural Networks
Anurag Kumar, Dinei Florencio

Perception Optimized Deep Denoising AutoEncoders for Speech Enhancement
Prashanth Gurunath Shivakumar, Panayiotis Georgiou

HMM-Based Speech Enhancement Using Sub-Word Models and Noise Adaptation
Akihiro Kato, Ben Milner

Semi-Supervised Joint Enhancement of Spectral and Cepstral Sequences of Noisy Speech
Li Li, Hirokazu Kameoka, Takuya Higuchi, Hiroshi Saruwatari

A priori SNR Estimation Using a Generalized Decision Directed Approach
Aleksej Chinaev, Reinhold Haeb-Umbach

A DNN-HMM Approach to Non-Negative Matrix Factorization Based Speech Enhancement
Ziteng Wang, Xu Li, Xiaofei Wang, Qiang Fu, Yonghong Yan

SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement
Szu-Wei Fu, Yu Tsao, Xugang Lu

An Iterative Phase Recovery Framework with Phase Mask for Spectral Mapping with an Application to Speech Enhancement
Kehuang Li, Bo Wu, Chin-Hui Lee

A Novel Research to Artificial Bandwidth Extension Based on Deep BLSTM Recurrent Neural Networks and Exemplar-Based Sparse Representation
Bin Liu, Jianhua Tao

Far-Field, Robustness and Adaptation

Coping with Unseen Data Conditions: Investigating Neural Net Architectures, Robust Features, and Information Fusion for Robust Speech Recognition
Vikramjit Mitra, Horacio Franco

On the Use of Gaussian Mixture Model Framework to Improve Speaker Adaptation of Deep Neural Network Acoustic Models
Natalia Tomashenko, Yuri Khokhlov, Yannick Estève

Analytical Assessment of Dual-Stream Merging for Noise-Robust ASR
Louis ten Bosch, Bert Cranen, Yang Sun

Use of Generalised Nonlinearity in Vector Taylor Series Noise Compensation for Robust Speech Recognition
Erfan Loweimi, Jon Barker, Thomas Hain

Joint Optimization of Denoising Autoencoder and DNN Acoustic Model Based on Multi-Target Learning for Noisy Speech Recognition
Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Optimization of Speech Enhancement Front-End with Speech Recognition-Level Criterion
Takuya Higuchi, Takuya Yoshioka, Tomohiro Nakatani

Factorized Linear Input Network for Acoustic Model Adaptation in Noisy Conditions
Dung T. Tran, Marc Delcroix, Atsunori Ogawa, Tomohiro Nakatani

Data Augmentation Using Multi-Input Multi-Output Source Separation for Deep Neural Network Based Acoustic Modeling
Yusuke Fujita, Ryoichi Takashima, Takeshi Homma, Masahito Togami

Microphone Distance Adaptation Using Cluster Adaptive Training for Robust Far Field Speech Recognition
Animesh Prasad, Khe Chai Sim

An Investigation on the Use of i-Vectors for Robust ASR
Dimitrios Dimitriadis, Samuel Thomas, Sriram Ganapathy

The Sheffield Wargame Corpus — Day Two and Day Three
Yulan Liu, Charles Fox, Madina Hasan, Thomas Hain

Recurrent Models for Auditory Attention in Multi-Microphone Distant Speech Recognition
Suyoun Kim, Ian Lane

Semi-Supervised Speaker Adaptation for In-Vehicle Speech Recognition with Deep Neural Networks
Wonkyum Lee, Kyu J. Han, Ian Lane


Keynote 1: ISCA Medalist: John Makhoul

Neural Networks in Speech Recognition

Special Session: Auditory-Visual Expressive Speech and Gesture in Humans and Machines

Speech and Language Processing for Clinical Health Applications

Speech Coding and Audio Processing for Noise Reduction

Speech Analysis

First and Second Language Acquisition

Speech and Hearing Disorders & Perception

Speech Synthesis Poster

Topics in Speech Processing

Show & Tell Session 1

New Trends in Neural Networks for Speech Recognition

Special Session: The RedDots Challenge: Towards Characterizing Speakers from Short Utterances

Articulatory Measurements and Analysis

Automatic Assessment of Emotions

Acoustic and Articulatory Phonetics

Source Separation and Spatial Audio

Special Session: Auditory-Visual Expressive Speech and Gesture in Humans and Machines

Special Session: Intelligibility Under the Microscope

Spoken Documents, Spoken Understanding and Semantic Analysis

Spoken Term Detection

Show & Tell Session 2

Feature Extraction and Acoustic Modeling Using Neural Networks for ASR

Special Session: The Speakers in the Wild (SITW) Speaker Recognition Challenge

Non-Native Speech Perception

Behavioral Signal Processing and Speaker State and Traits Analytics

Spoken Term Detection

Co-Inference of Production and Acoustics

Acoustic and Articulatory Phonetics

Prosody, Phonation and Voice Quality

Speech Production Analysis and Modeling

Spoken Dialogue Systems

Show & Tell Session 3

Special Event: Mindfulness

Keynote 2: Edward Chang

Special Event: Speaker Comparison for Forensic and Investigative Applications II

Special Session: Clinical and Neuroscience-Inspired Vocal Biomarkers of Neurological and Psychiatric Disorders

Special Session: Singing Synthesis Challenge: Fill-In the Gap

Conversation and Interaction

Automatic Learning of Representations

Language Modeling for Conversational Speech and Confidence Measures

Topics in Speech Perception

Behavioral Signal Processing and Speaker State and Traits Analytics

Speech Synthesis Poster

Resources and Annotation of Resources

Show & Tell Session 4

Acoustic Model Adaptation

Special Session: Sharing Research and Education Resources for Understanding Speech Processing

Special Session: Voice Conversion Challenge

Intelligibility and Masking

Robust Speaker Recognition and Anti-Spoofing

Speech Enhancement and Applications

Speech Analysis

Speaker Recognition

Decoding, System Combination

Special Session: Clinical and Neuroscience-Inspired Vocal Biomarkers of Neurological and Psychiatric Disorders

Show & Tell Session 5

Keynote 3: Anne Fernald

Far-Field Speech Processing

Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception, Sincerity & Native Language

Special Session: Speech, Audio, and Language Processing Techniques Applied to Bird and Animal Vocalizations

Dialogue Systems and Analysis of Dialogue

Interaction between Speech Production and Perception

Multimodal Processing

Pitch, Tone, and Music

Speaker Diarization and Recognition

Speech Synthesis Poster

Language Model Adaptation

Show & Tell Session 6

Robustness in Speech Processing

Special Session: Interspeech 2016 Computational Paralinguistics Challenge (ComParE): Deception, Sincerity & Native Language

Acoustic and Articulatory Phonetics

Speech Synthesis Oral I: Neural Networks

Speech Quality & Intelligibility

Speech Translation and Metadata for Linguistic/Discourse Structure

Speech Coding and Audio Processing for Noise Reduction

Special Session: Speech, Audio, and Language Processing Techniques Applied to Bird and Animal Vocalizations

Learning, Education and Different Speech

Dialogue Systems and Analysis of Dialogue

Topics in Speech Recognition

Special Session: Realism in Robust Speech Processing

Spoken Word Recognition

Speech Synthesis Oral: High Level Linguistic Features

Speech Enhancement

Dialogue: Backchannels and Turntaking

Language Recognition

Speech and Audio Segmentation and Classification

New Products and Services

Low Resource Speech Recognition

Keynote 4: Dan Jurafsky

Special Event: Speech Ventures

Special Session: Speech and Language Technologies for Human-Machine Conversation-Based Language Education

Phonation and Voice Quality

Speech Synthesis Oral: Prosody and Expressive Speech

Language Recognition

Spoken Language Understanding Systems

Language Recognition

Music, Audio, and Source Separation

Acoustic Modeling with Neural Networks

Robustness and Adaptation

Special Event: Computational Approaches to Linguistic Code Switching

Neural Networks for Language Modeling

Special Session: Sub-Saharan African Languages: From Speech Fundamentals to Applications

Speech Production Models

Speaker States and Traits

Speaker Recognition

VAD and Audio Events

Spoken Term Detection

Speech Enhancement and Noise Reduction

Far-Field, Robustness and Adaptation

Low Resource Speech Recognition