Interspeech 2017

Stockholm, Sweden
20-24 August 2017

Chair: Francisco Lacerda
doi: 10.21437/Interspeech.2017

Speech Analysis and Representation 2

Low-Dimensional Representation of Spectral Envelope Without Deterioration for Full-Band Speech Analysis/Synthesis System
Masanori Morise, Genta Miyashita, Kenji Ozawa

Robust Source-Filter Separation of Speech Signal in the Phase Domain
Erfan Loweimi, Jon Barker, Oscar Saz Torralba, Thomas Hain

A Time-Warping Pitch Tracking Algorithm Considering Fast f0 Changes
Simon Stone, Peter Steiner, Peter Birkholz

A Modulation Property of Time-Frequency Derivatives of Filtered Phase and its Application to Aperiodicity and fo Estimation
Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda

Non-Local Estimation of Speech Signal for Vowel Onset Point Detection in Varied Environments
Avinash Kumar, S. Shahnawazuddin, Gayadhar Pradhan

Time-Domain Envelope Modulating the Noise Component of Excitation in a Continuous Residual-Based Vocoder for Statistical Parametric Speech Synthesis
Mohammed Salah Al-Radhi, Tamás Gábor Csapó, Géza Németh

Wavelet Speech Enhancement Based on Robust Principal Component Analysis
Chia-Lung Wu, Hsiang-Ping Hsu, Syu-Siang Wang, Jeih-Weih Hung, Ying-Hui Lai, Hsin-Min Wang, Yu Tsao

Vowel Onset Point Detection Using Sonority Information
Bidisha Sharma, S.R. Mahadeva Prasanna

Analytic Filter Bank for Speech Analysis, Feature Extraction and Perceptual Studies
Unto K. Laine

Learning the Mapping Function from Voltage Amplitudes to Sensor Positions in 3D-EMA Using Deep Neural Networks
Christian Kroos, Mark D. Plumbley

Search, Computational Strategies and Language Modeling

Rescoring-Aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition
Ian Williams, Petar Aleksic

Comparison of Decoding Strategies for CTC Acoustic Models
Thomas Zenkel, Ramon Sanabria, Florian Metze, Jan Niehues, Matthias Sperber, Sebastian Stüker, Alex Waibel

Phone Duration Modeling for LVCSR Using Neural Networks
Hossein Hadian, Daniel Povey, Hossein Sameti, Sanjeev Khudanpur

Towards Better Decoding and Language Model Integration in Sequence to Sequence Models
Jan Chorowski, Navdeep Jaitly

Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling
Wenpeng Li, Binbin Zhang, Lei Xie, Dong Yu

Binary Deep Neural Networks for Speech Recognition
Xu Xiang, Yanmin Qian, Kai Yu

Hierarchical Constrained Bayesian Optimization for Feature, Acoustic Model and Decoder Parameter Optimization
Akshay Chandrashekaran, Ian Lane

Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition
Shohei Toyama, Daisuke Saito, Nobuaki Minematsu

Joint Learning of Correlated Sequence Labeling Tasks Using Bidirectional Recurrent Neural Networks
Vardaan Pahuja, Anirban Laha, Shachar Mirkin, Vikas Raykar, Lili Kotlerman, Guy Lev

Estimation of Gap Between Current Language Models and Human Performance
Xiaoyu Shen, Youssef Oualil, Clayton Greenberg, Mittul Singh, Dietrich Klakow

A Phonological Phrase Sequence Modelling Approach for Resource Efficient and Robust Real-Time Punctuation Recovery
Anna Moró, György Szaszák

Speech Perception

Factors Affecting the Intelligibility of Low-Pass Filtered Speech
Lei Wang, Fei Chen

Phonetic Restoration of Temporally Reversed Speech
Shi-yu Wang, Fei Chen

Simultaneous Articulatory and Acoustic Distortion in L1 and L2 Listening: Locally Time-Reversed “Fast” Speech
Mako Ishida

Lexically Guided Perceptual Learning in Mandarin Chinese
L. Ann Burchfield, San-hei Kenny Luk, Mark Antoniou, Anne Cutler

The Effect of Spectral Profile on the Intelligibility of Emotional Speech in Noise
Chris Davis, Chee Seng Chong, Jeesun Kim

Whether Long-Term Tracking of Speech Rate Affects Perception Depends on Who is Talking
Merel Maslowski, Antje S. Meyer, Hans Rutger Bosker

Emotional Thin-Slicing: A Proposal for a Short- and Long-Term Division of Emotional Speech
Daniel Oliveira Peres, Dominic Watt, Waldemar Ferreira Netto

Predicting Epenthetic Vowel Quality from Acoustics
Adriana Guevara-Rukoz, Erika Parlato-Oliveira, Shi Yu, Yuki Hirose, Sharon Peperkamp, Emmanuel Dupoux

The Effect of Spectral Tilt on Size Discrimination of Voiced Speech Sounds
Toshie Matsui, Toshio Irino, Kodai Yamamoto, Hideki Kawahara, Roy D. Patterson

Misperceptions of the Emotional Content of Natural and Vocoded Speech in a Car
Jaime Lorenzo-Trueba, Cassia Valentini Botinhao, Gustav Eje Henter, Junichi Yamagishi

The Relative Cueing Power of F0 and Duration in German Prominence Perception
Oliver Niebuhr, Jana Winkler

Perception and Acoustics of Vowel Nasality in Brazilian Portuguese
Luciana Marques, Rebecca Scarborough

Sociophonetic Realizations Guide Subsequent Lexical Access
Jonny Kim, Katie Drager

Speech Production and Perception

Critical Articulators Identification from RT-MRI of the Vocal Tract
Samuel Silva, António Teixeira

Semantic Edge Detection for Tracking Vocal Tract Air-Tissue Boundaries in Real-Time Magnetic Resonance Images
Krishna Somandepalli, Asterios Toutios, Shrikanth S. Narayanan

Vocal Tract Airway Tissue Boundary Tracking for rtMRI Using Shape and Appearance Priors
Sasan Asadiabadi, Engin Erzin

An Objective Critical Distance Measure Based on the Relative Level of Spectral Valley
T.V. Ananthapadmanabha, A.G. Ramakrishnan, Shubham Sharma

Database of Volumetric and Real-Time Vocal Tract MRI for Speech Science
Tanner Sorensen, Zisis Skordilis, Asterios Toutios, Yoon-Chul Kim, Yinghua Zhu, Jangwon Kim, Adam Lammert, Vikram Ramanarayanan, Louis Goldstein, Dani Byrd, Krishna Nayak, Shrikanth S. Narayanan

The Influence on Realization and Perception of Lexical Tones from Affricate’s Aspiration
Chong Cao, Yanlu Xie, Qi Zhang, Jinsong Zhang

Audiovisual Recalibration of Vowel Categories
Matthias K. Franken, Frank Eisner, Jan-Mathijs Schoffelen, Daniel J. Acheson, Peter Hagoort, James M. McQueen

The Effect of Gesture on Persuasive Speech
Judith Peters, Marieke Hoetjes

Auditory-Visual Integration of Talker Gender in Cantonese Tone Perception
Wei Lai

Event-Related Potentials Associated with Somatosensory Effect in Audio-Visual Speech Perception
Takayuki Ito, Hiroki Ohashi, Eva Montas, Vincent L. Gracco

When a Dog is a Cat and How it Changes Your Pupil Size: Pupil Dilation in Response to Information Mismatch
Lena F. Renner, Marcin Włodarczak

Cross-Modal Analysis Between Phonation Differences and Texture Images Based on Sentiment Correlations
Win Thuzar Kyaw, Yoshinori Sagisaka

Wireless Neck-Surface Accelerometer and Microphone on Flex Circuit with Application to Noise-Robust Monitoring of Lombard Speech
Daryush D. Mehta, Patrick C. Chwalek, Thomas F. Quatieri, Laura J. Brattain

Video-Based Tracking of Jaw Movements During Speech: Preliminary Results and Future Directions
Andrea Bandini, Aravind Namasivayam, Yana Yunusova

Accurate Synchronization of Speech and EGG Signal Using Phase Information
Sunil Kumar S.B., K. Sreenivasa Rao, Tanumay Mandal

The Acquisition of Focal Lengthening in Stockholm Swedish
Anna Sara H. Romøren, Aoju Chen

Short Utterances Speaker Recognition

A Generative Model for Score Normalization in Speaker Recognition
Albert Swart, Niko Brümmer

Content Normalization for Text-Dependent Speaker Verification
Subhadeep Dey, Srikanth Madikeri, Petr Motlicek, Marc Ferras

End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances
Chunlei Zhang, Kazuhito Koishida

Adversarial Network Bottleneck Features for Noise Robust Speaker Verification
Hong Yu, Zheng-Hua Tan, Zhanyu Ma, Jun Guo

What Does the Speaker Embedding Encode?
Shuai Wang, Yanmin Qian, Kai Yu

Incorporating Local Acoustic Variability Information into Short Duration Speaker Verification
Jianbo Ma, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Kong Aik Lee

DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances
Jinghua Zhong, Wenping Hu, Frank K. Soong, Helen Meng

Time-Varying Autoregressions for Speaker Verification in Reverberant Conditions
Ville Vestman, Dhananjaya Gowda, Md. Sahidullah, Paavo Alku, Tomi Kinnunen

Deep Speaker Embeddings for Short-Duration Speaker Verification
Gautam Bhattacharya, Jahangir Alam, Patrick Kenny

Using Voice Quality Features to Improve Short-Utterance, Text-Independent Speaker Verification Systems
Soo Jin Park, Gary Yeung, Jody Kreiman, Patricia A. Keating, Abeer Alwan

Gain Compensation for Fast i-Vector Extraction Over Short Duration
Kong Aik Lee, Haizhou Li

Joint Training of Expanded End-to-End DNN for Text-Dependent Speaker Verification
Hee-soo Heo, Jee-weon Jung, IL-ho Yang, Sung-hyun Yoon, Ha-jin Yu

Dialog Modeling

Online End-of-Turn Detection from Speech Based on Stacked Time-Asynchronous Sequential Networks
Ryo Masumura, Taichi Asami, Hirokazu Masataki, Ryo Ishii, Ryuichiro Higashinaka

Improving Prediction of Speech Activity Using Multi-Participant Respiratory State
Marcin Włodarczak, Kornel Laskowski, Mattias Heldner, Kätlin Aare

Turn-Taking Offsets and Dialogue Context
Peter A. Heeman, Rebecca Lunsford

Towards Deep End-of-Turn Prediction for Situated Spoken Dialogue Systems
Angelika Maier, Julian Hough, David Schlangen

End-of-Utterance Prediction by Prosodic Features and Phrase-Dependency Structure in Spontaneous Japanese Speech
Yuichi Ishimoto, Takehiro Teraoka, Mika Enomoto

Turn-Taking Estimation Model Based on Joint Embedding of Lexical and Prosodic Contents
Chaoran Liu, Carlos Ishi, Hiroshi Ishiguro

Social Signal Detection in Spontaneous Dialogue Using Bidirectional LSTM-CTC
Hirofumi Inaguma, Koji Inoue, Masato Mimura, Tatsuya Kawahara

Entrainment in Multi-Party Spoken Dialogues at Multiple Linguistic Levels
Zahra Rahimi, Anish Kumar, Diane Litman, Susannah Paletz, Mingzhi Yu

Measuring Synchrony in Task-Based Dialogues
Justine Reverdy, Carl Vogel

Sequence to Sequence Modeling for User Simulation in Dialog Systems
Paul Crook, Alex Marin

Human and Automated Scoring of Fluency, Pronunciation and Intonation During Human–Machine Spoken Dialog Interactions
Vikram Ramanarayanan, Patrick L. Lange, Keelan Evanini, Hillary R. Molloy, David Suendermann-Oeft

Hierarchical LSTMs with Joint Learning for Estimating Customer Satisfaction from Contact Center Calls
Atsushi Ando, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono

Domain-Independent User Satisfaction Reward Estimation for Dialogue Policy Learning
Stefan Ultes, Paweł Budzianowski, Iñigo Casanueva, Nikola Mrkšić, Lina Rojas-Barahona, Pei-Hao Su, Tsung-Hsien Wen, Milica Gašić, Steve Young

Analysis of the Relationship Between Prosodic Features of Fillers and its Forms or Occurrence Positions
Shizuka Nakamura, Ryosuke Nakanishi, Katsuya Takanashi, Tatsuya Kawahara

Cross-Subject Continuous Emotion Recognition Using Speech and Body Motion in Dyadic Interactions
Syeda Narjis Fatima, Engin Erzin

L1 and L2 Acquisition

An Automatically Aligned Corpus of Child-Directed Speech
Micha Elsner, Kiwako Ito

A Comparison of Danish Listeners’ Processing Cost in Judging the Truth Value of Norwegian, Swedish, and English Sentences
Ocke-Schwen Bohn, Trine Askjær-Jørgensen

On the Role of Temporal Variability in the Acquisition of the German Vowel Length Contrast
Felicitas Kleber

A Data-Driven Approach for Perceptually Validated Acoustic Features for Children’s Sibilant Fricative Productions
Patrick F. Reidy, Mary E. Beckman, Jan Edwards, Benjamin Munson

Proficiency Assessment of ESL Learner’s Sentence Prosody with TTS Synthesized Voice as Reference
Yujia Xiao, Frank K. Soong

Mechanisms of Tone Sandhi Rule Application by Non-Native Speakers
Si Chen, Yunjuan He, Chun Wah Yuen, Bei Li, Yike Yang

Changes in Early L2 Cue-Weighting of Non-Native Speech: Evidence from Learners of Mandarin Chinese
Seth Wiener

Directing Attention During Perceptual Training: A Preliminary Study of Phonetic Learning in Southern Min by Mandarin Speakers
Ying Chen, Eric Pederson

Prosody Analysis of L2 English for Naturalness Evaluation Through Speech Modification
Dean Luo, Ruxin Luo, Lixin Wang

Measuring Encoding Efficiency in Swedish and English Language Learner Speech Production
Gintarė Grigonytė, Gerold Schneider

Lexical Adaptation to a Novel Accent in German: A Comparison Between German, Swedish, and Finnish Listeners
Adriana Hanulíková, Jenny Ekström

Qualitative Differences in L3 Learners’ Neurophysiological Response to L1 versus L2 Transfer
Alejandra Keidel Fernández, Thomas Hörberg

Articulation Rate in Swedish Child-Directed Speech Increases as a Function of the Age of the Child Even When Surprisal is Controlled for
Johan Sjons, Thomas Hörberg, Robert Östling, Johannes Bjerva

The Relationship Between the Perception and Production of Non-Native Tones
Kaile Zhang, Gang Peng

MMN Responses in Adults After Exposure to Bimodal and Unimodal Frequency Distributions of Rotated Speech
Ellen Marklund, Elísabet Eir Cortes, Johan Sjons

Voice, Speech and Hearing Disorders

Float Like a Butterfly Sting Like a Bee: Changes in Speech Preceded Parkinsonism Diagnosis for Muhammad Ali
Visar Berisha, Julie Liss, Timothy Huston, Alan Wisler, Yishan Jiao, Jonathan Eig

Cepstral and Entropy Analyses in Vowels Excerpted from Continuous Speech of Dysphonic and Control Speakers
Antonella Castellana, Andreas Selamtzis, Giampiero Salvi, Alessio Carullo, Arianna Astolfi

Classification of Bulbar ALS from Kinematic Features of the Jaw and Lips: Towards Computer-Mediated Assessment
Andrea Bandini, Jordan R. Green, Lorne Zinman, Yana Yunusova

Zero Frequency Filter Based Analysis of Voice Disorders
Nagaraj Adiga, Vikram C.M., Keerthi Pullela, S.R. Mahadeva Prasanna

Hypernasality Severity Analysis in Cleft Lip and Palate Speech Using Vowel Space Area
Nikitha K., Sishir Kalita, C.M. Vikram, M. Pushpavathi, S.R. Mahadeva Prasanna

Automatic Prediction of Speech Evaluation Metrics for Dysarthric Speech
Imed Laaridh, Waad Ben Kheder, Corinne Fredouille, Christine Meunier

Apkinson — A Mobile Monitoring Solution for Parkinson’s Disease
Philipp Klumpp, Thomas Janu, Tomás Arias-Vergara, J.C. Vásquez-Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth

Dysprosody Differentiate Between Parkinson’s Disease, Progressive Supranuclear Palsy, and Multiple System Atrophy
Jan Hlavnička, Tereza Tykalová, Roman Čmejla, Jiří Klempíř, Evžen Růžička, Jan Rusz

Interpretable Objective Assessment of Dysarthric Speech Based on Deep Neural Networks
Ming Tu, Visar Berisha, Julie Liss

Deep Autoencoder Based Speech Features for Improved Dysarthric Speech Recognition
Bhavik Vachhani, Chitralekha Bhat, Biswajit Das, Sunil Kumar Kopparapu

Prediction of Speech Delay from Acoustic Measurements
Jason Lilley, Madhavi Ratnagiri, H. Timothy Bunnell

The Frequency Range of “The Ling Six Sounds” in Standard Chinese
Aijun Li, Hua Zhang, Wen Sun

Production of Sustained Vowels and Categorical Perception of Tones in Mandarin Among Cochlear-Implanted Children
Wentao Gu, Jiao Yin, James Mahshie

Source Separation and Voice Activity Detection

Audio Content Based Geotagging in Multimedia
Anurag Kumar, Benjamin Elizalde, Bhiksha Raj

Time Delay Histogram Based Speech Source Separation Using a Planar Array
Zhaoqiong Huang, Zhanzhong Cao, Dongwen Ying, Jielin Pan, Yonghong Yan

Excitation Source Features for Improving the Detection of Vowel Onset and Offset Points in a Speech Sequence
Gayadhar Pradhan, Avinash Kumar, S. Shahnawazuddin

A Contrast Function and Algorithm for Blind Separation of Audio Signals
Wei Gao, Roberto Togneri, Victor Sreeram

Weighted Spatial Covariance Matrix Estimation for MUSIC Based TDOA Estimation of Speech Source
Chenglin Xu, Xiong Xiao, Sining Sun, Wei Rao, Eng Siong Chng, Haizhou Li

Speaker Direction-of-Arrival Estimation Based on Frequency-Independent Beampattern
Feng Guo, Yuhang Cao, Zheng Liu, Jiaen Liang, Baoqing Li, Xiaobing Yuan

A Mask Estimation Method Integrating Data Field Model for Speech Enhancement
Xianyun Wang, Changchun Bao, Feng Bao

Improved End-of-Query Detection for Streaming Speech Recognition
Matt Shannon, Gabor Simko, Shuo-Yiin Chang, Carolina Parada

Using Approximated Auditory Roughness as a Pre-Filtering Feature for Human Screaming and Affective Speech AED
Di He, Zuofu Cheng, Mark Hasegawa-Johnson, Deming Chen

Improving Source Separation via Multi-Speaker Representations
Jeroen Zegers, Hugo Van hamme

Multiple Sound Source Counting and Localization Based on Spatial Principal Eigenvector
Bing Yang, Hong Liu, Cheng Pang

Subband Selection for Binaural Speech Source Localization
Girija Ramesan Karthik, Prasanta Kumar Ghosh

Unmixing Convolutive Mixtures by Exploiting Amplitude Co-Modulation: Methods and Evaluation on Mandarin Speech Recordings
Bo-Rui Chen, Huang-Yi Lee, Yi-Wen Liu

Bimodal Recurrent Neural Network for Audiovisual Voice Activity Detection
Fei Tao, Carlos Busso

Domain-Specific Utterance End-Point Detection for Speech Recognition
Roland Maas, Ariya Rastrow, Kyle Goehner, Gautam Tiwari, Shaun Joseph, Björn Hoffmeister

Speech Detection and Enhancement Using Single Microphone for Distant Speech Applications in Reverberant Environments
Vinay Kothapally, John H.L. Hansen


A Post-Filtering Approach Based on Locally Linear Embedding Difference Compensation for Speech Enhancement
Yi-Chiao Wu, Hsin-Te Hwang, Syu-Siang Wang, Chin-Cheng Hsu, Yu Tsao, Hsin-Min Wang

Multi-Target Ensemble Learning for Monaural Speech Separation
Hui Zhang, Xueliang Zhang, Guanglai Gao

Improved Example-Based Speech Enhancement by Using Deep Neural Network Acoustic Model for Noise Robust Example Search
Atsunori Ogawa, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani

Subjective Intelligibility of Deep Neural Network-Based Speech Enhancement
Femke B. Gelderblom, Tron V. Tronstad, Erlend Magnus Viggen

Real-Time Modulation Enhancement of Temporal Envelopes for Increasing Speech Intelligibility
Maria Koutsogiannaki, Holly Francois, Kihyun Choo, Eunmi Oh

On the Influence of Modifying Magnitude and Phase Spectrum to Enhance Noisy Speech Signals
Hans-Günter Hirsch, Michael Gref

MixMax Approximation as a Super-Gaussian Log-Spectral Amplitude Estimator for Speech Enhancement
Robert Rehr, Timo Gerkmann

Binary Mask Estimation Strategies for Constrained Imputation-Based Speech Enhancement
Ricard Marxer, Jon Barker

A Fully Convolutional Neural Network for Speech Enhancement
Se Rim Park, Jin Won Lee

Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral Regularization
Li Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino

A Comparison of Perceptually Motivated Loss Functions for Binary Mask Estimation in Speech Separation
Danny Websdale, Ben Milner

Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification
Daniel Michelsanti, Zheng-Hua Tan

Speech Enhancement Using Bayesian Wavenet
Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Dinei Florêncio, Mark Hasegawa-Johnson

Binaural Reverberant Speech Separation Based on Deep Neural Networks
Xueliang Zhang, DeLiang Wang

On the Quality and Intelligibility of Noisy Speech Processed for Near-End Listening Enhancement
Tudor-Cătălin Zorilă, Yannis Stylianou

Special Session: Digital Revolution for Under-resourced Languages 2

The ABAIR Initiative: Bringing Spoken Irish into the Digital Space
Ailbhe Ní Chasaide, Neasa Ní Chiaráin, Christoph Wendler, Harald Berthelsen, Andy Murphy, Christer Gobl

Very Low Resource Radio Browsing for Agile Developmental and Humanitarian Monitoring
Armin Saeb, Raghav Menon, Hugh Cameron, William Kibira, John Quinn, Thomas Niesler

Extracting Situation Frames from Non-English Speech: Evaluation Framework and Pilot Results
Nikolaos Malandrakis, Ondřej Glembek, Shrikanth S. Narayanan

Eliciting Meaningful Units from Speech
Daniil Kocharov, Tatiana Kachkovskaia, Pavel Skrelin

Unsupervised Speech Signal to Symbol Transformation for Zero Resource Speech Applications
Saurabhchand Bhati, Shekhar Nayak, K. Sri Rama Murty

Machine Assisted Analysis of Vowel Length Contrasts in Wolof
Elodie Gauthier, Laurent Besacier, Sylvie Voisin

Leveraging Text Data for Word Segmentation for Underresourced Languages
Thomas Glarner, Benedikt Boenninghoff, Oliver Walter, Reinhold Haeb-Umbach

Improving DNN Bluetooth Narrowband Acoustic Models by Cross-Bandwidth and Cross-Lingual Initialization
Xiaodan Zhuang, Arnab Ghoshal, Antti-Veikko Rosti, Matthias Paulik, Daben Liu

Joint Estimation of Articulatory Features and Acoustic Models for Low-Resource Languages
Basil Abraham, S. Umesh, Neethu Mariam Joy

Transfer Learning and Distillation Techniques to Improve the Acoustic Modeling of Low Resource Languages
Basil Abraham, Tejaswi Seeram, S. Umesh

Building an ASR Corpus Using Althingi’s Parliamentary Speeches
Inga Rún Helgadóttir, Róbert Kjaran, Anna Björk Nikulásdóttir, Jón Guðnason

Implementation of a Radiology Speech Recognition System for Estonian Using Open Source Software
Tanel Alumäe, Andrus Paats, Ivo Fridolin, Einar Meister

Building ASR Corpora Using Eyra
Jón Guðnason, Matthías Pétursson, Róbert Kjaran, Simon Klüpfel, Anna Björk Nikulásdóttir

Rapid Development of TTS Corpora for Four South African Languages
Daniel van Niekerk, Charl van Heerden, Marelie Davel, Neil Kleynhans, Oddur Kjartansson, Martin Jansche, Linne Ha

Uniform Multilingual Multi-Speaker Acoustic Model for Statistical Parametric Speech Synthesis of Low-Resourced Languages
Alexander Gutkin

Nativization of Foreign Names in TTS for Automatic Reading of World News in Swahili
Joseph Mendelson, Pilar Oplustil, Oliver Watts, Simon King

Speech Recognition: Technologies for New Applications and Paradigms

Developing On-Line Speaker Diarization System
Dimitrios Dimitriadis, Petr Fousek

Comparison of Non-Parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing
Shreyas Seshadri, Ulpu Remes, Okko Räsänen

Automatic Evaluation of Children Reading Aloud on Sentences and Pseudowords
Jorge Proença, Carla Lopes, Michael Tjalve, Andreas Stolcke, Sara Candeias, Fernando Perdigão

Off-Topic Spoken Response Detection with Word Embeddings
Su-Youn Yoon, Chong Min Lee, Ikkyu Choi, Xinhao Wang, Matthew Mulholland, Keelan Evanini

Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models
Wei Li, Nancy F. Chen, Sabato Marco Siniscalchi, Chin-Hui Lee

Automatic Explanation Spot Estimation Method Targeted at Text and Figures in Lecture Slides
Shoko Tsujimura, Kazumasa Yamamoto, Seiichi Nakagawa

Multiview Representation Learning via Deep CCA for Silent Speech Recognition
Myungjong Kim, Beiming Cao, Ted Mau, Jun Wang

Use of Graphemic Lexicons for Spoken Language Assessment
K.M. Knill, Mark J.F. Gales, K. Kyriakopoulos, A. Ragni, Y. Wang

Distilling Knowledge from an Ensemble of Models for Punctuation Prediction
Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Ya Li

A Mostly Data-Driven Approach to Inverse Text Normalization
Ernest Pusateri, Bharat Ram Ambati, Elizabeth Brooks, Ondrej Platek, Donald McAllaster, Venki Nagesha

Mismatched Crowdsourcing from Multiple Annotator Languages for Recognizing Zero-Resourced Languages: A Nullspace Clustering Approach
Wenda Chen, Mark Hasegawa-Johnson, Nancy F. Chen, Boon Pang Lim

Experiments in Character-Level Neural Network Models for Punctuation
William Gale, Sarangarajan Parthasarathy

Multi-Channel Apollo Mission Speech Transcripts Calibration
Lakshmish Kaushik, Abhijeet Sangwan, John H.L. Hansen

Speaker and Language Recognition Applications

Calibration Approaches for Language Detection
Mitchell McLaren, Luciana Ferrer, Diego Castan, Aaron Lawson

Bidirectional Modelling for Short Duration Language Identification
Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah, Julien Epps

Conditional Generative Adversarial Nets Classifier for Spoken Language Identification
Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai

Tied Hidden Factors in Neural Networks for End-to-End Speaker Recognition
Antonio Miguel, Jorge Llombart, Alfonso Ortega, Eduardo Lleida

Speaker Clustering by Iteratively Finding Discriminative Feature Space and Cluster Labels
Sungrack Yun, Hye Jin Jang, Taesu Kim

Domain Adaptation of PLDA Models in Broadcast Diarization by Means of Unsupervised Speaker Clustering
Ignacio Viñals, Alfonso Ortega, Jesús Villalba, Antonio Miguel, Eduardo Lleida

LSTM Neural Network-Based Speaker Segmentation Using Acoustic and Language Modelling
Miquel India, José A.R. Fonollosa, Javier Hernando

Acoustic Pairing of Original and Dubbed Voices in the Context of Video Game Localization
Adrien Gresse, Mickael Rouvier, Richard Dufour, Vincent Labatut, Jean-François Bonastre

Homogeneity Measure Impact on Target and Non-Target Trials in Forensic Voice Comparison
Moez Ajili, Jean-François Bonastre, Waad Ben Kheder, Solange Rossato, Juliette Kahn

Null-Hypothesis LLR: A Proposal for Forensic Automatic Speaker Recognition
Yosef A. Solewicz, Michael Jessen, David van der Vloed

The Opensesame NIST 2016 Speaker Recognition Evaluation System
Gang Liu, Qi Qian, Zhibin Wang, Qingen Zhao, Tianzhou Wang, Hao Li, Jian Xue, Shenghuo Zhu, Rong Jin, Tuo Zhao

IITG-Indigo System for NIST 2016 SRE Challenge
Nagendra Kumar, Rohan Kumar Das, Sarfaraz Jelil, Dhanush B.K., H. Kashyap, K. Sri Rama Murty, Sriram Ganapathy, Rohit Sinha, S.R. Mahadeva Prasanna

Locally Weighted Linear Discriminant Analysis for Robust Speaker Verification
Abhinav Misra, Shivesh Ranjan, John H.L. Hansen

Recursive Whitening Transformation for Speaker Recognition on Language Mismatched Condition
Suwon Shon, Seongkyu Mun, Hanseok Ko

Spoken Document Processing

Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings
Shane Settle, Keith Levin, Herman Kamper, Karen Livescu

Constructing Acoustic Distances Between Subwords and States Obtained from a Deep Neural Network for Spoken Term Detection
Daisuke Kaneko, Ryota Konno, Kazunori Kojima, Kazuyo Tanaka, Shi-wook Lee, Yoshiaki Itoh

Fast and Accurate OOV Decoder on High-Level Features
Yuri Khokhlov, Natalia Tomashenko, Ivan Medennikov, Aleksei Romanenko

Exploring the Use of Significant Words Language Modeling for Spoken Document Retrieval
Ying-Wen Chen, Kuan-Yu Chen, Hsin-Min Wang, Berlin Chen

Incorporating Acoustic Features for Spontaneous Speech Driven Content Retrieval
Hiroto Tasaki, Tomoyosi Akiba

Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification
Bo-Ru Lu, Frank Shyu, Yun-Nung Chen, Hung-Yi Lee, Lin-Shan Lee

Automatic Alignment Between Classroom Lecture Utterances and Slide Components
Masatoshi Tsuchiya, Ryo Minamiguchi

Compensating Gender Variability in Query-by-Example Search on Speech Using Voice Conversion
Paula Lopez-Otero, Laura Docio-Fernandez, Carmen Garcia-Mateo

Zero-Shot Learning Across Heterogeneous Overlapping Domains
Anjishnu Kumar, Pavankumar Reddy Muddireddy, Markus Dreyer, Björn Hoffmeister

Hierarchical Recurrent Neural Network for Story Segmentation
Emiru Tsunoo, Peter Bell, Steve Renals

Evaluating Automatic Topic Segmentation as a Segment Retrieval Task
Abdessalam Bouchekif, Delphine Charlet, Géraldine Damnati, Nathalie Camelin, Yannick Estève

Improving Speech Recognizers by Refining Broadcast Data with Inaccurate Subtitle Timestamps
Jeong-Uk Bang, Mu-Yeol Choi, Sang-Hun Kim, Oh-Wook Kwon

A Relevance Score Estimation for Spoken Term Detection Based on RNN-Generated Pronunciation Embeddings
Jan Švec, Josef V. Psutka, Luboš Šmídl, Jan Trmal

Articulatory and Acoustic Phonetics

Mental Representation of Japanese Mora; Focusing on its Intrinsic Duration
Kosuke Sugai

Temporal Dynamics of Lateral Channel Formation in /l/: 3D EMA Data from Australian English
Jia Ying, Christopher Carignan, Jason A. Shaw, Michael Proctor, Donald Derrick, Catherine T. Best

Vowel and Consonant Sequences in three Bavarian Dialects of Austria
Nicola Klingler, Sylvia Moosmüller, Hannes Scheutz

Acoustic Cues to the Singleton-Geminate Contrast: The Case of Libyan Arabic Sonorants
Amel Issa

Mel-Cepstral Distortion of German Vowels in Different Information Density Contexts
Erika Brandt, Frank Zimmerer, Bistra Andreeva, Bernd Möbius

Effect of Formant and F0 Discontinuity on Perceived Vowel Duration: Impacts for Concatenative Speech Synthesis
Tomáš Bořil, Pavel Šturm, Radek Skarnitzl, Jan Volín

An Ultrasound Study of Alveolar and Retroflex Consonants in Arrernte: Stressed and Unstressed Syllables
Marija Tabain, Richard Beare

Reshaping the Transformed LF Model: Generating the Glottal Source from the Waveshape Parameter Rd
Christer Gobl

Kinematic Signatures of Prosody in Lombard Speech
Štefan Beňuš, Juraj Šimko, Mona Lehtinen

What do Finnish and Central Bavarian Have in Common? Towards an Acoustically Based Quantity Typology
Markus Jochim, Felicitas Kleber

Locating Burst Onsets Using SFF Envelope and Phase Information
Bhanu Teja Nellore, RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, B. Yegnanarayana

A Preliminary Phonetic Investigation of Alphabetic Words in Mandarin Chinese
Hongwei Ding, Yuanyuan Zhang, Hongchao Liu, Chu-Ren Huang

A Quantitative Measure of the Impact of Coarticulation on Phone Discriminability
Thomas Schatz, Rory Turnbull, Francis Bach, Emmanuel Dupoux

Music and Audio Processing

Sinusoidal Partials Tracking for Singing Analysis Using the Heuristic of the Minimal Frequency and Magnitude Difference
Kin Wah Edward Lin, Hans Anderson, Clifford So, Simon Lui

Audio Scene Classification with Deep Recurrent Neural Networks
Huy Phan, Philipp Koch, Fabrice Katzberg, Marco Maass, Radoslaw Mazur, Alfred Mertins

Automatic Time-Frequency Analysis of Echolocation Signals Using the Matched Gaussian Multitaper Spectrogram
Maria Sandsten, Isabella Reinhold, Josefin Starkhammar

Classification-Based Detection of Glottal Closure Instants from Speech Signals
Jindřich Matoušek, Daniel Tihelka

A Domain Knowledge-Assisted Nonlinear Model for Head-Related Transfer Functions Based on Bottleneck Deep Neural Network
Xiaoke Qi, Jianhua Tao

Laryngeal Articulation During Trumpet Performance: An Exploratory Study
Luis M.T. Jesus, Bruno Rocha, Andreia Hall

Matrix of Polynomials Model Based Polynomial Dictionary Learning Method for Acoustic Impulse Response Modeling
Jian Guan, Xuan Wang, Pengming Feng, Jing Dong, Wenwu Wang

Acoustic Scene Classification Using a CNN-SuperVector System Trained with Auditory and Spectrogram Image Features
Rakib Hyder, Shabnam Ghaffarzadegan, Zhe Feng, John H.L. Hansen, Taufiq Hasan

An Environmental Feature Representation for Robust Speech Recognition and for Environment Identification
Xue Feng, Brigitte Richardson, Scott Amman, James Glass

Attention and Localization Based on a Deep Convolutional Recurrent Model for Weakly Supervised Audio Tagging
Yong Xu, Qiuqiang Kong, Qiang Huang, Wenwu Wang, Mark D. Plumbley

An Audio Based Piano Performance Evaluation Method Using Deep Neural Network Based Acoustic Modeling
Jing Pan, Ming Li, Zhanmei Song, Xin Li, Xiaolin Liu, Hua Yi, Manman Zhu

Music Tempo Estimation Using Sub-Band Synchrony
Shreyan Chowdhury, Tanaya Guha, Rajesh M. Hegde

A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification
Yun Wang, Florian Metze

A Note Based Query By Humming System Using Convolutional Neural Network
Naziba Mostafa, Pascale Fung

Unsupervised Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Environmental Sound Classification
Hardik B. Sailor, Dharmesh M. Agrawal, Hemant A. Patil

Novel Shifted Real Spectrum for Exact Signal Reconstruction
Meet H. Soni, Rishabh Tak, Hemant A. Patil

Disorders Related to Speech and Language

Manual and Automatic Transcriptions in Dementia Detection from Speech
Jochen Weiner, Mathis Engelbart, Tanja Schultz

An Affect Prediction Approach Through Depression Severity Parameter Incorporation in Neural Networks
Rahul Gupta, Saurabh Sahu, Carol Espy-Wilson, Shrikanth S. Narayanan

Cross-Database Models for the Classification of Dysarthria Presence
Stephanie Gillespie, Yash-Yee Logan, Elliot Moore, Jacqueline Laures-Gore, Scott Russell, Rupal Patel

Acoustic Evaluation of Nasality in Cerebellar Syndromes
M. Novotný, Jan Rusz, K. Spálenka, Jiří Klempíř, D. Horáková, Evžen Růžička

Emotional Speech of Mentally and Physically Disabled Individuals: Introducing the EmotAsS Database and First Findings
Simone Hantke, Hesam Sagha, Nicholas Cummins, Björn Schuller

Phonological Markers of Oxytocin and MDMA Ingestion
Carla Agurto, Raquel Norel, Rachel Ostrand, Gillinder Bedi, Harriet de Wit, Matthew J. Baggott, Matthew G. Kirkpatrick, Margaret Wardle, Guillermo A. Cecchi

An Avatar-Based System for Identifying Individuals Likely to Develop Dementia
Bahman Mirheidari, Daniel Blackburn, Kirsty Harkness, Traci Walker, Annalena Venneri, Markus Reuber, Heidi Christensen

Cross-Domain Classification of Drowsiness in Speech: The Case of Alcohol Intoxication and Sleep Deprivation
Yue Zhang, Felix Weninger, Björn Schuller

Depression Detection Using Automatic Transcriptions of De-Identified Speech
Paula Lopez-Otero, Laura Docio-Fernandez, Alberto Abad, Carmen Garcia-Mateo

An N-Gram Based Approach to the Automatic Diagnosis of Alzheimer’s Disease from Spoken Language
Sebastian Wankerl, Elmar Nöth, Stefan Evert

Exploiting Intra-Annotator Rating Consistency Through Copeland’s Method for Estimation of Ground Truth Labels in Couples’ Therapy
Karel Mundnich, Md. Nasir, Panayiotis Georgiou, Shrikanth S. Narayanan

Rhythmic Characteristics of Parkinsonian Speech: A Study on Mandarin and Polish
Massimo Pettorino, Wentao Gu, Paweł Półrola, Ping Fan


Prosody

Trisyllabic Tone 3 Sandhi Patterns in Mandarin Produced by Cantonese Speakers
Jung-Yueh Tu, Janice Wing-Sze Wong, Jih-Ho Cha

Intonation of Contrastive Topic in Estonian
Heete Sahkai, Meelis Mihkla

Reanalyze Fundamental Frequency Peak Delay in Mandarin
Lixia Hao, Wei Zhang, Yanlu Xie, Jinsong Zhang

How Does the Absence of Shared Knowledge Between Interlocutors Affect the Production of French Prosodic Forms?
Amandine Michelas, Cecile Cau, Maud Champagne-Lavau

Three Dimensions of Sentence Prosody and Their (Non-)Interactions
Michael Wagner, Michael McAuliffe

Using Prosody to Classify Discourse Relations
Janine Kleinhans, Mireia Farrús, Agustín Gravano, Juan Manuel Pérez, Catherine Lai, Leo Wanner

Canonical Correlation Analysis and Prediction of Perceived Rhythmic Prominences and Pitch Tones in Speech
Elizabeth Godoy, James R. Williamson, Thomas F. Quatieri

Evaluation of Spectral Tilt Measures for Sentence Prominence Under Different Noise Conditions
Sofoklis Kakouros, Okko Räsänen, Paavo Alku

Creaky Voice as a Function of Tonal Categories and Prosodic Boundaries
Jianjing Kuang

The Acoustics of Word Stress in Czech as a Function of Speaking Style
Radek Skarnitzl, Anders Eriksson

What You See is What You Get Prosodically Less — Visibility Shapes Prosodic Prominence Production in Spontaneous Interaction
Petra Wagner, Nataliya Bryhadyr

Focus Acoustics in Mandarin Nominals
Yu-Yin Hsu, Anqi Xu

Exploring Multidimensionality: Acoustic and Articulatory Correlates of Swedish Word Accents
Malin Svensson Lundmark, Gilbert Ambrazaitis, Otto Ewald

The Perception of English Intonation Patterns by German L2 Speakers of English
Karin Puga, Robert Fuchs, Jane Setter, Peggy Mok

Speaker States and Traits

The Perception of Emotions in Noisified Nonsense Speech
Emilia Parada-Cabaleiro, Alice Baird, Anton Batliner, Nicholas Cummins, Simone Hantke, Björn Schuller

Attention Networks for Modeling Behaviors in Addiction Counseling
James Gibson, Doğan Can, Panayiotis Georgiou, David C. Atkins, Shrikanth S. Narayanan

Computational Analysis of Acoustic Descriptors in Psychotic Patients
Torsten Wörtwein, Tadas Baltrušaitis, Eugene Laksana, Luciana Pennant, Elizabeth S. Liebson, Dost Öngür, Justin T. Baker, Louis-Philippe Morency

Modeling Perceivers Neural-Responses Using Lobe-Dependent Convolutional Neural Network to Improve Speech Emotion Recognition
Ya-Tse Wu, Hsuan-Yu Chen, Yu-Hsien Liao, Li-Wei Kuo, Chi-Chun Lee

Implementing Gender-Dependent Vowel-Level Analysis for Boosting Speech-Based Depression Recognition
Bogdan Vlasenko, Hesam Sagha, Nicholas Cummins, Björn Schuller

Bilingual Word Embeddings for Cross-Lingual Personality Recognition Using Convolutional Neural Nets
Farhad Bin Siddique, Pascale Fung

Emotion Category Mapping to Emotional Space by Cross-Corpus Emotion Labeling
Yoshiko Arimoto, Hiroki Mori

Big Five vs. Prosodic Features as Cues to Detect Abnormality in SSPNET-Personality Corpus
Cedric Fayet, Arnaud Delhay, Damien Lolive, Pierre-François Marteau

Speech Rate Comparison When Talking to a System and Talking to a Human: A Study from a Speech-to-Speech, Machine Translation Mediated Map Task
Hayakawa Akira, Carl Vogel, Saturnino Luz, Nick Campbell

Approaching Human Performance in Behavior Estimation in Couples Therapy Using Deep Sentence Embeddings
Shao-Yen Tseng, Brian Baucom, Panayiotis Georgiou

Complexity in Speech and its Relation to Emotional Bond in Therapist-Patient Interactions During Suicide Risk Assessment Interviews
Md. Nasir, Brian Baucom, Craig J. Bryan, Shrikanth S. Narayanan, Panayiotis Georgiou

An Investigation of Emotion Dynamics and Kalman Filtering for Speech-Based Emotion Prediction
Zhaocheng Huang, Julien Epps

Language Understanding and Generation

Zero-Shot Learning for Natural Language Understanding Using Domain-Independent Sequential Structure and Question Types
Kugatsu Sadamitsu, Yukinori Homma, Ryuichiro Higashinaka, Yoshihiro Matsuo

Parallel Hierarchical Attention Networks with Shared Memory Reader for Multi-Stream Conversational Document Classification
Naoki Sawada, Ryo Masumura, Hiromitsu Nishizaki

Internal Memory Gate for Recurrent Neural Networks with Application to Spoken Language Understanding
Mohamed Morchid

Character-Based Embedding Models and Reranking Strategies for Understanding Natural Language Meal Descriptions
Mandy Korpusik, Zachary Collins, James Glass

Quaternion Denoising Encoder-Decoder for Theme Identification of Telephone Conversations
Titouan Parcollet, Mohamed Morchid, Georges Linarès

ASR Error Management for Improving Spoken Language Understanding
Edwin Simonnet, Sahar Ghannay, Nathalie Camelin, Yannick Estève, Renato De Mori

Jointly Trained Sequential Labeling and Classification by Sparse Attention Neural Networks
Mingbo Ma, Kai Zhao, Liang Huang, Bing Xiang, Bowen Zhou

To Plan or not to Plan? Discourse Planning in Slot-Value Informed Sequence to Sequence Models for Language Generation
Neha Nayak, Dilek Hakkani-Tür, Marilyn Walker, Larry Heck

Online Adaptation of an Attention-Based Neural Network for Natural Language Generation
Matthieu Riou, Bassam Jabaian, Stéphane Huet, Fabrice Lefèvre

Spanish Sign Language Recognition with Different Topology Hidden Markov Models
Carlos-D. Martínez-Hinarejos, Zuzanna Parcheta

OpenMM: An Open-Source Multimodal Feature Extraction Tool
Michelle Renee Morales, Stefan Scherer, Rivka Levitan

Speaker Dependency Analysis, Audiovisual Fusion Cues and a Multimodal BLSTM for Conversational Engagement Recognition
Yuyun Huang, Emer Gilmartin, Nick Campbell

Voice Conversion 2

Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks
Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao, Hsin-Min Wang

CAB: An Energy-Based Speaker Clustering Model for Rapid Adaptation in Non-Parallel Voice Conversion
Toru Nakashika

Phoneme-Discriminative Features for Dysarthric Speech Conversion
Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki

Denoising Recurrent Neural Network for Deep Bidirectional LSTM Based Voice Conversion
Jie Wu, D.-Y. Huang, Lei Xie, Haizhou Li

Speaker Dependent Approach for Enhancing a Glossectomy Patient’s Speech via GMM-Based Voice Conversion
Kei Tanaka, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi

Generative Adversarial Network-Based Postfilter for STFT Spectrograms
Takuhiro Kaneko, Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi

Generative Adversarial Network-Based Glottal Waveform Model for Statistical Parametric Speech Synthesis
Bajibabu Bollepalli, Lauri Juvela, Paavo Alku

Emotional Voice Conversion with Adaptive Scales F0 Based on Wavelet Transform Using Limited Amount of Emotional Data
Zhaojie Luo, Jinhui Chen, Tetsuya Takiguchi, Yasuo Ariki

Speaker Adaptation in DNN-Based Speech Synthesis Using d-Vectors
Rama Doddipatla, Norbert Braunschweiler, Ranniery Maia

Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion
Runnan Li, Zhiyong Wu, Yishuang Ning, Lifa Sun, Helen Meng, Lianhong Cai

Segment Level Voice Conversion with Recurrent Neural Networks
Miguel Varela Ramos, Alan W. Black, Ramon Fernandez Astudillo, Isabel Trancoso, Nuno Fonseca

Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 1

The INTERSPEECH 2017 Computational Paralinguistics Challenge: Addressee, Cold & Snoring
Björn Schuller, Stefan Steidl, Anton Batliner, Elika Bergelson, Jarek Krajewski, Christoph Janott, Andrei Amatuni, Marisa Casillas, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont, Guillermo Hidalgo, Sebastian Schnieder, Clemens Heiser, Winfried Hohenhorst, Michael Herzog, Maximilian Schmitt, Kun Qian, Yue Zhang, George Trigeorgis, Panagiotis Tzirakis, Stefanos Zafeiriou

Description of the Upper Respiratory Tract Infection Corpus (URTIC)
Jarek Krajewski, Sebastian Schnieder, Anton Batliner

Description of the Munich-Passau Snore Sound Corpus (MPSSC)
Christoph Janott, Anton Batliner

Description of the Homebank Child/Adult Addressee Corpus (HB-CHAAC)
Elika Bergelson, Andrei Amatuni, Marisa Casillas, Amanda Seidl, Melanie Soderstrom, Anne S. Warlaumont

It Sounds Like You Have a Cold! Testing Voice Features for the Interspeech 2017 Computational Paralinguistics Cold Challenge
Mark Huckvale, András Beke

End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum
Danwei Cai, Zhidong Ni, Wenbo Liu, Weicheng Cai, Gang Li, Ming Li

Infected Phonemes: How a Cold Impairs Speech on a Phonetic Level
Johannes Wagner, Thiago Fraga-Silva, Yvan Josse, Dominik Schiller, Andreas Seiderer, Elisabeth André

Phoneme State Posteriorgram Features for Speech Based Automatic Classification of Speakers in Cold and Healthy Condition
Akshay Kalkunte Suresh, Srinivasa Raghavan K.M., Prasanta Kumar Ghosh

An Integrated Solution for Snoring Sound Classification Using Bhattacharyya Distance Based GMM Supervectors with SVM, Feature Selection with Random Forest and Spectrogram with CNN
Tin Lay Nwe, Huy Dat Tran, Wen Zheng Terence Ng, Bin Ma

Styles, Varieties, Forensics and Tools

The Effects of Real and Placebo Alcohol on Deaffrication
Urban Zihlmann

Polyglot and Speech Corpus Tools: A System for Representing, Integrating, and Querying Speech Corpora
Michael McAuliffe, Elias Stengel-Eskin, Michaela Socolof, Morgan Sonderegger

Mapping Across Feature Spaces in Forensic Voice Comparison: The Contribution of Auditory-Based Voice Quality to (Semi-)Automatic System Testing
Vincent Hughes, Philip Harrison, Paul Foulkes, Peter French, Colleen Kavanagh, Eugenia San Segundo

Effect of Language, Speaking Style and Speaker on Long-Term F0 Estimation
Pablo Arantes, Anders Eriksson, Suska Gutzeit

Stability of Prosodic Characteristics Across Age and Gender Groups
Jan Volín, Tereza Tykalová, Tomáš Bořil

Electrophysiological Correlates of Familiar Voice Recognition
Julien Plante-Hébert, Victor J. Boucher, Boutheina Jemel

Developing an Embosi (Bantu C25) Speech Variant Dictionary to Model Vowel Elision and Morpheme Deletion
Jamison Cooper-Leavitt, Lori Lamel, Annie Rialland, Martine Adda-Decker, Gilles Adda

Rd as a Control Parameter to Explore Affective Correlates of the Tense-Lax Continuum
Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl

Cross-Linguistic Distinctions Between Professional and Non-Professional Speaking Styles
Plínio A. Barbosa, Sandra Madureira, Philippe Boula de Mareüil

Perception and Production of Word-Final /ʁ/ in French
Cedric Gendrot

Glottal Source Estimation from Coded Telephone Speech Using a Deep Neural Network
N.P. Narendra, Manu Airaksinen, Paavo Alku

Automatic Labelling of Prosodic Prominence, Phrasing and Disfluencies in French Speech by Simulating the Perception of Naïve and Expert Listeners
George Christodoulides, Mathieu Avanzi, Anne Catherine Simon

Don’t Count on ASR to Transcribe for You: Breaking Bias with Two Crowds
Michael Levit, Yan Huang, Shuangyu Chang, Yifan Gong

Effects of Training Data Variety in Generating Glottal Pulses from Acoustic Features with DNNs
Manu Airaksinen, Paavo Alku

Towards Intelligent Crowdsourcing for Audio Data Annotation: Integrating Active Learning in the Real World
Simone Hantke, Zixing Zhang, Björn Schuller

Speech Synthesis: Data, Evaluation, and Novel Paradigms

Principles for Learning Controllable TTS from Annotated and Latent Variation
Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Junichi Yamagishi

Sampling-Based Speech Parameter Generation Using Moment-Matching Networks
Shinnosuke Takamichi, Tomoki Koriyama, Hiroshi Saruwatari

Unit Selection with Hierarchical Cascaded Long Short Term Memory Bidirectional Recurrent Neural Nets
Vincent Pollet, Enrico Zovato, Sufian Irhimeh, Pier Batzu

Utterance Selection for Optimizing Intelligibility of TTS Voices Trained on ASR Data
Erica Cooper, Xinyue Wang, Alison Chang, Yocheved Levitan, Julia Hirschberg

Bias and Statistical Significance in Evaluating Speech Synthesis with Mean Opinion Scores
Andrew Rosenberg, Bhuvana Ramabhadran

Phase Modeling Using Integrated Linear Prediction Residual for Statistical Parametric Speech Synthesis
Nagaraj Adiga, S.R. Mahadeva Prasanna

Evaluation of a Silent Speech Interface Based on Magnetic Sensing and Deep Learning for a Phonetically Rich Vocabulary
Jose A. Gonzalez, Lam A. Cheah, Phil D. Green, James M. Gilbert, Stephen R. Ell, Roger K. Moore, Ed Holdsworth

Predicting Head Pose from Speech with a Conditional Variational Autoencoder
David Greenwood, Stephen Laycock, Iain Matthews

Real-Time Reactive Speech Synthesis: Incorporating Interruptions
Mirjam Wester, David A. Braude, Blaise Potard, Matthew P. Aylett, Francesca Shaw

A Neural Parametric Singing Synthesizer
Merlijn Blaauw, Jordi Bonada

Tacotron: Towards End-to-End Speech Synthesis
Yuxuan Wang, R.J. Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, Rif A. Saurous

Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System
Tim Capes, Paul Coles, Alistair Conkie, Ladan Golipour, Abie Hadjitarkhani, Qiong Hu, Nancy Huddleston, Melvyn Hunt, Jiangchuan Li, Matthias Neeracher, Kishore Prahallad, Tuomo Raitio, Ramya Rasipuram, Greg Townsend, Becci Williamson, David Winarsky, Zhizheng Wu, Hepeng Zhang

An Expanded Taxonomy of Semiotic Classes for Text Normalization
Daan van Esch, Richard Sproat

Complex-Valued Restricted Boltzmann Machine for Direct Learning of Frequency Spectra
Toru Nakashika, Shinji Takaki, Junichi Yamagishi


ISCA Medal 2017 Ceremony

Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 1

Special Session: Speech Technology for Code-Switching in Multilingual Communities

Special Session: Interspeech 2017 Automatic Speaker Verification Spoofing and Countermeasures Challenge 2

Conversational Telephone Speech Recognition

Multimodal Paralinguistics

Dereverberation, Echo Cancellation and Speech

Acoustic and Articulatory Phonetics

Multimodal and Articulatory Synthesis

Neural Networks for Language Modeling

Pathological Speech and Language

Speech Analysis and Representation 1

Perception of Dialects and L2

Far-field Speech Recognition

Speech Analysis and Representation 2

Speech and Audio Segmentation and Classification 2

Search, Computational Strategies and Language Modeling

Speech Perception

Speech Production and Perception

Multi-lingual Models and Adaptation for ASR

Prosody and Text Processing

Show & Tell 1

Show & Tell 2

Keynote 1: James Allen

Special Session: Speech and Human-Robot Interaction

Special Session: Incremental Processing and Responsive Behaviour

Special Session: Acoustic Manifestations of Social Characteristics

Neural Network Acoustic Models for ASR 1

Models of Speech Production

Speaker Recognition

Phonation and Voice Quality

Speech Synthesis Prosody

Emotion Recognition

WaveNet and Novel Paradigms

Models of Speech Perception

Source Separation and Auditory Scene Analysis

Prosody: Tone and Intonation

Emotion Modeling

Voice Conversion 1

Neural Network Acoustic Models for ASR 2

Speaker Recognition Evaluation

Glottal Source Modeling

Prosody: Rhythm, Stress, Quantity and Phrasing

Speech Recognition for Language Learning

Stance, Credibility, and Deception

Short Utterances Speaker Recognition

Speaker Characterization and Recognition

Acoustic Models for ASR 1

Acoustic Models for ASR 2

Dialog Modeling

L1 and L2 Acquisition

Voice, Speech and Hearing Disorders

Source Separation and Voice Activity Detection


Show & Tell 3

Show & Tell 4

Keynote 2: Catherine Pelachaud

Special Session: Digital Revolution for Under-resourced Languages 1

Special Session: Data Collection, Transcription and Annotation Issues in Child Language Acquisition

Special Session: Digital Revolution for Under-resourced Languages 2

Special Session: Computational Models in Child Language Acquisition

Special Session: Voice Attractiveness

Speech Production and Physiology

Speech and Harmonic Analysis

Dialog and Prosody

Social Signals, Styles, and Interaction

Acoustic Model Adaptation

Cognition and Brain Studies

Noise Robust Speech Recognition

Topic Spotting, Entity Extraction and Semantic Analysis

Dialog Systems

Lexical and Pronunciation Modeling

Language Recognition

Speaker Database and Anti-spoofing

Speech Translation

Multi-channel Speech Enhancement

Speech Recognition: Applications in Medical Practice

Language Models for ASR

Speech Recognition: Technologies for New Applications and Paradigms

Speaker and Language Recognition Applications

Spoken Document Processing

Speech Intelligibility

Articulatory and Acoustic Phonetics

Music and Audio Processing

Disorders Related to Speech and Language


Prosody

Speaker States and Traits

Language Understanding and Generation

Voice Conversion 2

Show & Tell 5

Show & Tell 6

Keynote 3: Björn Lindblom

Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 1

Special Session: State of the Art in Physics-based Voice Simulation

Special Session: Interspeech 2017 Computational Paralinguistics ChallengE (ComParE) 2

Discriminative Training for ASR

Speaker Diarization

Spoken Term Detection

Noise Reduction

Speech Recognition: Multimodal Systems

Neural Network Acoustic Models for ASR 3

Robust Speaker Recognition

Multimodal Resources and Annotation

Forensic Phonetics and Sociophonetic Varieties

Speech and Audio Segmentation and Classification 1

Noise Robust and Far-field ASR

Styles, Varieties, Forensics and Tools

Speech Synthesis: Data, Evaluation, and Novel Paradigms

Show & Tell 7