
Interspeech 2018

Hyderabad, India
2-6 September 2018

Chair: B. Yegnanarayana
doi: 10.21437/Interspeech.2018





The INTERSPEECH 2018 Computational Paralinguistics ChallengE (ComParE): Atypical & Self-Assessed Affect, Crying & Heart Beats 1


The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis, Stefanos Zafeiriou

An Ensemble of Transfer, Semi-supervised and Supervised Learning Methods for Pathological Heart Sound Classification
Ahmed Imtiaz Humayun, Md. Tauhiduzzaman Khan, Shabnam Ghaffarzadegan, Zhe Feng, Taufiq Hasan

Monitoring Infant's Emotional Cry in Domestic Environments Using the Capsule Network Architecture
Mehmet Ali Tuğtekin Turan, Engin Erzin

Neural Network Architecture That Combines Temporal and Summative Features for Infant Cry Classification in the Interspeech 2018 Computational Paralinguistics Challenge
Mark Huckvale

Evolving Learning for Analysing Mood-Related Infant Vocalisation
Zixing Zhang, Jing Han, Kun Qian, Björn Schuller

Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant?
Johannes Wagner, Dominik Schiller, Andreas Seiderer, Elisabeth André

Investigation on Joint Representation Learning for Robust Feature Extraction in Speech Emotion Recognition
Danqing Luo, Yuexian Zou, Dongyan Huang

Using Voice Quality Supervectors for Affect Identification
Soo Jin Park, Amber Afshan, Zhi Ming Chua, Abeer Alwan

An End-to-End Deep Learning Framework for Speech Emotion Recognition of Atypical Individuals
Dengke Tang, Junlin Zeng, Ming Li



Speech Segments and Voice Quality


Discriminating Nasals and Approximants in English Language Using Zero Time Windowing
RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty, Bayya Yegnanarayana

Gestural Lenition of Rhotics Captures Variation in Brazilian Portuguese
Phil Howson, Alexei Kochetov

Identification and Classification of Fricatives in Speech Using Zero Time Windowing Method
RaviShankar Prasad, Bayya Yegnanarayana

GlobalTIMIT: Acoustic-Phonetic Datasets for the World’s Languages
Nattanun Chanchaochai, Christopher Cieri, Japhet Debrah, Hongwei Ding, Yue Jiang, Sishi Liao, Mark Liberman, Jonathan Wright, Jiahong Yuan, Juhong Zhan, Yuqing Zhan

Structural Effects on Properties of Consonantal Gestures in Tashlhiyt
Anne Hermes, Doris Mücke, Bastian Auris, Rachid Ridouane

The Retroflex-dental Contrast in Punjabi Stops and Nasals: A Principal Component Analysis of Ultrasound Images
Alexei Kochetov, Matthew Faytak, Kiranpreet Nara

Vowels and Diphthongs in Hangzhou Wu Chinese Dialect
Yang Yue, Fang Hu

Resyllabification in Indian Languages and Its Implications in Text-to-speech Systems
Mahesh M, Jeena J Prakash, Hema Murthy

Voice Source Contribution to Prominence Perception: Rd Implementation
Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl

On the Relationship between Glottal Pulse Shape and Its Spectrum: Correlations of Open Quotient, Pulse Skew and Peak Flow with Source Harmonic Amplitudes
Christer Gobl, Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide

The Individual and the System: Assessing the Stability of the Output of a Semi-automatic Forensic Voice Comparison System
Vincent Hughes, Philip Harrison, Paul Foulkes, Peter French, Colleen Kavanagh, Eugenia San Segundo Fernández

Breathy to Tense Voice Discrimination using Zero-Time Windowing Cepstral Coefficients (ZTWCCs)
Sudarsana Reddy Kadiri, Bayya Yegnanarayana

Analysis of Breathiness in Contextual Vowel of Voiceless Nasals in Mizo
Pamir Gogoi, Sishir Kalita, Parismita Gogoi, Ratree Wayland, Priyankoo Sarmah, S R Mahadeva Prasanna


Speaker State and Trait


Infant Emotional Outbursts Detection in Infant-parent Spoken Interactions
Yijia Xu, Mark Hasegawa-Johnson, Nancy McElwain

Deep Neural Networks for Emotion Recognition Combining Audio and Transcripts
Jaejin Cho, Raghavendra Pappagari, Purva Kulkarni, Jesús Villalba, Yishay Carmiel, Najim Dehak

Preference-Learning with Qualitative Agreement for Sentence Level Emotional Annotations
Srinivas Parthasarathy, Carlos Busso

Transfer Learning for Improving Speech Emotion Classification Accuracy
Siddique Latif, Rajib Rana, Shahzad Younis, Junaid Qadir, Julien Epps

What Do Classifiers Actually Learn? A Case Study on Emotion Recognition Datasets
Patrick Meyer, Eric Buschermöhle, Tim Fingscheidt

State of Mind: Classification through Self-reported Affect and Word Use in Speech
Eva-Maria Rathner, Yannik Terhorst, Nicholas Cummins, Björn Schuller, Harald Baumeister

Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition
Ziping Zhao, Yu Zheng, Zixing Zhang, Haishuai Wang, Yiqin Zhao, Chao Li

End-to-end Deep Neural Network Age Estimation
Pegah Ghahremani, Phani Sankar Nidadavolu, Nanxin Chen, Jesús Villalba, Daniel Povey, Sanjeev Khudanpur, Najim Dehak

Improving Gender Identification in Movie Audio Using Cross-Domain Data
Rajat Hebbar, Krishna Somandepalli, Shrikanth Narayanan

On Learning to Identify Genders from Raw Speech Signal Using CNNs
Selen Hande Kabil, Hannah Muckenhirn, Mathew Magimai.-Doss

Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech
Jilt Sebastian, Manoj Kumar, D. S. Pavan Kumar, Mathew Magimai.-Doss, Hema Murthy, Shrikanth Narayanan

The Effect of Exposure to High Altitude and Heat on Speech Articulatory Coordination
James Williamson, Thomas Quatieri, Adam Lammert, Katherine Mitchell, Katherine Finkelstein, Nicole Ekon, Caitlin Dillon, Robert Kenefick, Kristin Heaton


Deep Learning for Source Separation and Pitch Tracking


Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation
Lianwu Chen, Meng Yu, Yanmin Qian, Dan Su, Dong Yu

Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures
Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian, Dong Yu

Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network
Weipeng He, Petr Motlicek, Jean-Marc Odobez

Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method
Shuai Yang, Zhiyong Wu, Binbin Shen, Helen Meng

Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks
Zhong-Qiu Wang, Xueliang Zhang, DeLiang Wang

Waveform to Single Sinusoid Regression to Estimate the F0 Contour from Noisy Speech Using Recurrent Deep Neural Networks
Akihiro Kato, Tomi Kinnunen

Reducing Interference with Phase Recovery in DNN-based Monaural Singing Voice Separation
Paul Magron, Konstantinos Drossos, Stylianos Ioannis Mimilakis, Tuomas Virtanen

Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors
Kanru Hua

Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network
Yi Luo, Nima Mesgarani

Music Source Activity Detection and Separation Using Deep Attractor Network
Rajath Kumar, Yi Luo, Nima Mesgarani

Improving Mandarin Tone Recognition Using Convolutional Bidirectional Long Short-Term Memory with Attention
Longfei Yang, Yanlu Xie, Jinsong Zhang


Spoken Dialogue Systems and Conversational Analysis


Joint Learning of Domain Classification and Out-of-Domain Detection with Dynamic Class Weighting for Satisficing False Acceptance Rates
Joo-Kyung Kim, Young-Bum Kim

Analyzing Vocal Tract Movements During Speech Accommodation
Sankar Mukherjee, Thierry Legou, Leonardo Lancia, Pauline Hilt, Alice Tomassini, Luciano Fadiga, Alessandro D'Ausilio, Leonardo Badino, Noël Nguyen

Cross-Lingual Multi-Task Neural Architecture for Spoken Language Understanding
Yujiang Li, Xuemin Zhao, Weiqun Xu, Yonghong Yan

Statistical Model Compression for Small-Footprint Natural Language Understanding
Grant P. Strimel, Kanthashree Mysore Sathyendra, Stanislav Peshterliev

Comparison of an End-to-end Trainable Dialogue System with a Modular Statistical Dialogue System
Norbert Braunschweiler, Alexandros Papangelis

A Discriminative Acoustic-Prosodic Approach for Measuring Local Entrainment
Megan Willi, Stephanie A. Borrie, Tyson S. Barrett, Ming Tu, Visar Berisha

Investigating Speech Features for Continuous Turn-Taking Prediction Using LSTMs
Matthew Roddy, Gabriel Skantze, Naomi Harte

Classification of Correction Turns in Multilingual Dialogue Corpus
Ivan Kraljevski, Diane Hirschfeld

Contextual Slot Carryover for Disparate Schemas
Chetan Naik, Arpit Gupta, Hancheng Ge, Mathias Lambert, Ruhi Sarikaya

Capsule Networks for Low Resource Spoken Language Understanding
Vincent Renkens, Hugo van Hamme

Intent Discovery Through Unsupervised Semantic Text Clustering
A Padmasundari, Srinivas Bangalore

Multimodal Polynomial Fusion for Detecting Driver Distraction
Yulun Du, Alan W Black, Louis-Philippe Morency, Maxine Eskenazi

Engagement Recognition in Spoken Dialogue via Neural Network by Aggregating Different Annotators' Models
Koji Inoue, Divesh Lala, Katsuya Takanashi, Tatsuya Kawahara

A First Investigation of the Timing of Turn-taking in Ruuli
Tuarik Buanzur, Margaret Zellers, Saudah Namyalo, Alena Witzlack-Makarevich


Spoofing Detection


Spoofing Detection Using Adaptive Weighting Framework and Clustering Analysis
Yuanjun Zhao, Roberto Togneri, Victor Sreeram

Exploration of Compressed ILPR Features for Replay Attack Detection
Sarfaraz Jelil, Sishir Kalita, S R Mahadeva Prasanna, Rohit Sinha

Detection of Replay-Spoofing Attacks Using Frequency Modulation Features
Tharshini Gunendradasan, Buddhi Wickramasinghe, Ngoc Phu Le, Eliathamby Ambikairajah, Julien Epps

Effectiveness of Speech Demodulation-Based Features for Replay Detection
Madhu Kamble, Hemlata Tak, Hemant Patil

Novel Variable Length Energy Separation Algorithm Using Instantaneous Amplitude Features for Replay Detection
Madhu Kamble, Hemant Patil

Feature with Complementarity of Statistics and Principal Information for Spoofing Detection
Jichen Yang, Changhuai You, Qianhua He

Multiple Phase Information Combination for Replay Attacks Detection
Dongbo Li, Longbiao Wang, Jianwu Dang, Meng Liu, Zeyan Oo, Seiichi Nakagawa, Haotian Guan, Xiangang Li

Frequency Domain Linear Prediction Features for Replay Spoofing Attack Detection
Buddhi Wickramasinghe, Saad Irtza, Eliathamby Ambikairajah, Julien Epps

Auditory Filterbank Learning for Temporal Modulation Features in Replay Spoof Speech Detection
Hardik Sailor, Madhu Kamble, Hemant Patil

Deep Siamese Architecture Based Replay Detection for Secure Voice Biometric
Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah

A Deep Identity Representation for Noise Robust Spoofing Detection
Alejandro Gómez Alanís, Antonio M. Peinado, Jose A. Gonzalez, Angel Gomez

End-To-End Audio Replay Attack Detection Using Deep Convolutional Networks with Attention
Francis Tom, Mohit Jain, Prasenjit Dey

Decision-level Feature Switching as a Paradigm for Replay Attack Detection
Saranya M S, Hema Murthy

Modulation Dynamic Features for the Detection of Replay Attacks
Gajan Suthokumar, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah


Speech Analysis and Representation


On the Usefulness of the Speech Phase Spectrum for Pitch Extraction
Erfan Loweimi, Jon Barker, Thomas Hain

Time-regularized Linear Prediction for Noise-robust Extraction of the Spectral Envelope of Speech
Manu Airaksinen, Lauri Juvela, Okko Räsänen, Paavo Alku

Auditory Filterbank Learning Using ConvRBM for Infant Cry Classification
Hardik B. Sailor, Hemant Patil

Effectiveness of Dynamic Features in INCA and Temporal Context-INCA
Nirmesh Shah, Hemant Patil

Singing Voice Phoneme Segmentation by Hierarchically Inferring Syllable and Phoneme Onset Positions
Rong Gong, Xavier Serra

Novel Empirical Mode Decomposition Cepstral Features for Replay Spoof Detection
Prasad Tapkir, Hemant Patil

Novel Linear Frequency Residual Cepstral Features for Replay Attack Detection
Hemlata Tak, Hemant Patil

Analysis of Sparse Representation Based Feature on Speech Mode Classification
Kumud Tripathi, K. Sreenivasa Rao

Multicomponent 2-D AM-FM Modeling of Speech Spectrograms
Jitendra Kumar Dhiman, Neeraj Sharma, Chandra Sekhar Seelamantula

An Optimization Framework for Recovery of Speech from Phase-Encoded Spectrograms
Abhilash Sainathan, Sunil Rudresh, Chandra Sekhar Seelamantula

Speaker Recognition with Nonlinear Distortion: Clipping Analysis and Impact
Wei Xia, John H.L. Hansen

Linear Prediction Residual based Short-term Cepstral Features for Replay Attacks Detection
Madhusudan Singh, Debadatta Pati

Analysis of Variational Mode Functions for Robust Detection of Vowels
Surbhi Sakshi, Avinash Kumar, Gayadhar Pradhan


Sequence Models for ASR


Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition
Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu

Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition
Eugen Beck, Mirko Hannemann, Patrick Dötsch, Ralf Schlüter, Hermann Ney

Acoustic Modeling with DFSMN-CTC and Joint CTC-CE Learning
ShiLiang Zhang, Ming Lei

End-to-End Speech Command Recognition with Capsule Network
Jaesung Bae, Dae-Shik Kim

End-to-End Speech Recognition from the Raw Waveform
Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux

A Multistage Training Framework for Acoustic-to-Word Model
Chengzhu Yu, Chunlei Zhang, Chao Weng, Jia Cui, Dong Yu

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese
Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu

Densely Connected Networks for Conversational Speech Recognition
Kyu Han, Akshay Chandrashekaran, Jungsuk Kim, Ian Lane

Multi-Head Decoder for End-to-End Speech Recognition
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda

Compressing End-to-end ASR Networks by Tensor-Train Decomposition
Takuma Mori, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech
Yu-An Chung, James Glass

Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin
Linhao Dong, Shiyu Zhou, Wei Chen, Bo Xu


Speaker Verification II


Voice Comparison and Rhythm: Behavioral Differences between Target and Non-target Comparisons
Moez Ajili, Jean-François Bonastre, Solange Rossato

Co-whitening of I-vectors for Short and Long Duration Speaker Verification
Longting Xu, Kong Aik Lee, Haizhou Li, Zhen Yang

Compensation for Domain Mismatch in Text-independent Speaker Recognition
Fahimeh Bahmaninezhad, John H.L. Hansen

Joint Learning of J-Vector Extractor and Joint Bayesian Model for Text Dependent Speaker Verification
Ziqiang Shi, Liu Liu, Huibin Lin, Rujie Liu

Latent Factor Analysis of Deep Bottleneck Features for Speaker Verification with Random Digit Strings
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu

VoxCeleb2: Deep Speaker Recognition
Joon Son Chung, Arsha Nagrani, Andrew Zisserman

Supervised I-vector Modeling - Theory and Applications
Shreyas Ramoji, Sriram Ganapathy

LOCUST - Longitudinal Corpus and Toolset for Speaker Verification
Evgeny Dmitriev, Yulia Kim, Anastasia Matveeva, Claude Montacié, Yannick Boulard, Yadviga Sinyavskaya, Yulia Zhukova, Adam Zarazinski, Egor Akhanov, Ilya Viksnin, Andrei Shlykov, Maria Usova

Analysis of Language Dependent Front-End for Speaker Recognition
Srikanth Madikeri, Subhadeep Dey, Petr Motlicek

Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings
Mahesh Kumar Nandwana, Julien van Hout, Mitchell McLaren, Allen Stauffer, Colleen Richey, Aaron Lawson, Martin Graciarena

Investigation on Bandwidth Extension for Speaker Recognition
Phani Sankar Nidadavolu, Cheng-I Lai, Jesús Villalba, Najim Dehak

On Learning Vocal Tract System Related Speaker Discriminative Information from Raw Signal Using CNNs
Hannah Muckenhirn, Mathew Magimai.-Doss, Sebastien Marcel

On Convolutional LSTM Modeling for Joint Wake-Word Detection and Text Dependent Speaker Verification
Rajath Kumar, Vaishnavi Yeruva, Sriram Ganapathy

Cosine Metric Learning for Speaker Verification in the I-vector Space
Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen

An Unsupervised Neural Prediction Framework for Learning Speaker Embeddings Using Recurrent Neural Networks
Arindam Jati, Panayiotis Georgiou


Novel Approaches to Enhancement


A New Framework for Supervised Speech Enhancement in the Time Domain
Ashutosh Pandey, DeLiang Wang

Speech Enhancement Using the Minimum-probability-of-error Criterion
Jishnu Sadasivan, Subhadip Mukherjee, Chandra Sekhar Seelamantula

Exploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics
Pavlos Papadopoulos, Colin Vaz, Shrikanth Narayanan

Using Shifted Real Spectrum Mask as Training Target for Supervised Speech Separation
Yun Liu, Hui Zhang, Xueliang Zhang

Enhancement of Noisy Speech Signal by Non-Local Means Estimation of Variational Mode Functions
Nagapuri Srinivas, Gayadhar Pradhan, Syed Shahnawazuddin

Phase-locked Loop (PLL) Based Phase Estimation in Single Channel Speech Enhancement
Priya Pallavi, Ch V Rama Rao

Cycle-Consistent Speech Enhancement
Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang (Fred) Juang

Visual Speech Enhancement
Aviv Gabbay, Asaph Shamir, Shmuel Peleg

Implementation of Digital Hearing Aid as a Smartphone Application
Saketh Sharma, Nitya Tiwari, Prem C. Pandey

Bone-Conduction Sensor Assisted Noise Estimation for Improved Speech Enhancement
Ching-Hua Lee, Bhaskar D. Rao, Harinath Garudadri

Artificial Bandwidth Extension with Memory Inclusion Using Semi-supervised Stacked Auto-encoders
Pramod Bachhav, Massimiliano Todisco, Nicholas Evans

Large Vocabulary Concatenative Resynthesis
Soumi Maiti, Joey Ching, Michael Mandel

Concatenative Resynthesis with Improved Training Signals for Speech Enhancement
Ali Raza Syed, Viet Anh Trinh, Michael Mandel


Syllabification, Rhythm, and Voice Activity Detection


Comparison of Syllabification Algorithms and Training Strategies for Robust Word Count Estimation across Different Languages and Recording Conditions
Okko Räsänen, Seshadri Shreyas, Marisa Casillas

A Comparison of Input Types to a Deep Neural Network-based Forced Aligner
Matthew C. Kelley, Benjamin V. Tucker

Joint Learning Using Denoising Variational Autoencoders for Voice Activity Detection
Youngmoon Jung, Younggwan Kim, Yeunju Choi, Hoirin Kim

Information Bottleneck Based Percussion Instrument Diarization System for Taniavartanam Segments of Carnatic Music Concerts
Nauman Dawalatabad, Jom Kuriakose, Chandra Sekhar Chellu, Hema Murthy

Robust Voice Activity Detection Using Frequency Domain Long-Term Differential Entropy
Debayan Ghosh, R Muralishankar, Sanjeev Gurugopinath

Device-directed Utterance Detection
Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas, Björn Hoffmeister

Acoustic-Prosodic Features of Tabla Bol Recitation and Correspondence with the Tabla Imitation
Rohit M A, Preeti Rao

Who Said That? A Comparative Study of Non-negative Matrix Factorization Techniques
Teun Krikke, Frank Broz, David Lane

AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson, Zhonghua Xi

Audiovisual Speech Activity Detection with Advanced Long Short-Term Memory
Fei Tao, Carlos Busso

Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI
Pramit Saha, Praneeth Srungarapu, Sidney Fels


Speech and Singing Production


FACTS: A Hierarchical Task-based Control Model of Speech Incorporating Sensory Feedback
Benjamin Parrell, Vikram Ramanarayanan, Srikantan Nagarajan, John Houde

Sensorimotor Response to Tongue Displacement Imagery by Talkers with Parkinson’s Disease
William Katz, Patrick Reidy, Divya Prabhakaran

Automatic Pronunciation Evaluation of Singing
Chitralekha Gupta, Haizhou Li, Ye Wang

Classification of Nonverbal Human Produced Audio Events: A Pilot Study
Rachel E. Bouserhal, Philippe Chabot, Milton Sarria-Paja, Patrick Cardinal, Jérémie Voix

UltraFit: A Speaker-friendly Headset for Ultrasound Recordings in Speech Science
Lorenzo Spreafico, Michael Pucher, Anna Matosova

Articulatory Consequences of Vocal Effort Elicitation Method
Elisabet Eir Cortes, Marcin Wlodarczak, Juraj Šimko

Age-related Effects on Sensorimotor Control of Speech Production
Anne Hermes, Jane Mertens, Doris Mücke

An Ultrasound Study of Gemination in Coronal Stops in Eastern Oromo
Maida Percival, Alexei Kochetov, Yoonjung Kang

Processing Transition Regions of Glottal Stop Substituted /S/ for Intelligibility Enhancement of Cleft Palate Speech
Protima Nomo Sudro, Sishir Kalita, S R Mahadeva Prasanna

Reconstructing Neutral Speech from Tracheoesophageal Speech
Abinay Reddy N, Achuth Rao MV, G. Nisha Meenakshi, Prasanta Kumar Ghosh

Automatic Evaluation of Soft Articulatory Contact for Stuttering Treatment
Keiko Ochi, Koichi Mori, Naomi Sakai

Korean Singing Voice Synthesis Based on an LSTM Recurrent Neural Network
Juntae Kim, Heejin Choi, Jinuk Park, Minsoo Hahn, Sangjin Kim, Jong-Jin Kim

The Trajectory of Voice Onset Time with Vocal Aging
Chen Xuanda, Xiong Ziyu, Hu Jian


Robust Speech Recognition


The Fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, Task and Baselines
Jon Barker, Shinji Watanabe, Emmanuel Vincent, Jan Trmal

Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson, Karl Ni

Building State-of-the-art Distant Speech Recognition Using the CHiME-4 Challenge with a Setup of Speech Enhancement Baseline
Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu, Shinji Watanabe

Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition
Wei-Ning Hsu, Hao Tang, James Glass

Investigating Generative Adversarial Networks Based Speech Dereverberation for Robust Speech Recognition
Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang, Lei Xie

Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks
Xuankai Chang, Yanmin Qian, Dong Yu

Weighting Time-Frequency Representation of Speech Using Auditory Saliency for Automatic Speech Recognition
Cong-Thanh Do, Yannis Stylianou

Acoustic Modeling from Frequency Domain Representations of Speech
Pegah Ghahremani, Hossein Hadian, Hang Lv, Daniel Povey, Sanjeev Khudanpur

Non-Uniform Spectral Smoothing for Robust Children's Speech Recognition
Ishwar Chandra Yadav, Avinash Kumar, Syed Shahnawazuddin, Gayadhar Pradhan

Bidirectional Long-Short Term Memory Network-based Estimation of Reliable Spectral Component Locations
Aaron Nicolson, Kuldip K. Paliwal

Speech Emotion Recognition by Combining Amplitude and Phase Information Using Convolutional Neural Network
Lili Guo, Longbiao Wang, Jianwu Dang, Linjuan Zhang, Haotian Guan, Xiangang Li

Bubble Cooperative Networks for Identifying Important Speech Cues
Viet Anh Trinh, Brian McFee, Michael I Mandel



Integrating Speech Science and Technology for Clinical Applications


Anomaly Detection Approach for Pronunciation Verification of Disordered Speech Using Speech Attribute Features
Mostafa Shahin, Beena Ahmed, Jim X. Ji, Kirrie Ballard

Effectiveness of Voice Quality Features in Detecting Depression
Amber Afshan, Jinxi Guo, Soo Jin Park, Vijay Ravi, Jonathan Flint, Abeer Alwan

Fusing Text-dependent Word-level i-Vector Models to Screen ‘at Risk’ Child Speech
Prasanna Kothalkar, Johanna Rudolph, Christine Dollaghan, Jennifer McGlothlin, Thomas Campbell, John H.L. Hansen

Testing Paradigms for Assistive Hearing Devices in Diverse Acoustic Environments
Ram Charan Chandra Shekar, Hussnain Ali, John H.L. Hansen

Detection of Dementia from Responses to Atypical Questions Asked by Embodied Conversational Agents
Tsuyoki Ujiro, Hiroki Tanaka, Hiroyoshi Adachi, Hiroaki Kazui, Manabu Ikeda, Takashi Kudo, Satoshi Nakamura

Acoustic Features Associated with Sustained Vowel and Continuous Speech Productions by Chinese Children with Functional Articulation Disorders
Wang Zhang, Xiangquan Gui, Tianqi Wang, Manwa Ng, Feng Yang, Lan Wang, Nan Yan

Estimation of Hypernasality Scores from Cleft Lip and Palate Speech
C M Vikram, Ayush Tripathi, Sishir Kalita, S R Mahadeva Prasanna

Detecting Alzheimer’s Disease Using Gated Convolutional Neural Network from Audio Data
Tifani Warnita, Nakamasa Inoue, Koichi Shinoda

Automatic Detection of Orofacial Impairment in Stroke
Andrea Bandini, Jordan Green, Brian Richburg, Yana Yunusova

Detecting Depression with Audio/Text Sequence Modeling of Interviews
Tuka Al Hanai, Mohammad Ghassemi, James Glass


Voice Conversion and Speech Synthesis


Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion
Nirmesh Shah, Maulik C. Madhavi, Hemant Patil

Voice Conversion with Conditional SampleRNN
Cong Zhou, Michael Horgan, Vivek Kumar, Cristina Vasco, Dan Darcy

A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder
Berrak Sisman, Mingyang Zhang, Haizhou Li

WaveNet Vocoder with Limited Training Data for Voice Conversion
Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Ming Zhou, Li-Rong Dai

Collapsed Speech Segment Detection and Suppression for WaveNet Vocoder
Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing, Tomoki Toda

High-quality Voice Conversion Using Spectrogram-Based WaveNet Vocoder
Kuan Chen, Bo Chen, Jiahao Lai, Kai Yu

Spanish Statistical Parametric Speech Synthesis Using a Neural Vocoder
Antonio Bonafonte, Santiago Pascual, Georgina Dorca

Experiments with Training Corpora for Statistical Text-to-speech Systems
Monika Podsiadło, Victor Ungureanu

Multi-task WaveNet: A Multi-task Generative Model for Statistical Parametric Speech Synthesis without Fundamental Frequency Conditions
Yu Gu, Yongguo Kang

Speaker-independent Raw Waveform Model for Glottal Excitation
Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, Manu Airaksinen, Junichi Yamagishi, Paavo Alku

A New Glottal Neural Vocoder for Speech Synthesis
Yang Cui, Xi Wang, Lei He, Frank K. Soong

Exemplar-based Speech Waveform Generation
Oliver Watts, Cassia Valentini-Botinhao, Felipe Espic, Simon King

Frequency Domain Variants of Velvet Noise and Their Application to Speech Processing and Synthesis
Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino


Extracting Information from Audio


Joint Learning of Interactive Spoken Content Retrieval and Trainable User Simulator
Pei-Hung Chung, Kuan Tung, Ching-Lun Tai, Hung-yi Lee

Attention-based End-to-End Models for Small-Footprint Keyword Spotting
Changhao Shan, Junbo Zhang, Yujun Wang, Lei Xie

Prediction of Aesthetic Elements in Karnatic Music: A Machine Learning Approach
Ragesh Rajan M, Ashwin Vijayakumar, Deepu Vijayasenan

Topic and Keyword Identification for Low-resourced Speech Using Cross-Language Transfer Learning
Wenda Chen, Mark Hasegawa-Johnson, Nancy F. Chen

Automatic Speech Recognition and Topic Identification from Speech for Almost-Zero-Resource Languages
Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar, Jan Trmal, Zhongqiang Huang, Najim Dehak, Sanjeev Khudanpur

Play Duration Based User-Entity Affinity Modeling in Spoken Dialog System
Bo Xiao, Nicholas Monath, Shankar Ananthakrishnan, Abishek Ravi

Empirical Analysis of Score Fusion Application to Combined Neural Networks for Open Vocabulary Spoken Term Detection
Shi-wook Lee, Kazuyo Tanaka, Yoshiaki Itoh

Phonological Posterior Hashing for Query by Example Spoken Term Detection
Afsaneh Asaei, Dhananjay Ram, Hervé Bourlard

Term Extraction via Neural Sequence Labeling: A Comparative Evaluation of Strategies Using Recurrent Neural Networks
Maren Kucza, Jan Niehues, Thomas Zenkel, Alex Waibel, Sebastian Stüker

Semi-supervised Learning for Information Extraction from Dialogue
Anjuli Kannan, Kai Chen, Diana Jaunzeikare, Alvin Rajkomar

Slot Filling with Delexicalized Sentence Generation
Youhyun Shin, Kang Min Yoo, Sang-goo Lee

Music Genre Recognition Using Deep Neural Networks and Transfer Learning
Deepanway Ghosal, Maheshkumar H. Kolekar

Efficient Voice Trigger Detection for Low Resource Hardware
Siddharth Sigtia, Rob Haynes, Hywel Richards, Erik Marchi, John Bridle


Signal Analysis for the Natural, Biological and Social Sciences


A Novel Normalization Method for Autocorrelation Function for Pitch Detection and for Speech Activity Detection
Qiguang Lin, Yiwen Shao

Estimation of the Vocal Tract Length of Vowel Sounds Based on the Frequency of the Significant Spectral Valley
TV Ananthapadmanabha, Ramakrishnan A G

Deep Learning Techniques for Koala Activity Detection
Ivan Himawan, Michael Towsey, Bradley Law, Paul Roe

Glottal Closure Instant Detection from Speech Signal Using Voting Classifier and Recursive Feature Elimination
Jindřich Matoušek, Daniel Tihelka

Assessing Speaker Engagement in 2-Person Debates: Overlap Detection in United States Presidential Debates
Midia Yousefi, Navid Shokouhi, John H.L. Hansen

All-Conv Net for Bird Activity Detection: Significance of Learned Pooling
Arjun Pankajakshan, Anshul Thakur, Daksh Thapar, Padmanabhan Rajan, Aditya Nigam

Deep Convex Representations: Feature Representations for Bioacoustics Classification
Anshul Thakur, Vinayak Abrol, Pulkit Sharma, Padmanabhan Rajan

Detection of Glottal Excitation Epochs in Speech Signal Using Hilbert Envelope
Hirak Dasgupta, Prem C. Pandey, K S Nataraj

Analyzing Thai Tone Distribution through Functional Data Analysis
Hong Zhang

Articulatory Feature Classification Using Convolutional Neural Networks
Danny Merkx, Odette Scharenborg

A New Frequency Coverage Metric and a New Subband Encoding Model, with an Application in Pitch Estimation
Shoufeng Lin

Improved Epoch Extraction from Telephonic Speech Using Chebfun and Zero Frequency Filtering
B Ganga Gowri, K P Soman, D Govind


Adjusting to Speaker, Accent, and Domain


Multi-Modal Data Augmentation for End-to-end ASR
Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner, Shinji Watanabe

Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition
Takafumi Moriya, Sei Ueno, Yusuke Shinohara, Marc Delcroix, Yoshikazu Yamaguchi, Yushi Aono

Training Augmentation with Adversarial Examples for Robust Speech Recognition
Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang, Lei Xie

Data Augmentation Improves Recognition of Foreign Accented Speech
Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Alexander Sorin, Gakuto Kurata

Speaker Adaptive Training and Mixup Regularization for Neural Network Acoustic Models in Automatic Speech Recognition
Natalia Tomashenko, Yuri Khokhlov, Yannick Estève

Neural Language Codes for Multilingual Acoustic Models
Markus Müller, Sebastian Stüker, Alex Waibel

Encoder Transfer for Attention-based Acoustic-to-word Speech Recognition
Sei Ueno, Takafumi Moriya, Masato Mimura, Shinsuke Sakai, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono, Tatsuya Kawahara

Empirical Evaluation of Speaker Adaptation on DNN Based Acoustic Model
Ke Wang, Junbo Zhang, Yujun Wang, Lei Xie

Improving DNNs Trained with Non-Native Transcriptions Using Knowledge Distillation and Target Interpolation
Amit Das, Mark Hasegawa-Johnson

Improving Cross-Lingual Knowledge Transferability Using Multilingual TDNN-BLSTM with Language-Dependent Pre-Final Layer
Siyuan Feng, Tan Lee

Auxiliary Feature Based Adaptation of End-to-end ASR Systems
Marc Delcroix, Shinji Watanabe, Atsunori Ogawa, Shigeki Karita, Tomohiro Nakatani

Leveraging Native Language Information for Improved Accented Speech Recognition
Shahram Ghorbani, John H.L. Hansen

Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning
Abhinav Jain, Minali Upreti, Preethi Jyothi

Fast Language Adaptation Using Phonological Information
Sibo Tong, Philip N. Garner, Hervé Bourlard


Speech Synthesis Paradigms and Methods


Naturalness Improvement Algorithm for Reconstructed Glossectomy Patient's Speech Using Spectral Differential Modification in Voice Conversion
Hiroki Murakami, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi

Audio-visual Voice Conversion Using Deep Canonical Correlation Analysis for Deep Bottleneck Features
Satoshi Tamura, Kento Horio, Hajime Endo, Satoru Hayamizu, Tomoki Toda

An Investigation of Convolution Attention Based Models for Multilingual Speech Synthesis of Indian Languages
Pallavi Baljekar, SaiKrishna Rallabandi, Alan W Black

The Effect of Real-Time Constraints on Automatic Speech Animation
Danny Websdale, Sarah Taylor, Ben Milner

Joint Learning of Facial Expression and Head Pose from Speech
David Greenwood, Iain Matthews, Stephen Laycock

Acoustic-dependent Phonemic Transcription for Text-to-speech Synthesis
Kévin Vythelingum, Yannick Estève, Olivier Rosec

Multimodal Speech Synthesis Architecture for Unsupervised Speaker Adaptation
Hieu-Thi Luong, Junichi Yamagishi

Articulatory-to-speech Conversion Using Bi-directional Long Short-term Memory
Fumiaki Taguchi, Tokihiko Kaburagi

Implementation of Respiration in Articulatory Synthesis Using a Pressure-Volume Lung Model
Keisuke Tanihara, Shogo Yonekura, Yasuo Kuniyoshi

Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis
Xiao Zhou, Zhen-Hua Ling, Zhi-Ping Zhou, Li-Rong Dai

Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer
Ruibo Fu, Jianhua Tao, Yibin Zheng, Zhengqi Wen

DNN-based Speech Synthesis for Small Data Sets Considering Bidirectional Speech-Text Conversion
Kentaro Sone, Toru Nakashika

A Weighted Superposition of Functional Contours Model for Modelling Contextual Prominence of Elementary Prosodic Contours
Branislav Gerazov, Gérard Bailly, Yi Xu

LSTBM: A Novel Sequence Representation of Speech Spectra Using Restricted Boltzmann Machine with Long Short-Term Memory
Toru Nakashika


Second Language Acquisition and Code-switching


Should Code-switching Models Be Asymmetric?
Barbara E. Bullock, Gualberto Guzmán, Jacqueline Serigos, Almeida Jacqueline Toribio

Cross-language Perception of Mandarin Lexical Tones by Mongolian-speaking Bilinguals in the Inner Mongolia Autonomous Region, China
Kimiko Tsukada, Yu Rong

Automatically Measuring L2 Speech Fluency without the Need of ASR: A Proof-of-concept Study with Japanese Learners of French
Lionel Fontan, Maxime Le Coz, Sylvain Detey

Analysis of L2 Learners’ Progress of Distinguishing Mandarin Tone 2 and Tone 3
Yue Sun, Win Thuzar Kyaw, Jinsong Zhang, Yoshinori Sagisaka

Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis
Xu Li, Shaoguang Mao, Xixin Wu, Kun Li, Xunying Liu, Helen Meng

Wuxi Speakers’ Production and Perception of Coda Nasals in Mandarin
Lei Wang, Jie Cui, Ying Chen

The Diphthongs of Formal Nigerian English: A Preliminary Acoustic Analysis
Natalia Dyrenko, Robert Fuchs

Characterizing Rhythm Differences between Strong and Weak Accented L2 Speech
Chris Davis, Jeesun Kim

Analysis of Phone Errors Attributable to Phonological Effects Associated With Language Acquisition Through Bottleneck Feature Visualisations
Eva Fringi, Martin Russell

Category Similarity in Multilingual Pronunciation Training
Jacques Koreman

Talker Diarization in the Wild: the Case of Child-centered Daylong Audio-recordings
Alejandrina Cristia, Shobhana Ganesh, Marisa Casillas, Sriram Ganapathy

Automated Classification of Children’s Linguistic versus Non-Linguistic Vocalisations
Zixing Zhang, Alejandrina Cristia, Anne Warlaumont, Björn Schuller

Pitch Characteristics of L2 English Speech by Chinese Speakers: A Large-scale Study
Jiahong Yuan, Qiusi Dong, Fei Wu, Huan Luan, Xiaofei Yang, Hui Lin, Yang Liu


Topics in Speech Recognition


Dual Language Models for Code Switched Speech Recognition
Saurabh Garg, Tanmay Parekh, Preethi Jyothi

Multilingual Neural Network Acoustic Modelling for ASR of Under-Resourced English-isiZulu Code-Switched Speech
Astik Biswas, Febe de Wet, Ewald van der Westhuizen, Emre Yılmaz, Thomas Niesler

Fast ASR-free and Almost Zero-resource Keyword Spotting Using DTW and CNNs for Humanitarian Monitoring
Raghav Menon, Herman Kamper, John Quinn, Thomas Niesler

Text-Dependent Speech Enhancement for Small-Footprint Robust Keyword Detection
Meng Yu, Xuan Ji, Yi Gao, Lianwu Chen, Jie Chen, Jimeng Zheng, Dan Su, Dong Yu

Improved ASR for Under-resourced Languages through Multi-task Learning with Acoustic Landmarks
Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson, Deming Chen

Cross-language Phoneme Mapping for Low-resource Languages: An Exploration of Benefits and Trade-offs
Nick K Chibuye, Todd Rosenstock, Brian DeRenzi

User-centric Evaluation of Automatic Punctuation in ASR Closed Captioning
Máté Ákos Tündik, György Szaszák, Gábor Gosztolya, András Beke

Punctuation Prediction Model for Conversational Speech
Piotr Żelasko, Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel, Najim Dehak

BUT OpenSAT 2017 Speech Recognition System
Martin Karafiát, Murali Karthick Baskar, Igor Szöke, Vladimír Malenovský, Karel Veselý, František Grézl, Lukáš Burget, Jan Černocký

Visual Recognition of Continuous Cued Speech Using a Tandem CNN-HMM Approach
Li Liu, Thomas Hueber, Gang Feng, Denis Beautemps

Building Large-vocabulary Speaker-independent Lipreading Systems
Kwanchiva Thangthai, Richard Harvey

CRIM's System for the MGB-3 English Multi-Genre Broadcast Media Transcription
Vishwa Gupta, Gilles Boulianne

Sampling Strategies in Siamese Networks for Unsupervised Speech Representation Learning
Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour, Thomas Schatz, Emmanuel Dupoux

Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting
Mengzhe Chen, ShiLiang Zhang, Ming Lei, Yong Liu, Haitao Yao, Jie Gao


Text Analysis, Multilingual Issues and Evaluation in Speech Synthesis


Multilingual Grapheme-to-Phoneme Conversion with Global Character Vectors
Jinfu Ni, Yoshinori Shiga, Hisashi Kawai

A Hybrid Approach to Grapheme to Phoneme Conversion in Assamese
Somnath Roy, Shakuntala Mahanta

Investigation of Using Disentangled and Interpretable Representations for One-shot Cross-lingual Voice Conversion
Seyed Hamidreza Mohammadi, Taehwan Kim

Using Pupillometry to Measure the Cognitive Load of Synthetic Speech
Avashna Govender, Simon King

Measuring the Cognitive Load of Synthetic Speech Using a Dual Task Paradigm
Avashna Govender, Simon King

Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text
Iroro Orife

Gated Convolutional Neural Network for Sentence Matching
Peixin Chen, Wu Guo, Zhi Chen, Jian Sun, Lanhua You

On Training and Evaluation of Grapheme-to-Phoneme Mappings with Limited Data
Dravyansh Sharma

The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech
Alice Baird, Emilia Parada-Cabaleiro, Simone Hantke, Felix Burkhardt, Nicholas Cummins, Björn Schuller

Word Emphasis Prediction for Expressive Text to Speech
Yosi Mass, Slava Shechtman, Moran Mordechay, Ron Hoory, Oren Sar Shalom, Guy Lev, David Konopnicki

A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis
Kai-Zhan Lee, Erica Cooper, Julia Hirschberg

Data Requirements, Selection and Augmentation for DNN-based Speech Synthesis from Crowdsourced Data
Markus Toman, Geoffrey S. Meltzner, Rupal Patel


Neural Network Training Strategies for ASR


Lightly Supervised vs. Semi-supervised Training of Acoustic Model on Luxembourgish for Low-resource Automatic Speech Recognition
Karel Veselý, Carlos Segura, Igor Szöke, Jordi Luque, Jan Černocký

Investigation on the Combination of Batch Normalization and Dropout in BLSTM-based Acoustic Modeling for ASR
Li Wenjie, Gaofeng Cheng, Fengpei Ge, Pengyuan Zhang, Yonghong Yan

Inference-Invariant Transformation of Batch Normalization for Domain Adaptation of Acoustic Models
Masayuki Suzuki, Tohru Nagano, Gakuto Kurata, Samuel Thomas

Active Learning for LF-MMI Trained Neural Networks in ASR
Yanhua Long, Hong Ye, Yijie Li, Jiaen Liang

An Investigation of Mixup Training Strategies for Acoustic Models in ASR
Ivan Medennikov, Yuri Khokhlov, Aleksei Romanenko, Dmitry Popov, Natalia Tomashenko, Ivan Sorokin, Alexander Zatvornitskiy

Comparison of Unsupervised Modulation Filter Learning Methods for ASR
Purvi Agrawal, Sriram Ganapathy

Improved Training for Online End-to-end Speech Recognition Systems
Suyoun Kim, Michael Seltzer, Jinyu Li, Rui Zhao

Combining Natural Gradient with Hessian Free Methods for Sequence Training
Adnan Haider, Philip Woodland

Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models
Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu

A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition
Hao Tang, Wei-Ning Hsu, François Grondin, James Glass

Multilingual Deep Neural Network Training Using Cyclical Learning Rate
Andreas Søeborg Kirkedal, Yeon-Jun Kim


Acoustic Scenes and Rare Events


Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection
Shao-Yen Tseng, Juncheng Li, Yun Wang, Florian Metze, Joseph Szurley, Samarjit Das

Unsupervised Temporal Feature Learning Based on Sparse Coding Embedded BoAW for Acoustic Event Recognition
Liwen Zhang, Jiqing Han, Shiwen Deng

Data Independent Sequence Augmentation Method for Acoustic Scene Classification
Zhang Teng, Kailai Zhang, Ji Wu

A Compact and Discriminative Feature Based on Auditory Summary Statistics for Acoustic Scene Classification
Hongwei Song, Jiqing Han, Shiwen Deng

ASe: Acoustic Scene Embedding Using Deep Archetypal Analysis and GMM
Pulkit Sharma, Vinayak Abrol, Anshul Thakur

Deep Convolutional Neural Network with Scalogram for Audio Scene Modeling
Hangting Chen, Pengyuan Zhang, Haichuan Bai, Qingsheng Yuan, Xiuguo Bao, Yonghong Yan

Time Aggregation Operators for Multi-label Audio Event Detection
Pankaj Joshi, Digvijaysingh Gautam, Ganesh Ramakrishnan, Preethi Jyothi

Early Detection of Continuous and Partial Audio Events Using CNN
Ian McLoughlin, Yan Song, Lam Dang Pham, Ramaswamy Palaniappan, Huy Phan, Yue Lang

Robust Acoustic Event Classification Using Bag-of-Visual-Words
Manjunath Mulimani, Shashidhar G Koolagudi

Wavelet Transform Based Mel-scaled Features for Acoustic Scene Classification
Shefali Waldekar, Goutam Saha

Multi-modal Attention Mechanisms in LSTM and Its Application to Acoustic Scene Classification
Teng Zhang, Kailai Zhang, Ji Wu


Speech Pathology, Depression, and Medical Applications


How Did You like 2017? Detection of Language Markers of Depression and Narcissism in Personal Narratives
Eva-Maria Rathner, Julia Djamali, Yannik Terhorst, Björn Schuller, Nicholas Cummins, Gudrun Salamon, Christina Hunger-Schoppe, Harald Baumeister

Depression Detection from Short Utterances via Diverse Smartphones in Natural Environmental Conditions
Zhaocheng Huang, Julien Epps, Dale Joachim, Michael Chen

Multi-Lingual Depression-Level Assessment from Conversational Speech Using Acoustic and Text Features
Yasin Özkanca, Cenk Demiroglu, Aslı Besirli, Selime Celik

Dysarthric Speech Classification Using Glottal Features Computed from Non-words, Words and Sentences
Narendra N P, Paavo Alku

Identifying Schizophrenia Based on Temporal Parameters in Spontaneous Speech
Gábor Gosztolya, Anita Bagi, Szilvia Szalóki, István Szendi, Ildikó Hoffmann

Using Prosodic and Lexical Information for Learning Utterance-level Behaviors in Psychotherapy
Karan Singla, Zhuohao Chen, Nikolaos Flemotomos, James Gibson, Dogan Can, David Atkins, Shrikanth Narayanan

Automatic Speech Assessment for People with Aphasia Using TDNN-BLSTM with Multi-Task Learning
Ying Qin, Tan Lee, Siyuan Feng, Anthony Pak Hin Kong

Towards an Unsupervised Entrainment Distance in Conversational Speech Using Deep Neural Networks
Md Nasir, Brian Baucom, Shrikanth Narayanan, Panayiotis Georgiou

Patient Privacy in Paralinguistic Tasks
Francisco Teixeira, Alberto Abad, Isabel Trancoso

A Lightly Supervised Approach to Detect Stuttering in Children's Speech
Sadeen Alharbi, Madina Hasan, Anthony J H Simons, Shelagh Brumfitt, Phil Green

Learning Conditional Acoustic Latent Representation with Gender and Age Attributes for Automatic Pain Level Recognition
Jeng-Lin Li, Yi-Ming Weng, Chip-Jin Ng, Chi-Chun Lee


Speaker Verification Using Neural Network Methods II


Training Utterance-level Embedding Networks for Speaker Identification and Verification
Heewoong Park, Sukhyun Cho, Kyubyong Park, Namju Kim, Jonghun Park

Analysis of Complementary Information Sources in the Speaker Embeddings Framework
Mahesh Kumar Nandwana, Mitchell McLaren, Diego Castan, Julien van Hout, Aaron Lawson

Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification
Yingke Zhu, Tom Ko, David Snyder, Brian Mak, Daniel Povey

An Improved Deep Embedding Learning Method for Short Duration Speaker Verification
Zhifu Gao, Yan Song, Ian McLoughlin, Wu Guo, Lirong Dai

Avoiding Speaker Overfitting in End-to-End DNNs Using Raw Waveform for Text-Independent Speaker Verification
Jee-weon Jung, Hee-soo Heo, IL-ho Yang, Hye-jin Shim, Ha-jin Yu

Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification
Gautam Bhattacharya, Md Jahangir Alam, Vishwa Gupta, Patrick Kenny

Employing Phonetic Information in DNN Speaker Embeddings to Improve Speaker Recognition Performance
Md Hafizur Rahman, Ivan Himawan, Mitchell McLaren, Clinton Fookes, Sridha Sridharan

End-to-end Text-dependent Speaker Verification Using Novel Distance Measures
Subhadeep Dey, Srikanth Madikeri, Petr Motlicek

Robust Speaker Clustering using Mixtures of von Mises-Fisher Distributions for Naturalistic Audio Streams
Harishchandra Dubey, Abhijeet Sangwan, John H.L. Hansen

Triplet Network with Attention for Speaker Diarization
Huan Song, Megan Willi, Jayaraman J. Thiagarajan, Visar Berisha, Andreas Spanias

I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification
Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda

Analysis of Length Normalization in End-to-End Speaker Verification System
Weicheng Cai, Jinkun Chen, Ming Li

Angular Softmax for Short-Duration Text-independent Speaker Verification
Zili Huang, Shuai Wang, Kai Yu

An End-to-End Text-Independent Speaker Identification System on Short Utterances
Ruifang Ji, Xinyuan Cai, Xu Bo

MTGAN: Speaker Verification through Multitasking Triplet Generative Adversarial Networks
Wenhao Ding, Liang He


Emotion Recognition and Analysis


Categorical vs Dimensional Perception of Italian Emotional Speech
Emilia Parada-Cabaleiro, Giovanni Costantini, Anton Batliner, Alice Baird, Björn Schuller

A Three-Layer Emotion Perception Model for Valence and Arousal-Based Detection from Multilingual Speech
Xingfeng Li, Masato Akagi

Cross-lingual Speech Emotion Recognition through Factor Analysis
Brecht Desplanques, Kris Demuynck

Modeling Self-Reported and Observed Affect from Speech
Jian Cheng, Jared Bernstein, Elizabeth Rosenfeld, Peter W. Foltz, Alex S. Cohen, Terje B. Holmlund, Brita Elvevåg

Stochastic Shake-Shake Regularization for Affective Learning from Speech
Che-Wei Huang, Shrikanth Narayanan

Investigating Speech Enhancement and Perceptual Quality for Speech Emotion Recognition
Anderson R. Avila, Md Jahangir Alam, Douglas O'Shaughnessy, Tiago Falk

Demonstrating and Modelling Systematic Time-varying Annotator Disagreement in Continuous Emotion Annotation
Mia Atcheson, Vidhyasaharan Sethu, Julien Epps

Speech Emotion Recognition from Variable-Length Inputs with Triplet Loss Function
Jian Huang, Ya Li, Jianhua Tao, Zhen Lian

Imbalance Learning-based Framework for Fear Recognition in the MediaEval Emotional Impact of Movies Task
Xiaotong Zhang, Xingliang Cheng, Mingxing Xu, Thomas Fang Zheng

Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms
Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai

Speech Emotion Recognition Using Spectrogram & Phoneme Embedding
Promod Yenigalla, Abhay Kumar, Suraj Tripathi, Chirag Singh, Sibsambhu Kar, Jithendra Vepa

On Enhancing Speech Emotion Recognition Using Generative Adversarial Networks
Saurabh Sahu, Rahul Gupta, Carol Espy-Wilson

Ladder Networks for Emotion Recognition: Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes
Srinivas Parthasarathy, Carlos Busso


Acoustic Modelling


Knowledge Distillation for Sequence Model
Mingkun Huang, Yongbin You, Zhehuai Chen, Yanmin Qian, Kai Yu

Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks
Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Tatsuya Kawahara, Hisashi Kawai

Filter Sampling and Combination CNN (FSC-CNN): A Compact CNN Model for Small-footprint ASR Acoustic Modeling Using Raw Waveforms
Jinxi Guo, Ning Xu, Xin Chen, Yang Shi, Kaiyuan Xu, Abeer Alwan

Twin Regularization for Online Speech Recognition
Mirco Ravanelli, Dmitriy Serdyuk, Yoshua Bengio

Self-Attentional Acoustic Models
Matthias Sperber, Jan Niehues, Graham Neubig, Sebastian Stüker, Alex Waibel

Hierarchical Recurrent Neural Networks for Acoustic Modeling
Jinhwan Park, Iksoo Choi, Yoonho Boo, Wonyong Sung

Dictionary Augmented Sequence-to-Sequence Neural Network for Grapheme to Phoneme Prediction
Antoine Bruguier, Anton Bakhtin, Dravyansh Sharma

Leveraging Second-Order Log-Linear Model for Improved Deep Learning Based ASR Performance
Ankit Raj, Shakti P Rath, Jithendra Vepa

Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks
Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi, Sanjeev Khudanpur

Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings
Da-Rong Liu, Kuan-Yu Chen, Hung-yi Lee, Lin-shan Lee

Phone Recognition Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs
Mengjie Qian, Linxue Bai, Peter Jančovič, Martin Russell

A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech Domain Adaptation
Ehsan Hosseini-Asl, Yingbo Zhou, Caiming Xiong, Richard Socher


