doi: 10.21437/Interspeech.2018
Semi-Supervised End-to-End Speech Recognition
Shigeki Karita, Shinji Watanabe, Tomoharu Iwata, Atsunori Ogawa, Marc Delcroix
Improved Training of End-to-end Attention Models for Speech Recognition
Albert Zeyer, Kazuki Irie, Ralf Schlüter, Hermann Ney
End-to-end Speech Recognition Using Lattice-free MMI
Hossein Hadian, Hossein Sameti, Daniel Povey, Sanjeev Khudanpur
Multi-channel Attention for End-to-End Speech Recognition
Stefan Braun, Daniel Neil, Jithendar Anumula, Enea Ceolini, Shih-Chii Liu
Quaternion Convolutional Neural Networks for End-to-End Automatic Speech Recognition
Titouan Parcollet, Ying Zhang, Mohamed Morchid, Chiheb Trabelsi, Georges Linares, Renato de Mori, Yoshua Bengio
Compression of End-to-End Models
Ruoming Pang, Tara Sainath, Rohit Prabhavalkar, Suyog Gupta, Yonghui Wu, Shuyuan Zhang, Chung-Cheng Chiu
Learning Interpretable Control Dimensions for Speech Synthesis by Using External Data
Zack Hodari, Oliver Watts, Srikanth Ronanki, Simon King
Investigating Accuracy of Pitch-accent Annotations in Neural Network-based Speech Synthesis and Denoising Effects
Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa
An Exploration of Local Speaking Rate Variations in Mandarin Read Speech
Guan-Ting Liou, Chen-Yu Chiang, Yih-Ru Wang, Sin-Horng Chen
BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End
Yibin Zheng, Jianhua Tao, Zhengqi Wen, Ya Li
Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion
Berrak Sisman, Haizhou Li
Improving Mongolian Phrase Break Prediction by Using Syllable and Morphological Embeddings with BiLSTM Model
Rui Liu, Feilong Bao, Guanglai Gao, Hui Zhang, Yonghe Wang
Improved Supervised Locality Preserving Projection for I-vector Based Speaker Verification
Lanhua You, Wu Guo, Yan Song, Sheng Zhang
Double Joint Bayesian Modeling of DNN Local I-Vector for Text Dependent Speaker Verification with Random Digit Strings
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu
Fast Variational Bayes for Heavy-tailed PLDA Applied to i-vectors and x-vectors
Anna Silnova, Niko Brümmer, Daniel Garcia-Romero, David Snyder, Lukáš Burget
Integrated Presentation Attack Detection and Automatic Speaker Verification: Common Features and Gaussian Back-end Fusion
Massimiliano Todisco, Héctor Delgado, Kong Aik Lee, Md Sahidullah, Nicholas Evans, Tomi Kinnunen, Junichi Yamagishi
A Generalization of PLDA for Joint Modeling of Speaker Identity and Multiple Nuisance Conditions
Luciana Ferrer, Mitchell McLaren
An Investigation of Non-linear i-vectors for Speaker Verification
Nanxin Chen, Jesús Villalba, Najim Dehak
CNN Based Query by Example Spoken Term Detection
Dhananjay Ram, Lesly Miculicich, Hervé Bourlard
Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search
Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li
Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection
Ziwei Zhu, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai
Fast Derivation of Cross-lingual Document Vectors from Self-attentive Neural Machine Translation Model
Wei Li, Brian Mak
LSTM Based Attentive Fusion of Spectral and Prosodic Information for Keyword Spotting in Hindi Language
Laxmi Pandey, Karan Nathwani
Spoken Keyword Detection Using Joint DTW-CNN
Ravi Shankar, C M Vikram, S R M Prasanna
The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats
Björn Schuller, Stefan Steidl, Anton Batliner, Peter B. Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian B. Pokorny, Eva-Maria Rathner, Katrin D. Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis, Stefanos Zafeiriou
An Ensemble of Transfer, Semi-supervised and Supervised Learning Methods for Pathological Heart Sound Classification
Ahmed Imtiaz Humayun, Md. Tauhiduzzaman Khan, Shabnam Ghaffarzadegan, Zhe Feng, Taufiq Hasan
Monitoring Infant's Emotional Cry in Domestic Environments Using the Capsule Network Architecture
Mehmet Ali Tuğtekin Turan, Engin Erzin
Neural Network Architecture That Combines Temporal and Summative Features for Infant Cry Classification in the Interspeech 2018 Computational Paralinguistics Challenge
Mark Huckvale
Evolving Learning for Analysing Mood-Related Infant Vocalisation
Zixing Zhang, Jing Han, Kun Qian, Björn Schuller
Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant?
Johannes Wagner, Dominik Schiller, Andreas Seiderer, Elisabeth André
Investigation on Joint Representation Learning for Robust Feature Extraction in Speech Emotion Recognition
Danqing Luo, Yuexian Zou, Dongyan Huang
Using Voice Quality Supervectors for Affect Identification
Soo Jin Park, Amber Afshan, Zhi Ming Chua, Abeer Alwan
An End-to-End Deep Learning Framework for Speech Emotion Recognition of Atypical Individuals
Dengke Tang, Junlin Zeng, Ming Li
DialogOS: Simple and Extensible Dialogue Modeling
Alexander Koller, Timo Baumann, Arne Köhn
A Framework for Speech Recognition Benchmarking
Franck Dernoncourt, Trung Bui, Walter Chang
Flexible Tongue Housed in a Static Model of the Vocal Tract With Jaws, Lips and Teeth
Takayuki Arai
Voice Analysis Using Acoustic and Throat Microphones for Speech Therapy
Lani Mathew, K Gopakumar
A Robust Context-Dependent Speech-to-Speech Phraselator Toolkit for Alexa
Manny Rayner, Nikos Tsourakis, Jan Stanek
Discriminating Nasals and Approximants in English Language Using Zero Time Windowing
RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty, Bayya Yegnanarayana
Gestural Lenition of Rhotics Captures Variation in Brazilian Portuguese
Phil Howson, Alexei Kochetov
Identification and Classification of Fricatives in Speech Using Zero Time Windowing Method
RaviShankar Prasad, Bayya Yegnanarayana
GlobalTIMIT: Acoustic-Phonetic Datasets for the World’s Languages
Nattanun Chanchaochai, Christopher Cieri, Japhet Debrah, Hongwei Ding, Yue Jiang, Sishi Liao, Mark Liberman, Jonathan Wright, Jiahong Yuan, Juhong Zhan, Yuqing Zhan
Structural Effects on Properties of Consonantal Gestures in Tashlhiyt
Anne Hermes, Doris Mücke, Bastian Auris, Rachid Ridouane
The Retroflex-dental Contrast in Punjabi Stops and Nasals: A Principal Component Analysis of Ultrasound Images
Alexei Kochetov, Matthew Faytak, Kiranpreet Nara
Vowels and Diphthongs in Hangzhou Wu Chinese Dialect
Yang Yue, Fang Hu
Resyllabification in Indian Languages and Its Implications in Text-to-speech Systems
Mahesh M, Jeena J Prakash, Hema Murthy
Voice Source Contribution to Prominence Perception: Rd Implementation
Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide, Christer Gobl
On the Relationship between Glottal Pulse Shape and Its Spectrum: Correlations of Open Quotient, Pulse Skew and Peak Flow with Source Harmonic Amplitudes
Christer Gobl, Andy Murphy, Irena Yanushevskaya, Ailbhe Ní Chasaide
The Individual and the System: Assessing the Stability of the Output of a Semi-automatic Forensic Voice Comparison System
Vincent Hughes, Philip Harrison, Paul Foulkes, Peter French, Colleen Kavanagh, Eugenia San Segundo Fernández
Breathy to Tense Voice Discrimination using Zero-Time Windowing Cepstral Coefficients (ZTWCCs)
Sudarsana Reddy Kadiri, Bayya Yegnanarayana
Analysis of Breathiness in Contextual Vowel of Voiceless Nasals in Mizo
Pamir Gogoi, Sishir Kalita, Parismita Gogoi, Ratree Wayland, Priyankoo Sarmah, S R Mahadeva Prasanna
Infant Emotional Outbursts Detection in Infant-parent Spoken Interactions
Yijia Xu, Mark Hasegawa-Johnson, Nancy McElwain
Deep Neural Networks for Emotion Recognition Combining Audio and Transcripts
Jaejin Cho, Raghavendra Pappagari, Purva Kulkarni, Jesús Villalba, Yishay Carmiel, Najim Dehak
Preference-Learning with Qualitative Agreement for Sentence Level Emotional Annotations
Srinivas Parthasarathy, Carlos Busso
Transfer Learning for Improving Speech Emotion Classification Accuracy
Siddique Latif, Rajib Rana, Shahzad Younis, Junaid Qadir, Julien Epps
What Do Classifiers Actually Learn? A Case Study on Emotion Recognition Datasets
Patrick Meyer, Eric Buschermöhle, Tim Fingscheidt
State of Mind: Classification through Self-reported Affect and Word Use in Speech
Eva-Maria Rathner, Yannik Terhorst, Nicholas Cummins, Björn Schuller, Harald Baumeister
Exploring Spatio-Temporal Representations by Integrating Attention-based Bidirectional-LSTM-RNNs and FCNs for Speech Emotion Recognition
Ziping Zhao, Yu Zheng, Zixing Zhang, Haishuai Wang, Yiqin Zhao, Chao Li
End-to-end Deep Neural Network Age Estimation
Pegah Ghahremani, Phani Sankar Nidadavolu, Nanxin Chen, Jesús Villalba, Daniel Povey, Sanjeev Khudanpur, Najim Dehak
Improving Gender Identification in Movie Audio Using Cross-Domain Data
Rajat Hebbar, Krishna Somandepalli, Shrikanth Narayanan
On Learning to Identify Genders from Raw Speech Signal Using CNNs
Selen Hande Kabil, Hannah Muckenhirn, Mathew Magimai.-Doss
Denoising and Raw-waveform Networks for Weakly-Supervised Gender Identification on Noisy Speech
Jilt Sebastian, Manoj Kumar, D. S. Pavan Kumar, Mathew Magimai.-Doss, Hema Murthy, Shrikanth Narayanan
The Effect of Exposure to High Altitude and Heat on Speech Articulatory Coordination
James Williamson, Thomas Quatieri, Adam Lammert, Katherine Mitchell, Katherine Finkelstein, Nicole Ekon, Caitlin Dillon, Robert Kenefick, Kristin Heaton
Permutation Invariant Training of Generative Adversarial Network for Monaural Speech Separation
Lianwu Chen, Meng Yu, Yanmin Qian, Dan Su, Dong Yu
Deep Extractor Network for Target Speaker Recovery from Single Channel Speech Mixtures
Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian, Dong Yu
Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network
Weipeng He, Petr Motlicek, Jean-Marc Odobez
Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method
Shuai Yang, Zhiyong Wu, Binbin Shen, Helen Meng
Robust TDOA Estimation Based on Time-Frequency Masking and Deep Neural Networks
Zhong-Qiu Wang, Xueliang Zhang, DeLiang Wang
Waveform to Single Sinusoid Regression to Estimate the F0 Contour from Noisy Speech Using Recurrent Deep Neural Networks
Akihiro Kato, Tomi Kinnunen
Reducing Interference with Phase Recovery in DNN-based Monaural Singing Voice Separation
Paul Magron, Konstantinos Drossos, Stylianos Ioannis Mimilakis, Tuomas Virtanen
Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors
Kanru Hua
Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network
Yi Luo, Nima Mesgarani
Music Source Activity Detection and Separation Using Deep Attractor Network
Rajath Kumar, Yi Luo, Nima Mesgarani
Improving Mandarin Tone Recognition Using Convolutional Bidirectional Long Short-Term Memory with Attention
Longfei Yang, Yanlu Xie, Jinsong Zhang
Vowel Space as a Tool to Evaluate Articulation Problems
Rob van Son, Catherine Middag, Kris Demuynck
Towards a Better Characterization of Parkinsonian Speech: A Multidimensional Acoustic Study
Veronique Delvaux, Kathy Huet, Myriam Piccaluga, Sophie van Malderen, Bernard Harmegnies
Self-similarity Matrix Based Intelligibility Assessment of Cleft Lip and Palate Speech
Sishir Kalita, S R Mahadeva Prasanna, Samarendra Dandapat
Pitch-Adaptive Front-end Feature for Hypernasality Detection
Akhilesh Kumar Dubey, S R Mahadeva Prasanna, S Dandapat
Detection of Amyotrophic Lateral Sclerosis (ALS) via Acoustic Analysis
Raquel Norel, Mary Pietrowicz, Carla Agurto, Shay Rishoni, Guillermo Cecchi
Detection of Glottal Activity Errors in Production of Stop Consonants in Children with Cleft Lip and Palate
C M Vikram, S R Mahadeva Prasanna, Ajish K Abraham, Pushpavathi M, Girish K S
Cold Fusion: Training Seq2Seq Models Together with Language Models
Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates
Investigation on Estimation of Sentence Probability by Combining Forward, Backward and Bi-directional LSTM-RNNs
Kazuki Irie, Zhihong Lei, Liuhui Deng, Ralf Schlüter, Hermann Ney
Subword and Crossword Units for CTC Acoustic Models
Thomas Zenkel, Ramon Sanabria, Florian Metze, Alex Waibel
Neural Error Corrective Language Models for Automatic Speech Recognition
Tomohiro Tanaka, Ryo Masumura, Hirokazu Masataki, Yushi Aono
Entity-Aware Language Model as an Unsupervised Reranker
Mohammad Sadegh Rasooli, Sarangarajan Parthasarathy
Character-level Language Modeling with Gated Hierarchical Recurrent Neural Networks
Iksoo Choi, Jinhwan Park, Wonyong Sung
Acoustic-Prosodic Indicators of Deception and Trust in Interview Dialogues
Sarah Ita Levitan, Angel Maredia, Julia Hirschberg
Deep Personality Recognition for Deception Detection
Guozhen An, Sarah Ita Levitan, Julia Hirschberg, Rivka Levitan
Cross-cultural (A)symmetries in Audio-visual Attitude Perception
Hansjörg Mixdorff, Albert Rilliard, Tan Lee, Matthew K. H. Ma, Angelika Hönemann
An Active Feature Transformation Method for Attitude Recognition of Video Bloggers
Fasih Haider, Fahim A. Salim, Owen Conlan, Saturnino Luz
Automatic Assessment of Individual Culture Attribute of Power Distance Using a Social Context-Enhanced Prosodic Network Representation
Fu-Sheng Tsai, Hao-Chun Yang, Wei-Wen Chang, Chi-Chun Lee
Analysis and Detection of Phonation Modes in Singing Voice using Excitation Source Features and Single Frequency Filtering Cepstral Coefficients (SFFCC)
Sudarsana Reddy Kadiri, Bayya Yegnanarayana
A Deep Learning Method for Pathological Voice Detection Using Convolutional Deep Belief Networks
Huiyi Wu, John Soraghan, Anja Lowit, Gaetano Di-Caterina
Dysarthric Speech Recognition Using Time-delay Neural Network Based Denoising Autoencoder
Chitralekha Bhat, Biswajit Das, Bhavik Vachhani, Sunil Kumar Kopparapu
A Multitask Learning Approach to Assess the Dysarthria Severity in Patients with Parkinson's Disease
Juan Camilo Vásquez Correa, Tomas Arias, Juan Rafael Orozco-Arroyave, Elmar Nöth
The Use of Machine Learning and Phonetic Endophenotypes to Discover Genetic Variants Associated with Speech Sound Disorder
Jason Lilley, Erin Crowgey, H Timothy Bunnell
Whistle-blowing ASRs: Evaluating the Need for More Inclusive Speech Recognition Systems
Meredith Moore, Hemanth Venkateswara, Sethuraman Panchanathan
Data Augmentation Using Healthy Speech for Dysarthric Speech Recognition
Bhavik Vachhani, Chitralekha Bhat, Sunil Kumar Kopparapu
Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function
Shaojin Ding, Guanlong Zhao, Christopher Liberatore, Ricardo Gutierrez-Osuna
Learning Structured Dictionaries for Exemplar-based Voice Conversion
Shaojin Ding, Christopher Liberatore, Ricardo Gutierrez-Osuna
Exemplar-Based Spectral Detail Compensation for Voice Conversion
Yu-Huai Peng, Hsin-Te Hwang, Yichiao Wu, Yu Tsao, Hsin-Min Wang
Whispered Speech to Neutral Speech Conversion Using Bidirectional LSTMs
G. Nisha Meenakshi, Prasanta Kumar Ghosh
Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance
Songxiang Liu, Jinghua Zhong, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng
Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations
Ju-chieh Chou, Cheng-chieh Yeh, Hung-yi Lee, Lin-shan Lee
Attention-based Sequence Classification for Affect Detection
Cristina Gorrostieta, Richard Brutti, Kye Taylor, Avi Shapiro, Joseph Moran, Ali Azarbayejani, John Kane
Computational Paralinguistics: Automatic Assessment of Emotions, Mood and Behavioural State from Acoustics of Speech
Zafi Sherhan Syed, Julien Schroeter, Kirill Sidorov, David Marshall
Investigating Utterance Level Representations for Detecting Intent from Acoustics
SaiKrishna Rallabandi, Bhavya Karki, Carla Viegas, Eric Nyberg, Alan W Black
LSTM Based Cross-corpus and Cross-task Acoustic Emotion Recognition
Heysem Kaya, Dmitrii Fedotov, Ali Yeşilkanat, Oxana Verkholyak, Yang Zhang, Alexey Karpov
Implementing Fusion Techniques for the Classification of Paralinguistic Information
Bogdan Vlasenko, Jilt Sebastian, D. S. Pavan Kumar, Mathew Magimai.-Doss
General Utterance-Level Feature Extraction for Classifying Crying Sounds, Atypical & Self-Assessed Affect and Heart Beats
Gábor Gosztolya, Tamás Grósz, László Tóth
Self-Assessed Affect Recognition Using Fusion of Attentional BLSTM and Static Acoustic Features
Bo-Hao Su, Sung-Lin Yeh, Ming-Ya Ko, Huan-Yu Chen, Shun-Chang Zhong, Jeng-Lin Li, Chi-Chun Lee
Vocalic, Lexical and Prosodic Cues for the INTERSPEECH 2018 Self-Assessed Affect Challenge
Claude Montacié, Marie-José Caraty
Intonation tutor by SPIRE (In-SPIRE): An Online Tool for an Automatic Feedback to the Second Language Learners in Learning Intonation
Anand P A, Chiranjeevi Yarra, Kausthubha N K, Prasanta Kumar Ghosh
Game-based Spoken Dialog Language Learning Applications for Young Students
Keelan Evanini, Veronika Timpe-Laughlin, Eugene Tsuprun, Ian Blood, Jeremy Lee, James Bruno, Vikram Ramanarayanan, Patrick Lange, David Suendermann-Oeft
The IBM Virtual Voice Creator
Alexander Sorin, Slava Shechtman, Zvi Kons, Ron Hoory, Shay Ben-David, Joe Pavitt, Shai Rozenberg, Carmel Rabinovitz, Tal Drory
Mobile Application for Learning Languages for the Unlettered
Gayathri G, Mohana N, Radhika Pal, Hema Murthy
Mandarin-English Code-switching Speech Recognition
Haihua Xu, Van Tung Pham, Zin Tun Kyaw, Zhi Hao Lim, Eng Siong Chng, Haizhou Li
Joint Learning of Domain Classification and Out-of-Domain Detection with Dynamic Class Weighting for Satisficing False Acceptance Rates
Joo-Kyung Kim, Young-Bum Kim
Analyzing Vocal Tract Movements During Speech Accommodation
Sankar Mukherjee, Thierry Legou, Leonardo Lancia, Pauline Hilt, Alice Tomassini, Luciano Fadiga, Alessandro D'Ausilio, Leonardo Badino, Noël Nguyen
Cross-Lingual Multi-Task Neural Architecture for Spoken Language Understanding
Yujiang Li, Xuemin Zhao, Weiqun Xu, Yonghong Yan
Statistical Model Compression for Small-Footprint Natural Language Understanding
Grant P. Strimel, Kanthashree Mysore Sathyendra, Stanislav Peshterliev
Comparison of an End-to-end Trainable Dialogue System with a Modular Statistical Dialogue System
Norbert Braunschweiler, Alexandros Papangelis
A Discriminative Acoustic-Prosodic Approach for Measuring Local Entrainment
Megan Willi, Stephanie A. Borrie, Tyson S. Barrett, Ming Tu, Visar Berisha
Investigating Speech Features for Continuous Turn-Taking Prediction Using LSTMs
Matthew Roddy, Gabriel Skantze, Naomi Harte
Classification of Correction Turns in Multilingual Dialogue Corpus
Ivan Kraljevski, Diane Hirschfeld
Contextual Slot Carryover for Disparate Schemas
Chetan Naik, Arpit Gupta, Hancheng Ge, Mathias Lambert, Ruhi Sarikaya
Capsule Networks for Low Resource Spoken Language Understanding
Vincent Renkens, Hugo van Hamme
Intent Discovery Through Unsupervised Semantic Text Clustering
A Padmasundari, Srinivas Bangalore
Multimodal Polynomial Fusion for Detecting Driver Distraction
Yulun Du, Alan W Black, Louis-Philippe Morency, Maxine Eskenazi
Engagement Recognition in Spoken Dialogue via Neural Network by Aggregating Different Annotators' Models
Koji Inoue, Divesh Lala, Katsuya Takanashi, Tatsuya Kawahara
A First Investigation of the Timing of Turn-taking in Ruuli
Tuarik Buanzur, Margaret Zellers, Saudah Namyalo, Alena Witzlack-Makarevich
Spoofing Detection Using Adaptive Weighting Framework and Clustering Analysis
Yuanjun Zhao, Roberto Togneri, Victor Sreeram
Exploration of Compressed ILPR Features for Replay Attack Detection
Sarfaraz Jelil, Sishir Kalita, S R Mahadeva Prasanna, Rohit Sinha
Detection of Replay-Spoofing Attacks Using Frequency Modulation Features
Tharshini Gunendradasan, Buddhi Wickramasinghe, Ngoc Phu Le, Eliathamby Ambikairajah, Julien Epps
Effectiveness of Speech Demodulation-Based Features for Replay Detection
Madhu Kamble, Hemlata Tak, Hemant Patil
Novel Variable Length Energy Separation Algorithm Using Instantaneous Amplitude Features for Replay Detection
Madhu Kamble, Hemant Patil
Feature with Complementarity of Statistics and Principal Information for Spoofing Detection
Jichen Yang, Changhuai You, Qianhua He
Multiple Phase Information Combination for Replay Attacks Detection
Dongbo Li, Longbiao Wang, Jianwu Dang, Meng Liu, Zeyan Oo, Seiichi Nakagawa, Haotian Guan, Xiangang Li
Frequency Domain Linear Prediction Features for Replay Spoofing Attack Detection
Buddhi Wickramasinghe, Saad Irtza, Eliathamby Ambikairajah, Julien Epps
Auditory Filterbank Learning for Temporal Modulation Features in Replay Spoof Speech Detection
Hardik Sailor, Madhu Kamble, Hemant Patil
Deep Siamese Architecture Based Replay Detection for Secure Voice Biometric
Kaavya Sriskandaraja, Vidhyasaharan Sethu, Eliathamby Ambikairajah
A Deep Identity Representation for Noise Robust Spoofing Detection
Alejandro Gómez Alanís, Antonio M. Peinado, Jose A. Gonzalez, Angel Gomez
End-To-End Audio Replay Attack Detection Using Deep Convolutional Networks with Attention
Francis Tom, Mohit Jain, Prasenjit Dey
Decision-level Feature Switching as a Paradigm for Replay Attack Detection
Saranya M S, Hema Murthy
Modulation Dynamic Features for the Detection of Replay Attacks
Gajan Suthokumar, Vidhyasaharan Sethu, Chamith Wijenayake, Eliathamby Ambikairajah
On the Usefulness of the Speech Phase Spectrum for Pitch Extraction
Erfan Loweimi, Jon Barker, Thomas Hain
Time-regularized Linear Prediction for Noise-robust Extraction of the Spectral Envelope of Speech
Manu Airaksinen, Lauri Juvela, Okko Räsänen, Paavo Alku
Auditory Filterbank Learning Using ConvRBM for Infant Cry Classification
Hardik B. Sailor, Hemant Patil
Effectiveness of Dynamic Features in INCA and Temporal Context-INCA
Nirmesh Shah, Hemant Patil
Singing Voice Phoneme Segmentation by Hierarchically Inferring Syllable and Phoneme Onset Positions
Rong Gong, Xavier Serra
Novel Empirical Mode Decomposition Cepstral Features for Replay Spoof Detection
Prasad Tapkir, Hemant Patil
Novel Linear Frequency Residual Cepstral Features for Replay Attack Detection
Hemlata Tak, Hemant Patil
Analysis of Sparse Representation Based Feature on Speech Mode Classification
Kumud Tripathi, K. Sreenivasa Rao
Multicomponent 2-D AM-FM Modeling of Speech Spectrograms
Jitendra Kumar Dhiman, Neeraj Sharma, Chandra Sekhar Seelamantula
An Optimization Framework for Recovery of Speech from Phase-Encoded Spectrograms
Abhilash Sainathan, Sunil Rudresh, Chandra Sekhar Seelamantula
Speaker Recognition with Nonlinear Distortion: Clipping Analysis and Impact
Wei Xia, John H.L. Hansen
Linear Prediction Residual based Short-term Cepstral Features for Replay Attacks Detection
Madhusudan Singh, Debadatta Pati
Analysis of Variational Mode Functions for Robust Detection of Vowels
Surbhi Sakshi, Avinash Kumar, Gayadhar Pradhan
Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition
Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su, Dong Yu
Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition
Eugen Beck, Mirko Hannemann, Patrick Dötsch, Ralf Schlüter, Hermann Ney
Acoustic Modeling with DFSMN-CTC and Joint CTC-CE Learning
ShiLiang Zhang, Ming Lei
End-to-End Speech Command Recognition with Capsule Network
Jaesung Bae, Dae-Shik Kim
End-to-End Speech Recognition from the Raw Waveform
Neil Zeghidour, Nicolas Usunier, Gabriel Synnaeve, Ronan Collobert, Emmanuel Dupoux
A Multistage Training Framework for Acoustic-to-Word Model
Chengzhu Yu, Chunlei Zhang, Chao Weng, Jia Cui, Dong Yu
Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese
Shiyu Zhou, Linhao Dong, Shuang Xu, Bo Xu
Densely Connected Networks for Conversational Speech Recognition
Kyu Han, Akshay Chandrashekaran, Jungsuk Kim, Ian Lane
Multi-Head Decoder for End-to-End Speech Recognition
Tomoki Hayashi, Shinji Watanabe, Tomoki Toda, Kazuya Takeda
Compressing End-to-end ASR Networks by Tensor-Train Decomposition
Takuma Mori, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech
Yu-An Chung, James Glass
Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin
Linhao Dong, Shiyu Zhou, Wei Chen, Bo Xu
Joint Noise and Reverberation Adaptive Learning for Robust Speaker DOA Estimation with an Acoustic Vector Sensor
Disong Wang, Yuexian Zou
Multiple Concurrent Sound Source Tracking Based on Observation-Guided Adaptive Particle Filter
Hong Liu, Haipeng Lan, Bing Yang, Cheng Pang
Harmonic-Percussive Source Separation of Polyphonic Music by Suppressing Impulsive Noise Events
Gurunath Reddy M, K. Sreenivasa Rao, Partha Pratim Das
Speaker Activity Detection and Minimum Variance Beamforming for Source Separation
Enea Ceolini, Jithendar Anumula, Adrian Huber, Ilya Kiselev, Shih-Chii Liu
Sparsity-Constrained Weight Mapping for Head-Related Transfer Functions Individualization from Anthropometric Features
Xiaoke Qi, Jianhua Tao
Speech Source Separation Using ICA in Constant Q Transform Domain
D.V.L.N Dheeraj Sai, K. S. Kishor, K Sri Rama Murty
Multi-talker Speech Separation Based on Permutation Invariant Training and Beamforming
Lu Yin, Ziteng Wang, Risheng Xia, Junfeng Li, Yonghong Yan
Expectation-Maximization Algorithms for Itakura-Saito Nonnegative Matrix Factorization
Paul Magron, Tuomas Virtanen
Subband Weighting for Binaural Speech Source Localization
Karthik Girija Ramesan, Parth Suresh, Prasanta Kumar Ghosh
Learning to Adapt: A Meta-learning Approach for Speaker Adaptation
Ondřej Klejch, Joachim Fainberg, Peter Bell
Speaker Adaptation and Adaptive Training for Jointly Optimised Tandem Systems
Yu Wang, Chao Zhang, Mark Gales, Philip Woodland
Comparison of BLSTM-Layer-Specific Affine Transformations for Speaker Adaptation
Markus Kitza, Ralf Schlüter, Hermann Ney
Correlational Networks for Speaker Normalization in Automatic Speech Recognition
Rini A Sharon, Sandeep Reddy Kothinti, Umesh Srinivasan
Machine Speech Chain with One-shot Speaker Adaptation
Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition
Khe Chai Sim, Arun Narayanan, Ananya Misra, Anshuman Tripathi, Golan Pundak, Tara Sainath, Parisa Haghani, Bo Li, Michiel Bacchiani
Waveform-Based Speaker Representations for Speech Synthesis
Moquan Wan, Gilles Degottex, Mark J.F. Gales
Incremental TTS for Japanese Language
Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
Transfer Learning Based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis
Ruibo Fu, Jianhua Tao, Yibin Zheng, Zhengqi Wen
A Unified Framework for the Generation of Glottal Signals in Deep Learning-based Parametric Speech Synthesis Systems
Min-Jae Hwang, Eunwoo Song, Jin-Seob Kim, Hong-Goo Kang
Acoustic Modeling Using Adversarially Trained Variational Recurrent Neural Network for Speech Synthesis
Joun Yeop Lee, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim, Eunwoo Song
On the Application and Compression of Deep Time Delay Neural Network for Embedded Statistical Parametric Speech Synthesis
Yibin Zheng, Jianhua Tao, Zhengqi Wen, Ruibo Fu
Integrating Recurrence Dynamics for Speech Emotion Recognition
Efthymios Tzinis, Georgios Paraskevopoulos, Christos Baziotis, Alexandros Potamianos
Towards Temporal Modelling of Categorical Speech Emotion Recognition
Wenjing Han, Huabin Ruan, Xiaomin Chen, Zhixiang Wang, Haifeng Li, Björn Schuller
Emotion Recognition from Human Speech Using Temporal Information and Deep Learning
John Kim, Rif A. Saurous
Role of Regularization in the Prediction of Valence from Speech
Kusha Sridhar, Srinivas Parthasarathy, Carlos Busso
Learning Spontaneity to Improve Emotion Recognition in Speech
Karttikeya Mangalam, Tanaya Guha
Predicting Categorical Emotions by Jointly Learning Primary and Secondary Emotions through Multitask Learning
Reza Lotfian, Carlos Busso
Picture Naming or Word Reading: Does the Modality Affect Speech Motor Adaptation and Its Transfer?
Tiphaine Caudrelier, Pascal Perrier, Jean-Luc Schwartz, Amélie Rochet-Capellan
Measuring the Band Importance Function for Mandarin Chinese with a Bayesian Adaptive Procedure
Yufan Du, Yi Shen, Hongying Yang, Xihong Wu, Jing Chen
Wide Learning for Auditory Comprehension
Elnaz Shafaei-Bajestan, R. Harald Baayen
Analyzing Reaction Time Sequences from Human Participants in Auditory Experiments
Louis ten Bosch, Mirjam Ernestus, Lou Boves
Prediction of Perceived Speech Quality Using Deep Machine Listening
Jasper Ooster, Rainer Huber, Bernd T. Meyer
Prediction of Subjective Listening Effort from Acoustic Data with Non-Intrusive Deep Models
Paul Kranzusch, Rainer Huber, Melanie Krüger, Birger Kollmeier, Bernd T. Meyer
A Case Study on the Importance of Belief State Representation for Dialogue Policy Management
Margarita Kotti, Vassilios Diakoloukas, Alexandros Papangelis, Michail Lagoudakis, Yannis Stylianou
Prediction of Turn-taking Using Multitask Learning with Prediction of Backchannels and Fillers
Kohei Hara, Koji Inoue, Katsuya Takanashi, Tatsuya Kawahara
Conversational Analysis Using Utterance-level Attention-based Bidirectional Recurrent Neural Networks
Chandrakant Bothe, Sven Magg, Cornelius Weber, Stefan Wermter
A Comparative Study of Statistical Conversion of Face to Voice Based on Their Subjective Impressions
Yasuhito Ohsugi, Daisuke Saito, Nobuaki Minematsu
Follow-up Question Generation Using Pattern-based Seq2seq with a Small Corpus for Interview Coaching
Ming-Hsiang Su, Chung-Hsien Wu, Kun-Yi Huang, Qian-Bei Hong, Huai-Hung Huang
Coherence Models for Dialogue
Alessandra Cervone, Evgeny Stepanov, Giuseppe Riccardi
Indian Languages ASR: A Multilingual Phone Recognition Framework with IPA Based Common Phone-set, Predicted Articulatory Features and Feature Fusion
Manjunath K E, K. Sreenivasa Rao, Dinesh Babu Jayagopi, V Ramasubramanian
Rapid Collection of Spontaneous Speech Corpora Using Telephonic Community Forums
Agha Ali Raza, Awais Athar, Shan Randhawa, Zain Tariq, Muhammad Bilal Saleem, Haris Bin Zia, Umar Saif, Roni Rosenfeld
Effect of TTS Generated Audio on OOV Detection and Word Error Rate in ASR for Low-resource Languages
Savitha Murthy, Dinkar Sitaram, Sunayana Sitaram
Development of Large Vocabulary Speech Recognition System with Keyword Search for Manipuri
Tanvina Patel, Krishna DN, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar, Anuroop Iyengar
Robust Mizo Continuous Speech Recognition
Abhishek Dey, Biswajit Dev Sarma, Wendy Lalhminghlui, Lalnunsiami Ngente, Parismita Gogoi, Priyankoo Sarmah, S R M Prasanna, Rohit Sinha, S R Nirmala
Semi-supervised and Active-learning Scenarios: Efficient Acoustic Model Refinement for a Low Resource Indian Language
Maharajan Chellapriyadharshini, Anoop Toffy, Srinivasa Raghavan K. M., V Ramasubramanian
Automatic Speech Recognition with Articulatory Information and a Unified Dictionary for Hindi, Marathi, Bengali and Oriya
Debadatta Dash, Myungjong Kim, Kristin Teplansky, Jun Wang
Captaina: Integrated Pronunciation Practice and Data Collection Portal
Aku Rouhe, Reima Karhila, Aija Elg, Minnaleena Toivola, Peter Smit, Anna-Riikka Smolander, Mikko Kurimo
auMina™ - Enterprise Speech Analytics
Umesh Sachdev, Rajagopal Jayaraman, Zainab Millwala
HoloCompanion: An MR Friend for EveryOne
Annam Naresh, Rushabh Gandhi, Mallikarjuna Rao Bellamkonda, Mithun Das Gupta
akeira™ - Virtual Assistant
Umesh Sachdev, Rajagopal Jayaraman, Zainab Millwala
Brain-Computer Interface using Electroencephalogram Signatures of Eye Blinks
Srihari Maruthachalam, Sidharth Aggarwal, Mari Ganesh Kumar, Mriganka Sur, Hema Murthy
Voice Comparison and Rhythm: Behavioral Differences between Target and Non-target Comparisons
Moez Ajili, Jean-François Bonastre, Solange Rossato
Co-whitening of I-vectors for Short and Long Duration Speaker Verification
Longting Xu, Kong Aik Lee, Haizhou Li, Zhen Yang
Compensation for Domain Mismatch in Text-independent Speaker Recognition
Fahimeh Bahmaninezhad, John H.L. Hansen
Joint Learning of J-Vector Extractor and Joint Bayesian Model for Text Dependent Speaker Verification
Ziqiang Shi, Liu Liu, Huibin Lin, Rujie Liu
Latent Factor Analysis of Deep Bottleneck Features for Speaker Verification with Random Digit Strings
Ziqiang Shi, Huibin Lin, Liu Liu, Rujie Liu
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung, Arsha Nagrani, Andrew Zisserman
Supervised I-vector Modeling - Theory and Applications
Shreyas Ramoji, Sriram Ganapathy
LOCUST - Longitudinal Corpus and Toolset for Speaker Verification
Evgeny Dmitriev, Yulia Kim, Anastasia Matveeva, Claude Montacié, Yannick Boulard, Yadviga Sinyavskaya, Yulia Zhukova, Adam Zarazinski, Egor Akhanov, Ilya Viksnin, Andrei Shlykov, Maria Usova
Analysis of Language Dependent Front-End for Speaker Recognition
Srikanth Madikeri, Subhadeep Dey, Petr Motlicek
Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings
Mahesh Kumar Nandwana, Julien van Hout, Mitchell McLaren, Allen Stauffer, Colleen Richey, Aaron Lawson, Martin Graciarena
Investigation on Bandwidth Extension for Speaker Recognition
Phani Sankar Nidadavolu, Cheng-I Lai, Jesús Villalba, Najim Dehak
On Learning Vocal Tract System Related Speaker Discriminative Information from Raw Signal Using CNNs
Hannah Muckenhirn, Mathew Magimai.-Doss, Sebastien Marcel
On Convolutional LSTM Modeling for Joint Wake-Word Detection and Text Dependent Speaker Verification
Rajath Kumar, Vaishnavi Yeruva, Sriram Ganapathy
Cosine Metric Learning for Speaker Verification in the I-vector Space
Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen
An Unsupervised Neural Prediction Framework for Learning Speaker Embeddings Using Recurrent Neural Networks
Arindam Jati, Panayiotis Georgiou
A New Framework for Supervised Speech Enhancement in the Time Domain
Ashutosh Pandey, DeLiang Wang
Speech Enhancement Using the Minimum-probability-of-error Criterion
Jishnu Sadasivan, Subhadip Mukherjee, Chandra Sekhar Seelamantula
Exploring the Relationship between Conic Affinity of NMF Dictionaries and Speech Enhancement Metrics
Pavlos Papadopoulos, Colin Vaz, Shrikanth Narayanan
Using Shifted Real Spectrum Mask as Training Target for Supervised Speech Separation
Yun Liu, Hui Zhang, Xueliang Zhang
Enhancement of Noisy Speech Signal by Non-Local Means Estimation of Variational Mode Functions
Nagapuri Srinivas, Gayadhar Pradhan, Syed Shahnawazuddin
Phase-locked Loop (PLL) Based Phase Estimation in Single Channel Speech Enhancement
Priya Pallavi, Ch V Rama Rao
Cycle-Consistent Speech Enhancement
Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang (Fred) Juang
Visual Speech Enhancement
Aviv Gabbay, Asaph Shamir, Shmuel Peleg
Implementation of Digital Hearing Aid as a Smartphone Application
Saketh Sharma, Nitya Tiwari, Prem C. Pandey
Bone-Conduction Sensor Assisted Noise Estimation for Improved Speech Enhancement
Ching-Hua Lee, Bhaskar D. Rao, Harinath Garudadri
Artificial Bandwidth Extension with Memory Inclusion Using Semi-supervised Stacked Auto-encoders
Pramod Bachhav, Massimiliano Todisco, Nicholas Evans
Large Vocabulary Concatenative Resynthesis
Soumi Maiti, Joey Ching, Michael Mandel
Concatenative Resynthesis with Improved Training Signals for Speech Enhancement
Ali Raza Syed, Viet Anh Trinh, Michael Mandel
Comparison of Syllabification Algorithms and Training Strategies for Robust Word Count Estimation across Different Languages and Recording Conditions
Okko Räsänen, Seshadri Shreyas, Marisa Casillas
A Comparison of Input Types to a Deep Neural Network-based Forced Aligner
Matthew C. Kelley, Benjamin V. Tucker
Joint Learning Using Denoising Variational Autoencoders for Voice Activity Detection
Youngmoon Jung, Younggwan Kim, Yeunju Choi, Hoirin Kim
Information Bottleneck Based Percussion Instrument Diarization System for Taniavartanam Segments of Carnatic Music Concerts
Nauman Dawalatabad, Jom Kuriakose, Chandra Sekhar Chellu, Hema Murthy
Robust Voice Activity Detection Using Frequency Domain Long-Term Differential Entropy
Debayan Ghosh, R Muralishankar, Sanjeev Gurugopinath
Device-directed Utterance Detection
Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas, Björn Hoffmeister
Acoustic-Prosodic Features of Tabla Bol Recitation and Correspondence with the Tabla Imitation
Rohit M A, Preeti Rao
Who Said That? A Comparative Study of Non-negative Matrix Factorization Techniques
Teun Krikke, Frank Broz, David Lane
AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies
Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson, Zhonghua Xi
Audiovisual Speech Activity Detection with Advanced Long Short-Term Memory
Fei Tao, Carlos Busso
Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI
Pramit Saha, Praneeth Srungarapu, Sidney Fels
Structured Word Embedding for Low Memory Neural Network Language Model
Kaiyu Shi, Kai Yu
Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder
Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Hirokazu Masataki, Yushi Aono
Efficient Keyword Spotting Using Time Delay Neural Networks
Samuel Myer, Vikrant Singh Tomar
Automatic DNN Node Pruning Using Mixture Distribution-based Group Regularization
Tsukasa Yoshida, Takafumi Moriya, Kazuho Watanabe, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono
Conditional-Computation-Based Recurrent Neural Networks for Computationally Efficient Acoustic Modelling
Raffaele Tavarone, Leonardo Badino
Leveraging Translations for Speech Transcription in Low-resource Settings
Antonios Anastasopoulos, David Chiang
Sequence-to-sequence Neural Network Model with 2D Attention for Learning Japanese Pitch Accents
Antoine Bruguier, Heiga Zen, Arkady Arkhangorodsky
Task Specific Sentence Embeddings for ASR Error Detection
Sahar Ghannay, Yannick Estève, Nathalie Camelin
Low-Latency Neural Speech Translation
Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber, Alex Waibel
Low-Resource Speech-to-Text Translation
Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez, Sharon Goldwater
VoiceGuard: Secure and Private Speech Processing
Ferdinand Brasser, Tommaso Frassetto, Korbinian Riedhammer, Ahmad-Reza Sadeghi, Thomas Schneider, Christian Weinert
Single-channel Speech Dereverberation via Generative Adversarial Training
Chenxing Li, Tieqiang Wang, Shuang Xu, Bo Xu
Single-Channel Dereverberation Using Direct MMSE Optimization and Bidirectional LSTM Networks
Wolfgang Mack, Soumitro Chakrabarty, Fabian-Robert Stöter, Sebastian Braun, Bernd Edler, Emanuël Habets
Single-channel Late Reverberation Power Spectral Density Estimation Using Denoising Autoencoders
Ina Kodrasi, Hervé Bourlard
A Non-convolutive NMF Model for Speech Dereverberation
Nikhil Mohanan, Rajbabu Velmurugan, Preeti Rao
Cross-Corpora Convolutional Deep Neural Network Dereverberation Preprocessing for Speaker Verification and Speech Enhancement
Peter Guzewich, Stephen Zahorian, Xiao Chen, Hao Zhang
Dereverberation and Beamforming in Robust Far-Field Speaker Recognition
Ladislav Mošner, Oldřich Plchot, Pavel Matějka, Ondřej Novotný, Jan Černocký
Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks
Yun Wang, Juncheng Li, Florian Metze
A Simple Model for Detection of Rare Sound Events
Weiran Wang, Chieh-Chi Kao, Chao Wang
Temporal Transformer Networks for Acoustic Scene Classification
Teng Zhang, Kailai Zhang, Ji Wu
Temporal Attentive Pooling for Acoustic Event Detection
Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai
R-CRNN: Region-based Convolutional Recurrent Neural Network for Audio Event Detection
Chieh-Chi Kao, Weiran Wang, Ming Sun, Chao Wang
Detecting Media Sound Presence in Acoustic Scenes
Constantinos Papayiannis, Justice Amoh, Viktor Rozgic, Shiva Sundaram, Chao Wang
S4D: Speaker Diarization Toolkit in Python
Pierre-Alexandre Broux, Florent Desnous, Anthony Larcher, Simon Petitrenaud, Jean Carrive, Sylvain Meignier
Multimodal Speaker Segmentation and Diarization Using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks
Tae Jin Park, Panayiotis Georgiou
Combined Speaker Clustering and Role Recognition in Conversational Speech
Nikolaos Flemotomos, Pavlos Papadopoulos, James Gibson, Shrikanth Narayanan
The ACLEW DiViMe: An Easy-to-use Diarization Tool
Adrien Le Franc, Eric Riebling, Julien Karadayi, Yun Wang, Camila Scaff, Florian Metze, Alejandrina Cristia
Automatic Detection of Multi-speaker Fragments with High Time Resolution
Evdokia Kazimirova, Andrey Belyaev
Neural Speech Turn Segmentation and Affinity Propagation for Speaker Diarization
Ruiqing Yin, Hervé Bredin, Claude Barras
Pitch or Phonation: On the Glottalization in Tone Productions in the Ruokeng Hui Chinese Dialect
Minghui Zhang, Fang Hu
Speaker-specific Structure in German Voiceless Stop Voice Onset Times
Marc Antony Hullebus, Stephen Tobin, Adamantios Gafos
Creak in the Respiratory Cycle
Kätlin Aare, Pärtel Lippus, Marcin Wlodarczak, Mattias Heldner
Acoustic Analysis of Whispery Voice Disguise in Mandarin Chinese
Cuiling Zhang, Bin Li, Si Chen, Yike Yang
The Zurich Corpus of Vowel and Voice Quality, Version 1.0
Dieter Maurer, Christian d’Heureuse, Heidy Suter, Volker Dellwo, Daniel Friedrichs, Thayabaran Kathiresan
Weighting of Coda Voicing Cues: Glottalisation and Vowel Duration
Joshua Penney, Felicity Cox, Anita Szakay
Revealing Spatiotemporal Brain Dynamics of Speech Production Based on EEG and Eye Movement
Bin Zhao, Jinfeng Huang, Gaoyan Zhang, Jianwu Dang, Minbo Chen, Yingjian Fu, Longbiao Wang
Neural Response Development During Distributional Learning
Natalie Boll-Avetisyan, Jessie S. Nixon, Tomas O. Lentz, Liquan Liu, Sandrien van Ommen, Çağri Çöltekin, Jacolien van Rij
Learning Two Tone Languages Enhances the Brainstem Encoding of Lexical Tones
Akshay Raj Maggu, Wenqing Zong, Vina Law, Patrick C. M. Wong
Perceptual Sensitivity to Spectral Change in Australian English Close Front Vowels: An Electroencephalographic Investigation
Daniel Williams, Paola Escudero, Adamantios Gafos
Effective Acoustic Cue Learning Is Not Just Statistical, It Is Discriminative
Jessie S. Nixon
Analyzing EEG Signals in Auditory Speech Comprehension Using Temporal Response Functions and Generalized Additive Models
Kimberley Mulder, Louis ten Bosch, Lou Boves
Information Encoding by Deep Neural Networks: What Can We Learn?
Louis ten Bosch, Lou Boves
Scalable Factorized Hierarchical Variational Autoencoder Training
Wei-Ning Hsu, James Glass
State Gradients for RNN Memory Analysis
Lyan Verwimp, Hugo van Hamme, Vincent Renkens, Patrick Wambacq
Exploring How Phone Classification Neural Networks Learn Phonetic Information by Visualising and Interpreting Bottleneck Features
Linxue Bai, Philip Weber, Peter Jančovič, Martin Russell
Memory Time Span in LSTMs for Multi-Speaker Source Separation
Jeroen Zegers, Hugo van Hamme
Visualizing Phoneme Category Adaptation in Deep Neural Networks
Odette Scharenborg, Sebastian Tiesmeyer, Mark Hasegawa-Johnson, Najim Dehak
Early Vocabulary Development Through Picture-based Software Solutions
G Kasthuri, Prabha Ramanathan, Hema Murthy, Namita Jacob, Anil Prabhakar
Automatic Detection of Expressiveness in Oral Reading
Kamini Sabu, Kanhaiya Kumar, Preeti Rao
PannoMulloKathan: Voice Enabled Mobile App for Agricultural Commodity Price Dissemination in Bengali Language
Madhab Pal, Rajib Roy, Soma Khan, Milton S. Bepari, Joyanta Basu
Visualizing Punctuation Restoration in Speech Transcripts with Prosograph
Alp Öktem, Mireia Farrús, Antonio Bonafonte
CACTAS - Collaborative Audio Categorization and Transcription for ASR Systems
Mithul Mathivanan, Kinnera Saranu, Abhishek Pandey, Jithendra Vepa
FACTS: A Hierarchical Task-based Control Model of Speech Incorporating Sensory Feedback
Benjamin Parrell, Vikram Ramanarayanan, Srikantan Nagarajan, John Houde
Sensorimotor Response to Tongue Displacement Imagery by Talkers with Parkinson’s Disease
William Katz, Patrick Reidy, Divya Prabhakaran
Automatic Pronunciation Evaluation of Singing
Chitralekha Gupta, Haizhou Li, Ye Wang
Classification of Nonverbal Human Produced Audio Events: A Pilot Study
Rachel E. Bouserhal, Philippe Chabot, Milton Sarria-Paja, Patrick Cardinal, Jérémie Voix
UltraFit: A Speaker-friendly Headset for Ultrasound Recordings in Speech Science
Lorenzo Spreafico, Michael Pucher, Anna Matosova
Articulatory Consequences of Vocal Effort Elicitation Method
Elisabet Eir Cortes, Marcin Wlodarczak, Juraj Šimko
Age-related Effects on Sensorimotor Control of Speech Production
Anne Hermes, Jane Mertens, Doris Mücke
An Ultrasound Study of Gemination in Coronal Stops in Eastern Oromo
Maida Percival, Alexei Kochetov, Yoonjung Kang
Processing Transition Regions of Glottal Stop Substituted /S/ for Intelligibility Enhancement of Cleft Palate Speech
Protima Nomo Sudro, Sishir Kalita, S R Mahadeva Prasanna
Reconstructing Neutral Speech from Tracheoesophageal Speech
Abinay Reddy N, Achuth Rao MV, G. Nisha Meenakshi, Prasanta Kumar Ghosh
Automatic Evaluation of Soft Articulatory Contact for Stuttering Treatment
Keiko Ochi, Koichi Mori, Naomi Sakai
Korean Singing Voice Synthesis Based on an LSTM Recurrent Neural Network
Juntae Kim, Heejin Choi, Jinuk Park, Minsoo Hahn, Sangjin Kim, Jong-Jin Kim
The Trajectory of Voice Onset Time with Vocal Aging
Chen Xuanda, Xiong Ziyu, Hu Jian
The Fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, Task and Baselines
Jon Barker, Shinji Watanabe, Emmanuel Vincent, Jan Trmal
Voices Obscured in Complex Environmental Settings (VOiCES) Corpus
Colleen Richey, Maria A. Barrios, Zeb Armstrong, Chris Bartels, Horacio Franco, Martin Graciarena, Aaron Lawson, Mahesh Kumar Nandwana, Allen Stauffer, Julien van Hout, Paul Gamble, Jeffrey Hetherly, Cory Stephenson, Karl Ni
Building State-of-the-art Distant Speech Recognition Using the CHiME-4 Challenge with a Setup of Speech Enhancement Baseline
Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu, Shinji Watanabe
Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition
Wei-Ning Hsu, Hao Tang, James Glass
Investigating Generative Adversarial Networks Based Speech Dereverberation for Robust Speech Recognition
Ke Wang, Junbo Zhang, Sining Sun, Yujun Wang, Fei Xiang, Lei Xie
Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks
Xuankai Chang, Yanmin Qian, Dong Yu
Weighting Time-Frequency Representation of Speech Using Auditory Saliency for Automatic Speech Recognition
Cong-Thanh Do, Yannis Stylianou
Acoustic Modeling from Frequency Domain Representations of Speech
Pegah Ghahremani, Hossein Hadian, Hang Lv, Daniel Povey, Sanjeev Khudanpur
Non-Uniform Spectral Smoothing for Robust Children's Speech Recognition
Ishwar Chandra Yadav, Avinash Kumar, Syed Shahnawazuddin, Gayadhar Pradhan
Bidirectional Long-Short Term Memory Network-based Estimation of Reliable Spectral Component Locations
Aaron Nicolson, Kuldip K. Paliwal
Speech Emotion Recognition by Combining Amplitude and Phase Information Using Convolutional Neural Network
Lili Guo, Longbiao Wang, Jianwu Dang, Linjuan Zhang, Haotian Guan, Xiangang Li
Bubble Cooperative Networks for Identifying Important Speech Cues
Viet Anh Trinh, Brian McFee, Michael I Mandel
Real-Time Scoring of an Oral Reading Assessment on Mobile Devices
Jian Cheng
A Deep Learning Approach to Assessing Non-native Pronunciation of English Using Phone Distances
Konstantinos Kyriakopoulos, Kate Knill, Mark Gales
Paired Phone-Posteriors Approach to ESL Pronunciation Quality Assessment
Yujia Xiao, Frank Soong, Wenping Hu
Investigating the Role of L1 in Automatic Pronunciation Evaluation of L2 Speech
Ming Tu, Anna Grabek, Julie Liss, Visar Berisha
Impact of ASR Performance on Free Speaking Language Assessment
Kate Knill, Mark Gales, Konstantinos Kyriakopoulos, Andrey Malinin, Anton Ragni, Yu Wang, Andrew Caines
Automatic Miscue Detection Using RNN Based Models with Data Augmentation
Yoon Seok Hong, Kyung Seo Ki, Gahgene Gweon
A Study of Objective Measurement of Comprehensibility through Native Speakers' Shadowing of Learners' Utterances
Yusuke Inoue, Suguru Kabashima, Daisuke Saito, Nobuaki Minematsu, Kumi Kanamura, Yutaka Yamauchi
Factorized Deep Neural Network Adaptation for Automatic Scoring of L2 Speech in English Speaking Tests
Dean Luo, Chunxiao Zhang, Linzhong Xia, Lixin Wang
On the Difficulties of Automatic Speech Recognition for Kindergarten-Aged Children
Gary Yeung, Abeer Alwan
Improved Acoustic Modelling for Automatic Literacy Assessment of Children
Mauro Nicolao, Michiel Sanders, Thomas Hain
Anomaly Detection Approach for Pronunciation Verification of Disordered Speech Using Speech Attribute Features
Mostafa Shahin, Beena Ahmed, Jim X. Ji, Kirrie Ballard
Effectiveness of Voice Quality Features in Detecting Depression
Amber Afshan, Jinxi Guo, Soo Jin Park, Vijay Ravi, Jonathan Flint, Abeer Alwan
Fusing Text-dependent Word-level i-Vector Models to Screen ‘at Risk’ Child Speech
Prasanna Kothalkar, Johanna Rudolph, Christine Dollaghan, Jennifer McGlothlin, Thomas Campbell, John H.L. Hansen
Testing Paradigms for Assistive Hearing Devices in Diverse Acoustic Environments
Ram Charan Chandra Shekar, Hussnain Ali, John H.L. Hansen
Detection of Dementia from Responses to Atypical Questions Asked by Embodied Conversational Agents
Tsuyoki Ujiro, Hiroki Tanaka, Hiroyoshi Adachi, Hiroaki Kazui, Manabu Ikeda, Takashi Kudo, Satoshi Nakamura
Acoustic Features Associated with Sustained Vowel and Continuous Speech Productions by Chinese Children with Functional Articulation Disorders
Wang Zhang, Xiangquan Gui, Tianqi Wang, Manwa Ng, Feng Yang, Lan Wang, Nan Yan
Estimation of Hypernasality Scores from Cleft Lip and Palate Speech
C M Vikram, Ayush Tripathi, Sishir Kalita, S R Mahadeva Prasanna
Detecting Alzheimer’s Disease Using Gated Convolutional Neural Network from Audio Data
Tifani Warnita, Nakamasa Inoue, Koichi Shinoda
Automatic Detection of Orofacial Impairment in Stroke
Andrea Bandini, Jordan Green, Brian Richburg, Yana Yunusova
Detecting Depression with Audio/Text Sequence Modeling of Interviews
Tuka Al Hanai, Mohammad Ghassemi, James Glass
Discourse Marker Detection for Hesitation Events on Mandarin Conversation
Yu-Wun Wang, Hen-Hsen Huang, Kuan-Yu Chen, Hsin-Hsi Chen
Acoustic and Perceptual Characteristics of Mandarin Speech in Homosexual and Heterosexual Male Speakers
Puyang Geng, Wentao Gu, Hiroya Fujisaki
Automatic Question Detection from Acoustic and Phonetic Features Using Feature-wise Pre-training
Atsushi Ando, Reine Asakawa, Ryo Masumura, Hosana Kamiyama, Satoshi Kobashikawa, Yushi Aono
Improving Response Time of Active Speaker Detection Using Visual Prosody Information Prior to Articulation
Fasih Haider, Saturnino Luz, Carl Vogel, Nick Campbell
Audio-Visual Prediction of Head-Nod and Turn-Taking Events in Dyadic Interactions
Bekir Berker Türker, Engin Erzin, Yücel Yemez, Metin Sezgin
Analyzing Effect of Physical Expression on English Proficiency for Multimodal Computer-Assisted Language Learning
Haoran Wu, Yuya Chiba, Takashi Nose, Akinori Ito
Analysis of the Effect of Speech-Laugh on Speaker Recognition System
Sri Harsha Dumpala, Ashish Panda, Sunil Kumar Kopparapu
Vocal Biomarkers for Cognitive Performance Estimation in a Working Memory Task
Jennifer Sloboda, Adam Lammert, James Williamson, Christopher Smalt, Daryush D. Mehta, COL Ian Curry, Kristin Heaton, Jeffrey Palmer, Thomas Quatieri
Lexical and Acoustic Deep Learning Model for Personality Recognition
Guozhen An, Rivka Levitan
Layer Trajectory LSTM
Jinyu Li, Changliang Liu, Yifan Gong
Semi-tied Units for Efficient Gating in LSTM and Highway Networks
Chao Zhang, Philip Woodland
Gaussian Process Neural Networks for Speech Recognition
Max W. Y. Lam, Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Rongfeng Su, Xunying Liu, Helen Meng
Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition
Jian Tang, Yan Song, Lirong Dai, Ian McLoughlin
Gated Recurrent Unit Based Acoustic Modeling with Future Context
Jie Li, Xiaorui Wang, Yuanyuan Zhao, Yan Li
Output-Gate Projected Gated Recurrent Unit for Speech Recognition
Gaofeng Cheng, Daniel Povey, Lu Huang, Ji Xu, Sanjeev Khudanpur, Yonghong Yan
Performance Analysis of the 2017 NIST Language Recognition Evaluation
Seyed Omid Sadjadi, Timothee Kheyrkhah, Craig Greenberg, Elliot Singer, Douglas Reynolds, Lisa Mason, Jaime Hernandez-Cordero
Using Deep Neural Networks for Identification of Slavic Languages from Acoustic Signal
Lukas Mateju, Petr Cerva, Jindrich Zdansky, Radek Safarik
Adding New Classes without Access to the Original Training Data with Applications to Language Identification
Hagai Taitelbaum, Ehud Ben-Reuven, Jacob Goldberger
Feature Representation of Short Utterances Based on Knowledge Distillation for Spoken Language Identification
Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai
Sub-band Envelope Features Using Frequency Domain Linear Prediction for Short Duration Language Identification
Sarith Fernando, Vidhyasaharan Sethu, Eliathamby Ambikairajah
Effectiveness of Single-Channel BLSTM Enhancement for Language Identification
Peter Sibbern Frederiksen, Jesús Villalba, Shinji Watanabe, Zheng-Hua Tan, Najim Dehak
Articulation Rate as a Speaker Discriminant in British English
Erica Gold
Truncation and Compression in Southern German and Australian English
Jenny Yu, Katharina Zahner
Prominence-based Evaluation of L2 Prosody
Heini Kallio, Antti Suni, Päivi Virkkunen, Juraj Šimko
Length Contrast and Covarying Features: Whistled Speech as a Case Study
Rachid Ridouane, Giuseppina Turco, Julien Meyer
Information Structure, Affect and Prenuclear Prominence in American English
Eleanor Chodroff, Jennifer Cole
Effects of User Controlled Speech Rate on Intelligibility in Noisy Environments
John S. Novak III, Robert V. Kenyon
Binaural Speech Intelligibility Estimation Using Deep Neural Networks
Kazuhiro Kondo, Kazuya Taira, Yosuke Kobayashi
Multi-resolution Gammachirp Envelope Distortion Index for Intelligibility Prediction of Noisy Speech
Katsuhiko Yamamoto, Toshio Irino, Narumi Ohashi, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani
Speech Intelligibility Enhancement Based on a Non-causal Wavenet-like Model
Muhammed Shifas PV, Vassilis Tsiaras, Yannis Stylianou
Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model Based on BLSTM
Szu-wei Fu, Yu Tsao, Hsin-Te Hwang, Hsin-Min Wang
Global SNR Estimation of Speech Signals Using Entropy and Uncertainty Estimates from Dropout Networks
Rohith Aralikatti, Dilip Kumar Margam, Tanay Sharma, Abhinav Thanda, Shankar Venkatesan
Detecting Packet-Loss Concealment Using Formant Features and Decision Tree Learning
Gabriel Mittag, Sebastian Möller
UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions
Aciel Eshky, Manuel Sam Ribeiro, Joanne Cleland, Korin Richmond, Zoe Roxburgh, James M Scobbie, Alan Wrench
Detecting Signs of Dementia Using Word Vector Representations
Bahman Mirheidari, Daniel Blackburn, Traci Walker, Annalena Venneri, Markus Reuber, Heidi Christensen
Classification of Huntington Disease Using Acoustic and Lexical Features
Matthew Perez, Wenyu Jin, Duc Le, Noelle Carlozzi, Praveen Dayalu, Angela Roberts, Emily Mower Provost
The PRIORI Emotion Dataset: Linking Mood to Emotion Detected In-the-Wild
Soheil Khorram, Mimansa Jaiswal, John Gideon, Melvin McInnis, Emily Mower Provost
Language Features for Automated Evaluation of Cognitive Behavior Psychotherapy Sessions
Nikolaos Flemotomos, Victor Martinez, James Gibson, David Atkins, Torrey Creed, Shrikanth Narayanan
Automatic Early Detection of Amyotrophic Lateral Sclerosis from Intelligible Speech Using Convolutional Neural Networks
Kwanghoon An, Myungjong Kim, Kristin Teplansky, Jordan Green, Thomas Campbell, Yana Yunusova, Daragh Heitzman, Jun Wang
A Study of Lexical and Prosodic Cues to Segmentation in a Hindi-English Code-switched Discourse
Preeti Rao, Mugdha Pandya, Kamini Sabu, Kanhaiya Kumar, Nandini Bondale
Building a Unified Code-Switching ASR System for South African Languages
Emre Yılmaz, Astik Biswas, Ewald van der Westhuizen, Febe de Wet, Thomas Niesler
Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition
Pengcheng Guo, Haihua Xu, Lei Xie, Eng Siong Chng
Acoustic and Textual Data Augmentation for Improved ASR of Code-Switching Speech
Emre Yılmaz, Henk van den Heuvel, David van Leeuwen
The Role of Cognate Words, POS Tags and Entrainment in Code-Switching
Victor Soto, Nishmar Cestero, Julia Hirschberg
Homophone Identification and Merging for Code-switched Speech Recognition
Brij Mohan Lal Srivastava, Sunayana Sitaram
Code-switching in Indic Speech Synthesisers
Anju Leela Thomas, Anusha Prakash, Arun Baby, Hema Murthy
A Novel Approach for Effective Recognition of the Code-Switched Data on Monolingual Language Model
Sreeram Ganji, Rohit Sinha
Hierarchical Accent Determination and Application in a Large Scale ASR System
Ramya Viswanathan, Periyasamy Paramasivam, Jithendra Vepa
Toward Scalable Dialog Technology for Conversational Language Learning: Case Study of the TOEFL® MOOC
Vikram Ramanarayanan, David Pautler, Patrick Lange, Eugene Tsuprun, Rutuja Ubale, Keelan Evanini, David Suendermann-Oeft
Machine Learning Powered Data Platform for High-Quality Speech and NLP Workflows
João Freitas, Jorge Ribeiro, Daan Baldwijns, Sara Oliveira, Daniela Braga
Fully Automatic Speaker Separation System, with Automatic Enrolling of Recurrent Speakers
Raphael Cohen, Orgad Keller, Jason Levy, Russell Levy, Micha Breakstone, Amit Ashkenazi
Online Speech Translation System for Tamil
Madhavaraj Ayyavu, Shiva Kumar H R, Ramakrishnan A G
Unsupervised Vocal Tract Length Warped Posterior Features for Non-Parallel Voice Conversion
Nirmesh Shah, Maulik C. Madhavi, Hemant Patil
Voice Conversion with Conditional SampleRNN
Cong Zhou, Michael Horgan, Vivek Kumar, Cristina Vasco, Dan Darcy
A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder
Berrak Sisman, Mingyang Zhang, Haizhou Li
WaveNet Vocoder with Limited Training Data for Voice Conversion
Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Ming Zhou, Li-Rong Dai
Collapsed Speech Segment Detection and Suppression for WaveNet Vocoder
Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing, Tomoki Toda
High-quality Voice Conversion Using Spectrogram-Based WaveNet Vocoder
Kuan Chen, Bo Chen, Jiahao Lai, Kai Yu
Spanish Statistical Parametric Speech Synthesis Using a Neural Vocoder
Antonio Bonafonte, Santiago Pascual, Georgina Dorca
Experiments with Training Corpora for Statistical Text-to-speech Systems
Monika Podsiadło, Victor Ungureanu
Multi-task WaveNet: A Multi-task Generative Model for Statistical Parametric Speech Synthesis without Fundamental Frequency Conditions
Yu Gu, Yongguo Kang
Speaker-independent Raw Waveform Model for Glottal Excitation
Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, Manu Airaksinen, Junichi Yamagishi, Paavo Alku
A New Glottal Neural Vocoder for Speech Synthesis
Yang Cui, Xi Wang, Lei He, Frank K. Soong
Exemplar-based Speech Waveform Generation
Oliver Watts, Cassia Valentini-Botinhao, Felipe Espic, Simon King
Frequency Domain Variants of Velvet Noise and Their Application to Speech Processing and Synthesis
Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda, Toshio Irino
Joint Learning of Interactive Spoken Content Retrieval and Trainable User Simulator
Pei-Hung Chung, Kuan Tung, Ching-Lun Tai, Hung-yi Lee
Attention-based End-to-End Models for Small-Footprint Keyword Spotting
Changhao Shan, Junbo Zhang, Yujun Wang, Lei Xie
Prediction of Aesthetic Elements in Karnatic Music: A Machine Learning Approach
Ragesh Rajan M, Ashwin Vijayakumar, Deepu Vijayasenan
Topic and Keyword Identification for Low-resourced Speech Using Cross-Language Transfer Learning
Wenda Chen, Mark Hasegawa-Johnson, Nancy F. Chen
Automatic Speech Recognition and Topic Identification from Speech for Almost-Zero-Resource Languages
Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar, Jan Trmal, Zhongqiang Huang, Najim Dehak, Sanjeev Khudanpur
Play Duration Based User-Entity Affinity Modeling in Spoken Dialog System
Bo Xiao, Nicholas Monath, Shankar Ananthakrishnan, Abishek Ravi
Empirical Analysis of Score Fusion Application to Combined Neural Networks for Open Vocabulary Spoken Term Detection
Shi-wook Lee, Kazuyo Tanaka, Yoshiaki Itoh
Phonological Posterior Hashing for Query by Example Spoken Term Detection
Afsaneh Asaei, Dhananjay Ram, Hervé Bourlard
Term Extraction via Neural Sequence Labeling: A Comparative Evaluation of Strategies Using Recurrent Neural Networks
Maren Kucza, Jan Niehues, Thomas Zenkel, Alex Waibel, Sebastian Stüker
Semi-supervised Learning for Information Extraction from Dialogue
Anjuli Kannan, Kai Chen, Diana Jaunzeikare, Alvin Rajkomar
Slot Filling with Delexicalized Sentence Generation
Youhyun Shin, Kang Min Yoo, Sang-goo Lee
Music Genre Recognition Using Deep Neural Networks and Transfer Learning
Deepanway Ghosal, Maheshkumar H. Kolekar
Efficient Voice Trigger Detection for Low Resource Hardware
Siddharth Sigtia, Rob Haynes, Hywel Richards, Erik Marchi, John Bridle
A Novel Normalization Method for Autocorrelation Function for Pitch Detection and for Speech Activity Detection
Qiguang Lin, Yiwen Shao
Estimation of the Vocal Tract Length of Vowel Sounds Based on the Frequency of the Significant Spectral Valley
TV Ananthapadmanabha, Ramakrishnan A G
Deep Learning Techniques for Koala Activity Detection
Ivan Himawan, Michael Towsey, Bradley Law, Paul Roe
Glottal Closure Instant Detection from Speech Signal Using Voting Classifier and Recursive Feature Elimination
Jindřich Matoušek, Daniel Tihelka
Assessing Speaker Engagement in 2-Person Debates: Overlap Detection in United States Presidential Debates
Midia Yousefi, Navid Shokouhi, John H.L. Hansen
All-Conv Net for Bird Activity Detection: Significance of Learned Pooling
Arjun Pankajakshan, Anshul Thakur, Daksh Thapar, Padmanabhan Rajan, Aditya Nigam
Deep Convex Representations: Feature Representations for Bioacoustics Classification
Anshul Thakur, Vinayak Abrol, Pulkit Sharma, Padmanabhan Rajan
Detection of Glottal Excitation Epochs in Speech Signal Using Hilbert Envelope
Hirak Dasgupta, Prem C. Pandey, K S Nataraj
Analyzing Thai Tone Distribution through Functional Data Analysis
Hong Zhang
Articulatory Feature Classification Using Convolutional Neural Networks
Danny Merkx, Odette Scharenborg
A New Frequency Coverage Metric and a New Subband Encoding Model, with an Application in Pitch Estimation
Shoufeng Lin
Improved Epoch Extraction from Telephonic Speech Using Chebfun and Zero Frequency Filtering
B Ganga Gowri, K P Soman, D Govind
An Empirical Analysis of the Correlation of Syntax and Prosody
Arne Köhn, Timo Baumann, Oskar Dörfler
Analysing the Focus of a Hierarchical Attention Network: The Importance of Enjambments When Classifying Post-modern Poetry
Timo Baumann, Hussein Hussein, Burkhard Meyer-Sickendiek
Language-Dependent Melody Embeddings
Daniil Kocharov, Alla Menshikova
Stress Distribution of Given Information in Chinese Reading Texts
Yuan Jia, Xiaoxiao Ma
Acoustic-prosodic Entrainment in Structural Metadata Events
Vera Cabarrão, Fernando Batista, Helena Moniz, Isabel Trancoso, Ana Isabel Mata
Formant Measures of Vowels Adjacent to Alveolar and Retroflex Consonants in Arrernte: Stressed and Unstressed Position
Marija Tabain, Richard Beare, Andrew Butcher
Automatic Assessment of L2 English Word Prosody Using Weighted Distances of F0 and Intensity Contours
Quy-Thao Truong, Tsuneo Kato, Seiichi Yamamoto
Homogeneity vs Heterogeneity in Indian English: Investigating Influences of L1 on f0 Range
Olga Maxwell, Elinor Payne, Rosey Billington
Emotional Prosody Perception in Mandarin-speaking Congenital Amusics
Yixin Zhang, Tianzhu Geng, Jinsong Zhang
Cultural Differences in Pattern Matching: Multisensory Recognition of Socio-affective Prosody
Takaaki Shochi, Jean-Luc Rouas, Marine Guerry, Donna Erickson
ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, Tsubasa Ochiai
A GPU-based WFST Decoder with Exact Lattice Generation
Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
Automatic Speech Recognition System Development in the "Wild"
Anton Ragni, Mark Gales
Semantic Lattice Processing in Contextual Automatic Speech Recognition for Google Assistant
Leonid Velikovich, Ian Williams, Justin Scheiner, Petar Aleksic, Pedro Moreno, Michael Riley
Contextual Speech Recognition in End-to-end Neural Network Systems Using Beam Search
Ian Williams, Anjuli Kannan, Petar Aleksic, David Rybach, Tara Sainath
Forward-Backward Attention Decoder
Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara
Learning Discriminative Features for Speaker Identification and Verification
Sarthak Yadav, Atul Rai
Triplet Loss Based Cosine Similarity Metric Learning for Text-independent Speaker Recognition
Sergey Novoselov, Vadim Shchemelinin, Andrey Shulipa, Alexandr Kozlov, Ivan Kremnev
Speaker Embedding Extraction with Phonetic Information
Yi Liu, Liang He, Jia Liu, Michael T. Johnson
Attentive Statistics Pooling for Deep Speaker Embedding
Koji Okabe, Takafumi Koshinaka, Koichi Shinoda
Robust and Discriminative Speaker Embedding via Intra-Class Distance Variance Regularization
Nam Le, Jean-Marc Odobez
Deep Discriminative Embeddings for Duration Robust Speaker Verification
Na Li, Deyi Tuo, Dan Su, Zhifeng Li, Dong Yu
Impact of Different Speech Types on Listening Effort
Olympia Simantiraki, Martin Cooke, Simon King
Who Are You Listening to? Towards a Dynamic Measure of Auditory Attention to Speech-on-speech
Moïra-Phoebé Huet, Christophe Micheyl, Etienne Gaudrain, Etienne Parizet
Investigating the Role of Familiar Face and Voice Cues in Speech Processing in Noise
Jeesun Kim, Sonya Karisma, Vincent Aubanel, Chris Davis
The Conversation Continues: the Effect of Lyrics and Music Complexity of Background Music on Spoken-Word Recognition
Odette Scharenborg, Martha Larson
Loud and Shouted Speech Perception at Variable Distances in a Forest
Julien Meyer, Fanny Meunier, Laure Dentel, Noelia Do Carmo Blanco, Frédéric Sèbe
Phoneme Resistance and Phoneme Confusion in Noise: Impact of Dyslexia
Noelia Do Carmo Blanco, Julien Meyer, Michel Hoen, Fanny Meunier
Conditional End-to-End Audio Transforms
Albert Haque, Michelle Guo, Prateek Verma
Detection of Glottal Closure Instants in Degraded Speech Using Single Frequency Filtering Analysis
Gunnam Aneeja, Sudarsana Reddy Kadiri, Bayya Yegnanarayana
Tone Recognition Using Lifters and CTC
Loren Lugosch, Vikrant Singh Tomar
Epoch Extraction from Pathological Children Speech Using Single Pole Filtering Approach
C M Vikram, S R Mahadeva Prasanna
Automated Classification of Vowel-Gesture Parameters Using External Broadband Excitation
Balamurali B T, Jer-Ming Chen
Estimation of Fundamental Frequency from Singing Voice Using Harmonics of Impulse-like Excitation Source
Sudarsana Reddy Kadiri, Bayya Yegnanarayana
Investigating the Effect of Audio Duration on Dementia Detection Using Acoustic Features
Jochen Weiner, Miguel Angrick, Srinivasan Umesh, Tanja Schultz
An Interlocutor-Modulated Attentional LSTM for Differentiating between Subgroups of Autism Spectrum Disorder
Yun-Shao Lin, Susan Shur-Fen Gau, Chi-Chun Lee
Recognition of Echolalic Autistic Child Vocalisations Utilising Convolutional Recurrent Neural Networks
Shahin Amiriparian, Alice Baird, Sahib Julka, Alyssa Alcorn, Sandra Ottl, Sunčica Petrović, Eloise Ainger, Nicholas Cummins, Björn Schuller
Modeling Interpersonal Influence of Verbal Behavior in Couples Therapy Dyadic Interactions
Sandeep Nallan Chakravarthula, Brian Baucom, Panayiotis Georgiou
Computational Modeling of Conversational Humor in Psychotherapy
Anil Ramakrishna, Timothy Greer, David Atkins, Shrikanth Narayanan
Multimodal I-vectors to Detect and Evaluate Parkinson's Disease
Nicanor Garcia, Juan Camilo Vásquez Correa, Juan Rafael Orozco-Arroyave, Elmar Nöth
Overview of the 2018 Spoken CALL Shared Task
Claudia Baur, Andrew Caines, Cathy Chua, Johanna Gerlach, Mengjie Qian, Manny Rayner, Martin Russell, Helmer Strik, Xizi Wei
The CSU-K Rule-Based System for the 2nd Edition Spoken CALL Shared Task
Dominik Jülg, Mario Kunstek, Cem Philipp Freimoser, Kay Berkling, Mengjie Qian
Liulishuo's System for the Spoken CALL Shared Task 2018
Huy Nguyen, Lei Chen, Ramon Prieto, Chuan Wang, Yang Liu
An Optimization Based Approach for Solving Spoken CALL Shared Task
Mohammad Ateeq, Abualsoud Hanani, Aziz Qaroush
The University of Birmingham 2018 Spoken CALL Shared Task Systems
Mengjie Qian, Xizi Wei, Peter Jančovič, Martin Russell
Improvements to an Automated Content Scoring System for Spoken CALL Responses: the ETS Submission to the Second Spoken CALL Shared Task
Keelan Evanini, Matthew Mulholland, Rutuja Ubale, Yao Qian, Robert Pugh, Vikram Ramanarayanan, Aoife Cahill
Extracting Speaker’s Gender, Accent, Age and Emotional State from Speech
Nagendra Goel, Mousmita Sarma, Tejendra Kushwah, Dharmesh Agarwal, Zikra Iqbal, Surbhi Chauhan
Determining Speaker Location from Speech in a Practical Environment
BHVS Narayanamurthy, JV Satyanarayana, Bayya Yegnanarayana
An Automatic Speech Transcription System for Manipuri Language
Tanvina Patel, Krishna DN, Noor Fathima, Nisar Shah, Mahima C, Deepak Kumar, Anuroop Iyengar
SPIRE-SST: An Automatic Web-based Self-learning Tool for Syllable Stress Tutoring (SST) to the Second Language Learners
Chiranjeevi Yarra, Anand P A, Kausthubha N K, Prasanta Kumar Ghosh
Glotto Vibrato Graph: A Device and Method for Recording, Analysis and Visualization of Glottal Activity
Kishalay Chakraborty, Senjam Shantirani Devi, Sanjeevan Devnath, S R Mahadeva Prasanna, Priyankoo Sarmah
Multi-Modal Data Augmentation for End-to-end ASR
Adithya Renduchintala, Shuoyang Ding, Matthew Wiesner, Shinji Watanabe
Multi-task Learning with Augmentation Strategy for Acoustic-to-word Attention-based Encoder-decoder Speech Recognition
Takafumi Moriya, Sei Ueno, Yusuke Shinohara, Marc Delcroix, Yoshikazu Yamaguchi, Yushi Aono
Training Augmentation with Adversarial Examples for Robust Speech Recognition
Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang, Lei Xie
Data Augmentation Improves Recognition of Foreign Accented Speech
Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Alexander Sorin, Gakuto Kurata
Speaker Adaptive Training and Mixup Regularization for Neural Network Acoustic Models in Automatic Speech Recognition
Natalia Tomashenko, Yuri Khokhlov, Yannick Estève
Neural Language Codes for Multilingual Acoustic Models
Markus Müller, Sebastian Stüker, Alex Waibel
Encoder Transfer for Attention-based Acoustic-to-word Speech Recognition
Sei Ueno, Takafumi Moriya, Masato Mimura, Shinsuke Sakai, Yusuke Shinohara, Yoshikazu Yamaguchi, Yushi Aono, Tatsuya Kawahara
Empirical Evaluation of Speaker Adaptation on DNN Based Acoustic Model
Ke Wang, Junbo Zhang, Yujun Wang, Lei Xie
Improving DNNs Trained with Non-Native Transcriptions Using Knowledge Distillation and Target Interpolation
Amit Das, Mark Hasegawa-Johnson
Improving Cross-Lingual Knowledge Transferability Using Multilingual TDNN-BLSTM with Language-Dependent Pre-Final Layer
Siyuan Feng, Tan Lee
Auxiliary Feature Based Adaptation of End-to-end ASR Systems
Marc Delcroix, Shinji Watanabe, Atsunori Ogawa, Shigeki Karita, Tomohiro Nakatani
Leveraging Native Language Information for Improved Accented Speech Recognition
Shahram Ghorbani, John H.L. Hansen
Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning
Abhinav Jain, Minali Upreti, Preethi Jyothi
Fast Language Adaptation Using Phonological Information
Sibo Tong, Philip N. Garner, Hervé Bourlard
Naturalness Improvement Algorithm for Reconstructed Glossectomy Patient's Speech Using Spectral Differential Modification in Voice Conversion
Hiroki Murakami, Sunao Hara, Masanobu Abe, Masaaki Sato, Shogo Minagi
Audio-visual Voice Conversion Using Deep Canonical Correlation Analysis for Deep Bottleneck Features
Satoshi Tamura, Kento Horio, Hajime Endo, Satoru Hayamizu, Tomoki Toda
An Investigation of Convolution Attention Based Models for Multilingual Speech Synthesis of Indian Languages
Pallavi Baljekar, SaiKrishna Rallabandi, Alan W Black
The Effect of Real-Time Constraints on Automatic Speech Animation
Danny Websdale, Sarah Taylor, Ben Milner
Joint Learning of Facial Expression and Head Pose from Speech
David Greenwood, Iain Matthews, Stephen Laycock
Acoustic-dependent Phonemic Transcription for Text-to-speech Synthesis
Kévin Vythelingum, Yannick Estève, Olivier Rosec
Multimodal Speech Synthesis Architecture for Unsupervised Speaker Adaptation
Hieu-Thi Luong, Junichi Yamagishi
Articulatory-to-speech Conversion Using Bi-directional Long Short-term Memory
Fumiaki Taguchi, Tokihiko Kaburagi
Implementation of Respiration in Articulatory Synthesis Using a Pressure-Volume Lung Model
Keisuke Tanihara, Shogo Yonekura, Yasuo Kuniyoshi
Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis
Xiao Zhou, Zhen-Hua Ling, Zhi-Ping Zhou, Li-Rong Dai
Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer
Ruibo Fu, Jianhua Tao, Yibin Zheng, Zhengqi Wen
DNN-based Speech Synthesis for Small Data Sets Considering Bidirectional Speech-Text Conversion
Kentaro Sone, Toru Nakashika
A Weighted Superposition of Functional Contours Model for Modelling Contextual Prominence of Elementary Prosodic Contours
Branislav Gerazov, Gérard Bailly, Yi Xu
LSTBM: A Novel Sequence Representation of Speech Spectra Using Restricted Boltzmann Machine with Long Short-Term Memory
Toru Nakashika
Should Code-switching Models Be Asymmetric?
Barbara E. Bullock, Gualberto Guzmán, Jacqueline Serigos, Almeida Jacqueline Toribio
Cross-language Perception of Mandarin Lexical Tones by Mongolian-speaking Bilinguals in the Inner Mongolia Autonomous Region, China
Kimiko Tsukada, Yu Rong
Automatically Measuring L2 Speech Fluency without the Need of ASR: A Proof-of-concept Study with Japanese Learners of French
Lionel Fontan, Maxime Le Coz, Sylvain Detey
Analysis of L2 Learners’ Progress of Distinguishing Mandarin Tone 2 and Tone 3
Yue Sun, Win Thuzar Kyaw, Jinsong Zhang, Yoshinori Sagisaka
Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis
Xu Li, Shaoguang Mao, Xixin Wu, Kun Li, Xunying Liu, Helen Meng
Wuxi Speakers’ Production and Perception of Coda Nasals in Mandarin
Lei Wang, Jie Cui, Ying Chen
The Diphthongs of Formal Nigerian English: A Preliminary Acoustic Analysis
Natalia Dyrenko, Robert Fuchs
Characterizing Rhythm Differences between Strong and Weak Accented L2 Speech
Chris Davis, Jeesun Kim
Analysis of Phone Errors Attributable to Phonological Effects Associated With Language Acquisition Through Bottleneck Feature Visualisations
Eva Fringi, Martin Russell
Category Similarity in Multilingual Pronunciation Training
Jacques Koreman
Talker Diarization in the Wild: the Case of Child-centered Daylong Audio-recordings
Alejandrina Cristia, Shobhana Ganesh, Marisa Casillas, Sriram Ganapathy
Automated Classification of Children’s Linguistic versus Non-Linguistic Vocalisations
Zixing Zhang, Alejandrina Cristia, Anne Warlaumont, Björn Schuller
Pitch Characteristics of L2 English Speech by Chinese Speakers: A Large-scale Study
Jiahong Yuan, Qiusi Dong, Fei Wu, Huan Luan, Xiaofei Yang, Hui Lin, Yang Liu
Dual Language Models for Code Switched Speech Recognition
Saurabh Garg, Tanmay Parekh, Preethi Jyothi
Multilingual Neural Network Acoustic Modelling for ASR of Under-Resourced English-isiZulu Code-Switched Speech
Astik Biswas, Febe de Wet, Ewald van der Westhuizen, Emre Yılmaz, Thomas Niesler
Fast ASR-free and Almost Zero-resource Keyword Spotting Using DTW and CNNs for Humanitarian Monitoring
Raghav Menon, Herman Kamper, John Quinn, Thomas Niesler
Text-Dependent Speech Enhancement for Small-Footprint Robust Keyword Detection
Meng Yu, Xuan Ji, Yi Gao, Lianwu Chen, Jie Chen, Jimeng Zheng, Dan Su, Dong Yu
Improved ASR for Under-resourced Languages through Multi-task Learning with Acoustic Landmarks
Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson, Deming Chen
Cross-language Phoneme Mapping for Low-resource Languages: An Exploration of Benefits and Trade-offs
Nick K Chibuye, Todd Rosenstock, Brian DeRenzi
User-centric Evaluation of Automatic Punctuation in ASR Closed Captioning
Máté Ákos Tündik, György Szaszák, Gábor Gosztolya, András Beke
Punctuation Prediction Model for Conversational Speech
Piotr Żelasko, Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel, Najim Dehak
BUT OpenSAT 2017 Speech Recognition System
Martin Karafiát, Murali Karthick Baskar, Igor Szöke, Vladimír Malenovský, Karel Veselý, František Grézl, Lukáš Burget, Jan Černocký
Visual Recognition of Continuous Cued Speech Using a Tandem CNN-HMM Approach
Li Liu, Thomas Hueber, Gang Feng, Denis Beautemps
Building Large-vocabulary Speaker-independent Lipreading Systems
Kwanchiva Thangthai, Richard Harvey
CRIM's System for the MGB-3 English Multi-Genre Broadcast Media Transcription
Vishwa Gupta, Gilles Boulianne
Sampling Strategies in Siamese Networks for Unsupervised Speech Representation Learning
Rachid Riad, Corentin Dancette, Julien Karadayi, Neil Zeghidour, Thomas Schatz, Emmanuel Dupoux
Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting
Mengzhe Chen, ShiLiang Zhang, Ming Lei, Yong Liu, Haitao Yao, Jie Gao
Multilingual Bottleneck Features for Subword Modeling in Zero-resource Languages
Enno Hermann, Sharon Goldwater
Exploiting Speaker and Phonetic Diversity of Mismatched Language Resources for Unsupervised Subword Modeling
Siyuan Feng, Tan Lee
Unsupervised Word Segmentation from Speech with Attention
Pierre Godard, Marcely Zanon Boito, Lucas Ondel, Alexandre Berard, François Yvon, Aline Villavicencio, Laurent Besacier
Learning Word Embeddings: Unsupervised Methods for Fixed-size Representations of Variable-length Speech Segments
Nils Holzenberger, Mingxing Du, Julien Karadayi, Rachid Riad, Emmanuel Dupoux
Full Bayesian Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery
Thomas Glarner, Patrick Hanebrink, Janek Ebbers, Reinhold Haeb-Umbach
Unspeech: Unsupervised Speech Context Embeddings
Benjamin Milde, Chris Biemann
Impact of Aliasing on Deep CNN-Based End-to-End Acoustic Models
Yuan Gong, Christian Poellabauer
Keyword Based Speaker Localization: Localizing a Target Speaker in a Multi-speaker Environment
Sunit Sivasankaran, Emmanuel Vincent, Dominique Fohr
End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction
Zhong-Qiu Wang, Jonathan Le Roux, DeLiang Wang, John Hershey
PhaseNet: Discretized Phase Modeling with Deep Neural Networks for Audio Source Separation
Naoya Takahashi, Purvi Agrawal, Nabarun Goswami, Yuki Mitsufuji
Integrating Spectral and Spatial Features for Multi-Channel Speaker Separation
Zhong-Qiu Wang, DeLiang Wang
DNN Driven Speaker Independent Audio-Visual Mask Estimation for Speech Separation
Mandar Gogate, Ahsan Adeel, Ricard Marxer, Jon Barker, Amir Hussain
Exploring Temporal Reduction in Dialectal Spanish: A Large-scale Study of Lenition of Voiced Stops and Coda-s
Ioana Vasilescu, Nidia Hernandez, Bianca Vieru, Lori Lamel
Dialect-geographical Acoustic-Tonetics: Five Disyllabic Tone Sandhi Patterns in Cognate Words from the Wu Dialects of Zhèjiāng Province
Phil Rose
Regional Variation of /r/ in Swiss German Dialects
Adrian Leemann, Stephan Schmid, Dieter Studer-Joho, Marie-José Kolly
Variation in the FACE Vowel across West Yorkshire: Implications for Forensic Speaker Comparisons
Kate Earnshaw, Erica Gold
The ‘West Yorkshire Regional English Database’: Investigations into the Generalizability of Reference Populations for Forensic Speaker Comparison Casework
Erica Gold, Sula Ross, Kate Earnshaw
Studying Vowel Variation in French-Algerian Arabic Code-switched Speech
Jane Wottawa, Amazouz Djegdjiga, Martine Adda-Decker, Lori Lamel
Fearless Steps: Apollo-11 Corpus Advancements for Speech Technologies from Earth to the Moon
John H.L. Hansen, Abhijeet Sangwan, Aditya Joglekar, Ahmet E. Bulut, Lakshmish Kaushik, Chengzhu Yu
A Knowledge Driven Structural Segmentation Approach for Play-Talk Classification During Autism Assessment
Manoj Kumar, Pooja Chebolu, So Hyun Kim, Kassandra Martinez, Catherine Lord, Shrikanth Narayanan
An Open Source Emotional Speech Corpus for Human Robot Interaction Applications
Jesin James, Li Tian, Catherine Inez Watson
Speech Database and Protocol Validation Using Waveform Entropy
Itshak Lapidot, Héctor Delgado, Massimiliano Todisco, Nicholas Evans, Jean-François Bonastre
A French-Spanish Multimodal Speech Communication Corpus Incorporating Acoustic Data, Facial, Hands and Arms Gestures Information
Lucas D. Terissi, Gonzalo Sad, Mauricio Cerda, Slim Ouni, Rodrigo Galvez, Juan C. Gómez, Bernard Girau, Nancy Hitschfeld-Kahler
L2-ARCTIC: A Non-native English Speech Corpus
Guanlong Zhao, Sinem Sonsaat, Alif Silpachai, Ivana Lucic, Evgeny Chukharev-Hudilainen, John Levis, Ricardo Gutierrez-Osuna
ZCU-NTIS Speaker Diarization System for the DIHARD 2018 Challenge
Zbyněk Zajíc, Marie Kunešová, Jan Zelinka, Marek Hrúz
Speaker Diarization with Enhancing Speech for the First DIHARD Challenge
Lei Sun, Jun Du, Chao Jiang, Xueyang Zhang, Shan He, Bing Yin, Chin-Hui Lee
BUT System for DIHARD Speech Diarization Challenge 2018
Mireia Diez, Federico Landini, Lukáš Burget, Johan Rohdin, Anna Silnova, Kateřina Žmolíková, Ondřej Novotný, Karel Veselý, Ondřej Glembek, Oldřich Plchot, Ladislav Mošner, Pavel Matějka
Estimation of the Number of Speakers with Variational Bayesian PLDA in the DIHARD Diarization Challenge
Ignacio Viñals, Pablo Gimeno, Alfonso Ortega, Antonio Miguel, Eduardo Lleida
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
Gregory Sell, David Snyder, Alan McCree, Daniel Garcia-Romero, Jesús Villalba, Matthew Maciejewski, Vimal Manohar, Najim Dehak, Daniel Povey, Shinji Watanabe, Sanjeev Khudanpur
The EURECOM Submission to the First DIHARD Challenge
Jose Patino, Héctor Delgado, Nicholas Evans
Joint Discriminative Embedding Learning, Speech Activity and Overlap Detection for the DIHARD Speaker Diarization Challenge
Valter Akira Miasato Filho, Diego Augusto Silva, Luis Gustavo Depra Cuozzo
Multilingual Grapheme-to-Phoneme Conversion with Global Character Vectors
Jinfu Ni, Yoshinori Shiga, Hisashi Kawai
A Hybrid Approach to Grapheme to Phoneme Conversion in Assamese
Somnath Roy, Shakuntala Mahanta
Investigation of Using Disentangled and Interpretable Representations for One-shot Cross-lingual Voice Conversion
Seyed Hamidreza Mohammadi, Taehwan Kim
Using Pupillometry to Measure the Cognitive Load of Synthetic Speech
Avashna Govender, Simon King
Measuring the Cognitive Load of Synthetic Speech Using a Dual Task Paradigm
Avashna Govender, Simon King
Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text
Iroro Orife
Gated Convolutional Neural Network for Sentence Matching
Peixin Chen, Wu Guo, Zhi Chen, Jian Sun, Lanhua You
On Training and Evaluation of Grapheme-to-Phoneme Mappings with Limited Data
Dravyansh Sharma
The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech
Alice Baird, Emilia Parada-Cabaleiro, Simone Hantke, Felix Burkhardt, Nicholas Cummins, Björn Schuller
Word Emphasis Prediction for Expressive Text to Speech
Yosi Mass, Slava Shechtman, Moran Mordechay, Ron Hoory, Oren Sar Shalom, Guy Lev, David Konopnicki
A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis
Kai-Zhan Lee, Erica Cooper, Julia Hirschberg
Data Requirements, Selection and Augmentation for DNN-based Speech Synthesis from Crowdsourced Data
Markus Toman, Geoffrey S. Meltzner, Rupal Patel
Lightly Supervised vs. Semi-supervised Training of Acoustic Model on Luxembourgish for Low-resource Automatic Speech Recognition
Karel Veselý, Carlos Segura, Igor Szöke, Jordi Luque, Jan Černocký
Investigation on the Combination of Batch Normalization and Dropout in BLSTM-based Acoustic Modeling for ASR
Li Wenjie, Gaofeng Cheng, Fengpei Ge, Pengyuan Zhang, Yonghong Yan
Inference-Invariant Transformation of Batch Normalization for Domain Adaptation of Acoustic Models
Masayuki Suzuki, Tohru Nagano, Gakuto Kurata, Samuel Thomas
Active Learning for LF-MMI Trained Neural Networks in ASR
Yanhua Long, Hong Ye, Yijie Li, Jiaen Liang
An Investigation of Mixup Training Strategies for Acoustic Models in ASR
Ivan Medennikov, Yuri Khokhlov, Aleksei Romanenko, Dmitry Popov, Natalia Tomashenko, Ivan Sorokin, Alexander Zatvornitskiy
Comparison of Unsupervised Modulation Filter Learning Methods for ASR
Purvi Agrawal, Sriram Ganapathy
Improved Training for Online End-to-end Speech Recognition Systems
Suyoun Kim, Michael Seltzer, Jinyu Li, Rui Zhao
Combining Natural Gradient with Hessian Free Methods for Sequence Training
Adnan Haider, Philip Woodland
Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models
Naoyuki Kanda, Yusuke Fujita, Kenji Nagamatsu
A Study of Enhancement, Augmentation and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition
Hao Tang, Wei-Ning Hsu, François Grondin, James Glass
Multilingual Deep Neural Network Training Using Cyclical Learning Rate
Andreas Søeborg Kirkedal, Yeon-Jun Kim
Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus
Jianwei Yu, Xurong Xie, Shansong Liu, Shoukang Hu, Max W. Y. Lam, Xixin Wu, Ka Ho Wong, Xunying Liu, Helen Meng
Automatic Evaluation of Speech Intelligibility Based on I-vectors in the Context of Head and Neck Cancers
Imed Laaridh, Corinne Fredouille, Alain Ghio, Muriel Lalain, Virginie Woisard
Dysarthric Speech Recognition Using Convolutional LSTM Neural Network
Myungjong Kim, Beiming Cao, Kwanghoon An, Jun Wang
Perceptual and Automatic Evaluations of the Intelligibility of Speech Degraded by Noise Induced Hearing Loss Simulation
Imed Laaridh, Julien Tardieu, Cynthia Magnen, Pascal Gaillard, Jérôme Farinas, Julien Pinquier
Articulatory Features for ASR of Pathological Speech
Emre Yılmaz, Vikramjit Mitra, Chris Bartels, Horacio Franco
Mining Multimodal Repositories for Speech Affecting Diseases
Joana Correia, Bhiksha Raj, Isabel Trancoso, Francisco Teixeira
Long Distance Voice Channel Diagnosis Using Deep Neural Networks
Zhen Qin, Tom Ko, Guangjian Tian
Speech Recognition for Medical Conversations
Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu, Xuedong Zhang
Prosodic Focus Acquisition in French Early Cochlear Implanted Children
Chadi Farah, Stephane Roman, Mariapaola D'Imperio
The Role of Temporal Variation in Narrative Organization
Nassima Fezza
Interaction Mechanisms between Glottal Source and Vocal Tract in Pitch Glides
Tiina Murtola, Jarmo Malinen
Relating Articulatory Motions in Different Speaking Rates
Astha Singh, G. Nisha Meenakshi, Prasanta Kumar Ghosh
Estimation of the Asymmetry Parameter of the Glottal Flow Waveform Using the Electroglottographic Signal
João Cabral
Classification of Disorders in Vocal Folds Using Electroglottographic Signal
Tanumay Mandal, K. Sreenivasa Rao, Sanjay Kumar Gupta
Automatic Glottis Localization and Segmentation in Stroboscopic Videos Using Deep Neural Network
Achuth Rao MV, Rahul Krishnamurthy, Pebbili Gopikishore, Veeramani Priyadharshini, Prasanta Kumar Ghosh
Respiratory and Respiratory Muscular Control in JL1’s and JL2’s Text Reading Utilizing 4-RSTs and a Soft Respiratory Mask with a Two-Way Bulb
Toshiko Isei-Jaakkola, Keiko Ochi, Keikichi Hirose
A Preliminary Study on Tonal Coarticulation in Continuous Speech
Lixia Hao, Wei Zhang, Yanlu Xie, Jinsong Zhang
Far-Field Speech Recognition Using Multivariate Autoregressive Models
Sriram Ganapathy, Madhumita Harish
Efficient Implementation of the Room Simulator for Training Deep Neural Network Acoustic Models
Chanwoo Kim, Ehsan Variani, Arun Narayanan, Michiel Bacchiani
Stream Attention for Distributed Multi-Microphone Speech Recognition
Xiaofei Wang, Ruizhi Li, Hynek Hermansky
Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks
Takuya Yoshioka, Hakan Erdogan, Zhuo Chen, Xiong Xiao, Fil Alleva
Integrating Neural Network Based Beamforming and Weighted Prediction Error Dereverberation
Lukas Drude, Christoph Boeddeker, Jahn Heymann, Reinhold Haeb-Umbach, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani
A Probability Weighted Beamformer for Noise Robust ASR
Suliang Bu, Yunxin Zhao, Meiyuh Hwang, Sining Sun
Effects of Dimensional Input on Paralinguistic Information Perceived from Synthesized Dialogue Speech with Neural Network
Masaki Yokoyama, Tomohiro Nagata, Hiroki Mori
Neural MultiVoice Models for Expressing Novel Personalities in Dialog
Shereen Oraby, Lena Reed, Sharath T.S., Shubhangi Tandon, Marilyn Walker
Expressive Speech Synthesis Using Sentiment Embeddings
Igor Jauk, Jaime Lorenzo-Trueba, Junichi Yamagishi, Antonio Bonafonte
Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder
Kei Akuzawa, Yusuke Iwasawa, Yutaka Matsuo
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis
Xixin Wu, Yuewen Cao, Mu Wang, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng
EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System
Hao Li, Yongguo Kang, Zhenyu Wang
Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech
Jing Han, Zixing Zhang, Maximilian Schmitt, Zhao Ren, Fabien Ringeval, Björn Schuller
An Attention Pooling Based Representation Learning Method for Speech Emotion Recognition
Pengcheng Li, Yan Song, Ian McLoughlin, Wu Guo, Lirong Dai
Predicting Arousal and Valence from Waveforms and Spectrograms Using Deep Neural Networks
Zixiaofan Yang, Julia Hirschberg
Emotion Identification from Raw Speech Signals Using DNNs
Mousmita Sarma, Pegah Ghahremani, Daniel Povey, Nagendra Kumar Goel, Kandarpa Kumar Sarma, Najim Dehak
Encoding Individual Acoustic Features Using Dyad-Augmented Deep Variational Representations for Dialog-level Emotion Recognition
Jeng-Lin Li, Chi-Chun Lee
Variational Autoencoders for Learning Latent Representations of Speech Emotion: A Preliminary Study
Siddique Latif, Rajib Rana, Junaid Qadir, Julien Epps
Phoneme-to-Articulatory Mapping Using Bidirectional Gated RNN
Théo Biasutto-Lervat, Slim Ouni
Tongue Segmentation with Geometrically Constrained Snake Model
Zhihua Su, Jianguo Wei, Qiang Fang, Jianrong Wang, Kiyoshi Honda
Low Resource Acoustic-to-articulatory Inversion Using Bi-directional Long Short Term Memory
Aravind Illa, Prasanta Kumar Ghosh
Automatic Visual Augmentation for Concatenation Based Synthesized Articulatory Videos from Real-time MRI Data for Spoken Language Training
Chandana S, Chiranjeevi Yarra, Ritu Aggarwal, Sanjeev Kumar Mittal, Kausthubha N K, Raseena K T, Astha Singh, Prasanta Kumar Ghosh
Air-Tissue Boundary Segmentation in Real-Time Magnetic Resonance Imaging Video Using Semantic Segmentation with Fully Convolutional Networks
Valliappan CA, Renuka Mannem, Prasanta Kumar Ghosh
Noise Robust Acoustic to Articulatory Speech Inversion
Nadee Seneviratne, Ganesh Sivaraman, Vikramjit Mitra, Carol Espy-Wilson
Designing a Pneumatic Bionic Voice Prosthesis - A Statistical Approach for Source Excitation Generation
Farzaneh Ahmadi, Tomoki Toda
A Neural Model to Predict Parameters for a Generalized Command Response Model of Intonation
Bastian Schnell, Philip N. Garner
Articulation-to-Speech Synthesis Using Articulatory Flesh Point Sensors’ Orientation Information
Beiming Cao, Myungjong Kim, Jun R. Wang, Jan van Santen, Ted Mau, Jun Wang
Effectiveness of Generative Adversarial Network for Non-Audible Murmur-to-Whisper Speech Conversion
Neil Shah, Nirmesh Shah, Hemant Patil
Investigating Objective Intelligibility in Real-Time EMG-to-Speech Conversion
Lorenz Diener, Tanja Schultz
Domain-Adversarial Training for Session Independent EMG-based Speech Recognition
Michael Wand, Tanja Schultz, Jürgen Schmidhuber
Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces
László Tóth, Gábor Gosztolya, Tamás Grósz, Alexandra Markó, Tamás Gábor Csapó
Transcription Correction for Indian Languages Using Acoustic Signatures
Jeena J Prakash, Golda Brunet Rajan, Hema Murthy
BUT System for Low Resource Indian Language ASR
Bhargav Pulugundla, Murali Karthick Baskar, Santosh Kesiraju, Ekaterina Egorova, Martin Karafiát, Lukáš Burget, Jan Černocký
DA-IICT/IIITV System for Low Resource Speech Recognition Challenge 2018
Hardik B. Sailor, Maddala Venkata Siva Krishna, Diksha Chhabra, Ankur T. Patil, Madhu Kamble, Hemant Patil
An Exploration towards Joint Acoustic Modeling for Indian Languages: IIIT-H Submission for Low Resource Speech Recognition Challenge for Indian Languages, INTERSPEECH 2018
Hari Krishna Vydana, Krishna Gurugubelli, V V V Raju, Anil Kumar Vuppala
TDNN-based Multilingual Speech Recognition System for Low Resource Indian Languages
Noor Fathima, Tanvina Patel, Mahima C, Anuroop Iyengar
Articulatory and Stacked Bottleneck Features for Low Resource Speech Recognition
Vishwas M. Shetty, Rini A Sharon, Basil Abraham, Tejaswi Seeram, Anusha Prakash, Nithya Ravi, S. Umesh
ISI ASR System for the Low Resource Speech Recognition Challenge for Indian Languages
Jayadev Billa
An Automated Assistant for Medical Scribes
Gregory Finley, Erik Edwards, Amanda Robinson, Najmeh Sadoughi, James Fone, Mark Miller, David Suendermann-Oeft, Michael Brenndoerfer, Nico Axtmann
AGROASSAM: A Web Based Assamese Speech Recognition Application for Retrieving Agricultural Commodity Price and Weather Information
Abhishek Dey, Abhash Deka, Siddika Imani, Barsha Deka, Rohit Sinha, S R Mahadeva Prasanna, Priyankoo Sarmah, K Samudravijaya, S. R. Nirmala
Voice-powered Solutions with Cloud AI
Dan Aharon
Speech Synthesis in the Wild
Ganesh Sivaraman, Parav Nagarsheth, Elie Khoury
Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement
Shuai Nie, Shan Liang, Bin Liu, Yaping Zhang, Wenju Liu, Jianhua Tao
A Deep Neural Network Based Harmonic Noise Model for Speech Enhancement
Zhiheng Ouyang, Hongjiang Yu, Wei-Ping Zhu, Benoit Champagne
A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement
Ke Tan, DeLiang Wang
All-Neural Multi-Channel Speech Enhancement
Zhong-Qiu Wang, DeLiang Wang
Deep Learning for Acoustic Echo Cancellation in Noisy and Double-Talk Scenarios
Hao Zhang, DeLiang Wang
The Conversation: Deep Audio-Visual Speech Enhancement
Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman
Student-Teacher Learning for BLSTM Mask-based Speech Enhancement
Aswin Shanmugam Subramanian, Szu-Jui Chen, Shinji Watanabe
Speech Enhancement Using Deep Mixture of Experts Based on Hard Expectation Maximization
Pavan Karjol, Prasanta Kumar Ghosh
Adversarial Feature-Mapping for Speech Enhancement
Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang (Fred) Juang
Biophysically-inspired Features Improve the Generalizability of Neural Network-based Speech Enhancement Systems
Deepak Baby, Sarah Verhulst
Error Modeling via Asymmetric Laplace Distribution for Deep Neural Network Based Single-Channel Speech Enhancement
Li Chai, Jun Du, Chin-Hui Lee
A Priori SNR Estimation Based on a Recurrent Neural Network for Robust Speech Enhancement
Yangyang Xia, Richard Stern
Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection
Shao-Yen Tseng, Juncheng Li, Yun Wang, Florian Metze, Joseph Szurley, Samarjit Das
Unsupervised Temporal Feature Learning Based on Sparse Coding Embedded BoAW for Acoustic Event Recognition
Liwen Zhang, Jiqing Han, Shiwen Deng
Data Independent Sequence Augmentation Method for Acoustic Scene Classification
Teng Zhang, Kailai Zhang, Ji Wu
A Compact and Discriminative Feature Based on Auditory Summary Statistics for Acoustic Scene Classification
Hongwei Song, Jiqing Han, Shiwen Deng
ASe: Acoustic Scene Embedding Using Deep Archetypal Analysis and GMM
Pulkit Sharma, Vinayak Abrol, Anshul Thakur
Deep Convolutional Neural Network with Scalogram for Audio Scene Modeling
Hangting Chen, Pengyuan Zhang, Haichuan Bai, Qingsheng Yuan, Xiuguo Bao, Yonghong Yan
Time Aggregation Operators for Multi-label Audio Event Detection
Pankaj Joshi, Digvijaysingh Gautam, Ganesh Ramakrishnan, Preethi Jyothi
Early Detection of Continuous and Partial Audio Events Using CNN
Ian McLoughlin, Yan Song, Lam Dang Pham, Ramaswamy Palaniappan, Huy Phan, Yue Lang
Robust Acoustic Event Classification Using Bag-of-Visual-Words
Manjunath Mulimani, Shashidhar G Koolagudi
Wavelet Transform Based Mel-scaled Features for Acoustic Scene Classification
Shefali Waldekar, Goutam Saha
Multi-modal Attention Mechanisms in LSTM and Its Application to Acoustic Scene Classification
Teng Zhang, Kailai Zhang, Ji Wu
Contextual Language Model Adaptation for Conversational Agents
Anirudh Raju, Behnam Hedayatnia, Linda Liu, Ankur Gandhe, Chandra Khatri, Angeliki Metallinou, Anu Venkatesh, Ariya Rastrow
Active Memory Networks for Language Modeling
Oscar Chen, Anton Ragni, Mark Gales, Xie Chen
Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR
Yerbolat Khassanov, Eng Siong Chng
Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition
Yike Zhang, Pengyuan Zhang, Yonghong Yan
Training Recurrent Neural Network through Moment Matching for NLP Applications
Yue Deng, Yilin Shen, KaWai Chen, Hongxia Jin
Investigation on LSTM Recurrent N-gram Language Models for Speech Recognition
Zoltán Tüske, Ralf Schlüter, Hermann Ney
Online Incremental Learning for Speaker-Adaptive Language Models
Chih Chi Hu, Bing Liu, John Shen, Ian Lane
Efficient Language Model Adaptation with Noise Contrastive Estimation and Kullback-Leibler Regularization
Jesús Andrés-Ferrer, Nathan Bodenstab, Paul Vozila
Recurrent Neural Network Language Model Adaptation for Conversational Speech Recognition
Ke Li, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur
What to Expect from Expected Kneser-Ney Smoothing
Michael Levit, Sarangarajan Parthasarathy, Shuangyu Chang
i-Vectors in Language Modeling: An Efficient Way of Domain Adaptation for Feed-Forward Models
Karel Beneš, Santosh Kesiraju, Lukáš Burget
How Did You like 2017? Detection of Language Markers of Depression and Narcissism in Personal Narratives
Eva-Maria Rathner, Julia Djamali, Yannik Terhorst, Björn Schuller, Nicholas Cummins, Gudrun Salamon, Christina Hunger-Schoppe, Harald Baumeister
Depression Detection from Short Utterances via Diverse Smartphones in Natural Environmental Conditions
Zhaocheng Huang, Julien Epps, Dale Joachim, Michael Chen
Multi-Lingual Depression-Level Assessment from Conversational Speech Using Acoustic and Text Features
Yasin Özkanca, Cenk Demiroglu, Aslı Besirli, Selime Celik
Dysarthric Speech Classification Using Glottal Features Computed from Non-words, Words and Sentences
Narendra N P, Paavo Alku
Identifying Schizophrenia Based on Temporal Parameters in Spontaneous Speech
Gábor Gosztolya, Anita Bagi, Szilvia Szalóki, István Szendi, Ildikó Hoffmann
Using Prosodic and Lexical Information for Learning Utterance-level Behaviors in Psychotherapy
Karan Singla, Zhuohao Chen, Nikolaos Flemotomos, James Gibson, Dogan Can, David Atkins, Shrikanth Narayanan
Automatic Speech Assessment for People with Aphasia Using TDNN-BLSTM with Multi-Task Learning
Ying Qin, Tan Lee, Siyuan Feng, Anthony Pak Hin Kong
Towards an Unsupervised Entrainment Distance in Conversational Speech Using Deep Neural Networks
Md Nasir, Brian Baucom, Shrikanth Narayanan, Panayiotis Georgiou
Patient Privacy in Paralinguistic Tasks
Francisco Teixeira, Alberto Abad, Isabel Trancoso
A Lightly Supervised Approach to Detect Stuttering in Children's Speech
Sadeen Alharbi, Madina Hasan, Anthony J H Simons, Shelagh Brumfitt, Phil Green
Learning Conditional Acoustic Latent Representation with Gender and Age Attributes for Automatic Pain Level Recognition
Jeng-Lin Li, Yi-Ming Weng, Chip-Jin Ng, Chi-Chun Lee
A Deep Reinforcement Learning Based Multimodal Coaching Model (DCM) for Slot Filling in Spoken Language Understanding (SLU)
Yu Wang, Abhishek Patel, Yilin Shen, Hongxia Jin
Is ATIS Too Shallow to Go Deeper for Benchmarking Spoken Language Understanding Models?
Frédéric Béchet, Christian Raymond
Robust Spoken Language Understanding via Paraphrasing
Avik Ray, Yilin Shen, Hongxia Jin
Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension
Chia-Hsuan Lee, Szu-Lin Wu, Chi-Liang Liu, Hung-yi Lee
User Information Augmented Semantic Frame Parsing Using Progressive Neural Networks
Yilin Shen, Xiangyu Zeng, Yu Wang, Hongxia Jin
An Efficient Approach to Encoding Context for Spoken Language Understanding
Raghav Gupta, Abhinav Rastogi, Dilek Hakkani-Tür
Deep Speech Denoising with Vector Space Projections
Jeffrey Hetherly, Paul Gamble, Maria Alejandra Barrios, Cory Stephenson, Karl Ni
A Shifted Delta Coefficient Objective for Monaural Speech Separation Using Multi-task Learning
Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li
A Two-Stage Approach to Noisy Cochannel Speech Separation with Gated Residual Networks
Ke Tan, DeLiang Wang
Monoaural Audio Source Separation Using Variational Autoencoders
Laxmi Pandey, Anurendra Kumar, Vinay Namboodiri
Towards Automated Single Channel Source Separation Using Neural Networks
Arpita Gang, Pravesh Biyani, Akshay Soni
Investigations on Data Augmentation and Loss Functions for Deep Learning Based Speech-Background Separation
Hakan Erdogan, Takuya Yoshioka
Annotator Trustability-based Cooperative Learning Solutions for Intelligent Audio Analysis
Simone Hantke, Christoph Stemp, Björn Schuller
Semi-supervised Cross-domain Visual Feature Learning for Audio-Visual Broadcast Speech Transcription
Rongfeng Su, Xunying Liu, Lan Wang
Deep Lip Reading: A Comparison of Models and an Online Application
Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman
Iterative Learning of Speech Recognition Models for Air Traffic Control
Ajay Srinivasamurthy, Petr Motlicek, Mittul Singh, Youssef Oualil, Matthias Kleinert, Heiko Ehr, Hartmut Helmke
Speaker Adaptive Audio-Visual Fusion for the Open-Vocabulary Section of AVICAR
Leda Sari, Mark Hasegawa-Johnson, Kumaran S, Georg Stemmer, Krishnakumar N Nair
Multimodal Name Recognition in Live TV Subtitling
Marek Hrúz, Aleš Pražák, Michal Bušta
Dithered Quantization for Frequency-Domain Speech and Audio Coding
Tom Bäckström, Johannes Fischer, Sneha Das
Postfiltering with Complex Spectral Correlations for Speech and Audio Coding
Sneha Das, Tom Bäckström
Postfiltering Using Log-Magnitude Spectrum for Speech and Audio Coding
Sneha Das, Tom Bäckström
Temporal Noise Shaping with Companding
Arijit Biswas, Per Hedelin, Lars Villemoes, Vinay Melkote
Multi-frame Quantization of LSF Parameters Using a Deep Autoencoder and Pyramid Vector Quantizer
Yaxing Li, Eshete Derb Emiru, Shengwu Xiong, Anna Zhu, Pengfei Duan, Yichang Li
Multi-frame Coding of LSF Parameters Using Block-Constrained Trellis Coded Vector Quantization
Yaxing Li, Shan Xu, Shengwu Xiong, Anna Zhu, Pengfei Duan, Yueming Ding
Training Utterance-level Embedding Networks for Speaker Identification and Verification
Heewoong Park, Sukhyun Cho, Kyubyong Park, Namju Kim, Jonghun Park
Analysis of Complementary Information Sources in the Speaker Embeddings Framework
Mahesh Kumar Nandwana, Mitchell McLaren, Diego Castan, Julien van Hout, Aaron Lawson
Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification
Yingke Zhu, Tom Ko, David Snyder, Brian Mak, Daniel Povey
An Improved Deep Embedding Learning Method for Short Duration Speaker Verification
Zhifu Gao, Yan Song, Ian McLoughlin, Wu Guo, Lirong Dai
Avoiding Speaker Overfitting in End-to-End DNNs Using Raw Waveform for Text-Independent Speaker Verification
Jee-weon Jung, Hee-soo Heo, IL-ho Yang, Hye-jin Shim, Ha-jin Yu
Deeply Fused Speaker Embeddings for Text-Independent Speaker Verification
Gautam Bhattacharya, Md Jahangir Alam, Vishwa Gupta, Patrick Kenny
Employing Phonetic Information in DNN Speaker Embeddings to Improve Speaker Recognition Performance
Md Hafizur Rahman, Ivan Himawan, Mitchell McLaren, Clinton Fookes, Sridha Sridharan
End-to-end Text-dependent Speaker Verification Using Novel Distance Measures
Subhadeep Dey, Srikanth Madikeri, Petr Motlicek
Robust Speaker Clustering using Mixtures of von Mises-Fisher Distributions for Naturalistic Audio Streams
Harishchandra Dubey, Abhijeet Sangwan, John H.L. Hansen
Triplet Network with Attention for Speaker Diarization
Huan Song, Megan Willi, Jayaraman J. Thiagarajan, Visar Berisha, Andreas Spanias
I-vector Transformation Using Conditional Generative Adversarial Networks for Short Utterance Speaker Verification
Jiacen Zhang, Nakamasa Inoue, Koichi Shinoda
Analysis of Length Normalization in End-to-End Speaker Verification System
Weicheng Cai, Jinkun Chen, Ming Li
Angular Softmax for Short-Duration Text-independent Speaker Verification
Zili Huang, Shuai Wang, Kai Yu
An End-to-End Text-Independent Speaker Identification System on Short Utterances
Ruifang Ji, Xinyuan Cai, Xu Bo
MTGAN: Speaker Verification through Multitasking Triplet Generative Adversarial Networks
Wenhao Ding, Liang He
Categorical vs Dimensional Perception of Italian Emotional Speech
Emilia Parada-Cabaleiro, Giovanni Costantini, Anton Batliner, Alice Baird, Björn Schuller
A Three-Layer Emotion Perception Model for Valence and Arousal-Based Detection from Multilingual Speech
Xingfeng Li, Masato Akagi
Cross-lingual Speech Emotion Recognition through Factor Analysis
Brecht Desplanques, Kris Demuynck
Modeling Self-Reported and Observed Affect from Speech
Jian Cheng, Jared Bernstein, Elizabeth Rosenfeld, Peter W. Foltz, Alex S. Cohen, Terje B. Holmlund, Brita Elvevåg
Stochastic Shake-Shake Regularization for Affective Learning from Speech
Che-Wei Huang, Shrikanth Narayanan
Investigating Speech Enhancement and Perceptual Quality for Speech Emotion Recognition
Anderson R. Avila, Md Jahangir Alam, Douglas O'Shaughnessy, Tiago Falk
Demonstrating and Modelling Systematic Time-varying Annotator Disagreement in Continuous Emotion Annotation
Mia Atcheson, Vidhyasaharan Sethu, Julien Epps
Speech Emotion Recognition from Variable-Length Inputs with Triplet Loss Function
Jian Huang, Ya Li, Jianhua Tao, Zhen Lian
Imbalance Learning-based Framework for Fear Recognition in the MediaEval Emotional Impact of Movies Task
Xiaotong Zhang, Xingliang Cheng, Mingxing Xu, Thomas Fang Zheng
Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms
Xi Ma, Zhiyong Wu, Jia Jia, Mingxing Xu, Helen Meng, Lianhong Cai
Speech Emotion Recognition Using Spectrogram & Phoneme Embedding
Promod Yenigalla, Abhay Kumar, Suraj Tripathi, Chirag Singh, Sibsambhu Kar, Jithendra Vepa
On Enhancing Speech Emotion Recognition Using Generative Adversarial Networks
Saurabh Sahu, Rahul Gupta, Carol Espy-Wilson
Ladder Networks for Emotion Recognition: Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes
Srinivas Parthasarathy, Carlos Busso
Knowledge Distillation for Sequence Model
Mingkun Huang, Yongbin You, Zhehuai Chen, Yanmin Qian, Kai Yu
Improving CTC-based Acoustic Model with Very Deep Residual Time-delay Neural Networks
Sheng Li, Xugang Lu, Ryoichi Takashima, Peng Shen, Tatsuya Kawahara, Hisashi Kawai
Filter Sampling and Combination CNN (FSC-CNN): A Compact CNN Model for Small-footprint ASR Acoustic Modeling Using Raw Waveforms
Jinxi Guo, Ning Xu, Xin Chen, Yang Shi, Kaiyuan Xu, Abeer Alwan
Twin Regularization for Online Speech Recognition
Mirco Ravanelli, Dmitriy Serdyuk, Yoshua Bengio
Self-Attentional Acoustic Models
Matthias Sperber, Jan Niehues, Graham Neubig, Sebastian Stüker, Alex Waibel
Hierarchical Recurrent Neural Networks for Acoustic Modeling
Jinhwan Park, Iksoo Choi, Yoonho Boo, Wonyong Sung
Dictionary Augmented Sequence-to-Sequence Neural Network for Grapheme to Phoneme Prediction
Antoine Bruguier, Anton Bakhtin, Dravyansh Sharma
Leveraging Second-Order Log-Linear Model for Improved Deep Learning Based ASR Performance
Ankit Raj, Shakti P Rath, Jithendra Vepa
Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks
Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi, Sanjeev Khudanpur
Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings
Da-Rong Liu, Kuan-Yu Chen, Hung-yi Lee, Lin-shan Lee
Phone Recognition Using a Non-Linear Manifold with Broad Phone Class Dependent DNNs
Mengjie Qian, Linxue Bai, Peter Jančovič, Martin Russell
A Multi-Discriminator CycleGAN for Unsupervised Non-Parallel Speech Domain Adaptation
Ehsan Hosseini-Asl, Yingbo Zhou, Caiming Xiong, Richard Socher
Interactions between Vowels and Nasal Codas in Mandarin Speakers’ Perception of Nasal Finals
Chong Cao, Wei Wei, Wei Wang, Yanlu Xie, Jinsong Zhang
Weighting Pitch Contour and Loudness Contour in Mandarin Tone Perception in Cochlear Implant Listeners
Qinglin Meng, Nengheng Zheng, Ambika Prasad Mishra, Jacinta Dan Luo, Jan W. H. Schnupp
Implementing DIANA to Model Isolated Auditory Word Recognition in English
Filip Nenadić, Louis ten Bosch, Benjamin V. Tucker
Effects of Homophone Density on Spoken Word Recognition in Mandarin Chinese
Bhamini Sharma
Visual Timing Information in Audiovisual Speech Perception: Evidence from Lexical Tone Contour
Hui Xie, Biao Zeng, Rui Wang
COSMO SylPhon: A Bayesian Perceptuo-motor Model to Assess Phonological Learning
Marie-Lou Barnaud, Julien Diard, Pierre Bessière, Jean-Luc Schwartz
Experience-dependent Influence of Music and Language on Lexical Pitch Learning Is Not Additive
Akshay Raj Maggu, Patrick C. M. Wong, Hanjun Liu, Francis C. K. Wong
Influences of Fundamental Oscillation on Speaker Identification in Vocalic Utterances by Humans and Computers
Volker Dellwo, Thayabaran Kathiresan, Elisa Pellegrino, Lei He, Sandra Schwab, Dieter Maurer