Odyssey 2020: The Speaker and Language Recognition Workshop

1-5 November 2020, Tokyo, Japan

Chairs: Kong Aik LEE, Takafumi KOSHINAKA, and Koichi SHINODA

ISSN: 2312-2846
DOI: 10.21437/Odyssey.2020


Speaker Recognition 1


MagNetO: X-vector Magnitude Estimation Network plus Offset for Improved Speaker Recognition
Daniel Garcia-Romero, Greg Sell, Alan McCree

BERTphone: Phonetically-aware Encoder Representations for Utterance-level Speaker and Language Recognition
Shaoshi Ling, Julian Salazar, Yuzong Liu, Katrin Kirchhoff

Orthogonality Regularizations for End-to-End Speaker Verification
Yingke Zhu, Brian Mak

Probabilistic Embeddings for Speaker Diarization
Anna Silnova, Niko Brummer, Johan Rohdin, Themos Stafylakis, Lukas Burget


Speaker and Language Recognition


Zero-Time Windowing Cepstral Coefficients for Dialect Classification
Rashmi Kethireddy, Sudarsana Reddy Kadiri, Santosh Kesiraju, Suryakanth V. Gangashetty

Unsupervised Regularization of the Embedding Extractor for Robust Language Identification
Raphaël Duroselle, Denis Jouvet, Irina Illina

Compensation on x-vector for Short Utterance Spoken Language Identification
Peng Shen, Xugang Lu, Komei Sugiura, Sheng Li, Hisashi Kawai

Improving Embedding-based Neural-Network Speaker Recognition
Po-Chin Wang, Chia-Ping Chen, Chung-Li Lu, Bo-Cheng Chan, Shan-Wen Hsiao

Information Preservation Pooling for Speaker Embedding
Min Hyun Han, Woo Hyun Kang, Sung Hwan Mun, Nam Soo Kim

Neural i-vectors
Ville Vestman, Kong Aik Lee, Tomi Kinnunen

Denoising x-vectors for Robust Speaker Recognition
Mohammad Mohammadamini, Driss Matrouf, Paul-Gauthier Noé

Adaptation Strategy and Clustering from Scratch for New Domains of Speaker Recognition
Pierre-Michel Bousquet, Mickaël Rouvier

Adaptive Mean Normalization for Unsupervised Adaptation of Speaker Embeddings
Mitchell McLaren, Md Hafizur Rahman, Diego Castan, Mahesh Kumar Nandwana, Aaron Lawson


Diarization


Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
Andreas Stolcke

DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team
Qingjian Lin, Weicheng Cai, Lin Yang, Junjie Wang, Jun Zhang, Ming Li

On Early-stop Clustering for Speaker Diarization
Liping Chen, Kong Aik Lee, Lei He, Frank Soong

Linguistically Aided Speaker Diarization Using Speaker Role Information
Nikolaos Flemotomos, Panayiotis Georgiou, Shrikanth Narayanan

Optimal Mapping Loss: A Faster Loss for End-to-End Speaker Diarization
Qingjian Lin, Tingle Li, Lin Yang, Junjie Wang, Ming Li


Spoofing and Countermeasure 1


Generalization of Audio Deepfake Detection
Tianxiang Chen, Avrosh Kumar, Parav Nagarsheth, Ganesh Sivaraman, Elie Khoury

Using Multi-Resolution Feature Maps with Convolutional Neural Networks for Anti-Spoofing in ASV
Qiongqiong Wang, Kong Aik Lee, Takafumi Koshinaka

Novel Variable Length Teager Energy Profiles for Replay Spoof Detection
Madhu Kamble, Hemant Patil

An Initial Investigation on Optimizing Tandem Speaker Verification and Countermeasure Systems Using Reinforcement Learning
Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi

Black-box Attacks on Automatic Speaker Verification using Feedback-controlled Voice Conversion
Xiaohai Tian, Rohan Kumar Das, Haizhou Li


Keynote: Mirco Ravanelli


Towards Unsupervised Learning of Speech Representations
Mirco Ravanelli


Special Session: VOiCES 2020


The VOiCES from a Distance Challenge 2019: Analysis of Speaker Verification Results and Remaining Challenges
Mahesh Kumar Nandwana, Michael Lomnitz, Colleen Richey, Mitchell McLaren, Diego Castan, Luciana Ferrer, Aaron Lawson

Selective Deep Speaker Embedding Enhancement for Speaker Verification
Jee-Weon Jung, Ju-Ho Kim, Hye-Jin Shim, Seung-bin Kim, Ha-Jin Yu

Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances
Aleksei Gusev, Vladimir Volokhov, Tseren Andzhukaev, Sergey Novoselov, Galina Lavrentyeva, Marina Volkova, Alice Gazizullina, Andrey Shulipa, Artem Gorlanov, Anastasia Avdeeva, Artem Ivanov, Alexander Kozlov, Timur Pekhovsky, Yuri Matveev

Utilizing VOiCES Dataset for Multichannel Speaker Verification with Beamforming
Ladislav Mošner, Oldřich Plchot, Johan Rohdin, Jan Černocký

An Empirical Analysis of Information Encoded in Disentangled Neural Speaker Representations
Raghuveer Peri, Haoqi Li, Krishna Somandepalli, Arindam Jati, Shrikanth Narayanan

NPLDA: A Deep Neural PLDA Model for Speaker Verification
Shreyas Ramoji, Prashant Krishnan, Sriram Ganapathy

Learning Mixture Representation for Deep Speaker Embedding Using Attention
Weiwei Lin, Man Wai Mak, Lu Yi


Voice Conversion and Synthesis


Many-to-Many Voice Conversion Using Cycle-Consistent Variational Autoencoder with Multiple Decoders
Dongsuk Yook, Seong-Gyun Leem, Keonnyeong Lee, In-Chul Yoo

Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis
Jennifer Williams, Joanna Rownicka, Pilar Oplustil, Simon King

Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data
Kun Zhou, Berrak Sisman, Haizhou Li

Generative Adversarial Networks for Singing Voice Conversion with and without Parallel Data
Berrak Sisman, Haizhou Li

WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss
Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li

Personalized Singing Voice Generation Using WaveRNN
Xiaoxue Gao, Xiaohai Tian, Yi Zhou, Rohan Kumar Das, Haizhou Li


Evaluation and Benchmarking


The 2019 NIST Audio-Visual Speaker Recognition Evaluation
Omid Sadjadi, Craig Greenberg, Elliot Singer, Douglas Reynolds, Lisa Mason, Jaime Hernandez-Cordero

The 2019 NIST Speaker Recognition Evaluation CTS Challenge
Seyed Omid Sadjadi, Craig Greenberg, Elliot Singer, Douglas Reynolds, Lisa Mason, Jaime Hernandez-Cordero

Advances in Speaker Recognition for Telephone and Audio-Visual Data: the JHU-MIT Submission for NIST SRE19
Jesus Antonio Villalba Lopez, Daniel Garcia-Romero, Nanxin Chen, Gregory Sell, Jonas Borgstrom, Alan McCree, Leibny Paola Garcia Perera, Saurabh Kataria, Phani Sankar Nidadavolu, Pedro Torres-Carrasquillo, Najim Dehak

LEAP System for SRE 2019 CTS Challenge - Improvements and Error Analysis
Shreyas Ramoji, Prashant Krishnan, Bhargavram Mysore, Prachi Singh, Sriram Ganapathy

Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge
Jahangir Alam, Gilles Boulianne, Lukas Burget, Mohamed Dahmane, Mireia Diez Sánchez, Alicia Lozano-Diez, Ondrej Glembek, Pierre-Luc St-Charles, Marc Lalonde, Pavel Matejka, Petr Mizera, Joao Monteiro, Ladislav Mosner, Cedric Noiseux, Ondřej Novotný, Oldrich Plchot, Johan Rohdin, Anna Silnova, Josef Slavicek, Themos Stafylakis, Shuai Wang, Hossein Zeinali


Keynote: Luciana Ferrer


The Importance of Calibration in Speaker Verification
Luciana Ferrer


Spoofing and Countermeasure 2


A Multi-condition Training Strategy for Countermeasures Against Spoofing Attacks to Speaker Recognizers
Joao Monteiro, Jahangir Alam, Tiago Falk

Analysis of Teager Energy Profiles for Spoof Speech Detection
Madhu Kamble, Aditya Krishna Sai Pulikonda, Maddala Venkata Siva Krishna, Hemant Patil

Effects of Waveform PMF on Anti-spoofing Detection for Replay Data - ASVspoof 2019
Itshak Lapidot, Jean-Francois Bonastre

Phase Spectrum of Time-flipped Speech Signals for Robust Spoofing Detection
Sung-Hyun Yoon, Min-Sung Koh, Ha-Jin Yu

Residual Networks for Resisting Noise: Analysis of an Embeddings-based Spoofing Countermeasure
Bence Halpern, Finnian Kelly, Rob van Son, Anil Alexander

An Explainability Study of the Constant Q Cepstral Coefficient Spoofing Countermeasure for Automatic Speaker Verification
Hemlata Tak, Jose Patino, Andreas Nautsch, Nicholas Evans, Massimiliano Todisco

Subband Modeling for Spoofing Detection in Automatic Speaker Verification
Bhusan Chettri, Tomi Kinnunen, Emmanouil Benetos


Speaker Recognition 2


Delving into VoxCeleb: Environment Invariant Speaker Recognition
Joon Son Chung, Jaesung Huh, Seongkyu Mun

Dropping Classes for Deep Speaker Representation Learning
Chau Luu, Peter Bell, Steve Renals

Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification
Xu Li, Jinghua Zhong, Jianwei Yu, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng

A Speaker Verification Backend for Improved Calibration Performance across Varying Conditions
Luciana Ferrer, Mitchell McLaren

Partial AUC Metric Learning Based Speaker Verification Back-End
Zhongxin Bai, Xiao-Lei Zhang, Jingdong Chen


Speech Application


Joint Training End-to-End Speech Recognition Systems with Speaker Attributes
Sheng Li, Xugang Lu, Raj Dabre, Peng Shen, Hisashi Kawai

Small Footprint Multi-channel Keyword Spotting
Jilong Wu, Yiteng Huang, Hyun-Jin Park, Niranjan Subrahmanya, Patrick Violette

Assessing Child Communication Engagement via Speech Recognition in Naturalistic Active Learning Spaces
Rasa Lileikyte, Dwight Irvin, John H. L. Hansen

Exploring the Effects of Device Variability on Forensic Speaker Comparison Using VOCALISE and NFI-FRIDA, A Forensically Realistic Database
David van der Vloed, Finnian Kelly, Anil Alexander

On Open-Set Speaker Identification with I-Vectors
Kevin Wilkinghoff

Speaker Detection in the Wild: Lessons Learned from JSALT 2019
Leibny Paola Garcia Perera, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, Alejandrina Cristia, Latane Bullock, Ling Guo, Koji Okabe, Phani Sankar Nidadavolu, Saurabh Kataria, Sizhu Chen, Leo Galmant, Marvin Lavechin, Lei Sun, Marie-Philippe Gill, Bar Ben-Yair, Sajjad Abdoli, Xin Wang, Wassim Bouaziz, Hadrien Titeux, Emmanuel Dupoux, Kong Aik Lee, Najim Dehak

Speaker Characterization Using TDNN, TDNN-LSTM, TDNN-LSTM-Attention based Speaker Embeddings for NIST SRE 2019
Chien-Lin Huang

Combined Vector Based on Factorized Time-delay Neural Network for Text-Independent Speaker Recognition
Tianyu Liang, Yi Liu, Can Xu, Xianwei Zhang, Liang He


Speaker Recognition 3


Personal VAD: Speaker-Conditioned Voice Activity Detection
Shaojin Ding, Quan Wang, Shuo-Yiin Chang, Li Wan, Ignacio Lopez Moreno

Speech Bandwidth Expansion For Speaker Recognition On Telephony Audio
Ganesh Sivaraman, Amruta Vidwans, Elie Khoury

Application of Bandwidth Extension with No Learning to Data Augmentation for Speaker Verification
Haruna Miyamoto, Sayaka Shiota, Hitoshi Kiya

Robust Speaker Recognition Using Speech Enhancement And Attention Model
Yanpei Shi, Qiang Huang, Thomas Hain

Analysis of Deep Feature Loss Based Enhancement for Speaker Verification
Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba, Najim Dehak