Robustness in Speech, Speaker, and Language Recognition: “You’ve Got to Know Your Limitations”

John H.L. Hansen, Hynek Bořil


In the field of speech, speaker, and language recognition, significant gains have been and are being made with new machine learning strategies, along with the availability of new and emerging speech corpora. However, many of the core scientific principles required for effective speech processing research appear to be drifting to the sidelines, under the assumption that access to larger amounts of data can address a growing range of issues in new speech/speaker/language recognition scenarios. This study explores several challenging domains in formulating effective solutions for realistic speech data, and in particular the notion of using naturalistic data to better reflect the potential effectiveness of new algorithms. Our main focus is on mismatch and speech variability issues due to (i) differences in noisy speech with and without Lombard effect and a communication factor, (ii) realistic field data in noisy and increased cognitive load conditions, and (iii) dialect identification using found data. Finally, we study speaker–noise and speaker–speaker interactions in a newly established, fully naturalistic Prof-Life-Log corpus. The specific outcomes of this study include an analysis of the strengths and weaknesses of simulated vs. actual speech data collection for research.


DOI: 10.21437/Interspeech.2016-1395

Cite as

Hansen, J.H., Bořil, H. (2016) Robustness in Speech, Speaker, and Language Recognition: “You’ve Got to Know Your Limitations”. Proc. Interspeech 2016, 2766-2770.

BibTeX
@inproceedings{Hansen+2016,
author={John H.L. Hansen and Hynek Bořil},
title={Robustness in Speech, Speaker, and Language Recognition: “You’ve Got to Know Your Limitations”},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1395},
url={http://dx.doi.org/10.21437/Interspeech.2016-1395},
pages={2766--2770}
}