Influence of Speaker-Specific Parameters on Speech Separation Systems

David Ditter, Timo Gerkmann

Recent studies have shown that deep-learning-based single-channel speech separation systems perform worse on same-gender mixtures than on different-gender mixtures. In this work, we provide a more detailed analysis of the respective impact of the fundamental frequency and the vocal tract length on system performance. While both parameters are correlated with gender, the vocal tract length is a fixed speaker-specific parameter, whereas the fundamental frequency can vary across speaking styles. We show that the difference between the fundamental frequency medians of the two speakers in a mixture is highly correlated with the signal-to-distortion ratio (SDR) performance, while the difference between their vocal tract lengths is not. Our analysis allows us to predict performance for given speakers based on measurements of their fundamental frequency. Furthermore, we conclude that current systems separate (short-term) speaking styles rather than (long-term) speaker characteristics.

DOI: 10.21437/Interspeech.2019-2459

Cite as: Ditter, D., Gerkmann, T. (2019) Influence of Speaker-Specific Parameters on Speech Separation Systems. Proc. Interspeech 2019, 4584-4588, DOI: 10.21437/Interspeech.2019-2459.

  author={David Ditter and Timo Gerkmann},
  title={{Influence of Speaker-Specific Parameters on Speech Separation Systems}},
  booktitle={Proc. Interspeech 2019},