Effects of Base-Frequency and Spectral Envelope on Deep-Learning Speech Separation and Recognition Models

J. Hui, Y. Wei, S.T. Chen, R.H.Y. So


Base-frequencies (F0) and spectral envelopes play an important role in speech separation and recognition by humans. Two experiments were conducted to study how trained networks for multi-speaker speech separation and recognition are affected by differences in F0 and spectral envelope between source signals. The first experiment examined the effects of natural F0/envelope differences on speech separation performance. Results showed that separation performance improved significantly when the two target signals differed in F0 by ±3 semitones or more, or differed in envelope by a scaling factor larger than 1.08 or smaller than 0.92. This is consistent with findings for human listeners and is the first such finding for deep-learning network (DNN) models. The second experiment tested the effect of F0/envelope differences on the performance of a multi-speaker automatic speech recognition (ASR) system. Results showed that multi-speaker recognition also relies significantly on F0/envelope differences. Overall, the results indicate that existing automatic systems depend on monaural cues in a way similar to human listeners, although automatic systems still perform worse than humans on the same tasks.
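To make the reported thresholds concrete, the sketch below converts a semitone difference into a frequency ratio, assuming the standard 12-tone equal-temperament definition (one semitone is a factor of 2^(1/12)), and checks whether an envelope scaling factor falls outside the 0.92 to 1.08 range quoted above. The function names and the 200 Hz example value are hypothetical illustrations and are not taken from the paper.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of `semitones` (12-tone equal temperament)."""
    return 2.0 ** (semitones / 12.0)


def shift_f0(f0_hz: float, semitones: float) -> float:
    """Return the F0 obtained by shifting `f0_hz` by `semitones`."""
    return f0_hz * semitones_to_ratio(semitones)


def envelope_outside_range(scale: float, low: float = 0.92, high: float = 1.08) -> bool:
    """True if the envelope scaling factor is below `low` or above `high`.

    The defaults are the thresholds quoted in the abstract; the function
    itself is only an illustrative helper, not the authors' code.
    """
    return scale < low or scale > high


if __name__ == "__main__":
    base_f0 = 200.0  # hypothetical example F0 in Hz
    for st in (-3, 3):
        ratio = semitones_to_ratio(st)
        print(f"{st:+d} semitones: ratio {ratio:.4f}, F0 {shift_f0(base_f0, st):.2f} Hz")
    for scale in (0.90, 1.00, 1.10):
        print(f"envelope scale {scale:.2f}: outside range = {envelope_outside_range(scale)}")
```

Under this definition, a 3-semitone difference corresponds to a frequency ratio of roughly 1.19 upward or 0.84 downward, so the F0 threshold amounts to a larger relative change than the 8% envelope scaling threshold.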


DOI: 10.21437/Interspeech.2019-1715

Cite as: Hui, J., Wei, Y., Chen, S., So, R. (2019) Effects of Base-Frequency and Spectral Envelope on Deep-Learning Speech Separation and Recognition Models. Proc. Interspeech 2019, 634-638, DOI: 10.21437/Interspeech.2019-1715.


@inproceedings{Hui2019,
  author={J. Hui and Y. Wei and S.T. Chen and R.H.Y. So},
  title={{Effects of Base-Frequency and Spectral Envelope on Deep-Learning Speech Separation and Recognition Models}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={634--638},
  doi={10.21437/Interspeech.2019-1715},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1715}
}