Comparing the Influence of Spectro-Temporal Integration in Computational Speech Segregation

Thomas Bentsen, Tobias May, Abigail A. Kressner, Torsten Dau


The goal of computational speech segregation systems is to automatically segregate a target speaker from interfering maskers. Typically, these systems include a feature extraction stage in the front-end and a classification stage in the back-end. A spectro-temporal integration strategy can be applied either in the front-end, using so-called delta features, or in the back-end, using a second classifier that exploits the posterior probability of speech from the first classifier across a spectro-temporal window. This study systematically analyzes the influence of such stages on segregation performance, error distributions, and intelligibility predictions. The results indicated that exploiting context in the back-end could be problematic, even though such a spectro-temporal integration stage improves segregation performance. The results also emphasized the potential need for a single metric that comprehensively predicts computational segregation performance and correlates well with intelligibility. The outcome of this study could help to identify the most effective spectro-temporal integration strategy for computational segregation systems.
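To make the two integration strategies concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes features and posteriors are stored as a time-by-frequency list of lists, that delta features are simple first-order differences, and that the back-end window is rectangular with clamped edges. The function names `delta_features` and `context_window` and the parameters `half_t` and `half_f` are hypothetical; the abstract does not specify any of these details.

```python
def delta_features(x, axis):
    """Front-end context (assumed form): first-order differences of a
    time-by-frequency feature matrix along "time" or "frequency".
    The first frame or channel has no predecessor and is set to 0.0."""
    n_frames, n_channels = len(x), len(x[0])
    out = [[0.0] * n_channels for _ in range(n_frames)]
    for t in range(n_frames):
        for f in range(n_channels):
            if axis == "time" and t > 0:
                out[t][f] = x[t][f] - x[t - 1][f]
            elif axis == "frequency" and f > 0:
                out[t][f] = x[t][f] - x[t][f - 1]
    return out


def context_window(posteriors, t, f, half_t=1, half_f=1):
    """Back-end context (assumed form): collect the first classifier's
    speech posteriors in a rectangular window centred on unit (t, f).
    Indices outside the matrix are clamped to the nearest valid unit.
    The returned list is what a second classifier would take as input."""
    n_frames, n_channels = len(posteriors), len(posteriors[0])
    window = []
    for dt in range(-half_t, half_t + 1):
        for df in range(-half_f, half_f + 1):
            tt = min(max(t + dt, 0), n_frames - 1)
            ff = min(max(f + df, 0), n_channels - 1)
            window.append(posteriors[tt][ff])
    return window
```

With the default half-widths, `context_window` returns nine values per time-frequency unit, so a second classifier in this sketch would see a nine-dimensional input for each unit.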


DOI: 10.21437/Interspeech.2016-1025

Cite as

Bentsen, T., May, T., Kressner, A.A., Dau, T. (2016) Comparing the Influence of Spectro-Temporal Integration in Computational Speech Segregation. Proc. Interspeech 2016, 3324-3328.

BibTeX
@inproceedings{Bentsen+2016,
author={Thomas Bentsen and Tobias May and Abigail A. Kressner and Torsten Dau},
title={Comparing the Influence of Spectro-Temporal Integration in Computational Speech Segregation},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1025},
url={http://dx.doi.org/10.21437/Interspeech.2016-1025},
pages={3324--3328}
}