Phonetic features have been proposed to overcome performance degradation in spectral speaker recognition in difficult acoustic conditions. The harmful effect of those conditions, however, is not restricted to spectral systems but also affects the performance of the open-loop phone recognisers on which phonetic systems are based. In automatic speech recognition, larger subword units and the use of additional constraints from language models have been employed to improve robustness against adverse acoustic conditions. This paper evaluates the performance of more constrained phone recognition and different subword units for speaker recognition on heterogeneous broadcast data from German parliamentary speeches. Using phone clusters and a strong language model instead of phones obtained from unconstrained recognition improves the equal error rate from 14.3% to 8.6% on the given data.
Cite as: Baum, D., Schneider, D., Mertens, T., Kohler, J. (2010) Constrained Subword Units for Speaker Recognition. Proc. The Speaker and Language Recognition Workshop (Odyssey 2010), paper 02
@inproceedings{baum10_odyssey,
  author={Doris Baum and Daniel Schneider and Timo Mertens and Joachim Kohler},
  title={{Constrained Subword Units for Speaker Recognition}},
  year=2010,
  booktitle={Proc. The Speaker and Language Recognition Workshop (Odyssey 2010)},
  pages={paper 02}
}