EUROSPEECH 2003 - INTERSPEECH 2003
This paper discusses building gender-dependent Gaussian mixture models (GMMs) and how to integrate them with an efficient gender detection scheme. Gender-specific acoustic models half the size of a corresponding gender-independent acoustic model substantially outperform the larger gender-independent model. With perfect gender detection, gender-dependent modeling should therefore yield higher recognition accuracy without consuming more memory. Furthermore, as certain phonemes are inherently gender independent (e.g., silence), much of the male- and female-specific acoustic models can be shared. This paper proposes how to discover which phonemes are inherently similar for male and female speakers and how to efficiently share this information between gender-dependent GMMs. A highly accurate and computationally efficient gender detection scheme is suggested that takes advantage of computations already performed in the speech recognizer. By making the gender assignment probabilistic, the increase in word error rate (WER) seen for speakers whose gender is labeled erroneously is avoided. The method of gender detection and the probabilistic use of gender are novel and should be of interest beyond mere gender detection. The only requirement for the method to work is that the training data be appropriately labeled.
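The idea of a probabilistic gender assignment can be illustrated with a minimal sketch. It is not the scheme from the paper, whose details are not given in this abstract. It assumes diagonal-covariance GMMs, a running gender posterior accumulated from the per-frame log-likelihoods the recognizer would compute anyway, and a posterior-weighted mixture of the two gender-dependent likelihoods. All function names, the equal prior, and the toy one-dimensional models are hypothetical.

```python
import math

def logsumexp(values):
    # Numerically stable log(sum(exp(v))) over a non-empty list.
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def log_gauss_diag(x, mean, var):
    # Log density of a diagonal-covariance Gaussian at feature vector x.
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, gmm):
    # gmm is a list of (weight, mean, var) components (assumed layout).
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm])

def gender_posteriors(frames, gmm_male, gmm_female, prior_male=0.5):
    # Running posterior P(male | frames seen so far), one value per frame.
    log_m = math.log(prior_male)
    log_f = math.log(1.0 - prior_male)
    posteriors = []
    for x in frames:
        log_m += gmm_loglik(x, gmm_male)
        log_f += gmm_loglik(x, gmm_female)
        posteriors.append(math.exp(log_m - logsumexp([log_m, log_f])))
    return posteriors

def mixed_loglik(x, gmm_male, gmm_female, p_male):
    # Soft gender assignment: weight each gender-dependent likelihood
    # by its posterior instead of committing to a hard decision.
    terms = []
    if p_male > 0.0:
        terms.append(math.log(p_male) + gmm_loglik(x, gmm_male))
    if p_male < 1.0:
        terms.append(math.log(1.0 - p_male) + gmm_loglik(x, gmm_female))
    return logsumexp(terms)

# Toy 1-D models (hypothetical): one unit-variance component per gender.
toy_male = [(1.0, [-1.0], [1.0])]
toy_female = [(1.0, [1.0], [1.0])]
toy_frames = [[-1.2], [-0.8], [-1.0]]
toy_posteriors = gender_posteriors(toy_frames, toy_male, toy_female)
```

With a soft weight, an early misjudgment of the speaker's gender only down-weights the correct model instead of excluding it, which is the intuition behind avoiding the WER penalty of a wrong hard label.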
Bibliographic reference. Olsen, Peder A. / Dharanipragada, Satya (2003): "An efficient integrated gender detection scheme and time mediated averaging of gender dependent acoustic models", In EUROSPEECH-2003, 2509-2512.