5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

Speaker Normalization Training for Mixture Stochastic Trajectory Model

Irina Illina, Yifan Gong

CRIN/CNRS, INRIA-Lorraine, Vandoeuvre-lès-Nancy, France

Speech Research, Media Technologies Laboratory, Texas Instruments, Dallas, TX, USA

This paper addresses speaker and environment adaptation techniques for speaker-independent (SI) continuous speech recognition. These techniques reduce the mismatch between training and testing conditions using a small amount of adaptation data. In addition to reducing this mismatch during adaptation, we propose to reduce the variation due to speakers or environments during training itself, within the Speaker Normalisation (SN) approach, using the Maximum Likelihood Linear Regression (MLLR) transformation. SN also combines context-dependent, phone-dependent and broad-phonetic-class-dependent information. Modelling the broad-phonetic-class-dependent information with linear regression ensures that the model can still be used when adaptation or training data is missing for some phonetic symbols. SN is developed for the Mixture Stochastic Trajectory Model, a segment-based model. The approach can be used for speaker, gender or environment normalization. We compare the performance of SN with SI recognition and with MLLR speaker adaptation through continuous speech recognition experiments.
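As a rough illustration of why tying a linear transformation to broad phonetic classes helps when some phonetic symbols have no data, the sketch below applies an MLLR-style affine transform (mean' = A * mean + b) to per-phone mean vectors, sharing one transform per class. It is a minimal sketch under stated assumptions, not the method of the paper: the phone-to-class mapping, the transform values and all function names are hypothetical, and the transforms are given directly rather than estimated by maximum likelihood.

```python
# Hypothetical illustration of class-tied affine (MLLR-style) mean transforms.
# All names and numbers below are assumptions, not values from the paper.

def apply_affine(A, b, mu):
    """Return A * mu + b for a matrix A (list of rows) and vectors b, mu."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)]

# Assumed mapping from phone to broad phonetic class.
PHONE_TO_CLASS = {"a": "vowel", "i": "vowel", "s": "fricative"}

# One transform (A, b) per broad class, as if estimated from adaptation data.
# A phone never seen in that data still receives its class transform.
CLASS_TRANSFORMS = {
    "vowel": ([[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5]),
    "fricative": ([[2.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
}

def adapt_means(means_by_phone):
    """Transform each phone's mean with the transform tied to its broad class."""
    adapted = {}
    for phone, mu in means_by_phone.items():
        A, b = CLASS_TRANSFORMS[PHONE_TO_CLASS[phone]]
        adapted[phone] = apply_affine(A, b, mu)
    return adapted

# Toy speaker-independent means (2-dimensional for readability).
si_means = {"a": [1.0, 2.0], "i": [0.0, 0.0], "s": [3.0, 1.0]}
adapted_means = adapt_means(si_means)
```

In this toy setup, the phone "i" is adapted through the "vowel" transform even if only "a" contributed data to estimating it, which is the property the abstract attributes to the broad-phonetic-class regression.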


Bibliographic reference.  Illina, Irina / Gong, Yifan (1997): "Speaker normalization training for mixture stochastic trajectory model", In EUROSPEECH-1997, 1855-1858.