INTERSPEECH 2006 - ICSLP
We propose a method for estimating the parameters of SPLICE-like transformations from individual utterances, so that such transformations can be used to normalize acoustic feature vectors for speech recognition on an utterance-by-utterance basis, in much the same way as cepstral mean normalization. We report results on an in-house French-language multi-speaker database collected while deploying an automatic closed-captioning system for live broadcast news. An unusual feature of this database is that very large amounts of training data are available for the individual speakers (typically several hours each), which makes it very difficult to improve on multi-speaker modeling using standard methods of speaker adaptation. We found that the proposed method of feature normalization achieves a 6% relative improvement over cepstral mean normalization on this task.
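For readers unfamiliar with the baseline, the following is a minimal sketch of per-utterance cepstral mean normalization, the method against which the abstract compares. It is not the proposed smoothed mixture transformation, whose details the abstract does not give. The function name `cepstral_mean_normalize` and the toy two-dimensional frames are assumptions made for illustration only.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean of each cepstral coefficient.

    `frames` holds one feature vector per frame of a single utterance.
    The name and the list-of-lists layout are illustrative assumptions.
    """
    if not frames:
        return []
    n_frames = len(frames)
    dim = len(frames[0])
    # Mean of each coefficient, computed over this utterance only.
    means = [sum(frame[d] for frame in frames) / n_frames for d in range(dim)]
    # Shift every frame so each coefficient has zero mean over the utterance.
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


# Toy utterance with two frames of two coefficients each (made-up values).
utterance = [[1.0, 4.0], [3.0, 8.0]]
normalized = cepstral_mean_normalize(utterance)
```

Because the statistics come from the utterance being normalized, no speaker-specific training data is needed at recognition time, which is the property the abstract says the proposed method shares.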
Bibliographic reference. Kenny, Patrick / Gupta, Vishwa / Boulianne, G. / Ouellet, Pierre / Dumouchel, Pierre (2006): "Feature normalization using smoothed mixture transformations", In INTERSPEECH-2006, paper 1026-Mon1A2O.1.