ISCA Archive ICSLP 2000

A combination of speaker normalization and speech rate normalization for automatic speech recognition

Thilo Pfau, Robert Faltlhauser, Günther Ruske

This contribution introduces a normalization procedure for automatic speech recognition that aims to reduce speaking-rate-specific variation in the features of the phonetic classes. Normalization factors are calculated "spurtwise", which makes it possible to capture changes in speaking rate within a single utterance. A cost-saving implementation based on linear interpolation of the original features and a word graph rescoring procedure leads to a moderate increase in computational load compared to the baseline system without speech rate normalization.
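
The idea of normalizing speaking rate by linearly interpolating the original features can be illustrated with a minimal sketch. The abstract does not say how the normalization factors are estimated or applied, so everything here is an assumption: the function names `normalize_rate` and `normalize_spurts` are hypothetical, each spurt is taken to be a (frames x dimensions) feature matrix, and a factor greater than 1 is assumed to stretch the time axis (as one might for fast speech).

```python
import numpy as np


def normalize_rate(features, factor):
    """Resample a (T, D) feature matrix along time by linear interpolation.

    Assumption: factor > 1 stretches the sequence, factor < 1 compresses it.
    This is an illustrative sketch, not the procedure from the paper.
    """
    features = np.asarray(features, dtype=float)
    n_in = features.shape[0]
    if n_in == 0:
        raise ValueError("spurt must contain at least one frame")
    n_out = max(1, int(round(n_in * factor)))

    # Fractional positions in the original sequence for each output frame.
    src = np.linspace(0.0, n_in - 1, n_out)
    lower = np.floor(src).astype(int)
    upper = np.minimum(lower + 1, n_in - 1)
    weight = (src - lower)[:, None]

    # Weighted average of the two neighbouring original frames.
    return (1.0 - weight) * features[lower] + weight * features[upper]


def normalize_spurts(spurts, factors):
    """Apply one (hypothetical) normalization factor per spurt."""
    return [normalize_rate(s, f) for s, f in zip(spurts, factors)]
```

Because the new frames are interpolated from features that have already been computed, no second feature extraction pass is needed, which is consistent with the moderate increase in computational load described above.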

In addition, a two-step procedure that combines vocal tract length normalization (VTLN) and speech rate normalization (SRN) has been developed. Experiments showed that applying SRN to a VTLN-based recognition system leads to a relative reduction in word error rate of 4.2%. This is comparable to the decrease observed when using SRN on a system without VTLN. Overall, the combination of VTLN and SRN results in a 15% reduction in word error rate compared to the baseline system.
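
To see how the two reported figures relate, the following sketch shows how relative reductions in word error rate compound when two steps are applied in sequence. It assumes that the 15% figure is also a relative reduction and that both figures are exact, neither of which the abstract states. Under those assumptions, VTLN alone would account for a relative reduction of roughly 11%; that value is derived here and is not a result reported in the abstract. The function names are hypothetical.

```python
def combine_relative_reductions(*reductions):
    """Total relative reduction when each step reduces what the previous left."""
    remaining = 1.0
    for r in reductions:
        remaining *= 1.0 - r
    return 1.0 - remaining


def implied_first_stage(total, second):
    """Relative reduction of the first step, given the total and the second step."""
    return 1.0 - (1.0 - total) / (1.0 - second)


# Figures from the abstract: 15% overall, 4.2% from SRN on top of VTLN.
# Assumption: both are relative reductions and compose multiplicatively.
vtln_only = implied_first_stage(total=0.15, second=0.042)
print(f"implied VTLN-only relative reduction: {vtln_only:.1%}")
```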


Cite as: Pfau, T., Faltlhauser, R., Ruske, G. (2000) A combination of speaker normalization and speech rate normalization for automatic speech recognition. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 4, 362-365

@inproceedings{pfau00_icslp,
  author={Thilo Pfau and Robert Faltlhauser and Günther Ruske},
  title={{A combination of speaker normalization and speech rate normalization for automatic speech recognition}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 4, 362-365}
}