7th International Conference on Spoken Language Processing
September 16-20, 2002
Building acoustic models for children that can be used in spoken dialogue systems requires collecting speech data. Commercial recognizers available for Swedish are trained on adult speech, which makes them less suitable for children’s computer-directed speech. This paper describes experiments with on-the-fly voice transformation of children’s speech. Two transformation methods were tested, one inspired by the Phase Vocoder algorithm and the other by the Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) algorithm. The speech signal is transformed before being sent to a speech recognizer trained on adult speech. Our results show that this method reduces error rates by on the order of thirty to forty-five percent for child users.
Bibliographic reference. Gustafson, Joakim / Sjölander, Kåre (2002): "Voice transformations for improving children's speech recognition in a publicly available dialogue system", In ICSLP-2002, 297-300.