7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Voice Transformations for Improving Children’s Speech Recognition in a Publicly Available Dialogue System

Joakim Gustafson (1), Kåre Sjölander (2)

(1) Telia Research AB, Sweden; (2) KTH, Sweden

Building acoustic models for children that can be used in spoken dialogue systems requires collecting children’s speech data. Commercial recognizers available for Swedish are trained on adult speech, which makes them less suitable for children’s computer-directed speech. This paper describes experiments with on-the-fly voice transformation of children’s speech. Two transformation methods were tested, one inspired by the Phase Vocoder algorithm and the other by the Time-Domain Pitch-Synchronous Overlap-Add (TD-PSOLA) algorithm. The speech signal is transformed before being sent to a speech recognizer trained on adult speech. Our results show that this method reduces error rates by on the order of thirty to forty-five percent for child users.

