EUROSPEECH 2003 - INTERSPEECH 2003
When a speech recognition system has to work with signals sampled at different frequencies, multiple acoustic models may have to be maintained. To avoid this drawback, the system can be trained at the highest expected sampling frequency and the acoustic models subsequently converted to a new sampling frequency. However, the usual mel-frequency cepstral coefficients are not well suited to this approach, since they are not localized in the frequency domain. For this reason, we propose in this paper to address the problem with the features that result from frequency-filtering the logarithmic band energies. Experimental results are reported with SpeechDatCar databases at 16 kHz, 11 kHz, and 8 kHz sampling rates. They show no degradation in recognition performance for 11 kHz and 8 kHz test signals when the system, trained at 16 kHz, is inexpensively converted to 11 kHz or 8 kHz instead of being trained directly at those rates.
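To illustrate why frequency-filtered features stay localized in frequency, the following minimal sketch filters one frame of logarithmic band energies along the band index. The abstract does not specify the filter, so the sketch assumes the second-order filter z - z^-1 with zero padding at the band edges; the function name `frequency_filter`, the band layout, and the energy values are hypothetical and chosen only for illustration. It is not the conversion procedure of the paper.

```python
def frequency_filter(log_energies):
    """Filter one frame of log band energies with z - z^-1 along the band index.

    Bands outside the valid range are treated as zero (an assumed edge convention).
    """
    padded = [0.0] + list(log_energies) + [0.0]
    # Output k is (energy of band k+1) minus (energy of band k-1).
    return [padded[k + 2] - padded[k] for k in range(len(log_energies))]


# Hypothetical log band energies for one frame: 8 bands covering 0-8 kHz.
frame_16k = [2.0, 3.5, 3.0, 4.0, 2.5, 1.0, 0.5, 0.2]

# Assumption for illustration: the lowest 4 bands are those below 4 kHz,
# i.e. the bands still available at an 8 kHz sampling rate.
frame_8k = frame_16k[:4]

ff_16k = frequency_filter(frame_16k)
ff_8k = frequency_filter(frame_8k)

print(ff_16k[:4])
print(ff_8k)
```

In this sketch, each feature depends only on its two neighboring bands, so removing the upper bands changes only the feature at the new band edge. A cepstral coefficient, by contrast, is a weighted sum over all bands, so every coefficient would change.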
Bibliographic reference. Bauerecker, Hermann / Nadeu, Climent / Padrell, Jaume (2003): "On the advantage of frequency-filtering features for speech recognition with variable sampling frequencies. Experiments with SpeechDatCar databases", In EUROSPEECH-2003, 869-872.