A feature extraction scheme is presented that analyzes speech signals sampled at different sampling rates. Such a scheme will be needed because future terminals in the telecom network will also transmit speech information in the frequency region above 4 kHz. A cepstral analysis scheme is applied in the frequency range up to 4 kHz to create a common set of acoustic parameters for all sampling rates. Additional parameters describe the subband energy in the frequency region above 4 kHz. The major advantage of this feature extraction is that no individual recognizer has to be trained for each sampling frequency. A recognition experiment shows that terminals and recognition systems can be combined without a notable loss in recognition performance when the terminal operates at a sampling frequency different from the one the recognizer was trained on.
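The idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Hamming window, the 23 mel filters, the 13 cepstral coefficients, the band edges in `high_bands`, and all function names are assumptions chosen for illustration, since the abstract does not specify them. The sketch computes cepstral coefficients from the 0 to 4 kHz range only, so the same parameters are available at every sampling rate, and appends log subband energies for whichever bands above 4 kHz fit below the Nyquist frequency.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(freqs, n_filters, f_lo, f_hi):
    """Triangular filters spaced on the mel scale between f_lo and f_hi (Hz)."""
    edges = mel_to_hz(np.linspace(hz_to_mel(f_lo), hz_to_mel(f_hi), n_filters + 2))
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def extract_features(frame, sample_rate, n_filters=23, n_ceps=13,
                     high_bands=((4000.0, 5500.0), (5500.0, 8000.0))):
    """Return (cepstra from 0-4 kHz, log energies of the available high bands).

    All defaults are hypothetical; the abstract does not give these values.
    """
    frame = np.asarray(frame, dtype=float)
    window = np.hamming(frame.size)
    # Dividing by the window sum keeps levels comparable across frame lengths.
    spectrum = np.fft.rfft(frame * window) / window.sum()
    power = np.abs(spectrum) ** 2
    freqs = np.fft.rfftfreq(frame.size, d=1.0 / sample_rate)

    # Common part: mel filterbank restricted to 0-4 kHz, then a DCT-II.
    low = freqs <= 4000.0
    bank = mel_filterbank(freqs[low], n_filters, 0.0, 4000.0)
    log_mel = np.log(bank @ power[low] + 1e-10)
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    cepstra = np.cos(np.pi * k * (n + 0.5) / n_filters) @ log_mel

    # Rate-dependent part: keep only bands that lie below the Nyquist frequency.
    nyquist = sample_rate / 2.0
    extra = []
    for lo, hi in high_bands:
        if hi <= nyquist:
            band = (freqs > lo) & (freqs <= hi)
            extra.append(np.log(power[band].sum() + 1e-10))
    return cepstra, np.array(extra)

# Usage: 25 ms frames at 8 kHz (200 samples) and 16 kHz (400 samples).
t8 = np.arange(200) / 8000.0
t16 = np.arange(400) / 16000.0
cep8, high8 = extract_features(np.sin(2 * np.pi * 1000 * t8), 8000)
cep16, high16 = extract_features(np.sin(2 * np.pi * 1000 * t16), 16000)
print(cep8.shape, high8.shape, cep16.shape, high16.shape)
```

Using the same frame duration at both rates gives identical FFT bin frequencies below 4 kHz, so the filterbank is applied to the same set of frequencies in each case. How a recognizer should treat the missing high-band parameters for narrowband input is left open here.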
Cite as: Hirsch, H.G., Hellwig, K., Dobler, S. (2001) Speech recognition at multiple sampling rates. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 1837-1840, doi: 10.21437/Eurospeech.2001-434
@inproceedings{hirsch01_eurospeech,
  author={H. G. Hirsch and K. Hellwig and S. Dobler},
  title={{Speech recognition at multiple sampling rates}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={1837--1840},
  doi={10.21437/Eurospeech.2001-434}
}