In this paper, we propose a speech recognition method for non-stationary musical environments that combines Kalman-filter-based speech signal estimation with iterative unsupervised MLLR adaptation. The proposed method estimates the speech signal in non-stationary noise, such as background music, by applying a speech state transition model to Kalman filter estimation. The speech state transition model represents the state transition of the speech component in non-stationary noisy speech and is derived using a Taylor expansion. Within this model, the state transition of the noise is estimated by linear prediction. Furthermore, to obtain higher recognition accuracy, we adapt the acoustic models, using iterative unsupervised MLLR adaptation, to speech spectra distorted by the residual noise that remains after Kalman filtering. To evaluate the proposed method, we carried out large-vocabulary continuous speech recognition experiments with three types of background music. The proposed method achieved a significant improvement in word accuracy.
Cite as: Fujimoto, M., Ariki, Y. (2001) Speech recognition under musical environments using Kalman filter and iterative MLLR adaptation. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 1879-1882, doi: 10.21437/Eurospeech.2001-444
@inproceedings{fujimoto01_eurospeech,
  author={M. Fujimoto and Y. Ariki},
  title={{Speech recognition under musical environments using Kalman filter and iterative MLLR adaptation}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={1879--1882},
  doi={10.21437/Eurospeech.2001-444}
}