We address the problem of incorporating frequency weighting into a stochastic modeling framework for robust speech recognition. First, this paper introduces frequency-weighted Euclidean distances in which the weights are given by a smoothed reference power spectrum. Then, on the basis of this distance measure, a frequency-weighted continuous-density HMM is proposed in which the covariances are proportional to the spectral power in the frequency domain. Using group delay spectra or spectral slope (RPS) and their time derivatives as spectral parameters, frequency weighting by a global power spectrum was confirmed to significantly improve the recognition accuracy for the RPS, from 68.9 % to 91.6 %, at a low SNR of 6 dB with added white noise. Furthermore, the frequency-weighted HMM attained a recognition accuracy of 77.3 % in multi-speaker word recognition at an SNR of 12 dB, gaining 42.8 % in accuracy compared to the standard HMM.
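As a rough illustration of the distance measure described in the abstract, the following sketch computes a squared Euclidean distance whose per-bin weights come from a smoothed reference power spectrum. The abstract does not give the exact formula, so several choices here are assumptions made only for illustration: the moving-average smoother, the normalisation of the weights to sum to one, and the names `smooth_spectrum`, `frequency_weighted_distance`, and `half_width`.

```python
def smooth_spectrum(power, half_width=1):
    """Moving-average smoothing of a power spectrum.

    Assumption: the abstract only says "smoothed"; a moving average is
    used here as a simple stand-in for whatever smoother the paper uses.
    """
    n = len(power)
    smoothed = []
    for k in range(n):
        lo = max(0, k - half_width)
        hi = min(n, k + half_width + 1)
        smoothed.append(sum(power[lo:hi]) / (hi - lo))
    return smoothed


def frequency_weighted_distance(x, y, ref_power, half_width=1):
    """Squared Euclidean distance between spectral vectors x and y,
    with each frequency bin weighted by the smoothed reference power.

    Assumption: weights are normalised to sum to 1 (hypothetical choice).
    """
    if not (len(x) == len(y) == len(ref_power)):
        raise ValueError("x, y and ref_power must have the same length")
    smoothed = smooth_spectrum(ref_power, half_width)
    total = sum(smoothed)
    if total <= 0:
        raise ValueError("reference power spectrum must have positive total power")
    weights = [p / total for p in smoothed]
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))
```

With these assumed weights, a mismatch in a high-power bin of the reference spectrum contributes more to the distance than the same mismatch in a low-power bin, which is the intuition behind emphasising spectral regions that are less likely to be masked by noise.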
Cite as: Matsumoto, H. (1992) A frequency-weighted euclidean distance and its application to HMM-based recognition of noisy speech. Proc. ETRW on Speech Processing in Adverse Conditions, 103-106
@inproceedings{matsumoto92_spac,
  author={Hiroshi Matsumoto},
  title={{A frequency-weighted euclidean distance and its application to HMM-based recognition of noisy speech}},
  year=1992,
  booktitle={Proc. ETRW on Speech Processing in Adverse Conditions},
  pages={103--106}
}