ISCA Archive SPAC 1992

A frequency-weighted euclidean distance and its application to HMM-based recognition of noisy speech

Hiroshi Matsumoto

We address the problem of incorporating frequency weighting into a stochastic modeling framework for robust speech recognition. First, this paper introduces Euclidean distances that are frequency-weighted by a smoothed reference power spectrum. Then, on the basis of this distance measure, a frequency-weighted continuous-density HMM is proposed in which the covariances are proportional to the spectral power in the frequency domain. Using group delay spectra or spectral slope (RPS) and their time derivatives as spectral parameters, frequency weighting by a global power spectrum was confirmed to significantly improve the recognition accuracy for the RPS, from 68.9 % to 91.6 %, at a low SNR of 6 dB with added white noise. Furthermore, the frequency-weighted HMM attained a recognition accuracy of 77.3 % in multi-speaker word recognition at an SNR of 12 dB, a gain of 42.8 % in accuracy over the standard HMM.
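
As a rough illustration of the distance measure described in the abstract, the sketch below computes a Euclidean distance whose per-frequency weights come from a smoothed reference power spectrum. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the moving-average smoother, the normalization of the weights to sum to one, and the hypothetical names `smooth_spectrum` and `weighted_euclidean`. It is not the author's implementation, and it does not attempt to reproduce the frequency-weighted HMM.

```python
import numpy as np


def smooth_spectrum(power, width=3):
    """Moving-average smoothing of a power spectrum.

    Assumption: the abstract only says "smoothed"; a moving average is
    used here purely as a stand-in smoother.
    """
    kernel = np.ones(width) / width
    return np.convolve(np.asarray(power, dtype=float), kernel, mode="same")


def weighted_euclidean(x, ref, ref_power, width=3):
    """Return sum_k w_k * (x_k - ref_k)**2.

    Assumption: w_k is the smoothed reference power at frequency bin k,
    normalized so that the weights sum to one.
    """
    x = np.asarray(x, dtype=float)
    ref = np.asarray(ref, dtype=float)
    w = smooth_spectrum(ref_power, width)
    w = w / w.sum()
    return float(np.sum(w * (x - ref) ** 2))
```

Under these assumptions, a mismatch in a frequency bin where the reference has high power contributes more to the distance than the same mismatch in a low-power bin, which is the general idea of emphasizing spectral regions that are less likely to be dominated by additive noise.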


Cite as: Matsumoto, H. (1992) A frequency-weighted euclidean distance and its application to HMM-based recognition of noisy speech. Proc. ETRW on Speech Processing in Adverse Conditions, 103-106

@inproceedings{matsumoto92_spac,
  author={Hiroshi Matsumoto},
  title={{A frequency-weighted euclidean distance and its application to HMM-based recognition of noisy speech}},
  year=1992,
  booktitle={Proc. ETRW on Speech Processing in Adverse Conditions},
  pages={103--106}
}