In this paper, we propose a gammatone-domain model combination method for consonant recognition in noisy environments. We first define a gammatone cepstral coefficient (GCC) as the cepstral representation of the averaged envelopes of a gammatone-filtered signal. We then look for an appropriate phonetic unit by comparing monophone, diphone, and triphone acoustic models; consonant recognition experiments show that diphone hidden Markov models (HMMs) provide the best performance. Next, we develop a gammatone-domain model combination method that combines the clean and noise models in the linear gammatone-envelope domain. Finally, we evaluate the GCC-based feature and the proposed model combination on a vowel-consonant-vowel (VCV) task covering 24 intervocalic English consonants. The experiments show that the GCC-based feature achieves a recognition rate that is 47.46% higher, in relative terms, than that of mel-frequency cepstral coefficients (MFCCs). In addition, applying the model combination to the GCC-based diphone HMM system yields a relative accuracy improvement of 77.67% under noisy conditions.
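The idea of combining clean and noise models in the linear envelope domain can be illustrated with a minimal sketch. The abstract does not give the paper's formulation, so everything here is an assumption: the function names `dct_matrix` and `combine_means`, the `gain` parameter, log compression of the envelopes, a square orthonormal DCT (as many cepstral coefficients as gammatone channels), and combining the mean vectors only while ignoring variances. It follows the general pattern of parallel model combination rather than the authors' exact method.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (rows are basis vectors)."""
    k = np.arange(n)[:, None]          # cepstral index
    m = np.arange(n)[None, :]          # channel index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)         # scale row 0 so that C @ C.T = I
    return c

def combine_means(clean_cep, noise_cep, gain=1.0):
    """Combine clean and noise cepstral mean vectors (hypothetical helper).

    Steps: cepstrum -> log envelope (inverse DCT) -> linear envelope (exp),
    add the two envelopes, then map back: log -> DCT.
    """
    n = clean_cep.shape[0]
    C = dct_matrix(n)
    clean_env = np.exp(C.T @ clean_cep)     # linear-domain clean envelope
    noise_env = np.exp(C.T @ noise_cep)     # linear-domain noise envelope
    noisy_env = clean_env + gain * noise_env  # assumed additive noise
    return C @ np.log(noisy_env)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=8)   # toy 8-channel cepstral mean, not real data
    noise = rng.normal(size=8)
    print(combine_means(clean, noise, gain=0.5))
```

In a full system the same mapping would be applied to every Gaussian mean of the clean HMMs, with the noise model estimated from noise-only frames; how variances are handled is left open here.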
Bibliographic reference. Yoon, Jae Sam / Park, Ji Hun / Kim, Hong Kook (2008): "Gammatone-domain model combination for consonant recognition in noisy environments", In INTERSPEECH-2008, 1773-1776.