Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

A Block Cosine Transform and its Application in Speech Recognition

Jingdong Chen (1,2), Kuldip K. Paliwal (1), Satoshi Nakamura (2)

(1) School of Microelectronic Engineering, Griffith University, Brisbane, QLD, Australia
(2) ATR Spoken Language Translation Research Labs, Kyoto, Japan

Noise-robust speech recognition has become an important area of research in recent years. The fact that human listeners can recognize speech in the presence of strong noise has inspired researchers to imitate some aspects of human auditory perception in automatic speech recognition. This has led to sub-band based speech recognition, in which the full-band speech is split into several sub-bands and each sub-band is processed separately. The resulting multi-band features can be combined in various ways to carry out the speech recognition task. Reported results have shown the superiority of this technique for speech recognition in strong noise conditions. In this paper, we briefly review multi-band feature extraction. We then propose a block discrete cosine transform (BDCT) whose kernel transformation matrix is derived from a decomposition of the kernel of the discrete cosine transform (DCT). We show that the BDCT approximates the DCT both in retaining information and in decorrelating a sequence. When the BDCT is applied to the mel frequency filter bank energies (FBEs) in place of the DCT to convert them to cepstral coefficients, a new kind of MFCC is obtained. We call these new features block discrete cosine transform based MFCCs (BMFCCs) and show that a sub-band processing idea is implicit in the BMFCCs, since the BDCT automatically divides the mel frequency FBEs into two sub-bands. We report various speech recognition results using the BMFCCs, together with comparisons against multi-band MFCCs and full-band MFCCs, to illustrate the properties of the BMFCCs.
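The abstract does not give the BDCT kernel itself, so the following is only a minimal sketch of the simpler multi-band idea it is compared against: applying a separate DCT to each half of the filter bank energies, which is equivalent to multiplying by a block-diagonal matrix. The function names, the choice of an orthonormal DCT-II, the even split into two halves, and the sample log energies are all assumptions made for illustration; this is not the BDCT derived in the paper.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (each row is one basis vector)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (j + 0.5) * k / n)
                     for j in range(n)])
    return rows

def two_band_dct_matrix(n):
    """Block-diagonal matrix holding one DCT per half of the input.

    Assumption: the band boundary sits at n // 2. This is the plain
    two-sub-band transform, not the BDCT kernel from the paper.
    """
    half = n // 2
    low, high = dct_matrix(half), dct_matrix(n - half)
    top = [row + [0.0] * (n - half) for row in low]
    bottom = [[0.0] * half + row for row in high]
    return top + bottom

def apply(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

# Made-up log filter bank energies for one frame (illustrative values only).
log_fbe = [math.log(1.0 + i) for i in range(1, 9)]

full_band = apply(dct_matrix(8), log_fbe)          # conventional cepstral-style output
two_band = apply(two_band_dct_matrix(8), log_fbe)  # one set of coefficients per sub-band
```

Because both matrices are orthonormal, the full-band and two-band outputs carry the same total energy as the input; they differ in how that energy is distributed across coefficients, and in that each two-band coefficient depends only on one half of the filter bank.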

