Interspeech'2005 - Eurospeech
Existing standard speech coders can provide speech communication of high quality while they degrade the performance of speech recognition systems that use the reconstructed speech by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized to speech quality rather than to the performance of speech recognition. For example, mel-frequency cepstral coefficient (MFCC) is generally known to provide better speech recognition performance than linear prediction coefficient (LPC) that is a typical parameter set in speech coding. In this paper, we propose a speech coder using MFCC instead of LPC to improve the performance of a server-based speech recognition system in network environments. However, the main drawback of using MFCC is to develop the efficient MFCC quantization with a low-bit rate. First, we explore the interframe correlation of MFCCs, which results in the predictive quantization of MFCC. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel error. As a result, we propose a 8.7 kbps MFCC-based CELP coder. It is shown from a PESQ test that the proposed speech coder has a comparable speech quality to 8 kbps G.729 while it is expected that the performance of speech recognition using the proposed speech coder is better than that using G.729.
Bibliographic reference. Lee, Gil Ho / Yoon, Jae Sam / Kim, Hong Kook (2005): "A MFCC-based CELP speech coder for server-based speech recognition in network environments", In INTERSPEECH-2005, 3169-3172.