Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Enhanced Speech Coding Based on Phonetic Class Segmentation

Adriane Swalm Durey, Venkatesh Krishnan, Thomas P. Barnwell III

Georgia Institute of Technology, Atlanta, GA, USA

Given a baseline speech coder and speech with an available phonetic class segmentation, a number of potential enhancements to that coder become possible. While the quality of speech segmentation by phoneme and phonetic class is constantly improving, we use TIMIT to generate the phonetic class segmentation as a basis for initial testing of these techniques. Using coders drawn from the MELP (mixed excitation linear prediction) family, we explore specialized phonetic codebooks, phonetically driven superframing, and improved modeling of specific phonetic classes and the transitions between them. We compare each enhanced coder against the base coder on three metrics: computational cost, transmission cost, and the quality of the reconstructed speech. In most cases, we find that segmentation-based coders can produce speech with quality comparable to that of MELP, using fewer transmitted bits and at no additional computational cost. With phonetic codebooks and transition modeling, comparison category rating (CCR) tests show that these segmentation-based coders produce speech of better quality than MELP.

Bibliographic reference. Durey, Adriane Swalm / Krishnan, Venkatesh / Barnwell III, Thomas P. (2005): "Enhanced speech coding based on phonetic class segmentation", In INTERSPEECH-2005, 2721-2724.