5th International Conference on Spoken Language Processing
This paper proposes a recognition-synthesis approach to speech coding which uses a formant trajectory model for both recognition and synthesis. It is argued that this unified approach to coding has the potential to achieve low data rates whilst preserving speech quality and paralinguistic information. A simple coding scheme is described which establishes the principles of this approach. Formant analysis is applied to the input speech, and the formant features are input to a linear-trajectory segmental hidden Markov model recognizer to locate segment boundaries. The formant parameters for each segment are coded using a linear trajectory description, and used to drive a parallel-formant synthesizer to reproduce the utterance at the receiver. The coding method has been tested on utterances from a variety of speakers. In the current system, which has not yet been optimised for coding efficiency, speech is typically coded at 600-1000 bits/s with good intelligibility, whilst preserving speaker characteristics.
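The abstract does not say how each segment's linear trajectory is parameterised. The following Python sketch shows one plausible choice: fit a least-squares straight line to the frame-by-frame values of a single formant parameter within a segment, then keep only the fitted values at the segment's first and last frames. The receiver rebuilds the frames by linear interpolation before driving the synthesizer. The function names and the start/end-value parameterisation are hypothetical illustrations, not the coder described in the paper. The sketch also leaves out quantisation and segment-boundary coding.

```python
from typing import List, Tuple


def fit_linear_trajectory(frames: List[float]) -> Tuple[float, float]:
    """Least-squares straight-line fit to one formant parameter over a segment.

    Hypothetical parameterisation: the line is summarised by its fitted
    values at the first and last frame of the segment.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("segment must contain at least one frame")
    if n == 1:
        # A single frame defines a flat trajectory.
        return frames[0], frames[0]
    t_mean = (n - 1) / 2  # mean frame index 0..n-1
    y_mean = sum(frames) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(frames))
    slope = sxy / sxx  # change per frame
    start = y_mean - slope * t_mean  # fitted value at frame 0
    end = start + slope * (n - 1)  # fitted value at the last frame
    return start, end


def reconstruct_trajectory(start: float, end: float, n: int) -> List[float]:
    """Receiver side: regenerate n frames by linear interpolation."""
    if n == 1:
        return [start]
    return [start + (end - start) * t / (n - 1) for t in range(n)]


if __name__ == "__main__":
    # Illustrative F1 values (Hz) for a 4-frame segment, not data from the paper.
    f1 = [500.0, 530.0, 530.0, 560.0]
    s, e = fit_linear_trajectory(f1)
    print(s, e, reconstruct_trajectory(s, e, len(f1)))
```

With this choice, a segment costs two values per formant parameter no matter how many frames it spans. That is the intuition behind coding at the segment level rather than the frame level. The actual bit rate would depend on how the values and segment durations are quantised, which the abstract does not specify.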
Bibliographic reference. Holmes, Wendy J. (1998): "Towards a unified model for low bit-rate speech coding using a recognition-synthesis approach", In ICSLP-1998, paper 0553.