First International Conference on Spoken Language Processing (ICSLP 90)

Kobe, Japan
November 18-22, 1990

Description of Acoustic Variations by Tree-Based Phone Modeling

Satoru Hayamizu (1), Kai-Fu Lee (2), Hsiao-Wuen Hon (2)

(1) Electrotechnical Laboratory, Tsukuba Science City, Japan
(2) Carnegie Mellon University, Pittsburgh, PA, USA

This paper discusses the use of tree-based phone modeling to describe acoustic variations of speech and its application to speech recognition systems. Many sources of variability affect the realization of a phoneme: phonetic context, speaker, stress, speaking rate, and so on. Explicitly modeling these sources of variability yields more accurate and more detailed phone models, but requires a large amount of training data. Tree-based phone modeling is studied as a solution to this problem through three case studies: phone models with large VQ codebook sizes, decision tree clustering, and speaker clustering. All three are evaluated in speaker-independent continuous speech recognition experiments with a 991-word vocabulary. Tree-based phone modeling is shown to improve recognition in all three cases and to provide a good guide for achieving both trainability and generalizability.
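The abstract does not spell out how a phone tree is grown. As a hedged illustration of the general idea behind decision tree clustering of phone models, the sketch below greedily splits the training tokens of one phone with yes/no context questions. It picks the split that most reduces the count-weighted entropy of the pooled discrete VQ codeword distributions, and it refuses any split that would leave a child with fewer than `min_count` observations. The entropy criterion, the thresholds, the toy question set and data, and all names (`grow`, `leaf_for`, `min_count`, `min_gain`) are assumptions for illustration, not the authors' method.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: a training token is (phonetic context, VQ codeword counts).
Context = Dict[str, str]
Token = Tuple[Context, Counter]
Question = Callable[[Context], bool]


def weighted_entropy(counts: Counter) -> float:
    """N * H(p) in bits for a discrete codeword distribution with N total counts."""
    total = sum(counts.values())
    return -sum(c * math.log2(c / total) for c in counts.values() if c > 0)


def pooled(tokens: List[Token]) -> Counter:
    """Sum the codeword counts of all tokens (the model a node would train)."""
    acc: Counter = Counter()
    for _, counts in tokens:
        acc.update(counts)
    return acc


@dataclass
class Node:
    counts: Counter                    # pooled training data for this node's model
    question: Optional[str] = None     # None marks a leaf
    ask: Optional[Question] = None
    yes: Optional["Node"] = None
    no: Optional["Node"] = None


def grow(tokens: List[Token], questions: Dict[str, Question],
         min_count: int = 50, min_gain: float = 1.0) -> Node:
    """Greedy top-down splitting; stop when no split is both trainable and useful."""
    parent = pooled(tokens)
    best = None
    for name, ask in questions.items():
        yes = [t for t in tokens if ask(t[0])]
        no = [t for t in tokens if not ask(t[0])]
        cy, cn = pooled(yes), pooled(no)
        # Trainability: each child must keep at least min_count observations.
        if sum(cy.values()) < min_count or sum(cn.values()) < min_count:
            continue
        gain = weighted_entropy(parent) - weighted_entropy(cy) - weighted_entropy(cn)
        if best is None or gain > best[0]:
            best = (gain, name, ask, yes, no)
    if best is None or best[0] < min_gain:
        return Node(counts=parent)
    _, name, ask, yes, no = best
    return Node(counts=parent, question=name, ask=ask,
                yes=grow(yes, questions, min_count, min_gain),
                no=grow(no, questions, min_count, min_gain))


def leaf_for(node: Node, context: Context) -> Node:
    """Route any context, including one never seen in training, to a leaf model."""
    while node.ask is not None:
        node = node.yes if node.ask(context) else node.no
    return node


# Toy data for one phone (hypothetical contexts and codeword counts).
questions = {
    "left_is_nasal": lambda ctx: ctx["left"] in {"m", "n"},
    "right_is_vowel": lambda ctx: ctx["right"] in {"aa", "iy", "uw"},
}
tokens = [
    ({"left": "m", "right": "aa"}, Counter({0: 40, 1: 10})),
    ({"left": "n", "right": "iy"}, Counter({0: 35, 1: 15})),
    ({"left": "s", "right": "aa"}, Counter({2: 45, 3: 5})),
    ({"left": "k", "right": "iy"}, Counter({2: 40, 3: 10})),
]
tree = grow(tokens, questions)
print(tree.question)                                              # left_is_nasal
print(leaf_for(tree, {"left": "n", "right": "uw"}) is tree.yes)   # True (unseen context)
```

In this sketch the `min_count` check is where trainability enters: calling `grow(tokens, questions, min_count=150)` on the toy data allows no split, so all contexts share one pooled model. Generalizability comes from `leaf_for`, which assigns a context absent from the training tokens to an existing, already trained leaf.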


