Sixth European Conference on Speech Communication and Technology
Full contextual coverage and smoothed training of phonetic models are the main goals in acoustic modeling for continuous speech recognition systems. Traditionally, these goals have been addressed with clustering techniques, either bottom-up or top-down. The two approaches offer complementary benefits. Because their optimization is unrestricted, bottom-up algorithms can reach cluster configurations that are more homogeneous than those obtained by top-down clustering. However, the phonetic guidance used by top-down clustering generalizes over all contexts, so a model can be provided for contexts not seen during training. This paper reports a hybrid algorithm that combines the best properties of both approaches. The algorithm is applied to the hidden Markov models of demiphones, a phonetic unit recently introduced by the authors. Combining the demiphone with the hybrid algorithm yields a noticeable reduction in the size of the set of phonetic units without degrading performance.
Bibliographic reference. Marino, José B. / Nogueiras-Rodríguez, Albino (1999): "Top-down bottom-up hybrid clustering algorithm for acoustic-phonetic modeling of speech", In EUROSPEECH'99, 1343-1346.