Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Using Bayesian Belief Networks for Model Duration in Text-to-Speech Systems

Olga Goubanova, Paul Taylor

Centre for Speech Technology Research, University of Edinburgh, UK

Database imbalance and factor interaction make modelling segment duration in text-to-speech systems a challenging task. We therefore propose a probabilistic Bayesian belief network (BN) approach to tackle the data sparsity and factor interaction problems. The belief network approach gives good estimates when data are missing or incomplete. It also captures factor interaction concisely, as causal relationships among the nodes of a directed acyclic graph (DAG). Furthermore, it allows a significant reduction in the number of parameters to be estimated. In our work, we model segment duration as a hybrid Bayesian network consisting of discrete and continuous nodes; each node in the network represents a linguistic factor that affects segmental duration. The interaction between the factors is represented as conditional dependence relations in the graphical model. We compared the results of the belief network model with those of a sums-of-products (SoP) model and a classification and regression tree (CART) model. We trained and tested all three models on the same data. Our new model significantly outperforms CART: the belief network achieves an RMS error of 5 milliseconds compared with 20 ms for CART. The SoP model produces an error of 9 ms, so our new model is no worse in terms of final performance. Beyond accuracy, we think our model has other advantages over SoP; for instance, it is much easier to configure and to experiment with new features. This should make it easier to adapt to new languages.
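
The parameter reduction mentioned in the abstract can be illustrated with a small counting exercise. The sketch below is not the authors' network: the factor names, cardinalities, and parent sets are hypothetical, chosen only to show how factoring a joint distribution along a DAG shrinks the number of free parameters, and how a continuous duration node with discrete parents might be counted if it is assumed to be Gaussian.

```python
from math import prod

# Hypothetical discrete linguistic factors and their number of values
# (illustrative only; not taken from the paper).
cardinality = {
    "phone_class": 5,
    "stress": 2,
    "position": 3,
    "following_class": 5,
}

# Hypothetical DAG over the discrete factors: node -> list of parents.
parents = {
    "phone_class": [],
    "stress": [],
    "position": ["stress"],
    "following_class": ["phone_class"],
}


def full_joint_params(card):
    """Free parameters of one unfactored joint table over all factors."""
    return prod(card.values()) - 1


def factored_params(card, dag):
    """Free parameters when each node has a table conditioned on its parents."""
    total = 0
    for node, node_parents in dag.items():
        parent_configs = prod(card[p] for p in node_parents)  # 1 if no parents
        total += parent_configs * (card[node] - 1)
    return total


def gaussian_leaf_params(card, node_parents):
    """Parameters of a continuous node assumed Gaussian: one mean and one
    variance per configuration of its discrete parents."""
    return 2 * prod(card[p] for p in node_parents)


if __name__ == "__main__":
    print("full joint table:", full_joint_params(cardinality))          # 149
    print("factored network:", factored_params(cardinality, parents))   # 29
    # Assumed parent set for the duration node, for illustration only.
    print("duration node:", gaussian_leaf_params(cardinality, ["phone_class", "stress"]))  # 20
```

With these assumed factors, the factored network needs 29 parameters for the discrete part instead of 149, and the gap widens quickly as more factors are added. Sparse data are then spread over far fewer estimates.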

Bibliographic reference. Goubanova, Olga / Taylor, Paul (2000): "Using Bayesian belief networks for model duration in text-to-speech systems", In ICSLP-2000, vol.2, 427-430.