
Sixth International Conference on Spoken Language Processing
(ICSLP 2000)
Beijing, China
October 16-20, 2000

Using Bayesian Belief Networks for Model Duration in Text-to-Speech Systems
Olga Goubanova, Paul Taylor
Centre for Speech Technology Research,
University of Edinburgh, UK
The problems of database imbalance and factor interaction make
modelling of segment duration in text-to-speech systems a challenging
task. We therefore propose a probabilistic Bayesian belief
network (BN) approach to tackle data sparsity and factor interaction
problems. The belief network approach gives good estimates
when data are missing or incomplete. It also captures factor
interaction concisely, as causal relationships among the
nodes of a directed acyclic graph (DAG). Furthermore, a belief
network approach allows a significant reduction of the number
of parameters to be estimated. In our work, we model segment
duration as a hybrid Bayesian network consisting of discrete and
continuous nodes; each node in the network represents a linguistic
factor that affects segmental duration. The interaction between
the factors is represented as conditional dependence relations in
the graphical model. We compared the results of the belief network
model with those of a sums-of-products (SoP) model and a classification
and regression tree (CART) model. We trained and tested all
three models on the same data. Our new model significantly outperforms
CART: the belief network achieves an RMS error of 5
milliseconds, compared with 20 ms for CART. The SoP model
produces an error of 9 ms, so our new model is no
worse in terms of final performance. Moreover, we think our
model has many other advantages over SoP; for instance,
it is much easier to configure and to experiment with new features.
This should make it easier to adapt to new languages.
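
The parameter reduction mentioned above can be illustrated with a small counting sketch. The factor names, their cardinalities, and the graph structure below are hypothetical and chosen only for illustration; they are not the factor set or network used in the paper. The sketch assumes that each discrete node has a conditional probability table and that the continuous duration node has one Gaussian (mean and variance) per configuration of its discrete parents.

```python
from math import prod

# Hypothetical linguistic factors and their number of values
# (illustrative only, not taken from the paper).
cardinality = {"phone_class": 5, "stress": 2, "position": 3, "accent": 2}

# Hypothetical DAG over the discrete factors: node -> list of parents.
discrete_parents = {
    "phone_class": [],
    "stress": [],
    "position": [],
    "accent": ["stress"],
}

# Hypothetical choice: duration depends directly on only two factors.
duration_parents = ["phone_class", "stress"]


def discrete_cpt_params(card, parents):
    """Free parameters in the conditional probability tables of the discrete nodes."""
    return sum((card[n] - 1) * prod(card[p] for p in parents[n]) for n in card)


def gaussian_node_params(card, node_parents):
    """One mean and one variance per configuration of the discrete parents."""
    return 2 * prod(card[p] for p in node_parents)


# Factored hybrid network: small tables plus a Gaussian per parent configuration.
factored = (discrete_cpt_params(cardinality, discrete_parents)
            + gaussian_node_params(cardinality, duration_parents))

# Unfactored alternative: a full joint table over all factors, and a
# Gaussian for every combination of all four factors.
unfactored = ((prod(cardinality.values()) - 1)
              + gaussian_node_params(cardinality, list(cardinality)))

print(factored, unfactored)  # prints "29 179"
```

Under these assumed cardinalities the factored network needs 29 parameters against 179 for the unfactored model; the gap widens quickly as factors are added, which is why sparse data are easier to handle in the factored form.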
Bibliographic reference.
Goubanova, Olga / Taylor, Paul (2000):
"Using Bayesian belief networks for model duration in text-to-speech systems",
In ICSLP-2000, vol.2, 427-430.