13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Estimation of the Vocal Tract Shape of Nasals Using a Bayesian Scheme

Christian H. Kasess (1), Wolfgang Kreuzer (1), Ewald Enzinger (1,2), Nadja Kerschhofer-Puhalo (1)

(1) Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria
(2) School of Elec. Eng. & Telecom., Univ. of New South Wales, Australia

For nasal stops and nasalized vowels, one-tube models offer only an inadequate representation. To model the spectral components of nasal speech signals, a minimum of two connected tubes is necessary. Typically, the estimation of branched-tube area functions is based on a pole-zero model. The present paper introduces a variational Bayesian scheme under Gaussian assumptions to estimate the tube areas directly from the log-spectrum of the speech signal. Probabilistic priors are used to enforce smoothness of the tubes. The method is tested on recorded tokens of /m/ from several speakers using different prior variances. Results show that mild smoothness assumptions yield the best results in terms of model error and marginal likelihood. Furthermore, while yielding comparable fits, the estimated reflection coefficients from the Bayesian scheme show less intra-subject variability between tokens than an unregularized non-linear solver.
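
As a rough illustration of how reflection coefficients relate to tube areas, the following sketch converts between the two for a single unbranched chain of lossless tube sections. It is not the paper's branched-tube estimator: the function names, the unit starting area, and the sign convention r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) are assumptions made here for the example. Under that convention, r_k = 0 means two neighbouring sections have equal area, which is one way to see why pulling coefficients toward zero corresponds to a smoother tube.

```python
def areas_from_reflection(coeffs, first_area=1.0):
    """Build section areas from reflection coefficients.

    Assumed convention (not taken from the paper):
    r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k),
    hence A_{k+1} = A_k * (1 + r_k) / (1 - r_k).
    """
    areas = [first_area]
    for r in coeffs:
        if not -1.0 < r < 1.0:
            raise ValueError("reflection coefficients must lie in (-1, 1)")
        areas.append(areas[-1] * (1.0 + r) / (1.0 - r))
    return areas


def reflection_from_areas(areas):
    """Inverse mapping: one coefficient per junction between sections."""
    return [(b - a) / (b + a) for a, b in zip(areas, areas[1:])]


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to show the mapping.
    example = [0.0, 0.5]
    print(areas_from_reflection(example))
```

Areas are only determined up to the scale set by `first_area`, so comparisons between tokens are more naturally made on the coefficients or on area ratios.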

Index Terms: vocal tract, estimation, nasal stops, Bayesian statistics

Bibliographic reference. Kasess, Christian H. / Kreuzer, Wolfgang / Enzinger, Ewald / Kerschhofer-Puhalo, Nadja (2012): "Estimation of the vocal tract shape of nasals using a Bayesian scheme", in INTERSPEECH-2012, 699-702.