8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Data-Driven Pronunciation Modeling for ASR Using Acoustic Subword Units

Thurid Spiess (1), Britta Wrede (2), Gernot A. Fink (1), Franz Kummert (1)

(1) Universität Bielefeld, Germany
(2) International Computer Science Institute, USA

We describe a method to model pronunciation variation for ASR in a data-driven way, namely by using automatically derived acoustic subword units. The unit inventory is designed to produce maximally separable pronunciation variants of words, while only the variants most important for the particular application are trained. In doing so, the optimal number of variants per word is determined iteratively. All of this is accomplished almost fully automatically by means of a state splitting algorithm and a variant distance measure. Compared to a baseline system that uses triphones as subword units and a minimal set of pronunciation variants, this method achieved a relative word error rate reduction of 10%.
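The abstract does not define the variant distance measure or how variants are pruned. As a purely illustrative sketch, assume that each pronunciation variant is a sequence of acoustic subword unit labels, that plain edit distance between label sequences stands in for the variant distance (an assumption, not the paper's measure), and that a hypothetical threshold `min_dist` decides when two variants are too similar to keep apart. Under those assumptions, keeping only well-separated, frequently observed variants can be shown as a greedy merge. The names `edit_distance`, `merge_variants`, and `min_dist` are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences of unit labels.

    Stand-in for the (unspecified) variant distance measure.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]


def merge_variants(counts, min_dist):
    """Greedily merge the closest pair of variants of one word.

    counts maps a variant (tuple of unit labels) to its observed frequency.
    Merging stops once every remaining pair is at least min_dist apart;
    the rarer variant of the closest pair is folded into the more frequent one.
    The input dict is not modified.
    """
    counts = dict(counts)
    while len(counts) > 1:
        a, b = min(combinations(counts, 2), key=lambda p: edit_distance(*p))
        if edit_distance(a, b) >= min_dist:
            break
        keep, drop = (a, b) if counts[a] >= counts[b] else (b, a)
        counts[keep] += counts.pop(drop)
    return counts


# Hypothetical variants of a single word, written as unit-label sequences.
variants = {("u1", "u4", "u7"): 50, ("u1", "u4", "u8"): 5, ("u2", "u5", "u9", "u3"): 20}
print(merge_variants(variants, min_dist=2))
# {('u1', 'u4', 'u7'): 55, ('u2', 'u5', 'u9', 'u3'): 20}
```

Here the two variants differing in a single unit are merged, while the clearly distinct third variant survives. Raising `min_dist` would yield fewer, more separable variants per word, which mirrors (only loosely) the trade-off the abstract describes.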


Bibliographic reference.  Spiess, Thurid / Wrede, Britta / Fink, Gernot A. / Kummert, Franz (2003): "Data-driven pronunciation modeling for ASR using acoustic subword units", In EUROSPEECH-2003, 2549-2552.