12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Semi-Automatic Acoustic Model Generation from Large Unsynchronized Audio and Text Chunks

Michele Alessandrini, Giorgio Biagetti, Alessandro Curzi, Claudio Turchetti

Università Politecnica delle Marche, Italy

This paper presents an effective technique for training an acoustic model from large, unsynchronized audio and text chunks. Given such a speech corpus, an algorithm is defined that automatically segments each chunk into smaller fragments and synchronizes them with the corresponding text. These smaller fragments are better suited to the standard model training algorithms used for automatic speech recognition systems. The proposed approach is particularly suitable for bootstrapping language models without relying on specialized training material or borrowing from models trained for other, similar languages. Extensive experimentation using the CMU Sphinx 4 recognizer and the SphinxTrain model generator, in a setting designed for large-vocabulary continuous speech recognition, shows the effectiveness of the approach.

Bibliographic reference.  Alessandrini, Michele / Biagetti, Giorgio / Curzi, Alessandro / Turchetti, Claudio (2011): "Semi-automatic acoustic model generation from large unsynchronized audio and text chunks", In INTERSPEECH-2011, 1681-1684.