15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Neural Network Phone Duration Model for Speech Recognition

Tanel Alumäe

Tallinn University of Technology, Estonia

In this paper, we describe a novel phone duration model that is used to improve the accuracy of a large vocabulary speech recognition system based on state-of-the-art speaker-adapted DNN acoustic models. The duration model uses a neural network to estimate the probability density function of a phone's duration from the phone's contextual features; the resulting duration scores are then applied in word lattice rescoring. Experimental results are given for Estonian, English and Finnish transcription tasks. An absolute word error rate reduction of 0.8–1.4% is observed across all evaluation sets.
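To make the rescoring idea concrete, the following is a minimal sketch of how a duration density could be folded into hypothesis scores. It is not the paper's implementation. Several things here are assumptions made for illustration only: the log-normal form of the density, the hand-written `toy_duration_predictor` that stands in for the neural network, the feature names `is_vowel` and `is_stressed`, the `duration_weight` value, and the use of a short list of hypotheses in place of a real word lattice.

```python
import math


def lognormal_logpdf(duration_frames, mu, sigma):
    """Log density of a log-normal distribution at a positive duration."""
    x = float(duration_frames)
    return (-math.log(x * sigma * math.sqrt(2.0 * math.pi))
            - (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2))


def toy_duration_predictor(features):
    """Hypothetical stand-in for the neural network.

    Maps a phone's contextual features to the (mu, sigma) parameters of an
    assumed log-normal duration density. A real model would be trained.
    """
    mu = 1.5 + 0.6 * features["is_vowel"] + 0.2 * features["is_stressed"]
    sigma = 0.4
    return mu, sigma


def duration_score(phones, predictor):
    """Sum of per-phone duration log densities for one hypothesis."""
    total = 0.0
    for phone in phones:
        mu, sigma = predictor(phone["features"])
        total += lognormal_logpdf(phone["duration"], mu, sigma)
    return total


def rescore(hypotheses, predictor, duration_weight=0.1):
    """Return the words of the hypothesis with the best combined score."""
    def combined(hyp):
        return hyp["base_score"] + duration_weight * duration_score(
            hyp["phones"], predictor)
    return max(hypotheses, key=combined)["words"]


# Two made-up hypotheses; durations are in frames, base scores are log scores.
hypotheses = [
    {"words": "hypothesis A", "base_score": -100.0, "phones": [
        {"duration": 8, "features": {"is_vowel": 1, "is_stressed": 0}},
        {"duration": 5, "features": {"is_vowel": 0, "is_stressed": 0}},
    ]},
    {"words": "hypothesis B", "base_score": -99.8, "phones": [
        {"duration": 40, "features": {"is_vowel": 1, "is_stressed": 0}},
        {"duration": 1, "features": {"is_vowel": 0, "is_stressed": 0}},
    ]},
]

best = rescore(hypotheses, toy_duration_predictor)
print(best)
```

In this toy input the second hypothesis has a slightly better base score but implausible phone durations, so the weighted duration term shifts the choice to the first hypothesis. In a real system the weight would be tuned on held-out data and the scores would be attached to lattice arcs rather than to whole hypotheses.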


Bibliographic reference. Alumäe, Tanel (2014): "Neural network phone duration model for speech recognition", in INTERSPEECH-2014, 1204-1208.