EUROSPEECH 2003 - INTERSPEECH 2003
In recent speech recognition systems, the basic unit of recognition is generally the speech sound. Each speech sound is associated with an acoustic model whose parameters are estimated by statistical methods. The suitability of the training data fundamentally determines the efficiency of the recognizer. Present-day technology and computational capacity allow speech recognition systems to operate with large dictionaries and complex language models, but the quality of the basic pattern-matching units still has a large influence on the reliability of the system. In the experiments presented here, we investigated the effects of different training methods on recognition accuracy; namely, the effects of increasing the number of speakers and the number of mixtures were examined for pronunciation modeling and context-independent models.
Bibliographic reference. Fegyo, Tibor / Mihajlik, Peter / Tatai, Peter (2003): "Comparative study on Hungarian acoustic model sets and training methods", In EUROSPEECH-2003, 829-832.