ISCA Archive SPKD 2008

An experimental study on continuous phone recognition with little or no language-specific training data

Dau-Cheng Lyu, Sabato Marco Siniscalchi, Chin-Hui Lee

language-specific speech training data. The phone recognizer integrates information from three levels: (1) frame-based speech attribute detectors, (2) artificial neural network based phone event mergers, and (3) decoding-based evidence verifiers. With a set of acoustic phonetic attributes defined over a number of available languages, a collection of attribute-to-phone mapping rules can be specified either in a language-dependent way, one for each language, or independently for all languages, provided the attribute specification is complete enough to cover all phones and the phone definition is universal enough to cover all spoken languages. We report experimental results for Japanese phone recognition on the OGI Multilingual Speech Corpus. Notably, good performance can be achieved without using any Japanese speech training data, and the phone accuracy rates vary depending on how the attribute detectors and phone mergers are configured. Further improvement is observed when a small amount of Japanese data is added to train the attribute-to-phone mergers.
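To make the idea of attribute-to-phone mapping rules concrete, the following is a minimal sketch and not the authors' system. It assumes frame-level attribute posteriors are already available and replaces the neural network mergers with a hand-written rule table; the decoding-based verification stage is omitted. The attribute inventory, the phone rules, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical attribute inventory; not taken from the paper.
ATTRIBUTES = ["vowel", "nasal", "stop", "fricative", "voiced", "bilabial", "alveolar"]

# Hypothetical attribute-to-phone rules: each phone lists the attributes
# that are expected to be active while it is being spoken.
PHONE_RULES = {
    "m": {"nasal", "voiced", "bilabial"},
    "n": {"nasal", "voiced", "alveolar"},
    "p": {"stop", "bilabial"},
    "b": {"stop", "voiced", "bilabial"},
    "s": {"fricative", "alveolar"},
    "a": {"vowel", "voiced"},
}


def make_frame(active, hi=0.9, lo=0.1):
    """Build a toy frame of attribute posteriors (stand-in for detector output)."""
    return {attr: (hi if attr in active else lo) for attr in ATTRIBUTES}


def score_phone(frame, required):
    """Score one phone for one frame.

    Required attributes contribute their posterior p; all other
    attributes contribute (1 - p), so spurious detections are penalized.
    """
    score = 1.0
    for attr in ATTRIBUTES:
        p = frame[attr]
        score *= p if attr in required else (1.0 - p)
    return score


def map_frame_to_phone(frame):
    """Pick the phone whose rule best matches the frame's attribute posteriors."""
    scores = {phone: score_phone(frame, req) for phone, req in PHONE_RULES.items()}
    return max(scores, key=scores.get)


def collapse(labels):
    """Merge runs of identical frame labels into a phone sequence."""
    sequence = []
    for label in labels:
        if not sequence or sequence[-1] != label:
            sequence.append(label)
    return sequence


def recognize(frames):
    """Frame-wise rule mapping followed by run-length collapsing."""
    return collapse([map_frame_to_phone(f) for f in frames])


if __name__ == "__main__":
    frames = [make_frame({"nasal", "voiced", "bilabial"})] * 2 + [make_frame({"vowel", "voiced"})] * 3
    print(recognize(frames))
```

Because the rule table refers only to attributes and not to any one language's acoustic models, the same detectors could in principle be reused for a new language by supplying a different `PHONE_RULES` table, which is the property the abstract relies on.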


Cite as: Lyu, D.-C., Siniscalchi, S.M., Lee, C.-H. (2008) An experimental study on continuous phone recognition with little or no language-specific training data. Proc. ISCA ITRW on Speech Analysis and Processing for Knowledge Discovery, paper 005

@inproceedings{lyu08_spkd,
  author={Dau-Cheng Lyu and Sabato Marco Siniscalchi and Chin-Hui Lee},
  title={{An experimental study on continuous phone recognition with little or no language-specific training data}},
  year=2008,
  booktitle={Proc. ISCA ITRW on Speech Analysis and Processing for Knowledge Discovery},
  pages={paper 005}
}