First European Conference on Speech Communication and Technology

Paris, France
September 27-29, 1989

Statistical Analysis of Large-Scale Lexical Corpuses in the Context of Continuous Speech Recognition Systems (CSR Systems)

Michael Bundgaard

Institute of General and Applied Linguistics, University of Copenhagen, Copenhagen, Denmark

A number of statistical properties of the allophonic and prosodic distributions of the 8000 most frequent words of Danish were investigated. With continuous speech recognition (CSR) in mind, the distributions of the words were examined under different degrees of partial description of the allophones, also taking incorrect segmentation into account to some extent. The effect of disregarding unstressed syllables, as well as entire allophone groups, in the lexicon lookup was examined. The metric characteristics of the word-initial syllable were uncovered in an effort to find a way of detecting syllable boundaries.

Bibliographic reference. Bundgaard, Michael (1989): "Statistical analysis of large-scale lexical corpuses in the context of continuous speech recognition systems (CSR systems)", in EUROSPEECH-1989, 1098-1101.