A number of statistical properties of the allophonic and prosodic distributions of the 8000 most frequent words of Danish were investigated. With continuous speech recognition (CSR) in mind, the distributions of the words were examined under different degrees of partial description of the allophones, also taking incorrect segmentation into account to some extent. The effect of disregarding unstressed syllables, as well as entire allophone groups, in the lexicon lookup was examined. The metric characteristics of the initial syllable at word level were uncovered in an effort to find a way of detecting syllable boundaries.
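As an illustration of lexicon lookup under a partial description, the sketch below maps each segment of a word to a broad class and groups words that become indistinguishable. It is not the paper's method or data. The class inventory (`BROAD_CLASS`), the six-word toy lexicon (`TOY_LEXICON`), and the helper names are all hypothetical, chosen only to show how ignoring entire allophone groups can enlarge the set of competing lexical candidates.

```python
from collections import defaultdict

# Hypothetical broad-class mapping (an assumption, not the paper's inventory).
BROAD_CLASS = {
    "p": "STOP", "t": "STOP", "k": "STOP", "b": "STOP", "d": "STOP", "g": "STOP",
    "s": "FRIC", "f": "FRIC", "h": "FRIC", "v": "FRIC",
    "m": "NAS", "n": "NAS",
    "l": "LIQ", "r": "LIQ",
    "a": "VOW", "e": "VOW", "i": "VOW", "o": "VOW", "u": "VOW",
}

# Invented toy lexicon with made-up transcriptions (not Danish data).
TOY_LEXICON = {
    "pat": ["p", "a", "t"],
    "bad": ["b", "a", "d"],
    "kit": ["k", "i", "t"],
    "sun": ["s", "u", "n"],
    "fin": ["f", "i", "n"],
    "lam": ["l", "a", "m"],
}


def partial_key(segments, ignore=frozenset()):
    """Return the broad-class sequence of a word, dropping ignored classes."""
    classes = (BROAD_CLASS[s] for s in segments)
    return tuple(c for c in classes if c not in ignore)


def cohorts(lexicon, ignore=frozenset()):
    """Group words that share the same partial description."""
    groups = defaultdict(list)
    for word, segments in lexicon.items():
        groups[partial_key(segments, ignore)].append(word)
    return dict(groups)


def expected_cohort_size(groups):
    """Average number of candidates returned when looking up a random word."""
    total = sum(len(words) for words in groups.values())
    return sum(len(words) ** 2 for words in groups.values()) / total


if __name__ == "__main__":
    full = cohorts(TOY_LEXICON)
    reduced = cohorts(TOY_LEXICON, ignore=frozenset({"FRIC", "LIQ"}))
    print(expected_cohort_size(full))
    print(expected_cohort_size(reduced))
```

In this toy setting, dropping the fricative and liquid classes merges two cohorts into one, so the expected number of candidates per lookup rises. The same kind of count could in principle be run over a real pronunciation lexicon.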
Bibliographic reference. Bundgaard, Michael (1989): "Statistical analysis of large-scale lexical corpuses in the context of continuous speech recognition systems (CSR systems)", In EUROSPEECH-1989, 1098-1101.