European Conference on Speech Technology, Edinburgh, Scotland, UK
This paper presents a study of the use of rough spectral features, which can be detected in
the speech signal with a high degree of certainty, for large-vocabulary (15,000 words)
isolated-word recognition. These features must permit correct preclassification, with a finer
recognition process applied afterward to the resulting vocabulary subset.
For correct cohort (class) access, it is essential to propose several feature strings for any
given word in order to take into account phonological variability, and it is necessary to compact
several occurrences of the same feature into one to avoid segmentation-based errors. We use
context-dependent rewrite rules to transform a syllable in phonetic form into one or several strings
of features; an engine applies these rules to whole words, thereby determining the cohorts
labelled by each feature string.
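
The cohort construction described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature names, the `ROUGH_CLASSES` table, and the toy lexicon are hypothetical, and a context-free phoneme-to-feature table stands in for the context-dependent rewrite rules of the study. It shows the two requirements stated above: several feature strings per word, and compaction of repeated features into one.

```python
from collections import defaultdict
from itertools import groupby, product

# Hypothetical table: each phoneme maps to one or more rough features.
# Several alternatives model phonological variability (toy example only).
ROUGH_CLASSES = {
    "p": ["STOP"], "t": ["STOP"], "k": ["STOP"],
    "s": ["FRIC"], "f": ["FRIC"],
    "m": ["NASAL"], "n": ["NASAL"],
    "a": ["VOWEL"], "i": ["VOWEL"], "u": ["VOWEL"],
    "l": ["LIQUID", "VOWEL"],
}


def compact(features):
    """Merge consecutive occurrences of the same feature into one."""
    return tuple(key for key, _ in groupby(features))


def feature_strings(phonemes):
    """Return every compacted feature string a phoneme sequence can yield."""
    alternatives = [ROUGH_CLASSES[p] for p in phonemes]
    return {compact(choice) for choice in product(*alternatives)}


def build_cohorts(lexicon):
    """Group words into cohorts labelled by feature string.

    A word with several feature strings belongs to several cohorts.
    """
    cohorts = defaultdict(set)
    for word, phonemes in lexicon.items():
        for fs in feature_strings(phonemes):
            cohorts[fs].add(word)
    return dict(cohorts)


# Hypothetical toy lexicon in a simplified phonetic form.
LEXICON = {
    "pas": ["p", "a"],
    "tas": ["t", "a"],
    "sa": ["s", "a"],
    "mal": ["m", "a", "l"],
}

if __name__ == "__main__":
    for label, words in sorted(build_cohorts(LEXICON).items()):
        print(label, sorted(words))
```

In this sketch "mal" lands in two cohorts because its final consonant has two alternative features, one of which merges with the preceding vowel after compaction.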
The study was carried out on a 17,000-form vocabulary with confirmation on a 270,000-form
one. Maximum, mean, and expected cohort sizes have been calculated. The definition of expected
size has been generalised in order to take into account the multiple strings for a given word.
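
As a rough illustration of these three statistics, the sketch below computes them for a small hand-made set of cohorts. The abstract does not give the generalised definition of expected size, so the one used here is an assumption: each word is taken to be equally likely, and each of its feature strings is taken to be equally likely to be the one observed. With one string per word this reduces to the usual sum of squared cohort sizes divided by the vocabulary size. The function name `cohort_statistics` and the data in `TOY_COHORTS` are hypothetical.

```python
def cohort_statistics(cohorts):
    """Compute maximum, mean, and expected cohort size.

    `cohorts` maps a feature string (tuple) to the set of words it labels.
    The expected size assumes uniform word frequencies and a uniform choice
    among the feature strings of each word (an assumption of this sketch).
    """
    sizes = [len(words) for words in cohorts.values()]

    # Invert the mapping: word -> list of feature strings that label it.
    strings_of = {}
    for fs, words in cohorts.items():
        for word in words:
            strings_of.setdefault(word, []).append(fs)

    # Average, over words, of the mean size of the cohorts the word can reach.
    expected = sum(
        sum(len(cohorts[fs]) for fs in strings) / len(strings)
        for strings in strings_of.values()
    ) / len(strings_of)

    return {
        "max": max(sizes),
        "mean": sum(sizes) / len(sizes),
        "expected": expected,
    }


# Hypothetical cohorts: "mal" has two feature strings, the others have one.
TOY_COHORTS = {
    ("STOP", "VOWEL"): {"pas", "tas"},
    ("FRIC", "VOWEL"): {"sa"},
    ("NASAL", "VOWEL"): {"mal"},
    ("NASAL", "VOWEL", "LIQUID"): {"mal"},
}

if __name__ == "__main__":
    print(cohort_statistics(TOY_COHORTS))
```

Other generalisations are possible, for example weighting words by frequency or taking the union of all cohorts a word can reach; the choice affects how much reduction the multiple-string representation appears to give.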
The evaluation shows that, for the French language, the use of rough spectral features is
worthwhile but reduces the vocabulary less than other studies suggest, because of the
multiple-string representation.
Bibliographic reference. Adda, Gilles / Eskénazi, Maxine / Stem, P. E. (1987): "The use of rough spectral features for large vocabulary recognition", In ECST-1987, 1171-1174.