September 22-25, 1997
The vast majority of work in continuous speech recognition uses phoneme-like units as the basic recognition component. The work presented here investigates the practicability of syllable-like units as the building blocks for recognition. A phonetically annotated telephony database is analysed at the syllable level, and a set of syllable-based HMMs is built. Refinements, including the introduction of syllable-level bigram probabilities, word- and syllable-level insertion penalties, and the investigation of different model topologies, are found to improve recogniser performance. The syllable-based recogniser achieves recognition accuracies of over 60%, compared with a baseline accuracy of 35% for monophone recognition. It is envisaged that syllable recognition could find practical application in a hybrid system, in which HMMs for the most common syllables would be used in conjunction with whole-word and phoneme models.
Bibliographic reference. Jones, Rhys James / Downey, Simon / Mason, John S. (1997): "Continuous speech recognition using syllables", In EUROSPEECH-1997, 1171-1174.