ISCA Archive Interspeech 2007

Prosody-enriched lattices for improved syllable recognition

Sankaranarayanan Ananthakrishnan, Shrikanth S. Narayanan

Automatic recognition of syllables is useful for many spoken language applications, such as speech recognition and spoken document retrieval. Short-term spectral properties (such as mel-frequency cepstral coefficients, or MFCCs) are usually the features of choice for such systems, which typically ignore suprasegmental (prosodic) cues that manifest themselves at the syllable, word, and utterance levels. Previous work has shown that categorical representations of prosody correlate well with lexical entities. In this paper, we attempt to exploit this relationship by enriching syllable-level lattices, generated by a standard speech recognizer, with categorical prosodic events to improve syllable recognition performance. With the enriched lattices, we obtain a 2% relative improvement in syllable error rate over the baseline system on a read speech task (the Boston University Radio News Corpus).

doi: 10.21437/Interspeech.2007-506

Cite as: Ananthakrishnan, S., Narayanan, S.S. (2007) Prosody-enriched lattices for improved syllable recognition. Proc. Interspeech 2007, 1813-1816, doi: 10.21437/Interspeech.2007-506

  author={Sankaranarayanan Ananthakrishnan and Shrikanth S. Narayanan},
  title={{Prosody-enriched lattices for improved syllable recognition}},
  booktitle={Proc. Interspeech 2007},