In this paper we discuss the complementarity of group delay features with respect to other conventional acoustic features, and propose using this diverse information in the linguistic search space for robust speech recognition. A discriminability analysis is carried out on various classes of phonetic units. A class-based phonetic unit analysis is conducted to compare the suitability of different acoustic feature streams for recognizing different phonetic unit classes. The recognition results for isolated phonemic or syllabic units indicate the appropriate feature for each unit. We then describe the significance of the diversity of information present in the various acoustic features and its integration into the linguistic search space for syllable-based continuous speech recognition. A weighted average likelihood method is used, which appropriately weights the relevant acoustic feature for each syllable in question during Viterbi decoding. Integrating the complementarity of acoustic features into the linguistic search space in this way gives a reasonable reduction in word error rate (WER) compared to conventional single-stream or multi-stream acoustic features in experiments conducted on the TIMIT and DBIL databases.
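As an illustration of the weighted average likelihood idea, the following is a minimal sketch, not the authors' implementation. It assumes that each acoustic feature stream supplies a log-likelihood for every candidate syllable, that the combination is done in the log domain, and that non-negative per-syllable stream weights are already available; the abstract does not specify how the weights are derived. The function name, array layout, and numbers are all hypothetical.

```python
import numpy as np


def weighted_average_loglik(loglik, weights):
    """Combine per-stream log-likelihoods with per-syllable stream weights.

    loglik  : array of shape (n_streams, n_syllables), one log-likelihood per
              feature stream and candidate syllable (hypothetical layout).
    weights : array of the same shape holding non-negative weights; they are
              normalized so the weights for each syllable sum to 1.
    Returns an array of shape (n_syllables,) with the combined scores.
    """
    loglik = np.asarray(loglik, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if loglik.shape != weights.shape:
        raise ValueError("loglik and weights must have the same shape")
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    totals = weights.sum(axis=0)
    if np.any(totals == 0):
        raise ValueError("each syllable needs at least one non-zero weight")
    normalized = weights / totals  # broadcast over the stream axis
    return (normalized * loglik).sum(axis=0)


# Hypothetical scores: 2 feature streams (rows) x 3 candidate syllables (columns).
loglik = [[-10.0, -12.0, -9.0],
          [-11.0, -8.0, -13.0]]
# Hypothetical weights: stream 0 is trusted more for syllable 0,
# stream 1 for syllable 1, and both equally for syllable 2.
weights = [[3.0, 1.0, 1.0],
           [1.0, 3.0, 1.0]]

combined = weighted_average_loglik(loglik, weights)
best = int(np.argmax(combined))  # index of the best-scoring candidate syllable
```

In a decoder, a combined score of this kind would stand in for the single-stream acoustic score of each syllable hypothesis during the Viterbi search. Whether the original system averages likelihoods or log-likelihoods is not stated in the abstract, so the log-domain choice here is an assumption.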
Bibliographic reference. Ramya, R. / Hegde, Rajesh M. / Murthy, Hema A. (2008): "Significance of group delay based acoustic features in the linguistic search space for robust speech recognition", In INTERSPEECH-2008, 1537-1540.