5th International Conference on Spoken Language Processing
Collections of speech documents can be searched using speech retrieval, where the documents are processed by a speech recogniser to give text that can be searched by text retrieval techniques. We investigated the use of a phoneme-based recogniser to obtain phoneme sequences. We found that phoneme recognition performs worse than word recognition, because of the lack of context and the difficulty of detecting phoneme boundaries. Comparing the transcriptions of two different phoneme-based recognisers, we found that recognition performance was affected by training on well-defined phoneme data, the lack of a language model, and the lack of a context-dependent model. Retrieval using trigrams performed better than retrieval using quadgrams, because the longer n-gram features contained too many transcription errors. Comparing the phonetic transcriptions from a word recogniser with those from a phoneme recogniser, we found that using 61 phones modelled with an algorithmic approach was better than using 40 phones modelled with a dictionary approach.
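The sensitivity of longer n-grams to transcription errors can be illustrated with a minimal sketch. It is not the indexing or ranking method used in the paper: the phoneme symbols, the function names `phoneme_ngrams` and `overlap_score`, and the simple shared-n-gram count are all assumptions chosen for illustration. A single substituted phoneme corrupts every n-gram that spans it, so it removes n matching features, and longer n-grams lose a larger share of their matches.

```python
from collections import Counter

def phoneme_ngrams(phonemes, n):
    """Return the overlapping n-grams (as tuples) of a phoneme sequence."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]

def overlap_score(query, document, n):
    """Count n-gram occurrences shared by query and document.

    Hypothetical scoring (multiset intersection), used only to show the
    effect of recognition errors; it is not the paper's ranking function.
    """
    q = Counter(phoneme_ngrams(query, n))
    d = Counter(phoneme_ngrams(document, n))
    return sum((q & d).values())

# Made-up phoneme strings: the document transcription contains one
# substitution error ("ih" recognised as "eh").
query = ["s", "p", "iy", "ch", "r", "ih", "t", "r", "iy", "v", "ax", "l"]
document = ["s", "p", "iy", "ch", "r", "eh", "t", "r", "iy", "v", "ax", "l"]

trigram_hits = overlap_score(query, document, 3)   # 7 of 10 trigrams survive
quadgram_hits = overlap_score(query, document, 4)  # 5 of 9 quadgrams survive
print(trigram_hits, quadgram_hits)
```

In this toy case one error removes three of ten trigrams but four of nine quadgrams, which is consistent with the direction of the reported result, though it says nothing about its magnitude on real data.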
Bibliographic reference. Ng, Corinna / Wilkinson, Ross / Zobel, Justin (1998): "Factors affecting speech retrieval", In ICSLP-1998, paper 0740.