5th International Conference on Spoken Language Processing
The choice of speech unit affects the accuracy, complexity and expandability of automatic speech recognition (ASR) systems, and how easily they can be adapted to speaker and environmental variations. This paper explores a method of subword modelling based on the concept of multi-phone strings. The motivation for using longer-duration multi-phone strings is to reduce the loss of contextual information, cross-phone correlations and phone-to-phone transitions. Multi-phone strings are an alternative to context-dependent phones, and they include many syllables. An advantage of multi-phone units is that more than one valid multi-phone transcription exists for each monophone sequence; this can be used to improve ASR accuracy. A particular case of multi-phone strings, namely phone-pairs, is investigated in detail. Experimental evaluations on TIMIT and WSJCAM0 are presented.
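To make the point about multiple valid transcriptions concrete, the sketch below shows how a single monophone sequence admits more than one phone-pair transcription. It assumes that a sequence is covered by non-overlapping phone-pairs starting either at the first phone (offset 0) or the second phone (offset 1), with any phone left unpaired at either edge kept as a single-phone unit. The function names `pair_transcription` and `phone_pair_transcriptions`, and the `"a+b"` notation for a phone-pair, are hypothetical and are not taken from the paper. The abstract does not say how the alternative transcriptions are combined to improve accuracy, so this sketch only enumerates them.

```python
from typing import List


def pair_transcription(phones: List[str], offset: int) -> List[str]:
    """Segment a monophone sequence into phone-pair units (hypothetical scheme).

    Phones before `offset`, and a trailing phone that cannot be paired,
    are kept as single-phone units; this is an assumption of the sketch.
    """
    units = list(phones[:offset])  # leading single-phone unit(s)
    i = offset
    while i + 1 < len(phones):
        # Join two adjacent monophones into one phone-pair unit.
        units.append(phones[i] + "+" + phones[i + 1])
        i += 2
    if i < len(phones):
        units.append(phones[i])  # trailing single-phone unit
    return units


def phone_pair_transcriptions(phones: List[str]) -> List[List[str]]:
    """Return the distinct phone-pair transcriptions for offsets 0 and 1."""
    results: List[List[str]] = []
    for offset in (0, 1):
        transcription = pair_transcription(phones, offset)
        if transcription not in results:  # drop duplicates, e.g. for one phone
            results.append(transcription)
    return results


if __name__ == "__main__":
    # The monophone sequence /k ae t/ yields two phone-pair transcriptions.
    print(phone_pair_transcriptions(["k", "ae", "t"]))
```

For /k ae t/ this prints `[['k+ae', 't'], ['k', 'ae+t']]`: the same monophone string is described by two different unit sequences, which is the property the abstract identifies as usable for improving recognition accuracy.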
Bibliographic reference. O'Neill, Philip / Vaseghi, Saeed / Doherty, Bernard / Tan, Wooi Haw / McCourt, Paul (1998): "Multi-phone strings as subword units for speech recognition", In ICSLP-1998, paper 0178.