The choice of speech unit affects the accuracy, complexity, expandability, and ease of adaptation of automatic speech recognition (ASR) systems to speaker and environmental variations. This paper explores a method of subword modelling based on the concept of multi-phone strings. The motivation for using longer-duration multi-phone strings is to reduce the loss of contextual information, cross-phone correlation, and transitions. Multi-phone strings are an alternative to context-dependent phones, and they include many syllables. An advantage of multi-phone units is that each monophone sequence has more than one valid multi-phone transcription, which can be used to improve ASR accuracy. A particular case of multi-phone strings, namely phone-pairs, is investigated in detail. Experimental evaluations on TIMIT and WSJCAM0 are presented.
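The claim that one monophone sequence admits several multi-phone transcriptions can be illustrated with a minimal sketch. It assumes, purely for illustration, that a transcription may mix single phones with adjacent phone-pairs; the abstract does not specify the unit inventory, and the function name `segmentations`, the `max_len` parameter, and the example phone labels are hypothetical rather than taken from the paper.

```python
from typing import List, Tuple

Unit = Tuple[str, ...]


def segmentations(phones: List[str], max_len: int = 2) -> List[List[Unit]]:
    """Enumerate every split of `phones` into contiguous units of 1..max_len phones.

    Assumption (not from the paper): single phones and phone-pairs may be
    mixed freely within one transcription.
    """
    if not phones:
        return [[]]  # exactly one way to segment an empty sequence
    results: List[List[Unit]] = []
    for n in range(1, min(max_len, len(phones)) + 1):
        head = tuple(phones[:n])  # candidate first unit of n phones
        for tail in segmentations(phones[n:], max_len):
            results.append([head] + tail)
    return results


if __name__ == "__main__":
    # Illustrative three-phone sequence; the labels are placeholders.
    for seg in segmentations(["k", "ae", "t"]):
        print(" | ".join("+".join(unit) for unit in seg))
```

Under this assumption a three-phone sequence has three transcriptions (all single phones, or a pair at either end), and the count grows quickly with sequence length, which is the redundancy a recogniser could exploit.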
Cite as: O'Neill, P., Vaseghi, S., Doherty, B., Tan, W.H., McCourt, P. (1998) Multi-phone strings as subword units for speech recognition. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0178, doi: 10.21437/ICSLP.1998-672
@inproceedings{oneill98_icslp,
  author={Philip O'Neill and Saeed Vaseghi and Bernard Doherty and Wooi Haw Tan and Paul McCourt},
  title={{Multi-phone strings as subword units for speech recognition}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0178},
  doi={10.21437/ICSLP.1998-672}
}