ISCA Archive ICSLP 1998

Multi-phone strings as subword units for speech recognition

Philip O'Neill, Saeed Vaseghi, Bernard Doherty, Wooi Haw Tan, Paul McCourt

The choice of speech unit affects the accuracy, complexity, expandability and ease of adaptation of ASR systems to speaker and environmental variations. This paper explores a method of subword modelling based on the concept of multi-phone strings. The motivation for using the longer-duration multi-phone strings is to reduce the loss of contextual information, cross-phone correlation, and transitions. Multi-phone strings are an alternative to context-dependent phones, and they include many of the syllables. An advantage of multi-phone units is the existence of more than one valid multi-phone transcription for each monophone sequence; this can be used to improve ASR accuracy. A particular case of multi-phone strings, namely phone-pairs, is investigated in detail. Experimental evaluations on TIMIT and WSJCAM0 are presented.
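To illustrate why a single monophone sequence can have more than one valid multi-phone transcription, the following is a minimal sketch that enumerates every way of splitting a phone sequence into consecutive units of one or two phones. The function name `segmentations`, the `max_len` parameter, the example phone sequence, and the choice to allow single phones alongside phone-pairs are all assumptions made for illustration; the abstract does not specify how the paper builds or selects its transcriptions.

```python
from typing import List, Sequence, Tuple

Unit = Tuple[str, ...]


def segmentations(phones: Sequence[str], max_len: int = 2) -> List[List[Unit]]:
    """Return every split of `phones` into consecutive units of 1..max_len phones.

    Hypothetical helper: max_len=2 gives single phones and phone-pairs only.
    """
    if not phones:
        return [[]]  # one way to segment an empty sequence: no units
    results: List[List[Unit]] = []
    for k in range(1, min(max_len, len(phones)) + 1):
        head = tuple(phones[:k])
        for rest in segmentations(phones[k:], max_len):
            results.append([head] + rest)
    return results


# Illustrative monophone sequence (not taken from the paper).
phones = ["s", "p", "iy", "ch"]
transcriptions = segmentations(phones)

for units in transcriptions:
    print(" ".join("+".join(unit) for unit in units))
```

Each printed line is one candidate transcription, with `+` joining the phones inside a unit. A recogniser could, in principle, score several of these alternatives for the same word, which is one possible reading of how multiple transcriptions might help accuracy.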


doi: 10.21437/ICSLP.1998-672

Cite as: O'Neill, P., Vaseghi, S., Doherty, B., Tan, W.H., McCourt, P. (1998) Multi-phone strings as subword units for speech recognition. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0178, doi: 10.21437/ICSLP.1998-672

@inproceedings{oneill98_icslp,
  author={Philip O'Neill and Saeed Vaseghi and Bernard Doherty and Wooi Haw Tan and Paul McCourt},
  title={{Multi-phone strings as subword units for speech recognition}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0178},
  doi={10.21437/ICSLP.1998-672}
}