Sixth International Conference on Spoken Language Processing
Spontaneous speech is highly variable and rarely conforms to conventional assumptions and linguistically defined pronunciation rules. Specifically, each expertly defined phonetic unit in the dictionary may have many different realizations in continuous speech. A phone may be realized cleanly and completely, as in read speech, or in a sloppy and incomplete fashion, as in highly spontaneous speech. For spontaneous speech, therefore, it may be beneficial to model incompletely realized variants of a phonetic unit as separate units. In this paper we test this hypothesis by introducing two possible modeling classes for each of the phones AA and IY in the standard English CMU recognition dictionary. We propose three different automatic methods of partitioning the training data in order to identify and label the appropriate variants. Each of these methods improves recognition performance over the baseline, leading to the conclusion that finer modeling frameworks can help to parameterize and recognize spontaneous speech properly.
Bibliographic reference. Nedel, Jon P. / Singh, Rita / Stern, Richard M. (2000): "Automatic subword unit refinement for spontaneous speech recognition via phone splitting", In ICSLP-2000, vol.4, 588-591.