The hidden Markov modelling experiments presented in this paper show that consonant identification results can be improved substantially if a neural network is used to extract linguistically relevant information from the acoustic signal before hidden Markov modelling is applied. The neural network (in this case a combination of two Kohonen networks) takes 12 mel-frequency cepstral coefficients, overall energy and the corresponding delta parameters as input, and outputs distinctive phonetic features such as [±uvular] and [±plosive]. As is demonstrated, this preprocessing of the data not only leads to better consonant identification rates but also makes the confusions that occur between consonants less severe from a phonetic viewpoint. One reason for the improved consonant identification is that the neural network can map acoustically variable realisations of a consonant onto identical phonetic features. This makes the input to hidden Markov modelling more homogeneous, which in turn improves consonant identification. Furthermore, by using phonetic features, the neural network helps the system to focus on the linguistically relevant information in the acoustic signal.
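The preprocessing idea can be pictured with a small, hypothetical sketch. It assumes regression-based delta coefficients over ±2 frames, a single 4 by 4 self-organising (Kohonen) map trained without supervision, and node labelling by majority vote over labelled training frames. None of these details are given in the abstract, and the paper's actual combination of two Kohonen networks and its full set of phonetic features may well work differently. All names (`add_deltas`, `KohonenMap`, `realisations`) and the toy data are illustrative assumptions, not the authors' code or results. What the sketch shows is the mechanism the abstract points to: two acoustically distinct clusters of [+plosive] frames receive the same phonetic label, so the representation passed on to the HMMs is more homogeneous than the raw acoustic vectors.

```python
import math
import random


def add_deltas(frames, n_win=2):
    """Append regression deltas over +/- n_win frames; edge frames are repeated."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, n_win + 1))
    out = []
    for t in range(T):
        d = [sum(n * (frames[min(t + n, T - 1)][k] - frames[max(t - n, 0)][k])
                 for n in range(1, n_win + 1)) / denom
             for k in range(len(frames[t]))]
        out.append(list(frames[t]) + d)  # static values followed by their deltas (13 + 13 = 26 here)
    return out


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class KohonenMap:
    """Minimal self-organising map; nodes are labelled with a phonetic feature after training."""

    def __init__(self, rows, cols, dim, seed=0):
        self.rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        self.w = [[self.rng.uniform(-1, 1) for _ in range(dim)] for _ in range(rows * cols)]
        self.labels = [None] * (rows * cols)

    def bmu(self, x, nodes=None):
        """Index of the best-matching unit (closest weight vector) for frame x."""
        nodes = range(len(self.w)) if nodes is None else nodes
        return min(nodes, key=lambda i: sq_dist(self.w[i], x))

    def train(self, data, epochs=20, lr0=0.5):
        """Unsupervised training: decaying learning rate, shrinking Gaussian neighbourhood."""
        sigma0 = max(self.rows, self.cols) / 2
        data, total, step = list(data), epochs * len(data), 0
        for _ in range(epochs):
            self.rng.shuffle(data)
            for x in data:
                frac = step / total
                lr, sigma = lr0 * (1 - frac), max(sigma0 * (1 - frac), 0.5)
                br, bc = divmod(self.bmu(x), self.cols)
                for i, w in enumerate(self.w):
                    r, c = divmod(i, self.cols)
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])
                step += 1

    def label_nodes(self, data, targets):
        """Give each winning node the majority label of the training frames it wins."""
        votes = {}
        for x, y in zip(data, targets):
            votes.setdefault(self.bmu(x), []).append(y)
        for i, ys in votes.items():
            self.labels[i] = max(sorted(set(ys)), key=ys.count)

    def feature(self, x):
        """Map an acoustic frame to the label of its closest labelled node."""
        labelled = [i for i, lab in enumerate(self.labels) if lab is not None]
        return self.labels[self.bmu(x, labelled)]


def realisations(centre, n, rng, noise=0.3):
    """Synthetic frames scattered around one 13-value centre, with deltas appended."""
    return add_deltas([[v + rng.gauss(0, noise) for v in centre] for _ in range(n)])


# Toy data (synthetic, not from the paper): two acoustically different
# [+plosive] realisations (A, B) and one [-plosive] class (C).
rng = random.Random(1)
centres = {"A": ([2.0] * 13, "+plosive"),
           "B": ([-2.0] * 6 + [2.0] * 7, "+plosive"),
           "C": ([-2.0] * 13, "-plosive")}
frames, targets = [], []
for centre, lab in centres.values():
    seg = realisations(centre, 30, rng)
    frames += seg
    targets += [lab] * len(seg)

som = KohonenMap(rows=4, cols=4, dim=26)
som.train(frames)
som.label_nodes(frames, targets)
for name, (centre, _) in centres.items():
    steady = add_deltas([centre] * 3)[1]  # steady-state frame, so its deltas are zero
    print(name, som.feature(steady))
```

Labelling map nodes after unsupervised training is only one common way of turning a Kohonen map into a classifier; a real system would use one output per distinctive feature (and possibly graded rather than binary values) and would pass the resulting feature vectors, not the raw cepstra, to the hidden Markov models.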
Cite as: Koreman, J., Andreeva, B., Barry, W.J. (1998) Do phonetic features help to improve consonant identification in ASR? Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0549, doi: 10.21437/ICSLP.1998-532
@inproceedings{koreman98_icslp, author={Jacques Koreman and Bistra Andreeva and William J. Barry}, title={{Do phonetic features help to improve consonant identification in ASR?}}, year=1998, booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)}, pages={paper 0549}, doi={10.21437/ICSLP.1998-532} }