INTERSPEECH 2004 - ICSLP
There are many circumstances in which it is useful or necessary to recognise phones rather than words, but phone recognition is inherently less accurate than word recognition. We describe here a post-recognition method for "translating" an errorful phone string output by a speech recogniser into a string that more closely matches the reference transcription. The technique owes something to Kohonen's idea of "dynamically expanding context" in that it learns from the errors made by the recogniser in a particular context, but it uses many contexts rather than a single context to estimate the "translation" of a recognised phone. The weights given to the different contexts in estimating the translation are determined discriminatively. On the WSJCAM0 database, the technique gives a 19.2% relative reduction in phone errors (including insertions) over the baseline, compared with a 6.2% relative reduction obtained using dynamically expanding context.
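The idea of combining evidence from several contexts can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`contexts`, `train`, `translate`), the symmetric context windows, the toy data, and the hand-set weights are all assumptions made for illustration. In particular, the weights here are fixed by hand rather than determined discriminatively, and the recognised and reference strings are assumed to be aligned one-to-one, so insertions and deletions are ignored.

```python
from collections import Counter, defaultdict


def contexts(phones, i, max_width):
    """Return the contexts of position i: symmetric windows of width 0..max_width."""
    out = []
    for w in range(max_width + 1):
        left = tuple(phones[max(0, i - w):i])
        right = tuple(phones[i + 1:i + 1 + w])
        out.append((w, left, phones[i], right))
    return out


def train(aligned_pairs, max_width):
    """For every context of a recognised phone, count the reference phones it aligned to."""
    table = defaultdict(Counter)
    for recognised, reference in aligned_pairs:
        for i, ref_phone in enumerate(reference):
            for ctx in contexts(recognised, i, max_width):
                table[ctx][ref_phone] += 1
    return table


def translate(recognised, table, weights):
    """Per position, pick the phone with the highest weighted sum of relative frequencies."""
    out = []
    for i, phone in enumerate(recognised):
        scores = Counter()
        for ctx in contexts(recognised, i, len(weights) - 1):
            counts = table.get(ctx)
            if not counts:
                continue  # context never seen in training
            total = sum(counts.values())
            for candidate, n in counts.items():
                scores[candidate] += weights[ctx[0]] * n / total
        # Fall back to the recognised phone if no context was seen.
        out.append(max(scores, key=scores.get) if scores else phone)
    return out


# Toy aligned data (hypothetical): the recogniser outputs "p" for "t" after "k ae".
pairs = [
    (["k", "ae", "p"], ["k", "ae", "t"]),
    (["k", "ae", "p"], ["k", "ae", "t"]),
    (["m", "ae", "p"], ["m", "ae", "p"]),
]
table = train(pairs, max_width=2)

many_contexts = [0.2, 0.3, 0.5]   # hand-set weights for widths 0, 1, 2
narrow_only = [1.0, 0.0, 0.0]     # only the context-free estimate

print(translate(["k", "ae", "p"], table, many_contexts))  # ['k', 'ae', 't']
print(translate(["m", "ae", "p"], table, many_contexts))  # ['m', 'ae', 'p']
print(translate(["m", "ae", "p"], table, narrow_only))    # ['m', 'ae', 't']
```

In this toy case the narrowest context alone wrongly rewrites the final "p" of "m ae p" as "t", while the weighted combination lets the wider context override it.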
Bibliographic reference. Cox, Stephen (2004): "Using context to correct phone recognition errors", In INTERSPEECH-2004, 2061-2064.