The highest recognition performance is still achieved when training a recognition system with speech data that have been recorded in the acoustic scenario where the system will be applied. We investigated the approach of using several sets of HMMs. These sets have been trained on data that were recorded in different typical noise situations. One HMM set is individually selected at each speech input by comparing the pause segment at the beginning of the utterance with the pause models of all sets. We observed a considerable reduction of the error rates when applying this approach in comparison to two well known techniques for improving the robustness. Furthermore, we developed a technique to additionally adapt certain parameters of the selected HMMs to the specific noise condition. This leads to a further improvement of the recognition rates.
Bibliographic reference. Hirsch, Hans-Günter / Kitzig, Andreas (2009): "Improving the robustness with multiple sets of HMMs", In INTERSPEECH-2009, 564-567.