INTERSPEECH 2007
8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Never-Ending Learning with Dynamic Hidden Markov Network

Konstantin Markov, Satoshi Nakamura

NICT, Japan

Current automatic speech recognition systems have two distinct modes of operation: training and recognition. After training, system parameters are fixed, and if a mismatch between training and testing conditions occurs, an adaptation procedure is commonly applied. However, adaptation methods change the system parameters in such a way that previously learned knowledge is irrecoverably destroyed. In searching for a solution to this problem, and motivated by the results of recent neuro-biological studies, we have developed a network of hidden Markov states that is capable of unsupervised on-line adaptive learning while preserving previously acquired knowledge. Speech patterns are represented by state sequences, or paths, through the network. The network can detect previously unseen patterns, and if such a new pattern is encountered, it is learned by adding new states and transitions to the network. Paths and states corresponding to spurious events or "noises", and therefore rarely visited, are gradually removed. Thus, the network can grow and shrink as needed, i.e., it dynamically changes its structure. The learning process continues as long as the network lasts, i.e., theoretically forever, so it is called never-ending learning. The output of the network is the best state sequence, and decoding is done concurrently with learning. Thus, the network always operates in a single learning/decoding mode. Initial experiments with a small database of isolated spelled letters showed that the Dynamic Hidden Markov network is indeed capable of never-ending learning and can perfectly recognize previously learned speech patterns.
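The grow-and-shrink behavior described above can be illustrated with a minimal sketch. This is not the paper's actual algorithm (which operates on hidden Markov states with probabilistic decoding); it is a hypothetical toy network of template paths that shows the three mechanisms the abstract names: novelty detection, adding a new path for an unseen pattern, and pruning paths that are rarely visited. All class names, thresholds, and the distance measure are assumptions made for illustration.

```python
import math


class DynamicHMMNetwork:
    """Toy network that learns and decodes in a single pass:
    unseen patterns create new paths, stale paths are pruned."""

    def __init__(self, novelty_threshold=1.0, prune_after=5):
        self.paths = {}        # path label -> list of frame vectors (template)
        self.last_visit = {}   # path label -> step at which it was last matched
        self.threshold = novelty_threshold
        self.prune_after = prune_after
        self.step = 0
        self._next_id = 0

    def _distance(self, path, pattern):
        # Naive frame-wise Euclidean distance (equal lengths assumed);
        # the real system would use probabilistic state alignment instead.
        return sum(math.dist(x, y) for x, y in zip(path, pattern)) / len(path)

    def learn_and_decode(self, pattern):
        """Return the best-matching path label, learning concurrently."""
        self.step += 1
        # Decode: find the closest existing path.
        best, best_dist = None, float("inf")
        for label, path in self.paths.items():
            if len(path) == len(pattern):
                d = self._distance(path, pattern)
                if d < best_dist:
                    best, best_dist = label, d
        # Novelty detection: no close match means a previously unseen
        # pattern, so the network grows by adding a new path.
        if best is None or best_dist > self.threshold:
            best = f"s{self._next_id}"
            self._next_id += 1
            self.paths[best] = [list(frame) for frame in pattern]
        self.last_visit[best] = self.step
        # Pruning: paths not visited recently are removed, so the
        # network shrinks when parts of it fall out of use.
        stale = [l for l, t in self.last_visit.items()
                 if self.step - t > self.prune_after]
        for label in stale:
            del self.paths[label]
            del self.last_visit[label]
        return best
```

Feeding a familiar pattern returns its existing path; a sufficiently different pattern spawns a new one, and a path that goes unvisited for `prune_after` steps disappears, so learning never stops while earlier knowledge survives as long as it keeps being used.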

Bibliographic reference.  Markov, Konstantin / Nakamura, Satoshi (2007): "Never-ending learning with dynamic hidden Markov network", In INTERSPEECH-2007, 1437-1440.