We present an unsupervised technique to discover the (word-sized) speech units into which a corpus of utterances can be decomposed. First, a fixed-length, high-dimensional vector representation of each utterance is obtained. Then, the matrix formed by these vectors is decomposed into additive units by applying the non-negative matrix factorisation algorithm. On a small-vocabulary task, each of the obtained basis vectors represents one of the uttered words. We also investigate the amount of speech data that is needed to obtain a correct set of basis vectors. By decreasing the number of occurrences of the words in the corpus, we obtain an indication of the learning rate of the system.
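The decomposition step can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the utterance vectors are stacked as columns of a non-negative matrix V, and it uses the standard multiplicative updates for the squared-error cost. The cost function, the toy data, and all names (`nmf`, `true_W`, `true_H`) are assumptions made for illustration only.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorise non-negative V (features x utterances) as W @ H.

    Uses multiplicative updates for the squared-error cost
    (an assumed choice; the abstract does not name the cost function).
    """
    rng = np.random.default_rng(seed)
    n_features, n_utterances = V.shape
    # Random non-negative initialisation of basis vectors and activations.
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_utterances)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy data (hypothetical): 3 "word" patterns in a 50-dimensional space,
# and 40 "utterances" that each contain a random subset of the words.
rng = np.random.default_rng(1)
true_W = rng.random((50, 3))
true_H = (rng.random((3, 40)) < 0.5).astype(float)
V = true_W @ true_H

W, H = nmf(V, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In this sketch the columns of `W` play the role of the basis vectors (candidate word units) and each column of `H` indicates which units are active in the corresponding utterance.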
Bibliographic reference. Stouten, Veronique / Demuynck, Kris / Van hamme, Hugo (2007): "Automatically learning the units of speech by non-negative matrix factorisation", in INTERSPEECH-2007, 1937-1940.