INTERSPEECH 2015

In this paper we address the problem of unsupervised learning of discrete
subword units. Our approach is based on Deep Autoencoders (AEs), whose
encoding node values are thresholded to subsequently generate a symbolic,
i.e., 1-of-K (with K = number of subwords), representation of each speech
frame. We experiment with two variants of the standard AE, which we
have named Binarized Autoencoder and Hidden-Markov-Model Encoder. The
former forces the binary encoding nodes to have a U-shaped distribution
(with peaks at 0 and 1) while minimizing the reconstruction error.
The latter jointly learns the symbolic encoding representation (i.e.,
the subwords) and the prior and transition probability distributions of
the learned subwords.
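The thresholding step described above can be illustrated with a minimal sketch. The function names and the product-form penalty below are illustrative assumptions, not the authors' implementation: activations of 6 encoding nodes are thresholded at 0.5 to yield a 6-bit code, i.e., one of K = 2^6 = 64 subword symbols per frame, and a simple penalty term that vanishes at 0 and 1 stands in for the U-shaped-distribution constraint of the Binarized AE.

```python
import numpy as np

def u_shaped_penalty(h):
    # Hypothetical stand-in for the Binarized AE constraint: h * (1 - h)
    # is zero when activations sit at 0 or 1, so minimizing it pushes the
    # encoding values toward a bimodal (U-shaped) distribution.
    return np.mean(h * (1.0 - h))

def frames_to_symbols(h):
    # Threshold encoding-node activations at 0.5 to obtain binary codes,
    # then read each 6-bit code as one of K = 2^6 = 64 subword symbols.
    bits = (h >= 0.5).astype(int)            # shape: (n_frames, 6)
    weights = 2 ** np.arange(bits.shape[1])  # bit weights: 1, 2, 4, ...
    return bits @ weights                    # one integer symbol per frame

# Example: three frames of activations from 6 encoding nodes
h = np.array([[0.9, 0.1, 0.8, 0.2, 0.6, 0.4],
              [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
              [0.7, 0.7, 0.7, 0.7, 0.7, 0.7]])
print(frames_to_symbols(h))  # → [21  0 63]
```

With 6 encoding nodes this yields a very compact symbolic transcription: each 39-dimensional real-valued MFCC frame is replaced by a single integer in [0, 63].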
The ABX evaluation of the Zero Resource Challenge Track 1 shows that a
deep AE with only 6 encoding nodes, which assigns to each frame a 1-of-K
binary vector with K = 2^6, can outperform real-valued MFCC representations
in the across-speaker setting. Binarized AEs can outperform standard AEs
when using a larger number of encoding nodes, while HMM Encoders may allow
more compact subword transcriptions without worsening the ABX performance.
Bibliographic reference. Badino, Leonardo / Mereta, Alessio / Rosasco, Lorenzo (2015): "Discovering discrete subword units with binarized autoencoders and hidden-Markov-model encoders", In INTERSPEECH-2015, 3174-3178.