Discriminative language modeling (DLM) aims to choose the most accurate word sequence by reranking the alternative hypotheses output by an automatic speech recognizer (ASR). The conventional (supervised) way of training a DLM requires a large amount of acoustic recordings together with their manual reference transcriptions. These transcriptions are used to determine the target ranks of the ASR outputs, but they may be hard to obtain. Previous studies used the available transcribed data to build a confusion model that augments the training set with artificially generated examples, a process known as semi-supervised training. In this study, we focus on the unsupervised setting, where no manual transcriptions are available at all. We propose three ways to determine a sequence that can serve as the missing reference text, and two approaches that use this sequence to (i) determine the ranks of the ASR outputs so that the discriminative model can be trained directly, and (ii) build a confusion model that generates artificial training examples. We compare our techniques with the supervised and semi-supervised setups. Using the reranking variant of the WER-sensitive perceptron algorithm, we obtain word error rate improvements of up to half of those achieved in the supervised case.
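Approach (i) above, ranking ASR outputs against a substitute reference and training a reranker on those ranks, can be illustrated with a minimal Python sketch. It assumes a pseudo-reference has already been chosen for each utterance (the three selection methods are not reproduced here). Each hypothesis in an n-best list is scored by its word-level edit distance to that pseudo-reference, and a pairwise reranking perceptron is trained whose required margin and update size both scale with the difference in word errors between two hypotheses. The feature map (unigram and bigram counts) and the names `word_errors`, `features`, `train_reranker`, and `rerank` are hypothetical, and this margin/update rule is one plausible WER-sensitive formulation, not necessarily the exact variant used in the paper.

```python
from collections import Counter
from typing import Dict, List, Sequence

Weights = Dict[str, float]


def word_errors(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance: substitutions + insertions + deletions."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(prev[j] + 1,             # hyp word with no ref match (insertion)
                         cur[j - 1] + 1,          # ref word missing from hyp (deletion)
                         prev[j - 1] + (h != r))  # substitution, or 0 on a match
        prev = cur
    return prev[-1]


def features(hyp: Sequence[str]) -> Counter:
    """Hypothetical feature map: unigram and bigram counts with sentence boundaries."""
    padded = ["<s>", *hyp, "</s>"]
    feats = Counter(hyp)
    feats.update(f"{a} {b}" for a, b in zip(padded, padded[1:]))
    return feats


def score(w: Weights, feats: Counter) -> float:
    """Linear model score: dot product of weights and feature counts."""
    return sum(w.get(f, 0.0) * v for f, v in feats.items())


def train_reranker(nbest_lists: List[List[List[str]]],
                   pseudo_refs: List[List[str]],
                   epochs: int = 5, lr: float = 1.0) -> Weights:
    """Pairwise reranking perceptron; margin and step scale with the error gap (assumed rule)."""
    w: Weights = {}
    for _ in range(epochs):
        for nbest, ref in zip(nbest_lists, pseudo_refs):
            # Target ranks come from errors against the pseudo-reference,
            # not from a manual transcription.
            errs = [word_errors(h, ref) for h in nbest]
            feats = [features(h) for h in nbest]
            for a in range(len(nbest)):
                for b in range(len(nbest)):
                    gap = errs[b] - errs[a]  # > 0 means hypothesis a is better than b
                    if gap <= 0:
                        continue
                    diff = Counter(feats[a])
                    diff.subtract(feats[b])  # subtract() keeps negative counts
                    if score(w, diff) < gap:  # required margin grows with the error gap
                        for f, v in diff.items():
                            w[f] = w.get(f, 0.0) + lr * gap * v
    return w


def rerank(w: Weights, nbest: List[List[str]]) -> int:
    """Index of the highest-scoring hypothesis under the learned weights."""
    return max(range(len(nbest)), key=lambda i: score(w, features(nbest[i])))


if __name__ == "__main__":
    nbest = [["the", "cats", "at"], ["the", "cat", "sat"]]
    w = train_reranker([nbest], [["the", "cat", "sat"]])
    print(" ".join(nbest[rerank(w, nbest)]))
```

Running the script prints `the cat sat`: the hypothesis with zero errors against the pseudo-reference is pushed above the one with two errors. The sketch compares all pairs in each n-best list, which is quadratic in the list length, and omits refinements such as weight averaging that practical perceptron trainers commonly use.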
Bibliographic reference. Dikici, Erinç / Saraçlar, Murat (2014): "Unsupervised training methods for discriminative language modeling", In INTERSPEECH-2014, 2857-2861.