ISCA Archive MLSLP 2011

Unlabeled data and other marginals

Mark Hasegawa-Johnson, Jui-Ting Huang, Xiaodan Zhuang

Machine learning minimizes bounds on E[h] computed over an unknown distribution p(x,y). Unlabeled data describe p(x), while scientific prior knowledge can describe p(y). This talk will discuss the use of unlabeled data to compute p(x), and of articulatory phonology to compute p(y), for acoustic modeling and pronunciation modeling in automatic speech recognition (ASR). We will demonstrate that, if either p(x) or p(y) is known, it is possible to substantially reduce the VC dimension of the function space, thereby substantially reducing the expected risk of the classifier. As a speculative example, we will show that p(y) can be improved (relative to usual ASR methods) using finite state machines based on articulatory phonology, and preliminary results will be reviewed. As a more fully developed example, we will show that the VC dimension of a maximum mutual information (MMI) speech recognizer can be bounded by the conditional entropy of y given x; the resulting training criterion is MMI over labeled data, minus conditional label entropy of unlabeled data. Algorithms and experimental results will be provided for two cases: isolated phone recognition and retraining using N-best lists.
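The training criterion described in the abstract (MMI over labeled data, minus conditional label entropy of unlabeled data) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulation, so several details here are assumptions made only for illustration: the MMI term is approximated by the conditional log-likelihood of the correct labels, class posteriors come from a softmax over hypothetical per-class scores, and the `weight` parameter trading off the two terms is not taken from the source.

```python
import math

def softmax(scores):
    """Turn per-class scores into posteriors p(y|x); the softmax form is an assumption."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def conditional_log_likelihood(labeled):
    """Sum of log p(y_i|x_i) over labeled pairs (scores, label).

    Used here as a stand-in for the MMI term (an assumption).
    """
    return sum(math.log(softmax(scores)[y]) for scores, y in labeled)

def conditional_entropy(unlabeled):
    """Sum over unlabeled items of H(y|x) = -sum_y p(y|x) log p(y|x)."""
    total = 0.0
    for scores in unlabeled:
        posteriors = softmax(scores)
        total -= sum(p * math.log(p) for p in posteriors if p > 0.0)
    return total

def semi_supervised_objective(labeled, unlabeled, weight=1.0):
    """Objective to maximize: labeled term minus weighted entropy of unlabeled data.

    `weight` is a hypothetical trade-off parameter, not from the source.
    """
    return conditional_log_likelihood(labeled) - weight * conditional_entropy(unlabeled)
```

Under these assumptions, the objective rewards a model both for scoring the correct label highly on labeled data and for producing confident (low-entropy) posteriors on unlabeled data.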

Cite as: Hasegawa-Johnson, M., Huang, J.-T., Zhuang, X. (2011) Unlabeled data and other marginals. Proc. Machine Learning in Speech and Language Processing (MLSLP 2011)

  author={Mark Hasegawa-Johnson and Jui-Ting Huang and Xiaodan Zhuang},
  title={{Unlabeled data and other marginals}},
  booktitle={Proc. Machine Learning in Speech and Language Processing (MLSLP 2011)}