Conditional random field (CRF) formulations for single-microphone speech separation are improved by large-margin parameter estimation. Speech sources are represented by acoustic state sequences from speaker-dependent acoustic models. The large-margin technique improves the classification accuracy of acoustic states by reducing generalization error in the training phase. Non-linear mappings inspired by the mixture-maximization (MIXMAX) model are applied to speech mixture observations. Compared with a factorial hidden Markov model baseline, the improved CRF formulations achieve better separation performance with significantly less training data. The separation performance is evaluated in terms of objective speech quality measures and speech recognition accuracy on the reconstructed sources. Compared with the CRF formulations without large-margin parameter estimation, the improved formulations achieve better performance without modifying the statistical inference procedures, especially when the sources are modeled with a larger number of acoustic states.
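For readers unfamiliar with the MIXMAX model named above, the following is a minimal sketch of its core approximation only: the log-spectrum of a two-speaker mixture is taken to be close to the element-wise maximum of the two source log-spectra. It is not the paper's feature mapping or CRF formulation. The function names and the three-bin log-power values are hypothetical, and the comparison assumes uncorrelated sources, so that powers add.

```python
import math

def mixmax_log_spectrum(log_x1, log_x2):
    """MIXMAX approximation: element-wise max of the two source log-spectra."""
    return [max(a, b) for a, b in zip(log_x1, log_x2)]

def log_power_sum(log_x1, log_x2):
    """Log of the summed powers (assumes uncorrelated sources, no cross term)."""
    return [math.log(math.exp(a) + math.exp(b)) for a, b in zip(log_x1, log_x2)]

# Hypothetical log-power values for three frequency bins of two speakers.
log_x1 = [2.0, -1.0, 0.5]
log_x2 = [-3.0, 4.0, 0.5]

approx = mixmax_log_spectrum(log_x1, log_x2)
exact = log_power_sum(log_x1, log_x2)

# The gap is log(1 + exp(-|a - b|)): at most log(2), reached when both
# sources have equal power in a bin, and near zero when one source dominates.
max_error = max(e - a for e, a in zip(exact, approx))
```

Under these assumptions the approximation is tight wherever one speaker dominates a bin, which is why each bin of the mixture can be treated as carrying information about mainly one source.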
Bibliographic reference. Yeung, Yu Ting / Lee, Tan / Leung, Cheung-Chi (2014): "Large-margin conditional random fields for single-microphone speech separation", In INTERSPEECH-2014, 983-987.