Segmentation of speech into phonemes is beneficial for many spoken language processing applications. Previously, a novel method which employs auditory attention features for detecting phoneme boundaries from acoustic signal was proposed in  outperforming [2, 3]. In this paper, we propose to use phone posterior features, which are obtained from a Deep Belief Network (DBN) based phoneme recognition system, along with attention features since they provide complementary information. When evaluated on TIMIT corpus, the proposed method is shown to successfully predict phoneme boundaries and outperform the recently published text-independent phoneme segmentation methods. Also, the combination of attention features with posterior features yield more than 30% relative improvement in F-measure over the system which used only attention features.
Bibliographic reference. Kalinli, Ozlem (2013): "Combination of auditory attention features with phone posteriors for better automatic phoneme segmentation", In INTERSPEECH-2013, 2302-2305.