14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Combination of Auditory Attention Features with Phone Posteriors for Better Automatic Phoneme Segmentation

Ozlem Kalinli

Sony Computer Entertainment, USA

Segmentation of speech into phonemes is beneficial for many spoken language processing applications. Previously, a novel method that employs auditory attention features to detect phoneme boundaries from the acoustic signal was proposed in [1], where it outperformed the methods of [2, 3]. In this paper, we propose to use phone posterior features, which are obtained from a Deep Belief Network (DBN) based phoneme recognition system, along with attention features, since the two provide complementary information. When evaluated on the TIMIT corpus, the proposed method is shown to successfully predict phoneme boundaries and to outperform recently published text-independent phoneme segmentation methods. In addition, the combination of attention features with posterior features yields more than 30% relative improvement in F-measure over the system that used only attention features.
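
The abstract reports its result as a relative improvement in F-measure. As a minimal sketch of how such a figure is typically computed for boundary detection, the Python below matches detected boundaries to reference boundaries one-to-one within a time tolerance and then compares two F-measures. The function names, the 20 ms default tolerance, and all numeric values are assumptions chosen for illustration; they are not taken from the paper and do not reproduce its evaluation protocol or results.

```python
def boundary_f_measure(reference, detected, tolerance=0.02):
    """F-measure for boundary detection (all times in seconds).

    A detected boundary counts as a hit if it lies within +/- tolerance
    of a reference boundary; each reference boundary is matched at most once.
    The 20 ms default is an assumed value, not one stated in the abstract.
    """
    reference = sorted(reference)
    detected = sorted(detected)
    hits = 0
    i = j = 0
    # Two-pointer sweep over both sorted lists.
    while i < len(reference) and j < len(detected):
        diff = detected[j] - reference[i]
        if abs(diff) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # detection is too early to match this or any later reference
        else:
            i += 1  # reference boundary was missed
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# Made-up boundary times, for illustration only.
reference_times = [0.10, 0.25, 0.40, 0.60]
detected_times = [0.11, 0.30, 0.41, 0.75]
f_score = boundary_f_measure(reference_times, detected_times)

# Made-up F-measures for a baseline and a combined system.
gain = relative_improvement(0.50, 0.66)
```

Under this definition, a relative improvement above 30% means the F-measure of the combined system exceeds 1.3 times that of the attention-only system, which is different from an absolute gain of 30 percentage points.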


  1. O. Kalinli, “Automatic phoneme segmentation using auditory attention features,” in Proc. of Interspeech, 2012.
  2. S. Dusan and L. Rabiner, “On the relation between maximum spectral transition positions and phone boundaries,” in Proc. of Interspeech, 2006.
  3. Y. Qiao, N. Shimomura, and N. Minematsu, “Unsupervised optimal phoneme segmentation: objectives, algorithm and comparisons,” in IEEE ICASSP, 2008.

Bibliographic reference.  Kalinli, Ozlem (2013): "Combination of auditory attention features with phone posteriors for better automatic phoneme segmentation", In INTERSPEECH-2013, 2302-2305.