ISCA Archive Interspeech 2007

Detection-based ASR in the automatic speech attribute transcription project

Ilana Bromberg, Qian Qian, Jun Hou, Jinyu Li, Chengyuan Ma, Brett Matthews, Antonio Moreno-Daniel, Jeremy Morris, Sabato Marco Siniscalchi, Yu Tsao, Yu Wang

We present methods of detector design for the Automatic Speech Attribute Transcription project. This paper reports the results of a student-led, cross-site collaboration among the Georgia Institute of Technology, The Ohio State University and Rutgers University. It describes and evaluates the detection-based ASR paradigm and discusses phonetic attribute classes, methods of detecting framewise phonetic attributes, and methods of combining attribute detectors for ASR.

We use Multi-Layer Perceptrons, Hidden Markov Models and Support Vector Machines to compute confidence scores for several prescribed sets of phonetic attribute classes. To combine the framewise detection scores for continuous phone recognition on the TIMIT database, we use Conditional Random Fields (CRFs) and knowledge-based rescoring of phone lattices. By incorporating all of the attribute detectors discussed in the paper, the CRF system achieves a phone accuracy of 70.63%, outperforming both the baseline and the enhanced HMM systems.
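To illustrate how a CRF can combine framewise detector outputs, here is a minimal sketch of Viterbi decoding in a linear-chain CRF. It assumes that each frame is represented by a vector of attribute-detector scores, that each phone label is scored as a weighted sum of those scores, and that consecutive labels receive a transition weight. The weights, the two attributes ("vowel-like" and "fricative-like") and the two phone labels are hypothetical and hand-set for illustration. In practice the weights are learned during CRF training, which this sketch omits. It is not the authors' implementation.

```python
from typing import List, Sequence


def state_scores(frames: Sequence[Sequence[float]],
                 weights: Sequence[Sequence[float]]) -> List[List[float]]:
    """Score every phone label at every frame as a weighted sum of attribute scores."""
    return [[sum(w * x for w, x in zip(w_p, frame)) for w_p in weights]
            for frame in frames]


def crf_viterbi(frames: Sequence[Sequence[float]],
                weights: Sequence[Sequence[float]],
                transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence under a linear-chain CRF.

    frames:      T x K attribute-detector scores, one row per frame
    weights:     P x K state-feature weights, one row per phone label (hypothetical)
    transitions: P x P transition weights, transitions[q][p] scores label q -> p
    """
    scores = state_scores(frames, weights)
    num_labels = len(weights)
    delta = list(scores[0])          # best path score ending in each label at frame 0
    backptrs: List[List[int]] = []   # best predecessor label for each frame after the first
    for t in range(1, len(scores)):
        new_delta, bp = [], []
        for p in range(num_labels):
            # Pick the predecessor label that maximizes path score plus transition weight.
            best_q = max(range(num_labels), key=lambda q: delta[q] + transitions[q][p])
            new_delta.append(delta[best_q] + transitions[best_q][p] + scores[t][p])
            bp.append(best_q)
        delta = new_delta
        backptrs.append(bp)
    # Trace back from the best final label.
    best = max(range(num_labels), key=lambda p: delta[p])
    path = [best]
    for bp in reversed(backptrs):
        best = bp[best]
        path.append(best)
    path.reverse()
    return path


# Hypothetical example: two attribute detectors, two phone labels.
labels = ["aa", "s"]
weights = [[1.0, 0.0],   # "aa" responds to the vowel-like detector
           [0.0, 1.0]]   # "s" responds to the fricative-like detector
transitions = [[0.5, -0.5],
               [-0.5, 0.5]]  # favor staying in the same label
frames = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.7], [0.1, 0.9]]
print([labels[i] for i in crf_viterbi(frames, weights, transitions)])  # ['aa', 'aa', 's', 's']
```

The transition weights smooth the framewise decisions. Here a positive self-transition weight keeps a label from flipping on a single ambiguous frame, while the state features let each detector vote for the phones whose attributes it detects.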


doi: 10.21437/Interspeech.2007-510

Cite as: Bromberg, I., Qian, Q., Hou, J., Li, J., Ma, C., Matthews, B., Moreno-Daniel, A., Morris, J., Siniscalchi, S.M., Tsao, Y., Wang, Y. (2007) Detection-based ASR in the automatic speech attribute transcription project. Proc. Interspeech 2007, 1829-1832, doi: 10.21437/Interspeech.2007-510

@inproceedings{bromberg07_interspeech,
  author={Ilana Bromberg and Qian Qian and Jun Hou and Jinyu Li and Chengyuan Ma and Brett Matthews and Antonio Moreno-Daniel and Jeremy Morris and Sabato Marco Siniscalchi and Yu Tsao and Yu Wang},
  title={{Detection-based ASR in the automatic speech attribute transcription project}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={1829--1832},
  doi={10.21437/Interspeech.2007-510}
}