Automatic Speaker's Role Classification With a Bottom-up Acoustic Feature Selection

Vered Silber-Varod, Anat Lerner, Oliver Jokisch


The objective of the current study is to automatically identify the role played by the speaker in a dialogue. By applying machine learning procedures to acoustic features, we wish to automatically trace the footprints of this information in the speech signal. The acoustic feature set was selected from a large statistics-based feature set of 1,583 dimensions. The analysis is carried out on interactive dialogues in a Map Task setting. The paper first describes the methodology for choosing the 100 most effective attributes among the 1,583 extracted features, and then presents the classification results for the same speaker in two different roles and for a gender-based classification. Results show an average classification rate of 71% for the role played by the same speaker, 65% for all women together, and 65% for all men together.
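
The bottom-up selection step described above (keeping a small subset of the most effective attributes out of a large extracted set) can be illustrated with a minimal sketch. The abstract does not state the ranking criterion the authors used, so the Fisher-style score below is an assumption chosen only for illustration. The function names `fisher_score` and `select_top_k` are hypothetical, and the data are synthetic stand-ins (20 columns, 3 kept) rather than the 1,583 acoustic features and 100 selected attributes of the study.

```python
import random
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Score one feature column for a two-class problem.

    Assumed criterion (not taken from the paper): squared gap between
    the class means divided by the sum of the class variances.
    """
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = pvariance(a) + pvariance(b)
    return (mean(a) - mean(b)) ** 2 / spread if spread > 0 else 0.0


def select_top_k(rows, labels, k):
    """Return the sorted column indices of the k highest-scoring features."""
    n_features = len(rows[0])
    scores = [
        fisher_score([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Synthetic stand-in data: two roles (0 and 1), 20 features per sample.
# Only the first three columns are shifted by role; the rest are noise.
rng = random.Random(0)
labels = [i % 2 for i in range(60)]
rows = [
    [rng.gauss(5.0 * y if j < 3 else 0.0, 1.0) for j in range(20)]
    for y in labels
]

selected = select_top_k(rows, labels, k=3)
print(selected)
```

In a setting like the one in the abstract, `rows` would hold one extracted feature vector per speech segment, `labels` would encode the speaker's role, and `k` would be 100; the reduced columns would then be passed to whichever classifier is being evaluated.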


DOI: 10.21437/GLU.2017-11

Cite as: Silber-Varod, V., Lerner, A., Jokisch, O. (2017) Automatic Speaker's Role Classification With a Bottom-up Acoustic Feature Selection. Proc. GLU 2017 International Workshop on Grounding Language Understanding, 52-56, DOI: 10.21437/GLU.2017-11.


@inproceedings{Silber-Varod2017,
  author={Vered Silber-Varod and Anat Lerner and Oliver Jokisch},
  title={Automatic Speaker's Role Classification With a Bottom-up Acoustic Feature Selection},
  year=2017,
  booktitle={Proc. GLU 2017 International Workshop on Grounding Language Understanding},
  pages={52--56},
  doi={10.21437/GLU.2017-11},
  url={http://dx.doi.org/10.21437/GLU.2017-11}
}