Interspeech'2005 - Eurospeech
This paper evaluates three new acoustic feature parameters, all based on spectral cross-correlation, for detecting audio source segments: spectral stability, white noise similarity, and sound spectral shape. The parameters are designed for accurate audio source detection and are used in a pre-processing module for automatic indexing of broadcast news and meetings. We conducted two audio source classification experiments, one on broadcast news and the other on meetings. The broadcast news experiment shows that the proposed parameters capture audio sources more accurately than conventional parameters: classification performance increases by 6.6% when the proposed parameters are used. Spectral stability proved to be the most effective of all the parameters tested, conventional and proposed. For the meeting corpus, we conducted speaker identification in addition to audio source classification. First, the audio source classification procedure detects each sound source segment. Then, a speaker identification procedure finds cross-talk from other participants and determines each participant's own speech period. Speaker identification performance increases by 2.7% when the proposed parameters are used.
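The abstract names the features but does not define them, so the following is only an illustrative sketch of what spectral cross-correlation features of this kind might look like, not the authors' formulation. It assumes (hypothetically) that spectral stability can be approximated by the normalized zero-lag correlation between the magnitude spectra of two consecutive frames, and white noise similarity by the same correlation against a flat reference spectrum. The function names, the frame length, and the use of uncentered correlation are all assumptions made for the example; sound spectral shape is not sketched because the abstract gives no hint of its definition.

```python
import cmath
import math
import random


def magnitude_spectrum(frame):
    """Magnitude of DFT bins 0..N/2 (naive O(N^2) DFT, adequate for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def normalized_correlation(a, b):
    """Uncentered normalized cross-correlation at lag zero (cosine similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def spectral_stability(prev_frame, cur_frame):
    """Assumed definition: correlation between spectra of consecutive frames."""
    return normalized_correlation(
        magnitude_spectrum(prev_frame), magnitude_spectrum(cur_frame)
    )


def white_noise_similarity(frame):
    """Assumed definition: correlation between the frame spectrum and a flat one."""
    spec = magnitude_spectrum(frame)
    return normalized_correlation(spec, [1.0] * len(spec))


# Toy signals: a stationary tone (8 cycles per frame) and two independent noise frames.
n = 64
tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
rng = random.Random(0)
noise_a = [rng.gauss(0, 1) for _ in range(n)]
noise_b = [rng.gauss(0, 1) for _ in range(n)]

tone_stability = spectral_stability(tone, tone)
noise_stability = spectral_stability(noise_a, noise_b)
tone_flatness = white_noise_similarity(tone)
noise_flatness = white_noise_similarity(noise_a)

print("stability  tone:", tone_stability, " noise:", noise_stability)
print("flatness   tone:", tone_flatness, " noise:", noise_flatness)
```

Under these assumed definitions, a stationary tone yields a higher stability value than independent noise frames, while noise scores higher on similarity to a flat spectrum. This is the kind of contrast a source classifier could exploit, although the thresholds and framing used in the paper are not stated in the abstract.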
Bibliographic reference. Yamaguchi, Masahide / Yamashita, Masaru / Matsunaga, Shoichi (2005): "Spectral cross-correlation features for audio indexing of broadcast news and meetings", In INTERSPEECH-2005, 613-616.