INTERSPEECH 2006 - ICSLP
In previous work, we found that automatic speech recognition (ASR) results on meetings show interesting patterns with respect to speaker overlaps, including a robust asymmetry in word error rates (WERs) before and after overlaps. The paradigm used allowed us to infer that these correlations are due not to crosstalk itself but to changes in how a person speaks around overlap regions. To better understand these ASR and perplexity results, we analyze speaker overlaps with respect to various factors, including collection site, speakers, dialog acts, and hot spots.
We examine a total of 101 meetings from the ICSI meeting corpus and the NIST meeting transcription evaluations of the last four years. We find that overlaps tend to occur at high-perplexity regions in the foreground talker's speech. We also find that overlap regions tend to have higher perplexity than nonoverlap regions when trigrams or 4-grams are used, whereas unigram perplexity within overlaps is considerably lower than within nonoverlaps. These appear to be robust findings, because they hold in general across meetings from different collection sites, even though meeting style and absolute rates of overlap vary by site. Further analyses of overlap with respect to speakers and meeting content reveal interesting relationships between overlap and dialog acts, as well as between overlap and "hot spots" (points of increased participant involvement). Finally, results from the ICSI meeting corpus show that individual speakers have widely varying rates of being overlapped.
Bibliographic reference. Çetin, Özgür / Shriberg, Elizabeth (2006): "Analysis of overlaps in meetings by dialog factors, hot spots, speakers, and collection site: insights for automatic speech recognition", In INTERSPEECH-2006, paper 1915-Mon2A2O.6.