INTERSPEECH 2004 - ICSLP
Onomatopoeia, or sound-imitation words (SIWs), are important for conveying sound events in human-computer communication. One problem in recognizing environmental sounds by means of SIWs is listener dependency: different listeners hear the same environmental sound as different SIWs, even under the same conditions. Ordinary Japanese phonemes are therefore not adequate for expressing SIWs. To cope with this ambiguity problem, we designed a set of new phonemes, referred to as basic phoneme-groups (BPGs), to represent environmental sounds. Each BPG consists of one or more Japanese phonemes, so the ambiguity is resolved by generating one or more SIWs for a sound event. An HMM-based recognizer generates SIWs using the phoneme-groups. Listening experiments showed that automatic SIW recognition based on BPGs outperformed recognition based on other types of phonemes. The recall and precision rates were 56.4% and 72.2%, respectively.
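The idea that one phoneme-group sequence can yield several SIWs can be illustrated with a minimal sketch. It is not the authors' implementation: the group names, their member phonemes, and the function `expand_to_siws` are hypothetical, chosen only for illustration, since the abstract does not list the actual BPG inventory. The sketch assumes a recognizer has already produced a sequence of phoneme-groups and simply enumerates the phoneme strings that sequence permits.

```python
from itertools import product

# Hypothetical phoneme-groups (assumption): each group name maps to the
# Japanese phonemes it may stand for. The real BPG set is not in the abstract.
EXAMPLE_GROUPS = {
    "k-t": ["k", "t"],  # an ambiguous group: listeners may hear /k/ or /t/
    "a": ["a"],         # an unambiguous group with a single phoneme
    "N": ["N"],         # moraic nasal, also unambiguous here
}


def expand_to_siws(group_sequence, groups):
    """Return every phoneme string formed by picking one phoneme per group."""
    choices = [groups[name] for name in group_sequence]
    return ["".join(combo) for combo in product(*choices)]


# A recognized group sequence expands into one or more candidate SIWs.
candidates = expand_to_siws(["k-t", "a", "N"], EXAMPLE_GROUPS)
print(candidates)
```

With these made-up groups, the single ambiguous group doubles the number of candidates, while a sequence of unambiguous groups would produce exactly one SIW.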
Bibliographic reference. Ishihara, Kazushi / Hattori, Yuya / Nakatani, Tomohiro / Komatani, Kazunori / Ogata, Tetsuya / Okuno, Hiroshi G. (2004): "Disambiguation in determining phonemes of sound-imitation words for environmental sound recognition", In INTERSPEECH-2004, 1485-1488.