In this paper, we investigate cross-lingual automatic speech emotion recognition. The basic idea is that since the emotion recognition system is based on the acoustic features only, it is possible to combine data in different languages to improve the recognition accuracy. We begin with the construction of a Mandarin database of emotional speech, which is similar to the well-known Berlin Database of Emotional Speech (EMO-DB) in the composition and size. In order to reduce the variability due to different languages and different speakers, we propose to apply histogram equalization as a data normalization method. Recognition systems based on support vector machines have been evaluated on EMO-DB. Compared to the baseline system without multi-lingual databases and data normalization, the proposed system has achieved a relative improvement of 39.9% in the emotion recognition accuracy, from 86.2% to 91.7%. The accuracy is among the best known results reported on EMO-DB, if not the best.
Bibliographic reference. Chiou, Bo-Chang / Chen, Chia-Ping (2014): "Speech emotion recognition with cross-lingual databases", In INTERSPEECH-2014, 558-561.