This paper presents our work on speech recognition of spontaneous Cantonese telephone conversations. The key points include feature extraction with a 6-layer Stacked Bottle-Neck (SBN) neural network and the use of fundamental frequency information at its input. We have also investigated the robustness of SBN training (silence, normalization) and shown an efficient combination with PLP features using Region-Dependent Transforms (RDT). Combining RDT with another popular adaptation technique, speaker adaptive training (SAT), is shown to be beneficial. The results are reported on BABEL Cantonese data.
Bibliographic reference. Karafiát, Martin / Grézl, František / Hannemann, Mirko / Veselý, Karel / Černocký, Jan (2013): "BUT BABEL system for spontaneous Cantonese", In INTERSPEECH-2013, 2589-2593.