Emotion conveys information in speech beyond the textual content alone. However, it also introduces difficulties in the speech recognition process.
In a previous study, we described the substantial changes in speech parameters caused by emotion. To improve emotional speech recognition rates, the effects of emotion on speech parameters should first be evaluated; in subsequent steps, recognition accuracy can then be improved by selecting suitable parameters. In our earlier research, we evaluated the changes in speech parameters, namely formant frequencies and pitch frequency, due to anger and grief in the Farsi language. In the present research, building on those results, we attempt to improve emotional speech recognition accuracy over baseline models. We show that adding parameters such as formant and pitch frequencies to the speech feature vector can improve recognition accuracy. The amount of improvement depends on the parameter type, the number of mixture components, and the emotional condition.
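The feature-vector augmentation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, feature dimensions, and toy values are all assumptions made for the example:

```python
import numpy as np

def augment_features(mfcc, pitch_hz, formants_hz):
    """Concatenate per-frame baseline features with pitch and formant
    frequencies, yielding an extended feature vector per frame.

    mfcc        : (n_frames, n_mfcc) baseline spectral features
    pitch_hz    : (n_frames,) fundamental frequency per frame
    formants_hz : (n_frames, n_formants) formant frequencies per frame
    """
    return np.hstack([mfcc, pitch_hz[:, None], formants_hz])

# Toy data: 4 frames, 13 baseline coefficients, F0 plus two formants.
mfcc = np.zeros((4, 13))
pitch = np.array([210.0, 215.0, 220.0, 218.0])   # e.g. raised F0 under anger
formants = np.tile([700.0, 1200.0], (4, 1))      # F1, F2 (illustrative)

x = augment_features(mfcc, pitch, formants)
print(x.shape)  # (4, 16): 13 baseline + 1 pitch + 2 formant dimensions
```

The augmented vectors would then be fed to the recognizer in place of the baseline features.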
Proper identification of the emotional condition can also help improve speech recognition accuracy. To recognize the emotional condition of speech, formant and pitch frequencies were used successfully in two different approaches, namely decision trees and GMMs.
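As a toy illustration of the decision-tree idea, a hand-built rule on pitch and formant statistics might look like the sketch below. The thresholds and labels are invented for the example and do not come from the study:

```python
def classify_emotion(mean_pitch_hz, mean_f1_hz):
    """Coarse emotion label from simple threshold rules on mean pitch (F0)
    and mean first-formant frequency. Thresholds are illustrative only."""
    if mean_pitch_hz > 200.0:                 # raised pitch: high-arousal speech
        return "anger" if mean_f1_hz > 650.0 else "neutral"
    return "grief" if mean_f1_hz < 550.0 else "neutral"

print(classify_emotion(230.0, 700.0))  # anger
print(classify_emotion(150.0, 500.0))  # grief
```

A learned decision tree or a GMM fitted per emotion class would replace these fixed thresholds with boundaries estimated from training data.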