Volume 8, Issue 1 (2008)   MJEE 2008, 8(1): 13-27



1- P. O. Box: 16756-1719, Tehran, Iran
2- Amirkabir University of Technology
Abstract:
Emotion in speech carries information beyond what is available in the text alone; however, it also complicates the speech recognition process. In a previous study we showed that emotion causes substantial changes in speech parameters. To improve emotional speech recognition, therefore, the effects of emotion on speech parameters must first be evaluated, and recognition accuracy can then be improved by applying suitable parameters. In our earlier work, the changes in speech parameters, namely formant frequencies and pitch frequency, caused by anger and grief were evaluated for the Farsi language. In this research we use those results to improve emotional speech recognition accuracy with respect to baseline models. We show that adding parameters such as formant and pitch frequencies to the speech feature vector can improve recognition accuracy; the amount of improvement depends on the parameter type, the number of mixture components, and the emotional condition. Correctly identifying the emotional condition also helps improve speech recognition accuracy. To recognize the emotional condition of speech, formant and pitch frequencies were used successfully in two different approaches, namely a decision tree and a GMM.
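As an illustration of the GMM-based emotion recognition the abstract mentions, the sketch below fits one Gaussian mixture model per emotional state on pitch/formant feature vectors and assigns an utterance to the state with the highest log-likelihood. This is a minimal, hypothetical example, not the authors' implementation: the feature layout ([pitch, F1, F2, F3] per frame), the number of mixture components, and the use of scikit-learn are all assumptions, and the data shown is synthetic.

```python
# Hedged sketch: per-emotion GMMs over pitch/formant frames (assumed layout [F0, F1, F2, F3]).
# Feature extraction (pitch tracking, formant estimation) is assumed to be done elsewhere.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_emotion_gmms(features_by_emotion, n_components=4, seed=0):
    """Fit one GMM per emotion. features_by_emotion maps a label to an (N, D) array of frames."""
    models = {}
    for emotion, feats in features_by_emotion.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag",
                              random_state=seed)
        gmm.fit(feats)
        models[emotion] = gmm
    return models

def classify_emotion(models, utterance_feats):
    """Score an utterance (frames x features) with each GMM; return the best-scoring label."""
    scores = {emotion: gmm.score(utterance_feats)  # mean log-likelihood per frame
              for emotion, gmm in models.items()}
    return max(scores, key=scores.get)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in data only, for demonstration.
    train = {
        "neutral": rng.normal([120, 500, 1500, 2500], 30, size=(200, 4)),
        "anger":   rng.normal([220, 650, 1700, 2600], 40, size=(200, 4)),
        "grief":   rng.normal([100, 450, 1400, 2400], 25, size=(200, 4)),
    }
    models = train_emotion_gmms(train)
    test_utt = rng.normal([220, 650, 1700, 2600], 40, size=(50, 4))
    print(classify_emotion(models, test_utt))  # expected: "anger"
```

The same feature vectors could equally be fed to a decision tree classifier, the second approach named in the abstract; the per-emotion GMM scoring shown here is just one of the two.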
Full-Text [PDF 884 kb]

Received: 2010/11/21 | Accepted: 2008/12/24 | Published: 2010/11/21

Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.