1- MSc in Artificial Intelligence, School of Computer Engineering, K.N.Toosi University of Technology
2- Assistant Professor, Department of Computer Engineering, K.N.Toosi University of Technology
Abstract:
The performance of automatic speech recognition (ASR) systems degrades in noisy conditions because of the mismatch between training and test environments. Many methods have been proposed to reduce this mismatch. In recent years, deep neural networks (DNNs) have been widely used in ASR systems, including for robust speech recognition and feature extraction. In this paper, we propose using a deep belief network (DBN) as a post-processing method for de-noising Mel frequency cepstral coefficients (MFCCs). In addition, we use a deep belief network to extract tandem features (posterior probabilities of phone occurrence) from the de-noised MFCCs obtained in the previous stage, which yields more robust and discriminative features. The final robust feature vector consists of the de-noised MFCCs concatenated with these tandem features. Evaluation results on the Aurora2 database show that the proposed feature vector performs better than similar and conventional techniques, increasing recognition accuracy by 28% on average in comparison to MFCCs.
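The two-stage feature pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the trained DBNs are replaced by single affine layers, and every name here (`denoise_mfcc`, `tandem_features`, `robust_features`, the dimensions, and the random weights) is a hypothetical placeholder chosen only to show how the final vector is assembled.

```python
import numpy as np

def denoise_mfcc(noisy_mfcc, weights, bias):
    """Stand-in for the DBN de-noiser (assumption: a single affine map)."""
    return noisy_mfcc @ weights + bias

def tandem_features(mfcc, weights, bias):
    """Stand-in for the DBN phone classifier: per-frame softmax posteriors."""
    logits = mfcc @ weights + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def robust_features(noisy_mfcc, dn_params, td_params):
    """Concatenate de-noised MFCCs with tandem features, frame by frame."""
    clean = denoise_mfcc(noisy_mfcc, *dn_params)
    posteriors = tandem_features(clean, *td_params)
    return np.concatenate([clean, posteriors], axis=1)

# Hypothetical sizes: 5 frames, 13 MFCCs, 10 phone classes.
rng = np.random.default_rng(0)
n_frames, n_mfcc, n_phones = 5, 13, 10
noisy = rng.normal(size=(n_frames, n_mfcc))

# Placeholder parameters: an identity "de-noiser" and a random classifier.
dn_params = (np.eye(n_mfcc), np.zeros(n_mfcc))
td_params = (rng.normal(size=(n_mfcc, n_phones)), np.zeros(n_phones))

features = robust_features(noisy, dn_params, td_params)
print(features.shape)
```

Each output row holds the de-noised MFCCs followed by the phone posteriors for one frame, so the feature dimension is the sum of the two parts.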
Received: 2016/04/20 | Accepted: 2014/11/22 | Published: 2016/07/26