In a voice conversion system, the speech signal of speaker A (the source speaker) is modified so that it sounds as if it had been spoken by speaker B (the target speaker). This process is sometimes called speaker conversion, since it changes the speaker identity. The converted signal should be of high quality and sound natural. To this end, three major families of methods have been proposed: VQ-based, LMR-based, and GMM-based voice conversion.
Dynamic time warping (DTW) is the most popular way to align corresponding words in two sentences. In this paper, DTW is used to design the corresponding transfer function. To decrease the distance between the two speakers, DTW aligns pairs of corresponding phonemes from the two speakers, rather than pairs of words or sentences, while a phoneme-dependent linear temporal transform is used to reduce the error. With further appropriate corrections applied when training and designing the linear transforms, a high-quality voice conversion system is achieved.
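
As a rough illustration of the alignment step only, a minimal DTW sketch in Python is given here. It assumes that each phoneme segment is represented as a sequence of feature vectors (for example, spectral frames) compared with Euclidean distance. The function name `dtw_align`, the feature representation, and the three-way step pattern are illustrative assumptions, not the implementation used in this paper.

```python
import math


def dtw_align(src, tgt):
    """Align two sequences of feature vectors with classic DTW.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs matching src[i] to tgt[j]. Hypothetical helper for illustration.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning src[:i] with tgt[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(src[i - 1], tgt[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance source only
                cost[i][j - 1],      # advance target only
            )
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path
```

Under these assumptions, the function would be called once per pair of corresponding source and target phoneme segments, and the returned path would give the frame pairs from which a phoneme-dependent transform could be estimated.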