Fundamental Frequency (F0) Fusion Transformation-Based on BLSTM for Voice Conversion
Volume 6, Issue 4, August 2018, Pages: 298-305
Received: Aug. 9, 2018;
Published: Aug. 10, 2018
Miao Xiaokong, Command & Control Engineering College, Army Engineering University, Nanjing, China
Zhang Xiongwei, Command & Control Engineering College, Army Engineering University, Nanjing, China
Sun Meng, Command & Control Engineering College, Army Engineering University, Nanjing, China
In current neural-network-based voice conversion algorithms, the fundamental frequency (F0) is usually converted with a mean-variance linear transformation, which tends to introduce "mechanical tones" and odd intonation into the converted speech and leaves its similarity to the target low. This paper proposes a nonlinear mapping of F0 using a BLSTM (Bi-directional Long Short-Term Memory) neural network, merged with the structural information of the source fundamental frequency. First, a stable BLSTM network is trained on paired F0 data. The final fundamental frequency is then obtained by fusing the converted F0′ with the structural information of the original F0, and speech is synthesized from the result, improving the similarity between the converted speech and the target speech. The proposed method is also verified to reduce the mechanical sound of the converted speech to a certain extent and to improve its similarity to the target.
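As a point of reference, the mean-variance linear transformation that the abstract treats as the baseline is commonly applied to log-F0 over voiced frames. The sketch below illustrates that common formulation and is not the authors' code. The function names are hypothetical, and `fuse_with_source_voicing` is only one assumed reading of "fusing" the converted contour with the source structure (keeping the source's voiced/unvoiced pattern); the abstract does not specify the fusion rule.

```python
import numpy as np

def log_f0_stats(f0):
    """Mean and standard deviation of log-F0 over voiced frames (F0 > 0)."""
    f0 = np.asarray(f0, dtype=float)
    log_f0 = np.log(f0[f0 > 0])
    return log_f0.mean(), log_f0.std()

def convert_f0_mean_variance(f0_src, src_stats, tgt_stats):
    """Baseline linear conversion in the log-F0 domain.

    Voiced frames are normalized with the source statistics and rescaled
    with the target statistics; unvoiced frames (F0 == 0) stay at 0.
    """
    f0_src = np.asarray(f0_src, dtype=float)
    mu_s, sigma_s = src_stats
    mu_t, sigma_t = tgt_stats
    f0_out = np.zeros_like(f0_src)
    voiced = f0_src > 0
    log_f0 = np.log(f0_src[voiced])
    f0_out[voiced] = np.exp((log_f0 - mu_s) / sigma_s * sigma_t + mu_t)
    return f0_out

def fuse_with_source_voicing(f0_converted, f0_source):
    """Assumed fusion rule (hypothetical, not taken from the paper):
    keep the source's voiced/unvoiced structure and take the F0 values
    from the converted contour on voiced frames."""
    f0_converted = np.asarray(f0_converted, dtype=float)
    f0_source = np.asarray(f0_source, dtype=float)
    return np.where(f0_source > 0, f0_converted, 0.0)
```

Because the baseline rescales every voiced frame with the same two global statistics, it cannot change the shape of the contour, which is the limitation a learned nonlinear mapping is meant to address.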
Voice Conversion, BLSTM Neural Network, Fundamental Frequency Conversion, Nonlinear Mapping
To cite this article
Fundamental Frequency (F0) Fusion Transformation-Based on BLSTM for Voice Conversion, Science Discovery. Vol. 6, No. 4, 2018, pp. 298-305.