Original Article
Voice Cloning
Abubaker Bin Saleh Annaqeeb1
Dr. Mohd Rafi Ahmed2
1 Student, MCA, Deccan College of Engineering and Technology, Hyderabad, Telangana, India.
2 Associate Professor, MCA, Deccan College of Engineering and Technology, Hyderabad, Telangana, India.
Published Online: September-October 2025
Pages: 91-96
DOI: https://www.doi.org/10.59256/ijire.20250605015
References
1. J. Shen, R. Pang, R. Weiss, M. Schuster, et al., “Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions,” Proc. IEEE ICASSP, 2018, pp. 4779–4783.
2. A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, et al., “WaveNet: A generative model for raw audio,” arXiv preprint arXiv:1609.03499, 2016.
3. Y. Wang, R. Skerry-Ryan, D. Stanton, et al., “Tacotron: Towards end-to-end speech synthesis,” Proc. Interspeech, 2017, pp. 4006–4010.
4. J. Kong, J. Kim, and J. Bae, “HiFi-GAN: Generative adversarial networks for efficient and high fidelity speech synthesis,” Proc. NeurIPS, 2020.
5. J. Kim, J. Kong, and J. Son, “Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech,” Proc. ICML, 2021.
6. Y. Jia, Y. Zhang, R. Weiss, et al., “Transfer learning from speaker verification to multispeaker text-to-speech synthesis,” Proc. NeurIPS, 2018.
7. H. Zen, V. Dang, R. Clark, et al., “LibriTTS: A corpus derived from LibriSpeech for text-to-speech,” Proc. Interspeech, 2019, pp. 1526–1530.
8. P. Navarretta, “Ethical challenges in synthetic voices and voice cloning,” AI & Society, vol. 37, no. 4, pp. 1433–1445, 2022.
9. A. Dosovitskiy, L. Beyer, A. Kolesnikov, et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” Proc. ICLR, 2021.
10. N. Kalchbrenner, E. Elsen, K. Simonyan, et al., “Efficient neural audio synthesis,” Proc. ICML, 2018, pp. 2410–2419.