NEURAL TEXT-TO-SPEECH FOR UZBEK WITH PROSODY TRANSFER AND SPEAKER ADAPTATION

Authors

  • Sukhrob Avezov Sobirovich, PhD, Lecturer in the Department of Russian Language and Literature, Bukhara State University

Keywords:

Uzbek, text-to-speech, prosody transfer, speaker adaptation, FastSpeech-style model, HiFi-GAN, low-resource, evaluation.

Abstract

We present an open, data-efficient Uzbek text-to-speech (TTS) system that combines a non-autoregressive acoustic model with a prosody encoder and few-shot speaker adaptation. Rule-based text normalization and grapheme-to-phoneme conversion address the challenges of Uzbek orthography (Latin and Cyrillic scripts), agglutinative morphology, and interrogative clitics. Using 55 hours of speech, the proposed model improves mean opinion score (MOS), reduces ASR-based character error rate (CER), and transfers reference prosody across voices with minimal data. We also release recipes, tokenizers, and evaluation metrics to support reproducible benchmarking and rapid local adaptation.

Published

2025-09-26

Issue

Vol. 1 No. 9 (2025)

Section

Articles

How to Cite

NEURAL TEXT-TO-SPEECH FOR UZBEK WITH PROSODY TRANSFER AND SPEAKER ADAPTATION. (2025). Educator Insights: Journal of Teaching Theory and Practice, 1(9), 121-125. https://brightmindpublishing.com/index.php/EI/article/view/1426