Lithuanian speech synthesis using neural networks

Arnas Radzevičius

Title	Lithuanian speech synthesis using neural networks
Translation of Title	Kalbą generuojantys neuroniniai tinklai lietuvių kalbai.
Authors	Radzevičius, Arnas
Pages	63
Keywords [eng]	natural language processing, NLP, speech synthesis, text-to-speech, TTS, phonemic orthography, automatic text stressing, automatic accentuation, speech dataset, speech corpus, Tacotron 2, WaveGlow, VITS, kalbos sintezė, sintezatorius, automatinis kirčiuoklis, kirčiuoklis, kalbos duomenų rinkinys
Abstract [eng]	This master’s thesis proposes using stressed text instead of phonemes as the input to TTS neural networks, to solve the pronunciation problem of synthesized speech in languages with a high degree of phonemic orthography. Three single-speaker Lithuanian speech corpora were collected for the training experiments, containing 6, 27, and 92 hours of speech, respectively, and Tacotron 2 and VITS models were trained on them. A survey was conducted to compute mean opinion scores (MOS) and evaluate each trained TTS model. The thesis also details initial experimental results from training a neural network-based accentuation model, which is required as a pre-processing component for the TTS model to solve the pronunciation problem. The best model achieves a character-level accuracy of 93%, but it is not practical, since it assigns stress marks to all the letters in the input sequence instead of assigning a single pitch accent to each word. Readers are provided a link to a website with speech samples generated by the developed synthesizers. The base pre-trained neural network models are also provided in the links below.
Dissertation Institution	Vilniaus universitetas.
Type	Master thesis
Language	English
Publication date	2022
