Abstract [eng] |
Speech is the most natural form of human communication. The text-to-speech (TTS) problem arises in various applications: reading email aloud, reading e-book text aloud, and services for people with speech disorders. Constructing a speech synthesizer is a very complex task, and researchers are trying to automate speech synthesis. To solve the problem of Lithuanian speech synthesis, it is necessary to develop mathematical models of Lithuanian speech sounds. The research object of the dissertation is Lithuanian vowel and semivowel phoneme models. The proposed vowel and semivowel phoneme models can be used to develop a TTS formant synthesizer. A Lithuanian vowel and semivowel phoneme modelling framework is proposed, based on a mathematical model of vowel and semivowel phonemes and an automatic procedure for estimating the vowel phoneme fundamental frequency and determining the inputs. Within this framework, the phoneme signal is described as the output of a linear multiple-input single-output (MISO) system. The MISO system is a parallel connection of single-input single-output (SISO) systems whose input impulse amplitudes vary in time. Two synthesis methods are proposed within this framework: harmonic and formant. Simulation has shown that the proposed framework gives sufficiently good vowel and semivowel synthesis quality. |
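The MISO description above can be illustrated with a minimal sketch. It is not the dissertation's model: it assumes that each SISO channel is a two-pole resonator (a damped sinusoid impulse response) and that every channel is driven by an impulse train at the pitch period. The function names (`impulse_train`, `siso_resonator`, `miso_output`), the sampling rate, the channel frequencies, the damping values, and the amplitude sequences are all hypothetical, chosen only for illustration.

```python
import math


def impulse_train(n_samples, period, amplitudes):
    """Place one impulse every `period` samples; the i-th impulse has amplitude amplitudes[i]."""
    u = [0.0] * n_samples
    for i, a in enumerate(amplitudes):
        pos = i * period
        if pos >= n_samples:
            break
        u[pos] = a
    return u


def siso_resonator(u, freq_hz, damping, fs):
    """Assumed SISO channel: two-pole recursive filter with a damped sinusoid impulse response."""
    r = math.exp(-damping / fs)            # pole radius, below 1 so the filter is stable
    w = 2.0 * math.pi * freq_hz / fs       # pole angle in radians per sample
    b1, b2 = 2.0 * r * math.cos(w), -r * r
    y, y1, y2 = [], 0.0, 0.0
    for x in u:
        y0 = x + b1 * y1 + b2 * y2
        y.append(y0)
        y2, y1 = y1, y0
    return y


def miso_output(inputs, channels, fs):
    """Parallel connection: the output is the sum of all SISO channel outputs."""
    total = [0.0] * len(inputs[0])
    for u, (freq_hz, damping) in zip(inputs, channels):
        for k, v in enumerate(siso_resonator(u, freq_hz, damping, fs)):
            total[k] += v
    return total


fs = 8000                                   # assumed sampling rate, Hz
period = 80                                 # 80 samples at 8 kHz, i.e. a 100 Hz fundamental
channels = [(700.0, 150.0), (1200.0, 200.0)]  # hypothetical (frequency, damping) pairs
inputs = [
    impulse_train(400, period, [1.0, 0.9, 0.8, 0.7, 0.6]),  # amplitudes vary in time
    impulse_train(400, period, [0.5, 0.6, 0.7, 0.6, 0.5]),
]
signal = miso_output(inputs, channels, fs)
```

Because the system is linear, scaling every input amplitude by a constant scales `signal` by the same constant, and each channel contributes independently to the sum.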