Title |
Algorithm for transliteration of foreign languages to Lithuanian |
Translation of Title |
Užsienio kalbų transliteracijos į lietuvių kalbą algoritmas. |
Authors |
Adebayo Olurebi, David |
Pages |
47 |
Keywords [eng] |
Natural Language Processing, machine translation, machine transliteration, context-sensitive rules, context-free rules, Lithuanian, Yoruba, Georgian, IPA, Decision Trees. |
Abstract [eng] |
The continuing globalization of people and world economies demands effective and efficient worldwide access to information across language barriers. According to data from the Global Language Monitor, around 5,400 new words are created every year, of which only the 1,000 or so deemed to be in sufficiently widespread use make it into print. Although automatic translation of words from one language to another has helped bridge that barrier, adapting out-of-vocabulary words such as proper names in a way that preserves the grammatical or phonetic structure of the target language has proven a daunting task. This work explores the transliteration of Yoruba proper nouns into Lithuanian by means of two routes: direct and intermediate. The direct route is the customary transliteration process from source to target, whereas the intermediate route adapts out-of-vocabulary words from source to target using another language resource, such as the International Phonetic Alphabet (IPA) or another language, as the hub of the transliteration procedure. As part of the intermediate-route solution using the IPA, we developed a syllabification algorithm with an accuracy of 99.7% to facilitate the correct transcription of Yoruba phonemes into phonetic symbols before mapping the source IPA to the target (Lithuanian) IPA. On both routes, we compared the performance of classification and regression tree (CART) learning against a rule-based (context-free and context-sensitive) approach, with the aim of establishing (i) which of the two routes adapts Yoruba names to Lithuanian better, (ii) how the rule-based approach compares with machine learning on both routes, and (iii) how the language resources employed in the intermediate route compare with each other and what each does better. |
Dissertation Institution |
Vilniaus universitetas. |
Type |
Master thesis |
Language |
English |
Publication date |
2023 |