Title Investigation of text data augmentation for transformer training via translation technique
Authors Šeputis, Dominykas
DOI 10.15388/LMITT.2021.11
eISBN 9786090706237
Is Part of Konferencijos "Lietuvos magistrantų informatikos ir IT tyrimai" darbai [Proceedings of the conference "Lithuanian Master's Students' Research in Informatics and IT"], 14 May 2021. Vilnius : Vilniaus universiteto leidykla, 2021. p. 97-105. eISBN 9786090706237
Keywords [eng] data augmentation; transformer; fine-tuning; machine translation; DistilBERT; Opus-MT
Abstract [eng] Data augmentation can improve a model's final accuracy by introducing new data samples into the dataset. In this paper, text data augmentation using a translation technique is investigated. Synthetic translations generated by the Opus-MT model are compared with unique foreign-language data samples in terms of their impact on the performance of transformer-based models. The experimental results showed that multilingual models such as DistilBERT in some cases benefit from the introduction of additional, artificially created data samples in a foreign language.
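The translation-based augmentation summarised in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's pipeline: `augment_with_translations` and `toy_translate` are hypothetical names, a labelled (classification-style) training set is assumed, and the toy lexicon merely stands in for a real machine-translation model such as Opus-MT. The key idea shown is that each translated sentence inherits the label of its source sentence, so the synthetic samples can be appended to the training data used to fine-tune a multilingual model.

```python
from typing import Callable, List, Tuple

# A labelled example: (text, label). Labels are assumed to be integers.
Example = Tuple[str, int]


def augment_with_translations(
    dataset: List[Example],
    translate: Callable[[List[str]], List[str]],
) -> List[Example]:
    """Return the original examples plus one translated copy of each.

    `translate` is a hypothetical hook: any function that maps a batch of
    source sentences to the same number of target-language sentences
    (in practice it might wrap an Opus-MT model).
    """
    texts = [text for text, _ in dataset]
    translated = translate(texts)
    if len(translated) != len(texts):
        raise ValueError("translate must return one output per input")
    # Each synthetic sample keeps the label of the sentence it was made from.
    synthetic = [(t, label) for t, (_, label) in zip(translated, dataset)]
    return dataset + synthetic


if __name__ == "__main__":
    # Toy stand-in for a real MT model (hypothetical, illustration only).
    toy_lexicon = {"good movie": "geras filmas", "bad movie": "blogas filmas"}

    def toy_translate(batch: List[str]) -> List[str]:
        return [toy_lexicon.get(s, s) for s in batch]

    data = [("good movie", 1), ("bad movie", 0)]
    print(augment_with_translations(data, toy_translate))
    # prints [('good movie', 1), ('bad movie', 0), ('geras filmas', 1), ('blogas filmas', 0)]
```

Passing the translator in as a function keeps the augmentation step independent of any particular MT library, so the same code could compare synthetic translations against genuine foreign-language samples simply by swapping what is appended.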
Published Vilnius : Vilniaus universiteto leidykla, 2021
Type Conference paper
Language English
Publication date 2021
License Creative Commons