Title |
Keeleandmete õigusliku režiimi mõju nende abil loodud keelemudelitele / |
Translation of Title |
Influence of legal regime of language data on language models. |
Authors |
Kelli, Aleksei ; Vider, Kadri ; Tavast, Arvi ; Lindén, Krister ; Birštonas, Ramūnas ; Labropoulou, Penny ; Värv, Age ; Kull, Irene ; Tavits, Gaabriel ; Ginter, Carri |
DOI |
10.5128/ERYa16.04 |
Is Part of |
Eesti rakenduslingvistika ühingu aastaraamat = Estonian papers in applied linguistics. Tallinn : Eesti Rakenduslingvistika Ühing. 2020, vol. 16, p. 59-76. ISSN 1736-2563. eISSN 2228-0677 |
Keywords [eng] |
copyright ; personal data ; language model ; language technology ; text and data mining |
Abstract [eng] |
This article aims to explain the extent to which the legal regime applicable to language data affects the development and use of language models. In their approach, the authors follow a process chart, starting from raw data and ending with finished products containing language technology (e.g., a refrigerator with a speech interface). The raw data used in language technologies often include copyrighted works, objects of related rights (performances, sound recordings) and personal data (voice, other information about the person) stored in non-annotated and annotated databases. The legal issues of language data have already been studied. However, the legal aspects of language models have not been thoroughly explored. The authors are of the opinion that, as a rule, the legal status of language models is not affected by the legal status of the raw language data used, since copyrighted works usually do not remain in the model. However, the use of a person’s voice in a language model can create legal problems. The authors analyze possible solutions to overcome these problems. The article also outlines the regulation of data mining introduced by the new copyright directive and its implementation in the context of the development of language models. |
Published |
Tallinn : Eesti Rakenduslingvistika Ühing |
Type |
Journal article |
Language |
English |
Publication date |
2020 |