| Title |
Comparative analysis of deep learning models when classifying text with a significant class imbalance |
| Translation of Title |
Giliojo mokymosi modelių lyginamoji analizė, sprendžiant teksto klasifikavimo uždavinį, esant dideliam klasių disbalansui. |
| Authors |
Kvedaravičius, Karolis |
| Pages |
43 |
| Keywords [eng] |
Deep learning, text classification, class imbalance, text generation, BERT |
| Abstract [eng] |
In this work, a comparative analysis of different deep learning models and methods that fight class imbalance is performed to identify the best performing models/methods in a variety of imbalanced text classification tasks. Various papers that describe and try to tackle the class imbalance problem in deep learning classification tasks with different methods are analyzed. An examination of existing overviews of the methods that tackle class imbalance in deep learning was performed. Several shortcomings are identified, notably that it is often not tested how method performance transfers between different models and data sets. During this work, 45 different experiments were performed, training deep learning models on a binary text classification task with different model/data set/method combinations. The results show that a method's performance with one model/data set combination often does not carry over to another. Several possible explanations are discussed, relating to model architecture and data set differences. The best method for fighting class imbalance was found to be synthetic minority class sample generation using an LLM (Large Language Model). It provided the largest gains in classification metrics and improved those metrics in the largest number of model/data set combinations. |
| Dissertation Institution |
Vilniaus universitetas. |
| Type |
Master thesis |
| Language |
English |
| Publication date |
2026 |