Sentiment analysis: evaluating model performance and transferability across different sectors

Emilija Rizgelytė

Title	Sentiment analysis: evaluating model performance and transferability across different sectors
Translation of Title	Sentimentų analizė: modelių našumo ir pernaudojimo skirtingose platformose vertinimas.
Authors	Rizgelytė, Emilija
Pages	78
Keywords [eng]	Sentiment analysis, natural language processing, cross-platform transfer, ensemble averaging, bag of words, support vector classifier
Abstract [eng]	Online reviews and social media posts strongly influence consumer decision making, motivating the need for reliable sentiment analysis models. The growing variety of platforms hosting sentiment data creates a need for model transferability: a model should remain reliable when applied to text originating from a platform other than the one it was trained on. This thesis investigates the cross-domain generalisability of sentiment analysis by comparing model performance when training and testing data come from distinct sources. Five datasets are used: IMDB and Rotten Tomatoes movie reviews, Amazon product reviews, Spotify app reviews from the Google Play Store, and Twitter posts (tweets). Text in each dataset is preprocessed (lowercasing, removal of HTML and non-alphabetic characters, whitespace normalisation, tokenisation, and stop word removal) and represented using Bag of Words, TF-IDF, and Word2Vec embeddings. Classical supervised models (Naïve Bayes, Logistic Regression, linear SVM) and an LSTM-based recurrent neural network are evaluated, alongside support vector regression. The experiments include: testing models within the same dataset; testing how well a model trained on one platform performs when tested on another; removing platform- or domain-specific words to see how this affects model performance; training on several datasets combined, so that the training data covers context from various platforms; and improving predictions by averaging results from multiple models built with the same feature representation. Results show consistent performance degradation under cross-platform transfer, with the magnitude depending on platform and text style: models trained on Amazon and Spotify transfer comparatively better, models trained on Twitter generalise worst, and transfer between the two movie review platforms is weaker than their domain similarity would suggest.
TF-IDF and Bag of Words provide strong and stable baselines for sentiment analysis, Word2Vec performs well only for selected dataset pairs, and the LSTM model is competitive but does not eliminate domain shift. Domain-specific word removal and naive dataset aggregation yield mixed, generally small effects, and ensembles provide only marginal and inconsistent improvements. Friedman and Nemenyi statistical tests indicate that, while many cross-platform models differ significantly from intra-domain ones, no robust, consistent benefits arise from domain word removal or ensemble averaging. Overall, the findings indicate that platform mismatch is a primary limitation for sentiment model portability, and that effective cross-platform adaptation likely requires more advanced domain adaptation or representation learning techniques beyond the baseline strategies explored here.
Dissertation Institution	Vilniaus universitetas.
Type	Master thesis
Language	English
Publication date	2026

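
The preprocessing steps named in the abstract (lowercasing, removal of HTML and non-alphabetic characters, whitespace normalisation, tokenisation, and stop word removal) can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the thesis's implementation: the function name `preprocess`, the regular expressions, the whitespace-based tokenisation, and the tiny `STOP_WORDS` set are all hypothetical, since the record does not say which tools or stop word list were used.

```python
import html
import re

# Hypothetical, deliberately tiny stop word list; the list used in the
# thesis is not given in this record.
STOP_WORDS = frozenset(
    {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to", "in"}
)

TAG_RE = re.compile(r"<[^>]+>")          # HTML tags such as <br /> or <p>
NON_ALPHA_RE = re.compile(r"[^a-z\s]")   # anything except a-z and whitespace
WHITESPACE_RE = re.compile(r"\s+")       # runs of whitespace


def preprocess(text, stop_words=STOP_WORDS):
    """Return a list of cleaned tokens for one review or post."""
    text = text.lower()                           # lowercasing
    text = TAG_RE.sub(" ", text)                  # HTML tag removal
    text = html.unescape(text)                    # decode entities like &amp;
    text = NON_ALPHA_RE.sub(" ", text)            # non-alphabetic removal
    text = WHITESPACE_RE.sub(" ", text).strip()   # whitespace normalisation
    tokens = text.split()                         # whitespace tokenisation (assumed)
    return [t for t in tokens if t not in stop_words]  # stop word removal


if __name__ == "__main__":
    print(preprocess("<p>This movie was GREAT!!!</p>  10/10 &amp; fun"))
```

One consequence of removing every non-alphabetic character is that contractions are split (for example, "don't" becomes the tokens "don" and "t"), so negation handling depends on the chosen stop word list.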