Keywords [eng]
Fuel Consumption, Lithuania’s Vehicle Fleet, Real-World Data, Machine Learning, Regression, Linear Regressions, Gradient Boosting, CatBoost, XGBoost, LightGBM, Random Forest Regression, SVM, K-Means Clustering, Cross-Validation, Outlier Detection, Data Cleaning
Abstract [eng]
This master's thesis develops a comprehensive methodology for estimating annual fuel consumption in Lithuania’s road transport sector. It addresses gaps in the primary datasets, integrates supplementary data, harmonizes diverse sources, and applies predictive machine learning models. The study defines the steps of data cleaning, preprocessing, modeling, and hyperparameter tuning, and proposes a unified framework adapted to different vehicle types, such as passenger cars and vans. Unlike similar studies, it incorporates regional and demographic characteristics, such as urban and rural driving patterns and vehicle owner age, into the modeling process. Moreover, the study examines discrepancies between real-world fuel consumption and the official values provided by manufacturers, and proposes adjustments based on real-world data. Gradient Boosting and Random Forest Regression are identified as the most effective methods, offering high predictive power and fast performance while maintaining interpretability. Key vehicle features, such as engine size, weight, and power, have a significant impact on fuel consumption predictions. Nevertheless, the model could be further enhanced by incorporating more comprehensive real-world data for underrepresented vehicle categories and by exploring additional factors, such as seasonal variations and other vehicle characteristics. Additionally, including older vehicles would likely improve the model's generalization and robustness, given that the current training dataset consists primarily of 2021 and 2022 models.
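The abstract names data cleaning, outlier detection, and cross-validation as core steps of the pipeline. The sketch below illustrates how these steps fit together using only the Python standard library. It rests on several assumptions that do not come from the thesis: outliers are removed with Tukey's IQR fences on the target, a one-feature least-squares line stands in for the Gradient Boosting and Random Forest models, and the data are synthetic (engine size in litres versus L/100 km, with two injected recording errors). All function and variable names are hypothetical.

```python
import random
import statistics


def iqr_keep(values, k=1.5):
    """Indices of values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences; assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if low <= v <= high]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in regressor)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def kfold_mae(xs, ys, k=5, seed=0):
    """Mean absolute error averaged over k shuffled folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    fold_maes = []
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [abs(ys[i] - (slope * xs[i] + intercept)) for i in fold]
        fold_maes.append(statistics.fmean(errors))
    return statistics.fmean(fold_maes)


# Synthetic stand-in data (hypothetical): engine size in litres vs. L/100 km.
rng = random.Random(42)
engine = [rng.uniform(1.0, 3.0) for _ in range(200)]
fuel = [2.0 + 2.5 * e + rng.gauss(0, 0.3) for e in engine]
fuel[10], fuel[50] = 45.0, 0.1  # injected recording errors

kept = iqr_keep(fuel)
mae_raw = kfold_mae(engine, fuel)
mae_clean = kfold_mae([engine[i] for i in kept], [fuel[i] for i in kept])
print(f"kept {len(kept)} of {len(fuel)} rows; CV MAE raw={mae_raw:.3f}, clean={mae_clean:.3f}")
```

Cleaning before cross-validation matters here because a single implausible record both distorts the fitted model and inflates the error on whichever fold holds it out. In a fuller pipeline, the stand-in line would be replaced with tree ensembles such as scikit-learn's `GradientBoostingRegressor` or `RandomForestRegressor`, evaluated with `cross_val_score` over several vehicle features.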