Title Comparison of applying advanced spatiotemporal clustering algorithms with machine learning methods for forecasting of urban mobility demand
Translation of Title Pažangių erdvės ir laiko klasterizavimo algoritmų ir mašininio mokymosi metodų taikymo lyginamoji analizė miesto mobilumo paklausos prognozavimui.
Authors Yasyreva, Anna
Pages 58
Keywords [eng] spatiotemporal data, urban mobility, clustering algorithms, demand forecasting, erdvės ir laiko duomenys, miesto judumas, klasterizavimo algoritmas, paklausos prognozavimas, K-means, HDBSCAN, SARIMA, XGBoost, LSTM
Abstract [eng] Urban transportation networks generate vast spatiotemporal data streams that offer untapped potential for optimizing mobility flows and mitigating congestion. This thesis investigates the dynamics of New York City taxi trip data to develop predictive models that enhance urban transportation efficiency. By integrating temporal, spatial, and derived variables, the study constructs a multilevel analytical framework that combines advanced feature engineering, hybrid clustering, and machine learning methods. The research pursues three primary objectives: (1) examining the efficacy of hybrid clustering algorithms in identifying demand hotspots; (2) benchmarking the predictive performance of SARIMA, XGBoost, and LSTM networks; and (3) evaluating the stability of these models across varying forecast horizons. Exploratory analysis reveals distinct spatiotemporal patterns corresponding to rush hours and weekly cycles, which were successfully segmented using K-means and HDBSCAN clustering. Comparative evaluation indicates that the machine learning approaches significantly outperform the baseline SARIMA model. Specifically, the XGBoost model achieved the highest accuracy, reducing the root mean square error (RMSE) by 89.4% relative to SARIMA and 39.1% relative to LSTM for the one-day forecast, by 90.3% and 40.3% for the three-day forecast, and by 87.8% and 36.4% for the seven-day forecast. Furthermore, analysis across forecast horizons shows that XGBoost degrades more slowly than LSTM over longer time windows in terms of mean absolute error (MAE) and mean absolute percentage error (MAPE), suggesting greater robustness for operational planning and long-horizon forecasting. These findings provide actionable insights for dynamic fleet management and urban mobility optimization.
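Note on the reported metrics: the abstract compares models by RMSE, MAE, and MAPE and reports XGBoost's gain as a percentage reduction in RMSE. The sketch below shows how such figures are conventionally computed, assuming the standard definitions of the three metrics and a reduction measured relative to the baseline error. The function names and the toy demand values are hypothetical illustrations and are not taken from the thesis or its data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error over paired observations."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error over paired observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero (otherwise the ratio is undefined).
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Toy hourly demand counts (hypothetical, for illustration only).
actual = [100.0, 120.0, 80.0, 100.0]
baseline_forecast = [110.0, 100.0, 100.0, 90.0]  # stand-in for a baseline model
model_forecast = [102.0, 118.0, 82.0, 98.0]      # stand-in for a stronger model

baseline_rmse = rmse(actual, baseline_forecast)
model_rmse = rmse(actual, model_forecast)
reduction = relative_reduction(baseline_rmse, model_rmse)

print(f"baseline RMSE: {baseline_rmse:.2f}")
print(f"model RMSE:    {model_rmse:.2f}")
print(f"RMSE reduction relative to baseline: {reduction:.1f}%")
print(f"model MAE:  {mae(actual, model_forecast):.2f}")
print(f"model MAPE: {mape(actual, model_forecast):.2f}%")
```

Read this way, a figure such as "89.4% relative to SARIMA" means the XGBoost RMSE is roughly one tenth of the SARIMA RMSE on the same forecast horizon. Whether the thesis uses exactly this convention is an assumption.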
Dissertation Institution Vilniaus universitetas.
Type Master thesis
Language English
Publication date 2026