Abstract
In this master's thesis, the well-known Thomson Reuters news corpus is analyzed with modern statistical models for text data to investigate a novel research question: just how important is the content of news to financial markets? Topic models, which treat topics as probability distributions over words, are used to infer topic mixtures. These are the proportions of topics that the news covers within each document, where a document corresponds to a time period. The inferred mixtures are then used to predict market outcomes and to test for Granger causality between topics and market variables. The experimental results suggest that the content of news might not correspond exactly to market outcomes and does not carry a significant amount of predictive power. However, Granger causality testing reveals that some specific topics, for example those concerning the Federal Reserve and politics, do seem to Granger-cause market variables, especially trading volatility. This suggests that the content of news may matter as a means for market participants to stay informed, but that other factors also help determine how markets behave. In addition to this research question, an inference algorithm using the Direct Representation scheme is derived for the Supervised Hierarchical Dirichlet Process mixture model, and it is found to be superior to some other representation schemes in the literature.