Title |
Outlier detection in multidimensional streaming data |
Translation of Title |
Išskirčių identifikavimas daugiamačiuose srauto duomenyse. |
Authors |
Ribokaitė, Lina |
Pages |
106 |
Keywords [eng] |
Stream Data, Outlier Detection, Distance-based Outliers with Implemented SVM, Outliers Clustering. |
Abstract [eng] |
Outlier detection is an important task in many areas, such as fraud detection, network analysis, and sensor data analysis. With the increasing demand for stream data analysis, there is a need for efficient algorithms that can detect outliers in real time. This master's thesis aims to improve existing outlier detection algorithms for processing multidimensional streaming data. The research consists of an overview of existing outlier detection methods and focuses on distance-based outlier detection algorithms for stream data. Four distance-based outlier detection algorithms were chosen for detailed analysis, and some improvements were suggested. The first improvement is the integration of Support Vector Machines (SVM). By using an SVM, we can retrieve useful information about the behaviour of the data and use it in outlier detection. The experimental part shows that combining distance-based algorithms with SVM improves outlier detection accuracy, mostly by increasing precision. The second improvement concerns the output of outlier detection algorithms. Typically, a simple list of outliers is returned. In this work we experimented with outlier clustering, which provides additional information that helps to understand the behaviour and frequency of outliers in real time. This improvement lets us highlight trends among outliers, which is very useful because outlier detection in stream data is usually performed using only the most recent data rather than the whole dataset. |
Dissertation Institution |
Vilniaus universitetas. |
Type |
Master's thesis
Language |
English |
Publication date |
2021 |