Comparison of classification algorithms for detection of phishing websites

Paulius Vaitkevičius; Virginijus Marcinkevičius

doi:10.15388/20-INFOR404

Title	Comparison of classification algorithms for detection of phishing websites
Authors	Vaitkevičius, Paulius ; Marcinkevičius, Virginijus
DOI	10.15388/20-INFOR404
Is Part of	Informatica. Vilnius : Vilniaus universiteto Matematikos ir informatikos institutas. 2020, vol. 31, no. 1, p. 143-160. ISSN 0868-4952. eISSN 1822-8844
Keywords [eng]	phishing detection ; classification algorithms ; phishing datasets
Abstract [eng]	Phishing remains a persistent security threat, with global losses exceeding 2.7 billion USD in 2018, according to the FBI’s Internet Crime Complaint Center. The literature describes several generations of phishing website detection methods. The oldest methods rely on manual blacklisting of known phishing websites’ URLs in a centralized database and therefore cannot detect newly launched phishing websites. More recent studies treat phishing website detection as a supervised machine learning problem, using phishing datasets built from features extracted from phishing websites’ URLs. These studies show that some classification algorithms perform better than others on differently designed datasets, but they do not identify the best classification algorithm for the phishing website detection problem in general. The purpose of this research is to compare classic supervised machine learning algorithms on all publicly available phishing datasets with predefined features and to identify the best-performing algorithm for phishing website detection, regardless of a specific dataset design. Eight widely used classification algorithms were configured in Python using the scikit-learn library and tested for classification accuracy on all publicly available phishing datasets. The algorithms were then ranked by accuracy on the different datasets using three ranking techniques, and the differences in results were tested for statistical significance using Welch’s t-test. The comparison results presented in this paper show ensembles and neural networks outperforming the other classical algorithms.
Published	Vilnius : Vilniaus universiteto Matematikos ir informatikos institutas
Type	Journal article
Language	English
Publication date	2020
CC license
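
The abstract mentions ranking algorithms by accuracy and checking differences with Welch's t-test. The following is a minimal sketch of those two steps using only the Python standard library. It is not the authors' code: the record does not name the three ranking techniques or state which samples were compared, so average rank across datasets and per-fold accuracies are assumptions made here, and the helper names (`welch_t`, `average_ranks`), algorithm labels, and all numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of the two sample means (unequal variances allowed).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def average_ranks(acc_by_dataset):
    """Rank algorithms within each dataset (1 = most accurate), then average the ranks.

    Ties are not given special treatment in this sketch.
    """
    totals = {}
    for scores in acc_by_dataset.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, name in enumerate(ordered, start=1):
            totals[name] = totals.get(name, 0) + rank
    n = len(acc_by_dataset)
    return {name: total / n for name, total in totals.items()}


# Made-up accuracies, for illustration only (not results from the paper).
fold_acc_a = [0.96, 0.97, 0.95, 0.96, 0.97]  # hypothetical algorithm A, 5 folds
fold_acc_b = [0.93, 0.94, 0.92, 0.95, 0.93]  # hypothetical algorithm B, 5 folds
t_stat, dof = welch_t(fold_acc_a, fold_acc_b)

mean_acc = {
    "dataset_1": {"A": 0.96, "B": 0.93, "C": 0.94},
    "dataset_2": {"A": 0.91, "B": 0.92, "C": 0.90},
}
ranks = average_ranks(mean_acc)
```

A lower average rank means the algorithm tended to place higher across datasets. Turning `t_stat` and `dof` into a p-value requires the Student t distribution, which a statistics library would normally supply.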
