Enhancing phishing detection through machine learning
Authors
Xue, Zhen
Degree
Master of Applied Technologies (Computing)
Grantor
Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology
Date
2024
Supervisors
Sarrafpour, Bahman
Type
Masters Thesis
Keyword
phishing
detection
scams
cybersecurity
modelling
machine learning
Citation
Xue, Z. (2024). Enhancing phishing detection through machine learning (Unpublished document submitted in partial fulfilment of the requirements for the degree of Master of Applied Technologies (Computing)). Unitec, Te Pūkenga - New Zealand Institute of Skills and Technology
https://hdl.handle.net/10652/6798
Abstract
RESEARCH QUESTIONS
• Which dataset is appropriate for training the phishing detection model?
• How can the best parameters be found to optimize the model?
• Which feature selection algorithm offers the best performance once detection time is taken into account?
ABSTRACT
Phishing attacks are currently considered among the most significant cyber threats, particularly within social engineering. They threaten individuals and companies in daily life and business, and they cause substantial financial losses and leaks of sensitive data. The traditional blacklist-based defence is inefficient because blacklist databases are updated more slowly than new phishing websites appear.
Machine learning has become a powerful tool in many areas in recent years. In cybersecurity, it can be used to build and train models that detect threats and attacks. Machine learning approaches can offer greater accuracy and robustness than traditional methods such as blacklists, making them more effective and efficient.
This research uses three popular machine learning models, random forest (RF), decision tree (DT) and Extreme Gradient Boosting (XGBoost), to build classifiers for phishing detection. Different feature selection and classification algorithms will be added to enhance the three proposed models. The aim is to find an efficient machine learning model combined with a feature selection algorithm that provides a powerful defence against phishing and strong protection for individuals and firms when browsing the Internet.
A confusion matrix, accuracy, precision, recall, ROC AUC and Gini are used to assess the models’ performance and to determine which feature selection method best suits the proposed machine learning models. A larger dataset is then used to re-train the selected model, with the aim of improving its generalization capability. This thesis is intended to benefit research on phishing detection and cybersecurity.
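The evaluation measures named above can be illustrated with a minimal sketch. It is not the thesis's own code: the function names, the 0.5 decision threshold, the toy labels and scores, and the reading of "Gini" as the Gini coefficient 2 × AUC − 1 are all assumptions made for illustration. Label 1 stands for phishing and 0 for legitimate.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN (assumed convention: 1 = phishing, 0 = legitimate)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the chance that a random phishing sample outscores a
    random legitimate one; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Threshold the scores, then compute the metrics listed in the abstract."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    c = confusion_counts(y_true, y_pred)
    auc = roc_auc(y_true, scores)
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        **c,
        "accuracy": (c["tp"] + c["tn"]) / len(y_true),
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
        "roc_auc": auc,
        "gini": 2 * auc - 1,  # assumed definition: Gini coefficient from AUC
    }


# Toy data (hypothetical): true labels and classifier scores for 8 URLs.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
report = evaluate(labels, scores)
```

On this toy data the report contains 3 true positives, 1 false positive, 3 true negatives and 1 false negative, so accuracy, precision and recall are all 0.75, ROC AUC is 0.9375 and Gini is 0.875. Accuracy, precision and recall depend on the chosen threshold, whereas ROC AUC and Gini summarise ranking quality across all thresholds.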
Copyright holder
Author
Copyright notice
All rights reserved
