Enhancing phishing detection through machine learning

Authors

Xue, Zhen

Degree

Master of Applied Technologies (Computing)

Grantor

Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology

Date

2024

Supervisors

Sarrafpour, Bahman

Type

Masters Thesis

Keyword

phishing
detection
scams
cybersecurity
modelling
machine learning

Citation

Xue, Z. (2024). Enhancing phishing detection through machine learning (Unpublished document submitted in partial fulfilment of the requirements for the degree of Master of Applied Technologies (Computing)). Unitec, Te Pūkenga - New Zealand Institute of Skills and Technology https://hdl.handle.net/10652/6798

Abstract

RESEARCH QUESTIONS
- Which dataset is appropriate for training the phishing detection model?
- How can the best hyperparameters be found to optimize the model?
- Which feature selection algorithm offers strong performance while keeping detection time low?

ABSTRACT
Phishing attacks are currently considered among the most significant cyber threats, particularly in the realm of social engineering. They threaten individuals and companies in daily life and business, causing large financial losses and the loss of sensitive data. The traditional blacklist-based approach to preventing phishing is inefficient because the blacklist database is updated more slowly than new phishing websites appear. Machine learning has become a powerful tool in many areas, and in cybersecurity it can be used to build and train models that detect threats and attacks. Compared with traditional methods, machine learning demonstrates greater accuracy and robustness, making it more powerful and efficient. This research uses three popular machine learning models, random forest (RF), decision tree (DT) and Extreme Gradient Boosting (XGBoost), to build classifiers for phishing detection. Different feature selection and classification algorithms are added to enhance the three proposed models. The research aims to find an efficient machine learning model combined with a feature selection algorithm, providing a powerful way to fight phishing and strong protection for individuals and firms browsing the Internet. The confusion matrix, accuracy, precision, recall, ROC AUC and Gini coefficient are used to assess the models' performance and to determine which feature selection method best suits the proposed models. A larger dataset is then used to retrain the selected model to improve its generalization capability.
This thesis aims to contribute to phishing detection research and cybersecurity practice.
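The evaluation metrics named in the abstract can be illustrated with a small, dependency-free sketch. This is not the thesis's code. It assumes binary labels with 1 = phishing and 0 = legitimate, classifier scores in [0, 1], a 0.5 decision threshold, and the common convention Gini = 2 × AUC − 1 (the abstract does not state which Gini definition is used). The function names and the toy labels and scores are hypothetical, made up for illustration.

```python
from typing import Sequence, Tuple

# Illustrative sketch only: label 1 = phishing (positive class), 0 = legitimate.


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN). scikit-learn's confusion_matrix uses the layout [[TN, FP], [FN, TP]] instead."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1  # phishing site correctly flagged
        elif t == 0 and p == 1:
            fp += 1  # legitimate site wrongly flagged
        elif t == 1 and p == 0:
            fn += 1  # phishing site missed
        else:
            tn += 1  # legitimate site correctly passed
    return tp, fp, fn, tn


def accuracy_precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged sites that are phishing
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of phishing sites that get flagged
    return accuracy, precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random phishing example outscores a random legitimate one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def gini(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Gini coefficient under the assumed convention Gini = 2 * AUC - 1."""
    return 2.0 * roc_auc(y_true, scores) - 1.0


# Toy example with made-up labels and scores (not data from the thesis).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 2)
print(accuracy_precision_recall(y_true, y_pred))
print(round(roc_auc(y_true, scores), 4))  # 0.8889
print(round(gini(y_true, scores), 4))  # 0.7778
```

Accuracy, precision and recall depend on the chosen threshold, whereas ROC AUC and Gini summarize the ranking quality of the scores across all thresholds. In a real pipeline, scikit-learn's `sklearn.metrics` module (`confusion_matrix`, `precision_score`, `recall_score`, `roc_auc_score`) would normally be preferred over hand-rolled versions like these.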

Copyright holder

Author

Copyright notice

All rights reserved
