SMS phishing detection using machine learning and deep learning techniques

Authors

Hasti, Pavan

Degree

Master of Applied Technologies (Computing)

Grantor

Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology

Date

2025

Supervisors

Barmada, Bashar
Varastehpour, Soheil

Type

Masters Thesis

Keyword

New Zealand
SMS phishing
phishing
detection
scams
machine learning
deep learning

Citation

Hasti, P. (2025). SMS phishing detection using machine learning and deep learning techniques (Unpublished document submitted in partial fulfilment of the requirements for the degree of Master of Applied Technologies (Computing)). Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology. https://hdl.handle.net/10652/6803

Abstract

RESEARCH QUESTIONS

How effective are machine learning (ML) and deep learning (DL) models in detecting SMS phishing in New Zealand’s mobile system?

• What impact does class imbalance have, and how can SMOTE address this in phishing detection?
• Which ML and DL models perform best for SMS phishing detection in New Zealand?
• How can ethical data handling be maintained in SMS phishing detection?
• What real-world challenges (model drift, adversarial attacks) could affect SMS phishing detection, and how can they be mitigated?

ABSTRACT

Short Message Service (SMS) remains a vital communication tool in daily life, despite the rapid development of Internet Protocol-based messaging services. SMS phishing (smishing), an increasingly sophisticated cyber threat, has emerged in tandem with the rise in mobile device use, and people are finding it hard to distinguish legitimate messages from malicious ones. Because attackers continually develop new methods, conventional phishing detection techniques, including heuristic, feature-based, rule-based, and blacklist approaches, struggle to detect smishing.

This study aims to construct a New Zealand domain-specific SMS phishing detection system that overcomes these difficulties and improves mobile system cybersecurity. The goal is an effective ML- and DL-based model that accurately detects and categorizes smishing messages while addressing class imbalance and ensuring ethical data handling. The dataset combines the SMS Smishing Collection from Kaggle with smishing messages from the New Zealand Department of Internal Affairs (DIA) anti-scam archive, ensuring relevance to the local context. Pre-processing handled missing and duplicate values, checked label uniqueness, and applied text pre-processing and lemmatization, followed by label encoding. The dataset was then balanced with the Synthetic Minority Over-sampling Technique (SMOTE).

The classification models selected include the machine learning models Random Forest and XGBoost and the deep learning models CNN, RNN, and LSTM, chosen for their strong performance in text analysis. The models work well for detecting fraudulent SMS messages in the setting of mobile communication networks. Model performance was assessed with accuracy, precision, recall, and F1-score. The results showed that the XGBoost classifier achieved the highest accuracy of the models evaluated, 97.05%.

This study highlights the practical implications of smishing detection in real-world mobile communication systems and emphasizes the importance of integrating these models into mobile security applications. It also discusses potential future work, including the integration of transformer-based models, the handling of model drift, and the mitigation of adversarial attacks in dynamic environments.
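
To make the class-balancing step concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, written with the Python standard library only. It is not the thesis's implementation: the function name `smote_oversample`, the two-dimensional toy feature vectors, and the parameter values are all assumptions chosen for illustration, and the abstract does not state how messages were converted to numeric features. In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn package, would normally be used instead.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature vectors (minority class only).
    Hypothetical helper for illustration; not taken from the thesis.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest minority neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D features (made-up values): 6 legitimate vs 3 smishing messages.
legit = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.85], [0.3, 0.7], [0.25, 0.95], [0.05, 0.75]]
smish = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]

# Add synthetic smishing samples until both classes have the same size.
balanced_smish = smish + smote_oversample(smish, n_new=len(legit) - len(smish), k=2)
print(len(legit), len(balanced_smish))
```

Because each synthetic point is an interpolation between two real minority samples, it always falls inside the region those samples span. Oversampling of this kind is usually applied to the training split only, so that evaluation metrics are computed on real messages.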

Copyright holder

Author

Copyright notice

All rights reserved
