Incremental learning applied to phishing detection: Incremental QR and SVM

Authors

Kyaw, Aung Khant

Degree

Master of Applied Technologies (Computing)

Grantor

Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology

Date

2025

Supervisors

Song, Lei
Sharifzadeh, Hamid

Type

Masters Thesis

Keyword

phishing
detection
LDA-transformed (Linear Discriminant Analysis)
SVM (support vector machine)
QR decomposition (linear algebra)
ensemble learning
modelling
machine learning
deep learning
cybersecurity
scams

Citation

Kyaw, A.K. (2025). Incremental learning applied to phishing detection: Incremental QR and SVM (Unpublished document submitted in partial fulfilment of the requirements for the degree of Master of Applied Technologies (Computing)). Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology. https://hdl.handle.net/10652/6943

Abstract

RESEARCH QUESTIONS

1. How effective is LDA-based feature extraction in capturing the most discriminative characteristics for phishing detection compared to other traditional techniques?
2. How does applying QR decomposition to LDA-transformed features improve computational efficiency and enable scalable, real-time phishing detection?
3. What are the limitations of traditional batch-learning approaches in phishing detection, and how can an incremental SVM improve adaptability to evolving phishing attacks?
4. How does the proposed combination of LDA, QR decomposition, and incremental SVM compare to other machine learning pipelines in terms of memory usage, accuracy, and update time?
5. Does reducing feature dimensionality via LDA and QR decomposition help mitigate overfitting and improve generalization in dynamic phishing detection environments?

ABSTRACT

Phishing through social engineering on the Internet remains a persistent cybersecurity problem: attacks change constantly, and static detection models tend to overfit to the patterns they were trained on. To address this, we propose an incremental learning framework that combines class-discriminative feature extraction via Linear Discriminant Analysis (LDA), computationally lightweight updates enabled by QR decomposition of the projection matrix, and robust classification with Support Vector Machines (SVM). The framework adapts to new attack patterns batch by batch, filters noise, and removes redundant features, which avoids the expensive process of retraining. Evaluated on four benchmark datasets (PhiUSIIL phishing URLs, SMS spam, email phishing, and phishing webpages), the model achieved 99.71% accuracy and a ROC-AUC of 0.9997 on PhiUSIIL. It outperformed traditional PCA/SVD approaches, reducing memory consumption by 30% and accelerating batch updates to 0.038 seconds.

Dimensionality reduction not only mitigates overfitting but also makes the model easier to deploy in real-time, resource-constrained environments, a significant step toward adaptive, real-world threat detection systems. The combination of LDA, QR decomposition, and incremental SVM provides a scalable, adaptive solution that maintains classification performance while minimizing computational overhead. QR decomposition offers numerical stability and a compact orthonormal projection, leading to up to 27% faster training times and 30% lower memory usage compared to incremental PCA and SVD. These improvements enable real-time phishing detection suitable for resource-limited settings such as mobile devices or edge computing platforms. The model is also robust across phishing modalities (URLs, SMS, email, webpages), showing minimal degradation when it encounters new phishing patterns or domain shifts, which indicates reduced overfitting and improved generalization. The approach is therefore effective against current phishing threats and resilient to continuously evolving cybersecurity challenges.
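
The three stages named in the abstract (LDA projection, QR orthonormalization, batch-wise SVM updates) can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes scikit-learn and NumPy, uses synthetic three-class data in place of the benchmark datasets (so that LDA yields a two-dimensional subspace and the QR step is non-trivial), and substitutes a linear SVM trained by stochastic gradient descent (`SGDClassifier` with hinge loss) for whatever incremental SVM the thesis uses. The projection is fitted once on the initial batch and is not itself updated incrementally here. The batch size of 250 and the helper `project` are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import SGDClassifier

# Synthetic stand-in for extracted phishing features (assumption: not real data).
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=10, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, class_sep=2.0, random_state=0,
)
X_init, y_init = X[:500], y[:500]               # initial labelled batch
X_stream, y_stream = X[500:1500], y[500:1500]   # arrives later, in batches
X_test, y_test = X[1500:], y[1500:]             # held out for evaluation

# 1) Class-discriminative directions from LDA, fitted on the initial batch.
lda = LinearDiscriminantAnalysis(solver="svd").fit(X_init, y_init)
W = lda.scalings_[:, : len(lda.classes_) - 1]   # (n_features, n_classes - 1)

# 2) QR decomposition: Q has orthonormal columns spanning the same subspace as W.
Q, _ = np.linalg.qr(W)

# Centering and scaling statistics taken from the initial batch only.
mean = X_init.mean(axis=0)
scale = ((X_init - mean) @ Q).std(axis=0)


def project(batch):
    """Map raw features into the compact orthonormal LDA subspace."""
    return ((batch - mean) @ Q) / scale


# 3) Linear SVM (hinge loss) updated batch by batch, without full retraining.
svm = SGDClassifier(loss="hinge", random_state=0)
svm.partial_fit(project(X_init), y_init, classes=lda.classes_)
for start in range(0, len(X_stream), 250):
    stop = start + 250
    svm.partial_fit(project(X_stream[start:stop]), y_stream[start:stop])

accuracy = svm.score(project(X_test), y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

Each `partial_fit` call touches only the current batch in the reduced subspace, which is the property the abstract relies on for low update time and memory use. The figures reported in the abstract come from the thesis experiments and are not reproduced by this sketch.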

Copyright holder

Author

Copyright notice

All rights reserved
