Incremental learning applied to phishing detection: Incremental QR and SVM
Authors
Kyaw, Aung Khant
Degree
Master of Applied Technologies (Computing)
Grantor
Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology
Date
2025
Supervisors
Song, Lei
Sharifzadeh, Hamid
Type
Masters Thesis
Keyword
phishing
detection
LDA (Linear Discriminant Analysis)
SVM (support vector machine)
QR decomposition (linear algebra)
ensemble learning
modelling
machine learning
deep learning
cybersecurity
scams
Citation
Kyaw, A.K. (2025). Incremental learning applied to phishing detection: Incremental QR and SVM (Unpublished document submitted in partial fulfilment of the requirements for the degree of Master of Applied Technologies (Computing)). Unitec, Te Pūkenga - New Zealand Institute of Skills and Technology
https://hdl.handle.net/10652/6943
Abstract
RESEARCH QUESTIONS
1. How effective is LDA-based feature extraction in capturing the most discriminative characteristics for phishing detection compared to other traditional techniques?
2. How does applying QR decomposition to LDA-transformed features improve computational efficiency and enable scalable, real-time phishing detection?
3. What are the limitations of traditional batch-learning approaches in phishing detection, and how can an incremental SVM improve adaptability to evolving phishing attacks?
4. How does the proposed combination of LDA, QR decomposition, and incremental SVM compare to other machine learning pipelines in terms of memory usage, accuracy, and update time?
5. Does reducing feature dimensionality via LDA and QR decomposition help mitigate overfitting and improve generalization in dynamic phishing detection environments?
ABSTRACT
Phishing through online social engineering remains a persistent cybersecurity problem: attack patterns change constantly, and static detection models are prone to overfitting.
To address this, we propose an incremental learning framework that combines three components: class-discriminative feature extraction via Linear Discriminant Analysis (LDA), computationally lightweight updates of the projection matrix via QR decomposition, and robust classification with Support Vector Machines (SVM). The framework adapts to new attack patterns batch by batch, filtering noise and removing redundant features, and so avoids the expensive process of full retraining.
Evaluated on four benchmark datasets (PhiUSIIL phishing URLs, SMS spam, email phishing, and phishing webpages), the model achieved 99.71% accuracy and a ROC-AUC of 0.9997 on PhiUSIIL. Compared with traditional PCA/SVD approaches, it reduced memory consumption by 30% and brought batch update time down to 0.038 seconds. The added dimensionality reduction not only prevents overfitting but also improves deployability in real-time, resource-constrained environments, a significant step toward adaptive, real-world threat detection systems.
The combination of LDA, QR decomposition, and incremental SVM provides a scalable, adaptive solution that maintains classification performance while minimizing computational overhead. QR decomposition offers numerical stability and a compact orthonormal projection, giving up to 27% faster training and 30% lower memory usage than incremental PCA and SVD. These improvements enable real-time phishing detection in resource-limited settings such as mobile devices or edge computing platforms. The model is also robust across phishing modalities (URLs, SMS, email, webpages): it shows minimal degradation when it encounters new phishing patterns or domain shifts, which reduces overfitting and improves generalization. The approach is therefore effective against current phishing threats and adaptable to continuously evolving cybersecurity challenges.
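As an illustration only, the three stages named in the abstract (LDA feature extraction, QR orthonormalisation of the projection, and a batch-updated SVM) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the thesis implementation: the function and class names, the synthetic 10-feature data, the learning rate, and the regularisation constants are all hypothetical. The projection is fitted once on an initial batch and then held fixed, whereas the thesis describes incremental updates of the projection; the classifier is a plain linear SVM trained by stochastic gradient descent on the hinge loss. With two classes, LDA yields a single discriminant direction, so Q has one column here.

```python
import numpy as np


def lda_qr_projection(X, y):
    """Return (mean, Q): LDA directions orthonormalised with a thin QR."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = 1e-6 * np.eye(d)   # within-class scatter, lightly regularised (assumed constant)
    Sb = np.zeros((d, d))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
    # Leading eigenvectors of Sw^{-1} Sb are the discriminant directions.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    top = np.argsort(-evals.real)[: len(classes) - 1]
    W = evecs[:, top].real
    Q, _ = np.linalg.qr(W)  # orthonormal basis for the same subspace
    return mu, Q


class IncrementalLinearSVM:
    """Linear SVM updated one batch at a time by SGD on the hinge loss."""

    def __init__(self, n_features, lam=1e-3, eta=0.05):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lam = lam  # L2 regularisation strength (hypothetical value)
        self.eta = eta  # constant learning rate (hypothetical value)

    def partial_fit(self, Z, y):
        # Labels are expected in {-1, +1}.
        for z, label in zip(Z, y):
            margin = label * (self.w @ z + self.b)
            self.w *= 1.0 - self.eta * self.lam   # shrink step from the L2 term
            if margin < 1.0:                      # hinge loss is active
                self.w += self.eta * label * z
                self.b += self.eta * label
        return self

    def predict(self, Z):
        return np.where(Z @ self.w + self.b >= 0.0, 1, -1)


def make_batch(rng, n):
    """Synthetic stand-in for a batch of labelled feature vectors."""
    y = rng.choice([-1, 1], size=n)
    X = rng.normal(size=(n, 10))
    X[y == 1, :3] += 3.0  # shift the positive class in three features
    return X, y


rng = np.random.default_rng(0)
X0, y0 = make_batch(rng, 400)
mu, Q = lda_qr_projection(X0, y0)  # fitted once on the initial batch

svm = IncrementalLinearSVM(Q.shape[1])
svm.partial_fit((X0 - mu) @ Q, y0)
for _ in range(5):                 # later batches update the SVM only
    Xb, yb = make_batch(rng, 200)
    svm.partial_fit((Xb - mu) @ Q, yb)

X_test, y_test = make_batch(rng, 500)
accuracy = float(np.mean(svm.predict((X_test - mu) @ Q) == y_test))
print(f"held-out accuracy on toy data: {accuracy:.3f}")
```

Because Q has orthonormal columns, projecting a sample costs one matrix-vector product and each later batch touches only the low-dimensional SVM weights, which is the kind of lightweight update the abstract attributes to the QR step. The figures reported in the abstract come from the thesis experiments and are not reproduced by this toy example.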
Copyright holder
Author
Copyright notice
All rights reserved
