A comparative analysis of machine learning algorithms in network-based intrusion detection systems for detecting advanced persistent threats to enhance cybersecurity
Authors
Darnal, Kiran
Degree
Master of Applied Technologies
Grantor
Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology
Date
2024
Supervisors
Sarrafpour, Bahman
Sabaee, Maryam Erfanian
Type
Masters Thesis
Keyword
advanced persistent threat (APT)
intrusion detection systems (IDS)
network based intrusion detection systems (NIDS)
cybersecurity
algorithms
machine learning
Citation
Darnal, K. (2024). A comparative analysis of machine learning algorithms in network-based intrusion detection systems for detecting advanced persistent threats to enhance cybersecurity (Unpublished document submitted in partial fulfilment of the requirements for the degree of Master of Applied Technologies (Computing)). Unitec, Te Pūkenga - New Zealand Institute of Skills and Technology
https://hdl.handle.net/10652/6844
Abstract
RESEARCH QUESTIONS
Q1: Which ML algorithms most effectively detect APT traffic within NIDS?
Q2: What network behaviour features are most significant for identifying APTs using ML models?
Q3: How does data imbalance impact the detection accuracy of ML algorithms, and how do resampling techniques impact their performance?
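Q3 refers to resampling techniques without naming one. As an illustration only, random oversampling is one of the simplest: minority-class rows are duplicated at random until every class is as large as the largest. The sketch below assumes the data fits in plain Python lists; the function name random_oversample and the toy flows are hypothetical, and this is not necessarily the technique the thesis evaluated.

```python
import random
from collections import Counter, defaultdict

def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate minority-class rows at random until
    every class matches the size of the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Keep every original row, then draw extra copies with replacement.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out

# Toy flows: 6 benign, 2 APT-like (feature values are made up).
X = [[i, i * 2] for i in range(8)]
y = ["benign"] * 6 + ["apt"] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # Counter({'benign': 6, 'apt': 6})
```

Resampling like this belongs on the training split only; oversampling before splitting would copy minority rows into the test data and inflate the measured accuracy.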
ABSTRACT
Advanced Persistent Threats (APTs) pose a sophisticated and evolving cyber risk, persistently targeting organisations to extract high-value information. Traditional Intrusion Detection Systems (IDS), which rely on signature-based and heuristic methods, often fail to detect APTs: the threats are stealthy, while the systems suffer from high false positive rates and an inability to generalise across diverse attack patterns. Prior research on machine learning (ML)-based IDS has also been limited by data imbalance, ineffective feature selection strategies, and inconsistent model performance across datasets.
This research addresses these gaps by comparing classical ML and deep learning algorithms for APT detection on the NF-UQ-NIDS-v2 dataset. This large and diverse NetFlow-based dataset combines four datasets (NF-UNSW-NB15-v2, NF-BoT-IoT-v2, NF-ToN-IoT-v2, and NF-CSE-CIC-IDS2018-v2) and comprises 43 features and over 75 million records, including 21 attack types. An ensemble-based feature selection approach was implemented to identify the most relevant network traffic features and improve model performance. Nine models were trained and evaluated using k-fold cross-validation to ensure reliable results: the classical ML models CatBoost, Decision Tree, K-Nearest Neighbors, Gradient Boosting, Random Forest, SVM, and XGBoost, and the deep learning models 1D-CNN and LSTM.
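The abstract does not spell out how the ensemble-based feature selection or the cross-validation folds were built. As a minimal sketch, assuming the ensemble step averages per-feature ranks from several selectors (mean-rank aggregation is one common choice, not necessarily the thesis's method) and assuming stratified folds, the two steps could look like this in plain Python. The helper names aggregate_rankings and stratified_kfold, the importance scores, and the labels are hypothetical; the NetFlow-style feature names serve only as examples.

```python
import random
from collections import defaultdict

def aggregate_rankings(score_tables, top_k):
    """Hypothetical ensemble feature selection by mean-rank aggregation.

    Each table maps feature name -> importance score (higher is better),
    e.g. from a different selector. Features are ranked within each table,
    the ranks are averaged, and the top_k lowest mean ranks are returned.
    Assumes every table scores the same set of features.
    """
    rank_sums = defaultdict(float)
    for scores in score_tables:
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, feature in enumerate(ordered, start=1):
            rank_sums[feature] += rank
    mean_rank = {f: s / len(score_tables) for f, s in rank_sums.items()}
    # Break ties on the feature name so the result is deterministic.
    return sorted(mean_rank, key=lambda f: (mean_rank[f], f))[:top_k]

def stratified_kfold(y, k, seed=0):
    """Hypothetical stratified k-fold splitter yielding (train, test) index lists.

    Indices of each class are shuffled, then dealt round-robin across the
    k folds, so every test fold keeps roughly the overall class proportions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in indices:
            folds[pos % k].append(i)
            pos += 1
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

# Made-up importance scores from three hypothetical selectors.
tables = [
    {"IN_BYTES": 0.9, "L4_DST_PORT": 0.7, "FLOW_DURATION_MILLISECONDS": 0.2, "TCP_FLAGS": 0.5},
    {"IN_BYTES": 0.6, "L4_DST_PORT": 0.8, "FLOW_DURATION_MILLISECONDS": 0.1, "TCP_FLAGS": 0.4},
    {"IN_BYTES": 0.3, "L4_DST_PORT": 0.9, "FLOW_DURATION_MILLISECONDS": 0.2, "TCP_FLAGS": 0.7},
]
print(aggregate_rankings(tables, top_k=2))  # ['L4_DST_PORT', 'IN_BYTES']

y = ["benign"] * 6 + ["apt"] * 3
for train, test in stratified_kfold(y, k=3):
    print(len(train), [y[i] for i in test].count("apt"))  # prints "6 1" for each fold
```

In a full pipeline, feature selection and any resampling would be fitted on each fold's training indices only, so the held-out fold stays unseen.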
The results show that ensemble learning models outperformed the other approaches in detecting APTs. For multi-class classification, XGBoost achieved the highest accuracy (93.23%), followed by Gradient Boosting (92.75%) and Random Forest (92.50%). In binary classification, XGBoost and Random Forest achieved accuracies of 96.40% and 96.37%, respectively, with CatBoost close behind at 96.13%. These findings highlight the effectiveness of tree-based ensemble methods in capturing complex attack patterns and reducing misclassification, making them well suited to modern IDS. This study contributes a comprehensive evaluation of ML models for APT detection, refined feature selection techniques, and insights into the strengths and limitations of various approaches for real-world IDS deployment.
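The results above are reported as accuracy. Since Q3 concerns class imbalance, a small worked example shows how accuracy alone can hide poor detection of a rare class. The sketch below compares accuracy with macro-averaged recall (the mean of per-class recall) for a degenerate model that labels every flow as benign; the helper names and labels are hypothetical and do not reflect the thesis's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_recall(y_true, y_pred):
    """Average of per-class recall, so every class counts equally
    regardless of how many samples it has."""
    recalls = []
    for c in sorted(set(y_true)):
        support = sum(t == c for t in y_true)
        hits = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)

# 95 benign flows and 5 APT flows (made-up labels).
y_true = ["benign"] * 95 + ["apt"] * 5
# A degenerate model that labels every flow as benign.
y_pred = ["benign"] * 100
print(accuracy(y_true, y_pred))      # 0.95
print(macro_recall(y_true, y_pred))  # 0.5
```

The model scores 95% accuracy while detecting none of the APT flows; macro recall exposes this because the APT class's recall of 0 carries the same weight as the benign class's recall of 1.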
Copyright holder
Author
Copyright notice
All rights reserved
