A comparative analysis of machine learning algorithms in network-based intrusion detection systems for detecting advanced persistent threats to enhance cybersecurity

Authors

Darnal, Kiran

Degree

Master of Applied Technologies

Grantor

Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology

Date

2024

Supervisors

Sarrafpour, Bahman
Sabaee, Maryam Erfanian

Type

Masters Thesis

Keyword

advanced persistent threat (APT)
intrusion detection systems (IDS)
network based intrusion detection systems (NIDS)
cybersecurity
algorithms
machine learning

Citation

Darnal, K. (2024). A comparative analysis of machine learning algorithms in network-based intrusion detection systems for detecting advanced persistent threats to enhance cybersecurity (Unpublished document submitted in partial fulfilment of the requirements for the degree of Master of Applied Technologies (Computing)). Unitec, Te Pūkenga - New Zealand Institute of Skills and Technology. https://hdl.handle.net/10652/6844

Abstract

RESEARCH QUESTIONS
Q1: Which ML algorithms most effectively detect APT traffic within NIDS?
Q2: What network behaviour features are most significant for identifying APTs using ML models?
Q3: How does data imbalance impact the detection accuracy of ML algorithms, and how do resampling techniques impact their performance?

ABSTRACT
Advanced Persistent Threats (APTs) pose a sophisticated and evolving cyber risk, persistently targeting organisations to extract high-value information. Traditional Intrusion Detection Systems (IDS), which rely on signature-based and heuristic methods, often fail to detect APTs because of the threats' stealthy nature and the systems' own high false positive rates and inability to generalise across diverse attack patterns. Additionally, prior research on ML-based IDS has been limited by data imbalance, ineffective feature selection strategies, and inconsistent model performance across different datasets.

This research addresses these gaps by performing a comparative evaluation of classical ML and deep learning algorithms for APT detection using the NF-UQ-NIDS-v2 dataset. This large and diverse NetFlow-based dataset combines four datasets (NF-UNSW-NB15-v2, NF-BoT-IoT-v2, NF-ToN-IoT-v2, and NF-CSE-CIC-IDS2018-v2) and comprises 43 features and over 75 million records spanning 21 attack types. A robust ensemble-based feature selection approach was implemented to identify the most relevant network traffic features, optimising model performance. Nine classical ML models (including CatBoost, Decision Tree, K-Nearest Neighbors, Gradient Boosting, Random Forest, SVM, and XGBoost) and two deep learning models (1D-CNN and LSTM) were trained and evaluated using a K-fold cross-validation strategy to improve the reliability of the results.

The results demonstrate that ensemble learning models outperformed other approaches in detecting APTs. For multi-class classification, XGBoost achieved the highest accuracy (93.23%), followed by Gradient Boosting (92.75%) and Random Forest (92.50%). In binary classification, XGBoost and Random Forest achieved accuracies of 96.40% and 96.37%, respectively, while CatBoost closely followed at 96.13%. These findings highlight the effectiveness of tree-based ensemble methods in capturing complex attack patterns and reducing misclassification rates, making them well suited to modern IDS. This study contributes to the field by presenting a comprehensive evaluation of ML models for APT detection, refining feature selection techniques, and providing insights into the strengths and limitations of various approaches for real-world IDS deployment.
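
As a rough illustration of why Q3 matters when accuracy is the headline metric, the sketch below runs a K-fold evaluation on imbalanced labels. It is not the thesis's pipeline: the toy label counts (95 benign, 5 attack), the majority-class baseline, the use of stratified folds, and the helper names `stratified_kfold`, `accuracy`, and `recall` are all assumptions made for this example.

```python
from collections import defaultdict
import random


def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds keep roughly the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's samples round-robin across the folds.
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual = sum(t == positive for t in y_true)
    return tp / actual if actual else 0.0


# Hypothetical toy labels: heavily imbalanced, as NIDS traffic often is.
labels = ["benign"] * 95 + ["attack"] * 5

fold_scores = []
for train, test in stratified_kfold(labels, k=5):
    train_labels = [labels[i] for i in train]
    # Baseline "model": always predict the most common training label.
    majority = max(set(train_labels), key=train_labels.count)
    y_true = [labels[i] for i in test]
    y_pred = [majority] * len(test)
    fold_scores.append((accuracy(y_true, y_pred), recall(y_true, y_pred, "attack")))

for acc, rec in fold_scores:
    print(f"accuracy={acc:.2f} attack_recall={rec:.2f}")
```

On these toy labels the baseline scores 95% accuracy in every fold while detecting no attacks at all, which is why per-class recall is worth reading alongside accuracy when classes are imbalanced, and why resampling is a natural thing to evaluate.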

Copyright holder

Author

Copyright notice

All rights reserved
