Comparative research on code vulnerability detection: Open-source vs. proprietary large language models and LSTM neural network

Authors

Don, Ravihansa Geekiyanage Geekiyanage

Degree

Master of Applied Technologies (Computing)

Grantor

Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology

Date

2024

Supervisors

Ardekani, Iman
Bell, Jamie

Type

Masters Thesis

Keyword

software development
vulnerability assessment
risk management framework
software security
computer security
Long Short-Term Memory (LSTM)
open source
neural networks
large language models

Citation

Don, R.G.G. (2024). Comparative research on code vulnerability detection: Open-source vs. proprietary large language models and LSTM neural network (Unpublished document submitted in partial fulfilment of the requirements for the degree of Master of Applied Technologies (Computing)). Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology. https://hdl.handle.net/10652/6749

Abstract

The reliance of industries such as banking, e-commerce, logistics, transportation, energy, and healthcare on computer systems has escalated the threat of cyberattacks. With the rapid growth of the online ecosystem and of sensitive data, cybercriminals have become increasingly capable of attacking software systems, and safeguarding software remains an urgent yet challenging task despite advances in technology. This thesis contributes to incorporating security into the Software Development Lifecycle (SDLC), focusing on static code analysis as a proactive approach to discovering and mitigating security flaws during the development cycle. The research aims to enhance vulnerability detection in source code through advanced machine learning techniques, including open-source fine-tuned and proprietary large language models. It compares models including CodeGen2, LLaMA 2, and OpenAI's GPT, evaluating quantitatively and qualitatively how suitable they are for detecting vulnerabilities, and how accurate and efficient they are. Using a zero-shot classification-based approach, the study examines these models' capabilities in detecting security risks and compares them to a Word2Vec+LSTM neural network. The findings reveal that CodeGen2 emerges as the most reliable model for vulnerability detection, achieving near-perfect precision and balanced recall, leading to superior F1 and AUC scores. LLaMA2-7b delivers reasonable performance, particularly in precision, but falls short in recall. Conversely, the GPT-4 Assistant excels in recall but suffers from a high false positive rate, limiting its effectiveness. Classical neural networks, such as Word2Vec+LSTM, demonstrate moderate capability but lag behind modern LLMs in precision and recall. Through this comparative analysis, the thesis underscores the importance of selecting tools aligned with organizational needs and constraints.
The insights gained contribute to developing secure software by integrating machine learning into the SDLC, inspiring further progress in vulnerability detection and mitigating risks in an increasingly digital world.
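The zero-shot classification approach described in the abstract can be sketched roughly as follows. Here `query_model` is a stand-in for any of the compared LLM back ends (CodeGen2, LLaMA 2, GPT); the prompt wording and the label parsing are illustrative assumptions, not the thesis's actual pipeline.

```python
# Minimal sketch of zero-shot vulnerability classification with an LLM.
# The model is asked, without task-specific training, to label a code
# snippet as vulnerable or safe; its free-text reply is mapped to a
# binary label that can be scored with precision/recall/F1/AUC.

PROMPT_TEMPLATE = (
    "You are a security auditor. Answer with exactly 'VULNERABLE' or 'SAFE'.\n"
    "Is the following code vulnerable?\n\n{code}\n"
)

def build_prompt(code: str) -> str:
    """Embed the code snippet in the zero-shot instruction prompt."""
    return PROMPT_TEMPLATE.format(code=code)

def parse_label(response: str) -> bool:
    """Map the model's free-text reply to a binary vulnerable/safe label."""
    return "VULNERABLE" in response.upper()

def classify(code: str, query_model) -> bool:
    """Run one zero-shot classification; `query_model` is any prompt->text callable."""
    return parse_label(query_model(build_prompt(code)))

# Usage with a stubbed model in place of a real LLM endpoint:
stub = lambda prompt: "VULNERABLE" if "strcpy" in prompt else "SAFE"
print(classify("strcpy(dst, src);", stub))  # True
```

With the stub swapped for a real model call, running this over a labelled dataset yields the per-model predictions from which the precision, recall, F1, and AUC comparisons in the abstract can be computed.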

Copyright holder

Author

Copyright notice

All rights reserved
