Optimising YOLO and ByteTrack for robust vehicle counting and classification in adverse weather: A computer-vision-based traffic monitoring study using New Zealand data
Supplementary material
Authors
Biswas, Matthew
Degree
Master of Applied Technologies (Computing)
Grantor
Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology
Date
2025
Supervisors
Keivanmarz, Ali
Sharifzadeh, Hamid
Type
Masters Thesis
Keyword
New Zealand
vehicle detection
traffic flows
computer vision
real-time data processing
machine learning
Citation
Biswas, M. (2025). Optimising YOLO and ByteTrack for robust vehicle counting and classification in adverse weather: A computer-vision-based traffic monitoring study using New Zealand data (Unpublished document submitted in partial fulfilment of the requirements for the degree of Master of Applied Technologies (Computing)). Unitec, Te Pūkenga - New Zealand Institute of Skills and Technology
https://hdl.handle.net/10652/6800
Abstract
RESEARCH QUESTIONS
• How can computer vision and machine learning techniques be used to develop a robust system for collecting traffic data?
• How will such a system perform in a real-time scenario?
• How well will this method perform in rain and fog, where visibility is low, and in heavily congested traffic?
• Can we capture a New Zealand dataset representing all weather and traffic conditions?
• How challenging will it be to prepare a ground truth from the newly captured video data?
ABSTRACT
Traffic surveillance is critical in modern transportation management systems. It facilitates efficient traffic flow, ensures road safety, and enables informed decision-making. A significant part of traffic surveillance is collecting accurate data on traffic volume and vehicle class, which supports confident decision-making for future planning.
Conventional traffic volume and class data collection methods, such as loop sensors and pneumatic tubes, have been in widespread use for a long time. However, they are less reliable in congested traffic because they mostly rely on vehicle length to determine vehicle class. They are also expensive to install and maintain, and they disrupt traffic in the process. On the other hand, computer vision and deep learning-based approaches have gained significant popularity in recent years and have already been applied successfully in various traffic-related operations. These methods can collect in-depth traffic data from multiple lanes and directions with just one camera, reducing costs significantly while improving data quality.
Computer-vision-based traffic monitoring systems are generally good at collecting traffic volume and class data. However, there is room for improvement in extreme weather conditions such as heavy rain, in heavily congested stop-and-go traffic, and in classifying complex vehicles such as long freight trucks and vehicles with trailers. A computer-vision-based traffic monitoring system capable of surpassing traditional methods in accuracy and reliability, particularly for measuring traffic volume and class data under challenging weather and traffic conditions, remains an area of ongoing research and development.
This study thoroughly investigates different computer vision models for traffic counting and vehicle classification. It proposes a new method to improve accuracy and detection speed in various adverse weather, lighting, and traffic congestion conditions, aimed at a reliable live monitoring system. This thesis also collects video footage of unique traffic and weather conditions from the New Zealand State Highway to evaluate the proposed method. A total of thirty-five videos, covering over one hour of footage, include 8 unique traffic and weather conditions and 7,343 vehicles. With these video data, different sizes or sub-versions of You Only Look Once (YOLO, versions 8, 9, 10, and 11) have been thoroughly evaluated against Detection Transformer (DETR) and Faster Region-based Convolutional Neural Network (Faster-RCNN). A YOLOv10l model and a ByteTrack model have been optimised to perform best on both accuracy and detection speed, and the final method is proposed with the optimised versions of YOLOv10l and ByteTrack. Out of the box, YOLO- and OpenCV-based models gave very inconsistent results, with accuracy fluctuating from a 22.19% undercount to a 258.97% overcount of the real vehicle count. In this study, multiple weaknesses in the object-detection model and tracker model were identified. The proposed method resolves these issues using an optimised YOLOv10l and an optimised ByteTrack model. Overall, it achieved 98.01% accuracy on the total vehicle count, 97.32% on vehicle classification, and 97.67% on accurate lane detection at 17.84 frames per second, which is a 32.13% improvement in counting accuracy and 12.64 frames per second faster than the out-of-the-box YOLO- and OpenCV-based method. Compared to other versions of YOLO, the proposed method outperforms versions 8, 9, and 11 by 1.29%, 1.64%, and 0.63% respectively in overall accuracy while maintaining a negligible difference in processing time.
Against DETR and Faster-RCNN, the proposed method was significantly faster in processing time and significantly more accurate than Faster-RCNN: it achieved 1.09% higher accuracy with 17.16 FPS faster speed over DETR, and 24.31% higher accuracy with 17.36 FPS faster speed over Faster-RCNN.
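The counting stage outlined above (a detector plus a ByteTrack-style tracker assigning persistent IDs, with each vehicle counted once as it crosses a virtual line) can be sketched roughly as follows. This is an illustrative assumption, not the thesis's actual implementation: the tuple layout, line position, and function name are all hypothetical.

```python
# Hypothetical sketch of a line-crossing vehicle counter operating on
# tracker output. Each tracked detection is assumed to be a tuple
# (frame, track_id, vehicle_class, cx, cy), where (cx, cy) is the
# bounding-box centre in pixels. These names are illustrative only.

COUNT_LINE_Y = 300  # virtual counting line (pixel row); illustrative value

def count_vehicles(tracks):
    """Count each track once, per class, when its centre crosses the line."""
    last_y = {}      # track_id -> previous centre y
    counted = set()  # track_ids already counted
    counts = {}      # vehicle_class -> count
    for frame, tid, vclass, cx, cy in sorted(tracks):
        prev = last_y.get(tid)
        # Count only a downward crossing, and only once per track ID,
        # so a vehicle lingering near the line is not double-counted.
        if (prev is not None and tid not in counted
                and prev < COUNT_LINE_Y <= cy):
            counted.add(tid)
            counts[vclass] = counts.get(vclass, 0) + 1
        last_y[tid] = cy
    return counts

# Example: two tracks cross the line, one appears only below it.
tracks = [
    (0, 1, "car",   100, 250),
    (1, 1, "car",   100, 310),  # track 1 crosses -> counted as car
    (0, 2, "truck", 200, 290),
    (1, 2, "truck", 200, 295),
    (2, 2, "truck", 200, 305),  # track 2 crosses -> counted as truck
    (0, 3, "car",    50, 320),  # first seen below the line, never counted
]
print(count_vehicles(tracks))  # -> {'car': 1, 'truck': 1}
```

Tying counting to a persistent track ID rather than per-frame detections is what makes the tracker essential: without stable IDs, a vehicle detected across many frames near the line would inflate the count, which is one plausible source of the large overcounts reported for the out-of-the-box models.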
Copyright holder
Author
Copyright notice
All rights reserved
