Image-text-image transmission for accident scene communication: A generative AI approach
Supplementary material
Authors
Tang, Jing
Author ORCID Profiles
Degree
Master of Applied Technologies (Computing)
Grantor
Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology
Date
2025
Supervisors
Liu, William
Song, Lei
Type
Masters Thesis
Keyword
New Zealand
traffic accidents
traffic safety
real-time data processing
vehicle detection
computer vision
neural networks
Citation
Tang, J. (2025). Image-text-image transmission for accident scene communication: A generative AI approach (Unpublished document submitted in partial fulfilment of the requirements for the degree of Master of Applied Technologies (Computing)). Unitec, Te Pūkenga - New Zealand Institute of Skills and Technology.
https://hdl.handle.net/10652/6806
Abstract
RESEARCH QUESTIONS
1. Impact of effective question design on model output accuracy
⚫ How do we design multi-angle questions (e.g., from the perspectives of police, insurance, and news) to ensure that language models can provide accurate and detailed answers about accident scenes?
⚫ How do different types of question design (e.g., concise vs. complex) affect the quality of generated accident scene descriptions?
2. Selection and deployment of image-to-text and text-to-image models
⚫ In real-time accident communication systems, how do we select and evaluate image-to-text and text-to-image models based on criteria such as accuracy, speed, and resource usage?
⚫ How can we improve the real-time accuracy of accident scene analysis by deploying lightweight generative models in edge devices and vehicle cloud systems?
3. Comparison of LLaVA and PixArt-Σ models in accident scene recovery
⚫ How do the text descriptions generated by LLaVA compare with the image recovery effects of the PixArt-Σ model in terms of accuracy, detail richness, and practicality, especially in accident scene understanding?
⚫ How does the quality of the descriptions generated by LLaVA affect the clarity of image recovery and the decision-making effect of drivers in high-speed scenarios?
4. The impact of multi-angle information fusion on model output
⚫ How can we improve the accuracy and richness of the generated accident descriptions by combining multi-angle data (such as news, police reports, and insurance perspectives)?
⚫ How can we optimize the description generation ability of the model in multi-angle information fusion of accident scenes to ensure that the output meets the needs of different users?
5. Impact of network conditions on transmission delay of accident images and text descriptions
⚫ How do different network conditions (e.g., bandwidth, latency) affect the transmission delay of accident images and text descriptions in a high-speed driving environment?
ABSTRACT
The main purpose of this study was to address the challenge of reducing information transmission delay in accident scenarios during high-speed driving. To achieve this, the study proposes a method for converting accident images into text descriptions for transmission and then re-generating images from those descriptions. This image-text-image approach optimizes bandwidth usage and reduces emergency response times. By comparing the transmission speeds of text and image data uploaded to a server under simulated, comparable network conditions, the study shows that text descriptions significantly outperform images in terms of speed and resource efficiency. In addition, the study combines different perspectives (news, insurance reports, and police accident analysis) to enrich the model's understanding of the accident scene, and designs and compares constraints on image generation from these perspectives to help produce an accurate and comprehensive description of the restored scene. The experiments mainly used generative AI models, such as LLaVA and PixArt-Σ, to test the feasibility and quality of restoring information from text to images. The results show that, despite some limitations such as model constraints and input truncation, the images generated from short questions and descriptions are visually very similar to the originals. The proposed method is feasible for improving accident response and communication in bandwidth-constrained environments, highlighting the potential of generative AI in enhancing road safety systems.
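The bandwidth argument in the abstract can be illustrated with a simple delay model. The sketch below is not from the thesis; all payload sizes, bandwidth, and latency values are assumed example figures, and the model (one-way delay = propagation latency + serialization time) is a deliberate simplification that ignores protocol overhead and retransmissions.

```python
# Illustrative sketch (assumed values, not measurements from the thesis):
# estimating one-way upload delay for an accident image versus its text
# description over the same constrained vehicular link.

def upload_delay_s(payload_bytes: int, bandwidth_bps: float, latency_s: float) -> float:
    """One-way delay: propagation latency plus serialization time (size / rate)."""
    return latency_s + (payload_bytes * 8) / bandwidth_bps

# Assumed example payloads: a 2 MB dashcam image vs. a 1 kB text description.
image_bytes = 2_000_000
text_bytes = 1_000

# Assumed degraded link: 1 Mbit/s bandwidth, 50 ms latency.
bandwidth = 1_000_000
latency = 0.050

image_delay = upload_delay_s(image_bytes, bandwidth, latency)  # 16.05 s
text_delay = upload_delay_s(text_bytes, bandwidth, latency)    # 0.058 s

print(f"image: {image_delay:.2f} s, text: {text_delay:.3f} s")
```

Under these assumed conditions the text description uploads orders of magnitude faster than the image, which is the intuition behind transmitting text and regenerating the image at the receiver.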
Copyright holder
Author
Copyright notice
All rights reserved
