Grape & apple disease detection using CNN-ViT hybrid model with RF classifier
Authors
Bedi, Japneet Singh
Degree
Master of Applied Technologies (Computing)
Grantor
Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology
Date
2025
Supervisors
Varestehpour, Soheil
Shakiba, Masoud
Type
Masters Thesis
Keyword
grape vines
apple trees
leaf spots
plant disease
modelling
neural networks
image processing
pattern recognition systems in agriculture
Citation
Bedi, J.S. (2025). Grape & apple disease detection using CNN-ViT hybrid model with RF classifier (Unpublished document submitted in partial fulfilment of the requirements for the degree of Master of Applied Technologies (Computing)). Unitec, Te Pūkenga - New Zealand Institute of Skills and Technology
https://hdl.handle.net/10652/6947
Abstract
RESEARCH QUESTIONS
1 How much does the hybrid CNN–ViT model improve classification accuracy for grape and apple leaf diseases compared to standalone CNN and ViT models?
2 What is the impact of using a Random Forest classifier on generalization and overfitting mitigation in disease classification tasks?
3 How do CNN, ViT and CNN–ViT–RF hybrid models compare in terms of accuracy and generalization on grape and apple disease classification?
ABSTRACT
As agriculture increasingly adopts intelligent and automated systems, the early detection of crop diseases has become critical to maintaining yield and quality. In New Zealand, where apple and grape cultivation is central to both local and international horticulture, crops are highly vulnerable to diseases such as Black Rot, ESCA, and Leaf Blight. Manual inspection methods are time-consuming and error-prone, especially over large farm areas. To address this challenge, this study proposes a hybrid deep learning framework that integrates Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and a Random Forest (RF) classifier for multi-class plant disease detection using RGB leaf images.
CNNs are leveraged for extracting fine-grained local features, such as texture and lesions, while ViTs model global dependencies using attention-based mechanisms. The complementary feature sets are fused and passed through a Random Forest classifier, enhancing model generalization across disease classes. The proposed CNN–ViT–RF model was evaluated across ten curated datasets comprising grape and apple leaf images with multiple disease categories, class imbalance, and varying quality conditions. The training pipeline incorporated PCA-based clustering, CycleGAN-based augmentation, and 5-fold cross-validation to ensure robustness and generalizability.
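The fusion and classification step described above can be sketched as follows. This is a minimal illustration and not the thesis implementation: it assumes the CNN and ViT features are fused by simple concatenation, assumes stratified folds for the 5-fold cross-validation, and uses synthetic random vectors in place of real extracted features. The function names `fuse_features` and `evaluate_rf`, the feature dimensions, and the Random Forest settings are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def fuse_features(cnn_feats, vit_feats):
    """Concatenate per-image CNN and ViT feature vectors (assumed fusion rule)."""
    if cnn_feats.shape[0] != vit_feats.shape[0]:
        raise ValueError("both extractors must describe the same images")
    return np.concatenate([cnn_feats, vit_feats], axis=1)


def evaluate_rf(features, labels, n_splits=5, seed=0):
    """Return per-fold accuracy of a Random Forest under stratified k-fold CV."""
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(clf, features, labels, cv=cv, scoring="accuracy")


# Synthetic stand-ins for extracted features (hypothetical sizes: 64 and 32).
rng = np.random.default_rng(0)
n_images, n_classes = 120, 4
labels = np.repeat(np.arange(n_classes), n_images // n_classes)
# Shift each class slightly so the toy problem is learnable.
cnn_feats = rng.normal(size=(n_images, 64)) + 0.5 * labels[:, None]
vit_feats = rng.normal(size=(n_images, 32)) + 0.5 * labels[:, None]

fused = fuse_features(cnn_feats, vit_feats)
scores = evaluate_rf(fused, labels)
print("fused shape:", fused.shape)
print("mean 5-fold accuracy:", scores.mean())
```

In a real pipeline the two feature matrices would come from the penultimate layers of trained CNN and ViT backbones; the accuracy printed here reflects only the synthetic data and says nothing about the reported results.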
Experimental results show the hybrid model significantly outperforms individual CNN, ViT, and CNN+RF baselines, achieving up to 98% classification accuracy and macro-averaged F1-scores consistently above 0.95. The model also demonstrated superior AUC-ROC and class separation in confusion matrix analyses. Robustness tests under conditions of noise, brightness variation, and rotation confirmed the system’s resilience in real-world settings. Inference speed and memory profiling on the NVIDIA Jetson Nano further indicate the model’s suitability for edge deployment in smart farming applications.
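For readers unfamiliar with the macro-averaged F1-score cited above, the following short sketch shows how it is computed: the F1-score is calculated separately for each class and the per-class values are averaged without weighting, so minority disease classes count as much as majority ones. The function name `macro_f1` and the toy labels are hypothetical and are not taken from the thesis data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all classes seen."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example with hypothetical labels: one black_rot leaf is misread as esca.
y_true = ["black_rot", "black_rot", "esca", "esca", "healthy", "healthy"]
y_pred = ["black_rot", "esca", "esca", "esca", "healthy", "healthy"]
print(round(macro_f1(y_true, y_pred), 4))
```

Because the average is unweighted, a single poorly classified rare class lowers the macro score noticeably, which is why it is a common choice for imbalanced datasets.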
Overall, the fusion of local and global feature extractors with an ensemble classifier presents a scalable and effective solution for precision agriculture and automated plant health monitoring.
Copyright holder
Author
Copyright notice
All rights reserved
