Grape & apple disease detection using CNN-ViT hybrid model with RF classifier

Authors

Bedi, Japneet Singh

Degree

Master of Applied Technologies (Computing)

Grantor

Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology

Date

2025

Supervisors

Varestehpour, Soheil
Shakiba, Masoud

Type

Masters Thesis

Keyword

grape vines
apple trees
leaf spots
plant disease
modelling
neural networks
image processing
pattern recognition systems in agriculture

Citation

Bedi, J.S. (2025). Grape & apple disease detection using CNN-ViT hybrid model with RF classifier (Unpublished document submitted in partial fulfilment of the requirements for the degree of Master of Applied Technologies (Computing)). Unitec, Te Pūkenga – New Zealand Institute of Skills and Technology. https://hdl.handle.net/10652/6947

Abstract

RESEARCH QUESTIONS

1. How much does the hybrid CNN–ViT model improve classification accuracy for grape and apple leaf diseases compared to standalone CNN and ViT models?
2. What is the impact of using a Random Forest classifier on generalization and overfitting mitigation in disease classification tasks?
3. How do CNN, ViT and CNN-ViT-RF hybrid models compare in terms of accuracy and generalization on grape and apple disease classification?

ABSTRACT

As agriculture increasingly adopts intelligent and automated systems, the early detection of crop diseases has become critical to maintaining yield and quality. In New Zealand, where apple and grape cultivation is central to both local and international horticulture, crops are highly vulnerable to diseases such as Black Rot, ESCA, and Leaf Blight. Manual inspection methods are time-consuming and error-prone, especially over large farm areas. To address this challenge, this study proposes a hybrid deep learning framework that integrates Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), and a Random Forest (RF) classifier for multi-class plant disease detection using RGB leaf images. CNNs are leveraged for extracting fine-grained local features, such as texture and lesions, while ViTs model global dependencies using attention-based mechanisms. The complementary feature sets are fused and passed through a Random Forest classifier, enhancing model generalization across disease classes. The proposed CNN–ViT–RF model was evaluated across ten curated datasets comprising grape and apple leaf images with multiple disease categories, class imbalance, and varying quality conditions. The training pipeline incorporated PCA-based clustering, CycleGAN-based augmentation, and 5-fold cross-validation to ensure robustness and generalizability.
Experimental results show the hybrid model significantly outperforms individual CNN, ViT, and CNN+RF baselines, achieving up to 98% classification accuracy and macro-averaged F1-scores consistently above 0.95. The model also demonstrated superior AUC-ROC and class separation in confusion matrix analyses. Robustness tests under conditions of noise, brightness variation, and rotation confirmed the system’s resilience in real-world settings. Inference speed and memory profiling on the NVIDIA Jetson Nano further indicate the model’s suitability for edge deployment in smart farming applications. Overall, the fusion of local and global feature extractors with an ensemble classifier presents a scalable and effective solution for precision agriculture and automated plant health monitoring.
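
The fuse-then-classify step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes the CNN and ViT embeddings have already been extracted and that fusion means simple concatenation, which the abstract does not specify. The feature dimensions, class count, and synthetic data are hypothetical placeholders, so the scores it prints say nothing about the reported results. It uses NumPy and scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Hypothetical sizes: 300 leaf images, 3 disease classes.
n_samples, n_classes = 300, 3
labels = rng.integers(0, n_classes, size=n_samples)

# Synthetic stand-ins for pre-extracted embeddings (assumed dimensions).
# A small class-dependent shift makes the toy problem learnable.
cnn_features = rng.normal(size=(n_samples, 32)) + 0.5 * labels[:, None]
vit_features = rng.normal(size=(n_samples, 16)) + 0.5 * labels[:, None]

# Assumed fusion strategy: concatenate local (CNN) and global (ViT) features.
fused = np.concatenate([cnn_features, vit_features], axis=1)

# Random Forest on the fused features, scored with stratified 5-fold
# cross-validation and macro-averaged F1.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, fused, labels, cv=cv, scoring="f1_macro")

print("macro-F1 per fold:", np.round(scores, 3))
print("mean macro-F1:", round(float(scores.mean()), 3))
```

Macro-averaging weights every class equally, which suits the class imbalance mentioned in the abstract. Stratified folds keep the class proportions similar across the five splits.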

Copyright holder

Author

Copyright notice

All rights reserved

Copyright license
