In a paper published in the journal Symmetry, researchers introduced the critical information mining network (CIMNet) to address the problem of noise in crop disease image recognition. CIMNet outperformed traditional models in both accuracy and applicability, recognizing crop diseases effectively in the noisy environments typical of real farmland.
The network featured a non-local attention module for capturing global image context and a multi-scale critical information fusion module (MSCM) for integrating key shallow and deep features, enhancing noise resilience. Experiments on two public datasets demonstrated CIMNet's superior performance in noisy and stable conditions, supporting improved crop production efficiency.
Related Work
Past work in plant disease recognition has used deep learning methods, focusing on convolutional neural networks (CNNs) and Vision Transformers. CNNs like AlexNet and MobileNet extract features through layers of convolutions and pooling, achieving high accuracy. Vision Transformers use self-attention mechanisms to identify small, high-contrast disease areas, with models like self-attention convolutional neural networks (SACNN) showing improved accuracy. Advanced techniques, including transfer learning, have further enhanced detection, with models like EfficientNet achieving up to 98.9% accuracy on apple diseases, highlighting the effectiveness of these approaches.
Dataset Overview and Preprocessing
This study used two datasets, one for potato leaf disease and one for tomato leaf disease, to test the model's performance. Table 1 describes the distribution of the datasets. The potato leaf disease dataset includes 1628 images of early blight, 1414 of late blight, and 1020 of healthy leaves. The tomato dataset, a subset of Plant Village, consists of 391 images of bacterial spot, 392 of early blight, 423 of late blight, 403 of leaf mold, 377 of septoria leaf spot, 415 of spider mites, 413 of target spot, 393 of yellow leaf curl virus, 393 of mosaic virus, and 421 of healthy leaves.
The potato leaf disease dataset is labeled with three classes (early blight, late blight, and healthy), and each image is 256 × 256 × 3 pixels. It contains 3241 training images, 416 validation images, and 405 test images. The tomato dataset is labeled with 10 classes (nine diseases plus healthy), again at a resolution of 256 × 256 × 3 pixels. It is divided into 3217 training images, 402 validation images, and 402 test images, a ratio of 8:1:1.
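The reported counts can be checked with a little arithmetic. The sketch below is only a sanity check on the numbers quoted above, not the authors' splitting code; the helper name `split_counts` and the rule of giving the rounding remainder to the training set are assumptions made for illustration.

```python
def split_counts(total, ratios=(8, 1, 1)):
    """Split `total` items by integer ratios, giving any remainder to the first part."""
    denom = sum(ratios)
    counts = [total * r // denom for r in ratios]
    counts[0] += total - sum(counts)  # rounding leftovers go to the training set
    return tuple(counts)


# Per-class image counts as quoted in the text.
tomato = {
    "bacterial spot": 391, "early blight": 392, "late blight": 423,
    "leaf mold": 403, "septoria leaf spot": 377, "spider mites": 415,
    "target spot": 413, "yellow leaf curl virus": 393,
    "mosaic virus": 393, "healthy": 421,
}
potato = {"early blight": 1628, "late blight": 1414, "healthy": 1020}

tomato_total = sum(tomato.values())
potato_total = sum(potato.values())

print(tomato_total, split_counts(tomato_total))  # 4021 (3217, 402, 402)
print(potato_total, 3241 + 416 + 405)            # 4062 4062
```

The tomato figures reproduce an exact 8:1:1 split. The potato figures add up to the dataset total but are only approximately 8:1:1, which suggests a slightly different splitting procedure that this summary does not describe.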
The team resized the images using bilinear interpolation to ensure compatibility with various network models. The original size of 256 × 256 × 3 pixels was reduced to 224 × 224 × 3 pixels, keeping the three color channels (red, green, and blue). To compensate for the limited number of training samples in agricultural datasets, the team applied random horizontal and vertical flipping, each with a probability of 0.5. During training, they saved the five models with the lowest validation losses and selected the best of these for testing.
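The preprocessing described above can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the authors' pipeline: the function names `resize_bilinear` and `random_flip` are hypothetical, and the half-pixel-center sampling convention is an assumption, since the summary does not say which library or convention was used.

```python
import numpy as np


def resize_bilinear(img, out_h, out_w):
    """Bilinear resize of an (H, W, C) array, sampling at half-pixel centers."""
    in_h, in_w = img.shape[:2]
    # Map each output pixel center back into input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :, None]  # horizontal interpolation weights
    img = img.astype(np.float64)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def random_flip(img, rng, p=0.5):
    """Flip horizontally and vertically, each independently with probability p."""
    if rng.random() < p:
        img = img[:, ::-1]  # horizontal flip (mirror left-right)
    if rng.random() < p:
        img = img[::-1, :]  # vertical flip (mirror top-bottom)
    return img


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)  # stand-in leaf image
out = random_flip(resize_bilinear(image, 224, 224), rng)
print(out.shape)  # (224, 224, 3)
```

In practice the flips would be applied only to training images, so that validation and test images are evaluated unchanged.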
CIMNet, which combines the non-local attention module and the MSCM, was developed to strengthen ResNet's feature extraction capabilities. Non-local modules, inserted after ResNet's second and third layers, improved the capture of global features, while the MSCM, placed after the third layer, fused shallow and deep features to limit information loss.
The non-local modules strengthened feature representation by computing the similarity between every pair of positions in the feature map, and they improved robustness through dimensional alignment and residual connections. The MSCM, meanwhile, enriched the deep features by merging them with shallow ones, using a key information extraction module (KIB) built on a multi-head attention mechanism. The cross-entropy loss function, CrossEntropyLoss, was used for plant disease classification, adjusting the model parameters so that the predicted probabilities align with the true labels.
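To make the non-local idea concrete, here is a minimal NumPy sketch of a generic non-local block: pairwise similarity between positions, a softmax over those similarities, a projection back to the input channel count, and a residual connection. It is not CIMNet's implementation. The weight matrices stand in for learned 1 × 1 convolutions and are random placeholders, and the names `non_local_block`, `w_theta`, `w_phi`, `w_g`, and `w_out` are assumptions for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def non_local_block(feat, w_theta, w_phi, w_g, w_out):
    """Apply a non-local block to an (H, W, C) feature map.

    Each weight matrix plays the role of a 1x1 convolution.
    Returns the output map and the (N, N) attention matrix, N = H * W.
    """
    h, w, c = feat.shape
    x = feat.reshape(h * w, c)              # one row per spatial position
    theta = x @ w_theta                     # queries, shape (N, C_inner)
    phi = x @ w_phi                         # keys, shape (N, C_inner)
    g = x @ w_g                             # values, shape (N, C_inner)
    attn = softmax(theta @ phi.T, axis=-1)  # similarity of every position pair
    y = attn @ g                            # aggregate context from all positions
    z = y @ w_out + x                       # align channels back to C, add residual
    return z.reshape(h, w, c), attn


rng = np.random.default_rng(0)
c, c_inner = 16, 8
feat = rng.standard_normal((7, 7, c))
w_theta, w_phi, w_g = (rng.standard_normal((c, c_inner)) * 0.1 for _ in range(3))
w_out = np.zeros((c_inner, c))              # zero projection: block starts as identity

out, attn = non_local_block(feat, w_theta, w_phi, w_g, w_out)
print(out.shape, attn.shape)  # (7, 7, 16) (49, 49)
```

Because of the residual connection, a zero output projection makes the block an identity mapping, which is one reason such blocks can be inserted into an existing backbone such as ResNet without disturbing it at the start of training.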
Experimental Setup and Evaluation
The paper first describes the experimental environment, that is, the hardware and software configuration used for training and evaluation. It then describes the experimental setup in three phases: image preprocessing, model training, and performance evaluation. Particular attention is paid to the hyperparameters used during training, including the number of epochs, batch size, optimizer, scheduler, learning rate, and loss function. This setup is intended to support a rigorous evaluation of the proposed model.
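The loss function in this setup is the cross-entropy loss mentioned earlier. As a reminder of what it computes for a single image, the sketch below evaluates the negative log-probability of the true class under a softmax over the model's raw scores. The function name `cross_entropy` and the example scores are illustrative assumptions, not values from the paper.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of class `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Three-class example (e.g. early blight, late blight, healthy).
uniform = cross_entropy([0.0, 0.0, 0.0], target=1)    # equals ln(3)
confident = cross_entropy([0.0, 5.0, 0.0], target=1)  # true class scored highest
wrong = cross_entropy([5.0, 0.0, 0.0], target=1)      # wrong class scored highest

print(confident < uniform < wrong)  # True
```

Minimizing this quantity over the training set pushes the predicted probability of the correct class toward 1, which is the sense in which the loss aligns predictions with the true labels.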
The results compared CIMNet with a range of models, including CNN and Transformer architectures, at different noise levels and in stable (noise-free) environments. CIMNet outperformed the others, especially in noisy environments, showing that it can suppress noise interference and still extract the features needed for accurate disease recognition. In stable environments the improvement was more modest, but CIMNet still ranked above the other models. The detailed analyses indicated that CIMNet strikes a balance between model complexity and recognition accuracy, making it the best-performing of the compared models on these agricultural datasets.
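This summary does not say what kind of noise was used in the experiments. Purely as an illustration of how test images could be corrupted at several noise levels, the sketch below assumes additive zero-mean Gaussian noise on 8-bit pixel values; the function names and the sigma values are hypothetical and are not taken from the paper.

```python
import numpy as np


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma` to a uint8 image."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)  # keep valid pixel range


def mean_abs_change(original, noisy):
    """Average absolute per-pixel difference between two uint8 images."""
    return float(np.abs(noisy.astype(int) - original.astype(int)).mean())


rng = np.random.default_rng(0)
image = np.full((224, 224, 3), 128, dtype=np.uint8)  # stand-in for a leaf image

for sigma in (10, 25, 50):  # assumed noise levels, for illustration only
    noisy = add_gaussian_noise(image, sigma, rng)
    print(sigma, noisy.shape, round(mean_abs_change(image, noisy), 1))
```

A robustness comparison would then evaluate each trained model on the same noisy copies of the test set and report accuracy per noise level.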
Conclusion
To sum up, the paper addressed the problem of crop disease recognition in both stable and noisy environments. It proposed CIMNet, which incorporates a non-local attention module and an MSCM to make better use of global features and to capture detailed texture and semantic information.
Experimental results demonstrated CIMNet's superior performance, achieving up to 96.5% accuracy in noisy environments and 98.6% in stable conditions, surpassing comparative models. Future research avenues were discussed, including incorporating multi-source data and multimodal learning techniques for comprehensive disease identification and reducing model complexity for practical integration.