In a recent article published in the journal Plant Methods, researchers introduced the "rice field weed multi-scale feature enhanced detection transformer (RMS-DETR)" to detect weeds in rice fields using unmanned aerial vehicle (UAV) remote sensing imagery. This method aims to address challenges in identifying small, occluded, and densely distributed weed targets in complex environments.
Background
Weeds threaten rice production by competing for nutrients, water, and light, and by spreading diseases and pests. Effective control is essential for stable rice production. Traditional methods struggle with rice field conditions, such as soft, wet soil and narrow spaces. UAVs are popular for weed control due to their efficiency in aerial herbicide spraying. However, blanket aerial spraying is often indiscriminate, leading to excessive pesticide use and environmental harm.
Barnyard grass, a common and harmful weed in rice fields, is difficult to distinguish from rice due to its similar appearance and growth habits. It often appears as only a few dozen pixels in UAV images, making it prone to recognition errors due to lighting and occlusion. Therefore, a robust method is needed for detecting barnyard grass using UAV imagery.
About the Research
In this paper, the authors proposed RMS-DETR, a multi-scale feature-enhanced variant of the detection transformer (DETR). DETR is a transformer-based object detector that uses a self-attention mechanism to generate contextual representations of input sequences. While DETR offers end-to-end learning and strong high-level feature learning from large datasets, it struggles with detecting small and complex objects, such as barnyard grass in rice fields.
To overcome these limitations, the study introduced multi-scale feature layers into the DETR model and designed different feature extraction branches for various semantic feature layers. The high-level layer uses a transformer structure with a cascaded group attention (CGA) module to extract information between rice plants and barnyard grass. The CGA module provides different channel subsets of features as input for each attention head, allowing each head to learn unique features and reduce computational redundancy.
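The channel-splitting idea behind CGA can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the attention here uses identity query/key/value projections for brevity, the function names are hypothetical, and the cascade step (adding each head's output to the next head's input) is an assumption taken from how cascaded group attention is commonly described rather than from this article.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head attention with identity Q/K/V projections (a simplification)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out

def cascaded_group_attention(tokens, num_heads):
    """tokens: one channel vector per token; channels are split evenly across heads."""
    channels = len(tokens[0])
    assert channels % num_heads == 0
    width = channels // num_heads
    head_outputs = []
    previous = None
    for h in range(num_heads):
        # Each head only sees its own slice of the channels.
        chunk = [t[h * width:(h + 1) * width] for t in tokens]
        if previous is not None:
            # Assumed cascade: add the previous head's output to this head's input.
            chunk = [[a + b for a, b in zip(x, p)] for x, p in zip(chunk, previous)]
        previous = self_attention(chunk)
        head_outputs.append(previous)
    # Concatenate the head outputs back to the full channel width.
    return [sum((head[i] for head in head_outputs), []) for i in range(len(tokens))]

tokens = [[1.0, 0.0, 0.5, 0.2], [0.0, 1.0, 0.1, 0.9], [0.5, 0.5, 0.3, 0.3]]
fused = cascaded_group_attention(tokens, num_heads=2)
```

Because each head attends over only its own slice of the channels, the per-head cost shrinks with the number of heads, which is the redundancy reduction the paragraph above refers to.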
The low-level semantic feature layer uses a convolutional neural network (CNN) structure with dilated convolutions to extract features of barnyard grass. Dilated convolutions increase the receptive field and capture richer features of small targets. The features extracted by the transformer and CNN structures are then fused by a cross-scale feature fusion module, forming an information-rich feature space.
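To see why dilation enlarges the receptive field without adding weights, consider a one-dimensional toy version. This is a sketch for illustration only; the function names are hypothetical and the paper's layers operate on two-dimensional feature maps.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1D convolution with `dilation` steps between kernel taps."""
    span = (len(kernel) - 1) * dilation + 1  # input positions covered by one output
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]

def receptive_field(kernel_size, dilation):
    return (kernel_size - 1) * dilation + 1

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]
dense = dilated_conv1d(signal, kernel, dilation=1)    # taps at offsets 0, 1, 2
dilated = dilated_conv1d(signal, kernel, dilation=2)  # taps at offsets 0, 2, 4
```

With the same three weights, the dilated kernel covers five input positions instead of three, so each output draws on a wider neighbourhood around a small target.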
Additionally, to improve the model's inference speed, partial convolution (PConv) was used in place of conventional convolution operations. PConv applies the convolution only to the first or last consecutive subset of channels, treating that subset as representative of the entire feature map, which reduces memory access and computational complexity.
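The saving from PConv can be sketched in a few lines. The example below assumes that the channels outside the selected subset are passed through unchanged, as in the original PConv design, which the article does not state explicitly; `conv_fn` is a hypothetical stand-in for a real convolution, and the FLOPs ratio assumes the convolved subset has equal input and output channel counts.

```python
def partial_conv(feature_map, conv_fn, partial_channels):
    """feature_map: list of channels. Apply conv_fn to the first
    `partial_channels` channels only; keep the remaining channels as they are."""
    convolved = conv_fn(feature_map[:partial_channels])
    return convolved + feature_map[partial_channels:]

def flops_ratio(channels, partial_channels):
    # Convolution FLOPs scale with in_channels * out_channels, so convolving
    # only a subset of channels scales the cost by (subset / total) squared.
    return (partial_channels / channels) ** 2

def double_each(channels):
    # Hypothetical placeholder for a convolution: doubles every value.
    return [[2 * v for v in ch] for ch in channels]

feature_map = [[1, 2], [3, 4], [5, 6], [7, 8]]  # 4 channels, 2 values each
output = partial_conv(feature_map, double_each, partial_channels=1)
ratio = flops_ratio(channels=4, partial_channels=1)
```

Convolving a quarter of the channels costs one sixteenth of the FLOPs of a full convolution under this assumption, while the output keeps the original channel count.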
The researchers evaluated the performance of the RMS-DETR model on a self-constructed rice field weed dataset and a public aerial object detection dataset (DOTA). They compared their model with the original DETR model and several variants, such as Anchor DETR, Deformable DETR, and dynamic anchor boxes for DETR (DAB-DETR). The evaluation metrics included average precision (AP), precision, recall, model size, inference time, and floating-point operations (FLOPs).
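For readers unfamiliar with detection metrics, the sketch below shows how precision and recall are typically derived from predicted and ground-truth boxes. It is a simplified illustration, not the paper's evaluation code: the IoU threshold of 0.5 is an assumption, the function names are hypothetical, and a full AP computation would also rank predictions by confidence.

```python
def iou(a, b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def precision_recall(predictions, ground_truth, threshold=0.5):
    matched = set()
    true_positives = 0
    for pred in predictions:
        best_score, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth box can be matched only once
            score = iou(pred, gt)
            if score > best_score:
                best_score, best_idx = score, idx
        if best_idx is not None and best_score >= threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
predictions = [(1, 1, 10, 10), (50, 50, 60, 60)]
precision, recall = precision_recall(predictions, ground_truth)
```

Here one of two predictions matches a ground-truth box and one of two ground-truth boxes is found, so both precision and recall are 0.5.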
Research Findings
The results showed that the RMS-DETR model achieved the highest recognition accuracy of all compared models on both datasets. On the rice field weed dataset, RMS-DETR achieved an AP of 0.792, 3.6% higher than the original DETR model and 2.4% higher than the best variant (DAB-DETR). On the DOTA dataset, RMS-DETR achieved an AP of 0.851, 4.4% higher than the original DETR model and 2.1% higher than the best variant (Deformable DETR).
The RMS-DETR model also significantly improved the detection of small targets, especially single barnyard grass plants, the most challenging category in the rice field weed dataset. RMS-DETR achieved an AP of 0.792 for small targets, 91% higher than the original DETR model and 39% higher than DAB-DETR.
In terms of model size and inference speed, RMS-DETR achieved a good balance between recognition performance and computational efficiency. The model had a size of 40.8M and an inference time of 0.0081 seconds per image, comparable to the original DETR model (38.6M and 0.0075 seconds) and DAB-DETR (40.6M and 0.0082 seconds). Its computational cost was higher, at 187G FLOPs, due to the introduction of multi-scale feature layers. However, the authors suggested that FLOPs could be reduced using a smaller input size or a lighter backbone network.
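The per-image inference times quoted above can be converted into approximate throughput with simple arithmetic. This is a back-of-the-envelope sketch that assumes one image is processed at a time and ignores data loading and post-processing.

```python
# Seconds per image, as reported in the article.
seconds_per_image = {"RMS-DETR": 0.0081, "DETR": 0.0075, "DAB-DETR": 0.0082}

# Throughput is the reciprocal of the per-image latency.
images_per_second = {name: 1.0 / t for name, t in seconds_per_image.items()}

# Relative slowdown of RMS-DETR against the original DETR.
slowdown = seconds_per_image["RMS-DETR"] / seconds_per_image["DETR"] - 1.0
```

Under these assumptions RMS-DETR processes roughly 123 images per second against roughly 133 for the original DETR, a slowdown of about 8%.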
Applications
The proposed model has potential implications for precision agriculture, particularly rice production. By using UAV remote sensing images to detect weeds in rice fields, farmers can generate variable-rate prescription maps for targeted herbicide applications. This can reduce pesticide consumption, save costs, and protect the environment. Additionally, it provides key information on weed species and locations for better weed management and monitoring, helping farmers optimize their cultivation practices and improve rice yield and quality.
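One way detections could feed a prescription map is to divide the field into grid cells and flag any cell that contains a detected weed. The sketch below is hypothetical and not taken from the paper: the function name, the use of box centres, and the cell size are all illustrative assumptions, and a real workflow would first convert image coordinates to georeferenced field coordinates.

```python
import math

def prescription_grid(detections, field_width, field_height, cell_size):
    """Return a rows x cols grid of booleans; True marks a cell to spray.
    detections: weed boxes as (x1, y1, x2, y2) in field coordinates."""
    cols = math.ceil(field_width / cell_size)
    rows = math.ceil(field_height / cell_size)
    grid = [[False] * cols for _ in range(rows)]
    for x1, y1, x2, y2 in detections:
        # Assign each detection to the cell containing its box centre.
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        col = min(int(cx // cell_size), cols - 1)
        row = min(int(cy // cell_size), rows - 1)
        grid[row][col] = True
    return grid

weeds = [(5, 5, 15, 15), (85, 45, 95, 55)]
grid = prescription_grid(weeds, field_width=100, field_height=60, cell_size=20)
cells_to_spray = sum(cell for row in grid for cell in row)
```

In this toy field only 2 of 15 cells are flagged, which illustrates how targeted application could cut herbicide use compared with blanket spraying.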
Conclusion
In summary, the novel technique proved effective for detecting weeds in rice fields, improving recognition accuracy and speed for small and complex targets. It achieved a good balance between performance and computational efficiency, making it suitable for deployment on devices with limited computing power. Future work should include diverse rice field data to enhance model generalization and explore applications in other agricultural scenarios, such as crop disease detection and yield estimation.