In an article published in the journal Applied Sciences, the authors explored diverse applications of artificial intelligence (AI), spanning natural language processing, computer vision, and data security. They addressed challenges in logical reasoning tasks, proposed innovative models like the discourse graph attention network, and contributed to advancements in medical image processing, text classification, and football team performance evaluation using computer network graph theory.
Background
AI's rapid evolution significantly influences computer vision, geographic information applications, and natural language processing. This progress facilitates human-like semantic reasoning and profoundly impacts human-machine interactions. While breakthroughs like Google's Bidirectional Encoder Representations from Transformers (BERT) model excel in tasks such as text classification, challenges persist in logical reasoning for machine reading comprehension.
Previous studies advocate modeling the logical structure of pre-trained models to enhance reasoning. In the present study, the authors introduced a discourse graph attention network, leveraging punctuation-based node segmentation and positional encoding. This innovation addressed gaps in logical reasoning by dynamically gathering information from adjacent nodes with the help of attention weight coefficients, adapting to varying attention levels in the inference process.
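The two ideas named above, punctuation-based node segmentation and attention-weighted gathering from adjacent nodes, can be illustrated with a minimal sketch. This is a generic graph-attention layer in NumPy, not the authors' model: the splitting rule, the chain-shaped adjacency, the random stand-in embeddings, and all function names are assumptions made for illustration, and positional encoding is omitted.

```python
import re
import numpy as np

def segment_by_punctuation(text):
    """Split text into discourse units at punctuation marks (illustrative rule)."""
    parts = re.split(r"[,.;:!?]", text)
    return [p.strip() for p in parts if p.strip()]

def attention_aggregate(features, adjacency, weight, attn_vec):
    """One graph-attention step: each node takes a softmax-weighted sum of its neighbours."""
    projected = features @ weight                 # (n, d_out)
    d_out = projected.shape[1]
    # score[i, j] = LeakyReLU(a_src . h_i + a_dst . h_j)
    src = projected @ attn_vec[:d_out]
    dst = projected @ attn_vec[d_out:]
    scores = src[:, None] + dst[None, :]
    scores = np.where(scores > 0, scores, 0.2 * scores)
    # Mask out non-adjacent pairs so they receive zero attention.
    scores = np.where(adjacency > 0, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ projected, weights

units = segment_by_punctuation(
    "If it rains, the match is cancelled; otherwise, it starts at noon."
)
n = len(units)
rng = np.random.default_rng(0)
features = rng.normal(size=(n, 8))                # stand-in embeddings (assumption)
# Assumed graph: each unit is linked to itself and to its immediate neighbours.
adjacency = np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
weight = rng.normal(size=(8, 4))
attn_vec = rng.normal(size=8)                     # length 2 * d_out
updated, attn = attention_aggregate(features, adjacency, weight, attn_vec)
```

Each row of `attn` holds the attention weight coefficients one node assigns to its neighbours, so nodes that matter more to a given inference step contribute more to that node's updated representation.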
Text Classification
Text classification has witnessed significant developments since Google introduced the BERT model in 2018. Many novel tasks emerged, spanning disease diagnosis from crop electronic medical records, pre-trained models employing Chinese full-word masking strategies, recognition of protein-protein interactions in biomedical texts, and customer comment analysis models like BERTopic.
The authors mentioned an Arabic event co-reference parsing system that improved on existing co-reference parsing systems. Additionally, they compiled a substantial Arabic satirical article dataset and built a classification model on it using Deep Learning (DL), Machine Learning (ML), and transformer techniques. In poverty governance, the authors discussed the HTMC-PGT framework, which offers a solution to the hierarchical single-path multi-label classification problem. Furthermore, researchers explored topic modelling and clustering algorithms to automatically extract consumer intent from comment data, providing insight into consumer emotions.
Computer Vision
Computer vision has integrated AI in areas like object detection, image classification, and medical image processing. Object detection relies predominantly on the You Only Look Once (YOLO) algorithm, which is employed in domains as diverse as citrus orchards, driver distraction, ship detection, and steel plate defect detection. The authors discussed a lightweight mask detection algorithm, ECGYOLO, a modification of YOLOv7-tiny. It introduced an ECG module to replace the ELAN module and incorporated an ECA mechanism in the neck section, enabling real-time, lightweight mask detection.
Another advancement was refining YOLOv8 for precise identification of small targets in remote sensing images. Key modifications involve replacing the cross-row convolutional module with the Space-to-Depth (SPD) Conv module and the path aggregation network with the SPANet architecture. The outcomes showed substantial improvements in recognition accuracy.
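The space-to-depth rearrangement that gives the SPD Conv module its name can be sketched in a few lines of NumPy. The idea is to halve the spatial resolution without discarding any pixels, which is why it is attractive for small targets. This is a generic illustration rather than the refined YOLOv8 code: the function name and channel ordering are assumptions, and the convolution that would normally follow the rearrangement is left out.

```python
import numpy as np

def space_to_depth(x, block=2):
    """Move each block x block spatial patch into the channel axis.

    x has shape (C, H, W); the result has shape (C * block**2, H / block, W / block).
    """
    c, h, w = x.shape
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(2, 4, 0, 1, 3)    # (row offset, col offset, C, H/block, W/block)
    return x.reshape(c * block * block, h // block, w // block)

image = np.arange(16).reshape(1, 4, 4)   # one channel, 4 x 4 pixels
packed = space_to_depth(image)            # four channels, 2 x 2 pixels each
```

Every input value survives in `packed`; a strided convolution or pooling layer at the same downsampling rate would instead skip or merge some of them.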
Within image classification, the authors introduced a new pooling operation integrated into attention blocks. This method incorporated point convolution and extension operations in the channel direction, substantially boosting ImageNet classification accuracy. In medical image processing, the researchers mentioned an application of the Vision Transformer (ViT) model to identifying and localizing tumors.
The researchers proposed an enhanced ViT architecture by integrating a shared Multi-layer Perceptron (MLP) header with each patch token's output. This strategic enhancement not only provided richer task-related supervisory information but also enhanced the ViT model's generalization capabilities and optimized deep-level feature learning.
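A rough sketch of what a shared MLP head over patch tokens could look like is given below. It is not the researchers' architecture: the layer sizes, the random stand-in tokens, the function names, and in particular the way the per-patch outputs enter the loss (class-token loss plus the mean patch-token loss) are all assumptions made for illustration.

```python
import numpy as np

def shared_mlp_head(tokens, w1, b1, w2, b2):
    """Apply the same two-layer MLP to every token (one row per token)."""
    hidden = np.maximum(tokens @ w1 + b1, 0.0)    # ReLU
    return hidden @ w2 + b2

def cross_entropy(logits, label):
    """Negative log-probability of `label` under a softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[..., label]

rng = np.random.default_rng(0)
num_patches, dim, hidden_dim, num_classes = 9, 16, 32, 3
# Stand-in encoder output: row 0 is the class token, the rest are patch tokens.
tokens = rng.normal(size=(1 + num_patches, dim))
w1 = rng.normal(size=(dim, hidden_dim)) * 0.1
b1 = np.zeros(hidden_dim)
w2 = rng.normal(size=(hidden_dim, num_classes)) * 0.1
b2 = np.zeros(num_classes)

logits = shared_mlp_head(tokens, w1, b1, w2, b2)  # (1 + num_patches, num_classes)
label = 2
class_loss = cross_entropy(logits[0], label)
patch_loss = cross_entropy(logits[1:], label).mean()
total_loss = class_loss + patch_loss              # assumed way of combining the two
```

Because the head's weights are shared, every patch token receives its own supervisory signal without adding parameters per patch.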
In sports, the authors explored applications of computer vision. Leveraging computer network graph theory, they introduced a passing network to assess football team performance. They gauged the coordination of football teams using the ratio of the average clustering coefficient to the average centrality as a comprehensive network indicator. The results showed that the indicator helps explain team coordination levels, offering a valuable reference for calculating the competitiveness of football teams.
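The indicator described above can be sketched as follows, assuming an undirected, unweighted passing graph and degree centrality as the centrality measure. Neither choice is stated in the text, so both are assumptions, as are the function names and the toy pass list.

```python
from itertools import combinations

def build_passing_graph(passes):
    """Undirected graph: players are nodes, an edge means at least one pass between them."""
    graph = {}
    for passer, receiver in passes:
        if passer == receiver:
            continue
        graph.setdefault(passer, set()).add(receiver)
        graph.setdefault(receiver, set()).add(passer)
    return graph

def average_clustering(graph):
    """Mean over players of the fraction of neighbour pairs that are themselves linked."""
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue                  # fewer than two neighbours contributes 0
        links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)

def average_degree_centrality(graph):
    """Mean of degree / (n - 1) over all players (assumed centrality measure)."""
    n = len(graph)
    return sum(len(neighbours) for neighbours in graph.values()) / (n * (n - 1))

def coordination_indicator(graph):
    """Ratio of average clustering coefficient to average centrality."""
    return average_clustering(graph) / average_degree_centrality(graph)

# Hypothetical passes between four positions.
passes = [("GK", "CB"), ("CB", "CM"), ("CM", "ST"), ("CB", "ST"), ("GK", "CM")]
graph = build_passing_graph(passes)
indicator = coordination_indicator(graph)
```

A real passing network would usually be directed and weighted by pass counts, so this sketch shows only how the ratio is formed.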
Numerous investigations delved deeper into image steganography through DL. In a distinct contribution, the authors suggested diverse strategies for embedding data covertly within WebP images. The proposed techniques encompassed both format-based and data-based methodologies. Additionally, the researchers advocated for a container selection approach that optimally utilized WebP compression parameters. To validate their proposals, they assessed them through three application programs, which showed the effectiveness of their methods.
Conclusion
In conclusion, the fields of natural language processing and computer vision have been significantly affected by the emergence of AI. As more research explores the topic, new applications and opportunities will continue to emerge, enhancing the capability of AI systems.
Journal reference:
- Zheng, W., Liu, M., Li, K., & Liu, X. (2023). AI for Computational Vision, Natural Language Processing, and Geoinformatics. Applied Sciences, 13(24), 13276. https://doi.org/10.3390/app132413276, https://www.mdpi.com/2076-3417/13/24/13276