Feature extraction is the machine learning process of deriving relevant, informative features from raw data. It transforms the input into a more compact representation that captures the characteristics essential to a given task. Feature extraction is often performed to reduce the dimensionality of the data, remove noise, and highlight relevant patterns, improving the performance and efficiency of machine learning models. Techniques such as Principal Component Analysis (PCA), wavelet transforms, and deep learning-based methods can be used for feature extraction.
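As a concrete illustration, the sketch below reduces a standard 64-dimensional dataset to 10 principal components with scikit-learn; the dataset and component count are arbitrary choices for demonstration, not tied to any study covered here.

```python
# A minimal sketch of feature extraction with PCA, using scikit-learn.
# The 64-dimensional digits data is an illustrative stand-in for any raw input.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, y = load_digits(return_X_y=True)          # X: (1797, 64) raw pixel features
pca = PCA(n_components=10)                   # keep the 10 strongest components
X_reduced = pca.fit_transform(X)             # (1797, 10) compact representation

print(X.shape, "->", X_reduced.shape)
print("variance retained: %.1f%%" % (100 * pca.explained_variance_ratio_.sum()))
```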
This research presents TF2, an innovative method for generating synchronized talking-face videos driven by speech audio. The system combines generative adversarial networks (GANs) with a Multi-level Wavelet Transform (MWT) that decomposes the speech audio into multiple frequency bands, improving the realism of the generated video frames.
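To make the MWT idea concrete, here is a minimal sketch of a multi-level wavelet decomposition of a synthetic speech-like signal using the PyWavelets library; the wavelet family, decomposition level, and signal are illustrative assumptions, not details of the TF2 system.

```python
# Hedged sketch: multi-level wavelet decomposition of a speech-like signal
# with PyWavelets. This illustrates the general MWT idea only; it is not the
# TF2 pipeline, whose architecture and wavelet settings are not shown here.
import numpy as np
import pywt

fs = 16000                                   # assumed sample rate
t = np.linspace(0, 1, fs, endpoint=False)
audio = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 1800 * t)

coeffs = pywt.wavedec(audio, wavelet="db4", level=3)
for name, c in zip(["approx(L3)", "detail(L3)", "detail(L2)", "detail(L1)"], coeffs):
    print(name, c.shape)                     # coarse-to-fine frequency bands
```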
This study explores the application of artificial intelligence (AI) models for indoor fire prediction, specifically focusing on temperature, carbon monoxide (CO) concentration, and visibility. The research employs computational fluid dynamics (CFD) simulations and deep learning algorithms, including Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Transposed Convolutional Neural Network (TCNN) models.
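A minimal sketch of the LSTM idea, assuming a window of three sensor channels (temperature, CO, visibility) is used to predict the next reading; the shapes and hyperparameters are illustrative, not the paper's configuration.

```python
# Hedged sketch: a one-layer LSTM that maps a window of past sensor readings
# (e.g. temperature, CO, visibility) to the next reading. Shapes and
# hyperparameters are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

class FireLSTM(nn.Module):
    def __init__(self, n_features=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_features)   # predict the next time step

    def forward(self, x):                # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])     # use the last hidden state

model = FireLSTM()
window = torch.randn(8, 30, 3)           # 8 sequences, 30 steps, 3 sensors
print(model(window).shape)               # torch.Size([8, 3])
```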
This review explores the applications of artificial intelligence (AI) in studying fishing vessel (FV) behavior, emphasizing the role of AI in monitoring and managing fisheries. The paper discusses data sources for FV behavior research, AI techniques used in monitoring FV behavior, and the uses of AI in identifying vessel types, forecasting fishery resources, and analyzing fishing density.
Researchers have introduced a lightweight yet efficient safety helmet detection model, SHDet, based on the YOLOv5 architecture. This model optimizes the YOLOv5 backbone, incorporates upsampling and attention mechanisms, and achieves impressive performance with faster inference speeds, making it a promising solution for real-world applications on construction sites.
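For context, the sketch below runs stock YOLOv5 inference via torch.hub; SHDet's backbone and attention modifications are not reproduced here, so this shows only the baseline interface, and the image path is a placeholder.

```python
# Hedged sketch: stock YOLOv5 inference via torch.hub. SHDet itself modifies
# the backbone and adds attention; those changes are not reproduced here.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
results = model("site_photo.jpg")        # placeholder construction-site image
results.print()                          # classes, confidences, boxes
df = results.pandas().xyxy[0]            # detections as a DataFrame
print(df[["name", "confidence"]].head())
```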
Researchers have harnessed the power of Vision Transformers (ViT) to revolutionize fashion image classification and recommendation systems. Their ViT-based models outperformed CNN and pre-trained models, achieving impressive accuracy in classifying fashion images and providing efficient and accurate recommendations, showcasing the potential of ViTs in the fashion industry.
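A minimal sketch of the general recipe, adapting a pre-trained ViT-B/16 from torchvision by replacing its classification head; the class count and the frozen backbone are illustrative assumptions, not the authors' exact setup.

```python
# Hedged sketch: adapting a pre-trained ViT-B/16 from torchvision for a
# fashion classification task by swapping the classification head. The class
# count (10) and the frozen backbone are illustrative choices.
import torch
import torchvision.models as models

vit = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
for p in vit.parameters():
    p.requires_grad = False              # freeze the transformer backbone
vit.heads = torch.nn.Linear(vit.hidden_dim, 10)   # e.g. 10 garment classes

x = torch.randn(4, 3, 224, 224)          # stand-in batch of fashion images
print(vit(x).shape)                      # torch.Size([4, 10])
```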
The paper introduces the ODEL-YOLOv5s model, designed to address the challenges of obstacle detection in coal mines using deep learning target detection algorithms. This model improves detection accuracy, real-time responsiveness, and safety for driverless electric locomotives in the challenging coal mine environment. It outperforms other target detection algorithms, making it a promising solution for obstacle identification in coal mines.
Researchers have developed an enhanced YOLOv8 model for detecting wildfire smoke using images captured by unmanned aerial vehicles (UAVs). This approach improves accuracy in various weather conditions and offers a promising solution for early wildfire detection and monitoring in complex forest environments.
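For reference, baseline YOLOv8 inference with the ultralytics package looks like the sketch below; the study's enhancements are not reproduced, and the weights file and image path are placeholders.

```python
# Hedged sketch: baseline YOLOv8 inference with the `ultralytics` package. The
# enhanced model in the study adds its own modifications; the weights file and
# image path here are placeholders.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                # stock nano weights as a stand-in
results = model("uav_frame.jpg")          # placeholder UAV-captured image
for box in results[0].boxes:
    print(box.cls.item(), box.conf.item(), box.xyxy.tolist())
```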
Researchers introduce a groundbreaking object tracking algorithm that combines Siamese networks with CNN-based methods, achieving high precision and success scores on benchmark datasets. This innovation holds promise for various applications in computer vision, including autonomous driving and surveillance.
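The core Siamese-tracking operation is a cross-correlation between the template's feature map and the search region's feature map; the sketch below fakes the features with random tensors to show just that step.

```python
# Hedged sketch of the core Siamese-tracking operation: the template's feature
# map is cross-correlated with the search region's feature map, and the peak
# of the response map gives the target location. Feature extraction is faked
# with random tensors; a real tracker shares a CNN between both branches.
import torch
import torch.nn.functional as F

template = torch.randn(1, 256, 6, 6)      # features of the target exemplar
search = torch.randn(1, 256, 22, 22)      # features of the search region

response = F.conv2d(search, template)     # (1, 1, 17, 17) similarity map
peak = response.flatten().argmax()
print(response.shape, "peak index:", peak.item())
```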
Researchers have developed a comprehensive approach to improving ship detection in synthetic aperture radar (SAR) images using machine learning and artificial intelligence. By selecting relevant papers, identifying key features, and employing the graph theory matrix approach (GTMA) for ranking methods, this research provides a robust framework for enhancing maritime operations and security through more accurate ship detection in challenging sea conditions and weather.
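One common formulation of GTMA scores each alternative by the permanent of a matrix whose diagonal holds that alternative's normalized attribute values and whose off-diagonal entries hold relative attribute importances. The sketch below shows that scoring step with made-up numbers; it is not the review's actual data or weighting.

```python
# Hedged sketch of the graph theory matrix approach (GTMA) scoring step as it
# is commonly formulated. All numbers below are made-up illustrations.
from itertools import permutations
import numpy as np

def permanent(M):
    n = len(M)
    return sum(np.prod([M[i, p[i]] for i in range(n)]) for p in permutations(range(n)))

importance = np.array([[0.0, 0.6, 0.7],
                       [0.4, 0.0, 0.5],
                       [0.3, 0.5, 0.0]])   # a_ij: attribute i vs attribute j

def gtma_score(attribute_values):
    M = importance.copy()
    np.fill_diagonal(M, attribute_values)  # diagonal = this method's attributes
    return permanent(M)

print(gtma_score([0.9, 0.7, 0.8]))         # higher permanent => higher rank
print(gtma_score([0.5, 0.6, 0.4]))
```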
Researchers have developed a "semantic guidance network" to improve video captioning by addressing challenges like redundancy and omission of information in existing methods. The approach incorporates techniques for adaptive keyframe sampling, global encoding, and similarity-based optimization, resulting in improved accuracy and generalization on benchmark datasets. This work opens up possibilities for various applications, including video content search and assistance for visually impaired users.
Researchers have expanded an e-learning system for phonetic transcription with three AI-driven enhancements. These improvements include a speech classification module, a multilingual word-to-IPA converter, and an IPA-to-speech synthesis system, collectively enhancing linguistic education and phonetic transcription capabilities in e-learning environments.
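As an illustration of the word-to-IPA step, the open-source phonemizer library (espeak backend) can stand in for the paper's multilingual converter, whose actual implementation is not described here; it requires espeak-ng to be installed.

```python
# Hedged sketch: word-to-IPA conversion with the open-source `phonemizer`
# library, a stand-in for the paper's converter. Requires espeak-ng.
from phonemizer import phonemize

words = ["transcription", "phonetics"]
ipa = phonemize(words, language="en-us", backend="espeak", strip=True)
for w, p in zip(words, ipa):
    print(f"{w} -> {p}")
```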
Researchers introduce the LWSRNet model for cinematographic shot classification, emphasizing lightweight, multi-modal input networks. They also present the FullShots dataset, which expands beyond existing benchmarks, and demonstrate the superior performance of LWSRNet in shot classification, contributing to advancements in cinematography analysis.
Researchers have leveraged machine learning and deep learning techniques, including BiLSTM networks, to classify maize gene expression profiles under biotic stress conditions. The study's findings not only demonstrate the superior performance of the BiLSTM model but also identify key genes related to plant defense mechanisms, offering valuable insights for genomics research and applications in developing disease-resistant maize varieties.
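A minimal BiLSTM classifier sketch in PyTorch, assuming expression profiles are fed as sequences; the sequence length, pooling, and class count are illustrative assumptions rather than the study's architecture.

```python
# Hedged sketch: a BiLSTM classifier over gene expression profiles treated as
# sequences. Shapes and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, n_features=1, hidden=32, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)  # 2x for both directions

    def forward(self, x):                 # x: (batch, sequence, features)
        out, _ = self.lstm(x)
        return self.head(out.mean(dim=1)) # pool over the sequence

model = BiLSTMClassifier()
profiles = torch.randn(16, 100, 1)        # 16 samples, 100 expression values
print(model(profiles).shape)              # torch.Size([16, 2]) class logits
```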
A study comparing machine learning algorithms (LDA, C5.0, NNET) to human perception in classifying L2 English vowels based on L1 vowel categories found that NNET and LDA achieved high accuracy, offering potential insights for cross-linguistic speech studies and language learning technology. However, C5.0 performed poorly, highlighting the challenges of handling continuous variables in this context.
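A minimal sketch of the LDA setup, assuming vowels are represented by F1/F2 formant values; the numbers below are rough illustrations, not the study's measurements.

```python
# Hedged sketch: LDA classification of vowels from formant measurements,
# mirroring the kind of acoustic features such studies typically use. The
# F1/F2 values are rough, made-up illustrations.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# columns: F1 (Hz), F2 (Hz); labels: vowel categories
X = np.array([[300, 2300], [320, 2200],    # /i/-like
              [700, 1200], [680, 1100],    # /a/-like
              [350, 800],  [380, 850]])    # /u/-like
y = ["i", "i", "a", "a", "u", "u"]

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict([[310, 2250], [690, 1150]]))   # expected: ['i' 'a']
```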
Researchers explore the use of a two-stage detector based on Faster R-CNN for precise and real-time Personal Protective Equipment (PPE) detection in hazardous work environments. Their model outperforms YOLOv5, achieving 96% mAP50, improved precision, and reduced inference time, showcasing its potential for enhancing worker safety and compliance.
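For reference, the two-stage detector interface in torchvision looks like the sketch below; the COCO-pretrained weights are a stand-in, since the PPE model's training data and classes are not reproduced here.

```python
# Hedged sketch: off-the-shelf Faster R-CNN inference with torchvision. The
# paper's PPE model is trained on PPE classes; COCO weights are a stand-in.
import torch
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn, FasterRCNN_ResNet50_FPN_Weights)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()

image = torch.rand(3, 480, 640)           # stand-in for a worksite photo
with torch.no_grad():
    out = model([image])[0]               # dict of boxes, labels, scores
print(out["boxes"].shape, out["scores"][:5])
```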
This comprehensive review explores the application of deep learning in multimodal emotion recognition (MER), covering audio, visual, and text modalities. It discusses deep learning techniques, challenges, and future directions in this field, emphasizing the need for lightweight architectures, interpretable models, diverse datasets, and rigorous real-world testing to advance human-centric AI technologies and interactive systems.
Researchers have developed the U-SMR network, a hybrid model combining ResNet and Swin Transformer, to enhance fabric defect detection in the textile industry. The model balances global and local features, significantly improving accuracy and edge detection while achieving competitive performance and generalization.
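A minimal sketch of the hybrid idea using timm, concatenating pooled ResNet (local) and Swin Transformer (global) features; U-SMR's actual fusion scheme and defect-detection head are not reproduced here.

```python
# Hedged sketch: fusing global (Swin Transformer) and local (ResNet) features
# with timm, echoing the hybrid idea only; not the U-SMR architecture.
import timm
import torch

resnet = timm.create_model("resnet18", num_classes=0)                    # local
swin = timm.create_model("swin_tiny_patch4_window7_224", num_classes=0)  # global

x = torch.randn(2, 3, 224, 224)           # stand-in fabric image batch
fused = torch.cat([resnet(x), swin(x)], dim=1)
print(fused.shape)                        # (2, 512 + 768) = (2, 1280)
```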
Researchers have developed the FFMKO algorithm, a powerful tool for the early detection of Sudden Decline Syndrome (SDS) in date palm trees. By combining image enhancement, thresholding, and clustering techniques, this algorithm achieved an impressive accuracy rate of over 94%, offering a promising solution to combat the devastating effects of SDS on date palm crops.
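A generic enhance-threshold-cluster pipeline of the kind the summary describes might look like the sketch below (OpenCV plus scikit-learn); this is not the FFMKO algorithm itself, and the image file name is a placeholder.

```python
# Hedged sketch: a generic enhance -> threshold -> cluster pipeline, standing
# in for FFMKO, whose exact components are not given here.
import cv2
import numpy as np
from sklearn.cluster import KMeans

img = cv2.imread("palm_leaf.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
img = cv2.equalizeHist(img)                               # enhancement
_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

pixels = img[mask > 0].reshape(-1, 1).astype(np.float32)
labels = KMeans(n_clusters=2, n_init=10).fit_predict(pixels)  # lesion vs healthy
print("foreground pixels:", len(pixels), "cluster sizes:", np.bincount(labels))
```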
This article discusses the growing menace of advanced persistent threats (APTs) in the digital landscape and presents a multi-stage machine learning approach to detect and analyze these sophisticated cyberattacks. The research introduces a Composition-Based Decision Tree (CDT) model, outperforming existing algorithms and offering new insights for improved intrusion detection and prevention systems.
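For contrast with the paper's CDT, a plain decision tree on synthetic network-flow features looks like this in scikit-learn; all data below is made up, and the CDT's composition-based construction is not reproduced.

```python
# Hedged sketch: a plain scikit-learn decision tree on made-up network-flow
# features, a baseline stand-in for the Composition-Based Decision Tree (CDT).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((500, 4))                   # e.g. duration, bytes, rate, fan-out
y = (X[:, 1] + X[:, 3] > 1.1).astype(int)  # synthetic "attack" rule

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=4).fit(Xtr, ytr)
print("held-out accuracy: %.2f" % clf.score(Xte, yte))
```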
This paper presents a novel approach to pupil tracking using event camera imaging, a technology known for its ability to capture rapid and subtle eye movements. The research employs machine-learning-based computer vision techniques to enhance eye tracking accuracy, particularly during fast eye movements.