Speech Recognition News and Research

AI Security Education Gets a Boost to Combat Growing Cyber Threats

Researchers from NJIT, Rutgers, and Temple University are developing AI security education programs to address adversarial machine learning threats, aiming to equip future engineers with robust defense strategies.

9 Feb 2025

MIT’s New AI Compiler Speeds Up Machine Learning by 30x Using Data Redundancies

MIT researchers developed an automated system that optimizes deep-learning models by leveraging both sparsity and symmetry in data, significantly reducing computational costs and energy use.

3 Feb 2025

Microsoft Boosts AI Speech Recognition Accuracy for Non-Standard English Speakers

Microsoft’s Azure AI Speech platform has significantly improved its ability to recognize non-standard English speech, with accuracy gains between 18% and 60%, by leveraging recordings from the University of Illinois' Speech Accessibility Project.

2 Feb 2025

Predicting the Unpredictable: The New Era of Reservoir Computing

Researchers introduced a novel reservoir computing framework with a generalized readout, leveraging higher-order nonlinearities to enhance accuracy and robustness in predicting chaotic systems.

16 Jan 2025

RFID Smart Mask Revolutionizes Lip-Reading with AI Precision

Researchers have developed an RFID-enabled smart mask that uses machine learning to achieve high-accuracy lip-reading, even when users wear face masks. This innovation provides a privacy-preserving solution for improving communication and aiding hearing-impaired individuals.

9 Dec 2024

Tencent’s Hunyuan-Large AI Model Sets New Benchmark with 389 Billion Parameters

Hunyuan-Large, Tencent’s largest open-source Transformer-based mixture-of-experts (MoE) model, has 389 billion total parameters, of which 52 billion are activated per token, and excels in tasks like reasoning, coding, and long-context processing. It outperforms leading models like Llama 3.1, demonstrating superior scalability and efficiency.

11 Nov 2024

Lumos Enhances Multimodal AI with On-Device STR

Lumos, a multimodal AI system, integrates on-device scene text recognition (STR) to improve question answering capabilities. This innovation balances high-quality text recognition with optimized performance, advancing real-world applications for smart assistants.

28 Aug 2024

Intelligent Digital Assistants Improve Assembly Process Quality

A recent study explored the use of a voice-enabled intelligent digital assistant, built on a large language model, in manufacturing assembly processes. It found that the system reduced cognitive load and improved product quality but did not significantly affect lead times.

8 Aug 2024

Llama 3: Meta's New AI Model Rivals GPT-4

Meta's Llama 3, whose largest version is a 405B-parameter transformer with a 128K-token context window, matches GPT-4 in performance across various tasks. The work emphasizes data quality and efficiency. Versions with integrated image, video, and speech capabilities are still under development and have not yet been broadly released.

30 Jul 2024

AI and IoT Revolutionize Sports Training Analysis

Researchers have utilized AI and IoT voice devices to advance sports training feature recognition, employing sensors for real-time data transmission and analysis. This approach successfully identifies movement patterns and predicts athlete states, enhancing training effectiveness.

27 Jun 2024

Accent Classification with Deep Learning Models

A study introduces advanced deep learning models integrating DenseNet with multi-task learning and attention mechanisms for superior English accent classification. MPSA-DenseNet, the standout model, achieved remarkable accuracy, outperforming previous methods.

6 Jun 2024

Silent Speech Interface Using Graphene-Based Textile Strain Sensors and AI

Researchers introduced a groundbreaking silent speech interface (SSI) leveraging few-layer graphene (FLG) strain sensing technology and AI-based self-adaptation. Embedded into a biocompatible smart choker, the sensor achieved high accuracy and computational efficiency, revolutionizing communication in challenging environments.

15 May 2024

Smart Contact Lens for Precise Eye Tracking

A miniature, imperceptible smart contact lens enables wireless eye tracking and human-machine interaction, and is reported to surpass traditional eye-tracking methods in precision and versatility. Its biocompatibility was confirmed through extensive testing.

8 May 2024

Bridging the Perception Gap: DNNs and Human Peripheral Vision

Researchers compare the object-detection performance of deep neural networks (DNNs) with that of human observers under simulated peripheral vision conditions. Using a purpose-built dataset and controlled experiments, they identify differences between machine and human perception that could improve alignment between the two and benefit applications in computer vision and artificial intelligence.

20 Mar 2024

Flash Attention Generative Adversarial Network for Enhanced Lip-to-Speech Technology

Researchers introduced the Flash Attention Generative Adversarial Network (FA-GAN) to address challenges in Chinese sentence-level lip-to-speech (LTS) synthesis. FA-GAN jointly models global and local lip movements and outperformed existing models on both English and Chinese datasets, scoring higher on speech-quality metrics such as STOI and ESTOI.

4 Mar 2024

Low-Carbon Transformation in Resource-Based Cities by Integrating ChatGPT and ABC Algorithms

Researchers propose a novel approach utilizing ChatGPT and artificial bee colony (ABC) algorithms to advance low-carbon transformation in resource-based cities. Their study demonstrates significant improvements in energy efficiency, carbon emissions reduction, and traffic congestion alleviation, highlighting the potential of these methods in promoting green development and sustainable urban planning.

15 Feb 2024

Innovative Vision Transformer for Pothole and Traffic Sign Detection in Challenging Conditions

Researchers from India, Australia, and Hungary introduce a robust model that combines a cascade classifier with a vision transformer to detect potholes and traffic signs in challenging conditions on Indian roads. The algorithm outperforms existing methods in accuracy and holds promise for improving road safety, infrastructure maintenance, and integration with intelligent transport systems and autonomous vehicles.

2 Feb 2024

Oracle-MNIST Dataset Unveils Challenges for ML in Ancient Chinese Character Recognition

Researchers from Beijing University introduce Oracle-MNIST, a challenging dataset of 30,222 ancient Chinese characters, providing a realistic benchmark for machine learning (ML) algorithms. The Oracle-MNIST dataset, derived from oracle-bone inscriptions of the Shang Dynasty, surpasses traditional MNIST datasets in complexity, serving as a valuable tool not only for advancing ML research but also for enhancing the study of ancient literature, archaeology, and cultural heritage preservation.

25 Jan 2024

Optical Meta-Imager Accelerates Machine Vision

Researchers present a meta-imager using metasurfaces for optical convolution, offloading computationally intensive operations into high-speed, low-power optics. The system employs angular and polarization multiplexing, achieving both positive and negative valued convolution operations simultaneously, showcasing potential in compact, lightweight, and power-efficient machine vision systems.

10 Jan 2024

Enhancing Science Education with Multimodal Large Language Models

Researchers discuss the transformative role of Multimodal Large Language Models (MLLMs) in science education. Focusing on content creation, learning support, assessment, and feedback, the study demonstrates how MLLMs provide adaptive, personalized, and multimodal learning experiences, illustrating their potential in various educational settings beyond science.

4 Jan 2024