AI Translates Sign Language with Unmatched Precision

Breaking barriers in communication: a new AI-driven neural network deciphers sign languages with unprecedented precision, paving the way for inclusive interactions worldwide.

Research: Word-Level Sign Language Recognition With Multi-Stream Neural Networks Focusing on Local Regions and Skeletal Information. Image Credit: Andrey_Popov / Shutterstock

Nations worldwide have developed sign languages suited to their local communication styles, and each language consists of thousands of signs, making it difficult to learn and understand. The work of an Osaka Metropolitan University-led research group has now improved the accuracy of sign language recognition by using artificial intelligence to translate signs into words automatically. The research was published in the journal IEEE Access.

The researchers developed a novel multi-stream neural network (MSNN) to enhance recognition accuracy. The system combines global movement analysis with localized hand, face, and skeletal position data to better understand signs. Previous methods focused on capturing the signer's general movements, and accuracy suffered because a sign's meaning can change with subtle differences in hand shape and in the relationship between the hands and the body.
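To make the multi-stream idea concrete, the sketch below shows, in PyTorch, how three independently trained classifiers (global appearance, cropped hand/face regions, and skeletal keypoints) can have their softmax scores averaged into a single word prediction. The encoder layers, feature sizes, and vocabulary size are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the multi-stream idea:
# three separately trained classifiers whose softmax scores are averaged.
import torch
import torch.nn as nn

NUM_WORDS = 100   # e.g. WLASL100 vocabulary size (assumption)
FEAT_DIM = 512    # per-stream feature size (assumption)

class StreamClassifier(nn.Module):
    """Stand-in for one stream's video encoder plus classification head."""
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.head = nn.Linear(256, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(x))

# One classifier per stream; in the paper each stream is trained separately.
streams = {
    "base": StreamClassifier(FEAT_DIM, NUM_WORDS),      # global appearance/motion
    "local": StreamClassifier(FEAT_DIM, NUM_WORDS),     # cropped hands and face
    "skeleton": StreamClassifier(FEAT_DIM, NUM_WORDS),  # keypoint coordinates
}

def fuse_predictions(features: dict) -> torch.Tensor:
    """Average the per-stream softmax scores to get the final word scores."""
    probs = [torch.softmax(streams[name](feat), dim=-1)
             for name, feat in features.items()]
    return torch.stack(probs).mean(dim=0)

# Example: one clip represented by precomputed per-stream feature vectors.
features = {name: torch.randn(1, FEAT_DIM) for name in streams}
word_scores = fuse_predictions(features)
predicted_word = word_scores.argmax(dim=-1)
```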

Associate Professor Katsufumi Inoue and Associate Professor Masakazu Iwamura of the Graduate School of Informatics worked with colleagues, including at the Indian Institute of Technology Roorkee, to improve AI recognition accuracy. To the data on the signer's general upper-body movements, they added localized hand and facial expression information, as well as skeletal data on the position of the hands relative to the body.
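As a rough illustration of the kind of relative skeletal feature described here, the snippet below expresses hand keypoints relative to the torso center and normalizes them by shoulder width. The keypoint indices and normalization scheme are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch: hand keypoints expressed relative to the body, so the same
# hand shape performed at different positions can be distinguished.
# Keypoint layout (shoulder indices) is assumed, not taken from the paper.
import numpy as np

def relative_hand_features(pose_xy: np.ndarray, hand_xy: np.ndarray) -> np.ndarray:
    """pose_xy: (P, 2) body keypoints; hand_xy: (H, 2) hand keypoints (pixels)."""
    l_shoulder, r_shoulder = pose_xy[5], pose_xy[6]   # assumed COCO-style indices
    torso_center = (l_shoulder + r_shoulder) / 2.0
    shoulder_width = np.linalg.norm(l_shoulder - r_shoulder) + 1e-6
    # Translate to the torso center and scale by shoulder width so the feature
    # is insensitive to where the signer stands and to camera zoom.
    return (hand_xy - torso_center) / shoulder_width

# Toy example: 17 body keypoints and 21 hand keypoints for one frame.
pose = np.random.rand(17, 2) * 640
hand = np.random.rand(21, 2) * 640
features = relative_hand_features(pose, hand).flatten()
```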

The research team tested their method on two major datasets, WLASL and MS-ASL, which are widely used for American Sign Language (ASL) recognition. Their model achieved Top-1 accuracy improvements of approximately 10–15% compared to conventional methods. For example, it achieved 81.38% accuracy on the WLASL100 dataset.
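For reference, Top-1 accuracy counts a clip as correct only when the single highest-scoring word matches the ground-truth gloss label. A minimal evaluation helper (illustrative only, not the authors' evaluation code) might look like this:

```python
# Illustrative sketch of Top-k accuracy for word-level recognition.
import torch

def top_k_accuracy(scores: torch.Tensor, labels: torch.Tensor, k: int = 1) -> float:
    """scores: (N, num_classes) class scores; labels: (N,) ground-truth indices."""
    topk = scores.topk(k, dim=-1).indices                 # (N, k) best class indices
    correct = (topk == labels.unsqueeze(-1)).any(dim=-1)  # hit anywhere in top k
    return correct.float().mean().item()

# Toy example with random scores for 4 clips and a 100-word vocabulary.
scores = torch.randn(4, 100)
labels = torch.tensor([3, 17, 42, 99])
print(f"Top-1: {top_k_accuracy(scores, labels, 1):.4f}")
print(f"Top-5: {top_k_accuracy(scores, labels, 5):.4f}")
```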

Overview of the proposed method. The proposed multi-stream neural network (MSNN) consists of three streams: 1) a base stream, 2) local image stream, and 3) skeleton stream. Each stream is trained separately, and the recognition scores extracted from each stream are averaged to obtain the final recognition result.

"We were able to improve the accuracy of word-level sign language recognition by 10-15% compared to conventional methods," Professor Inoue declared. "Our method uses streams for global movement, localized hand and facial features, and skeletal data, which allows us to capture subtle distinctions in gestures. In addition, we expect that the proposed method can be applied to any sign language, hopefully leading to improved communication with speaking- and hearing-impaired people in various countries."

Examples of different gestures that represent the word “pizza” from the WLASL dataset

The study also identified challenges in recognizing visually similar signs and handling diverse data from different environments, such as varying viewpoints and complex backgrounds. Future work will aim to improve the method’s scalability, enhance robustness for real-world applications, and explore its adaptability to sign languages other than ASL, such as British, Japanese, and Indian sign languages.

Journal reference:
  • Maruyama, M., Singh, S., Inoue, K., Roy, P. P., Iwamura, M., & Yoshioka, M. (2024). Word-Level Sign Language Recognition With Multi-Stream Neural Networks Focusing on Local Regions and Skeletal Information. IEEE Access, 12, 167333-167346. doi: 10.1109/ACCESS.2024.3494878, https://ieeexplore.ieee.org/document/10749796
