Augmented Reality Meets Language Models: Transforming Operations and Maintenance with AI

By Dr. Sampath Lonka. Reviewed by Susha Cheriyedath, M.Sc. Jul 12 2023

In a recent submission to the arXiv* server, researchers proposed a system that integrates optical character recognition (OCR), augmented reality (AR), and large language models (LLMs) to enhance user performance, ensure trustworthy interactions, and reduce workload in operations and maintenance (O&M) tasks. The system utilizes a dynamic virtual environment powered by Unity, facilitating seamless interactions between the virtual and physical realms.

*Study: Augmented Reality Meets Language Models: Transforming Operations and Maintenance with AI. Image credit: ARMMY PICCA / Shutterstock*

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

Background

AR has gained significant attention in various sectors, including gaming, healthcare, engineering, and education, due to advancements in computing power and hardware. AR-supported O&M tasks have shown promise in facilities management by overlaying digital data onto the physical environment, providing real-time information to improve task execution and decision-making.

However, challenges persist in converting context data into suitable AR visualizations, particularly in the "text-to-action" aspect of O&M tasks. The current study explores the integration of large language models (LLMs), specifically ChatGPT, to generate real-time text-to-action guidance, addressing the limitations of traditional natural language processing (NLP) methods. The integration of ChatGPT into AR systems has the potential to enhance the efficiency, accuracy, and user-friendliness of O&M procedures, as demonstrated through a prototype system and preliminary user tests.

Literature review

AR for maintenance: AR has shown potential in improving efficiency, reducing costs, enhancing safety, and increasing customer satisfaction in construction and facilities management. However, challenges such as context awareness and localization remain. Researchers have addressed these challenges by integrating computer vision techniques, point cloud data, non-vision methods, and data fusion approaches. Context understanding in AR has been improved by incorporating facility metadata from tools such as building information modeling (BIM).

ChatGPT for content understanding: NLP techniques, including text summarization, have enhanced context understanding in architecture, engineering, and construction/facilities management (AEC/FM) tasks. The emergence of LLMs like ChatGPT has changed how information is filtered and comprehended. ChatGPT's precision, transformer-based architecture, and ability to adapt to new contexts make it suitable for understanding complex O&M instructions. Its success has been demonstrated in various domains, including healthcare, education, and human-robot collaboration. The high-level intelligence ChatGPT exhibits is attributed to emergent properties of its large-scale training. Leveraging these features can significantly enhance context understanding in O&M tasks.

System design

The proposed system focuses on efficiency, accuracy, and real-life application compatibility. The challenges addressed include integrating OCR, ChatGPT, and a virtual environment within the Unity game engine. The system architecture revolves around Unity, facilitating interactions and data exchanges between the virtual and physical worlds. The system converts visual inputs from 2D images to text through OCR, processes them using ChatGPT, and then sends the processed data to Unity for execution.
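The OCR-to-LLM-to-Unity flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the step format, the JSON payload shape, and every function name here are assumptions made for illustration, and the language model is passed in as a plain callable so that no particular client library is implied.

```python
import json
import re
from typing import Callable, Dict, List

# Hypothetical prompt; the prompt used in the study is not given in this article.
PROMPT_TEMPLATE = (
    "Convert the following maintenance instructions into numbered steps, "
    "one per line, in the form '<n>. <verb> <target>'.\n\n{manual_text}"
)

# Matches lines such as "2. Turn valve A" or "2) Turn valve A".
STEP_PATTERN = re.compile(r"^\s*(\d+)[.)]\s+(\w+)\s+(.+?)\s*$")


def text_to_actions(ocr_text: str, ask_llm: Callable[[str], str]) -> List[Dict[str, object]]:
    """Turn OCR text into an ordered list of action records.

    `ask_llm` is any function that takes a prompt and returns the model's reply.
    """
    reply = ask_llm(PROMPT_TEMPLATE.format(manual_text=ocr_text))
    actions = []
    for line in reply.splitlines():
        match = STEP_PATTERN.match(line)
        if match is None:
            continue  # ignore any text that is not a numbered step
        order, verb, target = match.groups()
        actions.append({"order": int(order), "verb": verb.lower(), "target": target})
    return sorted(actions, key=lambda action: action["order"])


def to_unity_payload(actions: List[Dict[str, object]]) -> str:
    """Serialize the actions as JSON, an assumed exchange format for the Unity side."""
    return json.dumps({"actions": actions})


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, returning a canned reply for demonstration."""
    return "Sure, here are the steps:\n1. Press green button\n2. Turn valve A\n"


if __name__ == "__main__":
    steps = text_to_actions("PRESS GREEN BUTTON THEN TURN VALVE A", fake_llm)
    print(to_unity_payload(steps))
```

Parsing the reply into structured records before handing it to the rendering layer is one way to keep free-form model output from reaching the AR scene unchecked; a real system would likely also validate each target against the objects known to the digital twin.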

The system uses the Mixed Reality Toolkit (MRTK) and HoloLens 2 to understand spatial information and register virtual items in the physical world. Unity manages the interactions, which take place through hand gestures and virtual buttons. The design aims to ensure accurate commands from ChatGPT and seamless interactions between the virtual and physical environments, enhancing performance in operations and maintenance tasks.

Experimentation and results

A case study with 15 subjects aged between 18 and 30 was conducted to evaluate the proposed system. The experiment involved two conditions: "no augmentation" and "AR and GPT." The subjects were recruited publicly, and three of them had experience with AR head-mounted displays (HMDs) and hand interactions. Two subjects reported motion sickness when using HoloLens 2, while the remaining participants did not experience any discomfort. The experiment used OpenAI's "gpt-4" model. The physical workspace was represented by a box-shaped control panel, while a digital twin model was created in Unity. The subjects interacted with the AR virtual scene using MRTK 2.8 and hand interactions.

Participants completed tasks in both conditions and subsequently filled out questionnaires. The complexity of the tasks was equivalent in both conditions. All participants successfully operated the virtual buttons, and their performance was evaluated based on task completion time and interaction sequence accuracy. Participants underwent a training session before the experiment to familiarize themselves with the AR system and physical control panel. The experiment was recorded, and the completion time was derived from the recorded videos.
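The two performance measures named above, completion time and interaction sequence accuracy, can be made concrete with a short sketch. The article does not state how the study scored sequence accuracy, so the position-wise definition below is an assumption, as are the function names and the example button labels.

```python
from typing import Sequence


def sequence_accuracy(expected: Sequence[str], performed: Sequence[str]) -> float:
    """Fraction of expected steps that were performed at the correct position.

    Assumed definition: a step counts only if the same action appears at the
    same index in both sequences; missing trailing steps count as errors.
    """
    if not expected:
        raise ValueError("expected sequence must not be empty")
    correct = sum(1 for want, got in zip(expected, performed) if want == got)
    return correct / len(expected)


def completion_time(start_s: float, end_s: float) -> float:
    """Task duration in seconds from two timestamps read off a recorded video."""
    if end_s < start_s:
        raise ValueError("end timestamp precedes start timestamp")
    return end_s - start_s


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    expected = ["A", "B", "C", "D"]
    performed = ["A", "C", "B", "D"]
    print(sequence_accuracy(expected, performed))  # 0.5
    print(completion_time(12.0, 95.5))             # 83.5
```

A position-wise score penalizes a swapped pair of steps twice; an edit-distance-based score would be a reasonable alternative if order errors should be counted once.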

Results: The average completion time was significantly lower in the "AR and GPT" condition than in the "no augmentation" condition. The accuracy of physical interaction was high in both conditions, with slightly higher accuracy observed in the "AR and GPT" condition. ChatGPT exhibited high accuracy in processing the text content. The NASA task load index (NASA-TLX) survey showed a significant difference between the two conditions, indicating that the "AR and GPT" method reduced cognitive load. The trust evaluation survey also showed a significant difference, suggesting that participants trusted the virtual prompts provided by ChatGPT and AR.
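For readers unfamiliar with how such workload comparisons are typically computed, the sketch below scores a NASA-TLX questionnaire with the unweighted "raw TLX" method (the mean of the six subscale ratings) and computes a paired t statistic for a within-subject comparison. The article does not say whether the study used raw or weighted TLX, nor which statistical test it applied, so both choices are assumptions, and the numbers are made up for illustration.

```python
import math
from statistics import mean, stdev
from typing import Mapping, Sequence

TLX_SUBSCALES = (
    "mental", "physical", "temporal", "performance", "effort", "frustration",
)


def raw_tlx(ratings: Mapping[str, float]) -> float:
    """Unweighted NASA-TLX score: the mean of the six subscale ratings."""
    missing = [name for name in TLX_SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    return mean(ratings[name] for name in TLX_SUBSCALES)


def paired_t_statistic(baseline: Sequence[float], treatment: Sequence[float]) -> float:
    """t statistic for paired samples (same participants in both conditions)."""
    if len(baseline) != len(treatment) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - t for b, t in zip(baseline, treatment)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean(diffs) / (spread / math.sqrt(len(diffs)))


if __name__ == "__main__":
    # Made-up ratings on a 0-100 scale, not data from the study.
    one_participant = {
        "mental": 60, "physical": 20, "temporal": 50,
        "performance": 40, "effort": 70, "frustration": 30,
    }
    print(raw_tlx(one_participant))  # 45

    no_augmentation = [50.0, 60.0, 70.0]
    ar_and_gpt = [40.0, 45.0, 50.0]
    print(paired_t_statistic(no_augmentation, ar_and_gpt))
```

The t statistic would then be compared against a t distribution with n - 1 degrees of freedom to obtain a p-value; with samples as small as the 15 participants described here, a non-parametric alternative such as the Wilcoxon signed-rank test is also commonly considered.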

Conclusion

In summary, the study found that integrating ChatGPT into an AR system improved performance on maintenance tasks. ChatGPT-enabled AR resulted in faster completion, slightly higher accuracy, increased trust, and reduced cognitive load compared with the "no augmentation" condition. These findings have implications for various sectors, enhancing efficiency and the user experience. However, the generalizability of the study is limited to maintenance tasks, and future research should include objective measures and address technical challenges to facilitate wider implementation of ChatGPT-enabled AR.

Journal reference:

Preliminary scientific report. Xu, F., Nguyen, T., & Du, J. (2023). Augmented Reality for Maintenance Tasks with ChatGPT for Automated Text-to-Action. arXiv. DOI:10.48550/arXiv.2307.03351, https://arxiv.org/abs/2307.03351

Written by

Dr. Sampath Lonka

Dr. Sampath Lonka is a scientific writer based in Bangalore, India, with a strong academic background in Mathematics and extensive experience in content writing. He has a Ph.D. in Mathematics from the University of Hyderabad and is deeply passionate about teaching, writing, and research. Sampath enjoys teaching Mathematics, Statistics, and AI to both undergraduate and postgraduate students. What sets him apart is his unique approach to teaching Mathematics through programming, making the subject more engaging and practical for students.

Citations

Lonka, Sampath. (2023, July 12). Augmented Reality Meets Language Models: Transforming Operations and Maintenance with AI. AZoAi. https://www.azoai.com/news/20230712/Augmented-Reality-Meets-Language-Models-Transforming-Operations-and-Maintenance-with-AI.aspx.