In a paper published in the journal Applied Sciences, researchers presented a novel approach to enhancing large language models (LLMs) with domain-specific knowledge in E-learning. The method incorporated external knowledge sources, such as E-learning lectures and research papers, to improve the LLM’s comprehension and generation abilities by utilizing retrieval-augmented generation. Experimental evaluations highlighted the effectiveness and superiority of this approach in capturing and generating E-learning-specific information compared to existing methods.
Background
Past work has explored domain-specific LLMs, tailored models fine-tuned to perform tasks dictated by organizational guidelines. Unlike general-purpose language models, these custom models require a deep understanding of specific contexts, such as product data and industry terminology. Fine-tuning foundation models is often sufficient for such customization, requiring less data and computation than training from scratch. Retrieval-augmented generation (RAG) enhances LLMs by combining pre-trained models with information retrieval systems, enabling them to perform context-specific tasks such as question answering.
Enhanced E-learning Models
The approach builds on the pre-trained LLM Llama 2, released by Meta AI in July 2023, and uses E-learning materials, including textbooks and research papers, as external sources of knowledge. With RAG, relevant information is retrieved from these external data sources, the prompt is augmented with this additional knowledge, and the augmented input is fed into the LLM. The procedure comprises three primary parts: retrieval, augmentation, and generation.
Based on the prompt, relevant knowledge is retrieved from a knowledge base, typically vector embeddings stored in a vector database such as Pinecone. The retriever embeds the given input at runtime, searches the vector space containing the data, and ranks the top K retrieval results by their proximity in that space. This step captures the textual relationships and relevance between the input and the document corpus. The retrieved information is then combined with the initial prompt using a specific format.
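A minimal sketch of this retrieval step is shown below. It assumes a sentence-transformers embedding model and a small in-memory corpus searched by cosine similarity; a deployed system would instead query a hosted vector database such as Pinecone. The model name and the chunk texts are illustrative, not taken from the paper.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Illustrative embedding model; the paper's actual embedding choice may differ.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical E-learning chunks standing in for the indexed knowledge base.
corpus = [
    "Disruptive innovation describes how simpler products displace incumbents.",
    "Sentiment analysis classifies the polarity of customer reviews.",
    "Scale-directed text analysis maps survey responses onto rating scales.",
]
corpus_vectors = embedder.encode(corpus, normalize_embeddings=True)

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Embed the query and return the top-K chunks by cosine similarity."""
    query_vector = embedder.encode([query], normalize_embeddings=True)[0]
    scores = corpus_vectors @ query_vector  # cosine similarity on unit vectors
    best = np.argsort(scores)[::-1][:top_k]
    return [corpus[i] for i in best]

print(retrieve("What is disruptive innovation?"))
```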
The augmented prompt is then fed into the large language model to generate the final output. The generator uses the retrieval results to craft a series of prompts from a predefined template, producing a coherent and relevant response to the input.
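The augmentation and generation steps can be sketched as a simple prompt template filled with the retrieved chunks and passed to the backbone model. The template wording and the use of a generic Hugging Face text-generation pipeline are assumptions for illustration; the paper's exact prompt format and Llama 2 serving setup are not described here.

```python
from transformers import pipeline

# Illustrative prompt template; the paper's predefined template may differ.
TEMPLATE = (
    "Answer the question using the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

def augment(question: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks with the original prompt."""
    return TEMPLATE.format(context="\n".join(chunks), question=question)

# Any causal LLM can stand in as the generator for this sketch.
generator = pipeline("text-generation", model="gpt2")

def generate(question: str, chunks: list[str]) -> str:
    """Feed the augmented prompt to the model and return only the new text."""
    prompt = augment(question, chunks)
    output = generator(prompt, max_new_tokens=100, do_sample=False)
    return output[0]["generated_text"][len(prompt):].strip()
```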
The team evaluated the effectiveness of this approach through comprehensive experiments conducted on a large-scale E-learning dataset. The assessment focused on performance across various E-learning-specific tasks, such as question answering, passage generation, and summarization, as well as the model's capability to manage new, unseen E-learning content.
LLMs, such as generative pre-trained transformer 4 (GPT-4), were leveraged as the backbone to produce responses, supplemented by extra information from either standard data retrieval methods or this approach. For data retrieval, the top three relevant information chunks were retrieved from Google Cloud and supplied to the backbone LLMs during answer generation. Baselines included a raw LLM without domain-specific knowledge and an LLM with domain-specific knowledge (DSK). Evaluation metrics such as BERTScore, which measures similarity using bidirectional encoder representations from transformers (BERT) embeddings, were used to assess the semantic similarity between generated answers and ground truth. Case studies on Innovation Disruption, Sentiment Analysis, and other topics from textbooks and research papers further validated the approach by comparing BERTScores between retrieved chunks and the domain knowledge generated by the LLM.
Domain-Specific LLM Effectiveness
The experimental results highlight the effectiveness of integrating domain-specific knowledge into LLMs for E-learning applications, focusing on the textbook "Digital Marketing" and the research paper "From Mining to Meaning." The enriched LLM demonstrates superior performance in accuracy, fluency, and coherence compared to baseline models. A knowledge graph notably enhances the model's ability to comprehend and generate contextually relevant responses.
To automate the evaluation process, the team used the open-source BERTScore Python package, built on Hugging Face's Transformers library, to calculate precision, recall, and F1 score. The results indicate that the approach consistently improves semantic overlap metrics, particularly in cases where local domain knowledge was applied.
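This part of the evaluation can be reproduced with the open-source bert-score package. The sketch below uses short placeholder texts; in the study, the candidates are model-generated answers and the references are the ground-truth answers described above.

```python
from bert_score import score

# Placeholder texts; the study compares generated answers against ground truth.
candidates = ["Disruptive innovation lets simpler products overtake incumbents."]
references = ["Disruptive innovation describes how simpler offerings displace established products."]

# Returns per-sentence precision, recall, and F1 tensors.
precision, recall, f1 = score(candidates, references, lang="en", verbose=False)
print(f"P={precision.mean().item():.3f} "
      f"R={recall.mean().item():.3f} "
      f"F1={f1.mean().item():.3f}")
```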
In cases 1 and 2, which focus on familiar concepts like innovation disruption and sentiment analysis from "Digital Marketing," both the raw LLM and the LLM augmented with local domain knowledge perform well on semantic overlap. However, the latter shows slightly higher BERTScore values, underscoring its effectiveness in providing more precise responses. Conversely, in cases 3 and 4, which involve specific research topics like "from text to action grid" and "scale-directed text analysis," the LLM enhanced with local domain knowledge significantly outperforms the raw LLM. This disparity underscores the importance of leveraging domain-specific information to generate accurate outputs in specialized contexts.
Conclusion
In summary, this paper introduced an innovative method to enhance LLMs with domain-specific knowledge in E-learning by integrating external sources like textbooks and research papers. The approach significantly improved LLM performance in generating accurate and contextually relevant content, particularly for niche and recent research topics. Challenges include keeping the external knowledge up to date and managing the complexity of integration. Future research aims to automate knowledge updates, refine prompt designs for better contextual relevance, and enhance LLM learning techniques for adaptable deployment in educational settings.