In an article recently submitted to the arXiv* server, researchers advocated for the adoption of retrieval-augmented language models (LMs) over traditional parametric LMs. Retrieval-augmented LMs, which incorporate large-scale datastores during inference, offer improved reliability, adaptability, and verifiability.
Despite their potential, obstacles such as the difficulty of finding helpful text beyond knowledge-intensive tasks have hindered widespread adoption. The authors proposed a roadmap for developing general-purpose retrieval-augmented LMs, emphasizing a reconsideration of datastores, enhanced retriever-LM interaction, and robust infrastructure for efficient training and inference.
Background
LMs, exemplified by generative pre-trained transformer (GPT)-4, showcase significant proficiency in various natural language processing (NLP) tasks, integrating rich language understanding and world knowledge. However, they grapple with persistent challenges, including factual errors, difficulty in verification, and impractical model size.
The present paper positioned retrieval-augmented LMs as a superior alternative aimed at overcoming these limitations. Parametric LMs rely solely on large-scale text data absorbed during training, leading to shortcomings such as factual inaccuracies, verification challenges, and substantial model sizes. Retrieval-augmented LMs, in contrast, leverage external datastores during inference, reducing factual errors, enhancing attribution, and enabling flexible data opt-in and opt-out.
The proposal envisioned a new generation of LMs that adapt seamlessly, run efficiently, and produce verifiable outputs, qualities crucial for widespread adoption. While acknowledging the effectiveness of retrieval-augmented LMs, the authors identified existing challenges hindering broader implementation: limitations in finding relevant text for diverse tasks, shallow interactions between the retrieval and LM components, and insufficient infrastructure for efficient training and inference.
The presented roadmap outlined strategies to address these challenges, emphasizing a more nuanced understanding of relevance, deeper interactions between components, and interdisciplinary efforts toward scalable infrastructure. The ultimate goal was to unlock the full potential of retrieval-augmented LMs, extending their applications across a wide spectrum of tasks and domains beyond conventional knowledge-intensive contexts.
How far can we go with parametric LMs?
The researchers investigated the limitations of parametric LMs, highlighting practical challenges that hinder the development of reliable intelligent systems. Parametric LMs, trained on large-scale text datasets, store knowledge within their parameters, which leads to several weaknesses: factual inaccuracies, difficulties in verification, challenges in managing and filtering training data, computationally expensive adaptation to new data distributions, and prohibitively large model sizes.
Factual errors, especially in handling long-tail knowledge, persisted despite scaling efforts. Verification was problematic due to the lack of clear attributions. Filtering out sensitive data during training posed challenges, and adapting LMs to evolving data distributions was computationally expensive. The relentless pursuit of larger model sizes for improved performance raised environmental concerns and practical issues. The authors argued that these challenges necessitate a shift from parametric to retrieval-augmented LMs to achieve more reliable, adaptable, and attributable systems.
How can retrieval-augmented LMs address these issues?
A retrieval-augmented LM consists of a retriever and a parametric LM. The retriever builds an index over a datastore of documents and, during inference, retrieves text relevant to the input. The parametric LM then conditions on both the original input and the retrieved text to make predictions.
This approach, explored across machine learning domains, is particularly effective in minimizing factual errors, improving attributions, enabling flexible data opt-in and opt-out, enhancing adaptability, and demonstrating parameter efficiency. Recent advances, such as the retrieval-augmented generation (RAG) model, have shown significant gains on knowledge-intensive tasks, offering a promising avenue for addressing the weaknesses inherent in parametric LMs.
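To make the input-augmentation pattern concrete, the sketch below indexes a toy datastore with a sentence-embedding model and prepends the retrieved passages to the prompt of an off-the-shelf generative model. This is a minimal illustration, not the authors' implementation; the model names, prompt format, and datastore contents are assumptions chosen for brevity.

```python
# Minimal sketch of input augmentation (RAG-style): embed a toy datastore,
# retrieve the most similar passages for a query, and prepend them to the
# prompt of a small parametric LM. Model choices and prompt format are
# illustrative assumptions, not the paper's recipe.
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline

datastore = [
    "The Eiffel Tower was completed in 1889 for the Paris World's Fair.",
    "Mount Everest, at 8,849 m, is the highest mountain above sea level.",
    "Python was created by Guido van Rossum and first released in 1991.",
]

# 1) The retriever builds an index over the datastore (dense embeddings here).
encoder = SentenceTransformer("all-MiniLM-L6-v2")
index = encoder.encode(datastore, normalize_embeddings=True)

def retrieve(query, k=2):
    """Return the k passages whose embeddings are most similar to the query."""
    q = encoder.encode([query], normalize_embeddings=True)[0]
    scores = index @ q  # cosine similarity, since embeddings are normalized
    return [datastore[i] for i in np.argsort(-scores)[:k]]

# 2) The parametric LM conditions on the original input plus the retrieved text.
generator = pipeline("text-generation", model="gpt2")

def answer(query):
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generator(prompt, max_new_tokens=40)[0]["generated_text"]

print(answer("When was the Eiffel Tower completed?"))
```

In a realistic system the datastore would hold millions or billions of passages, and the flat dot-product search would be replaced by an approximate nearest-neighbor index.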
Why haven’t retrieval-augmented LMs been widely adopted?
The researchers evaluated the current state of retrieval-augmented LMs and discussed the challenges hindering their widespread adoption relative to parametric LMs. Their architecture taxonomy classified these models into input augmentation, intermediate fusion, and output interpolation. Existing challenges included limited interactions between retrievers and LMs, misaligned training objectives, and dependence on Wikipedia-centric datastores. The authors identified obstacles to joint optimization, emphasizing the need for more sophisticated interplay between the retriever and the LM.
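Of the three categories, output interpolation is perhaps the easiest to illustrate in isolation: a kNN-LM-style model mixes the parametric LM's next-token distribution with a distribution induced by nearest neighbors in the datastore. The snippet below is a schematic with made-up probabilities and an arbitrary interpolation weight, intended only to show the mechanics.

```python
# Schematic of output interpolation (kNN-LM style): the final next-token
# distribution is a weighted mix of the parametric LM's distribution and a
# distribution derived from nearest-neighbor matches in the datastore.
# The toy probabilities and the weight `lam` are illustrative only.
import numpy as np

vocab = ["paris", "london", "1889", "everest"]
p_lm  = np.array([0.50, 0.30, 0.15, 0.05])  # parametric LM's next-token distribution
p_knn = np.array([0.10, 0.05, 0.80, 0.05])  # distribution from retrieved nearest neighbors

lam = 0.25                                  # interpolation weight, tuned in practice
p_final = lam * p_knn + (1.0 - lam) * p_lm  # p(y|x) = lam * p_kNN + (1 - lam) * p_LM

print(dict(zip(vocab, np.round(p_final, 3))))
print("predicted token:", vocab[int(np.argmax(p_final))])
```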
Furthermore, the paper highlighted the lack of standardized libraries and infrastructure for large-scale training and inference, which hinders the adoption of retrieval-augmented LMs. To advance these models, the researchers proposed a roadmap: expand datastores to support wider applications, develop architectures with deeper retriever-LM interactions, implement large-scale joint training techniques, and create specialized infrastructure and open-source libraries tailored to retrieval-augmented LMs.
How can we further advance retrieval-augmented LMs?
The roadmap proposed advancements for retrieval-augmented LMs aimed at overcoming current limitations. It suggested redefining "relevance" beyond semantic and lexical similarity, advocating for versatile retrievers capable of contextualized retrieval. It also emphasized developing architectures with deeper retriever-LM interactions, enabling efficient end-to-end training, and exploring post-hoc adaptations.
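As one hedged illustration of what end-to-end training can mean here, an established formulation, used by the RAG model mentioned earlier rather than prescribed by this paper, treats the retrieved passage as a latent variable and marginalizes the output likelihood over the top-k retrieved passages, so that gradients reach both the retriever and the parametric LM:

```latex
% RAG-style joint objective (illustrative): z ranges over the top-k passages
% returned by the retriever p_eta for input x; p_theta is the parametric LM.
p(y \mid x) \;\approx\; \sum_{z \,\in\, \mathrm{top}\text{-}k\left(p_\eta(\cdot \mid x)\right)} p_\eta(z \mid x)\, p_\theta(y \mid x, z)
```

Maximizing this marginal likelihood trains the retriever and the LM jointly, in contrast to pipelines in which the retriever is frozen.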
To address scaling challenges, the authors called for research into compression algorithms, faster nearest-neighbor search, and specialized hardware. They also highlighted the need for standardized, open-source implementations and benchmarks to propel retrieval-augmented LM development, promoting collaborative efforts across hardware, systems, and algorithms.
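To give one concrete example of the kind of systems work this implies, the sketch below builds a compressed, approximate nearest-neighbor index with the FAISS library. FAISS is a common choice rather than one endorsed by the authors, and all sizes and parameters are placeholders; real datastores can reach billions of passages.

```python
# Sketch: approximate nearest-neighbor search over a compressed index with FAISS.
# Product quantization (IVF-PQ) shrinks the datastore's memory footprint, while
# the inverted-file structure keeps search fast. All sizes are placeholders.
import faiss
import numpy as np

d, n = 768, 100_000                          # embedding dimension, number of passages
xb = np.random.rand(n, d).astype("float32")  # stand-in for real passage embeddings

nlist, m, nbits = 1024, 64, 8                # clusters, PQ sub-vectors, bits per code
quantizer = faiss.IndexFlatL2(d)             # coarse quantizer for the inverted file
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

index.train(xb)                              # learn coarse centroids and PQ codebooks
index.add(xb)                                # add compressed vectors to the index
index.nprobe = 16                            # clusters probed per query (speed/recall trade-off)

xq = np.random.rand(5, d).astype("float32")  # stand-in query embeddings
distances, ids = index.search(xq, 10)        # top-10 approximate neighbors per query
print(ids.shape)                             # (5, 10)
```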
Conclusion
In conclusion, researchers advocated for the adoption of retrieval-augmented LMs as superior to traditional parametric LMs, citing their enhanced reliability, adaptability, and verifiability. Acknowledging current challenges hindering widespread adoption, they proposed a comprehensive roadmap.
This roadmap emphasized redefining "relevance," developing nuanced retriever-LM interactions, and addressing infrastructure constraints for efficient training and inference. The ultimate goal was to unlock the full potential of retrieval-augmented LMs, extending their applications beyond conventional knowledge-intensive tasks. The authors stressed that collaborative, interdisciplinary efforts are needed to advance architectures, training methodologies, and infrastructure.
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
Journal reference:
- Preliminary scientific report.
Asai, A., Zhong, Z., Chen, D., Koh, P. W., Zettlemoyer, L., Hajishirzi, H., & Yih, W. (2024, March 5). Reliable, Adaptable, and Attributable Language Models with Retrieval. arXiv. https://doi.org/10.48550/arXiv.2403.03187, https://arxiv.org/abs/2403.03187