AI-Powered Wikipedia Verification System (SIDE) Enhances Content Verifiability

In an article published in the journal Nature Machine Intelligence, the authors discuss the importance of verifiability in Wikipedia content and introduce an Artificial Intelligence (AI) system called SIDE (System for Improving the Verifiability of Wikipedia References) to improve the quality of references.

Study: AI-Powered Wikipedia Verification System (SIDE) Enhances Content Verifiability. Image credit: Generated using DALL·E 3

Background

Wikipedia, as a widely accessed knowledge source, provides in-line citations to support the claims in its content, but the existing process of verification is inadequate. An in-line citation might come from a questionable source or it may not entail the claim in the text. Although such claims can still be true, readers cannot promptly verify them using the cited source. Thus, there is an urgent need for tools that can help readers easily verify Wikipedia claims.

The present paper underscores the significance of ensuring that knowledge on Wikipedia is verifiable, given its extensive user base and page views. The research aims to address this challenge by combining AI with human efforts to enhance the credibility of Wikipedia content. The SIDE system utilizes neural networks and information retrieval to identify unreliable citations and recommend better alternatives from the web, learning from existing Wikipedia references.

System Architecture of SIDE

The Retrieval Engine: When confronted with a Wikipedia claim marked as 'failed verification' by either a human editor or SIDE's own verification engine, the system initiates a retrieval process to gather documents that support the claim. SIDE employs both sparse and dense retrieval mechanisms, drawing on various contextual cues.

Claim Context: To formulate a search query, SIDE leverages the context of the claim, which includes the preceding sentences, the section title, and the title of the enclosing Wikipedia article.
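
As a rough illustration of how such a claim context could be assembled, the following Python sketch joins the cues named above into a single string. The function name, the "[SEP]" separator, the max_sentences cut-off, and the choice to append the claim itself are all assumptions made for this example and are not taken from the paper.

```python
def build_claim_context(article_title, section_title, preceding_sentences,
                        claim, max_sentences=3):
    """Join the contextual cues into one query string (hypothetical format)."""
    # Keep only the most recent preceding sentences; guard against
    # max_sentences == 0, because a [-0:] slice would return the whole list.
    recent = preceding_sentences[-max_sentences:] if max_sentences > 0 else []
    parts = [article_title, section_title, *recent, claim]
    # Drop empty fields so the separator never appears twice in a row.
    return " [SEP] ".join(p.strip() for p in parts if p and p.strip())
```

A retriever could then take the returned string as its input, either directly or after query expansion.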

Sparse Retriever with Generative Query Expansion: This component uses a sequence-to-sequence (seq2seq) model to convert the claim context into a query. It then matches this query against a BM25 index of Sphere, a vast web-scale data source. Training the seq2seq model with data from Wikipedia aids in generating more effective query expansions.
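
To make the sparse matching step more concrete, here is a minimal Python sketch of Okapi BM25 scoring over a handful of in-memory passages. It is not SIDE's implementation: the seq2seq query expansion is not reproduced (the query is assumed to be given), whitespace tokenization stands in for a real analyzer, and the parameter values k1=1.5 and b=0.75 are common defaults rather than values reported in the article.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic whitespace tokenizer; a real index would use a proper analyzer.
    return text.lower().split()


def bm25_scores(query, passages, k1=1.5, b=0.75):
    """Return one BM25 score per passage for the given query."""
    docs = [tokenize(p) for p in passages]
    if not docs:
        return []
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many passages each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Passages that share no term with the query receive a score of zero, which is why an expanded query with additional relevant terms can surface documents that the raw claim would miss.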

Dense Passage Retriever: The dense retrieval sub-system employs neural networks to encode the citation context into a dense query vector, which is subsequently matched against passage encodings. Because matching happens in vector space rather than on exact terms, the system can retrieve passages that rephrase the claim, which complements the sparse retriever.
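
The matching step of dense retrieval can be sketched in Python as an inner-product search between a query vector and passage vectors. The encoder below is only a stand-in: it is a hashed bag-of-words function, not a neural network, so unlike the encoders described above it cannot recognize paraphrases. The names toy_encode and dense_search and the dimension of 64 are hypothetical.

```python
import math
import zlib

DIM = 64  # hypothetical embedding size, chosen only for this example


def toy_encode(text, dim=DIM):
    """Stand-in for a neural encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def dense_search(query, passages, top_k=2):
    """Return the top_k (score, passage) pairs by inner product."""
    q = toy_encode(query)
    scored = [(sum(a * b for a, b in zip(q, toy_encode(p))), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

In a real system the passage vectors would be computed once and stored in an index, so only the query has to be encoded at search time.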

The Verification Engine: The verification engine plays a pivotal role in assessing the extent to which a claim is supported by the evidence provided. It operates on a per-passage level and determines the verification score for documents. Key aspects of this engine include the following:

  • Verification process: The engine, driven by a fine-tuned BERT transformer, takes the claim and a document as input and predicts how well the document supports the claim.
  • Prioritizing verifiability: Rather than making binary verification decisions, SIDE's verification engine ranks claim-document pairs by their verifiability. This approach aligns with SIDE's goal of guiding users toward more reliable citations and claims.
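
The ranking behaviour described in the points above can be sketched in Python as follows. The scorer here is a placeholder based on token overlap, not the fine-tuned transformer, and taking the maximum passage score as the document score is an assumption made for this sketch. The function names and the example.org URLs in the usage note are hypothetical.

```python
def toy_support_score(claim, passage):
    """Placeholder scorer: fraction of claim tokens that appear in the passage."""
    claim_tokens = set(claim.lower().split())
    passage_tokens = set(passage.lower().split())
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & passage_tokens) / len(claim_tokens)


def document_score(claim, passages, scorer=toy_support_score):
    # Assumption: a document is as verifiable as its best-supporting passage.
    return max((scorer(claim, p) for p in passages), default=0.0)


def rank_documents(claim, documents, scorer=toy_support_score):
    """documents maps a URL to its list of passages; returns (url, score) pairs,
    best first. No threshold is applied, so the output is a ranking rather
    than a binary verdict."""
    scored = [(url, document_score(claim, passages, scorer))
              for url, passages in documents.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For example, rank_documents("the treaty was signed in 1855", {"https://example.org/a": [...], "https://example.org/b": [...]}) returns both URLs ordered by how well their best passage covers the claim.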

The architecture of SIDE embodies a synergistic blend of advanced AI techniques, retrieval mechanisms, and verification processes. SIDE's development draws on a rich dataset of Wikipedia contributions and is poised to contribute significantly to the quality and reliability of information on this widely accessed platform.

Dataset and Training for SIDE

To train SIDE's neural network components, a dataset called WAFER is created from the vast pool of Wikipedia citations. This data, however, is inherently noisy as even existing citations may fail verification. While training is based on claim-document pairs, SIDE operates at the passage level.

The expectation-maximization (EM) algorithm is used for training. The verification engine employs a RoBERTa transformer architecture, producing verification scores for claim-document pairs. EM identifies the best supporting passage for a claim, even amid challenges like a lack of gold passages. Negative examples, generated from references with incorrect claims, are introduced for both retriever and verification engine training.

This unique training approach leverages Wikipedia's data scale, addressing verification challenges and ensuring the neural network components align with SIDE's goal of improving Wikipedia content verifiability.

Evaluation and Results

Evaluating SIDE involves a two-step approach:

Retrieval Evaluation

  • SIDE successfully retrieves existing citation sources from the vast web pool.
  • A combination of sparse and dense retrieval methods proves most effective.
  • The verification engine consistently ranks the original citation source highly.

Detecting Failed Verification

  • The verification engine adeptly identifies citations that fail verification, particularly at the passage level.
  • Utilizing URL depth as a baseline helps distinguish citation quality.
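
A URL-depth signal of the kind mentioned above could be computed along the following lines in Python. The article does not define the measure precisely, so counting non-empty path segments is an assumption of this sketch, as is the function name url_depth.

```python
from urllib.parse import urlparse


def url_depth(url):
    """Count the non-empty path segments of a URL (assumed definition of depth)."""
    path = urlparse(url).path
    return sum(1 for segment in path.split("/") if segment)
```

Under this definition a site's landing page has depth 0, while a specific article several directories down has a larger depth. One plausible reading is that a deeper URL is more likely to point at a specific page that can support a specific claim.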

Evaluation of the Final System

  • A large-scale human assessment indicates that SIDE's citation suggestions align with user preferences.
  • A fine-grained evaluation with Wikipedia users further validated SIDE's ability to select reliable citation pairs.

In essence, SIDE showcases its prowess in enhancing citation quality and verifiability, aligning with user preferences and standards.

Conclusion

The researchers developed SIDE, an AI-based system to enhance the verifiability of Wikipedia citations. They illustrated how recent advances in natural language processing empower machines to assist humans in the intricate task of finding better citations. This endeavor necessitates a profound comprehension of language and a mastery of online search.

While prior research demonstrated the capabilities of neural networks in understanding natural language, these results often originated from well-defined tasks and synthetic datasets. In contrast, the authors’ work showcases similar achievements in a real-world context, where data is noisier, and the task is less rigorously defined. The study's primary objective is not to set new benchmarks but to emphasize that current technologies have matured to a point where they can substantially aid Wikipedia users in verifying claims.

Journal reference:
  • Petroni, F., Broscheit, S., Piktus, A., Lewis, P., Izacard, G., Hosseini, L., Dwivedi-Yu, J., Lomeli, M., Schick, T., Bevilacqua, M., Mazaré, P.-E., Joulin, A., Grave, E., & Riedel, S. (2023). Improving Wikipedia verifiability with AI. Nature Machine Intelligence, 5(10), 1142–1148. DOI: 10.1038/s42256-023-00726-1, https://www.nature.com/articles/s42256-023-00726-1

Article Revisions

  • Jul 4 2024 - Fixed broken journal link.

Written by

Soham Nandi

Soham Nandi is a technical writer based in Memari, India. His academic background is in Computer Science Engineering, specializing in Artificial Intelligence and Machine learning. He has extensive experience in Data Analytics, Machine Learning, and Python. He has worked on group projects that required the implementation of Computer Vision, Image Classification, and App Development.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Nandi, Soham. (2024, July 03). AI-Powered Wikipedia Verification System (SIDE) Enhances Content Verifiability. AZoAi. Retrieved on November 17, 2024 from https://www.azoai.com/news/20231023/AI-Powered-Wikipedia-Verification-System-(SIDE)-Enhances-Content-Verifiability.aspx.

  • MLA

    Nandi, Soham. "AI-Powered Wikipedia Verification System (SIDE) Enhances Content Verifiability". AZoAi. 17 November 2024. <https://www.azoai.com/news/20231023/AI-Powered-Wikipedia-Verification-System-(SIDE)-Enhances-Content-Verifiability.aspx>.

  • Chicago

    Nandi, Soham. "AI-Powered Wikipedia Verification System (SIDE) Enhances Content Verifiability". AZoAi. https://www.azoai.com/news/20231023/AI-Powered-Wikipedia-Verification-System-(SIDE)-Enhances-Content-Verifiability.aspx. (accessed November 17, 2024).

  • Harvard

    Nandi, Soham. 2024. AI-Powered Wikipedia Verification System (SIDE) Enhances Content Verifiability. AZoAi, viewed 17 November 2024, https://www.azoai.com/news/20231023/AI-Powered-Wikipedia-Verification-System-(SIDE)-Enhances-Content-Verifiability.aspx.
