Novel Watermarking Techniques for Identifying AI-Generated Text

In an article recently submitted to the arXiv* server, researchers explored watermarking to differentiate generated text from natural text. They introduced new statistical tests with robust theoretical guarantees, even at very low false-positive rates. The study used classical natural language processing (NLP) benchmarks to compare watermark effectiveness. It developed advanced detection schemes for scenarios with access to the large language model (LLM), including multi-bit watermarking.

Study: Novel Watermarking Techniques for Identifying AI-Generated Text. Image Credit: BOY ANTHONY/Shutterstock

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.

Background

Past work has highlighted the potential misuse of LLMs for disinformation, impersonation, and academic dishonesty. State-of-the-art methods propose watermarking to distinguish generated text from real text. The study builds on these methods by introducing new statistical tests with robust guarantees against false positives, comparing the effectiveness of watermarks on traditional NLP benchmarks, and developing advanced detection schemes and multi-bit watermarking techniques to improve identification accuracy and trace specific LLM versions.

Advanced Watermarking Techniques

LLMs generate text by estimating the likelihood of token sequences, using various sampling methods for generation. Watermarking techniques modify the token distribution or the sampling process to embed invisible traces in the text. These methods involve altering token probabilities or using deterministic sampling based on a secret key, which helps detect whether text is watermarked.
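As a rough illustration of the first family of methods (altering token probabilities), the sketch below seeds a pseudo-random "greenlist" from a secret key and the preceding tokens, then raises the logits of green tokens. The function names, the SHA-256 seeding, and the parameters gamma (greenlist fraction) and delta (logit bias) are assumptions made for this example, not the exact construction used in the study.

```python
import hashlib
import random


def greenlist(context, vocab_size, key, gamma=0.5):
    """Return the set of 'green' token ids for a given context and secret key."""
    # Hash the key together with the context tokens to seed the generator,
    # so the same key and context always give the same greenlist.
    payload = f"{key}:" + ",".join(map(str, context))
    seed = int.from_bytes(hashlib.sha256(payload.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def bias_logits(logits, context, key, gamma=0.5, delta=2.0):
    """Add delta to the logits of green tokens and leave the others unchanged."""
    green = greenlist(context, len(logits), key, gamma)
    return [x + delta if i in green else x for i, x in enumerate(logits)]
```

Sampling from the biased logits makes green tokens more frequent than chance, which is the trace a detector holding the same key can later look for.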

The detection process involves statistical tests like Z-tests to differentiate between natural and watermarked text, with adjustments for quality and robustness. Key management ensures diversity and synchronization using cryptographic functions. Recent improvements refine statistical methods and scoring strategies to address false positive rates (FPR) and enhance detection accuracy.
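A Z-test of this kind can be sketched as follows, assuming a greenlist-style watermark in which each token of natural text is green with probability gamma. The function names and the default gamma are illustrative assumptions.

```python
import math


def z_score(green_count, total, gamma=0.5):
    """Standardized excess of green tokens over what natural text would give."""
    mean = gamma * total
    std = math.sqrt(total * gamma * (1 - gamma))
    return (green_count - mean) / std


def z_test_p_value(green_count, total, gamma=0.5):
    """One-sided p-value under a normal approximation of the green count."""
    z = z_score(green_count, total, gamma)
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))
```

Because this relies on a normal approximation, it is only trustworthy for long texts with many independent tokens.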

Challenges in Watermark Detection

Large-scale evaluations reveal a gap between theoretical and practical FPR in watermark detection. The researchers selected 100,000 texts from multilingual Wikipedia and ran detection tests with varying window lengths (h) for seeding the random number generator (RNG); the empirical FPRs were much higher than the theoretical ones. The larger the watermark context window, the more closely the results aligned with the theoretical guarantees. However, reliable p-values require a large h, which compromises the robustness of the watermark against text editing.
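To make the comparison concrete: the empirical FPR at a threshold alpha is the fraction of natural texts whose p-value falls at or below alpha, and a well-calibrated test should give a value close to alpha itself. A minimal sketch, with a hypothetical function name:

```python
def empirical_fpr(p_values, alpha):
    """Fraction of natural (unwatermarked) texts flagged at threshold alpha."""
    if not p_values:
        raise ValueError("need at least one p-value")
    flagged = sum(1 for p in p_values if p <= alpha)
    return flagged / len(p_values)
```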

The researchers developed new non-asymptotic tests to address the limitations of Z-tests for short or repetitive texts. For the greenlist watermark method, the score of natural text follows a binomial distribution, and p-values are computed using the regularized incomplete Beta function. For the deterministic sampling method, the score follows a gamma distribution, and p-values are calculated using the upper incomplete gamma function. These new tests significantly reduce the gap between empirical and theoretical FPR, particularly at low FPR values.
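Both p-values can be computed exactly with the standard library, as in the sketch below. The binomial upper tail P(X >= S) for X ~ Binomial(T, gamma) equals the regularized incomplete Beta function I_gamma(S, T - S + 1), and for an integer shape T the regularized upper incomplete gamma function has a finite-sum closed form. The function names and the unit-scale gamma assumption are choices made for this example.

```python
import math


def binomial_p_value(green_count, total, gamma=0.5):
    """P(X >= green_count) for X ~ Binomial(total, gamma)."""
    return sum(
        math.comb(total, k) * gamma**k * (1 - gamma) ** (total - k)
        for k in range(green_count, total + 1)
    )


def gamma_p_value(score, total):
    """P(X >= score) for X ~ Gamma(shape=total, scale=1), total a positive integer."""
    # Closed form for integer shape: exp(-s) * sum_{k=0}^{total-1} s^k / k!
    term, acc = 1.0, 1.0  # k = 0 term
    for k in range(1, total):
        term *= score / k
        acc += term
    return math.exp(-score) * acc
```

For very long texts or very large scores, the plain floating-point sums above can lose precision or underflow; a library routine for the incomplete Beta and gamma functions would be the safer choice there.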

Even with improved statistical tests, empirical FPRs remained higher than theoretical ones because the random variables are only pseudo-random, which matters most in formatted data with repeated sequences. Two heuristics were tested to mitigate this issue: scoring a token only if its watermark context window had not been seen before, and scoring a token only if the (h + 1)-tuple formed by the watermark context and the current token had not been seen before. The latter proved more effective, bringing empirical and theoretical FPRs into agreement except for h = 0. Combining the new statistical tests with scoring only unique token sequences thus keeps the FPR in line with its theoretical guarantee.
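The second heuristic amounts to deduplicating (h + 1)-tuples before scoring. A minimal sketch, assuming tokens are given as a list of ids and using a hypothetical function name:

```python
def scorable_positions(tokens, h):
    """Return indices of tokens whose (context, token) tuple has not appeared yet."""
    seen = set()
    keep = []
    for i in range(h, len(tokens)):
        # The h preceding tokens plus the current token form the (h + 1)-tuple.
        ngram = tuple(tokens[i - h : i + 1])
        if ngram not in seen:
            seen.add(ngram)
            keep.append(i)
    return keep
```

Only the returned positions would then contribute to the detection score and to the token count used in the p-value.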

Watermarking Evaluation Overview

The study then evaluates watermarking methods using the revised statistical tests and explores their impact on natural language processing benchmarks. The focus is on how effectively these methods detect watermarked texts under stringent detection thresholds that ensure a low FPR. Evaluations are conducted in a simulated chatbot scenario using Large Language Model Meta AI (LLaMA) models, examining different watermark strengths and simulating attacks such as token replacements.

Results highlight varying levels of success in achieving true positive rates (TPR) across different methods, alongside considerations of semantic distortion measured by sentence-bidirectional encoder representations from transformers (S-BERT) scores. Furthermore, the analysis investigates how watermark context width (h) influences token repetition and overall detection sensitivity, which is crucial for balancing robustness and accuracy in real-world applications.

Watermarking's impact on free-form generation tasks is then assessed across several key natural language processing benchmarks. Unlike traditional quality metrics such as perplexity or similarity scores, which may overlook subtle errors introduced by watermarking, this evaluation directly measures performance in tasks like closed-book question answering, mathematical reasoning, and code generation. Larger models demonstrate greater resilience, suggesting that their advanced generative capabilities help mitigate the negative effects of watermarking on practical applications.

Conclusion

To sum up, this research provided theoretical and empirical insights previously overlooked in the literature on watermarks for LLMs. Existing methods were found to rely on biased statistical tests, resulting in inaccurate false positive rates. This was addressed by introducing grounded statistical tests and a revised scoring strategy. Evaluation setups and detection schemes were also introduced to strengthen the application of watermarks for LLMs.

Future work may explore adapting watermarks for more complex sampling schemes, such as beam search, which have significantly improved generation quality. Despite being relatively new in the context of generative models, watermarking has proven reliable and practical for identifying and tracing LLM outputs.

Journal reference:
  • Preliminary scientific report. Fernandez, P., et al. (2023). Three Bricks to Consolidate Watermarks for Large Language Models. ArXiv. DOI: 10.48550/arxiv.2308.00113, https://arxiv.org/abs/2308.00113

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chandrasekar, Silpaja. (2024, June 28). Novel Watermarking Techniques for Identifying AI-Generated Text. AZoAi. Retrieved on July 01, 2024 from https://www.azoai.com/news/20240628/Novel-Watermarking-Techniques-for-Identifying-AI-Generated-Text.aspx.

  • MLA

    Chandrasekar, Silpaja. "Novel Watermarking Techniques for Identifying AI-Generated Text". AZoAi. 01 July 2024. <https://www.azoai.com/news/20240628/Novel-Watermarking-Techniques-for-Identifying-AI-Generated-Text.aspx>.

  • Chicago

    Chandrasekar, Silpaja. "Novel Watermarking Techniques for Identifying AI-Generated Text". AZoAi. https://www.azoai.com/news/20240628/Novel-Watermarking-Techniques-for-Identifying-AI-Generated-Text.aspx. (accessed July 01, 2024).

  • Harvard

    Chandrasekar, Silpaja. 2024. Novel Watermarking Techniques for Identifying AI-Generated Text. AZoAi, viewed 01 July 2024, https://www.azoai.com/news/20240628/Novel-Watermarking-Techniques-for-Identifying-AI-Generated-Text.aspx.
