A study published in the journal Scientific Reports proposes a new model-independent approach to distinguish texts written by humans versus those generated by artificial intelligence (AI) systems like Chat Generative Pre-trained Transformer (ChatGPT). According to the researchers, the syntax in texts authored by humans tends to differ from AI-generated texts, which can be quantified by analyzing redundancy in word usage. This study implements a Bayesian probabilistic method using a redundancy measure to support discrimination between human and AI authorship hypotheses, even for short texts.
With the advent of advanced language models like ChatGPT that can generate human-like text, the academic community is increasingly concerned about the inability to verify the authorship of scientific papers or student assignments potentially written using AI. Copyright lawsuits have already been filed in the US against AI companies by authors for unauthorized use of copyrighted books as training data. The core question is whether human versus ChatGPT authorship can be determined for a text.
As AI systems become more sophisticated at mimicking human writing styles, new techniques are required to detect machine-generated text. Stylometry, the statistical analysis of linguistic style, shows promise for identifying subtle differences between human and artificial writing. The syntax and vocabulary richness in human texts differ from the more formulaic structures of AI-generated language. Quantifying stylistic markers like redundancy can reveal these differences.
However, standard machine learning approaches rely on training data from the AI model itself, limiting generalizability. This study implements a model-independent metric based solely on n-gram usage. Combined with Bayesian hypothesis testing, it provides a probabilistic framework to assess evidence for competing authorship claims. Such computational forensic techniques will grow increasingly relevant as the proliferation of AI raises questions of scholarship verification and ownership.
About the Study
This study collected two text datasets: 75 articles from the journal Forensic Science International (1978-1985), termed 'Human', and an equal number of ChatGPT-generated texts on the same topics. For comparison, a second dataset of 71 original manuscripts by university students (also human-authored) and 49 ChatGPT-generated texts on similar themes was analyzed. All texts were standardized to 1,800 characters.
A redundancy measure was defined to quantify repetitive n-gram usage, indicating lower vocabulary variety. Values were computed using the Program of Textual Analysis by OrphAnalytics (PATOA) software for uni-, bi-, tri-, and quad-grams in each text. The style marker was then evaluated under competing human/ChatGPT authorship hypotheses using Bayes factors, enabling probabilistic classification.
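The exact formula implemented in the PATOA software is not reproduced here, but the general idea of quantifying repetitive n-gram usage can be sketched with a simple illustrative proxy: the fraction of n-gram tokens that repeat an n-gram already seen in the text. The function name and sample text below are purely for illustration.

```python
from collections import Counter

def ngram_redundancy(text: str, n: int) -> float:
    """Illustrative redundancy proxy: 1 - (distinct n-grams / total n-grams).
    Higher values mean more repetitive n-gram usage (lower variety).
    NOTE: a sketch of the concept, not the exact PATOA measure from the study."""
    words = text.lower().split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    return 1.0 - len(counts) / len(grams)

# Hypothetical sample: redundancy is computed for uni- to quad-grams,
# mirroring the n-gram orders analyzed in the study.
sample = "the cat sat on the mat and the cat sat on the rug"
scores = {n: ngram_redundancy(sample, n) for n in (1, 2, 3, 4)}
```

With this proxy, higher-order n-grams repeat less often than single words, so redundancy typically decreases as n grows; what matters for discrimination is how the values compare between human and AI texts at each order.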
The methodology demonstrates an original approach for stylistic analysis based on quantifying redundancy rather than the traditional focus on vocabulary richness. Unlike typical training-based machine learning models, Bayesian hypothesis testing also provides a formal probabilistic framework for evidence evaluation. The study design uses model-independent measures and limited sample sizes to simulate real-world forensic conditions.
The datasets encompass scientific writing and informal topics to assess consistency across domains. Standardizing relatively short excerpt lengths increases discrimination difficulty versus full-length articles. English and French texts are compared to check language independence. Establishing robust performance despite these constraints highlights applicability for practical questioned authorship scenarios.
Results
Both bivariate and multivariate analyses of n-gram redundancy indicated a clear separation between human and ChatGPT scores. The best discrimination was achieved using a multivariate model that considered all n-gram orders jointly.
Across both datasets, Bayes factors strongly supported the correct authorship hypothesis in most cases, with minimal misclassifications. Factors wrongly favoring human authorship for ChatGPT texts and vice versa had low magnitudes, indicating weak support. No difference was observed between texts on scientific versus non-scientific topics or between languages (English/French), highlighting the consistency of the redundancy measure.
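The Bayes-factor logic can be sketched in a toy form, assuming (purely for illustration) that redundancy scores in each population are roughly Gaussian with known parameters; the study's actual statistical model is more elaborate. All parameter values below are hypothetical.

```python
import math

def normal_pdf(x: float, mu: float, sd: float) -> float:
    """Density of a Gaussian with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def bayes_factor(score: float,
                 mu_h: float, sd_h: float,
                 mu_ai: float, sd_ai: float) -> float:
    """Illustrative Bayes factor BF = p(score | human) / p(score | ChatGPT).
    BF > 1 supports human authorship; BF < 1 supports ChatGPT authorship.
    A sketch of the evidence-evaluation idea, not the study's exact model."""
    return normal_pdf(score, mu_h, sd_h) / normal_pdf(score, mu_ai, sd_ai)

# Hypothetical population parameters: here ChatGPT texts are assumed
# more redundant on average than human texts.
bf_human_like = bayes_factor(score=0.30, mu_h=0.30, sd_h=0.05,
                             mu_ai=0.45, sd_ai=0.05)
bf_ai_like = bayes_factor(score=0.45, mu_h=0.30, sd_h=0.05,
                          mu_ai=0.45, sd_ai=0.05)
```

The magnitude of the Bayes factor gives a calibrated strength of evidence, which is why, as reported above, misclassifications with values close to 1 indicate only weak support for the wrong hypothesis.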
The approach was also tested on a small sample of five economics texts by Paul Krugman versus five ChatGPT-generated texts on economics. Again, the probabilistic method perfectly distinguished between the two without any misclassifications.
The results demonstrate successful discrimination between human and AI-generated texts based on quantifiable differences in redundancy. The consistency across domains and languages, even with minimal samples, supports the broad applicability of the approach. The Bayes factors provide a calibrated scale for evidence strength, with misclassifications occurring only in borderline, weak-support zones. This establishes the redundancy metric and Bayesian framework as a robust solution for questioned authorship in texts of varied provenance.
Future Outlook
This study demonstrates the feasibility of successfully discriminating between human and ChatGPT-authored texts of varying domains and languages using a redundancy-based Bayesian approach. The model's independence from AI training data or algorithms and its applicability, even with limited samples, make this a promising technique to tackle increasingly common authorship issues.
According to the researchers, stylometry and probabilistic evaluation offer a robust forensic text evidence analysis framework as AI usage expands. The ability to quantify writing style differences meaningfully via redundancy analysis and Bayesian hypothesis testing provides scientific rigor to authorship attribution.
Journal reference:
- Bozza, S., Roten, C.-A., Jover, A., Cammarota, V., Pousaz, L., & Taroni, F. (2023). A model-independent redundancy measure for human versus ChatGPT authorship discrimination using a Bayesian probabilistic approach. Scientific Reports, 13(1), 19217. https://doi.org/10.1038/s41598-023-46390-8