Large language models (LLMs) such as Generative Pre-trained Transformers (GPT)-4, Pathways Language Model (PaLM), and Large Language Model Meta AI (Llama) have greatly advanced artificial intelligence (AI)-generated text. Concerns surrounding their potential misuse underscore the need for AI-generated text forensics. Neural authorship attribution (AA) traces AI-generated text back to its source LLM, categorizing sources as proprietary or open-source.
*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.
In a recent paper submitted to the arXiv* server, researchers analyzed LLM writing styles empirically, comparing these categories and investigating their use in AA, contributing to countering AI-generated misinformation threats.
Background
Recent strides in generative LLMs such as GPT-4, OpenAI's proprietary model, PaLM from Google, and open-source models, including Llama 1 and 2, have revolutionized AI-generated content, mainly textual. While AI-generated text is a boon for human productivity, its potential misuse for influence operations and misinformation dissemination poses severe risks to cybersecurity and information integrity. The surge in seemingly genuine yet deceitful AI-generated news articles raises concerns. Robust computational methodologies are therefore urgently needed for forensic evaluation of AI-generated text, mitigating the dissemination of misinformation driven by LLMs.
A pivotal facet of AI-generated text forensics is neural authorship attribution, identifying the LLM behind a specific text. This aids in uncovering malicious actors and their strategies, informs countermeasures, and refines LLM usage ethics. Typically, neural authorship attribution involves training a classifier on pre-trained language model (PLM) embeddings, such as those from the Robustly Optimized BERT pre-training approach (RoBERTa), using texts generated by known LLMs. However, the evolution of the LLM landscape introduces novel dimensions. Categorizing source LLMs as proprietary or open-source holds merit, potentially revealing campaign nuances such as actor resources and expertise: choosing an open-source LLM, for instance, might indicate specialized skills and computational infrastructure.
Unveiling LLM writing signatures for enhanced neural AA
In previous research, AA focused on identifying human writers through distinctive signatures. Early approaches used classical machine learning classifiers such as Naïve Bayes and Support Vector Machines (SVMs) with features such as n-grams, part-of-speech (POS) tags, and topic models. Neural networks, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), later gained prominence due to their richer learned text representations. The rise of transformer-based models introduced the notion of neural authors, where the attribution target is the generating language model itself. Initial studies applied traditional AA features to neural authors. Recent work employed PLM classifiers to attribute text to models such as Grover, GPT-2, and GPT-3, including attributing fine-tuned instances back to their base LLMs.
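The classical AA setup described above can be sketched as a small pipeline: n-gram features fed to a linear SVM. The toy corpus and author labels below are hypothetical stand-ins for illustration, not data from the paper.

```python
# Classical authorship attribution sketch: character n-gram TF-IDF + linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical two-author toy corpus.
texts = [
    "The markets rallied sharply as investors cheered the new policy.",
    "Investors cheered as markets rallied on the policy announcement.",
    "One must carefully consider, however, the long-term implications.",
    "However, one must consider the implications over the long term.",
]
labels = ["author_a", "author_a", "author_b", "author_b"]

# Character n-grams within word boundaries are a common stylometric surrogate.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LinearSVC(),
)
clf.fit(texts, labels)
pred = clf.predict(["Markets rallied as investors cheered."])[0]
```

The same scaffold transfers to neural AA by swapping the feature extractor for PLM embeddings and the labels for source LLMs.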
The researchers aimed to measure the writing signatures of both open-source and proprietary LLMs for more transparent neural AA through a three-step methodology. For dataset generation, various LLMs were used: GPT-3.5, GPT-4, GPT-NeoX, Llama 1, and Llama 2. GPT-3.5 and GPT-4 are proprietary; the rest are open-source. Focusing on news articles controls for domain-specific variation. The researchers prompted each model with human-authored article headlines, yielding six thousand AI-generated news articles from the chosen LLMs.
The writing signature of each LLM was assessed through stylometric features encompassing lexical, syntactic, and structural attributes. Lexical features capture vocabulary richness, such as lexical diversity; syntactic features measure sentence length, POS frequency, active versus passive voice, and tense; structural features quantify paragraph length, punctuation usage, and capitalization. Features were normalized before forming the final writing-signature vector.
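A simplified, standard-library sketch of such a signature vector follows: one lexical, two syntactic-adjacent, and two structural measurements, min-max normalized across the corpus. The specific feature choices are illustrative, not the paper's exact feature set.

```python
# Stylometric writing-signature sketch with min-max normalization.
import re
import string

def signature(text: str) -> list[float]:
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    return [
        len({w.lower() for w in words}) / n_words,                        # lexical diversity (type-token ratio)
        n_words / max(len(sentences), 1),                                 # mean sentence length in words
        sum(len(w) for w in words) / n_words,                             # mean word length
        sum(c in string.punctuation for c in text) / max(len(text), 1),   # punctuation rate
        sum(c.isupper() for c in text) / max(len(text), 1),               # capital-letter rate
    ]

def normalize(vectors: list[list[float]]) -> list[list[float]]:
    # Min-max normalize each feature column across the corpus.
    cols = list(zip(*vectors))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(vec, lo, hi)]
        for vec in vectors
    ]

docs = [
    "Short, punchy claims. Bold words!",
    "A longer, more meandering sentence that unspools gradually and at length.",
]
vecs = normalize([signature(d) for d in docs])
```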
Neural AA is treated as a classification problem: a classifier learns the decision boundary within the space of writing signatures and attributes text to its source LLM. Initially, binary classification distinguished proprietary from open-source LLMs; multi-class classification then analyzed neural authorship within each category. Data samples were balanced for each classification task. Classification models included XGBoost with stylometry (XGBstylo) and bag-of-words (XGBbow) features, RoBERTa model variations, and a fusion of stylometry and RoBERTa embeddings, enhancing neural AA. Finally, these embeddings are incorporated into the proposed interpretable neural AA model.
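The fusion idea can be sketched as simple feature-level concatenation: join a PLM embedding with the stylometry vector and train one classifier on the combined representation. Here, random vectors stand in for RoBERTa embeddings, a synthetic label stands in for the proprietary/open-source split, and logistic regression stands in for the paper's exact models.

```python
# Feature-level fusion sketch: [PLM embedding | stylometry] -> classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d_embed, d_stylo = 40, 16, 5

embeddings = rng.normal(size=(n, d_embed))   # stand-in for RoBERTa embeddings
stylometry = rng.normal(size=(n, d_stylo))   # stand-in writing-signature vectors
y = (stylometry[:, 0] > 0).astype(int)       # synthetic binary label (e.g., proprietary vs. open-source)

fused = np.hstack([embeddings, stylometry])  # concatenate the two feature spaces
clf = LogisticRegression(max_iter=1000).fit(fused, y)
acc = clf.score(fused, y)
```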
Analyzing Proprietary and Open-Source LLMs for Neural AA
Evaluation of proprietary and open-source LLMs was performed separately. For classification, researchers balanced six thousand data samples, utilizing XGBoost and RoBERTa models, both fine-tuned and not, along with a fusion of RoBERTa embeddings and stylometry features for enhanced neural AA.
Initial Analysis: To gain deeper insight into the uniqueness of and similarities among the examined LLMs, researchers conducted t-distributed stochastic neighbor embedding (t-SNE) analyses on RoBERTa embeddings and stylometry features. Both spaces show clear proprietary versus open-source distinctions, with stylometry exhibiting more pronounced separation. Overlaps occur within LLM categories: GPT-3.5 and GPT-4 overlap among proprietary LLMs, while Llama 1 and GPT-NeoX overlap among open-source ones. Intriguingly, Llama 2 aligns with proprietary LLMs in stylometry, possibly because efforts to match the engaging text generation of proprietary models have narrowed the stylistic gap.
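The t-SNE step reduces each article's high-dimensional representation to two coordinates for plotting. A minimal sketch, run here on random stand-in vectors rather than the paper's RoBERTa embeddings or stylometry features:

```python
# t-SNE projection sketch: high-dimensional features -> 2-D coordinates.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(60, 20))  # stand-in: one feature vector per generated article
coords = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(features)
```

In practice, each point would be colored by its source LLM to reveal the category clusters described above.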
Proprietary versus Open-Source Attribution: Experiments on AA reveal that initial attribution, notably with the fusion model, performs well. Including Llama 2 in the open-source category alone lowers performance by 7.4%, indicating growing complexity. Shapley additive explanations (SHAP) analysis on XGBoost highlights lexical diversity, specific syntactic features such as preposition, adjective, and noun frequencies, and structural attributes as key category discriminators.
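SHAP explains which features drive a model's predictions; as a dependency-light stand-in for SHAP on XGBoost, the sketch below ranks features via scikit-learn's permutation importance on a gradient-boosting classifier. The data and feature names are synthetic illustrations, with the label deliberately driven by the first feature.

```python
# Feature-importance sketch (permutation importance as a stand-in for SHAP).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 4))
y = (X[:, 0] + 0.1 * rng.normal(size=80) > 0).astype(int)  # label driven mostly by feature 0
names = ["lexical_diversity", "preposition_rate", "adjective_rate", "noun_rate"]

model = GradientBoostingClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
top = names[int(np.argmax(result.importances_mean))]  # most influential feature
```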
Intra-Category Attribution: Proprietary models, when attribution is enhanced with stylometry, demonstrate strong performance. Conversely, open-source LLMs such as GPT-NeoX and Llama 1 exhibit reduced performance, while Llama 2 stands out with a style distinct from its predecessors.
Conclusion
In summary, the current study delved into neural AA, differentiating proprietary and open-source LLMs. Detailed stylometric analysis unveiled distinct writing style indicators: lexical diversity, POS, and structural features. These insights improve attribution techniques and illuminate LLM evolution. Similarities in open-source models could stem from shared pre-training or architecture. However, Llama 2's uniqueness hints at open-source potential. Understanding LLM nuances becomes crucial to counter misinformation threats in the AI-generated content era.