Sensitivity and Bias in Large Language Models: A Comparative Study of SDG Detection Models


By Dr. Sampath Lonka
Reviewed by Susha Cheriyedath, M.Sc.
Aug 2 2023

In a recent paper submitted to the arXiv* server, researchers conducted a comprehensive study comparing the effectiveness of specialized language models and the Generative Pretrained Transformer (GPT)-3.5 model in detecting Sustainable Development Goals (SDGs) within text data. The research also delves into the challenges associated with large language models (LLMs), particularly concerning bias and sensitivity.

*Study: Sensitivity and Bias in Large Language Models: A Comparative Study of SDG Detection Models. Image credit: Ole.CNX/Shutterstock*

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

Background

The field of artificial intelligence (AI) has witnessed remarkable progress in text summarization and classification, largely due to the emergence of LLMs such as the GPT models. These systems can process massive amounts of text quickly, extracting key information and classifying documents with high accuracy. However, these advances bring new challenges related to the biases and sensitivity inherent in their training data and decision-making processes.

One major concern is that biases present in the training data become inadvertently embedded in LLMs, where they are difficult to detect and correct. In response, specialized language models are being developed; these require meticulous training, comprehensive data collection, feature engineering, and sensitivity fine-tuning to handle real-world complexity.

The present study focuses on the Sustainable Development Goals (SDGs) initiative, which was introduced in 2015 as part of the United Nations' 2030 Agenda. This initiative provides a comprehensive framework to address social, environmental, and economic challenges. However, the goals are open to diverse interpretations and are perceived as vague, so scholars and experts have proposed differing categorizations and indicators. To overcome this, the study employs expert consensus to validate natural language processing (NLP) models, reducing subjectivity bias and enhancing SDG understanding and automation.

Data used in the study

The data utilized in this analysis is sourced from the ongoing INNOSDG project, which aims to capture sustainable development activities arising from public funding, research and development efforts, or ecosystem collaboration. The dataset comprises information from 3,299 Finnish companies established between 2009 and 2022, with company descriptions sourced from reputable data providers such as CB Insights, Vainu, and Pitchbook.

The specialized SDG and GPT-3.5 models were deployed on this dataset. For the first comparison, 2,389 companies met the eligibility requirements, and for the second analysis, companies founded after September 2021 were excluded, resulting in a sample of 2,550. The final analysis on few-shot learning in GPT-3.5's categorization used a small sample from the specialized model's training dataset, detailed elsewhere.

What did the researchers do?

The researchers developed a specialized SDG detection model by creating a lexical query using an SDG terminology database to search for relevant scientific publications from 2015 to 2020. They constructed an advanced taxonomy to categorize SDG-related publications and employed machine learning (ML) classification algorithms to create the specialized model.
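To make the idea of a lexical query concrete, here is a minimal Python sketch of term-based SDG matching. The tiny lexicon and the function name `match_sdgs` are hypothetical and chosen for illustration; the study's actual terminology database, taxonomy, and ML classifiers are not reproduced here.

```python
import re

# Hypothetical mini-lexicon for illustration only; the study's real
# SDG terminology database is far larger and is not shown in the article.
SDG_TERMS = {
    2: ["food security", "malnutrition", "sustainable agriculture"],
    7: ["renewable energy", "solar power", "energy efficiency"],
}

def match_sdgs(text, lexicon=SDG_TERMS):
    """Return the sorted SDG numbers whose terms appear in the text."""
    lowered = text.lower()
    hits = set()
    for sdg, terms in lexicon.items():
        for term in terms:
            # Word boundaries keep a term from matching inside a longer word.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                hits.add(sdg)
                break  # one matching term is enough for this SDG
    return sorted(hits)

print(match_sdgs("We study solar power and food security in rural areas."))
```

A query like this would only retrieve candidate publications; in the study, classification was then handled by ML algorithms trained on the categorized results.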

The first experiment compared SDG detection by GPT-3.5 with that of the specialized model, using the prescribed (provider-supplied) company descriptions. GPT-3.5 iterated over the descriptions to detect SDGs, and a cleaning mechanism was introduced to address potential false positives. In the second experiment, GPT-3.5 first generated a description of each company from its own training data and then detected SDG orientation within those generated descriptions.
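The article does not describe how the cleaning mechanism works. One plausible form, sketched below as an assumption, is to parse the model's free-text reply and keep only valid SDG numbers (the 2030 Agenda defines SDG 1 through SDG 17). The function name `clean_sdg_response` is hypothetical.

```python
import re

VALID_SDGS = set(range(1, 18))  # SDG 1 through SDG 17

def clean_sdg_response(raw):
    """Extract valid, de-duplicated SDG numbers from a free-text reply.

    Assumed cleaning rule: accept only mentions written as "SDG <n>"
    and drop numbers outside 1..17.
    """
    numbers = re.findall(r"SDG\s*(\d{1,2})\b", raw, flags=re.IGNORECASE)
    return sorted({int(n) for n in numbers if int(n) in VALID_SDGS})

print(clean_sdg_response("SDG 7, SDG7 and sdg 13; possibly SDG 21"))
```

Filtering of this kind removes malformed labels, but it cannot judge whether a valid SDG is actually relevant to the description.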

Additionally, few-shot learning was assessed for the GPT model using a labeled training dataset of journal article abstracts. A sample of 200 observations was randomly selected, and the model was provided with ten examples each for SDG2 and SDG7 to test its performance.
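A few-shot setup of this kind can be sketched as a prompt builder that places labeled examples ahead of the abstract to be classified. The prompt wording and the function name `build_few_shot_prompt` are assumptions; the article does not give the prompt the researchers used.

```python
def build_few_shot_prompt(examples, abstract):
    """Assemble a few-shot prompt from (abstract, label) example pairs."""
    lines = [
        "Identify the Sustainable Development Goals addressed in each abstract.",
        "",
    ]
    for text, label in examples:
        lines.append(f"Abstract: {text}")
        lines.append(f"SDGs: {label}")
        lines.append("")
    # The final abstract is left unlabeled for the model to complete.
    lines.append(f"Abstract: {abstract}")
    lines.append("SDGs:")
    return "\n".join(lines)

# Illustrative examples; the study supplied ten each for SDG2 and SDG7.
examples = [
    ("Crop yields and hunger in smallholder farms.", "SDG2"),
    ("Wind turbine efficiency improvements.", "SDG7"),
]
print(build_few_shot_prompt(examples, "Solar microgrids for rural clinics."))
```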

For all analyses, a non-restrictive intersection approach was used to identify overlaps in SDG detection between the models: any SDG detected by both models for the same text counted as an overlap. The GPT-3.5 Turbo model was used for the experiments, with the temperature set to 0 to minimize randomness and creativity in its output.
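The non-restrictive intersection rule can be expressed with set operations. The sketch below assumes each model's output is stored as a mapping from a company identifier to a list of SDG numbers; the data layout and function names are illustrative, not taken from the paper.

```python
def has_overlap(sdgs_a, sdgs_b):
    """Non-restrictive rule: one shared SDG is enough to count as an overlap."""
    return bool(set(sdgs_a) & set(sdgs_b))

def overlap_rate(results_a, results_b):
    """Share of companies scored by both models with at least one common SDG."""
    shared = results_a.keys() & results_b.keys()
    if not shared:
        return 0.0
    matches = sum(has_overlap(results_a[c], results_b[c]) for c in shared)
    return matches / len(shared)

# Toy data, not results from the study.
specialized = {"A": [7], "B": [2, 3], "C": []}
gpt = {"A": [7, 9, 13], "B": [12], "C": [8]}
print(overlap_rate(specialized, gpt))  # only company A shares an SDG
```

Because a single shared SDG counts, this measure is lenient: two models can overlap on a company while disagreeing on most of the SDGs they assign to it.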

The paper describes the methodology for each experiment in detail; across them, GPT-3.5 was evaluated on samples of company descriptions and journal abstracts.

Study Results

The specialized model adopts a more conservative approach, detecting fewer SDGs but with higher reliability and relevance, while the GPT model employs a more liberal approach, detecting SDGs in a broader range of descriptions but potentially including less relevant information. Consequently, the average number of SDGs detected per company is higher in the GPT model.
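The difference between a conservative and a liberal detector shows up directly in the average number of SDGs assigned per company. The following sketch computes that average on toy data; the numbers are invented for illustration and are not the study's results.

```python
from statistics import mean

def average_sdgs_per_company(results):
    """Mean count of distinct SDGs per company; 0.0 if there are no companies."""
    if not results:
        return 0.0
    return mean(len(set(sdgs)) for sdgs in results.values())

# Toy data only: a conservative model versus a more liberal one.
specialized = {"A": [7], "B": [2], "C": []}
gpt = {"A": [7, 9, 13], "B": [2, 12], "C": [8]}
print(average_sdgs_per_company(specialized), average_sdgs_per_company(gpt))
```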

Comparing GPT-3.5's SDG detection on prescribed descriptions with its detection on GPT-generated descriptions shows considerable intersection, but the GPT-generated descriptions yield a higher average number of SDGs per company.

In the few-shot experiment on the labeled training dataset, GPT-3.5 shows a high capture rate for SDGs 2 and 7, but it overidentifies other SDGs and misidentifies some. Overall, 34% of the abstracts are correctly identified with the SDG label.
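A high capture rate can coexist with a low share of correctly identified abstracts when a model tends to add extra labels. The sketch below illustrates this under two assumed definitions that the article does not spell out: "capture" means the true label appears among the predictions, and "exact match" means the prediction contains the true label and nothing else. The data are invented.

```python
def capture_rate(true_labels, predictions):
    """Fraction of items whose true SDG appears among the predicted SDGs."""
    if not true_labels:
        return 0.0
    hits = sum(t in p for t, p in zip(true_labels, predictions))
    return hits / len(true_labels)

def exact_match_rate(true_labels, predictions):
    """Fraction of items where the prediction is exactly the true SDG."""
    if not true_labels:
        return 0.0
    hits = sum(set(p) == {t} for t, p in zip(true_labels, predictions))
    return hits / len(true_labels)

# Toy data only: the true label is often captured but rarely matched exactly.
truth = [2, 7, 2, 7]
preds = [[2], [7, 13], [2, 12], [9]]
print(capture_rate(truth, preds), exact_match_rate(truth, preds))
```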

Conclusion

In summary, the comparison between the specialized SDG model and OpenAI's GPT-3.5 highlights the trade-off between the broad coverage of general models and the precision of specialized models. They should not be considered interchangeable, and the choice should depend on the task's specific requirements. While LLMs show potential for more nuanced SDG detection with improved data training and parameter tuning, their black-box nature raises concerns about unexpected outcomes. When accuracy and transparency are crucial, specialized models may be more reliable. The study encourages researchers to consider the trade-offs, cost, complexity, and opacity of LLMs and explore alternatives, such as developing tailored ML models based on their data. The choice of model should be specific to each use case.

Journal reference:

Preliminary scientific report. Hajikhani, A., and Cole, C. (2023). A Critical Review of Large Language Models: Sensitivity, Bias, and the Path Toward Specialized AI. arXiv. https://arxiv.org/abs/2307.15425



Written by

Dr. Sampath Lonka

Dr. Sampath Lonka is a scientific writer based in Bangalore, India, with a strong academic background in Mathematics and extensive experience in content writing. He has a Ph.D. in Mathematics from the University of Hyderabad and is deeply passionate about teaching, writing, and research. Sampath enjoys teaching Mathematics, Statistics, and AI to both undergraduate and postgraduate students. What sets him apart is his unique approach to teaching Mathematics through programming, making the subject more engaging and practical for students.


Citations

Please use one of the following formats to cite this article in your essay, paper or report:

APA
Lonka, Sampath. (2023, August 02). Sensitivity and Bias in Large Language Models: A Comparative Study of SDG Detection Models. AZoAi. Retrieved on April 26, 2025 from https://www.azoai.com/news/20230802/Sensitivity-and-Bias-in-Large-Language-Models-A-Comparative-Study-of-SDG-Detection-Models.aspx.
MLA
Lonka, Sampath. "Sensitivity and Bias in Large Language Models: A Comparative Study of SDG Detection Models". AZoAi. 26 April 2025. <https://www.azoai.com/news/20230802/Sensitivity-and-Bias-in-Large-Language-Models-A-Comparative-Study-of-SDG-Detection-Models.aspx>.
Chicago
Lonka, Sampath. "Sensitivity and Bias in Large Language Models: A Comparative Study of SDG Detection Models". AZoAi. https://www.azoai.com/news/20230802/Sensitivity-and-Bias-in-Large-Language-Models-A-Comparative-Study-of-SDG-Detection-Models.aspx. (accessed April 26, 2025).
Harvard
Lonka, Sampath. 2023. Sensitivity and Bias in Large Language Models: A Comparative Study of SDG Detection Models. AZoAi, viewed 26 April 2025, https://www.azoai.com/news/20230802/Sensitivity-and-Bias-in-Large-Language-Models-A-Comparative-Study-of-SDG-Detection-Models.aspx.