Sensitivity and Bias in Large Language Models: A Comparative Study of SDG Detection Models

In a recent paper submitted to the arXiv* server, researchers conducted a comprehensive study comparing the effectiveness of specialized language models and the Generative Pretrained Transformer (GPT)-3.5 model in detecting Sustainable Development Goals (SDGs) within text data. The research also delves into the challenges associated with large language models (LLMs), particularly concerning bias and sensitivity.

Study: Sensitivity and Bias in Large Language Models: A Comparative Study of SDG Detection Models. Image credit: Ole.CNX/Shutterstock
Study: Sensitivity and Bias in Large Language Models: A Comparative Study of SDG Detection Models. Image credit: Ole.CNX/Shutterstock

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.

Background

The field of artificial intelligence (AI) has witnessed remarkable progress in text summarization and classification, largely due to the emergence of large language models (LLMs), such as GPT models. These powerful systems can swiftly process massive amounts of text, extracting vital information and classifying documents with unprecedented accuracy. However, these advancements come with new challenges related to the biases and sensitivity inherent in the training data and decision-making processes.

One major concern with LLMs is the inadvertent embedding of biases in the training data, making it challenging to detect and rectify these biases. As a result, specialized language models are being developed, requiring meticulous training, comprehensive data collection, feature engineering, and sensitivity fine-tuning to address real-world complexities.

The present study focuses on the Sustainable Development Goals (SDGs) initiative, which was introduced in 2015 as part of the United Nations' 2030 Agenda. This initiative provides a comprehensive framework to address social, environmental, and economic challenges. However, challenges arise from diverse interpretations and perceived vagueness, leading to different categorizations and indicators proposed by scholars and experts. To overcome this, the study employs expert consensus to validate natural language processing (NLP) models, reducing subjectivity bias and enhancing SDG understanding and automation.

Data used in the study

The data utilized in this analysis is sourced from the ongoing INNOSDG project, which aims to capture sustainable development activities arising from public funding, research and development efforts, or ecosystem collaboration. The dataset comprises information from 3,299 Finnish companies established between 2009 and 2022, with company descriptions sourced from reputable data providers such as CB Insights, Vainu, and Pitchbook.

The specialized SDG and GPT-3.5 models were deployed on this dataset. For the first comparison, 2,389 companies met the eligibility requirements, and for the second analysis, companies founded after September 2021 were excluded, resulting in a sample of 2,550. The final analysis on few-shot learning in GPT-3.5's categorization used a small sample from the specialized model's training dataset, detailed elsewhere.

What did the researchers do?

The researchers developed a specialized SDG detection model by creating a lexical query using an SDG terminology database to search for relevant scientific publications from 2015 to 2020. They constructed an advanced taxonomy to categorize SDG-related publications and employed machine learning (ML) classification algorithms to create the specialized model.

The first experiment compared SDG detection in the GPT-3.5 model with the specialized model using prescribed company descriptions from data providers. The GPT-3.5 model iterated over the descriptions to detect SDGs, while a cleaning mechanism was introduced to address potential false positives. The second experiment leveraged GPT's ability to generate descriptions for companies and then detect SDG orientation within these descriptions based on the model's training data.

Additionally, few-shot learning was assessed for the GPT model using a labeled training dataset of journal article abstracts. A sample of 200 observations was randomly selected, and the model was provided with ten examples each for SDG2 and SDG7 to test its performance.

For all analyses, a non-restrictive intersection approach was used to identify overlaps in SDG detection between the models, considering any common SDGs as an overlap. The GPT-3.5 Turbo model was used for the experiments, with a temperature of 0 to minimize creativity.

The methodology for each experiment was thoroughly detailed, and GPT was deployed on a sample of company descriptions and abstracts to evaluate its performance in SDG detection.

Study Results

The specialized model adopts a more conservative approach, detecting fewer SDGs but with higher reliability and relevance, while the GPT model employs a more liberal approach, detecting SDGs in a broader range of descriptions but potentially including less relevant information. Consequently, the average number of SDGs detected per company is higher in the GPT model.

The comparison between SDG detection using GPT-3.5 on prescribed descriptions and GPT-generated descriptions for companies shows considerable intersection but differs in the average number of SDGs detected per company, with the GPT-generated descriptions resulting in more SDGs detected.

The GPT's performance in few-shot learning using a labeled training dataset demonstrates a high capture rate for SDGs 2 and 7, but it overidentifies other SDGs and misidentifies some SDGs. Overall, 34% of the abstracts are correctly identified with the SDG label.

Conclusion

In summary, the comparison between the specialized SDG model and OpenAI's GPT-3.5 highlights the trade-off between the broad coverage of general models and the precision of specialized models. They should not be considered interchangeable, and the choice should depend on the task's specific requirements. While LLMs show potential for more nuanced SDG detection with improved data training and parameter tuning, their black-box nature raises concerns about unexpected outcomes. When accuracy and transparency are crucial, specialized models may be more reliable. The study encourages researchers to consider the trade-offs, cost, complexity, and opacity of LLMs and explore alternatives, such as developing tailored ML models based on their data. The choice of model should be specific to each use case.

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.

Journal reference:
  • Preliminary scientific report. Hajikhani, A., and Cole, C. (2023). A Critical Review of Large Language Models: Sensitivity, Bias, and the Path Toward Specialized AI. arXiv. https://arxiv.org/abs/2307.15425
Dr. Sampath Lonka

Written by

Dr. Sampath Lonka

Dr. Sampath Lonka is a scientific writer based in Bangalore, India, with a strong academic background in Mathematics and extensive experience in content writing. He has a Ph.D. in Mathematics from the University of Hyderabad and is deeply passionate about teaching, writing, and research. Sampath enjoys teaching Mathematics, Statistics, and AI to both undergraduate and postgraduate students. What sets him apart is his unique approach to teaching Mathematics through programming, making the subject more engaging and practical for students.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Lonka, Sampath. (2023, August 02). Sensitivity and Bias in Large Language Models: A Comparative Study of SDG Detection Models. AZoAi. Retrieved on July 06, 2024 from https://www.azoai.com/news/20230802/Sensitivity-and-Bias-in-Large-Language-Models-A-Comparative-Study-of-SDG-Detection-Models.aspx.

  • MLA

    Lonka, Sampath. "Sensitivity and Bias in Large Language Models: A Comparative Study of SDG Detection Models". AZoAi. 06 July 2024. <https://www.azoai.com/news/20230802/Sensitivity-and-Bias-in-Large-Language-Models-A-Comparative-Study-of-SDG-Detection-Models.aspx>.

  • Chicago

    Lonka, Sampath. "Sensitivity and Bias in Large Language Models: A Comparative Study of SDG Detection Models". AZoAi. https://www.azoai.com/news/20230802/Sensitivity-and-Bias-in-Large-Language-Models-A-Comparative-Study-of-SDG-Detection-Models.aspx. (accessed July 06, 2024).

  • Harvard

    Lonka, Sampath. 2023. Sensitivity and Bias in Large Language Models: A Comparative Study of SDG Detection Models. AZoAi, viewed 06 July 2024, https://www.azoai.com/news/20230802/Sensitivity-and-Bias-in-Large-Language-Models-A-Comparative-Study-of-SDG-Detection-Models.aspx.

Comments

The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of AZoAi.
Post a new comment
Post

While we only use edited and approved content for Azthena answers, it may on occasions provide incorrect responses. Please confirm any data provided with the related suppliers or authors. We do not provide medical advice, if you search for medical information you must always consult a medical professional before acting on any information provided.

Your questions, but not your email details will be shared with OpenAI and retained for 30 days in accordance with their privacy principles.

Please do not ask questions that use sensitive or confidential information.

Read the full Terms & Conditions.

You might also like...
Accent Classification with Deep Learning Models