In an article in press with the journal Trends in Plant Science, researchers investigated the feasibility of using large language models (LLMs) to derive essential questions in plant science.
Background
LLMs are deep learning models trained on massive amounts of text to generate output such as translations, revisions, and summaries. Generative pre-trained transformers (GPTs), such as GPT-1, released in 2018, are examples of LLMs.
The release of ChatGPT by OpenAI in 2022, however, marked a breakthrough in the field. The chatbot uses a modified version of GPT-3 that was fine-tuned with supervised learning and reinforcement learning from human feedback (RLHF).
In plant science research, LLMs can speed up the scientific process, help identify overlooked aspects, and simplify challenging, complex tasks. For instance, ChatGPT can generate questions drawing on extensive information from both within and beyond plant science.
Using ChatGPT to generate plant science questions
In this paper, the authors investigated whether LLMs can expedite scientific research and extract useful, inclusive information from a huge data pool. They used ChatGPT to create relevant plant science questions and evaluated whether the LLM could produce questions similar to those from human-led efforts while also covering aspects that the human-generated questions overlooked.
The March 14 version of ChatGPT was prompted through its chat user interface to derive questions in interactive sessions. Because the LLM is continuously evolving, responses can be expected to differ with different prompts and at different times. However, preliminary tests showed no significant changes in ChatGPT's responses over time.
The authors evaluated the generated questions by word frequency. For the human-generated questions, taken from the study “One hundred important questions facing plant science: an international perspective” published in the journal New Phytologist, British spellings were converted to American spellings for consistency.
Common words, such as ‘what’, ‘with’, ‘we’, ‘use’, ‘to’, ‘those’, ‘their’, ‘the’, ‘that’, ‘such’, ‘other’, ‘of’, ‘in’, ‘how’, ‘for’, ‘do’, ‘can’, ‘as’, ‘are’, and ‘and’, were excluded from all evaluations. Words with similar meanings and different forms of the same word were then merged.
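The cleaning steps described above (spelling normalization, stopword removal, and merging of word variants) can be sketched as a small counting function. This is a minimal illustration, not the authors' code: the stopword set is the list given in the article, but the spelling table `BRITISH_TO_AMERICAN`, the merge table `VARIANT_TO_BASE`, the function name `count_words`, and the simple regex tokenizer are all hypothetical, because the article does not publish them.

```python
import re
from collections import Counter

# Stopwords listed in the article as excluded from all evaluations.
STOPWORDS = {
    "what", "with", "we", "use", "to", "those", "their", "the", "that", "such",
    "other", "of", "in", "how", "for", "do", "can", "as", "are", "and",
}

# Hypothetical examples only: the article does not publish its spelling or merge tables.
BRITISH_TO_AMERICAN = {"colour": "color", "behaviour": "behavior", "utilise": "utilize"}
VARIANT_TO_BASE = {
    "interactions": "interaction",
    "interact": "interaction",
    "responses": "response",
    "mechanism": "mechanisms",
    "plants": "plant",
}


def count_words(questions: list[str]) -> Counter:
    """Count content words across a list of questions after cleaning."""
    counts = Counter()
    for question in questions:
        # Lowercase and split into word tokens, keeping internal hyphens (e.g. plant-based).
        for token in re.findall(r"[a-z]+(?:-[a-z]+)*", question.lower()):
            token = BRITISH_TO_AMERICAN.get(token, token)  # spelling uniformity
            if token in STOPWORDS:
                continue  # drop common function words
            counts[VARIANT_TO_BASE.get(token, token)] += 1  # merge variant forms
    return counts


example = count_words([
    "How do plants interact with pollinators?",
    "What controls the colour of plant interactions?",
])
print(example.most_common(2))
```

In the example, "plants" and "plant" merge into one entry, as do "interact" and "interactions", while "colour" is counted as "color". A real analysis would need a much larger, hand-curated merge table.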
For each list of questions, the 50 most frequently occurring words were selected for word cloud analysis. In the ChatGPT-derived list, the least frequent words among the top 50 occurred twice; because 23 words were tied at this frequency, 51 words were included in the word cloud.
In the list of human-generated questions, the least frequent words among the top 50 occurred three times. Here, the 32 words tied at this frequency were excluded, leaving 47 words in the word cloud.
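The article does not state the rule used to handle ties at the 50-word cutoff. The reported counts are, however, consistent with keeping or dropping the whole tied group, whichever gives a total closer to 50: for ChatGPT, 51 − 23 = 28 words lay above the cutoff, so keeping the tied words (51) is closer to 50 than dropping them (28); for the human list, dropping the 32 tied words (47) is closer than keeping them (79). The sketch below implements that hypothetical rule; the function `select_top_words`, the draw-breaking choice, and the `synthetic_counts` helper are assumptions, and the example frequencies are synthetic placeholders that reproduce only the reported group sizes, not the study's data.

```python
from collections import Counter


def select_top_words(counts: Counter, n: int = 50) -> dict[str, int]:
    """Select about n most frequent words without splitting a tie at the cutoff.

    Hypothetical rule: the whole group of words tied at the cutoff frequency is
    either kept or dropped, whichever gives a total closer to n (drop on a draw).
    """
    ranked = counts.most_common()
    if len(ranked) <= n:
        return dict(ranked)
    cutoff = ranked[n - 1][1]  # frequency of the n-th most frequent word
    above = [(w, c) for w, c in ranked if c > cutoff]
    tied = [(w, c) for w, c in ranked if c == cutoff]
    keep_tied = abs(len(above) + len(tied) - n) < abs(len(above) - n)
    return dict(above + tied) if keep_tied else dict(above)


def synthetic_counts(groups: list[tuple[int, int]]) -> Counter:
    """Build a Counter from (number_of_words, frequency) pairs with placeholder words."""
    counts = Counter()
    for g, (size, freq) in enumerate(groups):
        for i in range(size):
            counts[f"w{g}_{i}"] = freq
    return counts


# Synthetic data: 28 words above a cutoff of 2, with 23 words tied at 2.
print(len(select_top_words(synthetic_counts([(28, 5), (23, 2), (10, 1)]))))  # 51
# Synthetic data: 47 words above a cutoff of 3, with 32 words tied at 3.
print(len(select_top_words(synthetic_counts([(47, 10), (32, 3), (5, 1)]))))  # 47
```

Keeping tied groups whole avoids arbitrarily choosing which equally frequent words appear in the cloud; the trade-off is that the final cloud may hold somewhat more or fewer than 50 words.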
Significance of the study
In the ChatGPT-derived questions, the 11 most frequent words (molecular, interaction, change, response, sustainable, develop, plant-based materials, control, use, mechanisms, and plant) each appeared more than 10 times. These questions were divided into four categories: using plants to develop sustainable products, plant–environment interactions, improving plant traits, and understanding plant mechanisms. The ChatGPT-generated questions suggested that using plants to produce sustainable products with industrial and practical applications is a crucial aspect of modern plant science.
Growing concerns about pollution have increased the importance of developing sustainable products and materials, such as pharmaceuticals and plastics, that can reduce the environmental pollutant load. Questions on plant mechanisms primarily focused on plant functioning and photosynthesis, aiming to understand life on Earth and to identify how stressors such as contaminants affect plants. These questions also pointed to the need for a better understanding of the mechanisms controlling plant development, growth, and interactions with other organisms such as herbivores and pollinators. Recent studies have shown that contaminants can affect plant interactions with pollinators and herbivores in different ways.
Questions on plant interactions with abiotic and biotic factors centered on understanding and protecting plant and ecosystem health. Questions on improving plant traits highlighted the need to develop plants with higher yields and greater resilience under climate change.
When the authors prompted ChatGPT to generate the 10 most relevant plant science questions for each continent, it successfully captured continent-specific contexts. ChatGPT also generated essential questions that plant science is likely to face in the second half of the 21st century and in the 22nd century.
The questions for the two periods differed substantially. Questions for the second half of the 21st century focused on creating crops that are resilient to various environmental challenges and require less land, water, and nutrients. In contrast, questions for the 22nd century focused on plant–microbe interactions and on developing crops optimized for more complex interactions.
Comparing researcher-selected and ChatGPT-generated questions
The researcher-selected and ChatGPT-generated questions shared several common themes, including plant–pollinator interactions, microbiomes and their use for improved products and services, nitrogen fixation, biofuels, and homeostasis.
Although both lists addressed plants and society, the ChatGPT questions focused more on molecular approaches to fundamental plant biology and on plant–environment interactions, whereas the researcher-selected questions emphasized plants, climate change, and food production. The importance the researchers' list gave to climate change was a key distinction from the ChatGPT-derived questions, which instead focused primarily on producing more sustainable plant-based materials.
ChatGPT questions also addressed the mechanisms controlling plant functions and ecological processes, as well as plant responses to factors other than climate change. Conversely, the researcher-selected questions addressed the use of aquatic plant forms to remediate pollution.
In summary, the study showed that LLMs such as ChatGPT can serve as supportive tools to expedite, streamline, and facilitate specific tasks in plant science research. Although ChatGPT overlooked several crucial aspects highlighted by researchers, it can offer valuable insights that complement expert-generated questions.