An article published in the journal Nature discussed the increasingly creative role of artificial intelligence (AI) in scientific research, specifically in generating new hypotheses and finding blind spots in existing research.
Hypothesis generation using AI
Researchers already use AI to run statistical analyses, automate data collection, search the literature, and draft parts of papers. Using AI to generate hypotheses is a harder problem, because asking important and interesting questions requires creativity.
Scientific hypotheses range from specific and concrete to general and abstract. A second spectrum, partly related to the first, runs from uninterpretable to clear. So far, AI has mostly generated hypotheses at the specific and concrete end.
In fields where the underlying principles are already understood, researchers mainly want AI to overcome the practical challenge of running complex computations. In fields where the fundamentals remain undiscovered, such as social science and medicine, researchers use AI to identify rules that can be applied to new situations.
Existing findings can be organized into knowledge graphs using literature-based discovery and other computational techniques. The nodes of these networks represent, for instance, molecules and properties. AI can analyze the networks and propose undiscovered links between molecule nodes and property nodes. Modern drug discovery and the task of assigning functions to genes depend heavily on this process.
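As a rough illustration of proposing undiscovered links in such a graph, the sketch below scores missing molecule-property links by how similar a molecule is to other molecules that already have the property. The node names, the toy links, and the Jaccard-based scoring are all assumptions made for this example; the article does not say which link-prediction method the cited systems use.

```python
from collections import defaultdict

# Hypothetical toy graph: each pair is a known (molecule, property) link.
known_links = {
    ("mol_a", "prop_x"),
    ("mol_a", "prop_y"),
    ("mol_b", "prop_x"),
    ("mol_b", "prop_z"),
    ("mol_c", "prop_x"),
    ("mol_c", "prop_z"),
}


def propose_links(links):
    """Rank missing (molecule, property) links by neighbor similarity."""
    props_of = defaultdict(set)
    for molecule, prop in links:
        props_of[molecule].add(prop)

    scores = defaultdict(float)
    for molecule, props in props_of.items():
        for other, other_props in props_of.items():
            if other == molecule:
                continue
            # Jaccard similarity between the two molecules' property sets.
            similarity = len(props & other_props) / len(props | other_props)
            if similarity == 0:
                continue
            # Properties the similar molecule has but this one lacks
            # become candidate hypotheses, weighted by the similarity.
            for candidate in other_props - props:
                scores[(molecule, candidate)] += similarity

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranked = propose_links(known_links)
for (molecule, prop), score in ranked:
    print(f"{molecule} -> {prop}: {score:.2f}")
```

In this toy graph the top-ranked proposal is the link from mol_a to prop_z, because two molecules that share a property with mol_a both have prop_z. Real systems work on far larger graphs and typically use learned models rather than a hand-written similarity score.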
AI can also generate hypotheses in other ways, such as by proposing simple formulae that organize noisy data points. Hypothesis generation has been automated in several fields, including chemistry, biology, materials science, and particle physics. Large language models (LLMs), which are AI systems trained on large amounts of text to generate new text, can help scientists brainstorm. Language models can produce inaccurate information and present it as fact, but even this tendency can be useful when the goal is to generate hypotheses.
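The idea of proposing a simple formula for noisy data points can be sketched as a search over a small set of candidate formulae, keeping the one with the lowest squared error. This is a minimal sketch under stated assumptions: the candidate set, the single scale parameter a, and the synthetic data are invented for illustration and are not taken from the article.

```python
import math
import random

# Hypothetical candidate formulae, each of the form y = a * f(x).
CANDIDATES = {
    "a * x": lambda x: x,
    "a * x**2": lambda x: x ** 2,
    "a * sqrt(x)": math.sqrt,
    "a * log(x)": math.log,
}


def fit_scale(f, xs, ys):
    """Least-squares fit of a in y = a * f(x); returns (a, squared error)."""
    basis = [f(x) for x in xs]
    a = sum(b * y for b, y in zip(basis, ys)) / sum(b * b for b in basis)
    error = sum((y - a * b) ** 2 for b, y in zip(basis, ys))
    return a, error


def best_formula(xs, ys):
    """Return (name, a, error) for the candidate with the lowest error."""
    fits = {name: fit_scale(f, xs, ys) for name, f in CANDIDATES.items()}
    name = min(fits, key=lambda n: fits[n][1])
    a, error = fits[name]
    return name, a, error


# Synthetic noisy data generated from y = 3 * x**2 (an assumption for the demo).
rng = random.Random(0)
xs = [float(x) for x in range(1, 11)]
ys = [3.0 * x ** 2 + rng.gauss(0.0, 0.5) for x in xs]

name, a, error = best_formula(xs, ys)
print(f"best formula: {name} with a = {a:.3f}")
```

Practical symbolic-regression tools search a much larger space of expressions and penalize complexity, but the principle of trading fit against simplicity is the same.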
Identifying blind spots
AI can also be used to identify blind spots in research by generating "alien" hypotheses, that is, hypotheses that humans would be unlikely to make. In a recently published study, researchers built knowledge graphs that included properties, materials, and the researchers who study them.
The study's objective was to maximize the plausibility of the AI's hypotheses while minimizing the chance that researchers would arrive at them on their own. For instance, a drug's potential would take significantly longer to discover if the scientists studying that drug are only distantly connected to the scientists studying a disease that the drug can cure.
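The trade-off described above, between how plausible a hypothesis is and how likely humans are to propose it anyway, can be sketched as a single score. The scoring formula, the weight beta, the field names, and all numbers below are assumptions made for illustration, not the study's actual method or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    material: str
    prop: str
    plausibility: float      # assumed estimate that the link is true (0 to 1)
    human_likelihood: float  # assumed chance that researchers find it unaided (0 to 1)


def alien_score(candidate, beta):
    """Reward plausibility, penalize hypotheses humans would make anyway."""
    return candidate.plausibility - beta * candidate.human_likelihood


def rank_candidates(candidates, beta):
    """Sort candidates from most to least 'alien yet plausible'."""
    return sorted(candidates, key=lambda c: alien_score(c, beta), reverse=True)


# Hypothetical candidates with made-up scores.
candidates = [
    Candidate("material_1", "property_x", plausibility=0.9, human_likelihood=0.8),
    Candidate("material_2", "property_y", plausibility=0.7, human_likelihood=0.1),
    Candidate("material_3", "property_z", plausibility=0.4, human_likelihood=0.05),
]

for beta in (0.0, 1.0):
    order = [c.material for c in rank_candidates(candidates, beta)]
    print(f"beta = {beta}: {order}")
```

With beta set to 0 the ranking follows plausibility alone. Raising beta pushes down hypotheses that well-connected researchers would probably reach unaided, so the less obvious candidates rise to the top.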
In the study, the AI was given only data published up to 2001. Approximately 30% of its predictions about drug repurposing and the electrical properties of materials were in fact discovered by researchers six to ten years later. The system can be further tuned to make more accurate predictions based on collaborations and concurrent findings.
Human-AI collaborations
In another recently published paper, the authors described a method for humans and AI to generate clear, broad hypotheses collaboratively. As a proof of concept, they generated hypotheses about which facial characteristics of defendants influence a judge’s decision to detain or free them before trial.
Using mugshots of past defendants and the judges’ decisions, an algorithm found that several subtle facial features correlate with those decisions. The AI then produced new mugshots with the identified features turned up or down, and human participants described the general differences between the mugshots.
The results showed that defendants who were likely to be freed were more heavy-faced and well-groomed. The method could also be applied to other complex datasets, such as electrocardiograms, to identify markers of an impending heart attack that are unknown to doctors.
The advent of robot scientists
In science, hypothesis generation and experimentation form an iterative cycle. Robotic systems that perform experiments using mechanized arms are being built to complete this loop. One such system under development, called Genesis, can experiment with yeast. Genesis can formulate and test hypotheses about yeast biology by growing yeast cells in 10,000 bioreactors at a time, measuring characteristics such as gene expression, and editing genomes or adjusting factors such as environmental conditions. The resulting hypotheses could be applicable to drug development. Moreover, these robot scientists can be more transparent, efficient, unbiased, and consistent than human researchers, and cheaper.
To summarize, automated hypothesis generation will become increasingly important as data gathering becomes more automated. However, several challenges must be addressed before AI can be used effectively for this purpose. For instance, hypothesis-generating AI systems depend primarily on machine learning, which requires substantial amounts of data. AI systems that can reason about the physical world also need to be developed.