A team of student researchers has developed an easy-to-follow machine-learning tutorial aiming to equip biologists and health professionals with the tools to detect antibiotic resistance—demystifying AI for real-world medical challenges.
Research: Using genomic data and machine learning to predict antibiotic resistance: A tutorial paper.
According to the World Health Organization, antimicrobial resistance is a growing health crisis that could lead to millions of deaths by 2050. Antibiotics are critical for human health, but many microbes are evolving resistance to one or more drugs. San Francisco State University researchers are among those using machine learning to predict drug resistance in patients. They're also trying to remedy a related problem: the lack of resources that teach how to use machine learning to detect antibiotic resistance.
In a new paper in the journal PLOS Computational Biology, the SFSU team published a step-by-step machine-learning tutorial for beginners. Apart from Biology Professor Pleuni Pennings, the seven other researchers on the paper were undergraduate, graduate, and post-baccalaureate students. Many were first-time researchers, and nearly all were new to machine learning.
"We wanted to do a tutorial paper instead [of a research paper] because we thought it was more important to put out a teachable resource. We struggled to find one, so we wanted to make our own," said co-first author Faye Orcales (B.S., '21), who worked on the project as a post-bac.
As beginners from various backgrounds, the team made sure the paper would be accessible to their student peers, to educators in biology and chemistry, and to anyone in the health sciences. Though the lesson is beginner-friendly, the authors recommend that readers already have introductory coding knowledge, since teaching it is beyond the scope of the paper.
"Because it's in a peer-reviewed journal, it makes it feel real because other scientists - not just your professor or friends - reviewed the article. The peer review process was crucial because it gives other perspectives," said co-first author Lucy Moctezuma, a Statistics graduate student at CSU East Bay who has a background in psychology. She joined Pennings' SFSU lab through a friend and was part of the lab for nearly three years. She and Orcales led the effort to write the manuscript and address any feedback. "We were a bunch of students trying to figure it out and we were able to! I think that we should all be proud of that," Moctezuma said.
Using a previously published data set of 1,936 E. coli strains from patients, each tested against 12 antibiotics, the students developed a step-by-step tutorial covering four popular machine-learning models for predicting drug resistance in E. coli. To improve accessibility, they used Google Colab, a free, cloud-based platform for writing and running Python code, which means users don't have to install software to follow the tutorial. The SFSU team provided six free Google Colab "notebooks" with tutorials: one for each of the four models (logistic regression, random forests, extreme gradient-boosted trees, and neural networks) plus two for data preparation and result visualization.
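For readers curious what the simplest of those four models looks like in Python, here is a minimal sketch of logistic regression. It is not taken from the team's notebooks. It assumes each strain is described by yes/no flags for whether certain genes are present, and it trains on small synthetic data rather than the E. coli data set. All function names (`make_synthetic_strains`, `fit_logistic_regression`, and so on) are hypothetical, and the code is written in plain Python so it runs without installing anything.

```python
import math
import random


def make_synthetic_strains(n_strains, n_genes, seed=0, noise=0.05):
    """Build made-up data: one list of 0/1 gene flags per strain.

    Assumption for illustration only: gene 0 drives resistance, and a
    small fraction of labels is flipped to mimic measurement noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_strains):
        genes = [rng.randint(0, 1) for _ in range(n_genes)]
        label = genes[0]
        if rng.random() < noise:
            label = 1 - label
        X.append(genes)
        y.append(label)
    return X, y


def sigmoid(z):
    # Written in two branches so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit weights and a bias with plain full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = sigmoid(z) - yi  # predicted probability minus true label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (resistant) if the predicted probability exceeds 0.5."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def accuracy(w, b, X, y):
    hits = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y))
    return hits / len(y)


# Train on the first 240 synthetic strains, evaluate on the last 60.
X, y = make_synthetic_strains(300, 6, seed=42)
w, b = fit_logistic_regression(X[:240], y[:240])
print(f"held-out accuracy: {accuracy(w, b, X[240:], y[240:]):.2f}")
print("gene with largest weight:", max(range(len(w)), key=lambda j: abs(w[j])))
```

Because the synthetic labels depend on gene 0 by construction, the fitted model should give that gene the largest weight. On real data one would more likely use a library implementation and check the result on strains held out from training.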
"The students may not realize that it's sort of bold [to submit this paper to PLOS]. It just shows that we do very high-quality work," said Pennings, adding that the students took ownership of the writing and pushed the manuscript forward.
Collaborating with faculty in Biology, Computer Science, Chemistry, and Biochemistry, Pennings is the director or co-director of three programs: the undergraduate Promoting Inclusivity in Computing (PINC) program; its graduate counterpart, Graduate Opportunities to Learn Data Science (GOLD); and the Science Coding Immersion Program (SCIP), an all-virtual, self-paced coding program for students, staff, and faculty. All the student researchers initially learned coding and/or machine learning in one of these programs and then continued to develop their skills through longer-term research experiences.
"One of my motivations to making all of these materials is because I'm teaching these classes and I wish there was a book about machine learning for health or biology. Something that is doable, fun and relevant. Something that's intuitive, practical and discusses the ethical side," said Pennings, noting that she's already using this published tutorial in her classes.
"When I joined the PINC program, I could see that the instructors were motivated to teach coding in a very accessible way to Biology students. I felt really comfortable in the program because my peers were fellow biologists eager to learn," said Orcales, now a computational scientist at UCSF applying to Ph.D. programs. She hopes this new tutorial will help introduce more of her peers into the machine-learning space. "I hope our readers take away that machine learning isn't this daunting difficult thing to learn when you have the right resources."
Journal reference:
- Orcales F, Moctezuma Tan L, Johnson-Hagler M, Suntay JM, Ali J, et al. (2024) Using genomic data and machine learning to predict antibiotic resistance: A tutorial paper. PLOS Computational Biology 20(12): e1012579. DOI: 10.1371/journal.pcbi.1012579, https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012579