Extensive ML Dataset for Advancing Stomatal Research in Hardwood Trees

By Dr. Silpaja Chandrasekar, PhD
Reviewed by Susha Cheriyedath, M.Sc.
Jan 5 2024

In a paper published in the journal Scientific Data, researchers assembled a dataset of approximately 11,000 distinct images of stomata on temperate broadleaf angiosperm tree leaves. The collection includes over 7,000 images covering 17 common hardwood species such as oak, maple, ash, elm, and hickory, along with more than 3,000 images representing 55 genotypes from seven Populus taxa.

*Study: Extensive ML Dataset for Advancing Stomatal Research in Hardwood Trees. Image credit: Miks Kuncevics/Shutterstock*

Each image is annotated for inner guard cell walls and whole stomata, allowing conversion into various annotation formats. This dataset enables the application of cutting-edge machine learning models to detect, count, and measure leaf stomata, explore diverse stomatal characteristics among hardwood trees, and create new indices for stomatal measurements.

Related Work

The researchers curated approximately 11,000 unique hardwood leaf stomatal images from projects spanning 2015 to 2022. The collection includes over 7,000 images covering 17 common hardwood species and over 3,000 images representing 55 genotypes from seven Populus taxa. Each image is annotated for inner guard cell walls and whole stomata, with a corresponding You Only Look Once (YOLO) label file that supports machine learning model training and analysis.

This freely accessible dataset enables the development of advanced, high-throughput methods for detecting, counting, and measuring leaf stomata in temperate hardwood trees. It also supports exploration into the diverse stomatal characteristics among different hardwood tree types and offers avenues for creating new indices for stomatal measurements, benefiting ecologists, plant biologists, and ecophysiologists.

Leaf Stomatal Image Annotation Overview

The study used stomatal images from two datasets, Hardwood and Populus spp., acquired from 2015 to 2022. The Hardwood dataset contained 17 species, including American elm, cherry bark oak, and red maple, with trees ranging in age from one to 50 years. The researchers captured over 7,000 stomatal images for this dataset using a compound light microscope and digital camera.

The Populus dataset comprised more than 3,000 images from 55 genotypes of hybrid poplar and eastern cottonwood, aged four to five years. Between June and August of each year from 2020 to 2022, the trees underwent photosynthetic CO₂ response curve measurements. The researchers collected a fresh leaf from each tree, stored it in a labeled bag in a cooler, and later made stomatal peels from it using clear nail polish. They then captured multiple images per leaf using specific magnification lenses.

The annotation process began with manually labeling 1,000 images to train a YOLO model for detecting inner guard cell walls and whole stomata. The publicly available StoManager1, which incorporates this model, offers a graphical user interface (GUI) for Windows systems and helped generate YOLO Darknet format label files for machine learning model training. The researchers then reviewed and corrected label discrepancies using LabelImg and used a subset of the images to train and verify YOLOv7 and YOLOv8 models.

The dataset on Figshare and Zenodo comprises original images, labels, and data records with 10,715 observations across seven variables per image. Each image has a unique file name and an associated label file giving the class, coordinates, and dimensions of each inner guard cell wall and whole stoma as ratios to the image width and height. The recorded magnification and resolution are needed to convert these ratios into physical measurements such as stomatal area and density.
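
A label file of this kind can be read with a few lines of Python. The sketch below assumes the standard YOLO Darknet convention, in which each line holds a class index followed by the box center and size as ratios of the image dimensions. The function name `parse_yolo_line`, the `PixelBox` type, and the sample values are illustrative and do not come from the dataset.

```python
from dataclasses import dataclass


@dataclass
class PixelBox:
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_line(line: str, image_width: int, image_height: int) -> PixelBox:
    """Convert one 'class x_center y_center width height' line to pixel corners."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    class_id = int(parts[0])
    x_c, y_c, w, h = (float(p) for p in parts[1:])
    # The stored values are ratios of the image size, so scale back to pixels.
    box_w = w * image_width
    box_h = h * image_height
    x_min = x_c * image_width - box_w / 2
    y_min = y_c * image_height - box_h / 2
    return PixelBox(class_id, x_min, y_min, x_min + box_w, y_min + box_h)


# Hypothetical label line and image size, for illustration only.
print(parse_yolo_line("0 0.5 0.25 0.1 0.2", image_width=2000, image_height=1000))
```

Which class index stands for whole stomata and which for inner guard cell walls is not stated here, so check the dataset's own documentation before interpreting `class_id`.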

Dataset Overview and Validations

Data Overview: The dataset, accessible on Figshare and Zenodo, includes original images, labels, and data records. It comprises 10,715 observations across seven variables per image, detailing image name, species, scientific name, magnification, width, height, and resolution. Each image has a unique file name and a corresponding label file containing class information, coordinates, width, and height represented as ratios to the image dimensions. Essential variables like magnification, width, height, and resolution are critical in analyzing stomatal traits.

Validation Process: The researchers validated the images, labels, and data records. They verified image dimensions and resolution using ImageJ software. They then trained YOLOv7 and YOLOv8 models on the dataset and evaluated them by precision, recall, and mean average precision (mAP) at different intersection over union (IoU) thresholds, to show that the dataset is suitable for model training.
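
For readers unfamiliar with these metrics, the sketch below shows how IoU is computed for two boxes and how precision and recall follow from it at a single threshold. It is a simplified illustration and not the evaluation code used in the study: it matches predictions to ground truth greedily in list order, whereas standard detection benchmarks sort predictions by confidence and average over thresholds to obtain mAP. All function names and sample boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching of predicted boxes to ground truth boxes."""
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predictions:
        # Pick the remaining ground truth box that overlaps this prediction most.
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical boxes: one correct detection, one false positive, one miss.
truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0, 0, 10, 10), (50, 50, 60, 60)]
print(precision_recall(preds, truth))
```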

Usage Recommendations: To leverage the dataset for object detection model training, users can utilize platforms like Roboflow for annotations, format conversions, and operations such as resizing or contrast adjustments. Diverse image subsets aid in creating more versatile models, encompassing various species, dimensions, magnifications, and image qualities. Incorporating images with differing qualities enriches the model's ability to detect stomata in diverse scenarios.
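
One way to build such a diverse subset is to sample a fixed number of images from each combination of species and magnification in the data records. The sketch below assumes the records have been loaded as dictionaries with `species` and `magnification` keys. Those key names, the `stratified_sample` helper, and the sample records are assumptions for illustration and may not match the actual column headers.

```python
import random
from collections import defaultdict


def stratified_sample(records, keys, per_group, seed=0):
    """Draw up to `per_group` records from each distinct combination of `keys`."""
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record)
    sample = []
    for group_key in sorted(groups):
        members = groups[group_key]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


# Hypothetical data records, for illustration only.
records = [
    {"image": "a1.jpg", "species": "red maple", "magnification": 200},
    {"image": "a2.jpg", "species": "red maple", "magnification": 200},
    {"image": "a3.jpg", "species": "red maple", "magnification": 400},
    {"image": "b1.jpg", "species": "American elm", "magnification": 200},
]
subset = stratified_sample(records, keys=("species", "magnification"), per_group=1)
print([r["image"] for r in subset])
```

Capping the number of images per group keeps heavily represented species from dominating training, at the cost of discarding some images.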

Potential Applications: Features detected by trained models can be used to formulate new indices for assessing stomatal characteristics. The attributes of the detected bounding boxes make it possible to estimate stomatal area, density, and orientation. Users can fit regression models that estimate indices such as guard cell length and width from bounding box measurements and orientation information. The weighted multivariate linear regression models developed by the authors can explain substantial variation in measured stomatal traits.
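
As a rough illustration of these calculations, the sketch below converts a bounding box to a physical area, computes stomatal density for one image, and fits a weighted least squares line. It makes two loud assumptions: that resolution is expressed in pixels per micrometer (the hypothetical `pixels_per_um` parameter; the dataset's actual unit should be checked), and that a single predictor is enough for demonstration, which is a simplification of the multivariate models described in the paper. All names and numbers are illustrative.

```python
def box_area_um2(width_ratio, height_ratio, image_width_px, image_height_px, pixels_per_um):
    """Physical area of a bounding box given its size as ratios of the image."""
    width_um = width_ratio * image_width_px / pixels_per_um
    height_um = height_ratio * image_height_px / pixels_per_um
    return width_um * height_um


def stomatal_density_per_mm2(stomata_count, image_width_px, image_height_px, pixels_per_um):
    """Number of stomata per square millimeter of imaged leaf surface."""
    image_area_um2 = (image_width_px / pixels_per_um) * (image_height_px / pixels_per_um)
    return stomata_count / (image_area_um2 / 1e6)  # 1 mm^2 = 1e6 um^2


def weighted_linear_fit(xs, ys, weights):
    """Weighted least squares fit of y = slope * x + intercept (one predictor)."""
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical image: 2000 x 1000 px at 2 px per micrometer, 50 detected stomata.
print(stomatal_density_per_mm2(50, 2000, 1000, pixels_per_um=2.0))
print(box_area_um2(0.1, 0.2, 2000, 1000, pixels_per_um=2.0))
```

In practice, `xs` might hold bounding box widths and `ys` manually measured guard cell lengths, with weights chosen by the analyst, for example detection confidence.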

Conclusion

To sum up, the dataset, encompassing original images, labels, and comprehensive data records, offers a robust resource for studying stomatal traits. Rigorous validation processes ensured accuracy, supported by evaluations using YOLOv7 and YOLOv8 models, affirming the dataset's reliability for training.

Using platforms like Roboflow and training on diverse image subsets should improve a model's ability to detect stomata across varied species, magnifications, and image qualities. Potential applications range from formulating new indices based on detected features to using regression models to estimate stomatal characteristics, making the dataset a useful resource for stomatal research and analysis.

Journal reference:

Wang, J., Renninger, H. J., & Ma, Q. (2024). Labeled temperate hardwood tree stomatal image datasets from seven taxa of Populus and 17 hardwood species. Scientific Data, 11(1), 1. https://doi.org/10.1038/s41597-023-02657-3

Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.
