Extensive ML Dataset for Advancing Stomatal Research in Hardwood Trees

In a paper published in the journal Scientific Reports, researchers gathered a dataset comprising approximately 11,000 distinct images of stomata on temperate broadleaf angiosperm tree leaves. This collection includes over 7,000 images covering 17 common hardwood species like oak, maple, ash, elm, and hickory, along with more than 3,000 images representing 55 genotypes from seven Populus taxa.

Study: Extensive ML Dataset for Advancing Stomatal Research in Hardwood Trees. Image credit: Miks Kuncevics/Shutterstock

Each image is annotated for inner guard cell walls and whole stomata, allowing conversion into various annotation formats. This dataset enables the application of cutting-edge machine learning models to detect, count, and measure leaf stomata, explore diverse stomatal characteristics among hardwood trees, and create new indices for stomatal measurements.

Related Work

An extensive collection of approximately 11,000 unique hardwood leaf stomatal images, sourced from projects spanning 2015 to 2022, has been curated. This dataset includes over 7,000 images covering 17 common hardwood species and over 3,000 representing 55 genotypes from seven Populus taxa. The researchers annotated each image for inner guard cell walls and whole stomata and created corresponding You Only Look Once (YOLO) label files that facilitate machine learning model training and analysis.

This freely accessible dataset enables the development of advanced, high-throughput methods for detecting, counting, and measuring leaf stomata in temperate hardwood trees. It also supports exploration into the diverse stomatal characteristics among different hardwood tree types and offers avenues for creating new indices for stomatal measurements, benefiting ecologists, plant biologists, and ecophysiologists.

Leaf Stomatal Image Annotation Overview

The study used stomatal images from two datasets, Hardwood and Populus spp., acquired from 2015 to 2022. The Hardwood dataset contained 16 species, including American elm, cherrybark oak, and red maple, with trees ranging in age from one to 50 years. The researchers captured over 10,000 stomatal images using a compound light microscope and digital camera.

The Populus dataset comprised 3,000+ images from 55 genotypes of hybrid poplar and eastern cottonwood, aged four to five years. Between June and August of each year from 2020 to 2022, the trees underwent photosynthetic CO2 response curve measurements. The researchers collected a fresh leaf from each tree, stored it in a labeled bag in a cooler, and later made stomatal peels from it using clear nail polish. They captured multiple images per leaf using specific magnification lenses.

The annotation process began with manually labeling 1,000 images to train a YOLO model for detecting inner guard cell walls and whole stomata. The publicly available StoManager1, which incorporates this model, offers a user-friendly graphical user interface (GUI) for Windows systems and helps generate YOLO Darknet format label files for machine learning model training. The researchers reviewed and corrected label discrepancies using LabelImg, then used a subset of the data to train and verify YOLO models (v7 and v8).

The dataset on Figshare and Zenodo comprises original images, labels, and data records with 10,715 observations across seven variables per image. Each image has a unique file name and an associated label file detailing the class, coordinates, and dimensions of inner guard cell walls and whole stomata as ratios to the image width and height. Magnification and resolution are recorded alongside each image because they are needed to turn these ratios into physical measurements such as stomatal area and density.
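Because the labels store positions and sizes as ratios, a small conversion step is needed before boxes can be measured in pixels. The sketch below assumes the usual YOLO Darknet line layout (class, x center, y center, width, height, all normalized to the image size). The function name, the example line, and the image size are illustrative only, and which class number corresponds to whole stomata or inner guard cell walls is not stated here.

```python
from typing import NamedTuple


class PixelBox(NamedTuple):
    class_id: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def yolo_line_to_pixels(line: str, img_width: int, img_height: int) -> PixelBox:
    """Convert one YOLO label line (ratios) into pixel corner coordinates."""
    class_id, x_c, y_c, w, h = line.split()
    # Scale the normalized center and size back to pixels.
    x_c, w = float(x_c) * img_width, float(w) * img_width
    y_c, h = float(y_c) * img_height, float(h) * img_height
    return PixelBox(int(class_id), x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2)


# Hypothetical label line and image size, for illustration only.
box = yolo_line_to_pixels("0 0.5 0.5 0.25 0.5", img_width=2048, img_height=1536)
print(box)
```

The corner form (x_min, y_min, x_max, y_max) is also the starting point for converting the labels into other annotation formats.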

Dataset Overview and Validations

Data Overview: The dataset, accessible on Figshare and Zenodo, includes original images, labels, and data records. It comprises 10,715 observations across seven variables per image, detailing image name, species, scientific name, magnification, width, height, and resolution. Each image has a unique file name and a corresponding label file containing class information, coordinates, width, and height represented as ratios to the image dimensions. Essential variables like magnification, width, height, and resolution are critical in analyzing stomatal traits.

Validation Process: The researchers validated the images, labels, and data records. Image dimensions and resolution were verified using ImageJ software. YOLOv7 and YOLOv8 models were then trained on the dataset and evaluated by precision, recall, and mean average precision (mAP) at different intersection over union (IoU) thresholds, with the results indicating that models trained on the dataset are robust.
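For readers unfamiliar with the IoU thresholds mentioned above, the following is a minimal sketch of how the overlap between a predicted box and a labeled box is scored. It is a generic illustration, not the evaluation code used in the study; the box values and the 0.5 threshold are example choices.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Width and height of the overlapping region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = (0, 0, 10, 10)
labeled = (5, 0, 15, 10)
score = iou(predicted, labeled)   # overlap 50, union 150, so one third
is_match = score >= 0.5           # counted as a detection only above the threshold
print(round(score, 3), is_match)
```

A detection counts as correct only when its IoU with a labeled box meets the chosen threshold, so precision, recall, and mAP all shift as that threshold is raised or lowered.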

Usage Recommendations: To leverage the dataset for object detection model training, users can utilize platforms like Roboflow for annotations, format conversions, and operations such as resizing or contrast adjustments. Diverse image subsets aid in creating more versatile models, encompassing various species, dimensions, magnifications, and image qualities. Incorporating images with differing qualities enriches the model's ability to detect stomata in diverse scenarios.
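One simple way to build a diverse training subset is to draw a capped number of images from every species in the data records. The sketch below assumes the records have been loaded as dictionaries with a "species" key; the key names, file names, and per-species cap are hypothetical.

```python
import random
from collections import defaultdict


def sample_per_species(records, per_species, seed=0):
    """Pick up to `per_species` records from each species, reproducibly."""
    rng = random.Random(seed)
    by_species = defaultdict(list)
    for rec in records:
        by_species[rec["species"]].append(rec)
    subset = []
    for species in sorted(by_species):
        group = by_species[species]
        # Take the whole group when it is smaller than the cap.
        subset.extend(rng.sample(group, min(per_species, len(group))))
    return subset


# Hypothetical records standing in for rows of the data record file.
records = (
    [{"image": f"maple_{i}.jpg", "species": "red maple"} for i in range(5)]
    + [{"image": f"elm_{i}.jpg", "species": "American elm"} for i in range(2)]
)
subset = sample_per_species(records, per_species=3)
print(len(subset))
```

The same grouping idea could be extended to magnification or image dimensions so that no single imaging condition dominates the training set.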

Potential Applications: Features detected by trained models can be used to formulate new indices for assessing stomatal characteristics. The attributes of the detected bounding boxes make it possible to estimate stomatal area, density, and orientation. Users can apply regression models to estimate indices such as guard cell length and width from bounding box measurements and orientation information. The weighted multivariate linear regression models developed by the authors can explain substantial variation in measured stomatal traits.
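As a rough illustration of how box attributes become physical measurements, the sketch below converts a stomata count into density per square millimeter and a box size into micrometers. It assumes the image scale is known as pixels per micrometer; the parameter name pixels_per_um and all numbers are hypothetical, and the units of the dataset's resolution variable should be checked before applying this.

```python
def stomatal_density_per_mm2(n_stomata, img_width_px, img_height_px, pixels_per_um):
    """Stomata per square millimeter of imaged leaf surface."""
    width_mm = img_width_px / pixels_per_um / 1000.0
    height_mm = img_height_px / pixels_per_um / 1000.0
    return n_stomata / (width_mm * height_mm)


def box_size_um(w_ratio, h_ratio, img_width_px, img_height_px, pixels_per_um):
    """Bounding box width and height in micrometers from YOLO ratios."""
    return (
        w_ratio * img_width_px / pixels_per_um,
        h_ratio * img_height_px / pixels_per_um,
    )


# Hypothetical image: 30 detected stomata, 2000 x 1500 px, 5 px per micrometer.
density = stomatal_density_per_mm2(30, 2000, 1500, pixels_per_um=5)
width_um, height_um = box_size_um(0.02, 0.03, 2000, 1500, pixels_per_um=5)
print(round(density, 1), round(width_um, 1), round(height_um, 1))
```

Box widths and heights in micrometers are the kind of inputs a regression model for guard cell length and width would take.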

Conclusion

To sum up, the dataset, encompassing original images, labels, and comprehensive data records, offers a robust resource for studying stomatal traits. Rigorous validation processes ensured accuracy, supported by evaluations using YOLOv7 and YOLOv8 models, affirming the dataset's reliability for training.

Using platforms such as Roboflow and training on diverse image subsets should improve a model's adaptability and performance in detecting stomata across varied scenarios. The dataset's potential applications range from formulating novel indices based on detected features to using regression models to estimate stomatal characteristics, making it a useful resource for stomatal research and analysis.


Written by

Silpaja Chandrasekar

Dr. Silpaja Chandrasekar has a Ph.D. in Computer Science from Anna University, Chennai. Her research expertise lies in analyzing traffic parameters under challenging environmental conditions. Additionally, she has gained valuable exposure to diverse research areas, such as detection, tracking, classification, medical image analysis, cancer cell detection, chemistry, and Hamiltonian walks.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Chandrasekar, Silpaja. (2024, January 05). Extensive ML Dataset for Advancing Stomatal Research in Hardwood Trees. AZoAi. Retrieved on December 25, 2024 from https://www.azoai.com/news/20240105/Extensive-ML-Dataset-for-Advancing-Stomatal-Research-in-Hardwood-Trees.aspx.

  • MLA

    Chandrasekar, Silpaja. "Extensive ML Dataset for Advancing Stomatal Research in Hardwood Trees". AZoAi. 25 December 2024. <https://www.azoai.com/news/20240105/Extensive-ML-Dataset-for-Advancing-Stomatal-Research-in-Hardwood-Trees.aspx>.

  • Chicago

    Chandrasekar, Silpaja. "Extensive ML Dataset for Advancing Stomatal Research in Hardwood Trees". AZoAi. https://www.azoai.com/news/20240105/Extensive-ML-Dataset-for-Advancing-Stomatal-Research-in-Hardwood-Trees.aspx. (accessed December 25, 2024).

  • Harvard

    Chandrasekar, Silpaja. 2024. Extensive ML Dataset for Advancing Stomatal Research in Hardwood Trees. AZoAi, viewed 25 December 2024, https://www.azoai.com/news/20240105/Extensive-ML-Dataset-for-Advancing-Stomatal-Research-in-Hardwood-Trees.aspx.
