AI Tool Identifies Species and Maps Ecosystems Using Multimodal Data

A groundbreaking AI model, TaxaBind, combines six data sources—images, audio, text, and more—to enhance species classification and ecological predictions, helping scientists track biodiversity and environmental changes with unprecedented accuracy.

Species image to satellite image retrieval task. For each example, we show the top 4 most similar satellite images retrieved by our model from a gallery of 100k satellite images in the iSatNat-test set.

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

Have you ever seen an image of an animal and wondered, "What is that?" TaxaBind, a new tool developed by computer scientists at the McKelvey School of Engineering at Washington University in St. Louis, can satisfy that curiosity and more.

TaxaBind addresses the need for more robust and unified approaches to ecological problems by combining multiple models to perform species classification (what kind of bear is this?), distribution mapping (where are the cardinals?), and related ecological tasks. The tool can also serve as a starting point for larger ecological modeling studies, which scientists might use to predict shifts in plant and animal populations, the effects of climate change, or the impacts of human activity on ecosystems.

Srikumar Sastry, the project's lead author, presented TaxaBind on March 2-3 at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) in Tucson, AZ.

"With TaxaBind we're unlocking the potential of multiple modalities in the ecological domain," Sastry said. "Unlike existing models that only focus on one task at a time, we combine six modalities – ground-level images of species, geographic location, satellite images, text, audio and other environmental features – into one cohesive framework. This enables our models to address a diverse range of ecological tasks."

Sastry, a graduate student working with Nathan Jacobs, a professor of computer science and engineering, used an innovative technique known as multimodal patching to distill information from different modalities into one binding modality. Sastry describes this binding modality as the "mutual friend" that connects and maintains synergy among the other five modalities.

For TaxaBind, the binding modality is ground-level images of species. The tool captures unique features from each of the other five modalities and condenses them into the binding modality, enabling the AI to learn simultaneously from images, text, sound, geography, and environmental context.
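The paper's exact training objective is not reproduced here, but the general idea behind binding multiple modalities to one anchor, as popularized by contrastive frameworks such as CLIP and ImageBind, can be sketched with a toy InfoNCE-style loss. Everything below is illustrative: the embeddings are random stand-ins, not TaxaBind's encoders.

```python
import numpy as np

def info_nce_loss(binding_emb, other_emb, temperature=0.07):
    """Symmetric contrastive loss between the binding modality
    (ground-level images) and one other modality (e.g. audio).
    Row i of each matrix is a paired example; all other rows in the
    batch serve as negatives."""
    # L2-normalize so dot products are cosine similarities
    b = binding_emb / np.linalg.norm(binding_emb, axis=1, keepdims=True)
    o = other_emb / np.linalg.norm(other_emb, axis=1, keepdims=True)
    logits = (b @ o.T) / temperature  # (N, N); true pairs on the diagonal

    def xent(l):
        # cross-entropy of each row against its diagonal (true-pair) entry
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        idx = np.arange(len(l))
        return -log_probs[idx, idx].mean()

    # average both retrieval directions (binding->other and other->binding)
    return (xent(logits) + xent(logits.T)) / 2

# Toy batch: 4 paired (ground image, audio) embeddings of dimension 8,
# where each audio embedding sits close to its paired image embedding
rng = np.random.default_rng(0)
ground = rng.normal(size=(4, 8))
audio = ground + 0.01 * rng.normal(size=(4, 8))
print(info_nce_loss(ground, audio))  # small loss: pairs are well aligned
```

Minimizing this loss for each modality against the same binding modality is what lets one "mutual friend" hold the shared embedding space together.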

When the team assessed the tool's performance across various ecological tasks, TaxaBind demonstrated superior capabilities in zero-shot classification, which is the ability to classify a species not present in its training dataset. The demo version of the tool was trained on roughly 450,000 species and can classify a given image by the species it shows, including previously unseen species.
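Zero-shot classification in a shared embedding space typically reduces to a nearest-neighbor lookup: embed the query image, compare it against reference embeddings for candidate species (for instance, embeddings of their names or descriptions), and pick the closest. The sketch below uses hypothetical, hand-made embeddings to show the mechanism, not TaxaBind's actual encoders or species list.

```python
import numpy as np

def zero_shot_classify(image_emb, species_embs, species_names):
    """Assign the label whose reference embedding is closest (by cosine
    similarity) to the query image embedding. A species unseen during
    training can still be classified, as long as its reference
    embedding lives in the same shared space."""
    img = image_emb / np.linalg.norm(image_emb)
    refs = species_embs / np.linalg.norm(species_embs, axis=1, keepdims=True)
    scores = refs @ img  # cosine similarity to each candidate species
    return species_names[int(np.argmax(scores))]

# Toy shared space: 3 species prototypes and a query image embedding
# that lands near the first prototype
names = ["Cardinalis cardinalis", "Ursus arctos", "Bubo virginianus"]
protos = np.eye(3)
query = np.array([0.9, 0.1, 0.05])
print(zero_shot_classify(query, protos, names))  # -> Cardinalis cardinalis
```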

"During training we only need to maintain the synergy between ground-level images and other modalities," Sastry said. "That bridge then creates emergent synergies between the modalities – for example, between satellite images and audio – when TaxaBind is applied to retrieval tasks, even though those modes were not trained together."

This cross-modal retrieval was another area where TaxaBind outperformed state-of-the-art methods. For example, the combination of satellite images and ground-level species images allowed TaxaBind to retrieve habitat characteristics and climate data related to species' locations. It also returned relevant satellite images based on species images, demonstrating the tool's ability to link fine-grained ecological data with real-world environmental information.
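Because all modalities share one embedding space, retrieval across modalities is again a nearest-neighbor search: embed the query in one modality and rank a gallery embedded from another. The sketch below mirrors the species-image-to-satellite-image task described above, but with random stand-in embeddings and a 100-item gallery rather than the paper's 100k-image iSatNat gallery.

```python
import numpy as np

def retrieve_top_k(query_emb, gallery_embs, k=4):
    """Return indices of the k gallery items most similar to the query
    by cosine similarity, e.g. satellite images retrieved for a
    ground-level species image."""
    q = query_emb / np.linalg.norm(query_emb)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    scores = g @ q                 # cosine similarity to every gallery item
    return np.argsort(-scores)[:k]  # highest-scoring items first

# Toy gallery of 100 satellite-image embeddings; the query is a noisy
# copy of item 42, so item 42 should rank first
rng = np.random.default_rng(1)
gallery = rng.normal(size=(100, 16))
query = gallery[42] + 0.05 * rng.normal(size=16)
print(retrieve_top_k(query, gallery))
```

In practice the same function works in either direction (satellite to species, audio to satellite, and so on), which is what makes the emergent cross-modal synergies usable without any paired training between those modalities.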

The implications of TaxaBind extend far beyond species classification. Sastry notes that the models are general-purpose and could potentially be used as foundational models for other ecology and climate-related applications, such as deforestation monitoring and habitat mapping. He also envisions future iterations of the technology that can make sense of natural language text inputs to respond to user queries.


Source:
Journal reference:
  • Sastry, S., Khanal, S., Dhakal, A., Ahmad, A., & Jacobs, N. (2024). TaxaBind: A Unified Embedding Space for Ecological Applications. arXiv preprint. https://arxiv.org/abs/2411.00683

