TreeFormer: Revolutionizing Tree Counting in Aerial and Satellite Images with Transformer Power

In a recent paper submitted to the arXiv* server, researchers introduced TreeFormer, a novel semi-supervised framework based on the transformer architecture for accurately estimating tree counts in aerial and satellite images. This article outlines how TreeFormer works, the results it reports, and why they matter for fields such as forestry and urban planning.

Study: TreeFormer: Revolutionizing Tree Counting in Aerial and Satellite Images with Transformer Power. Image credit: Maksim Safaniuk / Shutterstock

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

Background

The role of trees in maintaining ecological balance and planetary health cannot be overstated. Counting trees using high-resolution images has practical applications in forest inventory, farm management, urban planning, and crop estimation. Traditional methods such as field surveys are time-consuming and expensive; aerial and satellite images, together with light detection and ranging (LiDAR) data, offer an accurate alternative.

To overcome the high cost of labeling a large number of trees in supervised methods, researchers introduced TreeFormer, a semi-supervised framework based on transformer architecture. TreeFormer incorporates a pyramid vision transformer for feature extraction and a contextual attention-based feature fusion module.

Additionally, they proposed a pyramid learning strategy that leverages unlabeled data through local tree density consistency and local tree count ranking losses. They also developed a tree counter token for estimating global tree counts. The proposed method is reported to outperform state-of-the-art approaches on two benchmark datasets (Jiangsu and Yosemite) and on a newly created dataset (KCL-London) with manually annotated tree locations.

Related work

Object counting: In object counting, various methods have been developed for different objects, including humans, cells, cars, and trees. Fully supervised methods achieve high performance but require extensive labeled data. Weakly or semi-supervised methods reduce the reliance on labeled data by incorporating unlabeled or weakly annotated data.

Tree counting: Tree counting poses additional challenges due to dense canopies and interlocking trees. Traditional methods involve detecting tree areas and using segmentation techniques, but their accuracy is limited. Deep neural networks (DNNs) have shown promise in tree detection and counting. Detection-based methods use bounding boxes to identify and count individual trees, while density estimation-based methods generate density maps to estimate tree numbers. These methods leverage DNNs and have demonstrated better performance.
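To make the density-estimation idea concrete, the following minimal sketch (Python with NumPy) builds a density map from point annotations by placing a normalized Gaussian at each tree location, so that the map sums to the number of trees. The function name `density_map`, the kernel width `sigma`, and the sample coordinates are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def density_map(points, height, width, sigma=2.0):
    """Place a unit-mass Gaussian at each annotated (row, col) tree location."""
    ys, xs = np.mgrid[0:height, 0:width]
    dmap = np.zeros((height, width), dtype=np.float64)
    for py, px in points:
        kernel = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2.0 * sigma ** 2))
        # Normalize over the image so each tree contributes exactly 1 to the sum.
        dmap += kernel / kernel.sum()
    return dmap

# Three hypothetical tree locations in a 64 x 64 image; two of them nearly overlap.
points = [(10, 12), (30, 40), (31, 42)]
dmap = density_map(points, 64, 64)
estimated_count = dmap.sum()  # integrating the density map recovers the tree count
```

Because the count is obtained by summing the map rather than by separating individual crowns, this representation tolerates overlapping canopies, where bounding boxes are hard to draw.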

However, limited research has focused on tree density estimation, and existing approaches often rely on basic DNN architectures. The scarcity of annotated training data in tree counting calls for an efficient semi-supervised framework.

Methodology

The researchers proposed a semi-supervised framework for estimating tree density maps from remote sensing images. The framework consists of an encoder-decoder architecture with transformer blocks. It includes a pyramid tree feature representation (PTFR) module in the encoder, a contextual attention-based feature fusion (CAFF) module in the decoder, a tree density regressor (TDR) module for density map estimation, and a tree counter token (TCT) module for tree counting.

The framework utilizes supervised distribution matching loss for labeled data and introduces local tree density consistency and local tree count ranking losses for unlabeled data. A global tree count regularization is applied to optimize the network's predictions.
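The article names the two unlabeled-data losses but does not give their formulas, so the sketch below shows only one generic way such terms are commonly written; it should not be read as the authors' formulation. It assumes that a region nested inside a larger region cannot contain more trees than the larger one (ranking), and that a fine density map, sum-pooled to a coarser resolution, should agree with the coarse map (consistency). All function names and the `margin` parameter are hypothetical.

```python
import numpy as np

def count_ranking_loss(nested_counts, margin=0.0):
    """Hinge penalty on predicted counts of nested regions, outermost first.

    A region contained in another should not receive a larger count,
    so any excess of the inner count over the outer count is penalized.
    """
    pairs = zip(nested_counts, nested_counts[1:])
    return sum(max(0.0, inner - outer + margin) for outer, inner in pairs)

def sum_pool(density, factor):
    """Downsample a density map by summing factor x factor blocks (preserves the count)."""
    h, w = density.shape
    return density.reshape(h // factor, factor, w // factor, factor).sum(axis=(1, 3))

def density_consistency_loss(fine, coarse):
    """Mean squared disagreement between a pooled fine map and a coarse map."""
    factor = fine.shape[0] // coarse.shape[0]
    return float(np.mean((sum_pool(fine, factor) - coarse) ** 2))

# Predicted counts for a full patch, a crop inside it, and a crop inside that crop.
ranking_penalty = count_ranking_loss([10.0, 6.0, 7.0])  # the innermost crop exceeds its parent

fine = np.full((4, 4), 0.25)
coarse = np.ones((2, 2))
consistency_penalty = density_consistency_loss(fine, coarse)  # the two maps agree
```

Neither term needs ground-truth annotations, which is what allows unlabeled images to contribute to training.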

Experiments

Datasets: Three datasets were used in the experiments.

  1. KCL-London dataset: This dataset contains high-resolution images (0.2 m ground sampling distance (GSD)) from London, divided into 308 unlabeled and 613 labeled images. The labeled set is further split into 452 training and 161 testing samples.
  2. Jiangsu dataset: This dataset consists of 24 Gaofen-II satellite images (0.8 m GSD) from Jiangsu Province, China. It contains 664,487 manually annotated trees across 2,400 images, divided into 1,920 training and 480 test samples.
  3. Yosemite dataset: This dataset covers Yosemite National Park, California, with a rectangular image of 19,200 × 38,400 pixels (0.12 m GSD). It contains 98,949 manually annotated trees and is split into 1,350 training and test samples.
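The figures quoted for the three datasets can be cross-checked with a few lines of arithmetic. The short Python sketch below uses only the numbers given above; the variable names are illustrative, and the ground footprint assumes the stated GSD applies uniformly across the Yosemite image.

```python
# Split sizes quoted in the dataset descriptions.
kcl_labeled = 452 + 161            # training + testing samples
kcl_total = 308 + kcl_labeled      # unlabeled + labeled images
jiangsu_images = 1920 + 480        # training + test samples
jiangsu_trees_per_image = 664_487 / jiangsu_images

# Ground footprint of the Yosemite image: pixels x metres per pixel.
yosemite_gsd_m = 0.12
side_a_km = 19_200 * yosemite_gsd_m / 1000
side_b_km = 38_400 * yosemite_gsd_m / 1000
yosemite_area_km2 = side_a_km * side_b_km
yosemite_trees_per_km2 = 98_949 / yosemite_area_km2

print(kcl_labeled, kcl_total, jiangsu_images)
print(round(jiangsu_trees_per_image, 1), round(yosemite_area_km2, 2))
```

The KCL-London and Jiangsu splits add up to the stated totals of 613 labeled images and 2,400 images respectively.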

Implementation details: The model uses an encoder-decoder architecture with a transformer-based encoder and three-scale density maps estimated by the decoder. Data augmentation techniques like horizontal flipping and random cropping are employed. The network is trained using the Adam optimizer with parameters fine-tuned on the KCL-London dataset.
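When a model is trained on density maps, the same flip and crop must be applied to the image and to its density map, otherwise the labels no longer line up with the pixels. The sketch below illustrates this with NumPy; the function name `augment`, the 50% flip probability, and the crop size are assumptions made for illustration, not the paper's settings.

```python
import numpy as np

def augment(image, density, crop_size, rng):
    """Apply the same random horizontal flip and random crop to image and density map."""
    if rng.random() < 0.5:
        image = image[:, ::-1]      # flip along the width axis
        density = density[:, ::-1]
    h, w = density.shape
    top = int(rng.integers(0, h - crop_size + 1))
    left = int(rng.integers(0, w - crop_size + 1))
    window = (slice(top, top + crop_size), slice(left, left + crop_size))
    return image[window], density[window]

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))       # hypothetical H x W x C image
density = image[..., 0].copy()      # stand-in "label" tied to the pixels for checking alignment
image_crop, density_crop = augment(image, density, crop_size=4, rng=rng)
```

Note that the crop's tree count is the sum of the cropped density map, so the target count changes with each crop and never needs to be recomputed from the point annotations.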

Evaluation and comparisons: The evaluation protocol involved dividing the training sets into labeled and unlabeled subsets. Performance metrics such as mean absolute error (E_MAE), R-squared (R²), root mean squared error (E_RMS), grid average mean absolute error (GAME), precision (P), recall (R), and F1-measure (F1) were utilized. Comparisons with state-of-the-art models were conducted in both semi-supervised and supervised settings, with TreeFormer outperforming existing methods in both groups.
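For readers unfamiliar with the count-based metrics, the following sketch computes them with NumPy using their standard textbook definitions. The paper may define some of them differently (for example, R² is taken here as the coefficient of determination, which is an assumption), and the sample counts are invented for illustration.

```python
import numpy as np

def mae(pred, true):
    """Mean absolute error between predicted and true per-image counts."""
    return float(np.mean(np.abs(pred - true)))

def rmse(pred, true):
    """Root mean squared error between predicted and true per-image counts."""
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def r_squared(pred, true):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    ss_res = np.sum((true - pred) ** 2)
    ss_tot = np.sum((true - true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def game(pred_density, true_density, level):
    """Grid error for one image: split into 4**level cells and sum the per-cell count errors."""
    cells = 2 ** level
    error = 0.0
    rows = zip(np.array_split(pred_density, cells, axis=0),
               np.array_split(true_density, cells, axis=0))
    for pred_rows, true_rows in rows:
        cols = zip(np.array_split(pred_rows, cells, axis=1),
                   np.array_split(true_rows, cells, axis=1))
        for pred_cell, true_cell in cols:
            error += abs(pred_cell.sum() - true_cell.sum())
    return float(error)

true_counts = np.array([100.0, 150.0, 200.0])   # invented ground-truth counts
pred_counts = np.array([110.0, 140.0, 205.0])   # invented predictions
scores = (mae(pred_counts, true_counts),
          rmse(pred_counts, true_counts),
          r_squared(pred_counts, true_counts))

# One tree predicted in the wrong corner: the global count is right, the location is not.
true_density = np.zeros((4, 4)); true_density[0, 0] = 1.0
pred_density = np.zeros((4, 4)); pred_density[3, 3] = 1.0
game_levels = (game(pred_density, true_density, 0), game(pred_density, true_density, 1))
```

The last example shows why GAME complements the global errors: at level 0 it reduces to the absolute count error and sees nothing wrong, while at level 1 it penalizes the misplaced tree. Averaging `game` over all test images gives the reported metric.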

Overall, TreeFormer demonstrates superior performance compared to state-of-the-art models, showcasing the effectiveness of its architecture and learning strategy.

Conclusion

TreeFormer presents a significant advance in tree counting from remote sensing images. The semi-supervised framework, built on the transformer architecture, combines feature fusion and tree density estimation modules to improve feature extraction and density mapping accuracy.

The proposed pyramid learning strategy enhances performance by incorporating local tree count ranking and density consistency. The results on multiple datasets demonstrate TreeFormer's superiority over existing models. Future work should focus on improving generalizability across diverse datasets by employing domain adaptation techniques and considering regional variations in tree shapes.


Journal reference:

Written by

Dr. Sampath Lonka

Dr. Sampath Lonka is a scientific writer based in Bangalore, India, with a strong academic background in Mathematics and extensive experience in content writing. He has a Ph.D. in Mathematics from the University of Hyderabad and is deeply passionate about teaching, writing, and research. Sampath enjoys teaching Mathematics, Statistics, and AI to both undergraduate and postgraduate students. What sets him apart is his unique approach to teaching Mathematics through programming, making the subject more engaging and practical for students.

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Lonka, Sampath. (2023, July 16). TreeFormer: Revolutionizing Tree Counting in Aerial and Satellite Images with Transformer Power. AZoAi. Retrieved on November 21, 2024 from https://www.azoai.com/news/20230716/TreeFormer-Revolutionizing-Tree-Counting-in-Aerial-and-Satellite-Images-with-Transformer-Power.aspx.

  • MLA

    Lonka, Sampath. "TreeFormer: Revolutionizing Tree Counting in Aerial and Satellite Images with Transformer Power". AZoAi. 21 November 2024. <https://www.azoai.com/news/20230716/TreeFormer-Revolutionizing-Tree-Counting-in-Aerial-and-Satellite-Images-with-Transformer-Power.aspx>.

  • Chicago

    Lonka, Sampath. "TreeFormer: Revolutionizing Tree Counting in Aerial and Satellite Images with Transformer Power". AZoAi. https://www.azoai.com/news/20230716/TreeFormer-Revolutionizing-Tree-Counting-in-Aerial-and-Satellite-Images-with-Transformer-Power.aspx. (accessed November 21, 2024).

  • Harvard

    Lonka, Sampath. 2023. TreeFormer: Revolutionizing Tree Counting in Aerial and Satellite Images with Transformer Power. AZoAi, viewed 21 November 2024, https://www.azoai.com/news/20230716/TreeFormer-Revolutionizing-Tree-Counting-in-Aerial-and-Satellite-Images-with-Transformer-Power.aspx.
