CrisisViT: Transforming Emergency Response with Image Classification

In an article recently submitted to the arXiv* preprint server, researchers proposed a novel method for automatic image classification, or image tagging, in crisis response scenarios. The method is built on a transformer-based architecture, specifically a vision transformer (ViT) variant called CrisisViT.

Study: CrisisViT: Transforming Emergency Response with Image Classification. Image credit: Jaromir Chalabala/Shutterstock

*Important notice: arXiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as definitive, used to guide development decisions, or treated as established information in the field of artificial intelligence research.

Leveraging the Incidents1M crisis image dataset, the model outperformed previous methods in disaster type, informativeness (image relevance), humanitarian category, and damage severity classification. The ubiquity of smartphones and social media enables citizens to contribute valuable information during emergencies, and the proposed CrisisViT model offers crisis responders an efficient way to analyze and categorize such images rapidly, aiding timely decision-making.

Background

The increasing frequency of crisis events prompts the need for effective crisis response strategies. Social media platforms have become vital sources of information during crises, as users share relevant content that can aid emergency responders. While previous studies have emphasized the significance of social media for crisis information acquisition, the sheer volume of posts necessitates automated tools to extract actionable information promptly. Earlier machine learning approaches focused on analyzing textual content from social media during crises. However, there is growing recognition of the value of crisis-related images in providing essential information, aiding resource allocation, and assessing damage severity.

Existing solutions often rely on deep convolutional neural networks (CNNs), pre-trained on non-crisis image datasets, for image classification. Nevertheless, the effectiveness of such models in accurately categorizing crisis imagery, especially in terms of disaster type, informativeness, humanitarian categories, and damage severity, raises concerns. This paper addressed the limitations of previous approaches by introducing CrisisViT. Unlike conventional CNNs, CrisisViT explored the pretraining of models using crisis imagery from the Incidents1M dataset, emphasizing in-domain learning for improved performance. The study compared CrisisViT models with established deep CNNs and ViT models on the Crisis Image Benchmark dataset, demonstrating significant accuracy improvements.

Methodology and Experimental Setup

The researchers explored the efficacy of pre-training a state-of-the-art transformer-based image classification model, ViT, on a large-scale crisis image dataset, Incidents1M. A new variant, CrisisViT, was proposed to enhance performance and robustness across various crisis image classification tasks. Two primary decisions controlled the model's construction, which were the choice of the pre-training dataset and the methodology for pre-training. The pre-training datasets considered were ImageNet-1k (representing general image classification) and Incidents1M (specialized in-domain crisis imagery).

Three pre-training strategies were employed: ImageNet-1k + Incidents1M, Incidents1M only, and self-supervised training. The Incidents1M dataset encompassed 43 incident categories and 49 place categories in total. Various pre-training tasks were explored, including binary, incident or place, dual, and self-supervised training. CrisisViT employed the ViT-base model architecture with different hyperparameters for each pre-training strategy. The experimental setup involved evaluating the model on the Crisis Image Benchmark dataset, which covered disaster type classification, informativeness, humanitarian categories, and damage severity tasks.
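The label spaces described above can be summarized in a minimal sketch. This is an illustrative reconstruction, not the authors' code: the `PretrainTask` class and its fields are hypothetical names, and dual training is modeled here as one classification head per label space (43 incident plus 49 place categories) rather than a joint label space.

```python
from dataclasses import dataclass

# Incidents1M label spaces as reported in the paper.
INCIDENT_CLASSES = 43
PLACE_CLASSES = 49

@dataclass
class PretrainTask:
    """Hypothetical description of one supervised pre-training task."""
    name: str
    use_incidents: bool
    use_places: bool

    @property
    def num_classes(self) -> int:
        # Dual training supervises both label sets; we assume one output
        # head per label space, so the class counts simply add up.
        total = 0
        if self.use_incidents:
            total += INCIDENT_CLASSES
        if self.use_places:
            total += PLACE_CLASSES
        return total

TASKS = [
    PretrainTask("incident", use_incidents=True, use_places=False),
    PretrainTask("place", use_incidents=False, use_places=True),
    PretrainTask("dual", use_incidents=True, use_places=True),
]

for task in TASKS:
    print(task.name, task.num_classes)  # incident 43, place 49, dual 92
```

Under this assumption, the dual strategy trains against 92 output classes in total; the self-supervised strategy needs no labels and is omitted from the sketch.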

The comparison included popular models such as ResNet101, EffiNet (b1), VGG16, and ViT-Base as baselines. Classification accuracy served as the performance metric, with each experiment conducted at least three times and the results averaged. The authors aimed to determine how much a large-scale crisis image dataset improved transformer-based models' performance in crisis content categorization, providing insights into best training practices. The training configuration included the Adam optimizer, batch sizes of 1024 and 128 for self-supervised and supervised learning, respectively, and the rectified linear unit (ReLU) activation function. Additionally, experiments explored different batch sizes and pre-training epochs on the Incidents1M dataset.
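The evaluation protocol of repeating each experiment at least three times and averaging can be sketched as follows. This is a minimal illustration of the reporting convention only; the function name and the example accuracy values are hypothetical, not results from the paper.

```python
from statistics import mean

def averaged_accuracy(run_accuracies: list[float]) -> float:
    """Average classification accuracy over repeated runs.

    The paper's protocol runs each experiment at least three times
    and reports the mean, which smooths out run-to-run variance
    from random initialization and data shuffling.
    """
    if len(run_accuracies) < 3:
        raise ValueError("protocol calls for at least three runs")
    return mean(run_accuracies)

# Illustrative values only, not figures from the study.
print(round(averaged_accuracy([0.861, 0.874, 0.868]), 4))  # 0.8677
```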

Study Results

The experimental results addressed three research questions regarding the impact of pre-training on the CrisisViT model using the Incidents1M crisis image dataset.

  • ViT vs. Convolutional Neural Baselines: Transformer-based architecture ViT outperformed CNN baselines (ResNet101, EffiNet (b1), VGG16) across crisis image classification tasks. ViT demonstrated superior accuracy, particularly in disaster type classification, humanitarian category classification, and damage severity estimation.
  • Pre-training using Incident Types and Place Categories: Pre-training CrisisViT with Incidents1M yielded improved performance over ImageNet-1k pre-training. Place category labels led to the best results, outperforming incident labels. Combining incident and place labels did not enhance performance significantly. The researchers concluded that pre-training with an in-domain dataset could yield performance gains, but the datasets for pre-training should be chosen selectively.
  • ImageNet-1k + Incidents1M: Augmenting ImageNet-1k pre-training with Incidents1M did not consistently improve performance. While models pre-trained on incident or incident+places labels showed a small performance uplift, it remained unclear whether starting from a pre-trained ViT-Base model was superior to training a new model.

Conclusion

In conclusion, the researchers introduced CrisisViT, a transformer-based image classifier pre-trained on the Incidents1M crisis dataset for improved crisis image classification on social media. Experimentation on disaster type, informativeness, humanitarian category, and damage severity tasks showed significant accuracy gains, averaging 1.25%. The findings highlighted the potential of transformer-based models and Incidents1M for enhancing crisis response tools that leverage social media images for emergency efforts.



Written by

Soham Nandi

Soham Nandi is a technical writer based in Memari, India. His academic background is in Computer Science Engineering, specializing in Artificial Intelligence and Machine Learning. He has extensive experience in Data Analytics, Machine Learning, and Python, and has worked on group projects involving Computer Vision, Image Classification, and App Development.

