Long, Z., McCreadie, R., & Imran, M. (2023). CrisisViT: A Robust Vision Transformer for Crisis Image Classification. In Jaziar Radianti, Ioannis Dokas, Nicolas Lalone, & Deepak Khazanchi (Eds.), Proceedings of the 20th International ISCRAM Conference (pp. 309–319). Omaha, USA: University of Nebraska at Omaha.
Abstract: In times of emergency, crisis response agencies need to quickly and accurately assess the situation on the ground in order to deploy relevant services and resources. However, authorities often have to make decisions based on limited information, as data on affected regions can be scarce until local response services can provide first-hand reports. Fortunately, the widespread availability of smartphones with high-quality cameras has made citizen journalism through social media a valuable source of information for crisis responders. However, analyzing the large volume of images posted by citizens requires more time and effort than is typically available. To address this issue, this paper proposes the use of state-of-the-art deep neural models for automatic image classification/tagging, specifically by adapting transformer-based architectures for crisis image classification (CrisisViT). We leverage the new Incidents1M crisis image dataset to develop a range of new transformer-based image classification models. Through experimentation over the standard Crisis image benchmark dataset, we demonstrate that the CrisisViT models significantly outperform previous approaches in emergency type, image relevance, humanitarian category, and damage severity classification. Additionally, we show that the new Incidents1M dataset can further augment the CrisisViT models resulting in an additional 1.25% absolute accuracy gain.
Zijun Long, & Richard McCreadie. (2022). Is Multi-Modal Data Key for Crisis Content Categorization on Social Media? In Rob Grace, & Hossein Baharmand (Eds.), ISCRAM 2022 Conference Proceedings – 19th International Conference on Information Systems for Crisis Response and Management (pp. 1068–1080). Tarbes, France.
Abstract: The user base of social media platforms, like Twitter, has grown dramatically around the world over the last decade. As people post everything they experience on social media, large volumes of valuable multimedia content are being recorded online, which can be analysed to help with a range of tasks. Here we specifically focus on crisis response. The majority of prior works in this space focus on using machine learning to categorize single-modality content (e.g. the text of posts, or images shared), with few works jointly utilizing multiple modalities. Hence, in this paper, we examine to what extent integrating multiple modalities is important for crisis content categorization. In particular, we design a pipeline for multi-modal learning that fuses textual and visual inputs, leverages both, and then classifies that content based on the specified task. Through evaluation using the CrisisMMD dataset, we demonstrate that effective automatic labelling for this task is possible, with an average of 88.31% F1 performance across two significant tasks (relevance and humanitarian category classification), while also analysing cases where unimodal and multi-modal models succeed and fail.