Abstract
The user-base of social media platforms, like Twitter, has grown dramatically around the world over the last decade. As people post about their experiences on social media, large volumes of valuable multimedia content are being recorded online, which can be analysed to support a range of tasks. Here we focus specifically on crisis response. The majority of prior works in this space use machine learning to categorize single-modality content (e.g., the text of posts or the images shared), with few works jointly utilizing multiple modalities. Hence, in this paper, we examine the extent to which integrating multiple modalities is important for crisis content categorization. In particular, we design a multi-modal learning pipeline that fuses the textual and visual inputs of a post and classifies that content according to the specified task. Through evaluation on the CrisisMMD dataset, we demonstrate that effective automatic labelling for this task is possible, with an average F1 of 88.31% across two significant tasks (relevance and humanitarian category classification), while also analysing cases where unimodal and multi-modal models succeed and fail.
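To make the fusion idea concrete, the following is a minimal sketch of a late-fusion text-image classifier of the kind summarised above; the encoder choices (e.g. BERT and ResNet-50 features), feature dimensions, and fusion strategy are illustrative assumptions rather than the paper's exact architecture.

```python
# Minimal late-fusion sketch (illustrative only): a text encoder and an image
# encoder each produce a feature vector; the two are projected, concatenated,
# and passed to a small classification head. All dimensions are assumptions.
import torch
import torch.nn as nn


class LateFusionClassifier(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, hidden_dim=512, num_classes=2):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # Classification head over the concatenated (fused) representation.
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(2 * hidden_dim, num_classes),
        )

    def forward(self, text_features, image_features):
        fused = torch.cat(
            [self.text_proj(text_features), self.image_proj(image_features)], dim=-1
        )
        return self.classifier(fused)


# Example: a batch of 4 posts with precomputed text (e.g. BERT [CLS]) and
# image (e.g. ResNet-50 pooled) features, classified as relevant / not relevant.
model = LateFusionClassifier(num_classes=2)
logits = model(torch.randn(4, 768), torch.randn(4, 2048))
```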