Hongmin Li, Nicolais Guevara, Nic Herndon, Doina Caragea, Kishore Neppalli, Cornelia Caragea, et al. (2015). Twitter Mining for Disaster Response: A Domain Adaptation Approach. In L. Palen, M. Buscher, T. Comes, & A. Hughes (Eds.), ISCRAM 2015 Conference Proceedings – 12th International Conference on Information Systems for Crisis Response and Management. Kristiansand, Norway: University of Agder (UiA).
Abstract: Microblogging data such as Twitter data contains valuable information that has the potential to help improve the speed, quality, and efficiency of disaster response. Machine learning can help with this by prioritizing the tweets with respect to various classification criteria. However, supervised learning algorithms require labeled data to learn accurate classifiers. Unfortunately, for a new disaster, labeled tweets are not easily available, while they are usually available for previous disasters. Furthermore, unlabeled tweets from the current disaster are accumulating fast. We study the usefulness of labeled data from a prior source disaster, together with unlabeled data from the current target disaster to learn domain adaptation classifiers for the target. Experimental results suggest that, for some tasks, source data itself can be useful for classifying target data. However, for tasks specific to a particular disaster, domain adaptation approaches that use target unlabeled data in addition to source labeled data are superior.
|
|
|
Hongmin Li, Xukun Li, Doina Caragea, & Cornelia Caragea. (2018). Comparison of Word Embeddings and Sentence Encodings for Generalized Representations in Crisis Tweet Classifications. In Kristin Stock, & Deborah Bunker (Eds.), Proceedings of ISCRAM Asia Pacific 2018: Innovating for Resilience – 1st International Conference on Information Systems for Crisis Response and Management Asia Pacific. (pp. 480–493). Albany, Auckland, New Zealand: Massey University.
Abstract: Many machine learning and natural language processing techniques, including supervised and domain adaptation algorithms, have been proposed and studied in the context of filtering crisis tweets. However, applying these approaches in real time is still challenging because of the time-critical requirements of emergency response operations and the diversity and unique characteristics of emergency events. In this paper, we explore the idea of building “generalized” classifiers for filtering crisis tweets that can be pre-trained, and are thus ready to use in real time, while generalizing well to future disaster/crisis data. We propose to achieve this using simple feature-based adaptation with tweet representations based on word embeddings and sentence-level embeddings; these representations do not rely on unlabeled data to achieve domain adaptation and can be easily implemented. Given that different types of word/sentence embeddings are widely used, we compare them to get a general idea of which type works better for crisis tweet classification tasks. Our experimental results show that GloVe embeddings in general work better with the datasets used in our evaluation, and that the supervised algorithms used in our experiments benefit from GloVe embeddings trained specifically on crisis data. Furthermore, our experimental results show that, after GloVe, sentence embeddings have great potential in crisis tweet tasks.
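As an illustration of a feature-based tweet representation built from word embeddings, the sketch below averages the vectors of a tweet's in-vocabulary words into one fixed-length vector. Mean pooling is an assumption here (the abstract does not state how word vectors are combined), and the two-dimensional `toy` vocabulary is hypothetical, not real GloVe vectors.

```python
from typing import Dict, List


def tweet_embedding(tokens: List[str], vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens; return a zero vector if none are known.

    Mean pooling is an assumed choice for illustration, not the paper's stated method.
    """
    known = [vectors[t] for t in tokens if t in vectors]  # skip out-of-vocabulary words
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Hypothetical two-dimensional "embeddings", for demonstration only.
toy = {"flood": [1.0, 0.0], "help": [0.0, 1.0]}
print(tweet_embedding(["flood", "help", "now"], toy, 2))  # [0.5, 0.5]
```

Because the resulting vector depends only on pre-trained embeddings, a classifier trained on it needs no unlabeled target data, which matches the motivation given in the abstract.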
|
|
|
Nilani Algiriyage, Rangana Sampath, Raj Prasanna, Kristin Stock, Emma Hudson-Doyle, & David Johnston. (2021). Identifying Disaster-related Tweets: A Large-Scale Detection Model Comparison. In Anouck Adrot, Rob Grace, Kathleen Moore, & Christopher W. Zobel (Eds.), ISCRAM 2021 Conference Proceedings – 18th International Conference on Information Systems for Crisis Response and Management (pp. 731–743). Blacksburg, VA (USA): Virginia Tech.
Abstract: Social media applications such as Twitter and Facebook are fast becoming a key instrument in gaining situational awareness (understanding the bigger picture of the situation) during disasters. This has provided multiple opportunities to gather relevant information in a timely manner to improve disaster response. In recent years, identifying crisis-related social media posts has been approached as an automatic task using machine learning (ML) or deep learning (DL) techniques. However, such supervised learning algorithms require labelled training data in the early hours of a crisis. Recently, multiple manually labelled, disaster-related, open-source Twitter datasets have been released. In this work, we create a large dataset of 186,718 tweets by combining a number of such datasets and evaluate the performance of multiple ML and DL algorithms in classifying disaster-related tweets in three settings, namely “in-disaster”, “out-disaster” and “cross-disaster”. Our results show that the Bidirectional LSTM model with Word2Vec embeddings performs well for the tweet classification task in all three settings. We also make available the preprocessing steps and trained weights for future research.
|
|
|
Reza Mazloom, Hongmin Li, Doina Caragea, Muhammad Imran, & Cornelia Caragea. (2018). Classification of Twitter Disaster Data Using a Hybrid Feature-Instance Adaptation Approach. In Kees Boersma, & Brian Tomaszewski (Eds.), ISCRAM 2018 Conference Proceedings – 15th International Conference on Information Systems for Crisis Response and Management (pp. 727–735). Rochester, NY (USA): Rochester Institute of Technology.
Abstract: Huge amounts of data that are generated on social media during emergency situations are regarded as troves of critical information. The use of supervised machine learning techniques in the early stages of a disaster is challenged by the lack of labeled data for that particular disaster. Furthermore, supervised models trained on labeled data from a prior disaster may not produce accurate results, given the inherent variation between the current and the prior disasters. To address the challenges posed by the lack of labeled data for a target disaster, we propose to use a hybrid feature-instance adaptation approach based on matrix factorization and the k-nearest neighbors algorithm, respectively. The proposed hybrid adaptation approach is used to select a subset of the source disaster data that is representative of the target disaster. The selected subset is subsequently used to learn accurate Naive Bayes classifiers for the target disaster.
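A minimal sketch of a k-nearest-neighbors instance-selection step, assuming that source and target tweets are already represented in a shared feature space (the matrix factorization step is not shown) and that the selected subset is the union of each target tweet's k nearest source tweets. The function name `select_source_subset` and the toy points are hypothetical; the paper's exact selection rule may differ.

```python
import math
from typing import Sequence, Set


def select_source_subset(
    source: Sequence[Sequence[float]],
    target: Sequence[Sequence[float]],
    k: int,
) -> Set[int]:
    """Return indices of source points that are among the k nearest neighbors
    (Euclidean distance) of at least one target point.

    Hypothetical helper: assumes features already live in a shared space.
    """
    selected: Set[int] = set()
    for t in target:
        # Rank all source points by distance to this (unlabeled) target point.
        ranked = sorted(range(len(source)), key=lambda i: math.dist(source[i], t))
        selected.update(ranked[:k])
    return selected


# Toy 2-D points, for demonstration only.
source = [[0.0, 0.0], [10.0, 10.0], [0.5, 0.5], [9.0, 9.0]]
target = [[0.2, 0.1], [9.8, 9.8]]
print(sorted(select_source_subset(source, target, k=1)))  # [0, 1]
```

Under these assumptions, the labeled source tweets at the returned indices would form the training set for the target classifier (Naive Bayes, according to the abstract), and the remaining source tweets would be discarded as unrepresentative of the target disaster.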
|
|
|
Tiberiu Sosea, Iustin Sirbu, Cornelia Caragea, Doina Caragea, & Traian Rebedea. (2021). Using the Image-Text Relationship to Improve Multimodal Disaster Tweet Classification. In Anouck Adrot, Rob Grace, Kathleen Moore, & Christopher W. Zobel (Eds.), ISCRAM 2021 Conference Proceedings – 18th International Conference on Information Systems for Crisis Response and Management (pp. 691–704). Blacksburg, VA (USA): Virginia Tech.
Abstract: In this paper, we show that the text-image relationship of disaster tweets can be used to improve the classification of tweets from emergency situations. To this end, we introduce DisRel, a dataset which contains 4,600 multimodal tweets, collected during the disasters that hit the USA in 2017, and manually annotated with coherence image-text relationships, such as Similar and Complementary. We explore multiple models to detect these relationships and perform a comprehensive analysis into the robustness of these methods. Based on these models, we build a simple feature augmentation approach that can leverage the text-image relationship. We test our methods on 2 tasks in CrisisMMD: Humanitarian Categories and Damage Assessment, and observe an increase in the performance of the relationship-aware methods.
|
|
|
Xukun Li, & Doina Caragea. (2020). Improving Disaster-related Tweet Classification with a Multimodal Approach. In Amanda Hughes, Fiona McNeill, & Christopher W. Zobel (Eds.), ISCRAM 2020 Conference Proceedings – 17th International Conference on Information Systems for Crisis Response and Management (pp. 893–902). Blacksburg, VA (USA): Virginia Tech.
Abstract: Social media data analysis is important for disaster management. Many prior studies have focused on classifying a tweet based on its text or based on its images, independently, even if the tweet contains both text and images. Under the assumption that text and images may contain complementary information, it is of interest to construct classifiers that make use of both modalities of the tweet. Towards this goal, we propose a multimodal classification model which aggregates text and image information. Our study aims to provide insights into the benefits obtained by combining text and images, and to understand which modality is more informative with respect to disaster tweet classification. Experimental results show that both text and image classification can be improved by the multimodal approach.
|
|
|
Zijun Long, & Richard McCreadie. (2021). Automated Crisis Content Categorization for COVID-19 Tweet Streams. In Anouck Adrot, Rob Grace, Kathleen Moore, & Christopher W. Zobel (Eds.), ISCRAM 2021 Conference Proceedings – 18th International Conference on Information Systems for Crisis Response and Management (pp. 667–678). Blacksburg, VA (USA): Virginia Tech.
Abstract: Social media platforms, like Twitter, are increasingly used by billions of people internationally to share information. As such, these platforms contain vast volumes of real-time multimedia content about the world, which could be invaluable for a range of tasks such as incident tracking, damage estimation during disasters, insurance risk estimation, and more. By mining this real-time data, there are substantial economic benefits, as well as opportunities to save lives. Currently, the COVID-19 pandemic is attacking societies at an unprecedented speed and scale, forming an important use-case for social media analysis. However, the amount of information during such crisis events is vast, and this information normally exists in multiple unstructured formats, making manual analysis very time-consuming. Hence, in this paper, we examine how to extract valuable information from tweets related to COVID-19 automatically. For 12 geographical locations, we experiment with supervised approaches for labelling tweets into 7 crisis categories and investigate automatic priority estimation, using both classical and deep learning approaches. Through evaluation using the TREC-IS 2020 COVID-19 datasets, we demonstrate that effective automatic labelling for this task is possible, with an average F1 of 61% across crisis categories, while also analysing key factors that affect model performance and model generalizability across locations.
|
|
|
Zou, H. P., Caragea, C., Zhou, Y., & Caragea, D. (2023). Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification. In Jaziar Radianti, Ioannis Dokas, Nicolas Lalone, & Deepak Khazanchi (Eds.), Proceedings of the 20th International ISCRAM Conference (pp. 385–395). Omaha, USA: University of Nebraska at Omaha.
Abstract: Real-time information about natural disasters shared on social media platforms like Twitter and Facebook plays a critical role in informing volunteers, emergency managers, and response organizations. However, supervised learning models for monitoring disaster events require large amounts of annotated data, making them unrealistic for real-time use in disaster events. To address this challenge, we present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting, where only a small amount of annotated data is required. Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled examples and large amounts of unlabeled data, mimicking the early stage of a disaster. By integrating effective semi-supervised learning ideas and incorporating TextMixUp, CrisisMatch achieves an average performance improvement of 11.2% on two disaster datasets. Further analyses are also provided on the influence of the amount of labeled data and on out-of-domain results.
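TextMixUp is named but not defined in the abstract. The sketch below shows the generic MixUp idea as it is commonly applied to text: two hidden representations and their one-hot labels are linearly interpolated with a coefficient drawn from a Beta(α, α) distribution. Applying it to sentence-level vectors and the default α = 0.75 are assumptions for illustration, not CrisisMatch's published settings.

```python
import random
from typing import List, Optional, Tuple


def interpolate(a: List[float], b: List[float], lam: float) -> List[float]:
    """Element-wise lam * a + (1 - lam) * b."""
    return [lam * x + (1 - lam) * y for x, y in zip(a, b)]


def mixup(
    h1: List[float], y1: List[float],
    h2: List[float], y2: List[float],
    alpha: float = 0.75,  # hypothetical default, not taken from the paper
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float], float]:
    """Mix two (representation, one-hot label) pairs with lam ~ Beta(alpha, alpha)."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    return interpolate(h1, h2, lam), interpolate(y1, y2, lam), lam


# Two toy sentence vectors with one-hot labels for a two-class task.
h, y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=random.Random(0))
print(h, y, lam)
```

Because the same coefficient mixes both the inputs and the labels, the mixed label remains a valid probability distribution, which lets the model train on synthetic points between labeled examples when labeled data is scarce.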