Cornelia Caragea, Adrian Silvescu, & Andrea Tapia. (2016). Identifying Informative Messages in Disasters using Convolutional Neural Networks. In A. Tapia, P. Antunes, V.A. Bañuls, K. Moore, & J. Porto (Eds.), ISCRAM 2016 Conference Proceedings – 13th International Conference on Information Systems for Crisis Response and Management. Rio de Janeiro, Brazil: Federal University of Rio de Janeiro.
Abstract: Social media is a vital source of information during any major event, especially natural disasters. Data produced through social networking sites is seen as ubiquitous, rapid and accessible, and it is believed to empower average citizens to become more situationally aware during disasters and coordinate to help themselves. However, with the exponential increase in the volume of social media data comes an increase in data that are irrelevant to a disaster, diminishing people’s ability to find the information that they need in order to organize relief efforts, find help, and potentially save lives. In this paper, we present an approach to identifying informative messages in social media streams during disaster events. Our approach is based on Convolutional Neural Networks and shows significant improvement in performance over models that use the “bag of words” and n-grams as features on several datasets of messages from flooding events.
|
Hongmin Li, Doina Caragea, & Cornelia Caragea. (2021). Combining Self-training with Deep Learning for Disaster Tweet Classification. In Anouck Adrot, Rob Grace, Kathleen Moore, & Christopher W. Zobel (Eds.), ISCRAM 2021 Conference Proceedings – 18th International Conference on Information Systems for Crisis Response and Management (pp. 719–730). Blacksburg, VA (USA): Virginia Tech.
Abstract: Significant progress has been made towards automated classification of disaster or crisis related tweets using machine learning approaches. Deep learning models, such as Convolutional Neural Networks (CNN), domain adaptation approaches based on self-training, and approaches based on pre-trained language models, such as BERT, have been proposed and used independently for disaster tweet classification. In this paper, we propose to combine self-training with CNN and BERT models, respectively, to improve the performance on the task of identifying crisis related tweets in a target disaster where labeled data is assumed to be unavailable, while unlabeled data is available. We evaluate the resulting self-training models on three crisis tweet collections and find that: 1) the pre-trained language model BERTweet is better than the standard BERT model, when fine-tuned for downstream crisis tweet classification; 2) self-training can help improve the performance of the CNN and BERTweet models for larger unlabeled target datasets, but not for smaller datasets.
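The self-training setup the abstract describes follows the generic pseudo-labeling loop: train on labeled source data, predict on unlabeled target data, and fold the most confident predictions back into the training set. A minimal sketch of that loop is below; the toy nearest-centroid classifier over 2-D points, the `margin` confidence threshold, and the `self_train` function are illustrative stand-ins, not the CNN/BERTweet models or scoring used in the paper.

```python
def centroid(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def self_train(labeled, unlabeled, rounds=3, margin=15.0):
    """Generic self-training loop (hypothetical toy version).

    labeled:   list of ((x, y), cls) pairs with cls in {0, 1}
    unlabeled: list of (x, y) points
    Each round, points whose distance gap between the two class
    centroids exceeds `margin` are treated as confident, pseudo-labeled,
    and added to the labeled set; ambiguous points stay in the pool.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        c0 = centroid([p for p, c in labeled if c == 0])
        c1 = centroid([p for p, c in labeled if c == 1])
        confident, rest = [], []
        for p in pool:
            d0, d1 = dist2(p, c0), dist2(p, c1)
            if abs(d0 - d1) > margin:  # confident: far closer to one class
                confident.append((p, 0 if d0 < d1 else 1))
            else:
                rest.append(p)
        if not confident:  # nothing new to learn from; stop early
            break
        labeled += confident
        pool = rest
    return labeled
```

In the paper's setting, the classifier would instead be a CNN or fine-tuned BERTweet model and the confidence score would come from its predicted class probabilities, but the pseudo-labeling structure is the same.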
|
Muhammad Imran, Prasenjit Mitra, & Jaideep Srivastava. (2016). Cross-Language Domain Adaptation for Classifying Crisis-Related Short Messages. In A. Tapia, P. Antunes, V.A. Bañuls, K. Moore, & J. Porto (Eds.), ISCRAM 2016 Conference Proceedings – 13th International Conference on Information Systems for Crisis Response and Management. Rio de Janeiro, Brazil: Federal University of Rio de Janeiro.
Abstract: Rapid crisis response requires real-time analysis of messages. After a disaster happens, volunteers attempt to classify tweets to determine needs, e.g., supplies, infrastructure damage, etc. Given labeled data, supervised machine learning can help classify these messages. Scarcity of labeled data causes poor performance in model training. Can we reuse old tweets to train classifiers? How can we choose labeled tweets for training? Specifically, we study the usefulness of labeled data of past events. Do labeled tweets in a different language help? We observe the performance of our classifiers trained using different combinations of training sets obtained from past disasters. We perform extensive experimentation on real crisis datasets and show that the past labels are useful when both source and target events are of the same type (e.g., both earthquakes). For similar languages (e.g., Italian and Spanish), cross-language domain adaptation was useful; however, for different languages (e.g., Italian and English), performance decreased.
|