Hongmin Li, Doina Caragea, & Cornelia Caragea. (2021). Combining Self-training with Deep Learning for Disaster Tweet Classification. In Anouck Adrot, Rob Grace, Kathleen Moore, & Christopher W. Zobel (Eds.), ISCRAM 2021 Conference Proceedings – 18th International Conference on Information Systems for Crisis Response and Management (pp. 719–730). Blacksburg, VA (USA): Virginia Tech.
Abstract: Significant progress has been made towards automated classification of disaster or crisis related tweets using machine learning approaches. Deep learning models, such as Convolutional Neural Networks (CNN), domain adaptation approaches based on self-training, and approaches based on pre-trained language models, such as BERT, have been proposed and used independently for disaster tweet classification. In this paper, we propose to combine self-training with CNN and BERT models, respectively, to improve the performance on the task of identifying crisis related tweets in a target disaster where labeled data is assumed to be unavailable, while unlabeled data is available. We evaluate the resulting self-training models on three crisis tweet collections and find that: 1) the pre-trained language model BERTweet is better than the standard BERT model, when fine-tuned for downstream crisis tweets classification; 2) self-training can help improve the performance of the CNN and BERTweet models for larger unlabeled target datasets, but not for smaller datasets.
Congcong Wang, Paul Nulty, & David Lillis. (2021). Crisis Domain Adaptation Using Sequence-to-Sequence Transformers. In Anouck Adrot, Rob Grace, Kathleen Moore, & Christopher W. Zobel (Eds.), ISCRAM 2021 Conference Proceedings – 18th International Conference on Information Systems for Crisis Response and Management (pp. 655–666). Blacksburg, VA (USA): Virginia Tech.
Abstract: User-generated content (UGC) on social media can act as a key source of information for emergency responders in crisis situations. However, due to the volume concerned, computational techniques are needed to effectively filter and prioritise this content as it arises during emerging events. In the literature, these techniques are trained using annotated content from previous crises. In this paper, we investigate how this prior knowledge can be best leveraged for new crises by examining the extent to which crisis events of a similar type are more suitable for adaptation to new events (cross-domain adaptation). Given the recent successes of transformers in various language processing tasks, we propose CAST: an approach for Crisis domain Adaptation leveraging Sequence-to-sequence Transformers. We evaluate CAST using two major crisis-related message classification datasets. Our experiments show that our CAST-based best run without using any target data achieves the state of the art performance in both in-domain and cross-domain contexts. Moreover, CAST is particularly effective in one-to-one cross-domain adaptation when trained with a larger language model. In many-to-one adaptation where multiple crises are jointly used as the source domain, CAST further improves its performance. In addition, we find that more similar events are more likely to bring better adaptation performance whereas fine-tuning using dissimilar events does not help for adaptation. To aid reproducibility, we open source our code to the community.
Reza Mazloom, Hongmin Li, Doina Caragea, Muhammad Imran, & Cornelia Caragea. (2018). Classification of Twitter Disaster Data Using a Hybrid Feature-Instance Adaptation Approach. In Kees Boersma & Brian Tomaszewski (Eds.), ISCRAM 2018 Conference Proceedings – 15th International Conference on Information Systems for Crisis Response and Management (pp. 727–735). Rochester, NY (USA): Rochester Institute of Technology.
Abstract: Huge amounts of data that are generated on social media during emergency situations are regarded as troves of critical information. The use of supervised machine learning techniques in the early stages of a disaster is challenged by the lack of labeled data for that particular disaster. Furthermore, supervised models trained on labeled data from a prior disaster may not produce accurate results, given the inherent variation between the current and the prior disasters. To address the challenges posed by the lack of labeled data for a target disaster, we propose to use a hybrid feature-instance adaptation approach based on matrix factorization and the k nearest neighbors algorithm, respectively. The proposed hybrid adaptation approach is used to select a subset of the source disaster data that is representative for the target disaster. The selected subset is subsequently used to learn accurate Naive Bayes classifiers for the target disaster.
Francisco José Quesada Real, Fiona McNeill, Gábor Bella, & Alan Bundy. (2018). Identifying Semantic Domains in Emergency Scenarios. In Kees Boersma & Brian Tomaszewski (Eds.), ISCRAM 2018 Conference Proceedings – 15th International Conference on Information Systems for Crisis Response and Management (pp. 1130–1132). Rochester, NY (USA): Rochester Institute of Technology.
Abstract: Emergency scenarios are characterised by the participation of multiple and diverse organisations which come from different areas. This diversity is enriching in terms of expertise and approaches to tackle problems, however, it also provokes misunderstandings caused by semantic interoperability problems. There are some approaches which propose tackling these problems by using domain adaptation algorithms. Nevertheless, it is not trivial their application in emergency scenarios where the term “domain” is used in many different ways, not being clear either what it means or which domains are involved in these scenarios. In this paper, we identify semantic domains involved in emergency scenarios by analysing papers published in proceedings of ISCRAM and ISCRAM-med conferences. As a result, a categorisation of these domains has been developed, with the aim of providing a resource that can be used by domain adaptation algorithms to tackle problems such as those involving semantic interoperability.
Hongmin Li, Doina Caragea, & Cornelia Caragea. (2017). Towards Practical Usage of a Domain Adaptation Algorithm in the Early Hours of a Disaster. In Aurélie Montarnal, Matthieu Lauras, Chihab Hanachi, F. B., & Tina Comes (Eds.), Proceedings of the 14th International Conference on Information Systems for Crisis Response And Management (pp. 692–704). Albi, France: ISCRAM.
Abstract: Many machine learning techniques have been proposed to reduce the information overload in social media data during an emergency situation. Among such techniques, domain adaptation approaches present greater potential as compared to supervised algorithms because they don't require labeled data from the current disaster for training. However, the use of domain adaptation approaches in practice is sporadic at best. One reason is that domain adaptation algorithms have parameters that need to be tuned using labeled data from the target disaster, which is presumably not available. To address this limitation, we perform a study on one domain adaptation approach with the goal of understanding how much source data is needed to obtain good performance in a practical situation, and what parameter values of the approach give overall good performance. The results of our study provide useful insights into the practical application of domain adaptation algorithms in real crisis situations.