|
Congcong Wang, Paul Nulty, & David Lillis. (2021). Crisis Domain Adaptation Using Sequence-to-Sequence Transformers. In Anouck Adrot, Rob Grace, Kathleen Moore, & Christopher W. Zobel (Eds.), ISCRAM 2021 Conference Proceedings – 18th International Conference on Information Systems for Crisis Response and Management (pp. 655–666). Blacksburg, VA (USA): Virginia Tech.
Abstract: User-generated content (UGC) on social media can act as a key source of information for emergency responders in crisis situations. However, due to the volume concerned, computational techniques are needed to effectively filter and prioritise this content as it arises during emerging events. In the literature, these techniques are trained using annotated content from previous crises. In this paper, we investigate how this prior knowledge can be best leveraged for new crises by examining the extent to which crisis events of a similar type are more suitable for adaptation to new events (cross-domain adaptation). Given the recent successes of transformers in various language processing tasks, we propose CAST: an approach for Crisis domain Adaptation leveraging Sequence-to-sequence Transformers. We evaluate CAST using two major crisis-related message classification datasets. Our experiments show that our CAST-based best run without using any target data achieves the state-of-the-art performance in both in-domain and cross-domain contexts. Moreover, CAST is particularly effective in one-to-one cross-domain adaptation when trained with a larger language model. In many-to-one adaptation where multiple crises are jointly used as the source domain, CAST further improves its performance. In addition, we find that more similar events are more likely to bring better adaptation performance whereas fine-tuning using dissimilar events does not help for adaptation. To aid reproducibility, we open source our code to the community.
|
|
|
Francisco José Quesada Real, Fiona McNeill, Gábor Bella, & Alan Bundy. (2018). Identifying Semantic Domains in Emergency Scenarios. In Kees Boersma, & Brian Tomaszewski (Eds.), ISCRAM 2018 Conference Proceedings – 15th International Conference on Information Systems for Crisis Response and Management (pp. 1130–1132). Rochester, NY (USA): Rochester Institute of Technology.
Abstract: Emergency scenarios are characterised by the participation of multiple and diverse organisations which come from different areas. This diversity is enriching in terms of expertise and approaches to tackle problems, however, it also provokes misunderstandings caused by semantic interoperability problems. There are some approaches which propose tackling these problems by using domain adaptation algorithms. Nevertheless, it is not trivial their application in emergency scenarios where the term “domain” is used in many different ways, not being clear either what it means or which domains are involved in these scenarios. In this paper, we identify semantic domains involved in emergency scenarios by analysing papers published in proceedings of ISCRAM and ISCRAM-med conferences. As a result, a categorisation of these domains has been developed, with the aim of providing a resource that can be used by domain adaptation algorithms to tackle problems such as those involving semantic interoperability.
|
|
|
Hongmin Li, Doina Caragea, & Cornelia Caragea. (2017). Towards Practical Usage of a Domain Adaptation Algorithm in the Early Hours of a Disaster. In Aurélie Montarnal, Matthieu Lauras, Chihab Hanachi, F. B., & Tina Comes (Eds.), Proceedings of the 14th International Conference on Information Systems for Crisis Response And Management (pp. 692–704). Albi, France: Iscram.
Abstract: Many machine learning techniques have been proposed to reduce the information overload in social media data during an emergency situation. Among such techniques, domain adaptation approaches present greater potential as compared to supervised algorithms because they don't require labeled data from the current disaster for training. However, the use of domain adaptation approaches in practice is sporadic at best. One reason is that domain adaptation algorithms have parameters that need to be tuned using labeled data from the target disaster, which is presumably not available. To address this limitation, we perform a study on one domain adaptation approach with the goal of understanding how much source data is needed to obtain good performance in a practical situation, and what parameter values of the approach give overall good performance. The results of our study provide useful insights into the practical application of domain adaptation algorithms in real crisis situations.
|
|
|
Hongmin Li, Doina Caragea, & Cornelia Caragea. (2021). Combining Self-training with Deep Learning for Disaster Tweet Classification. In Anouck Adrot, Rob Grace, Kathleen Moore, & Christopher W. Zobel (Eds.), ISCRAM 2021 Conference Proceedings – 18th International Conference on Information Systems for Crisis Response and Management (pp. 719–730). Blacksburg, VA (USA): Virginia Tech.
Abstract: Significant progress has been made towards automated classification of disaster or crisis related tweets using machine learning approaches. Deep learning models, such as Convolutional Neural Networks (CNN), domain adaptation approaches based on self-training, and approaches based on pre-trained language models, such as BERT, have been proposed and used independently for disaster tweet classification. In this paper, we propose to combine self-training with CNN and BERT models, respectively, to improve the performance on the task of identifying crisis related tweets in a target disaster where labeled data is assumed to be unavailable, while unlabeled data is available. We evaluate the resulting self-training models on three crisis tweet collections and find that: 1) the pre-trained language model BERTweet is better than the standard BERT model, when fine-tuned for downstream crisis tweets classification; 2) self-training can help improve the performance of the CNN and BERTweet models for larger unlabeled target datasets, but not for smaller datasets.
|
|
|
Hongmin Li, Nicolais Guevara, Nic Herndon, Doina Caragea, Kishore Neppalli, Cornelia Caragea, et al. (2015). Twitter Mining for Disaster Response: A Domain Adaptation Approach. In L. Palen, M. Buscher, T. Comes, & A. Hughes (Eds.), ISCRAM 2015 Conference Proceedings – 12th International Conference on Information Systems for Crisis Response and Management. Kristiansand, Norway: University of Agder (UiA).
Abstract: Microblogging data such as Twitter data contains valuable information that has the potential to help improve the speed, quality, and efficiency of disaster response. Machine learning can help with this by prioritizing the tweets with respect to various classification criteria. However, supervised learning algorithms require labeled data to learn accurate classifiers. Unfortunately, for a new disaster, labeled tweets are not easily available, while they are usually available for previous disasters. Furthermore, unlabeled tweets from the current disaster are accumulating fast. We study the usefulness of labeled data from a prior source disaster, together with unlabeled data from the current target disaster to learn domain adaptation classifiers for the target. Experimental results suggest that, for some tasks, source data itself can be useful for classifying target data. However, for tasks specific to a particular disaster, domain adaptation approaches that use target unlabeled data in addition to source labeled data are superior.
|
|
|
Muhammad Imran, Prasenjit Mitra, & Jaideep Srivastava. (2016). Cross-Language Domain Adaptation for Classifying Crisis-Related Short Messages. In A. Tapia, P. Antunes, V.A. Bañuls, K. Moore, & J. Porto (Eds.), ISCRAM 2016 Conference Proceedings – 13th International Conference on Information Systems for Crisis Response and Management. Rio de Janeiro, Brasil: Federal University of Rio de Janeiro.
Abstract: Rapid crisis response requires real-time analysis of messages. After a disaster happens, volunteers attempt to classify tweets to determine needs, e.g., supplies, infrastructure damage, etc. Given labeled data, supervised machine learning can help classify these messages. Scarcity of labeled data causes poor performance in machine training. Can we reuse old tweets to train classifiers? How can we choose labeled tweets for training? Specifically, we study the usefulness of labeled data of past events. Do labeled tweets in different languages help? We observe the performance of our classifiers trained using different combinations of training sets obtained from past disasters. We perform extensive experimentation on real crisis datasets and show that the past labels are useful when both source and target events are of the same type (e.g. both earthquakes). For similar languages (e.g., Italian and Spanish), cross-language domain adaptation was useful; however, for different languages (e.g., Italian and English), the performance decreased.
|
|
|
Reza Mazloom, HongMin Li, Doina Caragea, Muhammad Imran, & Cornelia Caragea. (2018). Classification of Twitter Disaster Data Using a Hybrid Feature-Instance Adaptation Approach. In Kees Boersma, & Brian Tomaszewski (Eds.), ISCRAM 2018 Conference Proceedings – 15th International Conference on Information Systems for Crisis Response and Management (pp. 727–735). Rochester, NY (USA): Rochester Institute of Technology.
Abstract: Huge amounts of data that are generated on social media during emergency situations are regarded as troves of critical information. The use of supervised machine learning techniques in the early stages of a disaster is challenged by the lack of labeled data for that particular disaster. Furthermore, supervised models trained on labeled data from a prior disaster may not produce accurate results, given the inherent variation between the current and the prior disasters. To address the challenges posed by the lack of labeled data for a target disaster, we propose to use a hybrid feature-instance adaptation approach based on matrix factorization and the k nearest neighbors algorithm, respectively. The proposed hybrid adaptation approach is used to select a subset of the source disaster data that is representative for the target disaster. The selected subset is subsequently used to learn accurate Naive Bayes classifiers for the target disaster.
|
|
|
Xukun Li, Doina Caragea, Cornelia Caragea, Muhammad Imran, & Ferda Ofli. (2019). Identifying Disaster Damage Images Using a Domain Adaptation Approach. In Z. Franco, J. J. González, & J. H. Canós (Eds.), Proceedings of the 16th International Conference on Information Systems for Crisis Response And Management. Valencia, Spain: Iscram.
Abstract: Approaches for effectively filtering useful situational awareness information posted by eyewitnesses of disasters, in real time, are greatly needed. While many studies have focused on filtering textual information, the research on filtering disaster images is more limited. In particular, there are no studies on the applicability of domain adaptation to filter images from an emergent target disaster, when no labeled data is available for the target disaster. To fill in this gap, we propose to apply a domain adaptation approach, called domain adversarial neural networks (DANN), to the task of identifying images that show damage. The DANN approach has VGG-19 as its backbone, and uses the adversarial training to find a transformation that makes the source and target data indistinguishable. Experimental results on several pairs of disasters suggest that the DANN model generally gives similar or better results as compared to the VGG-19 model fine-tuned on the source labeled data.
|
|