Records |
Author |
Hongmin Li; Doina Caragea; Cornelia Caragea |
Title |
Combining Self-training with Deep Learning for Disaster Tweet Classification |
Type |
Conference Article |
Year |
2021 |
Publication |
ISCRAM 2021 Conference Proceedings – 18th International Conference on Information Systems for Crisis Response and Management |
Abbreviated Journal |
Iscram 2021 |
Volume |
|
Issue |
|
Pages |
719-730 |
Keywords |
Domain Adaptation, Self-training, Crisis Tweets Classification, BERT, CNN |
Abstract |
Significant progress has been made towards automated classification of disaster or crisis related tweets using machine learning approaches. Deep learning models, such as Convolutional Neural Networks (CNN), domain adaptation approaches based on self-training, and approaches based on pre-trained language models, such as BERT, have been proposed and used independently for disaster tweet classification. In this paper, we propose to combine self-training with CNN and BERT models, respectively, to improve the performance on the task of identifying crisis related tweets in a target disaster where labeled data is assumed to be unavailable, while unlabeled data is available. We evaluate the resulting self-training models on three crisis tweet collections and find that: 1) the pre-trained language model BERTweet is better than the standard BERT model, when fine-tuned for downstream crisis tweets classification; 2) self-training can help improve the performance of the CNN and BERTweet models for larger unlabeled target datasets, but not for smaller datasets. |
Address |
Department of Computer Science, Kansas State University; Department of Computer Science, Kansas State University; Department of Computer Science, University of Illinois at Chicago |
Corporate Author |
|
Thesis |
|
Publisher |
Virginia Tech |
Place of Publication |
Blacksburg, VA (USA) |
Editor |
Anouck Adrot; Rob Grace; Kathleen Moore; Christopher W. Zobel |
Language |
English |
Summary Language |
English |
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
978-1-949373-61-5 |
Medium |
|
Track |
Social Media for Disaster Response and Resilience |
Expedition |
|
Conference |
18th International Conference on Information Systems for Crisis Response and Management |
Notes |
hongminli@ksu.edu |
Approved |
no |
Call Number |
ISCRAM @ idladmin @ |
Serial |
2367 |
|
|
|
Author |
Hongmin Li; Doina Caragea; Cornelia Caragea |
Title |
Towards Practical Usage of a Domain Adaptation Algorithm in the Early Hours of a Disaster |
Type |
Conference Article |
Year |
2017 |
Publication |
Proceedings of the 14th International Conference on Information Systems for Crisis Response And Management |
Abbreviated Journal |
Iscram 2017 |
Volume |
|
Issue |
|
Pages |
692-704 |
Keywords |
Twitter; Domain adaptation; Disaster; Classification |
Abstract |
Many machine learning techniques have been proposed to reduce the information overload in social media data during an emergency situation. Among such techniques, domain adaptation approaches present greater potential as compared to supervised algorithms because they don't require labeled data from the current disaster for training. However, the use of domain adaptation approaches in practice is sporadic at best. One reason is that domain adaptation algorithms have parameters that need to be tuned using labeled data from the target disaster, which is presumably not available. To address this limitation, we perform a study on one domain adaptation approach with the goal of understanding how much source data is needed to obtain good performance in a practical situation, and what parameter values of the approach give overall good performance. The results of our study provide useful insights into the practical application of domain adaptation algorithms in real crisis situations. |
Address |
Kansas State University; University of North Texas |
Corporate Author |
|
Thesis |
|
Publisher |
Iscram |
Place of Publication |
Albi, France |
Editor |
Tina Comes, F.B., Chihab Hanachi, Matthieu Lauras, Aurélie Montarnal, eds |
Language |
English |
Summary Language |
English |
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
2411-3387 |
ISBN |
|
Medium |
|
Track |
Social Media Studies |
Expedition |
|
Conference |
14th International Conference on Information Systems for Crisis Response And Management |
Notes |
|
Approved |
no |
Call Number |
ISCRAM @ idladmin @ |
Serial |
2057 |
|
|
|
Author |
Hongmin Li; Nicolais Guevara; Nic Herndon; Doina Caragea; Kishore Neppalli; Cornelia Caragea; Anna Squicciarini; Andrea H. Tapia |
Title |
Twitter Mining for Disaster Response: A Domain Adaptation Approach |
Type |
Conference Article |
Year |
2015 |
Publication |
ISCRAM 2015 Conference Proceedings – 12th International Conference on Information Systems for Crisis Response and Management |
Abbreviated Journal |
ISCRAM 2015 |
Volume |
|
Issue |
|
Pages |
|
Keywords |
Disaster Response; domain adaptation; tweet classification |
Abstract |
Microblogging data such as Twitter data contains valuable information that has the potential to help improve the speed, quality, and efficiency of disaster response. Machine learning can help with this by prioritizing the tweets with respect to various classification criteria. However, supervised learning algorithms require labeled data to learn accurate classifiers. Unfortunately, for a new disaster, labeled tweets are not easily available, while they are usually available for previous disasters. Furthermore, unlabeled tweets from the current disaster are accumulating fast. We study the usefulness of labeled data from a prior source disaster, together with unlabeled data from the current target disaster to learn domain adaptation classifiers for the target. Experimental results suggest that, for some tasks, source data itself can be useful for classifying target data. However, for tasks specific to a particular disaster, domain adaptation approaches that use target unlabeled data in addition to source labeled data are superior. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
University of Agder (UiA) |
Place of Publication |
Kristiansand, Norway |
Editor |
L. Palen; M. Buscher; T. Comes; A. Hughes |
Language |
English |
Summary Language |
English |
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
2411-3387 |
ISBN |
9788271177881 |
Medium |
|
Track |
Social Media Studies |
Expedition |
|
Conference |
ISCRAM 2015 Conference Proceedings – 12th International Conference on Information Systems for Crisis Response and Management |
Notes |
|
Approved |
yes |
Call Number |
|
Serial |
1234 |
|
|
|
Author |
Hongmin Li; Xukun Li; Doina Caragea; Cornelia Caragea |
Title |
Comparison of Word Embeddings and Sentence Encodings for Generalized Representations in Crisis Tweet Classifications |
Type |
Conference Article |
Year |
2018 |
Publication |
Proceedings of ISCRAM Asia Pacific 2018: Innovating for Resilience – 1st International Conference on Information Systems for Crisis Response and Management Asia Pacific. |
Abbreviated Journal |
Iscram Ap 2018 |
Volume |
|
Issue |
|
Pages |
480-493 |
Keywords |
Word Embeddings, Sentence Encodings, Reduced Tweet Representation, Crisis Tweet Classification |
Abstract |
Many machine learning and natural language processing techniques, including supervised and domain adaptation algorithms, have been proposed and studied in the context of filtering crisis tweets. However, applying these approaches in real-time is still challenging because of time-critical requirements of emergency response operations and also diversities and unique characteristics of emergency events. In this paper, we explore the idea of building “generalized” classifiers for filtering crisis tweets that can be pre-trained, and are thus ready to use in real-time, while generalizing well on future disasters/crises data. We propose to achieve this using simple feature based adaptation with tweet representations based on word embeddings and also sentence-level embeddings, representations which do not rely on unlabeled data to achieve domain adaptations and can be easily implemented. Given that there are different types of word/sentence embeddings that are widely used, we propose to compare them to get a general idea about which type works better with crisis tweets classification tasks. Our experimental results show that GloVe embeddings in general work better with the datasets used in our evaluation, and that the supervised algorithms used in our experiments benefit from GloVe embeddings trained specifically on crisis data. Furthermore, our experimental results show that following GloVe, the sentence embeddings have great potential in crisis tweet tasks. |
Address |
Kansas State University; Kansas State University; Kansas State University; Kansas State University |
Corporate Author |
|
Thesis |
|
Publisher |
Massey University |
Place of Publication |
Albany, Auckland, New Zealand |
Editor |
Kristin Stock; Deborah Bunker |
Language |
English |
Summary Language |
|
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
|
ISBN |
|
Medium |
|
Track |
Social Media and Community Engagement Supporting Resilience Building |
Expedition |
|
Conference |
|
Notes |
|
Approved |
no |
Call Number |
|
Serial |
1689 |
|
|
|
Author |
Reza Mazloom; Hongmin Li; Doina Caragea; Muhammad Imran; Cornelia Caragea |
Title |
Classification of Twitter Disaster Data Using a Hybrid Feature-Instance Adaptation Approach |
Type |
Conference Article |
Year |
2018 |
Publication |
ISCRAM 2018 Conference Proceedings – 15th International Conference on Information Systems for Crisis Response and Management |
Abbreviated Journal |
Iscram 2018 |
Volume |
|
Issue |
|
Pages |
727-735 |
Keywords |
Tweet classification, Domain adaptation, Matrix factorization, k-Nearest Neighbors, Disaster response |
Abstract |
Huge amounts of data that are generated on social media during emergency situations are regarded as troves of critical information. The use of supervised machine learning techniques in the early stages of a disaster is challenged by the lack of labeled data for that particular disaster. Furthermore, supervised models trained on labeled data from a prior disaster may not produce accurate results, given the inherent variation between the current and the prior disasters. To address the challenges posed by the lack of labeled data for a target disaster, we propose to use a hybrid feature-instance adaptation approach based on matrix factorization and the k nearest neighbors algorithm, respectively. The proposed hybrid adaptation approach is used to select a subset of the source disaster data that is representative for the target disaster. The selected subset is subsequently used to learn accurate Naive Bayes classifiers for the target disaster. |
Address |
|
Corporate Author |
|
Thesis |
|
Publisher |
Rochester Institute of Technology |
Place of Publication |
Rochester, NY (USA) |
Editor |
Kees Boersma; Brian Tomaszewski |
Language |
English |
Summary Language |
English |
Original Title |
|
Series Editor |
|
Series Title |
|
Abbreviated Series Title |
|
Series Volume |
|
Series Issue |
|
Edition |
|
ISSN |
2411-3387 |
ISBN |
978-0-692-12760-5 |
Medium |
|
Track |
Social Media Studies |
Expedition |
|
Conference |
ISCRAM 2018 Conference Proceedings – 15th International Conference on Information Systems for Crisis Response and Management |
Notes |
|
Approved |
no |
Call Number |
|
Serial |
2146 |