Hongmin Li, Xukun Li, Doina Caragea, & Cornelia Caragea. (2018). Comparison of Word Embeddings and Sentence Encodings for Generalized Representations in Crisis Tweet Classifications. In Kristin Stock, & Deborah Bunker (Eds.), Proceedings of ISCRAM Asia Pacific 2018: Innovating for Resilience – 1st International Conference on Information Systems for Crisis Response and Management Asia Pacific. (pp. 480–493). Albany, Auckland, New Zealand: Massey University.
Abstract: Many machine learning and natural language processing techniques, including supervised and domain adaptation algorithms, have been proposed and studied in the context of filtering crisis tweets. However, applying these approaches in real time is still challenging because of the time-critical requirements of emergency response operations and the diversity and unique characteristics of emergency events. In this paper, we explore the idea of building “generalized” classifiers for filtering crisis tweets that can be pre-trained, and are thus ready to use in real time, while generalizing well to data from future disasters and crises. We propose to achieve this using simple feature-based adaptation with tweet representations based on word embeddings and sentence-level embeddings, representations that do not rely on unlabeled data to achieve domain adaptation and can be easily implemented. Given that different types of word and sentence embeddings are widely used, we compare them to get a general idea of which type works better for crisis tweet classification tasks. Our experimental results show that GloVe embeddings in general work better with the datasets used in our evaluation, and that the supervised algorithms used in our experiments benefit from GloVe embeddings trained specifically on crisis data. Furthermore, our experimental results show that, after GloVe, the sentence embeddings have great potential in crisis tweet tasks.
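As a rough illustration of what a tweet representation based on word embeddings can look like, the sketch below averages the vectors of a tweet's known words into one fixed-length feature vector that any supervised classifier could consume. The abstract does not say how word vectors are combined, so mean pooling is an assumption here, and the embedding table `TOY_EMBEDDINGS` and the function `tweet_vector` are hypothetical names with made-up values, not the paper's data or code.

```python
from typing import Dict, List

# Toy 3-dimensional vectors standing in for pre-trained word embeddings.
# The values are invented for illustration only.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "flood": [0.9, 0.1, 0.0],
    "rescue": [0.8, 0.2, 0.1],
    "concert": [0.0, 0.9, 0.3],
    "tickets": [0.1, 0.8, 0.4],
}


def tweet_vector(tweet: str, embeddings: Dict[str, List[float]], dim: int = 3) -> List[float]:
    """Mean-pool the embeddings of the tweet's known words (assumed aggregation)."""
    tokens = tweet.lower().split()
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # No word is in the vocabulary: fall back to a zero vector.
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


if __name__ == "__main__":
    print(tweet_vector("Flood rescue underway", TOY_EMBEDDINGS))
    print(tweet_vector("Concert tickets available", TOY_EMBEDDINGS))
```

Because the resulting vector depends only on pre-trained embeddings, no unlabeled data from a new event is needed to compute it, which is the property the abstract highlights.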