Kiran Zahra, Rahul Deb Das, Frank O. Ostermann, & Ross S. Purves. (2022). Towards an Automated Information Extraction Model from Twitter Threads during Disasters. In Rob Grace, & Hossein Baharmand (Eds.), ISCRAM 2022 Conference Proceedings – 19th International Conference on Information Systems for Crisis Response and Management (pp. 637–653). Tarbes, France.
Abstract: Social media plays a vital role as a communication source during large-scale disasters. The unstructured and informal nature of short individual posts makes it difficult to extract useful information, often due to a lack of additional context. The potential of social media threads (sequences of posts) as a source of context and additional information for the initiating post has not been explored. In this research, we explored Twitter threads as an information source and developed an information extraction model capable of extracting relevant information from threads posted during disasters. We used a crowdsourcing platform to determine whether a thread adds information to the initial tweet, and we grouped the disaster-related information present in these threads into six themes: event reporting, location, time, intensity, casualty and damage reports, and help calls. For each theme, we created a thematic lexicon from WordNet. Moreover, we developed and compared four information extraction models trained on GloVe, word2vec, bag-of-words, and thematic bag-of-words representations to extract and summarize the most critical information from the threads. Our results reveal that 70 percent of all threads add information to the initiating post for various disaster-related themes. Furthermore, the thematic bag-of-words model outperforms the other algorithms and models at preserving the highest number of disaster-related themes.
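The thematic bag-of-words idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny hand-written theme lexicons stand in for the much larger WordNet-derived lexicons the paper builds, and the scoring simply counts thematic token hits per post.

```python
from collections import Counter

# Hypothetical mini-lexicons standing in for the WordNet-derived thematic
# lexicons described in the paper (the real ones would be far larger and
# cover all six themes).
THEMES = {
    "casualty_damage": {"injured", "dead", "casualties", "destroyed", "damage"},
    "help_calls": {"help", "rescue", "urgent", "need", "sos"},
    "location": {"street", "city", "district", "near", "area"},
}

def thematic_bow(text):
    """Count how many tokens from each theme's lexicon appear in the text."""
    tokens = text.lower().split()
    counts = Counter()
    for theme, lexicon in THEMES.items():
        counts[theme] = sum(1 for t in tokens if t in lexicon)
    return counts

def most_informative(posts):
    """Pick the thread post with the most disaster-related theme hits."""
    return max(posts, key=lambda p: sum(thematic_bow(p).values()))
```

A post such as "urgent help needed near city area" would score on both the help-call and location themes and therefore be ranked above a post with no thematic vocabulary.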
|
Nasik Muhammad Nafi, Avishek Bose, Sarthak Khanal, Doina Caragea, & William H. Hsu. (2020). Abstractive Text Summarization of Disaster-Related Documents. In Amanda Hughes, Fiona McNeill, & Christopher W. Zobel (Eds.), ISCRAM 2020 Conference Proceedings – 17th International Conference on Information Systems for Crisis Response and Management (pp. 881–892). Blacksburg, VA (USA): Virginia Tech.
Abstract: Abstractive summarization is intended to capture key information from the full text of documents. In the application domain of disaster and crisis event reporting, key information includes disaster effects, cause, and severity. While some research on information extraction in the disaster domain has focused on keyphrase extraction from short disaster-related texts such as tweets, there is hardly any work that attempts abstractive summarization of long disaster-related documents. Following the recent success of Reinforcement Learning (RL) in other domains, we leverage an RL-based state-of-the-art approach to abstractive summarization to summarize disaster-related documents. RL enables an agent to find an optimal policy by maximizing some reward. We design a novel hybrid reward metric for the disaster domain by combining Vector Similarity and Lexicon Matching (VecLex) to maximize the relevance of the abstract to the source document while focusing on disaster-related keywords. We evaluate the model on a disaster-related subset of the CNN/Daily Mail dataset consisting of 104,913 documents. The results show that our approach produces more informative summaries and achieves higher VecLex scores than the baseline.
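The hybrid VecLex reward can be sketched as a weighted combination of a vector-similarity term and a lexicon-matching term. This is a hedged toy sketch, not the paper's metric: the bag-of-words cosine stands in for whatever vector representation the authors use, the tiny `DISASTER_LEXICON` and the `alpha` weight are illustrative assumptions.

```python
import math
from collections import Counter

# Toy stand-in for the disaster keyword lexicon used in lexicon matching.
DISASTER_LEXICON = {"flood", "earthquake", "damage", "evacuate", "casualties"}

def cosine(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    num = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return num / (na * nb) if na and nb else 0.0

def veclex_reward(summary, source, alpha=0.5):
    """VecLex-style reward: alpha * vector similarity between summary and
    source, plus (1 - alpha) * fraction of the source's disaster keywords
    preserved in the summary."""
    s_bow = Counter(summary.lower().split())
    d_bow = Counter(source.lower().split())
    vec = cosine(s_bow, d_bow)
    src_kw = {w for w in d_bow if w in DISASTER_LEXICON}
    lex = sum(1 for w in src_kw if w in s_bow) / len(src_kw) if src_kw else 0.0
    return alpha * vec + (1 - alpha) * lex
```

In an RL setup, a reward of this shape would be computed per generated summary, so the policy is pushed toward abstracts that are both close to the source overall and retain its disaster-related vocabulary.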
|