Anna Kruspe, Jens Kersten, & Friederike Klan. (2019). Detecting event-related tweets by example using few-shot models. In Z. Franco, J. J. González, & J. H. Canós (Eds.), Proceedings of the 16th International Conference on Information Systems for Crisis Response And Management. Valencia, Spain: ISCRAM.
Abstract: Social media sources can be helpful in crisis situations, but discovering relevant messages is not trivial. Methods have so far focused on universal detection models for all kinds of crises or for certain crisis types (e.g. floods). Event-specific models could implement a more focused search area, but collecting data and training new models for a crisis that is already in progress is costly and may take too much time for a prompt response. As a compromise, manually collecting a small number of example messages is feasible. Few-shot models can generalize to unseen classes from such a handful of examples and do not need to be trained anew for each event. We show how these models can be used to detect crisis-relevant tweets during new events with just 10 to 100 examples and counterexamples. We also propose a new type of few-shot model that does not require counterexamples.
|
Jens Kersten, Anna Kruspe, Matti Wiegmann, & Friederike Klan. (2019). Robust filtering of crisis-related tweets. In Z. Franco, J. J. González, & J. H. Canós (Eds.), Proceedings of the 16th International Conference on Information Systems for Crisis Response And Management. Valencia, Spain: ISCRAM.
Abstract: Social media enables fast information exchange and status reporting during crises. Filtering is usually required to identify the small fraction of social media stream data related to events. Since deep learning has recently been shown to be a reliable approach for filtering and analyzing Twitter messages, a Convolutional Neural Network is examined for filtering crisis-related tweets in this work. The goal is to understand how to obtain accurate and robust filtering models and how model accuracies tend to behave on new events. In contrast to other works, the application to real data streams is also investigated. Motivated by the observation that machine learning model accuracies depend strongly on the data used, a new comprehensive and balanced compilation of existing data sets is proposed. Experimental results with this data set provide valuable insights. Preliminary results from filtering a data stream recorded during Hurricane Florence in September 2018 confirm these findings.
|
Jens Kersten, Jan Bongard, & Friederike Klan. (2022). Gaussian Processes for One-class and Binary Classification of Crisis-related Tweets. In Rob Grace, & Hossein Baharmand (Eds.), ISCRAM 2022 Conference Proceedings – 19th International Conference on Information Systems for Crisis Response and Management (pp. 664–673). Tarbes, France.
Abstract: Overload reduction is essential to exploit Twitter text data for crisis management. Commonly used pre-trained machine learning models require training data for both crisis-related and off-topic content. However, this task can also be formulated as a one-class classification problem in which labeled off-topic samples are not required. Gaussian processes (GPs) have great potential in both binary and one-class settings and are therefore investigated in this work. Deep kernel learning combines the representative power of text embeddings with the Bayesian formalism of GPs. Motivated by this, we investigate the potential of deep kernel models for the task of classifying crisis-related tweet texts, with special emphasis on cross-event applications. Compared to standard binary neural networks, first experiments with one-class GP models reveal great potential for realistic scenarios, offering a fast and flexible approach for interactive model training that requires neither off-topic training samples nor comprehensive expert knowledge (only two model parameters are involved).
|
Jens Kersten, Jan Bongard, & Friederike Klan. (2021). Combining Supervised and Unsupervised Learning to Detect and Semantically Aggregate Crisis-Related Twitter Content. In Anouck Adrot, Rob Grace, Kathleen Moore, & Christopher W. Zobel (Eds.), ISCRAM 2021 Conference Proceedings – 18th International Conference on Information Systems for Crisis Response and Management (pp. 744–754). Blacksburg, VA (USA): Virginia Tech.
Abstract: Twitter is an immediate and almost ubiquitous platform and can therefore be a valuable source of information during disasters. Current methods for identifying and classifying crisis-related content are often based on single tweets, i.e., already known information from the past is neglected. In this paper, the combination of tweet-wise pre-trained neural networks and unsupervised semantic clustering is proposed and investigated. The intention is to (1) enhance the generalization capability of pre-trained models, (2) handle massive amounts of stream data, (3) reduce information overload by identifying potentially crisis-related content, and (4) obtain a semantically aggregated data representation that allows for further automated, manual and visual analyses. Latent representations of each tweet based on pre-trained sentence embedding models are used for both clustering and tweet classification. For fast, robust and time-continuous processing, subsequent time periods are clustered individually according to a Chinese restaurant process. Clusters without any tweet classified as crisis-related are pruned. Data aggregation over time is ensured by merging semantically similar clusters. A comparison of our hybrid method to a similar clustering approach, as well as first quantitative and qualitative results from experiments with two different labeled data sets, demonstrates the great potential for crisis-related Twitter stream analyses.
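Illustrative sketch (not from the paper): the cluster, prune and merge steps named in this abstract could look roughly like the following. Everything here is an assumption made for illustration: the function names, the cosine-similarity threshold of 0.8, and above all the greedy deterministic assignment, which stands in for the probabilistic Chinese restaurant process and is not the authors' implementation. Embeddings are plain lists of floats, and `is_crisis` holds hypothetical per-tweet classifier outputs.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cluster_period(embeddings, threshold=0.8):
    """Cluster the tweets of one time period.

    Assumption: a greedy stand-in for a Chinese restaurant process. Each tweet
    joins the most similar existing cluster if the similarity to that cluster's
    centroid reaches `threshold`; otherwise it opens a new cluster.
    Returns clusters as lists of tweet indices.
    """
    clusters = []
    for i, emb in enumerate(embeddings):
        best, best_sim = None, threshold
        for members in clusters:
            sim = cosine(emb, centroid([embeddings[j] for j in members]))
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters

def prune(clusters, is_crisis):
    """Drop clusters that contain no tweet classified as crisis-related."""
    return [c for c in clusters if any(is_crisis[i] for i in c)]

def merge_similar(clusters, embeddings, threshold=0.8):
    """Merge clusters whose centroids are semantically similar."""
    merged = []
    for members in clusters:
        c_new = centroid([embeddings[j] for j in members])
        for target in merged:
            c_old = centroid([embeddings[j] for j in target])
            if cosine(c_old, c_new) >= threshold:
                target.extend(members)
                break
        else:
            merged.append(list(members))
    return merged

# Toy data: two-dimensional "embeddings" and hypothetical classifier outputs.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
is_crisis = [True, False, False, False]

clusters = cluster_period(embeddings)
kept = prune(clusters, is_crisis)
```

In this toy run the first two tweets form one cluster and the last two form another; only the first cluster survives pruning because it contains the single tweet flagged as crisis-related. In a stream setting, `merge_similar` would be applied to the surviving clusters of consecutive periods.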
|
Matti Wiegmann, Jens Kersten, Friederike Klan, Martin Potthast, & Benno Stein. (2020). Analysis of Detection Models for Disaster-Related Tweets. In Amanda Hughes, Fiona McNeill, & Christopher W. Zobel (Eds.), ISCRAM 2020 Conference Proceedings – 17th International Conference on Information Systems for Crisis Response and Management (pp. 872–880). Blacksburg, VA (USA): Virginia Tech.
Abstract: Social media is perceived as a rich resource for disaster management and relief efforts, but the high class imbalance between disaster-related and non-disaster-related messages makes reliable detection challenging. We analyze and compare the effectiveness of three state-of-the-art machine learning models for detecting disaster-related tweets. For this purpose, we introduce the Disaster Tweet Corpus 2020, an extended compilation of existing resources, which comprises a total of 123,166 tweets from 46 disasters covering 9 disaster types. Our findings from a large series of experiments include: detection models work equally well over a broad range of disaster types when trained for the respective type; domain transfer across disaster types leads to unacceptable performance drops; and, similarly, type-agnostic classification models behave more robustly, but at a lower level of effectiveness. Altogether, the average misclassification rate of 3.8% on performance-optimized detection models indicates effective classification knowledge but comes at the price of insufficient generalizability.
|