Anjum, U., Zadorozhny, V., & Krishnamurthy, P. (2023). Localization of Events Using Neural Networks in Twitter Data. In Jaziar Radianti, Ioannis Dokas, Nicolas Lalone, & Deepak Khazanchi (Eds.), Proceedings of the 20th International ISCRAM Conference (pp. 909–919). Omaha, USA: University of Nebraska at Omaha.
Abstract: In this paper, we develop a model with neural networks to localize events using microblogging data. Localization is the task of finding the location of an event and can be done by discovering event signatures in microblogging data. We use the deep learning methodology of Bi-directional Long Short-Term Memory (Bi-LSTM) to learn event signatures. We propose a methodology for labeling the Twitter data for use in the Bi-LSTM. However, there might not be enough data available to train the Bi-LSTM and learn the event signatures. Hence, the data is augmented using generative adversarial networks (GANs). Finally, we combine event signatures at different temporal and spatial granularities to improve the accuracy of event localization. We use microblogging data collected from Twitter to evaluate our model and compare it with other baseline methods.
Li, H., Caragea, D., Mhatre, A., Ge, J., & Liu, M. (2023). Identifying COVID-19 Tweets Relevant to Low-Income Households Using Semi-supervised BERT and Zero-shot ChatGPT Models. In Jaziar Radianti, Ioannis Dokas, Nicolas Lalone, & Deepak Khazanchi (Eds.), Proceedings of the 20th International ISCRAM Conference (pp. 953–963). Omaha, USA: University of Nebraska at Omaha.
Abstract: Understanding the COVID-19 pandemic's impacts on low-income households can inform social services about the needs of vulnerable communities. Some recent works have studied such impacts through social media content analysis, and supervised machine learning models have been proposed to automatically classify COVID-19 tweets into different categories, such as income and economy impacts, social inequality and justice issues, etc. In this paper, we propose semi-supervised learning models based on BERT with Self-Training and Knowledge Distillation for identifying COVID-19 tweets relevant to low-income households by leveraging readily available unlabeled data in addition to limited amounts of labeled data. Furthermore, we explore ChatGPT's potential for annotating COVID-19 data and the performance of fine-tuned GPT-3 models. Our semi-supervised BERT model with Knowledge Distillation showed improvements compared to a supervised baseline model, while zero-shot ChatGPT showed good potential as a tool for annotating crisis data. However, our study suggests that the cost of fine-tuning large and expensive GPT-3 models may not be worthwhile for some tasks.
Salemi, H., Senarath, Y., & Purohit, H. (2023). A Comparative Study of Pre-trained Language Models to Filter Informative Code-mixed Data on Social Media during Disasters. In Jaziar Radianti, Ioannis Dokas, Nicolas Lalone, & Deepak Khazanchi (Eds.), Proceedings of the 20th International ISCRAM Conference (pp. 920–932). Omaha, USA: University of Nebraska at Omaha.
Abstract: Social media can inform response agencies during disasters to help affected people. However, filtering informative messages from social media content is challenging due to ungrammatical text, out-of-vocabulary words, etc., which limit the context interpretation of messages. Further, there has been limited exploration of the challenge of code-mixing (using words from another language in a given text of one language) in user-generated content during disasters. Hence, we propose a new code-mixed dataset of tweets related to the 2017 Iran-Iraq Earthquake, annotated based on their informativeness characteristics. Additionally, we have evaluated the performance of state-of-the-art pre-trained language models, mBERT, RoBERTa, and XLM-R, on the proposed dataset. The results show that mBERT (with an F1 score of 72%) outperforms the other models in classifying informative code-mixed messages. Moreover, we analyzed some patterns of code-mixing usage by users, which can help future work in developing these models.
Schmidt-Colberg, A., & Löffler-Dauth, L. (2023). A Human-Centric Evaluation Dataset for Automated Early Wildfire Detection from a Causal Perspective. In Jaziar Radianti, Ioannis Dokas, Nicolas Lalone, & Deepak Khazanchi (Eds.), Proceedings of the 20th International ISCRAM Conference (pp. 933–943). Omaha, USA: University of Nebraska at Omaha.
Abstract: Insight into performance ability is crucial for successfully implementing AI solutions in real-world applications. Unanticipated input can lead to false positives (FP) and false negatives (FN), potentially resulting in false alarms in fire detection scenarios. Literature on fire detection models shows varying levels of complexity and explicability in evaluation practices; little supplementary information on performance ability beyond accuracy scores is provided. We advocate for a standardized evaluation dataset that prioritizes the end-user perspective in assessing performance capabilities. This leads us to ask what an evaluation dataset must comprise to enable a non-expert to determine the adequacy of a model's performance capabilities for their specific use case. We propose using data augmentation techniques that simulate interventions to remove the connection to the original target label, providing interpretable counterfactual explanations of a model's predictions.