Li, H., Caragea, D., Mhatre, A., Ge, J., & Liu, M. (2023). Identifying COVID-19 Tweets Relevant to Low-Income Households Using Semi-supervised BERT and Zero-shot ChatGPT Models. In Jaziar Radianti, Ioannis Dokas, Nicolas Lalone, & Deepak Khazanchi (Eds.), Proceedings of the 20th International ISCRAM Conference (pp. 953–963). Omaha, USA: University of Nebraska at Omaha.
Abstract: Understanding the impacts of the COVID-19 pandemic on low-income households can inform social services about the needs of vulnerable communities. Some recent works have studied such impacts through social media content analysis, and supervised machine learning models have been proposed to automatically classify COVID-19 tweets into different categories, such as income and economy impacts and social inequality and justice issues. In this paper, we propose semi-supervised learning models based on BERT with Self-Training and Knowledge Distillation for identifying COVID-19 tweets relevant to low-income households by leveraging readily available unlabeled data in addition to limited amounts of labeled data. Furthermore, we explore ChatGPT’s potential for annotating COVID-19 data and the performance of fine-tuned GPT-3 models. Our semi-supervised BERT model with Knowledge Distillation showed improvements compared to a supervised baseline model, while zero-shot ChatGPT showed good potential as a tool for annotating crisis data. However, our study suggests that the cost of fine-tuning large and expensive GPT-3 models may not be worthwhile for some tasks.