|
Abstract |
This paper summarizes the final year of the four-year Text REtrieval Conference Incident Streams track (TREC-IS), which has produced a large dataset comprising 136,263 annotated tweets, spanning 98 crisis events. Goals of this final year were twofold: 1) to add new categories for assessing messages, with a focus on characterizing the audience, author, and images associated with these messages, and 2) to enlarge the TREC-IS dataset with new events, with an emphasis of deeper pools for sampling. Beyond these two goals, TREC-IS has nearly doubled the number of annotated messages per event for the 26 crises introduced in 2021 and has released a new parallel dataset of 312,546 images associated with crisis content – with 7,297 tweets having annotations about their embedded images. Our analyses of this new crisis data yields new insights about the context of a tweet; e.g., messages intended for a local audience and those that contain images of weather forecasts and infographics have higher than average assessments of priority but are relatively rare. Tweets containing images, however, have higher perceived priorities than tweets without images. Moving to deeper pools, while tending to lower classification performance, also does not generally impact performance rankings or alter distributions of information-types. We end this paper with a discussion of these datasets, analyses, their implications, and how they contribute both new data and insights to the broader crisis informatics community. |
|