Abstract: Over the years, with the increase in social media engagement, there has been a corresponding increase in pipelines that analyze, classify, and prioritize crisis-related data on various social media platforms. These pipelines employ various data augmentation methods to counter imbalanced crisis data, along with both sophisticated and off-the-shelf models for training. However, there is a lack of comprehensive studies comparing these methods across the different sections of a pipeline. In this study, we split a general crisis-related pipeline into three major sections: data augmentation, model selection, and training methodology. We compare various methods for each of these sections and then present a comprehensive evaluation of which section to prioritize based on the results from various pipelines. We evaluate our results on two separate tasks: information classification and priority scoring for crisis-related tweets. Our results suggest that data augmentation generally improves performance. However, sophisticated, state-of-the-art language models like DeBERTa show performance gains only on the information classification task, whereas models like RoBERTa show a consistent performance increase over our BERT baseline. We also show that, although training two separate task-specific BERT models outperforms a single BERT model trained with a multi-task learning methodology on an imbalanced dataset, multi-task learning does improve performance for a more sophisticated model like DeBERTa on a more balanced dataset after augmentation.