Assessing the effectiveness of crisis response is key to improving preparedness and adapting policies. One method for evaluating a response is to review the actual response activities and interactions. Response reports are often available as natural language text, and analyzing a large number of such reports requires automated or semi-automated solutions. To improve the trustworthiness of methods for this purpose, we empirically validate the reliability of three relation extraction methods, which we used to construct interorganizational collaboration networks, by comparing their output against human-annotated ground truth for two types of crisis-specific text (situational reports and tweets). For entity extraction, we find that a combination of two off-the-shelf methods (FlairNLP and SpaCy) performs best on situational reports, while a single method (SpaCy) performs best on tweets. For relation extraction, we find that a heuristics-based model, which we built using word co-occurrence and deep and shallow syntax as features and trained on domain-specific text data, outperforms two state-of-the-art relation extraction models (Stanford OpenIE and OneIE) that were pre-trained on general-domain data. We also find that situational reports, on average, contain fewer entities and relations than tweets, but that the networks extracted from them are more closely related to the collaboration activities mentioned in the ground truth. Because it is widely known that general-domain tools may need adjustment to perform accurately in specific domains, we did not expect the tested off-the-shelf tools to be highly accurate. Rather, our aim is to identify what accuracy one can reasonably expect when using available resources as-is for domain-specific work (in this case, crisis informatics), what errors (in terms of false positives and false negatives) to expect, and how to account for them.
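The comparison of extracted relations against human-annotated ground truth can be illustrated with a minimal sketch. It is not the evaluation code used in this study: it assumes, for illustration only, that relations are undirected pairs of organization names matched by exact, case-insensitive string comparison. The function names `to_edges` and `score_edges` and the organization pairs in the example are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]


def to_edges(pairs: Iterable[Tuple[str, str]]) -> Set[Edge]:
    """Normalize name pairs into undirected, case-insensitive edges.

    Assumption: exact string match after lowercasing and trimming;
    self-loops (an organization paired with itself) are dropped.
    """
    edges: Set[Edge] = set()
    for a, b in pairs:
        a_norm, b_norm = a.strip().lower(), b.strip().lower()
        if a_norm != b_norm:
            edges.add(frozenset((a_norm, b_norm)))
    return edges


def score_edges(
    predicted: Iterable[Tuple[str, str]],
    gold: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Count true/false positives and false negatives, then derive P/R/F1."""
    pred, ref = to_edges(predicted), to_edges(gold)
    tp = len(pred & ref)   # extracted and annotated
    fp = len(pred - ref)   # extracted but not annotated
    fn = len(ref - pred)   # annotated but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical example data, not taken from the study.
gold = [("Red Cross", "FEMA"),
        ("FEMA", "National Guard"),
        ("Red Cross", "Salvation Army")]
predicted = [("FEMA", "Red Cross"),    # matches a gold edge (order ignored)
             ("FEMA", "Coast Guard")]  # false positive

result = score_edges(predicted, gold)
```

In this toy case one of two extracted edges is correct and two annotated edges are missed, so precision is 0.5 and recall is 1/3. Stricter or looser matching (for example, alias resolution for organization names) would change these counts and would need to be stated alongside any reported scores.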