NOTE: I’m no longer updating this page. A comprehensive, up-to-date list of the datasets is now available on the bigIR website.
- Background Relevance Dataset: Annotations and Analysis for Background Linking
- CT19-T2 Fact-Checking Dataset: download zip file.
- ArTest Test Collection: download zip file.
- Troll Detection Dataset: download zip file.
- AyaTEC: QA on the Holy Qur’an Dataset
- Fact Checking Dataset: Check here.
- WebCrowd25k: Check here.
- Dialectal Arabic Tweets (DART) Dataset
- Answerable Question Identification in Arabic Tweets
  - ArQAT-AQI-Dataset-v1.0: download txt file.
- ArabicWeb16: Check here.
- EveTAR
- Journalists Questions on Twitter
  - ArQAT-JQ-Dataset-v1.0: download zip file.
- Detecting Automatically-Generated Arabic Tweets
  - AutoTweet-Dataset-v1.0: download zip file.
- Question Identification in Arabic Tweets
  - ArQAT-QI-Dataset-v1.0: download zip file.