NOTE: I’m no longer updating this page. A comprehensive, up-to-date list of the datasets is now available on the bigIR website.

  • Background Relevance Dataset: Annotations and Analysis for Background Linking
  • CT19-T2 Fact-Checking Dataset: download zip file.
  • ArTest Test Collection: download zip file.
  • Troll Detection Dataset: download zip file.
  • AyaTEC: QA on the Holy Qur’an Dataset
  • Fact Checking Dataset: Check here.
  • WebCrowd25k: Check here.
  • Dialectal Arabic Tweets (DART) Dataset
  • Answerable Question Identification in Arabic Tweets
    • ArQAT-AQI-Dataset-v1.0: download txt file.
  • ArabicWeb16: Check here.
  • EveTAR
    • Download it from here.
  • Journalists Questions on Twitter
    • ArQAT-JQ-Dataset-v1.0: download zip file.
  • Detecting Automatically-Generated Arabic Tweets
    • AutoTweet-Dataset-v1.0: download zip file.
  • Question Identification in Arabic Tweets
    • ArQAT-QI-Dataset-v1.0: download zip file.