SDC

  • The Saudi Dialect Corpus (SDC) is a 210,396-word corpus.
  • It contains the mixed dialects of Saudi Arabia and was built for training the Saudi model.
  • It was collected from social media platforms, such as Facebook and Twitter.
  • It is 2,018 KB in size.
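
As a quick sanity check after downloading, the word count and file size listed above can be compared against the local copy. The sketch below is only an illustration: it assumes the corpus is a single UTF-8 plain-text file, and the function name `corpus_stats` is hypothetical. It counts whitespace-delimited tokens, so the result may differ from the figure above if the corpus was tokenised differently.

```python
from pathlib import Path


def corpus_stats(path):
    """Return (word_count, size_in_kb) for a UTF-8 plain-text corpus file.

    Assumption: words are whitespace-delimited tokens; this may not match
    the tokenisation used to produce the published word count.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    # str.split() with no arguments splits on any run of whitespace.
    word_count = len(text.split())
    # Size on disk in kilobytes (1 KB = 1024 bytes).
    size_kb = path.stat().st_size / 1024
    return word_count, size_kb
```

Calling `corpus_stats("SDC.txt")`, where `SDC.txt` stands in for wherever the corpus file is saved, returns the two values as a tuple.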

If you use the SDC corpus, please cite this paper:

Tarmom, T., Teahan, W., Atwell, E. and Alsalka, M.A., 2020. Compression versus traditional machine learning classifiers to detect code-switching in varieties and dialects: Arabic as a case study. Natural Language Engineering, 26(6), pp.663-676.