SDC

The Saudi Dialect Corpus (SDC) is a 210,396-word corpus.
It was built to train the Saudi model and contains the mixed dialects of Saudi Arabia.
It was collected from social media platforms, such as Facebook and Twitter.
It is 2,018 KB in size.

If you use the SDC corpus, please cite this paper:

Tarmom, T., Teahan, W., Atwell, E. and Alsalka, M.A., 2020. Compression versus traditional machine learning classifiers to detect code-switching in varieties and dialects: Arabic as a case study. Natural Language Engineering, 26(6), pp.663-676.
