Leveraging Corpus Metadata to Detect Template-based Translation: An Exploratory Case Study of the Egyptian Arabic Wikipedia Edition

This repository shares our labeled datasets, extracted corpora, the code and scripts for the exploratory analysis, the multivariate machine learning classifiers and clusters, and the implementation and deployment of the best-performing classifier as a web-based detection system called "Egyptian Arabic Wikipedia Scanner". All of these are introduced in our paper, Leveraging Corpus Metadata to Detect Template-based Translation: An Exploratory Case Study of the Egyptian Arabic Wikipedia Edition, accepted at the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT6), co-located with LREC-COLING 2024, 20-25 May 2024.

Saied Alshahrani, Hesham Haroon, Ali Elfilali, Mariama Njie, and Jeanna Matthews. 2024. Leveraging Corpus Metadata to Detect Template-based Translation: An Exploratory Case Study of the Egyptian Arabic Wikipedia Edition. In Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024, pages 31–45, Torino, Italia. ELRA and ICCL.

Experimental-Setups
Exploratory-Analysis
Template-Translation-Detection
.gitattributes
LICENSE
README.md
