Anthony Verardi | a.verardi@pitt.edu | University of Pittsburgh
Project completed 4/24/2020
This project explores the contents of the Arabic Learner Corpus (ALC) to assess how they might be applied to Second Language Acquisition/Teaching. The ALC is a collection of written and spoken texts gathered from learners of Modern Standard Arabic (MSA) in Saudi Arabia, including both native-speaker learners (learning MSA as a prestige variant) and non-native-speaker learners. The texts are stored as XML files, which are accompanied by metadata about each participant and each observation of their data.
- Notebooks: Jupyter Notebooks that contain all of the coding and preliminary analysis done for this project
  - ALC Data Organization: a Notebook containing my data (re)organization and cleaning process for the ALC dataset
  - ALC Data Analysis: a Notebook containing the actual analysis performed on my restructured version of the ALC
  - ALC Scrap Code: a "code graveyard" for ideas that didn't pan out and code that didn't work out quite right
- Presentation: a short presentation outlining the preliminary findings of this project, available as either a full PowerPoint presentation with voiceover or .pdf slides
- Data: samples of the dataset used for this project, namely the first 1000 original XML files (GitHub won't allow me to upload > 1000 files). Note: none of the original XML files have been altered! The cleaning process was done entirely on imported data in my Organization Notebook, leaving the originals untouched.
- Visualizations: image file copies of all visualizations created over the course of this project
- .gitignore: a list of filetypes my repository is set to ignore on my local rig
- final_report.md: the final report for this project containing full analysis and conclusions
- LICENSE.md: the license under which this project has been made publicly available; you can find a quick overview of the license on this page
- README.md: the document you are currently reading!
- progress_report.md: markdown file documenting the development of this project
- project_plan.md: markdown file containing the original and revised project plans for this work
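The Data entry above notes that all cleaning happens on imported data, with the original XML files left untouched. A minimal sketch of that kind of read-only import is shown below. It is not the code from the Organization Notebook: the function name `load_xml_records` and the flattening strategy are illustrative assumptions, and nothing here assumes the actual element names used in the ALC files.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def load_xml_records(data_dir):
    """Parse every .xml file in data_dir into a list of dicts (read-only).

    Hypothetical helper for illustration; the real notebook may differ.
    """
    records = []
    for path in sorted(Path(data_dir).glob("*.xml")):
        # ET.parse only reads the file; nothing is ever written back.
        root = ET.parse(path).getroot()
        record = {"source_file": path.name}
        for element in root.iter():
            text = (element.text or "").strip()
            if text:
                # Keep the first non-empty text found for each tag name.
                record.setdefault(element.tag, text)
        records.append(record)
    return records
```

Any cleaning can then be applied to the returned list (for example after passing it to `pandas.DataFrame`), so the files on disk stay exactly as downloaded.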
This project is licensed under a Creative Commons Attribution-NonCommercial 4.0 license (CC BY-NC 4.0). Under this license, others may share (mirror) and adapt (borrow and alter) this work, provided that they credit the author and do not use it for commercial purposes.
Original corpus credit to:
Alfaifi, A., Atwell, E. and Hedaya, I. (2014). Arabic Learner Corpus (ALC) v2: A New Written and Spoken Corpus of Arabic Learners. In the proceedings of the Learner Corpus Studies in Asia and the World (LCSAW) 2014, 31 May - 01 Jun 2014. Kobe, Japan. http://www.arabiclearnercorpus.com.
Have a comment? Visit my guest book here!