7. Organizing, preparing and processing metadata
Organizing, preparing, and processing metadata are steps in corpus building that ultimately lead to a metadata spreadsheet (or spreadsheets) containing information about the texts and about the participants who created them (e.g., course, assignment, student country of origin, student TOEFL scores). When organizing, preparing, and processing metadata, you need to take into account which participants you have information about (e.g., instructors, students, interviewers). You might also have information on other contextual variables, such as the courses in which assignments were completed or the length of a timed exam.
This information is helpful to have on its own, as part of your dataset, to keep track of information about participants. A metadata spreadsheet can also be used to add headers to files, to change filenames, and as an aid in the deidentification process. The metadata may be gathered from your university’s registrar or from a survey that participants take. Alternatively, if your filenames already contain metadata, you can create a metadata spreadsheet by extracting the information from the filenames. We will focus on the first use case, in which metadata is gathered from external sources.
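Although this page focuses on metadata gathered from external sources, the filename-based alternative mentioned above can be illustrated with a short Python sketch. It is only a minimal example under stated assumptions: the filename convention `<course>_<assignment>_<student_id>.txt`, the column names, and the helper names `parse_filename` and `build_metadata_csv` are all hypothetical and are not part of the CIABATTA scripts. Adapt the pattern to whatever your own filenames actually encode.

```python
import csv
import io
import re

# Hypothetical filename convention (an assumption for illustration only):
#   <course>_<assignment>_<student_id>.txt, e.g. ENGL101_essay1_S0042.txt
FILENAME_PATTERN = re.compile(
    r"^(?P<course>[A-Za-z]+\d+)_(?P<assignment>[A-Za-z0-9]+)_(?P<student_id>S\d+)\.txt$"
)


def parse_filename(filename):
    """Return a dict of metadata fields, or None if the name does not match."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    row = match.groupdict()
    row["filename"] = filename
    return row


def build_metadata_csv(filenames):
    """Build CSV text with one row per matching filename.

    Returns the CSV text and a list of filenames that did not match,
    so they can be reviewed by hand instead of being silently dropped.
    """
    fieldnames = ["filename", "course", "assignment", "student_id"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    skipped = []
    for name in filenames:
        row = parse_filename(name)
        if row is None:
            skipped.append(name)
        else:
            writer.writerow(row)
    return buffer.getvalue(), skipped


# Made-up example filenames; in practice you might list a folder with os.listdir.
example_files = ["ENGL101_essay1_S0042.txt", "ENGL102_report2_S0107.txt", "notes.txt"]
csv_text, skipped = build_metadata_csv(example_files)
print(csv_text)
print("Skipped:", skipped)
```

The sketch collects filenames that do not fit the pattern rather than discarding them, since a nonconforming name often signals a file that needs manual attention. The resulting CSV text can be saved to a file and opened in any spreadsheet program.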
First, we provide guidance on gathering and preparing metadata from various sources in section 7a, Gathering and preparing metadata. Next, we provide a script that combines the metadata into one spreadsheet in section 7b, Running the metadata processing script. You will then add the metadata to your files in section 8, Adding headers and changing filenames.
Previous: 6b. Manually converting your data
CIABATTA: Corpus in a Box: Automated Tools, Tutorials, & Advising