7. Organizing, preparing and processing metadata

Organizing, preparing, and processing metadata are the corpus-building steps that produce a metadata spreadsheet (or spreadsheets) containing information about the texts and the participants who created them (e.g., course, assignment, student country of origin, student TOEFL scores). As you work through these steps, consider which participants you have information about (e.g., instructors, students, interviewers). You might also have information on other contextual variables, such as the courses in which assignments were completed or the length of a timed exam.

This information is helpful on its own as part of your dataset, because it keeps track of information about participants. A metadata spreadsheet can also be used to add headers to files, to change filenames, and as an aid in the deidentification process. The metadata may be gathered from your university’s registrar or from a survey that participants take. Alternatively, if your filenames already contain metadata, you can create a metadata spreadsheet by extracting the information from the filenames. We will focus on the first case, in which metadata is gathered from outside sources such as a registrar or a survey.
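For the filename-based alternative, a minimal sketch of the extraction might look like the following. It is not part of the CIABATTA scripts. The filename convention (`<course>_<assignment>_<student_id>.txt`), the column names, and the example filenames are all assumptions made for illustration, so adjust the pattern to match your own files.

```python
import csv
import io
import re

# Hypothetical filename convention: <course>_<assignment>_<student_id>.txt
FILENAME_PATTERN = re.compile(
    r"^(?P<course>[^_]+)_(?P<assignment>[^_]+)_(?P<student_id>[^_.]+)\.txt$"
)


def extract_metadata(filenames):
    """Return one dict per filename that matches the assumed pattern."""
    rows = []
    for name in filenames:
        match = FILENAME_PATTERN.match(name)
        if match is None:
            # Skip files that do not follow the convention; review these by hand.
            continue
        row = {"filename": name}
        row.update(match.groupdict())
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["filename", "course", "assignment", "student_id"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up filenames; the last one does not match the pattern and is skipped.
example_files = ["ENGL101_essay1_S0042.txt", "ENGL101_essay2_S0042.txt", "notes.txt"]
metadata_rows = extract_metadata(example_files)
metadata_csv = rows_to_csv(metadata_rows)
```

In practice you would pass in the real filenames from your corpus folder and write `metadata_csv` to a `.csv` file. Files that do not match the pattern are skipped rather than guessed at, so they can be checked manually.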

Next steps

First, we provide guidance on gathering and preparing metadata from various sources in 7a. Gathering and preparing metadata. Next, we provide a script to combine metadata into one spreadsheet in 7b. Running the metadata processing script. You will add the metadata to your files in 8. Adding headers and changing filenames.
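To give a sense of what combining metadata into one spreadsheet involves, here is a minimal sketch. It is not the script provided in 7b. It assumes two hypothetical sources (a registrar export and a participant survey) that share a `student_id` column. The column names and sample values are invented for illustration.

```python
import csv
import io

# Hypothetical exports; real data would be read from your own files.
registrar_csv = "student_id,country,toefl_score\nS0042,Brazil,95\nS0043,Japan,88\n"
survey_csv = "student_id,first_language\nS0042,Portuguese\nS0044,Korean\n"


def read_rows(csv_text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def combine_by_key(sources, key="student_id"):
    """Merge rows from several sources into one row per key value."""
    combined = {}
    for rows in sources:
        for row in rows:
            combined.setdefault(row[key], {key: row[key]}).update(row)
    return combined


combined = combine_by_key([read_rows(registrar_csv), read_rows(survey_csv)])

# Write one combined sheet; cells with no data in any source stay empty.
columns = ["student_id", "country", "toefl_score", "first_language"]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
for student_id in sorted(combined):
    writer.writerow(combined[student_id])
combined_csv = buffer.getvalue()
```

Participants who appear in only one source are kept, with blank cells for the missing columns, so gaps in the metadata remain visible instead of silently dropping rows.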

Navigating CIABATTA

Previous: 6b. Manually converting your data

Next: 7a. Gathering and preparing metadata

CIABATTA: Corpus in a Box: Automated Tools, Tutorials, & Advising

See a problem in this wiki? Report an issue. Unsure how to report using GitHub? Get help reporting.
