7b. Running the metadata processing script
- Required installations
- Downloading the script and the files
- What does the script do?
- Required documents
- YAML file
- Running the script
- Video presentation
Before you try to run this script, make sure you have the following installed:
If you have Anaconda installed | If you have installed Python another way |
---|---|
conda install pandas | pip install pandas |
conda install xlrd | pip install xlrd |
conda install pyyaml | pip install pyyaml |
Note: If these commands do not work, you might need to install Anaconda first.
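To confirm that the installations above worked, a short check like the following can help. It is a minimal sketch that uses only the Python standard library; the function name missing_packages is our own and is not part of the ciabatta script.

```python
import importlib.util

# Install name -> import name for the three packages listed above.
# Note that pyyaml is imported under the name "yaml".
REQUIRED = {"pandas": "pandas", "xlrd": "xlrd", "pyyaml": "yaml"}


def missing_packages(required=REQUIRED):
    """Return the install names of packages that Python cannot find."""
    return [
        install_name
        for install_name, import_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported as missing, rerun the matching conda or pip command from the table.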
There are two ways to download the script and test files:
- From the GitHub website: Navigate to the ciabatta repository, click the "Code" button in the upper right corner, and select "Download ZIP". This downloads a zip file to your computer. Unzip the file (Windows users: make sure you actually unzip it), and you will have the script and the test files folder on your computer.
- From the terminal: Navigate to the ciabatta repository, click the "Code" button in the upper right corner, and copy the link. Then open your terminal on a Mac (on Windows, use Command Prompt or PowerShell) and run this line: git clone https://github.com/writecrow/ciabatta.git This downloads the Git repository, with the script and the files, to your computer.
Note: Your computer might not have Git pre-installed.
On a Mac: To get Git, first install Homebrew by copying and pasting the following command into your terminal: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Then run brew install git in your terminal. To verify that the installation was successful, run git --version and check that it prints a version number.
On a PC: After you install Anaconda, run this command:
conda install -c anaconda git
The purpose of the script is to combine two spreadsheets of metadata so that the result can be used in a later step for adding headers and changing filenames. The script outputs a new spreadsheet that combines the data from the two existing spreadsheets. Your own context for combining metadata spreadsheets might be different, but it is not uncommon to have, for example, one spreadsheet with only the consented participants and another with all the students from a given course. To help you adapt the script to your context, we incorporated a YAML file that can be altered to fit your specific needs.
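The idea of combining two spreadsheets can be pictured with a small pandas sketch. This is not the code of the ciabatta script: the data, the column names other than "Name", and the choice of an inner join (keeping only people who appear in both spreadsheets) are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the two spreadsheets; the real files
# have their own columns and many more rows.
consented = pd.DataFrame(
    {"Name": ["Ana Lopez", "Ben Kim"], "Instructor": ["1001", "1002"]}
)
registrar = pd.DataFrame(
    {
        "Name": ["Ana Lopez", "Ben Kim", "Cara Diaz"],
        "Major": ["Biology", "History", "Physics"],
    }
)

# Assumption: keep only the rows whose Name appears in both tables.
combined = pd.merge(consented, registrar, on="Name", how="inner")
print(combined)
```

Here Cara Diaz is dropped because she is in the registrar data but not in the consented list, which mirrors the consented-versus-all-students situation described above.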
We have provided a folder with test files so that you can try running the script. The following documents are included in the test files and are needed to run the script:
- Spreadsheet 1 (consented_students.xlsx)
- Spreadsheet 2 (registrar_file.xlsx)
- YAML file (metadata.yaml)
YAML (Yet Another Markup Language or YAML Ain't Markup Language) is often used alongside programming languages like Python to write configuration files. In our example, the YAML file indicates that we use two spreadsheets (file_1 and file_2) and that both have a column named "Name", which is used as the anchor for combining them. If the corresponding column in your data has a different name, change it in the YAML file accordingly. The YAML file also indicates that the first file (file_1) contains tabs, which in our case are instructor codes, while the second file (file_2) does not.
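To give a feel for how Python reads such a configuration, here is a minimal sketch using pyyaml. The key names key_column and has_tabs are hypothetical and will probably not match the real metadata.yaml; open the file in the test_files folder to see the actual layout.

```python
import yaml

# Hypothetical configuration text. Only file_1, file_2, and the "Name"
# column come from the description above; the key names are made up.
EXAMPLE_CONFIG = """
file_1:
  key_column: Name
  has_tabs: true
file_2:
  key_column: Name
  has_tabs: false
"""

# safe_load turns the YAML text into ordinary Python dictionaries.
config = yaml.safe_load(EXAMPLE_CONFIG)

for label, settings in config.items():
    print(label, "joins on", settings["key_column"], "| tabs:", settings["has_tabs"])
```

Keeping these details in a YAML file means that a different column name or tab layout can be handled by editing the configuration rather than the Python code.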
In your terminal, navigate to your downloaded ciabatta folder. For example, if you unzipped the files to your Desktop, first navigate to your Desktop and then into the ciabatta folder:
cd Desktop
cd ciabatta
Now navigate to the metadata_prep folder inside ciabatta, where the script and the folder with test files are located:
cd metadata_prep
Use the following command to run the script:
python process_metadata_ciabatta.py --file1=test_files/consented_students.xlsx --file2=test_files/registrar_file.xlsx --yaml_file=test_files/metadata.yaml
Running the script should create a .csv file called metadata.csv.
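One quick way to check the result is to look at the column names and the number of rows in the output. The sketch below uses only the standard library; the helper name summarize_csv is our own, and the sketch assumes that metadata.csv sits in the folder you run it from and is UTF-8 encoded.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column names, number of data rows) for a CSV file."""
    # utf-8-sig reads plain UTF-8 and also tolerates a leading byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


output = Path("metadata.csv")  # file name taken from the text above
if output.exists():
    columns, rows = summarize_csv(output)
    print(f"{rows} rows with columns: {columns}")
else:
    print("metadata.csv not found; run this from the folder where it was created.")
```

If the row count is much lower than expected, a mismatch in the "Name" column between the two spreadsheets is a likely place to look.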
A video version of this content is available on the Crow YouTube channel.
Video: Running a metadata script
Previous: 7a. Gathering and preparing metadata
CIABATTA: Corpus in a Box: Automated Tools, Tutorials, & Advising