7b. Running the metadata processing script

Shelley Staples edited this page Nov 30, 2021 · 1 revision

Required installations

Before you try to run this script, make sure you have the following installed:

If you have Anaconda installed    If you have installed Python another way
conda install pandas              pip install pandas
conda install xlrd                pip install xlrd
conda install pyyaml              pip install pyyaml

Note: If these commands do not work, you might need to install Anaconda first.
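
If you are not sure whether the installations succeeded, a short check like the one below can help. It is only a sketch and not part of ciabatta: the function name missing_packages is made up for illustration. It relies on the fact that the pyyaml package is imported under the name yaml.

```python
import importlib.util

# Maps the name used in the install command to the name used in `import`.
# Note that the pyyaml package is imported as `yaml`.
REQUIRED = {"pandas": "pandas", "xlrd": "xlrd", "pyyaml": "yaml"}


def missing_packages(required=REQUIRED):
    """Return the install names of the packages Python cannot find."""
    return [
        install_name
        for install_name, import_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Still to install:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Save this as a .py file and run it with the same Python you plan to use for the script; a package installed into a different Python installation will still show up as missing.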

Downloading the script and the files

There are two ways (1 and 2 below) that you can download the script and test files:

    1. From the GitHub website: Navigate to the ciabatta repository, then in the upper right corner click on the "Code" button and select "Download zip". This will download the zip file to your computer. Then unzip the file (Windows users: make sure you actually extract the files rather than only opening the zip), and you will have the script and the folder with test files on your computer.
    2. From the terminal: On the GitHub website, navigate to the ciabatta repository, then in the upper right corner click on the "Code" button and copy the link. Now open your terminal on a Mac (in Windows, use Command Prompt or PowerShell) and run this line: git clone https://github.com/writecrow/ciabatta.git This will download the repository, with the script and the files, onto your computer.

Note: Your computer might not have Git pre-installed.

On a Mac: To get Git, first install Homebrew by copying and pasting the following code into your command line: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" (the older Ruby-based install command is deprecated). Then in your terminal type brew install git. To verify that the installation was successful, check your Git version with git --version.

On a PC: After you install Anaconda, run this command: conda install -c anaconda git

What does the script do?

The purpose of the script is to combine two spreadsheets with metadata, to be used at a later step for adding headers and changing filenames. The script outputs a new spreadsheet with the data combined from the two existing spreadsheets. Your context for combining spreadsheets with metadata might be different, but it is not uncommon to have, for example, one spreadsheet with only consented participants and another with all the students from a given course. To help you adapt the script to your context, we incorporated a YAML file that can be altered to fit your specific needs.
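
To make the idea of combining two spreadsheets more concrete, here is a minimal sketch. It is not the script's actual code (the script relies on pandas, and we use plain Python dictionaries here only to show the logic). The function name combine_on_anchor, the sample rows, and the "Instructor" and "Major" columns are all made up for illustration. The sketch also assumes that only students who appear in the consented spreadsheet should be kept, which may not match how the script treats unmatched rows.

```python
def combine_on_anchor(consented_rows, registrar_rows, anchor="Name"):
    """Add registrar columns to each consented row that shares the anchor value."""
    registrar_by_anchor = {row[anchor]: row for row in registrar_rows}
    combined = []
    for row in consented_rows:
        match = registrar_by_anchor.get(row[anchor], {})
        # If both sheets have the same column, the consented value is kept.
        combined.append({**match, **row})
    return combined


# Made-up rows standing in for the two spreadsheets.
consented = [
    {"Name": "Student A", "Instructor": "1001"},
    {"Name": "Student B", "Instructor": "1002"},
]
registrar = [
    {"Name": "Student A", "Major": "Biology"},
    {"Name": "Student B", "Major": "History"},
    {"Name": "Student C", "Major": "Physics"},  # not in the consented sheet
]

combined = combine_on_anchor(consented, registrar)
for row in combined:
    print(row)
```

Each printed row carries the columns from both sheets for one consented student; "Student C" is left out because that name does not appear in the consented rows.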

Required documents

So that you can try the script, we have provided a folder with test files. The following documents are included in the test files and are needed to run the script:

  • Spreadsheet 1 (consented_students.xlsx)
  • Spreadsheet 2 (registrar_file.xlsx)
  • YAML file (metadata.yaml)

YAML file

YAML (Yet Another Markup Language or YAML Ain't Markup Language) is often used in conjunction with programming languages like Python to write configuration files. In our example, the YAML file indicates that we use two spreadsheets (file_1 and file_2), and that both of these spreadsheets have a column named "Name" that will be used as an anchor to combine the spreadsheets. Note that if the column name in your data is different, you need to change it in the YAML file accordingly. The YAML file also indicates that the first file (file_1) contains tabs (worksheets), which in our case correspond to instructor codes, while the second file (file_2) does not contain tabs.
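
Once a YAML file is loaded in Python (for example with yaml.safe_load from pyyaml), its settings become ordinary dictionaries. The sketch below shows roughly what the settings described above could look like after loading, and how a mismatched column name could be spotted. The key names anchor_column and has_tabs, the helper files_missing_anchor, and the sample headers are hypothetical; the real metadata.yaml may be organized differently, so open it to see the actual keys.

```python
# Hypothetical settings, written as the dictionary a YAML loader would return.
# The real keys in metadata.yaml may differ.
config = {
    "file_1": {"anchor_column": "Name", "has_tabs": True},
    "file_2": {"anchor_column": "Name", "has_tabs": False},
}


def files_missing_anchor(config, headers_by_file):
    """Return the config entries whose anchor column is not among the sheet's headers."""
    return [
        file_key
        for file_key, settings in config.items()
        if settings["anchor_column"] not in headers_by_file[file_key]
    ]


# Made-up headers: the second sheet calls the column "Student Name",
# so the configuration would need to be changed to match.
headers = {
    "file_1": ["Name", "Instructor"],
    "file_2": ["Student Name", "Major"],
}
print(files_missing_anchor(config, headers))
```

In this made-up case the check flags file_2, which is the situation where you would edit the anchor column name in the YAML file (or rename the column in the spreadsheet).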

Running the script

In your terminal, navigate to where your downloaded ciabatta folder is. For example, if you unzipped the files to your Desktop, navigate to your Desktop.

cd Desktop
cd ciabatta

Now navigate to the metadata_prep folder inside ciabatta, where the script and the folder with test files are located:

cd metadata_prep

Use the following command to run the script:

python process_metadata_ciabatta.py --file1=test_files/consented_students.xlsx --file2=test_files/registrar_file.xlsx --yaml_file=test_files/metadata.yaml
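
If the command fails because a file cannot be found, the usual cause is running it from the wrong folder. The following sketch checks that the paths used in the command exist relative to a given folder. It is not part of ciabatta, and the name missing_inputs is made up for illustration.

```python
from pathlib import Path

# Paths taken from the command above, relative to the metadata_prep folder.
EXPECTED = [
    "process_metadata_ciabatta.py",
    "test_files/consented_students.xlsx",
    "test_files/registrar_file.xlsx",
    "test_files/metadata.yaml",
]


def missing_inputs(folder=".", expected=EXPECTED):
    """Return the expected paths that do not exist inside `folder`."""
    base = Path(folder)
    return [name for name in expected if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_inputs()
    if missing:
        print("Not found from this folder:", ", ".join(missing))
    else:
        print("All inputs found; the command should be able to locate them.")
```

Run it from the same folder in which you plan to run the script. If anything is reported as not found, check with cd that you are inside metadata_prep.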

Running the script should create a .csv file called metadata.csv.
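
To confirm that the output looks as expected, you can open metadata.csv in a spreadsheet program, or preview it from Python. The sketch below uses only the standard library. The function name preview_csv is made up for illustration, and the sketch assumes the file is UTF-8 encoded, which we have not verified for the script's output.

```python
import csv
from itertools import islice


def preview_csv(path, max_rows=5):
    """Return the header and up to `max_rows` data rows from a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        rows = list(islice(reader, max_rows))
    return header, rows
```

For example, header, rows = preview_csv("metadata.csv") returns the column names and the first five rows, assuming metadata.csv is in the folder you are running Python from. The header should include the "Name" column along with columns from both spreadsheets.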

Video presentation

A video version of this content is available on the Crow YouTube channel.

Video: Running a metadata script

Navigating CIABATTA

Previous: 7a. Gathering and preparing metadata

Next: 8. Adding headers and changing filenames