
# Manifest File

## raw_data_README.md:

README file that describes how to access my raw data.

## Requirements.txt:

All of the modules needed to create my final data file.

## final_institution_doi_reproducible_JupyterNotebook.ipynb:

Reproducible notebook that walks through joining my intermediate data files.

## final_institution_doi_assessment_report.pdf:

An assessment of the value of my final dataset.

## DataCite.md:

Markdown file that describes my DataCite data file.

## GRID.md:

Markdown file that describes my GRID data file.

## Harvard_DataVerse.md:

Markdown file that describes my Harvard DataVerse data file.

## OpenAire.md:

Markdown file that describes my OpenAire data file.

## Wieker_summary_slide.pptx:

PowerPoint slide that describes the goals and processes of my dataset project.

## resume_excerpt.pdf:

An excerpt showing how this dataset appears in my resume.

## original_DataCite_data (folder):

A folder of subfolders, one per institution, each containing that institution's JSON data from DataCite.

## DataCite_API.py:

Script that accesses the DataCite API for institution pages 1-40.
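
For context, a minimal sketch of the kind of paging loop this script performs, assuming the public DataCite REST API at `https://api.datacite.org`; the endpoint, page size, and output file names here are placeholders, not necessarily what DataCite_API.py actually uses.

```python
import json
import requests

BASE_URL = "https://api.datacite.org/dois"  # assumed public DataCite REST API endpoint

for page in range(1, 41):  # institution pages 1-40
    resp = requests.get(
        BASE_URL,
        params={"page[number]": page, "page[size]": 100},
        timeout=30,
    )
    resp.raise_for_status()
    # Save each page of raw JSON so later scripts can convert it to CSV.
    with open(f"datacite_page_{page}.json", "w") as fh:
        json.dump(resp.json(), fh)
```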

## DataCite_API_pages(41-100).py:

Script that accesses the DataCite API for institution pages 41-100, where those pages exist.

## datacentersfull.txt:

Supplemental file for DataCite_API_pages(41-100).py; it is used to look up institution names.

## DataCite_csvconvert.py:

Script that extracts the relevant elements from the DataCite data and converts them to a CSV file.
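
A sketch of this extract-and-convert step, assuming the JSON:API layout DataCite returns (a `data` list with `id` and `attributes`); the specific fields pulled out below (`publisher`, `publicationYear`) are placeholders, not necessarily the elements the real script extracts.

```python
import csv
import glob
import json

rows = []
for path in glob.glob("original_DataCite_data/**/*.json", recursive=True):
    with open(path) as fh:
        payload = json.load(fh)
    for record in payload.get("data", []):       # JSON:API "data" list
        attrs = record.get("attributes", {})
        rows.append({
            "doi": record.get("id"),
            "publisher": attrs.get("publisher"),  # placeholder fields
            "year": attrs.get("publicationYear"),
        })

with open("DataCite_Doi.csv", "w", newline="") as out:
    writer = csv.DictWriter(out, fieldnames=["doi", "publisher", "year"])
    writer.writeheader()
    writer.writerows(rows)
```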

## DataCite_Doi.csv:

File created by DataCite_csvconvert.py; it contains the elements that I extracted from my raw data.

## curated_DataCite_doi:

The CSV file that I created with my DataCite_agg.py script.

## curated_DataCite_doi_2.0.csv:

I made some hand edits to my curated_DataCite_doi CSV file: I added a column next to the institution codes listing the institution names that correspond to those codes. This is the data file that I used in my Jupyter notebook.

## DataCite_agg.py:

Python script that aggregates the DataCite_Doi.csv file by institution.
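
Roughly, the aggregation looks like the groupby below; the column names (`institution`, `doi`) are assumptions about DataCite_Doi.csv, not confirmed headers.

```python
import pandas as pd

df = pd.read_csv("DataCite_Doi.csv")
# Count DOIs per institution code (column names are assumptions).
agg = df.groupby("institution", as_index=False).agg(doi_count=("doi", "count"))
agg.to_csv("curated_DataCite_doi", index=False)
```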

## grid.csv:

Main GRID database file that describes institutions.

## types.csv:

Supplemental GRID database file that describes institution types.

## grid_merge.py:

Script that merges the GRID database files into a single file showing institution location and type.
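
A sketch of that merge, assuming the column layout of the public GRID release (`grid.csv` keyed by `ID`, `types.csv` keyed by `grid_id`); the actual files may use different headers.

```python
import pandas as pd

grid = pd.read_csv("grid.csv")    # assumed columns: ID, Name, City, State, Country
types = pd.read_csv("types.csv")  # assumed columns: grid_id, type
merged = grid.merge(types, left_on="ID", right_on="grid_id", how="left")
merged.to_csv("grid_merge.csv", index=False)
```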

## grid_merge.csv:

Merged GRID database that shows institution location and type.

## grid_curated.csv:

Contains only the institutions from grid_merge.csv that appear in my collected data.
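
The filtering step could look like this; the key columns (`Name` in grid_merge.csv, `institution_name` in the curated data) are hypothetical names used only to illustrate the filter.

```python
import pandas as pd

grid_merge = pd.read_csv("grid_merge.csv")
collected = pd.read_csv("curated_DataCite_doi_2.0.csv")
# Keep only GRID rows whose institution appears in my collected data
# (column names here are hypothetical).
mask = grid_merge["Name"].isin(collected["institution_name"])
grid_merge[mask].to_csv("grid_curated.csv", index=False)
```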

## DataVerse_screenshot_1:

First screenshot of the query I ran in the DataVerse repository to get dataset metadata.

## DataVerse_screenshot_2:

Second screenshot of the query I ran in the DataVerse repository to get dataset metadata.

## DataVerse_doi.csv:

I hand-coded this file from the 2017 dataset publication information available in the DataVerse repository.

## curated_DataVerse_doi:

Curated version of my DataVerse_doi.csv file; this is the file that is read into my reproducible Jupyter notebook.

## OpenAire_doi.txt:

I copy-pasted all of the 2017 dataset publication data that is present on the OpenAire search page.

## OpenAire_screenshot:

Screenshot of what OpenAire looked like when I copy-pasted the publication data.

## OpenAire_doi.csv:

Hand-coded CSV file created from OpenAire_doi.txt.

## curated_OpenAire_doi.csv:

Curated version of my OpenAire_doi.csv file; this is the file that is read into my reproducible Jupyter notebook.

## institution_doi.csv:

Merged curated_DataCite_doi_2.0.csv, curated_DataVerse_doi.csv, curated_OpenAire_doi.csv, and grid_curated.csv into one data file.
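
A sketch of that merge as it might appear in the notebook; the shared key (`institution_name`) and the outer-join strategy are assumptions, not the notebook's confirmed logic.

```python
import pandas as pd

datacite = pd.read_csv("curated_DataCite_doi_2.0.csv")
dataverse = pd.read_csv("curated_DataVerse_doi.csv")
openaire = pd.read_csv("curated_OpenAire_doi.csv")
grid = pd.read_csv("grid_curated.csv")

# Join the three source tables on a shared institution key (name assumed),
# then attach GRID location/type information.
merged = (
    datacite
    .merge(dataverse, on="institution_name", how="outer")
    .merge(openaire, on="institution_name", how="outer")
    .merge(grid, left_on="institution_name", right_on="Name", how="left")
)
merged.to_csv("institution_doi.csv", index=False)
```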

## cleaned_institution_doi.csv:

Cleaned up some duplicate rows from institution_doi.csv.

## final_institution_doi.csv:

Final dataset; it received some additional cleaning (removed incorrect license information and reset the index).
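
The cleaning behind the last two files amounts to something like the pandas sketch below; the `license` column name is an assumption about where the incorrect license information lived.

```python
import pandas as pd

df = pd.read_csv("institution_doi.csv")
df = df.drop_duplicates()                            # -> cleaned_institution_doi.csv
df = df.drop(columns=["license"], errors="ignore")   # drop incorrect license info (column name assumed)
df = df.reset_index(drop=True)
df.to_csv("final_institution_doi.csv", index=False)
```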