Combine Newseletter Datasets.Rmd - R script that merges the separated Zooniverse manuscripts.
Processing Data.ipynb - Python notebook that extracts transcriptions from the Newsletter folder. It also uses regex to extract information such as the Filename, Hamnet URL, and Luna URL.
Cleaning Data.Rmd - R script that removes duplicate lines from transcriptions and creates datasets that include only key information from the previously processed datasets.
Unprocessed Data - Contains copies of all the datasets we used, prior to any manipulation or cleaning.
Processed Data - Contains datasets that had some cleaning/manipulation done, but needed to be imported into R for the final cleaning steps.
Final Datasets - Contains copies of the datasets in their final form. Each dataset is to be delivered to a client who needs different types of data for their purposes.
pandas - Used for data manipulation and for creating dataframes that are exported to .csv files.
os/pathlib - Used to work with Newsletter file directories and extract transcriptions from .txt files.
tidyverse - Used for transforming and final cleaning of processed datasets.
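
As a rough illustration of how the Python packages listed above fit together, the sketch below walks a folder of .txt transcriptions with pathlib, pulls URLs out with regex, and exports the result with pandas. It is not the code in Processing Data.ipynb: the function names, the column names, and especially the two URL patterns (any URL containing "hamnet" or "luna") are assumptions made for the example.

```python
import re
from pathlib import Path

# Assumed patterns (hypothetical): any URL whose text contains "hamnet" or "luna".
# The real notebook may match these URLs differently.
HAMNET_RE = re.compile(r"https?://\S*hamnet\S*", re.IGNORECASE)
LUNA_RE = re.compile(r"https?://\S*luna\S*", re.IGNORECASE)


def first_match(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    match = pattern.search(text)
    return match.group(0) if match else None


def extract_records(folder):
    """Build one record per .txt file found under folder (searched recursively)."""
    records = []
    for path in sorted(Path(folder).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        records.append({
            "Filename": path.name,
            "Hamnet URL": first_match(HAMNET_RE, text),
            "Luna URL": first_match(LUNA_RE, text),
            "Transcription": text,
        })
    return records


def export_records(records, out_path):
    """Write the records to a .csv file through a pandas DataFrame."""
    # Imported here so the extraction step still works without pandas installed.
    import pandas as pd

    pd.DataFrame(records).to_csv(out_path, index=False)
```

A call such as export_records(extract_records("Newsletter"), "transcriptions.csv") would then produce a .csv that the R scripts could read for the final cleaning steps; both the folder path and the output filename here are placeholders.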