VVH/The-Shakespeares-UMD

 
 

Shakespeare's World: Data Prep and Processing

Team Lead: Nisa Putri

Team Members: Andrew Carroll, Jonathan Chen, Josh Hershberger, Momina Khan

File Structure:

Combine Newseletter Datasets.Rmd - R script that merges the separate Zooniverse manuscript datasets.

Processing Data.ipynb - Python notebook that extracts transcriptions from the Newsletter folder. It also uses regex to extract fields such as the Filename, Hamnet URL, and Luna URL.
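The regex step could look roughly like the sketch below. It is only an illustration: the layout of the transcription files is not described here, so the `Filename:` label, the `hamnet.folger.edu` and `luna.folger.edu` URL patterns, the helper name `extract_metadata`, and the sample text are all assumptions rather than the notebook's actual code.

```python
import re

# Hypothetical patterns: the real file layout is not documented in this README,
# so these expressions are assumptions made for illustration only.
FILENAME_RE = re.compile(r"^Filename:\s*(?P<filename>.+?)\s*$", re.MULTILINE)
HAMNET_RE = re.compile(r"https?://hamnet\.folger\.edu/\S+")
LUNA_RE = re.compile(r"https?://luna\.folger\.edu/\S+")


def extract_metadata(text):
    """Return the first filename, Hamnet URL, and Luna URL found in text.

    Fields that are not present come back as None.
    """
    filename = FILENAME_RE.search(text)
    hamnet = HAMNET_RE.search(text)
    luna = LUNA_RE.search(text)
    return {
        "Filename": filename.group("filename") if filename else None,
        "Hamnet URL": hamnet.group(0) if hamnet else None,
        "Luna URL": luna.group(0) if luna else None,
    }


# Made-up sample record, not taken from the project data.
sample = (
    "Filename: example_letter_001.txt\n"
    "Catalog: https://hamnet.folger.edu/example-record\n"
    "Image: https://luna.folger.edu/example-image\n"
    "Transcription text follows here.\n"
)
metadata = extract_metadata(sample)
```

Returning `None` for a missing field keeps incomplete records visible, so they can be reviewed later rather than silently dropped.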

Cleaning Data.Rmd - R script that removes duplicate lines from transcriptions and creates datasets that include only the key information from previously processed datasets.
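The project does this step in R. For readers following along in Python, a rough equivalent of the duplicate-line removal is sketched below. It assumes that "duplicate" means an exact repeat of an earlier line once surrounding whitespace is ignored, and that blank lines should be kept. The function name `drop_duplicate_lines` is hypothetical.

```python
def drop_duplicate_lines(transcription):
    """Remove repeated lines from a transcription, keeping the first occurrence.

    Assumption: two lines are duplicates when they match after stripping
    leading and trailing whitespace. Blank lines are always kept.
    """
    seen = set()
    kept = []
    for line in transcription.splitlines():
        key = line.strip()
        if key and key in seen:
            continue  # skip a non-blank line that has already appeared
        seen.add(key)
        kept.append(line)
    return "\n".join(kept)


# Made-up example text, not project data.
cleaned = drop_duplicate_lines("a\nb\na\n\nb\nc")
```

Line order is preserved, which matters for transcriptions that are meant to be read top to bottom.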

Unprocessed Data - Contains copies of all the datasets we used, prior to any manipulation or cleaning.

Processed Data - Contains datasets that have had some cleaning or manipulation applied but still needed to be imported into R for the final cleaning steps.

Final Datasets - Contains copies of datasets in their final form. Each dataset is intended for a client who needed a different type of data for their purposes.

Python Packages:

Pandas - Used for data manipulation and creating dataframes that are exported to .csv files.

os/pathlib - Used to work with Newsletter file directories and extract transcriptions from .txt files.
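As a minimal sketch of the directory-walking step, the snippet below uses `pathlib` to read every `.txt` file under a folder into a list of row dictionaries. The folder layout, the UTF-8 encoding, the column names, and the helper name `collect_transcriptions` are assumptions, and the demonstration runs on throwaway files rather than the real Newsletter folder.

```python
import tempfile
from pathlib import Path


def collect_transcriptions(folder):
    """Read every .txt file under folder (recursively) into row dictionaries."""
    rows = []
    for path in sorted(Path(folder).rglob("*.txt")):
        rows.append({
            "Filename": path.name,
            # Assumption: the transcription files are UTF-8 encoded.
            "Transcription": path.read_text(encoding="utf-8"),
        })
    return rows


# Demonstration on a temporary directory with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    newsletter = Path(tmp) / "Newsletter"
    (newsletter / "batch_1").mkdir(parents=True)
    (newsletter / "batch_1" / "b.txt").write_text("second", encoding="utf-8")
    (newsletter / "a.txt").write_text("first", encoding="utf-8")
    rows = collect_transcriptions(newsletter)
```

A list of dictionaries like `rows` can be passed to `pandas.DataFrame(rows)` and written out with `DataFrame.to_csv`, which matches the Pandas usage described above.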

R Libraries:

tidyverse - Used for transforming the processed datasets and performing the final cleaning.
