This folder includes data used in the research for the "Missing Data, Speculative Reading" article.
Copies of published Shakespeare and Company Project dataset files are included for convenience.
Current versions should be obtained from the Project site, and should be cited as listed there:
https://shakespeareandco.princeton.edu/about/data/
The following data files in this folder were generated as part of the research for this article or contain data not published elsewhere.
- beach_lendinglibrary_catalog.csv
This file is a spreadsheet of acquisitions compiled by Robert Chiossi for the Project from an inventory in the Sylvia Beach papers.
“Inventories, Order Records, Clients; Sylvia Beach Papers, C0108,” (n.d.), Manuscripts Division, Department of Special Collections, Princeton University Library, findingaids.princeton.edu/catalog/C0108_c02205.
Members with extant but incomplete borrowing records. CSV files list these members and their subscriptions without documented borrowing activity. The collapsed version consolidates sequential or near-sequential subscriptions.
The files were generated by identify_partial_borrowers.py
- partial_borrowers.csv
- partial_borrowers_collapsed.csv
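To illustrate what consolidating sequential or near-sequential subscriptions can look like, here is a minimal sketch. It is not the logic of identify_partial_borrowers.py: the `collapse_subscriptions` helper, the column names (`member_id`, `start`, `end`), and the 30-day gap threshold are all assumptions made for the example, and it assumes a member's subscriptions do not overlap.

```python
import pandas as pd


def collapse_subscriptions(subs, max_gap_days=30):
    """Merge a member's subscriptions when the gap between them is small.

    Hypothetical helper: column names and threshold are illustrative only.
    """
    subs = subs.copy()
    subs["start"] = pd.to_datetime(subs["start"])
    subs["end"] = pd.to_datetime(subs["end"])
    subs = subs.sort_values(["member_id", "start"])
    # end date of the same member's previous subscription (NaT for the first)
    prev_end = subs.groupby("member_id")["end"].shift()
    # start a new group for a first subscription or when the gap is too large
    new_group = prev_end.isna() | (
        (subs["start"] - prev_end).dt.days > max_gap_days
    )
    subs["group"] = new_group.cumsum()
    return (
        subs.groupby(["member_id", "group"], as_index=False)
        .agg(start=("start", "min"), end=("end", "max"))
        .drop(columns="group")
    )


# made-up sample data, not taken from the dataset
sample_subs = pd.DataFrame(
    {
        "member_id": ["a", "a", "a", "b"],
        "start": ["1925-01-01", "1925-02-05", "1926-01-01", "1925-03-01"],
        "end": ["1925-02-01", "1925-03-05", "1926-02-01", "1925-04-01"],
    }
)
collapsed = collapse_subscriptions(sample_subs)
print(collapsed)
```

In this sample, the first two subscriptions for member "a" are four days apart and are merged into one span, while the 1926 subscription stays separate.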
In the course of our research, we discovered long-duration borrow events (duration longer than a year) that had been entered incorrectly. These errors are present in the v1.2 datasets, but corrections have been submitted to the Shakespeare and Company Project. Because these errors affect our estimates, we include a list of overrides and a mechanism for applying them.
- long_borrow_overrides.csv
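As a rough illustration of how borrow events lasting longer than a year can be surfaced for review, the sketch below filters an events table on the `event_type` and `borrow_duration_days` columns. The `find_long_borrows` helper, the 365-day cutoff, and the sample rows are assumptions for the example, not part of the published code or data.

```python
import pandas as pd


def find_long_borrows(events_df, min_days=365):
    """Return borrow events longer than min_days (hypothetical helper)."""
    borrows = events_df[events_df.event_type == "Borrow"]
    return borrows[borrows.borrow_duration_days > min_days]


# made-up sample rows; real data would come from the events CSV
sample_events = pd.DataFrame(
    {
        "event_type": ["Borrow", "Borrow", "Subscription"],
        "borrow_duration_days": [14, 400, None],
    }
)
long_borrows = find_long_borrows(sample_events)
print(long_borrows)
```

A long duration is only a signal that a record deserves a closer look; it does not by itself show that the dates were entered incorrectly.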
The long borrow corrections are meant to be used with the 1.2 version of the dataset. They can be incorporated like this:
import pandas as pd

events_df = pd.read_csv("SCoData_events_v1.2_2022-01.csv")
borrow_overrides = pd.read_csv("long_borrow_overrides.csv")

for borrow in borrow_overrides.itertuples():
    # find all borrow events for this member and item
    member_item_borrows = events_df[
        (events_df.event_type == "Borrow")
        & (events_df.member_uris == borrow.member_uris)
        & (events_df.item_uri == borrow.item_uri)
    ]
    # get the *index* of the row to update, matching on the date
    # named in the override's match_date column
    if borrow.match_date == "start_date":
        update_index = member_item_borrows.index[
            member_item_borrows.start_date == borrow.start_date
        ]
    elif borrow.match_date == "end_date":
        update_index = member_item_borrows.index[
            member_item_borrows.end_date == borrow.end_date
        ]
    else:
        # skip overrides with an unrecognized match_date value
        continue
    # update with correct dates & borrow duration;
    # .loc is used because update_index is an Index, not a scalar label
    events_df.loc[update_index, "start_date"] = borrow.start_date
    events_df.loc[update_index, "end_date"] = borrow.end_date
    events_df.loc[
        update_index, "borrow_duration_days"
    ] = borrow.borrow_duration_days
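After applying overrides, it may be worth checking that the recorded durations agree with the dates. The sketch below is one possible check, not part of the article's code: the `duration_mismatches` helper is hypothetical, and it assumes that only fully specified `YYYY-MM-DD` dates are comparable, so rows with missing or unparseable dates are skipped.

```python
import pandas as pd


def duration_mismatches(events_df):
    """Return borrows whose recorded duration differs from end - start.

    Hypothetical helper; rows without two parseable dates are ignored.
    """
    borrows = events_df[events_df.event_type == "Borrow"].copy()
    start = pd.to_datetime(
        borrows.start_date, format="%Y-%m-%d", errors="coerce"
    )
    end = pd.to_datetime(borrows.end_date, format="%Y-%m-%d", errors="coerce")
    borrows["computed_days"] = (end - start).dt.days
    comparable = borrows.dropna(
        subset=["computed_days", "borrow_duration_days"]
    )
    return comparable[
        comparable.computed_days != comparable.borrow_duration_days
    ]


# made-up sample rows for illustration only
check_df = pd.DataFrame(
    {
        "event_type": ["Borrow", "Borrow", "Borrow"],
        "start_date": ["1925-01-01", "1925-01-01", "1925-01-01"],
        "end_date": ["1925-01-15", "1925-01-15", None],
        "borrow_duration_days": [14, 379, None],
    }
)
mismatches = duration_mismatches(check_df)
print(mismatches)
```

Here only the second row is reported, since its recorded duration of 379 days does not match the 14 days between its start and end dates.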