Contents:
.ipynb file: summary of analysis and results.
.mp4 file: video recording of a virtual presentation I delivered at the Loyola Chicago Graduate Research Symposium.
.py file (2): Python scripts.
.R file: R script.
.pdf file: slides presentation in .pdf format.
Abstract:
Advances in text mining and natural language processing have made it viable to study text using methods normally reserved for numerical data. Here I present an analysis of song lyrics based on a data set of 200,000+ songs scraped from the web. I find that several summary statistics follow a smooth unimodal distribution, including total words, unique words, and percentage of words that are unique. These distributions differ as a function of genre, with large effect sizes observed. One of the biggest challenges in natural language processing is the development of tools to measure and score literary devices. I propose a novel framework to measure consonance scores and present an original unsupervised algorithm that can detect consonance in text data. These provide a statistical basis for comparing frequencies of literary devices across songs, genres, and artists.
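As a rough illustration of the three summary statistics named in the abstract (total words, unique words, and percentage of words that are unique), here is a minimal sketch. It is not the code from the scripts in this repository: the function name `lyric_stats`, the tokenization rule, and the sample lyric are all assumptions made for the example.

```python
import re


def lyric_stats(lyrics: str) -> dict:
    """Compute simple per-song word statistics (illustrative sketch only)."""
    # Assumption: a "word" is a run of letters, optionally with internal
    # apostrophes (e.g. "don't"), compared case-insensitively.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", lyrics.lower())
    total = len(words)
    unique = len(set(words))
    # Guard against empty lyrics to avoid dividing by zero.
    pct_unique = 100.0 * unique / total if total else 0.0
    return {
        "total_words": total,
        "unique_words": unique,
        "pct_unique": pct_unique,
    }


# Hypothetical lyric used only to exercise the function.
example = "la la la, sing it again; la la la"
stats = lyric_stats(example)
print(stats)
```

Applying a function like this to every song and grouping the results by genre would give per-genre distributions of the kind the abstract compares. A different tokenization rule would shift the counts, so the rule should be fixed before comparing across songs.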