Cassie Maz, 27 April 2019, cmm281@pitt.edu
Cassie's term project. An analysis of how 'feminine' female dialogue is across companies, story role, and time.
A link to my guestbook: click here
Using a handful of features from Lakoff's list of feminine language features, I analyze how well animated female characters adhere to the features. Specifically, I look at 12 Disney Princess movies and 9 DreamWorks movies to compare the following:
- How often female protagonists and female antagonists use these features
- How often Disney characters and DreamWorks characters use these features
- How earlier characters (like Snow White) and later characters (like Moana) use these features
My found data are movie scripts, which come from two main sources.
The repo contains the following:
- README.md: this file
- LICENSE.md: the sharing license for this repo
- Project_Presentation.pdf: a PowerPoint presentation of some earlier findings from my project
- final_report.md: a summary of all my analysis
- progress_report.md: reports of what I've done for this project in Spring 2019
- project_plan.md: my initial goals for this project
- images: the various png images I use as visuals for my project
- data_sample: small examples of the data I'm working with
- Disney_Data_Complete_Edits_Good: completed cleanup of the Disney data
- Final_Disney_DataFrame: A dataframe of all the Disney movies in its final form
- Messy_Disney_Movies: observations on which Disney movies need the most clean-up
- Refining_Disney_Data: Removing The Lion King and adding Moana
- Scraping_Frozen: Web scraping a script for Frozen
- Scraping_Tangled: Web scraping a script for Tangled
- Analyzing_White_Space: Observations on how white space can be used to sort these scripts by line
- Dreamworks_DataFrame: constructing and finalizing the DreamWorks movies into a dataframe
- Kung_Fu_Panda_DataFrame: converts the .txt script into a dataframe
- HTTYD_DataFrame: converts the .txt script of How to Train Your Dragon into a dataframe
- Megamind_DataFrame: creates a dataframe of lines in the movie Megamind from a web-scraped script
- Shrek3_DataFrame: creates a dataframe from the .txt script
- Shrek_DataFrame: creates a dataframe from the .txt script
- Streamline_RotG_HTTYD2_Croods_DataFrame: a single piece of code to create dataframes from the .txt Rise of the Guardians, How to Train Your Dragon 2, and The Croods scripts
- antz-Copy1: creates a dataframe from the .txt script of Antz
- httyd_megamind_kungfupanda: an analysis of how to approach these oddly formatted scripts
- Preliminary Analysis: A folder of basic analysis and exploration of the complete Disney and DreamWorks dataframes
- All_Movies_Analysis_Basic: basic type/token analysis and exploration of the full dataframe
- All_Movies_Analysis_Basic_Part_2: Similar to the Jupyter notebook above, but with saved image files
- Char_Token_Type_Lists: creates a new dataframe of every character and their total Token/Type counts
- Commands_Analysis: Finds commands in each line through regular expressions and looks for significance in distributions
- General_Stats_All_Movies: Looks at character distributions by gender and role
- Hedges: Finds hedges in each line and looks for significance in distributions
- POS_Tag_Adj_Analysis: Tags each line with part of speech and analyzes adjective distributions
- Politeness_and_Apology: Finds polite forms and apologies in each line and looks for significance in distributions
- Significance_Tests_Token_Type_TTR: Significance tests on token, type, TTR, and k-band distributions
- Tag_Questions: finds tag questions with regular expressions and looks for significance in distributions
- Tok_Type_TTR_Analysis: Adds TTR and k-bands to the character dataframe and graphs distributions
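
As a rough illustration of two of the measures listed above (token/type counts with TTR, and a regex search for tag questions), here is a minimal sketch. It is not the code from the notebooks: the sample lines, the character names `CHAR_A` and `CHAR_B`, the tokenizer, and the tag-question pattern are all assumptions made for this example, and the real notebooks may tokenize and match differently.

```python
import re
from collections import defaultdict

# Made-up sample rows of (speaker, line); not taken from any real script.
SAMPLE_LINES = [
    ("CHAR_A", "It is lovely, isn't it?"),
    ("CHAR_A", "It is a lovely day."),
    ("CHAR_B", "Stop that!"),
]

# Assumed tokenizer: lowercase alphabetic words, keeping contractions whole.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")

# Assumed tag-question pattern: comma + auxiliary + pronoun + "?".
# This is a deliberately small list and will miss many real tag questions.
TAG_QUESTION_RE = re.compile(
    r",\s*(?:isn't|aren't|wasn't|weren't|don't|doesn't|didn't|won't|can't|"
    r"is|are|was|were|do|does|did|will|can)\s+(?:i|you|he|she|it|we|they)\s*\?",
    re.IGNORECASE,
)


def tokenize(text):
    """Split a line into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def token_type_counts(rows):
    """Return token count, type count, and TTR (types / tokens) per speaker."""
    tokens_by_speaker = defaultdict(list)
    for speaker, line in rows:
        tokens_by_speaker[speaker].extend(tokenize(line))

    stats = {}
    for speaker, tokens in tokens_by_speaker.items():
        n_tokens = len(tokens)
        n_types = len(set(tokens))
        ttr = n_types / n_tokens if n_tokens else 0.0
        stats[speaker] = {"tokens": n_tokens, "types": n_types, "ttr": ttr}
    return stats


def is_tag_question(line):
    """True if the line contains something matching the assumed pattern."""
    return TAG_QUESTION_RE.search(line) is not None


if __name__ == "__main__":
    for speaker, s in token_type_counts(SAMPLE_LINES).items():
        print(speaker, s)
    for speaker, line in SAMPLE_LINES:
        print(speaker, is_tag_question(line), line)
```

Raw TTR falls as a character's total word count grows, so characters with very different amounts of dialogue are hard to compare on TTR alone.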
Code that has been abandoned for one reason or another, but is still important to the project
- command_try_2: the code that eventually led me to my current approach to commands
- interruption: an attempt to look at interruption as a feature to analyze, with poor results
- Questions_Exclamations_Etc: an attempt to look at punctuation, which rested on poor assumptions
- Rough_Streamline: the code that eventually became my streamline for parsing 3 DreamWorks movies
- Shrek_Lines: my very first attempt to approach the DreamWorks scripts
- Topics_by_Gender and Topics_by_Gender-Copy1: both failed attempts to find significant topics of conversation between genders