SI 618 final project
💻Data Analysis: Spark SQL, PySpark, Hadoop
🎨Data visualization: Python Altair
It’s no secret that the movie business has an inherent gender bias. Female actors usually earn less than male actors, there are fewer female protagonists, and female characters usually lack the development and depth of their male counterparts. How to quantify this bias is a topic of ongoing interest. Inspired by a FiveThirtyEight article on the relationship between female prominence in movies (as evaluated by the Bechdel test) and movie budget and box office, I decided to explore the topic further in this project. Specifically, I wanted to know what kinds of movies pass the Bechdel test. To that end, I examined the relationship between passing the Bechdel test and a set of movie characteristics: release decade, country of production, genre, crew gender, IMDb rating, budget, domestic and international box office, and return on investment (ROI).
* Bechdel movie dataset from BechdelTest.com
* Boxofficemojo dataset from Kaggle
* IMDb movies extensive dataset from Kaggle
Some of the findings:
💡 For more analyses, including details of data preprocessing and manipulation as well as additional visualizations, please refer to the final report.
1. Overall, the percentage of movies passing the Bechdel test (represented by the green bars) has increased over the decades.
2. Overall, the higher the IMDb rating, the lower the percentage of movies passing the Bechdel test.
Another interesting finding is that this negative relationship between IMDb rating and the percentage passing the Bechdel test doesn’t differ between male and female voters.
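
Both findings come down to the same grouped calculation: the share of movies in each group that pass the test. The sketch below illustrates that idea in plain Python and is not the project's actual Spark SQL / PySpark code. The field names (`year`, `imdb_rating`, `passes_bechdel`), the helper functions, and the whole-number rating bins are assumptions made for illustration, and the sample rows are made up rather than taken from the datasets.

```python
from collections import defaultdict


def pass_rate_by_group(movies, key):
    """Return {group: percentage of movies passing the Bechdel test}."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for movie in movies:
        group = key(movie)
        totals[group] += 1
        if movie["passes_bechdel"]:
            passed[group] += 1
    # Sort the groups so the result reads in chronological / rating order
    return {g: 100.0 * passed[g] / totals[g] for g in sorted(totals)}


def decade(movie):
    """Map a release year to its decade, e.g. 1997 -> 1990."""
    return movie["year"] // 10 * 10


def rating_bin(movie):
    """Map an IMDb rating to a whole-number bin, e.g. 6.9 -> 6 (assumed binning)."""
    return int(movie["imdb_rating"])


# Made-up rows for illustration only; NOT drawn from the project's datasets
sample_movies = [
    {"year": 1975, "imdb_rating": 8.1, "passes_bechdel": False},
    {"year": 1979, "imdb_rating": 6.4, "passes_bechdel": True},
    {"year": 1994, "imdb_rating": 8.8, "passes_bechdel": False},
    {"year": 1997, "imdb_rating": 6.9, "passes_bechdel": True},
    {"year": 2012, "imdb_rating": 7.2, "passes_bechdel": True},
    {"year": 2015, "imdb_rating": 6.1, "passes_bechdel": True},
]

by_decade = pass_rate_by_group(sample_movies, decade)
by_rating = pass_rate_by_group(sample_movies, rating_bin)
print(by_decade)
print(by_rating)
```

The same function handles both groupings because only the key changes. In a Spark setting the equivalent would likely be a group-by on the derived column followed by an average of the pass flag, but that is an assumption about the pipeline rather than a description of it.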