Authors: Jonathan Fetterolf, Matthew Duncan, Nate Kist, & Roshni Janakiraman
We analyzed multiple film databases to determine what factors make a movie successful. Descriptive analyses of movie characteristics and box office data show that Animation is the most profitable movie genre, high-budget films provide the strongest return on investment (ROI), and the highest-rated films tend to be 85 - 100 minutes long.
Microsoft may be able to improve its likelihood of producing box office successes by investing in films that share characteristics with recent successful releases. The following questions guided our analyses:
- What genre of movie is most profitable?
- What type of budget should be allocated for production?
- What is the ideal movie length?
Data for this analysis is from three online movie databases.
TMDB is a user-built database of movie information and user ratings. The current dataset includes 26,517 rows and 9 columns. The target data includes release date and genre, where the genre codes are ordered by relevance.
The TN dataset contains box office information for 5,782 movies. The target data includes release date, production budget, and worldwide gross revenue.
IMDb is an online database of movie information, statistics, and user ratings. The IMDb dataset comprises multiple tables. For the current analysis, we used two tables: one with basic movie data and one with user ratings. The target data includes release date, runtime (in minutes), and average rating.
This project uses descriptive analytics to describe trends in the features of successful movies.
For every table, we removed unnecessary columns and cleaned and filtered the remaining data. To keep the data relevant to Microsoft's business question, we limited it to English-language movies released between 2010 and 2019. All monetary columns were scaled to millions of dollars (MM).
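The filtering and scaling steps described above can be sketched with pandas roughly as follows. This is a minimal illustration on made-up rows, not the notebook's actual code: the column names (`release_date`, `original_language`, `production_budget`), the dollar-string format, and the helper `clean_movies` are all assumptions.

```python
import pandas as pd

# Toy rows standing in for a raw table; names and values are hypothetical.
raw = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film D"],
    "release_date": ["2009-12-18", "2012-05-04", "2015-06-12", "2019-11-22"],
    "original_language": ["en", "en", "fr", "en"],
    "production_budget": ["$425,000,000", "$225,000,000", "$20,000,000", "$150,000,000"],
})


def clean_movies(df: pd.DataFrame) -> pd.DataFrame:
    """Keep English-language 2010-2019 releases and express budgets in $MM."""
    out = df.copy()
    out["release_date"] = pd.to_datetime(out["release_date"])
    year = out["release_date"].dt.year
    # between() is inclusive on both ends, so 2010 and 2019 are both kept.
    keep = year.between(2010, 2019) & (out["original_language"] == "en")
    out = out.loc[keep].copy()
    # Strip "$" and "," before converting, then scale dollars to millions.
    dollars = out["production_budget"].str.replace(r"[$,]", "", regex=True).astype(float)
    out["production_budget_mm"] = dollars / 1_000_000
    return out.drop(columns=["production_budget"]).reset_index(drop=True)


cleaned = clean_movies(raw)
```

In this toy example, Film A is dropped for its 2009 release date and Film C for its language, leaving two rows with budgets expressed in MM.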
Question 1: We merged the TN and TMDB datasets. TMDB classifies the movies in its database using 18 primary genres. We used a bar chart to compare the average net profit of each genre, limiting our findings to the 10 most profitable genres.
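One way the merge and per-genre average could look in pandas is sketched below. Several details are assumptions made for illustration: the tables are joined on `title` alone, net profit is taken to be worldwide gross minus production budget, the first genre code is treated as the primary genre, and all rows are made up. `GENRE_NAMES` holds only a small subset of TMDB's genre codes.

```python
import pandas as pd

# Toy stand-ins for the cleaned tables; names and values are hypothetical.
tn = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film D"],
    "production_budget_mm": [100.0, 50.0, 80.0, 10.0],
    "worldwide_gross_mm": [500.0, 150.0, 280.0, 15.0],
})
tmdb = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film E"],
    "genre_ids": [[16, 10751], [28, 12], [16, 35], [18]],
})
GENRE_NAMES = {16: "Animation", 28: "Action", 18: "Drama"}

# Genre codes are ordered by relevance, so take the first one as primary.
tmdb["primary_genre"] = tmdb["genre_ids"].map(lambda ids: GENRE_NAMES[ids[0]])

# Inner join keeps only films that appear in both tables.
merged = tn.merge(tmdb[["title", "primary_genre"]], on="title", how="inner")
merged["net_profit_mm"] = merged["worldwide_gross_mm"] - merged["production_budget_mm"]

# Average net profit per genre, most profitable first, capped at ten genres.
top_genres = (
    merged.groupby("primary_genre")["net_profit_mm"]
    .mean()
    .sort_values(ascending=False)
    .head(10)
)
```

A call such as `top_genres.plot(kind="bar")` would then produce the kind of bar chart described, provided matplotlib is installed.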
Question 2: Using the TN dataset, we calculated two new variables of interest: Net Profit and Return on Investment (ROI). The main analysis used a bar chart to compare the median ROI of films based on production budget. Based on definitions used by Hollywood market researchers, we grouped our data into 3 budget categories:
- Low (less than $20 MM)
- Medium ($20 MM - $100 MM)
- High (greater than $100 MM)
We then compared the median ROI of each budget group to determine which group provided the best value for its cost. We used the median because each budget group contains extreme outliers that could misrepresent how a 'typical' movie would fare.
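The budget grouping and the reason for preferring the median can be illustrated with a short sketch. The ROI formula used here (net profit divided by production budget, as a percentage) is an assumed definition, the helper `budget_group` and the column names are hypothetical, and the figures are made up.

```python
import pandas as pd

# Hypothetical films with budgets and grosses in $MM.
films = pd.DataFrame({
    "production_budget_mm": [5.0, 10.0, 15.0, 40.0, 60.0, 150.0, 200.0, 250.0],
    "worldwide_gross_mm": [5.0, 100.0, 15.0, 60.0, 180.0, 300.0, 500.0, 1000.0],
})

films["net_profit_mm"] = films["worldwide_gross_mm"] - films["production_budget_mm"]
# Assumed definition: ROI (%) = net profit / production budget * 100.
films["roi_pct"] = films["net_profit_mm"] / films["production_budget_mm"] * 100


def budget_group(budget_mm: float) -> str:
    """Low: under $20 MM; Medium: $20 MM to $100 MM; High: over $100 MM."""
    if budget_mm < 20:
        return "Low"
    if budget_mm <= 100:
        return "Medium"
    return "High"


films["budget_group"] = films["production_budget_mm"].map(budget_group)

grouped = films.groupby("budget_group")["roi_pct"]
median_roi = grouped.median()
mean_roi = grouped.mean()
```

In this toy data, a single runaway hit lifts the mean ROI of the Low group to 300% while its median stays at 0%, which is the kind of distortion the median is meant to avoid.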
Question 3: Using the IMDb dataset, we focused on the 'typical' runtime of highly rated films. We narrowed the dataset to include only the highest-rated movies on IMDb (average user rating of 8.0 or greater).
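The rating filter and runtime comparison could be sketched as below. The 15-minute bin width, the bin edges, and the column names (`runtime_minutes`, `averagerating`) are assumptions chosen for illustration, and the rows are made up.

```python
import pandas as pd

# Hypothetical rows standing in for the joined IMDb tables.
imdb = pd.DataFrame({
    "runtime_minutes": [88, 92, 95, 110, 130, 90, 99, 142],
    "averagerating": [8.1, 8.4, 7.2, 8.0, 8.6, 8.3, 6.5, 7.9],
})

# Keep only the highest-rated films (8.0 or greater).
top_rated = imdb[imdb["averagerating"] >= 8.0]

# Left-closed 15-minute bins: [40, 55), [55, 70), ..., [190, 205).
bins = list(range(40, 206, 15))
runtime_bin = pd.cut(top_rated["runtime_minutes"], bins=bins, right=False)

# Count films per bin and find the bin that holds the most of them.
counts = runtime_bin.value_counts().sort_index()
most_common = counts.idxmax()
```

Here `most_common` is a pandas `Interval`, so its `left` and `right` attributes give the bounds of the most populated runtime range.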
- Over the 2010s, the Animation genre had the highest average yearly profit ($313 MM per year).
- Family films, a related genre, were the second most profitable genre of the 2010s ($292 MM per year).
- High-budget films provide the strongest ROI, with the typical high-budget film yielding a 200% ROI.
- Among the highest-rated movies (ratings of 8.0 or greater), more movies fell in the 85 - 100 minute range than in any other runtime range.
  - This may indicate a viewer preference for movies of this length.
This analysis leads to three recommendations for Microsoft's entry in the film industry:
- Produce movies within the Animation and Family genres to maximize net profit
  - Animation and Family movies had the highest average yearly net profits over the 2010s.
- Invest in high-budget films
  - High-budget films have the greatest potential for maximum returns
  - Microsoft should plan to invest at least $200 MM per film
- Make movies with a runtime near 90 minutes
  - Among highly rated movies, more movies fell in the 85 - 100 minute range than in any other runtime range. This may indicate a viewer preference for movies of this length.
Further analyses could yield additional insights to improve recommendations for Microsoft's studio debut:
- Our data ends in 2019. Including more recent data would give a more accurate picture of the film industry, especially given that cinema attendance fell drastically during the COVID-19 pandemic but is now trending toward recovery.
- It could also be useful to examine which types of movies did well in spite of the pandemic: even with the barriers of COVID, what factors were compelling enough to draw people to theaters?
- Question 1: The number of movies in each genre is not taken into consideration. If a genre contains only a few movies, its average could be skewed high or low.
- Question 2: While we can show that high-budget films generally earn more profit, we do not know whether audiences actually enjoyed them.
- Question 3: Runtimes for movies with an average rating below 8.0 are not taken into consideration. We cannot conclude that a 90-minute runtime contributes to a higher rating; we can only conclude that more highly rated films fall in this runtime range than in any other.
See the full analysis in the Jupyter Notebook or review this presentation.
For additional info, contact:
- Jonathan Fetterolf: jonathan.fetterolf@gmail.com
- Matthew Duncan: mduncan0923@gmail.com
- Nate Kist: natekist@outlook.com
- Roshni Janakiraman: roshnij618@gmail.com
├── Scratch_Notebooks
│ ├── matt-prelim.ipynb
│ ├── nate-prelim.ipynb
│ ├── joining-df.ipynb
│ ├── jon-prelim.ipynb
│ └── roshni-prelim.ipynb
├── images
│ ├── cinema.jpeg
│ ├── director_shot.jpeg
│ ├── figure1.png
│ ├── figure2.png
│ ├── figure3.png
│ └── jheader.png
├── zippedData
│ ├── bom.movie_gross.csv.gz
│ ├── im.db.zip
│ ├── rt.movie_info.tsv.gz
│ ├── rt.reviews.tsv.gz
│ ├── tmdb.movies.csv.gz
│ └── tn.movie_budgets.csv.gz
├── Movie_industry_analysis_notebook.ipynb
├── presentation.pdf
└── README.md