Analysts at Business Wire estimate that the global film and video market will reach $410.6 billion by 2030. Microsoft is uniquely situated to leverage its existing technology holidings to redefine the film industry by crafting a one stop shop platform which manages the entire process from preproduction to filming to distribution. Microsoft's executives are in search of actionable ways to ensure successful movies are produced as they launch a new movie studio that is well supported from its onset.
Microsoft is uniquely situated to leverage its existing technology holdings to redefine the film industry by crafting a one stop shop platform which manages the entire process from preproduction to filming to distribution. Microsoft's executives are in search of actionable ways to ensure successful movies are produced as they launch a new movie studio that is well supported from its onset.
The project has the following guiding questions:
- When is the best time of year to release a movie?
- Find trends in the profit of movies based on their release month among the most profitable movies.
- Which director makes the most profitable movies?
- Find directors who have proven track record of making films that generate profits worldwide.
- Which genres of movies make the most profit at the box office?
- Determine pattern among successful movie launches including the type and time of year they are released.
The datasets used in this project are from the following sources:
-
- Type: sql database
- Resources used: The following tables:
- movie_basics
- contains information about the following attributes:
- movie_id ← string
- primary_title ← string
- original_title ← string
- start_year ← int
- runtime_minutes ← float
- genres ← string
- contains information about the following attributes:
- movie_ratings
- contains information about the following attributes:
- movie_id ← string
- averagerating ← float
- numvotes ← string - persons
- contains information about the following attributes:
- person_id ← string
- primary_name ← string
- birth_year ← float
- death_year ← float
- primary_profession ← string
- directors
- contains information about the following attributes:
- person_id ←string
- movie_id ← string
- contains information about the following attributes:
- contains information about the following attributes:
- movie_basics
- Resources used: The following tables:
- Type: sql database
-
- Type: csv file
- contains information about the following attributes:
- genre_ids ←string
- id ←int
- original_language ←string
- original_title ←string
- popularity ←float
- release_date ←string
- title ←string
- vote_average ←float
- vote_count ←int
- contains information about the following attributes:
- Type: csv file
-
- Type: csv file
- contains information about the following attributes:
- title ←string
- studio ←string
- domestic_gross ←float
- foreign_gross ←string
- year ←int
- contains information about the following attributes:
- Type: csv file
-
- Type: csv file
- contains information about the following attributes:
- id ←int
- movie ←string
- production_budget ←string
- domestic_gross ←string
- worldwide_gross ←string
- release_year ←int
- release_month ←int
- contains information about the following attributes:
- Type: csv file
In order to determine answers to my guiding questions, first I needed to import relevant libraries and packages.
sqlite3
: a library that provides a SQL interface that allows accessing and manipulating SQL databasepandas
: a data analysis and manipulation library which allows for flexible reading, writing, and reshaping of datanumpy
: a key library that brings the computationaly power of languages like C to Pythonmatplotlib
: a comprehensive visualization libraryseaborn
: a data visualization library based on matplotlib
I used methods like .info()
, .head()
to review data shape and statistics. I alos used .dropna()
to remove missing values from dataframes if that data was less than 1% of the overall data within a column. I replaced values by using a combination of .fillna()
and .median()
replace missing values with the median value. I combined dataframes using .merge()
and .replace()
to ensure that dataframe queries yielded results that I could analyze.
For the first question I looked for correlations between attributes for the most profitable films. I explored data related to this question using visualizations created with seaborn
and matplotlib
.
For the second question I looked at looked for correlations between the director of a movie and the profit for the top 100 most profitable movies. I explored this question by sorting values in a combined data frame using .sort_values() along with visualizations created with seaborn
and matplotlib
For the second question I looked at looked for correlations between the director of a movie and the profit for the top 100 most profitable movies. I explored this question by sorting values in a combined data frame using .sort_values() along with visualizations created with seaborn
and matplotlib
After look at relationships between release month and worldwide profit generation I found that the best months to release a movie are May, June, July, November and December. With May, June and July (Summer months) yielding the largest profits in the top most profitable films. When viewing the top directors of the top 100 most profitable movies, I found that Joe and Anthony Russo directed the most profitable movies out of the top 100 most profitable movies. While Colin Treverrow, James Wan, and Joss Whedon are amongst the top 5 directors to choose from when choosing directors based on their ability to create profitable movies. Finally, after looking for patterns between genre and worldwide profit, I found that while Animation is the best genre of movie to make with Adventure, Action, Sci-Fi and Musicals rounding out the top five best genres to create based on worldwide profit.
Create movies that are __Sci-Fi, Animation or Adventure films with a budget of approximately 215 million dollars. When creating movies, use effective directors, specially choose a director like Joe Russo, Anthony Russo, Colin Treverrow, James Wan or Joss Whedon who have demonstrated successful direction of profitable movies on the worldwide stage. If the film you launch is in the Sci-Fi genre, release it during May. If the film you launch is in the Animation genre, launch it during June or July. If the film you launch is in the Adventure genre, launch it during November. This will allow the new Microsoft studio to diversify their entry into the large video content space as the median worldwide profit for these films was found to be around $200 million dollars.
- Gather and analyze data on the genre of movies each director is known for.
- Use webscraping tools like
beautifulsoup
to find valuable insights from the film ratings and advertising budgets.
Please review my full analysis in my Jupyter notebook or my presentation. You may reach out to Tenicka Norwood at tenicka.norwood@gmail.com if you have additional questions.
. └── movie-studios-viability-project/ ├── README.md Overview for project reviewers ├── data_analysis.ipynb Documentation of Exploratory Data Analysis in Jupyter notebook ├── data_preparation.ipynb Documentation of Data cleaning in Jupyter notebook ├── project_format.ipynb General project format in Jupyter notebook ├── microsoft_movie_studios_viability_analysis.ipynb Documentation of Full Analysis in Jupyter notebook ├── presentation PDF version of Full Analysis shown in a slidedeck ├── notebook PDF version of Full Analysis shown in Jupyter notebook ├── zippedData/ Externally sourced data ├── Images/ Includes images generated from python code and sourced externally └── .gitignore Specifies intentionally untracked files