GitHub - CS-LEE2022/Investigate_TMDb_Movie_Data: Analyze TMDb movie data and unveil the relationships between multiple variables

Investigate TMDb Movie Data

Introduction

The Movie Database (TMDb) is a community-built movie and TV database. Every piece of data has been added by the community, dating back to 2008. TMDb's strong international focus and breadth of data are largely unmatched and something its team is incredibly proud of. Put simply, they live and breathe community, and that is precisely what makes them different.

(Image is from a copyright-free website: https://www.pexels.com/royalty-free-images/.)

This data set contains information about 10,000 movies collected from TMDb, including user ratings and revenue.

Certain columns, like ‘cast’ and ‘genres’, contain multiple values separated by pipe (|) characters;
There are some odd characters in the ‘cast’ column;
The final two columns ending with “_adj” show the budget and revenue of the associated movie in terms of 2010 dollars, accounting for inflation over time.
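
As a minimal sketch of how the pipe-separated columns might be handled, the snippet below parses a tiny in-memory sample with Python's standard `csv` module and splits `cast` and `genres` into lists. The sample rows, the `title` column, and the helper name `split_multi` are illustrative assumptions; only the `cast` and `genres` column names come from the description above, and the notebook itself may take a different approach.

```python
import csv
import io

# Illustrative sample only (hypothetical rows); the real file has many more
# columns and rows. The "title" column name is an assumption.
SAMPLE = """title,cast,genres
Movie A,Actor One|Actor Two,Action|Adventure
Movie B,Actor Three,Drama
Movie C,,
"""


def split_multi(value, sep="|"):
    """Split a pipe-separated field into a list, dropping empty pieces."""
    return [part.strip() for part in value.split(sep) if part.strip()]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    # Replace the raw strings with lists so each value can be counted on its own.
    row["cast"] = split_multi(row["cast"])
    row["genres"] = split_multi(row["genres"])

for row in rows:
    print(row["title"], row["genres"])
```

An empty field becomes an empty list, so movies without a recorded cast or genre do not produce a spurious empty-string entry.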

Table of Contents
Prerequisites 🔍📜
Design 📐
Conclusions 📌
License 🔖

Prerequisites

Python 3.6.3
Jupyter Notebook
Anaconda-Navigator

Design

Step One - Choose Data Set

Download the data set for this project; a copy is stored in this repository as tmdb-movies.csv.

Step Two - Get Organized

The project will eventually contain:

The report communicating any findings;
Any Python code used during the analysis;
The data set.

Step Three - Analyze

Brainstorm some questions that could be answered using the data set, then start answering them. The main focus is on the relationships between multiple variables.
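
One common way to quantify the relationship between two numerical variables is the Pearson correlation coefficient. The sketch below implements it with the standard library only. The function name `pearson` and the sample numbers are made up for illustration; they are not taken from the data set, and this is not necessarily the method used in the notebook.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values standing in for inflation-adjusted budget and revenue.
budget = [10.0, 20.0, 30.0, 40.0]
revenue = [25.0, 45.0, 65.0, 85.0]

r = pearson(budget, revenue)
print(round(r, 3))
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship; it says nothing about causation.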

Conclusions

In the current study, a substantial amount of analysis was carried out. Before each step, detailed instructions were given, and interpretations were provided afterwards. The dataset includes 10,866 film records ranging from 1960 to 2015, covering most mainstream movies. Because the data set is this large, the analysis should be more reliable than a small-scale analysis. The main limitation of the current study was NaN values, which could affect the analysis. Fortunately, all of the NaN values were in categorical columns, so they had limited impact on arithmetic computation.

However, they might matter when a categorical column is compared with a numerical column. The strategy applied in the current study is to keep the NaN values but convert them to 'No record', a string value. Among the 19 questions, only 2 were affected by NaN values, so most of the analysis is highly reliable.
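
The replacement strategy described above can be sketched as follows, using only the standard library. The helper name `fill_missing` and the sample director values are hypothetical; only the 'No record' placeholder comes from the text, and the notebook may implement this step differently.

```python
from collections import Counter

PLACEHOLDER = "No record"


def fill_missing(value, placeholder=PLACEHOLDER):
    """Return the placeholder for None, float NaN, or blank strings."""
    if value is None:
        return placeholder
    if isinstance(value, float) and value != value:  # NaN is never equal to itself
        return placeholder
    if isinstance(value, str) and not value.strip():
        return placeholder
    return value


# Hypothetical categorical column with several kinds of missing values.
directors = ["Director A", None, "", float("nan"), "Director A"]

cleaned = [fill_missing(d) for d in directors]
counts = Counter(cleaned)
print(counts)
```

Keeping the rows and labeling them explicitly means the missing entries show up as their own category in any grouped comparison, rather than being silently dropped.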
