Analysis of IMDB movie data (raw .csv file) using Python and pandas
DataSource : Kaggle - https://www.kaggle.com/datasets/PromptCloudHQ/imdb-data
RawData File : IMDB-Movie-Data.csv
(Data set of the 1,000 most popular movies on IMDB over the 10 years from 2006 to 2016. The data points included are: Title, Genre, Description, Director, Actors, Year, Runtime, Rating, Votes, Revenue, Metascore)
Python-Pandas Source Code : IMDB-DataAnalysis.ipynb
Final Output : IMDB-DataAnalysis.pdf
DataAnalysis For The Following Questions :
- Display Top 10 Rows of The Dataset
- Check Last 10 Rows of The Dataset
- Find Shape of Our Dataset (Number of Rows And Number of Columns)
- Get Information About Our Dataset Like Total Number of Rows, Total Number of Columns, Datatypes of Each Column And Memory Requirement
- Check Missing Values In The Dataset
- Drop All The Missing Values
- Check For Duplicate Data
- Get Overall Statistics About The DataFrame
- Display Titles of The Movies Having Runtime Greater Than or Equal to 180 Minutes
- In Which Year Was The Average Number of Votes Highest?
- In Which Year Was The Average Revenue Highest?
- Find The Average Rating For Each Director
- Display Title And Runtime of The Top 10 Longest Movies
- Display Number of Movies Per Year
- Find Most Popular Movie Title (Highest Revenue)
- Display Top 10 Highest Rated Movie Titles And Their Directors
- Display Top 10 Highest Revenue Movie Titles
- Find Average Rating of Movies Year Wise
- Does Rating Affect The Revenue?
- Classify Movies Based on Ratings [Excellent, Good, and Average]
- Count Number of Action Movies
- Find Unique Values From Genre
- How Many Films of Each Genre Were Made?
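
A few of the questions above can be sketched with pandas as follows. This is a minimal illustration and not the code from IMDB-DataAnalysis.ipynb. The helper names are hypothetical. The column headers `Runtime (Minutes)` and `Revenue (Millions)`, the comma-separated layout of `Genre`, and the rating cut-offs of 7.0 and 6.0 are assumptions that may need adjusting to match the actual CSV and notebook.

```python
import pandas as pd

# Assumed CSV column headers; change these if the file uses different names.
RUNTIME = "Runtime (Minutes)"
REVENUE = "Revenue (Millions)"


def load_movies(path="IMDB-Movie-Data.csv"):
    """Read the raw CSV into a DataFrame."""
    return pd.read_csv(path)


def long_movies(df, minutes=180):
    """Titles of movies whose runtime is at least `minutes`."""
    return df.loc[df[RUNTIME] >= minutes, "Title"].tolist()


def year_with_highest_mean(df, column):
    """Year whose movies have the highest mean of `column` (missing values are skipped)."""
    return df.groupby("Year")[column].mean().idxmax()


def top_n(df, column, n=10, extra=()):
    """Top `n` rows by `column`, showing Title, any extra columns, and `column`."""
    cols = ["Title", *extra, column]
    return df.nlargest(n, column)[cols]


def classify_rating(rating):
    """Map a numeric rating to a label; the 7.0 and 6.0 cut-offs are illustrative assumptions."""
    if rating >= 7.0:
        return "Excellent"
    if rating >= 6.0:
        return "Good"
    return "Average"


def genre_counts(df):
    """Count films per genre, assuming Genre holds comma-separated values like 'Action,Sci-Fi'."""
    return df["Genre"].str.split(",").explode().str.strip().value_counts()
```

For example, assuming the CSV sits in the working directory, `top_n(load_movies(), REVENUE)` would list the ten highest-revenue titles, and `top_n(df, "Rating", extra=("Director",))` would add the director column to the ten highest-rated ones. The simpler checks in the list map onto single pandas calls such as `df.head(10)`, `df.tail(10)`, `df.shape`, `df.info()`, `df.isnull().sum()`, `df.dropna()`, `df.duplicated().any()` and `df.describe()`.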