I joined the Verzeo team as an intern from June to August 2020.
I built a minor project in which I analysed and cleaned the data, detected outliers, and performed Exploratory Data Analysis (EDA) on the given dataset. I also wrote code to answer the following questions. The answers are in "Mini Project Answers.pdf".
1) Which movies have the third-lowest and third-highest budgets?
2) What is the average number of words in the titles of movies released between 2000 and 2005?
3) What is the most common genre for Vin Diesel and Emma Watson movies?
4) Which movies earned the most and the least revenue?
5) What is the average runtime of movies released in 2006?
6) Name any three production companies that have invested in the lowest-revenue movies.
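Here is a minimal plain-Python sketch of how the first two questions could be answered. It assumes each movie is a record with hypothetical `title`, `year` and `budget` fields. The toy rows are invented for illustration and do not come from the project's dataset. The notebook itself may use pandas and different column names.

```python
from statistics import mean

# Hypothetical toy records; field names and values are assumptions for illustration.
movies = [
    {"title": "Alpha", "year": 2001, "budget": 50},
    {"title": "Beta Gamma", "year": 2003, "budget": 10},
    {"title": "Delta Epsilon Zeta", "year": 2007, "budget": 90},
    {"title": "Eta", "year": 2005, "budget": 30},
    {"title": "Theta Iota", "year": 1999, "budget": 70},
    {"title": "Kappa", "year": 2002, "budget": 60},
]

def third_lowest_and_highest_budget(records):
    """Return the titles of the third-lowest and third-highest budget movies.

    Needs at least three records; ties keep the original record order
    because sorted() is stable.
    """
    by_budget = sorted(records, key=lambda m: m["budget"])
    return by_budget[2]["title"], by_budget[-3]["title"]

def avg_title_words(records, start, end):
    """Average number of whitespace-separated words in titles from start..end.

    The year range is treated as inclusive, which is an assumption.
    """
    counts = [len(m["title"].split()) for m in records if start <= m["year"] <= end]
    return mean(counts)

print(third_lowest_and_highest_budget(movies))  # → ('Alpha', 'Kappa')
print(avg_title_words(movies, 2000, 2005))      # → 1.25
```

The remaining questions follow the same filter-then-aggregate pattern. For example, `max` and `min` with `key=lambda m: m["revenue"]` would answer question 4 if the records had a revenue field.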
For more details about the project, please refer to "Mini Project.ipynb".
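This README does not specify how the minor project detected outliers. One common choice is the interquartile-range (IQR) rule, sketched below with the standard library. The 1.5 multiplier is the conventional default and an assumption here, and the budget values are made up.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Quartiles come from statistics.quantiles with its default 'exclusive'
    method; other tools (e.g. pandas) may interpolate quartiles differently.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical budgets (in millions); 900 is an obvious outlier.
budgets = [12, 15, 18, 20, 22, 25, 30, 900]
print(iqr_outliers(budgets))  # → [900]
```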
The major project assigned in this internship required me to apply Multinomial Naive Bayes, K-Nearest Neighbors (KNN) and Random Forest models to a given dataset and to determine which classification algorithm performs best in terms of accuracy.
I worked on the Information.csv dataset for this project and performed Exploratory Data Analysis on the dataset provided by the Verzeo team. I also applied ensemble learning, building a model that combines the three classification algorithms, which achieved an accuracy of 61%. Based on these observations, the project was completed successfully.
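The model comparison and voting ensemble described above can be sketched without any ML library. First, compute each model's accuracy against the same test labels. Then combine the models' predictions by hard (majority) voting, which is the majority-rule idea behind scikit-learn's `VotingClassifier` with `voting="hard"`, although the tie-breaking here differs. The labels and per-model predictions are hypothetical placeholders, not the project's actual outputs.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def hard_vote(*model_predictions):
    """Majority vote across models, given one prediction list per model.

    On a tie, the label from the earliest model listed wins, because Counter
    keeps insertion order and most_common(1) returns the first maximum.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*model_predictions)]

# Hypothetical test labels and per-model predictions (placeholders only).
y_true = ["male", "female", "female", "male", "female", "male"]
preds = {
    "MultinomialNB": ["male", "female", "male", "male", "female", "female"],
    "KNN": ["female", "female", "female", "male", "male", "male"],
    "RandomForest": ["male", "male", "female", "male", "female", "male"],
}

scores = {name: accuracy(y_true, p) for name, p in preds.items()}
best = max(scores, key=scores.get)  # model with the highest accuracy
ensemble = hard_vote(*preds.values())
print(scores, best, accuracy(y_true, ensemble))
```

In scikit-learn, you would usually pass the fitted estimators to `sklearn.ensemble.VotingClassifier` instead of combining prediction lists by hand.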
- Exploratory Data Analysis
- Cleaning the Data
- Data Visualization
- Normalizing the text
- Feature Engineering
- Classification algorithms:
  - Multinomial Naive Bayes
  - K-Nearest Neighbors (KNN)
  - Random Forest Classifier (RFC)
- Ensemble learning: Voting Classifier
- Answered the following questions:
  - Q1) What are the most common emotions/words used by males and by females?
  - Q2) At what time are most tweets created by males and by females?
The answers to both questions can be found in "Major Project Summary.pdf".
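This README does not spell out the "Normalizing the text" and "Feature Engineering" steps. Below is a minimal sketch of one common approach: lowercase each tweet, strip URLs and @mentions, drop stop words, and turn the remaining tokens into bag-of-words count vectors. Count vectors like these are the usual input to Multinomial Naive Bayes. The stop-word list and the sample tweets are assumptions for illustration. In practice, scikit-learn's `CountVectorizer` does the vectorizing with more options.

```python
import re
from collections import Counter

# Small illustrative stop-word list; an assumption, not the project's actual list.
STOP_WORDS = {"a", "an", "and", "the", "is", "to", "of", "in", "on", "i", "my", "it"}

def normalize(text):
    """Lowercase a tweet, drop URLs and @mentions, and remove stop words.

    Tokens are runs of letters, optionally with one internal apostrophe
    (e.g. "don't"); digits and punctuation are discarded.
    """
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)
    return [t for t in tokens if t not in STOP_WORDS]

def bag_of_words(docs):
    """Map token lists to count vectors over a shared, alphabetically sorted vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, n in Counter(doc).items():
            row[index[tok]] = n
        vectors.append(row)
    return vocab, vectors

# Hypothetical tweets for illustration.
tweets = ["I LOVE my new phone!! https://t.co/xyz", "@friend love the new song"]
docs = [normalize(t) for t in tweets]
vocab, X = bag_of_words(docs)
print(docs)      # → [['love', 'new', 'phone'], ['love', 'new', 'song']]
print(vocab, X)  # → ['love', 'new', 'phone', 'song'] [[1, 1, 1, 0], [1, 1, 0, 1]]
```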
The project now classifies the common emotions/words used, and the time when most tweets are created, by gender (males and females). This is my first machine learning project.
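Both questions reduce to counting grouped by gender. Below is a minimal sketch that assumes hypothetical tweet records with `gender`, `text` and ISO-8601 `created` fields; the real dataset's column names and timestamp format may differ. For brevity, it uses simple whitespace tokenization in place of the project's text normalization.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical tweet records; field names, values and timestamp format are assumptions.
tweets = [
    {"gender": "male", "text": "great game tonight", "created": "2020-01-01 12:40:00"},
    {"gender": "male", "text": "game day", "created": "2020-01-01 12:05:00"},
    {"gender": "female", "text": "happy happy day", "created": "2020-01-01 13:15:00"},
    {"gender": "female", "text": "so happy", "created": "2020-01-01 13:50:00"},
    {"gender": "female", "text": "good morning", "created": "2020-01-01 09:10:00"},
]

def summarize(records):
    """Return {gender: (most common word, hour of day with the most tweets)}."""
    words = defaultdict(Counter)
    hours = defaultdict(Counter)
    for r in records:
        words[r["gender"]].update(r["text"].lower().split())
        hours[r["gender"]][datetime.fromisoformat(r["created"]).hour] += 1
    return {
        g: (words[g].most_common(1)[0][0], hours[g].most_common(1)[0][0])
        for g in sorted(words)
    }

print(summarize(tweets))  # → {'female': ('happy', 13), 'male': ('game', 12)}
```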
For more details about the project, please refer to "Major Project.ipynb".
I am glad to share this on GitHub as my contribution to open source.