The aim of this group project is to predict the popularity of Reddit posts. The project follows these steps:
1. Import the required libraries and load the JSON file into a dataframe
2. Perform data cleaning, text preprocessing, and feature engineering
3. Translate non-English post titles using Google Translate, then perform sentiment analysis on the titles
4. Analyze the cleaned data using correlation to derive insights
5. Train a model on the training data and evaluate its performance on the testing data
6. Verify the results against the insights from the correlation analysis
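
The data loading, cleaning, feature engineering, and correlation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the sample records, the field names (`title`, `score`, `num_comments`), and the engineered features are all assumptions, and the translation and sentiment steps are left out because they depend on external services.

```python
import io

import pandas as pd

# Hypothetical sample records; the real JSON file and its field names may differ.
raw_json = io.StringIO("""[
  {"title": "Cute cat picture", "score": 120, "num_comments": 30},
  {"title": "  Ask me anything about data  ", "score": 45, "num_comments": 80},
  {"title": null, "score": 10, "num_comments": 2},
  {"title": "Cute cat picture", "score": 120, "num_comments": 30},
  {"title": "Breaking news", "score": 300, "num_comments": 150}
]""")

# Step 1: load the JSON data into a dataframe.
df = pd.read_json(raw_json)

# Step 2a: basic cleaning (drop rows without a title, drop exact duplicates).
df = df.dropna(subset=["title"]).drop_duplicates().reset_index(drop=True)

# Step 2b: text preprocessing (trim whitespace and lowercase the titles).
df["title"] = df["title"].str.strip().str.lower()

# Step 2c: feature engineering (simple length-based features from the title).
df["title_length"] = df["title"].str.len()
df["title_word_count"] = df["title"].str.split().str.len()

# Step 4: correlation of each numeric column with the assumed popularity
# measure (score), sorted from most positive to most negative.
numeric = df.select_dtypes("number")
correlations = numeric.corr()["score"].sort_values(ascending=False)
print(correlations)
```

With only a handful of rows the correlation values are not meaningful; the point is the shape of the pipeline, which would be applied to the full dataset.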
Reddit Post Popularity.ipynb cleans the data, and Reddit Post Popularity - modeling.ipynb trains and evaluates a model on the cleaned data.
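
As a rough sketch of what the train/evaluate step could look like, the example below fits a model on a training split and scores it on a held-out testing split. Everything here is an assumption for illustration: the data is synthetic, the column names are hypothetical, and linear regression from scikit-learn stands in for whichever model the modeling notebook actually uses.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned data; all column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
cleaned = pd.DataFrame({
    "title_length": rng.integers(10, 120, size=n),
    "num_comments": rng.integers(0, 500, size=n),
})
# Hypothetical target: score driven mostly by comment count, plus noise.
cleaned["score"] = 2.0 * cleaned["num_comments"] + rng.normal(0, 25, size=n)

X = cleaned[["title_length", "num_comments"]]
y = cleaned["score"]

# Hold out 20% of the rows as testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train on the training split only.
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on the testing split, which the model has not seen.
predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
r2 = model.score(X_test, y_test)
print(f"MAE: {mae:.2f}, R^2: {r2:.3f}")
```

Keeping the testing split out of `fit` is what makes the reported metrics an estimate of performance on unseen posts rather than a measure of how well the model memorized the training data.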