This is a project for a Machine Learning college course, implemented by a two-person team:
@JakubDralus and @Veczar.
It aims to predict the popularity of an article using a random forest classifier.
Note: we tried various approaches to this problem, as seen in the model.ipynb file.
We later found the random forest classifier to be the right model to use.
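Comparing several approaches before settling on one model can be sketched roughly as follows. This is a minimal illustration only: the candidate list, the synthetic data from `make_classification`, and the 5-fold cross-validation setup are assumptions and are not taken from model.ipynb.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the real dataset (assumption: a binary target).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)

# Hypothetical candidate list; the notebook may have tried other models.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Print the candidates from best to worst.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Scoring every candidate with the same folds and the same metric keeps the comparison fair; the ranking on synthetic data says nothing about the ranking on the real dataset.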
- Python 3.11.5
- Conda 23.10.00
- pandas
- numpy
- scikit-learn (imported as `sklearn`)
- matplotlib
- seaborn
The dataset has 59 features and a target, which is the number of shares on social networks (popularity).
This is how it looks after scaling:
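A scaling step of this kind might look like the sketch below. It assumes standardization with scikit-learn's `StandardScaler`; the scaler actually used in the project is not stated here, and the column names and random values are hypothetical placeholders rather than real dataset columns.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical numeric features on very different scales.
df = pd.DataFrame(
    {
        "word_count": rng.integers(0, 8000, size=200),
        "image_count": rng.integers(0, 50, size=200),
        "sentiment": rng.uniform(-1, 1, size=200),
    }
)

# Standardize each column to zero mean and unit variance.
scaler = StandardScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(df), columns=df.columns, index=df.index
)

# Summary of the scaled columns.
print(scaled.describe().loc[["mean", "std"]].round(2))
```

In a train/test setup the scaler would normally be fitted on the training split only and then applied to the test split, so that test statistics do not leak into training.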
We achieved an accuracy of 0.67 using a random forest classifier, which the dataset description recommends as the best model.
Accuracy: 0.67
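
| class        | precision | recall | f1-score | support |
|--------------|-----------|--------|----------|---------|
| 0            | 0.66      | 0.60   | 0.63     | 5591    |
| 1            | 0.67      | 0.73   | 0.70     | 6303    |
| accuracy     |           |        | 0.67     | 11894   |
| macro avg    | 0.67      | 0.67   | 0.67     | 11894   |
| weighted avg | 0.67      | 0.67   | 0.67     | 11894   |
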
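A report of this shape can be produced with scikit-learn's `classification_report`. The sketch below is a minimal illustration on synthetic data, not the project's actual pipeline: the median threshold used to turn share counts into two classes, the 30% test split, the hyperparameters, and the feature names `f0` to `f4` are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000

# Hypothetical features and synthetic share counts loosely driven by them.
X = pd.DataFrame(rng.normal(size=(n, 5)), columns=[f"f{i}" for i in range(5)])
shares = np.exp(7 + 0.8 * X["f0"] - 0.5 * X["f1"] + rng.normal(scale=1.0, size=n))

# Assumption: label an article popular (1) if its shares exceed the median.
threshold = np.median(shares)
y = (shares > threshold).astype(int)

# Assumption: hold out 30% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the random forest and evaluate it on the held-out rows.
clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(classification_report(y_test, y_pred))
```

The numbers printed by this sketch come from synthetic data and are not comparable to the 0.67 reported above.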
https://archive.ics.uci.edu/dataset/332/online+news+popularity
K. Fernandes, P. Vinagre and P. Cortez. A Proactive Intelligent Decision Support System for Predicting the Popularity of Online News. Proceedings of the 17th EPIA 2015 - Portuguese Conference on Artificial Intelligence, September, Coimbra, Portugal.