Identify hosts with a substantial amount of fraudulent traffic using the impression logs for selected IP addresses.
- python 2.7
- sklearn 0.19 dev version, or a version with the following fix
- numpy
- pandas
- Install the requirements for the project, then run
jupyter notebook Fraud_Prediction.ipynb
to open the Jupyter notebook
- A one-class SVM is used to predict fraudulent traffic (labeled +1).
- Using bucketed time_stamps as features increases the test accuracy by >20%, which is a good indicator that fraudulent activity is well clustered in time. This observation could be extended to find botnets.
- Oversampling with SMOTE was tried to balance the highly imbalanced dataset, but it did not produce better results and was therefore omitted.
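
The idea of bucketing time_stamps and feeding them to a one-class SVM can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names `ip` and `time_stamp`, the helper `bucket_count_features`, the 60-minute bucket width, the toy log, and the `nu`/`gamma` values are all assumptions. It also assumes the model is fit on hosts already known to be fraudulent, so that inliers (+1) correspond to fraud.

```python
import pandas as pd
from sklearn.svm import OneClassSVM


def bucket_count_features(logs, freq="60min"):
    """Return one row per IP with the impression count in each time bucket.

    `logs` is assumed to have the (hypothetical) columns `ip` and `time_stamp`.
    """
    # Round every timestamp down to the start of its bucket.
    buckets = pd.to_datetime(logs["time_stamp"]).dt.floor(freq)
    counts = (
        logs.assign(bucket=buckets.values)
        .groupby(["ip", "bucket"])
        .size()
        .unstack(fill_value=0)
    )
    return counts


# Toy impression log (made up for illustration only).
logs = pd.DataFrame({
    "ip": ["a", "a", "a", "b", "b", "c"],
    "time_stamp": [
        "2017-01-01 00:05", "2017-01-01 00:40", "2017-01-01 01:10",
        "2017-01-01 00:15", "2017-01-01 00:55", "2017-01-01 13:30",
    ],
})

features = bucket_count_features(logs)

# Assumption: hosts "a" and "b" are known to be fraudulent, so the model
# learns what fraudulent traffic looks like and labels similar hosts +1.
known_fraud = features.loc[["a", "b"]].values

model = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.1)
model.fit(known_fraud)

# +1 = resembles the training (fraudulent) hosts, -1 = does not.
predictions = model.predict(features.values)
print(predictions)
```

Counting impressions per bucket is only one way to turn time_stamps into features; the bucket index itself, or the hour of day, could be used instead. The bucket width is worth tuning, since buckets that are too wide hide the clustering in time.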