Phishing is one of the most serious threats on the internet. Phishing attacks have been around for decades, and the problem grows each year as attackers devise new and creative ways to breach security. This project demonstrates an anti-phishing prediction model. Its main aim is to propose a machine learning algorithm that can detect phishing mails or messages using features present in the mail, such as the length of URLs and the number of dots ('.'), dashes, hashes, and ampersands, for a total of 48 features.
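
To make the idea of URL-based features concrete, here is a minimal sketch that computes five of the counts mentioned above for a single URL. The function name `extract_url_features`, the dictionary keys, and the sample URL are hypothetical, chosen only for illustration; the dataset used in this project is already preprocessed, so its actual column names and extraction logic may differ.

```python
def extract_url_features(url: str) -> dict:
    """Count a few simple lexical features of a URL.

    Illustrative only: the key names are assumptions and do not
    necessarily match the columns of the project's dataset.
    """
    return {
        "url_length": len(url),           # total number of characters
        "num_dots": url.count("."),       # occurrences of '.'
        "num_dashes": url.count("-"),     # occurrences of '-'
        "num_hashes": url.count("#"),     # occurrences of '#'
        "num_ampersands": url.count("&"), # occurrences of '&'
    }


# Hypothetical sample URL, not taken from the dataset.
sample_url = "http://secure-login.example.com/verify?id=1&token=abc#top"
features = extract_url_features(sample_url)
print(features)
```

Each URL becomes one row of numeric values, which is the form a classifier expects as input.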
A preprocessed dataset of phishing and legitimate samples is used to train and test the model. scikit-learn is used for both prediction and feature selection. The machine learning algorithms used are:
- Logistic Regression
- K-Nearest Neighbor Algorithm
- Naïve Bayes Classifier Algorithm
- Random Forest Algorithm
- Result: an anti-phishing prediction model based on machine learning algorithms, with an accuracy of nearly 98%, which is good enough to detect most phishing sites.
- Dataset: `Phishing_Legitimate_full.csv`, a preprocessed dataset of phishing and legitimate samples used for training and testing the model.
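
The comparison of the four algorithms listed above could be sketched roughly as follows with scikit-learn. This is not the project's actual training script: the helper name `compare_models`, the 80/20 split, the hyperparameters, and the use of `GaussianNB` as the Naïve Bayes variant are all assumptions. To keep the sketch self-contained it runs on synthetic data with 48 features from `make_classification`, so the accuracies it prints say nothing about the results reported for the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, random_state=42):
    """Train four classifiers on the same split and return test accuracies."""
    # Assumed 80/20 stratified split; the project's split is not stated.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    models = {
        # Scaling helps logistic regression converge and KNN measure distance.
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "K-Nearest Neighbor": make_pipeline(
            StandardScaler(), KNeighborsClassifier(n_neighbors=5)
        ),
        "Naive Bayes": GaussianNB(),
        "Random Forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in with 48 features; NOT the phishing dataset.
X, y = make_classification(
    n_samples=1000, n_features=48, n_informative=10, random_state=0
)
scores = compare_models(X, y)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

To run the same comparison on the real data, one would load `Phishing_Legitimate_full.csv` (for example with pandas), separate the label column from the feature columns, and pass the two to `compare_models`; which column holds the label depends on the file and is not specified here.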