This project builds a machine learning system that predicts whether a website is a phishing site. It aims to strengthen cybersecurity by identifying malicious websites so they can be blocked. The project combines machine learning techniques with MLOps tools to create a scalable, automated, and efficient solution.
- 🎯 Objective: Predict phishing websites using machine learning to improve cybersecurity.
- 🛠️ Key Technologies: Python, Docker, AWS, MongoDB, FastAPI, GitHub Actions, MLflow, Airflow, Terraform.
- 🔍 Machine Learning Model: A model trained to detect phishing websites based on various indicators.
- 🔄 MLOps Pipeline: Automated deployment pipeline using GitHub Actions, MLflow, Airflow, and Terraform.
- ⚙️ Scalable Deployment: The system is deployed on AWS using Docker and MongoDB, ensuring scalability and efficient management.
- 🌐 API Integration: FastAPI is used to build and deploy APIs for interacting with the model.
- 💾 Database Integration: MongoDB is used for storing and managing website data during model training and deployment.
- 🐍 Python 3.8+
- 🐳 Docker
- ☁️ AWS account with necessary permissions
- 🍃 MongoDB
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- Ensure your AWS CLI is configured with the correct credentials.
- Update the `terraform/variables.tf` file with your AWS settings.
- 🐳 Build Docker Image:
docker build -t phishing-detection:v1 .
- 📤 Push to AWS ECR: Follow AWS ECR instructions to push your Docker image.
- 🛠️ Run Terraform:
cd terraform
terraform init
terraform apply
- 📊 Training the Model:
  - Prepare your dataset and upload it to the specified MongoDB collection.
- 🔍 Prediction:
  - After deployment, the system will automatically predict whether new websites are phishing and store the results in MongoDB.
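
The dataset upload step above could look roughly like the sketch below. It is only an illustration: the connection URI, the database and collection names (`phishing_db`, `website_data`), the CSV file name, and the column layout are all assumptions, not values taken from this repository. The MongoDB part needs `pymongo`, which is imported lazily so the parsing helper can be used without it.

```python
import csv
import io


def rows_to_documents(csv_text):
    """Parse CSV text into a list of dicts ready to insert into MongoDB.

    Values that look like integers are converted to int so feature columns
    are stored as numbers rather than strings; everything else is kept as-is.
    """
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = {}
        for key, value in row.items():
            try:
                doc[key] = int(value)
            except (TypeError, ValueError):
                doc[key] = value
        documents.append(doc)
    return documents


def upload(collection, documents):
    """Insert documents into any object exposing pymongo's insert_many().

    Returns the number of inserted documents. An empty list is skipped,
    because pymongo rejects an empty insert_many() call.
    """
    if not documents:
        return 0
    result = collection.insert_many(documents)
    return len(result.inserted_ids)


def upload_csv_to_mongo(
    csv_path="phishing_dataset.csv",      # hypothetical file name
    uri="mongodb://localhost:27017",      # hypothetical connection URI
    db_name="phishing_db",                # hypothetical database name
    collection_name="website_data",       # hypothetical collection name
):
    """Read a CSV file and upload its rows to a MongoDB collection."""
    from pymongo import MongoClient  # requires `pip install pymongo`

    with open(csv_path, newline="") as handle:
        documents = rows_to_documents(handle.read())
    collection = MongoClient(uri)[db_name][collection_name]
    return upload(collection, documents)
```

Calling `upload_csv_to_mongo()` with your own path, URI, and names performs the upload. Passing the collection into `upload()` keeps the parsing logic testable without a running MongoDB instance.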
- Interact with the model via APIs built with FastAPI.
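
As an illustration of calling the API from Python, here is a minimal client sketch using only the standard library. The `/predict` path, the `{"features": ...}` request body, and the feature names are assumptions made for this example; check the FastAPI app in this repository (or its auto-generated `/docs` page) for the actual routes and schema.

```python
import json
import urllib.request


def build_predict_request(base_url, features):
    """Build a JSON POST request for a hypothetical /predict endpoint."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",  # assumed route name
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, features, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running locally, a call might look like `predict("http://localhost:8000", {"url_length": 54, "has_ip": 1})`, where the feature names are placeholders.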
- The model's performance metrics, such as accuracy, precision, and recall, are tracked using MLflow.
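
To make the tracked metrics concrete, the sketch below computes accuracy, precision, and recall for binary labels and logs them to MLflow. It assumes `1` means phishing and `0` means legitimate, and the run name `phishing-eval` is a placeholder; the project's actual evaluation code may differ. Logging requires `mlflow`, which is imported lazily.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary labels.

    Assumes 1 = phishing (positive class) and 0 = legitimate.
    Ratios with a zero denominator are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def log_to_mlflow(metrics, run_name="phishing-eval"):
    """Log a metrics dict to MLflow under a new run (run name is a placeholder)."""
    import mlflow  # requires `pip install mlflow`

    with mlflow.start_run(run_name=run_name):
        mlflow.log_metrics(metrics)
```

For a phishing detector, recall is usually the metric to watch most closely, since a missed phishing site (false negative) tends to cost more than a false alarm.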
If you'd like to contribute to this project, please fork the repository and use a feature branch. Pull requests are welcome.
For any questions or feedback, please reach out to Krishna Shukla.