Welcome to the Daraz Price Comparator repository! This project automates price search and comparison on Daraz across Pakistan, Nepal, Sri Lanka, and Bangladesh. Powered by Python, Requests, BeautifulSoup, and Airflow, it simplifies the task of finding budget-friendly deals. Shop smartly with this efficient tool!
- Dynamic E-commerce Search: After the user enters a product name, the tool initiates a product search on Daraz.
- Automated Price Comparison: The scraper automates product searches and compares prices across multiple listings.
- Airflow Workflow: Integrated with Apache Airflow for scheduled and automated execution.
- Data Storage: Extracted product details and prices are stored in a structured format.
This repository houses the code and configuration for the Daraz Price Comparator project. The workflow encompasses initiating a search on Daraz across Pakistan, Nepal, Sri Lanka, and Bangladesh. It extracts details from the first page and efficiently compares prices. The entire process is seamlessly orchestrated using Airflow for automation, ensuring a hassle-free shopping experience.
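The per-country search described above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `DARAZ_DOMAINS` mapping, the `/catalog/?q=` URL pattern, and the `build_search_urls` helper are all assumptions made for the example.

```python
from urllib.parse import urlencode

# Assumed country-to-domain mapping; verify against the live sites before use.
DARAZ_DOMAINS = {
    "Pakistan": "www.daraz.pk",
    "Nepal": "www.daraz.com.np",
    "Sri Lanka": "www.daraz.lk",
    "Bangladesh": "www.daraz.com.bd",
}


def build_search_urls(product_name: str) -> dict[str, str]:
    """Return one first-page search URL per country (hypothetical helper)."""
    # urlencode escapes spaces and special characters in the product name.
    query = urlencode({"q": product_name})
    return {
        country: f"https://{domain}/catalog/?{query}"
        for country, domain in DARAZ_DOMAINS.items()
    }


if __name__ == "__main__":
    for country, url in build_search_urls("wireless mouse").items():
        print(country, url)
```

Each URL could then be fetched with Requests and parsed with BeautifulSoup, as the workflow describes.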
- The user enters the desired product name, which initiates a product search on Daraz (first page only).
- Python scripts use Requests and BeautifulSoup to scrape product details and prices from both platforms.
- The extracted data is compared to identify the product with the lowest price.
- Data from both platforms is compared after converting currencies and other relevant parameters, and the results are stored in MongoDB.
- Airflow is configured to automate the entire process at scheduled intervals.
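The comparison and currency-conversion steps above might look something like the sketch below. It is only an illustration under stated assumptions: the `Listing` class, the `RATES_TO_USD` table (placeholder rates, not real exchange rates), and the helper functions are hypothetical and not taken from the project's code.

```python
from dataclasses import dataclass

# Placeholder exchange rates to USD, for illustration only.
RATES_TO_USD = {"PKR": 0.0036, "NPR": 0.0075, "LKR": 0.0033, "BDT": 0.0085}


@dataclass
class Listing:
    title: str
    price: float    # price in the local currency
    currency: str   # ISO code, e.g. "PKR"
    country: str


def to_usd(listing: Listing) -> float:
    """Convert a listing's local price to USD using the placeholder rates."""
    return round(listing.price * RATES_TO_USD[listing.currency], 2)


def cheapest(listings: list[Listing]) -> Listing:
    """Return the listing with the lowest price after conversion."""
    return min(listings, key=to_usd)


def to_document(listing: Listing) -> dict:
    """Shape a listing as a plain dict, e.g. for insertion into MongoDB."""
    return {
        "title": listing.title,
        "country": listing.country,
        "price_local": listing.price,
        "currency": listing.currency,
        "price_usd": to_usd(listing),
    }


if __name__ == "__main__":
    sample = [
        Listing("Phone X", 50000, "PKR", "Pakistan"),
        Listing("Phone X", 20000, "NPR", "Nepal"),
        Listing("Phone X", 60000, "LKR", "Sri Lanka"),
        Listing("Phone X", 20000, "BDT", "Bangladesh"),
    ]
    print(to_document(cheapest(sample)))
```

Converting every price to one common currency before calling `min` keeps the comparison meaningful across countries. The resulting dict could be passed to PyMongo's `insert_one` on a collection of your choice.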
- Web Scraping: Python (Requests, BeautifulSoup)
- Automation: Apache Airflow
- Data Storage: Structured format for extracted data (MongoDB)
- ecommerce-product-comparator/: Contains the code and configuration for the Daraz Price Comparator project.
- process-imgs/: Contains supporting images related to the project.
- ecommerce-product-comparator-logo.jpg: Logo image for the project.
- ecommerce-product-comparator-process.jpg: Process/roadmap image for the project.
- main.py: Contains the entry-point code, generally calling other classes.
- code/: Contains all the code files.
- daraz_scraper.py: Contains code to scrape from Daraz.
- extract.py: Contains code to call both scrapers.
- transform.py: Contains code to implement Extract, Transform and Load.
- load_to_mongodb.py: Contains code to load data into MongoDB using PyMongo.
- README.md: You are here, providing an overview of the project.
To replicate this project, follow these steps:
- Clone this project.
- Run the Python scripts for web scraping and data comparison.
- Optionally, integrate with Airflow for automated and scheduled execution.
- The naming convention I use is snake case.
Feel free to explore the project directory for more details on implementation and configuration.
This project is licensed under the MIT License.