This Python project scrapes the Digikala e-commerce website to extract details about the laptops it lists, such as price, model, CPU, GPU, RAM, and screen size. The extracted data is stored in a MySQL database using the mysql library. The project also includes a simple machine learning model, built with scikit-learn, that predicts a laptop's price from a configuration supplied by the user.
- Web Scraping: Scrapes product information from Digikala's laptop category.
- Data Extraction: Collects details such as price, model, CPU, GPU, RAM, and screen size from each laptop listing.
- Database Storage: Stores the extracted data in a MySQL database.
- Machine Learning Model: Develops a simple price prediction model based on laptop configurations.
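A minimal sketch of the price-prediction idea. The README does not say which scikit-learn estimator the project uses, so this assumes plain linear regression, with `DictVectorizer` one-hot encoding text specs such as the CPU while passing numeric specs such as RAM through unchanged. The function names, feature keys, and training numbers are hypothetical placeholders, not scraped Digikala data.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


def build_price_model(configs, prices):
    """Fit a linear price model on laptop configurations.

    configs: list of dicts such as {"cpu": "Core i5", "ram_gb": 8}.
    String values are one-hot encoded by DictVectorizer ("cpu=Core i5");
    numeric values are used as-is.
    """
    model = Pipeline([
        ("vectorize", DictVectorizer(sparse=False)),
        ("regress", LinearRegression()),
    ])
    model.fit(configs, prices)
    return model


def predict_price(model, config):
    """Predict the price of one user-specified configuration."""
    # Category values never seen during training are ignored by DictVectorizer.
    return float(model.predict([config])[0])


if __name__ == "__main__":
    # Toy numbers for illustration only; real rows would come from the
    # MySQL table filled by the scraper.
    configs = [
        {"cpu": "Core i5", "ram_gb": 8},
        {"cpu": "Core i5", "ram_gb": 16},
        {"cpu": "Core i7", "ram_gb": 8},
        {"cpu": "Core i7", "ram_gb": 16},
    ]
    prices = [34, 58, 44, 68]
    model = build_price_model(configs, prices)
    print(predict_price(model, {"cpu": "Core i7", "ram_gb": 32}))
```

A linear model is easy to inspect, but it cannot capture interactions between specs; a tree-based regressor could be dropped into the same pipeline in place of `LinearRegression`.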
- Find Number of Pages: Determines the number of pages available for the laptop category on the Digikala website.
- Loop and Extract Links: Iterates through each page, extracting links to individual laptop listings.
- Retrieve and Collect Data: Requests each laptop's URL and collects the desired information.
- Decode and Store: Decodes the extracted information and stores it in a MySQL database.
- Model Creation: Builds a machine learning model using scikit-learn to predict laptop prices based on user-specified configurations.
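The scraping steps above can be sketched with Requests and BeautifulSoup roughly as follows. This is an illustration under assumptions, not the project's code: the category URL pattern, the `?page=N` pagination parameter, and the `/product/` link filter are guesses that would need to match Digikala's actual markup, and all function names are hypothetical. The README does not say what the decode step involves; because Digikala is a Persian-language site, one plausible piece of it is converting Persian digits in prices to ASCII, shown here as `parse_price`.

```python
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Hypothetical values: the real category URL and page parameter may differ.
BASE_URL = "https://www.digikala.com"
CATEGORY_URL = BASE_URL + "/search/category-notebook-netbook-ultrabook/?page={page}"

# Map Persian (U+06F0..U+06F9) and Arabic-Indic (U+0660..U+0669) digits to ASCII.
_DIGIT_MAP = {0x06F0 + i: str(i) for i in range(10)}
_DIGIT_MAP.update({0x0660 + i: str(i) for i in range(10)})


def fetch(session, url):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = session.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def count_pages(html):
    """Return the highest page=N found in the page's links (1 if none)."""
    soup = BeautifulSoup(html, "html.parser")
    pages = []
    for a in soup.find_all("a", href=True):
        match = re.search(r"[?&]page=(\d+)", a["href"])
        if match:
            pages.append(int(match.group(1)))
    return max(pages, default=1)


def extract_product_links(html, base_url=BASE_URL):
    """Return absolute, de-duplicated links to individual product pages."""
    soup = BeautifulSoup(html, "html.parser")
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)
             if "/product/" in a["href"]]
    return list(dict.fromkeys(links))  # drops duplicates, keeps first-seen order


def normalize_digits(text):
    """Replace Persian/Arabic-Indic digits with ASCII digits."""
    return text.translate(_DIGIT_MAP)


def parse_price(text):
    """Turn a price string written in Persian digits into an int, or None."""
    digits = re.sub(r"[^0-9]", "", normalize_digits(text))
    return int(digits) if digits else None


def collect_product_links(session):
    """Visit every category page and gather all product links."""
    first_page = fetch(session, CATEGORY_URL.format(page=1))
    links = extract_product_links(first_page)
    for page in range(2, count_pages(first_page) + 1):
        links += extract_product_links(fetch(session, CATEGORY_URL.format(page=page)))
    return list(dict.fromkeys(links))
```

With `requests.Session()`, `collect_product_links(session)` returns the listing URLs; each one can then be passed to `fetch` and parsed for price, CPU, GPU, and the other fields. Reusing one session keeps connections open across the many page requests.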
- Python 3
- Requests library
- BeautifulSoup library
- mysql library
- scikit-learn library
- MySQL database
- Clone the repository and change into it: `git clone https://github.com/Faridghr/ProductScraper.git`, then `cd ProductScraper`
- Install dependencies: `pip install -r requirements.txt`
- Set up a MySQL database and configure the connection settings.
- Run the main script to scrape data and store it in the database: `python src/ProductScraper.py`
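The connection settings and storage step might look like the sketch below. The README does not name the exact MySQL driver; this assumes mysql-connector-python (`mysql.connector`). The environment variable names, the `laptops` table, and its columns are hypothetical.

```python
import os
from dataclasses import astuple, dataclass
from typing import Optional


@dataclass
class Laptop:
    """One scraped listing; field names are illustrative."""
    url: str
    model: str
    price: Optional[int]
    cpu: str
    gpu: str
    ram_gb: Optional[int]
    screen_in: Optional[float]


# utf8mb4 so Persian product titles are stored without mangling.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS laptops (
    id INT AUTO_INCREMENT PRIMARY KEY,
    url VARCHAR(512) NOT NULL,
    model VARCHAR(255),
    price BIGINT,
    cpu VARCHAR(255),
    gpu VARCHAR(255),
    ram_gb INT,
    screen_in DECIMAL(4, 1)
) CHARACTER SET utf8mb4
"""

# Parameterized insert: the driver escapes values, so scraped text is never
# interpolated into the SQL string directly.
INSERT_SQL = (
    "INSERT INTO laptops (url, model, price, cpu, gpu, ram_gb, screen_in) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s)"
)


def db_settings():
    """Read connection settings from (hypothetical) environment variables."""
    return {
        "host": os.environ.get("DB_HOST", "localhost"),
        "user": os.environ.get("DB_USER", "root"),
        "password": os.environ.get("DB_PASSWORD", ""),
        "database": os.environ.get("DB_NAME", "digikala"),
    }


def save_laptops(laptops, settings=None):
    """Create the table if needed and insert all scraped laptops in one batch."""
    # Imported here so the helpers above can be used without the driver installed.
    import mysql.connector  # assumed driver: mysql-connector-python

    conn = mysql.connector.connect(**(settings or db_settings()))
    try:
        cursor = conn.cursor()
        cursor.execute(CREATE_TABLE_SQL)
        cursor.executemany(INSERT_SQL, [astuple(laptop) for laptop in laptops])
        conn.commit()
        cursor.close()
    finally:
        conn.close()
```

The database named by `DB_NAME` must already exist; only the table is created automatically.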