Data mining, or data gathering, is one of the earliest steps in the data science life cycle. Depending on business requirements, you may have to gather data from sources such as SAP servers, logs, databases, APIs, online repositories, or the web.
Web scraping tools such as Selenium can collect large volumes of data, including text and images, in a relatively short time.
- Dataset1: All images
- Dataset2: Images for the car and horse classes. You can scrape images for classes of your choice; see the code for details.
- Image Web-Scraping_Part1.ipynb: Python notebook for image web scraping.
- Image Web-Scraping_Part2.ipynb: Almost the same code as in the Image Web-Scraping_Part1.ipynb notebook, but split into separate functions to make it more modular and readable.
Dependencies:
- selenium: 4.8.2
- PIL: 7.0.0
- requests: 2.22.0
- webdriver_manager: 3.8.5
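
To confirm that an environment matches the versions listed above, a small check along these lines may help. It is only a sketch: the `PINNED` dictionary and the `check_versions` helper are hypothetical names, not part of the notebooks, and the PyPI distribution names (`Pillow` for PIL, `webdriver-manager` for webdriver_manager) are used because that is what `importlib.metadata` looks up.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README, keyed by PyPI distribution name.
# PIL is distributed as "Pillow", webdriver_manager as "webdriver-manager".
PINNED = {
    "selenium": "4.8.2",
    "Pillow": "7.0.0",
    "requests": "2.22.0",
    "webdriver-manager": "3.8.5",
}


def check_versions(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


# Report anything that differs from the pinned versions.
for package, (expected, installed) in check_versions(PINNED).items():
    print(f"{package}: expected {expected}, found {installed or 'not installed'}")
```

Newer versions may work as well; the check only reports differences and does not install anything.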
- Step 1 – Import all required libraries
- Step 2 – Install Chrome Driver
- Step 3 – Specify search URL
- Step 4 – Write a function to take the cursor to the end of the page
- Step 5 – Write a function to get URL of each Image
- Step 6 – Write a function to download each image which is not restricted by any license or copyright
- Step 7 – Write a function to save each Image in the Destination directory
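
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the function names (`scroll_to_end`, `get_image_urls`, `save_image`, `scrape_images`), the generic `img` selector, and the hash-based file names are all assumptions made for the example. The sketch also does not check licensing; choosing images that are free of license or copyright restrictions (Step 6) is left to the search URL you supply.

```python
import hashlib
import time
from pathlib import Path


def scroll_to_end(driver, pause_seconds=1.0, max_rounds=20):
    """Step 4: scroll until the page height stops growing or max_rounds is hit."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)  # give lazily loaded images time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_number
        last_height = new_height
    return max_rounds


def get_image_urls(driver, max_images=50):
    """Step 5: collect unique http(s) 'src' values from <img> elements."""
    urls = []
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    for element in driver.find_elements("css selector", "img"):
        src = element.get_attribute("src")
        if src and src.startswith("http") and src not in urls:
            urls.append(src)
        if len(urls) >= max_images:
            break
    return urls


def save_image(jpeg_bytes, url, destination):
    """Step 7: write JPEG bytes into destination, named by a hash of the URL."""
    folder = Path(destination)
    folder.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha1(url.encode("utf-8")).hexdigest()[:10] + ".jpg"
    file_path = folder / name
    file_path.write_bytes(jpeg_bytes)
    return file_path


def scrape_images(search_url, destination, max_images=10):
    """Steps 1-3 and 6: start Chrome, collect URLs, download and save images."""
    # Third-party imports are kept local so the helpers above can be
    # imported and tested without a browser installed.
    import io

    import requests
    from PIL import Image
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    try:
        driver.get(search_url)
        scroll_to_end(driver)
        urls = get_image_urls(driver, max_images=max_images)
    finally:
        driver.quit()

    saved = []
    for url in urls:
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            # Re-encode as JPEG so the ".jpg" file name is accurate.
            buffer = io.BytesIO()
            Image.open(io.BytesIO(response.content)).convert("RGB").save(
                buffer, format="JPEG"
            )
        except (requests.RequestException, OSError) as error:
            print(f"Skipping {url}: {error}")
            continue
        saved.append(save_image(buffer.getvalue(), url, destination))
    return saved
```

A call such as `scrape_images(search_url, "images")`, with `search_url` set to the results page you want to scrape, would then open Chrome, scroll the page, and save up to ten images into an `images` folder.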
I've published a comprehensive article on Image Web Scraping using Selenium with Python. You can refer to this link for more details.
- What is Web Scraping
- Why Web Scraping
- How Web Scraping is useful
- What is Selenium
- Setup & tools
- Implementation of Image Web Scraping using Selenium with Python
- Headless Chrome browser
- Putting it all together
- End Notes
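
As a rough illustration of the headless Chrome topic in the outline above, the sketch below starts Chrome without a visible window. The helper names `headless_arguments` and `build_headless_driver` are hypothetical, and the window size is an arbitrary choice. The `--headless=new` flag assumes a recent Chrome release (109 or later); older releases use plain `--headless`.

```python
def headless_arguments(width=1920, height=1080):
    """Command-line arguments for running Chrome without a visible window."""
    return ["--headless=new", f"--window-size={width},{height}"]


def build_headless_driver():
    """Create a Chrome driver that runs headless (requires Chrome installed)."""
    # Imported here so headless_arguments() can be used without selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    for argument in headless_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(
        service=Service(ChromeDriverManager().install()),
        options=options,
    )
```

Setting an explicit window size can matter in headless mode, because pages that load images lazily may render fewer of them in a small default viewport.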
If you encounter any issues or have suggestions for improvement, please open an issue in the Issues section of this repository.
If you have a Data Science mini-project that you'd like to share, please follow the guidelines in CONTRIBUTING.md.
Please adhere to our Code of Conduct in all your interactions with the project.
This project is licensed under the MIT License.
For questions or inquiries, feel free to contact me on LinkedIn.
I’m a seasoned Data Scientist and founder of TowardsMachineLearning.Org. I've worked on various Machine Learning, NLP, and cutting-edge deep learning frameworks to solve numerous business problems.