Effortlessly gather image data for your deep learning projects using this repository. With Selenium and Python, explore a robust web-scraping solution designed for acquiring numerous images. Accelerate your model training with diverse and extensive datasets, making your deep learning endeavors more effective and efficient.

Praveen76/Web-Scraping-using-Selenium-Python

Image web scraping using Selenium and Python

Gathering data is a fundamental first step in the data science life cycle. Depending on business requirements, you may have to gather data from sources such as SAP servers, logs, databases, APIs, online repositories, or the web.

Image web scraping

Web-scraping tools such as Selenium can collect large volumes of data, including text and images, in a relatively short time.

Directory Structure

  • Dataset1 : All images
  • Dataset2 : Images for the car and horse classes. You can scrape images for classes of your choice; go through the code for more details.
  • Image Web-Scraping_Part1.ipynb: Python notebook for image web scraping.
  • Image Web-Scraping_Part2.ipynb: Almost the same code as in the Image Web-Scraping_Part1.ipynb notebook, but split into separate functions to make it more modular and readable.

Instructions for Installation:

Dependencies:

  • selenium: 4.8.2
  • Pillow (PIL): 7.0.0
  • requests: 2.22.0
  • webdriver_manager: 3.8.5
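
To see whether an environment matches the versions listed above, a small check along the following lines may help. This is only a sketch: the `PINNED` mapping and both function names are illustrative and not part of the repository, and it assumes that "PIL" refers to the `Pillow` distribution and that `webdriver_manager` is installed under the PyPI name `webdriver-manager`.

```python
from importlib import metadata

# Versions from the dependency list, keyed by PyPI distribution name.
# Assumption: "PIL" corresponds to the Pillow distribution.
PINNED = {
    "selenium": "4.8.2",
    "Pillow": "7.0.0",
    "requests": "2.22.0",
    "webdriver-manager": "3.8.5",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def compare_with_pins(pins=PINNED):
    """Return {name: (pinned, installed)} for every pin that does not match."""
    mismatches = {}
    for name, pinned in pins.items():
        found = installed_version(name)
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, found) in compare_with_pins().items():
        print(f"{name}: pinned {pinned}, installed {found or 'not installed'}")
```

A different installed version is not necessarily a problem; the report only shows where an environment differs from the one listed here.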

Steps involved:

  • Step 1: Import all required libraries
  • Step 2: Install Chrome Driver
  • Step 3: Specify search URL
  • Step 4: Write a function to take the cursor to the end of the page
  • Step 5: Write a function to get the URL of each image
  • Step 6: Write a function to download each image that is not restricted by any license or copyright
  • Step 7: Write a function to save each image in the destination directory
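
The steps above can be sketched roughly as follows. This is a minimal illustration and not the code from the notebooks: every function name, the `img` CSS selector, and the hash-based file naming are assumptions, and the sketch does not check licenses or copyright, so that part of Step 6 remains the user's responsibility. Third-party imports sit inside the functions that need them, so the pure helpers can be tried without a browser.

```python
import hashlib
import os
import time


def scroll_to_end(driver, pause_seconds=1.0):
    """Step 4: scroll to the bottom of the page so more results can load."""
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(pause_seconds)


def is_http_url(src):
    """Keep regular http(s) links; skip empty values and inline data: URIs."""
    return bool(src) and src.startswith(("http://", "https://"))


def collect_image_urls(driver, max_urls, max_scrolls=5):
    """Step 5: gather the src of each <img> element, without duplicates."""
    from selenium.webdriver.common.by import By

    urls = []
    for _ in range(max_scrolls):
        scroll_to_end(driver)
        # Assumption: a plain "img" selector; real pages may need a narrower one.
        for element in driver.find_elements(By.CSS_SELECTOR, "img"):
            src = element.get_attribute("src")
            if is_http_url(src) and src not in urls:
                urls.append(src)
        if len(urls) >= max_urls:
            break
    return urls[:max_urls]


def image_file_name(url):
    """Derive a stable file name from the URL (hypothetical naming scheme)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:10] + ".jpg"


def download_and_save(url, folder):
    """Steps 6 and 7: fetch one image and store it as a JPEG in folder."""
    import io

    import requests
    from PIL import Image

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    image = Image.open(io.BytesIO(response.content)).convert("RGB")
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, image_file_name(url))
    image.save(path, "JPEG")
    return path


def scrape_images(search_url, folder, max_urls=10):
    """Steps 2 to 7 in order; search_url is whichever results page you choose."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    try:
        driver.get(search_url)
        urls = collect_image_urls(driver, max_urls)
    finally:
        driver.quit()
    return [download_and_save(url, folder) for url in urls]
```

Calling `scrape_images(search_url, "Dataset1")` with a search results URL of your choice would open Chrome, collect up to ten image links, and write them into the folder. Naming files by a hash of the URL keeps repeated runs from saving the same image under different names.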

Article published on Analytics Vidhya:

I've published a comprehensive article on image web scraping using Selenium with Python. You can refer to this link for more details.

Important learnings from the article:

  • What is Web Scraping
  • Why Web Scraping
  • How Web Scraping is useful
  • What is Selenium
  • Setup & tools
  • Implementation of Image Web Scraping using Selenium Python
  • Headless Chrome browser
  • Putting it all together
  • End Notes
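
For the headless Chrome topic listed above, the general idea can be sketched as follows. This is an assumption-laden illustration rather than the article's code: `headless_arguments` and `make_headless_driver` are hypothetical names, and the window size is an arbitrary choice.

```python
def headless_arguments(width=1920, height=1080):
    """Command-line switches that ask Chrome to run without a visible window."""
    return ["--headless", f"--window-size={width},{height}"]


def make_headless_driver():
    """Start Chrome in headless mode (requires selenium and webdriver_manager)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    for argument in headless_arguments():
        options.add_argument(argument)
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```

Running headless is mainly useful on servers or in scheduled jobs where no display is available; setting an explicit window size helps pages lay out as they would in a normal browser window.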

Issues:

If you encounter any issues or have suggestions for improvement, please open an issue in the Issues section of this repository.

Contributing

If you have a Data Science mini-project that you'd like to share, please follow the guidelines in CONTRIBUTING.md.

Code of Conduct

Please adhere to our Code of Conduct in all your interactions with the project.

License

This project is licensed under the MIT License.

Contact

For questions or inquiries, feel free to contact me on LinkedIn.

About Me:

I’m a seasoned Data Scientist and founder of TowardsMachineLearning.Org. I've worked on various Machine Learning, NLP, and cutting-edge deep learning frameworks to solve numerous business problems.
