Skip to content

Scraping location specific restaurants from google maps and its websites from google search.

License

Notifications You must be signed in to change notification settings

JeyasriRamesh/Webscraping-restaurants-information

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 

Repository files navigation

Webscraping-restaurants information

Scraping location specific restaurants from google maps and its websites from google search. || Selenium * DOMS * NLP ||

Step 1: Automating Google Map search and extraction

  • Added Selenium and WebDriver libraries to start with.
  • Got runtime inputs on Location and no. of restaurant details to be scraped.
  • Calculated the number of pages to be scraped (Google map displayes 20 results per page)
  • Initialized webdriver
  • Accessed Google map search bar using Xpath and Enter key instead of search button
  • Performed search with keywords -> Location + 'restaurants'
  • Analysed the structure of data displayed using Inspect element (Chrome)
  • Identified the XPath of Restaurant names
  • Iterated through the XPath
  • Added sleep time for pages to load before and after clicks.
  • Extracted Restaurant names and performed a automated click on each restaurants.
  • Handled exceptions
  • Indentified XPath of the address content and extracted them
  • Navigated back to the search results of page 1
  • Repeated the same for all results in page 1
  • Navigated to consecutive pages if required
  • Stored the restaurant names and the addresses in an array

Step 2: Restaurant URL extraction I faced an issue with extractign the website URL directly from the google map results. So I did a work around by extracting them by automating a google search.

  • Performed an automted search with the Restaurant name + city as keyword
  • Iterated through the XPath
  • Added sleep time for pages to load before and after clicks.
  • Retrieved the title and Xpath of the first 10 results
  • Handled exceptions
  • Used FuzzyWuzzy library to calculate the token set ratio for the restautant name and the titles of the retrieved search results
  • Stored the Xpath id and the search result's scores in a list of dictionaries
  • Retrieved the search result that has high score (levenshtein distance)
  • Identified the URL of that particular search result and retrieved it

Step 3: Storing data

  • Initialized a dataframe
  • Dumped array data to the dataframe

Room for Improvement:

  • URL extraction from the home site and ignoring aggregator websites
  • URL extraction from google map results
  • Experimenting the same with Google Maps API
  • Code optimization

About

Scraping location specific restaurants from google maps and its websites from google search.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published