v0.1.0 - Improved Algorithm

@micahcochran micahcochran released this 10 Jun 03:22
· 9 commits to master since this release
  • The functionality of Crawler._find_random_url() has been split between _rank_url() and _mine_url(), which work together.
  • Crawler._rank_url() ranks the URLs based on the recipe_url defined in the yaml config. URLs that match recipe_url get put in a higher priority list.
  • _mine_url() processes all of the anchors of a webpage into lists.
  • Crawler._download_page() now picks which web pages to download.
  • Added a timeout value to requests.get().
  • Replaced deque (double-ended queue) with a Python list. Python lists are common, and the double-ended queue provided no advantages.
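
The split between mining and ranking described above might look roughly like the sketch below. This is not the project's actual code: the function names `mine_urls`, `rank_url`, and `pick_next`, the `AnchorCollector` helper, and the prefix-match rule for `recipe_url` are all assumptions made for illustration, and it uses only the standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collects the absolute href of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.hrefs.append(urljoin(self.base_url, href))


def rank_url(url, recipe_url):
    """Return 1 if the URL matches recipe_url, else 0.

    Assumption: "matches" is a simple prefix test; the real config
    could just as well use a pattern.
    """
    return 1 if url.startswith(recipe_url) else 0


def mine_urls(html, base_url, recipe_url):
    """Sort every anchor on a page into a high- and a low-priority list."""
    collector = AnchorCollector(base_url)
    collector.feed(html)
    high, low = [], []
    for url in collector.hrefs:
        (high if rank_url(url, recipe_url) else low).append(url)
    return high, low


def pick_next(high, low):
    """Take a URL from the high-priority list first, then the low one."""
    if high:
        return high.pop()
    if low:
        return low.pop()
    return None
```

In this sketch plain lists are enough because URLs are only appended and popped from the same end. A download step would then pass a `timeout=` argument to `requests.get()` so that an unresponsive server cannot stall the crawl.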