parthnamdev/concurrent-web-crawler

concurrent-web-crawler

Web crawler and indexer using parallel processing.

This project analyses three crawlers that differ in how they crawl and index pages. The best-performing version is a hybrid of the other two. The configurations are as follows:

  1. Serial Crawler and Indexer - SCSI - serial_crawler.py
  2. Concurrent Crawler and Indexer - CCCI - concurrent_crawler.py
  3. Concurrent Crawler with Serial Indexer - CCSI - hybrid_crawler.py

The third version, "Concurrent Crawler with Serial Indexer" (CCSI), gave the best results in testing.
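
The CCSI idea can be illustrated with a minimal sketch: pages are fetched in parallel by a thread pool, and the index is then built by a single thread. This is not the code in hybrid_crawler.py. The function name `crawl_concurrent_index_serial`, the injected `fetch` callable, and the in-memory `FAKE_WEB` pages are all hypothetical and exist only so the example runs without network access.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def crawl_concurrent_index_serial(urls, fetch, max_workers=8):
    """Fetch all urls concurrently, then build an inverted index serially.

    `fetch` is any callable mapping a URL to its page text (an assumption
    of this sketch; a real crawler would issue HTTP requests here).
    """
    # Concurrent stage: fetching is I/O-bound, so threads overlap the waits.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `urls`.
        pages = list(zip(urls, pool.map(fetch, urls)))

    # Serial stage: one thread updates the index, so no locking is needed.
    index = defaultdict(set)
    for url, text in pages:
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return dict(index)


# Hypothetical in-memory stand-in for the web.
FAKE_WEB = {
    "https://example.org/a": "Black holes bend light",
    "https://example.org/b": "Light escapes stars",
}

index = crawl_concurrent_index_serial(list(FAKE_WEB), FAKE_WEB.__getitem__)
```

Keeping the indexing step in one thread avoids contention on the shared index, which is one plausible reason a hybrid design could outperform a fully concurrent one.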

[Images: test results for the crawlers]

The results above were obtained with the following inputs:

urlinput = https://en.wikipedia.org/wiki/Black_hole

base = https://en.wikipedia.org/wiki/
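
One plausible reading of these two inputs is that `urlinput` is the seed page and `base` restricts the crawl to URLs under that prefix. The sketch below works under that assumption only; the helper `in_scope_links` and the sample `hrefs` are hypothetical and not taken from the project's scripts.

```python
from urllib.parse import urldefrag, urljoin


def in_scope_links(page_url, hrefs, base):
    """Resolve hrefs against page_url and keep unique URLs that start with base."""
    kept = []
    for href in hrefs:
        # Make the link absolute and drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute.startswith(base) and absolute not in kept:
            kept.append(absolute)
    return kept


urlinput = "https://en.wikipedia.org/wiki/Black_hole"
base = "https://en.wikipedia.org/wiki/"

# Hypothetical hrefs as they might appear in the seed page's HTML.
hrefs = [
    "/wiki/Event_horizon",
    "https://example.org/elsewhere",
    "/wiki/Event_horizon#Overview",
]
links = in_scope_links(urlinput, hrefs, base)
```

Here the off-site link is dropped and the two references to the same article collapse into one entry.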
