Web crawler and indexer using parallel processing.
Three crawler configurations were analysed for this project. The best-performing version is a hybrid of the other two. The configurations are as follows:
- Serial Crawler and Indexer - SCSI - serial_crawler.py
- Concurrent Crawler and Indexer - CCCI - concurrent_crawler.py
- Concurrent Crawler with Serial Indexer - CCSI - hybrid_crawler.py
The third version, "Concurrent Crawler with Serial Indexer" (CCSI), gave the best results when tested.
The URL used to obtain the above results was:
urlinput = https://en.wikipedia.org/wiki/Black_hole
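
The idea behind the hybrid (CCSI) configuration, fetching pages concurrently and then building the index in a single thread, can be illustrated with a minimal sketch. This is not the code in hybrid_crawler.py; the names `crawl_concurrently`, `index_serially` and `fake_fetch` are hypothetical, and the fetch function is a stand-in for a real HTTP request so that the example runs offline.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def crawl_concurrently(urls, fetch, max_workers=8):
    """Fetch every URL in a thread pool and return {url: page_text}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its page
        pages = list(pool.map(fetch, urls))
    return dict(zip(urls, pages))


def index_serially(pages):
    """Build an inverted index (word -> set of URLs) in a single thread."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def fake_fetch(url):
    # Hypothetical stand-in for a network request (assumption, not project code)
    canned = {
        "a": "black hole",
        "b": "event horizon of a black hole",
    }
    return canned.get(url, "")


pages = crawl_concurrently(["a", "b"], fake_fetch)
index = index_serially(pages)
```

Crawling is dominated by waiting on the network, so it tends to benefit from concurrency, whereas a serially built index needs no locking around the shared data structure. That is one plausible reason a hybrid would perform well, though the sketch itself does not measure it.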