
vish1/hbcse-crawler


Copyright 2004, 2013 Vishwas Bhat, Apurva Pangam, Tarun Makhija, Vineet Jalali

Multipurpose Internet crawler
-----------------------------

Purpose:
-------
Create a knowledge base for a particular domain, such as mathematics.

Using a set of keywords for the required domain, crawl the Internet for sites containing those keywords and store the relevant pages locally. An MD5 hash is used to avoid storing the same page more than once.
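The crawl-and-store loop described above can be sketched in Python, the language of `agentxml.py`. This is a minimal sketch under stated assumptions, not the project's actual code: `is_relevant`, `PageStore`, `crawl` and the injected `fetch` callable are hypothetical names; relevance is assumed to be a case-insensitive substring match on any keyword; the MD5 digest is assumed to be taken over the page content; and links are assumed to be followed breadth-first from every fetched page.

```python
import hashlib
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical fetcher signature: given a URL, return (page_text, outgoing_links),
# or None if the page could not be retrieved.
Fetcher = Callable[[str], Optional[Tuple[str, List[str]]]]


def is_relevant(text: str, keywords: Iterable[str]) -> bool:
    """Assumed relevance rule: the page mentions at least one keyword (case-insensitive)."""
    lowered = text.lower()
    return any(kw.lower() in lowered for kw in keywords)


class PageStore:
    """Keeps relevant pages, skipping any whose MD5 digest has been seen before."""

    def __init__(self) -> None:
        self.seen_digests: Set[str] = set()
        self.pages: Dict[str, str] = {}  # url -> page text

    def add(self, url: str, text: str) -> bool:
        # Assumption: the digest is computed over the page content, so the same
        # page reached through two different URLs is stored only once.
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest in self.seen_digests:
            return False
        self.seen_digests.add(digest)
        self.pages[url] = text
        return True


def crawl(seeds: Iterable[str], fetch: Fetcher, keywords: List[str],
          max_pages: int = 100) -> PageStore:
    """Breadth-first crawl from the seed URLs, storing pages that match the keywords."""
    store = PageStore()
    queue = deque(seeds)
    visited: Set[str] = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        result = fetch(url)
        if result is None:
            continue
        text, links = result
        if is_relevant(text, keywords):
            store.add(url, text)
        # Assumption: links are followed from every fetched page, relevant or not.
        queue.extend(link for link in links if link not in visited)
    return store
```

In a real crawler `fetch` would download the page over HTTP and extract its links; passing it in as a parameter keeps the relevance and duplicate-detection logic testable offline. MD5 serves here only as a cheap content fingerprint for spotting duplicates, not as a security measure.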

Execute the program:
-------------------
./agentxml.py 

Limitations of the crawler:
--------------------------
1. Does not handle websites that require authentication
2. Works only with plain HTTP sites
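The HTTP-only limitation could be made explicit at link-extraction time by discarding anything that is not an `http://` URL before it is queued. A minimal sketch, assuming a hypothetical helper `crawlable_links` that receives the raw `href` values found on a page; it is not taken from `agentxml.py`, and it does nothing about the authentication limitation.

```python
from typing import Iterable, List
from urllib.parse import urldefrag, urljoin, urlparse


def crawlable_links(base_url: str, hrefs: Iterable[str]) -> List[str]:
    """Resolve links against the page URL and keep only plain http:// targets."""
    result: List[str] = []
    for href in hrefs:
        # Make relative links absolute, then drop any #fragment so the same
        # page is not queued twice under different anchors.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        # https, ftp, mailto, javascript, etc. are skipped: this sketch
        # assumes the crawler only speaks plain HTTP.
        if urlparse(absolute).scheme == "http":
            result.append(absolute)
    return result
```

For a page at `http://example.org/a/`, the links `b.html#top` and `/c` are kept as `http://example.org/a/b.html` and `http://example.org/c`, while `https://` and `mailto:` links are dropped.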

Required improvements:
---------------------
1. Reorganise the code
2. Improve the storage of captured sites

Credits:
-------
The crawler was created as part of a student project at the Homi Bhabha Centre for Science Education (HBCSE) under the guidance of Dr. Nagarjuna. We would like to thank him for his support and direction in creating this project.


