This is a toy web crawler that specifically targets Douban groups and Zhihu.
#### The roadmap for this toy project may include the tasks below:
- Rewrite the core tag-searching implementation using bs4. (done)
- Crawl one level deeper: reveal the social graph of authors and their followers.
- Connect the crawler to a SQL database instead of the current CSV file. (done)
- Provide a progress bar. (done)
- Add multi-threading support.
- Add a front end to visualize the scraped data using D3.js (or another library).
- Split the original single file into classes. (done)
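
As a rough illustration of the pending multi-threading item, here is a minimal sketch using only the Python standard library. It is not the project's actual code: `crawl_all`, `fake_fetch`, and the `max_workers` default are hypothetical names and values chosen for this example, and the real page download would replace `fake_fetch`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def crawl_all(urls, fetch, max_workers=4):
    """Fetch every URL concurrently and return {url: result or exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the URL it is fetching.
        future_to_url = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Record the failure and keep crawling the other pages.
                results[url] = exc
    return results


def fake_fetch(url):
    # Hypothetical stand-in for the real page download; no network access.
    return "<html>{}</html>".format(url)


if __name__ == "__main__":
    pages = crawl_all(["https://example.com/a", "https://example.com/b"], fake_fetch)
    for url, body in sorted(pages.items()):
        print(url, len(body))
```

Passing the fetch function in as an argument keeps the threading logic separate from the download and parsing code, so the same helper could wrap either the Douban or the Zhihu fetcher.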