GitHub - JinpengLI/web_template: auto define crawler

Tired of defining crawlers by hand? Do you always extract metadata with XPath?

Now you can easily extract a list of metadata fields from a page.

See the example in func.py.

$ python func.py

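Does extracting keywords feel slow for every website? Do you have to work out an XPath for each site before you can extract them?

This library extracts keywords dynamically: define the keywords for just two pages to train the crawler, and it can then extract the keywords from other pages.

To try it, run: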

$ python func.py
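
The two-page training idea can be illustrated with a minimal, standalone sketch. This is not the API of func.py: the names `train` and `extract`, the `width` parameter, and the sample HTML are all hypothetical, and the sketch uses plain string comparison rather than diff_match_patch. It assumes each labeled value appears verbatim in its page and that the markup around it is the same on both training pages.

```python
import os.path


def _context(page, value, width=40):
    """Return up to `width` characters before and after `value` in `page`."""
    i = page.find(value)
    if i < 0:
        raise ValueError("value %r not found in page" % value)
    end = i + len(value)
    return page[max(0, i - width):i], page[end:end + width]


def _common_suffix(a, b):
    """Longest string that both `a` and `b` end with."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


def train(example_a, example_b):
    """Learn a (left, right) delimiter pair from two (html, value) examples.

    Hypothetical helper for illustration; not part of web_template.
    """
    left_a, right_a = _context(*example_a)
    left_b, right_b = _context(*example_b)
    left = _common_suffix(left_a, left_b)
    right = os.path.commonprefix([right_a, right_b])
    if not left or not right:
        raise ValueError("the two examples share no surrounding markup")
    return left, right


def extract(template, page):
    """Return the text between the learned delimiters, or None if absent."""
    left, right = template
    start = page.find(left)
    if start < 0:
        return None
    start += len(left)
    end = page.find(right, start)
    return page[start:end] if end >= 0 else None


# Made-up sample pages that share one layout.
page_a = '<html><h1 class="title">Red Shoes</h1><p>$10</p></html>'
page_b = '<html><h1 class="title">Blue Hat</h1><p>$25</p></html>'
page_c = '<html><h1 class="title">Green Scarf</h1><p>$7</p></html>'

title_template = train((page_a, "Red Shoes"), (page_b, "Blue Hat"))
print(extract(title_template, page_c))  # prints: Green Scarf
```

Two examples are the minimum needed to tell the fixed markup apart from the changing value: whatever differs between the pages is treated as data, and whatever matches is treated as the template.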
