Web_Crawler_Python3

A web crawler written in Python 3 (it scrapes movie information from Douban Movies).

Requirements:

Python 3.3+

Usage:

Install MongoDB
Install the dependencies:
```
$ pip install -r requirements.txt
```
Run daemon.py:
```
$ python daemon.py start
```

Other commands:

Stop the program:
```
$ python daemon.py stop
```
Stop the program, then delete all data in the database and the log files:
```
$ python daemon.py clean
```
View the run log:
```
$ tail -f /tmp/daemon.log
```
View the process PID:
```
$ cat /tmp/daemon.pid
```
View the error log:
```
$ tail -f /tmp/daemon.err
```
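For `stop` to find the background process started by `start`, the PID has to be recorded somewhere, and the `/tmp/daemon.pid` file above suggests a PID file. The following is a minimal sketch under that assumption, not the actual code in `daemon.py`: the helper names `write_pid`, `read_pid` and `stop` are hypothetical, and only the PID bookkeeping is shown, not daemonization itself.

```python
import os
import signal
from typing import Optional

# PID file location taken from this README; the helper names are hypothetical.
PID_FILE = "/tmp/daemon.pid"


def write_pid(pid_file: str = PID_FILE) -> None:
    """Record the current process ID so a later `stop` can find this process."""
    with open(pid_file, "w") as f:
        f.write(str(os.getpid()))


def read_pid(pid_file: str = PID_FILE) -> Optional[int]:
    """Return the recorded PID, or None if the file is missing or malformed."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def stop(pid_file: str = PID_FILE) -> bool:
    """Send SIGTERM to the recorded process and delete the PID file.

    Returns False if there was no usable PID file (nothing to stop).
    """
    pid = read_pid(pid_file)
    if pid is None:
        return False
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # Process already exited; only the stale PID file remains.
    os.remove(pid_file)
    return True
```

In this sketch `start` would call `write_pid()` after detaching from the terminal; redirecting output to `/tmp/daemon.log` and `/tmp/daemon.err` would be a separate step.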

Changelog

V 1.1

Bug fixes
URLs held in memory are saved to the database when the program exits because of an exception
Added the clean command
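Saving in-memory URLs on an abnormal exit can be sketched as a crawl loop that flushes its pending queue whenever an exception escapes. This is an illustrative sketch only: `crawl`, `fetch` and `save_pending` are hypothetical names, and `save_pending` stands in for whatever writes to MongoDB in the real project.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(seed_urls: Iterable[str],
          fetch: Callable[[str], List[str]],
          save_pending: Callable[[List[str]], None]) -> None:
    """Breadth-first crawl that persists unvisited URLs if it fails.

    fetch(url) returns the links found on that page; save_pending(urls)
    is a hypothetical hook that would write the URLs to the database.
    """
    queue = deque(seed_urls)
    seen = set(queue)
    try:
        while queue:
            url = queue[0]      # Peek first, so a failing URL stays queued.
            links = fetch(url)
            queue.popleft()     # Drop the URL only once it was fetched.
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
    except BaseException:
        # Covers ordinary errors as well as KeyboardInterrupt / SystemExit:
        # flush what is still in memory, then re-raise so the failure is visible.
        save_pending(list(queue))
        raise
```

Re-raising after saving keeps the failure visible (for example in `/tmp/daemon.err`) instead of silently swallowing it.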

V 1.0

Switched the database to MongoDB
More thorough exception handling
Lower memory usage
Runs in the background

V 0.1 alpha

Replaced urllib with the third-party library responses
Added support for the HTML parser BeautifulSoup
Data is stored in MySQL
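The HTML-parsing step is what turns a downloaded page into new URLs to visit. The sketch below shows that idea with the standard library's `html.parser` instead of BeautifulSoup, so it runs without extra dependencies. The `/subject/<id>/` pattern for Douban movie pages is an assumption, and `MovieLinkParser` / `extract_movie_links` are hypothetical names.

```python
import re
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

# Assumed shape of a Douban movie page URL, e.g. https://movie.douban.com/subject/1292052/
MOVIE_URL = re.compile(r"^https?://movie\.douban\.com/subject/\d+/?$")


class MovieLinkParser(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags that look like movie pages."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links = []  # Movie URLs in first-seen order, without duplicates.

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            url = urljoin(self.base_url, href)  # Resolve relative links.
            if MOVIE_URL.match(url) and url not in self.links:
                self.links.append(url)


def extract_movie_links(html: str, base_url: str) -> List[str]:
    """Return the movie-page links found in `html`, resolved against `base_url`."""
    parser = MovieLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

With BeautifulSoup the same loop would iterate over the page's `<a>` tags; the filtering and URL resolution would stay the same.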

V 0.0 alpha

Can download movie pages and save them to the data folder
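Saving each movie page under `data` can be sketched as follows. Naming the file after the numeric ID in the URL is an assumption for illustration, and `save_page` is a hypothetical helper. Fetching the page (with urllib in this early version) is left out so the function can be tried offline.

```python
import os
import re


def save_page(url: str, html: str, data_dir: str = "data") -> str:
    """Write a movie page's HTML to <data_dir>/<movie id>.html and return the path.

    The "<movie id>.html" naming scheme is hypothetical.
    """
    match = re.search(r"/subject/(\d+)", url)
    if match is None:
        raise ValueError("not a movie page URL: " + url)
    os.makedirs(data_dir, exist_ok=True)  # Create the data folder on first use.
    path = os.path.join(data_dir, match.group(1) + ".html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path
```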

XiaochenCui/Web_Crawler_Python3
