Please comply with the terms of the Apache-2.0 License before using or referencing this repository.
@Ming Jin (mingj2@student.unimelb.edu.au)
This repository is a simple NLP project for beginners and will be updated occasionally.
Environment: Python 3.6
Pull requests are always welcome!
To-do list:
- Sentiment analysis could be reimplemented with an LSTM.
- The crawler could be improved with a parsing library such as BeautifulSoup.
This is a toolbox for analyzing Weibo comments. It currently implements:
- A Weibo comment crawler based on regular expressions
- Tokenization, filtering, and keyword extraction
- Word clouds and word-frequency statistics
- Sentiment analysis
- Topic clustering based on LDA
For the expected results, see "Case study: the Thai elephant trampling incident". Note: the final table in that case study must be compiled by hand from the LDA results.
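The regular-expression-based crawling step listed above can be illustrated with a minimal sketch. This is not the repository's actual crawler: the HTML snippet, the `ctt` class name, and the `extract_comments` helper are all assumptions made for the example, and the real page markup may differ.

```python
import re
from html import unescape

# Hypothetical markup for two comments; the real page structure may differ.
SAMPLE_HTML = """
<div class="c"><span class="ctt">The elephant was provoked</span></div>
<div class="c"><span class="ctt">Stay safe &amp; respect wildlife</span></div>
"""

# Capture everything between the (assumed) comment span tags.
COMMENT_RE = re.compile(r'<span class="ctt">(.*?)</span>', re.S)
# Strip any tags nested inside a comment, such as links.
TAG_RE = re.compile(r"<[^>]+>")


def extract_comments(html):
    """Return the plain-text comments found in an HTML string."""
    comments = []
    for raw in COMMENT_RE.findall(html):
        text = unescape(TAG_RE.sub("", raw)).strip()
        if text:
            comments.append(text)
    return comments


if __name__ == "__main__":
    for comment in extract_comments(SAMPLE_HTML):
        print(comment)
```

A regular expression like this is brittle when the markup changes, which is one reason a parser-based approach may be worth considering.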
MySQL is required (for example, you may manage it with MySQL Workbench). The following Python modules are used:
- importlib
- sys
- time
- requests
- lxml
- pymysql
- jieba
- PIL
- wordcloud
- snownlp
- logging
- configparser
- random
- codecs
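Before running the tools, it may help to check which of the modules above are importable in your environment. The sketch below uses only the standard library; the `find_missing` helper and the `REQUIRED_MODULES` name are hypothetical and not part of this repository. Note that `PIL` is provided by the Pillow package.

```python
import importlib.util

# Module names copied from the list above.
REQUIRED_MODULES = [
    "importlib", "sys", "time", "requests", "lxml", "pymysql", "jieba",
    "PIL", "wordcloud", "snownlp", "logging", "configparser", "random",
    "codecs",
]


def find_missing(modules):
    """Return the top-level module names that cannot be located."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are available.")
```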
BibTeX reference format:
@misc{WeiboAnalyst,
title={Weibo-Analyst: An Open-Source Python Library for Social Media Comments Analysis},
url={https://github.com/KimMeen/Weibo-Analyst},
author={Ming Jin},
year={2018}
}