The PyPI distribution is updated for important releases. During the development phase, that means roughly once a week.
$ mkdir e
$ virtualenv e/py
$ source e/py/bin/activate
(py)$ pip install wekeypedia
(py)$ python -m nltk.downloader punkt wordnet maxent_treebank_pos_tagger
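To confirm that the steps above worked, a short script run inside the virtualenv can check that the modules import and that the NLTK data packages were downloaded. This is a minimal sketch written for Python 3.4+, not part of wekeypedia: the helper names `missing_modules` and `missing_nltk_data` are hypothetical, and the resource paths assume NLTK's usual `tokenizers/`, `corpora/` and `taggers/` data layout.

```python
import importlib.util

# Modules the install steps should make importable.
REQUIRED_MODULES = ["wekeypedia", "nltk"]

# Data fetched by `python -m nltk.downloader ...`, assuming NLTK's usual
# directory layout (tokenizers/, corpora/, taggers/).
REQUIRED_NLTK_DATA = [
    "tokenizers/punkt",
    "corpora/wordnet",
    "taggers/maxent_treebank_pos_tagger",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_nltk_data(resources):
    """Return the NLTK resources that nltk.data.find() cannot locate.

    If NLTK itself is not installed, every resource is reported missing.
    """
    try:
        import nltk
    except ImportError:
        return list(resources)
    missing = []
    for resource in resources:
        try:
            nltk.data.find(resource)
        except LookupError:
            missing.append(resource)
    return missing


if __name__ == "__main__":
    print("missing modules:", missing_modules(REQUIRED_MODULES) or "none")
    print("missing NLTK data:", missing_nltk_data(REQUIRED_NLTK_DATA) or "none")
```

If either line reports something missing, re-running the corresponding `pip install` or `nltk.downloader` command is the first thing to try.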
If you need an up-to-the-minute version, you might want to use the GitHub master version. It is highly unstable: you get work-in-progress features along with their bugs and bugfixes in real time.
$ mkdir e
$ virtualenv e/py
$ source e/py/bin/activate
(py)$ pip install
(py)$ python -m nltk.downloader punkt wordnet maxent_treebank_pos_tagger
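import wekeypedia

p = wekeypedia.WikipediaPage("Pi")      # the Wikipedia article to analyse
content = p.get_revision()              # fetch the page content
print(content)
diff = p.get_diff()                     # fetch a revision diff for the page
plusminus = p.extract_plusminus(diff)   # split the diff into added/removed text
print(p.count_stems([content]))         # count word stems over a list of texts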
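As a rough, library-independent illustration of the plus/minus idea in the example above (this is not wekeypedia's implementation, and it assumes `extract_plusminus` separates added from removed material), the standard library's `difflib.ndiff` can compare two revision texts line by line, and `collections.Counter` can tally words on each side. The helpers `plus_minus` and `count_words` are hypothetical, and plain lowercased words stand in for stems; a real stemmer would merge inflected forms.

```python
import difflib
import re
from collections import Counter

WORD = re.compile(r"\w+")


def plus_minus(old_text, new_text):
    """Split a line diff between two revisions into added and removed lines."""
    plus, minus = [], []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        tag, text = line[:2], line[2:]
        if tag == "+ ":
            plus.append(text)
        elif tag == "- ":
            minus.append(text)
        # "  " (unchanged) and "? " (intraline hint) lines are ignored.
    return plus, minus


def count_words(texts):
    """Count lowercased words over a list of texts (no stemming applied)."""
    counts = Counter()
    for text in texts:
        counts.update(word.lower() for word in WORD.findall(text))
    return counts


if __name__ == "__main__":
    old = "Pi is a constant.\nIt is irrational."
    new = "Pi is a mathematical constant.\nIt is irrational.\nIt is transcendental."
    plus, minus = plus_minus(old, new)
    print("added:", plus)
    print("removed:", minus)
    print("most common added words:", count_words(plus).most_common(3))
```

Working at line granularity keeps the sketch short; an edited sentence shows up as one removed line plus one added line rather than as a word-level change.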
You can explore how the library is currently used by looking at the scripts we use to build various datasets:
- takes a file with a list of Wikipedia pages
- retrieves the hyperlink network
- stores it in networkx format
- takes a pre-existing dataset and fetches page contents, revision logs and page view statistics
- this script produces the data for our data science Python notebooks. It is mainly exploratory work to find new metrics
- takes a pre-existing file-based dataset and produces block representations for the synchronology data visualization
- fetches contents and data for the analysis of Wikipedia pages involved in current events concerning Crimea, Ukraine and Russia
- produces the dataset for the ski slopes UI prototype
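The first script in the list turns a file of page titles into a hyperlink network. The sketch below shows one way to approach that with the standard library only; it is not the repository's script. It assumes the public MediaWiki API endpoint `https://en.wikipedia.org/w/api.php` with `action=query&prop=links` to list a page's outgoing links, and writes a tab-separated edge list that `networkx.read_edgelist` can load with a tab delimiter and `create_using=networkx.DiGraph`. The helper names `read_titles`, `links_query_url` and `write_edgelist` are hypothetical, and the HTTP request, JSON parsing and API continuation are left out.

```python
import io
import sys
from urllib.parse import urlencode

# Assumed endpoint: the public English Wikipedia MediaWiki API.
API_URL = "https://en.wikipedia.org/w/api.php"


def read_titles(lines):
    """Read page titles, one per line, skipping blank lines."""
    return [line.strip() for line in lines if line.strip()]


def links_query_url(title):
    """Build a MediaWiki API query URL listing the outgoing links of one page."""
    params = {
        "action": "query",
        "prop": "links",
        "titles": title,
        "pllimit": "max",  # as many links per request as the API allows
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def write_edgelist(links_by_page, out):
    """Write one 'source<TAB>target' line per hyperlink.

    Page titles may contain spaces, so a tab delimiter keeps them intact
    when the file is read back with networkx.read_edgelist.
    """
    for source in sorted(links_by_page):
        for target in sorted(links_by_page[source]):
            out.write(source + "\t" + target + "\n")


if __name__ == "__main__":
    titles = read_titles(io.StringIO("Pi\n\nEuler's identity\n"))
    for title in titles:
        print(links_query_url(title))

    # Hypothetical, hand-written links standing in for parsed API responses.
    links = {"Pi": ["Circle", "Euler's identity"], "Euler's identity": ["Pi"]}
    write_edgelist(links, sys.stdout)
```

Keeping the output as a plain edge list decouples fetching from graph analysis: the file can be inspected by hand and loaded into networkx (or another graph tool) later.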
$ virtualenv e/py
$ source e/py/bin/activate
(py)$ pip install -r requirements.txt