This repository has been archived by the owner on Nov 2, 2023. It is now read-only.
a python library for wikipedia information retrieval and extraction + digital humanities computing

WeKeyPedia python toolkit

installation

using virtualenv

The PyPI distribution is updated on important releases. During the development phase, this happens approximately every week.

$ mkdir e
$ virtualenv e/py
$ source e/py/bin/activate
(py)$ pip install wekeypedia
(py)$ python -m nltk.downloader punkt wordnet maxent_treebank_pos_tagger

using development version

If you need the very latest changes, you can install the GitHub master version. It is highly unstable: you get work-in-progress features, their bugs, and their bugfixes in real time.

$ mkdir e
$ virtualenv e/py
$ source e/py/bin/activate
(py)$ pip install https://github.com/wekeypedia/toolkit-python/archive/master.zip
(py)$ python -m nltk.downloader punkt wordnet maxent_treebank_pos_tagger

usage

get the current content of a page

import wekeypedia

p = wekeypedia.WikipediaPage("Pi")
content = p.get_revision()

print(content)

parse diff result

diff = p.get_diff()
plusminus = p.extract_plusminus(diff)

p.print_plusminus_overview(plusminus)
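
To give a feel for what an added/removed view of a diff contains, here is a minimal standalone sketch built on the standard library's `difflib`. It does not use wekeypedia and is not meant to describe how `extract_plusminus` works internally. The helper name `split_plusminus` and the sample sentences are hypothetical.

```python
import difflib


def split_plusminus(old_text, new_text):
    """Return the words added and removed between two texts (hypothetical helper)."""
    old_words = old_text.split()
    new_words = new_text.split()
    added, removed = [], []
    # ndiff prefixes each token with "+ ", "- ", "  " or "? "
    for token in difflib.ndiff(old_words, new_words):
        if token.startswith("+ "):
            added.append(token[2:])
        elif token.startswith("- "):
            removed.append(token[2:])
    return {"added": added, "removed": removed}


# made-up sample revisions, not real Wikipedia content
result = split_plusminus("pi is about 3.14", "pi is approximately 3.14159")
print(result)
```

Comparing word by word instead of line by line keeps small edits inside a sentence visible, which is usually what matters when looking at revision history.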

count stems of a page

print(p.count_stems([content]))
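
For readers unfamiliar with stem counting, the idea can be sketched without any dependency. The snippet below is only an illustration: it uses a deliberately crude suffix stripper as a stand-in for a real stemmer (such as the ones shipped with NLTK) and is not the library's implementation. The names `crude_stem` and `count_crude_stems` are hypothetical, and taking a list of texts as input is an assumption made to mirror the call above.

```python
import re
from collections import Counter

# illustrative only: far cruder than a real stemming algorithm
SUFFIXES = ("ing", "ed", "s")


def crude_stem(word):
    """Strip one known suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_crude_stems(texts):
    """Count stems over a list of texts (assumed input shape)."""
    counts = Counter()
    for text in texts:
        # lowercase and keep alphabetic runs only
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(crude_stem(word) for word in words)
    return counts


stems = count_crude_stems(["Circles and circle areas", "computing computed"])
print(stems.most_common())
```

Grouping words by stem lets inflected forms such as "computing" and "computed" count toward the same entry.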

examples and macros

You can explore how the library is currently used by looking at the examples and macros we use to build various datasets.

using virtualenv

$ virtualenv e/py
$ source e/py/bin/activate
(py)$ pip install -r requirements.txt
