
Frequency analysis of letters, words and n-tuples.


bjayaram/wordsworth


wordsworth

Frequency analysis of letters, words and arbitrary-length n-tuples of words.

### Basic wordsworth

#### Example 1: Print the top 50 n-words in textfile.txt

$ python wordsworth --filename textfile.txt --top 50
$ python wordsworth -f textfile.txt -t 50
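For readers curious about what the letter and word counts behind `--top` involve, here is a minimal Python sketch using `collections.Counter`. It is not wordsworth's own code: the helper name `top_frequencies` and the simple lowercase, whitespace-split tokenisation are assumptions for illustration.

```python
from collections import Counter


def top_frequencies(text, top=50):
    """Return the `top` most common letters and words in `text`.

    Hypothetical helper: lowercases the text and splits words on
    whitespace, which may differ from how wordsworth tokenises.
    """
    lowered = text.lower()
    letters = Counter(ch for ch in lowered if ch.isalpha())
    words = Counter(lowered.split())
    return letters.most_common(top), words.most_common(top)


if __name__ == "__main__":
    top_letters, top_words = top_frequencies(
        "the cat sat on the mat and the dog sat too", top=3
    )
    print(top_letters)
    print(top_words)  # [('the', 3), ('sat', 2), ('cat', 1)]
```

Ties in `most_common` keep first-seen order, so with a small `top` value the cut-off between equally frequent items depends on where they first appear in the text.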

#### Example 2: Print the top n-tuples of up to 10 words in textfile.txt

$ python wordsworth --filename textfile.txt --ntuple 10
$ python wordsworth -f textfile.txt -n 10
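Counting n-tuples can be sketched by sliding a window of n words across the text for each length up to the `--ntuple` limit. This is an assumed approach, not necessarily the script's: the helper `ntuple_frequencies` is hypothetical, and starting at n = 2 (treating single words as counted separately) is also an assumption.

```python
from collections import Counter


def ntuple_frequencies(words, max_n=10):
    """Count every run of n consecutive words for n = 2..max_n.

    Hypothetical helper: returns a dict mapping n to a Counter of n-word tuples.
    """
    counts = {}
    for n in range(2, max_n + 1):
        # zip over n staggered slices yields each window of n consecutive words
        windows = zip(*(words[i:] for i in range(n)))
        counts[n] = Counter(windows)
    return counts


if __name__ == "__main__":
    counts = ntuple_frequencies("to be or not to be".split(), max_n=3)
    print(counts[2].most_common(1))  # [(('to', 'be'), 2)]
```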

#### Example 3: Ignore the words 'the', 'a' and '--'.

$ python wordsworth --filename textfile.txt --ignore the,a,--
$ python wordsworth -f textfile.txt -i the,a,--
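A comma-separated `--ignore` value maps naturally onto a set of words to drop before counting. The sketch below assumes the value is split on commas and matched exactly against each word; both helper names are hypothetical.

```python
def parse_ignore_list(value):
    """Turn a comma-separated --ignore value into a set of words.

    Hypothetical helper: assumes a plain comma split with exact matching.
    """
    return set(value.split(",")) if value else set()


def remove_ignored(words, ignore):
    """Drop every word that appears in the ignore set."""
    return [word for word in words if word not in ignore]


if __name__ == "__main__":
    ignore = parse_ignore_list("the,a,--")
    print(remove_ignored("the cat -- a hat".split(), ignore))  # ['cat', 'hat']
```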

#### Example 4: Ignore just '--'.

$ python wordsworth --filename textfile.txt --ignore ,--
$ python wordsworth -f textfile.txt -i ,--
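If the `--ignore` value is split on commas (an assumption about the script's parsing), the leading comma in Example 4 is harmless: it only contributes an empty entry, and whitespace tokenisation never produces an empty word for it to match.

```python
# Splitting ",--" on commas yields an empty string followed by "--".
parts = ",--".split(",")
print(parts)  # ['', '--']

# str.split() with no separator never yields empty tokens, so the empty
# entry matches nothing and only "--" is effectively ignored.
ignore = set(parts)
tokens = "wait -- what".split()
print([token for token in tokens if token not in ignore])  # ['wait', 'what']
```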

### NLTK-enabled wordsworth

wordsworth-nltk.py provides extended analysis, including a frequency analysis of verbs, nouns, adjectives, pronouns and other parts of speech. To run this script you will need to install the Python Natural Language Toolkit (NLTK), the Brown dataset, which is used for token tagging, and the Punkt tokenizer models. All three are simple to install.
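The part-of-speech breakdown can be pictured as a grouping step over (word, tag) pairs. The sketch below assumes Brown-corpus tags and a hypothetical prefix mapping (`NN` nouns, `VB` verbs, `JJ` adjectives, `PP` personal pronouns); wordsworth-nltk.py may tag and group words differently. The tagging itself is left to NLTK, as the comment in the example suggests, so the counting logic runs without it.

```python
from collections import Counter

# Hypothetical mapping from Brown-corpus tag prefixes to coarse word classes.
# Prefix matching lets VBD/VBG/VBN/VBZ count as verbs, NNS as nouns, and
# PPS/PPO/PP$ as pronouns.
TAG_CLASSES = {
    "NN": "noun",
    "VB": "verb",
    "JJ": "adjective",
    "PP": "pronoun",
}


def part_of_speech_frequencies(tagged_tokens):
    """Count words per coarse word class from (word, tag) pairs."""
    by_class = {name: Counter() for name in TAG_CLASSES.values()}
    for word, tag in tagged_tokens:
        if not tag:
            # Taggers such as nltk.UnigramTagger return None for unseen words.
            continue
        for prefix, name in TAG_CLASSES.items():
            if tag.startswith(prefix):
                by_class[name][word.lower()] += 1
                break
    return by_class


if __name__ == "__main__":
    # Assumed way to obtain tags with NLTK (not executed here):
    #   tagger = nltk.UnigramTagger(nltk.corpus.brown.tagged_sents())
    #   tagged = tagger.tag(nltk.word_tokenize(text))
    tagged = [("The", "AT"), ("dog", "NN"), ("barked", "VBD"),
              ("at", "IN"), ("him", "PPO"), ("loudly", None)]
    freqs = part_of_speech_frequencies(tagged)
    print(freqs["verb"])  # Counter({'barked': 1})
```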

Step 1. Install NLTK

$ sudo pip install nltk

Step 2. Launch the Python interpreter

$ python

Step 3. Download the Brown dataset and the Punkt tokenizer models

>>> import nltk
>>> nltk.download('brown')
>>> nltk.download('punkt')
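Once the three steps above are done, a quick check can confirm that NLTK and both data packages are visible. This is a hypothetical helper built on `importlib.util.find_spec` and `nltk.data.find`, which raises `LookupError` for resources it cannot locate; the helper reports what is still missing instead of failing.

```python
import importlib.util


def missing_nltk_resources():
    """Return the names of NLTK pieces that still need installing.

    Hypothetical helper: checks for the nltk package itself, then for the
    Brown corpus and Punkt tokenizer data.
    """
    if importlib.util.find_spec("nltk") is None:
        return ["nltk"]
    import nltk

    missing = []
    for path, package in (("corpora/brown", "brown"), ("tokenizers/punkt", "punkt")):
        try:
            nltk.data.find(path)
        except LookupError:
            missing.append(package)
    return missing


if __name__ == "__main__":
    print(missing_nltk_resources() or "NLTK and its data are ready")
```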

### Example output

[Example output screenshots not reproduced]
