wordsworth

Frequency analysis of letters, words and arbitrary-length n-tuples of words.

###Basic wordsworth:

####Example 1: Print the top 50 n-words in textfile.txt

$ python wordsworth --filename textfile.txt --top 50

$ python wordsworth -f textfile.txt -t 50

####Example 2: Print the top n-tuples of up to 10 words in textfile.txt

$ python wordsworth --filename textfile.txt --ntuple 10

$ python wordsworth -f textfile.txt -n 10
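To make the n-tuple counting concrete, here is a minimal sketch of the idea: slide a window of n words over the text and count each window. It is not taken from wordsworth's source; the function name `count_ntuples` and the tokenising rule (lowercase, letters and apostrophes only) are assumptions made for illustration.

```python
import re
from collections import Counter


def count_ntuples(text, max_n):
    """Return a dict mapping each tuple length 1..max_n to a Counter of word tuples."""
    # Assumed tokenising rule: lowercase, keep runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = {}
    for n in range(1, max_n + 1):
        # zip over n shifted copies of the word list yields every window of n words.
        windows = zip(*(words[i:] for i in range(n)))
        counts[n] = Counter(" ".join(window) for window in windows)
    return counts


sample = "the cat sat on the mat and the cat slept"
result = count_ntuples(sample, 2)
print(result[1].most_common(1))  # [('the', 3)]
print(result[2].most_common(1))  # [('the cat', 2)]
```

Keeping one `Counter` per tuple length makes it easy to print a separate "top N" table for each length, in the spirit of combining `--ntuple` with `--top`.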

####Example 3: Ignore the words 'the', 'a' and '--'.

$ python wordsworth --filename textfile.txt --ignore the,a,--

$ python wordsworth -f textfile.txt -i the,a,--

####Example 4: Ignore just '--'.

$ python wordsworth --filename textfile.txt --ignore ,--

$ python wordsworth -f textfile.txt -i ,--
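The ignore list in Examples 3 and 4 is a single comma-separated value. The sketch below shows one way such a value could be parsed and applied; `parse_ignore_list` and `filter_words` are hypothetical names, not functions from wordsworth. The leading comma in Example 4 presumably keeps a bare `--` from being read as an option terminator by the argument parser; that reading is an assumption, and the sketch simply drops the empty entry the comma produces.

```python
def parse_ignore_list(raw):
    """Split a comma-separated ignore value into a set, dropping empty entries."""
    return {item for item in raw.split(",") if item}


def filter_words(words, ignore):
    """Return the words that are not in the ignore set, preserving order."""
    return [word for word in words if word not in ignore]


words = ["the", "--", "a", "quick", "fox"]
print(filter_words(words, parse_ignore_list("the,a,--")))  # ['quick', 'fox']
print(filter_words(words, parse_ignore_list(",--")))       # ['the', 'a', 'quick', 'fox']
```

Using a set keeps each membership test constant-time, which matters when filtering a text the size of a novel.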

###NLTK-enabled wordsworth:

wordsworth-nltk.py provides extended analysis, including a frequency analysis of verbs, nouns, adjectives, pronouns, etc. To run this script you will need to install the Python Natural Language Toolkit (NLTK) and the Brown dataset, which is used for token tagging. Fortunately, both are simple to install.

Step 1. Install NLTK

$ sudo pip install nltk

Step 2. Launch the Python interpreter

$ python

Step 3. Download the Brown dataset and the Punkt tokenizer data

>>> import nltk
>>> nltk.download('brown')
>>> nltk.download('punkt')
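Once tokens carry part-of-speech tags, the per-category frequency analysis amounts to grouping words by tag. The sketch below illustrates that step only, and does not call NLTK so that it runs without the downloaded data. It assumes the input is a list of `(word, tag)` pairs with Brown-style tags, like those an NLTK tagger trained on the Brown corpus would return; `count_by_category` and the prefix table are hypothetical and cover only a small subset of the tag set.

```python
from collections import Counter, defaultdict

# Illustrative subset of Brown-style tag prefixes mapped to coarse categories.
CATEGORY_PREFIXES = [
    ("NN", "noun"),
    ("VB", "verb"),
    ("JJ", "adjective"),
    ("PP", "pronoun"),
]


def count_by_category(tagged_words):
    """Group (word, tag) pairs into per-category word frequency counters."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        # Some taggers return None for unknown tokens; skip those.
        if not tag:
            continue
        for prefix, category in CATEGORY_PREFIXES:
            if tag.startswith(prefix):
                counts[category][word.lower()] += 1
                break
    return counts


tagged = [
    ("The", "AT"), ("dog", "NN"), ("chased", "VBD"), ("it", "PPO"),
    ("and", "CC"), ("the", "AT"), ("small", "JJ"), ("dog", "NN"),
    ("ran", "VBD"), ("zzz", None),
]
by_category = count_by_category(tagged)
print(by_category["noun"].most_common(1))  # [('dog', 2)]
print(sorted(by_category["verb"]))         # ['chased', 'ran']
```

Matching on tag prefixes lets variants such as `NNS` or `VBD` fall into the same coarse category without listing every tag.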

###Example output:
