Releases: anuragkumarak95/wordnet
WordNet BETA release 2
WordNet
Create a simple network of words related to each other using the Twitter Streaming API.
Major parts of this project:
- Streamer: ~/twitter_streaming.py
- TF-IDF Gene: ~/wordnet/tf_idf_generator.py
- NN Words Gene: ~/wordnet/nn_words.py
- NETWORK Gene: ~/wordnet/word_net.py
Using Streamer Functionality
- Unzip the source and run `$ pip install -r requirements.txt` in the root directory, and you will be ready to go.
- Go to the root dir (~) and create a config.py file with the details mentioned below:

```python
# Variables that contain the user credentials to access the Twitter Streaming API.
# This link will help you: http://socialmedia-class.org/twittertutorial.html
access_token = "xxx-xx-xxxx"
access_token_secret = "xxxxx"
consumer_key = "xxxxxx"
consumer_secret = "xxxxxxxx"
```
- Run the Streamer with an array of filter words that you want to fetch tweets on, e.g. `$ python twitter_streaming.py hello hi hallo namaste > data_file.txt`. This will save the words from the tweets filtered by the args, line by line, in `data_file.txt`.
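For orientation only, here is a minimal sketch of what a streamer along these lines could look like with tweepy 3.5.0 (from requirements.txt) and the config.py credentials above; the class name and word-splitting logic are illustrative assumptions, not the actual contents of twitter_streaming.py.

```python
# Illustrative sketch only -- not the repository's twitter_streaming.py.
import sys

import tweepy

from config import access_token, access_token_secret, consumer_key, consumer_secret


class WordListener(tweepy.StreamListener):
    """Print every word of each matching tweet on its own line (hypothetical)."""

    def on_status(self, status):
        for word in status.text.split():
            print(word)

    def on_error(self, status_code):
        if status_code == 420:  # rate limited: stop the stream
            return False


if __name__ == '__main__':
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = tweepy.Stream(auth=auth, listener=WordListener())
    stream.filter(track=sys.argv[1:])  # filter words passed on the command line
```

Redirecting stdout to a file, as in the example command, then gives a line-by-line word file for the TF-IDF step.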
Using WordNet Module
- Unzip the source and install the wordnet module using `$ python setup.py install`.
- To create a TF-IDF structure file for every doc, use:

```python
from wordnet import find_tf_idf

df, tf_idf = find_tf_idf(
    file_names=['file/path1', 'file/path2', ...],     # paths of the files to be processed (create them using twitter_streaming.py)
    prev_file_path='prev/tf/idf/file/path.tfidfpkl',  # previous TF-IDF file to build on; format standard is .tfidfpkl. default=None
    dump_path='path/to/dump/file.tfidfpkl'            # dump path if the TF-IDF needs to be dumped; format standard is .tfidfpkl. default=None
)
'''
If no file is provided via the prev_file_path parameter, a new TF-IDF file is generated;
otherwise the TF-IDF values are combined with the previous file. They are dumped at dump_path
if mentioned, else the function only returns the new tf_idf list of dictionaries and the df dictionary.
'''
```
- To use the NN Word Gene of this module, simply use wordnet.find_knn:

```python
from wordnet import find_knn

words = find_knn(
    tf_idf=tf_idf,        # this tf_idf is returned by find_tf_idf() above
    input_word='german',  # a word for which the k nearest neighbours are required
    k=10,                 # k = number of neighbours required. default=10
    rand_on=True          # rand_on = whether to randomly skip a few words or show the initial k words. default=True
)
'''
This function returns a list of words closely related to the provided input_word, referring to the
tf_idf variable passed to it. Either use find_tf_idf() to obtain this variable or pickle.load() a
dump file written by the same function at your chosen directory; the file contains two lists in the
format (idf, tf_idf).
'''
```
- To create a Word Network, use:

```python
from wordnet import generate_net

word_net = generate_net(
    df=df,                         # this df is returned by find_tf_idf() above
    tf_idf=tf_idf,                 # this tf_idf is returned by find_tf_idf() above
    dump_path='path/to/dump.wrnt'  # path to dump the generated files; format standard is .wrnt. default=None
)
'''
This function returns a list of Word entities.
'''
```
- To retrieve a Word Network, use:

```python
from wordnet import retrieve_net

word_net = retrieve_net(
    'path/to/network.wrnt'  # path to the network file; format standard is .wrnt
)
'''
This function returns a list of Word entities.
'''
```
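Putting the documented calls together, a minimal end-to-end usage could look like the sketch below; the file paths and the query word 'german' are placeholders, and the pickle-loading step only reflects what the docstring above says about the .tfidfpkl format.

```python
# End-to-end sketch using only the functions documented above; paths are placeholders.
import pickle

from wordnet import find_tf_idf, find_knn, generate_net, retrieve_net

# 1. Build the TF-IDF structures from one or more streamed data files and dump them.
df, tf_idf = find_tf_idf(file_names=['data_file.txt'], dump_path='data_file.tfidfpkl')

# 2. Ask for words closely related to a given input word.
print(find_knn(tf_idf=tf_idf, input_word='german', k=10, rand_on=True))

# 3. Generate a word network, dump it, and load it back later.
generate_net(df=df, tf_idf=tf_idf, dump_path='word_net.wrnt')
word_net = retrieve_net('word_net.wrnt')

# Instead of recomputing, the .tfidfpkl dump can be reloaded; per the docstring above
# it contains two lists in the format (idf, tf_idf).
with open('data_file.tfidfpkl', 'rb') as f:
    dump = pickle.load(f)
```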
Test Run
To run a formal test, simply run `python test.py`; the module will return 0 if everything worked as expected. test.py uses the sample data provided here and runs unittests on find_tf_idf(), find_knn() & generate_net().
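As a rough idea of what such a test could check (the repository's actual test.py and its sample-data paths are not shown here, so the file name below is a placeholder):

```python
# Hypothetical sketch of a test in the spirit of test.py; 'sample_data.txt' is a placeholder.
import unittest

from wordnet import find_tf_idf, find_knn, generate_net


class WordnetTest(unittest.TestCase):
    def test_pipeline(self):
        df, tf_idf = find_tf_idf(file_names=['sample_data.txt'])
        self.assertTrue(tf_idf)                # TF-IDF structures were built
        words = find_knn(tf_idf=tf_idf, input_word='hello', k=5)
        self.assertLessEqual(len(words), 5)    # at most k neighbours come back
        word_net = generate_net(df=df, tf_idf=tf_idf)
        self.assertIsInstance(word_net, list)  # a list of Word entities


if __name__ == '__main__':
    unittest.main()
```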
by @Anurag
initial beta version
WordNet V0.0.1-BETA
python3 is used as of this release.
requirements (use pip3):
- tweepy==3.5.0
- colorama==0.3.9
- urllib3==1.22
Three major parts are in this release:
- Streamer: twitter_streaming.py
- TF-IDF Gene: tf_idf_generator.py
- NN Words Gene: nn_words.py
Way to go:
- Run the Streamer with an array of filter words that you want to fetch tweets on, e.g. `$ python3 twitter_streaming.py hello hi hallo namaste > data_file.txt`. This will save the words from the tweets filtered by the args, line by line, in `data_file.txt`.
- Run the TF-IDF Gene to generate a TF-IDF file for further processing, e.g. `$ python3 tf_idf_generator.py -d data_file.txt`. This will generate a data_file.txt.tfidfpkl file at the same path as data_file.txt (a generic TF-IDF refresher follows these steps). Note: the current release generates a very large file in this process; I am working on it. 👍
- Run the NN Words Gene to finally generate words related to a specified word from the given file, e.g. `$ python3 nn_words.py -f data_file.txt.tfidfpkl -w hello`. This will output a list of words closely related to the hello word provided in the command, by looking at the given data_file.txt.tfidfpkl file.
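As background on the TF-IDF step (a generic refresher, not the implementation inside tf_idf_generator.py): each word is weighted by how often it appears in a document, scaled down by how many documents contain it at all.

```python
# Generic TF-IDF refresher -- unrelated to the project's internal implementation.
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of word lists, one inner list per document."""
    n_docs = len(docs)
    # document frequency: number of documents each word appears in
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return df, scores


df, scores = tf_idf([['hello', 'world', 'hello'], ['hello', 'there']])
print(scores[0]['world'])  # 'world' is unique to doc 0, so it gets a positive weight
```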
Steps 1 & 2 only need to be done once; repeat Step 3 as often as you like.
I have provided a data_bank/data_v1.tfidfpkl file for testing, so people who do not want to spend time loading new data can enjoy the NN Words Gene right away. 🥇
Have fun...
Developed by -
Anurag Kumar