
Releases: anuragkumarak95/wordnet

WordNet BETA release 2

18 Sep 12:44
Pre-release

WordNet

Create a simple network of related words using the Twitter Streaming API.

Made with Python 3.5.

Major parts of this project:

  • Streamer : ~/twitter_streaming.py
  • TF-IDF Gene : ~/wordnet/tf_idf_generator.py
  • NN words Gene : ~/wordnet/nn_words.py
  • NETWORK Gene : ~/wordnet/word_net.py

Using Streamer Functionality

  1. Unzip the source and run '$pip install -r requirements.txt' in bash from the root directory, and you will be ready to go.

  2. In the root directory (~), create a config.py file with the details below:

    # Variables that contain the user credentials for accessing the Twitter Streaming API.
    # This tutorial explains how to obtain them: http://socialmedia-class.org/twittertutorial.html
    access_token = "xxx-xx-xxxx"
    access_token_secret = "xxxxx"
    consumer_key = "xxxxxx"
    consumer_secret = "xxxxxxxx"
  3. Run Streamer with a list of filter words for the tweets you want to fetch, e.g.

     $python twitter_streaming.py hello hi hallo namaste > data_file.txt

     This saves the words of the tweets matching the filter words given as arguments, line by line, to data_file.txt.
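
As a rough illustration of step 3, the sketch below shows how a streamer of this kind could be wired up with tweepy 3.x (the series pinned as tweepy==3.5.0 in the initial beta's requirements). It is not the project's twitter_streaming.py: the names words_from_text, WordPrinter and run_stream are hypothetical, and keeping only alphabetic tokens is an assumption about the output format.

```python
import re
import sys


def words_from_text(text):
    """Lowercase the text and keep alphabetic tokens only (assumed cleanup rule)."""
    return re.findall(r"[a-z]+", text.lower())


def run_stream(track_words, out=sys.stdout):
    """Write the words of every tweet matching track_words, one tweet per line.

    Assumes tweepy 3.x and a config.py like the one in step 2. Both are
    imported here so that the helper above works without them.
    """
    import tweepy
    import config

    class WordPrinter(tweepy.StreamListener):
        def on_status(self, status):
            out.write(" ".join(words_from_text(status.text)) + "\n")

        def on_error(self, status_code):
            # Returning False disconnects the stream instead of retrying.
            return False

    auth = tweepy.OAuthHandler(config.consumer_key, config.consumer_secret)
    auth.set_access_token(config.access_token, config.access_token_secret)
    tweepy.Stream(auth, WordPrinter()).filter(track=list(track_words))
```

Calling run_stream(sys.argv[1:]) from a script and redirecting its output to data_file.txt would mirror the command shown in step 3.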

Using WordNet Module

  1. Unzip the source and install the wordnet module with this command:

     $python setup.py install
    
  2. To create a TF-IDF structure file for every doc, use:

    from wordnet import find_tf_idf
    
    df, tf_idf = find_tf_idf(
    file_names=['file/path1','file/path2',..],       # paths of the files to process (create them using twitter_streaming.py)
    prev_file_path='prev/tf/idf/file/path.tfidfpkl', # previous TF-IDF file to build on, standard format is .tfidfpkl. default = None
    dump_path='path/to/dump/file.tfidfpkl'           # where to dump the TF-IDF data if needed, standard format is .tfidfpkl. default = None
    )
    
    '''
    If prev_file_path is not given, a new TF-IDF structure is generated; otherwise the new
    TF-IDF values are combined with those from the previous file. The result is dumped at
    dump_path if one is given; otherwise the function only returns the tf-idf list of
    dictionaries and the df dictionary.
    '''
  3. To use the NN Words Gene of this module, call wordnet.find_knn:

    from wordnet import find_knn
    
    words = find_knn(
    tf_idf=tf_idf,       # the tf_idf returned by find_tf_idf() above.
    input_word='german', # the word whose k nearest neighbours are required.
    k=10,                # k = number of neighbours required, default=10
    rand_on=True         # rand_on = whether to randomly skip a few words instead of returning the first k words, default=True
    )
    
    '''
    This function returns a list of words closely related to input_word, based on the
    tf_idf variable passed to it. Either call find_tf_idf() to obtain this variable, or
    pickle.load() a dump file written by that function to your chosen directory. The file
    contains 2 lists in the format (idf, tf_idf).
    '''
  4. To create a Word Network, use:

    from wordnet import generate_net
    
    word_net = generate_net(
    df=df,                          # the df returned by find_tf_idf() above.
    tf_idf=tf_idf,                  # the tf_idf returned by find_tf_idf() above.
    dump_path='path/to/dump.wrnt'   # dump_path = where to dump the generated network, standard format is .wrnt. default=None
    )
    
    '''
    This function returns a list of Word entities.
    '''
  5. To retrieve a Word Network, use:

    from wordnet import retrieve_net
    
    word_net = retrieve_net(
        'path/to/network.wrnt' # path to network file, format standard is .wrnt.
        )
    '''
    This function returns a list of Word entities.
    '''
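
To make the df and tf_idf values from step 2 concrete, here is a minimal, self-contained sketch of a textbook TF-IDF computation. It is an illustration only: tf_idf_sketch is a hypothetical name, documents are assumed to be pre-tokenised lists of words, and the weighting (term frequency times log(N / df)) is the standard formula, which may differ from what wordnet.find_tf_idf actually computes.

```python
import math
from collections import Counter


def tf_idf_sketch(docs):
    """Return (df, tf_idf) for a list of token lists.

    df     : dict mapping each word to the number of documents containing it
    tf_idf : list with one dict per document, mapping word -> TF-IDF score
    """
    df = Counter()
    for tokens in docs:
        # Count each word at most once per document.
        df.update(set(tokens))

    n_docs = len(docs)
    tf_idf = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        tf_idf.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return dict(df), tf_idf


docs = [["hello", "world"], ["hello", "namaste"]]
df, tf_idf = tf_idf_sketch(docs)
```

In this toy input "hello" occurs in both documents, so its score is zero, while "world" and "namaste" each score 0.5 * log(2) in their own document.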

Test Run

To run a formal test, simply run the test script with python test.py. It will return 0 if everything worked as expected.

test.py uses sample data provided here and executes unittest on find_tf_idf(), find_knn() & generate_net().
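
A test script of this kind can be sketched with the standard unittest module. The function under test, doc_frequency, is a hypothetical stand-in rather than part of wordnet, and run_suite simply maps the test result to the 0 / non-zero convention described above.

```python
import unittest
from collections import Counter


def doc_frequency(docs):
    """Hypothetical stand-in: number of documents each word appears in."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return dict(df)


class DocFrequencyTest(unittest.TestCase):
    def test_each_document_counts_once(self):
        docs = [["a", "a"], ["a", "b"]]
        self.assertEqual(doc_frequency(docs), {"a": 2, "b": 1})


def run_suite():
    """Run the test case and return 0 on success, 1 otherwise."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DocFrequencyTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return 0 if result.wasSuccessful() else 1
```

A script could end with sys.exit(run_suite()) so that the shell sees the same status code.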

BUILT WITH LOVE

by @Anurag

initial beta version

02 Sep 08:33
Pre-release

WordNet V0.0.1-BETA

This release uses Python 3.

Requirements (use pip3):

  1. tweepy==3.5.0
  2. colorama==0.3.9
  3. urllib3==1.22

This release has three major parts:

  1. Streamer : twitter_streaming.py
  2. TF-IDF Gene : tf_idf_generator.py
  3. NN words Gene : nn_words.py

Way to go:

  1. Run Streamer with a list of filter words for the tweets you want to fetch.
    e.g. $python3 twitter_streaming.py hello hi hallo namaste > data_file.txt
    This saves the words of the tweets matching the filter words given as arguments, line by line, to data_file.txt.

  2. Run TF-IDF Gene to generate a TF-IDF file for further processing.
    e.g. $python3 tf_idf_generator.py -d data_file.txt
    This generates a data_file.txt.tfidfpkl file at the same path as data_file.txt.
    Note: the current release generates a very large file in this step. I am working on it. 👍

  3. Run NN Words Gene to generate the words related to a specified word from the given file.
    e.g. $python3 nn_words.py -f data_file.txt.tfidfpkl -w hello
    This outputs a list of words closely related to the word hello given in the command, based on the data_file.txt.tfidfpkl file.

Steps 1 and 2 need to be done only once; repeat Step 3 as often as you like.
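
For readers curious how related words can be ranked from TF-IDF data, here is one possible approach, sketched under stated assumptions: each word is represented by its TF-IDF score in every document, and candidates are ranked by cosine similarity to the input word. The function nearest_words is hypothetical, and nn_words.py may well use a different measure.

```python
import math


def nearest_words(tf_idf, input_word, k=10):
    """Rank words by cosine similarity of their per-document TF-IDF vectors.

    tf_idf is assumed to be a list with one {word: score} dict per document.
    """
    vocab = set()
    for doc in tf_idf:
        vocab.update(doc)

    def vector(word):
        # One component per document; 0.0 where the word does not occur.
        return [doc.get(word, 0.0) for doc in tf_idf]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    target = vector(input_word)
    scored = [(cosine(target, vector(w)), w) for w in vocab if w != input_word]
    # Highest similarity first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:k]]


sample = [
    {"hello": 0.5, "hi": 0.4},
    {"hello": 0.3, "hi": 0.2, "bye": 0.1},
    {"bye": 0.9},
]
neighbours = nearest_words(sample, "hello", k=2)
```

In the sample data "hi" shares both of its documents with "hello", so it ranks ahead of "bye".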

I have provided the data_bank/data_v1.tfidfpkl file so that anyone who does not want to spend time loading new data can test and enjoy NN Words Gene right away. 🥇

Have fun...

Developed by -
Anurag Kumar