1. Move the "Raw Data" folder into the 285J_Twitter/ directory.
2. Run format_data_pandas.py to combine all text data into a single file in Python format.
3. Run preprocessor_clean.py to apply TF-IDF and NMF to the text data. It dumps the NMF factors W and H as a tuple (W, H) into a Python pickle file.
4. Run generate_topics.py to print the top words in each topic.
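
The last two steps can be illustrated with a minimal sketch. It is not the code in preprocessor_clean.py or generate_topics.py: the toy `vocabulary`, `W`, and `H` values and the `top_words` helper are hypothetical, and plain lists stand in for whatever array type the scripts actually store. It only shows how a (W, H) tuple round-trips through pickle and how the top words of each topic can be read off the rows of H.

```python
import pickle


def top_words(H, vocabulary, n_top=3):
    """Return the n_top highest-weighted words for each topic (row of H)."""
    topics = []
    for row in H:
        # Term indices sorted by weight in this topic, largest first.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        topics.append([vocabulary[j] for j in ranked[:n_top]])
    return topics


# Hypothetical toy factors: W is documents x topics, H is topics x terms.
vocabulary = ["election", "vote", "goal", "match", "league"]
W = [[0.9, 0.0], [0.1, 0.8], [0.0, 1.1]]
H = [
    [1.2, 0.9, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.8, 1.0, 0.7],
]

# Serialize the factors as a (W, H) tuple, then load them back.
blob = pickle.dumps((W, H))
W_loaded, H_loaded = pickle.loads(blob)

topics = top_words(H_loaded, vocabulary, n_top=3)
for k, words in enumerate(topics):
    print(f"Topic {k}: {', '.join(words)}")
```

Each row of H holds one topic's weights over the vocabulary, so sorting a row by weight gives that topic's most characteristic words.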



The stop word lists for Spanish, Catalan, and English are from http://www.ranks.nl/stopwords