Entropy-based binary SVM classifier library
I use entropy for positive and negative emotion classification with SVMs in many projects. These are general methods that can be applied in many circumstances.
-
Bag_of_concepts: Provided a dictionary of clustered words, it returns a Bag of Concepts vector for a corpus
-
Bag_of_words: Provided a list of words, it returns a Bag of Words vector for a corpus
-
Corpus_preprocessing: I use gensim to preprocess the corpus into a lemmatized and tokenized version of the texts.
-
Entropy: I use scikit-learn and gensim to calculate the entropy of each word in positive documents and in negative documents. Comparing the two entropies shows which words are evenly distributed in one category but not in the other, which aids classification.
-
Posi-Nega-Neutra_Tagged-Sentence-Parsing: When creating my training data, I tagged the text with XML tags for positive, negative and neutral sentences. This method parses that markup into a Python list.
-
SVM_Methods: The methods I use to analyze SVM training results, from K-fold cross-validation to weight analysis.
-
Model_methods: Methods for when I want to use machine learning models other than SVM.
-
Model_metrics: I use scikit-learn to write my own K-fold method that returns F1, Accuracy, Precision and Recall. The built-in methods usually return only Accuracy or F1.
-
Kaomoji: A library to detect kaomoji in text and convert them into numbered tags before a parser segments the text (needed for languages without spaces, such as Chinese).
-
ProjectPaths: Paths to folders inside the project, such as "data", "logs", etc. for organization.
-
UsefulMethods: A few methods I use constantly
-
Best_SVM_selection: I use this constantly to compare different SVM parameters and choose the best, so I made it easier to import.
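
To illustrate the idea behind the Entropy module, here is a minimal sketch that uses only the Python standard library rather than scikit-learn and gensim. It assumes one possible reading of the description: for each word, measure how evenly its occurrences are spread across the documents of one class, then compare the two classes. The names `word_entropy` and `entropy_gap` and the toy documents are hypothetical and are not taken from this library.

```python
import math


def word_entropy(word, docs):
    """Normalized entropy (0 to 1) of a word's occurrences across tokenized docs.

    1.0 means the word is spread evenly over all documents;
    0.0 means it is absent or concentrated in a single document.
    """
    counts = [doc.count(word) for doc in docs]
    total = sum(counts)
    if total == 0 or len(docs) < 2:
        return 0.0
    # Shannon entropy of the word's distribution over documents.
    h = sum(-(c / total) * math.log(c / total) for c in counts if c > 0)
    # Divide by the maximum possible entropy so classes of different
    # sizes can be compared.
    return h / math.log(len(docs))


def entropy_gap(word, positive_docs, negative_docs):
    """Positive-class entropy minus negative-class entropy for one word."""
    return word_entropy(word, positive_docs) - word_entropy(word, negative_docs)


# Toy, already tokenized corpora (illustrative data only).
positive_docs = [["good", "film"], ["good", "plot"], ["good", "cast"]]
negative_docs = [["good", "grief", "bad"], ["bad", "film"], ["bad", "plot"]]

for word in ("good", "bad", "film"):
    print(word, round(entropy_gap(word, positive_docs, negative_docs), 3))
```

Under this reading, a word with a large gap in either direction is evenly distributed in one category but not in the other, so it may be a useful feature for the SVM.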
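
The kind of K-fold evaluation described under Model_metrics can be sketched roughly as follows. This is not the library's code: it is written in plain Python so it runs without scikit-learn, and `binary_metrics`, `k_fold_metrics` and the `fit_predict` hook are hypothetical names. The sketch assumes binary labels, list inputs, and data that has already been shuffled, since the folds are taken as contiguous slices.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one set of binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def k_fold_metrics(X, y, fit_predict, k=5):
    """Average the four metrics over k contiguous folds.

    fit_predict(X_train, y_train, X_test) must return predicted labels;
    it is a stand-in for training a model and calling its predict method.
    """
    n = len(X)
    totals = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for i in range(k):
        start, stop = i * n // k, (i + 1) * n // k
        X_train, y_train = X[:start] + X[stop:], y[:start] + y[stop:]
        y_pred = fit_predict(X_train, y_train, X[start:stop])
        for name, value in binary_metrics(y[start:stop], y_pred).items():
            totals[name] += value
    return {name: value / k for name, value in totals.items()}


# Toy data and a trivial threshold "model" (illustrative only).
X = [-2, 3, -1, 4, -3, 5]
y = [0, 1, 0, 1, 0, 1]
scores = k_fold_metrics(X, y, lambda X_tr, y_tr, X_te: [int(x > 0) for x in X_te], k=3)
print(scores)
```

With scikit-learn, the `fit_predict` hook could wrap an SVM's `fit` and `predict` calls, which keeps the metric code independent of the model being compared.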