#Dataset Description :
""" ./Datasets/Hate_Counter_Dataset.csv --- contains the Tweet IDs of Hate users mapped with corresponding Counter users. There are in total 1290 pairs of tweets by hate and counter users.
./Models/Catboost-model-tfidf.joblib --- is the catboost model which is trained on only the tfidf vector. This model can be used for classification in case we don't have any information regarding the user, only his tweets are available.
./Models/Catboost-model.joblib --- is our best performing model which is trained on all the features.
./Models/char_vocab.pkl --- contains character level vocabulary trained on 6 million tweets.
./Models/word_vocab.pkl --- contains word level vocabulary trained on 6 million tweets.
./Models/Reproducing_Results.ipynb --- contains the code used to reproduce our results.
"""
# Code Reproduction: make sure you have all dependencies installed before running the models.
import pickle

import numpy as np
import joblib  # sklearn.externals.joblib was removed from scikit-learn; use the standalone joblib package
import preprocessor as prep  # tweet-preprocessor: Twitter text cleaning / tokenizing
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from tqdm import tqdm  # progress bar used while preprocessing tweets below
#We have got the best accuracy score on CatBoostClassifier. #You can always reproduce the results by importing the model by following commands below:
#**********************************************************************************************************************************
#Model-I """ Description : In this model we only used Tfidf Vectors (Generated from both Word and Character Vocabulary) as our features
"""
#Feature Preparation:
""" char_vocab.pkl - contains character level vocabulary trained on 6 million tweets. word_vocab.pkl - contains word level vocabulary trained on 6 million tweets.
"""
# ----------------------------------------------------------------------------
# Model I: CatBoost trained on TF-IDF vectors only (word + char vocabularies).
# Use this model when no user metadata is available, only the tweets themselves.
cbc = joblib.load('Refined_Project_Dataset/Catboost-model-tfidf.joblib')

# x_test : list containing all tweets of users
# y_test : binary class labels, 1 = Hate | 0 = Counter
with open('Refined_Project_Dataset/xw_test_tfidf.pkl', 'rb') as f:
    x_test = pickle.load(f)
with open('Refined_Project_Dataset/yw_test_tfidf.pkl', 'rb') as f:
    y_test = pickle.load(f)

# Clean each tweet individually. (Bug fix: the original passed the whole
# x_test list to prep.tokenize instead of the current tweet.)
prep_tweets = [prep.tokenize(tweet) for tweet in tqdm(x_test)]

# Pretrained vocabularies built from 6 million tweets (word- and char-level).
with open('word_vocab.pkl', 'rb') as f:
    word_vectorizer = TfidfVectorizer(vocabulary=pickle.load(f))
with open('char_vocab.pkl', 'rb') as f:
    # NOTE(review): a char-level vocabulary normally requires analyzer='char';
    # confirm against the training notebook before relying on these features.
    char_vectorizer = TfidfVectorizer(vocabulary=pickle.load(f))

# NOTE(review): TfidfVectorizer.transform needs IDF weights learned by a prior
# fit; the original code calls transform directly — verify the intended
# workflow (e.g. fit on the training corpus first) when reproducing results.
word_features = word_vectorizer.transform(prep_tweets)
char_features = char_vectorizer.transform(prep_tweets)

# Column-wise concatenation: word features first, then char features,
# matching the feature order the model was trained on.
features = np.c_[np.asarray(word_features.todense()),
                 np.asarray(char_features.todense())]

# Print the classification report of the model's performance.
# (Bug fix: the original evaluated on the raw tweet list x_test instead of
# the TF-IDF feature matrix just assembled above.)
print(classification_report(y_test, cbc.predict(features)))
print('Accuracy:', cbc.score(features, y_test))
#******************************************************************************************************************************************
""" Description : This is model is where we have got our best accuracy results on testing data
"""
""" Features Description : [Orderly] i)Tfidf : Word Vectors + Char Vectors [Orderly] ii)Lexicon Features [Empath] iii)Sentiment Features [Vader] + TextBlob iv)User History:[Order is maintained] 1)followers_count/tweet 2)favourites_count/tweet 3)friends_count/tweet 4)listed_count/tweet 5)statuses_count/tweet 6)verified
"""
# ----------------------------------------------------------------------------
# Model II: our best-performing CatBoost model, trained on ALL features
# (TF-IDF word+char vectors, Empath lexicon, Vader/TextBlob sentiment,
# and per-tweet user-history statistics).
cbc = joblib.load('Refined_Project_Dataset/Catboost-model.joblib')

# x_test : precomputed full feature matrix for the test set — presumably in
#          the feature order documented above; verify against the notebook.
# y_test : binary class labels, 1 = Hate | 0 = Counter
with open('Refined_Project_Dataset/xw_test.pkl', 'rb') as f:
    x_test = pickle.load(f)
with open('Refined_Project_Dataset/yw_test.pkl', 'rb') as f:
    y_test = pickle.load(f)

# Print the classification report of the model's performance.
print(classification_report(y_test, cbc.predict(x_test)))
print('Accuracy:', cbc.score(x_test, y_test))