Twitter_offensive_language_identification_dataset

Hate speech is a persistent problem on social media; it is used to disparage a person or a group based on their organizational affiliation or personal attributes. Although various rules and regulations have been implemented to curb it, hate speech remains prevalent on platforms such as Facebook and Twitter. Early detection of hate speech can help de-escalate crime and conflict in society.

Here, I classify the tweets in the OLID Twitter dataset (https://sites.google.com/site/offensevalsharedtask/olid) as offensive or not offensive using several widely used machine learning (ML) algorithms. I also used some of these ML models in a course project.

Logistic Regression
Ridge Regression
Support Vector Machine (SVM)
K Nearest Neighbors (K-NN)
Decision Tree
Random Forest
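
As a rough illustration of how the six models above could be set up, here is a minimal sketch using scikit-learn. The library choice, the TF-IDF features, the hyperparameters, and the helper name `build_models` are all assumptions made for this example; the notebook may configure things differently. "Ridge Regression" is mapped to `RidgeClassifier` and "SVM" to `LinearSVC`, which is also an assumption.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier


def build_models():
    """Return one text-classification pipeline per model (hypothetical helper)."""
    classifiers = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Ridge Regression": RidgeClassifier(),  # classification variant of ridge
        "SVM": LinearSVC(),  # assumed linear kernel
        "K-NN": KNeighborsClassifier(n_neighbors=5),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    # Each pipeline turns raw tweets into TF-IDF vectors (assumed features)
    # and then fits the classifier on those vectors.
    return {
        name: make_pipeline(TfidfVectorizer(), clf)
        for name, clf in classifiers.items()
    }
```

Each pipeline accepts raw tweet strings directly, so `model.fit(tweets, labels)` and `model.predict(tweets)` work without a separate vectorization step.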

About the data:
The training and test datasets are provided as follows:
Training: "olid-training-v1.0.tsv" is the original dataset, which contains the tweets and their labels for each subtask. Only subtask_a is used here; the other label columns are discarded.
Testing: "testset-levela.tsv" is the test set used to evaluate the models trained on subtask_a. It contains only the tweets; their labels are provided in a separate file named "labels-levela.csv".
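
The file layout described above could be read roughly as follows. This is a sketch using only the standard library; it assumes the TSV files have a header row with `id`, `tweet`, and (for training) `subtask_a` columns, and that the labels file has no header and holds `id,label` rows. The function names `read_training` and `read_test` are hypothetical.

```python
import csv


def read_training(tsv_file):
    """Return (tweets, labels) from the training TSV, keeping only subtask_a."""
    # QUOTE_NONE keeps quote characters inside tweets as literal text.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    tweets, labels = [], []
    for row in reader:
        tweets.append(row["tweet"])
        labels.append(row["subtask_a"])  # other subtask columns are ignored
    return tweets, labels


def read_test(tsv_file, labels_file):
    """Return (tweets, labels) by joining test tweets to labels on the tweet id."""
    # Assumed layout of the labels file: "id,label" per line, no header.
    label_by_id = {row[0]: row[1] for row in csv.reader(labels_file) if row}
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    tweets, labels = [], []
    for row in reader:
        tweets.append(row["tweet"])
        labels.append(label_by_id[row["id"]])
    return tweets, labels
```

Both functions take open file objects, for example `open("olid-training-v1.0.tsv", encoding="utf-8", newline="")`.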

On evaluation, SVM and Logistic Regression achieved the highest accuracies of all the ML models, classifying the tweets with 81% and 80% accuracy, respectively.
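
A comparison like the one above could be produced with a small loop such as the following. This is only a sketch: the helper name `compare_models` is hypothetical, it assumes each model exposes scikit-learn style `fit` and `predict` methods, and it uses `accuracy_score` as the metric, matching the accuracy figures reported here.

```python
from sklearn.metrics import accuracy_score


def compare_models(models, train_texts, train_labels, test_texts, test_labels):
    """Fit each model and return {name: accuracy}, best model first."""
    scores = {}
    for name, model in models.items():
        model.fit(train_texts, train_labels)
        predictions = model.predict(test_texts)
        # Accuracy is the fraction of test tweets labeled correctly.
        scores[name] = accuracy_score(test_labels, predictions)
    # Sort by accuracy, highest first, so the top models are listed first.
    return dict(sorted(scores.items(), key=lambda item: item[1], reverse=True))
```

Here `models` is a dictionary mapping a display name to an unfitted estimator or pipeline.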

Files in this repository:
README.md
Tweet_offense_classification.ipynb
labels-levela.csv
olid-training-v1.0.tsv
testset-levela.tsv

Repository: apandit2021/Hate_speech_identification
