- A sentiment analysis project for detecting hate speech in tweets.
- The Twitter data is anonymized: mentioned users' usernames are replaced with @user.
- For the sake of simplicity, we say a tweet contains hate speech if it has a racist or sexist sentiment associated with it.
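The anonymization described above can be illustrated with a single regular expression. This is a minimal sketch, not the procedure the dataset authors actually used. It assumes a mention is an `@` followed by one or more word characters and not preceded by a word character (so email addresses are left alone). The function name `anonymize_mentions` is hypothetical.

```python
import re

# Assumption: a mention is "@" + one or more word characters (letters, digits, "_"),
# not preceded by a word character, so "a@b.com" is not treated as a mention.
MENTION_RE = re.compile(r"(?<!\w)@\w+")

def anonymize_mentions(tweet: str) -> str:
    """Replace every @username mention with the placeholder @user (hypothetical helper)."""
    return MENTION_RE.sub("@user", tweet)

print(anonymize_mentions("@alice thanks for the link, cc @bob_42"))
# → @user thanks for the link, cc @user
```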
Dataset:
- The project is based on a dataset from Kaggle.
- Kaggle dataset page: https://www.kaggle.com/datasets/arkhoshghalb/twitter-sentiment-analysis-hatred-speech?resource=download
- The dataset must be downloaded from the Kaggle page above.
- The training data provides the full text of each tweet together with its label.
- Label '1' marks a tweet as racist/sexist and label '0' marks it as not racist/sexist. Given the labeled training tweets, the objective is to predict these labels for the tweets in the test dataset.
Contents:
- Exploratory Data Analysis
- Preprocessing and cleaning
- Bag-of-Words and TF-IDF
- Model development and evaluation
- Jupyter Notebook
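Loading the labeled tweets and checking how the two labels are balanced is a natural first step of the exploratory analysis. The standard-library sketch below assumes the downloaded training file is a CSV (for example `train.csv`) with `label` and `tweet` columns; verify the actual file and column names after downloading. The function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable

def parse_labeled_tweets(lines: Iterable[str]) -> list[tuple[str, int]]:
    """Parse CSV lines with (assumed) 'tweet' and 'label' columns into (tweet, label) pairs."""
    # Label 1 = racist/sexist, label 0 = not racist/sexist
    return [(row["tweet"], int(row["label"])) for row in csv.DictReader(lines)]

def load_labeled_tweets(path: str) -> list[tuple[str, int]]:
    """Read the training CSV from disk (path is wherever the Kaggle download was saved)."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_labeled_tweets(f)

def label_counts(pairs: list[tuple[str, int]]) -> Counter:
    """Count tweets per label to see how imbalanced the classes are."""
    return Counter(label for _, label in pairs)

# Made-up sample lines in the assumed layout, for illustration only
sample = ["id,label,tweet", "1,0,what a sunny day", "2,1,an offensive tweet", "3,0,\"hello, world\""]
print(label_counts(parse_labeled_tweets(sample)))  # → Counter({0: 2, 1: 1})
```

On the real data this would be called as `label_counts(load_labeled_tweets("train.csv"))`, adjusting the file name as needed.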
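A typical cleaning pass for these tweets removes the `@user` placeholders, URLs, and non-letter characters, then lowercases and tokenizes the text. The sketch below is one plausible set of rules, not necessarily the notebook's own: the function name `clean_tweet`, the regular expressions, and the tiny stop-word list are all assumptions. Hashtag words are kept (only the `#` is dropped), on the assumption that hashtags can carry sentiment.

```python
import re

# Hypothetical, deliberately small stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "on", "it"}

def clean_tweet(tweet: str) -> list[str]:
    """Lowercase a tweet, strip mentions, URLs and non-letters, and return its tokens."""
    text = tweet.lower()
    text = re.sub(r"@\w+", " ", text)                   # drop @user placeholders / mentions
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)               # keep only ASCII letters ("#love" -> "love")
    # Split on whitespace, dropping stop words and one-letter leftovers
    return [tok for tok in text.split() if tok not in STOP_WORDS and len(tok) > 1]

print(clean_tweet("@user when a father is dysfunctional... #run #Love http://t.co/xyz"))
# → ['when', 'father', 'dysfunctional', 'run', 'love']
```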
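Bag-of-Words represents each tweet by raw term counts over a shared vocabulary; TF-IDF reweights those counts so that terms appearing in many tweets count for less. The standard-library sketch below makes both steps explicit. It assumes each tweet has already been cleaned and split into tokens, and it uses the smoothed IDF variant idf(t) = ln((1 + n) / (1 + df(t))) + 1 (n = number of tweets, df(t) = number of tweets containing t) followed by L2 row normalization, which matches the defaults of scikit-learn's `TfidfVectorizer`; the project may use a different variant. The function names are hypothetical.

```python
import math
from collections import Counter

def build_vocabulary(docs: list[list[str]]) -> dict[str, int]:
    """Map each distinct token to a column index (sorted for a stable order)."""
    return {tok: i for i, tok in enumerate(sorted({t for doc in docs for t in doc}))}

def bag_of_words(docs: list[list[str]], vocab: dict[str, int]) -> list[list[int]]:
    """Return one raw term-count vector per document."""
    vectors = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            if tok in vocab:  # ignore tokens not seen when the vocabulary was built
                row[vocab[tok]] = count
        vectors.append(row)
    return vectors

def tf_idf(counts: list[list[int]]) -> list[list[float]]:
    """Reweight count vectors with smoothed IDF, then L2-normalize each row."""
    n = len(counts)
    n_terms = len(counts[0]) if counts else 0
    # df[j] = number of documents in which term j appears at least once
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n) / (1 + d)) + 1 for d in df]
    weighted = []
    for row in counts:
        w = [c * idf[j] for j, c in enumerate(row)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0  # avoid dividing by zero for empty rows
        weighted.append([x / norm for x in w])
    return weighted

docs = [["love", "run"], ["hate", "run"], ["love", "love"]]
vocab = build_vocabulary(docs)       # {'hate': 0, 'love': 1, 'run': 2}
counts = bag_of_words(docs, vocab)   # [[0, 1, 1], [1, 0, 1], [0, 2, 0]]
weights = tf_idf(counts)
```

In the second tweet, "hate" ends up with a larger weight than "run" because it occurs in fewer tweets. In a notebook, scikit-learn's `CountVectorizer` and `TfidfVectorizer` would typically do the same work on sparse matrices; either way, the vocabulary and IDF values should be fit on the training tweets only and then reused for the test tweets.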
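As a concrete baseline for the model-development step, the sketch below trains a multinomial Naive Bayes classifier with add-one (Laplace) smoothing on tokenized tweets. Naive Bayes is a swapped-in example chosen because it is short to write from scratch; the source does not say which model the notebook uses. The class name `NaiveBayesTweetClassifier` and the toy training data are hypothetical.

```python
import math
from collections import Counter

class NaiveBayesTweetClassifier:
    """Multinomial Naive Bayes over token lists with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[int]) -> "NaiveBayesTweetClassifier":
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior: dict[int, float] = {}
        self.log_likelihood: dict[int, dict[str, float]] = {}
        for c in self.classes:
            class_docs = [doc for doc, y in zip(docs, labels) if y == c]
            # log P(class) from the fraction of training tweets with this label
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(tok for doc in class_docs for tok in doc)
            total = sum(counts.values()) + len(self.vocab)  # add-one smoothing denominator
            # log P(token | class) for every vocabulary token
            self.log_likelihood[c] = {t: math.log((counts[t] + 1) / total) for t in self.vocab}
        return self

    def predict_one(self, doc: list[str]) -> int:
        def score(c: int) -> float:
            # Tokens never seen during training are skipped
            return self.log_prior[c] + sum(self.log_likelihood[c][t] for t in doc if t in self.vocab)
        return max(self.classes, key=score)

    def predict(self, docs: list[list[str]]) -> list[int]:
        return [self.predict_one(doc) for doc in docs]

# Toy, made-up training data: label 1 = racist/sexist, label 0 = not
train_docs = [["love", "sunny", "day"], ["happy", "love"], ["hate", "racist", "slur"], ["racist", "hate"]]
train_labels = [0, 0, 1, 1]
model = NaiveBayesTweetClassifier().fit(train_docs, train_labels)
print(model.predict([["love", "day"], ["racist", "hate", "day"]]))  # → [0, 1]
```

Working in log space avoids the numerical underflow that multiplying many small probabilities would cause on longer tweets.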
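If racist/sexist tweets are a small minority of the data (worth checking with the label counts), plain accuracy can look high even for a model that never predicts label '1'. Precision, recall, and F1 on the positive class are a more informative check. The sketch below computes them from scratch for the 0/1 labels described above; treating label 1 as the positive class and reporting F1 are assumptions about how the model is evaluated, not statements about this project's notebook. The function names are hypothetical.

```python
def precision_recall_f1(y_true: list[int], y_pred: list[int], positive: int = 1) -> tuple[float, float, float]:
    """Compute precision, recall and F1 for one class of a binary labeling."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# A "model" that always predicts 0, evaluated on a made-up 9:1 imbalanced sample
y_true = [0] * 9 + [1]
always_zero = [0] * 10
print(accuracy(y_true, always_zero))             # → 0.9
print(precision_recall_f1(y_true, always_zero))  # → (0.0, 0.0, 0.0)
```

The 90% accuracy hides the fact that not a single racist/sexist tweet was detected, which the zero F1 makes obvious. With scikit-learn, `sklearn.metrics.f1_score` computes the same binary F1.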