HateSpeechDetect-text

Project Overview

In this project, I focused on benchmarking various machine learning models, deep learning architectures, and fine-tuned BERT-based models to evaluate their performance across multiple metrics. The aim was to establish a robust and efficient framework for text classification tasks, ultimately improving the overall accuracy of predictions by 25%.

Key Highlights

Data Collection:
- Scraped data from Twitter using BeautifulSoup (BS4), collecting tweets related to a specific domain for text classification tasks.
- Preprocessed the scraped data to clean, tokenize, and transform it for effective model training.
Machine Learning Models:
- Benchmarked traditional ML models, including:
  - Logistic Regression
  - Support Vector Classifier
  - Decision Tree Classifier
  - Random Forest Classifier
  - Gradient Boosting Classifier
  - AdaBoost Classifier
  - XGBoost Classifier
Deep Learning Architectures:
- Evaluated the performance of deep learning models:
  - LSTM (Long Short-Term Memory)
  - Bi-LSTM (Bidirectional LSTM)
  - GRU (Gated Recurrent Unit)
Fine-Tuned BERT-Based Models:
- Implemented and fine-tuned transformer-based models, including:
  - BERT
  - RoBERTa
  - ALBERT
  - DistilBERT
- Employed QLoRA-based fine-tuning on the Phi-2 model to enhance its performance.
Performance Metrics:
- Evaluated models on the following metrics:
  - Accuracy
  - Precision (Macro and Weighted)
  - Recall (Macro and Weighted)
  - F1-Score (Macro and Weighted)
Achievements:
- Improved the best-performing model’s accuracy by 25% through advanced fine-tuning and hyperparameter optimization.
- Achieved consistent performance with BERT-based models, maintaining 91% accuracy across multiple datasets.

Benchmarking Results

Model	Accuracy	Precision (Macro)	Recall (Macro)	F1 (Macro)	Precision (Weighted)	Recall (Weighted)	F1 (Weighted)
Logistic Regression	0.75	0.58	0.67	0.59	0.86	0.75	0.79
Support Vector Classifier	0.75	0.54	0.60	0.55	0.83	0.75	0.78
Decision Tree Classifier	0.72	0.51	0.53	0.51	0.79	0.72	0.75
Random Forest Classifier	0.83	0.59	0.60	0.59	0.82	0.83	0.82
Gradient Boosting Classifier	0.75	0.57	0.63	0.57	0.84	0.75	0.79
AdaBoost Classifier	0.71	0.54	0.59	0.54	0.83	0.71	0.76
XGBoost Classifier	0.81	0.59	0.62	0.60	0.83	0.81	0.82
LSTM	0.73	0.57	0.54	0.51	0.85	0.73	0.77
Bi-LSTM	0.75	0.55	0.63	0.57	0.85	0.75	0.78
GRU	0.79	0.57	0.66	0.60	0.85	0.79	0.81
BERT	0.91	0.76	0.69	0.71	0.90	0.91	0.90
roBERTa	0.91	0.75	0.72	0.74	0.90	0.91	0.90
ALBERT	0.91	0.76	0.66	0.67	0.90	0.91	0.91
DistilBERT	0.91	0.79	0.73	0.75	0.91	0.91	0.91
Phi-2 (QLoRA Fine-Tuned)	0.90	0.75	0.68	0.70	0.89	0.90	0.89

Technical Tools and Frameworks

Data Collection: BeautifulSoup (BS4), Python for scraping and preprocessing.
Modeling: Scikit-learn, TensorFlow, PyTorch, and HuggingFace Transformers.
Fine-Tuning: QLoRA-based approach for parameter-efficient fine-tuning of large language models like Phi-2.
Visualization: Matplotlib and Seaborn for plotting evaluation metrics.

Results

The project demonstrated the superior performance of BERT-based models, particularly after applying QLoRA-based fine-tuning on Phi-2, which outperformed traditional machine learning and standard deep learning models across all key metrics. Random Forest and XGBoost were the top-performing ML models, while GRU showed the best results among deep learning approaches.

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
hate-speech		hate-speech
HateSpeechDetect_using_DeepLearning.ipynb		HateSpeechDetect_using_DeepLearning.ipynb
Hate_Speech_Transformers.ipynb		Hate_Speech_Transformers.ipynb
Hatespeech_Finetuning_LLM.ipynb		Hatespeech_Finetuning_LLM.ipynb
Hatespeech_detection_classifier.ipynb		Hatespeech_detection_classifier.ipynb
LICENSE		LICENSE
LLM-finetuning-subtask-1		LLM-finetuning-subtask-1
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HateSpeechDetect-text

Project Overview

Key Highlights

Benchmarking Results

Technical Tools and Frameworks

Results

About

Releases

Packages

Languages

License

devroopsaha744/HateSpeechDetect-text

Folders and files

Latest commit

History

Repository files navigation

HateSpeechDetect-text

Project Overview

Key Highlights

Benchmarking Results

Technical Tools and Frameworks

Results

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages