This repository contains a comprehensive collection of models and data for the text-based detection of German bot-generated hate speech. It also includes a metadata-based, language-agnostic dataset for bot detection on Reddit, together with the corresponding detection models. The datasets for this project were predominantly developed in-house.
- dataset
  - text-based
    - train.tsv -- bot- and human-generated German hate speech comments
    - test.tsv -- bot- and human-generated German hate speech comments
  - metadata-based
    - hate_speech_train.tsv -- hate and non-hate German comments
    - hate_speech_test.tsv -- hate and non-hate German comments
    - bot_human_metadata_train_english.tsv -- English bot and human account metadata -- contains no usernames
    - bot_human_metadata_test_english.tsv -- English bot and human account metadata -- contains no usernames
    - bot_human_metadata_test_german.tsv -- German bot and human hate speech account metadata -- contains no usernames
  - datasets_llama
    - bot_ds -- bot comments in the dataset format
    - no_bot_ds -- human comments in the dataset format
- text-based
  - models
    - helper -- preprocessing functions
      - extract_features.py -- cleaning for stylometric preprocessing
      - extract_metadata.py -- extraction of metadata features from Reddit accounts
      - fetch_user_metadata.py -- code to fetch a Reddit user's metadata (see the sketch below)
      - helper.py -- text preprocessing functions
      - light_clean.py -- cleaning for BERT preprocessing
      - predict_comment.py -- code based on a BERT-Large model to predict the authorship of a hate comment
    - text-based -- models designed for text-based hate speech bot detection
      - Bert
      - Bert-CNN
      - Bert-Style
      - Llama2
      - Style
    - requirements.txt
- metadata-based_pipeline -- models designed for metadata-based hate speech bot detection
  - bot_detector.py
  - hate_speech_detector.ipynb
  - helper -- preprocessing functions
- data_collection
  - bot_comment_generation -- generation of bot comments (training data)
  - bot_comment_generation_mixtral -- generation of bot comments with Mixtral (test data)
  - offensive_words.txt -- list of offensive words
  - get_subreddit_users.ipynb -- extraction of users from German subreddits
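For illustration, here is a minimal sketch of how Reddit account metadata could be fetched with the `praw` library. The credential placeholders, function name, and field selection are assumptions for this example, not the repository's actual `fetch_user_metadata.py`:

```python
import praw

# Placeholder credentials -- register a Reddit API app to obtain real ones.
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="bot-detection-research/0.1",
)

def fetch_user_metadata(username: str, limit: int = 100) -> dict:
    """Fetch karma counts and recent activity timestamps for one user."""
    user = reddit.redditor(username)
    comments = list(user.comments.new(limit=limit))
    posts = list(user.submissions.new(limit=limit))
    return {
        "comment_karma": user.comment_karma,
        "post_karma": user.link_karma,  # PRAW calls post karma "link karma"
        "comment_timestamps": [c.created_utc for c in comments],
        "post_timestamps": [p.created_utc for p in posts],
    }
```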
Below you find the sources used for data collection. Note that the models were tested on the outputs of an unseen LLM (Mixtral) to ensure robustness.

Human-written hate speech comments were collected from the following corpora:

Sources | Total Comments |
---|---|
DeTox | 4,504 |
RP-MOD | 2,813 |
HASOC 2019 | 543 |
GermEval-2018 (test set) | 1,598 |

Bot-generated comments were produced with the following LLMs:

Sources | Total Comments |
---|---|
GPT 3.5 | 1,600 |
GPT 4 | 1,601 |
TheBloke/em_german_13b_v01-GPTQ | 1,600 |
TheBloke/em_german_leo_mistral-GPTQ | 1,600 |
TheBloke/leo-hessianai-13B-chat-GPTQ | 1,600 |
mistralai/Mixtral-8x7B-Instruct-v0.1 (test set) | 1,599 |
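The datasets are plain TSV files and can be loaded directly with pandas; a minimal sketch (the `text` and `label` column names are an assumption -- check the actual file headers):

```python
import pandas as pd

# Paths follow the repository layout above; column names are assumed.
train = pd.read_csv("dataset/text-based/train.tsv", sep="\t")
test = pd.read_csv("dataset/text-based/test.tsv", sep="\t")

print(train.shape, list(train.columns))
print(train["label"].value_counts())  # assumed binary bot/human label
```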
We have implemented the following models for the text-based approach:

Model | F-Score |
---|---|
BERT Base | 0.974 |
BERT Large | 0.986 |
BERT Base-CNN | 0.980 |
BERT Base+Stylometric | 0.949 |
Stylometric | 0.881 |
LLM (Llama2) 7B | 0.943 |
LLM (Llama2) 13B | 0.962 |
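As a usage illustration, here is a minimal inference sketch for a fine-tuned BERT checkpoint with Hugging Face `transformers`. The checkpoint path and the label order (1 = bot) are assumptions, not the repository's actual `predict_comment.py`:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_DIR = "path/to/fine-tuned-bert"  # hypothetical checkpoint location
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR)
model.eval()

def predict_author(comment: str) -> str:
    """Classify a German comment as bot- or human-written."""
    inputs = tokenizer(comment, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return "bot" if logits.argmax(dim=-1).item() == 1 else "human"
```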
The hate speech detector was trained on open-source data (see below):

Sources | Hate | Non-Hate |
---|---|---|
DeTox | 4,504 | 7,682 |
RP-MOD | 2,813 | 3,412 |
HASOC 2019 | 543 | 5,789 |

The Reddit metadata dataset comprises 818 human and 816 bot accounts for English, plus 627 human and 9 bot accounts for German, each with the corresponding features:

Feature name | Description |
---|---|
comment_karma | Comment karma of a user |
post_karma | Post karma of a user |
comment_activity_day | Number of comments per day |
posts_activity_day | Number of posts per day |
avg_frequency_posts | Average time between posts (in seconds) |
avg_frequency_all | Average time between any two activities (in seconds) |
min_time_all | Minimum time between any two activities (in seconds) |
num_url | Proportion of links in posts and comments |
num_repeated_post | Proportion of repeated posts |
num_repeated_comment | Proportion of repeated comments |
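To make the frequency features concrete, here is a sketch of how gap-based features such as `avg_frequency_all` and `min_time_all` could be computed from UNIX activity timestamps. The definitions are assumed; the repository's `extract_metadata.py` may differ:

```python
from statistics import mean

def frequency_features(timestamps: list[float]) -> dict:
    """Derive gap-based features from UNIX timestamps (in seconds)."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not gaps:  # fewer than two activities: features are undefined
        return {"avg_frequency_all": None, "min_time_all": None}
    return {
        "avg_frequency_all": mean(gaps),  # average time between activities
        "min_time_all": min(gaps),        # minimum time between activities
    }

# Example: three activities, 60 s and 30 s apart.
print(frequency_features([0.0, 60.0, 90.0]))
# {'avg_frequency_all': 45.0, 'min_time_all': 30.0}
```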
The hate speech detector itself was implemented with Llama 3:

Test Set | F-Score |
---|---|
Test set 1 (DeTox, RP-MOD, HASOC) | 0.92 |
Test set 2 (Reddit Hate Speech) | 0.76 |
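A zero-shot prompting sketch of how a Llama 3 instruct model could be used for this classification, via the `transformers` text-generation pipeline. The model ID, prompt wording, and label parsing are assumptions; the repository may instead use a fine-tuned variant:

```python
from transformers import pipeline

# Gated model -- requires accepting Meta's license on the Hugging Face Hub.
generator = pipeline("text-generation",
                     model="meta-llama/Meta-Llama-3-8B-Instruct")

def is_hate(comment: str) -> bool:
    """Zero-shot hate speech classification of a German comment."""
    messages = [
        {"role": "system",
         "content": "Classify the following German comment as HATE or "
                    "NO_HATE. Answer with exactly one word."},
        {"role": "user", "content": comment},
    ]
    out = generator(messages, max_new_tokens=5)
    # With chat input, generated_text holds the message list incl. the reply.
    reply = out[0]["generated_text"][-1]["content"].strip().upper()
    return reply.startswith("HATE")  # "NO_HATE" does not match
```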
Metadata-based bot classification was performed with a Random Forest:

Model | F-Score (English validation set) |
---|---|
Random Forest | 0.934 |
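A minimal training sketch with scikit-learn, using the feature names from the table above; the `label` column name and its 0/1 encoding are assumptions:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

FEATURES = [
    "comment_karma", "post_karma", "comment_activity_day",
    "posts_activity_day", "avg_frequency_posts", "avg_frequency_all",
    "min_time_all", "num_url", "num_repeated_post", "num_repeated_comment",
]

df = pd.read_csv("dataset/metadata-based/bot_human_metadata_train_english.tsv",
                 sep="\t")
# Hold out part of the training data as a validation split.
X_train, X_val, y_train, y_val = train_test_split(
    df[FEATURES], df["label"], test_size=0.2, random_state=42,
    stratify=df["label"])

clf = RandomForestClassifier(n_estimators=300, random_state=42)
clf.fit(X_train, y_train)
print("Validation F-score:", f1_score(y_val, clf.predict(X_val)))
```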
To request complete metadata datasets, contact