This repository contains the Python code to reproduce the experiments presented in our paper:
An Incremental Clustering Baseline for Event Detection on Twitter.
- Installation
- Download data
- Preprocess data
- Run event detection
- Generate LaTeX table
- Plot execution time
We recommend creating a virtual environment with Python 3.8.2. Below are two ways to do so: one with conda, the other with pyenv-virtualenv.
With conda:
git clone https://github.com/medialab/twitter-incremental-clustering.git
cd twitter-incremental-clustering
conda create -n workshop python=3.8.2
conda activate workshop
pip install -U pip setuptools
pip install -r requirements.txt
With pyenv-virtualenv:
git clone https://github.com/medialab/twitter-incremental-clustering.git
cd twitter-incremental-clustering
pyenv virtualenv 3.8.2 workshop
pyenv activate workshop
pip install -U pip setuptools
pip install -r requirements.txt
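Once the environment is activated, you may want to confirm that it really runs the recommended interpreter before installing the requirements. The snippet below is a small sketch for that purpose; the helper name `python_version_matches` is hypothetical and is not part of this repository.

```python
import sys

# Interpreter version recommended in this README (hypothetical constant name).
EXPECTED_VERSION = (3, 8, 2)


def python_version_matches(expected=EXPECTED_VERSION, actual=None):
    """Return True if (major, minor, micro) of the interpreter equals `expected`.

    `actual` defaults to the running interpreter; it can be passed explicitly,
    e.g. for testing.
    """
    if actual is None:
        actual = sys.version_info[:3]
    return tuple(actual) == tuple(expected)


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if python_version_matches():
        print(f"OK: running Python {current}")
    else:
        print(f"Warning: running Python {current}, but 3.8.2 is recommended")
```

Run it with the environment's `python`; a warning usually means the virtual environment is not activated.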
We evaluate our method on two datasets, Event2012 [McMinn et al., 2013] and Event2018 [Mazoyer et al., 2020]. To download the data, follow the instructions provided by [Cao et al., 2024], then place the entire ./raw_data folder in the root folder of this repository.
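Since the scripts expect ./raw_data directly under the repository root, a quick check can save a failed preprocessing run. This is a minimal sketch: the function name `raw_data_in_place` is hypothetical, and it only checks that the folder exists, not which files it contains (the expected contents are defined by the download instructions, not here).

```python
from pathlib import Path


def raw_data_in_place(repo_root="."):
    """Return True if a raw_data directory sits directly under `repo_root`."""
    return (Path(repo_root) / "raw_data").is_dir()


if __name__ == "__main__":
    if not raw_data_in_place():
        print("raw_data/ not found: move the downloaded folder to the repository root")
```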
Preprocess the downloaded data with:
python preprocess.py
- Run event detection on the Event2018 dataset with Sentence-CamemBERT Large (GPU required):
python run_detection.py --model sbert --sub-model "dangvantuan/sentence-camembert-large" --lang fr --dataset event2018.tsv
- Run event detection on the Event2012 dataset with all-mpnet-base-v2 (GPU required):
python run_detection.py --model sbert --sub-model "sentence-transformers/all-mpnet-base-v2" --lang en --dataset event2012.tsv
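The commands above run the repository's own incremental clustering, whose details this README does not describe. For readers unfamiliar with the general idea, here is a generic sketch of threshold-based incremental clustering over sentence embeddings: each new tweet joins the cluster of its most similar earlier tweet if their cosine similarity reaches a threshold, and otherwise starts a new cluster (a new event). This is an illustration under stated assumptions, not the authors' implementation: the function names and the 0.7 threshold are hypothetical, and batching, time windows and approximate nearest-neighbour search are all left out.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def incremental_clustering(embeddings: List[Sequence[float]],
                           threshold: float = 0.7) -> List[int]:
    """Assign a cluster id to each embedding, in stream order (hypothetical sketch)."""
    labels: List[int] = []
    next_label = 0
    for i, vec in enumerate(embeddings):
        # Find the most similar item among those already seen.
        best_sim, best_j = -1.0, -1
        for j in range(i):
            sim = cosine(vec, embeddings[j])
            if sim > best_sim:
                best_sim, best_j = sim, j
        if best_j >= 0 and best_sim >= threshold:
            labels.append(labels[best_j])  # join the neighbour's cluster
        else:
            labels.append(next_label)      # open a new cluster
            next_label += 1
    return labels


print(incremental_clustering([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]))
# [0, 0, 1, 1]
```

The exhaustive comparison makes this sketch quadratic in the number of tweets; a real system on Event2012 or Event2018 would need a faster neighbour search.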
Generate the LaTeX table of results with:
python generate_table.py
The table is saved in ami_ari_metrics.tex.
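The output file name suggests that the table reports AMI (adjusted mutual information) and ARI (adjusted Rand index), two standard scores for comparing detected clusters with ground-truth events. As an illustration of the second metric, below is a minimal pure-Python ARI computed from pair counts. The function name is hypothetical, and whether generate_table.py computes the scores this way (for instance via scikit-learn's `adjusted_rand_score` and `adjusted_mutual_info_score`) is an assumption not confirmed here.

```python
from collections import Counter
from math import comb  # Python 3.8+


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("label lists must have the same length")
    total = comb(n, 2)
    if total == 0:
        return 1.0  # fewer than two items: nothing to compare
    # Pairs of items placed together in both clusterings.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs placed together in each clustering separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

ARI ignores label names, so a perfect clustering with permuted ids still scores 1.0, and a random assignment scores around 0.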
After running event detection several times with different --batch-size values, plot the effect of this parameter on AMI and execution time with:
python plot_time.py
The figure is saved in timeplot.pdf.
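The repeated runs required before plotting can be scripted. The sketch below rebuilds the Event2012 command from this README and appends --batch-size; the list of batch sizes and the helper names are hypothetical, so pick values that fit your GPU memory. It assumes plot_time.py collects whatever run_detection.py records for each run, which is not detailed here.

```python
import shlex
import subprocess

# Hypothetical batch sizes to compare.
BATCH_SIZES = [8, 16, 32, 64]


def detection_command(batch_size, dataset="event2012.tsv",
                      sub_model="sentence-transformers/all-mpnet-base-v2", lang="en"):
    """Build the run_detection.py argument list for one --batch-size value."""
    return ["python", "run_detection.py", "--model", "sbert",
            "--sub-model", sub_model, "--lang", lang,
            "--dataset", dataset, "--batch-size", str(batch_size)]


def run_sweep(batch_sizes=BATCH_SIZES):
    """Run event detection once per batch size, stopping at the first failure."""
    for size in batch_sizes:
        cmd = detection_command(size)
        print("Running:", shlex.join(cmd))  # shlex.join needs Python 3.8+
        subprocess.run(cmd, check=True)
```

Call `run_sweep()` from the repository root, then run plot_time.py. Passing the argument list directly to `subprocess.run` avoids shell quoting issues with model names.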