EvMine

The source code for the paper "Unsupervised Key Event Detection from Massive Text Corpora", published in KDD 2022.

Requirements

Python 3 and the following packages are required: numpy, scikit-learn (sklearn), python-igraph (igraph), inflect, nltk, and datefinder.
You will also need the Hugging Face transformers package if you want to compute document and phrase embeddings for your own corpus.
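To check an environment before running anything, a small script along these lines can report which of the packages above are missing. It is a sketch, not part of the repository. It looks up import names, which differ from some pip package names (scikit-learn is imported as sklearn, python-igraph as igraph).

```python
import importlib.util

# Import names of the packages listed above (not necessarily the pip names).
REQUIRED_MODULES = ["numpy", "sklearn", "igraph", "inflect", "nltk", "datefinder"]
# Only needed if you compute phrase/document embeddings yourself.
OPTIONAL_MODULES = ["transformers"]


def missing_modules(names):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing required packages:", ", ".join(missing))
    else:
        print("All required packages found.")
    optional_missing = missing_modules(OPTIONAL_MODULES)
    if optional_missing:
        print("Missing optional packages (embedding scripts only):",
              ", ".join(optional_missing))
```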

Data

The two datasets used in the paper are available here, including their original corpora, UCPhrase results, phrase embeddings, document embeddings, document publication times, and event labels (used only for evaluation). After downloading, place each dataset under the ./data/ folder.
To run on your own data, create a dataset folder and first run UCPhrase in tagging mode to mine quality phrases from the corpus. Then compute the phrase embeddings via

python phrase_emb.py \
    --data hkprotest \
    --ucphrase_res doc2sents-0.9-tokenized.id.json \
    --doc_time doc2time.txt \
    --out phrase_emb

and document embeddings via

python doc_emb.py \
    --data hkprotest \
    --ucphrase_res doc2sents-0.9-tokenized.id.json \
    --doc_time doc2time.txt \
    --out doc_emb
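Before running EvMine, it can help to confirm that a dataset folder holds the inputs the commands reference. The sketch below assumes, based on example paths in this README such as data/hkprotest/output.json, that each dataset lives in data/<dataset>/ and that file arguments are resolved relative to that folder. check_dataset is a hypothetical helper. phrase_emb is left out because the value passed to --phrase_emb may be a name prefix rather than a single file on disk.

```python
from pathlib import Path

# File names copied from the example commands in this README (assumption:
# they live directly inside data/<dataset>/). Substitute your own names.
EXPECTED_FILES = {
    "ucphrase_res": "doc2sents-0.9-tokenized.id.json",
    "doc_time": "doc2time.txt",
    "doc_emb": "doc_emb.npy",
}


def check_dataset(data_root, dataset, expected=EXPECTED_FILES):
    """Return {argument: path} for every expected input missing under data_root/dataset."""
    folder = Path(data_root) / dataset
    return {
        arg: str(folder / name)
        for arg, name in expected.items()
        if not (folder / name).exists()
    }


if __name__ == "__main__":
    missing = check_dataset("data", "hkprotest")
    print("All inputs found." if not missing else f"Missing: {missing}")
```

An empty dict means every listed input is present.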

Run EvMine

Use the following command to run EvMine; the results are saved to the corresponding dataset folder.

python EvMine.py \
    --data hkprotest \
    --ucphrase_res doc2sents-0.9-tokenized.id.json \
    --doc_time doc2time.txt \
    --doc_emb doc_emb.npy \
    --phrase_emb phrase_emb \
    --out output.json
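If you prefer to drive all steps from one script, a hypothetical wrapper like the one below chains the embedding commands and the EvMine command in order. It is a sketch, not part of the repository. It reuses only the flags and file names shown in this README: --out doc_emb is passed to doc_emb.py and --doc_emb doc_emb.npy to EvMine.py, exactly as in the commands above. The embedding steps can be skipped for the released datasets, which already include embeddings.

```python
import shlex
import subprocess
import sys

# Hypothetical helper (not part of the repository). Flags and file names are
# copied from the commands in this README; change them for your own corpus.
def pipeline_commands(dataset, ucphrase_res="doc2sents-0.9-tokenized.id.json",
                      doc_time="doc2time.txt", out="output.json",
                      compute_embeddings=True):
    """Return the argument list for each step, in the order the steps must run."""
    common = ["--data", dataset, "--ucphrase_res", ucphrase_res,
              "--doc_time", doc_time]
    steps = []
    if compute_embeddings:  # not needed for the released datasets
        steps.append([sys.executable, "phrase_emb.py", *common, "--out", "phrase_emb"])
        steps.append([sys.executable, "doc_emb.py", *common, "--out", "doc_emb"])
    steps.append([sys.executable, "EvMine.py", *common,
                  "--doc_emb", "doc_emb.npy", "--phrase_emb", "phrase_emb",
                  "--out", out])
    return steps


def run_pipeline(dataset, dry_run=True, **kwargs):
    """Print each command; also execute it unless dry_run is True."""
    for cmd in pipeline_commands(dataset, **kwargs):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises on the first failing step
```

Run it from the repository root, e.g. run_pipeline("hkprotest", compute_embeddings=False, dry_run=False). The default dry run only prints the commands, which is a cheap way to review them first.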

Evaluation

Use the following command to evaluate the key event detection results. The --eval_top argument sets k for the k-Matched measure.

python eval.py \
    --key_event_file data/hkprotest/output.json \
    --ground_truth data/hkprotest/doc2event_id.txt \
    --eval_top 5
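This README does not define the k-Matched measure, and eval.py is the authoritative implementation. As a rough illustration only, the sketch below assumes that a detected key event counts as matched when a strict majority of its top-k documents belong to one ground-truth key event. Under that assumption, precision is the fraction of detected events that are matched, and recall is the fraction of ground-truth events matched by at least one detected event. The formats of output.json and doc2event_id.txt are not described here, so the hypothetical function k_matched takes in-memory Python structures instead of parsing those files.

```python
from collections import Counter


def k_matched(detected, ground_truth, k=5):
    """Sketch of k-Matched precision/recall under an ASSUMED definition.

    detected: list of detected key events, each a list of doc IDs ranked by relevance.
    ground_truth: dict mapping doc ID -> ground-truth key event ID; documents
        that belong to no key event are simply absent.
    """
    matched_truth = set()
    n_matched = 0
    for docs in detected:
        # Count ground-truth labels among this event's top-k documents.
        labels = Counter(ground_truth[d] for d in docs[:k] if d in ground_truth)
        if not labels:
            continue
        event, count = labels.most_common(1)[0]
        if count > k / 2:  # strict majority of the top-k documents (assumption)
            n_matched += 1
            matched_truth.add(event)
    n_truth = len(set(ground_truth.values()))
    precision = n_matched / len(detected) if detected else 0.0
    recall = len(matched_truth) / n_truth if n_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    detected = [["d1", "d2", "d3"], ["d4", "d5", "d6"]]
    truth = {"d1": "A", "d2": "A", "d4": "B", "d6": "C"}
    print(k_matched(detected, truth, k=3))  # -> (0.5, 0.3333333333333333)
```

In the example, the first detected event has two of its top three documents in event A and is matched. The second event splits between B and C and is not. This gives precision 1/2 and recall 1/3.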

Citations

If you find our work useful for your research, please cite the following paper:

@inproceedings{Zhang2022EvMine,
  title={Unsupervised Key Event Detection from Massive Text Corpora},
  author={Yunyi Zhang and Fang Guo and Jiaming Shen and Jiawei Han},
  booktitle={KDD},
  year={2022}
}
