This is the official repository for the paper Paraphrase Detection: Human vs. Machine Content.
We recommend using Python 3.10 for this project.
First install the requirements:
pip install -r requirements.txt
To use GloVe and fastText, place their corresponding pre-trained word vectors in the `models` directory.
The project has multiple scripts included, each used for separate parts of the experiment.
- Parse datasets from the `datasets` folder to a unified JSON format: `parse.py`
- Create the BERT embeddings for text pairs in `true_data.json` and visualize them with t-SNE: `embedding_handler.py`
- Apply detection methods (training & testing): `detect_paraphrases.py`
- Evaluate the detection results: `evaluate.py`
- Get examples sorted by best / worst / random performance: `get_examples.py`
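To make the evaluation step concrete, here is a minimal sketch of scoring binary paraphrase predictions against gold labels. It is not the code in `evaluate.py`; the names `BinaryScores` and `score_predictions` and the choice of metrics are assumptions for illustration, and the metrics actually reported are described in the paper.

```python
from dataclasses import dataclass


@dataclass
class BinaryScores:
    precision: float
    recall: float
    f1: float
    accuracy: float


def score_predictions(labels: list[int], predictions: list[int]) -> BinaryScores:
    """Compare gold labels (1 = paraphrase, 0 = not) with predicted labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("at least one text pair is required")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)

    # Fall back to 0.0 when a denominator is empty (e.g. no positive predictions).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(labels)
    return BinaryScores(precision, recall, f1, accuracy)


if __name__ == "__main__":
    gold = [1, 1, 0, 0, 1, 0]  # toy labels, not taken from any dataset
    pred = [1, 0, 0, 1, 1, 0]
    print(score_predictions(gold, pred))
```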
Not all datasets used in the paper are freely available to the public, which is why we do not offer the prediction results on text pairs from these datasets for download. However, you are free to reproduce the experiments using all datasets from the paper once you have obtained access.
This study includes twelve datasets (seven human-generated and five machine-generated). For further information, please refer to the paper.
Human-generated datasets: ETPC, QQP, TURL, SaR, MSCOCO, ParaSCI, APH
Machine-generated datasets: MPC, SAv2, ParaNMT-50M, PAWS-Wiki, APT
We evaluated the results of our experiments in the linked paper above. However, we provide additional material here that was not used in the final version of the paper.
t-SNE visualizations of each dataset's BERT embeddings
Dataset | Acquisition Type | Mixed | Paraphrases Only |
---|---|---|---|
APH | Human | Live View | Live View |
APT | Machine | Live View | Live View |
ETPC | Human | Live View | Live View |
MPC | Machine | Live View | Live View |
MSCOCO | Human | Live View | Live View |
PAWS-Wiki | Machine | Live View | Live View |
ParaNMT-50M | Machine | Live View | Live View |
ParaSCI | Human | Live View | Live View |
QQP | Human | Live View | Live View |
SAv2 | Machine | Live View | Live View |
SaR | Human | Live View | Live View |
TURL | Human | Live View | Live View |
*All Datasets* | Mixed | Live View | Live View |
Grid Search Results
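A randomized search with 2-fold cross-validation and 25 sampled configurations, the procedure behind the results in this section, can be sketched as follows. Everything specific in this sketch is an assumption for illustration: the token-overlap detector, the `lowercase` and `bias` hyperparameters, their ranges, and the toy text pairs are not the detection methods, search spaces, or data used in the paper.

```python
import random

Pair = tuple[str, str, int]  # (text_a, text_b, label) with label 1 = paraphrase

# Toy pairs, invented for this sketch.
TOY_PAIRS: list[Pair] = [
    ("The cat sat on the mat", "the cat sat on a mat", 1),
    ("How do I learn Python", "How can I learn Python", 1),
    ("A man rides a horse", "a man is riding a horse", 1),
    ("He bought a new car", "he bought a new car yesterday", 1),
    ("The weather is nice today", "Stock prices fell sharply", 0),
    ("She plays the violin", "He repairs old cars", 0),
    ("Open the window please", "The paper was accepted", 0),
    ("Dogs bark at night", "The museum closes at five", 0),
]


def jaccard(a: str, b: str, lowercase: bool) -> float:
    """Token-overlap similarity, used here as a stand-in detection score."""
    if lowercase:
        a, b = a.lower(), b.lower()
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def fit_threshold(train: list[Pair], params: dict) -> float:
    """'Train' by placing the threshold between the mean scores of both classes."""
    pos = [jaccard(a, b, params["lowercase"]) for a, b, y in train if y == 1]
    neg = [jaccard(a, b, params["lowercase"]) for a, b, y in train if y == 0]
    if not pos or not neg:
        return 0.5 + params["bias"]  # fallback when a class is missing from the fold
    midpoint = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return midpoint + params["bias"]


def accuracy(test: list[Pair], threshold: float, lowercase: bool) -> float:
    hits = sum((jaccard(a, b, lowercase) >= threshold) == (y == 1) for a, b, y in test)
    return hits / len(test)


def randomized_search(pairs: list[Pair], n_iter: int = 25, n_folds: int = 2, seed: int = 0):
    """Sample n_iter random configurations and score each with n_folds-fold CV."""
    if len(pairs) < n_folds:
        raise ValueError("need at least one pair per fold")
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]

    best_params, best_score = {}, -1.0
    for _ in range(n_iter):
        params = {"lowercase": rng.choice([True, False]), "bias": rng.uniform(-0.2, 0.2)}
        fold_scores = []
        for k, held_out in enumerate(folds):
            # Fit on every fold except the k-th, then score on the held-out fold.
            train = [pair for j, fold in enumerate(folds) if j != k for pair in fold]
            threshold = fit_threshold(train, params)
            fold_scores.append(accuracy(held_out, threshold, params["lowercase"]))
        mean_score = sum(fold_scores) / n_folds
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


if __name__ == "__main__":
    print(randomized_search(TOY_PAIRS))
```

Unlike an exhaustive grid search, this evaluates only a fixed number of sampled configurations, so the cost stays at `n_iter * n_folds` fits regardless of how large the search space is.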
We performed a 2-fold randomized grid search of 25 iterations once per detection method. The grid search results can be seen in this directory.

One-on-one correlation graphs of detection methods
For a detailed view of each one-on-one correlation, please refer to this directory.

If you use this repository or our paper in your research, please cite us as follows:
@misc{becker2023paraphrase,
title={Paraphrase Detection: Human vs. Machine Content},
author={Jonas Becker and Jan Philip Wahle and Terry Ruas and Bela Gipp},
year={2023},
eprint={2303.13989},
archivePrefix={arXiv},
primaryClass={cs.CL}
}