Beyond Words: A Topological Exploration of Coherence in Text Documents
The Second Tiny Papers Track at ICLR 2024
conda env create -f environment.yml
conda activate tda-modeling-env
Computing TDA features for a dataset:
python feature_gen.py --cuda 0 --data_name clinton_train --input_dir GCDC_Dataset/ --output_dir gcdc_tda_features --batch_size 100
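The internals of feature_gen.py are not described here. As a rough illustration of the kind of quantity a topological feature of an attention map can be, the sketch below assumes the thresholded-attention-graph approach of the EMNLP 2021 work acknowledged further down: keep edges whose attention weight reaches a threshold, then count connected components (b0) and independent cycles (b1). The function name, the symmetrization rule, and the toy matrix are hypothetical and are not taken from this repository.

```python
from itertools import combinations


def attention_graph_betti(attn, threshold):
    """Return (b0, b1) of the undirected graph obtained by thresholding attn.

    attn is a square list of lists of attention weights. An edge i-j is kept
    when max(attn[i][j], attn[j][i]) >= threshold; this symmetrization is an
    assumption made for illustration. Diagonal entries are ignored.
    """
    n = len(attn)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = 0
    components = n
    for i, j in combinations(range(n), 2):
        if max(attn[i][j], attn[j][i]) >= threshold:
            edges += 1
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1

    b0 = components              # number of connected components
    b1 = edges - n + components  # number of independent cycles in a graph
    return b0, b1


# Toy 4-token attention map (each row sums to 1); the values are made up.
toy_attn = [
    [0.10, 0.60, 0.25, 0.05],
    [0.50, 0.10, 0.35, 0.05],
    [0.30, 0.40, 0.25, 0.05],
    [0.05, 0.05, 0.10, 0.80],
]

# One (b0, b1) pair per threshold gives a small feature vector.
features = [attention_graph_betti(toy_attn, t) for t in (0.05, 0.3, 0.5)]
print(features)  # [(1, 3), (2, 1), (3, 0)]
```

Raising the threshold removes weak edges, so the graph splits into more components and loses cycles. Tracking how these counts change across thresholds is one simple way to turn an attention matrix into fixed-length features.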
Train and test an MLP on the generated TDA features:
python predict_tda.py --input_dir GCDC_Dataset/ --feat_dir gcdc_tda_features/ --domain clinton
- GCDC: refer to GCDC-Corpus for the source data.
We thank the authors of Artificial Text Detection via Examining the Topology of Attention Maps (EMNLP 2021) for publishing their code.
If you find this code useful for your research, please cite the following paper:
@inproceedings{jain2024beyond,
title={Beyond Words: A Topological Exploration of Coherence in Text Documents},
author={Jain, Samyak and Singhal, Rishi and Krishna, Sriram and Singla, Yaman K and Shah, Rajiv Ratn},
booktitle={The Second Tiny Papers Track at ICLR 2024},
year={2024}
}