LLMs vs Established Text Augmentation Techniques for Classification:

This repository contains the data and code for the paper. It is structured as follows:

Dataset folders (news_category, ag_news, atis, fb, yelp, sst5): each folder contains its own readme file with further instructions. In general, each folder contains the data collected via LLM-based or established augmentation methods; the scripts for collecting data and finetuning; and the results of each finetuning run on the dataset, broken down by augmentation method, number of seeds used, amount of data collected, and finetuned model.

aggregate_each_vs_each_{placeholder}.py: these scripts aggregate the results of the classifier finetuning runs, one script per combination of classifier type and finetuning method.
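
As a rough illustration of what such an aggregation step could look like, the sketch below averages accuracy over finetuning runs that share the same model and augmentation method. The record fields (`model`, `method`, `accuracy`), the method names, and the sample values are hypothetical; they are not taken from the repository's result files or scripts.

```python
from collections import defaultdict
from statistics import mean


def aggregate_runs(runs):
    """Average accuracy over runs sharing the same (model, method) pair."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[(run["model"], run["method"])].append(run["accuracy"])
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical run records; the real results live in the dataset folders.
example_runs = [
    {"model": "bert", "method": "llm_paraphrase", "accuracy": 0.80},
    {"model": "bert", "method": "llm_paraphrase", "accuracy": 0.90},
    {"model": "bert", "method": "back_translation", "accuracy": 0.70},
]
aggregated = aggregate_runs(example_runs)
```

Keying the result by a `(model, method)` tuple keeps the sketch independent of any particular file layout; the actual scripts may organize their output differently.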

each_vs_each_{placeholder} folders: these folders contain the aggregated results as CSV files that compare each LLM-based method with each established augmentation method on downstream model accuracy, together with the visualization scripts and the plots they produce.
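
The "each vs each" idea can be sketched as a pairwise comparison over two sets of accuracy scores. This is only a minimal example under assumed inputs: the method names and accuracy values below are made up, and the function `each_vs_each` is a hypothetical helper, not code from this repository.

```python
from itertools import product


def each_vs_each(llm_scores, established_scores):
    """Return the accuracy difference for every (LLM, established) method pair."""
    return {
        (llm, est): llm_scores[llm] - established_scores[est]
        for llm, est in product(llm_scores, established_scores)
    }


# Hypothetical mean accuracies per augmentation method.
llm_scores = {"llm_paraphrase": 0.85}
established_scores = {"back_translation": 0.70, "word_insertion": 0.90}

diffs = each_vs_each(llm_scores, established_scores)
# A positive difference means the LLM-based method scored higher for that pair.
llm_wins = [pair for pair, diff in diffs.items() if diff > 0]
```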

augnlp.ipynb: Jupyter notebook with examples of how we gather data using established augmentation methods.
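
The notebook is the reference for how the data was actually gathered. Purely to illustrate the general idea of a simple, non-LLM augmentation, here is a standard-library sketch of random word swapping. This technique was chosen for brevity and is not claimed to be one of the methods used in the notebook; the function name, its parameters, and the sample sentence are hypothetical.

```python
import random


def random_swap(text, n_swaps=1, seed=None):
    """Return a copy of `text` with `n_swaps` random pairs of words exchanged."""
    rng = random.Random(seed)
    words = text.split()
    if len(words) < 2:
        return text  # nothing to swap
    for _ in range(n_swaps):
        # Pick two distinct positions and exchange the words at them.
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)


original = "book a flight from boston to denver"
# Different seeds give different (reproducible) augmented variants.
augmented = [random_swap(original, n_swaps=2, seed=s) for s in range(3)]
```

Each variant keeps the same words as the original sentence and only changes their order, so the class label of the sample can be reused for the augmented copies.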

requirements.txt: Python requirements for the scripts.

Citing

@misc{cegin2024llmsvsestablishedtext,
      title={LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?}, 
      author={Jan Cegin and Jakub Simko and Peter Brusilovsky},
      year={2024},
      eprint={2408.16502},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.16502}, 
}
