Releases: dsfsi/textaugment
2.0.0 16-11-2023
- Now supports gensim >= 4
- Now supports fastText models
- Enhanced code to allow the user to select the top n words (synonyms/most similar words)
- Added punctuation insertion
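Punctuation insertion adds random punctuation marks between the words of a sentence while keeping every original word in its original order, an idea popularised by AEDA (An Easier Data Augmentation). The following is a minimal sketch under stated assumptions: the mark set, the `ratio` cap of roughly one third of the words, and the function name `insert_punctuation` are hypothetical choices for illustration, not TextAugment's implementation.

```python
import random

# Assumed set of marks; the mark set used by TextAugment may differ.
PUNCTUATION_MARKS = [".", ";", "?", ":", "!", ","]


def insert_punctuation(sentence, ratio=0.3, rng=None):
    """Insert random punctuation marks before randomly chosen words.

    Hypothetical sketch: between 1 and max(1, int(ratio * n_words)) marks
    are inserted, each before a distinct word position.
    """
    rng = rng or random.Random()
    words = sentence.split()
    if not words:
        return sentence
    max_marks = max(1, int(ratio * len(words)))
    n_marks = rng.randint(1, max_marks)
    # Pick distinct word positions that will be preceded by a mark.
    positions = set(rng.sample(range(len(words)), n_marks))
    out = []
    for i, word in enumerate(words):
        if i in positions:
            out.append(rng.choice(PUNCTUATION_MARKS))
        out.append(word)
    return " ".join(out)


print(insert_punctuation("The stories are good", rng=random.Random(0)))
```

Because no word is removed or replaced, this kind of augmentation tends to preserve the label of the sentence, which is the main appeal of the technique.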
1.3.4 05-11-2020
- Fixed minor issues
1.3.3 21-10-2020
- Added support for fastText augmentation
- Added example notebook for fastText augmentation
1.3.2 10-06-2020
- Minor updates
1.3.1 29-05-2020
- Fixed minor issues
1.3 29-05-2020
- Added the mixup augmentation algorithm for NLP
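Mixup trains on convex combinations of pairs of examples and their labels: draw λ ~ Beta(α, α), then form x̃ = λ·x_i + (1 − λ)·x_j and ỹ = λ·y_i + (1 − λ)·y_j. For text, one common variant applies this to fixed-size representations such as sentence embeddings rather than to raw tokens. The sketch below assumes the inputs are already equal-length embedding vectors and one-hot labels; the function name `mixup` and the default `alpha=0.2` are illustrative choices, not TextAugment's API.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mix two examples: lam ~ Beta(alpha, alpha), then interpolate.

    x1, x2: equal-length feature vectors (e.g. sentence embeddings).
    y1, y2: equal-length label vectors (e.g. one-hot encodings).
    Returns the mixed features, the mixed (soft) labels, and lam.
    """
    if len(x1) != len(x2) or len(y1) != len(y2):
        raise ValueError("paired vectors must have the same length")
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Two toy 3-d "sentence embeddings" with one-hot labels for classes 0 and 1.
x, y, lam = mixup([0.2, 0.4, 0.1], [1.0, 0.0], [0.6, 0.0, 0.3], [0.0, 1.0],
                  rng=random.Random(42))
print(lam, x, y)
```

A small α concentrates λ near 0 or 1, so most mixed examples stay close to one of the two originals; larger α produces more even blends.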
1.2 23-05-2020
- Added support for EDA algorithm
- Added examples using Jupyter notebook
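EDA (Easy Data Augmentation) combines four simple word-level operations: synonym replacement, random insertion, random swap, and random deletion. The first two need a lexical resource such as WordNet; the sketch below covers only the two that do not. The function names and the parameters `n` (number of swaps) and `p` (deletion probability) are hypothetical, and this is not TextAugment's own code.

```python
import random


def random_swap(words, n=1, rng=None):
    """Swap the positions of two randomly chosen words, n times."""
    rng = rng or random.Random()
    words = list(words)  # copy so the caller's list is not mutated
    if len(words) < 2:
        return words
    for _ in range(n):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p=0.1, rng=None):
    """Delete each word with probability p, always keeping at least one."""
    rng = rng or random.Random()
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


sentence = "the stories are good and the films are better".split()
rng = random.Random(7)
print(" ".join(random_swap(sentence, n=2, rng=rng)))
print(" ".join(random_deletion(sentence, p=0.2, rng=rng)))
```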
1.1 16-07-2019
Updated README and icons:
- Added licence icon.
- Added release icon.
- Added wheel icon.
- Added Python version icon.
Added preprint paper citation.
Initial release, 16-07-2019
TextAugment is a Python 3 library for augmenting text for natural language processing applications. TextAugment stands on the giant shoulders of NLTK, Gensim, and TextBlob and plays nicely with them.
Requirements
- Python 3
The following software packages are dependencies and are installed automatically with TextAugment. To install them manually:
$ pip install numpy nltk gensim textblob googletrans
The following code downloads the WordNet corpus, the Punkt tokenizer, and the averaged perceptron part-of-speech tagger model.
import nltk
nltk.download('wordnet')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
Install from pip [Recommended]
$ pip install textaugment
How to use
>>> from textaugment import Word2vec
>>> t = Word2vec(model='path/to/gensim/model')  # or pass a loaded gensim model object
>>> t.augment('The stories are good')
The films are good
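As the example output suggests, this kind of augmentation replaces words with some of their nearest neighbours in embedding space, and release 2.0.0 lets the user choose how many top n neighbours to draw from. The sketch below illustrates that idea with a hard-coded toy embedding table and plain cosine similarity; the names `cosine`, `most_similar`, `augment`, `top_n`, and `p` are hypothetical, the vectors are made up, and this is not how the `Word2vec` class is implemented. With a real gensim model, the neighbour lookup would typically be `KeyedVectors.most_similar(word, topn=n)`.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, top_n=3):
    """Return up to top_n other words ranked by cosine similarity to `word`."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:top_n]]


def augment(sentence, vectors, top_n=3, p=0.5, rng=None):
    """Replace each in-vocabulary word, with probability p, by one of its top_n neighbours."""
    rng = rng or random.Random()
    out = []
    for word in sentence.split():
        neighbours = most_similar(word, vectors, top_n) if word in vectors else []
        if neighbours and rng.random() < p:
            out.append(rng.choice(neighbours))
        else:
            out.append(word)
    return " ".join(out)


# Made-up 2-d vectors standing in for a trained word2vec/fastText model.
VECTORS = {
    "stories": [0.9, 0.1],
    "films": [0.85, 0.2],
    "books": [0.8, 0.05],
    "good": [0.1, 0.9],
    "great": [0.15, 0.85],
}

print(augment("The stories are good", VECTORS, top_n=2, rng=random.Random(3)))
```

A smaller `top_n` keeps replacements closer in meaning to the original word; a larger one adds variety at the risk of drifting away from the sentence's label.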
Citation
@article{marivate2019improving,
title={Improving short text classification through global augmentation methods},
author={Marivate, Vukosi and Sefara, Tshephisho},
journal={arXiv preprint arXiv:1907.03752},
year={2019}
}