GitHub - sleepyBear92/CheatGPT_For_Labeling: Labeling text data with ChatGPT

Introduction

ChatGPT is a really good generation model. I'm not capable of developing such a model, but I thought. "Could I develop a model similar to ChatGPT with as few resources as possible?" So I decided to label the data with ChatGPT and learn this labeling, which has two advantages.

1. we can reduce the resources for data acquisition.

As of GPT3.5, one data instance costs only $0.008192!

2. We can develop a model that can't beat ChatGPT, but can come close.

You can experiment with different models using ChatGPT data, and use ChatGPT as a benchmark to select effective experiments.

To conclude, we've written an example code to call ChatGPT using the API to label a conversation, which is designed to summarize the conversation. If you want to do different labeling, just change the data and prompt!

How to use it

1. Prepare your data

Prepare the data you want to label in a txt file.
In the txt file, one instance of data should be separated by "\n\n". To change the delimiter, change the value of "sep" in "load_data" in "main.py".
See the example data('data' folder) and preprocessing code('sampling_example.ipynb') for details

2. Labeling

python main.py --key {your_key} --model {model_name} --txt_path {sample.txt}

Model Name	Description
gpt-3.5-turbo	Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003. Will be updated with our latest model iteration.
gpt-3.5-turbo-0301	Snapshot of gpt-3.5-turbo from March 1st 2023. Unlike gpt-3.5-turbo, this model will not receive updates, and will only be supported for a three month period ending on June 1st 2023.
text-davinci-003	Can do any language task with better quality, longer output, and consistent instruction-following than the curie, babbage, or ada models. Also supports inserting completions within text.
text-davinci-002	Similar capabilities to text-davinci-003 but trained with supervised fine-tuning instead of reinforcement learning
text-curie-001	Very capable, faster and lower cost than Davinci.
text-babbage-001	Capable of straightforward tasks, very fast, and lower cost.
text-ada-001	Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost.

Warning: Be sure to check your usage. If you don't have any credits available, you won't be able to use it anymore.

3. ETC

You can also develop a post-processing module in 'postprocessor' to get only labeling results from the API results. This will require modifications based on the API results of the modified prompt.
For an example, see 'postprocessing.py' in 'postprocessor'

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.ipynb_checkpoints		.ipynb_checkpoints
__pycache__		__pycache__
assets		assets
data		data
postprocessor		postprocessor
prompt		prompt
.gitignore		.gitignore
api.py		api.py
main.py		main.py
prefix.txt		prefix.txt
readme.md		readme.md
sample_data.txt		sample_data.txt
sampling_example.ipynb		sampling_example.ipynb
surfix.txt		surfix.txt
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Introduction

How to use it

About

Releases

Packages

Languages

sleepyBear92/CheatGPT_For_Labeling

Folders and files

Latest commit

History

Repository files navigation

Introduction

How to use it

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages