Bucket

A GPT discord bot that uses your own server for training data. This repo is to help you build that bot.

Why'd you name it Bucket?

seemed funny.

Getting Started

Step 0 - Stuff you need

You will need:

Python
Node.JS
(at least) 5 dollars
consent

Step 1 - Prep

You should probably get consent before doing all this!

To get started, you'll need chat data. Use DiscordChatExporter: https://github.com/Tyrrrz/DiscordChatExporter
In DiscordChatExporter, choose the channel you want to use for the training data, make sure to download it as json. This will probably take a while.

You will probably want to set a partition limit! I used 10mb.

Once you have your json files, make a folder called dirty-data in the preparation folder, and put all the json files in there.
Run the jsoncleaner.py python script.
When asked, enter your system prompt.

Your system prompt sets the "context" for the AI Model, as well as placing restrictions or "boundaries" on its responses. You may want to write this down for future steps, but you can also just grab it from the output.jsonl file.

Step 2 - Training

This step will cost you at least $5

There is a script in the training folder, but I would just use the web ui for this: https://platform.openai.com/finetune/

You can only use gpt-3.5-turbo-xxxx models for fine tuning with the data you've generated.

Training will also take a while, especially if you've given it a lot of data. For me training a GPT3.5 model with ~2048 lines of data will run you about $2.

Step 3 - Releasing it into the wild (Discord)

Your AI Model will want to say slurs after a while, no matter how much you train it. There's a filter in place which should block most if not all of them, and a well crafted system prompt will prevent them as well. We will need a better solution for "ignoring" them from OpenAI's data in the future, though.

Rename frontend/config.sample.json to config.json, and get ready to enter a lot of settings:

discordToken: your discord api key you got from the developer's portal
allowedChannelId: the channel you want the bot to look at for pings (can be a thread)
trainEmoji: an emoji reaction you want the bot to watch for to save the response (and original message) for future training
reactionCount: how many reactions until the bot should save the message
openaiapi section:
- apiKey: your OpenAI api key
- modelId: your fine tuned model id

you should leave these settings as is, but here is what these do

maxTokens: how many tokens your bot can use per response
- default: 256
temperature: how "random" you want the bot to be, can be 0-2. lower is less "random"
- default: 0.9
presencePenalty: how "on topic" the bot should be, can be 0-2. lower is more "on topic"
- default: 0.3
frequencyPenalty: how "repetitive" the bot should be, can be 0-2. lower is more "repetitive"
- default: 0.3
severityCategory: what "level" of slurs and bad words should we filter, can be 0-3.
- default: 2.6

systemPrompt: your system prompt that the ai will use

A note on system prompts

While you're in config.json, you need to add a system prompt. This sets the guidelines and "boundaries" that the AI mostly follows. You can use the same system prompt that was used in jsoncleaner.py, but now would be the best time to mess around and see what gives you the best results.

allowedUserTag: a discord user id for a user who can use the bot in any channel
removePings: can be 0 or 1, 1 to remove pings, 0 to allow them
removeLinks: can be 0 or 1, 1 to remove links, 0 to allow them

Open a terminal/command prompt in the frontend folder, then run npm install to grab all the dependencies
Once that's done, run npm start to start the bot!

Bucket will log responses, and who triggered the bot in the /frontend/logs/ folder.

Contributing

Feel Free! If you want to change something just open a PR.

Name		Name	Last commit message	Last commit date
Latest commit History 180 Commits
frontend		frontend
old/bot		old/bot
preparation		preparation
training		training
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
bucket.png		bucket.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Bucket

Why'd you name it Bucket?

Getting Started

Step 0 - Stuff you need

Step 1 - Prep

Step 2 - Training

Step 3 - Releasing it into the wild (Discord)

Contributing

About

Releases 5

Languages

License

rungulus/Bucket

Folders and files

Latest commit

History

Repository files navigation

Bucket

Why'd you name it Bucket?

Getting Started

Step 0 - Stuff you need

Step 1 - Prep

Step 2 - Training

Step 3 - Releasing it into the wild (Discord)

Contributing

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 5

Languages