This project provides a command-line tool for converting WhatsApp chat exports into JSONL (JSON Lines) format, suitable for fine-tuning an OpenAI GPT model or feeding other conversational AI applications. It includes utilities for text cleaning, message moderation, and chat processing.
- Chat Parsing: Parses WhatsApp chat exports and extracts messages with timestamps, sender information, and content.
- Moderation: Optionally moderates chat messages using OpenAI's moderation API, filtering out inappropriate content.
- Anonymization: Anonymizes senders other than the assistant in the conversation.
- Conversation Grouping: Groups messages into conversations, handling time gaps and managing assistant/user message flows.
- Paired Conversations: Supports generating paired conversations for structured data preparation.
- Output: Writes the processed chat data to JSONL files for use in GPT model fine-tuning (see the sample record after this list).
- Logging: Provides detailed logging for process tracing and debugging.
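For reference, JSONL files intended for OpenAI chat fine-tuning typically hold one conversation per line, each with a `messages` array. The snippet below is a minimal sketch of that general format with invented content; the exact fields WAC2J emits may differ.

```python
# Minimal sketch of one output line in the OpenAI chat fine-tuning format.
# The conversation content here is invented; WAC2J's exact output may differ.
import json

record = {
    "messages": [
        {"role": "system", "content": "Your system prompt here"},
        {"role": "user", "content": "Are we still on for tonight?"},
        {"role": "assistant", "content": "Yes, see you at 8!"},
    ]
}

# Each conversation becomes a single line in the .jsonl file.
print(json.dumps(record, ensure_ascii=False))
```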
Requirements:

- Python 3.8+
- `pip` package manager

Installation:

- Clone the repository:

  ```bash
  git clone https://github.com/jaylann/WAC2J.git
  cd WAC2J
  ```

- Install the package locally:

  ```bash
  pip install .
  ```
To run the chat processor, use the following command:
```bash
wac2j --sys-prompt "Your system prompt here" --name "AssistantName" input.txt
```
Options:

- `--sys-prompt`: (Required) System prompt to include in each conversation.
- `--name`: (Required) Name of the assistant in the chat.
- `--threshold`: Moderation threshold (default: `0.7`).
- `--api-key`: OpenAI API key (if not provided in `.env`).
- `--max-chars`: Maximum characters per conversation (default: `8000`).
- `--pairs`: Use paired mode for conversations.
- `--output`: Output file path (default: same as the input file, with a `.jsonl` extension).
- `--dir`: Directory containing input files to process.
- `--no-mod`: Skip moderation.
- `--merge`: Merge all JSONL files together.
- `input`: Input chat file path.
To process a single chat export file:
```bash
wac2j --sys-prompt "Your system prompt" --name "Assistant" --threshold 0.8 --output output.jsonl input.txt
```
To process all `.txt` files in a directory and merge them into a single JSONL file:
```bash
wac2j --sys-prompt "Your system prompt" --name "Assistant" --dir /path/to/chats --merge
```
This project is designed to be modular. You can extend its functionality by adding new models to the `models` directory or additional processing functions to `processing.py`. One example use case is changing the conversation format: instead of keeping a `PersonX:` / `PersonY:` prefix on every message, you could strip the sender prefix entirely, as sketched below.
Feel free to fork this repository and create pull requests to improve the project. Make sure to write clear commit messages and add comments to your code where necessary.
This project is licensed under the MIT License. See the LICENSE file for more details.