Refactor README.md for improved clarity and conciseness
yaniv-golan committed Oct 12, 2024
1 parent c01c7ba commit 8fe4157
Showing 1 changed file with 33 additions and 102 deletions.
135 changes: 33 additions & 102 deletions README.md
@@ -1,6 +1,6 @@
# YAWT (Yet Another Whisper-based Transcriber)

YAWT is an audio transcription tool that leverages OpenAI's Whisper model to provide accurate and efficient audio-to-text conversion. With built-in speaker diarization using PyAnnote and support for multiple upload services, YAWT is designed to cater to diverse transcription needs with ease.
YAWT is an audio transcription tool that utilizes OpenAI's Whisper model for efficient audio-to-text conversion. It incorporates speaker diarization using PyAnnote and supports multiple upload services, enabling flexible transcription workflows.

## Table of Contents

@@ -14,28 +14,27 @@ YAWT is an audio transcription tool that leverages OpenAI's Whisper model to pro

## Features

- **Accurate Transcription:** Utilizes OpenAI's Whisper model for high-quality audio-to-text conversion.
- **Speaker Diarization:** Differentiates between multiple speakers in an audio file using PyAnnote.
- **Multiple Output Formats:** Supports exporting transcriptions in `text`, `json`, and `srt` formats.
- **Upload Services Integration:** Seamlessly uploads audio files to services like `0x0.st` and `file.io`.
- **Configurable Timeouts and Costs:** Allows customization of timeout settings and provides cost estimations based on usage.
- **Logging:** Comprehensive logging with configurable log levels and log rotation to monitor and debug processes.
- **Dry-Run Mode:** Estimate processing costs without executing the actual transcription.
- **Transcription:** Uses OpenAI's Whisper model for audio-to-text conversion.
- **Speaker Diarization:** Identifies and separates individual speakers within an audio file using PyAnnote.
- **Output Formats:** Exports transcriptions in `text`, `json`, and `srt` formats.
- **Upload Services:** Uploads audio files to services like `0x0.st` and `file.io`.
- **Configurable Timeouts and Costs:** Users can customize timeout settings and view cost estimations based on usage.
- **Logging:** Includes logging with configurable levels and log rotation for monitoring and debugging.
- **Dry-Run Mode:** Allows cost estimation without executing the transcription process.
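
As a rough illustration of the cost estimation that dry-run mode refers to, the sketch below multiplies an audio duration by per-unit rates. The rates are the sample values from `config/default_config.yaml` (0.006 USD per minute for Whisper, 0.18 USD per hour for diarization); the function name `estimate_cost` and the returned dictionary layout are assumptions made for this example, not YAWT's actual API.

```python
# Sample rates taken from the default configuration; adjust to your own pricing.
WHISPER_COST_PER_MINUTE = 0.006  # USD per minute of audio
PYANNOTE_COST_PER_HOUR = 0.18    # USD per hour of audio


def estimate_cost(duration_seconds: float) -> dict[str, float]:
    """Return a hypothetical cost breakdown (in USD) for one audio file."""
    whisper = duration_seconds / 60 * WHISPER_COST_PER_MINUTE
    diarization = duration_seconds / 3600 * PYANNOTE_COST_PER_HOUR
    return {
        "whisper": round(whisper, 4),
        "diarization": round(diarization, 4),
        "total": round(whisper + diarization, 4),
    }


if __name__ == "__main__":
    # A 30-minute recording: 30 min of Whisper plus 0.5 h of diarization.
    print(estimate_cost(30 * 60))
```

Keeping the rates as named constants makes it easy to see which figure changes when a provider updates its pricing.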

## Installation

### Prerequisites

- **Python 3.11** or higher
- **Poetry** for dependency management (optional but recommended)
- **Poetry** for dependency management

### Steps

1. **Clone the Repository:**
1. **Install YAWT using Poetry:**

```bash
git clone https://github.com/yaniv-golan/YAWT/yawt.git
cd yawt
poetry add git+https://github.com/yaniv-golan/YAWT.git@latest
```

2. **Set Up a Virtual Environment:**
@@ -78,55 +77,19 @@ YAWT is an audio transcription tool that leverages OpenAI's Whisper model to pro
OPENAI_KEY=your_openai_api_key_here
```

- **Configure `config/default_config.yaml`:**

Adjust the configurations as needed, such as API costs, logging settings, and supported upload services.

```yaml
# API Costs
api_costs:
whisper:
cost_per_minute: 0.006 # USD per minute for Whisper
pyannote:
cost_per_hour: 0.18 # USD per hour for diarization
# Logging Configuration
logging:
log_directory: "logs"
max_log_size: 10485760 # 10 MB in bytes
backup_count: 5
# Model Configuration
model:
default_model_id: "openai/whisper-large-v3"
# Supported Services
supported_upload_services:
- "0x0.st"
- "file.io"
# Timeout Settings (in seconds)
timeouts:
download_timeout: 60 # Default download timeout
upload_timeout: 120 # Default upload timeout
diarization_timeout: 3600
job_status_timeout: 60
# Transcription Settings
transcription:
generate_timeout: 300 # Timeout for transcription in seconds
max_target_positions: 448
buffer_tokens: 10 # Reduced from 445 to 10
# API Tokens
# These can also be set via environment variables in the .env file
# pyannote_token: "your_pyannote_api_token_here"
# openai_key: "your_openai_api_key_here"
- **Optional: Create a Custom Configuration File:**

Instead of using the default configuration, you can create a custom configuration file and specify its path using command-line arguments when running YAWT.

```bash
poetry run yawt --config path/to/your_config.yaml
```
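
For readers unfamiliar with how logging settings such as `log_directory`, `max_log_size`, and `backup_count` are typically applied, here is a minimal sketch using Python's standard `RotatingFileHandler`. It is not taken from YAWT's source: the function name `build_logger`, the logger name, and the log file name `yawt.log` are assumptions for illustration only.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(
    log_directory: str = "logs",
    max_log_size: int = 10_485_760,  # 10 MB in bytes
    backup_count: int = 5,
    level: int = logging.INFO,
) -> logging.Logger:
    """Create a logger that rotates its file once it exceeds max_log_size."""
    directory = Path(log_directory)
    directory.mkdir(parents=True, exist_ok=True)

    # "yawt.log" is a hypothetical file name chosen for this example.
    handler = RotatingFileHandler(
        directory / "yawt.log",
        maxBytes=max_log_size,
        backupCount=backup_count,
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )

    logger = logging.getLogger("yawt_example")
    logger.setLevel(level)
    # Calling this function repeatedly would attach duplicate handlers;
    # a real application would configure logging once at startup.
    logger.addHandler(handler)
    return logger
```

With `backup_count=5`, the handler keeps the active file plus at most five rotated copies (`yawt.log.1` through `yawt.log.5`) and discards older ones.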

## Configuration

YAWT's behavior can be customized via the `config/default_config.yaml` file and environment variables. Here's a breakdown of the key configurations:
YAWT's behavior can be customized via the `config.py` module and environment variables. Additionally, you have the option to create a custom configuration file and specify its path using command-line arguments when running the application.
Here's a breakdown of the key configurations:

- **API Costs:**
- `whisper.cost_per_minute`: Cost per minute for using the Whisper model.
@@ -167,27 +130,35 @@ YAWT can be used via the command line to transcribe audio files either from a lo
1. **Transcribe a Local Audio File:**

```bash
python src/yawt/main.py --input-file path/to/audio.wav
poetry run yawt --input-file path/to/audio.wav
```

2. **Transcribe an Audio File from a URL:**

```bash
python src/yawt/main.py --audio-url https://example.com/audio.wav
poetry run yawt --audio-url https://example.com/audio.wav
```

3. **Estimate Cost Without Transcription (Dry Run):**

```bash
python src/yawt/main.py --input-file path/to/audio.wav --dry-run
poetry run yawt --input-file path/to/audio.wav --dry-run
```

4. **Specify Output Formats:**

```bash
python src/yawt/main.py --input-file path/to/audio.wav --output-format text json srt
poetry run yawt --input-file path/to/audio.wav --output-format text json srt
```
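
To make the `srt` output format concrete, the following sketch renders a list of diarized segments as SubRip text. The segment layout (dictionaries with `start`, `end`, `speaker`, and `text` keys) and the function names are assumptions for this example; YAWT's internal data structures and exact output may differ.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Prefix each cue with its speaker label (hypothetical convention).
        text = f'{seg["speaker"]}: {seg["text"]}'
        blocks.append(f"{index}\n{start} --> {end}\n{text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    example = [
        {"start": 0.0, "end": 2.5, "speaker": "SPEAKER_00", "text": "Hello."},
        {"start": 2.5, "end": 5.0, "speaker": "SPEAKER_01", "text": "Hi there."},
    ]
    print(to_srt(example))
```

Working in integer milliseconds avoids floating-point drift when splitting a time into hours, minutes, and seconds.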

### Sample Command with Multiple Command-Line Arguments

To run YAWT with a custom configuration file, enable verbose logging, specify the language, and set the number of speakers:

```bash
poetry run yawt --input-file path/to/audio.wav --config config/custom_config.yaml --verbose --language English --num-speakers 2
```

### Available Options

- `--audio-url`: Publicly accessible URL of the audio file to transcribe.
@@ -206,46 +177,7 @@ YAWT can be used via the command line to transcribe audio files either from a lo

## Testing

YAWT includes a comprehensive test suite to ensure reliability and correctness. Tests are written using `pytest` and utilize `unittest.mock` for mocking external dependencies.

### Running Tests

1. **Ensure All Dependencies Are Installed:**

```bash
pip install -r requirements.txt
```

2. **Run All Tests:**

```bash
pytest
```

3. **Run a Specific Test:**

```bash
pytest tests/test_audio_handler.py::test_load_audio_ffmpeg_error
```

### Test Coverage

The test suite covers various components, including:

- **Audio Handling (`tests/test_audio_handler.py`):**
Tests for loading audio, handling FFmpeg errors, uploading files to supported services, and downloading audio files.

- **Diarization (`tests/test_diarization.py`):**
Tests for submitting diarization jobs, handling rate limits, and checking job statuses.

- **Main Application (`tests/test_main.py`):**
Tests for the main transcription flow, including successful transcriptions and handling failures.

- **Logging Setup (`tests/test_logging_setup.py`):**
Tests for configuring logging based on different settings.

- **Transcription (`tests/test_transcription.py`):**
Tests for transcription generation, handling timeouts, and retry mechanisms.
YAWT includes a set of tests to ensure basic functionality. Tests are written using `pytest` and utilize `unittest.mock` for mocking external dependencies. The test suite is not yet comprehensive and has not been integrated into the build process.

## Contributing

@@ -278,4 +210,3 @@ Please ensure your code adheres to the existing style and passes all tests.
## License
This project is licensed under the [MIT License](LICENSE).
