Okra

[Screenshot]

Some love it, some hate it, and others don't even know it exists, just like real-life okra!

Okra is your all-in-one personal AI assistant. It's my attempt at recreating something similar to ChatGPT's desktop application. Even though it has a LOT of room for improvement, it's still pretty fun to play with.

Features

  • Speech recognition: Okra listens to you in the background and recognizes your speech, using the power of the well-known SpeechRecognition library (see the sketch after this list).
  • Speech-to-text conversion: Okra uses external speech-to-text APIs to transcribe your speech. The currently supported speech-to-text providers are Deepgram, OpenAI, and Groq.
  • Vision capabilities: You can share your webcam feed or your computer screen with Okra, and it will use the image to chat with you and answer your questions!
  • Multiple LLM support: Okra supports multiple LLM and VLM API providers. The currently available providers are Google (Gemini), OpenAI (GPT), and Groq.
  • Text-to-speech capability: Okra can speak to you, using various text-to-speech models. Currently, it supports Deepgram and OpenAI.
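
To give a feel for the listening loop, here is a minimal sketch of background recognition with the SpeechRecognition library, which the README names above. This is not Okra's actual code: the callback simply reports how much audio was captured, where Okra would instead forward the bytes to the configured speech-to-text provider.

# Minimal sketch of background listening with SpeechRecognition.
# Requires: pip install SpeechRecognition pyaudio
import time

import speech_recognition as sr

recognizer = sr.Recognizer()
microphone = sr.Microphone()

# Calibrate once for ambient noise before listening.
with microphone as source:
    recognizer.adjust_for_ambient_noise(source)

def on_speech(rec: sr.Recognizer, audio: sr.AudioData) -> None:
    # Okra would hand these WAV bytes to the configured
    # speech-to-text API; here we just report the size.
    wav_bytes = audio.get_wav_data()
    print(f"Captured {len(wav_bytes)} bytes of audio")

# listen_in_background() spawns a listener thread and returns
# a function that stops it when called.
stop_listening = recognizer.listen_in_background(microphone, on_speech)
time.sleep(10)  # listen for ten seconds
stop_listening(wait_for_stop=False)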

Installation

To install, do the following:

  1. Clone the repository:
    git clone https://github.com/S4mpl3r/okra.git
  2. Create a Python virtual environment and activate it (optional, but highly recommended):
    # Windows
    python -m venv .venv
    .venv\Scripts\activate
    # Linux
    python3 -m venv .venv
    source .venv/bin/activate
  3. Create a .env file in the project root and populate it with your API keys, following the provided .env.example file (a quick sanity check for these keys is sketched after these steps):
    DEEPGRAM_API_KEY=
    GROQ_API_KEY=
    GOOGLE_API_KEY=
    OPENAI_API_KEY=
  4. Install the required packages:
    # Windows
    python -m pip install -r requirements.txt
    # Linux
    python3 -m pip install -r requirements.txt
  5. Edit the okra/config.py file to your liking. The default configuration uses gemini-1.5-flash as the LLM, Groq as the speech-to-text provider, and Deepgram as the text-to-speech provider:
    # okra/config.py
    config: GlobalConfig = {
         # Make this False if you don't want to use vision,
         # or the model that you use does not support it
         "use_vision": True,
         # Make this False if you don't want okra to generate speech
         "talk": True,
         # The source of vision, can be either 'screen' (your computer screen) or 'webcam' 
         "image_source": "screen",
         # The llm to use
         "llm": Gemini(
             model_name="models/gemini-1.5-flash-latest",
             system_prompt=system_prompt,
             max_history_length=10,
         ),
         # The speech-to-text model to use
         "speech_to_text": GroqSpeechToText(),
         # The text-to-speech model to use
         "text_to_speech": DeepgramTextToSpeech(),
     }
  6. Run the tool:
    # Windows
    python okra.py
    # Linux
    python3 okra.py
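
Before the first run, it can help to confirm that the keys in your .env are actually visible to Python. Below is a hypothetical one-off helper using python-dotenv; whether Okra itself loads the file this way is an assumption on my part, but the file format is the same.

# check_env.py -- hypothetical helper, not part of Okra.
# Requires: pip install python-dotenv
import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current directory

for key in ("DEEPGRAM_API_KEY", "GROQ_API_KEY",
            "GOOGLE_API_KEY", "OPENAI_API_KEY"):
    status = "set" if os.getenv(key) else "MISSING"
    print(f"{key}: {status}")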

Options

You can edit the okra/config.py file to change Okra's behavior to your liking. You have access to:

  • 3 LLM classes, found in the okra.llm subpackage:
    • Gemini
    • GPT
    • GroqLLM
  • 3 speech-to-text classes, found in the okra.speech subpackage:
    • DeepgramSpeechToText
    • OpenAISpeechToText
    • GroqSpeechToText
  • 2 text-to-speech classes, found in the okra.speech subpackage:
    • DeepgramTextToSpeech
    • OpenAITextToSpeech

Example config 1

  • LLM: OpenAI
  • Speech-to-text: Deepgram
  • Text-to-speech: Deepgram
  • Vision source: screen
# okra/config.py
config: GlobalConfig = {
    "use_vision": True,
    "talk": True,
    "image_source": "screen",
    "llm": GPT(
        model_name="gpt-4o",
        system_prompt=system_prompt,
        max_history_length=10,
    ),
    "speech_to_text": DeepgramSpeechToText(),
    "text_to_speech": DeepgramTextToSpeech(),
}

Example config 2

  • LLM: Groq
  • Speech-to-text: Deepgram
  • Text-to-speech: OpenAI
  • No vision
# okra/config.py
config: GlobalConfig = {
    "use_vision": False, # Groq does not support vision models (yet)
    "talk": True,
    "image_source": "screen",
    "llm": GroqLLM(
        model_name="llama3-70b-8192",
        system_prompt=system_prompt,
        max_history_length=10,
    ),
    "speech_to_text": DeepgramSpeechToText(),
    "text_to_speech": OpenAITextToSpeech(),
}

Usage

If you run python okra.py -h, you'll get:

usage: python okra.py [options]

Okra is your all in one desktop AI voice assistant.

options:
  -h, --help    show this help message and exit
  --skip-intro  skip intro
  --no-music    do not play intro music

By default, Okra will play an intro cutscene and music [1] (just for fun, lol). If you want to skip the intro, run python okra.py --skip-intro. If you just want to mute the music, run python okra.py --no-music.
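
That help text corresponds to a small argparse setup; the sketch below shows one way those flags could be wired up. The play_intro function is a placeholder of mine, and Okra's real argument handling may differ.

# Sketch of the CLI surface shown above (not Okra's actual code).
import argparse

def play_intro(music: bool) -> None:
    # Hypothetical stand-in for the intro cutscene and music.
    print(f"Playing intro (music={'on' if music else 'off'})")

parser = argparse.ArgumentParser(
    prog="python okra.py",
    usage="python okra.py [options]",
    description="Okra is your all in one desktop AI voice assistant.",
)
parser.add_argument("--skip-intro", action="store_true", help="skip intro")
parser.add_argument("--no-music", action="store_true",
                    help="do not play intro music")
args = parser.parse_args()

if not args.skip_intro:
    play_intro(music=not args.no_music)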

To exit the assistant, type 'q' and press Enter in the terminal.

Have fun!

License

MIT

Footnotes

  1. The intro music was created with Suno.