Some love it, some hate it, others don't even know that it exists, just like the real-life okra!
Okra is your all-in-one personal AI assistant. This is my effort at recreating something similar to ChatGPT's desktop application. Even though it has a LOT of room for improvement, it's still pretty fun to play with.
- Speech recognition: Okra listens to you in the background and recognizes your speech, using the power of the well-known SpeechRecognition library (see the sketch after this list).
- Speech-to-text conversion: Okra uses external speech-to-text APIs to transcribe your speech. The currently supported speech-to-text providers are:
  - Deepgram
  - OpenAI
  - Groq
- Vision capabilities: You can share your webcam feed or your computer screen with okra, and it will use the image to chat with you and answer your questions!
- Multiple LLM support: Okra supports multiple LLM and VLM API providers. The currently available providers are:
  - Google (Gemini)
  - OpenAI (GPT)
  - Groq
- Text-to-speech capability: Okra can speak to you, using various text-to-speech models. Currently, it supports:
  - Deepgram
  - OpenAI
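To give a feel for the listening piece, here's a minimal sketch of background speech capture with the SpeechRecognition library mentioned above. The callback body is a stand-in: in okra, the captured audio goes to whichever speech-to-text provider you configured.

```python
# Minimal sketch: background listening with the SpeechRecognition library.
# The callback body is illustrative; okra's internals may differ.
import time

import speech_recognition as sr

recognizer = sr.Recognizer()
microphone = sr.Microphone()

def on_speech(recognizer: sr.Recognizer, audio: sr.AudioData) -> None:
    # In okra, this audio would be handed to the configured
    # speech-to-text provider (Deepgram, OpenAI, or Groq).
    print(f"Captured {len(audio.get_raw_data())} bytes of audio")

with microphone as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate for room noise

# listen_in_background spawns a listener thread and returns a stopper function
stop_listening = recognizer.listen_in_background(microphone, on_speech)
time.sleep(10)  # keep the main thread alive while listening
stop_listening(wait_for_stop=False)
```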
To install, do the following:
- Clone the repository:

  ```bash
  git clone https://github.com/S4mpl3r/okra.git
  ```
- Create a Python environment and activate it (optional, but highly recommended):

  ```bash
  # Windows
  python -m venv .venv
  .venv/Scripts/activate

  # Linux
  python3 -m venv .venv
  source .venv/bin/activate
  ```
- Create a `.env` file in the project root and populate it with your API keys, according to the `.env.example` file provided:

  ```
  DEEPGRAM_API_KEY=
  GROQ_API_KEY=
  GOOGLE_API_KEY=
  OPENAI_API_KEY=
  ```
- Install the required packages:

  ```bash
  # Windows
  python -m pip install -r requirements.txt

  # Linux
  python3 -m pip install -r requirements.txt
  ```
- Edit the `okra/config.py` file to your liking. The default configuration uses `gemini-1.5-flash` as the LLM, `groq` as the speech-to-text provider, and `deepgram` as the text-to-speech provider:

  ```python
  # okra/config.py
  config: GlobalConfig = {
      # Make this False if you don't want to use vision,
      # or the model that you use does not support it
      "use_vision": True,
      # Make this False if you don't want okra to generate speech
      "talk": True,
      # The source of vision, can be either 'screen' (your computer screen) or 'webcam'
      "image_source": "screen",
      # The llm to use
      "llm": Gemini(
          model_name="models/gemini-1.5-flash-latest",
          system_prompt=system_prompt,
          max_history_length=10,
      ),
      # The speech-to-text model to use
      "speech_to_text": GroqSpeechToText(),
      # The text-to-speech model to use
      "text_to_speech": DeepgramTextToSpeech(),
  }
  ```
- Run the tool:

  ```bash
  # Windows
  python okra.py

  # Linux
  python3 okra.py
  ```
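If you're wondering how those API keys get picked up: a common pattern is to load them at startup with python-dotenv. The following is a sketch under that assumption; check the project source for how okra actually reads them.

```python
# Sketch: reading API keys from the project-root .env file.
# Assumes python-dotenv; okra's actual loading code may differ.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

keys = {
    name: os.getenv(name)
    for name in ("DEEPGRAM_API_KEY", "GROQ_API_KEY", "GOOGLE_API_KEY", "OPENAI_API_KEY")
}
missing = [name for name, value in keys.items() if not value]
if missing:
    print(f"Warning: missing keys: {', '.join(missing)}")
```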
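Similarly, the `image_source` option in the config decides where frames come from. Below is one plausible shape of the two capture paths, using Pillow's ImageGrab for the screen and OpenCV for the webcam; these library choices are illustrative assumptions, not necessarily what okra uses.

```python
# Sketch of the two image_source options ("screen" vs "webcam").
# Pillow and OpenCV are assumptions for illustration only.
import cv2                        # pip install opencv-python
from PIL import Image, ImageGrab  # pip install pillow

def capture(image_source: str) -> Image.Image:
    if image_source == "screen":
        return ImageGrab.grab()  # full-screen screenshot
    if image_source == "webcam":
        cam = cv2.VideoCapture(0)  # default webcam
        ok, frame = cam.read()
        cam.release()
        if not ok:
            raise RuntimeError("Could not read a frame from the webcam")
        # OpenCV frames are BGR; convert to RGB for a PIL image
        return Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    raise ValueError(f"Unknown image_source: {image_source!r}")
```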
You can edit the `okra/config.py` file to change the behavior of okra to your liking. You have access to:
- 3 LLM classes, found in the `okra.llm` subpackage:
  - `Gemini`
  - `GPT`
  - `GroqLLM`
- 3 speech-to-text classes, found in the `okra.speech` subpackage:
  - `DeepgramSpeechToText`
  - `OpenAISpeechToText`
  - `GroqSpeechToText`
- 2 text-to-speech classes, found in the `okra.speech` subpackage:
  - `DeepgramTextToSpeech`
  - `OpenAITextToSpeech`
For example, here's a configuration with:

- LLM: OpenAI
- Speech-to-text: Deepgram
- Text-to-speech: Deepgram
- Vision source: screen
```python
# okra/config.py
config: GlobalConfig = {
    "use_vision": True,
    "talk": True,
    "image_source": "screen",
    "llm": GPT(
        model_name="gpt-4o",
        system_prompt=system_prompt,
        max_history_length=10,
    ),
    "speech_to_text": DeepgramSpeechToText(),
    "text_to_speech": DeepgramTextToSpeech(),
}
```
And here's one with:

- LLM: Groq
- Speech-to-text: Deepgram
- Text-to-speech: OpenAI
- No vision
```python
# okra/config.py
config: GlobalConfig = {
    "use_vision": False,  # Groq does not support vision models (yet)
    "talk": True,
    "image_source": "screen",
    "llm": GroqLLM(
        model_name="llama3-70b-8192",
        system_prompt=system_prompt,
        max_history_length=10,
    ),
    "speech_to_text": DeepgramSpeechToText(),
    "text_to_speech": OpenAITextToSpeech(),
}
```
If you run `python okra.py -h`, you'll get:

```
usage: python okra.py [options]

Okra is your all in one desktop AI voice assistant.

options:
  -h, --help    show this help message and exit
  --skip-intro  skip intro
  --no-music    do not play intro music
```
By default, okra will play an intro cutscene and music (just for fun, lol). If you want to skip this intro, run `python okra.py --skip-intro`. If you just want to mute the music, run `python okra.py --no-music`.
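That help text is the kind a small argparse setup produces. Here's a sketch that reproduces it, reconstructed from the output above rather than taken from okra's actual code:

```python
# Sketch: an argparse setup matching the help output above.
# Reconstructed for illustration; okra's parser may be organized differently.
import argparse

parser = argparse.ArgumentParser(
    prog="python okra.py",
    usage="python okra.py [options]",
    description="Okra is your all in one desktop AI voice assistant.",
)
parser.add_argument("--skip-intro", action="store_true", help="skip intro")
parser.add_argument("--no-music", action="store_true", help="do not play intro music")
args = parser.parse_args()
```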
To exit the assistant, type 'q' and press enter in the terminal.
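In other words, the main loop watches stdin for a lone `q`, roughly like this (a sketch, not okra's exact code; in practice this would run alongside the listener thread):

```python
# Sketch: quitting when the user types 'q' and presses Enter.
while True:
    if input().strip().lower() == "q":
        print("Goodbye!")
        break
```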
Have fun!
License: MIT