A framework for AI WhatsApp calls using Whisper, Coqui TTS, GPT-3.5 Turbo, Virtual Audio Cable, and the WhatsApp Desktop App.
Demo.mp4
- Whisper (Speech to Text)
- OpenAI GPT-3.5 Turbo
- Coqui TTS
- Virtual Audio Cable
- WhatsApp Desktop App
Download the Visual Studio Installer (some of the Python dependencies need its C++ build tools on Windows).
Note: You need two separate virtual audio cables. I use VB-Audio Virtual Cable and Virtual Audio Cable (VAC); install both.
Download the WhatsApp Desktop App.
Clone the repository and install the dependencies:

```shell
git clone https://github.com/skshadan/WhisCall.git
cd WhisCall
pip install -r requirements.txt
```
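A quick way to confirm the install worked is to try importing each package. The module names below are assumed from the component list above, not taken from requirements.txt:

```python
# Sanity check: try importing each assumed dependency.
# Module names are guesses based on the components this project uses.
for mod in ("whisper", "TTS", "openai", "pyaudio"):
    try:
        __import__(mod)
        print(f"{mod}: ok")
    except ImportError as exc:
        print(f"{mod}: MISSING ({exc})")
```

Any line printed as MISSING points at a package that failed to install.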
Run the code below to find the device indices of your virtual audio cables for the microphone and speaker.
```python
import pyaudio

def list_audio_devices():
    """Return (index, name) pairs for every input and output device."""
    p = pyaudio.PyAudio()
    info = p.get_host_api_info_by_index(0)
    num_devices = info.get('deviceCount')

    # Lists of devices to return
    speakers = []
    microphones = []

    # Scan through devices and add each to the matching list
    for i in range(num_devices):
        device = p.get_device_info_by_index(i)
        if device.get('maxInputChannels') > 0:
            microphones.append((i, device.get('name')))
        if device.get('maxOutputChannels') > 0:
            speakers.append((i, device.get('name')))

    p.terminate()
    return microphones, speakers

microphones, speakers = list_audio_devices()

print("Microphones:")
for idx, name in microphones:
    print(f"Index: {idx}, Name: {name}")

print("\nSpeakers:")
for idx, name in speakers:
    print(f"Index: {idx}, Name: {name}")
```
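Instead of reading the index off the printed list by eye, you could pick a cable automatically by matching part of its name. This helper and the sample device names are illustrative, not part of the repo's API:

```python
def find_device_index(devices, name_fragment):
    """Return the index of the first device whose name contains name_fragment.

    devices is a list of (index, name) tuples, as returned by
    list_audio_devices(); returns None when nothing matches.
    """
    fragment = name_fragment.lower()
    for idx, name in devices:
        if fragment in name.lower():
            return idx
    return None

# Example with made-up device names as they might appear on Windows:
mics = [(0, "Microphone (Realtek Audio)"),
        (2, "CABLE Output (VB-Audio Virtual Cable)")]
print(find_device_index(mics, "CABLE Output"))  # -> 2
print(find_device_index(mics, "nonexistent"))   # -> None
```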
```python
from voice import select_microphone, transcribe_audio
from response import generate_response, text_to_speech, PlayAudio

def main():
    # Pick the virtual-cable microphone that carries the call audio
    mic_index = select_microphone()

    # Transcribe incoming speech, generate a reply, and speak it back
    for text in transcribe_audio(mic_index):
        if text:
            gpt_response = generate_response(text)
            text_to_speech(gpt_response)
            PlayAudio()

if __name__ == "__main__":
    main()
```
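The main() loop above wires the three stages together. The same listen → respond → speak flow can be exercised without audio hardware by injecting stand-ins for each stage; every name below is illustrative, not the repo's actual API:

```python
def run_pipeline(transcripts, respond, speak):
    """Drive the listen -> respond -> speak loop over an iterable of transcripts.

    transcripts stands in for transcribe_audio(), respond for
    generate_response(), and speak for text_to_speech() + PlayAudio().
    """
    spoken = []
    for text in transcripts:
        if text:  # skip empty transcriptions, as main() does
            reply = respond(text)
            speak(reply)
            spoken.append(reply)
    return spoken

# Exercise the loop with stubs instead of Whisper/GPT/Coqui:
out = run_pipeline(
    ["hello", "", "how are you"],
    respond=lambda t: f"echo: {t}",
    speak=lambda r: None,
)
print(out)  # -> ['echo: hello', 'echo: how are you']
```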
If you want different voices, you need to change the TTS model as follows:
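With Coqui TTS, switching voices comes down to loading a different model name. A minimal sketch — the model name shown is one of Coqui's stock English models, and how it plugs into response.py is assumed:

```python
from TTS.api import TTS

# Load a different stock Coqui model; run `tts --list_models` on the
# command line to see every available model name.
tts = TTS(model_name="tts_models/en/ljspeech/glow-tts")

# Synthesize speech to a wav file for playback.
tts.tts_to_file(text="Hello from a different voice.", file_path="output.wav")
```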
Download Models From Here:
Feel free to open an issue if you run into problems. Contributions are also welcome.