
An open source chat bot architecture for voice/vision (and multimodal) assistants that can run locally or remotely. If you run achatbot yourself, you can learn more; star and fork to contribute~


achatbot

PyPI

achatbot factory: create chat bots with LLM, ASR, TTS, VAD, OCR, object detection, etc.

Project Structure

project-structure

Feature

  • cmd chat bots:
  • supported transport connectors:
    • pipe (UNIX socket)
    • grpc
    • queue (redis)
    • websocket
    • TCP/IP socket
  • chat bot processors:
    • aggregators (llm user/assistant messages),
    • ai_frameworks
      • langchain rag
      • llamaindex rag
      • autogen multi-agent
    • realtime voice inference (RTVI),
    • transport:
    • ai processors: llm, tts, asr, etc.
  • core module:
    • local llm:
      • llama-cpp (supports text and vision models with function calling)
      • transformers (manual, pipeline) (supports text and vision: 🦙, Qwen2-VL, Molmo, with function calling)
      • mlx_lm
    • api llm: personal-ai (OpenAI-compatible APIs and other AI providers)
  • AI modules:
    • functions:
      • search: search,search1,serper
      • weather: openweathermap
    • speech:
      • asr: sense_voice_asr, whisper_asr, whisper_timestamped_asr, whisper_faster_asr, whisper_transformers_asr, whisper_mlx_asr, lightning_whisper_mlx_asr(!TODO), whisper_groq_asr
      • audio_stream: daily_room_audio_stream(in/out), pyaudio_stream(in/out)
      • detector: porcupine_wakeword,pyannote_vad,webrtc_vad,silero_vad,webrtc_silero_vad
      • player: stream_player
      • recorder: rms_recorder, wakeword_rms_recorder, vad_recorder, wakeword_vad_recorder
      • tts: tts_chat,tts_coqui,tts_cosy_voice,tts_edge,tts_g
      • vad_analyzer: daily_webrtc_vad_analyzer,silero_vad_analyzer
    • vision:
      • OCR (Optical Character Recognition): GOT (vision_transformers_got_ocr)
      • Detector:
        • YOLO (You Only Look Once)
        • RT-DETR (RealTime End-to-End Object Detection with Transformers)
  • generate module configs (*.yaml, for local/test/prod) from env vars in the .env file; you can also use HfArgumentParser to parse a module's args on the local cmd line (see the sketch after this list)
  • deploy to cloud ☁️ serverless:
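
A minimal sketch of the HfArgumentParser pattern mentioned above; the `AudioStreamArgs` dataclass and its fields are hypothetical, for illustration only (achatbot defines its own per-module arg dataclasses):

```python
# Hedged sketch: define a module's args as a dataclass, then parse them
# from the local cmd line with transformers' HfArgumentParser.
from dataclasses import dataclass, field

from transformers import HfArgumentParser


@dataclass
class AudioStreamArgs:  # hypothetical example dataclass
    input_device_index: int = field(default=0, metadata={"help": "audio input device index"})
    rate: int = field(default=16000, metadata={"help": "sample rate in Hz"})


parser = HfArgumentParser(AudioStreamArgs)
(args,) = parser.parse_args_into_dataclasses()  # e.g. --rate 24000
print(args)
```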

Service Deployment Architecture

UI (easy to deploy with GitHub Pages or similar static hosting)

Server Deploy (CD)

Install

Note

Python >= 3.10 is required (asyncio tasks are used)

Tip

use uv + pip to install the required dependencies faster, e.g.:

uv pip install achatbot
uv pip install "achatbot[fastapi_bot_server]"

pypi

python3 -m venv .venv_achatbot
source .venv_achatbot/bin/activate
pip install achatbot
# optional-dependencies e.g.
pip install "achatbot[fastapi_bot_server]"

local

git clone --recursive https://github.com/ai-bot-pro/chat-bot.git
cd chat-bot
python3 -m venv .venv_achatbot
source .venv_achatbot/bin/activate
bash scripts/pypi_achatbot.sh dev
# optional-dependencies e.g.
pip install "dist/achatbot-{$version}-py3-none-any.whl[fastapi_bot_server]"

Run chat bots

Run chat bots with colab notebook

| Chat Bot | optional-dependencies | Colab | Device | Pipeline Desc |
| --- | --- | --- | --- | --- |
| daily_bot<br>livekit_bot | e.g.: daily_room_audio_stream \| livekit_room_audio_stream, sense_voice_asr, groq \| together api llm (text), tts_edge | Open In Colab | CPU (free, 2 cores) | e.g.: daily \| livekit room in stream -> silero (vad) -> sense_voice (asr) -> groq \| together (llm) -> edge (tts) -> daily \| livekit room out stream |
| generate_audio2audio | remote_queue_chat_bot_be_worker | Open In Colab | T4 (free) | e.g.: pyaudio in stream -> silero (vad) -> sense_voice (asr) -> qwen (llm) -> cosy_voice (tts) -> pyaudio out stream |
| daily_describe_vision_bot<br>livekit_describe_vision_bot | e.g.: daily_room_audio_stream \| livekit_room_audio_stream, sense_voice_asr, llm_transformers_manual_vision_qwen, tts_edge | Open In Colab | Qwen2-VL-2B-Instruct: T4 (free)<br>Qwen2-VL-7B-Instruct: L4<br>Llama-3.2-11B-Vision-Instruct: L4<br>allenai/Molmo-7B-D-0924: A100 | e.g.: daily \| livekit room in stream -> silero (vad) -> sense_voice (asr) -> qwen-vl (llm) -> edge (tts) -> daily \| livekit room out stream |
| daily_chat_vision_bot<br>livekit_chat_vision_bot | e.g.: daily_room_audio_stream \| livekit_room_audio_stream, sense_voice_asr, llm_transformers_manual_vision_qwen, tts_edge | Open In Colab | Qwen2-VL-2B-Instruct: T4 (free)<br>Qwen2-VL-7B-Instruct: L4<br>Llama-3.2-11B-Vision-Instruct: L4<br>allenai/Molmo-7B-D-0924: A100 | e.g.: daily \| livekit room in stream -> silero (vad) -> sense_voice (asr) -> llm answer guide qwen-vl (llm) -> edge (tts) -> daily \| livekit room out stream |
| daily_chat_tools_vision_bot<br>livekit_chat_tools_vision_bot | e.g.: daily_room_audio_stream \| livekit_room_audio_stream, sense_voice_asr, groq api llm (text), tools: llm_transformers_manual_vision_qwen, tts_edge | Open In Colab | Qwen2-VL-2B-Instruct: T4 (free)<br>Qwen2-VL-7B-Instruct: L4<br>Llama-3.2-11B-Vision-Instruct: L4<br>allenai/Molmo-7B-D-0924: A100 | e.g.: daily \| livekit room in stream -> silero (vad) -> sense_voice (asr) -> llm with tools qwen-vl -> edge (tts) -> daily \| livekit room out stream |
| daily_annotate_vision_bot<br>livekit_annotate_vision_bot | e.g.: daily_room_audio_stream \| livekit_room_audio_stream, vision_yolo_detector, tts_edge | Open In Colab | T4 (free) | e.g.: daily \| livekit room in stream -> vision_yolo_detector -> edge (tts) -> daily \| livekit room out stream |
| daily_detect_vision_bot<br>livekit_detect_vision_bot | e.g.: daily_room_audio_stream \| livekit_room_audio_stream, vision_yolo_detector, tts_edge | Open In Colab | T4 (free) | e.g.: daily \| livekit room in stream -> vision_yolo_detector -> edge (tts) -> daily \| livekit room out stream |
| daily_ocr_vision_bot<br>livekit_ocr_vision_bot | e.g.: daily_room_audio_stream \| livekit_room_audio_stream, sense_voice_asr, vision_transformers_got_ocr, tts_edge | Open In Colab | T4 (free) | e.g.: daily \| livekit room in stream -> silero (vad) -> sense_voice (asr) -> vision_transformers_got_ocr -> edge (tts) -> daily \| livekit room out stream |
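
All of the bots in the table share the same pipeline shape. A conceptual sketch in Python (stub functions for illustration only, not achatbot's actual API):

```python
# Conceptual pipeline shape: in stream -> vad -> asr -> llm -> tts -> out stream.
# All functions below are stubs for illustration, NOT achatbot's actual API.
from typing import Iterable, Iterator


def vad(frame: bytes) -> bool:      # e.g. silero: keep only speech frames
    return len(frame) > 0


def asr(frame: bytes) -> str:       # e.g. sense_voice: speech -> text
    return "hello"


def llm(text: str) -> str:          # e.g. groq | together | qwen
    return f"echo: {text}"


def tts(text: str) -> bytes:        # e.g. edge | cosy_voice: text -> audio
    return text.encode()


def run_pipeline(frames: Iterable[bytes]) -> Iterator[bytes]:
    for frame in frames:
        if vad(frame):
            yield tts(llm(asr(frame)))


if __name__ == "__main__":
    for audio_out in run_pipeline([b"\x00\x01"]):
        print(audio_out)
```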

Run local chat bots

Note

  1. run pip install "achatbot[local_terminal_chat_bot]" to install dependencies to run local terminal chat bot;

  2. create the achatbot data dirs in $HOME: mkdir -p ~/.achatbot/{log,config,models,records,videos};

  3. cp .env.example .env, then check .env and add key/value env params (see the example .env sketch after this list);

  4. select a model ckpt to download:

    • vad model ckpt (the default VAD model is silero vad)
    # vad pyannote segmentation ckpt
    huggingface-cli download pyannote/segmentation-3.0  --local-dir ~/.achatbot/models/pyannote/segmentation-3.0 --local-dir-use-symlinks False
    
    • asr model ckpt (the default whisper ckpt is the base size)
    # asr openai whisper ckpt
    wget https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt -O ~/.achatbot/models/base.pt
    
    # asr hf openai whisper ckpt for transformers pipeline to load
    huggingface-cli download openai/whisper-base  --local-dir ~/.achatbot/models/openai/whisper-base --local-dir-use-symlinks False
    
    # asr hf faster whisper (CTranslate2)
    huggingface-cli download Systran/faster-whisper-base  --local-dir ~/.achatbot/models/Systran/faster-whisper-base --local-dir-use-symlinks False
    
    # asr SenseVoice ckpt
    huggingface-cli download FunAudioLLM/SenseVoiceSmall  --local-dir ~/.achatbot/models/FunAudioLLM/SenseVoiceSmall --local-dir-use-symlinks False
    
    • llm model ckpt (the default llamacpp (GGUF) ckpt is Qwen2-Instruct 1.5B)
    # llm llamacpp Qwen2-Instruct
    huggingface-cli download Qwen/Qwen2-1.5B-Instruct-GGUF qwen2-1_5b-instruct-q8_0.gguf  --local-dir ~/.achatbot/models --local-dir-use-symlinks False
    
    # llm llamacpp Qwen1.5-chat
    huggingface-cli download Qwen/Qwen1.5-7B-Chat-GGUF qwen1_5-7b-chat-q8_0.gguf  --local-dir ~/.achatbot/models --local-dir-use-symlinks False
    
    # llm llamacpp phi-3-mini-4k-instruct
    huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf Phi-3-mini-4k-instruct-q4.gguf --local-dir ~/.achatbot/models --local-dir-use-symlinks False
    
    
    • tts model ckpt (download the ckpt for the TTS engine you choose)
    # tts chatTTS
    huggingface-cli download 2Noise/ChatTTS  --local-dir ~/.achatbot/models/2Noise/ChatTTS --local-dir-use-symlinks False
    
    # tts coquiTTS
    huggingface-cli download coqui/XTTS-v2  --local-dir ~/.achatbot/models/coqui/XTTS-v2 --local-dir-use-symlinks False
    
    # tts cosy voice
    git lfs install
    git clone https://www.modelscope.cn/iic/CosyVoice-300M.git ~/.achatbot/models/CosyVoice-300M
    git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git ~/.achatbot/models/CosyVoice-300M-SFT
    git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git ~/.achatbot/models/CosyVoice-300M-Instruct
    #git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git ~/.achatbot/models/CosyVoice-ttsfrd
    
    
  5. run the local terminal chat bot with env params; e.g.

    • use default env params to run the local chat bot
    ACHATBOT_PKG=1 TQDM_DISABLE=True \
        python -m achatbot.cmd.local-terminal-chat.generate_audio2audio > ~/.achatbot/log/std_out.log
    
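A hedged example of the .env params mentioned in step 3; the keys below are taken from the run commands in this README, and .env.example remains the authoritative list:

```ini
# example .env snippet (keys as used in the run commands in this README)
ASR_TAG=sense_voice_asr
ASR_LANG=zh
ASR_MODEL_NAME_OR_PATH=~/.achatbot/models/FunAudioLLM/SenseVoiceSmall
LLM_MODEL_NAME=qwen
LLM_MODEL_PATH=~/.achatbot/models/qwen2-1_5b-instruct-q8_0.gguf
TTS_TAG=tts_edge
```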

Run remote http fastapi daily chat bots

  1. run pip install "achatbot[fastapi_daily_bot_server]" to install dependencies to run http fastapi daily chat bot;

  2. run the cmd below to start the http server; see the api docs at http://0.0.0.0:4321/docs

    ACHATBOT_PKG=1 python -m achatbot.cmd.http.server.fastapi_daily_bot_serve
    
  3. run a chat bot processor, e.g.:

    • run a daily langchain rag bot api, with ui/educator-client

    [!NOTE] You need to process YouTube audio and save it to a local file with pytube: run pip install "achatbot[pytube,deep_translator]" to install dependencies, transcribe/translate the audio to text, then chunk it into a vector store, and run the langchain RAG bot api. Run the data process:

    ACHATBOT_PKG=1 python -m achatbot.cmd.bots.rag.data_process.youtube_audio_transcribe_to_tidb
    

    or download the processed data from the hf dataset weege007/youtube_videos, then chunk it into the vector store.

    curl -XPOST "http://0.0.0.0:4321/bot_join/chat-bot/DailyLangchainRAGBot" \
     -H "Content-Type: application/json" \
     -d $'{"config":{"llm":{"model":"llama-3.1-70b-versatile","messages":[{"role":"system","content":""}],"language":"zh"},"tts":{"tag":"cartesia_tts_processor","args":{"voice_id":"eda5bbff-1ff1-4886-8ef1-4e69a77640a0","language":"zh"}},"asr":{"tag":"deepgram_asr_processor","args":{"language":"zh","model":"nova-2"}}}}' | jq .
    
    • run a simple daily chat bot api, with ui/web-client-ui (default language: zh)
    curl -XPOST "http://0.0.0.0:4321/bot_join/DailyBot" \
     -H "Content-Type: application/json" \
     -d '{}' | jq .
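
    The same join call from Python instead of curl (a sketch using the requests library; assumes the fastapi server above is running):

```python
# Hedged sketch: join the simple DailyBot via the bot_join HTTP API.
import requests

resp = requests.post(
    "http://0.0.0.0:4321/bot_join/DailyBot",
    json={},  # empty config -> server defaults, as in the curl example
    timeout=30,
)
print(resp.json())
```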
    

Run remote rpc chat bot worker

  1. run pip install "achatbot[remote_rpc_chat_bot_be_worker]" to install dependencies to run rpc chat bot BE worker; e.g. :
    • use dufault env params to run rpc chat bot BE worker
ACHATBOT_PKG=1 RUN_OP=be TQDM_DISABLE=True \
    TTS_TAG=tts_edge \
    python -m achatbot.cmd.grpc.terminal-chat.generate_audio2audio > ~/.achatbot/log/be_std_out.log
  1. run pip install "achatbot[remote_rpc_chat_bot_fe]" to install dependencies to run rpc chat bot FE;
ACHATBOT_PKG=1 RUN_OP=fe \
    TTS_TAG=tts_edge \
    python -m achatbot.cmd.grpc.terminal-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log

Run remote queue chat bot worker

  1. run pip install "achatbot[remote_queue_chat_bot_be_worker]" to install dependencies to run queue chat bot worker; e.g.:

    • use default env params to run
    ACHATBOT_PKG=1 REDIS_PASSWORD=$redis_pwd RUN_OP=be TQDM_DISABLE=True \
        python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/be_std_out.log
    
    • sense_voice (asr) -> qwen (llm) -> cosy_voice (tts): you can log in to redislabs to create a free 30MB database; set REDIS_HOST, REDIS_PORT and REDIS_PASSWORD to run, e.g.:
     ACHATBOT_PKG=1 RUN_OP=be \
       TQDM_DISABLE=True \
       REDIS_PASSWORD=$redis_pwd \
       REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
       REDIS_PORT=14241 \
       ASR_TAG=sense_voice_asr \
       ASR_LANG=zh \
       ASR_MODEL_NAME_OR_PATH=~/.achatbot/models/FunAudioLLM/SenseVoiceSmall \
       N_GPU_LAYERS=33 FLASH_ATTN=1 \
       LLM_MODEL_NAME=qwen \
       LLM_MODEL_PATH=~/.achatbot/models/qwen1_5-7b-chat-q8_0.gguf \
       TTS_TAG=tts_cosy_voice \
       python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/be_std_out.log
    
  2. run pip install "achatbot[remote_queue_chat_bot_fe]" to install the required packages for the queue chat bot FE; e.g.:

    • use default env params to run (default vad_recorder)
    ACHATBOT_PKG=1 RUN_OP=fe \
        REDIS_PASSWORD=$redis_pwd \
        REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
        REDIS_PORT=14241 \
        python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log
    
    • with wake word
    ACHATBOT_PKG=1 RUN_OP=fe \
        REDIS_PASSWORD=$redis_pwd \
        REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
        REDIS_PORT=14241 \
        RECORDER_TAG=wakeword_rms_recorder \
        python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log
    
    • the default pyaudio player stream uses the TTS tag's out-stream sample info (rate, channels, ...); e.g. the BE uses tts_cosy_voice's out-stream info:
     ACHATBOT_PKG=1 RUN_OP=fe \
         REDIS_PASSWORD=$redis_pwd \
         REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
         REDIS_PORT=14241 \
         TTS_TAG=tts_cosy_voice \
         python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log
    

    remote_queue_chat_bot_be_worker colab examples: Open In Colab

    • sense_voice(asr) -> qwen (llm) -> cosy_voice (tts):

Run remote grpc tts speaker bot

  1. run pip install "achatbot[remote_grpc_tts_server]" to install dependencies to run grpc tts speaker bot server;
ACHATBOT_PKG=1 python -m achatbot.cmd.grpc.speaker.server.serve
  1. run pip install "achatbot[remote_grpc_tts_client]" to install dependencies to run grpc tts speaker bot client;
ACHATBOT_PKG=1 TTS_TAG=tts_edge IS_RELOAD=1 python -m achatbot.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_g IS_RELOAD=1 python -m achatbot.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_coqui IS_RELOAD=1 python -m achatbot.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_chat IS_RELOAD=1 python -m achatbot.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_cosy_voice IS_RELOAD=1 python -m achatbot.cmd.grpc.speaker.client

Multimodal Interaction

audio (voice)

  • stream-stt (realtime recorder): audio -> text

  • audio-llm (multimodal chat): pipe | queue

  • stream-tts (realtime (clone) speaker): text -> audio

vision (CV)

  • stream-ocr (realtime-object-detection)

more

  • Embodied Intelligence: Robots that touch the world, perceive and move
