💬🧬 BioChatter Light

This app demonstrates workflows in biomedical research and application, assisted by large language models. Find the deployed app at https://light.biochatter.org. This app is a development platform and framework demonstration, not a commercial service. We are committed to open source and very open to comments, criticisms, and contributions! Read the preprint here!

This repository contains only the frontend code of our streamlit app. The code base used for communication with the LLMs, vector databases, and other components of our project is developed at https://github.com/biocypher/biochatter. Check there if you have your own UI and are looking for a way to connect it to the world of LLMs! If you are looking for a full-featured client-server web application, check out BioChatter Next, developed at https://github.com/biocypher/biochatter-server and https://github.com/biocypher/biochatter-next.

🚀 Quick start

If you want to build your own version of the app, you can modify all components of the workflow by cloning this repository and running the app locally. You can also deploy the app on your own server or cloud service. The app is built using Streamlit, a Python library for creating web applications from Python scripts.

Tab selection

You can use environment variables to select the tabs to display; these can also be defined in the docker-compose.yml file. The following environment variables are available:

  • Basic tabs
    • CHAT_TAB: Show the chat tab.
    • PROMPT_ENGINEERING_TAB: Show the prompt engineering tab.
    • CORRECTING_AGENT_TAB: Show the correcting agent tab.
  • Retrieval-augmented generation
    • RAG_TAB: Show the retrieval-augmented generation tab.
    • KNOWLEDGE_GRAPH_TAB: Show the knowledge graph tab.
  • Special use cases
    • CELL_TYPE_ANNOTATION_TAB: Show the cell type annotation tab.
    • EXPERIMENTAL_DESIGN_TAB: Show the experimental design tab.
    • GENETICS_ANNOTATION_TAB: Show the genetics annotation tab.
    • LAST_WEEKS_SUMMARY_TAB: Show the last week's summary tab (project management).
    • THIS_WEEKS_TASKS_TAB: Show this week's tasks tab (project management).
    • TASK_SETTINGS_PANEL_TAB: Show the task settings panel tab (project management).
    • FILLING_TEMPLATE_TAB: Show the template filling tab.
    • FILLING_TEMPLATE_API_URL: The URL with the CSV templates for the above tab.

Simply set these to true to show the corresponding tab. We also have the DOCKER_COMPOSE environment variable, which we use to signal to the app that it is running inside a Docker container; you won't need to set this manually.
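As a minimal sketch (variable names from the list above, values illustrative), you could enable just the basic tabs in your shell before launching the app:

```shell
# Enable a subset of tabs; the app shows a tab when its variable is "true"
export CHAT_TAB=true
export PROMPT_ENGINEERING_TAB=true
export CORRECTING_AGENT_TAB=true

# Then start the app, e.g. via Docker:
# docker run -e CHAT_TAB -e PROMPT_ENGINEERING_TAB -e CORRECTING_AGENT_TAB \
#   -p 8501:8501 biocypher/biochatter-light
```

The same variables can equally be set in the `environment` section of the docker-compose.yml file.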

LLM connectivity and selection

The default use case, demonstrated in docker-compose.yml, is to connect to the OpenAI API, which allows quick development and testing without the need for deploying and managing your own models. In addition, these models currently give the best overall performance. To use this feature, you have to provide your own API key in the OPENAI_API_KEY environment variable, as well as a valid, billable OpenAI account.
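A minimal sketch of passing the key through to the container (the key value is a placeholder, not a real key):

```shell
# Provide your OpenAI API key as an environment variable (placeholder value)
export OPENAI_API_KEY="sk-..."

# Pass the variable through to the container at startup:
# docker run -e OPENAI_API_KEY -p 8501:8501 biocypher/biochatter-light
```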

Open-source model deployment

We allow deployment and connection to locally hosted models via the Ollama software and Python API. Instead of providing an OpenAI API key, you can connect to the locally or remotely hosted Ollama command line interface using the OLLAMA_MODEL environment variable to select one of the available models from the Ollama library. For instance, OLLAMA_MODEL=llama3.1 will launch and connect to the 8B default variant of Llama-3.1 instruct.

By default, we connect to port 11434 on localhost, which is the initial setting for Ollama. You can change the URL, including pointing to a remote server, by setting the OLLAMA_URL environment variable. For instance, if you want to connect to a locally running Ollama instance from within a Docker container, you can set OLLAMA_URL=http://host.docker.internal:11434. This is demonstrated in the pole example repository, in line 60 of the docker-compose-ollama.yml file.
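Putting the two variables together, a sketch of an Ollama setup for the Docker case described above (model name and URL taken from the examples in this section):

```shell
# Select a model from the Ollama library
export OLLAMA_MODEL=llama3.1

# Point the app at an Ollama instance reachable from inside a Docker container
export OLLAMA_URL=http://host.docker.internal:11434

# Then start the app with both variables passed through:
# docker run -e OLLAMA_MODEL -e OLLAMA_URL -p 8501:8501 biocypher/biochatter-light
```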

Neo4j connectivity and authentication

If you want to connect a Neo4j knowledge graph to the BioChatter app, you can set some environment variables to configure the connection. The following variables are available:

  • NEO4J_URI: The URI of the Neo4j database, e.g., bolt://localhost:7687.
  • NEO4J_USER: The username for the Neo4j database, e.g., neo4j.
  • NEO4J_PASSWORD: The password for the Neo4j database, e.g., password.
  • NEO4J_DBNAME: The name of the Neo4j database to connect to, e.g., neo4j.

The knowledge graph tab allows you to specify the URI (hostname and port), username, and password in the UI. The database name defaults to neo4j; if yours differs, set the NEO4J_DBNAME environment variable.
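For a locally running Neo4j instance, the four variables above might be set as follows (example values from the list above; use your own credentials):

```shell
# Connection settings for a local Neo4j instance (example values)
export NEO4J_URI=bolt://localhost:7687
export NEO4J_USER=neo4j
export NEO4J_PASSWORD=password
export NEO4J_DBNAME=neo4j
```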

🤝 Get involved!

To stay up to date with the project, please star the repository and follow the Zulip community chat (free to join) at https://biocypher.zulipchat.com. Related discussion happens in the #biochatter stream.

We are very happy about contributions from the community, large and small! If you would like to contribute to the platform's development, please refer to our contribution guidelines. :)

Importantly, you don't need to be an expert on any of the technical aspects of the project! As long as you are interested and would like to help make this platform a great open-source tool, you're good. 🙂

Imposter syndrome disclaimer: We want your help. No, really. There may be a little voice inside your head that is telling you that you're not ready, that you aren't skilled enough to contribute. We assure you that the little voice in your head is wrong. Most importantly, there are many valuable ways to contribute besides writing code.

This disclaimer was adapted from the Pooch project.

🛠 Prompt engineering discussions

You can discuss your favourite prompt setups and share the corresponding JSON files in the discussion here! It is also a good place to find inspiration for things the model can do, such as creating formatted markdown output for mindmaps or other visualisations.

📑 Retrieval-Augmented Generation / In-context learning

You can use the Retrieval-Augmented Generation (RAG) feature to upload documents and use similarity search to inject context into your prompts. The RAG feature is currently only available in local builds of BioChatter Light (see below). It requires a connection to a vector database (currently only Milvus is supported). We follow these instructions to run a Milvus instance via Docker on your machine (using the standard ports). We provide a Docker compose setup that starts the Milvus containers and the BioChatter Light container together:

git clone https://github.com/biocypher/biochatter-light.git
cd biochatter-light
docker compose up -d

This command creates three containers for Milvus and one for BioChatter Light. After a short startup time, you can access the BioChatter Light app at http://localhost:8501.

Local deployment

Docker

The simplest way to deploy BioChatter Light on your machine is using the Docker image we provide on Docker Hub. You can run it using the following command:

docker run -p 8501:8501 biocypher/biochatter-light

You can also build the image yourself from this repository (without the additional containers for the vector database):

git clone https://github.com/biocypher/biochatter-light.git
cd biochatter-light
docker build -t biochatter-light .
docker run -p 8501:8501 biochatter-light

Note that the community key feature is not available locally, so you need to provide your own API key (either in the app or as an environment variable).

Local LLMs using Xorbits Inference

Note that connection to locally deployed models via the Xinference API is not supported in the Docker image (because the optional "xinference" dependencies of BioChatter are not installed due to their large size). If you want to use this feature, you can build the image yourself with these dependencies included by setting

biochatter = {version = "0.4.7", extras = ["xinference"]}

in the pyproject.toml file. You can then build the image as described above, or install and run the app locally using Poetry (see below).

Provide your API key

Instead of manually entering the key, you can provide it to the Docker run command as an environment variable. You can set the variable in your environment directly (export OPENAI_API_KEY=sk-...), or start the container with a text file (e.g. local.env) that contains the keys:

OPENAI_API_KEY=sk-...
...

With this file in place, you can run the following command:

docker run --env-file local.env -p 8501:8501 biochatter-light

Poetry

Local installation can be performed using Poetry (or other package managers that can work with a pyproject.toml file):

git clone https://github.com/biocypher/biochatter-light.git
cd biochatter-light
poetry install

Mac OS and Apple Silicon

For Apple Silicon machines, this must be followed by the following commands (inside the activated environment using poetry shell):

pip uninstall grpcio
mamba install grpcio  # alternatively, conda

This step is necessary due to incompatibilities in the standard ARM grpcio package. Currently, only conda-forge provides a compatible version. To avoid this issue, you can work in a devcontainer (see above).