GitHub Action to regenerate OpenAI word embeddings and store them in a Supabase vector store via LangChain. Useful if you have a retrieval-augmented generation (RAG) system and want to update the word embeddings automatically when the knowledge base changes.
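- github-personal-access-token (required): GitHub personal access token
- openai-api-key (required): OpenAI API key
- supabase-anon-key (required): Supabase anon key
- supabase-url (required): Supabase URL
- repository-owner-username (required): GitHub username of the repository owner
- repository-name (required): Name of the repository
- path-to-contents (required): Path to the directory containing the notes content, relative to the root of the repository
- directory-structure (required): Either nested or flat
  - nested: path-to-contents points to a list of directories
  - flat: path-to-contents points to a list of files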
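For a knowledge base organized as directories, a workflow step using the nested layout might look like the sketch below. The input names match the example workflow in this README; the directory name notes_nested is a hypothetical placeholder, and the secret names are assumed to be the same as in that example.

```yaml
# Sketch of a step for a nested knowledge base, where path-to-contents
# points to a list of directories rather than a list of files.
# "notes_nested" is a hypothetical directory name; replace it with your own.
- name: Regenerate embeddings (nested notes)
  uses: K02D/regenerate-embeddings@v2.3
  with:
    repository-owner-username: "K02D"
    repository-name: "retrieval-augmented-generation"
    path-to-contents: "notes_nested"
    directory-structure: "nested"
    # Secret names below are assumed to match the ones used elsewhere in this README.
    github-personal-access-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
    openai-api-key: ${{ secrets.OPENAI_API_KEY }}
    supabase-anon-key: ${{ secrets.SUPABASE_ANON_KEY }}
    supabase-url: ${{ secrets.SUPABASE_URL }}
```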
Note: github-personal-access-token, openai-api-key, supabase-anon-key and supabase-url should be defined as environment secrets, since the example workflow reads them through the secrets context. See the section below
1. On the GitHub repository you're adding this action to, go to Settings > Environments and create a new environment called Dev
2. Add environment secrets to the Dev environment by following these instructions
3. Create a .github/workflows directory in the root of the project
4. In .github/workflows, create a file called regenerate-embeddings.yml
5. Copy the following YAML into regenerate-embeddings.yml
name: Regenerate embeddings
run-name: Regenerate embeddings and store in Supabase
on: [push]
jobs:
  regenerate-embeddings:
    runs-on: ubuntu-latest
    environment: Dev
    steps:
      - name: Regenerate embeddings (flat notes)
        uses: K02D/regenerate-embeddings@v2.3
        with:
          repository-owner-username: "K02D"
          repository-name: "retrieval-augmented-generation"
          path-to-contents: "notes_flat"
          directory-structure: "flat"
          github-personal-access-token: ${{ secrets.GH_PERSONAL_ACCESS_TOKEN }}
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          supabase-anon-key: ${{ secrets.SUPABASE_ANON_KEY }}
          supabase-url: ${{ secrets.SUPABASE_URL }}
This YAML:
- Assumes the environment secrets added in step 2 are named GH_PERSONAL_ACCESS_TOKEN, OPENAI_API_KEY, SUPABASE_ANON_KEY, and SUPABASE_URL
- Triggers the action on every push. As written, on: [push] applies to pushes on any branch, not only main
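Because on: [push] does not filter by branch, a trigger limited to the main branch could be sketched as follows. The paths filter is an optional addition, not part of the original workflow; it assumes the notes live under notes_flat, as in the example above, so that embeddings are regenerated only when the knowledge base itself changes.

```yaml
# Replaces the "on: [push]" line in regenerate-embeddings.yml.
on:
  push:
    # Run only for pushes to the main branch.
    branches:
      - main
    # Optional: run only when files under the notes directory change.
    # "notes_flat" is taken from the example workflow; adjust to your path-to-contents.
    paths:
      - "notes_flat/**"
```

Dropping the paths block keeps the branch restriction while still running on every push to main.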
- Create an OpenAI API key if you don't have one. Use this for OPENAI_API_KEY
  - OpenAI's API is used to generate the word embeddings
- Create a Supabase project if you don't have one. Once created, go to Project Settings > API to get the project URL and anon API key. Use these for SUPABASE_URL and SUPABASE_ANON_KEY
  - Supabase is used to store the word embeddings in a Postgres vector database so that relevant content can be retrieved when a user enters a prompt. The retrieved content augments the LLM's response
- Initialize your database in your Supabase project using LangChain's template. On your project dashboard, go to SQL Editor > Quickstarts > LangChain and click RUN