dev (#1269)

* Feature/encapsulate orchestration (#1265)
* fully encapsulate orchestration
* fully encapsulate orchestration
* complete encapsulation
* revert import cmt
* making default r2r lighter (#1268)
* making default r2r lighter
* fix bug in ingest files
* checkin
* workingupdate
* complete simple orch
* update docs
* up (#1273)
* up
* up
* merge (#1276)
* Postgres configuration settings (#1277)
* Improvements on Auth in JS, CLI (#1267)
* CLI Telemetry (#1266)
* check in
* working
* redundant
* JS auth improvements (#1263)
* Check in JS auth improvements
* Update login with toke
* Fix to allow disabling telemetry
* fix lock
* Try to avoid merge conflicts
* Clean up collection bugs
* remove comments
* Add Postgres configuration settings
* Image
* bad github conflict
* merge (#1278)
* port KG to postgres (#1272)
* create + cluster
* local search
* up
* clean
* format
* basics
* add collection_id and paginate
* rename
* change api
* up
* kg_creation_status
* up
* up
* up
* Feature/cleanup docker (#1279)
* merge
* up
* rm neo4j refs and cleanup docker cmds
* fixup
* Patch/cleanup kg migration (#1281)
* cleanup kg migration
* up
* Kg testing (#1280)
* up
* up
* up
* up
* slay neo4j
---------
Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
* add back poetry lock
* Default Collections (#1282)
* Default collections
* Naughty naughty need to follow the SRP
* Testing (#1284)
* CICD
* actions
* poetry
* poetry
* Add env vars
* name
* increase timeout
* add user to collection
* Kg testing (#1283)
* up
* up
* cleanup kg migration
* up
* up
* up
* Kg testing (#1280)
* up
* up
* up
* up
* rename
* project name
* up
* add chunk order
* fragments => extractions
* bug squash
* up
* up
* up
* change postgres project name
---------
Co-authored-by: emrgnt-cmplxty <owen@algofi.org>
* Feature/fix logic bugs (#1285)
* fixing minor logic bugs in dev branch
* fixing minor logic bugs in dev branch
* merge
* Application docs
* add image (#1287)
* Add version to CLI telemetry (#1288)
* add image
* Add version to cli telemetry
* KG hatchet orchestration (#1286)
* up
* up
* cleanup kg migration
* up
* up
* up
* Kg testing (#1280)
* up
* up
* up
* up
* rename
* project name
* up
* add chunk order
* fragments => extractions
* bug squash
* up
* up
* up
* change postgres project name
* up
* up
---------
Co-authored-by: emrgnt-cmplxty <owen@algofi.org>
* Feature/update documentation rebased (#1289)
* up
* merge
* rebase
* fix ingestion issues (#1291)
* fix ingestion issues
* fix lock file
* fix embedding
* Fix SDK KG Serialization (#1292)
* add image
* serialization
* cleanup cli (#1294)
* CLI serialization (#1295)
* add image
* Fix more serialization around kg
* Nolan/schemacreation (#1296)
* add image
* Fix more serialization around kg
* add quotes to prevent reserved keywords from failing
* Prevent errors if config name is reserved name in postgres (#1297)
* Prevent reserved words (#1298)
* Move default collection id method to utils (#1299)
* Allow json fallback (#1301)
* hotfix: import
* Fix description error (#1302)
* up (#1303)
* rename to `full` (#1304)
* rename to `full`
* add html parser
* Remove postgres vecs variables (#1306)
* Feature/rename ingest files (#1307)
* rename to `full`
* add html parser
* Vec Removal (#1308)
* Remove postgres vecs variables
* up
* change kg settings parsing (#1309)
* offset + limit (#1305)
* offset + limit
* fix order
* update query
* change entity offset
* leiden seed
---------
Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
Co-authored-by: Shreyas Pimpalgaonkar <shreyas.gp.7@gmail.com>
SciPhi-AI · Oct 2, 2024 · 3721fcb
1 parent 8644a08
commit 3721fcb
Showing 257 changed files with 8,098 additions and 8,247 deletions.
diff --git a/.env.example b/.env.example
@@ -12,9 +12,3 @@ export POSTGRES_HOST=your_host
 export POSTGRES_PORT=your_port
 export POSTGRES_DBNAME=your_db
 export POSTGRES_PROJECT_NAME=your_project_name
-
-# Environment variables for KG provider (currently only Neo4j)
-# export NEO4J_USER=YOUR_NEO4J_USER
-# export NEO4J_PASSWORD=YOUR_NEO4J_PASSWORD
-# export NEO4J_URL=YOUR_NEO4J_URL
-# export NEO4J_DATABASE=YOUR_NEO4J_DATABASE
diff --git a/.github/workflows/integration-test-workflow-debian.yml b/.github/workflows/integration-test-workflow-debian.yml
@@ -1,4 +1,4 @@
-name: Debian R2R Docker Build and Integration Test (Debian GNU/Linux 12 (bookworm) amd64)
+name: R2R CLI Integration Test (Debian GNU/Linux 12 (bookworm) amd64)
 
 on:
   push:
@@ -8,124 +8,141 @@ on:
 
 jobs:
   build-and-test:
-    runs-on: arm3
+    runs-on: ubuntu-latest
     permissions:
-      packages: write
       contents: read
       id-token: write
       actions: write
     env:
       OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
       TELEMETRY_ENABLED: false
+      POSTGRES_USER: ${{ secrets.POSTGRES_USER }}
+      POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }}
+      POSTGRES_DBNAME: ${{ secrets.POSTGRES_DBNAME }}
+      POSTGRES_HOST: ${{ secrets.POSTGRES_HOST }}
+      POSTGRES_PORT: ${{ secrets.POSTGRES_PORT }}
+      POSTGRES_PROJECT_NAME: ${{ secrets.POSTGRES_PROJECT_NAME }}
 
     steps:
     - uses: actions/checkout@v4
 
-    - name: Clean up disk space
-      uses: jlumbroso/free-disk-space@main
+    - name: Set up Python
+      uses: actions/setup-python@v4
       with:
-        tool-cache: true
-        android: true
-        dotnet: true
-        haskell: true
-        large-packages: true
-        swap-storage: true
-
-    - name: Docker Auth
-      uses: docker/login-action@v3
-      with:
-        username: ${{ secrets.RAGTORICHES_DOCKER_UNAME }}
-        password: ${{ secrets.RAGTORICHES_DOCKER_TOKEN }}
-
-    - name: Set up Docker Buildx
-      uses: docker/setup-buildx-action@v3
-
-    - name: Set up QEMU
-      uses: docker/setup-qemu-action@v3
+        python-version: '3.x'
 
-    - name: Set image name
-      id: image
+    - name: Install Poetry
       run: |
-        echo "IMAGE_NAME=ragtoriches/dev" >> $GITHUB_OUTPUT
+        curl -sSL https://install.python-poetry.org | python3 -
 
-    - name: Build and Push Docker Image
-      uses: docker/build-push-action@v5
-      with:
-        context: ./py
-        file: ./py/Dockerfile
-        push: true
-        tags: ragtoriches/dev:latest
-        platforms: linux/amd64
-        no-cache: true
-        pull: true
-
-    - name: Run cloud LLM integration tests in Docker
+    - name: Install dependencies
       working-directory: ./py
       run: |
-        python3 -m venv venv
-        source venv/bin/activate
-        pip install -e .
-
-        echo "R2R Version"
-        r2r version
-
-        echo "R2R Serve --docker"
-        r2r serve --docker --exclude-neo4j=true --exclude-ollama=true  --image=ragtoriches/dev:latest
+        poetry install -E core -E ingestion-bundle
 
+    - name: Start R2R server
+      working-directory: ./py
+      run: |
+        poetry run r2r serve &
         echo "Waiting for services to start..."
         sleep 30
 
+    - name: Run integration tests
+      working-directory: ./py
+      run: |
+        echo "R2R Version"
+        poetry run r2r version
+
+    - name: Walkthrough
+      working-directory: ./py
+      run: |
         echo "Ingest Data"
-        r2r ingest-sample-files
+        poetry run r2r ingest-sample-files
 
         echo "Get Documents Overview"
-        r2r documents-overview
+        poetry run r2r documents-overview
 
         echo "Get Document Chunks"
-        r2r document-chunks --document-id=77f67c65-6406-5076-8176-3844f3ef3688
+        poetry run r2r document-chunks --document-id=9fbe403b-c11c-5aae-8ade-ef22980c3ad1
 
         echo "Delete Documents"
-        r2r delete --filter="document_id:eq:f25fd516-5cac-5c09-b120-0fc841270c7e"
+        poetry run r2r delete --filter=document_id:eq:9fbe403b-c11c-5aae-8ade-ef22980c3ad1
+
+        echo "Update Document"
+        poetry run r2r update-files core/examples/data/aristotle_v2.txt --document-ids=9fbe403b-c11c-5aae-8ade-ef22980c3ad1
 
         echo "Vector Search"
-        r2r search --query="What was Uber'\''s profit in 2020?"
+        poetry run r2r search --query="What was Uber's profit in 2020?"
 
         echo "Hybrid Search"
-        r2r search --query="What is a fierce nerd?" --use-hybrid-search
+        r2r search --query="What was Uber's profit in 2020?" --use-hybrid-search
 
         echo "Basic RAG"
-        r2r rag --query="What was Uber'\''s profit in 2020?"
+        poetry run r2r rag --query="What was Uber's profit in 2020?"
 
         echo "RAG with Hybrid Search"
-        r2r rag --query="Who is John Snow?" --use-hybrid-search
+        poetry run r2r rag --query="Who is John Snow?" --use-hybrid-search
 
         echo "Streaming RAG"
-        r2r rag --query="What was Lyft'\''s profit in 2020?" --stream
+        poetry run r2r rag --query="who was aristotle" --use-hybrid-search --stream
+
+        echo "User Registration"
+        curl -X POST http://localhost:7272/v2/register \
+          -H "Content-Type: application/json" \
+          -d '{
+            "email": "test@example.com",
+            "password": "password123"
+          }'
+
+        echo "User Login"
+        curl -X POST http://localhost:7272/v2/login \
+        -H "Content-Type: application/x-www-form-urlencoded" \
+        -d "username=test@example.com&password=password123"
 
         echo "Users Overview"
-        r2r users-overview
+        poetry run r2r users-overview
+
+        echo "Logging"
+        poetry run r2r logs
 
         echo "Analytics"
-        r2r analytics --filters '{"search_latencies": "search_latency"}' --analysis-types '{"search_latencies": ["basic_statistics", "search_latency"]}'
+        poetry run r2r analytics --filters '{"search_latencies": "search_latency"}' --analysis-types '{"search_latencies": ["basic_statistics", "search_latency"]}'
 
-        echo "Logging"
-        r2r logs
+    - name: GraphRAG
+      working-directory: ./py
+      run: |
+        echo "Create Knowledge Graph"
+        poetry run r2r create-graph --document-ids=9fbe403b-c11c-5aae-8ade-ef22980c3ad1
 
-        echo "Docker Down"
-        r2r docker-down
+        echo "Inspect Knowledge Graph"
+        poetry run r2r inspect-knowledge-graph
 
-  cleanup:
-    needs: build-and-test
-    runs-on: arm3
-    if: always()
-    steps:
-    - name: Clean up Virtual Environment
+        echo "Graph Enrichment"
+        poetry run r2r enrich-graph
+
+        echo "Local Search"
+        r2r search --query="Who is Aristotle?" --use-kg-search --kg-search-type=local
+
+        echo "Global Search"
+        r2r search --query="What were Aristotles key contributions to philosophy?" --use-kg-search --kg-search-type=global --max-llm-queries-for-global-search=100
+
+        echo "RAG"
+        r2r rag --query="What are the key contributions of Aristotle to modern society?" --use-kg-search --kg-search-type=global --max-llm-queries-for-global-search=100
+
+
+
+
+
+
+    - name: Advanced RAG
+      working-directory: ./py
+      run: |
+        echo "HyDE"
+        poetry run r2r rag --query="who was aristotle" --use-hybrid-search --stream --search-strategy=hyde
+
+        echo "Rag-Fusion"
+        r2r rag --query="Explain the theory of relativity" --use-hybrid-search --stream --search-strategy=rag_fusion
+
+    - name: Stop R2R server
       run: |
-        if [ -d "venv" ]; then
-          deactivate || true
-          rm -rf venv
-        fi
-        docker stop $(docker ps -a -q) || true
-        docker system prune -af --volumes
-        docker network prune --force
-        docker volume rm $(docker volume ls -qf dangling=true) || true
+        pkill -f "r2r serve"
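The new workflow starts the server in the background and then waits a fixed `sleep 30` before running the CLI walkthrough. As a minimal sketch of an alternative, the helper below polls a TCP port until it accepts connections. The function name `wait_for_port` and its parameters are hypothetical and not part of R2R or this commit; port 7272 is taken from the `curl` calls in the workflow.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; it only checks that something is
    listening, not that the application behind the port is fully initialized.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example use in a CI step (assumes the server listens on localhost:7272):
#   if not wait_for_port("localhost", 7272, timeout=60):
#       raise SystemExit("R2R server did not start in time")
```

Polling returns as soon as the port opens, so a fast start does not pay the full 30 seconds and a slow start fails with an explicit message rather than a confusing CLI error later in the job.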
diff --git a/docs/api-reference/openapi.json b/docs/api-reference/openapi.json
diff --git a/docs/cookbooks/application.mdx b/docs/cookbooks/application.mdx
@@ -6,8 +6,6 @@ icon: 'display'
 
 R2R offers an [open-source React+Next.js application](https://github.com/SciPhi-AI/R2R-Application) designed to give developers an administrative portal for their R2R deployment, and users an application to communicate with out of the box.
 
-In addition, R2R comes with an orchestration GUI powered by Hatchet, which you can learn about [here](/cookbooks/orchestration).
-
 ## Setup
 
 ### Install PNPM
@@ -36,7 +34,7 @@ After installation, you may need to add PNPM to your system's PATH.
 
 ### Installing and Running the R2R Dashboard
 
-If you're running R2R with the Docker, you already have the R2R dashboard running! Just navigate to [http://localhost:3000](http://localhost:3000).
+If you're running R2R with the Docker, you already have the R2R application running! Just navigate to [http://localhost:7273](http://localhost:7273).
 
 If you're running R2R outside of Docker, run the following commands to install the R2R Dashboard.
 
@@ -74,15 +72,27 @@ By default, an R2R instance is hosted on port 7272. The login page will include
 
 ### Documents
 
-The documents page provides an overview of uploaded documents and their metadata. You can upload new documents and update or delete existing ones.
+The documents page provides an overview of uploaded documents and their metadata. You can upload new documents and update, download, or delete existing ones. Additionally, you can view information about each document, including the documents' chunks and previews of PDFs.
 
 ![Documents Page](/images/oss_dashboard_documents.png)
 
-### Playground
+### Collections
+
+Collections allow users to create and share sets of documents. The collections page provides a place to manage your existing collections or create new collections.
+
+![Collections Page](/images/oss_collections_page.png)
+
+### Chat
+
+In the chat page, you can stream RAG responses with different models and configurable settings. You can interact with both the RAG Agent and RAG endpoints here.
+
+![Chat Interface](/images/chat.png)
+
+### Users
 
-The playground allows streaming RAG responses with different models and configurable settings.
+Manage your users and gain insight into their interactions.
 
-![Playground Interface](/images/playground.png)
+![Users Page](/images/users.png)
 
 ### Logs
 

diff --git a/docs/cookbooks/graphrag.mdx b/docs/cookbooks/graphrag.mdx
@@ -30,25 +30,25 @@ r2r serve
 <Accordion icon="gear" title="Configuration: r2r.toml">
 ``` toml
 [kg]
-provider = "neo4j"
+provider = "postgres"
 batch_size = 256
 
   [kg.kg_creation_settings]
-    kg_extraction_prompt = "graphrag_triplet_extraction_zero_shot"
+    kg_triples_extraction_prompt = "graphrag_triples_extraction_few_shot"
     entity_types = [] # if empty, all entities are extracted
     relation_types = [] # if empty, all relations are extracted
     max_knowledge_triples = 100
     fragment_merge_count = 4 # number of fragments to merge into a single extraction
-    generation_config = { model = "gpt-4o-mini" } # and other params, model used for triplet extraction
+    generation_config = { model = "openai/gpt-4o-mini" } # and other params, model used for triplet extraction
 
   [kg.kg_enrichment_settings]
     max_description_input_length = 65536 # increase if you want more comprehensive descriptions
     max_summary_input_length = 65536 # increase if you want more comprehensive summaries
-    generation_config = { model = "gpt-4o-mini" } # and other params, model used for node description and graph clustering
-    leiden_params = { max_levels = 10 } # more params here: https://neo4j.com/docs/graph-data-science/current/algorithms/leiden/
+    generation_config = { model = "openai/gpt-4o-mini" } # and other params, model used for node description and graph clustering
+    leiden_params = {}
 
   [kg.kg_search_settings]
-    generation_config = { model = "gpt-4o-mini" }
+    generation_config = { model = "openai/gpt-4o-mini" }
 ```
 </Accordion>
 </Tab>
@@ -92,13 +92,13 @@ batch_size = 32
 add_title_as_prefix = true
 
 [parsing]
-excluded_parsers = [ "gif", "jpeg", "jpg", "png", "svg", "mp3", "mp4" ]
+excluded_parsers = [ "mp4" ]
 
 [kg]
-provider = "neo4j"
+provider = "postgres"
 
   [kg.kg_creation_settings]
-    kg_extraction_prompt = "graphrag_triplet_extraction_zero_shot"
+    kg_triples_extraction_prompt = "graphrag_triples_extraction_few_shot"
     entity_types = [] # if empty, all entities are extracted
     relation_types = [] # if empty, all relations are extracted
     max_knowledge_triples = 100
@@ -109,7 +109,7 @@ provider = "neo4j"
     max_description_input_length = 65536 # increase if you want more comprehensive descriptions
     max_summary_input_length = 65536
     generation_config = { model = "ollama/llama3.1" } # and other params, model used for node description and graph clustering
-    leiden_params = { max_levels = 10 } # more params here: https://neo4j.com/docs/graph-data-science/current/algorithms/leiden/
+    leiden_params = {}
 
   [kg.kg_search_settings]
     generation_config = { model = "ollama/llama3.1" }
@@ -175,10 +175,7 @@ r2r create-graph --document-ids=9fbe403b-c11c-5aae-8ade-ef22980c3ad1
 [{'message': 'Graph creation task queued successfully.', 'task_id': 'd9dae1bb-5862-4a16-abaf-5297024df390'}]
 ```
 
-This step will create a knowledge graph with nodes and relationships. You can visualize the graph in two ways:
-
-
-1. Using the neo4j browser on `http://localhost:7474`. The username and password are `neo4j` and `ineedastrongerpassword`. To visualize the graph, run the following command in the neo4j browser:
+This step will create a knowledge graph with nodes and relationships. Below is a visualization of the graph which we produced with Neo4j:
 
 ```
 MATCH (a)

diff --git a/docs/cookbooks/observability.mdx b/docs/cookbooks/observability.mdx
@@ -254,7 +254,7 @@ queries = [
 # Perform random searches
 for _ in range(1000):
     query = random.choice(queries)
-    app.rag(query, GenerationConfig(model="gpt-4o-mini"))
+    app.rag(query, GenerationConfig(model="openai/gpt-4o-mini"))
 
 print("Preloading complete. You can now run analytics on this data.")
 ```
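Throughout this commit, model identifiers in the docs change from a bare name (`gpt-4o-mini`) to a `provider/model` form (`openai/gpt-4o-mini`), matching the existing `ollama/llama3.1` entries. A minimal sketch of a helper that applies that convention to older strings follows. The name `with_provider_prefix` and the default provider are assumptions for illustration, not part of the R2R API.

```python
def with_provider_prefix(model: str, default_provider: str = "openai") -> str:
    """Return `model` unchanged if it already names a provider, else prefix it.

    Hypothetical helper: it treats any "/" in the string as a provider
    separator, which is an assumption based on the identifiers in this diff.
    """
    if "/" in model:
        return model
    return f"{default_provider}/{model}"
```

For example, `with_provider_prefix("gpt-4o-mini")` yields `"openai/gpt-4o-mini"`, while `"ollama/llama3.1"` passes through unchanged.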