Commit
chore: add fixture and notebook (#5602)
RogerHYang committed Dec 4, 2024
1 parent 87ab1f8 commit aeb8a61
Showing 2 changed files with 262 additions and 0 deletions.
8 changes: 8 additions & 0 deletions src/phoenix/trace/fixtures.py
@@ -222,6 +222,13 @@ class TracesFixture:
),
)

project_sessions_llama_index_rag_arize_docs_fixture = TracesFixture(
name="project_sessions_llama_index_rag_arize_docs",
project_name="SESSIONS-DEMO",
file_name="project_sessions_demo_llama_index_query_engine_arize_docs.parquet",
description="RAG queries grouped by session.id and user.id.",
)

llama_index_calculator_agent_fixture = TracesFixture(
name="llama_index_calculator_agent",
description="Traces from running the llama_index with calculator tools.",
@@ -290,6 +297,7 @@ class TracesFixture:
llama_index_calculator_agent_fixture,
vision_fixture,
anthropic_tools_fixture,
project_sessions_llama_index_rag_arize_docs_fixture,
]

NAME_TO_TRACES_FIXTURE: dict[str, TracesFixture] = {
254 changes: 254 additions & 0 deletions tutorials/tracing/project_sessions_llama_index_query_engine.ipynb
@@ -0,0 +1,254 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<center>\n",
" <p style=\"text-align:center\">\n",
" <img alt=\"phoenix logo\" src=\"https://storage.googleapis.com/arize-phoenix-assets/assets/phoenix-logo-light.svg\" width=\"200\"/>\n",
" <br>\n",
" <a href=\"https://docs.arize.com/phoenix/\">Docs</a>\n",
" |\n",
" <a href=\"https://github.com/Arize-ai/phoenix\">GitHub</a>\n",
" |\n",
" <a href=\"https://join.slack.com/t/arize-ai/shared_invite/zt-1px8dcmlf-fmThhDFD_V_48oU7ALan4Q\">Community</a>\n",
" </p>\n",
"</center>\n",
"<h1 align=\"center\">Using Sessions with LlamaIndex</h1>\n",
"\n",
"A Session is a sequence of traces representing a user's interaction with an application.\n",
"\n",
"In this tutorial, you will:\n",
"- Build and trace a simple LlamaIndex application\n",
"- Use sessions to organize traces\n",
"\n",
"ℹ️ This notebook requires an OpenAI API key."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Install Dependencies and Import Libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -Uq \"arize-phoenix[llama-index]\" gcsfs faker"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import os\n",
"from getpass import getpass\n",
"from random import sample\n",
"from urllib.request import urlopen\n",
"from uuid import uuid4\n",
"\n",
"from faker import Faker\n",
"from gcsfs import GCSFileSystem\n",
"from llama_index.core import (\n",
" Settings,\n",
" StorageContext,\n",
" load_index_from_storage,\n",
")\n",
"from llama_index.embeddings.openai import OpenAIEmbedding\n",
"from llama_index.llms.openai import OpenAI\n",
"from openinference.instrumentation import using_session, using_user\n",
"from openinference.instrumentation.llama_index import LlamaIndexInstrumentor\n",
"from tqdm import tqdm\n",
"\n",
"import phoenix as px\n",
"from phoenix.otel import register"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Configure Your OpenAI API Key\n",
"\n",
"Set your OpenAI API key if it is not already set as an environment variable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if not (openai_api_key := os.getenv(\"OPENAI_API_KEY\")):\n",
" openai_api_key = getpass(\"🔑 Enter your OpenAI API key: \")\n",
"os.environ[\"OPENAI_API_KEY\"] = openai_api_key"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Configure the Default Project and Launch Phoenix\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"🚨 Phoenix is configured with environment variables. 🚨\n",
"\n",
"In this tutorial, we change the default project that traces are sent to by setting the `PHOENIX_PROJECT_NAME` environment variable defined below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"os.environ[\"PHOENIX_PROJECT_NAME\"] = \"SESSIONS-DEMO\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Enable Phoenix tracing via `LlamaIndexInstrumentor`. Phoenix uses OpenInference, an open-source standard for capturing and storing LLM application traces, which lets LLM applications integrate seamlessly with observability solutions such as Phoenix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tracer_provider = register(endpoint=\"http://127.0.0.1:6006/v1/traces\")\n",
"LlamaIndexInstrumentor().instrument(skip_dep_check=True, tracer_provider=tracer_provider)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Launch Phoenix"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"px.launch_app()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Build Your LlamaIndex Application"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This example uses a `RetrieverQueryEngine` over a pre-built index of the Arize documentation, but you can use whatever LlamaIndex application you like.\n",
"\n",
"Download our pre-built index of the Arize docs from cloud storage and instantiate your storage context."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"file_system = GCSFileSystem(project=\"public-assets-275721\")\n",
"persist_dir = \"arize-phoenix-assets/datasets/unstructured/llm/llama-index/arize-docs/index/\"\n",
"storage_context = StorageContext.from_defaults(fs=file_system, persist_dir=persist_dir)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now ready to instantiate the query engine that will perform retrieval-augmented generation (RAG). A query engine is a generic interface in LlamaIndex that lets you ask questions over your data: it takes a natural language query and returns a rich response. Query engines are built on top of retrievers, and you can compose multiple query engines to achieve more advanced capabilities."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Settings.llm = OpenAI(model=\"gpt-4o-mini\")\n",
"Settings.embed_model = OpenAIEmbedding()\n",
"index = load_index_from_storage(storage_context)\n",
"query_engine = index.as_query_engine()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Download Sample Queries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"queries_url = \"http://storage.googleapis.com/arize-phoenix-assets/datasets/unstructured/llm/context-retrieval/arize_docs_queries.jsonl\"\n",
"with urlopen(queries_url) as response:\n",
" queries = [json.loads(line)[\"query\"] for line in response]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Group Queries by User Sessions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"session_id = str(uuid4())\n",
"session_user = Faker().user_name()\n",
"\n",
"with using_session(session_id), using_user(session_user):\n",
" for query in tqdm(sample(queries, 3)):\n",
" query_engine.query(query)"
]
},
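{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch (a hypothetical extension, not part of the original demo), you can simulate several users, each with multiple sessions, by generating a fresh `session.id` per session while reusing the same `user.id` across that user's sessions. The loop below uses only names already imported above (`Faker`, `uuid4`, `sample`, `tqdm`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical extension: two users, two sessions each, two queries per session.\n",
"for _ in range(2):\n",
"    user_name = Faker().user_name()  # one user.id shared across this user's sessions\n",
"    for _ in range(2):\n",
"        with using_session(str(uuid4())), using_user(user_name):\n",
"            for query in tqdm(sample(queries, 2)):\n",
"                query_engine.query(query)"
]
},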
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<video controls src=\"https://storage.googleapis.com/arize-phoenix-assets/assets/docs/notebooks/llama-index-knowledge-base-tutorial/project_sessions.mov\" />"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
