feat: backend v2 refactoring (#11)

* chore: add new PriceExpert tool to fetch price of a coin

Added a new tool called PriceExpert in price_expert.py to fetch the price of a coin using the ccxt library. The tool provides both synchronous and asynchronous methods to retrieve the price of a specified coin. The tool uses the fetch_price function to get the latest price from the Binance exchange. Updated dependencies in pyproject.toml to include ccxt version 4.3.7.
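As a rough illustration of the approach (not the actual price_expert.py code), a minimal ccxt-based helper could look like the sketch below; the USDT quote pair and symbol format are assumptions:

```python
# Minimal sketch only; the real fetch_price in price_expert.py may differ.
import ccxt


def fetch_price(base: str, quote: str = "USDT") -> float:
    """Fetch the last traded price for base/quote from Binance."""
    exchange = ccxt.binance()
    # ccxt symbols use the "BASE/QUOTE" format, e.g. "ETH/USDT";
    # a later commit in this PR uppercases the base currency before the call
    symbol = f"{base.upper()}/{quote.upper()}"
    return exchange.fetch_ticker(symbol)["last"]


if __name__ == "__main__":
    print(fetch_price("eth"))  # e.g. 3012.45
```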

* chore: implemented DuneExpert tool for retrieving Dune dashboard data from the RSS3 API

* chore: typo

* chore: refactor experts code

* chore: fixed issue with fetching trades by converting base currency to uppercase before making the API call.

* chore: replace feed_expert api endpoint

* chore: replace feed_expert api endpoint

* chore: refactor price_expert.py to handle multiple exchanges for fetching prices and add error logging for failed price fetch attempts.
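A hedged sketch of what this fallback could look like; the exchange list and logging setup below are assumptions rather than the committed code:

```python
# Illustrative multi-exchange fallback with error logging; the exchange
# ordering is an assumption, not the one used in price_expert.py.
import logging

import ccxt

logger = logging.getLogger(__name__)

EXCHANGES = ["binance", "okx", "kraken"]  # hypothetical fallback order


def fetch_price_with_fallback(symbol: str) -> float | None:
    for name in EXCHANGES:
        try:
            exchange = getattr(ccxt, name)()
            return exchange.fetch_ticker(symbol)["last"]
        except Exception as exc:
            # log the failure and try the next exchange
            logger.error("failed to fetch %s from %s: %s", symbol, name, exc)
    return None
```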

* feat: refactored nft_expert.py to include search_nft_collections and collection_ranking functions for NFT collection search and ranking. Added ARGS schema for input validation and simplified request methods for API calls.

* chore: remove useless expert

* chore: refactor function_agent.py to import nft_expert instead of CollectionExpert

* chore: add pgvector_store.py for building vector store with openai embeddings and postgresql connection
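Given the dependencies added later in this PR (langchain-postgres, OpenAI embeddings), the store is plausibly wired up along these lines; the import paths and collection name are assumptions, not the committed code:

```python
# Plausible shape of pgvector_store.py; collection name and imports assumed.
from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector

from openagent.conf.env import settings

store = PGVector(
    embeddings=OpenAIEmbeddings(),
    collection_name="articles",  # assumed collection name
    connection=settings.VEC_DB_CONNECTION,
)
```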

* chore: updated openai dependency to version 1.25.2

* chore: add beautifulsoup4 and markdown dependencies to pyproject.toml and poetry.lock files.

* chore: add feed scraping functionality

- Added feed scraping functionality to fetch feeds from Mirror and IQWiki platforms
- Updated env.py to include RSS3_DATA_API endpoint
- Added .env.example entry for RSS3_DATA_API server endpoint

* chore: remove python-redis-lock from dependencies in pyproject.toml

* chore: remove useless code

* feat(index): add cursor parameter to fetch_iqwiki_feeds and fetch_feeds

- Added cursor parameter to fetch_iqwiki_feeds and fetch_feeds functions to enable pagination in fetching feeds from platforms.
- Updated feed_scrape.py and feed_indexing.py files to include the cursor parameter in the function signatures and usage.
- Implemented cursor handling in the build_index function to fetch and index feeds incrementally based on cursor pagination (see the sketch below).
- Added helper method _clear to reset content before building the index in feed_indexing.py.
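A hedged sketch of the pagination flow described above; the endpoint path, parameter names, and response shape are assumptions about the RSS3 Data API, not the committed feed_scrape.py / feed_indexing.py code:

```python
# Illustrative cursor pagination; endpoint path and response shape assumed.
import requests

from openagent.conf.env import settings


def fetch_feeds(platform: str, cursor: str | None = None) -> dict:
    params = {"platform": platform}
    if cursor:
        params["cursor"] = cursor
    # "/feeds" is a hypothetical path used for illustration
    resp = requests.get(f"{settings.RSS3_DATA_API}/feeds", params=params)
    resp.raise_for_status()
    return resp.json()


def build_index(platform: str) -> None:
    cursor = None
    while True:
        page = fetch_feeds(platform, cursor)
        # ... embed page["data"] into the vector store here ...
        # a later commit in this PR notes some responses arrive without "meta"
        cursor = (page.get("meta") or {}).get("cursor")
        if not cursor:
            break
```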

* chore: update openai version to 0.28.1

* fix: Correct import statement for OpenAIEmbeddings in pgvector_store.py

* chore: prompt modification for integrating llama3 and phi3

* chore: refactor nft_expert for unified naming

* chore: Fixed initialization of `store` variable in pgvector_store.py and moved it outside the main block.

* feat: Implemented ArticleExpert tool for searching web3 related articles. Includes search functionality with similarity score threshold and returns relevant documents in JSON format.

* chore: remove the repeated method

* chore: Fixed issue with handling response without meta data in feed_indexing.py

* chore: Created database if it does not exist in database.py.

* fix: nft expert

* chore: log more details

* chore: Add retrying library to dependencies to support retry logic in feed_scrape.py.

* chore: Refactor article expert tool to improve search accuracy and relevance for web3-related articles. Update search score threshold to 0.8 and return top 3 relevant article excerpts. Add detailed description for tool functionality and usage.

* chore: Refactor feed_indexing.py to include separate functions for indexing_mirror and indexing_iqwiki, and add save_records function for saving records before indexing.

* chore: Refactor article_expert.py to require a keyword parameter for searching articles and update the description to include information about sourcing articles from IQWiki and Mirror.

* chore: Updated langchain version to 0.1.16 and added langchain-postgres dependency.

* chore: Updated psycopg2 to version 2.9.9 in pyproject.toml

* chore: update .env.example

* chore: remove RSS3_AI_API_BASE

* chore: refactor feed indexing functions and add restart policy to vec_db container in docker-compose.yaml

* chore: remove redis_data volume from docker-compose.yaml file.

* chore: add PriceExpert for exchange rate questions

* chore: remove useless code

* chore: Add RSS3_SEARCH_API endpoint to .env.example and update search_expert.py to use the new endpoint.

* chore: rename

---------

Co-authored-by: Thomas <wxy_000000@qq.com>
Co-authored-by: Henry Wang <hi@henry.wang>
3 people authored May 17, 2024
1 parent 57fd39a commit 38d34ca
Showing 25 changed files with 1,355 additions and 664 deletions.
21 changes: 12 additions & 9 deletions backend/.env.example
@@ -1,18 +1,21 @@
# Usage: Copy this file to .env and fill in the values
# Model name, required for inference.
# For OpenAI GPT, use "gpt-4-1106-preview" or "gpt-3.5-turbo-1106" as model name.
# For local LLM, Those models with Ollama inference are tested and recommended: "solar:10.7b", "codellama:13b", "llava:13b", "deepseek-coder:33b". Other models are not tested and may not work as expected.
# For OpenAI GPT, refer to https://platform.openai.com/docs/models. We recommend using "gpt-4-turbo" for the best performance.
# For local LLM, those models with Ollama inference are tested and recommended: "solar:10.7b", "codellama:13b", "llava:13b", "deepseek-coder:33b", "llama3:8b", "phi3:3.8b". Other models are not tested and may not work as expected.
MODEL_NAME=llava:13b
# API to your LLM server, required for inference. When using OpenAI GPT, which you probably should not, use https://api.openai.com/v1
LLM_API_BASE=...
# Google Search Engine API key, required for google_expert
SERPAPI_API_KEY=...
# RSS3 AI API server endpoint, required for retrieving AI-ready data indexed from many blockchains, see https://docs.rss3.io/docs/introduction-network for more information
RSS3_AI_API_BASE=https://testnet.rss3.io/m1
# Executor API server endpoint, required for executing transactions on chain, see executor for more information
EXECUTOR_API=...
# Postgres database connection, required if you want to store data in a database
POSTGRES_SERVER=...
POSTGRES_USER=...
POSTGRES_PASSWORD=...
POSTGRES_DB=...
# NFTSCAN API key, required for nft expert
NFTSCAN_API_KEY=...
# Business logic database connection string
BIZ_DB_CONNECTION=postgresql://postgres:password@localhost:5432/copilot
# Vector database connection string
VEC_DB_CONNECTION=postgresql+psycopg://langchain:langchain@localhost:6024/langchain
# RSS3 Data API server endpoint, required for retrieving data from RSS3 network
RSS3_DATA_API=https://testnet.rss3.io/data
# RSS3 Search API server endpoint, required for searching data from RSS3 network
RSS3_SEARCH_API=https://devnet.rss3.io/search
55 changes: 16 additions & 39 deletions backend/docker-compose.yaml
@@ -1,55 +1,32 @@
version: '3.4'
services:
weaviate:
image: semitechnologies/weaviate:1.20.3
container_name: weaviate
vec_db:
image: pgvector/pgvector:pg16
container_name: vec_db
restart: unless-stopped
ports:
- "8091:8080"
env_file:
- .env
environment:
QUERY_DEFAULTS_LIMIT: 20
AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true' # disable this in production
AUTHENTICATION_APIKEY_ENABLED: 'true'
AUTHENTICATION_APIKEY_ALLOWED_KEYS: ${WEAVIATE_API_KEYS}
AUTHENTICATION_APIKEY_USERS: ${WEAVIATE_USERS}
PERSISTENCE_DATA_PATH: "./data"
DEFAULT_VECTORIZER_MODULE: text2vec-openai
ENABLE_MODULES: 'text2vec-openai,generative-openai'
POSTGRES_USER: langchain
POSTGRES_PASSWORD: langchain
POSTGRES_DB: langchain
ports:
- "6024:5432"
volumes:
- weaviate_data:/var/lib/weaviate
postgres:
- pgvector_data:/var/lib/postgresql/data

biz_db:
image: postgres:14-alpine
container_name: postgres
container_name: biz_db
restart: unless-stopped
ports:
- "5432:5432"
env_file:
- .env
environment:
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
POSTGRES_PASSWORD: password
volumes:
- pg_data:/var/lib/postgresql/data
redis:
image: redis:alpine
container_name: redis
restart: unless-stopped
ports:
- "6379:6379"
env_file:
- .env
command: /bin/sh -c "redis-server --requirepass ${REDIS_PASSWORD}"
volumes:
- redis_data:/data
weaviate-ui:
image: naaive/weaviate-ui:v1.0.3
ports:
- "7777:7777"
environment:
- WEAVIATE_URL=http://weaviate:8080
- WEAVIATE_API_KEYS=${WEAVIATE_API_KEYS}


volumes:
weaviate_data:
pgvector_data:
pg_data:
redis_data:
21 changes: 12 additions & 9 deletions backend/openagent/agent/function_agent.py
@@ -7,13 +7,16 @@

from openagent.agent.cache import init_cache
from openagent.agent.postgres_history import PostgresChatMessageHistory
from openagent.agent.system_prompt import SYSTEM_PROMPT, ollama_agent_kwargs
from openagent.agent.system_prompt import (
SYSTEM_PROMPT,
ollama_agent_kwargs,
)
from openagent.conf.env import settings
from openagent.experts.account_expert import AccountExpert
from openagent.experts.collection_expert import CollectionExpert
from openagent.experts.article_expert import ArticleExpert
from openagent.experts.feed_expert import FeedExpert
from openagent.experts.google_expert import GoogleExpert
from openagent.experts.hoot_expert import HootExpert
from openagent.experts.nft_expert import NFTExpert
from openagent.experts.price_expert import PriceExpert
from openagent.experts.search_expert import SearchExpert
from openagent.experts.swap_expert import SwapExpert
from openagent.experts.transfer_expert import TransferExpert

@@ -34,13 +34,13 @@ def get_agent(session_id: str) -> AgentExecutor:
)
# load Experts as tools for the agent
experts = [
GoogleExpert(),
SearchExpert(),
FeedExpert(),
CollectionExpert(),
AccountExpert(),
PriceExpert(),
ArticleExpert(),
NFTExpert(),
SwapExpert(),
TransferExpert(),
HootExpert(),
]

if settings.MODEL_NAME.startswith("gpt"):
6 changes: 0 additions & 6 deletions backend/openagent/agent/session_title.py
@@ -45,9 +45,3 @@ async def agen_session_title(user_id: str, session_id: str, history: str) -> lis
).update({ChatSession.title: output})
db_session.commit()
return output


if __name__ == "__main__":
import asyncio

asyncio.run(agen_session_title("123", "456", "what's your name ?"))
6 changes: 0 additions & 6 deletions backend/openagent/agent/suggested_question.py
@@ -55,9 +55,3 @@ async def agen_suggested_questions(user_id: str, history: str) -> list[str]:
lst = json.loads(output)
logger.info(f"suggested questions parsed: {lst}")
return lst


if __name__ == "__main__":
import asyncio

asyncio.run(agen_suggested_questions("123", "eth price?"))
31 changes: 19 additions & 12 deletions backend/openagent/agent/system_prompt.py
@@ -21,39 +21,46 @@
ollama_agent_kwargs = {
"prefix": """
Your designated name is RSS3 OpenAgent, developed by RSS3, \
you have the capability to call upon tools to aid in answering questions.
you have the capability to call upon tools to aid in answering questions about web3.
Assistants may prompt the user to employ specific tools to gather information that might be helpful in addressing the user's initial question.
Here are tools' schemas:
""",
"format_instructions": r"""
When responding, you must exclusively use one of the following two formats:
**Option 1:**
If you're suggesting that the user utilizes a tool, format your response as a markdown code snippet according to this schema:
```json
{{{{
"action": string, // The action to be taken. Must be one of {tool_names}
"action_input": object // The parameters for the action. MUST be JSON object
"action_input": dict // The parameters for the action. MUST be a dict object
}}}}
```
e.g.
```json
{{{{
"action": "search",
"action_input": {{{{
"query": "price of ETH",
"search_type": "google",
}}}}
}}}}
```
**Option #2:**
If you're providing a direct response to the user, format your response as a markdown code snippet following this schema:
**Option 2:**
If you have observed the tool's results, or you're providing a direct final response to the user, format your response as a markdown code snippet following this schema:
```json
{{{{
"action": "Final Answer", // MUST be literal string "Final Answer", other forms are not acceptable
"action_input": string // This should contain your response to the user, in human-readable language
}}}}
```
"action\_input" is illegal, never escape it with a backslash.
""",
"suffix": """
REMEMBER to respond with a markdown code snippet of a json \
blob with a single action, and NOTHING else""",
YOU MUST FOLLOW THESE INSTRUCTIONS CAREFULLY.
1. To respond to the user's message, you can use only one tool at a time.
2. When using a tool, only respond with the tool call. Nothing else. Do not add any additional notes, explanations or white space. Never escape with a backslash.
3. REMEMBER to respond with a markdown code snippet of a json blob with a single action, and nothing else.
""",
}
20 changes: 9 additions & 11 deletions backend/openagent/conf/env.py
@@ -8,17 +8,15 @@
class Settings(BaseSettings):
MODEL_NAME: str = Field(default="llava:13b", env="MODEL_NAME")
LLM_API_BASE: str = Field(..., env="LLM_API_BASE")
RSS3_AI_API_BASE: str = Field(..., env="RSS3_AI_API_BASE")
EXECUTOR_API: str = Field(..., env="EXECUTOR_API")
POSTGRES_SERVER: str = Field(..., env="POSTGRES_SERVER")
POSTGRES_USER: str = Field(..., env="POSTGRES_USER")
POSTGRES_PASSWORD: str = Field(..., env="POSTGRES_PASSWORD")
POSTGRES_DB: str = Field(..., env="POSTGRES_DB")
POSTGRES_CONNECTION_STRING: str = ""

def postgres_connection_string(self):
return f"postgresql://{self.POSTGRES_USER}:{self.POSTGRES_PASSWORD}\
@{self.POSTGRES_SERVER}/{self.POSTGRES_DB}"
NFTSCAN_API_KEY: str = Field(..., env="NFTSCAN_API_KEY")
BIZ_DB_CONNECTION: str = Field(..., env="BIZ_DB_CONNECTION")
VEC_DB_CONNECTION: str = Field(..., env="VEC_DB_CONNECTION")
RSS3_DATA_API: str = Field(
default="https://testnet.rss3.io/data", env="RSS3_DATA_API"
)
RSS3_SEARCH_API: str = Field(
default="https://devnet.rss3.io/search", env="RSS3_SEARCH_API"
)


settings = Settings()
9 changes: 6 additions & 3 deletions backend/openagent/db/database.py
@@ -1,12 +1,15 @@
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy_utils import create_database, database_exists

from openagent.conf.env import settings
from openagent.db.models import Base

engine = create_engine(
settings.postgres_connection_string(), connect_args={"options": "-c timezone=utc"}
)
url = settings.BIZ_DB_CONNECTION

if not database_exists(url):
create_database(url)
engine = create_engine(url, connect_args={"options": "-c timezone=utc"})
Base.metadata.create_all(bind=engine) # type: ignore

DBSession = sessionmaker(bind=engine)
11 changes: 0 additions & 11 deletions backend/openagent/experts/__init__.py
Original file line number Diff line number Diff line change
@@ -135,14 +135,3 @@ def handle_ct_token_by_address(addr) -> dict | None:
"chainId": 1,
}
return None


async def main():
token = await get_token_by_address("0x4d2bf3A34a2311dB4b3D20D4719209EDaDBf69b6")
best_token = await select_best_token("ct", "1")
print(best_token)
print(token)


if __name__ == "__main__":
asyncio.run(main())
48 changes: 0 additions & 48 deletions backend/openagent/experts/account_expert.py

This file was deleted.

54 changes: 54 additions & 0 deletions backend/openagent/experts/article_expert.py
@@ -0,0 +1,54 @@
import json
from typing import Optional, Type

from langchain.callbacks.manager import (
AsyncCallbackManagerForToolRun,
CallbackManagerForToolRun,
)
from langchain.tools import BaseTool
from pydantic import BaseModel, Field

from openagent.index.pgvector_store import store


class ARGS(BaseModel):
keyword: str = Field(
description="keyword to search for",
)


class ArticleExpert(BaseTool):
name = "article"
description = (
"A tool for searching web3-related articles. If you lack knowledge about web3, "
"you can use this tool to find relevant articles that can help answer "
"your questions. Provide a keyword or phrase related to the topic "
"you want to search for, and the tool will return a list of "
"relevant article excerpts. "
"The articles are sourced from IQWiki and Mirror."
)
args_schema: Type[ARGS] = ARGS

def _run(
self,
keyword: str,
run_manager: Optional[CallbackManagerForToolRun] = None,
) -> str:
return self.search_articles(keyword)

async def _arun(
self,
keyword: str,
run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
) -> str:
return self.search_articles(keyword)

@staticmethod
def search_articles(keyword: str) -> str:
retriever = store.as_retriever(
search_type="similarity_score_threshold",
search_kwargs={"score_threshold": 0.8, "k": 3},
)
res = retriever.get_relevant_documents(keyword)
docs = list(map(lambda x: x.page_content, res))
return json.dumps(docs)
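For illustration, the tool can be exercised directly (outside the agent's JSON action protocol) roughly like this; the keyword is arbitrary:

```python
# Illustrative direct call; inside the agent, ArticleExpert is invoked via the
# JSON "action" format defined in system_prompt.py.
expert = ArticleExpert()
print(expert.run({"keyword": "decentralized social protocols"}))
```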