Add support for CrateDB to LangChain LLM framework #1

Draft
wants to merge 28 commits into
base: release-v0.3.4
284c5d9
CrateDB vector: Add vector store support
amotl Sep 15, 2023
ba95bde
CrateDB vector: Add documentation
amotl Sep 15, 2023
00159ff
CrateDB loader: Add SQLAlchemy document loader
amotl Sep 16, 2023
473b66a
CrateDB loader: Add document loader support
amotl Sep 16, 2023
b9015c9
Community: Generalize `SQLChatMessageHistory` to improve code reusabi…
amotl Sep 17, 2023
8a0f3d6
CrateDB memory: Add conversational memory support
amotl Sep 17, 2023
3330b0d
CrateDB vector: Fix usage when only reading, and not storing
amotl Oct 27, 2023
38c2374
CrateDB vector: Unable to invoke `add_embeddings` without embeddings
amotl Oct 27, 2023
0f6adf9
CrateDB vector: Improve SQLAlchemy model factory
amotl Nov 20, 2023
2d30228
CrateDB vector: Fix cascading deletes
amotl Nov 20, 2023
9dfc828
CrateDB vector: Add CrateDBVectorSearchMultiCollection
amotl Nov 21, 2023
b72a06c
CrateDB vector: Improve SQLAlchemy data model query utility functions
amotl Nov 21, 2023
f8317fe
CrateDB vector: Improve testing when initialized without dimensionality
amotl Nov 21, 2023
53aee67
CrateDB vector: Use SA's `bulk_save_objects` method for inserting emb…
amotl Nov 21, 2023
70685ce
CrateDB vector: Test non-deterministic values by using pytest.approx
amotl Nov 22, 2023
ccd2a25
CrateDB vector: Fix initialization of vector dimensionality
amotl Nov 27, 2023
800ace6
CrateDB: Refactor to `langchain_community`
amotl Jan 18, 2024
b40c24f
CrateDB vector: Adjustments for updates to pgvector adapter
amotl Jan 18, 2024
cb06a66
CrateDB vector: Relax test constraint
amotl Jan 19, 2024
fa28b24
CrateDB loader: SQLAlchemyLoader has been superseded by SQLDatabaseLo…
amotl Jun 5, 2024
41ccacf
CrateDB: Migrate from `crate[sqlalchemy]` to `sqlalchemy-cratedb`
amotl Jun 10, 2024
3bc63a8
CrateDB: Stop using CrateDB Toolkit
amotl Jun 18, 2024
c561a95
CrateDB: Stop using local `FloatVector` implementation
amotl Jun 25, 2024
8b278a8
CrateDB: Format code. Satisfy linter and type checker. ruff + mypy
amotl Oct 24, 2024
41f6462
CrateDB: Remove adjustment to ConsistentFakeEmbeddings in langchain-core
amotl Oct 28, 2024
19a09ab
CrateDB: Refactor leftovers from langchain-core to langchain-community
amotl Oct 28, 2024
91da770
CrateDB: Remove documentation about SQLDatabaseLoader
amotl Oct 28, 2024
1faedfe
CrateDB: Remove leftovers in langchain-core
amotl Oct 28, 2024
3 changes: 2 additions & 1 deletion docs/docs/.gitignore
@@ -4,4 +4,5 @@ node_modules/

.docusaurus
.cache-loader
-docs/api
+docs/api
+example.sqlite
232 changes: 232 additions & 0 deletions docs/docs/integrations/document_loaders/cratedb.ipynb
@@ -0,0 +1,232 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# CrateDB\n",
"\n",
"This notebook demonstrates how to load documents from a [CrateDB] database\n",
"using a document loader based on [SQLAlchemy].\n",
"\n",
"The loader runs a SQL query and returns one document per result row.\n",
"\n",
"[CrateDB]: https://github.com/crate/crate\n",
"[SQLAlchemy]: https://www.sqlalchemy.org/"
]
},
{
"cell_type": "markdown",
"source": [
"## Prerequisites"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#!pip install crash langchain langchain-community sqlalchemy-cratedb"
]
},
{
"cell_type": "markdown",
"source": [
"Populate the database with example data."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CONNECT OK\n",
"PSQL OK, 1 row affected (0.001 sec)\n",
"DELETE OK, 30 rows affected (0.008 sec)\n",
"INSERT OK, 30 rows affected (0.011 sec)\n",
"CONNECT OK\n",
"REFRESH OK, 1 row affected (0.001 sec)\n"
]
}
],
"source": [
"!crash < ./example_data/mlb_teams_2012.sql\n",
"!crash --command \"REFRESH TABLE mlb_teams_2012;\""
]
},
{
"cell_type": "markdown",
"source": [
"## Usage"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from langchain_community.document_loaders import CrateDBLoader\n",
"from pprint import pprint\n",
"\n",
"CONNECTION_STRING = \"crate://crate@localhost/\"\n",
"\n",
"loader = CrateDBLoader(\n",
"    'SELECT * FROM mlb_teams_2012 ORDER BY \"Team\" LIMIT 5;',\n",
"    url=CONNECTION_STRING,\n",
")\n",
"documents = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Document(page_content='Team: Angels\\nPayroll (millions): 154.49\\nWins: 89', metadata={}),\n",
" Document(page_content='Team: Astros\\nPayroll (millions): 60.65\\nWins: 55', metadata={}),\n",
" Document(page_content='Team: Athletics\\nPayroll (millions): 55.37\\nWins: 94', metadata={}),\n",
" Document(page_content='Team: Blue Jays\\nPayroll (millions): 75.48\\nWins: 73', metadata={}),\n",
" Document(page_content='Team: Braves\\nPayroll (millions): 83.31\\nWins: 94', metadata={})]\n"
]
}
],
"source": [
"pprint(documents)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Specifying Which Columns are Content vs Metadata"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"loader = CrateDBLoader(\n",
"    'SELECT * FROM mlb_teams_2012 ORDER BY \"Team\" LIMIT 5;',\n",
"    url=CONNECTION_STRING,\n",
"    page_content_columns=[\"Team\"],\n",
"    metadata_columns=[\"Payroll (millions)\"],\n",
")\n",
"documents = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Document(page_content='Team: Angels', metadata={'Payroll (millions)': 154.49}),\n",
" Document(page_content='Team: Astros', metadata={'Payroll (millions)': 60.65}),\n",
" Document(page_content='Team: Athletics', metadata={'Payroll (millions)': 55.37}),\n",
" Document(page_content='Team: Blue Jays', metadata={'Payroll (millions)': 75.48}),\n",
" Document(page_content='Team: Braves', metadata={'Payroll (millions)': 83.31})]\n"
]
}
],
"source": [
"pprint(documents)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adding Source to Metadata"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"loader = CrateDBLoader(\n",
"    'SELECT * FROM mlb_teams_2012 ORDER BY \"Team\" LIMIT 5;',\n",
"    url=CONNECTION_STRING,\n",
"    source_columns=[\"Team\"],\n",
")\n",
"documents = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[Document(page_content='Team: Angels\\nPayroll (millions): 154.49\\nWins: 89', metadata={'source': 'Angels'}),\n",
" Document(page_content='Team: Astros\\nPayroll (millions): 60.65\\nWins: 55', metadata={'source': 'Astros'}),\n",
" Document(page_content='Team: Athletics\\nPayroll (millions): 55.37\\nWins: 94', metadata={'source': 'Athletics'}),\n",
" Document(page_content='Team: Blue Jays\\nPayroll (millions): 75.48\\nWins: 73', metadata={'source': 'Blue Jays'}),\n",
" Document(page_content='Team: Braves\\nPayroll (millions): 83.31\\nWins: 94', metadata={'source': 'Braves'})]\n"
]
}
],
"source": [
"pprint(documents)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
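Note on the notebook above: the three loader configurations differ only in how a result row is mapped to a document. The following is a minimal sketch of that mapping, inferred solely from the outputs printed in the notebook. It is not the `CrateDBLoader` implementation. `Doc` and `rows_to_docs` are hypothetical names, and joining several source columns with a comma is an assumption, since the notebook only shows a single source column.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document class."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def rows_to_docs(
    rows: Sequence[Dict[str, Any]],
    page_content_columns: Optional[List[str]] = None,
    metadata_columns: Optional[List[str]] = None,
    source_columns: Optional[List[str]] = None,
) -> List[Doc]:
    """Map result rows to documents, one document per row."""
    docs = []
    for row in rows:
        # Without an explicit selection, every column becomes page content.
        content_keys = page_content_columns or list(row.keys())
        page_content = "\n".join(f"{key}: {row[key]}" for key in content_keys)
        # Only explicitly requested columns are copied into the metadata.
        metadata = {key: row[key] for key in (metadata_columns or [])}
        if source_columns:
            # Assumption: multiple source columns are joined with a comma.
            metadata["source"] = ",".join(str(row[key]) for key in source_columns)
        docs.append(Doc(page_content=page_content, metadata=metadata))
    return docs


# Two rows copied from the notebook output above.
rows = [
    {"Team": "Angels", "Payroll (millions)": 154.49, "Wins": 89},
    {"Team": "Astros", "Payroll (millions)": 60.65, "Wins": 55},
]

default_docs = rows_to_docs(rows)
split_docs = rows_to_docs(
    rows,
    page_content_columns=["Team"],
    metadata_columns=["Payroll (millions)"],
)
source_docs = rows_to_docs(rows, source_columns=["Team"])
```

With these inputs, `default_docs` carries all columns as content and empty metadata, `split_docs` moves the payroll into the metadata, and `source_docs` adds a `source` key, which matches the three printed results in the notebook.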
@@ -1,5 +1,6 @@
-- Provisioning table "mlb_teams_2012".
--
-- crash < mlb_teams_2012.sql
-- psql postgresql://postgres@localhost < mlb_teams_2012.sql

DROP TABLE IF EXISTS mlb_teams_2012;