* Fixes #4080: Add Pinecone and Milvus support * small code refactoring * changes review * fix PineconeTest without env vars
Showing 29 changed files with 2,067 additions and 126 deletions.
225 changes: 225 additions & 0 deletions
docs/asciidoc/modules/ROOT/pages/database-integration/vectordb/pinecone.adoc
@@ -0,0 +1,225 @@

== Pinecone

Here is a list of all available Pinecone procedures:

[opts=header, cols="1, 3"]
|===
| name | description
| apoc.vectordb.pinecone.createCollection(hostOrKey, index, similarity, size, $config) |
Creates an index, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
The default endpoint is `<hostOrKey param>/indexes`.
| apoc.vectordb.pinecone.deleteCollection(hostOrKey, index, $config) |
Deletes the index with the name specified in the 2nd parameter.
The default endpoint is `<hostOrKey param>/indexes/<index param>`.
| apoc.vectordb.pinecone.upsert(hostOrKey, index, vectors, $config) |
Upserts, in the index with the name specified in the 2nd parameter, the vectors `[{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}]`.
The default endpoint is `<hostOrKey param>/vectors/upsert`.
| apoc.vectordb.pinecone.delete(hostOrKey, index, ids, $config) |
Deletes the vectors with the specified `ids`.
The default endpoint is `<hostOrKey param>/vectors/delete`.
| apoc.vectordb.pinecone.get(hostOrKey, index, ids, $config) |
Gets the vectors with the specified `ids`.
The default endpoint is `<hostOrKey param>/vectors/fetch`.
| apoc.vectordb.pinecone.getAndUpdate(hostOrKey, index, ids, $config) |
Gets the vectors with the specified `ids`, and optionally creates/updates Neo4j entities.
The default endpoint is `<hostOrKey param>/vectors/fetch`.
| apoc.vectordb.pinecone.query(hostOrKey, index, vector, filter, limit, $config) |
Retrieves the `limit` closest vectors to the defined `vector`, in the index with the name specified in the 2nd parameter.
The default endpoint is `<hostOrKey param>/query`.
| apoc.vectordb.pinecone.queryAndUpdate(hostOrKey, index, vector, filter, limit, $config) |
Retrieves the `limit` closest vectors to the defined `vector`, in the index with the name specified in the 2nd parameter, and optionally creates/updates Neo4j entities.
The default endpoint is `<hostOrKey param>/query`.
|===

In each procedure, the 1st parameter can be either a host or a key defined in the APOC config as `apoc.pinecone.<key>.host=myHost`.

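As a minimal sketch of the key-based form, assume a hypothetical key named `mykey` (the name is an illustration, not something defined by APOC) and the index host used in the upsert example on this page. The key could be declared in `apoc.conf` like this:

[source,properties]
----
# `mykey` is a hypothetical key name; replace the value with your own index host
apoc.pinecone.mykey.host=https://test-index-ilx67g5.svc.aped-4627-b74a.pinecone.io
----

The key can then be passed instead of the full host:

[source,cypher]
----
// 'mykey' is expected to resolve to the host configured above (assumed key name);
// pass your own config map instead of {} if your setup needs one
CALL apoc.vectordb.pinecone.get('mykey', 'test-index', ['1','2'], {})
----

This keeps the host out of the queries themselves, so it can be changed in one place.
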
[NOTE]
====
The procedures create, drop, and handle an index rather than a collection as the other vectordb procedures do,
since in Pinecone a collection is a static and non-queryable copy of an index.
Nevertheless, the create and delete index procedures are named `.createCollection` and `.deleteCollection` for consistency with the other vectordb procedures.
====

The default `hostOrKey` is `"https://api.pinecone.io"`,
so in general it can be `null` with the `createCollection` and `deleteCollection` procedures.
With the other procedures, it should be the host name of the index, that is, the one indicated in the Pinecone dashboard:

image::pinecone-index.png[width=800]

=== Examples

The following examples assume we want to create and manage an index called `test-index`.

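Several of the examples in this section pass the index host as a `$host` parameter. As a minimal sketch, assuming Neo4j Browser or cypher-shell is used as the client, the parameter could be set beforehand with the `:param` client command (a client command, not a Cypher statement). The host value is the one from the upsert example and should be replaced with your own:

[source,cypher]
----
:param host => 'https://test-index-ilx67g5.svc.aped-4627-b74a.pinecone.io'
----
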
.Create an index (it leverages https://docs.pinecone.io/reference/api/control-plane/create_index[this API])
[source,cypher]
----
CALL apoc.vectordb.pinecone.createCollection(null, 'test-index', 'cosine', 4, {<optional config>})
----

.Delete an index (it leverages https://docs.pinecone.io/reference/api/control-plane/delete_index[this API])
[source,cypher]
----
CALL apoc.vectordb.pinecone.deleteCollection(null, 'test-index', {<optional config>})
----

.Upsert vectors (it leverages https://docs.pinecone.io/reference/api/data-plane/upsert[this API])
[source,cypher]
----
CALL apoc.vectordb.pinecone.upsert('https://test-index-ilx67g5.svc.aped-4627-b74a.pinecone.io',
    'test-index',
    [
        {id: '1', vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}},
        {id: '2', vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}}
    ],
    {<optional config>})
----

.Get vectors (it leverages https://docs.pinecone.io/reference/api/data-plane/fetch[this API])
[source,cypher]
----
CALL apoc.vectordb.pinecone.get($host, 'test-index', ['1','2'], {<optional config>})
----

.Example results
[opts="header"]
|===
| score | metadata | id | vector | text | entity
| null | {city: "Berlin", foo: "one"} | null | null | null | null
| null | {city: "London", foo: "two"} | null | null | null | null
| ...
|===

.Get vectors with `{allResults: true}`
[source,cypher]
----
CALL apoc.vectordb.pinecone.get($host, 'test-index', ['1','2'], {allResults: true, <optional config>})
----

.Example results
[opts="header"]
|===
| score | metadata | id | vector | text | entity
| null | {city: "Berlin", foo: "one"} | 1 | [...] | null | null
| null | {city: "London", foo: "two"} | 2 | [...] | null | null
| ...
|===

.Query vectors (it leverages https://docs.pinecone.io/reference/api/data-plane/query[this API])
[source,cypher]
----
CALL apoc.vectordb.pinecone.query($host,
    'test-index',
    [0.2, 0.1, 0.9, 0.7],
    { city: { `$eq`: "London" } },
    5,
    {allResults: true, <optional config>})
----

.Example results
[opts="header"]
|===
| score | metadata | id | vector | text | entity
| 1 | {city: "Berlin", foo: "one"} | 1 | [...] | null | null
| 0.1 | {city: "London", foo: "two"} | 2 | [...] | null | null
| ...
|===

We can define a mapping to automatically create one or more nodes and relationships by leveraging the vector metadata.

For example, if we have created 2 vectors with the above upsert procedure,
we can populate some existing nodes (i.e. `(:Test {myId: 'one'})` and `(:Test {myId: 'two'})`):

[source,cypher]
----
CALL apoc.vectordb.pinecone.queryAndUpdate($host, 'test-index',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
            embeddingKey: "vect",
            nodeLabel: "Test",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })
----

This populates the two nodes as `(:Test {myId: 'one', city: 'Berlin', vect: [vector1]})` and `(:Test {myId: 'two', city: 'London', vect: [vector2]})`,
which are returned in the `entity` column of the result.

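To try the node mapping end to end, the two `:Test` nodes need to exist before the `queryAndUpdate` call. The following is a minimal sketch of that setup and of a follow-up check, run as separate statements. It uses plain Cypher only, and the property names are the ones assumed by the mapping above (`myId` as `entityKey`, `vect` as `embeddingKey`):

[source,cypher]
----
// Before the queryAndUpdate call: create the nodes that the mapping will match on myId
CREATE (:Test {myId: 'one'}), (:Test {myId: 'two'});

// After the call: inspect the properties written from the vector metadata
MATCH (n:Test)
RETURN n.myId AS myId, n.city AS city, n.vect AS vect
ORDER BY myId;
----
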
Alternatively, we can create the node if it does not exist, via `create: true`:

[source,cypher]
----
CALL apoc.vectordb.pinecone.queryAndUpdate($host, 'test-index',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
            create: true,
            embeddingKey: "vect",
            nodeLabel: "Test",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })
----

This creates 2 new nodes as above.

Or, we can populate existing relationships (i.e. `(:Start)-[:TEST {myId: 'one'}]->(:End)` and `(:Start)-[:TEST {myId: 'two'}]->(:End)`):

[source,cypher]
----
CALL apoc.vectordb.pinecone.queryAndUpdate($host, 'test-index',
    [0.2, 0.1, 0.9, 0.7],
    {},
    5,
    { mapping: {
            embeddingKey: "vect",
            relType: "TEST",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })
----

This populates the two relationships as `()-[:TEST {myId: 'one', city: 'Berlin', vect: [vector1]}]-()`
and `()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-()`,
which are returned in the `entity` column of the result.

[NOTE]
====
We can use a mapping with the `apoc.vectordb.pinecone.getAndUpdate` procedure as well.
====

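As a sketch of that variant, assuming the mapping keys behave for `getAndUpdate` as they do for `queryAndUpdate`, the vectors are selected by their `ids` rather than by similarity, following the `getAndUpdate(hostOrKey, index, ids, $config)` signature listed at the top of this page:

[source,cypher]
----
// Fetch the vectors with ids '1' and '2' and populate the matching :Test nodes
CALL apoc.vectordb.pinecone.getAndUpdate($host, 'test-index', ['1','2'],
    { mapping: {
            embeddingKey: "vect",
            nodeLabel: "Test",
            entityKey: "myId",
            metadataKey: "foo"
        }
    })
----
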
[NOTE]
====
To optimize performance, we can choose what to `YIELD` with the `apoc.vectordb.pinecone.query*` and the `apoc.vectordb.pinecone.get*` procedures.
For example, by executing `CALL apoc.vectordb.pinecone.query(...) YIELD metadata, score, id`, the REST API request will include `{"with_payload": false, "with_vectors": false}`,
so that the values we do not need are not returned.
====

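Written out in full, such a call might look like the following sketch. The vector, the empty filter, and the limit are taken from the query examples above; the empty config map is an assumption and should be replaced with whatever your setup requires:

[source,cypher]
----
// Yield only the columns we need, so the vector values are not requested
CALL apoc.vectordb.pinecone.query($host, 'test-index', [0.2, 0.1, 0.9, 0.7], {}, 5, {})
YIELD metadata, score, id
RETURN id, score, metadata
ORDER BY score DESC
----
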
.Delete vectors (it leverages https://docs.pinecone.io/reference/api/data-plane/delete[this API])
[source,cypher]
----
CALL apoc.vectordb.pinecone.delete($host, 'test-index', ['1','2'], {<optional config>})
----