[NOID] Fixes #4124: Better document output of vector db procedures (#4154) #4274

Merged 1 commit on Dec 5, 2024

@@ -44,13 +44,20 @@ With hostOrKey=null, the default is 'http://localhost:8000'.
CALL apoc.vectordb.chroma.createCollection($host, 'test_collection', 'Cosine', 4, {<optional config>})
----

.Example results
[opts="header"]
|===
| name | metadata | database | id | tenant
| test_collection | {"size": 4, "hnsw:space": "cosine"} | default_database | 9c046861-f46f-417d-bd01-ca8c9f99aee5 | default_tenant
|===

.Delete a collection (it leverages https://docs.trychroma.com/usage-guide#creating-inspecting-and-deleting-collections[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.deleteCollection($host, '<collection_id>', {<optional config>})
----

which returns an empty result.

.Upsert vectors (it leverages https://docs.trychroma.com/usage-guide#adding-data-to-a-collection[this API])
[source,cypher]
@@ -63,6 +70,7 @@ CALL apoc.vectordb.qdrant.upsert($host, '<collection_id>',
{<optional config>})
----

which returns an empty result.

.Get vectors (it leverages https://docs.trychroma.com/usage-guide#querying-a-collection[this API])
[source,cypher]
@@ -149,9 +157,12 @@ CALL apoc.vectordb.chroma.query($host, '<collection_id>',



which returns a string that answers the `$question` by leveraging the embeddings stored in the vector database.

.Delete vectors (it leverages https://docs.trychroma.com/usage-guide#deleting-data-from-a-collection[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.delete($host, '<collection_id>', [1,2], {<optional config>})
----

which returns an array of strings containing the deleted ids, for example `["1", "2"]`.
@@ -0,0 +1,269 @@

= Milvus

Here is a list of all available Milvus procedures:

[opts=header, cols="1, 3"]
|===
| name | description
| apoc.vectordb.milvus.createCollection(hostOrKey, collection, similarity, size, $config) |
Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
The default endpoint is `<hostOrKey param>/v2/vectordb/collections/create`.
| apoc.vectordb.milvus.deleteCollection(hostOrKey, collection, $config) |
Deletes a collection with the name specified in the 2nd parameter.
The default endpoint is `<hostOrKey param>/v2/vectordb/collections/drop`.
| apoc.vectordb.milvus.upsert(hostOrKey, collection, vectors, $config) |
Upserts, in the collection with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}].
The default endpoint is `<hostOrKey param>/v2/vectordb/entities/upsert`.
| apoc.vectordb.milvus.delete(hostOrKey, collection, ids, $config) |
Deletes the vectors with the specified `ids`.
The default endpoint is `<hostOrKey param>/v2/vectordb/entities/delete`.
| apoc.vectordb.milvus.get(hostOrKey, collection, ids, $config) |
Gets the vectors with the specified `ids`.
The default endpoint is `<hostOrKey param>/v2/vectordb/entities/get`.
| apoc.vectordb.milvus.query(hostOrKey, collection, vector, filter, limit, $config) |
Retrieves the closest vectors to the defined `vector`, returning at most `limit` results, from the collection with the name specified in the 2nd parameter.
The default endpoint is `<hostOrKey param>/v2/vectordb/entities/search`.
| apoc.vectordb.milvus.getAndUpdate(hostOrKey, collection, ids, $config) |
Gets the vectors with the specified `ids`, and optionally creates/updates Neo4j entities.
The default endpoint is `<hostOrKey param>/v2/vectordb/entities/get`.
| apoc.vectordb.milvus.queryAndUpdate(hostOrKey, collection, vector, filter, limit, $config) |
Retrieves the closest vectors to the defined `vector`, returning at most `limit` results, from the collection with the name specified in the 2nd parameter, and optionally creates/updates Neo4j entities.
The default endpoint is `<hostOrKey param>/v2/vectordb/entities/search`.
|===

where the 1st parameter can be a key defined by the APOC config `apoc.milvus.<key>.host=myHost`.
With hostOrKey=null, the default host is 'http://localhost:19530'.
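
As a minimal sketch, assuming `apoc.conf` contains an entry for a hypothetical key `myMilvus` (the key name and host value below are illustrative, not part of the original docs), we can pass the key instead of the full URL:

[source,cypher]
----
// Assumption: apoc.conf contains the hypothetical entry
//   apoc.milvus.myMilvus.host=http://localhost:19531
// so 'myMilvus' stands in for that host URL
CALL apoc.vectordb.milvus.createCollection('myMilvus', 'test_collection', 'COSINE', 4, {})
----

Passing `null` instead of `'myMilvus'` targets the default host `http://localhost:19530`.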

== Examples

Here is a list of examples using a local installation listening on port `19531`.


.Create a collection (it leverages https://milvus.io/api-reference/restful/v2.4.x/v2/Collection%20(v2)/Create.md[this API])
[source,cypher]
----
CALL apoc.vectordb.milvus.createCollection('http://localhost:19531', 'test_collection', 'COSINE', 4, {<optional config>})
----

.Example results
[opts="header"]
|===
| data | code
| null | 200
|===

.Delete a collection (it leverages https://milvus.io/api-reference/restful/v2.4.x/v2/Collection%20(v2)/Drop.md[this API])
[source,cypher]
----
CALL apoc.vectordb.milvus.deleteCollection('http://localhost:19531', 'test_collection', {<optional config>})
----

.Example results
[opts="header"]
|===
| data | code
| null | 200
|===


.Upsert vectors (it leverages https://milvus.io/api-reference/restful/v2.4.x/v2/Vector%20(v2)/Upsert.md[this API])
[source,cypher]
----
CALL apoc.vectordb.milvus.upsert('http://localhost:19531', 'test_collection',
[
{id: 1, vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}},
{id: 2, vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}}
],
{<optional config>})
----

.Example results
[opts="header"]
|===
| data | code
| {"upsertCount": 2, "upsertId": [1, 2]} | 200
|===


.Get vectors (it leverages https://milvus.io/api-reference/restful/v2.4.x/v2/Vector%20(v2)/Get.md[this API])
[source,cypher]
----
CALL apoc.vectordb.milvus.get('http://localhost:19531', 'test_collection', [1,2], {<optional config>})
----


.Example results
[opts="header"]
|===
| score | metadata | id | vector | text | entity
| null | {city: "Berlin", foo: "one"} | null | null | null | null
| null | {city: "London", foo: "two"} | null | null | null | null
| ...
|===

.Get vectors with `{allResults: true}`
[source,cypher]
----
CALL apoc.vectordb.milvus.get('http://localhost:19531', 'test_collection', [1,2], {allResults: true, <optional config>})
----


.Example results
[opts="header"]
|===
| score | metadata | id | vector | text | entity
| null | {city: "Berlin", foo: "one"} | 1 | [...] | null | null
| null | {city: "London", foo: "two"} | 2 | [...] | null | null
| ...
|===

.Query vectors (it leverages https://milvus.io/api-reference/restful/v2.4.x/v2/Vector%20(v2)/Query.md[this API])
[source,cypher]
----
CALL apoc.vectordb.milvus.query('http://localhost:19531',
'test_collection',
[0.2, 0.1, 0.9, 0.7],
{ must:
[ { key: "city", match: { value: "London" } } ]
},
5,
{allResults: true, <optional config>})
----


.Example results
[opts="header"]
|===
| score | metadata | id | vector | text | entity
| 1 | {city: "Berlin", foo: "one"} | 1 | [...] | null | null
| 0.1 | {city: "London", foo: "two"} | 2 | [...] | null | null
| ...
|===


We can define a mapping to auto-create one or more nodes and relationships by leveraging the vector metadata.

For example, if we have created 2 vectors with the upsert procedure above,
we can populate some existing nodes (i.e. `(:Test {myId: 'one'})` and `(:Test {myId: 'two'})`):


[source,cypher]
----
CALL apoc.vectordb.milvus.queryAndUpdate('http://localhost:19531', 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
}
})
----

which populates the two nodes as: `(:Test {myId: 'one', city: 'Berlin', vect: [vector1]})` and `(:Test {myId: 'two', city: 'London', vect: [vector2]})`,
which are returned in the `entity` result column.


We can also set the mapping configuration `mode` to `CREATE_IF_MISSING` (which creates nodes if they do not exist), `READ_ONLY` (to search for nodes/rels without making updates) or `UPDATE_EXISTING` (the default behavior):

[source,cypher]
----
CALL apoc.vectordb.milvus.queryAndUpdate('http://localhost:19531', 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
mode: "CREATE_IF_MISSING",
embeddingKey: "vect",
nodeLabel: "Test",
entityKey: "myId",
metadataKey: "foo"
}
})
----

which creates 2 new nodes as above.

Or, we can populate existing relationships (i.e. `(:Start)-[:TEST {myId: 'one'}]->(:End)` and `(:Start)-[:TEST {myId: 'two'}]->(:End)`):


[source,cypher]
----
CALL apoc.vectordb.milvus.queryAndUpdate('http://localhost:19531', 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingKey: "vect",
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
----

which populates the two relationships as: `()-[:TEST {myId: 'one', city: 'Berlin', vect: [vector1]}]-()`
and `()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-()`,
which are returned in the `entity` result column.


We can also use mapping with the `apoc.vectordb.milvus.query` procedure to search for nodes/rels matching the label/type and `metadataKey`, without making updates
(i.e. equivalent to the `*.queryAndUpdate` procedure with a mapping config having `mode: "READ_ONLY"`).

For example, with the previous relationships, we can execute the following procedure, which simply returns the relationships in the column `rel`:

[source,cypher]
----
CALL apoc.vectordb.milvus.query('http://localhost:19531', 'test_collection',
[0.2, 0.1, 0.9, 0.7],
{},
5,
{ mapping: {
embeddingKey: "vect",
relType: "TEST",
entityKey: "myId",
metadataKey: "foo"
}
})
----

[NOTE]
====
We can use mapping with the `apoc.vectordb.milvus.get*` procedures as well.
====
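
For instance, here is a sketch of a mapping with `apoc.vectordb.milvus.getAndUpdate`, reusing the mapping keys from the `queryAndUpdate` examples above and assuming the 2 vectors upserted earlier and the existing `(:Test {myId: 'one'})` and `(:Test {myId: 'two'})` nodes:

[source,cypher]
----
// Fetches vectors 1 and 2 by id; the mapping matches each vector to the
// existing :Test node whose myId equals the vector's "foo" metadata value,
// and stores the embedding in the node's "vect" property
CALL apoc.vectordb.milvus.getAndUpdate('http://localhost:19531', 'test_collection', [1, 2],
{ mapping: {
    embeddingKey: "vect",
    nodeLabel: "Test",
    entityKey: "myId",
    metadataKey: "foo"
  }
})
----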

[NOTE]
====
To optimize performance, we can choose what to `YIELD` with the `apoc.vectordb.milvus.query*` and `apoc.vectordb.milvus.get*` procedures.

For example, by executing `CALL apoc.vectordb.milvus.query(...) YIELD metadata, score, id`, the REST API request will include `{"with_payload": false, "with_vectors": false}`,
so that we do not return values that we do not need.
====
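
As an illustration of the note above, here is a query that yields only `metadata`, `score` and `id`. The `RETURN` and `ORDER BY` clauses are just an example of post-processing the yielded columns:

[source,cypher]
----
// Yield only the columns we need; the vector column is not requested
CALL apoc.vectordb.milvus.query('http://localhost:19531', 'test_collection',
  [0.2, 0.1, 0.9, 0.7],
  {},
  5,
  {})
YIELD metadata, score, id
// metadata is a map, so individual keys can be read directly
RETURN id, score, metadata.city AS city
ORDER BY score DESC
----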

It is possible to execute vector db procedures together with the xref::ml/rag.adoc[apoc.ml.rag] procedure as follows:

[source,cypher]
----
CALL apoc.vectordb.milvus.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value
----

which returns a string that answers the `$question` by leveraging the embeddings stored in the vector database.

.Delete vectors (it leverages https://milvus.io/api-reference/restful/v2.4.x/v2/Vector%20(v2)/Delete.md[this API])
[source,cypher]
----
CALL apoc.vectordb.milvus.delete('http://localhost:19531', 'test_collection', [1,2], {<optional config>})
----

.Example results
[opts="header"]
|===
| data | code
| null | 200
|===
@@ -45,13 +45,25 @@ With hostOrKey=null, the default is 'http://localhost:6333'.
CALL apoc.vectordb.qdrant.createCollection($hostOrKey, 'test_collection', 'Cosine', 4, {<optional config>})
----

.Example results
[opts="header"]
|===
| result | time | status
| true | 0.094182458 | "ok"
|===

.Delete a collection (it leverages https://qdrant.github.io/qdrant/redoc/index.html#tag/collections/operation/delete_collection[this API])
[source,cypher]
----
CALL apoc.vectordb.qdrant.deleteCollection($hostOrKey, 'test_collection', {<optional config>})
----

.Example results
[opts="header"]
|===
| result | time | status
| true | 0.094182458 | "ok"
|===

.Upsert vectors (it leverages https://qdrant.github.io/qdrant/redoc/index.html#tag/points/operation/upsert_points[this API])
[source,cypher]
@@ -64,6 +76,12 @@ CALL apoc.vectordb.qdrant.upsert($hostOrKey, 'test_collection',
{<optional config>})
----

.Example results
[opts="header"]
|===
| result | time | status
| {"result": { "operation_id": 0, "status": "acknowledged" } } | 0.094182458 | "ok"
|===

.Get vectors (it leverages https://qdrant.github.io/qdrant/redoc/index.html#tag/points/operation/get_points[this API])
[source,cypher]
@@ -202,8 +220,17 @@ so that we do not return the other values that we do not need.



which returns a string that answers the `$question` by leveraging the embeddings stored in the vector database.

.Delete vectors (it leverages https://qdrant.github.io/qdrant/redoc/index.html#tag/points/operation/delete_vectors[this API])
[source,cypher]
----
CALL apoc.vectordb.qdrant.delete($hostOrKey, 'test_collection', [1,2], {<optional config>})
----

.Example results
[opts="header"]
|===
| result | time | status
| {"result": { "operation_id": 2, "status": "acknowledged" } } | 0.094182458 | "ok"
|===