neo4j-contrib · vga91 · Jul 31, 2024 · May 29, 2024 · Dec 10, 2024
diff --git a/LICENSES.txt b/LICENSES.txt
@@ -3061,6 +3061,7 @@ MIT
   jnr-x86asm-1.0.2.jar
   jsoup-1.15.3.jar
   localstack-1.17.6.jar
+  milvus-1.19.7.jar
   mockito-core-3.12.4.jar
   mssql-jdbc-6.2.1.jre7.jar
   mysql-1.17.6.jar

diff --git a/NOTICE.txt b/NOTICE.txt
@@ -462,6 +462,7 @@ MIT
   jnr-x86asm-1.0.2.jar
   jsoup-1.15.3.jar
   localstack-1.17.6.jar
+  milvus-1.19.7.jar
   mockito-core-3.12.4.jar
   mssql-jdbc-6.2.1.jre7.jar
   mysql-1.17.6.jar

diff --git a/docs/asciidoc/modules/ROOT/images/pinecone-index.png b/docs/asciidoc/modules/ROOT/images/pinecone-index.png
diff --git a/docs/asciidoc/modules/ROOT/pages/database-integration/vectordb/chroma.adoc b/docs/asciidoc/modules/ROOT/pages/database-integration/vectordb/chroma.adoc
@@ -7,6 +7,7 @@ note that the list and the signature procedures are consistent with the others,
 [opts=header, cols="1, 3"]
 |===
 | name | description
+| apoc.vectordb.chroma.info(hostOrKey, collection, $config) | Gets information about the specified existing collection, or throws a 500 error if it does not exist
 | apoc.vectordb.chroma.createCollection(hostOrKey, collection, similarity, size, $config) |
     Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
     The default endpoint is `<hostOrKey param>/api/v1/collections`.
@@ -38,6 +39,19 @@ With hostOrKey=null, the default is 'http://localhost:8000'.
 
 === Examples
 
+.Get collection info (it leverages https://docs.trychroma.com/reference/py-client#get_collection[this API])
+[source,cypher]
+----
+CALL apoc.vectordb.chroma.info(hostOrKey, 'test_collection', {<optional config>})
+----
+
+.Example results
+[opts="header"]
+|===
+| value
+| {"name": "test_collection", "metadata": {"size": 4, "hnsw:space": "cosine"}, "database": "default_database", "id": "74ebe008-1ccb-4d3d-8c5d-cdd7cfa526c2", "tenant": "default_tenant"}
+|===
+
 .Create a collection (it leverages https://docs.trychroma.com/usage-guide#creating-inspecting-and-deleting-collections[this API])
 [source,cypher]
 ----

diff --git a/docs/asciidoc/modules/ROOT/pages/database-integration/vectordb/index.adoc b/docs/asciidoc/modules/ROOT/pages/database-integration/vectordb/index.adoc
@@ -49,15 +49,17 @@ See the following pages for more details on specific vector db procedures
 - xref:./qdrant.adoc[Qdrant]
 - xref:./chroma.adoc[ChromaDB]
 - xref:./weaviate.adoc[Weaviate]
+- xref:./pinecone.adoc[Pinecone]
+- xref:./milvus.adoc[Milvus]
 
 
-== Store Vector db info (i.e. `apoc.vectordb.configure`) 
+== Store Vector db info (i.e. `apoc.vectordb.configure`)
 
 We can save some info in the System Database to be reused later, that is the host, login credentials, and mapping,
 to be used in `*.get` and `.*query` procedures, except for the `apoc.vectordb.custom.get` one.
 
 Therefore, to store the vector info, we can execute the `CALL apoc.vectordb.configure(vectorName, keyConfig, databaseName, $configMap)`,
-where `vectorName` can be "QDRANT", "CHROMA" or "WEAVIATE", 
+where `vectorName` can be "QDRANT", "CHROMA", "PINECONE", "MILVUS" or "WEAVIATE", 
 that indicates info to be reused respectively by `apoc.vectordb.qdrant.*`, `apoc.vectordb.chroma.*` and `apoc.vectordb.weaviate.*`.
 
 Then `keyConfig` is the configuration name, `databaseName` is the database where the config will be set,

diff --git a/docs/asciidoc/modules/ROOT/pages/database-integration/vectordb/pinecone.adoc b/docs/asciidoc/modules/ROOT/pages/database-integration/vectordb/pinecone.adoc
@@ -0,0 +1,225 @@
+
+== Pinecone
+
+Here is a list of all available Pinecone procedures:
+
+[opts=header, cols="1, 3"]
+|===
+| name | description
+| apoc.vectordb.pinecone.createCollection(hostOrKey, index, similarity, size, $config) |
+    Creates an index, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
+    The default endpoint is `<hostOrKey param>/indexes`.
+| apoc.vectordb.pinecone.deleteCollection(hostOrKey, index, $config) | 
+    Deletes an index with the name specified in the 2nd parameter.
+    The default endpoint is `<hostOrKey param>/indexes/<collection param>`.
+| apoc.vectordb.pinecone.upsert(hostOrKey, index, vectors, $config) | 
+    Upserts, in the index with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', metadata: '<metadata>'}].
+    The default endpoint is `<hostOrKey param>/vectors/upsert`.
+| apoc.vectordb.pinecone.delete(hostOrKey, index, ids, $config) | 
+    Delete the vectors with the specified `ids`.
+    The default endpoint is `<hostOrKey param>/vectors/delete`.
+| apoc.vectordb.pinecone.get(hostOrKey, index, ids, $config) | 
+    Get the vectors with the specified `ids`.
+    The default endpoint is `<hostOrKey param>/vectors/fetch`.
+| apoc.vectordb.pinecone.getAndUpdate(hostOrKey, index, ids, $config) | 
+    Get the vectors with the specified `ids`, and optionally creates/updates neo4j entities.
+    The default endpoint is `<hostOrKey param>/vectors/fetch`.
+| apoc.vectordb.pinecone.query(hostOrKey, index, vector, filter, limit, $config) | 
+    Retrieve the closest vectors to the defined `vector`, with at most `limit` results, in the index with the name specified in the 2nd parameter.
+    The default endpoint is `<hostOrKey param>/query`.
+| apoc.vectordb.pinecone.queryAndUpdate(hostOrKey, index, vector, filter, limit, $config) | 
+    Retrieve the closest vectors to the defined `vector`, with at most `limit` results, in the index with the name specified in the 2nd parameter, and optionally creates/updates neo4j entities.
+    The default endpoint is `<hostOrKey param>/query`.
+|===
+
+where the 1st parameter can be a key defined by the apoc config `apoc.pinecone.<key>.host=myHost`.
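+
+As a minimal sketch, assuming a hypothetical key named `myKey` and the example index host used in this page, the `apoc.conf` entry could look like this:
+
+[source,properties]
+----
+apoc.pinecone.myKey.host=https://test-index-ilx67g5.svc.aped-4627-b74a.pinecone.io
+----
+
+so that the key can then be passed as the 1st parameter in place of the host:
+
+[source,cypher]
+----
+// 'myKey' is a hypothetical key name, resolved through the `apoc.pinecone.myKey.host` setting
+CALL apoc.vectordb.pinecone.get('myKey', 'test-index', ['1','2'], {<optional config>})
+----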
+
+[NOTE]
+====
+The procedures create/drop/handle an index, instead of a collection like the other vectordb procedures, 
+since in Pinecone a collection is a static and non-queryable copy of an index.
+
+Anyway, the create / delete index procedures are named `.createCollection` and `.deleteCollection` to be consistent with the others.
+====
+
+
+The default `hostOrKey` is `"https://api.pinecone.io"`,
+so it can generally be null with the `createCollection` and `deleteCollection` procedures.
+With the other procedures, it should be equal to the index host name, that is, the one indicated in the Pinecone dashboard:
+
+image::pinecone-index.png[width=800]
+
+
+=== Examples
+
+The following examples assume we want to create and manage an index called `test-index`.
+
+.Create an index (it leverages https://docs.pinecone.io/reference/api/control-plane/create_index[this API])
+[source,cypher]
+----
+CALL apoc.vectordb.pinecone.createCollection(null, 'test-index', 'cosine', 4, {<optional config>})
+----
+
+
+.Delete an index (it leverages https://docs.pinecone.io/reference/api/control-plane/delete_index[this API])
+[source,cypher]
+----
+CALL apoc.vectordb.pinecone.deleteCollection(null, 'test-index', {<optional config>})
+----
+
+
+.Upsert vectors (it leverages https://docs.pinecone.io/reference/api/data-plane/upsert[this API])
+[source,cypher]
+----
+CALL apoc.vectordb.pinecone.upsert('https://test-index-ilx67g5.svc.aped-4627-b74a.pinecone.io',
+  'test-index',
+  [
+    {id: '1', vector: [0.05, 0.61, 0.76, 0.74], metadata: {city: "Berlin", foo: "one"}},
+    {id: '2', vector: [0.19, 0.81, 0.75, 0.11], metadata: {city: "London", foo: "two"}}
+  ],
+  {<optional config>})
+----
+
+
+.Get vectors (it leverages https://docs.pinecone.io/reference/api/data-plane/fetch[this API])
+
+[source,cypher]
+----
+CALL apoc.vectordb.pinecone.get($host, 'test-index', ['1','2'], {<optional config>})
+----
+
+
+.Example results
+[opts="header"]
+|===
+| score | metadata | id | vector | text | entity
+| null | {city: "Berlin", foo: "one"} | null | null | null | null
+| null | {city: "London", foo: "two"} | null | null | null | null
+| ...
+|===
+
+.Get vectors with `{allResults: true}`
+[source,cypher]
+----
+CALL apoc.vectordb.pinecone.get($host, 'test-index', ['1','2'], {allResults: true, <optional config>})
+----
+
+
+.Example results
+[opts="header"]
+|===
+| score | metadata | id | vector | text | entity
+| null | {city: "Berlin", foo: "one"} | 1 | [...] | null | null
+| null | {city: "London", foo: "two"} | 2 | [...] | null | null
+| ...
+|===
+
+.Query vectors (it leverages https://docs.pinecone.io/reference/api/data-plane/query[this API])
+[source,cypher]
+----
+CALL apoc.vectordb.pinecone.query($host, 
+    'test-index', 
+    [0.2, 0.1, 0.9, 0.7], 
+    { city: { `$eq`: "London" } }, 
+    5, 
+    {allResults: true, <optional config>})
+----
+
+
+.Example results
+[opts="header"]
+|===
+| score | metadata | id | vector | text | entity
+| 1 | {city: "Berlin", foo: "one"} | 1 | [...] | null | null
+| 0.1 | {city: "London", foo: "two"} | 2 | [...] | null | null
+| ...
+|===
+
+
+We can define a mapping, to auto-create one/multiple nodes and relationships, by leveraging the vector metadata.
+
+For example, if we have created 2 vectors with the above upsert procedure,
+we can populate some existing nodes (i.e. `(:Test {myId: 'one'})` and `(:Test {myId: 'two'})`):
+
+
+[source,cypher]
+----
+CALL apoc.vectordb.pinecone.queryAndUpdate($host, 'test-index',
+    [0.2, 0.1, 0.9, 0.7],
+    {},
+    5, 
+    { mapping: {
+            embeddingKey: "vect", 
+            nodeLabel: "Test", 
+            entityKey: "myId", 
+            metadataKey: "foo" 
+        }
+    })
+----
+
+which populates the two nodes as: `(:Test {myId: 'one', city: 'Berlin', vect: [vector1]})` and `(:Test {myId: 'two', city: 'London', vect: [vector2]})`,
+which will be returned in the `entity` column result.
+
+
+Alternatively, we can create a node if it does not exist, via `create: true`:
+
+[source,cypher]
+----
+CALL apoc.vectordb.pinecone.queryAndUpdate($host, 'test-index',
+    [0.2, 0.1, 0.9, 0.7],
+    {},
+    5, 
+    { mapping: {
+            create: true,
+            embeddingKey: "vect", 
+            nodeLabel: "Test", 
+            entityKey: "myId", 
+            metadataKey: "foo"
+        }
+    })
+----
+
+which creates 2 new nodes as above.
+
+Or, we can populate an existing relationship (i.e. `(:Start)-[:TEST {myId: 'one'}]->(:End)` and `(:Start)-[:TEST {myId: 'two'}]->(:End)`):
+
+
+[source,cypher]
+----
+CALL apoc.vectordb.pinecone.queryAndUpdate($host, 'test-index',
+    [0.2, 0.1, 0.9, 0.7],
+    {},
+    5, 
+    { mapping: {
+            embeddingKey: "vect", 
+            relType: "TEST", 
+            entityKey: "myId", 
+            metadataKey: "foo" 
+        }
+    })
+----
+
+which populates the two relationships as: `()-[:TEST {myId: 'one', city: 'Berlin', vect: [vector1]}]-()`
+and `()-[:TEST {myId: 'two', city: 'London', vect: [vector2]}]-()`,
+which will be returned in the `entity` column result.
+
+[NOTE]
+====
+We can use the mapping with the `apoc.vectordb.pinecone.getAndUpdate` procedure as well.
+====
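+
+For instance, a sketch of the same node mapping used above, this time fetching the two vectors by id (assuming the ids `'1'` and `'2'` from the upsert example):
+
+[source,cypher]
+----
+// same mapping as the `queryAndUpdate` examples, but the vectors are retrieved by id
+CALL apoc.vectordb.pinecone.getAndUpdate($host, 'test-index',
+    ['1','2'],
+    { mapping: {
+            embeddingKey: "vect",
+            nodeLabel: "Test",
+            entityKey: "myId",
+            metadataKey: "foo"
+        }
+    })
+----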
+
+[NOTE]
+====
+To optimize performance, we can choose what to `YIELD` with the `apoc.vectordb.pinecone.query*` and the `apoc.vectordb.pinecone.get*` procedures.
+
+For example, by executing a `CALL apoc.vectordb.pinecone.query(...) YIELD metadata, score, id`, the REST API request will include `{"with_payload": false, "with_vectors": false}`,
+so that we do not return the other values that we do not need.
+====
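+
+A sketch of such a call, reusing the query example above and returning only the yielded columns:
+
+[source,cypher]
+----
+// only `metadata`, `score` and `id` are requested, so the vector values are not needed
+CALL apoc.vectordb.pinecone.query($host,
+    'test-index',
+    [0.2, 0.1, 0.9, 0.7],
+    {},
+    5,
+    {<optional config>})
+YIELD metadata, score, id
+RETURN id, score, metadata
+----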
+
+
+
+.Delete vectors (it leverages https://docs.pinecone.io/reference/api/data-plane/delete[this API])
+[source,cypher]
+----
+CALL apoc.vectordb.pinecone.delete($host, 'test-index', ['1','2'], {<optional config>})
+----
diff --git a/docs/asciidoc/modules/ROOT/pages/database-integration/vectordb/qdrant.adoc b/docs/asciidoc/modules/ROOT/pages/database-integration/vectordb/qdrant.adoc
@@ -6,6 +6,7 @@ note that the list and the signature procedures are consistent with the others,
 [opts=header, cols="1, 3"]
 |===
 | name | description
+| apoc.vectordb.qdrant.info(hostOrKey, collection, $config) | Get information about the specified existing collection or throws a FileNotFoundException if it does not exist
 | apoc.vectordb.qdrant.createCollection(hostOrKey, collection, similarity, size, $config) |
     Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
     The default endpoint is `<hostOrKey param>/collections/<collection param>`.
@@ -38,6 +39,29 @@ With hostOrKey=null, the default is 'http://localhost:6333'.
 
 === Examples
 
+.Get collection info (it leverages https://qdrant.github.io/qdrant/redoc/index.html#tag/collections/operation/get_collection[this API])
+[source,cypher]
+----
+CALL apoc.vectordb.qdrant.info(hostOrKey, 'test_collection', {<optional config>})
+----
+
+.Example results
+[opts="header"]
+|===
+| value
+| {"result": {"optimizer_status": "ok", "points_count": 2, "vectors_count": 2, "segments_count": 8, "indexed_vectors_count": 0,
+    "config": {"params": {"on_disk_payload": true, "vectors": {"size": 4, "distance": "Cosine"}, "shard_number": 1, "replication_factor": 1, "write_consistency_factor": 1},
+        "optimizer_config": {"max_optimization_threads": 1, "indexing_threshold": 20000, "deleted_threshold": 0.2, "flush_interval_sec": 5, "memmap_threshold": null, "default_segment_number": 0, "max_segment_size": null, "vacuum_min_vector_number": 1000}, "quantization_config": null,
+        "hnsw_config": {"max_indexing_threads": 0, "full_scan_threshold": 10000, "ef_construct": 100, "m": 16, "on_disk": false},
+        "wal_config": {"wal_segments_ahead": 0, "wal_capacity_mb": 32}
+        },
+        "status": "green",
+        "payload_schema": {}
+    },
+    "time": 1.2725E-4, "status": "ok"
+}
+|===
+
 .Create a collection (it leverages https://qdrant.github.io/qdrant/redoc/index.html#tag/collections/operation/create_collection[this API])
 [source,cypher]
 ----

diff --git a/docs/asciidoc/modules/ROOT/pages/database-integration/vectordb/weaviate.adoc b/docs/asciidoc/modules/ROOT/pages/database-integration/vectordb/weaviate.adoc
@@ -6,6 +6,7 @@ note that the list and the signature procedures are consistent with the others,
 [opts=header, cols="1, 3"]
 |===
 | name | description
+| apoc.vectordb.weaviate.info($host, $collectionName, $config) | Get information about the specified existing collection or throws a FileNotFoundException if it does not exist
 | apoc.vectordb.weaviate.createCollection(hostOrKey, collection, similarity, size, $config) |
     Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
     The default endpoint is `<hostOrKey param>/schema`.
@@ -39,6 +40,33 @@ With hostOrKey=null, the default is 'http://localhost:8080/v1'.
 
 === Examples
 
+.Get collection info (it leverages https://weaviate.io/developers/weaviate/api/rest#tag/schema/get/schema/{className}[this API])
+[source, cypher]
+----
+CALL apoc.vectordb.weaviate.info($host, 'test_collection', {<optional config>})
+----
+
+.Example results
+[opts="header"]
+|===
+| value
+| {"vectorizer": "none",
+    "invertedIndexConfig": {"bm25": {"b": 0.75, "k1": 1.2}, "stopwords": {"additions": null, "removals": null, "preset": "en"}, "cleanupIntervalSeconds": 60},
+    "vectorIndexConfig": {"ef": -1, "dynamicEfMin": 100, "pq": {"centroids": 256, "trainingLimit": 100000, "encoder": {"type": "kmeans", "distribution": "log-normal"},
+    "enabled": false, "bitCompression": false, "segments": 0
+    },
+    "distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": {"enabled": false},
+    "vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64},
+    "multiTenancyConfig": {"enabled": false},
+    "vectorIndexType": "hnsw", "replicationConfig": {"factor": 1},
+    "shardingConfig": {"desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id"},
+    "class": "TestCollection",
+    "properties": [{"name": "city", "description": "This property was generated by Weaviate's auto-schema feature on Wed Jul 10 12:50:18 2024", "indexFilterable": true, "tokenization": "word", "indexSearchable": true, "dataType": ["text"]},
+        {"name": "foo", "description": "This property was generated by Weaviate's auto-schema feature on Wed Jul 10 12:50:18 2024", "indexFilterable": true, "tokenization": "word", "indexSearchable": true, "dataType": ["text"]}
+    ]
+}
+|===
+
 .Create a collection (it leverages https://weaviate.io/developers/weaviate/api/rest#tag/schema/post/schema[this API])
 [source,cypher]
 ----