update for programmatic index creation (#342)
Co-authored-by: Adam Breindel <c-adam.breindel@anyscale.com>
adbreind and Adam Breindel authored Sep 24, 2024
1 parent b319409 commit 32d6ed6
Showing 2 changed files with 145 additions and 131 deletions.
@@ -71,7 +71,7 @@
 "id": "3986fc0a-86f6-4b6e-aed0-cee807243c9f",
 "metadata": {},
 "source": [
-"## Test our database connection"
+"## Test database connection"
 ]
 },
 {
@@ -84,7 +84,7 @@
 "import pymongo\n",
 "from pymongo import MongoClient, ASCENDING, DESCENDING\n",
 "import os\n",
-"from pymongo.operations import IndexModel"
+"from pymongo.operations import IndexModel, SearchIndexModel"
 ]
 },
 {
@@ -124,7 +124,7 @@
 "source": [
 "### Setup collection\n",
 "\n",
-"Run this code one time after you've created your database, to set up the collection and basic indexes"
+"Run this code one time after you've created your database, to set up the collection and indexes"
 ]
 },
 {
@@ -148,163 +148,102 @@
 ")"
 ]
 },
-{
-"cell_type": "markdown",
-"id": "4d47c025-c220-4ac8-904f-6a1224a36f11",
-"metadata": {},
-"source": [
-"### Count docs"
-]
-},
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "ceb9d962-0a99-472b-85b3-b1a048621ca4",
+"id": "35d1bc92-9d22-4f85-aef0-e6be8f1b38b0",
 "metadata": {},
 "outputs": [],
 "source": [
-"my_collection = db[collection_name]"
+"fts_model = SearchIndexModel(\n",
+" definition={\n",
+" \"mappings\": {\n",
+" \"dynamic\": False,\n",
+" \"fields\": {\n",
+" \"name\": {\"type\": \"string\", \"analyzer\": \"lucene.standard\",}\n",
+" }\n",
+" }\n",
+" },\n",
+" name=\"lexical_text_search_index\",\n",
+" type=\"search\"\n",
+")"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "5814f906-2d18-4c13-a135-d0f2bd85c7b3",
+"id": "df23997f-c7cc-4cec-8f16-11014ad8b733",
 "metadata": {},
 "outputs": [],
 "source": [
-"my_collection.count_documents({})"
-]
-},
-{
-"cell_type": "markdown",
-"id": "c8fed381-8bed-4560-80fb-1ae7d7e0db0a",
-"metadata": {},
-"source": [
-"---\n",
-"\n",
-"## Create a vector search index in MongoDB\n",
-"\n",
-"To do so, follow these steps\n",
-"\n",
-"1. Select your database\n",
-"2. Click on the \"Search Indexes\" tab or the \"Atlas Search\" tab\n",
-"\n",
-"<img src=\"https://anyscale-public-materials.s3.us-west-2.amazonaws.com/mongodb-demo/screenshots/create_index_manual.png\" width=\"800px\" alt=\"create-index-manual\"/>\n",
-"\n",
-"3. Click on the \"Create a Search index\" button\n",
-"\n",
-"#### From here on, follow these steps to build the Atlas Vector Search Index\n",
-"\n",
-"1. Click on the \"JSON Editor\" Option\n",
-"\n",
-"<img src=\"https://anyscale-public-materials.s3.us-west-2.amazonaws.com/mongodb-demo/screenshots/atlas_vector_search_json_editor_selected.png\" width=\"800px\" alt=\"json-editor-selected\"/>\n",
-"\n",
-"2. Click Next\n",
-"\n",
-"3. Copy this JSON"
-]
-},
-{
-"cell_type": "raw",
-"id": "02fd950e-4b93-452c-814a-3f4e8b2975a5",
-"metadata": {},
-"source": [
-"{ \"fields\": [\n",
-" {\n",
-" \"numDimensions\": 1024,\n",
-" \"similarity\": \"cosine\",\n",
-" \"type\": \"vector\",\n",
-" \"path\": \"description_embedding\"\n",
-" },\n",
-" {\n",
-" \"numDimensions\": 1024,\n",
-" \"similarity\": \"cosine\",\n",
-" \"type\": \"vector\",\n",
-" \"path\": \"name_embedding\"\n",
-" }, \n",
-" {\n",
-" \"type\": \"filter\",\n",
-" \"path\": \"category\"\n",
-" },\n",
-" {\n",
-" \"type\": \"filter\",\n",
-" \"path\": \"season\"\n",
-" },\n",
-" {\n",
-" \"type\": \"filter\",\n",
-" \"path\": \"color\"\n",
-" },\n",
-" {\n",
-" \"type\": \"filter\",\n",
-" \"path\": \"rating\"\n",
-" },\n",
-" {\n",
-" \"type\": \"filter\",\n",
-" \"path\": \"price\"\n",
-" }\n",
-" ]\n",
-"}"
+"vs_model = SearchIndexModel(\n",
+" definition={\n",
+" \"fields\": [\n",
+" {\n",
+" \"numDimensions\": 1024,\n",
+" \"similarity\": \"cosine\",\n",
+" \"type\": \"vector\",\n",
+" \"path\": \"description_embedding\",\n",
+" },\n",
+" {\n",
+" \"numDimensions\": 1024,\n",
+" \"similarity\": \"cosine\",\n",
+" \"type\": \"vector\",\n",
+" \"path\": \"name_embedding\",\n",
+" }, \n",
+" {\n",
+" \"type\": \"filter\",\n",
+" \"path\": \"category\",\n",
+" },\n",
+" {\n",
+" \"type\": \"filter\",\n",
+" \"path\": \"season\",\n",
+" },\n",
+" {\n",
+" \"type\": \"filter\",\n",
+" \"path\": \"color\",\n",
+" },\n",
+" {\n",
+" \"type\": \"filter\",\n",
+" \"path\": \"rating\",\n",
+" },\n",
+" {\n",
+" \"type\": \"filter\",\n",
+" \"path\": \"price\",\n",
+" },\n",
+" ],\n",
+" },\n",
+" name=\"vector_search_index\",\n",
+" type=\"vectorSearch\"\n",
+")"
 ]
 },
 {
-"cell_type": "markdown",
-"id": "26df3eaf-ca65-4556-bddc-7b7161a6d10d",
+"cell_type": "code",
+"execution_count": null,
+"id": "170f4bd6-e70c-4d5b-b872-0a5123082206",
 "metadata": {},
+"outputs": [],
 "source": [
-"#### Ensure the index is called `vector_search_index`\n",
-"\n",
-"4. Select your collection in the left-hand menu, paste the JSON into the text area, and click Next\n",
-"\n",
-"<img src=\"https://anyscale-public-materials.s3.us-west-2.amazonaws.com/ray-summit/multimodal-search/vector_index.png\" width=\"800px\">"
+"my_collection.create_search_indexes(models=[fts_model, vs_model])"
 ]
 },
 {
 "cell_type": "markdown",
-"id": "8d316bf5-662b-4434-a33c-58e98169e819",
-"metadata": {},
-"source": [
-"---\n",
-"\n",
-"## Create a full-text search index in MongoDB\n",
-"\n",
-"1. Click on the \"JSON Editor\" Option\n",
-"\n",
-"<img src=\"https://anyscale-public-materials.s3.us-west-2.amazonaws.com/mongodb-demo/screenshots/atlas_search_json_editor_selected.png\" width=\"800px\" alt=\"json-editor-selected\">\n",
-"\n",
-"2. Click Next\n",
-"\n",
-"3. Copy the JSON"
-]
-},
-{
-"cell_type": "raw",
-"id": "f1d37476-ae41-4108-a2c2-adc057bf6849",
+"id": "4d47c025-c220-4ac8-904f-6a1224a36f11",
 "metadata": {},
 "source": [
-"{\n",
-" \"mappings\": {\n",
-" \"dynamic\": false,\n",
-" \"fields\": {\n",
-" \"name\": {\n",
-" \"type\": \"string\",\n",
-" \"analyzer\": \"lucene.standard\"\n",
-" }\n",
-" }\n",
-" }\n",
-"}"
+"### Count docs"
 ]
 },
 {
-"cell_type": "markdown",
-"id": "73c1ff09-d9da-40b9-b5f8-3a933f862676",
+"cell_type": "code",
+"execution_count": null,
+"id": "5814f906-2d18-4c13-a135-d0f2bd85c7b3",
 "metadata": {},
+"outputs": [],
 "source": [
-"#### Ensure the index is called `lexical_text_search_index`\n",
-"\n",
-"4. Select your collection in the left-hand menu, paste the JSON, and click Next\n",
-"\n",
-"<img src=\"https://anyscale-public-materials.s3.us-west-2.amazonaws.com/ray-summit/multimodal-search/text_index.png\" width=\"800px\" alt=\"json-editor-full\">"
+"my_collection.count_documents({})"
 ]
 },
 {
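
Note on the new `my_collection.create_search_indexes(models=[fts_model, vs_model])` cell: Atlas builds search indexes in the background, so the call can return before either index is able to serve queries. The sketch below is not part of this commit. It shows one way a later cell might rebuild the same vector index definition from field lists and then wait until both indexes are ready. The helper names `vector_index_definition` and `wait_until_queryable`, the timeout and polling values, and the reliance on the `queryable` field in the documents returned by `list_search_indexes()` are assumptions for illustration, and `my_collection` is assumed to be a PyMongo collection defined elsewhere in the notebook.

```python
import time


def vector_index_definition(vector_paths, filter_paths,
                            num_dimensions=1024, similarity="cosine"):
    """Build an Atlas Vector Search index definition as a plain dict.

    The result has the same shape as the `definition` passed to
    SearchIndexModel(..., type="vectorSearch") in the diff above.
    """
    fields = [
        {
            "numDimensions": num_dimensions,
            "similarity": similarity,
            "type": "vector",
            "path": path,
        }
        for path in vector_paths
    ]
    fields += [{"type": "filter", "path": path} for path in filter_paths]
    return {"fields": fields}


def wait_until_queryable(collection, index_names,
                         timeout_s=300, poll_s=5, sleep=time.sleep):
    """Poll collection.list_search_indexes() until every named index is queryable.

    Assumption: each index document carries a boolean "queryable" field.
    Raises TimeoutError if some index is still not ready after timeout_s seconds.
    """
    pending = set(index_names)
    deadline = time.monotonic() + timeout_s
    while pending:
        for info in collection.list_search_indexes():
            if info.get("queryable"):
                pending.discard(info.get("name"))
        if not pending:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"indexes not queryable yet: {sorted(pending)}")
        sleep(poll_s)
```

In the notebook this could follow the index creation cell as `wait_until_queryable(my_collection, ["lexical_text_search_index", "vector_search_index"])`, using the two index names from the diff. Keeping the definition builder free of PyMongo imports makes it easy to check the generated dict before passing it to `SearchIndexModel`.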