---
title: Search
description: Add natural language search to your app using AI embeddings
---

The Modus Collections API provides a robust way to store, retrieve, and search through data using both natural language and vector-based search methods. By leveraging embeddings, developers can enable semantic and similarity-based searches, improving the relevance of search results within their applications.

For example, with natural language similarity, if you search for a product description like "sleek red sports car," the search method returns similar product descriptions such as "luxury sports car in red" or "high-speed car with sleek design."

## Understanding key components

- **Collections**: a collection is a structured storage that organizes and stores textual data and associated metadata. Collections enable sophisticated search, retrieval, and classification tasks using vector embeddings.

- **Search methods**: a search method, associated with a collection, defines how to convert collection items into a vector representation and provides indexing parameters.

- **Vector embeddings**: for vector-based search and comparison, Modus converts each item in the collection into a vector representation called an embedding. By embedding data, you enable powerful natural language and similarity-based searches.

The Modus runtime automatically computes the embeddings, according to your configuration, when you add or update items.
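To build intuition for how embedding-based comparison works (independent of Modus internals): similarity between two embeddings is commonly measured with cosine similarity, the cosine of the angle between the vectors. A minimal sketch with made-up three-dimensional vectors; real models produce hundreds of dimensions:

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns the cosine of the angle between two
// equal-length vectors: close to 1.0 for vectors pointing the same
// way, close to 0.0 for unrelated (orthogonal) ones.
func cosineSimilarity(a, b []float32) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		normA += float64(a[i]) * float64(a[i])
		normB += float64(b[i]) * float64(b[i])
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	// Toy embeddings, invented for illustration only.
	redSportsCar := []float32{0.9, 0.8, 0.1}
	luxurySportsCar := []float32{0.85, 0.75, 0.2}
	bluetoothSpeaker := []float32{0.1, 0.2, 0.9}

	// The two car descriptions land much closer together than the
	// car and the speaker, which is what drives search ranking.
	fmt.Printf("car vs car:     %.2f\n", cosineSimilarity(redSportsCar, luxurySportsCar))
	fmt.Printf("car vs speaker: %.2f\n", cosineSimilarity(redSportsCar, bluetoothSpeaker))
}
```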

## Initializing your collection

Before implementing search, ensure you have defined a collection in the app manifest. In this example, myProducts is the collection used to store product descriptions.

First, we need to populate the collection with items (for example, product descriptions). You can insert individual or multiple items using the `upsert` and `upsertBatch` methods, respectively.

Use `upsert` to insert a product description into the collection. If you don't specify a key, Modus generates a unique key for you.

```go
func AddProduct(description string) ([]string, error) {
  res, err := collections.Upsert(
    "myProducts", // collection name defined in the manifest
    nil,          // using nil to let Modus generate a unique ID
    description,  // the text to store
    nil,          // we don't have labels for this item
  )
  if err != nil {
    return nil, err
  }
  return res.Keys, nil
}
```
```typescript
export function addProduct(description: string): string {
  const response = collections.upsert(
    "myProducts", // collection name defined in the manifest
    null, // using null to let Modus generate a unique ID
    description, // the text to store
    // no labels for this item
    // no namespace provided, use the default namespace
  )
  return response.keys[0] // return the identifier of the item
}
```

## Configure your search method

The search capability relies on a search method and an embedding function. To configure your search method:

### Create an embedding function

An embedding function is any API function that transforms text into vectors that represent their meaning in a high-dimensional space.

Embedding functions must have the following signature:

```go
package main

func Embed(text []string) ([][]float32, error) {
  // ...
}
```

```typescript
export function embed(text: string[]): f32[][] {
  // ...
}
```

Modus computes vectors using embedding models. Here are a few examples:

Declare the model in the app manifest:

```json
  "models": {
    // model card: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
    "minilm": {
      "sourceModel": "sentence-transformers/all-MiniLM-L6-v2", // model name on the provider
      "provider": "hugging-face", // provider for this model
      "connection": "hypermode" // host where the model is running
    }
  }
```

Create the embedding function using the embedding model:

```go
package main

import (
  "github.com/hypermodeAI/functions-go/pkg/models"
  "github.com/hypermodeAI/functions-go/pkg/models/experimental"
)

func Embed(text []string) ([][]float32, error) {
  // "minilm" is the model name declared in the application manifest
  model, err := models.GetModel[experimental.EmbeddingsModel]("minilm")
  if err != nil {
    return nil, err
  }

  input, err := model.CreateInput(text...)
  if err != nil {
    return nil, err
  }
  output, err := model.Invoke(input)
  if err != nil {
    return nil, err
  }
  return output.Predictions, nil
}
```
```typescript
import { models } from "@hypermode/functions-as"
import { EmbeddingsModel } from "@hypermode/models-as/models/experimental/embeddings"

export function embed(texts: string[]): f32[][] {
  // "minilm" is the model name declared in the application manifest
  const model = models.getModel<EmbeddingsModel>("minilm")
  const input = model.createInput(texts)
  const output = model.invoke(input)
  return output.predictions
}
```
[Declare the model](./app-manifest#models) in the app manifest:

```json
  "models": {
    // model docs: https://platform.openai.com/docs/models/embeddings
    "openai-embeddings": {
      "sourceModel": "text-embedding-3-small",
      "connection": "openai",
      "path": "v1/embeddings"
    }
  },
  "connections": {
    "openai": {
      "type": "http",
      "baseUrl": "https://api.openai.com/",
      "headers": {
        "Authorization": "Bearer {{API_KEY}}"
      }
    }
  }
```

Create the embedding function using the embedding model:

```go
import (
  "fmt"

  "github.com/hypermodeAI/functions-go/pkg/models"
  "github.com/hypermodeAI/functions-go/pkg/models/openai"
)

func Embed(texts []string) ([][]float32, error) {
  // retrieve the model for OpenAI embeddings
  // "openai-embeddings" is the model name declared in the app manifest
  model, err := models.GetModel[openai.EmbeddingsModel]("openai-embeddings")
  if err != nil {
    return nil, fmt.Errorf("failed to get OpenAI embeddings model: %w", err)
  }

  // create input for the model using the provided texts
  input, err := model.CreateInput(texts)
  if err != nil {
    return nil, fmt.Errorf("failed to create input for OpenAI embeddings: %w", err)
  }

  // invoke the model with the generated input
  output, err := model.Invoke(input)
  if err != nil {
    return nil, fmt.Errorf("failed to invoke OpenAI embeddings model: %w", err)
  }

  // prepare the result slice based on the size of the output data
  results := make([][]float32, len(output.Data))

  // copy embeddings from output into the result slice
  for i, d := range output.Data {
    results[i] = d.Embedding
  }

  return results, nil
}
```
```typescript
import { models } from "@hypermode/functions-as"
import { EmbeddingsModel as OpenAIEmbeddingsModel } from "@hypermode/models-as/models/openai/embeddings"

export function embed(text: string[]): f32[][] {
  // "openai-embeddings" is the model name declared in the app manifest
  const model = models.getModel<OpenAIEmbeddingsModel>("openai-embeddings")
  const input = model.createInput(text)
  const output = model.invoke(input)
  return output.data.map<f32[]>((d) => d.embedding)
}
```

### Declare the search method

With an embedding function in place, declare a search method in the collection properties.

  "collections": {
    "myProducts": {
        "searchMethods": {
            "searchMethod1": {
                "embedder": "minilm" // embedding function name
            }
        }
    }
  }

## Implement semantic similarity search

With the products stored, you can now search the collection by semantic similarity. The `search` API computes an embedding for the provided text, compares it with the embeddings of the items in the collection, and returns the most similar items.

```go
func SearchProducts(productDescription string, maxItems int) (*collections.CollectionSearchResult, error) {
  return collections.Search(
    "myProducts",       // collection name declared in the application manifest
    "searchMethod1",    // search method declared for this collection in the manifest
    productDescription, // text to search for
    collections.WithLimit(maxItems),
    collections.WithReturnText(true),
  )
}
```
```typescript
export function searchProducts(
  productDescription: string,
  maxItems: i32,
): collections.CollectionSearchResult {
  const response = collections.search(
    "myProducts", // collection name declared in the application manifest
    "searchMethod1", // search method declared for this collection in the manifest
    productDescription, // text to search for
    maxItems,
    true, // returnText: true to return the items' text
    // no namespace provided, use the default namespace
  )
  return response
}
```

### Search result format

The search response is a `CollectionSearchResult` containing the following fields:

- `collection`: the name of the collection.
- `status`: the status of the operation.
- `objects`: the search result items with their text, distance, and score values.
  - `distance`: a lower value indicates a closer match between the search query and the item in the collection.
  - `score`: a higher value (closer to 1) represents a better match.
```json
{
  "collection": "myProducts",
  "status": "success",
  "objects": [
    {
      "key": "item-key-123",
      "text": "Sample product description",
      "distance": 0.05,
      "score": 0.95
    }
  ]
}
```

## Search for similar items

When you need to find items similar to a given item, use the `searchByVector` API. Retrieve the vector associated with the given item by its key, then perform a search using that vector.

```go
func SearchSimilarProduct(productKey string, maxItems int) (*collections.CollectionSearchResult, error) {
  vec, err := collections.GetVector(
    "myProducts",    // collection name defined in the manifest
    "searchMethod1", // search method declared for the collection
    productKey,      // key of the collection item to retrieve
  )
  if err != nil {
    return nil, err
  }
  return collections.SearchByVector(
    "myProducts",
    "searchMethod1",
    vec,
    collections.WithLimit(maxItems),
    collections.WithReturnText(true),
  )
}
```
```typescript
export function searchSimilarProducts(
  productId: string,
  maxItems: i32,
): collections.CollectionSearchResult {
  const embeddingVector = collections.getVector(
    "myProducts", // collection name defined in the manifest
    "searchMethod1", // search method declared for the collection
    productId, // key of the collection item to retrieve
  )
  // search for similar products using the embedding vector
  const response = collections.searchByVector(
    "myProducts",
    "searchMethod1",
    embeddingVector,
    maxItems,
    true, // get the product description
  )

  return response
}
```