Similarity Search with Apache Pinot Vector Index

Build a Pre-release Version of Apache Pinot

# Clone the repo
git clone https://github.com/apache/pinot.git
cd pinot

# Build Pinot
mvn clean install -DskipTests -Pbin-dist

# Run the Quick Demo
cd build/
bin/quick-start-batch.sh

Fine Food Reviews Example

Apache Pinot comes with a built-in table, fineFoodReviews, that contains embeddings of fine food reviews. The embeddings were created with OpenAI's text-embedding-ada-002 model.

To perform a similarity search, we need to generate an embedding of our search query with the same model, because vectors produced by different models are not comparable.

Similarity queries are inconvenient to write by hand in a SQL editor: an embedding is a high-dimensional vector (an array of floats) that is impractical to type. Instead, we use a model to convert the unstructured search text into an embedding, then substitute that embedding into the SQL statement.
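As a rough sketch of that substitution step, the snippet below renders a list of floats as an ARRAY[...] literal and places it into the same query shape used in this example. The helper names to_array_literal and build_query are hypothetical, introduced only for illustration; the table, columns, and functions come from the script in this README.

```python
def to_array_literal(embedding):
    """Render a sequence of numbers as a SQL ARRAY[...] literal (hypothetical helper)."""
    return "ARRAY[" + ", ".join(repr(float(x)) for x in embedding) + "]"


def build_query(embedding, top_k=5):
    """Build the similarity query text for a given embedding (hypothetical helper)."""
    vec = to_array_literal(embedding)
    return (
        "SELECT ProductId, Summary, Score, "
        f"l2_distance(embedding, {vec}) AS l2_dist "
        "FROM fineFoodReviews "
        f"WHERE VECTOR_SIMILARITY(embedding, {vec}, {int(top_k)}) "
        "ORDER BY l2_dist ASC"
    )


if __name__ == "__main__":
    # A 3-dimensional toy vector; real embeddings have far more dimensions.
    print(build_query([0.1, -0.2, 0.3]))
```

Keeping the literal construction in one place makes it easy to inspect the generated SQL before sending it to Pinot.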

Run the Python example below. The script prompts for a search query against the reviews; we suggest tomato soup. Each result row lists the product ID, review summary, score, and L2 distance, ordered from closest to farthest.

$ python fine_food_reviews.py
what do you want to eat? tomato soup
['B0046H30M8', 'great soup', 4, 0.5498560408357057]
['B000LKTTTW', 'Best tomato soup', 5, 0.5560827995927847]
['B0042WXFJU', 'Tasty, but....', 4, 0.5712535523938602]
['B001NGAT9W', 'A Hit!', 5, 0.5916323445185989]
['B0058CGLH6', "If you like Campbells Pepper pot soup then don't buy this!", 1, 0.5929770105173966]
['B0005Z7GMA', 'Mrs. Dash Tomato Basil Garlic', 5, 0.5954086280399798]

from pinotdb import connect
from openai import OpenAI

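# Must be the same model that produced the embeddings stored in the table
model = 'text-embedding-ada-002'
search = input("what do you want to eat? ")

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

def get_embedding(text, model=model):
    # Collapse newlines before sending the text to the model
    text = text.replace("\n", " ")
    return client.embeddings.create(input=[text], model=model).data[0].embedding

search_embedding = get_embedding(search)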

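# Connect to the Pinot SQL query endpoint on localhost:8000
conn = connect(host='localhost', port=8000, path='/query/sql', scheme='http')
curs = conn.cursor()
# The f-string renders the Python list as [x, y, ...], so ARRAY{search_embedding}
# becomes a SQL array literal
curs.execute(f"""
SELECT
  ProductId,
  Summary,
  Score,
  l2_distance(embedding, ARRAY{search_embedding}) AS l2_dist
FROM fineFoodReviews
WHERE VECTOR_SIMILARITY(embedding, ARRAY{search_embedding}, 5)
ORDER BY l2_dist ASC
""")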

for row in curs:
    print(row)
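To build intuition for what the query returns, here is a minimal brute-force sketch in plain Python. It assumes l2_distance is the standard Euclidean distance and ranks every row exactly. A vector index may use approximate search, so its results can differ from this exact ranking. The function nearest and the toy rows are hypothetical, not part of Pinot or this repository.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, rows, k=5):
    """Return the k (row_id, distance) pairs closest to query, nearest first."""
    scored = [(row_id, l2_distance(vec, query)) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional data standing in for stored review embeddings.
    rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.6, 0.8])]
    for row_id, dist in nearest([1.0, 0.0], rows, k=2):
        print(row_id, dist)
```

A smaller distance means a closer match, which is why the query orders by l2_dist ascending.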
