Allow the embedding model and dimensions to be changed per-source
1 parent e7270e1, commit d453904
Showing 12 changed files with 205 additions and 120 deletions.
@@ -0,0 +1,25 @@
-- This script creates the embedding tables for one source.

-- Required format string placeholders:
-- (Repeated 5 times) source ID (string)
-- vector size (integer)

-- Why use separate tables for each source?
-- * Faster query times when there are many sources with lots of embeddings that aren't included in the user's query
-- * More accurate `k` limit when there are many sources that aren't included in the query
-- * In the future, different sources could use different embedding models with different vector sizes

CREATE TRIGGER IF NOT EXISTS pages_refresh_vector_embeddings_%s AFTER UPDATE ON pages
WHEN old.url != new.url OR old.title != new.title OR old.description != new.description OR old.content != new.content BEGIN
    -- If the page has associated vector embeddings, they must be recomputed when the text changes
    DELETE FROM pages_vec_%s WHERE id IN (SELECT id FROM vec_chunks WHERE page = old.id);
END;

CREATE TRIGGER IF NOT EXISTS delete_embedding_on_delete_chunk_%s AFTER DELETE ON vec_chunks BEGIN
    DELETE FROM pages_vec_%s WHERE id = old.id;
END;

CREATE VIRTUAL TABLE IF NOT EXISTS pages_vec_%s USING vec0(
    id INTEGER PRIMARY KEY,
    embedding FLOAT[%d] distance_metric=cosine
);
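
The placeholders in the script are printf-style, so the calling code presumably fills them in with a format call before running the SQL. The following is a minimal Python sketch of that step and of what the two triggers do once installed. It is not the project's actual loader. Several things here are assumptions made for illustration: the `render` helper, the source ID `docs`, the vector size `384`, the column types of `pages` and `vec_chunks`, and above all the plain `pages_vec_%s` table, which stands in for the `vec0` virtual table so the sketch runs without the sqlite-vec extension. The trigger subquery selects only `id`, since `IN` expects a single-column result.

```python
import re
import sqlite3

# Adapted from the script above. ASSUMPTION: an ordinary table replaces the
# vec0 virtual table, and it is created first, so no extension is needed.
TEMPLATE = """
CREATE TABLE IF NOT EXISTS pages_vec_%s (id INTEGER PRIMARY KEY, embedding BLOB);

CREATE TRIGGER IF NOT EXISTS pages_refresh_vector_embeddings_%s AFTER UPDATE ON pages
WHEN old.url != new.url OR old.title != new.title OR old.description != new.description OR old.content != new.content BEGIN
    DELETE FROM pages_vec_%s WHERE id IN (SELECT id FROM vec_chunks WHERE page = old.id);
END;

CREATE TRIGGER IF NOT EXISTS delete_embedding_on_delete_chunk_%s AFTER DELETE ON vec_chunks BEGIN
    DELETE FROM pages_vec_%s WHERE id = old.id;
END;

-- vector size: %d (unused by the stand-in table)
"""


def render(template: str, source_id: str, vector_size: int) -> str:
    """Fill every %s with the source ID and the final %d with the vector size.

    Hypothetical helper. The source ID becomes part of table and trigger
    names, so it is restricted to identifier-safe characters.
    """
    if not re.fullmatch(r"[A-Za-z0-9_]+", source_id):
        raise ValueError(f"unsafe source ID: {source_id!r}")
    repeats = template.count("%s")
    return template % ((source_id,) * repeats + (vector_size,))


conn = sqlite3.connect(":memory:")

# ASSUMPTION: minimal schemas containing only the columns the triggers use.
conn.executescript("""
CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT, title TEXT, description TEXT, content TEXT);
CREATE TABLE vec_chunks (id INTEGER PRIMARY KEY, page INTEGER);
""")
conn.executescript(render(TEMPLATE, "docs", 384))

# One page with two chunks, each chunk with one embedding row.
conn.execute("INSERT INTO pages VALUES (1, 'https://example.com', 'Title', 'Desc', 'Body')")
conn.execute("INSERT INTO vec_chunks VALUES (10, 1)")
conn.execute("INSERT INTO vec_chunks VALUES (11, 1)")
conn.execute("INSERT INTO pages_vec_docs VALUES (10, x'00')")
conn.execute("INSERT INTO pages_vec_docs VALUES (11, x'00')")

# Changing the page text fires the first trigger, which drops the page's
# stale embeddings so they can be recomputed.
conn.execute("UPDATE pages SET content = 'New body' WHERE id = 1")
remaining_after_update = conn.execute("SELECT COUNT(*) FROM pages_vec_docs").fetchone()[0]

# Deleting a chunk fires the second trigger, which drops that chunk's embedding.
conn.execute("INSERT INTO pages_vec_docs VALUES (10, x'00')")
conn.execute("DELETE FROM vec_chunks WHERE id = 10")
remaining_after_chunk_delete = conn.execute("SELECT COUNT(*) FROM pages_vec_docs").fetchone()[0]

print(remaining_after_update, remaining_after_chunk_delete)
```

The identifier check in `render` is a design choice worth keeping in any real loader: SQL bind parameters cannot be used for table or trigger names, so the source ID has to be interpolated into the statement text and should be validated first.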