feat: Add tfidf #38

Merged
merged 45 commits into from
Jan 12, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
e88a0c2
added tfidf
digriffiths Dec 18, 2023
4394759
update sparse_encoder if statements to type
digriffiths Dec 18, 2023
f506995
Merge branch 'main' into add_tfidf
digriffiths Dec 19, 2023
4d3ba4d
fixed tests
digriffiths Dec 19, 2023
2eb748e
Merge branch 'add_tfidf' of https://github.com/digriffiths/semantic-r…
digriffiths Dec 19, 2023
724480c
linter used, tests changed to include sparse and dense encoders
digriffiths Dec 19, 2023
99a0983
removed scikit-learn, added tests
digriffiths Dec 20, 2023
2ae56f5
added text preprocessing
digriffiths Dec 20, 2023
fb5da0c
revert
digriffiths Dec 20, 2023
d039042
original coverage
digriffiths Dec 20, 2023
8c79849
schemas in separate files due to circular import
digriffiths Dec 21, 2023
56f0465
Merge remote-tracking branch 'upstream/main' into add_tfidf
digriffiths Jan 3, 2024
82f3147
updated with sparse/dense encoder
digriffiths Jan 3, 2024
112201e
Route path updated
digriffiths Jan 3, 2024
85b87f3
separate schema files removed
digriffiths Jan 3, 2024
99f8c8d
dense_encoder instead of encoder
digriffiths Jan 3, 2024
975560c
schema file restored
digriffiths Jan 3, 2024
86eb341
Route path updated
digriffiths Jan 3, 2024
acf01fd
base encoder mocker path added
digriffiths Jan 3, 2024
a8c64a6
schema path updated
digriffiths Jan 3, 2024
7325834
removed none types for mypy
digriffiths Jan 4, 2024
48b64dc
added hasattr check for mypy
digriffiths Jan 4, 2024
6952c56
fix: updated with sparse/dense encoder
digriffiths Jan 3, 2024
84d40fd
fix: Route path updated
digriffiths Jan 3, 2024
b83bfc6
refactor: separate schema files removed
digriffiths Jan 3, 2024
ec77a1a
fix: dense_encoder instead of encoder
digriffiths Jan 3, 2024
b87a491
fix: schema file restored
digriffiths Jan 3, 2024
2048019
fix: Route path updated
digriffiths Jan 3, 2024
bbd0618
fix: base encoder mocker path added
digriffiths Jan 3, 2024
c340e08
fix: schema path updated
digriffiths Jan 3, 2024
f94529e
fixed: removed none types for mypy
digriffiths Jan 4, 2024
dd9f07a
fix: added hasattr check for mypy
digriffiths Jan 4, 2024
c760313
Merge branch 'add_tfidf' of https://github.com/digriffiths/semantic-r…
digriffiths Jan 5, 2024
698c20a
Merge branch 'main' into add_tfidf
digriffiths Jan 5, 2024
422f5f7
fix: added default values to class variables
digriffiths Jan 5, 2024
6bd5da3
fix: black lint reformat hybrid layer
digriffiths Jan 5, 2024
0f1ecf7
fix: added default values to tfidf tests
digriffiths Jan 5, 2024
d6c421e
Merge branch 'add_tfidf' of https://github.com/digriffiths/semantic-r…
digriffiths Jan 5, 2024
d96c552
fix: created embedding helper functions
digriffiths Jan 8, 2024
138d540
feat: added more tfidf tests
digriffiths Jan 8, 2024
3698b07
feat: added add route tfidf test
digriffiths Jan 8, 2024
f22224c
Merge remote-tracking branch 'upstream/main' into add_tfidf
digriffiths Jan 8, 2024
aede7ae
Merge branch 'main' into add_tfidf
jamescalam Jan 12, 2024
7679e26
fixes
jamescalam Jan 12, 2024
c651b4a
lint
jamescalam Jan 12, 2024
2 changes: 1 addition & 1 deletion .env.example
@@ -1 +1 @@
-COHERE_API_KEY=
+COHERE_API_KEY=
2 changes: 1 addition & 1 deletion LICENSE
@@ -18,4 +18,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
+SOFTWARE.
88 changes: 61 additions & 27 deletions docs/examples/hybrid-layer.ipynb
@@ -4,53 +4,53 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"# Semantic Router: Hybrid Layer"
+"# Semantic Router: Hybrid Layer\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
-"The Hybrid Layer in the Semantic Router library can improve routing performance, particularly for niche use-cases that contain specific terminology, such as finance or medical. It helps us give more weight to routing decisions based on the keywords contained in our utterances and user queries."
+"The Hybrid Layer in the Semantic Router library can improve routing performance, particularly for niche use-cases that contain specific terminology, such as finance or medical. It helps us give more weight to routing decisions based on the keywords contained in our utterances and user queries.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
-"## Getting Started"
+"## Getting Started\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
-"We start by installing the library:"
+"We start by installing the library:\n"
]
},
{
"cell_type": "code",
-"execution_count": null,
+"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
-"!pip install -qU semantic-router==0.0.11"
+"#!pip install -qU semantic-router==0.0.11"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
-"We start by defining a dictionary mapping routes to example phrases that should trigger those routes."
+"We start by defining a dictionary mapping routes to example phrases that should trigger those routes.\n"
]
},
{
"cell_type": "code",
-"execution_count": null,
+"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
-"from semantic_router.schema import Route\n",
+"from semantic_router.route import Route\n",
"\n",
"politics = Route(\n",
" name=\"politics\",\n",
@@ -69,7 +69,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Let's define another for good measure:"
+"Let's define another for good measure:\n"
]
},
{
@@ -81,7 +81,7 @@
},
{
"cell_type": "code",
-"execution_count": null,
+"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
@@ -114,58 +114,92 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"Now we initialize our embedding model:"
+"Now we initialize our embedding model:\n"
]
},
{
"cell_type": "code",
-"execution_count": null,
+"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
-"from semantic_router.encoders import CohereEncoder\n",
+"from semantic_router.encoders import CohereEncoder, BM25Encoder, TfidfEncoder\n",
"from getpass import getpass\n",
"\n",
"os.environ[\"COHERE_API_KEY\"] = os.getenv(\"COHERE_API_KEY\") or getpass(\n",
" \"Enter Cohere API Key: \"\n",
")\n",
"\n",
-"encoder = CohereEncoder()"
+"dense_encoder = CohereEncoder()\n",
+"# sparse_encoder = BM25Encoder()\n",
+"sparse_encoder = TfidfEncoder()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
-"Now we define the `RouteLayer`. When called, the route layer will consume text (a query) and output the category (`Route`) it belongs to — to initialize a `RouteLayer` we need our `encoder` model and a list of `routes`."
+"Now we define the `HybridRouteLayer`. When called, the route layer will consume text (a query) and output the category (`Route`) it belongs to. To initialize a `HybridRouteLayer` we need our `dense_encoder` and `sparse_encoder` models and a list of `routes`.\n"
]
},
{
"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
+"execution_count": 5,
+"metadata": {},
+"outputs": [
+{
+"name": "stderr",
+"output_type": "stream",
+"text": [
+"\u001b[32m2024-01-08 16:50:29 INFO semantic_router.utils.logger Creating embeddings for all routes...\u001b[0m\n"
+]
+}
+],
"source": [
"from semantic_router.hybrid_layer import HybridRouteLayer\n",
"\n",
-"dl = HybridRouteLayer(encoder=encoder, routes=routes)"
+"dl = HybridRouteLayer(\n",
+" dense_encoder=dense_encoder, sparse_encoder=sparse_encoder, routes=routes\n",
+")"
]
},
{
"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
+"execution_count": 6,
+"metadata": {},
+"outputs": [
+{
+"data": {
+"text/plain": [
+"'politics'"
+]
+},
+"execution_count": 6,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
"source": [
"dl(\"don't you love politics?\")"
]
},
{
"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
+"execution_count": 7,
+"metadata": {},
+"outputs": [
+{
+"data": {
+"text/plain": [
+"'chitchat'"
+]
+},
+"execution_count": 7,
+"metadata": {},
+"output_type": "execute_result"
+}
+],
"source": [
"dl(\"how's the weather today?\")"
]
@@ -174,7 +208,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"---"
+"---\n"
]
}
],
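The notebook gives `HybridRouteLayer` both a `dense_encoder` (`CohereEncoder`) and a `sparse_encoder` (`TfidfEncoder`, with `BM25Encoder` as a commented-out alternative). One common way to let keywords carry more weight is to blend dense and sparse similarity scores per route. The sketch below does this with a convex combination over toy vectors; `hybrid_route`, the `alpha` weighting, best-utterance scoring, and the `threshold` are hypothetical choices, not semantic-router's actual `HybridRouteLayer` logic.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_route(
    query_dense: Vector,
    query_sparse: Vector,
    routes: Dict[str, List[Tuple[Vector, Vector]]],
    alpha: float = 0.3,
    threshold: float = 0.5,
) -> Optional[str]:
    """Hypothetical hybrid router, not semantic-router's HybridRouteLayer.

    Each route maps to (dense, sparse) embedding pairs, one per utterance.
    A route's score is its best utterance's convex combination
    alpha * dense_cosine + (1 - alpha) * sparse_cosine; the top route
    wins only if its score clears the (illustrative) threshold.
    """
    best_name: Optional[str] = None
    best_score = -math.inf
    for name, utterances in routes.items():
        for dense_vec, sparse_vec in utterances:
            score = alpha * cosine(query_dense, dense_vec) + (1 - alpha) * cosine(
                query_sparse, sparse_vec
            )
            if score > best_score:
                best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    # Toy 2-d "dense" and 3-d "sparse" vectors standing in for encoder output.
    routes = {
        "politics": [([1.0, 0.0], [1.0, 0.0, 0.0])],
        "chitchat": [([0.0, 1.0], [0.0, 1.0, 0.0])],
    }
    # Dense scores tie (about 0.707 each); the sparse keyword match decides.
    print(hybrid_route([0.7, 0.7], [0.0, 1.0, 0.0], routes))  # chitchat
```

In the toy example the two dense scores tie, so the sparse (keyword) term alone decides the route; raising `alpha` shifts weight back toward semantic similarity, and a query that matches nothing strongly falls below `threshold` and returns `None`.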