Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review: Integration Tests » Fake Embeddings » ConsistentFakeEmbeddings.embed_query #28

Closed
amotl opened this issue Oct 28, 2024 · 4 comments

Comments

@amotl
Copy link

amotl commented Oct 28, 2024

def embed_query(self, text: str) -> List[float]:
"""Return consistent embeddings for the text, if seen before, or a constant
one if the text is unknown."""
if text not in self.known_texts:
return [float(1.0)] * (self.dimensionality - 1) + [float(0.0)]
return [float(1.0)] * (self.dimensionality - 1) + [
float(self.known_texts.index(text))
]

This spot has been resolved differently. We will need to check if that will be fine. Otherwise, we will need to use the previous code on behalf of a separate FakeEmbeddings fixture.

Originally posted by @amotl in #1 (comment)

/cc @kneth

@amotl amotl changed the title Integration Tests » Fake Embeddings » ConsistentFakeEmbeddings.embed_query Review: Integration Tests » Fake Embeddings » ConsistentFakeEmbeddings.embed_query Oct 28, 2024
@amotl
Copy link
Author

amotl commented Oct 28, 2024

If the patch still needs those adjustments outlined at #1 (review), and I guess it will, they will need to be vendorized before submitting this patch to upstream LangChain.

Just reviewing the code base vs. the patch, and where the adjustment is located at libs/langchain/tests/integration_tests/cache/fake_embeddings.py in langchain-core, and not finding any references to it from our code on a quick search, I think indeed it is a relic of the past, and is probably not needed any longer.

Instead, test_cratedb.py imports ConsistentFakeEmbeddings from tests.integration_tests.vectorstores.fake_embeddings in langchain-community.

from tests.integration_tests.vectorstores.fake_embeddings import (
    ConsistentFakeEmbeddings,
    FakeEmbeddings,
)

That code looks like this:

class ConsistentFakeEmbeddings(FakeEmbeddings):
"""Fake embeddings which remember all the texts seen so far to return consistent
vectors for the same texts."""
def __init__(self, dimensionality: int = 10) -> None:
self.known_texts: List[str] = []
self.dimensionality = dimensionality
def embed_documents(self, texts: List[str]) -> List[List[float]]:
"""Return consistent embeddings for each text seen so far."""
out_vectors = []
for text in texts:
if text not in self.known_texts:
self.known_texts.append(text)
vector = [float(1.0)] * (self.dimensionality - 1) + [
float(self.known_texts.index(text))
]
out_vectors.append(vector)
return out_vectors
def embed_query(self, text: str) -> List[float]:
"""Return consistent embeddings for the text, if seen before, or a constant
one if the text is unknown."""
return self.embed_documents([text])[0]

@amotl
Copy link
Author

amotl commented Oct 28, 2024

Hm. Checking where our change is coming from, it is 9dfc828.

Indeed, in current LangChain, there are two folders that include a fake_embeddings.py file, in turn including a ConsistentFakeEmbeddings. At least, their implementation is the same in upstream LangChain.

@amotl
Copy link
Author

amotl commented Oct 28, 2024

I haven't been able to spot any regressions in running the test cases after removing the code in question with 41f6462. Integration tests in both langchain-core and langchain-community still succeed.

@amotl amotl closed this as completed Oct 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant