Review: Integration Tests » Fake Embeddings » ConsistentFakeEmbeddings.embed_query #28

amotl · 2024-10-28T20:12:06Z

langchain/libs/langchain/tests/integration_tests/cache/fake_embeddings.py

Lines 53 to 60 in 8b278a8

    
               def embed_query(self, text: str) -> List[float]: 
        
                   """Return consistent embeddings for the text, if seen before, or a constant 
        
                   one if the text is unknown.""" 
        
                   if text not in self.known_texts: 
        
                       return [float(1.0)] * (self.dimensionality - 1) + [float(0.0)] 
        
                   return [float(1.0)] * (self.dimensionality - 1) + [ 
        
                       float(self.known_texts.index(text)) 
        
                   ]

This spot has been resolved differently. We will need to check if that will be fine. Otherwise, we will need to use the previous code on behalf of a separate FakeEmbeddings fixture.

Originally posted by @amotl in #1 (comment)

/cc @kneth

The text was updated successfully, but these errors were encountered:

amotl · 2024-10-28T20:27:07Z

If the patch still needs those adjustments outlined at #1 (review), and I guess it will, they will need to be vendorized before submitting this patch to upstream LangChain.

Just reviewing the code base vs. the patch, and where the adjustment is located at libs/langchain/tests/integration_tests/cache/fake_embeddings.py in langchain-core, and not finding any references to it from our code on a quick search, I think indeed it is a relic of the past, and is probably not needed any longer.

Instead, test_cratedb.py imports ConsistentFakeEmbeddings from tests.integration_tests.vectorstores.fake_embeddings in langchain-community.

from tests.integration_tests.vectorstores.fake_embeddings import (
    ConsistentFakeEmbeddings,
    FakeEmbeddings,
)

That code looks like this:

langchain/libs/community/tests/integration_tests/vectorstores/fake_embeddings.py

Lines 33 to 56 in 6d524e9

    
           class ConsistentFakeEmbeddings(FakeEmbeddings): 
        
               """Fake embeddings which remember all the texts seen so far to return consistent 
        
               vectors for the same texts.""" 
        
               def __init__(self, dimensionality: int = 10) -> None: 
        
                   self.known_texts: List[str] = [] 
        
                   self.dimensionality = dimensionality 
        
               def embed_documents(self, texts: List[str]) -> List[List[float]]: 
        
                   """Return consistent embeddings for each text seen so far.""" 
        
                   out_vectors = [] 
        
                   for text in texts: 
        
                       if text not in self.known_texts: 
        
                           self.known_texts.append(text) 
        
                       vector = [float(1.0)] * (self.dimensionality - 1) + [ 
        
                           float(self.known_texts.index(text)) 
        
                       ] 
        
                       out_vectors.append(vector) 
        
                   return out_vectors 
        
               def embed_query(self, text: str) -> List[float]: 
        
                   """Return consistent embeddings for the text, if seen before, or a constant 
        
                   one if the text is unknown.""" 
        
                   return self.embed_documents([text])[0]

amotl · 2024-10-28T20:41:53Z

Hm. Checking where our change is coming from, it is 9dfc828.

Indeed, in current LangChain, there are two folders that include a fake_embeddings.py file, in turn including a ConsistentFakeEmbeddings. At least, their implementation is the same in upstream LangChain.

amotl · 2024-10-28T20:48:39Z

We can see that some integration tests use vanilla FakeEmbeddings ...

... and others are providing their custom ones:

In this spirit, I think it will be good enough to also provide our custom ones.

amotl · 2024-10-28T20:54:56Z

I haven't been able to spot any regressions in running the test cases after removing the code in question with 41f6462. Integration tests in both langchain-core and langchain-community still succeed.

amotl changed the title ~~Integration Tests » Fake Embeddings » ConsistentFakeEmbeddings.embed_query~~ Review: Integration Tests » Fake Embeddings » ConsistentFakeEmbeddings.embed_query Oct 28, 2024

amotl mentioned this issue Oct 28, 2024

Add support for CrateDB to LangChain LLM framework #1

Draft

4 tasks

amotl closed this as completed Oct 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Review: Integration Tests » Fake Embeddings » ConsistentFakeEmbeddings.embed_query #28

Review: Integration Tests » Fake Embeddings » ConsistentFakeEmbeddings.embed_query #28

amotl commented Oct 28, 2024 •

edited

Loading

amotl commented Oct 28, 2024

amotl commented Oct 28, 2024

amotl commented Oct 28, 2024 •

edited

Loading

amotl commented Oct 28, 2024

Review: Integration Tests » Fake Embeddings » ConsistentFakeEmbeddings.embed_query #28

Review: Integration Tests » Fake Embeddings » ConsistentFakeEmbeddings.embed_query #28

Comments

amotl commented Oct 28, 2024 • edited Loading

amotl commented Oct 28, 2024

amotl commented Oct 28, 2024

amotl commented Oct 28, 2024 • edited Loading

amotl commented Oct 28, 2024

amotl commented Oct 28, 2024 •

edited

Loading

amotl commented Oct 28, 2024 •

edited

Loading