[X] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
The bug is simple to explain and reproduce but may require a design decision to solve properly. In the Answer-Relevance class, all generated questions are identical, which defeats the purpose: we want to compute the mean cosine similarity between many diverse generated questions and the original query. This happens because the LangchainLLMWrapper/BaseRagasLLM class does not account for the temperature preset on the LangChain OpenAI class.
In the BaseRagasLLM.generate/generate_text function, the temperature attribute of the passed LLM is not checked, causing the temperature to default to a value close to 0. This results in identical generated questions, which contradicts the expected behavior of diversity in the generated questions for answer relevance.
Where the Bug First Occurs
The bug manifests in the _ascore function of the answer-relevance class. A naive solution that demonstrates the issue involves explicitly passing the temperature from the LangchainLLMWrapper object to the generate function:
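The original snippet isn't reproduced here, but the naive fix looks roughly like the following sketch (based on ragas 0.2.x; the exact call site and argument names inside _ascore, such as prompt_input, are approximations):

# Inside ResponseRelevancy._ascore -- sketch only, exact signature may differ
response = await self.question_generation.generate(
    data=prompt_input,
    llm=self.llm,
    # explicitly forward the temperature preset on the wrapped LangChain LLM
    temperature=getattr(self.llm.langchain_llm, "temperature", None),
    callbacks=callbacks,
)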
While this resolves the issue, it is not an elegant or scalable solution because it redundantly overrides a property that should ideally be encapsulated within the wrapper itself.
Related Commit
This issue is related to a previously closed issue addressed in this commit. However, the commit overwrites the temperature of the langchain_llm object without addressing the root problem of ensuring that the preset temperature is respected.
Proposed Solution
Modify LangchainLLMWrapper or BaseRagasLLM
Encapsulate the logic for handling temperature within the wrapper itself:
class LangchainLLMWrapper(BaseRagasLLM):
    def __init__(self, llm):
        self.llm = llm
        self.temperature = getattr(llm, "temperature", None)

    async def generate(self, prompt, n=1, temperature=None, **kwargs):
        # Use the passed temperature or fall back to the LLM's default
        effective_temperature = temperature if temperature is not None else self.temperature
        return await self.llm.generate(prompt, n=n, temperature=effective_temperature, **kwargs)
Refactor the generate_multiple Method
Ensure temperature is propagated consistently in PydanticPrompt:
async def generate_multiple(
    self,
    llm: BaseRagasLLM,
    data: InputModel,
    n: int = 1,
    temperature: t.Optional[float] = None,
    stop: t.Optional[t.List[str]] = None,
    callbacks: t.Optional[Callbacks] = None,
    retries_left: int = 3,
) -> t.List[OutputModel]:
    ...
    # Use temperature from the wrapper
    resp = await llm.generate(
        prompt_value,
        n=n,
        temperature=temperature,  # Wrapper handles default fallback
        stop=stop,
        callbacks=callbacks,
    )
    # Rest of the logic remains the same...
Advantages of the Proposed Solution
1. Encapsulation: By moving temperature handling into the wrapper, the logic in higher-level components becomes simpler and more modular.
2. Flexibility: This approach respects user-defined defaults and allows overrides when needed.
3. Readability: Redundant checks and assignments are removed, making the code cleaner and more maintainable.
Ragas version: 0.2.9
Python version: 3.9
Code to Reproduce
from ragas.metrics import ResponseRelevancy
from ragas.llms import LangchainLLMWrapper
from ragas.dataset_schema import SingleTurnSample
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o", temperature=1.0))
sample_to_evaluate = SingleTurnSample(
    # illustrative placeholder data; replace with real values from your dataset
    user_input="When was the first Super Bowl played?",
    response="The first Super Bowl was held on January 15, 1967.",
)
evaluator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings(model="text-embedding-3-large", dimensions=3072))
scorer = ResponseRelevancy(llm=evaluator_llm, embeddings=evaluator_embeddings)
answer_relevance_metric_score = scorer.single_turn_score(sample_to_evaluate)
To reproduce: put a breakpoint in the _ascore function at the line "response = await self.question_generation.generate" and observe that the temperature set on the LangChain LLM is ignored.
Error trace
N/A – No error trace is produced, but the lack of diversity in the generated questions demonstrates the issue.
Expected behavior
Generated questions in the Answer-Relevance flow should exhibit diversity and should not be generated identically N times; the temperature used in generation should reflect the preset temperature of the LLM object.
Additional context
Setting the temperature to a high value like 1 might produce diverse questions, but it may also make the noncommittal part of the answer-relevance prompt more random. In that case, we could split the flow into two steps: a low-temperature call for the noncommittal check, and, if that passes, question generation with a higher temperature. It might then make more sense for the answer-relevance class to hold two temperature properties and forward the appropriate one for each case; a rough sketch of that split is included below.
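For illustration only, the two temperature attributes and the separate noncommittal prompt below are hypothetical; they are not part of the current Ragas API:

# Hypothetical sketch -- attribute names and committal_prompt are illustrative
class ResponseRelevancy(MetricWithLLM, MetricWithEmbeddings, SingleTurnMetric):
    committal_temperature: float = 0.0   # deterministic check for noncommittal answers
    question_temperature: float = 1.0    # higher temperature for diverse question generation

    async def _ascore(self, row, callbacks):
        # 1) check whether the answer is noncommittal at a low temperature
        committal_check = await self.committal_prompt.generate(
            data=..., llm=self.llm,
            temperature=self.committal_temperature, callbacks=callbacks,
        )
        # 2) if the answer is committal, generate N diverse questions at a higher temperature
        responses = await self.question_generation.generate_multiple(
            data=..., n=self.strictness, llm=self.llm,
            temperature=self.question_temperature, callbacks=callbacks,
        )
        ...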
Let me know what you think of this issue and how you would like to solve it. I don't mind creating a PR with a solution once we have agreed on one.