why is answer_relevancy always nan? #1814

Open
LillyChen opened this issue Jan 6, 2025 · 5 comments
Labels
module-metrics this is part of metrics module question Further information is requested

Comments

@LillyChen

LillyChen commented Jan 6, 2025

- [ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question
why is answer_relevancy always nan?

**Output**
{'answer_relevancy': nan, 'faithfulness': 0.2308}

Code Examples

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import AnswerRelevancy, Faithfulness


def eval_run(self, query, context, answer):
    answer_relevancy = AnswerRelevancy()
    faithfulness = Faithfulness()

    dataset = [{
        "query": query,                    # question
        "retrieved_contexts": [context],   # context
        "response": answer                 # answer
    }]

    # Convert the list of eval samples into a Dataset object
    ds_case = Dataset.from_list(dataset)
    # Map the dataset columns to the field names ragas expects, then run the evaluation
    column_map = {"user_input": "query", "retrieved_contexts": "retrieved_contexts", "response": "response"}
    metrics = [answer_relevancy, faithfulness]
    return self.eval(ds_case, column_map, metrics)


def eval(self, dataset, column_map, metrics):
    result = evaluate(
        dataset=dataset,
        metrics=metrics,
        llm=self.llm,
        embeddings=self.embeddings,
        column_map=column_map,
    )
    print(result)
    print(result.scores)
    df = result.to_pandas()
    print(df.head())
    return result
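
In ragas, a metric usually comes back as NaN when the underlying LLM or embedding calls fail, or return output the metric cannot parse, and by default those failures are swallowed and scored as NaN. A minimal sketch, reusing the names from the snippet above and assuming the installed ragas version supports the `raise_exceptions` flag on `evaluate()`, to surface the underlying error instead of a NaN score:

```python
# Sketch only: raise_exceptions is assumed to be available in the installed
# ragas version; it makes evaluate() fail loudly instead of scoring NaN.
result = evaluate(
    dataset=ds_case,
    metrics=[answer_relevancy, faithfulness],
    llm=self.llm,
    embeddings=self.embeddings,
    column_map=column_map,
    raise_exceptions=True,
)
```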
@LillyChen LillyChen added the question Further information is requested label Jan 6, 2025
@dosubot dosubot bot added the module-metrics this is part of metrics module label Jan 6, 2025
@jjmachan
Member

jjmachan commented Jan 7, 2025

Hey @LillyChen, which model are you using?

Do you have any tracing set up? You can easily set up tracing and check whether these are the reasons for the errors.

@LillyChen
Author

> Hey @LillyChen, which model are you using?
>
> Do you have any tracing set up? You can easily set up tracing and check whether these are the reasons for the errors.

     os.environ["OPENAI_API_KEY"] = "sk-"
    self.llm = LangchainLLMWrapper(ChatOpenAI(
        openai_api_key="sk-",
        openai_api_base="https://dashscope.aliyuncs.com/compatible-mode/v1",
        model_name="qwen-plus",
        temperature=0.7,
        top_p=0.8,
    ))
    os.environ['DASHSCOPE_API_KEY'] = 'sk-'
    self.embeddings = LangchainEmbeddingsWrapper(DashScopeEmbeddings())
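
Of the two metrics, answer_relevancy is the one that depends on the embeddings, so a quick sanity check of the raw clients before wrapping them for ragas can show whether the DashScope endpoint or key is the problem. A minimal sketch, assuming the `langchain-openai` and `langchain-community` packages provide these imports; the `"sk-"` keys are placeholders, as in the snippet above:

```python
# Sanity-check sketch: confirm the raw clients respond before wrapping them for ragas.
from langchain_openai import ChatOpenAI
from langchain_community.embeddings import DashScopeEmbeddings

llm = ChatOpenAI(
    openai_api_key="sk-",
    openai_api_base="https://dashscope.aliyuncs.com/compatible-mode/v1",
    model_name="qwen-plus",
)
print(llm.invoke("ping").content)     # should print a short model reply

emb = DashScopeEmbeddings()           # reads DASHSCOPE_API_KEY from the environment
print(len(emb.embed_query("ping")))   # should print the embedding dimension
```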

@jjmachan
Member

jjmachan commented Jan 7, 2025

Got it - it could be the model you are using. Can you check the traces?

Have you set up any tracing tools?

@LillyChen
Author

> Got it - it could be the model you are using. Can you check the traces?
>
> Have you set up any tracing tools?

What tracing tools should I use? Please recommend one. Thank you.

@jjmachan
Member

jjmachan commented Jan 7, 2025

https://docs.arize.com/phoenix is an OSS option you could try.
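
A minimal sketch of wiring Phoenix up so that the LLM and embedding calls ragas makes through the LangChain wrappers show up as traces; it assumes the `arize-phoenix` and `openinference-instrumentation-langchain` packages and their current APIs:

```python
# Tracing sketch: assumes arize-phoenix and openinference-instrumentation-langchain
# are installed (phoenix.otel.register, LangChainInstrumentor).
import phoenix as px
from phoenix.otel import register
from openinference.instrumentation.langchain import LangChainInstrumentor

px.launch_app()                    # local Phoenix UI, by default at http://localhost:6006
tracer_provider = register()       # route OpenTelemetry spans to Phoenix
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)

# Run the ragas evaluation after this; each LLM/embedding call made through the
# LangChain wrappers should then appear as a trace in the Phoenix UI.
```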
