Sequence[string] #1844
@dosue |
@dosu |
RAGAS version 0.1.7 expects the `contexts` input for each row to be a `Sequence[string]`, i.e. a list of strings. This is a bit tricky, @jjmachan, could you lend your expertise on whether RAGAS specifically requires a flattened format for the context?
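For illustration, this is the shape each row is expected to have (a minimal sketch; the full working example in the next comment shows the same structure):

```python
# One row of a RAGAS 0.1.x evaluation dataset: `contexts` is a list of
# strings (Sequence[string]) for that row, not one concatenated string.
row = {
    "question": "question1",
    "contexts": ["chunk 1 text", "chunk 2 text"],   # list of strings: OK
    # "contexts": "chunk 1 text chunk 2 text",      # a single flattened string is not what the metrics expect
    "answer": "answer1",
}
```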
This issue seems related to #1807. To assist you more effectively, I’ve replicated your described data format in the same development environment (RAGAS version 0.1.7), and everything appears to work fine on my end. However, to expedite troubleshooting and ensure we can address your specific issue accurately, it would be extremely helpful if you could share a portion of your dataset or provide the full code you used.

```python
from ragas import evaluate
import datasets
from ragas.metrics import (
answer_relevancy,
faithfulness,
context_recall,
context_precision,
)
sample = {
"contexts": [["context1_chunk1", "context1_chunk2"], ["context2_chunk1"]],
"question": ["question1", "question2"],
"answer": ["answer1", "answer2"]
}
dt = datasets.Dataset.from_dict(sample)
result = evaluate(
dt,
metrics=[
faithfulness,
answer_relevancy,
],
)
```
| | contexts | question | answer | faithfulness | answer_relevancy |
|---|------------------------------------|------------|----------|--------------|------------------|
| 1 | [context1_chunk1, context1_chunk2] | question1 | answer1 | NaN | 0.859496 |
| 2 | [context2_chunk1] | question2 | answer2 | NaN | 0.000000 |
@hundredeuk2 the full code (the elided parts are marked with `...`):

```python
import os
import json
import numpy as np
from pathlib import Path
from langchain_core.outputs import LLMResult, Generation
from ragas.llms.prompt import PromptValue

current_script_path = Path(__file__).resolve()

# Load the ground truth data
file_path = os.path.join('src', 'assets', 'GT.json')
...

# Load the question and the answer and the chunks
file_path = os.path.join('src', 'assets', 'user_llm_interaction_embeddings_c1521dd5_b819_4241_b3a4_3e5c1388037c.json')
...

# creating a dataset of str type
# Iterate through each dictionary in your data
# for item in ground_truth_data:
#     ...

# Check the length of context to see if anything was appended
# print(f"Number of context entries extracted: {len(context)}")

# Iterate through each dictionary in your data
for item in llm:
    ...

# Check the length of context to see if anything was appended
"""
print(f"Number of context entries extracted: {len(context)}")
"""

# Replace the IDs with the corresponding content
chunk_embeddings = []
for sublist in context:
    ...

# Initialize empty lists for dataset
new_ragas_dataset = {
    ...
}

# Assuming question, context, and answer lists are already available
for i in range(len(question)):
    ...

# Print to verify the format
# print(f"Dataset length: {len(new_ragas_dataset['question'])}")

# Initialize an empty list to hold the new dataset
data_set = []

# Iterate through the list and combine every two dictionaries
for i in range(0, len(llm), 2):
    ...
    # for j in llm:

# def map_chunks(data_set, ground_truth_data):
# data_set = map_chunks(data_set, ground_truth_data)

# Assuming data_set is a list of dictionaries
ragas_data = [
    ...
]

# Create the required structure that structures the data for the Hugging Face Dataset creation.
formatted_data = {
    ...
}

# Define the column_map to match custom columns to expected ones
column_map = {
    ...
}

# Create a Dataset using the Hugging Face ...
```
@hundredeuk2 |
While reviewing the evaluation process, it appears that the variable holding the contexts ends up empty for every entry. This suggests there might be an issue with retrieving or processing the documents during the refinement stage. It would be helpful to double-check if something is being missed or mishandled in that step. Additionally, this is the output of `print(new_ragas_dataset)`:
{'question': ['Quali sono gli errori del macchinario futura, riguardanti la tensione a 3V?',
"La macchina futura prevede qualche specifica per la conduttività dell'acqua?",
'Quali sono gli errori che mi segnalano questi problemi?'],
'contexts': [[], [], []],
'answer': ['Gli errori riguardanti la tensione a 3V per il macchinario Futura sono i seguenti:\n\n1. **E306**: Tensione 3.3V sotto il limite\n - Verificare le connessioni in uscita\n - Sostituire la scheda idrico\n\n2. **E307**: Tensione 3.3V interno sopra il limite\n - Verificare le connessioni in uscita\n - Sostituire la scheda idrico\n\n3. **E716**: Tensione 3.3V sotto il limite\n - Verificare connessioni in uscita\n - Sostituire scheda base tavoletta\n\n4. **E717**: Tensione 3.3V interno sopra il limite\n - Verificare connessioni in uscita\n - Sostituire scheda base tavoletta\n\n5. **E449**: Tensione 3.3V sotto il limite\n - Verificare i collegamenti in uscita\n - Sostituire la scheda poltrona\n\n6. **E450**: Tensione 3.3V interno sopra il limite\n - Verificare i collegamenti in uscita\n - Sostituire la scheda poltrona\n\n7. **E116**: Tensione 3.3V sotto il limite\n - Verificare le connessioni in uscita\n - Sostituire la scheda base tavoletta medico\n\n8. **E117**: Tensione 3.3V interno sopra il limite\n - Verificare le connessioni in uscita\n - Sostituire la scheda base tavoletta medico\n\n9. **E224**: Tensione 3.3V sotto il limite\n - Verificare le connessioni in uscita\n - Sostituire la scheda idrico\n\n10. **E225**: Tensione 3.3V interno sopra il limite\n - Verificare le connessioni in uscita\n - Sostituire la scheda idrico',
"Sì, la macchina Futura prevede specifiche precise per la conduttività dell'acqua utilizzata.\n\nEcco i punti salienti relativi alla conduttività dell'acqua:\n\n1. **Conduttività Massima:** L'acqua demineralizzata distillata utilizzata per l'alimentazione della sterilizzatrice dovrebbe avere una **conduttività inferiore a 15 μS/cm**.\n2. **Sensore di Conduttività:** La sterilizzatrice è dotata di un sensore di conduttività che verifica automaticamente la qualità dell'acqua trattata dal filtro demineralizzatore integrato.\n3. **Allarme Conduttività:** Se i valori di conduttività superano i 20 μS/cm, un allarme segnalerà la necessità di procedere alla sostituzione della cartuccia di resina a scambio ionico.\n4. **Rapporto Ciclo:** La misura di conducibilità dell’acqua viene effettuata automaticamente ad ogni avvio del ciclo di sterilizzazione o di test e il valore relativo è riportato sul rapporto ciclo.\n\nL'utilizzo di acqua con conduttività superiore ai limiti indicati può causare danni al macchinario e compromettere l'efficacia della sterilizzazione, incrementando anche il rischio di ossidazione e la formazione di residui calcarei.",
"Gli errori che possono segnalare problemi relativi alla conduttività dell'acqua sono generalmente associati a malfunzionamenti del sistema idrico o delle sonde di livello. Tuttavia, in base ai documenti forniti, non sembra esserci un errore specifico che segnali direttamente un problema di conduttività dell'acqua. Gli errori che potrebbero indirettamente indicare problemi legati alla qualità dell'acqua o alla conduttività sono:\n\n1. **W64**: **Manca H₂O, riempire serbatoio 2**\n - Azioni: Riempire il serbatoio H₂O₂, verificare la scheda igiene, verificare la sonda di livello.\n\n2. **W212**: **Serbatoio disinfettante pieno**\n - Possibile implicazione di conduttività alta se il serbatoio pieno impedisce il normale funzionamento.\n\n3. **E55**: **S1 livello sonde incongruente Max = on Min = off**\n - Incongruenza sonde serbatoio 1 W.H.E.: Verificare lo stato della scheda igiene, verificare le sonde, verificare lo stato dei led della scheda igiene.\n\n4. **E57**: **S2 Livello sonde incongruente Max = on Min = off**\n - Incongruenza sonde serbatoio 2 del sistema W.H.E.: Verificare lo stato della scheda igiene, verificare le sonde, verificare lo stato dei led della scheda igiene.\n\n5. **E60**: **Durante il funzionamento normale del riunito, lettura di sonda di FULL coperta**\n - Tentare la procedura di svuotamento del sistema W.H.E.. Verificare la presenza di trafilamenti acqua nel sistema W.H.E..\n\n6. **E67**: **Anomalia della sonda di massimo serbatoio 1**\n - Serbatoio 1 eliminato per sonda STOP attiva: Verificare l'ingresso acqua al sistema W.H.E., verificare le sonde del serbatoio W.H.E.\n\n7. **E68**: **Anomalia della sonda di massimo serbatoio 2**\n - Serbatoio 2 eliminato per sonda STOP attiva: Verificare l'ingresso acqua al sistema W.H.E., verificare le sonde del serbatoio W.H.E.\n\nIn caso emerga uno di questi errori, potrebbe essere utile controllare anche la qualità dell'acqua utilizzata per assicurarsi che rientri nei parametri specificati di conduttività per evitare ulteriori problemi di funzionamento del macchinario Futura."]} |
@hundredeuk2 Second: the new_ragas_dataset is my textual dataset. I also have another dataset with the embeddings in the same format, so that I don't have to use API calls for evaluation. I think maybe I have to make some changes in my subclass, CustomRagasLLM. What do you think? |
@dosu Could you tell me about this and give me a solution for this? |
To resolve the issue of the empty `contexts` lists, check each step of the code that maps the context IDs from your interaction file to the content in GT.json, and verify that those IDs actually exist in the ground-truth data; a quick check is sketched below.
That should let you identify why the `contexts` lists end up empty and resolve it.
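For example, a quick sanity check along those lines (a sketch only; `ground_truth_data` and `context` are the variables from your snippet, and the field names are assumptions):

```python
# Do the chunk IDs referenced by the interactions exist in GT.json at all?
gt_ids = {item['id'] for item in ground_truth_data}

for row_idx, sublist in enumerate(context):
    missing = [chunk_id for chunk_id in sublist if chunk_id not in gt_ids]
    if missing:
        print(f"Row {row_idx}: {len(missing)} context IDs not found in GT.json, e.g. {missing[:3]}")
```

If this prints anything, the ID-to-content mapping is the reason the `contexts` column stays empty.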
@hundredeuk2 |
I might not fully understand the exact process you are attempting, but from the perspective of RAGAS evaluation, your provided code cannot work without actual context strings in the `contexts` column. The core issue seems to stem from the fact that the `contexts` lists end up empty. Based on the code snippet you shared:

```python
chunk_embeddings = []
chunk_string = []
for sublist in context:
    strings = []
    embeddings = []
    for idx, i in enumerate(sublist):
        for item in ground_truth_data:
            if item['id'] == i:
                strings.append(item['content'])
                embeddings.append(item['text_vector'])
    chunk_embeddings.append(embeddings)
    chunk_string.append(strings)
```

The issue seems to arise because nothing is being appended to `strings` and `embeddings`, most likely because the IDs in `context` never match `item['id']` in `ground_truth_data`. That said, if creating ...
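One hedged way to make that mapping more robust (a sketch only; `ground_truth_data`, `context`, and the field names are taken from your snippet and may differ in your actual files):

```python
# Build an id -> entry lookup once, so missing IDs become easy to spot
# and the mapping no longer relies on a nested linear scan.
gt_by_id = {item['id']: item for item in ground_truth_data}

chunk_string, chunk_embeddings = [], []
for sublist in context:
    strings, embeddings = [], []
    for chunk_id in sublist:
        entry = gt_by_id.get(chunk_id)
        if entry is None:
            print(f"Warning: context id {chunk_id!r} not found in GT.json")
            continue
        strings.append(entry['content'])
        embeddings.append(entry['text_vector'])
    chunk_string.append(strings)
    chunk_embeddings.append(embeddings)
```

The printed warnings will tell you immediately whether the IDs simply don't line up between the two files.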
@dosu |
If you already have the question, context, and answer, and you want to avoid using API calls, you don't necessarily need to subclass `BaseRagasLLM`. Here are some steps to consider: make sure your dataset already contains everything the chosen metrics need, and only provide an LLM or embeddings override where a metric actually requires one.
By following these steps, you can effectively use your existing data for evaluation without relying on API calls.
@dosu |
If you do not need to generate text or modify the LLM, and you want to avoid API calls, you might not need to subclass `BaseRagasLLM` at all. However, if you encountered an error related to setting the LLM, it might be because the framework expects an LLM instance even if it's not used for text generation. In such cases, you can provide a minimal implementation that satisfies the interface without making API calls. Here's a basic example of how you might implement this:

```python
class MinimalRagasLLM(BaseRagasLLM):
    async def _call(self, prompt: str) -> str:
        return "No API call needed"

    def generate_text(self, prompt, n=1, temperature=1e-8, stop=None, callbacks=[]):
        return LLMResult(generations=[[Generation(text="No API call needed")]])

    async def agenerate_text(self, prompt, n=1, temperature=1e-8, stop=None, callbacks=[]):
        return LLMResult(generations=[[Generation(text="No API call needed")]])
```

This way, you satisfy the framework's requirement for an LLM without making any API calls. Ensure your dataset is correctly formatted and that your embeddings are properly integrated to avoid other potential issues.
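For illustration, a sketch of wiring such a stub into the evaluation (assuming the ragas 0.1.x `evaluate(..., llm=...)` override and the `dt` dataset built earlier in this thread; `answer_relevancy` additionally needs embeddings):

```python
from ragas import evaluate
from ragas.metrics import answer_relevancy

result = evaluate(
    dt,                        # the Hugging Face Dataset built earlier
    metrics=[answer_relevancy],
    llm=MinimalRagasLLM(),     # stub LLM, so no API calls are made for LLM steps
)
print(result)
```

Keep in mind that metrics which actually grade text with the LLM will not produce meaningful scores from a stub that always returns the same string; as the parsing errors later in this thread show, they will typically fail to parse and come back as NaN.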
@hundredeuk2

```python
import json
from ragas.llms.prompt import PromptValue

data_set = {
    ...
}

embedding_dataset = {
    ...
}

features = Features({
    ...
})

# Create a Dataset using the Hugging Face ...
```
@dosu |
The "RuntimeWarning: Mean of empty slice" error occurs when attempting to calculate the mean of an empty array. In the context of the RAGAS evaluation framework, this can happen if the scores for a particular context are not being populated correctly. To resolve this issue, you can use the def safe_nanmean(arr: t.List[float]) -> float:
if len(arr) == 0:
return np.nan # or some other value or behavior for empty arrays
arr_numpy = np.asarray(arr) # Ensure input is a numpy array
if np.isnan(arr_numpy).all():
return np.nan # or some other value or behavior for all-NaN arrays
return float(np.nanmean(arr_numpy)) Replace any direct calls to To continue talking to Dosu, mention @dosu. Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other |
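For example:

```python
import numpy as np

print(safe_nanmean([]))             # nan, instead of a RuntimeWarning
print(safe_nanmean([0.8, np.nan]))  # 0.8
```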
@dosu |
Yes, the "RuntimeWarning: Mean of empty slice" error can indeed be caused by malformed data or incorrect formatting passed to the To address this, ensure that:
By ensuring the data is correctly formatted and validated, you can prevent such warnings and ensure the evaluation process runs smoothly. To continue talking to Dosu, mention @dosu. Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other |
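For example, a minimal validation sketch (using the `new_ragas_dataset` dict printed earlier in this thread):

```python
# Sanity-check the dict before building the Dataset / calling evaluate().
n = len(new_ragas_dataset["question"])
assert n == len(new_ragas_dataset["contexts"]) == len(new_ragas_dataset["answer"])

for i, ctx in enumerate(new_ragas_dataset["contexts"]):
    if not ctx:
        print(f"Row {i}: 'contexts' is empty -- context-based metrics will have nothing to score")
    elif not all(isinstance(c, str) for c in ctx):
        print(f"Row {i}: 'contexts' must be a list of strings")
```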
@dosu |
In your `CustomRagasLLM`, the prompt that reaches `_call` is not your raw question/answer text. Here's a brief overview of how the prompt is handled: each metric formats its own instruction template with the fields from your row, wraps the result in a `PromptValue`, and passes that to your LLM's generate methods.
If you want to trace where the prompt is being modified, you can add debugging statements in these methods to print the prompt at different stages of processing. This will help you identify where any changes are being made.
@dosu |
Yes, the 'context_utilization' metric can be used without having the ground truth in the dataset. The `context_utilization` metric is essentially context precision computed against the generated answer instead of a reference, so it only requires the `question`, `contexts`, and `answer` columns.
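For example, a minimal sketch (assuming `context_utilization` is exposed by `ragas.metrics` in your 0.1.x install):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_utilization

data = {
    "question": ["question1"],
    "contexts": [["chunk 1 text", "chunk 2 text"]],
    "answer": ["answer1"],
}

# No ground_truth column is needed for this metric.
result = evaluate(Dataset.from_dict(data), metrics=[context_utilization])
print(result)
```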
@dosu

```python
class PromptValue(BasePromptValue):
    ...

class Prompt(BaseModel):
    ...
```
The "failed to parse the output" error in RAGAS version 0.1.7 could indeed be related to the Here are some potential causes related to these classes:
To resolve this, ensure that the output generated by the prompt matches the expected format specified by the To continue talking to Dosu, mention @dosu. Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other |
@dosu this is a part of my code:

```python
class CustomRagasLLM(BaseRagasLLM):
    ...

# Create an instance of the custom LLM
custom_llm = CustomRagasLLM(api_key=None)

# Define an async function to test the LLM
# async def test_llm_with_prompts():
async def test_llm_with_prompts():
    ...

if __name__ == "__main__":
    ...
```
The "failed to parse the output" error you're encountering is likely due to the Here are some steps to help resolve this issue:
By ensuring the output format is correct and debugging the prompt processing, you should be able to resolve the parsing error and eliminate the unexpected English text in the response [1]. To continue talking to Dosu, mention @dosu. Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other |
To check if the output of your custom LLM is valid JSON, you can use a small helper like this:

```python
import json

def is_json(myjson):
    try:
        json.loads(myjson)
    except ValueError:
        return False
    return True

# Example usage within your async function
async def test_llm_with_prompts():
    results = []
    for prompt in prompts:  # `prompts` and `custom_llm` come from your script
        # Call the custom LLM with each prompt
        llm_result = await custom_llm.generate_text(prompt=prompt)

        # Extract the text from LLMResult
        generated_text = llm_result.generations[0][0].text  # Access the first generation

        # Check if the generated text is a valid JSON
        if is_json(generated_text):
            print("The output is a valid JSON.")
        else:
            print("The output is not a valid JSON.")

        results.append(generated_text)

        # Debugging: Print each result (truncate to avoid overly large output)
        print(f"LLM Output for prompt: {generated_text[:900]}")  # Truncate for display

    return results
```

This function checks each generated text with `json.loads` and tells you which prompts produce output that is not valid JSON.
@dosu |
If the output of your custom LLM doesn't pass the `is_json` check, you need to make it return valid JSON, or post-process the reply into valid JSON before RAGAS parses it; a sketch of such post-processing follows below.
By following these steps, you should be able to ensure that the output from your custom LLM is valid JSON and passes the `is_json` check.
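For example, one hedged way to post-process a raw reply before parsing (a sketch; `coerce_to_json` is a hypothetical helper, not part of RAGAS):

```python
import json
import re

def coerce_to_json(text: str) -> str:
    """Extract the first {...} block from an LLM reply and re-serialize it
    so that json.loads() on the result succeeds."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in LLM output")
    return json.dumps(json.loads(match.group(0)))

# Example: strip conversational text around the JSON payload
raw = 'Sure! Here is the JSON: {"statements": ["..."]}'
print(coerce_to_json(raw))  # {"statements": ["..."]}
```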
@jjmachan I will share my code with you here. Please help me.

```python
import json
from ragas.llms.prompt import PromptValue

data_set = {
    ...
}

embedding_dataset = {
    ...
}

features = Features({
    ...
})

# Create a Dataset using the Hugging Face ...
```
@dosu |
Yes, the modification of your prompt by the RAGAS metric templates can lead to this parsing problem when the LLM's reply no longer matches the expected JSON schema. To fix this issue, you can use the `FixOutputFormat` prompt together with the retry logic in `RagasOutputParser`:

```python
class FixOutputFormat(PydanticPrompt[OutputStringAndPrompt, StringIO]):
    instruction = "The output string did not satisfy the constraints given in the prompt. Fix the output string and return it."
    input_model = OutputStringAndPrompt
    output_model = StringIO

fix_output_format_prompt = FixOutputFormat()

# Usage in RagasOutputParser
class RagasOutputParser(PydanticOutputParser[OutputModel]):
    async def parse_output_string(
        self,
        output_string: str,
        prompt_value: PromptValue,
        llm: BaseRagasLLM,
        callbacks: Callbacks,
        retries_left: int = 1,
    ):
        callbacks = callbacks or []
        try:
            jsonstr = extract_json(output_string)
            result = super().parse(jsonstr)
        except OutputParserException:
            if retries_left != 0:
                retry_rm, retry_cb = new_group(
                    name="fix_output_format",
                    inputs={"output_string": output_string},
                    callbacks=callbacks,
                )
                fixed_output_string = await fix_output_format_prompt.generate(
                    llm=llm,
                    data=OutputStringAndPrompt(
                        output_string=output_string,
                        prompt_value=prompt_value.to_string(),
                    ),
                    callbacks=retry_cb,
                    retries_left=retries_left - 1,
                )
                retry_rm.on_chain_end({"fixed_output_string": fixed_output_string})
                result = fixed_output_string
            else:
                raise RagasOutputParserException()
        return result
```

This code attempts to fix the output format if it doesn't match the expected JSON schema, retrying the process if necessary [1].
What if the error I see is the same one you just mentioned (`RagasOutputParserException`)? |
The error "Failed to parse output. Returning None." is indeed related to the To address this:
By ensuring the output is correctly formatted and using To continue talking to Dosu, mention @dosu. Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other |
@dosu |
The output of your `_call` method should be a string, ideally a JSON string that the output parser can load. Regarding the input, the prompt you receive is built by RAGAS from the metric's prompt template rather than passed through verbatim from your dataset.
@dosu |
Yes, the input to your `_call` method is constructed by the RAGAS framework from the metric's prompt template, so it will differ from the raw rows in your dataset. To ensure compatibility with the RAGAS framework, you should: accept whatever type the framework passes (a plain string or a `PromptValue`) and return a string the parser can handle.
By following these guidelines, you can ensure that your custom LLM works smoothly with the evaluation pipeline.
@dosu |
Here's an example of how you might structure the input and output for your custom LLM's `_call` method.

Input Example:

```python
prompt = "Question: What are the errors related to 3V tension in the Futura machine?\nContexts: [\"Error E306: Voltage 3.3V below limit - Check output connections - Replace hydraulic board\", \"Error E307: Internal voltage 3.3V above limit - Check output connections - Replace hydraulic board\"]\nAnswer: The errors related to 3V tension in the Futura machine include E306 and E307."
```

Output Example:

```json
{
    "question": "What are the errors related to 3V tension in the Futura machine?",
    "contexts": [
        "Error E306: Voltage 3.3V below limit - Check output connections - Replace hydraulic board",
        "Error E307: Internal voltage 3.3V above limit - Check output connections - Replace hydraulic board"
    ],
    "answer": "The errors related to 3V tension in the Futura machine include E306 and E307."
}
```

Ensure that your `_call` method returns output structured like this so that it can be parsed as JSON.
@dosu Question: Quali sono gli errori del macchinario futura, riguardanti la tensione a 3V? |
The input you've provided for your custom LLM is a plain question string.
This format is acceptable as an input for the custom LLM; RAGAS will wrap it with the metric's own instructions during evaluation.
@dosu |
Certainly! You can share the function from your `CustomRagasLLM` class here and I can take a look.
```python
class CustomRagasLLM(BaseRagasLLM):
    ...
```
The function you've provided for `CustomRagasLLM` processes the prompt locally without making an API call.
Given this setup, the output from your `_call` method is a plain string, which is what the RAGAS parser will then try to interpret.
But when I run the code, in the class that I showed you I see these errors: what should I do? |
The error "Object of type PromptValue is not JSON serializable" indicates that the Here's how you can modify your
Here's an example modification: async def _call(self, prompt: PromptValue) -> str:
"""
Process the prompt and return a result. This can be customized to
use a local model or perform any required logic.
"""
# Convert PromptValue to a string or dictionary
prompt_str = prompt.to_string() # Assuming to_string() method exists
# Alternatively, if it can be converted to a dictionary
# prompt_dict = prompt.to_dict() # Assuming to_dict() method exists
# Create the result dictionary
result = {"prompt": prompt_str, "processed": True}
# Debugging: Print the result of `_call`
print("DEBUG: Result from _call:", result)
# Return the result as a JSON string
return json.dumps(result) Make sure to replace To continue talking to Dosu, mention @dosu. Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other |
OK, the code you gave me for `to_string()` led me to this: |
The error "AttributeError: 'str' object has no attribute 'to_string'" suggests that the Here's how you can modify your async def _call(self, prompt: str) -> str:
"""
Process the prompt and return a result. This can be customized to
use a local model or perform any required logic.
"""
# Since prompt is already a string, use it directly
result = {"prompt": prompt, "processed": True}
# Debugging: Print the result of `_call`
print("DEBUG: Result from _call:", result)
# Return the result as a JSON string
return json.dumps(result) This change assumes that the To continue talking to Dosu, mention @dosu. Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other |
@dosu |
The error "Object of type PromptValue is not JSON serializable" indicates that the Here's how you can modify your
Here's an example modification: async def _call(self, prompt: PromptValue) -> str:
"""
Process the prompt and return a result. This can be customized to
use a local model or perform any required logic.
"""
# Convert PromptValue to a string or dictionary
if isinstance(prompt, PromptValue):
prompt_str = prompt.to_string() # Assuming to_string() method exists
else:
prompt_str = str(prompt) # Fallback to converting to string
# Create the result dictionary
result = {"prompt": prompt_str, "processed": True}
# Debugging: Print the result of `_call`
print("DEBUG: Result from _call:", result)
# Return the result as a JSON string
return json.dumps(result) Make sure to replace To continue talking to Dosu, mention @dosu. Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other |
@dosu |
To address the error "Object of type PromptValue is not JSON serializable," you can convert the Here's how you can modify your
Here's an example modification: async def _call(self, prompt: PromptValue) -> str:
"""
Process the prompt and return a result. This can be customized to
use a local model or perform any required logic.
"""
# Convert PromptValue to a dictionary
if isinstance(prompt, PromptValue):
prompt_dict = prompt.to_dict() # Assuming to_dict() method exists
else:
prompt_dict = {"prompt": str(prompt)} # Fallback to a simple dictionary
# Create the result dictionary
result = {"prompt": prompt_dict, "processed": True}
# Debugging: Print the result of `_call`
print("DEBUG: Result from _call:", result)
# Return the result as a JSON string
return json.dumps(result) Ensure that the To continue talking to Dosu, mention @dosu. Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other |
@dosu |
To determine the attributes of the `PromptValue` class: if you have access to the source code, you can check the class definition directly and look for the attributes and methods it defines. If you don't have access to the source code, you might consider using Python's built-in functions to inspect the class at runtime. For example, you can use `dir()`:

```python
# Assuming you have an instance of PromptValue
prompt_value_instance = PromptValue(...)
print(dir(prompt_value_instance))
```

This will list all the attributes and methods available on the instance, so you can see whether conversions like `to_string()` or `to_dict()` exist.
Does RAGAS specifically require a flattened format for the context?
Or do you think this format is OK: `Sequence[string]`?
I am working with version 0.1.7.