
Sequence[string] #1844

Open
amin-kh96 opened this issue Jan 13, 2025 · 60 comments
Labels
question Further information is requested

Comments

@amin-kh96

Does RAGAS specifically require a flattened format for the context, or is this format OK: Sequence[string]?
I am working with version 0.1.7.

@amin-kh96 amin-kh96 added the question Further information is requested label Jan 13, 2025
@amin-kh96
Author

@dosue
why don't you give me the answer?

@amin-kh96
Author

@dosu
why don't you give me the answer?


dosubot bot commented Jan 13, 2025

RAGAS version 0.1.7 expects the context input to be in the format of a List[str] (a list of strings) [1]. The Sequence[string] format is not explicitly mentioned as supported, so it's recommended to use a List[str] to ensure compatibility.

This is a bit tricky, @jjmachan, could you lend your expertise on whether RAGAS specifically requires a flattened format for the context?


@hundredeuk2
Contributor

This issue seems related to #1807. To assist you more effectively, I’ve replicated your described data format in the same development environment (RAGAS version 0.1.7), and everything appears to work fine on my end. However, to expedite troubleshooting and ensure we can address your specific issue accurately, it would be extremely helpful if you could share a portion of your dataset or provide the full code you used.

from ragas import evaluate
import datasets
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)


sample = {
"contexts": [["context1_chunk1", "context1_chunk2"], ["context2_chunk1"]],
"question": ["question1", "question2"],
"answer": ["answer1", "answer2"]
}
dt = datasets.Dataset.from_dict(sample)

result = evaluate(
    dt,
    metrics=[
        faithfulness,
        answer_relevancy,
    ],
)
Evaluating:   0%|          | 0/4 [00:00<?, ?it/s]No statements were generated from the answer.
No statements were generated from the answer.
Evaluating: 100%|██████████| 4/4 [00:03<00:00,  1.05it/s]
|   | contexts                           | question   | answer   | faithfulness | answer_relevancy |
|---|------------------------------------|------------|----------|--------------|------------------|
| 1 | [context1_chunk1, context1_chunk2] | question1  | answer1  | NaN          | 0.859496         |
| 2 | [context2_chunk1]                  | question2  | answer2  | NaN          | 0.000000         |

@amin-kh96
Author

amin-kh96 commented Jan 14, 2025

@hundredeuk2
here is the full code.
I extracted my embeddings and the textual data related to them into this file: user_llm_interaction_embeddings_c1521dd5_b819_4241_b3a4_3e5c1388037c.json

the full code:

import json
import os
from pathlib import Path

import numpy as np
from datasets import Dataset
import typing as t
import torch

from langchain_core.outputs import LLMResult, Generation
from transformers import AutoModel, AutoTokenizer

from ragas.llms.prompt import PromptValue
from ragas.embeddings import BaseRagasEmbeddings
from ragas.metrics import context_utilization
from ragas.llms import BaseRagasLLM
from ragas import evaluate

current_script_path = Path(__file__).resolve()
config_base_path = current_script_path.parents[1]

# Load the ground truth data
file_path = os.path.join('src', 'assets', 'GT.json')
with open(config_base_path / file_path) as f:
    ground_truth_data = json.load(f)

# Load the question, the answer and the chunks
file_path = os.path.join('src', 'assets', 'user_llm_interaction_embeddings_c1521dd5_b819_4241_b3a4_3e5c1388037c.json')
with open(config_base_path / file_path) as f:
    llm = json.load(f)

# Creating a dataset of str type
# new_data_set = []
question = []
context = []
answer = []
# Extracting str data of 'question' and 'answer'
for item in llm:
    if item['role'] == 'user':
        for c in item['content']:
            question.append(c['text'])
    else:
        for c in item['content']:
            answer.append(c['text'])

# Iterate through each dictionary in your data
# for item in ground_truth_data:
#     # Check if 'content' key exists in the dictionary
#     if 'content' in item:
#         # Access the value of the 'content' key and append it to the context list
#         context.append(item['content'])
#     else:
#         print(f"'content' key not found in item with id: {item.get('id')}")

# Check the length of context to see if anything was appended
# print(f"Number of context entries extracted: {len(context)}")

# Iterate through each dictionary in your data
for item in llm:
    # Check if 'content' key exists in the dictionary
    if item['role'] == 'assistant':
        # Access the value of the 'content' key and append it to the context list
        context.append(item['chunks'])
    else:
        pass  # print(f"'content' key not found in item with id: {item.get('id')}")

# Check the length of context to see if anything was appended
# print(f"Number of context entries extracted: {len(context)}")

# Replace the IDs with the corresponding content
chunk_embeddings = []
chunk_string = []

for sublist in context:
    strings = []     # Initialize for each sublist
    embeddings = []  # Initialize for each sublist
    for idx, i in enumerate(sublist):
        for item in ground_truth_data:
            if item['id'] == i:
                # Append matching content and embeddings
                strings.append(item['content'])
                embeddings.append(item['text_vector'])
    # Append results for the current sublist
    chunk_embeddings.append(embeddings)
    chunk_string.append(strings)

# Initialize empty lists for dataset
new_ragas_dataset = {
    "question": [],
    "contexts": [],
    "answer": []
}

# Assuming question, context, and answer lists are already available
for i in range(len(question)):
    new_ragas_dataset['question'].append(question[i])

    # For now, we assign all the chunks (contexts) to each question
    new_ragas_dataset['contexts'].append(chunk_string[i])  # context is a list of chunks

    # Assign the corresponding answer
    new_ragas_dataset['answer'].append(answer[i])

# Print to verify the format
# print(f"Dataset length: {len(new_ragas_dataset['question'])}")
# print(f"Sample entry:\n{new_ragas_dataset['question'][0]}")          # Question sample
# print(f"Related contexts: {len(new_ragas_dataset['contexts'][0])}")  # Contexts for the first question
# print(f"Answer sample: {new_ragas_dataset['answer'][0]}")            # Answer sample

# Initialize an empty list to hold the new dataset
data_set = []

# Iterate through the list and combine every two dictionaries
for i in range(0, len(llm), 2):
    # Calculate the corresponding index for chunky_emi
    chunk_index = i // 2  # Map llm index to chunky_emi index
    combined_dict = {
        "text_vector_1": llm[i].get("text_vector", []),
        "text_vector_2": llm[i + 1].get("text_vector", []),
        'chunks_embd': chunk_embeddings[chunk_index]
    }
    data_set.append(combined_dict)

# for j in llm:
#     # Check if 'content' key exists in the dictionary
#     if j['role'] == 'assistant':
#         # Access the value of the 'content' key and append it to the context list
#         data_set.append(j['chunks'])
#     else:
#         pass

# def map_chunks(data_set, ground_truth_data):
#     for item in data_set:  # Iterate over each dictionary in data_set
#         c = []  # Reset c for each item
#         for chunk_id in item['chunks']:  # Loop through 'chunks' in the current dictionary
#             for element in ground_truth_data:  # Loop through ground_truth_data
#                 if element['id'] == chunk_id:  # Match chunk_id with element's id
#                     c.append(element['text_vector'])  # Append the matching text_vector to c
#         item['chunks'] = c  # Replace the original 'chunks' (ids) with the mapped text_vector values
#     return data_set  # Return the updated data_set

# data_set = map_chunks(data_set, ground_truth_data)
# data_set.append(chunk_embeddings)

# Assuming data_set is a list of dictionaries
ragas_data = [
    {
        "question": entry["text_vector_1"],  # Assuming this is a list of strings
        "answer": entry["text_vector_2"],    # Assuming this is a list of strings
        "contexts": entry["chunks_embd"]     # Assuming this is a list of lists of strings
    }
    for entry in data_set
]

# Create the required structure for the Hugging Face Dataset creation
formatted_data = {
    "question": [entry["question"] for entry in ragas_data],
    "contexts": [entry["contexts"] for entry in ragas_data],
    "answer": [entry["answer"] for entry in ragas_data]
}

# Define the column_map to match custom columns to expected ones
column_map = {
    "question": "question",  # Match to the correct column
    "answer": "answer",      # Match to the correct column
    "contexts": "contexts"   # Match to the correct column
}

# Create a Dataset using the Hugging Face datasets library
new_ragas_dataset = Dataset.from_dict(new_ragas_dataset)

# Assuming new_ragas_dataset is your dataset
# def flatten_contexts(example):
#     example['contexts'] = [item for sublist in example['contexts'] for item in sublist]
#     return example
# new_ragas_dataset = new_ragas_dataset.map(flatten_contexts)

model_name = 'distilbert-base-uncased'

class CustomHuggingFaceRagasEmbeddings(BaseRagasEmbeddings):
    def __init__(self, model_name: str, custom_embeddings: dict = None):
        """
        Initialize the Custom Hugging Face Ragas Embeddings with the specified model and custom embeddings.

        Parameters:
            model_name (str): The name of the Hugging Face model to use (e.g., 'distilbert-base-uncased').
            custom_embeddings (dict): A dictionary with pre-computed custom embeddings (optional),
                                      where keys are text strings and values are the corresponding embeddings.
        """
        self.model_name = model_name
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
        self.custom_embeddings = custom_embeddings if custom_embeddings else {}  # Store custom embeddings as a dict

    def embed_text(self, text: str) -> np.ndarray:
        """
        Check if the text has an existing custom embedding; if not, compute it using the model.

        Parameters:
            text (str): The text to embed.

        Returns:
            np.ndarray: The embedding for the text.
        """
        if text in self.custom_embeddings:
            # Return the custom embedding if it exists
            return np.array(self.custom_embeddings[text])

        # Generate a new embedding using the Hugging Face model
        inputs = self.tokenizer(text, return_tensors='pt', padding=True, truncation=True)

        with torch.no_grad():
            outputs = self.model(**inputs)

        # Use the CLS token embedding
        embedding = outputs.last_hidden_state[:, 0, :].numpy()

        # Optionally, save this new embedding to custom_embeddings for future use
        self.custom_embeddings[text] = embedding

        return embedding

    def embed_documents(self, texts: list) -> np.ndarray:
        """
        Generate embeddings for a list of documents. Check if custom embeddings are available first.

        Parameters:
            texts (list): A list of documents to embed.

        Returns:
            np.ndarray: An array of embeddings for the documents.
        """
        embeddings = []
        for text in texts:
            embedding = self.embed_text(text)
            embeddings.append(embedding)

        return np.array(embeddings)

    def embed_query(self, query: str) -> np.ndarray:
        """
        Generate an embedding for a single query.

        Parameters:
            query (str): The query to embed.

        Returns:
            np.ndarray: The embedding for the query.
        """
        return self.embed_text(query)

# Initialize the custom embeddings class with your pre-existing embeddings (if any)

# Define the custom LLM class
class CustomRagasLLM(BaseRagasLLM):
    def __init__(self, api_key: str = None):
        """
        Initialize the custom LLM, optionally using an API key if necessary.
        """
        self.api_key = api_key

    async def _call(self, prompt: str) -> str:
        """
        Process the prompt and return a result. This can be customized to
        use a local model or perform any required logic.
        """
        if not self.api_key:
            return f"Processed: {prompt} (without API key)"
        else:
            # Handle LLM response if using an API
            return f"Processed: {prompt} (with API key: {self.api_key})"

    async def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        # Synchronous generation logic
        text = await self._call(prompt)
        return LLMResult(generations=[[Generation(text=text)]])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        """
        Asynchronous method to generate text. This should allow for async processing.
        """
        # Simulate an asynchronous call; here we directly call the sync method for now
        text = await self._call(prompt)
        return LLMResult(generations=[[Generation(text=text)]])


if __name__ == "__main__":
    # custom_embeddings_dict = {
    #     # Example: "question text": [embedding values], "context text": [embedding values], etc.
    # }
    custom_embeddings_dict = formatted_data
    # custom_embeddings = CustomHuggingFaceRagasEmbeddings(model_name='distilbert-base-uncased', custom_embeddings=custom_embeddings_dict)

    ragas_embeddings = CustomHuggingFaceRagasEmbeddings(model_name='distilbert-base-uncased', custom_embeddings=custom_embeddings_dict)
    custom_llm = CustomRagasLLM(api_key=None)

    # Define the evaluation metrics
    metrics = [context_utilization]
    # Create a Dataset using the Hugging Face `datasets` library
    # ragas_dataset = Dataset.from_dict(formatted_data)
    # Run the evaluation
    evaluation_report = evaluate(new_ragas_dataset, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm, column_map=column_map)  # Pass your custom LLM here

    print("RAGAS Evaluation Report:")
    print(evaluation_report)

@amin-kh96
Author

@hundredeuk2
The file that I attached contains the answers and the questions; I extracted the chunks from another file, which was a bit heavy to share. This should give you some insight.

@hundredeuk2
Contributor

While reviewing the evaluation process, it appears that the variable new_ragas_dataset is central to performing the assessment. I replicated the steps on my end, and the same error was reproduced. Upon investigation, it seems the root cause lies in the contexts column being returned as an empty list.

This suggests there might be an issue with retrieving or processing the documents during the refinement stage. It would be helpful to double-check if something is being missed or mishandled in that step.

Additionally, since the ground_truth_data from your source code wasn’t included, I couldn’t replicate your setup with complete accuracy. It’s possible that this could also be influencing the observed behavior.

print(new_ragas_dataset)
{'question': ['Quali sono gli errori del macchinario futura, riguardanti la tensione a 3V?',
  "La macchina futura prevede qualche specifica per la conduttività dell'acqua?",
  'Quali sono gli errori che mi segnalano questi problemi?'],
 'contexts': [[], [], []],
 'answer': ['Gli errori riguardanti la tensione a 3V per il macchinario Futura sono i seguenti:\n\n1. **E306**: Tensione 3.3V sotto il limite\n   - Verificare le connessioni in uscita\n   - Sostituire la scheda idrico\n\n2. **E307**: Tensione 3.3V interno sopra il limite\n   - Verificare le connessioni in uscita\n   - Sostituire la scheda idrico\n\n3. **E716**: Tensione 3.3V sotto il limite\n   - Verificare connessioni in uscita\n   - Sostituire scheda base tavoletta\n\n4. **E717**: Tensione 3.3V interno sopra il limite\n   - Verificare connessioni in uscita\n   - Sostituire scheda base tavoletta\n\n5. **E449**: Tensione 3.3V sotto il limite\n   - Verificare i collegamenti in uscita\n   - Sostituire la scheda poltrona\n\n6. **E450**: Tensione 3.3V interno sopra il limite\n   - Verificare i collegamenti in uscita\n   - Sostituire la scheda poltrona\n\n7. **E116**: Tensione 3.3V sotto il limite\n   - Verificare le connessioni in uscita\n   - Sostituire la scheda base tavoletta medico\n\n8. **E117**: Tensione 3.3V interno sopra il limite\n   - Verificare le connessioni in uscita\n   - Sostituire la scheda base tavoletta medico\n\n9. **E224**: Tensione 3.3V sotto il limite\n   - Verificare le connessioni in uscita\n   - Sostituire la scheda idrico\n\n10. **E225**: Tensione 3.3V interno sopra il limite\n    - Verificare le connessioni in uscita\n    - Sostituire la scheda idrico',
  "Sì, la macchina Futura prevede specifiche precise per la conduttività dell'acqua utilizzata.\n\nEcco i punti salienti relativi alla conduttività dell'acqua:\n\n1. **Conduttività Massima:** L'acqua demineralizzata distillata utilizzata per l'alimentazione della sterilizzatrice dovrebbe avere una **conduttività inferiore a 15 μS/cm**.\n2. **Sensore di Conduttività:** La sterilizzatrice è dotata di un sensore di conduttività che verifica automaticamente la qualità dell'acqua trattata dal filtro demineralizzatore integrato.\n3. **Allarme Conduttività:** Se i valori di conduttività superano i 20 μS/cm, un allarme segnalerà la necessità di procedere alla sostituzione della cartuccia di resina a scambio ionico.\n4. **Rapporto Ciclo:** La misura di conducibilità dell’acqua viene effettuata automaticamente ad ogni avvio del ciclo di sterilizzazione o di test e il valore relativo è riportato sul rapporto ciclo.\n\nL'utilizzo di acqua con conduttività superiore ai limiti indicati può causare danni al macchinario e compromettere l'efficacia della sterilizzazione, incrementando anche il rischio di ossidazione e la formazione di residui calcarei.",
  "Gli errori che possono segnalare problemi relativi alla conduttività dell'acqua sono generalmente associati a malfunzionamenti del sistema idrico o delle sonde di livello. Tuttavia, in base ai documenti forniti, non sembra esserci un errore specifico che segnali direttamente un problema di conduttività dell'acqua. Gli errori che potrebbero indirettamente indicare problemi legati alla qualità dell'acqua o alla conduttività sono:\n\n1. **W64**: **Manca H₂O, riempire serbatoio 2**\n   - Azioni: Riempire il serbatoio H₂O₂, verificare la scheda igiene, verificare la sonda di livello.\n\n2. **W212**: **Serbatoio disinfettante pieno**\n   - Possibile implicazione di conduttività alta se il serbatoio pieno impedisce il normale funzionamento.\n\n3. **E55**: **S1 livello sonde incongruente Max = on Min = off**\n   - Incongruenza sonde serbatoio 1 W.H.E.: Verificare lo stato della scheda igiene, verificare le sonde, verificare lo stato dei led della scheda igiene.\n\n4. **E57**: **S2 Livello sonde incongruente Max = on Min = off**\n   - Incongruenza sonde serbatoio 2 del sistema W.H.E.: Verificare lo stato della scheda igiene, verificare le sonde, verificare lo stato dei led della scheda igiene.\n\n5. **E60**: **Durante il funzionamento normale del riunito, lettura di sonda di FULL coperta**\n   - Tentare la procedura di svuotamento del sistema W.H.E.. Verificare la presenza di trafilamenti acqua nel sistema W.H.E..\n\n6. **E67**: **Anomalia della sonda di massimo serbatoio 1**\n   - Serbatoio 1 eliminato per sonda STOP attiva: Verificare l'ingresso acqua al sistema W.H.E., verificare le sonde del serbatoio W.H.E.\n\n7. **E68**: **Anomalia della sonda di massimo serbatoio 2**\n   - Serbatoio 2 eliminato per sonda STOP attiva: Verificare l'ingresso acqua al sistema W.H.E., verificare le sonde del serbatoio W.H.E.\n\nIn caso emerga uno di questi errori, potrebbe essere utile controllare anche la qualità dell'acqua utilizzata per assicurarsi che rientri nei parametri specificati di conduttività per evitare ulteriori problemi di funzionamento del macchinario Futura."]}

@amin-kh96
Author

@hundredeuk2
I have a few things to share with you.
First, I do not have the ground truth (GT) and I do not know how to obtain it, because the topics are so specialized that ChatGPT cannot be a source of GT in this case. I will later create a database based on the analyzed answers to use as ground truth (please correct me if I am wrong).

Second, new_ragas_dataset is my textual dataset; I also have another dataset with the embeddings in the same format, so that I can avoid API calls during evaluation. That is why I wrote subclasses for the embeddings and the LLM. Maybe this caused the error, because it cannot match the embeddings with the related text. What do you think? But again, they are in the exact same format.

I think I may have to make some changes in the CustomRagasLLM subclass. What do you think?

@amin-kh96
Author

@dosu
Regarding this finding: "Upon investigation, it seems the root cause lies in the contexts column being returned as an empty list. This suggests there might be an issue with retrieving or processing the documents during the refinement stage. It would be helpful to double-check if something is being missed or mishandled in that step."

How can I understand and solve this? Could you explain it and give me a solution?


dosubot bot commented Jan 15, 2025

To resolve the issue of the contexts column being returned as an empty list during the refinement stage in your RAGAS evaluation setup, you should ensure that the query_engine.aquery method is correctly retrieving and processing documents. Here are the steps you can take:

  1. Verify Query Engine Configuration: Ensure that the query_engine is properly configured to access and retrieve documents. Check if the data source is correctly set up and accessible.

  2. Check Query Execution: Confirm that the query_engine.aquery method is executing queries correctly and returning results. You can add logging or debugging statements to verify the responses and source nodes being returned.

  3. Inspect Source Nodes: In the code, the retrieved_contexts are populated from r.source_nodes. Ensure that r.source_nodes is not empty and contains the expected nodes. If it is empty, investigate why the query engine is not retrieving any nodes.

  4. Validate Dataset: Make sure the dataset is correctly formatted and contains valid samples. The dataset should be an instance of EvaluationDataset and should not be empty.

  5. Review Embeddings and LLM Wrappers: If you are using embeddings or LLMs, ensure that they are correctly wrapped and integrated using LlamaIndexEmbeddingsWrapper and LlamaIndexLLMWrapper.

By following these steps, you should be able to identify and resolve the issue with the contexts column being empty [1].


@amin-kh96
Author

@hundredeuk2
Please read the bot's answer too and share your thoughts with me. Thanks.

@hundredeuk2
Contributor

I might not fully understand the exact process you are attempting, but from the perspective of RAGAS Evaluation, your provided code cannot work without ground_truth_data.

The core issue seems to stem from the fact that the contexts column is being returned as an empty list, which directly causes the mentioned issue. To resolve this, the contexts must be transformed into a list that includes strings.

Based on the code snippet you shared:

chunk_embeddings = []
chunk_string = []

for sublist in context:
    strings = []
    embeddings = [] 
    for idx, i in enumerate(sublist):
        for item in ground_truth_data:
            if item['id'] == i:
                strings.append(item['content'])
                embeddings.append(item['text_vector'])
        chunk_embeddings.append(embeddings)
        chunk_string.append(strings)

The issue seems to arise because nothing is being appended to chunk_string. Since the data source is ground_truth_data, if that is unavailable or inaccessible, the code cannot function as intended.

That said, if creating ground_truth_data is not an option, you could replace it with a set of N retrieved documents used for generating the answer. This substitution should allow the evaluation process to function properly, even without ground_truth_data.
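For instance, a hypothetical sketch of that substitution (here retrieved_docs is an assumed dict you would build from your retrieval pipeline, mapping chunk id to chunk text, and context is the list of chunk-id lists from the posted script):

chunk_string = []
for sublist in context:  # each sublist holds the chunk ids attached to one answer
    chunk_string.append([retrieved_docs[cid] for cid in sublist if cid in retrieved_docs])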

@amin-kh96
Author

@dosu
Should I use or write a subclass of BaseRagasLLM if I already have the question, context, and answer?
I do not want to use API calls; I already have the embeddings as well as the string data.


dosubot bot commented Jan 16, 2025

If you already have the question, context, and answer, and you want to avoid using API calls, you don't necessarily need to subclass BaseRagasLLM unless you need to customize the behavior of the language model within the RAGAS framework. Since you have the embeddings and string data, you can focus on ensuring that your dataset is correctly formatted and that the evaluation process can access the necessary data.

Here are some steps to consider:

  1. Ensure Correct Dataset Format: Make sure your dataset is structured correctly, with the contexts column being a list of lists of strings, as required by RAGAS [1].

  2. Use Custom Embeddings: If you have precomputed embeddings, ensure they are correctly integrated using a subclass of BaseRagasEmbeddings. This allows you to use your embeddings without API calls [2].

  3. Avoid Subclassing LLM: If you don't need to generate new text or modify the LLM's behavior, you might not need to subclass BaseRagasLLM. Instead, focus on using your existing data for evaluation.

  4. Check Data Retrieval: Ensure that your code correctly retrieves and processes the contexts, as this is crucial for the evaluation to work. If ground_truth_data is unavailable, consider using a set of retrieved documents as a substitute, as suggested by @hundredeuk2.

By following these steps, you can effectively use your existing data for evaluation without relying on API calls.
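For step 2 above, a minimal sketch of such a subclass (lookup is a hypothetical dict keyed by the exact text strings in your dataset; this is only one way to wire precomputed vectors in):

from ragas.embeddings import BaseRagasEmbeddings

class PrecomputedRagasEmbeddings(BaseRagasEmbeddings):
    """Serve vectors from a text -> embedding lookup so no embedding API is called."""

    def __init__(self, lookup: dict):
        self.lookup = lookup  # hypothetical dict: exact text string -> list of floats

    def embed_query(self, text: str) -> list:
        # A KeyError here means the text was never pre-embedded
        return list(self.lookup[text])

    def embed_documents(self, texts: list) -> list:
        return [self.embed_query(t) for t in texts]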


@amin-kh96
Author

@dosu
I do not need to generate text or modify the LLM, but I previously saw an error asking me to set the LLM, which would also use API calls. That is why I am using the subclass. Correct me if I am wrong.


dosubot bot commented Jan 16, 2025

If you do not need to generate text or modify the LLM, and you want to avoid API calls, you might not need to subclass BaseRagasLLM. The subclassing is typically necessary when you need to customize the behavior of the language model, such as generating text or handling specific configurations.

However, if you encountered an error related to setting the LLM, it might be due to the framework expecting an LLM instance even if it's not used for text generation. In such cases, you can provide a minimal implementation of BaseRagasLLM that doesn't perform any API calls. This can be a simple subclass that overrides the necessary methods without actually calling any external APIs.

Here's a basic example of how you might implement this:

class MinimalRagasLLM(BaseRagasLLM):
    async def _call(self, prompt: str) -> str:
        return "No API call needed"

    async def generate_text(self, prompt, n=1, temperature=1e-8, stop=None, callbacks=[]):
        return LLMResult(generations=[[Generation(text="No API call needed")]])

    async def agenerate_text(self, prompt, n=1, temperature=1e-8, stop=None, callbacks=[]):
        return LLMResult(generations=[[Generation(text="No API call needed")]])

This way, you satisfy the framework's requirement for an LLM without making any API calls. Ensure your dataset is correctly formatted and that your embeddings are properly integrated to avoid other potential issues.
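A hedged usage sketch (the names data_set and ragas_embeddings follow the earlier snippets in this thread): pass the stub in so evaluate has an LLM object. Keep in mind that an LLM-graded metric such as context_utilization still needs real model judgments, so canned text will not produce meaningful (or even parseable) scores.

evaluation_report = evaluate(
    data_set,                      # your Hugging Face Dataset
    metrics=[context_utilization],
    llm=MinimalRagasLLM(),
    embeddings=ragas_embeddings,   # e.g. a precomputed-embeddings subclass
)
print(evaluation_report)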


@amin-kh96
Author

@hundredeuk2
This is my code with only one sample.
Can you help me figure out why I still see the error "failed to parse output"?
I added some debug statements, but I am still not sure.

import json
import os
from pathlib import Path
import numpy as np
from datasets import Dataset
from datasets.features import Features, Sequence, Value
import typing as t
import torch
import asyncio
from langchain_core.outputs import LLMResult, Generation
from transformers import AutoModel, AutoTokenizer

from ragas.llms.prompt import PromptValue
from ragas.embeddings import BaseRagasEmbeddings
from ragas.metrics import context_utilization
from ragas.llms import BaseRagasLLM
from ragas import evaluate

data_set ={
'question': ['Quali sono gli errori del macchinario futura, riguardanti la tensione a 3V?'],
'contexts':[['NO CAPTION\n\n|E…|CAUSA CONDIZIONE|VERIFICHE|\n|229|Tensione 24Vdc sopra il limite|- verificare la scheda idrico - verificare la tensione in ingresso - verificare la connessione|\n|230|Tensione 12Vdc sotto il limite|- verificare la scheda idrico - verificare la tensione in ingresso - verificare la connessione|\n|231|Tensione 12Vdc sopra il limite|- verificare la scheda idrico - verificare la tensione in ingresso - verificare la connessione|\n|232|Tensione 5Vdc sotto il limite|- verificare la scheda idrico - verificare la tensione in ingresso - verificare la connessione|\n|233|Tensione 5Vdc sopra il limite|- verificare la scheda idrico - verificare la tensione in ingresso - verificare la connessione|\n|251|Errore di comunicazione tra back- end e front-end della console|Spegnere e riaccendere il dispositivo|\n|300|Rilevata condizione di sovrassorbi- mento del motore sliding|- verificare meccanicamente lo sliding - verificare il motore - verificare i cablaggi - verificare la scheda sliding|\n|301|Valore del potenziometro sliding non corretto|- verificare il potenziometro - ripetere la taratura della poltrona - verificare la scheda sliding|\n|302|Valore del potenziometro troppo bas- so|- verificare il potenziometro - ripetere la taratura della poltrona - verificare la scheda sliding|\n|303|Valore del potenziometro troppo alto|- verificare il potenziometro - ripetere la taratura della poltrona - verificare la scheda sliding|\n|304|Valore di tensione di riferimento in- terno alla scheda sotto il limite|- sostituire la scheda idrico|\n|305|Valore di tensione di riferimento in- terno alla scheda sopra il limite|- sostituire la scheda idrico|\n|306|Tensione 3.3V sotto il limite|- verificare le connessioni in uscita - sostituire la scheda idrico|\n|307|Tensione 3.3V interno sopra il limite|- verificare le connessioni in uscita - sostituire la scheda idrico|\n|308|Guasto sensore temperatura CPU|- sostituire la scheda idrico|\n|309|Temperatura CPU sopra il limite|- verificare la temperatura - sostituire la scheda idrico|\n|310|Tensione 24Vdc sotto il limite|- verificare la scheda idrico - verificare la tensione in ingresso - verificare la connessione|\n|311|Tensione 24Vdc sopra il limite|- verificare la scheda idrico - verificare la tensione in ingresso - verificare la connessione|']],
'answer':['Gli errori riguardanti la tensione a 3V per il macchinario Futura sono i seguenti:\n\n1. E306: Tensione 3.3V sotto il limite\n - Verificare le connessioni in uscita\n - Sostituire la scheda idrico\n\n2. E307: Tensione 3.3V interno sopra il limite\n - Verificare le connessioni in uscita\n - Sostituire la scheda idrico\n\n3. E716: Tensione 3.3V sotto il limite\n - Verificare connessioni in uscita\n - Sostituire scheda base tavoletta\n\n4. E717: Tensione 3.3V interno sopra il limite\n - Verificare connessioni in uscita\n - Sostituire scheda base tavoletta\n\n5. E449: Tensione 3.3V sotto il limite\n - Verificare i collegamenti in uscita\n - Sostituire la scheda poltrona\n\n6. E450: Tensione 3.3V interno sopra il limite\n - Verificare i collegamenti in uscita\n - Sostituire la scheda poltrona\n\n7. E116: Tensione 3.3V sotto il limite\n - Verificare le connessioni in uscita\n - Sostituire la scheda base tavoletta medico\n\n8. E117: Tensione 3.3V interno sopra il limite\n - Verificare le connessioni in uscita\n - Sostituire la scheda base tavoletta medico\n\n9. E224: Tensione 3.3V sotto il limite\n - Verificare le connessioni in uscita\n - Sostituire la scheda idrico\n\n10. E225: Tensione 3.3V interno sopra il limite\n - Verificare le connessioni in uscita\n - Sostituire la scheda idrico']
}

embedding_dataset = {
'question':[[-0.019296993, -0.005491396, 0.019954083, -0.015649473, -0.016722273, 0.010982792, -0.015019203, -0.0015924377, -0.031701244, -0.0014709096, 0.034061406, 0.030816184, -0.017285492, -0.0010308939, 0.026109274, -0.010131257, 0.033739567, 0.012840077, 0.024218464, -0.0095680365, 4.7956473e-05, 0.016816143, -0.016172463, -0.0058232932, 0.0016561353, 5.6521065e-05, 0.0062188883, -0.014120732, 0.023923444, -0.013570922, 0.038138047, -0.013047932, -0.0006864245, 0.008984702, -0.002331664, -0.008327612, 0.00091188017, 0.0071341214, 0.008723207, -0.0020986653, 0.041624647, 0.008837191, 0.017540284, 0.002329988, -0.012920537, 0.022890873, 0.008870716, -0.0027172016, -0.020490484, 0.023588194, 0.015622652, 0.0022059453, -0.0040766406, -0.021818073, 0.011170532, 0.0077241613, -0.005880286, 0.00015086253, 0.0011767277, ...]],
'contexts':[[-0.010124174, -0.0067393044, 0.0080547035, -0.030676855, -0.042065702, 0.005971707, -0.01143619, -0.0334632, -0.016406978, -0.024035612, 0.027146582, 0.024103243, -0.020532392, -0.006823842, 0.007953258, 0.006235463, 0.0014768647, 0.0033409097, 0.015892992, -0.018760493, 0.013627394, 0.034220655, -0.04428396, -0.020315977, 0.0013576673, -0.022791225, -0.0062794224, -0.01693449, 0.005944655, -0.020261873, 0.023765095, -0.0077368435, -0.02578046, -0.0016307223, -0.0025208988, -0.019220375, -0.0047881864, -0.020667652, 0.012200407, 0.009738684, 0.019666731, 0.010861338, -0.0056910436, -0.012829363, -0.012423585, 0.0051297164, 0.0150814345, 0.010130936, 0.0034254468, 0.048963938, 0.0020458, 0.0032360835, -0.014310455, -0.011794629, 0.035924923, 0.010996597, 0.0022064205, 0.016582815, 0.0060528624, ...]],
'answer' :[[-0.001484744, -0.0084717255, 1.1016621e-06, -0.018805072, -0.037394304, -0.0007946458, -0.029138418, -0.025050946, -0.0043201754, -0.001330452, 0.03151266, 0.022069653, -0.013301148, -0.009092266, -0.0070013143, -0.002585023, 0.022582274, 0.017604461, 0.020788103, -0.011648622, 0.009402536, 0.03100004, -0.01644432, -0.019830313, -0.0032156808, -0.019398632, -0.0063774018, -0.025981756, -0.00401665, -0.009335086, 0.019115342, -0.0036153223, -0.022757644, 0.014393839, -0.0017435836, -0.014596189, -0.0015690567, -0.004063865, 0.004654053, -0.0026103167, 0.04799745, 0.013962159, -8.678924e-05, -0.0011651994, -0.009328341, 0.03412972, 0.00834357, 0.019816823, -0.024174096, 0.04902269, 0.004667543, -0.0010530638, -0.0014029609, -0.01663318, 0.026966527, 0.00045570935, 0.0071699396, 0.009348576, -0.013415813, ...]]

}

features = Features({
"contexts": Sequence(Value("string")), # Explicitly define as Sequence[string]
"question": Value("string"),
"answer": Value("string")
})

# Create a Dataset using the Hugging Face datasets library
data_set = Dataset.from_dict(data_set, features=features)

class CustomHuggingFaceRagasEmbeddings(BaseRagasEmbeddings):
    def __init__(self, model_name: str, custome_embedding: dict = None):
        """
        Initialize the Custom Hugging Face Ragas Embeddings with the specified model and custom embeddings.

        Parameters:
            model_name (str): The name of the Hugging Face model to use (e.g., 'distilbert-base-uncased').
            custome_embedding (dict): A dictionary with pre-computed custom embeddings (optional),
                                      where keys are text strings and values are the corresponding embeddings.
        """
        self.model_name = model_name
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
        self.custome_embedding = custome_embedding if custome_embedding else {}  # Store custom embeddings as a dict

    def embed_text(self, text: str) -> np.ndarray:
        """
        Check if the text has an existing custom embedding; if not, compute it using the model.

        Parameters:
            text (str): The text to embed.

        Returns:
            np.ndarray: The embedding for the text.
        """
        if text in self.custome_embedding:
            # Return the custom embedding if it exists
            return np.array(self.custome_embedding[text])

        # Generate a new embedding using the Hugging Face model
        inputs = self.tokenizer(text, return_tensors='pt', padding=True, truncation=True)

        with torch.no_grad():
            outputs = self.model(**inputs)

        # Use the CLS token embedding
        embedding = outputs.last_hidden_state[:, 0, :].numpy()

        # Optionally, save this new embedding to custom embeddings for future use
        self.custome_embedding[text] = embedding

        return embedding

    def embed_documents(self, texts: list) -> np.ndarray:
        """
        Generate embeddings for a list of documents. Check if custom embeddings are available first.

        Parameters:
            texts (list): A list of documents to embed.

        Returns:
            np.ndarray: An array of embeddings for the documents.
        """
        embeddings = []
        for text in texts:
            embedding = self.embed_text(text)
            embeddings.append(embedding)

        return np.array(embeddings)

    def embed_query(self, query: str) -> np.ndarray:
        """
        Generate an embedding for a single query.

        Parameters:
            query (str): The query to embed.

        Returns:
            np.ndarray: The embedding for the query.
        """
        return self.embed_text(query)

# def preprocess_dataset(data_set):
#     prompts = []
#     for question, contexts, answer in zip(data_set["question"], data_set["contexts"], data_set["answer"]):
#         prompt = f"Question: {question}\nContexts: {contexts}\nAnswer: {answer}"
#         prompts.append(prompt)
#     return prompts

def preprocess_dataset(dataset):
    prompts = []
    for question, contexts, answer in zip(dataset["question"], dataset["contexts"], dataset["answer"]):
        prompt = f"Question: {question}\nContexts: {contexts}\nAnswer: {answer}"
        prompts.append(prompt)
    return prompts

prompts = preprocess_dataset(data_set)

# Check and print the type and first part of the prompt
print(f"Type of prompts: {type(prompts)}")  # Should confirm if it's a string or list
# print(f"Content of prompts (first 500 chars): {prompts[:500]}")  # Display a portion of it

class CustomRagasLLM(BaseRagasLLM):
    def __init__(self, api_key: str = None):
        """
        Initialize the custom LLM, optionally using an API key if necessary.
        """
        self.api_key = api_key

    async def _call(self, prompt: str) -> str:
        """
        Process the prompt and return a result. This can be customized to
        use a local model or perform any required logic.
        """
        # Debugging: Print the input prompt
        print("DEBUG: Input prompt:", prompt)

        if not self.api_key:
            result = f"Processed: {prompt} (without API key)"
        else:
            # Handle LLM response if using an API
            result = f"Processed: {prompt} (with API key: {self.api_key})"

        # Debugging: Print the result of `_call`
        print("DEBUG: Result from _call:", result)
        return result

    async def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        # Synchronous generation logic
        text = await self._call(prompt)
        # Debugging: Print the generated text
        print("DEBUG: Generated text in generate_text:", text)
        return LLMResult(generations=[[Generation(text=text)]])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        """
        Asynchronous method to generate text. This should allow for async processing.
        """
        # Simulate an asynchronous call; here we directly call the sync method for now
        text = await self._call(prompt)
        # Debugging: Print the async generated text
        print("DEBUG: Async generated text in agenerate_text:", text)
        return LLMResult(generations=[[Generation(text=text)]])

# Create an instance of the custom LLM
custom_llm = CustomRagasLLM(api_key=None)

# Define an async function to test the LLM
async def test_llm_with_prompts():
    results = []
    for prompt in prompts:
        # Call the custom LLM with each prompt
        result = await custom_llm._call(prompt)
        results.append(result)
        # Debugging: Print each result
        print(f"LLM Output for prompt: {result[:700]}")  # Truncate to avoid overly large output
    return results

if __name__ == "__main__":
    # Ensure proper execution of async calls
    all_results = asyncio.run(test_llm_with_prompts())
    print(f"Total Results: {len(all_results)}")  # Confirm all prompts were processed

    embedding_dict = embedding_dataset

    ragas_embeddings = CustomHuggingFaceRagasEmbeddings(model_name='distilbert-base-uncased', custome_embedding=embedding_dict)
    custom_llm = CustomRagasLLM(api_key=None)

    # Define the evaluation metrics
    metrics = [context_utilization]

    # Run the evaluation
    evaluation_report = evaluate(data_set, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm, is_async=True)  # Pass your custom LLM here

    print("RAGAS Evaluation Report:")
    print(evaluation_report)

@amin-kh96
Author

@dosu
what is this error about?
RuntimeWarning: Mean of empty slice
value = np.nanmean(self.scores[cn])


dosubot bot commented Jan 19, 2025

The "RuntimeWarning: Mean of empty slice" error occurs when attempting to calculate the mean of an empty array. In the context of the RAGAS evaluation framework, this can happen if the scores for a particular context are not being populated correctly.

To resolve this issue, you can use the safe_nanmean function provided in the RAGAS framework, which safely handles empty arrays by returning np.nan when the input list is empty. Here's how you can implement it:

def safe_nanmean(arr: t.List[float]) -> float:
    if len(arr) == 0:
        return np.nan  # or some other value or behavior for empty arrays

    arr_numpy = np.asarray(arr)  # Ensure input is a numpy array

    if np.isnan(arr_numpy).all():
        return np.nan  # or some other value or behavior for all-NaN arrays

    return float(np.nanmean(arr_numpy))

Replace any direct calls to np.mean or np.nanmean with safe_nanmean to avoid the warning when the input list is empty [1].
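A tiny illustration of the difference, assuming safe_nanmean is defined as shown above (the warning itself simply means no scores were produced for that metric):

import numpy as np

scores = []                   # e.g. a metric for which no scores were computed
print(np.nanmean(scores))     # nan, plus "RuntimeWarning: Mean of empty slice"
print(safe_nanmean(scores))   # nan, with no warning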


@amin-kh96
Author

@dosu
Can this error be caused by a wrong format or malformed data being passed to the evaluate function, or by something else?


dosubot bot commented Jan 19, 2025

Yes, the "RuntimeWarning: Mean of empty slice" error can indeed be caused by malformed data or incorrect formatting passed to the evaluate function. Specifically, if the data being evaluated does not contain the expected values or if certain columns (like contexts, question, or answer) are empty or improperly formatted, it can lead to this warning.

To address this, ensure that:

  1. Data Format: The dataset passed to the evaluate function is correctly formatted. The contexts column should be a List[List[str]], meaning each entry should be a list of strings [1].

  2. Data Validation: Before calling evaluate, validate that all necessary columns are populated with the expected data types and values. This includes checking that no lists are empty and that all required fields are present.

  3. Custom Metrics: If you are using custom metrics, ensure they handle cases where data might be missing or incomplete. Implement checks to avoid processing empty lists or arrays.

By ensuring the data is correctly formatted and validated, you can prevent such warnings and ensure the evaluation process runs smoothly.
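As a pre-flight check, here is a minimal sketch that can be run on the plain-dict form of the dataset (i.e. before Dataset.from_dict) to catch the problems listed above:

def validate_ragas_dict(d: dict) -> None:
    n = len(d["question"])
    assert len(d["contexts"]) == n and len(d["answer"]) == n, "column lengths differ"
    for i in range(n):
        assert isinstance(d["question"][i], str) and d["question"][i], f"bad question at row {i}"
        assert isinstance(d["answer"][i], str) and d["answer"][i], f"bad answer at row {i}"
        ctx = d["contexts"][i]
        assert isinstance(ctx, list) and ctx and all(isinstance(c, str) for c in ctx), \
            f"row {i}: contexts must be a non-empty list of strings"

validate_ragas_dict(data_set)  # call on the raw dict, before Dataset.from_dict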


@amin-kh96
Author

@dosu
In my CustomRagasLLM, I noticed that a prompt is being added and I do not know where it comes from.
Does it come from the generate function, the _call function, or somewhere else?
I will share part of it with you.


dosubot bot commented Jan 20, 2025

In your CustomRagasLLM, the prompt is likely being processed in the _call method. This method is responsible for handling the input prompt and returning a result. If you see a prompt being added or modified, it could be happening within this method or any other method that processes the prompt before it reaches _call.

Here's a brief overview of how the prompt might be handled:

  1. _call Method: This method takes the prompt as input and processes it. If you have any logic in _call that modifies the prompt, this is where it would occur.

  2. generate_text and agenerate_text Methods: These methods call _call to process the prompt. If there's any additional logic in these methods that modifies the prompt before passing it to _call, it could also affect the final prompt.

If you want to trace where the prompt is being modified, you can add debugging statements in these methods to print the prompt at different stages of processing. This will help you identify where any changes are being made.


@amin-kh96
Author

@dosu
Can we use the 'context_utilization' metric without having the ground truth in the dataset?


dosubot bot commented Jan 20, 2025

Yes, the 'context_utilization' metric can be used without having the ground truth in the dataset. The ContextUtilization class is derived from LLMContextPrecisionWithoutReference, which requires the columns "user_input", "response", and "retrieved_contexts" but does not require a ground truth or reference context [1].
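A minimal sketch of that: the dataset below has no ground_truth column at all (pass your own llm/embeddings to evaluate if you want to avoid the default OpenAI-backed ones).

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_utilization

ds = Dataset.from_dict({
    "question": ["question1"],
    "contexts": [["chunk one", "chunk two"]],
    "answer": ["answer1"],
})
report = evaluate(ds, metrics=[context_utilization])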


@amin-kh96
Author

@dosu
In RAGAS version 0.1.7 I saw this in PromptValue; could this be the cause of the "failed to parse output" error?

class PromptValue(BasePromptValue):
    prompt_str: str

    def to_messages(self) -> t.List[BaseMessage]:
        """Return prompt as a list of Messages."""
        return [HumanMessage(content=self.to_string())]

    def to_string(self) -> str:
        return self.prompt_str


class Prompt(BaseModel):
    """
    Prompt is a class that represents a prompt for the ragas metrics.

    Attributes:
        name (str): The name of the prompt.
        instruction (str): The instruction for the prompt.
        output_format_instruction (str): The output format instruction for the prompt.
        examples (List[Dict[str, Any]]): List of example inputs and outputs for the prompt.
        input_keys (List[str]): List of input variable names.
        output_key (str): The output variable name.
        output_type (Literal["json", "str"]): The type of the output (default: "json").
        language (str): The language of the prompt (default: "english").
    """

    name: str
    instruction: str
    output_format_instruction: str = ""
    examples: t.List[Example] = []
    input_keys: t.List[str]
    output_key: str
    output_type: t.Literal["json", "str"] = "json"
    language: str = "english"


dosubot bot commented Jan 20, 2025

The "failed to parse the output" error in RAGAS version 0.1.7 could indeed be related to the PromptValue and Prompt classes, particularly if the output does not match the expected format. The Prompt class specifies an output_type attribute, which defaults to "json". This means the output is expected to be in JSON format, and any deviation from this format could lead to parsing errors.

Here are some potential causes related to these classes:

  1. Output Format Mismatch: If the output_type is set to "json", but the actual output is not valid JSON or does not comply with the expected JSON schema, it will result in a parsing error. Ensure that the output generated by the prompt adheres to the specified JSON format.

  2. Incorrect Output Key: The output_key attribute specifies the expected output variable name. If the output does not include this key or if it is incorrectly formatted, it could cause parsing issues.

  3. PromptValue Conversion: The to_messages method in PromptValue converts the prompt to a HumanMessage using to_string(). If there is any issue in how the prompt string is constructed or converted, it might affect the output format.

To resolve this, ensure that the output generated by the prompt matches the expected format specified by the output_type and output_key. Additionally, verify that the prompt string is correctly constructed and converted to messages.


@amin-kh96
Author

@dosu
This is my code. When I run it, I see "failed to parse output". Another thing: some English text also appears in the response that isn't from my dataset; I think CustomRagasLLM or the evaluate function brings it in from somewhere.

This is a part of my code:

class CustomRagasLLM(BaseRagasLLM):
    def __init__(self, api_key: str = None):
        """
        Initialize the custom LLM, optionally using an API key if necessary.
        """
        self.api_key = api_key

    async def _call(self, prompt: str) -> str:
        """
        Process the prompt and return a result. This can be customized to
        use a local model or perform any required logic.
        """
        # Escaping special characters to ensure they're correctly processed
        prompt = self.escape_special_chars(prompt)
        # Debugging: Print the input prompt
        # print("DEBUG: Input prompt:", prompt)

        if not self.api_key:
            result = f"Processed: {prompt} (without API key)"
        else:
            # Handle LLM response if using an API
            result = f"Processed: {prompt} (with API key: {self.api_key})"

        # Debugging: Print the result of `_call`
        print("DEBUG: Result from _call:", result)
        return result

    def escape_special_chars(self, text: str) -> str:
        """
        Helper method to escape special characters in the prompt
        to ensure they are treated as raw text in downstream processing.
        """
        # Escape newlines and backslashes (you can add more if needed)
        return text  # .replace("\\n", "\\\\n").replace("\\", "\\\\")

    async def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        # Synchronous generation logic
        text = await self._call(prompt)
        # Debugging: Print the generated text
        print("DEBUG: Generated text in generate_text:", text)
        return LLMResult(generations=[[Generation(text=text)]])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        """
        Asynchronous method to generate text. This should allow for async processing.
        """
        # Simulate an asynchronous call; here we directly call the sync method for now
        text = await self._call(prompt)
        # Debugging: Print the async generated text
        print("DEBUG: Async generated text in agenerate_text:", text)
        return LLMResult(generations=[[Generation(text=text)]])

# Create an instance of the custom LLM
custom_llm = CustomRagasLLM(api_key=None)

# Define an async function to test the LLM
# async def test_llm_with_prompts():
#     results = []
#     for prompt in prompts:
#         # Call the custom LLM with each prompt
#         result = await custom_llm.generate_text(prompt)
#         results.append(result)
#         # Debugging: Print each result
#         print(f"LLM Output for prompt: {result[:900]}")  # Truncate to avoid overly large output
#     return results

async def test_llm_with_prompts():
    results = []
    for prompt in prompts:
        # Call the custom LLM with each prompt
        llm_result = await custom_llm.generate_text(prompt=prompt)

        # Extract the text from LLMResult
        generated_text = llm_result.generations[0][0].text  # Access the first generation

        results.append(generated_text)

        # Debugging: Print each result (truncate to avoid overly large output)
        print(f"LLM Output for prompt: {generated_text[:900]}")  # Truncate for display
    return results

if __name__ == "__main__":
    # Ensure proper execution of async calls
    all_results = asyncio.run(test_llm_with_prompts())
    print(f"Total Results: {len(all_results)}")  # Confirm all prompts were processed

    embedding_dict = embedding_dataset

    ragas_embeddings = CustomHuggingFaceRagasEmbeddings(model_name='distilbert-base-uncased', custome_embedding=embedding_dict)
    # custom_llm = CustomRagasLLM(api_key=None)

    # Define the evaluation metrics
    metrics = [context_utilization]

    # Run the evaluation
    evaluation_report = evaluate(data_set, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm, is_async=True)  # Pass your custom LLM here

    print("RAGAS Evaluation Report:")
    print(evaluation_report)


dosubot bot commented Jan 20, 2025

The "failed to parse the output" error you're encountering is likely due to the RagasOutputParserException, which occurs when the output parser cannot parse the output, even after retries. The presence of unexpected English text in the response might be related to the output not being in the expected format, causing the parser to fail.

Here are some steps to help resolve this issue:

  1. Check Output Format: Ensure that the output generated by your CustomRagasLLM matches the expected format. If the Prompt class specifies an output_type of "json", the output should be valid JSON. Verify that the output adheres to the expected schema and includes all necessary keys.

  2. Debug Prompt Processing: Add debugging statements in your _call, generate_text, and agenerate_text methods to trace how the prompt is being processed and what output is being generated. This can help identify where the unexpected text might be introduced.

  3. Review PromptValue Usage: Ensure that the PromptValue is correctly constructed and converted to a string using the to_string() method. Any issues in this conversion could affect the output format.

  4. Escape Special Characters: Your escape_special_chars method is intended to handle special characters, but it currently returns the text unchanged. Consider implementing the escaping logic to ensure special characters are correctly processed.

  5. Verify Input Data: Double-check that the input data (questions, contexts, answers) is correctly formatted and does not contain unexpected or malformed entries that could affect the output generation.

By ensuring the output format is correct and debugging the prompt processing, you should be able to resolve the parsing error and eliminate the unexpected English text in the response [1].
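For point 1 above, a hedged sketch of what returning parseable output from _call could look like (the "response" key is purely illustrative; each ragas metric defines its own expected schema through its prompt's output_format_instruction, so a generic wrapper like this is not guaranteed to satisfy context_utilization, it only shows how to emit valid JSON):

import json

class CustomRagasLLM(BaseRagasLLM):
    # ... __init__, generate_text and agenerate_text as in your snippet ...

    async def _call(self, prompt: str) -> str:
        # Return a JSON string instead of free text so the output parser can read it
        return json.dumps({"response": str(prompt)})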



dosubot bot commented Jan 20, 2025

To check if the output of your CustomRagasLLM class is a valid JSON, you can implement a utility function that attempts to parse the output using Python's json module. Here's how you can do it:

import json

def is_json(myjson):
    try:
        json_object = json.loads(myjson)
    except ValueError as e:
        return False
    return True

# Example usage within your async function
async def test_llm_with_prompts():
    results = []
    for prompt in prompts:
        # Call the custom LLM with each prompt
        llm_result = await custom_llm.generate_text(prompt=prompt)
        
        # Extract the text from LLMResult
        generated_text = llm_result.generations[0][0].text  # Access the first generation
        
        # Check if the generated text is a valid JSON
        if is_json(generated_text):
            print("The output is a valid JSON.")
        else:
            print("The output is not a valid JSON.")
        
        results.append(generated_text)
        
        # Debugging: Print each result (truncate to avoid overly large output)
        print(f"LLM Output for prompt: {generated_text[:900]}")  # Truncate for display
    return results

This function, is_json, tries to parse the string as JSON. If it succeeds, the function returns True, indicating that the string is valid JSON. If it fails, it catches the ValueError and returns False, indicating that the string is not valid JSON. You can use this function to check each output from your LLM and print a message accordingly.
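
For example, reusing is_json from the snippet above (the JSON keys shown are only illustrative; the exact keys the metric expects come from its own prompt):

print(is_json('{"reason": "context was used", "verdict": 1}'))  # True  -> parseable
print(is_json("Processed: Question: ... (without API key)"))    # False -> will trip the RAGAS parser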


@amin-kh96
Author

@dosu
Have you seen my code? I think the output of the custom LLM doesn't pass the function you gave me.

dosubot bot commented Jan 20, 2025

If the output of your custom LLM doesn't pass the is_json function, it means the output is not valid JSON. Here are some steps you can take to troubleshoot and resolve this issue:

  1. Inspect the Output: Print the output from your LLM to see what it looks like. This can help you identify any obvious formatting issues or unexpected content.

  2. Ensure JSON Format: If your output is supposed to be JSON, make sure it is correctly formatted. This includes ensuring that:

    • Strings are enclosed in double quotes.
    • There are no trailing commas.
    • The JSON structure is complete and properly nested.
  3. Modify the _call Method: If your _call method is generating the output, ensure that it constructs a valid JSON string. You might need to use json.dumps() to convert a Python dictionary to a JSON string.

  4. Debugging: Add debugging statements in your _call, generate_text, and agenerate_text methods to trace how the output is being constructed. This can help you pinpoint where the formatting might be going wrong.

  5. Example Correction: If your output is a simple string, wrap it in a JSON structure. For example:

    result = json.dumps({"response": f"Processed: {prompt} (without API key)"})

By following these steps, you should be able to ensure that the output from your custom LLM is valid JSON and passes the is_json function.


@amin-kh96
Author

@jjmachan
@antono @peterkaminski @baswenneker
In Ragas version 0.1.7, I am using the ContextUtilization metric, but I received the error "failed to parse the output".
When I print the output, I notice that some prompts inside the metric are added to my original prompt.
Can this cause the error that I see?
My prompt is in Italian, and I am using CustomHuggingFaceRagasEmbeddings and CustomRagasLLM.
Failed to parse output. Returning None.

I will share my code with you here. Please help me.

import json
import os
from pathlib import Path
import numpy as np
from datasets import Dataset
from datasets.features import Features, Sequence, Value
import typing as t
import torch
import asyncio
from langchain_core.outputs import LLMResult, Generation
from transformers import AutoModel, AutoTokenizer

from ragas.llms.prompt import PromptValue
from ragas.embeddings import BaseRagasEmbeddings
from ragas.metrics import context_utilization
from ragas.llms import BaseRagasLLM
from ragas import evaluate

data_set = {
    'question': ['Quali sono gli errori del macchinario futura, riguardanti la tensione a 3V?'],
    'contexts': [['re del potenziometro sliding non corretto- verificare il potenziometro - ripete']],
    'answer': ['Gli errori riguardanti la tensione a 3V per']
}

embedding_dataset = {
'question':[[-0.019296993, -0.005491396, 0.019954083, -0.015649473, -0.016722273, 0.010982792, -0.015019203, -0.0015924377, -0.031701244, -0.0014709096, 0.034061406, 0.030816184, -0.017285492, -0.0010308939, 0.026109274, -0.010131257, 0.033739567, 0.012840077, 0.024218464, -0.0095680365, 4.7956473e-05, 0.016816143, -0.016172463, -0.0058232932, 0.0016561353, 5.6521065e-05, 0.0062188883, -0.014120732, 0.023923444, -0.013570922, 0.038138047, -0.013047932, -0.0006864245, 0.008984702, -0.002331664, -0.008327612, 0.00091188017, 0.0071341214, 0.008723207, -0.0020986653, 0.041624647, 0.008837191, 0.017540284, 0.002329988, -0.012920537, 0.022890873, 0.008870716, -0.0027172016, -0.020490484, 0.023588194, 0.015622652, 0.0022059453, -0.0040766406, -0.021818073, 0.011170532, 0.0077241613, -0.005880286, 0.00015086253, 0.0011767277, ...]],
'contexts':[[-0.010124174, -0.0067393044, 0.0080547035, -0.030676855, -0.042065702, 0.005971707, -0.01143619, -0.0334632, -0.016406978, -0.024035612, 0.027146582, 0.024103243, -0.020532392, -0.006823842, 0.007953258, 0.006235463, 0.0014768647, 0.0033409097, 0.015892992, -0.018760493, 0.013627394, 0.034220655, -0.04428396, -0.020315977, 0.0013576673, -0.022791225, -0.0062794224, -0.01693449, 0.005944655, -0.020261873, 0.023765095, -0.0077368435, -0.02578046, -0.0016307223, -0.0025208988, -0.019220375, -0.0047881864, -0.020667652, 0.012200407, 0.009738684, 0.019666731, 0.010861338, -0.0056910436, -0.012829363, -0.012423585, 0.0051297164, 0.0150814345, 0.010130936, 0.0034254468, 0.048963938, 0.0020458, 0.0032360835, -0.014310455, -0.011794629, 0.035924923, 0.010996597, 0.0022064205, 0.016582815, 0.0060528624, ...]],
'answer' :[[-0.001484744, -0.0084717255, 1.1016621e-06, -0.018805072, -0.037394304, -0.0007946458, -0.029138418, -0.025050946, -0.0043201754, -0.001330452, 0.03151266, 0.022069653, -0.013301148, -0.009092266, -0.0070013143, -0.002585023, 0.022582274, 0.017604461, 0.020788103, -0.011648622, 0.009402536, 0.03100004, -0.01644432, -0.019830313, -0.0032156808, -0.019398632, -0.0063774018, -0.025981756, -0.00401665, -0.009335086, 0.019115342, -0.0036153223, -0.022757644, 0.014393839, -0.0017435836, -0.014596189, -0.0015690567, -0.004063865, 0.004654053, -0.0026103167, 0.04799745, 0.013962159, -8.678924e-05, -0.0011651994, -0.009328341, 0.03412972, 0.00834357, 0.019816823, -0.024174096, 0.04902269, 0.004667543, -0.0010530638, -0.0014029609, -0.01663318, 0.026966527, 0.00045570935, 0.0071699396, 0.009348576, -0.013415813, ...]]

}

features = Features({
    "contexts": Sequence(Value("string")),  # Explicitly define as Sequence[string]
    "question": Value("string"),
    "answer": Value("string")
})

# Create a Dataset using the Hugging Face datasets library

data_set = Dataset.from_dict(data_set, features=features)

class CustomHuggingFaceRagasEmbeddings(BaseRagasEmbeddings):
    def __init__(self, model_name: str, custome_embedding: dict = None):
        """
        Initialize the Custom Hugging Face Ragas Embeddings with the specified model and custom embeddings.

        Parameters:
            model_name (str): The name of the Hugging Face model to use (e.g., 'distilbert-base-uncased').
            custome_embedding (dict): A dictionary with pre-computed custom embeddings (optional),
                                      where keys are text strings and values are the corresponding embeddings.
        """
        self.model_name = model_name
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
        self.custome_embedding = custome_embedding if custome_embedding else {}  # Store custom embeddings as a dict

    def embed_text(self, text: str) -> np.ndarray:
        """
        Check if the text has an existing custom embedding; if not, compute it using the model.

        Parameters:
            text (str): The text to embed.

        Returns:
            np.ndarray: The embedding for the text.
        """
        if text in self.custome_embedding:
            # Return the custom embedding if it exists
            return np.array(self.custome_embedding[text])

        # Generate a new embedding using the Hugging Face model
        inputs = self.tokenizer(text, return_tensors='pt', padding=True, truncation=True)

        with torch.no_grad():
            outputs = self.model(**inputs)

        # Use the CLS token embedding
        embedding = outputs.last_hidden_state[:, 0, :].numpy()

        # Optionally, save this new embedding to custome_embedding for future use
        self.custome_embedding[text] = embedding

        return embedding

    def embed_documents(self, texts: list) -> np.ndarray:
        """
        Generate embeddings for a list of documents. Check if custom embeddings are available first.

        Parameters:
            texts (list): A list of documents to embed.

        Returns:
            np.ndarray: An array of embeddings for the documents.
        """
        embeddings = []
        for text in texts:
            embedding = self.embed_text(text)
            embeddings.append(embedding)

        return np.array(embeddings)

    def embed_query(self, query: str) -> np.ndarray:
        """
        Generate an embedding for a single query.

        Parameters:
            query (str): The query to embed.

        Returns:
            np.ndarray: The embedding for the query.
        """
        return self.embed_text(query)

#def preprocess_dataset(data_set):
#prompts = []
#for question, contexts, answer in zip(data_set["question"], data_set["contexts"], data_set["answer"]):
# prompt = f"Question: {question}\nContexts: {contexts}\nAnswer: {answer}"
# prompts.append(prompt)
#return prompts

def preprocess_dataset(dataset):
    prompts = []
    for question, contexts, answer in zip(dataset["question"], dataset["contexts"], dataset["answer"]):
        prompt = f"Question: {question}\nContexts: {contexts}\nAnswer: {answer}"
        prompts.append(prompt)
    return prompts

prompts = preprocess_dataset(data_set)

for i, prompt in enumerate(prompts, 1):
    print(f"Prompt {i}:\n{prompt}\n")

# Check and print the type and first part of the prompt

#print(f"Type of prompts: {type(prompts)}") # Should confirm if it's a string or list
#print(f"Content of prompts (first 500 chars): {prompts[:500]}") # Display a portion of it

class CustomRagasLLM(BaseRagasLLM):
    def __init__(self, api_key: str = None):
        """
        Initialize the custom LLM, optionally using an API key if necessary.
        """
        self.api_key = api_key

    async def _call(self, prompt: str) -> str:
        """
        Process the prompt and return a result. This can be customized to
        use a local model or perform any required logic.
        """
        # Escaping special characters to ensure they're correctly processed
        prompt = self.escape_special_chars(prompt)
        # Debugging: Print the input prompt
        #print("DEBUG: Input prompt:", prompt)

        #if not self.api_key:
            #result = f"Processed: {prompt} (without API key)"
        #else:
            # Handle LLM response if using an API
            #result = f"Processed: {prompt} (with API key: {self.api_key})"
        result = {"prompt": prompt, "processed": True}
        # Debugging: Print the result of `_call`
        print("DEBUG: Result from _call:", result)
        return json.dumps(result)

    def escape_special_chars(self, text: str) -> str:
        """
        Helper method to escape special characters in the prompt
        to ensure they are treated as raw text in downstream processing.
        """
        # Escape newlines and backslashes (you can add more if needed)
        return text  #.replace("\\n", "\\\\n").replace("\\", "\\\\")

    async def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        # Synchronous generation logic
        text = await self._call(prompt)
        # Debugging: Print the generated text
        print("DEBUG: Generated text in generate_text:", text)
        return LLMResult(generations=[[Generation(text=text)]])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        """
        Asynchronous method to generate text. This should allow for async processing.
        """
        # Simulate an asynchronous call; here we directly call the sync method for now
        text = await self._call(prompt)
        # Debugging: Print the async generated text
        print("DEBUG: Async generated text in agenerate_text:", text)
        return LLMResult(generations=[[Generation(text=text)]])

# Create an instance of the custom LLM

custom_llm = CustomRagasLLM(api_key=None)

# Define an async function to test the LLM

#async def test_llm_with_prompts():
#results = []
#for prompt in prompts:
# Call the custom LLM with each prompt
#result = await custom_llm.generate_text(prompt)
#results.append(result)
# Debugging: Print each result
#print(f"LLM Output for prompt: {result[:900]}") # Truncate to avoid overly large output
#return results

def is_json(myjson):
    try:
        json_object = json.loads(myjson)
    except ValueError as e:
        return False
    return True

async def test_llm_with_prompts():
    results = []
    for prompt in prompts:
        # Call the custom LLM with each prompt
        llm_result = await custom_llm.generate_text(prompt=prompt)

        # Extract the text from LLMResult
        generated_text = llm_result.generations[0][0].text  # Access the first generation

        results.append(generated_text)

        # Debugging: Print each result (truncate to avoid overly large output)
        print(f"LLM Output for prompt: {generated_text[:900]}")  # Truncate for display
    return results

if __name__ == "__main__":

    # Ensure proper execution of async calls
    all_results = asyncio.run(test_llm_with_prompts())
    print(f"Total Results: {len(all_results)}")  # Confirm all prompts were processed

    embedding_dict = embedding_dataset

    ragas_embeddings = CustomHuggingFaceRagasEmbeddings(model_name='distilbert-base-uncased', custome_embedding=embedding_dict)
    #custom_llm = CustomRagasLLM(api_key=None)

    # Define the evaluation metrics
    metrics = [context_utilization]

    # Run the evaluation
    evaluation_report = evaluate(data_set, metrics=metrics, embeddings=ragas_embeddings, llm=custom_llm, is_async=True)  # Pass your custom LLM here

    print("RAGAS Evaluation Report:")
    print(evaluation_report)

@amin-kh96
Author

@dosu
My prompt is being modified by the ContextUtilization metric; some examples are added to my prompt. Can this cause the "failed to parse the output" error? How can I fix that?

dosubot bot commented Jan 21, 2025

Yes, the modification of your prompt by the ContextUtilization metric, where examples are added to your prompt, can potentially cause the "failed to parse the output" error. This is because the output needs to be in a specific JSON format that complies with the schema specified by the output_model. If the added examples or the modified prompt result in an output that does not match this expected format, the RagasOutputParser will raise a RagasOutputParserException.

To fix this issue, you can use the FixOutputFormat class, which is designed to handle cases where the output string does not satisfy the constraints given in the prompt. It attempts to fix the output string and return it. Here is how you can use it:

class FixOutputFormat(PydanticPrompt[OutputStringAndPrompt, StringIO]):
    instruction = "The output string did not satisfy the constraints given in the prompt. Fix the output string and return it."
    input_model = OutputStringAndPrompt
    output_model = StringIO

fix_output_format_prompt = FixOutputFormat()

# Usage in RagasOutputParser
class RagasOutputParser(PydanticOutputParser[OutputModel]):
    async def parse_output_string(
        self,
        output_string: str,
        prompt_value: PromptValue,
        llm: BaseRagasLLM,
        callbacks: Callbacks,
        retries_left: int = 1,
    ):
        callbacks = callbacks or []
        try:
            jsonstr = extract_json(output_string)
            result = super().parse(jsonstr)
        except OutputParserException:
            if retries_left != 0:
                retry_rm, retry_cb = new_group(
                    name="fix_output_format",
                    inputs={"output_string": output_string},
                    callbacks=callbacks,
                )
                fixed_output_string = await fix_output_format_prompt.generate(
                    llm=llm,
                    data=OutputStringAndPrompt(
                        output_string=output_string,
                        prompt_value=prompt_value.to_string(),
                    ),
                    callbacks=retry_cb,
                    retries_left=retries_left - 1,
                )
                retry_rm.on_chain_end({"fixed_output_string": fixed_output_string})
                result = fixed_output_string
            else:
                raise RagasOutputParserException()
        return result

This code attempts to fix the output format if it doesn't match the expected JSON schema, retrying the process if necessary [1].


@amin-kh96
Author

@dosu

What if the error I see is the same one you just mentioned (RagasOutputParserException)?
I see this error when I run my code:
Failed to parse output. Returning None.
Evaluating: 100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:08<00:00, 8.45s/it]
c:\Users\Amin\git\ragas-prototype.venv\Lib\site-packages\ragas\evaluation.py:276: RuntimeWarning: Mean of empty slice
value = np.nanmean(self.scores[cn])

dosubot bot commented Jan 21, 2025

The error "Failed to parse output. Returning None." is indeed related to the RagasOutputParserException. This occurs when the output cannot be parsed into the expected format, even after retries. The "RuntimeWarning: Mean of empty slice" suggests that the scores for a particular context are not being populated, likely due to the parsing failure.

To address this:

  1. Ensure Output Format: Verify that the output from your CustomRagasLLM is in the expected JSON format. You can use the is_json function to check if the output is valid JSON.

  2. Use FixOutputFormat: Implement the FixOutputFormat class to handle and correct output format issues. This class can attempt to fix the output string if it doesn't match the expected format [1].

  3. Debug Prompt Modifications: Since the ContextUtilization metric modifies the prompt by adding examples, ensure that these modifications do not interfere with the expected output format. You might need to adjust the examples or the prompt structure to align with the expected JSON schema.

By ensuring the output is correctly formatted and using FixOutputFormat to handle any discrepancies, you should be able to resolve the parsing error and avoid the "Mean of empty slice" warning.


@amin-kh96
Author

@dosu
Should the output of CustomRagasLLM be in JSON format?
What do you think about the input?

dosubot bot commented Jan 21, 2025

The output of CustomRagasLLM should indeed be in JSON format. This is specified by the _generate_output_signature method in the PydanticPrompt class, which requires the output to comply with a JSON schema. Additionally, the ContextPrecisionPrompt class, a subclass of PydanticPrompt, explicitly instructs to provide the output in JSON format [1][2].

Regarding the input, the PydanticPrompt class uses a generic InputModel type, suggesting that the input should be structured according to a Pydantic model. The ContextPrecisionPrompt class specifies the input model as QAC, which includes fields for question, context, and answer. Therefore, the input should be a Pydantic model instance of QAC, structured with these fields [1][2].
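
As a rough illustration only (the real QAC model is defined inside ragas and may use different field names or validation), a Pydantic model with question, context, and answer fields could look like this:

from pydantic import BaseModel

class QACExample(BaseModel):  # illustrative stand-in, not the ragas-internal QAC model
    question: str
    context: str
    answer: str

sample = QACExample(
    question="Quali sono gli errori del macchinario futura, riguardanti la tensione a 3V?",
    context="re del potenziometro sliding non corretto- verificare il potenziometro - ripete",
    answer="Gli errori riguardanti la tensione a 3V per",
)
print(sample.json())  # serializes the structured input as JSON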


@amin-kh96
Author

@dosu
I don't understand. Are you saying the input of CustomRagasLLM should be a prompt, and the output should be JSON?

dosubot bot commented Jan 21, 2025

Yes, the input to your CustomRagasLLM should be a prompt, typically structured as a string that includes the necessary information for processing, such as the question, context, and answer. The output should be in JSON format, as this is the expected format for the RAGAS framework to parse and evaluate the results correctly.

To ensure compatibility with the RAGAS framework, you should:

  1. Input: Provide a structured prompt as input. This can be a string that includes all relevant details needed for processing, formatted according to your specific requirements.

  2. Output: Ensure that the output is a valid JSON string. This means the output should be structured as a JSON object, with keys and values that match the expected schema defined by the framework.

By following these guidelines, you can ensure that your CustomRagasLLM integrates smoothly with the RAGAS framework and avoids parsing errors.


@amin-kh96
Author

@dosu
Give me an example of the input and output of CustomRagasLLM so I can compare it to my own.

dosubot bot commented Jan 21, 2025

Here's an example of how you might structure the input and output for CustomRagasLLM:

Input Example:
The input to CustomRagasLLM is typically a prompt string that includes the necessary information for processing. This could be structured as follows:

prompt = "Question: What are the errors related to 3V tension in the Futura machine?\nContexts: [\"Error E306: Voltage 3.3V below limit - Check output connections - Replace hydraulic board\", \"Error E307: Internal voltage 3.3V above limit - Check output connections - Replace hydraulic board\"]\nAnswer: The errors related to 3V tension in the Futura machine include E306 and E307."

Output Example:
The output from CustomRagasLLM should be a JSON string that matches the expected schema. Here's an example of what the output might look like:

{
  "question": "What are the errors related to 3V tension in the Futura machine?",
  "contexts": [
    "Error E306: Voltage 3.3V below limit - Check output connections - Replace hydraulic board",
    "Error E307: Internal voltage 3.3V above limit - Check output connections - Replace hydraulic board"
  ],
  "answer": "The errors related to 3V tension in the Futura machine include E306 and E307."
}

Ensure that your CustomRagasLLM generates output in this JSON format, with keys and values that align with the expected schema. This will help avoid parsing errors and ensure compatibility with the RAGAS framework.
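
If you build this output in Python, constructing a dict and serializing it with json.dumps keeps quoting and escaping correct (the values below are just the example above, not required content):

import json

payload = {
    "question": "What are the errors related to 3V tension in the Futura machine?",
    "contexts": [
        "Error E306: Voltage 3.3V below limit - Check output connections - Replace hydraulic board",
        "Error E307: Internal voltage 3.3V above limit - Check output connections - Replace hydraulic board",
    ],
    "answer": "The errors related to 3V tension in the Futura machine include E306 and E307.",
}
print(json.dumps(payload, ensure_ascii=False, indent=2))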


@amin-kh96
Author

@dosu
Is this one acceptable as an input for CustomRagasLLM?

Question: Quali sono gli errori del macchinario futura, riguardanti la tensione a 3V?
Contexts: ["re del potenziometro sliding non corretto- verificare il potenziometro - ripete"]
Answer: Gli errori riguardanti la tensione a 3V per

dosubot bot commented Jan 21, 2025

The input you've provided for CustomRagasLLM is structured correctly as a prompt. It includes the necessary components: a question, contexts, and an answer. Here's a breakdown of your input:

  • Question: "Quali sono gli errori del macchinario futura, riguardanti la tensione a 3V?"
  • Contexts: ["re del potenziometro sliding non corretto- verificare il potenziometro - ripete"]
  • Answer: "Gli errori riguardanti la tensione a 3V per"

This format is acceptable as an input for CustomRagasLLM, assuming that the LLM is designed to process such structured prompts. Ensure that the output generated from this input is in a valid JSON format, as expected by the RAGAS framework.


@amin-kh96
Author

@dosu
Can I show you the CustomRagasLLM class so you can check whether its output is valid JSON? I only have the function, not the result.

dosubot bot commented Jan 21, 2025

Certainly! You can share the function from your CustomRagasLLM, and I can help you determine if it is set up to produce a valid JSON output. Please provide the relevant code, and I'll review it for you.


@amin-kh96
Author

@dosu

class CustomRagasLLM(BaseRagasLLM):
    def __init__(self, api_key: str = None):
        """
        Initialize the custom LLM, optionally using an API key if necessary.
        """
        self.api_key = api_key

    async def _call(self, prompt: str) -> str:
        """
        Process the prompt and return a result. This can be customized to
        use a local model or perform any required logic.
        """
        # Escaping special characters to ensure they're correctly processed
        prompt = self.escape_special_chars(prompt)
        # Debugging: Print the input prompt
        #print("DEBUG: Input prompt:", prompt)

        #if not self.api_key:
            #result = f"Processed: {prompt} (without API key)"
        #else:
            # Handle LLM response if using an API
            #result = f"Processed: {prompt} (with API key: {self.api_key})"
        result = {"prompt": prompt, "processed": True}
        # Debugging: Print the result of `_call`
        print("DEBUG: Result from _call:", result)
        return json.dumps(result)

    def escape_special_chars(self, text: str) -> str:
        """
        Helper method to escape special characters in the prompt
        to ensure they are treated as raw text in downstream processing.
        """
        # Escape newlines and backslashes (you can add more if needed)
        return text  #.replace("\\n", "\\\\n").replace("\\", "\\\\")

    async def generate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        # Synchronous generation logic
        text = await self._call(prompt)
        # Debugging: Print the generated text
        print("DEBUG: Generated text in generate_text:", text)
        return LLMResult(generations=[[Generation(text=text)]])

    async def agenerate_text(
        self,
        prompt: PromptValue,
        n: int = 1,
        temperature: float = 1e-8,
        stop: t.Optional[t.List[str]] = None,
        callbacks: t.List = []
    ) -> LLMResult:
        """
        Asynchronous method to generate text. This should allow for async processing.
        """
        # Simulate an asynchronous call; here we directly call the sync method for now
        text = await self._call(prompt)
        # Debugging: Print the async generated text
        print("DEBUG: Async generated text in agenerate_text:", text)
        return LLMResult(generations=[[Generation(text=text)]])

dosubot bot commented Jan 21, 2025

The function you've provided for CustomRagasLLM appears to be set up to produce a JSON output. Here's a breakdown of how it works:

  1. _call Method: This method processes the prompt and returns a result. The result is constructed as a dictionary with keys "prompt" and "processed", and then it is converted to a JSON string using json.dumps(result). This ensures that the output is in JSON format.

  2. escape_special_chars Method: This method is intended to handle special characters in the prompt, although currently, it returns the text unchanged. You might want to implement escaping logic if needed.

  3. generate_text and agenerate_text Methods: These methods call _call to process the prompt and return the result as part of an LLMResult. The text returned by _call is already in JSON format, so these methods should correctly pass a JSON string as the output.

Given this setup, the output from your CustomRagasLLM should be a valid JSON string. If you encounter any issues with the output not being valid JSON, ensure that the result dictionary in _call is correctly structured and that json.dumps is used to convert it to a JSON string.
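
As a quick sanity check (a sketch that assumes the CustomRagasLLM class above is defined in the same script), you can round-trip the output of _call through json.loads before handing the LLM to evaluate():

import asyncio
import json

async def check_call_output():
    llm = CustomRagasLLM(api_key=None)
    out = await llm._call("Question: ...\nContexts: [...]\nAnswer: ...")
    json.loads(out)  # raises ValueError if the output is not valid JSON
    print("Output of _call parses as JSON.")

asyncio.run(check_call_output())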


@amin-kh96
Author

@dosu

But when I run the code, I see these errors in the class I showed you:
Exception has occurred: TypeError
Object of type PromptValue is not JSON serializable
File "C:\Users\Amin\git\ragas-prototype\src\demo.py", line 184, in _call
return json.dumps(result)
^^^^^^^^^^^^^^^^^^
File "C:\Users\Amin\git\ragas-prototype\src\demo.py", line 219, in agenerate_text
text = await self._call(prompt)
TypeError: Object of type PromptValue is not JSON serializable

what should I do?

dosubot bot commented Jan 21, 2025

The error "Object of type PromptValue is not JSON serializable" indicates that the prompt being passed to the _call method is an instance of PromptValue, which cannot be directly serialized to JSON using json.dumps(). To resolve this, you need to convert the PromptValue instance to a string or a dictionary that can be serialized.

Here's how you can modify your _call method to handle this:

  1. Convert PromptValue to String: If PromptValue has a method to convert it to a string (like to_string()), use that method to get a string representation of the prompt.

  2. Convert to Dictionary: If PromptValue can be represented as a dictionary, convert it to a dictionary before serialization.

Here's an example modification:

async def _call(self, prompt: PromptValue) -> str:
    """
    Process the prompt and return a result. This can be customized to
    use a local model or perform any required logic.
    """
    # Convert PromptValue to a string or dictionary
    prompt_str = prompt.to_string()  # Assuming to_string() method exists
    # Alternatively, if it can be converted to a dictionary
    # prompt_dict = prompt.to_dict()  # Assuming to_dict() method exists

    # Create the result dictionary
    result = {"prompt": prompt_str, "processed": True}

    # Debugging: Print the result of `_call`
    print("DEBUG: Result from _call:", result)

    # Return the result as a JSON string
    return json.dumps(result)

Make sure to replace prompt.to_string() with the appropriate method to convert PromptValue to a string or dictionary, depending on the available methods in your PromptValue class. This will ensure that the prompt is serializable and can be included in the JSON output.


@amin-kh96
Author

@dosu

OK, the code you gave me for to_string() led me to this:
Exception has occurred: AttributeError
'str' object has no attribute 'to_string'
File "C:\Users\Amin\git\ragas-prototype\src\demo.py", line 175, in _call
prompt_str = prompt.to_string()
^^^^^^^^^^^^^^^^
File "C:\Users\Amin\git\ragas-prototype\src\demo.py", line 202, in generate_text
text = await self._call(prompt)
File "C:\Users\Amin\git\ragas-prototype\src\demo.py", line 250, in test_llm_with_prompts
llm_result = await custom_llm.generate_text(prompt=prompt)
File "C:\Users\Amin\git\ragas-prototype\src\demo.py", line 265, in <module>
all_results = asyncio.run(test_llm_with_prompts())
AttributeError: 'str' object has no attribute 'to_string'

dosubot bot commented Jan 21, 2025

The error "AttributeError: 'str' object has no attribute 'to_string'" suggests that the prompt being passed to the _call method is already a string, not an instance of PromptValue. Therefore, you don't need to call to_string() on it. You can directly use the prompt as it is.

Here's how you can modify your _call method:

async def _call(self, prompt: str) -> str:
    """
    Process the prompt and return a result. This can be customized to
    use a local model or perform any required logic.
    """
    # Since prompt is already a string, use it directly
    result = {"prompt": prompt, "processed": True}

    # Debugging: Print the result of `_call`
    print("DEBUG: Result from _call:", result)

    # Return the result as a JSON string
    return json.dumps(result)

This change assumes that the prompt passed to _call is a string, which seems to be the case based on the error message. This should resolve the AttributeError and allow your code to run without issues.


@amin-kh96
Author

@dosu
OK, but I still see this error if I remove the to_string() call.
This is the error:
Exception has occurred: TypeError
Object of type PromptValue is not JSON serializable
File "C:\Users\Amin\git\ragas-prototype\src\demo.py", line 184, in _call
return json.dumps(result)
^^^^^^^^^^^^^^^^^^
File "C:\Users\Amin\git\ragas-prototype\src\demo.py", line 219, in agenerate_text
text = await self._call(prompt)
TypeError: Object of type PromptValue is not JSON serializable

dosubot bot commented Jan 21, 2025

The error "Object of type PromptValue is not JSON serializable" indicates that the prompt being passed to the _call method is an instance of PromptValue, which cannot be directly serialized to JSON using json.dumps(). To resolve this, you need to convert the PromptValue instance to a string or a dictionary that can be serialized.

Here's how you can modify your _call method to handle this:

  1. Check if PromptValue has a method to convert it to a string: If PromptValue has a method like to_string(), use that method to get a string representation of the prompt.

  2. Convert to a Dictionary: If PromptValue can be represented as a dictionary, convert it to a dictionary before serialization.

Here's an example modification:

async def _call(self, prompt: PromptValue) -> str:
    """
    Process the prompt and return a result. This can be customized to
    use a local model or perform any required logic.
    """
    # Convert PromptValue to a string or dictionary
    if isinstance(prompt, PromptValue):
        prompt_str = prompt.to_string()  # Assuming to_string() method exists
    else:
        prompt_str = str(prompt)  # Fallback to converting to string

    # Create the result dictionary
    result = {"prompt": prompt_str, "processed": True}

    # Debugging: Print the result of `_call`
    print("DEBUG: Result from _call:", result)

    # Return the result as a JSON string
    return json.dumps(result)

Make sure to replace prompt.to_string() with the appropriate method to convert PromptValue to a string, depending on the available methods in your PromptValue class. This will ensure that the prompt is serializable and can be included in the JSON output. If PromptValue does not have a to_string() method, you may need to implement one or handle the conversion differently based on its attributes.


@amin-kh96
Author

@dosu
I still see the same error. Forget about the string; let's change to a dictionary, maybe that works.

dosubot bot commented Jan 21, 2025

To address the error "Object of type PromptValue is not JSON serializable," you can convert the PromptValue instance to a dictionary before serializing it. This approach ensures that all attributes of PromptValue are represented in a JSON-serializable format.

Here's how you can modify your _call method to convert PromptValue to a dictionary:

  1. Implement a to_dict Method: If PromptValue does not already have a method to convert it to a dictionary, you may need to implement one. This method should return a dictionary representation of the PromptValue instance.

  2. Use the to_dict Method: Use this method to convert the PromptValue to a dictionary before serializing it with json.dumps().

Here's an example modification:

async def _call(self, prompt: PromptValue) -> str:
    """
    Process the prompt and return a result. This can be customized to
    use a local model or perform any required logic.
    """
    # Convert PromptValue to a dictionary
    if isinstance(prompt, PromptValue):
        prompt_dict = prompt.to_dict()  # Assuming to_dict() method exists
    else:
        prompt_dict = {"prompt": str(prompt)}  # Fallback to a simple dictionary

    # Create the result dictionary
    result = {"prompt": prompt_dict, "processed": True}

    # Debugging: Print the result of `_call`
    print("DEBUG: Result from _call:", result)

    # Return the result as a JSON string
    return json.dumps(result)

Ensure that the PromptValue class has a to_dict() method that returns a dictionary representation of its attributes. If it doesn't, you may need to implement this method based on the attributes of PromptValue. This approach should resolve the serialization error by ensuring that the prompt is in a JSON-serializable format.
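
If you are not sure which conversion methods PromptValue actually exposes, a defensive sketch like the one below avoids hard-coding one of them; the method names tried here are just common guesses, not guaranteed RAGAS API:

def prompt_to_serializable(prompt):
    # Try common conversion methods in order; fall back to str() if none exist.
    for attr in ("to_string", "dict", "model_dump"):
        method = getattr(prompt, attr, None)
        if callable(method):
            try:
                return method()
            except TypeError:
                continue
    return str(prompt)

# usage inside _call: result = {"prompt": prompt_to_serializable(prompt), "processed": True}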


@amin-kh96
Author

@dosu
How can I see the attributes of PromptValue?

dosubot bot commented Jan 21, 2025

To determine the attributes of the PromptValue class, you would typically look at the class definition in the codebase. However, I wasn't able to find specific details about the attributes of PromptValue in the RAGAS codebase from the information available.

If you have access to the source code, you can check the class definition directly. Look for the PromptValue class in the relevant module, and examine its __init__ method or any class-level attributes. This will give you a clear view of what attributes are defined for instances of PromptValue.

If you don't have access to the source code, you might consider using Python's built-in functions to inspect the class at runtime. For example, you can use dir() to list the attributes and methods of an instance:

# Assuming you have an instance of PromptValue
prompt_value_instance = PromptValue(...)
print(dir(prompt_value_instance))

This will list all the attributes and methods available on the prompt_value_instance, including any that are inherited from parent classes.
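
Alternatively, the inspect module can list just the data attributes with their values (reusing prompt_value_instance from the snippet above):

import inspect

# Keep only non-callable, non-dunder members, i.e. the instance's data attributes.
attributes = {
    name: value
    for name, value in inspect.getmembers(prompt_value_instance)
    if not name.startswith("_") and not callable(value)
}
print(attributes)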
