Optimizing Sukoon's Evaluation Pipeline with Deep Learning Models #8

luv-singh-ai · 2024-09-27T07:51:37Z

luv-singh-ai
Sep 27, 2024
Maintainer

Hi Devs,

We are working on enhancing the evaluation pipeline for Sukoon, our mental health chatbot. Currently, we use an analysis script that makes API calls with a specific prompt to evaluate conversations and generate JSON-formatted analyses. We would like to explore deep learning models for this task.

Objectives:

Automate and Scale: Reduce reliance on external API calls to improve scalability.
Enhance Analysis: Utilize deep learning models for more nuanced and accurate evaluations.
Seamless Integration: Incorporate these capabilities directly into our pipeline.

Current Approach: Our script analyzes user messages and Sukoon's responses to produce a JSON output that includes:

User Analysis: Primary concern, subject category, emotional tone.
Sukoon's Response Evaluation: Ratings on empathy, relevance, clarity, helpfulness, overall score, strengths, areas for improvement, and suggested follow-up.

We're looking for recommendations on:

Deep Learning Libraries and Models:
Which libraries (e.g., TensorFlow, PyTorch) are best suited for tasks like sentiment analysis, intent recognition, and empathy scoring?
Are there pre-trained models (e.g., BERT, RoBERTa, GPT) that we can fine-tune for our needs?

Implementing Structured Output:
How to map model predictions directly to our required JSON format.
Best practices for generating structured data from model outputs.
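
One way to get the required JSON without asking a model to write JSON is to treat each rating as a five-way classification and assemble the object in ordinary code. Below is a minimal sketch. It assumes hypothetical per-aspect probability vectors, where index 0 means a rating of 1, produced by whichever classifiers are chosen. The field names follow the prompt's schema. The function names `argmax` and `build_analysis` are illustrative only. `overall_rating` is computed as the mean of the four ratings rather than predicted.

```python
import json
from typing import Dict, List

# Rating aspects from the prompt's "sukoon_response_evaluation" schema.
RATING_FIELDS = ["empathy_rating", "relevance_rating", "clarity_rating", "helpfulness_rating"]


def argmax(probs: List[float]) -> int:
    """Index of the largest probability (first one wins on ties)."""
    return max(range(len(probs)), key=lambda i: probs[i])


def build_analysis(
    primary_concern: str,
    subject_category: str,
    emotional_tone: str,
    rating_probs: Dict[str, List[float]],
) -> dict:
    """Assemble the evaluation JSON from (hypothetical) model predictions.

    rating_probs maps each rating field to a 5-element probability vector,
    where index 0 corresponds to a rating of 1 and index 4 to a rating of 5.
    """
    ratings = {field: argmax(rating_probs[field]) + 1 for field in RATING_FIELDS}
    overall = round(sum(ratings.values()) / len(ratings), 2)
    return {
        "user_analysis": {
            "primary_concern": primary_concern,
            "subject_category": subject_category,
            "emotional_tone": emotional_tone,
        },
        "sukoon_response_evaluation": {
            **ratings,
            "overall_rating": overall,
            # Free-text fields need a generative model; left empty in this sketch.
            "strengths": [],
            "areas_for_improvement": [],
            "suggested_follow_up": "",
        },
    }


if __name__ == "__main__":
    probs = {
        "empathy_rating": [0.0, 0.05, 0.15, 0.6, 0.2],
        "relevance_rating": [0.0, 0.0, 0.1, 0.2, 0.7],
        "clarity_rating": [0.05, 0.05, 0.1, 0.5, 0.3],
        "helpfulness_rating": [0.0, 0.1, 0.2, 0.4, 0.3],
    }
    result = build_analysis("depression", "Emotional Support", "despondent", probs)
    print(json.dumps(result, indent=2))
```

The free-text fields (strengths, areas for improvement, suggested follow-up) don't map naturally onto a classifier. A hybrid setup may therefore be worth considering: classifiers produce the ratings, and a generative model is kept only for those fields.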

Evaluation Metrics:
Appropriate metrics to assess model performance in this context.
Methods to validate the accuracy and helpfulness of the analyses.
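
The four ratings sit on an ordinal 1-5 scale, so one option is to compare model ratings against human ratings on a held-out set. Two suitable measures are mean absolute error and quadratic weighted Cohen's kappa. Quadratic weighting penalizes a 1-vs-5 disagreement much more than a 4-vs-5 one. Below is a minimal standard-library sketch. It assumes that paired human and model ratings exist for the same conversations.

```python
from typing import List, Sequence


def mean_absolute_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Average absolute difference between human and model ratings."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def quadratic_weighted_kappa(
    y_true: Sequence[int], y_pred: Sequence[int], min_rating: int = 1, max_rating: int = 5
) -> float:
    """Cohen's kappa with quadratic weights for ordinal ratings.

    1.0 means perfect agreement, 0.0 means chance-level agreement, and
    negative values mean systematic disagreement.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty rating lists of equal length")
    k = max_rating - min_rating + 1
    n = len(y_true)

    # Observed confusion matrix: rows = human rating, columns = model rating.
    observed: List[List[float]] = [[0.0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        if not (min_rating <= t <= max_rating and min_rating <= p <= max_rating):
            raise ValueError(f"rating out of range: {t}, {p}")
        observed[t - min_rating][p - min_rating] += 1

    # Marginal histograms give the agreement expected by chance.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # 0 on the diagonal, 1 for 1-vs-5
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * hist_true[i] * hist_pred[j] / n
    if weighted_expected == 0:
        return 1.0  # both raters gave the same single rating throughout
    return 1.0 - weighted_observed / weighted_expected
```

If a scikit-learn dependency is acceptable, `cohen_kappa_score(y_true, y_pred, weights="quadratic", labels=[1, 2, 3, 4, 5])` should give the same kappa. Passing `labels` matters because, without it, the weights are based on the positions of the observed labels rather than the full 1-5 scale.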

Your Expertise Matters!

Your insights and suggestions will be invaluable in helping us improve Sukoon's evaluation pipeline. Any resources, experiences, or ideas you can share would be greatly appreciated.

Thank you and let's work together!

luv-singh-ai · 2024-09-27T07:52:47Z

luv-singh-ai
Sep 27, 2024
Maintainer Author

Current Approach

We are currently using an analysis script that makes API calls with the following prompt:

You are an expert in mental health communication and chatbot evaluation. Your job is to analyze incoming chat messages from users and the responses from an AI chatbot named 'Sukoon' and rate the responses. For each conversation, provide a detailed analysis in JSON format.

Here's an example of how to analyze a conversation:

User: "I've been feeling really down lately and I don't know why. Nothing seems to make me happy anymore."
Sukoon: "I'm sorry to hear that you've been feeling down. It's not uncommon to experience periods of low mood, and it can be frustrating when you can't pinpoint the reason. Have you noticed any changes in your daily routine or any recent stressful events? Sometimes, talking to a mental health professional can help you explore these feelings and find ways to cope. In the meantime, try to be gentle with yourself and engage in activities that have brought you joy in the past, even if they don't seem appealing right now. Would you like to discuss some simple self-care strategies that might help lift your mood?"

Example analysis:
{
  "user_analysis": {
    "primary_concern": "depression",
    "subject_category": "Emotional Support",
    "emotional_tone": "despondent"
  },
  "sukoon_response_evaluation": {
    "empathy_rating": 4,
    "relevance_rating": 5,
    "clarity_rating": 4,
    "helpfulness_rating": 4,
    "overall_rating": 4.25,
    "strengths": [
      "Acknowledged user's feelings",
      "Normalized the experience",
      "Suggested professional help",
      "Offered to provide self-care strategies"
    ],
    "areas_for_improvement": [
      "Could provide a specific example of a self-care activity",
      "Might ask about the duration of these feelings"
    ],
    "suggested_follow_up": "How long have you been experiencing these feelings of low mood?"
  }
}

Now, analyze the following conversation:

User: {user_message}
Sukoon: {sukoon_response}

Provide your analysis as a JSON object with the following structure:

{
  "user_analysis": {
    "primary_concern": "<main mental health issue or emotional state>",
    "subject_category": "<choose from: General Greeting, Emotional Support, Mental Health Information, Coping Strategies, Crisis Management, Feedback, or Other>",
    "emotional_tone": "<overall emotional tone of the user's message>"
  },
  "sukoon_response_evaluation": {
    "empathy_rating": <rate from 1-5 how well Sukoon demonstrated empathy>,
    "relevance_rating": <rate from 1-5 how relevant Sukoon's response was to the user's concern>,
    "clarity_rating": <rate from 1-5 how clear and easy to understand Sukoon's response was>,
    "helpfulness_rating": <rate from 1-5 how helpful Sukoon's suggestions or information were>,
    "overall_rating": <calculate the average of the above ratings>,
    "strengths": ["<list key strengths of Sukoon's response>"],
    "areas_for_improvement": ["<list areas where Sukoon's response could be improved>"],
    "suggested_follow_up": "<provide a suggestion for how Sukoon could follow up or what question it could ask next>"
  }
}

Ensure your analysis is thorough, impartial, and focuses on how well Sukoon addresses the user's mental health concerns. If the user's message indicates a crisis or severe mental health issue, note this in your analysis and evaluate how appropriately Sukoon handled the situation.
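
The thread doesn't say how the placeholders in this prompt are filled. If it's done in Python, note that `str.format()` would misread the literal JSON braces in the prompt as replacement fields and raise an error. The model's reply is also free text, so checking it against the schema before storing it may help, whether the analyzer stays an API model or is replaced. Below is a minimal sketch. It assumes the prompt is held in a Python string and the reply arrives as a JSON string. `fill_prompt`, `validate_analysis` and `is_rating` are hypothetical helper names, and the 0.01 tolerance on `overall_rating` is an arbitrary choice.

```python
import json
import re
from typing import Any, List

# Categories allowed by the prompt's "subject_category" field.
ALLOWED_CATEGORIES = {
    "General Greeting", "Emotional Support", "Mental Health Information",
    "Coping Strategies", "Crisis Management", "Feedback", "Other",
}
RATING_FIELDS = ("empathy_rating", "relevance_rating", "clarity_rating", "helpfulness_rating")


def fill_prompt(template: str, user_message: str, sukoon_response: str) -> str:
    """Substitute only the {user_message} and {sukoon_response} placeholders.

    A single-pass regex substitution leaves the JSON braces untouched and never
    re-scans inserted text, so a message containing "{sukoon_response}" is safe.
    """
    values = {"user_message": user_message, "sukoon_response": sukoon_response}
    return re.sub(r"\{(user_message|sukoon_response)\}", lambda m: values[m.group(1)], template)


def is_rating(value: Any) -> bool:
    """True for an integer rating from 1 to 5 (bools excluded)."""
    return isinstance(value, int) and not isinstance(value, bool) and 1 <= value <= 5


def validate_analysis(raw: str) -> List[str]:
    """Return a list of problems in the model's JSON reply (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    user = data.get("user_analysis")
    evaluation = data.get("sukoon_response_evaluation")
    if not isinstance(user, dict) or not isinstance(evaluation, dict):
        return ["missing user_analysis or sukoon_response_evaluation object"]

    problems: List[str] = []
    category = user.get("subject_category")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        problems.append(f"unknown subject_category: {category!r}")

    ratings = [evaluation.get(field) for field in RATING_FIELDS]
    for field, value in zip(RATING_FIELDS, ratings):
        if not is_rating(value):
            problems.append(f"{field} must be an integer from 1 to 5, got {value!r}")

    # Per the prompt, overall_rating is the average of the four ratings.
    if all(is_rating(v) for v in ratings):
        expected = sum(ratings) / len(ratings)
        overall = evaluation.get("overall_rating")
        if not isinstance(overall, (int, float)) or abs(overall - expected) > 0.01:
            problems.append(f"overall_rating should be {expected}, got {overall!r}")
    return problems
```

Recomputing `overall_rating` in code, rather than trusting the model's arithmetic, is a cheap safeguard. For the example in the prompt, (4 + 5 + 4 + 4) / 4 = 4.25.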

Example Discussion Points:
Sentiment and Emotional Tone Analysis: Implementing models that can detect subtle emotions in user messages to better understand their state of mind.
Empathy and Relevance Scoring: Developing a scoring system that quantifies how empathetic and relevant Sukoon's responses are.
Contextual Understanding: Ensuring the model can understand the conversation's context over multiple turns.
Anomaly and Crisis Detection: Identifying messages that indicate severe distress or crisis situations promptly.
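
For the Contextual Understanding point, note that encoder models such as BERT and RoBERTa take a single fixed-length input (512 tokens for the base models). Multi-turn context therefore usually has to be packed into one sequence. A simple approach is to keep the most recent turns that fit a budget. Below is a minimal sketch. It assumes conversations are stored as (speaker, text) pairs, and `build_context` is a hypothetical helper. The 256-word default is an arbitrary placeholder, and whitespace-separated words only approximate model tokens.

```python
from typing import List, Tuple

# Each turn is (speaker, text), e.g. ("User", "...") or ("Sukoon", "...").
Turn = Tuple[str, str]


def build_context(turns: List[Turn], max_words: int = 256) -> str:
    """Keep the most recent turns that fit a word budget, returned oldest first.

    The latest turn is always kept, even if it alone exceeds the budget.
    Words are a rough stand-in for tokens; a real pipeline would count
    tokens with the chosen model's tokenizer.
    """
    kept: List[str] = []
    used = 0
    for speaker, text in reversed(turns):
        line = f"{speaker}: {text}"
        cost = len(line.split())
        if kept and used + cost > max_words:
            break
        kept.append(line)
        used += cost
    return "\n".join(reversed(kept))
```

Dropping the oldest turns first assumes recent context matters most. For crisis detection, it might be safer to also carry forward any earlier turn that was already flagged.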

I invite everyone to share their insights, experiences, and suggestions on the following:

Technical Approaches: Any ideas on model architectures, training methods, or pipelines that could work for us.
Resources and References: Papers, tutorials, or repositories that could guide our implementation.
Potential Challenges: Any foreseeable obstacles in this transition and how we might address them.

joel-bansal · 2024-09-29T10:37:58Z

joel-bansal
Sep 29, 2024
Collaborator

I found a relevant Kaggle dataset for empathy judgement: https://www.kaggle.com/datasets/atharvjairath/empathetic-dialogues-facebook-ai/data

But the problem is that this dataset consists of 1-2 line responses, which also seems just… The longer the response, the harder it will be for our model to predict the empathy score. Also, summarising the bot's output would make it lose its context and the patterns in the text, again hampering our task.
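
Summarising isn't the only way to fit long replies to a model trained on short ones. A common alternative, swapped in here rather than proposed in the thread, is to score overlapping windows of the response with the short-text model and then aggregate the scores. Below is a minimal sketch. It assumes a hypothetical `score_window` function that wraps a model trained on short responses such as those in the Kaggle data. The window size, stride and plain averaging are untuned placeholder choices.

```python
from statistics import mean
from typing import Callable, List


def sliding_windows(words: List[str], size: int, stride: int) -> List[List[str]]:
    """Split a word list into overlapping windows of at most `size` words."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(words[start:start + size])
        if start + size >= len(words):  # this window reaches the end of the text
            break
        start += stride
    return windows


def score_long_response(
    response: str,
    score_window: Callable[[str], float],
    size: int = 64,
    stride: int = 32,
) -> float:
    """Score each overlapping window with a short-text model and average the scores."""
    words = response.split()
    chunks = [" ".join(w) for w in sliding_windows(words, size, stride)]
    return mean(score_window(chunk) for chunk in chunks)
```

The overlap means a sentence cut at one window boundary still appears whole in a neighbouring window. Averaging weights every part equally. Taking the minimum would be one option if a single dismissive sentence should pull the score down.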

Ameer1803 · 2024-09-29T20:07:53Z

Ameer1803
Sep 29, 2024
Collaborator

https://www.canva.com/design/DAGSI3wDVfY/VKBQYBWo1_vERGVXqpuxXA/edit?utm_content=DAGSI3wDVfY&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton
Here is my breakdown of the current eval setup and my thoughts on making it better (subject to frequent updates).

1 reply

luv-singh-ai Sep 30, 2024
Maintainer Author

Thanks, Ameer, for sharing this.
