Refer to this Notion link for detailed documentation drafts.
This project is a Retrieval-Augmented Generation (RAG) model specifically designed to educate users on gynecological topics through a conversational chatbot. The chatbot leverages a local Large Language Model (LLM) and is optimized to run on CPU, making it suitable for devices without a GPU. The pipeline integrates various tools and techniques to ensure accuracy, contextual memory, and efficient resource usage.
To run the project with Docker, follow these steps:

1.1 Build the Docker Image

Open your CLI and execute the following command:

```bash
docker build -t medbot-image .
```

You can replace `medbot-image` with any other tag you prefer.

1.2 Run the Docker Container

To start the container, run:

```bash
docker run -p 8501:8501 medbot-image  # For Streamlit on port 8501
```

Note: To run the backend FastAPI app (main.py) instead, expose port 8000:

```bash
docker run -p 8000:8000 medbot-image
```
Since this project depends on an Ollama-served LLM loaded locally on the CPU, Docker might encounter issues during inference. To avoid these errors, it is recommended to run the project locally without Docker.
2.1 Update Paths: Locate any relative paths in the project code, uncomment them, and replace them with absolute paths as necessary.
2.2 Run the Streamlit App:

- Open a Bash terminal.
- Execute the following command:

```bash
streamlit run st_app.py
```

This will launch the Streamlit front end locally.
Contents:

- Project Overview
- Pipeline Overview
- Model Selection
- Dataset Preparation
- Text Chunking Strategy
- Vector Database Choice
- Retriever and Conversational Chain
- Evaluation
- Future Enhancements
This project focuses on developing an AI chatbot for gynecological education, aiming to answer common questions related to gynecology. The system is designed to be modular and scalable, with a retrieval-augmented generation approach that leverages LangChain for chaining responses and Pinecone as the vector database for efficient information retrieval.
The development pipeline consists of the following steps:
- Model Selection: Choosing a suitable medical LLM for CPU-only environments.
- Data Preparation: Gathering and processing relevant gynecological information from various sources.
- Text Chunking: Segmenting the data into manageable chunks for better model comprehension.
- Vector Database Selection: Storing embeddings and efficiently retrieving relevant chunks.
- Retriever and Conversational Chain Setup: Implementing contextual memory for seamless user interactions.
- Evaluation: Ensuring model accuracy using a mix of ground truth checks, LLM scoring, and retriever accuracy tests.
Since the system runs entirely locally on a CPU, model selection was a critical step. Initially, I explored different medical LLMs by examining the medical leaderboard on Hugging Face. Options included:
- Open-source medical LLMs with fine-tuning on specialized datasets.
- Generic open-source LLMs capable of handling medical knowledge.
- Commercial API models (not suitable for our local setup).
After comparing performance, I chose MedLLaMA, ranked 8th among open-source medical LLMs, due to its fine-tuning on medical data. However, as MedLLaMA requires a GPU, I adapted my setup by using Ollama to run a locally available LLaMA v3.2 (7B parameters) model, balancing performance with hardware limitations.
MedLLaMA was the initial choice because its fine-tuning on medical datasets is crucial for providing accurate gynecological information. Ultimately, LLaMA v3.2 was used in development to fit the hardware constraints and avoid overburdening the system.
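As a rough illustration, the model can be loaded through LangChain's Ollama wrapper. This is a minimal sketch, assuming Ollama is installed and the model has already been pulled locally; the model tag and temperature are illustrative, not the project's actual configuration.

```python
# Minimal sketch: loading a local Ollama model via LangChain.
# Assumes `ollama pull llama3.2` has been run beforehand; the tag
# and temperature below are illustrative assumptions.
from langchain_community.llms import Ollama

llm = Ollama(model="llama3.2", temperature=0.1)

print(llm.invoke("In one sentence, what is cervical screening?"))
```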
For version 1 of the chatbot, I prioritized data that would be comprehensive and authoritative:
- Gynecology-focused PDFs – These included textbooks and medical case studies.
- QA Dataset – Aggregated from various medical Q&A websites.
- Web-Scraped Articles – Supplementary information from reputable sources.
Starting with the PDF-based dataset ensured a solid foundation of structured, authoritative knowledge. Later versions can incorporate more diverse datasets to improve answer specificity.
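For illustration, PDFs can be loaded into LangChain documents with a loader such as PyPDFLoader; the file path below is a placeholder, and the actual project may use a different loader.

```python
# Hypothetical sketch of loading a gynecology PDF into LangChain documents.
# The file path is a placeholder, not a real project asset.
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("data/gynecology_textbook.pdf")
pages = loader.load()  # one Document per page, with page-number metadata
print(f"Loaded {len(pages)} pages")
```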
For efficient processing, I tested three chunking methods:
- Page-Based Split – Dividing content per page.
- Recursive Text Splitter – Ideal for books, splitting text based on natural language boundaries.
- Semantic Splitter – Used to break content into coherent, meaning-based sections.
After testing, the recursive text splitter proved optimal for PDF-based content, achieving a good balance between speed and relevance. The semantic splitter, although conceptually ideal, took over 30 minutes to process even 100 pages of an 800-page document, making it infeasible for this setup.
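Below is a sketch of the recursive approach using LangChain's RecursiveCharacterTextSplitter; the chunk size and overlap are assumed values, not the project's tuned settings.

```python
# Sketch of recursive chunking; sizes are illustrative assumptions.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # max characters per chunk (assumed)
    chunk_overlap=150,  # overlap to preserve context across boundaries (assumed)
)
chunks = splitter.split_documents(pages)  # `pages` from the PDF-loading sketch above
print(f"Produced {len(chunks)} chunks")
```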
Two vector databases were evaluated:

- ChromaDB – An open-source vector database with limitations in monitoring, storage, and high-dimensionality handling.
- Pinecone – A cloud-based solution offering efficient indexing, monitoring, and scalability.
I selected Pinecone for its robustness in handling large datasets and its monitoring capabilities, which streamline development and enhance performance. Pinecone also supports high-dimensionality vectors, which are essential for accurate retrieval in RAG.
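As a sketch of how the chunks might be indexed in Pinecone through LangChain (the index name, embedding model, and API-key handling are assumptions for illustration only):

```python
# Sketch: embedding chunks and indexing them in Pinecone via LangChain.
# Index name and embedding model are hypothetical choices.
import os
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore

os.environ["PINECONE_API_KEY"] = "<your-api-key>"  # placeholder

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = PineconeVectorStore.from_documents(
    documents=chunks,           # `chunks` from the splitting sketch above
    embedding=embeddings,
    index_name="medbot-index",  # hypothetical index name
)
```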
The system incorporates a Retriever Chain using LangChain, which enables conversation flow and contextual memory management. This setup allows the chatbot to recall previous interactions, enhancing the overall user experience.
Using LangChain's retriever chain allows seamless integration of tools and makes it easier to maintain a modular approach. The contextual memory feature is crucial for maintaining coherence across interactions, as users may ask follow-up questions.
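A minimal sketch of such a chain, assuming the LLM and vector store from the earlier sketches (the retrieval depth and memory settings are illustrative):

```python
# Sketch: conversational retrieval chain with contextual memory.
# Retrieval depth (k) and memory configuration are assumptions.
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

chain = ConversationalRetrievalChain.from_llm(
    llm=llm,  # the Ollama model from the model-selection sketch
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),  # top-4 chunks (assumed)
    memory=memory,
)

result = chain.invoke({"question": "What are common symptoms of endometriosis?"})
print(result["answer"])
```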
Evaluation is essential to gauge the chatbot's accuracy and reliability. I used three primary methods:
- QA Ground Truth Comparison – Comparing model responses with established QA pairs.
- LLM-Based Scoring – Using LLMs to rate response relevance and coherence.
- Retriever Accuracy Check – Ensuring the retriever fetches relevant information.
The evaluation framework was developed based on guidelines from Hugging Face’s Cookbook on RAG Evaluation.
These evaluation methods ensure the chatbot's responses are accurate, relevant, and contextually appropriate. Ground truth QA checks establish a baseline, while LLM scoring and retriever accuracy enhance precision.
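As an illustrative sketch of the LLM-based scoring step, reusing the local model as a judge (the QA pair, prompt, and 1–5 scale are assumptions, not the project's actual rubric):

```python
# Sketch: LLM-as-judge scoring of a generated answer against ground truth.
# The QA pair and rating prompt are illustrative assumptions.
qa_pair = {
    "question": "What is endometriosis?",
    "ground_truth": "A condition where tissue similar to the uterine lining "
                    "grows outside the uterus, often causing pain.",
}

generated = chain.invoke({"question": qa_pair["question"]})["answer"]

judge_prompt = (
    "Rate from 1 to 5 how well the candidate answer matches the reference.\n"
    f"Question: {qa_pair['question']}\n"
    f"Reference: {qa_pair['ground_truth']}\n"
    f"Candidate: {generated}\n"
    "Reply with the number only."
)
score = llm.invoke(judge_prompt)  # reuses the local Ollama model as the judge
print(f"Relevance score: {score.strip()}")
```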
This project is currently in its initial version, focusing on foundational setup and data accuracy. Planned improvements include:
- Expanding Dataset – Incorporating web-scraped articles and QA datasets for broader coverage.
- Real-Time Feedback Integration – Allowing users to rate responses to improve future interactions.
- Enhanced Chunking and Retrieval – Exploring faster and more efficient chunking methods for larger datasets.
This RAG-based Gynecological Education Chatbot was developed with a focus on adaptability and efficiency, considering CPU limitations and resource constraints. The project aims to provide accessible, accurate, and relevant medical information through a local setup, making it suitable for educational purposes in environments with limited GPU access.