Navigating health information can be overwhelming, especially when it comes to understanding symptoms, treatments, and wellness tips. Health professionals aren't always readily available, and searching the web can often lead to conflicting or confusing advice.
The Health Assistant provides an AI-powered solution, offering users instant responses to health-related queries. This project leverages large language models (LLMs) to provide a conversational experience that simplifies access to reliable health information.
This project was implemented as part of LLM Zoomcamp, showcasing the practical use of Retrieval-Augmented Generation (RAG) techniques for health-related queries.
The Health Assistant is designed to assist users in various health-related areas. Whether it’s understanding symptoms or learning about medical treatments, the assistant provides information in an easy-to-use conversational format.
- Symptom Checker: Users can describe their symptoms, and the assistant suggests possible conditions or treatments, helping them decide whether they need to consult a healthcare provider.
- Medication Information: Provides information about common medications, their uses, and possible side effects.
- Dietary Guidance: Offers personalized nutritional advice based on user preferences or health conditions (e.g., low sugar, high fiber).
- Health Alerts & First Aid Tips: Informs users about urgent health issues and offers quick tips for handling emergencies before professional help arrives.
This project utilizes a subset of the MedQuAD (Medical Question Answering Dataset), a comprehensive collection of medical question-answer pairs.
Key features include:
- Source: Derived from 12 NIH websites, including cancer.gov, niddk.nih.gov, GARD, and MedlinePlus Health Topics.
- Scope: The original dataset contains 47,457 question-answer pairs covering 37 question types related to diseases, drugs, and medical tests.
- Question types: Includes categories such as Treatment, Diagnosis, and Side Effects.
- Sampling method: The dataset was sampled to include only one question per focus area, reducing redundancy and overall volume.
- Purpose: To promote data science applications in healthcare, particularly in the field of medical question answering.
This curated dataset aims to provide a diverse yet focused collection of medical information, suitable for developing and testing healthcare-oriented data science models and applications.
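The "one question per focus area" sampling can be sketched as below. This is a minimal illustration, not the project's preprocessing code: it assumes each record is a dict with a focus_area key (the same field name used later for boosting) and picks one pair per focus area at random with a fixed seed. The actual selection rule is not stated, and the helper name is hypothetical.

```python
import random
from collections import defaultdict


def sample_one_per_focus_area(records: list[dict], seed: int = 42) -> list[dict]:
    """Keep a single question-answer pair for each focus area.

    Hypothetical helper: assumes every record has a 'focus_area' key.
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        groups[record["focus_area"]].append(record)

    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    # Visit focus areas in sorted order so the output order is deterministic
    return [rng.choice(groups[area]) for area in sorted(groups)]


records = [
    {"question": "What are the symptoms of Glaucoma?", "focus_area": "Glaucoma"},
    {"question": "How to diagnose Glaucoma?", "focus_area": "Glaucoma"},
    {"question": "What is (are) Anemia?", "focus_area": "Anemia"},
]
sample = sample_one_per_focus_area(records)
print(len(sample))  # 2
```

Picking at random rather than, say, the first question per area avoids biasing the sample toward whichever question type happens to appear first in each source file.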
"A Question-Entailment Approach to Question Answering". Asma Ben Abacha and Dina Demner-Fushman. BMC Bioinformatics, 2019.
- Python 3.12.1: Core development
- Docker & Docker Compose: Containerization
- Minsearch and Elasticsearch: Full-text and vector search
- OpenAI API: Language model integration
- Streamlit: User Interface
- PostgreSQL: Database management
- Grafana: Monitoring and visualization
- Main application code is in the app folder:
  - app.py: Main entry point
  - rag.py: Core RAG logic
  - ingest.py: Data ingestion for the knowledge base
  - minsearch2.py: In-memory search engine
  - db.py: Request/response logging to PostgreSQL
  - db_prep.py: Database initialization
  - test.py: Random question selector from generated ground-truth data, for testing
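The core RAG logic follows a retrieve, augment, generate loop. A rough sketch of that loop is shown here under stated assumptions: the prompt wording, the build_prompt and rag names, and passing search and llm in as callables are all hypothetical, not taken from rag.py. In the app, llm would presumably wrap an OpenAI API call and search would query the in-memory index.

```python
from typing import Callable

# Hypothetical prompt template; the project's real prompt is not shown here
PROMPT_TEMPLATE = """You are a health assistant. Answer the QUESTION using only the facts in the CONTEXT.

QUESTION: {question}

CONTEXT:
{context}"""


def build_prompt(question: str, documents: list[dict]) -> str:
    """Format retrieved question-answer records into a grounded prompt."""
    context = "\n\n".join(
        f"question: {doc['question']}\nanswer: {doc['answer']}" for doc in documents
    )
    return PROMPT_TEMPLATE.format(question=question, context=context)


def rag(
    question: str,
    search: Callable[[str], list[dict]],
    llm: Callable[[str], str],
) -> str:
    """Retrieve relevant records, build a prompt from them, and generate an answer."""
    documents = search(question)  # retrieval from the knowledge base
    prompt = build_prompt(question, documents)  # augmentation
    return llm(prompt)  # generation


# Usage with stubs: a fixed "knowledge base" and an llm that echoes its prompt
kb = [{"question": "What causes anemia?", "answer": "Often iron deficiency."}]
answer = rag("What causes anemia?", search=lambda q: kb, llm=lambda prompt: prompt)
```

Injecting search and llm as plain callables keeps the flow testable offline: the echo stub above lets you inspect exactly what would be sent to the model.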
- Streamlit serves the application as a UI
- ingest.py handles data ingestion
- An in-memory database (minsearch2.py) is used as the knowledge base
- Ingestion runs at application startup (executed in rag.py)
- Jupyter notebooks in the notebooks folder:
  - starter-notebook: Data exploration and RAG flow testing
  - ground-truth-data.ipynb: Evaluation dataset generation
  - text-search-eval.ipynb: Retrieval evaluation of text search using minsearch and Elasticsearch
  - vector-minsearch-eval.ipynb: Vector search experiments using minsearch
  - vector-eleasticsearch-eval.ipynb: Vector search experiments using Elasticsearch
  - rag-evaluation.ipynb: RAG evaluation using combined vectors
  - rag-evaluation_2.ipynb: RAG evaluation using boosted parameters
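How the boost parameters were tuned is not stated here. One common approach is random search: sample candidate boost weights, score each candidate on the ground-truth questions, and keep the best. The sketch below assumes that approach; random_search_boost and the toy objective are hypothetical, and in practice evaluate would run retrieval for every ground-truth question and return its MRR.

```python
import random
from typing import Callable


def random_search_boost(
    evaluate: Callable[[dict[str, float]], float],
    fields: list[str],
    n_trials: int = 50,
    low: float = 0.0,
    high: float = 3.0,
    seed: int = 0,
) -> tuple[dict[str, float], float]:
    """Try random boost vectors and keep the one with the highest score."""
    rng = random.Random(seed)
    best_boost = {f: 1.0 for f in fields}
    best_score = evaluate(best_boost)  # baseline: all fields weighted equally
    for _ in range(n_trials):
        candidate = {f: rng.uniform(low, high) for f in fields}
        score = evaluate(candidate)
        if score > best_score:
            best_boost, best_score = candidate, score
    return best_boost, best_score


# Toy objective standing in for "MRR on the ground-truth set" (hypothetical)
def toy_objective(boost: dict[str, float]) -> float:
    return -abs(boost["question"] - 2.0) - abs(boost["focus_area"] - 0.3)


best_boost, best_score = random_search_boost(toy_objective, ["question", "focus_area"])
```

Starting from the equal-weight baseline guarantees the search never returns something worse than no boosting on the data it was tuned on; it says nothing about held-out questions, so tuned weights are best checked on a separate split.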
- Text search (without boosting):
  - Hit rate: 91%
  - MRR: 86%
- Text search with boosting:
  - Hit rate: 90% (slightly worse)
  - MRR: 86%
- Hybrid vector search:
  - question_answer_vector: Hit rate 98%, MRR 95%
  - answer_focus_vector: Hit rate 97%, MRR 93%
  - question_answer_focus_vector: Hit rate 97%, MRR 91%
  - question_vector: Hit rate 96%, MRR 93%
  - question_focus_vector: Hit rate 96%, MRR 92%
  - answer_vector: Hit rate 96%, MRR 90%
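Hit rate and MRR (mean reciprocal rank) are the retrieval metrics reported in these results. A minimal sketch of how they are usually computed, assuming the search results for each ground-truth question have been reduced to a list of booleans marking whether each returned document is the one the question was generated from; the function names are hypothetical. MRR is a value between 0 and 1, reported as a percentage in these results.

```python
def hit_rate(relevance: list[list[bool]]) -> float:
    """Fraction of queries whose expected document appears anywhere in the results."""
    return sum(any(row) for row in relevance) / len(relevance)


def mrr(relevance: list[list[bool]]) -> float:
    """Mean reciprocal rank: 1/rank of the first relevant result, 0 if none is found."""
    total = 0.0
    for row in relevance:
        for rank, is_relevant in enumerate(row, start=1):
            if is_relevant:
                total += 1.0 / rank
                break
    return total / len(relevance)


# One row per ground-truth question: is each returned document the expected one?
relevance = [
    [True, False, False],   # found at rank 1 -> 1
    [False, True, False],   # found at rank 2 -> 0.5
    [False, False, False],  # not found -> 0
    [False, False, True],   # found at rank 3 -> 1/3
]
print(hit_rate(relevance))        # 0.75
print(round(mrr(relevance), 3))   # 0.458
```

Hit rate only asks whether the right document was retrieved at all; MRR also rewards ranking it near the top, which is why two configurations can share a hit rate but differ in MRR.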
- Text search (without boosting):
  - Hit rate: 96%
  - MRR: 92%
- Text search with tuned boosting:
  - Hit rate: 97.7%
  - MRR: 93%
  Boosting parameters:
  ```python
  boost = {
      'question': 2.209413642492037,
      'answer': 2.030098462268734,
      'source': 2.6765387031145362,
      'focus_area': 0.26081649788824846
  }
  ```
- Hybrid vector search:
  - question_answer_vector: Hit rate 99%, MRR 96%
  - question_answer_focus_vector: Hit rate 99%, MRR 95%
  - answer_focus_vector: Hit rate 99%, MRR 93%
  - answer_vector: Hit rate 98%, MRR 90%
  - question_vector: Hit rate 97%, MRR 93%
  - question_focus_vector: Hit rate 97%, MRR 93%
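The vector names in these results (question_answer_vector, answer_focus_vector, and so on) suggest that several text fields were concatenated and embedded as a single vector. A sketch of that idea, assuming cosine similarity for ranking; the embed function is a toy hashed bag-of-words stand-in, not the embedding model the project used (the model is not named in this section), and all helper names are hypothetical.

```python
import math
import re
import zlib

DIM = 256  # toy embedding size


def embed(text: str) -> list[float]:
    """Toy hashed bag-of-words embedding, unit-normalized (stand-in for a real model)."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in str hash
        vec[zlib.crc32(token.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def add_combined_vector(doc: dict, fields: tuple[str, ...], name: str) -> None:
    """Embed the concatenation of several fields, e.g. question + answer."""
    doc[name] = embed(" ".join(doc[f] for f in fields))


def vector_search(query: str, docs: list[dict], vector_field: str, k: int = 5) -> list[dict]:
    q = embed(query)
    # Vectors are unit-normalized, so the dot product equals cosine similarity
    scored = [(sum(a * b for a, b in zip(q, doc[vector_field])), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


docs = [
    {"id": 1, "question": "What causes anemia?", "answer": "Often iron deficiency."},
    {"id": 2, "question": "How is glaucoma treated?", "answer": "Eye drops that lower eye pressure."},
]
for doc in docs:
    add_combined_vector(doc, ("question", "answer"), "question_answer_vector")

top = vector_search("glaucoma eye drops", docs, "question_answer_vector", k=1)
```

Embedding question and answer together lets a user query match either the phrasing of a stored question or the content of its answer, which is one plausible reason question_answer_vector scores best here.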
Using the LLM-as-a-Judge metric on a sample, with retrieval based on question_answer_vector:
- gpt-4o-mini:
  - RELEVANT: 88%
  - PARTLY_RELEVANT: 6%
  - NON_RELEVANT: 6%
- gpt-4o:
  - RELEVANT: 86%
  - PARTLY_RELEVANT: 8%
  - NON_RELEVANT: 6%

gpt-4o-mini was chosen for the final implementation.
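The percentages above come from having a judge model label each generated answer. A sketch of the aggregation step, assuming the judge is prompted to reply with JSON containing a Relevance field set to one of the three labels; the key names and the tally_judgements helper are hypothetical, and responses with any other label are skipped rather than counted.

```python
import json
from collections import Counter

LABELS = ("RELEVANT", "PARTLY_RELEVANT", "NON_RELEVANT")


def tally_judgements(raw_outputs: list[str]) -> dict[str, float]:
    """Turn raw judge responses (assumed to be valid JSON) into label shares in percent."""
    counts: Counter[str] = Counter()
    for raw in raw_outputs:
        label = json.loads(raw).get("Relevance", "").strip().upper()
        if label in LABELS:  # ignore unexpected labels
            counts[label] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return {label: 100 * counts[label] / total for label in LABELS}


outputs = [
    '{"Relevance": "RELEVANT", "Explanation": "Directly answers the question."}',
    '{"Relevance": "RELEVANT", "Explanation": "Covers the main treatment."}',
    '{"Relevance": "PARTLY_RELEVANT", "Explanation": "Misses dosage details."}',
    '{"Relevance": "NON_RELEVANT", "Explanation": "Talks about a different condition."}',
]
print(tally_judgements(outputs))
# {'RELEVANT': 50.0, 'PARTLY_RELEVANT': 25.0, 'NON_RELEVANT': 25.0}
```

Asking the judge for a fixed set of labels in JSON keeps parsing trivial; the free-text Explanation field is useful when reviewing individual NON_RELEVANT cases by hand.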