AutoPM3: Enhancing Variant Interpretation via LLM-driven PM3 Evidence Extraction from Scientific Literature

License

Contact: Ruibang Luo, Shumin Li

Email: rbluo@cs.hku.hk, lishumin@connect.hku.hk

Introduction

We introduce AutoPM3, a method for automating the extraction of ACMG/AMP PM3 evidence from scientific literature using open-source LLMs. It combines an optimized retrieval-augmented generation (RAG) system for text comprehension with a TableLLM equipped with Text2SQL for data extraction. We evaluated AutoPM3 on PM3-Bench, a dataset we collected from ClinGen that contains 1,027 variant-publication pairs. AutoPM3 significantly outperformed other methods in variant hit and in trans variant identification, thanks to its four key modules. Additionally, we wrapped AutoPM3 in a user-friendly interface to enhance its accessibility. This study presents a powerful tool to improve rare disease diagnosis workflows by facilitating PM3-relevant evidence extraction from scientific literature.

The AutoPM3 manuscript describing its algorithms and results is available on bioRxiv.

Latest Updates

  • v0.1 (Oct, 2024): Initial release.

Online Demo

  • Check out our online demo: AutoPM3-demo. Because the computing resources behind the demo are limited, we recommend deploying AutoPM3 locally to avoid long queuing times.

Installation

Dependency Installation

conda create -n AutoPM3 python=3.10
conda activate AutoPM3
pip3 install -r requirements.txt

Using Ollama to host LLMs

  1. Download Ollama (see Guidance).
  2. Change the directory of Ollama models:
# change the target folder as you prefer
mkdir -p ollama_models
# use an absolute path so that later `cd "$OLLAMA_MODELS"` works from any directory
export OLLAMA_MODELS="$(pwd)/ollama_models"
# `ollama serve` stays in the foreground; run the following steps in a second terminal with OLLAMA_MODELS set to the same path
ollama serve
  3. Download the sqlcoder-mistral-7B model and the fine-tuned Llama3:
cd "$OLLAMA_MODELS"
# quote the URL so the shell does not interpret `?`, and save directly under the final file name
wget -O sqlcoder-7b-Mistral-7B-Instruct-v0.2-slerp.Q8_0.gguf 'https://huggingface.co/MaziyarPanahi/sqlcoder-7b-Mistral-7B-Instruct-v0.2-slerp-GGUF/resolve/main/sqlcoder-7b-Mistral-7B-Instruct-v0.2-slerp.Q8_0.gguf?download=true'
echo "FROM ./sqlcoder-7b-Mistral-7B-Instruct-v0.2-slerp.Q8_0.gguf" >Modelfile1
ollama create sqlcoder-7b-Mistral-7B-Instruct-v0.2-slerp.Q8_0 -f Modelfile1

wget http://bio8.cs.hku.hk/AutoPM3/llama3_loraFT-8b-f16.gguf
echo "FROM ./llama3_loraFT-8b-f16.gguf" >Modelfile2
ollama create llama3_loraFT-8b-f16 -f Modelfile2
  4. Check the created models:
ollama list
  5. (Optional) Download other models as the backend of the RAG system:
# e.g. download Llama3:70B
ollama pull llama3:70B
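
The check in step 4 can also be scripted. The following is a minimal sketch, not part of AutoPM3: the helper names (`installed_models`, `missing_models`, `ollama_list_output`) are hypothetical, and it assumes that `ollama list` prints a header row followed by one row per model, with the model name (optionally suffixed by a tag such as `:latest`) in the first column.

```python
import subprocess

# Model names created in step 3 of this README.
REQUIRED_MODELS = [
    "sqlcoder-7b-Mistral-7B-Instruct-v0.2-slerp.Q8_0",
    "llama3_loraFT-8b-f16",
]


def installed_models(list_output):
    """Collect lower-cased model names from the text printed by `ollama list`.

    Assumption: the first line is a header and the first column is the name.
    """
    names = set()
    for line in list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if not fields:
            continue
        name = fields[0].lower()
        names.add(name)
        # A model created without an explicit tag is listed as "<name>:latest".
        if name.endswith(":latest"):
            names.add(name[: -len(":latest")])
    return names


def missing_models(list_output, required=REQUIRED_MODELS):
    """Return the required models that do not appear in the listing."""
    installed = installed_models(list_output)
    return [model for model in required if model.lower() not in installed]


def ollama_list_output():
    """Run `ollama list` and return its stdout (requires a running Ollama)."""
    result = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

With Ollama running, `missing_models(ollama_list_output())` returns an empty list when both models from step 3 are available. Names are compared case-insensitively as a precaution, which is a design choice of this sketch rather than documented Ollama behavior.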

Usage

Quick start

  • Step 1. Launch the local web-server:
streamlit run lit.py
  • Step 2. Open http://localhost:8501 in your browser to start using AutoPM3.

Advanced usage of the python script

  • Check the help of AutoPM3_main.py
python AutoPM3_main.py -h
  • An example of running the Python script:
# --query_variant: the query variant in HGVS format
# --paper_path: the path to the paper
# --model_name_text: the backend of the RAG system; change to llama3:70b or another hosted model as you prefer (the model must be pulled in Ollama in advance)
python AutoPM3_main.py \
  --query_variant "NM_004004.5:c.516G>C" \
  --paper_path ./xml_papers/20201936.xml \
  --model_name_text llama3_loraFT-8b-f16
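
To process several variant-publication pairs in one go, the command above can be wrapped in a small driver. This is a minimal sketch under stated assumptions: `build_command` and `run_batch` are hypothetical helpers that are not shipped with AutoPM3, and they use only the three flags shown in the example above. The script is assumed to be run from the repository root.

```python
import subprocess

DEFAULT_MODEL = "llama3_loraFT-8b-f16"


def build_command(variant, paper_path, model=DEFAULT_MODEL):
    """Build the argument list for one AutoPM3_main.py run.

    Passing a list (rather than a shell string) avoids quoting problems
    with the `>` character in HGVS names such as "c.516G>C".
    """
    return [
        "python",
        "AutoPM3_main.py",
        "--query_variant", variant,
        "--paper_path", paper_path,
        "--model_name_text", model,
    ]


def run_batch(pairs, model=DEFAULT_MODEL):
    """Run AutoPM3_main.py once per (variant, paper_path) pair, in order.

    Stops at the first failing run because check=True raises
    subprocess.CalledProcessError on a non-zero exit code.
    """
    for variant, paper_path in pairs:
        subprocess.run(build_command(variant, paper_path, model), check=True)
```

For example, `run_batch([("NM_004004.5:c.516G>C", "./xml_papers/20201936.xml")])` reproduces the single run shown above; more pairs can be added to the list.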

PM3-Bench

TODO

  • A fast setup for AutoPM3.