AutoPM3: Enhancing Variant Interpretation via LLM-driven PM3 Evidence Extraction from Scientific Literature
Contact: Ruibang Luo, Shumin Li
Email: rbluo@cs.hku.hk, lishumin@connect.hku.hk
We introduce AutoPM3, a method for automating the extraction of ACMG/AMP PM3 evidence from scientific literature using open-source LLMs. It combines an optimized RAG system for text comprehension with a TableLLM equipped with Text2SQL for table data extraction. We evaluated AutoPM3 on PM3-Bench, a dataset we collected from ClinGen that contains 1,027 variant-publication pairs. AutoPM3 significantly outperformed other methods in variant hit and in trans variant identification, which we attribute to its four key modules. We also wrapped AutoPM3 in a user-friendly interface to enhance its accessibility. This study presents a powerful tool to improve rare disease diagnosis workflows by facilitating PM3-relevant evidence extraction from scientific literature.
AutoPM3's manuscript, which describes its algorithms and results, is available on bioRxiv.
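To make the Text2SQL idea concrete, here is a minimal sketch and not AutoPM3's actual implementation. It assumes a genotype table from a paper has already been parsed into rows, loads it into an in-memory SQLite database, and runs the kind of SQL statement a Text2SQL model might generate. The table layout, the column names, the toy rows, and the hard-coded query are all assumptions made for illustration.

```python
import sqlite3

# Toy rows invented for illustration; they are not taken from any paper.
ROWS = [
    ("P1", "c.516G>C", "c.100A>T"),
    ("P2", "c.200C>G", "c.200C>G"),
    ("P3", "c.300del", "c.516G>C"),
]


def load_table(rows):
    """Load a hypothetical patient/allele table into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE genotypes (patient TEXT, allele1 TEXT, allele2 TEXT)")
    conn.executemany("INSERT INTO genotypes VALUES (?, ?, ?)", rows)
    return conn


# In a Text2SQL setup a model would generate a statement like this from a
# natural-language question; here it is hard-coded (an assumption).
SQL = (
    "SELECT patient, "
    "CASE WHEN allele1 = :v THEN allele2 ELSE allele1 END AS other_allele "
    "FROM genotypes "
    "WHERE (allele1 = :v OR allele2 = :v) AND allele1 <> allele2 "
    "ORDER BY patient"
)


def partner_alleles(conn, variant):
    """Return (patient, other_allele) rows where the query variant is paired with a different allele."""
    return conn.execute(SQL, {"v": variant}).fetchall()
```

Rows returned this way are only candidates: a second allele listed in the same table row does not by itself confirm that the two variants are in trans.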
- v0.1 (Oct, 2024): Initial release.
- Check out our online demo: AutoPM3-demo. Please note that the demo runs on limited computing resources, so we recommend deploying AutoPM3 locally to avoid long queuing times.
conda create -n AutoPM3 python=3.10
conda activate AutoPM3
pip3 install -r requirements.txt
- Download Ollama Guidance
- Change the directory of Ollama models:
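# please change the target folder as you prefer; an absolute path stays valid after `cd` and in new terminals
mkdir -p ollama_models
export OLLAMA_MODELS="$(pwd)/ollama_models"
# `ollama serve` runs in the foreground: run the following steps in a second terminal, with OLLAMA_MODELS exported there as well
ollama serve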
- Download the sqlcoder-mistral-7B model and the fine-tuned Llama3 model:
cd "$OLLAMA_MODELS"
# quote the URL so the shell does not interpret "?", and save the file directly under its final name
wget -O 'sqlcoder-7b-Mistral-7B-Instruct-v0.2-slerp.Q8_0.gguf' 'https://huggingface.co/MaziyarPanahi/sqlcoder-7b-Mistral-7B-Instruct-v0.2-slerp-GGUF/resolve/main/sqlcoder-7b-Mistral-7B-Instruct-v0.2-slerp.Q8_0.gguf?download=true'
echo "FROM ./sqlcoder-7b-Mistral-7B-Instruct-v0.2-slerp.Q8_0.gguf" >Modelfile1
ollama create sqlcoder-7b-Mistral-7B-Instruct-v0.2-slerp.Q8_0 -f Modelfile1
wget http://bio8.cs.hku.hk/AutoPM3/llama3_loraFT-8b-f16.gguf
echo "FROM ./llama3_loraFT-8b-f16.gguf" >Modelfile2
ollama create llama3_loraFT-8b-f16 -f Modelfile2
- Check the created models:
ollama list
- (Optional) Download other models as the backend of the RAG system:
# e.g. download Llama3 70B
ollama pull llama3:70b
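Once the models above are created, a quick programmatic check can confirm that both are registered before launching AutoPM3. The sketch below is not part of AutoPM3. It assumes Ollama is serving on its default address `http://localhost:11434` and reads the model list from Ollama's `/api/tags` endpoint. The helper names `missing_models` and `fetch_tags` are hypothetical.

```python
import json
from urllib.request import urlopen

# Model names created in the installation steps above.
REQUIRED = [
    "sqlcoder-7b-Mistral-7B-Instruct-v0.2-slerp.Q8_0",
    "llama3_loraFT-8b-f16",
]


def missing_models(tags_payload, required=REQUIRED):
    """Return the required model names that are absent from an /api/tags payload."""
    # Ollama reports names with a tag suffix such as ":latest"; compare on the
    # base name, case-insensitively, to be tolerant of name normalization.
    installed = {
        m["name"].split(":")[0].lower() for m in tags_payload.get("models", [])
    }
    return [name for name in required if name.lower() not in installed]


def fetch_tags(base_url="http://localhost:11434"):
    """Fetch the list of local models from a running Ollama server (assumed default address)."""
    with urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

With `ollama serve` running, `missing_models(fetch_tags())` should return an empty list when both models were created successfully.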
- Step 1. Launch the local web-server:
streamlit run lit.py
- Step 2. Open the following address in your browser to start using AutoPM3:
http://localhost:8501
- Show the help message of AutoPM3_main.py:
python AutoPM3_main.py -h
- Example of running the Python script:
# --query_variant: the query variant in HGVS format
# --paper_path: path to the paper file (an XML file in this example)
# --model_name_text: backend model of the RAG system; change to llama3:70b or another hosted model as you prefer (note that you need to pull the model in Ollama in advance)
python AutoPM3_main.py \
--query_variant "NM_004004.5:c.516G>C" \
--paper_path ./xml_papers/20201936.xml \
--model_name_text llama3_loraFT-8b-f16
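To process several variant-publication pairs, the command above can be wrapped in a small loop. The following is a minimal sketch, not a script shipped with AutoPM3. It uses only the three flags shown above and assumes it is run from the repository root, where `AutoPM3_main.py` lives. The helper names `build_command` and `run_batch` are hypothetical.

```python
import subprocess
import sys


def build_command(variant, paper_path, model="llama3_loraFT-8b-f16"):
    """Build the AutoPM3_main.py command line for one variant-publication pair."""
    return [
        sys.executable, "AutoPM3_main.py",
        "--query_variant", variant,
        "--paper_path", paper_path,
        "--model_name_text", model,
    ]


def run_batch(pairs, model="llama3_loraFT-8b-f16"):
    """Run AutoPM3 once per (variant, paper_path) pair and collect the exit codes."""
    results = []
    for variant, paper_path in pairs:
        # Passing a list (no shell) avoids quoting issues with ">" in HGVS names.
        proc = subprocess.run(build_command(variant, paper_path, model))
        results.append((variant, paper_path, proc.returncode))
    return results
```

For example, `run_batch([("NM_004004.5:c.516G>C", "./xml_papers/20201936.xml")])` runs the same query as the command above and reports its exit code.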
- We released PM3-Bench, the dataset used in this study; details are listed in the PM3-Bench tutorial.
- A fast setup for AutoPM3.